The Wisdom of the Few
Xavier Amatriain, Neal Lathia, Josep M. Pujol
SIGIR’09
Advisor: Jia-Ling Koh
Speaker: Yu-Cheng Hsieh
Outline
• Introduction
• Mining the Web For Expert Ratings
• Expert Nearest-Neighbors
• Result
• User Study
• Discussion
• Conclusion
Introduction
• Nearest-neighbor collaborative filtering suffers
from the following shortcomings:
- Data sparsity
- Noise
- Cold-start problem
- Scalability
These make it difficult to define a reliable similarity between users
Mining the Web For Expert Ratings
• Collect 8,000 movies from Netflix
• 1,750 experts from “Rotten Tomatoes”
- Remove the experts who rated fewer than
  250 movies
- Only 169 experts are left
• General users: members of “Netflix”
Mining the Web For Expert Ratings
• Dataset analysis: Data sparsity
Experts achieve lower data sparsity
Mining the Web For Expert Ratings
• Dataset analysis: Average rating distribution
Expert Nearest-Neighbors
• User-based Collaborative Filtering (CF) is
used
• Two stages
1. construct user-item matrix
2. construct user-expert matrix
Expert Nearest-Neighbors
• Construct user-expert matrix
- Calculate the similarity between user $a$ and expert $b$:

$$\mathrm{sim}(a,b) = \frac{\sum_i r_{ai}\, r_{bi}}{\sqrt{\sum_i r_{ai}^2}\,\sqrt{\sum_i r_{bi}^2}} \cdot \frac{2\, N_{a \cap b}}{N_a + N_b}$$

$N_a$: total number of items user $a$ has rated
$N_{a \cap b}$: total number of items both user $a$ and user $b$ have rated
$r_{ai}$: rating for item $i$ from user $a$
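The similarity above, a cosine similarity scaled by the overlap factor $2 N_{a \cap b} / (N_a + N_b)$, can be sketched in Python as follows. The function and its rating-dictionary representation are my own illustration, not code from the paper:

```python
import math

def similarity(ratings_a, ratings_b):
    """Cosine similarity over co-rated items, scaled by the
    overlap factor 2 * N_ab / (N_a + N_b)."""
    common = set(ratings_a) & set(ratings_b)  # items rated by both
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    cosine = dot / (norm_a * norm_b)
    # N_a and N_b are the users' full rating counts; N_ab is the overlap
    overlap = 2 * len(common) / (len(ratings_a) + len(ratings_b))
    return cosine * overlap
```

The overlap factor penalizes pairs that co-rated only a few items, so a high cosine score computed on a tiny intersection does not dominate.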
Expert Nearest-Neighbors
• Only select the experts such that $\mathrm{sim}(e, a) \ge \delta$
• A threshold $\tau$ sets the minimum number of
expert neighbors who must have rated the
item
Expert Nearest-Neighbors
• Construct user-expert matrix
- Rating prediction:

$$\hat{r}_{ai} = \sigma_u + \frac{\sum_{e \in E} (r_{ei} - \sigma_e)\,\mathrm{sim}(e,a)}{\sum_{e \in E} \mathrm{sim}(e,a)}$$

$E = \{e_1, e_2, \ldots, e_n\}$ where $n \ge \tau$
$\sigma_u, \sigma_e$: mean ratings of the user and the expert, respectively
$\hat{r}_{ai}$: predicted rating for item $i$ from user $a$
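The prediction step, including the $\delta$ similarity cutoff and the $\tau$ minimum-neighbor requirement from the previous slide, can be sketched like this. All names and the data layout are my own; the default values mirror the settings reported later in the slides:

```python
def predict(user_mean, expert_means, expert_ratings, sims, item,
            delta=0.01, tau=10):
    """Mean-centered, similarity-weighted prediction from expert neighbors.

    sims maps expert -> sim(e, a).  Only experts with sim >= delta who
    rated the item count as neighbors, and at least tau of them are
    required before a prediction is made.
    """
    neighbors = [e for e, s in sims.items()
                 if s >= delta and item in expert_ratings[e]]
    if len(neighbors) < tau:
        return None  # not enough expert neighbors: no prediction
    num = sum((expert_ratings[e][item] - expert_means[e]) * sims[e]
              for e in neighbors)
    den = sum(sims[e] for e in neighbors)
    # add the user's mean back: experts' deviations, user's scale
    return user_mean + num / den
```

Mean-centering each expert's rating before weighting compensates for experts who rate systematically higher or lower than the target user.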
Result
• Error in Predicted Recommendations
- 5-fold cross validation
- Baseline: $\hat{r}_{ai} = \sigma_e$ (the expert mean rating), with no
similarity calculation
Result
Expert CF and Neighbor CF: $\delta = 0.01$, $\tau = 10$
• Top-N Recommendation Precision
- No recommendation list with a fixed number
N of items
- Items are only classified as recommendable
or not recommendable, given a
threshold
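The recommendable / not-recommendable split can be sketched as a simple cutoff on the predicted rating. The function name and the 4.0 cutoff are illustrative assumptions; the slide does not state the threshold's value:

```python
def recommendable(predictions, threshold=4.0):
    """Split items into recommendable / not by predicted rating.

    predictions maps item -> predicted rating; the 4.0 cutoff is an
    illustrative assumption, not a value taken from the slides.
    """
    good = [i for i, p in predictions.items() if p >= threshold]
    bad = [i for i, p in predictions.items() if p < threshold]
    return good, bad
```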
Result
  0.01,   10
User Study
• Select 500,000 movies from Netflix
(random sample)
• Separate the movies into 10 equal-density
bins according to their popularity
• Select 10 movies from each bin for 57
participants to rate (100 movies in total)
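The popularity-binned sampling described above can be sketched as follows; the function name, seeding, and handling of a remainder that does not divide evenly are my own assumptions:

```python
import random

def sample_by_popularity(popularity, bins=10, per_bin=10, seed=0):
    """Rank movies by popularity, split the ranking into equal-density
    bins, then sample `per_bin` movies from each bin.

    popularity maps movie -> popularity score; movies left over when the
    count does not divide evenly by `bins` are simply dropped here.
    """
    ranked = sorted(popularity, key=popularity.get)
    size = len(ranked) // bins
    rng = random.Random(seed)  # seeded for reproducibility
    sample = []
    for b in range(bins):
        chunk = ranked[b * size:(b + 1) * size]
        sample.extend(rng.sample(chunk, per_bin))
    return sample
```

Equal-density binning guarantees every popularity stratum contributes the same number of movies, so the study is not dominated by blockbusters.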
User Study
• 8,000 movies and their ratings are collected
• Generate four top-10 recommendation lists:
- Random List
- Critics' Choice
- Neighbor-CF
($\delta = 0.01$, $\tau = 10$)
- Expert-CF
User Study
• K=50
User Study
User Result
Discussion
• Data sparsity
- Experts are more likely to have rated a large percentage
of the items
• Noise and malicious ratings
- Noise: experts are more consistent in their
ratings
- Malicious ratings: experts rate items in a
professional, good-faith manner
• Cold start problem
• Scalability
- Reduces the time complexity of computing
the similarity matrix
Conclusions
• Proposed an approach to recommendation
based on the opinions of an external source
• Uses only a few experts to make predictions
• Tackles the problems of traditional kNN-CF