The Wisdom of the Few
Xavier Amatriain, Neal Lathia, Josep M. Pujol (SIGIR '09)
Advisor: Jia-Ling Koh
Speaker: Yu-Cheng Hsieh

Outline
• Introduction
• Mining the Web for Expert Ratings
• Expert Nearest-Neighbors
• Result
• User Study
• Discussion
• Conclusion

Introduction
• Nearest-neighbor collaborative filtering suffers from the following shortcomings:
  - Data sparsity
  - Noise
  - Cold-start problem
  - Scalability
• All of these make it difficult to define a reliable similarity between users.

Mining the Web for Expert Ratings
• Collect 8,000 movies from Netflix
• 1,750 experts from Rotten Tomatoes
  - Remove the experts who rated fewer than 250 movies
  - Only 169 experts are left
• General users: members of Netflix

Mining the Web for Expert Ratings
• Dataset analysis: data sparsity
  - Experts achieve lower data sparsity
• Dataset analysis: average rating distribution

Expert Nearest-Neighbors
• User-based collaborative filtering (CF) is used
• Two stages:
  1. Construct the user-item matrix
  2. Construct the user-expert matrix
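The two-stage matrix construction can be sketched as below. The sparse nested-dict representation and all identifiers are illustrative assumptions, not the paper's implementation.

```python
# Sketch of the two matrices used by expert-based CF.
# Ratings are stored sparsely as {user_id: {item_id: rating}};
# all ids and values here are made-up examples.

def build_matrix(ratings):
    """Turn (user, item, rating) triples into a row-per-user sparse dict."""
    matrix = {}
    for user, item, rating in ratings:
        matrix.setdefault(user, {})[item] = rating
    return matrix

# Stage 1: user-item matrix from ordinary Netflix users.
user_ratings = build_matrix([("u1", "m1", 4.0), ("u1", "m2", 2.0)])

# Stage 2: expert-item matrix from Rotten Tomatoes critics.
expert_ratings = build_matrix([("e1", "m1", 5.0), ("e1", "m2", 3.0)])
```

Keeping users and experts in two separate matrices of the same shape lets the same similarity code compare a user row against any expert row.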
Expert Nearest-Neighbors
• Construct the user-expert matrix
  - Calculate the similarity between user a and expert b:

      sim(a, b) = (2·N_{a∩b} / (N_a + N_b)) · Σ_i(r_ai · r_bi) / (sqrt(Σ_i r_ai²) · sqrt(Σ_i r_bi²))

    N_a: total number of items user a has rated
    N_{a∩b}: total number of items that both user a and user b have rated
    r_ai: rating for item i from user a
• Only select experts e such that sim(e, a) ≥ δ
• A threshold τ sets the minimum number of expert neighbors who must have rated the item

Expert Nearest-Neighbors
• Rating prediction:

      r̂_ai = μ_u + Σ_{e∈E}(r_ei − μ_e)·sim(e, a) / Σ_{e∈E} sim(e, a)

    E = {e1, e2, ..., en}, where n ≥ τ
    μ_u, μ_e: mean ratings of the user and of expert e
    r̂_ai: predicted rating for item i from user a

Result
• Error in predicted recommendations
  - 5-fold cross validation
  - Baseline: predict r̂_ai from the experts' mean ratings, without any similarity calculation
  - Expert CF and Neighbor CF use δ = 0.01, τ = 10
• (figure slides: prediction-error comparison)
• Top-N recommendation precision
  - No recommendation list with a fixed number N of items
  - Instead, items are classified as recommendable or not recommendable given a rating threshold
  - δ = 0.01, τ = 10

User Study
• Select 500,000 movies from Netflix (random sample)
• Separate the movies into 10 equal-density bins according to popularity
• Select 10 movies from each bin for 57 participants to rate (100 movies in total)
• 8,000 movies and their ratings are collected
• Generate four top-10 recommendation lists:
  - Random List
  - Critics' Choice
  - Neighbor-CF (δ = 0.01, τ = 10; K = 50 neighbors)
  - Expert-CF
• (figure slides: user-study results)

Discussion
• Data sparsity
  - Experts are more likely to have rated a large percentage of the items
• Noise and malicious ratings
  - Noise: experts are more consistent in their ratings
  - Malicious ratings: experts rate items in good faith
• Cold-start problem
• Scalability
  - A small set of experts reduces the time complexity of computing the similarity matrix

Conclusions
• Proposed a recommendation approach based on the opinions of an external source.
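The overlap-weighted cosine similarity and the mean-centred prediction described in the Expert Nearest-Neighbors slides can be sketched as follows. Function names, the sparse dict layout, and the tiny example values are assumptions for illustration; δ and τ default to the slides' 0.01 and 10.

```python
import math

def similarity(ra, rb):
    """Cosine similarity between two sparse rating rows {item: rating},
    scaled by 2*|I_a ∩ I_b| / (|I_a| + |I_b|) so that pairs with little
    overlap are penalised, as in the slides' sim(a, b)."""
    common = set(ra) & set(rb)
    if not common:
        return 0.0
    dot = sum(ra[i] * rb[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in ra.values()))
    norm_b = math.sqrt(sum(v * v for v in rb.values()))
    overlap = 2 * len(common) / (len(ra) + len(rb))
    return overlap * dot / (norm_a * norm_b)

def predict(user_row, expert_rows, item, delta=0.01, tau=10):
    """Mean-centred prediction from experts with sim(e, a) >= delta who
    rated the item; returns None when fewer than tau such experts exist."""
    mu_user = sum(user_row.values()) / len(user_row)
    neighbours = []
    for expert in expert_rows.values():
        if item not in expert:
            continue
        s = similarity(user_row, expert)
        if s >= delta:
            mu_e = sum(expert.values()) / len(expert)
            neighbours.append((s, expert[item] - mu_e))
    if len(neighbours) < tau:
        return None  # too few expert neighbours (slides use tau = 10)
    num = sum(s * deviation for s, deviation in neighbours)
    den = sum(s for s, _ in neighbours)
    return mu_user + num / den
```

With a single qualifying expert the similarity weights cancel, so the prediction reduces to the user's mean plus that expert's deviation from their own mean, which is a quick sanity check on the formula.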
• Uses only a few experts to predict ratings for all users.
• Tackles the problems of traditional kNN-CF.
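As a closing sketch, the threshold-based evaluation from the Result slides (classifying items as recommendable rather than ranking a fixed top-N) could look like this. The cutoff of 4.0 and the precision definition used here are illustrative assumptions, not values taken from the paper.

```python
# Sketch of "recommendable vs. not recommendable" precision: an item is
# recommended when its predicted rating clears a cutoff, and precision is
# the fraction of recommended items the user actually rated highly.
# The 4.0 cutoff is an assumed example value.

def precision_at_threshold(predicted, actual, cutoff=4.0):
    """Fraction of items predicted as recommendable (>= cutoff) whose
    true rating is also >= cutoff."""
    recommended = [item for item, p in predicted.items() if p >= cutoff]
    if not recommended:
        return 0.0
    hits = sum(1 for item in recommended if actual.get(item, 0.0) >= cutoff)
    return hits / len(recommended)
```

Because there is no fixed list size N, precision here depends only on how many predictions clear the cutoff, which matches the slides' description of the metric.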