The Wisdom of the Few
Xavier Amatriain, Neal Lathia, Josep M. Pujol
SIGIR’09
Advisor: Jia-Ling Koh
Speaker: Yu-Cheng Hsieh
Outline
• Introduction
• Mining the Web For Expert Ratings
• Expert Nearest-Neighbors
• Result
• User Study
• Discussion
• Conclusion
Introduction
• Nearest-neighbor collaborative filtering suffers
from the following shortcomings:
- Data sparsity
- Noise
- Cold-start problem
- Scalability
These make it difficult to define a reliable similarity between users
Mining the Web For Expert Ratings
• Collect 8,000 movies from Netflix
• 1,750 experts from “Rotten Tomatoes”
- Remove the experts who rated fewer than
  250 movies
- Only 169 experts are left
• General users: members of “Netflix”
Mining the Web For Expert Ratings
• Dataset analysis: Data sparsity
Experts achieve lower data sparsity
Mining the Web For Expert Ratings
• Dataset analysis: Average rating distribution
Expert Nearest-Neighbors
• User-based Collaborative Filtering (CF) is
used
• Two stages
1. construct user-item matrix
2. construct user-expert matrix
Expert Nearest-Neighbors
• Construct user-expert matrix
- Calculate the similarity between user $a$ and expert $b$:

$$\mathrm{sim}(a,b) = \frac{\sum_i r_{ai}\, r_{bi}}{\sqrt{\sum_i r_{ai}^2}\,\sqrt{\sum_i r_{bi}^2}} \cdot \frac{2\, N_{a \cap b}}{N_a + N_b}$$

$N_a$: total number of items user $a$ has rated
$N_{a \cap b}$: total number of items both user $a$ and user $b$ have rated
$r_{ai}$: rating for item $i$ from user $a$
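The similarity above, a cosine similarity scaled by the overlap factor $2 N_{a \cap b} / (N_a + N_b)$, can be sketched in Python as follows. The function and its rating-dictionary representation are my own illustration, not code from the paper:

```python
import math

def similarity(ratings_a, ratings_b):
    """Cosine similarity over co-rated items, scaled by the
    overlap factor 2 * N_ab / (N_a + N_b)."""
    common = set(ratings_a) & set(ratings_b)  # items rated by both
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    cosine = dot / (norm_a * norm_b)
    # N_a and N_b are the users' full rating counts; N_ab is the overlap
    overlap = 2 * len(common) / (len(ratings_a) + len(ratings_b))
    return cosine * overlap
```

The overlap factor penalizes pairs that co-rated only a few items, so a high cosine score computed on a tiny intersection does not dominate.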
Expert Nearest-Neighbors
• Only select the experts such that $\mathrm{sim}(e, a) \ge \delta$
• A threshold $\tau$ sets the minimum number of
expert neighbors who must have rated the
item
Expert Nearest-Neighbors
• Construct user-expert matrix
- Rating prediction:

$$\hat{r}_{ai} = \sigma_u + \frac{\sum_{e \in E} (r_{ei} - \sigma_e)\,\mathrm{sim}(e,a)}{\sum_{e \in E} \mathrm{sim}(e,a)}$$

$E = \{e_1, e_2, \ldots, e_n\}$ where $n \ge \tau$
$\sigma_u, \sigma_e$: mean ratings of the user and the expert, respectively
$\hat{r}_{ai}$: predicted rating for item $i$ from user $a$
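The prediction step, including the $\delta$ similarity cutoff and the $\tau$ minimum-neighbor requirement from the previous slide, can be sketched like this. All names and the data layout are my own; the default values mirror the settings reported later in the slides:

```python
def predict(user_mean, expert_means, expert_ratings, sims, item,
            delta=0.01, tau=10):
    """Mean-centered, similarity-weighted prediction from expert neighbors.

    sims maps expert -> sim(e, a).  Only experts with sim >= delta who
    rated the item count as neighbors, and at least tau of them are
    required before a prediction is made.
    """
    neighbors = [e for e, s in sims.items()
                 if s >= delta and item in expert_ratings[e]]
    if len(neighbors) < tau:
        return None  # not enough expert neighbors: no prediction
    num = sum((expert_ratings[e][item] - expert_means[e]) * sims[e]
              for e in neighbors)
    den = sum(sims[e] for e in neighbors)
    # add the user's mean back: experts' deviations, user's scale
    return user_mean + num / den
```

Mean-centering each expert's rating before weighting compensates for experts who rate systematically higher or lower than the target user.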
Result
• Error in Predicted Recommendations
- 5-fold cross validation
- Baseline: $\hat{r}_{ai} = \sigma_e$ (the expert mean rating), with no
similarity calculation
Result
Expert CF and Neighbor CF: $\delta = 0.01$, $\tau = 10$
• Top-N Recommendation Precision
- No recommendation list with a fixed number
N of items
- Items are only classified as recommendable
or not recommendable, given a
threshold
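The recommendable / not-recommendable split can be sketched as a simple cutoff on the predicted rating. The function name and the 4.0 cutoff are illustrative assumptions; the slide does not state the threshold's value:

```python
def recommendable(predictions, threshold=4.0):
    """Split items into recommendable / not by predicted rating.

    predictions maps item -> predicted rating; the 4.0 cutoff is an
    illustrative assumption, not a value taken from the slides.
    """
    good = [i for i, p in predictions.items() if p >= threshold]
    bad = [i for i, p in predictions.items() if p < threshold]
    return good, bad
```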
Result
  0.01,   10
User Study
• Select 500,000 movies from Netflix
(random sample)
• Separate the movies into 10 equal-density
bins according to their popularity
• Select 10 movies from each bin for 57
participants to rate (100 movies in total)
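The popularity-binned sampling described above can be sketched as follows; the function name, seeding, and handling of a remainder that does not divide evenly are my own assumptions:

```python
import random

def sample_by_popularity(popularity, bins=10, per_bin=10, seed=0):
    """Rank movies by popularity, split the ranking into equal-density
    bins, then sample `per_bin` movies from each bin.

    popularity maps movie -> popularity score; movies left over when the
    count does not divide evenly by `bins` are simply dropped here.
    """
    ranked = sorted(popularity, key=popularity.get)
    size = len(ranked) // bins
    rng = random.Random(seed)  # seeded for reproducibility
    sample = []
    for b in range(bins):
        chunk = ranked[b * size:(b + 1) * size]
        sample.extend(rng.sample(chunk, per_bin))
    return sample
```

Equal-density binning guarantees every popularity stratum contributes the same number of movies, so the study is not dominated by blockbusters.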
User Study
• 8,000 movies and their ratings are collected
• Generate four top-10 recommendation lists:
- Random List
- Critics' Choice
- Neighbor-CF
($\delta = 0.01$, $\tau = 10$)
- Expert-CF
User Study
• K=50
User Study
User Result
Discussion
• Data sparsity
- Experts are more likely to have rated a large percentage
of the items
• Noise and malicious ratings
- Noise: experts are more consistent in their
ratings
- Malicious ratings: experts rate items in a
professional, good-faith manner
• Cold start problem
• Scalability
- Reduces the time complexity of computing
the similarity matrix
Conclusions
• Proposed an approach to recommendation
based on the opinions of an external source
• Uses only a few experts to make predictions
• Tackles the problems of traditional kNN-CF