Collaborative Data Analysis and Multi-Agent Systems
Robert W. Thomas
CSCE 824
15 APR 2013
Agenda
• Problem Description
• Existing Research Overview
• Limitations of Existing Results
• Future Research Suggestions
Problem Description
• Information Overload
• Divide and Conquer; Reconcile
• Recommender Systems and Social Media
– Content Filtering
– Collaborative Filtering
– Collaborative Data Analysis through Agents
Content Filtering
• Recommendations based on items similar to those the user has preferred previously
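
A minimal sketch of content filtering, assuming each item is described by a small feature vector. The catalog, the feature meanings, and the function names `cosine` and `recommend_similar` are hypothetical and do not come from the slides; the sketch ranks unseen items by their similarity to items the user already liked.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend_similar(liked, catalog, top_n=2):
    """Rank items not yet liked by their best similarity to any liked item."""
    scores = {
        name: max(cosine(features, catalog[seen]) for seen in liked)
        for name, features in catalog.items()
        if name not in liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical catalog: features are (action, comedy, romance) weights.
catalog = {
    "film_a": (1.0, 0.0, 0.0),
    "film_b": (0.9, 0.1, 0.0),
    "film_c": (0.2, 1.0, 0.0),
    "film_d": (0.0, 0.1, 1.0),
}
print(recommend_similar({"film_a"}, catalog))
```

Only the user's own history is consulted here, which is the point of contrast with collaborative filtering.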
Collaborative Filtering (CF)
• Recommendations based on what others in a
network prefer
• Different Techniques
– Memory-Based
– Model-Based
– Hybrid
Memory-Based CF
• Similarity Computation
• Prediction and Recommendation Computation
• Top-N Recommendations
Similarity Computation
• Compares Users or Items
• Correlation-Based (Pearson correlation)
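• $w_{u,v} = \dfrac{\sum_{i \in I} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I} (r_{v,i} - \bar{r}_v)^2}}$
– Two users: u, v
– $i \in I$ = items both u and v have rated
– $\bar{r}_u$ = avg rating of the co-rated items of the $u$-th user
• $w_{i,j} = \dfrac{\sum_{u \in U} (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_{u \in U} (r_{u,i} - \bar{r}_i)^2}\,\sqrt{\sum_{u \in U} (r_{u,j} - \bar{r}_j)^2}}$
– Two items: i, j
– $u \in U$ = users who rated both i and j
– $\bar{r}_i$ = avg rating of the $i$-th item by those users
• Vector Cosine-Based
• $w_{i,j} = \cos(\vec{i}, \vec{j}) = \dfrac{\vec{i} \cdot \vec{j}}{\lVert \vec{i} \rVert \, \lVert \vec{j} \rVert}$
– R = m × n user-item matrix
– $\vec{i}$, $\vec{j}$ are the m-dimensional vectors corresponding to the $i$-th and $j$-th columns of R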
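
The two similarity measures on this slide can be sketched as follows. This is an illustrative sketch, not the presenter's implementation: the function names, the nested-dictionary rating layout, and the convention that an unrated cell in R is stored as 0 are all assumptions.

```python
import math

def pearson_users(ratings, u, v):
    """Pearson correlation between users u and v over their co-rated items."""
    common = [i for i in ratings[u] if i in ratings[v]]
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings[u][i] for i in common) / len(common)
    mean_v = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mean_u) * (ratings[v][i] - mean_v) for i in common)
    den_u = math.sqrt(sum((ratings[u][i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((ratings[v][i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # a user with constant ratings has no defined correlation
    return num / (den_u * den_v)

def cosine_items(matrix, i, j):
    """Cosine of columns i and j of a dense user-item matrix (assumed: 0 = unrated)."""
    col_i = [row[i] for row in matrix]
    col_j = [row[j] for row in matrix]
    dot = sum(a * b for a, b in zip(col_i, col_j))
    norm = math.sqrt(sum(a * a for a in col_i)) * math.sqrt(sum(b * b for b in col_j))
    return dot / norm if norm else 0.0

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 3, "d": 5},
    "carol": {"a": 1, "b": 5, "c": 2},
}
matrix = [[5, 3, 0], [4, 2, 0], [0, 0, 4]]  # rows = users, columns = items

print(pearson_users(ratings, "alice", "bob"))    # same taste, shifted by one point
print(pearson_users(ratings, "alice", "carol"))  # opposite taste
print(cosine_items(matrix, 0, 1))
```

Pearson subtracts each user's mean, so alice and bob correlate strongly even though bob rates everything one point lower; cosine on raw columns does no such centering.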
Prediction and Recommendation Computation
• Weighted Sum of Others’ Ratings
– $P_{a,i} = \bar{r}_a + \dfrac{\sum_{u \in U} (r_{u,i} - \bar{r}_u)\, w_{a,u}}{\sum_{u \in U} |w_{a,u}|}$
– Prediction P for active user a, on item i
– $\bar{r}_u$ = avg rating of user u
– $w_{a,u}$ = weight between user a and user u
– $u \in U$ = users who rated item i
• Simple Weighted Average
– $P_{u,i} = \dfrac{\sum_{n \in N} r_{u,n}\, w_{i,n}}{\sum_{n \in N} |w_{i,n}|}$
– Prediction P for user u on item i
– $n \in N$ = all other rated items for user u
– $w_{i,n}$ = weight between items i and n
– $r_{u,n}$ = rating for user u on item n
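
Both prediction rules can be sketched in a few lines. The function names, the dictionary layout, and the sample weights are assumptions made for illustration; in practice the weights would come from a similarity computation such as Pearson correlation.

```python
def predict_weighted_sum(ratings, weights, a, item):
    """User-based: mean of a plus the weighted, mean-centred ratings of other users."""
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    num = den = 0.0
    for u, r_u in ratings.items():
        if u == a or item not in r_u:
            continue  # only users who rated the item contribute
        w = weights.get((a, u), 0.0)
        mean_u = sum(r_u.values()) / len(r_u)
        num += (r_u[item] - mean_u) * w
        den += abs(w)
    return mean_a if den == 0 else mean_a + num / den

def predict_item_weighted_avg(user_ratings, item_weights, item):
    """Item-based: weighted average of the user's ratings on the other items."""
    num = den = 0.0
    for n, r in user_ratings.items():
        if n == item:
            continue
        w = item_weights.get((item, n), 0.0)
        num += r * w
        den += abs(w)
    return None if den == 0 else num / den

# Hypothetical data: user "a" has not rated item "z".
ratings = {
    "a": {"x": 4, "y": 2},
    "u1": {"x": 3, "z": 5},
    "u2": {"y": 4, "z": 2},
}
weights = {("a", "u1"): 1.0, ("a", "u2"): -0.5}  # assumed user-user weights

print(predict_weighted_sum(ratings, weights, "a", "z"))
print(predict_item_weighted_avg({"x": 4, "y": 2},
                                {("z", "x"): 0.75, ("z", "y"): 0.25}, "z"))
```

In the user-based rule a negatively weighted neighbour who rated the item below their own mean still pushes the prediction up, which is why the denominator sums absolute weights.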
Top-N Recommendations
• Item-Based
• User-Based
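
The slide does not spell out how the top-N list is built. One simple reading, sketched here under that assumption, is to score every candidate item (with either the user-based or the item-based predictor) and keep the N best items the user has not rated yet. The function name `top_n` and the sample scores are hypothetical.

```python
import heapq

def top_n(predicted, already_rated, n=3):
    """Return the n highest-scoring items the user has not rated yet."""
    candidates = (
        (score, item)
        for item, score in predicted.items()
        if item not in already_rated
    )
    return [item for score, item in heapq.nlargest(n, candidates)]

# Hypothetical predicted scores for one user.
predicted = {"x": 4.5, "y": 3.0, "z": 4.9, "w": 2.0}
print(top_n(predicted, already_rated={"x"}, n=2))
```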
Model-Based CF
• Bayesian Belief Net
• Clustering
• Regression-Based
• Markov Decision Process (MDP)-Based
• Latent Semantic
Bayesian Belief Net
• Bayesian logic – decision making and inferential statistics
• Simple Bayesian
– Memory-Based
– $\text{class} = \arg\max_{j \in \text{classSet}} \; p(\text{class}_j) \prod_{o} P(X_o = x_o \mid \text{class}_j)$
– Laplace Estimator to avoid a conditional probability of 0
– $P(X_i = x_i \mid Y = y) = \dfrac{\#(X_i = x_i, Y = y) + 1}{\#(Y = y) + |X_i|}$
• Tree Augmented naïve Bayes and naïve Bayes optimized by Extended Logistic Regression (ELR)
– Require extended training periods to produce results beyond
simple Bayesian and Pearson correlation
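
The simple Bayesian classifier with the Laplace estimator can be sketched as below. The function names and the toy like/dislike data are hypothetical, and $|X_i|$ is taken to be the number of distinct values observed for feature i in the training rows, which is an assumption of this sketch.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count class frequencies and per-feature value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    domains = defaultdict(set)           # feature index -> observed values
    for row, y in zip(rows, labels):
        for o, x in enumerate(row):
            value_counts[(o, y)][x] += 1
            domains[o].add(x)
    return class_counts, value_counts, domains

def classify(row, model):
    """Pick the class maximising prior times Laplace-smoothed likelihoods."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best, best_score = None, -1.0
    for y, n_y in class_counts.items():
        score = n_y / total
        for o, x in enumerate(row):
            # Laplace estimator: (count + 1) / (class count + |X_o|), never zero
            score *= (value_counts[(o, y)][x] + 1) / (n_y + len(domains[o]))
        if score > best_score:
            best, best_score = y, score
    return best

# Hypothetical data: each row holds ratings of two other items,
# the label is the rating of the target item.
rows = [("like", "like"), ("like", "dislike"),
        ("dislike", "dislike"), ("dislike", "like")]
labels = ["like", "like", "dislike", "dislike"]

model = train(rows, labels)
print(classify(("like", "like"), model))
```

Without the "+1" in the numerator, the class "dislike" would get probability 0 for any row whose first feature is "like", because that combination never occurs in the training data.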
Clustering
• Cluster: collection of similar objects, dissimilar
to objects in other clusters
– Pearson correlation can be used
• Three Categories
– Partitioning
– Density-based
– Hierarchical
• Often an Intermediate Step
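
As an illustration of the partitioning category, here is a small k-means sketch over user rating vectors. It is a sketch under stated assumptions: it uses squared Euclidean distance rather than the Pearson correlation mentioned above, seeds the centroids with the first k points for reproducibility, and the name `kmeans` and the sample data are hypothetical.

```python
def kmeans(points, k, iterations=10):
    """Partition points into k clusters by repeated assign-and-average steps."""
    centroids = [list(p) for p in points[:k]]  # assumed deterministic seeding
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return clusters

# Hypothetical rating vectors for four users over three items.
users = [(5, 5, 1), (1, 1, 5), (4, 5, 1), (1, 2, 5)]
print(kmeans(users, k=2))
```

Used as an intermediate step, the resulting clusters would limit the neighbourhood searched by a memory-based method to users in the same cluster.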
Regression-Based
• Use approximation of ratings to make
predictions against a regression model
• Apply to situations where rating vectors have
large Euclidean distances but very high
Similarity Computation scores
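
The situation in the second bullet can be made concrete with two hypothetical users whose ratings differ by a constant offset: the vectors are far apart in Euclidean terms yet perfectly correlated. The one-variable least-squares fit below is a deliberately simplified stand-in for a regression model, not the method from the cited survey, and all names and data are assumptions.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = (math.sqrt(sum((x - mean_a) ** 2 for x in a))
           * math.sqrt(sum((y - mean_b) ** 2 for y in b)))
    return num / den if den else 0.0

def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical users: v always rates three points higher than u.
u = [1, 2, 1, 2, 1]
v = [4, 5, 4, 5, 4]

print(euclidean(u, v))  # large distance
print(pearson(u, v))    # yet near-perfect correlation
slope, intercept = fit_line(u, v)
print(slope * 2 + intercept)  # predicted rating by v when u gives a 2
```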
MDP-Based
• Sequential Optimization Problem
• <S,A,R,Pr>
– S = {states}
– A = {actions}
– R = {rewards} for r(s,a,s’)
– Pr = {transition probabilities} for pr(s,a,s’)
• Partially Observable MDP (POMDP)
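
To show how the tuple <S,A,R,Pr> turns into a sequential optimization, here is a value-iteration sketch. Value iteration and the discount factor `gamma` are not mentioned on the slide and are assumptions of this example, as is the toy recommender scenario with its state and action names.

```python
def value_iteration(states, actions, reward, prob, gamma=0.9, tol=1e-6):
    """Compute optimal state values; reward and prob mirror r(s,a,s') and pr(s,a,s')."""
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                sum(prob(s, a, s2) * (reward(s, a, s2) + gamma * values[s2])
                    for s2 in states)
                for a in actions
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values

# Hypothetical two-state model: a browsing user may buy after a recommendation.
states = ["browsing", "bought"]
actions = ["recommend", "wait"]

def prob(s, a, s2):
    if s == "bought":                      # absorbing state
        return 1.0 if s2 == "bought" else 0.0
    if a == "recommend":                   # half of recommendations convert
        return 0.5
    return 1.0 if s2 == "browsing" else 0.0

def reward(s, a, s2):
    return 1.0 if (s == "browsing" and s2 == "bought") else 0.0

values = value_iteration(states, actions, reward, prob)
print(values)
```

In this toy model the value of "browsing" satisfies V = 0.5 + 0.45 V, so the loop converges to roughly 0.909, and recommending beats waiting.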
Latent Semantic
• Uses statistical modeling to discover
additional communities or profiles
Network Trust
• We’re all mad here; I’m mad; you’re mad.
• Under certain conditions, the opinions of some contacts are valued more than those of others
• Accounting for this can increase CF accuracy
• Semantic Knowledge
• Social Tie-Strength
Hybrid CF
• CF + Content-Based
• CF + CF
• CF + CF and/or Content-Based
Limitations of Existing Solutions
• Time/Accuracy Trade-Offs
• Noisy Data
• Data Sparsity (New User)
• Scalability
• Synonymy
• Gray Sheep
• Shilling Attacks
• Privacy
Future Research Suggestions
• Hybrids
• Semantics
• Trust
• Parallel Processing
– Multi-Agent Systems
BACKUP
References
• Su, Xiaoyuan, and Taghi M. Khoshgoftaar. "A survey of
collaborative filtering techniques." Advances in
Artificial Intelligence 2009 (2009): 4.
• Chen, Wei, and Simon Fong. "Social network
collaborative filtering framework and online trust
factors: a case study on Facebook." Digital Information
Management (ICDIM), 2010 Fifth International
Conference on. IEEE, 2010.
• O'Donovan, John, and Barry Smyth. "Trust in
recommender systems." Proceedings of the 10th
international conference on Intelligent user interfaces.
ACM, 2005.