Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
MRI: Meaningful Interpretations
of Collaborative Ratings
Mahashweta Das
Gautam Das
Sihem Amer-Yahia
Cong Yu
37th International Conference on Very Large Data Bases, 2011 @ Seattle
Roadmap
Introduction
Motivation
Problem: MRI
Sub problem: DEM
Sub problem: DIM
Data Model
Algorithms
Experiments
Quantitative
Qualitative
Conclusion & Future Work
VLDB 2011
2
Roadmap
Introduction
Motivation
Problem: MRI
Sub problem: DEM
Sub problem: DIM
Data Model
Algorithms
Experiments
Quantitative
Qualitative
Conclusion & Future Work
VLDB 2011
3
Motivation
VLDB 2011
4
Motivation
VLDB 2011
5
Motivation
VLDB 2011
6
Motivation
Examining reviews vs. trusting overall aggregate rating
IMDB ratings demographic breakdown not meaningful
enough
VLDB 2011
7
MRI Problem
Examining reviews vs. trusting overall aggregate rating
IMDB ratings demographic breakdown not meaningful
enough
Novel and powerful third option: Meaningful Rating Interpretation
VLDB 2011
Explain ratings by leveraging user and item attribute information
8
MRI Problem
Examining reviews vs. trusting overall aggregate rating
IMDB ratings demographic breakdown not meaningful
enough
Novel and powerful third option: Meaningful Rating Interpretation
VLDB 2011
Explain ratings by leveraging user and item attribute information
Example:
9
MRI Problem
Examining reviews vs. trusting overall aggregate rating
IMDB ratings demographic breakdown not meaningful
enough
Novel and powerful third option: Meaningful Rating Interpretation
VLDB 2011
Explain ratings by leveraging user and item attribute information
Example:
10
MRI Sub-problem
DEM: Meaningful Description Mining
Identify groups of reviewers who consistently share similar
ratings on items
VLDB 2011
11
MRI Sub-problem
DEM: Meaningful Description Mining
Identify groups of reviewers who consistently share similar
ratings on items
VLDB 2011
12
MRI Sub-problem
DIM: Meaningful Difference Mining
Identify groups of reviewers who consistently disagree on item
ratings
VLDB 2011
13
MRI Sub-problem
DIM: Meaningful Difference Mining
Identify groups of reviewers who consistently disagree on item
ratings
VLDB 2011
14
Roadmap
Introduction
Motivation
Problem: MRI
Sub problem: DEM
Sub problem: DIM
Data Model
Algorithms
Experiments
Quantitative
Qualitative
Conclusion & Future Work
VLDB 2011
15
Data Model
Collaborative rating site: <Set of Items, Set of Users, Ratings>
Rating tuple:
<item attributes, user attributes, rating>
ID
Title
Genre
Director
Name
Gender
Location
Rating
1
Titanic
Drama
James
Cameron
Amy
Female
New York
8.5
2
Schindler’s
List
Drama
Steven
Speilberg
John
Male
New York
7.0
Group: Set of ratings describable by a set of attribute values
Notion of group based on data cube
OLAP literature for mining multidimensional data
VLDB 2011
16
Data Model
Notion of group based on data cube lattice
Each node in lattice
is a data cube/cuboid
Query condition on
database
Figure: 4-Dimensional Data Cube Lattice
VLDB 2011
17
Data Model
Notion of group based on data cube lattice
Each node in lattice
is a data cube/cuboid
Query condition on
database
A = Gender
B = Age
C = Location
D = Occupation
Figure: 4-Dimensional Data Cube Lattice
VLDB 2011
18
Data Model
Each node/data cube/
cuboid in lattice is a group
Selection Query Condition
A = Gender: Male
B = Age: Young
C = Location: CA
D = Occupation: Student
Figure: Partial Rating Lattice for a Movie
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
19
Data Model
Each node/data cube/
cuboid in lattice is a group
Selection Query Condition
A = Gender: Male
B = Age: Young
C = Location: CA
D = Occupation: Student
Figure: Partial Rating Lattice for a Movie
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
20
Data Model
Task
Quickly indentify
“good” groups in the
lattice that help users
understand ratings
effectively
Figure: Partial Rating Lattice for a Movie
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
21
Roadmap
Introduction
Motivation
Problem: MRI
Sub problem: DEM
Sub problem: DIM
Data Model
Algorithms
Experiments
Quantitative
Qualitative
Conclusion & Future Work
VLDB 2011
22
DEM: Meaningful Description Mining
For an input item covering RI ratings, return set C of cuboids, such that:
description error
is minimized, subject to:
|C| ≤ k;
coverage
≥a
Description Error
Measures how well a cuboid average rating approximates the numerical
score of each individual rating belonging to it
Coverage
Measures the percentage of ratings covered by the returned cuboids
DEM is NP-Hard: Proof details in paper
VLDB 2011
23
DEM Algorithms
Exact Algorithm (E-DEM)
Brute-force enumerating all possible combinations of cuboids in
lattice to return the exact (i.e., optimal) set as rating descriptions
Random Restart Hill Climbing Algorithm
Often fails to satisfy Coverage constraint; Large number of
restarts required
Need an algorithm that optimizes both Coverage and Description
Error constraints simultaneously
Randomized Hill Exploration Algorithm (RHE-DEM)
VLDB 2011
24
RHE-DEM Algorithm
Satisfy Coverage
Minimize Error
C= {Male, Student}
{California, Student}
Figure: Partial Rating Lattice for a Movie; k=2, a=80%
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
25
RHE-DEM Algorithm
Satisfy Coverage
Minimize Error
C= {Male, Student}
{California, Student}
Say,C does not satisfy
Coverage Constraint
Figure: Partial Rating Lattice for a Movie; k=2, a=80%
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
26
RHE-DEM Algorithm
Satisfy Coverage
Minimize Error
C= {Male, Student}
{California, Student}
C= {Male}
{California,Student}
C= {Student}
{California,Student}
Figure: Partial Rating Lattice for a Movie; k=2, a=80%
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
27
RHE-DEM Algorithm
Satisfy Coverage √
Minimize Error
C= {Male}
{California, Student}
Say, C satisfies
Coverage Constraint
Figure: Partial Rating Lattice for a Movie; k=2, a=80%
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
28
RHE-DEM Algorithm
Satisfy Coverage √
Minimize Error
C= {Male}
{California, Student}
Figure: Partial Rating Lattice for a Movie; k=2, a=80%
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
29
RHE-DEM Algorithm
Satisfy Coverage √
Minimize Error
C= {Male}
{California, Student}
Figure: Partial Rating Lattice for a Movie; k=2, a=80%
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
30
RHE-DEM Algorithm
Satisfy Coverage √
Minimize Error
√
C= {Male}
{Student}
Figure: Partial Rating Lattice for a Movie; k=2, a=80%
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
31
DIM: Meaningful Difference Mining
For an input item covering RI+ RI- ratings, return set C of cuboids, such that:
difference balance
is minimized, subject to:
|C| ≤ k;
≥a∩
≥a
Difference Balance
Measures whether the positive and negative ratings are “mingled together"
(high balance) or “separated apart" (low balance)
Coverage
Measures the percentage of +, - ratings covered by the returned cuboids
DIM is NP-Hard: Proof details in paper
VLDB 2011
32
DIM Algorithms
Exact Algorithm (E-DIM)
Randomized Hill Exploration Algorithm (RHE-DIM)
Unlike DEM “error”, DIM “balance” computation is expensive
Quadratic computation scanning all possible positive and
negative ratings for each set of cuboids
Introduce the concept of Fundamental Regions to aid faster
balance computation
Partition space of all ratings and aggregate rating tuples in
each region
VLDB 2011
33
DIM Algorithms: Fundamental Region
C1 = {Male, Student}
C2 = {California, Student}
Balance =
Figure: Computing Balance using Fundamental Region
Set of k=2 cuboids having 75 ratings (44+, 31-),10 ratings (6+, 4-)
VLDB 2011
34
Roadmap
Introduction
Motivation
Problem: MRI
Sub problem: DEM
Sub problem: DIM
Data Model
Algorithms
Experiments
Quantitative
Qualitative
Conclusion & Future Work
VLDB 2011
35
Experiments
Dataset
MovieLens:100,000 ratings for 1682 movies by 943 users
Each user has 4 attributes: Gender, Age, Occupation, Location
Binning the movies: Order movies according to number of
ratings and then partition into 6 bins
Bin 1: movies with fewest ratings, Bin 6: movies with highest ratings
Evaluation
Quantitative Indicator: Efficiency, Quality and Scalability
Qualitative Indicator: Mechanical Turk User Study
VLDB 2011
36
Quantitative Experiments: DEM
VLDB 2011
37
Quantitative Experiments: DEM
VLDB 2011
38
Qualitative Experiments: User Study
Amazon Mechanical Turk study
Two sets: one for description mining, one for difference mining
Each set: 4 randomly chosen movies, 30 independent singleuser tasks
Study 1: Users prefer simple aggregate ratings over rating
interpretations
Study 2: Users prefer rating interpretations by exact algorithm or
heuristic randomized hill exploration algorithm
VLDB 2011
39
Qualitative Experiments: User Study
VLDB 2011
40
Roadmap
Introduction
Motivation
Problem: MRI
Sub problem: DEM
Sub problem: DIM
Data Model
Algorithms
Experiments
Quantitative
Qualitative
Conclusion & Future Work
VLDB 2011
41
Conclusion and Future Work
Novel problem of meaningful rating interpretation (MRI) in
collaborative rating sites
Meaningful Description Mining
Meaningful Difference Mining
Heuristic algorithmic solutions that generate equally good rating
interpretations as exact brute-force with much less execution time
Meaningful interpretations of ratings by reviewers of interest
Additional constraints such as diversity of rating explanations
VLDB 2011
42
Related Work
Data Cubes
Clustering & Dimensionality Reduction
Gray et. al, A relational aggregation operator generalizing group-by, cross-tab,
and sub-totals, ICDE 1996
Sathe et. al, Intelligent rollups in multidimensional olap data, VLDB 2001
Lakshmanan et. al, Quotient cube: how to summarize the semantics of a data
cube, VLDB 2002
Ramakrishnan et. al, Exploratory mining in cube space, ICDM 2006
Wu et. al, Promotion analysis in multi-dimensional space, VLDB 2009
Agrawal et. al, Automatic subspace clustering of high dimensional data for data
mining applications, SIGMOD 1998
Recommendation Explanation
Herlocker et. al, Explaining collaborative filtering recommendations, CSCW 2000
Bilgic et. al, Explaining recommendations: Satisfaction vs. promotion, IUI 2005
VLDB 2011
43
Thank You
Questions
Quantitative Experiments: DIM
VLDB 2011
45
Quantitative Experiments: DEM, DIM
VLDB 2011
46