
MRI: Meaningful Interpretations of Collaborative Ratings
Mahashweta Das, Gautam Das, Sihem Amer-Yahia, Cong Yu
37th International Conference on Very Large Data Bases (VLDB 2011), Seattle

Roadmap
- Introduction
  - Motivation
  - Problem: MRI
  - Sub-problem: DEM
  - Sub-problem: DIM
- Data Model
- Algorithms
- Experiments
  - Quantitative
  - Qualitative
- Conclusion & Future Work

Motivation
- Examining individual reviews vs. trusting the overall aggregate rating
- The demographic breakdown of IMDB ratings is not meaningful enough

MRI Problem
- Novel and powerful third option: Meaningful Rating Interpretation
- Explain ratings by leveraging user and item attribute information
- Example: [figure]

MRI Sub-problem: DEM (Meaningful Description Mining)
- Identify groups of reviewers who consistently share similar ratings on items

MRI Sub-problem: DIM (Meaningful Difference Mining)
- Identify groups of reviewers who consistently disagree on item ratings

Data Model
- Collaborative rating site: <Set of Items, Set of Users, Ratings>
- Rating tuple: <item attributes, user attributes, rating>

  ID  Title             Genre  Director          Name  Gender  Location  Rating
  1   Titanic           Drama  James Cameron     Amy   Female  New York  8.5
  2   Schindler's List  Drama  Steven Spielberg  John  Male    New York  7.0

- Group: a set of ratings describable by a set of attribute values
- The notion of a group is based on the data cube, from the OLAP literature on mining multidimensional data

Data Model: Data Cube Lattice
- The notion of a group is based on the data cube lattice
- Each node in the lattice is a data cube (cuboid), i.e., a query condition on the database
- Dimensions: A = Gender, B = Age, C = Location, D = Occupation
- Figure: 4-Dimensional Data Cube Lattice

Data Model: Rating Lattice
- Each node (data cube, cuboid) in the lattice is a group, defined by a selection query condition
- Example condition: A = Gender: Male, B = Age: Young, C = Location: CA, D = Occupation: Student
- Task: quickly identify "good" groups in the lattice that help users understand ratings effectively
- Figure: Partial Rating Lattice for a Movie (M: Male, Y: Young, CA: California, S: Student)

DEM: Meaningful Description Mining
- For an input item covering ratings R_I, return a set C of cuboids such that:
  - description error is minimized, subject to:
  - |C| ≤ k
  - coverage ≥ α
- Description error: measures how well a cuboid's average rating approximates the numerical score of each individual rating belonging to it
- Coverage: measures the percentage of ratings covered by the returned cuboids
- DEM is NP-hard (proof details in the paper)

DEM Algorithms
- Exact Algorithm (E-DEM)
  - Brute-force enumeration of all possible combinations of cuboids in the lattice, returning the exact (i.e., optimal) set as rating descriptions
- Random Restart Hill Climbing Algorithm
  - Often fails to satisfy the coverage constraint; a large number of restarts is required
- Need an algorithm that handles both the coverage constraint and the description error objective simultaneously
  - Randomized Hill Exploration Algorithm (RHE-DEM)

RHE-DEM Algorithm
- Two goals: satisfy coverage, then minimize error
- Example with k = 2, α = 80%:
  1. Start with C = {Male, Student}, {California, Student}
  2. Say C does not satisfy the coverage constraint
  3. Explore neighboring sets: C = {Male}, {California, Student} and C = {Student}, {California, Student}
  4. Say C = {Male}, {California, Student} satisfies the coverage constraint (coverage satisfied √)
  5. Continue exploring to minimize error, ending with C = {Male}, {Student} (coverage satisfied √, error minimized √)
- Figure: Partial Rating Lattice for a Movie; k = 2, α = 80% (M: Male, Y: Young, CA: California, S: Student)

DIM: Meaningful Difference Mining
- For an input item covering positive ratings R_I+ and negative ratings R_I−, return a set C of cuboids such that:
  - difference balance is minimized, subject to:
  - |C| ≤ k
  - coverage of R_I+ ≥ α and coverage of R_I− ≥ α
- Difference balance: measures whether the positive and negative ratings are "mingled together" (high balance) or "separated apart" (low balance)
- Coverage: measures the percentage of positive and negative ratings covered by the returned cuboids
- DIM is NP-hard (proof details in the paper)

DIM Algorithms
- Exact Algorithm (E-DIM)
- Randomized Hill Exploration Algorithm (RHE-DIM)
- Unlike the DEM "error", the DIM "balance" is expensive to compute
  - Quadratic computation: scans all pairs of positive and negative ratings for each set of cuboids
- Introduce the concept of Fundamental Regions to speed up balance computation
  - Partition the space of all ratings and aggregate the rating tuples in each region

DIM Algorithms: Fundamental Region
- C1 = {Male, Student}, C2 = {California, Student}
- Balance = [formula shown in figure]
- Figure: Computing Balance using Fundamental Regions. Set of k = 2 cuboids having 75 ratings (44+, 31−) and 10 ratings (6+, 4−)

Experiments
- Dataset: MovieLens, 100,000 ratings for 1682 movies by 943 users
- Each user has 4 attributes: Gender, Age, Occupation, Location
- Binning the movies: order movies by number of ratings, then partition into 6 bins
  - Bin 1: movies with the fewest ratings; Bin 6: movies with the most ratings
- Evaluation
  - Quantitative indicators: efficiency, quality and scalability
  - Qualitative indicator: Mechanical Turk user study

Quantitative Experiments: DEM
- [figures]

Qualitative Experiments: User Study
- Amazon Mechanical Turk study
- Two sets: one for description mining, one for difference mining
- Each set: 4 randomly chosen movies, 30 independent single-user tasks
- Study 1: Users prefer simple aggregate ratings over rating interpretations
- Study 2: Users prefer rating interpretations by exact algorithm or heuristic randomized hill exploration algorithm
- [figure]

Conclusion and Future Work
- Novel problem of meaningful rating interpretation (MRI) in collaborative rating sites
  - Meaningful Description Mining
  - Meaningful Difference Mining
- Heuristic algorithmic solutions that generate rating interpretations as good as those of the exact brute-force algorithms, with much less execution time
- Meaningful interpretations of ratings by reviewers of interest
- Additional constraints such as diversity of rating explanations

Related Work
- Data Cubes
  - Gray et al., A relational aggregation operator generalizing group-by, cross-tab, and sub-totals, ICDE 1996
  - Sathe et al., Intelligent rollups in multidimensional OLAP data, VLDB 2001
  - Lakshmanan et al., Quotient cube: how to summarize the semantics of a data cube, VLDB 2002
  - Ramakrishnan et al., Exploratory mining in cube space, ICDM 2006
  - Wu et al., Promotion analysis in multi-dimensional space, VLDB 2009
- Clustering & Dimensionality Reduction
  - Agrawal et al., Automatic subspace clustering of high dimensional data for data mining applications, SIGMOD 1998
- Recommendation Explanation
  - Herlocker et al., Explaining collaborative filtering recommendations, CSCW 2000
  - Bilgic et al., Explaining recommendations: Satisfaction vs. promotion, IUI 2005

Thank You
Questions

Quantitative Experiments: DIM
- [figure]

Quantitative Experiments: DEM, DIM
- [figure]
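
The DEM definitions and the exact algorithm (E-DEM) described in the slides can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration rather than the authors' implementation: the toy ratings, the function names, the choice to exclude the all-inclusive top node of the lattice, and the error formula, which takes each rating's absolute deviation from the average of the cuboids covering it (averaged over those cuboids) and sums it over all ratings.

```python
from itertools import combinations

# Hypothetical toy ratings for a single movie: (reviewer attributes, score).
RATINGS = [
    ({"gender": "M", "occupation": "student"}, 9.0),
    ({"gender": "M", "occupation": "student"}, 9.0),
    ({"gender": "F", "occupation": "student"}, 3.0),
    ({"gender": "F", "occupation": "teacher"}, 4.0),
    ({"gender": "M", "occupation": "teacher"}, 9.0),
]


def enumerate_cuboids(ratings):
    """Return every non-empty attribute-value condition matched by some rating."""
    found = set()
    for attrs, _ in ratings:
        pairs = sorted(attrs.items())
        for size in range(1, len(pairs) + 1):
            for combo in combinations(pairs, size):
                found.add(frozenset(combo))
    # Deterministic order: more general cuboids first, then lexicographic.
    return sorted(found, key=lambda c: (len(c), sorted(c)))


def covers(cuboid, attrs):
    """A cuboid covers a rating if the reviewer matches all its attribute values."""
    return all(attrs.get(name) == value for name, value in cuboid)


def cuboid_average(cuboid, ratings):
    """Average score of the ratings covered by the cuboid (assumed non-empty)."""
    scores = [score for attrs, score in ratings if covers(cuboid, attrs)]
    return sum(scores) / len(scores)


def coverage(cuboid_set, ratings):
    """Fraction of ratings covered by at least one cuboid in the set."""
    hit = sum(1 for attrs, _ in ratings
              if any(covers(c, attrs) for c in cuboid_set))
    return hit / len(ratings)


def description_error(cuboid_set, ratings):
    """Assumed error: per-rating mean absolute deviation from covering cuboid averages."""
    averages = {c: cuboid_average(c, ratings) for c in cuboid_set}
    total = 0.0
    for attrs, score in ratings:
        deviations = [abs(score - averages[c])
                      for c in cuboid_set if covers(c, attrs)]
        if deviations:  # uncovered ratings contribute no error
            total += sum(deviations) / len(deviations)
    return total


def exact_dem(ratings, k, alpha):
    """Brute force: try every set of at most k cuboids meeting the coverage bound."""
    candidates = enumerate_cuboids(ratings)
    best, best_error = None, float("inf")
    for size in range(1, k + 1):
        for cuboid_set in combinations(candidates, size):
            if coverage(cuboid_set, ratings) < alpha:
                continue
            err = description_error(cuboid_set, ratings)
            if err < best_error:
                best, best_error = cuboid_set, err
    return best, best_error
```

On this toy data, `exact_dem(RATINGS, k=2, alpha=1.0)` returns the cuboids `{gender=M}` and `{gender=F}` with error 1.0. The number of candidate sets grows combinatorially with the lattice size, which is consistent with the slides' motivation for a heuristic such as RHE-DEM.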
