Prediction Cubes
Bee-Chung Chen, Lei Chen,
Yi Lin and Raghu Ramakrishnan
University of Wisconsin - Madison

Subset Mining

We want to find interesting subsets of the dataset:
- Interestingness: defined by the "model" built on a subset
- Cube space: a combination of dimension-attribute values defines a candidate subset (just like regular OLAP)

We want the measures to represent decision/prediction behavior:
- Summarize a subset using the "model" built on it
- A big change from regular OLAP!

The Idea

Build OLAP data cubes in which cell values represent decision/prediction behavior:
- In effect, build a tree for each cell/region in the cube. Observe that this is not the same as the collection of trees used in an ensemble method!
- The idea is simple, but it leads to promising data mining tools. A minimal sketch of the per-cell idea follows.

Ultimate objective: exploratory analysis of the entire space of "data mining choices" (choice of algorithms, data-conditioning parameters, ...).
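
To make the per-cell idea concrete, here is a minimal sketch (not from the talk; the function name and the use of pandas/scikit-learn are assumptions) that trains one decision tree per cube cell and records a per-cell measure:

```python
# Minimal sketch: one model per cube cell, where a cell is a combination
# of dimension-attribute values (e.g., a (state, month) pair).
# Assumes predictor columns are already numerically encoded.
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

def prediction_cube(df, dims, predictors, target, test_X, test_y):
    """Return {cell -> measure} for every cell in the cube space."""
    cube = {}
    for cell, subset in df.groupby(dims):
        if subset[target].nunique() < 2:
            continue  # skip degenerate single-class cells
        model = DecisionTreeClassifier().fit(subset[predictors], subset[target])
        # Cell value: accuracy of the per-cell model on a common test set.
        cube[cell] = accuracy_score(test_y, model.predict(test_X[predictors]))
    return cube
```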

Example (1/7): Regular OLAP

Goal: look for patterns of unusually high numbers of applications.

Z: dimensions (Location, Time); Y: measure (# of applications).

[Figure: a fact table with rows such as (AL, USA; Dec, 04; 2) and (WY, USA; Dec, 04; 3), shown with the dimension hierarchies Location: All -> Country (Japan, USA, Norway, ...) -> State (AL, ..., WY) and Time: All -> Year (85, 86, ..., 04) -> Month (Jan 86, ..., Dec 86).]

Example (2/7): Regular OLAP

Goal: look for patterns of unusually high numbers of applications.

Cell value: number of loan applications.

[Figure: a fine-grained table at the [State/Province, Month] level (provinces AB, ..., YT under Canada; states AL, ..., WY under the USA) rolled up to coarser regions at the [Country, Year] level, and drilled back down to finer regions. A roll-up in this spirit is sketched below.]
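
A minimal sketch of this roll-up (illustrative; the Jan counts are invented, and pandas is an assumption):

```python
# Roll up loan-application counts from [State, Month] to [Country, Year].
import pandas as pd

fine = pd.DataFrame({
    "country": ["USA", "USA", "USA", "USA"],
    "state":   ["AL",  "AL",  "WY",  "WY"],
    "year":    [2004,  2004,  2004,  2004],
    "month":   ["Jan", "Dec", "Jan", "Dec"],
    "n_apps":  [5,     2,     4,     3],
})
coarse = fine.groupby(["country", "year"])["n_apps"].sum()  # roll up
print(coarse)  # USA, 2004 -> 14
```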

Example (3/7): Decision Analysis

Goal: analyze a bank's loan decision process w.r.t. two dimensions, Location and Time.

Fact table D: Z: dimensions (Location, Time); X: predictors (Race, Sex, ...); Y: class (Approval). Rows look like (AL, USA; Dec, 04; White; M; ...; Yes) and (WY, USA; Dec, 04; Black; F; ...; No). A cube subset Z(D) is used to build a model h(X; Z(D)), e.g., a decision tree.

[Figure: the fact table D shown with the Location and Time dimension hierarchies.]

Example (3/7): Decision Analysis

Are there branches (and time windows) where approvals were closely tied to sensitive attributes (e.g., race)?
- Suppose you partitioned the training data by location and time, chose the partition for a given branch and time window, and built a classifier. You could then ask, "Are the predictions of this classifier closely correlated with race?"

Are there branches and times with decision making reminiscent of 1950s Alabama?
- This requires comparing classifiers trained using different subsets of the data.

Example (4/7): Prediction Cubes

1. Build a model using the data from a cube subset, e.g., [USA, Dec 04](D).
2. Evaluate that model to obtain the cell value.

Measure in a cell:
- Accuracy of the model
- Predictiveness of Race measured based on that model
- Similarity between that model and a given model

Model: h(X; [USA, Dec 04](D)), e.g., a decision tree.

[Figure: a [Country, Month] cube whose cells hold such measures (e.g., CA, Jan 2004: 0.4), shown with the [USA, Dec 04] subset of the data table.]

Example (5/7): Model-Similarity

Given:
- Data table D
- Target model h0(X)
- Test set Δ w/o labels

For each cell at level [Country, Month], build a model on that cell's subset of D, apply both it and h0(X) to the unlabeled test set Δ, and record their similarity as the cell value. For example, a high value in the [USA, Dec 04] cell would mean the loan decision process in the USA during Dec 04 was similar to a discriminatory decision model h0(X).

[Figure: the data table D, the [Country, Month] cube of similarity scores, the target model h0(X), and the test set Δ.]
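
A minimal sketch of one plausible similarity measure (the talk does not spell out the exact definition here; the agreement-rate choice is an assumption):

```python
# Model-similarity cell value as the fraction of unlabeled test
# examples on which the per-cell model and the target model h0 agree.
import numpy as np

def model_similarity(cell_model, h0, test_X):
    """Agreement rate between two classifiers on an unlabeled test set."""
    return np.mean(cell_model.predict(test_X) == h0.predict(test_X))
```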

Example (6/7): Predictiveness

Given:
- Data table D
- Attributes V (e.g., Race)
- Test set Δ w/o labels

For each cell at level [Country, Month], build two models on that cell's subset: h(X), using all predictors, and h(X - V), using all predictors except V. The cell value is the predictiveness of V: how much the two models' predictions on Δ differ. For example, a high value in the [USA, Dec 04] cell would mean Race was an important predictor of the loan approval decision in the USA during Dec 04.

[Figure: the data table D, the [Country, Month] cube of predictiveness scores, and the prediction columns of h(X) and h(X - V) on the test set Δ.]
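
A minimal sketch (the disagreement-rate formulation is an assumption; names are illustrative):

```python
# Predictiveness of V as the rate at which predictions change when the
# attributes V are withheld from the model.
# Assumes predictor columns are already numerically encoded.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def predictiveness(subset, predictors, V, target, test_X):
    """Disagreement rate between h(X) and h(X - V) on the test set."""
    reduced = [p for p in predictors if p not in V]
    h_full = DecisionTreeClassifier().fit(subset[predictors], subset[target])
    h_red = DecisionTreeClassifier().fit(subset[reduced], subset[target])
    return np.mean(h_full.predict(test_X[predictors])
                   != h_red.predict(test_X[reduced]))
```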

Example (7/7): Prediction Cube

Cell value: predictiveness of Race.

[Figure: a prediction cube at the [Country, Year] level (rows CA and USA) rolled up from, and drilled down to, the [State/Province, Month] level (AB, ..., YT; AL, ..., WY), where, e.g., WY shows noticeably high values (0.9, 0.7, 0.8). Each cell value comes from the model built on that cell's own data subset.]

Efficient Computation

Reduce prediction cube computation to data cube computation:
- Represent a data-mining model as a distributive or algebraic (bottom-up computable) aggregate function, so that data-cube techniques can be directly applied.

Bottom-Up Data Cube Computation

Cell values: numbers of loan applications.

            1985  1986  1987  1988 |  All
  Norway      10    30    20    24 |   84
  ...         23    45    14    32 |  114
  USA         14    32    42    11 |   99
  All         47   107    76    67 |  297

The coarser cells (the All row and All column) are computed bottom-up by summing the finer cells.

Functions on Sets

Bottom-up computable functions: functions that can be computed using only summary information.

Distributive function: f(X) = F({f(X1), ..., f(Xn)}),
where X = X1 ∪ ... ∪ Xn and Xi ∩ Xj = ∅.
- E.g., Count(X) = Sum({Count(X1), ..., Count(Xn)})

Algebraic function: f(X) = F({G(X1), ..., G(Xn)}),
where G(Xi) returns a fixed-length vector of values.
- E.g., Avg(X) = F({G(X1), ..., G(Xn)}), with
  G(Xi) = [Sum(Xi), Count(Xi)] and
  F({[s1, c1], ..., [sn, cn]}) = Sum({si}) / Sum({ci}).

Both cases are sketched in code below.
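
A minimal sketch of the two cases (illustrative; plain Python, no library assumptions):

```python
# Count is distributive: it can be recombined from per-partition counts.
def count_bottom_up(partition_counts):
    # Count(X) = Sum({Count(X1), ..., Count(Xn)}) for a disjoint partition.
    return sum(partition_counts)

# Avg is algebraic: each partition contributes a fixed-length summary
# vector G(Xi) = (Sum(Xi), Count(Xi)); F recombines the vectors.
def avg_bottom_up(partition_summaries):
    total = sum(s for s, _ in partition_summaries)
    n = sum(c for _, c in partition_summaries)
    return total / n

# Usage: partitions {1, 2, 3} and {4, 5} of X = {1, ..., 5}.
print(count_bottom_up([3, 2]))          # 5
print(avg_bottom_up([(6, 3), (9, 2)]))  # (6 + 9) / (3 + 2) = 3.0
```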

Scoring Function

Represent a model as a function of sets.

Conceptually, a machine-learning model h(X; Z(D)) is a scoring function Score(y, x; Z(D)) that gives each class y a score on test example x:
- h(x; Z(D)) = argmax_y Score(y, x; Z(D))
- Score(y, x; Z(D)) ≈ p(y | x, Z(D))
- Z(D): the set of training examples (a cube subset of D)
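
In code, this view is just an argmax over class scores (a sketch; the names are illustrative):

```python
# A model as a scoring function: score(y, x, ZD) approximates
# p(y | x, Z(D)), and the prediction is the highest-scoring class.
def h(x, ZD, classes, score):
    return max(classes, key=lambda y: score(y, x, ZD))
```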

Bottom-Up Score Computation

Key observations:
- Observation 1: Score(y, x; Z(D)) is a function of the cube subset Z(D); if it is distributive or algebraic, the bottom-up data cube technique can be directly applied.
- Observation 2: Having the scores for all the test examples and all the cells is sufficient to compute a prediction cube (see the sketch below):
  - Scores -> predictions -> cell values
  - The details depend on what each cell means (i.e., the type of prediction cube), but they are straightforward.
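
A minimal sketch of Observation 2 for accuracy cells (names and array shapes are assumptions):

```python
# Once Score(y, x; Z(D)) is available for every test example and every
# cell, cell values follow directly. Here the cell value is accuracy,
# with test_y given as integer class indices.
import numpy as np

def cell_values_from_scores(scores_by_cell, test_y):
    """scores_by_cell: {cell -> array of shape (n_test, n_classes)}."""
    return {cell: float(np.mean(np.argmax(scores, axis=1) == test_y))
            for cell, scores in scores_by_cell.items()}
```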

Machine-Learning Models

- Naïve Bayes:
  - Scoring function: algebraic
- Kernel-density-based classifier:
  - Scoring function: distributive
- Decision tree, random forest:
  - Scoring function: neither distributive nor algebraic
- PBE: probability-based ensemble (new)
  - Makes any machine-learning model distributive
  - An approximation

Probability-Based Ensemble

[Figure: a decision tree built on the [WA, 85] subset, versus the PBE version of that tree: decision trees built on the lowest-level cells (WA x Jan 85, ..., WA x Dec 85) that together cover [WA, 85].]

Probability-Based Ensemble

Scoring function:

  h_PBE(x; S(D)) = argmax_y Score_PBE(y, x; S(D))
  Score_PBE(y, x; S(D)) = Σ_{i ∈ S} Score_PBE(y, x; b_i(D))
  Score_PBE(y, x; b_i(D)) = h(y | x; b_i(D)) · g(b_i | x)

where:
- h(y | x; b_i(D)): model h's estimate of p(y | x, b_i(D))
- g(b_i | x): a model that predicts the probability that x belongs to base subset b_i(D)
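
A minimal sketch of the PBE scoring rule (illustrative; assumes the per-cell base models and the cell-membership model g are trained elsewhere, share one class ordering, and expose scikit-learn-style predict_proba):

```python
# PBE scoring: weight each base-cell model's class probabilities by
# g(b_i | x), the probability that x belongs to base subset b_i(D).
import numpy as np

def score_pbe(x, base_models, g, cells_in_S):
    """Score_PBE(y, x; S(D)) = sum over i in S of h(y|x; b_i(D)) * g(b_i|x)."""
    membership = g.predict_proba(x)[0]  # x: a single-row 2-D array
    return sum(base_models[i].predict_proba(x)[0] * membership[i]
               for i in cells_in_S)

def h_pbe(x, base_models, g, cells_in_S, classes):
    # h_PBE(x; S(D)) = argmax_y Score_PBE(y, x; S(D))
    return classes[int(np.argmax(score_pbe(x, base_models, g, cells_in_S)))]
```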

Outline

- Motivating example
- Definition of prediction cubes
- Efficient prediction cube materialization
- Experimental results
- Conclusion

Experiments

- Quality of PBE on 8 UCI datasets:
  - The quality of the PBE version of a model is slightly worse (0-6%) than the quality of the model trained directly on the whole training data.
  [Figure: the PBE ensemble of lowest-level models on [WA, 1985] vs. a single model trained on the whole [WA, 1985] subset.]
- Efficiency of the bottom-up score computation technique
- Case study on demographic data

Efficiency of Bottom-Up Score Computation

Machine-learning models:
- J48: J48 decision tree
- RF: random forest
- NB: naïve Bayes
- KDC: kernel-density-based classifier

Bottom-up method vs. exhaustive method:
- PBE-J48 vs. J48ex
- PBE-RF vs. RFex
- NB vs. NBex
- KDC vs. KDCex

Synthetic Dataset

Dimensions: Z1, Z2 and Z3.

[Figure: the domains of the dimensions; Z1 and Z2 range over values such as A-E and 0-9, and Z3 over 0, 1, ..., n.]

Decision rule:

  Condition             Rule
  Z1 > 1                Y = I(4X1 + 3X2 + 2X3 + X4 + 0.4X6 > 7)
  else, Z3 mod 2 = 0    Y = I(2X1 + 2X2 + 3X3 + 3X4 + 0.4X6 > 7)
  else                  Y = I(0.1X5 + X1 > 1)
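
The label-generating rule, transcribed directly into code (I(.) is the indicator function; the function name is illustrative):

```python
# Synthetic label generator following the decision rule above.
def label(z1, z3, x):
    # x = (x1, ..., x6); z1 and z3 are dimension values.
    if z1 > 1:
        return int(4*x[0] + 3*x[1] + 2*x[2] + x[3] + 0.4*x[5] > 7)
    elif z3 % 2 == 0:
        return int(2*x[0] + 2*x[1] + 3*x[2] + 3*x[3] + 0.4*x[5] > 7)
    else:
        return int(0.1*x[4] + x[0] > 1)
```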

Efficiency Comparison

[Figure: execution time (sec, up to about 2500) vs. number of records (40K-200K). The curves for the exhaustive methods (RFex, KDCex, NBex, J48ex) grow much faster than those using bottom-up score computation (NB, KDC, RFPBE, J48PBE).]

Related Work: Building Models on OLAP Results

- Multi-dimensional regression [Chen, VLDB 02]
  - Goal: detect changes of trends
  - Build linear regression models for cube cells
- Step-by-step regression in stream cubes [Liu, PAKDD 03]
- Loglinear-based quasi cubes [Barbara, J. IIS 01]
  - Use loglinear models to approximately compress dense regions of a data cube
- NetCube [Margaritis, VLDB 01]
  - Build a Bayes net on the entire dataset; use it to answer count queries approximately

Related Work (Contd.)

- Cubegrades [Imielinski, J. DMKD 02]
  - Extend cubes with ideas from association rules
  - How does the measure change when we roll up or drill down?
- Constrained gradients [Dong, VLDB 01]
  - Find pairs of similar cell characteristics associated with big changes in measure
- User-cognizant multidimensional analysis [Sarawagi, VLDBJ 01]
  - Help users find the most informative unvisited regions in a data cube, using the maximum-entropy principle
- Multi-structural DBs [Fagin et al., PODS 05, VLDB 05]

Take-Home Messages

- A promising exploratory data analysis paradigm:
  - Models can be used to identify interesting subsets
  - Concentrate only on subsets in cube space:
    - These are meaningful, tractable subsets
    - Precompute results and provide users with an interactive tool
- A simple way to plug "something" into cube-style analysis:
  - Try to describe/approximate the "something" by a distributive or algebraic function

Big Picture

- Why stop with decision behavior? The approach can apply to other kinds of analyses too.
- Why stop at browsing? Prediction cubes can be mined in their own right.
- Exploratory analysis of the mining space:
  - Dimension attributes can be parameters related to algorithms, data conditioning, etc.
  - Tractable evaluation is a challenge:
    - Large numbers of "dimensions", real-valued dimension attributes, difficulties in compositional evaluation
    - Active learning for experiment design; extending compositional methods

Community Information Management (CIM)

Anhai Doan, University of Illinois at Urbana-Champaign
Raghu Ramakrishnan, University of Wisconsin-Madison

Structured Web-Queries

Example queries:
- How many alumni are top-10 faculty members?
  - Wisconsin does very well, by the way
- Find trends in publications
  - By topic, by conference, by alumni of schools
- Change tracking
  - Alert me if my co-authors publish new papers or move to new jobs

Information is extracted from text sources on the web, then queried.

Key Ideas

- Communities are ideally scoped chunks of the web for which to build enhanced portals:
  - Relative uniformity in content and interests
  - Can exploit "people power" via mass collaboration to augment extraction
- CIM platform: facilitate collaborative creation and maintenance of community portals
  - Extraction management
    - Uncertainty, provenance, maintenance, and compositional inference for refining extracted information
  - Mass collaboration for extraction and integration

Watch for the new DBWorld!

Challenges

User interaction:
- Declarative specification of background knowledge and user feedback
- Intelligent prompting for user input
- Explanation of results

Challenges

Extraction and query plans:
- Starting from user input (ER schema, hints) and background knowledge (e.g., standard types, look-up tables), compile a query into an execution plan
  - Must cover extraction, storage and indexing, and relational processing
  - And maintenance!
- An algebra to represent such plans? A query optimizer?
  - Handling uncertainty, constraints, conflicts, multiple related sources, ranking, and a modular architecture

Challenges

Managing extracted data:
- Mapping between extracted metadata and source data
- Uncertainty of the mapping
- Conflicts (in user input, background knowledge, or from multiple sources)
- Evolution over time