Mining surprising patterns using temporal description length
Soumen Chakrabarti (IIT Bombay), Sunita Sarawagi (IIT Bombay), Byron Dom (IBM Almaden)

Market basket mining algorithms
- Find prevalent rules that hold over large fractions of the data
- Useful for promotions and store arrangement
- Intensively researched
(1990: "Milk and cereal sell together!")

Prevalent ≠ interesting
- Analysts already know about prevalent rules
- Interesting rules are those that deviate from prior expectation
- Mining's payoff is in finding surprising phenomena
(1995: "Milk and cereal sell together!" 1998: "Milk and cereal sell together!" Zzzz...)

What makes a rule surprising?
- It does not match prior expectation
  - e.g., the correlation between milk and cereal remains roughly constant over time
- It cannot be trivially derived from simpler rules
  - Milk 10%, cereal 10%, milk and cereal 10% ... surprising (only 1% expected under independence)
  - Eggs 10%; milk, cereal and eggs 0.1% ... surprising!

Two views on data mining
- Data -> mining program -> discovery
- Data + model of the analyst's knowledge of the data -> mining program -> discovery, fed back to the analyst

Our contributions
- A new notion of surprising patterns
  - Detect changes in correlation along time
  - Filter out steady, uninteresting correlations
- Algorithms to mine for surprising patterns
  - Encode data into bit streams using two models
  - Surprise = difference in the number of bits needed
- Experimental results
  - Demonstrate superiority over prevalent patterns

A simpler problem: one item
- Milk-buying habits are modeled by a biased coin
- The customer tosses this coin to decide whether to buy milk
  - Head or "1" denotes "basket contains milk"; the coin bias is Pr[milk]
- The analyst wants to study Pr[milk] along time
  - A single coin with fixed bias is not interesting
  - Changes in bias are interesting

The coin segmentation problem
- Players A and B
- A has a set of coins with different biases
- A repeatedly picks an arbitrary coin and tosses it an arbitrary number of times
- B observes the head/tail sequence and guesses the transition points and biases
- Example observation: 0 0 1 0 1 1 0 1 0 1 1 0 1 0 0 0 1

How to explain the data
- Given n head/tail observations, one extreme is to assume n different coins, each with bias 0 or 1
  - The data fits perfectly (with probability one), but many coins are needed
- The other extreme is to assume one coin, which may fit the data poorly
- The "best explanation" is a compromise, e.g. 0 0 1 0 | 1 1 0 1 0 1 1 | 0 1 0 0 0 1 with biases 1/4, 5/7, 1/3

Coding examples
- A sequence of k zeroes: naive encoding takes k bits; run-length encoding takes about log k bits
- 1000 bits with 10 randomly placed 1's, the rest 0's: posit a coin with bias 0.01
  - By Shannon's theorem, the data encoding cost is −10 log 0.01 − 990 log 0.99 ≈ 81 bits « 1000 bits (logs base 2)
  - Note that the 1's alone account for 10 log 100 ≈ 66 bits

How to find optimal segments
- Sequence of 17 tosses: 0 0 1 0 1 1 0 1 0 1 1 0 1 0 0 0 1
- Derived graph with 18 nodes, one per segment boundary; an optimal segmentation is a shortest path
- Edge cost = model cost + data cost
  - Data cost for a segment with Pr[head] = 5/7: 5 heads, 2 tails
  - Model cost = one node ID + one Pr[head]
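The coding-cost arithmetic above can be checked mechanically. Below is a minimal Python sketch (ours, not from the talk); the function name `data_cost` and the demo values are illustrative.

```python
import math

def data_cost(ones: int, zeros: int, bias: float) -> float:
    """Bits to encode a 0/1 sequence under a coin with Pr[1] = bias:
    each symbol s costs -log2 Pr[s] (Shannon)."""
    return -ones * math.log2(bias) - zeros * math.log2(1.0 - bias)

# The slide's example: 1000 tosses, 10 ones, posited bias 0.01.
print(data_cost(ones=10, zeros=990, bias=0.01))  # ~80.8, i.e. the ~81 bits << 1000
print(10 * math.log2(100))                       # ~66.4: the 1's alone cost ~66 bits
```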
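The shortest-path formulation can likewise be sketched as a dynamic program over the boundary nodes. This is our illustration, not the paper's implementation: we assume a model cost of log2(n) bits per node ID plus a fixed `bias_bits` charge for encoding Pr[head]; the paper's exact model cost may differ.

```python
import math

def mle_data_cost(ones: int, zeros: int) -> float:
    """Bits to encode a segment under its maximum-likelihood bias."""
    n = ones + zeros
    cost = 0.0
    if ones:
        cost -= ones * math.log2(ones / n)
    if zeros:
        cost -= zeros * math.log2(zeros / n)
    return cost

def segment(tosses, bias_bits=8.0):
    """Cheapest two-part encoding of the whole sequence.
    Node j stands for the boundary after toss j; edge (i, j) encodes
    tosses[i:j] as one segment (model cost + data cost)."""
    n = len(tosses)
    node_id_bits = math.log2(n)              # model cost: one node ID
    best = [0.0] + [math.inf] * n            # best[j]: cheapest code for tosses[:j]
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            ones = sum(tosses[i:j])          # O(n^3) overall -- fine for a sketch
            cost = (best[i] + node_id_bits + bias_bits
                    + mle_data_cost(ones, (j - i) - ones))
            if cost < best[j]:
                best[j], back[j] = cost, i
    cuts, j = [], n                          # recover boundaries via back-pointers
    while j > 0:
        cuts.append(j)
        j = back[j]
    return best[n], sorted(cuts)

tosses = [0,0,1,0,1,1,0,1,0,1,1,0,1,0,0,0,1]  # the 17-toss example above
print(segment(tosses))
```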
Approximate shortest path
- Suppose there are T tosses
- Make T^(1−ε) chunks, each with T^ε nodes (tune ε)
- Find shortest paths within chunks; some nodes are chosen in each chunk
- Solve a shortest path over all chosen nodes

Two or more items
- "Unconstrained" segmentation: k items induce a 2^k-sided coin
  - For k = 2: "milk and cereal" = 11, "milk, not cereal" = 10, "neither" = 00, etc.
- The shortest path finds a significant shift in any of the coin-face probabilities
- Problem: some of these shifts may be completely explained by lower-order marginals

Example
- A drop in the joint sale of milk and cereal is completely explained by a drop in the sale of milk
- Pr[milk & cereal] / (Pr[milk] · Pr[cereal]) remains constant over time; call this ratio θ
[Figure: support of milk, cereal, and both plotted against time, with θ = 2 throughout]

Constant-θ segmentation
- θ = (observed support) / (support expected under independence):
  θ = p11 / (p1• · p•1) = p11 / ((p10 + p11) (p01 + p11))
- Compute a global θ over all time; all coins must share this common value of θ
- Segment by constrained optimization
- Compare with the unconstrained coding cost
- (An illustrative sketch of the θ computation appears at the end of this deck.)

Is all this really needed?
- A simpler alternative:
  - Aggregate the data into suitable time windows
  - Compute support, correlation, θ, etc. in each window
  - Use a variance threshold to choose itemsets
- Pitfalls:
  - Arbitrary choices of windows and thresholds
  - May miss fine detail
  - Over-sensitive to outliers

... but no simpler
"Smoothing leads to an estimated trend that is descriptive rather than analytic or explanatory. Because it is not based on an explicit probabilistic model, the method cannot be treated rigorously in terms of mathematical statistics."
— T. W. Anderson, The Statistical Analysis of Time Series

Experiments
- 2.8 million baskets over 7 years, 1987–93
- 15,800 items, average 2.62 items per basket
- Two algorithms:
  - The complete MDL approach
  - MDL segmentation + statistical tests (MStat)
- Anecdotes: MDL is effective at penalizing obvious itemsets

Quality of approximation
[Figure: two panels plotting Approx/OPT − 1 (up to 0.002) and running time in seconds (up to 1800) against ε from 0.3 to 0.9]

Little agreement in itemset ranks
[Figure: scatter plots of Rank(MDL) against Rank(Stat, 4 week) and against Rank(MStat), ranks 0–1600]
- Simpler methods do not approximate MDL

MDL has high selectivity
[Figure: histograms of itemset scores under MDL and MStat]
- The scores of the best itemsets stand out from the rest under MDL

Three anecdotes against time
- High MStat score, small marginals: polo shirts & shorts
- High correlation, small % variation: bedsheets & pillow cases
- High MDL score, significant gradual drift: men's & women's shorts
[Figure: three time series, one per anecdote]

Conclusion
- A new notion of surprising patterns based on
  - the joint support expected from marginals, and
  - the variation of joint support along time
- A robust MDL formulation
- Efficient algorithms
  - Near-optimal segmentation using shortest paths
  - Pruning criteria
- Successful application to real data
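Backup: to make the θ ratio from the constant-θ segmentation slide concrete, here is a minimal sketch (ours, not the authors' code; the item names, baskets, and function name are illustrative) that estimates θ for a pair of items over a set of baskets.

```python
def theta(baskets, a="milk", b="cereal"):
    """Estimate theta = Pr[a & b] / (Pr[a] * Pr[b]) over a list of baskets.
    theta = 1 under independence; a theta that stays constant over time is
    unsurprising, which is what constant-theta segmentation filters out."""
    n = len(baskets)
    p_a = sum(a in bk for bk in baskets) / n               # marginal p1.
    p_b = sum(b in bk for bk in baskets) / n               # marginal p.1
    p_ab = sum(a in bk and b in bk for bk in baskets) / n  # joint support p11
    return p_ab / (p_a * p_b)

baskets = [{"milk", "cereal"}, {"milk"}, {"cereal", "eggs"}, {"milk", "cereal"}]
print(theta(baskets))  # 0.5 / (0.75 * 0.75) ~ 0.89
```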