Association Rule Mining
(Some material adapted from: Mining Sequential Patterns by Karuna Pande Joshi)

Terminology
- Transaction
- Item
- Itemset

Association Rules
Let U be a set of items and let X, Y ⊆ U, with X ∩ Y = ∅. An association rule is an expression of the form X → Y, whose meaning is: if the elements of X occur in some context, then so do the elements of Y.

Quality Measures
Let T be the set of all transactions. The following statistical quantities are relevant to association rule mining:
- support(X) = |{t ∈ T : X ⊆ t}| / |T|
  The percentage of all transactions containing itemset X.
- support(X → Y) = |{t ∈ T : X ∪ Y ⊆ t}| / |T|
  The percentage of all transactions containing both itemsets X and Y.
- confidence(X → Y) = |{t ∈ T : X ∪ Y ⊆ t}| / |{t ∈ T : X ⊆ t}|
  The percentage of transactions containing itemset X that also contain itemset Y, i.e., how good itemset X is at predicting itemset Y.

Learning Associations
The purpose of association rule learning is to find "interesting" rules, i.e., rules that meet the following two user-defined conditions:
- support(X → Y) ≥ MinSupport
- confidence(X → Y) ≥ MinConfidence

Itemsets
A frequent itemset (denoted Lk, where k is the size of the itemset) is an itemset whose support is greater than MinSupport, i.e., a high percentage of transactions contain the full itemset.
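As a quick illustration of these quality measures, here is a minimal Python sketch computing support and confidence directly from their definitions. The toy transaction database and item names are assumptions for illustration only, not data from the slides.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

def confidence(x, y, transactions):
    """confidence(X -> Y) = support(X u Y) / support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

# Assumed toy database of four transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]

print(support({"bread"}, transactions))                 # 3 of 4 transactions
print(confidence({"bread"}, {"butter"}, transactions))  # (2/4) / (3/4)
```

Note that confidence is not symmetric: {bread} → {butter} and {butter} → {bread} generally have different values, because they are normalized by different antecedent supports.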
A candidate itemset (denoted Ck, where k is the size of the itemset) is a potentially frequent itemset.

Basic Idea
- Generate all frequent itemsets satisfying the condition on minimum support.
- Build all possible rules from these itemsets and check them against the condition on minimum confidence.
- All rules above the minimum confidence threshold are returned for further evaluation.

AprioriAll(I)
  L1 ← ∅
  For each item Ij ∈ I
    count({Ij}) ← |{Ti : Ij ∈ Ti}|         // the number of transactions containing item Ij
    If count({Ij}) ≥ MinSupport × m        // if this count is big enough,
      L1 ← L1 ∪ {({Ij}, count({Ij}))}      // add the item and its count to L1
  k ← 2
  While Lk-1 ≠ ∅
    Lk ← ∅
    For each (l1, count(l1)) ∈ Lk-1
      For each (l2, count(l2)) ∈ Lk-1
        If (l1 = {j1, …, jk-2, x} ∧ l2 = {j1, …, jk-2, y} ∧ x ≠ y)
          l ← {j1, …, jk-2, x, y}
          count(l) ← |{Ti : l ⊆ Ti}|
          If count(l) ≥ MinSupport × m
            Lk ← Lk ∪ {(l, count(l))}
    k ← k + 1
  Return L1 ∪ L2 ∪ … ∪ Lk-1

Here m is the total number of transactions, so MinSupport × m is the minimum required count.

Rule Generation
Consider the itemset {a,d,e}. It has six candidate association rules:
  {a} → {d,e}    confidence: support({a,d,e}) / support({a}) = 0.571
  {d,e} → {a}    confidence: support({a,d,e}) / support({d,e}) = 1.000
  {d} → {a,e}    confidence: support({a,d,e}) / support({d}) = 0.667
  {a,e} → {d}    confidence: support({a,d,e}) / support({a,e}) = 0.667
  {e} → {a,d}    confidence: support({a,d,e}) / support({e}) = 0.571
  {a,d} → {e}    confidence: support({a,d,e}) / support({a,d}) = 0.800

Confidence-Based Pruning
Consider the same itemset {a,d,e}, and let MinConfidence = 0.800:
  {d,e} → {a}    confidence: support({a,d,e}) / support({d,e}) = 1.000
  {a,e} → {d}    confidence: support({a,d,e}) / support({a,e}) = 0.667
  {a,d} → {e}    confidence: support({a,d,e}) / support({a,d}) = 0.800
  {d} → {a,e}    confidence: support({a,d,e}) / support({d}) = 0.667
Selected rules: {d,e} → {a} and {a,d} → {e}

Summary
- Apriori is a rather simple algorithm that discovers useful and interesting patterns.
- It is widely used.
- It has been extended to create collaborative filtering algorithms that provide recommendations.

References
- Rakesh Agrawal, Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules. Proc. 20th Int. Conf.
Very Large Data Bases (VLDB), 1994.
- Rakesh Agrawal, Tomasz Imielinski, Arun Swami. Mining Association Rules between Sets of Items in Large Databases. Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 1993.
- P.-N. Tan, M. Steinbach, V. Kumar. Introduction to Data Mining. Pearson Education Inc., 2006, Chapter 6.