Apriori for Mining Association Rules

Fast Algorithms for Mining Association Rules
Rakesh Agrawal, Ramakrishnan Srikant
Slides from Ofer Pasternak, Data Mining Seminar 2003
© Ofer Pasternak

Introduction
- Bar-code technology
- Mining association rules over basket data (93)
- Tires ∧ accessories ⇒ automotive service
- Cross-marketing, attached mailing
- Very large databases

Notation
- Items: I = {i_1, i_2, …, i_m}
- Transaction: a set of items T ⊆ I; items are sorted lexicographically
- TID: unique identifier for each transaction
- Association rule: X ⇒ Y, where X ⊂ I, Y ⊂ I and X ∩ Y = ∅

Confidence and Support
- Association rule X ⇒ Y has confidence c if c% of the transactions in D that contain X also contain Y.
- Association rule X ⇒ Y has support s if s% of the transactions in D contain both X and Y.

Define the Problem
Given a set of transactions D, generate all association rules that have support and confidence greater than the user-specified minimum support (minsup) and minimum confidence (minconf).

Discovering all Association Rules
- Find all large itemsets: itemsets with support above minimum support.
- Use the large itemsets to generate the rules.

General idea
- Say ABCD and AB are large itemsets.
- Compute conf = support(ABCD) / support(AB).
- If conf >= minconf, the rule AB ⇒ CD holds.

Discovering Large Itemsets
- Multiple passes over the data.
- First pass: count the support of individual items.
- Each subsequent pass:
  - Generate candidates using the previous pass's large itemsets.
  - Go over the data and check the actual support of the candidates.
- Stop when no new large itemsets are found.

The Trick
Any subset of a large itemset is large. Therefore, to find the large k-itemsets:
- Create candidates by combining large (k-1)-itemsets.
- Delete those that contain any subset that is not large.

Algorithm Apriori

    L_1 = {large 1-itemsets};                      // count item occurrences
    for (k = 2; L_{k-1} ≠ ∅; k++) do begin
        C_k = apriori-gen(L_{k-1});                // generate new k-itemset candidates
        forall transactions t ∈ D do begin         // find the support of all the candidates
            C_t = subset(C_k, t);
            forall candidates c ∈ C_t do
                c.count++;
        end
        L_k = {c ∈ C_k | c.count ≥ minsup}         // take only those with support over minsup
    end
    Answer = ∪_k L_k;

Candidate generation
- Join step: p and q are two large (k-1)-itemsets that are identical in their first k-2 items. Join by adding the last item of q to p.

    insert into C_k
    select p.item_1, p.item_2, ..., p.item_{k-1}, q.item_{k-1}
    from L_{k-1} p, L_{k-1} q
    where p.item_1 = q.item_1, ..., p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1};

- Prune step: check all the (k-1)-subsets and remove any candidate that has a "small" subset.

    forall itemsets c ∈ C_k do
        forall (k-1)-subsets s of c do
            if (s ∉ L_{k-1}) then
                delete c from C_k;

Example
- L_3 = { {1 2 3}, {1 2 4}, {1 3 4}, {1 3 5}, {2 3 4} }
- After joining: { {1 2 3 4}, {1 3 4 5} }
- After pruning: { {1 2 3 4} }. The candidate {1 3 4 5} is removed because {1 4 5} and {3 4 5} are not in L_3.

Correctness
- Show that C_k ⊇ L_k.
- Any subset of a large itemset must also be large.
- The join is equivalent to extending L_{k-1} with all items and removing those whose (k-1)-subsets are not in L_{k-1}.
- The join condition p.item_{k-1} < q.item_{k-1} prevents duplicates.

Subset Function
- The candidate itemsets C_k are stored in a hash-tree.
- Finds in O(k) time whether a candidate itemset of size k is contained in transaction t.
- Total time O(max(k, size(t))).
- Used in the counting loop of Algorithm Apriori: C_t = subset(C_k, t).
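The join step, the prune step and the level-wise loop from the slides can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: itemsets are represented as sorted tuples, minsup is taken as an absolute transaction count rather than a percentage, and a plain subset test stands in for the hash-tree subset function. The function names apriori_gen and apriori are chosen here for illustration.

```python
from itertools import combinations


def apriori_gen(large_prev):
    """Join and prune: build candidate k-itemsets from the large (k-1)-itemsets.

    Itemsets are sorted tuples (an assumption of this sketch), mirroring the
    lexicographic ordering of items used on the slides.
    """
    large_prev = sorted(set(large_prev))
    known = set(large_prev)
    candidates = []
    for i, p in enumerate(large_prev):
        for q in large_prev[i + 1:]:
            # Join: p and q must agree on their first k-2 items. Because the
            # list is sorted, q comes after p, so p's last item < q's last item.
            if p[:-1] != q[:-1]:
                break  # no later q can share p's prefix
            c = p + (q[-1],)
            # Prune: drop c if any (k-1)-subset of it is not large.
            if all(s in known for s in combinations(c, len(c) - 1)):
                candidates.append(c)
    return candidates


def apriori(transactions, minsup):
    """Return {itemset: count} for every itemset with count >= minsup.

    minsup is an absolute count here (assumption); the slides state it as a
    percentage of the transactions in D.
    """
    transactions = [frozenset(t) for t in transactions]
    # First pass: count the support of individual items.
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    large = {c: n for c, n in counts.items() if n >= minsup}
    answer = dict(large)
    # Subsequent passes: generate candidates, then count them against the data.
    while large:
        candidates = apriori_gen(large)
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                # Naive containment test instead of the hash-tree lookup.
                if t.issuperset(c):
                    counts[c] += 1
        large = {c: n for c, n in counts.items() if n >= minsup}
        answer.update(large)
    return answer


# The L_3 example from the slides.
l3 = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)]
print(apriori_gen(l3))  # [(1, 2, 3, 4)]
```

The naive containment test makes each pass cost roughly the number of candidates times the number of transactions; the hash-tree described under Subset Function exists to avoid exactly that scan.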
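The rule-generation idea from the "General idea" slide (conf = support(ABCD) / support(AB)) can be illustrated with a brute-force sketch. It assumes a dictionary that maps every large itemset, stored as a sorted tuple, to its support count, and it simply tries every non-empty proper subset as an antecedent. The name generate_rules and the input layout are hypothetical, and this enumeration is not claimed to be the rule-generation procedure of the paper.

```python
from itertools import combinations


def generate_rules(support, minconf):
    """Return (antecedent, consequent, confidence) for rules meeting minconf.

    `support` maps each large itemset (a sorted tuple) to its support count.
    Since any subset of a large itemset is large, every antecedent tried
    below is expected to be a key of `support` as well.
    """
    rules = []
    for itemset, sup in support.items():
        # Try every non-empty proper subset of the itemset as the antecedent.
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                conf = sup / support[antecedent]
                if conf >= minconf:
                    consequent = tuple(i for i in itemset if i not in antecedent)
                    rules.append((antecedent, consequent, conf))
    return rules


# Made-up support counts, for illustration only.
support = {("A",): 4, ("B",): 3, ("A", "B"): 3}
print(generate_rules(support, 0.8))  # [(('B',), ('A',), 1.0)]
```

In this made-up example A ⇒ B has confidence 3/4 = 0.75 and is rejected at minconf = 0.8, while B ⇒ A has confidence 3/3 = 1.0 and is kept.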
