A Classical Apriori Algorithm for Mining Association Rules

What is an Association Rule?
• Given a set of transactions {t1, t2, ..., tn}, where a transaction ti is a set of items {Xi1, ..., Xim}
• An association rule is an expression A ==> B, where A and B are sets of items and A ∩ B = ∅
• Meaning: transactions which contain A also contain B

Two Thresholds
• Measurements of rule strength in a transaction database: A ==> B [support, confidence]

  support(A ∪ B) = (# of transactions containing the items in A ∪ B) / (total # of transactions)

  confidence(A ==> B) = support(A ∪ B) / support(A)

Strong Rules
• We are interested in strong associations, i.e., support ≥ min_sup and confidence ≥ min_conf
• Examples:
  bread & butter ==> milk [support = 5%, confidence = 60%]
  beer ==> diapers [support = 10%, confidence = 80%]

Mining Association Rules
• Mining association rules from a large data set of items can improve the quality of business decisions
• For a supermarket with a large collection of items, typical business decisions are:
  • what to put on sale
  • how to design coupons
  • how to place merchandise on shelves to maximize the profit, etc.

Mining Association Rules (2)
• There are two main steps in mining association rules:
  1. Find all combinations of items that have transaction support above the minimum support (the frequent itemsets)
  2. Generate association rules from the frequent itemsets
• Most existing algorithms focus on the first step, because it requires a great deal of computation, memory, and I/O, and it has a significant impact on the overall performance

The Classical Mining Algorithm Apriori (Agrawal et al. '94)
• At the first iteration, scan all the transactions and count the number of occurrences of each item. This derives the frequent 1-itemsets, L1
• At the k-th iteration, the candidate set Ck is formed from those k-itemsets whose every (k-1)-item subset is in Lk-1
• Scan the database and count the number of occurrences of each candidate k-itemset; the candidates that meet the minimum support form Lk
• In total, it needs x database scans for x levels

[Figure: moving one level at a time (Apriori) through the itemset lattice: Level 1, Level 2, Level 3, ..., Level k, Level (k+1), ..., Level x]

The Algorithm Apriori

  L1 = {frequent 1-itemsets}
  for (k = 2; Lk-1 ≠ ∅; k++) {
      Ck = Apriori_gen(Lk-1);
      for all transactions t in D do
          for all candidates c in Ck contained in t do
              c.count++;
      Lk = {c in Ck | c.count >= minimum support}
  }
  Result = ∪k Lk
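To make the pseudocode concrete, the following is a minimal sketch in Python (not the course's reference implementation). It assumes transactions are given as Python sets of hashable items and represents itemsets as sorted tuples; the helper apriori_gen is a hypothetical name that follows the join-and-prune scheme of Apriori_gen described next.

    from itertools import combinations

    def apriori_gen(prev_frequent, k):
        # Join step: p and q in Lk-1 share their first k-2 items and
        # differ in the last one; prune step: drop any candidate that
        # has an infrequent (k-1)-subset.
        candidates = set()
        for p in prev_frequent:
            for q in prev_frequent:
                if p[:-1] == q[:-1] and p[-1] < q[-1]:
                    c = p + (q[-1],)
                    if all(s in prev_frequent for s in combinations(c, k - 1)):
                        candidates.add(c)
        return candidates

    def apriori(transactions, minsup_count):
        # First scan: count individual items to derive L1.
        counts = {}
        for t in transactions:
            for item in t:
                counts[(item,)] = counts.get((item,), 0) + 1
        frequent = {c: n for c, n in counts.items() if n >= minsup_count}
        result = dict(frequent)
        k = 2
        while frequent:                            # one lattice level per pass
            candidates = apriori_gen(set(frequent), k)
            counts = {c: 0 for c in candidates}
            for t in transactions:                 # the k-th database scan
                for c in candidates:
                    if set(c) <= t:                # candidate contained in t
                        counts[c] += 1
            frequent = {c: n for c, n in counts.items() if n >= minsup_count}
            result.update(frequent)
            k += 1
        return result                              # Result = union of all Lk

On the ten-transaction example below, apriori(transactions, 2) (a minimum support of 20% of 10 transactions) should reproduce L1 through L5.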
The Algorithm Apriori_gen
• Pre: all frequent (k-1)-itemsets are in Lk-1
• Post: the candidate k-itemsets are in Ck
The join step:

  insert into Ck
  select p.item1, p.item2, ..., p.itemk-1, q.itemk-1
  from Lk-1 p, Lk-1 q
  where p.item1 = q.item1, ..., p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1

The prune step
• Pre: itemsets in Ck and Lk-1
• Post: itemsets c in Ck such that some (k-1)-subset of c is not in Lk-1 are deleted

  forall itemsets c ∈ Ck do
      forall (k-1)-subsets s of c do
          if (s ∉ Lk-1) then delete c from Ck

An Example
[Table: Input Dataset of 10 transactions (Tid 1-10) over the items {A, B, C, D, E, F}]
minsup = 20% (i.e., a minimum support count of 2 out of the 10 transactions)
L1 = {A, B, C, D, E, F}

An Example (2)
C2 = {AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF, EF}
After counting:
C2 = {AB(6), AC(7), AD(2), AE(4), AF(2), BC(8), BD(2), BE(6), BF(4), CD(3), CE(6), CF(3), DE(2), DF(1), EF(4)}
L2 = {AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, EF}

An Example (3)
C3 = {ABC, ABD, ABE, ABF, ACD, ACE, ACF, ADE, ADF, AEF, BCD, BCE, BCF, BDE, BDF, BEF, CDE, CDF, CEF}
After pruning (ADF, BDF, and CDF are deleted because their subset DF is not in L2):
C3 = {ABC, ABD, ABE, ABF, ACD, ACE, ACF, ADE, AEF, BCD, BCE, BCF, BDE, BEF, CDE, CEF}
After counting:
C3 = {ABC(6), ABD(1), ABE(3), ABF(2), ACD(2), ACE(4), ACF(2), ADE(1), AEF(2), BCD(2), BCE(2), BCF(3), BDE(1), BEF(4), CDE(2), CEF(3)}
L3 = {ABC, ABE, ABF, ACD, ACE, ACF, AEF, BCD, BCE, BCF, BEF, CDE, CEF}

An Example (4)
C4 = {ABCE, ABCF, ABEF, ACDE, ACDF, ACEF, BCDE, BCDF, BCEF}
After pruning (ACDE, ACDF, BCDE, and BCDF each contain a 3-subset that is not in L3):
C4 = {ABCE, ABCF, ABEF, ACEF, BCEF}
After counting:
C4 = {ABCE(3), ABCF(2), ABEF(2), ACEF(2), BCEF(3)}
L4 = {ABCE, ABCF, ABEF, ACEF, BCEF}

An Example (5)
C5 = {ABCEF}
After counting:
C5 = {ABCEF(2)}
L5 = {ABCEF}

Assignment 1
Work:
• Write a program that follows the algorithm Apriori to generate the frequent itemsets at each level of the itemset lattice
Data sets:
• Can be downloaded from the machine "angsila/~nuansri/310214"
• Run with the following minimum support values:
  xt10.data ==> minsup = 20%, 15%, and 10%
  tr2000.data ==> minsup = 10%, 8%, and 5%

Assignment 1 (2)
Due:
• Monday, 15 September 2003
• Demonstrate the program and its accompanying documentation in room SD417
Note:
• For the same data set, the frequent itemsets at every level of the itemset lattice must be identical, no matter whether they were produced by different programs or by programs using different data structures
• Therefore, every student can check the correctness of the number and the values of the frequent itemsets obtained on the same data sets against those of classmates
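As a starting point for the assignment, here is a hypothetical driver around the apriori function sketched earlier (it assumes that function is in scope). The input format is an assumption: one transaction per line with whitespace-separated items; verify it against the actual layout of xt10.data and tr2000.data before relying on it.

    import sys

    def load_transactions(path):
        # Assumed file format: one transaction per line, items separated
        # by whitespace (check against the real xt10.data / tr2000.data).
        with open(path) as f:
            return [set(line.split()) for line in f if line.strip()]

    if __name__ == "__main__":
        # Usage (hypothetical): python apriori.py xt10.data 20
        path, percent = sys.argv[1], float(sys.argv[2])
        transactions = load_transactions(path)
        minsup_count = percent / 100.0 * len(transactions)
        frequent = apriori(transactions, minsup_count)  # from the sketch above
        for itemset in sorted(frequent, key=lambda c: (len(c), c)):
            print(len(itemset), " ".join(itemset), frequent[itemset])

Printing each itemset with its level (length) and support count makes it easy to compare the number and values of the frequent itemsets at every level against classmates' results, as the note above requires.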