A Classical Apriori Algorithm
for Mining Association Rules
What is an Association Rule?
• Given a set of transactions {t1, t2, ..., tn}
where a transaction ti is a set of items {Xi1, ..., Xim}
• An association rule is an expression:
A ==> B
where A and B are sets of items, and A ∩ B = ∅
• Meaning: transactions which contain A also contain B
Two Thresholds
• Measurement of rule strength in a relational transaction database
A ==> B [support, confidence]
support(A ==> B) = (# of transactions containing A ∪ B) / (total # of transactions)
confidence(A ==> B) = support(A ∪ B) / support(A)
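As a quick sanity check of the two definitions, here is a minimal Python sketch using a made-up four-transaction basket (the item names are illustrative, not from the course data):

```python
# Hypothetical mini-database: 4 transactions, each a set of items.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "butter", "milk"},
]

def support(itemset, db):
    # fraction of transactions that contain every item of `itemset`
    return sum(itemset <= t for t in db) / len(db)

def confidence(a, b, db):
    # confidence(A ==> B) = support(A u B) / support(A)
    return support(a | b, db) / support(a, db)

print(support({"bread", "butter", "milk"}, transactions))    # 0.5
print(confidence({"bread", "butter"}, {"milk"}, transactions))
```

Here {bread, butter, milk} appears in 2 of 4 transactions (support 0.5), and {bread, butter} in 3, so the rule bread & butter ==> milk has confidence 2/3.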
Strong Rules
• We are interested in strong associations, i.e.,
support ≥ min_sup & confidence ≥ min_conf
• Examples:
bread & butter ==> milk [support=5%, confidence=60%]
beer ==> diapers [support=10%, confidence=80%]
Mining Association Rules
• Mining association rules from a large data set of items
can improve the quality of business decisions
• For a supermarket with a large collection of items,
typical business decisions include:
• what to put on sale
• how to design coupons
• how to place merchandise on shelves to
maximize profit, etc.
Mining Association Rules (2)
• There are two main steps in mining association rules
1. Find all combinations of items that have transaction
support above minimum support (frequent itemsets)
2. Generate association rules from the frequent itemsets
• Most existing algorithms focus on the first step,
because it requires a great deal of computation, memory,
and I/O, and has a significant impact on overall
performance
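The second step gets less attention in these slides, so here is a hedged Python sketch of rule generation from frequent itemsets (the `supports` dictionary below is hypothetical, loosely based on the item supports in the later example):

```python
from itertools import combinations

def gen_rules(supports, min_conf):
    """Emit rules A ==> B with B = F \\ A for each frequent itemset F.
    `supports` maps frozenset -> support value; it must contain every
    frequent itemset and all of their non-empty subsets."""
    rules = []
    for fs, sup in supports.items():
        if len(fs) < 2:
            continue
        for r in range(1, len(fs)):
            for a in map(frozenset, combinations(fs, r)):
                conf = sup / supports[a]        # support(F) / support(A)
                if conf >= min_conf:
                    rules.append((set(a), set(fs - a), sup, conf))
    return rules

# hypothetical supports: A in 70%, B in 90%, {A,B} in 60% of transactions
supports = {frozenset("A"): 0.7, frozenset("B"): 0.9, frozenset("AB"): 0.6}
for a, b, sup, conf in gen_rules(supports, min_conf=0.8):
    print(a, "==>", b, "support", sup, "confidence", round(conf, 2))
```

With these numbers only A ==> B survives (confidence 0.6/0.7 ≈ 0.86); B ==> A has confidence 0.6/0.9 ≈ 0.67 < min_conf and is discarded.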
The Classical Mining Algorithm
Apriori (Agrawal et al. '94)
• In the first iteration, scan all the transactions and
count the number of occurrences of each item.
This derives the frequent 1-itemsets, L1
• At the k-th iteration, the candidate set Ck is formed
from the k-itemsets whose every (k-1)-item subset is in Lk-1
• Scan the database and count the number of
occurrences of each candidate k-itemset
• In total, it needs x database scans for x levels
Moving 1 level at a time (Apriori)
through an itemset lattice
[Figure: the itemset lattice, from Level 1 (1-itemsets) at the bottom
up through Level 2, Level 3, ..., Level k, Level (k+1), ..., Level x]
The Algorithm Apriori
1. L1 = {frequent 1-itemsets}
2. for (k = 2; Lk-1 ≠ ∅; k++) {
3.   Ck = Apriori_gen(Lk-1);
4.   for all transactions t in D do
5.     for all candidates c in Ck contained in t do
6.       c.count++;
7.   Lk = {c in Ck | c.count >= minimum support}
8. }
9. Result = ∪k Lk
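The numbered pseudocode can be turned into a short Python sketch (an illustrative implementation, not the course's reference code; candidate generation and pruning are inlined rather than split into a separate Apriori_gen):

```python
from itertools import combinations

def apriori(db, minsup_count):
    # L1: count single items and keep the frequent ones
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {c: n for c, n in counts.items() if n >= minsup_count}
    result = dict(L)
    k = 2
    while L:
        prev = list(L)
        # join: unions of two frequent (k-1)-itemsets that form a k-itemset
        Ck = {a | b for a in prev for b in prev if len(a | b) == k}
        # prune: every (k-1)-subset of a candidate must itself be frequent
        Ck = {c for c in Ck
              if all(frozenset(s) in L for s in combinations(c, k - 1))}
        # one database scan per level to count the surviving candidates
        counts = {c: sum(c <= t for t in db) for c in Ck}
        L = {c: n for c, n in counts.items() if n >= minsup_count}
        result.update(L)
        k += 1
    return result

# the 10-transaction dataset from the example later in these slides
db = [set("ABC"), set("BCE"), set("ABCEF"), set("ABCD"), set("ABCE"),
      set("ABCEF"), set("BCDEF"), set("ABC"), set("ACDE"), set("BEF")]
freq = apriori(db, minsup_count=2)      # minsup = 20% of 10 transactions
print(max(freq, key=len))               # the largest frequent itemset
```

Note the loop structure mirrors the pseudocode: one database pass per level, with `while L` playing the role of the `Lk-1 ≠ ∅` test.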
The Algorithm Apriori _gen
Pre: all itemsets in Lk-1
Post: candidate itemsets in Ck
insert into Ck
select p.item1, p.item2, ..., p.itemk-1, q.itemk-1
from Lk-1 p, Lk-1 q
where p.item1 = q.item1, ..., p.itemk-2 = q.itemk-2,
p.itemk-1 < q.itemk-1
The prune step
Pre: itemsets in Ck and Lk-1
Post: itemsets in Ck, where any c with some (k-1)-subset
not in Lk-1 has been deleted
forall itemsets c ∈ Ck do
forall (k-1)-subsets s of c do
if (s ∉ Lk-1) then
delete c from Ck
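The join and prune steps can be sketched together in Python (illustrative code; itemsets are sorted into tuples so that p and q share their first k-2 items and p.itemk-1 < q.itemk-1):

```python
from itertools import combinations

def apriori_gen(L_prev, k):
    """Generate Ck from Lk-1, given as a set of frozensets of size k-1."""
    tuples = [tuple(sorted(s)) for s in L_prev]
    # join step: p and q agree on the first k-2 items, last item p < q
    Ck = {frozenset(p + (q[-1],))
          for p in tuples for q in tuples
          if p[:-1] == q[:-1] and p[-1] < q[-1]}
    # prune step: delete c if some (k-1)-subset of c is not in Lk-1
    return {c for c in Ck
            if all(frozenset(s) in L_prev for s in combinations(c, k - 1))}

# the L3 -> C4 step of the later example: ACDE survives the join
# but is pruned because its subset ADE is not frequent
L3 = {frozenset(x) for x in
      ["ABC", "ABE", "ABF", "ACD", "ACE", "ACF", "AEF",
       "BCD", "BCE", "BCF", "BEF", "CDE", "CEF"]}
C4 = apriori_gen(L3, 4)
print(sorted("".join(sorted(c)) for c in C4))
# -> ['ABCE', 'ABCF', 'ABEF', 'ACEF', 'BCEF']
```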
An Example
Input Dataset

Tid   items
1     A B C
2     B C E
3     A B C E F
4     A B C D
5     A B C E
6     A B C E F
7     B C D E F
8     A B C
9     A C D E
10    B E F

minsup = 20% ==> L1 = {A, B, C, D, E, F}
An Example (2)
C2 = {AB, AC, AD, AE, AF,
BC, BD, BE, BF,
CD, CE, CF,
DE, DF, EF}
After counting
C2 = {AB(6), AC(7), AD(2), AE(4), AF(2),
BC(8), BD(2), BE(6), BF(4),
CD(3), CE(6), CF(3),
DE(2), DF(1), EF(4)}
L2 = {AB, AC, AD, AE, AF,
BC, BD, BE, BF,
CD, CE, CF,
DE, EF}
(DF is dropped: its count 1 is below the minimum support of 2)
An Example (3)
C3 = {ABC, ABD, ABE, ABF,
ACD, ACE, ACF, ADE, ADF, AEF,
BCD, BCE, BCF, BDE, BDF, BEF,
CDE, CDF, CEF}
After pruning
C3 = {ABC, ABD, ABE, ABF,
ACD, ACE, ACF, ADE, AEF,
BCD, BCE, BCF, BDE, BEF,
CDE, CEF}
After counting
C3 = {ABC(6), ABD(1), ABE(3), ABF(2),
ACD(2), ACE(4), ACF(2), ADE(1), AEF(2),
BCD(2), BCE(5), BCF(3), BDE(1), BEF(4),
CDE(2), CEF(3)}
L3 = {ABC, ABE, ABF, ACD, ACE, ACF, AEF,
BCD, BCE, BCF, BEF,
CDE, CEF}
An Example (4)
C4 = {ABCE, ABCF, ABEF, ACDE, ACDF, ACEF,
BCDE, BCDF, BCEF}
After pruning
C4 = {ABCE, ABCF, ABEF, ACEF,
BCEF}
After counting
C4 = {ABCE(3), ABCF(2), ABEF(2), ACEF(2),
BCEF(3)}
L4 = {ABCE, ABCF, ABEF, ACEF, BCEF}
An Example (5)
C5 = {ABCEF}
After counting
C5 = {ABCEF(2)}
L5 = {ABCEF}
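The final level can be double-checked without running the whole algorithm: count the single 5-itemset directly in the example transactions (a sketch; the transactions are transcribed from the input dataset slide):

```python
# the 10 transactions of the example dataset
db = [set("ABC"), set("BCE"), set("ABCEF"), set("ABCD"), set("ABCE"),
      set("ABCEF"), set("BCDEF"), set("ABC"), set("ACDE"), set("BEF")]
count = sum(set("ABCEF") <= t for t in db)
print(count)   # transactions 3 and 6 contain all of A, B, C, E, F -> 2
```

A count of 2 meets the minimum support (20% of 10 transactions), confirming L5 = {ABCEF}.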
Assignment 1
Work:
• Write a program implementing the Apriori algorithm to generate
the frequent itemsets at each level of the itemset lattice
Data sets:
• Can be downloaded from "angsila/~nuansri/310214"
• Run with the following minimum support values:
xt10.data ==> minsup = 20%, 15%, and 10%
tr2000.data ==> minsup = 10%, 8%, and 5%
Assignment 1 (2)
Due:
• Monday, September 15, 2003
• Demonstrate the program and submit its documentation at room SD417
Note:
• For the same data set, the frequent itemsets at every level of the
itemset lattice must be identical, no matter which program or which
data structures are used
• Therefore, every student can check the correctness of the number
and values of the frequent itemsets on the same data sets against
classmates