Association Rule Mining
(Some material adapted from:
Mining Sequential Patterns by Karuna Pande Joshi)
An Example
Terminology
Transaction
Item
Itemset
Association Rules

Let U be a set of items and let X, Y ⊆ U, with X ∩ Y = ∅.

An association rule is an expression of the form X ⇒ Y, whose meaning is: if the elements of X occur in some context, then so do the elements of Y.
Quality Measures

Let T be the set of all transactions. The following statistical quantities are relevant to association rule mining:

support(X) = |{t ∈ T : X ⊆ t}| / |T|
The percentage of all transactions containing itemset X.

support(X ⇒ Y) = |{t ∈ T : X ∪ Y ⊆ t}| / |T|
The percentage of all transactions containing both itemsets X and Y.

confidence(X ⇒ Y) = |{t ∈ T : X ∪ Y ⊆ t}| / |{t ∈ T : X ⊆ t}|
The percentage of transactions containing itemset X that also contain itemset Y, i.e., how good itemset X is at predicting itemset Y.
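As a concrete illustration, the three measures can be computed directly from their definitions. This is a minimal Python sketch; the five-transaction dataset below is made up for illustration and does not come from the slides:

```python
# Made-up five-transaction dataset (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """support(X ∪ Y) / support(X): how well X predicts Y."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

print(support({"diapers"}, transactions))               # 4/5 = 0.8
print(support({"diapers", "beer"}, transactions))       # 3/5 = 0.6
print(confidence({"diapers"}, {"beer"}, transactions))  # 0.6/0.8, i.e. 0.75 up to float rounding
```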
Learning Associations

The purpose of association rule learning is to find "interesting" rules, i.e., rules that meet the following two user-defined conditions:

support(X ⇒ Y) ≥ MinSupport
confidence(X ⇒ Y) ≥ MinConfidence
Itemsets

Frequent itemset
An itemset whose support is greater than MinSupport, i.e., a high percentage of transactions contain the full itemset (denoted Lk, where k is the size of the itemset).

Candidate itemset
A potentially frequent itemset (denoted Ck, where k is the size of the itemset).
Basic Idea

1. Generate all frequent itemsets satisfying the condition on minimum support.
2. Build all possible rules from these itemsets and check them against the condition on minimum confidence.
3. All the rules above the minimum confidence threshold are returned for further evaluation.
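Before introducing Apriori's pruning, the two phases can be sketched by brute force. This is a minimal Python sketch; the four-transaction dataset and thresholds are illustrative assumptions:

```python
from itertools import combinations

# Brute-force version of the two phases, on a made-up dataset.
transactions = [{"a", "d", "e"}, {"a", "b"}, {"d", "e"}, {"a", "d", "e"}]
min_support, min_confidence = 0.5, 0.8

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted(set().union(*transactions))

# Phase 1: enumerate every non-empty itemset, keep the frequent ones.
frequent = [set(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if support(set(c)) >= min_support]

# Phase 2: split each frequent itemset into antecedent X and consequent Y,
# keep only the rules whose confidence clears MinConfidence.
rules = []
for s in frequent:
    for k in range(1, len(s)):
        for x in combinations(sorted(s), k):
            x = set(x)
            if support(s) / support(x) >= min_confidence:
                rules.append((x, s - x))
```

Enumerating every subset is exponential in the number of items; Apriori's contribution is to prune this search by only extending itemsets already known to be frequent.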
AprioriAll (I)

Let m be the number of transactions.

L1 ← ∅
For each item Ij ∈ I
    count({Ij}) ← |{Ti : Ij ∈ Ti}|          // the number of transactions containing item Ij
    If count({Ij}) ≥ MinSupport × m         // if this count is big enough, add the item and its count to L1
        L1 ← L1 ∪ {({Ij}, count({Ij}))}
k ← 2
While Lk-1 ≠ ∅
    Lk ← ∅
    For each (l1, count(l1)) ∈ Lk-1
        For each (l2, count(l2)) ∈ Lk-1
            If (l1 = {j1, …, jk-2, x} ∧ l2 = {j1, …, jk-2, y} ∧ x ≠ y)
                l ← {j1, …, jk-2, x, y}
                count(l) ← |{Ti : l ⊆ Ti}|
                If count(l) ≥ MinSupport × m
                    Lk ← Lk ∪ {(l, count(l))}
    k ← k + 1
Return L1 ∪ L2 ∪ … ∪ Lk-1
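The pseudocode above might look as follows in Python. This is a sketch assuming transactions are given as sets of items; the explicit prefix-join of l1 and l2 is expressed by the equivalent test that their union has size k:

```python
def apriori_all(transactions, min_support):
    """Sketch of AprioriAll: `min_support` is a fraction, m = len(transactions).
    Returns a dict {frequent itemset: count} over all sizes k."""
    m = len(transactions)
    threshold = min_support * m

    def count(itemset):
        # |{Ti : itemset ⊆ Ti}|
        return sum(itemset <= t for t in transactions)

    # L1: frequent 1-itemsets with their counts
    items = set().union(*transactions)
    L = {frozenset({i}): count(frozenset({i}))
         for i in items if count(frozenset({i})) >= threshold}
    result = dict(L)
    k = 2
    while L:
        next_L = {}
        # Join step: two frequent (k-1)-itemsets whose union has size k
        # share exactly k-2 items, as in the {j1,…,jk-2,x}/{j1,…,jk-2,y} test.
        for l1 in L:
            for l2 in L:
                l = l1 | l2
                if len(l) == k and l not in next_L and count(l) >= threshold:
                    next_L[l] = count(l)
        result.update(next_L)
        L = next_L
        k += 1
    return result
```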
Rule Generation

Look at set {a,d,e}. It has six candidate association rules:

{a} ⇒ {d,e}    confidence: support({a,d,e}) / support({a}) = 0.571
{d,e} ⇒ {a}    confidence: support({a,d,e}) / support({d,e}) = 1.000
{d} ⇒ {a,e}    confidence: support({a,d,e}) / support({d}) = 0.667
{a,e} ⇒ {d}    confidence: support({a,d,e}) / support({a,e}) = 0.667
{e} ⇒ {a,d}    confidence: support({a,d,e}) / support({e}) = 0.571
{a,d} ⇒ {e}    confidence: support({a,d,e}) / support({a,d}) = 0.800
Confidence-Based Pruning

Look at set {a,d,e} again, with MinConfidence = 0.800. The candidate association rules still under consideration are:

{d,e} ⇒ {a}    confidence: support({a,d,e}) / support({d,e}) = 1.000
{a,e} ⇒ {d}    confidence: support({a,d,e}) / support({a,e}) = 0.667
{a,d} ⇒ {e}    confidence: support({a,d,e}) / support({a,d}) = 0.800
{d} ⇒ {a,e}    confidence: support({a,d,e}) / support({d}) = 0.667

Selected rules: {d,e} ⇒ {a} and {a,d} ⇒ {e}
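Rule generation with confidence-based pruning can be sketched in Python. The dataset behind the slide is not shown, so the support counts below are reconstructed to be consistent with the confidences quoted above (e.g. support({a,d,e}) = 4 and support({a,d}) = 5 give 4/5 = 0.800):

```python
from itertools import combinations

# Support counts consistent with the confidences on the slide
# (reconstructed: the underlying dataset itself is not shown).
support = {
    frozenset("a"): 7, frozenset("d"): 6, frozenset("e"): 7,
    frozenset("ad"): 5, frozenset("ae"): 6, frozenset("de"): 4,
    frozenset("ade"): 4,
}

def rules_from(itemset, min_confidence):
    """Split `itemset` into every antecedent/consequent pair and
    keep the rules whose confidence reaches the threshold."""
    s = frozenset(itemset)
    kept = []
    for k in range(1, len(s)):
        for x in combinations(sorted(s), k):
            x = frozenset(x)
            conf = support[s] / support[x]
            if conf >= min_confidence:
                kept.append((x, s - x, conf))
    return kept

for x, y, conf in rules_from("ade", min_confidence=0.8):
    print(set(x), "=>", set(y), round(conf, 3))
# prints the two surviving rules: {d,e} => {a} (1.0) and {a,d} => {e} (0.8)
```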
Summary

Apriori is a rather simple algorithm that discovers useful and interesting patterns. It is widely used, and it has been extended to create collaborative filtering algorithms that provide recommendations.
References

Rakesh Agrawal, Ramakrishnan Srikant. "Fast Algorithms for Mining Association Rules." Proc. 20th Int. Conf. on Very Large Data Bases (VLDB), 1994.

Rakesh Agrawal, Tomasz Imielinski, Arun Swami. "Mining Association Rules between Sets of Items in Large Databases." Proc. of the 1993 ACM SIGMOD International Conference on Management of Data.

P.-N. Tan, M. Steinbach and V. Kumar. Introduction to Data Mining, Pearson Education Inc., 2006, Chapter 6.