Download Tutorial 4 Data Mining Association Rules What is an association rule

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Tutorial 4
Data Mining Association Rules
1. What is an association rule? What are the metrics for evaluating association rules?
An association rule states that: given a set of records, each of which contain some number of
items from a given collection, there will be a dependency rule that will predict the
occurrence of an item based on the occurrences of other items in the transaction. In other
words, if it has been found in all transactions that coke is always bought with milk, then
there will be a rule that states {milk} -> {coke} (however, not the other way around since not
all milk is bought with coke). The association rule evaluation metrics are “Support” (s) and
“Confidence” (c). Support is the fractions of the transactions that contain both X and Y.
Confidence measures how often items in Y appears in transactions that contain X. For
example given the following table, these are the support and confidence values:
TID
Items
1
Bread, Milk
2
Bread, Diaper, Beer, Eggs
3
Milk, Diaper, Beer, Coke
4
Bread, Milk, Diaper, Beer
5
Bread, Milk, Diaper, Coke
Example Assiciation Rule: {Milk, Diaper} => Beer
s =  (Milk, Diaper, Beer)/Total Transactions
= 2/5
= 0.4
c =  (Milk, Diaper, Beer)/  (Milk, Diaper)
= 2/3
= 0.67
1|Page
2. Apply the Apriori algorithm to find all itemsets with support >= 0.2 from the following data:
Transaction
Items in Transaction
1
Milk, Bread, Eggs
2
Milk, Juice
3
Juice, Butter
4
Milk, Bread, Eggs
5
Coffee, Eggs
6
Coffee
7
Coffee, Juice
8
Milk, Bread, Cookies, Eggs
9
Cookies, Butter
10
Milk, Bread
Apriori Principle step 1: Count up the occurrences of 1 item:
Itemset
Count
Milk
5
Bread
4
Eggs
4
Juice
3
Butter
2
Coffee
3
Cookies
2
*Note: since it is out of 10, 0.2 support means if it appears twice in the list.
Step 2: look for frequent occurrences of 2 items (in bold, not strikethrough):
Itemset
Count
Milk, Bread
4
Milk, Eggs
3
Milk, Juice
1
Milk, Cookies
1
Bread, Eggs
3
Bread, Cookies
1
Eggs, Coffee
1
Eggs, Cookies
1
Juice, Butter
1
Juice, Coffee
1
Butter, Cookies
1
Step 3: look for frequent occurrences of 3 items (in bold, not strikethrough):
Itemset
Count
Milk, Bread, Eggs
3
Therefore the most frequent and highest itemset data mining sub-itemset is {Milk, Bread,
Eggs}.
2|Page
3. Using the data set in question 2, find all association rules with support >=0.2, and
confidence >= 0.8.
Association Rules for {Milk, Bread, Eggs}:
{Milk, Bread} -> {Eggs}
Support = 3/10 = 0.3
{Milk Eggs} -> {Bread}
Support = 3/10 = 0.3
{Eggs, Bread} -> {Milk}
Support = 3/10 = 0.3
Confidence = 3/4 = 0.75
Confidence = 3/3 = 1
Confidence = 3/3 = 1
Association Rules for {Milk, Bread}:
{Milk} -> {Bread}
Support = 4/10 = 0.4
{Bread} -> {Milk}
Support = 4/10 = 0.4
Confidence = 4/5 = 0.8
Confidence = 4/4 = 1
Association Rules for {Milk Eggs}:
{Milk} -> {Eggs}
Support = 3/10 = 0.3
{Eggs} -> {Milk}
Support = 3/10 = 0.3
Confidence = 3/5 = 0.6
Confidence = 3/4 = 0.75
Association Rules for {Bread Eggs}:
{Bread} -> {Eggs}
Support = 3/10 = 0.3
{Eggs} -> {Bread}
Support = 3/10 = 0.3
Confidence = 3/4 = 0.75
Confidence = 3/4 = 0.75
Therefore, the only Association Rules that satisfy the restriction of having support >= 2 and
confidence >= 0.8 is:
- {Milk, Eggs} -> {Bread}
- {Eggs, Bread} -> {Milk}
- {Milk} -> {Bread}
- {Bread} -> {Milk}
3|Page
Related documents