Bahria University
Department of Computer Sciences
Advanced Databases
Lab10: Discovering Association Rules in Data Mining
Objectives
• To understand the concept of discovering Association Rules in Data Mining
• To practice discovering Association Rules using the Support and Confidence concepts
Association Rules in Data Mining

Association rules. These rules correlate the presence of a set of items with another range of
values for another set of variables.
Examples: (1) When a female retail shopper buys a handbag, she is likely to buy shoes.
(2) An X-ray image containing characteristics a and b is likely to also exhibit characteristic c.
Example
A common example is that of market-basket data containing sets of items a consumer buys in a
supermarket during one visit.
Consider four such transactions in a random sample shown in Figure 28.1.
An association rule is of the form:
X => Y
where X = {x1, x2, ..., xn}, and Y = {y1, y2,..., ym} are sets of items
with xi and yj being distinct items for all i and all j.
This association states that if a customer buys X, he or she is also likely to buy Y.
The set X Y is called an itemset, the set of items purchased by customers.
For an association rule to be of interest to a data miner, the rule should satisfy some interest measure.
Two common interest measures are support and confidence:
• The support for a rule X => Y refers to how frequently a specific itemset occurs in the database.
That is, the support is the percentage of transactions that contain all of the items in the itemset
X ∪ Y. If the support is low, it implies that there is no overwhelming evidence that items in X ∪ Y
occur together because the itemset occurs in only a small fraction of transactions.
• The confidence is with regard to the implication shown in the rule.
The confidence of the rule X => Y is computed as support(X ∪ Y) / support(X).
We can think of it as the probability that the items in Y will be purchased given that the items in X
are purchased by a customer.
Another term for confidence is strength of the rule.
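The two measures can also be written compactly. The notation below is an assumption added for illustration: D stands for the set of all transactions in the database and t for a single transaction; neither symbol appears in the text above.

```latex
\[
\operatorname{support}(X \Rightarrow Y)
  = \frac{\bigl|\{\, t \in D : X \cup Y \subseteq t \,\}\bigr|}{|D|},
\qquad
\operatorname{confidence}(X \Rightarrow Y)
  = \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)}
\]
```

Read this way, confidence is the fraction of the transactions containing X that also contain Y.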
As an example of support and confidence, consider the following two rules:
milk => juice and
bread => juice.
Looking at our four sample transactions in Figure 28.1, we see that the support of {milk, juice} is 50
percent and the support of {bread, juice} is only 25 percent.
The confidence of milk => juice is 66.7 percent
(meaning that, of three transactions in which milk occurs, two contain juice)
and the confidence of bread => juice is 50 percent
(meaning that one of two transactions containing bread also contains juice).
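The figures above can be checked with a short Python sketch. Figure 28.1 is not reproduced in this handout, so the four transactions below are a hypothetical stand-in, chosen only so that they agree with the counts quoted in the text (milk in three transactions, bread in two, and so on). The function names `support` and `confidence` are likewise illustrative.

```python
# Hypothetical stand-in for the four sample transactions; the original
# figure is not included here, so these baskets are an assumption that
# merely matches the counts quoted in the text.
transactions = [
    {"milk", "bread", "cookies", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """support(lhs ∪ rhs) / support(lhs) for the rule lhs => rhs.

    Raises ZeroDivisionError if no transaction contains `lhs`.
    """
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)


print(support({"milk", "juice"}, transactions))                 # 0.5
print(support({"bread", "juice"}, transactions))                # 0.25
print(round(confidence({"milk"}, {"juice"}, transactions), 3))  # 0.667
print(confidence({"bread"}, {"juice"}, transactions))           # 0.5
```

Representing each transaction as a Python set makes the "contains all of the items" test a single subset comparison (`itemset <= t`).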
As we can see, support and confidence do not necessarily go hand in hand. The goal of mining
association rules, then, is to generate all possible rules that exceed some minimum user-specified
support and confidence thresholds.
The sets of items that have a support that exceeds the threshold are called large (or frequent) itemsets.
(Note that large here means large support).
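As a rough illustration of this goal, the sketch below enumerates frequent itemsets by brute force and then derives rules from them. It is not an optimized mining algorithm (it simply tests every combination of items, which is only practical for tiny data sets). The four transactions are a hypothetical sample, the function names are illustrative, and treating both thresholds as inclusive (`>=`) is an assumption.

```python
from itertools import combinations

# Hypothetical four-transaction sample, used only for illustration.
transactions = [
    {"milk", "bread", "cookies", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Brute force: keep every item combination whose support >= min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:  # inclusive threshold (an assumption)
                result[frozenset(combo)] = s
    return result


def rules(frequent, transactions, min_confidence):
    """Split each frequent itemset into lhs => rhs and keep confident rules."""
    out = []
    for itemset, s in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                conf = s / support(lhs, transactions)
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), s, conf))
    return out


freq = frequent_itemsets(transactions, min_support=0.5)
for lhs, rhs, s, conf in rules(freq, transactions, min_confidence=0.7):
    print(sorted(lhs), "=>", sorted(rhs), f"support={s:.2f}", f"confidence={conf:.2f}")
```

On this sample, a 0.5 support threshold leaves six frequent itemsets: four single items plus {milk, juice} and {bread, cookies}. Raising the confidence threshold to 0.7 then drops milk => juice (confidence 0.667) while keeping juice => milk.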
Exercises
Exercise 1
1. Use the following data to answer the questions given below:
Trans_id    Items_purchased
--------    ---------------
101         milk, bread, eggs
102         milk, juice
103         juice, butter
104         milk, bread, eggs
105         coffee, eggs
106         coffee
107         coffee, juice
108         milk, bread, cookies, eggs
109         cookies, butter
110         milk, bread
The set of items is {milk, bread, cookies, eggs, butter, coffee, juice}.
a) Find the support for the following rules:
bread => eggs
milk => eggs
milk => bread
cookies => milk
b) Find the confidence for the following rules:
bread => eggs
milk => eggs
milk => bread
cookies => milk
milk => cookies
c) Show two rules that have a large itemset containing three items from the given data.
Use a support threshold of 0.25 (25%). Also write the support for each rule:
d) Show two rules that have a confidence of 0.7 (70%) or greater for an itemset
containing three items from the given data. Also write the confidence for each rule: