Download Chapter 8 Association Rules

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Chapter 8
Association Rules
Content
• Association rule mining
• Mining single-dimensional Boolean association rules
from transactional databases
• Mining multilevel association rules from transactional
databases
• Mining multidimensional association rules from
transactional databases and data warehouse
• From association mining to correlation analysis
• Constraint-based association mining
• Summary
Data Warehouse and Data Mining
2
Chapter 10
What Is Association Mining?
Association rule mining:
Finding frequent patterns, associations,
correlations, or causal structures among sets
of items or objects in transaction databases,
relational databases, and other information
repositories.
Applications:
Basket data analysis, clustering, classification
Data Warehouse and Data Mining
3
Chapter 10
Association Rule: Basic Concepts
• Given:
• (1) database of transactions,
• (2) each transaction is a list of items
(purchased by a customer in a visit)
• Find: all rules that correlate the presence of one
set of items with that of another set of items
– E.g., 98% of people who purchase tires and auto
accessories also get automotive services done
Data Warehouse and Data Mining
4
Chapter 10
Association Rule: Basic Concepts
• Applications
– *  Maintenance Agreement (What the store
should do to boost Maintenance Agreement
sales)
– Home Electronics  * (What other products
should the store stocks up?)
– Attached mailing in direct marketing
Data Warehouse and Data Mining
5
Chapter 10
Rule Measures: Support and Confidence
Customer
buys both
Customer
buys diaper
Customer
buys beer
Data Warehouse and Data Mining
• Find all the rules X & Y  Z
with minimum confidence and
support
– support, s, probability that a
transaction contains {X & Y
=> Z}
– confidence, c, conditional
probability that a
transaction having {X & Y}
also contains Z
6
Chapter 10
Rule Measures: Support and Confidence
Customer
buys both
Transaction ID Items Bought
2000
A,B,C
1000
A,C
4000
A,D
5000
B,E,F
Customer
buys diaper
Customer
buys beer
Let minimum support 50%, and
minimum confidence 50%,
we have
– A  C (50%, 66.6%)
– C  A (50%, 100%)
Data Warehouse and Data Mining
7
Chapter 10
Mining Association Rules—An Example
Transaction ID
2000
1000
4000
5000
Min. support 50%
Min. confidence 50%
Items Bought
A,B,C
A,C
A,D
B,E,F
Frequent Itemset Support
{A}
75%
{B}
50%
{C}
50%
{A,C}
50%
For rule A  C :
support = support({A &C}) = 2/4 = 50%
confidence = support({A &C})/support({A})
=2/3= 66.6%
Data Warehouse and Data Mining
8
Chapter 10
Mining Frequent Itemsets: the Key Step
The Apriori principle:
Any subset of a frequent itemset must
be frequent
Data Warehouse and Data Mining
9
Chapter 10
The Apriori Algorithm
• Use the frequent itemsets to generate
.............association rules.
• Find the frequent itemsets: the sets of items that
have minimum support
– A subset of a frequent itemset must also be a frequent
itemset
• i.e., if {AB} is a frequent itemset, both {A} and {B}
should be a frequent itemset
– Iteratively find frequent itemsets with cardinality from
1 to k (k-itemset)
Data Warehouse and Data Mining
10
Chapter 10
The Apriori Algorithm
• Join Step: Ck is generated by joining Lk-1with
itself
• Prune Step: Any (k-1)-itemset that is not
frequent cannot be a subset of a frequent
k-itemset
Data Warehouse and Data Mining
11
Chapter 10
The Apriori Algorithm
• Pseudo-code:
Ck: Candidate itemset of size k
Lk : frequent itemset of size k
L1 = {frequent items};
for (k = 1; Lk !=; k++) do begin
Ck+1 = candidates generated from Lk;
for each transaction t in database do
increment the count of all candidates in Ck+1
that are contained in t
Lk+1 = candidates in Ck+1 with min_support
end
return k Lk;
Data Warehouse and Data Mining
12
Chapter 10
The Apriori Algorithm — Example
Database D
TID
100
200
300
400
Items
134
235
1235
25
itemset sup.
C1
{1}
2
{2}
3
Scan D
{3}
3
{4}
1
{5}
3
C2 itemset sup
L2 itemset sup
{1 3}
{2 3}
{2 5}
{3 5}
2
2
3
2
{1
{1
{1
{2
{2
{3
C3 itemset
Scan D
Data Warehouse and Data Mining
{2 3 5}
2}
3}
5}
3}
5}
5}
1
2
1
2
3
2
L1 itemset sup.
{1}
{2}
{3}
{5}
2
3
3
3
C2 itemset
{1 2}
Scan D
L3 itemset sup
13
{2 3 5} 2
{1
{1
{2
{2
{3
3}
5}
3}
5}
5}
Chapter 10
Generating Association Rules
Confidence and Support
-Milk
-Cheese
-Bread
-Eggs
Possible associations include the following:
1. If customers purchase milk they also purchase bread.
2. If customers purchase bread they also purchase milk.
3. If customers purchase milk and eggs they also purchase
cheese and bread.
4. If customers purchase milk, cheese, and eggs they also
purchase bread.
Data Warehouse and Data Mining
14
Chapter 10
Generating Association Rules
Mining Association Rules: An Example
Data Warehouse and Data Mining
15
Chapter 10
Generating Association Rules
Mining Association Rules: An Example
Data Warehouse and Data Mining
16
Chapter 10
Generating Association Rules
Mining Association Rules: An Example
Data Warehouse and Data Mining
17
Chapter 10
Generating Association Rules
Mining Association Rules: An Example
Two possible two-item set rule are:
Data Warehouse and Data Mining
18
Chapter 10
Generating Association Rules
Mining Association Rules: An Example
Here are three of several possible three-item
set rules:
Data Warehouse and Data Mining
19
Chapter 10
Reference
Data Mining: Concepts and Techniques (Chapter 6 Slide for
textbook), Jiawei Han and Micheline Kamber, Intelligent Database
Systems Research Lab, School of Computing Science, Simon Fraser
University, Canada
Data Warehouse and Data Mining
20
Chapter 10
Related documents