Analysis of Customer Behavior and Service Modeling
Lecture 4: Association
Market Basket Analysis
What Is Association Mining?
• Association rule mining:
– Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
• Applications:
– Market basket analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
• Examples:
– Rule form: “Body → Head [support, confidence]”
– buys(x, “diapers”) → buys(x, “beers”) [0.5%, 60%]
– major(x, “CS”) ^ takes(x, “DB”) → grade(x, “A”) [1%, 75%]
Support and Confidence
• Support
– Percentage of samples that contain both A and B
– support(A → B) = P(A ∩ B)
• Confidence
– Percentage of samples containing A that also contain B
– confidence(A → B) = P(B|A)
• Example
– computer → financial_management_software [support = 2%, confidence = 60%]
Association Rules: Basic Concepts
• Given: (1) a database of transactions, (2) each transaction is a list of items (purchased by a customer in a visit)
• Find: all rules that correlate the presence of one set of items with that of another set of items
– e.g., 98% of people who purchase tires and auto accessories also get automotive services done
• Applications
– Home Electronics – What other products should the store stock up on?
– Retailing – Shelf design, promotion structuring, direct marketing
Rule Measures: Support and Confidence
[Figure: Venn diagram of customers who buy diapers, customers who buy beer, and customers who buy both]
• Find all the rules A → C with minimum confidence and support
– Support (s): probability that a transaction contains {A & C}
– Confidence (c): conditional probability that a transaction having {A} also contains {C}

Transaction ID   Items Bought
2000             A,B,C
1000             A,C
4000             A,D
5000             B,E,F

• With minimum support 50% and minimum confidence 50%, we have
– A → C (50%, 66.6%)
– C → A (50%, 100%)
Mining Association Rules: An Example

Transaction ID   Items Bought
2000             A,B,C
1000             A,C
4000             A,D
5000             B,E,F

Target:
Min. support 50%
Min. confidence 50%

Frequent Itemset   Support
{A}                75%
{B}                50%
{C}                50%
{A,C}              50%

For rule A → C:
support = support({A, C}) = 50%
confidence = support({A, C})/support({A}) = 66.6%
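The calculation above can be reproduced with a few lines of Python. This is a minimal sketch, not the method used by any particular tool. The function names `support`, `confidence`, and `frequent_itemsets` are hypothetical, and the brute-force enumeration is only suitable for a toy data set like this one.

```python
from itertools import combinations

# Transactions from the example (transaction ID -> items bought).
transactions = {
    2000: {"A", "B", "C"},
    1000: {"A", "C"},
    4000: {"A", "D"},
    5000: {"B", "E", "F"},
}


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """support(lhs and rhs together) divided by support(lhs)."""
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force search over all itemsets up to `max_size` items."""
    all_items = sorted(set().union(*transactions.values()))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(all_items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result


freq = frequent_itemsets(transactions, min_support=0.5)
for itemset, s in freq.items():
    print(itemset, f"{s:.0%}")

print("A -> C confidence:", round(confidence({"A"}, {"C"}, transactions), 3))
print("C -> A confidence:", round(confidence({"C"}, {"A"}, transactions), 3))
```

Only {A}, {B}, {C}, and {A,C} reach the 50% threshold, which matches the frequent itemset table.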
An Example of Market Basket (1)
• There are 8 transactions on three items: A (Apple), B (Banana), C (Carrot).
• Check associations for the two cases below.
(1) A → B
(2) (A, B) → C

#   Basket
1   A
2   B
3   C
4   A, B
5   A, C
6   B, C
7   A, B, C
8   A, B, C
An Example of Market Basket (2)
• Basic probabilities are below:

Measure      (1) A → B                      (2) (A, B) → C
LHS          P(A) = 5/8 = 0.625             P(A,B) = 3/8 = 0.375
RHS          P(B) = 5/8 = 0.625             P(C) = 5/8 = 0.625
Coverage     LHS = 0.625                    LHS = 0.375
Support      P(A∩B) = 3/8 = 0.375           P((A,B)∩C) = 2/8 = 0.25
Confidence   P(B|A) = 0.375/0.625 = 0.6     P(C|(A,B)) = 0.25/0.375 = 0.667
Lift         0.375/(0.625*0.625) = 0.96     0.25/(0.375*0.625) = 1.07
Leverage     0.375 - 0.391 = -0.016         0.25 - 0.234 = 0.016
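The five measures for both rules can be checked directly from the eight baskets. The sketch below is illustrative only. The helper names `prob` and `rule_measures` are assumptions, not part of any package mentioned in this lecture. Computed without intermediate rounding, the two leverage values are exactly -0.015625 and +0.015625.

```python
# The eight baskets from the example: A = Apple, B = Banana, C = Carrot.
baskets = [
    {"A"}, {"B"}, {"C"},
    {"A", "B"}, {"A", "C"}, {"B", "C"},
    {"A", "B", "C"}, {"A", "B", "C"},
]


def prob(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def rule_measures(lhs, rhs, baskets):
    """Coverage, support, confidence, lift, and leverage for lhs -> rhs."""
    p_lhs = prob(lhs, baskets)
    p_rhs = prob(rhs, baskets)
    p_both = prob(set(lhs) | set(rhs), baskets)
    return {
        "coverage": p_lhs,
        "support": p_both,
        "confidence": p_both / p_lhs,
        "lift": p_both / (p_lhs * p_rhs),
        "leverage": p_both - p_lhs * p_rhs,
    }


rule1 = rule_measures({"A"}, {"B"}, baskets)       # (1) A -> B
rule2 = rule_measures({"A", "B"}, {"C"}, baskets)  # (2) (A, B) -> C

for name, rule in (("A -> B", rule1), ("(A, B) -> C", rule2)):
    print(name, {k: round(v, 3) for k, v in rule.items()})
```

Rule (1) has lift slightly below 1 and negative leverage, while rule (2) has lift slightly above 1 and positive leverage, so the two measures agree on the direction of the association.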
Lift
• What are good association rules? (How to interpret them?)
– Lift = P(A∩B) / (P(A)*P(B))
– If lift is close to 1, there is no association between the two items (sets).
– If lift is greater than 1, there is a positive association between the two items (sets).
– If lift is less than 1, there is a negative association between the two items (sets).
Leverage
– Leverage = P(A∩B) - P(A)*P(B); there are three cases:
① Leverage > 0: the two items (sets) are positively associated
② Leverage = 0: the two items (sets) are independent
③ Leverage < 0: the two items (sets) are negatively associated
Lab on Association Rules (1)
• SPSS Clementine and SAS Enterprise Miner include association rule software.
• This exercise uses Magnum Opus.
• Go to http://www.rulequest.com and download the Magnum Opus evaluation version.
• After you install the program, you will see the initial screen. From the menu, choose File – Import Data (Ctrl – O).
• Demo data sets are already there. Magnum Opus has two types of data sets available: transaction data (*.idi, *.itl) and attribute-value data (*.data, *.nam).
• Transaction data comes in two formats (*.idi, *.itl):

idi (identifier-item file)   itl (item list file)
001, apples                  apples, oranges, bananas
001, oranges                 apples, carrots, lettuce, tomatoes
001, bananas
002, apples
002, carrots
002, lettuce
002, tomatoes
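The two transaction formats carry the same information: an idi file has one (identifier, item) pair per line, and an itl file has one basket per line. The conversion can be sketched as below. This assumes only the comma-separated layout visible in the sample above. The exact file syntax accepted by Magnum Opus (quoting, whitespace, ordering) is not specified here, and `idi_to_itl` is a hypothetical helper name.

```python
def idi_to_itl(idi_lines):
    """Group identifier-item lines into one comma-separated line per basket.

    Assumes each non-empty line looks like "<identifier>, <item>".
    Baskets are emitted in order of first appearance of their identifier.
    """
    baskets = {}  # identifier -> list of items (dicts keep insertion order)
    for line in idi_lines:
        line = line.strip()
        if not line:
            continue
        identifier, item = (part.strip() for part in line.split(",", 1))
        baskets.setdefault(identifier, []).append(item)
    return [", ".join(items) for items in baskets.values()]


sample_idi = [
    "001, apples",
    "001, oranges",
    "001, bananas",
    "002, apples",
    "002, carrots",
    "002, lettuce",
    "002, tomatoes",
]

itl_lines = idi_to_itl(sample_idi)
for line in itl_lines:
    print(line)
```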
• If you open tutorial.idi in Notepad, you can see the contents of the file.
• The example shown has 5 transactions (baskets).
• Choose File – Import Data, or click the toolbar icon, then click Tutorial.idi.
• Check Identifier – item file and click Next >.
• Click Yes and click Next > …
• Click Next > …
• Click Next > …
• What percentage of the whole file do you want to use? Type 50% and click Next > …
• Click Import Data.
• You will then see a screen like the one below.
• Leave the settings as they are:
– Search by: LIFT
– Minimum lift: 1
– Maximum no. of rules: 10
• Click GO.
• Results are saved in the tutorial.out file.
• Below are the rules derived:
lettuce & carrots
  are associated with tomatoes
  with strength = 0.857
  coverage = 0.042: 21 cases satisfy the LHS
  support = 0.036: 18 cases satisfy both the LHS and the RHS
  lift 3.51: the strength is 3.51 times greater than the strength if there were no association
  leverage = 0.0258: the support is 0.0258 (12.9 cases) greater than if there were no association
• lettuce & carrots → tomatoes
– When lettuce and carrots are purchased, tomatoes are purchased as well.
• coverage = 0.042: 21 cases satisfy the LHS
– P(LHS) = P(lettuce & carrots) = 21/500 = 0.042
• support = 0.036: 18 cases satisfy both the LHS and the RHS
– P((lettuce & carrots) ∩ tomatoes) = 18/500 = 0.036
• strength (confidence) = 0.857
– P(RHS|LHS) = 18/21 = 0.036/0.042 = 0.857
• lift 3.51: the strength is 3.51 times greater than the strength if there were no association
– That is, (18/21)/(122/500) = 3.51
• leverage = 0.0258: the support is 0.0258 (12.9 cases) greater than if there were no association
– P(LHS ∩ RHS) – P(LHS)*P(RHS) = 0.036 – 0.042*0.244 = 0.0258
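The reported figures can be cross-checked from the raw counts quoted above: 500 cases in total, 21 satisfying the LHS, 18 satisfying both sides, and 122 containing tomatoes (the count that appears in the lift calculation). The sketch below is not Magnum Opus code, and `measures_from_counts` is a hypothetical helper.

```python
def measures_from_counts(n_total, n_lhs, n_rhs, n_both):
    """Rule measures computed from case counts rather than probabilities."""
    p_lhs = n_lhs / n_total
    p_rhs = n_rhs / n_total
    p_both = n_both / n_total
    leverage = p_both - p_lhs * p_rhs
    return {
        "coverage": p_lhs,
        "support": p_both,
        "strength": p_both / p_lhs,           # also called confidence
        "lift": (p_both / p_lhs) / p_rhs,
        "leverage": leverage,
        "leverage_cases": leverage * n_total,  # leverage expressed in cases
    }


# Counts for the rule: lettuce & carrots -> tomatoes
m = measures_from_counts(n_total=500, n_lhs=21, n_rhs=122, n_both=18)
for name, value in m.items():
    print(f"{name:15s} {value:.4f}")
```

Rounded to the precision used in the output file, these values agree with the strength, lift, and leverage reported for the rule.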