Machine Learning
mining association rules
Luigi Cerulo
Department of Science and Technology
University of Sannio
Understanding association rules
• A typical association rule means something like "if peanut butter and jelly are purchased, then bread is also likely to be purchased"
• In boolean logic terms: onions AND potatoes => burger
• The left-hand side of a rule (e.g. onions AND potatoes) is called the antecedent; the right-hand side (e.g. burger) is called the consequent
Association analysis is useful in a variety of business-related applications such as marketing promotions, inventory management, and customer relationship management. It is a methodology for discovering interesting relationships hidden in large data sets; the uncovered relationships can be represented in the form of association rules.
Market basket analysis
• Transactions

Table 6.1. An example of market basket transactions.

TID | Items
 1  | {Bread, Milk}
 2  | {Bread, Diapers, Beer, Eggs}
 3  | {Milk, Diapers, Beer, Cola}
 4  | {Bread, Milk, Diapers, Beer}
 5  | {Bread, Milk, Diapers, Cola}
Dataset as examples and features

Table 6.2. A binary 0/1 representation of market basket data.

TID | Bread | Milk | Diapers | Beer | Eggs | Cola
 1  |   1   |  1   |    0    |  0   |  0   |  0
 2  |   1   |  0   |    1    |  1   |  1   |  0
 3  |   0   |  1   |    1    |  1   |  0   |  1
 4  |   1   |  1   |    1    |  1   |  0   |  0
 5  |   1   |  1   |    1    |  0   |  0   |  1
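As a minimal sketch (assuming the arules R package used later in these slides; the object and variable names are illustrative), the five transactions above can be encoded and viewed in both representations:

library(arules)
baskets <- list(
  c("Bread", "Milk"),
  c("Bread", "Diapers", "Beer", "Eggs"),
  c("Milk", "Diapers", "Beer", "Cola"),
  c("Bread", "Milk", "Diapers", "Beer"),
  c("Bread", "Milk", "Diapers", "Cola")
)
trans <- as(baskets, "transactions")   # sparse item matrix, one row per transaction
inspect(trans)                          # the itemset view of Table 6.1
as(trans, "matrix")                     # the logical (TRUE/FALSE) version of the 0/1 view in Table 6.2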
Association rules discovery
• Association rules are not used for prediction, but rather for unsupervised knowledge discovery in large databases.
• There is no need for the algorithm to be trained and data does not need to be labeled ahead of time.
• The algorithm is simply unleashed on a dataset in the hope that interesting associations are found.
How to measure if a rule is interesting
• Whether or not an association rule is deemed interesting is determined by two statistical measures: support and confidence.
Itemset and support count
• Let I = {i1, i2, ..., id} be the set of all items (features) in a market basket dataset
• Let T = {t1, t2, ..., tN} be the set of all transactions (examples)
• Each transaction ti contains a subset of items chosen from I; a collection of zero or more items is termed an itemset, and an itemset with k items is called a k-itemset (for instance, {Beer, Diapers, Milk} is a 3-itemset)
• The support count of an itemset X is the number of transactions that contain X:

σ(X) = |{ti | X ⊆ ti, ti ∈ T}|

• Example: what is the support count of {Beer, Diapers, Milk}?
In Table 6.2 it is equal to two, because only two transactions (TID 3 and 4) contain all three items.
Association rule
• An association rule is an implication expression of the form X −→ Y, where X and Y are two disjoint itemsets, i.e., X ∩ Y = ∅
• The strength of an association rule can be determined by its support and confidence
• Support determines how often a rule is applicable to a given dataset
• Confidence determines how frequently items in Y appear in transactions that contain X
Support and Confidence
Support determines how often a rule is applicable to a given data set, while confidence determines how frequently items in Y appear in transactions that contain X. The formal definitions of these metrics are:

Support,    s(X −→ Y) = σ(X ∪ Y) / N        (6.1)
Confidence, c(X −→ Y) = σ(X ∪ Y) / σ(X)     (6.2)
Example 6.1. Consider the rule {Milk, Diapers} −→ {Beer}. Since the
support count for {Milk, Diapers, Beer} is 2 and the total number of transactions is 5, the rule’s support is 2/5 = 0.4. The rule’s confidence is obtained
by dividing the support count for {Milk, Diapers, Beer} by the support count
for {Milk, Diapers}. Since there are 3 transactions that contain milk and diapers, the confidence for this rule is 2/3 = 0.67.
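A quick numeric check of Example 6.1 in base R, reusing the illustrative baskets list from the sketch after Table 6.2:

sigma <- function(items) sum(sapply(baskets, function(t) all(items %in% t)))  # support count
N <- length(baskets)                                                # 5 transactions
sigma(c("Milk", "Diapers", "Beer")) / N                             # support    = 2/5 = 0.4
sigma(c("Milk", "Diapers", "Beer")) / sigma(c("Milk", "Diapers"))   # confidence = 2/3 ≈ 0.67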
Why Use Support and Confidence? Support is an important measure
because a rule that has very low support may occur simply by chance. A
low support rule is also likely to be uninteresting from a business perspective
because it may not be profitable to promote items that customers seldom buy
together (with the exception of the situation described in Section 6.8). For
these reasons, support is often used to eliminate uninteresting rules.
Example
• For the rule {Milk, Diapers} −→ {Beer} in Table 6.2:
Support count σ(X ∪ Y) = 2
Support count σ(X) = 3
N = 5
Support(X, Y) = 2/5 = 0.4
Confidence(X, Y) = 2/3 ≈ 0.67
Why Support and Confidence?
• Support is an important measure because a rule that has very low support may occur simply by chance
• Confidence measures the reliability of the inference made by a rule.
• Support is an estimate of the probability P(X ∩ Y), i.e. the probability that a transaction contains both X and Y
• Confidence is an estimate of the conditional probability P(Y | X)
The Association Rule Mining Problem
• Given a set of transactions T, find all the rules having support ≥ minsup and confidence ≥ minconf.
• minsup and minconf are the corresponding support and confidence thresholds.
• A brute-force approach for mining association rules is to compute the support and confidence for every possible rule. This approach is prohibitively expensive because there are exponentially many rules that can be extracted from a data set.
• More specifically, the total number of possible rules that can be extracted from a data set containing d items (d is the number of items, i.e. features) is

R = 3^d − 2^(d+1) + 1.

Even for the small data set shown in Table 6.1, this approach would require computing the support and confidence for 3^6 − 2^7 + 1 = 602 rules.
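A one-line check of this count for the d = 6 items of Table 6.1:

d <- 6
3^d - 2^(d + 1) + 1   # 602 candidate rules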
Apriori algorithm
• R. Agrawal and R. Srikant, Fast algorithms for mining association rules, in Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487–499 (1994)
• The Apriori principle states that all subsets of a frequent itemset must also be frequent. In other words, if {A, B} is frequent, then {A} and {B} both must be frequent.
• Recall also that by definition, the support metric indicates how frequently an itemset appears in the data. Therefore, if we know that {A} does not meet a desired support threshold, there is no reason to consider {A, B} or any itemset containing {A}; it cannot possibly be frequent.
Apriori algorithm
• The Apriori algorithm uses the Apriori principle to exclude potential association rules prior to actually evaluating them.
• It occurs in two phases:
  • Identifying all itemsets that meet a minimum support threshold.
  • Creating rules from these itemsets that meet a minimum confidence threshold.
Apriori algorithm (phase 1)
• The first phase occurs in multiple iterations. Each iteration involves evaluating the support of a set of increasingly large itemsets. E.g. iteration 1 involves evaluating 1-itemsets, iteration 2 evaluates 2-itemsets, …
• The result of each iteration i is a set of all i-itemsets that meet the minimum support threshold.
• All the itemsets from iteration i are combined in order to generate candidate itemsets for evaluation in iteration i + 1.
• If {A}, {B}, and {C} are frequent in iteration 1 while {D} is not frequent, then iteration 2 will consider only {A, B}, {A, C}, and {B, C} rather than the six 2-itemsets that would have been evaluated if sets containing D had not been eliminated a priori.
Apriori algorithm (phase 1)
• The first phase occurs in multiple iterations. Each iteration involves evaluating the support of a set of increasingly large itemsets. E.g. iteration 1 involves evaluating 1-itemsets, iteration 2 evaluates 2-itemsets, …
• The result of each iteration i is a set of all i-itemsets that meet the minimum support threshold.
• All the itemsets from iteration i are combined in order to generate candidate itemsets for evaluation in iteration i + 1.
• If {A}, {B}, and {C} are frequent in iteration 1 while {D} is not frequent, then iteration 2 will consider only {A, B}, {A, C}, and {B, C} rather than the six 2-itemsets that would have been evaluated if sets containing D had not been eliminated a priori.
• During iteration 2 it is discovered that {A, B} and {B, C} are frequent, but {A, C} is not. Although iteration 3 would normally begin by evaluating the support for {A, B, C}, this step need not occur at all. Why not?
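The following is an illustrative R sketch of phase 1 on the toy baskets list introduced earlier, not the actual arules implementation; the candidate generation is deliberately simplified (it only reuses items that survive in some frequent k-itemset):

minsup <- 0.4
supp <- function(itemset) mean(sapply(baskets, function(t) all(itemset %in% t)))

frequent   <- list()
candidates <- as.list(sort(unique(unlist(baskets))))   # candidate 1-itemsets
k <- 1
while (length(candidates) > 0) {
  keep <- Filter(function(s) supp(s) >= minsup, candidates)   # frequent k-itemsets
  frequent[[k]] <- keep
  # simplified Apriori pruning: (k+1)-item candidates are built only from items
  # that still appear in some frequent k-itemset
  surviving  <- unique(unlist(keep))
  candidates <- if (length(surviving) > k)
    combn(surviving, k + 1, simplify = FALSE) else list()
  k <- k + 1
}
frequent   # frequent itemsets of size 1, 2, 3, ... that meet minsup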
Apriori algorithm (phase 2)
• When no bigger itemsets exist, the second phase of the Apriori algorithm may begin. Given the set of frequent itemsets, association rules are generated from all possible subsets. E.g. {A, B} would result in candidate rules for {A} -> {B} and {B} -> {A}.
• These are evaluated against a minimum confidence threshold, and any rules that do not meet the desired confidence level are eliminated.
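Continuing the same sketch, phase 2 applied to a single frequent itemset, keeping only the rules that meet a minimum confidence (again illustrative; supp() and baskets come from the earlier snippets):

minconf <- 0.6
itemset <- c("Diapers", "Beer")          # a frequent 2-itemset from phase 1
for (lhs in itemset) {
  rhs  <- setdiff(itemset, lhs)
  conf <- supp(itemset) / supp(lhs)      # c({lhs} -> {rhs}) = s(lhs, rhs) / s(lhs)
  if (conf >= minconf)
    cat(sprintf("{%s} -> {%s}  confidence = %.2f\n", lhs, rhs, conf))
}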
Association rules discovery
Before we get into that, it's worth noting that this algorithm, like all learning algorithms, is not without its strengths and weaknesses. Some of these are listed as follows:

Strengths:
• Is ideally suited for working with very large amounts of transactional data
• Results in rules that are easy to understand
• Useful for "data mining" and discovering unexpected knowledge in databases

Weaknesses:
• Not very helpful for small datasets
• Takes effort to separate the insight from the common sense
• Easy to draw spurious conclusions from random patterns
As noted earlier, the Apriori algorithm employs a simple a priori belief as guideline
for reducing the association rule search space: all subsets of a frequent itemset must
also be frequent. This heuristic is known as the Apriori property. Using this astute
observation, it is possible to dramatically limit the number of rules to search. For
example, the set {motor oil, lipstick} can only be frequent if both {motor oil} and
{lipstick} occur frequently as well. Consequently, if either motor oil or lipstick is
infrequent, then any set containing these items can be excluded from the search.
For additional details on the Apriori algorithm, refer to: R. Agrawal and R. Srikant, Fast algorithms for mining association rules, in Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487–499 (1994).
Identifying frequently purchased groceries with association rules
• Dataset contained in the arules R package:
  M. Hahsler, K. Hornik, and T. Reutterer, Implications of probabilistic data modeling for mining association rules, in Studies in Classification, Data Analysis, and Knowledge Organization: from Data and Information Analysis to Knowledge Engineering, pp. 598–605 (2006).
• The data contain 9,835 transactions recorded over 30 days (about 327 transactions per day, or roughly 30 transactions per hour in a 12-hour business day)
Exploring and preparing the data
• There might be five brands of milk, a dozen different types of laundry detergent, and three brands of coffee.
• We are not interested in associations between different brands of milk or detergent.
• Thus, all brand names can be removed from the purchases. This reduces the number of groceries to a more manageable 169 types, using broad categories such as chicken, frozen meals, margarine, and soda.
Exploring and preparing the data
• To look at the contents of the sparse matrix, use the inspect() function in combination with vector operators
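A hedged example (the slides load the groceries data from a file; here the equivalent Groceries dataset bundled with arules, matching the 9,835 transactions and 169 item categories described above, is assumed):

library(arules)
data(Groceries)
summary(Groceries)        # density, most frequent items, transaction size distribution
inspect(Groceries[1:5])   # itemsets of the first five transactions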
Exploring and preparing the data
• The itemInfo() function shows the column labels (items) of the sparse matrix
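For example, under the same assumption:

head(itemInfo(Groceries))      # item metadata; the labels column holds the item names
itemLabels(Groceries)[1:10]    # first ten column labels of the sparse matrix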
Exploring and preparing the data
Step 3 – training a model on the data
The apriori function
With data preparation taken care of, we can now work at finding the associations among shopping cart items. We will use an implementation of the Apriori algorithm in the arules package we've been using for exploring and preparing the groceries data. You'll need to install and load this package if you have not done so already. Sets of rules are created with the apriori() function.
Although running the apriori() function is straightforward, there can sometimes be a fair amount of trial and error when finding the support and confidence parameters that produce a reasonable number of association rules. If you set these levels too high, you might find no rules at all or only rules that are too generic to be useful; if you set them too low, you might be overwhelmed by an unmanageable number of rules.
Train the model
minsup = 0.1, minconf = 0.8
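A sketch of the corresponding call, assuming the bundled Groceries data stands in for the groceries object used on the slides (support = 0.1 and confidence = 0.8 are the arules defaults):

library(arules)
data(Groceries)
apriori(Groceries)   # defaults: support = 0.1, confidence = 0.8 -> set of 0 rules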
This is not surprising because with minsup = 0.1, in order to generate a rule, an item must have appeared in at least 0.1 * 9,835 = 983.5 transactions.
Since only eight items appeared this frequently in our data, it's no wonder we
didn't find any rules.
Setting the minimum support
• One way to approach the problem of setting support is to think about the minimum number of transactions you would need before you would consider a pattern interesting.
• For instance, you could argue that if an item is purchased twice a day (about 60 times over the 30 days of data), then it may be worth taking a look at.
• From there, it is possible to calculate the support level needed to find only rules matching at least that many transactions.
• Since 60 out of 9,835 equals 0.006, we'll try setting minsup = 0.006
Setting the minimum confidence
• Setting the minimum confidence involves a tricky balance.
• On one hand, if confidence is too low, then we might be overwhelmed with a large number of unreliable rules (e.g. rules indicating items commonly purchased with batteries).
• On the other hand, if we set confidence too high, then we will be limited to rules that are obvious or inevitable (e.g. like the fact that a smoke detector is always purchased in combination with batteries).
• The appropriate minimum confidence level depends a great deal on the goals of your analysis. If you start with conservative values, you can always reduce them to broaden the search if you aren't finding actionable intelligence.
• Let's start with minconf = 0.5
Train the model
minsup = 0.006, minconf = 0.25
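A sketch of the corresponding call; minlen = 2, which excludes trivial rules with an empty antecedent, is an added assumption, and the object name is illustrative:

groceryrules <- apriori(Groceries,
                        parameter = list(support = 0.006,
                                         confidence = 0.25,
                                         minlen = 2))
groceryrules   # prints how many rules were found at these thresholds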
Inspect rules
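A possible way to look at a few of them, assuming the groceryrules object from the previous step:

inspect(groceryrules[1:5])   # lhs, rhs, support, confidence and related quality measures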
Evaluation of association rules
• Association analysis algorithms have the potential to generate a large number of patterns.
• Sifting through the patterns to identify the most interesting ones is not a trivial task because "one person's trash might be another person's treasure."
• It is therefore important to establish a set of well-accepted criteria for evaluating the quality of association patterns:
  • Objective measures
  • Subjective arguments
Limitations of the Support-Confidence Framework. The existing association rule mining formulation relies on the support and confidence measures to eliminate uninteresting patterns. The drawback of support was previously described in Section 6.8, in which many potentially interesting patterns involving low support items might be eliminated by the support threshold. The drawback of confidence is more subtle and is best demonstrated with the following example.
Limitations of Support/Confidence
• Consider this scenario:
Example 6.3. Suppose we are interested in analyzing the relationship between people who drink tea and coffee. We may gather information about the beverage preferences among a group of 1000 people and summarize their responses into a table such as the one shown in Table 6.8.
The information given in this table can be used to evaluate the association rule {Tea} −→ {Coffee}.

Table 6.8. Beverage preferences among a group of 1000 people.

         | Coffee | No coffee | Total
Tea      |  150   |    50     |  200
No tea   |  650   |   150     |  800
Total    |  800   |   200     | 1000

Support(Tea, Coffee) = 150/1000 = 15%
Confidence(Tea, Coffee) = 150/200 = 75%

At first glance, it may appear that people who drink tea also tend to drink coffee because the rule's support (15%) and confidence (75%) values are reasonably high. This argument would have been acceptable except that the fraction of people who drink coffee, regardless of whether they drink tea, is 80%, while the fraction of tea drinkers who drink coffee is only 75%. Thus knowing that a person is a tea drinker actually decreases her probability of being a coffee drinker from 80% to 75%! The rule {Tea} −→ {Coffee} is therefore misleading despite its high confidence value.

P(Coffee) = 0.80
P(Coffee | Tea) = 0.75
Knowing that a person is a tea drinker decreases her probability of being a coffee drinker from 80% to 75%.
Limitations of Support/Confidence
• The pitfall of confidence can be traced to the fact that the measure ignores the support of the itemset in the rule consequent.
• If the support of coffee drinkers is taken into account, we would not be surprised to find that many of the people who drink tea also drink coffee.
• What is more surprising is that the fraction of tea drinkers who drink coffee is actually less than the overall fraction of people who drink coffee, which points to an inverse relationship between tea drinkers and coffee drinkers.
Interest Factor (lift measure)

lift(X, Y) = Confidence(X, Y) / Support(Y)

• It is defined as the ratio between the rule's confidence and the support of the itemset in the rule consequent
• lift(X, Y) = 1: X and Y are independent
• lift(X, Y) > 1: X and Y are positively correlated
• lift(X, Y) < 1: X and Y are negatively correlated
Interest Factor (lift measure)
• Statistical interpretation

lift(X, Y) = Confidence(X, Y) / Support(Y)

• The lift metric compares the frequency of a pattern against a baseline frequency computed under the statistical independence assumption:

lift(X, Y) = P(X ∩ Y) / (P(X) P(Y)) = P(Y | X) P(X) / (P(X) P(Y)) = P(Y | X) / P(Y)

where P(X) P(Y) is the baseline under the independence assumption.
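Plugging in the tea/coffee numbers from Table 6.8 shows how lift catches what confidence missed:

lift(Tea, Coffee) = Confidence(Tea, Coffee) / Support(Coffee) = 0.75 / 0.80 ≈ 0.94 < 1

so tea and coffee are (slightly) negatively correlated, in line with the inverse relationship noted above.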
Sort rules by lift
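With arules this can be done, for example, as (assuming the groceryrules object from the training step):

inspect(sort(groceryrules, by = "lift")[1:5])   # the five rules with the highest lift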
Exercises
1. Explore other interestingness measures with the function interestMeasure() provided by the arules package.
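A possible starting point for the exercise (the measure names shown are standard arules options):

interestMeasure(groceryrules,
                measure      = c("lift", "leverage", "conviction"),
                transactions = Groceries)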