Solutions
Data Mining Week 2
Jonatan Møller Gøttcke
September 17, 2016
Contents
1 Exercise 2-1 Itemsets and Association Rules
1.1 A
1.2 B
1.3 C
1.4 D
1.5 E
2 Rules Used
1 Exercise 2-1 Itemsets and Association Rules
1.1 A
Write an expression for the maximum number of size-3 itemsets that can be derived from this data set.
The data set contains six distinct items: Beer, Bread, Butter, Cookies, Diapers and Milk. An itemset is a set, so the order of its items does not matter, and the number of size-3 itemsets is the number of ways to choose 3 items out of 6:
C(6, 3) = (6 · 5 · 4) / (3 · 2 · 1) = 20.
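Because an itemset is unordered, counting ordered selections would count every size-3 itemset 3! = 6 times. The following minimal sketch enumerates the itemsets directly. It assumes the six distinct items that appear in the transactions of this exercise; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

# Assumption: the six distinct items appearing in the exercise's transactions.
items = ["Beer", "Bread", "Butter", "Cookies", "Diapers", "Milk"]

# Enumerate every unordered selection of three items.
size3_itemsets = list(combinations(items, 3))

# The enumeration agrees with the binomial coefficient C(6, 3).
print(len(size3_itemsets), comb(len(items), 3))  # 20 20
```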
1.2 B
What is the maximum number of association rules that can be extracted from this data (including rules that have zero support)?
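The data set consists of the following ten transactions:
{Milk, Beer, Diapers}
{Bread, Butter, Milk}
{Milk, Diapers, Cookies}
{Bread, Butter, Cookies}
{Beer, Cookies, Diapers}
{Milk, Diapers, Bread, Butter}
{Bread, Butter, Diapers}
{Beer, Diapers}
{Milk, Diapers, Bread, Butter}
{Beer, Cookies}
Because rules with zero support are included, the maximum depends only on the number of distinct items, d = 6. Each item can be placed in the antecedent, in the consequent, or in neither, and both sides of a rule must be non-empty. This gives a total of 3^d - 2^(d+1) + 1 = 729 - 128 + 1 = 602 association rules.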
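When rules with zero support are allowed, the number of possible rules X ⇒ Y (with X and Y non-empty and disjoint) depends only on the number of distinct items d. A standard closed form is 3^d - 2^(d+1) + 1. The sketch below checks it by brute force, assuming d = 6 items as in the transactions of this exercise; the function names are illustrative.

```python
from itertools import product

def count_rules_closed_form(d: int) -> int:
    # 3^d assignments of items to antecedent / consequent / unused,
    # minus those with an empty antecedent or an empty consequent.
    return 3**d - 2**(d + 1) + 1

def count_rules_brute_force(d: int) -> int:
    count = 0
    # Assign each of the d items to exactly one of three roles.
    for roles in product(("antecedent", "consequent", "unused"), repeat=d):
        # A valid rule needs at least one item on each side.
        if "antecedent" in roles and "consequent" in roles:
            count += 1
    return count

d = 6  # assumption: six distinct items in the data set
print(count_rules_closed_form(d), count_rules_brute_force(d))  # 602 602
```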
1.3 C
What is the maximum size of frequent itemsets that can be extracted (assuming σ > 0)?
Assuming the threshold σ > 0, the maximum size of a frequent itemset is 4, because the itemset
{Milk, Diapers, Bread, Butter}
occurs in two transactions and no transaction contains more than four items.
1.4 D
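Find an itemset (of size 2 or larger) that has the largest support.
The itemset {Bread, Butter} has the largest support. Its cover consists of five transactions:
{Bread, Butter, Milk}, {Bread, Butter, Cookies}, {Milk, Diapers, Bread, Butter} (twice) and {Bread, Butter, Diapers}.
So the support is 5, or 5/10 = 1/2 as a relative frequency. For comparison, {Milk, Diapers} occurs in four transactions and {Beer, Diapers} in three.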
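The supports can be cross-checked by counting every itemset of size 2 or larger that occurs in some transaction. This is a brute-force sketch, assuming the ten transactions listed under part B are the complete data set; the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Assumption: the ten transactions listed under part B.
transactions = [
    {"Milk", "Beer", "Diapers"},
    {"Bread", "Butter", "Milk"},
    {"Milk", "Diapers", "Cookies"},
    {"Bread", "Butter", "Cookies"},
    {"Beer", "Cookies", "Diapers"},
    {"Milk", "Diapers", "Bread", "Butter"},
    {"Bread", "Butter", "Diapers"},
    {"Beer", "Diapers"},
    {"Milk", "Diapers", "Bread", "Butter"},
    {"Beer", "Cookies"},
]

# Count how many transactions contain each itemset of size >= 2.
support = Counter()
for t in transactions:
    for k in range(2, len(t) + 1):
        for itemset in combinations(sorted(t), k):
            support[itemset] += 1

best_itemset, best_support = max(support.items(), key=lambda kv: kv[1])
print(best_itemset, best_support)  # ('Bread', 'Butter') 5
```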
1.5 E
Find a pair of items, a and b, such that the rules {a} ⇒ {b} and {b} ⇒
{a} have the same confidence.
The items Bread and Butter occur 5 times each, and every time they occur together. This means each item has a relative support of 1/2, and so does the pair {Bread, Butter}. Hence
conf({Bread} ⇒ {Butter}) = (1/2) / (1/2) = 1.
That is, the rule has a confidence of 100%. If we swap antecedent and consequent, the fraction is the same, so the requirement of equal confidence holds.
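Two rules {a} ⇒ {b} and {b} ⇒ {a} share the numerator s({a, b}), so their confidences are equal exactly when s({a}) = s({b}). The sketch below searches all pairs for this property, assuming the ten transactions listed under part B are the complete data set. Pairs that never occur together are skipped, and the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Assumption: the ten transactions listed under part B.
transactions = [
    {"Milk", "Beer", "Diapers"},
    {"Bread", "Butter", "Milk"},
    {"Milk", "Diapers", "Cookies"},
    {"Bread", "Butter", "Cookies"},
    {"Beer", "Cookies", "Diapers"},
    {"Milk", "Diapers", "Bread", "Butter"},
    {"Bread", "Butter", "Diapers"},
    {"Beer", "Diapers"},
    {"Milk", "Diapers", "Bread", "Butter"},
    {"Beer", "Cookies"},
]

def support(itemset):
    # Number of transactions that contain every item of the itemset.
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent):
    # conf(X => Y) = s(X u Y) / s(X), kept exact with Fraction.
    return Fraction(support(antecedent | consequent), support(antecedent))

items = sorted(set().union(*transactions))
symmetric_pairs = []
for a, b in combinations(items, 2):
    if support({a, b}) == 0:
        continue  # skip pairs that never occur together
    forward = confidence({a}, {b})
    backward = confidence({b}, {a})
    if forward == backward:
        symmetric_pairs.append((a, b, forward))

for a, b, conf in symmetric_pairs:
    print(f"{a} <-> {b}: confidence {conf}")
```

Under these assumptions the search reports {Bread, Butter} with confidence 1, along with a few other pairs whose items have equal support.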
2 Rules Used
Cover of an itemset: the set of all transactions that contain the itemset: cover(X) = {(tid, X_tid) | (tid, X_tid) ∈ D ∧ X ⊆ X_tid}
Support of an itemset: the support s of an itemset X (s(X)) is the number
of transactions containing X (i.e., the size of the cover set): s(X) = |cover(X)|
Support of an association rule: s(X ⇒ Y) = s(X ∪ Y)
Confidence: conf(X ⇒ Y) = s(X ∪ Y) / s(X)
Frequent itemset: given some support threshold σ, an itemset X is frequent (w.r.t. σ) iff s(X) ≥ σ, or equivalently f(X) ≥ σ / |D|, where f(X) = s(X) / |D| is the relative frequency of X.
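The definitions above translate almost line by line into code. The following is a minimal sketch, not a reference implementation: the database is assumed to be a list of (tid, transaction) pairs built from the ten transactions of Exercise 2-1, with tids assigned by list order, and all function names are illustrative.

```python
def cover(itemset, database):
    # All (tid, transaction) pairs whose transaction contains the itemset.
    return [(tid, t) for tid, t in database if itemset <= t]

def support(itemset, database):
    # s(X) = |cover(X)|
    return len(cover(itemset, database))

def frequency(itemset, database):
    # f(X) = s(X) / |D|
    return support(itemset, database) / len(database)

def confidence(x, y, database):
    # conf(X => Y) = s(X u Y) / s(X)
    return support(x | y, database) / support(x, database)

def is_frequent(itemset, database, sigma):
    # X is frequent w.r.t. sigma iff s(X) >= sigma
    return support(itemset, database) >= sigma

# Assumption: tids 1..10 follow the order of the exercise's transaction list.
database = list(enumerate([
    {"Milk", "Beer", "Diapers"},
    {"Bread", "Butter", "Milk"},
    {"Milk", "Diapers", "Cookies"},
    {"Bread", "Butter", "Cookies"},
    {"Beer", "Cookies", "Diapers"},
    {"Milk", "Diapers", "Bread", "Butter"},
    {"Bread", "Butter", "Diapers"},
    {"Beer", "Diapers"},
    {"Milk", "Diapers", "Bread", "Butter"},
    {"Beer", "Cookies"},
], start=1))

print(support({"Bread", "Butter"}, database))               # 5
print(frequency({"Bread", "Butter"}, database))             # 0.5
print(confidence({"Beer"}, {"Diapers"}, database))          # 0.75
print(is_frequent({"Milk", "Diapers", "Bread", "Butter"}, database, sigma=2))  # True
```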