Solutions
Data Mining Week 2
Jonatan Møller Gøttcke
September 17, 2016
Contents
1 Exercise 2-1 Itemsets and Association Rules
1.1 A
1.2 B
1.3 C
1.4 D
1.5 E
2 Rules Used
1 Exercise 2-1 Itemsets and Association Rules
1.1 A
Write an expression for the maximum number of size-3 itemsets that can be derived from this data set.
The items are Beer, Butter, Cookies, Diapers and Milk, plus Bread, which appears in the itemsets listed under B, so we want to know how many different itemsets of size 3 we can make from these 6 items. An itemset is a set, so the order of the items does not matter and the count is the binomial coefficient C(6, 3) = (6 · 5 · 4)/(3 · 2 · 1), which is 20.
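The count can be checked by enumeration. The sketch below contrasts ordered triples with unordered itemsets for the five items named above, and then adds Bread as a sixth item. Treating Bread as part of the item universe is an assumption based on the itemsets listed under B; the variable names are illustrative.

```python
from itertools import combinations, permutations
from math import comb

# The five items named in the answer above.
items = ["Beer", "Butter", "Cookies", "Diapers", "Milk"]

ordered = list(permutations(items, 3))    # order matters: 5 * 4 * 3
unordered = list(combinations(items, 3))  # order ignored: C(5, 3)

print(len(ordered))    # 60
print(len(unordered))  # 10

# Assumption: Bread, which appears in the itemsets listed under B,
# is a sixth item of the data set.
items_with_bread = items + ["Bread"]
print(comb(len(items_with_bread), 3))  # 20
```

Because an itemset is a set, the unordered count is the relevant one.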
1.2 B
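What is the maximum number of association rules that can be extracted from this data (including rules that have zero support)?
Milk, Beer, Diapers 0
Bread, Butter, Milk 1
Milk, Diapers, Cookies 0
Bread, Butter, Cookies 0
Beer, Cookies, Diapers 0
Milk, Diapers, Bread, Butter 1
Bread, Butter, Diapers 1
Beer, Diapers 2
Milk, Diapers, Bread, Butter 1
Beer, Cookies 1
Summing the counts in the right-hand column gives 7. Since rules with zero support also count, however, the maximum number of rules depends only on the number of distinct items, d = 6: each item is in the antecedent, in the consequent, or in neither, which gives 3^d − 2^(d+1) + 1 = 602 rules with a non-empty antecedent and consequent.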
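When zero-support rules are included, the number of rules X ⇒ Y with non-empty, disjoint X and Y over d items has the closed form 3^d − 2^(d+1) + 1. The sketch below checks that closed form by brute force for d = 6, assuming the six distinct items that appear in the listing above; the function names are illustrative.

```python
from itertools import product

def count_rules_brute_force(d):
    """Count rules X => Y with X and Y non-empty and disjoint over d items."""
    count = 0
    # Give each item a role: 0 = unused, 1 = antecedent X, 2 = consequent Y.
    for roles in product((0, 1, 2), repeat=d):
        if 1 in roles and 2 in roles:
            count += 1
    return count

def count_rules_closed_form(d):
    # 3^d role assignments, minus those with empty X (2^d) and empty Y (2^d),
    # plus the single assignment (all unused) that was subtracted twice.
    return 3**d - 2**(d + 1) + 1

d = 6  # assumed items: Beer, Bread, Butter, Cookies, Diapers, Milk
print(count_rules_brute_force(d))  # 602
print(count_rules_closed_form(d))  # 602
```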
1.3 C
What is the maximum size of frequent itemsets that can be extracted
(assuming σ > 0)?
The maximum size of a frequent itemset, assuming the threshold σ > 0, is 4, because the itemset
Milk, Diapers, Bread, Butter
occurs in two transactions and no transaction contains more than four items.
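For a sufficiently small threshold σ > 0, an itemset is frequent as soon as one transaction contains it, so the largest frequent itemset is as large as the largest transaction. A minimal check, assuming the ten itemsets listed under B are the transactions of the data set:

```python
# Assumption: the ten itemsets listed under B are the transactions.
transactions = [
    {"Milk", "Beer", "Diapers"},
    {"Bread", "Butter", "Milk"},
    {"Milk", "Diapers", "Cookies"},
    {"Bread", "Butter", "Cookies"},
    {"Beer", "Cookies", "Diapers"},
    {"Milk", "Diapers", "Bread", "Butter"},
    {"Bread", "Butter", "Diapers"},
    {"Beer", "Diapers"},
    {"Milk", "Diapers", "Bread", "Butter"},
    {"Beer", "Cookies"},
]

largest = max(transactions, key=len)
occurrences = sum(1 for t in transactions if largest <= t)

print(sorted(largest))  # ['Bread', 'Butter', 'Diapers', 'Milk']
print(len(largest))     # 4
print(occurrences)      # 2
```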
1.4 D
Find an itemset (of size 2 or larger) that has the largest support.
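I would conclude that {Bread, Butter} has the largest support: as noted in E, Bread and Butter occur 5 times each and always together, so the support is 5, or 5/10 = 1/2 as a fraction of the transactions. For comparison, the cover of {Beer, Diapers} in the listing under B is:
Milk, Beer, Diapers; Beer, Cookies, Diapers; and Beer, Diapers
So its support is only 3/10.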
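The supports can be compared by brute force. The sketch below assumes the ten itemsets listed under B are the transactions. For every itemset of size 2 or larger that occurs in some transaction, it counts how many transactions contain it, which makes it easy to compare candidates such as {Beer, Diapers} and {Bread, Butter}.

```python
from collections import Counter
from itertools import combinations

# Assumption: the ten itemsets listed under B are the transactions.
transactions = [
    {"Milk", "Beer", "Diapers"},
    {"Bread", "Butter", "Milk"},
    {"Milk", "Diapers", "Cookies"},
    {"Bread", "Butter", "Cookies"},
    {"Beer", "Cookies", "Diapers"},
    {"Milk", "Diapers", "Bread", "Butter"},
    {"Bread", "Butter", "Diapers"},
    {"Beer", "Diapers"},
    {"Milk", "Diapers", "Bread", "Butter"},
    {"Beer", "Cookies"},
]

support = Counter()
for t in transactions:
    for size in range(2, len(t) + 1):
        # Sorting gives every itemset one canonical tuple as its key.
        for itemset in combinations(sorted(t), size):
            support[itemset] += 1

best, best_support = support.most_common(1)[0]
print(best, best_support)            # ('Bread', 'Butter') 5
print(support[("Beer", "Diapers")])  # 3
```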
1.5 E
Find a pair of items, a and b, such that the rules {a} ⇒ {b} and {b} ⇒
{a} have the same confidence.
The items Bread and Butter occur 5 times each, and every time they occur together. This means that each item, and also the pair, has a support of 1/2, meaning that the confidence is (1/2)/(1/2) = 1. That is, the rule has a confidence of 100%. If we swap the two items the fraction is the same, so the requirement of equal confidence holds.
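The same check can be run for every pair of items. The sketch below again assumes the ten itemsets listed under B are the transactions; `support` and `confidence` are illustrative helpers that follow the definitions in section 2.

```python
from fractions import Fraction
from itertools import combinations

# Assumption: the ten itemsets listed under B are the transactions.
transactions = [
    {"Milk", "Beer", "Diapers"},
    {"Bread", "Butter", "Milk"},
    {"Milk", "Diapers", "Cookies"},
    {"Bread", "Butter", "Cookies"},
    {"Beer", "Cookies", "Diapers"},
    {"Milk", "Diapers", "Bread", "Butter"},
    {"Bread", "Butter", "Diapers"},
    {"Beer", "Diapers"},
    {"Milk", "Diapers", "Bread", "Butter"},
    {"Beer", "Cookies"},
]

def support(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(x, y):
    """conf(x => y) = s(x ∪ y) / s(x), kept exact as a fraction."""
    return Fraction(support(x | y), support(x))

items = sorted(set().union(*transactions))
equal_pairs = []
for a, b in combinations(items, 2):
    if support({a, b}) == 0:
        continue  # skip pairs that never occur together
    forward = confidence({a}, {b})
    backward = confidence({b}, {a})
    if forward == backward:
        equal_pairs.append((a, b, forward))
        print(a, b, forward)
# Beer Cookies 1/2
# Bread Butter 1
# Bread Milk 3/5
# Butter Milk 3/5
```

Two rules {a} ⇒ {b} and {b} ⇒ {a} share the numerator s({a, b}), so their confidences agree exactly when a and b have the same support.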
2 Rules Used
Cover of an itemset: the set of all transactions that contain the itemset: cover(X) = {(tid, X_tid) | (tid, X_tid) ∈ D ∧ X ⊆ X_tid}
Support of an itemset: the support s of an itemset X, s(X), is the number of transactions containing X (i.e., the size of the cover set): s(X) = |cover(X)|
Support of an association rule: s(X ⇒ Y) = s(X ∪ Y)
Confidence: conf(X ⇒ Y) = s(X ∪ Y) / s(X)
Frequent itemset: given some support threshold σ, an itemset X is frequent (w.r.t. σ) iff s(X) ≥ σ, or equivalently f(X) ≥ σ / |D|
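These definitions translate almost directly into code. The sketch below is one possible reading, not a reference implementation. It stores the database D as a list of (tid, itemset) pairs, and the toy database and all function names are hypothetical.

```python
from fractions import Fraction

def cover(X, D):
    """All transactions (tid, X_tid) in D whose itemset contains X."""
    return [(tid, X_tid) for tid, X_tid in D if X <= X_tid]

def support(X, D):
    """s(X) = |cover(X)|."""
    return len(cover(X, D))

def rule_support(X, Y, D):
    """s(X => Y) = s(X ∪ Y)."""
    return support(X | Y, D)

def confidence(X, Y, D):
    """conf(X => Y) = s(X ∪ Y) / s(X), kept exact as a fraction."""
    return Fraction(support(X | Y, D), support(X, D))

def is_frequent(X, D, sigma):
    """X is frequent w.r.t. sigma iff s(X) >= sigma."""
    return support(X, D) >= sigma

# Hypothetical toy database, for illustration only.
D = [
    (1, frozenset({"a", "b"})),
    (2, frozenset({"a", "c"})),
    (3, frozenset({"a", "b", "c"})),
    (4, frozenset({"b"})),
]

ab = frozenset({"a", "b"})
print(support(ab, D))                                     # 2
print(confidence(frozenset({"a"}), frozenset({"b"}), D))  # 2/3
print(is_frequent(ab, D, sigma=2))                        # True
```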