Solutions: Data Mining Week 2
Jonatan Møller Gøttcke
September 17, 2016

Contents
1 Exercise 2-1: Itemsets and Association Rules
  1.1 A
  1.2 B
  1.3 C
  1.4 D
  1.5 E
2 Rules Used

1 Exercise 2-1: Itemsets and Association Rules

1.1 A

Write an expression for the maximum number of size-3 itemsets that can be derived from this data set.

The transactions contain six distinct items: Beer, Bread, Butter, Cookies, Diapers and Milk. I want to know how many different itemsets of size 3 can be built from these 6 items. An itemset is a set, so the order of its items does not matter, and the count is the number of ways to choose 3 items out of 6:

$\binom{6}{3} = \frac{6 \cdot 5 \cdot 4}{3!} = 20$

Counting positions one at a time (6 choices for the first, 5 for the second, 4 for the third) gives $6 \cdot 5 \cdot 4 = 120$ ordered selections, but each itemset is then counted $3! = 6$ times, which is why the product is divided by $3!$.

1.2 B

What is the maximum number of association rules that can be extracted from this data (including rules that have zero support)?

The transactions in the data set are:

 1  Milk, Beer, Diapers
 2  Bread, Butter, Milk
 3  Milk, Diapers, Cookies
 4  Bread, Butter, Cookies
 5  Beer, Cookies, Diapers
 6  Milk, Diapers, Bread, Butter
 7  Bread, Butter, Diapers
 8  Beer, Diapers
 9  Milk, Diapers, Bread, Butter
10  Beer, Cookies

Since rules with zero support are included, the count depends only on the number of items $d = 6$, not on which transactions occur. In a rule $X \Rightarrow Y$ the sets $X$ and $Y$ are non-empty and disjoint. Each item is placed in $X$, in $Y$, or in neither, which gives $3^d$ assignments. Of these, $2^d$ have an empty $X$ and $2^d$ have an empty $Y$, and the single assignment with both empty is subtracted twice, so it is added back once:

$3^d - 2^{d+1} + 1 = 3^6 - 2^7 + 1 = 729 - 128 + 1 = 602$

This gives a total of 602 association rules.

1.3 C

What is the maximum size of frequent itemsets that can be extracted (assuming $\sigma > 0$)?

The maximum size of a frequent itemset, assuming the threshold $\sigma > 0$, is 4. The itemset {Milk, Diapers, Bread, Butter} occurs in two transactions, so its support is greater than 0, and no transaction contains more than four items, so every larger itemset has support 0.

1.4 D

Find an itemset (of size 2 or larger) that has the largest support.

I conclude that {Bread, Butter} has the largest support. Its cover consists of transactions 2, 4, 6, 7 and 9, so its support is 5, which is a frequency of 5/10 = 1/2. For comparison, {Milk, Diapers} has support 4 (transactions 1, 3, 6, 9) and {Beer, Diapers} has support 3 (transactions 1, 5, 8).

1.5 E

Find a pair of items, a and b, such that the rules $\{a\} \Rightarrow \{b\}$ and $\{b\} \Rightarrow \{a\}$ have the same confidence.

The items Bread and Butter occur 5 times each, and every time they occur together. This means that {Bread}, {Butter} and {Bread, Butter} each have a support of 5, a frequency of 1/2. Therefore

$conf(\{Bread\} \Rightarrow \{Butter\}) = \frac{1/2}{1/2} = 1$

so the rule has a confidence of 100%. Swapping the two items gives the same fraction, so the rule in the other direction has the same confidence and the requirement holds.

2 Rules Used

Cover of an itemset: the set of all transactions that contain the itemset:
$cover(X) = \{(tid, X_{tid}) \mid (tid, X_{tid}) \in D \wedge X \subseteq X_{tid}\}$

Support of an itemset: the support $s(X)$ of an itemset $X$ is the number of transactions containing $X$ (i.e., the size of the cover set):
$s(X) = |cover(X)|$

Support of an association rule:
$s(X \Rightarrow Y) = s(X \cup Y)$

Confidence:
$conf(X \Rightarrow Y) = \frac{s(X \cup Y)}{s(X)}$

Frequent itemset: given some support threshold $\sigma$, an itemset $X$ is frequent (w.r.t. $\sigma$) iff
$s(X) \geq \sigma$, or equivalently $f(X) \geq \frac{\sigma}{|D|}$, where $f(X) = s(X)/|D|$ is the frequency of $X$.
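
The quantities asked for in Exercise 2-1 can be checked mechanically from the definitions in "Rules Used". The following is a minimal sketch, not part of the original exercise: it assumes the data set is exactly the ten transactions listed in part B, treats itemsets as unordered sets, and counts rules whose two sides are non-empty and disjoint. The names `transactions`, `support` and `confidence` are chosen here for illustration only.

```python
from itertools import combinations
from math import comb

# Assumption: the ten transactions as listed in part B, in that order.
transactions = [
    {"Milk", "Beer", "Diapers"},
    {"Bread", "Butter", "Milk"},
    {"Milk", "Diapers", "Cookies"},
    {"Bread", "Butter", "Cookies"},
    {"Beer", "Cookies", "Diapers"},
    {"Milk", "Diapers", "Bread", "Butter"},
    {"Bread", "Butter", "Diapers"},
    {"Beer", "Diapers"},
    {"Milk", "Diapers", "Bread", "Butter"},
    {"Beer", "Cookies"},
]

items = sorted(set().union(*transactions))  # distinct items in the data
d = len(items)


def support(itemset):
    """s(X) = |cover(X)|: number of transactions that contain all of X."""
    x = set(itemset)
    return sum(1 for t in transactions if x <= t)


def confidence(x, y):
    """conf(X => Y) = s(X u Y) / s(X)."""
    return support(set(x) | set(y)) / support(x)


# A: number of unordered size-3 itemsets over d items.
num_size3 = comb(d, 3)

# B: rules X => Y with X, Y non-empty and disjoint (zero support allowed).
num_rules = 3**d - 2**(d + 1) + 1

# C: an itemset has support > 0 only if it fits inside some transaction,
# so the largest such itemset is as large as the largest transaction.
max_frequent_size = max(len(t) for t in transactions)

# D: itemset of size >= 2 with the largest support (brute force over all).
candidates = [c for k in range(2, d + 1) for c in combinations(items, k)]
best = max(candidates, key=support)

# E: confidence of the rule in both directions for Bread and Butter.
conf_ab = confidence({"Bread"}, {"Butter"})
conf_ba = confidence({"Butter"}, {"Bread"})

print("A:", num_size3)
print("B:", num_rules)
print("C:", max_frequent_size)
print("D:", best, support(best))
print("E:", conf_ab, conf_ba)
```

Brute force over all itemsets is fine here because six items give only 63 non-empty itemsets; for a realistic data set a frequent-itemset algorithm such as Apriori would be used instead.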