SEEM4630 E-Commerce Data Mining
(2016-17 First Term)
Assignment 4 (100 points)
Due time and date: 5pm Dec 12, 2016
Submit to assignment box for SEEM4630, 5/F ERB
Submission Requirements
The hand-in version must be ordered correctly and stapled in the top left corner.
Include student name, student ID and assignment number.
Question 1. Given the transaction database in Table 1,
TID    Items
T1     {a, b, d}
T2     {a, b, f}
T3     {a, b, d}
T4     {b, c, f}
T5     {c, d, e}
T6     {a, b, d, f}
T7     {b, d}
T8     {d, e}
T9     {a, b, d, f}
T10    {a, e, g}
Table 1. A Transaction Database
(a) Suppose the minimum support count min_sup = 3. Find all frequent itemsets
and list their support counts. (6 pts)
(b) List all closed frequent itemsets, and all maximal frequent itemsets. (12 pts)
(c) Suppose the minimum confidence threshold is 0.7. Based on the frequent
length-3 itemsets found, generate all association rules that satisfy the
minimum confidence threshold. List each rule’s support and confidence. (9 pts)
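As a reminder of the definitions used in parts (a) and (c): the support count of an itemset is the number of transactions that contain it, and the confidence of a rule X -> Y is the support count of X and Y together divided by the support count of X. The sketch below illustrates this on a made-up toy database (not Table 1). The function names and the toy data are assumptions for illustration only.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    ante = support_count(antecedent, transactions)
    # A rule whose antecedent never occurs has no meaningful confidence;
    # returning 0.0 here is a convention chosen for this sketch.
    return both / ante if ante else 0.0


# Toy database, invented for illustration (NOT the data in Table 1).
toy = [{"x", "y"}, {"x", "y", "z"}, {"x", "z"}, {"y"}]

print(support_count({"x", "y"}, toy))  # {x, y} appears in 2 transactions
print(confidence({"x"}, {"y"}, toy))   # 2 of the 3 transactions with x also have y
```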
Question 2. A database with 100 transactions has its FP-tree shown in Figure 1.
Figure 1. FP-tree
(a) Show item c’s conditional pattern base. (9 pts)
(b) Let min_sup = 0.5. Based on the conditional pattern base for item c found above,
find all frequent itemsets containing item c and other items, and their support counts.
(9 pts)
(c) Based on the frequent itemsets you find above, list all association rules that satisfy
min_sup = 0.5 and min_conf = 0.8. List each rule's support count and confidence
value. (16 pts)
Question 3. Table 2 lists a set of closed itemsets discovered from a transaction
database with 4 items: a, b, c, and d. Assume minimum support count = 1. List
the unreported frequent itemsets and their support counts. (8 pts)
ID    Closed Itemset    Support Count
1     a                 5
2     b                 4
3     c                 5
4     ab                3
5     ac                3
6     bc                3
7     cd                4
8     abc               2
9     acd               2
10    bcd               2
11    abcd              1
Table 2. Closed Itemsets
Question 4. For each of the following constraints, state whether it is monotone or
antimonotone, or convertible monotone/antimonotone. (11 pts)
(a) the price difference between the most expensive and the cheapest items in an
itemset is within $20.
(b) the sum of the price of all the items in an itemset exceeds $100.
(c) the average price of all the items in an itemset is greater than $50.
Question 5. WEKA Tool Practice.
(a) Download the corresponding data file according to your student ID. Let s be the
sum of the last 4 digits of your student ID, and r be the remainder of s divided by 5.
For example, ID=1155012345, s = 2 + 3 + 4 + 5 = 14, r = 14%5 = 4.
r    data file
0    autos.arff
1    breastcancer.arff
2    diabetes.arff
3    sick.arff
4    mushroom.arff
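The selection rule in part (a) can be checked with a short script. This is a minimal sketch: the function name data_file_for is hypothetical, and the mapping assumes the table assigns r = 0, 1, 2, 3, 4 to autos, breastcancer, diabetes, sick and mushroom in that order.

```python
# Assumed mapping from remainder r to data file, read off the table in part (a).
DATA_FILES = {
    0: "autos.arff",
    1: "breastcancer.arff",
    2: "diabetes.arff",
    3: "sick.arff",
    4: "mushroom.arff",
}


def data_file_for(student_id: str) -> str:
    """Return the data file assigned to a student ID (hypothetical helper)."""
    s = sum(int(ch) for ch in student_id[-4:])  # sum of the last 4 digits
    r = s % 5                                   # remainder of s divided by 5
    return DATA_FILES[r]


# Worked example from the assignment: s = 2 + 3 + 4 + 5 = 14, r = 14 % 5 = 4.
print(data_file_for("1155012345"))  # mushroom.arff
```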
(b) Classify the data with a decision tree (J48) using a percentage split of 66%. Copy the
result in the 'classifier output' window to your assignment. (10 pts)
(c) Find association rules. Note that association rule mining works with discrete
data, so first discretize the continuous attributes using equal-frequency partitioning
with 10 bins, then run the Apriori algorithm with minimum support = 30% and
minimum confidence = 60%. Copy the result in the 'associator output' window to your
assignment. (10 pts)