SEEM4630 E-Commerce Data Mining (2016-17 First Term)
Assignment 4 (100 points)

Due time and date: 5pm Dec 12, 2016
Submit to assignment box for SEEM4630, 5/F ERB

Submission Requirements
The hand-in version must be ordered correctly and stapled in the top left corner. Include student name, student ID and assignment number.

Question 1. Given the transaction database in Table 1,

TID    Items
T1     {a, b, d}
T2     {a, b, f}
T3     {a, b, d}
T4     {b, c, f}
T5     {c, d, e}
T6     {a, b, d, f}
T7     {b, d}
T8     {d, e}
T9     {a, b, d, f}
T10    {a, e, g}

Table 1. A Transaction Database

(a) Suppose the minimum support count min_sup = 3. Find all frequent itemsets and list their support counts. (6 pts)
(b) List all closed frequent itemsets, and all maximal frequent itemsets. (12 pts)
(c) Suppose the minimum confidence threshold is 0.7. Based on the frequent length-3 itemsets found, generate all association rules that satisfy the minimum confidence threshold. List each rule's support and confidence. (9 pts)

Question 2. A database with 100 transactions has its FP-tree shown in Figure 1.

Figure 1. FP-tree

(a) Show item c's conditional pattern base. (9 pts)
(b) Let min_sup = 0.5. Based on the conditional pattern base for item c found above, find all frequent itemsets containing item c and other items, and their support counts. (9 pts)
(c) Based on the frequent itemsets you find above, list all association rules that satisfy min_sup = 0.5 and min_conf = 0.8. List each rule's support count and confidence value. (16 pts)

Question 3. Table 2 lists a set of closed itemsets discovered from a transaction database with 4 items: a, b, c, and d. Assume minimum support count = 1. List the unreported frequent itemsets and their support counts. (8 pts)

ID    Closed Itemset    Support Count
1     a                 5
2     b                 4
3     c                 5
4     ab                3
5     ac                3
6     bc                3
7     cd                4
8     abc               2
9     acd               2
10    bcd               2
11    abcd              1

Table 2. Closed Itemsets

Question 4. For each of the following constraints, state whether it is monotone or antimonotone, or convertible monotone/antimonotone. (11 pts)

(a) The price difference between the most expensive and the cheapest items in an itemset is within $20.
(b) The sum of the prices of all the items in an itemset exceeds $100.
(c) The average price of all the items in an itemset is greater than $50.

Question 5. WEKA Tool Practice.

(a) Download the corresponding data file according to your student ID. Let s be the sum of the last 4 digits of your student ID, and r be the remainder of s divided by 5. For example, ID = 1155012345, s = 2 + 3 + 4 + 5 = 14, r = 14 % 5 = 4.

r    data file
0    autos.arff
1    breastcancer.arff
2    diabetes.arff
3    sick.arff
4    mushroom.arff

(b) Classify the data with a decision tree (J48) using a percentage split of 66%. Copy the result in the 'classifier output' window to your assignment. (10 pts)
(c) Find association rules. Note that association rule mining works with discrete data, so first discretize the continuous attributes using equal-frequency partitioning with 10 bins, then run the Apriori algorithm with minimum support = 30% and minimum confidence = 60%. Copy the result in the 'associator output' window to your assignment. (10 pts)
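
The data file selection rule in Question 5(a) can be checked with a short script. This is only a sketch, not part of the official hand-out: the function name `data_file_for` and the dictionary `DATA_FILES` are hypothetical, and the mapping for r = 2, 3, 4 assumes the file names appear in the same order as the r values in the table above.

```python
# Hypothetical helper for Question 5(a): map a student ID to a data file.
# Assumption: file names are paired with r values in the order listed in the table.
DATA_FILES = {
    0: "autos.arff",
    1: "breastcancer.arff",
    2: "diabetes.arff",
    3: "sick.arff",
    4: "mushroom.arff",
}


def data_file_for(student_id: str) -> str:
    """Return the data file for a student ID given as a string of digits."""
    last_four = student_id[-4:]          # last 4 digits of the ID
    s = sum(int(d) for d in last_four)   # s: sum of those digits
    r = s % 5                            # r: remainder of s divided by 5
    return DATA_FILES[r]


if __name__ == "__main__":
    # Worked example from the assignment: s = 2 + 3 + 4 + 5 = 14, r = 14 % 5 = 4
    print(data_file_for("1155012345"))
```

For the example ID in the assignment, the script follows the same arithmetic (s = 14, r = 4) and selects the file listed for r = 4.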