CE417 - Data Mining Course
Homework Assignment #2
Fall 2007

1. What is the essential difference between association rules and decision rules?

2. Consider the data set shown in the table below. The goal is to develop association rules using the a priori algorithm to predict when a certain (evidently indoor) game may be played.
   a. Using 75% minimum confidence and 20% minimum support, generate one-antecedent association rules for predicting play.
   b. Using 75% minimum confidence and 20% minimum support, generate two-antecedent association rules for predicting play.
   c. Multiply the observed support by the confidence for each of the rules in parts a and b, and rank them in a table.

   No.  Outlook   Temperature  Humidity  Windy  Play
   1    sunny     hot          high      FALSE  no
   2    sunny     hot          high      TRUE   no
   3    overcast  hot          high      FALSE  yes
   4    rain      mild         high      FALSE  yes
   5    rain      cool         normal    FALSE  yes
   6    rain      cool         normal    TRUE   no
   7    overcast  cool         normal    TRUE   yes
   8    sunny     mild         high      FALSE  no
   9    sunny     cool         normal    FALSE  yes
   10   rain      mild         normal    FALSE  yes
   11   sunny     mild         normal    TRUE   yes
   12   overcast  mild         high      TRUE   yes
   13   overcast  hot          normal    FALSE  yes
   14   rain      mild         high      TRUE   no

3. In the neural network algorithm:
   a. Should we prefer a large hidden layer or a small one? Describe the benefits and drawbacks of each.
   b. Describe the benefits and drawbacks of using large or small values for the learning rate and momentum term.

4. Explain the fundamental differences between the design of an artificial neural network and that of a “classical” information-processing system.

5. Consider the data in the following table. The target variable is salary. Start by discretizing salary as follows:
   • Less than $35,000: Level 1
   • $35,000 to less than $45,000: Level 2
   • $45,000 to less than $55,000: Level 3
   • Above $55,000: Level 4
   a. Construct a classification and regression tree (CART) to classify salary based on the other variables.
   b. Construct a C4.5 decision tree to classify salary based on the other variables.
   c. Compare the two decision trees and discuss the benefits and drawbacks of each.
   d. Generate the full set of decision rules for the CART decision tree.
   e. Generate the full set of decision rules for the C4.5 decision tree.
   f. Compare the two sets of decision rules and discuss the benefits and drawbacks of each.

   Occupation values: Service, Management, Sales, Staff

   Gender  Age  Salary
   Female  45   $48,000
   Male    25   $25,000
   Male    33   $35,000
   Male    25   $45,000
   Female  35   $65,000
   Male    26   $45,000
   Female  45   $70,000
   Female  40   $50,000
   Male    30   $40,000
   Female  50   $40,000
   Male    25   $25,000

Notes:
• All homework must be solved and written independently. If you use someone else’s work, including books, papers, or any other material, you must acknowledge it and directly cite those resources at every place in your document where they are used.
• Submit your solutions in PDF format to [email protected] before the 30th of Aban. The subject of the email should conform to the following format: [DMC][HW2][your student number(s)]. For example: [DMC][HW2][87777777-86666666]. Your email should have one PDF attachment that contains your solutions. The name of the file should be your student number, and the file should reflect your full name.
• You should also deliver a hard copy of your solutions to Dr. Abolhassani in the first session of the class after the deadline.
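
For Question 2, the two rule metrics can be checked mechanically. The following is a minimal sketch in Python, not a prescribed method. It assumes that support is the fraction of all records that match both the antecedent and the consequent, and that confidence is the fraction of antecedent-matching records that also match the consequent. The names `ROWS`, `RECORDS`, and `rule_metrics` are hypothetical; the records are transcribed from the Question 2 table.

```python
from fractions import Fraction

# Records transcribed from the Question 2 table:
# (outlook, temperature, humidity, windy, play)
ROWS = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rain", "mild", "high", False, "yes"),
    ("rain", "cool", "normal", False, "yes"),
    ("rain", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rain", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rain", "mild", "high", True, "no"),
]
FIELDS = ("outlook", "temperature", "humidity", "windy", "play")
RECORDS = [dict(zip(FIELDS, row)) for row in ROWS]


def rule_metrics(antecedent, consequent, records=RECORDS):
    """Return (support, confidence) for the rule antecedent => consequent.

    Both arguments are dicts mapping an attribute name to its required value.
    Assumed definitions: support = P(antecedent and consequent),
    confidence = P(consequent | antecedent).
    """
    def matches(record, conditions):
        return all(record[key] == value for key, value in conditions.items())

    n_antecedent = sum(matches(r, antecedent) for r in records)
    n_both = sum(
        matches(r, antecedent) and matches(r, consequent) for r in records
    )
    support = Fraction(n_both, len(records))
    # Confidence is undefined when no record matches the antecedent.
    confidence = Fraction(n_both, n_antecedent) if n_antecedent else None
    return support, confidence


support, confidence = rule_metrics({"outlook": "overcast"}, {"play": "yes"})
print(support, confidence, support * confidence)  # prints: 2/7 1 2/7
```

A two-antecedent rule is expressed by adding a second key to the antecedent dict, for example `{"humidity": "normal", "windy": False}`. Exact fractions are used so that the 20% support and 75% confidence thresholds can be compared without rounding error.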