Data Mining and Knowledge Discovery (KSE525)
Assignment #3 (April 20, 2011)

1. [10 points] Build the decision tree for the following relational table. The last attribute is the class label. Use the information gain for attribute selection. Let's assume that multi-way split is always the best. You need to explain how you calculated the information gain in detail.

   ID code  Outlook   Temperature  Humidity  Windy  Play
   a        Sunny     Hot          High      False  No
   b        Sunny     Hot          High      True   No
   c        Overcast  Hot          High      False  Yes
   d        Rainy     Mild         High      False  Yes
   e        Rainy     Cool         Normal    False  Yes
   f        Rainy     Cool         Normal    True   No
   g        Overcast  Cool         Normal    True   Yes
   h        Sunny     Mild         High      False  No
   i        Sunny     Cool         Normal    False  Yes
   j        Rainy     Mild         Normal    False  Yes
   k        Sunny     Mild         Normal    True   Yes
   l        Overcast  Mild         High      True   Yes
   m        Overcast  Hot          Normal    False  Yes
   n        Rainy     Mild         High      True   No

2. [6 points] Why is naive Bayesian classification called "naive"? Briefly outline the major ideas of naive Bayesian classification.

3. [6 points] Discuss the advantages and disadvantages of lazy classification (e.g., k-nearest neighbor classification) in comparison with eager classification.

4. [8 points] What is overfitting in classification? Why does overfitting degrade classification accuracy? How can we avoid such overfitting?

5. [20 points] Download and install Weka (explained in class). Then, build the decision tree using J48 (C4.5) for the Wine data set in the UCI machine learning repository. Notice that you need to modify the format of the original data file as required by Weka. Copy and paste the text representation of the decision tree.
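For Question 1, the information-gain calculation at the root node can be checked with a short script. The sketch below is not part of the assignment. It assumes the usual definitions, entropy H(S) = -sum p_i log2 p_i over the class labels and Gain(S, A) = H(S) - sum_v (|S_v| / |S|) H(S_v) over the values v of attribute A, and it hard-codes the 14 rows of the table. The names `ROWS`, `ATTRIBUTES`, `entropy`, and `information_gain` are illustrative choices, not anything prescribed by the course.

```python
from collections import Counter
from math import log2

# Attribute names in column order; the last column of each row is the class (Play).
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Windy"]

# Rows a..n of the table in Question 1.
ROWS = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr_index):
    """Entropy of the class minus the weighted entropy after a multi-way split."""
    labels = [row[-1] for row in rows]
    partitions = {}
    for row in rows:
        # Group class labels by the value of the chosen attribute.
        partitions.setdefault(row[attr_index], []).append(row[-1])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"Gain({name}) = {gain:.3f}")
print("Split on:", best)
```

Under these definitions the class entropy is about 0.940 bits and Outlook has the largest gain (about 0.247), so it would be the root split. Applying `information_gain` again to the rows of each Outlook branch continues the tree, although the written answer still needs the step-by-step arithmetic the question asks for.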