Data Mining and Knowledge Discovery (KSE525)
Assignment #3 (April 20, 2011)
1. [10 points] Build the decision tree for the following relational table. The last attribute is the class
label. Use the information gain for attribute selection. Let's assume that multi-way split is always
the best. You need to explain how you calculated the information gain in detail.
ID code  Outlook   Temperature  Humidity  Windy  Play
a        Sunny     Hot          High      False  No
b        Sunny     Hot          High      True   No
c        Overcast  Hot          High      False  Yes
d        Rainy     Mild         High      False  Yes
e        Rainy     Cool         Normal    False  Yes
f        Rainy     Cool         Normal    True   No
g        Overcast  Cool         Normal    True   Yes
h        Sunny     Mild         High      False  No
i        Sunny     Cool         Normal    False  Yes
j        Rainy     Mild         Normal    False  Yes
k        Sunny     Mild         Normal    True   Yes
l        Overcast  Mild         High      True   Yes
m        Overcast  Hot          Normal    False  Yes
n        Rainy     Mild         High      True   No
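
The information gain asked for in question 1 can be checked mechanically. The following is a minimal sketch, not a prescribed solution: it assumes the usual definitions Info(D) = -sum(p_i * log2(p_i)) and Gain(A) = Info(D) - sum(|D_v|/|D| * Info(D_v)) with a multi-way split on every attribute value. The names ROWS, ATTRIBUTES, entropy and information_gain are illustrative and do not come from the assignment.

```python
from collections import Counter
from math import log2

# Rows a..n of the table: (Outlook, Temperature, Humidity, Windy, Play)
ROWS = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Windy"]


def entropy(labels):
    """Expected information (in bits) needed to classify a tuple."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr_index):
    """Entropy of the class label minus the weighted entropy after
    a multi-way split on the attribute at attr_index."""
    labels = [row[-1] for row in rows]
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr_index], []).append(row[-1])
    remainder = sum(
        len(part) / len(rows) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
for name, gain in gains.items():
    print(f"Gain({name}) = {gain:.3f}")
```

The attribute with the largest gain becomes the root. Applying the same function to each partition of ROWS, with the chosen attribute left out, gives the splits at the lower levels.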
2. [6 points] Why is naive Bayesian classification called “naive”? Briefly outline the major ideas of
naive Bayesian classification.
3. [6 points] Discuss the advantages and disadvantages of lazy classification (e.g., k-nearest neighbor
classification) in comparison with eager classification.
4. [8 points] What is overfitting in classification? Why does overfitting degrade classification
accuracy? How can we avoid such overfitting?
5. [20 points] Download and install Weka (explained in class). Then, build the decision tree using
J48 (C4.5) for the Wine data set in the UCI machine learning repository. Notice that you need to
modify the format of the original data file as required by Weka. Copy and paste the text
representation of the decision tree.
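
One possible way to do the format change in question 5 is to convert the data file to Weka's ARFF format. The sketch below rests on assumptions not stated in the assignment: that the original file is comma-separated with no header row, and that the class label is in the first column. The function csv_to_arff, the attribute names a1 and a2, and the sample rows are hypothetical placeholders, not the real Wine attributes or data.

```python
import csv
import io


def csv_to_arff(csv_text, relation, attribute_names, class_index=0):
    """Convert header-less CSV text to ARFF text.

    All non-class columns are declared numeric. The class column is
    declared nominal and moved to the last position, which is where
    Weka looks for the class by default.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    class_values = sorted({row[class_index] for row in rows})

    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(attribute_names):
        if i != class_index:
            lines.append(f"@attribute {name} numeric")
    lines.append("@attribute class {" + ",".join(class_values) + "}")
    lines += ["", "@data"]

    for row in rows:
        features = [v for i, v in enumerate(row) if i != class_index]
        lines.append(",".join(features + [row[class_index]]))
    return "\n".join(lines) + "\n"


# Made-up rows for illustration only: class label first, then two features.
sample = "1,14.2,1.7\n2,12.4,0.9\n"
arff = csv_to_arff(sample, "wine", ["class", "a1", "a2"])
print(arff)
```

J48 requires a nominal class attribute, which is why the class labels are listed as a set of values rather than declared numeric.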