Data Mining and Knowledge Discovery (KSE525)
Assignment #3 (April 19, 2012)
1. [10 points] Build the decision tree for the following relational table. The last attribute is the class label. Use the information gain for attribute selection. Let's assume that a multi-way split is always the best. You need to explain how you calculated the information gain in detail.
ID code  Outlook   Temperature  Humidity  Windy  Play
a        Sunny     Hot          High      False  No
b        Sunny     Hot          High      True   No
c        Overcast  Hot          High      False  Yes
d        Rainy     Mild         High      False  Yes
e        Rainy     Cool         Normal    False  Yes
f        Rainy     Cool         Normal    True   No
g        Overcast  Cool         Normal    True   Yes
h        Sunny     Mild         High      False  No
i        Sunny     Cool         Normal    False  Yes
j        Rainy     Mild         Normal    False  Yes
k        Sunny     Mild         Normal    True   Yes
l        Overcast  Mild         High      True   Yes
m        Overcast  Hot          Normal    False  Yes
n        Rainy     Mild         High      True   No
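For reference, here is a minimal Python sketch (not part of the assignment) that computes the information gain of a multi-way split for each attribute of the table above. The function names and layout are illustrative only; the entropy and gain formulas are the standard ones used in class.

```python
import math
from collections import Counter

# The 14 training examples from the table above:
# (Outlook, Temperature, Humidity, Windy, Play)
DATA = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Windy"]

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr_index):
    """Information gain of a multi-way split on the given attribute."""
    before = entropy([r[-1] for r in rows])
    # Partition the class labels by the attribute's value, then take the
    # weighted average of the entropies of the partitions.
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attr_index], []).append(r[-1])
    after = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return before - after

for i, name in enumerate(ATTRS):
    print(f"Gain({name}) = {info_gain(DATA, i):.3f}")
# Outlook should come out with the largest gain (about 0.247),
# so it would be chosen as the root split.
```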
2. [6 points] Classification can be used for automatic speech recognition, which is one of the main features of Apple Siri. Discuss what the class label is in this type of application. Then, briefly explain what classification techniques can be used for developing the application.
3. [6 points] Discuss the advantages and disadvantages of lazy classification (e.g., k-nearest neighbor classification) in comparison with eager classification.
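To make the lazy/eager contrast concrete, the following is a minimal k-nearest-neighbor sketch (class and method names are illustrative): training only memorizes the data, and all distance computation is deferred to prediction time, which is exactly the trade-off the question asks about.

```python
import math
from collections import Counter

class KNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Lazy: "training" is just storing the data (fast train, slow predict).
        self.X, self.y = X, y

    def predict(self, x):
        # All the real work happens here, once per query point.
        nearest = sorted(range(len(self.X)),
                         key=lambda i: math.dist(x, self.X[i]))[:self.k]
        votes = [self.y[i] for i in nearest]
        return Counter(votes).most_common(1)[0][0]

clf = KNN(k=3)
clf.fit([(1, 1), (1, 2), (5, 5), (6, 5)], ["A", "A", "B", "B"])
print(clf.predict((2, 1)))  # -> "A"
```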
4. [8 points] A notable problem of the information gain is that it prefers attributes with a large number of distinct values. Explain why the information gain suffers from this problem and why the gain ratio or the Gini index does not.
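The ID code column of the table in Problem 1 is the classic illustration: every value is unique, so a split on it yields 14 pure single-row partitions and the maximal information gain, yet it is useless for prediction. A small sketch of the arithmetic (values follow from the 9 Yes / 5 No class distribution above):

```python
import math

# Class distribution of Play in the table above: 9 Yes, 5 No.
h_play = -(9/14) * math.log2(9/14) - (5/14) * math.log2(5/14)  # about 0.940

# Splitting on ID code creates 14 single-row (pure) partitions, so the
# conditional entropy is 0 and the gain equals the full class entropy.
gain_id = h_play - 0.0

# The gain ratio divides by the split information, which is maximal
# for a 14-way uniform split: log2(14), about 3.807.
split_info = math.log2(14)
print(f"Gain(ID) = {gain_id:.3f}")
print(f"GainRatio(ID) = {gain_id / split_info:.3f}")  # about 0.247
```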
5. [20 points] Download and install Weka (explained in class). Then, build the decision tree using J48 (C4.5) for the Wine data set in the UCI machine learning repository. Notice that you need to modify the format of the original data file as required by Weka. Copy and paste the text representation of the decision tree.
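Weka's native format is ARFF, while the UCI wine.data file is a plain CSV whose first column is the class label, so the conversion amounts to adding a header. A hypothetical, abbreviated sketch of what the converted file might look like (attribute names are illustrative; the Wine data set has 13 numeric attributes in total):

```
@relation wine

@attribute class {1,2,3}        % cultivar label, first column of wine.data
@attribute alcohol numeric
@attribute malic_acid numeric
% ... declare the remaining 11 numeric attributes here ...

@data
1, 14.23, 1.71
1, 13.20, 1.78
```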