DSCI 4520/5240 DBDSS (Data Mining) - Lecture Notes
Lecture 5: Decision Trees II

Some slide material taken from: Witten & Frank 2000, Olson & Shi 2007, de Ville 2006, SAS Education 2005
A simple example: Weather Data
Outlook   Temp  Humidity  Windy  Play?
-------   ----  --------  -----  -----
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No
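To experiment with these examples in code, the table above can be written out as a small Python structure. The later sketches in these notes assume this (hypothetical) weather list and ATTRIBUTES list; the names and the "Play" key are editorial choices, not part of the original slides.

# The 14 weather instances from the table above, one dict per row.
weather = [
    {"Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Windy": False, "Play": "No"},
    {"Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Windy": True,  "Play": "No"},
    {"Outlook": "Overcast", "Temp": "Hot",  "Humidity": "High",   "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",    "Temp": "Mild", "Humidity": "High",   "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",    "Temp": "Cool", "Humidity": "Normal", "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",    "Temp": "Cool", "Humidity": "Normal", "Windy": True,  "Play": "No"},
    {"Outlook": "Overcast", "Temp": "Cool", "Humidity": "Normal", "Windy": True,  "Play": "Yes"},
    {"Outlook": "Sunny",    "Temp": "Mild", "Humidity": "High",   "Windy": False, "Play": "No"},
    {"Outlook": "Sunny",    "Temp": "Cool", "Humidity": "Normal", "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",    "Temp": "Mild", "Humidity": "Normal", "Windy": False, "Play": "Yes"},
    {"Outlook": "Sunny",    "Temp": "Mild", "Humidity": "Normal", "Windy": True,  "Play": "Yes"},
    {"Outlook": "Overcast", "Temp": "Mild", "Humidity": "High",   "Windy": True,  "Play": "Yes"},
    {"Outlook": "Overcast", "Temp": "Hot",  "Humidity": "Normal", "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",    "Temp": "Mild", "Humidity": "High",   "Windy": True,  "Play": "No"},
]
ATTRIBUTES = ["Outlook", "Temp", "Humidity", "Windy"]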
Decision tree for the weather data

[Figure: decision tree for the weather data]
Outlook = sunny    → Humidity = high: no | Humidity = normal: yes
Outlook = overcast → yes
Outlook = rainy    → Windy = false: yes | Windy = true: no

Pseudo-code for 1R

For each attribute,
    For each value of the attribute, make a rule as follows:
        count how often each class appears
        find the most frequent class
        make the rule assign that class to this attribute-value
    Calculate the error rate of the rules
Choose the rules with the smallest error rate

Let's apply 1R to the weather data: consider the first of the 4 attributes (outlook, temp, humidity, windy), namely outlook. Consider all of its values (sunny, overcast, rainy) and make 3 corresponding rules. Continue until you get all 4 sets of rules (see the Python sketch under "Evaluating the Weather Attributes in 1R" below).
Evaluating the Weather Attributes in 1R
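As a rough sketch of how the rules for each weather attribute can be built and scored by 1R, the following Python follows the pseudo-code above and assumes the weather list and ATTRIBUTES from the earlier sketch; the function name one_r and its return convention are invented here.

from collections import Counter

def one_r(data, attributes, target="Play"):
    """1R: build one rule set per attribute; keep the set with the fewest errors."""
    best = None
    for attr in attributes:                                  # For each attribute,
        rules, errors = {}, 0
        for value in {row[attr] for row in data}:            # for each value of the attribute:
            counts = Counter(row[target] for row in data if row[attr] == value)
            majority, hits = counts.most_common(1)[0]        # most frequent class for this value
            rules[value] = majority                          # the rule assigns that class
            errors += sum(counts.values()) - hits            # misclassified rows for this value
        if best is None or errors < best[2]:                 # keep the rule set with fewest errors
            best = (attr, rules, errors)
    return best

attr, rules, errors = one_r(weather, ATTRIBUTES)
print(attr, rules, f"({errors}/{len(weather)} errors)")

With the weather data this selects Outlook (sunny → no, overcast → yes, rainy → yes) at 4/14 errors; Humidity also reaches 4/14, and the tie is broken here simply by attribute order.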
Discretization in 1R
Consider continuous Temperature data, after sorting the values in ascending order:

  65  65  68  69  70  71  72  72  75  75  80  81  83  85
  Yes No  Yes Yes Yes No  No  Yes Yes Yes No  Yes Yes No

One way to discretize temperature is to place breakpoints wherever the class changes:

  Yes | No | Yes Yes Yes | No No | Yes Yes Yes | No | Yes Yes | No

To avoid overfitting, 1R instead requires that the majority class of each partition cover at least 3 observations, extending a partition beyond that only through a "run" of the same class:

  Yes No Yes Yes Yes | No No Yes Yes Yes | No Yes Yes No (*)

(*) indicates a random choice between two equally likely outcomes (the last partition has 2 Yes and 2 No).

If adjacent partitions have the same majority class, the partitions are merged:

  Yes No Yes Yes Yes No No Yes Yes Yes | No Yes Yes No

The final discretization leads to the rule set:

  IF temperature <= 77.5 THEN Yes
  IF temperature >  77.5 THEN No
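The partitioning just described can be sketched in Python as follows, under the reading that each partition must contain at least 3 observations of its majority class, may absorb a trailing run of that class, and is merged with a neighbour that shares the same majority. The function name discretize_1r and the tie-breaking details are illustrative only.

from collections import Counter

def discretize_1r(values, classes, min_majority=3):
    """Partition sorted values so each partition's majority class appears at least
    min_majority times (extending through a run of that class), then merge adjacent
    partitions sharing a majority class.  Returns (breakpoints, majority classes)."""
    partitions, i, n = [], 0, len(values)
    while i < n:
        counts, j = Counter(), i
        while j < n:
            counts[classes[j]] += 1
            j += 1
            majority, hits = counts.most_common(1)[0]
            if hits >= min_majority:
                while j < n and classes[j] == majority:    # absorb a run of the majority class
                    counts[classes[j]] += 1
                    j += 1
                break
        partitions.append((j, counts.most_common(1)[0][0]))   # (end index, majority class)
        i = j
    merged = [partitions[0]]
    for end, maj in partitions[1:]:
        if maj == merged[-1][1]:                  # same majority as previous partition: merge
            merged[-1] = (end, maj)
        else:
            merged.append((end, maj))
    breaks = [(values[end - 1] + values[end]) / 2 for end, _ in merged[:-1]]
    return breaks, [maj for _, maj in merged]

temps  = [65, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
labels = ["Yes","No","Yes","Yes","Yes","No","No","Yes","Yes","Yes","No","Yes","Yes","No"]
print(discretize_1r(temps, labels))   # ([77.5], ['Yes', 'No'])  ->  <= 77.5: Yes, > 77.5: No

On the temperature data above this returns a single breakpoint at 77.5 with majority classes [Yes, No], i.e. the rule set shown; the 2-2 tie in the last partition is resolved arbitrarily here, which corresponds to the (*) random choice on the slide.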
Which attribute to select?
[Figure: the weather data split by each of the four candidate attributes, with the yes/no class counts on each branch]
Outlook:      sunny (2 yes, 3 no)    overcast (4 yes, 0 no)    rainy (3 yes, 2 no)
Temperature:  hot (2 yes, 2 no)      mild (4 yes, 2 no)        cool (3 yes, 1 no)
Humidity:     high (3 yes, 4 no)     normal (6 yes, 1 no)
Windy:        false (6 yes, 2 no)    true (3 yes, 3 no)

A criterion for attribute selection

• Which is the best attribute? The one that will result in the smallest tree.
• Heuristic: choose the attribute that produces the "purest" nodes!
• Popular impurity criterion: Information. This is the extra information needed to classify an instance. It takes a low value for pure nodes and a high value for impure nodes.
• We can then compare a tree before the split and after the split using Information Gain = Info (before) – Info (after).
• Information Gain increases with the average purity of the subsets that an attribute produces.
• Strategy: choose the attribute that results in the greatest information gain.

Computing Information

• Information is measured in bits.
• Given a probability distribution, the information required to predict an event is the distribution's entropy.
• Entropy gives the additional required information (i.e., the information deficit) in bits. This can involve fractions of bits!
• The negative sign in the entropy formula is needed to convert all negative logs back to positive values.

Formula for computing the entropy:
Entropy (p1, p2, …, pn) = –p1 log p1 – p2 log p2 – … – pn log pn
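As a quick check, the formula can be coded in a few lines of Python (log base 2, so the result is in bits; entropy is an illustrative helper name, not from the lecture). It reproduces two values used in the slides that follow:

from math import log2

def entropy(*probs):
    """Entropy(p1, …, pn) = –p1 log2 p1 – … – pn log2 pn, taking 0·log 0 = 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(f"{entropy(9/14, 5/14):.3f}")   # 0.940 bits: info([9,5]) for the full weather data
print(f"{entropy(2/5, 3/5):.3f}")     # 0.971 bits: info([2,3]) for Outlook = sunny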
Continuing to split

[Figure: within the Outlook = sunny subset, the three candidate second-level splits with their yes/no class counts]
Temperature:  hot (0 yes, 2 no)    mild (1 yes, 1 no)    cool (1 yes, 0 no)
Windy:        false (1 yes, 2 no)  true (1 yes, 1 no)
Humidity:     high (0 yes, 3 no)   normal (2 yes, 0 no)

Gain (Temperature) = 0.571 bits
Gain (Humidity) = 0.971 bits
Gain (Windy) = 0.020 bits
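As a numerical check of these three figures, the gains within the Outlook = sunny subset (2 yes, 3 no, i.e. 0.971 bits before any further split) can be recomputed with the entropy helper sketched above, reading the branch counts off the figure; gain is again an illustrative helper, not part of the original slides.

def gain(before, branches):
    """Information Gain = info(before) – weighted average info over the branches."""
    total = sum(before)
    info_before = entropy(*(c / total for c in before))
    info_after = sum(sum(b) / total * entropy(*(c / sum(b) for c in b)) for b in branches)
    return info_before - info_after

sunny = [2, 3]                                          # 2 yes, 3 no under Outlook = sunny
print(f"{gain(sunny, [[0, 2], [1, 1], [1, 0]]):.3f}")   # Temperature: 0.571
print(f"{gain(sunny, [[0, 3], [2, 0]]):.3f}")           # Humidity:    0.971
print(f"{gain(sunny, [[1, 2], [1, 1]]):.3f}")           # Windy:       0.020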
Weather example: attribute “outlook”
[Figure: the Outlook split of the weather data - sunny: 2 yes, 3 no; overcast: 4 yes, 0 no; rainy: 3 yes, 2 no]

• Outlook = "Sunny":
  Info([2,3]) = entropy(2/5, 3/5) = –2/5 log(2/5) – 3/5 log(3/5) = 0.971 bits
• Outlook = "Overcast":
  Info([4,0]) = entropy(1, 0) = –1 log(1) – 0 log(0) = 0 bits (by definition, 0 log 0 = 0)
• Outlook = "Rainy":
  Info([3,2]) = entropy(3/5, 2/5) = –3/5 log(3/5) – 2/5 log(2/5) = 0.971 bits

Expected Information for attribute Outlook:
Info([2,3], [4,0], [3,2]) = (5/14)×0.971 + (4/14)×0 + (5/14)×0.971 = 0.693 bits
Computing the Information Gain
• Information Gain = Information Before – Information After
  Gain (Outlook) = info([9,5]) – info([2,3], [4,0], [3,2]) = 0.940 – 0.693 = 0.247 bits
• Information Gain for the attributes of the Weather Data:
  Gain (Outlook) = 0.247 bits
  Gain (Temperature) = 0.029 bits
  Gain (Humidity) = 0.152 bits
  Gain (Windy) = 0.048 bits
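Putting the pieces together, a short sketch that reuses the hypothetical weather list, ATTRIBUTES, and the entropy/gain helpers from the earlier sketches reproduces the four gains listed above:

from collections import Counter

def class_counts(rows, target="Play"):
    """Return [yes, no] counts for a subset of the weather data."""
    c = Counter(row[target] for row in rows)
    return [c["Yes"], c["No"]]

for attr in ATTRIBUTES:
    branches = [class_counts([r for r in weather if r[attr] == v])
                for v in {row[attr] for row in weather}]
    print(f"Gain ({attr}) = {gain(class_counts(weather), branches):.3f} bits")
# Gain (Outlook) = 0.247 bits, Gain (Temp) = 0.029 bits,
# Gain (Humidity) = 0.152 bits, Gain (Windy) = 0.048 bits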
© 2012 University of North Texas