Decision Tree
Prem Junsawang
Department of Statistics
Faculty of Science
Khon Kaen University
Overview
Predictive Models
The process by which a model is
created or chosen to try to best
predict the probability of an
outcome (Classification)
Overview
Classification
Overview
Classification Concept
• Given a training set: each record contains a set of attributes (features, parameters, variables) and a class label (target)
• Find a model for the class label as a function of the set of attributes
• Goal: predict the class labels of unseen records as accurately as possible
Overview
Test set: used to determine the accuracy of a model
Examples of classification techniques
• Decision tree
• Genetic Algorithm
• Neural Network
• Bayesian Classifier
• K-nearest neighbor
Overview
Decision Tree
Tree-shaped structure that represents a set of decisions used to generate classification rules for a dataset.
Genetic Algorithms
Optimization techniques that use concepts from evolution, such as selection, crossover and mutation.
Neural Network
Nonlinear predictive models that learn through training and resemble biological neural networks.
Overview
Bayesian Classifier
Classification based on Bayes' theorem.
K-nearest neighbor
Performs prediction by finding the records most similar to the record to be predicted and using their class labels.
Example: determine whether an e-mail message is spam or not spam.
Overview
Learning algorithm
• Identifies a model that best fits the relationship between the attribute set and the class label
Examples:
• Classify credit card transactions as legitimate or fraudulent
Overview
Classification Process
Training set (Tid 1-10), used to learn the model:

Tid  Attrib1  Attrib2  Attrib3  Class
 1   Yes      Large    125K     No
 2   No       Medium   100K     No
 3   No       Small     70K     No
 4   Yes      Medium   120K     No
 5   No       Large     95K     Yes
 6   No       Medium    60K     No
 7   Yes      Large    220K     No
 8   No       Small     85K     Yes
 9   No       Medium    75K     No
10   No       Small     90K     Yes

        -> Learn Model

Test set (Tid 11-15), unlabeled records to classify:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small     55K     ?
12   Yes      Medium    80K     ?
13   Yes      Large    110K     ?
14   No       Small     95K     ?
15   No       Large     67K     ?

        -> Apply Model
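A minimal sketch of this learn-then-apply workflow in Python, assuming scikit-learn is available; the numeric encoding of the categorical attributes is purely illustrative and not part of the original slides.

```python
# Sketch: learn a decision tree from the training records (Tid 1-10) and
# apply it to the unlabeled test records (Tid 11-15).
from sklearn.tree import DecisionTreeClassifier

# Hand-encoded attributes: Attrib1 (Yes=1, No=0), Attrib2 (Small=0, Medium=1,
# Large=2), Attrib3 (income in thousands); Class (Yes=1, No=0).
X_train = [
    [1, 2, 125], [0, 1, 100], [0, 0, 70], [1, 1, 120], [0, 2, 95],
    [0, 1, 60], [1, 2, 220], [0, 0, 85], [0, 1, 75], [0, 0, 90],
]
y_train = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1]

# Records 11-15, whose class labels are unknown ("?").
X_test = [[0, 0, 55], [1, 1, 80], [1, 2, 110], [0, 0, 95], [0, 2, 67]]

model = DecisionTreeClassifier(criterion="gini", random_state=0)
model.fit(X_train, y_train)      # "Learn Model" step
print(model.predict(X_test))     # "Apply Model" step
```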
Decision Tree Induction
Example of a Decision Tree
Decision Tree Induction
Another Example of a Decision Tree
Decision Tree Induction
Decision Tree Classification Task
(Same training set, Tid 1-10, and test set, Tid 11-15, as in the Classification Process figure: a model is learned from the labeled records and then applied to the unlabeled ones.)
Decision Tree Induction
Apply Model to Test Data
(Each test record is routed from the root of the tree down to a leaf, following one attribute test condition at a time; the leaf's label is the predicted class.)
Decision Tree Induction
Decision Tree Classification Task
(Same training and test set tables as shown earlier.)
Decision Tree Induction
Many Algorithms:
• Hunt’s Algorithm
• CART
• ID3 and C4.5
How to build a Decision Tree
• Hunt’s Algorithm: the basis of many existing decision tree induction algorithms, including ID3, C4.5 and CART
Decision Tree Induction
Hunt’s Algorithm
• Let Dt be the set of training records associated with node t, and let y = {y1, y2, …, yc} be the set of class labels
1. If all records in Dt belong to the same class yt, then t is a leaf node labeled yt
2. If Dt contains records that belong to more than one class, an attribute test condition is selected to partition the records into smaller subsets (a minimal sketch of this recursion follows below)
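A minimal Python sketch of the two rules above, assuming records are (attributes, label) pairs and that a choose_split helper (not from the slides) supplies the attribute test condition and the resulting partition.

```python
from collections import Counter

def hunt(records, choose_split):
    """Skeleton of Hunt's algorithm.

    records      : list of (attributes, class_label) pairs -- the set Dt.
    choose_split : picks an attribute test condition and partitions Dt
                   into smaller subsets, or returns None if no split helps.
    """
    labels = [label for _, label in records]
    # Rule 1: all records in Dt belong to the same class -> leaf node.
    if len(set(labels)) == 1:
        return {"leaf": labels[0]}
    split = choose_split(records)
    # Stopping case: identical attribute values (no useful split) -> majority leaf.
    if split is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    # Rule 2: partition the records and recurse on each smaller subset.
    attribute, partitions = split        # e.g. ("age", {"<=30": [...], ...})
    return {"test": attribute,
            "children": {value: hunt(subset, choose_split)
                         for value, subset in partitions.items()}}
```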
Decision Tree Induction
How the algorithm works
Decision Tree Induction
Design Issues of DT Induction
How should the training records be split?
• A measure is needed to evaluate the goodness of each test condition
When should the splitting procedure stop?
• Stop when all records belong to the same class
• Stop when the records have identical attribute values
Decision Tree Induction
Methods for Expressing Attribute Test Conditions
1. Binary attributes
2. Nominal attributes
3. Ordinal attributes
4. Continuous attributes
Decision Tree Induction
Binary Attributes
Decision Tree Induction
Nominal Attributes
Decision Tree Induction
Ordinal Attributes
Decision Tree Induction
Continuous Attributes
Decision Tree Induction
Splitting Based on Continuous Attributes
Discretization: form an ordinal attribute
• Static: discretize once at the beginning
• Dynamic: ranges can be determined by equal-interval bucketing, equal-frequency bucketing, or clustering
Binary decision: (A < v) or (A >= v)
• Consider all possible splits and find the best cut point (see the sketch below)
• More computationally expensive
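A sketch of the binary-decision option in Python: sort the attribute values, consider every candidate cut point v, and keep the one with the lowest weighted Gini. The function names are illustrative; the usage example reuses the annual-income column and default labels from the loan data set shown at the end of these slides.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_binary_split(values, labels):
    """Try every cut point v between consecutive sorted values and return
    the split (A < v) with the lowest weighted impurity of the two children."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_impurity = None, float("inf")
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue                            # no cut between equal values
        v = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [y for x, y in pairs if x < v]
        right = [y for x, y in pairs if x >= v]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best_impurity:
            best_cut, best_impurity = v, weighted
    return best_cut, best_impurity

# Annual income (in K) and Defaulted Borrower labels, Tid 1-10 of the loan data.
incomes = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]
defaulted = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]
print(best_binary_split(incomes, defaulted))    # -> (97.5, 0.3)
```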
Decision Tree Induction
Continuous Attributes
Decision Tree Induction
How to Determine the Best Split
• Nodes with a homogeneous class distribution are preferred
• A measure of node impurity is needed
(A non-homogeneous class distribution means a high degree of impurity; a homogeneous one means a low degree of impurity.)
Decision Tree Induction
Decision Tree Induction
Measure of Impurity
Let p(i | t) denote the fraction of records
belonging to class i at a given node t
Entropy(t) = - sum_{i=1}^{C} p(i | t) log2 p(i | t)        (1)

Gini(t) = 1 - sum_{i=1}^{C} [p(i | t)]^2                   (2)

Classification error(t) = 1 - max_i [p(i | t)]             (3)

where C is the number of classes and 0 log2 0 = 0 in the entropy calculation.
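Equations (1)-(3) written out as small Python functions, with the 0 log2 0 = 0 convention handled explicitly; the argument p is the list of class fractions p(i | t) at a node. The example call uses the gender node from the next slide.

```python
import math

def entropy(p):
    """Equation (1): -sum_i p_i * log2(p_i), with 0 log2 0 treated as 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def gini(p):
    """Equation (2): 1 - sum_i p_i^2."""
    return 1.0 - sum(pi ** 2 for pi in p)

def classification_error(p):
    """Equation (3): 1 - max_i p_i."""
    return 1.0 - max(p)

# Example: the gender node on the next slide, p = [10/17, 7/17].
p = [10 / 17, 7 / 17]
print(round(gini(p), 4), round(entropy(p), 4), round(classification_error(p), 4))
# -> 0.4844 0.9774 0.4118
```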
Decision Tree Induction
Measure of Impurity
Example: p(0 | gender) = 10/17, p(1 | gender) = 7/17
• Gini(gender) = 1 - [(10/17)^2 + (7/17)^2] = 0.4844
• Entropy(gender) = -[(10/17) log2(10/17) + (7/17) log2(7/17)] = 0.9774
• Error(gender) = 1 - max{10/17, 7/17} = 1 - 10/17 = 0.4118
Decision Tree Induction
Exercise
Car Type: a three-way split into Family, Sport and Luxury.
Class counts in the three branches: {C0: 1, C1: 3}, {C0: 8, C1: 0}, {C0: 1, C1: 7}.
Compute the impurity of this node (see the worked answer at the end of the slides).
Decision Tree Induction
Gini Index
Decision Tree Induction
Decision Tree Induction
Table: training data tuples from the AllElectronics customer database
Decision Tree Induction
The Gini index is used as the impurity measure.

Gini(root) = 1 - [p(0 | root)^2 + p(1 | root)^2]
           = 1 - [(5/14)^2 + (9/14)^2] = 0.4592

Gini(root) is the parent impurity, Gini(parent), used when evaluating every candidate split (age, income, student and credit_rating).
Decision Tree Induction
Age: <=30 (v1), 31-40 (v2) and >40 (v3)

Gini(v1) = 1 - [p(0 | v1)^2 + p(1 | v1)^2] = 1 - [(3/5)^2 + (2/5)^2] = 0.48
Gini(v2) = 1 - [p(0 | v2)^2 + p(1 | v2)^2] = 1 - [(0/4)^2 + (4/4)^2] = 0
Gini(v3) = 1 - [p(0 | v3)^2 + p(1 | v3)^2] = 1 - [(2/5)^2 + (3/5)^2] = 0.48
Decision Tree Induction
Gain(age) = Gini(parent) - [N(v1)/N * Gini(v1) + N(v2)/N * Gini(v2) + N(v3)/N * Gini(v3)]
          = 0.4592 - [(5/14)(0.48) + (4/14)(0) + (5/14)(0.48)]
          = 0.1163
Decision Tree Induction
Income: high (h), medium (m) and low (l)

Gini(h) = 1 - [p(0 | h)^2 + p(1 | h)^2] = 1 - [(2/4)^2 + (2/4)^2] = 0.5
Gini(m) = 1 - [p(0 | m)^2 + p(1 | m)^2] = 1 - [(2/6)^2 + (4/6)^2] = 0.44
Gini(l) = 1 - [p(0 | l)^2 + p(1 | l)^2] = 1 - [(1/4)^2 + (3/4)^2] = 0.38
Decision Tree Induction
Gain(income) = Gini(parent) - [N(h)/N * Gini(h) + N(m)/N * Gini(m) + N(l)/N * Gini(l)]
             = 0.4592 - [(4/14)(0.5) + (6/14)(0.44) + (4/14)(0.38)]
             = 0.0192
Decision Tree Induction
Student: no and yes

Gini(no)  = 1 - [p(0 | no)^2 + p(1 | no)^2]   = 1 - [(4/7)^2 + (3/7)^2] = 0.49
Gini(yes) = 1 - [p(0 | yes)^2 + p(1 | yes)^2] = 1 - [(1/7)^2 + (6/7)^2] = 0.24
Decision Tree Induction
Gain(student) = Gini(parent) - [N(no)/N * Gini(no) + N(yes)/N * Gini(yes)]
              = 0.4592 - [(7/14)(0.49) + (7/14)(0.24)]
              = 0.0942
Decision Tree Induction
Credit rating: fair (f) and excellent (e)

Gini(f) = 1 - [p(0 | f)^2 + p(1 | f)^2] = 1 - [(2/8)^2 + (6/8)^2] = 0.38
Gini(e) = 1 - [p(0 | e)^2 + p(1 | e)^2] = 1 - [(3/6)^2 + (3/6)^2] = 0.5
Decision Tree Induction
Gain(credit_rating) = Gini(parent) - [N(f)/N * Gini(f) + N(e)/N * Gini(e)]
                    = 0.4592 - [(8/14)(0.38) + (6/14)(0.5)]
                    = 0.0278
Decision Tree Induction
Gain summary
Gain(age) = 0.1163
Gain(income) = 0.0192
Gain(student) = 0.0942
Gain(credit_rating) = 0.0278
Age gives the largest gain, so it is chosen as the splitting attribute at the root; the sketch below re-checks these values.
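The gains above can be re-checked mechanically. The sketch below is not from the slides; it takes the (no-buy, buy) class counts for each attribute value, as read from the worked example, and computes the Gini gain with exact arithmetic, so the printed values can differ slightly from the hand calculations, which round the child Gini values to two decimals.

```python
def gini_from_counts(counts):
    """Gini impurity of a node given its per-class record counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_gain(parent_counts, partitions):
    """Gain = Gini(parent) - weighted sum of the children's Gini values."""
    n = sum(parent_counts)
    children = sum(sum(part) / n * gini_from_counts(part) for part in partitions)
    return gini_from_counts(parent_counts) - children

parent = (5, 9)                                   # (no-buy, buy) at the root
splits = {
    "age":           [(3, 2), (0, 4), (2, 3)],    # <=30, 31-40, >40
    "income":        [(2, 2), (2, 4), (1, 3)],    # high, medium, low
    "student":       [(4, 3), (1, 6)],            # no, yes
    "credit_rating": [(2, 6), (3, 3)],            # fair, excellent
}
for name, parts in splits.items():
    print(name, round(gini_gain(parent, parts), 4))
# age has the largest gain, so it is chosen as the root split.
```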
Decision Tree Induction
Decision Tree Induction
Final Decision Tree
Extract Rules
IF age = "<=30" AND student = "no" THEN buys_computer = "no"
IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
IF age = "31-40" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"

One rule is read off for each root-to-leaf path, as sketched below.
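A small sketch of that enumeration, assuming the nested-dict tree format used in the Hunt's-algorithm sketch earlier; the hand-built final_tree mirrors the final decision tree above.

```python
def extract_rules(tree, conditions=()):
    """Print one IF-THEN rule per root-to-leaf path of a nested-dict tree."""
    if "leaf" in tree:
        lhs = " AND ".join(f'{attr} = "{val}"' for attr, val in conditions) or "TRUE"
        print(f'IF {lhs} THEN buys_computer = "{tree["leaf"]}"')
        return
    for value, child in tree["children"].items():
        extract_rules(child, conditions + ((tree["test"], value),))

# Hand-built tree matching the final decision tree above.
final_tree = {
    "test": "age",
    "children": {
        "<=30": {"test": "student",
                 "children": {"no": {"leaf": "no"}, "yes": {"leaf": "yes"}}},
        "31-40": {"leaf": "yes"},
        ">40": {"test": "credit_rating",
                "children": {"excellent": {"leaf": "no"}, "fair": {"leaf": "yes"}}},
    },
}
extract_rules(final_tree)   # prints the five rules listed above
```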
Decision Tree Induction
How to prune a decision tree
1. Prepruning approach
2. Postpruning approach
Prepruning
Measures such as information gain can be
used to assess the goodness of a split
If partitioning the samples at a node would
result in a split that falls below a prespecified
threshold, then further partitioning of the
given subset is halted
Choosing an appropriate threshold is difficult:
• high thresholds => oversimplified trees
• low thresholds => overly complicated trees
Postpruning
Branches are removed from a fully grown tree.
The cost-complexity pruning algorithm is an example of the postpruning approach.
Alternatively, prepruning and postpruning may be combined (a sketch of both follows below).
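A sketch of both ideas with scikit-learn (assumed available; the synthetic data merely stands in for a real training set): min_impurity_decrease acts as a prepruning threshold, while ccp_alpha applies cost-complexity postpruning to a fully grown tree.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a labeled training set.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Prepruning: halt a split whose impurity decrease falls below a threshold.
prepruned = DecisionTreeClassifier(min_impurity_decrease=0.01, random_state=0).fit(X, y)

# Postpruning: grow the full tree, then prune it by cost complexity.
full = DecisionTreeClassifier(random_state=0).fit(X, y)
path = full.cost_complexity_pruning_path(X, y)             # candidate alpha values
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]         # pick a mid-range alpha
postpruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X, y)

print(full.get_n_leaves(), prepruned.get_n_leaves(), postpruned.get_n_leaves())
```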
Characteristic of DT
Nonparametric approach for building
classification models
Robust to the presence of noise in the data set
The presence of redundant attributes does not adversely affect the accuracy of decision trees
Evaluating the Performance of a Classifier
Holdout Method
Random Subsampling
Cross-validation
Holdout Method
The original data set of labeled examples is partitioned into two disjoint sets, called the training set and the test set (e.g., 50-50, or two-thirds for training and one-third for testing); a sketch follows below.
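A sketch of the holdout method with scikit-learn, on synthetic stand-in data; test_size=1/3 gives the two-thirds / one-third split mentioned above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)   # stand-in labeled data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))          # accuracy on the test set
```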
Random Subsampling
The holdout method can be repeated several times to improve the estimate of a classifier's performance; this is known as random subsampling.
The overall accuracy is the average accuracy over all iterations.
K-Fold Cross-validation
The dataset is partitioned into k
equal-sized parts.
One of the parts is used for testing, while the rest are used for training.
This procedure is repeated k times so that each partition is used for testing exactly once, and the total error is given by summing the errors of all k runs (see the sketch below).
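A sketch of k-fold cross-validation with scikit-learn on the same kind of stand-in data; cross_val_score performs the k train/test repetitions, and the mean of the fold accuracies estimates the classifier's performance.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)   # stand-in labeled data
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print(scores, scores.mean())   # per-fold accuracies and their average (k = 10)
```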
References
P. N. Tan, M. Steinbach, V. Kumar, "Introduction to Data Mining", Pearson Addison-Wesley.
Course materials for KNOWLEDGE / DATA MINING by Asst. Prof. Dr. จันทรเจา มงคลนาวิน.
Decision Tree Induction
How the algorithm works
Tid  Home Owner  Marital Status  Annual Income  Defaulted Borrower
 1   Yes         Single          125K           No
 2   No          Married         100K           No
 3   No          Single           70K           No
 4   Yes         Married         120K           No
 5   No          Divorced         95K           Yes
 6   No          Married          60K           No
 7   Yes         Divorced        220K           No
 8   No          Single           85K           Yes
 9   No          Married          75K           No
10   No          Single           90K           Yes
Decision Tree Induction
How the algorithm works
Decision Tree Induction
Exercise (worked answer)
Car Type: a three-way split into Family, Sport and Luxury, with class counts {C0: 1, C1: 3}, {C0: 8, C1: 0} and {C0: 1, C1: 7} in the three branches.
At the parent node: p(0 | Car) = 10/20, p(1 | Car) = 10/20
• Gini(Car) = 1 - [(10/20)^2 + (10/20)^2] = 0.5
• Entropy(Car) = -[(10/20) log2(10/20) + (10/20) log2(10/20)] = 1
• Error(Car) = 1 - max{10/20, 10/20} = 1 - 10/20 = 0.5