Decision Tree
Prem Junsawang
Department of Statistics
Faculty of Science
Khon Kaen University
Overview
 Predictive Models
 The process by which a model is created or chosen to best predict the probability of an outcome (classification)
Overview
 Classification
Overview
 Classification Concept
• Given a training set: each record contains a set of attributes (features, parameters, variables) and a class label (target)
• Find a model that expresses the class label as a function of the set of attributes
• The goal is to predict the class labels of unseen records as accurately as possible
Overview
 Test set: used to determine the accuracy of a model
 Examples of classification techniques
• Decision tree
• Genetic Algorithm
• Neural Network
• Bayesian Classifier
• K-nearest neighbor
Overview
 Decision Tree
 A tree-shaped structure that represents a set of decisions and generates rules for classifying a dataset
 Genetic Algorithms
 Optimization techniques that use concepts of evolution such as selection, crossover and mutation
 Neural Network
 Nonlinear predictive models that learn through training and resemble biological neural networks
Overview
 Bayesian Classifier
 Based on Bayes' theorem
 K-nearest neighbor
 Performs prediction by finding the values of the records most similar to the record to be predicted
 Example: classifying an email as spam or not spam
Overview
 Learning algorithm
• Identifies a model that best fits the relationship between the set of attributes and the class label
 Examples:
• Classifying credit card transactions as legitimate or fraudulent
Overview
 Classification Process
Training set (Learn Model):

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test set (Apply Model):

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
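The learn/apply workflow in the tables above can be reproduced with any off-the-shelf decision tree implementation. Below is a minimal sketch, assuming scikit-learn is available; the library choice and the encoding step are illustrative, not part of the slides.

```python
# Minimal sketch of the "learn model / apply model" workflow shown above,
# using scikit-learn (an assumed library, not prescribed by the slides).
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import OrdinalEncoder

# Training records (Tid 1-10): Attrib1, Attrib2, Attrib3 (in thousands), Class
train_X = [["Yes", "Large", 125], ["No", "Medium", 100], ["No", "Small", 70],
           ["Yes", "Medium", 120], ["No", "Large", 95], ["No", "Medium", 60],
           ["Yes", "Large", 220], ["No", "Small", 85], ["No", "Medium", 75],
           ["No", "Small", 90]]
train_y = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

# Test records (Tid 11-15) whose class labels are unknown ("?")
test_X = [["No", "Small", 55], ["Yes", "Medium", 80], ["Yes", "Large", 110],
          ["No", "Small", 95], ["No", "Large", 67]]

# Encode the two categorical attributes as numbers; the income column is numeric already.
enc = OrdinalEncoder()
train_cat = enc.fit_transform([row[:2] for row in train_X])
test_cat = enc.transform([row[:2] for row in test_X])
train_num = [list(c) + [row[2]] for c, row in zip(train_cat, train_X)]
test_num = [list(c) + [row[2]] for c, row in zip(test_cat, test_X)]

model = DecisionTreeClassifier(criterion="gini").fit(train_num, train_y)  # learn model
print(model.predict(test_num))                                            # apply model
```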
Decision Tree Induction
 Example of a Decision Tree
Decision Tree Induction
 Another Example of a Decision Tree
Decision Tree Induction
 Decision Tree Classification Task
(The training and test tables are the same as in the Classification Process slide above.)
Decision Tree Induction
 Apply Model to Test Data
Decision Tree Induction
 Decision Tree Classification Task (same training and test tables as above)
Decision Tree Induction
 Many Algorithms:
• Hunt's Algorithm
• CART
• ID3, C4.5
 How to build a Decision Tree
• Hunt's Algorithm: the basis of many existing decision tree induction algorithms, including ID3, C4.5 and CART
Decision Tree Induction
 Hunt's Algorithm
• Let Dt be the set of training records associated with node t and y = {y1, y2, …, yc} be the set of class labels
1. If all records in Dt belong to the same class yt, then t is a leaf node labeled as yt
2. If Dt contains records that belong to more than one class, an attribute test condition is selected to partition the records into smaller subsets (see the sketch below)
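A minimal Python sketch of the two cases of Hunt's algorithm described above. The `choose_attribute` argument is a placeholder for an attribute-selection rule (for example, the Gini-based gain computed later in these slides); all names are illustrative.

```python
from collections import Counter

def hunt(records, labels, attributes,
         choose_attribute=lambda recs, labs, attrs: attrs[0]):
    """Recursive sketch of Hunt's algorithm for categorical attributes.

    records: list of dicts mapping attribute name -> value
    labels:  list of class labels, one per record
    choose_attribute: attribute-selection rule; here a trivial placeholder,
    in practice e.g. the Gini-based gain shown later in the slides.
    """
    # Case 1: all records belong to the same class -> leaf node with that label.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to test -> leaf node with the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Case 2: select an attribute test condition and partition the records.
    attr = choose_attribute(records, labels, attributes)
    node = {"split_on": attr, "children": {}}
    for value in sorted(set(r[attr] for r in records)):
        idx = [i for i, r in enumerate(records) if r[attr] == value]
        node["children"][value] = hunt(
            [records[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attributes if a != attr],
            choose_attribute,
        )
    return node

# Tiny usage example with rows 1, 2 and 8 of the example data in these slides
records = [{"HomeOwner": "Yes", "Marital": "Single"},
           {"HomeOwner": "No",  "Marital": "Married"},
           {"HomeOwner": "No",  "Marital": "Single"}]
labels = ["No", "No", "Yes"]
print(hunt(records, labels, ["HomeOwner", "Marital"]))
```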
Decision Tree Induction
 How the algorithm works (illustrated step by step in the figures)
Design Issues of DT Induction
 How should the training records be split?
• A measure is used to evaluate the goodness of each test condition
 When should the splitting procedure stop?
• When all records belong to the same class
• When the records have identical attribute values
Decision Tree Induction
 Methods for Expressing an Attribute Test Condition
1. Binary Attributes
2. Nominal Attributes
3. Ordinal Attributes
4. Continuous Attributes
Decision Tree Induction
 Binary Attributes
Decision Tree Induction
 Nominal Attributes
Decision Tree Induction
 Ordinal Attributes
Decision Tree Induction
 Continuous Attributes
Decision Tree Induction
 Splitting Based on Continuous Attributes
 Discretization – form an ordinal attribute
• Static – discretize once at the beginning
• Dynamic – ranges can be determined by equal-interval bucketing, equal-frequency bucketing or clustering
 Binary decision – (A < v) or (A >= v)
• Consider all possible splits and find the best cut (see the sketch below)
• More computationally expensive
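A minimal sketch of the binary-decision approach for a continuous attribute, assuming Gini impurity as the goodness measure: every midpoint between consecutive sorted values is tried as a cut v, and the cut with the lowest weighted impurity is kept. Function names are illustrative; the example values are the annual incomes and defaulted-borrower labels from the example table in these slides.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_binary_split(values, labels):
    """Try every candidate cut v between consecutive sorted values of a
    continuous attribute and return the cut with the lowest weighted Gini."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_v, best_impurity = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                              # no cut between equal values
        v = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [y for x, y in pairs if x < v]     # records with A < v
        right = [y for x, y in pairs if x >= v]   # records with A >= v
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best_impurity:
            best_v, best_impurity = v, weighted
    return best_v, best_impurity

# Example: annual income (in thousands) vs. defaulted borrower from the slides
income = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]
default = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]
print(best_binary_split(income, default))
```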
Decision Tree Induction
 Continuous Attributes
Decision Tree Induction
 How to Determine the Best Split
• Nodes with a homogeneous class distribution are preferred
• A measure of node impurity is needed
(Figure: a non-homogeneous class distribution gives a high degree of impurity; a homogeneous one gives a low degree of impurity)
Decision Tree Induction
Decision Tree Induction
 Measure of Impurity
Let p(i | t) denote the fraction of records belonging to class i at a given node t.

Entropy(t) = - Σ_{i=1}^{C} p(i|t) log2 p(i|t)            (1)

Gini(t) = 1 - Σ_{i=1}^{C} [p(i|t)]^2                     (2)

Classification error(t) = 1 - max_i [p(i|t)]             (3)

where C is the number of classes and 0 log2 0 = 0 in the entropy calculation.
Decision Tree Induction
 Measure of Impurity
p(0|gender) = 10/17, p(1|gender) = 7/17
• Gini index(Gender) = 1 - [(10/17)^2 + (7/17)^2] = 0.4844
• Entropy(Gender) = -[(10/17) log2(10/17) + (7/17) log2(7/17)] = 0.9774
• Error(Gender) = 1 - max{(10/17), (7/17)} = 1 - (10/17) = 0.4118
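These three measures are easy to check numerically; a short sketch (function names are illustrative) reproduces the Gender example above.

```python
import math

def entropy(probs):
    # 0 * log2(0) is defined as 0 in the entropy calculation
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    return 1.0 - sum(p ** 2 for p in probs)

def classification_error(probs):
    return 1.0 - max(probs)

p = [10 / 17, 7 / 17]                       # p(0|gender), p(1|gender)
print(round(gini(p), 4))                    # 0.4844
print(round(entropy(p), 4))                 # 0.9774
print(round(classification_error(p), 4))    # 0.4118
```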
Decision Tree Induction
 Exercise: compute the impurity measures for a split on Car Type, with values Family, Luxury and Sport, where the three child nodes have class counts (C0: 1, C1: 3), (C0: 8, C1: 0) and (C0: 1, C1: 7)
Decision Tree Induction
 Gini Index
Decision Tree Induction
Decision Tree Induction
Table: Training data tuples from the AllElectronics customer database
Decision Tree Induction
 Gini Index is used as the impurity measure

Gini(root) = 1 - [p(0|root)^2 + p(1|root)^2]
           = 1 - [(5/14)^2 + (9/14)^2] = 0.4592

Gini(root) is the parent impurity used when computing the gain of age, income, student and credit_rating.
Decision Tree Induction
 Age: <=30 (v1), 31-40 (v2) and >40 (v3)

Gini(v1) = 1 - [p(0|v1)^2 + p(1|v1)^2] = 1 - [(3/5)^2 + (2/5)^2] = 0.48
Gini(v2) = 1 - [p(0|v2)^2 + p(1|v2)^2] = 1 - [(0/4)^2 + (4/4)^2] = 0
Gini(v3) = 1 - [p(0|v3)^2 + p(1|v3)^2] = 1 - [(2/5)^2 + (3/5)^2] = 0.48
Decision Tree Induction
 Gain(age)

Gain(age) = Gini(root) - [ (N(v1)/N)·Gini(v1) + (N(v2)/N)·Gini(v2) + (N(v3)/N)·Gini(v3) ]
          = 0.4592 - [ (5/14)(0.48) + (4/14)(0) + (5/14)(0.48) ]
          = 0.1163
Decision Tree Induction
 Income: high, medium and low

Gini(h) = 1 - [p(0|h)^2 + p(1|h)^2] = 1 - [(2/4)^2 + (2/4)^2] = 0.5
Gini(m) = 1 - [p(0|m)^2 + p(1|m)^2] = 1 - [(2/6)^2 + (4/6)^2] = 0.44
Gini(l) = 1 - [p(0|l)^2 + p(1|l)^2] = 1 - [(1/4)^2 + (3/4)^2] = 0.38
Decision Tree Induction
 Gain(income)

Gain(income) = Gini(root) - [ (N(h)/N)·Gini(h) + (N(m)/N)·Gini(m) + (N(l)/N)·Gini(l) ]
             = 0.4592 - [ (4/14)(0.5) + (6/14)(0.44) + (4/14)(0.38) ]
             = 0.0192
Decision Tree Induction
 Student: No and Yes

Gini(No)  = 1 - [p(0|No)^2 + p(1|No)^2]   = 1 - [(4/7)^2 + (3/7)^2] = 0.49
Gini(Yes) = 1 - [p(0|Yes)^2 + p(1|Yes)^2] = 1 - [(1/7)^2 + (6/7)^2] = 0.24
Decision Tree Induction
 Gain(student)

Gain(student) = Gini(root) - [ (N(No)/N)·Gini(No) + (N(Yes)/N)·Gini(Yes) ]
              = 0.4592 - [ (7/14)(0.49) + (7/14)(0.24) ]
              = 0.0942
Decision Tree Induction
 Credit rating: fair and excellent

Gini(f) = 1 - [p(0|f)^2 + p(1|f)^2] = 1 - [(2/8)^2 + (6/8)^2] = 0.38
Gini(e) = 1 - [p(0|e)^2 + p(1|e)^2] = 1 - [(3/6)^2 + (3/6)^2] = 0.5
Decision Tree Induction
 Gain(credit_rating)

Gain(credit_rating) = Gini(root) - [ (N(f)/N)·Gini(f) + (N(e)/N)·Gini(e) ]
                    = 0.4592 - [ (8/14)(0.38) + (6/14)(0.5) ]
                    = 0.0278
Decision Tree Induction
 Gain of each attribute
Gain(age) = 0.1163
Gain(income) = 0.0192
Gain(student) = 0.0942
Gain(credit_rating) = 0.0278
Age has the largest gain, so age is selected as the splitting attribute at the root (verified numerically in the sketch below).
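The whole attribute-selection step can be reproduced from the class counts used in the worked example above. The sketch below (names are illustrative) computes the gains with exact arithmetic, so some values differ in the third decimal place from the slide figures, which plug in rounded Gini values; the ranking, with age on top, is unchanged.

```python
def gini(counts):
    """Gini impurity from a list of class counts, e.g. [no, yes]."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_gain(parent_counts, partitions):
    """Gain of a split = Gini(parent) - weighted Gini of the child nodes."""
    n = sum(parent_counts)
    weighted = sum(sum(part) / n * gini(part) for part in partitions)
    return gini(parent_counts) - weighted

parent = [5, 9]                                   # [no, yes] counts for buys_computer
splits = {
    "age":           [[3, 2], [0, 4], [2, 3]],    # <=30, 31-40, >40
    "income":        [[2, 2], [2, 4], [1, 3]],    # high, medium, low
    "student":       [[4, 3], [1, 6]],            # no, yes
    "credit_rating": [[2, 6], [3, 3]],            # fair, excellent
}
for attr, parts in splits.items():
    print(attr, round(gini_gain(parent, parts), 4))
# age has the largest gain, so it is chosen as the splitting attribute at the root
```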
Decision Tree Induction
Decision Tree Induction
 Final Decision Tree
Extract Rules
 IF age = "<=30" AND student = "no" THEN buys_computer = "no"
 IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
 IF age = "31-40" THEN buys_computer = "yes"
 IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
 IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"
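The five extracted rules translate directly into code; a minimal sketch, with an illustrative function name and argument names.

```python
def buys_computer(age, student, credit_rating):
    """Classify a customer with the five rules extracted from the final tree."""
    if age == "<=30":
        return "yes" if student == "yes" else "no"
    if age == "31-40":
        return "yes"
    # remaining case: age == ">40"
    return "no" if credit_rating == "excellent" else "yes"

print(buys_computer(age="<=30", student="yes", credit_rating="fair"))     # yes
print(buys_computer(age=">40", student="no", credit_rating="excellent"))  # no
```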
Decision Tree Induction
 Approaches to pruning a decision tree
1. Prepruning approach
2. Postpruning approach
Prepruning
 Measures such as information gain can be
used to assess the goodness of a split
 If partitioning the samples at a node would result in a split that falls below a prespecified threshold, then further partitioning of the given subset is halted (a minimal threshold check is sketched below)
 Difficulties in choosing an appropriate threshold:
• High thresholds => oversimplified trees
• Low thresholds => overly complicated trees
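In code, prepruning amounts to one extra check before a node is partitioned; a minimal sketch, with an illustrative threshold value that is not taken from the slides.

```python
MIN_GAIN = 0.01   # illustrative prespecified threshold, not from the slides

def should_split(parent_gini, best_weighted_child_gini):
    """Prepruning test: halt partitioning when the best split's impurity
    reduction (here, Gini gain) falls below the prespecified threshold."""
    gain = parent_gini - best_weighted_child_gini
    return gain >= MIN_GAIN

# Example: parent impurity 0.4592, best split leaves weighted impurity 0.4550
print(should_split(0.4592, 0.4550))   # False -> stop growing, make a leaf node
```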
Postpruning
 Some branches are removed from a fully grown tree
 The cost-complexity pruning algorithm is an example of the postpruning approach
 Alternatively, prepruning and postpruning may be combined
Characteristics of DT
 Nonparametric approach for building classification models
 Robust to the presence of noise in the data set
 The presence of redundant attributes does not adversely affect the accuracy of decision trees
Evaluating the perf. of a Classifier
 Holdout Method
 Random Subsampling
 Cross-validation
Holdout Method
 The original data with labeled examples is partitioned into two disjoint sets, called the training and the test sets, respectively (e.g., 50-50, or two-thirds for training and one-third for testing).
Random Subsampling
 The holdout method can be repeated several times to improve the estimate of a classifier's performance; this is known as random subsampling.
 The overall accuracy is given by the average accuracy over all iterations.
K-Fold Cross-validation
 The dataset is partitioned into k
equal-sized parts.
 One of the parts is used for testing, while the rest are used for training.
 This procedure is repeated k times so that each partition is used for testing exactly once; the total error is given by summing up the errors of all k runs (see the sketch below).
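A minimal sketch of the three evaluation schemes, assuming scikit-learn and using its bundled iris data purely as a placeholder dataset; any labeled data and any classifier with fit/score would do.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)          # placeholder labeled dataset
clf = DecisionTreeClassifier()

# Holdout: two disjoint sets, e.g. two-thirds for training, one-third for testing
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)
print("holdout accuracy:", clf.fit(X_tr, y_tr).score(X_te, y_te))

# Random subsampling: repeat the holdout several times and average the accuracy
accs = []
for seed in range(5):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=seed)
    accs.append(clf.fit(X_tr, y_tr).score(X_te, y_te))
print("random subsampling accuracy:", sum(accs) / len(accs))

# k-fold cross-validation: each of the k partitions is used for testing exactly once
scores = cross_val_score(clf, X, y, cv=10)
print("10-fold CV accuracy:", scores.mean())
```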
References
 P. N. Tan, M. Steinbach, V. Kumar, "Introduction to Data Mining", Pearson Addison Wesley.
 Course lecture notes for KNOWLEDGE / DATA MINING by Asst. Prof. Dr. จันทรเจา มงคลนาวิน.
Decision Tree Induction
 How the algorithm works

Tid  Home Owner  Marital Status  Annual Income  Defaulted Borrower
1    Yes         Single          125k           No
2    No          Married         100k           No
3    No          Single          70k            No
4    Yes         Married         120k           No
5    No          Divorced        95k            Yes
6    No          Married         60k            No
7    Yes         Divorced        220k           No
8    No          Single          85k            Yes
9    No          Married         75k            No
10   No          Single          90k            Yes
Decision Tree Induction
 How the algorithm works
Decision Tree Induction
 Exercise (solution): impurity measures over all 20 records at the parent node of the Car Type split from the earlier exercise (child class counts (C0: 1, C1: 3), (C0: 8, C1: 0) and (C0: 1, C1: 7))

p(0|Car) = 10/20, p(1|Car) = 10/20
• Gini index(Car) = 1 - [(10/20)^2 + (10/20)^2] = 0.5
• Entropy(Car) = -[(10/20) log2(10/20) + (10/20) log2(10/20)] = 1
• Error(Car) = 1 - max{(10/20), (10/20)} = 1 - (10/20) = 0.5