Decision Trees
Chapter 18, From Data to Knowledge

Types of Learning
• Classification:
– From examples labelled with a "class", create a decision procedure that will predict the class.
• Regression:
– From examples labelled with a real value, create a decision procedure that will predict the value.
• Unsupervised learning:
– From examples, generate groups that are interesting to the user.

Concerns
• Representational bias
• Generalization accuracy
– Is the learned concept correct? The gold standard.
• Comprehensibility
– E.g. medical diagnosis
• Efficiency of learning
• Efficiency of the learned procedure

Classification Examples
• Medical records on patients with diseases.
• Bank loan records on individuals.
• DNA sequences corresponding to a "motif".
• Images of digits.
• See the UCI Machine Learning Repository.

Regression
• Stock histories -> future stock price
• Patient data -> internal heart pressure
• House data -> house value
• Representation is key.

Unsupervised Learning
• Astronomical maps -> groups of stars that astronomers found useful.
• Patient data -> new diseases
– Treatments depend on the correct disease class.
• Gene data -> co-regulated genes and transcription factors
• Often exploratory.

Weather Data
• Four features: outlook, windy (nominal); temperature, humidity (numeric). Class: play.
outlook = sunny
|   humidity <= 75: yes (2.0)
|   humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = TRUE: no (2.0)
|   windy = FALSE: yes (3.0)

Dumb DT Algorithm
• Build tree (discrete features only):
– If all entries below the node are homogeneous, stop.
– Else pick a feature at random, create a node for that feature, and form subtrees for each of its values.
– Recurse on each subtree.
• Will this work?

Properties of the Dumb Algorithm
• Complexity
– The homogeneity test is O(DataSize).
– Splitting is O(DataSize).
– Times the number of nodes in the tree = bound on the work.
• Accuracy on the training set: perfect.
• Accuracy on the test set: not so perfect; almost random.

Recall: Iris
petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
|   petalwidth <= 1.7
|   |   petallength <= 4.9: Iris-versicolor (48.0/1.0)
|   |   petallength > 4.9
|   |   |   petalwidth <= 1.5: Iris-virginica (3.0)
|   |   |   petalwidth > 1.5: Iris-versicolor (3.0/1.0)
|   petalwidth > 1.7: Iris-virginica (46.0/1.0)

Heuristic DT Algorithm
• Entropy of a set S with mixed classes c1, c2, ..., ck:
– Entropy(S) = - sum_i p_i * lg(p_i), where p_i is the (estimated) probability of class c_i.
• Quality of a split: the sum of the entropies of the subtrees, each weighted by the proportion of examples falling into that subtree.
• This defines a quality measure on features (see the code sketch below).

Shannon Entropy
• Entropy is the only function that:
– Is 0 when only one class is present.
– Is k when 2^k classes are equally present.
– Is "additive": E(X,Y) = E(X) + E(Y) if X and Y are independent.
• Entropy is sometimes called uncertainty and sometimes information.
• Uncertainty is defined on a random variable whose "draws" come from the set of classes.

Shannon Entropy Properties
• The probability of guessing the state/class is 2^(-Entropy(S)) (for equally likely classes).
• Entropy(S) = the average number of yes/no questions needed to reveal the state/class.

Majority Function
• Suppose there are 2n boolean features, and the class is defined by n or more of them being on.
• How big is the tree? At least (2n choose n) leaves.
• Prototype function: "at least k of n are true" is a common medical concept.
• Concepts that are prototypical do not match the representational bias of DTs.
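The split-selection heuristic above can be captured in a short sketch. This is a minimal, assumed implementation, not code from the slides: it assumes examples are Python dicts of nominal feature values with the class label stored under the key "class", and the names entropy, split_quality, and build_tree are my own.

import math
from collections import Counter

def entropy(examples):
    """Entropy(S) = - sum_i p_i * lg(p_i) over the class distribution of S."""
    counts = Counter(ex["class"] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def split_quality(examples, feature):
    """Weighted entropy of the subsets produced by splitting on `feature`.

    Lower is better: the heuristic picks the feature whose split leaves
    the least remaining uncertainty about the class.
    """
    subsets = {}
    for ex in examples:
        subsets.setdefault(ex[feature], []).append(ex)
    total = len(examples)
    return sum(len(sub) / total * entropy(sub) for sub in subsets.values())

def build_tree(examples, features):
    """Recursive builder: stop on homogeneous data, else split on the
    feature with the lowest weighted entropy."""
    classes = {ex["class"] for ex in examples}
    if len(classes) == 1 or not features:
        # Leaf: predict the majority class of the remaining examples.
        return Counter(ex["class"] for ex in examples).most_common(1)[0][0]
    best = min(features, key=lambda f: split_quality(examples, f))
    children = {}
    rest = [f for f in features if f != best]
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        children[value] = build_tree(subset, rest)
    return {"feature": best, "children": children}

Replacing the min(...) choice with a random feature gives the dumb algorithm from the earlier slide; the entropy heuristic is what typically keeps the tree small and improves accuracy on unseen data.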
DTs with Real-Valued Attributes
• Idea: convert to a solved problem (see the sketch after the summary).
• For each real-valued attribute f with sorted values v1, v2, ..., vn, create binary features:
– f1: f < (v1+v2)/2, f2: f < (v2+v3)/2, etc.
• Other approaches are possible, e.g. fi: f < vj for any vj, so no sorting is needed.

DTs -> Rules (j48.part)
• For each leaf, we make a rule by collecting the tests on the path to that leaf.
• Number of rules = number of leaves.
• Simplification: test each condition of a rule and see if dropping it harms accuracy.
• Can we go from rules back to DTs? Not easily. Hint: there is no root.

Summary
• Comprehensible if the tree is not large.
• Effective if a small number of features suffices (representational bias).
• Handles multi-class problems naturally.
• Can be extended to regression.
• Easy to implement, with low complexity.
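Returning to the real-valued attribute slide: a minimal sketch of the midpoint-threshold construction, assuming a plain Python list of observed values. The function name and the example numbers are illustrative only, not from the slides.

def candidate_thresholds(values):
    """Midpoints (v_i + v_{i+1}) / 2 of the sorted distinct observed values.

    Each midpoint t yields a candidate binary test "f < t" for the
    tree builder to evaluate like any other feature.
    """
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]

# Illustrative values: candidate_thresholds([64, 65, 68, 70]) -> [64.5, 66.5, 69.0]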