Decision Trees
Chapter 18
From Data to Knowledge
Types of learning
• Classification:
– From examples labelled with the “class”, create a
decision procedure that will predict the class.
• Regression
– From examples labelled with a real value, create a
decision procedure that will predict the value.
• Unsupervised Learning
– From examples generate groups that are interesting
to the user.
Concerns
• Representational Bias
• Generalization Accuracy
– Is the learned concept correct? Gold Standard
• Comprehensibility
– Medical diagnosis
• Efficiency of Learning
• Efficiency of Learned Procedure
Classification Examples
• Medical records on patients with diseases.
• Bank loan records on individuals.
• DNA sequences corresponding to a “motif”.
• Images of handwritten digits.
• See the UCI Machine Learning Repository.
Regression
• Stock histories -> stock future price
• Patient data -> internal heart pressure
• House data -> house value
Representation is key.
Unsupervised Learning
• Astronomical maps -> groups of stars that
astronomers found useful.
• Patient Data -> new diseases
– Treatments depend on correct disease class
• Gene data -> co-regulated genes and
transcription factors
• Often exploratory
Weather Data:
• Four features: outlook, windy (nominal);
temperature, humidity (numeric)
• Class to predict: play (nominal)
outlook = sunny
| humidity <= 75: yes (2.0)
| humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
| windy = TRUE: no (2.0)
| windy = FALSE: yes (3.0)
Dumb DT Algorithm
Build tree (discrete features only):
If all entries below the node are homogeneous,
stop.
Else pick a feature at random, create a node
for that feature, and form a subtree for each of
the values of the feature.
Recurse on each subtree.
Will this work?
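A minimal sketch of this dumb algorithm in Python (the data layout and
function names are illustrative assumptions, not from the slides):

import random

def homogeneous(examples):
    # True when every example below this node has the same class label.
    return len({cls for _, cls in examples}) <= 1

def dumb_tree(examples, features):
    # examples: list of (feature_dict, class_label) pairs; features: list of names.
    if homogeneous(examples) or not features:
        classes = [cls for _, cls in examples]
        return max(set(classes), key=classes.count)   # leaf = (majority) class
    f = random.choice(features)                       # pick a feature at random
    rest = [g for g in features if g != f]
    subtrees = {}
    for value in {ex[f] for ex, _ in examples}:       # one subtree per value of f
        subset = [(ex, cls) for ex, cls in examples if ex[f] == value]
        subtrees[value] = dumb_tree(subset, rest)     # recurse on each subtree
    return (f, subtrees)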
Properties of Dumb Algorithm
• Complexity
– Homogeneity cost is O(DataSize)
– Splitting is O(DataSize)
– Multiplied by the number of nodes in the tree = bound on total work
• Accuracy on training set
– perfect
• Accuracy on test set
– Not so perfect: almost random
Recall: Iris
petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
| petalwidth <= 1.7
| | petallength <= 4.9: Iris-versicolor (48.0/1.0)
| | petallength > 4.9
| | | petalwidth <= 1.5: Iris-virginica (3.0)
| | | petalwidth > 1.5: Iris-versicolor (3.0/1.0)
| petalwidth > 1.7: Iris-virginica (46.0/1.0)
Heuristic DT algorithm
• Entropy of a set S with mixed classes c1, c2, …, ck:
• Entropy(S) = - sum_i pi * lg(pi), where pi is the
(estimated) probability of class ci.
• To score a split, sum the weighted entropies of the
subtrees, where each weight is the proportion of
examples in that subtree.
• This defines a quality measure on features.
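A sketch of this quality measure, reusing the (feature_dict, class) example
format assumed in the earlier sketch (helper names are illustrative):

from math import log2
from collections import Counter

def entropy(examples):
    # Entropy(S) = -sum_i pi * lg(pi), with pi estimated from class counts.
    counts = Counter(cls for _, cls in examples)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def split_quality(examples, feature):
    # Weighted sum of subtree entropies; lower is better.  Information gain
    # is entropy(examples) minus this value.
    total = len(examples)
    weighted = 0.0
    for value in {ex[feature] for ex, _ in examples}:
        subset = [(ex, cls) for ex, cls in examples if ex[feature] == value]
        weighted += (len(subset) / total) * entropy(subset)
    return weighted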
Shannon Entropy
• Entropy is the only function that:
– Is 0 when only 1 class is present
– Is k if there are 2^k classes, equally present
– Is “additive”, i.e.
E(X,Y) = E(X) + E(Y) if X and Y are independent.
• Entropy is sometimes called uncertainty and
sometimes information.
• Uncertainty is defined on a random variable whose
“draws” are from the set of classes.
Shannon Entropy Properties
• Probability of guessing the state/class is
– 2^{-Entropy(S)}
• Entropy(S) = average number of yes/no
questions needed to reveal the state/class.
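A quick numerical check of these two properties (a toy calculation, not from
the slides):

from math import log2

k = 3
probs = [1 / 2**k] * (2**k)            # 2^k equally likely classes
H = -sum(p * log2(p) for p in probs)
print(H)                               # 3.0 bits = three yes/no questions
print(2 ** -H)                         # 0.125 = chance of guessing the class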
Majority Function
• Suppose 2n boolean features.
• The class is defined by: n or more features are on.
• How big is the tree?
• At least 2n-choose-n leaves.
• The prototype function “at least k of n are true”
is a common medical concept.
• Concepts that are prototypical do not match
the representational bias of DTs.
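A small illustration of how fast that 2n-choose-n lower bound on the number
of leaves grows (toy calculation):

from math import comb

for n in (2, 5, 10, 20):
    print(n, comb(2 * n, n))
# 2 -> 6, 5 -> 252, 10 -> 184756, 20 -> 137846528820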
DTs with real-valued attributes
• Idea: convert to solved problem
• For each real-valued attribute f with sorted values
v1, v2, …, vn, add binary features:
f1: f < (v1+v2)/2
f2: f < (v2+v3)/2, etc.
• Other approaches possible.
• E.g. fi: f < vj for any observed vj, so no sorting is needed.
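A sketch of the midpoint-threshold conversion, assuming the attribute's
values arrive as a plain list (the function name is illustrative):

def candidate_thresholds(values):
    # One candidate binary test f < (vi + v_{i+1})/2 between each
    # adjacent pair of sorted, distinct values.
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]

# e.g. some numeric temperature values
print(candidate_thresholds([64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]))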
DTs -> Rules (j48.part)
• For each leaf, we make a rule by collecting
the tests to the leaf.
• Number of rules = number of leaves
• Simplification: test each condition on a rule
and see if dropping it harms accuracy.
• Can we go from Rules to DTs?
– Not easily. Hint: no root.
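A sketch of reading off one rule per leaf, using the nested (feature,
subtrees) tuple representation from the earlier sketches (names are
illustrative):

def tree_to_rules(tree, conditions=()):
    # Leaves are class labels; internal nodes are (feature, {value: subtree}).
    if not isinstance(tree, tuple):
        return [(list(conditions), tree)]      # (tests along the path, class)
    feature, subtrees = tree
    rules = []
    for value, subtree in subtrees.items():
        rules.extend(tree_to_rules(subtree, conditions + ((feature, value),)))
    return rules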
Summary
• Comprehensible if the tree is not large.
• Effective if a small number of features is
sufficient (representational bias).
• Handles multi-class problems naturally.
• Can be extended for regression.
• Easy to implement, with low complexity.