Avoiding Overfitting of Decision Trees

Principles of Data Mining
Published by Springer-Verlag, 2007
+ Clashes in a Training Set
+ Adapting TDIDT to Deal With Clashes
+ Overfitting Rules to Data
Clashes –
Two (or more) instances in a training set have the same combination of attribute values but different classifications.
Why do clashes happen?
– One of the instances has its data incorrectly recorded, i.e. there is noise in the data.
– The clashing instances are all correct, but it is not possible to discriminate between them on the basis of the attributes recorded.
+ Strategy (I) – The ‘delete branch’ strategy: discard the clashing instances.
Strategy (II) – The ‘majority voting’ strategy: assign all of them to the most common class.
*The ‘delete branch’ strategy (a 100% threshold) and the ‘majority voting’ strategy (a 0% threshold) are too extreme for most cases. Therefore, we use a clash threshold.
Clash threshold:
– A percentage from 0 to 100 inclusive.
– Normal usage might be 60%, 70%, 80% or 90%.
– If the proportion of instances in the clash set with the most common classification is at least equal to the clash threshold –> classify them all as that class.
– Else –> discard the clashing instances.
(A sketch of this rule is given below.)
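A minimal sketch of how a clash threshold might be applied to a clash set. The function name, data representation, and the 70% value are illustrative assumptions, not taken from the book:

```python
from collections import Counter

def resolve_clash(clash_labels, threshold=0.7):
    # clash_labels: classifications of the instances in the clash set
    # threshold: clash threshold as a fraction; 1.0 behaves like the
    # 'delete branch' strategy, 0.0 like 'majority voting'
    counts = Counter(clash_labels)
    majority_class, majority_count = counts.most_common(1)[0]
    if majority_count / len(clash_labels) >= threshold:
        return majority_class      # classify with the most common class
    return None                    # below threshold: discard the instances

# 4 of 5 clashing instances are 'good' (80% >= 70%), so the clash set is
# labelled 'good'; with threshold=0.9 the same clash set would be discarded.
print(resolve_clash(["good", "good", "good", "good", "bad"]))
```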
EX: Credit checking dataset
*The predictive accuracy on the training data is of no importance: we already know the classifications! It is the accuracy on the test data that matters.
Using the ‘default classification strategy’, each unclassified instance is automatically allocated to the largest class.
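As a sketch, the default classification strategy amounts to a trivial classifier that always predicts the largest class seen in the training data. The helper name and sample labels below are illustrative assumptions:

```python
from collections import Counter

def make_default_classifier(training_labels):
    # The 'default classification strategy': whatever the instance looks
    # like, predict the class that is most frequent in the training data.
    largest_class = Counter(training_labels).most_common(1)[0][0]
    return lambda instance: largest_class

classify = make_default_classifier(["good", "good", "bad", "good", "bad"])
print(classify({"age": 30, "income": "high"}))  # -> 'good' for any instance
```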
Overfitting occurs when a model is excessively
complex, such as having too many
parameters relative to the number of
observations.
What causes overfitting?
– noise
– too little training data, which may allow some attributes to separate the training data well purely by coincidence.
In the usual curve-fitting illustration of this, the simpler linear function provides the better prediction on unseen data.
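A small numerical sketch of that effect, using made-up data (the degrees, noise level, and sample sizes are assumptions chosen only for illustration): a degree-9 polynomial fits the 10 noisy training points almost perfectly but typically predicts new points far worse than a straight line.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + rng.normal(0.0, 0.2, size=x_train.size)  # linear trend + noise
x_test = np.linspace(0.0, 1.0, 50)
y_test = 2.0 * x_test                                              # true underlying function

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)      # fit polynomial of given degree
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")

# The degree-9 fit has near-zero training error but a much larger test
# error: it has fitted the noise, i.e. overfitted.
```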
Consider a typical rule such as –
IF a = 1 and b = yes and z = red THEN class = OK
Specialise (add a condition) –
IF a = 1 and b = yes and z = red and k = green THEN class = OK
Generalise (drop conditions) –
IF a = 1 and b = yes THEN class = OK
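A quick sketch of these two operations, representing a rule as a set of attribute–value conditions plus a predicted class. The dict representation is an assumption made for illustration; it is not the notation used in the book:

```python
# A rule such as  IF a = 1 and b = yes and z = red THEN class = OK
rule = {"conditions": {"a": 1, "b": "yes", "z": "red"}, "class": "OK"}

def specialise(rule, attribute, value):
    # Add one more condition: the rule now matches fewer instances.
    conditions = dict(rule["conditions"])
    conditions[attribute] = value
    return {"conditions": conditions, "class": rule["class"]}

def generalise(rule, attributes_to_drop):
    # Remove conditions: the rule now matches more instances.
    conditions = {k: v for k, v in rule["conditions"].items()
                  if k not in attributes_to_drop}
    return {"conditions": conditions, "class": rule["class"]}

print(specialise(rule, "k", "green"))  # IF a=1 and b=yes and z=red and k=green THEN OK
print(generalise(rule, {"z"}))         # IF a=1 and b=yes THEN OK
```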
Example:
(a gold digger, named A)
Give A a present – A feels happy.
Take A for a ride in a Lamborghini – A feels very happy.
A one-night stand with A – A feels extremely happy.
(a princess, named B)
Give B a present – B feels happy.
Take B for a ride in a Lamborghini – B feels very happy.
A one-night stand with B – B regards you as a playboy.
We can see that not every girl will accept this kind of guy; if the machine learns this rule from A alone and applies it to everyone, that is overfitting.
With the same accuracy, the simplest explanation is the best (Ockham’s Razor).