Learning from Examples
Christoph Eick: Learning Models to Predict and Classify
(Based on Lecture Notes for E. Alpaydın, 2004, Introduction to Machine Learning, © The MIT Press, V1.1)

Example of Learning from Examples

- Classification: Is car x a family car?
- Prediction: What is the amount of rainfall tomorrow?
- Knowledge extraction: What do people expect from a family car? What factors are important to predict tomorrow's rainfall?
Noise and Model Complexity

Use the simpler one because it is:
- Simpler to use (lower computational complexity)
- Easier to train (needs fewer examples)
- Less sensitive to noise
- Easier to explain (more interpretable)
- Better at generalizing (lower variance; Occam's razor)
Alternative Approach: Regression

Training set: $X = \{(x^t, r^t)\}_{t=1}^{N}$, where the labels are noisy observations of an unknown function: $r^t = f(x^t) + \varepsilon$.

Candidate models:
- Linear: $g(x) = w_1 x + w_0$
- Quadratic: $g(x) = w_2 x^2 + w_1 x + w_0$

Empirical (squared) error of a model $g$ on $X$:
$$E(g \mid X) = \frac{1}{N} \sum_{t=1}^{N} \left( r^t - g(x^t) \right)^2$$

For the linear model:
$$E(w_1, w_0 \mid X) = \frac{1}{N} \sum_{t=1}^{N} \left( r^t - (w_1 x^t + w_0) \right)^2$$
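To make the empirical error concrete, here is a minimal Python sketch (not from the slides) that evaluates $E(g \mid X)$ for a hand-picked linear and quadratic model; the weights and data points are illustrative assumptions only.

```python
# Empirical squared error E(g | X) from the slide, on a small
# hypothetical dataset; the weights below are illustrative guesses,
# not fitted values.

def linear(x, w1=1.0, w0=0.5):               # g(x) = w1*x + w0
    return w1 * x + w0

def quadratic(x, w2=0.2, w1=1.0, w0=0.5):    # g(x) = w2*x^2 + w1*x + w0
    return w2 * x**2 + w1 * x + w0

def empirical_error(g, X):
    """E(g | X) = (1/N) * sum_t (r_t - g(x_t))^2."""
    return sum((r - g(x)) ** 2 for x, r in X) / len(X)

# Hypothetical training pairs (x_t, r_t) with r_t = f(x_t) + noise.
X = [(0.0, 0.4), (1.0, 1.6), (2.0, 2.4), (3.0, 3.7)]

print(empirical_error(linear, X))
print(empirical_error(quadratic, X))
```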
Finding Regression Coefficients

Given the training set $X = \{(x^t, r^t)\}_{t=1}^{N}$ and the linear model $g(x) = w_1 x + w_0$, how do we find $w_1$ and $w_0$?

Minimize the empirical error
$$E(w_1, w_0 \mid X) = \frac{1}{N} \sum_{t=1}^{N} \left( r^t - (w_1 x^t + w_0) \right)^2$$
by solving $\partial E / \partial w_1 = 0$ and $\partial E / \partial w_0 = 0$, and then solving the two obtained equations!

Group Homework!
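Since working out the two derivatives is left as homework, the sketch below only shows the standard least-squares result one arrives at, so answers can be checked numerically; the dataset is hypothetical.

```python
# A sketch of the closed-form solution obtained by setting dE/dw1 = 0
# and dE/dw0 = 0 (the standard least-squares result; data is hypothetical).

def fit_line(X):
    """Return (w1, w0) minimizing E(w1, w0 | X) for pairs (x_t, r_t)."""
    N = len(X)
    x_bar = sum(x for x, _ in X) / N
    r_bar = sum(r for _, r in X) / N
    # The two normal equations from dE/dw1 = 0 and dE/dw0 = 0 solve to:
    w1 = sum((x - x_bar) * (r - r_bar) for x, r in X) / \
         sum((x - x_bar) ** 2 for x, _ in X)
    w0 = r_bar - w1 * x_bar
    return w1, w0

X = [(0.0, 0.4), (1.0, 1.6), (2.0, 2.4), (3.0, 3.7)]
print(fit_line(X))  # (1.07, 0.42) for this data
```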
Model Selection & Generalization

- Learning is an ill-posed problem; the data alone are not sufficient to find a unique solution.
- Hence the need for inductive bias: assumptions about the hypothesis class H.
- Generalization: how well a model performs on new data.
- Overfitting: H more complex than C or f.
- Underfitting: H less complex than C or f.
Underfitting and Overfitting

[Figure: training and test error versus the complexity of the used model; here the complexity of a decision tree := its number of nodes]

- Underfitting: when the model is too simple, both training and test errors are large.
- Overfitting: when the model is too complex, test errors are large although training errors are small.
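The pattern in the lost figure can be reproduced numerically. The following sketch (my construction, not the slide's decision-tree example) treats polynomial degree as model complexity and prints training vs. test error; the target function, noise level, sample sizes, and degrees are all assumptions.

```python
# Training error keeps shrinking as complexity grows, while test error
# first falls (less underfitting), then rises again (overfitting).
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)          # hypothetical true function
x_train, x_test = rng.uniform(0, 1, 15), rng.uniform(0, 1, 100)
r_train = f(x_train) + rng.normal(0, 0.2, x_train.size)  # noisy labels
r_test = f(x_test) + rng.normal(0, 0.2, x_test.size)

for degree in (1, 3, 9):                     # complexity := polynomial degree
    w = np.polyfit(x_train, r_train, degree)
    train_err = np.mean((r_train - np.polyval(w, x_train)) ** 2)
    test_err = np.mean((r_test - np.polyval(w, x_test)) ** 2)
    print(f"degree {degree}: train {train_err:.3f}  test {test_err:.3f}")
```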
Generalization Error

Error on new examples!

- Two errors: the training error, and the testing error, usually called the generalization error (http://en.wikipedia.org/wiki/Generalization_error). Typically, the training error is smaller than the generalization error.
- Measuring the generalization error is a major challenge in data mining and machine learning (http://www.faqs.org/faqs/ai-faq/neuralnets/part3/section-11.html).
- To estimate the generalization error, we need data unseen during training. We could split the data as follows (see the sketch after this slide):
  - Training set (50%)
  - Validation set (25%), optional, for selecting ML algorithm parameters
  - Test (publication) set (25%)
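A minimal sketch of the 50/25/25 split described above, in plain Python; the dataset is a stand-in and the seed is arbitrary.

```python
# Shuffle first so the three subsets are random samples of the data.
import random

data = list(range(100))          # stand-in for (x, r) pairs
random.seed(42)
random.shuffle(data)

n = len(data)
train = data[: n // 2]                    # 50%: fit the model
validation = data[n // 2 : 3 * n // 4]    # 25%: tune algorithm parameters
test = data[3 * n // 4 :]                 # 25%: estimate generalization error
print(len(train), len(validation), len(test))  # 50 25 25
```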
Triple Trade-Off

There is a trade-off between three factors (Dietterich, 2003):
1. Complexity of H, c(H)
2. Training set size, N
3. Generalization error, E, on new data

- As N increases, E decreases.
- As c(H) increases, E first decreases and then increases (overfitting).
- As c(H) increases, the training error decreases for some time and then stays constant (frequently at 0).
Notes on Overfitting

- Overfitting results in models that are more complex than necessary: after learning the knowledge in the data, they "tend to learn the noise" as well.
- More complex models tend to have more complicated decision boundaries and tend to be more sensitive to noise, missing examples, ...
- The training error no longer provides a good estimate of how well the tree will perform on previously unseen records.
- We need "new" ways of estimating errors.
Thoughts on Fitness Functions for Genetic Programming

1. Just use the squared training error -> leads to overfitting.
2. Use the squared training error, but restrict model complexity.
3. Split the training set into a true training set and a validation set; use the squared error on the validation set as the fitness function.
4. Combine 1, 2, and 3 (many combinations exist).
5. Consider model complexity in the fitness function (see the sketch below):
   fitness(model) = error(model) + b * complexity(model)
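As a rough illustration of option 5, the sketch below scores two hypothetical candidate models with a complexity-penalized fitness; the penalty weight b and the node-count complexity measure are illustrative assumptions, not prescribed by the slides.

```python
# Option 5: penalize complexity in the fitness function so that a
# slightly worse-fitting but much simpler model can win (lower is better).

def fitness(squared_error, num_nodes, b=0.01):
    """fitness(model) = error(model) + b * complexity(model)."""
    return squared_error + b * num_nodes

# Two hypothetical candidates: a complex model that fits the training
# data slightly better, and a much simpler one.
print(fitness(squared_error=0.10, num_nodes=40))  # 0.50
print(fitness(squared_error=0.15, num_nodes=5))   # 0.20 -> preferred
```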