Lectures 10 Feed-Forward Neural Networks
Learning outcomes
You will know the two main learning paradigms
You will know some problem types and the appropriate paradigm to use
You will know some further detail of the practical techniques and problems
encountered when setting up networks
NN Learning Strategies
What is learning?
Learning is a process by which the parameters of a NN are adapted through stimulation
by the environment. The type of learning is determined by the manner in which the
parameter change takes place [adapted from Haykin, Neural Networks, Macmillan, 1994].
So the NN is stimulated by the environment.
The NN changes as a result.
The NN responds to the environment in a new way.
We have met Perceptrons and Feed Forward NNs and a training algorithm for these types
of NN. This algorithm is an example of an error correction learning algorithm. The
learning paradigm is that of supervised learning.
The NN learns from examples which are given to it – the NN responds and if necessary is
then changed because the supervisor (the person who wrote the training program) has
decided that response was not (sufficiently) correct. The particular algorithm we have
seen differentially changes the weights that are likely to have most effect.
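As an illustration of error correction learning, here is a minimal sketch for a single perceptron with a threshold output. The function names, the learning rate and the logical AND task are assumptions chosen for the example; they are not taken from the lecture or the labs.

```python
def step(z):
    """Threshold activation: 1 if the weighted sum is non-negative, else 0."""
    return 1 if z >= 0 else 0


def predict(weights, bias, inputs):
    """Response of the perceptron to one input pattern."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(examples, learning_rate=1.0, max_epochs=100):
    """Error-correction training of a single perceptron.

    examples: list of (inputs, target) pairs with targets 0 or 1.
    Returns (weights, bias).
    """
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)  # supervisor's signal
            if error != 0:
                mistakes += 1
                # Weights attached to larger inputs receive larger changes.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:  # every training example is now answered correctly
            break
    return weights, bias


# Logical AND as a toy supervised task (hypothetical example data).
and_examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
and_weights, and_bias = train_perceptron(and_examples)
```

The network is only changed when its response disagrees with the target supplied by the supervisor, which is the sense in which the learning is supervised.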
We will see in the final lecture that there are kinds of NN which learn from data in an
unsupervised way using a different algorithm (competitive learning).
Problem Types and Suitable Learning Paradigms
Function approximation
An input-output mapping is needed: supervised learning.
Pattern association
We need to learn some patterns, then later recall a pattern from partial information or
noisy data: unsupervised learning.
Pattern classification
If the patterns are known in advance, this reduces to an input-output mapping and hence
supervised learning. If there is no knowledge of the expected patterns, then unsupervised
learning can be used to detect them.
Time series prediction
We have a sequence of data and want to guess the next value (e.g. stock market
prediction). View this as an input-output mapping: supervised learning with error correction.
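One common way to view a sequence as an input-output mapping is to slide a fixed-length window along it. The sketch below assumes a window of three values and uses made-up numbers; the function name `make_windows` is hypothetical.

```python
def make_windows(series, window):
    """Turn a sequence into (inputs, target) pairs for supervised learning.

    Each input is `window` consecutive values; the target is the value that follows.
    """
    pairs = []
    for start in range(len(series) - window):
        inputs = list(series[start:start + window])
        target = series[start + window]
        pairs.append((inputs, target))
    return pairs


prices = [10, 11, 13, 12, 14, 15]  # made-up numbers for illustration
pairs = make_windows(prices, window=3)
# pairs[0] is ([10, 11, 13], 12): three past values in, the next value out
```

Each pair can then be presented to the network like any other supervised training example.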
Further practical issues to do with Neural Network training
In the last lecture we looked at coding the data in an appropriate way. Assuming we
have managed to get the data coding correct, what next? We will look at the internal
working of the training next lecture; for now we will concentrate on practical issues.
We are using an error correction algorithm – we learn to minimize the error on our
training set then we apply the learnt rules to new data.
This means that we need to get as representative a sample as possible to train the network
on. We want to randomise input order in case the order was biased by the collection
method. We need to take care that we try to get representatives of every kind of data into
the training set. We take particular care if one group is numerically small (but there are
limits to what we can do).
We need to set some data aside to use to test the network. After training the network, test
it on data where we know the answers (but which the network has NOT previously
seen). Again we want this test data to be as representative as possible. Only if the
performance is acceptable on this test data will we use the network.
We usually divide data into two thirds training and one third testing.
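A minimal sketch of the randomised two-thirds/one-third split, using only the Python standard library. The function name, the `seed` parameter and the stand-in data are assumptions for illustration.

```python
import random


def split_train_test(data, train_fraction=2 / 3, seed=None):
    """Shuffle a copy of the data, then split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(data)  # copy, so the caller's data keeps its order
    rng.shuffle(shuffled)  # removes any bias from the collection order
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


samples = list(range(30))  # stand-in for 30 labelled examples
train_set, test_set = split_train_test(samples, seed=0)
```

A purely random split like this does not guarantee that a numerically small group appears in both sets, so the resulting sets should still be checked by hand when such a group matters.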
Training methods
Approaches date from the earlier days of neural networks, when the training methods
were pretty primitive. Early backpropagation algorithms were slow and cumbersome.
Subsequent work by mathematicians has refined training methods so that there are now
many variants, using sophisticated specialist techniques. They are difficult to describe
without technical mathematical jargon, but they are still accessible in MATLAB for our use.
The Levenberg-Marquardt algorithm, for example, works extremely quickly and reliably
and is probably the algorithm of choice within MATLAB.
One of the problems which can occur is that our network learns the training data too well
– it specialises to the data input and it is no good at generalisation. This is what happened
with the curve fitting backprop demo in the laboratory when the difficulty index was 1
but we used 9 neurons to fit the data.
The difficulty is that we are training and always looking for a smaller error value – how
do we know that reducing the error on the training set is a bad thing from the point of
view of generalising?
One answer is early stopping, which uses a separate validation (cross-validation) set.
Recall the process we have settled on to train a network:
Split data into train set and test set (randomly);
Train network using training data – then test network on test set. If we get a sufficiently
good result then we will use the net on new data and trust the results.
However, when we overtrain, the test set gives bad results, so we can't use the network.
When we use early stopping we split the original data into 3 sets: a training set, a test
set and a validation (cross-validation) set.
We train the network on the training set – but we keep checking on the accuracy of the
net on the validation set as well as the training set. As long as the error on both sets is
reducing we keep training – but only the training data is used in changing the network
weights. When we start to get increasing error on the validation data we stop training the
net and test it on the test data. If it works OK with the test data then we are willing to use it
on new data. We haven't covered this in labs, but MATLAB can build this into the training
with the NN tool.
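The early stopping procedure described above can be sketched as a generic loop. This is not how the MATLAB tool implements it; the two callables and the `patience` parameter are hypothetical stand-ins for a real network's training step and validation check.

```python
def train_with_early_stopping(train_one_epoch, validation_error,
                              max_epochs=1000, patience=1):
    """Train until the validation error fails to improve `patience` epochs in a row.

    train_one_epoch: callable that updates the weights using training data only.
    validation_error: callable returning the current error on the validation set.
    Returns (best_epoch, best_error).
    """
    best_error = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()           # only the training data changes the weights
        error = validation_error()  # the validation data is only used for checking
        if error < best_error:
            best_error, best_epoch = error, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break               # validation error has started to rise
    return best_epoch, best_error


# Toy demonstration: a pre-recorded validation curve stands in for a real network.
fake_curve = iter([0.9, 0.6, 0.4, 0.35, 0.5, 0.7])
best_epoch, best_error = train_with_early_stopping(
    train_one_epoch=lambda: None,
    validation_error=lambda: next(fake_curve),
)
```

With the made-up curve the loop stops at epoch 5, the first epoch where the validation error rises, and reports epoch 4 as the best. A fuller version would also keep a copy of the weights from the best epoch and restore them before testing on the test set.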
Because of the danger of overfitting we develop our network layout by
trial and error, gradually increasing the number of neurons in the hidden layer, trying to get
acceptable performance levels on the test data. Too few neurons and the net will have
difficulty learning the training data – too many and it will not generalise well.
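The trial-and-error search over hidden layer sizes can be sketched as follows. The callable `train_and_test` is a hypothetical placeholder for "train a network with this many hidden neurons and return its error on the test data", and the error figures are made up.

```python
def choose_hidden_size(train_and_test, candidate_sizes, acceptable_error):
    """Return the smallest hidden-layer size whose test error is acceptable.

    train_and_test: callable taking a hidden-layer size and returning the test
        error of a network trained with that many hidden neurons.
    Returns (size, error), or (None, None) if no candidate is good enough.
    """
    for size in sorted(candidate_sizes):  # start small, grow gradually
        error = train_and_test(size)
        if error <= acceptable_error:
            return size, error
    return None, None


# Made-up test errors standing in for real training runs.
fake_errors = {1: 0.40, 2: 0.22, 3: 0.08, 5: 0.07, 9: 0.30}
size, error = choose_hidden_size(fake_errors.get, fake_errors.keys(),
                                 acceptable_error=0.10)
```

In the made-up figures, one or two neurons are too few to learn the data, while nine neurons give a worse test error than three, which is the pattern expected when a network stops generalising well.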
Portfolio Exercise
Describe supervised learning in your own words.