Lecture 10: Feed-Forward Neural Networks

Learning outcomes
You will know the two main learning paradigms.
You will know some problem types and the appropriate paradigm to use for each.
You will know some further detail of the practical techniques and problems encountered when setting up networks.

NN Learning Strategies

What is learning? Learning is a process by which the parameters of a NN are adapted through stimulation by the environment. The type of learning is determined by the manner in which the parameter change takes place [adapted from Haykin, Neural Networks, 1994, Macmillan]. So the NN is stimulated by the environment, the NN changes as a result, and the NN then responds to the environment in a new way.

We have met perceptrons and feed-forward NNs and a training algorithm for these types of NN. This algorithm is an example of an error-correction learning algorithm, and the learning paradigm is supervised learning. The NN learns from examples which are given to it: the NN responds and, if necessary, is then changed, because the supervisor (the person who wrote the training program) has decided that the response was not sufficiently correct. The particular algorithm we have seen preferentially changes the weights that are likely to have the most effect. We will see in the final lecture that there are kinds of NN which learn from data in an unsupervised way using a different algorithm (competitive learning).

Problem Types and Suitable Learning Paradigms

Function approximation: an input-output mapping is needed, so supervised learning.
Association: we need to learn some patterns and later recall a pattern from partial information or noisy data; unsupervised learning.
Pattern classification: if the patterns are known in advance this reduces to input-output mappings and hence supervised learning; if there is no knowledge of the expected patterns then unsupervised learning can be used to detect them.
Time series prediction: given a sequence of data we want to guess the next value (for example, stock market prediction); this can be viewed as an input-output mapping, so supervised learning with error correction.

Further Practical Issues in Neural Network Training

In the last lecture we looked at coding the data in an appropriate way. Assuming we have managed to get the data coding correct, what next? We will look at the internal workings of training in the next lecture; for now we will concentrate on practical issues.

We are using an error-correction algorithm: we learn to minimise the error on our training set and then apply the learnt rules to new data. This means that we need as representative a sample as possible to train the network on. We want to randomise the input order in case the order was biased by the collection method. We need to take care to get representatives of every kind of data into the training set, and we take particular care if one group is numerically small (although there are limits to what we can do).

We also need to set some data aside to test the network. After training the network, test it on data where we know the answers but which the network has NOT previously seen. Again we want this test data to be as representative as possible. Only if the performance is acceptable on this test data will we use the network. We usually divide the data into two thirds for training and one third for testing; a sketch of such a split is given below.
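The following MATLAB fragment is a minimal sketch of this randomise-and-split step. It is only an illustration, not lab code: the variable names X and T (inputs and targets, one column per example) are assumptions.

% Minimal sketch of the randomise-and-split step (illustrative, not lab code).
% Assumes inputs X and targets T are matrices with one column per example.
nExamples = size(X, 2);
order     = randperm(nExamples);      % random order removes any collection-order bias
nTrain    = round(2/3 * nExamples);   % two thirds for training, one third for testing

trainIdx = order(1:nTrain);
testIdx  = order(nTrain+1:end);

Xtrain = X(:, trainIdx);   Ttrain = T(:, trainIdx);
Xtest  = X(:, testIdx);    Ttest  = T(:, testIdx);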
Training Methods

The basic approaches date from the early days of neural networks, when training methods were fairly primitive; early backpropagation algorithms were slow and cumbersome. Subsequent work by mathematicians has refined training methods so that there are now many variants using sophisticated specialist techniques. They are difficult to describe without technical mathematical jargon, but they are still accessible in MATLAB for our use. The Levenberg-Marquardt algorithm, for example, works extremely quickly and reliably and is probably the algorithm of choice within MATLAB.

Overfitting

One of the problems which can occur is that our network learns the training data too well: it specialises to the input data and is no good at generalisation. This is what happened with the curve-fitting backprop demo in the laboratory when the difficulty index was 1 but we used 9 neurons to fit the data.

[Figure: overfitted curve fit from the laboratory demo]

The difficulty is that during training we are always looking for a smaller error value, so how do we know when reducing the error on the training set is a bad thing from the point of view of generalising? One answer is early stopping, or cross-validation.

Recall the process we have settled on to train a network: split the data into a training set and a test set (randomly); train the network using the training data; then test the network on the test set. If we get a sufficiently good result we will use the net on new data and trust the results. However, when we overtrain, the test set gives bad results, so we cannot use the network.

When we use early stopping we split the original data into three sets: a training set, a test set, and a cross-validation set. We train the network on the training set, but we keep checking the accuracy of the net on the validation set as well as on the training set. As long as the error on both sets is decreasing we keep training, although only the training data is used in changing the network weights. When the error on the validation data starts to increase we stop training the net and test it on the test data. If it works acceptably on the test data then we are willing to use it on new data. We have not covered this in the labs, but MATLAB can build it into training with the NN tool; a sketch of this workflow is given at the end of these notes.

Because of the danger of overfitting we develop our network layout by trial and error, gradually increasing the number of neurons in the hidden layer and trying to get acceptable performance levels on the test data. Too few neurons and the net will have difficulty learning the training data; too many and it will not generalise well.

Portfolio Exercise

Describe supervised learning in your own words.
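MATLAB sketch: early stopping with a validation set

The fragment below is a rough, unverified sketch of the train/validation/test workflow described above, using standard Neural Network Toolbox calls (feedforwardnet, train, perform). The hidden layer size, the split ratios, and the variable names X and T are illustrative assumptions, not the settings used in the labs.

% Rough sketch of early stopping via a validation set (illustrative assumptions).
% X and T are assumed input and target matrices, one column per example.
net = feedforwardnet(5, 'trainlm');     % 5 hidden neurons (arbitrary), Levenberg-Marquardt

% Let the toolbox split the data into train / validation / test sets at random.
net.divideFcn = 'dividerand';
net.divideParam.trainRatio = 0.60;
net.divideParam.valRatio   = 0.20;      % monitored for early stopping
net.divideParam.testRatio  = 0.20;      % held back for the final check

[net, tr] = train(net, X, T);           % stops once validation error keeps rising

Y = net(X(:, tr.testInd));              % network output on the unseen test examples
testError = perform(net, T(:, tr.testInd), Y);

Because 'trainlm' is selected, this sketch also uses the Levenberg-Marquardt algorithm mentioned under Training Methods; only the error on the test portion should be used to decide whether the network is acceptable.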