SUPERVISED LEARNING NETWORKS
• Hebb learning
• Hebb learning is based on the premise that if two neurons are active simultaneously, then the strength of the connection between them should be increased.
• This was later extended so that the weights are also updated when the two neurons disagree.
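In symbols (the standard Hebb form, not spelled out on the slide), the update for the weight between units with activations xi and y is Δwi = xi · y, so with bipolar activations the weight grows when the two units agree in sign and shrinks when they disagree.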
•Perceptron learning rule
•The perceptron is also called the Rosenblatt neuron.
•This is considered to be more powerful than Hebb training.
•One can show that the perceptron weights will converge if a
solution exists.
•Perceptrons use a threshold output function.
•The updates occur only when there is an error.
• Widrow-Hoff learning rule
• Also known as the LMS, delta, or batch learning rule.
• The delta rule adjusts the weights to reduce the difference between the computed output and the desired output.
• This results in the smallest mean square error.
• Minimizing the mean square error gives a more generalized system, one that also responds sensibly to inputs which are similar, but not identical, to the training vectors (the update rule is sketched below).
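For reference, a standard way to write the Widrow-Hoff update for a single linear unit (not shown on the slide) uses the same notation as the perceptron rule that follows, where η is the learning rate and yin = b + Σi wi xi is the net input of the unit:

Δwi = η (t − yin) xi
Δb = η (t − yin)

The main difference from the perceptron rule below is that the error is taken against the linear net input rather than against the thresholded output.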
SUPERVISED LEARNING NETWORKS

Training and test data sets

Training set: both the input and the target output are specified.
PERCEPTRON LEARNING
wi ← wi + Δwi
Δwi = η (t − o) xi
where
t is the target value,
o is the perceptron output,
η is a small constant (e.g., 0.1) called the learning rate.

If the output is correct (t = o), the weights are not changed.

If the output is incorrect (t ≠ o), the weights wi are changed such that the output of the perceptron for the new weights is closer to t.

The algorithm converges to the correct classification
• if the training data is linearly separable, and
• η is sufficiently small.
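A minimal Python sketch of this perceptron update, assuming binary 0/1 inputs and outputs, a bias handled as an extra weight, and a fixed learning rate of 0.1; the function and variable names are illustrative, not from the lecture:

import numpy as np

def train_perceptron(X, targets, lr=0.1, max_epochs=100):
    """Single perceptron trained with  w_i <- w_i + lr * (t - o) * x_i."""
    w = np.zeros(X.shape[1])                      # weights
    b = 0.0                                       # bias (threshold term)
    for epoch in range(max_epochs):
        errors = 0
        for x, t in zip(X, targets):
            o = 1 if np.dot(w, x) + b > 0 else 0  # threshold output
            if o != t:                            # update only when there is an error
                w += lr * (t - o) * x
                b += lr * (t - o)
                errors += 1
        if errors == 0:                           # every pattern classified correctly
            break
    return w, b

# One epoch presents all four AND patterns to the network
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t_and = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, t_and)

Learning the AND function this way converges quickly because the data is linearly separable.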
LEARNING ALGORITHM

Epoch : Presentation of the entire training set to the neural
network.

In the case of the AND function, an epoch consists of four sets of
inputs being presented to the network (i.e. [0,0], [0,1], [1,0],
[1,1]).

Error: The error value is the amount by which the value output by the network differs from the target value, i.e. Error = T − O. For example, if we require the network to output 0 and it outputs 1, then Error = −1.

Target Value, T : When we are training a network we not only
present it with the input but also with a value that we require the
network to produce. For example, if we present the network with
[1,1] for the AND function, the target value will be 1.

Output , O : The output value from the neuron.

Ij : Inputs being presented to the neuron.

Wj : Weight from input neuron j to the output neuron.

LR : The learning rate. This dictates how quickly the network converges. It is set by experimentation; it is typically 0.1.
TRAINING ALGORITHM

Adjust neural network weights to map inputs to outputs.

Use a set of sample patterns where the desired output (given the
inputs presented) is known.
(For binary input and binary output)
(For binary or bipolar input and bipolar output)
ADAPTIVE LINEAR NEURON (ADALINE)
• In 1959, Bernard Widrow and Marcian Hoff of Stanford developed
models they called ADALINE (Adaptive Linear Neuron) and MADALINE
(Multilayer ADALINE).
•MADALINE was the first neural network to be applied to a real world
problem. It is an adaptive filter which eliminates echoes on phone
lines.
The units with linear activation function are called linear units.
A network with a single linear unit is called an Adaline.
The input-output relationship is linear.
ADALINE MODEL
USING ADALINE NETWORKS
An ADALINE network is used in three phases: Initialize, Training, and Thinking (each is detailed below).
Initialize
• Assign random weights to all links
Training
• Feed in the known inputs in random sequence
• Simulate the network
• Compute the error between the desired output and the network output (error function)
• Adjust the weights (learning function)
• Repeat until total error < ε
Thinking
• Simulate the network
• Network will respond to any input
• Does not guarantee a correct solution even
for trained inputs
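A minimal sketch of this Initialize/Training/Thinking loop for a single ADALINE, using the Widrow-Hoff (delta) rule given earlier; the bipolar data convention, the error threshold, and all names are illustrative assumptions:

import numpy as np

def train_adaline(X, targets, lr=0.1, eps=0.05, max_epochs=1000):
    """Delta-rule training of one linear unit: w <- w + lr * (t - y_in) * x."""
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.5, 0.5, X.shape[1])    # Initialize: small random weights
    b = rng.uniform(-0.5, 0.5)
    for epoch in range(max_epochs):
        total_error = 0.0
        for i in rng.permutation(len(X)):     # Training: feed inputs in random sequence
            x, t = X[i], targets[i]
            y_in = np.dot(w, x) + b           # simulate the network (linear unit)
            err = t - y_in                    # error function
            w += lr * err * x                 # learning function
            b += lr * err
            total_error += err ** 2
        if total_error < eps:                 # repeat until total error < epsilon
            break
    return w, b

def think(x, w, b):
    """Thinking: the trained network responds to any input (no correctness guarantee)."""
    return 1 if np.dot(w, x) + b >= 0 else -1  # bipolar output via threshold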
MADALINE NETWORK
MADALINE is a Multilayer Adaptive Linear Element. MADALINE was the first neural network to be applied to a real-world problem, and it is used in several adaptive filtering processes.
Only the weights for the hidden ADALINEs are adjusted; the weights for the output unit are fixed.

MADALINE

The figure shows a MADALINE with two hidden ADALINEs and one output ADALINE.
TRAINING ALGORITHM

The weights entering the output unit Y and its bias are fixed; they may be taken as v1 = v2 = 0.5 and b = 0.5, so that Y behaves like a logical OR of the hidden ADALINE outputs.
TRAINING ALGORITHM

Step 5: Calculate the output of each hidden unit:
zj = f(zinj)

Step 6: Find the output of the net:
yin = b0 + Σ j=1..m zj vj
y = f(yin)
TRAINING ALGORITHM
Step 7: If an error occurred (y ≠ t), update the weights of the appropriate hidden ADALINE(s); the weights into the output unit stay fixed.

Step 8: Test for the stopping condition.
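A small sketch of the forward pass in Steps 5 and 6, assuming bipolar signals, two hidden ADALINEs, and the fixed output weights given above; the hidden weights and names are illustrative:

import numpy as np

def bipolar_step(s):
    """Activation f used by the MADALINE units: +1 if s >= 0, else -1."""
    return 1 if s >= 0 else -1

def madaline_forward(x, W, b_hidden, v, b_out):
    """Steps 5-6: z_j = f(z_inj), then y = f(b0 + sum_j z_j v_j)."""
    z_in = W @ x + b_hidden                        # net input of each hidden ADALINE
    z = np.array([bipolar_step(s) for s in z_in])  # Step 5: hidden outputs
    y_in = b_out + np.dot(v, z)                    # Step 6: net input of the output unit
    return bipolar_step(y_in), z

# Two hidden ADALINEs; output weights fixed so Y acts like an OR of z1 and z2
W = np.array([[0.05, 0.2], [0.1, 0.2]])            # illustrative hidden weights
b_hidden = np.array([0.3, 0.15])
v, b_out = np.array([0.5, 0.5]), 0.5
y, z = madaline_forward(np.array([1, -1]), W, b_hidden, v, b_out)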
LAYERS IN NEURAL NETWORK

The input layer:
• Introduces input values into the network.
• No activation function or other processing.

The hidden layer(s):
• Performs classification of features.
• Two hidden layers are sufficient to solve any problem.
• More complex features may mean that more hidden layers work better.

The output layer:
• Functionally, it is just like the hidden layers.
• Outputs are passed on to the world outside the neural
network.
MULTILAYER FEEDFORWARD NETWORK
The figure shows a multilayer feedforward network with input units I0–I3, hidden units h0–h2, and output units o0 and o1; signals flow from the inputs through the hidden layer to the outputs.
MULTILAYER FEEDFORWARD NETWORK:
ACTIVATION AND TRAINING

For feedforward networks:
• A continuous activation function can be differentiated, allowing gradient descent.
• Back propagation is an example of a gradient-descent technique.
• Uses a sigmoid (binary or bipolar) activation function.
In multilayer networks, the activation function is usually more complex than just a threshold function, like 1/[1 + exp(−x)] or even 2/[1 + exp(−x)] − 1, etc.
Binary Sigmoidal Function
Bipolar Sigmoidal Function
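A short Python sketch of these two activation functions and their derivatives (the derivatives are what backpropagation uses; the function names are illustrative):

import numpy as np

def binary_sigmoid(x):
    """Binary sigmoidal function: f(x) = 1 / (1 + exp(-x)), output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def binary_sigmoid_deriv(x):
    """f'(x) = f(x) * (1 - f(x))."""
    f = binary_sigmoid(x)
    return f * (1.0 - f)

def bipolar_sigmoid(x):
    """Bipolar sigmoidal function: f(x) = 2 / (1 + exp(-x)) - 1, output in (-1, 1)."""
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def bipolar_sigmoid_deriv(x):
    """f'(x) = 0.5 * (1 + f(x)) * (1 - f(x))."""
    f = bipolar_sigmoid(x)
    return 0.5 * (1.0 + f) * (1.0 - f)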
BACK PROPAGATION NETWORK



• A training procedure which allows multilayer feedforward neural networks to be trained.
• Can theoretically perform “any” input-output mapping.
• Can learn to solve linearly inseparable problems.
BACKPROPAGATION TRAINING ALGORITHM
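The algorithm steps on these slides appear as figures in the original lecture; as a stand-in, here is a minimal one-hidden-layer backpropagation sketch using the binary sigmoid defined above. The network size, learning rate, epoch count, and names are illustrative assumptions, not the lecture's exact algorithm:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_backprop(X, T, n_hidden=4, lr=0.5, epochs=10000, seed=0):
    """Gradient-descent training of a one-hidden-layer feedforward network."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    V  = rng.uniform(-0.5, 0.5, (n_in, n_hidden))    # input-to-hidden weights
    bV = np.zeros(n_hidden)                          # hidden biases
    W  = rng.uniform(-0.5, 0.5, (n_hidden, n_out))   # hidden-to-output weights
    bW = np.zeros(n_out)                             # output biases
    for _ in range(epochs):
        Z = sigmoid(X @ V + bV)                      # forward pass: hidden activations
        Y = sigmoid(Z @ W + bW)                      # forward pass: output activations
        d_out = (T - Y) * Y * (1 - Y)                # backward pass: output delta terms
        d_hid = (d_out @ W.T) * Z * (1 - Z)          # backward pass: hidden delta terms
        W  += lr * Z.T @ d_out                       # gradient-descent weight updates
        bW += lr * d_out.sum(axis=0)
        V  += lr * X.T @ d_hid
        bV += lr * d_hid.sum(axis=0)
    return V, bV, W, bW

# Example: XOR, a linearly inseparable mapping
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
params = train_backprop(X, T)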
BACKPROPAGATION

Gradient descent over entire network weight vector

Easily generalized to arbitrary directed graphs

Will find a local, not necessarily global, error minimum; in practice it often works well (it can be run multiple times with different initial weights)

Often includes a weight momentum term, as noted below

Minimizes the error over the training examples

Will it generalize well to unseen instances (over-fitting)?
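For reference, a common form of the momentum term (standard practice, not spelled out on these slides) adds a fraction α of the previous weight change to the current gradient-descent step:

Δw(t) = η δ x + α Δw(t − 1), with 0 ≤ α < 1 (often around 0.9),

which smooths the updates and can speed up convergence along shallow directions of the error surface.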
APPLICATIONS OF BACKPROPAGATION
NETWORK

Load forecasting problems in power systems.

Image processing.

Fault diagnosis and fault detection.

Gesture recognition, speech recognition.

Signature verification.

Bioinformatics.

Structural engineering design (civil).
RADIAL BASIS FUNCTION NETWORK

The radial basis function (RBF) network is a classification and function approximation neural network developed by M.J.D. Powell.

An RBFN performs classification by measuring the input’s
similarity to examples from the training set.

The network uses the most common nonlinearities such as
sigmoidal and Gaussian kernel functions.

Each RBFN neuron stores a “prototype”, which is just one of the
examples from the training set.

When we want to classify a new input, each neuron computes the
Euclidean distance between the input and its prototype.

If the input more closely resembles the class A prototypes than
the class B prototypes, it is classified as class A.
RADIAL BASIS FUNCTION NETWORK
RADIAL BASIS FUNCTION NETWORK
• Each RBF neuron stores a “prototype” vector which is just one of the vectors from the training set.
• Each RBF neuron compares the input vector to its prototype, and outputs a value between 0 and 1 which is a measure of similarity.
• If the input is equal to the prototype, then the output of that RBF neuron will be 1.
• As the distance between the input and prototype grows, the response falls off exponentially towards 0.
• The shape of the RBF neuron’s response is a bell curve.
• The prototype vector is also often called the neuron’s “center”, since it’s the value at the center of the bell curve.
RADIAL BASIS FUNCTION NETWORK
• The output of the network consists of a set of nodes, one per category that we are trying to classify.
• The output node will typically give a positive weight to the RBF neurons that belong to its category, and a negative weight to the others.
RBF NEURON ACTIVATION FUNCTION

There are different possible choices of similarity functions, but the most popular is based on the Gaussian. The equation for a Gaussian with a one-dimensional input is

f(x) = (1 / (σ √(2π))) exp( −(x − µ)² / (2σ²) )

where x is the input, µ is the mean, and σ is the standard deviation.

This produces the familiar bell curve which is centered at the mean.

RBF NEURON ACTIVATION FUNCTION

The RBF neuron activation function is slightly different from the plain Gaussian, and is typically written as

φ(x) = exp( −β ‖x − µ‖² )

• µ refers to the prototype vector, which is at the center of the bell curve.
• β is a coefficient that controls the width of the bell curve.
• The double-bar notation indicates the Euclidean distance between x and µ.
• Generally, β is taken as M / d²max, where M is the number of basis functions and dmax is the maximum distance between the prototypes.
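A compact sketch of an RBF neuron and the weighted output layer described above; the prototype placement, β value, weight values, and names are illustrative assumptions:

import numpy as np

def rbf_activation(x, mu, beta):
    """RBF neuron: phi(x) = exp(-beta * ||x - mu||^2), a similarity score in (0, 1]."""
    return np.exp(-beta * np.sum((x - mu) ** 2))

def rbfn_output(x, prototypes, beta, W):
    """One score per category: a weighted sum of the RBF neuron activations."""
    phi = np.array([rbf_activation(x, mu, beta) for mu in prototypes])
    return W @ phi                         # W has one row of output weights per category

# Illustrative two-prototype, two-category network
prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])
beta = 1.0
W = np.array([[ 1.0, -1.0],                # category A: positive weight on its own neuron
              [-1.0,  1.0]])               # category B: negative weight on the other
scores = rbfn_output(np.array([0.2, 0.1]), prototypes, beta, W)
predicted_category = int(np.argmax(scores))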

TRAINING THE RBFN

The training process for an RBFN consists of selecting three sets of parameters:
• the prototypes (µ),
• the β coefficient for each of the RBF neurons,
• the matrix of output weights between the RBF neurons and the output nodes.
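A minimal sketch of one common way to choose these three parameter sets: use the training points themselves as prototypes, set β from the M / d²max rule above, and fit the output weights by least squares. This is one standard recipe under those assumptions, not necessarily the procedure the lecture has in mind:

import numpy as np

def fit_rbfn(X, T):
    """X: training inputs (n, d); T: one-hot targets (n, k). Returns (prototypes, beta, W)."""
    prototypes = X.copy()                            # 1) prototypes: the training points
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    beta = len(prototypes) / dists.max() ** 2        # 2) beta = M / d_max^2
    Phi = np.exp(-beta * dists ** 2)                 # hidden activations for every input
    W, *_ = np.linalg.lstsq(Phi, T, rcond=None)      # 3) output weights by least squares
    return prototypes, beta, W.T                     # one row of output weights per category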

THE XOR PROBLEM IN RBF FORM
First, decide how many basis functions are needed. Given there are four training patterns and two classes, M = 2 seems a reasonable first guess.

Then the basis function centres need to be chosen. The two well-separated patterns with target zero seem a good choice, so μ1 = (0, 0) and μ2 = (1, 1).

The distance between them is dmax = √2, so β = M / d²max = 1. That gives the basis functions:

φ1(x) = exp( −‖x − μ1‖² )
φ2(x) = exp( −‖x − μ2‖² )
THE XOR PROBLEM IN RBF FORM

Since the hidden unit activation space is only two
dimensional, it is easy to plot the activations to see
how the four input patterns have been transformed:
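A few lines to compute that transformation numerically, using the two basis functions above (numpy assumed; plotting code omitted):

import numpy as np

patterns = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
mu1, mu2 = np.array([0.0, 0.0]), np.array([1.0, 1.0])

phi1 = np.exp(-np.sum((patterns - mu1) ** 2, axis=1))   # activation of the first basis function
phi2 = np.exp(-np.sum((patterns - mu2) ** 2, axis=1))   # activation of the second basis function

# Each input pattern maps to the point (phi1, phi2) in hidden activation space;
# in that space the two XOR classes become linearly separable.
hidden_points = np.column_stack([phi1, phi2])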
SUMMARY
This chapter discussed several supervised learning networks:
Perceptron,
Adaline,
Madaline,
Backpropagation Network,
Radial Basis Function Network.
Apart from those mentioned above, there are several other supervised neural networks, such as tree neural networks, wavelet neural networks, functional link neural networks, and so on.