SUPERVISED LEARNING NETWORKS
• Hebb learning
• Hebb learning is based on the premise that if two neurons are active simultaneously, then the strength of the connection between them should be increased.
• This was later extended so that the weights are also updated when the two neurons disagree.
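In symbols (the standard Hebb form, not spelled out on the slide), the update for the weight between units with activations xi and y is Δwi = xi · y, so with bipolar activations the weight grows when the two units agree in sign and shrinks when they disagree.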
•Perceptron learning rule
•The perceptron is also called the Rosenblatt neuron.
•This is considered to be more powerful than Hebb training.
•One can show that the perceptron weights will converge if a
solution exists.
•Perceptrons use a threshold output function.
•The updates occur only when there is an error.
• Widrow-Hoff learning rule
• Also known as the LMS, delta, or batch learning rule.
• The delta rule adjusts the weights to reduce the difference between the computed output and the desired output.
• This results in the smallest mean square error.
• Minimizing the mean square error gives a more generalized system, one that also responds sensibly to inputs which are similar, but not identical, to the training vectors (the update rule is sketched below).
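For reference, a standard way to write the Widrow-Hoff update for a single linear unit (not shown on the slide) uses the same notation as the perceptron rule that follows, where η is the learning rate and yin = b + Σi wi xi is the net input of the unit:

Δwi = η (t − yin) xi
Δb = η (t − yin)

The main difference from the perceptron rule below is that the error is taken against the linear net input rather than against the thresholded output.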
SUPERVISED LEARNING NETWORKS

Training and test data sets

Training set: both the input and the target output are specified.
PERCEPTRON LEARNING
wi ← wi + Δwi
Δwi = η (t − o) xi
where
t is the target value,
o is the perceptron output,
η is a small constant (e.g., 0.1) called the learning rate.

If the output is correct (t = o), the weights are not changed.

If the output is incorrect (t ≠ o), the weights wi are changed such that the output of the perceptron for the new weights is closer to t.

The algorithm converges to the correct classification
• if the training data is linearly separable, and
• η is sufficiently small.
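A minimal Python sketch of this perceptron update, assuming binary 0/1 inputs and outputs, a bias handled as an extra weight, and a fixed learning rate of 0.1; the function and variable names are illustrative, not from the lecture:

import numpy as np

def train_perceptron(X, targets, lr=0.1, max_epochs=100):
    """Single perceptron trained with  w_i <- w_i + lr * (t - o) * x_i."""
    w = np.zeros(X.shape[1])                      # weights
    b = 0.0                                       # bias (threshold term)
    for epoch in range(max_epochs):
        errors = 0
        for x, t in zip(X, targets):
            o = 1 if np.dot(w, x) + b > 0 else 0  # threshold output
            if o != t:                            # update only when there is an error
                w += lr * (t - o) * x
                b += lr * (t - o)
                errors += 1
        if errors == 0:                           # every pattern classified correctly
            break
    return w, b

# One epoch presents all four AND patterns to the network
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t_and = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, t_and)

Learning the AND function this way converges quickly because the data is linearly separable.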
LEARNING ALGORITHM

Epoch : Presentation of the entire training set to the neural
network.

In the case of the AND function, an epoch consists of four sets of
inputs being presented to the network (i.e. [0,0], [0,1], [1,0],
[1,1]).

Error: The error value is the amount by which the value output by the network differs from the target value, i.e. Error = T − O. For example, if we require the network to output 0 and it outputs 1, then Error = −1.

Target Value, T : When we are training a network we not only
present it with the input but also with a value that we require the
network to produce. For example, if we present the network with
[1,1] for the AND function, the target value will be 1.

Output , O : The output value from the neuron.

Ij : Inputs being presented to the neuron.

Wj : Weight from input neuron j to the output neuron.

LR : The learning rate. This dictates how quickly the network converges. It is set by experimentation; it is typically 0.1.
TRAINING ALGORITHM

Adjust neural network weights to map inputs to outputs.

Use a set of sample patterns where the desired output (given the
inputs presented) is known.
(For binary input and binary output)
(For binary or bipolar input and bipolar output)
ADAPTIVE LINEAR NEURON (ADALINE)
• In 1959, Bernard Widrow and Marcian Hoff of Stanford developed
models they called ADALINE (Adaptive Linear Neuron) and MADALINE
(Multilayer ADALINE).
•MADALINE was the first neural network to be applied to a real world
problem. It is an adaptive filter which eliminates echoes on phone
lines.
The units with linear activation function are called linear units.
A network with a single linear unit is called an Adaline.
The input-output relationship is linear.
ADALINE MODEL
USING ADALINE NETWORKS
An ADALINE network is used in three phases: Initialize, Training, and Thinking (each is detailed below).
Initialize
• Assign random weights to all links
Training
• Feed in the known inputs in random sequence
• Simulate the network
• Compute the error between the desired output and the network output (error function)
• Adjust the weights (learning function)
• Repeat until total error < ε
Thinking
• Simulate the network
• Network will respond to any input
• Does not guarantee a correct solution even
for trained inputs
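A minimal sketch of this Initialize/Training/Thinking loop for a single ADALINE, using the Widrow-Hoff (delta) rule given earlier; the bipolar data convention, the error threshold, and all names are illustrative assumptions:

import numpy as np

def train_adaline(X, targets, lr=0.1, eps=0.05, max_epochs=1000):
    """Delta-rule training of one linear unit: w <- w + lr * (t - y_in) * x."""
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.5, 0.5, X.shape[1])    # Initialize: small random weights
    b = rng.uniform(-0.5, 0.5)
    for epoch in range(max_epochs):
        total_error = 0.0
        for i in rng.permutation(len(X)):     # Training: feed inputs in random sequence
            x, t = X[i], targets[i]
            y_in = np.dot(w, x) + b           # simulate the network (linear unit)
            err = t - y_in                    # error function
            w += lr * err * x                 # learning function
            b += lr * err
            total_error += err ** 2
        if total_error < eps:                 # repeat until total error < epsilon
            break
    return w, b

def think(x, w, b):
    """Thinking: the trained network responds to any input (no correctness guarantee)."""
    return 1 if np.dot(w, x) + b >= 0 else -1  # bipolar output via threshold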
MADALINE NETWORK
MADALINE is a Multilayer Adaptive Linear Element. MADALINE was the first neural network to be applied to a real-world problem, and it is used in several adaptive filtering processes.
Only the weights for the hidden ADALINEs are adjusted; the weights for the output unit are fixed.

MADALINE

The figure shows a MADALINE with two hidden ADALINEs and one output ADALINE.
TRAINING ALGORITHM

The weights entering the output unit Y and its bias are fixed; they may be taken as v1 = v2 = 0.5 and b = 0.5, so that Y behaves like a logical OR of the hidden ADALINE outputs.
TRAINING ALGORITHM

Step 5: Calculate the output of each hidden unit:
zj = f(zinj)

Step 6: Find the output of the net:
yin = b0 + Σ j=1..m zj vj
y = f(yin)
TRAINING ALGORITHM
Step 7: If an error occurred (y ≠ t), update the weights of the appropriate hidden ADALINE(s); the weights into the output unit stay fixed.

Step 8: Test for the stopping condition.
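A small sketch of the forward pass in Steps 5 and 6, assuming bipolar signals, two hidden ADALINEs, and the fixed output weights given above; the hidden weights and names are illustrative:

import numpy as np

def bipolar_step(s):
    """Activation f used by the MADALINE units: +1 if s >= 0, else -1."""
    return 1 if s >= 0 else -1

def madaline_forward(x, W, b_hidden, v, b_out):
    """Steps 5-6: z_j = f(z_inj), then y = f(b0 + sum_j z_j v_j)."""
    z_in = W @ x + b_hidden                        # net input of each hidden ADALINE
    z = np.array([bipolar_step(s) for s in z_in])  # Step 5: hidden outputs
    y_in = b_out + np.dot(v, z)                    # Step 6: net input of the output unit
    return bipolar_step(y_in), z

# Two hidden ADALINEs; output weights fixed so Y acts like an OR of z1 and z2
W = np.array([[0.05, 0.2], [0.1, 0.2]])            # illustrative hidden weights
b_hidden = np.array([0.3, 0.15])
v, b_out = np.array([0.5, 0.5]), 0.5
y, z = madaline_forward(np.array([1, -1]), W, b_hidden, v, b_out)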
LAYERS IN NEURAL NETWORK

The input layer:
• Introduces input values into the network.
• No activation function or other processing.

The hidden layer(s):
• Performs classification of features.
• Two hidden layers are sufficient to solve any problem.
• More complex features may mean that more hidden layers work better.

The output layer:
• Functionally, it is just like the hidden layers.
• Outputs are passed on to the world outside the neural
network.
MULTILAYER FEEDFORWARD NETWORK
The figure shows a multilayer feedforward network with input units I0–I3, hidden units h0–h2, and output units o0 and o1; signals flow from the inputs through the hidden layer to the outputs.
MULTILAYER FEEDFORWARD NETWORK:
ACTIVATION AND TRAINING

For feedforward networks:
• A continuous activation function can be differentiated, allowing gradient descent.
• Back propagation is an example of a gradient-descent technique.
• Uses a sigmoid (binary or bipolar) activation function.
In multilayer networks, the activation function is usually more complex than just a threshold function, like 1/[1 + exp(−x)] or even 2/[1 + exp(−x)] − 1, etc.
Binary Sigmoidal Function
Bipolar Sigmoidal Function
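A short Python sketch of these two activation functions and their derivatives (the derivatives are what backpropagation uses; the function names are illustrative):

import numpy as np

def binary_sigmoid(x):
    """Binary sigmoidal function: f(x) = 1 / (1 + exp(-x)), output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def binary_sigmoid_deriv(x):
    """f'(x) = f(x) * (1 - f(x))."""
    f = binary_sigmoid(x)
    return f * (1.0 - f)

def bipolar_sigmoid(x):
    """Bipolar sigmoidal function: f(x) = 2 / (1 + exp(-x)) - 1, output in (-1, 1)."""
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def bipolar_sigmoid_deriv(x):
    """f'(x) = 0.5 * (1 + f(x)) * (1 - f(x))."""
    f = bipolar_sigmoid(x)
    return 0.5 * (1.0 + f) * (1.0 - f)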
BACK PROPAGATION NETWORK



• A training procedure which allows multilayer feedforward neural networks to be trained.
• Can theoretically perform “any” input-output mapping.
• Can learn to solve linearly inseparable problems.
BACKPROPAGATION TRAINING ALGORITHM
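The algorithm steps on these slides appear as figures in the original lecture; as a stand-in, here is a minimal one-hidden-layer backpropagation sketch using the binary sigmoid defined above. The network size, learning rate, epoch count, and names are illustrative assumptions, not the lecture's exact algorithm:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_backprop(X, T, n_hidden=4, lr=0.5, epochs=10000, seed=0):
    """Gradient-descent training of a one-hidden-layer feedforward network."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    V  = rng.uniform(-0.5, 0.5, (n_in, n_hidden))    # input-to-hidden weights
    bV = np.zeros(n_hidden)                          # hidden biases
    W  = rng.uniform(-0.5, 0.5, (n_hidden, n_out))   # hidden-to-output weights
    bW = np.zeros(n_out)                             # output biases
    for _ in range(epochs):
        Z = sigmoid(X @ V + bV)                      # forward pass: hidden activations
        Y = sigmoid(Z @ W + bW)                      # forward pass: output activations
        d_out = (T - Y) * Y * (1 - Y)                # backward pass: output delta terms
        d_hid = (d_out @ W.T) * Z * (1 - Z)          # backward pass: hidden delta terms
        W  += lr * Z.T @ d_out                       # gradient-descent weight updates
        bW += lr * d_out.sum(axis=0)
        V  += lr * X.T @ d_hid
        bV += lr * d_hid.sum(axis=0)
    return V, bV, W, bW

# Example: XOR, a linearly inseparable mapping
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
params = train_backprop(X, T)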
BACKPROPAGATION

Gradient descent over entire network weight vector

Easily generalized to arbitrary directed graphs

Will find a local, not necessarily global, error minimum; in practice it often works well (it can be run multiple times with different initial weights)

Often includes a weight momentum term, as noted below

Minimizes the error over the training examples

Will it generalize well to unseen instances (over-fitting)?
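For reference, a common form of the momentum term (standard practice, not spelled out on these slides) adds a fraction α of the previous weight change to the current gradient-descent step:

Δw(t) = η δ x + α Δw(t − 1), with 0 ≤ α < 1 (often around 0.9),

which smooths the updates and can speed up convergence along shallow directions of the error surface.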
APPLICATIONS OF BACKPROPAGATION
NETWORK

Load forecasting problems in power systems.

Image processing.

Fault diagnosis and fault detection.

Gesture recognition, speech recognition.

Signature verification.

Bioinformatics.

Structural engineering design (civil).
RADIAL BASIS FUNCTION NETWORK

The radial basis function (RBF) network is a classification and function approximation neural network developed by M.J.D. Powell.

An RBFN performs classification by measuring the input’s
similarity to examples from the training set.

The network uses the most common nonlinearities such as
sigmoidal and Gaussian kernel functions.

Each RBFN neuron stores a “prototype”, which is just one of the
examples from the training set.

When we want to classify a new input, each neuron computes the
Euclidean distance between the input and its prototype.

If the input more closely resembles the class A prototypes than
the class B prototypes, it is classified as class A.
RADIAL BASIS FUNCTION NETWORK
RADIAL BASIS FUNCTION NETWORK
• Each RBF neuron stores a “prototype” vector which is just one of the vectors from the training set.
• Each RBF neuron compares the input vector to its prototype, and outputs a value between 0 and 1 which is a measure of similarity.
• If the input is equal to the prototype, then the output of that RBF neuron will be 1.
• As the distance between the input and prototype grows, the response falls off exponentially towards 0.
• The shape of the RBF neuron’s response is a bell curve.
• The prototype vector is also often called the neuron’s “center”, since it’s the value at the center of the bell curve.
RADIAL BASIS FUNCTION NETWORK
• The output of the network consists of a set of nodes, one per category that we are trying to classify.
• The output node will typically give a positive weight to the RBF neurons that belong to its category, and a negative weight to the others.
RBF NEURON ACTIVATION FUNCTION

There are different possible choices of similarity functions, but the most popular is based on the Gaussian. The equation for a Gaussian with a one-dimensional input is

f(x) = (1 / (σ √(2π))) exp( −(x − µ)² / (2σ²) )

where x is the input, µ is the mean, and σ is the standard deviation.

This produces the familiar bell curve which is centered at the mean.

RBF NEURON ACTIVATION FUNCTION

The RBF neuron activation function is slightly different from the plain Gaussian, and is typically written as

φ(x) = exp( −β ‖x − µ‖² )

• µ refers to the prototype vector, which is at the center of the bell curve.
• β is a coefficient that controls the width of the bell curve.
• The double-bar notation indicates the Euclidean distance between x and µ.
• Generally, β is taken as M / d²max, where M is the number of basis functions and dmax is the maximum distance between the prototypes.
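A compact sketch of an RBF neuron and the weighted output layer described above; the prototype placement, β value, weight values, and names are illustrative assumptions:

import numpy as np

def rbf_activation(x, mu, beta):
    """RBF neuron: phi(x) = exp(-beta * ||x - mu||^2), a similarity score in (0, 1]."""
    return np.exp(-beta * np.sum((x - mu) ** 2))

def rbfn_output(x, prototypes, beta, W):
    """One score per category: a weighted sum of the RBF neuron activations."""
    phi = np.array([rbf_activation(x, mu, beta) for mu in prototypes])
    return W @ phi                         # W has one row of output weights per category

# Illustrative two-prototype, two-category network
prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])
beta = 1.0
W = np.array([[ 1.0, -1.0],                # category A: positive weight on its own neuron
              [-1.0,  1.0]])               # category B: negative weight on the other
scores = rbfn_output(np.array([0.2, 0.1]), prototypes, beta, W)
predicted_category = int(np.argmax(scores))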

TRAINING THE RBFN

The training process for an RBFN consists of selecting three sets of parameters:
• the prototypes (µ),
• the β coefficient for each of the RBF neurons,
• the matrix of output weights between the RBF neurons and the output nodes.
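A minimal sketch of one common way to choose these three parameter sets: use the training points themselves as prototypes, set β from the M / d²max rule above, and fit the output weights by least squares. This is one standard recipe under those assumptions, not necessarily the procedure the lecture has in mind:

import numpy as np

def fit_rbfn(X, T):
    """X: training inputs (n, d); T: one-hot targets (n, k). Returns (prototypes, beta, W)."""
    prototypes = X.copy()                            # 1) prototypes: the training points
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    beta = len(prototypes) / dists.max() ** 2        # 2) beta = M / d_max^2
    Phi = np.exp(-beta * dists ** 2)                 # hidden activations for every input
    W, *_ = np.linalg.lstsq(Phi, T, rcond=None)      # 3) output weights by least squares
    return prototypes, beta, W.T                     # one row of output weights per category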

THE XOR PROBLEM IN RBF FORM
First, decide how many basis functions are needed. Given there are four training patterns and two classes, M = 2 seems a reasonable first guess.

Then the basis function centres need to be chosen. The two well-separated patterns with target zero seem a good choice, so μ1 = (0, 0) and μ2 = (1, 1).

The distance between them is dmax = √2, so β = M / d²max = 1. That gives the basis functions:

φ1(x) = exp( −‖x − μ1‖² )
φ2(x) = exp( −‖x − μ2‖² )
THE XOR PROBLEM IN RBF FORM

Since the hidden unit activation space is only two
dimensional, it is easy to plot the activations to see
how the four input patterns have been transformed:
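A few lines to compute that transformation numerically, using the two basis functions above (numpy assumed; plotting code omitted):

import numpy as np

patterns = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
mu1, mu2 = np.array([0.0, 0.0]), np.array([1.0, 1.0])

phi1 = np.exp(-np.sum((patterns - mu1) ** 2, axis=1))   # activation of the first basis function
phi2 = np.exp(-np.sum((patterns - mu2) ** 2, axis=1))   # activation of the second basis function

# Each input pattern maps to the point (phi1, phi2) in hidden activation space;
# in that space the two XOR classes become linearly separable.
hidden_points = np.column_stack([phi1, phi2])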
SUMMARY
This chapter discussed several supervised learning networks:
Perceptron,
Adaline,
Madaline,
Backpropagation Network,
Radial Basis Function Network.
Apart from those mentioned above, there are several other supervised neural networks, such as tree neural networks, wavelet neural networks, functional link neural networks, and so on.