Artificial Neural Networks
The Brain
• How do brains work?
• How do human brains differ from those of other animals?
• Can we base models of artificial intelligence on the structure and inner workings of the brain?
The Brain
• The human brain consists of:
  • Approximately 10 billion neurons
  • …and 60 trillion connections (synapses)
• The brain is a highly complex, nonlinear, parallel information-processing system
• By firing many neurons simultaneously, the brain can perform certain tasks faster than the fastest computers in existence today
[Diagram: a biological neuron, showing the soma (cell body), dendrites, axon, and the synapses connecting it to neighbouring neurons]
• An individual neuron has a very simple structure:
  • The cell body is called the soma
  • The small connective fibers are called dendrites
  • The single long fiber is called the axon
• An army of such simple elements constitutes tremendous processing power
Artificial Neural Networks
• An artificial neural network consists of a number of very simple processors called neurons
• Neurons are connected by weighted links
• The links pass signals from one neuron to another based on predefined thresholds
Artificial Neural Networks
• An individual neuron (McCulloch & Pitts, 1943):
  • Computes the weighted sum of the input signals
  • Compares the result with a threshold value, θ
  • If the net input is less than the threshold, the neuron output is -1 (or 0)
  • Otherwise, the neuron becomes activated and its output is +1
Artificial Neural Networks
[Diagram: a single neuron with input signals x1, x2, ..., xn, weights w1, w2, ..., wn, a threshold θ, and an output signal Y]
• The neuron computes the net weighted input:
    X = x1w1 + x2w2 + ... + xnwn
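As a concrete illustration, here is a minimal Python sketch of a McCulloch-Pitts-style neuron with the ±1 output described above; the function name and the example values are assumptions for illustration, not part of the original slides.

```python
def mcculloch_pitts_neuron(inputs, weights, theta):
    """Compute a McCulloch-Pitts-style neuron output.

    inputs, weights: equal-length lists of numbers
    theta: threshold value
    Returns +1 if the net weighted input reaches the threshold, otherwise -1.
    """
    # Net weighted input: X = x1*w1 + x2*w2 + ... + xn*wn
    x = sum(x_i * w_i for x_i, w_i in zip(inputs, weights))
    # Hard-limit the result against the threshold theta
    return 1 if x >= theta else -1

# Example (illustrative values only): two inputs, threshold 0.5
print(mcculloch_pitts_neuron([1, 0], [0.4, 0.3], theta=0.5))  # -> -1
print(mcculloch_pitts_neuron([1, 1], [0.4, 0.3], theta=0.5))  # -> +1
```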
Activation Functions
• Individual neurons adhere to an activation function, which determines whether they propagate their signal (i.e. activate) or not
[Diagram: common activation functions, including the step, sign, and sigmoid functions]
Activation Functions
• The step and sign activation functions are also often called hard limit functions
• We use such functions in decision-making neurons
  • They support classification and other pattern recognition tasks
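A small Python sketch of these activation functions may help make the distinction concrete; the function names here are illustrative, not taken from the slides.

```python
import math

def step(x):
    """Step (hard limit) function: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

def sign(x):
    """Sign (hard limit) function: +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def sigmoid(x):
    """Sigmoid function: a smooth, S-shaped curve between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

# The hard limiters jump abruptly at 0; the sigmoid changes gradually.
for x in (-2.0, -0.1, 0.0, 0.1, 2.0):
    print(x, step(x), sign(x), round(sigmoid(x), 4))
```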
Perceptrons
• Can an individual neuron learn?
• In 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a single-node neural network
• Rosenblatt’s perceptron model consists of a single neuron with adjustable synaptic weights, followed by a hard limiter
Perceptrons
[Diagram: a single-neuron perceptron with inputs x1 and x2, weights w1 and w2, a linear combiner computing X = x1w1 + x2w2, and a hard limiter with threshold θ producing the output Y = Ystep]
Perceptrons
• A perceptron:
  • Classifies inputs x1, x2, ..., xn into one of two distinct classes, A1 and A2
  • Forms a linear separating boundary defined by:
    x1w1 + x2w2 - θ = 0
[Plot: the x1-x2 plane divided into Class A1 and Class A2 by the line x1w1 + x2w2 - θ = 0]
Perceptrons
• A perceptron with three inputs x1, x2, and x3 classifies its inputs into two distinct sets A1 and A2
[Plots: with two inputs the classes are separated by the line x1w1 + x2w2 - θ = 0 in the x1-x2 plane; with three inputs they are separated by the plane x1w1 + x2w2 + x3w3 - θ = 0]
Perceptrons
• How does a perceptron learn?
  • A perceptron has initial (often random) weights, typically in the range [-0.5, 0.5]
  • Apply an established training dataset
  • Calculate the error as the expected output minus the actual output:
    error e = Yexpected - Yactual
  • Adjust the weights to reduce the error
Perceptrons
• How do we adjust a perceptron’s weights to produce Yexpected?
  • If e is positive, we need to increase Yactual (and vice versa)
  • Use this weight-update rule:
    wi(p + 1) = wi(p) + Δwi(p), where Δwi(p) = α × xi(p) × e(p)
  • α is the learning rate (a value between 0 and 1)
  • e is the calculated error
Perceptron Example – AND
• Train a perceptron to recognize the logical AND operation
• Use threshold θ = 0.2 and learning rate α = 0.1
• Repeat over the training set until convergence
  • i.e. the final weights do not change and there is no remaining error
Perceptron Example – AND
• Two-dimensional plot of the logical AND operation:
[Plot: the four input combinations of AND in the x1-x2 plane, with a straight line separating the single (1, 1) point from the other three points]
• A single perceptron can be trained to recognize any linearly separable function
• Can we train a perceptron to recognize logical OR?
• How about logical exclusive-OR (i.e. XOR)?
Perceptron – OR and XOR
• Two-dimensional plots of the logical OR and XOR operations:
[Plots: (b) the OR operation, which is linearly separable; (c) the Exclusive-OR operation, which is not, since no single straight line can separate its two classes]
Perceptron Coding Exercise
• Write code to:
  • Calculate the error at each step
  • Modify the weights, if necessary (i.e. if the error is non-zero)
  • Loop until all error values are zero for a full epoch
• Modify your code to learn to recognize the logical OR operation
• Try to recognize the XOR operation... (a possible sketch follows below)
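One possible solution sketch in Python is shown below. It uses the step activation and the weight-update rule described earlier; the random initial weights, variable names, and overall structure are illustrative assumptions, not a reference solution from the slides.

```python
import random

def train_perceptron(training_data, alpha=0.1, theta=0.2, max_epochs=100):
    """Train a single perceptron with the step activation.

    training_data: list of ((x1, x2), expected_output) pairs
    Returns the learned weights once a full epoch produces zero error,
    or the last weights if max_epochs is reached (as with XOR).
    """
    # Initial weights are chosen randomly in the range [-0.5, 0.5]
    weights = [random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5)]

    for epoch in range(max_epochs):
        total_error = 0
        for inputs, expected in training_data:
            # Net input compared with the threshold, then hard-limited (step)
            net = sum(x * w for x, w in zip(inputs, weights)) - theta
            actual = 1 if net >= 0 else 0
            error = expected - actual
            total_error += abs(error)
            # Weight update: delta_w_i = alpha * x_i * e
            weights = [w + alpha * x * error for w, x in zip(weights, inputs)]
        if total_error == 0:  # converged: a full epoch with no error
            return weights
    return weights

# Logical AND training set; OR works the same way, XOR never converges
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(AND_DATA))
```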
Multilayer Neural Networks
• Multilayer neural networks consist of:
  • An input layer of source neurons
  • One or more hidden layers of computational neurons
  • An output layer of computational neurons
• Input signals are propagated in a layer-by-layer feedforward manner
[Diagram: a multilayer network with an input layer, a middle (hidden) layer, and an output layer; input signals flow forward through the layers to become output signals]
Multilayer Neural Networks
[Diagram: a network with an input layer, a first hidden layer, a second hidden layer, and an output layer]
• The net weighted input of an output-layer neuron is:
    XOUTPUT = yH1w11 + yH2w21 + ... + yHjwj1 + ... + yHmwm1
  where the yHj are the outputs of the hidden-layer neurons
Multilayer Neural Networks
• A neuron in the input layer simply passes its input signal on: XINPUT = x1
• The net weighted input of a hidden-layer neuron is:
    XHIDDEN = x1w11 + x2w21 + ... + xiwi1 + ... + xnwn1
[Diagram: input signals x1 ... xn enter the input layer, pass through hidden-layer neurons j via weights wij and output-layer neurons k via weights wjk to produce outputs y1 ... yl, while error signals propagate backwards from the output layer]
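A short Python sketch of this net-input computation for one layer; the function name and example numbers are illustrative only.

```python
def layer_net_inputs(prev_outputs, weights):
    """Net weighted inputs of every neuron in a layer.

    prev_outputs: signals from the previous layer, [x1, ..., xn]
    weights: weights[i][j] is the weight from previous neuron i to neuron j
    Returns [X1, X2, ...] where Xj = x1*w1j + x2*w2j + ... + xn*wnj.
    """
    n_out = len(weights[0])
    return [sum(prev_outputs[i] * weights[i][j] for i in range(len(prev_outputs)))
            for j in range(n_out)]

# Illustrative values only: two input signals feeding two hidden neurons
print(layer_net_inputs([1.0, 0.0], [[0.3, -0.5], [0.2, 0.7]]))  # -> [0.3, -0.5]
```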
Multilayer Neural Networks
• Three-layer network:
[Diagram: a three-layer network with inputs x1 and x2; hidden neurons 3 and 4 with thresholds θ3 and θ4; output neuron 5 with threshold θ5 and output y5; weights w13, w23, w14, w24 between the input and hidden layers, and w35, w45 between the hidden and output layers]
Multilayer Neural Networks
• Commercial-quality neural networks often incorporate four or more layers
  • Each layer consists of about 10 to 1000 individual neurons
• Experimental and research-based neural networks often use five or six (or more) layers
  • Overall, millions of individual neurons may be used
Back-Propagation NNs
• A back-propagation neural network is a multilayer neural network that propagates error backwards through the network as it learns
• Weights are modified based on the calculated error
• Training is complete when the error falls below a specified threshold (e.g. less than 0.001)
Back-Propagation NNs
[Diagram: a back-propagation network; input signals x1 ... xn flow forward through the input layer, the hidden layer (weights wij), and the output layer (weights wjk) to produce outputs y1 ... yl, while error signals flow backwards from the output layer]
• Use the sigmoid activation function
• Apply the threshold θ by connecting a fixed input of -1 weighted by θ
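A tiny Python check of this equivalence; the numbers happen to match neuron 3 of the example network below, but the snippet itself is only an illustration.

```python
# Threshold as a weight: a fixed input of -1 with weight theta
# gives the same net input as subtracting theta explicitly.
inputs, weights, theta = [1.0, 1.0], [0.5, 0.4], 0.8
net_explicit = sum(x * w for x, w in zip(inputs, weights)) - theta
net_as_weight = sum(x * w for x, w in zip(inputs + [-1.0], weights + [theta]))
print(net_explicit, net_as_weight)  # both 0.1 (up to floating-point rounding)
```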
Back-Propagation NNs
• Initially: w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = -1.2, w45 = 1.1, θ3 = 0.8, θ4 = -0.1, and θ5 = 0.3
1
q3
x1
1
w13
3
1
w35
w23
q5
5
w
w24
14
x2
2
w45
4
w24
Input
layer
q4
1
Hiddenlayer
Output
layer
y5
Step 2: Activation
Activate the back-propagation neural network by applying inputs x1(p), x2(p), …, xn(p) and desired outputs yd,1(p), yd,2(p), …, yd,n(p).
(a) Calculate the actual outputs of the neurons in the hidden layer:
    yj(p) = sigmoid[ Σ(i=1..n) xi(p) · wij(p) - θj ]
where n is the number of inputs of neuron j in the hidden layer, and sigmoid is the sigmoid activation function.
Step 2: Activation (continued)
(b) Calculate the actual outputs of the neurons in the output layer:
    yk(p) = sigmoid[ Σ(j=1..m) xjk(p) · wjk(p) - θk ]
where m is the number of inputs of neuron k in the output layer.
• We consider a training example where the inputs x1 and x2 are equal to 1 and the desired output yd,5 is 0. The actual outputs of neurons 3 and 4 in the hidden layer are calculated as:
    y3 = sigmoid(x1w13 + x2w23 - θ3) = 1 / [1 + e^-(1·0.5 + 1·0.4 - 1·0.8)] = 0.5250
    y4 = sigmoid(x1w14 + x2w24 - θ4) = 1 / [1 + e^-(1·0.9 + 1·1.0 + 1·0.1)] = 0.8808
• Now the actual output of neuron 5 in the output layer is determined as:
    y5 = sigmoid(y3w35 + y4w45 - θ5) = 1 / [1 + e^-(-0.5250·1.2 + 0.8808·1.1 - 1·0.3)] = 0.5097
• Thus, the following error is obtained:
    e = yd,5 - y5 = 0 - 0.5097 = -0.5097
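The forward pass above can be checked with a few lines of Python; this is only a sketch, and the variable names are mine.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initial weights and thresholds from the example
w13, w14, w23, w24 = 0.5, 0.9, 0.4, 1.0
w35, w45 = -1.2, 1.1
t3, t4, t5 = 0.8, -0.1, 0.3
x1, x2, yd5 = 1.0, 1.0, 0.0

y3 = sigmoid(x1 * w13 + x2 * w23 - t3)   # 0.5250
y4 = sigmoid(x1 * w14 + x2 * w24 - t4)   # 0.8808
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)   # 0.5097
e = yd5 - y5                             # -0.5097
print(round(y3, 4), round(y4, 4), round(y5, 4), round(e, 4))
```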
Step 3: Weight training
Update the weights in the back-propagation network by propagating backward the errors associated with the output neurons.
(a) Calculate the error gradient for the neurons in the output layer:
    δk(p) = yk(p) · [1 - yk(p)] · ek(p)
    where ek(p) = yd,k(p) - yk(p)
Calculate the weight corrections:
    Δwjk(p) = α · yj(p) · δk(p)
Update the weights at the output neurons:
    wjk(p + 1) = wjk(p) + Δwjk(p)
Step 3: Weight training (continued)
(b) Calculate the error gradient for the neurons in the hidden layer:
    δj(p) = yj(p) · [1 - yj(p)] · Σ(k=1..l) δk(p) · wjk(p)
Calculate the weight corrections:
    Δwij(p) = α · xi(p) · δj(p)
Update the weights at the hidden neurons:
    wij(p + 1) = wij(p) + Δwij(p)
• The next step is weight training. To update the weights and threshold levels in our network, we propagate the error, e, from the output layer backward to the input layer.
• First, we calculate the error gradient for neuron 5 in the output layer:
    δ5 = y5 · (1 - y5) · e = 0.5097 · (1 - 0.5097) · (-0.5097) = -0.1274
• Then we determine the weight corrections, assuming that the learning rate parameter, α, is equal to 0.1:
    Δw35 = α · y3 · δ5 = 0.1 · 0.5250 · (-0.1274) = -0.0067
    Δw45 = α · y4 · δ5 = 0.1 · 0.8808 · (-0.1274) = -0.0112
    Δθ5 = α · (-1) · δ5 = 0.1 · (-1) · (-0.1274) = 0.0127
• Next we calculate the error gradients for neurons 3 and 4 in the hidden layer:
    δ3 = y3 · (1 - y3) · δ5 · w35 = 0.5250 · (1 - 0.5250) · (-0.1274) · (-1.2) = 0.0381
    δ4 = y4 · (1 - y4) · δ5 · w45 = 0.8808 · (1 - 0.8808) · (-0.1274) · 1.1 = -0.0147
• We then determine the weight corrections:
    Δw13 = α · x1 · δ3 = 0.1 · 1 · 0.0381 = 0.0038
    Δw23 = α · x2 · δ3 = 0.1 · 1 · 0.0381 = 0.0038
    Δθ3 = α · (-1) · δ3 = 0.1 · (-1) · 0.0381 = -0.0038
    Δw14 = α · x1 · δ4 = 0.1 · 1 · (-0.0147) = -0.0015
    Δw24 = α · x2 · δ4 = 0.1 · 1 · (-0.0147) = -0.0015
    Δθ4 = α · (-1) · δ4 = 0.1 · (-1) · (-0.0147) = 0.0015
• At last, we update all weights and thresholds:
    w13 = w13 + Δw13 = 0.5 + 0.0038 = 0.5038
    w14 = w14 + Δw14 = 0.9 - 0.0015 = 0.8985
    w23 = w23 + Δw23 = 0.4 + 0.0038 = 0.4038
    w24 = w24 + Δw24 = 1.0 - 0.0015 = 0.9985
    w35 = w35 + Δw35 = -1.2 - 0.0067 = -1.2067
    w45 = w45 + Δw45 = 1.1 - 0.0112 = 1.0888
    θ3 = θ3 + Δθ3 = 0.8 - 0.0038 = 0.7962
    θ4 = θ4 + Δθ4 = -0.1 + 0.0015 = -0.0985
    θ5 = θ5 + Δθ5 = 0.3 + 0.0127 = 0.3127
• The training process is repeated until the sum of squared errors is less than 0.001.
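The whole iteration above (forward pass, error gradients, weight corrections, and updates) can be reproduced with the short Python sketch below. It follows the formulas of Steps 2 and 3 and uses the example's initial values, but the code organisation itself is a sketch of mine rather than anything given in the slides.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

alpha = 0.1
w13, w14, w23, w24, w35, w45 = 0.5, 0.9, 0.4, 1.0, -1.2, 1.1
t3, t4, t5 = 0.8, -0.1, 0.3
x1, x2, yd5 = 1.0, 1.0, 0.0

# Step 2: forward pass (activation)
y3 = sigmoid(x1 * w13 + x2 * w23 - t3)   # 0.5250
y4 = sigmoid(x1 * w14 + x2 * w24 - t4)   # 0.8808
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)   # 0.5097
e = yd5 - y5                             # -0.5097

# Step 3: error gradients (computed with the weights before they are updated)
d5 = y5 * (1 - y5) * e                   # -0.1274
d3 = y3 * (1 - y3) * d5 * w35            #  0.0381
d4 = y4 * (1 - y4) * d5 * w45            # -0.0147

# Weight and threshold corrections, then updates
w35 += alpha * y3 * d5
w45 += alpha * y4 * d5
t5 += alpha * (-1) * d5
w13 += alpha * x1 * d3
w23 += alpha * x2 * d3
t3 += alpha * (-1) * d3
w14 += alpha * x1 * d4
w24 += alpha * x2 * d4
t4 += alpha * (-1) * d4

print(round(w13, 4), round(w14, 4), round(w23, 4), round(w24, 4))  # 0.5038 0.8985 0.4038 0.9985
print(round(w35, 4), round(w45, 4))                                # -1.2067 1.0888
print(round(t3, 4), round(t4, 4), round(t5, 4))                    # 0.7962 -0.0985 0.3127
```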