Artificial Neural Networks

Neural Networks
    Biological Neural Networks
    Artificial Neural Networks
    ANN Structure
    ANN Illustration
Perceptrons
    Perceptron Learning Rule
    Perceptrons Continued
    Example of Perceptron Learning (α = 1)
Gradient Descent
    Gradient Descent (Linear)
    The Delta Rule
Multilayer Networks
    Illustration
    Sigmoid Activation
    Plot of Sigmoid Function
    Applying The Chain Rule
    Backpropagation
    Hypertangent Activation Function
    Gaussian Activation Function
    Issues in Neural Networks

Biological Neural Networks
Neural networks are inspired by our brains.
The human brain has about 10^11 neurons and 10^14 synapses.
A neuron consists of a soma (cell body), axons (which send signals), and
dendrites (which receive signals).
A synapse connects an axon to a dendrite.
Given a signal, a synapse might increase (excite) or decrease (inhibit)
electrical potential. A neuron fires when its electrical potential reaches a
threshold.
Learning might occur by changes to synapses.

Artificial Neural Networks

An (artificial) neural network consists of units, connections, and weights. Inputs
and outputs are numeric.

    Biological NN     Artificial NN
    soma              unit
    axon, dendrite    connection
    synapse           weight
    potential         weighted sum
    threshold         bias weight
    signal            activation

ANN Structure

A typical unit i receives inputs aj1, aj2, . . . from other units and performs a
weighted sum:
    ini = W0,i + Σj Wj,i aj
and outputs activation ai = g(ini).
Typically, input units store the inputs, hidden units transform the inputs
into an internal numeric vector, and an output unit transforms the hidden
values into the prediction.
An ANN is a function f(x, W) = a, where x is an example, W is the
weights, and a is the prediction (the activation value from the output unit).
Learning is finding a W that minimizes error.
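As a concrete illustration (not from the slides), here is a minimal Python sketch of a single unit computing its weighted sum and activation; the names unit_activation, weights, bias, and g are my own:

    import math

    def unit_activation(inputs, weights, bias, g):
        """Compute in_i = W0 + sum_j Wj * aj, then return a_i = g(in_i)."""
        in_i = bias + sum(w * a for w, a in zip(weights, inputs))
        return g(in_i)

    # Example: a unit with a sigmoid activation function g.
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    print(unit_activation([1.0, 0.0, 1.0], [0.5, -0.25, 0.75], -0.5, sigmoid))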

ANN Illustration

[Figure: a feed-forward network with input units x1, x2, x3, x4, hidden units H5
and H6, and an output unit O7 producing activation a7. Weights W15, W25, W35,
W45 and W16, W26, W36, W46 connect the inputs to the hidden units; W57 and W67
connect the hidden units to the output unit; W05, W06, and W07 are the bias
weights.]

Perceptrons

Perceptron Learning Rule

A perceptron is a single unit with activation:
    a = sign(W0 + Σj Wj ∗ xj)
sign returns −1 or 1. W0 is the bias weight.
One version of the perceptron learning rule is:
    in ← W0 + Σj Wj ∗ xj
    E ← max(0, 1 − y ∗ in)
    if E > 0 then
        W0 ← W0 + α ∗ y
        Wj ← Wj + α ∗ xj ∗ y for each input xj
x is the inputs, y ∈ {1, −1} is the target, and α > 0 is the learning rate.
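A minimal Python sketch of this learning rule (the function and variable names are my own, not from the slides):

    def perceptron_epoch(examples, W, alpha=1.0):
        """One pass (epoch) of the perceptron learning rule.

        examples: list of (x, y) pairs with y in {1, -1}.
        W: weight list [W0, W1, ..., Wn], where W[0] is the bias weight.
        """
        for x, y in examples:
            in_ = W[0] + sum(Wj * xj for Wj, xj in zip(W[1:], x))
            E = max(0.0, 1.0 - y * in_)
            if E > 0:                    # update only when the margin error is positive
                W[0] += alpha * y
                for j, xj in enumerate(x):
                    W[j + 1] += alpha * xj * y
        return W

Calling perceptron_epoch repeatedly corresponds to the epochs discussed on the next slide.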
ANN Illustration
INPUT
UNITS
6
OUTPUT
UNIT
CS 5233 Artificial Intelligence
OUTPUT
H5
Perceptrons Continued
W05
W57
bias
W07
O7
a7
W06
W67
Artificial Neural Networks – 6
This learning rule tends to minimize E.
The perceptron convergence theorem states that if some W classifies all
the training examples correctly, then the perceptron learning rule will
converge to zero error on the training examples.
Usually, many epochs (passes over the training examples) are needed until
convergence.
If zero error is not possible, use α ≈ 0.1/n, where n is the number of
normalized or standardized inputs.

Example of Perceptron Learning (α = 1)

Using α = 1 and starting from all-zero weights, one pass over nine training
examples gives (each row shows the weights after processing that example; the
first row shows the initial weights):

    Inputs               y    in   E    Weights
    x1  x2  x3  x4                      W0  W1  W2  W3  W4
                                         0   0   0   0   0
     0   0   0   1      -1     0   1    -1   0   0   0  -1
     1   1   1   0       1    -1   2     0   1   1   1  -1
     1   1   1   1       1     2   0     0   1   1   1  -1
     0   0   1   1      -1     0   1    -1   1   1   0  -2
     0   0   0   0       1    -1   2     0   1   1   0  -2
     0   1   0   1      -1    -1   0     0   1   1   0  -2
     1   0   0   0       1     1   0     0   1   1   0  -2
     1   0   1   1       1    -1   2     1   2   1   1  -1
     0   1   0   0      -1     2   3     0   2   0   1  -1
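The rows above can be reproduced by replaying the rule in code; a self-contained sketch (the variable names are mine):

    # (x1, x2, x3, x4) and y for each of the nine examples above.
    examples = [
        ((0, 0, 0, 1), -1),
        ((1, 1, 1, 0),  1),
        ((1, 1, 1, 1),  1),
        ((0, 0, 1, 1), -1),
        ((0, 0, 0, 0),  1),
        ((0, 1, 0, 1), -1),
        ((1, 0, 0, 0),  1),
        ((1, 0, 1, 1),  1),
        ((0, 1, 0, 0), -1),
    ]

    W = [0, 0, 0, 0, 0]               # [W0, W1, W2, W3, W4], alpha = 1
    for x, y in examples:
        in_ = W[0] + sum(Wj * xj for Wj, xj in zip(W[1:], x))
        E = max(0, 1 - y * in_)
        if E > 0:
            W[0] += y
            for j, xj in enumerate(x):
                W[j + 1] += xj * y
        print(x, y, in_, E, W)        # matches the in, E, and weight columns row by row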

Gradient Descent

Gradient Descent (Linear)

Suppose activation is a linear function:
    a = W0 + Σj Wj ∗ xj
The Widrow-Hoff (LMS) update rule is:
    diff ← y − a
    W0 ← W0 + α ∗ diff
    Wj ← Wj + α ∗ xj ∗ diff for each input j
where α is the learning rate.
This update rule tends to minimize the squared error
    E(W) = Σ_(x,y) (y − a)²
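A minimal Python sketch of one pass of the Widrow-Hoff (LMS) rule for a linear unit (the names are illustrative, not from the slides):

    def lms_epoch(examples, W, alpha=0.1):
        """One pass of the Widrow-Hoff (LMS) rule.

        examples: list of (x, y) pairs; W = [W0, W1, ..., Wn], W[0] is the bias weight.
        """
        for x, y in examples:
            a = W[0] + sum(Wj * xj for Wj, xj in zip(W[1:], x))   # linear activation
            diff = y - a
            W[0] += alpha * diff
            for j, xj in enumerate(x):
                W[j + 1] += alpha * xj * diff
        return W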

The Delta Rule

Given error E(W), obtain the gradient:
    ∇E(W) = (∂E/∂W0, ∂E/∂W1, . . . , ∂E/∂Wn)
To decrease error, use the update rule:
    Wj ← Wj − α ∂E/∂Wj
The LMS update rule can be derived using:
    E(W) = (y − (W0 + Σj Wj ∗ xj))² / 2
For the perceptron learning rule, use:
    E(W) = max(0, 1 − y ∗ (W0 + Σj Wj ∗ xj))
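To see that the LMS rule is gradient descent on this squared error, the analytic partial derivatives can be compared with finite-difference estimates; a small sketch for a single example (all names are mine):

    def half_squared_error(W, x, y):
        a = W[0] + sum(Wj * xj for Wj, xj in zip(W[1:], x))
        return (y - a) ** 2 / 2.0

    def numeric_grad(W, x, y, eps=1e-6):
        grad = []
        for j in range(len(W)):
            Wp, Wm = list(W), list(W)
            Wp[j] += eps
            Wm[j] -= eps
            grad.append((half_squared_error(Wp, x, y) - half_squared_error(Wm, x, y)) / (2 * eps))
        return grad

    W, x, y = [0.1, -0.2, 0.4], (1.0, 2.0), 1.0
    a = W[0] + W[1] * x[0] + W[2] * x[1]
    analytic = [-(y - a) * xj for xj in (1.0,) + x]   # dE/dWj = -(y - a) * xj, with x0 = 1
    print(analytic)
    print(numeric_grad(W, x, y))                      # approximately the same values

Taking a step Wj ← Wj − α ∂E/∂Wj with these derivatives gives exactly the Widrow-Hoff (LMS) update shown earlier.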

Multilayer Networks

Illustration

[Figure: the same 4-input, 2-hidden-unit, 1-output network as in the ANN
Illustration, drawn with concrete numeric values on the weight and bias
connections.]

Sigmoid Activation

The sigmoid function is defined as:
    sigmoid(x) = 1 / (1 + e^−x)
It is commonly used for ANN activation functions:
    ai = sigmoid(ini) = sigmoid(W0,i + Σj Wj,i aj)
Note that
    ∂sigmoid(x)/∂x = sigmoid(x)(1 − sigmoid(x))
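A small Python check of the derivative identity (purely illustrative):

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    x = 0.7
    analytic = sigmoid(x) * (1.0 - sigmoid(x))                 # sigmoid(x)(1 - sigmoid(x))
    numeric = (sigmoid(x + 1e-6) - sigmoid(x - 1e-6)) / 2e-6   # central difference
    print(analytic, numeric)   # the two values agree to several decimal places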

Plot of Sigmoid Function

[Figure: plot of sigmoid(x) for x from −4 to 4; the curve rises from near 0.0 to
near 1.0.]

Applying The Chain Rule

Using E = (yi − ai)² for output unit i:
    ∂E/∂Wj,i = (∂E/∂ai) (∂ai/∂ini) (∂ini/∂Wj,i)
             = −2(yi − ai) ai (1 − ai) aj
For weights from input to hidden units:
    ∂E/∂Wk,j = (∂E/∂ai) (∂ai/∂ini) (∂ini/∂aj) (∂aj/∂inj) (∂inj/∂Wk,j)
             = −2(yi − ai) ai (1 − ai) Wj,i aj (1 − aj) xk

Backpropagation

Backpropagation is an application of the delta rule.
Update each weight W using the rule:
    W ← W − α ∂E/∂W
where α is the learning rate.
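Putting the chain-rule expressions and this update rule together, here is a sketch of one backpropagation step for the 4-input, 2-hidden-unit, 1-output network illustrated earlier, with E = (y − a7)². The weight values and names are my own:

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    # W_hidden[j] holds [W0, W1, W2, W3, W4] for hidden unit j; W_out holds
    # [W07, W57, W67] for the output unit (index 0 is always the bias weight).
    W_hidden = [[0.1, -0.2, 0.3, 0.05, -0.1],
                [0.0,  0.2, -0.3, 0.1,  0.2]]
    W_out = [0.05, 0.4, -0.6]
    alpha = 0.1

    def backprop_step(x, y):
        # Forward pass.
        a_hidden = [sigmoid(w[0] + sum(wk * xk for wk, xk in zip(w[1:], x)))
                    for w in W_hidden]
        a7 = sigmoid(W_out[0] + sum(wj * aj for wj, aj in zip(W_out[1:], a_hidden)))

        # Output error term: dE/din7 = -2 (y - a7) a7 (1 - a7).
        delta7 = -2.0 * (y - a7) * a7 * (1.0 - a7)

        # Input-to-hidden updates: dE/dWk,j = delta7 * Wj,7 * aj (1 - aj) * xk.
        for j, (w, aj) in enumerate(zip(W_hidden, a_hidden)):
            delta_j = delta7 * W_out[j + 1] * aj * (1.0 - aj)
            for k, xk in enumerate([1.0] + list(x)):
                w[k] -= alpha * delta_j * xk          # W <- W - alpha dE/dW

        # Hidden-to-output updates: dE/dWj,7 = delta7 * aj (aj = 1 for the bias).
        for idx, aj in enumerate([1.0] + a_hidden):
            W_out[idx] -= alpha * delta7 * aj

    backprop_step((1.0, 0.0, 1.0, 0.0), 1.0)
    print(W_out, W_hidden)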

Hypertangent Activation Function

[Figure: plot of tanh(x) for x from −2 to 2; the curve rises from −1.0 to 1.0.]

Gaussian Activation Function

[Figure: plot of gaussian(x, 1) for x from −3 to 3; a bell-shaped curve with
maximum value 1.0 at x = 0.]

Issues in Neural Networks

Sufficiently complex ANNs can approximate any “reasonable” function.
ANNs have an approximate preference bias for interpolation.
How many hidden units and layers?
What are good initial values?
One approach to avoid overfitting, sketched in code at the end of this section, is:
    Remove “validation” examples from the training examples.
    Train the neural network using the training examples.
    Choose the weights that are best on the validation set.
Faster algorithms might rely on:
    Momentum, a running average of deltas;
    Conjugate Gradient, a second-derivative method;
    or RPROP, which updates based only on the sign of the gradient.
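A sketch of the validation-set approach described above, written around generic train_one_epoch and error_on callables that stand in for whichever network and training rule is used (both names are assumptions, not functions from the slides):

    import copy
    import random

    def fit_with_validation(examples, W, train_one_epoch, error_on,
                            validation_fraction=0.2, max_epochs=200, seed=0):
        """Hold out validation examples, train on the rest, and keep the
        weights that score best on the validation set."""
        rng = random.Random(seed)
        shuffled = examples[:]
        rng.shuffle(shuffled)
        n_val = max(1, int(len(shuffled) * validation_fraction))
        validation, training = shuffled[:n_val], shuffled[n_val:]

        best_W, best_err = copy.deepcopy(W), error_on(W, validation)
        for _ in range(max_epochs):
            W = train_one_epoch(W, training)
            err = error_on(W, validation)
            if err < best_err:             # remember the best weights seen so far
                best_W, best_err = copy.deepcopy(W), err
        return best_W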