CS407 Neural Computation
Lecture 2: Neurobiology and
Architectures of ANNs
Lecturer: A/Prof. M. Bennamoun
NERVOUS SYSTEM & HUMAN BRAIN
Organization of the nervous system
• Central Nervous System
  – Brain
    • Hindbrain & Midbrain: brain stem & cerebellum
    • Forebrain
      – Sub-cortical structures: thalamus, hypothalamus, limbic system
      – Cortex: frontal, parietal, occipital & temporal lobes in left & right hemispheres
  – Spinal cord
THE BIOLOGICAL NEURON
The Structure of Neurons
[Figure: a biological neuron, showing the cell body with its nucleus, the dendrites, the axon, and the synapses]
The Structure of Neurons
• A neuron has a cell body, a branching input structure (the dendrites) and a branching output structure (the axon).
• Axons connect to dendrites via synapses.
• Electro-chemical signals are propagated from the dendritic input, through the cell body, and down the axon to other neurons.
The Structure of Neurons
• A neuron only fires if its input signal exceeds a certain amount (the threshold) within a short time period.
• Synapses vary in strength:
  – Good connections allow a large signal to pass.
  – Slight connections allow only a weak signal.
  – Synapses can be either excitatory or inhibitory.
Neurotransmission
http://www.health.org/pubs/qdocs/mom/TG/intro.htm
Neurons come in many shapes & sizes
The brain’s plasticity
• The ability of the brain to alter its neural pathways.
• Recovery from brain damage:
  – Dead neurons are not replaced, but branches of the axons of healthy neurons can grow into the pathways and take over the functions of damaged neurons.
  – Equipotentiality: more than one area of the brain may be able to control a given function.
  – The younger the person, the better the recovery (e.g. recovery from left hemispherectomy).
THE ARTIFICIAL NEURON
Model
Models of Neuron
• A neuron is an information processing unit consisting of:
• A set of synapses or connecting links
  – each characterized by a weight or strength
• An adder
  – sums the input signals weighted by the synapses
  – a linear combiner
• An activation function
  – also called a squashing function
  – squashes (limits) the output to some finite value
Nonlinear model of a neuron (I)
[Figure: input signals x1, x2, ..., xm are multiplied by the synaptic weights wk1, wk2, ..., wkm and fed, together with the bias bk, into the summing junction Σ; the result vk passes through the activation function φ(.) to give the output yk]

v_k = \sum_{j=1}^{m} w_{kj} x_j + b_k
y_k = \varphi(v_k)
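To make the model concrete, the following is a minimal Python sketch of this forward computation (not from the original slides; the input, weight and bias values are arbitrary illustrative numbers).

import math

def neuron_output(x, w, b):
    """One artificial neuron: v = sum_j w_j * x_j + b (linear combiner), y = phi(v)."""
    v = sum(wj * xj for wj, xj in zip(w, x)) + b    # summing junction
    return 1.0 / (1.0 + math.exp(-v))               # sigmoid activation phi(v)

# Illustrative values (not from the slides)
x = [0.5, -1.0, 2.0]    # input signals x1..xm
w = [0.4, 0.1, -0.3]    # synaptic weights wk1..wkm
b = 0.2                 # bias bk
print(neuron_output(x, w, b))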
Analogy
• Inputs represent the synapses
• Weights represent the strengths of the synaptic links
• Wi represents the dendrite secretion
• The summation block represents the addition of the secretions
• The output represents the axon voltage
Nonlinear model of a neuron (II)
[Figure: the same model as (I), except that the bias is treated as an extra synaptic weight wk0 = bk applied to a fixed input x0 = +1]

w_{k0} = b_k (bias), x_0 = +1

v_k = \sum_{j=0}^{m} w_{kj} x_j
y_k = \varphi(v_k)
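In code, model (II) simply folds the bias into the weight vector; a minimal sketch (continuing the illustrative values above, which are not from the slides):

# Model (II): the bias becomes weight w_k0 on a fixed input x0 = +1.
x = [1.0, 0.5, -1.0, 2.0]    # x0 = +1 prepended to the inputs
w = [0.2, 0.4, 0.1, -0.3]    # w_k0 = b_k prepended to the weights
v = sum(wj * xj for wj, xj in zip(w, x))    # v_k = sum_{j=0}^{m} w_kj x_j
print(v)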
THE ARTIFICIAL NEURON
Activation Function
Types of Activation Function
[Figure: three plots of the output Oj against the input ini, each saturating at +1 – a hard-limiting threshold at t, a piecewise-linear function, and a sigmoid]

• The hard-limiting threshold function corresponds to the biological paradigm: the neuron either fires or it does not.

O(in) = \begin{cases} 1 & in > t \\ 0 & in \le t \end{cases}

• The piecewise-linear function.
• The sigmoid function (differentiable, 'S'-shaped curve):

\varphi(v) = \frac{1}{1 + \exp(-a v)}

where a is the slope parameter.
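As a minimal Python sketch of these three functions (the breakpoints of the piecewise-linear version are an assumption, not given on the slide):

import math

def hard_limit(v, t=0.0):
    """Hard-limiting threshold: 1 if v > t, else 0."""
    return 1.0 if v > t else 0.0

def piecewise_linear(v):
    """Linear in the middle, saturating at 0 and 1 (breakpoints assumed at +/-0.5)."""
    return min(1.0, max(0.0, v + 0.5))

def sigmoid(v, a=1.0):
    """phi(v) = 1 / (1 + exp(-a*v)); a is the slope parameter."""
    return 1.0 / (1.0 + math.exp(-a * v))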
Activation Functions...
• Threshold or step function (McCulloch & Pitts model)
• Linear: neurons using a linear activation function are called ADALINEs in the literature (Widrow, 1960)
• Sigmoidal functions: functions which more closely describe the non-linear behaviour of biological neurons
Activation Functions... sigmoid
[Figure: sigmoid curves for two slope parameters β1 and β2, plotted against v, rising from 0 to 1]

\varphi_\beta(v) = \frac{1}{1 + \exp(-\beta v)}

(i) if \beta v \to \infty then \varphi_\beta(v) \to 1
(ii) if \beta v \to -\infty then \varphi_\beta(v) \to 0
(iii) if \beta \to \infty with v fixed, then \varphi_\beta(v) \to 1(v), where 1(v) is the modified Heaviside function
Activation Functions... sigmoid
1H(ν) is the Heaviside function:

1H(\nu) = \begin{cases} 1 & \nu \ge 0 \\ 0 & \nu < 0 \end{cases}

1(ν) is the modified Heaviside function:

1(\nu) = \begin{cases} 1 & \nu > 0 \\ 1/2 & \nu = 0 \\ 0 & \nu < 0 \end{cases}

[Figure: step plots of 1H(ν) and 1(ν); the modified version takes the value 1/2 at ν = 0]
Activation Function value range
[Figure: two activation functions whose outputs range over [−1, +1] rather than [0, 1], plotted against vi]

• Hyperbolic tangent function: \varphi(v) = \tanh(v)
• Signum (sign) function
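A small sketch of these two bipolar functions (the value returned at v = 0 for the sign function is a common convention, not stated on the slide):

import math

def tanh_act(v):
    """Hyperbolic tangent: phi(v) = tanh(v), output in (-1, +1)."""
    return math.tanh(v)

def signum(v):
    """Signum (sign) function: +1 for v > 0, -1 for v < 0, 0 at v = 0."""
    return (v > 0) - (v < 0)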
Stochastic Model of a Neuron
• So far we have introduced only deterministic models of ANNs.
• A stochastic (probabilistic) model can also be defined.
• If x denotes the state of a neuron, then P(v) denotes the probability of the neuron firing, where v is the induced activation potential (bias + linear combination):

P(v) = \frac{1}{1 + e^{-v/T}}
Stochastic Model of a Neuron…
• T is a pseudo-temperature used to control the noise level (and therefore the uncertainty in firing).
• As T → 0, the stochastic model reduces to the deterministic model:

x = \begin{cases} +1 & v \ge 0 \\ -1 & v < 0 \end{cases}
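A minimal sketch of this model, assuming the neuron's state is +1 with probability P(v) and −1 otherwise (the function name and sample values are illustrative):

import math, random

def stochastic_state(v, T):
    """Return +1 with probability P(v) = 1/(1 + exp(-v/T)), else -1.
    Small T -> nearly deterministic (the sign of v); large T -> noisy."""
    p_fire = 1.0 / (1.0 + math.exp(-v / T))
    return 1 if random.random() < p_fire else -1

print(stochastic_state(0.5, T=0.01))   # almost always +1
print(stochastic_state(0.5, T=10.0))   # close to a coin flip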
DECISION BOUNDARIES
Decision boundaries
• In simple cases, feature space can be divided by drawing a hyperplane across it.
• This is known as a decision boundary.
• Discriminant function: a function that returns different values on the opposite sides of the boundary (here, a straight line).
• Problems which can be classified in this way are linearly separable.
E.g. Decision Surface of a Perceptron
[Figure: two scatter plots of + and − examples in the (x1, x2) plane – on the left the two classes can be separated by a straight line (linearly separable); on the right no straight line separates them (non-linearly separable)]

• The perceptron is able to represent some useful functions.
• AND(x1, x2): choose weights w0 = −1.5, w1 = 1, w2 = 1.
• But functions that are not linearly separable (e.g. XOR) are not representable.
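A quick check of these AND weights (a sketch assuming the bias weight w0 multiplies a fixed input x0 = +1 and the unit fires when the weighted sum exceeds 0):

def perceptron_and(x1, x2, w0=-1.5, w1=1.0, w2=1.0):
    """Threshold unit with bias input x0 = +1: fires iff w0 + w1*x1 + w2*x2 > 0."""
    return 1 if (w0 + w1 * x1 + w2 * x2) > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron_and(x1, x2))   # only (1, 1) gives 1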
Linear Separability
[Figure: class A points and class B points plotted in the (x1, x2) plane, separated by a straight decision boundary]

The decision boundary is the line

x_2 = -\frac{w_1}{w_2} x_1 + \frac{t}{w_2}

i.e. the set of points where w_1 x_1 + w_2 x_2 = t.
Rugby players & Ballet dancers
[Figure: height (m) plotted against weight (kg) for rugby players and ballet dancers; the two groups occupy different regions of the plane and can be separated by a straight line]

f(\nu) = \begin{cases} 1 & \nu > 0 \\ 0 & \nu = 0 \\ -1 & \nu < 0 \end{cases}
Training the neuron
[Figure: a threshold unit with inputs x0 = −1 (weight w0 = t), x1 (weight w1 = ?) and x2 (weight w2 = ?), a summing junction Σ, and a hard-limiting activation]

With x_0 = -1 and w_0 = t, the decision boundary is

x_0 w_0 + x_1 w_1 + x_2 w_2 = 0

It is clear that:
(x_1, x_2) \in A iff x_1 w_1 + x_2 w_2 > t
(x_1, x_2) \in B iff x_1 w_1 + x_2 w_2 < t

Finding the weights w_i is called learning.
THE ARTIFICIAL NEURON
Learning
Supervised Learning
– The desired response of the system is provided by a teacher, e.g., the distance ρ[d, o] as an error measure.
– Estimate the negative error-gradient direction and reduce the error accordingly.
– Modify the synaptic weights so as to perform a stochastic minimization of the error in multidimensional weight space.
Unsupervised Learning (Learning without a teacher)
– The desired response is unknown, so no explicit error information can be used to improve network behaviour, e.g. finding the cluster boundaries of input patterns.
– Suitable weight self-adaptation mechanisms have to be embedded in the trained network.
Training
A linear threshold unit is used:

Output = \begin{cases} 1 & \text{if } \sum_{i=0} w_i x_i > t \\ 0 & \text{otherwise} \end{cases}

• W – weight value
• t – threshold value

Simple network: AND with a biased input
[Figure: a threshold unit with a bias input of −1 weighted W1 = 1.5, and inputs X and Y weighted W2 = 1 and W3 = 1, with threshold t = 0.0; the output is 1 if Σ wi xi > t and 0 otherwise]
Learning algorithm
While epoch produces an error
    Present network with next inputs from epoch
    Error = T – O
    If Error <> 0 Then
        Wj = Wj + LR * Ij * Error
    End If
End While
Learning algorithm
Epoch: Presentation of the entire training set to the neural network. In the case of the AND function, an epoch consists of four sets of inputs being presented to the network (i.e. [0,0], [0,1], [1,0], [1,1]).
Error: The amount by which the value output by the network differs from the target value. For example, if we required the network to output 0 and it output a 1, then Error = −1.
Learning algorithm
Target value, T: When we are training a network we not only present it with the input but also with a value that we require the network to produce. For example, if we present the network with [1,1] for the AND function, the target value will be 1.
Output, O: The output value from the neuron.
Ij: The inputs being presented to the neuron.
Wj: The weight from input neuron (Ij) to the output neuron.
LR: The learning rate. This dictates how quickly the network converges. It is set by experimentation; typically 0.1.
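Putting the pseudocode and these definitions together, a minimal Python sketch of the training loop for the AND function (the bias input of −1, threshold 0.0 and LR = 0.1 follow the slides; the random initial weights and the epoch limit are illustrative):

import random

def train_and(lr=0.1, max_epochs=100):
    """Perceptron learning rule: Wj <- Wj + LR * Ij * Error, with Error = T - O."""
    epoch = [([-1, 0, 0], 0), ([-1, 0, 1], 0), ([-1, 1, 0], 0), ([-1, 1, 1], 1)]
    w = [random.uniform(-0.5, 0.5) for _ in range(3)]    # W1 (bias), W2, W3
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in epoch:
            output = 1 if sum(wj * ij for wj, ij in zip(w, inputs)) > 0 else 0
            error = target - output
            if error != 0:
                w = [wj + lr * ij * error for wj, ij in zip(w, inputs)]
                errors += 1
        if errors == 0:          # the epoch produced no error: training is done
            break
    return w

print(train_and())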
Training the neuron
[Figure: a threshold unit with a bias input of −1 (weight W1 = ?), inputs x (weight W2 = ?) and y (weight W3 = ?), and threshold t = 0.0]

For AND:
A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1

• What are the weight values?
• Initialize with random weight values.
Training the neuron
[Figure: the same unit with initial weights W1 = 0.3 (bias input −1), W2 = 0.5 (input x) and W3 = −0.4 (input y), threshold t = 0.0]

For AND:
A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1

First pass with the initial weights:
I1 I2 I3 | Summation                             | Output
−1  0  0 | (−1*0.3) + (0*0.5) + (0*−0.4) = −0.3  | 0
−1  0  1 | (−1*0.3) + (0*0.5) + (1*−0.4) = −0.7  | 0
−1  1  0 | (−1*0.3) + (1*0.5) + (0*−0.4) = 0.2   | 1
−1  1  1 | (−1*0.3) + (1*0.5) + (1*−0.4) = −0.2  | 0
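A short sketch that reproduces the summations in this table (weights and inputs exactly as on the slide):

weights = [0.3, 0.5, -0.4]    # W1 (bias input), W2, W3
patterns = [([-1, 0, 0], 0), ([-1, 0, 1], 0), ([-1, 1, 0], 0), ([-1, 1, 1], 1)]

for inputs, target in patterns:
    s = sum(w * i for w, i in zip(weights, inputs))
    output = 1 if s > 0 else 0              # threshold t = 0.0
    print(inputs, round(s, 1), output, "error" if output != target else "ok")
# The last two patterns are misclassified, so the learning rule must update the weights.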
Learning in Neural Networks
• Learn values of weights from I/O pairs
• Start with random weights
• Load a training example's input
• Observe the computed output
• Modify the weights to reduce the difference
• Iterate over all training examples
• Terminate when weights stop changing OR when the error is very small
NETWORK ARCHITECTURE / TOPOLOGY
Network Architecture
• Single-layer feedforward networks
  – input layer and output layer
  – a single (computation) layer
  – feedforward, acyclic
• Multilayer feedforward networks
  – hidden layers – hidden neurons and hidden units
  – enable the network to extract higher-order statistics
  – e.g. a 10-4-2 network, a 100-30-10-3 network
  – fully connected layered network
• Recurrent networks
  – at least one feedback loop
  – with or without hidden neurons
Network Architecture
[Figure: example topologies – a single-layer network; a fully connected multilayer network; a recurrent network without hidden units, whose outputs are fed back to the inputs through unit-delay operators; and a recurrent network with hidden units]
Feedforward Networks (static)
[Figure: a layered network with an input layer, one or more hidden layers, and an output layer; signals flow only forward]
Feedforward Networks…
• One input and one output layer
• One or more hidden layers
• Each hidden layer is built from artificial neurons
• Each element of the preceding layer is connected with each element of the next layer
• There are no interconnections between artificial neurons of the same layer
• Finding the weights is a task which has to be done for the particular problem the specific network is intended to solve (a minimal forward-pass sketch follows below)
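As a sketch of such a fully connected forward pass (a 10-4-2 network as in the earlier example, with sigmoid units; the random weights and the input are purely illustrative):

import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    """One fully connected layer: every unit sees every element of the preceding layer."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))    # sigmoid activation

# A 10-4-2 network: 10 inputs, 4 hidden neurons, 2 output neurons.
W1, b1 = rng.normal(size=(4, 10)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=10)    # input layer
h = layer(x, W1, b1)       # hidden layer
y = layer(h, W2, b2)       # output layer
print(y)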
Feedback Networks (Recurrent or dynamic systems)
[Figure: a layered network with an input layer, hidden layers, and an output layer, together with feedback connections from later layers back to earlier ones]
Feedback Networks… (Recurrent or dynamic systems)
• The interconnections go in both directions between neurons, i.e. the network contains feedback.
• The Boltzmann machine is an example of a recursive net; it is a generalization of the Hopfield net. Another example of a recursive net is the Adaptive Resonance Theory (ART) net.
Neural network as directed Graph
[Figure: signal-flow graph of a single neuron – the input nodes x0 = +1, x1, ..., xm connect through the weights wk0 = bk, wk1, ..., wkm to the summing node vk, which connects through the activation function φ(.) to the output node yk]
Neural network as directed Graph…
• The block diagram can be simplified by the idea of a signal-flow graph
• A node is associated with a signal
• A directed link is associated with a transfer function
  – synaptic links
    • governed by a linear input-output relation
    • the signal xj is multiplied by the synaptic weight wkj
  – activation links
    • governed by a nonlinear input-output relation
    • the nonlinear activation function
Feedback
• The output determines, in part, its own value via feedback.
[Figure: a single feedback loop – the external input xj(n) is combined with a fed-back signal to give xj'(n); the loop contains the weight w and the unit-delay operator z^-1, and produces the output yk(n)]

y_k(n) = \sum_{l=0}^{\infty} w^{l+1} x_j(n-l)

• Depending on w, the system is stable, diverges linearly, or diverges exponentially.
• We are interested in the case |w| < 1: infinite memory
  – the output depends on inputs from the infinite past
• A NN with feedback loops is called a recurrent network
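A small sketch of this single-weight feedback loop, unrolled directly from the formula above (the impulse input and the particular values of w are illustrative):

def feedback_response(x, w, steps):
    """y_k(n) = sum_{l=0}^{n} w**(l+1) * x_j(n-l): every past input contributes, scaled by w^(l+1)."""
    return [sum(w ** (l + 1) * x[n - l] for l in range(n + 1)) for n in range(steps)]

x = [1.0] + [0.0] * 9    # impulse input: x_j(0) = 1, zero afterwards
print(feedback_response(x, w=0.5, steps=10))   # |w| < 1: geometric decay – stable, fading but infinite memory
print(feedback_response(x, w=1.5, steps=10))   # |w| > 1: exponential divergence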
NEURAL PROCESSING
Neural Processing
• Recall
  – The process of computing an output o for a given input x, performed by the ANN.
  – Its objective is to retrieve the information, i.e., to decode the stored content which must have been encoded in the network previously.
• Autoassociation
  – When the network is presented with a pattern similar to a member of the stored set, autoassociation associates the input pattern with the closest stored pattern.
Neural Processing…
• Autoassociation: e.g. reconstruction of an incomplete or noisy image.
• Heteroassociation
  – The network stores pairs of patterns and associates the input pattern with the other member of the stored pair.
Neural Processing…
• Classification
  – A set of patterns is already divided into a number of classes, or categories.
  – When an input pattern is presented, the classifier recalls the information regarding the class membership of the input pattern.
  – The classes are expressed by discrete-valued output vectors, thus the output neurons of the classifier employ binary activation functions.
  – A special case of heteroassociation.
Neural Processing…
• Recognition
  – As in classification, the desired response is the class number, but the input pattern does not exactly correspond to any of the patterns in the stored set.
Neural Processing…
• Clustering
  – Unsupervised classification of patterns/objects without providing information about the actual classes.
  – The network must discover for itself any existing patterns, regularities, separating properties, etc.
  – While discovering these, the network undergoes changes of its parameters; this is called self-organization.
Neural Processing…
[Figure: examples of the patterns stored]
Summary
• Parallel distributed processing (especially a hardware-based neural net) is a good approach for complex pattern recognition (e.g. image recognition, forecasting, text retrieval, optimization)
• Less need to determine relevant factors a priori when building a neural network
• Lots of training data are needed
• High tolerance to noisy data; in fact, noisy data enhance post-training performance
• Difficult to verify or discern learned relationships, even with special knowledge-extraction utilities developed for neural nets