Fundamentals of Neurocomputing
• systems constructed to make use of some of the organizational principles felt to be used by the
brain.
• theoretical themes
1. network structure
2. learning algorithms
3. knowledge representation
The Generic Connectionist Model
• neurons receive inputs from other neurons via synapses which can be excitatory or inhibitory.
• excitatory input - the receiving neuron is likely to fire action potentials.
• inhibitory input - the receiving neuron is less likely to fire.
• outputs are sent to other neurons by axons.
• a neuron contains a continuous internal potential called a membrane potential and when this
exceeds
a threshold, the neuron can propagate an all-or-nothing action potential down its axon.
• many neurons can be active simultaneously — the set of simultaneous element activities is
represented
by a state vector.
• artificial neural networks have many computing elements connected together — often arranged in a
connection matrix.
• overall system behaviour is determined by the structure and strengths of the connections —
the strengths
may be changed by various learning algorithms.
• the learning phase — connection strengths in the network are modified.
• the retrieval phase - some initial information (initial state vector or activity pattern) is put
into the
system, passes through the connections and gives rise to an output pattern.
• network structure
— elements are arranged in groups or layers.
— a single layer of neurons that connects to itself is referred to as an autoassociative system.
— multi-layer systems contain input and output neurons and neurons which are neither, called
hidden units.
• brain-like general rules for representations:
1. similar inputs usually give rise to similar representations.
2. things to be separated should be given widely different representations.
3. if something is important, lots of elements should be used to represent it.
4. do as much lower-level preprocessing as possible, so the learning and adaptive parts of the
network
need do as little work as possible
— build invariances into the hardware and do not require the system to learn them.
Foundations of Artificial Neural Networks
• Three Elements
1. an organized topology of interconnected processing elements.
2. a method of encoding information.
3. a method of recalling information.
• Two Key Concepts
1. techniques for analyzing neural network dynamics.
2. general taxonomy of all neural network paradigms.
Processing Elements
• components where most, if not all, of the computing is done.
• input signals
— from environment or other PEs.
— form an input vector A = (a1, ..., ai, ..., an) where ai is the activity level of the ith PE or input.
• weights
— associated with each connected pair of PEs is an adjustable value.
— the weights connected to the jth PE form a vector Wj = (w1j, ..., wij, ..., wnj) where wij
represents the connection strength from the PE ai to the PE bj.
• internal threshold value
— the threshold value θj is modulated by the weight w0j that is associated with the inputs.
— must be exceeded for there to be any PE activation.
• output value
bj = f(A · Wj − w0j θj) or
bj = f(Σ(i=1..n) ai wij − w0j θj)
Threshold Functions
• map a PE's infinite input domain to a prespecified range of outputs.
• there are four common functions:
1. linear function
2. nonlinear ramp function
3. step threshold function
4. sigmoid threshold function
Linear Function
• f(x) = x where  is a real-valued constant that regulates the magnification of the PE activity
x.
Nonlinear Ramp Function
• the linear function bounded to the range [−γ, +γ].
f(x) = +γ if x ≥ γ
        x if |x| < γ
       −γ if x ≤ −γ
where +γ (−γ) is the PE's maximum (minimum) output value or saturation level.
• this is a piece-wise linear function which is often used to represent a simplified nonlinear
function.
Step Threshold Function
• the threshold function only responds to the sign of the input.
f(x) = + if x > 0
- otherwise
where  and  are positive scalars.
• often this is a binary function with 1 and 0 outputs.
Sigmoid Threshold Function
• this S-shaped function is a bounded, monotonic, non-decreasing function that provides a
graded,
nonlinear response.
• examples
1. the logistic function
S(x) = 1/(1 + e^(−αx))
with saturation levels of 0 and 1.
2. the hyperbolic tangent
S(x) = tanh(x)
with saturation levels -1 and 1.
3. the augmented ratio of squares
f(x) = x²/(1 + x²) if x > 0
       0           otherwise
with saturation levels 0 and 1.
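The four threshold functions above are easy to state directly in code. This is a minimal sketch; the function names and default parameter values (α = 1, γ = 1, δ = 0 for a binary step) are illustrative:

```python
import math

def linear(x, alpha=1.0):
    """f(x) = alpha*x, with magnification constant alpha."""
    return alpha * x

def ramp(x, gamma=1.0):
    """The linear function bounded to [-gamma, +gamma]."""
    if x >= gamma:
        return gamma
    if x <= -gamma:
        return -gamma
    return x

def step(x, gamma=1.0, delta=0.0):
    """+gamma if x > 0, else -delta; binary (1/0) with the defaults."""
    return gamma if x > 0 else -delta

def logistic(x, alpha=1.0):
    """S(x) = 1/(1 + exp(-alpha*x)), saturating at 0 and 1."""
    return 1.0 / (1.0 + math.exp(-alpha * x))

def augmented_ratio_of_squares(x):
    """x^2/(1 + x^2) for x > 0, else 0; saturates at 0 and 1."""
    return x * x / (1.0 + x * x) if x > 0 else 0.0
```

The hyperbolic tangent case is available directly as `math.tanh`.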
Topology Characteristics
1. Connection Types
(a) excitatory
• these connections increase a PE's activation.
• represented by a positive signal.
(b) inhibitory
• they decrease a PE's activation.
• represented by a negative signal.
2. Interconnection Schemes
(a) intra-field or lateral connections
• connections between PEs in the same layer.
(b) inter-field
• connections between PEs in different layers.
(c) recurrent
• connections that loop and connect back to the same PE.
• inter-field signals propagate in one of two ways:
(a) feedforward signals only allow information to flow amongst PEs in one direction.
(b) feedback signals allow information to flow amongst PEs in either direction.
3. Field or Layer Configurations
• layer configurations combine PEs, information flow and connection schemes into an
architecture.
• types:
(a) lateral feedback
(b) field feedforward
(c) field feedback
• input layer - layer that receives input signals from the environment.
• output layer — layer that emits signals to the environment.
• hidden layers - any layers that lie between the input and output layers.
Memory
• pattern types
1. spatial - single static image.
2. spatiotemporal - a sequence of spatial patterns.
• types of spatial pattern matching memories
1. random access memory - maps addresses to data.
2. content-addressable memory — maps data to addresses.
3. associative memory - maps data to data.
• artificial neural networks can provide:
1. CAM - stores data at stable states in some memory matrix W.
2. AM - provides output responses from input stimuli.
• mechanisms for mapping
1. autoassociative
— the memory, W, stores the vectors (patterns) A1,...,Am.
2. heteroassociative
— W stores pattern pairs (A1, B1), ..., (Am, Bm).
Recall
• a heteroassociative recall mechanism is a function g() that takes W (memory) and Ak (stimulus)
as input and returns Bk (response) as output.
Bk = g(Ak, W)
• two primary recall mechanisms:
1. nearest-neighbour recall
2. interpolative recall
1. Nearest-Neighbour Recall
• finds the stored input that closely matches the stimulus and responds with the corresponding
output.
Bk = g(A', W) where dist(A', Ak) = min{dist(A', Aq)}, q = 1, ..., m
where dist() is usually the Hamming or Euclidean distance function.
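Nearest-neighbour recall can be sketched with the Hamming distance as dist(). The helper names and the stored pattern pairs below are illustrative:

```python
def hamming(u, v):
    """Number of positions where two equal-length patterns differ."""
    return sum(ui != vi for ui, vi in zip(u, v))

def nearest_neighbour_recall(stimulus, pairs, dist=hamming):
    """Return the response Bk paired with the stored Ak closest to the stimulus."""
    _, response = min(pairs, key=lambda p: dist(stimulus, p[0]))
    return response

stored = [((1, 0, 1), "B1"), ((0, 1, 1), "B2")]
print(nearest_neighbour_recall((1, 0, 0), stored))  # closest stored key is (1, 0, 1)
```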
2. Interpolative Recall
• accepts a stimulus and interpolates (possibly nonlinearly) from the entire set of stored inputs
to produce
the corresponding output.
• for linear interpolation:
B' = g(A', W) where Ap ≤ A' ≤ Aq and Bp ≤ B' ≤ Bq
for some pattern pairs (Ap, Bp) and (Aq, Bq).
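For the linear case with scalar patterns, the bracketing condition above amounts to ordinary linear interpolation between the two stored pairs surrounding the stimulus. A minimal sketch (function name and error handling are illustrative):

```python
def interpolative_recall(a_prime, pairs):
    """Linearly interpolate between the two stored (A, B) pairs that bracket a'."""
    pairs = sorted(pairs)
    for (ap, bp), (aq, bq) in zip(pairs, pairs[1:]):
        if ap <= a_prime <= aq:
            t = (a_prime - ap) / (aq - ap)
            return bp + t * (bq - bp)
    raise ValueError("stimulus lies outside the stored range")

print(interpolative_recall(1.5, [(1.0, 10.0), (2.0, 20.0)]))  # 15.0
```

Unlike nearest-neighbour recall, the response here need not be one of the stored outputs.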
Learning
The ANN Perspective
• learning is defined to be any change in the memory W.
Learning: dW/dt ≠ 0
• two categories
1. supervised learning
2. unsupervised learning
Supervised Learning
• a process that incorporates an external teacher and/or global information.
• techniques:
— deciding when to turn off learning
— deciding how long and how often to present the training
— supplying performance error information
• algorithms: error-correction learning, reinforcement learning, stochastic learning, hardwired
systems.
Unsupervised Learning (Self-Organization)
• process that incorporates no external teacher and relies upon only local information
and
internal control.
• self-organizes presented data and discovers its emergent collective properties.
Error-Correction Learning (supervised)
• adjusts the connection weights between PEs in proportion to the difference between the
desired and
computed values of each PE in the output layer.
Δwij = α ai [cj − bj]
where
— Δwij is the change to wij, the memory connection strength from ai to bj.
— α is the learning rate, typically 0 < α << 1.
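The error-correction rule can be applied to a whole weight matrix in one pass. A minimal sketch, with illustrative names and an assumed list-of-lists layout (row i = input PE, column j = output PE):

```python
def error_correction_step(W, a, c, b, alpha=0.1):
    """Apply delta w_ij = alpha * a_i * (c_j - b_j) to every weight.

    W is an n-by-m weight matrix; a is the input vector,
    c the desired outputs, b the computed outputs."""
    return [[W[i][j] + alpha * a[i] * (c[j] - b[j])
             for j in range(len(W[0]))]
            for i in range(len(W))]

# one input PE, one output PE: desired 1.0, computed 0.0, alpha = 0.5
print(error_correction_step([[0.0]], [1.0], [1.0], [0.0], alpha=0.5))  # [[0.5]]
```

The update vanishes exactly when cj = bj, so weights stop changing once the output layer matches the desired values.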
Reinforcement Learning (supervised)
• weights are reinforced for properly performed actions and punished for poorly performed
ones.
• requires only one value to describe the output layer's performance - a scalar error value.
• Δwij = α [r − θj] eij where
— r is the scalar success/failure value.
— θj is the reinforcement threshold value for the jth output PE.
— eij is the canonical eligibility of the weight from the ith PE to the jth PE.
— α is the learning rate constant, 0 < α < 1.
Stochastic Learning (supervised)
• uses random processes, probability and an energy relationship to adjust the memory connection weights.
• makes a random weight change, determines the resultant energy after the change, and decides
to keep
the weight change
1. if the energy is lower - accept the change
2. if the energy is not lower — accept the change according to a pre-chosen probability
distribution
3. otherwise - reject the change.
• allows escape from local energy minima, e.g. simulated annealing.
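The three-way accept/reject decision above can be sketched as a single function. The Boltzmann acceptance probability exp(−ΔE/T) is one common choice of "pre-chosen probability distribution" (as in simulated annealing); the function name and temperature parameter are illustrative:

```python
import math

def accept_change(old_energy, new_energy, temperature, u):
    """Decide whether to keep a random weight change (u is uniform in [0, 1))."""
    if new_energy < old_energy:
        return True  # case 1: energy dropped, always accept
    # cases 2-3: occasionally accept uphill moves so the search can
    # escape local energy minima (Boltzmann acceptance, as in annealing)
    return u < math.exp(-(new_energy - old_energy) / temperature)
```

A full stochastic learner would loop: perturb one weight at random, evaluate the energy, call `accept_change`, and gradually lower the temperature.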
Donald Hebb on Learning
• Hebb describes the adjustment of a connection weight according to the correlation of the
values of the
two PEs it connects in his book The Organization of Behavior (1949):
When an axon of cell A is near enough to excite a cell B and repeatedly or
persistently takes
a part in firing it, some growth process or metabolic change takes place in one or
both cells
such that A's efficiency as one of the cells firing B is increased.
Donald Hebb on Organization
• Cell Assemblies
— the cooperative nature of synaptic modifications is such as to induce the formation of
subsets
made up of cells which are mutually activated. A single cell may belong to several
assemblies,
and several assemblies may be active simultaneously. The information would then be
contained
in these patterns of collective excitation, and would therefore possess a distributed
nature.
Hebbian Learning or Correlation Learning
• the adjustment of a connection weight according to the correlation of the values of the two
PEs it connects.
• simple Hebbian correlation
— the weight change Δwij is the correlation (multiplication) of the PE ai with the PE aj, using
the discrete-time equation
Δwij = ai aj
where Δwij represents the discrete-time change to wij.