Bioinspired Computing
Lecture 3
Biological Neural Networks and
Artificial Neural Networks
Based on slides from
Netta Cohen
lecture 2008
Last week:
We introduced swarm intelligence.
We saw how many simple agents can follow simple rules
that allow them to collectively perform more complex
tasks.
Today...
Biological systems whose manifest function is information
processing: computation, thought, memory, communication
and control. We begin a dissection of a brain:
How different is a brain from an artificial computer?
How can we build and use artificial neural networks?
Investigating the brain
Imagine landing on an abandoned alien planet and finding
thousands of alien computers. You and your crew’s
mission is to find out how they work. What do you do?
Summon Scotty, your engineer, to disassemble the machines
into component parts, test each part (electronically, optically,
chemically…), decode the machine language, and study
how components are connected.
Summon Data, your software wiz, to connect to the input & output
ports of a machine, find a language to communicate with it
& write computer programs to test the system’s response by
measuring its speed, efficiency & performance at different tasks.
[Figure: part #373a with its inputs and outputs; the computer with an input program and its output]
The brain as a computer
Higher level functions in animal behaviour
• Gathering data (sensation)
• Inferring useful structures in data (perception)
• Storing and recalling information (memory)
• Planning and guiding future actions (decision)
• Carrying out the decisions (behaviour)
• Learning consequences of these actions
Hardware functions and architectures
• 10 billion neurons in human cortex
• 10,000 synapses (connections) per neuron
• Machine language: 100mV, 1-2msec spikes (action potential)
• Specialised regions & pathways (visual, auditory, language…)
The brain as a computer: brain versus computer
Brain: Special task: program often hard-coded into system.
Computer: Universal, general-purpose. Software: general, user-supplied.
Brain: Hardware not hard: plastic, rewiring.
Computer: Hardware is hard: only upgraded in discrete units.
Brain: No clear hierarchy. Bi-directional feedback up & down the system.
Computer: Obvious hierarchy: each component has a specific function.
Brain: Unreliable components. Parallelism, redundancy appear to compensate.
Computer: Once burned in, circuits run without failure for extended lifetimes.
Brain: Output doesn’t always match input: internal state is important.
Computer: Input-output relations are well-defined.
Brain: Development & evolutionary constraints are crucial.
Computer: Engineering design depends on engineer. Function is not an issue.
Neuroscience pre-history
• 200 AD: Greek physician Galen hypothesises that nerves
carry signals back & forth between sensory organs & the brain.
• 17th century: Descartes suggests that nerve signals account
for reflex movements.
• 19th century: Helmholtz discovers the electrical nature of
these signals, as they travel down a nerve.
• 1838-9: Schleiden & Schwann systematically study plant &
animal tissue. Schwann proposes the theory of the cell (the
basic unit of life in all living things).
• Mid-1800s: anatomists map the structure of the brain.
but…
The microscopic composition of the brain remains elusive. A
raging debate surrounds early
neuroscience research, until...
The neuron doctrine
Ramon y Cajal (1899)
1) Neurons are cells: distinct entities (or agents).
2) Inputs & outputs are received at junctions called synapses.
3) Input & output ports are distinct. Signals are uni-directional
from input to output.
Today, neurons (or nerve cells) are regarded as the basic
information processing unit of the nervous system.
Inputs → neuron → Outputs
Neuron details
Organisation of neurons
Ion channels and spiking
• Membrane potential is negative (inside relative to outside)
• Na+ would like to rush in but can’t
• Depolarisation opens Na+ channels, Na+ flows in
• Chain reaction! More Na+ flows in!
• This opens K+ channels, K+ flows out: hyperpolarisation
Macaque brain
(Felleman & van Essen 1991)
The neuron as a transistor
• Both have well-defined inputs and outputs.
• Both are basic information processing units that comprise
computational networks.
If transistors can perform logical operations, maybe neurons
can too?
Neuronal function is typically modelled by a combination of
• a linear operation (sum over inputs) and
• a nonlinear one (thresholding).
This simple representation relies on Cajal’s concept of
input → neuron → output
Machine language
The basic “bit” of information is represented by neurons in
spikes. The cell is said to be either at rest or active. A spike
(action potential) is a strong, brief electrical pulse. Since
these action potentials are mostly identical, we can safely
refer to them as all-or-none signals.
Why Spikes?
Why don’t neurons use analog signals? One answer lies in
the network architecture: signals cover long distances (both
within the brain and throughout the body). Reliable
transmission requires strong pulses.
Computation of a pyramidal neuron
[Figure: many inputs (dendrites) converge on the soma; the axon carries a single all-or-none output]
From transistors to networks
We can now summarise our working principles:
• The basic computational unit of the brain is the neuron.
• The machine language is binary: spikes.
• Communication between neurons is via synapses.
However, we have not yet asked how information is
encoded in the brain, how it is processed in the brain, and
whether what goes on in the brain is really ‘computation’.
Information codes
[Figure: temporal code, rate code and population (distributed) code, with noise]
Examples of both neural codes and distributed
representations have been found in the brain.
Examples in the visual system: colour representation,
face recognition, orientation, motion detection, & more…
http://www.cs.stir.ac.uk/courses/31YF/Notes/Notes_NC.html
Information content
Example. A spike train produced by a neuron over an
interval of 100ms is recorded. Neurons can produce a spike
every 2ms.
Therefore, 51 different rates (individual code words) can be
produced by this neuron.
In contrast, if the neuron were using temporal coding, up to
$2^{50}$ different words could be represented.
In this sense, temporal coding is much more powerful.
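The counting argument in this example can be checked with a few lines of Python. This is a minimal sketch, assuming the 100 ms interval is split into 2 ms bins that each hold at most one spike; the variable names are illustrative and not part of the lecture.

```python
# Count the code words available to one neuron in a 100 ms window,
# assuming (as in the example) at most one spike per 2 ms bin.
interval_ms = 100
min_spike_gap_ms = 2

bins = interval_ms // min_spike_gap_ms   # 50 time bins

# Rate code: only the number of spikes matters (0, 1, ..., 50).
rate_words = bins + 1

# Temporal code: every bin independently holds a spike or not.
temporal_words = 2 ** bins

print(rate_words)      # 51
print(temporal_words)  # 1125899906842624
```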
Circuitry depends on neural code
Temporal codes rely on a noise-free signal transmission.
Thus, we would expect to find very few ‘redundant’ neurons
with co-varying outputs in that network. Accordingly, an
optimal temporal coding circuit might tend to eliminate
redundancy in the pattern of inputs to different neurons.
On the other hand, if neural information is carried by a
noisy rate-based code, then noise can be averaged out
over a population of neurons. Population coding schemes,
in which many neurons represent the same information,
would therefore be the norm in those networks.
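The averaging argument can be illustrated numerically. The sketch below is only an illustration under stated assumptions: each neuron reports the true firing rate plus independent Gaussian noise (the noise model, the rate of 20 and the noise level of 5 are all hypothetical choices, not taken from the lecture), and the population estimate is the plain mean of the reports.

```python
import random
import statistics

def population_estimate(true_rate, n_neurons, noise_sd, rng):
    """Average the noisy rates reported by n_neurons redundant neurons."""
    readings = [rng.gauss(true_rate, noise_sd) for _ in range(n_neurons)]
    return sum(readings) / n_neurons

def estimate_spread(n_neurons, trials=500, true_rate=20.0, noise_sd=5.0, seed=0):
    """Standard deviation of the population estimate across repeated trials."""
    rng = random.Random(seed)
    estimates = [population_estimate(true_rate, n_neurons, noise_sd, rng)
                 for _ in range(trials)]
    return statistics.stdev(estimates)

# The spread of the estimate shrinks as more neurons carry the same signal.
for n in (1, 10, 100):
    print(n, round(estimate_spread(n), 2))
```

For independent noise the spread is expected to fall roughly as one over the square root of the population size, which is why redundancy pays off for a noisy rate code.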
Experiments on various brain systems find either coding
system and, in some cases, combinations of temporal
and rate coding.
Neuronal computation
Having introduced neurons, neuronal circuits and even
information codes with well defined inputs and outputs, we
still have not mentioned the term computation. Is neuronal
computation anything like computer computation?
Tape: 1 1 1 1 0 1
If read 1, write 0, go right, repeat.
If read 0, write 1, HALT!
If read •, write 1, HALT!
In a computer program, variables have initial states, there are
possible transitions, and a program specifies the rules. The
same is true for machine language. To obtain an answer at
the end of a computation, the program must HALT.
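The three-rule program on this slide is small enough to simulate directly. The following is a minimal sketch, assuming the head starts on the leftmost cell and that • stands for a blank cell beyond the written part of the tape; the function name `run_program` and the step limit are illustrative.

```python
BLANK = "•"  # symbol for an empty tape cell (assumed meaning of the bullet)

def run_program(tape, head=0, max_steps=1000):
    """Run the three-rule program; return (final tape, number of steps)."""
    cells = list(tape)
    for step in range(1, max_steps + 1):
        if head >= len(cells):
            cells.append(BLANK)      # the tape is blank beyond its written part
        if cells[head] == "1":
            cells[head] = "0"        # read 1: write 0, go right, repeat
            head += 1
        else:
            cells[head] = "1"        # read 0 or blank: write 1, HALT
            return cells, step
    raise RuntimeError("program did not halt within max_steps")

final_tape, steps = run_program("111101")
print("".join(final_tape), steps)  # 000011 5
```

The program has an explicit initial state (the tape and head position) and an explicit end (HALT), which is exactly what the two questions about the brain call into doubt.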
Does the brain initialise variables?
Does the brain ever halt?
Association
an example of bio-computation
One recasting of biological brain function in these
computational terms was proposed by John Hopfield in the
1980s as a model for associative memory.
Question: How does the brain associate some memory with
a given input?
Answer: The input causes the network to enter an initial
state. The state of the neural network then evolves until it
reaches some new stable state.
The new state is associated with the input state.
Association (cont.)
[Figure: trajectories in a schematic state space]
Whatever initial condition is chosen, the system will follow a
well-defined route through state-space that is guaranteed to
always reach some stable point (i.e., pattern of activity)
Hopfield’s ideas were strongly motivated by existing theories
of self-organisation in neural networks. Today, Hopfield nets
are a successful example of bio-inspired computing (but no
longer believed to model computation
in the brain).
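The idea of a state that evolves until it settles on a stored pattern can be sketched in a few lines. This is a hedged illustration rather than Hopfield's original formulation: it assumes units taking values +1/-1, weights set by the Hebbian outer-product rule with no self-connections, and units updated one at a time; the names `train_hopfield` and `recall` are hypothetical.

```python
def train_hopfield(patterns):
    """Hebbian (outer-product) weights for patterns with entries +1/-1."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:                    # no self-connections
                    weights[i][j] += p[i] * p[j]
    return weights

def recall(weights, state, max_sweeps=100):
    """Update units one at a time until no unit changes (a stable state)."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

memory = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([memory])
cue = [-1, -1, 1, -1, 1, -1, 1, -1]    # the memory with its first unit flipped
print(recall(weights, cue) == memory)  # True
```

The corrupted cue plays the role of the input that sets the initial state, and the stable state the network reaches is the memory associated with it.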
Learning
No discussion of the brain, or nervous systems more generally, is
complete without mention of learning.
• What is learning?
• How does a neural network ‘know’ what computation to perform?
• How does it know when it gets an ‘answer’ right (or wrong)?
• What actually changes as a neural network undergoes ‘learning’?
[Figure: the brain within the body within the environment, with sensory inputs and motor outputs]
Learning (cont.)
Learning can take many forms:
• Supervised learning
• Reinforcement learning
• Association
• Conditioning
• Evolution
At the level of neural networks, the best understood forms of
learning occur in the synapses, i.e., the strengthening and
weakening of connections between neurons. The brain uses its
own learning algorithms to define how connections should
change in a network.
Learning from experience
How do the neural networks form in the brain? Once
formed, what determines how the circuit might change?
In 1949, Donald Hebb, in his book, "The Organization of
Behavior", showed how basic psychological phenomena of
attention, perception & memory might emerge in the brain.
Hebb regarded neural networks as a collection of cells that
can collectively store memories. Our memories reflect our
experience.
How does experience affect neurons and neural networks?
How do neural networks learn?
Synaptic Plasticity
Definition of Learning: experience alters behaviour
The basic experience in neurons is spikes.
Spikes are transmitted between neurons through synapses.
Hebb suggested that connections in the brain change in
response to experience.
[Figure: spike trains of the pre-synaptic cell and the post-synaptic cell against time; the post-synaptic spike follows after a delay]
Hebbian learning: If the pre-synaptic cell causes the
post-synaptic cell to fire a spike, then the connection
between them will be enhanced. Eventually, this will
lead to a path of ‘least resistance’ in the network.
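A toy version of Hebbian learning can be written as follows. It is a sketch under simplifying assumptions that are not in the lecture: spike trains are lists of 0s and 1s in discrete time steps, "causes" is approximated by a post-synaptic spike arriving exactly `delay` steps after a pre-synaptic one, and each such pairing adds a fixed amount `rate` to the weight.

```python
def hebbian_update(weight, pre_spikes, post_spikes, delay=1, rate=0.1):
    """Strengthen the synapse each time a pre-synaptic spike is followed,
    `delay` time steps later, by a post-synaptic spike."""
    for t, pre in enumerate(pre_spikes):
        t_post = t + delay
        if pre and t_post < len(post_spikes) and post_spikes[t_post]:
            weight += rate
    return weight

pre  = [1, 0, 1, 0, 0, 1]
post = [0, 1, 0, 1, 0, 0]
# Two pre-synaptic spikes (t = 0 and t = 2) are followed by a post-synaptic
# spike one step later, so the weight grows by two increments of `rate`.
print(hebbian_update(0.5, pre, post))
```

Real synapses also weaken and saturate; this sketch only shows the strengthening half of the rule described above.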
Today... From biology to information processing
At the turn of the 21st century, “how does it work”
remains an open question. But even the kernel of
understanding and simplified models we already have for
various brain functions are priceless, in providing useful
intuition and powerful tools for bioinspired computation.
Next time... Artificial neural networks (part 1)
Focus on the simplest cartoon models of biological neural
nets. We will build on lessons from today to design simple
artificial neurons and networks that perform useful
computational tasks.
The Appeal of Neural Computing
The only intelligent systems that we know of are biological. In
particular most brains share the following feature in their neural
architecture – they are massively parallel networks organised into
interconnected hierarchies of complex structures.
For computer scientists, many natural systems appear
to share many attractive properties:
• speed, tolerance, robustness, flexibility, self-driven
dynamic activity
In addition, they are very good at some tasks that
computers are typically poor at:
• recognising patterns, balancing conflicts, sensory-motor
coordination, interaction with the environment,
anticipation, learning… even curiosity, creativity &
consciousness.
The first artificial neuron model
In analogy to a biological neuron, we can think of a virtual
neuron that crudely mimics the biological neuron and performs
analogous computation.
[Figure: inputs → Σ (cell body) → output]
Just like biological neurons, this artificial neuron will have:
• Inputs (like biological dendrites) carry signal to cell body.
• A body (like the soma), sums over inputs to compute output, and
• outputs (like synapses on the axon) transmit the output downstream.
The artificial neuron is a cartoon model that will not have all
the biological complexity of real neurons. How powerful is it?
Early history (1943)
McCulloch & Pitts (1943). “A logical calculus of the ideas immanent in nervous
activity”, Bulletin of Mathematical Biophysics, 5, 115-133.
In this seminal paper, Warren McCulloch and Walter Pitts
invented the first artificial (MP) neuron, based on the insight
that a nerve cell will fire an impulse only if its threshold
value is exceeded. MP neurons are hard-wired devices,
reading pre-defined input-output associations to determine
their final output. Despite their simplicity, M&P proved that a
single MP neuron can perform universal logic operations.
A network of such neurons can therefore do anything a
Turing machine can do, but with a much more flexible (and
potentially very parallel) architecture.
The McCulloch-Pitts (MP) neuron
• Inputs x are binary: 0,1.
• Each input has an assigned weight w.
• Weighted inputs are summed (Σ) in the cell body.
• Neuron fires if sum exceeds (or equals) activation threshold θ.
• If the neuron fires, the output = 1.
• Otherwise, the output = 0.
The “computation” consists of "adders" and a threshold.
[Figure: inputs x1 … xn are each multiplied by their weights w1 … wn, a bias input is multiplied by its weight wb, and the products are summed over all i in the cell body to give the output]
$$\Sigma = \sum_{i} x_i w_i, \qquad \text{output} = \begin{cases} 1 & \text{if } \Sigma \ge \theta \\ 0 & \text{if } \Sigma < \theta \end{cases}$$
Note: an equivalent formalism assigns θ = 0 & instead of threshold
introduces an extra bias input, such that bias * wbias = -θ
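The MP neuron is short enough to write out in full. The sketch below assumes the definition given on this slide (binary inputs, a weighted sum, and firing when the sum reaches the threshold, written θ in the lecture); the function names and the example weights are illustrative. The second function shows the equivalent bias formalism, with the threshold fixed at 0 and an extra input held at 1.

```python
def mp_neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum reaches the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def mp_neuron_with_bias(inputs, weights, bias_weight):
    """Equivalent form: threshold fixed at 0, plus a bias input held at 1."""
    return mp_neuron(list(inputs) + [1], list(weights) + [bias_weight], 0)

# With bias * w_bias = -threshold the two forms agree on every input.
weights, threshold = [1, 1], 1.5   # illustrative values
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert mp_neuron(x, weights, threshold) == \
           mp_neuron_with_bias(x, weights, -threshold)
```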
Logic gates with MP neurons
For binary logic gates with only one input, the possible outputs are described
by the following truth tables:
        Always 0   IDENTITY   NOT     Always 1
IN 1    OUT 1      OUT 2      OUT 3   OUT 4
0       0          0          1       1
1       0          1          0       1
For example: x → [w = -1, θ = -0.5] → NOT x
Exercise: Find w and θ for the 3 remaining gates.
Logic gates with MP neurons
(cont.)
With two binary inputs, there are 4 possible inputs
and $2^4 = 16$ corresponding truth tables (outputs)!
For example, the AND gate implemented in the MP neuron:
x1 (weight 1), x2 (weight 1) → [θ = +1.5] → x1 AND x2
IN 1   IN 2   OUT
0      0      0
0      1      0
1      0      0
1      1      1
Here is a compact, graphical representation of the same truth table:
            IN 1
            0    1
IN 2   0    0    0
       1    0    1
Exercise: Find w and θ for OR & NAND.
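The two gates worked out in the lecture (NOT with w = -1 and threshold -0.5, AND with weights 1, 1 and threshold +1.5) can be checked by enumerating every input. This is a small sketch; `mp_neuron` and `truth_table` are illustrative helper names, and the exercise gates are deliberately left out.

```python
from itertools import product

def mp_neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum reaches the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def truth_table(weights, threshold):
    """Map every binary input tuple to the neuron's output."""
    n = len(weights)
    return {x: mp_neuron(x, weights, threshold)
            for x in product((0, 1), repeat=n)}

not_gate = truth_table([-1], -0.5)
and_gate = truth_table([1, 1], 1.5)
print(not_gate)  # {(0,): 1, (1,): 0}
print(and_gate)  # {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
```

The same `truth_table` helper can be used to test candidate weights and thresholds for the remaining gates.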
Computational power of MP neurons
Universality: NOT & AND can be combined to perform any logical
function; MP neurons, circuited together in a network can solve
any problem that a conventional computer could.
But let’s examine the single neuron a little longer.
Q: Just how powerful is a single MP neuron?
A: It can solve any problem that can be expressed as a
classification of points on a plane by a single straight line.
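[Figure: the AND truth table drawn on the (IN 1, IN 2) plane; a single straight line separates the one input pair with output 1 from the three with output 0]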
Generalisation to many inputs:
points in many dimensions are
now classified, not by a line,
but by a flat surface.
Even one neuron can successfully handle simple
classification problems.
Classification in Action
A set of patients may have a medical problem. Blood samples
are analysed for the quantities of two trace elements.
[Figure: inputs x1, x2 and a bias input are multiplied by weights w1, w2 and w3 and summed (∑ xi wi) to give the output]
trace 1   trace 2   problem?   sum ∑ xi wi   output
2.4       1.0       yes        +6.6          Yes
9.8       8.3       no         -8.1          No
1.2       0.2       yes        +8.6          Yes
0.4       2.1       yes        +7.5          Yes
7.9       8.8       no         -6.7          No
6.7       7.2       no         -3.9          No
etc.      etc.      etc.       etc.          etc.
+ive output = problem. w1=-1, w2=-1, w3=+10 & bias=+1
With correct weights, this MP neuron consistently
classifies patients.
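The sums in the patient table can be reproduced directly from the stated weights. The sketch below uses the six patients and the weights given on the slide (w1 = -1, w2 = -1, w3 = +10, bias input = +1) and treats a positive sum as "problem"; the function name `classify` is illustrative.

```python
patients = [  # (trace 1, trace 2, known diagnosis)
    (2.4, 1.0, "yes"), (9.8, 8.3, "no"), (1.2, 0.2, "yes"),
    (0.4, 2.1, "yes"), (7.9, 8.8, "no"), (6.7, 7.2, "no"),
]
w1, w2, w3, bias = -1, -1, 10, 1

def classify(trace1, trace2):
    """Return the weighted sum and the diagnosis (positive sum = problem)."""
    total = trace1 * w1 + trace2 * w2 + bias * w3
    return total, ("yes" if total > 0 else "no")

for t1, t2, known in patients:
    total, predicted = classify(t1, t2)
    print(f"{t1:4} {t2:4} {total:+.1f} {predicted:3} (known: {known})")
```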
The missing step
The ability of the neuron to classify inputs correctly hinges
on the appropriate assignment of the weights and
threshold.
So far, we have done this by hand.
Imagine we had an automatic algorithm for the neuron to
learn the right weights and threshold on its own.
In 1962, Rosenblatt, inspired by biological learning rules,
did just that.
Frank Rosenblatt (1962). Principles of Neurodynamics, Spartan, New York
The perceptron algorithm
Take wj random
START: Take x ∈ F+ ∪ F-
CHECK: if x ∈ F+ and Σ wjxj > 0 goto START
       if x ∈ F+ and Σ wjxj ≤ 0 goto ADD
       if x ∈ F- and Σ wjxj ≤ 0 goto START
       if x ∈ F- and Σ wjxj > 0 goto SUB
ADD:   wj → wj + xj
       goto START
SUB:   wj → wj - xj
       goto START
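The goto-style steps of the perceptron algorithm translate into a short loop. This is a sketch, not Rosenblatt's original code: it assumes the two classes are given as lists of input tuples, cycles through them in a fixed order instead of picking x arbitrarily, and adds a stopping test (one full pass with no ADD or SUB) that the slide leaves implicit. The seed, the pass limit and the AND-gate data are illustrative choices.

```python
import random

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def perceptron(f_plus, f_minus, max_passes=1000, seed=0):
    """Return weights with w.x > 0 on f_plus and w.x <= 0 on f_minus."""
    rng = random.Random(seed)
    n = len(f_plus[0])
    w = [rng.uniform(-1, 1) for _ in range(n)]      # "Take wj random"
    for _ in range(max_passes):
        updates = 0
        for x in f_plus:
            if dot(w, x) <= 0:                       # ADD
                w = [wj + xj for wj, xj in zip(w, x)]
                updates += 1
        for x in f_minus:
            if dot(w, x) > 0:                        # SUB
                w = [wj - xj for wj, xj in zip(w, x)]
                updates += 1
        if updates == 0:                             # every x passes CHECK
            return w
    raise RuntimeError("no separating weights found within max_passes")

# AND gate; the third component of each input is a constant bias input of 1.
f_plus = [(1, 1, 1)]
f_minus = [(0, 0, 1), (0, 1, 1), (1, 0, 1)]
w = perceptron(f_plus, f_minus)
print(all(dot(w, x) > 0 for x in f_plus) and
      all(dot(w, x) <= 0 for x in f_minus))  # True
```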
The Perceptron Theorem
• Says that the previous algorithm will
converge on a set of weights in a finite
number of steps if w* exists
δ Learning Rule:
Imagine a naive, randomly weighted neuron. One way to train a
neuron to discriminate the sick from the healthy, is by reinforcing
good behaviour and penalising bad. This carrot & stick model is
the basis for the δ learning rule:
• Compile a training set of N (say 100) sick and healthy patients.
• Initialise the neuronal weights (random initialisation is the standard).
• Run each input set in turn through the neuron & note its output.
• Whenever a wrong output is encountered, alter responsible weights:
  wi → wi + xi if output too low
  wi → wi − xi if output too high
• Repeatedly run through training set until all outputs agree with targets.
• When training is complete, test the neuron on a new testing set of patients.
• If neuron succeeds, patients whose health is unknown may be determined.
Related idea
• Minimize $E = \sum_i (t_i - o_i)^2$
• Here:
– t is the desired output
– o is the observed output
• Find weights that minimize E
• Steepest gradient descent will also yield
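One way to connect this error measure to a weight update is sketched below. It rests on assumptions that are not stated on the slide: the observed output is taken to be the plain weighted sum $o_i = \sum_j w_j x_{ij}$ (no threshold), $x_{ij}$ denotes input $j$ of training example $i$, and $r$ is a small positive learning rate.

```latex
E = \sum_i (t_i - o_i)^2, \qquad o_i = \sum_j w_j x_{ij}

\frac{\partial E}{\partial w_j} = -2 \sum_i (t_i - o_i)\, x_{ij}

w_j \;\to\; w_j - \frac{r}{2}\,\frac{\partial E}{\partial w_j}
      \;=\; w_j + r \sum_i (t_i - o_i)\, x_{ij}
```

Under these assumptions each step moves the weights in proportion to the error times the input, so a training example with no error contributes no change.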
Supervised learning
The δ learning rule is an example of supervised learning.
Training MP neurons requires a training set, for which the
‘correct’ output is known.
These ‘correct’ or ‘desired’ outputs are used to calculate
the error, which in turn is used to adjust the input-output
relation of the neuron.
Without knowledge of the desired output, the neuron
cannot be trained. Therefore, supervised learning is a
powerful tool when training sets with desired outputs are
available.
When can’t supervised learning be used?
Are biological neurons supervised?
A simple example
Let’s try to train a neuron to learn the logical OR operation:
[Figure: inputs x1, x2 and a bias input x3, with weights w1, w2, w3, are summed (∑ xi wi); the decision (sum ≤ 0 or > 0) gives the output]
x1   x2   x3 (bias)   desired output (x1 OR x2)
0    0    1           0
0    1    1           1
1    0    1           1
1    1    1           1
Example on white board
wi → wi + xi if output low
wi → wi − xi if output high
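Since the whiteboard example is not in the transcript, here is one possible run of the rule on the OR table. It is a sketch under assumptions: weights start at zero (the lecture suggests random initialisation), the output is 1 when the sum is > 0 and 0 otherwise, and the four rows are presented in table order until a full pass produces no mistakes.

```python
def output(w, x):
    """Decision: 1 if the weighted sum is > 0, otherwise 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

def train(samples, w, max_passes=100):
    """Apply the add/subtract rule until every sample is classified correctly."""
    for _ in range(max_passes):
        mistakes = 0
        for x, target in samples:
            out = output(w, x)
            if out < target:                       # output low: add inputs
                w = [wi + xi for wi, xi in zip(w, x)]
                mistakes += 1
            elif out > target:                     # output high: subtract inputs
                w = [wi - xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            return w
    raise RuntimeError("training did not converge within max_passes")

# (x1, x2, x3 = bias input) -> desired output x1 OR x2
or_samples = [((0, 0, 1), 0), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 1), 1)]
w = train(or_samples, [0, 0, 0])
print(w)  # [1, 1, 0]
```

Starting from other weights or presenting the rows in a different order will generally end at a different, equally valid set of weights.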
The power of learning rules
The δ rule is guaranteed to converge on a set of appropriate
weights, if a solution exists. While it might not be the most
efficient of algorithms, this proven convergence is crucial.
What can be done to improve the convergence rate?
Some common variations on this learning rule:
• Adding a learning rate 0<r<1 which “damps” weight
changes (δi = r xi or δi = −r xi, where δi is the change to weight wi).
• Widrow & Hoff recognised that weight changes should be
large when actual output a and target output t were very
different, but smaller otherwise.
They introduced an error term, ∆ = t − a, such that δi = r ∆ xi.
δ Learning Rule
• Called δ rule because weight
updates have the following form
• w → w + δx
• δ is a measure for the error: δ = 0,
no weight change
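The Widrow & Hoff variation can be sketched as a single update step. This illustration assumes the actual output a is the raw weighted sum (no thresholding), so that the error t - a can take intermediate values; the function name, the learning rate of 0.1 and the example input are hypothetical.

```python
def widrow_hoff_step(w, x, target, rate=0.1):
    """One update: each weight changes by rate * (target - actual) * input."""
    actual = sum(wi * xi for wi, xi in zip(w, x))   # actual output a
    error = target - actual                          # error term t - a
    return [wi + rate * error * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0, 0.0]
x = (1, 1, 1)
for _ in range(3):
    new_w = widrow_hoff_step(w, x, target=1)
    print(round(new_w[0] - w[0], 3))   # 0.1, then 0.07, then 0.049
    w = new_w
```

The weight changes shrink as the output approaches the target, and vanish when the error is zero.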
The Perceptron convergence
Theorem
• Suppose there are two sets:
• F+ and F-; F+ ∩ F- empty
Goal:
• x ∈ F+ → Σ wjxj > 0
• x ∈ F- → Σ wjxj < 0
• If there are wj* for which this is true, then
the following algorithm finds wj (possibly
different ones) which also do the trick
The Fall of the Artificial Neuron
• Before long researchers had begun to discover the neuron’s limitations.
• Unless input categories were “linearly separable”, a perceptron could not
learn to discriminate between them.
• Unfortunately, it appeared that many important categories were not
linearly separable. This proved a fatal blow to the artificial neural
networks community.
In this example, an MP neuron would not be able to discriminate
between the footballers and the academics… This failure caused the
majority of researchers to walk away.
[Figure: footballers and academics plotted by success (Successful / Unsuccessful) against time in the gym (Few / Many hours in the gym per week)]
Exercise: Which logic operation is described in this example?
Marvin Minsky & Seymour Papert (1969). Perceptrons, MIT Press, Cambridge.
Connectionism Reborn
The crisis in artificial neural networks can be understood, not as
an inability to connect many neurons in a network, but an inability
to generalise the training algorithms to arbitrary architectures. By
arranging the neurons in an ‘appropriate’ architecture, a suitable
training algorithm could be invented. The solution, once found,
quickly emerged as the most popular learning algorithm for nnets.
Back-propagation was first discovered in 1974 (Werbos, PhD thesis,
Harvard) but discovery went unnoticed. In the mid-80s, it was
rediscovered independently by three groups within about one year.
Most influential of these was a two-volume book by Rumelhart &
McClelland, who suggested a feed-forward architecture of
neurons: layers of neurons, with each layer feeding its
calculations on to the next.
David E. Rumelhart & James L. McClelland (1986).
Parallel Distributed Processing, Vols. 1 & 2, MIT Press, Cambridge, MA.
This time…
• The appeal of neural computing
• From biological to artificial neurons
• Nervous systems as logic circuits
• Classification with the McCulloch & Pitts neuron
• Developments in the 60s:
– The Delta learning rule & variations
– Simple applications
– The fatal flaw of linearity
Next time…
The disappointment with the single neuron dissipated as promptly as it
dawned upon the AI community. Next time, we will see why the single
neuron’s simplicity does not rule out immense richness at the network
level. We will examine the simplest architecture of feed-forward neural
networks and generalise the delta-learning rule to these multi-layer
networks. We will also re-discover some impressive applications.
Optional reading
Excellent treatments of the perceptron, the delta rule & Hebbian
learning, the multi-layer perceptron and the back-propagation
learning algorithm can be found in:
Beale & Jackson (1990). Neural Computing, chaps. 3 & 4.
Hinton (1992). How neural networks learn from experience,
Scientific American, 267 (Sep):104-109.