Connectionism – Some Thoughts concerning Neural Networks
Draft 3, 6/29/2017
1. Introduction – Connexionism and Computationalism
To understand how Artificial Neural Networks work and can be applied to produce artificial minds, it
is useful to compare them with the Computationalist (also known as “Symbolic”) approach to
constructing artificial minds.
Let's revisit Newell and Simon's "General Problem Solver", which we explored through the "blocks
problem". Starting from a single black block (the axiom) and a series of production rules, we
generated a series of patterns (theorems) by the application of each rule, and chose the theorem which
was closest to the goal state (an alternating series of black and white blocks). We then applied all rules
to this new state. In the language of the Symbolic paradigm, the set of theorems at any time is called
the "working memory", or "short-term memory". The rules themselves can be thought of as "long-term
memory", since this is what the system knows, from a declarative perspective. From an imperative
perspective these rules are operations; they make up what the system can do. The control structure
decides which production rule to apply next. The blocks example chose the result of the rule which
produced a theorem closest to the goal state. We commented that this was an example of a "hill-climbing"
algorithm, where we always attempt to move upwards to the goal, which is located at the
mountain peak. At each stage of the production system, we created a new theorem (a pattern of blocks)
which was added to the working memory, to be used or discarded. Remember that this approach was
used as an alternative to testing all possible rules on all possible theorems, since we know that in
general such a computation may be intractable.
At each stage in the computation of the blocks problem, we had a goal, the current state, and the next
rule to apply. We could represent each of these symbolically as attribute-value pairs residing in
working memory (this harks back to Newell and Simon's "Physical Symbol System" hypothesis). For
example we could have
(goal bwbwbwbwbwbw)
(state bwwbb)
(rule b -> bbb)
These symbols are represented as exact statements which can be processed by an imperative or
declarative paradigm. It has been traditionally asserted that symbols are the fodder of
Computationalism and that connexionism eschews the concept of a symbol.
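To make this concrete, here is a minimal Python sketch of such a production system. The two rules and
the goal below are my own illustrative inventions, not Newell and Simon's actual rule set; the point is
only the control structure, which generates every theorem reachable from the current state and then
hill-climbs on similarity to the goal.

# A minimal sketch of the blocks production system described above.
# The rules and the goal are illustrative assumptions.
RULES = [("b", "bw"), ("w", "wb")]    # long-term memory: what the system can do
GOAL = "bwbwbw"                       # alternating black (b) and white (w) blocks

def similarity(state):
    """Hill-climbing heuristic: matching positions minus length mismatch."""
    return sum(s == g for s, g in zip(state, GOAL)) - abs(len(state) - len(GOAL))

def expansions(state):
    """Generate every theorem reachable by applying one rule at one position."""
    for lhs, rhs in RULES:
        start = state.find(lhs)
        while start != -1:
            yield state[:start] + rhs + state[start + len(lhs):]
            start = state.find(lhs, start + 1)

state = "b"                           # the axiom
while state != GOAL:
    state = max(expansions(state), key=similarity)   # keep the best new theorem
    print(state)                      # prints bw, bwb, bwbw, bwbwb, bwbwbw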
2. Biological Neural Networks
As explained in class, biological neural networks, following the work of Golgi and Cajal, have been
found to comprise networks of neurons. Each neuron has a clear structure, with input "dendrites", a
processing “soma” and an output “axon”. Each neuron is connected on average to around 1000 other
neurons through “synapses” which are connections whose strength may change as the brain is exposed
to stimuli (and therefore learns). Neurons work using electrical signals, which travel at about 100
km/hour (which is slow compared to CPU signals). The electrical signals are produced by movement of
sodium and potassium ions across the neuron membrane wall. For example, when the neuron receives
an excitatory input, sodium rushes in and potassium rushes out. If there are sufficient inputs to the
neuron then these ion movements add up and will cause the neuron to generate an output signal. The
structure of the axon enables this signal to propagate like a tsunami wave along the axon to the next
neuron.
When the electrical wave reaches the end of the axon, it terminates in a "synapse", which then connects
to the next neuron's dendrite. Think of it like this. You travel from Worcester to London and you arrive
in Paddington (the synapse). But you are traveling to Brighton, so you need to get to Victoria. How? The
tube (Bakerloo then Victoria lines). Well, the tube is analogous to the chemical crossing made at a
synapse, and here's a working synapse:
The axon terminating in the synapse is shown on the left, and when the electrical signal arrives at the
synapse, it releases neurotransmitters, which are chemicals. (So the overground trains are like electrical
signals and the neurotransmitters are like the tube-trains.) This activity needs the presence of calcium
ions. When the neurotransmitters reach the dendrite of the next neuron, the dendrite opens some
"input channels" so that the neurotransmitters can interact with receptor proteins. This interaction leads
to further movement of sodium and potassium ions (as explained at the start of this paragraph). Of
course, it takes some time to re-generate the neurotransmitters.
And so the process goes on. I find this whole thing interesting, that our brain is a combination of pure
electrical signals within the neuron’s soma and axon, and chemical signals (which are very slow) across
the synapse-neuron interface. Why do we need the slower chemical signals? I guess it’s to do with
learning where the synaptic strength changes. At this time, well, Judge Judy is out.
3. The difference between Biological and Artificial Neural Networks (ANNs)
There are three main differences in the theories of Biological Neural Networks (BNNs) and Artificial
Neural Networks (ANNs). The first difference concerns the degree of modeling. BNN models attempt
to reproduce accurately the neurophysiological processes actually found in living neurons. These
models include concentrations of the various chemicals, such as potassium, sodium and chloride, which
have been found to explain how biological neurons work. These models may include tens of inter-related
variables; they are intended to help us understand how biological neurons work, not to help us
construct Artificial Intelligence. ANNs on the other hand are simpler models and have a
small number of variables, such as the inputs into the neuron, the 'firing threshold' and the output.
The second difference concerns the modeling approach. BNN models use systems of ordinary
differential equations (ODEs) which model how the various chemical concentrations vary with time.
This is often called the “integrate-and-fire” model. This uses the language of dynamic systems
modeling. On the other hand ANN models are not interested in individual neuron dynamics, they take a
functional modeling approach, specifying how the neuron output is computed as a function of the
neuron inputs (instantaneously, without the need for time). Typically these models sum the inputs to
the neuron, see if this sum is above or below a threshold and respectively decide whether to “fire”
(produce an output) or not. All this happens in one time step.
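To make the contrast concrete, here is a toy Python sketch: first a leaky integrate-and-fire neuron whose
membrane voltage is integrated through time with Euler steps, then a functional ANN neuron which needs
no time at all. All the constants are illustrative choices, not measured neurophysiological values.

# A toy leaky integrate-and-fire neuron (BNN style): an ODE stepped through time.
tau, v_rest, v_threshold, v_reset = 10.0, -70.0, -55.0, -70.0   # ms and mV, illustrative

def simulate(input_current, dt=0.1, steps=1000):
    v, spike_times = v_rest, []
    for step in range(steps):
        dv = (-(v - v_rest) + input_current) / tau   # tau * dv/dt = -(v - v_rest) + I
        v += dv * dt
        if v >= v_threshold:                         # threshold crossed: the neuron "fires"
            spike_times.append(step * dt)
            v = v_reset                              # and the membrane potential resets
    return spike_times

# A functional ANN neuron: one instantaneous step, no dynamics at all.
def ann_neuron(inputs, threshold=1.0):
    return 1 if sum(inputs) >= threshold else 0

print(simulate(input_current=20.0)[:3])   # the first few spike times, in ms
print(ann_neuron([0.6, 0.7]))             # 1, since 1.3 >= 1.0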
The third difference concerns the modeling structure. For example, in BNNs we accurately model
the flow of the electrical signal down the axon, which takes some time. It would be daft for ANNs to
include this, since it would only act to slow down the ANN computations.
4. An example Artificial Neural Network – The Retina
We saw in class a slide from Golgi and Cajal of an animal retina. We clearly saw how the light receptor
neurons (‘rods’ and ‘cones’) were connected to intermediate neurons which descended into the optic
nerve. Here’s a hypothetical connection of Artificial neurons, which we could use to construct a vision
system. In the diagram below we look at just one point in the retina where there are three light
receptors, each sensitive respectively to (R)ed, (G)reen and (B)lue light. One circuit in the ANN
combines the outputs of these sensor neurons to produce the symbol "white". The other circuit produces
the symbol "red". But note how the presence of "white" has to inhibit the presence of "red" for
"red" to be output without being a component of "white".
How would a symbolic system be programmed to deal with this situation? RGB input values of
(255,255,255) would be interpreted as "white": if (r == 255 && g == 255 && b == 255) color = white.
But (254,243,255) would not be, because it would fail the if-statement. So you change the program
to work with limits, e.g. if ((r > 240) && (r <= 255)) color = red. But who sets these limits? The
programmer. But on what basis can he decide where to set the limits?
If we return to an ANN, then the ANN is able to "set these limits" by experience, from the training set
where various shades of "white" are presented (Procol Harum!). So if an image which was 'off-white'
were presented to the ANN, the ANN would respond with "white" rather than "purple". This is called
generalization, which is the ability of an ANN to give a correct output when the inputs are close (but
not exactly equal) to those presented in the training set.
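Here is a small Python sketch of the two approaches side by side. The first function hard-wires the
programmer's limits; the second stands in for an ANN by deriving its "limits" from a training set (the
example colors are invented), using a nearest-mean rule as a crude stand-in for learned generalization.

# Hand-coded symbolic version: the programmer fixes the limits.
def symbolic_color(r, g, b):
    if r == 255 and g == 255 and b == 255:     # (254, 243, 255) already fails here
        return "white"
    return "unknown"

# Learned version: the "limits" come from examples (invented for illustration).
TRAINING = {
    "white":  [(255, 255, 255), (250, 248, 252), (243, 243, 240)],
    "purple": [(128, 0, 128), (140, 20, 130), (120, 10, 140)],
}

def learned_color(r, g, b):
    """Nearest-mean classifier: a crude stand-in for an ANN's generalization."""
    def mean(samples):
        return [sum(channel) / len(samples) for channel in zip(*samples)]
    def distance(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))
    return min(TRAINING, key=lambda label: distance((r, g, b), mean(TRAINING[label])))

print(symbolic_color(254, 243, 255))   # "unknown" -- brittle
print(learned_color(254, 243, 255))    # "white"   -- generalizes to an off-white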
Let's consider the ethnographic difference between ourselves, as individuals with our visual experiences
in the UK, and those of Eskimos in an "EK". Individuals within these societies (located in time and space)
have the same "neural makeup" (the same 'brains'). But because of the different visual experiences we
have (Eskimos may see various colors of "white snow" within their world, whereas for us "white =
paper", but we have more colors), our BNNs learn from the visual stimuli to produce different
classifications of the stimuli. What we perceive as "pink" may correspond to the Eskimo's shade
number 13 of "white".
Let’s look at the neurons in a real biological retina.
5. The Individual Artificial Neuron
The Artificial neuron presented in class was a really cut-down version of the full-blown Artificial
neuron which is used in AI research. There we said that a neuron sums its inputs, and if the sum is
above a threshold it will "fire" to produce an output signal. So the computation is a two-stage process.
The figure below shows this, (a) in general and (b) to produce an "or-gate".
We also saw in class that "learning" in neural networks (both BNNs and ANNs) consists in a
change in synapse strength, increasing or decreasing the strength of connection between neurons. So
we need to extend our cut-down version to include this. This is done through the introduction of
weights into the input summation process. The diagram below shows how the weights are used in the
first part of the computation; each input is multiplied by the synaptic weight of its incoming connection
to the neuron before the summation. In this example, if the weighted sum is equal to or greater than the
threshold, then the neuron will fire. This is the original McCulloch-Pitts model.
Note that some weights may be negative, which would reduce the total input to the neuron. These
synapses are called inhibitory since they inhibit the firing of the neuron. You’ll see this working in the
example of the NOT-gate in the next section.
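In Python, such a neuron is a one-liner. Here is a sketch, together with the or-gate from the earlier
figure (weights 1 and 1, threshold 1):

# A McCulloch-Pitts style neuron: multiply each input by its synaptic weight,
# sum the results, and fire if the sum reaches the threshold.
def fire(inputs, weights, threshold):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# The or-gate: weights 1, 1 and threshold 1.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", fire([a, b], [1, 1], threshold=1))   # 0, 1, 1, 1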
6. Neurons can effect the operation of Logic Gates
We saw in class that a simple neuron can replicate the functionality of an AND and an OR gate. Here
we shall extend this, demonstrating how neurons can produce more gates such as NOT and NAND.
There is a fundamental point here, since all elements of the CPU and memory can be constructed
through a combination of NAND (AND-followed-by-NOT) gates. So if we can show that neurons can
effect a NAND gate, then we have shown that neurons can effect the principal components of a PC. In
other words, we could build a biological PC.
First let’s have a look at the AND gate. The figure below shows the equivalent neuron. We can
calculate the weighted sums of the inputs and compare with the threshold for all four possible values of
input, e.g. if A=1 and B=1 and the weights are both 1 then we calculate (1x1) + (1x1) = 1 + 1 = 2. This
is equal to the threshold (shown inside the neuron) so the neuron will fire giving an output 1. Calculate
the other values for yourself.
A   B   (1 x A) + (1 x B)   out
0   0   0                   0
0   1   1                   0
1   0   1                   0
1   1   2                   1
Now for the NOT gate, whose equivalent neuron is sketched below. Here the weight is -1 and the
threshold is zero. So if we have an input 0, the weighted sum is -1 x 0 = 0. This is equal to the
threshold, so the neuron fires. Input 0, output 1; that's a not-gate. Now for input 1, the weighted sum is
-1 x 1 = -1, which is less than the threshold. So for 1 in we get 0 out. That's again a not-gate.
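And here is the promised NAND. A single neuron with two inhibitory weights and a negative threshold
does the job; weights of -1 and -1 with threshold -1 are one choice that works (the AND-followed-by-NOT
construction with two neurons works equally well).

# NAND with a single neuron: two inhibitory weights and a negative threshold.
def fire(inputs, weights, threshold):
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

def nand(a, b):
    return fire([a, b], [-1, -1], threshold=-1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", nand(a, b))   # prints the NAND truth table: 1, 1, 1, 0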
7. Short-term and Long-term memory.
Think of "short-term memory" (STM) as containing information you have seen, heard, felt or smelt
within the last few seconds. It may include some of the above diagrams or the information present in
the sentence you have just finished reading. Common wisdom says that we can hold seven (plus or
minus two) "chunks" of information in our mind at any one time. Perhaps that's why phone numbers
have 6 digits? The diagram below shows a theoretical model of STM from Stephen Grossberg. While it
shows three neurons, A, B and C, in reality the BNN would use pools of many hundreds of neurons.
OK, so how does this work? The key neurons are A and B, which are connected together in a loop where
A excites B and B excites A. So if A is firing then it fires B, which causes A to fire again, and so on. If
A is not firing then it does not excite B. So the A-B loop is storing "firing" or "not-firing". That's the
memory element. So how do we know if A-B are firing? Well, that's determined by the input to the
loop which comes in as excitation to A as shown. How do we get the memory out of the loop? That’s
the function of neuron C which fires if B is firing together with an “arousal” signal which comes from
another part of the brain. The output of C also connects back to B but with an inhibitory weight. So
when C fires, it stops B firing (if it was) which stops A firing, removing the memory trace. In other
words when the data is read from memory the memory cell is cleared.
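Here is a toy discrete-time version of the loop in Python; binary firing and one-step delays are
simplifying assumptions of my own. Because of the one-step delay, A and B take turns firing, but the
loop as a whole stays active, and that activity is the stored bit.

# A excites B, B excites A, and C reads the loop out (and clears it) when an
# arousal signal arrives. All neurons are binary and update once per time step.
def stm_step(a, b, c, stimulus, arousal):
    new_a = 1 if (b == 1 or stimulus == 1) else 0   # A fires on input or on B's excitation
    new_b = 1 if (a == 1 and c == 0) else 0         # B fires on A, unless C inhibits it
    new_c = 1 if (b == 1 and arousal == 1) else 0   # C reads out B when aroused
    return new_a, new_b, new_c

a = b = c = 0
for t in range(9):
    stimulus = 1 if t == 0 else 0          # one brief input at t = 0
    arousal = 1 if t in (5, 6) else 0      # read-out request at t = 5..6
    a, b, c = stm_step(a, b, c, stimulus, arousal)
    print(t, a, b, c)                      # C fires at t = 6; the loop is silent by t = 7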
Long-term memory is easier to understand; it involves a change in synaptic strength. The synapse
between neurons A and B is strengthened whenever A and B are firing simultaneously. This "fire
together, wire together" principle proposed by Donald Hebb has become a paradigm in the
neurosciences. The figure below is intended to illustrate this:
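Hebb's rule is simple enough to state in one line of Python: the change in a synaptic weight is
proportional to the product of the two neurons' activities. The learning rate below is an arbitrary choice.

# Hebb's rule: neurons that fire together, wire together.
def hebb_update(weight, a_activity, b_activity, learning_rate=0.1):
    return weight + learning_rate * a_activity * b_activity

w = 0.0
for _ in range(5):                # A and B repeatedly fire together...
    w = hebb_update(w, 1.0, 1.0)
print(w)                          # ...and the synapse strengthens: ~0.5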
8. Learning
[Supervised, Unsupervised]
9. Central Pattern Generators
10. Connectionism versus Computationalism.
The computationalist approach rests on the use of symbols and the connections between these symbols,
as explained above. This involves rules which relate these symbols, expressed in a programming
language. Each symbol exists as a data object and can be processed by an imperative or declarative
paradigm. The symbol structures are defined by the programmer and therefore exist as explicit data
structures (e.g. statements) within the computer program, where they can be directly observed.
The connectionist approach is different. A typical neural network may involve three layers: the "input"
(I), the "output" (O) and the "hidden" (H) layers. It can be trained by asserting values at the input and
output layers, and this will work as seen in class. For example we could train a three-layer network to
simultaneously compute the AND and the NAND of the input values. But what exists in the H-layer?
That's what we shall try to understand here.
Let's have a look at the "training set". We apply the input patterns to neurons 0 and 1, and the output
patterns to neurons 6 and 7, like this.
inputs               outputs
neuron 0  neuron 1   neuron 6  neuron 7
0         0          0         1
0         1          0         1
1         0          0         1
1         1          1         0
So we are training neuron 6 to compute the AND of the inputs and neuron 7 the NAND of the
inputs. (The training set is available for my NNET1 Simulator: "Training2", to be used with "Netspace
= Net2".) Now let's look at the values of the hidden layer and see if we can find any structure there. I
ran my simulator for 8230 steps in the learning mode (time to cook some Quorn burgers), and then
entered the single-step run mode; here are the neurons' values.
Neuron:   0     1     2     3     4     5     6     7
Values:   0     0     0.01  0.86  0.03  0.45  0     1
          0     1     0.17  0.63  0.25  0.42  0     1
          1     0     0.17  0.60  0.24  0.43  0     1
          1     1     0.81  0.30  0.77  0.41  1     0

(Each row gives every neuron's activation for one of the four input patterns.)
We're interested in the hidden-layer values (neurons 2 to 5). At first these appear to be a load of different
numbers, but no. Behold the second and third rows (corresponding to the inputs 0,1 and 1,0). The
activation values of the hidden neurons are almost identical. It is as though the network has learned that
the input patterns 0,1 and 1,0 are actually the same (in the context of the training set). But we can
interpret this in a different way. We could suggest that these hidden-layer values represent an encoded
version of the input patterns. And if we acknowledge that these patterns are symbols, then we can argue
that a neural network's hidden layer provides an encoded representation of symbols. Therefore, neural
networks do support symbolic representations.
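If you don't have my NNET1 simulator to hand, here is a minimal, generic backpropagation sketch in
Python (with numpy) of the same 2-4-2 experiment. This is not NNET1: the sigmoid units, learning
rate and random seed are my own choices, so the particular hidden values will differ from the table
above, but the same kind of structure should emerge.

import numpy as np

# Train a 2-4-2 network on the AND/NAND set, then inspect the hidden layer.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs (neurons 0, 1)
Y = np.array([[0, 1], [0, 1], [0, 1], [1, 0]], dtype=float)   # targets: AND, NAND

W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)   # input  -> hidden weights
W2 = rng.normal(0, 1, (4, 2)); b2 = np.zeros(2)   # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

eta = 1.0
for _ in range(8230):                 # the same number of steps as in the text
    H = sigmoid(X @ W1 + b1)          # hidden activations (neurons 2..5)
    O = sigmoid(H @ W2 + b2)          # output activations (neurons 6, 7)
    dO = (O - Y) * O * (1 - O)        # backpropagate the squared error
    dH = (dO @ W2.T) * H * (1 - H)
    W2 -= eta * (H.T @ dO); b2 -= eta * dO.sum(axis=0)
    W1 -= eta * (X.T @ dH); b1 -= eta * dH.sum(axis=0)

np.set_printoptions(precision=2, suppress=True)
print(sigmoid(X @ W1 + b1))                     # the hidden layer's encoded "symbols"
print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2))  # should approximate AND and NAND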
So the processing from the input to the hidden layer can be viewed as an encoding of the input patterns
into a representation which is distributed across the four hidden nodes. So what about
the processing from the hidden nodes to the output nodes? Well, we can view this as a decoding of the
distributed symbolic information into the output values. But the encoding and decoding are not
identical, since the input and output patterns are not the same. So we conclude that the ANN has
learned the rules for encoding and the rules for decoding. Viewed in this light, we see that the
functioning of ANNs can be seen as equivalent to an imperative program comprising a set of
encoding rules and a set of decoding rules. We could of course write an equivalent imperative program.
And that's my point: ANNs are no less powerful than computationalist imperative silly-symbol
programs. But there's more to come; I suggest ANNs are better, more powerful, as well as being rather
cute, and here's why.
Imagine we coded an imperative or declarative computationalist system where we had a list of Prolog
rules or Java if-then-else statements to simulate the above example. We know these rules or statements
are stored in computer RAM. Now let's say that the RAM chip where a rule or statement was stored
failed. What would happen to our imperative or declarative program? Of course it would not run, since
one of the rules or statements would be lost. Without that rule, because the system needs all of the
programmed rules or statements, the system could not work, and as expected the system would crash.
Now let's suppose that one of our neurons failed. Would the same be true? Well, the answer is no: since the
rules are distributed across the hidden layer, removal of a single neuron would not kill the network.
Well, in the case of the above example it might, since there are only four hidden neurons, but in a real BNN or
ANN there are likely to be thousands of neurons in the hidden layer, so my hypothesis is safe and
sound.
But let’s think this out a little further. Our brains work using neural networks. That’s a fact. But, why?
Whoever engineered our brains could have used a computationalist approach, i.e., provided us with a
register-based Pentium architecture (implemented using biological cells – probably another sort of
neural architecture) with an operating system and a load of human applications. This would never
work. Why? Because we are mortal and as we live, we die; our cells including neurons are continually
dying as each day rolls by. If our brains were symbolic, the first neuron to die would have wiped out a
load of cognition. But since our brains use neural networks which work on distributed encoding of
symbols, when a single neuron dies, then the encoding-decoding process may become a little less
efficient (we lose memory when we age) but we never stop thinking!
11. Representation and The "Embodied Mind"
The computationalist and connexionist mind-models presented above (according to my broad
interpretation) share one common aspect, illustrated in the figure below. Here some real-world objects
are perceived by the human eye, and a representation of these objects is stored in the brain. The mind
then operates on these symbols ("the pyramid is larger than the cube", "is it really a cube?", etc.).
Of course, how this processing proceeds differs in the two mind-models. In the connexionist model, the
processing is implicit and functional. In the computationalist model, the processing is explicit,
operating on a set of rules or statements encoding the symbols.
However, the latest mind-model dumps the requirement for symbols altogether. Why operate on a
symbolic representation of perceived objects, when the objects are actually there to grasp? This is
called the "Embodied Mind" model, which states that the mind and body cannot be taken as two
disjoint entities, and that thought and consciousness emerge through the interaction of the mind and
body with the real world. So we have a situation like this:
The main feature of embodied cognition is the perception-action loop, where internal and external
processes of perception (the senses) and action are intertwined. The claim is that embodiment gains its
cognitive power through real-world interactions and not by operating on some internal world model
(representation). This model was developed by Varela, Thompson and Rosch in the early 1990s and has
been applied to vision by Dana Ballard and to robotics by Rod Brooks.
Here's one problem I've been struggling with personally, in the context of "to represent or not to
represent". Imagine an artist painting a bowl of fruit. In the real world, he has the bowl of fruit and the
canvas on which he is painting a representation of the bowl of fruit. The two are of course distinct. But
what does the artist have in his mind? If we choose "to represent", then he has the following: (i) a
representation of the bowl of fruit generated by perception and processing, (ii) a representation of the
bowl of fruit he is painting, and (iii) a representation of the "idea" he has of what the painting should look
like (Impressionist, Cubist, etc.). So, counting the canvas itself, we have four representations and one
reality. The reality is static, but the representations are dynamic, and all may change as he paints. So we
have a dynamic system which, of course, is modeled through a set of 4 ODEs. Interesting, eh?
12. Conclusions
Let’s try to wrap this up. We have noted the following points. Computationalism works by the
processing of symbols which represent real-world entities and their relationships. Knowledge is stored
as facts or rules and processed in a language like LISP. That’s why it’s called “Computationalism”
since this model of the mind is akin to a computer program. The central underlying concept is that
symbols are representations. Connexionism uses no explicitly programmed symbolic representations of
real-world objects, but is based upon the known facts about biological neurons. Neural networks
(according to me) store symbols in a distributed form, spread across many neurons, all mixed up, but
this mixture can be decoded. Neural networks can generalize, whereas symbolic systems cannot. They
are also robust to degradation from the removal (death) of processing elements.
Connexionism versus Computationalism:

Structure
  Connexionism: Network of elements ("neurons") and interconnections with
  strengths ("synapses"). Large number of low-speed elements (neurons). No
  imperative or declarative program.
  Computationalism: Imperative program (if-then-else statements) or rules
  (e.g. Prolog). Runs on a small number of high-speed elements (CPUs).

Processing
  Connexionism: Local calculation of weighted inputs to each neuron, and
  local calculation of output. Dynamical system.
  Computationalism: Sequential processing of statements (imperative), or
  declarative processing of rules.

Representation
  Connexionism: Distributed representation of encoded patterns or symbols,
  and encoding-decoding rules.
  Computationalism: Explicit representation of rules and symbols as lines
  of computer code.

Generalization
  Connexionism: Non-exact inputs produce correct outputs.
  Computationalism: Non-exact inputs cannot be easily handled.

Degradation
  Connexionism: Robust to deletion of a processing element (neuron).
  Computationalism: Will crash if a processing element (a line of code
  compiled into memory) fails.

Learning
  Connexionism: Supervised learning from expert examples, or unsupervised
  learning.
  Computationalism: Needs an expert to convert the expertise into
  programmed statements or rules.
13. Appendix
Here's an interesting table I found – it's the "Data Sheet" of Human Cortical Tissue. The number of
neurons, 100,000 million, is about the same as the estimated number of stars in our galaxy. Each
neuron runs at a speed of 100 operations per second. Now, human reaction time (from stimulus to
action) is roughly 0.5 second. This means that there are about 50 sequential steps in neural processing.
What imperative program could compute anything worthwhile in that many steps? Yet the total number
of operations per second is 10,000 trillion, so where does this large number come from? Simply from
the fact that there are so many neurons (and synapses) operating in parallel; the brain is a huge parallel
distributed processor. Another interesting fact concerns the two "breakdown" field strengths (the
electric field which must be applied across an insulator for it to fail): the value for the neuron membrane
is identical to that of the silicon-dioxide insulators used on computer chips! Finally, you can calculate
that the total power dissipated by the brain is around 0.25 Watts. So the combined activity of ten brains
is comparable with an energy-saving light bulb. This gives a new meaning to the "light-bulb" symbol
used to represent having a brain-wave.
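Here is that back-of-envelope arithmetic spelled out in Python. Counting one operation per synapse per
neuron update is my own assumption, made to reconcile the figures quoted above.

neurons = 100e9               # 100,000 million neurons
synapses_per_neuron = 1000    # the average quoted in section 2
ops_per_second = 100          # operations per second, per neuron
reaction_time = 0.5           # seconds from stimulus to action

print(ops_per_second * reaction_time)                  # 50 sequential steps
print(neurons * synapses_per_neuron * ops_per_second)  # 1e16 = 10,000 trillion ops/s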