Connectionism – Some Thoughts concerning Neural Networks
Draft 3, 6/29/2017
1. Introduction – Connexionism and Computationalism
To understand how Artificial Neural Networks work and can be applied to produce artificial minds, it
is useful to compare them with the Computationalist (also known as “Symbolic”) approach to
constructing artificial minds.
Let's revisit Newell and Simon's "General Problem Solver", which we explored through the "blocks
problem". Starting from a single black block (the axiom) and a series of production rules, we
generated a series of patterns (theorems) by the application of each rule, and chose the theorem which
was closest to the goal state (an alternating series of black and white blocks). We then applied all rules
to this new state. In the language of the Symbolic paradigm, the set of theorems at any time is called
the "working memory", or "short-term memory". The rules themselves can be thought of as "long-term
memory", since this is what the system knows, from a declarative perspective. From an imperative
perspective these rules are operations; they make up what the system can do. The control structure
decides which production rule to apply next. The blocks example chose the result of the rule which
produced a theorem closest to the goal state. We commented that this was an example of a "hill-climbing"
algorithm, where we always attempt to move upwards to the goal, which is located at the
mountain peak. At each stage of the production system, we created a new theorem (a pattern of blocks)
which was added to the working memory, to be used or discarded. Remember that this approach was
used as an alternative to testing all possible rules on all possible theorems, since we know that in
general such a computation may be intractable.
At each stage in the computation of the blocks problem, we had a goal, the current state, and the next
rule to apply. We could represent each of these symbolically as attribute-value pairs residing in
working memory (this harks back to Newell and Simon's "Physical Symbol System" hypothesis). For
example we could have
(goal bwbwbwbwbwbw)
(state bwwbb)
(rule b -> bbb)
These symbols are represented as exact statements which can be processed by an imperative or
declarative paradigm. It has been traditionally asserted that symbols are the fodder of
Computationalism and that connexionism eschews the concept of a symbol.
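To make this concrete, here is a minimal Python sketch of such a production system. The two rules and
the goal below are my own illustrative inventions, not Newell and Simon's actual rule set; the point is
only the control structure, which generates every theorem reachable from the current state and then
hill-climbs on similarity to the goal.

# A minimal sketch of the blocks production system described above.
# The rules and the goal are illustrative assumptions.
RULES = [("b", "bw"), ("w", "wb")]    # long-term memory: what the system can do
GOAL = "bwbwbw"                       # alternating black (b) and white (w) blocks

def similarity(state):
    """Hill-climbing heuristic: matching positions minus length mismatch."""
    return sum(s == g for s, g in zip(state, GOAL)) - abs(len(state) - len(GOAL))

def expansions(state):
    """Generate every theorem reachable by applying one rule at one position."""
    for lhs, rhs in RULES:
        start = state.find(lhs)
        while start != -1:
            yield state[:start] + rhs + state[start + len(lhs):]
            start = state.find(lhs, start + 1)

state = "b"                           # the axiom
while state != GOAL:
    state = max(expansions(state), key=similarity)   # keep the best new theorem
    print(state)                      # prints bw, bwb, bwbw, bwbwb, bwbwbw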
2. Biological Neural Networks
As explained in class, biological neural networks, following the work of Golgi and Cajal, have been
found to comprise networks of neurons. Each neuron has a clear structure, with input "dendrites", a
processing “soma” and an output “axon”. Each neuron is connected on average to around 1000 other
neurons through “synapses” which are connections whose strength may change as the brain is exposed
to stimuli (and therefore learns). Neurons work using electrical signals, which travel at about 100
km/hour (which is slow compared to CPU signals). The electrical signals are produced by movement of
sodium and potassium ions across the neuron membrane wall. For example, when the neuron receives
an excitatory input, sodium rushes in and potassium rushes out. If there are sufficient inputs to the
neuron then these ion movements add up and will cause the neuron to generate an output signal. The
structure of the axon enables this signal to propagate like a tsunami wave along the axon to the next
neuron.
When the electrical wave reaches the end of the axon, it terminates in a "synapse", which then connects
to the next neuron's dendrite. Think of it like this. You travel from Worcester to London and you arrive
in Paddington (the synapse). But you are traveling to Brighton, so you need to get to Victoria. How? The
tube (Bakerloo then Victoria lines). Well, the tube is analogous to the chemical crossing made at a
synapse, and here's a working synapse:
The axon terminating in the synapse is shown on the left, and when the electrical signal arrives at the
synapse, it releases neurotransmitters, which are chemicals. (So the overground trains are like electrical
signals and the neurotransmitters are like the tube-trains.) This activity needs the presence of calcium
ions. When the neurotransmitters reach the dendrite of the next neuron, the dendrite opens some
"input channels" so that the neurotransmitters can interact with receptor proteins. This interaction leads
to further movement of sodium and potassium ions (as explained at the start of this paragraph). Of
course, it takes some time to re-generate the neurotransmitters.
And so the process goes on. I find this whole thing interesting, that our brain is a combination of pure
electrical signals within the neuron’s soma and axon, and chemical signals (which are very slow) across
the synapse-neuron interface. Why do we need the slower chemical signals? I guess it’s to do with
learning where the synaptic strength changes. At this time, well, Judge Judy is out.
3. The difference between Biological and Artificial Neural Networks (ANNs)
There are three main differences in the theories of Biological Neural Networks (BNNs) and Artificial
Neural Networks (ANNs). The first difference concerns the degree of modeling. BNN models attempt
to reproduce accurately the neurophysiological processes actually found in living neurons. These
models include concentrations of the various chemicals, such as potassium, sodium and chloride, which
have been found to explain how biological neurons work. These models may include tens of inter-related
variables; they are intended to help us understand how biological neurons work, not to help us
construct Artificial Intelligence. ANNs on the other hand are simpler models and have a
small number of variables, such as the inputs into the neuron, the 'firing threshold' and the output.
The second difference concerns the modeling approach. BNN models use systems of ordinary
differential equations (ODEs) which model how the various chemical concentrations vary with time.
This is often called the “integrate-and-fire” model. This uses the language of dynamic systems
modeling. On the other hand ANN models are not interested in individual neuron dynamics, they take a
functional modeling approach, specifying how the neuron output is computed as a function of the
neuron inputs (instantaneously, without the need for time). Typically these models sum the inputs to
the neuron, see if this sum is above or below a threshold and respectively decide whether to “fire”
(produce an output) or not. All this happens in one time step.
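To make the contrast concrete, here is a toy Python sketch: first a leaky integrate-and-fire neuron whose
membrane voltage is integrated through time with Euler steps, then a functional ANN neuron which needs
no time at all. All the constants are illustrative choices, not measured neurophysiological values.

# A toy leaky integrate-and-fire neuron (BNN style): an ODE stepped through time.
tau, v_rest, v_threshold, v_reset = 10.0, -70.0, -55.0, -70.0   # ms and mV, illustrative

def simulate(input_current, dt=0.1, steps=1000):
    v, spike_times = v_rest, []
    for step in range(steps):
        dv = (-(v - v_rest) + input_current) / tau   # tau * dv/dt = -(v - v_rest) + I
        v += dv * dt
        if v >= v_threshold:                         # threshold crossed: the neuron "fires"
            spike_times.append(step * dt)
            v = v_reset                              # and the membrane potential resets
    return spike_times

# A functional ANN neuron: one instantaneous step, no dynamics at all.
def ann_neuron(inputs, threshold=1.0):
    return 1 if sum(inputs) >= threshold else 0

print(simulate(input_current=20.0)[:3])   # the first few spike times, in ms
print(ann_neuron([0.6, 0.7]))             # 1, since 1.3 >= 1.0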
The third difference concerns the modeling structure. For example, in BNNs we accurately model
the flow of the electrical signal down the axon, which takes some time. It would be daft for ANNs to
include this, since it would only act to slow down the ANN computations.
4. An example Artificial Neural Network – The Retina
We saw in class a slide from Golgi and Cajal of an animal retina. We clearly saw how the light receptor
neurons (‘rods’ and ‘cones’) were connected to intermediate neurons which descended into the optic
nerve. Here’s a hypothetical connection of Artificial neurons, which we could use to construct a vision
system. In the diagram below we look at just one point in the retina where there are three light
receptors, each sensitive respectively to (R)ed, (G)reen and (B)lue light. One circuit in the ANN
combines the outputs of these sensor neurons to produce the symbol "white". The other circuit produces
the symbol "red". But note how the presence of "white" has to inhibit the presence of "red" for
"red" to be output without being a component of "white".
How would a symbolic system be programmed to deal with this situation? RGB input values of
(255,255,255) would be interpreted as "white": if (r == 255 && g == 255 && b == 255) color = white.
But (254,243,255) would not be, because it would fail the if-statement. So you change the program
to work with limits, e.g. if ((r > 240) && (r <= 255)) color = red. But who sets these limits? The
programmer. But on what basis can he decide where to set the limits?
If we return to an ANN, then the ANN is able to "set these limits" by experience, from the training set
where various shades of "white" are presented (Procol Harum!). So if an image which was 'off-white'
were presented to the ANN, the ANN would respond with "white" rather than "purple". This is called
generalization, which is the ability of an ANN to give a correct output when the inputs are close (but
not exactly equal) to those presented in the training set.
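Here is a small Python sketch of the two approaches side by side. The first function hard-wires the
programmer's limits; the second stands in for an ANN by deriving its "limits" from a training set (the
example colors are invented), using a nearest-mean rule as a crude stand-in for learned generalization.

# Hand-coded symbolic version: the programmer fixes the limits.
def symbolic_color(r, g, b):
    if r == 255 and g == 255 and b == 255:     # (254, 243, 255) already fails here
        return "white"
    return "unknown"

# Learned version: the "limits" come from examples (invented for illustration).
TRAINING = {
    "white":  [(255, 255, 255), (250, 248, 252), (243, 243, 240)],
    "purple": [(128, 0, 128), (140, 20, 130), (120, 10, 140)],
}

def learned_color(r, g, b):
    """Nearest-mean classifier: a crude stand-in for an ANN's generalization."""
    def mean(samples):
        return [sum(channel) / len(samples) for channel in zip(*samples)]
    def distance(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))
    return min(TRAINING, key=lambda label: distance((r, g, b), mean(TRAINING[label])))

print(symbolic_color(254, 243, 255))   # "unknown" -- brittle
print(learned_color(254, 243, 255))    # "white"   -- generalizes to an off-white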
Let's consider the ethnographic difference between ourselves, as individuals with our visual experiences
in the UK, and those of Eskimos in an "EK". Individuals within these societies (located in time and space)
have the same "neural makeup" (the same 'brains'). But because of the different visual experiences we
have (Eskimos may see various colors of "white snow" within their world, whereas for us "white =
paper", but we have more colors), our BNNs learn from the visual stimuli to produce different
classifications of the stimuli. What we perceive as "pink" may correspond to the Eskimo's shade
number 13 of "white".
Let’s look at the neurons in a real biological retina.
5. The Individual Artificial Neuron
The Artificial neuron presented in class was a really cut-down version of the full-blown Artificial
neuron which is used in AI research. There we said that a neuron sums its inputs, and if the sum is
above a threshold it will "fire" to produce an output signal. So the computation is a two-stage process.
The figure below shows this, (a) in general and (b) to produce an "or-gate".
We also saw in class that "learning" in neural networks (both BNNs and ANNs) consists in a
change in synapse strength, increasing or decreasing the strength of connection between neurons. So
we need to extend our cut-down version to include this. This is done through the introduction of
weights into the input summation process. The diagram below shows how the weights are used in the
first part of the computation; each input is multiplied by the synaptic weight of its incoming connection
to the neuron before the summation. In this example, if the weighted sum is equal to or greater than the
threshold, then the neuron will fire. This is the original McCulloch-Pitts model.
Note that some weights may be negative, which would reduce the total input to the neuron. These
synapses are called inhibitory since they inhibit the firing of the neuron. You’ll see this working in the
example of the NOT-gate in the next section.
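In Python, such a neuron is a one-liner. Here is a sketch, together with the or-gate from the earlier
figure (weights 1 and 1, threshold 1):

# A McCulloch-Pitts style neuron: multiply each input by its synaptic weight,
# sum the results, and fire if the sum reaches the threshold.
def fire(inputs, weights, threshold):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# The or-gate: weights 1, 1 and threshold 1.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", fire([a, b], [1, 1], threshold=1))   # 0, 1, 1, 1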
6. Neurons can effect the operation of Logic Gates
We saw in class that a simple neuron can replicate the functionality of an AND and an OR gate. Here
we shall extend this, demonstrating how neurons can produce more gates such as NOT and NAND.
There is a fundamental point here, since all elements of the CPU and memory can be constructed
through a combination of NAND (AND-followed-by-NOT) gates. So if we can show that neurons can
effect a NAND gate, then we have shown that neurons can effect the principal components of a PC. In
other words, we could build a biological PC.
First let’s have a look at the AND gate. The figure below shows the equivalent neuron. We can
calculate the weighted sums of the inputs and compare with the threshold for all four possible values of
input, e.g. if A=1 and B=1 and the weights are both 1 then we calculate (1x1) + (1x1) = 1 + 1 = 2. This
is equal to the threshold (shown inside the neuron) so the neuron will fire giving an output 1. Calculate
the other values for yourself.
A   B   (1 x A) + (1 x B)   out
0   0   0                   0
0   1   1                   0
1   0   1                   0
1   1   2                   1
Now for the NOT gate, whose equivalent neuron is sketched below. Here the weight is -1 and the
threshold is zero. So if we have an input 0, the weighted sum is -1 x 0 = 0. This is equal to the
threshold, so the neuron fires. Input 0, output 1; that's a not-gate. Now for input 1, the weighted sum is
-1 x 1 = -1, which is less than the threshold. So for 1 in we get 0 out. That's again a not-gate.
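And here is the promised NAND. A single neuron with two inhibitory weights and a negative threshold
does the job; weights of -1 and -1 with threshold -1 are one choice that works (the AND-followed-by-NOT
construction with two neurons works equally well).

# NAND with a single neuron: two inhibitory weights and a negative threshold.
def fire(inputs, weights, threshold):
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

def nand(a, b):
    return fire([a, b], [-1, -1], threshold=-1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", nand(a, b))   # prints the NAND truth table: 1, 1, 1, 0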
7. Short-term and Long-term memory.
Think of "short-term memory" (STM) as containing information you have seen, heard, felt or smelt
within the last few seconds. It may include some of the above diagrams or the information present in
the sentence you have just finished reading. Common wisdom says that we can hold seven (plus or
minus two) "chunks" of information in our mind at any one time. Perhaps that's why phone numbers
have 6 digits? The diagram below shows a theoretical model of STM from Stephen Grossberg. While it
shows three neurons, A, B and C, in reality the BNN would use pools of many hundreds of neurons.
OK, so how does this work? The key neurons are A and B, which are connected together in a loop where
A excites B and B excites A. So if A is firing then it fires B, which causes A to fire again, and so on. If
A is not firing then it does not excite B. So the A-B loop is storing "firing" or "not-firing". That's the
memory element. So how do we know if A-B are firing? Well, that's determined by the input to the
loop which comes in as excitation to A as shown. How do we get the memory out of the loop? That’s
the function of neuron C which fires if B is firing together with an “arousal” signal which comes from
another part of the brain. The output of C also connects back to B but with an inhibitory weight. So
when C fires, it stops B firing (if it was) which stops A firing, removing the memory trace. In other
words when the data is read from memory the memory cell is cleared.
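Here is a toy discrete-time version of the loop in Python; binary firing and one-step delays are
simplifying assumptions of my own. Because of the one-step delay, A and B take turns firing, but the
loop as a whole stays active, and that activity is the stored bit.

# A excites B, B excites A, and C reads the loop out (and clears it) when an
# arousal signal arrives. All neurons are binary and update once per time step.
def stm_step(a, b, c, stimulus, arousal):
    new_a = 1 if (b == 1 or stimulus == 1) else 0   # A fires on input or on B's excitation
    new_b = 1 if (a == 1 and c == 0) else 0         # B fires on A, unless C inhibits it
    new_c = 1 if (b == 1 and arousal == 1) else 0   # C reads out B when aroused
    return new_a, new_b, new_c

a = b = c = 0
for t in range(9):
    stimulus = 1 if t == 0 else 0          # one brief input at t = 0
    arousal = 1 if t in (5, 6) else 0      # read-out request at t = 5..6
    a, b, c = stm_step(a, b, c, stimulus, arousal)
    print(t, a, b, c)                      # C fires at t = 6; the loop is silent by t = 7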
Long-term memory is easier to understand; it involves a change in synaptic strength. The synapse
between neurons A and B is strengthened whenever A and B are firing simultaneously. This "fire
together, wire together" principle proposed by Donald Hebb has become a paradigm in the
neurosciences. The figure below is intended to illustrate this:
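Hebb's rule is simple enough to state in one line of Python: the change in a synaptic weight is
proportional to the product of the two neurons' activities. The learning rate below is an arbitrary choice.

# Hebb's rule: neurons that fire together, wire together.
def hebb_update(weight, a_activity, b_activity, learning_rate=0.1):
    return weight + learning_rate * a_activity * b_activity

w = 0.0
for _ in range(5):                # A and B repeatedly fire together...
    w = hebb_update(w, 1.0, 1.0)
print(w)                          # ...and the synapse strengthens: ~0.5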
8. Learning
[Supervised, Unsupervised]
9. Central Pattern Generators
10. Connectionism versus Computationalism.
The computationalist approach rests on the use of symbols and the connections between these symbols,
as explained above. This involves rules which relate these symbols, expressed in a programming
language. Each symbol exists as a data object and can be processed by an imperative or declarative
paradigm. The symbol structures are defined by the programmer and therefore exist as explicit data
structures (e.g. statements) within the computer program, where they can be directly observed.
The connectionist approach is different. A typical neural network may involve three layers: the "input"
(I), the "output" (O) and the "hidden" (H) layers. It can be trained by asserting values at the input and
output layers, and this will work as seen in class. For example we could train a three-layer network to
simultaneously compute the AND and the NAND of the input values. But what exists in the H-layer?
That's what we shall try to understand here.
Let's have a look at the "training set". We apply the input patterns to neurons 0 and 1, and the output
patterns to neurons 6 and 7, like this.
inputs               outputs
neuron 0  neuron 1   neuron 6  neuron 7
0         0          0         1
0         1          0         1
1         0          0         1
1         1          1         0
So we are training neuron 6 to compute the AND of the inputs and neuron 7 the NAND of the
inputs. (The training set is available for my NNET1 Simulator: "Training2", to be used with "Netspace
= Net2".) Now let's look at the values of the hidden layer and see if we can find any structure there. I
ran my simulator for 8230 steps in the learning mode (time to cook some Quorn burgers), and then
entered the single-step run mode; here are the neurons' values.
Neuron:   0     1     2     3     4     5     6     7
Values:   0     0     0.01  0.86  0.03  0.45  0     1
          0     1     0.17  0.63  0.25  0.42  0     1
          1     0     0.17  0.60  0.24  0.43  0     1
          1     1     0.81  0.30  0.77  0.41  1     0

(Each row gives every neuron's activation for one of the four input patterns.)
We're interested in the hidden-layer values (neurons 2 to 5). At first these appear to be a load of different
numbers, but no. Behold the second and third rows (corresponding to the inputs 0,1 and 1,0). The
activation values of the hidden neurons are almost identical. It is as though the network has learned that
the input patterns 0,1 and 1,0 are actually the same (in the context of the training set). But we can
interpret this in a different way. We could suggest that these hidden-layer values represent an encoded
version of the input patterns. And if we acknowledge that these patterns are symbols, then we can argue
that a neural network's hidden layer provides an encoded representation of symbols. Therefore, neural
networks do support symbolic representations.
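If you don't have my NNET1 simulator to hand, here is a minimal, generic backpropagation sketch in
Python (with numpy) of the same 2-4-2 experiment. This is not NNET1: the sigmoid units, learning
rate and random seed are my own choices, so the particular hidden values will differ from the table
above, but the same kind of structure should emerge.

import numpy as np

# Train a 2-4-2 network on the AND/NAND set, then inspect the hidden layer.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs (neurons 0, 1)
Y = np.array([[0, 1], [0, 1], [0, 1], [1, 0]], dtype=float)   # targets: AND, NAND

W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)   # input  -> hidden weights
W2 = rng.normal(0, 1, (4, 2)); b2 = np.zeros(2)   # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

eta = 1.0
for _ in range(8230):                 # the same number of steps as in the text
    H = sigmoid(X @ W1 + b1)          # hidden activations (neurons 2..5)
    O = sigmoid(H @ W2 + b2)          # output activations (neurons 6, 7)
    dO = (O - Y) * O * (1 - O)        # backpropagate the squared error
    dH = (dO @ W2.T) * H * (1 - H)
    W2 -= eta * (H.T @ dO); b2 -= eta * dO.sum(axis=0)
    W1 -= eta * (X.T @ dH); b1 -= eta * dH.sum(axis=0)

np.set_printoptions(precision=2, suppress=True)
print(sigmoid(X @ W1 + b1))                     # the hidden layer's encoded "symbols"
print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2))  # should approximate AND and NAND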
So the processing from the input to the hidden layer can be viewed as an encoding of the input patterns
into a representation which is distributed across the four hidden nodes. So what about
the processing from the hidden nodes to the output nodes? Well, we can view this as a decoding of the
distributed symbolic information into the output values. But the encoding and decoding are not
identical, since the input and output patterns are not the same. So we conclude that the ANN has
learned the rules for encoding and the rules for decoding. Viewed in this light, we see that the
functioning of ANNs can be seen as equivalent to an imperative program comprising a set of
encoding rules and a set of decoding rules. We could of course write an equivalent imperative program.
And that's my point: ANNs are no less powerful than computationalist imperative silly-symbol
programs. But there's more to come; I suggest ANNs are better, more powerful, as well as being rather
cute, and here's why.
Imagine we coded an imperative or declarative computationalist system where we had a list of Prolog
rules or Java if-then-else statements to simulate the above example. We know these rules or statements
are stored in computer RAM. Now let's say that the RAM chip where a rule or statement was stored
failed. What would happen to our imperative or declarative program? Of course it would not run, since
one of the rules or statements would be lost. Without that rule, because the system needs all of the
programmed rules or statements, the system could not work, and as expected the system would crash.
Now let's suppose that one of our neurons failed. Would the same be true? Well, the answer is no: since the
rules are distributed across the hidden layer, removal of a single neuron would not kill the network.
Well, in the case of the above example it might, since there are only four hidden neurons, but in a real BNN or
ANN there are likely to be thousands of neurons in the hidden layer, so my hypothesis is safe and
sound.
But let’s think this out a little further. Our brains work using neural networks. That’s a fact. But, why?
Whoever engineered our brains could have used a computationalist approach, i.e., provided us with a
register-based Pentium architecture (implemented using biological cells – probably another sort of
neural architecture) with an operating system and a load of human applications. This would never
work. Why? Because we are mortal and as we live, we die; our cells including neurons are continually
dying as each day rolls by. If our brains were symbolic, the first neuron to die would have wiped out a
load of cognition. But since our brains use neural networks which work on distributed encoding of
symbols, when a single neuron dies, then the encoding-decoding process may become a little less
efficient (we lose memory when we age) but we never stop thinking!
11. Representation and The "Embodied Mind"
The computationalist and connexionist mind-models presented above (according to my broad
interpretation) share one common aspect, illustrated in the figure below. Here some real-world objects
are perceived by the human eye, and a representation of these objects is stored in the brain. The mind
then operates on these symbols ("the pyramid is larger than the cube", "is it really a cube?", etc.).
Of course, how this processing proceeds differs in the two mind-models. In the connexionist model, the
processing is implicit and functional. In the computationalist model, the processing is explicit,
operating on a set of rules or statements encoding the symbols.
However, the latest mind-model dumps the requirement for symbols altogether. Why operate on a
symbolic representation of perceived objects, when the objects are actually there to grasp? This is
called the "Embodied Mind" model, which states that the mind and body cannot be taken as two
disjoint entities, and that thought and consciousness emerge through the interaction of the mind and
body with the real world. So we have a situation like this:
The main feature of embodied cognition is the perception-action loop, where internal and external
processes of perception (the senses) and action are intertwined. The claim is that embodiment gains its
cognitive power through real-world interactions and not by operating on some internal world model
(representation). This model was developed by Varela, Thompson and Rosch in the early 1990s and has
been applied to vision by Dana Ballard and to robotics by Rod Brooks.
Here's one problem I've been struggling with personally, in the context of "to represent or not to
represent". Imagine an artist painting a bowl of fruit. In the real world, he has the bowl of fruit and the
canvas on which he is painting a representation of the bowl of fruit. The two are of course distinct. But
what does the artist have in his mind? If we choose "to represent", then he has the following: (i) a
representation of the bowl of fruit generated by perception and processing, (ii) a representation of the
bowl of fruit he is painting, and (iii) a representation of the "idea" he has of what the painting should look
like (Impressionist, Cubist, etc.). So, counting the canvas itself, we have four representations and one
reality. The reality is static, but the representations are dynamic, and all may change as he paints. So we
have a dynamic system which, of course, is modeled through a set of 4 ODEs. Interesting, eh?
12. Conclusions
Let’s try to wrap this up. We have noted the following points. Computationalism works by the
processing of symbols which represent real-world entities and their relationships. Knowledge is stored
as facts or rules and processed in a language like LISP. That’s why it’s called “Computationalism”
since this model of the mind is akin to a computer program. The central underlying concept is that
symbols are representations. Connexionism uses no explicitly programmed symbolic representations of
real-world objects, but is based upon the known facts about biological neurons. Neural networks
(according to me) store symbols in a distributed form, spread across many neurons, all mixed up, but
this mixture can be decoded. Neural networks can generalize, whereas symbolic systems cannot. They
are also robust to degradation from the removal (death) of processing elements.
Connexionism versus Computationalism:

Structure
  Connexionism: Network of elements ("neurons") and interconnections with
  strengths ("synapses"). Large number of low-speed elements (neurons). No
  imperative or declarative program.
  Computationalism: Imperative program (if-then-else statements) or rules
  (e.g. Prolog). Runs on a small number of high-speed elements (CPUs).

Processing
  Connexionism: Local calculation of weighted inputs to each neuron, and
  local calculation of output. Dynamical system.
  Computationalism: Sequential processing of statements (imperative), or
  declarative processing of rules.

Representation
  Connexionism: Distributed representation of encoded patterns or symbols,
  and encoding-decoding rules.
  Computationalism: Explicit representation of rules and symbols as lines
  of computer code.

Generalization
  Connexionism: Non-exact inputs produce correct outputs.
  Computationalism: Non-exact inputs cannot be easily handled.

Degradation
  Connexionism: Robust to deletion of a processing element (neuron).
  Computationalism: Will crash if a processing element (a line of code
  compiled into memory) fails.

Learning
  Connexionism: Supervised learning from expert examples, or unsupervised
  learning.
  Computationalism: Needs an expert to convert the expertise into
  programmed statements or rules.
13. Appendix
Here's an interesting table I found – it's the "Data Sheet" of Human Cortical Tissue. The number of
neurons, 100,000 million, is about the same as the estimated number of stars in our galaxy. Each
neuron runs at a speed of 100 operations per second. Now, human reaction time (from stimulus to
action) is roughly 0.5 second. This means that there are about 50 sequential steps in neural processing.
What imperative program could compute anything worthwhile in that many steps? Yet the total number
of operations per second is 10,000 trillion, so where does this large number come from? Simply from
the fact that there are so many neurons (and synapses) operating in parallel; the brain is a huge parallel
distributed processor. Another interesting fact concerns the two "breakdown" field strengths (the
electric field which must be applied across an insulator for it to fail): the value for the neuron membrane
is identical to that of the silicon-dioxide insulators used on computer chips! Finally, you can calculate
that the total power dissipated by the brain is around 0.25 Watts. So the combined activity of ten brains
is comparable with an energy-saving light bulb. This gives a new meaning to the "light-bulb" symbol
used to represent having a brain-wave.
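Here is that back-of-envelope arithmetic spelled out in Python. Counting one operation per synapse per
neuron update is my own assumption, made to reconcile the figures quoted above.

neurons = 100e9               # 100,000 million neurons
synapses_per_neuron = 1000    # the average quoted in section 2
ops_per_second = 100          # operations per second, per neuron
reaction_time = 0.5           # seconds from stimulus to action

print(ops_per_second * reaction_time)                  # 50 sequential steps
print(neurons * synapses_per_neuron * ops_per_second)  # 1e16 = 10,000 trillion ops/s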