Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Financial Informatics –XIII: Neural Computing Systems Khurshid Ahmad, Professor of Computer Science, Department of Computer Science Trinity College, Dublin-2, IRELAND November 19th, 2008. https://www.cs.tcd.ie/Khurshid.Ahmad/Teaching.html 1 1 Neural Networks: Real Networks London, Michael and Michael Häusser (2005). Dendritic Computation. Annual Review of Neuroscience. Vol. 28, pp 503–32 2 Real Neural Networks: Cells and Processes A neuron is a cell with appendages; every cell has a nucleus and the one set of appendages brings in inputs – the dendrites – and another set helps to output signals generated by the cell NUCLEUS DENDRITES CELL BODY AXON 3 Real Neural Networks: Cells and Processes The human brain is mainly composed of neurons: specialised cells that exist to transfer information rapidly from one part of an animal's body to another. This communication is achieved by the transmission (and reception) of electrical impulses (and chemicals) from neurons and other cells of the animal. Like other cells, neurons have a cell body that contains a nucleus enshrouded in a membrane which has double-layered ultrastructure with numerous pores. Dendrite Axon Terminals Soma Nucleus SOURCE: http://en.wikipedia.org/wiki/Neurons 4 Real Neural Networks All the neurons of an organism, together with their supporting cells, constitute a nervous system. The estimates vary, but it is often reported that there are as many as 100 billion neurons in a human brain. Neurobiologist and neuroethologists have argued that intelligence is roughly proportional to the number of neurons when different species of animals compared. Typically the nervous system includes a : Spinal Cord is the least differentiated component of the central nervous system and includes neuronal connections that provide for spinal reflexes. There are also pathways conveying sensory data to the brain and pathways conducting impulses, mainly of motor significance, from the brain to the spinal cord; and, Medulla Oblongata: The fibre tracts of the spinal cord are continued in the medulla, which also contains of clusters of nerve cells called nuclei. 5 Real Neural Networks Inputs to and outputs from an animal nervous system Cerebellum receives data from most sensory systems and the cerebral cortex, the cerebellum eventually influences motor neurons supplying the skeletal musculature. It produces muscle tonus in relation to equilibrium, locomotion and posture, as well as non-stereotyped movements based on individual experiences. 6 Real Neural Networks Processing of some information in the nervous system takes place in Diencephalon. This forms the central core of the cerebrum and has influence over a number of brain functions including complex mental processes, vision, and the synthesis of hormones reaching the blood stream. Diencephalon comprises thalamus, epithalmus, hypothalmus, and subthalmus. The retina is a derivative of diencephalon; the optic nerve and the visual system are therefore intimately related to this part of the brain. 7 Real Neural Networks Inputs to the nervous system are relayed thtough the Telencephalon (Cereberal Hemispheres) which includes the cerebral cortex, corpus striatum, and medullary center. Nine-tenths of the human cerebral cortex is neocortex, a possible result of evolution, and contains areas for all modalities of sensation (except smell), motor areas, and large expanses of association cortex in which presumably intellectual activity takes place. Corpus striatum, large mass of gray matter, deals with motor functions and the medullary center contains fibres to connect cortical areas of two hemispheres. 8 Real Neural Networks Inputs to the nervous system are relayed thtough the Telencephalon (Cereberal Hemispheres) which includes the cerebral cortex, corpus striatum, and medullary center. Nine-tenths of the human cerebral cortex is neocortex, a possible result of evolution, and contains areas for all modalities of sensation (except smell), motor areas, and large expanses of association cortex in which presumably intellectual activity takes place. Corpus striatum, large mass of gray matter, deals with motor functions and the medullary center contains fibres to connect cortical areas of two hemispheres. 9 Real Neural Networks: Cells and Processes Neurons have a variety of appendages, referred to as 'cytoplasmic processes known as neurites which end in close apposition to other cells. In higher animals, neurites are of two varieties: Axons are processes of generally of uniform diameter and conduct impulses away from the cell body; dendrites are short-branched processes and are used to conduct impulses towards the cell body. The ends of the neurites, i.e. axons and dendrites are called synaptic terminals, and the cell-to-cell contacts they make are known as synapses. Dendrite Axon Terminals Soma Nucleus SOURCE: http://en.wikipedia.org/wiki/Neurons 10 Real Neural Networks: Cells and Processes 1010 neurons with 104 connections and an average of 10 spikes per second = 1015 adds/sec. This is a lower bound on the equivalent computational power of the brain. – – Asynchronous firing rate, c. 200 per sec. 4 + 10 fan-in summation 4 10 fan-out 1 - 100 meters per sec. 11 Real Neural Networks: Cells and Processes Henry Markram (2006). The Blue Brain Project. Nature Reviews Neuroscience Vol. 7, 153-160 12 Neural Networks: Real and Artificial Observed Biological Processes (Data) Neural Networks & Neurosciences Biologically Plausible Mechanisms for Neural Processing & Learning (Biological Neural Network Models) Theory (Statistical Learning Theory & Information Theory) http://en.wikipedia.org/wiki/Neural_network#Neural_networks_and_neuroscience 13 Neural Networks & Neuro-economics The behaviour of economic actors: pattern recognition, risk-averse/prone activities; risk/reward Neural Networks & Neurosciences Biologically Plausible Mechanisms for Neural Processing & Learning (Biological Neural Network Models) Theory (Statistical Learning Theory & Information Theory) http://en.wikipedia.org/wiki/Neural_network#Neural_networks_and_neuroscience 14 Neural Networks Artificial Neural Networks The study of the behaviour of neurons, either as 'single' neurons or as cluster of neurons controlling aspects of perception, cognition or motor behaviour, in animal nervous systems is currently being used to build information systems that are capable of autonomous and intelligent behaviour. 15 Neural Networks Artificial Neural Networks, the uses of Application Artificial Neural Networks Classification: Given the knowledge of classes amongst objects, train a network to recognise the different classes using examples of idiosyncratic features of each of the classes Categorisation No class information is available. The network trains itself by assigning ‘similar’ objects to proximate neurons Pattern Auto- and hetero associations between input Association and generated output patterns Forecasting Training neurons with time serial data with one- or two step forecasting; Networks that learn to remove noise from a time serial data. 16 Neural Networks Artificial Neural Networks Artificial neural networks emulate threshold behaviour, simulate co-operative phenomenon by a network of 'simple' switches and are used in a variety of applications, like banking, currency trading, robotics, and experimental and animal psychology studies. These information systems, neural networks or neuro-computing systems as they are popularly known, can be simulated by solving first-order difference or differential equations. 17 Neural Networks Artificial Neural Networks The basic premise of the course, Neural Networks, is to introduce our students to an alternative paradigm of building information systems. 18 Neural Networks Artificial Neural Networks • Statisticians generally have good mathematical backgrounds with which to analyse decision-making algorithms theoretically. […] However, they often pay little or no attention to the applicability of their own theoretical results’ (Raudys 2001:xi). • Neural network researchers ‘advocate that one should not make assumptions concerning the multivariate densities assumed for pattern classes’ . Rather, they argue that ‘one should assume only the structure of decision making rules’ and hence there is the emphasis in the minimization of classification errors for instance. • In neural networks there are algorithms that have a theoretical justification and some have no theoretical elucidation’. •Given that there are strengths and weaknesses of both statistical and other soft computing algorithms (e.g. neural nets, fuzzy logic), one should integrate the two classifier design strategies (ibid) Raudys, Šarûunas. (2001). Statistical and Neural Classifiers: An integrated approach to design. London: Springer-Verlag 19 Neural Networks Artificial Neural Networks • Artificial Neural Networks are extensively used in dealing with problems of classification and pattern recognition. The complementary methods, albeit of a much older discipline, are based in statistical learning theory •Many problems in finance, for example, bankruptcy prediction, equity return forecasts, have been studied using methods developed in statistical learning theory: 20 Neural Networks Artificial Neural Networks Many problems in finance, for example, bankruptcy prediction, equity return forecasts, have been studied using methods developed in statistical learning theory: “The main goal of statistical learning theory is to provide a framework for studying the problem of inference, that is of gaining knowledge, making predictions, making decisions or constructing models from a set of data. This is studied in a statistical framework, that is there are assumptions of statistical nature about the underlying phenomena (in the way the data is generated).” (Bousquet, Boucheron, Lugosi) Statistical learning theory can be viewed as the study of algorithms that are designed to learn from observations, instructions, examples and so on. Olivier Bousquet, Stephane Boucheron, and Gabor Lugosi. Introduction to Statistical Learning Theory (http://www.econ.upf.edu/~lugosi/mlss_slt.pdf) 21 Neural Networks Artificial Neural Networks “Bankruptcy prediction models have used a variety of statistical methodologies, resulting in varying outcomes. These methodologies include linear discriminant analysis regression analysis, logit regression, and weighted average maximum likelihood estimation, and more recently by using neural networks.” The five ratios are net cash flow to total assets, total debt to total assets, exploration expenses to total reserves, current liabilities to total debt, and the trend in total reserves (over a three year period, in a ratio of change in reserves in computed as changes in Yrs 1 & 2 and changes in Yrs 2 &3). Zhang et al (1999) found that (a variant of) ‘neural network and Fischer Discriminant Analysis ‘achieve the best overall estimation’, although discriminant analysis gave superior results. Z. R. Yang, Marjorie B. Platt and Harlan D. Platt (1999)Probabilistic Neural Networks in Bankruptcy Prediction. Journal of Business Research, Vol 44, pp 67–74 22 Neural Networks Artificial Neural Networks A neural network can be described as a type of multiple regression in that it accepts inputs and processes them to predict some output. Like a multiple regression, it is a data modeling technique. Neural networks have been found particularly suitable in complex pattern recognition compared to statistical multiple discriminant analysis (MDA) since the networks are not subject to restrictive assumptions of MDA models Shaikh A. Hamid and Zahid Iqbal (2004). (1999) Using neural networks for forecasting volatility of S&P 500 Index futures prices. Journal of Business Research, Vol 57, pp 1116-1125. 23 Neural Networks Artificial Neural Networks Neural Networks for forecasting volatility of S&P 500 Index: A neural network was trained to learn to generate volatility forecasts for the S&P 500 Index over different time horizons. The results were compared with pricing option models used to compute implied volatility from S&P 500 Index futures options (the Barone-Adesi and Whaley American futures options pricing model) Forecasts from neural networks outperform implied volatility forecasts and are not found to be significantly different from realized volatility. Implied volatility forecasts are found to be significantly different from realized volatility in two of three forecast horizons. Shaikh A. Hamid and Zahid Iqbal (2004). (1999) Using neural networks for forecasting volatility of S&P 500 Index futures prices. Journal of Business Research, Vol 57, pp 1116-1125. 24 Neural Networks Artificial Neural Networks The ‘remarkable qualities’ of neural networks: the dynamics of a single layer perceptron progresses from the simplest algorithms to the most complex algorithms: each pattern class characterized by sample mean vector neuron behaves like E[uclidean] D[istance] C[lassifier] ; • Further Training neuron begins to evaluate correlations and variances of features neuron behaves like standard linear Fischer classifier • More training neuron minimizes number of incorrectly identified training patterns neuron behaves like a support vector classifier. • Initial Training Statisticians and engineers usually design decision-making algorithms from experimental data by progressing from simple algorithms to more complex ones. Raudys, Šarûunas. (2001). Statistical and Neural Classifiers: An integrated approach to design. London: Springer-Verlag 25 Neural Networks: Real and Artificial Observed Biological Processes (Data) Neural Networks & Neurosciences Biologically Plausible Mechanisms for Neural Processing & Learning (Biological Neural Network Models) Theory (Statistical Learning Theory & Information Theory) http://en.wikipedia.org/wiki/Neural_network#Neural_networks_and_neuroscience 26 Neural Networks & Neuro-economics The behaviour of economic actors: pattern recognition, risk-averse/prone activities; risk/reward Neural Networks & Neurosciences Biologically Plausible Mechanisms for Neural Processing & Learning (Biological Neural Network Models) Theory (Statistical Learning Theory & Information Theory) http://en.wikipedia.org/wiki/Neural_network#Neural_networks_and_neuroscience 27 Neural Networks: Neuro-economics Observed Biological Processes (Data) Investors systematically deviate from rationality when making financial decisions, yet the mechanisms responsible for these deviations have not been identified. Using event-related fMRI, we examined whether anticipatory neural activity would predict optimal and suboptimal choices in a financial decision-making task. [] Two types of deviations from the optimal investment strategy of a rational risk-neutral agent as risk-seeking mistakes and risk-aversion mistakes. Camelia M. Kuhnen and Brian Knutson (2005). “Neural Antecedents of Financial 28 Decisions.” Neuron, Vol. 47, pp 763–770 Neural Networks: Neuro-economics Observed Biological Processes (Data) Nucleus accumbens [NAcc] activation preceded risky choices as well as risk-seeking mistakes, while anterior insula activation preceded riskless choices as well as risk-aversion mistakes. These findings suggest that distinct neural circuits linked to anticipatory affect promote different types of financial choices and indicate that excessive activation of these circuits may lead to investing mistakes. Camelia M. Kuhnen and Brian Knutson (2005). “Neural Antecedents of Financial Decisions.” Neuron, Vol. 47, pp 763–770 29 Neural Networks: Observed Biological Processes (Data) Neuro-economics Association of Anticipatory Neural Activation with Subsequent Choice The left panel indicates a significant effect of anterior insula activation on the odds of making riskless (bond) choices and risk-aversion mistakes (RAM) after a stock choice (Stockt-1). The right panel indicates a significant effect of NAcc activation on the odds of making risk-aversion mistakes, risky choices, and risk-seeking mistakes (RSM) after a bond choice (Bondt-1). The odds ratio for a given choice is defined as the ratio of the probability of making that choice divided by the probability of not making that choice. Percent change in odds ratio results from a 0.1% increase in NAcc or anterior insula activation. Error bars indicate the standard errors of the estimated effect. *coefficient significant at p < 0.05. Camelia M. Kuhnen and Brian Knutson (2005). “Neural Antecedents of Financial Decisions.” Neuron, Vol. 47, pp 763–770 30 Neural Networks: Observed Biological Processes (Data) Neuro-economics Nucleus accumbens [NAcc] activation preceded risky choices as well as riskseeking mistakes, while anterior insula activation preceded riskless choices as well as risk-aversion mistakes. These findings suggest that distinct neural circuits linked to anticipatory affect promote different types of financial choices and indicate that excessive activation of these circuits may lead to investing mistakes. Camelia M. Kuhnen and Brian Knutson (2005). “Neural Antecedents of Financial Decisions.” Neuron, Vol. 47, pp 763–770 31 Neural Networks: Neuro-economics Observed Biological Processes (Data) To explain investing decisions, financial theorists invoke two opposing metrics: expected reward and risk. Recent advances in the spatial and temporal resolution of brain imaging techniques enable investigators to visualize changes in neural activation before financial decisions. Research using these methods indicates that although the ventral striatum plays a role in representation of expected reward, the insula may play a more prominent role in the representation of expected risk. Accumulating evidence also suggests that antecedent neural activation in these regions can be used to predict upcoming financial decisions. These findings have implications for predicting choices and for building a physiologically constrained theory of decision-making. Brian Knutson and Peter Bossaerts (2007). “Neural Antecedents of Financial Decisions.” The Journal of Neuroscience, (August 1, 2007), Vol. 27 (No. 31), pp 8174–8177 32 Neural Networks: Real and Artificial Organising Principles and Common Themes: Association between neurons and competition amongst the neurons Two examples: Categorisation and attentional modulation of conditioning Levine, Daniel S. (1991). Introduction to Neural and Cognitive Modeling. Hillsdale, NJ & London: Lawrence Erlbaum Associates, Publishers (See Chapter 1) 33 Neural Networks: Real and Artificial Organising Principles and Common Themes: Association between neurons and competition amongst the neurons A network for identifying handwritten letters of the alphabet. Feature nodes respond to the presence or absence of marks at particular locations. Category Nodes Weighted sum of inputs Feature Nodes Category nodes respond to the patterns of activation in the feature nodes. Levine, Daniel S. (1991). Introduction to Neural and Cognitive Modeling. Hillsdale, NJ & London: Lawrence Erlbaum Associates, Publishers (See Chapter 1) 34 Artificial Neural Networks Artificial Neural Networks (ANN) are computational systems, either hardware or software, which mimic animate neural systems comprising biological (real) neurons. An ANN is architecturally similar to a biological system in that the ANN also uses a number of simple, interconnected artificial neurons. 35 Artificial Neural Networks In a restricted sense artificial neurons are simple emulations of biological neurons: the artificial neuron can, in principle, receive its input from all other artificial neurons in the ANN; simple operations are performed on the input data; and, the recipient neuron can, in principle, pass its output onto all other neurons. Intelligent behaviour can be simulated through computation in massively parallel networks of simple processors that store all their long-term knowledge in the connection strengths. 36 Artificial Neural Networks According to Igor Aleksander, Neural Computing is the study of cellular networks that have a natural propensity for storing experiential knowledge. Neural Computing Systems bear a resemblance to the brain in the sense that knowledge is acquired through training rather than programming and is retained due to changes in node functions. Functionally, the knowledge takes the form of stable states or cycles of states in the operation of the net. A central property of such states is to recall these states or cycles in response to the presentation of cues. 37 Neural Networks: Real and Artificial Organising Principles and Common Themes: Association between neurons and competition amongst the neurons A network for identifying handwritten letters of the alphabet: e.g. 25 patterns for representing 32 letters. O 1 1 0 0 1 1 1 0 0 0 I 1 0 0 1 0 E 1 0 1 0 0 A 1 0 0 0 0 U The line patterns (vertical, horizontal, short. Long strokes) and circular patterns in Roman alphabet can be represented in a binary system. Levine, Daniel S. (1991). Introduction to Neural and Cognitive Modeling. Hillsdale, NJ & London: Lawrence Erlbaum Associates, Publishers (See Chapter 1) 38 Neural Networks: Real and Artificial Organising Principles and Common Themes: Association between neurons and competition amongst the neurons The transformation of linear and circular patterns into binary patterns requires a degree of pre-processing and judicious guesswork! U O I M A P P I E N G A 1 1 0 0 1 1 1 0 0 0 1 0 0 1 0 1 0 1 0 0 1 0 0 0 0 The line patterns (vertical, horizontal, short. Long strokes) and circular patterns in Roman alphabet can be represented in a binary system. Levine, Daniel S. (1991). Introduction to Neural and Cognitive Modeling. Hillsdale, NJ & London: Lawrence Erlbaum Associates, Publishers (See Chapter 1) 39 Neural Networks: Real and Artificial Organising Principles and Common Themes: Association between neurons and competition amongst the neurons A network for identifying handwritten letters of the alphabet. Association between - A Category Nodes Weighted sum of inputs 1 0 0 0 0 feature nodes and category nodes should be allowed to change over time (during training), more specifically as a result of repeated activation of the connection Levine, Daniel S. (1991). Introduction to Neural and Cognitive Modeling. Hillsdale, NJ & London: Lawrence Erlbaum Associates, Publishers (See Chapter 1) 40 Neural Networks: Real and Artificial Organising Principles and Common Themes: Association between neurons and competition amongst the neurons A network for identifying handwritten letters of the alphabet. Association between - A Category Nodes E Weighted sum of inputs 1 0 1 0 0 feature nodes and category nodes should be allowed to change over time (during training), more specifically as a result of repeated activation of the connection Levine, Daniel S. (1991). Introduction to Neural and Cognitive Modeling. Hillsdale, NJ & London: Lawrence Erlbaum Associates, Publishers (See Chapter 1) 41 Neural Networks: Real and Artificial Organising Principles and Common Themes: Association between neurons and competition amongst the neurons A network for identifying handwritten letters of the alphabet. Category Nodes Weighted sum of inputs Feature Nodes Competition amongst to win over an individual letter shape and then to inhibit other neurons to respond to that shape. Especially helpful when there is noise in the signal, e.g. sloppy writing Levine, Daniel S. (1991). Introduction to Neural and Cognitive Modeling. Hillsdale, NJ & London: Lawrence Erlbaum Associates, Publishers (See Chapter 1) 42 Neural Networks: Real and Artificial Organising Principles and Common Themes: Association between neurons and competition amongst the neurons A network for identifying handwritten letters of the alphabet: e.g. 25 patterns for representing 32 letters. θ 1 1 0 0 1 1 1 0 0 0 ι 1 0 0 1 0 ε 1 0 1 0 0 α 1 0 0 0 0 υ The line patterns (vertical, horizontal, short. Long strokes) and circular patterns in Greek alphabet can be represented in a binary system. Levine, Daniel S. (1991). Introduction to Neural and Cognitive Modeling. Hillsdale, NJ & London: Lawrence Erlbaum Associates, Publishers (See Chapter 1) 43 Neural Networks: Real and Artificial Organising Principles and Common Themes: Association between neurons and competition amongst the neurons A network for identifying handwritten letters of the alphabet. - α Category Nodes Weighted sum of inputs 1 0 0 0 0 Since the weights change due to repeated presentation our system will learn to ‘identify’ Greek letters of the alphabet Levine, Daniel S. (1991). Introduction to Neural and Cognitive Modeling. Hillsdale, NJ & London: Lawrence Erlbaum Associates, Publishers (See Chapter 1) 44 Neural Networks: Real and Artificial Organising Principles and Common Themes: Association between neurons and competition amongst the neurons A network for identifying handwritten letters of the alphabet. Association between - α Category Nodes ε Weighted sum of inputs 1 0 1 0 0 feature nodes and category nodes should be allowed to change over time (during training), more specifically as a result of repeated activation of the connection Levine, Daniel S. (1991). Introduction to Neural and Cognitive Modeling. Hillsdale, NJ & London: Lawrence Erlbaum Associates, Publishers (See Chapter 1) 45 Artificial Neural Networks and Learning Artificial Neural Networks 'learn' by adapting in accordance with a training regimen: The network is subjected to particular information environments on a particular schedule to achieve the desired endresult. There are three major types of training regimens or learning paradigms: SUPERVISED; UN-SUPERVISED; REINFORCEMENT or GRADED. 46 Biological and Artificial NN’s Entity Biological Neural Networks Artificial Neural Networks Processing Units Neurons Network Nodes Input Dendrites Network Arcs (Dendrites may form synapses onto other dendrites) (No interconnection between arcs) Axons or Processes Network Arcs (Axons may form synapses onto other axons) (No interconnection between arcs) Synaptic Contact Node to Node via Arcs Output Inter-linkage (Chemical and Electrical) Plastic Connections Weighted Connections Matrix 47 Biological and Artificial NN’s Entity Biological Neural Networks Artificial Neural Networks Processing Units Neurons Network Nodes Input Dendrites Network Arcs (Dendrites may form synapses onto other dendrites) (No interconnection between arcs) Axons or Processes Network Arcs (Axons may form synapses onto other axons) (No interconnection between arcs) Synaptic Contact Node to Node via Arcs Output Inter-linkage (Chemical and Electrical) Plastic Connections Weighted Connections Matrix 48 Biological and Artificial NN’s Entity Inter-linkage Biological Neural Networks Artificial Neural Networks Excitatory – Positive connection weights between nodes with asymmetrical membrane specialisation, thicker on the post-synaptic side and presynaptic side containing round vesicles Inter-linkage Inhibitory– with symmetrical membrane specialisation, with ellipsoidal vesicles Negative connection weights between nodes 49 Biological and Artificial NN’s Entity Output Biological Neural Networks Artificial Neural Networks Dendrites bring inputs from different locations: so does the brain wait for all the inputs and then start up the summing exercise or does it perform many different intermediate computations? All inputs arrive instantaneously and are summed up in the same computational cycle: distance (or location) between neuronal nodes is not an issue. 50 Biological and Artificial NN’s Entity Output Biological Neural Networks Artificial Neural Networks The threshold: the neurons, being in a noisy environment, tend not to abide by a fixed, discontinuous threshold and there is a degree of tolerance of the input. The threshold is usually a discontinuous (step) function and after the threshold is ‘crossed’ the amount of input is immaterial 51 Artificial Neural Networks An ANN system can be characterised by •its ability to learn; •its dynamic capability; and • its interconnectivity 52 Artificial Neural Networks: An Operational View Neuron xk x1 Summing Junction wk2 x3 wk3 x4 wk4 Activation Function S yk Output Signal Input Signals x2 wk1 bk Net input or weighted sum : net w1 * x1 w2 * x2 w3 * x3 w4 * x4 Neuronal output identity function y1 net non negative identity function y1 0 if net THRESHOLD ( ) y1 net if net THRESHOLD ( ) 53 Artificial Neural Networks: An Operational View A neuron is an information processing unit forming the key ingredient of a neural network: The diagram above is a model of a biological neuron. There are three key ingredients of this neuron labelled xk which is connected to the (rest of the) neurons in the network labelled x1, x2, x3,…xj. A set of links, the biological equivalent of synapses, which the kth neuron has with the (rest of the) neurons in the network. Note that each link has a WEIGHT denoted by labelled wk1, wk2,…wkj, where the first subscript (k in this case) denotes the recipient neurons and the second subscript (1,2,3…..j) denotes the neurons transmitting to the recipient neurons. The synaptic weight wkj may lie in a range that includes negative (inhibitory) values and positive (excitatory) values. (From Haykin 1999:10-12) 54 Artificial Neural Networks: An Operational View The kth neuron adds up the inputs of all the transmitting neurons at the summing junction or the adder, denoted by S. The adder acts as a linear combiner and generates a weighted average usually denoted by uk: uk = wk1i*x1 + wk2*x2 + wk3*x3 + ………. + wkj*xj; the bias (bk )has the effect of increasing or decreasing the net input to the activation function depending on the value of the bias. (From Haykin 1999:10-12) (From Haykin 1999:10-12) 55 ANN’s: an Operational View Finally, the linear combination, denoted as vk = uk + bk, is passed through the activation function which engenders the non-linear behaviour seen in the behaviour of the biological neurons: the inputs to and outputs from a given neuron show a complex, often non-linear behaviour. For example, if the output from the adder was positive or zero then the neuron will emit a signal, yk = 1 if (vk)0 , however if the output from the adder was negative then there will be no output, yk = 0 if (vk)< 0 . There are other models of the activiation function as we will see later. (From Haykin 1999:10-12) 56 ANN’s: an Operational View Neuron xk x1 wk2 x3 wk3 x4 wk4 Summing Junction Activation Function S yk Output Signal Input Signals x2 wk1 bk Net input or weighted sum : net w1 * x1 w2 * x2 w3 * x3 w4 * x4 Neuronal output cons tan t y1 net y1 0 if net THRESHOLD ( ) y1 C if net THRESHOLD ( ) 57 ANN’s: an Operational View Neuron xk x1 wk2 x3 wk3 x4 wk4 Summing Junction Activation Function S yk Output Signal Input Signals x2 wk1 bk Net input or weighted sum : net w1 * x1 w2 * x2 w3 * x3 w4 * x4 Neuronal output Saturated output y1 0 if net THRESHOLD ( ) y1 1 if net THRESHOLD ( ) 58 ANN’s: an Operational View Discontinuous Output Neuron xk x1 wk2 x3 wk3 x4 wk4 S Threshold (θ) y1 0 if net THRESHOLD ( ) y1 1 if net THRESHOLD ( ) Output f(net) Saturated output yk bk Net input or weighted sum : net w1 * x1 w2 * x2 w3 * x3 w4 * x4 Neuronal output Activation Function Output Signal Input Signals x2 Summing Junction wk1 No output (Normalised) output (eg. 1) net 59 ANN’s: an Operational View Neuron xk x1 wk2 x3 wk3 x4 wk4 Summing Junction S Activation Function yk Output Signal Input Signals x2 wk1 bk The notion of a discontinuous function simulates the fundamental notion that biological neurons usually fire if there is ‘enough’ stimulus available in the environment. But discontinuous is biologically implausible, so there must be some degree of continuity in the output such that an artificial neuron has a degree of biological plausibility. 60 ANN’s: an Operational View Pseudo-Continuous Output Neuron xk x1 wk2 x3 wk3 x4 wk4 Activation Function S yk bk Saturation Threshold (θ’) Net input or weighted sum : f(net) net w1 * x1 w2 * x2 w3 * x3 w4 * x4 Neuronal output Saturated output Output Signal Input Signals x2 Summing Junction wk1 Output β Threshold (θ) Output=α y1 if net THRESHOLD ( ) y1 if net Saturation THRESHOLD ( ' ) y1 f ( , ' , , ) net 61 ANN’s: an Operational View Neuron xk x1 x3 x4 wk2 Summing Junction Activation Function S yk wk3 wk4 bk A schematic for an 'electronic' neuron Output Signal Input Signals x2 wk1 62 ANN’s: an Operational View Neural Nets as directed graphs A directed graph is a geometrical object consisting of a set of points (called nodes) along with a set of directed line segments (called links) between them. A neural network is a parallel distributed information processing structure in the form of a directed graph. 63 ANN’s: an Operational View Input Connections Processing Unit Output Connection Fan Out 64 ANN’s: an Operational View A neural network comprises A set of processing units A state of activation An output function for each unit A pattern of connectivity among units A propagation rule for propagating patterns of activities through the network An activation rule for combining the inputs impinging on a unit with the current state of that unit to produce a new level of activation for the unit A learning rule whereby patterns of connectivity are modified by experience An environment within which the system must operate 65 The McCulloch-Pitts Network . McCulloch and Pitts demonstrated that any logical function can be duplicated by some network of all-ornone neurons referred to as an artificial neural network (ANN). Thus, an artificial neuron can be embedded into a network in such a manner as to fire selectively in response to any given spatial temporal array of firings of other neurons in the ANN. Artificial Neural Networks for Real Neuroscientists: Khurshid Ahmad, Trinity College, 28 Nov 2006 66 The McCulloch-Pitts Network Demonstrates that any logical function can be implemented by some network of neurons. •There are rules governing the excitatory and inhibitory pathways. •All computations are carried out in discrete time intervals. •Each neuron obeys a simple form of a linear threshold law: Neuron fires whenever at least a given (threshold) number of excitatory pathways, and no inhibitory pathways, impinging on it are active from the previous time period. •If a neuron receives a single inhibitory signal from an active neuron, it does not fire. •The connections do not change as a function of experience. Thus the network deals with performance but not learning. 67 The McCulloch-Pitts Network Computations in a McCulloch-Pitts Network ‘Each cell is a finite-state machine and accordingly operates in discrete time instants, which are assumed synchronous among all cells. At each moment, a cell is either firing or quiet, the two possible states of the cell’ – firing state produces a pulse and quiet state has no pulse. (Bose and Liang 1996:21) ‘Each neural network built from McCulloch-Pitts cells is a finite-state machine is equivalent to and can be simulated by some neural network.’ (ibid 1996:23) ‘The importance of the McCulloch-Pitts model is its applicability in the construction of sequential machines to perform logical operations of any degree of complexity. The model focused on logical and macroscopic cognitive operations, not detailed physiological modelling of the electrical activity of the nervous system. In fact, this deterministic model with its discretization of time and summation rules does not reveal the manner in which biological neurons integrate their inputs.’ (ibid 1996:25) 68 The McCulloch-Pitts Network Consider a McCulloch-Pitts network which can act as a minimal model of the sensation of heat from holding a cold object to the skin and then removing it or leaving it on permanently. Each cell has a threshold of TWO, hence fires whenever it receives two excitatory (+) and no inhibitory (-) signals from other cells at a previous time. Artificial Neural Networks for Real Neuroscientists: Khurshid Ahmad, Trinity College, 28 Nov 2006 69 The McCulloch-Pitts Network Heat Sensing Network 1 + + Hot 3 + Heat + B Receptors Cold + + A + 2 + + + 4 Cold 70 The McCulloch-Pitts Network Heat Sensing Network Truth tables of the firing neurons when the cold object contacts the skin and is then removed 1 + + Hot 3 Heat Receptors Cold Cell 1 Cell 2 Cell a Cell b Cell 3 Cell 4 INPUT INPUT HIDDEN HIDDEN OUTPUT OUTPUT 1 No Yes No No No No 2 No No Yes No No No 3 No No No Yes No No 4 No No No No Yes No + + B + + A + + 2 Time + + 4 Cold 71 The McCulloch-Pitts Network Heat Sensing Network ‘Feel hot’/’Feel cold’ neurons show how to create OUTPUT UNIT RESPONSE to given INPUTS that depend ONLY on the previous values. This is known as a TEMPORAL CONTRAST ENHANCEMENT. The absence or presence of a stimulus in the PREVIOUS time cycle plays a major role here. The McCulloch-Pitts Network demonstrates how this ENHANCEMENT can be simulated using an ALL-OR-NONE Network. 72 The McCulloch-Pitts Network Heat Sensing Network Truth tables of the firing neurons for the case when the cold object is left in contact with the skin – a simulation of temporal contrast enhancement Time Cell 1 Cell 2 Cell a Cell b Cell 3 Cell 4 INPUT INPUT HIDDEN HIDDEN OUTPUT OUTPUT 1 2 3 73 The McCulloch-Pitts Network Heat Sensing Network Truth tables of the firing neurons for the case when the cold object is left in contact with the skin – a simulation of temporal contrast enhancement 1 + + Hot Time 3 Heat Receptors Cold Cell 2 Cell a Cell b Cell 3 Cell 4 INPUT INPUT HIDDEN HIDDEN OUTPUT OUTPUT 1 No Yes No No No No 2 No Yes Yes No No No 3 No Yes Yes No No Yes + + B + + A + + 2 Cell 1 + + 4 Cold 74 The McCulloch-Pitts Network Memory Models + + A + + + 1 2 + ++ + + B Three stimulus model 1 + 2 + Permanent Memory model 75 The McCulloch-Pitts Network Memory Models In the permanent memory model, the output neuron has threshold ‘1’; neuron 2 fires if the light has ever been on anytime in the past. Levine, D. S. (1991:16) + 1 + 2 Permanent Memory model 76 The McCulloch-Pitts Network Memory Models Consider, the three stimulus all-or-none neural network. In this network, neuron 1 responds to a light being on. Each of the neurons has threshold ‘3’. In the three stimulus model neuron 2 fires after the light has been on three time units in a row. Time A 1 2 Cell 1 Cell A Cell B Cell 2 1 Yes No No No 2 Yes No Yes No 3 Yes Yes Yes No 4 No Yes Yes B Three stimulus model All connections are unit positive Yes 77 The McCulloch-Pitts Network Why is a McCulloch-Pitts a FSM? A finite state machine (FSM)is an AUTOMATON.An input string is read from left to right; the machine looks at each symbol in turn. At any time the FSM is in one of many finitely interval states. The state changes after each input symbol is read. The NEW STATE depends (only) on the symbol just read and on the current state. 78 The McCulloch-Pitts Network ‘The McCulloch-Pitts model, though it uses an oversimplified formulation of neural activity patterns, presages some issues that are still important in current cognitive models. [..][Some] Modern connectionist networks contain three types of units or nodes – input units, output units, and hidden units. The input units react to particular data features from the environment […]. The output units generate particular organismic responses […]. The hidden units are neither input nor output units themselves but, via network connections, influence output units to respond to prescribed patterns of input unit firings or activities. [..] [This] input-output-hidden trilogy can be seen as analogous to the distinction between sensory neurons, motor neurons, and all other (interneurons) in the brain’ Levine, Daniel S. (1991: 14-15) Artificial Neural Networks for Real Neuroscientists: Khurshid Ahmad, Trinity College, 28 Nov 2006 79 The McCulloch-Pitts Network Linear Neuron: Output is the weighted sum of all the inputs; McCulloch-Pitts Neuron: Output is the thresholded value of the weighted sum Input Vector? X = X (1,-20,4,-2); Weight vector? wji=w(wj1,wj2,wj3,wj4) wj1 =[0.8,0.2,-1,-0.9] x1 x2 wj2 S w yj j3 x3 j wj4 x4 0th input 80 The McCulloch-Pitts Network vj=Swjixi; y=f(v); y=0 if v<=0 or y=1 if v>0 Input Vector? X = X (1,-20,4,-2); Weight vector? wji=w(wj1,wj2,wj3,wj4) =[0.8,0.2,-1,-0.9] wj0=0, x0=0 x1 x2 x3 x4 wj1 wj2 wj3 wj4 S j 0th input yj 81 The McCulloch-Pitts Network Input Vector? X = X (1,-20,4,-2); Weight vector? w=w(wj1,wj2,wj3,wj4) =[0.8,0.2,-1,-0.9] wj0=0, x0=0 vj=Swjixi; y=f(v); f activation function Linear Neuron: y=v McCulloch Pitts: y=0 if v<=0 or y=1 if v>0 Sigmoid activation function: f(v)=1 /(1+exp(-v)) 82 The McCulloch-Pitts Network What are the circumstance in a neuron with a sigmoidal activation function will act like a McCulloch Pitts network? Large synaptic weights What are the circumstance in a neuron with a sigmoidal activation function will act like a linear neuron? Small synaptic weights 83 Widrow-Hoff Networks: Error-correction or Performance Learning Widrow and Hoff showed that the weight update law, called variously Widrow/Hoff learning law, the LMS learning law and the delta rule, wnew wold (t ) e (t ) x (t ) η a positive constant, aka Rate of Learning, usually selected by trial and error through the heuristic that if η is too large, w will not converge., if η is too small, then w will take a long time to converge. Typically, 0.01≤ η ≤ 10, with η=0.1 as a usual starting point. 84 Widrow-Hoff Networks: Error-correction or Performance Learning Widrow and Hoff showed that the weight update law, called variously Widrow/Hoff learning law, the LMS learning law and the delta rule: For each training cycle, t, the least mean square learning law says that w(t 1) w(t ) e(t ) x (t ) w e(t ) x (t ) e(t ) d net ( x ) 85 Widrow-Hoff Networks: Error-correction or Performance Learning The net input is calculated by computing the weighted sum of all the input patterns. During each training cycle, the difference between the actual and desired output is to be minimized using the well-known least square minimization technique where attempt is made to minimize the error-energy, E, or the square of the difference of the errors: 1 2 2 E (e1 e2 .......... en2 ) n E w j ( ) 2 w j E w j ( ) (e1 x j1 e2 x j 2 ...en x jn ) 2 w j n 86 Widrow-Hoff Networks: Error-correction or Performance Learning Widrow and Hoff showed that the weight update law, called variously Widrow/Hoff learning law, the LMS learning law and the delta rule: For each training cycle, t, the least mean square learning law says that w(t 1) w(t ) e(t ) x (t ) w e(t ) x (t ) e(t ) d net ( x ) 87 Widrow-Hoff Networks: Error-correction or Performance Learning More precisely, let us consider n patterns that have to a Widrow-Hoff network has ‘learn’ to recognise correctly: for all the weights in the network, say j, Widrow and Hoff showed that w j (t ) (e1 (t ) x j1 (t ) e2 (t ) x j 2 (t ) ...en (t ) x jn (t )) n 88 Rosenblatt’s Perceptron Rosenblatt, Selfridge and others generalised McCulloch-Pitts form of the linear threshold law was generalised to laws such that activities of all pathways (cf. dendrites) impinging on a neuron are computed, and the neuron fires whenever some weighted sum of those activities is above a given amount. 89 Rosenblatt’s Perceptron In the early days of neural network modelling, considerable attention was paid to McCulloch and Pitts who essentially incorporated the behaviouristic learning approach, that of interrelating stimuli and responses as a mechanism for learning, due originally to Donald Hebb, for learning into a network of all-ornone neurons. This led a number of other workers to adapt this approach during the late 1940's. Prominent among these workers were Rosenblatt (1962): PERCEPTRONS Selfridge (1959): PANDEMONIUM The modellers called these networks, adaptive networks in that the network adapted to its environment quite autonomously. Rosenblatt developed a network architecture, and successfully implemented aspects of his architecture, which could make and learn choices between different patterns of sensory stimuli. 90 Rosenblatt’s Perceptron Rosenblatt's Perceptron has the following 'properties': (1) It can receive inputs from other neurons (2) The 'recipient' neuron can integrate the input (3) The connection weights are modelled as follows: If the presence of features xi stimulates the perceptron to fire then wi will be positive; If the presence of features xi inhibits the perceptron then wi will be negative. (4) The output function of the neuron is all-or-none (5) Learning is a process of modifying the weights Whatever a neuron can compute, it can learn to compute! 91 Rosenblatt’s Perceptron Rosenblatt's Perceptron is an early example of the so-called electronic neurons. The electronic neuron, a simulation of the biological neuron, had the following properties: (1)It can receive inputs from a number of sources (~dendrites inputting onto a neuron) e.g. other neurons (e.g. sensory input to an interneuron). Typically the inputs are vector-like - i.e. a magnitude and a sign; x = {x1, x2,....xn} (2) The 'recipient' electronic neuron can integrate the input - either by simply summing up the individual inputs or by weighing the individual inputs in proportion to their 'strength of connection' (wi) with the recipient (a biological neuron can filter, add, subtract and amplify the input) and then summing up the weighted input as g(x). Usually the summation function g(x) has an additional weight w0 - the threshold weight which incorporates the propensity of the electronic neuron to fire irrespective of the input (the depolarisation of the biological neuron's membrane induced by an external stimuli, results in the neuron responding with an action potential or impulse. The critical value of this depolarisation is called the threshold value). 92 Rosenblatt’s Perceptron (3) The connection weights are modelled as follows: (3a) If the presence of some features xi tends the perceptron to fire then wi will be positive; (3b) If the presence of some features xi inhibits the perceptron then wi will be negative. (4) The output function of the electronic neuron (the impulse output along the axon terminals of biological neurons) is all-ornone output in that the output function: ouptut(x) = 1 if g(x) > 0 = 0 if g(x) < 0 (5) Learning, in electronic neurons, is a process of modifying the values of the weights (plasticity of synaptic connections) and the threshold. 93 Rosenblatt’s Perceptron Rosenblatt was serious about using his perceptrons to build a computer system. Rosenblatt demonstrated that his perceptrons can LEARN to build four logic gates. A combination of these gates, in turn, comprise the central processing unit of a computer (and othe parts): Ergo, perceptrons can learn to build themselves into computer systems!! Rosenblatt became very famous for suggesting that one can design a computer based on neuro-scientific evidence 94 Rosenblatt’s Perceptron The XOR ‘problem’ The simple perceptron cannot learn a linear decision surface to separate the different outputs, because no such decision surface exists. Such a non-linear relationship between inputs and outputs as that of an XOR-gate are used to simulate vision systems that can tell whether a line drawing is connected or not, and in separating figure from ground in a picture. Rosenblatt’s Perceptron Rosenblatt was serious about using his perceptrons to build a computer system. Rosenblatt demonstrated that his perceptrons can LEARN to build four logic gates. A combination of these gates, in turn, comprise the central processing unit of a computer: Ergo, perceptrons can learn to build themselves into computer systems!! The logic gates can be traced back to Albert Boole (1815-1864), Professor of Mathematics at Queens College, Cork (now University of Cork). Boole has developed an algebra for analysing logic and published the algebra in his famous book: ‘An investigation into the Laws of Thought, on Which are founded the Mathematical Theories of Logic and Probabilities’. 96 Rosenblatt’s Perceptron The logic gates can be traced back to Albert Boole (1815-1864), Professor of Mathematics at Queens College, Cork (now University of Cork). Boole has developed an algebra for analysing logic and published the algebra in his famous book: ‘An investigation into the Laws of Thought, on Which are founded the Mathematical Theories of Logic and Probabilities’. Boole’s algebra of logic forms the basis of (computer) hardware design and includes the various processing units within. Boolean logic is used to specify how key operations on a computer system, like addition, subtraction, comparison of two values and so on, are to be excuted. Boolean logic is an integral part of hardware design and hardware circuits are usually referred to as logic circuits. 97 Rosenblatt’s Perceptron Logic Gate: A digital circuit that implements an elementary logical operation. It has one or more inputs but ONLY one output. The conditions applied to the input(s) determine the voltage levels at the output. The output, typically, has two values ‘0’ or ‘1’. Digital Circuit: A circuit that responds to discrete values of input (voltage) and produces discrete values of output (voltage). Binary Logic Circuits: Extensively used in computers to carry out instructions and arithmetical processes. Any logical procedure maybe effected by a suitable combinations of the gates. Binary circuits are typically formed from discrete components like the integrated circuits. 98 Rosenblatt’s Perceptron Logic Circuits: Designed to perform a particular logical function based on AND, OR (either), and NOR (neither). Those circuits that operate between two discrete (input) voltage levels, high & low, are described as binary logic circuits. Logic element: Small part of a logic circuit, typically, a logic gate, that may be represented by the mathematical operators in symbolic logic. 99 Rosenblatt’s Perceptron Gate Input(s) Output AND Two (or more) High if and only if both (or all) inputs are high. NOT One High if input low and vice versa OR Two (or more) High if any one (or more) inputs are high 100 Rosenblatt’s Perceptron The operation of an AND gate Input 1 Input 2 Output 0 0 0 0 1 0 1 0 0 1 1 1 AND (x,y)= minimum_value(x,y); AND (1,0)=minimum_value(1,0)=0; AND (1,1)=minimum_value(1,1)=1 101 Rosenblatt’s Perceptron A single layer perceptron can carry out a number can perform a number of logical operations which are performed by a number of computational devices. A hard-wired perceptron below performs the AND operation. This is hardwired because the weights are predetermined and not learnt x 1 w=+1 1 w=+1 2 x Sw1x1+w2x2+ y=1 if S y=0 ifS 2 = 1.5 102 Rosenblatt’s Perceptron A single layer perceptron can carry out a number can perform a number of logical operations which are performed by a number of computational devices. A learning perceptron below performs the AND operation. An algorithm: Train the network for a number of epochs (1) Set initial weights w1 and w2 and the threshold θ to set of random numbers; (2) Compute the weighted sum: x1*w1+x2*w2+ θ (3) Calculate the output using a delta function y(i)= delta(x1*w1+x2*w2+ θ ); delta(x)=1, if x is greater than zero, delta(x)=0,if x is less than equal to zero (4) compute the difference between the actual output and desired output: e(i)= y(i)-ydesired (5) If the errors during a training epoch are all zero then stop otherwise update wj(i+1)=wj(i)+ *xj*e(i) , j=1,2 103 Rosenblatt’s Perceptron A single layer perceptron can carry out a number can perform a number of logical operations which are performed by a number of computational devices: =0.1 Θ=0.2 Epoch X1 X2 Y Weights W2 Actual Output 1 0 0 0 0.3 -0.1 0 0 0.3 -0.1 0 1 0 0.3 -0.1 0 0 0.3 -0.1 1 0 0 0.3 -0.1 1 -1 0.2 -0.1 1 1 1 0.2 -0.1 0 desired Initial W1 Error Final W1 1 0.3 Weights W2 0.0 104 Rosenblatt’s Perceptron A single layer perceptron can carry out a number can perform a number of logical operations which are performed by a number of computational devices. Epoch X1 X2 Ydesire d Initial W1 Weights W2 Actual Output Error Final W1 Weights W2 2 0 0 0 0.3 0.0 0 0 0.3 0.0 0 1 0 0.3 0.0 0 0 0.3 0.0 1 0 0 0.3 0.0 1 -1 0.2 0.0 1 1 1 0.2 0.0 1 0 0.2 0.0 105 Rosenblatt’s Perceptron A single layer perceptron can carry out a number can perform a number of logical operations which are performed by a number of computational devices. Epoch X1 X2 Ydesire d Initial W1 Weights W2 Actual Output Error Final W1 Weights W2 3 0 0 0 0.2 0.0 0 0 0.2 0.0 0 1 0 0.2 0.0 0 0 0.2 0.0 1 0 0 0.2 0.0 1 -1 0.1 0.0 1 1 1 0.0 1 1 0.2 0.1 0.1 106 Rosenblatt’s Perceptron A single layer perceptron can carry out a number can perform a number of logical operations which are performed by a number of computational devices. Epoch X1 X2 Ydesire d Initial W1 Weights W2 Actual Output Error Final W1 Weights W2 4 0 0 0 0.2 0.1 0 0 0.2 0.1 0 1 0 0.2 0.1 0 0 0.2 0.1 1 0 0 0.2 0.1 1 -1 0.1 0.1 1 1 1 0.1 1 0 0.1 0.1 0.1 107 Rosenblatt’s Perceptron A single layer perceptron can carry out a number can perform a number of logical operations which are performed by a number of computational devices. Epoch X1 X2 Ydesire d Initial W1 Weights W2 Actual Output Error Final W1 Weights W2 5 0 0 0 0.1 0.1 0 0 0.1 0.1 0 1 0 0.1 0.1 0 0 0.1 0.1 1 0 0 0.1 0.1 0 0 0.1 0.1 1 1 1 0.1 0.1 1 0 0.1 0.1 108 Preamble Neural Networks 'learn' by adapting in accordance with a training regimen: Five key algorithms. ERROR-CORRECTION OR PERFORMANCE LEARNING HEBBIAN OR COINCIDENCE LEARNING BOLTZMAN LEARNING (STOCHASTIC NET LEARNING) COMPETITIVE LEARNING FILTER LEARNING (GROSSBERG'S NETS) 109 Preamble Neural Networks 'learn' by adapting in accordance with a training regimen: Five key algorithms. California sought to have the license of one of the largest auditing firms (Ernst & Young) removed because of their role in the well-publicized collapse of Lincoln Savings & Loan Association. Further, regulators could use a bankruptcy 110 Rosenblatt’s Perceptron A single layer perceptron can carry out a number can perform a number of logical operations which are performed by a number of computational devices. However, the single layer perceptron cannot perform the exclusive-OR or XOR operation. The reason is that a single layer perceptron can only classify two classes, say C1 and C2, should be sufficiently separated from each other to ensure the decision surface consists of a hyperplane. Linearly separable classes C1 C2 Linearly non-separable classes C1 C2 111 Rosenblatt’s Perceptron An informal perceptron learning algorithm: •If the perceptron fires when it should not, make each wi smaller by an amount proportional to xi. •If the perceptron fails to fire when it should fire, make each wi larger by a similar amount. 112 Rosenblatt’s Perceptrons A neuron learns because it is adaptive: • SUPERVISED LEARNING: The connection strengths of a neuron are modifiable depending on the input signal received, its output value and a pre-determined or desired response. The desired response is sometimes called teacher response. The difference between the desired response and the actual output is called the error signal. • UNSUPERVISED LEARNING: In some cases the teacher’s response is not available and no error signal is available to guide the learning. When no teacher’s response is available the neuron, if properly configured, will modify its weight based only on the input and/or output. Zurada (1992:59-63) 113 Rosenblatt’s Perceptrons Rosenblatt’s perceptrons learn in presence of a teacher. The desired signal is denoted as di and the output as yi. The error signal is denoted as ei . The weights are modified in accordance with the perceptron learning rule; the weight change is denoted as w which is proportional to the error signal; c is a proportionality constant: r d i yi ; yi sgn( wit x); and wi c[d i sgn( wit x)] x wij c[d i sgn( wit x)] x j 114 Rosenblatt’s Perceptrons A fixed increment perceptron algorithm Given: A classification problem with n input features (x0,x1, x2, .....xn) and 2 output classes. Compute: A set of weights (w0, ,w1,.....wn) that will cause a perceptron to fire whenever the input falls into the first output class. An Algorithm Step Action 1. Create a perceptron with n+1 inputs and n+1 weights, where the extra input x0 is always set to 1. 2. Initialise the weights (w0, ,w1,.....wn) to random real values. 3. Iterate thorough the training set, collecting all examples misclassified by the current set of weights. 4. If all examples are classified correctly, output the weights and quit. 5. Otherwise, compute the vector sum S of the misclassified input vectors, where each vector has the form (x0,x1, x2, .....xn). In creating the sum, add to S a vector x if x is an input for which the perceptron incorrectly fails to fire, but add vector -x if x is an input for which the perceptron incorrectly fires. Multiply the sum by a scale factor h 6. Modify the weights (w0, ,w1,.....wn) by adding the elements of the vector S them. GO TO STEP 3. to 115 Rosenblatt’s Perceptrons Consider the following set of training vectors x1, x2, and x3, which are to be used in training a Rosenblatt's perceptron, labelled j with the desired responses d1, d2, and d3, and initial weights wj1, wj2, and wj3, x1 x2 x3 wj1 wj2 wj3 S yj j d1 d 2 d 3 116 Rosenblatt’s Perceptrons The Method: The Perceptron j has to learn all the three patterns x1, x2, and x3, such that when we show patterns as same as the three or similar patterns the perceptron recognises them. How will the perceptron indicate that it has recognised the patterns? By responding as d1, d2, and d3, respectively when shown x1, x2, and x3. We have to show the patterns repeatedly to the perceptron. At each showing (training cycle) the weights change in an attempt to produce the correct desired response. 117 Rosenblatt’s Perceptrons Definition x of OR (0,1) Input Decision line Output X1 X2 Y 0 0 0 0 1 1 1 0 1 1 1 1 (1,1) x1 (0,0) (1,0) Denotes 1 Denotes 0 118 Rosenblatt’s Perceptrons Definition of AND (0,1) Input x2 Output X1 X2 Y 0 0 0 0 1 0 1 0 0 (1,1) Decision line x1 (0,0) (1,0) Denotes 1 Denotes 0 119 Rosenblatt’s Perceptrons Definition of XOR (0,1) x2 (1,1) Decision line #2 Input Output Decision line #1 X1 X2 Y 0 0 0 0 1 1 1 0 1 1 1 0 x1 (0,0) (1,0) Denotes 1 Denotes 0 120 Rosenblatt’s Perceptron The XOR ‘problem’ The simple perceptron cannot learn a linear decision surface to separate the different outputs, because no such decision surface exists. Such a non-linear relationship between inputs and outputs as that of an XOR-gate are used to simulate vision systems that can tell whether a line drawing is connected or not, and in separating figure from ground in a picture. Rosenblatt’s Perceptron The XOR ‘problem’ For simulating the behaviour of an XOR-gate we need to draw elliptical decision surfaces that would encircle two ‘1’ outputs: A simple perceptron is unable to do so. Solution? Employ two separate linedrawing stages. Rosenblatt’s Perceptron The XOR ‘problem’ One line drawing to separate the pattern where both the inputs are ‘0’ leading to an output ‘0’ and another line drawing to separate the remaining three I/O patterns where either of the inputs is ‘0’ leading to an output ‘1’ where both the inputs are ‘0’ leading to an output ‘0’ Rosenblatt’s Perceptron The XOR ‘solution’ In effect we use two perceptrons to solve the XOR problem: The output of the first perceptron becomes the input of the second. If the first perceptron sees both inputs as ‘1’. it sends a massive inhibitory signal to the second perceptron causing it to output ‘0’. If either of the inputs is ‘0’ the second perceptron gets no inhibition from the first perceptron and outputs 1, and outputs ‘1’ if either of the inputs is ‘1’. Rosenblatt’s Perceptron The XOR ‘solution’ The multilayer perceptron designed to solve the XOR problem has a serious problem. The perceptron convergence theorem does not extend to multilayer perceptrons. The perceptron learning algorithm can adjust the weights between the inputs and outputs, but it cannot adjust weights between perceptrons. For this we have to wait for the back-propagation learning algorithms. Rosenblatt’s Perceptrons •A perceptron computes a binary function of its input. A group of perceptrons can be trained on sample inputoutput pairs until it learns to compute the correct function. •Each perceptron, in some models, can function independently of others in the group, they can be separately trained – linearly separable. •Thresholds can be varied together with weights. •Given values of x and x to train such that the perceptron outputs 1 for white dots and 0 for black dots. 1 2 126 Rosenblatt’s Perceptrons Rosenblatt’s contribution •What Rosenblatt proved was that if the patterns were drawn from two linearly separable classes, then the perceptron algorithm converges and positions the decision surface in the form of a hyperplane between the two classes the perceptron convergence theorem (Haykin 117). 127 Rosenblatt’s Perceptrons X2 X1 X 2 X1XORX 2 0 0 0 0 1 1 1 0 1 1 1 0 x X1 1 -0.5 1 -1.5 1 x1 -9.0 S x1 1 x2 x2 x If w J(w) = x x S 1 1 is misclassified as a negative example If –x is misclassified as a positive example J(w) is Called the Perceptron Criterion Function 128 Rosenblatt’s Perceptrons X2 The rate of change of J(w) with all the different weights, w1, w2, w3, w4…w, tells us the direction to move in. To find a solution change the weights in the direction of the gradient, recompute J(w), and recompute the gradient of J(w) and iterate until J(w)=0 wnew = wold +J(w) 129 Rosenblatt’s Perceptrons Multilayer Perceptron The perceptron built around a single neuron is limited to performing pattern classification with only two classes (hypotheses). By expanding the output (computation) layer of perceptron to include more than one neuron, it is possible to perform classification with more than two classes- but the classes have to be seperable. 130 ANN Learning Algorithms Vector describing the environment ENVIRONMENT TEACHER Desired response LEARNING SYSTEM S Actual Response - + Error Signal 131 ANN Learning Algorithms ENVIRONMENT Vector describing state of the environment LEARNING SYSTEM 132 ANN Learning Algorithms Primary Reinforcement State-vector input ENVIRONMENT CRITIC Heuristic Reinforcement Actions LEARNING SYSTEM 133 Hebbian Learning DONALD HEBB, a Canadian psychologist, was interested in investigating PLAUSIBLE MECHANISMS FOR LEARNING AT THE CELLULAR LEVELS IN THE BRAIN. (see for example, Donald Hebb's (1949) The Organisation of Behaviour. New York: Wiley) 134 Hebbian Learning HEBB’s POSTULATE: When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic changes take place in one or both cells such that A's efficiency as one of the cells firing B, is increased. 135 Hebbian Learning Hebbian Learning laws CAUSE WEIGHT CHANGES IN RESPONSE TO EVENTS WITHIN A PROCESSING ELEMENT THAT HAPPEN SIMULTANEOUSLY. THE LEARNING LAWS IN THIS CATEGORY ARE CHARACTERIZED BY THEIR COMPLETELY LOCAL - BOTH IN SPACE AND IN TIME-CHARACTER. 136 Hebbian Learning LINEAR ASSOCIATOR: A substrate for Hebbian Learning Systems y’1 y1 y’2 y2 y’3 y3 Output y’ w11 w12 w13 w21 w22 w23 w31 w32 w33 Input x x1 x2 x3 137 Hebbian Learning A simple form of Hebbian Learning Rule wk j new wk j old yk x j , where h is the so-called rate of learning and x and y are the input and output respectively. This rule is also called the activity product rule. 138 Hebbian Learning A simple form of Hebbian Learning Rule If there are "m" pairs of vectors, x1 , y 1 xm , y m to be stored in a network ,then the training sequence will change the weight-matrix, w, from its initial value of ZERO to its final state by simply adding together all of the incremental weight change caused by the "m" applications of Hebb's law: w y 1x 1T y 2 x T2 y n x Tn 139 Hebbian Learning A worked example: Consider the Hebbian learning of three input vectors: x (1) 1 1 0 2 0.5 1 ; x ( 3 ) ; x ( 2) 1.5 2 1 0 1.5 1.5 in a network with the following initial weight vector: 1 1 w 0 0.5 140 Hebbian Learning A worked example: Consider the Hebbian learning of three input vectors: x (1) 1 1 0 2 0.5 1 ; x ( 3 ) ; x ( 2) 1.5 2 1 0 1.5 1.5 in a network with the following initial weight vector: 1 1 w 0 0.5 141 Hebbian Learning A worked example: Consider the Hebbian learning of three input vectors: x (1) 1 1 0 2 0.5 1 ; x ( 3 ) ; x ( 2) 1.5 2 1 0 1.5 1.5 in a network with the following initial weight vector: 1 1 w 0 0.5 142 Hebbian Learning A worked example: Consider the Hebbian learning of three input vectors: x (1) 1 1 0 2 0.5 1 ; x ( 3 ) ; x ( 2) 1.5 2 1 0 1.5 1.5 in a network with the following initial weight vector: 1 1 w 0 0.5 143 Hebbian Learning The worked example shows that with discrete f(net) and =1, the weight change involves ADDING or SUBTRACTING the entire input pattern vectors to and from the weight vectors respectively. Consider the case when the activation function is a continuous one. For example, take the bipolar continuous activation function: f (net ) 2 1 exp ( *net ) 1; where 0. 144 Hebbian Learning The worked example shows that with bipolar continuous activation function indicates that the weight adjustments are tapered for the continuous function but are generally in the same direction: Vector x(1) Discrete Bipolar f(net) 1 Continuous Bipolar f(net) 0.905 x(2) -1 -0.077 x(3) -1 -0.932 145 Hebbian Learning The details of the computation for the three steps with a discrete bipolar activation function are presented below in the notes pages. The input vectors and the initial weight vector are: x (1) 1 1 0 2 0.5 1 ; x ( 2) ; x ( 3 ) 1.5 2 1 0 1 . 5 1 . 5 1 1 w 0 0 . 5 146 Hebbian Learning The details of the computation for the three steps with a continuous bipolar activation function are presented below in the notes pages. The input vectors and the initial weight vector are: x (1) 1 1 0 2 0.5 1 ; x ( 2) ; x ( 3 ) 1.5 2 1 0 1 . 5 1 . 5 1 1 w 0 0 . 5 147 Hebbian Learning Recall that the simple form of Hebbian learning law suggests that the repeated application of the presynaptic signal xj leads to an increase in yk and therefore exponential growth that finally drives the synaptic connection into saturation. wk j new wk j old wk j yk x j A number of researchers have proposed ways in which such saturation can be avoided. Sejnowski has suggested that wk j ( y k y ) ( x j x ), where x the time averaged value of x j ; & y the time averaged value of y k . 148 Hebbian Learning The Hebbian synapse described below is said to involve the use of POSITIVE FEEDBACK. wk j new wk j old wk j yk x j 149 Hebbian Learning What is the principal limitation of this simplest form of learning? wk j new wk j old wk j yk x j The above equation suggests that the repeated application of the input signal leads to an increase in , and therefore exponential growth that finally drives the synaptic connection into saturation. At that point of saturation no information cannot be stored in the synapse and selectivity will be lost. Graphically the relationship with the postsynaptic activityis a simple one: it is linear with a slope . 150 Hebbian Learning The so-called covariance hypothesis was introduced to deal with the principal limitation of the simplest form of Hebbian learning and is given as wkj (n) ( yk (n) y ) ( x j (n) x ) where and denote the time-averaged values of the pre-synaptic and postsynaptic signals. 151 Hebbian Learning wkj (n) ( yk (n) y ) ( x j (n) x ) If we expand the above equation: wkj (n) yk (n) x j (n) y x j (n) yk (n) x yx ) the last term in the above equation is a constant and the first term is what we have for the simplest Hebbian learning rule: Modified wkj ( n ) Simple wkj ( n ) y x j (n) yk ( n ) x y x 152 Hebbian Learning Graphically the relationship ∆wij with the postsynaptic activity yk is still linear but with a slope ( x j ( n) x ) and the assurance that the straight line curve changes its rate of change at y and the minimum value of the weight change ∆wij is ( x j ( n) x ) 153