CS407 Neural Computation
Lecture 2: Neurobiology and Architectures of ANNs
Lecturer: A/Prof. M. Bennamoun

NERVOUS SYSTEM & HUMAN BRAIN

Organization of the nervous system
• Central Nervous System: brain and spinal cord.
• Brain:
  – Hindbrain & midbrain: brain stem & cerebellum.
  – Forebrain: sub-cortical structures (thalamus, hypothalamus, limbic system) and the cortex (frontal, parietal, occipital & temporal lobes in the left & right hemispheres).

THE BIOLOGICAL NEURON

The Structure of Neurons
[Figure: a neuron, showing the cell body, nucleus, dendrites, axon and synapses]
• A neuron has a cell body, a branching input structure (the dendrites) and a branching output structure (the axon).
• Axons connect to dendrites via synapses.
• Electro-chemical signals are propagated from the dendritic input, through the cell body, and down the axon to other neurons.
• A neuron only fires if its input signal exceeds a certain amount (the threshold) in a short time period.
• Synapses vary in strength:
  – Good connections allow a large signal.
  – Slight connections allow only a weak signal.
  – Synapses can be either excitatory or inhibitory.

Neurotransmission
http://www.health.org/pubs/qdocs/mom/TG/intro.htm

Neurons come in many shapes & sizes.

The brain's plasticity
• The ability of the brain to alter its neural pathways.
• Recovery from brain damage:
  – Dead neurons are not replaced, but branches of the axons of healthy neurons can grow into the pathways and take over the functions of damaged neurons.
  – Equipotentiality: more than one area of the brain may be able to control a given function.
  – The younger the person, the better the recovery (e.g. recovery from left hemispherectomy).

THE ARTIFICIAL NEURON: Model

Models of a Neuron
A neuron is an information processing unit consisting of:
• A set of synapses or connecting links, each characterized by a weight or strength.
• An adder summing the input signals weighted by the synapses (a linear combiner).
• An activation function, also called a squashing function, which squashes (limits) the output to some finite values.

Nonlinear model of a neuron (I)
[Figure: inputs x1 … xm multiplied by synaptic weights wk1 … wkm, a summing junction with bias bk, an activation function ϕ(.), and output yk]
  vk = Σ (j=1..m) wkj xj + bk
  yk = ϕ(vk)

Analogy
• Inputs represent synapses.
• Weights represent the strengths of the synaptic links.
• wi represents dendrite secretion.
• The summation block represents the addition of the secretions.
• The output represents the axon voltage.

Nonlinear model of a neuron (II)
The bias is treated as an extra synaptic weight wk0 = bk driven by a fixed input x0 = +1:
  vk = Σ (j=0..m) wkj xj
  yk = ϕ(vk)

THE ARTIFICIAL NEURON: Activation Function

Types of Activation Function
[Figure: plots of the output Oj versus input ini for the three activation functions below]
• The hard-limiting threshold function corresponds to the biological paradigm: the neuron either fires or does not.
  O(in) = 1 if in > t, 0 if in ≤ t
• Piecewise-linear function.
• Sigmoid function (differentiable, 'S'-shaped curve):
  ϕ(v) = 1 / (1 + exp(−a v)), where a is the slope parameter.

Activation Functions...
• Threshold or step function (McCulloch & Pitts model).
• Linear: neurons using a linear activation function are called ADALINEs in the literature (Widrow, 1960).
• Sigmoidal functions: functions which more closely describe the non-linear behaviour of biological neurons.
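The following is a minimal Python/NumPy sketch, not part of the original slides, of the neuron model above: it computes vk = Σj wkj xj + bk and applies either the hard-limiting threshold or the sigmoid activation. The input and weight values are illustrative only.

# Minimal sketch (not from the slides) of the nonlinear neuron model:
# vk = sum_j wkj*xj + bk, yk = phi(vk), with two of the activation functions above.
import numpy as np

def step(v, t=0.0):
    """Hard-limiting threshold: fires (1) only if v exceeds the threshold t."""
    return 1.0 if v > t else 0.0

def sigmoid(v, a=1.0):
    """Sigmoid with slope parameter a."""
    return 1.0 / (1.0 + np.exp(-a * v))

def neuron_output(x, w, b, phi=sigmoid):
    """Compute yk = phi(vk) where vk = w.x + b."""
    v = np.dot(w, x) + b
    return phi(v)

x = np.array([0.5, -1.0, 2.0])   # example inputs (illustrative values)
w = np.array([0.4, 0.1, -0.3])   # example synaptic weights
print(neuron_output(x, w, b=0.2))            # sigmoid output
print(neuron_output(x, w, b=0.2, phi=step))  # hard-limited output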
Activation Functions... sigmoid
[Figure: sigmoid curves ϕβ(v) for two slope values β1 and β2]
  ϕβ(v) = 1 / (1 + exp(−β v))
(i)   if βv → ∞ then ϕβ(v) → 1
(ii)  if βv → −∞ then ϕβ(v) → 0
(iii) if β → ∞ with v fixed, then ϕβ(v) → 1(v), where 1(v) is the modified Heaviside function.

Activation Functions... sigmoid
• The Heaviside function:
  1H(v) = 1 if v ≥ 0, 0 if v < 0
• The modified Heaviside function:
  1(v) = 1 if v > 0, 1/2 if v = 0, 0 if v < 0

Activation Function value range
• Hyperbolic tangent function (output range −1 to +1):
  ϕ(v) = tanh(v)
• Signum function (sign).

Stochastic Model of a Neuron
• So far we have introduced only deterministic models of ANNs.
• A stochastic (probabilistic) model can also be defined.
• If x denotes the state of the neuron, then P(v) denotes the probability of firing the neuron, where v is the induced activation potential (bias + linear combination):
  P(v) = 1 / (1 + e^(−v/T))

Stochastic Model of a Neuron…
• T is a pseudo-temperature used to control the noise level (and therefore the uncertainty in firing).
• As T → 0, the stochastic model reduces to the deterministic model:
  x = +1 if v ≥ 0, −1 if v < 0
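The following is a small Python sketch, not part of the original slides, that samples the stochastic neuron above. The activation value v = 0.4 and the temperatures are illustrative; as T shrinks, the observed firing rate approaches the deterministic +1/−1 rule.

# Stochastic neuron: fires (+1) with probability P(v) = 1 / (1 + exp(-v/T)).
import math
import random

def fire_probability(v, T):
    return 1.0 / (1.0 + math.exp(-v / T))

def stochastic_neuron(v, T):
    """Return +1 with probability P(v), otherwise -1."""
    return 1 if random.random() < fire_probability(v, T) else -1

v = 0.4  # illustrative activation potential
for T in (2.0, 0.5, 0.01):
    samples = [stochastic_neuron(v, T) for _ in range(10000)]
    print(f"T={T}: P(fire)={fire_probability(v, T):.3f}, "
          f"observed firing rate={samples.count(1) / len(samples):.3f}")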
DECISION BOUNDARIES

Decision boundaries
• In simple cases, the feature space can be divided by drawing a hyperplane across it.
• This hyperplane is known as a decision boundary.
• Discriminant function: returns different values on opposite sides of the boundary (a straight line in two dimensions).
• Problems which can be classified in this way are linearly separable.

Decision Surface of a Perceptron
[Figure: two scatter plots in the (x1, x2) plane; in the first the + and − examples can be split by a straight line (linearly separable), in the second they cannot (non-linearly separable)]
• A perceptron is able to represent some useful functions.
• AND(x1, x2): choose weights w0 = −1.5, w1 = 1, w2 = 1.
• But functions that are not linearly separable (e.g. XOR) are not representable.

Linear Separability
[Figure: points of class A and class B in the (X1, X2) plane, separated by the decision boundary]
  x2 = −(w1/w2) x1 + t/w2

Rugby players & Ballet dancers
[Figure: height (1–2 m) versus weight (50–120 kg); rugby players and ballet dancers form two separable clusters]

Training the neuron
[Figure: a neuron with inputs x0 = −1, x1, x2, weights w0 = t, w1 = ?, w2 = ?, and a threshold activation]
  f(v) = 1 if v > 0, 0 if v = 0, −1 if v < 0
The decision boundary is x0w0 + x1w1 + x2w2 = 0. With x0 = −1 and w0 = t, it is clear that:
  (x, y) ∈ A iff x1w1 + x2w2 > t
  (x, y) ∈ B iff x1w1 + x2w2 < t
Finding the wi is called learning.

THE ARTIFICIAL NEURON: Learning

Supervised Learning
– The desired response of the system is provided by a teacher, e.g. the distance ρ[d, o] as an error measure.
– Estimate the negative error gradient direction and reduce the error accordingly.
– Modify the synaptic weights to stochastically minimize the error in the multidimensional weight space.

Unsupervised Learning (learning without a teacher)
– The desired response is unknown, so no explicit error information can be used to improve network behaviour, e.g. finding the cluster boundaries of input patterns.
– Suitable weight self-adaptation mechanisms have to be embedded in the trained network.

Training
A linear threshold unit is used:
  Output = 1 if Σi wi xi > t, 0 otherwise
where wi is a weight value and t is the threshold value.

Simple network: AND with a biased input
[Figure: inputs −1, X and Y with weights W1 = 1.5, W2 = 1, W3 = 1 and threshold t = 0.0]
  Output = 1 if Σi wi xi > t, 0 otherwise

Learning algorithm
  While epoch produces an error
    Present network with next inputs from epoch
    Error = T – O
    If Error <> 0 then
      Wj = Wj + LR * Ij * Error
    End If
  End While

Learning algorithm…
Epoch: presentation of the entire training set to the neural network. In the case of the AND function, an epoch consists of four sets of inputs being presented to the network (i.e. [0,0], [0,1], [1,0], [1,1]).
Error: the amount by which the value output by the network differs from the target value. For example, if we required the network to output 0 and it output a 1, then Error = −1.
Target value, T: when training a network we present it not only with the input but also with the value that we require the network to produce. For example, if we present the network with [1,1] for the AND function, the target value will be 1.
Output, O: the output value from the neuron.
Ij: the inputs being presented to the neuron.
Wj: the weight from input neuron Ij to the output neuron.
LR: the learning rate. This dictates how quickly the network converges. It is set by experimentation; typically 0.1.

Training the neuron
[Figure: inputs −1, x, y with weights W1 = ?, W2 = ?, W3 = ? and threshold t = 0.0]
For AND:
  A B | Output
  0 0 | 0
  0 1 | 0
  1 0 | 0
  1 1 | 1
• What are the weight values?
• Initialize with random weight values.

Training the neuron…
With initial weights W1 = 0.3, W2 = 0.5, W3 = −0.4 (inputs I1 = −1, I2 = x, I3 = y):
  I1 I2 I3 | Summation                                 | Output
  -1  0  0 | (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3      | 0
  -1  0  1 | (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7      | 0
  -1  1  0 | (-1*0.3) + (1*0.5) + (0*-0.4) =  0.2      | 1
  -1  1  1 | (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2      | 0
A runnable sketch of this training run is given below.

Learning in Neural Networks
• Learn the values of the weights from I/O pairs.
• Start with random weights.
• Load a training example's input.
• Observe the computed output.
• Modify the weights to reduce the difference.
• Iterate over all training examples.
• Terminate when the weights stop changing OR when the error is very small.
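The following is a runnable Python sketch, not part of the original notes, of the learning algorithm above applied to the AND function. It uses the bias-as-input convention from the slides (first input fixed at −1), starts from the worked example's initial weights W1 = 0.3, W2 = 0.5, W3 = −0.4, and uses LR = 0.1; the 100-epoch cap is a safety limit added here.

def output(weights, inputs, t=0.0):
    # Linear threshold unit: 1 if the weighted sum exceeds the threshold t.
    s = sum(w * i for w, i in zip(weights, inputs))
    return 1 if s > t else 0

def train_and():
    weights = [0.3, 0.5, -0.4]          # [W1 (bias weight), W2, W3]
    lr = 0.1
    epoch = [([-1, 0, 0], 0), ([-1, 0, 1], 0), ([-1, 1, 0], 0), ([-1, 1, 1], 1)]
    for _ in range(100):                # safety cap on the number of epochs
        had_error = False
        for inputs, target in epoch:
            error = target - output(weights, inputs)
            if error != 0:
                had_error = True
                weights = [w + lr * i * error for w, i in zip(weights, inputs)]
        if not had_error:               # terminate when an epoch produces no error
            break
    return weights

w = train_and()
print("learned weights:", w)
# The learned decision boundary is x*W2 + y*W3 = W1 (since the bias input is -1),
# which is the straight line from the linear-separability slide.
for x in (0, 1):
    for y in (0, 1):
        print(x, y, "->", output(w, [-1, x, y]))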
NETWORK ARCHITECTURE / TOPOLOGY

Network Architecture
• Single-layer Feedforward Networks
  – input layer and output layer
  – a single (computation) layer
  – feedforward, acyclic
• Multilayer Feedforward Networks
  – hidden layers, with hidden neurons (hidden units)
  – enable the network to extract higher-order statistics
  – e.g. a 10-4-2 network, or a 100-30-10-3 network
  – fully connected layered network
• Recurrent Networks
  – at least one feedback loop
  – with or without hidden neurons
[Figure: a single-layer network, a fully connected multilayer network, a recurrent network without hidden units (unit delay operators feed the outputs back to the inputs), and a recurrent network with hidden units]

Feedforward Networks (static)
[Figure: input layer, hidden layers, output layer]
• One input and one output layer.
• One or more hidden layers.
• Each hidden layer is built from artificial neurons.
• Each element of the preceding layer is connected with each element of the next layer.
• There is no interconnection between artificial neurons of the same layer.
• Finding the weights is a task which has to be done depending on which problem is to be solved by the specific network. (A sketch of a forward pass through such a network is given below.)

Feedback Networks (Recurrent or dynamic systems)
[Figure: input layer, hidden layers, output layer with feedback connections]
• The interconnections go in both directions between neurons, or via feedback loops.
• The Boltzmann machine is an example of a recurrent net; it is a generalization of Hopfield nets. Another example of recurrent nets: Adaptive Resonance Theory (ART) nets.

Neural network as directed Graph
[Figure: signal-flow graph of a neuron with input x0 = +1 and weight wk0 = bk, inputs x1 … xm with weights wk1 … wkm, a summing node vk, activation ϕ(.), and output yk]

Neural network as directed Graph…
• The block diagram can be simplified by the idea of a signal-flow graph:
  – a node is associated with a signal
  – a directed link is associated with a transfer function
• Synaptic links are governed by a linear input-output relation: the signal xj is multiplied by the synaptic weight wkj.
• Activation links are governed by a nonlinear input-output relation: the nonlinear activation function.

Feedback
• The output determines in part its own value via feedback.
[Figure: a single-loop feedback system with input xj(n), weight w, and a unit-delay operator z^-1]
  yk(n) = Σ (l=0..∞) w^(l+1) xj(n − l)
• Depending on w, the system is stable, linearly divergent, or exponentially divergent.
• We are interested in the case |w| < 1: infinite memory, i.e. the output yk(n) depends on inputs of the infinitely distant past.
• A NN with feedback loops is a recurrent network.
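Below is a small numerical sketch, not part of the original notes, of the single-weight feedback loop above. Iterating y(n) = w·(x(n) + y(n−1)) unrolls to the sum yk(n) = Σ w^(l+1) xj(n−l), and the runs for w = 0.5, 1.0 and 1.5 illustrate the stable, linearly divergent and exponentially divergent cases.

def feedback_response(w, x, n_steps):
    """Iterate y(n) = w * (x(n) + y(n-1)), which unrolls to the sum above."""
    y, outputs = 0.0, []
    for n in range(n_steps):
        y = w * (x(n) + y)
        outputs.append(y)
    return outputs

step_input = lambda n: 1.0           # constant input applied from n = 0
for w in (0.5, 1.0, 1.5):
    ys = feedback_response(w, step_input, 20)
    print(f"w={w}: y(19) = {ys[-1]:.3f}")
# w=0.5 settles near 1.0 (stable), w=1.0 grows linearly, w=1.5 grows exponentially.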
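A separate minimal NumPy sketch, also not from the notes, shows a forward pass through the fully connected 10-4-2 feedforward network mentioned earlier: 10 inputs, one hidden layer of 4 sigmoid neurons, and 2 output neurons. The random weights are placeholders only, since finding the weights is the learning task.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def layer(x, W, b):
    """One fully connected layer: y = sigmoid(W x + b)."""
    return sigmoid(W @ x + b)

sizes = [10, 4, 2]                    # the 10-4-2 architecture
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]

x = rng.normal(size=10)               # an arbitrary input vector
for W, b in zip(weights, biases):
    x = layer(x, W, b)                # layer-to-layer connections only, none within a layer
print("network output:", x)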
NEURAL PROCESSING

Neural Processing
• Recall
  – The process of computation of an output o for a given input x performed by the ANN.
  – Its objective is to retrieve the information, i.e. to decode the stored content which must have been encoded in the network previously.
• Autoassociation
  – The network is presented with a pattern similar to a member of the stored set; autoassociation associates the input pattern with the closest stored pattern.
  – Example: reconstruction of an incomplete or noisy image.
• Heteroassociation
  – The network associates the input pattern with pairs of stored patterns.

Neural Processing…
• Classification
  – A set of patterns is already divided into a number of classes, or categories.
  – When an input pattern is presented, the classifier recalls the information regarding the class membership of the input pattern.
  – The classes are expressed by discrete-valued output vectors; thus the output neurons of the classifier employ binary activation functions.
  – A special case of heteroassociation.
• Recognition
  – The desired response is the class number, but the input pattern does not exactly correspond to any of the patterns in the stored set.
• Clustering
  – Unsupervised classification of patterns/objects without providing information about the actual classes.
  – The network must discover for itself any existing patterns, regularities, separating properties, etc.
  – While discovering these, the network undergoes a change of its parameters, which is called self-organization.

Summary
• Parallel distributed processing (especially a hardware-based neural net) is a good approach for complex pattern recognition (e.g. image recognition, forecasting, text retrieval, optimization).
• There is less need to determine relevant factors a priori when building a neural network.
• Lots of training data are needed.
• High tolerance to noisy data; in fact, noisy data enhance post-training performance.
• It is difficult to verify or discern learned relationships, even with special knowledge extraction utilities developed for neural nets.