Machine Learning: Neural Networks
Bai Xiao

Neural Network – Roadmap
• What is a neural network
• Artificial neuron model – the perceptron
• Multi-layer networks and the backpropagation algorithm
• Conclusion

Neural Network
• Analogy to biological neural systems, the most robust learning systems we know.
• Attempt to understand natural biological systems through computational modeling.
• Massive parallelism allows for computational efficiency.
• Helps understand the "distributed" nature of neural representations (rather than "localist" representations), which allows robustness and graceful degradation.
• Intelligent behavior arises as an "emergent" property of a large number of simple units rather than from explicitly encoded symbolic rules and algorithms.

Biological inspiration
Animals are able to react adaptively to changes in their external and internal environment, and they use their nervous system to produce these behaviours. An appropriate model/simulation of the nervous system should be able to produce similar responses and behaviours in artificial systems. The nervous system is built from relatively simple units, the neurons, so copying their behavior and functionality should be the solution.

Neural Network – Cell structures
• Cell body (soma)
• Dendrites
• Axon
• Synaptic terminals

Neural Network
• The electrical potential across the cell membrane exhibits spikes called action potentials.
• A spike originates in the cell body, travels down the axon, and causes the synaptic terminals to release neurotransmitters.
• The chemical diffuses across the synapse to the dendrites of other neurons.
• Neurotransmitters can be excitatory or inhibitory.
• If the net neurotransmitter input to a neuron from other neurons is excitatory and exceeds some threshold, the neuron fires an action potential.

How the Human Brain Learns
• In the human brain, a typical neuron collects signals from others through a host of fine structures called dendrites.
• The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches.
• At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons.

A Neuron Model
• When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses, so that the influence of one neuron on another changes.
• We construct artificial neural networks by first trying to deduce the essential features of neurons and their interconnections. We then typically program a computer to simulate these features.

Biological inspiration
The spikes travelling along the axon of the pre-synaptic neuron trigger the release of neurotransmitter substances at the synapse. The neurotransmitters cause excitation or inhibition in the dendrite of the post-synaptic neuron. The integration of the excitatory and inhibitory signals may produce spikes in the post-synaptic neuron. The contribution of the signals depends on the strength of the synaptic connection.

Artificial neurons
Neurons work by processing information. They receive and provide information in the form of spikes. In the McCulloch–Pitts model, a neuron with inputs x_1, …, x_n and weights w_1, …, w_n computes the output y as

    z = Σ_{i=1..n} w_i x_i ;   y = H(z)

where H is a threshold (step) function.

A Simple Neuron
• An artificial neuron is a device with many inputs and one output.
• The neuron has two modes of operation: the training mode and the using mode.

A Simple Neuron (cont.)
• In the training mode, the neuron can be trained to fire (or not) for particular input patterns.
• In the using mode, when a taught input pattern is detected at the input, its associated output becomes the current output.
If the input pattern does not belong to the taught list of input patterns, the firing rule is used to determine whether to fire or not.
• The firing rule is an important concept in neural networks and accounts for their high flexibility. A firing rule determines how to calculate whether a neuron should fire for any input pattern. It relates to all possible input patterns, not only the ones on which the node was previously trained.

Pattern Recognition
• An important application of neural networks is pattern recognition. Pattern recognition can be implemented using a feed-forward neural network that has been trained accordingly.
• During training, the network learns to associate outputs with input patterns.
• When the network is used, it identifies the input pattern and tries to output the associated output pattern.
• The power of neural networks comes to life when a pattern that has no associated output is given as input.
• In this case, the network gives the output that corresponds to the taught input pattern that is least different from the given pattern.

Pattern Recognition (cont.)
• Suppose a network is trained to recognize the patterns T and H. The associated output patterns are all black and all white, respectively.

Pattern Recognition (cont.)
Since the input pattern looks more like a 'T', the network classifies it as closely resembling 'T' and outputs the pattern that represents a 'T'.

Pattern Recognition (cont.)
The input pattern here closely resembles 'H' with a slight difference. The network in this case classifies it as an 'H' and outputs the pattern representing an 'H'.

Pattern Recognition (cont.)
• Here the top row is 2 errors away from a 'T' and 3 errors away from an 'H', so the top output is black.
• The middle row is 1 error away from both 'T' and 'H', so the output is random.
• The bottom row is 1 error away from 'T' and 2 away from 'H'; therefore the output is black.
• Since the input resembles a 'T' more than an 'H', the output of the network is in favor of 'T'.

Neural Network
• A learning approach based on modeling adaptation in biological neural systems.
• Perceptron: the initial algorithm for learning simple (single-layer) neural networks, developed in the 1950s.
• Backpropagation: a more complex algorithm for learning multi-layer neural networks, developed in the 1980s.

Artificial Neuron Model
• Model the network as a graph with cells as nodes and synaptic connections as weighted edges w_ji from node i to node j.
• Model the net input to cell j as

    net_j = Σ_i w_ji o_i

• The cell output is a threshold function (T_j is the threshold for unit j):

    o_j = 0  if net_j < T_j
    o_j = 1  if net_j ≥ T_j

Artificial Neuron Model
• McCulloch and Pitts (1943) showed how such model neurons could compute logical functions and be used to construct finite-state machines.
• They can simulate logic gates:
  – AND: let all w_ji be T_j / n, where n is the number of inputs.
  – OR: let all w_ji be T_j.
  – NOT: let the threshold be 0, with a single input with a negative weight.
• Arbitrary logic circuits, sequential machines, and computers can be built with such gates.
• Given negated inputs, a two-layer network can compute any boolean function using a two-level AND-OR network.

Perceptron Training
• Assume supervised training examples giving the desired output for a unit given a set of known input activations.
• Learn synaptic weights w so that the unit produces the correct output for each example.
• The perceptron uses an iterative update algorithm to learn a correct set of weights.

Perceptron Training
• Update weights by

    w_ji ← w_ji + η (t_j − o_j) o_i

  where η is the "learning rate" and t_j is the teacher-specified output for unit j.
• Equivalent rules:
  – If the output is correct, do nothing.
  – If the output is high, lower the weights on active inputs.
  – If the output is low, increase the weights on active inputs.
• Also adjust the threshold to compensate:

    T_j ← T_j − η (t_j − o_j)

Perceptron Training
• Iteratively update the weights until convergence:

    Initialize weights to random values
    Until outputs of all training examples are correct
      For each training pair E do:
        Compute current output o_j for E given its inputs
        Compare current output to target value t_j for E
        Update synaptic weights and threshold using the learning rule

• Each execution of the outer loop is typically called an epoch.

Perceptron as a Linear Separator
• Since the perceptron uses a linear threshold function, it is searching for a linear separator that discriminates the classes. For a unit with two inputs, the decision boundary is

    w_12 o_2 + w_13 o_3 > T_1
    o_3 > −(w_12 / w_13) o_2 + T_1 / w_13

  or, in n-dimensional space, a hyperplane.

Perceptron Limits
• Cannot learn exclusive-or (XOR), or the parity function in general: the positive and negative points of XOR cannot be separated by a single line.

Perceptron Limits
• A system obviously cannot learn concepts it cannot represent.
• Minsky and Papert (1969) wrote a book analyzing the perceptron and demonstrating many functions it could not learn.
• These results discouraged further research on neural nets.

Perceptron Convergence and Cycling Theorems
• Perceptron convergence theorem: if the data is linearly separable, and therefore a set of weights exists that is consistent with the data, then the perceptron algorithm will eventually converge to a consistent set of weights.
• Perceptron cycling theorem: if the data is not linearly separable, the perceptron algorithm will eventually repeat a set of weights and threshold at the end of some epoch and therefore enter an infinite loop.

Perceptron as Hill Climbing
• The hypothesis space being searched is a set of weights and a threshold.
• The objective is to minimize classification error on the training set.
• The perceptron effectively does hill-climbing (gradient descent) in this space, changing the weights a small amount at each step to decrease training-set error.
• For a single model neuron, the space is well behaved, with a single minimum.

Backpropagation
• Multi-layer networks can represent arbitrary functions, but an effective learning algorithm for such networks was thought to be difficult.
• A typical multi-layer network consists of an input, a hidden, and an output layer, each fully connected to the next, with activation feeding forward.
• The weights determine the function computed. Given a sufficient number of hidden units, any boolean function can be computed with a single hidden layer.

Backpropagation Learning Rule
• Each weight is changed by

    Δw_ji = η δ_j o_i

    δ_j = o_j (1 − o_j) (t_j − o_j)         if j is an output unit
    δ_j = o_j (1 − o_j) Σ_k δ_k w_kj        if j is a hidden unit

  where η is a constant called the learning rate, t_j is the correct teacher output for unit j, and δ_j is the error measure for unit j.

Error Backpropagation
• First calculate the error of the output units and use it to change the top layer of weights. For example, with current output o_j = 0.2 and correct output t_j = 1.0:

    δ_j = o_j (1 − o_j)(t_j − o_j) = 0.2 (1 − 0.2)(1 − 0.2) = 0.128

  then update the weights into j by Δw_ji = η δ_j o_i.

Error Backpropagation
• Next calculate the error for the hidden units, based on the errors of the output units they feed into:

    δ_j = o_j (1 − o_j) Σ_k δ_k w_kj

Error Backpropagation
• Finally update the bottom layer of weights, based on the errors calculated for the hidden units, again using Δw_ji = η δ_j o_i.

Backpropagation Training Algorithm
    Create a 3-layer network with H hidden units and full connectivity between layers.
    Set weights to small random real values.
    Until all training examples produce the correct value (within ε), or mean squared error ceases to decrease, or other termination criteria are met:
      Begin epoch
      For each training example d do:
        Calculate the network output for d's input values
        Compute the error between the current output and the correct output for d
        Update the weights by backpropagating the error, using the learning rule
      End epoch

Conclusion on Backpropagation
• Not guaranteed to converge to zero training error; it may converge to local optima or oscillate indefinitely.
• However, in practice it does converge to low error for many large networks on real data.
• Many epochs (thousands) may be required; hours or days of training for large networks.
• To avoid local-minima problems, run several trials starting with different random weights (random restarts):
  – Take the results of the trial with the lowest training-set error.
  – Or build a committee of results from multiple trials (possibly weighting votes by training-set accuracy).

Artificial neural networks
An artificial neural network is composed of many artificial neurons that are linked together according to a specific network architecture. The objective of the neural network is to transform the inputs into meaningful outputs.

Artificial neural networks
Tasks to be solved by artificial neural networks:
• controlling the movements of a robot based on self-perception and other information (e.g., visual information);
• deciding the category of potential food items (e.g., edible or non-edible) in an artificial world;
• recognizing a visual object (e.g., a familiar face);
• predicting where a moving object will go when a robot wants to catch it.

Learning in biological systems
Learning = learning by adaptation. The young animal learns that the green fruits are sour, while the yellowish/reddish ones are sweet. The learning happens by adapting the fruit-picking behavior.
At the neural level, learning happens by changing the synaptic strengths, eliminating some synapses, and building new ones.

Learning as optimisation
The objective of adapting the responses on the basis of the information received from the environment is to achieve a better state. E.g., the animal likes to eat many energy-rich, juicy fruits that make its stomach full and make it feel happy. In other words, the objective of learning in biological organisms is to optimise the amount of available resources or happiness – in general, to move closer to an optimal state.

Learning in biological neural networks
The learning rules of Hebb:
• synchronous activation increases the synaptic strength;
• asynchronous activation decreases the synaptic strength.
These rules fit with energy-minimization principles. Maintaining synaptic strength needs energy; it should be maintained at those places where it is needed, and it shouldn't be maintained at places where it's not needed.

Learning principle for artificial neural networks
ENERGY MINIMIZATION. We need an appropriate definition of energy for artificial neural networks; having that, we can use mathematical optimisation techniques to find how to change the weights of the synaptic connections between neurons.
ENERGY = measure of task-performance error.

Neural network mathematics
A layered network transforms inputs into an output, layer by layer:

    first layer:   y_i^1 = f(x_i, w_i^1),   y^1 = (y_1^1, y_2^1, y_3^1, y_4^1)^T
    second layer:  y_i^2 = f(y^1, w_i^2),   y^2 = (y_1^2, y_2^2, y_3^2)^T
    output:        y_out = f(y^2, w^3)

Neural network mathematics
The neural network is an input/output transformation

    y_out = F(x, W)

where W is the matrix of all weight vectors.
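The input/output view y_out = F(x, W) can be sketched in code. This is a minimal illustration, not part of the original slides: it assumes sigmoid hidden units and a single linear output unit, and the network sizes and weight values below are arbitrary choices.

```python
import math

def sigmoid(u):
    # Logistic activation: 1 / (1 + e^(-u))
    return 1.0 / (1.0 + math.exp(-u))

def layer(inputs, weights, biases):
    # One layer of sigmoid units: y_k = sigmoid(w_k . inputs + a_k)
    return [sigmoid(sum(w * x for w, x in zip(wk, inputs)) + ak)
            for wk, ak in zip(weights, biases)]

def F(x, W):
    # W is a list of (weight-matrix, bias-vector) pairs for the hidden
    # layers, followed by a single weight vector for the linear output.
    y = x
    for weights, biases in W[:-1]:
        y = layer(y, weights, biases)
    w_out = W[-1]
    return sum(w * v for w, v in zip(w_out, y))

# A tiny network: 4 inputs, one hidden layer of 3 sigmoid units,
# one linear output (all weights picked arbitrarily for illustration).
W = [([[0.1, -0.2, 0.3, 0.0],
       [0.0, 0.4, -0.1, 0.2],
       [-0.3, 0.1, 0.0, 0.1]], [0.0, 0.0, 0.0]),
     [0.5, -0.5, 0.2]]
print(F([1.0, 2.0, -1.0, 0.5], W))
```

Changing any entry of W changes the function computed, which is exactly the sense in which "W is the matrix of all weight vectors" parameterizes F.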
MLP neural networks
MLP = multi-layer perceptron.
Perceptron: y_out = w^T x.
MLP neural network (two sigmoid layers, one linear output):

    y_k^1 = 1 / (1 + e^{−(w^{1,k T} x + a_k^1)}),    y^1 = (y_1^1, y_2^1, y_3^1)^T,  k = 1, 2, 3
    y_k^2 = 1 / (1 + e^{−(w^{2,k T} y^1 + a_k^2)}),  y^2 = (y_1^2, y_2^2)^T,         k = 1, 2
    y_out = Σ_{k=1..2} w_k^3 y_k^2 = w^{3T} y^2

RBF neural networks
RBF = radial basis function: r(x) = r(||x − c||).
Example, a network of Gaussian RBFs f(x) = e^{−||x − w||² / (2 a²)}:

    y_out = Σ_{k=1..4} w_k^2 · e^{−||x − w^{1,k}||² / (2 (a_k)²)}

Neural network tasks
• control
• classification
• prediction
• approximation
These can be reformulated in general as FUNCTION APPROXIMATION tasks. Approximation: given a set of values of a function g(x), build a neural network that approximates the g(x) values for any input x.

Neural network approximation
Task specification:
• Data: a set of value pairs (x^t, y_t), where y_t = g(x^t) + z_t and z_t is random measurement noise.
• Objective: find a neural network that represents the input/output transformation (a function) F(x, W) such that F(x, W) approximates g(x) for every x.

Learning to approximate
Error measure:

    E = (1/N) Σ_{t=1..N} (F(x^t; W) − y_t)²

Rule for changing the synaptic weights:

    Δw_i^j = −c · ∂E(W)/∂w_i^j
    w_i^{j,new} = w_i^j + Δw_i^j

where c is the learning parameter (usually a constant).

Learning with a perceptron
Perceptron: y_out = w^T x.
Data: (x^1, y_1), (x^2, y_2), ..., (x^N, y_N).
Error:

    E(t) = (y_out(t) − y_t)² = (w(t)^T x^t − y_t)²

Learning:

    w_i(t+1) = w_i(t) − c · ∂E(t)/∂w_i = w_i(t) − c · 2 (w(t)^T x^t − y_t) x_i^t

where w(t)^T x^t = Σ_{j=1..m} w_j(t) x_j^t. A perceptron is able to learn a linear function.

Learning with RBF neural networks
RBF neural network:

    y_out = F(x, W) = Σ_{k=1..M} w_k^2 · e^{−||x − w^{1,k}||² / (2 (a_k)²)}

Data: (x^1, y_1), (x^2, y_2), ..., (x^N, y_N).
Error:

    E(t) = (y_out(t) − y_t)² = (Σ_{k=1..M} w_k^2(t) · e^{−||x^t − w^{1,k}||² / (2 (a_k)²)} − y_t)²

Learning:

    w_i^2(t+1) = w_i^2(t) − c · ∂E(t)/∂w_i^2
    ∂E(t)/∂w_i^2 = 2 (F(x^t, W(t)) − y_t) · e^{−||x^t − w^{1,i}||² / (2 (a_i)²)}

Only the synaptic weights of the output neuron are modified. An RBF neural network learns a nonlinear function.
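The RBF output-weight learning rule can be sketched as plain gradient descent. This is an illustrative implementation, not taken from the slides: the centers w^{1,k}, the widths a_k, the learning parameter c, and the toy data for g(x) = x² are all arbitrary choices; as stated above, only the output weights are trained.

```python
import math

def rbf_output(x, centers, widths, w2):
    # y_out = sum_k w2_k * exp(-||x - c_k||^2 / (2 a_k^2))
    return sum(w * math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                            / (2.0 * a * a))
               for w, c, a in zip(w2, centers, widths))

def train_output_weights(data, centers, widths, c=0.1, epochs=200):
    # Gradient descent on E(t) = (F(x^t) - y_t)^2; only the output
    # weights w2 are modified, as in the rule above.
    w2 = [0.0] * len(centers)
    for _ in range(epochs):
        for x, y in data:
            err = rbf_output(x, centers, widths, w2) - y   # F(x^t) - y_t
            for i, (ci, ai) in enumerate(zip(centers, widths)):
                phi = math.exp(-sum((xj - cj) ** 2 for xj, cj in zip(x, ci))
                               / (2.0 * ai * ai))
                w2[i] -= c * 2.0 * err * phi   # dE/dw2_i = 2 (F - y) phi_i
    return w2

# Illustrative 1-D data: approximate g(x) = x^2 on a few sample points
# (the centers and widths are hypothetical, not from the slides).
centers = [[-1.0], [0.0], [1.0]]
widths = [0.7, 0.7, 0.7]
data = [([x], x * x) for x in [-1.0, -0.5, 0.0, 0.5, 1.0]]
w2 = train_output_weights(data, centers, widths)
```

Because F is linear in the output weights w2, this gradient descent is solving a simple least-squares problem, which is why training only the output layer of an RBF network is comparatively easy.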
Learning with MLP neural networks
MLP neural network with p layers:

    y_k^1 = 1 / (1 + e^{−(w^{1,k T} x + a_k^1)}),    y^1 = (y_1^1, ..., y_{M_1}^1)^T,  k = 1, ..., M_1
    y_k^2 = 1 / (1 + e^{−(w^{2,k T} y^1 + a_k^2)}),  y^2 = (y_1^2, ..., y_{M_2}^2)^T,  k = 1, ..., M_2
    ...
    y_out = F(x; W) = w^{pT} y^{p−1}

Data: (x^1, y_1), (x^2, y_2), ..., (x^N, y_N).
Error:

    E(t) = (y_out(t) − y_t)² = (F(x^t; W) − y_t)²

It is very complicated to calculate the weight changes directly.

Learning with backpropagation
Solution of the complicated learning problem:
• first calculate the changes for the synaptic weights of the output neuron;
• then calculate the changes backward, starting from layer p−1, and propagate the local error terms backward.
The method is still relatively complicated, but it is much simpler than the original optimisation problem.

Learning with general optimisation
In general it is enough to have a single layer of nonlinear neurons in a neural network in order to learn to approximate a nonlinear function. In such a case, general optimisation may be applied without too much difficulty. Example: an MLP neural network with a single hidden layer:

    y_out = F(x; W) = Σ_{k=1..M} w_k^2 · 1 / (1 + e^{−(w^{1,k T} x + a_k)})

Learning with general optimisation
Synaptic weight change rules for the output neuron:

    w_i^2(t+1) = w_i^2(t) − c · ∂E(t)/∂w_i^2
    ∂E(t)/∂w_i^2 = 2 (F(x^t, W(t)) − y_t) · 1 / (1 + e^{−(w^{1,i T} x^t + a_i)})

Synaptic weight change rules for the neurons of the hidden layer:

    w_j^{1,i}(t+1) = w_j^{1,i}(t) − c · ∂E(t)/∂w_j^{1,i}
    ∂E(t)/∂w_j^{1,i} = 2 (F(x^t, W(t)) − y_t) · w_i^2 · [ e^{−(w^{1,i T} x^t + a_i)} / (1 + e^{−(w^{1,i T} x^t + a_i)})² ] · x_j^t

Conclusion on Neural Networks
• Many applications: speech recognition, automated vehicles, game playing, handwriting recognition.
• More efficient training methods:
  – Quickprop
  – Conjugate gradient
• More biologically plausible learning:
  – Hebbian theory.
• Further topics to explore: unsupervised learning, cognitive science, biological science, etc.