Advanced Computing Seminar: Data Mining and Its Industrial Applications — Chapter 6: Neural Networks
Zhongzhi Shi, Markus Stumptner, Yalei Hao, Gerald Quirchmayr
Knowledge and Software Engineering Lab, Advanced Computing Research Centre, School of Computer and Information Science, University of South Australia

Outline
• Introduction
• Perceptron
• Back Propagation
• Recurrent network
• Hopfield Networks
• Self-Organization Maps
• Summary

Introduction
• (Artificial) Neural networks are
– computational models which mimic the brain's learning processes.
– They have the essential features of neurons and their interconnections as found in the brain.
– Typically, a computer is programmed to simulate these features.
• Another definition: a neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:
– Knowledge is acquired by the network from its environment through a learning process.
– Interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.

Neural Networks
• A NN is a machine learning approach inspired by the way in which the brain performs a particular learning task:
– Knowledge about the learning task is given in the form of examples.
– Interneuron connection strengths (weights) are used to store the acquired information (the training examples).
– During the learning process the weights are modified in order to model the particular learning task correctly on the training examples.

Biological Neural Systems
• The brain is composed of approximately 100 billion (10^11) neurons. A typical neuron collects signals from other neurons through a host of fine structures called dendrites. The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches.
[Figure: schematic drawing of two biological neurons connected by synapses, showing axon, synapse, and dendrites]
• At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons. When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes.

Dimensions of a Neural Network
• Various types of neurons
• Various network architectures
• Various learning algorithms
• Various applications

History
• McCulloch and Pitts' "A Logical Calculus of the Ideas Immanent in Nervous Activity" (1943) appeared a decade before Hodgkin and Huxley's work.
• It created logical gates with simple threshold activations.
• Connections were hand designed, with locked weight assignments.

McCulloch-Pitts Neuron
[Figure: the McCulloch-Pitts neuron]

History
• Rosenblatt (1958) wires McCulloch-Pitts neurons together with a training procedure: Rosenblatt's Perceptron (Rosenblatt, F., Principles of Neurodynamics, New York: Spartan Books, 1962).
• 1969, Minsky: failure on problems that are not linearly separable, e.g. XOR [(X1=T & X2=F) or (X1=F & X2=T)]
– Weakness repaired with hidden layers (Minsky, M. and Papert, S., Perceptrons, MIT Press, Cambridge, 1969).
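The following is a small illustrative sketch (not from the slides) of a McCulloch-Pitts unit: a fixed-weight threshold gate. The weights and thresholds are chosen by hand, in the spirit of the hand-designed, locked connections mentioned above, to realize AND and OR.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Fire (output 1) iff the weighted sum of binary inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Hand-designed gates with locked weights, as in the 1943 model.
AND = lambda x1, x2: mcculloch_pitts([x1, x2], weights=[1, 1], threshold=2)
OR = lambda x1, x2: mcculloch_pitts([x1, x2], weights=[1, 1], threshold=1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
```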
History
• 1982: Hopfield network model.
• Late 1980s: neural networks re-emerge with Rumelhart and McClelland (Rumelhart, D., McClelland, J., Parallel and Distributed Processing, MIT Press, Cambridge, 1988).

The Neuron
• The neuron is the basic information processing unit of a NN. It consists of:
1. A set of synapses or connecting links, each link characterized by a weight: W1, W2, ..., Wm
2. An adder function (linear combiner) which computes the weighted sum of the inputs: $u = \sum_{j=1}^{m} w_j x_j$
3. An activation function (squashing function) for limiting the amplitude of the output of the neuron: $y = \varphi(u + b)$
[Figure: neuron model with input signals x1 ... xm, synaptic weights w1 ... wm, bias b, summing function, local field v, activation function φ(·), and output y]

Bias of a Neuron
• Bias b has the effect of applying an affine transformation to u: $v = u + b$
• v is the induced local field of the neuron, where $u = \sum_{j=1}^{m} w_j x_j$

Bias as extra input
• Bias is an external parameter of the neuron. It can be modeled by adding an extra input x0 = +1 with weight w0 = b, so that $v = \sum_{j=0}^{m} w_j x_j$.
[Figure: the same neuron model with the bias drawn as the extra input x0 = +1 with weight w0]

Hebbian Learning
• Simultaneous activation causes increased synaptic strength
• Asynchronous activation causes a weakened synaptic connection
• Pruning
• Hebbian, anti-Hebbian, and non-Hebbian connections

Common Rules
• Simplify the complexity
• Task specific

Biological Firing Rate
• The average firing rate depends on biological effects such as leakage, saturation, and noise

Training Methods
• Supervised training
• Unsupervised training – self-organization
• Back-propagation
• Simulated annealing
• Credit assignment

Outline
• Introduction
• Perceptron
• Back Propagation
• Recurrent network
• Hopfield Networks
• Self-Organization Maps
• Summary

Perceptron: Single Layer Feed-forward
• Rosenblatt's Perceptron: a network of processing elements (PE), with an input layer of source nodes and an output layer of neurons.

Perceptron: Multi-layer feed-forward
• A 3-4-2 network: input layer, hidden layer, output layer.

Perceptron: Learning Rule
• Err = T – O
– O is the predicted output
– T is the correct output
• Wj ← Wj + α * Ij * Err
– Ij is the activation of unit j in the input layer
– α is a constant called the learning rate

Perceptron
• Example: inputs 2 and 1 with weights 0.5 and 0.3 and a bias of −1 give 2(0.5) + 1(0.3) + (−1) = 0.3, so O = 1.
• Learning procedure (a code sketch follows below):
– Randomly assign weights (between 0 and 1)
– Present inputs from the training data
– Get the output O, and nudge the weights to move the results toward the desired output T
– Repeat; stop when there are no errors, or enough epochs have been completed
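Below is a minimal sketch of this learning procedure, using the update Wj ← Wj + α * Ij * Err with a threshold output. The bias is treated as an extra input fixed at +1; the toy OR data set, learning rate, and epoch limit are illustrative assumptions, not part of the slides.

```python
import random

def predict(weights, inputs):
    # Threshold activation over the weighted sum (the bias is inputs[0] = 1).
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= 0 else 0

def train_perceptron(data, alpha=0.1, epochs=50):
    # Randomly assign weights between 0 and 1, as in the learning procedure above.
    n = len(data[0][0])
    weights = [random.random() for _ in range(n)]
    for _ in range(epochs):
        errors = 0
        for inputs, target in data:
            err = target - predict(weights, inputs)   # Err = T - O
            if err != 0:
                errors += 1
                # Wj <- Wj + alpha * Ij * Err
                weights = [w + alpha * x * err for w, x in zip(weights, inputs)]
        if errors == 0:   # stop when no errors remain
            break
    return weights

# Toy linearly separable task (OR), with a leading bias input of +1.
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
w = train_perceptron(data)
print([predict(w, x) for x, _ in data])
```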
Least Mean Square learning
• LMS (Least Mean Square) learning is more general than the previous perceptron learning rule. The idea is to minimize the total error, measured over all training examples P.
• O is the raw output, computed as $O = \sum_i w_i I_i$.
• $\mathrm{Distance}(LMS) = \frac{1}{2}\sum_P (T_P - O_P)^2$
• E.g. if we have two patterns with T1 = 1, O1 = 0.8, T2 = 0, O2 = 0.5, then D = (0.5)[(1 − 0.8)² + (0 − 0.5)²] = 0.145.
• We want to minimize the LMS error E by gradient descent: $W_{new} = W_{old} - C\,\frac{\partial E}{\partial W}$, where C is the learning rate.

Activation Function
• To apply the LMS learning rule, also known as the delta rule, we need a differentiable activation function: $\Delta w_k = c\, I_k\, (T_j - O_j)\, f'(\text{activation})$
• Old (threshold): $O = 1$ if $\sum_i w_i I_i \ge 0$, and $O = 0$ otherwise.
• New (sigmoid): $O = \dfrac{1}{1 + e^{-\sum_i w_i I_i}}$

[Figures: the activities within a processing unit; representation of a processing unit; a neural network with two different programs; uppercase C and uppercase T; various orientations of the letters C and T; the structure of the character recognition system; the letter C in the field of view; the letter T in the field of view]

Outline
• Introduction
• Perceptron
• Back Propagation
• Recurrent network
• Hopfield Networks
• Self-Organization Maps
• Summary

Back-Propagated Delta Rule Networks (BP)
• Inputs are put through a "hidden layer" before the output layer
• All nodes are connected between layers
[Figure: inputs 0..n feeding hidden units H0 ... Hm, which feed output units O0 ... Oo]

Learning Rule
• Measure the error
• Reduce that error by appropriately adjusting each of the weights in the network

BP Network Details
• Forward pass:
– The error is calculated from the outputs
– It is used to update the output weights
• Backward pass:
– The error at the hidden nodes is calculated by back-propagating the error at the outputs through the new weights
– The hidden weights are then updated

Learning Rule – Back-propagation Network
• Erri = Ti – Oi
• Wj,i ← Wj,i + α * aj * Δi
– Δi = Erri * g′(ini)
– g′ is the derivative of the activation function g
– aj is the activation of the hidden unit
• Wk,j ← Wk,j + α * Ik * Δj
– Δj = g′(inj) * Σi Wj,i * Δi

Backpropagation Networks
• To bypass the linear classification problem, we can construct multilayer networks. Typically we have fully connected, feed-forward networks, with constant inputs of 1 serving as biases.
• $O(x) = \dfrac{1}{1 + e^{-\sum_j w_{j,x} H_j}}$, $\quad H(x) = \dfrac{1}{1 + e^{-\sum_i w_{i,x} I_i}}$
[Figure: input layer I1–I3, hidden layer H1–H2, output layer O1–O2, with weights Wi,j and Wj,k and bias units fixed at 1]

Back Propagation
• We had computed $\Delta w_k = c\, I_k (T_j - O_j) f'$. With the sigmoid $f(sum) = \frac{1}{1 + e^{-sum}}$ we have $f' = f(sum)(1 - f(sum))$.
• For the output unit k, f(sum) = O(k). For the output units this gives: $\Delta w_{j,k} = c\, H_j (T_k - O_k)\, O_k (1 - O_k)$
• For the hidden units (skipping some math), this gives: $\Delta w_{i,j} = c\, H_j (1 - H_j)\, I_i \sum_k (T_k - O_k)\, O_k (1 - O_k)\, w_{j,k}$

Learning Rule – Back-propagation Network
• $E = \frac{1}{2}\sum_i (T_i - O_i)^2$
• $\dfrac{\partial E}{\partial W_{k,j}} = -I_k \cdot \Delta_j$

BP Algorithm
Learning procedure:
1. Randomly assign weights (between 0 and 1)
2. Present inputs from the training data and propagate them to the outputs
3. Compute the outputs O and adjust the weights according to the delta rule, back-propagating the errors. The weights will be nudged closer so that the network learns to give the desired output.
4. Repeat; stop when there are no errors, or enough epochs have been completed
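A compact sketch of this procedure on a small 2-4-1 sigmoid network, using the output and hidden update forms derived above. The XOR task, layer sizes, initial weight range, learning rate, and epoch count are illustrative choices; with an unlucky initialization the run may land in a local minimum and need a different seed or more epochs.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# 2-4-1 network; each weight matrix has an extra bias column, and the
# corresponding activity vector gets a trailing constant 1.
W_ih = rng.uniform(-1, 1, size=(4, 3))   # hidden x (inputs + bias)
W_ho = rng.uniform(-1, 1, size=(1, 5))   # output x (hidden + bias)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

c = 0.5   # learning rate
for _ in range(20000):
    for x, t in zip(X, T):
        i = np.append(x, 1.0)                       # inputs I plus bias
        h = sigmoid(W_ih @ i)                       # hidden activations H_j
        hb = np.append(h, 1.0)
        o = sigmoid(W_ho @ hb)                      # outputs O_k

        delta_o = (t - o) * o * (1 - o)             # (T_k - O_k) O_k (1 - O_k)
        delta_h = h * (1 - h) * (W_ho[:, :-1].T @ delta_o)   # back-propagated error

        W_ho += c * np.outer(delta_o, hb)           # dw_{j,k} = c H_j delta_k
        W_ih += c * np.outer(delta_h, i)            # dw_{i,j} = c I_i delta_j

for x in X:
    h = np.append(sigmoid(W_ih @ np.append(x, 1.0)), 1.0)
    print(x, round(sigmoid(W_ho @ h).item(), 3))
```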
Backpropagation
• Very powerful: with enough hidden units, a network can represent essentially any function.
• It has the usual problem of generalization vs. memorization: with too many units, the network will tend to memorize the input and not generalize well. Some schemes exist to "prune" the neural network.
• Networks require extensive training and have many parameters to fiddle with. Training can be extremely slow and may also fall into local minima.
• It is an inherently parallel algorithm, ideal for multiprocessor hardware.
• Despite the cons, it is a very powerful algorithm that has seen widespread successful deployment.

Prediction by BP
[Figure: prediction with a back-propagation network]

Outline
• Introduction
• Perceptron
• Back Propagation
• Recurrent network
• Hopfield Networks
• Self-Organization Maps
• Summary

Recurrent Connections
• A sequence is a succession of patterns that relate to the same object.
• For example, letters that make up a word, or words that make up a sentence.
• Sequences can vary in length. This is a challenge: how many inputs should there be for varying-length inputs?

The simple recurrent network
• A Jordan network has connections that feed back from the output to the input layer, and some input layer units also feed back to themselves.
• It is useful for tasks that depend on a sequence of successive states.
• The network can be trained by backpropagation.
• The network has a form of short-term memory.
• The simple recurrent network (SRN) has a similar form of short-term memory.

Jordan Network
[Figure: the Jordan network]

Elman Network (SRN)
• The number of context units is the same as the number of hidden units.
[Figure: the Elman network]

Short-term memory in SRN
• The context units remember the previous internal state.
• Thus, the hidden units have the task of mapping both an external input and the previous internal state to some desired output (see the sketch below).

Recurrent network
• Recurrent network with hidden neuron(s): the unit delay operator z⁻¹ implies a dynamic system.
[Figure: recurrent network with input, hidden, and output units and z⁻¹ unit-delay feedback]
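A minimal sketch of the Elman-style forward pass described above: the context units hold a copy of the previous hidden state and feed back into the hidden layer at the next time step. The layer sizes and random weights are illustrative assumptions, and training by backpropagation is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

n_in, n_hidden, n_out = 3, 5, 2
W_xh = rng.normal(size=(n_hidden, n_in))       # input -> hidden
W_ch = rng.normal(size=(n_hidden, n_hidden))   # context -> hidden (context = previous hidden state)
W_ho = rng.normal(size=(n_out, n_hidden))      # hidden -> output

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def run_sequence(xs):
    context = np.zeros(n_hidden)               # short-term memory, initially empty
    outputs = []
    for x in xs:
        hidden = sigmoid(W_xh @ x + W_ch @ context)   # maps input + previous state
        outputs.append(sigmoid(W_ho @ hidden))
        context = hidden.copy()                # context units copy the hidden state
    return outputs

sequence = [rng.normal(size=n_in) for _ in range(4)]   # a sequence of four patterns
for y in run_sequence(sequence):
    print(np.round(y, 3))
```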
Outline
• Introduction
• Perceptron
• Back Propagation
• Recurrent network
• Hopfield Networks
• Self-Organization Maps
• Summary

Hopfield Network
• John Hopfield (1982)
– Associative memory via artificial neural networks
– A solution approach for optimization problems
– Rooted in statistical mechanics

Associative memory
• The nature of associative memory:
– part of the information is given
– the rest of the pattern is recalled

Physical Analogy with Memory
• The location of the bottom of the bowl (X0) represents the stored pattern
• The ball's initial position represents the partial knowledge
• With a corrugated surface, we can store several memories {X1, X2, ..., Xn} and recall the one which is closest to the initial state

Key Elements of an Associative Net
• It is completely described by a state vector v = (v1, v2, ..., vm)
• There is a set of stable states v¹, v², ..., vⁿ. These correspond to the stored patterns.
• The system evolves from an arbitrary starting state v to one of the stable states by decreasing its energy E

Hopfield Network
• Every node is connected to every other node
• Weights are symmetric
• It is a recurrent network
• The state of the net is given by the vector of node outputs (x1, x2, x3)

Neurons in a Hopfield Network
• The neurons are binary units
– They are either active (1) or passive (0)
– Alternatively + or –
• The network contains N neurons
• The state of the network is described as a vector of 0s and 1s: U = (u1, u2, ..., uN) = (0, 1, 0, 1, ..., 0, 0, 1)

State Transition
• Choose one node randomly and fire it.
• Calculate its activation and make the state transition: the net transits to itself, or to another state at Hamming distance one.

State Transition Diagram
• Transitions tend to take place "downhill"; the diagram is drawn to reflect the way the system decreases its energy.
• No matter where we start in the diagram, the final state will be either state 3 or state 6.

Defining Energy for the Net
• Suppose two nodes i and j are connected by a positive weight:
– if i = 1, j = 0, and j fires, the input from i through the positive weight makes j become 1
– if both nodes are 1, they reinforce each other's current output
• Define the energy of a pair as $e_{ij} = -w_{ij} x_i x_j$
• The energy of the net is the sum over all pairs: $E = \sum_{\text{pairs}} e_{ij} = -\sum_{\text{pairs}} w_{ij} x_i x_j$ (2)

The architecture of the Hopfield Network
• The network is fully interconnected
– All the neurons are connected to each other
– The connections are bidirectional and symmetric: $W_{i,j} = W_{j,i}$
– The setting of the weights depends on the application

Defining Energy for the Net
• Since the weights are symmetric: $E = -\frac{1}{2}\sum_{i,j} w_{ij} x_i x_j$ (3)
• If node k is selected, split off the terms involving k: $E = -\frac{1}{2}\sum_{i,j \ne k} w_{ij} x_i x_j - \frac{1}{2}\sum_i w_{ki} x_k x_i - \frac{1}{2}\sum_i w_{ik} x_i x_k$ (4) $= -\frac{1}{2}\sum_{i,j \ne k} w_{ij} x_i x_j - \sum_i w_{ki} x_k x_i$ (5)
• Write the k-dependent part as $S = -x_k \sum_i w_{ki} x_i$ (6), i.e. $S = -x_k a_k$ (7), where $a_k = \sum_i w_{ki} x_i$ is the activation of node k.

Defining Energy for the Net
• After node k is updated to $x'_k$, the energy is $E' = E - S + S' = E + x_k a_k - x'_k a_k$ (8), so $\Delta E = E' - E = -\Delta x_k\, a_k$ (9)
– If $a_k \ge 0$, the output goes from 0 to 1 or stays 1. In either case $\Delta x_k \ge 0$, and $\Delta E \le 0$.
– If $a_k < 0$, the output goes from 1 to 0 or stays 0. In either case $\Delta x_k \le 0$, and $\Delta E \le 0$.
– Whichever node is selected, the energy of the net decreases or stays the same.
– After at most N changes of state, a stable state has been reached.
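A small sketch of the energy argument above: for symmetric weights with no self-connections, repeatedly update one randomly chosen binary node from its activation a_k and check that the energy E = -1/2 Σ w_ij x_i x_j never increases. The random weights and network size are arbitrary values used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

N = 8
W = rng.normal(size=(N, N))
W = (W + W.T) / 2.0            # symmetric weights
np.fill_diagonal(W, 0.0)       # no self-connections

def energy(x):
    return -0.5 * x @ W @ x    # E = -1/2 sum_ij w_ij x_i x_j

x = rng.integers(0, 2, size=N).astype(float)   # binary state vector
for step in range(30):
    e_before = energy(x)
    k = rng.integers(N)                    # asynchronous update: one node at a time
    a_k = W[k] @ x                         # activation a_k = sum_i w_ki x_i
    x[k] = 1.0 if a_k >= 0 else 0.0        # threshold rule
    assert energy(x) <= e_before + 1e-12   # dE = -(x_k' - x_k) a_k <= 0

print("final state:", x.astype(int), " energy:", round(energy(x), 3))
```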
Asynchronous vs. Synchronous update
• Asynchronous update: only one node is updated at any time step
• Synchronous update: every node is updated at the same time
– This requires storing both the current state vector and the next state vector

Finding the Weights
• In a stable state:
– nodes do not change their values
– the weights reinforce the node values
– same-valued pairs are reinforced by a positive weight
– different-valued pairs are reinforced by a negative weight
• The storage prescription of the Hopfield net, with $v_i \in \{-1, 1\}$ (spin representation), is: $w_{ij} = \sum_p v_i^p v_j^p$ (1)

Hebb Rule
• Training algorithm:
for each training pattern
    present the components of the pattern at the output nodes
    if two nodes have the same value then
        make a small positive increment to the internode weight
    else
        make a small negative decrement to the internode weight
    endif
endfor

Hebb Rule
• The learning rule (Hebb rule) may be written $\Delta w_{ij} = v_i^p v_j^p$ (2)
• The original rule was proposed by Hebb (1949) in The Organization of Behavior: "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
• That is, the correlation of activity between two cells is reinforced by increasing the synaptic strength between them.

Updating the Hopfield Network
• The state of the network changes at each time step. There are four updating modes:
– Serial–Random: the state of a randomly chosen single neuron is updated at each time step
– Serial–Sequential: the state of a single neuron is updated at each time step, in a fixed sequence
– Parallel–Synchronous: all the neurons are updated at each time step synchronously
– Parallel–Asynchronous: the neurons that are not in refractoriness are updated at the same time

The updating Rule
• Here we assume that updating is serial-random. Updating continues until a stable state is reached.
– Each neuron receives a weighted sum of the inputs from the other neurons: $h_j = \sum_{i=1,\, i \ne j}^{N} u_i\, w_{j,i}$
– If the input $h_j$ is positive the state of the neuron will be 1, otherwise 0: $u_j = 1$ if $h_j \ge 0$, and $u_j = 0$ if $h_j < 0$

Convergence of the Hopfield Network
• Does the network eventually reach a stable state (convergence)?
• To evaluate this, an "energy" value is associated with the network: $E = -\frac{1}{2}\sum_j \sum_{i=1,\, i \ne j}^{N} w_{j,i}\, u_i\, u_j$
• The system will have converged when the energy is minimized

Convergence of the Hopfield Network
• Why energy? There is an analogy with spin-glass models of ferromagnetism (the Ising model):
– $h_j = \sum_i w_{j,i} u_i$: the local field exerted upon unit j
– $u_j$: the spin of unit j
– $e_j = -\frac{1}{2} h_j u_j$: the potential energy of unit j
– $E = \sum_j e_j$: the overall potential energy of the system, i.e. $E = -\frac{1}{2}\sum_j \sum_{i \ne j} w_{j,i} u_i u_j$
[Figure: a grid of binary spins (0/1 values), with the interaction weight $w_{i,j}$ depending on the distance $d_{i,j}$ between units i and j]
• The system is stable if the energy is minimized
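A sketch of associative recall using the storage prescription $w_{ij} = \sum_p v_i^p v_j^p$ with ±1 patterns and serial-random updating until the state settles. The two stored patterns, the amount of noise in the cue, and the number of update steps are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two small +/-1 patterns (made up) to store.
patterns = np.array([
    [ 1, -1,  1, -1,  1, -1,  1, -1],
    [ 1,  1,  1,  1, -1, -1, -1, -1],
])
N = patterns.shape[1]

# Storage prescription (Hebb rule): w_ij = sum_p v_i^p v_j^p, no self-connections.
W = patterns.T @ patterns
np.fill_diagonal(W, 0)

def recall(cue, steps=200):
    v = cue.copy()
    for _ in range(steps):          # serial-random updating
        k = rng.integers(N)
        h = W[k] @ v                # local field on node k
        v[k] = 1 if h >= 0 else -1
    return v

# Corrupt the first pattern in two positions and recall it from the partial cue.
cue = patterns[0].copy()
cue[[1, 2]] *= -1
print("cue     :", cue)
print("recalled:", recall(cue))
print("stored  :", patterns[0])
```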
Convergence of the Hopfield Network
• Why convergence? Recall $h_j = \sum_{i \ne j} u_i\, w_{j,i}$ and the update rule $u_j = 1$ if $h_j \ge 0$, $u_j = 0$ if $h_j < 0$. Then
$E = -\frac{1}{2}\sum_j \sum_{i \ne j} w_{j,i} u_i u_j = -\frac{1}{2}\sum_j u_j h_j$
• Consider the effect of an update on the term $u_j h_j$:
– if $h_j \ge 0$ and $u_j = 1$, then $u_j$ does not change and $u_j h_j = h_j \ge 0$
– if $h_j \ge 0$ and $u_j = 0$, then $u_j$ changes to 1 and $u_j h_j$ grows from 0 to $h_j \ge 0$
– if $h_j < 0$ and $u_j = 0$, then $u_j$ does not change and $u_j h_j = 0$
– if $h_j < 0$ and $u_j = 1$, then $u_j$ changes to 0 and $u_j h_j$ grows from $h_j < 0$ to 0
• In each case $u_j h_j$ is maximal once $u_j$ no longer changes, so $E = -\frac{1}{2}\sum_j u_j h_j$ is minimal when the $u_j$ values no longer change.

Convergence of the Hopfield Network (4)
• The change of E when node k is updated:
$\Delta E = E_{new} - E_{old} = \Big(-\tfrac{1}{2}\sum_{j \ne k} u_j h_j - \tfrac{1}{2} u_k^{new} h_k\Big) - \Big(-\tfrac{1}{2}\sum_{j \ne k} u_j h_j - \tfrac{1}{2} u_k^{old} h_k\Big) = -\tfrac{1}{2}(u_k^{new} - u_k^{old})\, h_k = -\tfrac{1}{2}\,\Delta u_k\, h_k$
– if $u_k^{old} = 1$ and $h_k \ge 0$, then $u_k^{new} = 1$, $\Delta u_k = 0$, and $\Delta E = 0$
– if $u_k^{old} = 1$ and $h_k < 0$, then $u_k^{new} = 0$, $\Delta u_k = -1$, and $\Delta E = \tfrac{1}{2} h_k < 0$
– if $u_k^{old} = 0$ and $h_k < 0$, then $u_k^{new} = 0$, $\Delta u_k = 0$, and $\Delta E = 0$
– if $u_k^{old} = 0$ and $h_k \ge 0$, then $u_k^{new} = 1$, $\Delta u_k = 1$, and $\Delta E = -\tfrac{1}{2} h_k \le 0$
• In each case the energy decreases or remains constant, so the system tends to stabilize.

The Energy Function
• The energy function is similar to a multidimensional (N-dimensional) terrain.
[Figure: an energy landscape with several local minima and a global minimum]

Hopfield network as a model for associative memory
• Associative memory
– associates different features with each other:
• Karen → green
• George → red
• Paul → blue
– recall with partial cues

Neural Network Model of associative memory
• Neurons are arranged like a grid.
[Figure: grid arrangement of neurons]

Setting the weights
• Each pattern can be denoted by a vector of −1s and 1s: $S^p = (s_1^p, s_2^p, s_3^p, \ldots, s_N^p)$ with each $s_i^p \in \{-1, 1\}$
• If the number of patterns is m, then: $w_{i,j} = \sum_{p=1}^{m} s_i^p s_j^p$
• Hebbian learning: the neurons that fire together, wire together.

Learning in Hopfield net
[Figure: learning in a Hopfield net]

Storage Capacity
• As the number of patterns m increases, the chance of accurate storage must decrease
• Hopfield's empirical work in 1982:
– About half of the memories were stored accurately in a net of N nodes if m = 0.15 N
• McEliece's analysis in 1987:
– If we require almost all of the memories to be stored accurately, then the maximum number of patterns is m = N / (2 ln N)
– For N = 100, m ≈ 11

Limitations of the Hopfield Net
• The number of patterns that can be stored and accurately recalled is severely limited
– If too many patterns are stored, the net may converge to a novel spurious pattern: a no-match output
• An exemplar pattern will be unstable if it shares many bits in common with another exemplar pattern

Outline
• Introduction
• Perceptron
• Back Propagation
• Recurrent network
• Hopfield Networks
• Self-Organization Maps
• Summary

Self-Organization Maps
• Kohonen (1982, 1984)
• In biological systems
– cells tuned to similar orientations tend to be physically located in proximity to one another
– microelectrode studies with cats
• Orientation tuning over the cortical surface forms a kind of map, with similar tunings found close to each other
– a topographic feature map
• Train a network using competitive learning to create such feature maps automatically

SOM Clustering
• Self-organizing map (SOM)
– An unsupervised artificial neural network
– Maps high-dimensional data into a two-dimensional representation space
– Similar documents may be found in neighboring regions
• Disadvantages
– Fixed size in terms of the number of units and their particular arrangement
– Hierarchical relations between the input data are not mirrored in a straightforward manner

Features
• Kohonen's algorithm creates a vector quantizer by adjusting the weights from common input nodes to M output nodes
• Continuous-valued input vectors are presented without specifying the desired output
• After learning, the weights will be organized such that topologically close nodes are sensitive to inputs that are physically similar
• Output nodes will thus be ordered in a natural manner

Initial setup of SOM
• Consists of a set of units i arranged in a two-dimensional grid
• Each unit i is assigned a weight vector mi of the same dimension as the input data
• The initial weight vectors are assigned random values

Winner Selection
• Pick a random input vector x(t)
• Compute the unit c with the highest activity level (the winner c(t)) by the Euclidean distance formula

Learning Process (Adaptation)
• Guide the adaptation by a learning rate (tune the weight vectors from their random initialization values towards the actual input space)
• Move a decreasing neighborhood around the winner towards the currently presented input pattern (mapping inputs onto regions close to each other in the grid of output patterns; this can be viewed as a neural-network version of k-means clustering)
[Figure: the learning process (adaptation)]

Neighborhood Strategy
• Neighborhood kernel h_ci
• A Gaussian is used to define the neighborhood kernel
• ||r_c − r_i||² denotes the (squared) grid distance between the winner node c and unit i
• A time-varying width parameter enables the formation of large clusters in the beginning and fine-grained input discrimination towards the end of the learning process

[Figure: within-cluster distance vs. between-clusters distance]

SOM Algorithm
• Initialise the network → get input → find the focus (winner) → update the focus → update the neighbourhood → adjust the neighbourhood size

Learning
• Decreasing the neighborhood ensures that progressively finer features are encoded
• Gradual lowering of the learning rate ensures stability

Basic Algorithm
– Initialize the map (randomly assign weights)
– Loop over the training examples:
• Assign the input unit values according to the values in the current example
• Find the "winner", i.e. the output unit that most closely matches the input units, using some distance metric: for all output units j = 1 to m and input units i = 1 to n, find the unit that minimizes $\sum_{i=1}^{n} (W_{ij} - I_i)^2$
• Modify the weights on the winner to more closely match the input: $W^{t+1} = W^t + c\,(X_i^t - W^t)$, where c is a small positive learning constant that usually decreases as the learning proceeds
A code sketch of this loop follows below.
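Here is a sketch of the basic algorithm just listed, extended with the Gaussian neighborhood kernel from the neighborhood-strategy slide. The grid size, the decay schedules for the learning rate and neighborhood radius, and the random training data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

grid = 5                     # 5 x 5 output map
dim = 3                      # dimension of the input vectors
W = rng.random((grid, grid, dim))   # randomly assigned weights

# Grid coordinates r_i of every output unit, used for the neighborhood kernel.
coords = np.stack(np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij"), axis=-1)

X = rng.random((200, dim))   # toy training examples

epochs = 30
for t in range(epochs):
    c = 0.5 * (1 - t / epochs)             # learning rate, decreasing over time
    sigma = 2.0 * (1 - t / epochs) + 0.3   # neighborhood radius, decreasing over time
    for x in X:
        # Winner: the unit whose weight vector minimizes sum_i (W_ij - I_i)^2.
        d2 = ((W - x) ** 2).sum(axis=-1)
        win = np.unravel_index(np.argmin(d2), d2.shape)
        # Gaussian neighborhood kernel h_ci = exp(-||r_c - r_i||^2 / (2 sigma^2)).
        grid_d2 = ((coords - np.array(win)) ** 2).sum(axis=-1)
        h = np.exp(-grid_d2 / (2 * sigma ** 2))
        # W(t+1) = W(t) + c * h * (x - W(t)), applied to the winner and its neighbors.
        W += c * h[..., None] * (x - W)

print("weight vector of unit (0, 0):", np.round(W[0, 0], 3))
```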
Result of Algorithm
• Initially, some output nodes will randomly be a little closer to some particular type of input
• These nodes become "winners" and the weights move them even closer to those inputs
• Over time, nodes in the output layer become representative prototypes for examples in the input
• Note that there is no supervised training here
• Classification: given a new input, the class is the output node that is the winner

Typical Usage: 2D Feature Map
• In typical usage the output nodes form a 2D "map" organized in a grid-like fashion, and we update weights in a neighborhood around the winner.
[Figure: a 5×5 grid of output units O11 ... O55 above an input layer I1, I2, I3, ...]

Modified Algorithm
– Initialize the map (randomly assign weights)
– Loop over the training examples:
• Assign the input unit values according to the values in the current example
• Find the "winner", i.e. the output unit that most closely matches the input units, using some distance metric
• Modify the weights on the winner to more closely match the input
• Modify the weights in a neighborhood around the winner so that the neighbors on the 2D map also become closer to the input
– Over time this will tend to cluster similar items closer together on the map

Updating the Neighborhood
• Node O44 is the winner
– The update $W^{t+1} = W^t + c\,(X_i^t - W^t)$ is applied with a scaled constant for the neighbors, e.g. c = 1 for the winner, c = 0.75 for the first ring of neighbors, and c = 0.5 for the next ring
• Consider what happens if O42 is the winner for some other input: the two winners "fight" over claiming O43, O33, O53

Selecting the Neighborhood
• Typically, a "sombrero function" or a Gaussian function is used
• The neighborhood size usually decreases over time to allow initial "jockeying for position" and then "fine-tuning" as the algorithm proceeds

Hierarchical and Partitive Approaches
• Partitive algorithm (a code sketch follows below):
– Determine the number of clusters
– Initialize the cluster centers
– Compute the partitioning for the data
– Compute (update) the cluster centers
– If the partitioning is unchanged (or the algorithm has converged), stop; otherwise, return to step 3
• k-means error function
– The aim is to minimize the error function
• Hierarchical clustering algorithm (dendrogram):
– Initialize: assign each vector to its own cluster
– Compute the distances between all clusters
– Merge the two clusters that are closest to each other
– Return to step 2 until there is only one cluster left
• Partition strategy
– Cut the dendrogram at different levels
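A short sketch of the partitive (k-means) loop listed above: initialize the centers, assign each vector to its nearest center, recompute the centers, and stop when the partitioning no longer changes. The number of clusters and the synthetic data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def kmeans(X, k, max_iters=100):
    # 1-2. Choose the number of clusters and initialize the cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iters):
        # 3. Compute the partitioning: assign each vector to its nearest center.
        new_labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=-1), axis=1)
        # 5. Stop if the partitioning is unchanged; otherwise continue.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # 4. Update the cluster centers as the mean of their assigned vectors.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

X = np.vstack([rng.normal(loc, 0.3, size=(30, 2)) for loc in ([0, 0], [3, 3], [0, 3])])
centers, labels = kmeans(X, k=3)
print(np.round(centers, 2))
```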
Hierarchical SOM
• GHSOM – Growing Hierarchical Self-Organizing Map
– grows in size in order to represent a collection of data at a particular level of detail

Plastic Self-Organising Maps
• A family of similar networks
• Adds neurons using an error threshold
• Uses link lengths for pruning
• Unconnected neurons are removed
• Converges quickly

PSOM Architecture
[Figure: PSOM architecture, showing a neuron vector with neuron weights such as 0.1, 0.6, 0.2, 0.4]

PSOM Algorithm
[Figure: PSOM algorithm flowchart — initialise; accept input; find the focus; if the Euclidean distance between the input and the focus is larger than An, create neurons, otherwise update the focus and its neighbourhood; age links; remove long links; remove unconnected neurons]

Important Parameters
• An is the node-building parameter and controls the ease of adding neurons
• Ba is the link-aging parameter and controls the speed of aging

Letter and Word recognition (Rumelhart & Zipser, 1986)
• Training set {AA, AB, BA, BB}
– 2 units learn to detect either A or B in a particular serial position
– 4 units learn the pairs: a word detector
• Training set {AA, AB, AC, AD, BA, BB, BC, BD}
– 2 units learn to respond to pairs starting with A or B
– 4 units learn to recognize pairs ending with A, B, C, or D
– 8 units learn the pairs

SOM-Graphics
• A well-trained net should have the same topology as that of the physical space, and will reflect the properties of the training set

Face Recognition
• 90% accuracy in learning head pose and in recognizing 1 of 20 faces

Handwritten digit recognition
[Figure: handwritten digit recognition]

Summary
Basics of neural network theory and practice for supervised and unsupervised learning. The most popular neural network models:
• Perceptron
• Back Propagation
• Recurrent network
• Hopfield Networks
• Self-Organization Maps

References
[1] Simon Haykin. Neural Networks: A Comprehensive Foundation. 2nd Edition, Prentice Hall, 1998.
[2] Neural Networks Software. http://cortex.snowseed.com/cortex.htm
[3] Sun Hwa Hahn. Neural Network (IV): Hopfield Net & Kohonen's Self-Organization Net. Information & Communication University. http://ai.kaist.ac.kr/~jkim/cs570-2000/Lectures/Dr.SHHahn's/NeuralNetwork4.ppt
[4] Harry R. Erwin. Neural Network Toolbox. http://www.cet.sunderland.ac.uk/

www.intsci.ac.cn/shizz
Thank you!