Artificial Neural Network Paradigms
Marc Pomplun
Department of Computer Science
University of Massachusetts at Boston
E-mail: [email protected]
Homepage: http://www.cs.umb.edu/~marc/

Overview:
• The Backpropagation Network (BPN)
• Supervised Learning in the BPN
• The Self-Organizing Map (SOM)
• Unsupervised Learning in the SOM
• Instantaneous Learning: The Hopfield Network

The Backpropagation Network

The backpropagation network (BPN) is the most popular type of ANN for applications such as classification or function approximation. Like other networks using supervised learning, the BPN is not biologically plausible. The structure of the network is identical to the one we discussed before:
• Three (sometimes more) layers of neurons,
• Only feedforward processing: input layer → hidden layer → output layer,
• Sigmoid activation functions.

BPN units and activation functions:
[Figure: a three-layer BPN mapping an input vector x to an output vector y, with input units I_1, …, I_I, hidden units H_1, …, H_J using activation f(net^h), and output units O_1, …, O_K using activation f(net^o).]

Supervised Learning in the BPN

Before the learning process starts, all weights (synapses) in the network are initialized with pseudorandom numbers. We also have to provide a set of training patterns (exemplars). They can be described as a set of ordered vector pairs {(x_1, y_1), (x_2, y_2), …, (x_P, y_P)}.

Then we can start the backpropagation learning algorithm. This algorithm iteratively minimizes the network's error by computing the gradient of the error surface in weight space and adjusting the weights in the opposite direction (gradient-descent technique).

Gradient-descent example: finding the absolute minimum of a one-dimensional error function f(x):
[Figure: the slope f'(x_0) at the current position x_0 determines the next position x_1 = x_0 - \eta f'(x_0).]
Repeat this step iteratively until, for some x_i, f'(x_i) is sufficiently close to 0.

Gradients of two-dimensional functions: the two-dimensional function in the left diagram is represented by contour lines in the right diagram, where arrows indicate the gradient of the function at different locations. The gradient always points in the direction of the steepest increase of the function, so in order to find the function's minimum we should always move against the gradient.

In the BPN, learning is performed as follows:

1. Randomly select a vector pair (x_p, y_p) from the training set and call it (x, y).
2. Use x as input to the BPN and successively compute the outputs of all neurons in the network (bottom-up) until you get the network output o.
3. Compute the error \delta^o_{pk} for the pattern p across all K output-layer units by using the formula:
   \delta^o_{pk} = (y_k - o_k) \, f'(net^o_k)
4. Compute the error \delta^h_{pj} for all J hidden-layer units by using the formula:
   \delta^h_{pj} = f'(net^h_j) \sum_{k=1}^{K} \delta^o_{pk} \, w_{kj}
5. Update the connection weights to the hidden layer by using the following equation:
   w_{ji}(t+1) = w_{ji}(t) + \eta \, \delta^h_{pj} \, x_i
6. Update the connection weights to the output layer by using the following equation:
   w_{kj}(t+1) = w_{kj}(t) + \eta \, \delta^o_{pk} \, f(net^h_j)

Repeat steps 1 to 6 for all vector pairs in the training set; this is called a training epoch.
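To make steps 1 to 6 concrete, here is a minimal NumPy sketch of one training epoch for a single hidden layer, using the sigmoid derivative f'(net) = o(1 - o) derived below. The array names, the fixed learning rate eta, and the omission of bias units are illustrative choices, not part of the slides:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_epoch(W_hid, W_out, X, Y, eta=0.5):
    """One training epoch: apply steps 1-6 to every (x, y) pair.

    W_hid: J x I weight matrix (input -> hidden)
    W_out: K x J weight matrix (hidden -> output)
    X:     P x I matrix of input vectors
    Y:     P x K matrix of target vectors
    """
    for p in np.random.permutation(len(X)):          # step 1: pick a pattern
        x, y = X[p], Y[p]
        h = sigmoid(W_hid @ x)                        # step 2: hidden outputs f(net^h)
        o = sigmoid(W_out @ h)                        #         network output f(net^o)
        delta_o = (y - o) * o * (1 - o)               # step 3: output error, f'(net) = o(1-o)
        delta_h = h * (1 - h) * (W_out.T @ delta_o)   # step 4: hidden error
        W_hid += eta * np.outer(delta_h, x)           # step 5: update input->hidden weights
        W_out += eta * np.outer(delta_o, h)           # step 6: update hidden->output weights
    return W_hid, W_out
```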
Run as many epochs as required to reduce the network error E below a threshold \epsilon:
E = \sum_{p=1}^{P} \sum_{k=1}^{K} (\delta^o_{pk})^2

The only thing that we still need to know before we can start our network is the derivative of our sigmoid function, for example f'(net_k) for the output neurons:
f(net_k) = \frac{1}{1 + e^{-net_k}}
f'(net_k) = \frac{\partial f(net_k)}{\partial net_k} = o_k (1 - o_k)

Now our BPN is ready to go! If we choose the type and number of neurons in our network appropriately, after training the network should show the following behavior:
• If we input any of the training vectors, the network should yield the expected output vector (with some margin of error).
• If we input a vector that the network has never "seen" before, it should be able to generalize and yield a plausible output vector based on its knowledge about similar input vectors.

Self-Organizing Maps (Kohonen Maps)

In the BPN, we used supervised learning. This is not biologically plausible: in a biological system, there is no external "teacher" who manipulates the network's weights from outside the network. Unsupervised learning is biologically more adequate. We will study Self-Organizing Maps (SOMs) as examples of unsupervised learning (Kohonen, 1980).

In the human cortex, multi-dimensional sensory input spaces (e.g., visual input, tactile input) are represented by two-dimensional maps. The projection from sensory inputs onto such maps is topology conserving: neighboring areas in these maps represent neighboring areas in the sensory input space. For example, neighboring areas in the sensory cortex are responsible for the arm and hand regions.

Such topology-conserving mapping can be achieved by SOMs:
• Two layers: input layer and output (map) layer,
• Input and output layers are completely connected,
• Output neurons are interconnected within a defined neighborhood,
• A topology (neighborhood relation) is defined on the output layer.

Common output-layer structures:
[Figure: a one-dimensional (completely interconnected) output layer and a two-dimensional output layer (connections omitted, only neighborhood relations shown), each with the neighborhood of a neuron i highlighted.]

A neighborhood function \varphi(i, k) indicates how closely neurons i and k in the output layer are connected to each other. Usually, a Gaussian function of the distance between the positions of the two neurons in the layer is used, for example
\varphi(i, k) = \exp\!\left(-\frac{\lVert pos_i - pos_k \rVert^2}{2\sigma^2}\right)

Unsupervised Learning in SOMs

For an n-dimensional input space and m output neurons:
(1) Choose a random weight vector w_i for every neuron i, i = 1, ..., m.
(2) Choose a random input x.
(3) Determine the winner neuron k: \lVert w_k - x \rVert = \min_i \lVert w_i - x \rVert (Euclidean distance).
(4) Update the weight vectors of all neurons i in the neighborhood of neuron k:
    w_i := w_i + \eta \cdot \varphi(i, k) \cdot (x - w_i)   (w_i is shifted towards x)
(5) If the convergence criterion is met, STOP. Otherwise, narrow the neighborhood function and the learning rate \eta and go to (2).

Example I: Learning a one-dimensional representation of a two-dimensional (triangular) input space.
[Figure: snapshots of the map after 0, 20, 100, 1000, 10000, and 25000 iterations.]

Example II: Learning a two-dimensional representation of a two-dimensional (square) input space.

Example III: Learning a two-dimensional mapping of texture images.
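The learning procedure (1)-(5) above can be sketched in a few lines of NumPy. This is a minimal example assuming a two-dimensional grid of map neurons, a Gaussian neighborhood on grid distance, and exponentially decaying learning rate and neighborhood width; the function name, grid size, and decay schedule are illustrative choices, not taken from the slides:

```python
import numpy as np

def train_som(X, grid_w=10, grid_h=10, steps=10000, eta0=0.5, sigma0=3.0):
    """Train a 2-D SOM on input vectors X (shape: num_samples x dim)."""
    dim = X.shape[1]
    # (1) random weight vector for every map neuron
    W = np.random.rand(grid_w * grid_h, dim)
    # fixed positions of the neurons on the output grid
    pos = np.array([(r, c) for r in range(grid_h) for c in range(grid_w)], dtype=float)

    for t in range(steps):
        eta = eta0 * np.exp(-t / steps)        # (5) shrink learning rate ...
        sigma = sigma0 * np.exp(-t / steps)    # ... and neighborhood width over time
        x = X[np.random.randint(len(X))]       # (2) choose a random input
        k = np.argmin(np.linalg.norm(W - x, axis=1))   # (3) winner neuron
        # (4) Gaussian neighborhood phi(i, k) on grid distance; update all weights
        d2 = np.sum((pos - pos[k]) ** 2, axis=1)
        phi = np.exp(-d2 / (2 * sigma ** 2))
        W += eta * phi[:, None] * (x - W)
    return W
```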
The Hopfield Network

The Hopfield model is a single-layered recurrent network. It is usually initialized with appropriate weights instead of being trained. The network structure looks as follows:
[Figure: a single layer of fully interconnected units X_1, X_2, …, X_N.]

We will focus on the discrete Hopfield model, because its mathematical description is more straightforward. In the discrete model, the output of each neuron is either 1 or -1. In its simplest form, the output function is the sign function, which yields 1 for arguments \geq 0 and -1 otherwise.

For input-output pairs (x_1, y_1), (x_2, y_2), …, (x_P, y_P), we can initialize the weights in the following way (like an associative memory):
W = \frac{1}{N} \sum_{p=1}^{P} y_p x_p^T
This is identical to the following formula:
w_{ij} = \frac{1}{N} \sum_{p=1}^{P} y_p^{(i)} x_p^{(j)}
where x_p^{(j)} is the j-th component of vector x_p, and y_p^{(i)} is the i-th component of vector y_p.

In the discrete version of the model, each component of an input or output vector can only assume the values 1 or -1. The output of neuron i at time t is then computed according to the following formula:
o_i(t) = \operatorname{sgn}\!\left(\sum_{j=1}^{N} w_{ij} \, o_j(t-1)\right)
This recursion can be performed over and over again. In some network variants, external input is added to the internal, recurrent one.

Usually, the vectors x_p are not orthonormal, so it is not guaranteed that whenever we input some pattern x_p, the output will be y_p; instead, the output will be a pattern similar to y_p. Since the Hopfield network is recurrent, its behavior depends on its previous state and is, in the general case, difficult to predict.

However, what happens if we initialize the weights with a set of patterns so that each pattern is associated with itself, (x_1, x_1), (x_2, x_2), …, (x_P, x_P)? This initialization is performed according to the following equation:
w_{ij} = \frac{1}{N} \sum_{p=1}^{P} x_p^{(i)} x_p^{(j)}
You see that the weight matrix is symmetric, i.e., w_{ij} = w_{ji}. We also demand that w_{ii} = 0, in which case the network shows an interesting behavior: it can be mathematically proven that under these conditions the network will reach a stable activation state within a finite number of iterations.

And what does such a stable state look like? The network associates input patterns with themselves, which means that in each iteration the activation pattern is drawn towards one of those patterns. After converging, the network will most likely present one of the patterns that it was initialized with. Therefore, Hopfield networks can be used to restore incomplete or noisy input patterns.

Example: Image reconstruction (Ritter, Schulten & Martinetz, 1990). A 20×20 discrete Hopfield network was trained with 20 input patterns, including the "face" pattern shown in the left figure and 19 random patterns like the one shown on the right.

After providing only one fourth of the "face" image as initial input, the network is able to perfectly reconstruct that image within only two iterations.

Adding noise by changing each pixel with a probability p = 0.3 does not impair the network's performance; after two steps the image is perfectly reconstructed.

However, for noise created with p = 0.4, the network is unable to restore the original image. Instead, it converges towards one of the 19 random patterns.

The Hopfield model constitutes an interesting neural approach to identifying partially occluded objects and objects in noisy images, which are among the toughest problems in computer vision.
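As an illustration of the auto-associative case described above, here is a minimal NumPy sketch of the weight initialization w_{ij} = (1/N) \sum_p x_p^{(i)} x_p^{(j)} with w_{ii} = 0 and the recall iteration o_i(t) = sgn(\sum_j w_{ij} o_j(t-1)). The function names, the synchronous update of all units, and the example pattern sizes are choices made here, not taken from the slides:

```python
import numpy as np

def hopfield_weights(patterns):
    """Initialize weights from patterns (rows of +/-1 values), with w_ii = 0."""
    P, N = patterns.shape
    W = patterns.T @ patterns / N      # (1/N) * sum_p x_p^(i) x_p^(j)
    np.fill_diagonal(W, 0.0)           # demand w_ii = 0
    return W

def recall(W, x, max_iters=100):
    """Iterate o(t) = sgn(W o(t-1)) until the state stops changing."""
    o = x.copy()
    for _ in range(max_iters):
        o_new = np.where(W @ o >= 0, 1, -1)   # sign function: 1 for arguments >= 0
        if np.array_equal(o_new, o):          # stable state reached
            return o_new
        o = o_new
    return o

# Example: store two random +/-1 patterns and recover one from a noisy copy
rng = np.random.default_rng(0)
patterns = rng.choice([-1, 1], size=(2, 400))     # e.g., two 20x20 images, flattened
W = hopfield_weights(patterns)
noisy = patterns[0] * rng.choice([1, -1], size=400, p=[0.7, 0.3])  # flip ~30% of pixels
print(np.array_equal(recall(W, noisy), patterns[0]))
```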