Neural Networks
Kostas Kontogiannis
E&CE

General Concepts
• Neuron: the cell that performs information processing in the brain. It is the fundamental functional unit of all nervous system tissue, including the brain.
• Soma: the neuron’s cell body.
• Dendrites: a collection of fibers branching out of the soma.
• Axon: a single long fiber branching out of the soma. Eventually, the axon also branches into strands and substrands that connect to the dendrites and cell bodies of other neurons.
• Synapse: the point where strands from two neurons connect.

Neural Networks
• A neural network is composed of a number of nodes, or units, connected by links. Each link has a numeric weight associated with it.
• Weights are the primary means of long-term storage in neural networks, and learning usually takes place by updating the weights.
• Each unit has a set of input links from other units, a set of output links to other units, a current activation level, and a means of computing the activation level at the next step in time, given its inputs and weights.

Neural Networks
• To build a neural network to perform some task, one must first decide how many units are to be used, what kind of units, and how the units are connected to form a network.
• One then initializes the weights of the network, and “trains” the weights using a learning algorithm applied to a set of training examples for the task.
• The use of examples also implies that one must decide how to encode the examples in terms of inputs and outputs of the network.
Simple Computing Elements
• Each unit performs a simple computation: it receives signals from its input links and computes a new activation level that it sends along each of its output links.
• The computation of the activation level is based on the values of each input signal received from a neighboring node, and the weights on each input link.
• The computation is split into two components. The first is a linear component, the input function $in_i$, which computes the weighted sum of the unit’s input values. The second is a nonlinear component, the activation function $g$, which transforms the weighted sum into the final value that serves as the unit’s activation value $a_i$.

Models for Activation Functions
• Different models are obtained by using different mathematical functions for $g$. Three common choices are the step, sign, and sigmoid functions.
[Figure: plots of the step, sign, and sigmoid activation functions against $in_i$. Axis labels: +1 on all three plots, threshold $t$ on the step plot, -1 on the sign plot.]

Network Structures
• There are a variety of kinds of network structure, each of which results in very different computational properties.
• The main distinction is between feed-forward and recurrent networks.
• In a feed-forward network, links are unidirectional and there are no cycles; in essence these networks are directed acyclic graphs (DAGs). In a recurrent network, the links can form arbitrary topologies.
• Usually we deal with networks that are arranged in layers. In a layered feed-forward network, each unit is linked only to the units in the next layer; there are no links between units in the same layer, no links backward to a previous layer, and no links that skip a layer.

Fundamental Network Types
• Hopfield Networks: they use bidirectional connections with symmetric weights; all of the units are both input and output units; the activation function $g$ is the sign function; and the activation levels can only be +1 or -1.
• Boltzmann Machines: also use symmetric weights, but include units that are neither input nor output units.
They also use a stochastic activation function, such that the probability of the output being 1 is some function of the total weighted input.
• Networks with no hidden units are called perceptrons.
• Input units are directly connected to the external input sources. Output units are connected to the observed output. Hidden units are connected to neither the input sources nor the observed output.
• Networks with one or more layers of hidden units are called multi-layer networks.

Perceptron Neural Network Learning
function NEURAL-NETWORK-LEARNING(examples) returns network
    network = a network with randomly assigned weights;
    repeat
        for each e in examples do
            O = NEURAL-NETWORK-OUTPUT(network, e);
            T = the observed output values from e;
            update the weights in network based on e, O, T;
        end
    until all examples correctly predicted or stopping criterion is reached
    return network

Essentially, the weight update is:
  $Err = T - O$
  $W_j \leftarrow W_j + \alpha \, I_j \, Err$
where $\alpha$ is the learning rate and $I_j$ is the value of input $j$.

Multi-Layer Feed-Forward Networks
• Initial work dates to the 1950s.
• Learning algorithms for multi-layer networks are neither efficient nor guaranteed to converge to a global optimum.
• On the other hand, learning general functions from examples is an intractable problem in the worst case.
• The most popular method for learning in multi-layer networks is called back-propagation.

Back Propagation Learning
• Learning in multi-layer feed-forward networks using back-propagation proceeds the same way as for perceptrons: example inputs are presented to the network, and if the network computes an output vector that matches the target, nothing is done. If there is an error, then the weights are adjusted to reduce the error.
• The trick is to assess the blame for an error and divide it among the contributing weights. In perceptrons, this is easy because there is only one weight between each input and the output.
But in multi-layer networks, there are many weights connecting each input to an output, and each of these weights contributes to more than one output.
• The back-propagation algorithm is a sensible approach to dividing the contribution of each weight.

Back Propagation Learning
• As in the perceptron learning algorithm, we try to minimize the error between each target output and the output value computed by the network.
• At the output layer, the weight update rule is very similar to the rule for perceptrons. However, there are two differences: the activation of the hidden unit $a_j$ is used instead of the input value, and the rule contains a term for the gradient of the activation function.
• If $Err_i$ is the error $T_i - O_i$ at output node $i$, then the weight update rule for the link from unit $j$ to unit $i$ is
  $W_{j,i} \leftarrow W_{j,i} + \alpha \, a_j \, Err_i \, g'(in_i)$
• where $g'$ is the derivative of the activation function $g$. Defining $\Delta_i = Err_i \, g'(in_i)$, the above can be rewritten as:
  $W_{j,i} \leftarrow W_{j,i} + \alpha \, a_j \, \Delta_i$

Back Propagation Learning
• For updating the connections between the input units and the hidden units, we need to define a quantity analogous to the error term for output nodes.
• The idea is that hidden node $j$ is “responsible” for some fraction of the error $\Delta_i$ in each of the output nodes to which it connects. Thus, the $\Delta_i$ values are divided according to the strength of the connection between the hidden node and the output node, and propagated back to provide the $\Delta_j$ values for the hidden layer.
The propagation rule for the $\Delta$ values is the following:
  $\Delta_j = g'(in_j) \sum_i W_{j,i} \, \Delta_i$
• Now the update rule for the weights between the inputs and the hidden layer is almost identical to the update rule for the output layer:
  $W_{k,j} \leftarrow W_{k,j} + \alpha \, I_k \, \Delta_j$

Back Propagation Learning
• The learning algorithm can be summarized as follows:
  – Compute the $\Delta$ values for the output units using the observed error
  – Starting with the output layer, repeat the following for each layer in the network, until the earliest (closest to input) hidden layer is reached:
    • Propagate the $\Delta$ values back to the previous layer
    • Update the weights between the two layers

Back Propagation Learning Algorithm
Algorithm Back-Prop-Update(network, examples, alpha) : new network weights
    repeat
        for each e in examples do
            O = Run-Network(network, I^e)
            Err^e = T^e - O
            Delta_i = Err^e_i * g'(in_i)
            W_{j,i} = W_{j,i} + (alpha * a_j * Delta_i)
            for each subsequent layer in network do
                Delta_j = g'(in_j) * Sum_i (W_{j,i} * Delta_i)
                W_{k,j} = W_{k,j} + (alpha * I_k * Delta_j)
            end
        end
    until network has converged

Discussion
• Expressiveness: well suited for continuous inputs and outputs, but neural networks do not have the expressive power of general logical representations.
• Computational efficiency: for m examples and |W| weights, each epoch takes O(m|W|) time. The worst-case number of epochs is exponential in the number of inputs.
• Generalization: good at generalizing on functions that vary smoothly with the input.
• Sensitivity to noise: tolerant of noise in the input data, since the networks essentially perform nonlinear regression.
• Transparency: neural networks are essentially black boxes.
• Prior knowledge: it is difficult to choose good training examples and the best network topology.
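
Back Propagation Learning: Example Sketch
The Back-Prop-Update procedure and its update rules can be illustrated with a minimal Python sketch. It is not the slides' own implementation. It assumes a single hidden layer and the sigmoid as the activation function g. The function names (g, g_prime, run_network, back_prop_update, squared_error), the list-of-lists weight layout, the constant 1.0 input used as a bias, and the fixed epoch count standing in for "until network has converged" are all illustrative assumptions.

```python
import math
import random


def g(x):
    """Sigmoid activation function (assumed choice of g)."""
    return 1.0 / (1.0 + math.exp(-x))


def g_prime(x):
    """Derivative of the sigmoid: g'(x) = g(x) * (1 - g(x))."""
    s = g(x)
    return s * (1.0 - s)


def run_network(w_hidden, w_out, inputs):
    """Forward pass. w_hidden[k][j] links input k to hidden unit j;
    w_out[j][i] links hidden unit j to output unit i."""
    n_hidden, n_out = len(w_out), len(w_out[0])
    in_hidden = [sum(w_hidden[k][j] * inputs[k] for k in range(len(inputs)))
                 for j in range(n_hidden)]
    a_hidden = [g(x) for x in in_hidden]
    in_out = [sum(w_out[j][i] * a_hidden[j] for j in range(n_hidden))
              for i in range(n_out)]
    outputs = [g(x) for x in in_out]
    return in_hidden, a_hidden, in_out, outputs


def back_prop_update(w_hidden, w_out, examples, alpha):
    """One pass over the examples, updating both weight matrices in place."""
    for inputs, targets in examples:
        in_hidden, a_hidden, in_out, outputs = run_network(w_hidden, w_out, inputs)
        # Delta_i = Err_i * g'(in_i) at each output unit.
        delta_out = [(t - o) * g_prime(x)
                     for t, o, x in zip(targets, outputs, in_out)]
        # Delta_j = g'(in_j) * sum_i W_{j,i} * Delta_i, computed before W_{j,i} changes.
        delta_hidden = [g_prime(in_hidden[j]) *
                        sum(w * d for w, d in zip(w_out[j], delta_out))
                        for j in range(len(a_hidden))]
        # W_{j,i} <- W_{j,i} + alpha * a_j * Delta_i
        for j, a_j in enumerate(a_hidden):
            for i, d_i in enumerate(delta_out):
                w_out[j][i] += alpha * a_j * d_i
        # W_{k,j} <- W_{k,j} + alpha * I_k * Delta_j
        for k, i_k in enumerate(inputs):
            for j, d_j in enumerate(delta_hidden):
                w_hidden[k][j] += alpha * i_k * d_j


def squared_error(w_hidden, w_out, examples):
    """Sum of 0.5 * (T_i - O_i)^2 over all examples and output units."""
    total = 0.0
    for inputs, targets in examples:
        outputs = run_network(w_hidden, w_out, inputs)[3]
        total += sum(0.5 * (t - o) ** 2 for t, o in zip(targets, outputs))
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    # Two real inputs plus a constant 1.0 (assumed bias input); three hidden units.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]
    w_out = [[rng.uniform(-0.5, 0.5)] for _ in range(3)]
    examples = [([0.0, 0.0, 1.0], [0.0]), ([0.0, 1.0, 1.0], [1.0]),
                ([1.0, 0.0, 1.0], [1.0]), ([1.0, 1.0, 1.0], [1.0])]  # logical OR
    print("error before:", squared_error(w_hidden, w_out, examples))
    for _ in range(1000):  # fixed epoch count stands in for "until converged"
        back_prop_update(w_hidden, w_out, examples, alpha=0.5)
    print("error after:", squared_error(w_hidden, w_out, examples))
```

The hidden-layer Delta values are computed from the old output-layer weights before those weights are updated, which is one reasonable reading of the pseudocode. Weights are updated after every example rather than once per epoch, matching the per-example loop in Back-Prop-Update.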
