ARTIFICIAL INTELLIGENCE [INTELLIGENT AGENTS PARADIGM]
LEARNING IN NEURAL NETWORKS
Professor Janis Grundspenkis
Riga Technical University
Faculty of Computer Science and Information Technology
Institute of Applied Computer Systems
Department of Systems Theory and Design
E-mail: [email protected]

LEARNING IN NEURAL NETWORKS (1)
• An (artificial) neural network is composed of a number of simple arithmetic computing elements or nodes (units), connected by links.
– Synonyms: connectionism, parallel distributed processing, and neural computation.
• From a biological viewpoint, a neural network is a mathematical model of the operation of the brain.
• The nodes of a neural network correspond to neurons.
• A neuron is a cell in the brain whose principal function is the collection, processing, and dissemination of electrical signals.

LEARNING IN NEURAL NETWORKS (2)
[Figure: a biological neuron – cell body (soma) with nucleus, dendrites, an axon with axonal arborization, and synapses connecting to the axons of other cells.]
• The brain's information-processing capacity is thought to emerge primarily from networks of such neurons.

COMPARING BRAINS WITH COMPUTERS (1)

                      Computer               Human Brain
 Computational units  1 CPU, 10^8 gates      10^11 neurons
 Storage units        10^10 bits RAM,        10^11 neurons,
                      10^11 bits disk        10^14 synapses
 Cycle time           10^-9 sec              10^-3 sec
 Bandwidth            10^10 bits/sec         10^14 bits/sec
 Memory updates/sec   10^9                   10^14

COMPARING BRAINS WITH COMPUTERS (2)
COMPARISON OF THE MEMORY
• The human brain is evolving very slowly, whereas computer memories are growing rapidly.
COMPARISON OF SWITCHING SPEED AND PARALLELISM
• Computer chips can execute an instruction in tens of nanoseconds, whereas neurons require milliseconds to fire.
• Most current computers have only one or at most a few CPUs, whereas all neurons and synapses are active simultaneously (parallel processing).
COMPARING BRAINS WITH COMPUTERS (3)
• A neural network running on a serial computer requires hundreds of cycles to decide whether a single neuron-like unit will fire, whereas in a human brain all the neurons do this in a single step.
• CONCLUSION: Even though a computer is a million times faster in raw switching speed, the brain ends up being a billion times faster at what it does.

COMPARING BRAINS WITH COMPUTERS (4)
• Brains are more fault-tolerant than computers.
• Brains are constantly faced with novel input, yet manage to do something with it. Computer programs rarely work as well with novel input, unless the programmer has been exceptionally careful.
• One attraction of neural networks is graceful degradation: they tend to show a gradual rather than a sharp drop-off in performance as conditions worsen.
• Another attraction of neural networks is that they are designed to be trained using an inductive learning algorithm.

A MATHEMATICAL MODEL FOR A NEURON (1)
• A simple mathematical model of the neuron was devised by McCulloch and Pitts (1943).
[Figure: a unit with input links carrying activations a_j weighted by W_j,i, a bias weight W_0,i on the fixed input a_0 = -1, the input function in_i, the activation function g, and the output a_i = g(in_i) sent along the output links.]

A MATHEMATICAL MODEL FOR A NEURON (2)
• A neural network is composed of a number of units, connected by links.
• Each link has a numeric weight associated with it. Weights are the primary means of long-term storage in neural networks, and learning usually takes place by updating the weights.
• Some of the units are connected to the external environment, and can be designated as input or output units.

A MATHEMATICAL MODEL FOR A NEURON (3)
• The weights are modified so as to bring the network's input/output behaviour more into line with that of the environment providing the inputs.
• Each unit has a set of input links from other units, a set of output links to other units, a current activation level, and a means of computing the activation level at the next step in time.
• Each unit does a local computation based on the inputs from its neighbours, without any need for global control over the set of units as a whole.

SIMPLE COMPUTING ELEMENTS (1)
• Each unit performs a simple computation: it receives signals from its input links and computes a new activation level that it sends along each of its output links.
• The computation is split into two components.
• First is a linear component, called the input function, in_i, that computes the weighted sum of the unit's n input values:

  in_i = Σ_{j=0}^{n} W_{j,i} · a_j

SIMPLE COMPUTING ELEMENTS (2)
• Second is a nonlinear component, called the activation function, g, that transforms the weighted sum into the final value that serves as the unit's output (activation value), a_i:

  a_i = g(in_i) = g(Σ_{j=0}^{n} W_{j,i} · a_j)

• The computation of the activation level is based on the value of each input signal received from a neighbouring node and the weight on each input link.
• A bias weight W_{0,i} connected to a fixed input a_0 = -1 sets the actual threshold for the unit.

ACTIVATION FUNCTION (1)
• The activation function g is designed to meet two requirements.
• First, the unit must be "active" (near +1) when the "right" inputs are given, and "inactive" (near 0) when the "wrong" inputs are given.
• Second, the activation needs to be nonlinear; otherwise the entire neural network collapses into a simple linear function and cannot represent more complex functions.
• Different models are obtained by using different mathematical functions for g.

ACTIVATION FUNCTION (2)
• THREE COMMON CHOICES
• The threshold function:
  g(in_i) = 1, if in_i ≥ 0;  0, if in_i < 0
• The sign function:
  g(in_i) = +1, if in_i ≥ 0;  -1, if in_i < 0
• The sigmoid function:
  g(in_i) = 1 / (1 + e^{-in_i})

REPRESENTATION OF BOOLEAN FUNCTIONS BY THRESHOLD UNITS
• Individual units are able to represent the basic Boolean functions (logic gates).
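The input function, the three activation functions, and the bias convention a_0 = -1 can be sketched in Python. The function names and the example weights below are illustrative, not taken from the slides:

```python
import math

def threshold(in_i):
    """g(in) = 1 if in >= 0, else 0."""
    return 1 if in_i >= 0 else 0

def sign(in_i):
    """g(in) = +1 if in >= 0, else -1."""
    return 1 if in_i >= 0 else -1

def sigmoid(in_i):
    """g(in) = 1 / (1 + e^(-in))."""
    return 1.0 / (1.0 + math.exp(-in_i))

def unit_output(weights, inputs, g):
    """in_i = sum over j of W_j,i * a_j, then a_i = g(in_i)."""
    in_i = sum(w * a for w, a in zip(weights, inputs))
    return g(in_i)

# Example: bias weight 0.5 on the fixed input a_0 = -1,
# plus one ordinary input a_1 = 1 with weight 1.0.
print(unit_output([0.5, 1.0], [-1, 1], threshold))   # -> 1
print(unit_output([0.5, 1.0], [-1, 0], threshold))   # -> 0
```

The bias term (-1)·0.5 shifts the effective threshold to 0.5: the unit fires only when the weighted sum of the remaining inputs exceeds it.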
[Figure: three threshold units implementing logic gates – AND: W_0 = 1.5, W_1 = W_2 = 1; OR: W_0 = 0.5, W_1 = W_2 = 1; NOT: W_0 = -0.5, W_1 = -1.]

THE FUNCTION REPRESENTED BY THE NETWORK
• With a fixed structure and a fixed activation function g, the functions representable by a feed-forward network are restricted to a specific parameterized family.
• The weights chosen for the network determine which of these functions is actually represented.

EXAMPLE
[Figure: inputs I_1, I_2 feed hidden units H_3, H_4 through weights W_{1,3}, W_{2,3}, W_{1,4}, W_{2,4}; the hidden units feed the output O_5 through weights W_{3,5}, W_{4,5}.]

  a_5 = g(W_{3,5}·a_3 + W_{4,5}·a_4)
      = g(W_{3,5}·g(W_{1,3}·a_1 + W_{2,3}·a_2) + W_{4,5}·g(W_{1,4}·a_1 + W_{2,4}·a_2))

where g is the activation function and a_i is the output of node i.

NETWORK REPRESENTATION
• Each link is determined by three parameters:
– a start node of the link
– an end node of the link
– a numeric weight of the link
• Network topology (structure of links) – a weight matrix

NETWORK STRUCTURES (1)
TWO MAIN CATEGORIES
• FEED-FORWARD NETWORKS
– The network structure is acyclic.
– A feed-forward network represents a function of its current input (it has no internal state other than the weights themselves).

NETWORK STRUCTURES (2)
• RECURRENT NETWORKS
– The network structure is cyclic.

NETWORK STRUCTURES (3)
– A recurrent network feeds its outputs back into its own inputs.
– The activation levels of a recurrent network form a dynamic system that may reach a stable state or exhibit oscillations or even chaotic behaviour.
– The response of the network to a given input depends on its initial state, which may depend on previous inputs. Hence, recurrent networks can support short-term memory.
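The AND, OR, and NOT threshold units can be checked directly. This sketch assumes the bias convention a_0 = -1 from the neuron model and the standard gate weights (W_0 = 1.5 for AND, 0.5 for OR, and -0.5 with input weight -1 for NOT):

```python
def step(x):
    """Threshold activation: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

def threshold_unit(bias_w, weights, inputs):
    """Bias input a_0 = -1 with weight bias_w, then the weighted sum."""
    return step(-bias_w + sum(w * a for w, a in zip(weights, inputs)))

def AND(a, b): return threshold_unit(1.5, [1, 1], [a, b])
def OR(a, b):  return threshold_unit(0.5, [1, 1], [a, b])
def NOT(a):    return threshold_unit(-0.5, [-1], [a])

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b))
print(NOT(0), NOT(1))   # -> 1 0
```

The bias weight sets the threshold: AND fires only when both inputs contribute (sum 2 > 1.5), OR when at least one does (sum 1 > 0.5).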
NETWORK STRUCTURES (4)
• Hopfield networks are the best-understood class of recurrent networks.
• They use bidirectional connections with symmetric weights, W_{i,j} = W_{j,i}.
• All of the units are both input and output units.
• The activation function g is the sign function, and the activation levels can only be ±1.
• A Hopfield network functions as an associative memory:
– after training on a set of examples, a new stimulus will cause the network to settle into an activation pattern corresponding to the example in the training set that most closely resembles the new stimulus.

FEED-FORWARD NEURAL NETWORKS
• SINGLE-LAYER FEED-FORWARD NEURAL NETWORK
– A network with all the inputs connected directly to the outputs is called a single-layer network or a perceptron network.
– Each output unit is independent of the others.
[Figure: a perceptron network with every input connected to every output, and a single perceptron with several inputs and one output.]

PERCEPTRON (1)
• With a threshold activation function, the perceptron represents a Boolean function.
• A threshold perceptron returns 1 if and only if the weighted sum of its inputs (including the bias) is positive:

  Σ_{j=0}^{n} W_j · x_j > 0

• The equation Σ_{j=0}^{n} W_j · x_j = 0 defines a hyperplane in the input space, so the perceptron returns 1 if and only if the input is on one side of that hyperplane.
• For this reason, the threshold perceptron is called a linear separator.

PERCEPTRON (2)
LINEAR SEPARABILITY IN PERCEPTRONS
[Figure: two-dimensional plots of (a) the Boolean function AND and (b) the Boolean function OR, with I_1 and I_2 on the axes; the separating line crosses both axes at 1.5 for AND and at 0.5 for OR.]
• Each function is represented as a two-dimensional plot, based on the values of the two inputs.
• Black dots indicate points in the input space where the value of the function is 1, and white dots indicate points where the value is 0.
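The Hopfield associative memory described in NETWORK STRUCTURES (4) can be sketched in a few lines. The slides do not give a training rule, so the standard Hebbian outer-product rule is assumed here, and the four-unit patterns are illustrative:

```python
def train(patterns):
    """Hebbian outer-product rule; produces symmetric weights W[i][j] == W[j][i]."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W

def recall(W, state, sweeps=5):
    """Repeatedly update each unit with the sign function until stable."""
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            s = sum(W[i][j] * state[j] for j in range(len(state)))
            state[i] = 1 if s >= 0 else -1       # activation levels are +/-1
    return state

stored = [[1, 1, -1, -1], [-1, -1, 1, 1]]
W = train(stored)
noisy = [1, -1, -1, -1]          # stored[0] with one unit flipped
print(recall(W, noisy))          # -> [1, 1, -1, -1]
```

The noisy stimulus settles into the stored pattern it most closely resembles, which is exactly the associative-memory behaviour the slide describes.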
PERCEPTRON (3)
• In Figure (a) one possible separating "plane" (a line) is defined by the equation
  I_1 + I_2 = 1.5, or I_1 = -I_2 + 1.5
  The region above the line, where the output is 1, is given by I_1 + I_2 - 1.5 > 0.
• In Figure (b) one possible separating "plane" (a line) is defined by the equation
  I_1 + I_2 = 0.5, or I_1 = -I_2 + 0.5
  The region above the line, where the output is 1, is given by I_1 + I_2 - 0.5 > 0.

LINEAR CLASSIFICATION (1)
EXAMPLE
Problem: CLASSIFICATION OF AIRPLANES
• RULES
  IF WEIGHT > 0.80 AND SPEED < 0.55 THEN BOMBER
  IF WEIGHT < 0.90 AND SPEED > 0.25 THEN FIGHTER
• Let us have 10 examples of airplanes.
[Figure: scatter plot of the 10 examples, with SPEED on the horizontal axis (0 to 1) and WEIGHT on the vertical axis (0 to 3); the bombers and fighters lie on opposite sides of a straight line.]

LINEAR CLASSIFICATION (2)
• One possible separating line is defined by the equation
  I_2 = 1.5·I_1 + 0.5
• The equation may be used for the decision-making rule
  f(I_1, I_2) = 1.5·I_1 - I_2 + 0.5
  IF f(I_1, I_2) ≥ 0 THEN FIGHTER
  IF f(I_1, I_2) < 0 THEN BOMBER
• THE CORRESPONDING PERCEPTRON
[Figure: a perceptron with inputs I_1 (weight 1.5), I_2 (weight -1.0), and a bias input fixed at 1 (weight 0.5).]

LEARNING LINEARLY SEPARABLE FUNCTIONS (1)
• There is a perceptron learning algorithm that will learn any linearly separable function, given enough training examples.
• The idea behind most algorithms for neural network learning is to adjust the weights of the network so as to minimize some measure of the error on the training set.
• The initial network has randomly assigned weights, usually from the range [-0.5, 0.5].

LEARNING LINEARLY SEPARABLE FUNCTIONS (2)
• The network is then updated to try to make it consistent with the examples. This is done by making small adjustments in the weights to reduce the difference between the observed and predicted values (an optimization search in weight space).
• Typically, the updating process is iterative. Each iteration involves updating all the weights for all the examples.

THE WEIGHT UPDATING PROCESS
THE MEASURE OF ERROR
• For a single training example:
  First case: Err = T - O
  where O is the output of the perceptron on the example, and T is the true output value.
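The decision-making rule f(I_1, I_2) = 1.5·I_1 - I_2 + 0.5 from LINEAR CLASSIFICATION (2) translates directly into code, with I_1 = speed and I_2 = weight. The two sample airplanes are illustrative, not among the 10 examples from the slides:

```python
def f(i1, i2):
    """Decision function for the separating line I2 = 1.5*I1 + 0.5."""
    return 1.5 * i1 - i2 + 0.5

def classify(speed, weight):
    """IF f >= 0 THEN FIGHTER, IF f < 0 THEN BOMBER."""
    return "FIGHTER" if f(speed, weight) >= 0 else "BOMBER"

print(classify(0.7, 0.6))   # fast and light  -> FIGHTER
print(classify(0.3, 2.0))   # slow and heavy  -> BOMBER
```

A point below the separating line (weight < 1.5·speed + 0.5) gives f ≥ 0 and is classified as a fighter; a point above it gives f < 0 and is classified as a bomber.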
  Second case: Err = (1/2)·(T - O)^2
• If the error is positive, then O must be increased.
• If the error is negative, then O must be decreased.
• Each input unit contributes W_j · I_j to the total input, so if I_j is positive, an increase in W_j will tend to increase O, and if I_j is negative, an increase in W_j will tend to decrease O.

THE PERCEPTRON LEARNING RULE (1)
  W_j = W_j + α · I_j · Err
where α is the learning rate.

EXAMPLE
The initial network: W_1 = 0.2, W_2 = -0.5, bias weight 0.2 (bias input fixed at 1).
The training example: a fighter with speed = 0.4, weight = 0.8.
The output:
  O = 0.2 · 0.4 + (-0.5) · 0.8 + 0.2 = 0.08 - 0.4 + 0.2 = -0.12
f(I_1, I_2) < 0, so the classification is "bomber".
The error: Err = 1 - 0 = 1

THE PERCEPTRON LEARNING RULE (2)
The updated weights (α = 1):
  W_1 = 0.2 + 0.4 · 1 = 0.6
  W_2 = -0.5 + 0.8 · 1 = 0.3
  W_bias = 0.2 + 1 · 1 = 1.2
The new output:
  O = 0.6 · 0.4 + 0.3 · 0.8 + 1.2 = 0.24 + 0.24 + 1.2 = 1.68
f(I_1, I_2) > 0, so the classification is "fighter".

MULTILAYERED FEED-FORWARD NETWORKS (1)
• Networks with one or more layers of hidden units are called multilayer networks.
[Figure: input units I_k connect through weights W_{k,j} to hidden units a_j, which connect through weights W_{j,i} to output units a_i.]
• The advantage of adding hidden layers is that doing so enlarges the hypothesis space that the network can represent.

MULTILAYERED FEED-FORWARD NETWORKS (2)
• With a single, sufficiently large hidden layer, it is possible to represent any continuous function of the inputs with arbitrary accuracy.
• Unfortunately, for any particular network structure, it is harder to characterize exactly which functions can be represented and which ones cannot. The problem of choosing the right number of hidden units in advance is still not well understood.
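The worked example of the perceptron learning rule can be reproduced in a few lines; the variable names are illustrative:

```python
# Initial weights from the example: W1 = 0.2, W2 = -0.5, bias weight 0.2,
# with the bias input fixed at 1 and learning rate alpha = 1.
alpha = 1.0
w = {"w1": 0.2, "w2": -0.5, "bias": 0.2}

i1, i2, target = 0.4, 0.8, 1          # the fighter: speed 0.4, weight 0.8

def raw_output(w, i1, i2):
    return w["w1"] * i1 + w["w2"] * i2 + w["bias"]

o = raw_output(w, i1, i2)             # -0.12, so classified as bomber (0)
err = target - (1 if o >= 0 else 0)   # Err = T - O = 1 - 0 = 1

# The perceptron learning rule: Wj = Wj + alpha * Ij * Err
w["w1"] += alpha * i1 * err           # 0.2 + 0.4 = 0.6
w["w2"] += alpha * i2 * err           # -0.5 + 0.8 = 0.3
w["bias"] += alpha * 1 * err          # 0.2 + 1.0 = 1.2

print(round(raw_output(w, i1, i2), 2))   # -> 1.68, now classified as fighter
```

One update with Err = 1 pushes every weight in the direction that raises the output, which is enough here to flip the classification from bomber to fighter.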
MULTILAYERED FEED-FORWARD NETWORKS (3)
EXAMPLE: the Boolean XOR function
• Linear classification is not possible: no single line separates the inputs where XOR is 1 from those where it is 0.
• A two-layer feed-forward network can represent XOR.
[Figure: inputs I_1, I_2 connect to hidden threshold units a_1 (bias weight 1.5) and a_2 (bias weight 0.5), each with input weights -1, -1; the hidden units connect to the output unit a_3 (bias weight -0.5) with weights +1 from a_1 and -1 from a_2. The bias inputs are fixed at 1.]

MULTILAYERED FEED-FORWARD NETWORKS (4)
First case, Input = (0,0):
  a_1 = (1·1.5) + (0·(-1)) + (0·(-1)) = 1.5, Output = 1
  a_2 = (1·0.5) + (0·(-1)) + (0·(-1)) = 0.5, Output = 1
  a_3 = (1·(-0.5)) + (1·1) + (1·(-1)) = -0.5, Output = 0
Second case, Input = (1,0):
  a_1 = (1·1.5) + (1·(-1)) + (0·(-1)) = 0.5, Output = 1
  a_2 = (1·0.5) + (1·(-1)) + (0·(-1)) = -0.5, Output = 0
  a_3 = (1·(-0.5)) + (1·1) + (0·(-1)) = 0.5, Output = 1

MULTILAYERED FEED-FORWARD NETWORKS (5)
Third case, Input = (0,1):
  a_1 = (1·1.5) + (0·(-1)) + (1·(-1)) = 0.5, Output = 1
  a_2 = (1·0.5) + (0·(-1)) + (1·(-1)) = -0.5, Output = 0
  a_3 = (1·(-0.5)) + (1·1) + (0·(-1)) = 0.5, Output = 1
Fourth case, Input = (1,1):
  a_1 = (1·1.5) + (1·(-1)) + (1·(-1)) = -0.5, Output = 0
  a_2 = (1·0.5) + (1·(-1)) + (1·(-1)) = -1.5, Output = 0
  a_3 = (1·(-0.5)) + (0·1) + (0·(-1)) = -0.5, Output = 0

CLASSIFICATION EXAMPLE
Problem: any pair of persons must be classified into two classes, brothers and friends. There are 3 brothers in each family. Persons from different families are friends.
[Figure: a network classifying pairs of persons from {John, Paul, George} and {Charles, Dick, Rodger} into FRIENDS or BROTHERS, with link weights 1, 0.5, 1.5, -1.5, and -1.]

LEARNING IN MULTILAYER NETWORKS
• Learning algorithms for multilayer networks are similar to the perceptron learning algorithm.
• Inputs are presented to the network, and if the network computes an output vector that matches the target, nothing is done.
• If there is an error (a difference between the output and the target), then the weights are adjusted to reduce this error.
• The trick is to assess the blame for an error and divide it among the contributing weights, because in multilayer networks there are many weights connecting each input to each output, and each of these weights contributes to more than one output.

BACK-PROPAGATION LEARNING (1)
• The back-propagation algorithm is a sensible approach to dividing the contribution of each weight.
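The four XOR cases above can be verified with a direct forward pass through the two-layer network of threshold units:

```python
def step(x):
    """Threshold activation: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

def xor_net(i1, i2):
    """Hidden units a1, a2 and output a3, each with a bias input fixed at 1."""
    a1 = step(1 * 1.5 + i1 * (-1) + i2 * (-1))    # bias weight 1.5
    a2 = step(1 * 0.5 + i1 * (-1) + i2 * (-1))    # bias weight 0.5
    a3 = step(1 * (-0.5) + a1 * 1 + a2 * (-1))    # bias weight -0.5
    return a3

for i1 in (0, 1):
    for i2 in (0, 1):
        print(i1, i2, xor_net(i1, i2))
# 0 0 -> 0,  0 1 -> 1,  1 0 -> 1,  1 1 -> 0
```

In effect a_1 computes NAND, a_2 computes NOR, and a_3 fires only when a_1 is on and a_2 is off, which is exactly XOR.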
• At the output layer, the weight update rule uses the activation of the hidden unit, a_j, instead of the input value, and it contains a term for the gradient of the activation function:
  W_{j,i} = W_{j,i} + α · a_j · Err_i · g'(in_i)
where Err_i is the error (T_i - O_i) at the output node, and g'(in_i) is the derivative of the activation function g (for this reason g must be differentiable, so the sigmoid function rather than the threshold function is used in multilayer networks).

BACK-PROPAGATION LEARNING (2)
• A new error term Δ is defined, which for output nodes is
  Δ_i = Err_i · g'(in_i)
• The update rule now is the following:
  W_{j,i} = W_{j,i} + α · a_j · Δ_i
• The propagation rule for the Δ values is the following:
  Δ_j = g'(in_j) · Σ_i W_{j,i} · Δ_i
• The weight update rule for the weights between the inputs and the hidden layer is the following:
  W_{k,j} = W_{k,j} + α · I_k · Δ_j

SUMMARIZED BACK-PROPAGATION ALGORITHM
• Compute the Δ values for the output units using the observed error.
• Starting with the output layer, repeat the following for each layer in the network, until the earliest hidden layer is reached:
– Propagate the Δ values back to the previous layer.
– Update the weights between the two layers.

BUILDING NEURAL NETWORKS
• To build a neural network that performs some task, one needs:
– To decide how many units are to be used.
– To decide what kind of units are appropriate.
– To decide how the units are connected to form a network.
– To initialize the weights of the network.
– To decide how to encode the examples in terms of inputs and outputs of the network.
– To train the weights using a learning algorithm applied to a set of training examples for the task.
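The summarized back-propagation algorithm can be sketched for a one-hidden-layer sigmoid network. The network size, learning rate, number of iterations, and the XOR training data below are illustrative choices, not taken from the slides:

```python
import math
import random

random.seed(0)

def g(x):  return 1.0 / (1.0 + math.exp(-x))   # sigmoid activation
def gp(x): return g(x) * (1.0 - g(x))          # its derivative g'(in)

n_in, n_hid = 2, 3
# W_kj: input -> hidden weights (bias first); W_ji: hidden -> output weights.
W_kj = [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
        for _ in range(n_hid)]
W_ji = [random.uniform(-0.5, 0.5) for _ in range(n_hid + 1)]
alpha = 0.5
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def forward(x1, x2):
    I = [1, x1, x2]                                   # bias input first
    in_j = [sum(w * i for w, i in zip(row, I)) for row in W_kj]
    a = [1] + [g(v) for v in in_j]                    # hidden activations + bias
    in_i = sum(w * v for w, v in zip(W_ji, a))
    return I, in_j, a, in_i, g(in_i)

def total_error():
    return sum((t - forward(x1, x2)[4]) ** 2 for (x1, x2), t in data)

err_before = total_error()
for _ in range(20000):
    (x1, x2), t = random.choice(data)
    I, in_j, a, in_i, o = forward(x1, x2)
    delta_i = (t - o) * gp(in_i)                      # output Delta
    delta_h = [gp(in_j[j]) * W_ji[j + 1] * delta_i    # Deltas propagated back
               for j in range(n_hid)]
    for j in range(n_hid + 1):                        # W_ji += alpha*a_j*Delta_i
        W_ji[j] += alpha * a[j] * delta_i
    for j in range(n_hid):                            # W_kj += alpha*I_k*Delta_j
        for k in range(n_in + 1):
            W_kj[j][k] += alpha * I[k] * delta_h[j]

print(round(err_before, 3), round(total_error(), 3))  # error shrinks with training
```

Each iteration follows the summarized algorithm exactly: compute the output Δ from the observed error, propagate Δ back through the hidden-to-output weights, then update both layers of weights.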