Document Analysis: Artificial Neural Networks
Prof. Rolf Ingold, University of Fribourg
Master course, spring semester 2008

Outline
- Biological vs. artificial neural networks
- Artificial neuron model
- Artificial neural networks
- Multi-layer perceptron
- Feed-forward activation
- Learning approach
- Back-propagation method
- Optimal learning
- Illustration of JavaNNS

Biological neurons
Artificial neural networks are inspired by the biological neurons of the central nervous system:
- each neuron is connected to many other neurons
- information is transmitted via synapses (an electrochemical process)
- a neuron receives input from its dendrites and transmits output via the axon to the synapses

Biological vs. artificial networks
                        biological neural network    artificial neural network
  processing            chemical                     mathematical function
  transmission time     relatively slow              very fast
  number of neurons     approx. 10^10                max. 10^4 to 10^6
  number of synapses    approx. 10^13                up to 10^8

Artificial neuron model
- A neuron receives input signals x_1, ..., x_n.
- These signals are multiplied by synaptic weights w_1, ..., w_n, which can be positive or negative.
- The activation of the neuron, $a = \sum_i w_i x_i$, is passed to a non-linear function f with threshold w_0.
- The output signal y = f(a - w_0) is then propagated to other neurons.

Characteristics of artificial neural networks
Artificial neural networks may vary in different aspects:
- the topology of the network, i.e. the number of neurons, possibly organized in layers or classes
- how each neuron (of a given layer/class) is connected to its neighbors
- the transfer function used in each neuron
The usage and the learning strategy have to be adapted accordingly.

Topology of the neural network
The synaptic connections have a major influence on the behavior of the neural network. Two main categories can be considered:
- feed-forward networks, where each neuron propagates its output signal only to neurons that have not yet been used; as a special case, the multi-layer perceptron consists of a sequence of layers such that a neuron of one layer is connected only to neurons of the next layer
- dynamic networks, where neurons are connected without restrictions, possibly in a cyclic way

Multi-layer perceptron
The multi-layer perceptron (MLP) has 3 (or more) layers:
- an input layer with one input neuron per feature
- one or several hidden layers, each with an arbitrary number of neurons connected to the previous layer
- an output layer with one neuron per class, each neuron being connected to the previous layer
Hidden and output layers can be completely or only partly connected. The decision is in favor of the class corresponding to the highest output activation.

Impact of the hidden layer(s)
Networks with hidden layers can generate arbitrary decision boundaries; however, the number of hidden layers has no impact on this expressive power: a single hidden layer is already sufficient.

Feed-forward activation
As for the single perceptron, the feature space is augmented with a feature x_0 = 1 to take the bias w_0 into account.
- Each neuron j of a hidden layer computes an activation
  $y_j = f(\mathrm{net}_j)$ with $\mathrm{net}_j = \sum_{i=0}^{d} x_i w_{ji} = \mathbf{w}_j^t \mathbf{x}$
- Each neuron k of the output layer computes an activation
  $z_k = f(\mathrm{net}_k)$ with $\mathrm{net}_k = \sum_{j} y_j w_{kj} = \mathbf{w}_k^t \mathbf{y}$,
  where the sum runs over the hidden neurons, including the bias term y_0 = 1.
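The feed-forward equations above can be made concrete with a short sketch. The following Python/NumPy fragment is only an illustration and is not taken from the course or from JavaNNS; the function names (transfer, feed_forward), the tanh transfer function, the weight layout (bias weight w_0 stored in column 0) and the layer sizes are assumptions chosen for the example.

```python
import numpy as np

def transfer(net):
    # tanh is one of the antisymmetric transfer functions discussed below
    return np.tanh(net)

def feed_forward(x, W_hidden, W_output):
    """Compute the MLP activations for one feature vector x.

    W_hidden has one row per hidden neuron, W_output one row per output
    neuron; column 0 of each matrix holds the bias weight w_0, matching
    the augmentation x_0 = 1 used on the slide.
    """
    x_aug = np.concatenate(([1.0], x))   # x_0 = 1 for the bias
    net_hidden = W_hidden @ x_aug        # net_j = w_j^t x
    y = transfer(net_hidden)             # y_j = f(net_j)
    y_aug = np.concatenate(([1.0], y))   # y_0 = 1 for the output bias
    net_output = W_output @ y_aug        # net_k = w_k^t y
    z = transfer(net_output)             # z_k = f(net_k)
    return y, z

# Example: 2 features, 3 hidden neurons, 2 classes (random weights)
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 3))   # 3 hidden neurons, bias + 2 features
W_output = rng.normal(size=(2, 4))   # 2 output neurons, bias + 3 hidden units
y, z = feed_forward(np.array([0.5, -1.2]), W_hidden, W_output)
# decision in favor of the class with the highest output activation
print("predicted class:", int(np.argmax(z)))
```

Storing the bias in column 0 mirrors the augmented feature x_0 = 1, so each net value is literally the dot product of a weight row with the augmented input vector.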
Transfer function
The transfer function f is supposed to be:
- monotonically increasing, with values in the range [-1, +1]
- antisymmetric, i.e. f(-net) = -f(net)
- continuous and differentiable (for back-propagation)
Typical functions are:
- simple threshold:
  $f(a - w_0) = \begin{cases} +1 & \text{if } a - w_0 > 0 \\ -1 & \text{otherwise} \end{cases}$
- piecewise linear:
  $f(a - w_0) = \begin{cases} +1 & \text{if } a - w_0 \geq T \\ (a - w_0)/T & \text{if } -T < a - w_0 < T \\ -1 & \text{if } a - w_0 \leq -T \end{cases}$
- sigmoid:
  $f(a - w_0) = \dfrac{e^{(a - w_0)/T} - 1}{e^{(a - w_0)/T} + 1} = \tanh\!\left(\dfrac{a - w_0}{2T}\right)$

Learning in a multi-layer perceptron
Learning consists of setting the weights w, based on training samples. The method is called back-propagation, because the training error is propagated recursively from the output layer back to the hidden and input layers.
The training error on a given pattern is defined as half the squared difference between the desired output t and the observed output z, i.e.
  $J(\mathbf{w}) = \frac{1}{2}\,\|\mathbf{t} - \mathbf{z}\|^2 = \frac{1}{2}\sum_{k=1}^{C}(t_k - z_k)^2$
In practice, the desired output is +1 for the correct class and -1 (or sometimes 0) for all other classes.

Back-propagation of errors
The weight vectors are changed in the direction opposite to the gradient of the error:
  $\Delta \mathbf{w} = -\eta\,\dfrac{\partial J}{\partial \mathbf{w}}$
where $\eta$ is the learning rate.

Error correction on the output layer
Since the error does not directly depend on w_kj, we apply the chain rule of differentiation:
  $\dfrac{\partial J}{\partial w_{kj}} = \dfrac{\partial J}{\partial \mathrm{net}_k}\,\dfrac{\partial \mathrm{net}_k}{\partial w_{kj}} = -\delta_k\,\dfrac{\partial \mathrm{net}_k}{\partial w_{kj}}$
with
  $\delta_k = -\dfrac{\partial J}{\partial \mathrm{net}_k} = -\dfrac{\partial J}{\partial z_k}\,\dfrac{\partial z_k}{\partial \mathrm{net}_k} = (t_k - z_k)\, f'(\mathrm{net}_k)$
and
  $\dfrac{\partial \mathrm{net}_k}{\partial w_{kj}} = y_j$
Thus the update rule becomes
  $\Delta w_{kj} = \eta\,(t_k - z_k)\, f'(\mathrm{net}_k)\, y_j = \eta\,\delta_k\, y_j$

Error correction on the hidden layer(s)
Applying the chain rule
  $\dfrac{\partial J}{\partial w_{ji}} = \dfrac{\partial J}{\partial y_j}\,\dfrac{\partial y_j}{\partial w_{ji}}$
with
  $\dfrac{\partial y_j}{\partial w_{ji}} = \dfrac{\partial y_j}{\partial \mathrm{net}_j}\,\dfrac{\partial \mathrm{net}_j}{\partial w_{ji}} = f'(\mathrm{net}_j)\, x_i$
and
  $\dfrac{\partial J}{\partial y_j} = \dfrac{\partial}{\partial y_j}\left[\frac{1}{2}\sum_{k=1}^{c}(t_k - z_k)^2\right] = -\sum_{k=1}^{c}(t_k - z_k)\,\dfrac{\partial z_k}{\partial y_j} = -\sum_{k=1}^{c}(t_k - z_k)\, f'(\mathrm{net}_k)\, w_{kj} = -\sum_{k=1}^{c} w_{kj}\,\delta_k$
Finally the update rule becomes
  $\Delta w_{ji} = \eta\left[\sum_{k=1}^{c} w_{kj}\,\delta_k\right] f'(\mathrm{net}_j)\, x_i = \eta\,\delta_j\, x_i$

Learning algorithm
The learning process starts with randomly initialized weights. The weights are adjusted iteratively with patterns from the training set:
- the pattern is presented to the network and the feed-forward activation is computed
- the output error is computed
- the error is used to update the weights backwards, from the output layer to the hidden layers
(Diagram: the error terms of layer q+1 are propagated back through the weights w, together with the derivative f', to obtain the error terms of layer q.)
The process is repeated until a quality criterion is reached. A code sketch of this training loop is given at the end of the document.

Risk of overfitting
Minimizing the global error over all training samples tends to produce overfitting. To avoid overfitting, the best strategy is to minimize the global error on a validation set which is independent of the training set.

JavaNNS
JavaNNS is an interactive software framework for experimenting with artificial neural networks; it has been developed at the University of Tübingen. It supports the following features:
- multiple topologies (MLP, dynamic networks, ...)
- different transfer functions
- different learning strategies
- network pruning
- ...

Font recognition with JavaNNS
(Figure: original neural network with 9 hidden units.)

Pruned neural network for font recognition
(Figure: neural network obtained after pruning.)
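As a complement to the back-propagation derivation and the learning algorithm above, here is a minimal, hypothetical Python/NumPy sketch of one training step for a single-hidden-layer MLP with a tanh transfer function. The function name backprop_step, the learning rate eta, the weight layout (bias in column 0, as in the earlier sketch) and the toy data are illustrative assumptions, not the reference implementation of the course or of JavaNNS.

```python
import numpy as np

def backprop_step(x, t, W_hidden, W_output, eta=0.1):
    """One back-propagation step on a single pattern (x, t); updates the weights in place.

    Implements the update rules derived above for a single hidden layer,
    with f = tanh and therefore f'(net) = 1 - tanh(net)^2:
      delta_k = (t_k - z_k) * f'(net_k)
      delta_j = f'(net_j) * sum_k w_kj * delta_k
      Delta w_kj = eta * delta_k * y_j,   Delta w_ji = eta * delta_j * x_i
    """
    # Feed-forward pass with the bias features x_0 = 1 and y_0 = 1
    x_aug = np.concatenate(([1.0], x))
    net_h = W_hidden @ x_aug
    y = np.tanh(net_h)
    y_aug = np.concatenate(([1.0], y))
    net_o = W_output @ y_aug
    z = np.tanh(net_o)

    # Error terms of the output layer: delta_k = (t_k - z_k) f'(net_k)
    delta_o = (t - z) * (1.0 - z ** 2)
    # Error terms of the hidden layer: delta_j = f'(net_j) sum_k w_kj delta_k
    # (column 0 of W_output holds the bias weights and is skipped)
    delta_h = (1.0 - y ** 2) * (W_output[:, 1:].T @ delta_o)

    # Gradient-descent updates
    W_output += eta * np.outer(delta_o, y_aug)
    W_hidden += eta * np.outer(delta_h, x_aug)

    return 0.5 * np.sum((t - z) ** 2)   # J(w) for this pattern

# Hypothetical toy problem: 2 features, 3 hidden neurons, 2 classes,
# with target +1 for the correct class and -1 for the other one.
rng = np.random.default_rng(1)
W_h = rng.normal(scale=0.5, size=(3, 3))
W_o = rng.normal(scale=0.5, size=(2, 4))
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
T = np.array([[1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, 1.0]])
for epoch in range(2000):
    total_error = sum(backprop_step(x, t, W_h, W_o) for x, t in zip(X, T))
print("training error after the last epoch:", total_error)
```

In a real experiment the loop would also track the error on an independent validation set and stop when that error no longer decreases, in line with the remark on overfitting above.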