Artificial neural networks

Biological inspiration
• Animals are able to react adaptively to changes in their external and internal environment, and they use their nervous system to produce these behaviours.
• An appropriate model/simulation of the nervous system should be able to produce similar responses and behaviours in artificial systems.
• The nervous system is built from relatively simple units, the neurons, so copying their behaviour and functionality is a natural starting point.

Comparison of Brains and Traditional Computers

  Brain                                   Traditional computer
  200 billion neurons, 32 trillion       1 billion bytes RAM, trillions of
  synapses                               bytes on disk
  Element size: 10^-6 m                  Element size: 10^-9 m
  Energy use: 25 W                       Energy use: 30-90 W (CPU)
  Processing speed: 100 Hz               Processing speed: 10^9 Hz
  Parallel, distributed                  Serial, centralized
  Fault tolerant                         Generally not fault tolerant
  Learns: yes                            Learns: some
  Intelligent/conscious: usually         Intelligent/conscious: generally no

Biological Inspiration
• Idea: to make the computer more robust, intelligent, and able to learn, let's model our computer software (and/or hardware) after the brain.
• "My brain: it's my second favorite organ." - Woody Allen

Biological inspiration
• The main structures of a neuron are the dendrites, the soma (cell body), and the axon. [figure: neuron anatomy]
• The information transmission happens at the synapses, where an axon contacts the dendrites of other neurons. [figure: axon, dendrites, synapses]
• The spikes travelling along the axon of the pre-synaptic neuron trigger the release of neurotransmitter substances at the synapse.
• The neurotransmitters cause excitation or inhibition in the dendrite of the post-synaptic neuron.
• The integration of the excitatory and inhibitory signals may produce spikes in the post-synaptic neuron.
• The contribution of the signals depends on the strength of the synaptic connection.

Artificial neurons
• Neurons work by processing information. They receive and provide information in the form of spikes.
• The McCulloch-Pitts model: the inputs x1, ..., xn arrive through connections with synaptic weights w1, ..., wn, and the output y is
    z = Σ (i = 1..n) wi·xi,   y = H(z)
  where H is a threshold (step) function.

Artificial neurons
The McCulloch-Pitts model:
• spikes are interpreted as spike rates;
• synaptic strengths are translated into synaptic weights;
• excitation means a positive product between the incoming spike rate and the corresponding synaptic weight;
• inhibition means a negative product between the incoming spike rate and the corresponding synaptic weight.

Biological vs. artificial neural networks

  Biological neural network    Artificial neural network
  Soma                         Neuron
  Dendrite                     Input
  Axon                         Output
  Synapse                      Weight

Artificial neurons
• Nonlinear generalization of the McCulloch-Pitts neuron:
    y = f(x, w)
  where y is the neuron's output, x is the vector of inputs, and w is the vector of synaptic weights.
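To make the model concrete, here is a minimal Python sketch of a McCulloch-Pitts-style neuron (an illustration, not part of the original slides): it computes the weighted sum z and passes it through a step function H. The particular weights and threshold, which implement a logical AND, are illustrative assumptions.

```python
# A minimal McCulloch-Pitts-style neuron: z = sum(wi * xi), y = H(z).
# The weights and threshold below are illustrative; they implement logical AND.

def neuron(x, w, threshold=1.5):
    """Weighted sum of the inputs, passed through a step (Heaviside) function."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if z >= threshold else 0

w = [1.0, 1.0]  # one synaptic weight per input
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", neuron(x, w))   # fires only when both inputs are active
```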
Artificial neural networks
• An artificial neural network is composed of many artificial neurons that are linked together according to a specific network architecture. [figure: inputs feeding a layered network that produces outputs]
• The objective of the neural network is to transform the inputs into meaningful outputs.

Artificial neural networks
Tasks to be solved by artificial neural networks:
• controlling the movements of a robot based on self-perception and other information (e.g., visual information);
• deciding the category of potential food items (e.g., edible or non-edible) in an artificial world;
• recognizing a visual object (e.g., a familiar face);
• predicting where a moving object goes when a robot wants to catch it.

Learning in biological systems
• Learning = learning by adaptation.
• The young animal learns that the green fruits are sour, while the yellowish/reddish ones are sweet. The learning happens by adapting the fruit-picking behaviour.
• At the neural level, the learning happens by changing the synaptic strengths, eliminating some synapses, and building new ones.

Learning in biological neural networks
The learning rules of Hebb:
• synchronous activation increases the synaptic strength;
• asynchronous activation decreases the synaptic strength.
These rules fit with energy minimization principles: maintaining synaptic strength needs energy, so it should be maintained at those places where it is needed and not maintained at places where it is not needed.

Learning principle for artificial neural networks
• ENERGY MINIMIZATION
• We need an appropriate definition of energy for artificial neural networks; having that, we can use mathematical optimization techniques to find how to change the weights of the synaptic connections between neurons.
• ENERGY = a measure of task performance error.

Supervised and unsupervised learning
• Supervised learning: one set of observations, called inputs, is assumed to be the cause of another set of observations, called outputs.
• Unsupervised learning can be used for bridging the causal gap between input and output observations.

The Weights
• In supervised learning:
  – examples are presented along with the corresponding desired output vectors;
  – the weights are adjusted with each iteration until the actual output for each input is close to the desired vector.
• In unsupervised learning:
  – examples are presented without any corresponding desired output vectors;
  – the weights are adjusted in accordance with naturally occurring patterns in the data, using a suitable training algorithm;
  – the output vector represents the position of the input vector within the discovered patterns of the data.

Properties of neural networks
• When presented with noisy or incomplete data, an ANN produces an approximate answer rather than an incorrect one.
• When presented with unfamiliar data within the range of its previously seen examples, an ANN will generally produce a reasonable output, interpolated between the example outputs.
• However, an ANN is unable to extrapolate reliably beyond the range of the previously seen examples.
• Fuzzy logic can also be used for interpolation; therefore, ANNs and fuzzy logic are alternative solutions to engineering problems and may be combined in a hybrid system.

ANN applications
• ANNs can be applied to many tasks.
• An ANN associates an input vector (x1, x2, ..., xn) with an output vector (y1, y2, ..., ym).
• The function linking the input and output may be unknown and can be highly nonlinear.
  – A linear function is one that can be represented as f(x) = mx + c, where m and c are constants;
  – a nonlinear one may include higher-order terms of x, or trigonometric or logarithmic functions of x.

Application 1: Nonlinear estimation
• The ANN technique can determine values of variables that cannot be measured easily but are known to depend on other, more accessible variables.
• The measurable variables form the network input vector and the unknown variables constitute the output vector.
• In nonlinear estimation, the network is initially trained using a set of examples known as the training data.
• Supervised learning is used, i.e., each example in the training data comprises two vectors: an input vector and its corresponding desired output vector.
  – This assumes that some values for the less accessible variables have been obtained to form the desired outputs.
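As an illustration of nonlinear estimation (not part of the original slides), the sketch below trains a small multilayer network to estimate a hard-to-measure quantity from an accessible one. The sine-based target function and the use of scikit-learn's MLPRegressor are assumptions chosen for brevity.

```python
# A minimal nonlinear-estimation sketch, assuming scikit-learn is installed.
# The "inaccessible" variable here is an artificial nonlinear function of the
# measurable input; in practice it would come from real measurements.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(200, 1))   # measurable input variable
y = np.sin(X).ravel()                          # variable we want to estimate

net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0)
net.fit(X, y)                                  # supervised training

X_new = np.array([[1.0], [2.5]])               # previously unseen inputs
print(net.predict(X_new))                      # interpolated estimates
```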
Application 1: Nonlinear estimation
• During training, the network learns to associate the example input vectors with their desired output vectors.
• When it is subsequently presented with a previously unseen input vector, the network is able to interpolate between similar examples in the training data to generate an output vector.

Classification
• The output vector classifies the input into one of a set of known possible classes.
• Example: a speech recognition system.
  – Classify the input into 3 different words: yes, no, and maybe.
  – Input: preprocessed, digitized sound of the words.
  – Output: (0, 0, 1) for yes, (0, 1, 0) for no, (1, 0, 0) for maybe.
  – During training, the network learns to associate similar input vectors with a particular output vector.
  – When it is subsequently presented with a previously unseen input vector, the network selects the output vector that offers the closest match.

Example: a single threshold unit
• The unit computes the weighted sum of its inputs and compares it with a threshold θ:
    X = Σ (i = 1..n) xi·wi
    Y = +1 if X ≥ θ,   Y = -1 if X < θ

Clustering
• Unsupervised learning.
• Input vectors are clustered into N groups (N is an integer; it may be prespecified or may be allowed to grow according to the diversity of the data).
• Example: in speech recognition.
  – Input: only spoken words.
  – Training: cluster together examples that are similar to each other (e.g., according to different words or voices).
  – Once the clusters have formed, a second neural network is trained to associate each cluster with a particular desired output.
  – The overall system then becomes a classifier, where the first network is unsupervised and the second one is supervised.
• Clustering is useful for data compression and is an important aspect of data mining, i.e., finding patterns in complex data.

Content-addressable memory
• A form of unsupervised learning: no desired output vectors are associated with the training data.
• During training, each example input vector becomes stored in a dispersed form throughout the network.
• When a previously unseen vector is subsequently presented to the network, it is treated as though it were an incomplete or error-ridden version of one of the stored examples, so the network regenerates the stored example that most closely resembles the presented vector.
• This can be thought of as a type of classification, where each of the examples in the training data belongs to a separate class, and each represents the ideal vector for that class.

Nodes and interconnections
• A node or neuron is a simple computing element having an input side and an output side.
• Each node may have directional connections to many other nodes at both its input and output sides.
• Each input xi is multiplied by its associated weight wi.
• Typically, the node's role is to sum its weighted inputs and add a bias term w0 to form an intermediate quantity called the activation, a.
• It then passes the activation through a nonlinear function ft, known as the transfer function or activation function; a short code sketch of this computation appears below. [figure: the function of a single neuron]

Layers
• The input layer represents the inputs (the raw data).
• The hidden layer's job is to transform the inputs into something that the output layer can use.
• The output layer transforms the hidden-layer activations into whatever scale the output is required to be on.
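The following sketch (an illustration, not from the slides) implements the single-node computation described above: a weighted sum plus bias w0, passed through a sigmoid transfer function. The example weights and inputs are arbitrary assumptions.

```python
# A single artificial node: a = w0 + sum(wi * xi), y = ft(a).
# The weights, bias, and inputs below are arbitrary illustrative values.
import math

def node(x, w, w0):
    """Weighted sum plus bias, passed through a sigmoid transfer function."""
    a = w0 + sum(wi * xi for wi, xi in zip(w, x))   # the activation
    return 1.0 / (1.0 + math.exp(-a))               # sigmoid transfer function

print(node([0.5, -1.2], w=[0.8, 0.3], w0=0.1))      # output lies in (0, 1)
```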
Nodes and interconnections
• The behaviour of a neural network depends on its topology, the weights, the bias terms, and the transfer function.
• The weights and biases can be learned, and the learning behaviour of a network depends on the chosen training algorithm.
• Typically a sigmoid function is used as the transfer function.
• For each neuron the activation is given by
    a = w0 + Σ (i = 1..n) wi·xi
  where n is the number of inputs and the bias term w0 is defined separately for each node.

Typical Transfer Functions
• Step function: Y = 1 if X ≥ 0, Y = 0 if X < 0.
• Sign function: Y = +1 if X ≥ 0, Y = -1 if X < 0.
• Sigmoid function: Y = 1 / (1 + e^(-X)).
• Linear function: Y = X.
[figure: graphs of the step, sign, sigmoid, linear, and ramp transfer functions]

MultiLayer Perceptron (MLP)
• The neurons are organized in layers.
• Each neuron is fully connected to the neurons in the layers above and below, but not to the neurons in the same layer.
• These networks are also called feed-forward networks.
• MLPs can be used either for classification or as nonlinear estimators.
• The number of nodes in each layer and the number of layers are determined by the network builder, often on a trial-and-error basis.
• There is always an input layer and an output layer; the number of nodes in each is determined by the number of inputs and outputs being considered.

MultiLayer Perceptron (MLP)
• An MLP can have any number of hidden layers between the input and output layers.
• The hidden nodes have no obvious meaning associated with them.
• If there are no hidden layers, the network is a single layer perceptron (SLP).
• The network shown has:
  – three input nodes;
  – two hidden layers with four nodes each;
  – one output layer of two nodes;
  – its short-form name is 3-4-4-2 MLP.

Perceptrons as classifiers
• Normally there is one input node for each element of the input vector and one output node for each element of the output vector.
• Each output node would usually represent a particular class.
• A typical representation for a class would be ~1 on one output node and ~0 on the rest, e.g., (0, 0, 1) for yes, (0, 1, 0) for no, (1, 0, 0) for maybe.
• Other representations are possible, such as two output nodes representing four classes: (0,0), (0,1), (1,0), and (1,1).

Linear classifiers
• The output, prior to application of the transfer function, is the activation a.
• The dividing criterion is assumed to be a = 0, corresponding to an output of 0.5 after the application of the sigmoid transfer function.
• Thus the hyperplane that separates the two regions is given by
    w0 + Σ (i = 1..n) wi·xi = 0
• Example: a single layer perceptron with
  – input: 2 neurons;
  – output: 3 classes;
  – each class has 1 dividing line;
  – the classes are linearly separable.

Training a perceptron
• Training separates the regions in state space by adjusting the perceptron's weights and biases.
• The difference between the generated value and the desired value is the error.
• The overall error is expressed as the root mean square (RMS) of the individual errors (both negative and positive).
• Training minimizes the RMS error by altering the weights and biases through many passes of the training data.
• This search for the weights and biases that give the minimum RMS error is an optimization problem, with the RMS error as the cost function.
• When the RMS error settles within a small range, we say that the network has converged.
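The sketch below (illustrative, not from the slides) trains a single sigmoid node by gradient descent on the squared error, the simplest instance of the error-minimization idea just described. The AND-gate training data, learning rate, and epoch count are assumptions.

```python
# Training a single sigmoid node by gradient descent on the squared error.
# The AND-gate data, learning rate, and epoch count are illustrative choices.
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # logical AND
w, w0, lr = [0.0, 0.0], 0.0, 0.5

for epoch in range(2000):                  # many passes over the training data
    for x, d in data:
        y = sigmoid(w0 + sum(wi * xi for wi, xi in zip(w, x)))
        delta = (d - y) * y * (1 - y)      # error times sigmoid derivative
        w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
        w0 += lr * delta

for x, d in data:
    y = sigmoid(w0 + sum(wi * xi for wi, xi in zip(w, x)))
    print(x, "desired", d, "actual", round(y, 3))
```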
Training Algorithm
• The most common training algorithm is the back-error propagation (BP) algorithm, also called the generalized delta rule.
• It is a gradient-proportional descent technique that requires a continuous and differentiable transfer function, such as the sigmoid.
• For the sigmoid function Y = 1 / (1 + e^(-X)), the derivative is dY/dX = Y·(1 - Y).

BP algorithm
• Two stages:
  – gather the error terms;
  – update the weights.
• Repeat as many times as required.
[figure: a three-layer back-propagation network — input signals x1 ... xn flow forward through the input layer, the hidden layer (weights wij), and the output layer (weights wjk) to produce outputs y1 ... yl, while the error signals propagate backwards]

Example training results (the exclusive-OR operation):

  Inputs (x1, x2)   Desired output yd   Actual output y5   Error e
  1, 1              0                   0.0155             -0.0155
  0, 1              1                   0.9849              0.0151
  1, 0              1                   0.9849              0.0151
  0, 0              0                   0.0175             -0.0175

  Sum of squared errors: 0.0010

A code sketch of the BP algorithm on this problem is given at the end of these notes.

Hierarchical Perceptrons
• In complex problems it is recommended to divide the MLP into several smaller MLPs arranged in a hierarchy.
• Each MLP is independent of the others and can be trained separately or in parallel.

Some Practical Considerations
• Stop training if the RMS error remains constant, so as not to overtrain the network (an overtrained network is expert at giving correct outputs for the training data, but not for new data).
• Some reasons for overtraining:
  – too many cycles of training;
  – an over-complex network (too many hidden layers or too many neurons).
• To avoid overtraining:
  – divide the data into training, testing, and validation sets;
  – use the leave-one-out method;
  – use scaled data.

Effects of Over training
[figure: effects of overtraining]
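To tie the pieces together, here is a compact back-propagation sketch (illustrative, not from the slides) that trains a 2-2-1 sigmoid network on the exclusive-OR data from the table above. The architecture, learning rate, epoch count, and random initialization are assumptions, and the exact outputs depend on the initialization.

```python
# Backpropagation on the XOR data: a 2-2-1 sigmoid network, trained by
# gradient descent on the sum of squared errors. The architecture, learning
# rate, epoch count, and initialization are illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[1, 1], [0, 1], [1, 0], [0, 0]], dtype=float)
yd = np.array([[0], [1], [1], [0]], dtype=float)   # desired outputs

W1 = rng.uniform(-1, 1, (2, 2)); b1 = np.zeros(2)  # input -> hidden weights
W2 = rng.uniform(-1, 1, (2, 1)); b2 = np.zeros(1)  # hidden -> output weights
lr = 0.5

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

for epoch in range(10000):
    # Forward pass, then stage 1: gather the error terms (deltas).
    h = sigmoid(X @ W1 + b1)                       # hidden activations
    y = sigmoid(h @ W2 + b2)                       # network output
    delta_out = (yd - y) * y * (1 - y)             # output-layer error term
    delta_hid = (delta_out @ W2.T) * h * (1 - h)   # hidden-layer error term
    # Stage 2: update the weights and biases.
    W2 += lr * h.T @ delta_out; b2 += lr * delta_out.sum(axis=0)
    W1 += lr * X.T @ delta_hid; b1 += lr * delta_hid.sum(axis=0)

print(np.round(y, 4))                              # should approach (0, 1, 1, 0)
print("sum of squared errors:", float(((yd - y) ** 2).sum()))
```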