CS 5233 Artificial Intelligence
Artificial Neural Networks

Contents

Neural Networks
  Biological Neural Networks
  Artificial Neural Networks
  ANN Structure
  ANN Illustration
Perceptrons
  Perceptron Learning Rule
  Perceptrons Continued
  Example of Perceptron Learning (α = 1)
Gradient Descent
  Gradient Descent (Linear)
  The Delta Rule
Multilayer Networks
  Illustration
  Sigmoid Activation
  Plot of Sigmoid Function
  Applying The Chain Rule
  Backpropagation
  Hypertangent Activation Function
  Gaussian Activation Function
  Issues in Neural Networks

Biological Neural Networks

Neural networks are inspired by our brains. The human brain has about 10^11 neurons and 10^14 synapses. A neuron consists of a soma (cell body), axons (which send signals), and dendrites (which receive signals). A synapse connects an axon to a dendrite. Given a signal, a synapse might increase (excite) or decrease (inhibit) the electrical potential. A neuron fires when its electrical potential reaches a threshold. Learning might occur through changes to synapses.

Artificial Neural Networks

An (artificial) neural network consists of units, connections, and weights. Inputs and outputs are numeric.

  Biological NN      Artificial NN
  soma               unit
  axon, dendrite     connection
  synapse            weight
  potential          weighted sum
  threshold          bias weight
  signal             activation

ANN Structure

A typical unit i receives inputs aj1, aj2, ... from other units and computes the weighted sum
  ini = W0,i + Σj Wj,i aj
and outputs the activation ai = g(ini). Typically, input units store the inputs, hidden units transform the inputs into an internal numeric vector, and an output unit transforms the hidden values into the prediction.

An ANN is a function f(x, W) = a, where x is an example, W is the weights, and a is the prediction (the activation value of the output unit). Learning is finding a W that minimizes error.

ANN Illustration

[Figure: a feed-forward network with input units x1..x4, hidden units H5 and H6, and output unit O7 with activation a7. Weights W15..W45 and W16..W46 connect the inputs to the hidden units, W57 and W67 connect the hidden units to the output, and W05, W06, W07 are bias weights.]
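To make the structure concrete, here is a minimal sketch (not from the slides) of the weighted-sum-and-activation computation ini = W0,i + Σj Wj,i aj with ai = g(ini), chained through a small 4-2-1 network like the one illustrated. The function names, the choice of g, and the weight values are illustrative assumptions only.

```python
# Sketch of a forward pass for a 4-2-1 network (hypothetical names and weight values).
import math

def unit_activation(bias, weights, inputs, g):
    """in_i = W0,i + sum_j Wj,i * a_j ; returns a_i = g(in_i)."""
    in_i = bias + sum(w * a for w, a in zip(weights, inputs))
    return g(in_i)

def forward(x, W_hidden, W_out, g=math.tanh):
    """x: four inputs; W_hidden: (bias, weights) per hidden unit; W_out: (bias, weights) for the output."""
    hidden = [unit_activation(b, w, x, g) for b, w in W_hidden]   # activations a5, a6
    return unit_activation(W_out[0], W_out[1], hidden, g)         # activation a7

# Placeholder weights (not the values from the slides):
W_hidden = [(0.1, [0.5, -0.2, 0.3, 0.0]),    # bias W05 and weights W15..W45 for H5
            (-0.4, [0.1, 0.2, -0.3, 0.6])]   # bias W06 and weights W16..W46 for H6
W_out = (0.2, [1.0, -1.5])                   # bias W07 and weights W57, W67 for O7
print(forward([1.0, 0.0, 1.0, 1.0], W_hidden, W_out))
```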
Perceptrons

Perceptron Learning Rule

A perceptron is a single unit with activation
  a = sign(W0 + Σj Wj * xj)
where sign returns −1 or 1 and W0 is the bias weight.

One version of the perceptron learning rule is:
  in ← W0 + Σj Wj * xj
  E ← max(0, 1 − y * in)
  if E > 0 then
    W0 ← W0 + α * y
    Wj ← Wj + α * xj * y   for each input xj
Here x is the inputs, y ∈ {1, −1} is the target, and α > 0 is the learning rate.

Perceptrons Continued

This learning rule tends to minimize E. The perceptron convergence theorem states that if some W classifies all the training examples correctly, then the perceptron learning rule will converge to zero error on the training examples. Usually, many epochs (passes over the training examples) are needed until convergence. If zero error is not possible, use α ≈ 0.1/n, where n is the number of normalized or standardized inputs.

Example of Perceptron Learning (α = 1)

[Table: a worked run of the rule with α = 1, showing for each training example the inputs x1..x4, the target y, the weighted sum in, the error E, and the weights W0..W4 after the update.]

Gradient Descent

Gradient Descent (Linear)

Suppose the activation is a linear function:
  a = W0 + Σj Wj * xj
The Widrow-Hoff (LMS) update rule is:
  diff ← y − a
  W0 ← W0 + α * diff
  Wj ← Wj + α * xj * diff   for each input j
where α is the learning rate. This update rule tends to minimize the squared error
  E(W) = Σ(x,y) (y − a)^2.

The Delta Rule

Given an error function E(W), obtain the gradient
  ∇E(W) = (∂E/∂W0, ∂E/∂W1, ..., ∂E/∂Wn).
To decrease the error, use the update rule
  Wj ← Wj − α * ∂E/∂Wj.
The LMS update rule can be derived using
  E(W) = (y − (W0 + Σj Wj * xj))^2 / 2.
For the perceptron learning rule, use
  E(W) = max(0, 1 − y * (W0 + Σj Wj * xj)).

Multilayer Networks

Illustration

[Figure: the same four-input, two-hidden-unit, one-output network, with example weight values attached to the connections and bias weights.]

Sigmoid Activation

The sigmoid function is defined as
  sigmoid(x) = 1 / (1 + e^(−x)).
It is commonly used as an ANN activation function:
  ai = sigmoid(ini) = sigmoid(W0,i + Σj Wj,i aj).
Note that
  ∂sigmoid(x)/∂x = sigmoid(x) (1 − sigmoid(x)).

Plot of Sigmoid Function

[Figure: plot of sigmoid(x); the curve rises from 0 toward 1, with sigmoid(0) = 0.5.]

Applying The Chain Rule

Using E = (yi − ai)^2 for output unit i:
  ∂E/∂Wj,i = (∂E/∂ai) (∂ai/∂ini) (∂ini/∂Wj,i)
           = −2 (yi − ai) ai (1 − ai) aj
For weights from input to hidden units:
  ∂E/∂Wk,j = (∂E/∂ai) (∂ai/∂ini) (∂ini/∂aj) (∂aj/∂inj) (∂inj/∂Wk,j)
           = −2 (yi − ai) ai (1 − ai) Wj,i aj (1 − aj) xk

Backpropagation

Backpropagation is an application of the delta rule. Update each weight W using the rule
  W ← W − α * ∂E/∂W
where α is the learning rate.
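As one way to turn these gradients into code, the sketch below applies the update W ← W − α ∂E/∂W to a single-output network with sigmoid hidden and output units, using E = (y − a)^2 as above. The names, data layout, and learning-rate default are assumptions, not from the slides.

```python
# Sketch of one backpropagation step for a 1-hidden-layer sigmoid network (illustrative names).
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(x, y, Wh, Wo, alpha=0.1):
    """Wh: list of (bias, weights) per hidden unit; Wo: (bias, weights) for the output unit.
    Returns updated (Wh, Wo) after one gradient step on a single example (x, y)."""
    # Forward pass: hidden activations a_j and output activation a_i.
    a_hidden = [sigmoid(b + sum(w * xi for w, xi in zip(ws, x))) for b, ws in Wh]
    a_out = sigmoid(Wo[0] + sum(w * h for w, h in zip(Wo[1], a_hidden)))

    # Output unit: dE/dW_{j,i} = -2 (y - a_i) a_i (1 - a_i) a_j.
    delta_out = -2.0 * (y - a_out) * a_out * (1.0 - a_out)
    new_Wo = (Wo[0] - alpha * delta_out,                              # bias weight has a_j = 1
              [w - alpha * delta_out * h for w, h in zip(Wo[1], a_hidden)])

    # Hidden units: dE/dW_{k,j} = delta_out * W_{j,i} * a_j (1 - a_j) * x_k.
    new_Wh = []
    for (b, ws), w_ji, a_j in zip(Wh, Wo[1], a_hidden):
        delta_h = delta_out * w_ji * a_j * (1.0 - a_j)
        new_Wh.append((b - alpha * delta_h,
                       [w - alpha * delta_h * xk for w, xk in zip(ws, x)]))
    return new_Wh, new_Wo
```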
Hypertangent Activation Function

[Figure: plot of tanh(x); the curve rises from −1 toward 1, with tanh(0) = 0.]

Gaussian Activation Function

[Figure: plot of gaussian(x, 1); a bell-shaped curve with its maximum at x = 0.]

Issues in Neural Networks

Sufficiently complex ANNs can approximate any "reasonable" function. ANNs have, approximately, a preference bias for interpolation.

How many hidden units and layers should be used? What are good initial values for the weights?

One approach to avoid overfitting is:
  Remove "validation" examples from the training examples.
  Train the neural network using the training examples.
  Choose the weights that perform best on the validation set.

Faster algorithms might rely on:
  momentum, a running average of deltas;
  conjugate gradient, a second-derivative method; or
  RPROP, which updates based only on the sign of the derivative.
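A rough sketch of the validation-set idea above follows. The helper callables train_one_epoch and error are hypothetical placeholders for whatever training step (e.g., backpropagation over the training examples) and error measure are in use; the split fraction and epoch count are assumptions.

```python
# Sketch of validation-based weight selection to limit overfitting (hypothetical helpers).
import copy
import random

def train_with_validation(examples, init_weights, train_one_epoch, error,
                          val_fraction=0.2, epochs=100, seed=0):
    """Hold out validation examples, train on the rest, and keep the weights
    that score best on the validation set."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_val = max(1, int(val_fraction * len(shuffled)))
    val, train = shuffled[:n_val], shuffled[n_val:]

    weights = copy.deepcopy(init_weights)
    best_weights, best_err = copy.deepcopy(weights), error(weights, val)
    for _ in range(epochs):
        weights = train_one_epoch(weights, train)   # one pass over the training examples
        val_err = error(weights, val)
        if val_err < best_err:                      # remember the best-so-far weights
            best_weights, best_err = copy.deepcopy(weights), val_err
    return best_weights
```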