Artificial Intelligence
CIS 342
The College of Saint Rose
David Goldschmidt, Ph.D.
Machine Learning

Machine learning involves adaptive mechanisms
that enable computers to:
– Learn from experience
– Learn by example
– Learn by analogy

Learning capabilities improve the performance
of intelligent systems over time
The Brain

How do brains work?
– How do human brains differ from those of other animals?

Can we base models of artificial intelligence on
the structure and inner workings of the brain?
The Brain

The human brain consists of:
– Approximately 10 billion neurons
– …and 60 trillion connections

The brain is a highly complex, nonlinear,
parallel information-processing system
– By firing many neurons simultaneously, the brain can perform
  some tasks faster than the fastest computers in existence today
The Brain

Building blocks of the human brain:

[Diagram: two connected biological neurons, labeling the soma, dendrites,
axon, and synapses]
The Brain

An individual neuron has a very simple structure
– Cell body is called a soma
– Small connective fibers are called dendrites
– Single long fibers are called axons

An army of such elements constitutes
tremendous processing power
Artificial Neural Networks

An artificial neural network consists of a number
of very simple processors called neurons
– Neurons are connected by weighted links
– The links pass signals from one neuron to another
  based on predefined thresholds
Artificial Neural Networks

An individual neuron (McCulloch & Pitts, 1943):
–
–
–
–
Computes the weighted sum of the input signals
Compares the result with a threshold value, q
If the net input is less than the threshold,
the neuron output is –1 (or 0)
Otherwise, the neuron becomes activated
and its output is +1
Artificial Neural Networks

[Diagram: a single neuron receives input signals x1, x2, ..., xn through
weights w1, w2, ..., wn, compares the weighted sum against threshold θ,
and emits output signal Y]

    X = x1w1 + x2w2 + ... + xnwn
Activation Functions

Individual neurons adhere to an activation function,
which determines whether they propagate their
signal (i.e. activate) or not:

    X = Σ (i = 1 to n) xi wi

    Y = +1, if X ≥ θ
        –1, if X < θ        (sign function)
Activation Functions

[Plots of four common activation functions: step, sign, sigmoid, and linear]

    Ystep    = 1, if X ≥ 0;   0, if X < 0
    Ysign    = +1, if X ≥ 0;  –1, if X < 0
    Ysigmoid = 1 / (1 + e^(–X))
    Ylinear  = X
Write functions or methods for the
activation functions on the previous slide
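
One possible set of Python sketches for these, following the
definitions above (the function names are my own):

    import math

    def step(x):
        """Step function: 1 if the net input is at or above 0, else 0."""
        return 1 if x >= 0 else 0

    def sign(x):
        """Sign function: +1 if the net input is at or above 0, else -1."""
        return 1 if x >= 0 else -1

    def sigmoid(x):
        """Sigmoid function: squashes the net input into the range (0, 1)."""
        return 1.0 / (1.0 + math.exp(-x))

    def linear(x):
        """Linear function: passes the net input through unchanged."""
        return x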
Activation Functions

The step and sign activation functions
are also often called hard limit functions

We use such functions in
decision-making neural networks
– They support classification and
  other pattern recognition tasks
Perceptrons

Can an individual neuron learn?
– In 1958, Frank Rosenblatt introduced a
  training algorithm that provided the
  first procedure for training a
  single-node neural network
– Rosenblatt’s perceptron model consists
  of a single neuron with adjustable
  synaptic weights, followed by a hard limiter
Write code for a single two-input neuron – (see below)
Perceptrons
Set w1, w2, and θ through trial and error
to obtain a logical AND of inputs x1 and x2

[Diagram: inputs x1 and x2, weighted by w1 and w2, feed a linear combiner;
a hard limiter with threshold θ then produces output Y]

    X = x1w1 + x2w2
    Y = Ystep(X – θ)
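
A minimal Python sketch of this two-input neuron; the weight and
threshold values below are one trial-and-error choice that happens to
produce logical AND, not values given on the slides:

    def neuron(x1, x2, w1, w2, theta):
        """Two-input neuron: linear combiner followed by a hard limiter (step)."""
        x = x1 * w1 + x2 * w2          # linear combiner
        return 1 if x >= theta else 0  # hard limiter

    # One hand-picked combination that yields logical AND:
    w1, w2, theta = 0.5, 0.5, 0.7

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, "->", neuron(x1, x2, w1, w2, theta))
    # Only the input pair (1, 1) produces 1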
Perceptrons

A perceptron:
– Classifies inputs x1, x2, ..., xn into one of
  two distinct classes A1 and A2
– Forms a linearly separable function defined by:

    Σ (i = 1 to n) xi wi – θ = 0

[Plot: in the (x1, x2) plane, the line x1w1 + x2w2 – θ = 0
separates class A1 from class A2]
Perceptrons

A perceptron with three inputs x1, x2, and x3
classifies its inputs into two distinct sets A1 and A2

    Σ (i = 1 to n) xi wi – θ = 0

[Plots: with two inputs the decision boundary is the line
x1w1 + x2w2 – θ = 0; with three inputs it is the plane
x1w1 + x2w2 + x3w3 – θ = 0]
Perceptrons

How does a perceptron learn?
– A perceptron has initial (often random) weights
  typically in the range [-0.5, 0.5]
– Apply an established training dataset
– Calculate the error as
  expected output minus actual output:

    error e = Yexpected – Yactual

– Adjust the weights to reduce the error
Perceptrons

How do we adjust a perceptron’s
weights to produce Yexpected?
– If e is positive, we need to increase Yactual (and vice versa)
– Use this formula:

    wi = wi + Δwi , where Δwi = α × xi × e and

    α is the learning rate (between 0 and 1)
    e is the calculated error

For the example that follows, use threshold θ = 0.2
and learning rate α = 0.1
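
As a sketch, one training step can be written directly in Python
(the function name and variables are my own):

    def train_step(x1, x2, y_expected, w1, w2, theta, alpha=0.1):
        """One perceptron training step: compute the error, then
        adjust each weight by delta_w = alpha * x * e."""
        y_actual = 1 if x1 * w1 + x2 * w2 >= theta else 0
        e = y_expected - y_actual
        w1 = round(w1 + alpha * x1 * e, 2)   # rounded to keep the example tidy
        w2 = round(w2 + alpha * x2 * e, 2)
        return w1, w2, e

    # Example: input (1, 0) with desired output 0, starting from
    # weights 0.3 and -0.1 as in the AND example below
    print(train_step(1, 0, 0, w1=0.3, w2=-0.1, theta=0.2))  # -> (0.2, -0.1, -1)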
Perceptron Example – AND

Train a perceptron to recognize logical AND
(threshold θ = 0.2; learning rate α = 0.1)

Epoch | Inputs  | Desired   | Initial    | Actual   | Error | Final
      | x1  x2  | output Yd | weights    | output Y | e     | weights
      |         |           | w1    w2   |          |       | w1    w2
  1   | 0   0   | 0         | 0.3  -0.1  | 0        |  0    | 0.3  -0.1
      | 0   1   | 0         | 0.3  -0.1  | 0        |  0    | 0.3  -0.1
      | 1   0   | 0         | 0.3  -0.1  | 1        | -1    | 0.2  -0.1
      | 1   1   | 1         | 0.2  -0.1  | 0        |  1    | 0.3   0.0
  2   | 0   0   | 0         | 0.3   0.0  | 0        |  0    | 0.3   0.0
      | 0   1   | 0         | 0.3   0.0  | 0        |  0    | 0.3   0.0
      | 1   0   | 0         | 0.3   0.0  | 1        | -1    | 0.2   0.0
      | 1   1   | 1         | 0.2   0.0  | 1        |  0    | 0.2   0.0
  3   | 0   0   | 0         | 0.2   0.0  | 0        |  0    | 0.2   0.0
      | 0   1   | 0         | 0.2   0.0  | 0        |  0    | 0.2   0.0
      | 1   0   | 0         | 0.2   0.0  | 1        | -1    | 0.1   0.0
      | 1   1   | 1         | 0.1   0.0  | 0        |  1    | 0.2   0.1
  4   | 0   0   | 0         | 0.2   0.1  | 0        |  0    | 0.2   0.1
      | 0   1   | 0         | 0.2   0.1  | 0        |  0    | 0.2   0.1
      | 1   0   | 0         | 0.2   0.1  | 1        | -1    | 0.1   0.1
      | 1   1   | 1         | 0.1   0.1  | 1        |  0    | 0.1   0.1
  5   | 0   0   | 0         | 0.1   0.1  | 0        |  0    | 0.1   0.1
      | 0   1   | 0         | 0.1   0.1  | 0        |  0    | 0.1   0.1
      | 1   0   | 0         | 0.1   0.1  | 0        |  0    | 0.1   0.1
      | 1   1   | 1         | 0.1   0.1  | 1        |  0    | 0.1   0.1

Repeat until convergence
– i.e. the final weights do not change and no error remains
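
A sketch of a training loop that reproduces the rows of the table above
(initial weights 0.3 and -0.1, threshold θ = 0.2, learning rate α = 0.1,
as in the example; the function name is my own):

    def train_perceptron(data, w1, w2, theta=0.2, alpha=0.1, max_epochs=100):
        """Train a two-input perceptron on (x1, x2, desired) triples.
        Stops once a full epoch produces no errors (convergence)."""
        for epoch in range(1, max_epochs + 1):
            errors = 0
            for x1, x2, yd in data:
                y = 1 if x1 * w1 + x2 * w2 >= theta else 0    # step activation
                e = yd - y                                    # error
                if e != 0:
                    errors += 1
                    # weight update: delta_w = alpha * x * e
                    # (rounded so this small example avoids float drift)
                    w1 = round(w1 + alpha * x1 * e, 2)
                    w2 = round(w2 + alpha * x2 * e, 2)
                print(epoch, x1, x2, yd, y, e, w1, w2)
            if errors == 0:               # converged: weights stopped changing
                break
        return w1, w2

    AND_DATA = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
    train_perceptron(AND_DATA, w1=0.3, w2=-0.1)   # settles at w1 = w2 = 0.1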
Perceptron Example – AND

Two-dimensional plot of the logical AND operation:

[Plot: the four input points in the (x1, x2) plane; only (1, 1) belongs to
the "output 1" class, and a straight line separates it from the other three]

A single perceptron can be trained to recognize
any linearly separable function
– Can we train a perceptron to recognize logical OR?
– How about logical exclusive-OR (i.e. XOR)?
Perceptron – OR and XOR

Two-dimensional plots of logical OR and XOR:

[Plots: (b) OR (x1 ∪ x2) is linearly separable; (c) Exclusive-OR (x1 ⊕ x2)
is not: no single straight line separates its two classes]
Perceptron Coding Exercise

Modify your code to:
– Calculate the error at each step
– Modify weights, if necessary (i.e. if error is non-zero)
– Loop until all error values are zero for a full epoch

Modify your code to learn to recognize
the logical OR operation
– Try to recognize the XOR operation.... (see the sketch below)
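
Using the train_perceptron sketch from the AND example, one way to try
OR and XOR (the XOR run is expected to hit the epoch limit rather than
converge, since XOR is not linearly separable):

    OR_DATA  = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
    XOR_DATA = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

    # OR is linearly separable, so training converges:
    train_perceptron(OR_DATA, w1=0.3, w2=-0.1)

    # XOR is not linearly separable; a single perceptron keeps making
    # errors, so this stops only because of the max_epochs limit:
    train_perceptron(XOR_DATA, w1=0.3, w2=-0.1, max_epochs=20)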
Multilayer Neural Networks

Multilayer neural networks consist of:
– An input layer of source neurons
– One or more hidden layers of computational neurons
– An output layer of more computational neurons

Input signals are propagated in a
layer-by-layer feedforward manner

[Diagram: input signals enter the input layer, pass through the middle
(hidden) layer, and emerge from the output layer as output signals]
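
A small Python sketch of layer-by-layer feedforward propagation; the
sigmoid activation and all numeric weights here are illustrative, and
weights[j][i] connects input i of a layer to its neuron j:

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def forward(signals, layers):
        """Propagate the input signals through each layer in turn.
        'layers' is a list of (weights, thresholds) pairs, one per layer."""
        for weights, thresholds in layers:
            signals = [sigmoid(sum(w * s for w, s in zip(row, signals)) - theta)
                       for row, theta in zip(weights, thresholds)]
        return signals

    # A 2-input, 2-hidden, 1-output example with arbitrary numbers:
    hidden = ([[0.5, 0.4], [0.9, 1.0]], [0.8, -0.1])
    output = ([[-1.2, 1.1]], [0.3])
    print(forward([1, 0], [hidden, output]))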
Multilayer Neural Networks

[Diagram: a network with an input layer, first hidden layer, second hidden
layer, and output layer]

    XOUTPUT = yH1w11 + yH2w21 + ... + yHjwj1 + ... + yHmwm1
Multilayer Neural Networks
    XINPUT = x1
    XHIDDEN = x1w11 + x2w21 + ... + xiwi1 + ... + xnwn1

[Diagram: input signals x1 ... xn enter the input layer; weights wij connect
input-layer neuron i to hidden-layer neuron j, and weights wjk connect
hidden-layer neuron j to output-layer neuron k, producing outputs y1 ... yl;
error signals flow in the opposite direction]
Multilayer Neural Networks

Three-layer network:

[Diagram: inputs x1 and x2 feed hidden neurons 3 and 4 through weights
w13, w23, w14, w24; hidden neurons 3 and 4 feed output neuron 5 through
weights w35 and w45, producing y5; each of neurons 3, 4, and 5 applies a
threshold (θ3, θ4, θ5) via a fixed input of –1]
Multilayer Neural Networks

Commercial-quality neural networks often
incorporate 4 or more layers
– Each layer consists of about 10-1000 individual neurons

Experimental and research-based neural
networks often use 5 or 6 (or more) layers
– Overall, millions of individual neurons may be used
Back-Propagation NNs

A back-propagation neural network is a multilayer
neural network that propagates error backwards
through the network as it learns
– Weights are modified based on the calculated error
– Training is complete when the error is
  below a specified threshold
  (e.g. less than 0.001)
Back-Propagation NNs

[Diagram: input signals propagate forward from the input layer through the
hidden layer (weights wij) to the output layer (weights wjk); error signals
propagate backwards from the output layer toward the input layer]
Write code for the three-layer neural network below
– Use the sigmoid activation function; and
– apply θ by connecting a fixed input of -1 to weight θ
(a sketch follows the diagram)
Back-Propagation NNs
1
q3
x1
1
w13
3
1
w35
w23
q5
5
w
w24
14
x2
2
w45
4
w24
Input
layer
q4
1
Hiddenlayer
Output
layer
y5
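
A minimal Python sketch of this 2-2-1 network, using the sigmoid
activation and applying each threshold θ as a weight on a fixed input
of -1; the numeric weights in the example call are arbitrary
placeholders, not values from the slides:

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def three_layer_net(x1, x2,
                        w13, w23, theta3,
                        w14, w24, theta4,
                        w35, w45, theta5):
        """Forward pass of the three-layer network in the diagram."""
        y3 = sigmoid(x1 * w13 + x2 * w23 + (-1) * theta3)   # hidden neuron 3
        y4 = sigmoid(x1 * w14 + x2 * w24 + (-1) * theta4)   # hidden neuron 4
        y5 = sigmoid(y3 * w35 + y4 * w45 + (-1) * theta5)   # output neuron 5
        return y5

    # Example call with arbitrary starting weights:
    print(three_layer_net(1, 0,
                          w13=0.5, w23=0.4, theta3=0.8,
                          w14=0.9, w24=1.0, theta4=-0.1,
                          w35=-1.2, w45=1.1, theta5=0.3))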
Back-Propagation NNs
Start with random weights
– Repeat until the sum of the
  squared errors is below 0.001
– Depending on initial weights,
  final converged results may vary

[Plot: "Sum-Squared Network Error for 224 Epochs"; the sum-squared error,
shown on a log scale from 10^1 down to 10^-4, falls below 10^-3 as the
epoch count approaches 224]
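
The slides do not spell out the weight-update rule; as one standard
delta-rule sketch for this 2-2-1 sigmoid network (thresholds again
handled as weights on a fixed -1 input), the following trains on XOR
data. Depending on the random starting weights it may need many epochs
or occasionally fail to converge:

    import math, random

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def train_xor(alpha=0.1, target_sse=0.001, max_epochs=100000):
        """Back-propagation training of a 2-2-1 network on XOR data."""
        names = ("w13", "w23", "t3", "w14", "w24", "t4", "w35", "w45", "t5")
        w = {k: random.uniform(-0.5, 0.5) for k in names}   # random start
        data = [(1, 1, 0), (0, 1, 1), (1, 0, 1), (0, 0, 0)]
        for epoch in range(1, max_epochs + 1):
            sse = 0.0
            for x1, x2, yd in data:
                # forward pass
                y3 = sigmoid(x1 * w["w13"] + x2 * w["w23"] - w["t3"])
                y4 = sigmoid(x1 * w["w14"] + x2 * w["w24"] - w["t4"])
                y5 = sigmoid(y3 * w["w35"] + y4 * w["w45"] - w["t5"])
                e = yd - y5
                sse += e * e
                # backward pass: error gradients (deltas) for sigmoid neurons
                d5 = y5 * (1 - y5) * e
                d3 = y3 * (1 - y3) * d5 * w["w35"]
                d4 = y4 * (1 - y4) * d5 * w["w45"]
                # weight updates (threshold weights see a fixed input of -1)
                w["w35"] += alpha * y3 * d5
                w["w45"] += alpha * y4 * d5
                w["t5"]  += alpha * -1 * d5
                w["w13"] += alpha * x1 * d3
                w["w23"] += alpha * x2 * d3
                w["t3"]  += alpha * -1 * d3
                w["w14"] += alpha * x1 * d4
                w["w24"] += alpha * x2 * d4
                w["t4"]  += alpha * -1 * d4
            if sse < target_sse:      # stop once the sum-squared error is small
                break
        return epoch, sse, w

    print(train_xor())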
Back-Propagation NNs

After 224 epochs (896 individual iterations),
the neural network has been trained successfully:

Inputs     Desired     Actual      Error      Sum of
x1   x2    output yd   output y5   e          squared errors
1    1     0           0.0155      -0.0155
0    1     1           0.9849       0.0151
1    0     1           0.9849       0.0151
0    0     0           0.0175      -0.0175    0.0010
Back-Propagation NNs


No longer limited to linearly separable functions

Another solution:
– Isolate neuron 3, then neuron 4....

[Diagram: a hand-built two-layer solution for XOR using hard-limit neurons;
neurons 3 and 4 each receive x1 and x2 with weights +1.0 and thresholds
+1.5 and +0.5 (applied via fixed inputs of –1); neuron 5 combines their
outputs with weights –2.0 and +1.0 and threshold +0.5 to produce y5]
Back-Propagation NNs

Combine linearly separable functions of neurons 3 and 4:

    Neuron 3 boundary:  x1(+1) + x2(+1) – 1.5 = 0
    Neuron 4 boundary:  x1(+1) + x2(+1) – 0.5 = 0

[Plots: (a) the decision boundary of neuron 3, (b) the decision boundary of
neuron 4, and (c) the two boundaries combined, carving out the region that
corresponds to exclusive-OR]
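
A quick Python check of this hand-built network, assuming step
(hard-limit) neurons and reading the weight from neuron 3 to neuron 5
as -2.0 (its sign is not legible in the figure):

    def step(x):
        return 1 if x >= 0 else 0

    def xor_net(x1, x2):
        """Hand-built XOR network; thresholds applied via a fixed -1 input."""
        y3 = step(x1 * 1.0 + x2 * 1.0 - 1.5)    # neuron 3: x1 AND x2
        y4 = step(x1 * 1.0 + x2 * 1.0 - 0.5)    # neuron 4: x1 OR x2
        y5 = step(y3 * -2.0 + y4 * 1.0 - 0.5)   # neuron 5: OR and not AND
        return y5

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, "->", xor_net(x1, x2))   # prints the XOR truth table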
Using Neural Networks

Handwriting recognition

[Diagram: a handwritten character is presented to the input layer of a
network with two hidden layers; the output layer emits a binary code for
the recognized character, e.g. 0100 => 4, 0101 => 5, 0110 => 6,
0111 => 7, etc.]
Using Neural Networks

Advantages of neural networks:
– Given a training dataset, neural networks learn
– Powerful for classification and pattern matching applications

Drawbacks of neural networks:
– Solution is a “black box”
– Computationally intensive