Lecture 9
Artificial Intelligence
The study of computer systems that attempt to model and apply the intelligence of the human mind.
Areas of study in AI:
Natural Language Processing
Machine Vision
Knowledge Representation
Learning Systems
Expert Systems
General Problem Solving
Neural Networks
Genetic Algorithms
Autonomous Robots
The Turing Test
Semantic Networks
Solution/Problem Space Tree Searches

Aspects of the human mind:
Self-Awareness (Consciousness)
Creativity
Humor
Empathy
Emotions (anger, love, fear, curiosity, depression, joy, desire)
Self-Motivation
Biological Neural Networks
[Figure: "Biological Neuron": a neuron with its dendrites, collaterals, cell body, and axon labeled, ending in a synapse; arrows indicate the direction of signal travel.]
[Figure: "Biological Network": detail of a synapse. The electrical signal travels along the axon; vesicles release neurotransmitters from the presynaptic membrane, across the synaptic gap, to the postsynaptic membrane, where the electrical signal continues into a dendrite.]
Image source: http://pharmacyebooks.com/2010/10/artifitial-neural-networks-hot-topic-pharmaceutical-research.html
The Perceptron
The perceptron was developed by Frank Rosenblatt in 1957. It is a simple feed-forward
network that can solve (create a decision function for) linearly separable problems.
[Figure: the perceptron maps input data in $(-\infty, +\infty)$ to an output in $\{-1, +1\}$.]
Inside the Perceptron
[Figure: inside the perceptron, a sigma-pi unit: inputs $x_1, \dots, x_N$ are multiplied by weights $w_1, \dots, w_N$ and summed; the sum is passed through a step function to produce the perceptron output.]

$y = \mathrm{step}\left(\sum_{i=1}^{N} w_i x_i\right)$
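As a concrete illustration, here is a minimal C# sketch of the perceptron's decision function, assuming the weights have already been found; the Perceptron class and Classify method names are illustrative, not taken from the lecture code.

using System;

public class Perceptron
{
    private readonly double[] w;   // one weight per input

    public Perceptron(double[] weights) { w = weights; }

    // Weighted sum of the inputs followed by a step function:
    // +1 on one side of the decision boundary, -1 on the other.
    public int Classify(double[] x)
    {
        double sum = 0.0;
        for (int i = 0; i < w.Length; i++)
            sum += w[i] * x[i];
        return sum >= 0.0 ? 1 : -1;
    }
}

With two inputs, the weights define a line through the origin; a constant bias input is commonly added so the decision line can shift away from the origin.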
When is a Problem Linearly Separable?
A two-class problem is linearly separable when a single straight line (more generally, a hyperplane) can divide all samples of one class from all samples of the other.
[Figure: two scatter plots of RED vs BLUE points. In one, a straight line separates the classes (linearly separable); in the other, no straight line can (not linearly separable).]
http://dynamicnotions.blogspot.com/2008/09/single-layer-perceptron.html
The Iris Flower Dataset
[Figure: photographs of the three species (Iris setosa, Iris versicolor, Iris virginica) with the sepal and petal labeled.]
https://en.wikipedia.org/wiki/Iris_flower_data_set
http://sebastianraschka.com/Articles/2014_python_lda.html
A Practical Application of a Neural Network
Classification
The Iris Data - This is one of the most famous datasets used to illustrate the classification problem. From four
characteristics of the flower (the length of the sepal, the width of the sepal, the length of the petal and the width of
the petal), the objective is to classify a sample of 150 irises into three species: versicolor, virginica and setosa.
Source: R.A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, 7(2), 179–188 (1936).
Data from: UCI Machine Learning Repository - http://archive.ics.uci.edu/ml/
Training a 4-2-1 Network for the Iris Data
One fifth of the Iris data was selected uniformly, 10 samples per class, for a total of
30 training set pairs. The 4-2-1 network comprises a total of 10 weights: 8 between
the input and hidden layers, and 2 between the hidden layer and the output.
[Figure: the four inputs (sepal length, sepal width, petal length, petal width) feed the 4-2-1 net, whose single output encodes the class: 0.0 Iris-setosa, 0.5 Iris-versicolor, 1.0 Iris-virginica.]
Iris Data: 3 classes, 50 samples each.
iris characteristics (first five samples of each class shown):

sepal length  sepal width  petal length  petal width  class
5.1           3.5          1.4           0.2          Iris-setosa
4.9           3.0          1.4           0.2          Iris-setosa
4.7           3.2          1.3           0.2          Iris-setosa
4.6           3.1          1.5           0.2          Iris-setosa
5.0           3.6          1.4           0.2          Iris-setosa
:
7.0           3.2          4.7           1.4          Iris-versicolor
6.4           3.2          4.5           1.5          Iris-versicolor
6.9           3.1          4.9           1.5          Iris-versicolor
5.5           2.3          4.0           1.3          Iris-versicolor
6.5           2.8          4.6           1.5          Iris-versicolor
:
6.3           3.3          6.0           2.5          Iris-virginica
5.8           2.7          5.1           1.9          Iris-virginica
7.1           3.0          5.9           2.1          Iris-virginica
6.3           2.9          5.6           1.8          Iris-virginica
6.5           3.0          5.8           2.2          Iris-virginica

The outputs for the three classes were set to 0, 0.5 and 1.0.

trained network specification:
input layer        4
hidden layer       2
output layer       1
learning rate      0.28
error limit        0.01
max runs           10000
# training sets    30
ihweights          0.1835137273718
                   -1.52185484488147
                   1.06085392071769
                   -10.1057086709985
                   -1.53328697751333
                   4.0131689222145
                   -1.63759087701708
                   10.741961194748
howeights          -6.01331593454728
                   6.66056158141261
Classifier Performance

Sample Count (rows: assigned class, columns: actual class)
      1    2    3
1    50    0    0
2     0   46    1
3     0    4   49

Perf. Fraction
      1     2     3
1   1.00  0.00  0.00
2   0.00  0.92  0.02
3   0.00  0.08  0.98

Overall, 145 of the 150 samples (96.7%) are assigned to the correct class.
A Demonstration
Typical Feed-Forward Neural Network
[Figure: input data in $(-\infty, +\infty)$ is presented to the input layer, passes through the hidden layer, and emerges from the output layer as output data in $(-1, +1)$.]
Inside an Artificial Neuron
[Figure: inside a sigma-pi neuron: outputs $O_1, \dots, O_N$ from the previous layer are multiplied by weights $w_1, \dots, w_N$ and summed; the sum $\sum_i w_i O_i$ is passed through a sigmoid function, and the neuron's output is distributed to the next layer.]
Backward Error Propagation
1. Initialize the network with small random weights.
2. Present an input pattern to the input layer of the network.
3. Feed the input pattern forward through the network to calculate its activation value.
4. Take the difference between the desired output and the activation value to calculate the
network's activation error.
5. Adjust the weights feeding the output neurons to reduce the activation error for this
input pattern.
6. Propagate an error value back to each hidden neuron that is proportional to its
contribution to the network activation error.
7. Adjust the weights feeding each hidden neuron to reduce its contribution to the error for
this input pattern.
8. Repeat steps 2 to 7 for each input pattern in the training set ensemble.
9. Repeat step 8 until the network is suitably trained (a minimal driver loop is sketched below).
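The steps above map directly onto the routines shown later in this lecture (calcInputLayer, calcHiddenLayer, calcOutputLayer, calcOutputError, calcHiddenError). A minimal sketch of the outer driver loop might look like the following; totalError() is a hypothetical helper that sums the squared output errors over the whole training ensemble, since the lecture code for the stopping test is not shown.

public void train()
{
    for (int r = 0; r < maxnumruns; r++)   // step 9: up to maxnumruns passes
    {
        for (int p = 0; p < npairs; p++)   // step 8: each training pair
        {
            calcInputLayer(p);             // step 2: present the input pattern
            calcHiddenLayer();             // step 3: feed forward ...
            calcOutputLayer();             // ... to the output layer
            calcOutputError(p, r);         // steps 4-5: output error, adjust ho weights
            calcHiddenError(p, r);         // steps 6-7: hidden error, adjust ih weights
        }
        if (totalError() < error)          // hypothetical stopping test on ensemble error
            break;
    }
}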
Implementing a Neural Network
[Figure: t input training sets, each with m values, are presented to the m input-layer nodes; an m×n matrix of weights connects the input layer to the n hidden-layer nodes; an n×p matrix of weights connects the hidden layer to the p output-layer nodes, whose outputs are compared against t output training sets, each with p values.]
Neural Network Data Structure & Components
public static double learn = 0.28;      // learning rate
public static double error = 0.01;      // error limit for stopping
public static int npairs = 0;           // number of training pairs
public static int maxnumruns = 10000;   // maximum training runs
public static int numinput = 1;         // nodes in the input layer
public static int numhidden = 1;        // nodes in the hidden layer
public static int numoutput = 1;        // nodes in the output layer
public static double[,] inTrain;        // input training sets
public static double[,] outTrain;       // output training sets
public static neuron[] iLayer;          // input layer
public static neuron[] hLayer;          // hidden layer
public static neuron[] oLayer;          // output layer
public static weight[,] ihWeight;       // input-to-hidden weights
public static weight[,] hoWeight;       // hidden-to-output weights
public static int pxerr;
public static double Scalerr;
public static bool showtoterr = true;   // display total error during training
public class neuron
{
public double input;
public double output;
public double error;
public neuron()
{
input = 0.0;
output = 0.0;
error = 0.0;
}
}
public class weight
{
public double wt;
public double delta;
public weight(double wght)
{
wt = wght;
delta = 0.0;
}
}
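Step 1 of the training procedure, initializing the network with small random weights, is not shown in the lecture code. A plausible initializer using the classes above might look like this; the (-0.5, +0.5) range for the initial weights is an assumption.

public void initNetwork()
{
    Random rand = new Random();

    // Allocate the three layers.
    iLayer = new neuron[numinput];
    hLayer = new neuron[numhidden];
    oLayer = new neuron[numoutput];
    for (int i = 0; i < numinput; i++) iLayer[i] = new neuron();
    for (int h = 0; h < numhidden; h++) hLayer[h] = new neuron();
    for (int o = 0; o < numoutput; o++) oLayer[o] = new neuron();

    // Seed every weight with a small random value in (-0.5, +0.5) (assumed range).
    ihWeight = new weight[numinput, numhidden];
    for (int i = 0; i < numinput; i++)
        for (int h = 0; h < numhidden; h++)
            ihWeight[i, h] = new weight(rand.NextDouble() - 0.5);

    hoWeight = new weight[numhidden, numoutput];
    for (int h = 0; h < numhidden; h++)
        for (int o = 0; o < numoutput; o++)
            hoWeight[h, o] = new weight(rand.NextDouble() - 0.5);
}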
Generalized Delta Rule

$\Delta_p w_{ij} = \eta\, \delta_{pj}\, o_{pi}$

$\Delta_p w_{ij}$: correction to the weight value
$\delta_{pj}$: error in the jth unit
$\eta$: learning rate
$o_{pi}$: pth training set input (the output of unit i for pattern p)
Quantifying Error for Back Propagation

$f(net_{pj})$: neuron output function for the pth presentation of a training pattern

Output layer: $\delta_{pj} = f'(net_{pj})\,(t_{pj} - o_{pj})$, the error for the jth unit in the output layer, where $t_{pj}$ is the pth training set output.

Hidden layer: $\delta_{pj} = f'(net_{pj}) \sum_k \delta_{pk} w_{kj}$, the error for the jth unit in the hidden layer.

[Figure: each hidden-layer error $\delta_{pj}$ gathers the output-layer errors $\delta_{pk}$ through the weights $w_{kj}$ connecting the hidden layer to the output layer.]
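A short worked example with illustrative numbers, using the logistic sigmoid $f(x) = 1/(1+e^{-x})$ from the next slide: suppose an output unit receives $net_{pj} = 0.4$, the target is $t_{pj} = 1.0$, the learning rate is $\eta = 0.28$, and the hidden unit feeding it has output $o_{pi} = 0.8$.

$o_{pj} = f(0.4) = 1/(1 + e^{-0.4}) \approx 0.599$
$f'(0.4) = f(0.4)\,(1 - f(0.4)) \approx 0.240$
$\delta_{pj} = f'(net_{pj})\,(t_{pj} - o_{pj}) \approx 0.240 \times 0.401 \approx 0.096$
$\Delta_p w_{ij} = \eta\, \delta_{pj}\, o_{pi} \approx 0.28 \times 0.096 \times 0.8 \approx 0.022$

So on this presentation the weight from that hidden unit to the output unit is nudged up by about 0.022.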
The Sigmoid Function

$f(x) = \dfrac{2}{1 + e^{-2x}} - 1$  (sigmoid)

$f'(x) = 1 - f(x)^2$  (derivative of the sigmoid)

Another Sigmoid Function

$f(x) = \dfrac{1}{1 + e^{-x}}$  (sigmoid)

$f'(x) = f(x)\left(1 - f(x)\right)$  (derivative of the sigmoid)
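Both sigmoids are straightforward to code; the lecture's own f and df (shown in the next slide's code) implement the second pair. The f2/df2 names here are illustrative additions for the first sigmoid, which is algebraically the hyperbolic tangent.

// First sigmoid: f(x) = 2/(1 + e^(-2x)) - 1, output in (-1, +1);
// this is tanh(x), with derivative f'(x) = 1 - f(x)^2.
public double f2(double x)
{
    return 2.0 / (1.0 + Math.Exp(-2.0 * x)) - 1.0;
}
public double df2(double x)
{
    double y = f2(x);
    return 1.0 - y * y;
}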
Running the Neural Network
public void calcInputLayer(int p)
{
for (int i = 0; i < iLayer.Length; i++)
{
iLayer[i].output = inTrain[i, p];
}
}
public void calcHiddenLayer()
{
for (int h = 0; h < hLayer.Length; h++)
{
hLayer[h].input = 0.0;
for (int i = 0; i < iLayer.Length; i++)
hLayer[h].input += ihWeight[i, h].wt * iLayer[i].output;
hLayer[h].output = f(hLayer[h].input);
}
}
public void calcOutputLayer()
{
for (int o = 0; o < oLayer.Length; o++)
{
oLayer[o].input = 0.0;
for (int h = 0; h < hLayer.Length; h++)
oLayer[o].input += hoWeight[h, o].wt * hLayer[h].output;
oLayer[o].output = f(oLayer[o].input);
}
}
public double f(double x)
{
return 1.0 / (1.0 + Math.Exp(-x));
}
public double df(double x)
{
return f(x) * (1.0 - f(x));
}
Running the network is a feed-forward
process. The input data is presented to the
input layer.

The activation (input) is computed for each
node of the hidden layer and then used to
compute the output of the hidden-layer
nodes.

The activation (input) of each output node is
computed and used to compute the output of
the network.
Training the Network
In backward error propagation, the difference between the actual output and the goal (or target)
output provided in the training set is used to compute the error in the network. This error is then used
to compute the delta (change) in weight values for the weights between the hidden layer and the
output layer.
public void calcOutputError(int p, int r)
{
for (int o = 0; o < oLayer.Length; o++)
oLayer[o].error = df(oLayer[o].input) * (outTrain[o, p] - oLayer[o].output);
for (int h = 0; h < hLayer.Length; h++)
for (int o = 0; o < oLayer.Length; o++)
hoWeight[h, o].wt += learn * oLayer[o].error * hLayer[h].output;
}
public void calcHiddenError(int p, int r)
{
    for (int h = 0; h < hLayer.Length; h++)
    {
        double err = 0.0;   // accumulate this hidden node's share of the output error
        for (int o = 0; o < oLayer.Length; o++)
            err += oLayer[o].error * hoWeight[h, o].wt;
        hLayer[h].error = df(hLayer[h].input) * err;
    }
    for (int i = 0; i < iLayer.Length; i++)
        for (int h = 0; h < hLayer.Length; h++)
            ihWeight[i, h].wt += learn * hLayer[h].error * iLayer[i].output;
}
These new weight values are then used to distribute the output error to the hidden-layer nodes. These
node errors are, in turn, used to compute the changes in value for the weights between the input layer
and the hidden layer of the network.
1. Set the number of neurons in each level
2. Select the learning rate, error limit and max
training runs
3. Give the number of training pairs and list them in
the left-hand text window, with input/output
pairs in sequence:
input 1
output 1
input 2
output 2
:
input n
output n
Total Training Set Ensemble Error
[Plot: total training-set ensemble error decreasing during the training process; the training rate depends on the initial values of the random weights.]
How Many Nodes?
Number of Input Layer Nodes matches number of input values
Number of Output Layer Nodes matches number of output values
But what about the hidden Layer?
Too few hidden layer nodes and the NN can't learn the patterns.
Too many hidden layer nodes and the NN doesn't generalize.
When Should We Use Neural Networks?
Neural Networks need lots of data (example solutions) for training.
The functional relationships of the problem/solution are not well understood.
The problem/solution is not applicable to a rule-based solution.
"Similar input data sets generate "similar" outputs.
Neural Networks perform general Pattern Recognition.
Neural Networks are particularly good as Decision Support tools.
Also good for modeling behavior of living systems.
Can a Neural Network do More than a Digital Computer?
Clearly a simulation of a Neural Network running on a digital computer cannot be more powerful than the
computer on which it is executed.
The question is, "Can a computational system such as a Neural Network be built that can do something that a
digital computer cannot?"
A digital computer is the physical embodiment of a Turing Machine which is defined as a universal computer of
all computable functions.
An artificial Neural Network is loosely modeled on the human brain.
Rather than using a software simulation of neurons, we can build electronic circuits that closely mimic the
activities of human brain cells.
Can we build a physical system of any kind (based on electronics, chemistry, etc...) that does everything a
human brain can do?
Can you think of something human brains do that, so far, has not been accomplished or, at least, approximated
by a computer or any other physical (man-made) system?
What is the Computational Power of the Human Mind?
Since we can't quantify consciousness, it is not likely that we can determine the level of computational
power necessary to manifest it.
However, we can establish a relative measure of computational power for systems that do and (so far) do
not exhibit consciousness.
Human Mind/Brain
Turing Machine
Digital Computer
Neural Network
Physical System/Model
Relative Computational Power
[Diagram: the systems compared for relative computational power: Mind/Brain, Turing Machine, Digital Computer, Physical System, Neural Network.]
[Diagram, annotated: between Mind/Brain and Turing Machine sit the questions of Dualism vs Materialism and the Revised Turing Test; between Turing Machine and Digital Computer, finite storage and finite precision; between Digital Computer and the Physical System / Neural Network, Symbolism vs Connectionism and the limits of engineering and technology.]
Due to the limitations of finite storage and the related issue of finite-precision arithmetic, a Turing Machine can exhibit greater computational power than a digital computer.