Lecture 9
Artificial Intelligence
The study of computer systems that attempt to model and apply the intelligence of the human mind.
Areas of study in AI:
Natural Language Processing
Machine Vision
Knowledge Representation
Learning Systems
Expert Systems
General Problem Solving
Neural Networks
Genetic Algorithms
Autonomous Robots
The Turing Test
Semantic Networks
Solution/Problem Space Tree Searches

Aspects of the human mind:
Self-Awareness (Consciousness)
Creativity
Humor
Empathy
Emotions (anger, love, fear, curiosity, depression, joy, desire)
Self-Motivation
Biological Neural Networks
[Figure: "Biological Neuron": a neuron with its dendrites, collaterals, cell body, and axon labeled, ending in a synapse; arrows indicate the direction of signal travel.]
[Figure: "Biological Network": detail of a synapse. The electrical signal travels along the axon; vesicles release neurotransmitters from the presynaptic membrane, across the synaptic gap, to the postsynaptic membrane, where the electrical signal continues into a dendrite.]
Image source: http://pharmacyebooks.com/2010/10/artifitial-neural-networks-hot-topic-pharmaceutical-research.html
The Perceptron
The perceptron was developed by Frank Rosenblatt in 1957. It is a simple feed-forward
network that can solve (create a decision function for) linearly separable problems.
[Figure: the perceptron maps input data in $(-\infty, +\infty)$ to an output in $\{-1, +1\}$.]
Inside the Perceptron
[Figure: inside the perceptron, a sigma-pi unit: inputs $x_1, \dots, x_N$ are multiplied by weights $w_1, \dots, w_N$ and summed; the sum is passed through a step function to produce the perceptron output.]

$y = \mathrm{step}\left(\sum_{i=1}^{N} w_i x_i\right)$
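As a concrete illustration, here is a minimal C# sketch of the perceptron's decision function, assuming the weights have already been found; the Perceptron class and Classify method names are illustrative, not taken from the lecture code.

using System;

public class Perceptron
{
    private readonly double[] w;   // one weight per input

    public Perceptron(double[] weights) { w = weights; }

    // Weighted sum of the inputs followed by a step function:
    // +1 on one side of the decision boundary, -1 on the other.
    public int Classify(double[] x)
    {
        double sum = 0.0;
        for (int i = 0; i < w.Length; i++)
            sum += w[i] * x[i];
        return sum >= 0.0 ? 1 : -1;
    }
}

With two inputs, the weights define a line through the origin; a constant bias input is commonly added so the decision line can shift away from the origin.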
When is a Problem Linearly Separable?
A two-class problem is linearly separable when a single straight line (more generally, a hyperplane) can divide all samples of one class from all samples of the other.
[Figure: two scatter plots of RED vs BLUE points. In one, a straight line separates the classes (linearly separable); in the other, no straight line can (not linearly separable).]
http://dynamicnotions.blogspot.com/2008/09/single-layer-perceptron.html
The Iris Flower Dataset
[Figure: photographs of the three species (Iris setosa, Iris versicolor, Iris virginica) with the sepal and petal labeled.]
https://en.wikipedia.org/wiki/Iris_flower_data_set
http://sebastianraschka.com/Articles/2014_python_lda.html
A Practical Application of a Neural Network
Classification
The Iris Data - This is one of the most famous datasets used to illustrate the classification problem. From four
characteristics of the flower (the length of the sepal, the width of the sepal, the length of the petal and the width of
the petal), the objective is to classify a sample of 150 irises into three species: versicolor, virginica and setosa.
Source: R.A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, 7(2), 179–188 (1936).
Data from: UCI Machine Learning Repository - http://archive.ics.uci.edu/ml/
Training a 4-2-1 Network for the Iris Data
One fifth of the Iris data was selected uniformly, 10 samples per class, for a total of
30 training set pairs. The 4-2-1 network comprises a total of 10 weights: 8 between
the input and hidden layers, and 2 between the hidden layer and the output.
[Figure: the four inputs (sepal length, sepal width, petal length, petal width) feed the 4-2-1 net, whose single output encodes the class: 0.0 Iris-setosa, 0.5 Iris-versicolor, 1.0 Iris-virginica.]
Iris Data: 3 classes, 50 samples each.
iris characteristics (first five samples of each class shown):

sepal length  sepal width  petal length  petal width  class
5.1           3.5          1.4           0.2          Iris-setosa
4.9           3.0          1.4           0.2          Iris-setosa
4.7           3.2          1.3           0.2          Iris-setosa
4.6           3.1          1.5           0.2          Iris-setosa
5.0           3.6          1.4           0.2          Iris-setosa
:
7.0           3.2          4.7           1.4          Iris-versicolor
6.4           3.2          4.5           1.5          Iris-versicolor
6.9           3.1          4.9           1.5          Iris-versicolor
5.5           2.3          4.0           1.3          Iris-versicolor
6.5           2.8          4.6           1.5          Iris-versicolor
:
6.3           3.3          6.0           2.5          Iris-virginica
5.8           2.7          5.1           1.9          Iris-virginica
7.1           3.0          5.9           2.1          Iris-virginica
6.3           2.9          5.6           1.8          Iris-virginica
6.5           3.0          5.8           2.2          Iris-virginica

The outputs for the three classes were set to 0, 0.5 and 1.0.

trained network specification:
input layer        4
hidden layer       2
output layer       1
learning rate      0.28
error limit        0.01
max runs           10000
# training sets    30
ihweights          0.1835137273718
                   -1.52185484488147
                   1.06085392071769
                   -10.1057086709985
                   -1.53328697751333
                   4.0131689222145
                   -1.63759087701708
                   10.741961194748
howeights          -6.01331593454728
                   6.66056158141261
Classifier Performance

Sample Count (rows: assigned class, columns: actual class)
      1    2    3
1    50    0    0
2     0   46    1
3     0    4   49

Perf. Fraction
      1     2     3
1   1.00  0.00  0.00
2   0.00  0.92  0.02
3   0.00  0.08  0.98

Overall, 145 of the 150 samples (96.7%) are assigned to the correct class.
A Demonstration
Typical Feed-Forward Neural Network
[Figure: input data in $(-\infty, +\infty)$ is presented to the input layer, passes through the hidden layer, and emerges from the output layer as output data in $(-1, +1)$.]
Inside an Artificial Neuron
[Figure: inside a sigma-pi neuron: outputs $O_1, \dots, O_N$ from the previous layer are multiplied by weights $w_1, \dots, w_N$ and summed; the sum $\sum_i w_i O_i$ is passed through a sigmoid function, and the neuron's output is distributed to the next layer.]
Backward Error Propagation
1. Initialize the network with small random weights.
2. Present an input pattern to the input layer of the network.
3. Feed the input pattern forward through the network to calculate its activation value.
4. Take the difference between the desired output and the activation value to calculate the
network's activation error.
5. Adjust the weights feeding the output neurons to reduce the activation error for this
input pattern.
6. Propagate an error value back to each hidden neuron that is proportional to its
contribution to the network activation error.
7. Adjust the weights feeding each hidden neuron to reduce its contribution to the error for
this input pattern.
8. Repeat steps 2 to 7 for each input pattern in the training set ensemble.
9. Repeat step 8 until the network is suitably trained (a minimal driver loop is sketched below).
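The steps above map directly onto the routines shown later in this lecture (calcInputLayer, calcHiddenLayer, calcOutputLayer, calcOutputError, calcHiddenError). A minimal sketch of the outer driver loop might look like the following; totalError() is a hypothetical helper that sums the squared output errors over the whole training ensemble, since the lecture code for the stopping test is not shown.

public void train()
{
    for (int r = 0; r < maxnumruns; r++)   // step 9: up to maxnumruns passes
    {
        for (int p = 0; p < npairs; p++)   // step 8: each training pair
        {
            calcInputLayer(p);             // step 2: present the input pattern
            calcHiddenLayer();             // step 3: feed forward ...
            calcOutputLayer();             // ... to the output layer
            calcOutputError(p, r);         // steps 4-5: output error, adjust ho weights
            calcHiddenError(p, r);         // steps 6-7: hidden error, adjust ih weights
        }
        if (totalError() < error)          // hypothetical stopping test on ensemble error
            break;
    }
}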
Implementing a Neural Network
[Figure: t input training sets, each with m values, are presented to the m input-layer nodes; an m×n matrix of weights connects the input layer to the n hidden-layer nodes; an n×p matrix of weights connects the hidden layer to the p output-layer nodes, whose outputs are compared against t output training sets, each with p values.]
Neural Network Data Structure & Components
public static double learn = 0.28;      // learning rate
public static double error = 0.01;      // error limit for stopping
public static int npairs = 0;           // number of training pairs
public static int maxnumruns = 10000;   // maximum training runs
public static int numinput = 1;         // nodes in the input layer
public static int numhidden = 1;        // nodes in the hidden layer
public static int numoutput = 1;        // nodes in the output layer
public static double[,] inTrain;        // input training sets
public static double[,] outTrain;       // output training sets
public static neuron[] iLayer;          // input layer
public static neuron[] hLayer;          // hidden layer
public static neuron[] oLayer;          // output layer
public static weight[,] ihWeight;       // input-to-hidden weights
public static weight[,] hoWeight;       // hidden-to-output weights
public static int pxerr;
public static double Scalerr;
public static bool showtoterr = true;   // display total error during training
public class neuron
{
public double input;
public double output;
public double error;
public neuron()
{
input = 0.0;
output = 0.0;
error = 0.0;
}
}
public class weight
{
public double wt;
public double delta;
public weight(double wght)
{
wt = wght;
delta = 0.0;
}
}
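Step 1 of the training procedure, initializing the network with small random weights, is not shown in the lecture code. A plausible initializer using the classes above might look like this; the (-0.5, +0.5) range for the initial weights is an assumption.

public void initNetwork()
{
    Random rand = new Random();

    // Allocate the three layers.
    iLayer = new neuron[numinput];
    hLayer = new neuron[numhidden];
    oLayer = new neuron[numoutput];
    for (int i = 0; i < numinput; i++) iLayer[i] = new neuron();
    for (int h = 0; h < numhidden; h++) hLayer[h] = new neuron();
    for (int o = 0; o < numoutput; o++) oLayer[o] = new neuron();

    // Seed every weight with a small random value in (-0.5, +0.5) (assumed range).
    ihWeight = new weight[numinput, numhidden];
    for (int i = 0; i < numinput; i++)
        for (int h = 0; h < numhidden; h++)
            ihWeight[i, h] = new weight(rand.NextDouble() - 0.5);

    hoWeight = new weight[numhidden, numoutput];
    for (int h = 0; h < numhidden; h++)
        for (int o = 0; o < numoutput; o++)
            hoWeight[h, o] = new weight(rand.NextDouble() - 0.5);
}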
Generalized Delta Rule

$\Delta_p w_{ij} = \eta\, \delta_{pj}\, o_{pi}$

$\Delta_p w_{ij}$: correction to the weight value
$\delta_{pj}$: error in the jth unit
$\eta$: learning rate
$o_{pi}$: pth training set input (the output of unit i for pattern p)
Quantifying Error for Back Propagation

$f(net_{pj})$: neuron output function for the pth presentation of a training pattern

Output layer: $\delta_{pj} = f'(net_{pj})\,(t_{pj} - o_{pj})$, the error for the jth unit in the output layer, where $t_{pj}$ is the pth training set output.

Hidden layer: $\delta_{pj} = f'(net_{pj}) \sum_k \delta_{pk} w_{kj}$, the error for the jth unit in the hidden layer.

[Figure: each hidden-layer error $\delta_{pj}$ gathers the output-layer errors $\delta_{pk}$ through the weights $w_{kj}$ connecting the hidden layer to the output layer.]
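A short worked example with illustrative numbers, using the logistic sigmoid $f(x) = 1/(1+e^{-x})$ from the next slide: suppose an output unit receives $net_{pj} = 0.4$, the target is $t_{pj} = 1.0$, the learning rate is $\eta = 0.28$, and the hidden unit feeding it has output $o_{pi} = 0.8$.

$o_{pj} = f(0.4) = 1/(1 + e^{-0.4}) \approx 0.599$
$f'(0.4) = f(0.4)\,(1 - f(0.4)) \approx 0.240$
$\delta_{pj} = f'(net_{pj})\,(t_{pj} - o_{pj}) \approx 0.240 \times 0.401 \approx 0.096$
$\Delta_p w_{ij} = \eta\, \delta_{pj}\, o_{pi} \approx 0.28 \times 0.096 \times 0.8 \approx 0.022$

So on this presentation the weight from that hidden unit to the output unit is nudged up by about 0.022.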
The Sigmoid Function

$f(x) = \dfrac{2}{1 + e^{-2x}} - 1$  (sigmoid)

$f'(x) = 1 - f(x)^2$  (derivative of the sigmoid)

Another Sigmoid Function

$f(x) = \dfrac{1}{1 + e^{-x}}$  (sigmoid)

$f'(x) = f(x)\left(1 - f(x)\right)$  (derivative of the sigmoid)
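Both sigmoids are straightforward to code; the lecture's own f and df (shown in the next slide's code) implement the second pair. The f2/df2 names here are illustrative additions for the first sigmoid, which is algebraically the hyperbolic tangent.

// First sigmoid: f(x) = 2/(1 + e^(-2x)) - 1, output in (-1, +1);
// this is tanh(x), with derivative f'(x) = 1 - f(x)^2.
public double f2(double x)
{
    return 2.0 / (1.0 + Math.Exp(-2.0 * x)) - 1.0;
}
public double df2(double x)
{
    double y = f2(x);
    return 1.0 - y * y;
}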
Running the Neural Network
public void calcInputLayer(int p)
{
for (int i = 0; i < iLayer.Length; i++)
{
iLayer[i].output = inTrain[i, p];
}
}
public void calcHiddenLayer()
{
for (int h = 0; h < hLayer.Length; h++)
{
hLayer[h].input = 0.0;
for (int i = 0; i < iLayer.Length; i++)
hLayer[h].input += ihWeight[i, h].wt * iLayer[i].output;
hLayer[h].output = f(hLayer[h].input);
}
}
public void calcOutputLayer()
{
for (int o = 0; o < oLayer.Length; o++)
{
oLayer[o].input = 0.0;
for (int h = 0; h < hLayer.Length; h++)
oLayer[o].input += hoWeight[h, o].wt * hLayer[h].output;
oLayer[o].output = f(oLayer[o].input);
}
}
public double f(double x)
{
return 1.0 / (1.0 + Math.Exp(-x));
}
public double df(double x)
{
return f(x) * (1.0 - f(x));
}
Running the network is a feed-forward
process. The input data is presented to the
input layer.

The activation (input) is computed for each
node of the hidden layer and then used to
compute the output of the hidden-layer
nodes.

The activation (input) of each output node is
computed and used to compute the output of
the network.
Training the Network
In backward error propagation, the difference between the actual output and the goal (or target)
output provided in the training set is used to compute the error in the network. This error is then used
to compute the delta (change) in weight values for the weights between the hidden layer and the
output layer.
public void calcOutputError(int p, int r)
{
for (int o = 0; o < oLayer.Length; o++)
oLayer[o].error = df(oLayer[o].input) * (outTrain[o, p] - oLayer[o].output);
for (int h = 0; h < hLayer.Length; h++)
for (int o = 0; o < oLayer.Length; o++)
hoWeight[h, o].wt += learn * oLayer[o].error * hLayer[h].output;
}
public void calcHiddenError(int p, int r)
{
    for (int h = 0; h < hLayer.Length; h++)
    {
        double err = 0.0;   // accumulate this hidden node's share of the output error
        for (int o = 0; o < oLayer.Length; o++)
            err += oLayer[o].error * hoWeight[h, o].wt;
        hLayer[h].error = df(hLayer[h].input) * err;
    }
    for (int i = 0; i < iLayer.Length; i++)
        for (int h = 0; h < hLayer.Length; h++)
            ihWeight[i, h].wt += learn * hLayer[h].error * iLayer[i].output;
}
These new weight values are then used to distribute the output error to the hidden-layer nodes. These
node errors are, in turn, used to compute the changes in value for the weights between the input layer
and the hidden layer of the network.
1. Set the number of neurons in each level
2. Select the learning rate, error limit and max
training runs
3. Give the number of training pairs and list them in
the left-hand text window, with input/output
pairs in sequence:
input 1
output 1
input 2
output 2
:
input n
output n
Total Training Set Ensemble Error
[Plot: total training-set ensemble error decreasing during the training process; the training rate depends on the initial values of the random weights.]
How Many Nodes?
Number of Input Layer Nodes matches number of input values
Number of Output Layer Nodes matches number of output values
But what about the hidden Layer?
Too few hidden layer nodes and the NN can't learn the patterns.
Too many hidden layer nodes and the NN doesn't generalize.
When Should We Use Neural Networks?
Neural Networks need lots of data (example solutions) for training.
The functional relationships of the problem/solution are not well understood.
The problem/solution is not applicable to a rule-based solution.
"Similar input data sets generate "similar" outputs.
Neural Networks perform general Pattern Recognition.
Neural Networks are particularly good as Decision Support tools.
Also good for modeling behavior of living systems.
Can a Neural Network do More than a Digital Computer?
Clearly a simulation of a Neural Network running on a digital computer cannot be more powerful than the
computer on which it is executed.
The question is, "Can a computational system such as a Neural Network be built that can do something that a
digital computer cannot?"
A digital computer is the physical embodiment of a Turing Machine which is defined as a universal computer of
all computable functions.
An artificial Neural Network is loosely modeled on the human brain.
Rather than using a software simulation of neurons, we can build electronic circuits that closely mimic the
activities of human brain cells.
Can we build a physical system of any kind (based on electronics, chemistry, etc...) that does everything a
human brain can do?
Can you think of something human brains do that, so far, has not been accomplished or, at least, approximated
by a computer or any other physical (man-made) system?
What is the Computational Power of the Human Mind?
Since we can't quantify consciousness, it is not likely that we can determine the level of computational
power necessary to manifest it.
However, we can establish a relative measure of computational power for systems that do and (so far) do
not exhibit consciousness.
Human Mind/Brain
Turing Machine
Digital Computer
Neural Network
Physical System/Model
Relative Computational Power
[Diagram: the systems compared for relative computational power: Mind/Brain, Turing Machine, Digital Computer, Physical System, Neural Network.]
[Diagram, annotated: between Mind/Brain and Turing Machine sit the questions of Dualism vs Materialism and the Revised Turing Test; between Turing Machine and Digital Computer, finite storage and finite precision; between Digital Computer and the Physical System / Neural Network, Symbolism vs Connectionism and the limits of engineering and technology.]
Due to the limitations of finite storage and the related issue of finite-precision arithmetic, a Turing Machine can exhibit greater computational power than a digital computer.