Chapter 3
Neural Network
Xiu-jun GONG (Ph. D)
School of Computer Science and Technology, Tianjin
University
[email protected]
http://cs.tju.edu.cn/faculties/gongxj/course/ai/
Outline

 Introduction
 Training a single TLU
 Network of TLUs—Artificial Neural Network
 Pros & Cons of ANN
 Summary
Biological / Artificial Neural Network

[Figure: left, the structure of a typical biological neuron (SMI32-stained pyramidal neurons in cerebral cortex); right, an artificial neuron with inputs x1, x2, …, xn, weights w1, w2, …, wn, and output f(s). The field draws on artificial intelligence, recognition modeling, and neuroscience.]
Definition of ANN

 Also called a Simulated Neural Network (SNN), or simply NN
 An ANN is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing, based on a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network.
Applications of ANN

 Function approximation, or regression analysis, including time series prediction and modeling
 Classification, including pattern and sequence recognition, novelty detection and sequential decision making
 Data processing, including filtering, clustering, blind signal separation and compression
Extension of a TLU

 Threshold Logic Unit (TLU) -> Perceptron (Neuron)
 Inputs are not limited to boolean values
 Outputs are not limited to binary functions
Output functions of a perceptron

 Threshold function:
   f(s) = 1 if s >= θ, else 0
 Sigmoid function:
   f(s) = 1 / (1 + e^(-s))
Characteristics of the sigmoid function

 Smooth, continuous, and monotonically increasing (derivative is always positive)
 Bounded range - but never reaches its max or min
 The logistic function is often used:
   f(s) = 1 / (1 + e^(-s)),  with derivative f' = f (1 - f)
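The logistic function and the identity f' = f (1 - f) can be checked numerically; a minimal sketch (the function names are mine):

```python
import math

def sigmoid(s):
    """Logistic function f(s) = 1 / (1 + e^(-s))."""
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_prime(s):
    """Derivative via the identity f' = f * (1 - f)."""
    f = sigmoid(s)
    return f * (1.0 - f)
```

At s = 0 the output is 0.5 and the derivative peaks at 0.25; for large |s| the derivative vanishes, which is why sigmoid-based weight updates act mainly near f = 0.5.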
Linearly separable functions and the TLU

 A single TLU can realize a linearly separable function such as f(x1, x2, x3) = x1 · x̄2 · x3
 XOR, f(x1, x2) = x1 · x̄2 + x̄1 · x2, is not linearly separable, so no single TLU can realize it
A network of TLUs

[Figure: a two-layer network of TLUs computing XOR, f(x1, x2) = x1 · x̄2 + x̄1 · x2. Hidden units y1 and y2 receive x1 and x2 with weights +1 and -1 and thresholds 0.5; the output unit combines y1 and y2 with threshold 0.5.]

Even-Parity Function
 f = x1 · x2 + x̄1 · x̄2
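The two-layer XOR network above can be sketched with step-function units; the weights (±1) and thresholds (0.5) follow the figure, while the function names are mine:

```python
def tlu(weights, threshold, inputs):
    """Threshold logic unit: fires (1) iff the weighted sum reaches the threshold."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

def xor(x1, x2):
    """XOR as a network of TLUs: f(x1, x2) = x1·x̄2 + x̄1·x2."""
    y1 = tlu([1, -1], 0.5, [x1, x2])   # fires for x1 AND NOT x2
    y2 = tlu([-1, 1], 0.5, [x1, x2])   # fires for x2 AND NOT x1
    return tlu([1, 1], 0.5, [y1, y2])  # output unit ORs the hidden units
```

No single TLU can compute XOR, but this two-layer arrangement realizes the full truth table.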
Training a single neuron

 What is learning/training?
 The methods:
  - The Delta Procedure
  - The Generalized Delta Procedure
  - The Error-Correction Procedure
Reforming the representation of a perceptron

[Figure: inputs x1, x2, …, xn with weights w1, w2, …, wn feed a summing junction s = W·X, followed by an activation function f = f(s) producing the output.]

 Fold the threshold θ into the weights by adding a constant input x_{n+1} ≡ 1 with weight w_{n+1} = -θ:

   s = Σ_{i=1}^{n} w_i x_i - θ = Σ_{i=1}^{n+1} w_i x_i

 In vector form:

   s = [x1 x2 … xn x_{n+1}] [w1, w2, …, wn, w_{n+1}]^T = X · W
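The folded-threshold form s = X·W can be sketched directly (the function name is mine):

```python
def weighted_sum(W, X):
    """s = W·X where W has n+1 entries and x_{n+1} = 1 stands in for the threshold."""
    return sum(w * x for w, x in zip(W, list(X) + [1.0]))
```

With W = [w1, …, wn, -θ], this equals Σ w_i x_i - θ.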
Gradient Descent Methods

 Minimize the squared error between the desired response d and the neuron output f
 Squared error function: ε = (d - f)^2
 The gradient with respect to the weights is defined as:

   ∇ε ≝ (∂ε/∂w_1, ∂ε/∂w_2, …, ∂ε/∂w_n, ∂ε/∂w_{n+1})

 Since s = W·X and f = f(s), the chain rule gives:

   ∂ε/∂W = (∂ε/∂f)(∂f/∂s)(∂s/∂W) = -2 (d - f) (∂f/∂s) X
The Delta Procedure

 Using the linear function f = s
 Weight update: W ← W + c (d - f) X
 This is the delta rule (Widrow-Hoff rule)
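One delta-rule step can be sketched as follows (a sketch; the function name and learning rate c are my choices, and f = s is the linear output):

```python
def delta_update(W, X, d, c=0.1):
    """Widrow-Hoff step for a linear unit: W <- W + c (d - f) X, with f = W·X."""
    f = sum(w * x for w, x in zip(W, X))  # linear output f = s
    return [w + c * (d - f) * x for w, x in zip(W, X)]
```

Repeated updates on a fixed input shrink the error d - f geometrically.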
The Generalized Delta Procedure

 Using the sigmoid function f(s) = 1 / (1 + e^(-s))
 Weight update: W ← W + c (d - f) f (1 - f) X
 Note that f (1 - f) → 0 as f → 0 or f → 1, so weight change can occur only within the 'fuzzy' region surrounding the hyperplane, near the point f = 0.5
The Error-Correction Procedure

 Using the threshold function (output: 0 or 1)
 The weight change rule: W ← W + c (d - f) X
 Since (d - f) is -1, 0, or +1, each step is effectively W ← W ± c X, or no change
 In the linearly separable case, W converges to a solution after a finite number of iterations
 In the non-linearly separable case, W never converges
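The procedure can be sketched as a training loop; AND serves as the linearly separable example, and the names and constants (c = 1, the epoch cap) are mine:

```python
def step(s):
    """Threshold output: 1 if s >= 0, else 0."""
    return 1 if s >= 0 else 0

def train_error_correction(samples, n, c=1.0, max_epochs=100):
    """Error-correction procedure: W <- W + c (d - f) X on threshold outputs."""
    W = [0.0] * (n + 1)                       # last weight plays the role of -θ
    for _ in range(max_epochs):
        changed = False
        for X, d in samples:
            Xa = list(X) + [1.0]              # fold the threshold in: x_{n+1} = 1
            f = step(sum(w * x for w, x in zip(W, Xa)))
            if f != d:                        # W <- W ± c X, otherwise no change
                W = [w + c * (d - f) * x for w, x in zip(W, Xa)]
                changed = True
        if not changed:                       # every sample classified: converged
            break
    return W

# AND is linearly separable, so convergence is reached in finitely many passes
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
```

On a non-separable set such as XOR, the loop would keep changing W until the epoch cap instead of converging.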
An example

[Figure: a single TLU detecting "east", with inputs derived from sensors: x1 = S2 + S3, x2 = S4 + S5, x3 = S6 + S7, x4 = S8 + S9, plus a constant input 1, connected through weights W11, W21, W31, W41, W51.]
ANN: Its topologies

[Figure: two topologies. Feedforward: inputs flow through the layers to the outputs. Recurrent: outputs are fed back through a context layer alongside the inputs.]
Training Neural Networks

 Supervised method
  - Trained by matching input and output patterns
  - Input-output pairs can be provided by an external teacher, or by the system
 Unsupervised method (self-organization)
  - An (output) unit is trained to respond to clusters of patterns within the input
  - There is no a priori set of categories
 Reinforcement learning
  - An intermediate form of the above two types of learning
  - The learning machine performs some action on the environment and gets a feedback response from the environment
  - The learning system grades its action as good (rewarding) or bad (punishable) based on the environmental response, and adjusts its parameters accordingly
Supervised training
Back-propagation—Notations

[Figure: a feedforward network with layers j = 0 (input), 1, …, M (output); inputs x_p1, x_p2, …, x_pN_0; outputs O_p1, O_p2, …, O_pN_M; targets T_p1, T_p2, …, T_pN_M.]

 j : layer index, j = 0, 1, …, M
 N_j : number of neurons in layer j (N_0 inputs, N_M outputs)
 p : the pth pattern of n patterns
 Y_ji : output of the ith neuron in layer j
 δ_ji : the error value associated with the ith neuron in layer j
 W_jik : the connection weight from the kth neuron in layer (j-1) to the ith neuron in layer j
Back-propagation: The method

1. Initialize the connection weights to small random values.
2. Present the pth sample input vector and its corresponding output target to the network:
   X_p = (x_p1, x_p2, …, x_pN_0)
   T_p = (T_p1, T_p2, …, T_pN_M)
3. Pass the input values to the first layer. For every input node i in layer 0:
   Y_0i = x_pi
4. For every neuron i in every layer j = 1, 2, …, M, from input to output layer, compute the output of the neuron:
   Y_ji = f( Σ_{k=1}^{N_{j-1}} Y_{(j-1)k} W_jik )
5. Obtain the output values. For every output node i in layer M:
   O_pi = Y_Mi
6. Calculate the error value for every neuron i in every layer in backward order j = M, M-1, …, 2, 1.
The method cont.

6.1 For the output layer, the error value is:
   δ_Mi = Y_Mi (1 - Y_Mi) (T_pi - Y_Mi)
6.2 For a hidden layer, the error value is:
   δ_ji = Y_ji (1 - Y_ji) Σ_{k=1}^{N_{j+1}} δ_{(j+1)k} W_{(j+1)ki}
6.3 The weight adjustment for every connection from neuron k in layer (j-1) to neuron i in layer j is:
   W_jik ← W_jik + η δ_ji Y_{(j-1)k}
Steps 2 through 6 are repeated for every training sample pattern p, and the whole set is repeated until the root mean square (RMS) of the output errors is minimized, where the per-pattern error is:
   E_p = Σ_{j=1}^{N_M} (T_pj - O_pj)^2
Generalization vs. specialization

 Optimal number of hidden neurons
  - Too many hidden neurons: you get an overfit; the training set is memorized, making the network useless on new data sets
  - Not enough hidden neurons: the network is unable to learn the problem concept
 Overtraining:
  - Too many examples: the ANN memorizes the examples instead of the general idea
 Generalization vs. specialization trade-off: K-fold cross validation is often used
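K-fold cross validation can be sketched without any library (the function name and the round-robin fold assignment are my own choices):

```python
def k_fold_splits(samples, k):
    """Yield (train, validation) pairs, using each of the k folds for validation once."""
    folds = [samples[i::k] for i in range(k)]   # round-robin fold assignment
    for i in range(k):
        validation = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, validation
```

Each sample appears in a validation set exactly once, so the validation scores averaged over the k splits estimate generalization rather than memorization.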
Unsupervised method

 No help from the outside
 No training data, no information available on the desired output
 Learning by doing
 Used to pick out structure in the input:
  - Clustering
  - Reduction of dimensionality => compression
 Kohonen's Learning Law (Self-Organizing Map)
  - Winner takes all (only the weights of the winning neuron are updated)
  - SOM algorithm; an example: the Kohonen Network
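The winner-takes-all rule can be sketched as follows (a sketch: the learning rate α and the names are mine, and the neighborhood function of a full SOM is omitted):

```python
def winner(weights, x):
    """Index of the neuron whose weight vector is closest to the input x."""
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    return dists.index(min(dists))

def kohonen_step(weights, x, alpha=0.3):
    """Winner takes all: move only the winning weight vector toward x."""
    i = winner(weights, x)
    weights[i] = [wi + alpha * (xi - wi) for wi, xi in zip(weights[i], x)]
    return weights
```

Repeated over many inputs, the weight vectors drift toward cluster centers, which is the clustering behavior listed above.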
Reinforcement learning

 Teacher: training data
 The teacher scores the performance of the training examples
 The performance score is used to shuffle the weights 'randomly'
 Learning is relatively slow due to the 'randomness'
Anatomy of ANN learning algorithms

[Diagram: a taxonomy of ANN learning. Supervised learning with logic inputs (e.g. Hopfield) or continuous inputs (e.g. back-propagation); unsupervised learning with logic inputs (e.g. ART) or continuous inputs (e.g. SOM, Hebb); and reinforcement learning.]
Pros & Cons of ANN

Pros:
 A neural network can perform tasks that a linear program cannot.
 When an element of the neural network fails, it can continue without any problem, owing to its parallel nature.
 A neural network learns and does not need to be reprogrammed.
 It can be implemented in any application.

Cons:
 The neural network needs training to operate.
 The architecture of a neural network is different from the architecture of microprocessors, and therefore needs to be emulated.
 Large neural networks require high processing time.
Summary

 The capability of ANN representations
 Training a single perceptron
 Training neural networks
 The trade-off between generalization and specialization should be kept in mind