Kak Neural Network
Mehdi Soufifar
Mehdi Hoseini
Amir Hosein Ahmadi
Corner Classification approach
Corners for the XOR function:

[Figure: the four corners of the unit square; corners (0,0) and (1,1) belong to class 0, corners (0,1) and (1,0) to class 1.]
Corner Classification approach…
- Map n-dimensional binary vectors (input) into m-dimensional binary vectors (output).
- The mapping function f is Y = f(X); for XOR the corners map as:

  f(0, 0) = 0,  f(0, 1) = 1,  f(1, 0) = 1,  f(1, 1) = 0

- Such a mapping could be trained using:
  - Backpropagation (does not guarantee convergence).
  - …
Introduction
- Feedback (Hopfield with delta learning) and feedforward (backpropagation) networks learn patterns slowly: the network must adjust the weights of the links between input and output until it obtains the correct response to the training patterns.
- But biological learning is not a single process: some forms are very quick and others relatively slow. Short-term biological memory, in particular, works very quickly, so slow neural network models are not plausible candidates for it.
Training feedforward NN [1]
- Kak proposed CC1 and CC2 in January 1993.
- Example: Exclusive-OR mapping.
CC1 as an example
- Initialize all weights to zero.
- If the result is correct, do nothing.
- If the result is 1 and the supervisor says 0, subtract the input vector x from the weight vector.
- If the result is 0 and the supervisor says 1, add the input vector x to the weight vector.

[Figure: network for XOR with inputs x1 and x2, hidden-layer neurons acting as corner classifiers for (0 1) and (1 0), and an output neuron (an OR gate) fed with unit weights.] A minimal sketch of this rule follows.
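A minimal Python sketch of the CC1-style rule above, assuming hard-threshold neurons and one hidden neuron per corner (the function and variable names are illustrative, not from [1]):

```python
import numpy as np

def train_corner(inputs, targets, max_epochs=100):
    """CC1-style perceptron rule for a single hidden (corner) neuron."""
    w = np.zeros(inputs.shape[1])        # start from all-zero weights
    for _ in range(max_epochs):
        changed = False
        for x, t in zip(inputs, targets):
            y = 1 if w @ x > 0 else 0    # hard-threshold output
            if y == 1 and t == 0:        # says yes when it should say no
                w -= x
                changed = True
            elif y == 0 and t == 1:      # says no when it should say yes
                w += x
                changed = True
        if not changed:                  # every sample classified correctly
            break
    return w

# XOR corners with a bias input appended; this hidden neuron must
# fire only for the corner (0, 1).
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
t = np.array([0, 1, 0, 0])
print(train_corner(X, t))                # -> [-1.  1.  0.]
```

With these inputs the rule settles on the weights (−1, 1, 0), which matches the last row of the first-corner table on the next slide.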
CC1…
- Result on the first output corner:

  samples   W1   W2   W3
  Init,1     0    0    0
  2          0    1    1
  3         -1    1    0
CC1…
- Result on the second output corner:

  samples    W1   W2   W3
  Init,1,2    0    0    0
  3           1    0    1
  4,1,2       0   -1    0
  3           1   -1    1
  4,1,2       0   -2    0
  3,4         1   -2    1
  1           1   -2    0
CC1 Algorithm
- Notations:
  - The mapping is Y = f(X), where X and Y are n- and m-dimensional binary vectors.
  - We therefore have Y^i = f(X^i), i = 1, …, k (k = number of vectors).
  - The weight of a vector is the number of 1 elements in it; the weight of X^i is s_i and the weight of Y^i is γ_i.
  - If the k output vectors are written out as an array, the columns may be viewed as a sequence of m k-dimensional vectors W_j:

      y_11 … y_1j … y_1m
      y_21 … y_2j … y_2m
       ⋮      ⋮      ⋮
      y_k1 … y_kj … y_km

  - The weight of W_j = (y_1j, y_2j, …, y_kj) is ζ_j.
CC1 Algorithm…
- Start with a random initial weight vector.
- If the neuron says no when it should say yes, add the input vector to the weight vector.
- If the neuron says yes when it should say no, subtract the input vector from the weight vector.
- Do nothing otherwise.

Note that a main problem is: what is the number of neurons in the hidden layer?
Number of hidden neurons
- Consider that each 1 in the output array calls for one corner-classifying hidden neuron, i.e. Σ_{j=1}^{m} ζ_j neurons in total.
- The number of hidden neurons can then be reduced by merging the duplicated neurons (the same corner shared by several output columns).
Number of hidden neurons…
- Theorem: the number of hidden neurons required to realize the mapping Y^i = f(X^i), i = 1, 2, …, k equals Σ_{j=1}^{m} ζ_j reduced by the number of duplicated neurons.
- And since each of the k training corners needs at most one hidden neuron after this reduction, we can say: the number of hidden neurons required to realize the mapping is at most k.
Real Applications problem [1]
- Comparison of training results:

  Alg. on the XOR problem   Number of iterations
  BP                        6,587 [1]
  CC (CC1)                  8 [1]
Proof of convergence [1]
- We establish that the classification algorithm converges if there is a weight vector W* such that W*·X > 0 for the corner that needs to be classified and W*·X < 0 otherwise.
- W_t is the weight vector at the t-th iteration.
- θ is the angle between W* and W_t.
- If the neuron says no when it must say yes, the update is:

  W_{t+1} = W_t + X
Proof of convergence…
- The numerator of the cosine becomes W*·W_{t+1} = W*·W_t + W*·X.
- Since W* produces the correct result, we know that W*·X ≥ δ for some margin δ > 0, and therefore:

  W*·W_{t+1} ≥ W*·W_t + δ

- We get the same inequality for the other type of misclassification (where W*·X ≤ −δ and X is subtracted).
Proof of convergence…
- Repeating this process for t iterations produces:

  W*·W_t ≥ t·δ   (1)

- For the cosine's denominator (‖W_t‖): if the neuron says no, we have W_{t−1}·X ≤ 0, and then:

  ‖W_t‖² = ‖W_{t−1} + X‖² = ‖W_{t−1}‖² + 2·W_{t−1}·X + ‖X‖² ≤ ‖W_{t−1}‖² + ‖X‖²

- The same result is obtained for the other type of misclassification (W_{t−1}·X ≥ 0, with X subtracted).
Proof of convergence…
- Repeated substitution produces ‖W_t‖² ≤ t·‖X‖².
- Since ‖X‖² ≤ n (X is an n-dimensional binary vector), we then have:

  ‖W_t‖² ≤ t·n   (2)
Proof of convergence…
- From (1) and (2) we can say:

  cos θ = (W*·W_t) / (‖W*‖·‖W_t‖) ≥ t·δ / (‖W*‖·√(t·n)) = √t·δ / (√n·‖W*‖)
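Since cos θ can never exceed 1, this inequality bounds the number of iterations; the concluding step, left implicit on the slide, is the standard perceptron-convergence bound:

```latex
% cos(theta) <= 1 forces the iteration count t to stay bounded:
1 \;\ge\; \cos\theta \;\ge\; \frac{\sqrt{t}\,\delta}{\sqrt{n}\,\lVert W^{*}\rVert}
\quad\Longrightarrow\quad
t \;\le\; \frac{n\,\lVert W^{*}\rVert^{2}}{\delta^{2}}
```

so the algorithm converges after finitely many corrections.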
Types of memory
- Long-term: in AI, e.g. BP, RBF, …
- Short-term: learns instantaneously with good generalization.
Current network characteristics
What is the problem with BP and RBF?
- They require iterative training.
- They take a long time to learn.
- Sometimes they do not converge.

Result:
- They are not applicable in real-time applications.
- They could never model short-term, instantaneously learned memory (the most significant aspect of biological working memory).
CC2 algorithm
- In this algorithm the weights are prescribed directly: +1 for each input bit equal to 1, −1 for each input bit equal to 0, and a bias weight of −(s_i − 1), where s_i is the weight (number of 1s) of the input vector.
- The value −(s_i − 1) implies that the threshold of the hidden neuron needed to separate this sequence is s_i − 1.
- Example: the result of CC2 on the last example is:

  corner   weights    bias
  (0 1)    (-1,  1)   W3 = -(s_i - 1) = -(1 - 1) = 0
  (1 0)    ( 1, -1)   W3 = -(s_i - 1) = -(1 - 1) = 0
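A minimal sketch of this prescription in Python (the names are illustrative; the bias convention −(s − 1) follows the worked line above):

```python
import numpy as np

def cc2_weights(x):
    """Instantaneous CC2 weight assignment for one binary corner:
    +1 for 1-bits, -1 for 0-bits, bias -(s - 1) with s = number of 1s."""
    x = np.asarray(x)
    w = np.where(x == 1, 1, -1)
    bias = -(int(x.sum()) - 1)         # W3 = -(s - 1)
    return np.append(w, bias)

# The two XOR corners that must map to 1:
print(cc2_weights([0, 1]))             # -> [-1  1  0]
print(cc2_weights([1, 0]))             # -> [ 1 -1  0]
```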
Real Applications problem
- Comparison of training results:

  Alg. on the XOR problem   Number of iterations
  BP                        6,587 [1]
  CC (CC1)                  8 [1]
  CC (CC2)                  1 [1]
CC2’s Generalization…[3]
- The hidden neurons' weights are as before, except that the bias weight becomes r − s_i + 1:
  - r is the radius of the generalized region.
  - If no generalization is needed, then r = 0.
- For function mapping, where the input vectors are equally distributed into the 0 and the 1 classes:

  r = n/2
About choice of h[3]
- Consider a 2-dimensional problem.
- The function of the hidden node can be expressed by the separating line:

  w1·x1 + w2·x2 + w3 = 0
About choice of h[3]
- Assume that the input pattern being classified is (0 1); then x2 = 1. Also, w1 = h, w2 = 1, and s = 1. The equation of the dividing line represented by the hidden node now becomes the line derived below.
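The slide's equation did not survive extraction; under the bias convention r − s + 1 used elsewhere in the deck, the dividing line works out as follows (a reconstruction, not a quote from [3]):

```latex
% hidden-node equation with weights (h, 1) and bias r - s + 1, s = 1:
h\,x_1 + x_2 + (r - s + 1) = 0
\quad\Longrightarrow\quad
x_2 = -h\,x_1 - r
```

For h = −1 and r = 0 this is the line x2 = x1; making h less negative rotates the line, and increasing r shifts it, which is what the following three plots illustrate.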
About choice of h…
[Figure: dividing line for h = −1 and r = 0.]
About choice of h…
[Figure: dividing line for h = −0.8 and r = 0.]
About choice of h…
[Figure: dividing line for h = −1 and r = 1.]
CC4 [6]
- The CC4 network maps an input binary vector X to an output vector Y.
- The layers are fully connected.
- The neurons are all binary neurons with the binary step activation function: a neuron outputs 1 if its weighted input sum is positive, and 0 otherwise.
- The number of hidden neurons is equal to the number of training samples, with each hidden neuron representing one training sample.
CC4 Training [6]
- Let w_ij (i = 1, …, N; j = 1, …, H) be the weight of the connection from input neuron i to hidden neuron j, and let x_ij be the input to the i-th input neuron when the j-th training sample is presented to the network.
- The weights are then assigned as follows: w_ij = 1 if x_ij = 1, w_ij = −1 if x_ij = 0, and for the bias input (always 1) w_Nj = r − s_j + 1, where s_j is the number of 1s in the j-th training vector.
CC4 Training [6]…
- Let u_jk (j = 1, …, H; k = 1, …, M) be the weight of the connection from the j-th hidden neuron to the k-th output neuron, and let y_jk be the desired output of the k-th output neuron for the j-th training sample.
- The values of u_jk are determined by: u_jk = 1 if y_jk = 1, and u_jk = −1 if y_jk = 0.
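A compact sketch of the whole CC4 prescription, assuming the last input component is the always-1 bias bit and s counts the remaining 1s (helper names are mine, not from [6]):

```python
import numpy as np

def cc4_train(X, Y, r=0):
    """Instantaneous CC4 weight assignment.
    X: (k, n) binary inputs whose last column is the bias bit 1.
    Y: (k, m) binary outputs."""
    W = np.where(X == 1, 1, -1).astype(float)   # one hidden neuron per sample
    s = X[:, :-1].sum(axis=1)                   # number of 1s, bias excluded
    W[:, -1] = r - s + 1                        # bias weight: r - s + 1
    U = np.where(Y == 1, 1, -1)                 # output weights from targets
    return W, U

def cc4_predict(x, W, U):
    h = (W @ x > 0).astype(int)                 # binary step, hidden layer
    return (h @ U > 0).astype(int)              # binary step, output layer

# XOR, learned in a single presentation (r = 0):
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
Y = np.array([[0], [1], [1], [0]])
W, U = cc4_train(X, Y)
print([cc4_predict(x, W, U)[0] for x in X])     # -> [0, 1, 1, 0]
```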
Sample of CC4
- Consider a 16-by-16 area containing a spiral pattern of 256 binary (black and white) pixels, as in Figure 1.
- We want to train the system with one exemplar sample, as in Figure 2, in which a total of 75 points are used for training.

[Figure 1: the spiral pattern. Figure 2: the 75 training points.]
Sample of CC4…
- We can code 16 integer values with 4 binary bits.
- Therefore, for a location (x, y) we use 4 bits for x and 4 bits for y, plus 1 extra bit (always equal to 1) for the bias.
- In total we have 9 inputs.
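For instance, the encoding could look like this (assuming coordinates run 0…15; the deck does not say whether it is 0- or 1-based):

```python
def encode_point(x, y):
    """Encode a grid coordinate as the 9-bit CC4 input:
    4 bits for x, 4 bits for y, plus the always-1 bias bit."""
    bits = lambda v: [(v >> i) & 1 for i in range(3, -1, -1)]
    return bits(x) + bits(y) + [1]

# The corner used in the worked example on the next slide:
print(encode_point(5, 6))   # -> [0, 1, 0, 1, 0, 1, 1, 0, 1]
```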
Sample of CC4…
[Worked example: the hidden-neuron weight vectors for a '#' (spiral) training corner at (5, 6) and for a '0' corner; input weights are +1 for 1-bits and −1 for 0-bits, and in each case the bias weight is r − s + 1, e.g. r − 3 + 1 = r − 2 when the input has s = 3 ones.]
Sample of CC4 result…
[Figures: the original spiral, the training sample, and the CC4 outputs for r = 1, 2, 3, 4, together with the number of points classified/misclassified in the spiral pattern.]
FC motivation
Disadvantages of the CC algorithms:
- Input and output must be discrete.
- Input is best presented in a unary code, which increases the number of input neurons considerably.
- The degree of generalization is the same for all nodes.
Problem
- In reality this degree varies from node to node.
- We need to work on real data.
- An iterative version of the CC algorithm that does provide a varying degree of generalization has been devised.
- Problem: it is not instantaneous.
Fast classification network
What is FC?
- A generalization of the CC network.
- This network can operate on real data directly.
- It learns instantaneously.
- It reduces to CC when:
  - the data is binary, and
  - the amount of generalization is fixed.
Input
X = (x1, x2, …, xk) → F(X) → Y

- All xi and Y are real-valued.
- k is determined by the nature of the problem.

What to do:
- Define the input and output weights.
- Define the radius of generalization.
Input
Index   Input             Output
1       x1, x2, x3, x4    Y1, Y2
2       x1, x2, x3, x4    Y1, Y2
…
FC network structure
[Figure: the FC network structure.]
The hidden neurons
[Figure: the hidden neurons of the FC network.]
The rule base
- Rule 1: IF m = 1, THEN assign the membership grades μi using the single-nearest-neighbor (1NN) heuristic.
- Rule 2: IF m = 0, THEN assign μi using the k-nearest-neighbor (kNN) heuristic.
- m = the number of elements hi of the distance vector that equal 0.
- The value of k is typically a small fraction of the size of the training set.
- The membership grades are normalized.
1NN heuristic
- Used when exactly one element in the distance vector h is 0, i.e. the input exactly matches one training sample; that sample receives the full membership grade.
kNN heuristic
- Based on the k nearest neighbors.
- Uses a triangular membership function.
Training of the FC network
Training involves two separate steps:
- Step 1: the input and output weights are prescribed simply by inspection of the training input/output pairs.
- Step 2: the radius of generalization for each hidden neuron is determined as

  r_i = (1/2)·d_min,i

  where d_min,i is the distance from sample i to its nearest neighboring training sample. A sketch of both steps follows.
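A minimal sketch of the two training steps in Python (assuming Euclidean distance; the names are illustrative):

```python
import numpy as np

def fc_train(X, Y):
    """FC 'training' by inspection: the hidden weights are the training
    inputs themselves, the output weights are the training outputs, and
    each radius of generalization is r_i = d_min,i / 2."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)       # ignore each sample's distance to itself
    r = D.min(axis=1) / 2.0           # half the distance to the nearest sample
    return X.astype(float), Y.astype(float), r
```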
Radius of generalization
- Hard generalization with separated decision regions.
- Soft generalization together with interpolation.
Generalization by fuzzy membership
The output neuron then computes the dot product between the output weight vector and the membership grade vector, as in the inference sketch below.
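Continuing the sketch above, inference with the rule base and triangular membership might look as follows (the precise kNN weighting of the FC papers is not given on the slides; inverse-distance weights are a stand-in assumption):

```python
def fc_predict(x, W, U, r, k=3):
    """Evaluate the FC network at a real-valued input x."""
    h = np.linalg.norm(W - x, axis=1)            # distance vector h
    if np.count_nonzero(h == 0) == 1:            # rule 1: one exact match -> 1NN
        return U[int(np.argmin(h))]
    mu = np.maximum(0.0, 1.0 - h / r)            # triangular membership grades
    if not mu.any():                             # rule 2: no coverage -> kNN
        nearest = np.argsort(h)[:k]
        mu[nearest] = 1.0 / h[nearest]           # inverse-distance stand-in
    mu /= mu.sum()                               # normalize the grades
    return mu @ U                                # dot product with output weights

# Tiny 1-D usage example with three samples; the query falls outside
# every radius of generalization, so rule 2 (kNN) applies:
W, U, r = fc_train(np.array([[0.0], [5.0], [16.27]]),
                   np.array([7.0, 4.0, 9.0]))
print(fc_predict(np.array([10.0]), W, U, r))
```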
Other consideration
- Other membership functions can be used, e.g. the quadratic function known as the S-function.
Other consideration
- Other distance metrics can be used, e.g. the city-block distance.
- Result: the performance of the network is not seriously affected by the choice of distance metric and membership function.
Hidden neuron
- As in CC4, the number of hidden neurons equals the number of training samples that the network is required to learn.
- Note: the training samples are exemplars.
Example
[Figure: three training samples with d23 = 11.27 and radii r1 = 2.5, r2 = 2.5, r3 = 5, together with the query input.]

- For the input shown, the membership grades are 0.372, 0.256, and 0.372, so:

  Y = 0.372·7 + 0.256·4 + 0.372·9 = 6.976
Experimental result
Time-series prediction:
- Electric load demand forecasting
- Traffic volume forecasting
- Prediction of stock prices, currencies, and interest rates

We describe the performance of the FC network using two benchmarks with different characteristics:
- Henon map time series
- Mackey–Glass time series
Henon map
The one-dimensional Henon map:

  x(t+1) = 1 − a·x(t)² + b·x(t−1)   (with the standard parameters a = 1.4, b = 0.3)

  Generated points   544
  Training samples   500 out of 504
  Testing samples    50
  Window size        4

  Input X                  Output Y
  X(1), X(2), X(3), X(4)   X(5)
  X(2), X(3), X(4), X(5)   X(6)
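A short generator for this dataset (the slide's formula did not survive extraction, so this follows the standard one-dimensional form with the usual parameter values):

```python
import numpy as np

def henon_series(n, a=1.4, b=0.3):
    """x(t+1) = 1 - a*x(t)**2 + b*x(t-1), started from (0, 0)."""
    x = [0.0, 0.0]
    for _ in range(n - 2):
        x.append(1 - a * x[-1] ** 2 + b * x[-2])
    return np.array(x)

def windows(series, w=4):
    """Sliding windows: inputs X(t), ..., X(t+w-1), target X(t+w)."""
    X = np.array([series[i:i + w] for i in range(len(series) - w)])
    return X, series[w:]

X, y = windows(henon_series(544), w=4)   # 540 pairs, split into train/test
```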
[Figure: Henon map time-series prediction using FC (4-500-1), k = 5.]
Result
[Table: Henon map time-series prediction using the FC network; SSE = sum-of-squared error.]
Mackey-Glass time series
A nonlinear time-delay differential equation originally developed for modeling white blood cell production:

  dx/dt = A·x(t−D) / (1 + x(t−D)^C) − B·x(t)

- A, B, C: constants
- D: the time-delay parameter

Popular case:

  A     B     C    D
  0.2   0.1   10   30
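A coarse Euler integrator for this popular case (the equation above is the standard Mackey–Glass form, reconstructed since the slide's formula was lost; the constant initial history is an assumption):

```python
def mackey_glass(n, dt=1.0, A=0.2, B=0.1, C=10, D=30):
    """Euler steps of dx/dt = A*x(t-D)/(1 + x(t-D)**C) - B*x(t)."""
    lag = int(D / dt)
    x = [1.2] * (lag + 1)                  # constant initial history (assumed)
    for _ in range(n):
        x_d = x[-lag - 1]                  # delayed value x(t - D)
        x.append(x[-1] + dt * (A * x_d / (1 + x_d ** C) - B * x[-1]))
    return x[lag + 1:]

series = mackey_glass(544)                 # same length as the Henon benchmark
```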
[Figure: time-series prediction using FC (4-500-1), k = 5.]
PERFORMANCE SCALABILITY
- The FC network and the RBF network are optimized for a sample size of 500 and a window size of 4.
- Parameters such as the spread constant for RBF are set to their best values.
- Then the window and the sample size are allowed to change without reoptimization.

[Figures: performance-scalability comparisons of FC and RBF across window and sample sizes.]
Result
- The performance of the FC network remains good and reasonably consistent throughout all window and sample sizes.
- The RBF network is adversely affected by changes in the window size, the sample size, or both.

Conclusion
- The performance of the RBF network can become erratic for certain combinations of these parameters.
- FC is generally applicable to other window sizes and sample sizes.
Pattern recognition
- A pattern in a 32-by-32 area.
- Input: the row and column coordinates of the training samples, in [1, 32].
- Two output neurons, one for each class:
  - white region: (1, 0)
  - black region: (0, 1)
Result
[Figure: two-class spiral pattern classification; input neurons, training samples, and output neurons.]
Result
[Figure: four-class spiral pattern classification; input neurons, training samples, and output neurons.]
References
[1] S. C. Kak, "On training feedforward neural networks," Pramana - Journal of Physics, vol. 40, pp. 35-42, 1993.
[2] G. Mirchandani and W. Cao, "On hidden nodes for neural nets," IEEE Transactions on Circuits and Systems, vol. 36, pp. 661-664, 1989.
[3] S. Kak, "On generalization by neural networks," Information Sciences, vol. 111, pp. 293-302, 1998.
[4] S. Kak, "Better web searches and prediction with instantaneously trained neural networks," IEEE Intelligent Systems, vol. 14, no. 6, pp. 78-81, 1999.
[5] Chapter 7, Results and Discussion.
[6] B. Shu and S. Kak, "A neural network-based intelligent metasearch engine," Information Sciences, vol. 120, pp. 1-11, 1999.
[7] S. Kak, "A class of instantaneously trained neural networks," Information Sciences, vol. 148, pp. 97-102, 2002.
[8] K. W. Tang and S. Kak, "Fast classification networks for signal processing," Circuits, Systems and Signal Processing, vol. 21, pp. 207-224, 2002.
[9] S. Kak, "Three languages of the brain: quantum, reorganizational, and associative," in Learning as Self-Organization, K. Pribram and J. King, Eds. Mahwah, NJ: Lawrence Erlbaum, 1996, pp. 185-219.