PERFORMANCE OF MEE OVER TDNN IN A TIME SERIES PREDICTION
Rakesh Chalasani
Electrical and Computer Engineering, University of Florida, Gainesville
ABSTRACT
The purpose of this project is to analyze the performance of the minimum error entropy (MEE) cost function in the prediction of a non-linear system and to compare it with that of the least mean square (LMS) criterion. Both are implemented over a time-delay neural network (TDNN), trained with back propagation, to predict the next sample of the sunspot time series from the previous samples. It is observed that MEE enhances the performance of the prediction. Autonomous prediction is also used to verify the performance of both systems.
1. INTRODUCTION
This project deals with the prediction of a non-linear time series, namely the sunspot time series, using a non-linear system. For this, a time-delay neural network with back propagation learning is used to predict the time series. The primary objective is to implement the MEE cost function over the TDNN and verify the improvement it provides over the traditional MSE cost function. The idea behind using MEE is to update the weights of the different layers of the TDNN in such a way as to increase the information potential of the final error [1]. The weight update rules for the output and hidden layers using the back propagation algorithm are derived with this new cost function. For the MEE cost function, the stochastic information gradient (SIG) [1] is implemented to reduce the complexity.
Section 2 describes the TDNN architecture and the back propagation weight update rules using the MEE cost function, as well as the cross validation procedure used to obtain the optimal parameters of the network. Sections 3 and 4 contain the results for the MSE and MEE cost functions and the autonomous prediction results, respectively. The paper is concluded in Section 5.
2. TDNN AND ERROR BACK PROPAGATION
The TDNN has 5 delay elements, one hidden layer with X hidden units, where X is determined by cross validation, and a single output neuron. So, the topology of the network is 5-X-1, with the output neuron linear and the hidden layer units sharing a common non-linear activation. The non-linearity used is $\tanh(\beta x)$ with $\beta < 1$ [2], to ensure that the non-linear units are not driven into saturation.
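To make the topology concrete, the following is a minimal Python/NumPy sketch of the 5-X-1 forward pass; the function and variable names (tdnn_forward, W_h, W_o, beta) are illustrative assumptions, not taken from the original implementation.

```python
import numpy as np

def tdnn_forward(x_window, W_h, b_h, W_o, b_o, beta=0.5):
    """Forward pass of a 5-X-1 TDNN.

    x_window : current contents of the tap-delay line (input vector)
    W_h, b_h : hidden-layer weights, shape (X, len(x_window)), and biases (X,)
    W_o, b_o : output-layer weights (X,) and bias (scalar)
    beta     : slope of the tanh non-linearity (beta < 1 to avoid saturation)
    """
    net_h = W_h @ x_window + b_h      # hidden pre-activations
    y_h = np.tanh(beta * net_h)       # non-linear hidden layer
    y = W_o @ y_h + b_o               # single linear output unit
    return y, y_h, net_h
```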
For the MSE cost function the weight update follows the rule in Eq. (1):

$$W_k(n) = W_k(n-1) - \mu\,\nabla E(n) \qquad (1)$$

where $\nabla E$ is the gradient of the error cost with respect to the weights and $\mu$ is the learning rate.
For a single linear output unit the gradient of the MSE with respect to the weight of the $i$th input is shown in Eq. (2) [2]:

$$\frac{\partial E(n)}{\partial w_i^{o}} = e(n)\,x_i(n) \qquad (2)$$

where $e$ is the error and $x_i$ is the $i$th input.
And for the $k$th unit of the hidden layer, the gradient of the MSE with respect to the weight of the $i$th input is given by Eq. (3):

$$\frac{\partial E(n)}{\partial w_{ki}^{h}} = e(n)\,w_k^{o}\,f'(\mathrm{net}_k(n))\,x_i(n) \qquad (3)$$

where $\mathrm{net} = x \cdot w$.
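These two gradients are straightforward to implement. Below is a sketch assuming the $\tanh(\beta x)$ non-linearity from above; the names (mse_gradients, y_h) are hypothetical.

```python
import numpy as np

def mse_gradients(x, y_h, net_h, e, W_o, beta=0.5):
    """Instantaneous MSE gradients per Eqs. (2)-(3).

    x     : input vector to the hidden layer
    y_h   : hidden-layer outputs (the inputs x_i of the output unit)
    net_h : hidden pre-activations
    e     : prediction error e(n)
    W_o   : output-layer weights, shape (num_hidden,)
    """
    # Eq. (2): e(n) * x_i(n) for the linear output unit
    grad_o = e * y_h
    # f'(net) for f(net) = tanh(beta * net)
    f_prime = beta * (1.0 - np.tanh(beta * net_h) ** 2)
    # Eq. (3): e(n) * w_k^o * f'(net_k(n)) * x_i(n), outer product over (k, i)
    grad_h = np.outer(e * W_o * f_prime, x)
    return grad_o, grad_h
```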
The error back propagation with the MEE cost function should be made in such a way that the information potential of the error increases with the number of iterations, so the steepest ascent algorithm can be used to update the weights. However, MEE-SIG [1] can be used instead to reduce the complexity. The weight update rule is given in Eq. (4):

$$W_k(n) = W_k(n-1) + \mu\,\nabla V_e(n) \qquad (4)$$

where $\nabla V_e$ is the gradient of the information potential with respect to the weights and $\mu$ is the learning rate.
If the information potential (IP) is estimated using a Gaussian kernel then, using MEE-SIG, the gradient with respect to the weight of the $i$th input to the $k$th neuron in the output layer (which has no non-linear function) is given by Eq. (5) [1]:

$$\frac{\partial V_e(n)}{\partial w_{ki}^{o}} = \frac{1}{L\sigma^2} \sum_{s=1}^{L} G_{\sqrt{2}\sigma}\big(e(n)-e(s)\big)\,\big(e(n)-e(s)\big)\,\big(x_i(n)-x_i(s)\big) \qquad (5)$$

where $e$ is the error, $x_i$ is the $i$th input, $L$ is the length of the sample window, and $G_{\sqrt{2}\sigma}$ is a Gaussian kernel with standard deviation $\sqrt{2}\sigma$.
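As a sketch of how Eq. (5) might be computed over a window of the last L samples (assuming NumPy; the helper names are hypothetical):

```python
import numpy as np

def gaussian_kernel(u, sigma):
    """Gaussian kernel G_sigma(u) with standard deviation sigma."""
    return np.exp(-u**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def sig_gradient_output(e_win, x_win, sigma):
    """Eq. (5): SIG estimate of dV_e/dw for the linear output unit.

    e_win : the last L errors; e_win[-1] is the current error e(n)
    x_win : the corresponding inputs, shape (L, num_inputs)
    sigma : kernel size of the IP estimator
    """
    L = len(e_win)
    de = e_win[-1] - e_win            # e(n) - e(s), s = 1..L
    dx = x_win[-1] - x_win            # x_i(n) - x_i(s)
    g = gaussian_kernel(de, np.sqrt(2.0) * sigma)
    # sum over the window with the 1/(L * sigma^2) scaling of Eq. (5)
    return (g * de) @ dx / (L * sigma**2)
```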
And for the hidden layer neurons (which do have a non-linear function), the neurons are 'hidden' from the errors at the output, so the error has to be propagated back through the output neurons to compute the gradient. The gradient with respect to the weight of the $i$th input to the $k$th hidden neuron is given by Eq. (6):

$$\frac{\partial V_e(n)}{\partial w_{ki}^{h}} = \frac{1}{L\sigma^2} \sum_{s=1}^{L} G_{\sqrt{2}\sigma}\big(e(n)-e(s)\big)\,\big(e(n)-e(s)\big)\,\Big(w_k^{o} f'\big(\mathrm{net}(n)\big)\,x_i(n) - w_k^{o} f'\big(\mathrm{net}(s)\big)\,x_i(s)\Big) \qquad (6)$$

where $\mathrm{net} = x \cdot w$.
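A corresponding sketch for Eq. (6), reusing the gaussian_kernel helper from the previous block; the einsum-based vectorization is one possible implementation, not the authors':

```python
import numpy as np

def sig_gradient_hidden(e_win, x_win, net_win, W_o, sigma, beta=0.5):
    """Eq. (6): SIG gradient for the hidden-layer weights.

    e_win   : last L errors; e_win[-1] is e(n)
    x_win   : hidden-layer inputs over the window, shape (L, num_inputs)
    net_win : hidden pre-activations over the window, shape (L, num_hidden)
    W_o     : output-layer weights, shape (num_hidden,)
    """
    L = len(e_win)
    de = e_win[-1] - e_win                                 # e(n) - e(s)
    g = gaussian_kernel(de, np.sqrt(2.0) * sigma)
    f_prime = beta * (1.0 - np.tanh(beta * net_win) ** 2)  # f'(net), (L, H)
    # w_k^o f'(net(n)) x_i(n)  and  w_k^o f'(net(s)) x_i(s), shapes (H, I) and (L, H, I)
    term_n = np.einsum('h,i->hi', W_o * f_prime[-1], x_win[-1])
    term_s = np.einsum('lh,li->lhi', W_o * f_prime, x_win)
    diff = term_n[None, :, :] - term_s
    return np.einsum('l,lhi->hi', g * de, diff) / (L * sigma**2)
```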
To find the number of hidden units that gives the best possible results for the given number of inputs, cross validation is used. The data is divided into 3 parts and, for each candidate number of hidden units, the network is tested on one block of data while the other two are used for training. The average MSE of the error over the 3 folds is computed for the LMS cost function, and the average final information potential over the 3 folds is used for the MEE cost function. The results are summarized in Table 1. From these results, the number of hidden units that gives the best results for the given number of delays is considered to be 7.
Hidden layer units   Average normalized MSE (LMS)   Average normalized IP (MEE)
3                    0.1280                         0.1243
4                    0.1280                         0.1231
5                    0.1279                         0.1234
6                    0.1278                         0.1213
7                    0.1278                         0.1207
8                    0.1277                         0.1209
9                    0.1277                         0.1208
10                   0.1277                         0.1209

Table 1. Cross validation results for both the MSE and MEE cost functions.
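A minimal sketch of this 3-fold procedure follows; train_fn and eval_fn stand in for the training and scoring routines, which are not specified in the paper.

```python
import numpy as np

def cross_validate(data, hidden_sizes, train_fn, eval_fn, n_folds=3):
    """3-fold cross validation over the number of hidden units.

    train_fn(train_data, n_hidden) -> trained network
    eval_fn(net, test_data)        -> scalar score (MSE or final IP)
    """
    folds = np.array_split(data, n_folds)   # 3 contiguous blocks
    scores = {}
    for n_hidden in hidden_sizes:
        fold_scores = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            net = train_fn(train, n_hidden)
            fold_scores.append(eval_fn(net, test))
        scores[n_hidden] = np.mean(fold_scores)  # average over the 3 folds
    return scores
```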
To avoid local minima in the learning curve, a momentum term is added to the weight update. The gradient update with the momentum term is as in Eq. (7):

$$\Delta w_k = (1-\alpha)\,\Delta w_{k-1} + \alpha\,\mu\,\nabla V \qquad (7)$$

where $\alpha < 1$.
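In code, the update of Eq. (7) might look like the following sketch; the ascent flag is an assumption added here to cover both the MSE descent and MEE ascent cases.

```python
def momentum_update(W, delta_prev, grad_V, mu, alpha):
    """Weight update with the momentum term of Eq. (7).

    delta_prev : previous update Delta(w)_{k-1}
    grad_V     : current gradient (of the IP for MEE, of the error cost for MSE)
    """
    delta = (1.0 - alpha) * delta_prev + alpha * mu * grad_V  # Eq. (7)
    return delta

def apply_update(W, delta, ascent=True):
    """Ascent (+) for MEE on the information potential, descent (-) for MSE."""
    return W + delta if ascent else W - delta
```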
3. RESULTS: MSE AND MEE
The Sunspot time series is used for the prediction and for the comparison between the MSE and MEE cost functions. This time series is non-linear, as shown in Figure 1. A point to be noted is that the signal is normalized so that the non-linear units are not driven into saturation.

Figure 1: The Sunspot time series.

A TDNN with 5 delay elements is implemented to predict the next sample of the time series from the previous samples. As obtained from the cross validation, 9 hidden units with the tanh non-linear function are used. The output layer contains only one linear unit.

Over this network the MSE and MEE cost functions are used to train the system. The network is trained for 1300 iterations and the testing data consists of 1500 samples. The learning curves for MSE and MEE are shown in Figure 2 and Figure 3, respectively.

Figure 2: The learning curve for the LMS cost function. The MSE decreases with the iterations and almost reaches saturation. For the output layer $\mu = 0.9$; for the hidden layer $\mu = 0.9$.

Figure 3: The learning curve for the MEE cost function. The normalized information potential increases with the iterations and comes to saturation. For the output layer $\mu = 0.9$; for the hidden layer $\mu = 0.5$, $\sigma = 0.9$.

The final normalized error power over the testing data is 0.1464 for the MSE cost function and 0.1432 for the MEE cost function. Figure 4 shows the histograms of the errors for the MSE and MEE cost functions; the errors peak more sharply in the vicinity of zero for MEE than for MSE, indicating that MEE reduces the prediction error. Figure 5 shows how the time series is tracked by the different methods. In terms of the MSE of the error over the testing data set, MSE gives 0.1278 and MEE gives 0.1205, both better than the trivial predictor, which gives an MSE of 0.199.

Figure 4: The histograms of the errors using the MSE and MEE cost functions.
4. AUTONOMOUS PREDICTION

For autonomous prediction the trained model is initialized with 6 inputs from the training data set. The output of the first iteration is then fed back as the current input to predict the next sample of the series, and so on.

Here autonomous prediction is applied to the systems trained with the MSE and the MEE cost functions.
The delay values of the TDNN are initialized from 50 different points in the training data set; the length of the series over which the system can predict the next sample with normalized error < 0.8 is noted, and the average is taken over the 50 runs. The results are summarized in Table 2 below.
Cost function   Prediction length (samples)
LMS             12.3
MEE             15.1

Table 2. Autonomous prediction results.
In autonomous prediction the system cannot track the testing data as in normal testing, because the error made in predicting one sample propagates into the next iteration and accumulates. Beyond a certain point the prediction is no longer even close to the true value.
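The feedback loop described above can be sketched as follows; net_forward stands in for the trained TDNN, which maps the current delay-line contents to the next sample. This is an illustrative sketch, not the authors' code.

```python
import numpy as np

def autonomous_predict(net_forward, seed_window, n_steps):
    """Free-running prediction: each output is fed back as the next input.

    net_forward : trained network, input window -> predicted next sample
    seed_window : initial delay-line contents taken from the training data
    n_steps     : number of samples to generate autonomously
    """
    window = np.asarray(seed_window, dtype=float).copy()
    preds = []
    for _ in range(n_steps):
        y = net_forward(window)
        preds.append(y)
        window = np.roll(window, -1)  # shift the delay line by one
        window[-1] = y                # feed the prediction back in
    return np.array(preds)
```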
5. CONCLUSION
From the above results it can be asserted that the MEE cost function performs better than the MSE in prediction. Given the complex nature of the sunspot time series, the performance of the MEE is considerable. The added complexity of the MEE is the cost that has to be paid for the better performance.
REFERENCES
[1] J. C. Principe and D. Erdogmus, Information Theoretic Learning, under contract with John Wiley, expected publication.

[2] M. H. Hassoun, Fundamentals of Artificial Neural Networks, MIT Press, 1995.