PERFORMANCE OF MEE OVER TDNN IN A TIME SERIES PREDICTION

Rakesh Chalasani
Electrical and Computer Engineering, University of Florida, Gainesville

ABSTRACT

The purpose of this project is to analyze the performance of the minimum error entropy (MEE) cost function in prediction over a non-linear system and to compare it with that of the least mean square (LMS). Both are implemented over a time-delay neural network (TDNN), trained with back propagation, to predict the next sample of the sunspot time series from the previous samples. It is observed that MEE enhances the performance of the prediction. Autonomous prediction is also used to verify the performance of both systems.

1. INTRODUCTION

This project deals with the prediction of a non-linear time series, the sunspot time series, using a non-linear system: a time-delay neural network with back propagation learning. The primary objective is to implement the MEE cost function over the TDNN and verify the improvement it provides over the traditional MSE cost function. The idea behind using MEE is to update the weights of the layers of the TDNN in a way that increases the information potential of the final error [1]. The weight update rules for the output and hidden layers under the back propagation algorithm are derived using this new cost function. For the MEE cost function, the stochastic information gradient (SIG) [1] is implemented to reduce the complexity.

Section 2 describes the TDNN architecture and the back propagation weight update rules using the MEE cost function; it also describes the cross validation procedure used to obtain the optimal parameters of the network. Sections 3 and 4 contain the results of the MSE and MEE cost functions and the autonomous prediction results, respectively. The paper is concluded in Section 5.
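The one-step-ahead prediction setup described above (predict the next sample of a series from a window of past samples) can be sketched as follows. This is a minimal illustration, not the paper's code: the helper name is hypothetical and the series is a toy stand-in for the sunspot data.

```python
import numpy as np

def make_delay_embedding(series, n_delays=5):
    """Build (input, target) pairs for one-step-ahead prediction:
    each input row holds n_delays consecutive past samples and the
    target is the sample that immediately follows them."""
    X = np.array([series[i:i + n_delays]
                  for i in range(len(series) - n_delays)])
    y = series[n_delays:]
    return X, y

# Toy stand-in for the sunspot series, normalized so that the tanh
# units are not driven into saturation (as done in the paper).
series = np.sin(0.3 * np.arange(100))
series = series / np.max(np.abs(series))
X, y = make_delay_embedding(series, n_delays=5)
```

Each row of `X` then feeds the 5 input taps of the TDNN, with the corresponding entry of `y` as the desired output.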
2. TDNN AND ERROR BACK PROPAGATION

The TDNN has 5 delay elements, one hidden layer with X units (X determined by cross validation), and a single output neuron, so the topology of the network is 5-X-1. The output neuron is linear and the hidden units share the non-linear function tanh(β·x) with β < 1 [2], chosen so that the non-linearity is not driven into saturation all the while.

For the MSE cost function the weight update follows the rule in Eq. (1):

    W_k(n) = W_k(n−1) − µ ∇E(n)    (1)

where ∇E is the gradient of the error with respect to the weights and µ is the learning rate. For the single linear output unit, the gradient of the MSE with respect to the weight of the i-th input is given by Eq. (2) [2]:

    ∂E(n)/∂w_i^o = e(n) x_i(n)    (2)

where e is the error and x_i is the i-th input. For the k-th unit of the hidden layer, the gradient of the MSE with respect to the i-th input weight is given by Eq. (3):

    ∂E(n)/∂w_{ki}^h = e(n) w_k^o f′(net_k(n)) x_i(n),  where net = x·w    (3)

When the MEE cost function is used, error back propagation should increase the information potential of the error with the number of iterations, so a steepest ascent algorithm is used to update the weights. MEE-SIG [1] can be used instead of the full gradient to reduce the complexity. The weight update rule is given in Eq. (4):

    W_k(n) = W_k(n−1) + µ ∇V_e(n)    (4)

where ∇V_e is the gradient of the information potential with respect to the weights and µ is the learning rate. If the information potential (IP) is estimated using a Gaussian kernel then, using MEE-SIG, the gradient with respect to the weight of the i-th input to the k-th neuron in the output layer (which has no non-linear function) is given by Eq. (5) [1].
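As a rough illustration of the MEE-SIG update for the linear output unit, the sketch below estimates the information-potential gradient over a window of the L most recent errors, using a Gaussian kernel of size √2·σ. The function name, window handling, and array layout are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def sig_gradient(errors, inputs, n, sigma=0.9, L=10):
    """SIG estimate of the gradient of the error information
    potential w.r.t. the output-layer weights, pairing the current
    error e(n) with the L most recent past errors e(s)."""
    s = np.arange(max(0, n - L), n)      # indices of the window
    d_e = errors[n] - errors[s]          # e(n) - e(s)
    d_x = inputs[n] - inputs[s]          # x(n) - x(s), rows of length n_inputs
    var = 2.0 * sigma ** 2               # Gaussian kernel of size sqrt(2)*sigma
    g = np.exp(-d_e ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    # 1/(L*sigma^2) * sum_s G(e(n)-e(s)) * (e(n)-e(s)) * (x(n)-x(s))
    return (g * d_e) @ d_x / (len(s) * sigma ** 2)

# Steepest *ascent* on the information potential (the MEE update):
#   w_new = w_old + mu * sig_gradient(errors, inputs, n)
```

Note the sign: unlike the MSE update, the weights move in the direction of the gradient, since the information potential is being maximized.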
    ∂V_e(n)/∂w_{ki}^o = 1/(Lσ²) Σ_{s=1}^{L} G_{√2σ}(e(n) − e(s)) (e(n) − e(s)) (x_i(n) − x_i(s))    (5)

where e is the error and x_i is the i-th input. The hidden-layer neurons (which carry the non-linear function) are 'hidden' from the error at the output, so the gradient has to be propagated back through the output neuron. The gradient for the k-th neuron in the hidden layer with respect to its i-th input weight is given by Eq. (6):

    ∂V_e(n)/∂w_{ki}^h = 1/(Lσ²) Σ_{s=1}^{L} G_{√2σ}(e(n) − e(s)) (e(n) − e(s)) [w_k^o f′(net(n)) x_i(n) − w_k^o f′(net(s)) x_i(s)],  where net = x·w    (6)

Cross validation is used to find the number of hidden units that gives the best possible results for the given number of inputs. The data is divided into 3 parts; for each candidate number of hidden units, the network is tested on one block of data while the other two are used for training. For the LMS cost function the average MSE of the error over the 3 cases is calculated, and for the MEE cost function the average of the final error information potential over the 3 cases is used. The results are summarized in Table 1.

Table 1. Cross validation results for both the MSE and MEE cost functions

    Hidden units | Average normalized MSE (LMS) | Average normalized IP (MEE)
    3            | 0.1280                       | 0.1243
    4            | 0.1280                       | 0.1231
    5            | 0.1279                       | 0.1234
    6            | 0.1278                       | 0.1213
    7            | 0.1278                       | 0.1207
    8            | 0.1277                       | 0.1209
    9            | 0.1277                       | 0.1208
    10           | 0.1277                       | 0.1209

From these results, the number of hidden units that gives the best results for the given number of delays is considered to be 7.

To avoid local minima in the learning curve, a momentum term is added to the weight update. The gradient with the momentum term is as in Eq. (7):

    Δw_k = (1 − α) Δw_{k−1} + α µ ∇V,  where α < 1    (7)

3. RESULTS: MSE AND MEE

The sunspot time series is used for the prediction and for the comparison between the MSE and MEE cost functions. It is a non-linear time series, as shown in Figure 1. A point to be noted is that the signal is normalized so that the non-linear units are not driven into saturation all the while.

Figure 1: The sunspot time series.

A TDNN with 5 delay elements is implemented to predict the next sample of the time series from the previous samples. As obtained from the cross validation, 9 hidden units with the tanh non-linearity are used, and the output layer contains a single linear unit. Over this network the MSE and MEE cost functions are used to train the system. The network is trained for 1300 iterations and the testing data consists of 1500 samples. The learning curves for MSE and MEE are shown in Figures 2 and 3, respectively.

Figure 2: The learning curve for the LMS cost function. The MSE decreases with the iterations and nearly reaches saturation. For the output layer µ = 0.9; for the hidden layer µ = 0.9.

Figure 3: The learning curve for the MEE cost function. The normalized information potential increases with the iterations and comes to saturation. For the output layer µ = 0.9; for the hidden layer µ = 0.5; σ = 0.9.

The final normalized error power over the testing data is 0.1464 with the MSE cost function and 0.1432 with the MEE cost function. Figure 4 shows the histogram of the errors for the two cost functions: the errors peak more sharply in the vicinity of zero for MEE than for MSE, indicating that MEE is better able to reduce the prediction error. Figure 5 shows how the time series is tracked while using the different methods. In terms of the MSE of the error over the testing data set, MSE gives 0.1278 and MEE gives 0.1205, both better than the trivial predictor, which gives an MSE of 0.199.

Figure 4: The histogram of the errors using the MSE and MEE cost functions.

4. AUTONOMOUS PREDICTION

For autonomous prediction the trained model is initialized with 6 inputs from the training data set. The output of the first iteration is fed back as the present input to predict the next sample of the series, and so on. Autonomous prediction is applied to the systems trained with both the MSE and the MEE cost functions. The values of the delays in the TDNN are initialized from 50 different points in the training data set; the length of the series over which the system can predict the next sample with normalized error < 0.8 is noted, and the average is taken over the 50 runs. The results are summarized in Table 2.

Table 2. Autonomous prediction results

    Cost function | Prediction length
    LMS           | 12.3
    MEE           | 15.1

In autonomous prediction the system cannot track the testing data as in normal testing, because the error made in predicting a sample in one iteration propagates into the next, so the error accumulates; beyond a certain point the prediction is no longer even close to the true value.

5. CONCLUSION

From the above results it can be asserted that the MEE cost function performs better than MSE in prediction. Given the complex nature of the sunspot time series, the performance of MEE is considerable, and the added computational complexity of MEE is the cost that has to be paid for the better performance.

REFERENCES

[1] Principe J., Erdogmus D., Information Theoretic Learning, under contract with John Wiley, expected publication.
[2] Hassoun M., Fundamentals of Artificial Neural Networks, MIT Press, 1995.
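The autonomous prediction procedure of Section 4, in which the model's own output is shifted back into the input window so that errors accumulate, can be sketched as below. The linear predictor here is a hypothetical stand-in for the trained TDNN.

```python
import numpy as np

def autonomous_predict(predict_fn, seed_window, n_steps):
    """Iterated prediction: each new output replaces the oldest
    sample in the input window, so prediction errors propagate
    into all subsequent steps."""
    window = list(seed_window)
    outputs = []
    for _ in range(n_steps):
        y = float(predict_fn(np.array(window)))
        outputs.append(y)
        window = window[1:] + [y]   # feed the prediction back
    return np.array(outputs)

# Stand-in predictor (NOT the trained network): mean of the window.
preds = autonomous_predict(lambda w: w.mean(),
                           [0.1, 0.2, 0.3, 0.4, 0.5], 10)
```

Measuring how many steps `preds` stays within a normalized-error bound of the true series gives the prediction length reported in Table 2.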