Forecast of the Mean Monthly Prices of the Dispatch Contracts in Wholesale Electricity Market of Colombia Using Cascade Correlation Neural Networks

Paola Sánchez, Fernán Villa, Juan Velásquez

Abstract—Forecasting electricity prices in liberalized and deregulated markets is considered a difficult task because of the number and complexity of the factors that influence them. Traditional neural network models can represent these complexities; however, they are often criticized for their lack of statistical identifiability. Cascade Correlation neural networks have been used to address this problem. Although Cascade Correlation networks can outperform traditional neural network models, they can suffer from overfitting. To control this problem, this paper proposes several regularization strategies (weight decay, weight elimination and ridge regression) for forecasting the mean monthly prices of the dispatch contracts in the wholesale electricity market of Colombia. We compare the resulting forecasts with those of a multilayer perceptron and an ARIMA model. The results show that regularized cascade correlation networks capture the intrinsic dynamics of the time series better than the other traditional models and produce more accurate forecasts for a horizon of twelve months ahead.

Keywords—Time series forecast, cascade correlation.

I. INTRODUCTION

In the last decade, the electricity industry has undergone significant changes towards deregulation and competition, with the aim of improving economic efficiency. In many places, these changes have culminated in the creation of a wholesale electricity market. In Colombia, the Residential Public Services Law and the Electricity Law led to the restructuring of the electricity sector under a new scheme of free competition.
In this new context, the actual operation of the generating units depends on the decentralized decisions of generation firms whose goal is to maximize their own profits. All firms compete to provide generation services at a price set by the market through two basic mechanisms: bilateral contracts between agents, and energy trading in the spot market. Spot price forecasting is a particularly complex problem because of the number and complexity of the factors that influence prices, such as the physical characteristics of the generation system (electricity cannot be stored, and transporting it requires transmission lines), the business decisions of individual market agents, and regulation. In general, spot price time series exhibit these complexities through features such as pronounced daily, weekly, monthly and other seasonal cycles; time-varying volatility; strong variations from year to year and from season to season; long-term dynamic structure; leverage effects and an asymmetric response of volatility to positive and negative shocks; outliers; high-order correlations; structural changes; local trends; and mean reversion. Furthermore, prices depend on the condition of the generating units in the short run and on capacity investment and demand growth in the long run, and risk has different determinants in the short, medium and long term.

Paola Sánchez is with the Systems School, National University of Colombia, Av 80 No. 65 - 223 Bl M8, Medellín, Colombia (Phone: 057-4-4255370).
Fernán Villa is with the Systems School, National University of Colombia, Av 80 No. 65 - 223 Bl M8, Medellín, Colombia (Phone: 057-4-4255370).
Juan Velásquez is with the Systems School, National University of Colombia, Av 80 No. 65 - 223 Bl M8, Medellín, Colombia (Phone: 057-4-4255370).
Given the complexity of spot price dynamics, the difficulty of forecasting them and the risk involved, contracts serve as a risk-mitigation mechanism. First, they prevent the buyer from being exposed to spot price volatility and to the exceptionally high prices that occur during extreme hydrological events; second, they stabilize the seller's earnings and protect it against exceptionally low prices. In the Colombian electricity market there are two types of contracts: pay-to-contracted and pay-to-demanded. A pay-to-contracted contract specifies that the buyer pays for all the electricity contracted, whether or not it is consumed; under a pay-to-demanded contract, the buyer pays only for the energy actually consumed. Accurate spot price forecasts are critical for producers, consumers and retailers: they must set bids for the spot market in the short term, define contracting policies in the medium term, and define their expansion plans in the long term. For these reasons, all the decisions that each market player must take are strongly affected by price forecasts, and high-performance forecasting models are needed for these series. In this paper we analyze the forecasting of the mean monthly prices of the dispatch contracts in the wholesale electricity market of Colombia. Neural networks have often been used to model complex time series, and several studies have reported their use in electricity markets. However, estimating the parameters of traditional neural network models such as the multilayer perceptron (MLP) has been characterized as a particularly difficult problem. The lack of statistical identifiability of the model is one of the aspects that hinder its specification: the optimal parameters are not unique for a given model specification (inputs or lags, hidden neurons, etc.).
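The non-uniqueness described above can be illustrated with two well-known symmetries of a one-hidden-layer MLP: reordering the hidden neurons (together with their incoming and outgoing weights) or, for tanh units, negating all the weights of one neuron changes the parameter vector but not the function computed. The sketch below assumes tanh hidden units and a linear output; the weights and the input pattern are arbitrary, hypothetical values rather than estimates from this study.

```python
import math

def mlp(x, W_in, b_in, w_out, b_out):
    """One-hidden-layer MLP with tanh hidden units and a linear output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W_in, b_in)]
    return sum(v * h for v, h in zip(w_out, hidden)) + b_out

# Hypothetical parameters: 3 lagged inputs, 2 hidden neurons.
W_in = [[0.5, -1.2, 0.3], [0.8, 0.1, -0.4]]
b_in = [0.1, -0.2]
w_out = [1.5, -0.7]
b_out = 0.05
x = [0.2, -0.1, 0.4]
y_original = mlp(x, W_in, b_in, w_out, b_out)

# Symmetry 1: swap the two hidden neurons together with all their weights.
W_p, b_p, w_p = W_in[::-1], b_in[::-1], w_out[::-1]
y_permuted = mlp(x, W_p, b_p, w_p, b_out)

# Symmetry 2: tanh is odd, so negating one neuron's incoming weights, bias
# and outgoing weight leaves the network output unchanged.
W_f = [[-w for w in W_in[0]], W_in[1]]
b_f = [-b_in[0], b_in[1]]
w_f = [-w_out[0], w_out[1]]
y_flipped = mlp(x, W_f, b_f, w_f, b_out)

print(W_p != W_in, math.isclose(y_original, y_permuted),
      math.isclose(y_original, y_flipped))  # prints: True True True
```

Three different parameter vectors produce the same forecast, so a training algorithm has no reason to prefer one of them; this is one concrete source of the identifiability problem.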
The Cascade Correlation (CASCOR) artificial neural network presents interesting conceptual advantages with respect to the statistical identifiability problem of the MLP. CASCOR is based on constructive learning, in which the network grows in size during training, so the number of hidden neurons required need not be known a priori; as a result, learning can be faster and generalization can be better than with an MLP. Although CASCOR mitigates the statistical identifiability problem of the traditional MLP and has proven robust in modeling complex data sets, it can suffer from overfitting. To control this problem, we propose several regularization strategies: weight decay, weight elimination and ridge regression. The main purpose of this paper is to forecast the mean monthly prices of the dispatch contracts in the wholesale electricity market of Colombia using CASCOR networks, and to compare the results with those of ARIMA and MLP models in order to determine the best prediction model for the series studied. The results demonstrate the efficiency and practicality of the proposed method. The originality and importance of this paper rest on the following aspects:
• Although there is extensive experience in forecasting electricity prices in short-term markets, there are no references in the literature on forecasting contract prices with CASCOR networks.
• There are few studies in the literature comparing the performance of CASCOR networks with other models in forecasting real-world series.
• It promotes the use of CASCOR for forecasting electricity price series, increasing the number of tools available.
This article is organized as follows: Section II gives a general description of CASCOR models for time series forecasting.
Section III presents the proposed CASCOR time series procedure with regularization strategies, its application to forecasting the mean monthly prices of the dispatch contracts in the wholesale electricity market of Colombia, and a comparison of the results with ARIMA and MLP models. Conclusions are drawn in Section IV.

II. CASCOR MODEL FOR TIME SERIES FORECASTING

The Cascade Correlation (CASCOR) artificial neural network, proposed by Fahlman and Lebiere, is based on constructive learning, i.e., it starts with a minimal network with no hidden layers and then builds a multilayer structure by adding hidden neurons one at a time. Each new hidden neuron receives a synaptic connection from every input and from every hidden neuron that precedes it. After a new hidden neuron is added, its input weights are frozen, while the output weights are retrained. This process continues until performance is satisfactory. Figure 1 shows the schematic of a CASCOR network: the boxes at the line intersections indicate the weights (parameters w_{p,h}) that are frozen once a unit has been added to the hidden layer, and the crosses indicate the weights that are modified after each neuron is inserted. Thus, CASCOR combines two basic ideas. The first is the cascade architecture, in which hidden neurons are added one at a time and are not changed after being added. The second is incremental, or constructive, learning, which concerns how new hidden neurons are created: for each new hidden neuron, the algorithm maximizes the correlation between the output of the new neuron and the residual error of the network, so that hidden neurons are added to reduce the error until performance is satisfactory. However, CASCOR networks have been criticized, especially because of overfitting problems, as discussed in the next section.
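The cascade wiring and the candidate-selection step described above can be sketched as follows, assuming tanh hidden units, a linear output unit and a single output. The candidate score used here is the magnitude of the covariance between a candidate's output and the residual error, which is the quantity maximized in Fahlman and Lebiere's original formulation. All weights, patterns and function names are hypothetical, and the actual training of the candidates and of the output weights is omitted.

```python
import math

def hidden_outputs(x, hidden_weights):
    """Outputs of the installed (frozen) hidden units, in cascade order.

    hidden_weights[k] has one weight per original input, one per earlier
    hidden unit, and a final bias term.
    """
    values = []
    for w in hidden_weights:
        inputs = list(x) + values        # original inputs + earlier hidden outputs
        z = sum(wi * vi for wi, vi in zip(w, inputs)) + w[-1]
        values.append(math.tanh(z))
    return values

def network_output(x, hidden_weights, output_weights):
    """Linear output unit fed by all inputs, all hidden units and a bias."""
    features = list(x) + hidden_outputs(x, hidden_weights) + [1.0]
    return sum(w * f for w, f in zip(output_weights, features))

def candidate_score(values, residuals):
    """|sum_p (V_p - mean V)(E_p - mean E)| for a single output unit."""
    v_bar = sum(values) / len(values)
    e_bar = sum(residuals) / len(residuals)
    return abs(sum((v - v_bar) * (e - e_bar) for v, e in zip(values, residuals)))

# Hypothetical network: 2 lagged inputs and 2 hidden units already installed.
hidden_weights = [
    [0.4, -0.3, 0.1],        # unit 1: 2 inputs + bias
    [0.2, 0.5, -0.8, 0.0],   # unit 2: 2 inputs + unit 1 + bias
]
output_weights = [0.3, 0.1, 0.7, -0.5, 0.05]   # 2 inputs + 2 hidden + bias

patterns = [[0.1, 0.2], [0.4, -0.1], [-0.3, 0.5], [0.0, 0.0]]
targets = [0.2, 0.1, -0.1, 0.0]
residuals = [t - network_output(x, hidden_weights, output_weights)
             for x, t in zip(patterns, targets)]

# Hypothetical candidate units (2 inputs + 2 hidden + bias). In CASCOR each
# candidate's input weights are trained to maximize the score; here they are fixed.
candidates = [[0.5, -0.5, 0.2, 0.3, 0.0], [-0.2, 0.3, 0.1, -0.4, 0.1]]
scores = []
for c in candidates:
    values = [hidden_outputs(x, hidden_weights + [c])[-1] for x in patterns]
    scores.append(candidate_score(values, residuals))

# The best candidate would be installed with frozen input weights; afterwards
# only the output weights (now one entry longer) would be retrained.
best = max(range(len(candidates)), key=lambda k: scores[k])
print(scores, best)
```

Because an installed unit's input weights never change, earlier units act as fixed feature detectors for every unit added after them, which is what the frozen boxes in Figure 1 represent.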
Given the improvements of CASCOR over the MLP, CASCOR networks could, in theory, approximate nonlinear regression functions better than an MLP. This general regression problem has already been addressed in the literature; however, modeling and forecasting time series is more complex than regression, since it must take into account the order of the data and the new statistical properties that this ordering induces. CASCOR would therefore be expected to forecast time series more accurately than an MLP. This hypothesis has not been proven in the literature and is examined experimentally in this paper.

Fig. 1. Scheme of CASCOR Network

III. RESEARCH METHODOLOGY

A. CasCor Regularization

Although cascade correlation neural networks (CasCor) can be better than multilayer perceptrons, they can suffer from overfitting. To control this problem, we propose several regularization strategies: weight decay, weight elimination and ridge regression. Weight decay was proposed by Hinton (1989) and weight elimination by Weigend et al. (1991); both strategies are described in Palit and Popovic (2005). Ridge regression was proposed by Hoerl and Kennard (1970); its main idea is to control the bias-variance trade-off. Ridge regression can reduce the variance of the weights, minimize the effect of outliers and reduce the validation error of the network. For forecasting, we therefore consider the following five regularization schemes:
• WE: CasCor network regularized with weight elimination.
• WD: CasCor network regularized with weight decay.
• RR: CasCor network regularized with ridge regression.
• WE+RR: CasCor network regularized with weight elimination and ridge regression.
• WD+RR: CasCor network regularized with weight decay and ridge regression.
For weight decay, we use λ = 0.0001.
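A minimal sketch of the three strategies, under stated assumptions: weight decay is written in one common form, λΣw²; weight elimination follows the form λΣ(w²/w0²)/(1 + w²/w0²) associated with Weigend et al., which behaves like weight decay for small weights but saturates for large ones; and ridge regression is shown as the closed-form estimate (HᵀH + λI)⁻¹Hᵀy of the linear output weights, where H holds the outputs feeding the output unit. The defaults λ = 0.0001 and w0 = 100 are the values reported in this paper; the ridge parameter, the data and the function names are hypothetical, and the paper itself estimates parameters with ConRprop rather than with this code.

```python
def weight_decay_penalty(weights, lam=1e-4):
    """Weight decay (one common form): lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)

def weight_elimination_penalty(weights, lam=1e-4, w0=100.0):
    """Weight elimination: lam * sum((w/w0)^2 / (1 + (w/w0)^2))."""
    return lam * sum((w / w0) ** 2 / (1.0 + (w / w0) ** 2) for w in weights)

def penalized_sse(errors, weights, penalty):
    """Training objective: SSE plus a weight penalty."""
    return sum(e * e for e in errors) + penalty(weights)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ridge_output_weights(H, y, lam):
    """Ridge estimate of the linear output weights: (H'H + lam*I)^-1 H'y.

    For simplicity every column, including the bias column, is penalized.
    """
    p = len(H[0])
    HtH = [[sum(row[i] * row[j] for row in H) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Hty = [sum(row[i] * yi for row, yi in zip(H, y)) for i in range(p)]
    return solve(HtH, Hty)

# Hypothetical hidden-unit outputs (second column = bias) with y = 2h + 1.
H = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 3.0, 5.0]
ols = ridge_output_weights(H, y, lam=0.0)     # approximately [2.0, 1.0]
ridge = ridge_output_weights(H, y, lam=1.0)   # smaller weight norm than OLS
print(ols, ridge)
print(weight_decay_penalty([1.0, -2.0]), weight_elimination_penalty([100.0]))
print(penalized_sse([0.1, -0.2], [1.0, -2.0], weight_decay_penalty))
```

With w0 = 100, a weight of 100 contributes exactly half of λ to the weight elimination penalty, while weights much smaller than w0 are penalized almost quadratically, as in weight decay.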
For weight elimination, we use the same λ and w0 = 100. The CasCor model parameters are estimated with the ConRprop optimization algorithm proposed by Villa et al. (2009).

B. Data analysis

In this study we use the natural logarithm of the mean monthly prices of the dispatch contracts in the wholesale electricity market of Colombia, in $/kWh, between January 1997 (1997:01) and October 2009 (2009:10). This series is available in the Neon system of XM Compañía de Expertos en Mercados S.A. E.S.P. Fig. 2 shows that the series has a long-term upward trend from 1997:01 to the first half of 2003; over the same interval there is evidence of an annual cyclical component of variable amplitude, possibly explained by the winter-summer cycle. The largest amplitude of the periodic component coincides with the “El Niño” phenomenon of 1997-1998; this cyclical component persists, although less marked, until early 2004. From 2003 there is a slight downward trend that ends sometime in the first half of 2006. At that point the series shows a structural change in both its trend and its cyclical component: on the one hand, it recovers the growth rates that characterized 2000, 2001 and 2002; on the other, an annual seasonal cycle reappears, whose highest level coincides with the summer season.

The series consists of 154 observations, of which the first 130 (1997:01 to 2007:10) were used to estimate the model parameters. Table I shows the forecasting models. To test the generalization ability of the models, we use two forecast horizons: one year, consisting of 12 observations (2007:11 to 2008:10), and two years, consisting of 24 observations (2007:11 to 2009:10).

C. Empirical results

For the series studied in this paper we estimated the models presented in Table I with forecast horizons of one and two years, i.e., 12 and 24 months, respectively.

Fig. 2. One-step-ahead forecasting of the mean monthly prices of the dispatch contracts in the wholesale electricity market of Colombia using a CasCor model.

Goodness of fit was measured using the sum of squared errors (SSE) in both training and prediction (validation); the results are presented in Table I.

TABLE I
THE SSE FOR CASCOR AND TRADITIONAL MODELS

| Model | Lags | SSE, training | SSE, 1-year forecast | SSE, 2-year forecast |
|---|---|---|---|---|
| ARIMA | 13, 14 | 0.1255 | 0.1188 | 0.7513 |
| MLP-1 | 1–3 | 0.1773 | 0.0119 | 0.0217 |
| CASCOR-1 | 1–3 | 0.1701 | 0.0115 | 0.0207 |
| CASCOR-WE-1 | 1–3 | 0.1652 | 0.0111 | 0.0206 |
| CASCOR-WD-1 | 1–3 | 0.1242 | 0.0146 | 0.0200 |
| CASCOR-RR-1 | 1–3 | 0.1275 | 0.0096 | 0.0149 |
| CASCOR-WE+RR-1 | 1–3 | 0.0870 | 0.0100 | 0.0144 |
| CASCOR-WD+RR-1 | 1–3 | 0.1608 | 0.0101 | 0.0166 |
| MLP-2 | 1–6 | 0.1277 | 0.0121 | 0.0210 |
| CASCOR-2 | 1–6 | 0.1254 | 0.0114 | 0.0206 |
| CASCOR-WE-2 | 1–6 | 0.1252 | 0.0113 | 0.0201 |
| CASCOR-WD-2 | 1–6 | 0.1252 | 0.0112 | 0.0200 |
| CASCOR-RR-2 | 1–6 | 0.0809 | 0.0087 | 0.0147 |
| CASCOR-WE+RR-2 | 1–6 | 0.0542 | 0.0095 | 0.0160 |
| CASCOR-WD+RR-2 | 1–6 | 0.0555 | 0.0093 | 0.0155 |
| MLP-3 | 1–13 | 0.0960 | 0.0100 | 0.0150 |
| CASCOR-3 | 1–13 | 0.0743 | 0.0048 | 0.0090 |
| CASCOR-WE-3 | 1–13 | 0.0645 | 0.0047 | 0.0090 |
| CASCOR-WD-3 | 1–13 | 0.0724 | 0.0045 | 0.0081 |
| CASCOR-RR-3 | 1–13 | 0.0423 | 0.0042 | 0.0071 |
| CASCOR-WE+RR-3 | 1–13 | 0.0323 | 0.0040 | 0.0070 |
| CASCOR-WD+RR-3 | 1–13 | 0.0269 | 0.0022 | 0.0046 |

To evaluate the predictive power of the CasCor networks relative to other models, they are compared with an MLP and, for illustration, with an autoregressive integrated moving average (ARIMA) model. The MLP was estimated for different sets of lags, and the models with the lowest error were selected. The MLP architecture consists of an input layer with one neuron for each lag considered, a hidden layer with 5 neurons (the same number as in the CASCOR models), and one output layer.
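The sample split and the lagged inputs behind Table I can be sketched as follows. Observations are indexed from 1997:01, the first 130 form the estimation set, and the inputs of each model are the values at the chosen lags (1–3, 1–6 or 1–13). The month arithmetic shows that the 24 observations after the estimation set span 2007:11 to 2009:10. The series used here is a synthetic placeholder, not the XM data, and the helper names are hypothetical.

```python
import math

N_TOTAL, N_TRAIN = 154, 130   # 1997:01-2009:10; first 130 used for estimation

def month_label(index, start_year=1997, start_month=1):
    """Map a 0-based observation index to a 'YYYY:MM' label."""
    months = (start_month - 1) + index
    return f"{start_year + months // 12}:{months % 12 + 1:02d}"

def lagged_patterns(series, lags):
    """Return (inputs, targets, target_indices); inputs are series[t - k] for k in lags."""
    first = max(lags)
    X = [[series[t - k] for k in lags] for t in range(first, len(series))]
    y = [series[t] for t in range(first, len(series))]
    return X, y, list(range(first, len(series)))

def sse(actual, predicted):
    """Sum of squared errors, the criterion reported in Table I."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

print(month_label(0), month_label(N_TRAIN - 1))         # 1997:01 2007:10
print(month_label(N_TRAIN), month_label(N_TRAIN + 11))  # 2007:11 2008:10
print(month_label(N_TRAIN), month_label(N_TOTAL - 1))   # 2007:11 2009:10

# Placeholder log-price series (synthetic, NOT the XM data).
series = [math.log(50.0 + 0.2 * t + 3.0 * math.sin(2 * math.pi * t / 12))
          for t in range(N_TOTAL)]

for name, lags in [("1", range(1, 4)), ("2", range(1, 7)), ("3", range(1, 14))]:
    X, y, idx = lagged_patterns(series, list(lags))
    n_fit = sum(1 for t in idx if t < N_TRAIN)   # patterns whose target is in 1997:01-2007:10
    print(name, len(X), n_fit)

# One-step-ahead SSE of a naive "last value" forecast over the one-year horizon.
horizon = range(N_TRAIN, N_TRAIN + 12)
naive_sse = sse([series[t] for t in horizon], [series[t - 1] for t in horizon])
print(naive_sse)
```

Longer lag sets consume more of the initial observations, so the 13-lag models are fitted on fewer patterns (117) than the 3-lag models (127) even though all use the same estimation period.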
The ARIMA model was obtained using the auto.arima() function of the R package forecast (Hyndman and Khandakar, 2008), which searches for the best ARIMA model for a univariate time series; the model found was ARIMA(0,1,0)(2,0,2), and its forecast results are also presented in Table I. It should also be stressed that the regularized CASCOR models achieve lower errors than the corresponding unregularized network, the only exception in Table I being the one-year forecast of CASCOR-WD-1. Among the models with three lags, CASCOR-WE+RR-1 attains the lowest error in training and in the two-year forecast, while CASCOR-RR-1 yields the lowest one-year forecast error; however, the one-year forecast error of CASCOR-WE+RR-1 is only 4% greater than that of CASCOR-RR-1. In addition, the three-lag CASCOR models generalize better than MLP-1 (again with the exception of the one-year forecast of CASCOR-WD-1); CASCOR-WD-1 and CASCOR-WE+RR-1 are better than the ARIMA model in both training and forecasting, while the remaining three-lag CASCOR models are superior only in prediction. When the number of lags is increased to six and thirteen, the CASCOR models outperform both the corresponding MLP and the ARIMA model. With 6 lags, CASCOR-WE+RR-2 and CASCOR-RR-2 remain the best in training and in the one-year forecast, respectively, but now CASCOR-RR-2 also gives the best two-year forecast. The difference of CASCOR-RR-2 with respect to CASCOR-WE+RR-2 in training is 33%, while that of CASCOR-WE+RR-2 with respect to CASCOR-RR-2 in forecasting is 8.42% and 8.13% at one and two years, respectively. Since the difference between the two models is larger in training, CASCOR-WE+RR-2 may be the more appropriate model in this case.
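The relative differences quoted above appear to be computed with respect to the larger of the two errors being compared; that is an inference from the numbers, not something the paper states. Under that assumption they can be reproduced from Table I (the dictionary layout and function name are illustrative):

```python
# SSE values taken from Table I.
table = {
    "CASCOR-RR-1":    {"train": 0.1275, "f1": 0.0096, "f2": 0.0149},
    "CASCOR-WE+RR-1": {"train": 0.0870, "f1": 0.0100, "f2": 0.0144},
    "CASCOR-RR-2":    {"train": 0.0809, "f1": 0.0087, "f2": 0.0147},
    "CASCOR-WE+RR-2": {"train": 0.0542, "f1": 0.0095, "f2": 0.0160},
}

def relative_difference(a, b):
    """Percentage difference a - b, relative to the larger of the two errors."""
    return 100.0 * (a - b) / max(a, b)

comparisons = [
    ("CASCOR-WE+RR-1", "CASCOR-RR-1", "f1"),     # quoted as 4%
    ("CASCOR-RR-2", "CASCOR-WE+RR-2", "train"),  # quoted as 33%
    ("CASCOR-WE+RR-2", "CASCOR-RR-2", "f1"),     # quoted as 8.42%
    ("CASCOR-WE+RR-2", "CASCOR-RR-2", "f2"),     # quoted as 8.13%
]
for a, b, col in comparisons:
    d = relative_difference(table[a][col], table[b][col])
    print(f"{a} vs {b} ({col}): {d:.2f}%")
```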
Furthermore, the differences between CASCOR-WE+RR-2 and CASCOR-WD+RR-2 are -2.34%, 2.11% and 3.13% in training and in the one- and two-year forecasts, respectively, so CASCOR-WD+RR-2 also emerges as a suitable model for the series. Note that the models regularized in both layers reach lower training errors than the others, whereas the lowest forecast error is attained by the model regularized only in the output layer. With 13 lags, CASCOR-WD+RR-3 is the best model of all; it is regularized with weight decay between the input and hidden layers and with ridge regression between the hidden and output layers, so overfitting is controlled in both parts of the network. By contrast, CASCOR-3, which uses no regularization strategy, has noticeably larger errors than CASCOR-WD+RR-3. In general, for this series, the fully regularized CASCOR models (regularized both between the input and hidden layers and between the hidden and output layers) achieve lower errors than most models regularized with a single technique. Fully regularized models are therefore more appropriate for forecasting, since they largely control the causes of overfitting.

IV. CONCLUSIONS

Forecasting the mean monthly prices of the dispatch contracts in the wholesale electricity market of Colombia is a complex task because of changes in the annual cyclical pattern as well as several changes in the long-term trend. We used the first 130 observations to estimate the model parameters and the remaining observations to evaluate predictive capability. The results indicate that fully regularized CASCOR networks forecast more accurately than the MLP, the ARIMA model and the unregularized CASCOR networks. The proposed procedure finds models that generalize better than other proposals in the literature.

REFERENCES

J. D. Velásquez, I. Dyner, and R. C. Sousa, "¿Por qué es tan difícil obtener buenos pronósticos de los precios de la electricidad en mercados competitivos?" [Why is it so difficult to obtain good forecasts of electricity prices in competitive markets?], Cuadernos de Administración, no. 20, pp. 259–282, 2007.
A. Conejo, J. Contreras, R. Espínosa, and M. Plazas, "Forecasting electricity prices for a day-ahead pool-based electric energy market," International Journal of Forecasting, no. 21, pp. 435–462, 2005.
Y. Hong and C. Lee, "A neuro-fuzzy price forecasting approach in deregulated electricity markets," Electric Power Systems Research, no. 73, pp. 151–157, 2005.
R. Gareta, A. Gil, A. Monzón, and L. M. Romeo, "Las redes neuronales como herramienta para predecir el precio de la energía eléctrica" [Neural networks as a tool for predicting the price of electrical energy], Energía: Ingeniería energética y medioambiental, vol. 30, no. 180, pp. 67–72, 2004, ISSN 0210-2056.
H. S. Hippert, C. E. Pedreira, and R. C. Souza, "Neural networks for short-term load forecasting: A review and evaluation," IEEE Transactions on Power Systems, vol. 16, no. 1, pp. 44–55, Feb. 2001.
S. E. Fahlman and C. Lebiere, "The cascade-correlation learning architecture," Advances in Neural Information Processing Systems, vol. 2, pp. 524–532, 1990.
F. A. Villa, J. D. Velásquez, and R. C. Souza, "Una aproximación a la regularización de redes cascada-correlación para la predicción de series de tiempo" [An approach to the regularization of cascade-correlation networks for time series prediction], Investigación Operacional, no. 28, pp. 151–161, 2008.
H. S. Hippert, D. W. Bunn, and R. C. Souza, "Large neural networks for electricity load forecasting: Are they overfitted?," International Journal of Forecasting, vol. 21, no. 3, pp. 425–434, 2005.
G. E. Hinton, "Connectionist learning procedures," Artificial Intelligence, no. 40, pp. 185–243, 1989.
A. S. Weigend, D. E. Rumelhart, and B. A. Huberman, "Generalization by weight-elimination with application to forecasting," in Advances in Neural Information Processing Systems, vol. 3, R. P. Lippmann, J. E. Moody, and D. S. Touretzky, Eds. San Mateo, CA, USA: Morgan Kaufmann, 1991, pp. 875–882, ISBN 1-55860-184-8.
A. K. Palit and D. Popovic, Computational Intelligence in Time Series Forecasting. London: Springer, 2005.
A. E. Hoerl and R. W. Kennard, "Ridge regression: Biased estimation for nonorthogonal problems," Technometrics, vol. 12, no. 1, pp. 55–67, 1970.
D. W. Marquardt and R. D. Snee, "Ridge regression in practice," The American Statistician, vol. 29, no. 1, pp. 3–20, Feb. 1975.
F. A. Villa, J. D. Velásquez, and P. Jaramillo, "Conrprop: un algoritmo para la optimización de funciones no lineales con restricciones" [ConRprop: an algorithm for the optimization of nonlinear functions with constraints], Revista Facultad de Ingeniería Universidad de Antioquia, no. 50, pp. 188–194, 2009.
R. J. Hyndman and Y. Khandakar, "Automatic time series forecasting: The forecast package for R," Journal of Statistical Software, vol. 26, no. 3, 2008.