A brief reflection on automatic econometric model selection in a fuel price elasticity context.

In December 2005, the VSAE (the econometrics study association of the UvA) organised a trip to London, and one of the study-related activities was a lecture at the University of Oxford on automatic model selection using PcGets. Having just completed my first master's courses, I did not at that time fully understand all that was said. Despite this lack of understanding, or my physical condition at the time, I was captivated by the idea that it was possible to insert data obtained through research into a program and, with the simple click of a button, have this program report the best model obtainable with the data at hand. I embraced the idea that econometric modelling could be automated. This meant I would not have to do much of it in the future and could spend my time solely on interpreting the results. On the other hand, the word "data-mining" engaged my mind and would not abate.

Robert Schipperhein obtained his master's degree in Econometrics at the University of Amsterdam (UvA) in September 2008. He has been a member of the 2006 VSAE board and during his tenure was the chief editor of Aenorm. He co-organised the 2006 and 2007 editions of the Econometric Game and competed on behalf of the UvA in the 2008 edition.

Then, two years later, I needed to come up with a topic for my thesis. During an internship at the Ministry of Finance I was asked to investigate the price elasticities of motor fuels. Given the abundant research available on this topic, it was interesting to compare my estimates (based on the extensive literature) to the estimates and model presented by Autometrics. This article is a summary of the second half of my thesis, in which I detail that comparison.

What is Autometrics?

Autometrics is the most recent automatic model selection program. It was created by Jurgen Doornik, a student of econometrics at the UvA, who introduced it in a 2007 online paper.
Autometrics is a successor of PcGets, which was previously explained by Jennifer Castle in the 52nd edition of Aenorm. Because of limited space I will only briefly explain the Autometrics algorithm; further information is available in the two articles mentioned above.

The General Unrestricted Model (GUM) is the starting point for the model reduction procedure and covers the entire dataset. This GUM must provide sufficient information on the process being modelled and has to be statistically well behaved. The latter property is checked by a series of diagnostic tests. For each insignificant variable the GUM defines a path: delete that variable and continue with backward deletion, one variable at a time. Each path is followed until termination, leaving only significant variables. The process then tests the resulting terminal model against the GUM to prevent too great a loss of information. The terminal model is also subjected to diagnostic tests and, if rejected, the path is retraced until a suitable model is found. Usually several terminal models are selected by the procedure. In this case Autometrics combines the elements of the selected terminals into a new GUM, after which the procedure starts all over again. When the next GUM equals the former one and several terminals still remain, an information criterion of choice is applied to choose between them.

The last important elements of the systematic approach of Autometrics are the 'pre-search' possibilities. Lag-length pre-search checks the significance of lags up to a manually chosen length: the program deletes the lags of a given variable up to the first significant one. This reduces the number of variables in the first GUM and improves efficiency. A process called 'bunching' also reduces the number of paths. The program is designed to begin by attempting to remove blocks of variables that are 'highly insignificant'.
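The single-path reduction described above can be sketched in a few lines of code. The following is a deliberately simplified illustration with made-up data, not the actual Autometrics algorithm: it drops the least significant regressor by its OLS t-ratio until all remaining ones pass a cut-off, and omits the diagnostic tests, the backtesting against the GUM and the union of terminal models.

```python
import numpy as np

def backward_eliminate(X, y, names, t_crit=2.0):
    """One simplified reduction path: repeatedly delete the regressor
    with the smallest |t| until every remaining one exceeds t_crit."""
    keep = list(range(X.shape[1]))
    while keep:
        Xk = X[:, keep]
        beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        resid = y - Xk @ beta
        n, k = Xk.shape
        sigma2 = resid @ resid / (n - k)
        tvals = beta / np.sqrt(np.diag(sigma2 * np.linalg.inv(Xk.T @ Xk)))
        worst = int(np.argmin(np.abs(tvals)))
        if abs(tvals[worst]) >= t_crit:
            break          # terminal model: all regressors significant
        del keep[worst]    # delete the variable and continue the path
    return [names[i] for i in keep]

# Hypothetical GUM with five candidate regressors; only x0 and x2 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=200)
print(backward_eliminate(X, y, ["x0", "x1", "x2", "x3", "x4"]))
```

The real program runs many such paths, one for each insignificant starting variable, and additionally tries first to delete whole blocks of highly insignificant variables at once.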
If this is successful the number of branches is reduced. Because of the lag-length pre-search and the block deletion, Autometrics is able to handle GUMs with more variables than observations.

Data mining and its risks

Classical econometric research begins with a hypothesis based on theory, earlier research or common sense. This hypothesis is then tested using data and statistical tests. Early work on data mining by Lovell (1983) introduces the subject with six quotes. The two most relevant to this discussion are:

"The precise variables included in the regression were determined on the basis of extensive experimentation (on the same body of data) . . . ."

"The method of step-wise regression provides an economical way of choosing from a large set of variables . . . those which are most statistically significant . . . ."

The last quote describes precisely the core purpose of automatic econometric model selection software like Autometrics. Lovell warns of two possible problems when data mining is used. In an experimental setting he shows that the significance of tests suffers when data mining is performed, caused by the large number of tests performed on the same data. Moreover, the researcher who uses a data mining approach is not certain to select the correct (true) model. Lovell tries three basic data mining strategies in an experimental environment; the best two have a success rate of almost 70%. With a success rate of 70%, many 'wrong' results will be published, and there is further uncertainty regarding the claimed success rate, as other sample properties such as size surely affect it. The problem with data mining is that it is difficult to properly test, on the basis of common sense, the validity of a model obtained from the data. Once you believe in data mining, and why would you use data mining software if you were not a believer, you will always find a way to explain its results.

A brief summary of the 'manually obtained' results

Variable               Formula   Software acronym   Unit
gasoline consumption   qg        qbenz              1000 kg
diesel consumption     qd        qdies              1000 kg
price of gasoline      pg        pbenz              €/100 liter
price of diesel        pd        pdies              €/100 liter
vehicle stock          c         wagenpark          1
income                 i         bni                € 1.000.000
population             n         bevolking          1
inflation              p         cpi                CPI (1965 = 1)

Table 1: data summary

With the data presented in table 1, the following two partial adjustment models emerged from the literature and withstood extensive testing:

qg = b1 + b2·pg + b3·i + b4·c + b5·qg(-1) + b6·pd + e    (1)

qd = b1 + b2·pd + b3·i + b4·c + b5·qd(-1) + b6·pg + e    (2)

where e denotes the disturbances. The variables in these equations are corrected for inflation and population growth, which are not included explicitly, and logs have been taken of all variables. All data are for the Netherlands and aggregated yearly; consistent data are available from 1975 to 2007. The results of my research on price elasticity are summarised in table 2 below. For further information regarding these results and how they were obtained, please refer to my thesis.

                      elasticity w.r.t. gasoline price   elasticity w.r.t. diesel price
dependent variable    short term    long term            short term    long term
gasoline              -0.401        -0.954               0.168         0.490
                      (0.091)       (0.193)              (0.057)       (0.105)
diesel                0.247         1.343                -0.263        -1.410
                      (0.192)       (1.653)              (0.085)       (1.132)

Table 2: price elasticities (standard errors in parentheses)

Autometrics in action

As with any process, output is influenced by input. Autometrics is not designed to distinguish between a log-linear and a linear specification. Moreover, the researcher must choose between level and differenced input. By making these choices the researcher directs the program towards a certain class of models. To be able to compare the Autometrics results with the earlier results, Autometrics should have the possibility to construct the partial adjustment models (1) and (2). The PcGive 12 guide, which includes Autometrics, recommends the use of first differences when the data is integrated of order 1.
Such is the case in this instance. However, if the programme is directed to use the data in such a way that a partial adjustment model does or does not result, depending on the researcher's preference, the automatic model selection program has less selection to do itself. Autometrics should find the right model of its own accord, as that is the purpose for which it was designed. To keep the possibility of equations (1) and (2) open, but not manually skew Autometrics towards this result, a log-level specification of the data is chosen. Moreover, the data inserted are not corrected for inflation or population growth; both correction variables are inserted in the equation individually, to give Autometrics all possible freedom.

The next decision is the number of lags to include in the GUM. With the small number of observations, more lags will increase the challenge for Autometrics and may make it impossible to automatically select a model, while including only the lagged dependent variable would direct Autometrics towards my outcome. To explore this dilemma, consider the gasoline equation: by including all relevant variables up to a certain lag length, we can see what final equations Autometrics provides. The results of running Autometrics with an increasing number of lags in the initial dataset are reported in table 3. Run 1 includes only the level variables, run 2 adds the lagged dependent variable, and run 3 includes all variables and their first lags; every successive run adds the next lag of every variable. An 'a' means the level variable is in the final equation, a 'b' means the first lag is included, and so on. Furthermore, the number of included observations, the number of included (lags of) variables and the Residual Sum of Squares (RSS) of the final equation are presented.

Variable               Run 1    Run 2    Run 3    Run 4     Run 5     Run 6
qb                              b        b        b         d         d
pb                     a        a        a        a,b,c     a         a,d,e
pd                     a        a        a        a,b       b         a,c,d
i                      a        a        a        a,b,c     b,d       b,e
c                      a        a        a        a
n                      a        a        a        a,b       a,d       a
pcpi
constant               a        a        a        a         a
trend                  a        a        a        a         a         a
# observations         30       29       29       28        27        26
# of var (incl. lags)  8        9        17       25        33        41
RSS                    0.00672  0.00384  0.00384  0.000629  0.001402  0.00129

Table 3: models estimated by Autometrics starting from different GUMs

Although the number of observations declines by one with every successive run, the RSS is indicative of the statistical quality of the model. The fourth run has the best RSS, although runs 5 and 6 have more freedom and should therefore be able to statistically outperform run 4, instead of producing an RSS that is twice as large. Moreover, as opposed to the earlier runs, the last two runs eliminate the number of vehicles as a relevant variable. This is not only counterintuitive from the point of view of the modeller, who would expect it to be a very important variable; it is also counterintuitive from a statistical point of view, because the decreasing number of observations has a negative influence on the power of the significance tests. An important reason for this 'strange' behaviour of Autometrics could well be the fact that runs 5 and 6 contain more variables than observations, while the earlier runs have more observations than variables. Autometrics handles more variables than observations by creating blocks of variables, which it then treats as single variables. The program searches for blocks that can be deleted as a whole, decreasing the number of variables until the 'normal' situation of more observations than variables is reached once again. Considering that this setting offers more challenges than a low number of observations does, it would seem wise to work with more observations than variables.

Running Autometrics twice, with gasoline and diesel consumption as the respective dependent variables, yields the two models presented in the Appendix as A.1 and A.2. The final equation for the diesel model does not include a lag of the dependent variable and is therefore not an error correction model. Moreover, the Lagrange Multiplier test signals first order autocorrelation in the final gasoline equation, which indicates that the model chosen by Autometrics is improperly specified. Contrary to the final gasoline model, in which some lag of every inserted variable remains significant, the diesel model contains only 5 of the 9 possible variables. Autometrics therefore shows that it 'believes' the processes that determine gasoline and diesel consumption are quite different.

A feature exploited in the first part of my research is the correction for population growth and inflation. The inclusion of this correction is based on economic theory, and Autometrics will not include it unless it is imposed. Since correcting for inflation and population growth is considered valid, helping Autometrics by imposing these corrections could improve its results. The final results are beyond the scope of this article, but the selected models differ significantly from A.1 and A.2.

Comparison of the Autometrics results with the Partial Adjustment Models

The performance of automatic model selection algorithms can only be determined by comparing their outcomes to manually obtained results. There has been extensive research on fuel consumption, and the model and estimates obtained manually are in line with the mass of earlier research; they are therefore reliable enough to compare with the results generated by Autometrics. Equations (1) and (2) include only the first lag of the dependent variable, while the models generated by Autometrics also use second order lags. This causes a difference in the number of observations between the two approaches. The performance of the models can only be compared by RSS if the number of observations is equal.

              short term            long term
              gasoline    diesel    gasoline    diesel
Section 2     0.007       0.022     0.042       0.146
Autometrics   0.001       0.003     0.012       0.012

Table 4: RSS of the models
Therefore, we use 28 observations to estimate both models, with only Autometrics using the 1975 observation. For all four equations the resulting RSS is presented in table 4. Note that the RSS of the four models is calculated with four different dependent variables, which means that the diesel and gasoline statistics cannot be compared to one another. The Section 2 gasoline residuals can, however, be compared to the Autometrics gasoline residuals, because equation (1) can be rewritten as:

log Qg = b1 + b2·log(Pg/Pcpi) + b3·log(I/(N·Pcpi)) + b4·log(C/N) + b5·log(Qg(-1)/N(-1)) + b6·log(Pd/Pcpi) + log N + e

In this equation the log N on the right-hand side has a coefficient of one. Estimating this equation (with the same dependent variable as the Autometrics model) leads to the same coefficients and, more importantly, to the same residuals as estimating equation (1). The same reasoning applies to the diesel model. Obviously the fit of Autometrics' choices is superior to that of my own model. This is no surprise, as Autometrics is designed to find the model of best fit; note, however, that the gasoline equation suffers from an AR-test rejection. The RSS in table 4 are obtained for all models with the same number of observations.

Another interesting feature is the correlation between the residuals of two equations modelling the same consumption by different strategies: do they tend to over- or underestimate for the same observations? The correlations between the residual series are 0.11 for gasoline and 0.23 for diesel. Both correlations are very small, and I have therefore concluded that the models are indeed very different.

In short, on statistical grounds it may be concluded that the results of conventional modelling and Autometrics differ and that Autometrics finds a superior fit. However, the use of the model is more interesting. Most of the time we do not wish to know how the past developed; understanding the past is just a means to understand the future.
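The claim above, that fixing the coefficient of log N at one leaves both the coefficients and the residuals of the per-capita regression unchanged, is easy to verify numerically. The following is a minimal sketch with made-up data (the variable names are illustrative, not the thesis data):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 30                                  # yearly observations, as in the article
log_pop = np.linspace(16.0, 16.3, n)    # hypothetical log population
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
log_q = log_pop + X @ np.array([0.5, -0.4, 0.2]) + rng.normal(0, 0.05, n)

# (a) per-capita form: regress log(Q/N) on the regressors
beta, *_ = np.linalg.lstsq(X, log_q - log_pop, rcond=None)
resid_a = (log_q - log_pop) - X @ beta

# (b) rewritten form: log Q as dependent variable, with log N moved to
# the right-hand side at a fixed unit coefficient (an "offset")
resid_b = log_q - (log_pop + X @ beta)

# Identical residuals, which is what makes the RSS of the two
# specifications directly comparable despite different dependents.
print(np.allclose(resid_a, resid_b))
```

The same device explains why the Section 2 and Autometrics residual series can be compared even though one model is written per capita and the other is not.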
With this small number of observations I was able to hold back only one observation, and two more became available because I took so much time writing this thesis. Forecasts can therefore be made for 2005-2007, and the forecasting errors can be compared between the two methods for the two dependent variables. PcGive reports the Root Mean Square Error (RMSE) when forecasting, defined as:

RMSE = [ (1/h) · sum_{k=1}^{h} (y_k - f_k)^2 ]^(1/2)

In this case h = 3 because there are three out-of-sample observations; y_k - f_k is the forecast error. These errors are again comparable because of the reasoning given above.

              gasoline   diesel
PAMs          0.027      0.010
Autometrics   0.034      0.109

Table 5: RMSE of the forecasts

As becomes clear from table 5, Autometrics is less accurate at forecasting than my manually obtained models. Correcting the data for population and inflation does improve the forecasting results of Autometrics: the RMSE is then better than Autometrics' earlier results, but still worse than the manual results presented in table 5. If both my own model and Autometrics forecast the 2006 gasoline tax revenues based on the data up to 2004, the forecast of Autometrics is 80 million further off than that of my own model. This is on top of the fact that the manual model's estimate is itself off by roughly 220 million. The partial adjustment model performs better when estimating diesel consumption, but the results of Autometrics in this case are much worse.

Conclusion

The previous section compared the Autometrics results to my own manually obtained results. Although this is a single investigation based on a very small sample, it again revealed the typical pitfall of data mining. Statistically, Autometrics outperformed my own research; in contrast, the out-of-sample forecasts showed better results for the 'classical' approach. This provides further evidence that sound research is not possible without some knowledge about the topic of interest and some common sense.
Some remarks in favour of Autometrics have to be made, though. The dataset used in my research is very small, with only 30 observations to start with and 28 observations once lags are accounted for. Moreover, cointegrated VAR modelling, another feature of PcGive that is complementary to Autometrics, has not been tried. Last but not least, I have done little to guide Autometrics with any principles of common sense. There are possibilities to lock certain key variables that should be in the final model, or to insert less 'raw' data. Examples of the researcher helping the software are using first differences and inserting data that guides the program towards an error correction model.

Final remarks

For more information on my research please ask for my thesis (robert.schipperhein@gmail.com). If you have become interested in automatic econometric model selection and are in need of a thesis topic, I have quite a number of ideas I was unable to include in my own thesis. Feel free to contact me, since much research can still be done to map the current use of these algorithms or even to improve them. Serious development of these algorithms is as recent as 1999 to 2007, and applied work that makes use of either PcGets or Autometrics is still scarce to my knowledge.

References

Castle, J. (2006). Automatic Econometric Model Selection using PcGets, Aenorm, 52, pp. 43-46. (available online: http://www.aenorm.nl/artikelen/52-castle.pdf)

Doornik, J.A. (2007). Autometrics, Department of Economics, University of Oxford. http://www.economics.ox.ac.uk/hendryconference/Papers/Doornik_DFHVol.pdf, retrieved 2008-06-06.

Lovell, M.C. (1983). Data Mining, The Review of Economics and Statistics, 65, pp. 1-12.