A brief reflection on automatic econometric model selection in a fuel price elasticity context
In December 2005, the VSAE (the econometrics study association of the UvA) organised a trip to London, and one of the study-related activities was a lecture at the University of Oxford on automatic model selection using PcGets. Having just completed my first master's courses, I did not at that time fully understand all that was said. Despite this lack of understanding, or my physical condition at the time, I was captivated by the idea that it was possible to insert data obtained through research into a program and, with the simple click of a button, have this program report the best model obtainable with the data at hand. I embraced the idea that econometric modelling could be automated: it meant I would not have to do much of it in the future and could spend my time solely on interpreting the results. On the other hand, the term 'data mining' lodged itself in my mind and would not abate.
Robert Schipperhein obtained his master's degree in Econometrics at the University of Amsterdam (UvA) in September 2008. He was a member of the 2006 VSAE board and during his tenure was the chief editor of Aenorm. He co-organised the 2006 and 2007 editions of the Econometric Game and competed on behalf of the UvA in the 2008 edition.
Then, two years later, I needed to come up with a topic for my thesis. During an internship at the Ministry of Finance I was asked to investigate the price elasticities of motor fuels. Given the abundant research available on this topic, it was interesting to compare my estimates (based on the extensive literature) to the estimates and model presented by Autometrics. This article is a summary of the second half of my thesis, in which I detail that comparison.
What is Autometrics?
Autometrics is the most recent automatic model selection program. It was created by Jurgen Doornik, a former econometrics student at the UvA, who introduced it in a 2007 online paper. Autometrics is the successor of PcGets, which was previously explained by Jennifer Castle in the 52nd edition of Aenorm. Because of limited space I will only briefly explain the Autometrics algorithm; further information can be found in the two articles mentioned above.
The General Unrestricted Model (GUM) is the
starting point for the model reduction procedure and covers the entire dataset. This GUM
must provide sufficient information on the process being modelled and has to be statistically well behaved. The latter property is checked by a series of diagnostic tests. For each insignificant variable the GUM defines a path: delete that variable and continue with backward deletion, one variable at a time. Each path is followed until termination, leaving only significant variables. The resulting terminal model is then tested against the GUM to prevent too great a loss of information. The terminal is also subjected to the diagnostic tests and, if it is rejected, the path is retraced until a suitable model is found. Usually several terminal models are selected by the procedure. In that case Autometrics combines the elements of the selected terminals into a new GUM, after which the procedure starts all over again. When the next GUM equals the former one and several terminals still remain, an information criterion of choice is applied to choose between them.
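To fix ideas, the following is a minimal sketch of this search in Python (with numpy, scipy and statsmodels). It illustrates a single pass only and is not Doornik's actual algorithm: the real program also runs the diagnostic battery at every terminal, retraces rejected paths, and iterates by forming a new GUM from the union of the surviving terminals; the 5% level, the t-test deletions and the F-type backtest here are simplifying assumptions.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import f as fdist

def reduce_path(y, X, keep, alpha):
    """One search path: backward deletion, one variable at a time,
    until every remaining variable is significant at level alpha."""
    keep = list(keep)
    while len(keep) > 1:
        fit = sm.OLS(y, X[:, keep]).fit()
        worst = int(np.argmax(fit.pvalues))
        if fit.pvalues[worst] < alpha:
            break
        del keep[worst]
    return tuple(sorted(keep))

def gets_search(y, X, alpha=0.05):
    """Single-pass sketch of the tree search: open one path per
    insignificant GUM variable, backtest each terminal against the
    GUM, and break ties with an information criterion."""
    gum = list(range(X.shape[1]))
    gum_fit = sm.OLS(y, X).fit()
    terminals = set()
    for i, p in enumerate(gum_fit.pvalues):
        if p < alpha:
            continue  # significant in the GUM: no path starts here
        start = [j for j in gum if j != i]  # the path: delete variable i
        term = reduce_path(y, X, start, alpha)
        # Backtest: does the terminal lose too much relative to the GUM?
        rss = sm.OLS(y, X[:, list(term)]).fit().ssr
        df1, df2 = len(gum) - len(term), len(y) - len(gum)
        fstat = (rss - gum_fit.ssr) / df1 / (gum_fit.ssr / df2)
        if fdist.sf(fstat, df1, df2) > alpha:  # not rejected: keep it
            terminals.add(term)
    if not terminals:
        return gum  # all GUM variables significant, or all paths rejected
    # Several terminals remain: choose by an information criterion (BIC).
    return list(min(terminals,
                    key=lambda t: sm.OLS(y, X[:, list(t)]).fit().bic))
```

In the full procedure the union of the surviving terminals would form the next GUM and the search would be repeated until the GUM no longer changes.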
The last important elements of the systematic approach of Autometrics are the 'pre-search' possibilities. Lag-length pre-search checks the significance of lags up to a manually chosen length: the program deletes the lags of a given variable down to the first significant one. This reduces the number of variables in the first GUM and improves efficiency. A process called 'bunching' also reduces the number of paths: the program begins by attempting to remove blocks of variables that are 'highly insignificant', and if this succeeds the number of branches is reduced.
Variable               formula symbol   software acronym   unit
gasoline consumption   qg               qbenz              1000 kg
diesel consumption     qd               qdies              1000 kg
price of gasoline      pg               pbenz              €/100 liter
price of diesel        pd               pdies              €/100 liter
vehicle stock          c                wagenpark          1
income                 i                bni                € 1,000,000
population             n                bevolking          1
inflation              p                cpi                weighting, CPI (1965 = 1)

Table 1: data summary
Because of the lag-length pre-search and the block deletion, Autometrics is able to handle GUMs with more variables than observations.
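As an illustration of the lag pre-search idea, here is a small Python sketch. It assumes a simple rule (one full-model OLS estimate with individual t-tests, keeping only the level when no lag is significant); the function name and these rules are mine, not the program's actual implementation.

```python
import numpy as np
import statsmodels.api as sm

def lag_presearch(y, blocks, alpha=0.05):
    """blocks maps a variable name to its columns [lag 0, ..., lag L].
    Estimate the full model once; for each variable, drop all lags
    beyond the longest individually significant one."""
    names = [(v, k) for v, lags in blocks.items() for k in range(len(lags))]
    X = np.column_stack([col for lags in blocks.values() for col in lags])
    pvals = sm.OLS(y, sm.add_constant(X)).fit().pvalues[1:]  # skip constant
    pruned = {}
    for v, lags in blocks.items():
        p = [pvals[names.index((v, k))] for k in range(len(lags))]
        sig = [k for k, pk in enumerate(p) if pk < alpha]
        # keep up to the longest significant lag; fall back to the level
        # alone when no lag is significant (an assumption of this sketch)
        pruned[v] = lags[: max(sig) + 1] if sig else lags[:1]
    return pruned
```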
Data mining and its risks

Classical econometric research begins with a hypothesis based on theory, earlier research or common sense. This hypothesis is then tested using data and statistical tests.

Early work on data mining by Lovell (1983) introduces the subject through six quotes, of which two are most relevant to this discussion:

"The precise variables included in the regression were determined on the basis of extensive experimentation (on the same body of data) ..."

"The method of step-wise regression provides an economical way of choosing from a large set of variables ... those which are most statistically significant ..."

The last quote describes precisely the core purpose of automatic econometric model selection software like Autometrics. Lovell warns of two possible problems when data mining is used. In an experimental setting he shows that the significance of tests suffers when data mining is performed, because of the large number of tests run on the same data. Moreover, a researcher who uses a data mining approach is not certain to select the correct (true) model: Lovell tries three basic data mining strategies in an experimental environment, and the best two have a success rate of almost 70%.

The problem with data mining is therefore that it is difficult to properly test the validity of a model obtained from the data against common sense. Once you believe in data mining (and why would you use data mining software if you were not a believer?) you will always find a way to explain its results. And with a success rate of 70%, many 'wrong' results will be published. There is further uncertainty regarding the claimed success rate, as other sample properties, such as size, surely affect it.
A brief summary of the 'manually obtained' results

With the data presented in table 1, the following two partial adjustment models emerged from the literature and withstood extensive testing:

$$q^g_t = b_1 + b_2\, p^g_t + b_3\, i_t + b_4\, c_t + b_5\, q^g_{t-1} + b_6\, p^d_t + e_t \tag{1}$$

$$q^d_t = b_1 + b_2\, p^d_t + b_3\, i_t + b_4\, c_t + b_5\, q^d_{t-1} + b_6\, p^g_t + e_t \tag{2}$$

where the $e_t$ are the disturbance terms.
The variables in these equations are corrected
for inflation and population growth, which are
not included explicitly. Logs have been taken of
all variables.
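To make the corrections concrete, the sketch below builds the logged, deflated, per-capita series from raw yearly series named after Table 1 and estimates equation (1) by OLS. The DataFrame and its column names are hypothetical; only the deflation and per-capita corrections follow the text.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def estimate_eq1(df: pd.DataFrame):
    """Build the corrected series of equation (1) from raw columns
    (hypothetical names): Qg gasoline consumption, Pg and Pd prices,
    I income, C vehicle stock, N population, Pcpi the CPI."""
    m = pd.DataFrame({
        "qg": np.log(df["Qg"] / df["N"]),                # per-capita consumption
        "pg": np.log(df["Pg"] / df["Pcpi"]),             # deflated gasoline price
        "pd": np.log(df["Pd"] / df["Pcpi"]),             # deflated diesel price
        "i":  np.log(df["I"] / (df["N"] * df["Pcpi"])),  # real income per capita
        "c":  np.log(df["C"] / df["N"]),                 # vehicles per capita
    })
    m["qg_1"] = m["qg"].shift(1)                         # lagged dependent
    # Partial adjustment model, equation (1), by OLS
    return smf.ols("qg ~ pg + i + c + qg_1 + pd", data=m.dropna()).fit()
```

In a partial adjustment model the short-term price elasticity is $b_2$ and the long-term elasticity is $b_2/(1-b_5)$; this is how the short- and long-term columns of table 2 relate.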
All data is for the Netherlands and aggregated
yearly. Consistent data is available from 1975
to 2007. The results of my research on price
elasticity are summarised in table 2 below.
For further information regarding these results
and how they were obtained, please refer to
my thesis.
Autometrics in action
As with any process, output is influenced by input. Autometrics is not designed to distinguish between a log-linear and a linear specification. Moreover, the researcher must choose between level and differenced input. By making these choices the researcher directs the program towards a certain class of models. To be able to compare the Autometrics results with the earlier results, Autometrics should have the possibility to construct the partial adjustment models (1) and (2). The PcGive 12 guide,
                     price of gasoline        price of diesel
dependent variable   short term   long term   short term   long term
gasoline             -0.401       -0.954      0.168        0.490
                     (0.091)      (0.193)     (0.057)      (0.105)
diesel               0.247        1.343       -0.263       -1.410
                     (0.192)      (1.653)     (0.085)      (1.132)

Table 2: price elasticities with respect to each fuel price (standard errors in parentheses)
which includes Autometrics, recommends the
use of first differences when the data is integrated of order 1, as is the case here. However, if the program is directed to use the data in such a way that a partial adjustment model does or does not result, depending on the researcher's preference, the automatic model selection program has less selecting to do itself.
Autometrics should find the right model of its
own accord as that is the purpose for which it
was designed.
To keep equations (1) and (2) within reach, without manually skewing Autometrics towards this result, a log-level specification of the data is chosen. Moreover, the data inserted is not corrected for inflation or population growth; both correction variables are inserted in the equation individually, to give Autometrics all possible freedom.
The next decision is the number of lags to include in the GUM. With the small number of observations, more lags increase the challenge for Autometrics and may make it impossible to select a model automatically; including only the lagged dependent would direct Autometrics towards my outcome. To explore this dilemma, consider the gasoline equation. When we include all relevant variables up to a certain lag length, we can see which final equations Autometrics provides. The results of running Autometrics with an increasing number of lags included in the initial dataset are reported in table 3. Run 1 includes only the level variables, run 2 adds the lagged dependent, run 3 includes all variables and their first lags, and every successive run adds one further lag of every variable. An 'a' means the level variable is in the final equation, a 'b' means the first lag is included, and so on. Furthermore, the number of included observations, the number of included (lags of) variables and the residual sum of squares (RSS) of the final equation are presented.
Variable                Run 1     Run 2     Run 3     Run 4      Run 5      Run 6
qg                      –         b         b         b          d          d
pg                      a         a         a         a,b,c      a          a,d,e
pd                      a         a         a         a,b        b          a,c
i                       a         a         a         a,b,c      –          d
c                       a         a         a         a          b          d
n                       a         a         a         a,b        a,d        b,e
constant                a         a         a         a          a          a
trend                   a         a         a         a          a          a
pcpi                    –         –         –         –          –          –
# observations          30        29        29        28         27         26
# of var (incl. lags)   8         9         17        25         33         41
RSS                     0.00672   0.00384   0.00384   0.000629   0.001402   0.00129

Table 3: models estimated by Autometrics starting from different GUMs (a dash means the variable does not appear in the final equation)

Although the number of observations declines as lags are added, the RSS is indicative of the statistical quality of the model. The fourth run has the best RSS, although runs 5 and 6 have more freedom and should therefore be able to statistically outperform run 4, instead of producing an RSS that is twice as large. Moreover, as opposed to earlier runs, the last two runs eliminate the number of vehicles as a relevant variable. This is not only counterintuitive from the point of view of the modeller, who would expect it to be a very important variable; it is also counterintuitive from a statistical point of view, because a decreasing number of observations has a negative influence on the power of the significance tests. An important reason for this 'strange' behaviour of Autometrics could well be the fact that runs 5 and 6 contain more variables than observations, while the earlier runs have more observations than variables. Autometrics is able to handle more variables than observations by creating blocks of variables, which it then treats as one variable. The program searches for blocks that can be deleted as a whole, decreasing the number of variables until the 'normal' situation of more observations than variables is reached once again. Considering that this setting offers more challenges than a low number of observations, it would seem wise to work with more observations than variables.
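The block idea can be sketched as follows. This is a deliberately crude stand-in for the real blocking strategy (the random splitting, the within-block t-tests and the stopping rule are all assumptions of the sketch), meant only to show how a GUM with more variables than observations can be whittled down to an estimable size.

```python
import numpy as np
import statsmodels.api as sm

def block_presearch(y, X, alpha=0.01, seed=0):
    """Toy block search for a GUM with more variables than observations:
    estimate each block separately, keep the variables that look
    significant within their block, reshuffle and repeat until the
    model is comfortably estimable as a whole."""
    rng = np.random.default_rng(seed)
    n = len(y)
    keep = np.arange(X.shape[1])
    while len(keep) >= n // 2:
        rng.shuffle(keep)
        survivors = []
        # split into blocks small enough to estimate (roughly n/2 each)
        for block in np.array_split(keep, max(1, 2 * len(keep) // n)):
            fit = sm.OLS(y, X[:, block]).fit()
            survivors.extend(block[fit.pvalues < alpha])
        if len(survivors) == len(keep):
            break  # no progress; stop rather than loop forever
        keep = np.array(sorted(survivors), dtype=int)
    return keep
```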
Running Autometrics twice, with both gasoline and diesel consumption as dependent variables, yields the two models presented in the appendix as A.1 and A.2. The final equation for the diesel model does not include a lag of the dependent variable and is therefore not an error correction model. Moreover, the Lagrange Multiplier test rejects the null of no first-order autocorrelation in the final gasoline equation.
This indicates that the model chosen by Autometrics is improperly specified. Contrary to the final gasoline model, in which some lag of every inserted variable remains significant, the diesel model contains only 5 of the 9 possible variables. Autometrics therefore 'believes' that the processes determining gasoline and diesel consumption are quite different.

              short term           long term
              gasoline   diesel    gasoline   diesel
Section 2     0.007      0.022     0.042      0.146
Autometrics   0.001      0.003     0.012      0.012

Table 4: RSS of the models
A feature exploited in the first part of my research is the correction for population growth and inflation. The inclusion of this correction is based on economic theory, and Autometrics will not include it unless it is imposed. Since correcting for inflation and population growth is considered valid, helping Autometrics by imposing these corrections could improve its results. The final results are beyond the scope of this article, but the selected models differ significantly from A.1 and A.2.
Comparison of Autometrics results with the
Partial Adjustment Models
The performance of automatic model selection
algorithms can only be determined by comparing their outcomes to manually obtained results. There has been extensive research on
fuel consumption. The model and estimates obtained manually are in line with the mass of earlier research and are therefore reliable enough to compare with the results generated by Autometrics.
Equations (1) and (2) include only the first lag
of the dependent variable, while the models generated by Autometrics also use second order
lags. This causes a difference in the number of
observations between the two approaches. The
performance of the models can only be compared by RSS if the number of observations is
equal. Therefore, we use 28 observations to
estimate both models, with only Autometrics
using the 1975 observation. For all four equations the resulting RSS is presented in table 4.
Note that calculating the RSS for the four models is done with four different dependent variables. This means that the diesel and gasoline
statistics cannot be compared to one another.
The Section 2 gasoline error statistics can, however, be compared to the Autometrics gasoline residuals, because equation (1) can be rewritten as:
$$\log Q^g = \log N + b_1 + b_2 \log \frac{P^g}{P^{cpi}} + b_3 \log \frac{I}{N \cdot P^{cpi}} + b_4 \log \frac{C}{N} + b_5 \log \frac{Q^g_{-1}}{N} + b_6 \log \frac{P^d}{P^{cpi}} + e$$
In this equation the log N on the right-hand side has a coefficient of one. Estimating this equation (with the same dependent variable as the Autometrics model) leads to the same coefficients and, more importantly, to the same residuals as estimating equation (1). The same reasoning applies to the diesel model.
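This equivalence is easy to verify numerically. A minimal sketch with simulated data (all numbers invented): restricting the coefficient on log N to one is the same as moving log N to the left-hand side, so coefficients and residuals coincide exactly.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 30
logN = rng.normal(size=n)                  # log population (simulated)
x = rng.normal(size=(n, 2))                # two corrected regressors
logQ = logN + 0.5 * x[:, 0] - 0.3 * x[:, 1] + rng.normal(scale=0.1, size=n)

X = sm.add_constant(x)
percap = sm.OLS(logQ - logN, X).fit()      # per-capita form, as in (1)
# The 'levels' form with log N restricted to a unit coefficient has,
# by construction, the same coefficients and the same residuals:
resid_levels = logQ - (logN + X @ percap.params)
assert np.allclose(resid_levels, percap.resid)
```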
Obviously the RSS values of the models chosen by Autometrics are superior to those of my own models. This is no surprise, as Autometrics is designed to find the model of best fit. Note, however, that the gasoline equation suffers from an AR-test rejection. The RSS values in table 4 are obtained with the same number of observations for all models.
Another interesting question is the correlation between the residuals of the two strategies when modelling the same consumption series: do they tend to over- or underestimate for the same observations? The correlations between the residual series are 0.11 for gasoline and 0.23 for diesel. Both correlations are very small, and I therefore conclude that the models are indeed very different.
In short, on statistical grounds it may be concluded that the results of conventional modelling and Autometrics differ and that Autometrics
finds a superior fit.
However, the use of the model is more interesting. Most of the time we do not wish to know how the past developed; understanding the past is just a means to understand the future. With this small number of observations I was able to hold back only one observation, and two more became available because I took so much time writing this thesis. Forecasts can therefore be made for 2005-2007, and the forecast errors can be compared between the two methods for the two dependent variables. PcGive reports the Root Mean Square Error (RMSE) when forecasting. The definition of the RMSE is:
$$\mathrm{RMSE} = \left[ \frac{1}{h} \sum_{k=1}^{h} (y_k - f_k)^2 \right]^{1/2}$$
In this case h = 3, because there are three out-of-sample observations; $y_k - f_k$ is the forecast error. These errors are again comparable across the two approaches because of the reasoning given above.
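In code the definition is a one-liner; a small sketch with invented numbers for the three held-back years:

```python
import numpy as np

def rmse(y, f):
    """Root mean square error over the h out-of-sample forecasts."""
    y, f = np.asarray(y, float), np.asarray(f, float)
    return np.sqrt(np.mean((y - f) ** 2))

# Three held-back years (2005-2007); the numbers here are made up.
print(rmse([1.02, 1.05, 1.01], [1.00, 1.07, 1.03]))  # ~0.02
```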
As becomes clear from table 5, Autometrics is less accurate at forecasting than my manually obtained models.
              gasoline   diesel
PAMs          0.027      0.010
Autometrics   0.034      0.109

Table 5: RMSE of the forecasts
Correcting the data for population and inflation does improve the forecasting results of Autometrics: the RMSE is then better than Autometrics' earlier results, but still worse than the manual results presented in table 5.
If both my own model and Autometrics forecast the 2006 gasoline tax revenues based on the data up to 2004, the Autometrics forecast is 80 million further off than that of my own model. This is on top of the fact that the manual estimate itself is off by roughly 220 million. The partial adjustment model performs even better when estimating diesel consumption, while the results of Autometrics in that case are much worse.
Conclusion
The previous section compared the Autometrics
results to my own manually obtained results.
Although it is a single investigation based on a very small sample, the research once again revealed the typical pitfall of data mining. Statistically, Autometrics outperformed my own research; the out-of-sample forecasts, by contrast, showed better results for the 'classical' approach.
This provides further evidence that sound research is not possible without some knowledge
about the topic of interest and some common
sense.
Some remarks in favour of Autometrics have to be made, though. The dataset used in my research is very small, with only 30 observations to start with and 28 observations once lags are accounted for. Moreover, cointegrated VAR modelling, another feature of PcGive that is complementary to Autometrics, has not been tried. Last but not least, I have done little to guide Autometrics with any principles of common sense. There are possibilities to lock certain key variables that should be in the final model, or to insert less 'raw' data. Examples of the researcher helping the software are using first differences and inserting data that guides towards an error correction model.
Final remarks
For more information on my research, please ask for my thesis (robert.schipperhein@gmail.com). If you have become interested in automatic econometric model selection and are in need of a thesis topic, I have quite a number of ideas I was unable to include in my own thesis. Feel free to contact me: much research remains to be done to map the current use of these algorithms, or even to improve them. Serious development of these algorithms is as recent as 1999 to 2007, and applied work that makes use of either PcGets or Autometrics is, to my knowledge, still scarce.
References

Castle, J. (2006). Automatic Econometric Model Selection using PcGets, Aenorm, 52, pp. 43-46. Available online: http://www.aenorm.nl/artikelen/52-castle.pdf

Doornik, J.A. (2007). Autometrics, Department of Economics, University of Oxford. http://www.economics.ox.ac.uk/hendryconference/Papers/Doornik_DFHVol.pdf, retrieved 2008-06-06.

Lovell, M.C. (1983). Data Mining, The Review of Economics and Statistics, 65, pp. 1-12.