Chapter 12: Serial Correlation and
Heteroskedasticity in Time Series Regressions
Econometrics II
Spring 2010
Properties of OLS with Serially Correlated Errors
• Unbiasedness and Consistency
• Efficiency and Inference
Because the Gauss-Markov Theorem requires both homoskedasticity
and serially uncorrelated errors, OLS is no longer BLUE in the presence
of serial correlation. Even more importantly, the usual OLS standard
errors and test statistics are not valid, even asymptotically.
Consider an AR(1) serial correlation model in which the first four Gauss-Markov assumptions hold. Assume

yt = β0 + β1 xt + ut
ut = ρut−1 + et , t = 1, 2, . . . , n,

where |ρ| < 1 and the et are uncorrelated random variables with mean zero and variance σe². For simplicity, we assume that the sample average of the xt is zero (x̄ = 0). Then, the OLS estimator of β1 can be written as

β̂1 = β1 + ( Σ_{t=1}^{n} xt ut ) / SSTx ,

where SSTx = Σ_{t=1}^{n} xt². Now, in computing the variance of β̂1 (conditional on X), we must account for the serial correlation in the ut :
Var(β̂1) = SSTx^{−2} · Var( Σ_{t=1}^{n} xt ut )

        = SSTx^{−2} · [ Σ_{t=1}^{n} xt² Var(ut) + 2 Σ_{t=1}^{n−1} Σ_{j=1}^{n−t} xt xt+j E(ut ut+j) ]

        = σ²/SSTx + 2 (σ²/SSTx²) Σ_{t=1}^{n−1} Σ_{j=1}^{n−t} ρ^j xt xt+j ,        (12.4)

where σ² = Var(ut) and we have used the fact that E(ut ut+j) = Cov(ut , ut+j) = ρ^j σ².
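To see the practical import of (12.4), here is a small Monte Carlo sketch in Python (parameter values are hypothetical, chosen only for illustration). It compares the true sampling variance of β̂1 under positive AR(1) serial correlation with the naive formula σ²/SSTx that ignores the covariance terms:

import numpy as np

rng = np.random.default_rng(0)
n, reps, rho = 100, 5000, 0.8
beta0, beta1 = 1.0, 0.5

estimates = []
for _ in range(reps):
    x, u = np.zeros(n), np.zeros(n)
    ex, eu = rng.normal(size=n), rng.normal(size=n)
    for t in range(1, n):
        x[t] = 0.8 * x[t - 1] + ex[t]   # positively autocorrelated regressor
        u[t] = rho * u[t - 1] + eu[t]   # AR(1) errors
    x = x - x.mean()                    # impose x-bar = 0, as in the text
    y = beta0 + beta1 * x + u
    estimates.append(np.sum(x * (y - y.mean())) / np.sum(x ** 2))

print("Monte Carlo Var(beta1_hat):", np.var(estimates))
# With rho > 0 and positively correlated x's, this exceeds the naive
# sigma^2/SST_x, so the usual OLS standard error is too small.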
• Goodness-of-fit
• Serial Correlation in the Presence of Lagged Dependent Variables
Almost every textbook on econometrics contains some form of the statement “OLS is inconsistent in the presence of lagged dependent variables
and serially correlated errors.” Unfortunately, as a general assertion,
this statement is false.
There is a version of the statement that is correct. To illustrate, suppose that the expected value of yt given yt−1 is linear:
E(yt |yt−1 ) = β0 + β1 yt−1 ,
where we assume stability, |β1 | < 1. Certainly, we can write this equation with an error term as
yt = β0 + β1 yt−1 + ut
(12.6)
where E(ut |yt−1 ) = 0.
We now see how OLS estimation of (12.6) leads to inconsistent estimators when the errors ut follow an AR(1) model:
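As a numerical illustration, here is a minimal simulation sketch (with hypothetical parameter values) in which the OLS estimate of β1 settles well above the true value even in a very large sample, because yt−1 is correlated with ut when ρ > 0:

import numpy as np

rng = np.random.default_rng(1)
n, beta0, beta1, rho = 200_000, 0.0, 0.5, 0.5

u, y = np.zeros(n), np.zeros(n)
e = rng.normal(size=n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + e[t]            # AR(1) errors
    y[t] = beta0 + beta1 * y[t - 1] + u[t]  # lagged dependent variable

# OLS of y_t on y_{t-1} (with intercept)
ylag = y[:-1] - y[:-1].mean()
beta1_hat = np.sum(ylag * (y[1:] - y[1:].mean())) / np.sum(ylag ** 2)
print(beta1_hat)   # noticeably above the true value 0.5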
Testing for Serial Correlation
We now start to discuss several methods of testing for serial correlation in
the error terms. Consider a multiple regression model:
yt = β0 + β1 xt1 + . . . + βk xtk + ut .
• A t test for AR(1) Serial Correlation with Strictly Exogenous Regressors
In the AR(1) model,
ut = ρut−1 + et , t = 2, . . . , n,
the null hypothesis that the errors are serially uncorrelated is
H0 : ρ = 0
However, ρ is never directly observed. We can estimate it by regressing ût on ût−1 , for t = 2, . . . , n, to obtain the estimate ρ̂, and then use the usual t statistic on ρ̂ to test H0 .
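A minimal sketch of this t test in Python (statsmodels assumed available; the function name is illustrative, and the test requires strictly exogenous regressors):

import statsmodels.api as sm

def ar1_t_test(y, X):
    # X should already contain a constant column
    uhat = sm.OLS(y, X).fit().resid
    aux = sm.OLS(uhat[1:], sm.add_constant(uhat[:-1])).fit()
    return aux.params[1], aux.tvalues[1]   # rho_hat and its t statistic

We reject H0 : ρ = 0 when the t statistic on ρ̂ is large in magnitude.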
• The Durbin-Watson Test under Classical Assumptions
Another test for AR(1) serial correlation is the Durbin-Watson test.
The Durbin-Watson (DW) statistic is also based on the OLS residuals:
DW = [ Σ_{t=2}^{n} (ût − ût−1)² ] / [ Σ_{t=1}^{n} ût² ] .

In addition, it can be shown that

DW ≈ 2(1 − ρ̂).
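The statistic is easy to compute from the OLS residuals; here is a one-function sketch (statsmodels also ships a ready-made durbin_watson in statsmodels.stats.stattools):

import numpy as np

def durbin_watson_stat(uhat):
    # sum of squared changes in residuals over sum of squared residuals
    return np.sum(np.diff(uhat) ** 2) / np.sum(uhat ** 2)

Given the approximation above, values of DW near 2 are consistent with ρ̂ ≈ 0, while values well below 2 indicate positive serial correlation.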
DW test versus t test
• Testing for AR(1) Serial Correlation without Strictly Exogenous Regressors
When the explanatory variables are not strictly exogenous, so that one or more xtj are correlated with ut−1 , neither the t test nor the Durbin-Watson test is valid, even in large samples. To overcome this problem, Durbin suggested two alternatives:
1. Durbin’s h test =⇒ This statistic is not always computable.
2. Testing with general regressors:
More generally, we can test for serial correlation in the autoregressive
model of order q:
ut = ρ1 ut−1 + ρ2 ut−2 + . . . + ρq ut−q + et .
The null hypothesis is
H0 : ρ1 = 0, ρ2 = 0, . . . , ρq = 0
(12.21)
• Testing for Higher Order Serial Correlation
The previous test is easily extended to higher orders of serial correlation.
Example [AR(2)]
Procedure for testing AR(q) serial correlation:
• The Breusch-Godfrey test
An alternative to computing the F test is to use the Lagrange multiplier (LM) form of the statistic. The LM statistic for testing (12.21) is

LM = (n − q) R²û ~ χ²q under H0 ,

where R²û is the R-squared from the regression of

ût on xt1 , xt2 , . . . , xtk , ût−1 , ût−2 , . . . , ût−q , for all t = (q + 1), . . . , n.
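A hand-rolled sketch of this LM computation (statsmodels assumed available; its diagnostic module also offers acorr_breusch_godfrey as a packaged alternative):

import numpy as np
import statsmodels.api as sm

def breusch_godfrey_lm(y, X, q):
    # X includes a constant column; returns the LM statistic for (12.21)
    uhat = sm.OLS(y, X).fit().resid
    n = len(uhat)
    # q lags of the residuals, aligned with t = q+1, ..., n
    lags = np.column_stack([uhat[q - j:n - j] for j in range(1, q + 1)])
    aux = sm.OLS(uhat[q:], np.column_stack([X[q:], lags])).fit()
    return (n - q) * aux.rsquared   # compare with a chi2(q) critical value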
• Testing for Seasonality
For example, with quarterly data, we might postulate the autoregressive model
ut = ρ4 ut−4 + et .
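The corresponding test mirrors the AR(1) t test; a quick sketch (statsmodels assumed, function name illustrative):

import statsmodels.api as sm

def seasonal_ar4_test(uhat):
    # regress u_hat_t on u_hat_{t-4} and test the coefficient rho_4
    aux = sm.OLS(uhat[4:], sm.add_constant(uhat[:-4])).fit()
    return aux.params[1], aux.tvalues[1]   # rho4_hat and its t statistic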
Correcting for Serial Correlation with Strictly
Exogenous Regressors
• Obtaining the BLUE in the AR(1) model
Consider a simple regression model whose errors follow the AR(1) process:

yt = β0 + β1 xt + ut , t = 1, 2, . . . , n.

For t ≥ 2, we write

yt−1 = β0 + β1 xt−1 + ut−1 ,
yt = β0 + β1 xt + ut .

Now, if we multiply the first equation by ρ and subtract it from the second, we get

yt − ρyt−1 = (1 − ρ)β0 + β1 (xt − ρxt−1) + et , t ≥ 2,

where we have used the fact that et = ut − ρut−1 . We can write this as

ỹt = (1 − ρ)β0 + β1 x̃t + et , t ≥ 2,

where ỹt = yt − ρyt−1 and x̃t = xt − ρxt−1 are called the quasi-differenced data. (If ρ = 1, these are differenced data; here we are assuming |ρ| < 1.)
Be cautious about the equation for t = 1: quasi-differencing is not defined there, so to avoid losing the first observation we scale it instead,

(1 − ρ²)^{1/2} y1 = (1 − ρ²)^{1/2} β0 + β1 (1 − ρ²)^{1/2} x1 + (1 − ρ²)^{1/2} u1 ,

where the rescaled error (1 − ρ²)^{1/2} u1 has the same variance σe² as the et .
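A minimal sketch of the full transformation for known ρ (variable and function names are illustrative):

import numpy as np

def quasi_difference(z, rho):
    # Prais-Winsten scaling at t = 1, quasi-differencing for t >= 2
    zt = np.empty(len(z))
    zt[0] = np.sqrt(1.0 - rho ** 2) * z[0]
    zt[1:] = z[1:] - rho * z[:-1]
    return zt

# Apply to y and to each x; the intercept column becomes (1 - rho) for
# t >= 2 and (1 - rho^2)^(1/2) at t = 1.

OLS on the transformed data is the BLUE when ρ is known.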
• Feasible GLS Estimation with AR(1) Errors
Although ρ is rarely known, we already know how to obtain a consistent estimator of it: we simply regress ût on ût−1 to obtain the estimate ρ̂. Next, we use ρ̂ in place of ρ to form the quasi-differenced variables. We then use OLS on the equation

ỹt = β0 x̃t0 + β1 x̃t1 + . . . + βk x̃tk + errort ,

where x̃t0 = (1 − ρ̂) for t ≥ 2, and x̃10 = (1 − ρ̂²)^{1/2}. This results in the feasible GLS (FGLS) estimator of the βj .
Remarks:
There are several names for FGLS estimation of the AR(1) model that
come from different methods of estimating ρ and different treatment of
the first observation. For example,
1. Cochrane-Orcutt (CO) estimation
2. Prais-Winsten (PW) estimation
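A hedged sketch using statsmodels' GLSAR (assumed available), whose iterative fit is in the Cochrane-Orcutt spirit: it alternates between estimating ρ from the residuals and re-running the quasi-differenced regression:

import statsmodels.api as sm

def fgls_ar1(y, X, maxiter=20):
    model = sm.GLSAR(y, X, rho=1)   # rho=1: one AR coefficient to estimate
    results = model.iterative_fit(maxiter=maxiter)
    return results.params, model.rho   # FGLS beta_hat and the final rho_hat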
• Comparing OLS and FGLS
Consider the regression model
yt = β0 + β1 xt + ut
where the time series processes are stationary. Assuming a law of large numbers applies, consistency of OLS for β1 holds if
Cov(xt , ut ) = 0.
Consistency of FGLS estimators
Why do OLS and FGLS differ?
• Correcting for Higher Order Serial Correlation
Here, we illustrate the approach for AR(2) serial correlation:
ut = ρ1 ut−1 + ρ2 ut−2 + et ,

where et is i.i.d. with mean zero and variance σe². The stability conditions are more complicated now. They can be shown to be

ρ2 > −1, ρ2 − ρ1 < 1, and ρ1 + ρ2 < 1.
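As a quick check, these inequalities are equivalent to both roots of 1 − ρ1 z − ρ2 z² lying outside the unit circle; a small sketch (function name illustrative):

import numpy as np

def ar2_is_stable(rho1, rho2):
    ineq = (rho2 > -1) and (rho2 - rho1 < 1) and (rho1 + rho2 < 1)
    roots = np.roots([-rho2, -rho1, 1.0])  # roots of 1 - rho1*z - rho2*z^2
    return ineq, bool(np.all(np.abs(roots) > 1))  # the two criteria agree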
Differencing and Serial Correlation
Differencing highly persistent data offers some advantages for estimation. Suppose we have a simple regression model
yt = β0 + β1 xt + ut , t = 1, 2, . . . ,
where ut follows the AR(1) process.
Serial Correlation-Robust Inference after OLS
Recall equation (12.4), which gives the variance of the OLS slope estimator in a simple regression model with AR(1) errors. We can estimate this variance simply by plugging in our standard estimators of ρ and σ². We now relax the assumption that the errors follow an AR(1) process and are homoskedastic.
Consider the standard multiple linear regression model
yt = β0 + β1 xt1 + . . . + βk xtk + ut , t = 1, 2, . . . , n,
(12.39)
which we have estimated by OLS. We are interested in obtaining a serial
correlation-robust standard error for β̂1 . Write xt1 as a linear function of the
remaining independent variables and an error term,
xt1 = δ0 + δ2 xt2 + . . . + δk xtk + rt
where the error rt has zero mean and is uncorrelated with xt2 , xt3 , . . . , xtk .
Then it can be shown that the asymptotic variance of β̂1 is

Avar(β̂1) = Var( Σ_{t=1}^{n} rt ut ) / [ Σ_{t=1}^{n} E(rt²) ]² .
Wooldridge (1989) shows that Avar(β̂1) can be estimated as follows. Let se(β̂1) denote the usual (but incorrect) OLS standard error and let σ̂ be the usual standard error of the regression (or root mean squared error) from estimating (12.39) by OLS. Let r̂t denote the residuals from the auxiliary regression of xt1 on 1, xt2 , xt3 , . . . , xtk . For a chosen integer g > 0, define
ν̂ = Σ_{t=1}^{n} ât² + 2 Σ_{h=1}^{g} [1 − h/(g + 1)] ( Σ_{t=h+1}^{n} ât ât−h ) ,        (12.43)

where ât = r̂t ût , t = 1, 2, . . . , n.
Once we have ν̂, the serial correlation-robust standard error of β̂1 is simply

se(β̂1) = [ se(β̂1) / σ̂ ]² √ν̂ ,

where se(β̂1) on the right-hand side is the usual (incorrect) OLS standard error.
This standard error is also robust to arbitrary heteroskedasticity. In the time series literature, serial correlation-robust standard errors are sometimes called heteroskedasticity and autocorrelation consistent, or HAC, standard errors.
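A direct numpy sketch of the computation above (g is the chosen truncation lag; function name illustrative; a statsmodels OLS fit with cov_type='HAC' produces comparable Newey-West standard errors):

import numpy as np
import statsmodels.api as sm

def sc_robust_se(y, X, j, g):
    # serial correlation-robust se for the coefficient on column j of X
    # (j should index a non-constant regressor)
    fit = sm.OLS(y, X).fit()
    uhat, sigma_hat = fit.resid, np.sqrt(fit.mse_resid)
    others = np.delete(X, j, axis=1)            # remaining regressors incl. constant
    rhat = sm.OLS(X[:, j], others).fit().resid  # auxiliary residuals r_hat_t
    a = rhat * uhat                             # a_hat_t = r_hat_t * u_hat_t
    nu = np.sum(a ** 2)
    for h in range(1, g + 1):                   # Bartlett weights 1 - h/(g+1)
        nu += 2.0 * (1.0 - h / (g + 1)) * np.sum(a[h:] * a[:-h])
    return (fit.bse[j] / sigma_hat) ** 2 * np.sqrt(nu)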
Notes for the serial correlation-robust standard error:
Heteroskedasticity in Time Series Regressions
Because the usual OLS statistics are asymptotically valid under Assumptions TS.1′ through TS.5′, we are interested in what happens when the homoskedasticity assumption, TS.4′, does not hold.
• Heteroskedasticity-Robust Statistics
• Testing for Heteroskedasticity
Sometimes, we wish to test for heteroskedasticity in time series regressions, especially if we are concerned about the performance of heteroskedasticity-robust statistics in relatively small samples. The tests proposed in Chapter 8 can be applied directly. However, these tests should be performed with caution:
(1)
(2)
If heteroskedasticity is found in the ut (and the ut are not serially
correlated), then the heteroskedasticity-robust test statistics can be
used. An alternative is to use weighted least squares, as for the
cross-sectional case.
• Autoregressive Conditional Heteroskedasticity (ARCH)
Consider a simple static regression model:
yt = β0 + β1 zt + ut,
and assume that the Gauss-Markov assumptions hold. This means that
the OLS estimators are BLUE. The homoskedasticity assumption says
that Var(ut |Z) is constant, where Z denotes all n outcomes of zt .
Even if the variance of ut given Z is constant, there are other ways that
heteroskedasticity can arise. Engle (1982) suggested looking at the conditional variance of ut given past errors. He proposed a model known
as the autoregressive conditional heteroskedasticity (ARCH)
model.
The first-order ARCH model is

E(ut² | ut−1 , ut−2 , . . .) = E(ut² | ut−1) = α0 + α1 ut−1² ,

where we leave the conditioning on Z implicit. This equation represents the conditional variance of ut given past ut only if E(ut | ut−1 , ut−2 , . . .) = 0, which means that the errors are serially uncorrelated.
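A sketch of Engle's LM test for first-order ARCH effects (statsmodels assumed; statsmodels.stats.diagnostic.het_arch offers a packaged version): regress the squared OLS residuals on their own lag and compare the resulting nR² with a χ²(1) critical value.

import statsmodels.api as sm

def arch1_lm_test(uhat):
    u2 = uhat ** 2
    aux = sm.OLS(u2[1:], sm.add_constant(u2[:-1])).fit()
    lm = (len(u2) - 1) * aux.rsquared   # compare with chi2(1)
    return lm, aux.params[1]            # LM statistic and alpha1_hat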
Why should we care about ARCH forms of heteroskedasticity?
ARCH models also apply when there are dynamics in the conditional
mean. Suppose we have the dependent variable, yt , a contemporaneous
exogenous variable, zt , and
E(yt |zt , yt−1 , zt−1 , yt−2 , . . .) = β0 + β1 zt + β2 yt−1 + β3 zt−1 ,
so that at most one lag of y and z appears in the dynamic regression.
• Heteroskedasticity and Serial Correlation in Regression Models
It is possible to have both heteroskedasticity and serial correlation
present in a regression model. We can model them and make a correction through a combined weighted least squares AR(1) procedure.
Specifically, consider the model
yt = β0 + β1 xt1 + . . . + βk xtk + ut
ut = √ht νt
νt = ρνt−1 + et , |ρ| < 1,        (12.52)

where the explanatory variables X are independent of et for all t, and ht is a function of the xtj . The process et has zero mean and constant variance σe² and is serially uncorrelated. Therefore, νt satisfies a stable AR(1) process. Suppressing the conditioning on the explanatory
variables, we have
Var(ut) = σν² ht ,

where σν² = σe²/(1 − ρ²). But νt = ut /√ht is homoskedastic and follows a stable AR(1) model. Therefore, the transformed equation

yt /√ht = β0 (1/√ht) + β1 (xt1 /√ht) + . . . + βk (xtk /√ht) + νt

has AR(1) errors. Now, if we have a particular form of heteroskedasticity in mind, so that ht is known, we can estimate (12.52) using standard CO or PW methods.
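A minimal sketch of the combined procedure for an assumed variance function ht (names illustrative; statsmodels' GLSAR, assumed available, supplies the CO-style AR(1) step on the weighted equation):

import numpy as np
import statsmodels.api as sm

def wls_ar1(y, X, h, maxiter=20):
    # h: assumed values of h_t > 0, same length as y
    w = 1.0 / np.sqrt(h)
    y_w = y * w
    X_w = X * w[:, None]                # also rescales the constant column
    model = sm.GLSAR(y_w, X_w, rho=1)   # AR(1) errors in the weighted model
    return model.iterative_fit(maxiter=maxiter)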