Time Series and Forecasting
Lecture 3
Forecast Intervals, Multi-Step Forecasting
Bruce E. Hansen
Summer School in Economics and Econometrics
University of Crete
July 23-27, 2012
Bruce Hansen (University of Wisconsin), Forecasting, July 23-27, 2012
Today’s Schedule
Review
Forecast Intervals
Forecast Distributions
Multi-Step Direct Forecasts
Fan Charts
Iterated Forecasts
Review
Optimal point forecast of y_{n+1} given information I_n is the conditional mean E(y_{n+1} | I_n)
Estimate linear approximations by least-squares
Combine point forecasts to reduce MSFE
Select estimators and combination weights by cross-validation
Estimate GARCH models for conditional variance
Interval Forecasts
Take the form [a, b]
Should contain y_{n+1} with probability 1 - 2α:
1 - 2α = P_n(y_{n+1} ∈ [a, b]) = P_n(y_{n+1} ≤ b) - P_n(y_{n+1} ≤ a) = F_n(b) - F_n(a)
where F_n(y) is the forecast distribution
It follows that
a = q_n(α)
b = q_n(1 - α)
so a is the α'th and b the (1 - α)'th quantile of the conditional distribution
Interval Forecasts are Conditional Quantiles
The ideal 80% forecast interval is the 10% and 90% quantiles of the conditional distribution of y_{n+1} given I_n
Our feasible forecast intervals are estimates of the 10% and 90% quantiles of the conditional distribution of y_{n+1} given I_n
The goal is to estimate conditional quantiles.
Mean-Variance Model
Write
y_{t+1} = μ_t + σ_t ε_{t+1}
μ_t = E(y_{t+1} | I_t)
σ²_t = var(y_{t+1} | I_t)
Assume that ε_{t+1} is independent of I_t.
Let q_t(α) and q^ε(α) be the α'th quantiles of y_{t+1} and ε_{t+1}. Then
q_t(α) = μ_t + σ_t q^ε(α)
Thus a (1 - 2α) forecast interval for y_{n+1} is
[μ_n + σ_n q^ε(α), μ_n + σ_n q^ε(1 - α)]
Mean-Variance Model
Given the conditional mean μ_n and variance σ²_n, the conditional quantile of y_{n+1} is a linear function μ_n + σ_n q^ε(α) of the conditional quantile q^ε(α) of the normalized error
ε_{n+1} = e_{n+1} / σ_n
Interval forecasts thus can be summarized by μ_n, σ²_n, and q^ε(α)
Normal Error Quantile Forecasts
Make the approximation ε_{t+1} ~ N(0, 1)
Then q^ε(α) = Z(α), the normal quantiles
- Useful simplification, especially in small samples
- 0.10, 0.25, 0.75, 0.90 quantiles are -1.285, -0.675, 0.675, 1.285
Forecast intervals:
[μ̂_n + σ̂_n Z(α), μ̂_n + σ̂_n Z(1 - α)]
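The normal-quantile interval can be sketched in a few lines (Python here for illustration; the lecture's own computations use R). The μ̂_n, σ̂_n values below are hypothetical stand-ins:

```python
from statistics import NormalDist

def normal_interval(mu_hat, sigma_hat, alpha):
    """(1 - 2*alpha) forecast interval under the N(0,1) error approximation."""
    z = NormalDist().inv_cdf(alpha)          # Z(alpha); about -1.28 for alpha = 0.10
    # Z(1 - alpha) = -Z(alpha) by symmetry of the normal distribution
    return (mu_hat + sigma_hat * z, mu_hat - sigma_hat * z)

lo, hi = normal_interval(1.96, 0.23, 0.10)   # hypothetical mu_hat, sigma_hat
```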
Nonparametric Error Quantile Forecasts
Let ε_{t+1} ~ F with F unknown
We can estimate q^ε(α) as the empirical quantiles of the residuals
- Set ε̂_{t+1} = ê_{t+1} / σ̂_t
- Sort ε̂_1, ..., ε̂_n
- q̂^ε(α) and q̂^ε(1 - α) are the α'th and (1 - α)'th percentiles
Forecast interval:
[μ̂_n + σ̂_n q̂^ε(α), μ̂_n + σ̂_n q̂^ε(1 - α)]
Computationally simple
Reasonably accurate when n ≥ 100
Allows asymmetric and fat-tailed error distributions
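The empirical-quantile interval is a one-liner with `numpy` (Python sketch; the residuals and moment estimates here are simulated stand-ins, not the lecture's data):

```python
import numpy as np

def nonparametric_interval(mu_n, sigma_n, eps_hat, alpha):
    """(1 - 2*alpha) interval from empirical quantiles of normalized residuals."""
    q_lo, q_hi = np.quantile(eps_hat, [alpha, 1.0 - alpha])
    return mu_n + sigma_n * q_lo, mu_n + sigma_n * q_hi

rng = np.random.default_rng(0)
eps_hat = rng.standard_t(df=5, size=600)      # fat-tailed stand-in residuals
lo, hi = nonparametric_interval(1.96, 0.23, eps_hat, 0.10)
```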
Constant Variance Case
If σ̂_t = σ̂ is a constant, there is no advantage to estimating σ̂ for the forecast interval
Let q̂^e(α) and q̂^e(1 - α) be the α'th and (1 - α)'th percentiles of the original residuals ê_{t+1}
Forecast interval:
[μ̂_n + q̂^e(α), μ̂_n + q̂^e(1 - α)]
When the estimated variance is a constant, this is numerically identical to the definition with rescaled errors ε̂_{t+1}
Computation in R
quantreg package
- may need to be installed
- library(quantreg)
- rq command
If e is the vector of (normalized) residuals and a is the quantile to be evaluated:
- rq(e~1,a)
- q=coef(rq(e~1,a))
- Quantile regression of e on an intercept
Example: Interest Rate Forecast
n = 603 observations
ε̂_{t+1} = ê_{t+1} / σ̂_t from GARCH(1,1) model
0.10, 0.25, 0.75, 0.90 quantiles: -1.16, -0.59, 0.62, 1.26
Point forecast = 1.96
50% forecast interval = [1.82, 2.10]
80% forecast interval = [1.69, 2.25]
Example: GDP
n = 207 observations
ε̂_{t+1} = ê_{t+1} / σ̂_t from GARCH(1,1) model
0.10, 0.25, 0.75, 0.90 quantiles: -1.18, -0.63, 0.57, 1.26
Point forecast = 1.31
50% forecast interval = [0.04, 2.4]
80% forecast interval = [-1.07, 3.8]
Mean-Variance Model Interval Forecasts - Summary
The key is to break the distribution into the mean μ_t, the variance σ²_t, and the normalized error ε_{t+1}:
y_{t+1} = μ_t + σ_t ε_{t+1}
Then the distribution of y_{n+1} is determined by μ_n, σ²_n, and the distribution of ε_{n+1}
Each of these three components can be separately approximated and estimated
Typically, we put the most work into modeling (estimating) the mean μ_t
- The remainder is modeled more simply
- For macro forecasts, this reflects a belief (assumption?) that most of the predictability is in the mean, not the higher features.
Alternative Approach: Quantile Regression
Recall, the ideal 1 - 2α interval is [q_n(α), q_n(1 - α)]
q_n(α) is the α'th quantile of the one-step conditional distribution
F_n(y) = P(y_{n+1} ≤ y | I_n)
Equivalently, let's directly model the conditional quantile function
Quantile Regression Function
The conditional distribution is
P(y_{n+1} ≤ y | I_n) ≈ P(y_{n+1} ≤ y | x_n)
The conditional quantile function q_α(x) solves
P(y_{n+1} ≤ q_α(x) | x_n = x) = α
q_.5(x) is the conditional median
q_.1(x) is the 10% quantile function
q_.9(x) is the 90% quantile function
Quantile Regression Functions
For each α, q_α(x) is an arbitrary function of x
For each x, q_α(x) is monotonically increasing in α
Quantiles are well defined even when moments are infinite
When distributions are discrete, quantiles may be intervals – we ignore this
We approximate the functions q_α(x) as linear:
q_α(x) ≈ x'β_α
(after possible transformations of x)
The coefficient vector β_α depends on α
Linear Quantile Regression Functions
q_α(x) = x'β_α
If only the intercept depends on α,
q_α(x) ≈ μ_α + x'β
then the quantile regression lines are parallel
- This is the case when the error e_{t+1} in a linear model is independent of the regressors
- Strong conditional homoskedasticity
In general, the coefficients are functions of α
- Similar to conditional heteroskedasticity
Interval Forecasts
An ideal 1 - 2α forecast interval is
[x_n'β_α, x_n'β_{1-α}]
Note that the ideal point forecast is x_n'β, where β is the best linear predictor
An alternative point forecast is the conditional median x_n'β_{0.5}
- This has the property of being the best linear predictor in L1 (mean absolute error)
All are linear functions of x_n, just different functions
A feasible forecast interval is
[x_n'β̂_α, x_n'β̂_{1-α}]
where β̂_α and β̂_{1-α} are estimates of β_α and β_{1-α}
Check Function
Recall that the mean μ = EY minimizes the L2 risk E(Y - m)²
Similarly, the median q_{0.5} minimizes the L1 risk E|Y - m|
The α'th quantile q_α minimizes the "check function" risk
E ρ_α(Y - m)
where
ρ_α(u) = u(α - 1)  if u < 0
       = uα        if u ≥ 0
       = u(α - 1(u < 0))
This is a tilted absolute value function
To see the equivalence, evaluate the first-order condition for minimization
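To make the "tilted absolute value" concrete, here is a small numeric check (Python, simulated data): minimizing the sample check-function risk over a grid recovers the sample quantile.

```python
import numpy as np

def rho(u, alpha):
    """Check function rho_alpha(u) = u * (alpha - 1(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * (alpha - (u < 0))

y = np.random.default_rng(1).normal(size=2000)
grid = np.linspace(-3.0, 3.0, 601)
risk = np.array([rho(y - m, 0.25).mean() for m in grid])
m_star = grid[int(np.argmin(risk))]           # minimizer of the check-function risk
```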
Extremum Representation
q_α(x) solves
q_α(x) = argmin_m E(ρ_α(y_{t+1} - m) | x_t = x)
Sample criterion
S_α(β) = (1/n) Σ_{t=0}^{n-1} ρ_α(y_{t+1} - x_t'β)
Quantile regression estimator
β̂_α = argmin_β S_α(β)
Computation by linear programming
- Stata
- R
- Matlab
Computation in R
quantreg package
- may need to be installed
- library(quantreg)
For quantile regression of y on x at the a'th quantile:
- do not include an intercept in x; it will be automatically included
- rq(y~x,a)
For coefficients:
- b=coef(rq(y~x,a))
Distribution Theory
The asymptotic theory for the dependent-data case is not well developed
The theory for the cross-section (iid) case is Angrist, Chernozhukov and Fernandez-Val (Econometrica, 2006)
Their theory allows for quantile regression viewed as a best linear approximation:
√n (β̂_α - β_α) →_d N(0, V_α)
V_α = J_α^{-1} Σ_α J_α^{-1}
J_α = E[f_y(x_t'β_α | x_t) x_t x_t']
Σ_α = E[x_t x_t' u_t²]
u_t = 1(y_{t+1} < x_t'β_α) - α
Under correct specification, Σ_α = α(1 - α) E(x_t x_t')
I suspect that this theorem extends to dependent data if the score is uncorrelated (dynamics are well specified)
Standard Errors
The asymptotic variance depends on the conditional density function
- Nonparametric estimation!
To avoid this, most researchers use bootstrap methods
For dependent data, this has not been explored
Recommend: use current software, but be cautious!
Crossing Problem and Solution
The conditional quantile functions q_α(x) are monotonically increasing in α
But the linear quantile regression approximations q_α(x) ≈ x'β_α cannot be globally monotonic in α, unless all lines are parallel
The regression approximations may cross!
The estimates q̂_α(x) = x'β̂_α may cross!
If this happens, forecast intervals may be inverted:
- A 90% interval may not nest an 80% interval
Simple solution: reordering
- If q̂_{α1}(x) > q̂_{α2}(x) when α1 < α2 < 1/2, simply set q̂_{α1}(x) = q̂_{α2}(x), and conversely for quantiles above 1/2
- Take the wider interval
- Then the endpoints of the two intervals will be the same
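The reordering step can be sketched as follows (Python; the crossed quantile estimates are hypothetical). Below 1/2 a crossed estimate is replaced by the smaller value, above 1/2 by the larger, so both intervals share the endpoint:

```python
import numpy as np

def reorder(alphas, q_hat):
    """Fix crossed quantile estimates by always taking the wider interval."""
    alphas = np.asarray(alphas, dtype=float)
    q = np.asarray(q_hat, dtype=float).copy()
    below = alphas < 0.5
    # below the median: running minimum moving down from the median
    q[below] = np.minimum.accumulate(q[below][::-1])[::-1]
    # above the median: running maximum moving up from the median
    q[~below] = np.maximum.accumulate(q[~below])
    return q

# crossed estimates at alpha = .10, .25, .75, .90 get flattened
q_fix = reorder([0.10, 0.25, 0.75, 0.90], [1.7, 1.6, 2.2, 2.1])
```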
Model Selection and Combination
To my knowledge, there is no theory of model selection for median regression or quantile regression, even in the iid context
A natural conjecture is to use cross-validation on the sample check function
- But no current theory justifies this choice
My recommendation for model selection (or combination):
- Select the model for the conditional mean by cross-validation
- Use the same variables for all quantiles
- Select the weights by cross-validation on the conditional mean
- For each quantile, estimate the models with positive weights
- Take the weighted combination using the same weights.
Example: Interest Rates
AR(2) specification (selected for the regression by CV)
y_{t+1} = β0 + β1 y_t + β2 y_{t-1} + e_{t+1}

      α = 0.10   α = 0.25   α = 0.75   α = 0.90
β0    0.31       0.14       0.15       0.29
β1    0.46       0.31       0.35       0.34
β2    0.22       0.17       0.21       0.25

Forecast 10% quantile:
q_{0.1}(x_n) = 0.31 + 0.46 y_n + 0.22 y_{n-1}
50% forecast interval = [1.84, 2.12]
80% forecast interval = [1.65, 2.25]
Very close to those from the mean-variance estimates
Example: GDP
Leading indicator model
y_{t+1} = β0 + β1 y_t + β2 Spread_t + β3 HighYield_t + β4 Starts_t + β5 Permits_t + e_{t+1}

      α = 0.10   α = 0.25   α = 0.75   α = 0.90
β0    2.72       0.14       0.10       2.0
β1    0.28       0.14       0.33       0.28
β2    1.17       0.75       0.31       0.14
β3    2.12       1.83       0.62       0.37
β4    2.20       0.44       6.68       11.4
β5    3.45       1.61       4.87       9.53

50% forecast interval = [0.1, 3.2]
80% forecast interval = [-1.8, 4.0]
Distribution Forecasts
The conditional distribution is
F_t(y) = P(y_{t+1} ≤ y | I_t)
It is not common to directly report F_t(y)
- or the one-step forecast distribution F_n(y)
However, F_t(y) may be used as an input
For example, simulation
We thus may want an estimate F̂_t(y) of F_t(y)
Mean-Variance Model Distribution Forecasts
Model
y_{t+1} = μ_t + σ_t ε_{t+1}
with ε_{t+1} independent of I_t.
Let ε_{t+1} have distribution F^ε(u) = P(ε_{t+1} ≤ u).
The conditional distribution of y_{t+1} is
F_t(y) = F^ε((y - μ_t) / σ_t)
Estimation:
F̂_t(y) = F̂^ε((y - μ̂_t) / σ̂_t)
where F̂^ε(u) is an estimate of F^ε(u) = P(ε_{t+1} ≤ u).
Normal Error Model
Under the assumption ε_{t+1} ~ N(0, 1), F^ε(u) = Φ(u), the normal CDF
F̂_t(y) = Φ((y - μ̂_t) / σ̂_t)
To simulate from F̂_t(y):
- Calculate μ̂_t and σ̂_t
- Draw ε_{t+1} iid from N(0, 1)
- y_{t+1} = μ̂_t + σ̂_t ε_{t+1}
The normal assumption can be used when the sample size n is very small
But then F̂_t(y) contains no information beyond μ̂_t and σ̂_t
Nonparametric Error Model
Let F̂^ε_n be the empirical distribution function (EDF) of the normalized residuals ε̂_{t+1}
The EDF puts probability mass 1/n at each point {ε̂_1, ..., ε̂_n}:
F̂^ε_n(u) = n^{-1} Σ_{t=0}^{n-1} 1(ε̂_{t+1} ≤ u)
Then
F̂_t(y) = F̂^ε_n((y - μ̂_t) / σ̂_t)
       = n^{-1} Σ_{j=0}^{n-1} 1(ε̂_{j+1} ≤ (y - μ̂_t) / σ̂_t)
       = n^{-1} Σ_{j=0}^{n-1} 1(y ≥ μ̂_t + σ̂_t ε̂_{j+1})
Notice the summation over j, holding μ̂_t, σ̂_t fixed
Simulate Estimated Conditional Distribution
To simulate:
- Calculate μ̂_t and σ̂_t
- Draw ε_{t+1} iid from the normalized residuals {ε̂_1, ..., ε̂_n}
- y_{t+1} = μ̂_t + σ̂_t ε_{t+1}
- y_{t+1} is a draw from F̂_t(y)
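A minimal simulation sketch (Python; the residuals and moment estimates here are simulated stand-ins, not the lecture's data):

```python
import numpy as np

def simulate_from_fhat(mu_hat, sigma_hat, eps_hat, reps, seed=0):
    """Draw reps observations from F_hat_t(y) by resampling normalized residuals."""
    rng = np.random.default_rng(seed)
    draws = rng.choice(eps_hat, size=reps, replace=True)
    return mu_hat + sigma_hat * draws

eps_hat = np.random.default_rng(2).normal(size=500)   # stand-in residuals
sims = simulate_from_fhat(1.96, 0.23, eps_hat, reps=10000)
```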
Plot Estimated Conditional Distribution
F̂_n(y) = n^{-1} Σ_{t=0}^{n-1} 1(y ≥ μ̂_n + σ̂_n ε̂_{t+1})
A step function, with steps of height 1/n at μ̂_n + σ̂_n ε̂_{t+1}
Calculation:
- Calculate μ̂_n, σ̂_n, and y_{t+1} = μ̂_n + σ̂_n ε̂_{t+1}, t = 0, ..., n-1
- Sort y_{t+1} into order statistics y_(j)
- Equivalently, sort ε̂_{t+1} into order statistics ε̂_(j) and set y_(j) = μ̂_n + σ̂_n ε̂_(j)
- Plot {1/n, 2/n, 3/n, ..., 1} on the y-axis against y_(1), y_(2), ..., y_(n) on the x-axis
Examples:
Interest Rate
GDP
Figure: 10-Year Bond Rate Forecast Distribution

Figure: GDP Forecast Distribution
Quantile Regression Approach
The distribution function may also be recovered from the estimated quantile functions.
F_n(q_α(x_n)) = α
F̂_n(q̂_α(x_n)) = α, where q̂_α(x_n) = x_n'β̂_α
Compute q̂_α(x_n) = x_n'β̂_α for a set of quantiles {α_1, ..., α_J}
Plot α_j on the y-axis against q̂_{α_j}(x_n) on the x-axis
- The plot is F̂_n(y) at y = q̂_{α_j}(x_n)
If the quantile lines cross, then the plot will be non-monotonic
The reordering method flattens the estimated distribution at these points
Multi-Step Forecasts
Forecast horizon: h
We say the forecast is "multi-step" if h > 1
Forecasting y_{n+h} given I_n
e.g., forecasting GDP growth for 2012:3, 2012:4, 2013:1, 2013:2
The forecast distribution is y_{n+h} | I_n ~ F_h(y_{n+h} | I_n)
Point Forecast
f_{n+h|n} minimizes expected squared loss:
f_{n+h|n} = argmin_f E((y_{n+h} - f)² | I_n) = E(y_{n+h} | I_n)
Optimal point forecasts are h-step conditional means
Relationship Between Forecast Horizons
Take an AR(1) model
y_{t+1} = α y_t + u_{t+1}
Iterate:
y_{t+2} = α(α y_t + u_{t+1}) + u_{t+2} = α² y_t + α u_{t+1} + u_{t+2}
or
y_{t+2} = α² y_t + e_{t+2},  where e_{t+2} = α u_{t+1} + u_{t+2}
Repeat h times:
y_{t+h} = α^h y_t + e_{t+h}
e_{t+h} = u_{t+h} + α u_{t+h-1} + α² u_{t+h-2} + ... + α^{h-1} u_{t+1}
AR(1)
h-step forecast:
y_{t+h} = α^h y_t + e_{t+h}
e_{t+h} = u_{t+h} + α u_{t+h-1} + α² u_{t+h-2} + ... + α^{h-1} u_{t+1}
E(y_{n+h} | I_n) = α^h y_n
The h-step point forecast is linear in y_n
The h-step forecast error e_{n+h} is an MA(h-1)
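A quick check (Python; the coefficient and initial value are hypothetical) that iterating the one-step AR(1) mean map h times matches the closed form α^h y_n:

```python
def ar1_forecast(alpha, y_n, h):
    """h-step AR(1) point forecast: E(y_{n+h} | I_n) = alpha**h * y_n."""
    return alpha ** h * y_n

def ar1_iterated(alpha, y_n, h):
    """Apply the one-step mean map h times."""
    f = y_n
    for _ in range(h):
        f = alpha * f
    return f
```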
AR(2) Model
1-step AR(2) model:
y_{t+1} = α0 + α1 y_t + α2 y_{t-1} + u_{t+1}
2 steps ahead:
y_{t+2} = α0 + α1 y_{t+1} + α2 y_t + u_{t+2}
Taking conditional expectations:
E(y_{t+2} | I_t) = α0 + α1 E(y_{t+1} | I_t) + α2 y_t
              = α0 + α1 (α0 + α1 y_t + α2 y_{t-1}) + α2 y_t
              = α0 + α1 α0 + (α1² + α2) y_t + α1 α2 y_{t-1}
which is linear in (y_t, y_{t-1})
In general, a 1-step linear model implies an h-step approximate linear model in the same variables
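The two-step algebra can be verified directly (Python; the coefficients and lagged values are hypothetical): iterating the one-step mean twice agrees with the direct formula.

```python
def ar2_direct_two_step(a0, a1, a2, y_t, y_tm1):
    """E(y_{t+2}|I_t) = a0 + a1*a0 + (a1**2 + a2)*y_t + a1*a2*y_{t-1}."""
    return a0 + a1 * a0 + (a1 ** 2 + a2) * y_t + a1 * a2 * y_tm1

def ar2_iterated_two_step(a0, a1, a2, y_t, y_tm1):
    """Iterate the one-step conditional mean twice."""
    y1 = a0 + a1 * y_t + a2 * y_tm1        # E(y_{t+1}|I_t)
    return a0 + a1 * y1 + a2 * y_t         # E(y_{t+2}|I_t)
```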
AR(k) h-step forecasts
If
y_{t+1} = α0 + α1 y_t + α2 y_{t-1} + ... + αk y_{t-k+1} + u_{t+1}
then
y_{t+h} = β0 + β1 y_t + β2 y_{t-1} + ... + βk y_{t-k+1} + e_{t+h}
where e_{t+h} is an MA(h-1)
Leading Indicator Models
If
y_{t+1} = x_t'β + u_{t+1}
then
E(y_{t+h} | I_t) = E(x_{t+h-1} | I_t)'β
If E(x_{t+h-1} | I_t) is itself (approximately) a linear function of x_t, then
E(y_{t+h} | I_t) = x_t'γ
y_{t+h} = x_t'γ + e_{t+h}
Common structure: the h-step conditional mean is similar to the 1-step structure, but the error is an MA.
Forecast Variable
We should think carefully about the variable we want to report in our forecast
The choice will depend on the context
What do we want to forecast?
- Future level: y_{n+h}
  - interest rates, unemployment rates
- Future differences: Δy_{t+h}
- Cumulative change: y_{t+h} - y_t
  - cumulative GDP growth
Forecast Transformation
f_{n+h|n} = E(y_{n+h} | I_n) = expected future level
- Level specification:
  y_{t+h} = x_t'β + e_{t+h}
  f_{n+h|n} = x_n'β
- Difference specification:
  Δy_{t+h} = x_t'β_h + e_{t+h}
  f_{n+h|n} = y_n + x_n'β_1 + ... + x_n'β_h
- Multi-step difference specification:
  y_{t+h} - y_t = x_t'β + e_{t+h}
  f_{n+h|n} = y_n + x_n'β
Direct and Iterated
There are two methods of multistep (h > 1) forecasting
Direct forecast
- Model and estimate E(y_{n+h} | I_n) directly
Iterated forecast
- Model and estimate one-step E(y_{n+1} | I_n)
- Iterate forward h steps
- Requires a full model for all variables
Both have advantages and disadvantages
- For now, we will focus on the direct method.
Direct Multi-Step Forecasting
Markov approximation
- E(y_{n+h} | I_n) = E(y_{n+h} | x_n, x_{n-1}, ...) ≈ E(y_{n+h} | x_n, ..., x_{n-p})
Linear approximation
- E(y_{n+h} | x_n, ..., x_{n-p}) ≈ β'x_n
Projection definition
- β = (E(x_t x_t'))^{-1} E(x_t y_{t+h})
Forecast error
- e_{t+h} = y_{t+h} - β'x_t
Multi-Step Forecast Model
y_{t+h} = β'x_t + e_{t+h}
β = (E(x_t x_t'))^{-1} E(x_t y_{t+h})
E(x_t e_{t+h}) = 0
σ² = E(e²_{t+h})
Properties of the Error
E(x_t e_{t+h}) = 0
- Projection
E(e_{t+h}) = 0
- Inclusion of an intercept
The error e_{t+h} is NOT serially uncorrelated
It is at least an MA(h-1)
Least Squares Estimation
β̂ = (Σ_{t=0}^{n-1} x_t x_t')^{-1} (Σ_{t=0}^{n-1} x_t y_{t+h})
ŷ_{n+h|n} = f̂_{n+h|n} = β̂'x_n
Distribution Theory - Consistent Estimation
By the WLLN,
β̂ = (Σ_{t=0}^{n-1} x_t x_t')^{-1} (Σ_{t=0}^{n-1} x_t y_{t+h}) →_p (E(x_t x_t'))^{-1} E(x_t y_{t+h}) = β
Distribution Theory - Asymptotic Normality
By the dependent CLT,
n^{-1/2} Σ_{t=0}^{n-1} x_t e_{t+h} →_d N(0, Ω)
Ω = E(x_t x_t' e²_{t+h}) + Σ_{j=1}^{∞} [E(x_t x_{t+j}' e_{t+h} e_{t+h+j}) + E(x_{t+j} x_t' e_{t+h} e_{t+h+j})]
  ≈ E(x_t x_t' e²_{t+h}) + Σ_{j=1}^{h-1} [E(x_t x_{t+j}' e_{t+h} e_{t+h+j}) + E(x_{t+j} x_t' e_{t+h} e_{t+h+j})]
A long-run (HAC) covariance matrix
If the model is correctly specified, the errors are an MA(h-1) and the sum truncates at h-1
Otherwise, this is an approximation
It does not simplify to the iid covariance matrix
Distribution Theory
√n (β̂ - β) →_d N(0, V)
V = Q^{-1} Ω Q^{-1}
Ω = E(x_t x_t' e²_{t+h}) + Σ_{j=1}^{h-1} [E(x_t x_{t+j}' e_{t+h} e_{t+h+j}) + E(x_{t+j} x_t' e_{t+h} e_{t+h+j})]
A HAC variance matrix
Residuals
Least-squares residuals
- ê_{t+h} = y_{t+h} - β̂'x_t
- Standard, but overfit
Leave-one-out residuals
- ẽ_{t+h} = y_{t+h} - β̂_{(-t)}'x_t
- Does not correct for MA errors
Leave-h-out residuals
- β̂_{(-t,h)} = (Σ_{|j-t| ≥ h} x_j x_j')^{-1} (Σ_{|j-t| ≥ h} x_j y_{j+h})
- ẽ_{t+h} = y_{t+h} - β̂_{(-t,h)}'x_t
- The summation is over all observations j+h outside h-1 periods of t+h.
Algebraic Computation of Leave h out residuals
Loop across each observation t = (y_{t+h}, x_t)
Leave out observations t-h+1, ..., t, ..., t+h-1
R commands:
- For positive integers i, x[-i] returns the elements of x excluding indices i
- Consider
  ii=seq(i-h+1,i+h-1)
  ii<-ii[ii>0]
  yi=y[-ii]
  xi=x[-ii,]
- This removes t-h+1, ..., t, ..., t+h-1 from y and x
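The same leave-h-out construction can be sketched in Python (simulated setup; in the notation above, row t of x holds x_t and entry t of y holds y_{t+h}):

```python
import numpy as np

def leave_h_out_residuals(y, x, h):
    """e~_{t+h} = y_{t+h} - x_t' b_{(-t,h)}, dropping the 2h-1 observations
    j with |j - t| < h from each least-squares fit."""
    n = len(y)
    e = np.empty(n)
    idx = np.arange(n)
    for t in range(n):
        keep = np.abs(idx - t) >= h
        b, *_ = np.linalg.lstsq(x[keep], y[keep], rcond=None)
        e[t] = y[t] - x[t] @ b
    return e

# intercept-only example: with h = 1 this is the usual leave-one-out residual
y = np.array([1.0, 2.0, 3.0])
x = np.ones((3, 1))
e = leave_h_out_residuals(y, x, h=1)
```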
Variance Estimator
Asymptotic variance (HAC) estimator with leave-h-out residuals:
V̂ = Q̂^{-1} Ω̂ Q̂^{-1}
Q̂ = (1/n) Σ_{t=0}^{n-1} x_t x_t'
Ω̂ = (1/n) Σ_t x_t x_t' ẽ²_{t+h} + (1/n) Σ_{j=1}^{h-1} Σ_t (x_t x_{t+j}' ẽ_{t+h} ẽ_{t+h+j} + x_{t+j} x_t' ẽ_{t+h} ẽ_{t+h+j})
Can use least-squares residuals ê_{t+h} instead of leave-h-out residuals, but then multiply V̂ by n/(n - dim(x_t)).
Standard errors for β̂ are the square roots of the diagonal elements of n^{-1} V̂
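The Ω̂ formula, truncated at lag h-1, can be sketched as follows (Python; the regressors and residuals are simulated stand-ins):

```python
import numpy as np

def hac_omega(x, e, h):
    """Omega_hat = (1/n) sum x_t x_t' e_t^2 plus lag terms j = 1, ..., h-1."""
    n = x.shape[0]
    xe = x * e[:, None]                      # row t is x_t * e_{t+h}
    omega = xe.T @ xe / n
    for j in range(1, h):
        gamma = xe[:-j].T @ xe[j:] / n       # lag-j cross product
        omega += gamma + gamma.T
    return omega

rng = np.random.default_rng(3)
x = rng.normal(size=(200, 2))
e = rng.normal(size=200)
omega = hac_omega(x, e, h=3)
```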
Example: GDP Forecast
y_t = 400 log(GDP_t)
Forecast variable: GDP growth over the next h quarters, at an annual rate
(y_{t+h} - y_t)/h = β0 + β1 Δy_t + β2 Δy_{t-1} + β3 Spread_t + β4 HighYield_t + β5 HS_t + e_{t+h}
HS_t = Housing Starts_t

               h=1          h=2          h=3          h=4
β0             0.33 (1.0)   0.38 (1.3)   0.01 (1.6)   0.47 (1.8)
Δy_t           0.16 (.10)   0.18 (.09)   0.13 (.08)   0.13 (.09)
Δy_{t-1}       0.09 (.10)   0.04 (.05)   0.05 (.07)   0.02 (.06)
Spread_t       0.61 (.23)   0.65 (.19)   0.65 (.22)   0.65 (.25)
HighYield_t    1.10 (.75)   0.68 (.70)   0.48 (.90)   0.41 (1.01)
HS_t           1.86 (.65)   1.64 (.70)   1.31 (.80)   1.01 (.94)
Example: GDP Forecast
          Cumulative Annualized Growth
2012:2    1.3
2012:3    1.6
2012:4    2.9
2013:1    2.2
2013:2    2.4
2013:3    2.7
2013:4    2.9
2014:1    3.2
Selection and Combination for h step forecasts
AIC is routinely used for model selection
PLS (OOS MSFE) is routinely used for model evaluation
Neither is well justified
Point Forecast and MSFE
Given an estimate β̂(m) of β, the point forecast for y_{n+h} is
f_{n+h|n} = β̂'x_n
The mean-squared forecast error (MSFE) is
MSFE = E(e_{n+h} + x_n'(β - β̂))² ≈ σ² + E[(β̂ - β)' Q (β̂ - β)]
where Q = E(x_n x_n') and σ² = E(e²_{n+h})
Same form as the 1-step case
Residual Fit
σ̂² = (1/n) Σ_{t=0}^{n-1} ê²_{t+h}
   = (1/n) Σ e²_{t+h} + (1/n) Σ (x_t'(β̂ - β))² - (2/n) Σ e_{t+h} x_t'(β̂ - β)
   ≈ MSFE - (2/n) e'Pe
where P = X(X'X)^{-1}X' is the projection matrix
E(σ̂²) ≈ MSFE_n - 2B/n
where B = E(e'Pe)
Same form as the 1-step case
Asymptotic Penalty
e'Pe = (n^{-1/2} e'X)(n^{-1} X'X)^{-1}(n^{-1/2} X'e) →_d Z'Q^{-1}Z
where Z ~ N(0, Ω), with Ω the HAC variance.
B = E(e'Pe) → tr(Q^{-1} E(ZZ')) = tr(Q^{-1} Ω)
Ideal MSFE Criterion
C_n(m) = σ̂²(m) + (2/n) tr(Q^{-1} Ω)
Q = E(x_t x_t')
Ω = E(x_t x_t' e²_{t+h}) + Σ_{j=1}^{h-1} [E(x_t x_{t+j}' e_{t+h} e_{t+h+j}) + E(x_{t+j} x_t' e_{t+h} e_{t+h+j})]
H-Step Robust Mallows Criterion
C_n(m) = σ̂²(m) + (2/n) tr(Q̂^{-1} Ω̂)
where Ω̂ is a HAC covariance matrix
H-Step Cross-Validation for Selection
CV_n(m) = (1/n) Σ_{t=0}^{n-1} ẽ_{t+h}(m)²
ẽ_{t+h} = y_{t+h} - β̂_{(-t,h)}'x_t
β̂_{(-t,h)} = (Σ_{|j-t| ≥ h} x_j x_j')^{-1} (Σ_{|j-t| ≥ h} x_j y_{j+h})
Theorem: E(CV_n(m)) ≈ MSFE(m)
Thus m̂ = argmin CV_n(m) is an estimate of m = argmin MSFE_n(m), but there is no proof of optimality
H-Step Cross-Validation for Forecast Combination
CV_n(w) = (1/n) Σ_t ẽ_{t+h}(w)²
        = (1/n) Σ_t (Σ_{m=1}^{M} w(m) ẽ_{t+h}(m))²
        = Σ_{m=1}^{M} Σ_{ℓ=1}^{M} w(m) w(ℓ) (1/n) Σ_t ẽ_{t+h}(m) ẽ_{t+h}(ℓ)
        = w'S̃w
where
S̃ = (1/n) ẽ'ẽ
is the covariance matrix of the leave-h-out residuals.
Cross-validation Weights
Combination weights are found by constrained minimization of CV_n(w):
min_w CV_n(w) = w'S̃w
subject to
Σ_{m=1}^{M} w(m) = 1
0 ≤ w(m) ≤ 1
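For two models the constrained minimization reduces to a line search (Python sketch; the S̃ matrix below is a hypothetical 2×2 CV covariance, and a general M-model version would need a quadratic-programming solver):

```python
import numpy as np

def cv_weights_two(S, grid_points=1001):
    """Minimize w'Sw over w = (w1, 1 - w1) with 0 <= w1 <= 1."""
    w1 = np.linspace(0.0, 1.0, grid_points)
    W = np.column_stack([w1, 1.0 - w1])          # each row is a candidate weight vector
    vals = np.einsum('ij,jk,ik->i', W, S, W)     # w'Sw for every candidate
    return W[int(np.argmin(vals))]

S = np.array([[1.0, 0.2],
              [0.2, 0.5]])                        # hypothetical S~
w = cv_weights_two(S)
```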
Illustration 1
k = 8 regressors
- intercept
- normal AR(1)'s with coefficient ρ = 0.9
h-step error
- normal MA(h-1)
- equal coefficients
Regression coefficients
- β = (μ, 0, ..., 0)
- n = 50
- MSFE plotted as a function of μ
Estimators
Unconstrained Least-Squares
Leave-1-out CV Selection
Leave-h-out CV Selection
Leave-1-out CV Combination
Leave-h-out CV Combination
Illustration 2
Model
y_t = α y_{t-1} + u_t
Unconstrained model: AR(3)
y_t = μ̂ + β̂1 y_{t-h} + β̂2 y_{t-h-1} + β̂3 y_{t-h-2} + ê_t
Example: GDP Forecast Weights by Horizon
              h=1   h=2   h=3   h=4   h=5   h=6   h=7
AR(1)         .30   .15   .19   .28   .18   .16   .11
AR(2)         .66   .70
AR(1)+HS            .14   .22
AR(1)+HS+BP   .04
AR(2)+HS                  .58   .72   .82   .84   .89
ŷ_{n+h|n}     1.7   2.0   1.9   2.0   2.1   2.3   2.6
h-step Variance Forecasting
Not well developed using direct methods
Suggest using a constant variance specification
h-step Interval Forecasts
Similar to 1-step interval forecasts
- But calculated from h-step residuals
Use a constant variance specification
Let q̂^e(α) and q̂^e(1 - α) be the α'th and (1 - α)'th percentiles of the residuals ẽ_{t+h}
Forecast interval:
[μ̂_n + q̂^e(α), μ̂_n + q̂^e(1 - α)]
Quantile Regression Approach
F_n(y) = P(y_{n+h} ≤ y | I_n)
q_α(x) ≈ x'β_α
Estimate a quantile regression of y_{t+h} on x_t
The 1 - 2α forecast interval is [x_n'β̂_α, x_n'β̂_{1-α}]
Asymptotic theory is not developed for the h-step case
- Developed for the 1-step case
- The extension is expected to work
Example: GDP Forecast Intervals (80%)
Using the quantile regression approach

          ŷ_{n+h|n}   Interval
2012:2    1.3         [-1.8, 4.1]
2012:3    1.6         [-0.4, 3.6]
2012:4    2.0         [-0.6, 4.6]
2013:1    2.2         [-0.3, 4.1]
2013:2    2.4         [0.2, 4.2]
2013:3    2.7         [0.6, 3.8]
2013:4    2.9         [0.7, 4.8]
2014:1    3.2         [1.5, 4.8]
Fan Charts
Plots of a set of interval forecasts for multiple horizons
- Pick a set of horizons, h = 1, ..., H
- Pick a set of quantiles, e.g. α = .10, .25, .75, .90
- Recall the quantiles of the conditional distribution are q_n(α, h) = μ_n(h) + σ_n(h) q^ε(α, h)
- Plot q_n(.1, h), q_n(.25, h), μ_n(h), q_n(.75, h), q_n(.9, h) against h
Graphs are easier to interpret than tables
Bruce Hansen (University of Wisconsin)
Forecasting
July 23-27, 2012
81 / 102
Illustration
I’ve been making monthly forecasts of the Wisconsin unemployment
rate
Forecast horizon h = 1, ..., 12 (one year)
Quantiles: α = .1, .25, .75, .90
This corresponds to plotting 50% and 80% forecast intervals
50% intervals show “likely” region (equal odds)
Comments
Showing the recent history gives perspective
Some published fan charts use colors to indicate regions, but do not
label the colors
Labels important to infer probabilities
I like clean plots, not cluttered
Illustration: GDP Growth

Figure: GDP Average Growth Fan Chart

It doesn't "fan" because we are plotting average growth
Iterated Forecasts
Estimate a one-step forecast
Iterate to obtain multi-step forecasts
Only works in complete systems
- Autoregressions
- Vector autoregressions
Iterative Forecast Relationships in Linear VAR
For a vector y_t:
y_{t+1} = A0 + A1 y_t + A2 y_{t-1} + ... + Ak y_{t-k+1} + u_{t+1}
1-step conditional mean:
E(y_{t+1} | I_t) = A0 + A1 E(y_t | I_t) + ... + Ak E(y_{t-k+1} | I_t)
              = A0 + A1 y_t + A2 y_{t-1} + ... + Ak y_{t-k+1}
2-step conditional mean:
E(y_{t+1} | I_{t-1}) = E(E(y_{t+1} | I_t) | I_{t-1})
                   = A0 + A1 E(y_t | I_{t-1}) + A2 y_{t-1} + ... + Ak y_{t-k+1}
h-step conditional mean:
E(y_{t+1} | I_{t-h+1}) = E(E(y_{t+1} | I_t) | I_{t-h+1})
                     = A0 + A1 E(y_t | I_{t-h+1}) + ... + Ak E(y_{t-k+1} | I_{t-h+1})
Linear in lower-order (up to h-1 step) conditional means
Iterative Least Squares Forecasts
Estimate the 1-step VAR(k) by least squares:
y_{t+1} = Â0 + Â1 y_t + Â2 y_{t-1} + ... + Âk y_{t-k+1} + û_{t+1}
This gives the 1-step point forecast
ŷ_{n+1|n} = Â0 + Â1 y_n + Â2 y_{n-1} + ... + Âk y_{n-k+1}
2-step iterated forecast:
ŷ_{n+2|n} = Â0 + Â1 ŷ_{n+1|n} + Â2 y_n + ... + Âk y_{n-k+2}
h-step iterated forecast:
ŷ_{n+h|n} = Â0 + Â1 ŷ_{n+h-1|n} + Â2 ŷ_{n+h-2|n} + ... + Âk ŷ_{n+h-k|n}
This is (numerically) different from the direct LS forecast
Illustration 1: GDP Growth
AR(2) model:
y_{t+1} = 1.6 + 0.30 y_t + 0.16 y_{t-1}
y_n = 1.8, y_{n-1} = 2.9
ŷ_{n+1} = 1.6 + 0.30 × 1.8 + 0.16 × 2.9 = 2.6
ŷ_{n+2} = 1.6 + 0.30 × 2.6 + 0.16 × 1.8 = 2.7
ŷ_{n+3} = 1.6 + 0.30 × 2.7 + 0.16 × 2.6 = 2.9
ŷ_{n+4} = 1.6 + 0.30 × 2.9 + 0.16 × 2.7 = 3.0
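The iteration above is easy to script (Python; coefficients and initial values taken from the example). Exact arithmetic keeps unrounded intermediates, so later steps can differ slightly from the hand-rounded values above:

```python
def iterate_ar2(a0, a1, a2, y_n, y_nm1, h):
    """Iterated point forecasts y_hat_{n+1}, ..., y_hat_{n+h} from a one-step AR(2)."""
    f = [y_nm1, y_n]
    for _ in range(h):
        f.append(a0 + a1 * f[-1] + a2 * f[-2])
    return f[2:]

fc = iterate_ar2(1.6, 0.30, 0.16, 1.8, 2.9, h=4)   # first step: 1.6 + 0.54 + 0.464 = 2.604
```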
Point Forecasts
2012:2    2.65
2012:3    2.72
2012:4    2.87
2013:1    2.93
2013:2    2.97
2013:3    2.99
2013:4    3.00
2014:1    3.01
Illustration 2: GDP Growth+Housing Starts
VAR(2) model
y1_t = GDP Growth, y2_t = Housing Starts
x_t = (GDP Growth_t, Housing Starts_t, GDP Growth_{t-1}, Housing Starts_{t-1})
y_{t+1} = Â0 + Â1 y_t + Â2 y_{t-1} + û_{t+1}
y1_{t+1} = 0.43 + 0.15 y1_t + 11.2 y2_t + 0.18 y1_{t-1} - 10.1 y2_{t-1}
y2_{t+1} = 0.07 - 0.001 y1_t + 1.2 y2_t - 0.001 y1_{t-1} - 0.26 y2_{t-1}
Illustration 2: GDP Growth + Housing Starts

y_{1n} = 1.8, y_{2n} = 0.71, y_{1n-1} = 2.9, y_{2n-1} = 0.68

ŷ_{1n+1} = 0.43 + 0.15 × 1.8 + 11.2 × 0.71 + 0.18 × 2.9 - 10.1 × 0.68 = 2.3
ŷ_{2n+1} = 0.07 - 0.001 × 1.8 + 1.2 × 0.71 - 0.001 × 2.9 - 0.26 × 0.68 = 0.76
ŷ_{1n+2} = 0.43 + 0.15 × 2.3 + 11.2 × 0.76 + 0.18 × 1.8 - 10.1 × 0.71 = 2.4
ŷ_{2n+2} = 0.07 - 0.001 × 2.3 + 1.2 × 0.76 - 0.001 × 1.8 - 0.26 × 0.71 = 0.80
Point Forecasts

         GDP    Housing
2012:2   2.36   0.76
2012:3   2.38   0.80
2012:4   2.53   0.84
2013:1   2.58   0.88
2013:2   2.64   0.92
2013:3   2.66   0.95
2013:4   2.69   0.98
2014:1   2.71   1.01
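The VAR(2) iteration works the same way in matrix form. A sketch (names are my own; the coefficients are the rounded slide values, so iterated values differ slightly from the reported table):

```python
import numpy as np

# Rounded VAR(2) coefficients from the slides: intercept, then the
# coefficient matrices on y_t and y_{t-1} (order: GDP growth, housing starts)
A0 = np.array([0.43, 0.07])
A1 = np.array([[0.15,   11.2],
               [-0.001,  1.2]])
A2 = np.array([[0.18,  -10.1],
               [-0.001, -0.26]])

def iterate_var2(y_n, y_nm1, h):
    """Iterate the VAR(2) point forecast h steps ahead."""
    lag1, lag2 = np.asarray(y_n, float), np.asarray(y_nm1, float)
    path = []
    for _ in range(h):
        yhat = A0 + A1 @ lag1 + A2 @ lag2
        path.append(yhat)
        lag1, lag2 = yhat, lag1  # forecasts replace unknown future values
    return np.array(path)

path = iterate_var2([1.8, 0.71], [2.9, 0.68], h=2)
print(np.round(path, 2))
```

The first row reproduces the hand calculation for ŷ_{n+1} above (2.31 and 0.74 before the slide's rounding).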
Model Selection

It is typical to select the 1-step model and use this to make all h-step forecasts

However, the theory to support this is incomplete

(It is not obvious that the best 1-step estimate produces the best h-step estimate)

For now, I recommend selecting based on the 1-step estimates
Model Combination

There is no theory about how to apply model combination to h-step iterated forecasts

Can select model weights based on 1-step, and use these for all forecast horizons
Variance, Distribution, Interval Forecast

While point forecasts can be simply iterated, the other features cannot

Multi-step forecast distributions are convolutions of the 1-step forecast distribution
- Explicit calculation is computationally costly beyond 2 steps
- Instead, simple simulation methods work well

The method is to use the estimated conditional distribution to simulate each step, and iterate forward. Then repeat the simulation many times.
Multi-Step Forecast Simulation

Let µ(x) and σ(x) denote the models for the conditional one-step mean and standard deviation as functions of the conditioning variables x

Let µ̂(x) and σ̂(x) denote the estimates of these functions, and let {ε̂_1, ..., ε̂_n} be the normalized residuals

x_n = (y_n, y_{n-1}, ..., y_{n-p}) is known

To create one h-step realization:
- Draw ε_{n+1} iid from the normalized residuals {ε̂_1, ..., ε̂_n}
- Set y_{n+1} = µ̂(x_n) + σ̂(x_n) ε_{n+1}
- Set x_{n+1} = (y_{n+1}, y_n, ..., y_{n-p+1})
- Draw ε_{n+2} iid from the normalized residuals {ε̂_1, ..., ε̂_n}
- Set y_{n+2} = µ̂(x_{n+1}) + σ̂(x_{n+1}) ε_{n+2}
- Set x_{n+2} = (y_{n+2}, y_{n+1}, ..., y_{n-p+2})
- Repeat until you obtain y_{n+h}
- y_{n+h} is a draw from the h-step-ahead distribution

Repeat this B times, and let y_{n+h}(b), b = 1, ..., B denote the B repetitions
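The steps above can be sketched for an AR(p) with constant variance (so σ̂(x) is absorbed into the residuals); function names and the toy residuals are my own, and a GARCH model would multiply each drawn residual by the simulated σ̂(x):

```python
import numpy as np

def simulate_h_step(yhist, intercept, phi, resid, h, B=1000, seed=0):
    """Bootstrap the h-step-ahead forecast distribution of an AR(p).

    yhist: observed series, most recent value last.
    intercept, phi: AR coefficients (phi[0] on y_n, phi[1] on y_{n-1}, ...).
    resid: normalized 1-step residuals to resample from.
    Returns an array of shape (B, h) of simulated future paths.
    """
    rng = np.random.default_rng(seed)
    p = len(phi)
    paths = np.empty((B, h))
    for b in range(B):
        lags = list(yhist[-p:][::-1])    # lags[0] = y_n, lags[1] = y_{n-1}, ...
        for j in range(h):
            eps = rng.choice(resid)      # draw iid from the residuals
            y_next = intercept + np.dot(phi, lags) + eps
            paths[b, j] = y_next
            lags = [y_next] + lags[:-1]  # simulated value becomes a lag
    return paths

# Hypothetical example: the slide's AR(2) with stand-in residuals
rng = np.random.default_rng(1)
resid = 0.5 * rng.standard_normal(200)
paths = simulate_h_step(np.array([2.9, 1.8]), 1.6, [0.30, 0.16], resid, h=4)
```

Each simulated path feeds its own draws back in as lags, so `paths[:, h-1]` is a sample from the h-step-ahead distribution.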
Multi-Step Forecast Simulation

The simulation has produced y_{n+h}(b), b = 1, ..., B

For forecast intervals, calculate the empirical quantiles of y_{n+h}(b)
- For an 80% interval, calculate the 10% and 90% quantiles

For a fan chart:
- Calculate a set of empirical quantiles (10%, 25%, 75%, 90%)
- For each horizon h = 1, ..., H

As the calculations are linear, they are numerically quick
- Set B large
- For a quick application, B = 1000
- For a paper, B = 10,000 (minimum)
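Extracting the intervals and fan-chart bands from the simulated draws is one call per quantile set. A sketch (the `paths` array here is synthetic, standing in for B simulated h-step paths):

```python
import numpy as np

# paths: simulated future values, shape (B, H) — synthetic for illustration
rng = np.random.default_rng(0)
paths = 2.7 + 0.5 * rng.standard_normal((10_000, 12))

# 80% forecast interval at each horizon: empirical 10% and 90% quantiles
lo, hi = np.quantile(paths, [0.10, 0.90], axis=0)

# Fan-chart bands: a wider set of quantiles, one column per horizon
fan = np.quantile(paths, [0.10, 0.25, 0.75, 0.90], axis=0)
print(fan.shape)  # (4, 12): one row per quantile, one column per horizon
```

Plotting each row of `fan` against the horizon h = 1, ..., H gives the fan chart directly.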
VARs and Variance Simulation

The simulation method requires a method to simulate the conditional variances

In a VAR setting, you can:
- Treat the errors as iid (homoskedastic): easiest
- Treat the errors as independent GARCH errors: also easy
- Treat the errors as multivariate GARCH: allows volatility to transmit across variables, but probably not necessary with aggregate data
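For the homoskedastic case, one simple implementation (my own illustration, not spelled out on the slide) is to resample whole rows of the VAR residual matrix, which preserves the contemporaneous correlation across equations:

```python
import numpy as np

# uhat: (n, m) matrix of 1-step VAR residuals, one row per period;
# synthetic here, with cross-equation correlation 0.5
rng = np.random.default_rng(0)
uhat = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=500)

# Draw residual vectors jointly (whole rows) so the cross-equation
# correlation carries over into the simulated errors
idx = rng.integers(0, uhat.shape[0], size=1000)
u_sim = uhat[idx]

print(u_sim.shape)  # (1000, 2)
```

Resampling each column independently would instead destroy the correlation between the GDP and housing equations.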
Assignment

Take your favorite model from yesterday's assignment

Calculate forecast intervals

Make 1 through 12 step forecasts
- point
- interval

Create a fan chart