The Bootstrap
Econometrics III Lecture Notes
Ke-Li Xu
Indiana University
September 13, 2019
Contents

- Bootstrap bias correction
- Bootstrap standard error
- Bootstrap coefficient-based test (and CI)
- Bootstrap t-test (and percentile-t CI)
- Example: linear regression model
  - Pairwise bootstrap and residual-based bootstrap
  - Restricted bootstrap
  - Bootstrapping F test
- Parametric bootstrap
- Permutation test

Bootstrap: improve oneself by one's own efforts.
Bootstrap bias correction

Suppose the parameter of interest is $\theta$. The bias of an estimator $\hat\theta$ is
$$\tau = E\hat\theta - \theta.$$
E.g. $\theta$ can be a linear regression coefficient, when the strict exogeneity assumption is not satisfied.

If $\tau$ were known, we could construct an (infeasible) bias-corrected estimator of $\theta$:
$$\hat\theta^{bc,inf} = \hat\theta - \tau.$$
$\hat\theta^{bc,inf}$ is unbiased: $E\hat\theta^{bc,inf} = \theta$.

We now want to estimate $\tau$. The bootstrap provides a solution.
Suppose the (raw) data are $\{Z_1, \ldots, Z_n\}$, coming from the unknown distribution $F$. We can rewrite (making the dependence on $F$ explicit)
$$\tau = \tau(F) = E_F\hat\theta - \theta(F).$$
E.g. in the linear regression $y_i = x_i'\theta + u_i$,
$$\theta = \theta(F) = (E_F x_i x_i')^{-1} E_F x_i y_i.$$
The idea of the bootstrap is to approximate $F$ by $\hat F$, the empirical CDF of the data:
$$\hat F(z) = n^{-1}\sum_{i=1}^n I(Z_i \le z).$$
Intuitively, $\hat F$ assigns probability mass $1/n$ to each data point $Z_i$. Thus $\tau$ is estimated by $\tau(\hat F)$.
Denote the moments under $\hat F$ as $E^*(\cdot)$, $\mathrm{Var}^*(\cdot)$, etc. Define $\hat\theta^*$ to be like $\hat\theta$, but in terms of data coming from $\hat F$ (instead of the original data, which come from $F$).

What is $\tau(\hat F)$ then?
$$\tau(F) = E_F\hat\theta - \theta(F), \qquad \tau(\hat F) = E^*\hat\theta^* - \theta(\hat F) = E^*\hat\theta^* - \hat\theta.$$
Here we have used $\theta(\hat F) = \hat\theta$, which is typically true in econometrics when $\hat\theta$ is an MM (method of moments) estimator. E.g. in the linear regression,
$$\theta(\hat F) = (E^* x_i^* x_i^{*\prime})^{-1} E^* x_i^* y_i^* = \Big(n^{-1}\sum_{i=1}^n x_i x_i'\Big)^{-1}\Big(n^{-1}\sum_{i=1}^n x_i y_i\Big) = \Big(\sum_{i=1}^n x_i x_i'\Big)^{-1}\sum_{i=1}^n x_i y_i = \hat\beta,$$
where the random variables $(y_i^*, x_i^{*\prime})'$ are from $\hat F$.

A resampling approach is needed to obtain $E^*\hat\theta^*$.
Bootstrap: approximate $E^*\hat\theta^*$ by taking many random draws from the data.

- Take $n$ random draws (with replacement) from the raw data. These $n$ draws are called a bootstrap sample: $\{Z_1^*, \ldots, Z_n^*\}$.
- Repeat this $B$ times, so we have $B$ bootstrap samples: $\{Z_1^*(b), \ldots, Z_n^*(b)\}$, $b = 1, \ldots, B$.
- For each bootstrap sample, compute $\hat\theta^*(b)$, where $\hat\theta^*(b)$ is just like $\hat\theta$ except that the sample $\{Z_1^*(b), \ldots, Z_n^*(b)\}$, instead of the raw data, is used.

$B$ should be large, so that the distribution of $\{\hat\theta^*(b) : b = 1, \ldots, B\}$ well approximates the distribution of $\hat\theta^*$.
Then
$$E^*\hat\theta^* \approx B^{-1}\sum_{b=1}^B \hat\theta^*(b) \equiv \bar\theta^*.$$
So define the estimator of $\tau$ as $\hat\tau$:
$$\tau(\hat F) = E^*\hat\theta^* - \hat\theta \approx \bar\theta^* - \hat\theta \equiv \hat\tau.$$
The bias-corrected estimator of $\theta$ is formed as
$$\hat\theta^{bc} = \hat\theta - \hat\tau = 2\hat\theta - \bar\theta^*.$$
Lastly, on a theoretical note, the approximation of $F$ by $\hat F$ is justified by the Glivenko-Cantelli theorem: if $Z_i$ is iid,
$$\sup_z |\hat F(z) - F(z)| \xrightarrow{p} 0.$$
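As a concrete illustration of the recipe above (not from the notes), here is a minimal numpy sketch; the plug-in estimator $\hat\theta = (\bar Z)^2$ of $\theta = \mu^2$, which is biased upward by $\mathrm{Var}(\bar Z)$, is a hypothetical stand-in:

```python
import numpy as np

def bootstrap_bias_corrected(z, estimator, B=2000, rng=None):
    """theta_bc = theta_hat - tau_hat = 2*theta_hat - mean_b theta_hat*(b)."""
    rng = np.random.default_rng(rng)
    n = len(z)
    theta_hat = estimator(z)
    boot = np.empty(B)
    for b in range(B):
        sample = z[rng.integers(0, n, size=n)]   # n draws with replacement
        boot[b] = estimator(sample)
    tau_hat = boot.mean() - theta_hat            # estimate of the bias tau(F_hat)
    return theta_hat - tau_hat                   # = 2*theta_hat - bar(theta*)

rng = np.random.default_rng(0)
z = rng.normal(loc=1.0, scale=2.0, size=100)
plug_in = lambda s: s.mean() ** 2                # biased upward for mu^2 = 1
theta_bc = bootstrap_bias_corrected(z, plug_in, B=2000, rng=1)
```

With $B$ large, $2\hat\theta - \bar\theta^*$ removes (to first order) the upward bias $E[(\bar Z)^2] - \mu^2 = \mathrm{Var}(\bar Z)$.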
Bootstrap standard errors

The bootstrap method estimates the finite-sample variance of $\hat\theta$ (i.e. $\mathrm{Var}(\hat\theta)$) by $\mathrm{Var}^*(\hat\theta^*)$. After resampling the data $B$ times, $\mathrm{Var}^*(\hat\theta^*)$ is approximated by
$$\hat V_{\hat\theta}^{boot} = \frac{1}{B-1}\sum_{b=1}^B \big(\hat\theta^*(b) - \bar\theta^*\big)\big(\hat\theta^*(b) - \bar\theta^*\big)'.$$
$\hat V_{\hat\theta}^{boot}$ is called the bootstrap variance estimator. The bootstrap standard error (if $\theta$ is a scalar) is
$$SE_{\hat\theta}^{boot} = \Big[\frac{1}{B-1}\sum_{b=1}^B \big(\hat\theta^*(b) - \bar\theta^*\big)^2\Big]^{1/2}.$$
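A minimal sketch of the bootstrap standard error (assuming numpy; the function name is ours). For the sample mean, the result should be close to the analytic $s/\sqrt n$:

```python
import numpy as np

def bootstrap_se(z, estimator, B=1000, rng=None):
    """Bootstrap SE: sample std. dev. (1/(B-1) normalization) of the
    B bootstrap estimates, matching the formula above."""
    rng = np.random.default_rng(rng)
    n = len(z)
    boot = np.array([estimator(z[rng.integers(0, n, size=n)]) for _ in range(B)])
    return boot.std(ddof=1)

rng = np.random.default_rng(0)
z = rng.normal(size=400)
se_boot = bootstrap_se(z, np.mean, B=1000, rng=1)
se_formula = z.std(ddof=1) / np.sqrt(len(z))   # analytic SE of the sample mean
```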
Bootstrap test (coefficient-based)

In the bias correction above, we used resampling to approximate a moment, $E^*(\hat\theta^* - \hat\theta)$. In fact, the distribution of $\hat\theta^* - \hat\theta$ can also be useful.
Consider a generic scalar parameter $\theta$. In the regression setting, $\theta$ can be $\beta_j$ (the $j$-th slope). Consider $H_0: \theta = \theta_0$.

For an estimator $\hat\theta$, based on the asymptotic normality result $n^{1/2}(\hat\theta - \theta) \xrightarrow{d} N(0, v^2)$ (if the null is true), we have
$$n^{1/2}(\hat\theta - \theta_0) \overset{Asy.}{\sim} N(0, \hat v^2), \tag{1}$$
where $\hat v$ is an estimator of $v$.

The bootstrap provides an alternative way of approximating the (finite-sample) distribution of $n^{1/2}(\hat\theta - \theta_0)$, instead of using $N(0, \hat v^2)$ as in (1).
Denote the true distribution of $n^{1/2}(\hat\theta - \theta_0)$ by $G_n(F)$, where $F$ is the true CDF behind the data. Note that both $\hat\theta$ and $\theta_0$ (the true value, if the null holds) depend on $F$.

If we knew $G_n(F)$, we could calculate its lower and upper quantiles, so that a test could be formed. But in general $G_n(F)$ is unknown.

Two methods to approximate $G_n(F)$:

- the asymptotic method (using $G_\infty(F)$);
- the bootstrap method (using $G_n(\hat F)$), where $\hat F$ is the empirical CDF of the data. [This is commonly referred to as the nonparametric bootstrap.]

$$n^{1/2}(\hat\theta - \theta_0) \overset{exact}{\sim} G_n(F), \qquad n^{1/2}(\hat\theta^* - \hat\theta) \overset{exact}{\sim} G_n(\hat F).$$
The distribution of $n^{1/2}(\hat\theta^* - \hat\theta)$ is obtained by resampling.
Algorithm (a bootstrap test for $\theta = \theta_0$)

- Suppose the (raw) data are $\{Z_1, \ldots, Z_n\}$. As before, generate $B$ bootstrap samples $\{Z_1^*(b), \ldots, Z_n^*(b)\}$, $b = 1, \ldots, B$, and for each bootstrap sample compute $\hat\theta^*(b)$.
- Let $q^*(\alpha)$ be the $\alpha$-th quantile of $\{n^{1/2}(\hat\theta^*(b) - \hat\theta),\ b = 1, \ldots, B\}$.
- Two-sided test: reject $H_0$ at level 5% if
$$n^{1/2}(\hat\theta - \theta_0) > q^*(0.975) \quad \text{or} \quad n^{1/2}(\hat\theta - \theta_0) < q^*(0.025). \tag{2}$$
- One-sided test (e.g. $H_0: \theta = \theta_0$ vs. $H_A: \theta > \theta_0$): reject $H_0$ at level 5% if $n^{1/2}(\hat\theta - \theta_0) > q^*(0.95)$.
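The algorithm above can be sketched as follows (an illustration assuming numpy, here testing a population mean; names are ours):

```python
import numpy as np

def bootstrap_coef_test(z, estimator, theta0, B=999, level=0.05, rng=None):
    """Two-sided coefficient-based bootstrap test of H0: theta = theta0.
    Compares n^{1/2}(theta_hat - theta0) with the quantiles of
    n^{1/2}(theta_hat*(b) - theta_hat)."""
    rng = np.random.default_rng(rng)
    n = len(z)
    theta_hat = estimator(z)
    stat = np.sqrt(n) * (theta_hat - theta0)
    boot = np.array([np.sqrt(n) * (estimator(z[rng.integers(0, n, size=n)]) - theta_hat)
                     for _ in range(B)])
    lo, hi = np.quantile(boot, [level / 2, 1 - level / 2])
    return bool(stat > hi or stat < lo)

rng = np.random.default_rng(0)
z = rng.normal(loc=0.0, size=300)                      # true mean is 0
reject_false_null = bootstrap_coef_test(z, np.mean, theta0=1.0, rng=1)
```

Note the bootstrap statistic is centered at $\hat\theta$, not $\theta_0$, exactly as the algorithm prescribes.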
We need $\theta_0$ to be the true value (under $H_0$) for $n^{1/2}(\hat\theta - \theta_0) \xrightarrow{d} N(0, v^2)$ to hold. In the bootstrap world, $\hat\theta$ is the true value. Thus $\theta_0$ has no role in the bootstrap statistic
$$n^{1/2}\big(\hat\theta^*(b) - \hat\theta\big).$$
A common mistake is to use $n^{1/2}(\hat\theta^* - \theta_0)$ as the bootstrap statistic.

- If so, the critical value would also change with $\theta_0$ (note that $n^{1/2}(\hat\theta^* - \theta_0)$ changes with $\theta_0$), so the test may have no power.
- [We will revisit this point later when we show that the bootstrap test is consistent (i.e. the power goes to one under the alternative).]

This coefficient-based bootstrap test (and the induced confidence interval (3) below) has the advantage of not having to estimate the standard error.
The advantage of the bootstrap is that we can easily take as many random draws from $\hat F$ as we want. (We can take only one random draw from $F$, i.e. the original data in hand.)

The asymptotic validity is usually shown like this: both $G_\infty(F)$ (the asymptotic method) and $G_n(\hat F)$ (the bootstrap) converge to the goal $G_n(F)$. Thus, in order to approximate $G_n(F)$, the asymptotic method and the bootstrap are first-order equivalent.
Note that each method needs another approximation.

- For the asymptotic method, we estimate the asymptotic variance.
- For the bootstrap, we draw $B$ samples (requiring $B$ to be large).

$\hat F$ is referred to as the bootstrap world ($F$ is the world we want to learn about). In the bootstrap world, $\hat\theta^*(b) - \hat\theta$ is one draw.

The main advantage of the bootstrap procedure above is that we don't need to derive $G_\infty(F)$. In many cases, we don't need to estimate the asymptotic variance $v^2$ either.
Bootstrap confidence interval (coefficient-based)

The 95% CI is obtained by inverting the test (2) (collecting all values which cannot be rejected). We then have $\{\theta : q^*(0.025) \le n^{1/2}(\hat\theta - \theta) \le q^*(0.975)\}$, or
$$\hat\theta - n^{-1/2} q^*(0.975) \le \theta \le \hat\theta - n^{-1/2} q^*(0.025). \tag{3}$$
The CI (3) has correct asymptotic coverage 95%.
Percentile CI

A commonly used bootstrap CI is formed by simply taking the lower and upper quantiles of $\{\hat\theta^*(b) : b = 1, \ldots, B\}$. This is called the percentile CI. It can be written as
$$\theta \in \big[\hat\theta + n^{-1/2} q^*(0.025),\ \hat\theta + n^{-1/2} q^*(0.975)\big]. \tag{4}$$
It is important to note that the CI (4) is in general different from (3). They are the same only if the bootstrap distribution is symmetric around zero (so that $q^*(0.025) = -q^*(0.975)$). The asymptotic justification of the percentile CI needs $G_\infty(F)$ to be symmetric. Thus the CI (3) is the preferred choice.
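A sketch computing both the test-inversion CI (3) and the percentile CI (4) side by side (numpy; skewed data are used so the two intervals visibly differ):

```python
import numpy as np

def bootstrap_cis(z, estimator, B=2000, rng=None):
    """Return the test-inversion CI (3) and the percentile CI (4) at 95%."""
    rng = np.random.default_rng(rng)
    n = len(z)
    theta_hat = estimator(z)
    boot = np.array([np.sqrt(n) * (estimator(z[rng.integers(0, n, size=n)]) - theta_hat)
                     for _ in range(B)])
    q025, q975 = np.quantile(boot, [0.025, 0.975])
    ci3 = (theta_hat - q975 / np.sqrt(n), theta_hat - q025 / np.sqrt(n))  # inverting (2)
    ci4 = (theta_hat + q025 / np.sqrt(n), theta_hat + q975 / np.sqrt(n))  # percentile
    return ci3, ci4

rng = np.random.default_rng(0)
z = rng.exponential(size=200)      # skewed, so (3) and (4) differ
ci3, ci4 = bootstrap_cis(z, np.mean, rng=1)
```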
We now show that the percentile CI has the claimed coverage asymptotically if symmetry is assumed. Suppose
$$n^{1/2}(\hat\theta - \theta) \xrightarrow{d} \xi, \tag{5}$$
$$n^{1/2}(\hat\theta^* - \hat\theta) \xrightarrow{d} \xi. \tag{6}$$
Assume that the distribution of $\xi$ is symmetric around zero. (E.g. in a regression model, $\xi$ is normal.) In our previous notation, $\xi$ is $G_\infty(F)$. Then
$$\begin{aligned}
P(\theta \in \text{Percentile CI}) &= P\big(\hat\theta + n^{-1/2} q^*(0.025) \le \theta \le \hat\theta + n^{-1/2} q^*(0.975)\big) \\
&= P\big({-q^*(0.975)} \le n^{1/2}(\hat\theta - \theta) \le -q^*(0.025)\big) \\
&\to P\big({-q_\xi(0.975)} \le \xi \le -q_\xi(0.025)\big) \\
&\overset{symmetry}{=} P\big(q_\xi(0.025) \le \xi \le q_\xi(0.975)\big) = 0.95.
\end{aligned}$$
Consistency of the bootstrap test

Under a particular model, showing both (5) and (6) (with the same $\xi$) implies the bootstrap validity (asymptotically), i.e. the distribution of $n^{1/2}(\hat\theta - \theta)$ can be approximated by the distribution of $n^{1/2}(\hat\theta^* - \hat\theta)$. We will showcase this for different models later in the class (e.g. (7) below).

We can also show that the bootstrap test is consistent. Suppose $H_0: \theta = \theta_0$ is wrong; that means $\theta_{true} \ne \theta_0$. Note that (5) holds only for $\theta_{true}$ (not $\theta_0$), so the test statistic diverges:
$$n^{1/2}(\hat\theta - \theta_0) = n^{1/2}(\hat\theta - \theta_{true}) + n^{1/2}(\theta_{true} - \theta_0) \overset{d}{=} \xi + n^{1/2}(\theta_{true} - \theta_0),$$
and the second term diverges to infinity in absolute value. On the other hand, by (6), the distribution of $n^{1/2}(\hat\theta^* - \hat\theta)$ remains stable (irrelevant to $\theta_0$). So the test will always reject, asymptotically.
Bootstrap t-test

Consider $H_0: \theta = \theta_0$. The t-statistic is $t(\theta_0) = \hat v^{-1}(\hat\theta - \theta_0)$, where $\hat v$ is the standard error of $\hat\theta$. The asymptotic method relies on
$$t(\theta_0) \xrightarrow{d} N(0, 1), \quad \text{under } H_0.$$
Let the distribution of $t(\theta_0)$ be $G_n(F)$. Then $G_\infty(F) = N(0, 1)$. The bootstrap approximates $G_n(F)$ by $G_n(\hat F)$. The bootstrap t-statistic:
$$t^* = \hat v^{*-1}(\hat\theta^* - \hat\theta).$$
Let $q_t^*(\alpha)$ be the $100\alpha\%$ quantile of $t^*$ (where the actual calculation needs $B$ bootstrap resamples). The hypothesis $H_0$ is rejected if $t(\theta_0) > q_t^*(0.975)$ or $t(\theta_0) < q_t^*(0.025)$.
Percentile-t CI

The percentile-t confidence interval for $\theta$:
$$\big[\hat\theta - \hat v\, q_t^*(0.975),\ \hat\theta - \hat v\, q_t^*(0.025)\big].$$
It is obtained by inverting the bootstrap t-test:
$$q_t^*(0.025) < \hat v^{-1}(\hat\theta - \theta_0) < q_t^*(0.975) \iff \hat\theta - \hat v\, q_t^*(0.975) < \theta_0 < \hat\theta - \hat v\, q_t^*(0.025).$$
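A sketch of the percentile-t CI for a mean (numpy; names ours), with $\hat v$ the usual standard error and the bootstrap statistic studentized within each bootstrap sample:

```python
import numpy as np

def percentile_t_ci(z, B=999, level=0.95, rng=None):
    """Percentile-t CI: [theta_hat - v_hat*q_t(0.975), theta_hat - v_hat*q_t(0.025)]."""
    rng = np.random.default_rng(rng)
    n = len(z)
    theta_hat = z.mean()
    v_hat = z.std(ddof=1) / np.sqrt(n)           # standard error of theta_hat
    t_boot = np.empty(B)
    for b in range(B):
        zs = z[rng.integers(0, n, size=n)]
        v_star = zs.std(ddof=1) / np.sqrt(n)     # re-estimated in the bootstrap world
        t_boot[b] = (zs.mean() - theta_hat) / v_star
    a = (1 - level) / 2
    q_lo, q_hi = np.quantile(t_boot, [a, 1 - a])
    return theta_hat - v_hat * q_hi, theta_hat - v_hat * q_lo

rng = np.random.default_rng(0)
z = rng.normal(loc=2.0, size=250)
lo, hi = percentile_t_ci(z, rng=1)
```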
Bootstrapping Linear Regression

Pairwise bootstrap

The model:
$$y_i = x_i'\beta + u_i,$$
where $x_i$ is $k \times 1$ and $u_i \sim (0, \sigma^2)$. The OLS estimator: $\hat\beta = (\sum_{i=1}^n x_i x_i')^{-1}(\sum_{i=1}^n x_i y_i)$; OLS residuals $\hat u_i = y_i - x_i'\hat\beta$.

Pairwise bootstrap: generate an iid sample $(y_i^*, x_i^{*\prime})'$ by taking random draws from $\{(y_i, x_i')'\}$.

The population regression slope in the bootstrap world is
$$(E^* x_i^* x_i^{*\prime})^{-1} E^* x_i^* y_i^* = \Big(n^{-1}\sum_{i=1}^n x_i x_i'\Big)^{-1}\Big(n^{-1}\sum_{i=1}^n x_i y_i\Big) = \hat\beta.$$
True residuals in the bootstrap world: $u_i^* = y_i^* - x_i^{*\prime}\hat\beta$, which equals $\hat u_j$ (for some $j$). Orthogonality condition: $E^* x_i^* u_i^* = E^* x_i^*(y_i^* - x_i^{*\prime}\hat\beta) = 0$.
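A minimal pairwise-bootstrap sketch for OLS (numpy; rows $(y_i, x_i')$ are resampled jointly, so heteroskedasticity is preserved):

```python
import numpy as np

def pairwise_bootstrap(y, X, B=999, rng=None):
    """Pairwise bootstrap: resample the rows (y_i, x_i') jointly, re-run OLS."""
    rng = np.random.default_rng(rng)
    n = len(y)
    betas = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.integers(0, n, size=n)         # n row indices with replacement
        betas[b] = np.linalg.solve(X[idx].T @ X[idx], X[idx].T @ y[idx])
    return betas

rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * (1 + 0.5 * np.abs(x))  # heteroskedastic errors
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
betas_star = pairwise_bootstrap(y, X, rng=1)
se_slope = betas_star[:, 1].std(ddof=1)          # bootstrap SE of the slope
```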
In the original data world,
$$n^{1/2}(\hat\beta - \beta) \xrightarrow{d} N\big(0,\ (E x_i x_i')^{-1}(E x_i x_i' u_i^2)(E x_i x_i')^{-1}\big).$$
In the bootstrap world,
$$\hat\beta^* = \Big(\sum_{i=1}^n x_i^* x_i^{*\prime}\Big)^{-1}\Big(\sum_{i=1}^n x_i^* y_i^*\Big) = \Big(\sum_{i=1}^n x_i^* x_i^{*\prime}\Big)^{-1}\Big(\sum_{i=1}^n x_i^*(x_i^{*\prime}\hat\beta + u_i^*)\Big) = \hat\beta + \Big(\sum_{i=1}^n x_i^* x_i^{*\prime}\Big)^{-1}\Big(\sum_{i=1}^n x_i^* u_i^*\Big).$$
We will show that
$$n^{1/2}(\hat\beta^* - \hat\beta) \xrightarrow{d} N\big(0,\ (E x_i x_i')^{-1}(E x_i x_i' u_i^2)(E x_i x_i')^{-1}\big). \tag{7}$$
To show (7), note that
$$n^{-1}\sum_{i=1}^n x_i^* x_i^{*\prime} = E^* x_i^* x_i^{*\prime} + o_p(1) = n^{-1}\sum_{i=1}^n x_i x_i' + o_p(1) = E x_i x_i' + o_p(1),$$
and, by the CLT for iid sequences,
$$n^{-1/2}\sum_{i=1}^n x_i^* u_i^* \xrightarrow{d} N\big(0,\ E^* x_i^* x_i^{*\prime} u_i^{*2}\big), \tag{8}$$
where
$$E^* x_i^* x_i^{*\prime} u_i^{*2} = E^* x_i^* x_i^{*\prime}\big(y_i^* - x_i^{*\prime}\hat\beta\big)^2 = n^{-1}\sum_{i=1}^n x_i x_i'\big(y_i - x_i'\hat\beta\big)^2 = n^{-1}\sum_{i=1}^n x_i x_i' \hat u_i^2 \xrightarrow{p} E x_i x_i' u_i^2. \tag{9}$$
Thus (7) holds.
We now consider the Wald test for $H_0: R\beta = r$ ($q$ linear restrictions):
$$W = (R\hat\beta - r)'\Big[R\Big(\sum_i x_i x_i'\Big)^{-1}\Big(\sum_i \hat u_i^2 x_i x_i'\Big)\Big(\sum_i x_i x_i'\Big)^{-1} R'\Big]^{-1}(R\hat\beta - r) \xrightarrow{d} \chi^2(q).$$
The Wald statistic in the bootstrap world:
$$W^* = \big[R(\hat\beta^* - \hat\beta)\big]'\Big[R\Big(\sum_i x_i^* x_i^{*\prime}\Big)^{-1}\Big(\sum_i \hat u_i^{*2} x_i^* x_i^{*\prime}\Big)\Big(\sum_i x_i^* x_i^{*\prime}\Big)^{-1} R'\Big]^{-1} R(\hat\beta^* - \hat\beta),$$
where $\hat u_i^* = y_i^* - x_i^{*\prime}\hat\beta^*$. We can show that
$$W^* \xrightarrow{d} \chi^2(q). \tag{10}$$
To show (10), we only need to show $n^{-1}\sum_{i=1}^n x_i^* x_i^{*\prime}\hat u_i^{*2} \xrightarrow{p} E x_i x_i' u_i^2$. Note that
$$n^{-1}\sum_{i=1}^n x_i^* x_i^{*\prime}\hat u_i^{*2} = n^{-1}\sum_{i=1}^n x_i^* x_i^{*\prime}\big(y_i^* - x_i^{*\prime}\hat\beta^*\big)^2 = n^{-1}\sum_{i=1}^n x_i^* x_i^{*\prime}\big[u_i^* - x_i^{*\prime}(\hat\beta^* - \hat\beta)\big]^2 = n^{-1}\sum_{i=1}^n x_i^* x_i^{*\prime} u_i^{*2} + o_p(1) \xrightarrow{p} E x_i x_i' u_i^2,$$
where the second equality uses the bootstrap model $y_i^* = x_i^{*\prime}\hat\beta + u_i^*$, the third uses $\hat\beta^* - \hat\beta = O_p(n^{-1/2})$ (by (7)), and the last step is by (9).
Residual-based iid bootstrap

The model remains the same:
$$y_i = x_i'\beta + u_i,$$
where $x_i$ is $k \times 1$, $u_i \sim (0, \sigma^2)$ and $E x_i u_i = 0$.

Bootstrap model:
$$y_i^* = x_i'\hat\beta + u_i^*,$$
where $u_i^*$ is a random draw from $\{\hat u_i - n^{-1}\sum_{i=1}^n \hat u_i\}$ (centered residuals). This is referred to as the (fixed-design) residual-based iid bootstrap. Centered residuals are needed so that $E^* u_i^* = 0$. (If there is an intercept, the residuals are always centered.)

Bootstrap data: $\{y_i^*, x_i\}$. The bootstrap OLS estimator: $\hat\beta^* = (\sum_{i=1}^n x_i x_i')^{-1}(\sum_{i=1}^n x_i y_i^*)$; OLS residuals $\hat u_i^* = y_i^* - x_i'\hat\beta^*$.
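A sketch of the fixed-design residual-based iid bootstrap (numpy; $X$ is held fixed and only the centered residuals are resampled):

```python
import numpy as np

def residual_iid_bootstrap(y, X, B=999, rng=None):
    """Fixed-design residual bootstrap: y* = X beta_hat + u*, with u*
    drawn iid from the centered OLS residuals."""
    rng = np.random.default_rng(rng)
    n = len(y)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    u_c = y - X @ beta_hat
    u_c = u_c - u_c.mean()                       # center so E*[u*] = 0
    betas = np.empty((B, X.shape[1]))
    for b in range(B):
        y_star = X @ beta_hat + u_c[rng.integers(0, n, size=n)]
        betas[b] = np.linalg.solve(X.T @ X, X.T @ y_star)
    return beta_hat, betas

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 0.5 + 1.5 * x + rng.normal(size=n)           # homoskedastic, as the validity requires
beta_hat, betas_star = residual_iid_bootstrap(y, X, rng=1)
```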
In the bootstrap world,
$$\hat\beta^* = \Big(\sum_{i=1}^n x_i x_i'\Big)^{-1}\Big(\sum_{i=1}^n x_i y_i^*\Big) = \Big(\sum_{i=1}^n x_i x_i'\Big)^{-1}\Big(\sum_{i=1}^n x_i(x_i'\hat\beta + u_i^*)\Big) = \hat\beta + \Big(\sum_{i=1}^n x_i x_i'\Big)^{-1}\Big(\sum_{i=1}^n x_i u_i^*\Big).$$
Note that
$$n^{-1/2}\sum_{i=1}^n x_i u_i^* \xrightarrow{d} N\Big(0,\ \mathrm{plim}\ n^{-1}\sum_{i=1}^n \mathrm{Var}^*(x_i u_i^*)\Big) = N(0,\ \sigma^2 E x_i x_i'),$$
by the CLT for inid sequences, where
$$n^{-1}\sum_{i=1}^n \mathrm{Var}^*(x_i u_i^*) = n^{-1}\sum_{i=1}^n x_i x_i' E^* u_i^{*2} = \Big(n^{-1}\sum_{i=1}^n x_i x_i'\Big)\Big(n^{-1}\sum_{i=1}^n \hat u_i^2\Big) \xrightarrow{p} \sigma^2 E x_i x_i'.$$
So
$$n^{1/2}(\hat\beta^* - \hat\beta) \xrightarrow{d} N\big(0,\ \sigma^2 (E x_i x_i')^{-1}\big).$$
Thus, the iid residual-based bootstrap is only valid when conditional homoskedasticity (CH, i.e. $E(u_i^2|x_i) = \sigma^2$) holds. Under CH, in the original data world,
$$n^{1/2}(\hat\beta - \beta) \xrightarrow{d} N\big(0,\ \sigma^2 (E x_i x_i')^{-1}\big).$$
Without conditional homoskedasticity,
$$n^{1/2}(\hat\beta - \beta) \xrightarrow{d} N\big(0,\ (E x_i x_i')^{-1}(E x_i x_i' u_i^2)(E x_i x_i')^{-1}\big).$$
Thus in general, without imposing CH, $n^{1/2}(\hat\beta^* - \hat\beta)$ does not provide a valid approximation of $n^{1/2}(\hat\beta - \beta)$, distribution-wise.
Wild Bootstrap

Now we consider a different type of bootstrap.

Wild bootstrap: let $u_i^* = \hat u_i e_i$, where $e_i \overset{iid}{\sim} (0, 1)$. Under this scheme, the randomness in the bootstrap world comes from $e_i$ (the original data are considered fixed, as is common in bootstrap analysis). There is no need for centered residuals.

Several auxiliary variables $e_i$ are used in practice:

- Rademacher two-point random variables: $P(e_i = 1) = P(e_i = -1) = 1/2$.
- Mammen's two-point distribution: $P\big(e_i = \tfrac{1+\sqrt5}{2}\big) = \tfrac{\sqrt5-1}{2\sqrt5}$ and $P\big(e_i = \tfrac{1-\sqrt5}{2}\big) = \tfrac{\sqrt5+1}{2\sqrt5}$.
- Some use $e_i \sim N(0, 1)$.

Rademacher two-point random variables are recommended for $e_i$.
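A sketch of the wild bootstrap with Rademacher weights (numpy; $X$ and $\hat u_i$ are held fixed, only $e_i$ is random):

```python
import numpy as np

def wild_bootstrap(y, X, B=999, rng=None):
    """Wild bootstrap: y* = X beta_hat + u_hat * e, with Rademacher e_i = +/-1."""
    rng = np.random.default_rng(rng)
    n = len(y)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    u_hat = y - X @ beta_hat
    betas = np.empty((B, X.shape[1]))
    for b in range(B):
        e = rng.choice([-1.0, 1.0], size=n)      # Rademacher auxiliary variables
        y_star = X @ beta_hat + u_hat * e
        betas[b] = np.linalg.solve(X.T @ X, X.T @ y_star)
    return beta_hat, betas

rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * (1 + np.abs(x))   # heteroskedastic errors
beta_hat, betas_star = wild_bootstrap(y, X, rng=1)
se_wild = betas_star[:, 1].std(ddof=1)           # heteroskedasticity-robust SE
```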
Then
$$n^{-1/2}\sum_{i=1}^n x_i u_i^* = n^{-1/2}\sum_{i=1}^n x_i \hat u_i e_i \xrightarrow{d} N\Big(0,\ \mathrm{plim}\ n^{-1}\sum_{i=1}^n x_i x_i' \hat u_i^2\Big) = N(0,\ E x_i x_i' u_i^2).$$
Thus
$$n^{1/2}(\hat\beta^* - \hat\beta) \xrightarrow{d} N\big(0,\ (E x_i x_i')^{-1}(E x_i x_i' u_i^2)(E x_i x_i')^{-1}\big). \tag{11}$$
Now the bootstrap is asymptotically valid. E.g. a confidence interval for $\beta_j$ can be constructed (using the distribution of $n^{1/2}(\hat\beta_j^* - \hat\beta_j)$ to approximate that of $n^{1/2}(\hat\beta_j - \beta_j)$).
Recovering the DGP is important; the construction of the statistic is not

Bootstrapping the conditional homoskedasticity-based test is still valid under heteroskedasticity if the wild bootstrap is used. Consider the case $k = 1$. With the original data, the standard t-statistic satisfies
$$(\hat\beta - \beta)\,\big[\hat\sigma^2 \big(\textstyle\sum_{i=1}^n x_i^2\big)^{-1}\big]^{-1/2} \xrightarrow{d} N\Big(0,\ \frac{(E x_i^2)^{-1}(E x_i^2 u_i^2)(E x_i^2)^{-1}}{\sigma^2 (E x_i^2)^{-1}}\Big).$$
In the bootstrap world (using the wild bootstrap),
$$(\hat\beta^* - \hat\beta)\,\big[\hat\sigma^{*2} \big(\textstyle\sum_{i=1}^n x_i^2\big)^{-1}\big]^{-1/2} \xrightarrow{d} N\Big(0,\ \frac{(E x_i^2)^{-1}(E x_i^2 u_i^2)(E x_i^2)^{-1}}{\sigma^2 (E x_i^2)^{-1}}\Big),$$
using the result (11), and
$$\hat\sigma^{*2} = n^{-1}\sum_{i=1}^n \hat u_i^{*2} = n^{-1}\sum_{i=1}^n \hat u_i^2 e_i^2 = n^{-1}\sum_{i=1}^n \hat u_i^2 + o_p(1) \xrightarrow{p} \sigma^2.$$
Restricted Bootstrap

When performing a test, we can use the restricted estimator $\tilde\beta$ (under the null hypothesis) in the bootstrap DGP. Thus $\tilde\beta$ is the true value in the bootstrap universe. This subsection and the next both highlight the role played by the true value used in the bootstrap DGP.

We illustrate this idea in the regression model:
$$y_i = x_i'\beta + u_i.$$
Suppose $H_0: \beta = \beta_0$. In this case, $\tilde\beta = \beta_0$. More generally, we can consider $H_0: R\beta = r$; recall how $\tilde\beta$ is obtained in this case (CLS or EMD).
Bootstrap DGP:
$$y_i^* = x_i'\beta_0 + u_i^*,$$
where $u_i^* = \tilde u_i e_i$, with $\tilde u_i = y_i - x_i'\beta_0$ (the restricted residual) and $e_i \overset{iid}{\sim} (0, 1)$ (wild bootstrap). Bootstrap data: $\{y_i^*, x_i : i = 1, \ldots, n\}$.

Bootstrap test: use the distribution of $n^{1/2}(\hat\beta^* - \beta_0)$ to approximate the distribution of $n^{1/2}(\hat\beta - \beta_0)$.

There is evidence that the restricted bootstrap is more precise than the unrestricted bootstrap (which we covered before).
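A sketch of a restricted wild-bootstrap test of $H_0: \beta = \beta_0$. The sup-$|\cdot|$ statistic below is our own simple choice for illustration, not the statistic of the notes; the key point is that $\beta_0$ is the true value in the bootstrap DGP:

```python
import numpy as np

def restricted_wild_test(y, X, beta0, B=499, level=0.05, rng=None):
    """Restricted wild bootstrap: DGP uses restricted residuals
    u~_i = y_i - x_i' beta0, so beta0 is true in the bootstrap world."""
    rng = np.random.default_rng(rng)
    n = len(y)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    u_tilde = y - X @ beta0                      # restricted residuals
    stat = np.sqrt(n) * np.abs(beta_hat - beta0).max()
    boot = np.empty(B)
    for b in range(B):
        e = rng.choice([-1.0, 1.0], size=n)      # Rademacher weights
        y_star = X @ beta0 + u_tilde * e
        b_star = np.linalg.solve(X.T @ X, X.T @ y_star)
        boot[b] = np.sqrt(n) * np.abs(b_star - beta0).max()
    return bool(stat > np.quantile(boot, 1 - level))

rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * (1 + np.abs(x))
reject = restricted_wild_test(y, X, beta0=np.array([1.0, 0.0]), rng=1)  # false null
```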
Simple calculations indicate the validity:
$$n^{1/2}(\hat\beta^* - \beta_0) \xrightarrow{d} N\big(0,\ (E x_i x_i')^{-1}(E x_i x_i' u_i^2)(E x_i x_i')^{-1}\big), \tag{12}$$
since
$$\hat\beta^* = \Big(\sum_{i=1}^n x_i x_i'\Big)^{-1}\Big(\sum_{i=1}^n x_i y_i^*\Big) = \Big(\sum_{i=1}^n x_i x_i'\Big)^{-1}\Big(\sum_{i=1}^n x_i(x_i'\beta_0 + u_i^*)\Big) = \beta_0 + \Big(\sum_{i=1}^n x_i x_i'\Big)^{-1}\Big(\sum_{i=1}^n x_i u_i^*\Big),$$
where
$$n^{-1/2}\sum_{i=1}^n x_i u_i^* = n^{-1/2}\sum_{i=1}^n x_i \tilde u_i e_i \xrightarrow{d} N\Big(0,\ \mathrm{plim}\ n^{-1}\sum_{i=1}^n x_i x_i' \tilde u_i^2\Big) = N(0,\ E x_i x_i' u_i^2),$$
noting that $\tilde u_i = u_i$ under $H_0: \beta = \beta_0$.

The consistency of the test follows from the fact that (12) is true regardless of $H_0$. (On the other hand, $H_0$ needs to be true so that $n^{1/2}(\hat\beta - \beta_0) \xrightarrow{d} N(0, (E x_i x_i')^{-1}(E x_i x_i' u_i^2)(E x_i x_i')^{-1})$; otherwise $n^{1/2}(\hat\beta - \beta_0)$ would diverge.)
Bootstrapping F Test

This is an example of bootstrapping criterion function-based tests, where the criterion function involves restrictions. Consider the regression model:
$$y_i = x_i'\beta + u_i,$$
where, for simplicity, we assume conditional homoskedasticity. Suppose $H_0: R\beta = r$, where $R$ is $q \times k$. The F-statistic:
$$F = \frac{(\tilde\sigma^2 - \hat\sigma^2)/q}{\hat\sigma^2/(n - k)},$$
where $\tilde\sigma^2$ is the restricted residual-variance estimator.
Bootstrap sample: $\{y_i^*, x_i\}$ (e.g. iid residual-based). The bootstrap F-statistic:
$$F^* = \frac{(\tilde\sigma^{*2} - \hat\sigma^{*2})/q}{\hat\sigma^{*2}/(n - k)},$$
where $\tilde\sigma^{*2}$ is calculated using $\{y_i^*, x_i\}$ and imposing the restriction $R\beta = R\hat\beta$. (A common mistake is to still impose the restriction $R\beta = r$.) [Think about how it works if $\beta = (\beta_1, \beta_2)'$ and the null is $\beta_2 = 0$.]

Then, as always, the distribution of $F^*$ is used to approximate the distribution of $F$.
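A sketch of the residual-based bootstrap F test (numpy; helper names ours). Note the bootstrap statistic imposes $R\beta = R\hat\beta$, the truth in the bootstrap world:

```python
import numpy as np

def f_stat(y, X, R, r):
    """Classical F statistic for H0: R beta = r (homoskedastic form)."""
    n, k = X.shape
    q = R.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)
    ssr_u = ((y - X @ beta_hat) ** 2).sum()
    # restricted LS: beta~ = beta^ - (X'X)^{-1} R' [R (X'X)^{-1} R']^{-1} (R beta^ - r)
    adj = XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, R @ beta_hat - r)
    ssr_r = ((y - X @ (beta_hat - adj)) ** 2).sum()
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))

def bootstrap_f_test(y, X, R, r, B=499, rng=None):
    """iid residual bootstrap of F; the bootstrap F imposes R beta = R beta_hat."""
    rng = np.random.default_rng(rng)
    n = len(y)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    u_c = y - X @ beta_hat
    u_c = u_c - u_c.mean()
    F = f_stat(y, X, R, r)
    boot = np.empty(B)
    for b in range(B):
        y_star = X @ beta_hat + u_c[rng.integers(0, n, size=n)]
        boot[b] = f_stat(y_star, X, R, R @ beta_hat)   # NOT r
    return F, (boot >= F).mean()                       # bootstrap p-value

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 0.5 + 2.0 * x + rng.normal(size=n)
R, r = np.array([[0.0, 1.0]]), np.array([0.0])         # H0: slope = 0 (false here)
F, pval = bootstrap_f_test(y, X, R, r, rng=1)
```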
An alternative is to consider the restricted bootstrap. Restricted bootstrap sample: $\{y_i^*, x_i\}$ (e.g. iid residual-based). The restricted bootstrap F-statistic:
$$F^* = \frac{(\tilde\sigma^{*2} - \hat\sigma^{*2})/q}{\hat\sigma^{*2}/(n - k)},$$
where $\tilde\sigma^{*2}$ is calculated using $\{y_i^*, x_i\}$ and imposing the restriction $R\beta = r$. The distribution of $F^*$ is used to approximate the distribution of $F$.
Final words: Parametric Bootstrap

What we have discussed is called the nonparametric bootstrap, since the empirical CDF $\hat F$ is a nonparametric estimator of $F$. The nonparametric bootstrap is what most applications in econometrics use.

Parametric bootstrap: utilizes the functional form of $F$. Suppose $y_i \sim F(y|\beta)$, where $F$ has a known form but $\beta$ is unknown. The parametric bootstrap draws $y_i^* \sim F(y|\hat\beta)$, where $\hat\beta$ is the maximum likelihood (ML) estimator of $\beta$. Denote by $\hat\beta^*$ the MLE using the bootstrap data $\{y_i^* : i = 1, \ldots, n\}$. We then use the distribution of $n^{1/2}(\hat\beta^* - \hat\beta)$ to approximate the distribution of $n^{1/2}(\hat\beta - \beta)$.
A second example: the Gaussian linear regression model, $y_i|x_i \sim N(x_i'\beta, \sigma^2)$. MLE: $\hat\beta$, $\hat\sigma^2$. Parametric bootstrap:
$$y_i^* = x_i'\hat\beta + u_i^*,$$
where $u_i^* \sim N(0, \hat\sigma^2)$. Think about how this differs from the wild bootstrap when we use the auxiliary variable $e_i \sim N(0, 1)$.
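A sketch of the parametric bootstrap for the Gaussian linear regression model (numpy; errors are drawn from the fitted normal rather than resampled):

```python
import numpy as np

def parametric_bootstrap_gaussian(y, X, B=999, rng=None):
    """Parametric bootstrap: y* = X beta_hat + u*, u* ~ N(0, sigma2_hat),
    with (beta_hat, sigma2_hat) the Gaussian ML estimates."""
    rng = np.random.default_rng(rng)
    n, k = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    sigma2_hat = ((y - X @ beta_hat) ** 2).mean()    # ML estimate (divide by n)
    betas = np.empty((B, k))
    for b in range(B):
        y_star = X @ beta_hat + rng.normal(scale=np.sqrt(sigma2_hat), size=n)
        betas[b] = np.linalg.solve(X.T @ X, X.T @ y_star)
    return beta_hat, betas

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 0.5 + 1.0 * x + rng.normal(size=n)
beta_hat, betas_star = parametric_bootstrap_gaussian(y, X, rng=1)
```

Unlike the wild bootstrap with $e_i \sim N(0,1)$, which scales each fitted residual $\hat u_i$, here every bootstrap error has the same variance $\hat\sigma^2$.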
A permutation test

Consider the single-regressor linear regression
$$y_i = \beta_0 + \beta_1 x_i + u_i. \tag{13}$$
We test $H_0: \beta_1 = 0$. While we can use the asymptotic approach or the bootstrap, a permutation test can also be used.

The idea is that if $\beta_1 = 0$, then the order of $\{x_i : i = 1, \ldots, n\}$ shouldn't matter, provided the order of $\{y_i : i = 1, \ldots, n\}$ is kept unchanged.

Let $\pi(1), \ldots, \pi(n)$ be a permutation of $(1, \ldots, n)$. Suppose all permutations we consider are in the set $\Pi$; then $|\Pi| \le n!$. For each permutation $\pi$, compute the least squares estimator of $\beta_1$ using the data $\{x_{\pi(i)}, y_i\}$, denoted $\hat\beta_1^\pi$. If $\beta_1 = 0$, $\hat\beta_1^\pi$ and $\hat\beta_1$ (the original estimator) should come from the same distribution.
Here the test statistic is $\hat\beta_1$. We approximate its distribution $G_n$ by $G_n^\Pi$ (the empirical distribution of $\hat\beta_1^\pi$ over $\{\pi \in \Pi\}$). Denote the $\alpha$-th quantile of $G_n^\Pi$ by $q^\Pi(\alpha)$. We reject $H_0$ at level 5% if
$$\hat\beta_1 > q^\Pi(0.975) \quad \text{or} \quad \hat\beta_1 < q^\Pi(0.025).$$
If the approximation of $G_n$ by $G_n^\Pi$ is valid, then under the null, $P(\hat\beta_1 \in \text{rejection region}) = 0.05$.
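A sketch of the permutation test for $H_0: \beta_1 = 0$ (numpy; names ours; a random subset of $\Pi$ is used, as is standard in practice):

```python
import numpy as np

def permutation_test_slope(y, x, n_perm=999, level=0.05, rng=None):
    """Permutation test of H0: beta1 = 0 in y = b0 + b1*x + u:
    permute x (without replacement), keep y fixed, compare slopes."""
    rng = np.random.default_rng(rng)

    def slope(xv):
        xc = xv - xv.mean()
        return (xc * (y - y.mean())).sum() / (xc ** 2).sum()

    b1 = slope(x)                                    # original estimate
    perm = np.array([slope(rng.permutation(x)) for _ in range(n_perm)])
    lo, hi = np.quantile(perm, [level / 2, 1 - level / 2])
    return bool(b1 > hi or b1 < lo)

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y_alt = 1.0 + 1.0 * x + rng.normal(size=n)           # beta1 = 1, null is false
reject_alt = permutation_test_slope(y_alt, x, rng=1)
```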
Compare with the bootstrap. The bootstrap resamples the pair $\{x_i, y_i\}$ (without re-matching within the pair). The permutation test resamples $x_i$ (without replacement), while using the same data (in the same order) $y_i$ for each permutation.

In general, if you are interested in $H_0: \beta_1 = \beta_1^0$, rewrite the model as
$$y_i - \beta_1^0 x_i = \beta_0 + (\beta_1 - \beta_1^0) x_i + u_i.$$
Then the permutation test is implemented the same as above, except using the outcome $y_i - \beta_1^0 x_i$.
Asymptotics of the permutation test

Although such a test is widely used, it is only asymptotically valid under conditional homoskedasticity (i.e. $E(u_i^2|x_i) = E(u_i^2) = \sigma^2$). Asymptotic validity here means that $G_n$ and $G_n^\Pi$ converge to the same distribution.

An important implication of CH is: for any $\pi \in \Pi$ and each $i$,
$$E\big[u_i^2 (x_{\pi(i)} - Ex)^2\big] = \sigma^2 \mathrm{Var}(x). \tag{14}$$
This is because
$$E\big[u_i^2 (x_{\pi(i)} - Ex)^2\big] = \begin{cases} E\big[u_i^2 (x_i - Ex)^2\big] \overset{CH}{=} \sigma^2 \mathrm{Var}(x), & \text{if } \pi(i) = i; \\[2pt] E u_i^2 \cdot E(x_{\pi(i)} - Ex)^2 \overset{iid\ data}{=} \sigma^2 \mathrm{Var}(x), & \text{if } \pi(i) \ne i. \end{cases}$$
We can show that under $H_0$,
$$n^{1/2}\hat\beta_1^\pi \xrightarrow{d} N\big(0,\ \sigma^2 \mathrm{Var}(x)^{-1}\big). \tag{15}$$
To prove (15), by the FWL theorem, under $H_0$,
$$\hat\beta_1^\pi = \Big[\sum_{i=1}^n (x_{\pi(i)} - \bar x)^2\Big]^{-1}\sum_{i=1}^n (x_{\pi(i)} - \bar x)(y_i - \bar y) \overset{H_0}{=} \Big[\sum_{i=1}^n (x_{\pi(i)} - \bar x)^2\Big]^{-1}\sum_{i=1}^n (x_{\pi(i)} - \bar x) u_i = \Big[\sum_{i=1}^n (x_i - \bar x)^2\Big]^{-1}\sum_{i=1}^n (x_{\pi(i)} - \bar x) u_i.$$
We then have
$$n^{1/2}\hat\beta_1^\pi \xrightarrow{d} N(0, V_\pi), \tag{16}$$
where
$$V_\pi = \mathrm{Var}(x_i)^{-2} \lim_{n\to\infty} n^{-1}\sum_{i=1}^n E u_i^2 (x_{\pi(i)} - Ex_i)^2 \tag{17}$$
$$\overset{(14)}{=} \mathrm{Var}(x_i)^{-2}\, E u_i^2 (x_{\pi(i)} - Ex_i)^2 = \sigma^2 \mathrm{Var}(x)^{-1}. \tag{18}$$
In (16) above, we have used the CLT for independent but not identically distributed data. E.g. suppose $\pi: (1, 2, 3) \to (3, 2, 1)$. Then $(x_3 - Ex)u_1$ and $(x_2 - Ex)u_2$ are not identically distributed if there is (higher-order) dependence between $x_2$ and $u_2$. The two terms are independent (by considering the correlation of any moments of the two). Thus (15) holds.
If we allow conditional heteroskedasticity, $V_\pi$ takes the general form (17) (instead of (18)). This is because $E u_i^2 (x_{\pi(i)} - Ex_i)^2$ may differ across $i$ for a particular $\pi$, which happens if $\pi$ does not move every unit.

- For $i$ such that $\pi(i) \ne i$: $E u_i^2 (x_{\pi(i)} - Ex)^2 = E u_i^2 \cdot E(x_{\pi(i)} - Ex)^2 = \sigma^2 \mathrm{Var}(x)$.
- But for $i$ such that $\pi(i) = i$: $E u_i^2 (x_{\pi(i)} - Ex)^2 \ne \sigma^2 \mathrm{Var}(x)$ in general.

If $\pi$ moves every unit (like $\pi: (1, 2, 3, 4) \to (2, 3, 4, 1)$), then $V_\pi = \sigma^2 \mathrm{Var}(x)^{-1}$.

Thus $V_\pi$ depends on $\pi$, and we don't expect the distribution of $\hat\beta_1^\pi$ over $\Pi$ to provide a useful approximation.