Monte Carlo, Bootstrap, and Jackknife Estimation
Assume that your true model is
y = Xβ + u,
(1.1)
where u is i.i.d. with 1) E(u|X) = 0 and 2) E(uu′ |X) = σ 2 I, that is, the conditional mean of the error is zero and there is no autocorrelation or heteroskedasticity
conditional on X. Then, using assumption 1), the ordinary least squares (OLS) estimator of β,
β̂ = (X′X)−1X′y, is unbiased. You will want to estimate the variance of β̂. Using assumption 2),
an estimator of var(β̂) = σ²(X′X)−1 is
σ̂ 2 (X ′ X)−1 ,
(1.2)
where σ̂ 2 = û′ û/(N − K), û = y − X β̂, N = the number of observations, and K = the
number of regressors. To understand what is meant by the var(β̂) and its estimator,
consider the following Monte Carlo procedure. Keep in mind that you would never
want to apply this procedure to the classical linear model in (1.1) for actual data,
because you can easily evaluate (1.2).
1. MONTE CARLO ESTIMATION OF STANDARD ERRORS OF β̂: (= positive
square root of the estimated variance).
a. Assume a value for β, which is otherwise unobservable. You also select a matrix
of values for X , which you hold constant over repeated trials.
b. Draw u randomly with replacement from some distribution you assume to be
correct, using a random number generator. This use of a random number generator gives rise to the term “Monte Carlo,” after the city famed for its roulette wheels and games of
chance.
c. Compute y from equation (1.1).
d. Compute β̂ by regressing your generated y on X, obtaining
β̂ = (X ′ X)−1 X ′ y.
(1.3)
e. Repeat steps (b)-(d) many times (say 10,000), holding β and X constant. You have
then generated 10,000 drawings of the random variables u and y through
equation (1.1), from which you can compute 10,000 estimates β̂ using (1.3).
The sample variance of β̂ over the 10,000 outcomes is your sample measure of the
population variance of β̂.
f. The main use of the Monte Carlo method is to compute the bias and mean-square error of your estimator when it is difficult to do so analytically. However,
it is also useful for demonstrating omitted variable bias and related results to
econometrics students. Keep in mind that the assumptions made in the Monte
Carlo method may make your results specific to your exact model. A sketch of the procedure follows.
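In Python/NumPy, steps (a)-(e) can be sketched as follows; the values chosen for β, σ, N, and the number of trials are arbitrary assumptions used only for illustration.

import numpy as np

# Monte Carlo sketch: fixed X and beta, repeated draws of u (illustrative values).
rng = np.random.default_rng(0)
N, K, R = 100, 2, 10_000                 # observations, regressors, Monte Carlo trials
beta = np.array([1.0, 0.5])              # step a: assumed "true" beta
X = np.column_stack([np.ones(N), rng.normal(size=N)])   # step a: X held fixed over trials
sigma = 2.0                              # assumed error standard deviation

betas = np.empty((R, K))
for r in range(R):
    u = rng.normal(0.0, sigma, size=N)               # step b: draw u from the assumed distribution
    y = X @ beta + u                                  # step c: generate y from (1.1)
    betas[r] = np.linalg.solve(X.T @ X, X.T @ y)      # step d: OLS estimate, equation (1.3)

# Step e: the standard deviation of the R estimates is the Monte Carlo standard error;
# compare it with the analytic standard errors from (1.2).
print(betas.std(axis=0, ddof=1))
print(np.sqrt(sigma**2 * np.diag(np.linalg.inv(X.T @ X))))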
2. BOOTSTRAP ESTIMATION OF THE STANDARD ERRORS OF β̂.
The term bootstrap implies that you are going to pull yourself up by your bootstraps.
The wide range of bootstrap methods falls into two categories: 1) methods that allow computation of standard errors when the analytical formulas are hard to derive and 2) bootstrap
methods that lead to better small-sample approximations.
Here you are confined to one actual data set and wish to resample from the empirical
distribution of the residuals or the original {X, y} data, rather than assume some distribution of the true error as with Monte Carlo analysis. Given (1.1) as your true model, you
could easily evaluate (1.2) and not do any bootstrapping. With more complex models containing non-normal error terms and non-linearities in β, or in two-step models where you
need to correct the estimated standard errors of the second-step estimators, the derivation
of analytical formulas for the variance of β̂ is complex. Examples are two-step M estimators, two-step panel data estimators, and two-step logit or probit estimators. In these
cases θ̂, a second-step estimator, is a function of parameters that are estimated in the first
step. The bootstrap will adjust for this. If there is heteroskedasticity in the model and your software does not compute heteroskedasticity-consistent (HC) standard errors, the wild
and pairs bootstrap estimators make the HC correction of the estimated standard errors.
With clustered data, cluster-robust standard errors can be obtained by resampling the
clusters via bootstrapping. Theory and Monte Carlo evidence indicate that the bootstrap
estimates are more accurate (measured by the size and power of a t-test based on the
estimated standard error) in small samples than the asymptotic formula, when an asymptotically pivotal statistic is employed (one whose asymptotic normal distribution does not
depend on unknown parameters). Otherwise, there is no guarantee of a gain in accuracy.
However, there is usually a gain in accuracy even if an asymptotically pivotal statistic
is not employed. A nice summary is found in “Bootstrap Inference in Econometrics” by
James MacKinnon, Dept. of Economics Working Paper, Queens Univ., June, 2002. Also,
see J. L. Horowitz, The Bootstrap, Ch. 52, Handbook of Econometrics, J.J. Heckman and
E. Leamer editors, Vol. 5, 2001 for technical derivations.
There are three basic non-parametric bootstrap methods we will focus on: the residual (naive)
bootstrap, the pairs bootstrap, and the wild bootstrap. These are in contrast to the less
popular parametric bootstrap. The three methods are most easily explained for the simple
model (1.1). As with the Monte Carlo method, you assume that the model generating
your data is the same as in (1.1). However, now you do not assume knowledge of β or
u and do not generate random data from (1.1). Instead you use the estimator β̂ and the
original data {X, y}.
1. Bootstrap Methods
1.1 Non-Parametric: Residual Bootstrap
a. Estimate β̂ = (X ′ X)−1 X ′ y.
b. Compute û = y − X β̂. You work with û instead of assuming the distribution
of u as in Monte Carlo estimation.
c. Draw with replacement a sample of size N using a discrete uniform random
number generator U [1, N ], where N is your sample size. Let these random
numbers be represented by z1 , . . . , zN . Generate element u∗n as element zn of
û, n = 1, . . . , N . What this means is that each element of û has probability
1/N of being drawn. See the Residual Bootstrap example in the Stata do
file called monte carlo.do.
d. Treating y ∗ = X β̂ + u∗ as your true model, compute y ∗ .
e. Compute β ∗ = (X ′ X)−1 X ′ y ∗ .
f. Repeat (c)-(e) B times. See MacKinnon for details.
g. Compute the square root of the sample variance of these β∗ estimates. This
is the estimate of the standard error of β̂. With B bootstrap replications
β1∗, . . . , βB∗, compute
s²β̂,Boot = (1/(B − 1)) Σ_{b=1}^{B} (βb∗ − β̄∗)²,
where β̄∗ = (1/B) Σ_{b=1}^{B} βb∗.
h. Take the square root of s²β̂,Boot to get the bootstrap estimate of the standard
error.
i. This bootstrap provides no asymptotic refinement (an improved approximation to the finite-sample distribution of an asymptotically pivotal statistic),
since its distribution depends on the unknown parameter defining the mean
and variance of β̂. That is, there will be no guarantee of an improvement in
finite-sample performance. However, such an improvement usually obtains
anyway. This method can be very useful in computing adjusted standard
errors with two-step models or in computing cluster-robust standard errors by
resampling clusters (a sketch follows below).
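In Python/NumPy, steps (a)-(h) can be sketched as follows; the arrays X and y are assumed given, and the function name and the default B = 399 are illustrative choices rather than part of the notes.

import numpy as np

def residual_bootstrap_se(X, y, B=399, seed=0):
    rng = np.random.default_rng(seed)
    N = len(y)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # step a: OLS on the original data
    u_hat = y - X @ beta_hat                        # step b: OLS residuals
    beta_star = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.integers(0, N, size=N)            # step c: draw indices with probability 1/N each
        y_star = X @ beta_hat + u_hat[idx]          # step d: y* = X beta_hat + u*
        beta_star[b] = np.linalg.solve(X.T @ X, X.T @ y_star)   # step e: beta*
    return beta_star.std(axis=0, ddof=1)            # steps f-h: bootstrap standard errors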
1.2 Non-Parametric: Pairs Bootstrap
a. Follow step a of section 1.1 above.
b. Then draw pairs randomly with replacement from {X, y}, where the probability of any
pair being drawn is 1/N, to obtain {X∗, y∗}.
c. Then use the {X∗, y∗} data to obtain the pairs estimator
βp∗ = (X∗′X∗)−1 X∗′y∗.
d. Note that the pairs bootstrap produces an HC covariance matrix. See Lancaster (2003) for a proof of this. See the Pairs Estimator in the Stata file
called monte carlo.do; a sketch also follows below.
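In Python/NumPy, the pairs bootstrap can be sketched as follows under the same assumptions (given X and y; the function name and defaults are illustrative):

import numpy as np

def pairs_bootstrap_se(X, y, B=399, seed=0):
    rng = np.random.default_rng(seed)
    N = len(y)
    beta_star = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.integers(0, N, size=N)            # resample whole (X_n, y_n) pairs, each with probability 1/N
        Xs, ys = X[idx], y[idx]
        beta_star[b] = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)    # pairs estimator beta*_p
    return beta_star.std(axis=0, ddof=1)            # bootstrap (HC) standard errors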
1.3 Non-Parametric: Wild Bootstrap
a. The wild bootstrap also produces an HC covariance matrix; see MacKinnon
(2002) for details.
b. The wild first generates
yn∗ = Xn β̂ + f (ûn )vn∗ ,
(1.4)
where
f(ûn) = ûn / (1 − hn)^{1/2}   (1.5)
and hn is the nth diagonal element of X(X′X)−1X′. We do this normalization so that, if un is homoskedastic, then the normalized residual in (1.5) is
homoskedastic. To see this, note that under homoskedasticity var(ûn) = σ²(1 − hn),
where (1 − hn), the nth diagonal element of I − X(X′X)−1X′, is sometimes called mn.
c. The best approach to specifying vn∗ is to use the Rademacher distribution
(See Davidson and Flachaire (2001)):
vn∗ = 1 with probability 1/2, and vn∗ = −1 with probability 1/2.   (1.6)
d. Now vn∗ has E(vn∗ ) = 0, E(vn∗2 ) = 1, E(vn∗3 ) = 0, and E(vn∗4 ) = 1. Since vn∗
and ûn are independent, the mean of the composite residual is zero, which
preserves E(ûn ) = 0. This is a nice property and if we take Xn as given, this
implies unbiasedness of β ∗ .
e. One can prove that var(wz) = var(w)var(z) assuming independence of w
and z and E(w) = E(z) = 0. Then the variance of the composite residual
equals the variance of ûn (so the variance of ûn is preserved), the skewness
of ûn is eliminated, and the kurtosis of ûn is preserved. Further, Wu (1986) and
Mammen (1993) show that the asymptotic distribution of their version of
the wild bootstrap is the same as the asymptotic distribution of various
statistics. These asymptotic refinements are due to their wild bootstrap's
taking account of the skewness of ûn. However, their version of the wild bootstrap
ignores kurtosis.
f. Now follow steps (e)-(h) of section 1.1 using the wild data for y∗ generated
in step (b) of this section. See the Wild Estimator in the Stata do file called
monte carlo.do; a sketch also follows below.
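In Python/NumPy, the wild bootstrap with the Rademacher vn∗ of (1.6) can be sketched as follows (X and y assumed given; the function name and defaults are illustrative):

import numpy as np

def wild_bootstrap_se(X, y, B=399, seed=0):
    rng = np.random.default_rng(seed)
    N, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    u_hat = y - X @ beta_hat
    h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)     # leverages h_n: diagonal of X(X'X)^{-1}X'
    f_u = u_hat / np.sqrt(1.0 - h)                  # normalized residuals, equation (1.5)
    beta_star = np.empty((B, K))
    for b in range(B):
        v = rng.choice([-1.0, 1.0], size=N)         # Rademacher draws, equation (1.6)
        y_star = X @ beta_hat + f_u * v             # wild pseudo-data, equation (1.4)
        beta_star[b] = XtX_inv @ X.T @ y_star
    return beta_star.std(axis=0, ddof=1)            # bootstrap (HC) standard errors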
1.4 Pairs vs. Wild
Based on Atkinson and Cornwell, “Inference in Two-Step Panel Data Models with
Instruments and Time-Invariant Regressors: Bootstrap versus Analytic Estimators”, for
models with endogeneity, the wild has more accurate size and virtually the same power as
the pairs estimator in estimation of t-values for the second-step estimators. Both generally
outperform the asymptotic formula in terms of size and power. In a linear model context
without panel data, Davidson and Flachaire (2001) find that the wild often outperforms
the pairs when the error is heteroskedastic.
1.5 Parametric Bootstrap
If it is known that yn ∼ Normal[µ, σ²], then we could obtain B bootstrap samples of size N by drawing from the Normal[µ̂, s²] distribution. This is an example of a parametric bootstrap.
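As an illustration in Python/NumPy, assuming a given sample y and taking the sample mean of each bootstrap sample as the statistic of interest (an arbitrary choice), a parametric bootstrap can be sketched as:

import numpy as np

def parametric_bootstrap_means(y, B=399, seed=0):
    rng = np.random.default_rng(seed)
    mu_hat, s = y.mean(), y.std(ddof=1)                  # estimate mu and sigma from the data
    samples = rng.normal(mu_hat, s, size=(B, len(y)))    # B samples of size N from Normal[mu_hat, s^2]
    return samples.mean(axis=1)                          # bootstrap distribution of the sample mean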
2. Number of Bootstrap Draws
The bootstrap asymptotics rely on big N , even if B is small. However, the bootstrap
is more accurate with big B. How large B should be depends on the simulation error you
can accept in your work. Davidson and MacKinnon recommend B = 399 for a type I
error of .05 and B = 1,499 for tests at a level of .01. If you are performing bootstrapping
within a Monte Carlo analysis, then B = 399 is adequate. You need α(B + 1)
to be an integer. Note: if you assume a two-sided test with α = .05, then for
the upper tail, 399 × .025 ≈ 9.98 is the theoretical number of significant t-values you would
expect if the size were correct. You would array the t-values from high to low. With 400
bootstrap draws, 400 × .025 = 10, which says that you should have 10 t-values equal to 1.96
or greater to have correct size. However, if the 10th-ranked t-value is the last t-value
greater than or equal to 1.96, should it belong to one set or the
other? It sits on the cusp. Since .025 of 399 is about 9.98, which is not an integer,
the ambiguity is eliminated. This is not a major
issue in my opinion.
3. Bias Adjustment Using The Bootstrap or Jackknife
In small samples many sandwich estimators may be biased. Weak instruments may
also cause bias. We can correct for these biases using the bootstrap or the jackknife via
the following:
a. Since the bootstrap estimator of β̂ is (1/B) Σ_{b=1}^{B} βb∗, we can compute the bias-corrected
estimator of β̂ as β̂ − ((1/B) Σ_{b=1}^{B} βb∗ − β̂) = 2β̂ − (1/B) Σ_{b=1}^{B} βb∗. The intuition is
that since we do not know β, we treat β̂ as the “true” value and determine the
bias of the bootstrap estimator relative to this value. We then adjust β̂ by this
computed bias, assuming that the bias of the bootstrap estimator relative to β̂ is
the same as the bias of β̂ relative to β.
b. We can compute the jackknife estimator of the standard deviation of β̂ for a
sample of size N, n = 1, . . . , N, by computing N jackknife estimates of β obtained
by successively dropping observation n and recomputing βJ,n , where J stands for
Jackknife. Then compute the variance of the N estimates and multiply by N − 1
to get the estimated variance of β̂. Take the square root to get the estimated
standard error. We can employ the jackknife two-stage-least-squares (JK2SLS)
estimator of Hahn, J., and J. Hausman (2003), “Weak Instruments: Diagnosis
and Cures in Empirical Econometrics,” American Economic Review Papers and
Proceedings 93: 118–125, to correct for the bias caused by weak instruments.
The formula for the jackknife bias correction is given in Shao and Tu (1995). To
compute the jackknife bias correction for the estimated coefficients, let β̂ be the
estimator of β for a sample of size N . First compute N jackknife estimates of
β̂ obtained by successively dropping one observation and recomputing β̂. Call
each of these N estimates βJ,n, n = 1, . . . , N, and their average β̄J = (1/N) Σ_{n=1}^{N} βJ,n.
Define the jackknife bias estimator as
BIASJ = (N − 1)(β̄J − β̂).   (1.7)
Then the jackknife bias-adjusted (BA) estimator of β is
β̂BA = β̂ − BIASJ = N β̂ − (N − 1)β̄J.   (1.8)
Again, the intuition is that since we do not know β, we treat β̂ as the “true” value
and determine the bias of the jackknife estimator relative to this value. We then
adjust β̂ by this computed bias, assuming that the bias of the jackknife estimator
relative to β̂ is the same as the bias of β̂ relative to β.
c. The jackknife uses fewer computations than the bootstrap when N < B, but is
outperformed by the bootstrap as B → ∞. A sketch of the jackknife standard error and bias correction follows.
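Assuming given arrays X and y and applying the jackknife to the OLS coefficients (the function name is an illustrative choice), a Python/NumPy sketch of the standard error from part b. and the bias correction in (1.7)-(1.8) is:

import numpy as np

def jackknife_ols(X, y):
    N, K = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    beta_J = np.empty((N, K))
    for n in range(N):                              # drop observation n and re-estimate
        keep = np.arange(N) != n
        Xn, yn = X[keep], y[keep]
        beta_J[n] = np.linalg.solve(Xn.T @ Xn, Xn.T @ yn)
    beta_bar = beta_J.mean(axis=0)
    se_J = np.sqrt((N - 1) * beta_J.var(axis=0))    # variance of the N estimates (divisor N) times N - 1
    bias_J = (N - 1) * (beta_bar - beta_hat)        # equation (1.7)
    beta_BA = beta_hat - bias_J                     # equation (1.8): N*beta_hat - (N-1)*beta_bar
    return beta_BA, se_J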
4. Hypothesis Testing
Assume a model y = α + xβ + u. You can compute
t̃ = (β̂ − β)/sβ̂,Boot ,
using the bootstrap estimator of the standard deviation. For the specific null hypothesis
that β = 0 you would compute
t̃ = (β̂ − 0)/sβ̂,Boot .
While this is asymptotically valid so long as β ∗ and β̂ approach the true β, this will not
give you asymptotic refinements for any N. To obtain asymptotic refinement, we need to
compute asymptotically pivotal test statistics whose asymptotic normal distribution does
not depend on unknown parameters. This would require the studentized test statistic
based on the asymptotic standard error of β̂ = sθ̂∗ . We fashion this after the usual test
b
statistic
t = (β̂ − β)/sβ̂ ∼ N [0, 1],
that provides asymptotic refinement since it is asymptotically pivotal. This occurs because its asymptotic distribution does not depend on unknown parameters. To achieve
asymptotic refinement, you have to compute
t∗ = (β ∗ − β̂)/sβ̂ ∗ ,
b
where sβ̂ ∗ is the analytic or asymptotic estimator evaluated using the bootstrap data for
b
each draw, and then find t∗(1−α/2) and t∗(α/2) for the bootstrap after rank ordering the B
bootstrap draws. Use these to test the null hypothesis. For α = .05, take the (1 − α/2) = .975
quantile and the (α/2) = .025 quantile. These standardized t∗ values can then be
compared with the t̃ value. If t̃ > t∗(1−α/2) or t̃ < t∗(α/2), then the null hypothesis is
rejected. We are comparing one standardized statistic with another. However, computing
rejected. We are comparing one standardized statistic with another. However, computing
the analytic formula may be very difficult and one may have to use the bootstrap estimator
based on the standard deviations (sβ̂,Boot ), computed over the B bootstrap trials. This will
not yield asymptotic refinements but will probably still be better than using the asymptotic
formula.
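As an illustration, a studentized (bootstrap-t) test of the null hypothesis that a single OLS coefficient is zero can be sketched in Python/NumPy as follows, using the residual bootstrap and the analytic OLS standard error for studentization; X, y, the coefficient index k, and the defaults are assumptions made only for illustration.

import numpy as np

def bootstrap_t_test(X, y, k, B=399, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    N, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    u_hat = y - X @ beta_hat
    se_k = np.sqrt((u_hat @ u_hat / (N - K)) * XtX_inv[k, k])   # analytic s.e. of beta_hat_k
    t_tilde = beta_hat[k] / se_k                     # statistic for H0: beta_k = 0
    t_star = np.empty(B)
    for b in range(B):
        y_s = X @ beta_hat + u_hat[rng.integers(0, N, size=N)]  # residual-bootstrap data
        b_s = XtX_inv @ X.T @ y_s
        u_s = y_s - X @ b_s
        se_s = np.sqrt((u_s @ u_s / (N - K)) * XtX_inv[k, k])   # analytic s.e. on bootstrap data
        t_star[b] = (b_s[k] - beta_hat[k]) / se_s    # studentized t*, centered at beta_hat
    lo, hi = np.quantile(t_star, [alpha / 2, 1 - alpha / 2])    # t*(alpha/2) and t*(1-alpha/2)
    return t_tilde, lo, hi, bool(t_tilde < lo or t_tilde > hi)  # reject if outside the interval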
5. Bootstrapping Time Series Data
The bootstrap does not generally work well with time series data. The reason is that
the bootstrap relies on resampling from an iid distribution. With standard bootstrapping
you are randomly selecting among a set of residuals which follow some autocorrelation
process, thereby destroying that process. Two alternatives that can be employed are block
bootstrapping and the sieve bootstrap. With block bootstrapping, time-series blocks that
capture the autoregressive process are randomly selected and the entire block is resampled.
The sieve bootstrap works by fitting an autoregressive process of order p to the original
data and then generating bootstrap samples by randomly resampling the rescaled residuals,
which are assumed to be iid. Since the sieve imposes more structure on the DGP, it should
have better performance than the block bootstrap. As an example of the sieve, with p = 1
consider the model
yt = βxt + ut ,
(1.9)
where
ut = ρut−1 + ϵt ,
(1.10)
and ϵt is white noise. Now estimate β and ρ and obtain
ϵ̂t = ût − ρ̂ût−1 .
Bootstrap these residuals to get ϵ̂∗t, t = 1, . . . , T. Then recursively compute û∗t = ρ̂û∗t−1 + ϵ̂∗t
and hence y∗t = β̂xt + û∗t. Then regress y∗t on xt. A sketch follows below.
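In Python/NumPy, this p = 1 sieve bootstrap can be sketched as follows; x and y are assumed given series, and the initial condition û∗1 = û1 and the recentring of the innovations are simple illustrative choices rather than part of the notes.

import numpy as np

def sieve_bootstrap_samples(x, y, B=399, seed=0):
    rng = np.random.default_rng(seed)
    T = len(y)
    beta_hat = (x @ y) / (x @ x)                    # estimate beta in y_t = beta x_t + u_t
    u_hat = y - beta_hat * x
    rho_hat = (u_hat[1:] @ u_hat[:-1]) / (u_hat[:-1] @ u_hat[:-1])   # AR(1) coefficient
    eps_hat = u_hat[1:] - rho_hat * u_hat[:-1]      # fitted innovations, assumed iid
    eps_hat = eps_hat - eps_hat.mean()              # recentre before resampling
    samples = []
    for b in range(B):
        eps_star = rng.choice(eps_hat, size=T, replace=True)
        u_star = np.empty(T)
        u_star[0] = u_hat[0]                        # simple choice of initial condition
        for t in range(1, T):
            u_star[t] = rho_hat * u_star[t - 1] + eps_star[t]   # rebuild the AR(1) errors
        samples.append(beta_hat * x + u_star)       # y*_t = beta_hat x_t + u*_t
    return samples                                  # regress each y* on x to obtain beta*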
The moving-block bootstrap constructs overlapping blocks. There are n − b + 1 blocks: the first contains observations 1 through b, the second
contains observations 2 through b + 1, and the last contains observations n − b + 1 through n. The choice of
b is critical; in theory, it must increase as n increases. If blocks are too short, the bootstrap
samples cannot mimic the original sample, because dependence is broken whenever a new
block starts. If blocks are too long, the bootstrap samples are not random enough. A minimal sketch follows.
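In Python/NumPy, moving-block resampling for a series y with block length b can be sketched as follows; the function name is illustrative, and the spliced series is trimmed back to length n.

import numpy as np

def moving_block_bootstrap(y, b, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    blocks = np.array([y[i:i + b] for i in range(n - b + 1)])   # the n - b + 1 overlapping blocks
    k = int(np.ceil(n / b))                                     # blocks needed to reach length n
    picks = rng.integers(0, n - b + 1, size=k)                  # draw block starting points at random
    return np.concatenate(blocks[picks])[:n]                    # splice blocks and trim to length n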
For a nice discussion of the moving-block bootstrap and a comparison of this and
the other methods for time series, see Bootstrap Methods in Econometrics by James G.
MacKinnon, Department of Economics, Queen's University, Kingston, Ontario, September 2005,
http://www.econ.queensu.ca/faculty/mackinnon/.
6. Bootstrapping Panel Data
With both panel-data bootstrap methods (pairs and wild), three resampling schemes are available.
These are cross-sectional (also called panel bootstrap) resampling, temporal resampling
(also called block bootstrap resampling), and cross-sectional/temporal resampling. With
panel-bootstrap resampling, one randomly selects among N cross-sectional units and uses
all T observations for each. If cross-sectional dependence exists, one can select the relevant
blocks of cross-sectional units. With temporal resampling, one randomly selects temporal
units and uses all N observations for each. If temporal dependence exists, one can select
the relevant blocks of temporal units. Of course this choice is critical to the accuracy
of the bootstrap. With cross-sectional/temporal resampling, both methods are utilized.
Following Cameron and Trivedi (2005), in the fixed-T case consistent (as N → ∞) standard
errors can be obtained using the cross-sectional bootstrap method. Hence, we employ
this method for both the pairs and wild methods, where we assume no cross-sectional
or temporal dependence. Also, see Kapetanios (2008), A Bootstrap Procedure for Panel
Data Sets with Many Cross-Sectional Units, The Econometrics Journal 11, 377-95, who
shows that if the data do not exhibit cross-sectional dependence but exhibit temporal
dependence, then cross-sectional resampling is superior to block bootstrap resampling.
Further, he shows that cross-sectional resampling provides asymptotic refinements. Monte
Carlo results using these assumptions indicate the superiority of the cross-sectional method.
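As an illustration, cross-sectional (panel-bootstrap) resampling with pooled OLS as the estimator can be sketched in Python/NumPy as follows; X and y are assumed to be stacked unit by unit (all T rows of unit 1, then unit 2, and so on), and the function name, the pooled-OLS choice, and the defaults are assumptions made only for illustration.

import numpy as np

def panel_bootstrap_se(X, y, N, T, B=399, seed=0):
    rng = np.random.default_rng(seed)
    beta_star = np.empty((B, X.shape[1]))
    for b in range(B):
        units = rng.integers(0, N, size=N)                      # draw N cross-sectional units with replacement
        rows = (units[:, None] * T + np.arange(T)).ravel()      # keep all T observations of each drawn unit
        Xs, ys = X[rows], y[rows]
        beta_star[b] = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)    # pooled OLS on the resampled panel
    return beta_star.std(axis=0, ddof=1)                        # cluster-robust bootstrap standard errors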