Monte Carlo, Bootstrap, and Jackknife Estimation

Assume that your true model is

y = Xβ + u,    (1.1)

where u is i.i.d. with 1) E(u|X) = 0 and 2) E(uu′|X) = σ²I; that is, the conditional mean of the error is zero and there is no autocorrelation or heteroskedasticity conditional on X. Then, using 1), the ordinary least squares (OLS) estimator of β, β̂ = (X′X)⁻¹X′y, is unbiased. You will want to estimate the variance of β̂. Using 2), an estimator of var(β̂) = σ²(X′X)⁻¹ is

σ̂²(X′X)⁻¹,    (1.2)

where σ̂² = û′û/(N − K), û = y − Xβ̂, N = the number of observations, and K = the number of regressors.

To understand what is meant by var(β̂) and its estimator, consider the following Monte Carlo procedure. Keep in mind that you would never want to apply this procedure to the classical linear model in (1.1) for actual data, because you can easily evaluate (1.2).

1. MONTE CARLO ESTIMATION OF STANDARD ERRORS OF β̂ (= positive square root of the estimated variance)

a. Assume a value for β, which is otherwise unobservable. You also select a matrix of values for X, which you hold constant over repeated trials.

b. Draw u randomly from some distribution you assume to be correct, using a random number generator. This use of a random number generator yields the term "Monte Carlo," famed for its roulette wheels and games of chance.

c. Compute y from equation (1.1).

d. Estimate β̂ by regressing your generated y on X, obtaining

β̂ = (X′X)⁻¹X′y.    (1.3)

e. Repeat steps (b)-(d) many times (say 10,000), holding β and X constant. Note that you have generated 10,000 drawings of the random variables u and y through equation (1.1), from which you can compute 10,000 estimates of β̂ using (1.3). The sample variance of β̂ over these 10,000 outcomes is your sample measure of the population variance, and its square root is the Monte Carlo estimate of the standard error (a code sketch of this procedure appears after this list).

f. The main use of the Monte Carlo method is to compute the bias and mean-square error of your estimator when it is difficult to do so analytically. However, it is also useful for demonstrating omitted variable bias and related results to econometrics students. Keep in mind that the assumptions made in the Monte Carlo method may make your results specific to your exact model.
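The following is a minimal sketch of the Monte Carlo procedure in steps (a)-(f), written in Python/NumPy rather than the Stata do file (monte carlo.do) referenced later in these notes; the particular β, X, error distribution, and number of replications are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# (a) Assume beta and fix X across replications (illustrative values).
N, K = 100, 2
beta = np.array([1.0, 0.5])
X = np.column_stack([np.ones(N), rng.normal(size=N)])
sigma = 2.0
R = 10_000                                         # number of Monte Carlo replications

betas = np.empty((R, K))
for r in range(R):
    u = rng.normal(0.0, sigma, size=N)             # (b) draw u from the assumed distribution
    y = X @ beta + u                               # (c) compute y from (1.1)
    betas[r] = np.linalg.solve(X.T @ X, X.T @ y)   # (d) OLS estimate, eq. (1.3)

# (e) Monte Carlo standard errors: sample std. dev. of the R estimates of beta_hat
mc_se = betas.std(axis=0, ddof=1)
# For comparison, the analytic formula (1.2) evaluated at the true sigma^2
analytic_se = np.sqrt(np.diag(sigma**2 * np.linalg.inv(X.T @ X)))
print(mc_se, analytic_se)
```

Because the classical assumptions hold by construction here, the Monte Carlo and analytic standard errors should agree closely; the method earns its keep in models where (1.2) is not available.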
2. BOOTSTRAP ESTIMATION OF THE STANDARD ERRORS OF β̂

The term bootstrap implies that you are going to pull yourself up by your bootstraps. The wide range of bootstrap methods falls into two categories: 1) methods that allow computation of standard errors when the analytical formulas are hard to derive, and 2) bootstrap methods that lead to better small-sample approximations. Here you are confined to one actual data set and wish to resample from the empirical distribution of the residuals or the original {X, y} data, rather than assume some distribution for the true error as with Monte Carlo analysis. Given (1.1) as your true model, you could easily evaluate (1.2) and not do any bootstrapping. With more complex models containing non-normal error terms and non-linearities in β, or in two-step models where you need to correct the estimated standard errors of the second-step estimators, the derivation of analytical formulas for the variance of β̂ is complex. Examples are two-step M estimators, two-step panel data estimators, and two-step logit or probit estimators. In these cases θ̂, a second-step estimator, is a function of parameters that are estimated in the first step. The bootstrap will adjust for this.

For software that does not compute heteroskedasticity-consistent (HC) standard errors, if there is heteroskedasticity in the model, the wild and pairs bootstrap estimators make the HC correction of the estimated standard errors. With clustered data, cluster-robust standard errors can be obtained by resampling the clusters via bootstrapping. Theory and Monte Carlo evidence indicate that the bootstrap estimates are more accurate (measured by the size and power of a t-test based on the estimated standard error) in small samples than the asymptotic formula when an asymptotically pivotal statistic is employed (one whose asymptotic normal distribution does not depend on unknown parameters). Otherwise, there is no guarantee of a gain in accuracy; however, a gain in accuracy usually obtains even if an asymptotically pivotal statistic is not employed. A nice summary is found in "Bootstrap Inference in Econometrics" by James MacKinnon, Department of Economics Working Paper, Queen's University, June 2002. Also see J. L. Horowitz, "The Bootstrap," Ch. 52 in Handbook of Econometrics, Vol. 5, J. J. Heckman and E. Leamer, eds., 2001, for technical derivations.

There are three basic non-parametric bootstrap methods we will focus on: the naive (residual) bootstrap, the pairs bootstrap, and the wild bootstrap. These are in contrast to the less popular parametric bootstrap. The three methods are most easily explained for the simple model (1.1). As with the Monte Carlo method, you assume that the model generating your data is the same as in (1.1). However, now you do not assume knowledge of β or u and do not generate random data from (1.1). Instead you use the estimator β̂ and the original data {X, y}.

1. Bootstrap Methods

1.1 Non-Parametric: Residual Bootstrap

a. Estimate β̂ = (X′X)⁻¹X′y.

b. Compute û = y − Xβ̂. You work with û instead of assuming the distribution of u as in Monte Carlo estimation.

c. Draw with replacement a sample of size N using a discrete uniform random number generator U[1, N], where N is your sample size. Let these random numbers be represented by z₁, . . . , z_N. Generate element u*_n as element z_n of û, n = 1, . . . , N. What this means is that each element of û has probability 1/N of being drawn. See the Residual Bootstrap example in the Stata do file called monte carlo.do.

d. Treating y* = Xβ̂ + u* as your true model, compute y*.

e. Compute β* = (X′X)⁻¹X′y*.

f. Repeat (c)-(e) B times. See MacKinnon for details.

g. Compute the square root of the sample variance of these β* estimates. This is the estimate of the standard error of β̂. With B bootstrap replications, compute β*_1, . . . , β*_B and

s²_β̂,Boot = [1/(B − 1)] Σ_{b=1}^{B} (β*_b − β̄*)²,

where β̄* = B⁻¹ Σ_{b=1}^{B} β*_b.

h. Take the square root of s²_β̂,Boot to get the bootstrap estimate of the standard error.

i. This bootstrap provides no asymptotic refinement (an improved approximation to the finite-sample distribution of an asymptotically pivotal statistic), since its distribution depends on the unknown parameters defining the mean and variance of β̂. That is, there is no guarantee of an improvement in finite-sample performance. However, such an improvement usually obtains anyway. This method can be very useful in computing adjusted standard errors with two-step models or in computing cluster-robust standard errors by resampling clusters. (A code sketch of steps (a)-(h) appears after this list.)
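A minimal sketch of the residual bootstrap in steps (a)-(h), again in Python/NumPy rather than the monte carlo.do file cited above; the simulated {X, y} and the choice B = 399 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data; in practice X and y are your actual data set.
N, K, B = 100, 2, 399
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=N)

# (a)-(b) OLS estimate and residuals
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

# (c)-(f) resample residuals with replacement, rebuild y*, and re-estimate, B times
beta_star = np.empty((B, K))
for b in range(B):
    idx = rng.integers(0, N, size=N)                         # the draws z_1,...,z_N (0-based here)
    y_star = X @ beta_hat + u_hat[idx]                       # (d) y* = X beta_hat + u*
    beta_star[b] = np.linalg.solve(X.T @ X, X.T @ y_star)    # (e) beta* for this replication

# (g)-(h) bootstrap standard errors of beta_hat
se_boot = beta_star.std(axis=0, ddof=1)
print(se_boot)
```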
1.2 Non-Parametric: Pairs Bootstrap

a. Follow step a. above.

b. Then draw pairs randomly with replacement from {X, y}, where the probability of any pair being drawn is 1/N, to obtain {X*, y*}.

c. Then use the {X*, y*} data to obtain the pairs estimator β*_p = (X*′X*)⁻¹X*′y*.

d. Note that the pairs bootstrap produces an HC covariance matrix. See Lancaster (2003) for a proof of this. See the Pairs Estimator in the Stata file called monte carlo.do.

1.3 Non-Parametric: Wild Bootstrap

a. The wild bootstrap also produces an HC covariance matrix; see MacKinnon (2002) for details.

b. The wild bootstrap first generates

y*_n = X_n β̂ + f(û_n) v*_n,    (1.4)

where

f(û_n) = û_n / (1 − h_n)^{1/2}    (1.5)

and h_n is the nth diagonal element of X(X′X)⁻¹X′. We perform this normalization so that, if u_n is homoskedastic, then the normalized residual in (1.5) is also homoskedastic. To see this, recall that var(û_n) = (1 − h_n)σ² under homoskedasticity, so dividing û_n by (1 − h_n)^{1/2} restores the constant variance σ²; (1 − h_n) is sometimes called m_n.

c. The best approach to specifying v*_n is to use the Rademacher distribution (see Davidson and Flachaire (2001)):

v*_n = 1 with probability 1/2, and v*_n = −1 with probability 1/2.    (1.6)

d. Now v*_n has E(v*_n) = 0, E(v*_n²) = 1, E(v*_n³) = 0, and E(v*_n⁴) = 1. Since v*_n and û_n are independent, the mean of the composite residual is zero, which preserves E(û_n) = 0. This is a nice property and, taking X_n as given, it implies unbiasedness of β*.

e. One can prove that var(wz) = var(w)var(z), assuming independence of w and z and E(w) = E(z) = 0. Then the variance of the composite residual is one times the variance of û_n, preserving the variance of û_n; the skewness of û_n is eliminated, but the kurtosis of û_n is preserved. Further, Wu (1986) and Mammen (1993) show that the asymptotic distribution of their version of the wild bootstrap is the same as the asymptotic distribution of various statistics. These asymptotic refinements are due to their wild bootstrap's taking account of the skewness of û_n. However, their version of the wild bootstrap ignores kurtosis.

f. Now follow steps e)-h) of Section 1.1 using the wild data for y* generated in step b) of this section. See the Wild Estimator in the Stata do file called monte carlo.do. (A code sketch of the wild bootstrap appears below.)
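A minimal sketch of the wild bootstrap of Section 1.3, using the Rademacher draws of (1.6) and the normalization in (1.5), again in Python/NumPy rather than the monte carlo.do file; the heteroskedastic simulated data and B = 399 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative data with heteroskedastic errors.
N, K, B = 100, 2, 399
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 0.5]) + np.abs(X[:, 1]) * rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

# Normalized residuals f(u_hat_n) = u_hat_n / (1 - h_n)^(1/2), eq. (1.5)
h = np.einsum('ij,jk,ik->i', X, np.linalg.inv(X.T @ X), X)   # diagonal of X(X'X)^(-1)X'
f_u = u_hat / np.sqrt(1.0 - h)

beta_star = np.empty((B, K))
for b in range(B):
    v = rng.choice([-1.0, 1.0], size=N)                      # Rademacher draws, eq. (1.6)
    y_star = X @ beta_hat + f_u * v                          # wild data, eq. (1.4)
    beta_star[b] = np.linalg.solve(X.T @ X, X.T @ y_star)

se_wild = beta_star.std(axis=0, ddof=1)                      # steps e)-h) of Section 1.1
print(se_wild)
```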
1.4 Pairs vs. Wild

Based on Atkinson and Cornwell, "Inference in Two-Step Panel Data Models with Instruments and Time-Invariant Regressors: Bootstrap versus Analytic Estimators," for models with endogeneity the wild bootstrap has more accurate size and virtually the same power as the pairs estimator in estimation of t-values for the second-step estimators. Both generally outperform the asymptotic formula in terms of size and power. In a linear model context without panel data, Davidson and Flachaire (2001) find that the wild bootstrap often outperforms the pairs bootstrap when the error is heteroskedastic.

1.5 Parametric Bootstrap

If it is known that y_n ∼ Normal[µ, σ²], then we could obtain B bootstrap samples of size N by drawing from the Normal[µ̂, s²] distribution. This is an example of a parametric bootstrap.

2. Number of Bootstrap Draws

The bootstrap asymptotics rely on big N, even if B is small. However, the bootstrap is more accurate with big B. How large B should be depends on the simulation error you can accept in your work. Davidson and MacKinnon recommend B = 399 for tests at the .05 level and B = 1,499 for tests at the .01 level. If you are performing bootstrapping within a Monte Carlo analysis, then B = 399 is adequate. You need to have α(B + 1) be an integer.

Note: Suppose you use a two-sided test with α = .05. For the upper tail, 399 × .025 = 9.975 is the theoretical number of significant t-values you would expect if the size were correct. You would array the t-values from high to low. With 400 bootstrap draws, 400 × .025 = 10, which says that you should have 10 t-values equal to 1.96 or greater to have correct size. However, if the 10th-ranked t-value is the last t-value greater than or equal to 1.96, should the 10th-ranked t-value belong to one set or the other? It sits on the cusp. Since .025 of 399 is 9.975, which is not an integer, B = 399 eliminates this ambiguity. This is not a major issue in my opinion.

3. Bias Adjustment Using the Bootstrap or Jackknife

In small samples many sandwich estimators may be biased. Weak instruments may also cause bias. We can correct for these biases using the bootstrap or the jackknife as follows (a code sketch of the jackknife correction appears after this list):

a. Since the bootstrap estimator of β̂ is (1/B) Σ_{b=1}^{B} β*_b, we can compute the bias-corrected estimator of β̂ as β̂ − [(1/B) Σ_{b=1}^{B} β*_b − β̂] = 2β̂ − (1/B) Σ_{b=1}^{B} β*_b. The intuition is that, since we do not know β, we treat β̂ as the "true" value and determine the bias of the bootstrap estimator relative to this value. We then adjust β̂ by this computed bias, assuming that the bias of the bootstrap estimator relative to β̂ is the same as the bias of β̂ relative to β.

b. We can compute the jackknife estimator of the standard deviation of β̂ for a sample of size N, n = 1, . . . , N, by computing N jackknife estimates of β obtained by successively dropping observation n and recomputing β_{J,n}, where J stands for jackknife. Then compute the variance of the N estimates and multiply by N − 1 to get the estimated variance of β̂. Take the square root to get the estimated standard error.

We can employ the jackknife two-stage-least-squares (JK2SLS) estimator of Hahn, J., and J. Hausman (2003), "Weak Instruments: Diagnosis and Cures in Empirical Econometrics," American Economic Review Papers and Proceedings 93: 118–125, to correct for the bias caused by weak instruments.

The formula for the jackknife bias correction is given in Shao and Tu (1995). To compute the jackknife bias correction for the estimated coefficients, let β̂ be the estimator of β for a sample of size N. First compute N jackknife estimates of β̂ obtained by successively dropping one observation and recomputing β̂. Call each of these N estimates β_{J,n}, n = 1, . . . , N, and their average β̄_J = N⁻¹ Σ_{n=1}^{N} β_{J,n}. Define the jackknife bias estimator as

BIAS_J = (N − 1)(β̄_J − β̂).    (1.7)

Then the jackknife bias-adjusted (BA) estimator of β is

β̂_BA = β̂ − BIAS_J = Nβ̂ − (N − 1)β̄_J.    (1.8)

Again, the intuition is that, since we do not know β, we treat β̂ as the "true" value and determine the bias of the jackknife estimator relative to this value. We then adjust β̂ by this computed bias, assuming that the bias of the jackknife estimator relative to β̂ is the same as the bias of β̂ relative to β.

c. The jackknife uses fewer computations (N < B) than the bootstrap, but is outperformed by the bootstrap as B → ∞.
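A minimal sketch of the jackknife bias correction (1.7)-(1.8) and the jackknife standard error of step b, applied to OLS in Python/NumPy; the simulated data are an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data; in practice use your actual {X, y}.
N, K = 50, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=N)

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

beta_hat = ols(X, y)

# N leave-one-out estimates beta_{J,n}, dropping observation n each time
beta_J = np.array([ols(np.delete(X, n, axis=0), np.delete(y, n)) for n in range(N)])
beta_J_bar = beta_J.mean(axis=0)

# Jackknife bias (1.7) and bias-adjusted estimator (1.8)
bias_J = (N - 1) * (beta_J_bar - beta_hat)
beta_BA = beta_hat - bias_J          # equals N*beta_hat - (N-1)*beta_J_bar

# Jackknife variance: (N-1) times the divisor-N variance of the N estimates,
# i.e. ((N-1)/N) * sum of squared deviations, as in step b.
var_J = (N - 1) / N * ((beta_J - beta_J_bar) ** 2).sum(axis=0)
se_J = np.sqrt(var_J)
print(beta_BA, se_J)
```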
4. Hypothesis Testing

Assume a model y = α + xβ + u. You can compute t̃ = (β̂ − β)/s_β̂,Boot, using the bootstrap estimator of the standard deviation. For the specific null hypothesis that β = 0 you would compute t̃ = (β̂ − 0)/s_β̂,Boot. While this is asymptotically valid so long as β* and β̂ approach the true β, it will not give you asymptotic refinements for any N.

To obtain asymptotic refinement, we need to compute asymptotically pivotal test statistics, whose asymptotic normal distribution does not depend on unknown parameters. This requires a studentized test statistic based on the asymptotic standard error of β̂ computed for each bootstrap draw, s_β̂*_b. We fashion this after the usual test statistic t = (β̂ − β)/s_β̂ ∼ N[0, 1], which provides asymptotic refinement because it is asymptotically pivotal: its asymptotic distribution does not depend on unknown parameters. To achieve asymptotic refinement, you compute

t*_b = (β*_b − β̂)/s_β̂*_b,

where s_β̂*_b is the analytic or asymptotic standard-error estimator evaluated using the bootstrap data for each draw b, and then find t*_(1−α/2) and t*_(α/2) after rank-ordering the B bootstrap draws. Use these to test the null hypothesis. For α = .05, take the (1 − α/2) = .975 percentile and the (α/2) = .025 percentile. These standardized t* values can then be compared with the t̃ value. If t̃ > t*_(1−α/2) or t̃ < t*_(α/2), then the null hypothesis is rejected. We are comparing one standardized statistic with another. However, computing the analytic formula may be very difficult, and one may have to use the bootstrap estimator based on the standard deviations (s_β̂,Boot) computed over the B bootstrap trials. This will not yield asymptotic refinements but will probably still be better than using the asymptotic formula.

5. Bootstrapping Time Series Data

The bootstrap does not generally work well with time series data. The reason is that the bootstrap relies on resampling from an i.i.d. distribution. With standard bootstrapping you randomly select among a set of residuals which follow some autocorrelation process, thereby destroying that process. Two alternatives that can be employed are block bootstrapping and the sieve bootstrap. With block bootstrapping, time-series blocks that capture the autoregressive process are randomly selected and each entire block is resampled. The sieve bootstrap works by fitting an autoregressive process of order p to the original data and then generating bootstrap samples by randomly resampling the rescaled residuals, which are assumed to be i.i.d. Since the sieve imposes more structure on the DGP, it should perform better than the block bootstrap.

As an example of the sieve, with p = 1 consider the model

y_t = βx_t + u_t,    (1.9)

where

u_t = ρu_{t−1} + ϵ_t,    (1.10)

and ϵ_t is white noise. Now estimate β and ρ and obtain ϵ̂_t = û_t − ρ̂û_{t−1}. Bootstrap these residuals to get ϵ̂*_t, t = 1, . . . , T. Then recursively compute û*_t = ρ̂û*_{t−1} + ϵ̂*_t and hence y*_t = β̂x_t + û*_t. Then regress y*_t on x_t. A code sketch of this sieve procedure is given at the end of this section.

The moving block bootstrap constructs overlapping blocks. There are n − b + 1 blocks: the first contains observations 1 through b, the second contains observations 2 through b + 1, and the last contains observations n − b + 1 through n. The choice of b is critical. In theory, it must increase as n increases. If blocks are too short, the bootstrap samples cannot mimic the original sample, because dependence is broken whenever a new block starts. If blocks are too long, the bootstrap samples are not random enough.

For a nice discussion of the moving block bootstrap and a comparison of this and the other methods for time series, see "Bootstrap Methods in Econometrics" by James G. MacKinnon, Department of Economics, Queen's University, Kingston, Ontario, Canada K7L 3N6, [email protected], http://www.econ.queensu.ca/faculty/mackinnon/, September 2005.
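A minimal sketch of the p = 1 sieve bootstrap in (1.9)-(1.10), in Python/NumPy; the simulated series, the OLS estimates of β and ρ, the recentering of the residuals, and B = 399 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data for model (1.9)-(1.10): y_t = beta*x_t + u_t, u_t = rho*u_{t-1} + eps_t.
T, B = 200, 399
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 * x + u

# Estimate beta by OLS of y on x, then rho by OLS of u_hat on its own lag.
beta_hat = (x @ y) / (x @ x)
u_hat = y - beta_hat * x
rho_hat = (u_hat[:-1] @ u_hat[1:]) / (u_hat[:-1] @ u_hat[:-1])
eps_hat = u_hat[1:] - rho_hat * u_hat[:-1]      # eps_hat_t = u_hat_t - rho_hat * u_hat_{t-1}
eps_hat = eps_hat - eps_hat.mean()              # recenter (a common implementation detail)

beta_star = np.empty(B)
for b in range(B):
    eps_star = rng.choice(eps_hat, size=T, replace=True)     # resample the (assumed i.i.d.) residuals
    u_star = np.zeros(T)
    for t in range(1, T):
        u_star[t] = rho_hat * u_star[t - 1] + eps_star[t]    # rebuild the AR(1) errors recursively
    y_star = beta_hat * x + u_star                           # y*_t = beta_hat*x_t + u*_t
    beta_star[b] = (x @ y_star) / (x @ x)                    # regress y* on x

print(beta_star.std(ddof=1))    # sieve bootstrap standard error of beta_hat
```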
6. Bootstrapping Panel Data

With both panel-data bootstrap methods (pairs and wild), three resampling schemes are available: cross-sectional resampling (also called the panel bootstrap), temporal resampling (also called block bootstrap resampling), and cross-sectional/temporal resampling. With panel-bootstrap (cross-sectional) resampling, one randomly selects among the N cross-sectional units and uses all T observations for each. If cross-sectional dependence exists, one can select the relevant blocks of cross-sectional units. With temporal resampling, one randomly selects temporal units and uses all N observations for each. If temporal dependence exists, one can select the relevant blocks of temporal units. Of course, this choice is critical to the accuracy of the bootstrap. With cross-sectional/temporal resampling, both methods are utilized.

Following Cameron and Trivedi (2005), in the fixed-T case consistent (as N → ∞) standard errors can be obtained using the cross-sectional bootstrap method. Hence, we employ this method for both the pairs and wild methods, where we assume no cross-sectional or temporal dependence. Also see Kapetanios (2008), "A Bootstrap Procedure for Panel Data Sets with Many Cross-Sectional Units," The Econometrics Journal 11, 377–95, who shows that if the data do not exhibit cross-sectional dependence but do exhibit temporal dependence, then cross-sectional resampling is superior to block bootstrap resampling. Further, he shows that cross-sectional resampling provides asymptotic refinements. Monte Carlo results using these assumptions indicate the superiority of the cross-sectional method. A code sketch of cross-sectional resampling appears below.
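A minimal sketch of cross-sectional (panel bootstrap) resampling with a pairs-type bootstrap, in Python/NumPy; the balanced panel, the pooled OLS estimator, and B = 399 are illustrative assumptions rather than the two-step estimators discussed in these notes.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative balanced panel: N units observed for T periods, pooled OLS of y on x.
N, T, B = 40, 5, 399
x = rng.normal(size=(N, T))
unit_effect = rng.normal(size=(N, 1))
y = 0.5 * x + unit_effect + rng.normal(size=(N, T))

def pooled_ols(x, y):
    X = np.column_stack([np.ones(x.size), x.ravel()])
    return np.linalg.solve(X.T @ X, X.T @ y.ravel())

beta_hat = pooled_ols(x, y)

# Cross-sectional resampling: draw whole cross-sectional units with replacement,
# keeping all T observations of each sampled unit together.
beta_star = np.empty((B, 2))
for b in range(B):
    units = rng.integers(0, N, size=N)
    beta_star[b] = pooled_ols(x[units], y[units])

se_panel_boot = beta_star.std(axis=0, ddof=1)   # cluster-robust (by unit) bootstrap standard errors
print(beta_hat, se_panel_boot)
```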