Bootstrap Methods for Complex Sampling Designs in Finite Population

Erika Antal and Yves Tillé
University of Neuchâtel
Colloque Deville, Neuchâtel 2009

Erika Antal and Yves Tillé, Bootstrap Methods, June 2009

Table of contents

1. Objectives and Interest of the Method
2. Two Tools: Sampling with Over-replacement and One-one Sampling
3. The Problem of Bootstrap in Complex Designs
4. Bootstrap for Poisson Sampling
5. Bootstrap in Other Sampling Designs
6. Two-phase Designs and Conclusions
7. Simulation Results

Objectives and Interest of the Method

- A method for bootstrapping from a sample selected with a complex design.
- The bootstrap sample has the same size (in expectation) as the original sample.
- No need to reconstruct a pseudo-population (Gross, 1980; Chao and Lo, 1985).
- No need to rescale the bootstrapped units (Mac Carthy and Snowden, 1985; Rao and Wu, 1988; Kuk, 1989; Rao et al., 1992; Sitter, 1992a,b; Booth et al., 1994; Holmberg, 1998; Preston and Henderson, 2007).
- The bootstrap sample uses the same inclusion probabilities.
- No need to correct the variance: the variance of the bootstrapped estimators is directly an estimator of variance.

Multinomial Sampling or Simple Sampling with Replacement

Well-known result (Bol'shev, 1965; Johnson et al., 1997): let $(X_1, \dots, X_N)$ be a vector of independent random variables with Poisson distribution, $X_k \sim \mathrm{Poiss}(\lambda)$. Then
\[ S = (S_1, \dots, S_N) = \left[ (X_1, \dots, X_N) \,\Big|\, \sum_{k=1}^{N} X_k = n \right] \]
has a multinomial distribution. Thus
\[ \Pr(S = s) = \frac{n!}{\prod_{k \in U} s_k!} \prod_{k \in U} \left( \frac{1}{N} \right)^{s_k}, \quad \text{for all } s \in R_n, \]
where
\[ R_n = \left\{ s \in \mathbb{N}^N \,\Big|\, \sum_{k=1}^{N} s_k = n \right\}. \]

Multinomial sampling

In multinomial sampling,
\[ S_k \sim \mathrm{Bin}\left( n, \frac{1}{N} \right), \quad E(S_k) = \frac{n}{N}, \quad \mathrm{var}(S_k) = \frac{n}{N} \left( 1 - \frac{1}{N} \right). \]

Sampling with over-replacement

Consider now a vector of $N$ independent geometric random variables $X_k$:
\[ \Pr(X_k = x_k) = (1 - p)\, p^{x_k}, \quad x_k = 0, 1, 2, 3, \dots \]
Then we say that
\[ S = (S_1, \dots, S_N) = \left[ (X_1, \dots, X_N) \,\Big|\, \sum_{k=1}^{N} X_k = n \right] \]
is a sampling with over-replacement. We get
\[ \Pr(S_1 = x_1, \dots, S_N = x_N) = \binom{N+n-1}{n}^{-1}. \]

Sampling with over-replacement

$S_k$ has a negative hypergeometric distribution:
\[ \Pr(S_k = j) = \binom{N-1+n-j-1}{n-j} \binom{N+n-1}{n}^{-1}, \quad j = 0, \dots, n, \]
\[ E(S_k) = \frac{n}{N}, \quad \mathrm{var}(S_k) = \frac{n (N-1)(N+n)}{N^2 (N+1)}. \]

Comparison of simple sampling designs

Under simple random sampling without replacement,
\[ \mathrm{var}(\hat{Y}) = N^2 \frac{N-n}{nN} \frac{1}{N-1} \sum_{k \in U} (y_k - \bar{Y})^2. \]
Under simple random sampling with replacement (multinomial),
\[ \mathrm{var}(\hat{Y}) = N \frac{N-1}{n} \frac{1}{N-1} \sum_{k \in U} (y_k - \bar{Y})^2. \]
Under simple random sampling with over-replacement,
\[ \mathrm{var}(\hat{Y}) = N (N-1) \frac{N+n}{n(N+1)} \frac{1}{N-1} \sum_{k \in U} (y_k - \bar{Y})^2. \]

One-one sampling design

- A one-one sampling design is only a tool for bootstrapping.
- Selection with replacement of $n$ units in a sample of size $n$.
- $E(S_k) = 1$ and $\mathrm{var}(S_k) = 1$.
- It can be implemented by selecting one part of the observations by multinomial sampling and another part by sampling with over-replacement.

Fundamental problem of the bootstrap in survey sampling

- $\hat{Y}$: an estimator of the total $Y$.
- $\widehat{\mathrm{var}}(\hat{Y})$: an estimator of the variance of $\hat{Y}$.
- $\hat{Y}^*$: an estimator of the total $Y$ obtained by resampling.
- $\mathrm{var}(\hat{Y}^* \mid S)$: conditional variance of $\hat{Y}^*$ given the sample $S$.
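The over-replacement design defined earlier puts equal probability on every count vector summing to n, so one way to simulate it is a stars-and-bars draw. The block below is a minimal sketch under that reading, not the authors' implementation; the function names `over_replacement_sample` and `over_replacement_moments` are hypothetical, and only the Python standard library is assumed.

```python
import random


def over_replacement_sample(N, n, rng):
    """Draw counts (S_1, ..., S_N), uniform over all vectors with sum n.

    Stars and bars: place N - 1 bars among N + n - 1 slots; the numbers of
    free slots (stars) between consecutive bars are the counts.
    """
    bars = sorted(rng.sample(range(N + n - 1), N - 1))
    counts = []
    prev = -1
    for b in bars:
        counts.append(b - prev - 1)  # stars strictly between two bars
        prev = b
    counts.append(N + n - 1 - prev - 1)  # stars after the last bar
    return counts


def over_replacement_moments(N, n):
    """E(S_k) and var(S_k) as stated on the over-replacement slide."""
    mean = n / N
    var = n * (N - 1) * (N + n) / (N ** 2 * (N + 1))
    return mean, var


if __name__ == "__main__":
    rng = random.Random(1)
    N, n = 5, 3
    draws = [over_replacement_sample(N, n, rng)[0] for _ in range(20000)]
    emp_mean = sum(draws) / len(draws)
    emp_var = sum((x - emp_mean) ** 2 for x in draws) / len(draws)
    print("empirical mean, var:", emp_mean, emp_var)
    print("stated mean, var:   ", over_replacement_moments(N, n))
```

Each of the $\binom{N+n-1}{n}$ bar placements maps to exactly one count vector, which is why the draw matches the uniform probability given for this design.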
A 'good' bootstrap method must be such that
\[ E(\hat{Y}^* \mid S) = \hat{Y} \quad \text{and} \quad \mathrm{var}(\hat{Y}^* \mid S) = \widehat{\mathrm{var}}(\hat{Y}). \]
PROBLEM: in a complex design, an estimator of the variance $\widehat{\mathrm{var}}(\hat{Y})$ can be very different from the variance $\mathrm{var}(\hat{Y}^* \mid S)$.
Thus the resampling design MUST be different from the sampling design, except in very special cases.

Fundamental problem of the bootstrap in survey sampling

- $S$: the sample.
- $S_k$: number of times unit $k$ is selected in the sample.
- $E(S_k) = \pi_k$ and $\mathrm{cov}(S_k, S_\ell) = \Delta_{k\ell}$.
- $S^*$: the bootstrap sample.
- $S_k^*$: number of times unit $k$ is resampled.
- $\mathrm{var}(\hat{Y}^* \mid S)$: conditional variance of $\hat{Y}^*$ given the sample $S$.
- $E(S_k^* \mid S) = 1$ (to have $E(\hat{Y}^* \mid S) = \hat{Y}$).
- $\mathrm{cov}(S_k^*, S_\ell^* \mid S) = \Sigma_{k\ell}$.

In order to have $\mathrm{var}(\hat{Y}^* \mid S) = \widehat{\mathrm{var}}(\hat{Y})$, we must have
\[ \Sigma_{k\ell} = \frac{\Delta_{k\ell}}{\pi_{k\ell}}. \]

Simple Example: Poisson Sampling

The $S_k$ are independent Bernoulli random variables with
\[ E(S_k) = \pi_k, \quad \mathrm{var}(S_k) = \pi_k (1 - \pi_k), \quad \text{and} \quad \mathrm{cov}(S_k, S_\ell) = 0. \]
$S^*$ must be such that
\[ E(S_k^* \mid S) = 1, \quad \mathrm{cov}(S_k^*, S_\ell^* \mid S) = 0, \quad \mathrm{var}(S_k^* \mid S) = \frac{\mathrm{var}(S_k)}{\pi_k} = 1 - \pi_k. \]

Implementation of Bootstrap for Poisson sampling

- Select a sample $S_{kA}^*$ from $S$ with a Poisson sampling (without replacement) with inclusion probabilities $\pi_k$.
- For the units of $S$ such that $S_{kA}^* = 0$, select $S_{kB}^*$ according to a Poisson random variable (with replacement) with parameter 1, i.e.
\[ \Pr(S_{kB}^* = x \mid S) = \frac{e^{-1}}{x!}, \quad x = 0, 1, 2, 3, \dots, \quad \text{if } S_{kA}^* = 0, \]
  and $S_{kB}^* = 0$ if $S_{kA}^* = 1$.
- The resampling design is $S_k^* = S_{kA}^* + S_{kB}^*$.

Simple Random Sampling With Replacement

- Sampling design: sampling with replacement.
- Resampling with a one-one design.

In this case, with $\hat{\bar{Y}} = \hat{Y}/N$,
\[ \mathrm{var}(\hat{Y}^* \mid S) = \frac{N^2}{n(n-1)} \sum_{k \in S} (y_k - \hat{\bar{Y}})^2. \]
Note that, if the resampling design is a simple random sampling with replacement, then the variance is slightly underestimated:
\[ \mathrm{var}(\hat{Y}^* \mid S) = \frac{N^2}{n^2} \sum_{k \in S} (y_k - \hat{\bar{Y}})^2. \]

Unequal Probability Sampling With Replacement

- Sampling design: unequal probability sampling with replacement.
- Resampling with a one-one design.

\[ \mathrm{var}(\hat{Y}^* \mid S) = \frac{n}{n-1} \sum_{k \in S} \left( \frac{y_k}{\pi_k} - \frac{\hat{Y}}{n} \right)^2. \]

Simple Random Sampling Without Replacement

- Sampling design: sampling without replacement.
- Resampling with a composite design:
  - A sample $S_A$ of $m = n^2/N$ units is selected in $S$ with simple random sampling without replacement.
  - Among the non-selected units, select a sample $S_B$ with a one-one design.
  - The bootstrap sample is $S^* = S_A + S_B$.

\[ \mathrm{var}(\hat{Y}^* \mid S) = N^2 \frac{N-n}{n(N-1)} \frac{1}{n-1} \sum_{k \in S} (y_k - \hat{\bar{Y}})^2. \]

Unequal Probability Sampling Without Replacement

- Sampling design: sampling without replacement.
- Resampling with a composite design:
  - A sample $S_A$ is selected in $S$ with unequal probability sampling without replacement, with inclusion probabilities $\phi_k$.
  - Among the non-selected units, select a sample $S_B$ with a one-one design.
  - The bootstrap sample is $S^* = S_A + S_B$.

Unequal Probability Sampling Without Replacement

With the choice
\[ 1 - \phi_k = \min\left[ (1 - \pi_k) \frac{n}{n-1} \left( 1 - \frac{1 - \pi_k}{\sum_{j \in U} S_j (1 - \pi_j)} \right), 1 \right], \]
$\mathrm{var}(\hat{Y}^* \mid S)$ will be very close to the estimator of variance proposed by Hájek (1981):
\[ \mathrm{var}(\hat{Y}^* \mid S) \approx \widehat{\mathrm{var}}(\hat{Y}) = \sum_{k \in S} c_k \left( \frac{y_k}{\pi_k} - \frac{\sum_{j \in S} c_j y_j / \pi_j}{\sum_{j \in S} c_j} \right)^2, \quad c_k = \frac{n}{n-1} (1 - \pi_k). \]

Unequal Probability Sampling Without Replacement

Other choices for $c_k$:
- The simplest choice is $\phi_k = \pi_k$. In this case, $\mathrm{var}(S_k^* \mid S) = 1 - \pi_k$.
  This choice reconstructs a matrix that is very close to an estimator of variance proposed by Deville and Tillé (2005).
- A third choice is $\phi_k$ such that
\[ 1 - \phi_k = \min\left( - \sum_{j \in U,\, j \neq k} S_j \frac{\Delta_{kj}}{\pi_{kj}},\; 1 \right), \]
  which has the interest of reconstructing exactly the diagonal of $\Delta_{k\ell}/\pi_{k\ell}$.

Two-stage Designs

- Possibility to generalize to two-stage and two-phase designs.
- All the primary units must not be selected.

Conclusion

- In order to reconstruct the 'good' estimator of variance, the resampling design must be an ad hoc procedure that reproduces the first and the second moments.
- The interest is that rescaling is not necessary.
- The bootstrap sample can be treated as the original sample (calibration, etc.).

Population, functions and sampling designs

Generated population of 284 units.

Variance of the estimator of four functions:
- Total
- Ratio of two totals $Y$ and $X$
- Median
- Gini index

Three different sampling designs:
- Poisson sampling
- Simple Random Sampling Without Replacement (srswor)
- Unequal Probability Sampling Without Replacement (UPwor)

Total

Goal: $\mathrm{var}(\hat{Y}^* \mid S) = \widehat{\mathrm{var}}(\hat{Y})$.

Comparison of the results of:
- the bootstrap with replacement (the generally used bootstrap)
- the bootstrap without replacement (e.g. creating an artificial population)
- the new method proposed here.

Other functions

Other statistics: $\mathrm{var}(\hat{\theta}) \approx \mathrm{var}_{sim}(\hat{\theta})$.

Goal: $\mathrm{var}(\hat{\theta}^* \mid S) = \mathrm{var}_{sim}(\hat{\theta})$.

Comparison of the results of:
- the bootstrap with replacement (the generally used bootstrap)
- the new method proposed here.
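For the total under Poisson sampling, the stated goal can be checked numerically. The sketch below follows the two-part resampling described on the Poisson implementation slide and compares a Monte Carlo value of $\mathrm{var}(\hat{Y}^* \mid S)$ with $\sum_{k \in S} (1 - \pi_k)\, y_k^2 / \pi_k^2$, which is what $\Sigma_{k\ell} = \Delta_{k\ell}/\pi_{k\ell}$ gives when the covariances are zero. All function names are hypothetical, the Poisson generator is a standard multiplication method chosen here for self-containment, and the data in the usage lines are made up for illustration.

```python
import math
import random


def poisson_draw(lam, rng):
    """Poisson(lam) variate by the multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def poisson_bootstrap_counts(pi_sample, rng):
    """Resampling counts S*_k = S*_kA + S*_kB for the units of the sample."""
    counts = []
    for pi_k in pi_sample:
        if rng.random() < pi_k:
            counts.append(1)  # part A: unit kept once, part B is 0
        else:
            counts.append(poisson_draw(1.0, rng))  # part B: Poisson(1) count
    return counts


def bootstrap_variance_of_total(y, pi, reps, rng):
    """Monte Carlo approximation of var(Y_hat* | S) for the expanded total."""
    totals = []
    for _ in range(reps):
        s_star = poisson_bootstrap_counts(pi, rng)
        totals.append(sum(s * yk / pk for s, yk, pk in zip(s_star, y, pi)))
    mean = sum(totals) / reps
    return sum((t - mean) ** 2 for t in totals) / (reps - 1)


def variance_estimate_poisson(y, pi):
    """Sum over the sample of (1 - pi_k) * (y_k / pi_k)^2."""
    return sum((1 - pk) * (yk / pk) ** 2 for yk, pk in zip(y, pi))


if __name__ == "__main__":
    rng = random.Random(2009)
    y = [10.0, 20.0, 30.0, 40.0]   # illustrative sample values
    pi = [0.2, 0.5, 0.5, 0.8]      # illustrative inclusion probabilities
    print("bootstrap:", bootstrap_variance_of_total(y, pi, 20000, rng))
    print("target:   ", variance_estimate_poisson(y, pi))
```

The two printed values should agree up to Monte Carlo error, since each $S_k^*$ has conditional mean 1 and conditional variance $1 - \pi_k$ by construction.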
Measures of comparison

Measures used for the comparison, with $B = \mathrm{var}(\hat{\theta}^* \mid S) - \mathrm{var}_{sim}(\hat{\theta})$:
\[ \text{Ratio (R)} = \frac{\mathrm{var}(\hat{\theta}^* \mid S)}{\mathrm{var}_{sim}(\hat{\theta})} \]
\[ \text{Relative Bias (RB)} = \frac{\mathrm{var}(\hat{\theta}^* \mid S) - \mathrm{var}_{sim}(\hat{\theta})}{\mathrm{var}_{sim}(\hat{\theta})} = \frac{B}{\mathrm{var}_{sim}(\hat{\theta})} \]
\[ \text{Relative Root Mean Squared Error (RRMSE)} = \frac{\sqrt{B^2 + \mathrm{var}[\mathrm{var}(\hat{\theta}^* \mid S)]}}{\mathrm{var}_{sim}(\hat{\theta})} \]
\[ \text{Bias Ratio (BR)} = \frac{B}{\sqrt{\mathrm{var}[\mathrm{var}(\hat{\theta}^* \mid S)]}} \]

Simulation results: Poisson Sampling

                          Ratio    Relative Bias   RRMSE    Bias Ratio
TOTAL    Bootstrap WR     0.9844     −0.0156       0.1643    −0.0892
         Bootstrap WOR    1.0246      0.0246       0.0813     0.3174
         New Method       0.9997     −0.0002       0.1447    −0.0014
RATIO    Bootstrap WR     0.9955     −0.0045       0.2370    −0.0188
         New Method       0.9875     −0.0125       0.2376    −0.0529
MEDIAN   Bootstrap WR     1.0490      0.0490       0.5355     0.0918
         New Method       1.0502      0.0502       0.5619     0.0897
GINI     Bootstrap WR     0.9625     −0.0375       0.6116    −0.0614
         New Method       1.0027      0.0027       0.6528     0.0042

Simulation results: Simple Random Sampling Without Replacement

                          Ratio    Relative Bias   RRMSE    Bias Ratio
TOTAL    Bootstrap WR     1.0101      0.0101       0.1662     0.0611
         Bootstrap WOR    0.9897     −0.0103       0.1633    −0.0631
         New Method       1.0067      0.0067       0.1656     0.0407
RATIO    Bootstrap WR     0.9865     −0.0135       0.1939    −0.0698
         New Method       0.9831     −0.0169       0.1936    −0.0874
MEDIAN   Bootstrap WR     1.0427      0.0427       0.4745     0.0903
         New Method       1.0703      0.0703       0.5188     0.1367
GINI     Bootstrap WR     1.0088      0.0088       0.4152     0.0335
         New Method       1.0044      0.0044       0.2619     0.0169

Simulation results: Unequal Probability Sampling Without Replacement

                          Ratio    Relative Bias   RRMSE    Bias Ratio
TOTAL    Bootstrap WR     1.0383      0.0383       0.2412     0.0992
         Bootstrap WOR    0.9440     −0.0560       0.1936    −0.0027
         New Method       1.0004      0.0004       0.1985     0.0019
RATIO    Bootstrap WR     1.0301      0.0301       0.2034     0.1499
         New Method       1.0296      0.0296       0.2090     0.1430
MEDIAN   Bootstrap WR     0.9975     −0.0025       0.4293    −0.0059
         New Method       1.0248      0.0248       0.4673     0.0536
GINI     Bootstrap WR     0.9307     −0.0693       0.8546    −0.0814
         New Method       0.9615     −0.0385       0.9496    −0.0406

Future work

Three possible directions for examining the performance of this new method:
- other sampling designs (two-stage sampling design, balanced sampling, stratified sampling design, etc.)
- other $\theta$ (variance, QSR, etc.)
- comparison with other existing resampling methods (BBR, other types of bootstrap methods, etc.)

Bibliography

Bol'shev, L. (1965). On a characterization of the Poisson distribution. Teoriya Veroyatnostei i ee Primeneniya, 10:64-71.
Booth, J., Butler, R., and Hall, P. (1994). Bootstrap methods for finite populations. Journal of the American Statistical Association, 89:1282-1289.
Chao, M.-T. and Lo, S.-H. (1985). A bootstrap method for finite population. Sankhya Series A, 47:399-405.
Deville, J.-C. and Tillé, Y. (2005). Variance approximation under balanced sampling. Journal of Statistical Planning and Inference, 128:569-591.
Gross, S. (1980). Median estimation in sample surveys. ASA Proceedings of Survey Research, pages 181-184.
Hájek, J. (1981). Sampling from a Finite Population. Marcel Dekker, New York.
Holmberg, A. (1998). A bootstrap approach to probability proportional-to-size sampling. In Proceedings of the Section on Survey Research Methods, pages 378-383.
Johnson, N., Kotz, S., and Balakrishnan, N. (1997). Discrete Multivariate Distributions. Wiley, New York.
Kuk, A. (1989). Double bootstrap estimation of variance under systematic sampling with probability proportional to size. Journal of Statistical Computation and Simulation, 31:73-82.
Mac Carthy, P. and Snowden, C. (1985). The bootstrap and finite population sampling. Technical report, Public Health Service Publication.
Preston, J. and Henderson, T. (2007). Replicate variance estimation and high entropy variance approximations. In Papers presented at the ICES-III, June 18-21, 2007, Montreal, Quebec, Canada.
Rao, J. and Wu, C. (1988). Resampling inference with complex survey data. Journal of the American Statistical Association, 83:231-241.
Rao, J., Wu, C., and Yue, K. (1992). Some recent work on resampling methods for complex surveys. Survey Methodology, 18:209-217.
Sitter, R. (1992a). Comparing three bootstrap methods for survey data. Canadian Journal of Statistics, 20:135-154.
Sitter, R. (1992b). A resampling procedure for complex survey data. Journal of the American Statistical Association, 87:755-765.
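Supplementary sketch: the four comparison measures (R, RB, RRMSE, BR) used in the simulation tables can be computed from a list of bootstrap variance estimates, one per simulated sample. This is a minimal illustration, not the authors' code. It assumes that $\mathrm{var}(\hat{\theta}^* \mid S)$ in R and B stands for the average of those estimates over the simulated samples, and that $\mathrm{var}[\cdot]$ is their population-style variance; the function name `comparison_measures` and the input numbers are hypothetical.

```python
import math
import statistics


def comparison_measures(boot_vars, var_sim):
    """Return (R, RB, RRMSE, BR) for a list of bootstrap variance estimates.

    boot_vars: one value of var(theta_hat* | S) per simulated sample.
    var_sim:   simulation variance of theta_hat, used as the reference.
    Assumption: averages and variances are taken over the simulated samples.
    """
    mean_boot = statistics.mean(boot_vars)
    spread = statistics.pvariance(boot_vars)  # var[var(theta_hat* | S)]
    bias = mean_boot - var_sim                # B
    ratio = mean_boot / var_sim
    rel_bias = bias / var_sim
    rrmse = math.sqrt(bias ** 2 + spread) / var_sim
    bias_ratio = bias / math.sqrt(spread)
    return ratio, rel_bias, rrmse, bias_ratio


if __name__ == "__main__":
    # Made-up variance estimates from four simulated samples.
    r, rb, rrmse, br = comparison_measures([0.9, 1.1, 1.0, 1.2], var_sim=1.0)
    print(f"R={r:.4f} RB={rb:.4f} RRMSE={rrmse:.4f} BR={br:.4f}")
```

By construction RB equals R minus 1, which matches the relation between the first two columns of the simulation tables up to rounding.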