Bootstrap Methods for Complex Sampling Designs
in Finite Population
Erika Antal and Yves Tillé
University of Neuchâtel
Colloque Deville
Neuchâtel 2009
Erika Antal and Yves Tillé ()
Bootstrap Methods
June 2009
1 / 30
Table of contents
1 Objectives and Interest of the Method
2 Two Tools: Sampling with Over-replacement and One-one Sampling
3 The Problem of Bootstrap in Complex Designs
4 Bootstrap for Poisson Sampling
5 Bootstrap in Other Sampling Designs
6 Two-phase Designs and Conclusions
7 Simulation Results
Objectives and Interest of the Method
A method for bootstrapping from a sample selected with a complex design.
The bootstrap sample has the same size (in expectation) as the original sample.
No need to reconstruct a pseudo-population (Gross, 1980; Chao and Lo, 1985).
No need to rescale the bootstrapped units (Mac Carthy and Snowden, 1985; Rao and Wu, 1988; Kuk, 1989; Rao et al., 1992; Sitter, 1992a,b; Booth et al., 1994; Holmberg, 1998; Preston and Henderson, 2007).
The bootstrap sample uses the same inclusion probabilities as the original sample.
No need to correct the variance.
The variance of the bootstrapped estimators is directly an estimator of the variance.
Multinomial Sampling or Simple Sampling with Replacement
Well-known result (Bol'shev, 1965; Johnson et al., 1997)
Let $(X_1, \dots, X_N)$ be a vector of independent random variables with Poisson distribution, $X_k \sim \mathrm{Poiss}(\lambda)$. Then
$$S = (S_1, \dots, S_N) = \left[ (X_1, \dots, X_N) \,\middle|\, \sum_{k=1}^{N} X_k = n \right]$$
has a multinomial distribution. Thus
$$\Pr(S = s) = \binom{n}{s_1 \cdots s_k \cdots s_N} \prod_{k \in U} \left( \frac{\pi_k}{n} \right)^{s_k}, \quad \text{for all } s \in R_n,$$
where $\pi_k = \mathrm{E}(S_k)$ and
$$R_n = \left\{ s \in \mathbb{N}^N \,\middle|\, \sum_{k=1}^{N} s_k = n \right\}.$$
Multinomial sampling
In multinomial sampling,
$$S_k \sim \mathrm{Bin}\left(n, \frac{1}{N}\right), \qquad \mathrm{E}(S_k) = \frac{n}{N}, \qquad \mathrm{var}(S_k) = \frac{n}{N}\left(1 - \frac{1}{N}\right).$$
Sampling with over-replacement
Consider now a vector of $N$ independent geometric random variables $X_k$:
$$\Pr(X_k = x_k) = (1 - p)\, p^{x_k}, \quad x_k = 0, 1, 2, 3, \dots$$
Then we say that
$$S = (S_1, \dots, S_N) = \left[ (X_1, \dots, X_N) \,\middle|\, \sum_{k=1}^{N} X_k = n \right]$$
is a sampling with over-replacement.
We get
$$\Pr(S_1 = x_1, \dots, S_N = x_N) = \frac{1}{\binom{N+n-1}{n}}.$$
Sampling with over-replacement
$S_k$ has a negative hypergeometric distribution:
$$\Pr(S_k = j) = \frac{\binom{N-1+n-j-1}{n-j}}{\binom{N+n-1}{n}}, \quad j = 0, \dots, n,$$
$$\mathrm{E}(S_k) = \frac{n}{N}, \qquad \mathrm{var}(S_k) = \frac{n(N-1)(N+n)}{N^2(N+1)}.$$
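Because every count vector with total n has the same probability under sampling with over-replacement, one way to draw such a sample is the classical "stars and bars" construction. The sketch below is only an illustration under that assumption; the function name `sample_over_replacement` is hypothetical and the slides do not say how the authors implement the design.

```python
import random

def sample_over_replacement(N, n, rng=random):
    """Draw (S_1, ..., S_N), uniform over all non-negative integer
    vectors of length N that sum to n (illustrative sketch)."""
    # Lay out N + n - 1 slots and pick N - 1 of them as "bars";
    # the n remaining slots are "stars" (selections).
    bars = sorted(rng.sample(range(N + n - 1), N - 1))
    counts = []
    prev = -1
    for b in bars:
        counts.append(b - prev - 1)  # stars between two consecutive bars
        prev = b
    counts.append(N + n - 1 - prev - 1)  # stars after the last bar
    return counts

if __name__ == "__main__":
    random.seed(1)
    print(sample_over_replacement(5, 8))
```

Each of the binom(N+n-1, n) bar placements corresponds to exactly one count vector, which gives the uniform probability stated above.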
Comparison of simple sampling designs
Under simple random sampling without replacement,
$$\mathrm{var}(\hat{Y}) = N^2 \frac{N-n}{nN} \frac{1}{N-1} \sum_{k \in U} (y_k - \bar{Y})^2.$$
Under simple random sampling with replacement (multinomial),
$$\mathrm{var}(\hat{Y}) = \frac{N(N-1)}{n} \frac{1}{N-1} \sum_{k \in U} (y_k - \bar{Y})^2.$$
Under simple random sampling with over-replacement,
$$\mathrm{var}(\hat{Y}) = N(N-1) \frac{N+n}{n(N+1)} \frac{1}{N-1} \sum_{k \in U} (y_k - \bar{Y})^2.$$
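To make the comparison concrete, the three variance formulas can be evaluated on a small population. This is a minimal sketch; the function name `design_variances` and the toy population are illustrative assumptions, not taken from the slides.

```python
def design_variances(y, n):
    """Variance of the estimated total under the three simple designs,
    for a population y (list of values) and sample size n."""
    N = len(y)
    mean_y = sum(y) / N
    # Corrected population variance: 1/(N-1) * sum (y_k - mean)^2
    s2 = sum((v - mean_y) ** 2 for v in y) / (N - 1)
    return {
        "srswor": N**2 * (N - n) / (n * N) * s2,
        "srswr": N * (N - 1) / n * s2,
        "over_replacement": N * (N - 1) * (N + n) / (n * (N + 1)) * s2,
    }

if __name__ == "__main__":
    print(design_variances([1, 2, 3, 4], 2))
```

For the toy population {1, 2, 3, 4} with n = 2, the variances are about 6.67, 10 and 12: the variance increases from sampling without replacement to with replacement to over-replacement.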
One-one sampling design
A one-one sampling design is only a tool for bootstrapping.
Selection with replacement of n units in a sample of size n.
E(Sk ) = 1
var(Sk ) = 1
It can be implemented by selecting part of the observations by multinomial sampling and the other part by sampling with over-replacement.
Fundamental problem of the bootstrap in survey sampling
$\hat{Y}$: an estimator of the total $Y$.
$\widehat{\mathrm{var}}(\hat{Y})$: an estimator of the variance of $\hat{Y}$.
$\hat{Y}^*$: an estimator of the total $Y$ obtained by resampling.
$\mathrm{var}(\hat{Y}^* \mid S)$: conditional variance of $\hat{Y}^*$ given the sample $S$.
A 'good' bootstrap method must be such that
$$\mathrm{E}(\hat{Y}^* \mid S) = \hat{Y} \quad \text{and} \quad \mathrm{var}(\hat{Y}^* \mid S) = \widehat{\mathrm{var}}(\hat{Y}).$$
PROBLEM: in a complex design, an estimator of the variance $\widehat{\mathrm{var}}(\hat{Y})$ can be very different from the variance $\mathrm{var}(\hat{Y}^* \mid S)$ obtained by resampling with the same design.
Thus the resampling design MUST be different from the sampling design, except in very special cases.
Fundamental problem of the bootstrap in survey sampling
$S$: the sample; $S_k$: number of times unit $k$ is selected in the sample.
$\mathrm{E}(S_k) = \pi_k$ and $\mathrm{cov}(S_k, S_\ell) = \Delta_{k\ell}$.
$S^*$: the bootstrap sample; $S_k^*$: number of times unit $k$ is resampled.
$\mathrm{var}(\hat{Y}^* \mid S)$: conditional variance of $\hat{Y}^*$ given the sample $S$.
$\mathrm{E}(S_k^* \mid S) = 1$ (to have $\mathrm{E}(\hat{Y}^* \mid S) = \hat{Y}$).
$\mathrm{cov}(S_k^*, S_\ell^* \mid S) = \Sigma_{k\ell}$.
In order to have $\mathrm{var}(\hat{Y}^* \mid S) = \widehat{\mathrm{var}}(\hat{Y})$, we must have
$$\Sigma_{k\ell} = \frac{\Delta_{k\ell}}{\pi_{k\ell}}.$$
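A short derivation may help to see where this condition comes from. It is a sketch under two assumptions that the slide does not state explicitly: the bootstrap estimator is taken to be $\hat{Y}^* = \sum_{k \in S} S_k^* y_k / \pi_k$, and $\widehat{\mathrm{var}}(\hat{Y})$ is taken to be the Horvitz-Thompson variance estimator of a without-replacement design, with $\pi_{kk} = \pi_k$.

```latex
\begin{align*}
\mathrm{var}(\hat{Y}^* \mid S)
  &= \sum_{k \in S} \sum_{\ell \in S}
     \frac{y_k}{\pi_k} \frac{y_\ell}{\pi_\ell}\,
     \mathrm{cov}(S_k^*, S_\ell^* \mid S)
   = \sum_{k \in S} \sum_{\ell \in S}
     \frac{y_k}{\pi_k} \frac{y_\ell}{\pi_\ell}\, \Sigma_{k\ell}, \\
\widehat{\mathrm{var}}(\hat{Y})
  &= \sum_{k \in S} \sum_{\ell \in S}
     \frac{y_k}{\pi_k} \frac{y_\ell}{\pi_\ell}\,
     \frac{\Delta_{k\ell}}{\pi_{k\ell}}.
\end{align*}
```

The two quadratic forms coincide for every vector of values $y_k$ when $\Sigma_{k\ell} = \Delta_{k\ell} / \pi_{k\ell}$ for all $k, \ell \in S$.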
Simple Example: Poisson Sampling
The $S_k$ are independent Bernoulli random variables with
$$\mathrm{E}(S_k) = \pi_k, \quad \mathrm{var}(S_k) = \pi_k (1 - \pi_k), \quad \text{and} \quad \mathrm{cov}(S_k, S_\ell) = 0.$$
$S^*$ must be such that
$$\mathrm{E}(S_k^* \mid S) = 1, \quad \mathrm{cov}(S_k^*, S_\ell^* \mid S) = 0, \quad \mathrm{var}(S_k^* \mid S) = \frac{\mathrm{var}(S_k)}{\pi_k} = 1 - \pi_k.$$
Implementation of Bootstrap for Poisson sampling
Select a sample $S_{kA}^*$ from $S$ by Poisson sampling (without replacement) with inclusion probabilities $\pi_k$.
From the set of units of $S$ such that $S_{kA}^* = 0$, select a sample $S_{kB}^*$ according to a Poisson random variable (with replacement) with parameter 1, i.e.
$$\Pr(S_{kB}^* = x \mid S) = \frac{e^{-1}}{x!}, \quad x = 0, 1, 2, 3, \dots, \quad \text{if } S_{kA}^* = 0,$$
and $S_{kB}^* = 0$ if $S_{kA}^* = 1$.
The resampling design is $S_k^* = S_{kA}^* + S_{kB}^*$.
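The two-step procedure can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the helper names `poisson1`, `poisson_bootstrap_counts` and `bootstrap_total` are hypothetical, and the Poisson(1) variable is generated with Knuth's multiplication method so that the sketch needs only the standard library.

```python
import math
import random

def poisson1(rng):
    """Poisson(1) draw by Knuth's multiplication method."""
    limit = math.exp(-1.0)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def poisson_bootstrap_counts(pi, rng=random):
    """Return S*_k for each sampled unit, given its inclusion probability."""
    counts = []
    for p in pi:
        s_a = 1 if rng.random() < p else 0       # step A: Poisson sampling
        s_b = poisson1(rng) if s_a == 0 else 0   # step B: only if not kept in A
        counts.append(s_a + s_b)
    return counts

def bootstrap_total(y, pi, rng=random):
    """Bootstrap estimate of the total, reusing the original pi_k."""
    counts = poisson_bootstrap_counts(pi, rng)
    return sum(c * yk / pk for c, yk, pk in zip(counts, y, pi))
```

With this construction, E(S*_k | S) = pi_k + (1 - pi_k) = 1 and E(S*_k^2 | S) = pi_k + 2(1 - pi_k), so var(S*_k | S) = 1 - pi_k, as required for Poisson sampling.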
Simple Random Sampling With Replacement
Sampling design: sampling with replacement.
Resampling with a one-one design.
In this case,
$$\mathrm{var}(\hat{Y}^* \mid S) = \frac{N^2}{n(n-1)} \sum_{k \in S} (y_k - \hat{\bar{Y}})^2,$$
where $\hat{\bar{Y}}$ is the sample mean.
Note that, if the resampling design is simple random sampling with replacement, then the variance is slightly underestimated:
$$\mathrm{var}(\hat{Y}^* \mid S) = \frac{N^2}{n^2} \sum_{k \in S} (y_k - \hat{\bar{Y}})^2.$$
Unequal Probability Sampling With Replacement
Sampling design: unequal probability sampling with replacement.
Resampling with a one-one design.
$$\mathrm{var}(\hat{Y}^* \mid S) = \frac{n}{n-1} \sum_{k \in S} \left( \frac{y_k}{\pi_k} - \frac{\hat{Y}}{n} \right)^2.$$
Simple Random Sampling Without Replacement
Sampling design: sampling without replacement.
Resampling with a composite design:
A sample $S_A$ of $m = n^2/N$ units is selected in $S$ with simple random sampling without replacement.
In the non-selected units, select a sample $S_B$ with a one-one design.
The bootstrap sample is $S^* = S_A + S_B$.
$$\mathrm{var}(\hat{Y}^* \mid S) = N^2 \frac{N-n}{n(N-1)} \frac{1}{n-1} \sum_{k \in S} (y_k - \hat{\bar{Y}})^2.$$
Unequal Probability Sampling Without Replacement
Sampling design: sampling without replacement.
Resampling with a composite design:
A sample SA of units is selected in S with unequal probability sampling
and inclusion probabilities φk without replacement.
In the non-selected units, select a sample SB with a one-one design.
The bootstrap sample is S ∗ = SA + SB .
Unequal Probability Sampling Without Replacement
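With the choice
$$1 - \phi_k = \min\left[ (1 - \pi_k) \frac{n}{n-1} \left( 1 - \frac{1 - \pi_k}{\sum_{j \in U} S_j (1 - \pi_j)} \right),\ 1 \right],$$
the bootstrap variance will be very close to the estimator of variance proposed by Hájek (1981):
$$\mathrm{var}(\hat{Y}^* \mid S) \approx \widehat{\mathrm{var}}(\hat{Y}) = \sum_{k \in S} c_k \left( \frac{y_k}{\pi_k} - \frac{\sum_{\ell \in S} c_\ell\, y_\ell / \pi_\ell}{\sum_{\ell \in S} c_\ell} \right)^2, \qquad c_k = \frac{n}{n-1} (1 - \pi_k).$$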
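As a small numerical illustration, the resampling probabilities of this choice can be computed directly from the inclusion probabilities of the sampled units. The sketch assumes a without-replacement sample, so that the sum over j in U of S_j(1 - pi_j) reduces to a sum over the sample; the function name `hajek_phi` is hypothetical.

```python
def hajek_phi(pi):
    """phi_k for each sampled unit, given the list pi of inclusion
    probabilities of the n >= 2 sampled units (not all equal to 1)."""
    n = len(pi)
    total = sum(1.0 - p for p in pi)  # sum over the sample of (1 - pi_j)
    phi = []
    for p in pi:
        one_minus_phi = (1.0 - p) * n / (n - 1) * (1.0 - (1.0 - p) / total)
        phi.append(1.0 - min(one_minus_phi, 1.0))
    return phi

if __name__ == "__main__":
    print(hajek_phi([0.2, 0.4, 0.6, 0.8]))
```

When all inclusion probabilities are equal, the expression simplifies to 1 - phi_k = 1 - pi_k, so this choice coincides with phi_k = pi_k in that case.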
Unequal Probability Sampling Without Replacement
Other choices for ck :
The simplest choice is $\phi_k = \pi_k$. In this case, $\mathrm{var}(S_k^* \mid S) = 1 - \pi_k$.
This choice will reconstruct a matrix that is very close to an estimator of variance proposed by Deville and Tillé (2005).
A third choice can be $\phi_k$ such that
$$1 - \phi_k = \min\left( - \sum_{j \in U,\ j \neq k} S_j \frac{\Delta_{kj}}{\pi_{kj}},\ 1 \right),$$
which has the advantage of reconstructing exactly the diagonal of $\Delta_{k\ell} / \pi_{k\ell}$.
Two-stage Designs
The method can be generalized to two-stage and two-phase designs.
All the primary units must not be selected.
Conclusion
In order to reconstruct the 'good' estimator of variance, the resampling design must be an ad hoc procedure that reproduces the first and the second moments.
The advantage is that rescaling is not necessary.
The bootstrap sample can be treated as the original sample
(calibration, etc.)
Population, functions and sampling designs
Generated population of 284 units.
Variance of the estimator of four functions:
Total
Ratio of two totals Y and X
Median
Gini index
Three different sampling designs:
Poisson sampling
Simple Random Sampling Without Replacement (srswor)
Unequal Probability Sampling Without Replacement (UPwor)
Total
Total:
Goal: $\mathrm{var}(\hat{Y}^* \mid S) = \widehat{\mathrm{var}}(\hat{Y})$
Comparison of the results of:
the bootstrap with replacement (generally used bootstrap)
the bootstrap without replacement (e.g. creating an artificial population)
the new method proposed here.
Other functions
Other statistics: $\mathrm{var}(\hat{\theta}) \approx \mathrm{var}_{sim}(\hat{\theta})$
Goal: $\mathrm{var}(\hat{\theta}^* \mid S) = \mathrm{var}_{sim}(\hat{\theta})$
Comparison of the results of:
the bootstrap with replacement (generally used bootstrap)
the new method proposed here.
Measures of comparison
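Measures used for the comparison, with $B = \mathrm{var}(\hat{\theta}^* \mid S) - \mathrm{var}_{sim}(\hat{\theta})$:
$$\text{Ratio (R)} = \frac{\mathrm{var}(\hat{\theta}^* \mid S)}{\mathrm{var}_{sim}(\hat{\theta})}$$
$$\text{Relative Bias (RB)} = \frac{\mathrm{var}(\hat{\theta}^* \mid S) - \mathrm{var}_{sim}(\hat{\theta})}{\mathrm{var}_{sim}(\hat{\theta})} = \frac{B}{\mathrm{var}_{sim}(\hat{\theta})}$$
$$\text{Relative Root Mean Squared Error (RRMSE)} = \frac{\sqrt{B^2 + \mathrm{var}[\mathrm{var}(\hat{\theta}^* \mid S)]}}{\mathrm{var}_{sim}(\hat{\theta})}$$
$$\text{Bias Ratio (BR)} = \frac{B}{\sqrt{\mathrm{var}[\mathrm{var}(\hat{\theta}^* \mid S)]}}$$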
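The four measures are simple to compute once the bootstrap variances have been collected over the simulated samples. The sketch below assumes, which the slide does not state explicitly, that var(θ̂*|S) in the numerators is averaged over the simulated samples and that var[var(θ̂*|S)] is its variance across those samples; the function name `comparison_measures` is hypothetical.

```python
import math
from statistics import mean, pvariance

def comparison_measures(boot_vars, var_sim):
    """boot_vars: bootstrap variances var(theta*|S), one per simulated sample.
    var_sim: Monte Carlo variance of the estimator theta-hat."""
    avg = mean(boot_vars)
    bias = avg - var_sim             # B
    spread = pvariance(boot_vars)    # var[var(theta*|S)], must be > 0 for BR
    return {
        "R": avg / var_sim,
        "RB": bias / var_sim,
        "RRMSE": math.sqrt(bias**2 + spread) / var_sim,
        "BR": bias / math.sqrt(spread),
    }

if __name__ == "__main__":
    print(comparison_measures([9.0, 13.0], 10.0))
```

By construction R = 1 + RB, which is the pattern visible in the first two columns of the simulation tables.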
Simulation results: Poisson Sampling
Function  Method          Ratio    Relative Bias  RRMSE    Bias Ratio
TOTAL     Bootstrap WR    0.9844   −0.0156        0.1643   −0.0892
          Bootstrap WOR   1.0246    0.0246        0.0813    0.3174
          New Method      0.9997   −0.0002        0.1447   −0.0014
RATIO     Bootstrap WR    0.9955   −0.0045        0.2370   −0.0188
          New Method      0.9875   −0.0125        0.2376   −0.0529
MEDIAN    Bootstrap WR    1.0490    0.0490        0.5355    0.0918
          New Method      1.0502    0.0502        0.5619    0.0897
GINI      Bootstrap WR    0.9625   −0.0375        0.6116   −0.0614
          New Method      1.0027    0.0027        0.6528    0.0042
Simulation results: Simple Random Sampling Without Replacement
Function  Method          Ratio    Relative Bias  RRMSE    Bias Ratio
TOTAL     Bootstrap WR    1.0101    0.0101        0.1662    0.0611
          Bootstrap WOR   0.9897   −0.0103        0.1633   −0.0631
          New Method      1.0067    0.0067        0.1656    0.0407
RATIO     Bootstrap WR    0.9865   −0.0135        0.1939   −0.0698
          New Method      0.9831   −0.0169        0.1936   −0.0874
MEDIAN    Bootstrap WR    1.0427    0.0427        0.4745    0.0903
          New Method      1.0703    0.0703        0.5188    0.1367
GINI      Bootstrap WR    1.0088    0.0088        0.4152    0.0335
          New Method      1.0044    0.0044        0.2619    0.0169
Simulation results: Unequal Probability Sampling Without Replacement
Function  Method          Ratio    Relative Bias  RRMSE    Bias Ratio
TOTAL     Bootstrap WR    1.0383    0.0383        0.2412    0.0992
          Bootstrap WOR   0.9440   −0.0560        0.1936   −0.0027
          New Method      1.0004    0.0004        0.1985    0.0019
RATIO     Bootstrap WR    1.0301    0.0301        0.2034    0.1499
          New Method      1.0296    0.0296        0.2090    0.1430
MEDIAN    Bootstrap WR    0.9975   −0.0025        0.4293   −0.0059
          New Method      1.0248    0.0248        0.4673    0.0536
GINI      Bootstrap WR    0.9307   −0.0693        0.8546   −0.0814
          New Method      0.9615   −0.0385        0.9496   −0.0406
Future work
Three possible directions for examining the performance of this new method:
other sampling designs (two-stage sampling design, balanced sampling, stratified sampling design, etc.)
other θ (variance, QSR, etc.)
comparison with other existing resampling methods (BBR, other types of bootstrap methods, etc.)
Bibliography
Bol'shev, L. (1965). On a characterization of the Poisson distribution. Teoriya Veroyatnostei i ee Primeneniya, 10:64-71.
Booth, J., Butler, R., and Hall, P. (1994). Bootstrap methods for finite populations. Journal of the American Statistical Association, 89:1282-1289.
Chao, M.-T. and Lo, S.-H. (1985). A bootstrap method for finite population. Sankhya Series A, 47:399-405.
Deville, J.-C. and Tillé, Y. (2005). Variance approximation under balanced sampling. Journal of Statistical Planning and Inference, 128:569-591.
Gross, S. (1980). Median estimation in sample surveys. ASA Proceedings of Survey Research, pages 181-184.
Hájek, J. (1981). Sampling from a Finite Population. Marcel Dekker, New York.
Holmberg, A. (1998). A bootstrap approach to probability proportional-to-size sampling. In Proceedings of the Section on Survey Research Methods, pages 378-383.
Johnson, N., Kotz, S., and Balakrishnan, N. (1997). Discrete Multivariate Distributions. Wiley, New York.
Kuk, A. (1989). Double bootstrap estimation of variance under systematic sampling with probability proportional to size. Journal of Statistical Computation and Simulation, 31:73-82.
Mac Carthy, P. and Snowden, C. (1985). The bootstrap and finite population sampling. Technical report, Public Health Service Publication.
Preston, J. and Henderson, T. (2007). Replicate variance estimation and high entropy variance approximations. In Papers presented at the ICES-III, June 18-21, 2007, Montreal, Quebec, Canada.
Rao, J. and Wu, C. (1988). Resampling inference with complex survey data. Journal of the American Statistical Association, 83:231-241.
Rao, J., Wu, C., and Yue, K. (1992). Some recent work on resampling methods for complex surveys. Survey Methodology, 18:209-217.
Sitter, R. (1992a). Comparing three bootstrap methods for survey data. Canadian Journal of Statistics, 20:135-154.
Sitter, R. (1992b). A resampling procedure for complex survey data. Journal of the American Statistical Association, 87:755-765.