Chapter 2
Horvitz-Thompson estimation
2.1 Introduction
We will study some theory for unbiased estimation under probability sampling
designs. Let U = {1, · · · , N} be the index set of the target population. A probability
sample is simply a subset of U, denoted by A ⊂ U, selected by a probability rule
called a sampling design. Let 𝒜 = {A : A ⊂ U} be the set of all possible samples. We
have the following formal definition of the sampling distribution.
Definition 2.1.
1. Sampling distribution: a probability mass function P(·) defined on 𝒜. That is, a sampling distribution P(·) satisfies the following properties:
   (a) P(A) ∈ [0, 1] for all A ∈ 𝒜,
   (b) ∑_{A∈𝒜} P(A) = 1.
2. Random sampling ⟺ P(A) < 1 for all A ∈ 𝒜.
3. Purposive sampling ⟺ P(A) = 1 for some A ∈ 𝒜.
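As a computational illustration of Definition 2.1, a sampling design can be represented as a map from candidate samples to selection probabilities. The Python sketch below uses a small hypothetical design (not one from the text) and checks properties (a) and (b).

```python
# A hypothetical sampling design on U = {1, 2, 3, 4}: each key is a candidate
# sample A (a subset of U) and each value is its selection probability P(A).
# All subsets of U not listed receive probability 0.
design = {
    frozenset({1, 2}): 0.4,
    frozenset({1, 3}): 0.1,
    frozenset({2, 4}): 0.3,
    frozenset({3, 4}): 0.2,
}

# Property (a): every P(A) lies in [0, 1].
assert all(0.0 <= p <= 1.0 for p in design.values())

# Property (b): the probabilities sum to 1 over the set of all possible samples.
assert abs(sum(design.values()) - 1.0) < 1e-12

# This is random (not purposive) sampling, since no single sample has P(A) = 1.
print(max(design.values()) < 1.0)    # True
```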
If the parameter of interest is a population quantity that can be written as
θN = θ (yi ; i ∈ U), a statistic is written as θ̂ = θ̂ (yi ; i ∈ A), which means that
it is a function of yi in the sample. If the statistic is used to estimate θN , then
it becomes an estimator. The sampling distribution of an estimator is obtained
from the sampling distribution of the sample. That is, as discussed in Section 1.2,
the probability mass function P(A) used to obtain A also determines
P{θ̂ = θ̂ (yi ; i ∈ A)}. Using the sampling distribution of the estimator, we can
compute the expectation and variance of the estimator.
Definition 2.2. For a parameter θN, let θ̂(A) = θ̂(yi ; i ∈ A) be an estimator of θN.
1. Expectation: E(θ̂) = ∑_{A∈𝒜} P(A) θ̂(A)
2. Variance: Var(θ̂) = ∑_{A∈𝒜} P(A) {θ̂(A) − E(θ̂)}²
3. Mean squared error: MSE(θ̂) = ∑_{A∈𝒜} P(A) {θ̂(A) − θN}²
Here, the expectation is taken with respect to the sampling design induced by
the probability rule for A, treating {y1 , y2 , · · · , yN } as fixed. As discussed in Chapter
1, the difference between E(θ̂) and θN is called the bias, and an estimator is called
unbiased when its bias is zero. When an estimator has high precision, its variance
is small, but this does not necessarily mean that its accuracy is high. Accuracy of
an estimator corresponds to a small mean squared error.
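To make Definition 2.2 concrete, the following sketch enumerates all samples of a small hypothetical design and computes the design expectation, variance, and MSE of an estimator directly from the definitions. The design, the y-values, and the estimator (an unweighted sample mean used to estimate the population mean) are illustrative assumptions, not taken from the text.

```python
# Hypothetical population values and a hypothetical sampling design on U = {1, 2, 3, 4}.
y = {1: 10.0, 2: 12.0, 3: 7.0, 4: 11.0}
design = {frozenset({1, 2}): 0.4, frozenset({1, 3}): 0.1,
          frozenset({2, 4}): 0.3, frozenset({3, 4}): 0.2}

theta_N = sum(y.values()) / len(y)          # parameter of interest: the population mean

def theta_hat(A):
    """Illustrative estimator: the unweighted mean of the sampled y-values."""
    return sum(y[i] for i in A) / len(A)

# Definition 2.2, by direct enumeration over the sample space:
E = sum(p * theta_hat(A) for A, p in design.items())                    # expectation
Var = sum(p * (theta_hat(A) - E) ** 2 for A, p in design.items())       # variance
MSE = sum(p * (theta_hat(A) - theta_N) ** 2 for A, p in design.items())  # mean squared error

bias = E - theta_N
print(E, Var, MSE)       # note that MSE = Var + bias**2
```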
2.2 Horvitz-Thompson estimation
Does an unbiased estimator always exist for all probability sampling designs? To
answer this question, we need the following definitions of the inclusion probabilities.
Definition 2.3.
1. First-order inclusion probability: πi = Pr(i ∈ A) = ∑_{A : i∈A} P(A)
2. Second-order inclusion probability, or joint inclusion probability: πij = Pr(i, j ∈ A) = ∑_{A : i,j∈A} P(A)
3. Probability sampling design: πi > 0 for all i ∈ U.
4. Measurable sampling design: πij > 0 for all i, j ∈ U.
That is, the first-order inclusion probability πi is the probability that unit i is
included in the sample, and the second-order inclusion probability πij is the probability
that both unit i and unit j are included in the sample. Note that πii = πi by definition.
A probability sampling design is a sampling design in which all first-order inclusion
probabilities are strictly greater than zero; this is a sufficient condition for the
existence of a design-unbiased estimator of the population total. A measurable sampling
design is a sampling design in which all second-order inclusion probabilities are
strictly greater than zero; this is a sufficient condition for the existence of a
design-unbiased estimator of the sampling variance of an unbiased estimator.
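The inclusion probabilities of Definition 2.3 can be computed from a design by summing P(A) over the samples that contain the relevant units. A minimal sketch, again using a hypothetical design:

```python
# Hypothetical design on U = {1, 2, 3, 4}; all unlisted subsets have probability 0.
U = [1, 2, 3, 4]
design = {frozenset({1, 2}): 0.4, frozenset({1, 3}): 0.1,
          frozenset({2, 4}): 0.3, frozenset({3, 4}): 0.2}

# First-order inclusion probabilities: pi_i = sum of P(A) over samples containing i.
pi = {i: sum(p for A, p in design.items() if i in A) for i in U}

# Second-order (joint) inclusion probabilities: pi_ij = sum of P(A) over samples
# containing both i and j.  Note that pi_ii = pi_i by construction.
pi2 = {(i, j): sum(p for A, p in design.items() if i in A and j in A)
       for i in U for j in U}

print(pi)    # {1: 0.5, 2: 0.7, 3: 0.3, 4: 0.5}
# This is a probability sampling design (all pi_i > 0) but not a measurable one,
# because pi_{1,4} = pi_{2,3} = 0: no listed sample contains those pairs.
```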
The following lemma presents some algebraic properties of the inclusion probabilities.
Lemma 2.1. The first-order inclusion probabilities satisfy
    ∑_{i=1}^N πi = n,    (2.1)
where n is the sample size. If the sampling design is a fixed-size sampling design
such that V(n) = 0, then
    ∑_{i=1}^N πij = n πj.    (2.2)
Proof. Given the sample index set A, define the indicator function
    Ii = 1 if i ∈ A,  and  Ii = 0 if i ∉ A.    (2.3)
In this case, Ii is a random variable with E(Ii) = πi and E(Ii Ij) = πij. Furthermore,
by the definition of the sample size n,
    ∑_{i=1}^N Ii = n.    (2.4)
Thus, taking expectations of both sides of (2.4), we obtain (2.1). Also, multiplying
both sides of (2.4) by Ij and taking expectations again (using the fact that n is
fixed), we obtain (2.2).
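The identities (2.1) and (2.2) of Lemma 2.1 can be checked numerically. The sketch below reuses the hypothetical fixed-size design from the previous sketch (every candidate sample has size n = 2).

```python
# Hypothetical fixed-size (n = 2) design on U = {1, 2, 3, 4}, as in the earlier sketch.
U = [1, 2, 3, 4]
design = {frozenset({1, 2}): 0.4, frozenset({1, 3}): 0.1,
          frozenset({2, 4}): 0.3, frozenset({3, 4}): 0.2}
n = 2

pi = {i: sum(p for A, p in design.items() if i in A) for i in U}
pi2 = {(i, j): sum(p for A, p in design.items() if i in A and j in A)
       for i in U for j in U}

# (2.1): the first-order inclusion probabilities sum to the sample size n.
assert abs(sum(pi.values()) - n) < 1e-12

# (2.2): for a fixed-size design, sum_i pi_ij = n * pi_j for every j.
for j in U:
    assert abs(sum(pi2[(i, j)] for i in U) - n * pi[j]) < 1e-12

print("identities (2.1) and (2.2) hold for this design")
```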
When the sample is obtained from a probability sampling design, an unbiased
estimator of the total Y = ∑_{i=1}^N yi is given by
    ŶHT = ∑_{i∈A} yi / πi.    (2.5)
This is often called the Horvitz-Thompson (HT) estimator, originally discussed by
Horvitz and Thompson (1952) and Narain (1951). The following theorem presents the
basic statistical properties of the HT estimator.
Theorem 2.1. The Horvitz-Thompson estimator, given by (2.5), satisfies the following properties:
    E(ŶHT) = Y    (2.6)
    V(ŶHT) = ∑_{i=1}^N ∑_{j=1}^N (πij − πi πj) (yi / πi)(yj / πj)    (2.7)
Furthermore, for a fixed-size sampling design (i.e., V(n) = 0), we have
    V(ŶHT) = −(1/2) ∑_{i=1}^N ∑_{j=1}^N (πij − πi πj) (yi / πi − yj / πj)².    (2.8)
Proof. Using the sample indicator function Ii defined in (2.3), the HT estimator
can be written as
    ŶHT = ∑_{i=1}^N (yi / πi) Ii.
Treating {y1 , y2 , · · · , yN } as fixed and taking expectations with respect to Ii, we have
    E(ŶHT) = ∑_{i=1}^N (yi / πi) E(Ii) = ∑_{i=1}^N (yi / πi) πi = Y,
which shows (2.6). Similarly, we have
    V(ŶHT) = ∑_{i=1}^N ∑_{j=1}^N (yi yj) / (πi πj) Cov(Ii , Ij) = ∑_{i=1}^N ∑_{j=1}^N (yi yj) / (πi πj) (πij − πi πj),
and (2.7) is proved. To show (2.8), define ∆ij = πij − πi πj and expand the right-hand
side of (2.8) as
    −(1/2) ∑_{i=1}^N ∑_{j=1}^N ∆ij (yi / πi − yj / πj)² = −∑_{i=1}^N ∑_{j=1}^N ∆ij (yi / πi)² + ∑_{i=1}^N ∑_{j=1}^N ∆ij (yi yj) / (πi πj).    (2.9)
Now, using (2.2) and (2.1), we have
    ∑_{j=1}^N ∆ij = ∑_{j=1}^N πij − πi ∑_{j=1}^N πj = nπi − nπi = 0.
Thus, the first term on the right-hand side of (2.9) becomes zero, the remaining term
equals (2.7), and (2.8) is established.
Example 2.1. Let U = {1, 2, 3} be the target population and consider the following
sampling design:
    P(A) = 0.5   if A = {1, 2},
           0.25  if A = {1, 3},
           0.25  if A = {2, 3}.
In this case, we have π1 = 0.5 + 0.25 = 0.75, π2 = 0.5 + 0.25 = 0.75, and π3 =
0.25 + 0.25 = 0.5. The HT estimator of the total is then
    ŶHT = y1/0.75 + y2/0.75   if A = {1, 2},
          y1/0.75 + y3/0.5    if A = {1, 3},
          y2/0.75 + y3/0.5    if A = {2, 3}.
Therefore,
    E(ŶHT) = 0.5 (y1/0.75 + y2/0.75) + 0.25 (y1/0.75 + y3/0.5) + 0.25 (y2/0.75 + y3/0.5) = y1 + y2 + y3,
and the HT estimator is unbiased for the population total.
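The unbiasedness calculation in Example 2.1 can be reproduced by direct enumeration. The sketch below uses the design and inclusion probabilities given in the example; the specific y-values are arbitrary illustrative numbers.

```python
# Sampling design from Example 2.1 on U = {1, 2, 3}.
U = [1, 2, 3]
design = {frozenset({1, 2}): 0.5, frozenset({1, 3}): 0.25, frozenset({2, 3}): 0.25}
y = {1: 4.0, 2: 9.0, 3: 1.0}        # arbitrary illustrative values

# First-order inclusion probabilities: pi_1 = pi_2 = 0.75, pi_3 = 0.5.
pi = {i: sum(p for A, p in design.items() if i in A) for i in U}

def Y_HT(A):
    """HT estimator (2.5) of the population total for a realized sample A."""
    return sum(y[i] / pi[i] for i in A)

# The design expectation of the HT estimator equals the population total.
E_HT = sum(p * Y_HT(A) for A, p in design.items())
assert abs(E_HT - sum(y.values())) < 1e-12
print(E_HT)                          # 14.0 = y1 + y2 + y3
```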
The HT estimator provides unbiased estimation under probability sampling. If
πi > 0 does not hold for some elements in the population, the HT estimator cannot be
used. Also, the HT estimator is not location-scale invariant. That is, for any constants
a and b,
    (1/N) ∑_{i∈A} (a + b yi) / πi ≠ a + b (1/N) ∑_{i∈A} yi / πi.
Variance formula (2.8) was discovered independently by Sen (1953) and Yates
and Grundy (1953), and is thus often called the Sen-Yates-Grundy (SYG) variance
formula. The variance is minimized when πi ∝ yi. That is, if the first-order inclusion
probability is proportional to yi, the resulting HT estimator under this sampling
design has zero variance. However, in practice, we cannot construct such a design
because we do not know the values of yi at the design stage. If there is a good
auxiliary variable xi that is believed to be closely related to yi, then a sampling
design with πi ∝ xi can be very efficient.
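As one concrete way to target πi ∝ xi for a fixed sample size n, the sketch below sets πi = n·xi / ∑_k xk and assigns πi = 1 to any unit whose scaled size would exceed one, rescaling the rest. This is a generic construction of target inclusion probabilities, not a design from the text; actually drawing a sample that attains these πi requires a specific πps selection scheme, which is not shown here.

```python
def pps_inclusion_probs(x, n):
    """Target first-order inclusion probabilities proportional to a size measure x,
    for a fixed sample size n.  Units whose scaled size reaches 1 are included with
    certainty, and the remaining probabilities are rescaled accordingly."""
    pi = [0.0] * len(x)
    certain = set()
    while True:
        rest = [i for i in range(len(x)) if i not in certain]
        total = sum(x[i] for i in rest)
        m = n - len(certain)                      # sample size left for the remaining units
        over = [i for i in rest if m * x[i] / total >= 1.0]
        if not over:
            for i in rest:
                pi[i] = m * x[i] / total
            for i in certain:
                pi[i] = 1.0
            return pi
        certain.update(over)

# Hypothetical auxiliary sizes for N = 6 units and a target sample size n = 3.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 30.0]
pi = pps_inclusion_probs(x, 3)
print(pi)            # the last unit is taken with certainty; sum(pi) equals 3
```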
Now, we discuss unbiased estimation of the variance of the HT estimator. The variance
formula in (2.7) or (2.8) is a population quantity and needs to be estimated from the
sample. Generally speaking, the variance formula is a quadratic function of the yi's
in the population. Thus, to estimate the variance, we need to assume a measurable
sampling design satisfying πij > 0 for all i and j. That is, if the parameter of
interest is of the form
    Q = ∑_{i=1}^N ∑_{j=1}^N q(yi , yj),
then, under a measurable sampling design, an unbiased estimator of Q is
    Q̂ = ∑_{i∈A} ∑_{j∈A} (1/πij) q(yi , yj).    (2.10)
Thus, an unbiased estimator of the variance in (2.7) is
    V̂(ŶHT) = ∑_{i∈A} ∑_{j∈A} {(πij − πi πj) / πij} (yi yj) / (πi πj).    (2.11)
Also, for fixed-size designs, an unbiased estimator based on the SYG variance formula
is
    V̂(ŶHT) = −(1/2) ∑_{i∈A} ∑_{j∈A} {(πij − πi πj) / πij} (yi / πi − yj / πj)².    (2.12)
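Putting the pieces together, the following sketch computes the HT point estimate (2.5) along with the SYG variance estimator (2.12) from a realized sample, given the first- and second-order inclusion probabilities. The sample, y-values, and probabilities below are hypothetical placeholders; in an actual survey they come from the design.

```python
def ht_estimate(sample, y, pi):
    """Horvitz-Thompson estimator (2.5) of the population total."""
    return sum(y[i] / pi[i] for i in sample)

def syg_variance_estimate(sample, y, pi, pi2):
    """SYG variance estimator (2.12); requires a fixed-size, measurable design
    (pi_ij > 0 for every pair of sampled units)."""
    v = 0.0
    for i in sample:
        for j in sample:
            if i == j:
                continue    # i = j terms vanish because (y_i/pi_i - y_j/pi_j)^2 = 0
            d = (pi2[(i, j)] - pi[i] * pi[j]) / pi2[(i, j)]
            v += d * (y[i] / pi[i] - y[j] / pi[j]) ** 2
    return -0.5 * v

# Hypothetical realized sample with its y-values and inclusion probabilities.
sample = [2, 5, 7]
y = {2: 13.0, 5: 8.0, 7: 20.0}
pi = {2: 0.30, 5: 0.25, 7: 0.40}
pi2 = {(2, 5): 0.06, (2, 7): 0.10, (5, 7): 0.08}             # placeholder joint probabilities
pi2.update({(j, i): p for (i, j), p in list(pi2.items())})   # make them symmetric

Y_hat = ht_estimate(sample, y, pi)
V_hat = syg_variance_estimate(sample, y, pi, pi2)
print(Y_hat, V_hat)
```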
Other important statistical properties of the HT estimator are consistency and asymptotic
normality, which are established for sufficiently large sample sizes. For an infinite
population, consistency of the sample mean means that the sample mean converges
to the population mean in probability; that is, the probability that the absolute
difference between the sample mean and the population mean exceeds a given threshold ε
goes to zero as the sample size increases:
    Pr(|ȳn − ȲN| > ε) → 0 as n → ∞, for all ε > 0.
In the finite population setup, the finite population is conceptualized as a sequence
of finite populations with size N also increasing. The HT estimator of the finite
population mean is given by
    ȲHT = (1/N) ŶHT,
and, using the Chebyshev inequality,
    Pr(|ȲHT − ȲN| > ε) ≤ ε⁻² Var(ȲHT).
Thus, as long as Var(ȲHT) → 0 as n → ∞, the consistency of the HT estimator follows.
Asymptotic normality is also an important property for obtaining confidence intervals
or performing hypothesis tests from the sample. Under some regularity conditions, we
can establish
    (ŶHT − Y) / √(V̂(ŶHT)) → N(0, 1) in distribution
for most sampling designs. Thus, in this case, the 95% confidence interval for the
population total is computed as
    ( ŶHT − 1.96 √(V̂(ŶHT)), ŶHT + 1.96 √(V̂(ŶHT)) ).
A more in-depth discussion of the asymptotic normality of the HT estimator can be
found in Chapter 1 of Fuller (2009).
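Given a point estimate and a variance estimate, the normal-approximation interval above is immediate to compute. A minimal sketch, with hypothetical placeholder numbers:

```python
import math

def ht_confidence_interval(Y_hat, V_hat, z=1.96):
    """Normal-approximation confidence interval for the population total, based on
    the asymptotic normality of the HT estimator."""
    half_width = z * math.sqrt(V_hat)
    return Y_hat - half_width, Y_hat + half_width

# Hypothetical HT point estimate and variance estimate, e.g. from (2.5) and (2.12).
lower, upper = ht_confidence_interval(Y_hat=12500.0, V_hat=640000.0)
print(lower, upper)     # 12500 ± 1.96 * 800 = (10932.0, 14068.0)
```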
References
Fuller, W. A. (2009). Sampling Statistics. Hoboken: Wiley.
Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47, 663-685.
Narain, R. D. (1951). On sampling without replacement with varying probabilities. Journal of the Indian Society of Agricultural Statistics 3, 169-175.
Sen, A. R. (1953). On the estimate of the variance in sampling with varying probabilities. Journal of the Indian Society of Agricultural Statistics 5, 119-127.
Yates, F. and Grundy, P. M. (1953). Selection without replacement from within strata with probability proportional to size. Journal of the Royal Statistical Society: Series B 15, 235-261.