Chapter 2
Horvitz-Thompson estimation
2.1 Introduction
We will study some theory for unbiased estimation under probability sampling
designs. Let U = {1, · · · , N} be the index set of the target population. A probability
sample is simply a subset of U, denoted by A ⊂ U, selected by a probability rule
called a sampling design. Let 𝒜 = {A : A ⊂ U} be the set of all possible samples. We
have the following formal definition of the sampling distribution.
Definition 2.1.
1. Sampling distribution: a probability mass function P(·) defined on 𝒜. That is, a sampling distribution P(·) satisfies the following properties:
   (a) P(A) ∈ [0, 1] for all A ∈ 𝒜,
   (b) ∑_{A∈𝒜} P(A) = 1.
2. Random sampling ⟺ P(A) < 1 for all A ∈ 𝒜.
3. Purposive sampling ⟺ P(A) = 1 for some A ∈ 𝒜.
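As a computational illustration of Definition 2.1, a sampling design can be represented as a map from candidate samples to selection probabilities. The Python sketch below uses a small hypothetical design (not one from the text) and checks properties (a) and (b).

```python
# A hypothetical sampling design on U = {1, 2, 3, 4}: each key is a candidate
# sample A (a subset of U) and each value is its selection probability P(A).
# All subsets of U not listed receive probability 0.
design = {
    frozenset({1, 2}): 0.4,
    frozenset({1, 3}): 0.1,
    frozenset({2, 4}): 0.3,
    frozenset({3, 4}): 0.2,
}

# Property (a): every P(A) lies in [0, 1].
assert all(0.0 <= p <= 1.0 for p in design.values())

# Property (b): the probabilities sum to 1 over the set of all possible samples.
assert abs(sum(design.values()) - 1.0) < 1e-12

# This is random (not purposive) sampling, since no single sample has P(A) = 1.
print(max(design.values()) < 1.0)    # True
```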
If the parameter of interest is a population quantity that can be written as
θN = θ (yi ; i ∈ U), a statistic is written as θ̂ = θ̂ (yi ; i ∈ A), which means that
it is a function of yi in the sample. If the statistic is used to estimate θN , then
it becomes an estimator. The sampling distribution of an estimator is obtained
from the sampling distribution of the sample. That is, as discussed in Section 1.2,
the probability mass function P(A) used to obtain A also determines
P{θ̂ = θ̂ (yi ; i ∈ A)}. Using the sampling distribution of the estimator, we can
compute the expectation and variance of the estimator.
Definition 2.2. For a parameter θN, let θ̂(A) = θ̂(yi ; i ∈ A) be an estimator of θN.
1. Expectation: E(θ̂) = ∑_{A∈𝒜} P(A) θ̂(A)
2. Variance: Var(θ̂) = ∑_{A∈𝒜} P(A) {θ̂(A) − E(θ̂)}²
3. Mean squared error: MSE(θ̂) = ∑_{A∈𝒜} P(A) {θ̂(A) − θN}²
Here, the expectation is taken with respect to the sampling design induced by
the probability rule for A, treating {y1 , y2 , · · · , yN } as fixed. As discussed in Chapter
1, the difference between E(θ̂) and θN is called the bias, and an estimator is called
unbiased when its bias is zero. When an estimator has high precision, its variance
is small, but this does not necessarily mean that its accuracy is high. Accuracy of
an estimator corresponds to a small mean squared error.
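To make Definition 2.2 concrete, the following sketch enumerates all samples of a small hypothetical design and computes the design expectation, variance, and MSE of an estimator directly from the definitions. The design, the y-values, and the estimator (an unweighted sample mean used to estimate the population mean) are illustrative assumptions, not taken from the text.

```python
# Hypothetical population values and a hypothetical sampling design on U = {1, 2, 3, 4}.
y = {1: 10.0, 2: 12.0, 3: 7.0, 4: 11.0}
design = {frozenset({1, 2}): 0.4, frozenset({1, 3}): 0.1,
          frozenset({2, 4}): 0.3, frozenset({3, 4}): 0.2}

theta_N = sum(y.values()) / len(y)          # parameter of interest: the population mean

def theta_hat(A):
    """Illustrative estimator: the unweighted mean of the sampled y-values."""
    return sum(y[i] for i in A) / len(A)

# Definition 2.2, by direct enumeration over the sample space:
E = sum(p * theta_hat(A) for A, p in design.items())                    # expectation
Var = sum(p * (theta_hat(A) - E) ** 2 for A, p in design.items())       # variance
MSE = sum(p * (theta_hat(A) - theta_N) ** 2 for A, p in design.items())  # mean squared error

bias = E - theta_N
print(E, Var, MSE)       # note that MSE = Var + bias**2
```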
2.2 Horvitz-Thompson estimation
Does an unbiased estimator always exist for all probability sampling designs? To
answer this question, we need the following definitions of the inclusion probabilities.
Definition 2.3.
1. First-order inclusion probability: πi = Pr(i ∈ A) = ∑_{A : i∈A} P(A)
2. Second-order inclusion probability, or joint inclusion probability: πij = Pr(i, j ∈ A) = ∑_{A : i,j∈A} P(A)
3. Probability sampling design: πi > 0 for all i ∈ U.
4. Measurable sampling design: πij > 0 for all i, j ∈ U.
That is, the first-order inclusion probability πi is the probability that unit i is
included in the sample, and the second-order inclusion probability πij is the probability
that both unit i and unit j are included in the sample. Note that πii = πi by definition.
A probability sampling design is a sampling design in which all first-order inclusion
probabilities are strictly greater than zero; this is a sufficient condition for the
existence of a design-unbiased estimator of the population total. A measurable sampling
design is a sampling design in which all second-order inclusion probabilities are
strictly greater than zero; this is a sufficient condition for the existence of a
design-unbiased estimator of the sampling variance of an unbiased estimator.
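The inclusion probabilities of Definition 2.3 can be computed from a design by summing P(A) over the samples that contain the relevant units. A minimal sketch, again using a hypothetical design:

```python
# Hypothetical design on U = {1, 2, 3, 4}; all unlisted subsets have probability 0.
U = [1, 2, 3, 4]
design = {frozenset({1, 2}): 0.4, frozenset({1, 3}): 0.1,
          frozenset({2, 4}): 0.3, frozenset({3, 4}): 0.2}

# First-order inclusion probabilities: pi_i = sum of P(A) over samples containing i.
pi = {i: sum(p for A, p in design.items() if i in A) for i in U}

# Second-order (joint) inclusion probabilities: pi_ij = sum of P(A) over samples
# containing both i and j.  Note that pi_ii = pi_i by construction.
pi2 = {(i, j): sum(p for A, p in design.items() if i in A and j in A)
       for i in U for j in U}

print(pi)    # {1: 0.5, 2: 0.7, 3: 0.3, 4: 0.5}
# This is a probability sampling design (all pi_i > 0) but not a measurable one,
# because pi_{1,4} = pi_{2,3} = 0: no listed sample contains those pairs.
```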
The following lemma presents some algebraic properties of the inclusion probabilities.
Lemma 2.1. The first-order inclusion probabilities satisfy
    ∑_{i=1}^N πi = n,    (2.1)
where n is the sample size. If the sampling design is a fixed-size sampling design
such that V(n) = 0, then
    ∑_{i=1}^N πij = n πj.    (2.2)
Proof. Given the sample index set A, define the indicator function
    Ii = 1 if i ∈ A,  and  Ii = 0 if i ∉ A.    (2.3)
In this case, Ii is a random variable with E(Ii) = πi and E(Ii Ij) = πij. Furthermore,
by the definition of the sample size n,
    ∑_{i=1}^N Ii = n.    (2.4)
Thus, taking expectations of both sides of (2.4), we obtain (2.1). Also, multiplying
both sides of (2.4) by Ij and taking expectations again (using the fact that n is
fixed), we obtain (2.2).
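The identities (2.1) and (2.2) of Lemma 2.1 can be checked numerically. The sketch below reuses the hypothetical fixed-size design from the previous sketch (every candidate sample has size n = 2).

```python
# Hypothetical fixed-size (n = 2) design on U = {1, 2, 3, 4}, as in the earlier sketch.
U = [1, 2, 3, 4]
design = {frozenset({1, 2}): 0.4, frozenset({1, 3}): 0.1,
          frozenset({2, 4}): 0.3, frozenset({3, 4}): 0.2}
n = 2

pi = {i: sum(p for A, p in design.items() if i in A) for i in U}
pi2 = {(i, j): sum(p for A, p in design.items() if i in A and j in A)
       for i in U for j in U}

# (2.1): the first-order inclusion probabilities sum to the sample size n.
assert abs(sum(pi.values()) - n) < 1e-12

# (2.2): for a fixed-size design, sum_i pi_ij = n * pi_j for every j.
for j in U:
    assert abs(sum(pi2[(i, j)] for i in U) - n * pi[j]) < 1e-12

print("identities (2.1) and (2.2) hold for this design")
```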
When the sample is obtained from a probability sampling design, an unbiased
estimator of the total Y = ∑_{i=1}^N yi is given by
    ŶHT = ∑_{i∈A} yi / πi.    (2.5)
This is often called the Horvitz-Thompson (HT) estimator, originally discussed by
Horvitz and Thompson (1952) and Narain (1951). The following theorem presents the
basic statistical properties of the HT estimator.
Theorem 2.1. The Horvitz-Thompson estimator, given by (2.5), satisfies the following properties:
    E(ŶHT) = Y    (2.6)
    V(ŶHT) = ∑_{i=1}^N ∑_{j=1}^N (πij − πi πj) (yi / πi)(yj / πj)    (2.7)
Furthermore, for a fixed-size sampling design (i.e., V(n) = 0), we have
    V(ŶHT) = −(1/2) ∑_{i=1}^N ∑_{j=1}^N (πij − πi πj) (yi / πi − yj / πj)².    (2.8)
Proof. Using the sample indicator function Ii defined in (2.3), the HT estimator
can be written as
    ŶHT = ∑_{i=1}^N (yi / πi) Ii.
Treating {y1 , y2 , · · · , yN } as fixed and taking expectations with respect to Ii, we have
    E(ŶHT) = ∑_{i=1}^N (yi / πi) E(Ii) = ∑_{i=1}^N (yi / πi) πi = Y,
which shows (2.6). Similarly, we have
    V(ŶHT) = ∑_{i=1}^N ∑_{j=1}^N (yi yj) / (πi πj) Cov(Ii , Ij) = ∑_{i=1}^N ∑_{j=1}^N (yi yj) / (πi πj) (πij − πi πj),
and (2.7) is proved. To show (2.8), define ∆ij = πij − πi πj and expand the right-hand
side of (2.8) as
    −(1/2) ∑_{i=1}^N ∑_{j=1}^N ∆ij (yi / πi − yj / πj)² = −∑_{i=1}^N ∑_{j=1}^N ∆ij (yi / πi)² + ∑_{i=1}^N ∑_{j=1}^N ∆ij (yi yj) / (πi πj).    (2.9)
Now, using (2.2) and (2.1), we have
    ∑_{j=1}^N ∆ij = ∑_{j=1}^N πij − πi ∑_{j=1}^N πj = nπi − nπi = 0.
Thus, the first term on the right-hand side of (2.9) becomes zero, the remaining term
equals (2.7), and (2.8) is established.
Example 2.1. Let U = {1, 2, 3} be the target population and consider the following
sampling design:
    P(A) = 0.5   if A = {1, 2},
           0.25  if A = {1, 3},
           0.25  if A = {2, 3}.
In this case, we have π1 = 0.5 + 0.25 = 0.75, π2 = 0.5 + 0.25 = 0.75, and π3 =
0.25 + 0.25 = 0.5. The HT estimator of the total is then
    ŶHT = y1/0.75 + y2/0.75   if A = {1, 2},
          y1/0.75 + y3/0.5    if A = {1, 3},
          y2/0.75 + y3/0.5    if A = {2, 3}.
Therefore,
    E(ŶHT) = 0.5 (y1/0.75 + y2/0.75) + 0.25 (y1/0.75 + y3/0.5) + 0.25 (y2/0.75 + y3/0.5) = y1 + y2 + y3,
and the HT estimator is unbiased for the population total.
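The unbiasedness calculation in Example 2.1 can be reproduced by direct enumeration. The sketch below uses the design and inclusion probabilities given in the example; the specific y-values are arbitrary illustrative numbers.

```python
# Sampling design from Example 2.1 on U = {1, 2, 3}.
U = [1, 2, 3]
design = {frozenset({1, 2}): 0.5, frozenset({1, 3}): 0.25, frozenset({2, 3}): 0.25}
y = {1: 4.0, 2: 9.0, 3: 1.0}        # arbitrary illustrative values

# First-order inclusion probabilities: pi_1 = pi_2 = 0.75, pi_3 = 0.5.
pi = {i: sum(p for A, p in design.items() if i in A) for i in U}

def Y_HT(A):
    """HT estimator (2.5) of the population total for a realized sample A."""
    return sum(y[i] / pi[i] for i in A)

# The design expectation of the HT estimator equals the population total.
E_HT = sum(p * Y_HT(A) for A, p in design.items())
assert abs(E_HT - sum(y.values())) < 1e-12
print(E_HT)                          # 14.0 = y1 + y2 + y3
```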
The HT estimator provides unbiased estimation under probability sampling. If
πi > 0 does not hold for some elements in the population, the HT estimator cannot be
used. Also, the HT estimator is not location-scale invariant. That is, for any constants
a and b,
    (1/N) ∑_{i∈A} (a + b yi) / πi ≠ a + b (1/N) ∑_{i∈A} yi / πi.
Variance formula (2.8) was discovered independently by Sen (1953) and Yates
and Grundy (1953), and is thus often called the Sen-Yates-Grundy (SYG) variance
formula. The variance is minimized when πi ∝ yi. That is, if the first-order inclusion
probability is proportional to yi, the resulting HT estimator under this sampling
design has zero variance. However, in practice, we cannot construct such a design
because we do not know the values of yi at the design stage. If there is a good
auxiliary variable xi that is believed to be closely related to yi, then a sampling
design with πi ∝ xi can be very efficient.
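As one concrete way to target πi ∝ xi for a fixed sample size n, the sketch below sets πi = n·xi / ∑_k xk and assigns πi = 1 to any unit whose scaled size would exceed one, rescaling the rest. This is a generic construction of target inclusion probabilities, not a design from the text; actually drawing a sample that attains these πi requires a specific πps selection scheme, which is not shown here.

```python
def pps_inclusion_probs(x, n):
    """Target first-order inclusion probabilities proportional to a size measure x,
    for a fixed sample size n.  Units whose scaled size reaches 1 are included with
    certainty, and the remaining probabilities are rescaled accordingly."""
    pi = [0.0] * len(x)
    certain = set()
    while True:
        rest = [i for i in range(len(x)) if i not in certain]
        total = sum(x[i] for i in rest)
        m = n - len(certain)                      # sample size left for the remaining units
        over = [i for i in rest if m * x[i] / total >= 1.0]
        if not over:
            for i in rest:
                pi[i] = m * x[i] / total
            for i in certain:
                pi[i] = 1.0
            return pi
        certain.update(over)

# Hypothetical auxiliary sizes for N = 6 units and a target sample size n = 3.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 30.0]
pi = pps_inclusion_probs(x, 3)
print(pi)            # the last unit is taken with certainty; sum(pi) equals 3
```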
Now, we discuss unbiased estimation of the variance of the HT estimator. The variance
formula in (2.7) or (2.8) is a population quantity and needs to be estimated from the
sample. Generally speaking, the variance formula is a quadratic function of the yi's
in the population. Thus, to estimate the variance, we need to assume a measurable
sampling design satisfying πij > 0 for all i and j. That is, if the parameter of
interest is of the form
    Q = ∑_{i=1}^N ∑_{j=1}^N q(yi , yj),
then, under a measurable sampling design, an unbiased estimator of Q is
    Q̂ = ∑_{i∈A} ∑_{j∈A} (1/πij) q(yi , yj).    (2.10)
Thus, an unbiased estimator of the variance in (2.7) is
    V̂(ŶHT) = ∑_{i∈A} ∑_{j∈A} {(πij − πi πj) / πij} (yi yj) / (πi πj).    (2.11)
Also, for fixed-size designs, an unbiased estimator based on the SYG variance formula
is
    V̂(ŶHT) = −(1/2) ∑_{i∈A} ∑_{j∈A} {(πij − πi πj) / πij} (yi / πi − yj / πj)².    (2.12)
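Putting the pieces together, the following sketch computes the HT point estimate (2.5) along with the SYG variance estimator (2.12) from a realized sample, given the first- and second-order inclusion probabilities. The sample, y-values, and probabilities below are hypothetical placeholders; in an actual survey they come from the design.

```python
def ht_estimate(sample, y, pi):
    """Horvitz-Thompson estimator (2.5) of the population total."""
    return sum(y[i] / pi[i] for i in sample)

def syg_variance_estimate(sample, y, pi, pi2):
    """SYG variance estimator (2.12); requires a fixed-size, measurable design
    (pi_ij > 0 for every pair of sampled units)."""
    v = 0.0
    for i in sample:
        for j in sample:
            if i == j:
                continue    # i = j terms vanish because (y_i/pi_i - y_j/pi_j)^2 = 0
            d = (pi2[(i, j)] - pi[i] * pi[j]) / pi2[(i, j)]
            v += d * (y[i] / pi[i] - y[j] / pi[j]) ** 2
    return -0.5 * v

# Hypothetical realized sample with its y-values and inclusion probabilities.
sample = [2, 5, 7]
y = {2: 13.0, 5: 8.0, 7: 20.0}
pi = {2: 0.30, 5: 0.25, 7: 0.40}
pi2 = {(2, 5): 0.06, (2, 7): 0.10, (5, 7): 0.08}             # placeholder joint probabilities
pi2.update({(j, i): p for (i, j), p in list(pi2.items())})   # make them symmetric

Y_hat = ht_estimate(sample, y, pi)
V_hat = syg_variance_estimate(sample, y, pi, pi2)
print(Y_hat, V_hat)
```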
Other important statistical properties of the HT estimator are consistency and asymptotic
normality, which are established for sufficiently large sample sizes. For an infinite
population, consistency of the sample mean means that the sample mean converges
to the population mean in probability; that is, the probability that the absolute
difference between the sample mean and the population mean exceeds a given threshold ε
goes to zero as the sample size increases:
    Pr(|ȳn − ȲN| > ε) → 0 as n → ∞, for all ε > 0.
In the finite population setup, the finite population is conceptualized as a sequence
of finite populations with size N also increasing. The HT estimator of the finite
population mean is given by
    ȲHT = (1/N) ŶHT,
and, using the Chebyshev inequality,
    Pr(|ȲHT − ȲN| > ε) ≤ ε⁻² Var(ȲHT).
Thus, as long as Var(ȲHT) → 0 as n → ∞, the consistency of the HT estimator follows.
Asymptotic normality is also an important property for obtaining confidence intervals
or performing hypothesis tests from the sample. Under some regularity conditions, we
can establish
    (ŶHT − Y) / √(V̂(ŶHT)) → N(0, 1) in distribution
for most sampling designs. Thus, in this case, the 95% confidence interval for the
population total is computed as
    ( ŶHT − 1.96 √(V̂(ŶHT)), ŶHT + 1.96 √(V̂(ŶHT)) ).
A more in-depth discussion of the asymptotic normality of the HT estimator can be
found in Chapter 1 of Fuller (2009).
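Given a point estimate and a variance estimate, the normal-approximation interval above is immediate to compute. A minimal sketch, with hypothetical placeholder numbers:

```python
import math

def ht_confidence_interval(Y_hat, V_hat, z=1.96):
    """Normal-approximation confidence interval for the population total, based on
    the asymptotic normality of the HT estimator."""
    half_width = z * math.sqrt(V_hat)
    return Y_hat - half_width, Y_hat + half_width

# Hypothetical HT point estimate and variance estimate, e.g. from (2.5) and (2.12).
lower, upper = ht_confidence_interval(Y_hat=12500.0, V_hat=640000.0)
print(lower, upper)     # 12500 ± 1.96 * 800 = (10932.0, 14068.0)
```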
References
Fuller, W. A. (2009). Sampling Statistics. Hoboken: Wiley.
Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47, 663-685.
Narain, R. D. (1951). On sampling without replacement with varying probabilities. Journal of the Indian Society of Agricultural Statistics 3, 169-175.
Sen, A. R. (1953). On the estimate of the variance in sampling with varying probabilities. Journal of the Indian Society of Agricultural Statistics 5, 119-127.
Yates, F. and Grundy, P. M. (1953). Selection without replacement from within strata with probability proportional to size. Journal of the Royal Statistical Society: Series B 15, 235-261.