Survey
(3) Let \sigma^2 be the population variance:

\sigma^2 = \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \mu)^2

Then

\mathrm{Var}(\bar{y}_s) = (1 - f)\,\frac{\sigma^2}{n}
Here the factor (1 - f), where f = n/N is the sampling fraction, is called the finite population correction.
• usually unimportant in social surveys:
  n = 10,000 and N = 5,000,000: 1 - f = 0.998
  n = 1000 and N = 400,000: 1 - f = 0.9975
  n = 1000 and N = 5,000,000: 1 - f = 0.9998
• the effect of changing n is much more important than the effect of changing n/N
An unbiased estimator of \sigma^2 is given by the sample variance

s^2 = \frac{1}{n-1} \sum_{i \in s} (y_i - \bar{y}_s)^2

The estimated variance is then

\hat{V}(\bar{y}_s) = (1 - f)\,\frac{s^2}{n}
Usually we report the standard error of the estimate:
\mathrm{SE}(\bar{y}_s) = \sqrt{\hat{V}(\bar{y}_s)}
Confidence intervals for \mu are based on the Central Limit Theorem:

For large n and N - n:

Z = \frac{\bar{y}_s - \mu}{\sigma \sqrt{(1 - f)/n}} \sim N(0, 1)

Approximate 95% CI for \mu:

\left(\bar{y}_s - 1.96\,\mathrm{SE}(\bar{y}_s),\; \bar{y}_s + 1.96\,\mathrm{SE}(\bar{y}_s)\right) = \bar{y}_s \pm 1.96\,\mathrm{SE}(\bar{y}_s)
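To make the computation concrete, here is a minimal Python sketch (not from the original slides; the sample values and N below are made up) that evaluates \bar{y}_s, s^2, the fpc-adjusted standard error and the approximate 95% CI for one SRS:

```python
import math

# Hypothetical SRS of n = 10 values from a population of size N = 400
# (illustrative numbers only, not from the slides).
y = [12, 15, 9, 11, 14, 10, 13, 12, 16, 8]
N = 400

n = len(y)
f = n / N                                              # sampling fraction
y_bar = sum(y) / n                                     # sample mean, estimates mu
s2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)      # sample variance s^2
v_hat = (1 - f) * s2 / n                               # estimated Var(y_bar) with fpc
se = math.sqrt(v_hat)                                  # standard error
ci = (y_bar - 1.96 * se, y_bar + 1.96 * se)            # approximate 95% CI

print(f"y_bar = {y_bar:.2f}, s^2 = {s2:.2f}, SE = {se:.3f}, "
      f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```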
Example
N = 341 residential blocks in Ames, Iowa
yi = number of dwellings in block i
1000 independent SRSs drawn for different values of n
n     Proportion of samples with |Z| < 1.64     Proportion of samples with |Z| < 1.96
30    0.88                                      0.93
50    0.88                                      0.93
70    0.88                                      0.94
90    0.90                                      0.95
For one SRS with n = 90:

\bar{y}_s = 13, \quad s^2 = 75

\mathrm{SE}(\bar{y}_s) = \sqrt{(1 - 90/341) \cdot 75/90} \approx 0.78

Approximate 95% CI: 13 \pm 1.96 \cdot 0.78 = 13 \pm 1.53 = (11.47,\ 14.53)
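The coverage experiment above is easy to mimic in Python. The sketch below is only illustrative: the actual Ames block counts are not reproduced here, so a synthetic skewed population of 341 "blocks" is generated instead; the logic (1000 SRSs per sample size, Z computed with the fpc, coverage of |Z| < 1.64 and |Z| < 1.96 counted) follows the slides.

```python
import math
import random

random.seed(1)

# Synthetic stand-in for the N = 341 Ames blocks (counts are made up, skewed).
N = 341
pop = [1 + int(random.expovariate(1 / 12)) for _ in range(N)]
mu = sum(pop) / N

def z_stat(sample, mu, N):
    """Z = (y_bar - mu) / sqrt((1 - f) s^2 / n) for one SRS."""
    n = len(sample)
    y_bar = sum(sample) / n
    s2 = sum((y - y_bar) ** 2 for y in sample) / (n - 1)
    se = math.sqrt((1 - n / N) * s2 / n)
    return (y_bar - mu) / se

for n in (30, 50, 70, 90):
    zs = [z_stat(random.sample(pop, n), mu, N) for _ in range(1000)]
    cover_90 = sum(abs(z) < 1.64 for z in zs) / 1000
    cover_95 = sum(abs(z) < 1.96 for z in zs) / 1000
    print(f"n={n}: P(|Z|<1.64) ~ {cover_90:.2f}, P(|Z|<1.96) ~ {cover_95:.2f}")
```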
The absolute value of the sampling error is not informative unless it is related to the value of the estimate. For example, SE = 2 is small if the estimate is 1000, but very large if the estimate is 3.
The coefficient of variation of the estimate:

\mathrm{CV}(\bar{y}_s) = \mathrm{SE}(\bar{y}_s) / \bar{y}_s

In the example: \mathrm{CV}(\bar{y}_s) = 0.78 / 13 = 0.06 = 6\%
• A measure of the relative variability of an estimate
• Does not depend on the unit of measurement
• More stable over repeated surveys, so it can be used for planning, for example for determining sample size
• More meaningful when estimating proportions
Estimation of a population proportion p: the proportion of units with a certain characteristic A
p = (number of units in the population with A)/N
Let y_i = 1 if unit i has characteristic A, and 0 otherwise.
Then p is the population mean of the y_i's.
Let X be the number of units in the sample with
characteristic A. Then the sample mean can be
expressed as
\hat{p} = \bar{y}_s = X / n
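A quick check of this identity in Python (the 0/1 sample below is made up):

```python
# Hypothetical 0/1 sample: y_i = 1 if unit i has characteristic A.
y = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
n, X = len(y), sum(y)          # X = number of sampled units with A
p_hat = sum(y) / n             # sample mean of the indicators ...
assert p_hat == X / n          # ... equals X / n
print(p_hat)                   # 0.4
```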
Then under SRS:

E(\hat{p}) = p

and

\mathrm{Var}(\hat{p}) = \left(1 - \frac{n-1}{N-1}\right) \frac{p(1-p)}{n}

since the population variance equals

\sigma^2 = \frac{N\,p(1-p)}{N-1}

The sample variance is

s^2 = \frac{n\,\hat{p}(1-\hat{p})}{n-1}

so the unbiased estimate of the variance of the estimator is

\hat{V}(\hat{p}) = \left(1 - \frac{n}{N}\right) \frac{\hat{p}(1-\hat{p})}{n-1}
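As an illustrative sketch (not from the slides; the helper name srs_prop_ci and the example counts are hypothetical), the estimator, its standard error and an approximate 95% CI can be computed as:

```python
import math

def srs_prop_ci(x, n, N, z=1.96):
    """Estimate a proportion from an SRS and give an approximate 95% CI.

    x: number of sampled units with the characteristic
    n: sample size, N: population size
    Uses V_hat(p_hat) = (1 - n/N) * p_hat * (1 - p_hat) / (n - 1).
    """
    p_hat = x / n
    v_hat = (1 - n / N) * p_hat * (1 - p_hat) / (n - 1)
    se = math.sqrt(v_hat)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

# Hypothetical numbers: 120 of 500 sampled units have A, population size 100,000.
print(srs_prop_ci(x=120, n=500, N=100_000))   # ~ (0.24, 0.019, (0.203, 0.277))
```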
Examples
A political poll: Suppose we have a random sample of 1000
eligible voters in Norway with 280 saying they will vote
for the Labor party. Then the estimated proportion of Labor
votes in Norway is given by:
\hat{p} = 280/1000 = 0.28

\mathrm{SE}(\hat{p}) = \sqrt{\left(1 - \frac{n}{N}\right)\frac{\hat{p}(1-\hat{p})}{n-1}} = \sqrt{\left(1 - \frac{n}{N}\right)\frac{0.28 \cdot 0.72}{999}} \approx 0.0142
The confidence interval requires the normal approximation. Can use the guideline from the binomial distribution when N - n is large: np \ge 5 and n(1 - p) \ge 5.
In this example: n = 1000 and N = 4,000,000

Approximate 95% CI: \hat{p} \pm 1.96\,\mathrm{SE}(\hat{p}) = 0.280 \pm 0.028 = (0.252,\ 0.308)

Ex: Psychiatric Morbidity Survey 1993 from Great Britain

p = proportion with psychiatric problems
n = 9792 (partial nonresponse on this question: 316)
N \approx 40,000,000

\hat{p} = 0.14

\mathrm{SE}(\hat{p}) = \sqrt{(1 - 0.00024) \cdot 0.14 \cdot 0.86 / 9791} \approx 0.0035

95% CI: 0.14 \pm 1.96 \cdot 0.0035 = 0.14 \pm 0.007 = (0.133,\ 0.147)
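Both confidence intervals above can be reproduced with a few lines of Python; this is only a check using the figures quoted on the slides (with N \approx 40,000,000 for the morbidity survey as stated there):

```python
import math

def prop_se(p_hat, n, N):
    # SE(p_hat) = sqrt((1 - n/N) * p_hat * (1 - p_hat) / (n - 1)), as on the slides
    return math.sqrt((1 - n / N) * p_hat * (1 - p_hat) / (n - 1))

# Norwegian political poll: 280 of 1000, N = 4,000,000
se_poll = prop_se(0.28, 1000, 4_000_000)
print(0.28 - 1.96 * se_poll, 0.28 + 1.96 * se_poll)   # about (0.252, 0.308)

# Psychiatric Morbidity Survey 1993: p_hat = 0.14, n = 9792, N ~ 40,000,000
se_pms = prop_se(0.14, 9792, 40_000_000)
print(0.14 - 1.96 * se_pms, 0.14 + 1.96 * se_pms)     # about (0.133, 0.147)
```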
General probability sampling
• Sampling design: p(s) - known probability of selection for
each subset s of the population U
• Actually: The sampling design is the probability distribution
p(.) over all subsets of U
• Typically, for most s: p(s) = 0. In an SRS of size n, all s with size different from n have p(s) = 0.
• The inclusion probability:
\pi_i = P(\text{unit } i \text{ is in the sample}) = P(i \in s) = \sum_{\{s:\, i \in s\}} p(s)
Illustration
U = {1,2,3,4}
Sample of size 2; 6 possible samples
Sampling design:
p(\{1,2\}) = 1/2,\quad p(\{2,3\}) = 1/4,\quad p(\{3,4\}) = 1/8,\quad p(\{1,4\}) = 1/8
The inclusion probabilities:
\pi_1 = \sum_{\{s:\, 1 \in s\}} p(s) = p(\{1,2\}) + p(\{1,4\}) = 5/8

\pi_2 = \sum_{\{s:\, 2 \in s\}} p(s) = p(\{1,2\}) + p(\{2,3\}) = 3/4 = 6/8

\pi_3 = \sum_{\{s:\, 3 \in s\}} p(s) = p(\{2,3\}) + p(\{3,4\}) = 3/8

\pi_4 = \sum_{\{s:\, 4 \in s\}} p(s) = p(\{3,4\}) + p(\{1,4\}) = 2/8
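A small Python sketch (not from the slides) computes these inclusion probabilities directly from the design, and also checks that they sum to the fixed sample size n = 2, anticipating results (I) and (II) below:

```python
from fractions import Fraction

U = [1, 2, 3, 4]

# Sampling design from the illustration: p(s) for each subset s with p(s) > 0
design = {
    frozenset({1, 2}): Fraction(1, 2),
    frozenset({2, 3}): Fraction(1, 4),
    frozenset({3, 4}): Fraction(1, 8),
    frozenset({1, 4}): Fraction(1, 8),
}

# pi_i = sum of p(s) over all samples s containing unit i
pi = {i: sum(p for s, p in design.items() if i in s) for i in U}
for i in U:
    print(f"pi_{i} = {pi[i]}")        # 5/8, 3/4, 3/8, 1/4
print("sum =", sum(pi.values()))      # 2, the fixed sample size n
```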
Some results
(I) \pi_1 + \pi_2 + \dots + \pi_N = E(n), where n is the sample size

(II) If the sample size is determined to be n in advance:

\pi_1 + \pi_2 + \dots + \pi_N = n

Proof:

Let Z_i = 1 if unit i is included in the sample, and 0 otherwise. Then

\pi_i = P(Z_i = 1) = E(Z_i)

n = \sum_{i=1}^{N} Z_i \;\Rightarrow\; E(n) = \sum_{i=1}^{N} E(Z_i) = \sum_{i=1}^{N} \pi_i
Estimation theory
probability sampling in general
Problem: Estimate a population quantity for the variable y.

For the sake of illustration: the population total

t = \sum_{i=1}^{N} y_i

An estimator of t based on the sample: \hat{t}

Expected value: E(\hat{t}) = \sum_s \hat{t}(s)\, p(s)

Variance: \mathrm{Var}(\hat{t}) = E[\hat{t} - E\hat{t}]^2 = \sum_s [\hat{t}(s) - E\hat{t}]^2\, p(s)

Bias: E(\hat{t}) - t

\hat{t} is unbiased if E(\hat{t}) = t
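Since the design puts positive probability on only a few samples, these quantities can be computed by plain enumeration. The sketch below is an illustration, not course code: it reuses the four-unit design from the earlier illustration with made-up y-values and, purely as an example, evaluates the "usual" estimator \hat{t} = N\bar{y}_s, whose design bias turns out to be nonzero under this unequal-probability design.

```python
from fractions import Fraction

# Four-unit design from the earlier illustration (p(s) over samples of size 2).
design = {
    (1, 2): Fraction(1, 2),
    (2, 3): Fraction(1, 4),
    (3, 4): Fraction(1, 8),
    (1, 4): Fraction(1, 8),
}
y = {1: 3, 2: 5, 3: 2, 4: 6}     # hypothetical y-values
N = len(y)
t = sum(y.values())              # population total

def t_hat(s):
    """The 'usual' estimator N * y_bar_s (not design-unbiased here)."""
    return Fraction(N, len(s)) * sum(y[i] for i in s)

E = sum(t_hat(s) * p for s, p in design.items())               # E(t_hat)
Var = sum((t_hat(s) - E) ** 2 * p for s, p in design.items())  # Var(t_hat)
print("E =", E, " Var =", Var, " bias =", E - t)   # bias is nonzero for this design
```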
Let \hat{V}(\hat{t}) be an (unbiased, if possible) estimate of \mathrm{Var}(\hat{t})

The standard error of \hat{t}: \mathrm{SE}(\hat{t}) = \sqrt{\hat{V}(\hat{t})}

Coefficient of variation of \hat{t}: \mathrm{CV}(\hat{t}) = \mathrm{SE}(\hat{t}) / \hat{t}

CV is a useful measure of uncertainty, especially when the standard error increases as the estimate increases

Margin of error: 2\,\mathrm{SE}(\hat{t})

Because, typically, we have that

P\left(\hat{t} - 2\,\mathrm{SE}(\hat{t}) \le t \le \hat{t} + 2\,\mathrm{SE}(\hat{t})\right) \approx 0.95 \text{ for large } n,\ N - n

since \hat{t} is approximately normally distributed for large n, N - n.

So \hat{t} \pm 2\,\mathrm{SE}(\hat{t}) is approximately a 95% CI
Some peculiarities in the estimation theory
Example: N = 3, n = 2, simple random sample

s_1 = \{1,2\},\quad s_2 = \{1,3\},\quad s_3 = \{2,3\}

p(s_k) = 1/3 \text{ for } k = 1, 2, 3

Let \hat{t}_1 = 3\,\bar{y}_s, which is unbiased

Let \hat{t}_2 be given by:

\hat{t}_2(s_1) = 3\left(\tfrac{1}{2} y_1 + \tfrac{1}{2} y_2\right) = \hat{t}_1(s_1)

\hat{t}_2(s_2) = 3\left(\tfrac{1}{2} y_1 + \tfrac{2}{3} y_3\right) = \hat{t}_1(s_2) + \tfrac{1}{2} y_3

\hat{t}_2(s_3) = 3\left(\tfrac{1}{2} y_2 + \tfrac{1}{3} y_3\right) = \hat{t}_1(s_3) - \tfrac{1}{2} y_3
Also \hat{t}_2 is unbiased:

E(\hat{t}_2) = \sum_s \hat{t}_2(s)\, p(s) = \tfrac{1}{3} \sum_{k=1}^{3} \hat{t}_2(s_k) = \tfrac{1}{3} \cdot 3t = t

\mathrm{Var}(\hat{t}_1) - \mathrm{Var}(\hat{t}_2) = \tfrac{1}{6}\, y_3 (3y_2 - 3y_1 - y_3)

\mathrm{Var}(\hat{t}_1) > \mathrm{Var}(\hat{t}_2) \text{ if } y_3 > 0 \text{ and } 3y_2 > 3y_1 + y_3

If the y_i are 0/1 variables, this happens when y_1 = 0,\ y_2 = y_3 = 1

For this set of values of the y_i's (so that t = 2):

\hat{t}_1(s_1) = 1.5,\ \hat{t}_1(s_2) = 1.5,\ \hat{t}_1(s_3) = 3: never correct

\hat{t}_2(s_1) = 1.5,\ \hat{t}_2(s_2) = 2,\ \hat{t}_2(s_3) = 2.5

\hat{t}_2 clearly has less variability than \hat{t}_1 for these y-values
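The comparison is easy to verify numerically; the following sketch (not from the slides) enumerates the three samples for y = (0, 1, 1) and computes both estimators, their expectations and their design variances:

```python
from fractions import Fraction

y = {1: 0, 2: 1, 3: 1}            # the 0/1 configuration from above
t = sum(y.values())               # true total t = 2
samples = [(1, 2), (1, 3), (2, 3)]
p = Fraction(1, 3)                # SRS of size 2 from N = 3

def t1(s):
    # t_hat_1 = 3 * y_bar_s
    return Fraction(3, 2) * sum(y[i] for i in s)

def t2(s):
    # t_hat_2 as defined above
    if s == (1, 2):
        return t1(s)
    if s == (1, 3):
        return t1(s) + Fraction(1, 2) * y[3]
    return t1(s) - Fraction(1, 2) * y[3]

for est in (t1, t2):
    E = sum(est(s) * p for s in samples)
    Var = sum((est(s) - E) ** 2 * p for s in samples)
    print(est.__name__, "E =", E, "Var =", Var)
# Both have E = 2 (unbiased); Var(t1) = 1/2 > Var(t2) = 1/6.
```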
Let y be the population vector of the y-values.
This example shows that N\bar{y}_s is not uniformly best (i.e., minimum variance for all y) among linear design-unbiased estimators.

It also shows that the "usual" basic estimators do not have the same properties in design-based survey sampling as they do in ordinary statistical models.
In fact, we have the following much stronger result:
Theorem: Let p(.) be any sampling design. Assume each
yi can take at least two values. Then there exists no
uniformly best design-unbiased estimator of the total t
Proof:

Let \hat{t} be unbiased, and let y^0 be one possible value of y. Then there exists an unbiased \hat{t}_0 with \mathrm{Var}(\hat{t}_0) = 0 when y = y^0, namely

\hat{t}_0(s, y) = \hat{t}(s, y) - \hat{t}(s, y^0) + t_0, where t_0 is the total for y^0.

1) \hat{t}_0 is unbiased: E(\hat{t}_0) = t - \sum_s \hat{t}(s, y^0)\, p(s) + t_0 = t - t_0 + t_0 = t

2) When y = y^0: \hat{t}_0 = t_0 for all samples s, so \mathrm{Var}(\hat{t}_0) = 0

This implies that a uniformly best unbiased estimator must have variance equal to 0 for all values of y, which is impossible.