Stat 475 Notes 8
Reading: Lohr, Chapter 4.2-4.5
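Note for Homework 2:
For estimating a ratio $B = \bar{y}_U / \bar{x}_U$ with the estimator $\hat{B} = \bar{y} / \bar{x}$, the standard error of $\hat{B}$ is
$$SE(\hat{B}) = \sqrt{\left(1 - \frac{n}{N}\right) \frac{1}{n \bar{x}_U^2} \cdot \frac{\sum_{i \in S} (y_i - \hat{B} x_i)^2}{n - 1}}.$$
If $\bar{x}_U$ is unknown, then we substitute the sample mean $\bar{x}$ for it.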
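A minimal R sketch of the standard-error formula above, assuming an SRS of paired values $(y_i, x_i)$ of size $n$ from a population of size $N$. The function name `ratio_se` and the toy numbers are illustrative only, not part of the homework.

```r
# Standard error of the ratio estimator B_hat = ybar / xbar from an SRS.
# `ratio_se` is a hypothetical helper name.
ratio_se <- function(y, x, N, xbar_U = NULL) {
  n <- length(y)
  B_hat <- mean(y) / mean(x)
  # If the population mean of x is unknown, substitute the sample mean.
  if (is.null(xbar_U)) xbar_U <- mean(x)
  # Sample variance of the residuals y_i - B_hat * x_i (divisor n - 1).
  s2_e <- sum((y - B_hat * x)^2) / (n - 1)
  se <- sqrt((1 - n / N) * s2_e / (n * xbar_U^2))
  list(B_hat = B_hat, se = se)
}

# Made-up data: n = 3 units sampled from N = 30.
y <- c(3, 4, 8)
x <- c(1, 2, 3)
ratio_se(y, x, N = 30)                # xbar_U unknown: uses xbar = 2
ratio_se(y, x, N = 30, xbar_U = 2.5)  # xbar_U known
```

With these toy values $\hat{B} = 2.5$; the SE is about 0.237 when $\bar{x}$ is substituted and about 0.190 when $\bar{x}_U = 2.5$ is known.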
I. Inference from a stratified sample
Suppose we take a stratified sample from $H$ strata with $N_1, \ldots, N_H$ units in the population in the strata ($N_1 + \cdots + N_H = N$) and sample sizes in the strata of $n_1, \ldots, n_H$.
Our estimators of the population total and population mean are
$$\hat{t}_{str} = \sum_{h=1}^{H} \hat{t}_h = \sum_{h=1}^{H} N_h \bar{y}_h, \qquad \bar{y}_{str} = \frac{\hat{t}_{str}}{N} = \sum_{h=1}^{H} \frac{N_h}{N} \bar{y}_h.$$
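The properties of these estimators follow directly from the properties of simple random sample estimators:
• Unbiasedness. $\bar{y}_{str}$ and $\hat{t}_{str}$ are unbiased estimators of $\bar{y}_U$ and $t$. Because each $\bar{y}_h$ is the mean of an SRS from stratum $h$, $E[\bar{y}_h] = \bar{y}_{hU}$, and so
$$E\left[\sum_{h=1}^{H} \frac{N_h}{N} \bar{y}_h\right] = \sum_{h=1}^{H} \frac{N_h}{N} E[\bar{y}_h] = \sum_{h=1}^{H} \frac{N_h}{N} \bar{y}_{hU} = \bar{y}_U.$$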
• Variance of the estimators. Since we are sampling independently from the strata and we know $Var(\hat{t}_h)$ from simple random sampling theory, we have
$$Var(\hat{t}_{str}) = \sum_{h=1}^{H} Var(\hat{t}_h) = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) N_h^2 \frac{S_h^2}{n_h}.$$
• Variance estimates for stratified samples. We can obtain an unbiased estimator of $Var(\hat{t}_{str})$ by substituting the sample estimates $s_h^2$ for the population quantities $S_h^2$. Note that to estimate the variances, we need at least two units from each stratum.
$$\hat{V}(\hat{t}_{str}) = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) N_h^2 \frac{s_h^2}{n_h}$$
$$\hat{V}(\bar{y}_{str}) = \frac{1}{N^2} \hat{V}(\hat{t}_{str}) = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) \left(\frac{N_h}{N}\right)^2 \frac{s_h^2}{n_h}$$
As always, the standard error of an estimator is the square root of the estimated variance: $SE(\bar{y}_{str}) = \sqrt{\hat{V}(\bar{y}_{str})}$.
• If either (1) the sample sizes within each stratum are large or (2) the sampling design has a large number of strata, an approximate 95% confidence interval for the population mean $\bar{y}_U$ is
$$\bar{y}_{str} \pm 1.96 \cdot SE(\bar{y}_{str}).$$
Some survey researchers use the 0.975 quantile of the t distribution with $n - H$ degrees of freedom instead of 1.96, where $n = n_1 + \cdots + n_H$ (this multiplier converges to 1.96 as $n - H$ gets large).
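The first two properties above can be checked exactly on a tiny population by enumerating every possible stratified sample. This R sketch assumes a made-up population of nine units in two strata (the values in `pop` are hypothetical) and compares the exact mean and variance of $\hat{t}_{str}$ over all samples with the population total and the variance formula.

```r
# Hypothetical small population (values made up) split into H = 2 strata.
pop <- list(c(2, 4, 7, 11), c(3, 5, 6, 9, 12))
n_h <- c(2, 2)                       # sample size drawn from each stratum
N_h <- sapply(pop, length)           # stratum sizes: 4 and 5

# Every SRS within a stratum is equally likely, so list all stratum-sample
# means with combn(); samples in different strata are chosen independently.
ybar1 <- combn(pop[[1]], n_h[1], FUN = mean)   # 6 possible samples
ybar2 <- combn(pop[[2]], n_h[2], FUN = mean)   # 10 possible samples

# t_hat_str = N_1 * ybar_1 + N_2 * ybar_2 for all 6 x 10 stratified samples.
t_str <- as.vector(outer(ybar1, ybar2, function(a, b) N_h[1] * a + N_h[2] * b))

# Unbiasedness: the average over all samples equals the population total t.
t_U <- sum(unlist(pop))
mean(t_str) - t_U                    # zero up to rounding error

# Variance: exact variance over all samples vs. the SRS-based formula.
exact_var <- mean((t_str - mean(t_str))^2)
S2_h <- sapply(pop, var)             # S_h^2 with divisor N_h - 1
formula_var <- sum((1 - n_h / N_h) * N_h^2 * S2_h / n_h)
c(exact = exact_var, formula = formula_var)
```

Exhaustive enumeration is only feasible for toy populations; it is used here because it checks the formulas exactly rather than by simulation.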
Example 1: An advertising firm, interested in determining how
much to emphasize television advertising in a certain county,
decides to conduct a sample survey to estimate the average
number of hours each week that households within the county
watch television. The county contains two towns, A and B, and
a rural area. Town A is built around a factory, and most
households contain factory workers with school age children.
Town B is an exclusive suburb of a city in a neighboring county
and contains older residents with few children at home. There
are 155 households in town A, 62 in town B and 93 in the rural
area.
Merits of using stratified random sampling in this situation: The
population of households falls into three natural groupings, two
towns and a rural area, according to geographic location. Thus,
to use these divisions as three strata is quite natural simply for
administrative convenience in selecting the samples and
carrying out the fieldwork. In addition, each of the three groups
of households should have similar behavioral patterns among
residents within the group. We expect to see relatively small
variability in number of hours of television viewing among
households within a group, and this is precisely the situation in
which stratification produces a reduction in the variance of the
estimate of the population mean.
The advertising firm has enough time and money to interview
$n = 40$ households and decides to select random samples of size
$n_1 = 20$ from town A, $n_2 = 8$ from town B, and $n_3 = 12$ from the
rural area (we will discuss the choice of sample sizes later).
The simple random samples are selected and the interviews are
conducted. The data and summaries are shown below.
towna <- c(35, 43, 36, 39, 28, 28, 29, 25, 38, 27, 26, 32, 29, 40, 35, 41, 37, 31, 45, 34)
townb <- c(27, 15, 4, 41, 49, 25, 10, 30)
rural <- c(8, 14, 12, 15, 30, 32, 21, 20, 34, 7, 11, 24)
mean(towna)
## [1] 33.9
mean(townb)
## [1] 25.125
mean(rural)
## [1] 19
sd(towna)
## [1] 5.94625
sd(townb)
## [1] 15.24502
sd(rural)
## [1] 9.36143
A good way to view the key features of these samples and look
for any outliers or unusual features is to make side-by-side
boxplots.
boxplot(towna, townb, rural, names = c("Town A", "Town B", "Rural"),
        main = "Box plots of Television Viewing Time")
There do not appear to be any outliers or unusual features to be
concerned about.
Note that $N_1 = 155$, $N_2 = 62$, $N_3 = 93$, and $N = 155 + 62 + 93 = 310$. Our estimate of the population mean is
$$\bar{y}_{str} = \sum_{h=1}^{H} \frac{N_h}{N} \bar{y}_h = \frac{1}{310}\left[(155)(33.90) + (62)(25.12) + (93)(19)\right] = 27.7.$$
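The standard error is
$$SE(\bar{y}_{str}) = \sqrt{\sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) \left(\frac{N_h}{N}\right)^2 \frac{s_h^2}{n_h}}$$
$$= \sqrt{\left(1 - \frac{20}{155}\right)\left(\frac{155}{310}\right)^2 \frac{5.95^2}{20} + \left(1 - \frac{8}{62}\right)\left(\frac{62}{310}\right)^2 \frac{15.25^2}{8} + \left(1 - \frac{12}{93}\right)\left(\frac{93}{310}\right)^2 \frac{9.36^2}{12}} = 1.40.$$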
An approximate 95% confidence interval for the population mean is
$$\bar{y}_{str} \pm 1.96 \cdot SE(\bar{y}_{str}) = 27.7 \pm 1.96(1.40) = (25.0, 30.4).$$
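The whole Example 1 calculation can be reproduced in R from the raw data. This is a sketch using the data and stratum sizes given above; it also computes the t-based interval with $n - H = 37$ degrees of freedom mentioned earlier as an alternative to 1.96.

```r
# Example 1 data (hours of TV per week) and stratum sizes from the notes.
towna <- c(35, 43, 36, 39, 28, 28, 29, 25, 38, 27, 26, 32, 29, 40, 35, 41, 37, 31, 45, 34)
townb <- c(27, 15, 4, 41, 49, 25, 10, 30)
rural <- c(8, 14, 12, 15, 30, 32, 21, 20, 34, 7, 11, 24)
samples <- list(towna, townb, rural)
N_h <- c(155, 62, 93)
N <- sum(N_h)

n_h <- sapply(samples, length)
ybar_h <- sapply(samples, mean)
s2_h <- sapply(samples, var)                 # s_h^2 with divisor n_h - 1

ybar_str <- sum(N_h / N * ybar_h)            # stratified estimate of the mean
var_hat <- sum((1 - n_h / N_h) * (N_h / N)^2 * s2_h / n_h)
se <- sqrt(var_hat)

# Normal multiplier (about 1.96) and t multiplier with n - H = 40 - 3 = 37 df.
ci_z <- ybar_str + c(-1, 1) * qnorm(0.975) * se
ci_t <- ybar_str + c(-1, 1) * qt(0.975, df = sum(n_h) - length(N_h)) * se
round(c(estimate = ybar_str, se = se), 3)
round(rbind(normal = ci_z, t37 = ci_t), 2)
```

Carrying full precision gives an estimate of 27.675 and an SE of about 1.403, so the normal-based interval is about (24.92, 30.43); the lower limit of 25.0 above comes from rounding the estimate to 27.7 first. The t multiplier with 37 df (about 2.03) widens the interval slightly.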
II. Sampling Weights
The stratified sampling estimator $\hat{t}_{str}$ can be expressed as a weighted sum of the individual sampling units:
$$\hat{t}_{str} = \sum_{h=1}^{H} N_h \bar{y}_h = \sum_{h=1}^{H} \sum_{j \in S_h} \frac{N_h}{n_h} y_{hj}.$$
The sampling weight $w_{hj} = N_h / n_h$ can be thought of as the number of units in the population represented by the sample member $(h, j)$. If the population has 1600 men and 400 women and the stratified sample design specifies sampling 200 men and 200 women, then each man in the sample has weight 8 and each woman has weight 2. Each woman in the sample represents herself and 1 other woman not selected to be in the sample, and each man represents himself and 7 other men not in the sample.
Note that the probability of selecting the $j$th unit in the $h$th stratum to be in the sample is $\pi_{hj} = n_h / N_h$, the sampling fraction in the $h$th stratum. Thus, the sampling weight is simply the reciprocal of the probability of selection:
$$w_{hj} = \frac{1}{\pi_{hj}}.$$
The sum of the sampling weights equals the population size $N$; each sampled unit “represents” a certain number of units in the population, so the whole sample “represents” the whole population.
The stratified estimate of the population total may thus be written as
$$\hat{t}_{str} = \sum_{h=1}^{H} \sum_{j \in S_h} w_{hj} y_{hj}$$
and the estimate of the population mean as
$$\bar{y}_{str} = \frac{\sum_{h=1}^{H} \sum_{j \in S_h} w_{hj} y_{hj}}{\sum_{h=1}^{H} \sum_{j \in S_h} w_{hj}}.$$
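A short R sketch of the weighting idea, using the men/women design described above. The helpers `stratum_weights`, `weighted_total`, and `weighted_mean` are hypothetical names; `N_h` is assumed to be a vector named by stratum label.

```r
# Sampling weights w_hj = N_h / n_h = 1 / pi_hj for a stratified random sample.
# `stratum_weights` is a hypothetical helper; N_h must be named by stratum.
stratum_weights <- function(stratum, N_h) {
  n_h <- table(stratum)                        # sample size in each stratum
  w_h <- N_h[names(n_h)] / as.numeric(n_h)     # one weight per stratum
  unname(w_h[as.character(stratum)])           # weight for every sampled unit
}

# Weighted estimators of the population total and mean.
weighted_total <- function(y, w) sum(w * y)
weighted_mean  <- function(y, w) sum(w * y) / sum(w)

# The men/women example: N = 1600 + 400, sample 200 of each.
sex <- factor(rep(c("man", "woman"), times = c(200, 200)))
w <- stratum_weights(sex, N_h = c(man = 1600, woman = 400))
table(w)       # 200 units with weight 2 (women), 200 with weight 8 (men)
sum(w)         # 2000, the population size N
1 / w[1]       # selection probability of a sampled man: 1/8

# Indicator of being a man: the weighted estimates recover the population
# count (1600) and fraction (0.8), while the unweighted sample mean is 0.5.
is_man <- as.numeric(sex == "man")
weighted_total(is_man, w)
weighted_mean(is_man, w)
```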
Example 1 continued. In Example 1, the weights are

Stratum    N_h    n_h    w_hj = N_h/n_h
Town A     155     20    7.75
Town B      62      8    7.75
Rural       93     12    7.75

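The table can be checked in R. This sketch uses the Example 1 data from above and shows that, because every weight is 7.75, the weighted mean equals the ordinary sample mean (the design is self-weighting).

```r
# Example 1 data from the notes, stacked into one vector.
towna <- c(35, 43, 36, 39, 28, 28, 29, 25, 38, 27, 26, 32, 29, 40, 35, 41, 37, 31, 45, 34)
townb <- c(27, 15, 4, 41, 49, 25, 10, 30)
rural <- c(8, 14, 12, 15, 30, 32, 21, 20, 34, 7, 11, 24)
y <- c(towna, townb, rural)
n_h <- c(length(towna), length(townb), length(rural))   # 20, 8, 12
N_h <- c(155, 62, 93)

w <- rep(N_h / n_h, times = n_h)   # w_hj = N_h / n_h, repeated for each unit
unique(w)                          # 7.75: every stratum has the same weight
sum(w)                             # 310 = N
sum(w * y)                         # estimated population total
sum(w * y) / sum(w)                # weighted mean = 27.675
mean(y)                            # same value, since all weights are equal
```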
The sampling weights are identical across strata. This is an
example of proportional allocation. In proportional allocation,
so called because the number of sampled units in each stratum is
proportional to the size of the stratum, the probability of
selection $\pi_{hj} = n_h / N_h$ is the same ($n / N$) for all strata: in a
population of 2400 men and 1600 women, proportional
allocation with a 10% sample would mean sampling 240 men
and 160 women.
For a stratified random sample with proportional allocation, the
probability that an individual will be selected in the sample,
$n / N$, is the same as in a simple random sample, but many of the
“bad” samples that could occur in a simple random sample (for
example, a sample in which all 400 persons are men) cannot be
selected in a sample with proportional allocation.
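Proportional allocation is a one-line computation. A minimal R sketch (the name `prop_alloc` is hypothetical), applied to the men/women example and to the Example 1 strata:

```r
# Proportional allocation: n_h = n * N_h / N (before rounding to integers).
prop_alloc <- function(n, N_h) n * N_h / sum(N_h)

# 10% sample from 2400 men and 1600 women: n = 400.
prop_alloc(400, c(men = 2400, women = 1600))            # men 240, women 160

# Example 1: n = 40 households from strata of sizes 155, 62, 93.
prop_alloc(40, c(townA = 155, townB = 62, rural = 93))  # 20, 8, 12
```

The second call reproduces the sample sizes 20, 8, and 12 used in Example 1. In general $n N_h / N$ is not an integer and must be rounded, in which case the equal-weights property holds only approximately.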
III. Optimal Allocation
The objective in designing a sample survey is to maximize the information, i.e., minimize the variance of the estimator of the desired quantity, for a fixed total cost. Let $C$ represent total cost, $c_0$ represent overhead cost such as maintaining an office, and $c_h$ represent the cost of taking an observation in stratum $h$, so that
$$C = c_0 + \sum_{h=1}^{H} c_h n_h.$$
We want to allocate observations to strata in order to minimize $Var(\bar{y}_{str})$ for a given total cost $C$, or equivalently to minimize $C$ for a fixed $Var(\bar{y}_{str})$. Suppose the costs $c_1, \ldots, c_H$ are fixed. To minimize the variance for a fixed cost, we can prove, using calculus, that the optimal allocation has $n_h$ proportional to
$$\frac{N_h S_h}{\sqrt{c_h}}$$
for each $h$. Thus, the optimal sample size in stratum $h$ is
$$n_h = \left(\frac{N_h S_h / \sqrt{c_h}}{\sum_{l=1}^{H} N_l S_l / \sqrt{c_l}}\right) n.$$
We thus sample heavily within a stratum if
• the stratum accounts for a large part of the population;
• the variance within the stratum is large (we sample more heavily to compensate for the heterogeneity);
• sampling in the stratum is inexpensive.
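The optimal-allocation rule can be sketched in R as follows. The helper name `opt_alloc` and the two-stratum numbers are hypothetical; with equal costs (the default here) the rule reduces to allocating in proportion to $N_h S_h$, i.e. Neyman allocation.

```r
# Optimal allocation: n_h proportional to N_h * S_h / sqrt(c_h).
# `opt_alloc` is a hypothetical helper; equal costs are the default.
opt_alloc <- function(n, N_h, S_h, c_h = rep(1, length(N_h))) {
  a_h <- N_h * S_h / sqrt(c_h)
  n * a_h / sum(a_h)
}

# Hypothetical two-stratum population with equal stratum sizes.
N_h <- c(1000, 1000)
S_h <- c(10, 20)
opt_alloc(100, N_h, S_h)                  # equal costs: about 33.3 and 66.7
opt_alloc(100, N_h, S_h, c_h = c(1, 4))   # 4x cost offsets 2x SD: 50 and 50
```

The second call illustrates the trade-off in the list above: stratum 2 is twice as variable but four times as expensive per observation, and because cost enters through $\sqrt{c_h}$ the two effects cancel.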
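The variance of $\bar{y}_{str}$ is
$$Var(\bar{y}_{str}) = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) \left(\frac{N_h}{N}\right)^2 \frac{S_h^2}{n_h} = \frac{1}{N^2}\left[\sum_{h=1}^{H} \frac{N_h^2 S_h^2}{n_h} - \sum_{h=1}^{H} N_h S_h^2\right].$$
If we would like to set $Var(\bar{y}_{str})$ equal to some fixed value $D$ and we use the optimal allocation, then substituting the optimal $n_h$ into this expression and solving for $n$ gives the value of $n$ that makes $Var(\bar{y}_{str})$ equal to $D$:
$$n = \frac{\left(\sum_{h=1}^{H} N_h S_h / \sqrt{c_h}\right)\left(\sum_{h=1}^{H} N_h S_h \sqrt{c_h}\right)}{N^2 D + \sum_{h=1}^{H} N_h S_h^2}.$$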
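A minimal R sketch of the sample-size formula, with a check that allocating the resulting $n$ optimally gives a variance of exactly $D$ before any rounding. The helper names and the inputs `N_h`, `S_h`, `c_h`, `D` below are made up for illustration.

```r
# Total sample size n that makes Var(ybar_str) = D under optimal allocation.
# Helper names are hypothetical.
n_for_variance <- function(N_h, S_h, c_h, D) {
  N <- sum(N_h)
  num <- sum(N_h * S_h / sqrt(c_h)) * sum(N_h * S_h * sqrt(c_h))
  den <- N^2 * D + sum(N_h * S_h^2)
  num / den
}

# Var(ybar_str) for given stratum sample sizes, with the finite population correction.
var_ybar_str <- function(n_h, N_h, S_h) {
  N <- sum(N_h)
  sum((1 - n_h / N_h) * (N_h / N)^2 * S_h^2 / n_h)
}

# Made-up inputs: allocate n optimally and recover D (before rounding).
N_h <- c(100, 200); S_h <- c(4, 8); c_h <- c(1, 4); D <- 0.1
n <- n_for_variance(N_h, S_h, c_h, D)   # about 184.6
a_h <- N_h * S_h / sqrt(c_h)
n_h <- n * a_h / sum(a_h)               # optimal (non-integer) stratum sizes
var_ybar_str(n_h, N_h, S_h)             # equals D = 0.1 up to rounding error
```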
Example 1 continued. The advertising firm finds that obtaining
an observation from a rural household costs more than obtaining
a response in town A or B. The increase is due to the costs of
traveling from one rural household to another. The cost per
observation in each town is estimated to be 9 dollars (that is,
$c_1 = c_2 = 9$) and the cost per observation in the rural area 16 dollars
(that is, $c_3 = 16$). The stratum standard deviations (approximated
by the stratum sample standard deviations from a prior survey) are
$S_1 = 5$, $S_2 = 15$, $S_3 = 10$. Find the overall sample size $n$ and the
stratum sample sizes $n_1, n_2, n_3$ that allow the firm to estimate, at
minimum cost, the average television-viewing time with a
margin of error equal to 2 hours.
The margin of error is half the width of the 95% confidence
interval, which is approximately 2 times the standard deviation of
$\bar{y}_{str}$. Thus, we want the standard deviation of $\bar{y}_{str}$ to be 1,
so that $Var(\bar{y}_{str}) = D = 1$.
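We have
$$\sum_{h=1}^{H} \frac{N_h S_h}{\sqrt{c_h}} = \frac{155(5)}{\sqrt{9}} + \frac{62(15)}{\sqrt{9}} + \frac{93(10)}{\sqrt{16}} = 800.83,$$
$$\sum_{h=1}^{H} N_h S_h \sqrt{c_h} = 155(5)\sqrt{9} + 62(15)\sqrt{9} + 93(10)\sqrt{16} = 8835,$$
$$\sum_{h=1}^{H} N_h S_h^2 = 155(5^2) + 62(15^2) + 93(10^2) = 27{,}125.$$
Thus,
$$n = \frac{\left(\sum_{h=1}^{H} N_h S_h / \sqrt{c_h}\right)\left(\sum_{h=1}^{H} N_h S_h \sqrt{c_h}\right)}{N^2 D + \sum_{h=1}^{H} N_h S_h^2} = \frac{(800.83)(8835)}{(310)^2(1) + 27{,}125} = 57.42,$$
which we round up to $n = 58$ so that the variance does not exceed $D$.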
Then,
$$n_1 = n\left(\frac{N_1 S_1 / \sqrt{c_1}}{\sum_{h=1}^{H} N_h S_h / \sqrt{c_h}}\right) = 58\left(\frac{155(5)/3}{800.83}\right) = 58(0.3226) = 18.71 \approx 19.$$
Similarly,
$$n_2 = 58\left(\frac{62(15)/3}{800.83}\right) = 58(0.3871) = 22.45 \approx 22,$$
$$n_3 = 58\left(\frac{93(10)/4}{800.83}\right) = 58(0.2903) = 16.84 \approx 17.$$
Hence, we should select 19 households at random from town A, 22 from town B, and 17 from the rural area ($19 + 22 + 17 = 58$). With these sample sizes $Var(\bar{y}_{str}) \approx 0.985 \le 1$, so we can estimate the average number of hours spent watching television at minimum cost with a margin of error of about 2 hours.
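The design calculation for Example 1 can be reproduced in R. This sketch uses only the inputs stated in the example, rounds $n$ up, rounds each $n_h$ to the nearest integer, and then checks the variance actually achieved by the rounded design.

```r
# Example 1 continued: inputs from the notes.
N_h <- c(155, 62, 93)   # households in town A, town B, rural area
S_h <- c(5, 15, 10)     # stratum SDs from a prior survey
c_h <- c(9, 9, 16)      # cost per observation (dollars)
D   <- 1                # target Var(ybar_str): margin of error 2 = 2 * SD
N   <- sum(N_h)

a_h <- N_h * S_h / sqrt(c_h)                       # 258.33, 310, 232.5
n_exact <- sum(a_h) * sum(N_h * S_h * sqrt(c_h)) /
  (N^2 * D + sum(N_h * S_h^2))                     # about 57.42
n <- ceiling(n_exact)                              # round up: 58

n_h <- round(n * a_h / sum(a_h))                   # 19, 22, 17
achieved_var <- sum((1 - n_h / N_h) * (N_h / N)^2 * S_h^2 / n_h)
c(n_exact = n_exact, n = n, sum_n_h = sum(n_h), achieved_var = achieved_var)
```

Rounding each $n_h$ separately happens to give a total of 58 here. In general the rounded $n_h$ need not add up to $n$, so one may have to move a unit between strata and re-check that the achieved variance is still at most $D$.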
Neyman allocation is a special case of optimal allocation used
when the costs in the strata are approximately equal. Under
Neyman allocation, $n_h$ is proportional to $N_h S_h$.
If all variances in strata and costs are equal, proportional
allocation is the same as optimal allocation. If we know the
variances within each stratum and they differ, optimal allocation
gives a smaller variance than proportional allocation. But
optimal allocation is a more complicated scheme; often the
simplicity and self-weighting property of proportional allocation
are worth the extra variance. In addition, the optimal allocation
will differ for each variable being measured, whereas the
proportional allocation depends only on the number of
population units in each stratum.
Variance comparisons for different designs
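Let $\bar{y}$, $\bar{y}_{str,pa}$, and $\bar{y}_{str,na}$ denote, for a sample of size $n$, the mean from a simple random sample, the stratified mean under proportional allocation, and the stratified mean under Neyman allocation, respectively. Ignoring the finite population correction,
$$Var(\bar{y}_{str,pa}) - Var(\bar{y}_{str,na}) = \frac{1}{n} \sum_{h=1}^{H} \frac{N_h}{N} \left(S_h - \bar{S}\right)^2, \qquad \text{where } \bar{S} = \sum_{h=1}^{H} \frac{N_h}{N} S_h,$$
and
$$Var(\bar{y}) - Var(\bar{y}_{str,pa}) \approx \frac{1}{n} \sum_{h=1}^{H} \frac{N_h}{N} \left(\bar{y}_{hU} - \bar{y}_U\right)^2.$$
The second relation is approximate because it also replaces $(N_h - 1)/(N - 1)$ and $N_h/(N - 1)$ by $N_h/N$, which is accurate when the stratum sizes are large.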
Thus proportional allocation gives the same variance as the
optimal Neyman allocation (assuming equal costs) when
the stratum standard deviations are all the same, but if they
differ, Neyman allocation is better.
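A numerical illustration of the first identity. Ignoring the finite population correction, the variance formula gives $Var(\bar{y}_{str,pa}) = \frac{1}{n}\sum_h \frac{N_h}{N} S_h^2$ and $Var(\bar{y}_{str,na}) = \frac{1}{n}\bar{S}^2$. This R sketch borrows the stratum sizes and prior-survey SDs of Example 1 with $n = 40$ purely as illustrative inputs.

```r
# Stratum weights W_h = N_h / N and SDs S_h from Example 1, with n = 40.
W_h <- c(155, 62, 93) / 310
S_h <- c(5, 15, 10)
n <- 40

var_pa <- sum(W_h * S_h^2) / n        # proportional allocation (no fpc)
var_na <- sum(W_h * S_h)^2 / n        # Neyman allocation (no fpc)
S_bar <- sum(W_h * S_h)
c(var_pa = var_pa, var_na = var_na,
  difference = var_pa - var_na,
  identity = sum(W_h * (S_h - S_bar)^2) / n)
```

Here the stratum SDs differ a lot (5 to 15), so Neyman allocation cuts the variance from 2.1875 to about 1.806; the difference matches the right-hand side of the identity.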
Stratified random sampling with proportional allocation therefore gives a variance no larger than that of
simple random sampling (up to the approximation above), and a smaller one whenever the stratum means differ.
Comparing the equations for the variances under simple random
sampling, proportional allocation and optimal allocation
assuming costs of all observations are equal, we see that
stratification with proportional allocation is better than simple
random sampling if the strata means are quite variable and that
stratification with optimal allocation is even better than
stratification with proportional allocation if the strata standard
deviations are variable.