Statistics for
Business and Economics
6th Edition
Chapter 20
Sampling:
Additional Topics in Sampling
Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc.
Chapter Goals
After completing this chapter, you should be able to:
• Explain the basic steps of a sampling study
• Describe sampling and nonsampling errors
• Explain simple random sampling and stratified sampling
• Analyze results from simple random or stratified samples
• Determine sample size when estimating population mean, population total, or population proportion
• Describe other sampling methods
  ◦ Cluster Sampling, Two-Phase Sampling, Nonprobability Samples
Steps of a Sampling Study
Step 1: Information Required?
Step 2: Relevant Population?
Step 3: Sample Selection?
Step 4: Obtaining Information?
Step 5: Inferences From Sample?
Step 6: Conclusions?
Sampling and Nonsampling Errors
• A sample statistic is an estimate of an unknown population parameter
• Sample evidence from a population is variable
• Sample-to-sample variation is expected
• Sampling error results from the fact that we only see a subset of the population when a sample is selected
• Statistical statements can be made about sampling error
  ◦ It can be measured and interpreted using confidence intervals, probabilities, etc.
Sampling and Nonsampling Errors (continued)
• Nonsampling error results from sources not related to the sampling procedure used
• Examples:
  ◦ The population actually sampled is not the relevant one
  ◦ Survey subjects may give inaccurate or dishonest answers
  ◦ Nonresponse to survey questions
Types of Samples
• Probability Sample
  ◦ Items in the sample are chosen on the basis of known probabilities
• Nonprobability Sample
  ◦ Items included are chosen without regard to their probability of occurrence
Types of Samples (continued)
[Figure: classification of samples]
• Probability Samples: Simple Random, Stratified, Systematic, Cluster
• Non-Probability Samples: Judgement, Quota, Convenience
Simple Random Samples
• Suppose that a sample of n objects is to be selected from a population of N objects
• A simple random sample procedure is one in which every possible sample of n objects is equally likely to be chosen
• Only sampling without replacement is considered here
• Random samples can be obtained from a table of random numbers or from computer random number generators
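As an illustration only, a simple random sample without replacement can be drawn with a computer random number generator. The sketch below uses Python's standard `random` module; the frame of 64 numbered units, the helper name `simple_random_sample`, and the seed are assumptions made for the example, not part of the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seeded generator makes the draw repeatable
    return rng.sample(list(population), n)  # sampling without replacement

# Hypothetical frame: units labelled 1..64
frame = list(range(1, 65))
sample = simple_random_sample(frame, 8, seed=1)
print(sample)
```

Because `random.Random.sample` never returns the same position twice, the draw is without replacement, matching the case considered on the slide.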
Systematic Sampling
• Decide on sample size: n
• Divide frame of N individuals into groups of j individuals: j = N/n
• Randomly select one individual from the 1st group
• Select every jth individual thereafter
Example: N = 64, n = 8, j = 8
[Figure: frame of 64 individuals with the first group of 8 marked]
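The systematic procedure can be sketched in a few lines. This is a minimal illustration that assumes N is an exact multiple of n (as in the N = 64, n = 8 example); the function name `systematic_sample` and the seed are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first group of j = N/n items, then every jth item."""
    N = len(frame)
    if N % n != 0:
        raise ValueError("this sketch assumes N is an exact multiple of n")
    j = N // n                      # group size
    rng = random.Random(seed)
    start = rng.randrange(j)        # random position within the first group
    return [frame[start + k * j] for k in range(n)]

# The slide's example: N = 64, n = 8, so j = 8
picked = systematic_sample(list(range(1, 65)), 8, seed=3)
print(picked)
```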
Finite Population Correction Factor
• Suppose sampling is without replacement and the sample size is large relative to the population size
• Assume the population size is large enough to apply the central limit theorem
• Apply the finite population correction factor when estimating the population variance
\[ \text{finite population correction factor} = \frac{N - n}{N} \]
Estimating the Population Mean
• Let a simple random sample of size n be taken from a population of N members with mean μ
• The sample mean is an unbiased estimator of the population mean μ
• The point estimate is:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
Estimating the Population Mean (continued)
• An unbiased estimation procedure for the variance of the sample mean yields the point estimate
\[ \hat{\sigma}^2_{\bar{x}} = \frac{s^2}{n} \cdot \frac{N - n}{N} \]
• Provided the sample size is large, 100(1 - α)% confidence intervals for the population mean are given by
\[ \bar{x} - z_{\alpha/2}\,\hat{\sigma}_{\bar{x}} < \mu < \bar{x} + z_{\alpha/2}\,\hat{\sigma}_{\bar{x}} \]
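A minimal sketch of this interval, assuming the sample is held in a Python list. The function name `srs_mean_ci`, the data values, and the population size N = 200 are invented for illustration; the sample is kept small for readability, whereas the slide's interval assumes a large sample.

```python
import math
from statistics import NormalDist, mean, variance

def srs_mean_ci(sample, N, conf=0.95):
    """Large-sample confidence interval for the mean with the (N - n)/N correction."""
    n = len(sample)
    xbar = mean(sample)
    s2 = variance(sample)                      # sample variance, n - 1 divisor
    var_xbar = (s2 / n) * (N - n) / N          # estimated variance of the sample mean
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * math.sqrt(var_xbar)
    return xbar, xbar - half_width, xbar + half_width

# Hypothetical data drawn from a population of N = 200
data = [12.1, 9.8, 11.4, 10.6, 13.0, 8.9, 10.2, 11.7, 9.5, 12.4, 10.9, 11.1]
xbar, lower, upper = srs_mean_ci(data, N=200)
print(round(xbar, 3), round(lower, 3), round(upper, 3))
```

When n equals N the correction factor is zero, so the interval collapses to the sample mean, as expected for a complete census.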
Estimating the Population Total
• Consider a simple random sample of size n from a population of size N
• The quantity to be estimated is the population total Nμ
• An unbiased estimation procedure for the population total Nμ yields the point estimate \(N\bar{x}\)
Estimating the Population Total (continued)
• An unbiased estimator of the variance of the estimator of the population total is
\[ N^2 \hat{\sigma}^2_{\bar{x}} = N(N - n)\,\frac{s^2}{n} \]
• Provided the sample size is large, a 100(1 - α)% confidence interval for the population total is
\[ N\bar{x} - z_{\alpha/2}\,N\hat{\sigma}_{\bar{x}} < N\mu < N\bar{x} + z_{\alpha/2}\,N\hat{\sigma}_{\bar{x}} \]
Confidence Interval for Population Total: Example
A firm has a population of 1000 accounts and wishes to estimate the total population value.
A sample of 80 accounts is selected with average balance of $87.6 and standard deviation of $22.3.
Find the 95% confidence interval estimate of the total balance.
Example Solution
\[ N = 1000, \quad n = 80, \quad \bar{x} = 87.6, \quad s = 22.3 \]
\[ N^2 \hat{\sigma}^2_{\bar{x}} = N(N - n)\,\frac{s^2}{n} = (1000)(920)\,\frac{(22.3)^2}{80} = 5718835 \]
\[ N\hat{\sigma}_{\bar{x}} = \sqrt{5718835} = 2391.41 \]
\[ N\bar{x} \pm z_{\alpha/2}\,N\hat{\sigma}_{\bar{x}} = (1000)(87.6) \pm (1.96)(2391.41) \]
\[ 82912.84 < N\mu < 92287.16 \]
The 95% confidence interval for the population total balance is $82,912.84 to $92,287.16
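The arithmetic of this example can be checked with a short script. This is only a sketch: the function name `srs_total_ci` is an assumption, and z = 1.96 is taken from the 95% level used in the example.

```python
import math

def srs_total_ci(xbar, s, n, N, z=1.96):
    """Point estimate, standard error, and interval for the population total N*mu."""
    var_total = N * (N - n) * s**2 / n   # N^2 times the estimated variance of the mean
    se_total = math.sqrt(var_total)
    point = N * xbar
    return point, se_total, point - z * se_total, point + z * se_total

# Values from the accounts example: N = 1000, n = 80, mean 87.6, s = 22.3
point, se_total, lower, upper = srs_total_ci(87.6, 22.3, 80, 1000)
print(round(se_total, 2), round(lower, 2), round(upper, 2))
```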
Estimating the Population Proportion
• Let the true population proportion be P
• Let \(\hat{p}\) be the sample proportion from n observations from a simple random sample
• The sample proportion, \(\hat{p}\), is an unbiased estimator of the population proportion, P
Estimating the Population Proportion (continued)
• An unbiased estimator for the variance of the estimator of the population proportion is
\[ \hat{\sigma}^2_{\hat{p}} = \frac{\hat{p}(1 - \hat{p})}{n - 1} \cdot \frac{N - n}{N} \]
• Provided the sample size is large, a 100(1 - α)% confidence interval for the population proportion is
\[ \hat{p} - z_{\alpha/2}\,\hat{\sigma}_{\hat{p}} < P < \hat{p} + z_{\alpha/2}\,\hat{\sigma}_{\hat{p}} \]
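A minimal sketch of the proportion interval follows. The function name `srs_proportion_ci` and the counts (56 of n = 140 sampled units from a population of N = 1200) are hypothetical and serve only to show the calculation with the n - 1 divisor and the (N - n)/N correction.

```python
import math
from statistics import NormalDist

def srs_proportion_ci(successes, n, N, conf=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    var_p = p_hat * (1 - p_hat) / (n - 1) * (N - n) / N
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * math.sqrt(var_p)
    return p_hat, p_hat - half_width, p_hat + half_width

# Hypothetical survey: 56 of 140 sampled units have the attribute, N = 1200
p_hat, p_lower, p_upper = srs_proportion_ci(56, 140, 1200)
print(round(p_hat, 3), round(p_lower, 3), round(p_upper, 3))
```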
Stratified Sampling
Overview of stratified sampling:
• Divide population into two or more subgroups (called strata) according to some common characteristic
• A simple random sample is selected from each subgroup
• Samples from subgroups are combined into one
[Figure: population divided into 4 strata, with the sample drawn from the strata]
Stratified Random Sampling
• Suppose that a population of N individuals can be subdivided into K mutually exclusive and collectively exhaustive groups, or strata
• Stratified random sampling is the selection of independent simple random samples from each stratum of the population.
• Let the K strata in the population contain N1, N2, . . ., NK members, so that N1 + N2 + . . . + NK = N
• Let the numbers in the samples be n1, n2, . . ., nK. Then the total number of sample members is n1 + n2 + . . . + nK = n
Estimation of the Population Mean, Stratified Random Sample
• Let random samples of n_j individuals be taken from strata containing N_j individuals (j = 1, 2, . . ., K)
• Let
\[ \sum_{j=1}^{K} N_j = N \quad \text{and} \quad \sum_{j=1}^{K} n_j = n \]
• Denote the sample means and variances in the strata by \(\bar{x}_j\) and \(s_j^2\) and the overall population mean by μ
• An unbiased estimator of the overall population mean μ is:
\[ \bar{x}_{st} = \frac{1}{N} \sum_{j=1}^{K} N_j \bar{x}_j \]
Estimation of the Population Mean, Stratified Random Sample (continued)
• An unbiased estimator for the variance of the estimator of the overall population mean is
\[ \hat{\sigma}^2_{\bar{x}_{st}} = \frac{1}{N^2} \sum_{j=1}^{K} N_j^2\, \hat{\sigma}^2_{\bar{x}_j} \]
where
\[ \hat{\sigma}^2_{\bar{x}_j} = \frac{s_j^2}{n_j} \cdot \frac{N_j - n_j}{N_j} \]
• Provided the sample size is large, a 100(1 - α)% confidence interval for the population mean for stratified random samples is
\[ \bar{x}_{st} - z_{\alpha/2}\,\hat{\sigma}_{\bar{x}_{st}} < \mu < \bar{x}_{st} + z_{\alpha/2}\,\hat{\sigma}_{\bar{x}_{st}} \]
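The stratified estimator and its standard error can be sketched as below. The function name `stratified_mean_ci`, the tuple layout (N_j, n_j, stratum mean, stratum standard deviation), and the three strata are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def stratified_mean_ci(strata, conf=0.95):
    """strata: list of (N_j, n_j, xbar_j, s_j). Returns (estimate, se, lower, upper)."""
    N = sum(N_j for N_j, _, _, _ in strata)
    xbar_st = sum(N_j * xbar_j for N_j, _, xbar_j, _ in strata) / N
    # Per-stratum variance of the mean, each with its own (N_j - n_j)/N_j correction
    var_st = sum(
        N_j**2 * (s_j**2 / n_j) * (N_j - n_j) / N_j
        for N_j, n_j, _, s_j in strata
    ) / N**2
    se = math.sqrt(var_st)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return xbar_st, se, xbar_st - z * se, xbar_st + z * se

# Hypothetical strata: (N_j, n_j, sample mean, sample standard deviation)
strata = [(500, 50, 20.0, 4.0), (300, 30, 35.0, 6.0), (200, 20, 50.0, 10.0)]
est, se, lower, upper = stratified_mean_ci(strata)
print(round(est, 2), round(se, 4), round(lower, 2), round(upper, 2))
```

With a single stratum the calculation reduces to the simple random sampling formula, which gives a quick sanity check.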
Estimation of the Population Total, Stratified Random Sample
• Suppose that random samples of n_j individuals from strata containing N_j individuals (j = 1, 2, . . ., K) are selected and that the quantity to be estimated is the population total, Nμ
• An unbiased estimation procedure for the population total Nμ yields the point estimate
\[ N\bar{x}_{st} = \sum_{j=1}^{K} N_j \bar{x}_j \]
Estimation of the Population Total, Stratified Random Sample (continued)
• An unbiased estimation procedure for the variance of the estimator of the population total yields the point estimate
\[ N^2 \hat{\sigma}^2_{\bar{x}_{st}} = \sum_{j=1}^{K} N_j^2\, \hat{\sigma}^2_{\bar{x}_j} \]
• Provided the sample size is large, 100(1 - α)% confidence intervals for the population total for stratified random samples are obtained from
\[ N\bar{x}_{st} - z_{\alpha/2}\,N\hat{\sigma}_{\bar{x}_{st}} < N\mu < N\bar{x}_{st} + z_{\alpha/2}\,N\hat{\sigma}_{\bar{x}_{st}} \]
Estimation of the Population Proportion, Stratified Random Sample
• Suppose that random samples of n_j individuals from strata containing N_j individuals (j = 1, 2, . . ., K) are obtained
• Let P_j be the population proportion, and \(\hat{p}_j\) the sample proportion, in the jth stratum
• If P is the overall population proportion, an unbiased estimation procedure for P yields
\[ \hat{p}_{st} = \frac{1}{N} \sum_{j=1}^{K} N_j \hat{p}_j \]
Estimation of the Population Proportion, Stratified Random Sample (continued)
• An unbiased estimation procedure for the variance of the estimator of the overall population proportion is
\[ \hat{\sigma}^2_{\hat{p}_{st}} = \frac{1}{N^2} \sum_{j=1}^{K} N_j^2\, \hat{\sigma}^2_{\hat{p}_j} \]
where
\[ \hat{\sigma}^2_{\hat{p}_j} = \frac{\hat{p}_j (1 - \hat{p}_j)}{n_j - 1} \cdot \frac{N_j - n_j}{N_j} \]
is the estimate of the variance of the sample proportion in the jth stratum
Estimation of the Population Proportion, Stratified Random Sample (continued)
• Provided the sample size is large, 100(1 - α)% confidence intervals for the population proportion for stratified random samples are obtained from
\[ \hat{p}_{st} - z_{\alpha/2}\,\hat{\sigma}_{\hat{p}_{st}} < P < \hat{p}_{st} + z_{\alpha/2}\,\hat{\sigma}_{\hat{p}_{st}} \]
Proportional Allocation: Sample Size
• One way to allocate sampling effort is to make the proportion of sample members in any stratum the same as the proportion of population members in the stratum
• If so, for the jth stratum,
\[ \frac{n_j}{n} = \frac{N_j}{N} \]
• The sample size for the jth stratum using proportional allocation is
\[ n_j = \frac{N_j}{N} \cdot n \]
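Proportional allocation is a one-line calculation. The sketch below uses hypothetical stratum sizes (500, 300, 200) and a total sample of n = 100; the function name is an assumption. In practice the results would still need rounding to whole numbers, which the slides do not discuss.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate n so that n_j / n equals N_j / N in every stratum."""
    N = sum(stratum_sizes)
    return [n * N_j / N for N_j in stratum_sizes]

# Hypothetical strata of 500, 300, and 200 members with n = 100
allocation = proportional_allocation([500, 300, 200], 100)
print(allocation)
```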
Optimal Allocation
When estimating an overall population mean or total, with the population variances in the individual strata denoted \(\sigma_j^2\), the most precise estimators are obtained with optimal allocation
• The sample size for the jth stratum using optimal allocation is
\[ n_j = \frac{N_j \sigma_j}{\sum_{i=1}^{K} N_i \sigma_i} \cdot n \]
Optimal Allocation (continued)
To estimate the overall population proportion, estimators with the smallest possible variance are obtained by optimal allocation
• The sample size for the jth stratum for population proportion using optimal allocation is
\[ n_j = \frac{N_j \sqrt{P_j (1 - P_j)}}{\sum_{i=1}^{K} N_i \sqrt{P_i (1 - P_i)}} \cdot n \]
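Optimal allocation weights each stratum by its size times its standard deviation. In the sketch below the stratum sizes, standard deviations, and function names are hypothetical; for a proportion, the stratum standard deviation is taken as the square root of P_j(1 - P_j), which is the standard deviation of a 0/1 variable.

```python
import math

def optimal_allocation(stratum_sizes, stratum_sds, n):
    """Allocate n in proportion to N_j * sigma_j."""
    weights = [N_j * sd_j for N_j, sd_j in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

def proportion_sd(P):
    """Standard deviation of a 0/1 attribute with proportion P."""
    return math.sqrt(P * (1 - P))

# Hypothetical strata for a mean: sizes 500, 300, 200 and sds 10, 20, 40
for_mean = optimal_allocation([500, 300, 200], [10, 20, 40], 100)
# Hypothetical strata for a proportion: guessed P_j of 0.5, 0.2, 0.1
for_prop = optimal_allocation(
    [500, 300, 200], [proportion_sd(P) for P in (0.5, 0.2, 0.1)], 100
)
print([round(x, 2) for x in for_mean])
print([round(x, 2) for x in for_prop])
```

When every stratum has the same standard deviation, the weights reduce to the stratum sizes and the result coincides with proportional allocation.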
Determining Sample Size
• The sample size is directly related to the size of the variance of the population estimator
• If the researcher sets the allowable size of the variance in advance, the necessary sample size can be determined
Sample Size, Mean, Simple Random Sampling
• Consider estimating the mean of a population of N members, which has variance σ²
• If the desired variance, \(\sigma^2_{\bar{x}}\), of the sample mean is specified, the required sample size to estimate the population mean through simple random sampling is
\[ n = \frac{N\sigma^2}{(N - 1)\,\sigma^2_{\bar{x}} + \sigma^2} \]
Sample Size, Mean, Simple Random Sampling (continued)
• Often it is more convenient to specify directly the desired width of the confidence interval for the population mean rather than \(\sigma^2_{\bar{x}}\)
  ◦ Thus the researcher specifies the desired margin of error for the mean
• Calculations are simple since, for example, a 95% confidence interval for the population mean will extend an approximate amount \(1.96\,\sigma_{\bar{x}}\) on each side of the sample mean, \(\bar{x}\)
Required Sample Size Example
2000 items are in a population. If σ = 45, what sample size is needed to estimate the mean within ± 5 with 95% confidence?
\[ N = 2000, \quad 1.96\,\sigma_{\bar{x}} = 5 \;\rightarrow\; \sigma_{\bar{x}} = 2.551 \]
\[ n = \frac{N\sigma^2}{(N - 1)\,\sigma^2_{\bar{x}} + \sigma^2} = \frac{(2000)(45)^2}{(1999)(2.551)^2 + (45)^2} = 269.39 \]
So the required sample size is n = 270
(Always round up)
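The calculation in this example can be reproduced with a short function. This is only a sketch; the function name `srs_sample_size_mean` is an assumption, and the margin of error is converted to a standard error by dividing by z = 1.96, as on the slide.

```python
import math

def srs_sample_size_mean(N, sigma, margin, z=1.96):
    """Sample size needed so that z * sigma_xbar equals the desired margin."""
    var_xbar = (margin / z) ** 2                 # desired variance of the sample mean
    return N * sigma**2 / ((N - 1) * var_xbar + sigma**2)

# The slide's example: N = 2000, sigma = 45, margin of 5 at 95% confidence
n_required = srs_sample_size_mean(2000, 45, 5)
print(round(n_required, 2), math.ceil(n_required))
```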
Sample Size, Proportion, Simple Random Sampling (continued)
• Consider estimating the proportion P of individuals in a population of size N who possess a certain attribute
• If the desired variance, \(\sigma^2_{\hat{p}}\), of the sample proportion is specified, the required sample size to estimate the population proportion through simple random sampling is
\[ n = \frac{N P (1 - P)}{(N - 1)\,\sigma^2_{\hat{p}} + P(1 - P)} \]
Sample Size, Proportion, Simple Random Sampling (continued)
• The largest possible value for this expression occurs when P = 0.5, so that P(1 - P) = 0.25
\[ n_{max} = \frac{0.25\,N}{(N - 1)\,\sigma^2_{\hat{p}} + 0.25} \]
• A 95% confidence interval for the population proportion will extend an approximate amount \(1.96\,\sigma_{\hat{p}}\) on each side of the sample proportion
Required Sample Size Example
How large a sample would be necessary to estimate the true proportion of voters who will vote for proposition A, within ±3%, with 95% confidence, from a population of 34,000 voters?
Required Sample Size Example (continued)
Solution:
N = 34000
For 95% confidence, use z = 1.96
\[ 1.96\,\sigma_{\hat{p}} = 0.03 \;\rightarrow\; \sigma_{\hat{p}} = 0.015306 \]
\[ n_{max} = \frac{0.25\,N}{(N - 1)\,\sigma^2_{\hat{p}} + 0.25} = \frac{(0.25)(34000)}{(33999)(0.0153)^2 + 0.25} = 1035.47 \]
So use n = 1036
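A short check of this calculation, as a sketch with a hypothetical function name. The solution above rounds the standard error to 0.0153 before squaring, which gives 1035.47; carrying 0.03/1.96 at full precision gives a slightly smaller value, so the rounded-up answer can differ by one.

```python
import math

def srs_sample_size_proportion(N, sigma_p, P=0.5):
    """Sample size for a proportion; P = 0.5 gives the largest (most cautious) n."""
    pq = P * (1 - P)
    return N * pq / ((N - 1) * sigma_p**2 + pq)

n_rounded_se = srs_sample_size_proportion(34000, 0.0153)       # as computed on the slide
n_full_precision = srs_sample_size_proportion(34000, 0.03 / 1.96)
print(round(n_rounded_se, 2), math.ceil(n_rounded_se))
print(round(n_full_precision, 2), math.ceil(n_full_precision))
```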
Sample Size, Mean, Stratified Sampling
• Suppose that a population of N members is subdivided in K strata containing N1, N2, . . ., NK members
• Let \(\sigma_j^2\) denote the population variance in the jth stratum
• An estimate of the overall population mean is desired
• If the desired variance, \(\sigma^2_{\bar{x}_{st}}\), of the sample estimator is specified, the required total sample size, n, can be found
Sample Size, Mean, Stratified Sampling (continued)
• For proportional allocation:
\[ n = \frac{\sum_{j=1}^{K} N_j \sigma_j^2}{N \sigma^2_{\bar{x}_{st}} + \frac{1}{N} \sum_{j=1}^{K} N_j \sigma_j^2} \]
• For optimal allocation:
\[ n = \frac{\frac{1}{N} \left( \sum_{j=1}^{K} N_j \sigma_j \right)^2}{N \sigma^2_{\bar{x}_{st}} + \frac{1}{N} \sum_{j=1}^{K} N_j \sigma_j^2} \]
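Both total-sample-size formulas share the same denominator, which the sketch below makes explicit. The stratum sizes, standard deviations, target variance, and function name are all hypothetical values chosen for illustration.

```python
def stratified_sample_size(stratum_sizes, stratum_sds, target_var, allocation="proportional"):
    """Total n needed to reach a target variance of the stratified mean estimator."""
    N = sum(stratum_sizes)
    pairs = list(zip(stratum_sizes, stratum_sds))
    sum_N_var = sum(N_j * sd_j**2 for N_j, sd_j in pairs)   # sum of N_j * sigma_j^2
    denominator = N * target_var + sum_N_var / N
    if allocation == "proportional":
        numerator = sum_N_var
    elif allocation == "optimal":
        numerator = sum(N_j * sd_j for N_j, sd_j in pairs) ** 2 / N
    else:
        raise ValueError("allocation must be 'proportional' or 'optimal'")
    return numerator / denominator

# Hypothetical strata: sizes 500, 300, 200; sds 10, 20, 40; target variance 1.0
sizes, sds = [500, 300, 200], [10, 20, 40]
n_prop = stratified_sample_size(sizes, sds, 1.0, "proportional")
n_opt = stratified_sample_size(sizes, sds, 1.0, "optimal")
print(round(n_prop, 2), round(n_opt, 2))
```

In this illustration optimal allocation needs a smaller total sample than proportional allocation for the same target variance, and the two agree when all strata have equal standard deviations.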
Cluster Sampling
• Population is divided into several “clusters,” each representative of the population
• A simple random sample of clusters is selected
  ◦ Generally, all items in the selected clusters are examined
  ◦ An alternative is to choose items from selected clusters using another probability sampling technique
[Figure: population divided into 16 clusters, with randomly selected clusters forming the sample]
Estimators for Cluster Sampling
• A population is subdivided into M clusters and a simple random sample of m of these clusters is selected and information is obtained from every member of the sampled clusters
• Let n1, n2, . . ., nm denote the numbers of members in the m sampled clusters
• Denote the means of these clusters by \(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_m\)
• Denote the proportions of cluster members possessing an attribute of interest by P1, P2, . . ., Pm
Estimators for Cluster Sampling (continued)
• The objective is to estimate the overall population mean µ and proportion P
• Unbiased estimation procedures give
Mean:
\[ \bar{x}_c = \frac{\sum_{i=1}^{m} n_i \bar{x}_i}{\sum_{i=1}^{m} n_i} \]
Proportion:
\[ \hat{p}_c = \frac{\sum_{i=1}^{m} n_i P_i}{\sum_{i=1}^{m} n_i} \]
Estimators for Cluster Sampling (continued)
• Estimates of the variance of these estimators, following from unbiased estimation procedures, are
Mean:
\[ \hat{\sigma}^2_{\bar{x}_c} = \frac{M - m}{M m \bar{n}^2} \left( \frac{\sum_{i=1}^{m} n_i^2 (\bar{x}_i - \bar{x}_c)^2}{m - 1} \right) \]
Proportion:
\[ \hat{\sigma}^2_{\hat{p}_c} = \frac{M - m}{M m \bar{n}^2} \left( \frac{\sum_{i=1}^{m} n_i^2 (P_i - \hat{p}_c)^2}{m - 1} \right) \]
where
\[ \bar{n} = \frac{\sum_{i=1}^{m} n_i}{m} \]
is the average number of individuals in the sampled clusters
Estimators for Cluster Sampling (continued)
• Provided the sample size is large, 100(1 - α)% confidence intervals using cluster sampling are
  ◦ for the population mean
\[ \bar{x}_c - z_{\alpha/2}\,\hat{\sigma}_{\bar{x}_c} < \mu < \bar{x}_c + z_{\alpha/2}\,\hat{\sigma}_{\bar{x}_c} \]
  ◦ for the population proportion
\[ \hat{p}_c - z_{\alpha/2}\,\hat{\sigma}_{\hat{p}_c} < P < \hat{p}_c + z_{\alpha/2}\,\hat{\sigma}_{\hat{p}_c} \]
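The cluster estimators can be sketched as follows. The function name `cluster_ci`, the four sampled clusters, and M = 40 are assumptions for illustration; with only four clusters the large-sample condition is not really met, so the numbers serve only to show the arithmetic. Passing cluster proportions instead of cluster means gives the proportion version, since both formulas have the same form.

```python
import math

def cluster_ci(cluster_sizes, cluster_values, M, z=1.96):
    """Estimate, standard error, and interval from m sampled clusters out of M.

    cluster_values holds the cluster means (or cluster proportions).
    """
    m = len(cluster_sizes)
    total_n = sum(cluster_sizes)
    pairs = list(zip(cluster_sizes, cluster_values))
    estimate = sum(n_i * v_i for n_i, v_i in pairs) / total_n
    n_bar = total_n / m                                   # average cluster size
    spread = sum(n_i**2 * (v_i - estimate) ** 2 for n_i, v_i in pairs) / (m - 1)
    variance = (M - m) / (M * m * n_bar**2) * spread
    se = math.sqrt(variance)
    return estimate, se, estimate - z * se, estimate + z * se

# Hypothetical sample of m = 4 clusters from M = 40
sizes = [20, 30, 25, 25]
means = [10.0, 12.0, 11.0, 9.0]
xbar_c, se_c, lower_c, upper_c = cluster_ci(sizes, means, M=40)
print(round(xbar_c, 2), round(se_c, 4), round(lower_c, 2), round(upper_c, 2))
```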
Two-Phase Sampling
• Sometimes sampling is done in two steps
• An initial pilot sample can be done
• Disadvantage:
  ◦ Takes more time
• Advantages:
  ◦ Can adjust survey questions if problems are noted
  ◦ Additional questions may be identified
  ◦ Initial estimates of response rate or population parameters can be obtained
Non-Probability Samples
[Figure: classification of samples]
• Probability Samples: Simple Random, Stratified, Systematic, Cluster
• Non-Probability Samples: Judgement, Quota, Convenience
Non-Probability Samples (continued)
• It may be simpler or less costly to use a nonprobability-based sampling method
  ◦ Judgement sample
  ◦ Quota sample
  ◦ Convenience sample
• These methods may still produce good estimates of population parameters
• But …
  ◦ Are more subject to bias
  ◦ No valid way to determine reliability
Chapter Summary
• Reviewed basic steps in a sampling study
• Defined sampling and nonsampling errors
• Examined probability sampling methods
  ◦ Simple Random Sampling, Systematic Sampling, Stratified Random Sampling, Cluster Sampling
• Identified estimators for the population mean, population total, and population proportion for different types of samples
• Determined the required sample size for specified confidence interval width
• Examined nonprobabilistic sampling methods