Sample size estimation
Steps for calculating the sample size per stratum
1. Choose the stratification (e.g. regions, districts, ...)
2. Define the population (N) of each stratum
3. Decide on key indicator(s)
4. Estimate the mean & variance, or the prevalence, of the key indicator
5. Decide on precision and confidence level
6. Calculate the initial total sample size (n) according to the budget/time
7. Use a simple random sample per stratum to select your representative sample
Final sample size
Calculate sample size – random sample
To estimate sample size, you need to know:
•Estimate of the prevalence or mean & STDev of the key indicator (e.g. 30% poor food consumption)
•Precision desired (for example: ± 5%)
•Level of confidence (for example: 95%)
•Population (only if below 10,000, otherwise it will not influence the required sample size)
•Expected response rate (for example: 90%)
•Number of eligible individuals per household (if applicable)
Note:
• Precision is the variability of the estimate.
• Confidence level is the share of repeated samples, all other things equal, whose confidence interval would contain the true value.
• Confidence interval is the interval around the estimate for which we have the desired confidence level.
• Choose the size of the confidence interval (precision) for a given confidence level.
• As long as the target population is more than a few thousand households, it will not influence the required sample size. Only below that does the required sample size fall slightly.
Prevalence vs. mean
• Prevalence is the number of cases of a variable of interest, typically binary, within a population divided by the total population (e.g. stunting incidence, unemployment).
• Mean is the expected value of a variable of interest that is typically continuous or within a prescribed range for a given population (e.g. height, weight, age).
For the purpose of calculating sample size:
• Treat variables as prevalence rates only when they are naturally binary.
• DO NOT threshold continuous indicators when calculating sample size, even if they are turned into prevalence indicators for analysis purposes.
  e.g. the Food Consumption Score (FCS) is a continuous indicator (0-112), but it is thresholded at 21 and 35 for poor and borderline prevalence.
Choosing the right distribution
For continuous variables we must choose a probability distribution that best fits the data.
Step 1: ALWAYS plot a histogram of past data and choose the distributional form!
Two of the most common ones encountered at WFP are:
• Normal distribution
• Negative binomial distribution
Prevalence rates always follow a binomial distribution, which is why they are mathematically easy to deal with.
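A quick way to follow Step 1 without any plotting library is a text histogram. The sketch below is only an illustration: the scores are invented, the helper name `text_histogram` is hypothetical, and reading "a long right tail with variance well above the mean" as a hint towards the negative binomial rather than the normal is a rule of thumb assumed here, not a rule stated in the slides.

```python
from collections import Counter
from statistics import mean, pvariance

def text_histogram(values, bin_width):
    """Return (bin_start, count) pairs for equal-width bins, sorted by bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return sorted(counts.items())

# Hypothetical past scores, invented purely for illustration.
past_scores = [0, 0, 1, 2, 2, 3, 3, 4, 5, 5, 7, 8, 10, 12, 15, 21, 28]

for bin_start, count in text_histogram(past_scores, bin_width=5):
    print(f"{bin_start:>3}-{bin_start + 4:<3} {'#' * count}")

# A skewed shape with variance far above the mean would argue against
# a normal distribution (assumed rule of thumb).
print("mean:", round(mean(past_scores), 2))
print("variance:", round(pvariance(past_scores), 2))
```

With real survey data, the same shape check would be made on the previous round of the key indicator before picking a formula.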
Sample size formula for prevalence
(single survey using random sampling)
To calculate the sample size for an estimate of prevalence with 95% confidence limits:
$$ n = \frac{1.96^2 \, P(1-P)}{d^2} $$
• 1.96 = Z value for 95% confidence limits
• P = Estimated prevalence (e.g. 0.3 for 30%)
• P(1-P) = variance for a binary (binomial) variable
• d = ½ of the desired confidence interval width (e.g. 0.025 for a 5% wide interval, i.e. ± 2.5%)
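The prevalence formula can be sketched in a few lines of Python. The function name `sample_size_prevalence` is hypothetical, and rounding up to the next whole respondent is an assumption made here (the slides do not say how to round).

```python
import math

def sample_size_prevalence(p, d, z=1.96):
    """n = z^2 * p * (1 - p) / d^2, rounded up to a whole respondent."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if d <= 0:
        raise ValueError("d must be positive")
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

# 30% estimated prevalence, half-width of 0.05 (i.e. +/- 5 percentage points).
print(sample_size_prevalence(0.3, 0.05))   # 323
# Halving d to 0.025 roughly quadruples the required sample.
print(sample_size_prevalence(0.3, 0.025))  # 1291
```

Because d enters squared, tightening the precision is by far the most expensive choice among the inputs.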
Sample size formula for continuous variable
(single survey using random sampling)
To calculate the sample size for an estimate with 95% confidence limits:
$$ n = \frac{1}{\dfrac{1}{N} + \dfrac{(\mu d)^2}{1.96^2 \, \sigma^2}} \qquad (1.2) $$
• 1.96 = Z value for 95% confidence limits
• μ = Expected mean
• σ² = variance of the variable
• d = ½ of the desired confidence interval width, as a fraction of the mean (e.g. 0.025 for a 5% wide interval, i.e. ± 2.5%)
• μ*d = absolute half-width of the confidence interval
• N = population of each stratum
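As a sketch, equation (1.2) can be written as a small Python function. The name `sample_size_mean` is hypothetical. The check uses the Anbar figures from the Iraq example (mean 75, STDev 20.5, Z = 1.645), and d = 0.025 is assumed because that value reproduces the 323 shown in that table.

```python
def sample_size_mean(mu, sigma, d, population, z=1.96):
    """Equation (1.2): n = 1 / (1/N + (mu*d)^2 / (z^2 * sigma^2))."""
    relative_term = (mu * d) ** 2 / (z ** 2 * sigma ** 2)
    return 1.0 / (1.0 / population + relative_term)

# Anbar, 90% confidence (Z = 1.645), d = 0.025 (assumed, see lead-in).
n_anbar = sample_size_mean(mu=75, sigma=20.5, d=0.025,
                           population=1723154, z=1.645)
print(round(n_anbar))  # 323

# Same indicator in a hypothetical stratum of only 2000 people:
# the 1/N term now matters and the requirement drops.
n_small = sample_size_mean(mu=75, sigma=20.5, d=0.025,
                           population=2000, z=1.645)
print(round(n_small))  # 278
```

The second call illustrates the earlier note that population size only influences the result when the population is small.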
Sample size formula for cluster sampling
• To calculate the sample size for an estimate of prevalence with a 95% confidence interval, taking cluster sampling into account:
$$ n = \mathrm{DEFF} \times \frac{1.96^2 \, P(1-P)}{d^2} $$
• DEFF = Design effect
• 1.96 = Z value for p = 0.05 or 95% confidence intervals
• P = Estimated prevalence
• d = Desired precision (for example, 0.05 for ± 5%)
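The cluster-sampling formula is the simple random sampling formula multiplied by the design effect. A minimal sketch follows. The function name `sample_size_cluster` is hypothetical, the design effect of 2.0 is an invented illustration value (in practice it would come from a previous survey), and rounding up is an assumption.

```python
import math

def sample_size_cluster(p, d, deff, z=1.96):
    """n = DEFF * z^2 * p * (1 - p) / d^2, rounded up."""
    if deff <= 0:
        raise ValueError("deff must be positive")
    return math.ceil(deff * z ** 2 * p * (1 - p) / d ** 2)

# 30% prevalence, +/- 5 points, illustrative design effect of 2.0.
print(sample_size_cluster(0.3, 0.05, deff=2.0))  # 646
# With deff = 1.0 the result falls back to the simple random sample size.
print(sample_size_cluster(0.3, 0.05, deff=1.0))  # 323
```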
What is Design effect?
• Ratio of the actual variance under the sampling method used to the hypothetical variance under simple random sampling
• For clustered sampling:
  • N = # of samples | K = # of clusters | M = # of samples per cluster
• Deff is in practice ≥ 1 for cluster sampling, because clustering is almost never more efficient than simple random sampling; the reverse holds for stratified sampling, where Deff is typically ≤ 1
Design effect
Design effect increases when
• Key indicators are highly geographically clustered (e.g. water source, access to health care)
• The number of clusters is decreased and the size of clusters is increased
To minimize design effect
• Include more clusters of smaller size
• Stratify the sample into more homogeneous groups
• Make all clusters the same size
Example 1: Key indicator normally distributed
Food Consumption Score
Example 1: Iraq Case Study
1. Choose the stratification (Strata): 18 Governorates of Iraq
2. Define the population (N) of each stratum:
Governorate(s)  Population 2015
Anbar           1723154
Babil           2008609
Najaf           1428979
Baghdad         7882807
Basrah          2822646
Diyala          1592434
Duhok           848524
Erbil           1650224
Kerbala         1183818
Kirkuk          1551670
Missan          1080392
Wassit          1340116
Muthanna        792339
Qadissiya       1254963
Ninewa          3397659
Salah al-Din    1551978
Sulaymaniyah    1858506
Thi-Qar         2035734
3. Key Indicator: Food Consumption Score (FCS)
4. Calculate the mean (μ) and standard deviation (STDev) of the FCS (key indicator)
The STDev was rounded so that it takes just two levels
5. Decide on precision and confidence level:
• 90% confidence interval
• Z=1.645
N.B. A confidence interval of 10% at 90% confidence is the absolute minimum!
The wider the CI, the worse the ability to detect trends!
6. Initial total sample size (n): 2200 surveyed respondents
The required sample size in each Governorate, calculated using equation (1.2), will be:
Simple Random Sampling
• After estimating the sample size in each governorate, rescale it to the total sample size of 2200 (n) using the following formula:
$$ n_h \times \frac{n}{n_{IRAQ}} $$
where n_h is the sample size from equation (1.2) for stratum h and n_IRAQ is the sum of the n_h over all strata (3348).
Babil and Najaf, Missan and Wassit, and Muthanna and Qadissiya each share one combined population figure and one sample size, so each pair is shown as a single row.

Governorate(s)        Population  FCS Mean  FCS Stdev  SRS 5%-90%  Given 5/90 -> scaling by 2200
Anbar                 1723154     75        20.5       323         212
Babil + Najaf         3437588     82.5      17.5       195         128
Baghdad               7882807     82.5      17.5       195         128
Basrah                2822646     85        17.5       184         121
Diyala                1592434     80        17.5       207         136
Duhok                 848524      77.5      20.5       303         199
Erbil                 1650224     80        17.5       207         136
Kerbala               1183818     82.5      17.5       195         128
Kirkuk                1551670     75        17.5       236         155
Missan + Wassit       2420508     80        17.5       207         136
Muthanna + Qadissiya  2047302     82.5      17.5       195         128
Ninewa                3397659     77.5      20.5       303         199
Salah al-Din          1551978     80        17.5       207         136
Sulaymaniyah          1858506     80        17.5       207         136
Thi-Qar               2035734     85        17.5       184         121
Iraq                  36004552                         3348        2199
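The rescaling step can be checked with a short Python sketch. The helper name `rescale_to_total` is hypothetical, and rounding each stratum to the nearest whole number is an assumption that happens to reproduce the figures in the table. Governorates that share one population figure are counted once.

```python
def rescale_to_total(stratum_sizes, target_total):
    """Scale per-stratum sizes proportionally towards target_total."""
    current_total = sum(stratum_sizes.values())
    return {
        name: round(size * target_total / current_total)
        for name, size in stratum_sizes.items()
    }

# SRS 5%-90% sizes from the Iraq table (paired governorates counted once).
srs_sizes = {
    "Anbar": 323, "Babil + Najaf": 195, "Baghdad": 195, "Basrah": 184,
    "Diyala": 207, "Duhok": 303, "Erbil": 207, "Kerbala": 195,
    "Kirkuk": 236, "Missan + Wassit": 207, "Muthanna + Qadissiya": 195,
    "Ninewa": 303, "Salah al-Din": 207, "Sulaymaniyah": 207, "Thi-Qar": 184,
}

scaled = rescale_to_total(srs_sizes, target_total=2200)
print(sum(srs_sizes.values()))  # 3348
print(scaled["Anbar"])          # 212
print(sum(scaled.values()))     # 2199: one short of 2200 due to rounding
```

The one-respondent shortfall is a rounding effect; it can be assigned by hand to any stratum.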
Example 2: Negative binomial distributed indicator
reduced Coping Strategy Index
Stratified Random Sampling
Example 2: Malawi Case Study
1. Define the total population (N):
Malawi Total  16512568
2. Define the population in each stratum (Nh): (strata = 10 aggregated districts of Malawi)
District(s)                               Population
Blantyre-Mwanza-Neno-Balaka               1933263
Chikwawa-Nsanje                           892772
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe  3170421
Dedza-Ntcheu                              1406995
Dowa-Ntchisi-Kasungu-Mchinji              2322675
Lilongwe                                  2310728
Machinga-Mangochi                         1608745
Mzimba-Karonga-Rumphi                     1578519
Nkhata Bay-Chitipa-Likoma                 514968
Nkhotakota-Salima                         773482
Stratified Random Sampling
3. Key Indicator: reduced Coping Strategy Index (rCSI)
4. Calculate the Mean (μ) and Standard deviation (STDev) of the rCSI
District(s)                               Population  rCSI Mean  rCSI Stdev
Blantyre-Mwanza-Neno-Balaka               1933263     12         4
Chikwawa-Nsanje                           892772      16         5
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe  3170421     14         4
Dedza-Ntcheu                              1406995     16         5
Dowa-Ntchisi-Kasungu-Mchinji              2322675     16         4
Lilongwe                                  2310728     12         4
Machinga-Mangochi                         1608745     16         5
Mzimba-Karonga-Rumphi                     1578519     14         4
Nkhata Bay-Chitipa-Likoma                 514968      16         5
Nkhotakota-Salima                         773482      14         4
Malawi Total                              16512568    15         4
$$ \text{StDev} = \sqrt{\mu + \frac{\mu^2}{n_{h,\text{baseline}}}} $$
where n_{h,baseline} = sample size of each stratum of the baseline, with h = 1,…,10
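The standard deviation formula can be sketched as below. The function name `negbin_stdev` is hypothetical, and the baseline sample size of 500 is an invented value, since the slides do not list the baseline sizes per stratum.

```python
import math

def negbin_stdev(mu, n_baseline):
    """StDev = sqrt(mu + mu^2 / n_baseline), as in the formula above."""
    if n_baseline <= 0:
        raise ValueError("n_baseline must be positive")
    return math.sqrt(mu + mu ** 2 / n_baseline)

# Means taken from the rCSI table; n_baseline = 500 is an assumption.
for mu in (12, 14, 16):
    print(mu, round(negbin_stdev(mu, 500), 2))
```

Under that assumed baseline size the values fall between about 3.5 and 4.1, which is in the range of the rounded figures in the table.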
The STDev was rounded so that it takes just two levels
5. Decide on precision and confidence level:
• 90% confidence interval
• Z=1.645
N.B. A confidence interval of 10% at 90% confidence is the absolute minimum!
The wider the CI, the worse the ability to detect trends!
Simple Random Sampling
6. Total sample size (n): 2000 surveyed respondents
The required sample size in each stratum (aggregated district), calculated using equation (1.2), will be:
District(s)                               Population  rCSI Mean  rCSI Stdev  SRS 5%-90%
Blantyre-Mwanza-Neno-Balaka               1933263     12         4           481
Chikwawa-Nsanje                           892772      16         5           423
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe  3170421     14         4           353
Dedza-Ntcheu                              1406995     16         5           423
Dowa-Ntchisi-Kasungu-Mchinji              2322675     16         4           271
Lilongwe                                  2310728     12         4           481
Machinga-Mangochi                         1608745     16         5           423
Mzimba-Karonga-Rumphi                     1578519     14         4           353
Nkhata Bay-Chitipa-Likoma                 514968      16         5           422
Nkhotakota-Salima                         773482      14         4           353
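The SRS 5%-90% column can be reproduced by applying equation (1.2) to each stratum. The sketch below assumes Z = 1.645 and d = 0.025 (half of a 5% wide interval), and rounds to the nearest whole number; these settings are inferred because they reproduce the table, and the function name `sample_size_mean` is hypothetical.

```python
def sample_size_mean(mu, sigma, d, population, z):
    """Equation (1.2): n = 1 / (1/N + (mu*d)^2 / (z^2 * sigma^2))."""
    return 1.0 / (1.0 / population + (mu * d) ** 2 / (z ** 2 * sigma ** 2))

# (stratum, population, rCSI mean, rCSI STDev) from the Malawi table.
strata = [
    ("Blantyre-Mwanza-Neno-Balaka", 1933263, 12, 4),
    ("Chikwawa-Nsanje", 892772, 16, 5),
    ("Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe", 3170421, 14, 4),
    ("Dedza-Ntcheu", 1406995, 16, 5),
    ("Dowa-Ntchisi-Kasungu-Mchinji", 2322675, 16, 4),
    ("Lilongwe", 2310728, 12, 4),
    ("Machinga-Mangochi", 1608745, 16, 5),
    ("Mzimba-Karonga-Rumphi", 1578519, 14, 4),
    ("Nkhata Bay-Chitipa-Likoma", 514968, 16, 5),
    ("Nkhotakota-Salima", 773482, 14, 4),
]

srs = {
    name: round(sample_size_mean(mu, sd, d=0.025, population=pop, z=1.645))
    for name, pop, mu, sd in strata
}

for name, n in srs.items():
    print(f"{name}: {n}")
print("total:", sum(srs.values()))  # 3983
```

Nkhata Bay-Chitipa-Likoma comes out at 422 rather than 423 only because its population is the smallest, so the 1/N term is just large enough to change the rounding.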

Rescaling it to the total sample size of 2000 individuals:
Final sample size
District(s)                               Population  rCSI Mean  rCSI Stdev  SRS 5%-90%  Given 5/90 -> scaling by 2000  Rounding/Applying Human Judgement*
Blantyre-Mwanza-Neno-Balaka               1933263     12         4           481         242                            240
Chikwawa-Nsanje                           892772      16         5           423         212                            210
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe  3170421     14         4           353         177                            180
Dedza-Ntcheu                              1406995     16         5           423         212                            210
Dowa-Ntchisi-Kasungu-Mchinji              2322675     16         4           271         136                            135
Lilongwe                                  2310728     12         4           481         242                            245
Machinga-Mangochi                         1608745     16         5           423         212                            210
Mzimba-Karonga-Rumphi                     1578519     14         4           353         177                            180
Nkhata Bay-Chitipa-Likoma                 514968      16         5           422         212                            210
Nkhotakota-Salima                         773482      14         4           353         177                            180
Number of Attempts – Malawi case
Given:
• The response rate (25%)
• The estimated sample size in each stratum (the Rounding/Applying Human Judgement* column)
Desired attempts = estimated sample size / response rate
Aggregated Districts                      Rounding/Applying Human Judgement*  Desired Attempts
Blantyre-Mwanza-Neno-Balaka               240                                 960
Chikwawa-Nsanje                           210                                 840
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe  180                                 720
Dedza-Ntcheu                              210                                 840
Dowa-Ntchisi-Kasungu-Mchinji              135                                 540
Lilongwe                                  245                                 980
Machinga-Mangochi                         210                                 840
Mzimba-Karonga-Rumphi                     180                                 720
Nkhata Bay-Chitipa-Likoma                 210                                 840
Nkhotakota-Salima                         180                                 720
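The attempts calculation is a single division per stratum. The sketch below uses the final Malawi sample sizes and the 25% response rate; the function name `desired_attempts` is hypothetical, and rounding up is an assumption that only matters when the division is not exact.

```python
import math

def desired_attempts(sample_size, response_rate):
    """Attempts needed so that sample_size responses are expected."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(sample_size / response_rate)

# Final sample sizes per stratum, in the order of the Malawi table.
final_sizes = [240, 210, 180, 210, 135, 245, 210, 180, 210, 180]

attempts = [desired_attempts(n, 0.25) for n in final_sizes]
print(attempts)
print(sum(final_sizes), sum(attempts))  # 2000 8000
```

At a 25% response rate, every completed interview costs four attempts, so the 2000 target implies 8000 attempts in total.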