Survey
Sample size estimation

Steps for calculating the sample size per stratum
1. Choose the stratification (e.g. regions, districts, …)
2. Define the population (N) of each stratum
3. Decide on key indicator(s)
4. Estimate the mean & variance, or the prevalence, of the key indicator
5. Decide on precision and confidence level
6. Calculate the initial total sample size (n) according to the budget/time
7. Use a simple random sample per stratum to select your representative sample
Result: final sample size

Calculate sample size: random sample
To estimate sample size, you need to know:
• Estimate of the prevalence, or mean & STDev, of the key indicator (e.g. 30% poor food consumption)
• Precision desired (for example: ± 5%)
• Level of confidence (for example: 95%)
• Population (only if below 10,000; otherwise it will not influence the required sample size)
• Expected response rate (for example: 90%)
• Number of eligible individuals per household (if applicable)

Note:
• Precision is the variability of the estimate
• Confidence level is the probability of getting the same result if you re-sampled, all other things being equal
• Confidence interval is the interval around the estimate for which we have a desired confidence level
• Choose the size of the confidence interval (precision) given a confidence level
• As long as the target population is more than a few thousand households, it will not influence the required sample size. Only if it is smaller will the sample size requirement reduce slightly.

Prevalence vs. mean
• Prevalence is the number of cases of a variable of interest (typically binary) in a population, divided by the total population (e.g. stunting incidence, unemployment)
• Mean is the expected value of a variable of interest that is typically continuous or within a prescribed range for a given population (e.g. height, weight, age)

For the purpose of calculating sample size:
• Treat variables as prevalence rates only when they are naturally binary
• DO NOT threshold continuous indicators when calculating sample size, even if for analysis purposes they are turned into prevalence indicators; e.g. the Food Consumption Score (FCS) is a continuous indicator (0-112), but it is thresholded at 21 and 35 for poor and borderline prevalence

Choosing the right distribution
For continuous variables we must choose a probability distribution that best fits the data.
Step 1: ALWAYS plot a histogram of past data and choose the distributional form!
Two of the most common ones encountered at WFP are:
• Normal distribution
• Negative binomial distribution
Prevalence rates always follow a binomial distribution, which is why they are mathematically easy to deal with.

Sample size formula for prevalence (single survey using random sampling)
To calculate the sample size for an estimate of prevalence with 95% confidence limits:

$$ n = \frac{1.96^2 \, P(1-P)}{d^2} $$

• 1.96 = Z value for 95% confidence limits
• P = estimated prevalence (e.g. 0.3 for 30%)
• P(1-P) = variance for a binary (binomial) variable
• d = ½ of the desired confidence interval (e.g. 0.025 for a 5% interval, i.e. ± 2.5%)

Sample size formula for continuous variable (single survey using random sampling)
To calculate the sample size for an estimate with 95% confidence limits:

$$ n = \frac{1}{\dfrac{1}{N} + \dfrac{(\mu d)^2}{1.96^2 \, \sigma^2}} \qquad (1.2) $$

• 1.96 = Z value for 95% confidence limits
• μ = expected mean
• σ² = variance of the variable
• d = ½ of the desired confidence interval (e.g. 0.025 for a 5% interval, i.e. ± 2.5%)
• μ·d = absolute value of the confidence interval
• N = population of each stratum

Sample size formula for cluster sampling
To calculate the sample size for an estimate of prevalence with a 95% confidence interval, taking cluster sampling into account:

$$ n = \mathrm{DEFF} \times \frac{1.96^2 \, P(1-P)}{d^2} $$

• DEFF = design effect
• 1.96 = Z value for p = 0.05 or 95% confidence intervals
• P = estimated prevalence
• d = desired precision (for example, 0.05 for ± 5%)

What is design effect?
• Ratio of the actual variance from the sampling method used to the hypothetical variance under simple random sampling
• For clustered sampling: N = number of samples | K = number of clusters | M = number of samples per cluster
• Deff > 1 always for cluster sampling, because it can never be more efficient than random sampling; vice versa for stratified sampling

Design effect
Design effect increases when
• Key indicators are highly geographically clustered (e.g. water source, access to health care)
• The number of clusters is decreased and the size of the clusters is increased
To minimize design effect
• Include more clusters of smaller size
• Stratify the sample into more homogeneous groups
• All clusters should be of the same size

Example 1: Normally distributed key indicator, the Food Consumption Score
Example 1: Iraq Case Study

1. Choose the stratification (strata): 18 Governorates of Iraq
2. Define the population (N) of each stratum:

Governorate      Population 2015
Anbar                    1723154
Babil                    2008609
Najaf                    1428979
Baghdad                  7882807
Basrah                   2822646
Diyala                   1592434
Duhok                     848524
Erbil                    1650224
Kerbala                  1183818
Kirkuk                   1551670
Missan                   1080392
Wassit                   1340116
Muthanna                  792339
Qadissiya                1254963
Ninewa                   3397659
Salah al-Din             1551978
Sulaymaniyah             1858506
Thi-Qar                  2035734

3. Key indicator: Food Consumption Score (FCS)
4. Calculate the mean (μ) and standard deviation (STDev) of the FCS (key indicator). The STDev was rounded so that it takes only two levels.
5. Decide on precision and confidence level:
• 90% confidence interval
• Z = 1.645
N.B. A confidence interval of 10% at 90% confidence is the absolute minimum! The bigger the CI, the worse the ability to detect trends!
6. Initial total sample size (n): 2200 surveyed respondents

Simple Random Sampling
The required sample size in each Governorate, calculated using equation (1.2), is shown in the "SRS 5%-90%" column below.
• After estimating the sample size in each governorate, rescale it to the total sample size of 2200 (n) using the following formula, where n_h is the SRS sample size of governorate h and n_IRAQ = 3348 is the sum of the SRS sample sizes:

$$ n_h^{\text{rescaled}} = n_h \times \frac{n}{n_{IRAQ}} $$

Babil and Najaf, Missan and Wassit, and Muthanna and Qadissiya each share one row with their populations summed, which gives 15 strata.

Governorate(s)         Population  FCS Mean  FCS Stdev  SRS 5%-90%  Given 5/90 -> scaled to 2200
Anbar                     1723154      75       20.5         323     212
Babil, Najaf              3437588      82.5     17.5         195     128
Baghdad                   7882807      82.5     17.5         195     128
Basrah                    2822646      85       17.5         184     121
Diyala                    1592434      80       17.5         207     136
Duhok                      848524      77.5     20.5         303     199
Erbil                     1650224      80       17.5         207     136
Kerbala                   1183818      82.5     17.5         195     128
Kirkuk                    1551670      75       17.5         236     155
Missan, Wassit            2420508      80       17.5         207     136
Muthanna, Qadissiya       2047302      82.5     17.5         195     128
Ninewa                    3397659      77.5     20.5         303     199
Salah al-Din              1551978      80       17.5         207     136
Sulaymaniyah              1858506      80       17.5         207     136
Thi-Qar                   2035734      85       17.5         184     121
Iraq                     36004552                           3348    2199

Example 2: Negative binomial distributed indicator, the reduced Coping Strategy Index
Example 2: Malawi Case Study

Stratified Random Sampling
1. Define the total population (N): Malawi Total 16512568
2. Define the population in each stratum (N_h) (strata = 10 aggregated districts of Malawi)
3. Key indicator: reduced Coping Strategy Index (rCSI)
4. Calculate the mean (μ) and standard deviation (STDev) of the rCSI:

District(s)                                Population  rCSI Mean  rCSI Stdev
Blantyre-Mwanza-Neno-Balaka                   1933263      12         4
Chikwawa-Nsanje                                892772      16         5
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe      3170421      14         4
Dedza-Ntcheu                                  1406995      16         5
Dowa-Ntchisi-Kasungu-Mchinji                  2322675      16         4
Lilongwe                                      2310728      12         4
Machinga-Mangochi                             1608745      16         5
Mzimba-Karonga-Rumphi                         1578519      14         4
Nkhata Bay-Chitipa-Likoma                      514968      16         5
Nkhotakota-Salima                              773482      14         4
Malawi Total                                 16512568      15         4

$$ \mathrm{StDev} = \sqrt{\mu + \frac{\mu^2}{n_{h,\mathrm{baseline}}}} $$

where n_{h,baseline} = sample size of each stratum of the baseline, with h = 1,…,10. The STDev was rounded so that it takes only two levels.

5. Decide on precision and confidence level:
• 90% confidence interval
• Z = 1.645
N.B. A confidence interval of 10% at 90% confidence is the absolute minimum! The bigger the CI, the worse the ability to detect trends!

Simple Random Sampling
6. Total sample size (n): 2000 surveyed respondents
The required sample size in each aggregated district, calculated using equation (1.2), will be:

District(s)                                Population  rCSI Mean  rCSI Stdev  SRS 5%-90%
Blantyre-Mwanza-Neno-Balaka                   1933263      12         4          481
Chikwawa-Nsanje                                892772      16         5          423
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe      3170421      14         4          353
Dedza-Ntcheu                                  1406995      16         5          423
Dowa-Ntchisi-Kasungu-Mchinji                  2322675      16         4          271
Lilongwe                                      2310728      12         4          481
Machinga-Mangochi                             1608745      16         5          423
Mzimba-Karonga-Rumphi                         1578519      14         4          353
Nkhata Bay-Chitipa-Likoma                      514968      16         5          422
Nkhotakota-Salima                              773482      14         4          353

Final sample size
Rescaling to the total sample size of 2000 individuals, then rounding and applying human judgement:

District(s)                                SRS 5%-90%  Given 5/90 -> scaled to 2000  Rounding / Applying Human Judgement
Blantyre-Mwanza-Neno-Balaka                    481      242                           240
Chikwawa-Nsanje                                423      212                           210
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe       353      177                           180
Dedza-Ntcheu                                   423      212                           210
Dowa-Ntchisi-Kasungu-Mchinji                   271      136                           135
Lilongwe                                       481      242                           245
Machinga-Mangochi                              423      212                           210
Mzimba-Karonga-Rumphi                          353      177                           180
Nkhata Bay-Chitipa-Likoma                      422      212                           210
Nkhotakota-Salima                              353      177                           180

Number of Attempts: Malawi case
Given:
• The response rate (25%)
• The estimated sample size in each stratum (the rounded values above)
The desired attempts = estimated sample size / response rate

Aggregated Districts                       Rounding / Applying Human Judgement  Desired Attempts
Blantyre-Mwanza-Neno-Balaka                240                                   960
Chikwawa-Nsanje                            210                                   840
Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe   180                                   720
Dedza-Ntcheu                               210                                   840
Dowa-Ntchisi-Kasungu-Mchinji               135                                   540
Lilongwe                                   245                                   980
Machinga-Mangochi                          210                                   840
Mzimba-Karonga-Rumphi                      180                                   720
Nkhata Bay-Chitipa-Likoma                  210                                   840
Nkhotakota-Salima                          180                                   720
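
The calculation chain of the Malawi case (equation (1.2), rescaling to the total sample size, then desired attempts) can be sketched in Python as follows. This is a minimal illustration, not the tool used to produce the tables. The function names are hypothetical. It assumes d = 0.025, because that value reproduces the "SRS 5%-90%" column, and it assumes rounding to the nearest integer. The prevalence helper covers the simple random and cluster formulas through an optional DEFF argument.

```python
def srs_size_continuous(mean, stdev, d, z, population):
    """Equation (1.2): n = 1 / (1/N + (mean*d)^2 / (z^2 * stdev^2))."""
    return 1.0 / (1.0 / population + (mean * d) ** 2 / (z ** 2 * stdev ** 2))


def srs_size_prevalence(p, d, z, deff=1.0):
    """Prevalence formula: n = DEFF * z^2 * P(1-P) / d^2 (DEFF = 1 for SRS)."""
    return deff * z ** 2 * p * (1.0 - p) / d ** 2


def rescale(sizes, total):
    """Scale per-stratum sizes proportionally so they sum to roughly `total`."""
    current = sum(sizes)
    return [round(n * total / current) for n in sizes]


def desired_attempts(sample_size, response_rate):
    """Number of attempts needed to reach `sample_size` completed responses."""
    return sample_size / response_rate


# (rCSI mean, rCSI stdev, population) per stratum, copied from the Malawi table.
MALAWI = [
    (12, 4, 1933263),  # Blantyre-Mwanza-Neno-Balaka
    (16, 5, 892772),   # Chikwawa-Nsanje
    (14, 4, 3170421),  # Chiradzulu-Mulanje-Thyolo-Zomba-Phalombe
    (16, 5, 1406995),  # Dedza-Ntcheu
    (16, 4, 2322675),  # Dowa-Ntchisi-Kasungu-Mchinji
    (12, 4, 2310728),  # Lilongwe
    (16, 5, 1608745),  # Machinga-Mangochi
    (14, 4, 1578519),  # Mzimba-Karonga-Rumphi
    (16, 5, 514968),   # Nkhata Bay-Chitipa-Likoma
    (14, 4, 773482),   # Nkhotakota-Salima
]

Z_90 = 1.645  # Z value for 90% confidence, as stated in the example
D = 0.025     # assumption: half-width that reproduces the "SRS 5%-90%" column

# Per-stratum SRS sizes, rounded to the nearest integer (an assumption).
srs = [round(srs_size_continuous(m, s, D, Z_90, pop)) for m, s, pop in MALAWI]

# Proportional rescaling to the 2000 respondents fixed by budget/time.
scaled = rescale(srs, 2000)

# Attempts for two of the hand-rounded sizes at a 25% response rate.
attempts = [desired_attempts(n, 0.25) for n in (240, 210)]

print(srs)
print(scaled)
print(attempts)
```

With the Malawi inputs, `srs` and `scaled` match the "SRS 5%-90%" and "scaling by 2000" columns. The final hand-rounded column (240, 210, 180, ...) involves human judgement and is deliberately not computed here.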