Formalizing the Concepts: Simple Random Sampling
Juan Muñoz, Kristen Himelein
March 2013

Purpose of sampling
To study a portion of the population, through observations at the level of the selected units (such as households, persons, institutions or physical objects), and to make quantitative statements about the entire population.
• Why sampling?
  – Saves cost compared to full enumeration
  – Easier to control the quality of a sample
  – More timely results from sample data
  – Measurement can be destructive

Sampling Concepts and Definitions

Unit of analysis
• The level at which a measurement is taken
• The most common units of analysis are persons, households, farms and economic establishments

Target population or universe
• The complete collection of all the units of analysis to be studied
• Examples: the population living in households in a country; students in primary schools

Sampling frame
• List of all the units of analysis whose characteristics are to be measured
• Must be comprehensive and non-overlapping, and must not contain irrelevant elements
• Units must be identifiable (often linked to cartography)
• Should be kept up to date to ensure complete coverage
• Examples: list of establishments; census; civil registration

Parameter / Estimate
• The objective of sampling is to estimate parameters of a population
• A parameter is a quantity computed from all N values in a population set; an estimate is the corresponding quantity computed from the sample
• Typically, a parameter is a descriptive measure of a population, such as the mean or variance
  – Poverty rate, average income, etc.
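To make the parameter/estimate distinction concrete, here is a minimal Python sketch. The population of ten household incomes and the poverty line are hypothetical values chosen only for illustration, and `describe` is our own helper name. The same function yields a parameter when given all N values and an estimate when given a sample.

```python
import random
import statistics

# Hypothetical population: monthly incomes of N = 10 households (illustrative only).
population = [120, 340, 90, 560, 210, 75, 430, 180, 260, 95]
POVERTY_LINE = 100  # hypothetical poverty line, an assumption for this sketch


def describe(values, poverty_line):
    """Return (mean, poverty rate) computed from every value passed in."""
    mean = statistics.fmean(values)
    poverty_rate = sum(v < poverty_line for v in values) / len(values)
    return mean, poverty_rate


# Parameters: computed from all N values in the population.
params = describe(population, POVERTY_LINE)  # (236.0, 0.3)

# Estimates: the same descriptive measures computed from a sample of n = 4 units.
rng = random.Random(2013)           # fixed seed so the run is reproducible
sample = rng.sample(population, 4)  # simple random sample without replacement
estimates = describe(sample, POVERTY_LINE)

print("parameters (mean, poverty rate):", params)
print("estimates  (mean, poverty rate):", estimates)
```

Different seeds give different estimates, while the parameters never change; that variation is exactly what the rest of the section sets out to measure.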
Unbiased estimator
• Estimator: a mathematical formula or function that uses the sample results to produce an estimate for the entire population, $\hat{\theta} = \hat{\theta}(X_1, X_2, \ldots, X_n)$
• If the mean of the estimates from repeated samples equals the population parameter, the estimator is unbiased
• Formally, an estimator is unbiased if the expected value of the (sample) estimates equals the (population) parameter being estimated, where k is the number of experiments:
$E(\hat{\theta}) = \lim_{k \to \infty} \frac{\hat{\theta}_1 + \hat{\theta}_2 + \cdots + \hat{\theta}_k}{k} = \theta$

Random sampling
• Also known as scientific sampling or probability sampling
• Each unit has a known, non-zero probability of selection
• Mathematical theory is available to predict the probability distribution of the sampling error (the error caused by observing a sample instead of the whole population)

Random sampling techniques
• Single-stage, equal probability sampling
  – Simple Random Sampling (SRS)
  – Systematic sampling with equal probability
• Stratified sampling
• Multi-stage sampling
In real life these techniques are usually combined in various ways; most sampling designs are complex.

Techniques in Random Sampling

Single-stage, equal probability sampling
• Random selection of n units from a population of N units, so that each unit has an equal probability of selection
  – N (population) → n (sample)
  – Probability of selection (sampling fraction): f = n/N
This is the most basic form of probability sampling and provides the theoretical basis for more complicated techniques.

Single-stage, equal probability sampling (continued)
1. Simple Random Sampling. The investigator mixes up the whole target population before grabbing n units.
2. Systematic Random Sampling. The N units in the population are ranked 1 to N in some order (e.g., alphabetic). To select a sample of n units, calculate the step k = N/n, take one unit at random from the first k units, and then take every kth unit after it.
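The two selection methods, and the repeated-experiment idea behind unbiasedness, can be sketched in Python as follows. This is an illustration under stated assumptions: the frame is a hypothetical list of values 1 to 100, the systematic helper assumes N is an exact multiple of n so that the step k = N/n is an integer, and the function names are our own.

```python
import random
import statistics


def simple_random_sample(frame, n, rng):
    """SRS without replacement: every unit has selection probability f = n/N."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Systematic sampling with step k = N/n (sketch assumes N is a multiple of n)."""
    N = len(frame)
    if N % n != 0:
        raise ValueError("this sketch assumes N is an exact multiple of n")
    k = N // n                # sampling step
    start = rng.randrange(k)  # random unit among the first k (0-based index)
    return frame[start::k]    # then every k-th unit: exactly n units


# Hypothetical frame of N = 100 units whose values are 1..100.
frame = list(range(1, 101))
theta = statistics.fmean(frame)  # population parameter (mean) = 50.5

# Repeat each experiment many times and average the resulting estimates.
rng = random.Random(42)
experiments = 20_000
srs_avg = statistics.fmean(
    statistics.fmean(simple_random_sample(frame, 10, rng)) for _ in range(experiments)
)
sys_avg = statistics.fmean(
    statistics.fmean(systematic_sample(frame, 10, rng)) for _ in range(experiments)
)
print(f"parameter {theta}, SRS average {srs_avg:.2f}, systematic average {sys_avg:.2f}")
```

With a fixed seed the run is reproducible; both averages land close to the parameter 50.5, as unbiasedness predicts for the sample mean under either design.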
Single-stage, equal probability sampling
• Advantage: self-weighting (simplifies the calculation of estimates and variances)
• Disadvantages
  – A sampling frame may not be available
  – May entail high transportation costs

Stratified sampling
• The population is divided into mutually exclusive subgroups called strata
• A random sample is then selected from each stratum
• Common examples: urban / rural, provinces, male / female

Two-stage sampling
• Units of analysis are divided into groups called Primary Sampling Units (PSUs)
• A sample of PSUs is selected first
• Then a sample of units is chosen in each of the selected PSUs
This technique can be generalized (multi-stage sampling).

Sample variance & standard error
• Uncertainty is measured by the standard error, $\hat{e}$
• Variance of the sample mean of an SRS of n units from a population of size N:
$\hat{e}^2 = \mathrm{Var}(\bar{x}) = \frac{N-n}{N-1} \cdot \frac{\mathrm{Var}(X)}{n} \approx \left(1 - \frac{n}{N}\right) \frac{\mathrm{Var}(X)}{n}$
• This is the measure of sampling error. It depends on 3 factors:
  – $(1 - n/N)$ = Finite Population Correction (fpc)
  – n = sample size
  – Var(X) = population variance. It is unknown, but can be estimated without bias by:
$\hat{s}_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

Sample variance in proportions
• A proportion P (or prevalence) is equal to the mean of a dummy variable
• In this case the population variance is Var(X) = P(1 - P), and (ignoring the fpc):
$\mathrm{Var}(\hat{p}) = \frac{\hat{p}(1 - \hat{p})}{n - 1}$

Standard deviation vs standard error
• Population (N units): $\sigma^2$ = variance of the population; $\sigma$ = standard deviation around the mean
• Sample (n units): $s^2$ = variance of the sample; $s/\sqrt{n}$ = standard error
Difference: the standard deviation is a descriptive statistic; it measures the degree to which individuals in the population differ from the population mean. The standard error is an estimate of how close your sample mean is likely to be to the population mean. Standard errors decrease as the sample size grows; standard deviations are unaffected by sample size.
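A minimal Python sketch of the standard-error formulas, assuming SRS without replacement and using the $(1 - n/N)$ form with the unbiased sample variance $\hat{s}_x^2$ in place of Var(X). The sample values and population sizes are hypothetical, and passing `N=math.inf` is our own convention for switching the fpc off. The dummy-variable case (28 ones and 72 zeros) shows that treating a proportion as the mean of a 0/1 variable reproduces the proportion formula when the fpc is ignored.

```python
import math
import statistics


def se_mean(sample, N):
    """Estimated standard error of the sample mean under SRS without replacement.

    e_hat^2 = (1 - n/N) * s^2 / n, where s^2 is the unbiased sample variance
    (divisor n - 1). Passing N=math.inf makes the fpc equal to 1 (ignored).
    """
    n = len(sample)
    s2 = statistics.variance(sample)  # divisor n - 1
    fpc = 1 - n / N                   # finite population correction
    return math.sqrt(fpc * s2 / n)


def se_proportion(p_hat, n):
    """Standard error of a sample proportion, ignoring the fpc."""
    return math.sqrt(p_hat * (1 - p_hat) / (n - 1))


# Hypothetical sample of n = 8 values from a population of N = 80 units.
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(f"SE of mean: {se_mean(values, N=80):.4f}")

# A proportion is the mean of a dummy variable: 28 ones and 72 zeros (n = 100).
dummy = [1] * 28 + [0] * 72
print(f"via dummy mean:      {se_mean(dummy, N=math.inf):.5f}")
print(f"via proportion rule: {se_proportion(0.28, 100):.5f}")
```

The two dummy-variable results agree because $\hat{s}^2 = n\hat{p}(1-\hat{p})/(n-1)$ for a 0/1 variable, so $\hat{s}^2/n = \hat{p}(1-\hat{p})/(n-1)$.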
[Figure: Sample standard error, shown for n = 100 and n = 750. Bigger samples have smaller standard errors around the mean.]

Confidence intervals
• Estimates obtained from random samples can be accompanied by measures of the uncertainty associated with the estimate, called confidence intervals.
• It is not sufficient to simply report the share of the vote obtained by a candidate in a sample survey; we also need to indicate how accurate that estimate is.

Confidence intervals for averages
$\bar{x} \pm t_\alpha \, \hat{e}(\bar{x})$
where:
$t_\alpha$ = 1.28 for confidence level α = 80%
$t_\alpha$ = 1.64 for confidence level α = 90%
$t_\alpha$ = 1.96 for confidence level α = 95%
$t_\alpha$ = 2.58 for confidence level α = 99%

Confidence intervals for proportions
In a sample of 1,000 electors, 280 of them (28 percent) say they will vote Green.
$\hat{e} = \sqrt{\mathrm{Var}(\hat{p})} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n - 1}} = \sqrt{\frac{0.28 \times 0.72}{999}}$
The standard error is 1.42 percent.
• 95 percent confidence interval: 28 ± 1.42 × 1.96, i.e. about 25.2 to 30.8 percent
• 99 percent confidence interval: 28 ± 1.42 × 2.58, i.e. about 24.3 to 31.7 percent
[Figure: the two confidence intervals drawn around 28 percent on a 24 to 32 percent scale]

Sample Size
The required sample size n is determined by:
• The variability of the variable of interest, Var(X), though this is unknown…
• The maximum margin of error E we are willing to accept
• How confident we want to be that the error of our estimate will not exceed that maximum; for each confidence level α there is a coefficient $t_\alpha$
• The size of the population (not very important)
$n = \frac{t_\alpha^2 \, \mathrm{Var}(X)}{E^2}$, and for a proportion $n = \frac{t_\alpha^2 \, P(1 - P)}{E^2}$
Adjusted for the population size N: $n_N = \frac{n}{1 + n/N}$
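The worked election example and the sample-size formulas can be checked with a short Python sketch. The function names are hypothetical, the $t_\alpha$ values are the ones listed above, and the planning inputs for the sample-size calls (P = 0.28, E = 3 percentage points, N = 10,000) are illustrative assumptions. When nothing is known about P, P = 0.5 is a common conservative choice because it maximizes P(1 - P).

```python
import math

# Coefficients t_alpha for the confidence levels listed above.
T_ALPHA = {0.80: 1.28, 0.90: 1.64, 0.95: 1.96, 0.99: 2.58}


def proportion_ci(successes, n, level):
    """Return (p_hat, standard error, (lower, upper)) for a sample proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / (n - 1))
    half_width = T_ALPHA[level] * se
    return p_hat, se, (p_hat - half_width, p_hat + half_width)


def sample_size_proportion(P, E, level, N=None):
    """n = t^2 P(1 - P) / E^2, optionally adjusted as n_N = n / (1 + n/N)."""
    n = T_ALPHA[level] ** 2 * P * (1 - P) / E ** 2
    if N is not None:
        n = n / (1 + n / N)
    return math.ceil(n)  # round up: a fraction of a unit cannot be interviewed


# Worked example: 280 of 1,000 electors say they will vote Green.
p_hat, se, (lo95, hi95) = proportion_ci(280, 1000, 0.95)
print(f"p = {p_hat:.2%}, SE = {se:.2%}, 95% CI = [{lo95:.1%}, {hi95:.1%}]")

# Bigger samples give smaller standard errors (same p, hypothetical sample sizes).
for n in (100, 750):
    print(n, f"{proportion_ci(round(0.28 * n), n, 0.95)[1]:.2%}")

# Hypothetical planning inputs: P = 0.28, E = 0.03, N = 10,000.
print(sample_size_proportion(0.28, 0.03, 0.95))
print(sample_size_proportion(0.28, 0.03, 0.95, N=10_000))
```

The population-size adjustment only matters when n is a non-negligible fraction of N; for a very large population $n_N$ is essentially equal to n.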