Formalizing the Concepts:
Simple Random Sampling
Juan Muñoz
Kristen Himelein
March 2013
Purpose of sampling
To study a portion of the population – through
observations at the level of the units selected, such
as households, persons, institutions or physical
objects – and make quantitative statements about
the entire population
Purpose of sampling
• Why sampling?
– Saves cost compared to full enumeration
– Easier to control quality of sample
– More timely results from sample data
– Measurement can be destructive
Sampling Concepts and Definitions
Unit of analysis
• The level at which a measurement is taken
• Most common units of analysis are persons,
households, farms, and economic establishments
Sampling Concepts and Definitions
Target population or universe
• The complete collection of all the units of
analysis to study.
• Examples: population living in households in a
country; students in primary schools
Sampling Concepts and Definitions
Sampling frame
• List of all the units of analysis whose characteristics
are to be measured
• Comprehensive, non-overlapping and must not
contain irrelevant elements
• Units must be identifiable (often linked to
cartography)
• Should be updated to ensure complete coverage
• Examples: list of establishments; census; civil
registration
Sampling Concepts and Definitions
Parameter / Estimate
• Objective of sampling is to estimate parameters
of a population
• Quantity computed from all N values in a
population set
• Typically, a descriptive measure of a population,
such as mean, variance
– Poverty rate, average income, etc.
Sampling Concepts and Definitions
Unbiased Estimator
• Estimator - mathematical formula or function using sample
results to produce an estimate for the entire population
$$\hat{\theta} = \hat{\theta}(X_1, X_2, \dots, X_n)$$
• When the mean of individual sample estimates equals the
population parameter, then the estimator is unbiased
• Formally, an estimator is unbiased if the expected value of the
(sample) estimates is equal to the (population) parameter
being estimated (where k is the number of experiments).
$$\frac{\hat{\theta}_1 + \hat{\theta}_2 + \dots + \hat{\theta}_k}{k} \longrightarrow \theta \quad \text{as } k \to \infty$$

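The definition of unbiasedness can be checked by simulation. The sketch below is only an illustration: the population (the integers 1 to 100), the function name mean_of_sample_means and the seed are assumptions, not part of the original slides. It draws k simple random samples, computes the mean of each, and averages the k estimates; for an unbiased estimator that average should approach the population mean as k grows.

```python
import random
import statistics


def mean_of_sample_means(population, n, k, seed=0):
    """Draw k simple random samples of size n and average their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimates = [statistics.fmean(rng.sample(population, n)) for _ in range(k)]
    return statistics.fmean(estimates)


# Hypothetical population: the integers 1..100, whose true mean is 50.5.
population = list(range(1, 101))
print(mean_of_sample_means(population, n=10, k=20000))
```

Each individual sample mean can be far from 50.5; it is the average over many repetitions that settles near the parameter.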
Random sampling
• Also known as scientific sampling or probability
sampling
• Each unit has a non-zero and known probability
of selection
• Mathematical theory is available to predict the
probability distribution of the sampling error
(the error caused by observing a sample instead of the
whole population).
Random sampling techniques
• Single stage, equal probability sampling
– Simple Random Sampling (SRS)
– Systematic sampling with equal probability
• Stratified sampling
• Multi-stage sampling
In real life these techniques are usually combined
in various ways – most sampling designs are
complex
Techniques in Random Sampling
Single stage, equal probability sampling
• Random selection of n “units” from a population of N
units, so that each unit has an equal probability of
selection
– N (population ) → n (sample)
– Probability of selection (sampling fraction) = f = n/N
This is the most basic form of probability sampling and
provides the theoretical basis for more complicated
techniques
Techniques in Random Sampling
Single stage, equal probability sampling
(continued)
1. Simple Random Sampling. The investigator mixes
up the whole target population before grabbing “n”
units.
2. Systematic Random Sampling. The N units in the
population are listed from 1 to N in some order (e.g.,
alphabetical). To select a sample of n units, calculate the
step k (k = N/n), pick one unit at random from the
first k units, and then take every kth unit after it.
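The two selection procedures above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the systematic version assumes that N is an exact multiple of n so that the step k = N/n is an integer.

```python
import random


def simple_random_sample(units, n, rng):
    """SRS: every subset of n units is equally likely."""
    return rng.sample(units, n)


def systematic_sample(units, n, rng):
    """Random start among the first k units, then every kth unit."""
    N = len(units)
    if N % n != 0:
        # Assumption of this sketch only; real frames need a fractional step.
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n                 # sampling step
    start = rng.randrange(k)   # random position within the first k units
    return units[start::k]     # start, start + k, start + 2k, ...


rng = random.Random(1)
frame = list(range(1, 101))    # hypothetical frame of N = 100 units
print(simple_random_sample(frame, 10, rng))
print(systematic_sample(frame, 10, rng))
```

In both cases each unit is selected with probability f = n/N, here 10/100.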
Techniques in Random Sampling
Single stage, equal probability sampling
• Advantage
– self-weighting (simplifies the calculation of
estimates and variances)
• Disadvantages
– Sample frame may not be available
– May entail high transportation costs
Techniques in Random Sampling
Stratified sampling
• The population is divided into mutually exclusive
subgroups called strata.
• Then a random sample is selected from each
stratum.
• Common examples: Urban / Rural, Provinces,
Male / Female
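A stratified selection can be sketched as follows. The function name stratified_sample, the stratum_of callback and the household tuples are hypothetical, and the sketch assumes a simple random sample is drawn independently within each stratum.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Split units into strata, then draw an SRS inside each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(members, n_per_stratum[name])
        for name, members in strata.items()
    }


# Hypothetical frame: 60 urban and 40 rural households.
households = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
picked = stratified_sample(
    households,
    stratum_of=lambda h: h[0],
    n_per_stratum={"urban": 6, "rural": 4},
    rng=random.Random(0),
)
print(picked)
```

Because the strata are mutually exclusive, no unit can be selected twice.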
Techniques in Random Sampling
Two-stage sampling
• Units of analysis are divided into groups
called Primary Sampling Units (PSUs)
• A sample of PSUs is selected first
• Then a sample of units is chosen in each of
the selected PSUs
This technique can be generalized
(multi-stage sampling)
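The two stages can be sketched as below. This is an illustration under stated assumptions: PSUs and units are both selected with equal probability, every PSU holds at least n_units units, and the names two_stage_sample, psus, n_psus and n_units are hypothetical.

```python
import random


def two_stage_sample(psus, n_psus, n_units, rng):
    """psus maps a PSU identifier to the list of units it contains."""
    chosen = rng.sample(sorted(psus), n_psus)                       # stage 1: PSUs
    return {psu: rng.sample(psus[psu], n_units) for psu in chosen}  # stage 2: units


# Hypothetical frame: 10 villages (PSUs) with 20 households each.
villages = {
    f"village_{j}": [f"household_{j}_{i}" for i in range(20)] for j in range(10)
}
print(two_stage_sample(villages, n_psus=3, n_units=5, rng=random.Random(0)))
```

Only the selected PSUs need a full list of their units, which is one practical reason the design is common.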
Sample variance & standard error
• Uncertainty is measured by the standard error (ê).
• Variance of the sample mean of an SRS of n units for a
population of size N:
$$\hat{e}^2 = \mathrm{Var}(\bar{x}) = \frac{N - n}{N - 1} \cdot \frac{\mathrm{Var}(X)}{n} \approx \left(1 - \frac{n}{N}\right) \frac{\mathrm{Var}(X)}{n}$$
• Measure of sampling error. Depends on 3 factors:
– ( 1 - n/N ) = Finite Population Correction (fpc)
– n = sample size
– Var(X) = Population variance. Unknown, but can be
estimated without bias by:
$$\hat{s}_x^2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n - 1}$$
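The standard error of the mean can be computed from a sample as sketched below, using the approximate form with the finite population correction (1 - n/N) and the sample variance with divisor n - 1. The function name and the data are hypothetical.

```python
import math
import statistics


def standard_error_of_mean(sample, N):
    """Estimated standard error of the mean of an SRS from N units."""
    n = len(sample)
    s2 = statistics.variance(sample)   # sum of squared deviations / (n - 1)
    fpc = 1 - n / N                    # finite population correction
    return math.sqrt(fpc * s2 / n)


# Hypothetical sample of n = 8 values from a population of N = 80 units.
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_error_of_mean(values, N=80))
```

When N is very large relative to n, the correction is close to 1 and the result approaches the square root of s²/n.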
Sample Variance in Proportions
• A proportion P (or prevalence) is equal to the mean of
a dummy variable.
• In this case Var(P) = P(1-P), and
$$\mathrm{Var}(\hat{p}) = \frac{\hat{p}(1 - \hat{p})}{n - 1}$$
Standard deviation vs standard error
Population (size N)
• $\sigma^2$ = variance of the population
• $\sigma$ = standard deviation around the mean
Sample (size n)
• $s^2$ = variance of the sample
• $s / \sqrt{n}$ = standard error
Difference: The standard deviation is a descriptive statistic. It is the degree to
which individuals in the population differ from the mean of the population. The
standard error is an estimate of how close to the population mean your
sample mean is likely to be.
Standard errors decrease as the sample size grows. Standard deviations are left
unchanged.
Sample Standard Error
[Figure: two panels, n = 100 and n = 750]
Bigger samples have smaller standard errors around the mean
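The effect of sample size can be shown with the formula s/√n for the two sample sizes on this slide. The standard deviation s = 10 is an assumed value chosen only for illustration, and the sketch ignores the finite population correction.

```python
import math


def standard_error(s, n):
    """Standard error of the mean for sample standard deviation s."""
    return s / math.sqrt(n)


s = 10.0  # assumed sample standard deviation, for illustration only
for n in (100, 750):
    print(n, standard_error(s, n))
```

Multiplying the sample size by 7.5 divides the standard error by √7.5, roughly 2.7, while s itself does not change.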
Confidence intervals
o Estimates obtained from random samples can be
accompanied by measures of the uncertainty associated
with the estimate called confidence intervals.
o It is not sufficient to report only the proportion
obtained by a candidate in the sample survey; we
also need to give an indication of how accurate
the estimate is.
Confidence intervals for averages
$$\bar{x} \pm t_\alpha \cdot \hat{e}(\bar{x})$$
where:
tα = 1.28 for confidence level α = 80%
tα = 1.64 for confidence level α = 90%
tα = 1.96 for confidence level α = 95%
tα = 2.58 for confidence level α = 99%
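The interval for an average can be computed as sketched below, using the coefficients listed above. The function name and the data are hypothetical, and the sketch assumes the population is large enough for the finite population correction to be ignored.

```python
import math
import statistics

# Coefficients from the slide, keyed by confidence level in percent.
T_ALPHA = {80: 1.28, 90: 1.64, 95: 1.96, 99: 2.58}


def confidence_interval_mean(sample, level=95):
    """Return (lower, upper) bounds: mean +/- t_alpha * standard error."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # s / sqrt(n), no fpc
    half_width = T_ALPHA[level] * se
    return mean - half_width, mean + half_width


values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
print(confidence_interval_mean(values, level=95))
```

A higher confidence level uses a larger coefficient and therefore gives a wider interval around the same mean.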
Confidence intervals for proportions
In a sample of 1,000 electors, 280 of them (28
percent) say they will vote Green.
$$\hat{e} = \sqrt{\mathrm{Var}(\hat{p})} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n - 1}} = \sqrt{\frac{0.28 \times 0.72}{999}} \approx 0.0142$$
Standard error is 1.42 percent.
Confidence intervals
In a sample of 1,000 electors, 280 of them
(28 percent) say they will vote Green.
Standard error is 1.42 percent.
[Figure: number line of the estimate in percent, from 24 to 32, marking the standard error and the two intervals]
95 percent confidence interval: 28 ± 1.96 × 1.42
99 percent confidence interval: 28 ± 2.58 × 1.42
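The arithmetic of the Green vote example can be reproduced with a short sketch. The function names are hypothetical; the inputs (n = 1,000, p̂ = 0.28) and the coefficients 1.96 and 2.58 come from the slides.

```python
import math


def proportion_standard_error(p_hat, n):
    """Standard error of a sample proportion, with divisor n - 1."""
    return math.sqrt(p_hat * (1 - p_hat) / (n - 1))


def proportion_interval(p_hat, n, t_alpha):
    """Return (lower, upper) bounds: p_hat +/- t_alpha * standard error."""
    half_width = t_alpha * proportion_standard_error(p_hat, n)
    return p_hat - half_width, p_hat + half_width


se = proportion_standard_error(0.28, 1000)
print(f"standard error: {se:.4f}")
for level, t_alpha in ((95, 1.96), (99, 2.58)):
    low, high = proportion_interval(0.28, 1000, t_alpha)
    print(f"{level} percent interval: {low:.3f} to {high:.3f}")
```

The 95 percent interval runs from about 25.2 to 30.8 percent, so the survey alone cannot distinguish a 26 percent result from a 30 percent one.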
Sample Size
The required sample size n is determined by
• The variability of the parameter Var(X)
Though this is unknown…
• The maximum margin of error E we are willing to accept
• How confident we want to be that the error of our estimate
will not exceed that maximum
For each confidence level α there is a coefficient tα
• The size of the population
(not very important)
$$n = \frac{t_\alpha^2 \cdot \mathrm{Var}(X)}{E^2}$$
$$n = \frac{t_\alpha^2 \cdot P(1 - P)}{E^2}$$
$$n_N = \frac{n}{1 + n/N}$$
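The sample size formulas can be applied as sketched below. The function name required_sample_size and the example inputs (P = 0.5, E = 0.05, N = 10,000) are assumptions chosen for illustration; rounding up to a whole unit is also a choice of this sketch.

```python
import math


def required_sample_size(variance, margin, t_alpha, N=None):
    """n = t^2 * Var(X) / E^2, optionally adjusted for a population of size N."""
    n = t_alpha ** 2 * variance / margin ** 2
    if N is not None:
        n = n / (1 + n / N)   # finite population adjustment
    return math.ceil(n)       # round up to a whole number of units


# Proportion with the most conservative guess P = 0.5, so Var(X) = P(1 - P).
P = 0.5
print(required_sample_size(P * (1 - P), margin=0.05, t_alpha=1.96))
print(required_sample_size(P * (1 - P), margin=0.05, t_alpha=1.96, N=10_000))
```

With these inputs the adjustment for a population of 10,000 lowers the requirement only from 385 to 370, which illustrates why population size is described as not very important.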