Sampling - Columbia Law School

Sampling
Class 7
Goals of Sampling
− Representation of a population
− Representation of a specific phenomenon or behavior that is infrequent in the population
− Ensuring sufficient power for statistical analysis
Types of Samples

Probability Samples
− Simple Random Samples
− Stratified Random Samples
− Cluster Samples
− Matched Samples (Case Controls)
Non-Probability Samples
− Systematic Samples
− Quota Samples
− Purposive Samples
− Theoretical Samples
Simple Random Samples


We use simple random samples when we don't know
how a phenomenon is distributed in the population,
when we assume that the probability of an event is
equal for all persons in the population, or when we
assume that the population characteristics that may bear
on the phenomenon being studied are evenly distributed
among the population (EPSEM: equal probability of selection method).
Examples
− Monitoring the Future – annual survey of high school youths
− News Polls
− General Social Survey
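A minimal sketch of an equal-probability draw, assuming a complete sampling frame is available as a list. The frame of student IDs and the function name `simple_random_sample` are hypothetical and are not taken from any of the surveys named above.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance of selection."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame: 1,000 student IDs.
frame = list(range(1, 1001))
srs = simple_random_sample(frame, 50, seed=7)
print(len(srs), len(set(srs)))
```

The draw is only as good as the frame: anyone missing from the list has no chance of selection.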
Stratified Random Samples


We use stratified random samples when we believe that these
population characteristics are not evenly distributed; in that case
a simple random sample would not ensure representativeness of the
population. Stratification means that we first identify
specific population characteristics or groups, and then sample
cases within each group.
Examples
− School research
Selection of stratifying variables?
− Theoretical concerns
− Demographic concerns
We oversample when we need sufficient cases of a group
that has a low base rate in the overall population, and when even
stratification procedures may not yield sufficient cases for
comparison of these groups.
Example – Adolescent Health
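Stratification with oversampling can be sketched as follows. This is an illustration under stated assumptions, not the design of any study named here: the urban/rural strata, the population counts, and the function name `stratified_sample` are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units by stratum, then draw a simple random sample within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    drawn = {}
    for name, members in strata.items():
        k = min(n_per_stratum.get(name, 0), len(members))  # never ask for more than exist
        drawn[name] = rng.sample(members, k)
    return drawn

# Hypothetical population: 950 urban and 50 rural students (rural has a low base rate).
population = [("urban", i) for i in range(950)] + [("rural", i) for i in range(50)]
# A proportionate sample of 100 would include only about 5 rural cases,
# so the rural stratum is oversampled to 30.
drawn = stratified_sample(population, lambda u: u[0], {"urban": 70, "rural": 30}, seed=1)
print({name: len(cases) for name, cases in drawn.items()})
```

Because oversampling distorts group shares, population-level estimates would normally need sample weights; that step is not shown here.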
Cluster Samples


Cluster samples are used when subjects are widely
dispersed spatially or socially. Thus, we identify the social
or spatial units first, take a sample of these, and then
sample specific subjects within each of the sampled social or
spatial units. This method is called a multi-stage cluster
sampling procedure.
Example: Lawyer Satisfaction Study
− Stratify by type of practice and area of law (e.g., oversample patent lawyers),
− BUT let other characteristics (e.g., demographics) vary naturally
− Question – how, in this example, should we deal with years of practice?
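A two-stage version of the cluster procedure might look like the sketch below. Treating law firms as the clusters is an assumption made for illustration; the firm and lawyer names and the function `two_stage_cluster_sample` are hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: sample subjects within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical frame: 20 law firms (clusters), each with 15 lawyers.
firms = {f"firm_{i:02d}": [f"lawyer_{i:02d}_{j}" for j in range(15)] for i in range(20)}
cluster_sample = two_stage_cluster_sample(firms, n_clusters=5, n_per_cluster=4, seed=3)
print({firm: len(lawyers) for firm, lawyers in cluster_sample.items()})
```

Only the sampled firms need a full roster of lawyers, which is the practical appeal when subjects are widely dispersed.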
Case Controls




Matched Samples
Matched Cases
Matching on other sampling units?
Non-Probabilistic Samples

Systematic Samples


Convenient but flawed. You sample based on a consistent
rule, but the resulting sample's representativeness of the
population is uncertain. The best-known examples are
election exit polls and market research at a shopping mall.
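One common reading of "a consistent rule" is taking every k-th person who passes by. The sketch below assumes that reading; the voter stream, the interval of 10, and the function name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(stream, k, seed=None):
    """Take every k-th unit after a random start within the first k positions."""
    start = random.Random(seed).randrange(k)
    return stream[start::k]

# Hypothetical stream: 200 voters in the order they leave one polling place.
voters = [f"voter_{i}" for i in range(200)]
picked = systematic_sample(voters, k=10, seed=5)
print(len(picked))
```

The rule is applied evenly, but only to whoever shows up in the stream, which is why the link to the wider population stays uncertain.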
Quota Sample

Ensures adequate representation of specific groups, but not
with the goal of constructing a representative sample.
Generally useful when a phenomenon is not randomly
distributed but concentrated, or when practical issues prevent
probability-based techniques.

Example – survey of second-generation immigrants
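A quota rule can be sketched as accepting respondents in whatever order they turn up until each group's target is met. The groups, quota sizes, and the function name `quota_sample` below are hypothetical and only illustrate the mechanics.

```python
def quota_sample(arrivals, group_of, quotas):
    """Accept respondents in arrival order until each group's quota is filled."""
    remaining = dict(quotas)
    accepted = []
    for person in arrivals:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            accepted.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):  # every quota is filled
            break
    return accepted

# Hypothetical respondents: (id, parents' region of origin), in arrival order.
regions = ["asia", "latin_america", "europe"]
arrivals = [(i, regions[i % 3]) for i in range(60)]
quotas = {"asia": 5, "latin_america": 5, "europe": 3}
chosen = quota_sample(arrivals, lambda p: p[1], quotas)
print(len(chosen))
```

No random selection is involved: the quotas guarantee group counts, not representativeness.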

Purposive Samples
Useful in generalizing to a specific phenomenon
when the independent variable is not widely distributed.
For example, we may want to look at the effects of
particular occupations on job satisfaction, but these
occupations may be rare (e.g., driving instructors,
stenographers). We sample by identifying these
individuals and conducting observations on as many
as are needed to make valid statistical inferences.
Examples:
− People with unusual jobs (e.g., driving instructors, stenographers)
− Consumers of unusual products
− Persons with rare diseases

Theoretical Samples (a.k.a., snowball samples)
Sampling on the dependent variable when it is not
widely distributed and its population parameters are
unknown (precluding other sampling techniques).
Examples:
− People engaged in rare and hard-to-find behaviors
These raise problems in inference, but there are
considerable strengths in internal validity
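Snowball recruitment can be pictured as following referral chains outward from a few seed respondents. The sketch below assumes referrals are recorded as a dictionary; the network, the cap on sample size, and the function name `snowball_sample` are hypothetical.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Follow referral chains breadth-first from the seed respondents."""
    sampled = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        sampled.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # recruit each person at most once
                seen.add(contact)
                queue.append(contact)
    return sampled

# Hypothetical referral network: each respondent names others engaged in the same behavior.
referrals = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "e": ["f"]}
wave = snowball_sample(["a"], referrals, max_size=5)
print(wave)  # ['a', 'b', 'c', 'd', 'e']
```

Who ends up in the sample depends entirely on the seeds and their networks, which is the source of the inference problems noted above.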
Issues in Sample Construction


Sample attrition and mortality
Sample size



Over-samples to compensate for low base rates or specific
theoretical questions
Practical limitations in sampling
Sample error

The degree of error for a particular sampling design:
$$ s = \sqrt{\frac{P \times Q}{n}} $$
Where P and Q are parameters, n = sample size, and
s = standard error
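A short worked example, reading the formula as s = sqrt(P × Q / n). It assumes P is a proportion and Q = 1 − P, which the slide does not state explicitly; the survey figures and the 1.96 multiplier for an approximate 95% margin of error are also assumptions added for illustration.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(P * Q / n), assuming Q = 1 - P."""
    q = 1.0 - p
    return math.sqrt(p * q / n)

# Hypothetical survey: 40% answer "yes" in a sample of 600.
se = standard_error(0.40, 600)
margin = 1.96 * se  # approximate 95% margin of error (normal approximation)
print(round(se, 4), round(margin, 4))  # 0.02 0.0392
```

Because n sits under the square root, halving the standard error requires roughly four times the sample.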


http://www.dssresearch.com/toolkit/secalc/error.asp
Sample weighting
Power Considerations


Statistical Power -- it’s easy to get statistical significance with a large sample,
but it’s not terribly important (theoretically) if the effect size is quite small
Power is the ability of a test to detect relationships that exist in the
population, and is generally defined as the ability of a design to reject the
null hypothesis when it is false. When a study has low power, effect size
estimates will be less precise (have wider confidence intervals) and we may
incorrectly conclude that the cause and effect do not covary.
− Type I Error – False Positive (α)
− Type II Error – False Negative (β)
− Power = 1 − β
But power is a function of effect size, sample size, and the Type I error rate (α);
the Type II error rate (β) follows from these.
The effect size is determined from what is practically or theoretically
important and significant.
So, you want to specify a difference between groups that is meaningful, one that
is worth detecting.
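To see how power moves with sample size for a chosen effect size, here is a rough sketch using a normal approximation for a two-sided comparison of two group means. The standardized effect size of 0.5, the group sizes, and the function name `approx_power` are assumptions for illustration; an exact calculation would use the noncentral t distribution instead.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)            # critical value for the Type I error rate
    shift = effect_size * sqrt(n_per_group / 2)  # expected test statistic under the alternative
    # Probability of landing in either rejection region when the true shift is `shift`.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

for n in (20, 64, 200):
    print(n, round(approx_power(0.5, n), 3))
```

Under this approximation, about 64 cases per group give roughly 80% power for a standardized difference of 0.5, and a smaller difference that is still worth detecting needs many more cases.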