Download Sampling - University of Illinois at Chicago

Foundations of
Research
1
Psychology 242, Dr. McKirnan
Research Sampling.
• Defining your target population
• Probability & Non-Probability sampling methods.
Dr. David J. McKirnan, University of Illinois at
Chicago, Psychology; [email protected]
The big picture: Research sampling
• Define your target population

• What group do you want to generalize to?
• Who is / is not a member of the group?
• What is your sampling frame?
Sampling
Sampling: Who do you want to generalize to?

Any study assesses only a sample of the
population.

There are many different ways we may collect a
sample.

There are many different populations or subpopulations we may be interested in.

The size and breadth of a sample can affect the
Internal or External validity of the study.
Define the target population
Who do you want to generalize to?

Populations, from broadest to narrowest:
• Mammals
• Humans
• All Western people
• All Americans
• Young Americans
• College students
• UIC Students
• This class

Breadth of population to sample from (i.e., size of sampling frame): represents increasing external validity.
Specificity (and ease) of sampling frame: generally increases internal validity.
Who do you want to generalize to?
Samples often comprise very targeted sub-populations:

Demographic groups
• Ethnicity
• Socio-economic status
• Geography, e.g., urban dwellers…
Behavioral groups
• Registered voters
• Home owners
Clinical or other groups
• Medical or psychiatric patients…

That specificity increases internal validity by decreasing the complexity of the sample.
Research samples & validity
EXAMPLE
Clinical drug trials illustrate the conflict between internal and external validity in sampling.

People with diverse symptoms and backgrounds see physicians for depression.

To enhance internal validity, drug researchers use exclusion criteria to select only participants who fit a specific definition of depression.

Zimmerman et al. suggest that too many exclusion criteria compromise the validity of this research area.

Zimmerman, M., Mattia, J.I., & Posternak, M.A. (2002). Are Subjects in Pharmacological Treatment Trials of Depression Representative of Patients in Routine Clinical Practice? Am J Psychiatry, 159, 469–473.
Exclusion criteria & validity
EXAMPLE
The study begins with a large # of
people self-referred for depression
They exclude those with serious
mental illness, drug abuse or
personality disorder…
…whose symptoms are not severe enough, who are suicidal, or who have other affective disorders…
…whose symptoms are too recent OR
too long-standing…
…and end up with a small, carefully
selected sub-set of patients (8.4% of
general depression patients).
External vs. internal validity in sampling
EXAMPLE

Applying rigorous study selection
criteria for drug trials excludes the
great majority of routine depression
patients.

Rigorous participant selection for
internal validity seriously
compromises external validity in
these studies.

This leaves the actual usefulness of
anti-depressant (and other)
medications for the general
population in doubt.

To be useful, research must balance the need for careful subject selection with the need for representativeness.
Who is a group member?
Do you use Facebook or
other media 5 times a week or
more?
A = Yes
B = No
C = Not sure – lost count.
Who is a group member?
Are you a “Facebooker”?
A = Yes
B = No
C = Not sure – let me
facebook that.
Who is a group member?
Are you a Latino?
A = Yes
B = No
C = Maybe – I’m not sure
Who is a group member?
Do you speak Spanish?
A = Yes
B = No
C = Maybe – I’m not sure
Define the target population
Who do you want to generalize to: who is in the group?

Once we have chosen our sampling group, we must decide on criteria for membership…

To sample “Facebook users”, do I use a…
• Behavioral criterion (which behavior?)
• Self-identification?
To sample “Latinos”…
• Is it enough to call oneself “Latino”?
• Is Spanish language necessary…?

Using a behavioral criterion (amount of Facebook use) may yield a different sample than self-identification.

Clearer and narrower group criteria increase internal validity by making the sample more homogeneous.
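
A minimal sketch of how the choice of membership criterion changes who gets sampled. The respondents, the 5-sessions-per-week cutoff, and the function names are all invented for illustration; they are not from the course materials.

```python
# Hypothetical respondents: weekly Facebook sessions, and whether each person
# calls themselves a "Facebooker". All values are made up for illustration.
respondents = [
    {"id": 1, "sessions_per_week": 12, "self_identifies": True},
    {"id": 2, "sessions_per_week": 7, "self_identifies": False},
    {"id": 3, "sessions_per_week": 1, "self_identifies": True},
    {"id": 4, "sessions_per_week": 0, "self_identifies": False},
    {"id": 5, "sessions_per_week": 5, "self_identifies": True},
]


def behavioral_members(people, min_sessions=5):
    """Group members by behavior: at least `min_sessions` uses per week."""
    return {p["id"] for p in people if p["sessions_per_week"] >= min_sessions}


def self_identified_members(people):
    """Group members by self-label, regardless of actual use."""
    return {p["id"] for p in people if p["self_identifies"]}


behavioral = behavioral_members(respondents)
self_id = self_identified_members(respondents)

print("Behavioral criterion:", sorted(behavioral))
print("Self-identification: ", sorted(self_id))
print("In behavioral sample only:", sorted(behavioral - self_id))
print("In self-identified sample only:", sorted(self_id - behavioral))
```

The two criteria overlap but do not coincide: a heavy user who rejects the label and a light user who embraces it each land in only one of the two samples.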
Who do you want to generalize to?
Group                  Demographic / Behavioral criterion         Self-Identification
“Student”              # Hours registered                         Occupational choice
“Gay / lesbian”        Sexual patterns                            Self-label
“Drug User”            # of drugs used                            Perceived dependence
“Latino”               Geographic origins, language               Ethnic identification
“Depression patient”   Specific profile of behaviors & symptoms   Self-referred for treatment

The criteria used to define the group will determine who specifically gets sampled.
Who do you want to generalize to: Your “Sampling Frame”.

What is known about your larger population?

Are there Census or survey data?
• E.g., are there “population” data on depressed people?
• Do we know the demographic profiles of Facebook users?

Data about your target population will help you determine how well your sample represents that population.

What is its size, sub-groups, location…?
• Where / how can I best recruit members of the population?
• Will different recruitment methods be biased in favor of some sub-groups?
• E.g., internet surveys are biased against less computer-oriented people.
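
One way to check how well a sample represents a known population is to compare sub-group shares in the sample with the shares reported in census or survey data. The sketch below assumes invented age-group shares and an invented internet-survey sample; the function names are hypothetical.

```python
def proportions(counts):
    """Convert raw counts per group into shares that sum to 1."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def representation_gaps(sample_counts, population_shares):
    """Sample share minus population share for each group.

    Positive values mean the group is over-represented in the sample.
    """
    sample_shares = proportions(sample_counts)
    return {
        group: sample_shares.get(group, 0.0) - share
        for group, share in population_shares.items()
    }


# Invented population shares (stand-in for census data) and sample counts.
population_shares = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}
sample_counts = {"18-29": 60, "30-49": 30, "50+": 10}

gaps = representation_gaps(sample_counts, population_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

In this made-up case the youngest group is heavily over-represented and the oldest under-represented, the kind of recruitment bias the slide warns about for internet surveys.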
Research sampling
• Defining your target population
• Probability & Non-Probability sampling methods.
Major forms of sampling
Probability (Random) Sampling
• Recruit (or select) participants to maximize the representativeness of the sample to a known population.
• Uses some form of random selection.
• Requires that each member of the population has a known (often equal) probability of being selected.
• Most externally valid approach to sampling general populations.
Non-Probability Sampling
• Use available samples for convenience, or targeted outreach to unusual or small populations.
• Selection may be either systematic or haphazard, but is not random.
• Often the most externally valid approach to unusual, small, or extreme groups, or groups where little is known.
• When used only for convenience it is the least externally valid.
Probability / Random Sampling
• Core feature: all members of the study population have
an equal chance of being sampled
• Procedure: Choose participants in a systematic, random
fashion.
• e.g., every nth name from a list, every nth number in a phone
exchange, etc.
• Advantages: eliminates obvious biases of convenience
sampling
• Limitations:
• May under-sample unusual / hard to reach
participants
• Some may be unavailable in, e.g., telephone lists,
computer files.
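
The "every nth name from a list" procedure can be sketched as follows. This is an illustration under assumptions: the list of 100 names and the function name are invented, and a random starting point is used so that every name on the list has the same chance of selection.

```python
import random


def systematic_sample(frame, n, rng):
    """Take every k-th unit from `frame`, starting at a random offset.

    k is the sampling interval (list length divided by sample size).
    """
    k = len(frame) // n
    if k < 1:
        raise ValueError("sample size cannot exceed the size of the list")
    start = rng.randrange(k)  # random start within the first interval
    return frame[start::k][:n]


# Hypothetical sampling frame: a list of 100 names.
frame = [f"person_{i:03d}" for i in range(100)]
sample = systematic_sample(frame, 10, random.Random(242))
print(sample)
```

Anyone missing from the list (no phone, no file record) can never be selected, which is the limitation noted above.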
Basic Forms of random sampling
• Simple Random Sampling: Select a specific % of a
target population; all members of population have
about equal chance of selection.
• Multi-Stage: Randomly select population units (census
tracts, households, schools..), then randomly select individuals
within unit.
• Stratified: Random within population sub-blocks, e.g.,
gender (randomly select 50 women and randomly select 50 men), ethnicity,
etc.
• Cluster: Random within (potentially convenience) clusters,
e.g., specific locations or “venues”, events, times of day, etc.
Simple Random sampling
Objective: Attempts to truly represent the general
population; absolute minimal selection bias.
Procedure: Recruitment method where all members of the population have an approximately equal chance of being selected.

Examples:

Gallup polls using random digit
dialing surveys
“Long form” of the census to a small
% of U.S. households
Advantages: Most representative sampling frame for
general (non-targeted) population
Disadvantages: Any recruitment method excludes
some people (no telephone, no stable address, etc.).
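
A minimal sketch of simple random sampling, assuming a complete list of 1,000 hypothetical household IDs and a 5% sampling fraction (both invented). Each household has the same chance of being drawn, but only households that appear on the list can be drawn at all.

```python
import random


def simple_random_sample(frame, fraction, rng):
    """Draw a fixed fraction of the frame without replacement."""
    n = round(len(frame) * fraction)
    return rng.sample(frame, n)


# Hypothetical sampling frame of household IDs.
households = list(range(1, 1001))
selected = simple_random_sample(households, 0.05, random.Random(6))
print(len(selected), "households selected")
```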
Multi-Stage Random sampling
Objective: Focused & efficient random sample.
Procedure: Concentrate recruitment in specific locations
or venues.
Examples:
NIDA household drug surveys:
1) Randomly select moderate # of census tracts nationally
2) Randomly select small % of households within each tract
3) Interview 1st adult who answers phone in each household
“CITY” HIV study among youth:
1) Randomly select bars, clubs, other venues across the city
2) Randomly approach every 4th person who enters the venue to recruit for interview
Advantage: Much more efficient than simple random
Disadvantage: Same as simple random
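
The two-stage logic of the household example can be sketched like this. The tract and household labels, the counts (50 tracts of 200 households), and the 10% fraction are invented for illustration and do not describe the actual NIDA procedure.

```python
import random


def multi_stage_sample(tracts, n_tracts, household_fraction, rng):
    """Stage 1: randomly pick tracts. Stage 2: randomly pick households in each."""
    chosen_tracts = rng.sample(sorted(tracts), n_tracts)
    selected = {}
    for tract in chosen_tracts:
        households = tracts[tract]
        n = max(1, round(len(households) * household_fraction))
        selected[tract] = rng.sample(households, n)
    return selected


# Hypothetical frame: 50 census tracts with 200 households each.
tracts = {
    f"tract_{t}": [f"t{t}_hh{h}" for h in range(200)]
    for t in range(50)
}
stage_sample = multi_stage_sample(tracts, 5, 0.10, random.Random(1))
for tract, households in stage_sample.items():
    print(tract, len(households), "households")
```

Fieldwork is concentrated in 5 tracts instead of being spread across all 50, which is where the efficiency gain comes from.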
Stratified or cluster sampling
Objective: Represent every key segment of the
population.
Procedure:
• Decide which population segments are important (e.g., ethnic groups, census tracts, geographic areas...).
• Randomly select from each segment.
  • Proportionate: Same sampling fraction from each segment; approximates overall population (e.g., sample 1% of all African-Americans, 1% of all Latinos, etc.)
  • Disproportionate: Unequal sampling fraction across segments, to over-represent smaller groups (e.g., select larger % of recent immigrants…)
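
Proportionate and disproportionate stratified sampling differ only in the sampling fraction applied to each segment. The sketch below uses invented segment names, sizes, and fractions.

```python
import random


def stratified_sample(strata, fractions, rng):
    """Randomly sample within each stratum using that stratum's fraction.

    strata: name -> list of members; fractions: name -> sampling fraction.
    """
    return {
        name: rng.sample(members, round(len(members) * fractions[name]))
        for name, members in strata.items()
    }


# Hypothetical population segments of very different sizes.
strata = {
    "group_a": [f"a{i}" for i in range(1000)],
    "group_b": [f"b{i}" for i in range(300)],
    "recent_immigrants": [f"r{i}" for i in range(100)],
}
rng = random.Random(22)

# Proportionate: the same 10% fraction everywhere.
proportionate = stratified_sample(strata, {name: 0.10 for name in strata}, rng)

# Disproportionate: over-sample the smallest segment at 50%.
disproportionate = stratified_sample(
    strata,
    {"group_a": 0.10, "group_b": 0.10, "recent_immigrants": 0.50},
    rng,
)

print({name: len(s) for name, s in proportionate.items()})
print({name: len(s) for name, s in disproportionate.items()})
```

With the proportionate design the smallest segment contributes only 10 people; over-sampling raises that to 50. Estimates for the whole population from a disproportionate sample then need to be weighted back to the true segment sizes.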
Non-Probability Sampling
Useful for populations that:
• Cannot be randomly sampled; “hidden” or difficult to reach
• Have no sampling frame available, such as census data, describing their size, composition, etc.
• Examples: drug users, gay men, homeless, etc.
Limitations:
• Likely to misrepresent the population
• May be difficult or impossible to detect this misrepresentation
• Often over-sensitive to incentives: paying participants attracts more poor people
“Respondent Driven” sampling (RDS) allows for “targeted” population estimates
Non-Probability methods (1)
Haphazard Sampling; “Man on the street”
• College psychology majors
• Available medical / therapy clients
• Volunteer samples
Problem: No evidence for representativeness
Advantage: Availability of participants

Modal Instance Sampling; “Typical” case
• Typical New Yorker describing trade tower tragedy
• Typical voter.
Problem: May not represent the modal group.
Advantage: Describes a simple, “typical case”

Haphazard / Modal instance sampling is often used by journalists or in qualitative-descriptive studies; see NYT “down low” article.
Non-Probability methods, 2
Venue & time / space Sampling
• Sample a specific, well-defined, often hard to reach group
• Assume group members are well represented at specific locations or settings (“venues”).
• Use “Intercept” methods for reaching participants
  • Use indigenous outreach workers from the population
  • Develop a standard recruitment script
  • Collect / distribute contact information for later participation
Time / Space randomization:
• Lessen bias due to choice of venue:
  • Randomly approach different venues at different times
  • Randomly select participants within the venue (e.g., every 4th person…)
Strategy must be based on a clear epidemiological or theory question.
Examples: Shopping mall intercepts, gay recruitment
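
Time / space randomization can be sketched as two steps: randomly draw venue-by-time slots, then intercept every 4th person who enters during a slot. The venue names, time blocks, and entrant list are invented, and the helper names are hypothetical.

```python
import random
from itertools import product


def draw_venue_time_slots(venues, time_blocks, n_slots, rng):
    """Randomly choose which venue to visit at which time."""
    all_slots = list(product(venues, time_blocks))
    return rng.sample(all_slots, n_slots)


def intercept_every_nth(entrants, n=4):
    """Approach every n-th person entering, in order of arrival."""
    return entrants[n - 1::n]


# Hypothetical venues and recruitment time blocks.
venues = ["mall_north", "mall_south", "club_a", "club_b"]
time_blocks = ["Fri 18-22", "Sat 14-18", "Sat 22-02"]
slots = draw_venue_time_slots(venues, time_blocks, 5, random.Random(3))
print(slots)

# Hypothetical stream of 20 people entering one venue during one slot.
entrants = [f"entrant_{i}" for i in range(1, 21)]
approached = intercept_every_nth(entrants)
print(approached)
```

Randomizing the slots reduces bias from always recruiting at the most convenient venue or hour; the fixed interval keeps outreach workers from choosing whom to approach.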
[Figure: Outreach / venue sampling: examples of palm cards]
[Figure: Outreach lead sheet]
Non-Probability methods, 3
Targeted Multi-Frame Sampling
• Sample a specific, hard to reach group
• No census or similar data for sampling frame.
• Uses multiple (convenience) sampling “frames”:
  • Direct outreach to places where population members are available (venue sampling).
  • Newsletters, internet lists & chat rooms
  • Organizations or meeting places
• Strategy must be based on a clear epidemiological or theory question.
• Most common & valid convenience sample
Examples:
• “MTV” Market segments
• Shoplifters
• People who have risky sex
• Homeless people…
Non-Probability methods (4)
Snowball / “Respondent Driven” Sampling (RDS)
• Early participants are paid to recruit others, who recruit others, etc.
• Form of targeted sampling:
  • Recruit network of “linked” people tracked by referrals
Problem:
• Choice of seeds.
• Eligibility criteria
• Sensitive to incentives!
Advantage: Access unusual or “hidden” people related by a common behavior.
• With enough “generations” of links can well represent a target population.
• Often part of multi-frame approach.
• With RDS can show “chain” of referrals / links.
• Useful for people who mistrust research or where personal contact is necessary for recruitment (HIV, drug use).
• Portrays “chain” of influence or, e.g., infectious disease.
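
The referral mechanics (seeds, a limited number of recruitment coupons, successive waves) can be sketched as a simple simulation. This only tracks who recruited whom; it does not implement the statistical weighting that RDS uses for population estimates. The acquaintance network, seed choice, coupon limit, and wave limit are all invented.

```python
import random
from collections import deque


def chain_referral(network, seeds, coupons, max_waves, rng):
    """Simulate referral chains starting from the seeds.

    Returns person -> (recruiter, wave); seeds have recruiter None and wave 0.
    """
    recruited = {seed: (None, 0) for seed in seeds}
    queue = deque(seeds)
    while queue:
        recruiter = queue.popleft()
        wave = recruited[recruiter][1]
        if wave >= max_waves:
            continue  # the last wave does not recruit further
        eligible = [p for p in network.get(recruiter, []) if p not in recruited]
        # Each participant can pass on at most `coupons` coupons.
        for peer in rng.sample(eligible, min(coupons, len(eligible))):
            recruited[peer] = (recruiter, wave + 1)
            queue.append(peer)
    return recruited


# Hypothetical acquaintance network of 30 people (each knows three others).
network = {i: [(i + 1) % 30, (i + 2) % 30, (i + 5) % 30] for i in range(30)}
recruited = chain_referral(
    network, seeds=[0, 15], coupons=2, max_waves=3, rng=random.Random(7)
)

for wave in range(4):
    members = sorted(p for p, (_, w) in recruited.items() if w == wave)
    print(f"wave {wave}: {members}")
```

Because each recruit is stored with its recruiter, the full "chain" of referrals can be reconstructed afterwards, and changing the seeds shows how much the final sample depends on where the chains start.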
[Figure: RDS coupon examples]
[Figure: RDS; chain description]
Heckathorn, D.D. & Magnani, R. (2004). Snowball and Respondent-Driven Sampling. In: Behavioral Surveillance Surveys: Guidelines for Repeated Behavioral Surveys in Populations at Risk of HIV.
Example of social network sampling:
Bearman et al., Romantic ties among adolescents

A substantial majority of students are in an extended, linked chain of relationships.
With a number of smaller chains
And a small % in 2 to 4 person chains

From a sampling perspective, several “seeds” access most of the population.
Findings suggest a clear potential for STI transmission.
Non-Probability methods
Quota Sampling
Select people non-randomly according to quotas.
Similar to cluster sampling, except you cannot randomly sample each population segment.
• Must have clear theory / research question to pick relevant population characteristic(s).
• Proportional quota sampling
  • Represent major characteristics of a population. If gender is important, and the proportion of women :: men in your population = 65% :: 35%, the sample must meet that quota.
• Non-proportional quota sampling
  • Sample enough members of each group to test hypothesis, even if the sample is not proportional (e.g., recruit 50 women & 50 men, even though the real proportion is 65::35).
  • Helps assure that you have good representation of smaller population groups.
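
Quota sampling can be sketched as filling fixed slots from whoever happens to be available, in arrival order, with no random selection. The arrival stream below is invented; the 65::35 and 50/50 quotas are the ones from the slide.

```python
def proportional_quotas(n, shares):
    """Turn population shares into quota counts for a sample of size n."""
    return {group: round(n * share) for group, share in shares.items()}


def quota_sample(arrivals, quotas, key):
    """Accept people in arrival order until every group's quota is full."""
    filled = {group: [] for group in quotas}
    for person in arrivals:
        group = key(person)
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break
    return filled


# Hypothetical, non-random stream of available participants.
arrivals = [
    {"id": i, "gender": "women" if i % 3 else "men"} for i in range(300)
]

prop_quotas = proportional_quotas(100, {"women": 0.65, "men": 0.35})
proportional = quota_sample(arrivals, prop_quotas, key=lambda p: p["gender"])

non_proportional = quota_sample(
    arrivals, {"women": 50, "men": 50}, key=lambda p: p["gender"]
)

print({g: len(people) for g, people in proportional.items()})
print({g: len(people) for g, people in non_proportional.items()})
```

The quotas guarantee the group sizes, but within each group the sample is still whoever showed up first, so it carries the same selection biases as any convenience sample.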
Non-Probability methods
Web sampling
Typically highly targeted samples
• Gay / bisexual men…
• Adolescents…
• “Gamers”…
Typically access through existing venues:
• Users of specific web sites
• Listservs, e-mail lists
• Active recruitment in “chat rooms”
Problem: Inherent bias in computer literacy(?)
Advantage:
• Cheap large national sample
• Access unusual or “hidden” people who reach others via internet
Non-Probability methods; Heterogeneity Sampling
• Sample every sector of a population -- at least several of
everyone -- without worrying about proportions.
• At least some members of each geographic area
• …ethnic group
• …behavioral group (voters & non-voters…)
• Assume that a few people are a good proxy for the
group.
Examples: focus groups or qualitative interviews about products,
social issues...
Problem: Cannot be sure a few people really represent their sub-group.
Advantage: At least some representation of all sub-groups.
Sampling overview
Who do you want to generalize to?

Summary

Who is the target population?
• broad – external validity
• narrow – internal validity
How do you decide who is a member?
• demographic / behavioral criteria?
• subjective / attitudinal?
What do you know about the population already – what is the “sampling frame”?
Is a probability or random sample possible?
• “Hidden” population?
• Socially undesirable research topic?
• Easily available via telephone, door-to-door?
• Sampling frame adequate to choose selection method?
Overview, 2
Summary
Types of Non-probability Samples
• Haphazard
• Modal instance
• Venue – time / space
• Multi-frame
• Snowball / Respondent driven
• Web
• Quota
• Heterogeneity
Overview, 3
Summary

Probability sampling
• simple
• multi-stage
• cluster or stratified
Most externally valid
Assumes:
• Clear sampling frame
• Population is available
Less externally valid for hidden groups.

Non-probability sampling
• targeted / multi-frame
• snowball
• quota, etc.
Less externally valid
High “convenience”
Best when:
• No clear sampling frame
• Hidden / avoidant population.