Quick and Painless Introduction
to Survey Methodology
R. Michael Alvarez
PS 120
Testing Theories or Models
• Experimental data: expensive, and has
validity problems
• Quasi-experimental data: aggregate election statistics and other data; suffers from various problems.
• Survey data: data about individual voters
Fundamentals of Surveying
• Population: all elements of interest, usually
in a geographic area
• Sample: subset of population
• Sample frame: the list from which the sample is drawn (addresses, phone numbers, email addresses, etc.)
Basic Typology of Surveys
• Probability designs: population elements
have a known (at least in theory) probability
of selection into the sample.
• Nonprobability designs: population
elements have an unknown probability of
selection into the sample

All of the statistical tools we use to study
survey data are based on probability designs!
Literary Digest 1936: What Went Wrong?
[Chart: Roosevelt's 1936 vote share as forecast by the Literary Digest, Gallup Poll, Crossley Poll, and Fortune, compared with the actual election result.]
Literary Digest Methodology
• Sent out 10 million straw ballots, using a list drawn from auto registration lists and telephone books.
• 2.3 million were returned, a response rate of about 23%.
• Flawed sample (overrepresented the rich and Republicans)
• Low response rate
Literary Digest Fiasco Reforms
Polling
• Underlying flaws of the Literary Digest straw polls revealed --- they did not use a scientific sampling procedure
• Others, especially Gallup, Roper, and Crossley, began working to find better ways of generating samples …
• The Literary Digest soon folded!
Problems Continue, 1948
[Chart: 1948 vote shares for Truman, Dewey, Thurmond, and Wallace as forecast by Crossley, Gallup, and Roper, compared with the actual election result.]
New Sampling Techniques Were
Flawed!
• Before 1948, they used “quota sampling”
• Each interviewer is assigned a fixed quota
of subjects to interview from certain
demographic categories … gender, age,
education, residential location.
• Within those categories the interviewer could select anyone they desired, interviewing until they met their quota
Quota Sampling
• It’s not necessarily a stupid idea, as long as the underlying data (Census data?) used to construct the parameters of the sample are okay.
• But what can happen is that interviewers end up talking with the people who are easiest to contact, as sketched below. In 1948 that tended to be people in nice neighborhoods, with fixed addresses and phones (i.e., Republicans).
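
A minimal sketch of the procedure in Python (the quota cells and respondent pool are hypothetical, not from the slides): quotas are filled in whatever order respondents turn up, so nothing prevents the easy-to-reach people in a cell from differing systematically from the hard-to-reach ones.

```python
# Hypothetical quota targets, e.g. built from census proportions (illustrative only).
quotas = {("woman", "under 40"): 12, ("woman", "40+"): 13,
          ("man", "under 40"): 12, ("man", "40+"): 13}

def fill_quotas(available_respondents):
    """Interview whoever turns up until every demographic quota is met.

    `available_respondents` is an iterable of (gender, age_group) tuples,
    ordered roughly by how easy each person is to contact. Within a cell the
    interviewer simply takes the first arrivals -- the source of the 1948 bias.
    """
    counts = {cell: 0 for cell in quotas}
    sample = []
    for person in available_respondents:
        if counts.get(person, 0) < quotas.get(person, 0):
            counts[person] += 1
            sample.append(person)
        if all(counts[c] >= quotas[c] for c in quotas):
            break
    return sample
```

The quota targets themselves can be perfectly calibrated to the census and the sample can still be biased, because selection within each cell is left to the interviewer.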
Random Sampling
• In the 1950s, most scientific surveys shifted to the use of random sampling
• For example, Gallup moved to random selection methods in 1956 and seems to have generated more accurate presidential election forecasts thereafter
Gallup’s Track Record
[Chart: Difference between Gallup’s forecast and the actual presidential election result, 1936-1992.]
Basic Introduction to Sampling
• Concept: The population (or universe or
target population).
• The population is the entire set of units to
which a survey will be applied. Individual
members of the population are called units
or elements.
More on sampling ...
• Next, we need a list of population units
from which we can draw a sample.
• This list is called the SAMPLE FRAME
• The basic property of a sample frame is that
every unit in the population has some
known chance of being selected into the
sample by whatever method is used to select
units
Then ...
• Probability sample: units are selected using a method that ensures that each unit has a known, nonzero probability of being included.
• Nonprobability sample: units are selected in a way that leaves their inclusion probabilities unknown (quota sampling …)
Simple Random Sampling
• Simple random sampling: all elements of the population have an equal probability of being sampled
• Cluster sampling: the population is divided into clusters or groups, and clusters are sampled. Why? Cost and simplicity.
• Stratified sampling: the population is divided into subpopulations, or strata, and sampling occurs within strata. Why? Strata might be of interest or require different methods of analysis. (See the sketch below.)
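
A minimal sketch in Python of how the three designs differ in what gets randomized; the frame, strata, and clusters below are hypothetical, not from the slides.

```python
import random

frame = list(range(1000))                                   # hypothetical sample frame of 1,000 unit IDs
strata = {"urban": frame[:600], "rural": frame[600:]}       # hypothetical strata
clusters = [frame[i:i + 50] for i in range(0, 1000, 50)]    # hypothetical clusters of 50 units each

# Simple random sample: every unit in the frame has the same inclusion probability.
srs = random.sample(frame, 100)

# Stratified sample: draw separately within each stratum (here, 10% of each).
stratified = [unit for units in strata.values()
              for unit in random.sample(units, len(units) // 10)]

# Cluster sample: draw whole clusters at random, then keep every unit in them.
cluster_sample = [unit for cluster in random.sample(clusters, 2) for unit in cluster]
```

In each case the selection probabilities are known in advance, which is what makes these probability designs.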

Sampling Error
• Best way to think of survey error is in the
context of proportions (percent saying “yes”
or “no”).
• Standard error of a proportion in SRS:
se(p) = sqrt[ ( p(1-p) )/( n - 1 )]
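
A short sketch of that formula in Python; doubling the standard error and expressing it in percentage points reproduces the values in the table on the next slide (for example, roughly ±3.2 points when p = .5 and n = 1,000).

```python
import math

def se_proportion(p, n):
    """Standard error of a proportion under simple random sampling: sqrt(p(1-p)/(n-1))."""
    return math.sqrt(p * (1 - p) / (n - 1))

# +/- 2 standard errors, in percentage points, for a few sample sizes.
for p in (0.5, 0.7, 0.9):
    for n in (100, 500, 1000, 2000):
        print(f"p={p}, n={n}: +/- {200 * se_proportion(p, n):.2f} points")
```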
An Example of Survey Error
±2 standard deviations of the proportion P, in percentage points, by sample size:

Sample size    P=.5     P=.7     P=.9
 100           10.05    9.211    6.03
 200            7.089   6.497    4.253
 300            5.783   5.3      3.47
 400            5.006   4.588    3.004
 500            4.477   4.103    2.686
 600            4.086   3.745    2.452
 700            3.782   3.467    2.269
 800            3.538   3.242    2.123
 900            3.335   3.057    2.001
1000            3.164   2.9      1.898
1100            3.016   2.765    1.81
1200            2.888   2.647    1.733
1300            2.775   2.543    1.665
1400            2.674   2.45     1.604
1500            2.583   2.367    1.55
1600            2.501   2.292    1.5
1700            2.426   2.224    1.456
1800            2.358   2.161    1.415
1900            2.295   2.103    1.377
2000            2.237   2.05     1.342
An Example of Nonresponse Error? March 2001 CSLP RDD
[Chart: NES response and refusal rates, 1952-2000.]
Response rate: interviews net of refusals and of respondents who cannot provide an interview (e.g., because of language barriers); see the sketch below.
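
A simplified sketch of one common way to turn those counts into a rate; the exact accounting used for the NES and CSLP figures may differ, and the numbers below are hypothetical.

```python
def response_rate(completes, refusals, other_noninterviews):
    """Completed interviews as a share of all eligible cases attempted.

    `other_noninterviews` covers respondents who could not provide an
    interview (language barriers, illness, and so on), per the note above.
    """
    return completes / (completes + refusals + other_noninterviews)

# Hypothetical case counts, for illustration only: 700 completes out of 1,000 eligible cases.
print(f"{response_rate(700, 250, 50):.0%}")   # -> 70%
```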
Misreporting: Voting in Recent Federal Elections
[Chart: Official turnout, NES turnout, CPS turnout, and McDonald-Popkin turnout estimates, 1964-2004. Note: percentage of voting age population.]
Item Nonresponse
• A “don’t know” option is necessary in any survey, so that people can tell you if they don’t have an opinion
• Item nonresponse can be due to uncertainty, vague questions, or respondent unwillingness to answer some questions
Should Gov’t Provide More Services?
[Chart: Number of 1996 NES respondents answering “more,” “fewer,” or “no opinion.”]
Certainty of Responses?
[Chart: Distribution of respondent certainty, from “not certain” to “pretty certain,” on the seven-point Senator Position on Abortion Scale; Alvarez and Franklin 1993.]
Question Wording and Order?
• Would you say that traffic contributes more or less to air pollution than industry? (45% traffic primary contributor to 32% industry)
• Would you say that industry contributes more or less to air pollution than traffic? (57% industry primary contributor to 24% traffic)
(Wanke et al. 1995)
Types of Surveys
• Self-administered questionnaires (mail, web)
  - Cheap
  - But: low response rates, uncertainty about who completes the questionnaire
Types of Surveys
• Telephone: RDD/CATI
  - Quick; random?
  - But: uncertainty about the respondent, difficult to ask complex questions, must be short
Types of Surveys
• Face-to-face (on doorstep, exit polls)
  - Highly accurate, high response rates
  - But: very expensive to implement; interviewer biases are problematic
Internet surveying --- the future?
• Cheap to implement
• Quick in the field, quick with analysis
• Can implement complex designs, for
example, use multimedia
Basic types of Internet surveys
• Probability designs
• Nonprobability designs
• Mixtures of probability and nonprobability
Probability-based Internet
surveys
• Intercept-based surveys of visitors to particular
web sites
• known email lists (students, etc)
Nonprobability Internet surveys
• Entertainment surveys
• Self-selected surveys
• Volunteer survey panels
Surveys are not perfect!
• Sampling error (difference between sample and population)
• Coverage error (deviation between sample and frame)
• Systematic sampling error; error in the frame
• Nonresponse (unit) bias
• Nonresponse (item) bias
• Question wording or ordering effects
• Interviewer error; coding mistakes
How do I evaluate survey results?
• Sample size
• Sampling methodology (probability or nonprobability)
• Estimated sampling error
• Survey response rate
• Questionnaire design and question wording
• Item response rates
• Intuition: do the results make sense?
Caltech’s National Public Relations Initiatives
March 11, 2003
Brief recap of survey methodology
• Survey conducted by ICR
• Wednesday, February 12 to Sunday, February 15
• Omnibus survey
• N = 1010
• Tabulation presents weighted results, weighted to map to the American adult population (a sketch of this kind of weighting follows)
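
A minimal sketch of post-stratification style weighting of the kind the last bullet describes; the weighting cells and shares below are hypothetical and are not ICR’s actual procedure.

```python
# Hypothetical population and sample shares for a single weighting variable (age group).
population_share = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}
sample_share     = {"18-34": 0.20, "35-54": 0.40, "55+": 0.40}

# Each respondent's weight: population share of their cell / sample share of their cell,
# so underrepresented cells count for more in the weighted tabulations.
weights = {cell: population_share[cell] / sample_share[cell] for cell in population_share}

print(weights)   # {'18-34': 1.5, '35-54': 1.0, '55+': 0.75}
```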
Questions
1. Considering what you might have seen or heard about the California Institute of Technology, also known as Caltech, in Pasadena, California, which of the following best describes your opinion of Caltech’s reputation: would you say Caltech’s reputation is excellent, good, fair, or poor?
Questions
2. How did you hear about Caltech? (not asked of those unable to answer 1)
3. What do you think Caltech is best known for? (not asked of those unable to answer 1)
Questions
4. Now, as I read each of the following topics, please tell me, generally speaking, whether or not you are interested in the topic: voting, the brain, climate changes, astronomy, earthquakes, nano-technology, detecting gravity waves
Questions
5. And considering those topics in which you said you had an interest, how do you usually get news and information about these topics? (asked only of those who were interested in at least one topic)
National Awareness of Caltech
[Chart: Level of awareness of Caltech among the general public (aware vs. unaware).]
Caltech Awareness Successes
[Chart: Awareness of Caltech (aware vs. unaware) among high income, college educated, West, and 55-64 respondents.]
Caltech Awareness Challenges
[Chart: Awareness of Caltech (aware vs. unaware) among Northeast and over-64 respondents.]
Caltech’s Reputation
[Chart: Caltech’s reputation (excellent/good vs. fair/poor) as judged by those aware of the Institute.]
Media Relations Focus
• National and northeast TV -
visit them, pitch them, invite
them to campus.
• Households with children
• Senior-oriented media
Evaluate the Caltech Awareness
Survey
• Technical evaluation
• Substantive evaluation
• Policy evaluation