Quick and Painless Introduction to Survey Methodology
R. Michael Alvarez
PS 120

Testing Theories or Models
• Experimental data: expensive, and has validity problems
• Quasi-experimental data: aggregate election statistics and other data; suffers from various problems
• Survey data: data about individual voters

Fundamentals of Surveying
• Population: all elements of interest, usually in a geographic area
• Sample: a subset of the population
• Sample frame: a list of population units from which the sample is drawn (addresses, phone numbers, email addresses, etc.)

Basic Typology of Surveys
• Probability designs: population elements have a known (at least in theory) probability of selection into the sample
• Nonprobability designs: population elements have an unknown probability of selection into the sample
All of the statistical tools we use to study survey data are based on probability designs!

Literary Digest 1936: What Went Wrong?
[Figure: Roosevelt's share of the vote, comparing the actual election result with the Literary Digest, Gallup, Crossley, and Fortune polls]

Literary Digest Methodology
• Sent out 10 million straw ballots, using a list drawn from auto registration lists and telephone books
• 2.3 million were returned, about a 23% response rate
Flawed sample (overrepresented the rich and Republicans); low response rate

Literary Digest Fiasco Reforms Polling
• The underlying flaw of the Literary Digest straw polls was revealed: they did not use a scientific sampling procedure
• Others, especially Gallup, Roper, and Crossley, began to work on better ways of generating samples
• The Literary Digest soon folded!

Problems Continue, 1948
[Figure: 1948 vote shares for Truman, Dewey, Thurmond, and Wallace, comparing the Crossley, Gallup, and Roper forecasts with the election result]

New Sampling Techniques Were Flawed!
• Before 1948, pollsters used "quota sampling"
• Each interviewer is assigned a fixed quota of subjects to interview from certain demographic categories: gender, age, education, residential location
• Within those quotas, the interviewer could select anyone they desired until all of their required interviews were completed

Quota Sampling
• It is not necessarily a bad idea, as long as the underlying data (Census data, for example) used to construct the quotas are sound
• But what can happen is that interviewers end up talking to people who are easy to contact; in 1948 that tended to be people in nice neighborhoods, with fixed addresses and telephones (i.e., Republicans)

Random Sampling
• In the 1950s, most scientific surveys shifted to random sampling
• Gallup, for example, moved to random selection methods in 1956 and appears to have generated more accurate presidential election forecasts thereafter

Gallup's Track Record
[Figure: difference between Gallup's final forecast and the election result in presidential elections, 1936-1992]

Basic Introduction to Sampling
• Concept: the population (or universe, or target population)
• The population is the entire set of units to which a survey will be applied; individual members of the population are called units or elements

More on sampling ...
• Next, we need a list of population units from which we can draw a sample
• This list is called the SAMPLE FRAME
• The basic property of a sample frame is that every unit in the population has some known chance of being selected into the sample by whatever method is used to select units

Then ...
• Probability sample: units are selected using a method that ensures each unit has a known, nonzero probability of being included
• Nonprobability sample: units are selected in a way that leaves inclusion probabilities unknown (quota sampling, for example)

Simple Random Sampling
• Simple random sampling: all elements of the population have an equal probability of being sampled
• Cluster sampling: the population is divided into clusters or groups, and clusters are sampled. Why? Cost and simplicity.
• Stratified sampling: the population is divided into subpopulations, or strata, and sampling occurs within strata. Why? Strata might be of interest in their own right or require different methods of analysis.

Sampling Error
• The best way to think about survey error is in the context of proportions (the percent saying "yes" or "no")
• Standard error of a proportion under simple random sampling: se(p) = sqrt[ p(1-p) / (n-1) ]

An Example of Survey Error
[Figure: plus or minus 2 standard errors of a proportion, in percentage points, for sample sizes from 100 to 2,000 and P = .5, .7, and .9; for example, about 10 points at n = 100 and P = .5, shrinking to about 2.2 points at n = 2,000]
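The short Python sketch below is not from the course materials; it is a minimal illustration of the two ideas just covered. It draws a simple random sample of yes/no answers from a hypothetical population (the population size and 50% "yes" share are invented for the example) and applies the slide's formula se(p) = sqrt[ p(1-p)/(n-1) ], roughly reproducing the plus-or-minus 2 SE values in the figure above.

```python
import math
import random

random.seed(2)  # fixed seed so the illustration is reproducible

def se_proportion(p, n):
    """Standard error of a proportion under simple random sampling,
    using the formula from the slides: sqrt(p * (1 - p) / (n - 1))."""
    return math.sqrt(p * (1 - p) / (n - 1))

# Hypothetical population of 100,000 people, 50% of whom would say "yes".
population = [1] * 50_000 + [0] * 50_000

for n in (100, 500, 1000, 2000):
    sample = random.sample(population, n)   # simple random sample of size n
    p = sum(sample) / n                     # sample proportion saying "yes"
    moe = 2 * se_proportion(p, n)           # plus or minus 2 standard errors
    print(f"n={n:5d}  p={p:.3f}  +/- 2 SE = {100 * moe:.1f} points")

# At n = 100 and p near .5, the +/- 2 SE band is roughly 10 percentage
# points on each side; by n = 2,000 it is a bit over 2 points, matching
# the figure above.
```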
An Example of Nonresponse Error?
[Figure: NES response and refusal rates, 1952-2000, compared with the March 2001 CSLP RDD survey]
Response rate: interviews net of refusals and respondents who cannot provide an interview (e.g., because of a language barrier)

Misreporting: Voting in Recent Federal Elections
[Figure: turnout as a percentage of the voting-age population, 1964-2004, comparing official turnout with NES, CPS, and McDonald-Popkin estimates]

Item Nonresponse
• A "don't know" option is necessary in any survey, so that people can tell you when they do not have an opinion
• Item nonresponse can be due to uncertainty, vague questions, or respondent unwillingness to answer some questions

Should Gov't Provide More Services?
[Figure: 1996 NES, number of respondents saying "more," "fewer," or "no opinion"]

Certainty of Responses?
[Figure: how certain respondents were about their senator's position on the seven-point abortion scale, from "not certain" to "pretty certain" (Alvarez and Franklin 1993)]

Question Wording and Order?
• "Would you say that traffic contributes more or less to air pollution than industry?" (45% named traffic as the primary contributor, versus 32% industry)
• "Would you say that industry contributes more or less to air pollution than traffic?" (57% named industry as the primary contributor, versus 24% traffic)
(Wanke et al. 1995)
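To see that a wording gap this large is not just sampling error, the sketch below applies the same standard-error formula used earlier to the difference between the two question forms. This is not part of the course materials, and the slide does not report Wanke et al.'s sample sizes, so the 500 respondents per form used here are purely an assumption for illustration.

```python
import math

def se_proportion(p, n):
    """Standard error of a proportion under simple random sampling
    (the same formula as in the sampling-error slide)."""
    return math.sqrt(p * (1 - p) / (n - 1))

# Share naming traffic as the primary contributor under each wording
# (percentages taken from the slide above).
p_traffic_first = 0.45   # asked "traffic ... than industry"
p_industry_first = 0.24  # asked "industry ... than traffic"

# ASSUMPTION: the slide gives no sample sizes, so 500 respondents per
# question form is a made-up number used only for illustration.
n_per_form = 500

gap = p_traffic_first - p_industry_first
se_gap = math.sqrt(se_proportion(p_traffic_first, n_per_form) ** 2 +
                   se_proportion(p_industry_first, n_per_form) ** 2)

print(f"observed wording gap: {100 * gap:.0f} points")
print(f"+/- 2 SE of the gap : {100 * 2 * se_gap:.1f} points")
# With about 500 respondents per form, +/- 2 SE of the gap is roughly
# 6 points, far smaller than the 21-point gap, so the wording effect is
# much larger than sampling error alone would produce.
```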
Types of Surveys
• Self-administered questionnaires (mail, web): cheap, but response rates are low and there is uncertainty about who actually completes the questionnaire

Types of Surveys
• Telephone (RDD/CATI): quick and random (in principle); but there is uncertainty about the respondent, complex questions are difficult to ask, and the interview must be short

Types of Surveys
• Face-to-face (on the doorstep, exit polls): highly accurate, with high response rates; very expensive to implement; interviewer biases are problematic

Internet surveying: the future?
• Cheap to implement
• Quick in the field, quick with analysis
• Can implement complex designs, for example, using multimedia

Basic types of Internet surveys
• Probability designs
• Nonprobability designs
• Mixtures of probability and nonprobability designs

Probability-based Internet surveys
• Intercept-based surveys of visitors to particular web sites
• Surveys of known email lists (students, etc.)

Nonprobability Internet surveys
• Entertainment surveys
• Self-selected surveys
• Volunteer survey panels

Surveys are not perfect!
• Sampling error (difference between sample and population)
• Coverage error (deviation between sample and frame)
• Systematic sampling error; error in the frame
• Nonresponse (unit) bias
• Nonresponse (item) bias
• Question wording or ordering effects
• Interviewer error; coding mistakes

How do I evaluate survey results?
• Sample size
• Estimated sampling error
• Sampling methodology (probability or nonprobability)
• Questionnaire design and question wording
• Survey response rate
• Item response rates
• Intuition: do the results make sense?

Caltech's National Public Relations Initiatives
March 11, 2003

Brief recap of survey methodology
• Survey conducted by ICR
• In the field Wednesday, February 12 through Sunday, February 15
• Omnibus survey
• N = 1,010
• Tabulations present weighted results, weighted to map to the American adult population (a stylized sketch of this kind of weighting appears at the end of these notes)

Questions
1. Considering what you might have seen or heard about the California Institute of Technology, also known as Caltech, in Pasadena, California, which of the following best describes your opinion of Caltech's reputation: would you say Caltech's reputation is excellent, good, fair, or poor?
2. How did you hear about Caltech? (not asked of those unable to answer question 1)
3. What do you think Caltech is best known for? (not asked of those unable to answer question 1)
4. Now, as I read each of the following topics, please tell me, generally speaking, whether or not you are interested in the topic: voting, the brain, climate change, astronomy, earthquakes, nanotechnology, detecting gravity waves.
5. And considering those topics in which you said you had an interest, how do you usually get news and information about these topics? (asked only of those interested in at least one topic)

National Awareness of Caltech
[Figure: level of awareness of Caltech among the general public, aware versus unaware]

Caltech Awareness Successes
[Figure: awareness of Caltech among groups where it is relatively high: high-income respondents, the college-educated, the West, and those aged 55-64]

Caltech Awareness Challenges
[Figure: awareness of Caltech among groups where it is relatively low: the Northeast and those over 64]

Caltech's Reputation
[Figure: Caltech's reputation, excellent/good versus fair/poor, as judged by those aware of the Institute]

Media Relations Focus
• National and northeast TV: visit them, pitch them, invite them to campus
• Households with children
• Senior-oriented media

Evaluate the Caltech Awareness Survey
• Technical evaluation
• Substantive evaluation
• Policy evaluation
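As a starting point for the technical evaluation, the sketch below applies the sampling-error formula from earlier in these notes to the survey's N = 1,010, and then illustrates the kind of weighting mentioned in the methodology recap (weight = population share of a cell divided by its sample share). It is not from the course materials, and the two education cells and their shares are invented for illustration only; ICR's actual weighting cells and targets are not described in these slides.

```python
import math

# 1. Estimated sampling error for the Caltech omnibus survey (N = 1,010),
#    using the standard-error formula from earlier in these notes.
n = 1010
p = 0.5                                    # worst case for sampling error
moe = 2 * math.sqrt(p * (1 - p) / (n - 1))
print(f"+/- 2 SE at N = {n}: about {100 * moe:.1f} percentage points")
# -> roughly +/- 3.1 points for a 50% result

# 2. Stylized post-stratification weighting: each respondent's weight is
#    the population share of their cell divided by the cell's sample share.
#    These cells and shares are HYPOTHETICAL, for illustration only.
population_share = {"college degree": 0.28, "no college degree": 0.72}
sample_share = {"college degree": 0.40, "no college degree": 0.60}

weights = {cell: population_share[cell] / sample_share[cell]
           for cell in population_share}
print(weights)   # {'college degree': 0.7, 'no college degree': 1.2}

# A weighted tabulation then counts each respondent by their cell weight,
# pulling the sample's composition back toward the adult population's.
```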