Math 1680.010
Chapter 19: Sample Surveys

Recall: Observed Value = Actual Value + Chance Error + Bias

In probability, we assumed that we knew the population parameter (for example, the chance a coin lands heads), but the samples that would be observed were unknown. This is the essence of chance. In statistics, we have the reverse scenario: the sample is known, but the population parameter is unknown.

Definition: Population. A population is a class of individuals or a set of measurements, either existing or conceptual, about which a generalization is sought.

Definition: Sample. A sample is a subset of measurements from the population: the part of the population being examined.

Definition: Inference. An inference is a generalization made about a population based on a sample.

Definition: Parameter. A parameter is a numerical fact about a population that investigators want to know.

Definition: Statistic. A statistic is a number that can be computed from a sample. Parameters are estimated by statistics.

Ex. #1: Consider the following two questions and determine which is a probability question and which is a statistics question.
Question #1: A fair coin is flipped 100 times. What is the chance of getting at least 60 heads?
Question #2: A coin is flipped 100 times, and it lands heads 65 times. Can we reasonably assert that the coin is loaded?

To determine the outcome of the next election, it is impractical to ask the entire population, unless you actually held the election. Instead, a sample is chosen, and the results from the sample are extrapolated to the population. Such an extrapolation is only reasonable when the sample is representative, which is the topic of today’s lecture.

Ex. #2: Suppose two 16-year-old boys are asked to conduct a mall survey. Who do you think will be overrepresented in the sample?

Ex. #3: Let’s examine the Literary Digest poll of 1936. The Digest’s prediction was that Roosevelt would get 43%.
However, the actual percentage was 62%. What went wrong?

Problems:
1. The sample was not representative of the population.
2. The Digest poll was sent to 10 million people; only 2.4 million responded. Thus, the results were subject to non-response bias. This is typical of surveys of convenience, like call-in polls or the daily poll in ESPN’s Sportzone.

Non-respondents can be very different from respondents. In Chicago, the Digest predicted that Landon would win at least half of the vote. In actuality, he gained less than a third. The problem is that only 20% of Chicagoans returned the survey.

In practice, low-income and high-income people tend not to respond to questionnaires, and thus the middle class is overrepresented. For this reason, pollsters prefer interviews, which have a higher response rate (65%) than questionnaires (25%).

However, non-response bias is still a concern in modern polling, which is why (in a telephone poll) the pollster will call back up to 3 times and call in the evening (or during the day or on weekends). (FYI: A modern 1,000-person poll costs $25,000 or so.)

Ex. #4: Consider the poll of 1948 given below (all figures are percentages). What went wrong for the polls? How were the samples chosen?

The candidates   Crossley   Gallup   Roper   The results
Truman              45        44       38        50
Dewey               50        50       53        45
Thurmond             2         2        5         3
Wallace              3         4        4         2

Problem: Quota Sampling.

Consider the following example of quota sampling. In this example, the interviewer in St. Louis had to survey 13 people in the following manner.

Of these 13 people: 6 women, 7 men
Of the 7 men: 1 black, 6 white
Monthly rent of the 6 white men:
  1 of $44.01 or more
  3 of $18.01-$44
  2 of $18.00 or less

In this way, the sample is force-fit to match the characteristics of the population collected from the Census Bureau. What potential bias exists in this method of sampling?

Keep in mind, this bias existed in the polls of 1936, 1940 and 1944.
However, it was not large enough to change the predicted outcome.

Ideal Method: Simple random sample: drawing tickets at random without replacement from a box of tickets. At each draw, every ticket remaining in the box has an equal chance of being drawn, and the interviewers thus have no discretion as to whom they interview. By the law of averages, the sample percentage is then likely to be close to the population percentage.

Problem: In real life, there is no “master list” with the names of all 300 million Americans. Also, even if such a list existed, the potential respondents would be dispersed all over the country, and the cost of doing personal interviews would be too high.

For this reason, survey organizations tend to use multistage cluster sampling, illustrated in the figure above.

In all probability methods:
1) The interviewers have no discretion as to whom to interview;
2) There is a fixed algorithm for selecting the sample; and
3) The method involves the planned use of chance.

These processes are implemented to minimize bias (non-sampling error).

Once the sample has been chosen, the desired information must then be solicited from the sample. This aspect of the modern poll is a bit of an art, and must be done with care in order not to bias the results.

Problems:
1. Nonvoters. Respondents may not want to admit they will not vote.
2. Undecided. This caused a big error in the 1992 election polls.
3. Response bias. The wording of the question or the tone of the interviewer’s voice may affect the response.
4. Non-response bias. This was discussed earlier.
5. Interviewer control. The interviewer may not follow instructions.

In most random polling, telephone surveys are used. The country is divided into a number of regions, and random numbers are dialed in each area code, a technique called random digit dialing. This eliminates the worry arising from unlisted numbers, and the Yellow Pages are used to eliminate business phone numbers.

Ex. #6: What are some potential pitfalls in this method that pollsters should worry about?

Ex. #7: Pollsters often conduct pre-election surveys by telephone. Could this bias the results? What if the sample is drawn from the telephone book?

Ex. #8: One study on slavery estimates that “11.9% of slaves were skilled craftsmen.” This estimate was based on the records of thirty plantations in Plaquemines Parish, Louisiana. Is the statistic trustworthy?

Ex. #9: In any survey, a fair number of people who are in the original sample cannot be contacted by the survey organization, or are contacted but refuse to answer questions. A high non-response rate is a serious problem for survey organizations. True or false, and explain: this problem is serious because the investigators have to spend more time and money getting additional people to bring the sample back up to its planned size.

Summary:

Observed Value = Actual Value + Chance Error + Bias

To minimize bias (or non-sampling error), a probability method uses an objective chance process to construct the sample.

Large samples do not preclude the possibility of bias, but relatively small samples that are properly constructed can be used to predict the behavior of a population of millions.

Even if a sample is properly chosen, bias may result when soliciting information from the sample.
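
Illustration (not part of the original notes): the point about large biased samples versus small properly constructed ones can be checked with a short simulation. The sketch below is only a toy model under stated assumptions. The population size of 200,000, the reply chances of 10% and 30%, and the function name compare_samples are all hypothetical; only the 62% support figure echoes the 1936 result quoted earlier. It draws a simple random sample of 1,000 voters and compares it with a mail-in style poll in which supporters of candidate A are assumed to reply less often than everyone else.

```python
import random


def compare_samples(seed=0):
    """Compare a small simple random sample with a large poll that
    suffers from non-response bias (all numbers are hypothetical)."""
    rng = random.Random(seed)

    # Hypothetical population: 200,000 voters, 62% of whom support candidate A.
    # A 1 marks a supporter of A, a 0 marks anyone else.
    population_size = 200_000
    supporters = 124_000
    population = [1] * supporters + [0] * (population_size - supporters)
    true_share = supporters / population_size

    # Simple random sample: 1,000 draws without replacement,
    # like drawing tickets from a box.
    srs = rng.sample(population, 1_000)
    srs_estimate = sum(srs) / len(srs)

    # Mail-in style poll: everyone receives a questionnaire, but (by assumption)
    # supporters of A reply with chance 0.10 and everyone else with chance 0.30.
    replies = [v for v in population
               if rng.random() < (0.10 if v == 1 else 0.30)]
    biased_estimate = sum(replies) / len(replies)

    return true_share, srs_estimate, biased_estimate, len(replies)


if __name__ == "__main__":
    true_share, srs_est, biased_est, n_replies = compare_samples()
    print(f"true share of A supporters:       {true_share:.3f}")
    print(f"simple random sample (n=1000):    {srs_est:.3f}")
    print(f"mail-in poll (n={n_replies}): {biased_est:.3f}")
```

Under these assumed reply chances, about 12,400 supporters and 22,800 non-supporters reply on average, so the mail-in poll settles near 12,400 / 35,200, or roughly 35%, no matter how many questionnaires are sent. The simple random sample is subject only to chance error, which for 1,000 draws is on the order of a percentage point or two.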