Survey

Making Inferences

Sample Size, Sampling Error, and 95% Confidence Intervals
• Samples: usually necessary (with some exceptions), and they don’t need to be huge to represent accurately the entire population you want to study
• e.g., the 1936 election between Alf Landon and FDR: Literary Digest predicted a sweeping victory for Landon (based on a sample of 2 million people), yet FDR won in a landslide

Sample Size, Sampling Error, and 95% Confidence Intervals
• Sampling Error: simply the difference between the estimate obtained from the sample and the true population value (e.g., president’s approval rating of 52% (±4%)); its size, measured by the standard error, is determined by the sample’s size and standard deviation
• Confidence Level (and the Confidence Interval it defines): a 95 percent confidence level would mean that 95 out of 100 samples that might be selected would generate an estimate of presidential approval within the interval of 48-56%.

The Mayor and Your Job as Lead Pollster

The Normal Distribution & Sampling
• Example: For his upcoming reelection campaign, Michael Bloomberg wants to know how many Independents there are in New York City, whose population has grown rapidly over the last several years. Although the N.Y. Bureau of Elections reports that 25% of registered voters claim “Independent” status, he wants to test the validity of this figure.
• Consequently, Bloomberg asks you to conduct a poll to estimate the proportion of citizens, 18 years or older, who are Independents rather than Democrats or Republicans in NYC.
• You interview 10 randomly chosen individuals and find that 2 of them are registered to vote as Independents. Based on this finding you might think that the proportion is closer to 20%, which is a little below the last reported proportion. This difference, 5 percentage points, is called the sampling error.
• What you need is some way to measure the uncertainty in your estimate, so that you can tell Mr. Bloomberg what the margin of error is.
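
One way to get a feel for that uncertainty is to simulate the poll many times. The sketch below is only an illustration, not part of the original slides: it assumes the true share of Independents really is the reported 25%, and the function name `simulate_polls`, the seed, and the choice of 1,000 repetitions per sample size are hypothetical. It prints the mean and the spread (standard deviation) of the poll results for samples of 10, 50, and 500 respondents.

```python
import random
import statistics


def simulate_polls(true_share, sample_size, n_polls, seed=0):
    """Return the proportion of Independents found in each simulated poll.

    Assumption: every respondent is an Independent with probability
    true_share, independently of the others (simple random sampling).
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    proportions = []
    for _ in range(n_polls):
        # Count how many of the sample_size respondents are Independents.
        hits = sum(rng.random() < true_share for _ in range(sample_size))
        proportions.append(hits / sample_size)
    return proportions


if __name__ == "__main__":
    for n in (10, 50, 500):
        props = simulate_polls(true_share=0.25, sample_size=n, n_polls=1000)
        mean = statistics.mean(props)
        spread = statistics.pstdev(props)  # spread of the poll results
        print(f"n={n:>3}: mean={mean:.3f}, sd={spread:.3f}")
```

The exact numbers depend on the seed, but the pattern should be stable: the mean stays near 0.25 while the spread shrinks as the sample size grows.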

The Normal Distribution & Sampling, cont’d
• Let’s say you repeat the interview procedure 4 more times and get estimates of 20%, 30%, 40%, and 20% (2, 3, 4, and 2 Independents, each out of 10 respondents). Their average, (20% + 30% + 40% + 20%) ÷ 4 = 27.5%, is not too far from the originally reported value of 25%.
• What would happen if you repeated the process over and over, say, for 1,000 independent samples of 10 interviewees, and calculated the proportion of Independents in each one? After a while you would have a substantial list of sample proportions (see the first figure on handout #1). In a simulated example, you end up with a mean proportion of .248 (24.8%) and a standard deviation of .141 (14.1%).
• Now think of the standard error as an indicator of how much uncertainty there is in your estimate. We see in the first figure, for example, that about two-thirds of the estimates are in the range of .248 ± .141, or between .107 and .389 (10.7% and 38.9%). Consider this the “68% confidence interval.” The other third of the samples are below .107 (10.7%) or above .389 (38.9%).

The Normal Distribution & Sampling, cont’d
• In other words, after 1,000 samples of 10 randomly chosen NYC voters, you could tell Michael Bloomberg that the proportion of Independents in New York City is probably between 10% and 39%. When he asks, “What do you mean by ‘probably’?” a technical answer on your part would be, “I’m 68% sure.”
• Not surprisingly, he threatens to fire you by the end of the month if you can’t do better than that. He wants you to narrow the range of uncertainty. What do you do? You ask for another $1 million, so that you can take larger samples (e.g., 50 randomly chosen people instead of 10).
• Repeating the same process of 1,000 samples, but this time with 50 NYC voters in each sample, you get the results in the 2nd figure on handout #1: a mean proportion of .251 (25.1%) and a standard deviation of .064 (6.4%).
• Now you can tell Mr. Bloomberg that the proportion of Independents in NYC is probably, with 68% confidence, between 19% and 31%. “You’re getting better,” he says, “but I still want some more certainty.” “O.k.,” you say, “show me some more $$ and I’ll get you some more certainty.”

The Normal Distribution & Sampling, cont’d
• By the time you’ve conducted 1,000 samples of interviews with 500 randomly selected NYC voters (see figure 4 on handout #2), your mean is still .250 (25%), but with a standard deviation now of only .019 (essentially 2%).
• Hence, now you can tell Mr. Bloomberg that the proportion of Independents in NYC is probably, with 95% confidence (2 standard deviations either way; 1 standard deviation either way would give you 68% confidence), between 21% and 29%.
• But let’s say that earlier in the process, Mayor Bloomberg randomly surveyed 10 individuals (e.g., friends, butler, misc. staff) about their voter status and found a rate of 35% registered Independents. He asks you to demonstrate why the upcoming campaign shouldn’t work with his figure instead. After all, he got it himself, personally.
• You have to show him, based on your original study of 1,000 samples of 10 random New York City voters, how likely a finding of 35% registered Independents is. This is first done by computing the amount of random sampling error, or standard error (Pollack, p. 106):
Sampling or standard error = standard deviation ÷ square root of the sample size (n = 10)
Sampling or standard error = 14.1% ÷ 3.16
Sampling or standard error = 4.5%

Inference Using the Normal Distribution & Z Scores
• The central limit theorem (Pollack, p. 108) tells us that there is a 68% chance that the true population proportion of NYC voters registered as “Independent” lies within plus or minus 1 standard error of the sample mean (25% in your study), and there is a 95% chance that it lies within plus or minus 1.96 standard errors of the sample mean (again, 25%).
• Conversely, there is only a 5% probability that the true proportion of NYC voters registered as “Independent” is more than 1.96 standard errors away from the sample mean, in either direction:
• “Low” end of 95% confidence interval = sample mean - 1.96 standard errors = 25% - 1.96 × 4.5% = 16.2%
• “High” end of 95% confidence interval = sample mean + 1.96 standard errors = 25% + 1.96 × 4.5% = 33.8%
Conclusion: 95% of all possible random samples of 10 NYC voters will then produce sample means of between 16.2% and 33.8% “Independent” registered voters.

Inference Using the Normal Distribution & Z Scores
• Given these results, how “random” is Mayor Bloomberg’s finding of 35% registered Independents among NYC’s voting population?
• First, standardize his 35% finding into a Z score:
Z = (Bloomberg mean - mean of the 1,000 samples of 10 voters) ÷ standard error
Z = (35% - 25%) ÷ 4.5%
Z = 10% ÷ 4.5%
Z = 2.22
• Based on the table of Z scores (Pollack, p. 110), how likely is it that a truly random sample of registered NYC voters would find 35% to be Independents? .0132, or about 1.3%. (Basically, you could say, “Mr. Mayor, the odds of finding that 35% of registered voters in NYC are ‘Independent’ are about 1 out of 100 and, congratulations, Sir, you got that one. Now stop wasting my time and let me do the polling in this campaign.”)
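
The arithmetic in these last slides can be sketched in a few lines of Python. This is a minimal illustration that takes the slides’ inputs at face value (a standard deviation of 14.1%, n = 10, a reference mean of 25%, and the mayor’s 35%); the function names are hypothetical, and the normal-curve lookup uses the standard library’s `statistics.NormalDist` in place of a printed Z table. Because the standard error is not rounded to 4.5% before the later steps, the results can differ slightly in the last digit from a hand calculation.

```python
import math
from statistics import NormalDist


def standard_error(std_dev, n):
    """Standard deviation divided by the square root of the sample size."""
    return std_dev / math.sqrt(n)


def confidence_interval(mean, se, z=1.96):
    """Low and high ends of the interval: mean -/+ z standard errors."""
    return mean - z * se, mean + z * se


def z_score(value, mean, se):
    """How many standard errors the value lies away from the mean."""
    return (value - mean) / se


def upper_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 1.0 - NormalDist().cdf(z)


# Inputs as given in the slides, all in percentage points.
se = standard_error(14.1, 10)
low, high = confidence_interval(25.0, se)
z = z_score(35.0, 25.0, se)
p = upper_tail(z)

print(f"standard error = {se:.2f}")
print(f"95% interval   = {low:.1f} to {high:.1f}")
print(f"Z              = {z:.2f}")
print(f"upper tail     = {p:.4f}")
```

Keeping each step in its own small function mirrors the order of the slides: standard error first, then the interval, then the Z score and its tail probability.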