11/2/12 Nov. 2

Statistic for the day: Prior to the 2012 Summer Olympics, the percentage of Americans saying they intended to watch at least a fair amount of the Olympics was higher for women (63%) than for men (53%).

A non-election-related poll!
In a Gallup poll conducted Jul. 19-22, 2012, people were asked: How much of the Olympics do you intend to watch?
59% answered "a great deal" or "a fair amount".
The fine print (from gallup.com): maximum ±4% margin of error; sample size = 1030

Assignment: Read Chapter 19. Exercises pp. 367-369: 1, 2, 5, 6, 8, 10

Review
Remember the empirical (68-95-99.7) rule?
[Figure: three normal curves on an axis from -3 to 3 standard deviations, marking 68%, 95%, and 99.7%]
How did we measure and assess the uncertainty in the sample percentage back in chapter 4?
$$\text{margin of error} = \frac{1}{\sqrt{\text{sample size}}}$$
For this Gallup poll, the sample size is 1030, so we get:
$$\text{margin of error} = \frac{1}{\sqrt{1030}} = \frac{1}{32.1} = 0.031$$
(and Gallup says "maximum is ±4 percentage points".)

A new method for margin of error:
Based on the 68-95-99.7 rule, since there is something appealing about 95%, we can redefine the margin of error as
Margin of error = 2 standard deviations
It takes ±2 standard deviations to get 95%.
But standard deviations of what??

The Gallup Poll was based on 1030 telephone interviews. Based on the sample of 1030 American adults, Gallup estimated that 59% of a population of hundreds of millions planned to watch at least a fair amount of the Olympics.
If they took a different sample of 1030, they would have gotten a new sample percentage. It will not always be exactly 59%.
If they took lots of samples of 1030, they would get lots of sample percentages. Let's look at a hypothetical histogram for the percentages.

Histogram of 10,000 Percentages
10,000 percentages based on 10,000 samples of 1030 each.
Mean = ??? (Okay, I cheated. I used 50% for the true population percent. But I had to use something for the "unknown" population percent!)
Approx.
standard deviation = (53% - 47%) / 4 = 1.5% (or .015)
[Figure: histogram of the sample percentages; horizontal axis from 44 to 56]

So in our example, if the sample size is 1030, then the old method for MARGIN OF ERROR gives:
$$\text{margin of error} = \frac{1}{\sqrt{1030}} = \frac{1}{32.1} = 0.031$$
Or 3.1%. And we report 59% ± 3.1%.
But suppose we define the margin of error to be 2 standard deviations. We estimated the standard deviation from the histogram to be .015. This nearly agrees, since 2×.015 = .03. Pretty close!
But creating a hypothetical histogram is a royal pain! Is there an alternative?

Formula for estimating the standard deviation of a sample proportion (without a histogram):
$$\sqrt{\frac{\text{sample proportion} \times (1 - \text{sample proportion})}{\text{sample size}}}$$
Or in our case:
$$\text{standard deviation} \approx \sqrt{\frac{(.59) \times (.41)}{1030}} = 0.015$$
If we happen to know the true population proportion, we use it instead of the sample proportion. (This is unrealistic; do you see why?)

Summary:
1. We take a sample of 1030 phone interviews.
2. We estimate the percent of American adults who plan to watch the Olympics: 59%.
3. To assess the uncertainty in the 59% sample figure, we think of a normal curve of percentages with a standard deviation of .015 = 1.5%.
4. Since this normal curve has 95% of its distribution within 2×1.5% of the true value we want to know, we conclude that 59% plus or minus 2×1.5% is a reasonable interval of values for that true value to lie in.
Notice: the old M.O.E. formula gives about the same as 2×1.5%.

What to expect from sample proportions: An example
Facts: fingerprints may be influenced by prenatal hormones. Most people have more ridges on the right hand than the left. People who have more on the left hand are said to have leftward asymmetry.

In a study of 186 heterosexual and 66 homosexual men, 26 (14%) heterosexual men showed the trait and 20 (30%) homosexual men showed the trait.
(Reference: Hall, J. A. Y. and Kimura, D. "Dermatoglyphic Asymmetry and Sexual Orientation in Men", Behavioral Neuroscience, Vol. 108, No. 6, 1203-1206, Dec 94.
)

Women are more likely to have this trait than men. The proportion of all men who have this trait is about 15%.
Is it unusual to take a sample of 66 men and observe a sample proportion of 30%?

We now know what the distribution of sample proportions based on a sample of 66 should look like. We will suppose that the true proportion in the population of men is 15%.
$$\text{standard deviation} = \sqrt{\frac{(.15) \times (1 - .15)}{66}} = 0.044$$
Now what? Let's borrow some old ideas and find a z-score for the 30% observed in the experiment.

[Figure: Histogram of proportions, with Normal Curve; n = 66, true proportion = .15, standard deviation = .044. Horizontal axis from 0.0 to 0.3, vertical axis Frequency. Marked: 2 standard deviations on either side of 0.15 (0.062 and 0.238), and the homosexual men's proportion near 4 standard deviations]

Thus, a sample proportion of 30% would be (.30 - .15)/.044 = 3.41 standard deviations above the true mean, assuming that the sample is a representative sample from the population.
The sample proportion for homosexual men (30%) is too large to come from the expected distribution of sample proportions.

Sample means: measurement variables
Suppose we want to estimate the mean weight at PSU.
[Figure: Histogram of Weight, with Normal Curve; horizontal axis Weight (100 to 300), vertical axis Frequency]
Data from Stat 100 survey, spring 2004. Sample size 237.
Mean value is 152.5 pounds.
Standard deviation is about (240 - 100)/4 = 35

What is the uncertainty in the mean? We need a margin of error for the mean.
Suppose we take another sample of 237. What will the mean be? Will it be 152.5 again? Probably not.
Consider what would happen if we took 1000 samples, each of size 237, and computed 1000 means.

Hypothetical result, using a "population" that resembles our sample:
[Figure: Histogram of 1000 means with normal curve, based on samples of size 237; horizontal axis Weight (145 to 160), vertical axis Frequency]
Standard deviation is about (157 - 148)/4 = 9/4 = 2.25
Extremely interesting: The histogram of means is bell-shaped, even though the original population was skewed!
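The thought experiment of taking 1000 samples of size 237 can be tried directly with a short simulation. The sketch below is only an illustration under stated assumptions: the skewed "population" is a made-up stand-in (117.5 plus an exponential with mean 35, chosen so the mean is near 152.5 and the standard deviation near 35), not the Stat 100 survey data, and the function name `simulate_sample_means` and the seeds are arbitrary choices.

```python
import math
import random
import statistics


def simulate_sample_means(population, sample_size, num_samples, seed=0):
    """Draw num_samples random samples (with replacement) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]


# ASSUMPTION: a hypothetical right-skewed population of weights,
# 117.5 plus an exponential with mean 35 (so mean ~152.5, sd ~35).
# This is not the actual Stat 100 survey data.
population_rng = random.Random(1)
population = [117.5 + population_rng.expovariate(1 / 35) for _ in range(10_000)]

sample_size = 237
means = simulate_sample_means(population, sample_size, num_samples=1000)

population_mean = statistics.fmean(population)
population_sd = statistics.pstdev(population)
sd_of_means = statistics.stdev(means)
# For comparison: sd of the data divided by the square root of the sample size.
formula_sd = population_sd / math.sqrt(sample_size)

print(f"population mean: {population_mean:.1f}, sd: {population_sd:.1f}")
print(f"mean of the 1000 sample means: {statistics.fmean(means):.1f}")
print(f"sd of the 1000 sample means:   {sd_of_means:.2f}")
print(f"sd of data / sqrt(sample size): {formula_sd:.2f}")
```

Under these assumptions the 1000 means should cluster tightly around the population mean, with a spread far smaller than the spread of the individual weights. Plotting `means` as a histogram would be one way to check the bell shape by eye.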
Formula for estimating the standard deviation of the sample mean (don't need histogram)
Just like in the case of proportions, we would like to have a simple formula to find the standard deviation of the mean without having to resample a lot of times.
Suppose we have the standard deviation of the original sample. Then the standard deviation of the sample mean is:
$$\frac{\text{standard deviation of the data}}{\sqrt{\text{sample size}}}$$

So in our example of weights:
The standard deviation of the sample is about 35. Hence by our formula, the standard deviation of the mean is 35 divided by the square root of 237:
35/15.4 = 2.3
(Recall we estimated it to be 2.25.)
So the margin of error of the sample mean is 2×2.3 = 4.6
Report 152.5 ± 4.6 (or 147.9 to 157.1)

Example: SAT math scores
Suppose nationally we know that the SAT math test has a mean of 500 points and a standard deviation of 100 points.
Draw by hand a picture of what you expect the distribution of sample means based on samples of size 100 to look like.
Sample means have a normal distribution with mean 500 and standard deviation 100/10 = 10.
So draw a bell-shaped curve, centered at 500, with 95% of the bell between 500 - 20 = 480 and 500 + 20 = 520.

[Figure: Normal curve of SAT means, sample size 100; horizontal axis Score (460 to 540)]
A random sample of 100 SAT math scores with a mean of 540 would be very unusual.
A sample of 100 with a mean of 510 would not be unusual.
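The arithmetic in the weights and SAT examples can be reproduced in a few lines. This is a minimal sketch: the function names are made up for illustration, the SAT population mean is taken as 500 (the value the bell curve is centered on), and "unusual" is assumed to mean more than 2 standard deviations from the mean, in line with the 95% rule.

```python
import math


def sd_of_sample_mean(data_sd, sample_size):
    """Standard deviation of the sample mean: sd of the data / sqrt(sample size)."""
    return data_sd / math.sqrt(sample_size)


def z_score(observed_mean, population_mean, sd_of_mean):
    """Number of standard deviations the observed mean lies from the population mean."""
    return (observed_mean - population_mean) / sd_of_mean


# Weights example: sample sd about 35, sample size 237.
weight_sd_of_mean = sd_of_sample_mean(35, 237)
weight_margin = 2 * weight_sd_of_mean  # margin of error = 2 standard deviations
print(f"weights: sd of mean = {weight_sd_of_mean:.2f}, margin of error = {weight_margin:.2f}")

# SAT example: population mean 500, sd 100, samples of size 100.
sat_sd_of_mean = sd_of_sample_mean(100, 100)
for observed in (510, 540):
    z = z_score(observed, 500, sat_sd_of_mean)
    # ASSUMPTION: "unusual" means more than 2 standard deviations away.
    verdict = "unusual" if abs(z) > 2 else "not unusual"
    print(f"sample mean {observed}: z = {z:.1f} ({verdict})")
```

Without intermediate rounding the weights margin comes out slightly below the 4.6 reported above, because the slides round the standard deviation of the mean to 2.3 before doubling.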