Mar. 22

Statistic for the day: Percent of Americans 18 or older who believe Martha Stewart's sentence should include jail time: 53%
Source: gallup.com

Assignment: Read Chapter 18. Exercises p. 329: 1, 2, 5, 6, 8, 10

These slides were created by Tom Hettmansperger and in some cases modified by David Hunter.

Sample percentages: categorical variable

Do you believe Martha Stewart got a fair trial?
Do you believe Martha Stewart's sentence should include jail time?

Gallup Poll     Yes    No     No opinion
Fair trial      66%    27%    7%
Jail time       53%    40%    7%

The Gallup Poll was based on 1005 telephone interviews. Based on this sample of 1005, we estimate that 53% of a population of millions believes that Martha Stewart's sentence should include jail time.

If we take a new sample of 1005, we will get a new sample percentage, and it will generally not be exactly 53%. If we take many samples of 1005, we will get many sample percentages. Next we look at the histogram of these percentages.

[Figure: Histogram of PERCENT, with normal curve. 200 percentages based on 200 samples of 1005 each; horizontal axis PERCENT (49 to 59), vertical axis Frequency.]

Mean = 53% (or .53)
Standard deviation ≈ (57% − 50%) / 4 = 1.75% (or .0175)

How do we measure and assess the uncertainty in the sample percentage?

$$\text{MARGIN OF ERROR} = \frac{1}{\sqrt{\text{sample size}}}$$

So in our example, where the sample size is 1005, the MARGIN OF ERROR is:

$$\frac{1}{\sqrt{1005}} = \frac{1}{31.7} \approx .032 \quad \text{or } 3.2\%$$

And we report 53% ± 3.2%.

We defined the margin of error to be 2 standard deviations. We estimated the standard deviation from the histogram to be .0175, and 2 × .0175 = .035, which nearly agrees with .032. Pretty close!

Summary: Gallup Poll
• We have a simple random sample from the population of telephone owners.
• The sample size used was 1005.
• We find the percentage from our sample.
• The MARGIN OF ERROR is 1 divided by the square root of the sample size. For 1005, the MARGIN OF ERROR is .032.
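The margin-of-error rule and the "many samples of 1005" experiment above can be sketched in a few lines of Python. This is a minimal simulation, not the procedure behind the slide's histogram: it assumes the true population proportion is .53 and draws 200 hypothetical random samples of 1005 yes/no answers using the standard `random` module. The function names `margin_of_error` and `simulate_sample_proportions` are illustrative.

```python
import math
import random
import statistics


def margin_of_error(sample_size: int) -> float:
    """Margin of error used in these notes: 1 / sqrt(sample size)."""
    return 1 / math.sqrt(sample_size)


def simulate_sample_proportions(true_p: float, n: int, num_samples: int,
                                seed: int = 0) -> list[float]:
    """Draw num_samples random samples of size n from a population in which
    a fraction true_p answers "yes"; return each sample's proportion of yes."""
    rng = random.Random(seed)
    props = []
    for _ in range(num_samples):
        yes = sum(1 for _ in range(n) if rng.random() < true_p)
        props.append(yes / n)
    return props


n = 1005
moe = margin_of_error(n)
print(f"margin of error for n = {n}: {moe:.3f}")  # 0.032

# Assumption: the true population proportion is .53 (the Gallup sample value).
props = simulate_sample_proportions(true_p=0.53, n=n, num_samples=200)
sd = statistics.stdev(props)
print(f"mean of the 200 sample proportions: {statistics.mean(props):.3f}")
print(f"SD of the 200 sample proportions:   {sd:.4f}")
print(f"2 x SD = {2 * sd:.3f} vs 1/sqrt(n) = {moe:.3f}")
```

The two numbers in the last line land close together because 1/√n equals 2 × √(.5 × .5 / n), that is, 2 standard deviations for a proportion near 50%; with a proportion of .53 the difference is tiny.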
Hence we report: PERCENTAGE ± .032

The margin of error does not depend on the population size, only on the sample size!

Goals
• To refine the idea of standard deviation (for later use in a refined margin of error).
• To relate this to the normal curve.

In the past we:
1. used a sample to get a sample proportion
2. used a formula to get the margin of error
3. reported the sample proportion ± the margin of error

Now we want a formula for the standard deviation. Then we will use the new standard deviation formula to calculate a new margin of error.

Formula for estimating the standard deviation of a sample proportion (no histogram needed):

$$\sqrt{\frac{\text{sample proportion} \times (1 - \text{sample proportion})}{\text{sample size}}} = \sqrt{\frac{.53 \times (1 - .53)}{1005}} \approx .016$$

If we happen to know the true population proportion, we use it instead of the sample proportion.

[Figure: Histogram with normal curve of 1000 percentages, each based on a sample of 1005; horizontal axis from 0.466 to 0.594, std dev = .016, with a span of 4 standard deviations marked.]

Summary:
1. We take a sample of 1005 phone interviews.
2. We estimate the percent of the American public that thinks Martha Stewart should go to jail: 53%.
3. To assess the uncertainty in the 53% sample figure, we think of a normal curve of percentages centered at .530 with standard deviation .016.
4. The normal curve has 95% of its distribution between .530 − 2 × .016 = .498 and .530 + 2 × .016 = .562. So we estimate 53% (.53), with 50% to 56% (.50 to .56) as the interval of reasonable values.

What to expect from sample proportions

Facts: Fingerprints may be influenced by prenatal hormones. Most people have more ridges on the right hand than on the left. People who have more on the left hand are said to have leftward asymmetry. Women are more likely to have this trait than men.
The proportion of all men who have this trait is about 15%. In a study of 186 heterosexual and 66 homosexual men, 26 (14%) of the heterosexual men and 20 (30%) of the homosexual men showed the trait.
(Reference: Hall, J. A. Y. and Kimura, D. "Dermatoglyphic Asymmetry and Sexual Orientation in Men", Behavioral Neuroscience, Vol. 108, No. 6, 1203-1206, Dec. 1994.)

Is it unusual to take a sample of 66 men and observe a sample proportion of 30%? We now know what the distribution of sample proportions based on a sample of 66 should look like. We will suppose that the true proportion in the population of men is 15%:

$$\text{standard deviation} = \sqrt{\frac{.15 \times (1 - .15)}{66}} \approx .044$$

[Figure: Histogram of proportions, with normal curve; n = 66, true proportion = .15, standard deviation = .044. The marks 0.062 and 0.238 lie 2 standard deviations on either side of .15 (a span of 4 standard deviations); the value for homosexual men, 0.3, lies beyond 0.238.]

The sample proportion for homosexual men (30%) is too large to have come from the expected distribution of sample proportions: it lies about (.30 − .15)/.044 ≈ 3.4 standard deviations above .15, well outside the range .062 to .238.

Sample means: measurement variables

Suppose we want to estimate the mean weight at PSU.

[Figure: Histogram of Weight, with normal curve; horizontal axis Weight (100 to 300 pounds), vertical axis Frequency.]

Data from the Stat 100 survey. Sample size 237. Mean value is 152.5 pounds. Standard deviation is about (240 − 100)/4 = 35.

What is the uncertainty in the mean? We need a margin of error for the mean. Suppose we take another sample of 237. What will the mean be? Will it be 152.5 again? Probably not. Consider what happens if we take 1000 samples, each of size 237, and compute 1000 means.

[Figure: Histogram of 1000 means with normal curve, based on samples of size 237; horizontal axis Weight (145 to 160), vertical axis Frequency.]

Standard deviation is about (157 − 148)/4 = 9/4 = 2.25.

Formula for estimating the standard deviation of the sample mean (no histogram needed)

Just as in the case of proportions, we would like a simple formula for the standard deviation of the mean that does not require resampling many times.
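The two calculations in this part of the notes, the fingerprint check and the "1000 samples of 237" experiment, can be sketched in Python. This is a minimal sketch under stated assumptions: the fingerprint part uses the slide's assumed true proportion of .15, and the weight part draws synthetic samples from a normal population with mean 152.5 and standard deviation 35 (the slide's numbers; the normal shape is an assumption, and the actual survey data are not used). The names `sd_of_proportion` and `simulate_sample_means` are hypothetical.

```python
import math
import random
import statistics


def sd_of_proportion(p: float, n: int) -> float:
    """Standard deviation of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def simulate_sample_means(pop_mean: float, pop_sd: float, n: int,
                          num_samples: int, seed: int = 0) -> list[float]:
    """Hypothetical resampling experiment: draw num_samples samples of size n
    from a normal population (an assumption) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(pop_mean, pop_sd) for _ in range(n))
        for _ in range(num_samples)
    ]


# Fingerprint example: assume the true proportion among all men is .15.
p_true, n_men = 0.15, 66
sd_prop = sd_of_proportion(p_true, n_men)
low, high = p_true - 2 * sd_prop, p_true + 2 * sd_prop
observed = 20 / 66
z = (observed - p_true) / sd_prop  # distance from .15 in standard deviations
print(f"SD = {sd_prop:.3f}; 2-SD range = {low:.3f} to {high:.3f}")
print(f"observed = {observed:.3f}, about {z:.1f} SDs above .15")

# Weight example: 1000 synthetic samples of 237 each.
means = simulate_sample_means(152.5, 35, 237, 1000)
print(f"SD of the 1000 sample means: {statistics.stdev(means):.2f}")
```

The simulated standard deviation of the means should come out near the 2.25 read off the histogram, which is the number the formula for the sample mean needs to reproduce without any resampling.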
Suppose we have the standard deviation of the original sample. Then the standard deviation of the sample mean is:

$$\frac{\text{standard deviation of the data}}{\sqrt{\text{sample size}}}$$

So in our example of weights, the standard deviation of the sample is about 35. Hence, by our formula, the standard deviation of the mean is 35 divided by the square root of 237:

$$\frac{35}{\sqrt{237}} = \frac{35}{15.4} \approx 2.3$$

(Recall that we estimated it from the histogram to be 2.25.) So the margin of error of the sample mean is 2 × 2.3 = 4.6. Report 152.5 ± 4.6, or 147.9 to 157.1.

Example: SAT scores

Suppose we know that nationally the SAT has a mean of 425 points and a standard deviation of 120 points. Draw by hand a picture of what you expect the distribution of sample means based on samples of size 100 to look like.

Sample means have a normal distribution with mean 425 and standard deviation 120/√100 = 120/10 = 12. So draw a bell-shaped curve centered at 425, with 95% of the bell between 425 − 24 = 401 and 425 + 24 = 449.

[Figure: Normal curve of SAT means based on samples of 100; mean = 425, std dev = 12; horizontal axis from 390 to 460, with a span of 4 standard deviations marked.]

A sample of 100 SATs with a mean of 460 would be very unusual: 460 lies (460 − 425)/12 ≈ 2.9 standard deviations above 425, outside the range 401 to 449. A sample of 100 with a mean of 440 would not be unusual, since it lies only 1.25 standard deviations above 425.
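The formula for the standard deviation of the mean and the SAT reasoning can be checked with a short Python sketch. It applies the notes' 2-standard-deviation rule of thumb as a simple cutoff; the helper names `sd_of_mean` and `two_sd_check` are hypothetical, and the inputs are the numbers from the slides.

```python
import math


def sd_of_mean(data_sd: float, n: int) -> float:
    """Standard deviation of the sample mean: data SD divided by sqrt(n)."""
    return data_sd / math.sqrt(n)


def two_sd_check(value: float, center: float, sd: float) -> str:
    """Rule of thumb from these notes: a value more than 2 SDs from the
    center falls outside the middle 95% and is called unusual."""
    z = (value - center) / sd
    label = "unusual" if abs(z) > 2 else "not unusual"
    return f"{value}: {z:.2f} SDs from {center} -> {label}"


# Weights: data SD about 35, n = 237, sample mean 152.5.
sd_w = sd_of_mean(35, 237)
moe_w = 2 * sd_w
print(f"SD of mean = {sd_w:.2f}; report 152.5 ± {moe_w:.2f}")

# SAT: national mean 425, SD 120, samples of size 100.
sd_sat = sd_of_mean(120, 100)
print(f"95% of sample means fall between {425 - 2 * sd_sat:.0f} and {425 + 2 * sd_sat:.0f}")
for m in (440, 460):
    print(two_sd_check(m, 425, sd_sat))
```

The code reports a margin of error of about 4.55 rather than 4.6 only because the notes round the standard deviation to 2.3 before doubling it.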