STATISTICS 200
Lecture #17
Tuesday, October 18, 2016
Textbook: Sections 9.5, 10.3, 10.4

Objectives:
• Apply general confidence interval formula: Estimate plus/minus (multiplier × standard error)
• Calculate new values of the multiplier for new confidence levels other than 95%
• Interpret confidence level as a relative frequency
• Describe the sampling distribution of a difference of two independent sample proportions.
• Apply formula for the S.E. of a difference

We have begun a strong focus on inference
[Diagram of inference topics. Proportions: one population proportion, two population proportions. Means: one population mean, difference between means, mean difference. A "This week" label marks part of the diagram.]

Motivation
Goal: Use statistical inference to answer the question “What is the percentage of Creamery customers who prefer chocolate ice cream over vanilla?”
Strategy: Get a random sample of 90 individuals and ask them this question. Use the answers to perform a hypothesis test to answer the question.
Data: Of 90 respondents in our representative sample, 35 said they prefer chocolate.
Let’s create a 90% confidence interval for the true percentage.

Our new confidence interval formula
(estimate – ME to estimate + ME)
ME = (multiplier) × (standard error)
Here, “estimate” means p-hat.

Putting it all together, we get
$$\hat{p} \pm (\text{multiplier}) \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

What does it mean to be 90% confident?
A. There is a 90% probability that the one interval that I calculated contains the true value for the parameter.
B. If I get 100 such intervals, about 90 of them will contain the true value for the parameter.
C. The sample estimate has a 90% chance of being inside the calculated interval.
D. The p-value has a 90% chance of being inside the interval.

Recall the example from Thursday
Suppose we have a sample of 200 students in STAT 100 and find that 28 of them are left-handed.
Find a 95% CI for the true proportion.
Our sample proportion is: $\hat{p} = 28/200 = 0.14$
Our ME is: $2 \times \sqrt{0.14(1-0.14)/200} \approx 2 \times 0.0245 = 0.049$
Our 95% CI is: $0.14 \pm 0.049$, or about (0.091, 0.189)
On the following two slides, we'll pretend that the true population proportion is 0.12.

Normal curve of sample proportions based on sample size 200
[Figure: green normal curve of sample proportions, with the 14% estimate and its confidence interval in red; horizontal axis: sample percents, 0.08 to 0.18.]
The green curve is the true distribution of p-hat. Of course, ordinarily we don't know where it lies, but at least we know its approximate standard deviation. Thus, we can build a confidence interval around our 14% estimate (in red).
If we take another sample, the red line will move but the green curve will not!

30 confidence intervals based on sample size 200
[Figure: 30 confidence intervals from repeated samples; horizontal axis: sample percents, 0.06 to 0.18.]
If we repeat the sampling over and over, 95% of our confidence intervals will contain the true proportion of 0.12. This is why we use the term "95% confidence interval".

Definition of "95% confidence interval for the true population proportion":
An interval of values computed from a sample that will cover the true but unknown population proportion for 95% of the possible samples.

To find a 95% CI:
• The center is at p-hat.
• The margin of error is 2 times the S.E., where…
• …the S.E. is the square root of [p-hat(1-p-hat)/n].

Recall this example: Are women more likely to have dogs?

           Female         Male           Total
Has Dog    89 (56.7%)     66 (50.8%)     155
No Dog     68 (43.3%)     64 (49.2%)     132
Total      157            130            287

Your class data

Let’s reframe this problem: Examine the difference between two independent proportions, that is, pf–pm. Is it zero? How about a 95% confidence interval?

Our new confidence interval formula
(estimate – ME to estimate + ME)
ME = (multiplier) × (standard error)
Here, “estimate” means the difference between the two sample proportions, p-hat1 – p-hat2.

The sampling distribution of p-hat1 – p-hat2
As long as both p-hat1 and p-hat2 are approximately normal…
...and the two samples are independent...
Then the sampling distribution is approximately normal with mean p1–p2 and standard deviation
$$\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

Standard error of the difference between two independent statistics (p. 335 of book)
If you remember your geometry, it might help to associate the S.E. of the difference with the hypotenuse of a right triangle. The good ol’ Pythagorean theorem says
$$\text{S.E.}(\text{statistic}_1 - \text{statistic}_2) = \sqrt{[\text{S.E.}(\text{statistic}_1)]^2 + [\text{S.E.}(\text{statistic}_2)]^2}$$

Recall this example: Are women more likely to have dogs?
In the class data, 89 of 157 females and 66 of 130 males have dogs.

Our new confidence interval formula
(estimate – ME to estimate + ME)
ME = (multiplier) × (S.E.)
In this dataset, the estimate is $\hat{p}_f - \hat{p}_m = 0.567 - 0.508 = 0.059$ and
$$\text{S.E.} = \sqrt{\frac{0.567(1-0.567)}{157} + \frac{0.508(1-0.508)}{130}} \approx 0.059$$
Therefore, ME = 2 × 0.059 = 0.118.
Thus, the 95% CI is (0.059–0.118 to 0.059+0.118) or (–0.059, 0.177).

The 95% CI for pf–pm is (–0.059, 0.177). Importantly, this CI contains zero. So zero (no difference) is a reasonable value!

General guidelines for using CIs to make decisions
• Any value not in the interval can be rejected as a likely value of the parameter.
• Special case: For an interval for a difference, if zero is not in the interval then we can conclude a difference between the parameters exists.
• …and finally: If you have two different CIs (on the same scale) that do not overlap, it is safe to assume there’s a significant difference. But the reverse is not true! Overlapping intervals do not by themselves show that there is no difference.

If you understand today’s lecture…
9.49, 9.54, 10.50, 10.55, 10.57, 10.64, 10.67
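The interval calculations in this lecture can be checked numerically. The Python below is a minimal sketch and is not part of the original slides: the function names (`multiplier`, `se_proportion`, `ci_one_proportion`, `ci_two_proportions`) are hypothetical, and the multiplier for a new confidence level is assumed to be the standard normal quantile, computed with the standard library's `statistics.NormalDist`. The slides round the 95% multiplier to 2, so the 95% examples pass 2 explicitly.

```python
from math import sqrt
from statistics import NormalDist


def multiplier(confidence):
    """Standard normal quantile that leaves (1 - confidence)/2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def se_proportion(successes, n):
    """Standard error of a sample proportion: sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = successes / n
    return sqrt(p_hat * (1 - p_hat) / n)


def ci_one_proportion(successes, n, mult):
    """Estimate plus/minus (multiplier x standard error) for one proportion."""
    p_hat = successes / n
    me = mult * se_proportion(successes, n)
    return p_hat - me, p_hat + me


def ci_two_proportions(x1, n1, x2, n2, mult):
    """CI for p1 - p2 from two independent samples.

    The S.E. of the difference combines the two individual standard errors
    like the legs of a right triangle (square, add, take the square root).
    """
    estimate = x1 / n1 - x2 / n2
    se_diff = sqrt(se_proportion(x1, n1) ** 2 + se_proportion(x2, n2) ** 2)
    me = mult * se_diff
    return estimate - me, estimate + me


# Multipliers for two confidence levels (assumed normal quantiles).
print(round(multiplier(0.90), 3), round(multiplier(0.95), 3))  # 1.645 1.96

# Creamery example: 35 of 90 prefer chocolate, 90% confidence.
print([round(v, 3) for v in ci_one_proportion(35, 90, multiplier(0.90))])

# Left-handed example: 28 of 200, multiplier rounded to 2 as in the slides.
print([round(v, 3) for v in ci_one_proportion(28, 200, 2)])

# Dog ownership: 89 of 157 females vs. 66 of 130 males, multiplier 2.
print([round(v, 3) for v in ci_two_proportions(89, 157, 66, 130, 2)])  # [-0.059, 0.177]
```

The last line reproduces the (–0.059, 0.177) interval stated in the lecture. Using the exact 95% quantile (about 1.96) instead of 2 would narrow each 95% interval slightly.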