A local McDonald’s manager will return a shipment of hamburger buns if more than 10% of the buns are crushed. A random sample of 81 buns finds 13 crushed buns. A 5% significance test is conducted to determine if the shipment should be accepted.
• What is the p-value and the conclusion?
• How many buns do we need to sample in order to have a margin of error (ME) smaller than 2% at a significance level of 5%?

In a hypothesis testing problem:
(a) The null hypothesis will not be rejected unless the data are not unusual (given that the hypothesis is true).
(b) The null hypothesis will not be rejected unless the p-value indicates the data are very unusual (given that the hypothesis is true).
(c) The null hypothesis will not be rejected only if the probability of observing the data provides convincing evidence that it is true.
(d) The null hypothesis is also called the research hypothesis; the alternative hypothesis often represents the status quo.
(e) The null hypothesis is the hypothesis that we would like to prove; the alternative hypothesis is also called the research hypothesis.

A research biologist has carried out an experiment on a random sample of 15 experimental plots in a field. Following the collection of data, a test of significance was conducted under the appropriate null and alternative hypotheses, and the P-value was determined to be approximately 0.03. This indicates that:
(a) The result is statistically significant at the 0.01 level.
(b) The probability of being wrong in this situation is only 0.03.
(c) There is some reason to believe that the null hypothesis is incorrect.
(d) If this experiment were repeated, 3 percent of the time we would get this same result.
(e) The sample is so small that little confidence can be placed on the result.
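For the hamburger-bun question above, the arithmetic can be sketched in a few lines of Python. This is only one reasonable setup, not an official answer key: it assumes an upper-tailed z-test of H0: p = 0.10 against Ha: p > 0.10, reads "significance level of 5%" in the sample-size part as a 95% confidence level, and uses the conservative planning value p = 0.5 for the margin of error. The function names are illustrative and not from the slides.

```python
from math import ceil, sqrt
from statistics import NormalDist


def one_sample_prop_ztest(y, n, p0):
    """Upper-tailed z-test of H0: p = p0 against Ha: p > p0."""
    p_hat = y / n
    se0 = sqrt(p0 * (1 - p0) / n)      # standard error computed under H0
    z = (p_hat - p0) / se0
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return p_hat, z, p_value


def sample_size_for_margin(margin, p_guess, conf=0.95):
    """Smallest n with z* * sqrt(p(1-p)/n) <= margin for a planning value p_guess."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z_star ** 2 * p_guess * (1 - p_guess) / margin ** 2)


# 13 crushed buns out of 81, tested against the 10% threshold
p_hat, z, p_value = one_sample_prop_ztest(13, 81, 0.10)
print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")

# Assumption: conservative planning value p = 0.5 (largest possible variance)
n_conservative = sample_size_for_margin(0.02, 0.5)
# Assumption: planning value equal to the 10% threshold instead
n_at_threshold = sample_size_for_margin(0.02, 0.10)
print(n_conservative, n_at_threshold)
```

Under these assumptions the p-value comes out a little below 0.05, so the test would reject H0 at the 5% level. The required sample size depends heavily on the planning value chosen for p, which the question does not specify.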
In a statistical test for a proportion, such as $H_0: p = 5$ versus $H_a: p \neq 5$, if $\alpha = 0.05$:
(a) 95% of the time we will make an incorrect inference
(b) 5% of the time we will say that there is a real difference when there is no difference
(c) 95% of the time the null hypothesis will be correct
(d) 5% of the time we will make a correct inference

Statistics 111 - Lecture 15
Two-Sample Inference for Proportions

Count Data and Proportions
• Last class, we re-introduced count data:
$$X_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}$$
• Example: Pennsylvania Primary
  • $X_i = 1$ if you favor Obama, $X_i = 0$ if not
  • What is the proportion $p$ of Obama supporters at Penn?
• We derived confidence intervals and hypothesis tests for a single population proportion $p$

Two-Sample Inference for Proportions
• Today, we will look at comparing the proportions between two samples from distinct populations
  • Population 1: $p_1$, Sample 1: $\hat{p}_1$
  • Population 2: $p_2$, Sample 2: $\hat{p}_2$
• Two tools for inference:
  • Hypothesis test for a significant difference between $p_1$ and $p_2$
  • Confidence interval for the difference $p_1 - p_2$

Example: Vitamin C study
• Study done by Linus Pauling in 1971
• Does vitamin C reduce the incidence of the common cold?
• 279 people were randomly given vitamin C or placebo

Group       Colds   Total
Vitamin C   17      139
Placebo     31      140

• Is there a significant difference in the proportion of colds between the vitamin C and placebo groups?
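The count-data setup above can be made concrete with the Vitamin C table: each person is a 0/1 outcome, and a group's sample proportion is just the mean of those outcomes. The sketch below rebuilds the 0/1 data from the published counts; the dictionary layout and the helper name `sample_proportion` are illustrative choices, not part of the lecture.

```python
# Counts from the Vitamin C table: group -> (number with colds, group size)
groups = {"Vitamin C": (17, 139), "Placebo": (31, 140)}


def sample_proportion(outcomes):
    """Mean of 0/1 outcomes, i.e. the sample proportion of ones."""
    return sum(outcomes) / len(outcomes)


proportions = {}
for name, (colds, total) in groups.items():
    # X_i = 1 if person i caught a cold, X_i = 0 otherwise
    outcomes = [1] * colds + [0] * (total - colds)
    proportions[name] = sample_proportion(outcomes)
    print(f"{name}: {proportions[name]:.3f}")
```

The two printed values are the sample proportions that a two-sample comparison would start from.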
Hypothesis Test for Two Proportions
• For two different samples, we want to test whether or not the two proportions are different: $H_0: p_1 = p_2$ versus $H_a: p_1 \neq p_2$
• The test statistic for testing the difference between two proportions is:
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{SE(\hat{p}_1 - \hat{p}_2)}$$
• $SE(\hat{p}_1 - \hat{p}_2)$ is called the pooled standard deviation and has the following formula:
$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\hat{p}_p (1 - \hat{p}_p) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$
• $\hat{p}_p = \dfrac{Y_1 + Y_2}{n_1 + n_2}$ is called the pooled sample proportion

Example: Vitamin C study
• Vitamin C group: $Y_1 = 17$, $n_1 = 139$
• Placebo group: $Y_2 = 31$, $n_2 = 140$
• We need the following three sample proportions:
$$\hat{p}_1 = \frac{17}{139} = 0.12 \qquad \hat{p}_2 = \frac{31}{140} = 0.22 \qquad \hat{p}_p = \frac{17 + 31}{139 + 140} = 0.17$$
• Next, we calculate the pooled standard deviation:
$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\hat{p}_p (1 - \hat{p}_p) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = \sqrt{0.17 \cdot 0.83 \left( \frac{1}{139} + \frac{1}{140} \right)} = 0.045$$
• Finally, we calculate our test statistic:
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{SE(\hat{p}_1 - \hat{p}_2)} = \frac{0.12 - 0.22}{0.045} = -2.22$$

Hypothesis Test for Two Proportions
• We use the standard normal distribution to calculate a p-value for our test statistic
  (Standard normal curve: the area to the left of $Z = -2.22$ is 0.0132)
• Since we used a two-sided alternative, our p-value is $2 \times P(Z < -2.22) = 2 \times 0.0132 = 0.0264$
• At the $\alpha = 0.05$ level, we reject the null hypothesis
• Conclusion: the proportion of colds is significantly different between the Vitamin C and placebo groups

Confidence Interval for Difference
• We use the two sample proportions to construct a confidence interval for the difference in population proportions $p_1 - p_2$ between two groups:
$$\text{C.I.} = \hat{p}_1 - \hat{p}_2 \pm Z^* \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$$
• The interval is centered at the difference of the two sample proportions
• As usual, the multiple $Z^*$ you use depends on the confidence level that is needed
  • e.g., for a 95% confidence interval, $Z^* = 1.96$

Example: Vitamin C study
• We want a C.I. for the difference in the proportion of colds $p_1 - p_2$ between Vitamin C and placebo
• We need the sample proportions from before:
$$\hat{p}_1 = \frac{17}{139} = 0.12 \qquad \hat{p}_2 = \frac{31}{140} = 0.22$$
• Now, we construct a 95% confidence interval:
$$\text{C.I.} = 0.12 - 0.22 \pm 1.96 \sqrt{\frac{0.12 \cdot 0.88}{139} + \frac{0.22 \cdot 0.78}{140}} = (-0.19, -0.01)$$
• With 95% confidence, Vitamin C decreases the proportion of colds by between 1 and 19 percentage points

Another Example
• Has Shaq gotten worse at free throws over his career?
• Free throws are uncontested shots given to a player when they are fouled. Shaquille O’Neal is notoriously bad at them
• Two samples: the first three years of Shaq’s career vs. a later three years of his career

Group         Free Throws Made   Free Throws Attempted
Early Years   Y1 = 1353          n1 = 2425
Later Years   Y2 = 1121          n2 = 2132

Another Example: Shaq’s Free Throws
• We calculate the sample and pooled proportions:
$$\hat{p}_1 = \frac{1353}{2425} = 0.558 \qquad \hat{p}_2 = \frac{1121}{2132} = 0.526 \qquad \hat{p}_p = \frac{1353 + 1121}{2425 + 2132} = 0.543$$
• Next, we calculate the pooled standard deviation:
$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{0.543 \cdot 0.457 \left( \frac{1}{2425} + \frac{1}{2132} \right)} = 0.015$$
• Finally, we calculate our test statistic:
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{SE(\hat{p}_1 - \hat{p}_2)} = \frac{0.558 - 0.526}{0.015} = 2.13$$

Another Example: Shaq’s Free Throws
• We use the standard normal distribution to calculate a p-value for our test statistic
  (Standard normal curve: the area to the right of $Z = 2.13$ is 0.0166)
• Since we used a two-sided alternative, our p-value is $2 \times P(Z > 2.13) = 2 \times 0.0166 = 0.0332$
• At the $\alpha = 0.05$ level, we reject the null hypothesis
• Conclusion: Shaq’s free throw success is significantly different now than early in his career

Confidence Interval: Shaq’s FT
• We want a confidence interval for the difference in Shaq’s free throw proportion:
$$\hat{p}_1 = \frac{1353}{2425} = 0.558 \qquad \hat{p}_2 = \frac{1121}{2132} = 0.526$$
• Now, we construct a 95% confidence interval:
$$\text{C.I.} = 0.558 - 0.526 \pm 1.96 \sqrt{\frac{0.558 \cdot 0.442}{2425} + \frac{0.526 \cdot 0.474}{2132}} = (0.003, 0.061)$$
• With 95% confidence, Shaq’s free throw percentage has decreased by between 0.3 and 6.1 percentage points

Is Shaq still bad at Free Throws?
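The pooled two-sample test and the unpooled confidence interval used in both examples above can be sketched in Python with only the standard library. The function names are illustrative, not from the lecture. The code works with unrounded proportions, so its Z values differ slightly from the hand calculations, which round each intermediate result: roughly −2.19 instead of −2.22 for the Vitamin C study, and roughly 2.17 instead of 2.13 for Shaq. The conclusions at the 0.05 level are unchanged.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_ztest(y1, n1, y2, n2):
    """Two-sided z-test of H0: p1 = p2 using the pooled sample proportion."""
    p1_hat, p2_hat = y1 / n1, y2 / n2
    pooled = (y1 + y2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # both tails
    return z, p_value


def two_prop_ci(y1, n1, y2, n2, conf=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = y1 / n1, y2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


# Vitamin C study: 17 colds of 139 (vitamin C) vs. 31 colds of 140 (placebo)
vit_z, vit_p = two_prop_ztest(17, 139, 31, 140)
vit_ci = two_prop_ci(17, 139, 31, 140)

# Shaq: 1353 of 2425 made (early years) vs. 1121 of 2132 made (later years)
shaq_z, shaq_p = two_prop_ztest(1353, 2425, 1121, 2132)
shaq_ci = two_prop_ci(1353, 2425, 1121, 2132)

for label, z, p, ci in [("Vitamin C", vit_z, vit_p, vit_ci),
                        ("Shaq", shaq_z, shaq_p, shaq_ci)]:
    print(f"{label}: Z = {z:.2f}, p-value = {p:.4f}, "
          f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The test pools the two samples because the standard error is computed assuming H0: p1 = p2 is true, while the interval makes no such assumption and therefore uses each sample proportion separately.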