A local McDonald’s manager will return a shipment of
hamburger buns if more than 10% of the buns are
crushed. A random sample of 81 buns finds 13 crushed
buns. A 5% significance test is conducted to determine if
the shipment should be accepted. What is the p value and
the conclusion?
How many buns do we need to sample in order to have a
margin of error (ME) smaller than 2% at a significance level of 5%?
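One way to work this problem is sketched below. It is a minimal sketch under stated assumptions, not an official solution: it assumes a one-sided alternative (Ha: p > 0.10), a normal approximation with the standard error computed under H0, a multiplier of 1.96 for the margin of error, and a planning value of p = 0.10 for the sample size (using p = 0.5 instead would be the conservative choice). The normal approximation is borderline here, since only 81 × 0.10 = 8.1 crushed buns are expected under H0. The function and variable names are illustrative.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Data from the problem: 13 crushed buns in a random sample of 81.
crushed, n = 13, 81
p0 = 0.10      # null value: 10% of buns crushed
alpha = 0.05   # significance level

p_hat = crushed / n
se_null = math.sqrt(p0 * (1 - p0) / n)   # standard error computed under H0
z = (p_hat - p0) / se_null
p_value = normal_upper_tail(z)           # one-sided: Ha is p > 0.10 (assumption)
reject = p_value < alpha

def sample_size(margin, p_guess, z_star=1.96):
    """Smallest n with z_star * sqrt(p_guess * (1 - p_guess) / n) <= margin."""
    return math.ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))

# Planning value p = 0.10 is an assumption; the problem does not specify one.
n_planning = sample_size(0.02, p0)

print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
print(f"buns needed for ME below 2%: {n_planning}")
```

Under these assumptions the p-value falls just below 0.05, so the test would reject H0 and the shipment would be returned; the decision is sensitive to the choice of a one-sided alternative.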
In a hypothesis testing problem:
(a) The null hypothesis will not be rejected unless the data are
not unusual (given that the hypothesis is true).
(b) The null hypothesis will not be rejected unless the p-value
indicates the data are very unusual (given that the hypothesis
is true).
(c) The null hypothesis will not be rejected only if the probability
of observing the data provide convincing evidence that it is
true.
(d) The null hypothesis is also called the research hypothesis; the
alternative hypothesis often represents the status quo.
(e) The null hypothesis is the hypothesis that we would like to
prove; the alternative hypothesis is also called the research
hypothesis
A research biologist has carried out an experiment on a random
sample of 15 experimental plots in a field. Following the
collection of data, a test of significance was conducted under the
appropriate null and alternative hypothesis and the P-value was
determined to be approximately 0.03. This indicates that:
(a) The result is statistically significant at the 0.01 level
(b) The probability of being wrong in this situation is only 0.03.
(c) There is some reason to believe that the null hypothesis is
incorrect
(d) If this experiment were repeated, 3 percent of the time we
would get these same results.
(e) The sample is so small that little confidence can be placed on
the result.
In a statistical test for proportion, such as
$H_0: \mu = 5$ versus $H_a: \mu \neq 5$, if $\alpha = 0.05$:
(a) 95% of the time we will make an incorrect inference
(b) 5% of the time we will say that there is a real difference
when there is no difference
(c) 95% of the time the null hypothesis will be correct
(d) 5% of the time we will make a correct inference
Statistics 111 - Lecture 15
Two-Sample Inference for Proportions
Count Data and Proportions
• Last class, we re-introduced count data:
$$X_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}$$
• Example: Pennsylvania Primary
• Xi = 1 if you favor Obama, Xi = 0 if not
• What is the proportion p of Obama supporters at
Penn?
• We derived confidence intervals and hypothesis tests for
a single population proportion p
Two-Sample Inference for Proportions
• Today, we will look at comparing the proportions between
two samples from distinct populations
Population 1: $p_1$ → Sample 1: $\hat{p}_1$
Population 2: $p_2$ → Sample 2: $\hat{p}_2$
• Two tools for inference:
• Hypothesis test for significant difference between p1 and p2
• Confidence interval for difference p1 - p2
Example: Vitamin C study
• Study done by Linus Pauling in 1971
• Does vitamin C reduce incidence of common cold?
• 279 people randomly given vitamin C or placebo
Group        Colds   Total
Vitamin C    17      139
Placebo      31      140
• Is there a significant difference in the proportion of colds
between the vitamin C and placebo groups?
Hypothesis Test for Two Proportions
• For two different samples, we want to test whether or not
the two proportions are different:
$H_0: p_1 = p_2$ versus $H_a: p_1 \neq p_2$
• The test statistic for testing the difference between two
proportions is:
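$$Z = \frac{\hat{p}_1 - \hat{p}_2}{SE(\hat{p}_1 - \hat{p}_2)}$$
• $SE(\hat{p}_1 - \hat{p}_2)$ is called the pooled standard deviation and
has the following formula:
$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\hat{p}_p (1 - \hat{p}_p) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$
• $\hat{p}_p = \frac{Y_1 + Y_2}{n_1 + n_2}$ is called the pooled sample proportion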
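The pooled test statistic can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function name `two_proportion_z` is made up for this example, and the counts passed to it are the Vitamin C study counts given in this lecture (17 colds out of 139, 31 colds out of 140).

```python
import math

def two_proportion_z(y1, n1, y2, n2):
    """Return (z, pooled_se, pooled_p) for testing H0: p1 = p2."""
    p1_hat = y1 / n1
    p2_hat = y2 / n2
    # Pooled sample proportion: all successes over all trials.
    pooled_p = (y1 + y2) / (n1 + n2)
    # Pooled standard error of the difference in sample proportions.
    pooled_se = math.sqrt(pooled_p * (1 - pooled_p) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / pooled_se
    return z, pooled_se, pooled_p

# Vitamin C group: 17 colds out of 139; placebo group: 31 colds out of 140.
z, se, pp = two_proportion_z(17, 139, 31, 140)
print(f"pooled p = {pp:.3f}, SE = {se:.3f}, Z = {z:.2f}")
```

Because the code keeps full precision, it gives a Z of roughly -2.19, while the hand calculation on the slides rounds each proportion to two decimals first and arrives at -2.22. The conclusion at the 5% level is the same either way.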
Example: Vitamin C study
Vitamin C group: Y1 = 17, n1 = 139
Placebo group: Y2 = 31, n2 = 140
• We need the following three sample proportions:
$$\hat{p}_1 = \frac{17}{139} = 0.12$$
$$\hat{p}_2 = \frac{31}{140} = 0.22$$
$$\hat{p}_p = \frac{17 + 31}{139 + 140} = 0.17$$
• Next, we calculate the pooled standard deviation:
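$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\hat{p}_p (1 - \hat{p}_p) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = \sqrt{0.17 \times 0.83 \times \left( \frac{1}{139} + \frac{1}{140} \right)} = 0.045$$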
• Finally, we calculate our test statistic:
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{SE(\hat{p}_1 - \hat{p}_2)} = \frac{0.12 - 0.22}{0.045} = -2.22$$
Hypothesis Test for Two Proportions
• We use the standard normal distribution to calculate a p-value for our test statistic
[Figure: standard normal curve; the area below Z = -2.22 is 0.0132]
• Since we used a two-sided alternative, our p-value is 2 x
P(Z < -2.22) = 2 x 0.0132 = 0.0264
• At the $\alpha = 0.05$ level, we reject the null hypothesis
• Conclusion: the proportion of colds is significantly
different between the Vitamin C and placebo groups
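The tail probability above can be checked without a normal table. The sketch below uses only the Python standard library; the helper names `normal_cdf` and `two_sided_p_value` are illustrative, and the input is the rounded test statistic Z = -2.22 from the slides.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, P(Z < z), via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def two_sided_p_value(z):
    """Two-sided p-value: twice the tail area beyond |z|."""
    return 2 * normal_cdf(-abs(z))

z = -2.22      # test statistic from the Vitamin C example
alpha = 0.05   # significance level

lower_tail = normal_cdf(z)
p_value = two_sided_p_value(z)
print(f"P(Z < {z}) = {lower_tail:.4f}")
print(f"two-sided p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

Taking the absolute value inside `two_sided_p_value` means the same function works for positive and negative test statistics.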
Confidence Interval for Difference
• We use the two sample proportions to construct a
confidence interval for the difference in population
proportions $p_1 - p_2$ between two groups:
$$\text{C.I.} = (\hat{p}_1 - \hat{p}_2) \pm Z^* \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$$
• Interval is centered at the difference of the two sample
proportions
• As usual, the multiple Z* you use depends on the
confidence level that is needed
• e.g., for a 95% confidence interval, Z* = 1.96
Example: Vitamin C study
• Want a C.I. for difference in proportion of colds p1 - p2
between Vitamin C and placebo
• Need sample proportions from before:
$$\hat{p}_1 = \frac{17}{139} = 0.12 \qquad \hat{p}_2 = \frac{31}{140} = 0.22$$
• Now, we construct a 95% confidence interval:
$$\text{C.I.} = (0.12 - 0.22) \pm 1.96 \sqrt{\frac{0.12 \cdot 0.88}{139} + \frac{0.22 \cdot 0.78}{140}} = (-0.19, -0.01)$$
• Vitamin C decreases the proportion of colds by between
1 and 19 percentage points
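The interval calculation can be sketched as a small function. This is an illustrative sketch only: the name `diff_proportion_ci` is made up for this example, and the default multiplier of 1.96 assumes a 95% confidence level. Unlike the hypothesis test, the interval uses the unpooled standard error, with each sample proportion contributing its own variance term.

```python
import math

def diff_proportion_ci(y1, n1, y2, n2, z_star=1.96):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat = y1 / n1
    p2_hat = y2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    margin = z_star * se
    return diff - margin, diff + margin

# Vitamin C group: 17 colds out of 139; placebo group: 31 colds out of 140.
low, high = diff_proportion_ci(17, 139, 31, 140)
print(f"95% C.I. for p1 - p2: ({low:.2f}, {high:.2f})")
```

Both endpoints are negative, so the interval excludes zero, which agrees with rejecting the null hypothesis at the 5% level.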
Another Example
• Has Shaq gotten worse at free throws over his career?
• Free throws are uncontested shots given to a player when they are
fouled…Shaquille O’Neal is notoriously bad at them
• Two Samples: the first three years of Shaq’s career vs. a later
three years of his career
Group         Free Throws Made   Free Throws Attempted
Early Years   Y1 = 1353          n1 = 2425
Later Years   Y2 = 1121          n2 = 2132
Another Example: Shaq’s Free Throws
• We calculate the sample and pooled proportions
$$\hat{p}_1 = \frac{1353}{2425} = 0.558$$
$$\hat{p}_2 = \frac{1121}{2132} = 0.526$$
$$\hat{p}_p = \frac{1353 + 1121}{2425 + 2132} = 0.543$$
• Next, we calculate the pooled standard deviation:
$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{0.543 \times 0.457 \times \left( \frac{1}{2425} + \frac{1}{2132} \right)} = 0.015$$
• Finally, we calculate our test statistic:
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{SE(\hat{p}_1 - \hat{p}_2)} = \frac{0.558 - 0.526}{0.015} = 2.13$$
Another Example: Shaq’s Free Throws
• We use the standard normal distribution to calculate a p-value for our test statistic
[Figure: standard normal curve; the area above Z = 2.13 is 0.0166]
• Since we used a two-sided alternative, our p-value is 2 x
P(Z > 2.13) = 0.0332
• At the $\alpha = 0.05$ level, we reject the null hypothesis
• Conclusion: Shaq’s free throw success is significantly
different now than early in his career
Confidence Interval: Shaq’s FT
• We want a confidence interval for the difference in Shaq’s
free throw proportion:
$$\hat{p}_1 = \frac{1353}{2425} = 0.558 \qquad \hat{p}_2 = \frac{1121}{2132} = 0.526$$
• Now, we construct a 95% confidence interval:
$$\text{C.I.} = (0.558 - 0.526) \pm 1.96 \sqrt{\frac{0.558 \cdot 0.442}{2425} + \frac{0.526 \cdot 0.474}{2132}} = (0.003, 0.061)$$
• Shaq’s free throw percentage has decreased by
between 0.3 and 6.1 percentage points
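As a check on the Shaq example, the test and the interval can be run together from the raw counts. This is a sketch under the same assumptions as the slides (two-sided alternative, normal approximation, 95% confidence level); the function name `compare_proportions` and the dictionary keys are made up for this illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def compare_proportions(y1, n1, y2, n2, z_star=1.96):
    """Two-sided pooled z-test and unpooled confidence interval for p1 - p2."""
    p1, p2 = y1 / n1, y2 / n2
    # The test uses the pooled standard error, since H0 says p1 = p2.
    pooled = (y1 + y2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    p_value = 2 * normal_cdf(-abs(z))
    # The interval uses the unpooled standard error.
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z_star * se_unpooled
    return {"z": z, "p_value": p_value, "ci": (p1 - p2 - margin, p1 - p2 + margin)}

# Early years: 1353 made of 2425; later years: 1121 made of 2132.
result = compare_proportions(1353, 2425, 1121, 2132)
print(f"Z = {result['z']:.2f}, p-value = {result['p_value']:.4f}")
print(f"95% C.I.: ({result['ci'][0]:.3f}, {result['ci'][1]:.3f})")
```

With unrounded inputs the statistic comes out near 2.17 rather than the slides' 2.13, which divides by a standard error rounded to 0.015. The p-value stays below 0.05 and the interval still rounds to (0.003, 0.061).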
Is Shaq still bad at Free Throws?