
"Would it be mean if I said that you were just average?"
Traner
Chapter 24: COMPARING TWO MEANS (Pages 547 -573)
OVERVIEW: One can use the t-distribution on two samples that are not matched pairs.
Comparing two means is not much different from comparing two proportions. Basically,
one tests the null hypothesis that the two samples came from populations with equal means.
The parameter of interest is the difference between the two population means.
Let
$\bar{x}_1$ = mean of sample of size $n_1$ from population $P_1$
$s_1$ = standard deviation calculated from the sample
$\bar{x}_2$ = mean of sample of size $n_2$ from population $P_2$
$s_2$ = standard deviation calculated from the sample
$\mu_1$ = unknown mean of population $P_1$
$\mu_2$ = unknown mean of population $P_2$
$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Remember that "variances add" but standard deviations do not. The expression under the radical is an
approximation for the variance of the sampling distribution of mean differences. Hence, the square root
is an approximation for the standard deviation of the mean differences.
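A quick numerical check of the "variances add" rule, as a minimal Python sketch. The function name `se_diff` and the sample values are made up for illustration; they are not from the chapter.

```python
import math

def se_diff(s1, n1, s2, n2):
    """Standard error of (x-bar1 - x-bar2) for two independent samples."""
    # Add the two estimated variances of the sample means, then take the root.
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Illustrative values only: s1 = 3 with n1 = 9, s2 = 4 with n2 = 16.
right = se_diff(3, 9, 4, 16)                   # sqrt(9/9 + 16/16) = sqrt(2)
wrong = 3 / math.sqrt(9) + 4 / math.sqrt(16)   # adding the two SDs gives 2 instead
print(round(right, 4), wrong)
```

Adding the standard deviations directly overstates the spread; adding the variances first and then taking the square root gives the smaller, correct value.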
A confidence interval for the difference of population means is called a two-sample t-interval.
The calculation for the degrees of freedom is crazy so we will let the TI do it for us.
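For the curious, the usual approximation for these degrees of freedom is the Welch-Satterthwaite formula. The handout does not name the formula the calculator uses, so treat that identification as an assumption. A minimal sketch, with `welch_df` as a made-up name and illustrative inputs:

```python
def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for a two-sample t procedure (no pooling)."""
    v1 = s1**2 / n1   # estimated variance of the first sample mean
    v2 = s2**2 / n2   # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Illustrative values only, not from the chapter.
print(welch_df(2, 10, 2, 10))   # equal spreads and sizes: close to n1 + n2 - 2
print(welch_df(2, 10, 6, 10))   # unequal spreads: fewer degrees of freedom
```

The result is usually not a whole number. It always falls between the smaller of $n_1 - 1$ and $n_2 - 1$ and the total $n_1 + n_2 - 2$.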
Of course, we can't construct a confidence interval or perform a significance test until we have
checked some assumptions and conditions.
Assumptions/Conditions:
1. Independence Assumption: the data in each group must be drawn independently.
A) Randomization condition: Data must arise from a random sample.
B) 10% condition: The sample is less than 10% of the population.
2. Normal Population Assumption: the underlying populations are each Normally distributed.
A) Nearly Normal Condition: check whether the data in both groups come from a
distribution that is unimodal and symmetric by making a histogram or Normal
probability plot of each group.
3. Independent Groups Assumption: the two groups being compared must be independent of each other.
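To test the null hypothesis $H_0: \mu_1 - \mu_2 = 0$, we calculate the t statistic
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta_0}{SE(\bar{x}_1 - \bar{x}_2)}$$
where $\Delta_0 = 0$ is the hypothesized difference.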
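The t statistic can be sketched in a few lines of Python. The function name `two_sample_t`, the parameter `delta0` (the hypothesized difference), and the numbers passed in are all illustrative assumptions, not values from the chapter.

```python
import math

def two_sample_t(x1, s1, n1, x2, s2, n2, delta0=0.0):
    """t statistic for H0: mu1 - mu2 = delta0, using the unpooled standard error."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return ((x1 - x2) - delta0) / se

# Illustrative values only: means 10 and 8, both SDs 2, both samples of size 16.
t = two_sample_t(10, 2, 16, 8, 2, 16)
print(round(t, 3))
```

The statistic measures how many standard errors the observed difference lies from the hypothesized difference.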
1. Resting pulse rates for a random sample of 26 smokers had a mean of 80 beats per minute (bpm)
and a standard deviation of 5 bpm. Among 32 randomly chosen nonsmokers, the mean and
standard deviation were 74 and 6 bpm. Both sets of data were roughly symmetric and had no
outliers. Is there evidence of a difference in mean pulse rate between smokers and nonsmokers?
How big?
Solution:
Define the parameters.
$\mu_s$ = ____
$\mu_{ns}$ = ____
Hypotheses. $H_0$: ____ between the smokers and the non-smokers mean resting pulse rate.
$H_a$: ____ between the smokers and the non-smokers mean resting pulse rate.
Model. We have independent random samples, each less than 10% of the population, and are told that
the data appear to be approximately Normal. OK to proceed with a 2-sample t-test.
Mechanics. $n_s$ = ____   $n_{ns}$ = ____
$\bar{x}_s$ = ____   $\bar{x}_{ns}$ = ____
$s_s$ = ____   $s_{ns}$ = ____
$t$ = ____
Now we run the test in the calculator to get the degrees of freedom of ____ and round up to ____.
Therefore, P-value = ____
Conclusion. Because the P-value is ____, the observed difference is unlikely to be just sampling
error. We ____ the null hypothesis. We ____ strong evidence of a difference in mean pulse rates for
smokers and nonsmokers.
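To check the calculator work for this example, here is a minimal sketch in plain Python that uses only the summary statistics given in the problem. The helper names (`t_pdf`, `two_sided_p`) are made up, and the P-value is found by numerically integrating the t density with Simpson's rule, which is an assumption of this sketch and not necessarily how the TI computes it.

```python
import math

# Summary statistics from the problem statement
n_s, xbar_s, s_s = 26, 80, 5       # smokers
n_ns, xbar_ns, s_ns = 32, 74, 6    # nonsmokers

v_s, v_ns = s_s**2 / n_s, s_ns**2 / n_ns
se = math.sqrt(v_s + v_ns)                     # standard error of the difference
t = ((xbar_s - xbar_ns) - 0) / se              # hypothesized difference is 0
df = (v_s + v_ns) ** 2 / (v_s**2 / (n_s - 1) + v_ns**2 / (n_ns - 1))

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided P-value: 2 * P(T > |t|), via Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps                               # steps must be even
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3                # P(0 <= T <= |t|)
    return 2 * (0.5 - central_half)

p_value = two_sided_p(t, df)
print(f"SE = {se:.3f}, t = {t:.2f}, df = {df:.2f}, P-value = {p_value:.5f}")
```

The degrees of freedom come out just under 56, which is where the $t^*_{56}$ in the follow-up comes from.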
Follow-up. How big is that difference?
$(\bar{x}_s - \bar{x}_{ns}) \pm t^*_{56} \times SE(\bar{x}_s - \bar{x}_{ns})$ = ____
We can be 95% confident that the average pulse rate for smokers is between ____ and ____ beats per
minute ____ than for non-smokers.
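The interval can be checked the same way. This is a minimal sketch in plain Python using the summary statistics from the problem and 56 degrees of freedom. The helper names are made up, and the critical value $t^*$ is found by bisection on a numerically integrated t distribution, which is an assumption of this sketch; a t-table or the calculator's interval function does the same job.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] plus symmetry."""
    h = x / steps                               # steps must be even
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_star(conf, df):
    """Critical value with central area `conf`, found by bisection."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Summary statistics from the problem statement
n_s, xbar_s, s_s = 26, 80, 5       # smokers
n_ns, xbar_ns, s_ns = 32, 74, 6    # nonsmokers

se = math.sqrt(s_s**2 / n_s + s_ns**2 / n_ns)
margin = t_star(0.95, 56) * se
low = (xbar_s - xbar_ns) - margin
high = (xbar_s - xbar_ns) + margin
print(f"95% interval for mu_s - mu_ns: ({low:.1f}, {high:.1f}) bpm")
```

Because the interval is for smokers minus nonsmokers, positive endpoints mean the smokers' mean pulse rate is the higher one.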
2. Here is the saturated fat content (in grams) of several pizzas sold by two national chains. Be sure
that, in checking the conditions, students plot both sets of data.
Solution:
We want to know if the two pizza chains have significantly different mean saturated fat contents.
Define the parameters. $\mu_D$ = ____   $\mu_{PJ}$ = ____
Hypotheses. $H_0$: ____
The null hypothesis is that there ____ difference in mean saturated fat content.
$H_a$: ____
The alternative hypothesis is that there ____ difference in mean saturated fat content.
Model.
Independent Groups Assumption: The two samples of saturated fat contents were chosen
independently of one another.
Randomization Condition: There is no mention of randomness, so we will assume that these pizzas
are representative of all pizzas by these two chains.
Nearly Normal Condition: Both distributions of saturated fat content are roughly unimodal and
symmetric.
[Figure: histograms of saturated fat content for Brand D and Brand PJ]
Since the conditions have been met, we can do a two-sample t-test for the difference of means,
with ____ degrees of freedom (from the approximation formula).
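Mechanics. $n_D$ = ____   $n_{PJ}$ = ____
$\bar{x}_D$ = ____   $\bar{x}_{PJ}$ = ____
$s_D$ = ____   $s_{PJ}$ = ____
$\bar{x}_D - \bar{x}_{PJ}$ = ____   dof = ____
$$t = \frac{(\bar{x}_D - \bar{x}_{PJ}) - \Delta_0}{\sqrt{\frac{s_D^2}{n_D} + \frac{s_{PJ}^2}{n_{PJ}}}}$$
(here the hypothesized difference $\Delta_0$ is 0)
$t$ = ____
P-value = ____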
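When the problem gives raw data instead of summary statistics, the mechanics start one step earlier. Here is a minimal Python sketch of that step. The two lists are placeholder values chosen for easy arithmetic, NOT the pizza data from this handout, and the function name `mechanics` is made up.

```python
import math
from statistics import mean, stdev

# Placeholder measurements for illustration only (not the handout's pizza data)
group_1 = [1, 2, 3, 4, 5]
group_2 = [2, 4, 6, 8, 10]

def mechanics(a, b, delta0=0.0):
    """Summary statistics, t statistic, and approximate df for two independent samples."""
    n1, n2 = len(a), len(b)
    x1, x2 = mean(a), mean(b)
    s1, s2 = stdev(a), stdev(b)            # sample SDs (divide by n - 1)
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    t = ((x1 - x2) - delta0) / se
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return {"n": (n1, n2), "mean": (x1, x2), "sd": (s1, s2),
            "diff": x1 - x2, "t": t, "df": df}

result = mechanics(group_1, group_2)
for name, value in result.items():
    print(name, value)
```

Swapping in the actual measurements for the two lists produces every quantity in the Mechanics step except the P-value.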
Conclusion. Since the P-value ____ the null hypothesis.
There ____ to suggest that the two pizza brands have different mean saturated
fat content. Brand ____ appears to have more saturated fat on average than Brand ____.
Follow-up. The conditions have been met, so we can create a two-sample t-interval for the difference in
means, with 95% confidence.
$(\bar{x}_D - \bar{x}_{PJ}) \pm t^*_{33} \times SE(\bar{x}_D - \bar{x}_{PJ})$ = ____
I am 95% confident that the average saturated fat content for Brand D is between ____ and ____ grams
____ than the average saturated fat content for Brand PJ.