Section 9.4
Inference for the Difference Between Two Means
To determine the difference between two
population means, can we take $(\mu_1 - \mu_2)$?
Usually not, because we often do not know
the true population means.
To estimate the size of the difference
between the mean of one population and
the mean of another population, we can
use a confidence interval for the
difference between two means, $(\mu_1 - \mu_2)$.
This is known as a two-sample t-interval.
A confidence interval for the difference
between two means has the standard form:
statistic ± (critical value) · (standard deviation of statistic)
Unlike the one-sample case, the sampling
distribution of the statistic for the
difference of two samples does not have
a t-distribution.
The exact distribution is not even known.
However, it is known that the distribution is
reasonably close to a t-distribution if the
right number of degrees of freedom is
used.
So, how do you determine the right
number of df?
For dealing with the difference between two
means, df is approximated by a rather
complicated rule.
What do you notice about the degrees of
freedom (df) in this calculator display?
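The transcript does not name the "rather complicated rule". The sketch below assumes it is the Welch-Satterthwaite approximation, which is the usual choice for the unpooled two-sample t procedure. The function name `welch_df` is illustrative, and the inputs are the summary statistics from the P27 example later in this transcript. It shows why the df on a calculator display is typically not a whole number.

```python
def welch_df(s1, n1, s2, n2):
    """Approximate df for the unpooled two-sample t procedure.

    Assumption: the Welch-Satterthwaite rule; `welch_df` is an illustrative name.
    """
    v1 = s1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Summary statistics from the P27 example (left-handed n=7, right-handed n=23).
df = welch_df(14.77, 7, 15.71, 23)
print(round(df, 2))  # roughly 10.5, so not an integer
```

Under this rule the df always falls between the smaller of n1 - 1 and n2 - 1 (here 6) and n1 + n2 - 2 (here 28).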
To construct a confidence interval for the
difference between two means, what do
we need to do?
• Check conditions
• Do computations
• Give interpretation in context
Check Conditions
1) For a survey, two samples are randomly and
independently selected from two
different populations.
For an experiment, two treatments are randomly
assigned to the available experimental units.
2) Normality: the two samples must look like
they came from normally distributed
populations,
or
the sample sizes are large enough that the
sampling distributions of the sample means
will be approximately normal (15/40 guideline).
The 15/40 guideline can be applied to each
sample or treatment group, although it is a
bit conservative.
For the difference of two means, we can allow
the populations to be more skewed than
we did previously when estimating the mean of a
single population.
3) For a survey, each population should be
at least ten times as large as the sample
drawn from it.
Remember, this condition does not apply to an
experiment.
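Do Computations
The confidence interval for the difference between the
means of two populations, $(\mu_1 - \mu_2)$, is:
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where $\bar{x}_1$ and $\bar{x}_2$ are the respective means of the
two samples, $s_1$ and $s_2$ are the standard
deviations, and $n_1$ and $n_2$ are the sample
sizes.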
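As a rough cross-check on the calculator, the interval can be computed from summary statistics in Python. This is a sketch under stated assumptions, not the calculator's actual algorithm. It assumes the unpooled procedure with Welch-Satterthwaite df, and it finds t* numerically with only the standard library (Simpson's rule plus bisection). All function names are illustrative. The inputs are the P27 summary statistics that appear later in this transcript.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution; df may be fractional."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x], plus 0.5 by symmetry."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(conf_level, df):
    """t* such that the central area under the t curve equals conf_level."""
    target = 1 - (1 - conf_level) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t*
        hi *= 2
    for _ in range(60):            # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def two_sample_t_interval(x1, s1, n1, x2, s2, n2, conf_level=0.95):
    """Unpooled interval for mu1 - mu2 from summary statistics."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)  # standard error of (x1 - x2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    margin = t_critical(conf_level, df) * se
    diff = x1 - x2
    return diff - margin, diff + margin


low, high = two_sample_t_interval(59.57, 14.77, 7, 58, 15.71, 23)
print(round(low, 2), round(high, 2))  # close to the calculator's (-12.76, 15.899)
```

Small differences from the calculator output are expected because the summary statistics here are rounded.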
Because the value of t* depends on a complicated
formula, use 2-SampTInt under STAT TESTS.
You can start with actual data or summary
statistics.
Give Interpretation in Context and Link to Computations
For a survey, with a 95% confidence interval:
I’m 95% confident that if I knew the means
of both populations, the difference
between those means, $(\mu_1 - \mu_2)$, would lie
in the confidence interval.
For an experiment:
If all experimental units could have been
assigned each treatment, I’m 95%
confident that the difference between the
means of the two treatment groups would
lie in the confidence interval.
In either case of survey or experiment, you
must give the interpretation in context,
describing the two populations or
treatment groups.
You constructed a two-sample t-interval at
the 95% confidence level for the difference
between the mean test scores on a final
stats exam for 4th hour and 6th hour
classes.
Your interval is (-0.23, 4.51).
Interpret your interval.
I’m 95% confident that the true difference
between the mean test scores on a final
exam for two stats classes, $\mu_{\text{4th}} - \mu_{\text{6th}}$, is
in the interval (-0.23, 4.51).
Because this interval contains 0, it is
plausible that there is no difference in the
true mean test scores for these two stats
classes.
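The "does the interval contain 0?" check can be written as a one-line test. The helper name `zero_is_plausible` is illustrative. The two intervals are the ones reported in this transcript, for the stats classes and for the E53 hamster experiment.

```python
def zero_is_plausible(interval):
    """True when 0 lies inside the interval, so 'no difference' is plausible."""
    low, high = interval
    return low <= 0 <= high


print(zero_is_plausible((-0.23, 4.51)))    # True: no significant difference
print(zero_is_plausible((0.50251, 9.46)))  # False: significant difference
```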
Page 631, P27
Remember, you must explain why you
believe or do not believe the necessary
conditions are met.
a) Problem states we can assume these
volunteers are independent, random
samples.
Both dot plots are fairly symmetric with
no outliers so it is reasonable to assume
that both samples are taken from
populations that are approximately
normally distributed.
Both populations are more than 10
times as large as their respective samples
(10 × 7 = 70 and 10 × 23 = 230), as there
are more than 70 left-handed people and
more than 230 right-handed people.
b) For consistency in our answer, use
left-handed volunteers as sample 1
and right-handed volunteers as
sample 2.
Use 2-SampTInt.
To Pool or Not to Pool?
Almost always select the unpooled option.
The only situation in which the pooled
procedure has a definite advantage over
the unpooled one is when the population standard
deviations are equal but the sample sizes
are unequal.
Population standard deviations are usually
unknown, so we can rarely be sure they are equal.
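To see what the choice changes, the sketch below compares the standard error and df under the two options. It assumes the textbook pooled formulas (pooled variance with df = n1 + n2 - 2) and Welch-Satterthwaite df for the unpooled case. The function names are illustrative, and the first example reuses the P27 summary statistics from this transcript.

```python
import math


def unpooled(s1, n1, s2, n2):
    """Standard error and Welch-Satterthwaite df without pooling."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, df


def pooled(s1, n1, s2, n2):
    """Standard error and df when one common variance is assumed."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return se, df


# Unequal sample sizes (P27): the two options give different se and df.
print(unpooled(14.77, 7, 15.71, 23))
print(pooled(14.77, 7, 15.71, 23))

# Equal sample sizes: the standard errors agree, only the df differ.
print(unpooled(10, 20, 12, 20)[0], pooled(10, 20, 12, 20)[0])
```

With equal sample sizes the two standard errors coincide, which is consistent with pooling mattering only when the sample sizes are unequal.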
Page 631, P27
2-SampTInt
Inpt: Stats (not Data)
x̄1: 59.57
Sx1: 14.77
n1: 7
x̄2: 58
Sx2: 15.71
n2: 23
C-Level: .95
Pooled: No
Calculate
b) With left-handed volunteers as sample 1 and
right-handed volunteers as sample 2, the 95%
interval is (-12.76, 15.899).
c) I’m 95% confident that the difference
between the mean distance all left-handed
volunteers could walk before crossing a
sideline and the mean distance all right-handed
volunteers could walk before
crossing a sideline is in the interval
(-12.76, 15.899).
d) Because the interval contains 0, we do
not have statistically significant evidence
that left- and right-handed volunteers differ
in the mean number of yards they can
walk before crossing a sideline.
Page 632, E53
a) The treatments, raised in long days or
raised in short days, were randomly
assigned to subjects.
a) Both distributions are moderately skewed,
but neither has any outliers.
Since the distribution of the difference
reduces skewness, and the t-procedure is
robust against non-normality, the conditions
for inference are adequately met to proceed.
b. The interval is (0.50251, 9.46).
c. The difference between the mean enzyme
concentration of all eight hamsters had they
all been raised in short days and the mean
concentration had they all been raised in
long days.
d. Because 0 is not in the confidence
interval, and the treatments were
randomly assigned, Kelly has statistically
significant evidence that the difference in
enzyme concentrations between the two
groups of hamsters is due to the difference
in the amount of daylight.
Questions?