Download AP_Statistics_Week_24_files/Bock - CI for 2

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Degrees of freedom (statistics) wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Time series wikipedia , lookup

Statistical inference wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Student's t-test wikipedia , lookup

Misuse of statistics wikipedia , lookup

Transcript
Comparing Two Means
• Comparing two means is not very different
from comparing two proportions.
• This time the parameter of interest is the
difference between the two means,
 1 –  2.
Comparing Two Means (cont.)
• Remember that, for independent random
quantities, variances add.
• So, the standard deviation of the difference
between two sample means is
SD  x1  x2  
 12
n1

 22
n2
• We still don’t know the true standard deviations
of the two groups, so we need to estimate and
use the standard error
s12 s22
SE  x1  x2  

n1 n2
Comparing Two Means (cont.)
• Because we are working with means and
estimating the standard error of their
difference using the data, we shouldn’t be
surprised that the sampling model is a
Student’s t.
– The confidence interval we build is called a
two-sample t-interval (for the difference in
means).
Assumptions and Conditions
• Independence Assumption (Each condition
needs to be checked for both groups.):
– Randomization Condition: Were the data collected
with suitable randomization (representative random
samples or a randomized experiment)?
– 10% Condition: We don’t usually check this condition
for differences of means. We will check it for means
only if we have a very small population or an
extremely large sample.
• Normal Population Assumption:
– Nearly Normal Condition: This must be checked for
both groups. A violation by either one violates the
condition.
Assumptions and Conditions
(cont.)
• Independent Groups Assumption: The two
groups we are comparing must be independent
of each other.
Two-Sample t-Interval
When the conditions are met, we are ready to find the confidence
interval for the difference between means of two independent
groups.
The confidence interval is
 x1  x2   t

df
 SE  x1  x2 
where the standard error of the difference of the means is
s12 s22
SE  x1  x2  

n1 n2
The critical value depends on the particular confidence level, C, that you
specify and on the number of degrees of freedom, which we get from the
sample sizes and a special formula.
Degrees of Freedom
• The special formula for the degrees of
freedom for our t critical value is a bear:
2
s
s 



n
n
2 
 1
df 
2
2
2
2




s1
s2
1
1

 
 
n1  1  n1  n2  1  n2 
2
1
2
2
• Because of this, we will let technology
calculate degrees of freedom for us!
• Or use the smaller of n1 or n2
Example:
The table below gives information concerning the number of gallons
used by ten different families in two different cities. Find a 95%
confidence interval for the difference in the means.
City X 26 27 31 29 36 19 20 34 26 27
City Y 28 26 30 32 32 35 28 21 33 34
1 = population mean number of gallons for City X
2 = population mean number of gallons for City Y
Step 1: Check Assumptions/Conditions
• We will assume a SRS
• ’s are unknown, so we will use t-distribution
• Populations are independent
City X
Based on the linearity of the
Normal Probability plots, we have
an approximately normal
distributions.
City Y
Example:
The table below gives information concerning the number of gallons
used by ten different families in two different cities. Find a 95%
confidence interval for the difference in the means.
City X 26 27 31 29 36 19 20 34 26 27
City Y 28 26 30 32 32 35 28 21 33 34
Step 2: Calculate Confidence Interval
2
1
2
2
s
s
 x1  x2   t * 
n1 n2
5.42 4.252

 27.5  29.9   2.262
10
10
 7.315, 2.515
Paired Data
• Data are paired when the observations are
collected in pairs or the observations in one
group are naturally related to observations in the
other group.
• Paired data arise in a number of ways. Perhaps
the most common is to compare subjects with
themselves before and after a treatment.
– When pairs arise from an experiment, the pairing is a
type of blocking.
– When they arise from an observational study, it is a
form of matching.
Paired Data (cont.)
• If you know the data are paired, you can (and
must!) take advantage of it.
– To decide if the data are paired, consider how they
were collected and what they mean (check the W’s).
– There is no test to determine whether the data are
paired.
• Once we know the data are paired, we can
examine the pairwise differences.
– Because it is the differences we care about, we treat
them as if they were the data and ignore the original
two sets of data.
Blocking
• Consider estimating the
mean difference in age
between husbands and
wives.
• The following display is
worthless. It does no
good to compare all the
wives as a group with all
the husbands—we care
about the paired
differences.
Blocking (cont.)
• In this case, we have paired data—each
husband is paired with his respective wife. The
display we are interested in is the difference in
ages:
Insert histogram from page 579 of the text.
Blocking (cont.)
• Pairing allows us to concentrate on the
variation associated with the difference in
age for each pair.
• A paired design is an example of blocking.
Paired Data (cont.)
• Now that we have only one set of data to
consider, we can return to the simple onesample confidence interval for means.
• Mechanically, a paired t-interval is just a
one-sample confidence interval for the
mean of the pairwise differences.
– The sample size is the number of pairs.
Assumptions and Conditions
• Paired Data Assumption: The data must be
paired.
• Independence Assumption: The differences
must be independent of each other. Check the:
– Randomization Condition
• Normal Population Assumption: We need to
assume that the population of differences follows
a Normal model.
– Nearly Normal Condition: Check this with a
histogram or Normal probability plot of the
differences.
Confidence Intervals for
Matched Pairs
• When the conditions are met, we are ready to find the
confidence interval for the mean of the paired
differences.
• The confidence interval is
xd  t

n1
 SE  xd 
where the standard error of the mean difference is
sd
SE  xd  
n
The critical value t* depends on the particular confidence level, C,
that you specify and on the degrees of freedom, n – 1, which is
based on the number of pairs, n.
Confidence Intervals for Matched Pairs
Example:
Ten different families are tested for the number of gallons of water a day
they use before and after viewing a conservation video. Construct a 90%
confidence interval for the mean of the difference of the "before" minus
the "after" times.
Before 33 33 38 33 35 35 40 40 40 31
After 34 28 25 28 35 33 31 28 35 33
Before – After -1 5
13 5
0
2
9 12 5 -2
d = population mean difference in number of gallons for the ten
families before and after viewing the video
Step 1: Check Assumptions/Conditions
• We will assume a SRS
•  is unknown, so we will use
t-distribution
• Populations are dependent
Based on the linearity of
the Normal Probability plot,
we have an approximately
normal distribution.
Confidence Intervals for Matched Pairs
Example:
Ten different families are tested for the number of gallons of water a day
they use before and after viewing a conservation video. Construct a 90%
confidence interval for the mean of the difference of the "before" minus
the "after" times.
Before 33 33 38 33 35 35 40 40 40 31
After 34 28 25 28 35 33 31 28 35 33
Before – After -1 5
13 5
0
2
9 12 5 -2
Step 2: Calculate Confidence Interval
sd
 5.245 
xd  t *
 4.8  1.833 
  1.760, 7.840 
n
 10 
Confidence Intervals for Matched Pairs
Example:
Ten different families are tested for the number of gallons of water a day
they use before and after viewing a conservation video. Construct a 90%
confidence interval for the mean of the difference of the "before" minus
the "after" times.
Before 33 33 38 33 35 35 40 40 40 31
After 34 28 25 28 35 33 31 28 35 33
Before – After -1 5
13 5
0
2
9 12 5 -2
Step 3: Conclusion
We are 90% confident that the population mean difference of
number of gallons of water used “before” versus “after” viewing
the conservation video is between 1.760 and 7.840.