Statistics 215 Lab Materials
Confidence intervals for difference of means of two independent
populations, µ1 – µ2
Previously, we focused on a single population and parameters calculated from that population. Often we
want to compare two populations. In this section, we will be interested in comparing the means of two
populations. Specifically, we will consider the difference between the means of two populations.
The type of comparison we will make here is between independent populations.
Independent populations imply the two groups are distinct and are not related. We might be interested in
the iron levels of the blood in two different species of baboons. We take a sample from each population
and compare the mean for each sample. Another common occurrence is for the two populations to be
similar but for each population to receive a different treatment. Comparison is then made on a
measurement related to the treatments given. For example, one fourth-grade class at Springfield
elementary might be shown a DVD about volcanoes, while the second fourth grade class at Springfield
elementary would read an article about volcanoes. The two groups would be given the same test about
volcanoes. We could then compare the two groups to see if there is a difference in the means of the two
groups.
We are often interested in whether or not there is a difference between the means of the two
populations. Remember that because of sampling variability a difference in the sample means may not
mean that the population means are different. To account for this variability, we use a confidence interval.
As described below, we can create a confidence interval for the difference of the means of the two
populations. If the two populations have the same mean, then the difference of the means is
0 (zero). For example, call the mean of the first population µ1 and the mean of the second population µ2.
If µ1 = µ2, then µ1 – µ2 = 0. Consequently, when we consider the confidence intervals, we are interested in
whether or not 0 (zero) is inside the confidence interval. If zero is inside the confidence interval, then
zero is a plausible value for the difference, and we cannot conclude that the means of the two populations differ.
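To see why a difference in sample means does not by itself show a difference in population means, here is a minimal simulation sketch. It repeatedly draws two samples from the same Normal population, so the true difference is exactly zero, and records the difference of the sample means. The population mean (50), standard deviation (5), sample sizes, and the function name `sample_mean_differences` are assumptions chosen for illustration, not values from the lab.

```python
import random
import statistics


def sample_mean_differences(mu, sigma, n1, n2, reps, seed=0):
    """Draw two samples from the SAME Normal population `reps` times and
    return the list of differences between the two sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        sample1 = [rng.gauss(mu, sigma) for _ in range(n1)]
        sample2 = [rng.gauss(mu, sigma) for _ in range(n2)]
        diffs.append(statistics.fmean(sample1) - statistics.fmean(sample2))
    return diffs


# Hypothetical population: mean 50, standard deviation 5 (illustrative values).
diffs = sample_mean_differences(mu=50.0, sigma=5.0, n1=25, n2=16, reps=1000)
print("smallest difference:", round(min(diffs), 3))
print("largest difference: ", round(max(diffs), 3))
print("average difference: ", round(statistics.fmean(diffs), 3))
```

Even though both samples come from one population, individual differences land on both sides of zero, while their average stays close to zero. This is the sampling variability a confidence interval is meant to account for.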
Confidence interval for the difference of independent population means, µ1 – µ2
(Small Samples).
With two independent populations, we have two different samples from two different populations. We
need special notation to distinguish the two populations. From the first population, we will have a sample
of size n1. The average of those n1 observations will be X̄1 and the standard deviation will be s1. For the
first population, we will refer to the population mean as µ1 and the population standard deviation as σ1.
From the second population, we will have a sample of size n2. The average of those n2 observations will be
X̄2 and the standard deviation will be s2. For the second population, we will refer to the population mean as
µ2 and the population standard deviation as σ2.
The following (1-α)*100% CI for the difference of independent means can be used when
1. n1 and n2 are both less than 30 and σ1 = σ2 OR
2. n1 and n2 are both more than 2, σ1 = σ2, and the two populations possess Normal distributions.
\[
(\bar{X}_1 - \bar{X}_2) \pm t_{(n_1+n_2-2,\ 1-\alpha/2)} \cdot s_p \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
\]
where
\[
s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}
\]
sp represents an 'average' of the standard deviations (called the pooled standard deviation) from the two
samples. It is necessary to calculate sp before you can complete the calculation of the confidence interval.
Note that
\[
s_{\bar{x}_1-\bar{x}_2} = s_p \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.
\]
Example:
Suppose we want to construct a 90% CI for the difference of independent population means. X̄1 = 49.37, s1
= 4.89, n1 = 25. X̄2 = 52.13, s2 = 5.38, n2 = 16.
\[
s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}
    = \sqrt{\frac{(25-1)\,4.89^2 + (16-1)\,5.38^2}{25+16-2}}
    = 5.084
\]
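Then, using the 0.95 quantile of the t distribution with 39 degrees of freedom, t(39, 0.95) = 1.685,
\[
\begin{aligned}
(\bar{X}_1 - \bar{X}_2) \pm t_{(n_1+n_2-2,\ 1-\alpha/2)} \cdot s_p \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
&= (49.37 - 52.13) \pm t_{(25+16-2,\ 0.95)} \cdot 5.084 \cdot \sqrt{\frac{1}{25} + \frac{1}{16}} \\
&= -2.76 \pm 1.685 \cdot 5.084 \cdot \sqrt{\frac{1}{25} + \frac{1}{16}} \\
&= -2.76 \pm 2.743 \\
&= (-5.503,\ -0.017)
\end{aligned}
\]
So we are 90% confident that the difference µ1 – µ2 is between –5.503 and –0.017.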
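The small-sample calculation can be sketched in a few lines of Python. The function names `pooled_sd` and `pooled_t_interval` are hypothetical helpers written for this illustration. The critical value is passed in as `t_crit` because the Python standard library has no t distribution; the value 1.685 used below is an assumption read from a standard t table (0.95 quantile, 39 degrees of freedom).

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation sp of two samples."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """Confidence interval for mu1 - mu2 using the pooled standard deviation.

    t_crit is the t quantile for n1 + n2 - 2 degrees of freedom at
    probability 1 - alpha/2, looked up by the caller.
    """
    sp = pooled_sd(s1, n1, s2, n2)
    margin = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin


# Summary statistics from the example; t_crit is taken from a t table.
sp = pooled_sd(4.89, 25, 5.38, 16)
low, high = pooled_t_interval(49.37, 4.89, 25, 52.13, 5.38, 16, t_crit=1.685)
print(f"sp = {sp:.3f}")
print(f"90% CI: ({low:.3f}, {high:.3f})")
```

Keeping the critical value as an argument makes the dependence on the table lookup explicit: a different confidence level or different sample sizes only changes `t_crit`, not the rest of the calculation.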
Confidence interval for the difference of independent population means, µ1 – µ2
(Large Samples).
When we have two large samples (each sample has at least 30 observations), we can use the following
formula:
\[
(\bar{X}_1 - \bar{X}_2) \pm z_{(1-\alpha/2)} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
\]
Example:
The lifetimes of calculator batteries are being investigated by Consumer Digest. They find that the average
lifetime of 45 Everset batteries is 125.245 hours and the standard deviation is 34.890 hours. For JordoVac
the average lifetime of 50 batteries is 120.051 hours and the standard deviation is 42.801 hours. Assuming
that both sets of data are approximately Normal, create a 95% confidence interval for the difference in mean
lifetimes for these two batteries.
We have two distinct sets of batteries. Each battery in one population is unrelated to any battery in the
other population, so they are independent populations. For the samples that we have (call the Everset
batteries population 1 and the JordoVac batteries population 2), both n1 and n2 are more than 30. Similarly,
their standard deviations are approximately equal; they are close enough that we can claim that σ1 = σ2.
Consequently, we can use the formula above to make our confidence interval.
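\[
\begin{aligned}
(\bar{X}_1 - \bar{X}_2) \pm z_{(1-\alpha/2)} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
&= (125.245 - 120.051) \pm z_{(0.975)} \cdot \sqrt{\frac{34.890^2}{45} + \frac{42.801^2}{50}} \\
&= 5.194 \pm 1.96 \cdot \sqrt{\frac{34.890^2}{45} + \frac{42.801^2}{50}} \\
&= 5.194 \pm 1.96 \cdot \sqrt{63.690} \\
&= 5.194 \pm 15.642 \\
&= (-10.448,\ 20.836)
\end{aligned}
\]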
We are 95% confident that the difference in mean calculator battery lifetimes between Everset and
JordoVac is between –10.448 and 20.836 hours.
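The battery example can be reproduced with the Python standard library, whose `statistics.NormalDist` class provides the Normal quantile through `inv_cdf`. The function name `large_sample_interval` is a hypothetical helper for this sketch; it assumes both samples are large enough for the z-based formula above.

```python
import math
from statistics import NormalDist


def large_sample_interval(x1, s1, n1, x2, s2, n2, confidence=0.95):
    """Large-sample confidence interval for mu1 - mu2 (z-based formula)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)          # about 1.96 for 95%
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)          # standard error of the difference
    diff = x1 - x2
    return diff - z * se, diff + z * se


# Everset is population 1, JordoVac is population 2.
low, high = large_sample_interval(125.245, 34.890, 45, 120.051, 42.801, 50)
print(f"95% CI: ({low:.3f}, {high:.3f})")
print("zero inside interval:", low <= 0 <= high)
```

The last line makes the check on zero explicit, which is the comparison the interpretation of the interval relies on.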
Note that the difference in the sample means was 5.194; however, since zero is inside the confidence
interval, 0 is a plausible value for the difference between the
population means. This is because of the variability present from sample to sample. Because of this
variability, we cannot conclude, at the 95% confidence level, that there is a difference between the means of these
two populations.
Some notes on confidence intervals:
1. This chapter on confidence intervals is the first that develops ideas that are statistical. For most people it
is a new way of thinking. It implies that a point estimate of a parameter is not the parameter. This forces
us to acknowledge the variability from sample to sample. And we must recognize that there is sampling
variability in any estimate, which includes almost every statistic reported in the media.
2. The reason for using a CI is the variability that comes from one sample to the next. Each sample is
different, so each sample gives us a different value for a statistic. The width of a confidence interval gives an
indication of how much variability there is in the sample it was derived from. Another way to think of this
is that the smaller the variability in the sample, the more information we have about the location of the
mean.
3. There are three factors that influence the size or width of a confidence interval.
• n, as n increases, the width of the CI decreases.
• Confidence level (1-α), as confidence level increases, the width of the CI increases.
• s, the bigger s is, the wider the CI is.
4. As mentioned in the previous note, the width of a confidence interval is affected by the number of
observations in a sample (or samples). It is possible to determine the minimum sample size required to estimate a population
parameter with a specified precision at a given confidence level.
5. The consequence of not having the assumptions met for a particular confidence interval is that the
confidence level is likely incorrect. In almost all cases this means that the confidence level is lower than it
should be. That is, if we make a 95% confidence interval but not all the assumptions are met for this
interval, then the true confidence level will be less (often much less) than 95%.
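Notes 3 and 4 can be illustrated with a short sketch based on the large-sample formula. It assumes equal sample sizes n in both groups, in which case the half-width (margin) of the interval is z * sqrt((s1^2 + s2^2) / n), and solving for n gives a minimum sample size per group. The helper names `margin` and `min_n_per_group` and the standard deviations 35 and 43 are assumptions made for illustration; in practice the standard deviations would have to be guessed before the data are collected.

```python
import math
from statistics import NormalDist


def margin(s1, s2, n, confidence):
    """Half-width of the large-sample interval with n observations per group."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt((s1**2 + s2**2) / n)


def min_n_per_group(s1, s2, target_margin, confidence):
    """Smallest common sample size whose margin is at most target_margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * (s1**2 + s2**2) / target_margin**2)


# Illustrative standard deviations, loosely in the range of the battery example.
base = margin(s1=35.0, s2=43.0, n=50, confidence=0.95)
print("baseline margin:           ", round(base, 2))
print("larger n (200 per group):  ", round(margin(35.0, 43.0, 200, 0.95), 2))
print("higher confidence (99%):   ", round(margin(35.0, 43.0, 50, 0.99), 2))
print("larger s (both doubled):   ", round(margin(70.0, 86.0, 50, 0.95), 2))
print("n per group for margin 10: ", min_n_per_group(35.0, 43.0, 10.0, 0.95))
```

Comparing each line with the baseline shows the three effects listed in note 3: more observations narrow the interval, while a higher confidence level or a larger standard deviation widen it.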