4. Repetition of Trials: Instead of actually sampling 100 times from the
population and finding the means, we imagine what the results of such
sampling would be. We can skip to step 5 and use a t table instead of doing
simulations to evaluate P(t > 1.32).
5. Estimation of the Probability of the Obtained Average or More
(Probability of a Successful Trial): We want to know the chance of
obtaining an average increase in number of hours of sleep as large as or
larger than 0.75. This will equal the chance that when we sample from a
population with a mean of 0, the t statistic is as large as or larger than 1.32.
We have to look to the t table given by Appendix F. As with the chi-square
tables, we need to know the number of degrees of freedom (df). Recall that
n denotes our sample size. For this t test the number of degrees of freedom
is n − 1, which in this case is 9. Looking in the t table, we see that for df = 9,
the number 1.38 is in the column under 0.10. What this means is that the
chance that the t statistic is above 1.38 is 0.1. We want the chance that the t
statistic is above 1.32, which must be slightly more than 0.1.
6. Decision: We found in step 5 that the probability of obtaining from
the null-hypothesis population a t statistic of 1.32 or greater is slightly
more than 0.1. A result that likely is not unusual, so we decide to accept
the null hypothesis: we are not convinced that the dextro treatment does
any good.
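When no t table is at hand, the tail probability used in step 5 can be approximated numerically. The following is a minimal sketch and not part of the original procedure: it integrates the Student t density with Simpson's rule using only the Python standard library. The function names `t_pdf` and `t_upper_tail` are illustrative choices.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0.

    Uses symmetry: P(T > t) = 0.5 - integral of the density from 0 to t,
    with the integral computed by Simpson's rule (steps must be even).
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

p_table = t_upper_tail(1.38, 9)     # the table entry under 0.10 for df = 9
p_observed = t_upper_tail(1.32, 9)  # the probability needed in step 5
print(round(p_table, 3), round(p_observed, 3))
```

The first value should come out close to 0.10, matching the table, and the second slightly larger, in line with the reasoning in step 5.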
SECTION 12.5 EXERCISES
1. For the data in Table 12.4, test whether the
mean population increase in hours due to the
laevo drug is 0. The mean of the 10 observations is 2.33, and the standard deviation
is 1.90.
2. For the data in Table 12.4, test whether the
mean population difference in hours of sleep
between laevo and dextro is 0. The mean of
the 10 differences is 1.58, and the standard
deviation is 1.17.
3. A farmer believes the corn yield from his land
will have a mean of 50 bushels per acre. In
a random sample of eight plots, the farmer
finds a mean yield of 49.25 bushels per acre
with standard deviation of 3.25 bushels per
acre. Test the farmer’s claim. (The data have
an approximately normal histogram.)
4. Suppose the producer of cattle feed in Exercise
1 of Section 12.4 claims that the mean weight
gain will be 160 pounds, not 100 pounds.
Use the data from the problem to test the
company’s claim.
For additional exercises, see page 731.
12.6 COMPARING TWO POPULATION MEANS
The Bootstrap Approach
Until now we have tested hypotheses for one population. But the most
common statistical problem is to compare two population means. That is
the topic of this section.
A small study was conducted several years ago to determine whether
taking the drug LSD has deleterious effects on one’s chromosomes. Four
users of LSD and four controls (nonusers) were studied. A sample of cells
was taken from each person, and the percentage of cells in which there was
chromosomal breakage was recorded. Here are the data:
Controls:   3.3   4.8   6.4    7.1    Mean = 5.40   SD = 1.47
Users:      0.9   2.6   3.4   11.5    Mean = 4.60   SD = 4.08
That is, the first control had breakage in 3.3% of the studied cells, the second
control had 4.8% breakage and so on. Looking at the mean breakages, we
see that actually the users had a slightly smaller amount of breakage. Is that
difference due to chance? Because there is such a small sample, one would
be inclined to say yes. But how can one formally test the hypothesis?
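The summary figures for the two samples can be checked with a few lines of Python. This is a small sketch using the standard `statistics` module; the variable names are illustrative. It assumes the listed SDs use the divide-by-n formula (`pstdev`), since that is the version that reproduces 1.47 and 4.08 from these data.

```python
from statistics import mean, pstdev

controls = [3.3, 4.8, 6.4, 7.1]
users = [0.9, 2.6, 3.4, 11.5]

for name, data in (("Controls", controls), ("Users", users)):
    # pstdev divides by n; statistics.stdev would divide by n - 1 instead
    print(f"{name}: mean = {mean(data):.2f}, SD = {pstdev(data):.2f}")

# Difference in sample means, controls minus users
observed_diff = mean(controls) - mean(users)
print(f"Difference in means = {observed_diff:.2f}")
```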
What, precisely, is the null hypothesis? It is that the mean breakages in
the two populations are equal, the populations being the population of users
and the population of controls:
H0 : The mean breakage in the control population
= the mean breakage in the user population
We will focus on the difference between sample means of the controls
and the users in the sample: 5.40 − 4.60 = 0.80. Is this difference small
enough to be compatible with the null hypothesis? What is the chance of
getting a difference that large if the two population means are the same?
Let’s turn to the six steps. There are some modifications because we have
two populations to worry about instead of just one.
1. Choice of a Model (Two Populations): We have two populations
now, the controls and the users. The null hypothesis insists only that the
two populations have the same mean; beyond that, we merely expect the
populations to have the same distributional shapes as the samples. We have
no reason to assume that the populations are normal in shape, so we will
not try a t-test approach. Instead, we will use a bootstrap approach, as
introduced in Section 12.1.
We invent two populations. The control population will be the invented
population obtained by replicating the control data:
Invented control population
3.3   4.8   6.4   7.1
3.3   4.8   6.4   7.1
3.3   4.8   6.4   7.1
 ⋮     ⋮     ⋮     ⋮
3.3   4.8   6.4   7.1
The mean of that invented control population is still 5.40. In order for our
invented user population to have that same mean, we have to add 0.80 to
each value in the sample of users before replicating, so that the new values
are 0.9 + 0.8 = 1.7, 2.6 + 0.8 = 3.4, 3.4 + 0.8 = 4.2, and 11.5 + 0.8 = 12.3.
These new numbers have a mean of 5.40, as desired. Thus we now
have two populations with the same population mean of 5.40. Moreover,
we have created two populations whose shapes closely approximate the
unknown shapes of the true populations. This is true because the two sample
histograms that form our created populations are in fact good estimates of
the unknown population histograms.
Invented user population
1.7   3.4   4.2   12.3
1.7   3.4   4.2   12.3
1.7   3.4   4.2   12.3
 ⋮     ⋮     ⋮      ⋮
1.7   3.4   4.2   12.3
2. Definition of a Trial (Sample): A trial consists of randomly choosing
without replacement four persons from the control population and four
from the user population. (We could have avoided the creation of the
large null hypothesis populations and instead sampled with replacement
from small populations—review Section 12.1 if needed to understand this
assertion.)
3. Definition of a Successful Trial: The statistic of interest is the difference
between the means of the four controls and four users sampled in step 2. A
trial is a success if the difference is 0.8 or more.
4. Repetition of Trials: We repeat steps 2 and 3 100 times. Thus we obtain
100 simulated differences in means. Table 12.5 gives the stem-and-leaf plot
of the 100 simulated differences.
5. Estimation of the Probability of the Obtained Difference in Means
or More (Probability of a Successful Trial): The difference in means
obtained from the original data was 0.8. If we look at the stem-and-leaf plot
of step 4, we see that 36 of the simulated differences were 0.8 or more.
6. Decision: From step 5 we estimate a 0.36 chance that under the null
hypothesis, the difference in the means will be as large as or larger than what
we obtained. That is, it is not at all unusual to see a difference of 0.8 when
the populations have the same mean. Thus we accept the null hypothesis.
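The six steps above can be sketched in Python as follows. This is an illustrative sketch rather than the simulation that produced the text's results: it samples with replacement from the four original (shifted) values, which step 2 notes is equivalent to sampling from the large replicated populations, and it runs 10,000 trials instead of 100. The function name `bootstrap_p_value`, the seed, and the small tolerance for floating-point rounding are assumptions introduced here.

```python
import random
from statistics import mean

controls = [3.3, 4.8, 6.4, 7.1]
users = [0.9, 2.6, 3.4, 11.5]
observed_diff = mean(controls) - mean(users)   # 5.40 - 4.60 = 0.80

# Step 1: shift the users so both invented populations have mean 5.40.
shifted_users = [u + observed_diff for u in users]

def bootstrap_p_value(n_trials, seed=0):
    """Fraction of trials whose difference in means is at least the observed one."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the run is repeatable
    successes = 0
    for _ in range(n_trials):
        # Step 2: one trial = four draws, with replacement, from each population.
        c = rng.choices(controls, k=len(controls))
        u = rng.choices(shifted_users, k=len(users))
        # Step 3: success if the simulated difference is 0.8 or more
        # (1e-9 guards against floating-point rounding at exactly 0.8).
        if mean(c) - mean(u) >= observed_diff - 1e-9:
            successes += 1
    # Steps 4 and 5: repeat and estimate the probability of a success.
    return successes / n_trials

p_hat = bootstrap_p_value(10_000)
print(f"Estimated P(difference >= {observed_diff:.2f}) = {p_hat:.3f}")
```

With many trials the estimate should settle in the same general region as the 0.36 obtained from 100 trials, leading to the same decision in step 6.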
Now is a good time to reiterate that accepting the null hypothesis is
not the same as believing that the null hypothesis is actually true. All we
have done is show that the null hypothesis and the data are compatible.
It could very well be that the reason we do not see much difference in
the means is that we have too small a sample. This analysis does not
prove that LSD has no effect on chromosomal breakage. Nor does it prove
that LSD does affect chromosomal breakage. We cannot conclude much of
anything! In fact, the actual means suggest that LSD could actually retard
chromosomal damage, something that is very unlikely to be true. Perhaps
the real problem is the very small sample size of four from each population!

Table 12.5 One Hundred Simulated Mean Differences for LSD Example
Stem | Leaf
 −6  | 4
 −5  |
 −4  | 310
 −3  | 8887652210
 −2  | 9754222
 −1  | 65433220
 −0  | 9766664443332222221100
  0  | 122223344456689
  1  | 022333378888
  2  | 000122456788899
  3  | 002448
  4  |
  5  | 0
Key: “−6 | 4” stands for −6.4.
Larger Sample Sizes: Another z Test
In Section 11.7 we learned that if both sample sizes are fairly large (each ≥ 30
is a common rule), the difference in the two sample means is approximately
normally distributed. Hence, if the sample sizes are fairly large, we can use
this fact to find a z statistic to test a hypothesis involving the equality of
two population means. Let’s consider an example. A beginning statistics
class (STAT 100) at the University of Illinois at Urbana-Champaign had 104
students, 64 women and 40 men. The average percentage on homework
assignments among the women was 78.56, and that among the men was
75.04. Thus the women did on average 3.52 percentage points better than
the men. Was this due to chance, or do women in general perform better on
the homework? The standard deviations were 25.08 for the men and 19.54
for the women.
We first have to decide what the populations we are trying to compare
are. The students in the class are not a random sample from the entire
population of the United States, nor even of the university. It does seem
reasonable to think of these people as a random sample of all the people
who take STAT 100 now or will do so in the near future. We will go on that
supposition. The null hypothesis is that the average percentage scores on
homework are the same for the populations of women and of men:
H0 : The mean homework score for the population of women
= the mean homework score for the population of men
We will follow the six steps, except that we will introduce a new z statistic
to replace the bootstrap simulations. That is, we will develop a z test for
comparing the means. Although it is good practice for educational purposes
to develop the bootstrap-based steps 1–4, we will, as before with z-testing,
not need to do any simulations.
1. Choice of a Model (Populations): The two populations are the women
who take STAT 100 and the men who take STAT 100. The null hypothesis
is that the two populations have the same theoretical mean. We invent two
populations so that they have a common theoretical mean. The population
of men is a large replication of the sample of 40 men. Their mean is 75.04, so
to create a population of women with the same mean, we have to subtract
3.52 from each woman’s score before replication, and then replicate the new
data as often as we did for the male population (maybe 500 or 1000 times
for each original observation).
2. Definition of a Trial (Sample): A trial consists of randomly choosing 64
students without replacement from the women and 40 without replacement
from the men. (Again, we could have replaced replication and sampling
without replacement with no replication and sampling with replacement.)
3. Definition of a Successful Trial: The statistic is the difference between
the mean of the 64 women and the mean of the 40 men sampled in step 2
(women minus men). A trial is a success if this difference is 3.52 or more.
4. Repetition of Trials: Instead of actually bootstrap sampling from the
step 1 population, we appeal to theory, which tells us what the theoretical
average and standard deviation of such differences of two sample means are.
The theoretical mean of the differences in the two sample means is 0 under
the null hypothesis. The theoretical standard deviation of the differences in
the two sample means is
\[
\sqrt{\text{theoretical variance of difference in means}}
= \sqrt{\frac{(\text{theoretical SD of men})^2}{\text{number of men}}
+ \frac{(\text{theoretical SD of women})^2}{\text{number of women}}}
\]
This formula is based on a result that is fully developed in Chapter 14 and
also was discussed in Section 11.1. Here we give a brief partial justification.
First note that by theoretical SD of men we mean the theoretical SD of the
male population, and similarly for the theoretical standard deviation of
the women. Chapter 14 will show that the theoretical variance of either
the sum or difference of two independent statistics is the sum of the
individual variances. Thus
\[
\begin{aligned}
&\text{Theoretical variance of (sample average of men} - \text{sample average of women)} \\
&\quad= \text{theoretical variance of sample average of men}
 + \text{theoretical variance of sample average of women} \\
&\quad= \frac{\text{theoretical variance of male population}}{\text{number of men in sample}}
 + \frac{\text{theoretical variance of female population}}{\text{number of women in sample}} \\
&\quad= \frac{(\text{theoretical SD of men})^2}{\text{number of men in sample}}
 + \frac{(\text{theoretical SD of women})^2}{\text{number of women in sample}}
\end{aligned}
\]
The women’s and men’s theoretical standard deviations in the formula are
estimated from the corresponding sample standard deviations computed
from the women’s and men’s data. Thus for our data
\[
\text{Estimated SD of the differences in means}
= \sqrt{\frac{(25.08)^2}{40} + \frac{(19.54)^2}{64}}
= \sqrt{21.69} = 4.66.
\]
Now we can define the z statistic, using the fact that the subtracted 0 is
the hypothesized difference under the null hypothesis:
\[
z = \frac{\text{Difference in sample means} - 0}{\text{Sample SD of difference in means}}
= \frac{3.52}{4.66} = 0.76
\]
5. Estimation of the Probability of the Obtained Difference in Means
or More (Probability of Success): We wish to calculate the chance that
the difference in means is equal to or exceeds 3.52, which is approximately
the chance that a standard normal exceeds z ⳱ 0.76, so all we need to do is
look up 0.76 in the normal table (Appendix E). The area to the left of 0.76
is 0.7764, so the area to the right is 0.2236. That is, there is approximately a
0.22 chance of seeing a difference of 3.52 or more just by chance.
6. Decision: The chance calculated in step 5, 0.22, shows that it is not unusual to see such a difference when the null hypothesis that the populations
have the same means is true. Thus we accept the null hypothesis.
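The arithmetic in steps 4 and 5 can be reproduced in a few lines. This is a sketch using only Python's `math` module; the variable names are illustrative, and the upper-tail normal probability is computed from the complementary error function rather than read from Appendix E.

```python
import math

# Summary statistics from the STAT 100 example
mean_women, sd_women, n_women = 78.56, 19.54, 64
mean_men, sd_men, n_men = 75.04, 25.08, 40

diff = mean_women - mean_men                         # observed difference, 3.52
se_diff = math.sqrt(sd_men**2 / n_men + sd_women**2 / n_women)
z = (diff - 0) / se_diff                             # 0 = difference under H0

# P(Z >= z) for a standard normal Z, via the complementary error function
p_upper = 0.5 * math.erfc(z / math.sqrt(2))

print(f"SD of difference = {se_diff:.2f}, z = {z:.2f}, P(Z >= z) = {p_upper:.4f}")
```

Because z is not rounded to 0.76 before the probability is computed, the result may differ from the table value in the third decimal place; the conclusion in step 6 is unaffected.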
As an educational “thought experiment,” imagine how you would do a
bootstrap-based test of the above hypothesis.
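A related check that can be run without the raw homework scores is to simulate the sampling and compare the spread of the simulated differences with the step 4 formula. The sketch below is only an illustration under a loudly stated assumption: both populations are taken to be normal with the sample SDs as their theoretical SDs. That assumption is not made in the text (percentage scores are bounded and need not be normal), and the seed and number of trials are arbitrary.

```python
import math
import random
from statistics import pstdev

rng = random.Random(1)       # arbitrary fixed seed for repeatability
n_men, sd_men = 40, 25.08
n_women, sd_women = 64, 19.54
common_mean = 75.04          # under H0 both populations share one mean

def simulated_difference():
    """One trial: draw both samples (ASSUMED normal) and return women minus men."""
    men = [rng.gauss(common_mean, sd_men) for _ in range(n_men)]
    women = [rng.gauss(common_mean, sd_women) for _ in range(n_women)]
    return sum(women) / n_women - sum(men) / n_men

diffs = [simulated_difference() for _ in range(5000)]

sd_simulated = pstdev(diffs)
sd_formula = math.sqrt(sd_men**2 / n_men + sd_women**2 / n_women)
share_large = sum(d >= 3.52 for d in diffs) / len(diffs)

print(f"simulated SD = {sd_simulated:.2f}, formula SD = {sd_formula:.2f}")
print(f"share of trials with difference >= 3.52: {share_large:.3f}")
```

Under these assumptions the simulated SD should land near the formula value of about 4.66, and the share of large differences near the 0.22 found in step 5.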
SECTION 12.6 EXERCISES
1. To test the effectiveness of the new weight-loss
drug Redux, 40 women were split into
two groups: group A, the control group, took
a placebo drug, and group B, the experimental
group, took Redux. The amount of weight
lost by each member of the groups over a
six-month period is given below. Explain how
you would test the claim that the population
mean weight loss of the two groups is the
same. (Hint: You cannot use the z test because
the sample sizes are not large enough.)
Group A:
 3   4   5   6   7
10  11  12  15  18
19  20  23  24  25
30  33  38  40  42
Group B:
 5   5   7   7  10
10  10  10  15  18
20  24  28  29  38
42  44  49  50  55
2. The following is a stem-and-leaf plot of 100
simulated differences between the sample
means of the amount of weight lost by the
two groups in Exercise 1 (group A − group
B). (Group A was used as the control group
in the simulation.) Each difference is based on
a bootstrap sample of 20 from the invented
control population and 20 from the invented
treatment population. Using the stem-and-leaf
plot, test the claim that the difference
between the population means is 0.
Stem | Leaf
−11  | 32
−10  | 54
 −9  | 0
 −8  | 20
 −7  | 74322
 −6  | 840
 −5  | 955431
 −4  | 987443
 −3  | 87630
 −2  | 88775441
 −1  | 885410
 −0  | 993
  0  | 00126789
  1  | 001356679
  2  | 02233589
  3  | 00335
  4  | 02244478
  5  | 45679
  6  | 5799
  7  | 234
  8  | 0
Key: “8 | 0” stands for 8.0.
3. Scientists want to study the effect of exercise
on the amount of weight loss. One hundred
people are randomly divided into two equal
groups. Both groups follow the same diet plan,
but the first group also follows an exercise program. In the exercise group the mean weight
lost over three months was 25.2 pounds with
a standard deviation of 10 pounds. In the
nonexercise group the mean weight lost was
20.4 pounds with a standard deviation of 6.3
pounds. Test the claim that the difference between the population means is 0.
4. On a college campus, a professor wanted to
test the effect on learning of using a computer
program to teach calculus. In his first class of
43 students, he taught using the computer as
an instructional aid. In his second class of 35
students, he taught without using the computer. On the final exam the computer class
scored a mean of 78.32 with a standard deviation of 8.07 while the noncomputer class
scored a mean of 80.41 with a standard deviation of 8.53. Test the claim that the difference
of the population means is 0.
5. Fifty independent measurements of the
weight of a chemical compound were made on
each of two scales. On the first scale the mean
of the 50 measurements was 19.45 grams with
a standard deviation of 0.49 gram. On the second scale, the mean of the 50 measurements
was 18.42 grams with a standard deviation of
0.27 gram. Test the claim that the difference
between the population means is 0.
6. One hundred and fifty high school students
were randomly assigned to two groups of 75.
Group A used a new math text, and group
B used the usual math text. The mean and
standard deviation of the SAT math scores
for group A were 549.45 and 21.12, and the
mean and standard deviation for group B
were 539.25 and 20.91. Test the claim that the
difference between the population means is 0.
For additional exercises, see page 732.