15.4 ONE-WAY ANALYSIS OF VARIANCE
One-way analysis of variance refers to the case in which we are testing
the equality of the theoretical means of multiple populations distinguished
by differing levels or categories of one variable (or factor), such as various
teaching methods, for example. Every ANOVA example we have considered
so far has been one-way. For example, in the vehicles data, the populations
were distinguished by a single variable, namely the type of vehicle. This
kind of nonnumerical variable is called a nominal variable. Nominal variables
occur often in certain areas of study, such as sociology, in which people
are categorized in many ways to better understand societal functioning. In
the cancer patient data of Section 15.2, the populations were distinguished
by type of cancer. We will later consider two-way ANOVA, where populations will be distinguished by two variables, such as gender and highest
educational degree attained.
When legitimate to do so, we would like to use a statistic with a
specific probability distribution that we can look up in a table to test a null
hypothesis rather than use the bootstrap approach. Recall that when we
tested hypotheses previously, we sometimes used the normal distribution to
test for significance. When our assumptions about the real-world setting of
our hypothesis-testing problem allow it, such a distribution-based procedure
is simpler to utilize than the bootstrap procedure and will often produce
more reliable and powerful results. But if we do not believe we can justify
the required assumption, we have to fall back on bootstrapping, which
indeed usually works well. Thus the bootstrap approach is an extremely
valuable tool for the statistician to have in his or her toolbox of statistical
procedures.
We will develop the F distribution to provide the probability distribution
of the ratio of the between-samples mean square over the within-samples
mean square under the assumption that the null hypothesis of equal
population means is true. For use of the F distribution to be justified,
however, some assumptions must hold (there is always some tradeoff for
simpler methods; for example, they are less widely applicable):
1. Each observation must be independent of the others. Practically speaking, this means that the data should be collected in such a way that
each observation is not influenced by any of the previously collected
observations. This assumption of independence between observations
is needed for the bootstrapping approach as well, because its random
sampling from the sample is justified only if the observations of the original sample were independent of each other. Indeed, almost all statistical
approaches taught in a first course in statistics presume independence
of observations. Chapter 10 discussed thoroughly why good statistical
experimental design and statistical survey sampling procedures make the
independence of the observations in the sample a reasonable assumption.
2. When the sample sizes are small for the populations being sampled from,
the user must have reasons to assume for each sampled population that
the observations are sampled from a roughly normal distribution (less
than 20 individuals per sample is our rule-of-thumb definition of a small
sample size). However, if the sample sizes are large (greater than 20),
there is no restriction about what the population shapes need to be,
because the sample mean for each such population is approximately
normally distributed by the central limit theorem regardless of the shape
of the population being sampled from. Moreover, this approximate
normality of sample means suffices to justify the F-distribution approach
even when the populations do not have a normal distribution shape.
3. The within-sample variability should be about the same across all samples. Assumption 2 above refers to the shape required of the distribution
of the population that a sample came from when sample sizes are small.
By contrast, assumption 3 specifies that these population shapes should
have approximately the same spread, which we assess using the various
sample standard deviations for convenience. This restrictive assumption
is not always true and is important to assess, the usual method being to
compare the standard deviations of the samples.
The third assumption can be violated in many settings. Consider, for
example, the cancer data from Section 15.2. The data show that the variability
of survival time for those with breast cancer is clearly much larger than the
variability for the other two types of cancer (compare the three standard
deviations as an informative exercise). So, although using the bootstrap
approach was justified and worked well, the F-distribution approach is
not appropriate, even though assumptions 1 and 2 may be argued to be
satisfied. There is no clear line as to when this last assumption is violated.
However, a rule of thumb sometimes suggested is to take the ratio of the
largest to the smallest sample standard deviation and require this ratio to be
less than two. We will adopt this rule of thumb.
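This rule of thumb is easy to check by machine. The short Python sketch below (the function name and the example data are ours, purely for illustration, not from the text) computes the sample standard deviations and their largest-to-smallest ratio:

```python
from statistics import stdev

def sd_ratio_ok(samples, cutoff=2.0):
    """Rule-of-thumb check of assumption 3: the ratio of the largest
    to the smallest sample standard deviation should be below cutoff."""
    sds = [stdev(s) for s in samples]
    return max(sds) / min(sds) < cutoff

# Three hypothetical samples with similar spreads:
groups = [
    [186, 181, 176, 149, 184, 190],
    [129, 132, 102, 106, 94, 102],
    [173, 191, 182, 190, 172, 147],
]
print(sd_ratio_ok(groups))
```

A check like this is informal, just as in the text: it flags a clear violation but does not constitute a formal test of equal population variances.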
If the above three assumptions hold and the null hypothesis of equal
population means holds, then the ratio of the between-sample mean
square divided by the within-sample mean square has the F distribution of
Table H.
The F Distribution
We now discuss the F distribution explicitly. The mean squares we find are
computed from the samples, and thus are random and hence will vary from
one set of samples to the next even when the null hypothesis is true. Thus,
even when populations have the same theoretical mean in common, when
we randomly sample from these populations, we do not expect the resulting
ratio of the mean squares to be exactly one, even though a theoretical
analysis shows that the expected value of the numerator mean square over
the expected value of the denominator mean square is 1. The point is that the
ratio will fluctuate around 1 when the null hypothesis of equal population
means is true. Thus variations in the data resulting from random sampling
will always make the ratio different from 1 when the null hypothesis is true.
Assuming the null hypothesis is true, what the F distribution does for us is to
give us an idea of how large the ratio must be to fall outside the range of the
typical random fluctuation of the ratio, and thus for ratio values sufficiently
distant from 1 lead us to conclude that the populations from which the mean
squares were taken do not have the same theoretical means. It is perhaps
evident that the numerator (the between-samples mean square) will tend to
be larger than the denominator (the within-samples mean square) when the
theoretical population means differ. The reason is that a difference between
population means will tend to increase the between-samples mean square
while not influencing the within-samples mean square. It is because of this
that we consider large values of the ratio as persuasive evidence that the
null hypothesis of equal population means is false.
The F distribution has two parameters that characterize it completely,
just as the normal distribution is characterized by its population mean and
variance. These are the degrees of freedom associated with the numerator
mean square, and the degrees of freedom associated with the denominator
mean square. In Figure 15.4 a typical F distribution with 4 numerator degrees of freedom and 10 denominator degrees of freedom is shown. Increasing the denominator degrees of freedom tends to conserve the general
shape but shrink the curve somewhat toward 0. Increasing the numerator
degrees of freedom slowly moves the shape toward a more normal shape
and expands the curve away from 0.
The F Distribution Tables
Appendix H in the back of the book gives the upper 5% (Table H.1) and
1% (Table H.2) points for the F distribution for various combinations of
numerator and denominator degrees of freedom. That is, the upper 0.05 and
0.01 probability points are given. Part of Table H.1 is reproduced as Table 15.2
for convenience. The row across the top gives the numerator degrees of
freedom, and the first column gives the denominator degrees of freedom. To
use Table H.1, find the desired row and column. The corresponding number
in the interior of the table is the 5% point (that is, the point for which the
probability of being to its right is 0.05) in the corresponding F distribution.
Figure 15.4
F distribution with 4 numerator degrees of freedom and 10 denominator degrees of freedom. (The upper 5% point, 3.48, is marked on the horizontal axis.)
Table 15.2
Portion of Upper 5% F Table

                                   Numerator degrees of freedom
Denominator
df            1        2        3        4        5        6        7        8        9       10
 1         161.45   199.50   215.71   224.58   230.16   233.99   236.77   238.88   240.54   241.88
 2          18.51    19.00    19.16    19.25    19.30    19.33    19.35    19.37    19.38    19.40
 3          10.13     9.55     9.28     9.12     9.01     8.94     8.89     8.85     8.81     8.79
 4           7.71     6.94     6.59     6.39     6.26     6.16     6.09     6.04     6.00     5.96
 5           6.61     5.79     5.41     5.19     5.05     4.95     4.88     4.82     4.77     4.74
 6           5.99     5.14     4.76     4.53     4.39     4.28     4.21     4.15     4.10     4.06
 7           5.59     4.74     4.35     4.12     3.97     3.87     3.79     3.73     3.68     3.64
 8           5.32     4.46     4.07     3.84     3.69     3.58     3.50     3.44     3.39     3.35
 9           5.12     4.26     3.86     3.63     3.48     3.37     3.29     3.23     3.18     3.14
10           4.96     4.10     3.71     3.48     3.33     3.22     3.14     3.07     3.02     2.98
For example, the 5% point for an F distribution with 4 numerator degrees
of freedom and 10 denominator degrees of freedom is 3.48. Observe this
value for the curve in Figure 15.4. This means that if we were considering an
observed ratio of mean squares greater than 3.48, with 4 and 10 degrees of
freedom in the numerator and denominator, respectively, we would have
statistically significant evidence to conclude that the population means are
not equal. In other words, the distribution under the null hypothesis of
equal population means says that a value greater than 3.48 for the ratio is
sufficiently unlikely in the case when the theoretical means are equal that
we should conclude that they are not equal.
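Tabled percentage points like these can also be reproduced in software. Assuming the SciPy library is available (our choice of tool; the text uses printed tables only), its F-distribution object returns the same upper-tail points:

```python
from scipy.stats import f

# Upper 5% point of the F distribution with 4 numerator and
# 10 denominator degrees of freedom (cf. Table H.1 and Figure 15.4):
critical = f.ppf(0.95, dfn=4, dfd=10)
print(round(critical, 2))  # matches the tabled value 3.48
```

The same call with `dfn=2, dfd=48` reproduces the 5% point used later in Example 15.7.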
One-Way ANOVA Using the F Distribution
We are now prepared to use the F distribution to test for equality of means
in a one-way analysis of variance. Let’s consider the Key Problem presented
at the beginning of this chapter. Before we perform an ANOVA, however,
certain aspects of the hot dog data collection are instructive to think about,
in view of our emphasis in Chapter 10 on the use of randomization to obtain
data useful for statistical analyses.
First, let’s suppose that each of the 51 measurements in the Key Problem
is the result of some sort of chemical analysis of one randomly chosen hot
dog of the particular type (e.g., Oscar Mayer “All Beef”) being
analyzed. Unlike an elementary physics experiment, such as observing
the distance a constant-speed object moves in 10 seconds, where random
aspects of collected data cause little error, here there are many reasons why
the 51 observations have a sizable random variability. First, measurement
error, discussed in Chapter 9, is likely a major contributor to the variability
occurring in the data, because measuring the number of calories in a hot dog
is a complex process. Second, there will be random variation in the number
of calories from hot dog to hot dog in the same package, and third, variation
will occur from package to package of the same brand (for example, one
package may have been made from a batch of leaner turkey meat than
another). Fourth, there will be variation from brand to brand within the
same meat type population. So there are four major sources of variation
in the observed number of calories of a particular meat type of hot dog.
We model each of the 17 observations of one meat type as being a simple
random sample from its meat type population, each such population likely
having a sizable variance.
This assumption of simple random sampling presumes that each hot
dog of the population of, for example, all beef hot dogs has an equal
chance of being chosen. However, as was also the case for the multi-stage
stratified National Opinion Research Center General Social Survey
data of Section 10.3, the random sampling was stratified and done in stages.
Stage 1 is the choice of 17 all beef hot dog brands, from the strata of all beef
hot dog brands of interest (which may just be these 17 or may be many more
than these 17). Then Stage 2 is the random choice of one package of hot
dogs from each of the 17 brands. Next, Stage 3 is the random choice of one
hot dog from each of the sampled packages. This is one way the random
sampling of the 51 hot dogs could have been carried out (other stratified
random sampling plans being possible). For our purposes, we intentionally
ignore this sampling complexity and act as if the sampling was simple
random sampling of hot dogs from each of the three meat type populations.
Thus, for example, each manufactured all beef hot dog is presumed to be
equally likely to have been chosen among the 17 actually sampled all beef
hot dogs.
Example 15.7
We want to perform the one-way ANOVA procedure on the hot dog data presented
in the Key Problem. To start, we will attempt to justify that the three assumptions
hold in this case, for if so then we can take an F-distribution approach. First, it
is reasonable to assume that the number of calories in one sampled hot dog is
independent of the number of calories in the other sampled hot dogs because
the hot dogs have been randomly sampled from different populations. Second,
for each of the three populations, the observations should have been sampled
from a population that is at least roughly normally distributed. To informally
investigate this, three histograms can be created from the three samples. We omit
details of this, but the histograms do not provide evidence that we should reject
the assumption that each population is roughly normally distributed, especially
since the F test of a one-way ANOVA is somewhat robust against departures
from population normality that are not too extreme. By robust, we mean here
that moderate departures from population normality do not make the procedure
behave very differently than what the F table tells us.
Finally, the last assumption is that the population standard deviations corresponding to the three samples are about the same. This assumption is certainly
reasonable, since the sample standard deviations of the beef, poultry, and meat
combination samples are 22.24, 21.87, and 24.48, respectively, producing a ratio of
largest over smallest sample standard deviation much less than the criterion value
of two.
We therefore proceed with testing the hypothesis
H0 : All three types of hot dogs have the same population average.
using the standard F-distribution-based ANOVA procedure. First, we want to
calculate the between-samples sum of squares. The sample averages for the beef,
poultry, and meat combination samples are 160.1, 118.8, and 158.7, respectively.
The overall average of the 51 hot dogs is 145.87, obtained easily by averaging the three
sample averages; this shortcut is permitted here because each sample is the same
size (17). So,
Between-samples sum of squares = 17(160.1 − 145.87)² + 17(158.7 − 145.87)²
                                 + 17(118.8 − 145.87)² = 18,698.07
Next, the within-samples sum of squares is calculated as follows:
Within-samples sum of squares
    = (186 − 160.1)² + (181 − 160.1)² + ··· + (131 − 160.1)²
    + (129 − 118.8)² + (132 − 118.8)² + ··· + (144 − 118.8)²
    + (173 − 158.7)² + (191 − 158.7)² + ··· + (138 − 158.7)² = 26,735.53
The degrees of freedom associated with the between-samples sum of squares
is 3 − 1 = 2, since there are three samples. The degrees of freedom associated with the
within-samples sum of squares is 51 − 3 = 48, since there are 51 total observations
and 3 populations being sampled from. The degrees of freedom associated with the
total sum of squares is 51 − 1 = 50, since there are 51 observations. The following table
shows all the results, including the corresponding mean squares and the F statistic,
the ratio of the two mean squares.
Source              Sum of squares   Degrees of freedom   Mean square       F
Between samples          18,698.07                    2       9349.04   16.78
Within samples           26,735.53                   48        556.99
Total                    45,433.60                   50
This table is called the analysis of variance table or ANOVA table. It is the form
commonly provided by statistics software packages to summarize the results of the
ANOVA procedure—for example, SPSS, SAS, or Minitab. Note that, as we have
stated previously, the total sum of squares equals the between-samples sum of
squares plus the within-samples sum of squares. Likewise, the total sum of squares
degrees of freedom equals the between-samples degrees of freedom plus the within
samples degrees of freedom. The mean square associated with the total sum of
squares is not included in the table, since it is of no significance. Likewise, the only
entry in the F column that has any meaning is the ratio of the between-samples over
the within-samples mean squares, which is placed in the first row. That is because
it is this ratio that obeys the F distribution of Table H when the null hypothesis of
equal population means holds.
Finally we have reached the point where we want to compare the computed
F statistic, 16.78, to the F distribution having 2 numerator and 48 denominator
degrees of freedom. Looking at Table H.1, we see that the point that 5% of the area
is above for that curve is 3.19. Similarly, it can be seen in Table H.2 that the 1%
point for the F distribution is 5.10. Thus, we reject the null hypothesis, since the
observed ratio of 16.78 is so much larger than what we would expect the ratio to be
if the null hypothesis were in fact true. Indeed, the probability of an F ratio higher
than the observed 16.78 is extremely close to 0, and hence the statistical evidence is
extremely strong that the null hypothesis of equal population means for the three
meat types is false.
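The arithmetic of Example 15.7 can be redone directly from the summary figures quoted above (the sample means, the common sample size, and the within-samples sum of squares). The Python sketch below simply retraces the textbook computation; the variable names are ours:

```python
# Summary figures from Example 15.7 (three samples of 17 hot dogs each).
means = [160.1, 118.8, 158.7]       # beef, poultry, meat-combination averages
n = 17                              # common sample size
k = len(means)                      # number of samples
grand = sum(means) / k              # shortcut valid because sample sizes are equal

between_ss = n * sum((m - grand) ** 2 for m in means)
within_ss = 26735.53                # computed from the raw data in the text

between_ms = between_ss / (k - 1)       # df = 3 - 1 = 2
within_ms = within_ss / (n * k - k)     # df = 51 - 3 = 48
f_stat = between_ms / within_ms

print(round(between_ss, 2), round(f_stat, 2))
```

Running this reproduces the between-samples sum of squares 18,698.07 and the F statistic 16.78 from the ANOVA table.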
Designing Experiments
In many cases the between-samples sum of squares is instead called the
between-treatments sum of squares. The term treatments comes from the fact
that the ANOVA method is often used in cases where we have designed
and performed a statistical experiment involving different treatments using
randomization, as discussed in Chapter 10. For example, a group of subjects
may be gathered for a medical study. Some are randomly given treatment
1, some given treatment 2, and the rest (the control group) receive a no-treatment placebo (treatment 3). Then all of the patients are assessed to
see how they responded to the treatments, and the ANOVA procedure is
used to test whether the different treatments (three in number, including
the control) produced significantly different results.
If we are designing an experiment in which the results will be analyzed
using ANOVA, one thing that should be done at the design stage of the
experiment is to randomly assign the treatments to the different units. For
example, in the medical study, we should take the randomly selected group
of patients and randomly choose one third of them to receive treatment 1,
and so on. As discussed in Chapter 10, randomization is essential because
there are many differences between units that affect the medical outcome
being evaluated that we may never be aware of, and we could never
adequately account for them in our analysis. The only major difference we
want between units (after averaging over all units receiving the same treatment)
in different treatment groups is the possible influence of having received a different
treatment. By randomly assigning treatments to people, we are able to make
negligible the possibility that any known or unknown variable not of interest
is at widely differing levels in the multiple treatment groups.
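The random assignment just described can be carried out mechanically. Assuming three treatments and an equal split, a shuffle-then-slice scheme like the following Python sketch (the function and subject labels are ours, for illustration only) does the job:

```python
import random

def assign_treatments(subjects, n_treatments=3, seed=None):
    """Randomly partition the subjects into equal-sized treatment groups,
    so that no nuisance variable, known or unknown, can systematically
    differ between groups. Assumes len(subjects) divides evenly."""
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)                      # random order => random assignment
    size = len(pool) // n_treatments
    return [pool[i * size:(i + 1) * size] for i in range(n_treatments)]

# 12 hypothetical subjects split into treatment 1, treatment 2, and control:
groups = assign_treatments(range(12), n_treatments=3, seed=0)
for g in groups:
    print(g)
```

Fixing the seed here is only for reproducibility of the illustration; in a real experiment the assignment should of course not be predictable in advance.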
SECTION 15.4 EXERCISES
1. For each of the following F distributions, what
is the point above which lies 5% of the area?
a. Numerator df = 5, denominator df = 10
b. Numerator df = 3, denominator df = 12
c. Numerator df = 10, denominator df = 4
2. a. Complete the ANOVA table labeled Table
E1.
b. Would you reject the null hypothesis that
there are no significant differences among
the population means in this case?
Table E1

Source              Sum of squares   Degrees of freedom   Mean square   F
Between samples            4823.45                    ?             ?   ?
Within samples           26,591.01                   22             ?
Total                            ?                   26
3. A consumer group wanted to test for differences in the lives of three different brands
of batteries. Six batteries of each brand were
obtained and were then used in the same
electronic device. The length of life in hours
was measured. The results were as follows:
Brand A: 115.76, 107.92, 103.73, 114.14, 113.51, 110.87
Brand B: 121.82, 127.45, 122.24, 125.74, 124.02, 113.39
Brand C: 106.99, 107.78, 103.78, 112.32, 106.46, 120.77
a. State the null hypothesis of the researcher.
b. Does the assumption of equal variances
among samples seem to be valid in this
case?
c. Form the ANOVA table for this case.
d. Should the null hypothesis be rejected?
4. Four methods of weight loss were being compared. Subjects were chosen and randomly
assigned to each of the four groups. After
two weeks of treatment, the weight loss was
measured. The results in pounds lost were as
follows:
Method A: 3.67, 2.52, 4.88, 2.73, 4.63
Method B: 2.65, 3.51, 5.81, 2.24, 4.47
Method C: 2.59, 1.91, 0.96, 2.27, 2.80
Method D: 4.45, 3.34, 2.52, 3.72, 3.68
a. State the null hypothesis of the researcher.
b. Does the assumption of equal variances
among samples seem to be valid in this
case?
c. Form the ANOVA table for this case.
d. Should the null hypothesis be rejected?
5. A manufacturer was testing four different materials for the construction of yarn, comparing
the strength of the yarn made from each. A
sample of seven pieces of yarn for each material was obtained, and the strength was measured by seeing how much weight it could
hold before breaking. The results were as follows:
Material A: 9.84, 9.39, 9.70, 9.54, 9.78, 8.82, 10.23
Material B: 9.94, 9.56, 9.84, 8.98, 9.23, 8.83, 9.58
Material C: 8.10, 8.62, 8.31, 8.66, 8.87, 8.32, 8.95
Material D: 8.38, 7.79, 8.37, 8.23, 8.38, 9.11, 8.09
a. State the null hypothesis of the researcher.
b. Does the assumption of equal variances
among samples seem to be valid in this
case?
c. Form the ANOVA table for this case.
d. Should the null hypothesis be rejected?
6. Explain why it is important that subjects
be randomly assigned to different treatment
groups when an ANOVA experiment is being
designed.
7. a. Create the ANOVA table for the data in
Exercise 5 of Section 15.3.
b. Find the value from the F table for level of
significance = .05.
c. What is the null hypothesis tested by the F
test?
d. Do you accept or reject the null hypothesis?
e. Compare your answer in (d) to that in (d)
of Exercise 4 in Section 15.2.