MATH 2441
Probability and Statistics for Biological Sciences
Analysis of Variance (ANOVA)
(Comparison of Many Population Means)
Analysis of Variance or ANOVA is the name given to a rather extensive set of statistical techniques used in
the inference of differences in the means of three or more populations. The reason for this somewhat
unexpected name being applied to methods for comparing population means will become clear as we go
along. In this document, we present and illustrate the simplest situation addressed with ANOVA methods -- when the populations in question are distinguished by different instances of a single feature. Further study
of the methods will then be deferred until MATH 3441 for students in the Food Technology Program.
An Example
These days, people are quite interested in reducing the amount of fat in their diet. One of the foods that has
often been seen as a source of avoidable fat is eggs -- particularly the egg yolk. Suppose that a
technologist wishes to determine whether the amount of fat in the egg yolk can be influenced by the diet of
the hens. She prepares four different blends of food: blend 1, blend 2, blend 3, and blend 4. Each of a
group of hens is randomly assigned one of the food blends for a period of several weeks. Then, five eggs
are selected at random from chickens fed each blend of food, and the percent fat in the yolk of each
selected egg is measured. The data obtained is:
                              blend 1   blend 2   blend 3   blend 4
egg #1                           36.3      34.5      28.3      31.6
egg #2                           34.1      39.1      28.0      36.3
egg #3                           33.4      34.3      30.9      32.5
egg #4                           33.9      30.3      29.6      41.5
egg #5                           39.4      31.8      28.6      33.1
sample size (n)                     5         5         5         5
sample mean                     35.42      34.0     29.08      35.0
sample standard deviation       2.487     3.350     1.182     4.042
For convenience, we've also included some statistics in this table: the sample sizes (all equal at 5), the
sample means, and the sample standard deviations.
Now, this is a problem involving four populations of egg-laying hens; the populations distinguished by the
identity of the blend of food eaten by their members. The different food blends here are examples of
different levels of the factor food in the experiment. Because there is just one factor that distinguishes the
populations in the problem, our analysis below will be an example of single-factor ANOVA or one-way
ANOVA.
(If we had also distinguished hens on the basis of, say, variety of hen, as well as on the basis of which diet
they were fed, then we would have a two-factor experiment. It is not uncommon for experimental studies to
involve two or three different factors, but the calculations and interpretation of results become much more
difficult as the number of factors increases. In this document we will consider only single factor situations.)
The question to be answered here is whether or not the data above is evidence that the true mean
percentage fat in egg yolks is different for hens eating different ones of the four blends of food used in the
experiment. We see that some of the sample means are quite similar, whereas others are quite different.
The basic issue is: are any of the differences in the sample means obtained from this data big enough to
infer a statistically significant difference in the corresponding population means?
First Try: Looking at the Populations One-at-a-Time
In order to compare the true means of the four populations in this example, you might think of starting by
constructing, say, 95% confidence interval estimates of each mean. If we use the symbols μ1, μ2, μ3, and μ4
to denote the true mean fat percentages of egg yolks from hens fed, respectively, blends 1, 2, 3, or 4, and
similar subscripts on the symbols x̄ and s to denote the corresponding sample means and standard
deviations, then the formula for these interval estimates is

$$\mu_k = \bar{x}_k \pm t_{\alpha/2,\,n_k-1}\,\frac{s_k}{\sqrt{n_k}} \qquad @\ 100(1-\alpha)\% \tag{1}$$
This formula is valid only if the populations from which the samples were selected are approximately
normally distributed. We have no evidence supporting or opposing that assumption because the samples
are so small. However, this condition is a requirement of all methods to be considered in this document.
We will simply assume the condition is met.
Using α = 0.05, and taking note that all four sample sizes nk are equal to 5, the t-table gives the critical
value t0.025,4 = 2.776. Then, for instance, for k = 1, the first population/sample, we get
1  x1  t0.025,4
s1
@ 95%
n1
 35.42   2.776
2.487
5
 35.42  3.09 @95%
It is probably more useful to write this in the form of an actual interval:
32.33  1  38.51 @95%

3
2
1
In a similar manner, we can also determine that
29.84  2  38.16
@95%
27.61  3  30.55 @95%
29.98  4  40.02 @95%
26.00
31.00
36.00
41.00
These intervals are sketched accurately in the figure above. You can see that the only non-overlapping
intervals are those for μ1 and for μ3. All of the others overlap to a greater or lesser degree.
Based on these calculations, we could thus say that it appears as if eggs from hens fed blend 3 have less fat
in the yolks than do the yolks of eggs from hens fed blend 1, but we aren't able to make any other definite
statements.
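For readers who want to reproduce these intervals numerically, here is a minimal Python sketch of formula (1) applied to the four samples. It assumes NumPy and SciPy are available; the variable names are our own choices and not part of the original handout.

```python
import numpy as np
from scipy import stats

# Percent fat in egg yolk, five eggs per feed blend (data table above)
blends = {
    "blend 1": [36.3, 34.1, 33.4, 33.9, 39.4],
    "blend 2": [34.5, 39.1, 34.3, 30.3, 31.8],
    "blend 3": [28.3, 28.0, 30.9, 29.6, 28.6],
    "blend 4": [31.6, 36.3, 32.5, 41.5, 33.1],
}

alpha = 0.05
for name, data in blends.items():
    x = np.array(data)
    n = len(x)
    mean = x.mean()
    s = x.std(ddof=1)                             # sample standard deviation
    t_crit = stats.t.ppf(1 - alpha / 2, n - 1)    # t_{alpha/2, n-1} = 2.776 here
    half_width = t_crit * s / np.sqrt(n)
    print(f"{name}: {mean:.2f} +/- {half_width:.2f} "
          f"({mean - half_width:.2f}, {mean + half_width:.2f})")
```

Running this prints the same four intervals listed above, to rounding.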
However, you need to realize that the interval estimates computed above are rather crude. They are each
based on quite small samples since each estimate results from just five observations. While there is a 95%
probability that each of these intervals does capture the true mean value being estimated, the precision of
the estimates is quite poor. With this approach the best we can say is that if two intervals do not overlap,
then very likely the corresponding population means are unequal. However, if two intervals do overlap
somewhat, we are not able to draw any conclusion about the relative values of the corresponding population
means. Because of these defects, the single-population approach illustrated above is not a recommended
method.
A Second Try: Comparing the Population Means Pair-wise
We can get somewhat better results if we apply the procedures developed earlier for comparing the means
of two populations to each pair of means that arise from the four population means in this example. There
are six pairs to consider in all: μ1 - μ2, μ1 - μ3, μ1 - μ4, μ2 - μ3, μ2 - μ4, and μ3 - μ4. The samples are small and
independent, so we must assume that the populations are all approximately normally distributed, and that
their variances are equal. This latter assumption is dubious in a couple of pairings involving population 3,
but to avoid undue complexity, we will proceed on that second assumption as well.
The procedure should be familiar by now, but we will review it for one pair briefly: μ1 - μ2. We must first form
the pooled standard deviation, sp:

$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{(5-1)(2.487)^2 + (5-1)(3.350)^2}{5+5-2} = 8.7038 \tag{2}$$

Thus

$$s_p = \sqrt{s_p^2} = \sqrt{8.7038} = 2.950$$
Now, one way to determine if the data will support a conclusion that μ1 - μ2 ≠ 0 (that is, that the two
population means are different), would be to test the hypotheses:

H0: μ1 - μ2 = 0
vs.
HA: μ1 - μ2 ≠ 0

The appropriate standardized test statistic is

$$t = \frac{\bar{x}_1 - \bar{x}_2 - 0}{s_p\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}} = \frac{35.42 - 34.00 - 0}{2.950\sqrt{\dfrac{1}{5}+\dfrac{1}{5}}} = 0.761$$
Now, this is a two-tailed hypothesis test, and so H0 can be rejected at a level of significance of 0.05 if
|t| > t0.025,8 = 2.306 (remember, the degrees of freedom are now n1 + n2 - 2 = 5 + 5 - 2 = 8) . Clearly that
condition is not met here, since |0.761| = 0.761 is not greater than 2.306. Thus, we conclude that the data is
not adequate evidence to conclude that μ1 is different from μ2 at a level of significance of 0.05.
The same conclusion could have been obtained by constructing a 95% confidence estimate of μ1 - μ2:

$$\mu_1 - \mu_2 = \bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,n_1+n_2-2}\; s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}} \qquad @\ 100(1-\alpha)\% \tag{3}$$

$$= (35.42 - 34.0) \pm (2.306)(2.950)\sqrt{\frac{1}{5}+\frac{1}{5}} = 1.42 \pm 4.30 \qquad @\ 95\%$$

or, in the form of an interval

$$-2.88 \le \mu_1 - \mu_2 \le 5.72 \qquad @\ 95\%$$
Clearly, since this interval straddles the value zero, we are unable to eliminate the possibility that the two
population means are equal -- a result echoing the result obtained using the hypothesis testing approach
just above.
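As a cross-check on the pooled-variance calculation just shown, the following sketch (SciPy assumed) runs the blend 1 versus blend 2 comparison; scipy.stats.ttest_ind with equal_var=True performs this pooled two-sample t test, and the interval is rebuilt from formula (3).

```python
import numpy as np
from scipy import stats

blend1 = np.array([36.3, 34.1, 33.4, 33.9, 39.4])
blend2 = np.array([34.5, 39.1, 34.3, 30.3, 31.8])

# Pooled-variance two-sample t test for H0: mu1 - mu2 = 0
t_stat, p_value = stats.ttest_ind(blend1, blend2, equal_var=True)
print(f"t = {t_stat:.3f}, p-value = {p_value:.2f}")       # t = 0.761, p = 0.47

# 95% confidence interval for mu1 - mu2, formula (3)
n1, n2 = len(blend1), len(blend2)
sp2 = ((n1 - 1) * blend1.var(ddof=1) + (n2 - 1) * blend2.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2) * np.sqrt(1 / n1 + 1 / n2)
t_crit = stats.t.ppf(0.975, n1 + n2 - 2)                  # 2.306 with 8 degrees of freedom
diff = blend1.mean() - blend2.mean()
print(f"{diff - t_crit * se:.2f} <= mu1 - mu2 <= {diff + t_crit * se:.2f}")   # -2.88 to 5.72
```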
The results of these calculations for all six pairs of populations are shown in the following table:

comparison        t        (p-value)      95% confidence interval estimate
μ1 - μ2         0.761      (0.47)          -2.883 ≤ μ1 - μ2 ≤  5.723
μ1 - μ3         5.148      (0.00088)*       3.500 ≤ μ1 - μ3 ≤  9.180
μ1 - μ4         0.198      (0.85)          -4.475 ≤ μ1 - μ4 ≤  5.315
μ2 - μ3         3.097      (0.015)*         1.257 ≤ μ2 - μ3 ≤  8.583
μ2 - μ4        -0.426      (0.68)          -6.414 ≤ μ2 - μ4 ≤  4.414
μ3 - μ4        -3.143      (0.014)*       -10.263 ≤ μ3 - μ4 ≤ -1.577
For any pair of means for which |t| > 2.306, we can
reject the null hypothesis that the population
means are equal at a level of significance of 0.05.
The confidence intervals are sketched in the figure
below as well to give a bit more of a visual
impression of their relative sizes and positions.
Remember that intervals which do not straddle the
vertical axis correspond to pairs of populations for
which the difference of the means is non-zero at
this confidence level.
3 - 4
2 - 4
2 - 3
1 - 4
1 - 3
1 - 2
Now, from both the hypothesis test results and
-15.000 -10.000 -5.000 0.000 5.000 10.000 15.000
(equivalently) the estimates of the differences of
the various population means, we see that three
pairs of population means are different to a
statistically significant degree here: 1 and 3, 2 and 3, and 3 - 4. The confidence intervals of the
corresponding differences are the ones which do not overlap the vertical axis in the figure just above. It
looks like these results allow us to say that feed blend 3 results in egg yolks with a statistically significantly
lower fat content than does any of the other three diets. However, there is no statistically significant
difference in fat content of yolks of eggs from hens fed any of blends 1, 2, or 4. This is probably a useful
result as far as this specific problem is concerned.
However, there is a fairly serious defect with this approach. We tested six separate hypotheses, and were
able to reject three of them at a level of significance of 0.05 (in this example). We can refer to the 0.05 as
the pairwise level of significance, since it represents the probability that the rejection of the null hypothesis
for a pair of means will be mistaken. However, a more useful measure of the reliability of our analysis would
be to calculate the experiment-wise level of significance, αEW, the probability that at least one of the
rejections of H0 is in error. In the specific problem above, where we declared three null hypotheses rejected
at a pairwise level of significance of 0.05, this probability is
Pr(at least one error) = 1 - Pr(no errors were made)
= 1 - (0.95)³ ≈ 0.1426
Here, it was easier to calculate the probability of no errors being made than of ‘at least one error’, since the
event ‘at least one error’ corresponds to the compound event ‘one error or two errors or three errors.’ If the
probability of error for one pair is 0.05, then the probability of not making an error is 0.95.
Anyway, the point here is that the probability that we made at least one error in rejecting three of the null
hypotheses here is quite a bit larger in principle than the value of α used for each pairwise comparison. (For
this particular example with the specific data given, the situation is not as bad as indicated, since from the
p-values calculated for the three rejected null hypotheses, we get what amounts to an experiment-wise
p-value of 0.029 -- however, you cannot assume that such favorable numbers will always occur in applications,
and so the "worst case" must be considered.) When people speak of αEW, they normally have in mind the
value that would result if all possible pair-wise null hypotheses were rejected, each at a level of significance
α, and this gives surprisingly large values for αEW even when a relatively small number of populations is
considered.
To see this, note first that if we are dealing with k populations, then the number of unique pairs, C, of
populations to be compared is given by
$$C = \frac{k(k-1)}{2}$$
For instance, in the example above, we had k = 4, which resulted in C = 4(3)/2 = 6 pairs of populations.
Then, if each pairwise comparison is done via a hypothesis test with a level of significance α, we get that
$$\alpha_{EW} = 1 - (1-\alpha)^C \tag{4}$$
For α = 0.05, we get the following sort of results:

 k        C       αEW
 3        3       0.143
 4        6       0.265
 5       10       0.401
 6       15       0.537
 7       21       0.659
 8       28       0.762
 9       36       0.842
10       45       0.901
15      105       0.995
Again, the numbers in the third column of this table are an upper limit to the true probability of making at
least one error in comparing the means of all possible pairs of populations. However, because these values
of αEW potentially get so much larger than α so quickly as the number of populations increases, this strict
pair-wise approach to comparing the populations in this sort of problem is not considered sound.
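The table above can be regenerated directly from formula (4); the short sketch below uses plain Python only. The final lines also recompute the experiment-wise p-value for the three rejected pairwise tests, using the starred p-values from the pairwise table above.

```python
# Experiment-wise error rate, formula (4): alpha_EW = 1 - (1 - alpha)**C
alpha = 0.05

for k in [3, 4, 5, 6, 7, 8, 9, 10, 15]:
    C = k * (k - 1) // 2                      # number of unique pairs of populations
    alpha_ew = 1 - (1 - alpha) ** C
    print(f"k = {k:2d}   C = {C:3d}   alpha_EW = {alpha_ew:.3f}")

# Experiment-wise p-value for the three rejected pairwise tests (starred above)
p_values = [0.00088, 0.015, 0.014]
prob_no_error = 1.0
for p in p_values:
    prob_no_error *= (1 - p)
# roughly 0.03, in line with the value of about 0.029 quoted above
print(f"experiment-wise p-value = {1 - prob_no_error:.3f}")
```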
The thing is, these two approaches pretty well exhaust the ways of using sample means directly to detect
differences between corresponding population means. It is by looking in more detail at the various
contributions to the variance of the data that we get a method which more reliably detects the presence of
differences between the means of several populations – hence the name analysis of variance or ANOVA.
Single-Factor Analysis of Variance
Because the eventual test statistic calculated here depends on all of the data in the experiment, we need to
develop a notation which can keep track of the details adequately. Some of the following notation has
already been established.
k = the number of populations being considered
nj = size of the random sample selected from population #j. There are k of these: n1, n2,
and so on to nk

nT = total number of observations in all samples together (equals the sum of the nj values)

xij = the observation #i in sample #j. For sample #j, there will be nj of these: x1j, x2j, x3j,
and so on up to xnj,j

x̄j = the mean of sample #j:

$$\bar{x}_j = \frac{1}{n_j}\sum_{m=1}^{n_j} x_{mj} \tag{5}$$

x̄ = the mean of all of the observations in all of the samples. Note that this is not equal to
the mean of the sample means unless all samples are the same size.

sj² = the variance of sample #j:

$$s_j^2 = \frac{\sum_{m=1}^{n_j} x_{mj}^2 - n_j\,\bar{x}_j^2}{n_j - 1} \tag{6}$$
We’ll introduce other symbols as we go along.
We note three assumptions that underlie ANOVA methods:
(i) the k populations are assumed to be approximately normally distributed
(ii) the variances of all k populations are assumed to be equal
(iii) the samples from the k populations are independent
It is also a good idea to try to have all sample sizes equal, but there are ways to deal with unequal sample
sizes. From these conditions, you can see that ANOVA methods are designed to distinguish between
normally-distributed populations which differ simply in their mean values.
The principle that ANOVA will exploit is based on the following notion. If all k populations have essentially
identical means, then the k samples will essentially be sampling identical populations. If the k samples are
pooled to form one large set of nT observations, the mean and variance of that larger set will be very similar
to the means and variances of the individual samples. On the other hand, if the k populations have very
different mean values, then the variance of the pooled data will be much larger than the variance of the
individual samples because most of the observations will be quite different from the pooled mean, x̄. We
will come up with a single number that “measures” the difference between these two extreme situations.
The effect described in the previous paragraph shows up if we look at just the numerator in the formula for
the variance, which simplifies the algebra considerably. Now, if we put all of the data together into one big
list, it will have the mean value x̄, and its variance, s²TOT, is given by the formula:

$$s_{TOT}^2 = \frac{\displaystyle\sum_{j=1}^{k}\left(\sum_{m=1}^{n_j}\left(x_{mj}-\bar{x}\right)^2\right)}{n_T - 1} \tag{7}$$
We will work with just the numerator of the rather forbidding expression on the right-hand side, which will be
identified by the symbol SST for “Sum of Squares Total”, since it consists of a sum of squares of terms taken
over the total collection of data:
$$SST = \sum_{j=1}^{k}\left(\sum_{m=1}^{n_j}\left(x_{mj}-\bar{x}\right)^2\right) \tag{8}$$

Here the inner sum adds up the values of (xmj - x̄)² for all of the observations xmj in sample #j, and the
outer sum over j makes sure that the sum includes all k samples.
The double summation notation may be a bit intimidating, but simply amounts to saying that if you are going
to form the sum in SST over all observations in the experiment, you can do it by summing over all
observations in each sample, and then combining the sums from the individual samples into one grand total.
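To make the double-summation notation in (8) concrete, here is a brief sketch (NumPy assumed) that computes SST for the egg-yolk data both ways: with an explicit double sum over samples and observations, and in one pass over the pooled list of twenty observations.

```python
import numpy as np

samples = [
    [36.3, 34.1, 33.4, 33.9, 39.4],   # blend 1
    [34.5, 39.1, 34.3, 30.3, 31.8],   # blend 2
    [28.3, 28.0, 30.9, 29.6, 28.6],   # blend 3
    [31.6, 36.3, 32.5, 41.5, 33.1],   # blend 4
]

pooled = np.concatenate([np.array(s) for s in samples])
grand_mean = pooled.mean()            # x-bar = 33.375

# Formula (8): outer sum over samples, inner sum over observations in each sample
sst_double_sum = sum(sum((x - grand_mean) ** 2 for x in sample) for sample in samples)

# The same quantity computed from the pooled list in one pass
sst_pooled = ((pooled - grand_mean) ** 2).sum()

print(sst_double_sum, sst_pooled)     # both are 268.8775, up to rounding
```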
Now, do the following to the expression inside the curved brackets in formula (8):
$$SST = \sum_{j=1}^{k}\left(\sum_{m=1}^{n_j}\left(x_{mj}-\bar{x}_j+\bar{x}_j-\bar{x}\right)^2\right) \tag{9}$$

All we've done is subtract x̄j, the mean for the jth sample, and then add it in again. There is no net effect on
the value of SST, since we are subtracting and adding the same quantity, but this form allows us to group
things in a new, useful way. We can insert brackets to indicate how this grouping will be done:

$$SST = \sum_{j=1}^{k}\left(\sum_{m=1}^{n_j}\left[\left(x_{mj}-\bar{x}_j\right)+\left(\bar{x}_j-\bar{x}\right)\right]^2\right) \tag{10}$$
Now, we can carry out the squaring of the expression inside the inner square brackets, keeping the terms
inside the round brackets intact. This gives
$$SST = \sum_{j=1}^{k}\left(\sum_{m=1}^{n_j}\left[\left(x_{mj}-\bar{x}_j\right)^2 + 2\left(x_{mj}-\bar{x}_j\right)\left(\bar{x}_j-\bar{x}\right)+\left(\bar{x}_j-\bar{x}\right)^2\right]\right)$$
Now, it doesn’t matter if we sum the three terms inside the inner square brackets for each value of m, and
then sum those results for all values of m, or if we sum each term for each value of m, getting three
intermediate results, and then sum those three intermediate values to get the overall inner sum. Thus, the
formula above is equivalent to
$$SST = \sum_{j=1}^{k}\left(\sum_{m=1}^{n_j}\left(x_{mj}-\bar{x}_j\right)^2 + \sum_{m=1}^{n_j}2\left(x_{mj}-\bar{x}_j\right)\left(\bar{x}_j-\bar{x}\right) + \sum_{m=1}^{n_j}\left(\bar{x}_j-\bar{x}\right)^2\right)$$
Further, all of the terms in the last two of these inner sums contain common factors which do not depend on
m, which we can factor out of the sums, to get:
$$SST = \sum_{j=1}^{k}\left[\sum_{m=1}^{n_j}\left(x_{mj}-\bar{x}_j\right)^2 + 2\left(\bar{x}_j-\bar{x}\right)\sum_{m=1}^{n_j}\left(x_{mj}-\bar{x}_j\right) + \left(\bar{x}_j-\bar{x}\right)^2\sum_{m=1}^{n_j}1\right] \tag{11}$$
Now, we can deal with each of the terms in the square brackets quite easily. The first term is nothing but the
numerator of the formula for the variance of the sample from the jth population. Since

$$s_j^2 = \frac{\sum_{m=1}^{n_j}\left(x_{mj}-\bar{x}_j\right)^2}{n_j-1},$$

then

$$\sum_{m=1}^{n_j}\left(x_{mj}-\bar{x}_j\right)^2 = \left(n_j-1\right)s_j^2 \tag{12}$$
The second term takes a bit more work, but gives an even simpler result. Notice that we can break the sum
up into two parts:
$$\sum_{m=1}^{n_j}\left(x_{mj}-\bar{x}_j\right) = \sum_{m=1}^{n_j}x_{mj} - \sum_{m=1}^{n_j}\bar{x}_j = \sum_{m=1}^{n_j}x_{mj} - \bar{x}_j\sum_{m=1}^{n_j}1$$
Now, note the following:
$$\sum_{m=1}^{n_j}1 = n_j \tag{13}$$
since we are just adding 1 to itself for every value of m between 1 and nj. Furthermore, by definition,
$$\bar{x}_j = \frac{1}{n_j}\sum_{m=1}^{n_j}x_{mj}$$
Thus,
$$\sum_{m=1}^{n_j}\left(x_{mj}-\bar{x}_j\right) = \sum_{m=1}^{n_j}x_{mj} - \bar{x}_j\sum_{m=1}^{n_j}1 = \sum_{m=1}^{n_j}x_{mj} - \left(\frac{1}{n_j}\sum_{m=1}^{n_j}x_{mj}\right)n_j = \sum_{m=1}^{n_j}x_{mj} - \sum_{m=1}^{n_j}x_{mj} = 0 \tag{14}$$
Thus, the second term in the square brackets of equation (11) is just equal to zero. Finally, the third term in
the square brackets of (11) is easily handled using the result (13), just above:
$$\left(\bar{x}_j-\bar{x}\right)^2\sum_{m=1}^{n_j}1 = n_j\left(\bar{x}_j-\bar{x}\right)^2 \tag{15}$$
So, putting results (12), (14), and (15) into (11), we end up with
$$SST = \sum_{j=1}^{k}\left(n_j-1\right)s_j^2 + \sum_{j=1}^{k}n_j\left(\bar{x}_j-\bar{x}\right)^2 \tag{16}$$
This may have seemed like an awful lot of complex algebra which hasn’t really resulted in much of a
simplification of the formula for SST. We displayed the steps in the simplification of (11) in some detail so
that you could see how this sort of thing is done, and that despite the somewhat sophisticated notation that
is necessary, the actual algebra step-by-step is quite routine. Further, you should see that once you’ve
calculated the individual sample means and standard deviations, as well as the grand mean of all
observations in all samples, it is not too difficult to calculate SST from formula (16). However, that by itself is
not really the goal of deriving formula (16).
The immense value of formula (16) is that its two terms reflect the two extreme situations that may occur in an
experiment of this type.
Extreme Situation #1: This is the situation in which all of the k populations have more or less equal means
(so that the difference between their means is small compared to their variances). Then, we would expect
all of the individual sample means, x̄j, to be very nearly equal, and also to be nearly equal to the grand
mean, x̄. If this is the case, then the second sum in (16) is a sum of very small values, and so will itself be
small compared to the sum of variances in the first term. Thus, in the extreme situation that the means of
the k populations are essentially identical, we expect the first term on the right-hand side of (16) to be
much larger than the second term.
Extreme Situation #2: This is the situation where the means of many of the k populations are very different
– separated by intervals which are large compared to their standard deviations (this is another way of saying
that the bell curves representing the distribution of each of the k populations do not come near to
overlapping). Then, the individual sample means will be very different, and very different from the grand
mean. This means that the sum in the second term of the right-hand side of (16) will be a sum of large
numbers (since each term involves the square of a large number, which is even larger). In this
situation, the second term on the right-hand side of (16) will be dominant.
We’re looking for a measure that will distinguish between the situation in which all k populations have very
similar means from the situation in which at least one of the populations has a quite different mean value
from all the others. We see now that these two sorts of situations are distinguished by the relative values of
the two terms on the right-hand side of (16).
To make the method precise, people introduce some additional notation, which starts with symbols for each
of the terms in (16):
SST = SSW + SSB
where
$$SSW = \sum_{j=1}^{k}\left(n_j-1\right)s_j^2 \tag{17}$$
measures the amount of scatter or spread within the individual populations themselves (hence the notation
SSW from “Sum of Squares Within”). If the individual populations are themselves very scattered, it becomes
more difficult to detect a difference in their means, and so SSW is a relevant quantity here.
Similarly
$$SSB = \sum_{j=1}^{k}n_j\left(\bar{x}_j-\bar{x}\right)^2 \tag{18}$$
measures the amount of separation between the individual population means (hence the notation SSB from
“Sum of Squares Between”). SSB will tend to have larger values if at least some of the populations have
mean values which are quite different from the mean values of others of the populations.
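The decomposition SST = SSW + SSB is easy to verify numerically. The sketch below (NumPy assumed) computes SSW from formula (17), SSB from formula (18), and SST directly from the pooled egg-yolk data, and the three values behave exactly as equation (16) promises.

```python
import numpy as np

samples = [np.array(s) for s in [
    [36.3, 34.1, 33.4, 33.9, 39.4],   # blend 1
    [34.5, 39.1, 34.3, 30.3, 31.8],   # blend 2
    [28.3, 28.0, 30.9, 29.6, 28.6],   # blend 3
    [31.6, 36.3, 32.5, 41.5, 33.1],   # blend 4
]]

pooled = np.concatenate(samples)
grand_mean = pooled.mean()

# SSW, formula (17): scatter within the individual samples
ssw = sum((len(s) - 1) * s.var(ddof=1) for s in samples)

# SSB, formula (18): separation between the sample means
ssb = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)

# SST computed directly from the pooled data, formula (8)
sst = ((pooled - grand_mean) ** 2).sum()

print(f"SSW = {ssw:.4f}, SSB = {ssb:.4f}, SSW + SSB = {ssw + ssb:.4f}, SST = {sst:.4f}")
```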
There is one problem with working with SSB and SSW directly. Since every observation in an experiment
contributes a positive amount to each of SSB and SSW (or at least a non-negative amount), simply
increasing sample sizes will increase the values of both of these quantities, and so when we state that one
or the other is “big”, it is unclear whether the bigness is due to the properties of the populations under study
or simply due to us taking larger samples. To solve this problem, we don't use SSB and SSW directly, but
rather the corresponding quantities
$$MSW = \frac{SSW}{n_T - k} \qquad\text{and}\qquad MSB = \frac{SSB}{k-1} \tag{19}$$
(where the M stands for “Mean”). By dividing by these denominators reflecting sample sizes and numbers of
populations, we compensate for the effects of simply including more observations in our calculations.
Finally, we can now state that the ratio
$$F = \frac{MSB}{MSW} \tag{20}$$
is a random variable which, when all k population means are in fact equal, has the F-distribution with
numerator degrees of freedom equal to k - 1 and denominator degrees of freedom equal to nT - k. From the
discussion so far, we know that when one or more of the populations has a mean value much different from
the others, then MSB will be larger than MSW, and so F will have a larger value than if all of the k populations
have more or less equal means (in which case MSB and MSW estimate roughly the same quantity, and F will
tend to have a value near 1).
Now we can state the solution of the problem of detecting the presence of differences in the means of many
populations as a simple hypothesis test:
H0: μ1 = μ2 = μ3 = … = μk
vs.
HA: at least one of these means is different from the others
Then, subject to the three conditions underlying all ANOVA methods listed earlier in this document, the test
statistic for these hypotheses is
$$F = \frac{MSB}{MSW} \tag{20}$$
and H0 can be rejected at a level of significance α if

$$F > F_\alpha \tag{21}$$

where Fα is the value of F cutting off a right-hand tail of area α in the F-distribution with k - 1 numerator
degrees of freedom and nT - k denominator degrees of freedom. It's just that simple. The p-value for this
test is computed as the right-hand tail area to the right of the value of F calculated using formula (20).
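The critical value Fα and the p-value can be read from tables or obtained in software. A minimal sketch (SciPy assumed; scipy.stats.f is the F distribution) using the degrees of freedom of the egg-yolk example analyzed below:

```python
from scipy import stats

k, n_T = 4, 20                        # number of populations, total observations
df_num, df_den = k - 1, n_T - k       # 3 and 16 degrees of freedom

alpha = 0.05
f_crit = stats.f.ppf(1 - alpha, df_num, df_den)   # F_0.05,3,16 = 3.24
print(f"critical value F_alpha = {f_crit:.2f}")

F = 4.868                                         # observed value from (20)
p_value = stats.f.sf(F, df_num, df_den)           # right-hand tail area beyond F
print(f"p-value = {p_value:.4f}")                 # about 0.0136
```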
Normally, the calculations leading up to this hypothesis test are organized into a standard form called an
ANOVA table:
Source of Variation     Sum of Squares        Degrees of Freedom     Mean Square     F
Between                 SSB                   k - 1                  MSB             MSB/MSW
Within                  SSW                   nT - k                 MSW
Total                   SST = SSB + SSW       nT - 1
Formulas or symbols in the table should be replaced by the numerical values they generate when an actual
problem is being solved.
Example
To illustrate the use of the formulas above, we apply this analysis to the example introduced at the
beginning of this document. Since the example includes data for samples of five items from each of four
populations, we have that k = 4 (number of populations being compared) and nT = 20 (total number of
observations available). This allows us to fill in the ‘degrees of freedom’ column of the ANOVA table with
the values 3, 16, and 19 in order from top to bottom.
The sample means and standard deviations are given in the table on page 1 of this document. Thus,
SSW = (n1 - 1)s1² + (n2 - 1)s2² + (n3 - 1)s3² + (n4 - 1)s4²
    = (5 - 1)(2.487)² + (5 - 1)(3.350)² + (5 - 1)(1.182)² + (5 - 1)(4.042)²
    = 140.576
Similarly, since the mean of all 20 observations works out to be 33.375, we get for SSB:
$$SSB = n_1\left(\bar{x}_1-\bar{x}\right)^2 + n_2\left(\bar{x}_2-\bar{x}\right)^2 + n_3\left(\bar{x}_3-\bar{x}\right)^2 + n_4\left(\bar{x}_4-\bar{x}\right)^2$$
    = (5)(35.42 - 33.375)² + (5)(34 - 33.375)² + (5)(29.08 - 33.375)² + (5)(35 - 33.375)²
    = 128.3015
For what it’s worth, then
SST = SSB + SSW = 268.8775
Finally,
$$MSB = \frac{SSB}{k-1} = \frac{128.3015}{3} = 42.7672$$

and

$$MSW = \frac{SSW}{n_T-k} = \frac{140.576}{20-4} = 8.786$$

and so

$$F = \frac{MSB}{MSW} = \frac{42.7672}{8.786} = 4.868$$
Putting all of these results into the ANOVA table form gives the summary:
Source of Variation     Sum of Squares     Degrees of Freedom     Mean Square     F
Between                 128.3015           3                      42.7672         4.868
Within                  140.576            16                     8.786
Total                   268.8775           19
From the F-distribution tables, we have that F0.05,3,16 = 3.24. Thus, since 4.868 > 3.24, the rejection criterion
is met. We can reject H0 at a level of significance of 0.05, concluding that at least one of the population
means in this problem is significantly different from the others. In fact, the function call FDIST(4.868, 3, 16)
in Excel gives the p-value for this test as 0.0136. This concludes the analysis.
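As a final cross-check, scipy.stats.f_oneway carries out exactly this single-factor ANOVA calculation; the following sketch (SciPy assumed) reproduces the F statistic and p-value in the table above.

```python
from scipy import stats

blend1 = [36.3, 34.1, 33.4, 33.9, 39.4]
blend2 = [34.5, 39.1, 34.3, 30.3, 31.8]
blend3 = [28.3, 28.0, 30.9, 29.6, 28.6]
blend4 = [31.6, 36.3, 32.5, 41.5, 33.1]

# One-way (single-factor) ANOVA on the four feed blends
F, p_value = stats.f_oneway(blend1, blend2, blend3, blend4)
print(f"F = {F:.3f}, p-value = {p_value:.4f}")   # F = 4.868, p = 0.0136
```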
Having established that at least one of the population means differs from the others, the next logical step is to
try to determine which one(s) are the different one(s). This is a fairly difficult problem in itself, but many
methods have been developed for attempting to solve it. Such multiple comparison methods are beyond
the scope of this course, however. We do look at three or four approaches in MATH 3441.