Chapter 8
One-Way Analysis of Variance (ANOVA)
In Chapter 6 we learned how to use an Independent-Sample T-Test to decide if the mean
of one group is significantly different from the mean of another one. However, there’s no
reason why a study can’t have three groups, or even more. These are situations a t-test
wasn’t really designed to handle. However, a technique called Analysis of Variance
(ANOVA) can test the effect of an independent variable that has any number of groups or
levels. This flexibility established Analysis of Variance long ago as one of the most
powerful and widely used tools in data analysis. In this chapter we explain the conceptual
foundations of ANOVA and describe the procedures for using it. We’ll see that the
statistic we make a decision about in ANOVA (referred to as an “F-ratio”) is a different
kind of number than a value for t, but the strategy for making a decision using this
number is exactly the same as in the other tests we’ve already talked about. To introduce
the “analysis of variance” we start by giving you an example of variance we need to
analyze or explain.
Table 8.1

  X
 ----
  15
  14
  13
  12
  11
  10
   9
   8
   7
   6
   1
   2
   3
   4
   5
Let’s say you’re looking at the achievement test scores of 15
seventh graders. The lowest possible score is 1 and the highest
possible score is 18. The scores for these 15 students are displayed
in Table 8.1. Now, let’s say we ask you the following question:
why didn’t the students all get the same score? It’s a simple
question, but it also might strike you as a little strange. Of course
15 kids aren’t all going to give you the same score on a test. In
fact, we’d be pretty suspicious if it did happen. It’s a strange
question because we just assume that a bunch of kids aren’t all going
to get the same score on a test. But why don’t they get the same
score? Why is there variability, or variance, in the scores?
You might look at it like this. What if you knew the students all
grew up in the same small town; they had the same books and
teachers; they come from roughly the same socio-economic
background and watch the same TV shows. In short, they’ve all
been exposed to the same information growing up. Shouldn’t a
group of kids exposed to the same information get the same score
on a test that measures what they know? Of course not, but why?
You’ve got to come up with at least one reason why the 15 kids don’t
all have the same score. What are some possibilities? Well, maybe
some kids studied more than others. Maybe some kids got a good night’s sleep and
others didn’t. Maybe some kids were sick on the day of the test and others weren’t. These
are all possible explanations for why the scores aren’t all the same – for why there’s
variability in the scores. This is exactly the kind of question Analysis of Variance is
designed to answer.
This question about the 15 scores may strike you as strange, but it actually gets at the
very heart of what it means to study behavior. In fact, it’s a researcher’s job to answer
©Thomas W. Pierce 2011 -- 9-25-11
that type of question. It’s our job to notice ways in which people are different from each
other. It’s our job to try to measure the variability in the scores we collect, but beyond
that, it’s our job to explain why people are different from each other! That’s what a
theory does. It represents a proposed explanation for why the scores on some measure of
interest are not all the same. An experiment is often a careful and systematic test of a
theory. 1
Starting out, we know only one thing about each student – we know their score on the
test. Now let’s say there’s one additional thing that we know about each student. We
learn that six weeks before they took the test each of them had been randomly assigned to
one of three groups. The five students who got scores of 15, 14, 13, 12, and 11 were
randomly assigned to a group that got a lot of tutoring in how to do well on the test. The
five students who got scores of 10, 9, 8, 7, and 6 got a moderate amount of tutoring. The
five students who got scores of 1, 2, 3, 4, and 5 got no tutoring on how to take the test.
The assignment of each student to one of the three groups is displayed in Table 8.2.

Table 8.2

Group               X
----------------   ---
Lot of Tutoring    15
(X̄1 = 13)          14
                   13
                   12
                   11
----------------   ---
Moderate Amount    10
(X̄2 = 8.0)          9
(X̄T = 8.0)          8
                    7
                    6
----------------   ---
No Tutoring         1
(X̄3 = 3.0)          2
                    3
                    4
                    5

Now we know two things about each student. We know their score on the test and we
know which group they were in. The score on the achievement test is the dependent
variable in the experiment and the amount of tutoring the student got is the
independent variable.

A few symbols to get out of the way

So we don’t have to keep writing out phrases like “Amount of Tutoring” over and over,
this seems like a good time to introduce a few symbols that will make equations easier to
write out and remember. First, we use the capital letter “A” to represent the independent
variable. This means that instead of writing “Amount of Tutoring” we can just write “A”.
Second, we use a small-case letter “a” to represent the number of levels of the
independent variable. In the example there are three groups or levels of A, so we’d say
that “a” is equal to three (a = 3). Next, whenever you see the small-case letter “a” with a
number subscripted beside it, we’re referring to a particular
1 In this chapter we make the assumption that ANOVA is being conducted on data from an experiment. As
discussed in Chapter 6, an experiment involves an independent variable that has been manipulated in order
to determine if it causes an effect on a dependent variable. ANOVA can just as easily be used to analyze
data from studies using a “quasi-experimental” or “intact groups” design where the levels of the
independent variable are comprised of different groups, but the researcher has done nothing to make the
groups different from each other. Quasi-experimental designs are, essentially, correlational in nature.
Therefore, the researcher cannot conclude on the basis of a significant test that the independent variable
caused an effect on the dependent variable.
group (i.e., a1 = Group 1). Finally, a small-case letter “n” represents the number of people
in a particular group. In our example, there are five people in each group so we can say
that n = 5. Now back to our story…
The effect of the independent variable for one person
Now let’s talk about how we can get a sense of whether the amount of tutoring can help
to explain at least a little bit of the variability in the 15 achievement test scores. To start
with, we know that the mean score for everyone who took the test was 8.0. We can refer
to this value as the Total Mean, represented by the symbol X̄T. We also know the mean
score for the students in each group. The mean score for the students in the first group –
the group that got a lot of tutoring – is 13.0. The mean score for the students in the
second group – the group that got a moderate amount of tutoring – is 8.0. The mean score
for the students in the third group – the group that didn’t get any tutoring – is 3.0. We use
the symbol X̄G to represent the mean of a group. These means are also displayed in Table
8.2.
Eventually, we want to be able to know how good a job the independent variable does at
explaining the variability of everyone’s scores on the dependent variable. This will tell us
if the Amount of Tutoring can explain at least a little bit of why those 15 raw scores
weren’t all the same – why they weren’t all equal to that Total Mean of 8.0. But let’s say
for right now that we’d like to see how good a job the independent variable can do at
explaining just one person’s score. If we look at the person who got the score of 15, why
didn’t that one person get a score equal to the Total Mean of 8.0? Can knowing the
amount of tutoring they got help to answer this question?
If we want to explain why that person had a score that was different from the mean of
everybody, the place to start is to figure out exactly how much of a deviation from this
Total Mean there is to explain. And that’s easy. The person’s raw score is 15 and the
mean of everybody is 8.0. That means the person had a score 7 points higher than the
mean of everybody. This tells us that the total deviation for that one person that needs
to be explained is a deviation of 7 points. More formally, this deviation can be expressed
as:
Total Deviation = X – X̄T = 15 – 8.0 = +7
Out of that Total Deviation of seven points, how much can we explain if we take into
account the amount of tutoring the person got? Look at it this way. Let’s say you’ve been
asked to provide your best guess about what the person’s score is. You don’t know what
score they got and you don’t know which group they were in. All you know is that they
were one of the 15 students in the class. If you don’t know which group they were in
your best guess would have to be the mean of all fifteen students – the Total Mean of
8.0. And how far off would your best guess be? You’d be off by 7 points. That’s the
Total Deviation we just talked about. Now, let’s say you’re given one additional piece of
information. You find out that the person had been in the group that got a lot of tutoring.
Now what’s your best guess? Are you still going to go with the 8.0, the mean of all 15
students? No! Of course not; you’ll go with the best information you’ve got. You’ll use
the mean of the person’s group as your best guess. And the mean of that person’s group
is 13.0. Think about it. Your best guess when you don’t know which group they’re in is
8.0. Your best guess when you do know which group they’re in is 13.0. How much
more accurate is your best guess when you have this one additional piece of information
– when you know how much tutoring they got? It’s the difference between these two
best guesses. It’s the difference between the mean of their group (13) and the mean of
everybody (8.0), a difference of 5.0 points. We can refer to this as the Deviation
Accounted-for and the equation for calculating it is:
Deviation Accounted-for = X̄G – X̄T = 13 – 8 = +5
Knowing which group the person is in takes you 5 points closer to their actual score.
This deviation of 5 points is a deviation we can account for because we can explain
where it comes from. It comes from the fact that people who get a lot of tutoring have
scores that are 5 points higher, on average, than the mean of all the students who took the
test.
At this point we can say we can explain 5 points out of the total of 7. That seems pretty
good. So if the independent variable can account for 5 units out of the total of 7, what
can the Amount of Tutoring not account for? It must be the remaining two points. And
here’s where it comes from. We know the person is in the group that got a lot of tutoring.
And we know that everyone within that group was treated exactly alike. They all had the
same instructor, for the same amount of time, using the same materials, at the same time
of day, etc. You get the picture. So why didn’t the five students in that group all get the
same score? Why did our student get a score that was two points higher than the mean of
their group? The answer has to be that we don’t know! There must be some explanation
for it, but we don't have it. That makes the deviation between the person’s raw score and
the mean of their group something that the independent variable can’t explain. We refer
to this value as the Deviation Not-accounted-for. The equation for calculating it is:
Deviation Not-accounted-for = X - X
G
= 15-13 = +2
Taking all three deviations into account we end up with an interesting relationship. The
total deviation we need to explain (7 points) is equal to a deviation the independent
variable can explain (5 points) plus a deviation the independent variable cannot explain
(2 points). The relationship looks like this:
Total Deviation   =   Deviation Accounted-For   +   Deviation Not-Accounted-For

    X – X̄T        =         X̄G – X̄T             +          X – X̄G

      +7          =            +5                +            +2

And the neat thing is that this relationship holds for any of the fifteen raw scores we
might want to look at.
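The arithmetic for this one student can be sketched in a few lines of Python (an illustration added here, not part of the original chapter):

```python
# A sketch of the three deviations for the student who scored 15,
# using the values from Tables 8.1 and 8.2.
score = 15
total_mean = 8.0    # mean of all 15 scores (the Total Mean)
group_mean = 13.0   # mean of the "Lot of Tutoring" group

total_dev = score - total_mean        # Total Deviation: +7
accounted = group_mean - total_mean   # Deviation Accounted-for: +5
not_accounted = score - group_mean    # Deviation Not-accounted-for: +2

# The total deviation is exactly the sum of the two pieces.
assert total_dev == accounted + not_accounted
print(total_dev, accounted, not_accounted)  # prints: 7.0 5.0 2.0
```

Swapping in any of the other fourteen scores (and the mean of that student's group) gives the same identity.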
Evaluating the ability of the independent variable to explain everyone’s scores
We’ve now figured out how good a job the independent variable does at explaining the
deviation of one person’s score from the Total Mean. Now we’re ready for the next step.
How good a job does the Amount of Tutoring do at accounting for the variability of all
fifteen scores around the Total Mean?

Determining the total amount of variability that needs to be accounted for

Let’s start in the same place we did before. If we want to explain the variability of a set
of raw scores we first have to ask “variability around what?” The answer is “variability
around the Total Mean”. We need to measure the total amount of variability that needs to
be explained or accounted for. If, at the level of one person’s raw score, we subtracted the
Total Mean from their raw score, we should do the same thing for all fifteen raw scores.
This will give us 15 deviations of raw scores from the Total Mean. It’ll give us 15 Total
Deviations that need to be accounted for. These Total Deviations are displayed in Table
8.3.

Table 8.3

Group               X        X – X̄T
----------------   ------   -------
Lot of Tutoring    15 – 8     +7
(X̄1 = 13)          14 – 8     +6
                   13 – 8     +5
                   12 – 8     +4
                   11 – 8     +3
----------------   ------   -------
Moderate Amount    10 – 8     +2
(X̄2 = 8.0)          9 – 8     +1
(X̄T = 8.0)          8 – 8      0
                    7 – 8     -1
                    6 – 8     -2
----------------   ------   -------
No Tutoring         1 – 8     -7
(X̄3 = 3.0)          2 – 8     -6
                    3 – 8     -5
                    4 – 8     -4
                    5 – 8     -3

Instead of having 15 separate deviations that need to be accounted for, we want a single
number that measures the total amount of variability among the 15 scores that needs to
be explained. We learned in Chapter 2 that if we square every deviation from the mean
and then add these squared deviations up we’ll get a perfectly good measure of
variability: the sum of squares. Table 8.4 shows that doing that here gives us a sum of
squares of 280. The total amount of variability that needs to be explained is a sum of
squares of 280. We can refer to this amount of variability as the Sum of Squares Total.

Table 8.4

Group               X        X – X̄T   (X – X̄T)²
----------------   ------   -------   ---------
Lot of Tutoring    15 – 8     +7         49
(X̄1 = 13)          14 – 8     +6         36
                   13 – 8     +5         25
                   12 – 8     +4         16
                   11 – 8     +3          9
----------------   ------   -------   ---------
Moderate Amount    10 – 8     +2          4
(X̄2 = 8.0)          9 – 8     +1          1
(X̄T = 8.0)          8 – 8      0          0
                    7 – 8     -1          1
                    6 – 8     -2          4
----------------   ------   -------   ---------
No Tutoring         1 – 8     -7         49
(X̄3 = 3.0)          2 – 8     -6         36
                    3 – 8     -5         25
                    4 – 8     -4         16
                    5 – 8     -3          9
                                       ------
                                        280
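As a check on the arithmetic, the Sum of Squares Total can be computed directly (a Python sketch, not part of the original chapter):

```python
# Sum of Squares Total for the 15 scores in Table 8.1.
scores = [15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 1, 2, 3, 4, 5]
total_mean = sum(scores) / len(scores)                 # 8.0, the Total Mean
ss_total = sum((x - total_mean) ** 2 for x in scores)  # sum the squared deviations
print(ss_total)  # prints: 280.0
```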
Determining the amount of variability accounted for by the independent variable
Out of the total amount of variability we need to explain of 280, how much can be
accounted for by knowing how much tutoring students got? We should start by
remembering how we calculated the Deviation Accounted-For for just one subject. It was
based on the deviation between the Total Mean and the mean of their group. If we can do
this for one person, we can do it for everyone. For all 15 students, let’s take the mean of
their group and subtract the Total Mean. Then we can take these 15 Deviations
Accounted-For and square them. When we add these Squared Deviations Accounted-For
up we get a sum of squared deviations of 250. We can say the Sum of Squares
Accounted-For is 250. These calculations are presented in Table 8.5.
Table 8.5

Group               X     X̄G – X̄T       (X̄G – X̄T)²
----------------   ---   ------------   ----------
Lot of Tutoring    15    13 – 8 = +5        25
(X̄1 = 13)          14    13 – 8 = +5        25
                   13    13 – 8 = +5        25
                   12    13 – 8 = +5        25
                   11    13 – 8 = +5        25
----------------   ---   ------------   ----------
Moderate Amount    10     8 – 8 =  0         0
(X̄2 = 8.0)          9     8 – 8 =  0         0
(X̄T = 8.0)          8     8 – 8 =  0         0
                    7     8 – 8 =  0         0
                    6     8 – 8 =  0         0
----------------   ---   ------------   ----------
No Tutoring         1     3 – 8 = -5        25
(X̄3 = 3.0)          2     3 – 8 = -5        25
                    3     3 – 8 = -5        25
                    4     3 – 8 = -5        25
                    5     3 – 8 = -5        25
                                          ------
                                           250
Taking all the scores into account, we can say that the independent variable accounts for
250 units out of the total of 280. Another way of looking at it is that out of all the reasons
for why students could differ from each other on achievement test scores, our one
proposed explanation – the Amount of Tutoring – accounts for 250 units out of the total
of 280.
One way of assessing the ability of the Amount of Tutoring to account for the variability
of scores on the achievement test is to determine the proportion of variability accounted-
for. To get this all we have to do is take the amount of variability we’re able to account
for (250 units) and divide it by the amount of variability we needed to account for (280
units). This gives us a value of .89 and lets us say that the independent variable accounts
for 89% of the variability in the dependent variable. Essentially, we’ve calculated the
squared correlation between the two variables in the study and determined that they
overlap by 89%. This value gives us one way of measuring the size of the effect of the
independent variable. We’ll discuss the issue of effect size in more detail in Chapter X.
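The proportion-of-variability calculation can be sketched the same way (again a Python illustration, not from the original text):

```python
# Sum of Squares Accounted-For (between groups) and the proportion of
# variability in the test scores that the Amount of Tutoring explains.
groups = {
    "lot":      [15, 14, 13, 12, 11],
    "moderate": [10, 9, 8, 7, 6],
    "none":     [1, 2, 3, 4, 5],
}
all_scores = [x for g in groups.values() for x in g]
total_mean = sum(all_scores) / len(all_scores)             # 8.0

# Each group's squared Deviation Accounted-For, counted once per member.
ss_between = sum(len(g) * (sum(g) / len(g) - total_mean) ** 2
                 for g in groups.values())                 # 250.0
ss_total = sum((x - total_mean) ** 2 for x in all_scores)  # 280.0
print(round(ss_between / ss_total, 2))  # prints: 0.89
```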
Determining the variability not accounted for by the independent variable
Ok, we know our independent variable accounts for 250 units out of the total of 280, but
now we need to calculate the amount of variability that’s not accounted for. It seems like
it should be a sum of squares of 30 (and it is), but where does this value come from?
Remember, we measured the Deviation Not-Accounted-For for one person by taking
their raw score and subtracting the mean of their group. This deviation was “not
accounted for” because we didn’t have an explanation for why the scores in a group
could be different from each other when everyone in that group was treated exactly alike.
If that’s what we did at the level of a single person, that’s what we ought to do with
everyone’s scores. We should take all 15 raw scores and subtract the means of their
respective groups. When we do this we end up with 15 Deviations Not-Accounted-For.
To get a measure of the variability that is not accounted for by the independent variable,
we take all 15 of these deviations and square them. When we add these squared
deviations up we get the Sum of Squares Not-Accounted-For. These calculations are
displayed in Table 8.6.
Table 8.6

Group               X – X̄G         (X – X̄G)²
----------------   ------------   ---------
Lot of Tutoring    15 – 13 = +2       4
(X̄1 = 13)          14 – 13 = +1       1
                   13 – 13 =  0       0
                   12 – 13 = -1       1
                   11 – 13 = -2       4
----------------   ------------   ---------
Moderate Amount    10 – 8 = +2        4
(X̄2 = 8.0)          9 – 8 = +1        1
(X̄T = 8.0)          8 – 8 =  0        0
                    7 – 8 = -1        1
                    6 – 8 = -2        4
----------------   ------------   ---------
No Tutoring         1 – 3 = -2        4
(X̄3 = 3.0)          2 – 3 = -1        1
                    3 – 3 =  0        0
                    4 – 3 = +1        1
                    5 – 3 = +2        4
                                    -----
                                     30
The Sum of Squares Not-Accounted-For ends up being 30 – just what we thought it
would be.
We saw before that the Total Deviation for one person’s score is equal to a deviation that
is accounted for by the independent variable plus a deviation that is not accounted for by
the independent variable. And the same relationship holds when you take all the scores
into account. The Sum of Squares Total is equal to the Sum of Squares Accounted-For
plus the Sum of Squares Not-Accounted-For.
  SS Total     =   SS Accounted-For   +   SS Not-Accounted-For

Σ(X – X̄T)²    =     Σ(X̄G – X̄T)²     +      Σ(X – X̄G)²

    280        =         250          +          30
This kind of makes sense: everything we need to know is equal to what we do know, plus
what we don’t know. It turns out that the Sum of Squares Total is something that we can
take apart. A statistician would say we can partition the Sum of Squares Total into two
pieces: the Sum of Squares Accounted-For and the Sum of Squares Not-Accounted-For.
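This partition is easy to verify numerically (a Python sketch added for illustration):

```python
# Verify that SS Total partitions into SS Accounted-For (between groups)
# plus SS Not-Accounted-For (within groups): 280 = 250 + 30.
groups = [[15, 14, 13, 12, 11], [10, 9, 8, 7, 6], [1, 2, 3, 4, 5]]
all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)  # the Total Mean, 8.0

ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

assert ss_total == ss_between + ss_within
print(ss_total, ss_between, ss_within)  # prints: 280.0 250.0 30.0
```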
Further exploration of the three sources of variability
Now that we’ve seen how we can quantify the degree to which the independent variable
is able to account for variability in the dependent variable, let’s take a closer look at these
sums of squares. What would the scores in a data set have to look like in order for the
Sum of Squares Total to be equal to zero? Could that really happen? No, probably not.
It would be a situation where there was nothing for the independent variable to have to
explain. It would be a dependent variable where every time you took a raw score and
subtracted the Total Mean you’d always get zero. And the only way for that to happen
would be if all the raw scores were the same number. It would be a situation where there
was no variability at all in the data set.
What would the data have to look like to get a SS Total that’s greater than zero, but a
SS Accounted-For that is equal to zero? Well, you know that the scores in the data set
aren’t all the same because the SS Total is greater than zero. The only way for the SS
Accounted-For to be equal to zero is if every time you took a person’s group mean and
subtracted the Total Mean, you got a value of zero. The only way for this to happen is if
group means are always equal to the Total Mean – and the only way for this to happen is
if all of the group means are equal to each other. So why does this make sense? If
tutoring didn’t have anything to do with achievement test scores – if tutoring has no
effect on achievement test scores – what would you expect these group means to be? If
tutoring doesn’t do anything to achievement test scores is there any reason to think that
one group should do any better than another group? No! If tutoring has absolutely no
effect on achievement test scores the group means should all be the same. Because the
average of all the group means has to turn out to be the mean of all the subjects in the
study, this would mean that the group means should also turn out to be equal to the mean
of everyone. In this case there would be no variability between the groups.
Statisticians refer to the Sum of Squares Accounted-For as the Sum of Squares
Between-Groups. This makes sense because we have an explanation for why we see
differences between the groups. We know the experimenter did something to make the
groups different from each other in terms of the independent variable.
What would the data have to look like for the SS Total to be greater than zero, but the
SS Not-Accounted-For to be equal to zero? Well, the Sum of Squares Not-Accounted-For is based on deviations between a person’s raw score and the mean of their group. For
this sum of squares to be equal to zero the deviation between a raw score and the group
mean would always have to be zero. The only way for this to happen would be if all the
scores within each group were the same. In this situation there would be no variability
among the scores within the groups.
Statisticians refer to the Sum of Squares Not-Accounted-For as the Sum of Squares
Within-Groups. This is because, when everyone in a group is treated exactly alike, the
independent variable can’t possibly explain why the scores within the group aren’t all the
same.
The F-ratio
Now that we have a better sense of how to think about three important sources of
variability – Total, Accounted-For, and Not-Accounted-For – let’s get back to our
original question. How do we decide if the Amount of Tutoring accounts for a significant
amount of variability in achievement test scores? This is a yes or no question. Either the
independent variable has a significant effect on the dependent variable or it doesn’t. The
names for the options are the same as those we worked with in conducting Z-tests and t-tests. The null hypothesis for this question is that there is no significant effect of tutoring
on achievement test scores. The alternative hypothesis is that there is a significant effect
of tutoring on achievement test scores.
If we lived in a perfect world, it would be easy to tell if the independent variable had an
effect on the dependent variable. For the null hypothesis to be false, all we’d have to be
able to say is that the IV had some effect on the dependent variable. It doesn’t have to
have a large effect or even a noticeable effect. It’s a matter of whether it had any effect.
All you’d have to do is to see if the SS Accounted-For was greater than zero or not. If
it’s equal to zero there’s no evidence the null hypothesis is false and there’s no reason to
conclude that the independent variable has an effect on the dependent variable. If it’s
greater than zero the null hypothesis must be false; any differences between the means
would indicate that, to at least some extent, changing the conditions in terms of the
independent variable result in changes in the scores on the dependent variable.
Unfortunately, we don’t live in that perfect world. All we have to work with are sample
means – estimates – and these estimates don’t have to be perfect. It’s almost certain that
even if the independent variable did nothing to people’s scores, the sample means will be
at least a little bit different from each other, just by chance. Even when the null
hypothesis is true the Sum of Squares Accounted-For is almost certain to be at least a
little bit bigger than zero, just by chance. We can’t tell if the independent variable had an
effect on the dependent variable by just looking to see if the Sum of Squares Accounted-For is greater than zero or not. It could be different from zero just by accident – by
chance alone.
Because the Sum of Squares Accounted-For could be greater than zero just by chance the
question becomes one of deciding whether the Sum of Squares Accounted-For is enough
greater than zero to be confident that it’s not just greater than zero by chance. We’re now
in exactly the same kind of situation we were in when we were doing Z-tests and t-tests.
We’re forced to make a decision based on some odds. Just like with a t-test it’ll turn out
that the only thing we’ll be able to know for sure are the odds of making a mistake if we
decide to reject the null hypothesis.
The idea behind both a z-test and a one-sample t-test was that there was one number we
were making our decision about, the mean of a sample. The question was whether we
were willing to believe that our one sample mean was a member of a collection of other
sample means obtained when the null hypothesis was true. We compared one sample
mean to a bunch of other sample means to see if it belonged with them.
The idea behind an independent samples t-test was that there was one number we made
our decision about, the difference between two sample means. The question was whether
we were willing to believe that our one difference between sample means was a member
of a collection of other differences between means that were obtained when the null
hypothesis was true. We compared one difference between means to a bunch of other
differences between means to see if it belonged with them.
No matter what the situation was, we handled it by comparing one number to a bunch of
other numbers to see if it belonged with them.
So what should we do in this latest situation? The same kind of thing! We just need to
decide what kind of number we’re going to make our decision about. Then, we’ll figure
out what these numbers would look like if we repeated this same experiment thousands
and thousands of times when the null hypothesis was true. The situation will then be just
a matter of deciding whether or not we’re confident that our number belonged in this
collection. If we decide our number doesn’t belong in a collection of numbers obtained
when the null hypothesis is true, we’ll have reason to be confident it must have been
obtained when the null hypothesis was false – we’ll decide that the Amount of Tutoring
had an effect on Achievement Test scores.
So what kind of number should we use? The Sum of Squares Accounted-For would not
be a good choice because it’s influenced by both the effect of the independent variable
and the number of participants in the study (adding up a larger number of squared
deviations will give a larger sum of those squared deviations). Two studies could have
the identical means, but the study with the larger sample size will end up with a larger
Sum of Squares Accounted-For. So, we’re going to have to use a number that’s not
influenced by the sample size.
The measure that Ronald Fisher latched onto in the 1930s was based on the ratio of the
amount of variability accounted-for by the independent variable to the amount of
variability not-accounted-for by the independent variable, or…
Variability Accounted-For
--------------------------------------Variability Not-Accounted-For
Now, we’ve already measured these amounts of variability; we’ve got sums of squares
for them. So, the tempting thing to do at this point would be to take the Sum of Squares
Accounted-For and divide it by the Sum of Squares Not-Accounted-For, giving us a ratio
of 8.33.
  SS Accounted-For           250
------------------------  =  ----  =  8.33
 SS Not-Accounted-For         30
Unfortunately, we already know you can’t compare one sum of squares to another sum of
squares when they’re based on different numbers of values. It just doesn’t make sense,
for example, to compare the sum of 20 squared deviations to the sum of only 10 squared
deviations.
At this point you might well be saying “Hey, wait a minute! Didn’t we just add up 15
squared deviations to get the SS Accounted-For and didn’t we add up 15 squared deviations to
get the SS Not-Accounted-For? We added up the same number of values each time. Why can’t I
compare these numbers to each other?” And that would be a very reasonable question.
The answer is that what matters isn’t the number of values we added up. What matters is
the number of independent values we added up and, whether it seemed like it or not, they
weren’t always equal to 15. For example, the SS Accounted-For was based on deviations
between the group means and the total mean. How many different deviations between
group means and the total mean were there? There were only three group means so we
were only dealing with three different deviations (+5, 0, -5). We didn’t have 15
independent numbers. In that situation, we only had three. For the moment, I guess
you’ll just have to trust us, but in a few pages we’ll show you why the number of
independent values that contribute to the Accounted-For source of variability is very
different from the number of independent values that contribute to the Not-Accounted-For source of variability.
Okay, so what do we do? Well, you know you can’t compare the sum of one set of
numbers to the sum of another set of numbers when they’re based on different numbers
of values, but you can compare the mean of one set of numbers to the mean of another
set of numbers, even when the two sets are based on different numbers of values.
And that’s what we need to do here. Instead of comparing the SUM of squared
deviations accounted-for to the SUM of squared deviations not-accounted-for, we need to
compare the MEAN of the squared deviations accounted-for to the MEAN of the squared
deviations not-accounted-for. And what’s another name for the mean of a bunch of
squared deviations? The variance! The statistic we need will come from taking the
Variance Accounted-For and dividing it by the Variance Not-Accounted-For. The name
for this value is the F-ratio and is represented by the letter F.
Variance Accounted-For
F = ------------------------------------Variance Not-Accounted-For
The strategy Fisher came up with was to compare one variance to another variance.
Logically enough, he referred to this technique as Analysis of Variance.
So how do we get these variances? That’s easy. You remember how to calculate the
variance for a set of raw scores. You take the sum of squares and then divide it by the
appropriate number of degrees of freedom.
        Σ(X – X̄)²
S²  =  -----------
          N – 1
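Anticipating where the table below ends up, the whole F-ratio calculation can be sketched using the sums of squares computed earlier and the standard one-way ANOVA degrees of freedom: a – 1 between groups (derived below) and N – a within groups (a Python illustration, not part of the original text):

```python
# F-ratio for the tutoring example: each Mean Square (variance) is a
# sum of squares divided by its degrees of freedom.
ss_between, ss_within = 250.0, 30.0
a, N = 3, 15                       # number of groups, total number of students

ms_between = ss_between / (a - 1)  # 250 / 2  = 125.0
ms_within = ss_within / (N - a)    # 30 / 12  = 2.5
F = ms_between / ms_within
print(F)  # prints: 50.0
```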
The steps needed to calculate the F-ratio for our experiment are organized in a table
referred to as an ANOVA Table. These steps are presented as we work our way from the
left side of the table to the right. The ANOVA table, shown in Table 8.7, starts by listing
the two sources of variability we need to calculate an F-ratio – variability accounted-for
and variability not-accounted-for. To be consistent with the traditional jargon associated
with ANOVA, we use the term Between-Groups in place of the more intuitively
meaningful term “Accounted-For”. We use the term Within-Groups in place of the term
“Not-Accounted-For” and we also include a row at the bottom of the table for the total
amount of variability.
Table 8.7

Source
--------------------
Between-Groups
Within-Groups
Total
In Table 8.8 we can now list the sum of squares for each source of variability.
Table 8.8

Source                  SS
--------------------   ----
Between-Groups          250
Within-Groups            30
Total                   280
At this point, we know we need the variances that correspond to each of the two sources
of variability. To get them, we divide each sum of squares by its appropriate number of
degrees of freedom. The abbreviation “df” in Table 8.9 below stands for “degrees of
freedom”. “MS” refers to the “Mean Square” for each source of variability. A Mean
Square is the same thing as a variance. Remember, the variance is nothing more than the
mean of a bunch of squared deviations. For the moment, we simply describe the
procedure for calculating the number of degrees of freedom for each source of variability.
Once we’ve gotten our F-ratio we’ll go back and explain where these numbers came
from.
The number of degrees of freedom accounted-for is equal to the number of groups minus
one. We use the symbol “a” to represent the number of groups, so the equation for the
number of degrees of freedom accounted-for becomes “a – 1”. There are three groups in
this example, so this leaves us with 2 degrees of freedom. When we divide the Sum of
Squares Between-Groups of 250 by its 2 degrees of freedom we get a Mean Square
Between-Groups of 125. The variance accounted-for by the independent variable is
125.
The number of degrees of freedom Within-Groups is equal to the number of groups
multiplied by the number of people in each group minus one. We use the symbol “n” to
represent the number of participants in each group, so the equation becomes “(a)(n-1)”.
For the Within-Groups row, a sum of squares of 30 is divided by 12 degrees of freedom,
giving us a Mean Square Within-Groups of 2.5. The variance not-accounted-for by the
independent variable is 2.5. The steps used to calculate both the Mean Square Between
Groups and the Mean Square Within-Groups are presented in Table 8.9.
Table 8.9

Source                  SS    df    MS
--------------------   ----   ---   -----
Between-Groups          250     2   125        df = a – 1
Within-Groups            30    12   2.5        df = (a)(n – 1)
As shown in Table 8.10, the F-ratio is simply the Mean Square Accounted-For of 125
divided by the MS Not-Accounted-For of 2.5. This gives us a value of 50.0.
Table 8.10

Source                  SS    df    MS      F
--------------------   ----   ---   -----   ----
Between-Groups          250     2   125     50
Within-Groups            30    12   2.5
The F-ratio for this data set is 50.0. So what does this number mean? It’s telling us that
the ratio of variance accounted-for to variance not-accounted-for is 50:1. Another way of
saying the same thing is that the variance accounted-for is 50 times larger than the
variance not-accounted-for. That’s the definition of an F-ratio! It tells us how many times
larger the variance accounted-for is than the variance not-accounted-for.
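If you want to verify the whole calculation by machine, here’s a short NumPy sketch. The exact assignment of the fifteen scores from Table 8.1 to groups is our assumption, chosen to match the group means of 13, 8, and 3 used in this example:

```python
import numpy as np

# The fifteen scores from Table 8.1, split into three groups of five
groups = [np.array([15, 14, 13, 12, 11]),    # group mean 13
          np.array([10, 9, 8, 7, 6]),        # group mean 8
          np.array([1, 2, 3, 4, 5])]         # group mean 3

a = len(groups)                               # number of groups
n = len(groups[0])                            # scores per group
grand_mean = np.mean(np.concatenate(groups))  # the Total Mean, 8.0

# Between-Groups: each group mean's squared deviation from the Total Mean,
# counted once for every person in the group
ss_between = n * sum((g.mean() - grand_mean) ** 2 for g in groups)

# Within-Groups: each raw score's squared deviation from its own group mean
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)

ms_between = ss_between / (a - 1)             # 250 / 2  = 125
ms_within = ss_within / (a * (n - 1))         # 30  / 12 = 2.5

F = ms_between / ms_within
print(ss_between, ss_within, F)               # 250.0 30.0 50.0
```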
Is an F-ratio of 50 good? It sounds pretty good. The important question is really whether
this F-ratio is large enough for us to be confident that the amount of tutoring really did
have an effect on the achievement test scores. Is it large enough for us to be confident
that it’s not that large just by chance? Unfortunately, it will always be possible that the F-ratio we get is as large as it is just by chance alone. Because of this we’ll never be able to
know for sure if we’re making a mistake if we decide to reject the null hypothesis. But,
just like in a t-test we’ll be able to know the odds of making a mistake if we reject the
null hypothesis. If we use an alpha level of .05 we’re saying we’re willing to reject the
null hypothesis if we can show that the odds are less than 5% that it’s true. We’re saying
that we’re willing to take on a 5% risk of making a Type I error. Our next step is to figure
out whether or not the odds really are less than 5% that our null hypothesis is true.
The sampling distribution of the F-ratio and the critical value for F
In this chapter we’re asked to make a decision about an F-ratio, so we can refer to that
decision as an F-test. This F-test uses the same strategy as every other test we’ve talked
about. In a Z-test and a one-sample t-test, we compared one sample mean to a bunch of
other sample means to see if it belonged with them. In an independent-samples t-test we
compared one difference between means to a bunch of other differences between means
to see if it belonged with them. To make our decision in the same way here we need to
compare our F-ratio to a bunch of other F-ratios to see if it belongs with them. In other
words, we need to find out what F-ratios look like when the null hypothesis is true and
then see if the odds are less than 5% that our F-ratio belongs with these other ones. It’s
the same thing as always – one number compared to a bunch of other numbers to see if it
belongs with them.
So how can we figure out what F-ratios look like when the independent variable doesn’t
work – when the null hypothesis is true? The truth is that we don’t need to. We’ve got
statisticians to do it for us. They can apply the laws of probability to determine what
these F-ratios would look like if someone were to go out and get them. But we can
imagine what the process would be like if we went out and tried to get them by hand.
First, you’d have to imagine you could know for sure that the null hypothesis is true, but
that you went ahead and did the experiment anyway. The first time you do this, let’s say
the F ratio turns out to be 0.80. In Figure 8.1 we can locate this F-ratio on a scale of
possible F-ratios.
Figure 8.1
[A scale of possible F-ratios running from 0 to 4, with the single F-ratio of 0.80 marked on it.]
Now, let’s say the null hypothesis is true and we do the same experiment a second time.
There’s no reason, other than chance for the F-ratio to be greater than zero, but now the
F-ratio turns out to be 1.20. Figure 8.2 shows where this second F-ratio goes on the
scale. Now, we have two F-ratios collected when the null hypothesis is true.
Figure 8.2
[The same scale from 0 to 4, now with two F-ratios marked on it: 0.80 and 1.20.]
Now, imagine that you do the same experiment over and over and over and over and over. And
every time you do the experiment you collect an F-ratio when the null hypothesis is true.
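A short simulation can stand in for all that imaginary experimenting. The sketch below, assuming NumPy and SciPy are available, draws three groups of five from the same population thousands of times and computes an F-ratio for each experiment:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

f_ratios = []
for _ in range(5_000):
    # Null hypothesis true: all three groups come from the SAME population
    g1, g2, g3 = rng.normal(loc=8, scale=2, size=(3, 5))
    f_ratios.append(f_oneway(g1, g2, g3).statistic)
f_ratios = np.array(f_ratios)

# Most null F-ratios pile up near 1, and only about 5% land at or above
# the critical value of 3.89 for 2 and 12 degrees of freedom
print(np.median(f_ratios))            # somewhere below 1
print(np.mean(f_ratios >= 3.89))      # close to .05
```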
Figure 8.3 gives us a sense of what F-ratios look like when the null hypothesis is true.
The name for this collection of numbers is the Sampling Distribution of the F-ratio.
[Figure 8.3: a frequency pile of F-ratios on the scale from 0 to 4. Most stack up between 0 and 1, with fewer and fewer appearing as the values get larger.]
Figure 8.4 shows that if we collected the F-ratio from thousands and thousands of
experiments conducted when H0 was true, we’d find that the shape of this pile of numbers
would be a nice smooth curve. It’s not a normal curve because it’s badly skewed to the
right. It’s an F-curve, because it shows us how F-ratios pile up on the scale.
[Figure 8.4: a smooth, positively skewed F-curve over the scale of possible F-ratios.]
Knowing the shape of this distribution, a statistician can
tell us how far up on the scale we have to go before we hit the start of the upper 5% of
numbers that belong in the collection – the 5% of F-ratios we’re least likely to get when
the null hypothesis is true. That’s where the critical value for F comes from. It’s how far
up the scale an F-ratio has to be before the odds become 5% or less that the null
hypothesis is true.2
2
The reason critical values for F are different for different combinations of df Between-Groups and df Within-Groups is
that every time you change either the number of groups or the number of subjects in each group you change
the shape of the curve. And if you change the shape of the curve you change the location of the starting
place of the upper 5% of the F-ratios that make up the curve. The more degrees of freedom you have in
either the numerator or the denominator of the F-ratio the more the F-ratios are clustered closer to the
center of the curve (giving you smaller critical values). In our example, the critical value for F was 3.89
because 3.89 was how far up the scale we’d have to go to get to the start of the outer 5% of the area under a
curve with that particular shape.
Pragmatically, we can find the critical value for F we need by looking it up in a Critical
Values for F Table (see Tables 4.1 – 4.4 in the Appendix). There are three things you
need to know in order to look up a critical value for F. First, you need to know the alpha
level you want to use. In this case we’re using an alpha level of .05, so go to Table 4.1
labeled Critical Values for F – alpha = .05 (There are other pages for alpha levels of
.025 and .01). Second, you need to know which column to look in. That’s determined by
the number of degrees of freedom in the Between-Groups row (the numerator) of the
ANOVA table – “2” in this example. Third, you need to know which row to look in.
That’s determined by the number of degrees of freedom in the Within-Groups row (the
denominator) of the ANOVA table – “12” in this example. When we do this we arrive at
the value of 3.89 in the table. To say that our F-ratio is significant it has to be greater than
or equal to 3.89. Because our F-ratio of 50 is obviously greater than the critical value of
3.89 our decision is to reject the null hypothesis that there is no effect of tutoring on
achievement test scores. Our conclusion therefore becomes:
Tutoring has a significant effect on Achievement Test scores, F (2, 12) = 50.0, p < .05.
And that’s it! We compared a value for F we calculated from our data to a critical value
for F we looked up in the table. That means we’ve just done an F-test. We’ve learned
that changing the amount of tutoring students get changes the scores those students get on
the achievement test. We can be confident there are differences among the three sample
means. Chapter 9 will cover procedures for going further to determine which groups are
different from which other groups.
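In practice, software can stand in for the printed table of critical values. A sketch using SciPy (the three lists are the groups of scores assumed for this example):

```python
from scipy.stats import f, f_oneway

# Critical value: where the upper 5% of the F-curve with 2 and 12 df begins
critical = f.ppf(0.95, dfn=2, dfd=12)
print(round(critical, 2))                    # 3.89

# The whole F-test in one call, on the three groups from this example
result = f_oneway([15, 14, 13, 12, 11],
                  [10, 9, 8, 7, 6],
                  [1, 2, 3, 4, 5])
print(round(result.statistic, 1))            # 50.0
print(result.pvalue < .05)                   # True: reject the null hypothesis
```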
Degrees of Freedom in ANOVA
O.K. Now let’s go back and talk about where the degrees of freedom came from. Why
are there 2 degrees of freedom in the Between-Groups row and 12 degrees of freedom in
the Within-Groups row? It seems like it should be 14 degrees of freedom for both of
them. After all, in each case we added up 15 squared deviations to get each sum of
squares. Shouldn’t the number of degrees of freedom for each row be equal to the
number of values minus one or 15-1 = 14? When working with the Sum of Squares
Total that’s exactly what it is. The total number of degrees of freedom for the study is
equal to the total number of participants (15) minus one degree of freedom, giving us 14
degrees of freedom. But for the accounted-for and not-accounted-for sources of
variability there’s something else we have to keep in mind. The number of degrees of
freedom is always equal to the number of independent values that are free to vary. That
phrase “independent values” is what we need to focus on at the moment.
As far as the number of degrees of freedom Between-Groups goes, think of it this way.
The Sum of Squares Between-Groups (accounted-for) is based on deviations between the
group means and the Total Mean (X̄G – X̄T). In effect, we can think of this as a situation
where the three group means are being used to calculate the Total Mean. If we know that
the Total Mean is equal to 8.0 and we know that one of the three group means is equal to
13.0, are the other two group means free to vary? Do these last two group means have to
be any particular number? No. As long as we pick numbers for these last two means that
get the Total Mean to come out to 8.0, this will work out fine (See Figure 8.5).
[Figure 8.5: X̄1 = 13.0, X̄2 = ???, and X̄3 = ??? feeding into the Total Mean, X̄T = 8.0.]
Now, let’s say you know that a second group mean in the set is 8.0. Is the last group
mean fixed? Does it have to be a particular number? Now the answer is YES! If the
first two group means are 13.0 and 8.0, Figure 8.5 shows that the last group mean has to
be 3.0 in order to get the Total Mean to be 8.0.
[Figure 8.5, continued: X̄1 = 13.0 and X̄2 = 8.0 fixed, with X̄3 still unknown, feeding into X̄T = 8.0 – 2 df Accounted-For.]
Out of the three group means being used to calculate the Total Mean, only two of them
are free to vary. Once we know two of them the last one is fixed; it is not free to vary.
That’s one way of describing why the number of degrees of freedom for the Between-Groups term is equal to the number of group means minus one – in this case, two.
So how about the number of degrees of freedom Within-Groups (not-accounted-for)?
Where does this number come from? Think of it this way. The Sum of Squares Within-Groups is based on deviations between the raw scores and the group means (X – X̄G). In
effect, we can think of this situation as one where the raw scores are being used to
calculate the group means. If we look at the scores for the first group, the mean of the
five scores in this group is 13.0. If we know that one of the five scores in this group is 15,
are the other four scores in the group free to vary? Yes. You could change those
numbers around, as long as the mean of the group came out to 13.0. How about if you
know that three of the five scores in that group are 15, 14, and 13? Are the remaining
two scores free to vary? They are. Now, if you know that four of the five scores in the
group are 15, 14, 13, and 12, is the last score in that group free to vary? NO! To get the
mean of that group to come out to 13, that last score in the group has to be 11. The last
score is fixed; it is not free to vary. So when five raw scores are being used to calculate
that one group mean only four of them are free to vary. There are four degrees of freedom
within that one group.
The same thing goes for any of the three groups. There are four degrees of freedom
within the first group, four degrees of freedom within the second group and four degrees
of freedom within the third group. Taking every group into account, this brings us up to a
total of 12 degrees of freedom Within-Groups, see Figure 8.6. This corresponds to the
value we got when we applied the formula “(a)(n-1)”.3
[Figure 8.6: the fifteen raw scores listed under their three group means (X̄1 = 13.0, X̄2 = 8.0, X̄3 = 3.0), which in turn feed the Total Mean, X̄T = 8.0. Within each group one score is fixed once the other four are known, giving 4 df per group – 12 df Not-Accounted-For in all.]
We’d like to mention one more thing about degrees of freedom. In Figure 8.7 below
we’ve got the same ANOVA table we worked out before, except that we’ve added back
in the additional row for the Total source of variability. You can see in the Sum of
Squares column the same relationship we stated earlier. The SSTotal is equal to the
SSBetween-Groups plus the SSWithin-Groups. This was true because the SSTotal was something that
could be partitioned into two pieces, accounted-for and not-accounted-for.
Source                  SS    df    MS      F
--------------------   ----   ---   -----   ----
Between-Groups          250     2   125     50
Within-Groups            30    12   2.5
Total                   280    14
It turns out that the same relationship holds for degrees of freedom. The total number of
degrees of freedom is also something we can take apart, or partition. With 15 subjects
3
The equation (a)(n-1) works fine when we have the same number of people in each group. Our
discussion of ANOVA assumes equal sample sizes. An alternative method for calculating the df
not-accounted-for is to take the total number of participants, NTotal, and then subtract the number
of groups, a. The equation for this alternative strategy becomes NTotal – a.
there are 14 total degrees of freedom. Out of 14 degrees of freedom we have in all, 2 of
them can be attributed to the accounted-for source of variability and 12 of them can be
attributed to the not-accounted-for source of variability. So, the degrees of freedom
accounted-for and not-accounted-for aren’t both equal to 14, they add up to 14.
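The partitioning is easy to check with a tiny sketch for this 3-group, n = 5 design:

```python
a, n = 3, 5                  # three groups, five participants each
N = a * n                    # 15 participants in all

df_between = a - 1           # 2 df accounted-for
df_within = a * (n - 1)      # 12 df not-accounted-for
df_total = N - 1             # 14 df in all

# The degrees of freedom partition just like the sums of squares do
assert df_between + df_within == df_total
print(df_between, df_within, df_total)   # 2 12 14
```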
Conceptual definitions of the F-ratio
What value for F should we get when the null hypothesis is true? Zero? It sure seems like
it should be zero, but is that really the number we’re most likely to get? Let’s think about
it. Our experiment had three groups. Conceptually, when the null hypothesis is true, the
reality is that when we ran the experiment we started out by drawing three samples from
the same population. Then we did something we thought would make the groups different
from each other – in our example, we gave each group a different amount of tutoring.
However, in reality the independent variable didn’t do anything to change student
performance on the achievement test. Because the researcher didn’t do anything to
change anybody we end up with three samples drawn from the same population. Because
of this, when the null hypothesis is true, the means of the three samples are, in reality,
all estimates of the same population mean. This situation is represented in Figure 8.8.
[Figure 8.8: three sample means, X̄1, X̄2, and X̄3, all drawn from a single population with mean μ.]
If sample means gave us perfect estimates the SSBetween-Groups would be equal to zero
because there would be no differences at all – no variability at all – between the group
means. But of course, you can’t expect these estimates to be perfect. Even when the
group means are all supposed to be the same number, they’ll almost certainly be at least a
little bit different from each other just by chance. When there are differences between the
group means – even if only by chance – the SSBetween-Groups is going to be greater than
zero. This, in turn, will produce an F-ratio that is greater than zero. So, even when the
null hypothesis is true, the F-ratio will almost always be greater than zero just by chance.
When the null hypothesis is true, the independent variable didn’t cause the group means
to be different from each other; they’re only different from each other by chance. And in
statistics, anything you don’t have an explanation for, or that happens by chance, is
referred to as Error. When sample means are only different from each other by chance
the Mean Square Between-Groups (accounted-for) will be greater than zero due only to
Error.
As far as the denominator of the F-ratio is concerned, that number is based on the
deviations between the raw scores and the group means. Those are all deviations we
don’t have an explanation for. We don’t know why it is that when the people in a group
are all treated alike (as far as the independent variable is concerned) they don’t give us
the same score. There must be some explanation for it, but we don’t have it. And in
statistics, anything you don’t have an explanation for, or that happens by chance, is
referred to as Error. Consequently, for a statistician, the Mean Square Within-Groups
(not-accounted-for) is attributed to Error.
As shown in Figure 8.9, this results in a situation where, conceptually, when the null
hypothesis is true, the F-ratio is equal to Error divided by Error, giving us an F-ratio right
around 1.0.
Error
F = ------- ≈ 1
Error
Whatever forces of chance or error are operating in the numerator to give us a number above
zero will tend to be roughly equal to the forces of chance or error that give us a number in
the denominator that’s above zero. If the error in the top part is roughly the same as the
error in the bottom part, we’re going to get an F-ratio that’s right around 1.0. The critical
value for F isn’t telling us how far above zero our F-ratio has to be in order to reject the
null hypothesis. It’s telling us how far above one an F-ratio has to be to get to the point
where only 5% of F-ratios are that far above 1.0 just by chance.
When the null hypothesis is false, there is something in addition to chance that’s making
the group means different from each other. When the alternative hypothesis is true the
independent variable has an effect on the scores. This means the independent variable is
actively driving the group means away from each other. The group means aren’t just
different from each other by chance, but for a reason. They’re different from each other
because of the effect of the independent variable. As shown in Figure 8.10, when the null
hypothesis is false the numerator of the F-ratio is as large as it is due to Error plus the
effect of the treatment. However, the denominator is still just due to Error.
Error + Effect of the IV
F = ------------------------------ > 1
Error
Thus, when there’s an effect of the independent variable present in the numerator there’s
a reason for the F-ratio to be greater than 1.0. There’s something going on that’s making
the variance in the numerator greater than the variance in the denominator. The critical
value for F tells us how far an F-ratio has to be above 1.0 to get to the point where we can
be confident that the effect of the independent variable is contributing to the numerator of
the F-ratio.
Assumptions of ANOVA
There are a number of assumptions that need to be met in order for the results of an F-test
to be valid. By “valid” we mean that the conclusion made on the basis of the test is
justified. For example, violating a particular assumption may make it impossible for the
researcher to draw any conclusion on the basis of their data. Violating a different
assumption might place the researcher in the position of thinking the risk of a Type I
error was 5%, when in reality it might be considerably higher or lower than this. We’ll
talk about the assumptions statisticians worry most about one at a time.
Independence of Groups
In the context of an experiment, a significant F-test should allow the researcher to infer
that the independent variable caused an effect on the dependent variable. This conclusion
is based on the assumption that the only thing that makes the groups different from each
other is the way in which the researcher made the groups different from each other. If the
groups differ in any other respect, the researcher can’t know if a significant F-test is due
to the effect of the independent variable or to an effect of this other way in which the
groups differ. In Chapter 4 on Independent Samples T-tests, we discussed a confound as
an alternative explanation for why sample means are significantly different from each
other. The assumption of independence of groups is thus that there are no confounds in
the design.
The presence of a confound doesn’t alter the odds of making either a Type I or a Type II
error. The F-test will allow you to be confident there are differences among the groups.
The problem is that the presence of a confound makes it impossible to know why the
groups are different from each other. The consequence of violating the assumption is that
the study no longer has internal validity; that is, we can no longer draw the conclusion
that it was the independent variable that caused an effect on the dependent variable. The
assumption of independence of groups isn’t a statistical assumption. It’s an assumption
about the design of the study. If a study’s design is invalid, any F-test conducted on the
data from that study yields a conclusion that is also invalid.
Independence of Scores
The assumption of independence of scores is that all of the scores were collected
independently of every other score. In other words, nobody’s score was influenced by
anyone else’s score. For example, an instructor might want to see which of two sections
of a course does better on an exam. In one of the sections a student sitting in the middle
of the classroom gets a 98 and four other students within 20/20 vision of that student also
get 98s. The researcher finds that the mean for that section is significantly higher than the
mean of the other section. There’s nothing wrong with the F-test used to test the
difference between the means. Again, there’s something wrong with the design of the
study. The doofus instructor let four people cheat off the smart kid! Of course the mean
for that group is going to be higher! We can’t draw the conclusion that the students in
that section learned more of the material than the students in the other section because
four of the scores were influenced by one of the other ones. There’s nothing wrong with
the F-test itself. It accurately tells the instructor that one mean is significantly higher than
the other one. But again, the flaw in the design makes it impossible to draw a valid
conclusion about why that mean was higher.
Normality
This is simply the assumption that the scores in each of the groups are normally
distributed – that the frequency distributions for all of the groups look more or less like
the normal curve. If one or more of the groups has a distribution that is strikingly
different from normal (e.g., badly skewed, bimodal) the F-ratio will be biased; it will be
more likely to be too high than it is to be too low. If the F-ratio is inflated, this results in a
situation where the researcher is more likely to reject the null hypothesis than they
deserve to be. They may think the odds are 5% of making a Type I error, but, depending
on how far the distribution departs from normal, the true risk may be 7% or 8%.
Fortunately, Analysis of Variance is said to be robust with respect to this assumption.
This means the violation of the assumption has to be quite severe before the F-ratio is
inflated by enough to worry about. If the researcher does have reason to think that one or
more of the distributions is very different from normal, they might consider applying
some type of transformation to the scores. Software packages like SPSS make a number
of options available, such as the logarithmic or arcsine transformations. The goal of these
transformations is to produce sets of scores whose distributions are closer to normal than
those of the original raw scores.
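As a sketch of what such a transformation does, here’s a log transform applied to a badly right-skewed set of made-up scores, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy.stats import skew

# Made-up, badly right-skewed scores (a long tail of high values)
raw = np.array([1.1, 1.3, 1.4, 1.6, 1.8, 2.2, 2.9, 4.5, 8.0, 15.0])
transformed = np.log(raw)   # logarithmic transformation

# The log transform pulls the long right tail in toward the rest of the scores
print(round(skew(raw), 2))          # large and positive
print(round(skew(transformed), 2))  # noticeably smaller
```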
Homogeneity of Variance (Equal Variances)
A fourth assumption is that the variances of the scores in the different groups are the
same. The jargon term statisticians use for this assumption is homogeneity of variance.
This just means that however much the scores in one group are spread out around their
mean, the scores in the other groups are spread out to the same degree around their
means. A more intuitive name for the assumption might be that of equal variances.
Expressed in symbols, the assumption is:
S²Group 1 = S²Group 2 = S²Group 3
If there are striking differences among the group variances the consequences of violating
this assumption are similar to those of violating the assumption of normality: the F-ratio
becomes biased. Under some conditions the F-ratio becomes biased so that it’s more
likely to be too high than it is to be too low. Under other conditions, it’s biased so that the
observed value for F is more likely to be too low than it is to be too high. Distinguishing
between these circumstances is beyond the scope of this book; however, whichever way
it turns out, the researcher might think the odds of making a Type I error are 5% when the
actual amount of risk is significantly different. One thing the researcher should know is
that the problem pretty much goes away when the sample sizes for the groups are
approximately the same. The more the sample sizes differ from each other the more
biased the F-ratio will be. With equal sample sizes, even a large violation of the
assumption is not very likely to change the odds of committing either a Type I or a Type
II error.
Levene’s Test. One way to see if we’re meeting the assumption of equal variances
is to conduct Levene’s Test. It tests whether or not there are significant differences
among the group variances. If Levene’s Test is not significant, it means the answer is
“no” and there’s nothing to worry about in terms of meeting the assumption. In SPSS, if
you ask for a “Homogeneity of Variance Test” as one of the Options for a One-Way
ANOVA, you’ll get Levene’s Test in the output.
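Outside of SPSS, Levene’s Test is a one-liner in SciPy. A sketch using the three groups of scores assumed for this example:

```python
from scipy.stats import levene

g1 = [15, 14, 13, 12, 11]
g2 = [10, 9, 8, 7, 6]
g3 = [1, 2, 3, 4, 5]

# A significant Levene's Test (p < .05) would mean the group variances differ
stat, p = levene(g1, g2, g3)
print(p > .05)   # True here: every group has the same variance (2.5)
```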
F-Max. If Levene’s Test is significant then, technically, the data don’t meet the
assumption of homogeneity of variance. However, Analysis of Variance is also robust
with respect to the assumption of equal variances. This means that one group variance
has to be quite a bit larger than another one for the F-ratio to be inflated by enough to
worry about.
So how do we tell if there’s enough of a violation to worry about – enough of a violation
to have to make some type of adjustment when we test our F-ratio? One way of
evaluating the severity of the violation is through a statistic called F-max. F-max gives us
a measure of just how bad the situation is. We know we have a violation of the
assumption of equal variances when the group variances aren’t all the same. F-max tells
us how many times larger the largest group variance is than the smallest group
variance. Calculating it is simple if you have the standard deviations of the various
groups. Just find the largest standard deviation and square it – that gives you the largest
group variance. Then find the smallest standard deviation and square it – that gives you
the smallest group variance. Now take the largest group variance and divide it by the
smallest group variance. The number we end up with is an F-ratio because any F-ratio
represents how many times larger one variance is than another one.
            S²Largest
FMax = ----------------
            S²Smallest
Let’s say the value for FMax is 3.0. This tells us that the largest group variance is three
times larger than the smallest group variance. This doesn’t sound good, but is it still
enough to worry about? This is a situation where different authors give different answers
about how large FMax needs to be before we start to worry about it. A middle ground in
these values is 6.0, so that’s the number we’re going to recommend. We won’t consider
the violation of the assumption of equal variances to be severe enough to worry about
until the value for FMax is 6.0 or larger.
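Here’s a quick sketch of the F-max calculation (the three standard deviations are made up for illustration):

```python
# Made-up standard deviations for three groups, as software might report them
sds = [2.0, 2.5, 3.1]

variances = [s ** 2 for s in sds]          # square each SD to get a variance
f_max = max(variances) / min(variances)    # largest variance / smallest variance

print(round(f_max, 2))   # 2.4
print(f_max >= 6.0)      # False: not a severe enough violation to worry about
```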
If a researcher has reason to be concerned that their F-ratio is biased as a result of
violating the assumption of homogeneity of variance, one option is to apply the Brown-Forsythe
adjustment in calculating an F-ratio. An alternative is the Welch procedure.
Both are fairly horrific to do by hand, but programs like SPSS and SAS will calculate
adjusted values for F for you using each procedure.
Closing Thoughts
We’ve spent quite a bit of time getting a feel for Analysis of Variance and F-tests. Not
only are we now in a position to test the effect of an independent variable that has more
than two levels, but the extra time spent working through the reasoning behind Analysis
of Variance will pay off later when we see F-tests pop up in several other places. Here are
some take-home points:
•  An F-ratio represents how many times larger the variance accounted-for is than
   the variance not-accounted-for.

•  When the null hypothesis is true we should expect to get an F-ratio right around
   “1”.

•  The critical value for F tells us how far above “1” our F-ratio has to be before
   we’re just not willing to believe it could have been collected when the null
   hypothesis is true.

•  An F-test is still just one number (an F-ratio) compared to a bunch of other
   numbers (a bunch of other F-ratios) to see if it belongs with them. It’s the same
   strategy as in other chapters, just with a new statistic to deal with.

•  A significant F-test only tells us that there’s an effect of the independent variable;
   that there are differences among the individual group means. The next step,
   covered in the next chapter, is to conduct a set of follow-up tests to determine
   which group means are different from which other group means.