Chapter 8 One-Way Analysis of Variance (ANOVA)

In Chapter 6 we learned how to use an Independent-Samples T-Test to decide if the mean of one group is significantly different from the mean of another. However, there's no reason why a study can't have three groups, or even more. These are situations a t-test wasn't designed to handle. However, a technique called Analysis of Variance (ANOVA) can test the effect of an independent variable that has any number of groups or levels. This flexibility established Analysis of Variance long ago as one of the most powerful and widely used tools in data analysis. In this chapter we explain the conceptual foundations of ANOVA and describe the procedures for using it. We'll see that the statistic we make a decision about in ANOVA (referred to as an "F-ratio") is a different kind of number than a value for t, but the strategy for making a decision using this number is exactly the same as in the other tests we've already talked about.

To introduce the "analysis of variance" we start by giving you an example of variance we need to analyze or explain. Let's say you have the achievement test scores of 15 seventh graders. The lowest possible score is 1 and the highest possible score is 18. The scores for these 15 students are displayed in Table 8.1.

Table 8.1

   X
  ----
   15
   14
   13
   12
   11
   10
    9
    8
    7
    6
    1
    2
    3
    4
    5

Now, let's say we ask you the following question: why didn't the students all get the same score? It's a simple question, but it also might strike you as a little strange. Of course 15 kids aren't all going to get the same score on a test. In fact, we'd be pretty suspicious if that did happen. It's a strange question because we just assume that a bunch of kids aren't all going to get the same score on a test. But why don't they get the same score? Why is there variability, or variance, in the scores? You might look at it like this.
What if you knew the students all grew up in the same small town; they had the same books and teachers; they come from roughly the same socio-economic background and watch the same TV shows. In short, they've all been exposed to the same information growing up. Shouldn't a group of kids exposed to the same information get the same score on a test that measures what they know? Of course not, but why? You ought to be able to come up with at least one reason why the 15 kids don't all have the same score. What are some possibilities? Well, maybe some kids studied more than others. Maybe some kids got a good night's sleep and others didn't. Maybe some kids were sick on the day of the test and others weren't. These are all possible explanations for why the scores aren't all the same – for why there's variability in the scores. This is exactly the kind of question Analysis of Variance is designed to answer.

This question about the 15 scores may strike you as strange, but it actually gets at the very heart of what it means to study behavior. In fact, it's a researcher's job to answer that type of question. It's our job to notice ways in which people are different from each other. It's our job to try to measure the variability in the scores we collect, but beyond that, it's our job to explain why people are different from each other! That's what a theory does. It represents a proposed explanation for why the scores on some measure of interest are not all the same. An experiment is often a careful and systematic test of a theory.1

Starting out, we know only one thing about each student – we know their score on the test. Now let's say there's one additional thing that we know about each student. We learn that six weeks before they took the test each of them had been randomly assigned to one of three groups.

©Thomas W. Pierce 2011 -- 9-25-11
The five students who got scores of 15, 14, 13, 12, and 11 were randomly assigned to a group that got a lot of tutoring in how to do well on the test. The five students who got scores of 10, 9, 8, 7, and 6 got a moderate amount of tutoring. The five students who got scores of 1, 2, 3, 4, and 5 got no tutoring on how to take the test. The assignment of each student to one of the three groups is displayed in Table 8.2.

Table 8.2

                      X
                    ----
Lot of Tutoring      15
                     14
                     13     X̄1 = 13.0
                     12
                     11
                    ----
Moderate Amount      10
                      9
                      8     X̄2 = 8.0     X̄T = 8.0
                      7
                      6
                    ----
No Tutoring           1
                      2
                      3     X̄3 = 3.0
                      4
                      5

Now we know two things about each student. We know their score on the test and we know which group they were in. The score on the achievement test is the dependent variable in the experiment and the amount of tutoring the student got is the independent variable.

A few symbols to get out of the way

So we don't have to keep writing out phrases like "Amount of Tutoring" over and over, this seems like a good time to introduce a few symbols that will make equations easier to write out and remember. First, we use the capital letter "A" to represent the independent variable. This means that instead of writing "Amount of Tutoring" we can just write "A". Second, we use a lower-case letter "a" to represent the number of levels of the independent variable. In the example there are three groups or levels of A, so we'd say that "a" is equal to three (a = 3). Next, whenever you see the lower-case letter "a" with a number subscripted beside it, we're referring to a particular group (i.e., a1 = Group 1). Finally, a lower-case letter "n" represents the number of people in a particular group. In our example, there are five people in each group, so we can say that n = 5.

1 In this chapter we make the assumption that ANOVA is being conducted on data from an experiment. As discussed in Chapter 6, an experiment involves an independent variable that has been manipulated in order to determine if it causes an effect on a dependent variable. ANOVA can just as easily be used to analyze data from studies using a "quasi-experimental" or "intact groups" design where the levels of the independent variable are comprised of different groups, but the researcher has done nothing to make the groups different from each other. Quasi-experimental designs are, essentially, correlational in nature. Therefore, the researcher cannot conclude on the basis of a significant test that the independent variable caused an effect on the dependent variable.

Now back to our story…

The effect of the independent variable for one person

Now let's talk about how we can get a sense of whether the amount of tutoring can help to explain at least a little bit of the variability in the 15 achievement test scores. To start with, we know that the mean score for everyone who took the test was 8.0. We can refer to this value as the Total Mean, represented by the symbol X̄T. We also know the mean score for the students in each group. The mean score for the students in the first group – the group that got a lot of tutoring – is 13.0. The mean score for the students in the second group – the group that got a moderate amount of tutoring – is 8.0. The mean score for the students in the third group – the group that didn't get any tutoring – is 3.0. We use the symbol X̄G to represent the mean of a group. These means are also displayed in Table 8.2.

Eventually, we want to be able to know how good a job the independent variable does at explaining the variability of everyone's scores on the dependent variable. This will tell us if the Amount of Tutoring can explain at least a little bit of why those 15 raw scores weren't all the same – why they weren't all equal to that Total Mean of 8.0.
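If you'd like to verify these means for yourself, here is a minimal sketch in Python using the scores from Table 8.2 (the variable and group names are our own, chosen just for this illustration):

```python
# Scores from Table 8.2, keyed by level of the independent variable A
scores = {
    "lot":      [15, 14, 13, 12, 11],
    "moderate": [10, 9, 8, 7, 6],
    "none":     [1, 2, 3, 4, 5],
}

def mean(values):
    return sum(values) / len(values)

# The mean of each group (X-bar-G) and the Total Mean (X-bar-T)
group_means = {group: mean(xs) for group, xs in scores.items()}
all_scores = [x for xs in scores.values() for x in xs]
total_mean = mean(all_scores)

print(group_means)  # {'lot': 13.0, 'moderate': 8.0, 'none': 3.0}
print(total_mean)   # 8.0
```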
But let's say for right now that we'd like to see how good a job the independent variable can do at explaining just one person's score. If we look at the person who got the score of 15, why didn't that person get a score equal to the Total Mean of 8.0? Can knowing the amount of tutoring they got help to answer this question?

If we want to explain why that person had a score that was different from the mean of everybody, the place to start is to figure out exactly how much of a deviation from this Total Mean there is to explain. And that's easy. The person's raw score is 15 and the mean of everybody is 8.0. That means the person had a score 7 points higher than the mean of everybody. This tells us that the total deviation for that one person that needs to be explained is a deviation of 7 points. More formally, this deviation can be expressed as:

Total Deviation = X – X̄T = 15 – 8.0 = +7

Out of that Total Deviation of seven points, how much can we explain if we take into account the amount of tutoring the person got? Look at it this way. Let's say you've been asked to provide your best guess about what the person's score is. You don't know what score they got and you don't know which group they were in. All you know is that they were one of the 15 students in the class. If you don't know which group they were in, your best guess would have to be the mean of all fifteen students – the Total Mean of 8.0. And how far off would your best guess be? You'd be off by 7 points. That's the Total Deviation we just talked about.

Now, let's say you're given one additional piece of information. You find out that the person had been in the group that got a lot of tutoring. Now what's your best guess? Are you still going to go with the 8.0, the mean of all 15 students? No! Of course not; you'll go with the best information you've got. You'll use the mean of the person's group as your best guess. And the mean of that person's group is 13.0. Think about it.
Your best guess when you don't know which group they're in is 8.0. Your best guess when you do know which group they're in is 13.0. How much more accurate is your best guess when you have this one additional piece of information – when you know how much tutoring they got? It's the difference between these two best guesses. It's the difference between the mean of their group (13.0) and the mean of everybody (8.0), a difference of 5.0 points. We can refer to this as the Deviation Accounted-for and the equation for calculating it is:

Deviation Accounted-for = X̄G – X̄T = 13 – 8 = +5

Knowing which group the person is in takes you 5 points closer to their actual score. This deviation of 5 points is a deviation we can account for because we can explain where it comes from. It comes from the fact that people who get a lot of tutoring have scores that are 5 points higher, on average, than the mean of all the students who took the test. At this point we can say we can explain 5 points out of the total of 7. That seems pretty good.

So if the independent variable can account for 5 units out of the total of 7, what can the Amount of Tutoring not account for? It must be the remaining two points. And here's where it comes from. We know the person is in the group that got a lot of tutoring. And we know that everyone within that group was treated exactly alike. They all had the same instructor, for the same amount of time, using the same materials, at the same time of day, etc. You get the picture. So why didn't the five students in that group all get the same score? Why did our student get a score that was two points higher than the mean of their group? The answer has to be that we don't know! There must be some explanation for it, but we don't have it. That makes the deviation between the person's raw score and the mean of their group something that the independent variable can't explain. We refer to this value as the Deviation Not-accounted-for.
The equation for calculating it is:

Deviation Not-accounted-for = X – X̄G = 15 – 13 = +2

Taking all three deviations into account we end up with an interesting relationship. The total deviation we need to explain (7 points) is equal to a deviation the independent variable can explain (5 points) plus a deviation the independent variable cannot explain (2 points). The relationship looks like this:

  Total Deviation  =  Deviation Accounted-For  +  Deviation Not-Accounted-For
      X – X̄T       =         X̄G – X̄T          +           X – X̄G
        +7         =           +5              +             +2

And the neat thing is that this relationship holds for any of the fifteen raw scores we might want to look at.

Evaluating the ability of the independent variable to explain everyone's scores

We've now figured out how good a job the independent variable does at explaining the deviation of one person's score from the Total Mean. Now we're ready for the next step. How good a job does the Amount of Tutoring do at accounting for the variability of all fifteen scores around the Total Mean?

Determining the total amount of variability that needs to be accounted for

Let's start in the same place we did before. If we want to explain the variability of a set of raw scores we first have to ask "variability around what?" The answer is "variability around the Total Mean". We need to measure the total amount of variability that needs to be explained or accounted for. If, at the level of one person's raw score, we subtracted the Total Mean from their raw score, we should do the same thing for all fifteen raw scores. This will give us 15 deviations of raw scores from the Total Mean. It'll give us 15 Total Deviations that need to be accounted for. These Total Deviations are displayed in Table 8.3.

Table 8.3

                      X         X – X̄T
                   --------    --------
Lot of Tutoring     15 – 8       +7
                    14 – 8       +6
                    13 – 8       +5     X̄1 = 13.0
                    12 – 8       +4
                    11 – 8       +3
                   --------    --------
Moderate Amount     10 – 8       +2
                     9 – 8       +1
                     8 – 8        0     X̄2 = 8.0     X̄T = 8.0
                     7 – 8       -1
                     6 – 8       -2
                   --------    --------
No Tutoring          1 – 8       -7
                     2 – 8       -6
                     3 – 8       -5     X̄3 = 3.0
                     4 – 8       -4
                     5 – 8       -3

Instead of having 15 separate deviations that need to be accounted for, we want a single number that measures the total amount of variability among the 15 scores that needs to be explained. We learned in Chapter 2 that if we square every deviation from the mean and then add these squared deviations up we'll get a perfectly good measure of variability: the sum of squares. Table 8.4 shows that doing that here gives us a sum of squares of 280.

Table 8.4

                      X         X – X̄T    (X – X̄T)²
                   --------    --------   ----------
Lot of Tutoring     15 – 8       +7          49
                    14 – 8       +6          36
                    13 – 8       +5          25
                    12 – 8       +4          16
                    11 – 8       +3           9
                   --------    --------   ----------
Moderate Amount     10 – 8       +2           4
                     9 – 8       +1           1
                     8 – 8        0           0
                     7 – 8       -1           1
                     6 – 8       -2           4
                   --------    --------   ----------
No Tutoring          1 – 8       -7          49
                     2 – 8       -6          36
                     3 – 8       -5          25
                     4 – 8       -4          16
                     5 – 8       -3           9
                                           ------
                                            280

The total amount of variability that needs to be explained is a sum of squares of 280. We can refer to this amount of variability as the Sum of Squares Total.

Determining the amount of variability accounted for by the independent variable

Out of the total amount of variability we need to explain of 280, how much can be accounted for by knowing how much tutoring students got? We should start by remembering how we calculated the Deviation Accounted-For for just one subject. It was based on the deviation between the Total Mean and the mean of their group. If we can do this for one person, we can do it for everyone. For all 15 students, let's take the mean of their group and subtract the Total Mean. Then we can take these 15 Deviations Accounted-For and square them. When we add these Squared Deviations Accounted-For up we get a sum of squared deviations of 250. We can say the Sum of Squares Accounted-For is 250. These calculations are presented in Table 8.5.
Table 8.5

                      X       X̄G – X̄T    X̄G – X̄T    (X̄G – X̄T)²
                   -------   ---------   ---------   -----------
Lot of Tutoring      15        13 – 8       +5           25
                     14        13 – 8       +5           25
                     13        13 – 8       +5           25
                     12        13 – 8       +5           25
                     11        13 – 8       +5           25
                   -------   ---------   ---------   -----------
Moderate Amount      10         8 – 8        0            0
                      9         8 – 8        0            0
                      8         8 – 8        0            0
                      7         8 – 8        0            0
                      6         8 – 8        0            0
                   -------   ---------   ---------   -----------
No Tutoring           1         3 – 8       -5           25
                      2         3 – 8       -5           25
                      3         3 – 8       -5           25
                      4         3 – 8       -5           25
                      5         3 – 8       -5           25
                                                       ------
                                                        250

Taking all the scores into account, we can say that the independent variable accounts for 250 units out of the total of 280. Another way of looking at it is that out of all the reasons why students could differ from each other on achievement test scores, our one proposed explanation – the Amount of Tutoring – accounts for 250 units out of the total of 280.

One way of assessing the ability of the Amount of Tutoring to account for the variability of scores on the achievement test is to determine the proportion of variability accounted-for. To get this, all we have to do is take the amount of variability we're able to account for (250 units) and divide it by the amount of variability we needed to account for (280 units). This gives us a value of .89 and lets us say that the independent variable accounts for 89% of the variability in the dependent variable. Essentially, we've calculated the squared correlation between the two variables in the study and determined that they overlap by 89%. This value gives us one way of measuring the size of the effect of the independent variable. We'll discuss the issue of effect size in more detail in Chapter X.

Determining the variability not accounted for by the independent variable

Ok, we know our independent variable accounts for 250 units out of the total of 280, but now we need to calculate the amount of variability that's not accounted for. It seems like it should be a sum of squares of 30 (and it is), but where does this value come from?
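The proportion of variability accounted for is easy to compute directly from the deviations in Tables 8.4 and 8.5. Here is a quick sketch in Python (our own code, using the numbers from Table 8.2):

```python
# Group means and Total Mean from Table 8.2
group_means = [13.0, 8.0, 3.0]
total_mean = 8.0
n_per_group = 5

# Table 8.5: each of the 5 people in a group contributes the same
# squared Deviation Accounted-For, (X-bar-G minus X-bar-T) squared
ss_accounted_for = sum(n_per_group * (gm - total_mean) ** 2
                       for gm in group_means)

# Table 8.4: squared Total Deviations for all 15 raw scores
raw_scores = [15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 1, 2, 3, 4, 5]
ss_total = sum((x - total_mean) ** 2 for x in raw_scores)

proportion = ss_accounted_for / ss_total
print(ss_accounted_for, ss_total, round(proportion, 2))  # 250.0 280.0 0.89
```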
Remember, we measured the Deviation Not-Accounted-For for one person by taking their raw score and subtracting the mean of their group. This deviation was "not accounted for" because we didn't have an explanation for why the scores in a group could be different from each other when everyone in that group was treated exactly alike. If that's what we did at the level of a single person, that's what we ought to do with everyone's scores. We should take all 15 raw scores and subtract the means of their respective groups. When we do this we end up with 15 Deviations Not-Accounted-For. To get a measure of the variability that is not accounted for by the independent variable, we take all 15 of these deviations and square them. When we add these squared deviations up we get the Sum of Squares Not-Accounted-For. These calculations are displayed in Table 8.6.

Table 8.6

                       X         X – X̄G    (X – X̄G)²
                   ---------    --------   ----------
Lot of Tutoring     15 – 13       +2           4
                    14 – 13       +1           1
                    13 – 13        0           0     X̄1 = 13.0
                    12 – 13       -1           1
                    11 – 13       -2           4
                   ---------    --------   ----------
Moderate Amount     10 – 8        +2           4
                     9 – 8        +1           1
                     8 – 8         0           0     X̄2 = 8.0     X̄T = 8.0
                     7 – 8        -1           1
                     6 – 8        -2           4
                   ---------    --------   ----------
No Tutoring          1 – 3        -2           4
                     2 – 3        -1           1
                     3 – 3         0           0     X̄3 = 3.0
                     4 – 3        +1           1
                     5 – 3        +2           4
                                             -----
                                              30

The Sum of Squares Not-Accounted-For ends up being 30 – just what we thought it would be.

We saw before that the Total Deviation for one person's score is equal to a deviation that is accounted for by the independent variable plus a deviation that is not accounted for by the independent variable. And the same relationship holds when you take all the scores into account. The Sum of Squares Total is equal to the Sum of Squares Accounted-For plus the Sum of Squares Not-Accounted-For.

     SS Total    =   SS Accounted-For   +   SS Not-Accounted-For
   Σ(X – X̄T)²   =     Σ(X̄G – X̄T)²     +      Σ(X – X̄G)²
      280        =         250          +          30

This kind of makes sense: everything we need to know is equal to what we do know, plus what we don't know. It turns out that the Sum of Squares Total is something that we can take apart.
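The partition is easy to confirm numerically. The sketch below (our own illustration, using the scores from Table 8.2) computes all three sums of squares directly and checks that they add up:

```python
# Scores from Table 8.2, one list per level of the independent variable
groups = [
    [15, 14, 13, 12, 11],   # lot of tutoring
    [10, 9, 8, 7, 6],       # moderate amount
    [1, 2, 3, 4, 5],        # no tutoring
]
total_mean = 8.0

ss_total = ss_accounted_for = ss_not_accounted_for = 0.0
for g in groups:
    group_mean = sum(g) / len(g)
    for x in g:
        ss_total += (x - total_mean) ** 2                    # (X - XT)^2
        ss_accounted_for += (group_mean - total_mean) ** 2   # (XG - XT)^2
        ss_not_accounted_for += (x - group_mean) ** 2        # (X - XG)^2

print(ss_total, ss_accounted_for, ss_not_accounted_for)  # 280.0 250.0 30.0
assert ss_total == ss_accounted_for + ss_not_accounted_for
```

The final assertion is the partition itself: the 280 units of total variability split exactly into 250 accounted for and 30 not accounted for.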
A statistician would say we can partition the Sum of Squares Total into two pieces: the Sum of Squares Accounted-For and the Sum of Squares Not-Accounted-For.

Further exploration of the three sources of variability

Now that we've seen how we can quantify the degree to which the independent variable is able to account for variability in the dependent variable, let's take a closer look at these sums of squares. What would the scores in a data set have to look like in order for the Sum of Squares Total to be equal to zero? Could that really happen? No, probably not. It would be a situation where there was nothing for the independent variable to have to explain. It would be a dependent variable where every time you took a raw score and subtracted the Total Mean you'd always get zero. And the only way for that to happen would be if all the raw scores were the same number. It would be a situation where there was no variability at all in the data set.

What would the data have to look like to get an SS Total that's greater than zero, but an SS Accounted-For that is equal to zero? Well, you know that the scores in the data set aren't all the same because the SS Total is greater than zero. The only way for the SS Accounted-For to be equal to zero is if every time you took a person's group mean and subtracted the Total Mean, you got a value of zero. The only way for this to happen is if the group means are always equal to the Total Mean – and the only way for that to happen is if all of the group means are equal to each other.

So why does this make sense? If tutoring didn't have anything to do with achievement test scores – if tutoring has no effect on achievement test scores – what would you expect these group means to be? If tutoring doesn't do anything to achievement test scores is there any reason to think that one group should do any better than another group? No! If tutoring has absolutely no effect on achievement test scores the group means should all be the same.
Because the average of all the group means has to turn out to be the mean of all the subjects in the study, this would mean that the group means should also turn out to be equal to the mean of everyone. In this case there would be no variability between the groups. Statisticians refer to the Sum of Squares Accounted-For as the Sum of Squares Between-Groups. This makes sense because we have an explanation for why we see differences between the groups. We know the experimenter did something to make the groups different from each other in terms of the independent variable.

What would the data have to look like for the SS Total to be greater than zero, but the SS Not-Accounted-For to be equal to zero? Well, the Sum of Squares Not-Accounted-For is based on deviations between a person's raw score and the mean of their group. For this sum of squares to be equal to zero the deviation between a raw score and the group mean would always have to be zero. The only way for this to happen would be if all the scores within each group were the same. In this situation there would be no variability among the scores within the groups. Statisticians refer to the Sum of Squares Not-Accounted-For as the Sum of Squares Within-Groups. This is because, when everyone in a group is treated exactly alike, the independent variable can't possibly explain why the scores within the group aren't all the same.

The F-ratio

Now that we have a better sense of how to think about three important sources of variability – Total, Accounted-For, and Not-Accounted-For – let's get back to our original question. How do we decide if the Amount of Tutoring accounts for a significant amount of variability in achievement test scores? This is a yes or no question. Either the independent variable has a significant effect on the dependent variable or it doesn't. The names for the options are the same as those we worked with in conducting Z-tests and t-tests.
The null hypothesis for this question is that there is no significant effect of tutoring on achievement test scores. The alternative hypothesis is that there is a significant effect of tutoring on achievement test scores.

If we lived in a perfect world, it would be easy to tell if the independent variable had an effect on the dependent variable. For the null hypothesis to be false, all we'd have to be able to say is that the IV had some effect on the dependent variable. It doesn't have to have a large effect or even a noticeable effect. It's a matter of whether it had any effect. All you'd have to do is to see if the SS Accounted-For was greater than zero or not. If it's equal to zero there's no evidence the null hypothesis is false and there's no reason to conclude that the independent variable has an effect on the dependent variable. If it's greater than zero the null hypothesis must be false; any differences between the means would indicate that, to at least some extent, changing the conditions in terms of the independent variable results in changes in the scores on the dependent variable.

Unfortunately, we don't live in that perfect world. All we have to work with are sample means – estimates – and these estimates don't have to be perfect. It's almost certain that even if the independent variable did nothing to people's scores, the sample means will be at least a little bit different from each other, just by chance. Even when the null hypothesis is true the Sum of Squares Accounted-For is almost certain to be at least a little bit bigger than zero, just by chance. We can't tell if the independent variable had an effect on the dependent variable by just looking to see if the Sum of Squares Accounted-For is greater than zero or not. It could be different from zero just by accident – by chance alone.
Because the Sum of Squares Accounted-For could be greater than zero just by chance, the question becomes one of deciding whether the Sum of Squares Accounted-For is enough greater than zero to be confident that it's not just greater than zero by chance. We're now in exactly the same kind of situation we were in when we were doing Z-tests and t-tests. We're forced to make a decision based on some odds. Just like with a t-test, it'll turn out that the only thing we'll be able to know for sure is the odds of making a mistake if we decide to reject the null hypothesis.

The idea behind both a Z-test and a one-sample t-test was that there was one number we were making our decision about, the mean of a sample. The question was whether we were willing to believe that our one sample mean was a member of a collection of other sample means obtained when the null hypothesis was true. We compared one sample mean to a bunch of other sample means to see if it belonged with them. The idea behind an independent-samples t-test was that there was one number we made our decision about, the difference between two sample means. The question was whether we were willing to believe that our one difference between sample means was a member of a collection of other differences between means that were obtained when the null hypothesis was true. We compared one difference between means to a bunch of other differences between means to see if it belonged with them. No matter what the situation was, we handled it by comparing one number to a bunch of other numbers to see if it belonged with them.

So what should we do in this latest situation? The same kind of thing! We just need to decide what kind of number we're going to make our decision about. Then, we'll figure out what these numbers would look like if we repeated this same experiment thousands and thousands of times when the null hypothesis was true.
The situation will then be just a matter of deciding whether or not we're confident that our number belongs in this collection. If we decide our number doesn't belong in a collection of numbers obtained when the null hypothesis is true, we'll have reason to be confident it must have been obtained when the null hypothesis was false – we'll decide that the Amount of Tutoring had an effect on Achievement Test scores.

So what kind of number should we use? The Sum of Squares Accounted-For would not be a good choice because it's influenced by both the effect of the independent variable and the number of participants in the study (adding up a larger number of squared deviations will give a larger sum of those squared deviations). Two studies could have identical means, but the study with the larger sample size will end up with a larger Sum of Squares Accounted-For. So, we're going to have to use a number that's not influenced by the sample size.

The measure that Ronald Fisher latched onto in the 1930s was based on the ratio of the amount of variability accounted-for by the independent variable to the amount of variability not-accounted-for by the independent variable, or…

        Variability Accounted-For
    ----------------------------------
      Variability Not-Accounted-For

Now, we've already measured these amounts of variability; we've got sums of squares for them. So, the tempting thing to do at this point would be to take the Sum of Squares Accounted-For and divide it by the Sum of Squares Not-Accounted-For, giving us a ratio of 8.33.

       SS Accounted-For             250
    -------------------------  =  -------  =  8.33
     SS Not-Accounted-For           30

Unfortunately, we already know you can't compare one sum of squares to another sum of squares when they're based on different numbers of values. It just doesn't make sense, for example, to compare the sum of 20 squared deviations to the sum of only 10 squared deviations. At this point you might well be saying "Hey, wait a minute!
Didn't we just add up 15 squared deviations to get the SS Accounted-For and didn't we add up 15 squared deviations to get the SS Not-Accounted-For? We added up the same number of values each time. Why can't I compare these numbers to each other?" And that would be a very reasonable question. The answer is that what matters isn't the number of values we added up. What matters is the number of independent values we added up and, whether it seemed like it or not, they weren't always equal to 15. For example, the SS Accounted-For was based on deviations between the group means and the total mean. How many different deviations between group means and the total mean were there? There were only three group means, so we were only dealing with three different deviations (+5, 0, -5). We didn't have 15 independent numbers. In that situation, we only had three. For the moment, I guess you'll just have to trust us, but in a few pages we'll show you why the number of independent values that contribute to the Accounted-For source of variability is very different from the number of independent values that contribute to the Not-Accounted-For source of variability.

Okay, so what do we do? Well, you know you can't compare the sum of one set of numbers to the sum of another set of numbers when they're based on different numbers of values, but you can compare the mean of one set of numbers to the mean of another set of numbers, even when the two sets are based on different numbers of values. And that's what we need to do here. Instead of comparing the SUM of squared deviations accounted-for to the SUM of squared deviations not-accounted-for, we need to compare the MEAN of the squared deviations accounted-for to the MEAN of the squared deviations not-accounted-for. And what's another name for the mean of a bunch of squared deviations? The variance! The statistic we need will come from taking the Variance Accounted-For and dividing it by the Variance Not-Accounted-For.
The name for this value is the F-ratio and it is represented by the letter F.

             Variance Accounted-For
    F  =  --------------------------------
            Variance Not-Accounted-For

The strategy Fisher came up with was to compare one variance to another variance. Logically enough, he referred to this technique as Analysis of Variance.

So how do we get these variances? That's easy. You remember how to calculate the variance for a set of raw scores. You take the sum of squares and then divide it by the appropriate number of degrees of freedom.

            Σ(X – X̄)²
    S²  =  -----------
              N – 1

The steps needed to calculate the F-ratio for our experiment are organized in a table referred to as an ANOVA Table. These steps are presented as we work our way from the left side of the table to the right. The ANOVA table, shown in Table 8.7, starts by listing the two sources of variability we need to calculate an F-ratio – variability accounted-for and variability not-accounted-for. To be consistent with the traditional jargon associated with ANOVA, we use the term Between-Groups in place of the more intuitively meaningful term "Accounted-For". We use the term Within-Groups in place of the term "Not-Accounted-For" and we also include a row at the bottom of the table for the total amount of variability.

Table 8.7

    Source
    ----------------
    Between-Groups
    Within-Groups
    Total

In Table 8.8 we can now list the sum of squares for each source of variability.

Table 8.8

    Source              SS
    ----------------   -----
    Between-Groups      250
    Within-Groups        30
    Total               280

At this point, we know we need the variances that correspond to each of the two sources of variability. To get them, we divide each sum of squares by its appropriate number of degrees of freedom. The abbreviation "df" in Table 8.9 below stands for "degrees of freedom". "MS" refers to the "Mean Square" for each source of variability. A Mean Square is the same thing as a variance.
Remember, the variance is nothing more than the mean of a bunch of squared deviations. For the moment, we simply describe the procedure for calculating the number of degrees of freedom for each source of variability. Once we’ve gotten our F-ratio we’ll go back and explain where these numbers came from. The number of degrees of freedom accounted-for is equal to the number of groups minus one. We use the symbol "a" to represent the number of groups, so the equation for the number of degrees of freedom accounted-for becomes "a – 1". There are three groups in this example, so this leaves us with 2 degrees of freedom. When we divide the Sum of Squares Between-Groups of 250 by its 2 degrees of freedom we get a Mean Square Between-Groups of 125. The variance accounted-for by the independent variable is 125.

The number of degrees of freedom Within-Groups is equal to the number of groups multiplied by the number of people in each group minus one. We use the symbol "n" to represent the number of participants in each group, so the equation becomes "(a)(n-1)". For the Within-Groups row, a sum of squares of 30 is divided by 12 degrees of freedom, giving us a Mean Square Within-Groups of 2.5. The variance not-accounted-for by the independent variable is 2.5. The steps used to calculate both the Mean Square Between-Groups and the Mean Square Within-Groups are presented in Table 8.9.

Table 8.9

    Source                 SS    df     MS
    -------------------- ----- ----- ------
    Between-Groups         250     2    125       df = a - 1
    Within-Groups           30    12    2.5       df = (a)(n-1)

As shown in Table 8.10, the F-ratio is simply the Mean Square Accounted-For of 125 divided by the MS Not-Accounted-For of 2.5. This gives us a value of 50.0.

Table 8.10

    Source                 SS    df     MS      F
    -------------------- ----- ----- ------ ------
    Between-Groups         250     2    125     50
    Within-Groups           30    12    2.5

The F-ratio for this data set is 50.0. So what does this number mean? It’s telling us that the ratio of variance accounted-for to variance not-accounted-for is 50:1.
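The whole ANOVA table can be reproduced in a few lines of code. Here is a sketch in plain Python using the 15 scores from Table 8.1 split into the three tutoring groups; the variable names are ours, not standard terminology.

```python
# The 15 achievement scores from Table 8.1, split into the three
# tutoring groups from the chapter's example (n = 5 per group).
groups = [
    [15, 14, 13, 12, 11],  # group mean = 13.0
    [10, 9, 8, 7, 6],      # group mean = 8.0
    [1, 2, 3, 4, 5],       # group mean = 3.0
]

a = len(groups)                      # number of groups
n = len(groups[0])                   # participants per group
all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)
group_means = [sum(g) / len(g) for g in groups]

# Sum of Squares Between-Groups: n * (group mean - grand mean)^2, summed
ss_between = sum(n * (m - grand_mean) ** 2 for m in group_means)

# Sum of Squares Within-Groups: (score - its group mean)^2, summed
ss_within = sum((x - m) ** 2
                for g, m in zip(groups, group_means) for x in g)

df_between = a - 1          # a - 1      = 2
df_within = a * (n - 1)     # (a)(n - 1) = 12

ms_between = ss_between / df_between   # Mean Square Between-Groups
ms_within = ss_within / df_within      # Mean Square Within-Groups
f_ratio = ms_between / ms_within

print(ss_between, ss_within, f_ratio)  # 250.0 30.0 50.0
```

If SciPy happens to be available, `scipy.stats.f_oneway(*groups)` should reproduce the same F-ratio of 50 along with its p-value.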
Another way of saying the same thing is that the variance accounted-for is 50 times larger than the variance not-accounted-for. That’s the definition of an F-ratio! It tells us how many times larger the variance accounted-for is than the variance not-accounted-for. Is an F-ratio of 50 good? It sounds pretty good. The important question is really whether this F-ratio is large enough for us to be confident that the amount of tutoring really did have an effect on the achievement test scores. Is it large enough for us to be confident that it’s not that large just by chance? Unfortunately, it will always be possible that the F-ratio we get is as large as it is just by chance alone. Because of this we’ll never be able to know for sure if we’re making a mistake if we decide to reject the null hypothesis. But, just like in a t-test, we’ll be able to know the odds of making a mistake if we reject the null hypothesis. If we use an alpha level of .05 we’re saying we’re willing to reject the null hypothesis if we can show that the odds are less than 5% that it’s true. We’re saying that we’re willing to take on a 5% risk of making a Type I error. Our next step is to figure out whether or not the odds really are less than 5% that our null hypothesis is true.

The sampling distribution of the F-ratio and the critical value for F

In this chapter we’re asked to make a decision about an F-ratio, so we can refer to that decision as an F-test. This F-test uses the same strategy as every other test we’ve talked about. In a Z-test and a one-sample t-test, we compared one sample mean to a bunch of other sample means to see if it belonged with them. In an independent-samples t-test we compared one difference between means to a bunch of other differences between means to see if it belonged with them. To make our decision in the same way here we need to compare our F-ratio to a bunch of other F-ratios to see if it belongs with them.
In other words, we need to find out what F-ratios look like when the null hypothesis is true and then see if the odds are less than 5% that our F-ratio belongs with these other ones. It’s the same thing as always – one number compared to a bunch of other numbers to see if it belongs with them. So how can we figure out what F-ratios look like when the independent variable doesn’t work – when the null hypothesis is true? The truth is that we don’t need to. We’ve got statisticians to do it for us. They can apply the laws of probability to determine what these F-ratios would look like if someone were to go out and get them. But we can imagine what the process would be like if we went out and tried to get them by hand. First, you’d have to imagine you could know for sure that the null hypothesis is true, but that you went ahead and did the experiment anyway. The first time you do this, let’s say the F-ratio turns out to be 0.80. In Figure 8.1 we can locate this F-ratio on a scale of possible F-ratios.

Figure 8.1. A single F-ratio (0.80) located on a scale of possible F-ratios running from 0 to 4.

Now, let’s say the null hypothesis is true and we do the same experiment a second time. There’s no reason, other than chance, for the F-ratio to be greater than zero, but now the F-ratio turns out to be 1.20. Figure 8.2 shows where this second F-ratio goes on the scale. Now we have two F-ratios collected when the null hypothesis is true.

Figure 8.2. Two F-ratios (0.80 and 1.20), both collected when the null hypothesis is true, located on the same scale.

Now, imagine that you do the same experiment over and over and over and over and over. And every time you do the experiment you collect an F-ratio when the null hypothesis is true. Figure 8.3 gives us a sense of what F-ratios look like when the null hypothesis is true. The name for this collection of numbers is the Sampling Distribution of the F-ratio.
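We can mimic this "do the experiment over and over when H0 is true" process with a short simulation. This is an illustrative sketch, not part of the chapter's procedure: it assumes normally distributed scores and uses the design from our example (a = 3 groups, n = 5 per group), and the function name is our own.

```python
import random
import statistics

random.seed(0)

def f_ratio_when_null_is_true(a=3, n=5):
    """One 'experiment' in which H0 is true: all a groups are drawn
    from the SAME normal population, then the F-ratio is computed."""
    groups = [[random.gauss(0, 1) for _ in range(n)] for _ in range(a)]
    means = [statistics.mean(g) for g in groups]
    grand = statistics.mean(means)            # fine here: equal n per group
    ss_between = sum(n * (m - grand) ** 2 for m in means)
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (a - 1)) / (ss_within / (a * (n - 1)))

fs = [f_ratio_when_null_is_true() for _ in range(20000)]

# The typical null F-ratio is modest (around 1), and the pile of
# values is skewed to the right, just as the chapter describes.
print(round(statistics.mean(fs), 2))

# Only about 5% of null F-ratios reach the chapter's critical value.
print(round(sum(f >= 3.89 for f in fs) / len(fs), 3))
```

If SciPy is available, `scipy.stats.f.ppf(0.95, 2, 12)` returns this critical value directly (about 3.89) without any simulation.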
Figure 8.3. A large pile of F-ratios, all collected when the null hypothesis is true, stacked up on the scale from 0 to 4: the stack is tallest near the low end of the scale and trails off as the F-ratios get larger (a frequency plot of F-ratios).

Figure 8.4 shows that if we collected the F-ratios from thousands and thousands of experiments conducted when H0 was true, we’d find that the shape of this pile of numbers would be a nice smooth curve. It’s not a normal curve, because it’s badly skewed to the right. It’s an F-curve, because it shows us how F-ratios pile up on the scale. Knowing the shape of this distribution, a statistician can tell us how far up on the scale we have to go before we hit the start of the upper 5% of numbers that belong in the collection – the 5% of F-ratios we’re least likely to get when the null hypothesis is true. That’s where the critical value for F comes from. It’s how far up the scale an F-ratio has to be before the odds become 5% or less that the null hypothesis is true.2

2 The reason critical values for F are different for different combinations of dfBetween-Groups and dfWithin-Groups is that every time you change either the number of groups or the number of subjects in each group, you change the shape of the curve. And if you change the shape of the curve, you change the location of the starting place of the upper 5% of the F-ratios that make up the curve. The more degrees of freedom you have in either the numerator or the denominator of the F-ratio, the more the F-ratios are clustered closer to the center of the curve (giving you smaller critical values). In our example, the critical value for F was 3.89 because 3.89 was how far up the scale we’d have to go to get to the start of the outer 5% of the area under a curve with that particular shape.

Pragmatically, we can find the critical value for F we need by looking it up in a Critical Values for F Table (see Tables 4.1 – 4.4 in the Appendix). There are three things you need to know in order to look up a critical value for F.
First, you need to know the alpha level you want to use. In this case we’re using an alpha level of .05, so go to Table 4.1, labeled Critical Values for F – alpha = .05 (there are other pages for alpha levels of .025 and .01). Second, you need to know which column to look in. That’s determined by the number of degrees of freedom in the Between-Groups row (the numerator) of the ANOVA table – "2" in this example. Third, you need to know which row to look in. That’s determined by the number of degrees of freedom in the Within-Groups row (the denominator) of the ANOVA table – "12" in this example. When we do this we arrive at the value of 3.89 in the table. To say that our F-ratio is significant, it has to be greater than or equal to 3.89. Because our F-ratio of 50 is obviously greater than the critical value of 3.89, our decision is to reject the null hypothesis that there is no effect of tutoring on achievement test scores. Our conclusion therefore becomes: Tutoring has a significant effect on Achievement Test scores, F (2, 12) = 50.0, p < .05.

And that’s it! We compared a value for F we calculated from our data to a critical value for F we looked up in the table. That means we’ve just done an F-test. We’ve learned that changing the amount of tutoring students get changes the scores those students get on the achievement test. We can be confident there are differences among the three sample means. Chapter 9 will cover procedures for going further to determine which groups are different from which other groups.

Degrees of Freedom in ANOVA

O.K. Now let’s go back and talk about where the degrees of freedom came from. Why are there 2 degrees of freedom in the Between-Groups row and 12 degrees of freedom in the Within-Groups row? It seems like it should be 14 degrees of freedom for both of them. After all, in each case we added up 15 squared deviations to get each sum of squares.
Shouldn’t the number of degrees of freedom for each row be equal to the number of values minus one, or 15 - 1 = 14? When working with the Sum of Squares Total, that’s exactly what it is. The total number of degrees of freedom for the study is equal to the total number of participants (15) minus one degree of freedom, giving us 14 degrees of freedom. But for the accounted-for and not-accounted-for sources of variability there’s something else we have to keep in mind. The number of degrees of freedom is always equal to the number of independent values that are free to vary. That phrase "independent values" is what we need to focus on at the moment.

As far as the number of degrees of freedom Between-Groups goes, think of it this way. The Sum of Squares Between-Groups (accounted-for) is based on deviations between the group means and the Total Mean (X̄G – X̄T). In effect, we can think of this as a situation where the three group means are being used to calculate the Total Mean. If we know that the Total Mean is equal to 8.0 and we know that one of the three group means is equal to 13.0, are the other two group means free to vary? Do these last two group means have to be any particular number? No. As long as we pick numbers for these last two means that get the Total Mean to come out to 8.0, this will work out fine (see Figure 8.5).

Figure 8.5

    X̄1 = 13.0
    X̄2 = ???      X̄T = 8.0
    X̄3 = ???

Now, let’s say you know that a second group mean in the set is 8.0. Is the last group mean fixed? Does it have to be a particular number? Now the answer is YES! If the first two group means are 13.0 and 8.0, Figure 8.5 shows that the last group mean has to be 3.0 in order to get the Total Mean to be 8.0.

    X̄1 = 13.0
    X̄2 = 8.0      X̄T = 8.0        2 df Accounted-For
    X̄3 = ???

Out of the three group means being used to calculate the Total Mean, only two of them are free to vary. Once we know two of them, the last one is fixed; it is not free to vary.
That’s one way of describing why the number of degrees of freedom for the Between-Groups term is equal to the number of group means minus one – in this case, two. So how about the number of degrees of freedom Within-Groups (not-accounted-for)? Where does this number come from? Think of it this way. The Sum of Squares Within-Groups is based on deviations between the raw scores and the group means (X - X̄G). In effect, we can think of this situation as one where the raw scores are being used to calculate the group means. If we look at the scores for the first group, the mean of the five scores in this group is 13.0. If we know that one of the five scores in this group is 15, are the other four scores in the group free to vary? Yes. You could change those numbers around, as long as the mean of the group came out to 13.0. How about if you know that three of the five scores in that group are 15, 14, and 13? Are the remaining two scores free to vary? They are. Now, if you know that four of the five scores in the group are 15, 14, 13, and 12, is the last score in that group free to vary? NO! To get the mean of that group to come out to 13, that last score in the group has to be 11. The last score is fixed; it is not free to vary. So when five raw scores are being used to calculate that one group mean, only four of them are free to vary. There are four degrees of freedom within that one group.

The same thing goes for any of the three groups. There are four degrees of freedom within the first group, four degrees of freedom within the second group, and four degrees of freedom within the third group. Taking every group into account, this brings us up to a total of 12 degrees of freedom Within-Groups (see Figure 8.6). This corresponds to the value we got when we applied the formula "(a)(n-1)".3

Figure 8.6

    X̄T = 8.0       X̄1 = 13.0      X̄2 = 8.0       X̄3 = 3.0

    Group 1:  X1 = 15,  X2 = 14,  X3 = 13,  X4 = 12,  X5 = ???     4 df
    Group 2:  X6 = 10,  X7 = 9,   X8 = 8,   X9 = 7,   X10 = ???    4 df
    Group 3:  X11 = 1,  X12 = 2,  X13 = 3,  X14 = 4,  X15 = ???    4 df
                                            12 df Not-Accounted-For
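The "free to vary" bookkeeping above can be checked with simple arithmetic. A minimal sketch, using the chapter's numbers:

```python
# Between-Groups: with equal group sizes, the grand mean is the mean of
# the a group means. Knowing the grand mean and a-1 group means pins
# down the last one -- only a - 1 = 2 of them are free to vary.
a = 3
grand_mean = 8.0
known_group_means = [13.0, 8.0]
forced_last_mean = a * grand_mean - sum(known_group_means)
print(forced_last_mean)  # 3.0 -- the third mean is fixed

# Within-Groups: inside one group, knowing the group mean and n-1 scores
# pins down the last score -- n - 1 = 4 scores per group are free to vary.
n = 5
group_mean = 13.0
known_scores = [15, 14, 13, 12]
forced_last_score = n * group_mean - sum(known_scores)
print(forced_last_score)  # 11.0 -- the fifth score is fixed
```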
We’d like to mention one more thing about degrees of freedom. In Figure 8.7 below we’ve got the same ANOVA table we worked out before, except that we’ve added back in the additional row for the Total source of variability. You can see in the Sum of Squares column the same relationship we stated earlier. The SSTotal is equal to the SSBetween-Groups plus the SSWithin-Groups. This was true because the SSTotal was something that could be partitioned into two pieces, accounted-for and not-accounted-for.

    Source                 SS    df     MS      F
    -------------------- ----- ----- ------ ------
    Between-Groups         250     2    125     50
    Within-Groups           30    12    2.5
    Total                  280    14

It turns out that the same relationship holds for degrees of freedom. The total number of degrees of freedom is also something we can take apart, or partition. With 15 subjects there are 14 total degrees of freedom. Out of the 14 degrees of freedom we have in all, 2 of them can be attributed to the accounted-for source of variability and 12 of them can be attributed to the not-accounted-for source of variability. So, the degrees of freedom accounted-for and not-accounted-for aren’t both equal to 14; they add up to 14.

3 The equation (a)(n-1) works fine when we have the same number of people in each group. Our discussion of ANOVA assumes equal sample sizes. An alternative method for calculating the df not-accounted-for is to take the total number of participants, NTotal, and then subtract the number of groups, a. The equation for this alternative strategy becomes NTotal – a.

Conceptual definitions of the F-ratio

What value for F should we get when the null hypothesis is true? Zero? It sure seems like it should be zero, but is that really the number we’re most likely to get? Let’s think about it. Our experiment had three groups. Conceptually, when the null hypothesis is true, the reality is that when we ran the experiment we started out by drawing three samples from the same population.
Then we did something we thought would make the groups different from each other – in our example, we gave each group a different amount of tutoring. However, in reality the independent variable didn’t do anything to change student performance on the achievement test. Because the researcher didn’t do anything to change anybody, we end up with three samples drawn from the same population. Because of this, when the null hypothesis is true, the means of the three samples are, in reality, all estimates of the same population mean. This situation is represented in Figure 8.8.

Figure 8.8. Three sample means (X̄1, X̄2, X̄3), all drawn from a single population with mean μ.

If sample means gave us perfect estimates, the SSBetween-Groups would be equal to zero because there would be no differences at all – no variability at all – between the group means. But of course, you can’t expect these estimates to be perfect. Even when the group means are all supposed to be the same number, they’ll almost certainly be at least a little bit different from each other just by chance. When there are differences between the group means – even if only by chance – the SSBetween-Groups is going to be greater than zero. This, in turn, will produce an F-ratio that is greater than zero. So, even when the null hypothesis is true, the F-ratio will almost always be greater than zero just by chance.

When the null hypothesis is true, the independent variable didn’t cause the group means to be different from each other; they’re only different from each other by chance. And in statistics, anything you don’t have an explanation for, or that happens by chance, is referred to as Error. When sample means are only different from each other by chance, the Mean Square Between-Groups (accounted-for) will be greater than zero due only to Error.

As far as the denominator of the F-ratio is concerned, that number is based on the deviations between the raw scores and the group means. Those are all deviations we don’t have an explanation for.
We don’t know why it is that when the people in a group are all treated alike (as far as the independent variable is concerned) they don’t give us the same score. There must be some explanation for it, but we don’t have it. And in statistics, anything you don’t have an explanation for, or that happens by chance, is referred to as Error. Consequently, for a statistician, the Mean Square Within-Groups (not-accounted-for) is attributed to Error. As shown in Figure 8.9, this results in a situation where, conceptually, when the null hypothesis is true, the F-ratio is equal to Error divided by Error, giving us an F-ratio right around 1.0.

         Error
    F = ------- ≈ 1
         Error

Whatever forces of chance or error are operating in the numerator to give us a number above zero will tend to be roughly equal to the forces of chance or error that give us a number in the denominator that’s above zero. If the error in the top part is roughly the same as the error in the bottom part, we’re going to get an F-ratio that’s right around 1.0. The critical value for F isn’t telling us how far above zero our F-ratio has to be in order to reject the null hypothesis. It’s telling us how far above one an F-ratio has to be to get to the point where only 5% of F-ratios are that far above 1.0 just by chance.

When the null hypothesis is false, there is something in addition to chance that’s making the group means different from each other. When the alternative hypothesis is true, the independent variable has an effect on the scores. This means the independent variable is actively driving the group means away from each other. The group means aren’t just different from each other by chance, but for a reason. They’re different from each other because of the effect of the independent variable. As shown in Figure 8.10, when the null hypothesis is false the numerator of the F-ratio is as large as it is due to Error plus the effect of the treatment.
However, the denominator is still just due to Error.

         Error + Effect of the IV
    F = -------------------------- > 1
                  Error

Thus, when there’s an effect of the independent variable present in the numerator, there’s a reason for the F-ratio to be greater than 1.0. There’s something going on that’s making the variance in the numerator greater than the variance in the denominator. The critical value for F tells us how far an F-ratio has to be above 1.0 to get to the point where we can be confident that the effect of the independent variable is contributing to the numerator of the F-ratio.

Assumptions of ANOVA

There are a number of assumptions that need to be met in order for the results of an F-test to be valid. By "valid" we mean that the conclusion made on the basis of the test is justified. For example, violating a particular assumption may make it impossible for the researcher to draw any conclusion on the basis of their data. Violating a different assumption might place the researcher in the position of thinking the risk of a Type I error was 5%, when in reality it might be considerably higher or lower than this. We’ll talk about the assumptions statisticians worry most about one at a time.

Independence of Groups

In the context of an experiment, a significant F-test should allow the researcher to infer that the independent variable caused an effect on the dependent variable. This conclusion is based on the assumption that the only thing that makes the groups different from each other is the way in which the researcher made the groups different from each other. If the groups differ in any other respect, the researcher can’t know if a significant F-test is due to the effect of the independent variable or to an effect of this other way in which the groups differ. In Chapter 4 on Independent Samples T-tests, we discussed a confound as an alternative explanation for why sample means are significantly different from each other.
The assumption of independence of groups is thus that there are no confounds in the design. The presence of a confound doesn’t alter the odds of making either a Type I or a Type II error. The F-test will allow you to be confident there are differences among the groups. The problem is that the presence of a confound makes it impossible to know why the groups are different from each other. The consequence of violating the assumption is that the study no longer has internal validity; that is, we can no longer draw the conclusion that it was the independent variable that caused an effect on the dependent variable. The assumption of independence of groups isn’t a statistical assumption. It’s an assumption about the design of the study. If a study’s design is invalid, any F-test conducted on the data from that study yields a conclusion that is also invalid.

Independence of Scores

The assumption of independence of scores is that all of the scores were collected independently of every other score. In other words, nobody’s score was influenced by anyone else’s score. For example, an instructor might want to see which of two sections of a course does better on an exam. In one of the sections, a student sitting in the middle of the classroom gets a 98, and four other students within 20/20 vision of that student also get 98s. The researcher finds that the mean for that section is significantly higher than the mean of the other section. There’s nothing wrong with the F-test used to test the difference between the means. Again, there’s something wrong with the design of the study. The doofus instructor let four people cheat off the smart kid! Of course the mean for that group is going to be higher! We can’t draw the conclusion that the students in that section learned more of the material than the students in the other section, because four of the scores were influenced by one of the other ones. There’s nothing wrong with the F-test itself.
It accurately tells the instructor that one mean is significantly higher than the other one. But again, the flaw in the design makes it impossible to draw a valid conclusion about why that mean was higher.

Normality

This is simply the assumption that the scores in each of the groups are normally distributed – that the frequency distributions for all of the groups look more or less like the normal curve. If one or more of the groups has a distribution that is strikingly different from normal (e.g., badly skewed, bimodal), the F-ratio will be biased; it will be more likely to be too high than it is to be too low. If the F-ratio is inflated, this results in a situation where the researcher is more likely to reject the null hypothesis than they deserve to be. They may think the odds are 5% of making a Type I error but, depending on how far the distribution departs from normal, the true risk may be 7% or 8%. Fortunately, Analysis of Variance is said to be robust with respect to this assumption. This means the violation of the assumption has to be quite severe before the F-ratio is inflated by enough to worry about. If the researcher does have reason to think that one or more of the distributions is very different from normal, they might consider applying some type of transformation to the scores. Software packages like SPSS make a number of options available, such as the logarithmic or arcsine transformations. The goal of these transformations is to produce sets of scores whose distributions are closer to normal than those of the original raw scores.

Homogeneity of Variance (Equal Variances)

A fourth assumption is that the variances of the scores in the different groups are the same. The jargon term statisticians use for this assumption is homogeneity of variance. This just means that however much the scores in one group are spread out around their mean, the scores in the other groups are spread out to the same degree around their means.
A more intuitive name for the assumption might be that of equal variances. Expressed in symbols, the assumption is:

    S²Group 1 = S²Group 2 = S²Group 3

If there are striking differences among the group variances, the consequences of violating this assumption are similar to those of violating the assumption of normality: the F-ratio becomes biased. Under some conditions the F-ratio becomes biased so that it’s more likely to be too high than it is to be too low. Under other conditions, it’s biased so that the observed value for F is more likely to be too low than it is to be too high. Distinguishing between these circumstances is beyond the scope of this book; however, whichever way it turns out, the researcher might think the odds of making a Type I error are 5% when the actual amount of risk is significantly different. One thing the researcher should know is that the problem pretty much goes away when the sample sizes for the groups are approximately the same. The more the sample sizes differ from each other, the more biased the F-ratio will be. With equal sample sizes, even a large violation of the assumption is not very likely to change the odds of committing either a Type I or a Type II error.

Levene’s Test. One way to see if we’re meeting the assumption of equal variances is to conduct Levene’s Test. It tests whether or not there are significant differences among the group variances. If Levene’s Test is not significant, it means the answer is "no" and there’s nothing to worry about in terms of meeting the assumption. In SPSS, if you ask for a "Homogeneity of Variance Test" as one of the Options for a One-Way ANOVA, you’ll get Levene’s Test in the output.

F-Max. If Levene’s Test is significant then, technically, the data don’t meet the assumption of homogeneity of variance. However, Analysis of Variance is also robust with respect to the assumption of equal variances.
This means that one group variance has to be quite a bit larger than another one for the F-ratio to be inflated by enough to worry about. So how do we tell if there’s enough of a violation to worry about – enough of a violation to have to make some type of adjustment when we test our F-ratio? One way of evaluating the severity of the violation is through a statistic called F-max. F-max gives us a measure of just how bad the situation is. We know we have a violation of the assumption of equal variances when the group variances aren’t all the same. F-max tells us how many times larger the largest group variance is than the smallest group variance. Calculating it is simple if you have the standard deviations of the various groups. Just find the largest standard deviation and square it – that gives you the largest group variance. Then find the smallest standard deviation and square it – that gives you the smallest group variance. Now take the largest group variance and divide it by the smallest group variance. The number we end up with is an F-ratio, because any F-ratio represents how many times larger one variance is than another one.

            S²Largest
    FMax = ------------
            S²Smallest

Let’s say the value for FMax is 3.0. This tells us that the largest group variance is three times larger than the smallest group variance. This doesn’t sound good, but is it enough to worry about? This is a situation where different authors give different answers about how large FMax needs to be before we start to worry about it. A middle ground in these values is 6.0, so that’s the number we’re going to recommend. We won’t consider the violation of the assumption of equal variances to be severe enough to worry about until the value for FMax is 6.0 or larger. If a researcher has reason to be concerned that their F-ratio is biased as a result of violating the assumption of homogeneity of variance, one option is to apply the Brown-Forsythe adjustment in calculating an F-ratio.
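Both checks described above are easy to run. Here is a sketch using SciPy's `levene` function and a hand-rolled F-max; the group scores are the ones from the chapter's tutoring example, while the standard deviations in the F-max part are made-up numbers chosen just to illustrate the arithmetic.

```python
from scipy.stats import levene

# Levene's Test on the chapter's three tutoring groups.
group1 = [15, 14, 13, 12, 11]
group2 = [10, 9, 8, 7, 6]
group3 = [1, 2, 3, 4, 5]
stat, p = levene(group1, group2, group3)
# These groups spread out identically around their centers, so the test
# finds no difference among the variances (p is not significant).
print(p > 0.05)  # True

# F-max: largest group variance divided by smallest group variance.
# Hypothetical standard deviations, not taken from the chapter's data.
sds = [4.0, 2.5, 2.0]
variances = [s ** 2 for s in sds]
f_max = max(variances) / min(variances)
print(f_max)         # 4.0  (16.0 / 4.0)
print(f_max >= 6.0)  # False -- below the rule-of-thumb cutoff of 6.0
```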
An alternative to the Brown-Forsythe adjustment is the Welch procedure. Both are fairly horrific to do by hand, but programs like SPSS and SAS will calculate adjusted values for F for you using each procedure.

Closing Thoughts

We’ve spent quite a bit of time getting a feel for Analysis of Variance and F-tests. Not only are we now in a position to test the effect of an independent variable that has more than two levels, but the extra time spent working through the reasoning behind Analysis of Variance will pay off later when we see F-tests pop up in several other places. Here are some take-home points:

An F-ratio represents how many times larger the variance accounted-for is than the variance not-accounted-for.

When the null hypothesis is true we should expect to get an F-ratio right around "1". The critical value for F tells us how far above "1" our F-ratio has to be before we’re just not willing to believe it could have been collected when the null hypothesis is true.

An F-test is still just one number (an F-ratio) compared to a bunch of other numbers (a bunch of other F-ratios) to see if it belongs with them. It’s the same strategy as in other chapters, just with a new statistic to deal with.

A significant F-test only tells us that there’s an effect of the independent variable; that there are differences among the individual group means. The next step, covered in the next chapter, is to conduct a set of follow-up tests to determine which group means are different from which other group means.