Hypothesis Testing (continued)
One-Way ANOVAs
We have already discussed the t-test, which is used to compare the means of two groups to
determine whether there is a statistically significant difference between them. The t-test allows us to determine
the probability that we are making a Type I error (rejecting the null hypothesis when it is true). If that
probability (the p-value) is less than alpha (generally set at .05), then we reject the null hypothesis (conclude
that there is a difference) with at least 95% confidence that the difference between the two conditions is not
due simply to chance.
Take a detour with me. Imagine your instructor stated that she could obtain a 6 on the throw of a single
die – and then she did it! Would that be impressive? What would the probability of doing that be?
_____________.
Now suppose that she said she could obtain at least one 6 on the throw of a pair of dice. Is this more or less
impressive than using a single die? What is the probability that she could get at least one 6 with two dice?
_______________ What if she said that she could get at least one 6 by throwing 3 dice? What is the
probability now? ___________________.
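A hint for checking your answers once you have tried them: the chance of at least one 6 is easiest to compute as the complement of getting no 6s at all:

P(at least one 6 in n throws) = 1 − (5/6)^n

With each die added, success by chance alone becomes more likely, so the feat becomes less impressive.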
We have the same problem when we want to make more than one comparison of condition means within a
study. For example, we may want to include 3 conditions in our study: A, B, and C. I am interested in
looking at the difference between the means of conditions A and B, between the means of conditions A and C,
and between the means of conditions B and C. Each new comparison we make is like adding an additional
die to the example above. If we set alpha at .05, we have a 5% chance of making a Type I error with every
comparison we make. The probability of making at least one Type I error somewhere in this study is therefore
not 5%; it is roughly 15% (more precisely, 1 − .95^3 ≈ 14.3%).
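A quick way to convince yourself of this inflation is to simulate it. Below is a minimal sketch in Python (not part of the original course materials; the group sizes and population values are invented) that repeatedly draws three groups from the same population and counts how often at least one of the three pairwise t-tests comes out "significant" even though the null hypothesis is true every time:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_per_group, alpha = 10_000, 20, 0.05
false_alarms = 0

for _ in range(n_sims):
    # All three groups are drawn from the SAME population, so the null is true
    # for every comparison; any "significant" result is a Type I error.
    a, b, c = (rng.normal(loc=5.7, scale=1.8, size=n_per_group) for _ in range(3))
    pvals = [stats.ttest_ind(x, y).pvalue for x, y in ((a, b), (a, c), (b, c))]
    if min(pvals) < alpha:
        false_alarms += 1

# Familywise error rate across the set of three tests.
print(false_alarms / n_sims)  # comes out near .12, far above the nominal .05

Because the three tests share samples, the simulated rate comes out a bit below the 1 − .95^3 figure, but it is still far above 5%.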
Let's continue with the example we have been using on Marital Status and Happiness ratings. The
dependent variable in this example is the happiness rating. Assume that I interviewed 20 married, 20 single,
and 20 divorced persons from Grant County for this study and obtained their responses to the Happiness
Questionnaire. My independent variable is Marital Status, and it has three levels: single, married, and
divorced. Below are the descriptive statistics for this study. They are presented in the same format that
would be produced if you analyzed the results using a computer statistical analysis program called the
Statistical Package for the Social Sciences (SPSS). As we discuss ANOVAs, you will need to learn to
interpret results from tables presented in the format that SPSS produces. The Descriptive
Statistics table below contains the means, standard deviations, and sample sizes (N) for the total sample as
well as for each of the conditions.
Descriptive Statistics
Dependent Variable: Happiness rating

Marital Status    Mean      Std. Deviation    N
Single            5.4000    1.6670            20
Married           6.6500    1.7252            20
Divorced          4.9500    1.7614            20
Total             5.6667    1.8381            60
To determine whether there are statistically significant differences among the mean happiness ratings for these
three conditions, I could do three t-tests to compare all possible pairs of these groups (i.e., I could
compare single to married, single to divorced, and married to divorced). But remember, for each t-test we
have a 5% chance of making a Type I error. If I do three t-tests, I roughly triple that chance: across
the entire set of comparisons, the chance of at least one Type I error is about 15%. If I found a significant
difference for any or all of these comparisons, I could not say that I was 95% confident that the
difference is not due to chance. I could only be about 85% confident. In science, that is not good enough!
Fortunately, there is a method for comparing more than two means. The procedure is called an Analysis
of Variance (ANOVA). When we have one independent variable (with 3 or more levels), we use a
procedure called a One-way ANOVA. There are other variations that can be used for factorial designs.
For example, if we did a study with two independent variables, we would use a Two-way ANOVA to
analyze the results. What do you think they would call the analysis used for a study with five independent
variables?
An ANOVA can be used to compare any number of group means and still maintain the probability of
making a Type I error at 5%. An ANOVA is called an omnibus test because it looks at how much the
whole set of means differ from each other and determines whether that pattern of differences would have
occurred less than 5% of the time by chance alone. If the omnibus test is not significant (p greater
than .05), then no individual comparison can be considered significant either.
Recall that when hypothesis testing we are deciding which of two possible hypotheses to accept as the most
likely conclusion to our study. The two hypotheses for the ANOVA are:
Scientific Hypothesis – there are differences between at least 2 of the groups.
Null Hypothesis – there are no differences among the groups.
The logic of this test is really simple (the mathematics is more complex – but the computer does it). Here
is the basic idea.
Review: When we discussed measures of dispersion much earlier in the term, we talked about the range,
deviation scores (the score minus the mean), the variance, and the standard deviation. The range is not a good
estimate of the dispersion of scores because it is greatly influenced by extreme scores. Deviation scores,
when summed, equal zero, so they too are useless as a measure of the dispersion of scores in a sample
distribution. To get around this problem we can square all the deviation scores (this makes them all
positive numbers), then sum them and divide by the sample size to obtain the mean squared amount by which
scores in the distribution deviate from the mean. This mean squared dispersion is called the variance of the
distribution. Since most of us have difficulty thinking in squared amounts, it is generally more useful to
think about the standard deviation of a distribution. The standard deviation is the square root of the
variance and can be thought of as, roughly, the average amount the scores in the distribution vary from the
distribution mean.
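In symbols, for a sample of N scores X with mean M:

Variance = Σ(X − M)² / N
Standard deviation = √Variance

(Some texts divide by N − 1 rather than N when estimating a population variance from a sample; the logic that follows is the same either way.)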
Why are we talking about variances instead of standard deviations? The problem with the standard
deviation is that it is a square root. We cannot simply add or subtract square roots: for example,
√9 + √9 = 6, which is not √18. Since variances are easier to work with
mathematically, we use them for this analysis. Keep in mind, however, that a variance is just a measure of
the dispersion of scores. The larger the variance, the more spread out the scores are in the distribution.
1) We start with the assumption that people in general differ from each other in happiness. I expect that
married people show the same variability in happiness that single people and divorced people do.
In other words, I might expect that marriage shifts the happiness ratings of the entire group, but does not
affect how much variability in happiness there is within the group. If there are differences between my
groups that are due to the independent variable (Marital Status), I expect the means of the groups to differ
– but not the variances.
We refer to the amount of variation that is associated, in general, with the dependent variable as Within
Groups Variance. It is the amount that scores of individuals within the same condition would be
expected to vary from their condition mean. It does not have anything to do with variation due to the level of the IV.
It can be thought of as variance that is due to random variation between individuals, or what statisticians
call Error. Having three groups (conditions), I have three estimates of this Within Groups Variance.
Using the average of these three Within Groups Variance estimates gives a better estimate of the general
variation of happiness in the population. We make a second assumption: that we can use Within Groups
Variance to estimate the amount of variation we would expect to find Between Groups if the
independent variable has no effect (if the null is true). So we assume that Within Groups Variance gives a
good estimate of Error variance. This estimate of Error variance is called the Mean Squared Error
(MSE).
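In symbols (using standard ANOVA notation rather than anything from the SPSS output itself): with k groups and N subjects in total,

MSE = Within Groups Sum of Squares / (N − k)

For our study, N − k = 60 − 3 = 57, which is where the Error degrees of freedom in the SPSS table later in this handout come from.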
That leaves just one last step: measuring the amount of variance between groups. The way this is
computed is fairly complex, but for this class we do not need to worry about that (SPSS will do it for us). If
we repeatedly measured samples drawn from the same population, over and over again, we would not
expect to get exactly the same mean each time. (Remember the distribution of sample means we discussed
when we discussed t-tests.) Similarly, if we sample three levels of our IV, even if they do not differ from
each other on the DV, we would not expect to get exactly the same means for each condition. The means
will vary simply due to random variation. But they might also vary due to differences in the level of the
IV. What is important is that you understand that Between Groups Variance is due to both random
variation and to variation due to the independent variable. Between Groups Variance is based on how much
the condition means vary around the overall (grand) mean of the study.
Between Groups Variance = Random Variance + Variance Due to the Independent Variable
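Its between-groups counterpart is computed from how far the k condition means fall from the grand mean:

MS Between = Between Groups Sum of Squares / (k − 1)

Here k − 1 = 3 − 1 = 2, the between-groups degrees of freedom reported in the SPSS table below.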
Assume for a moment that there is no effect of the independent variable. This means that the independent
variable adds zero variance to what we would expect to find due to chance. If there are no differences
between our groups, then we would expect our Within Groups Variance and our Between Groups Variance to
be equal. What would we expect the ratio of these two variance estimates to be if there were no effect of
the IV?

(Random Variance + Variance Due to the Independent Variable) / Random Variance
= (Random Variance + 0) / Random Variance = 1
We would expect to find a ratio of one. The amount that the ratio differs from 1 can be attributed to
variation due to the IV. So, if the IV has an effect, the ratio of Between Groups Variance to Within
Groups Variance will be greater than one (i.e., the numerator will be greater than the denominator). This ratio is
called an F ratio. Because we are only using estimates of variances, we do not expect that the ratio will
always be exactly one even when the IV does not have an effect. How much the F ratio needs to be above 1 for us
to be 95% sure that there really is an effect of the independent variable can be determined using probability
theory. Again (lucky us!) this is done for us by the computer program. SPSS provides the following type
of output.
Tests of Between-Subjects Effects
Dependent Variable: Happiness rating

Source            Sum of Squares    df    Mean Square    F        Sig.
Marital Status    31.033            2     15.517         5.255    .008
Error             168.300           57    2.953
Total             2126.000          60
F is the ratio of the Mean Square due to Marital Status (Between Groups Variance) to the Mean
Squared Error (Within Groups Variance). Sig. (in the final column) is the probability (p-value) of making
a Type I error. Because p is less than .05, we reject the null hypothesis and conclude that there is a
significant difference between at least two of the means. We would report this result by stating that “A
One-way ANOVA determined that happiness ratings significantly differ among marital status groups
(F(2,57) = 5.26, p = .008).”
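You can verify the reported F directly from the table: each Mean Square is its Sum of Squares divided by its degrees of freedom, and F is the ratio of the two Mean Squares:

MS Marital Status = 31.033 / 2 = 15.517
MSE = 168.300 / 57 = 2.953
F = 15.517 / 2.953 ≈ 5.26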
The numbers in the parentheses following the letter F are the degrees of freedom (df) associated with this
analysis. The first is the degrees of freedom between groups and the second is the degrees of freedom
within groups. They are related to the number of conditions and the number of subjects. They must be
reported in APA-style reports. They are always reported in parentheses after the letter F, in the order
between-groups df, then within-groups df, separated by a comma.
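If you ever want to check an SPSS result outside of SPSS, a one-way ANOVA is a one-liner in most statistics packages. Below is a minimal sketch in Python using scipy.stats.f_oneway. The happiness scores are invented placeholders (the handout reports only group means and SDs, not the raw data), so the output will not exactly match the table above:

from scipy import stats

# Hypothetical raw happiness ratings for the three marital-status groups.
# These are illustrative placeholders, not the actual study data.
single   = [5, 6, 4, 7, 5, 6, 3, 5, 7, 6, 4, 5, 6, 5, 7, 4, 6, 5, 8, 4]
married  = [7, 8, 6, 9, 7, 6, 8, 5, 7, 6, 8, 7, 5, 6, 9, 7, 4, 6, 7, 6]
divorced = [4, 5, 6, 3, 5, 4, 6, 7, 5, 4, 3, 6, 5, 4, 7, 5, 6, 4, 5, 5]

# f_oneway runs the omnibus one-way ANOVA and returns F and its p-value.
result = stats.f_oneway(single, married, divorced)
print(f"F(2,57) = {result.statistic:.3f}, p = {result.pvalue:.3f}")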
A significant result for a One-way ANOVA allows us to reject the null hypothesis and conclude that the
scientific hypothesis is a valid conclusion. (Remember, the scientific hypothesis is that there are
differences between at least 2 of the groups.) The One-way ANOVA does not tell us which groups differ
from each other. To determine that, we need to make individual comparisons.
When the F ratio is significant, SPSS continues the analysis by running Multiple Comparisons of the sets
of means to determine which means are significantly different from each other. These are very much like
doing t-tests between all combinations of the means. Why can we do them now? Having found a
significant F ratio from the One-way ANOVA, we know that the level of Type I error is limited to 5%
for the entire set of group comparisons. So, it is safe to go ahead and do the three separate comparisons.
Since the pattern of differences we found was unlikely to occur (p < .05) if we had just selected three samples
randomly from one population, we can be assured that any significant differences we find in the multiple
comparisons are not due to having done multiple tests (like throwing the dice three times) but are actually
due to the effects of the IV.
There are several ways of doing these Multiple Comparisons. I have had SPSS do a Least Significant
Difference (LSD) test. Looking at the Multiple Comparisons table below, the difference between each
pair of means is listed in the third column, and the p-value in the last column. The LSD multiple
comparisons analysis determined that married people rate themselves as significantly happier than
single people (p = .025) and than divorced people (p = .003), whereas single and divorced people do not
differ on happiness ratings.
Multiple Comparisons
Dependent Variable: Happiness rating
LSD

(I) Marital Status    (J) Marital Status    Mean Difference (I-J)    Std. Error    Sig.
Single                Married               -1.2500                  .5434         .025
Single                Divorced              .4500                    .5434         .411
Married               Single                1.2500                   .5434         .025
Married               Divorced              1.7000                   .5434         .003
Divorced              Single                -.4500                   .5434         .411
Divorced              Married               -1.7000                  .5434         .003
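Since LSD comparisons are essentially unadjusted pairwise t-tests carried out after a significant omnibus F, you can approximate a table like this yourself. Below is a minimal sketch in Python with groups simulated to match the reported means and SDs (placeholder data, not the study's, so the p-values will differ):

import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(1)
# Placeholder groups simulated to match the reported means and SDs.
groups = {
    "Single":   rng.normal(5.40, 1.67, 20),
    "Married":  rng.normal(6.65, 1.73, 20),
    "Divorced": rng.normal(4.95, 1.76, 20),
}

# LSD comparisons are unadjusted pairwise tests. (True LSD pools the
# within-groups error term across all groups; an ordinary two-sample
# t-test per pair is a close approximation.)
for (name_i, x), (name_j, y) in combinations(groups.items(), 2):
    result = stats.ttest_ind(x, y)
    print(f"{name_i} vs {name_j}: mean difference (I-J) = {x.mean() - y.mean():+.4f}, "
          f"p = {result.pvalue:.3f}")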
If you were answering an exam question, I would be looking for the following.
A statement about the one-way ANOVA:
The One-way ANOVA was significant (F(2,57) = 5.255, p = .008).
If the One-way ANOVA is significant, then you need to give a FULL interpretation of the Least
Significant Difference multiple comparisons.
If there are three conditions you need to make 3 comparisons.
If there are four conditions you need to make 6 comparisons.
If there are five conditions you need to make 10 comparisons.
(In general, k conditions require k(k − 1)/2 pairwise comparisons.)
In our case:
The LSD multiple comparisons analysis determined that married people rate themselves as significantly
happier than single people (p = .025) and than divorced people (p = .003), whereas single and divorced
people do not differ on happiness ratings.
Within Subject Designs
When the design of the study is within subjects, the post hoc multiple comparisons would be paired t-tests instead of LSD tests. We will see an example of this in the concept checks.
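For completeness, a paired t-test is just as easy to run outside SPSS. Below is a minimal sketch with invented within-subjects scores (the condition names and data are placeholders, not from any concept check):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Placeholder within-subjects data: the same 20 subjects measured in two
# conditions, so the scores in the two conditions are paired.
condition_a = rng.normal(5.5, 1.8, 20)
condition_b = condition_a + rng.normal(0.8, 1.0, 20)

# ttest_rel performs the paired (related-samples) t-test used for
# post hoc comparisons in within-subjects designs.
result = stats.ttest_rel(condition_a, condition_b)
print(f"t({len(condition_a) - 1}) = {result.statistic:.3f}, p = {result.pvalue:.3f}")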