Lecture 8 - Notes - Dr. Jason P. Turner

Analysis of Variance
(ANOVA)
MARE 250
Dr. Jason Turner
First name: Mister, Last name: T
“I pity myself that I got to be with these fools!”– Mr. T
Hypothesis Testing for Two Means
Test for comparing the means of two populations
One of the most common (and simple?) tests in statistics
What is the procedure if there are more than two populations?
Mr. T – Part 2: Electric Boogaloo
Why not run multiple t-tests?
μ1   μ2   μ3
1. The number of t-tests increases with the number of groups - it becomes cognitively difficult
2. ↑ Number of analyses = ↑ probability of committing a Type I error
The probability of committing at least one Type I error = the experiment-wise error rate
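For example, with three groups there are three pairwise t-tests (μ1 vs. μ2, μ1 vs. μ3, μ2 vs. μ3). If each test is run at α = 0.05 and the tests were independent, the chance of at least one Type I error is roughly 1 - (1 - 0.05)^3 ≈ 0.14, not 0.05 - and it keeps growing as more groups are added.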
Analysis of Variance (ANOVA)
Method for comparing multiple sample means
Compares the means of a variable for populations that
result from a classification by one other variable –
Factor
Levels of the Factor – the possible values of the factor
At this point dealing with: One-Way ANOVA
Analysis of Variance (ANOVA)
Want to compare egg production in four populations of
Roi in Hawaii
Variable of interest = egg production
Four populations result from classifying Roi in Hawaii by the Factor
“Beach locations”, whose Levels are Ka Lae, Kapoho, Richardson’s, and
Keauhou Bay
Assumptions for One-Way ANOVA
One-Way ANOVA
Four assumptions (the same as for t-test hypothesis testing):
1. Random samples
2. Independent samples
3. Normal populations (or large samples)
4. Equal variances (std. dev.)
Other similarities T & ANOVA
“I pity the fools that think T and ANOVA are similar!” – Mr. T
A one-way analysis of variance (ANOVA) tests the
hypothesis that the means of several populations are
equal.
The method is an extension of the two-sample t-test,
specifically for the case where the population variances
are assumed to be equal.
Other similarities T & ANOVA
“That sucka stole my van!”– Mr. T
A one-way analysis of variance requires the following:
Response - measurement taken from the units sampled.
Factor - discrete variable that is altered systematically.
The different values chosen for the factor variable are
called levels of the factor.
Each level of the factor in the analysis corresponds to a
larger population with its own mean.
The sample mean is an estimate of the level mean for the
whole population.
Other similarities T & ANOVA
A one-way ANOVA can be used to tell you if there are
statistically significant differences among the level
means.
The null hypothesis for the test is that all population
means (level means) are the same.
The alternative hypothesis is that one or more
population means differ from the others.
Mr. T Says…
“I pity the fools that think T and ANOVA are similar!”– Mr. T
also “Stay in School fool!” and the popular “Drugs are for Chumps!”
A one-way analysis of variance (ANOVA) tests the hypothesis that
the means of several populations are equal
The method is an extension of the two-sample t-test, specifically
for the case where the population variances are assumed to be
equal
The null hypothesis for the test is that all population means (level
means) are the same: H0: μ1 = μ2 = μ3 = μ4
The alternative hypothesis is that one or more population means
differ from the others: Ha: Not all means are equal
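The output below comes from Minitab. For reference, here is a minimal sketch of the same one-way ANOVA in Python with scipy.stats.f_oneway. The egg-production values are hypothetical placeholders (six per population, matching the N = 6 group sizes in the output that follows); only the beach-location names come from the Roi example above.

# Minimal sketch of a one-way ANOVA in Python (the lecture uses Minitab).
# The egg-production values are hypothetical placeholders, six per
# population to match the N = 6 group sizes shown later in the notes.
from scipy import stats

ka_lae      = [14.1, 15.2, 13.8, 16.0, 14.5, 15.1]
kapoho      = [8.2, 9.1, 7.5, 10.0, 8.8, 7.9]
richardsons = [12.4, 13.1, 12.9, 13.5, 12.2, 13.0]
keauhou_bay = [18.3, 17.9, 18.8, 17.5, 18.1, 18.6]

# H0: mu1 = mu2 = mu3 = mu4;  Ha: not all means are equal
f_stat, p_value = stats.f_oneway(ka_lae, kapoho, richardsons, keauhou_bay)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # reject H0 at the 5% level if p < 0.05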
One-Way ANOVA: Analysis of Variance Table
One-way ANOVA: Egg production versus Location

Source     DF     SS    MS     F      P
Location    3  281.7  93.9  6.02  0.004
Error      20  312.1  15.6
Total      23  593.8

DF = degrees of freedom, SS = sum of squares, MS = mean squares,
F = F statistic, P = p-value
F
F is the statistic used to test the hypothesis that all the factor
level means are equal
It is calculated as the mean square for the factor divided by the
mean square for error
F is used to determine the p-value
(like the t statistic for a t-test, or the R-J value for a
Ryan-Joiner test)
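For the table above: F = MS(factor) / MS(error) = 93.9 / 15.6 ≈ 6.02, the value shown in the F column; its p-value (0.004) comes from the F distribution with 3 and 20 degrees of freedom.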
Four Sample Means, Only One Result?
“I pity the fools that has only one result for 4 sample means!”– Mr. T
A one-way analysis of variance (ANOVA) tests the hypothesis that
the means of several populations are equal
The null hypothesis for the test is that all population means (level
means) are the same: H0: μ1 = μ2 = μ3 = μ4
The alternative hypothesis is that one or more population means
differ from the others: Ha: Not all means are equal
Source   DF     SS    MS     F      P
Pop       3  281.7  93.9  6.02  0.004
Error    20  312.1  15.6
Total    23  593.8

P = 0.004: we reject the null - not all means are equal
Is that all?
Confidence Intervals
Confidence Interval – (CI) provides a range of numbers along
with a percentage confidence that the parameter lies in that range
                           Individual 95% CIs For Mean
                           Based on Pooled StDev
Level   N    Mean  StDev
Pop 1   6  14.733  3.363   (-----*------)
Pop 2   6   8.567  5.500   (------*------)
Pop 3   6  12.983  3.730   (------*------)
Pop 4   6  18.067  2.636   (------*------)

Pooled StDev = 3.950
95% confidence intervals for each level of the factor
When the p-value in the analysis of variance table indicates there
is a difference among the factor level means, you can use the table
of individual confidence intervals to explore the differences:
Each asterisk represents a sample mean
Each set of parentheses encloses a 95% confidence interval for the
mean of a population. You can be 95% confident that the
population mean for each level is within the corresponding
interval
Level   N    Mean  StDev
Pop 1   6  14.733  3.363   (-----*------)
Pop 2   6   8.567  5.500   (------*------)
Pop 3   6  12.983  3.730   (------*------)
Pop 4   6  18.067  2.636   (------*------)
95% confidence intervals for each level of the factor
If the intervals do not overlap, this suggests the population means
are different
Interpret with caution; rate of type I error increases when making
multiple comparisons
Should use one of the four available methods for controlling the
rate of type I error
Intervals for Pop 2 and Pop 4 do not overlap; suggests population
means for these levels are different
Level   N    Mean  StDev
Pop 1   6  14.733  3.363   (-----*------)
Pop 2   6   8.567  5.500   (------*------)
Pop 3   6  12.983  3.730   (------*------)
Pop 4   6  18.067  2.636   (------*------)
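As a rough check on these intervals (the Minitab output above is the reference), each one can be rebuilt from the summary table as mean ± t(0.975, error DF) × pooled StDev / √n, using the error DF of 20 from the ANOVA table and the pooled StDev of 3.950. A minimal Python sketch:

# Sketch: individual 95% CIs based on the pooled StDev, rebuilt from the
# summary numbers above (n = 6 per level, pooled StDev = 3.950, error DF = 20).
import math
from scipy import stats

means = {"Pop 1": 14.733, "Pop 2": 8.567, "Pop 3": 12.983, "Pop 4": 18.067}
n, pooled_sd, error_df = 6, 3.950, 20

t_crit = stats.t.ppf(0.975, error_df)           # two-sided 95% critical value
half_width = t_crit * pooled_sd / math.sqrt(n)  # same half-width for every level

for level, mean in means.items():
    print(f"{level}: ({mean - half_width:.2f}, {mean + half_width:.2f})")

With these numbers the Pop 2 and Pop 4 intervals come out to roughly (5.2, 11.9) and (14.7, 21.4), which do not overlap - consistent with the interpretation above.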
Multiple Comparisons
Allow you to determine the relations among all the
means
Several methods: Tukey, Fisher’s LSD, Dunnett’s,
Bonferroni, Scheffe, etc
We will mostly focus on Tukey's method
Tukey's method
Tukey's method compares the means for each pair of factor
levels using a family error rate to control the rate of type I
error
Results are presented as a set of confidence intervals for the
difference between pairs of means
Use the intervals to determine whether the means are
different:
If an interval does not contain zero, there is a statistically
significant difference between the corresponding means
If the interval does contain zero, the difference between the
means is not statistically significant
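Before the Minitab output below, here is a minimal sketch of the same idea in Python, assuming SciPy 1.8 or newer for scipy.stats.tukey_hsd and reusing the hypothetical samples from the ANOVA sketch earlier (not the lecture's Ahi data):

# Tukey's method in Python (SciPy >= 1.8), on the hypothetical samples
# from the earlier ANOVA sketch.
from scipy import stats

pop1 = [14.1, 15.2, 13.8, 16.0, 14.5, 15.1]
pop2 = [8.2, 9.1, 7.5, 10.0, 8.8, 7.9]
pop3 = [12.4, 13.1, 12.9, 13.5, 12.2, 13.0]
pop4 = [18.3, 17.9, 18.8, 17.5, 18.1, 18.6]

result = stats.tukey_hsd(pop1, pop2, pop3, pop4)
ci = result.confidence_interval(confidence_level=0.95)

# A pair of means is declared different if its interval excludes zero
for i in range(4):
    for j in range(i + 1, 4):
        print(f"Pop {j+1} - Pop {i+1}: "
              f"({ci.low[j, i]:.3f}, {ci.high[j, i]:.3f}), "
              f"p = {result.pvalue[i, j]:.3f}")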
Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of Ahi
Individual confidence level = 98.89%

Ahi = Pop 1 subtracted from:
Ahi      Lower  Center   Upper
Pop 2  -12.553  -6.167   0.219  (-------*-------)
Pop 3   -8.136  -1.750   4.636  (-------*-------)
Pop 4   -3.053   3.333   9.719  (-------*-------)

Ahi = Pop 2 subtracted from:
Ahi      Lower  Center   Upper
Pop 3   -1.969   4.417  10.803  (-------*-------)
Pop 4    3.114   9.500  15.886  (-------*-------)

Ahi = Pop 3 subtracted from:
Ahi      Lower  Center   Upper
Pop 4   -1.303   5.083  11.469  (-------*-------)
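Reading this output with the rule above: only the Pop 4 minus Pop 2 interval (3.114 to 15.886) excludes zero, so Tukey's method flags only that pair of means as significantly different; every other interval contains zero.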
Which multiple comparisons to use
At this point, unless otherwise advised – Tukey’s is fine
Fisher’s LSD, Dunnett’s, Bonferroni, and Scheffe are all used
heavily as well
Statistical versus practical significance
Even if the level means are significantly different from a
statistical standpoint, the difference may not be of any practical
importance
In the Ahi data, the smallest mean is 8.567 and the largest is
18.067. Is this 9.5-unit difference of any practical consequence?
Only knowledge of the subject area, not statistics,
can be used to answer this question
Assumptions for One-Way ANOVA
One-Way ANOVA
Four assumptions (the same as for t-test hypothesis testing):
1. Random samples
2. Independent samples
3. Normal populations (or large samples)
4. Equal variances (std. dev.)
Non-Parametric Version of ANOVA
Kruskal-Wallis
If samples are independent and similarly distributed, use this
nonparametric test regardless of normality or sample size
Is based upon the mean of the ranks of the data - not the mean
or variance (like Mann-Whitney)
If the variation in the mean ranks is large - reject the null
Uses a p-value, like ANOVA
A last resort (not a first resort) - for low sample sizes or “bad” data
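A minimal sketch of the Kruskal-Wallis test in Python with scipy.stats.kruskal, again reusing the hypothetical samples from the earlier sketches:

# Kruskal-Wallis test in Python. It works on the mean ranks of the pooled
# data rather than the means or variances of the raw values.
from scipy import stats

pop1 = [14.1, 15.2, 13.8, 16.0, 14.5, 15.1]
pop2 = [8.2, 9.1, 7.5, 10.0, 8.8, 7.9]
pop3 = [12.4, 13.1, 12.9, 13.5, 12.2, 13.0]
pop4 = [18.3, 17.9, 18.8, 17.5, 18.1, 18.6]

h_stat, p_value = stats.kruskal(pop1, pop2, pop3, pop4)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")  # reject the null if p < 0.05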
When Do I Do the What Now?
“Well, whenever I'm confused, I just check my underwear. It holds the answer
to all the important questions.” – Grandpa Simpson
If you are reasonably sure that the distributions
are normal – use ANOVA
Otherwise – use Kruskal-Wallis