One-Way Analysis of Variance (ANOVA)

One-way ANOVA is used to compare the means of at least three groups defined by one variable. The null hypothesis is that all the population group means are equal, versus the alternative that at least one of the population means differs from the others. It may seem confusing that we call it Analysis of Variance even though we are comparing means. The reason is that the test statistic uses evidence about two types of variability. We will consider the reasoning behind the statistic rather than the complex formula used to calculate it.

The dot plots present the number of minutes callers remained on hold before hanging up, for three types of recorded message. Suppose one person gathered the sample data in (a) and someone else gathered the data in (b). Which case do you think gives stronger evidence that the mean wait times for the three recording types are not all the same? That is, which gives stronger evidence against Ho: u1 = u2 = u3?

In both sets of data the group means are the same: in both case (a) and case (b), the mean wait time is 5.4 minutes for Advertisement, 10.4 minutes for Classical, and 2.8 minutes for Muzak. What is the difference then? The variability between pairs of sample means is the same in each case, because the sample means are the same. However, the variability within each sample is much smaller in (b) than in (a): the sample standard deviation is about 1 for each sample in (b), while in (a) the SDs range from 2.4 to 4.2. We will see that the evidence against Ho is stronger when the variability within each sample is smaller. The evidence is also stronger when the variability between the sample means increases, i.e., when the sample means are farther apart.

[Figure: dot plots of telephone holding time in minutes (scale 0 to 14) for the Advertisement, Classical, and Muzak recorded messages, shown for cases (a) and (b)]

EXAMPLE

For instance, say we were interested in studying whether the mean salaries for MLB, NFL, and NBA players are the same.
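The within-versus-between idea can be illustrated with a small numerical sketch. The data values below are made up for illustration; only the group means (5.4, 10.4, and 2.8 minutes) echo the holding-time example, and the F statistic is computed from first principles:

```python
from statistics import mean

# Two hypothetical data sets with the same group means but different
# within-group spread, echoing cases (a) and (b) above.
case_a = {  # large within-group variability
    "Advertisement": [1.0, 3.0, 5.0, 8.0, 10.0],
    "Classical":     [6.0, 8.0, 10.0, 13.0, 15.0],
    "Muzak":         [0.0, 1.0, 3.0, 4.0, 6.0],
}
case_b = {  # small within-group variability, identical group means
    "Advertisement": [4.4, 5.0, 5.4, 5.8, 6.4],
    "Classical":     [9.4, 10.0, 10.4, 10.8, 11.4],
    "Muzak":         [1.8, 2.4, 2.8, 3.2, 3.8],
}

def f_statistic(groups):
    """One-way ANOVA F statistic: between-group MS / within-group MS."""
    data = list(groups.values())
    k = len(data)                                  # number of groups
    n_total = sum(len(g) for g in data)
    grand = mean(x for g in data for x in g)       # grand mean
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in data)
    ssw = sum((x - mean(g)) ** 2 for g in data for x in g)
    return (ssb / (k - 1)) / (ssw / (n_total - k))

print(f_statistic(case_a))  # smaller F: means are blurred by the spread
print(f_statistic(case_b))  # much larger F: same means, tighter samples
```

Because the group means are identical in both cases, the between-group sum of squares is the same; only the within-group variability changes, and shrinking it inflates F, i.e., strengthens the evidence against Ho.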
To answer this question we took a random sample of 30 players from each of the three sports leagues. This gives us three levels, or sub-categories, for the overall variable Sports League. Since we want to compare means, the question of interest is:

Ho: The mean salaries for players across the three leagues are equal
Ha: At least one of the mean salaries differs from the others (i.e., not all mean salaries are the same)

One-way ANOVA: Salary versus Sport

Source  DF      SS     MS     F      P
Sport    2   202.7  101.4  6.08  0.003
Error   87  1450.1   16.7
Total   89  1652.8

Level        N   Mean  StDev
Baseball    30  3.765  4.861
Basketball  30  5.204  4.914
Football    30  1.554  1.490

[Minitab also plots individual 95% CIs for each mean, based on the pooled StDev, on a scale from 0.0 to 6.0]

Pooled StDev = 4.083

Interpreting this output:

1. A one-way analysis is used to compare populations for one variable or factor. In this instance the one variable is Sports League, and there are 3 populations, also called group or factor levels, being compared: baseball, basketball, and football.

2. DF stands for degrees of freedom. The DF for the variable (e.g., Sports League) is found by taking the number of group levels (called k) minus 1 (i.e., 3 - 1 = 2). The DF for Error is found by taking the total sample size, N, minus k (i.e., 90 - 3 = 87). The DF for Total is found by N - 1 (i.e., 90 - 1 = 89).

3. SS stands for Sum of Squares. The first SS is a measure of the variation in the data between the groups, and its Source row lists the variable name (e.g., Sport) used in the analysis. This is sometimes referred to as SSB, for "Sum of Squares Between" groups. The next value is the sum of squares for the error, often called SSE, or SSW for "Sum of Squares Within". Lastly, the value for Total is called SST (or sometimes SSTO), for "Sum of Squares Total". These values are additive, meaning SST = SSB + SSW.
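The degrees-of-freedom and sum-of-squares bookkeeping in items 2 and 3 can be checked directly from the numbers printed in the Minitab table:

```python
# Values read from the one-way ANOVA table for Salary versus Sport.
k = 3               # group levels: baseball, basketball, football
N = 90              # total sample size (30 players per league)
ss_between = 202.7  # SS for Sport (SSB)
ss_within = 1450.1  # SS for Error (SSE / SSW)
ss_total = 1652.8   # SS for Total (SST)

df_group = k - 1    # 3 - 1 = 2
df_error = N - k    # 90 - 3 = 87
df_total = N - 1    # 90 - 1 = 89

print(df_group, df_error, df_total)                      # 2 87 89
# Additivity: SST = SSB + SSW (allowing for rounding in the output).
print(abs(ss_total - (ss_between + ss_within)) < 0.05)   # True
```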
4. The test statistic used for ANOVA is the F-statistic, calculated by taking the Mean Square (MS) for the variable divided by the MS for the error (called the Mean Square Error, or MSE). The F-statistic is a ratio of the variability between groups to the variability within groups, so it is always nonnegative. If this ratio is large, then the p-value is small, producing a statistically significant result (i.e., rejection of the null hypothesis).

5. The p-value is the probability of exceeding the F-statistic, i.e., the area to the right of the F-statistic, with the corresponding degrees of freedom for the group (number of group levels minus 1, here 3 - 1 = 2) and for the error (total sample size minus the number of group levels, here 90 - 3 = 87). The F-distribution is skewed to the right (i.e., positively skewed), so there is no symmetry relationship like those found with the Z or t distributions. This p-value is used to test the null hypothesis that all the group population means are equal versus the alternative that they are not all equal. Note that the alternative is not that every mean differs from every other mean; it only says that at least one differs.

6. The individual 95% confidence intervals are one-sample t intervals that estimate the mean response for each group level. For example, the interval for Baseball estimates the population mean salary for major league baseball players. The * marks the sample mean value (e.g., 3.765).

Hypothesis Statements and Assumptions for One-Way ANOVA

The hypothesis test for analysis of variance for g populations:

Ho: μ1 = μ2 = ... = μg
Ha: not all μi (i = 1, ..., g) are equal

Recall that when we compare the means of two populations for independent samples, we use a 2-sample t-test with pooled variance when the population variances can be assumed equal.
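The F-statistic in item 4 can be recomputed from the mean squares printed in the table above:

```python
ms_group = 101.4  # MS for Sport (between-group mean square)
ms_error = 16.7   # MS for Error (within-group mean square, MSE)

f_stat = ms_group / ms_error
# Matches Minitab's F = 6.08 up to rounding of the printed mean squares.
print(round(f_stat, 2))
```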
For more than two populations, the test statistic is the ratio of the between-group sample variance to the within-group sample variance. Under the null hypothesis, both quantities estimate the variance of the random error, so the ratio should be close to 1. If the ratio is large, then we reject the null hypothesis.

Assumptions: To apply or perform a one-way ANOVA test, certain assumptions (or conditions) need to hold. If any of the conditions is not satisfied, the results from the use of ANOVA techniques may be unreliable. The assumptions are:

1. Each sample is an independent random sample.

2. The distribution of the response variable follows a normal distribution (don't forget about the central limit theorem; for ANOVA we would want a sample of at least 30).

3. The population variances are equal across responses for the group levels. This can be evaluated using the following rule of thumb: if the largest sample standard deviation divided by the smallest sample standard deviation is not greater than two, then assume that the population variances are equal. (For the sports salary data the largest SD is 4.914 and the smallest is 1.490, so the ratio is about 3.3, which exceeds two. However, since the group sample sizes are equal, this is not of great concern.)

If we reject the null hypothesis, how do we determine which mean(s) are statistically different? If the null hypothesis is rejected, we would want to look at comparisons of all possible pairs of means, i.e., all possible two-mean tests. This can be accomplished in Minitab by clicking the Comparisons button and then selecting the radio button for Tukey. Note that the theory behind why we choose an adjusted level of alpha to conduct these multiple comparisons, and why we select the Tukey method over the other options provided, is beyond this course.
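The equal-variance rule of thumb in assumption 3 amounts to a one-line check on the sample standard deviations from the output:

```python
# Sample standard deviations from the sports salary output.
sds = {"Baseball": 4.861, "Basketball": 4.914, "Football": 1.490}

ratio = max(sds.values()) / min(sds.values())
print(round(ratio, 2))  # about 3.3, which exceeds the cutoff of 2
print(ratio <= 2)       # False: the equal-variance assumption is doubtful
```

As the text notes, the equal group sample sizes (30 per league) make this violation less of a concern here.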
But to get a general understanding, think of this: if you were throwing a dart at a dartboard and hit the bulls-eye on the first toss, you would have the right to brag. However, if it took you 10, 15, 20, or even more throws to finally hit the center, would anyone really be impressed? Probably not. Similar logic applies here. If we just started testing for a difference by doing all possible two-mean comparisons, then, just like throwing many darts, we shouldn't be surprised to eventually find two means that differ. To avoid this possibility, we adjust the alpha level using some adjustment technique, e.g., the one used by the Tukey method. That is all you need to know about why we do it.

The interpretation, though, is quite straightforward: the Minitab output provides intervals for all possible two-mean comparisons, and the decision process is simply to inspect each interval to see whether it contains 0 (i.e., no difference between the means) or does not contain 0 (i.e., a statistically significant difference between the means). With our sports salary data being significant (the p-value of 0.003 is less than 0.05), we then look at these multiple comparisons:

Sport = Baseball subtracted from:

Sport        Lower  Center  Upper
Basketball  -1.074   1.438  3.950
Football    -4.723  -2.211  0.301

Sport = Basketball subtracted from:

Sport      Lower  Center   Upper
Football  -6.161  -3.649  -1.137

From these we see that the only interval NOT containing 0 is the one comparing Basketball to Football, which runs from -6.161 to -1.137.

Another Example:

The Centers for Disease Control was interested in comparing the tar content of four cigarette brands, A, B, C, and D. Since tar content is a measured quantity, the interest is in comparing means, and with four levels, i.e., four cigarette brands, ANOVA methods are more appropriate than repeated two-mean methods.
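The "does the interval contain 0?" rule can be applied mechanically to the comparisons above (interval endpoints copied from the Minitab output):

```python
# Tukey simultaneous CIs from the sports salary output: (lower, upper).
comparisons = {
    ("Basketball", "Baseball"):   (-1.074, 3.950),
    ("Football",   "Baseball"):   (-4.723, 0.301),
    ("Football",   "Basketball"): (-6.161, -1.137),
}

for pair, (lo, hi) in comparisons.items():
    differs = not (lo <= 0 <= hi)  # interval excluding 0 => means differ
    print(pair, "differ" if differs else "no significant difference")
```

Only the Football-minus-Basketball interval excludes 0, reproducing the conclusion stated above.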
The output is:

One-way ANOVA: Tar Content versus Brand

Source  DF     SS    MS     F      P
Brand    3  24.00  8.00  6.95  0.001
Error   44  50.65  1.15
Total   47  74.65

S = 1.073   R-Sq = 32.15%   R-Sq(adj) = 27.53%

Level   N    Mean  StDev
A      12  10.000  0.949
B      12  11.000  0.915
C      12  12.000  1.602
D      12  11.000  0.547

[Minitab also plots individual 95% CIs for each mean, based on the pooled StDev, on a scale from 10.0 to 13.0]

Pooled StDev = 1.073

Ho: All means of tar content are equal
Ha: At least one mean differs from another

To the right of the one-way ANOVA table, under the column headed P, is the p-value. In the example, P = 0.001, which is less than our usual alpha level of 0.05 and also less than 0.01, another often-used level of significance. (Setting alpha prior to conducting one's analysis remains important, to avoid choosing a convenient level after the results are in.) The p-value is for testing the hypotheses above: the mean tar content is the same for the 4 brands versus not all the means are the same. Because the p-value of 0.001 is less than the specified significance level of 0.05, we reject H0. The data provide sufficient evidence to conclude that the mean tar contents for the four brands are not all the same.

Below the ANOVA table is another table that provides the sample sizes, sample means, and sample standard deviations of the 4 samples. Beneath that table, the pooled StDev is given; this is an estimate of the common standard deviation of the 4 populations. Since the largest SD is not more than twice the smallest SD, we can assume, using the rule of thumb, that the equal variance assumption is satisfied. With the total sample size exceeding 30, we can assume normality by the Central Limit Theorem. Please note, however, that equal sample sizes can be very helpful in the design of ANOVA studies.
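Two quick sanity checks on this output: R-Sq is the fraction of total variation explained by Brand, SS(Brand) / SS(Total), and F is again the ratio of the mean squares:

```python
ss_brand = 24.00   # SS for Brand (between groups)
ss_total = 74.65   # SS for Total

r_sq = ss_brand / ss_total
print(round(100 * r_sq, 2))  # 32.15, matching R-Sq = 32.15%

ms_brand = 8.00    # MS for Brand
ms_error = 1.15    # MS for Error
print(round(ms_brand / ms_error, 2))  # 6.96; Minitab shows 6.95 because it
                                      # divides the unrounded mean squares
```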
Having equal sample sizes helps to protect against violations of the equal variance assumption. Also, the lower right side of the Minitab output gives individual 95% confidence intervals for the population means of the 4 brands.

This example provides an excellent illustration of why 2-sample t-tests can be misleading when more than two groups are being compared. As we see from the confidence interval output, all of the individual intervals overlap. This suggests that none of the means would differ if we conducted a series of 2-sample tests of means. However, the ANOVA test does result in at least one mean being significantly different from the others. With this significance we look at the multiple comparisons, which indicate that there is only one pair of means that differ: Brands A and C.

Brand = A subtracted from:

Brand   Lower  Center  Upper
B      -0.171   1.000  2.171
C       0.829   2.000  3.171
D      -0.171   1.000  2.171

Brand = B subtracted from:

Brand   Lower  Center  Upper
C      -0.171   1.000  2.171
D      -1.171   0.000  1.171

Brand = C subtracted from:

Brand   Lower  Center  Upper
D      -2.171  -1.000  0.171