8 Comparing More Than Two Means Using ANOVA

Math 145 : Fall 2014 : Pruim
Last Modified: November 14, 2014

8.1 The Basic ANOVA Situation

• Two variables: categorical explanatory and quantitative response
  – Can be used in either experimental or observational designs.
• Main question: Does the population mean response depend on the (treatment) group?
  – H0: the population group means are all equal (µ1 = µ2 = · · · = µk)
  – Ha: the population group means are not all equal
• If the categorical variable has only 2 values, we already have a method: the 2-sample t-test.
  – ANOVA allows for 3 or more groups (sub-populations).
• The F statistic compares within group variation (how different are individuals in the same group?) to between group variation (how different are the group means?).
• ANOVA assumes that each group is normally distributed with the same (population) standard deviation.
  – Check normality with normal quantile plots (of residuals).
  – Check equal standard deviations using the 2:1 ratio rule (largest standard deviation at most twice the smallest).

8.1.1 An Example: Ants and Sandwiches

favstats(Ants ~ Filling, data = SandwichAnts)

         .group min   Q1 median   Q3 max mean    sd n missing
1 Ham & Pickles  34 42.0   51.0 55.2  65 49.2 10.79 8       0
2 Peanut Butter  19 21.8   30.5 44.0  59 34.0 14.63 8       0
3      Vegemite  18 24.0   30.0 39.0  42 30.8  9.25 8       0

xyplot(Ants ~ Filling, SandwichAnts, type = c("p", "a"))
bwplot(Ants ~ Filling, SandwichAnts)

[Plots: a dot plot of Ants by Filling with group means connected, and side-by-side boxplots of the same data.]

Question: Are these differences significant? Or would we expect sample differences this large by random chance even if (in the population) the mean number of ants is the same for all three fillings?

Whether differences between the groups are significant depends on three things:
1. the difference in the means
2. the amount of variation within each group
3. the sample sizes

anova(lm(Ants ~ Filling, SandwichAnts))

Analysis of Variance Table

Response: Ants
          Df Sum Sq Mean Sq F value Pr(>F)
Filling    2   1561     780    5.63  0.011
Residuals 21   2913     139

The p-value listed in this output is the p-value for our null hypothesis that the mean population response is the same in each treatment group. In this case we would reject the null hypothesis at the α = 0.05 level, but not quite at the α = 0.01 level. We have some evidence that the type of sandwich matters, but not overwhelming evidence.

In the next section we'll look at this test in more detail. But notice that if you know the assumptions of a test, the null hypothesis being tested, and the p-value, you can generally interpret the results even if you don't know all the details of how the test statistic is computed.

8.2 The ANOVA Test Statistic

8.2.1 The ingredients

The ANOVA test statistic (called F) is based on three ingredients:

1. how different the group means are (between group differences)
2. the amount of variability within each group (within group differences)
3. sample size

Each of these is involved in the calculation of F. The F statistic is a bit complicated to compute, so we'll generally let the computer handle that for us. But it is useful to work through one small example to see how the ingredients are baked into a test statistic. In order to test the (1-way) ANOVA null hypothesis that the means of all the groups are the same, we need to come up with a test statistic. The usual test statistic is called F in honor of R. A. Fisher.

8.2.2 Smaller Ants Data

To make things fit on the page/screen better, let's look at just the first 10 rows of the SandwichAnts data set.
SmallAnts <- head(SandwichAnts, 10) %>%
  select(Filling, Ants) %>%
  arrange(Filling)
SmallAnts

         Filling Ants
1  Ham & Pickles   44
2  Ham & Pickles   34
3  Ham & Pickles   36
4  Peanut Butter   43
5  Peanut Butter   59
6  Peanut Butter   22
7       Vegemite   18
8       Vegemite   29
9       Vegemite   42
10      Vegemite   42

8.2.3 Computing F

Note: The R code used to expand the data set with additional information is displayed below, but you don't need to know the details of this code. It is shown in case you are interested in knowing how to do such things. For our purposes, focus your attention on the results.

Comparing means

If the null hypothesis is true, then the group means should be close to the overall mean.

mean(~Ants, data = SmallAnts)

[1] 36.9

mean(Ants ~ Filling, data = SmallAnts)

Ham & Pickles Peanut Butter      Vegemite
         38.0          41.3          32.8

So let's add the overall (grand) mean and the group means to our data.

SmallAnts %>%
  mutate(GrandMean = mean(Ants)) %>%
  group_by(Filling) %>%
  mutate(GroupMean = round(mean(Ants), 2))  # rounding to make tables look better

Source: local data frame [10 x 4]
Groups: Filling

         Filling Ants GrandMean GroupMean
1  Ham & Pickles   44      36.9      38.0
2  Ham & Pickles   34      36.9      38.0
3  Ham & Pickles   36      36.9      38.0
4  Peanut Butter   43      36.9      41.3
5  Peanut Butter   59      36.9      41.3
6  Peanut Butter   22      36.9      41.3
7       Vegemite   18      36.9      32.8
8       Vegemite   29      36.9      32.8
9       Vegemite   42      36.9      32.8
10      Vegemite   42      36.9      32.8

Contribution of each case

Each case contributes to our F statistic. For each case, we will calculate three numbers:

* M (model): the difference between the group mean and the grand mean
* E (error): the difference between the observed response and the group mean
* T (total): the difference between the observed response and the grand mean

If H0 is true, then the values of M should be relatively small and the values of E should be relatively large.
If H0 is false, we would expect the opposite: M will be large and E will be small.

SmallAnts %>%
  mutate(GrandMean = mean(Ants)) %>%
  group_by(Filling) %>%
  mutate(
    GroupMean = round(mean(Ants), 2),
    M = GroupMean - GrandMean,
    E = Ants - GroupMean,
    T = Ants - GrandMean
  )

Source: local data frame [10 x 7]
Groups: Filling

         Filling Ants GrandMean GroupMean     M      E     T
1  Ham & Pickles   44      36.9      38.0  1.10   6.00   7.1
2  Ham & Pickles   34      36.9      38.0  1.10  -4.00  -2.9
3  Ham & Pickles   36      36.9      38.0  1.10  -2.00  -0.9
4  Peanut Butter   43      36.9      41.3  4.43   1.67   6.1
5  Peanut Butter   59      36.9      41.3  4.43  17.67  22.1
6  Peanut Butter   22      36.9      41.3  4.43 -19.33 -14.9
7       Vegemite   18      36.9      32.8 -4.15 -14.75 -18.9
8       Vegemite   29      36.9      32.8 -4.15  -3.75  -7.9
9       Vegemite   42      36.9      32.8 -4.15   9.25   5.1
10      Vegemite   42      36.9      32.8 -4.15   9.25   5.1

Finally, as we did with standard deviation and variance, we will square M, E, and T.

SmallAnts <- SmallAnts %>%
  mutate(GrandMean = mean(Ants)) %>%
  group_by(Filling) %>%
  mutate(
    GroupMean = round(mean(Ants), 2),
    M = GroupMean - GrandMean,
    E = Ants - GroupMean,
    T = Ants - GrandMean,
    Msquared = M^2,
    Esquared = E^2,
    Tsquared = T^2
  ) %>%
  collect() %>%
  data.frame()
SmallAnts

         Filling Ants GrandMean GroupMean     M      E     T Msquared Esquared Tsquared
1  Ham & Pickles   44      36.9      38.0  1.10   6.00   7.1     1.21    36.00    50.41
2  Ham & Pickles   34      36.9      38.0  1.10  -4.00  -2.9     1.21    16.00     8.41
3  Ham & Pickles   36      36.9      38.0  1.10  -2.00  -0.9     1.21     4.00     0.81
4  Peanut Butter   43      36.9      41.3  4.43   1.67   6.1    19.62     2.79    37.21
5  Peanut Butter   59      36.9      41.3  4.43  17.67  22.1    19.62   312.23   488.41
6  Peanut Butter   22      36.9      41.3  4.43 -19.33 -14.9    19.62   373.65   222.01
7       Vegemite   18      36.9      32.8 -4.15 -14.75 -18.9    17.22   217.56   357.21
8       Vegemite   29      36.9      32.8 -4.15  -3.75  -7.9    17.22    14.06    62.41
9       Vegemite   42      36.9      32.8 -4.15   9.25   5.1    17.22    85.56    26.01
10      Vegemite   42      36.9      32.8 -4.15   9.25   5.1    17.22    85.56    26.01

Adding it all up

Now let's add up all those values of M²
, E², and T². We will use SS to stand for "sum of squares".

SSM <- sum(~Msquared, data = SmallAnts)
SSM

[1] 131

SSE <- sum(~Esquared, data = SmallAnts)
SSE

[1] 1147

SST <- sum(~Tsquared, data = SmallAnts)
SST

[1] 1279

Notice that SST = SSM + SSE:

SST

[1] 1279

SSM + SSE

[1] 1279

and that SST = Σ T² = (n − 1)s², where s is the standard deviation of all the responses:

SSM + SSE

[1] 1279

(10 - 1) * var(~Ants, data = SmallAnts)

[1] 1279

This is how analysis of variance gets its name. We are taking the components of the variance and splitting them into two portions: SSM is the portion explained by the model (by the fact that there is variation **between** the multiple groups), and SSE is the portion unexplained by the model (because there is variation **within** each group).

abbreviation   component   details
SST            total       total variation (how much do values differ from the grand mean?)
SSM            model       between group variation (how much do the group means differ?)
SSE            error       within group variation (how much do members of the same group differ?)

8.2.4 Comparing within group variation to between group variation

Before comparing SSM and SSE, we will adjust SSM for the number of groups and SSE for the sample size.

notation   definition               meaning
DFT        n − 1                    total degrees of freedom
DFM        number of groups − 1     model degrees of freedom
DFE        n − number of groups     error degrees of freedom
MST        SST / DFT                variance
MSM        SSM / DFM                Mean Squared Model
MSE        SSE / DFE                Mean Squared Error

Notice that DFM + DFE = DFT.
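This identity holds in general: with k groups and n observations in all,

```latex
\mathrm{DFM} + \mathrm{DFE} = (k - 1) + (n - k) = n - 1 = \mathrm{DFT}.
```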
DFT <- 10 - 1
DFT

[1] 9

DFM <- 3 - 1
DFM

[1] 2

DFE <- 10 - 3
DFE  # same as 9 - 2 = DFT - DFM

[1] 7

MSM <- SSM/DFM
MSM

[1] 65.7

MSE <- SSE/DFE
MSE

[1] 164

Now we can finally define Fisher's F statistic:

    F = MSM/MSE = (SSM/DFM) / (SSE/DFE)

F <- MSM/MSE
F

[1] 0.401

F will be large when there is a lot of variation between groups and small when there is not much (relative to the overall variability). So we will reject the null hypothesis when F is large.

The ANOVA report

This information, including the p-value, is traditionally reported in an ANOVA table:

anova(lm(Ants ~ Filling, data = SmallAnts))

Analysis of Variance Table

Response: Ants
          Df Sum Sq Mean Sq F value Pr(>F)
Filling    2    131    65.7     0.4   0.68
Residuals  7   1147   163.9

What we have called "model" above, R is calling Filling, because that is what our grouping variable is. What we have called "error", R is calling residuals (and some people use RSS instead of SSE for the sum of the squares of the residuals). There is no row for the total in R's output. (Some software includes a row for the total and some software doesn't.) The values of SSM, SSE, MSM, MSE, F, and the p-value are all easy to spot in this layout.

8.2.5 Returning to the original data

Remember that we have only been looking at the first 10 rows of the data. Here is the ANOVA table for the full data set:

anova(lm(Ants ~ Filling, data = SandwichAnts))

Analysis of Variance Table

Response: Ants
          Df Sum Sq Mean Sq F value Pr(>F)
Filling    2   1561     780    5.63  0.011
Residuals 21   2913     139

Here we see that the p-value is small enough to reject the null hypothesis. It looks like the mean number of ants does vary with sandwich type. But how? Which sandwiches attract more ants? How many more? We'll turn our attention to these follow-up questions soon.
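The F value and p-value in the full-data table can be checked by hand from the Df and Sum Sq columns alone. A quick sketch (the numbers are copied from the ANOVA table above):

```r
# Recompute F and its p-value from the full-data ANOVA table entries.
SSM <- 1561; DFM <- 2     # Filling row: between-group sum of squares
SSE <- 2913; DFE <- 21    # Residuals row: within-group sum of squares
MSM <- SSM / DFM          # mean squared model
MSE <- SSE / DFE          # mean squared error
F <- MSM / MSE
F                         # about 5.63, matching the table
1 - pf(F, DFM, DFE)       # about 0.011, matching the table
```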
8.3 Computing the p-value for an F statistic

8.3.1 P-values from the randomization distribution

We can now compute a p-value by comparing the observed value of F (5.63 for the full data set) to a randomization distribution. If the null hypothesis is true, the three groups are really just one big group and the group labels are meaningless, so we can shuffle the group labels to get a randomization distribution:

Ants.Rand <- do(1000) * anova(lm(Ants ~ shuffle(Filling), data = SandwichAnts))
tally(~(F >= 5.63), data = Ants.Rand)

TRUE FALSE  <NA>
   8   992  1000

prop(~(F >= 5.63), data = Ants.Rand)

target level: TRUE; other levels: FALSE

 TRUE
0.008

histogram(~F, data = Ants.Rand, v = 5.63)

[Plot: histogram of the randomization distribution of F, with a vertical line at the observed value 5.63.]

Since our estimated p-value is small, we have enough evidence in the data to reject the null hypothesis.

8.3.2 P-values without simulations

Under certain conditions, the F statistic has a known distribution (called the F distribution). Those conditions are:

1. The null hypothesis is true (i.e., each group has the same mean).
2. Each group is sampled from a normal population.
3. Each population group has the same standard deviation.

When these conditions are met, we can use the F-distribution to compute the p-value without generating the randomization distribution.

• F distributions have two parameters: the degrees of freedom for the numerator and for the denominator. In our small ten-row example these are 2 and 7; for the full data set they are 2 and 21.
• When H0 is true, the numerator (MSM) and denominator (MSE) both estimate the same quantity, so F will tend to be close to 1.
• When H0 is false, there is more difference between the groups, so the numerator tends to be larger. This means we will reject the null hypothesis when F gets large enough.
• The p-value is computed using pf().
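For the small ten-row example, the same pf() computation reproduces the p-value (0.68) that appeared in its ANOVA table:

```r
# p-value for the small data set: F = 0.401 with 2 and 7 degrees of freedom
1 - pf(0.401, df1 = 2, df2 = 7)   # roughly 0.68, matching the earlier table
```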
1 - pf(5.63, 2, 21)

[1] 0.011

8.3.3 Getting R to do the work

Of course, R can do all of this work for us. We saw this earlier. Here it is again in a slightly different way:

Ants.model <- lm(Ants ~ Filling, data = SandwichAnts)
anova(Ants.model)

Analysis of Variance Table

Response: Ants
          Df Sum Sq Mean Sq F value Pr(>F)
Filling    2   1561     780    5.63  0.011
Residuals 21   2913     139

lm() stands for "linear model" and can be used to fit a wide variety of models. It knows to do 1-way ANOVA by looking at the types of the variables involved. anova() prints the ANOVA table. Notice how DFM, SSM, MSM, DFE, SSE, and MSE show up in this table, as well as F and the p-value.

8.3.4 Checking the Model Assumptions

If we use the F-distribution to estimate our p-value without simulations, then we should check that the assumptions above (normality in each population group and equal standard deviations across the population groups) are reasonable.

1. Comparing standard deviations in each group

If each group in the population has the same standard deviation, then the group standard deviations in our data should be similar. Our rule of thumb will be that the biggest should not be more than twice the smallest.

favstats(Ants ~ Filling, data = SandwichAnts)

         .group min   Q1 median   Q3 max mean    sd n missing
1 Ham & Pickles  34 42.0   51.0 55.2  65 49.2 10.79 8       0
2 Peanut Butter  19 21.8   30.5 44.0  59 34.0 14.63 8       0
3      Vegemite  18 24.0   30.0 39.0  42 30.8  9.25 8       0

According to our rule of thumb, we are fine here (14.63 is less than twice 9.25).

2. Looking at Residuals

While it would be possible (at least for larger data sets) to look at the distribution of each group in our sample to see if it looks like it comes from a normal distribution, often there is not much data in each group, so it is hard to judge. We can improve this situation if we combine the data from all of the groups, but we need to make an adjustment first.
If we compute residuals using

    residual = observed response − group mean

for each value, then the combined distribution should be approximately normal with a mean of 0 when each group has the same standard deviation.

Let's compute the first residual directly to see what is going on here.

head(SandwichAnts, 1)

  Butter  Filling Bread Ants Order
1     no Vegemite   Rye   18    10

The first sandwich was a Vegemite sandwich that attracted 18 ants. The mean number of ants for Vegemite sandwiches was 30.75. So our residual is

    residual = 18 − 30.75 = −12.75.

R can calculate these residuals for us to save us some tedious work:

resid(Ants.model)

     1      2      3      4      5      6      7      8      9     10     11     12
-12.75   9.00  -5.25  -1.75  25.00 -15.25  11.25 -12.00 -13.25  11.25  -9.00  -0.25
    13     14     15     16     17     18     19     20     21     22     23     24
  0.25   2.00   4.75  -9.75  13.00  15.75   7.25 -15.00   9.75  -5.75 -13.00   3.75

With the residuals computed, we can look at a histogram or normal-quantile plot of the residuals to see if things look roughly normal.

histogram(~resid(Ants.model))
qqmath(~resid(Ants.model))

[Plots: a histogram and a normal-quantile plot of resid(Ants.model).]

3. Diagnostic Plots

R provides a tool for generating diagnostic plots quickly and easily. Here are two we will often look at:

mplot(Ants.model, which = 1:2)

[Plots: "Residuals vs Fitted" and "Normal Q-Q" diagnostic plots for Ants.model.]

The first shows how the residuals behave in each group, organized according to the group means. We are hoping to see roughly equivalent amounts of spread in each group. This looks good. The second plot is the normal-quantile plot of the residuals. We would like it to be roughly a straight line.
This can be hard to judge in a small data set, but it looks like things are a bit too closely packed together: the smallest residuals should be a bit smaller and the largest a bit larger than what we are seeing.

Proportion of Variation Explained

The summary() function can be used to provide a different summary of the ANOVA model:

summary(Ants.model)

Call:
lm(formula = Ants ~ Filling, data = SandwichAnts)

Residuals:
   Min     1Q Median     3Q    Max
-15.25 -10.31   0.00   9.19  25.00

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)             49.25       4.16   11.83  9.5e-11
FillingPeanut Butter   -15.25       5.89   -2.59   0.0171
FillingVegemite        -18.50       5.89   -3.14   0.0049

Residual standard error: 11.8 on 21 degrees of freedom
Multiple R-squared: 0.349, Adjusted R-squared: 0.287
F-statistic: 5.63 on 2 and 21 DF, p-value: 0.011

The ratio

    R² = SSM / (SSM + SSE) = SSM / SST

measures the proportion of the total variation that is explained by the grouping variable (treatment).

8.4 Another Example: Jet Lag

Details of this study can be found at http://www.sciencemag.org/content/297/5581/571.full.
Here is all the code needed to analyze the jet lag experiment.

require(abd)
favstats(shift ~ treatment, data = JetLagKnees)

   .group   min    Q1 median    Q3   max   mean    sd n missing
1 control -1.27 -0.65 -0.485  0.24  0.53 -0.309 0.618 8       0
2    eyes -2.83 -1.78 -1.480 -1.10 -0.78 -1.551 0.706 7       0
3    knee -1.61 -0.76 -0.290  0.17  0.73 -0.336 0.791 7       0

xyplot(shift ~ treatment, data = JetLagKnees, type = c("p", "a"))
bwplot(shift ~ treatment, data = JetLagKnees)

[Plots: a dot plot of shift by treatment with group means connected, and side-by-side boxplots of the same data.]

jetlag.model <- lm(shift ~ treatment, data = JetLagKnees)
anova(jetlag.model)

Analysis of Variance Table

Response: shift
          Df Sum Sq Mean Sq F value Pr(>F)
treatment  2   7.22    3.61    7.29 0.0045
Residuals 19   9.42    0.50

summary(jetlag.model)
mplot(jetlag.model, w = 1:2)

[Plots: "Residuals vs Fitted" and "Normal Q-Q" diagnostic plots for jetlag.model.]

The small p-value suggests that the three treatment groups do not have the same mean shift in circadian rhythm. But the plots of our data suggest that this is because the eyes group is different from the other two. That is, the knees group looks very similar (on average) to the control group. We will formalize these observations in the next section.

8.5 Follow-Up Analysis

8.5.1 The Problems with Looking at Confidence Intervals for One Mean At a Time

We can construct a confidence interval for any of the means by just taking a subset of the data and using t.test(), but there are some problems with this approach. Most importantly, we were primarily interested in comparing the means across the groups.
Often people will display confidence intervals for each group and look for "overlapping" intervals. But this is not the best way to look for differences. Nevertheless, you will sometimes see graphs showing multiple confidence intervals, labeled to indicate which means appear to be different from which. (See the solution to problem 15.3 for an example.)

When doing this in the context of ANOVA, however, we should adjust our estimate for σ. Instead of using the standard deviation from just one group, we can combine the data from all the groups (since we are assuming they all have the same standard deviation) and use √MSE as our estimate for σ:

    SE = σ/√n ≈ √MSE/√n

8.5.2 Pairwise Comparisons

We really want to compare groups in pairs, and we have a method for this: the 2-sample t. But we need to make a couple of adjustments to the two-sample t.

1. As above, we will use a new formula for the standard error that makes use of all the data (even from groups not involved in the pair).
2. We also need to adjust the critical value to take into account the fact that we are (usually) making multiple comparisons.

8.5.3 The Standard Error for Comparing Two Means

    SE = √(σi²/ni + σj²/nj) = √(σ²/ni + σ²/nj) = σ√(1/ni + 1/nj) ≈ √MSE · √(1/ni + 1/nj) = √(MSE (1/ni + 1/nj))

where ni and nj are the sample sizes for the two groups being compared. Basically, √MSE is taking the place of s in our usual formula. The degrees of freedom for this estimate is

    DFE = total sample size − number of groups.

Ignoring the multiple comparisons issue, we can now compute confidence intervals or hypothesis tests just as before.

• confidence interval: ȳi − ȳj ± t* · SE
• test statistic (for H0: µi − µj = 0): t = (ȳi − ȳj) / SE

The appropriate degrees of freedom to use is DFE, since that's the degrees of freedom associated with our estimate for σ.
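The SE formula above can be wrapped in a tiny helper function and checked against the jet lag numbers. (pairwise_se is a made-up name for this sketch, not a function from any package used in these notes.)

```r
# SE for comparing two group means in ANOVA, using the pooled MSE
# as the estimate of the common variance (a sketch).
pairwise_se <- function(MSE, ni, nj) {
  sqrt(MSE * (1 / ni + 1 / nj))
}

# Jet lag data: MSE = 0.50, control (n = 8) vs knee (n = 7)
pairwise_se(MSE = 0.50, ni = 8, nj = 7)   # about 0.366
```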
Using our jet lag data, we can compute a 95% confidence interval for the difference between the knees group and the control group as follows.

anova(jetlag.model)

Analysis of Variance Table

Response: shift
          Df Sum Sq Mean Sq F value Pr(>F)
treatment  2   7.22    3.61    7.29 0.0045
Residuals 19   9.42    0.50

favstats(shift ~ treatment, data = JetLagKnees)

   .group   min    Q1 median    Q3   max   mean    sd n missing
1 control -1.27 -0.65 -0.485  0.24  0.53 -0.309 0.618 8       0
2    eyes -2.83 -1.78 -1.480 -1.10 -0.78 -1.551 0.706 7       0
3    knee -1.61 -0.76 -0.290  0.17  0.73 -0.336 0.791 7       0

SE <- sqrt(0.5) * sqrt(1/8 + 1/7)
SE

[1] 0.366

DFE <- 19
t.star <- qt(0.975, df = DFE)
t.star

[1] 2.09

estimate <- (-0.309) - (-0.336)
estimate

[1] 0.027

t.star * SE  # margin of error

[1] 0.766

estimate - t.star * SE  # lower end of CI

[1] -0.739

estimate + t.star * SE  # upper end of CI

[1] 0.793

This would be correct if these were the only two groups we were comparing. But we need to make an adjustment to deal with all three groups at once. The adjustment will make the interval even wider, so even after adjusting, 0 will still be inside the interval, and we do not have evidence that would allow us to reject the hypothesis that shining light on the back of the knees makes no difference.

8.5.4 The Multiple Comparisons Problem

Suppose we have 5 groups in our study and we want to make comparisons between each pair of groups. That's 4 + 3 + 2 + 1 = 10 pairs. If we made 10 independent 95% confidence intervals, the probability that all of them cover the appropriate parameter is only 0.599:

0.95^10

[1] 0.599

So we have a family-wise error rate of nearly 40%. We can correct for this by adjusting our critical value. Let's take a simple example: just two 95% confidence intervals.
The probability that both cover (assuming independence) is

0.95^2

[1] 0.902

Now suppose we want both intervals to cover 95% of the time instead of 90.2% of the time. We could get this by forming two 97.5% confidence intervals:

sqrt(0.95)

[1] 0.975

0.975^2

[1] 0.951

This means we need a larger value of t* for each interval. The ANOVA situation is a little bit more complicated because

• There are more than two comparisons.
• The different comparisons are not independent (because each group mean is used in multiple comparisons).

We will briefly describe two ways to make an adjustment for multiple comparisons.

8.5.5 Bonferroni Corrections – An Easy Over-adjustment

Bonferroni's idea is simple: divide the desired family-wise error rate by the number of tests or intervals. This is an over-correction, but it is easy to do, and it is used in many situations where a better method is not known or a quick estimate is desired. Here is a table showing a few Bonferroni corrections for looking at all pairwise comparisons.

number of groups   number of pairs   family-wise error rate   individual error rate   confidence level for t*
3                  3                 .05                      0.017                   0.983
4                  6                 .05                      0.008                   0.992
5                  10                .05                      0.005                   0.995

Similar adjustments could be made for looking at only a special subset of the pairwise comparisons.

8.5.6 Tukey's Honest Significant Differences

Tukey's Honest Significant Differences is a better adjustment method, specifically designed for making all pairwise comparisons in an ANOVA situation. (It takes into account the fact that the tests are not independent.) R can compute Tukey's Honest Significant Differences easily.
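As an aside, the Bonferroni table above can be reproduced in a few lines, using choose() to count the pairs:

```r
# Reproducing the Bonferroni table: with k groups there are choose(k, 2)
# pairwise comparisons, and the family-wise rate 0.05 is split among them.
k <- 3:5
pairs <- choose(k, 2)            # 3, 6, 10
individual <- 0.05 / pairs       # per-comparison error rate
data.frame(groups = k, pairs = pairs,
           individual = round(individual, 3),
           conf.level = round(1 - individual, 3))
```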
TukeyHSD(lm(shift ~ treatment, JetLagKnees))

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = x)

$treatment
               diff    lwr    upr p adj
eyes-control -1.243 -2.168 -0.317 0.008
knee-control -0.027 -0.953  0.899 0.997
knee-eyes     1.216  0.260  2.172 0.012

mplot(TukeyHSD(lm(shift ~ treatment, JetLagKnees)))

[Plot: "Tukey's Honest Significant Differences": confidence intervals for each pairwise difference in means.]

Tukey's method adjusts the confidence intervals, making them a bit wider, to give them the desired family-wise error rate. Tukey's method also adjusts the p-values (making them larger), so that when the means are all the same, there is only a 5% chance that a sample will produce any adjusted p-values below 0.05. In this example we see that the eyes group differs significantly from the control group and also from the knee group, but that the knee and control groups are not significantly different. (We can tell this by seeing which confidence intervals contain 0 or by checking which adjusted p-values are less than 0.05.)

8.5.7 Other Adjustments

There are similar methods for testing other sets of multiple comparisons. Testing "one against all the others" goes by the name of Dunnett's method, for example. This is useful when one group represents a control against which various treatments are being compared.

8.6 Computing F from Summary Statistics

It is possible to compute F from a fairly limited set of summary statistics.
Everything we need is in these two tables:

favstats(shift ~ treatment, data = JetLagKnees)

   .group   min    Q1 median    Q3   max   mean    sd n missing
1 control -1.27 -0.65 -0.485  0.24  0.53 -0.309 0.618 8       0
2    eyes -2.83 -1.78 -1.480 -1.10 -0.78 -1.551 0.706 7       0
3    knee -1.61 -0.76 -0.290  0.17  0.73 -0.336 0.791 7       0

favstats(~shift, data = JetLagKnees)

   min    Q1 median    Q3  max   mean   sd  n missing
 -2.83 -1.33  -0.66 -0.05 0.73 -0.713 0.89 22       0

Recall that each E component is

    E = observed response − group mean

So if we add up all the squared values of E for group i, we get (ni − 1)si²:

SSE <- (8 - 1) * 0.618^2 + (7 - 1) * 0.706^2 + (7 - 1) * 0.791^2
SSE

[1] 9.42

Similarly, each M component is

    M = group mean − grand mean

so the sum of the squared values of M for group i is ni times the squared difference between the group mean and the grand mean:

SSM <- 8 * (-0.309 - (-0.713))^2 + 7 * (-1.551 - (-0.713))^2 + 7 * (-0.336 - (-0.713))^2
SSM

[1] 7.22

These match the values in the ANOVA table (up to round-off):

anova(jetlag.model)

Analysis of Variance Table

Response: shift
          Df Sum Sq Mean Sq F value Pr(>F)
treatment  2   7.22    3.61    7.29 0.0045
Residuals 19   9.42    0.50

Once we have SSE and SSM, the rest is easily computed.

MSM <- SSM/2
MSM

[1] 3.61

MSE <- SSE/19
MSE

[1] 0.496

F <- MSM/MSE
F

[1] 7.28

p.val <- 1 - pf(F, df1 = 2, df2 = 19)
p.val

[1] 0.0045
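The whole computation in this section can be collected into one small function that takes the group means, standard deviations, and sample sizes. (anova_from_summary is a hypothetical name for this sketch; it is not part of any package.)

```r
# Sketch: one-way ANOVA F statistic and p-value from summary statistics.
anova_from_summary <- function(means, sds, ns) {
  k <- length(means)
  n <- sum(ns)
  grand <- sum(ns * means) / n                # overall (grand) mean
  SSM <- sum(ns * (means - grand)^2)          # between-group sum of squares
  SSE <- sum((ns - 1) * sds^2)                # within-group sum of squares
  F <- (SSM / (k - 1)) / (SSE / (n - k))
  list(SSM = SSM, SSE = SSE, F = F,
       p.value = 1 - pf(F, k - 1, n - k))
}

# Jet lag summary statistics from favstats() above:
anova_from_summary(means = c(-0.309, -1.551, -0.336),
                   sds   = c(0.618, 0.706, 0.791),
                   ns    = c(8, 7, 7))
# F is about 7.28 and the p-value about 0.0045, matching the ANOVA table.
```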