8 Comparing More Than Two Means Using ANOVA

8.1 The Basic ANOVA situation
• Two variables: categorical explanatory and quantitative response
– Can be used in either experimental or observational designs.
• Main Question: Does the population mean response depend on the (treatment) group?
– H0 : the population group means are all equal (µ1 = µ2 = · · · = µk)
– Ha : the population group means are not all equal
• If categorical variable has only 2 values, we already have a method: 2-sample t-test
– ANOVA allows for 3 or more groups (sub-populations)
• F statistic compares within group variation (how different are individuals in the same group?) to between
group variation (how different are the different group means?)
• ANOVA assumes that each group is normally distributed with the same (population) standard deviation.
– Check normality with normal quantile plots (of residuals)
– Check equal standard deviation using 2:1 ratio rule (largest standard deviation at most twice the
smallest standard deviation).
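
As a preview of what this looks like in R, here is a minimal sketch (dat, response, and group are hypothetical placeholder names; the rest of this chapter works through real examples):

# hypothetical placeholders: dat, response, group
anova(lm(response ~ group, data = dat))   # F statistic and p-value in one line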
8.1.1 An Example: Ants and Sandwiches
favstats(Ants ~ Filling, data = SandwichAnts)

         .group min   Q1 median   Q3 max mean    sd n missing
1 Ham & Pickles  34 42.0   51.0 55.2  65 49.2 10.79 8       0
2 Peanut Butter  19 21.8   30.5 44.0  59 34.0 14.63 8       0
3      Vegemite  18 24.0   30.0 39.0  42 30.8  9.25 8       0
xyplot(Ants ~ Filling, SandwichAnts, type = c("p", "a"))
bwplot(Ants ~ Filling, SandwichAnts)
[Figure: a scatterplot (with group means connected) and a boxplot of Ants by Filling.]
Question: Are these differences significant? Or would we expect sample differences this large by random
chance even if (in the population) the mean number of ants is equal for all three groups?
Whether differences between the groups are significant depends on three things:
1. the difference in the means
2. the amount of variation within each group
3. the sample sizes
anova(lm(Ants ~ Filling, SandwichAnts))

Analysis of Variance Table

Response: Ants
          Df Sum Sq Mean Sq F value Pr(>F)
Filling    2   1561     780    5.63  0.011
Residuals 21   2913     139
The p-value listed in this output is the p-value for our null hypothesis that the mean population response is
the same in each treatment group. In this case we would reject the null hypothesis at the α = 0.05 level, but not
quite at the α = 0.01 level. We have some evidence that the type of sandwich matters, but not overwhelming
evidence.
In the next section we’ll look at this test in more detail, but notice that if you know the assumptions of a test,
the null hypothesis being tested, and the p-value, you can generally interpret the results even if you don’t
know all the details of how the test statistic is computed.
8.2 The ANOVA test statistic
8.2.1 The ingredients
The ANOVA test statistic (called F) is based on three ingredients:
1. how different the group means are (between group differences)
2. the amount of variability within each group (within group differences)
3. sample size
Each of these will be involved in the calculation of F.
The F statistic is a bit complicated to compute. We'll generally let the computer handle that for us. But it is
useful to see one small example of how the ingredients are baked into a test statistic. In order to test the
(1-way) ANOVA null hypothesis that the means of all the groups are the same, we need to come up with a test
statistic. The usual test statistic is called F in honor of R. A. Fisher.
8.2.2 Smaller Ants Data
To make things fit on the page/screen better, let’s look at just the first 10 rows of the SandwichAnts data set.
SmallAnts <- head(SandwichAnts, 10) %>% select(Filling, Ants) %>% arrange(Filling)
SmallAnts

         Filling Ants
1  Ham & Pickles   44
2  Ham & Pickles   34
3  Ham & Pickles   36
4  Peanut Butter   43
5  Peanut Butter   59
6  Peanut Butter   22
7       Vegemite   18
8       Vegemite   29
9       Vegemite   42
10      Vegemite   42
8.2.3 Computing F
Note: The R code used to expand the data set with additional information is displayed below, but you don’t need
to know the details of this code. It is shown in case you are interested in knowing how to do such things. For our
purposes, focus your attention on the results.
Comparing means
If the null hypothesis is true, then the group means should be close to the overall mean.
mean(~Ants, data = SmallAnts)

[1] 36.9

mean(Ants ~ Filling, data = SmallAnts)

Ham & Pickles Peanut Butter      Vegemite 
         38.0          41.3          32.8 
So let’s add the overall mean and the group means to our data.
SmallAnts %>%
  mutate(GrandMean = mean(Ants)) %>%
  group_by(Filling) %>%
  mutate(GroupMean = round(mean(Ants), 2))   # rounding to make tables look better
Source: local data frame [10 x 4]
Groups: Filling

         Filling Ants GrandMean GroupMean
1  Ham & Pickles   44      36.9      38.0
2  Ham & Pickles   34      36.9      38.0
3  Ham & Pickles   36      36.9      38.0
4  Peanut Butter   43      36.9      41.3
5  Peanut Butter   59      36.9      41.3
6  Peanut Butter   22      36.9      41.3
7       Vegemite   18      36.9      32.8
8       Vegemite   29      36.9      32.8
9       Vegemite   42      36.9      32.8
10      Vegemite   42      36.9      32.8
Contribution of each case
Each case contributes to our F statistic. For each case, we will calculate three numbers:

* M (model): the difference between the group mean and the global mean
* E (error): the difference between the observed response and the group mean
* T (total): the difference between the observed response and the global mean

If H0 is true, then the values of M should be relatively small compared to the values of E. If H0 is false, we
would expect the opposite: M will be large relative to E.
SmallAnts %>%
  mutate(GrandMean = mean(Ants)) %>%
  group_by(Filling) %>%
  mutate(
    GroupMean = round(mean(Ants), 2),
    M = GroupMean - GrandMean,
    E = Ants - GroupMean,
    T = Ants - GrandMean
  )
Source: local data frame [10 x 7]
Groups: Filling

         Filling Ants GrandMean GroupMean     M      E     T
1  Ham & Pickles   44      36.9      38.0  1.10   6.00   7.1
2  Ham & Pickles   34      36.9      38.0  1.10  -4.00  -2.9
3  Ham & Pickles   36      36.9      38.0  1.10  -2.00  -0.9
4  Peanut Butter   43      36.9      41.3  4.43   1.67   6.1
5  Peanut Butter   59      36.9      41.3  4.43  17.67  22.1
6  Peanut Butter   22      36.9      41.3  4.43 -19.33 -14.9
7       Vegemite   18      36.9      32.8 -4.15 -14.75 -18.9
8       Vegemite   29      36.9      32.8 -4.15  -3.75  -7.9
9       Vegemite   42      36.9      32.8 -4.15   9.25   5.1
10      Vegemite   42      36.9      32.8 -4.15   9.25   5.1
Finally, as we did with standard deviation and variance, we will square M, E, and T.
SmallAnts <- SmallAnts %>%
  mutate(GrandMean = mean(Ants)) %>%
  group_by(Filling) %>%
  mutate(
    GroupMean = round(mean(Ants), 2),
    M = GroupMean - GrandMean,
    E = Ants - GroupMean,
    T = Ants - GrandMean,
    Msquared = M^2,
    Esquared = E^2,
    Tsquared = T^2
  ) %>%
  collect() %>%
  data.frame()
SmallAnts
         Filling Ants GrandMean GroupMean     M      E     T Msquared Esquared Tsquared
1  Ham & Pickles   44      36.9      38.0  1.10   6.00   7.1     1.21    36.00    50.41
2  Ham & Pickles   34      36.9      38.0  1.10  -4.00  -2.9     1.21    16.00     8.41
3  Ham & Pickles   36      36.9      38.0  1.10  -2.00  -0.9     1.21     4.00     0.81
4  Peanut Butter   43      36.9      41.3  4.43   1.67   6.1    19.62     2.79    37.21
5  Peanut Butter   59      36.9      41.3  4.43  17.67  22.1    19.62   312.23   488.41
6  Peanut Butter   22      36.9      41.3  4.43 -19.33 -14.9    19.62   373.65   222.01
7       Vegemite   18      36.9      32.8 -4.15 -14.75 -18.9    17.22   217.56   357.21
8       Vegemite   29      36.9      32.8 -4.15  -3.75  -7.9    17.22    14.06    62.41
9       Vegemite   42      36.9      32.8 -4.15   9.25   5.1    17.22    85.56    26.01
10      Vegemite   42      36.9      32.8 -4.15   9.25   5.1    17.22    85.56    26.01
Adding it all up
Now let's add up all those values of M², E², and T². We will use SS to stand for "sum of squares".
SSM <- sum(~Msquared, data = SmallAnts)
SSM

[1] 131

SSE <- sum(~Esquared, data = SmallAnts)
SSE

[1] 1147

SST <- sum(~Tsquared, data = SmallAnts)
SST

[1] 1279
Notice that SST = SSM + SSE:

SST

[1] 1279

SSM + SSE

[1] 1279

and that SST = ∑ T² = (n − 1)s²:

SSM + SSE

[1] 1279

(10 - 1) * var(~Ants, data = SmallAnts)

[1] 1279
This is how analysis of variance gets its name. We are taking the total variation and splitting it into two
portions: SSM is the portion explained by the model (by the fact that there is variation **between** the
multiple groups), and SSE is the portion unexplained by the model (because there is variation **within**
each group).
abbreviation   component   details
SST            total       total variation (how much do values differ from the global mean?)
SSM            model       between-group variation (how much do the group means differ?)
SSE            error       within-group variation (how much do members of the same group differ?)
Last Modified: November 14, 2014
Math 145 : Fall 2014 : Pruim
Comparing More Than Two Means Using ANOVA
121
8.2.4 Comparing within group variation to between group variation
Before comparing SSM and SSE, we will adjust SSM for the number of groups and SSE for the sample size.
notation   definition                meaning
DFT        n − 1                     total degrees of freedom
DFM        number of groups − 1      model degrees of freedom
DFE        n − number of groups      error degrees of freedom
MST        SST / DFT                 variance
MSM        SSM / DFM                 Mean Squared Model
MSE        SSE / DFE                 Mean Squared Error
Notice that DFM + DFE = DFT .
DFT <- 10 - 1
DFT
[1] 9
DFM <- 3 - 1
DFM
[1] 2
DFE <- 10 - 3
DFE # same as 9 - 2 = DFT - DFM
[1] 7
MSM <- SSM/DFM
MSM
[1] 65.7
MSE <- SSE/DFE
MSE
[1] 164
Now we can finally define Fisher's F statistic:

F = MSM / MSE = (SSM/DFM) / (SSE/DFE)
F <- MSM/MSE
F
[1] 0.401
F will be large when there is a lot of variation between the groups and smaller when there is not so much
(relative to the variation within the groups). So we will reject the null hypothesis when F is large.
The ANOVA report
This information, including the p-value, is traditionally reported in an ANOVA table:
anova(lm(Ants ~ Filling, data = SmallAnts))

Analysis of Variance Table

Response: Ants
          Df Sum Sq Mean Sq F value Pr(>F)
Filling    2    131    65.7     0.4   0.68
Residuals  7   1147   163.9
What we have called model above, R is calling `Filling`, because that is what our grouping variable is. What
we have called error, R is calling `Residuals` (and some people use RSS instead of SSE for the sum of the
squares of the residuals). There is no row for the total in R's output. (Some software includes a row for the
total and some software doesn't.)
The values of SSM, SSE, MSM, MSE, F, and the p-value are all easy to spot in this layout.
8.2.5 Returning to the original data
Remember that we have only been looking at the first 10 rows of the data. Here is the ANOVA table for the
full data set:
anova(lm(Ants ~ Filling, data = SandwichAnts))

Analysis of Variance Table

Response: Ants
          Df Sum Sq Mean Sq F value Pr(>F)
Filling    2   1561     780    5.63  0.011
Residuals 21   2913     139
Here we see that the p-value is small enough to reject the null hypothesis. It looks like the mean number of
ants does vary with sandwich type. But how? Which sandwiches attract more ants? How many more? We'll
turn our attention to these follow-up questions soon.
8.3 Computing the p-value for an F statistic
8.3.1 P-values from the randomization distribution
We can now compute a p-value by comparing our observed value of F (5.63 for the full SandwichAnts data
set) to a randomization distribution. If the null hypothesis is true, the three groups are really just one big
group and the group labels are meaningless, so we can shuffle the group labels to get a randomization
distribution:
Ants.Rand <- do(1000) * anova(lm(Ants ~ shuffle(Filling), data = SandwichAnts))
tally(~(F >= 5.63), data = Ants.Rand)

 TRUE FALSE  <NA> Total 
    8   992     0  1000 

prop(~(F >= 5.63), data = Ants.Rand)

target level: TRUE;  other levels: FALSE

 TRUE 
0.008 
histogram(~F, data = Ants.Rand, v = 5.63)

[Figure: histogram (density scale) of the randomization distribution of F, with a vertical line at the observed value F = 5.63.]
Since our estimated p-value is small, we have enough evidence in the data to reject the null hypothesis.
8.3.2 P-values without simulations
Under certain conditions, the F statistic has a known distribution (called the F distribution). Those conditions
are
1. The null hypothesis is true (i.e., each group has the same mean)
2. Each group is sampled from a normal population
3. Each population group has the same standard deviation
When these conditions are met, we can use the F-distribution to compute the p-value without generating the
randomization distribution.
• F distributions have two parameters – the degrees of freedom for the numerator and for the denominator.
In the small example above, this is 2 for the numerator and 7 for the denominator; for the full sandwich data it is 2 and 21.
• When H0 is true, the numerator and denominator have the same expected value, so F will tend to be close to 1.
• When H0 is false, there is more difference between the groups, so the numerator tends to be larger.
This means we will reject the null hypothesis when F gets large enough.
• The p-value is computed using pf().
1 - pf(5.63, 2, 21)
[1] 0.011
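
Equivalently, we can ask pf() for the upper tail directly and skip the subtraction:

pf(5.63, 2, 21, lower.tail = FALSE)

[1] 0.011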
8.3.3 Getting R to do the work
Of course, R can do all of this work for us. We saw this earlier. Here it is again in a slightly different way:
Ants.model <- lm(Ants ~ Filling, data = SandwichAnts)
anova(Ants.model)

Analysis of Variance Table

Response: Ants
          Df Sum Sq Mean Sq F value Pr(>F)
Filling    2   1561     780    5.63  0.011
Residuals 21   2913     139
lm() stands for "linear model" and can be used to fit a wide variety of models. It knows to do 1-way ANOVA
by looking at the types of variables involved.
The anova() function prints the ANOVA table. Notice how DFM, SSM, MSM, DFE, SSE, and MSE show up in this
table, as well as F and the p-value.
8.3.4 Checking the Model Assumptions
If we use the F-distribution to estimate our p-value without simulations, then we should check that the assumptions above (normality in each population group and equal standard deviations in each population
group) are reasonable.
1. Comparing standard deviations in each group
If each group in the population has the same standard deviation, then the group standard deviations in our
data should be similar. Our rule of thumb will be that the biggest should not be more than twice the smallest.
favstats(Ants ~ Filling, data = SandwichAnts)

         .group min   Q1 median   Q3 max mean    sd n missing
1 Ham & Pickles  34 42.0   51.0 55.2  65 49.2 10.79 8       0
2 Peanut Butter  19 21.8   30.5 44.0  59 34.0 14.63 8       0
3      Vegemite  18 24.0   30.0 39.0  42 30.8  9.25 8       0
According to our rule of thumb, we are fine here: the largest standard deviation (14.63) is less than twice the smallest (9.25).
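
We can also compute the ratio directly from the favstats output (a quick sketch using the sd column shown above):

sds <- favstats(Ants ~ Filling, data = SandwichAnts)$sd
max(sds) / min(sds)   # 14.63 / 9.25 is about 1.58, comfortably below 2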
2. Looking at Residuals
While it would be possible (at least for larger data sets) to look at the distribution of each group in our
sample to see if it looks like it comes from a normal distribution, often there is not much data in each
group, so it is hard to judge. We can improve this situation if we combine the data from all of the groups,
but we need to make an adjustment first.
If we compute residuals using
residual = observed response − group mean
for each value, then the combined distribution should be approximately normal with a mean of 0 when
each group has the same standard deviation.
Let’s compute the first residual directly to see what is going on here.
head(SandwichAnts, 1)

  Butter  Filling Bread Ants Order
1     no Vegemite   Rye   18    10
The first sandwich was a Vegemite sandwich that attracted 18 ants. The mean number of ants for Vegemite sandwiches was 30.75. So our residual is
residual = 18 − 30.75 = −12.75 .
R can calculate these residuals for us to save us some tedious work:
resid(Ants.model)

     1      2      3      4      5      6      7      8      9     10     11     12 
-12.75   9.00  -5.25  -1.75  25.00 -15.25  11.25 -12.00 -13.25  11.25  -9.00  -0.25 
    13     14     15     16     17     18     19     20     21     22     23     24 
  0.25   2.00   4.75  -9.75  13.00  15.75   7.25 -15.00   9.75  -5.75 -13.00   3.75 
With the residuals computed, we can look at a histogram or normal-quantile plot of the residuals to see
if things look roughly normal.
histogram(~resid(Ants.model))
qqmath(~resid(Ants.model))
[Figure: histogram and normal-quantile plot of resid(Ants.model).]
3. Diagnostic Plots
R provides a tool for generating diagnostic plots quickly and easily. Here are two we will often look at:
mplot(Ants.model, which = 1:2)
[Figure: two diagnostic plots for Ants.model: "Residuals vs Fitted" (residuals against fitted values) and "Normal Q-Q" (standardized residuals against theoretical quantiles).]
The first shows how the residuals behave in each group, but organized according to the group means.
We are hoping to see roughly equivalent amounts of spread in each group. This looks good.
The second plot is the normal-quantile plot of the residuals. We would like it to be roughly a straight
line. This can be hard to judge in a small data set, but it looks like things are a bit too closely packed
together: the smallest residuals should be a bit smaller and the largest a bit larger than what we are
seeing.
Proportion of Variation Explained
The summary() function can be used to provide a different summary of the ANOVA model:
summary(Ants.model)

Call:
lm(formula = Ants ~ Filling, data = SandwichAnts)

Residuals:
   Min     1Q Median     3Q    Max 
-15.25 -10.31   0.00   9.19  25.00 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)             49.25       4.16   11.83  9.5e-11
FillingPeanut Butter   -15.25       5.89   -2.59   0.0171
FillingVegemite        -18.50       5.89   -3.14   0.0049

Residual standard error: 11.8 on 21 degrees of freedom
Multiple R-squared: 0.349, Adjusted R-squared: 0.287
F-statistic: 5.63 on 2 and 21 DF, p-value: 0.011
The ratio

R² = SSM / (SSM + SSE) = SSM / SST

measures the proportion of the total variation that is explained by the grouping variable (treatment).
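
As a quick check, here is a sketch computing R² from the sums of squares in the full-data ANOVA table above:

SSM <- 1561
SSE <- 2913
SSM / (SSM + SSE)   # about 0.349, matching Multiple R-squared above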
8.4 Another Example: Jet Lag
Details of this study can be found at http://www.sciencemag.org/content/297/5581/571.full.
Here is all the code needed to analyze the jet lag experiment:

require(abd)
favstats(shift ~ treatment, data = JetLagKnees)

   .group   min    Q1 median    Q3   max   mean    sd n missing
1 control -1.27 -0.65 -0.485  0.24  0.53 -0.309 0.618 8       0
2    eyes -2.83 -1.78 -1.480 -1.10 -0.78 -1.551 0.706 7       0
3    knee -1.61 -0.76 -0.290  0.17  0.73 -0.336 0.791 7       0
xyplot(shift ~ treatment, data = JetLagKnees, type = c("p", "a"))
bwplot(shift ~ treatment, data = JetLagKnees)
[Figure: a scatterplot (with group means connected) and a boxplot of shift by treatment (control, eyes, knee).]
jetlag.model <- lm(shift ~ treatment, data = JetLagKnees)
anova(jetlag.model)

Analysis of Variance Table

Response: shift
          Df Sum Sq Mean Sq F value Pr(>F)
treatment  2   7.22    3.61    7.29 0.0045
Residuals 19   9.42    0.50
summary(jetlag.model)
mplot(jetlag.model, which = 1:2)
[Figure: "Residuals vs Fitted" and "Normal Q-Q" diagnostic plots for jetlag.model.]
The small p-value suggests that the three treatment groups do not have the same mean shift in circadian
rhythm. But the plots of our data suggest that this is because the eyes group is different from the other
two. That is, the knees group looks very similar (on average) to the control group. We will formalize these
observations in the next section.
8.5 Follow-Up Analysis
8.5.1 The Problems with Looking at Confidence Intervals for One Mean At a Time
We can construct a confidence interval for any of the means by just taking a subset of the data and using
t.test(), but there are some problems with this approach. Most importantly,
We were primarily interested in comparing the means across the groups. Often people will display
confidence intervals for each group and look for “overlapping” intervals. But this is not the best
way to look for differences.
Nevertheless, you will sometimes see graphs showing multiple confidence intervals and labeling them to indicate which means appear to be different from which. (See the solution to problem 15.3 for an example.)
When doing this in the context of ANOVA, however, we should adjust our estimate for σ. Instead of using the
standard deviation from just one group, we can combine the data from all the groups (since we are assuming
they all have the same standard deviation) and use √MSE as our estimate for σ:

SE = σ/√n ≈ √MSE / √n
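
For example, here is a sketch of a 95% confidence interval for the control-group mean in the jet lag data, using MSE = 0.50 and DFE = 19 from the ANOVA table above (the control group has n = 8 and mean -0.309):

SE <- sqrt(0.50) / sqrt(8)                    # sqrt(MSE) / sqrt(n)
-0.309 + c(-1, 1) * qt(0.975, df = 19) * SE   # roughly (-0.83, 0.21)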
8.5.2 Pairwise Comparison
We really want to compare groups in pairs, and we have a method for this: the 2-sample t. But we need to make
a couple of adjustments to the two-sample t.
1. As above, we will use a new formula for standard error that makes use of all the data (even from groups
not involved in the pair).
2. We also need to adjust the critical value to take into account the fact that we are (usually) making multiple comparisons.
8.5.3 The Standard Error for Comparing Two Means
SE = √(σᵢ²/nᵢ + σⱼ²/nⱼ) = √(σ²/nᵢ + σ²/nⱼ) = σ √(1/nᵢ + 1/nⱼ) ≈ √MSE √(1/nᵢ + 1/nⱼ) = √(MSE (1/nᵢ + 1/nⱼ))

where nᵢ and nⱼ are the sample sizes for the two groups being compared. Basically, √MSE is taking the place
of s in our usual formula. The degrees of freedom for this estimate is

DFE = total sample size − number of groups.
Ignoring the multiple comparisons issue, we can now compute confidence intervals or hypothesis tests just as
before.
• confidence interval: ȳᵢ − ȳⱼ ± t* SE

• test statistic (for H0 : µᵢ − µⱼ = 0): t = (ȳᵢ − ȳⱼ) / SE
The appropriate degrees of freedom to use is DFE, since that’s the degrees of freedom associated with our
estimate for σ.
Using our jet lag data, we can compute a 95% confidence interval for the difference between the knees group
and the control group as follows.
anova(jetlag.model)

Analysis of Variance Table

Response: shift
          Df Sum Sq Mean Sq F value Pr(>F)
treatment  2   7.22    3.61    7.29 0.0045
Residuals 19   9.42    0.50
favstats(shift ~ treatment, data = JetLagKnees)

   .group   min    Q1 median    Q3   max   mean    sd n missing
1 control -1.27 -0.65 -0.485  0.24  0.53 -0.309 0.618 8       0
2    eyes -2.83 -1.78 -1.480 -1.10 -0.78 -1.551 0.706 7       0
3    knee -1.61 -0.76 -0.290  0.17  0.73 -0.336 0.791 7       0
SE <- sqrt(0.5) * sqrt(1/8 + 1/7)
SE

[1] 0.366

DFE <- 19
t.star <- qt(0.975, df = DFE)
t.star

[1] 2.09

estimate <- (-0.309) - (-0.336)
estimate

[1] 0.027

t.star * SE   # margin of error

[1] 0.766

estimate - t.star * SE   # lower end of CI

[1] -0.739

estimate + t.star * SE   # upper end of CI

[1] 0.793
This would be correct if these were the only two groups we were comparing. But we need to make an adjustment
to deal with all three groups at once. The adjustment will make the interval even wider, so even after
adjusting, 0 will be inside the interval, and we do not have evidence that would allow us to reject the hypothesis
that shining light on the back of the knees makes no difference.
8.5.4 The Multiple Comparisons Problem
Suppose we have 5 groups in our study and we want to make comparisons between each pair of groups. That's
4 + 3 + 2 + 1 = 10 pairs. If we made 10 independent 95% confidence intervals, the probability that all of them
cover the appropriate parameter is 0.599:

0.95^10

[1] 0.599

So we have a family-wise error rate of nearly 40%.
We can correct for this by adjusting our critical value. Let’s take a simple example: just two 95% confidence
intervals. The probability that both cover (assuming independence) is
0.95^2

[1] 0.902
Now suppose we want both intervals to cover 95% instead of 90.2% of the time. We could get this by forming
two 97.5% confidence intervals.
sqrt(0.95)

[1] 0.975

0.975^2

[1] 0.951
This means we need a larger value for t∗ for each interval.
The ANOVA situation is a little bit more complicated because
• There are more than two comparisons.
• The different comparisons are not independent (because each group mean is used in multiple comparisons).
We will briefly describe two ways to make an adjustment for multiple comparisons.
8.5.5 Bonferroni Corrections – An Easy Over-adjustment
Bonferroni's idea is simple: simply divide the desired family-wise error rate by the number of tests or intervals.
This is an over-correction, but it is easy to do, and it is used in many situations where a better method is not
known or a quick estimate is desired.
Here is a table showing a few Bonferroni corrections for looking at all pairwise comparisons.
number
groups
number of
pairs of groups
family-wise
error rate
individual
error rate
confidence level
for determining t∗
3
3
.05
0.017
0.983
4
6
.05
0.008
0.992
5
10
.05
0.005
0.995
Similar adjustments could be made for looking at only a special subset of the pairwise comparisons.
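
For example, with 3 groups (3 pairwise comparisons) and the jet lag degrees of freedom (DFE = 19), the Bonferroni-adjusted critical value can be computed like this (a sketch; compare the unadjusted t.star = 2.09 above):

alpha.each <- 0.05 / 3            # individual error rate
qt(1 - alpha.each / 2, df = 19)   # adjusted t*, about 2.6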
8.5.6 Tukey’s Honest Significant Differences
Tukey’s Honest Significant Differences is a better adjustment method specifically designed for making all
pairwise comparisons in an ANOVA situation. (It takes into account the fact that the tests are not independent.)
R can compute Tukey’s Honest Significant Differences easily.
TukeyHSD(lm(shift ~ treatment, JetLagKnees))

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = x)

$treatment
               diff    lwr    upr p adj
eyes-control -1.243 -2.168 -0.317 0.008
knee-control -0.027 -0.953  0.899 0.997
knee-eyes     1.216  0.260  2.172 0.012
mplot(TukeyHSD(lm(shift ~ treatment, JetLagKnees)))

[Figure: "Tukey's Honest Significant Differences" plot showing the confidence intervals for the eyes-control, knee-control, and knee-eyes differences in means.]
Tukey's method adjusts the confidence intervals, making them a bit wider, to give them the desired family-wise
error rate. Tukey's method also adjusts the p-values (making them larger), so that when the means are all the
same, there is only a 5% chance that a sample will produce any p-values below 0.05.
In this example we see that the eyes group differs significantly from the control group and also from the knee
group, but that the knee and control groups are not significantly different. (We can tell this by seeing which
confidence intervals contain 0 or by checking which adjusted p-values are less than 0.05.)
8.5.7 Other Adjustments
There are similar methods for testing other sets of multiple comparisons. Testing "one against all the others"
goes by the name of Dunnett's method, for example. This is useful when one group represents a control against
which various treatments are being compared.
8.6 Computing F from Summary Statistics
It is possible to compute F from a fairly limited set of summary statistics. Everything we need is in these two
tables:
favstats(shift ~ treatment, data = JetLagKnees)

   .group   min    Q1 median    Q3   max   mean    sd n missing
1 control -1.27 -0.65 -0.485  0.24  0.53 -0.309 0.618 8       0
2    eyes -2.83 -1.78 -1.480 -1.10 -0.78 -1.551 0.706 7       0
3    knee -1.61 -0.76 -0.290  0.17  0.73 -0.336 0.791 7       0
favstats(~shift, data = JetLagKnees)

   min    Q1 median    Q3  max   mean   sd  n missing
 -2.83 -1.33  -0.66 -0.05 0.73 -0.713 0.89 22       0
Recall that each E component is

E = observed response − group mean

so if we add up all the values of E² for group i, we get (nᵢ − 1)sᵢ².
SSE <- (8 - 1) * 0.618^2 + (7 - 1) * 0.706^2 + (7 - 1) * 0.791^2
SSE

[1] 9.42
Similarly, each M component is

M = group mean − grand mean

so the sum of all the values of M² for group i is nᵢ times the squared difference in means:
SSM <- 8 * (-0.309 - (-0.713))^2 + 7 * (-1.551 - (-0.713))^2 + 7 * (-0.336 - (-0.713))^2
SSM

[1] 7.22
These match the values in the ANOVA table (up to round-off):
anova(jetlag.model)

Analysis of Variance Table

Response: shift
          Df Sum Sq Mean Sq F value Pr(>F)
treatment  2   7.22    3.61    7.29 0.0045
Residuals 19   9.42    0.50
Once we have SSE and SSM, the rest is easily computed.
MSM <- SSM/2
MSM
[1] 3.61
MSE <- SSE/19
MSE
[1] 0.496
F <- MSM/MSE
F
[1] 7.28
p.val <- 1 - pf(F, df1 = 2, df2 = 19)
p.val
[1] 0.0045