STAT 651
Lecture #13
Copyright (c) Bani K. Mallick
Topics in Lecture #13


Multiple comparisons, especially Fisher’s Least Significant Difference
Residuals as a means of checking the normality assumption
Book Sections Covered in Lecture #13

Chapter 8.4 (Residuals)
Chapter 9.4 (Fisher’s)
Chapter 9.1 (the idea of multiple comparisons)
Lecture 12 Review: ANOVA

Suppose we form three populations on the basis of body mass index (BMI):

BMI < 22, 22 <= BMI < 28, BMI >= 28

We want to know whether the three populations have the same mean caloric intake, or whether their food composition differs.
Lecture 12 Review: ANOVA



One procedure that is often followed is to do a preliminary test to see whether there are any differences among the populations.
Then, once you conclude that some differences exist, you allow somewhat more informality in deciding where those differences manifest themselves.
The first step is the ANOVA F-test.
Lecture 12 Review: ANOVA

The distance of the data to the overall mean is

$$ \mathrm{TSS} = \sum_{ij} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 $$

TSS = (Corrected) Total Sum of Squares

This has $n_T - 1$ degrees of freedom.
Lecture 12 Review: ANOVA

The sum of squares between groups (Corrected Model) is

$$ \sum_{i} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 $$

It has $t - 1$ degrees of freedom, so the number of populations is the degrees of freedom between groups + 1.
Lecture 12 Review: ANOVA

The distance of the observations to their sample means is

$$ \mathrm{SSE} = \sum_{ij} (Y_{ij} - \bar{Y}_{i\cdot})^2 $$

This is the Sum of Squares for Error

It has $n_T - t$ degrees of freedom.
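As a minimal sketch of the three sums of squares reviewed so far, the Python below computes TSS, the between-group sum of squares, and SSE for a small made-up data set and checks that TSS equals the between-group sum of squares plus SSE. The data values and the variable names (`groups`, `ssb`, and so on) are illustrative assumptions, not part of the lecture.

```python
# Hypothetical data: three groups (populations) with made-up observations.
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0, 10.0],
    [1.0, 2.0, 3.0],
]

n_total = sum(len(g) for g in groups)             # n_T
t = len(groups)                                   # number of populations
grand_mean = sum(sum(g) for g in groups) / n_total
group_means = [sum(g) / len(g) for g in groups]

# TSS: squared distance of every observation to the overall mean.
tss = sum((y - grand_mean) ** 2 for g in groups for y in g)
# Between-group sum of squares (the "Corrected Model" line in SPSS).
ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
# SSE: squared distance of every observation to its own group mean.
sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

# Degrees of freedom: n_T - 1, t - 1, and n_T - t.
df_total, df_between, df_error = n_total - 1, t - 1, n_total - t

print("TSS =", tss, " between =", ssb, " SSE =", sse)
print("df:", df_total, df_between, df_error)
```

The degrees of freedom add up the same way the sums of squares do: (t - 1) + (n_T - t) = n_T - 1.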
Lecture 12 Review: ANOVA



Next comes the F-statistic.
It is the ratio of the mean square for the corrected model to the mean square for error.
Large values indicate rejection of the null hypothesis.

Tests of Between-Subjects Effects
Dependent Variable: Baseline FFQ

Source            Type III Sum of Squares   df    Mean Square   F          Sig.
Corrected Model   960.287 a                 2     480.143       5.689      .004
Intercept         196009.919                1     196009.919    2322.508   .000
BMIGROUP          960.287                   2     480.143       5.689      .004
Error             15275.639                 181   84.396
Total             226223.216                184
Corrected Total   16235.925                 183

a. R Squared = .059 (Adjusted R Squared = .049)
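The F-statistic, p-value, and R Squared in this output can be reproduced from the sums of squares. The sketch below relies on one assumption that happens to hold here: with exactly 2 numerator degrees of freedom (three populations), the upper tail of the F-distribution has the closed form (1 + 2F/df2)^(-df2/2). For any other numerator df this shortcut does not apply and a table or statistics package is needed. The function name `f_pvalue_df1_2` is my own.

```python
def f_pvalue_df1_2(f_stat, df2):
    """Upper-tail probability P(F > f_stat) for an F(2, df2) distribution.

    Closed form valid ONLY when the numerator df is 2 (three groups).
    """
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)

# Values copied from the SPSS table (BMI groups, Baseline FFQ).
ss_model, df_model = 960.287, 2
ss_error, df_error = 15275.639, 181
ss_corrected_total = 16235.925

ms_model = ss_model / df_model      # mean square for the corrected model
mse = ss_error / df_error           # mean square for error
f_stat = ms_model / mse             # ratio of the two mean squares
p_value = f_pvalue_df1_2(f_stat, df_error)
r_squared = ss_model / ss_corrected_total

print("Mean squares:", round(ms_model, 3), round(mse, 3))
print("F =", round(f_stat, 3), " p =", round(p_value, 3))
print("R squared =", round(r_squared, 3))
```

The printed values should agree with the Mean Square, F, Sig., and R Squared entries of the table up to rounding.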
Lecture 12 Review: ANOVA



The F-statistic is compared to the F-distribution with $t - 1$ and $n_T - t$ degrees of freedom.
See Table 8, which lists the cutoff points in terms of $\alpha$. If the F-statistic exceeds the cutoff, you reject the hypothesis of equality of all the means.
SPSS gives you the p-value (significance level) for this test.
Lecture 12 Review: ANOVA



The F-statistic is compared to the F-distribution with $df_1 = t - 1$ and $df_2 = n_T - t$ degrees of freedom.
For example, if you have 3 populations and 6 observations for each population, then there are 18 total observations.
The degrees of freedom are 2 and 15. If you want a Type I error of 5%, look at $df_1 = 2$, $df_2 = 15$, $\alpha = .05$ to get a critical value of 3.68: try this out!
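The table lookup can also be checked numerically. This is a sketch under one assumption: the numerator df is 2, in which case the F tail probability is (1 + 2c/df2)^(-df2/2) and solving for the cutoff gives c = (df2/2)(alpha^(-2/df2) - 1). The function name `f_critical_df1_2` is hypothetical.

```python
def f_critical_df1_2(alpha, df2):
    """Cutoff c with P(F > c) = alpha for an F(2, df2) distribution.

    Inverts the closed-form tail (1 + 2c/df2) ** (-df2/2), which holds
    only when the numerator df is 2.
    """
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)

# Example from the lecture: 3 populations, 6 observations each.
t, n_per_group = 3, 6
n_total = t * n_per_group           # 18 observations in total
df1, df2 = t - 1, n_total - t       # degrees of freedom for the F-test
crit = f_critical_df1_2(0.05, df2)

print("df =", (df1, df2), " critical value =", round(crit, 2))
```

Calling `f_critical_df1_2(0.05, 181)` gives the cutoff for an F-test with 2 and 181 degrees of freedom in the same way.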
Lecture 12 Review: ANOVA


If the populations have a common variance $\sigma^2$, the mean squared error (MSE) estimates it.
You take the square root of the MSE to estimate $\sigma$.


The critical value for an F-test with 2 and 181 df at Type I error 0.05 is about 3.05.
Here F = 5.689 > 3.05, so the p-value is < 0.05.
ANOVA in SPSS




“Analyze”, “General Linear Model”, “Univariate”
“Fixed factor” = the variable defining the populations
Always “Save” unstandardized residuals
“Posthoc”: Move factor to right and click on LSD
Fisher’s Least Significant Difference (LSD)

Suppose that we determine that there are at least some differences among t population means.
Fisher’s Least Significant Difference is one way to tell which ones are different.
The main reason to use it is convenience: all comparisons can be done with the click of a mouse.
It does not guarantee longer or shorter confidence intervals.
Fisher’s Least Significant Difference (LSD)

For example, suppose there are t = 3 populations.
The null hypothesis is $H_0: \mu_1 = \mu_2 = \mu_3$
The alternative is $H_a$: the null hypothesis is false
But this does not tell you which populations are different, only that some are.
Fisher’s Least Significant Difference (LSD)

The null hypothesis is $H_0: \mu_1 = \mu_2 = \mu_3$
The alternative is $H_a$: the null hypothesis is false
There are 4 possibilities:
$\mu_1 = \mu_2 \neq \mu_3$
$\mu_1 = \mu_3 \neq \mu_2$
$\mu_2 = \mu_3 \neq \mu_1$
$\mu_1 \neq \mu_2 \neq \mu_3$ (all three differ)
Fisher’s LSD is a way of getting this directly.
Fisher’s LSD


We have done an ANOVA, and now we want to compare two specific populations.
Fisher’s LSD differs from our usual 2-population comparisons in two features:

The degrees of freedom: $n_T - t$, not $n_1 + n_2 - 2$

The pooled standard deviation: the square root of MSE = SSE/($n_T - t$), not $s_p$
Review: Comparing Two Populations

If you can reasonably believe that the population sd’s are nearly equal, it is customary to pick the equal variance assumption and estimate the common standard deviation by

$$ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} $$
Comparing Two Populations: Usual and Fisher LSD

Usual:

$$ \bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$

Fisher:

$$ \bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_T - t)\, \sqrt{\mathrm{MSE}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$
ROS Data


ROS data has three groups: Fish oil diet, Fish-like oil diet, and Corn Oil
We want to compare their responses to butyrate

Between-Subjects Factors

Diet Group   Value Label     N
1.00         FAEE oil diet   10
2.00         Fish oil diet   10
3.00         Corn oil diet   10
ANOVA

ROS data, log scale. What do you see?
Maybe different variances, but sample sizes are small
No major changes in means?
[Box plot: ROS Response After Butyrate Exposure. Vertical axis: log(Butyrate) - log(Control). Horizontal axis: Diet Group (FAEE oil diet, Fish oil diet, Corn oil diet; N = 10 each).]
ANOVA


ROS data has three groups: Fish oil diet, Fish-like oil diet, and Corn Oil
What was the total sample size? n = 30

Tests of Between-Subjects Effects
Dependent Variable: log(Butyrate) - log(Control)

Source            Type III Sum of Squares   df   Mean Square   F        Sig.
Corrected Model   5.188E-02 a               2    2.594E-02     .203     .818
Intercept         5.957                     1    5.957         46.542   .000
DIETGRP           5.188E-02                 2    2.594E-02     .203     .818
Error             3.456                     27   .128
Total             9.465                     30
Corrected Total   3.508                     29

a. R Squared = .015 (Adjusted R Squared = -.058)
ANOVA


ROS data: any evidence that the population means are different in their change after butyrate exposure? No, the p-value is 0.818!
This matches the box plots
ROS Data





Testing for Normality in ANOVA
I use the General Linear Model to define these residuals
Form the residuals, which are simply the differences between the data and their group sample means
Then do a q-q plot
Useful if you have many groups with a small number of observations per group
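The residual recipe above can be sketched in a few lines of Python. The data here are made up, and the plotting positions (i - 0.5)/n are one common convention that I am assuming for illustration; SPSS may use a different one, so the expected values need not match its chart exactly.

```python
from statistics import NormalDist, mean

# Hypothetical data: a few observations in each of three groups.
data = {
    "A": [1.2, 0.8, 1.0],
    "B": [2.1, 1.9, 2.3],
    "C": [0.4, 0.9, 0.5],
}

# Residual = observation minus the sample mean of its own group.
residuals = []
for values in data.values():
    group_mean = mean(values)
    residuals.extend(y - group_mean for y in values)

# Scale for the reference normal: sqrt(MSE), with n_T - t degrees of freedom.
n = len(residuals)
sd = (sum(r ** 2 for r in residuals) / (n - len(data))) ** 0.5
normal = NormalDist(mu=0.0, sigma=sd)

# Q-Q points: sorted residuals paired with the matching normal quantiles.
qq_points = [
    (observed, normal.inv_cdf((i - 0.5) / n))
    for i, observed in enumerate(sorted(residuals), start=1)
]
for observed, expected in qq_points:
    print(f"observed {observed:7.3f}   expected normal {expected:7.3f}")
```

Pooling residuals this way is what makes the check usable with many small groups: each group alone has too few points for a q-q plot, but the combined residuals do not.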
ANOVA

Here is the Q-Q plot. How’s it look?
[Normal Q-Q plot of residuals, ROS: log scale. Horizontal axis: Observed Value. Vertical axis: Expected Normal Value.]
ROS Data




Testing for Normality in ANOVA:
Illustrate saving residuals: “General Linear Model”, “Univariate”, “Save” (select “Unstandardized” to create the residual variable)
Illustrate q-q plot on residuals
Illustrate editing a chart object to change titles and the like
ROS Data

Fisher’s LSD. Note how all p-values are > 0.10.

Multiple Comparisons
Dependent Variable: log(Butyrate) - log(Control)
LSD

(I) Diet Group   (J) Diet Group   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
FAEE oil diet    Fish oil diet    6.825E-02               .1600        .673   -.2600         .3965
FAEE oil diet    Corn oil diet    9.960E-02               .1600        .539   -.2287         .4279
Fish oil diet    FAEE oil diet    -6.8255E-02             .1600        .673   -.3965         .2600
Fish oil diet    Corn oil diet    3.135E-02               .1600        .846   -.2969         .3596
Corn oil diet    FAEE oil diet    -9.9605E-02             .1600        .539   -.4279         .2287
Corn oil diet    Fish oil diet    -3.1350E-02             .1600        .846   -.3596         .2969

The Sig. column gives the p-values.
Based on observed means.
ROS Data: Compare Fish to Corn oil

Mean for fish – mean for corn = 0.03135

Standard error = 0.1600

CI (95%) = -.2969 to .3596
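These three numbers can be reproduced from the Fisher LSD formula and the ROS ANOVA output (MSE = .128 on 27 degrees of freedom, 10 observations per diet group). This is a hedged sketch: the function name `lsd_interval` is my own, and the cutoff 2.052 for 27 degrees of freedom is taken from a standard t-table rather than from the slides.

```python
import math

def lsd_interval(mean_diff, mse, n1, n2, t_crit):
    """Fisher LSD interval: mean_diff +/- t_crit * sqrt(MSE) * sqrt(1/n1 + 1/n2)."""
    std_error = math.sqrt(mse) * math.sqrt(1.0 / n1 + 1.0 / n2)
    margin = t_crit * std_error
    return std_error, mean_diff - margin, mean_diff + margin

# mean_diff and mse come from the SPSS output for the ROS data.
# t_crit = 2.052 is an ASSUMED table value (two-sided 5%, 27 df).
std_error, lower, upper = lsd_interval(
    mean_diff=0.03135, mse=0.128, n1=10, n2=10, t_crit=2.052
)
print("SE =", round(std_error, 4))
print("95% CI =", round(lower, 4), "to", round(upper, 4))
```

Because every group has 10 observations and all comparisons share the same MSE, the standard error is the same for every pair, which is why the Std. Error column of the output is constant.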
Concho Water Snake Illustration

A numerical example will help illustrate this idea. I’ll consider comparing tail lengths of female Concho Water Snakes with age classes 2, 3, and 4.

Sample sizes: $n_1 = 11, n_2 = 17, n_3 = 9, n_T = 37$.

Sample sd: $s_1 = 17.90, s_2 = 10.95, s_3 = 13.58$.

Sample means: $\bar{X}_1 = 153.82, \bar{X}_2 = 173.24, \bar{X}_3 = 194.67$.
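From these summary statistics alone one can already recover the MSE that Fisher's LSD uses, since SSE is the sum of (n_i - 1) s_i^2 over the groups. The sketch below is illustrative (variable names are my own); because the sds are rounded to two decimals, the result should land close to, not exactly on, the Error mean square of 194.079 that SPSS reports for these data.

```python
import math

# Summary statistics for age classes 2, 3, and 4.
sizes = [11, 17, 9]
sds = [17.90, 10.95, 13.58]

n_total = sum(sizes)                # n_T
t = len(sizes)                      # number of populations

# SSE pools the within-group variation of all t groups.
sse = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
df_error = n_total - t              # n_T - t
mse = sse / df_error
root_mse = math.sqrt(mse)           # pooled standard deviation for Fisher's LSD

print("df for error =", df_error)
print("MSE =", round(mse, 2), " sqrt(MSE) =", round(root_mse, 2))
```

This is the sense in which Fisher's LSD uses a pooled standard deviation: all three groups contribute to it, even when only two of them are being compared.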
Female Concho Water Snakes, Ages 2-4,
Tail Length
Between-Subjects Factors

Age    N
2.00   11
3.00   17
4.00   9
Female Concho Water Snakes, Ages 2-4,
Tail Length
[Box plot of Tail Length by Age (2.00, 3.00, 4.00; N = 11, 17, 9).]
Female Concho Water Snakes, Ages 2-4,
Tail Length: are they different in
population means?
Tests of Between-Subjects Effects
Dependent Variable: Tail Length

Source            Type III Sum of Squares   df   Mean Square    F          Sig.
Corrected Model   8269.413 a                2    4134.706       21.304     .000
Intercept         1043505.649               1    1043505.649    5376.698   .000
AGE               8269.413                  2    4134.706       21.304     .000
Error             6598.695                  34   194.079
Total             1118093.000               37
Corrected Total   14868.108                 36

a. R Squared = .556 (Adjusted R Squared = .530)
Concho Water Snake Example
Multiple Comparisons
Dependent Variable: Tail Length
LSD

(I) Age   (J) Age   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
2.00      3.00      -19.4171 *              5.3907       .001   -30.3724       -8.4618
2.00      4.00      -40.8485 *              6.2616       .000   -53.5736       -28.1233
3.00      2.00      19.4171 *               5.3907       .001   8.4618         30.3724
3.00      4.00      -21.4314 *              5.7429       .001   -33.1023       -9.7604
4.00      2.00      40.8485 *               6.2616       .000   28.1233        53.5736
4.00      3.00      21.4314 *               5.7429       .001   9.7604         33.1023

Based on observed means.
*. The mean difference is significant at the .05 level.
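Unlike a balanced design, the snake data have unequal sample sizes, so each pair of age classes gets its own standard error. A short sketch (dictionary and variable names are my own) recomputes the Std. Error column from the MSE of the ANOVA table and the group sizes:

```python
import math
from itertools import combinations

mse = 194.079                       # Error mean square from the ANOVA table
sizes = {2: 11, 3: 17, 4: 9}        # age class -> sample size
means = {2: 153.82, 3: 173.24, 4: 194.67}

# One entry per unordered pair of age classes.
std_errors = {}
for a, b in combinations(sorted(sizes), 2):
    std_errors[(a, b)] = math.sqrt(mse * (1.0 / sizes[a] + 1.0 / sizes[b]))
    diff = means[a] - means[b]
    print(f"age {a} vs {b}: diff = {diff:7.2f}, SE = {std_errors[(a, b)]:.4f}")
```

The comparison involving the two smallest groups (ages 2 and 4) has the largest standard error, as the formula's sample size factor suggests.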
Concho Water Snake Illustration:
Hand Calculations

Sample size factor for comparing the age groups (here populations 2 and 3, that is, ages 3 and 4):

$$ \sqrt{\frac{1}{n_2} + \frac{1}{n_3}} = 0.41 $$

Sample mean difference: $\bar{X}_3 - \bar{X}_2 = 21.43$
Concho Water Snake Illustration

$n_T - t = 34$ degrees of freedom for error

MSE = 194.08, $\sqrt{\mathrm{MSE}} = 13.93$

$\alpha = 0.05$, $t_{\alpha/2}(n_T - t) = 2.03$

$\bar{X}_3 - \bar{X}_2 = 21.43$

$$ \bar{X}_3 - \bar{X}_2 \pm t_{\alpha/2}(n_T - t)\, \sqrt{\mathrm{MSE}} \sqrt{\frac{1}{n_2} + \frac{1}{n_3}} $$

= 9.76 to 33.10: compare with output
Female Concho Water Snakes, Ages 2-4,
Tail Length
We need a method that allows for non-normal data!
[Normal Q-Q Plot of Residual for TAILL. Horizontal axis: Observed Value. Vertical axis: Expected Normal Value.]