Inferential Statistics II: Introduction to ANOVA
Advanced Experimental Methods & Statistics
PSYC 4310 / COGS 6310
Michael J. Kalsher
Department of Cognitive Science
© 2012, Michael Kalsher
ANOVA: A Framework
• Understand the basic principles of ANOVA
  – Why is it done?
  – What does it tell us?
• Theory of one-way independent ANOVA
• Following up an ANOVA:
  – Planned contrasts/comparisons
    • Choosing contrasts
    • Coding contrasts
  – Post hoc tests
• Writing up results
Why ANOVA?
• t tests are limited to situations in which there are only two levels of a single independent variable, or two associated groups.
• There are many instances in which we'd like to compare more than two levels. But performing multiple t tests inflates the Type I error rate.
• Hence the concept of an "omnibus test."
• Familywise error rate: 1 - (.95)^n, where n is the number of comparisons.
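The familywise error formula can be checked numerically. A minimal sketch in Python (not part of the original slides, which use SPSS), assuming each test is run at alpha = .05:

```python
# Familywise error rate: probability of at least one Type I error
# across n comparisons, each tested at alpha = .05.
def familywise_error(n_comparisons, alpha=0.05):
    return 1 - (1 - alpha) ** n_comparisons

for n in (1, 2, 3, 10):
    print(f"{n} comparisons: familywise error = {familywise_error(n):.3f}")

# With 3 comparisons (e.g., all pairwise t tests among 3 groups),
# the familywise rate is 1 - .95**3 = .142625, i.e., about 14.3%.
```

This reproduces the 14.3% figure quoted on the next slide for a three-condition study.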
What is ANOVA?
ANOVA tests the null hypothesis that three or more means are equal. It produces a test statistic termed the F-ratio:
F = systematic variance (based on SSM) / unsystematic variance (based on SSR)
The F-ratio tells us only that the experimental manipulation has had an effect, not where the effect has occurred. To locate the effect we use:
-- Planned comparisons (specified before you collect the data)
-- Post hoc tests (after-the-fact "snooping")
Both are conducted to control the overall Type I error rate at 5%, guarding against familywise error [1 - (.95)^n]. For a study with 3 conditions, the familywise Type I error rate would otherwise be 14.3%!
ANOVA as Regression
Suppose we are interested in determining whether the drug Viagra is effective at making someone a better lover. A group of men is randomly assigned to one of three groups: (1) Placebo (sugar pills); (2) Low-dose Viagra; or (3) High-dose Viagra. The dependent measure is an "objective" measure of libido. The data are below.
ANOVA as Regression
To predict levels of libido from the different levels of Viagra, we can use the general equation:
Outcome_i = (model) + error_i
Libido_i = b0 + b2·High_i + b1·Low_i + ε_i
Model for the Placebo group (base category)
Libido_i = b0 + b2·High_i + b1·Low_i + ε_i
Libido_i = b0 + b2(0) + b1(0)
Libido_i = b0
X̄_Placebo = b0
The intercept, b0, is always equal to the mean of the base category. We are predicting the level of libido when both doses of Viagra are ignored; thus, the predicted value will be the mean of the placebo group.
Model for the High-dose group
Libido_i = b0 + b2·High_i + b1·Low_i + ε_i
Libido_i = b0 + b2(1) + b1(0)
Libido_i = b0 + b2
X̄_High = b0 + b2 = X̄_Placebo + b2
b2 = X̄_High - X̄_Placebo
b2 represents the difference between the means of the High-dose group and the Placebo (base) group.
Model for the Low-dose group
Libido_i = b0 + b2·High_i + b1·Low_i + ε_i
Libido_i = b0 + b2(0) + b1(1)
Libido_i = b0 + b1
X̄_Low = b0 + b1 = X̄_Placebo + b1
b1 = X̄_Low - X̄_Placebo
b1 represents the difference between the means of the Low-dose group and the Placebo (base) group.
Regression Output
Conclusion: Using group means to predict the outcome is significantly better than using the overall mean.
Note:
b2 (High Dose) = the difference between the means of the high-dose group and the placebo group (5.0 - 2.2 = 2.8).
b1 (Low Dose) = the difference between the means of the low-dose group and the placebo group (3.2 - 2.2 = 1.0).
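The dummy-coding logic can be verified with an ordinary least-squares fit. A sketch in Python rather than SPSS; the libido scores below are assumed from Field's textbook example (the transcript omits the data table), and they reproduce the group means quoted on the slides (2.2, 3.2, 5.0):

```python
import numpy as np

# Libido scores per group (assumed from Field's textbook example;
# they match the group means on the slides: 2.2, 3.2, 5.0).
placebo = [3, 2, 1, 1, 4]
low     = [5, 2, 4, 2, 3]
high    = [7, 4, 5, 3, 6]

y = np.array(placebo + low + high, dtype=float)

# Dummy coding: placebo is the base category (both dummies 0).
low_dummy  = np.array([0] * 5 + [1] * 5 + [0] * 5, dtype=float)
high_dummy = np.array([0] * 5 + [0] * 5 + [1] * 5, dtype=float)
X = np.column_stack([np.ones_like(y), low_dummy, high_dummy])

# Least-squares fit of Libido_i = b0 + b1*Low_i + b2*High_i + e_i
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(b0, b1, b2)  # 2.2, 1.0, 2.8
```

As the slides derive, b0 is the base-category mean and b1, b2 are the mean differences from it.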
Theory of ANOVA: Partitioning Variance
• We compare the amount of variability explained by the model (MSM) to the error in the model, i.e., individual differences (MSR).
  – This ratio is called the F-ratio.
• If the model explains much more variability than it leaves unexplained, then the experimental manipulation has had a significant effect on the outcome (DV).
Partitioning the Variance
Total sum of squares (SST)
Model sum of squares (SSM)
Residual sum of squares (SSR)
Partitioning the Variance
[Figure: individual libido scores plotted by group, with group means and the grand mean marked]
X̄_HighDose = 5.00; X̄_LowDose = 3.20; X̄_Placebo = 2.20 (base category); Grand mean = 3.467
SST = based on differences between each data point and the grand mean.
SSM = based on differences between predicted values (group means) and the grand mean.
SSR = based on differences between each person's score and their group mean.
Total sum of squares (SST)
Compute the difference between each data point and the grand mean, square these differences, then add them together:
SST = Σ(x_i - x̄_grand)²
Equivalently, since s² = SS/(N - 1), we have SS = s²(N - 1), so SST can be obtained from the grand variance:
Grand mean = 3.467; Grand SD = 1.767; Grand variance = 3.124
SST = s²_grand(N - 1) = 3.124 × (15 - 1) = 43.74
Model sum of squares (SSM)
1. Calculate the difference between the mean of each group and the grand mean.
2. Square each of these differences.
3. Multiply each result by the number of participants within that group.
4. Add the values for each group together.
SSM = Σ n_i(x̄_i - x̄_grand)²
SSM = 5(2.2 - 3.467)² + 5(3.2 - 3.467)² + 5(5.0 - 3.467)²
    = 5(-1.267)² + 5(-0.267)² + 5(1.533)²
    = 8.025 + 0.355 + 11.755
    = 20.135
Residual sum of squares (SSR)
SSR = SST - SSM … but don't rely on this relationship (e.g., it won't work out if either SST or SSM is miscalculated).
SSR is the difference between what the model predicts and what was actually observed. Therefore it is calculated by looking at the difference between the score obtained by a person and the mean of the group to which the person belongs.
Residual sum of squares (SSR)
SSR = Σ(x_i - x̄_i)²
Equivalently, using each group's variance (s² = SS/(n - 1), so SS = s²(n - 1)):
SSR = Σ s²_group(n_group - 1)
    = s²_group1(n1 - 1) + s²_group2(n2 - 1) + s²_group3(n3 - 1)
    = 1.70(5 - 1) + 1.70(5 - 1) + 2.50(5 - 1)
    = 1.70(4) + 1.70(4) + 2.50(4)
    = 6.8 + 6.8 + 10
    = 23.60
Mean Squares (MSM and MSR)
SSM = amount of variation explained by the model (experimental manipulation).
SSR = amount of variation due to extraneous factors.
These are summed scores and will therefore be influenced by the number of scores. To eliminate this bias we calculate the average sum of squares (mean squares) by dividing by the appropriate degrees of freedom.
Calculating Degrees of Freedom (for one-way independent-groups ANOVA)
df_total = N - 1 (number of all scores minus 1)
df_M (between) = k - 1 (number of groups minus 1)
df_R (within) = N - k (number of all scores minus number of groups)
The F-ratio
F = MSM/MSR and is a measure of the ratio of variation due to the experimental effect to the variation due to individual differences (unexplained variance).
Therefore, we again return to the basic conceptual model:
Test statistic = systematic variance / unsystematic variance
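The whole partition can be checked numerically. A sketch in Python; the scores are again assumed from Field's textbook example, and they reproduce the slides' SSM = 20.13, SSR = 23.60, and (up to rounding of the grand variance) SST = 43.74:

```python
import numpy as np

# Libido scores (assumed from Field's textbook example).
groups = {
    "placebo": np.array([3, 2, 1, 1, 4], dtype=float),
    "low":     np.array([5, 2, 4, 2, 3], dtype=float),
    "high":    np.array([7, 4, 5, 3, 6], dtype=float),
}

scores = np.concatenate(list(groups.values()))
grand_mean = scores.mean()
N, k = len(scores), len(groups)

# The three sums of squares from the preceding slides.
sst = ((scores - grand_mean) ** 2).sum()
ssm = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
ssr = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

msm = ssm / (k - 1)   # model mean square, df_M = k - 1 = 2
msr = ssr / (N - k)   # residual mean square, df_R = N - k = 12
f_ratio = msm / msr
print(sst, ssm, ssr, f_ratio)
```

SST comes out as 43.733 exactly (the slide's 43.74 reflects the rounded grand variance), SST = SSM + SSR holds, and F ≈ 5.12.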
Assumptions
The one-way independent-groups ANOVA requires the following statistical assumptions:
1. Random and independent sampling.
2. Data are from normally distributed populations.
   Note: The test is robust against violation of this assumption if n > 30 for each group.
3. Variances in these populations are roughly equal (homogeneity of variance).
   Note: The test is robust against violation of this assumption if all group sizes are equal.
1-Way Independent-Groups ANOVA
Characteristics:
• Used when testing 3 or more experimental groups.
• Each participant contributes only one score to the data.
• Used to test the (null) hypothesis that several means are equal.
Example: Testing the idea that exposure to mobile phones will lead to brain tumors.
IV: Hours/day of exposure
DV: Size of brain tumor
[Figure: tumor size (mm cubed) plotted against mobile phone use (0-5 hours per day)]
Steps in the Analysis
-- Calculate the Model (or between-groups) sum of squares (SSM) and the Residual (or within-groups) sum of squares (SSR).
-- The numbers of scores used to calculate SSM and SSR are different, so to make them comparable we convert them to average sums of squares, or "mean squares" (MS), by dividing by their appropriate degrees of freedom.
-- F = MSM/MSR and is a measure of the ratio of variation due to the experimental effect to the variation due to individual differences (unexplained variance).
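In practice these steps are a one-liner in most statistics libraries. A sketch using SciPy instead of the slides' SPSS workflow, with the libido data again assumed from Field's textbook example:

```python
from scipy import stats

# Assumed data from Field's textbook Viagra example.
placebo = [3, 2, 1, 1, 4]
low     = [5, 2, 4, 2, 3]
high    = [7, 4, 5, 3, 6]

# f_oneway carries out exactly the steps above: it forms
# MSM = SSM/(k-1) and MSR = SSR/(N-k), and returns F = MSM/MSR.
f_stat, p_value = stats.f_oneway(placebo, low, high)
print(f"F(2, 12) = {f_stat:.2f}, p = {p_value:.3f}")
```

The result matches the hand calculation: F(2, 12) ≈ 5.12, significant at the 5% level.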
Degrees of Freedom (df): One-way Independent-Groups ANOVA
Formulas for the number of degrees of freedom in one-way independent-groups ANOVA differ for each SS term. The degrees of freedom (df) reflect the number of values free to vary in any sum-of-squares term.
Calculating Degrees of Freedom
df_total = N - 1 (number of all scores minus 1)
df_between = k - 1 (number of groups minus 1)
df_within = N - k (number of all scores minus number of groups)
SPSS Summary Tables

Descriptives: Size of Tumor
Hrs. of                  Std.       Std.     95% CI    95% CI
Exposure  N    Mean      Deviation  Error    Lower     Upper     Min.  Max.
0         20   .0175     .01213     .00271   .0119     .0232     .00   .04
1         20   .5149     .28419     .06355   .3819     .6479     .00   .94
2         20   1.2614    .49218     .11005   1.0310    1.4917    .48   2.34
3         20   3.0216    .76556     .17118   2.6633    3.3799    1.77  4.31
4         20   4.8878    .69625     .15569   4.5619    5.2137    3.04  6.05
5         20   4.7306    .78163     .17478   4.3648    5.0964    2.70  6.14
Total     120  2.4056    2.02662    .18500   2.0393    2.7720    .00   6.14

ANOVA
                 Sum of Squares  df   Mean Square  F        Sig.
Between Groups   450.664         5    90.133       269.733  .000
Within Groups    38.094          114  .334
Total            488.758         119
Levene’s Test:
Testing the Homogeneity of Variance Assumption
If Levene’s test is significant (p < .05), then the variances are significantly different.

Levene Statistic  df1  df2  Sig.
10.245            5    114  .000

What to do if Levene’s test is significant?
1. Transform the data.
2. Report the F value and the Levene’s test results and allow your audience to assess the accuracy of your results for themselves.
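Levene’s test is also available in SciPy. A hedged sketch; the groups below are the assumed Field libido data, not the tumor dataset from the slide (which the transcript does not reproduce):

```python
from scipy import stats

# Assumed example data (Field's Viagra groups), used here only
# to illustrate the call; substitute your own groups.
placebo = [3, 2, 1, 1, 4]
low     = [5, 2, 4, 2, 3]
high    = [7, 4, 5, 3, 6]

# SciPy's default center='median' is the Brown-Forsythe variant;
# center='mean' gives the classic Levene test that SPSS reports.
stat, p = stats.levene(placebo, low, high, center='mean')
print(f"Levene statistic = {stat:.3f}, p = {p:.3f}")
if p < .05:
    print("Variances differ significantly: consider transforming the")
    print("data, or report Welch / Brown-Forsythe F alongside the test.")
```

Note the center argument: matching SPSS output requires mean centering, while SciPy's default is the more robust median-centered version.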
Computing Effect Size
(η² - eta squared and ω² - omega squared)
Analogous to r², eta squared reflects the proportion of DV variance explained by each IV in the sample data. It tends to be upwardly biased (i.e., an overestimate).
η² = SS_Model / SS_Total
• One η² for each effect (i.e., 1 per IV plus 1 per interaction), and the η²s sum to 1.
• SPSS reports partial eta squared (η²_p = SS_Effect / (SS_Effect + SS_Error)).
• In 1-way ANOVA, eta squared and partial eta squared are the same.
An alternative, omega squared (ω²), estimates the proportion of DV variance explained by the IVs in the population (see Field, 2005, pp. 357-359).
Computing Effect Size
(η² vs. ω²)
η² = SSM / SST = 450.66 / 488.76 = .922  (eta squared)
ω² = [SSM - (dfM)(MSR)] / (SST + MSR)
   = [450.664 - (5 × .334)] / (488.758 + .334)
   = 449.01 / 489.09
   = .918  (omega squared)
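Both effect sizes follow directly from the ANOVA summary table. This sketch plugs in the mobile-phone study's values from the table shown earlier:

```python
# Values taken from the ANOVA summary table for the mobile-phone study.
ss_model, ss_total = 450.664, 488.758
df_model, ms_residual = 5, 0.334

# Sample estimate (biased upward) vs. population estimate.
eta_sq = ss_model / ss_total
omega_sq = (ss_model - df_model * ms_residual) / (ss_total + ms_residual)

print(f"eta squared   = {eta_sq:.3f}")    # .922
print(f"omega squared = {omega_sq:.3f}")  # .918
```

As the slide notes, ω² is slightly smaller than η² because it corrects for the upward bias of the sample estimate.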
Reporting the Results
Levene’s test indicated that the assumption of homogeneity of variance had been violated, F(5, 114) = 10.25, p < .001. Transforming the data did not rectify this problem, and so F tests are reported nonetheless. The results show that using a mobile phone significantly affected the size of brain tumor found in participants, F(5, 114) = 269.73, p < .001, ω² = .92. Tumor size (measured in mm) increased with the duration of exposure. Specifically, tumor size for 0, 1, 2, 3, 4, or 5 hours/day of exposure was .02 mm, .51 mm, 1.26 mm, 3.02 mm, 4.89 mm, and 4.73 mm, respectively. The effect size indicated that the effect of phone use on tumor size was substantial. Games-Howell post hoc tests revealed significant differences between all groups (p < .001 for all tests) except between 4 and 5 hours (ns).
Computing a One-way Independent-groups ANOVA by Hand
Critical Values for F
Computing Effect Size
(η² vs. ω²)
η² = SSM / SST = 93.17 / 134.67 = .69
ω² = [SSM - (dfM × MSR)] / (MSR + SST)
   = [93.17 - (2 × 4.61)] / (4.61 + 134.67)
   = 83.95 / 139.28
   = .60
Computing the One-way Independent-groups ANOVA Using SPSS
SPSS Variable View: Assign Value Labels
SPSS Data View: Type in the Data
SPSS Data Editor: Select Procedure
Specify the IV and DV, then select “Options.”
Select “Descriptives” and “Homogeneity of variance test.”
If the variances are unequal, choose the “Brown-Forsythe” or “Welch” tests.
Choosing Follow-up Tests: Contrasts vs. Post Hoc
Select “Contrasts” if you have a priori hypotheses; select “Post Hoc” if you do not.
Planned Contrasts (if we have specific hypotheses)
The first contrast compares the combined experimental drug groups against the control (saltwater) group.
The second contrast compares the two experimental groups, fenfluramine vs. amphetamine.
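A quick sanity check for a contrast set like this is that each set of weights sums to zero and the two contrasts are orthogonal (zero dot product between weight vectors). A sketch with assumed weights that implement the two comparisons described above:

```python
# Contrast weights over (saltwater, fenfluramine, amphetamine).
# These particular codes are an assumption; they implement the two
# comparisons described on the slide.
contrast1 = [-2, 1, 1]   # combined drug groups vs. saltwater control
contrast2 = [0, -1, 1]   # fenfluramine vs. amphetamine

# Each valid contrast's weights sum to zero.
assert sum(contrast1) == 0 and sum(contrast2) == 0

# Orthogonality: the weight vectors have a zero dot product.
dot = sum(a * b for a, b in zip(contrast1, contrast2))
print("orthogonal:", dot == 0)  # prints: orthogonal: True
```

Orthogonal contrasts partition SSM into independent pieces, which is what keeps the planned tests non-redundant.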
Post Hoc Tests (if we do NOT have specific hypotheses)
LSD (Least Significant Difference) does not control the Type I error rate. Bonferroni tightly controls the Type I error rate. Other post hoc procedures have specific applications (see the Field text).
SPSS ANOVA Output:
Descriptives and ANOVA Summary Table
SPSS ANOVA Output: Planned Contrasts
Levene’s test was not significant (p > .05), so we can assume equal variances. In the Contrast Tests summary, use the “Assume equal variances” row values.
Contrast 1: Weight gain among rats receiving the experimental drugs is significantly different from that of rats receiving saltwater.
Contrast 2: Weight gains among rats taking fenfluramine and amphetamine are not significantly different.
Post Hoc Tests
After-the-Fact “Snooping”: Some Considerations
LSD does not control the Type I error rate, whereas the Bonferroni test tightly controls it.
Given equal sample sizes and equal variances, use REGWQ or Tukey HSD, as both have good power and tight control over the Type I error rate.
If sample sizes across groups are slightly different, use Gabriel’s procedure because it has greater power; but if sample sizes are very different, use Hochberg’s GT2.
If there is any doubt that the population variances are equal, use the Games-Howell procedure in place of, or in addition to, other tests.
Post Hoc Tests: LSD vs. Bonferroni
Reporting the Results
The results show that the amount of weight gained by the rats was significantly affected by the type of drug they received, F(2, 9) = 10.10, p < .01, ω² = .60. Specifically, weight gain was greatest for rats receiving saltwater (M = 10.0, SE = .82) and least for rats receiving fenfluramine (M = 3.25, SE = 1.11). Weight gain for rats receiving amphetamine was intermediate (M = 5.75, SE = 1.25).
If you used planned comparisons:
Planned comparisons revealed a significant difference in weight gain between the combined experimental groups and the saltwater control group, t(9) = -4.18, p < .05 (1-tailed). The difference in weight gain between the two experimental groups receiving fenfluramine and amphetamine, respectively, was not significant, t(9) = -1.65, p > .05 (1-tailed).
If you used post hoc tests:
Bonferroni post hoc tests revealed a significant difference in weight gain between the fenfluramine and saltwater groups (p < .05). No other comparisons were significant (ps > .05).
Sample Problem: Your turn…