One-Way Analysis of Variance
(ANOVA)
Aside
I dislike the ANOVA chapter (11) in the book with a passion – if it were all written like this I wouldn't use the thing
Therefore, don't worry if you don't understand it – use my notes. Exam questions on ANOVA will come from my lectures and notes
One-Way ANOVA
One-Way Analysis of Variance
aka One-Way ANOVA
Most widely used statistical technique in all of
statistics
One-Way refers to the fact that only one IV and
one DV are being analyzed (like the t-test)
i.e. an independent-samples t-test with treatment and control groups, where the treatment (present in the tx grp and absent in the control grp) is the IV
One-Way ANOVA
 Unlike the t-test, the ANOVA can look at levels or subgroups of IVs
The t-test can only test if an IV is there or not, not
differences between subgroups of the IV
I.e. our experiment is to test the effect of hair color (our
IV) on intelligence
 One t-test can only test if brunettes are smarter than
blondes, any other comparison would involve doing
another t-test
 A one-way ANOVA can test many subgroups or levels of our IV "hair color" – for instance, blondes, brunettes, and redheads are all subtypes of hair color, and so can be tested with one one-way ANOVA
One-Way ANOVA
Other examples of subgroups:
If "race" is your IV, then Caucasian, African-American, Asian-American, and Hispanic (4) are all subgroups/levels
If "gender" is your IV, then male and female (2) are your levels
If “treatment” is your IV, then some treatment, a little
treatment, and a lot of treatment (3) can be your
levels
One-Way ANOVA
 OK, so why not just do a lot of t-tests and keep
things simple?
1. Many t-tests will inflate our Type I Error rate!
 This is an example of using many statistical tests to
evaluate one hypothesis – see: the Bonferroni
Correction
2. One ANOVA is less time consuming than many t-tests
 There is a simple way to do the same thing within ANOVA – post-hoc tests – and we will go over them later on
 However, with only one DV and one IV (with only two
levels), the ANOVA and t-test are mathematically
identical, since they are essentially derived from the
same source
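To see this equivalence for yourself, here is a minimal Python sketch (using scipy, with made-up data for two groups) showing that a one-way ANOVA on two groups gives exactly the squared t from an independent-samples t-test:

    from scipy import stats

    # Made-up scores for a two-group (treatment vs. control) experiment
    treatment = [5, 7, 8, 6, 9]
    control = [3, 4, 6, 2, 5]

    t, p_t = stats.ttest_ind(treatment, control)  # independent-samples t-test
    F, p_F = stats.f_oneway(treatment, control)   # one-way ANOVA

    print(t**2, F)   # identical: F = t²
    print(p_t, p_F)  # identical p-values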
One-Way ANOVA
 Therefore, the ANOVA and the t-test have similar
assumptions:
Assumption of Normality
 Like the t-test you can play fast and loose with this one,
especially with large enough sample size – see: the
Central Limit Theorem
Assumption of Homogeneity of Variance
(Homoscedasticity)
 Like the t-test, this isn't problematic unless one level's
variance is much larger than the others' (~4 times as
large) – the one-way ANOVA is robust to small violations
of this assumption, so long as group size is roughly equal
One-Way ANOVA
Independence of Observations
Like the t-test, the ANOVA is very sensitive to
violations of this assumption – if violated it is more
appropriate to use a Repeated-Measures ANOVA
One-Way ANOVA
Hypothesis testing in ANOVA:
Since ANOVA tests for differences between
means for multiple groups or levels of our IV,
then H1 is that there is a difference somewhere
between these group means
H1 = μ1 ≠ μ2 ≠ μ3 ≠ μ4, etc…
Ho = μ1 = μ2 = μ3 = μ4, etc…
These hypotheses are called omnibus hypotheses, and tests of these hypotheses are called omnibus tests
One-Way ANOVA
 However, our F-statistic does not tell us where this
difference lies
 If we have 4 groups, group 1 could differ from groups 2-4,
groups 2 and 4 could differ from groups 1 and 3, group 1 and
2 could differ from 3, but not 4, etc.
 Since our hypothesis should be as precise as possible
(presuming you’re researching something that isn’t
completely new), you will want to determine the precise
nature of these differences
 You can do this using multiple comparison techniques (more
on this later)
One-Way ANOVA
The basic logic behind the ANOVA:
The ANOVA yields an F-statistic (just like the t-test gave us a t-statistic)
The basic form of the F-statistic is:
MStreatment/MSerror
MS = mean square or the mean of squares
(why it is called this will be more obvious later)
One-Way ANOVA
The basic logic behind the ANOVA:
MSbetween or MStreatment = average variability
(variance) between the levels of our IV/groups
Ideally we want to maximize MStreatment, because we're predicting that our IV will differentially affect our groups
i.e. if our IV is treatment, and the levels are no treatment vs. a lot of treatment, we would expect the treatment group mean to be very different from the no treatment mean – this results in lots of variability between these groups
One-Way ANOVA
The basic logic behind the ANOVA:
MSwithin or MSerror = average variance among subjects
in the same group
• Ideally we want to minimize MSerror, because ideally our
IV (treatment) influences everyone equally – everyone
improves, and does so at the same rate (i.e. variability is
low)
If F = MStreatment/ MSerror, then making MStreatment
large and MSerror small will result in a large value
of F
Like t, a large value corresponds to small p-values, which makes it more likely that we reject Ho
One-Way ANOVA
However, before we calculate MS, we
need to calculate what are called sums of
squares, or SS
SS = the sum of squared deviations around the
mean
Does this sound familiar? What does this sound
like?
Just like MS, we have SSerror and SStreatment
Unlike MS, we also have SStotal = SSerror +
SStreatment
One-Way ANOVA
 SStotal = Σ(Xij − X̄..)² = ΣX² − (ΣX)²/N
 It’s the formula for our old friend variance,
minus the n-1 denominator!
 Note: N = the number of subjects in all of the
groups added together
One-Way ANOVA
 SStreatment = Σ nj(X̄j − X̄..)²
This means we:
1.Subtract the grand mean, or the mean of all of the
individual data points, from each group mean
2.Square these numbers
3.Multiply them by the number of subjects from that
particular group
4.Sum them
 Note: n = number of subjects per group
 Hint: The number of numbers that you sum should equal the
number of groups
One-Way ANOVA
 That leaves us with SSerror = SStotal – SStreatment
Remember: SStotal = SSerror + SStreatment
 Degrees of freedom:
Just as we have SStotal, SSerror, and SStreatment, we also have dftotal, dferror, and dftreatment
 dftotal = N – 1 OR the total number of subjects in all groups
minus 1
 dftreatment = k – 1 OR the number of levels of our IV (aka
groups) minus 1
 dferror = N – k OR the total number of subjects minus the
number of groups OR dftotal - dftreatment
One-Way ANOVA
Now that we have our SS and df, we can
calculate MS
MStreatment = SStreatment/dftreatment
MSerror = SSerror/dferror
Remember:
MSbetween or MStreatment = average variability
(variance) between the levels of our IV/groups
MSwithin or MSerror = average variance among
subjects in the same group
One-Way ANOVA
We then use this to calculate our F-statistic:
F = MStreatment/MSerror
The p-value associated with this F-statistic is a function of both F and your df
Higher F and/or df → Lower p
Recall: df is determined by N & the # of levels of your IV
More S's and/or fewer levels → Higher df → Lower p
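Putting the last few slides together, here is a worked sketch in Python (hypothetical data, 3 levels of the IV) that computes SS, df, MS, F, and p exactly as defined above, then checks the result against scipy's one-way ANOVA:

    from scipy import stats

    # Hypothetical data: 3 levels of our IV, n = 4 subjects per group
    groups = [[3, 5, 4, 6], [7, 9, 8, 8], [2, 3, 4, 3]]
    all_scores = [x for g in groups for x in g]

    N = len(all_scores)                # total subjects in all groups
    k = len(groups)                    # number of levels/groups
    grand_mean = sum(all_scores) / N

    # SStotal = Σ(Xij − X̄..)²
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    # SStreatment = Σ nj(X̄j − X̄..)²
    ss_treat = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_error = ss_total - ss_treat     # SSerror = SStotal − SStreatment

    ms_treat = ss_treat / (k - 1)      # MStreatment = SStreatment/dftreatment
    ms_error = ss_error / (N - k)      # MSerror = SSerror/dferror
    F = ms_treat / ms_error

    p = stats.f.sf(F, k - 1, N - k)    # p-value from the F distribution
    print(F, p)
    print(stats.f_oneway(*groups))     # scipy agrees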
One-Way ANOVA
 How can we change our experiment to increase
the likelihood of a significant result/decrease p?
Larger ES → Higher F:
 Increase the potency of the IV
Higher df
 More S’s
 Fewer # levels of your IV
• Collapse across groups – instead of looking at Kids vs. Young Adults vs. Adults, look at Children vs. Adults only
• Worst way to decrease p, as this sacrifices how subtly you can test your theory
One-Way ANOVA
Example:
What effect does smoking have on
performance? Spilich, June, and Renner (1992)
asked nonsmokers (NS), smokers who had
delayed smoking for three hours (DS), and
smokers who were actively smoking (AS) to
perform a pattern recognition task in which they
had to locate a target on a screen. The data
follow:
One-Way ANOVA
 Example:
1. What is the IV, number
of levels, and the DV?
2. What is H1 and Ho?
3. What is your dftotal,
dfgroups, and dferror?
Non-Smokers (NS):      9   8  12  10   7  10   9  11   8  10   8  10   8  11  10
Delayed Smokers (DS):  12   7  14   4   8  11  16  17   5   6   9   6   6   7  16
Active Smokers (AS):    8   8   9   1   9   7  16  19   1   1  22  12  18   8  10
One-Way ANOVA
Descriptives: PERFORMA

Group   N    Mean     Std. Deviation   Std. Error   95% CI for Mean      Minimum   Maximum
ns      15   9.4000   1.40408          .36253       8.6224 to 10.1776    7.00      12.00
ds      15   9.6000   4.40454          1.13725      7.1608 to 12.0392    4.00      17.00
as      15   9.9333   6.51884          1.68316      6.3233 to 13.5433    1.00      22.00
Total   45   9.6444   4.51339          .67282       8.2885 to 11.0004    1.00      22.00
ANOVA: PERFORMA

Source           Sum of Squares   df   Mean Square   F      Sig.
Between Groups   2.178            2    1.089         .051   .950
Within Groups    894.133          42   21.289
Total            896.311          44
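As a check, a short Python sketch (scipy again) reproduces this SPSS output from the raw data in the table above:

    from scipy import stats

    ns = [9, 8, 12, 10, 7, 10, 9, 11, 8, 10, 8, 10, 8, 11, 10]
    ds = [12, 7, 14, 4, 8, 11, 16, 17, 5, 6, 9, 6, 6, 7, 16]
    as_ = [8, 8, 9, 1, 9, 7, 16, 19, 1, 1, 22, 12, 18, 8, 10]  # "as" is a Python keyword

    F, p = stats.f_oneway(ns, ds, as_)
    print(round(F, 3), round(p, 3))  # F ≈ .051, p ≈ .950, matching the table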
One-Way ANOVA
 Example:
4. Based on these results, would you reject or
fail to reject Ho?
5. What conclusion(s) would you reach about the
effect of the IV on the DV?
One-Way ANOVA
Assumptions of ANOVA:
Independence of Observations
Homoscedasticity
Normality
Equal sample sizes – not technically an assumption, but affects the other 3
How do we know if we violate one (or
more) of these? What do we do?
One-Way ANOVA
Independence of Observations
Identified methodologically
Other than using repeated-measures tests
(covered later), nothing you can do
Equal Sample Sizes
Add more S’s to the smaller group
DON’T delete S’s from the larger one
One-Way ANOVA
Homoscedasticity
Identified using Levene’s Test or the Welch
Procedure
Again, don’t sweat the book, SPSS will do it for
you
If detected (and group sizes very unequal), use
appropriate transformation
One-Way ANOVA
 Homoscedasticity
Descriptives (Std. Deviation of the DV by trial and group; n = 6 per group, N = 12 per trial)

          Group 1   Group 2   Total
Trial 1   2.714     1.329     2.067
Trial 2   2.098     2.828     2.431
Trial 3   2.714     2.338     2.417
Trial 4   1.835     3.445     2.864

Test of Homogeneity of Variances

          Levene Statistic   df1   df2   Sig.
Trial 1   3.312              1     10    .099
Trial 2   .156               1     10    .701
Trial 3   .266               1     10    .617
Trial 4   7.788              1     10    .019

Robust Tests of Equality of Means

          Welch Statistic(a)   df1   df2     Sig.
Trial 1   .292                 1     7.268   .605
Trial 2   .484                 1     9.223   .504
Trial 3   .013                 1     9.786   .912
Trial 4   1.849                1     7.626   .213
a. Asymptotically F distributed.
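If you want Levene's Test outside of SPSS, scipy provides one; here is a minimal sketch with hypothetical data for two groups of n = 6 (SPSS's Levene test is based on deviations from group means, which corresponds to center='mean'):

    from scipy import stats

    # Hypothetical scores for one trial, two groups of n = 6
    group1 = [12, 14, 15, 17, 18, 19]
    group2 = [13, 13, 14, 14, 15, 15]

    # Levene's Test: Ho is that the group variances are equal
    W, p = stats.levene(group1, group2, center='mean')
    print(W, p)  # p < .05 suggests heteroscedasticity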
One-Way ANOVA
Normality
Can identify with histograms of DVs (IVs are supposed to be non-normal)
More appropriate to use skewness and kurtosis
statistics
If detected (and sample size very small), use
appropriate transformation
One-Way ANOVA
 Normality
1. Divide the statistic by its standard error to get a z-score
2. Calculate the p-value from this z-score
Descriptive Statistics

          N    Minimum   Maximum   Mean    Std. Deviation   Skewness (Std. Error)   Kurtosis (Std. Error)
Trial 1   12   12        19        16.50   2.067            -.815 (.637)            .651 (1.232)
Trial 2   12   8         16        11.50   2.431            .205 (.637)             -.406 (1.232)
Trial 3   12   4         12        7.75    2.417            .165 (.637)             -.864 (1.232)
Trial 4   12   1         9         4.25    2.864            .534 (.637)             -1.192 (1.232)
Valid N (listwise) = 12
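Here is a sketch of the two-step z-score check in Python (hypothetical scores; scipy's skew with bias=False matches the adjusted skewness statistic SPSS reports, and the standard-error formula below is the one that produces the .637 in the table above for n = 12):

    import math
    from scipy import stats

    # Hypothetical DV scores for one trial (n = 12)
    x = [12, 14, 15, 16, 17, 17, 18, 18, 18, 19, 19, 15]
    n = len(x)

    skew = stats.skew(x, bias=False)   # adjusted skewness, as SPSS reports it
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

    z = skew / se_skew                 # step 1: statistic / its standard error
    p = 2 * stats.norm.sf(abs(z))      # step 2: two-tailed p from the z-score
    print(z, p)                        # |z| > ~2 suggests non-normality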
One-Way ANOVA
 Estimates of Effect Size in ANOVA:
1. η2 (eta squared) = SSgroup/SStotal
 Unfortunately, this is what most statistical computer
packages give you, because it is simple to
calculate, but seriously overestimates the size of
effect
2. ω² (omega squared) = [SSgroups − (k − 1)MSerror] / (SStotal + MSerror)
 Less biased than η2, but still not ideal
One-Way ANOVA
 Estimates of Effect Size in ANOVA:
3. Cohen's d = (X̄1 − X̄2)/sp = 2√F/√dferror = 2t/√(n1 + n2 − 2)
 Remember: for d, .2 = small effect, .5 = medium,
and .8 = large
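A sketch computing these effect sizes in Python, using the smoking example's ANOVA table for η² and ω² (note ω² comes out negative when F < 1, as here, and is then usually reported as ~0) and a hypothetical two-group t for d:

    import math

    # SS and MS values from the smoking example's ANOVA table
    ss_group, ss_total, ms_error, k = 2.178, 896.311, 21.289, 3

    eta_sq = ss_group / ss_total                                        # η²
    omega_sq = (ss_group - (k - 1) * ms_error) / (ss_total + ms_error)  # ω²
    print(eta_sq, omega_sq)

    # Cohen's d for a two-group comparison (hypothetical t, n1 = n2 = 15)
    t, n1, n2 = 2.5, 15, 15
    d = 2 * t / math.sqrt(n1 + n2 - 2)
    print(d)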
One-Way ANOVA
Multiple Comparison Techniques:
Remember: ANOVA tests for differences
somewhere between group means, but doesn’t
say where
H1 = μ1 ≠ μ2 ≠ μ3 ≠ μ4, etc…
If significant, group 1 could be different from groups
2-4, groups 2 & 3 could be different from groups 1
and 4, etc.
Multiple comparison techniques attempt to detect
specifically where these differences lie
Multiple Comparison Techniques
 You could always run 2-sample t-tests on all
possible 2-group combinations of your groups,
although with our 4 group example this is 6
different tests
 Running 6 tests @ α = .05 → familywise α = .05 × 6 = .30
 This would inflate what is called the familywise error rate
– in our previous example, all of the 6 tests that we run
are considered a family of tests, and the familywise error
rate is the α for all 6 tests combined (.3) – however, we
want to keep this at .05
Multiple Comparison Techniques
 To perform multiple comparisons with a
significant omnibus F, or not to …
 Why would you look for a difference between
two or more groups when your F said there
isn’t one?
Multiple Comparison Techniques
Some say this is what is called statistical fishing and
is very bad – you should not be conducting statistical
tests willy-nilly without just cause or a theoretical
reason for doing so
Think of someone fishing in a lake: you don't know if anything is there, but you'll keep trying until you find something – the idea is that if your hypothesis is true, you shouldn't have to look too hard to find it, because if you look for anything hard enough you tend to find it
Multiple Comparison Techniques
However, others would say that the omnibus
test is underpowered, particularly with a large
number of groups and if only a few significant
differences are predicted among them
i.e. H1 = μ1 ≠ μ2 ≠ μ3 ≠ μ4 … μ10
If you only predict groups 1 and 5 will differ, you are
unlikely to get a significant omnibus F unless you
have a ton of S’s
I and most statisticians fall on this side of the
argument – i.e. it’s OK to ignore the omnibus test if
you have a theoretical reason to predict specific
differences among groups
A Priori Techniques
 A priori techniques:
 Planned prior to data collection
 Involve specific hypotheses between group means (i.e.
not just testing all group differences)
Multiple t Tests
 As stated previously, not a good idea b/c of inflated α
 Involves a different formula than a traditional t test
Bonferroni t (Dunn’s Test)
 Simply using the Bonferroni Correction on a bunch of
regular t tests
 For 6 comparisons: α = .05/6 = .0083
 Can combine the Bonferroni correction with the adjusted
formula for t used above
A Priori Techniques
Dunn-Šidák Test
Almost identical to the Bonferroni t, but uses 1 − (1 − α)^(1/c) instead of α/c, where c = # of comparisons
Slightly less conservative
For 6 comparisons: α = 1 − (1 − .05)^(1/6) = .0085
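A quick sketch verifying both adjusted α's for c = 6 comparisons:

    c, alpha = 6, .05

    bonferroni = alpha / c               # Bonferroni t (Dunn's Test)
    sidak = 1 - (1 - alpha) ** (1 / c)   # Dunn-Šidák Test

    print(round(bonferroni, 4))  # .0083
    print(round(sidak, 4))       # .0085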
A Priori Techniques
 Holm Test
 Instead of assuming we evaluate hypotheses all at once, allows for ordering of when hypotheses are evaluated
 Uses most stringent α for hypothesis w/ strongest support (largest t)
 Relaxes α for each subsequent comparison, taking into account that the previous comparison was significant
 More powerful than Bonferroni t & Dunn-Šidák Test
1. Calculate t tests for all comparisons
2. Arrange t's in decreasing order
3. For 1st t test, use Dunn-Šidák method with normal c
4. For 2nd t test, use c − 1
5. For 3rd t test, use c − 2, etc.
6. Continue until a nonsignificant result is obtained
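A minimal sketch of this step-down procedure (hypothetical two-tailed p-values from 6 pairwise t tests; the smallest p corresponds to the largest t):

    # Hypothetical p-values from 6 pairwise t tests
    p_values = [.001, .004, .012, .020, .090, .430]

    alpha, c = .05, len(p_values)
    for step, p in enumerate(sorted(p_values)):     # strongest support first
        crit = 1 - (1 - alpha) ** (1 / (c - step))  # Dunn-Šidák α with c, c-1, c-2, ...
        if p >= crit:
            print(f"stop: p = {p} is not < {crit:.4f}")
            break
        print(f"p = {p} significant (α = {crit:.4f})")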
A Priori Techniques
Linear Contrasts
What if, instead of comparing group x to group y, we
want to compare group x, y, & z to group a & b?
Coefficients – how to tell mathematically which
groups we are comparing
• Coefficients for the same groups have to be the same
and all coefficients must add up to 0
• Comparing groups 1, 2, & 3 to groups 4 & 5:
 Groups 1 – 3: Coefficient = 2
 Groups 4 & 5: Coefficient = -3
Use Bonferroni correction to adjust α for the # of contrasts
• 4 contrasts → use α = .05/4 = .0125
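A sketch of how such a contrast is actually tested, using the standard formulas ψ = Σ cj·X̄j, SScontrast = ψ²/Σ(cj²/nj), and F = SScontrast/MSerror with 1 and dferror degrees of freedom (all values hypothetical):

    from scipy import stats

    # Comparing groups 1-3 (coefficient 2) to groups 4 & 5 (coefficient -3)
    coeffs = [2, 2, 2, -3, -3]                 # coefficients sum to 0
    means = [10.0, 11.0, 9.5, 14.0, 15.0]      # hypothetical group means
    n = 10                                     # hypothetical subjects per group
    ms_error, df_error = 8.0, 45               # hypothetical, from the omnibus ANOVA

    psi = sum(c * m for c, m in zip(coeffs, means))       # contrast value ψ
    ss_contrast = psi**2 / sum(c**2 / n for c in coeffs)  # SS for the contrast
    F = ss_contrast / ms_error
    p = stats.f.sf(F, 1, df_error)
    print(F, p)  # compare p to the Bonferroni-adjusted α (.0125 for 4 contrasts)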
A Priori Techniques
Orthogonal Contrasts
What if you want to compare groups within a
contrast?
I.e. Group 1 vs. Groups 2 & 3 and Group 2 vs. Group
3
Assigning coefficients is the same, but calculations
are different (don’t worry about how different, just
focus on the linear vs. orthogonal difference)
A Priori Techniques
Both the Holm Test and linear/orthogonal contrasts sound good – which do I use?
If making only a few contrasts: Linear/Orthogonal
If making many contrasts: Holm Test
It is more powerful, and determining coefficients is confusing with multiple contrasts
Post Hoc Techniques
 Post hoc techniques:
 Fisher’s LSD
 We replace s²p in our 2-sample t-test formula with MSerror, and we get:
t = (X̄1 − X̄2) / √(MSerror(1/n1 + 1/n2))
 We then test this using a critical t, using our t-table and
dferror as our df
 You can use either a one-tailed or two-tailed test, depending on whether you predict one mean to be higher or lower than the other (one-tailed) or possibly either (two-tailed)
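A sketch of this protected t in Python (hypothetical group means and sizes; MSerror and dferror come from the omnibus ANOVA):

    import math
    from scipy import stats

    mean1, mean2, n1, n2 = 11.2, 8.9, 12, 12   # hypothetical group statistics
    ms_error, df_error = 9.5, 33               # hypothetical, from the omnibus ANOVA

    t = (mean1 - mean2) / math.sqrt(ms_error * (1/n1 + 1/n2))
    p = 2 * stats.t.sf(abs(t), df_error)       # two-tailed p with dferror df
    print(t, p)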
Post Hoc Techniques
 Fisher’s LSD
 However, with more than 3 groups, using Fisher's LSD results in an inflation of α (i.e. with 4 groups α = .1)
 You could use the Bonferroni method to correct for
this, but then why not just use it in the first place?
 This is why Fisher’s LSD is no longer widely used
and other methods are preferred
 Newman-Keuls Test
 Like Fisher's LSD, allows the familywise α > .05
 Pretty crappy test for that reason
Post Hoc Techniques
 Scheffé’s Test
 Fisher's LSD & Newman-Keuls = not conservative enough = too easy to find significant results
 Scheffé's Test = too conservative = results in a low degree of Type I Error but too high Type II Error (incorrectly failing to reject Ho) = too hard to find significant results
 Tukey’s Honestly Significant Difference (HSD)
test
 Very popular, but conservative
Post Hoc Techniques
Ryan/Einot/Gabriel/Welsch (REGWQ)
Procedure
Like Tukey's HSD, but adjusts α (like the Dunn-Šidák Test) to make the test less conservative
Good compromise of Type I/Type II Error
Dunnett’s Test
Specifically designed for comparing a control group
with several treatment groups
Most powerful test in this case
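scipy (version 1.8+) ships a Tukey HSD routine; a sketch applying it to the smoking data from earlier (with a nonsignificant omnibus F, we expect no significant pairs):

    from scipy import stats

    ns = [9, 8, 12, 10, 7, 10, 9, 11, 8, 10, 8, 10, 8, 11, 10]
    ds = [12, 7, 14, 4, 8, 11, 16, 17, 5, 6, 9, 6, 6, 7, 16]
    as_ = [8, 8, 9, 1, 9, 7, 16, 19, 1, 1, 22, 12, 18, 8, 10]

    # All pairwise comparisons with the familywise α held at .05
    result = stats.tukey_hsd(ns, ds, as_)
    print(result)  # prints each pair's mean difference, p-value, and CI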
One-Way ANOVA
Reporting and Interpreting Results in
ANOVA:
We report our ANOVA as:
F(dfgroups, dferror) = x.xx, p = .xx, d = .xx
i.e. for F(4, 295) = 1.5, p = .01, d = .01 – We have 5 groups and 300 subjects total in all of our groups put together (dferror = 300 − 5 = 295); We can reject Ho, however our small effect size statistic informs us that it may be our large sample size, rather than a large effect of our IV, that resulted in us doing so
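A trivial sketch assembling this report string from hypothetical results:

    k, N = 5, 300            # hypothetical: 5 groups, 300 subjects
    F, p, d = 1.5, .01, .01  # hypothetical results

    report = f"F({k - 1}, {N - k}) = {F:.2f}, p = {p:.2f}, d = {d:.2f}"
    print(report)  # F(4, 295) = 1.50, p = 0.01, d = 0.01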