Repeated Measures ANOVA
PSYCHOMETRICS (MGMT 6971)
Michael J. Kalsher
Department of Cognitive Science
© 2014, Michael Kalsher
Outline
• Review
• Descriptive vs. Inferential Statistics
• Parametric vs. Non-parametric Statistics
• The role of equivalence in research
• The t test
– One-group t test
– Independent-groups t test
– Dependent-groups t test
• Analysis of Variance
– One-way Independent-groups ANOVA
– One-way Dependent-groups ANOVA
Descriptive Statistics
• Describing or characterizing the obtained sample data
• Use of summary measures, typically:
– measures of central tendency (mean, median, mode)
– measures of dispersion (range, variance, standard deviation)
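As a minimal sketch, the summary measures listed above can be computed with Python's standard-library `statistics` module. The data here are the pre-seminar scores from the social competency data set that appears later in these slides; choosing that data set is our own illustrative choice.

```python
import statistics

# Pre-seminar social competency scores for children 1-20
# (taken from the social competency data set in these slides).
pre = [31, 26, 32, 38, 29, 34, 24, 35, 30, 36,
       31, 27, 25, 28, 32, 27, 37, 29, 31, 27]

# Measures of central tendency
mean = statistics.mean(pre)
median = statistics.median(pre)
modes = statistics.multimode(pre)  # all most-frequent values (ties are possible)

# Measures of dispersion (sample variance and SD divide by n - 1)
value_range = max(pre) - min(pre)
variance = statistics.variance(pre)
sd = statistics.stdev(pre)

print(f"mean = {mean:.2f}, median = {median}, mode(s) = {modes}")
print(f"range = {value_range}, variance = {variance:.3f}, SD = {sd:.3f}")
```

The mean (30.45) and SD (about 4.02) agree with the SPSS descriptive output for the pre-seminar scores. This particular sample happens to have two modes (27 and 31), which is why `multimode` is used instead of `mode`.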
Inferential Statistics
• Used to make inferences about populations based on the behavior of a sample.
• Concerned with how likely it is that a result based on a particular sample, or set of samples, is the same as the result that might be obtained from the entire population.
Parametric vs. Non-parametric tests
Parametric tests
• Term used for inferential tests based on the normal distribution.
– Accuracy depends on the data meeting basic assumptions.
• Most outcomes closely follow a known probability distribution.
• Many parametric tests are robust to violations of distributional assumptions, so the assigned p-value will be fairly accurate in many situations.
Non-parametric tests
• A family of statistical procedures that do not rely on the restrictive assumptions of parametric tests. In particular, they do not assume that the data come from a normal distribution.
Assumptions of Parametric Data
Most parametric statistics based on the normal distribution have four basic assumptions that must be met for the test to be accurate:
1. Normally distributed data.
2. Homogeneity of variance (i.e., the variances of the groups should be equal).
3. Data must be measured at the interval or ratio level.
4. Independence (i.e., the behavior of one participant doesn’t influence the behavior of another participant).
Achieving Equivalence: Random Assignment versus Correlated-Groups Design
Random assignment: helps to ensure the statistical equivalence of groups at the beginning of a study.
Correlated-groups design: ensures equivalence either by using the same participants in all groups or by using participants who have been closely matched. These designs are usually more sensitive than between-subjects designs to the effects of the IV(s).
- Within-subjects (or repeated-measures) design: tests each participant under every condition.
- Matched-subjects design: match participants on relevant variables prior to the study and then randomly assign the members of each matched set, one member per group, to the groups.
Characteristics of Within-Subjects Designs
1. Each participant is exposed to all conditions of the experiment and therefore serves as his/her own control.
2. These designs are susceptible to sequence effects, so the order of the conditions should be “counterbalanced”. In complete counterbalancing:
a. Each participant is exposed to all conditions of the experiment.
b. Each condition is presented an equal number of times.
c. Each condition is presented an equal number of times in each position.
d. Each condition precedes and follows each other condition an equal number of times.
3. The critical comparison is the difference between the correlated groups on the dependent variable.
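Complete counterbalancing can be sketched by generating every possible presentation order and then checking criteria c and d directly. This is an illustrative sketch: the function name `complete_counterbalancing` is our own, and criterion d is read here as "immediately precedes/follows", which is an assumption about the slide's intent.

```python
from collections import Counter
from itertools import permutations

def complete_counterbalancing(conditions):
    """Every possible presentation order of the conditions (k! orders)."""
    return ["".join(order) for order in permutations(conditions)]

orders = complete_counterbalancing("ABC")
k = len(orders[0])

# Criterion c: each condition appears equally often in each serial position.
position_counts = [Counter(order[pos] for order in orders) for pos in range(k)]

# Criterion d (read here as "immediately precedes/follows"): every ordered
# pair of adjacent conditions occurs equally often.
adjacent_pairs = Counter(
    (order[i], order[i + 1]) for order in orders for i in range(k - 1)
)

print(orders)
print(position_counts)
print(adjacent_pairs)
```

The six orders produced (ABC, ACB, BAC, BCA, CAB, CBA) are the same orders used in the visual search example later in these slides. The number of orders grows as k!, so four conditions would already require 24 orders.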
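Dependent or Matched-Pairs t-test
The dependent, or matched-pairs, t-test is designed for situations in which the same participants are used in both experimental conditions. Thus, each participant contributes two scores.
Calculation of t:
$t = \dfrac{\text{mean difference}}{\text{standard error of the mean difference}} = \dfrac{\bar{D}}{s_D / \sqrt{N_D}}$
where $\bar{D}$ is the mean of the difference scores, $s_D$ is their standard deviation, and $N_D$ is the number of difference scores.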
Assumptions
The dependent-groups t test requires the following statistical assumptions:
1. Data are from normally distributed populations.
Note: The independent-samples t-test is robust against violation of this assumption if n > 30 for both groups.
2. Data are measured at least at the interval level.
Dependent t-test: An Overview
A researcher wonders whether self-help books actually work and asks a group of participants to read two books: one written for the purpose of increasing relationship happiness and another that is (hopefully!) irrelevant for this purpose. After reading each book, she asks participants to fill out a survey that measures relationship happiness.

Books Read                            Mean     N     Std. Deviation   Std. Error Mean
Marital Bliss: A Practical Approach   20.018   500   9.98123          .44637
Statistics 101                        18.490   500   8.99153          .40211

Paired Differences (Pair 1)
Mean Diff.   Std. Deviation   Std. Error Mean   95% Confidence Int. (Lower, Upper)   t       df    Sig. (2-tailed)
1.5280       12.62807         .56474            .4184, 2.6376                         2.706   499   .007
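The paired-differences line above can be reproduced from three summary numbers: the mean difference, the SD of the differences, and the number of pairs. In this sketch, the two-tailed .05 critical value for df = 499 (about 1.9647) is an assumed t-table value and is not taken from the slides. SPSS computes that value exactly.

```python
import math

# Summary values from the SPSS "Paired Differences" output above.
mean_diff = 1.5280   # mean of the difference scores
sd_diff = 12.62807   # SD of the difference scores
n_pairs = 500        # number of difference scores

se_diff = sd_diff / math.sqrt(n_pairs)  # standard error of the mean difference
t = mean_diff / se_diff
df = n_pairs - 1

# Assumed two-tailed .05 critical value for df = 499 (from a t table).
t_crit = 1.9647
ci_lower = mean_diff - t_crit * se_diff
ci_upper = mean_diff + t_crit * se_diff

print(f"SE = {se_diff:.5f}, t({df}) = {t:.3f}")
print(f"95% CI = [{ci_lower:.4f}, {ci_upper:.4f}]")
```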
Critical Values: Dependent-Groups t test
Note: Degrees of freedom = $N_D - 1$ (where $N_D$ = the number of difference scores).
Our value: 2.706
Effect Sizes
The statistical test tells us whether it is safe to conclude that the means come from the same or from different populations. It doesn’t tell us how strong these differences are.
r² (r-squared), or the coefficient of determination, is one metric for gauging effect size. It represents the proportion of the total sum of squares ($SS_T$) that is accounted for by the treatment:
$R^2 = \dfrac{SS_M}{SS_T}$
Rules of thumb regarding effect sizes:
Small effect: 1-3% of the total variance
Medium effect: 10% of the total variance
Large effect: 25% of the total variance
Dependent t-test: Calculating the Effect Size
Statistical vs. practical significance.
Formula: $r^2 = \dfrac{t^2}{t^2 + df}$, where the degrees of freedom are $df = N - 1 = 499$.
Relationship happiness sample computation:
$r^2 = \dfrac{(2.706)^2}{(2.706)^2 + 499} = \dfrac{7.32}{506.32} = .01$ (small effect size)
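The r² formula above is a one-line function. The sketch below applies it to the relationship happiness result, and also, as an extra illustration of our own, to the social competency result reported later in these slides (t(19) = -4.280).

```python
def r_squared_from_t(t, df):
    """Proportion of variance accounted for by the treatment: t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)

# Relationship happiness example: t(499) = 2.706
r2_books = r_squared_from_t(2.706, 499)

# Social competency example (later slide): t(19) = -4.280.
# The sign of t does not matter because t is squared.
r2_seminar = r_squared_from_t(-4.280, 19)

print(f"books: r^2 = {r2_books:.3f}, seminar: r^2 = {r2_seminar:.3f}")
```

With 499 df, even a clearly significant t explains only about 1.4% of the variance. With the much smaller social competency sample, a t of similar magnitude corresponds to a far larger proportion.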
Dependent t-test: Reporting the Results
On average, the reported relationship happiness after reading Marital Bliss: A Practical Approach (M = 20.02, SE = .45) was significantly higher than after reading the introductory statistics book (M = 18.49, SE = .40), t(499) = 2.71, p < .01, r² = .01. However, the small effect size estimate indicates that this difference was not practically significant.
Sample Problem
A psychologist believes that children of parents who use positive verbal statements (polite
requests and suggestions) are more socially accepted and more positive in interactions
with their peers. Although children acquire behavioral information from sources other than
parents (TV, peers, and so on), more induction (coaching children by introducing
consequences for behaviors and supplying rationales that support them) on the part of
parents, as opposed to more power-assertive and permissive types of discipline, facilitates
a pro-social behavioral orientation in children that, in turn, leads to greater competence
and greater acceptance by peers.
Twenty first-grade children who were rated by teachers and peers as aggressive and their
parents are asked to participate in a study to determine whether a seminar on inductive
parenting techniques improves social competency in children. The parents attend the
seminar for one month. The children are tested for social competency before the course
and then retested six months after their parents’ completion of the course. The results of
the social competency test are shown on the following page with higher scores indicating
a higher level of social competency.
In this problem, we are testing the null hypothesis that there is no difference between the
means of pre- and post-seminar social competency scores.
What is the IV? What is the DV? Was the seminar effective?
Social Competency Problem Data Set
Child   Pre   Post      Child   Pre   Post
1       31    34        11      31    28
2       26    25        12      27    32
3       32    38        13      25    25
4       38    36        14      28    30
5       29    29        15      32    41
6       34    41        16      27    37
7       24    26        17      37    39
8       35    42        18      29    33
9       30    36        19      31    40
10      36    44        20      27    28
T-TEST PAIRS=PreSeminar WITH PostSeminar (PAIRED)
/CRITERIA=CI(.9500)
/MISSING=ANALYSIS.
SPSS Output
Paired Samples Descriptive Statistics
        Mean    N    Std. Deviation   Std. Error Mean
Pre     30.45   20   4.019            .899
Post    34.20   20   6.066            1.356

Paired Samples Correlations
             N    Correlation   Sig.
Pre & Post   20   .771          .000

Paired Samples Test (Paired Differences)
           Mean     Std. Dev.   Std. Error Mean   95% Confidence Interval of the Difference (Lower, Upper)   t        df   Sig. (2-tail)
Pre-Post   -3.750   3.919       .876              -5.584, -1.916                                             -4.280   19   .000
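The paired-samples test above can be recomputed directly from the raw data set. This minimal sketch uses only the standard library. The helper name `paired_t` is hypothetical, and differences are taken as Pre minus Post to match the sign used in the SPSS output.

```python
import math
import statistics

# Social competency scores for children 1-20 (from the data set above).
pre = [31, 26, 32, 38, 29, 34, 24, 35, 30, 36,
       31, 27, 25, 28, 32, 27, 37, 29, 31, 27]
post = [34, 25, 38, 36, 29, 41, 26, 42, 36, 44,
        28, 32, 25, 30, 41, 37, 39, 33, 40, 28]

def paired_t(x, y):
    """Dependent-groups t: mean difference divided by its standard error."""
    diffs = [a - b for a, b in zip(x, y)]  # Pre - Post, as in the SPSS output
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # n - 1 denominator
    se_d = sd_d / math.sqrt(n)
    return mean_d, sd_d, se_d, mean_d / se_d, n - 1

mean_d, sd_d, se_d, t, df = paired_t(pre, post)
print(f"mean diff = {mean_d:.3f}, SD = {sd_d:.3f}, SE = {se_d:.3f}")
print(f"t({df}) = {t:.3f}")
```

The result matches the SPSS line: a mean difference of -3.750, SD 3.919, SE .876, and t(19) = -4.280.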
One-way Repeated-measures ANOVA
What is it?
• Used when testing more than 2 experimental groups.
• In dependent-groups ANOVA, all groups are dependent: each score in one group is associated with a score in every other group. This may be because the same subjects served in every group or because subjects have been matched.
• A within-subjects design equates conditions prior to the experiment by using the same participants in each condition or participants matched on some variable(s) of interest. This removes the single largest contributing factor to error variance: individual differences.
Examining Sources of Variance in Repeated-Measures ANOVA: An Example
Suppose we want to test whether participants can identify a target faster if there are fewer distractor items. Participants must find the single target (T or F) in a letter array and then push either the T or F button on the keyboard.
Each person is tested under three conditions: 10, 15, and 20 distractor items. There are 10 trials at each of the three levels of distraction, and the 10 trials are summed to give the total search time for each distraction level.
One-Way Repeated Measures ANOVA
• Used when testing 3 or more experimental groups.
• Each person contributes more than one score (i.e., every participant is exposed to every treatment).
• Within-subjects designs:
- equate conditions by using the same participants in each condition.
- partition the variance into SST, SSM and SSR.
- in repeated-measures ANOVA, the model and residual sums of squares are both part of the within-participant variance.
[Diagram: SST is split into SSBG (between participants) and SSWG (within participants); SSWG is split into SSModel and SSR.]
Condition (time estimation in seconds)
Participant   Order of Presentation   A (10)   B (15)   C (20)
1             ABC                     18.33    22.39    24.97
2             ACB                     15.96    20.72    21.79
3             BAC                     19.02    22.78    25.46
4             BCA                     25.36    27.48    27.91
5             CAB                     19.52    24.64    26.75
6             CBA                     23.27    24.96    25.49
Mean Scores                           20.24    23.83    25.40

Source                 df   SS      MS      F       p
Between (conditions)   2    83.69   41.85   32.25   <.001
Subjects               5    95.85   19.17
Error                  10   12.97   1.30
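The partition of variance in this table can be reproduced with a short, standard-library-only sketch. The function name `rm_anova_partition` is our own. The "independent groups" F is included only as an illustrative contrast and does not appear on the slide.

```python
# Visual search times (seconds) from the table above: one row per participant,
# columns = conditions A (10), B (15), C (20 distractors).
scores = [
    [18.33, 22.39, 24.97],
    [15.96, 20.72, 21.79],
    [19.02, 22.78, 25.46],
    [25.36, 27.48, 27.91],
    [19.52, 24.64, 26.75],
    [23.27, 24.96, 25.49],
]

def rm_anova_partition(data):
    """Split the total SS of a one-way repeated-measures design into parts."""
    n = len(data)     # participants
    k = len(data[0])  # conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subj_means)  # individual differences
    ss_model = n * sum((m - grand) ** 2 for m in cond_means)     # effect of the IV
    ss_error = ss_total - ss_subjects - ss_model                  # residual

    df_model, df_subjects, df_error = k - 1, n - 1, (k - 1) * (n - 1)
    f = (ss_model / df_model) / (ss_error / df_error)

    # For contrast: an independent-groups ANOVA on the same numbers would
    # leave the individual-difference variance inside its error term.
    df_within = n * k - k
    f_independent = (ss_model / df_model) / ((ss_subjects + ss_error) / df_within)

    return {
        "ss_total": ss_total, "ss_subjects": ss_subjects,
        "ss_model": ss_model, "ss_error": ss_error,
        "df_model": df_model, "df_subjects": df_subjects, "df_error": df_error,
        "F": f, "F_if_independent": f_independent,
    }

for name, value in rm_anova_partition(scores).items():
    print(name, round(value, 3))
```

The error SS comes out at about 12.97, which is consistent with the table's MS of 1.30 on 10 df, and F comes out at about 32.25. If the same scores are analyzed as independent groups, the 95.85 of between-subjects variance stays in the error term and F drops to roughly 5.8. This is the sense in which the within-subjects design removes individual differences.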
Independent vs. Dependent Groups ANOVA: A few points worth noting …
• In independent-groups ANOVA, the accuracy of the F-test depends upon the assumption that the groups tested are independent.
• The relationship between treatments in a repeated-measures design causes the conventional F-test to lack accuracy, which leads to an additional assumption.
• Sphericity: refers to the equality of variances of the differences between treatment levels.
• Mauchly’s test statistic: tests whether the sphericity assumption has been violated.
• Corrections applied to produce a valid F-ratio:
– When sphericity estimates are < .75, use the Greenhouse-Geisser estimate.
– When sphericity estimates are > .75, use the Huynh-Feldt estimate.
The Sphericity Assumption: Equality of Variances of the Differences Between Treatment Levels
Accuracy of the F test in independent-groups ANOVA depends upon the assumption that the groups tested are independent.
The relationship between treatments in a repeated-measures design causes the F test to lack accuracy. This requires an additional assumption termed sphericity.
If we were to take each pair of treatment levels and calculate the differences between each pair of scores, then it is necessary that these differences have equal variances.
Mauchly’s test examines the hypothesis that the variances of the differences between conditions are equal. The effect of violating sphericity is a loss of power (increased Type II error).
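The "variances of the differences" that sphericity refers to can be computed directly. This sketch applies them to the visual search data from the earlier example; that choice of data is ours. It is only a descriptive look at the pattern, not Mauchly’s test itself, and the helper name `difference_variances` is hypothetical.

```python
import statistics
from itertools import combinations

# Visual search data (seconds): one row per participant, columns A (10), B (15), C (20).
scores = [
    [18.33, 22.39, 24.97],
    [15.96, 20.72, 21.79],
    [19.02, 22.78, 25.46],
    [25.36, 27.48, 27.91],
    [19.52, 24.64, 26.75],
    [23.27, 24.96, 25.49],
]
labels = ["A", "B", "C"]

def difference_variances(data, labels):
    """Sample variance of the difference scores for every pair of conditions."""
    result = {}
    for i, j in combinations(range(len(labels)), 2):
        diffs = [row[i] - row[j] for row in data]
        result[labels[i] + "-" + labels[j]] = statistics.variance(diffs)
    return result

for pair, var in difference_variances(scores, labels).items():
    print(f"var({pair}) = {var:.3f}")
```

For these six participants, the three variances are roughly 1.9 (A-B), 4.8 (A-C) and 1.0 (B-C). Sphericity requires these to be equal in the population, and Mauchly’s test is the formal check of whether such sample differences are larger than chance would produce.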
One-Way Dependent-Groups ANOVA: An Example
Suppose we were testing the idea that consuming an increasing amount of alcohol will make it more likely that people will “eye up” members of the opposite sex.
IV: Over 4 nights, people are given 1, 2, 3, or 4 pints of beer to drink (a different amount each night).
DV: How many people the drinkers “eye-ball” (as measured by specialized eye-tracking goggles).
[Figure: mean number of people “eyed-up” by number of pints (1 to 4). Annotations: “Ogling” after 1 and 2 pints seems similar; “Ogling” increases after 3 pints! Mean difference = 3.5]

Descriptive Statistics
            1 Pint    2 Pints   3 Pints   4 Pints
Mean        11.7500   11.7000   15.2000   14.9500
Std. Dev.   4.31491   4.65776   5.80018   4.67327
N           20        20        20        20
Steps in the Analysis
-- Compute SSM (the variability explained by the experimental effect).
-- Compute SSR (the amount of unexplained variation across the conditions of the repeated-measures variable).
-- Divide each by the appropriate df to obtain the mean squares:
(1) df for SSM = levels of the IV minus 1 (or k - 1);
(2) df for SSR = (k - 1) x (n - 1)
[n = number of participants]
-- F = MSM/MSR. Comparing F with the F distribution on these df gives the probability of getting a value this large by chance alone.
-- Check to see whether Mauchly’s test is significant!
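The arithmetic in these steps can be worked through with the sums of squares from the SPSS output for this alcohol example (SSM = 225.100, SSR = 904.400, k = 4 levels, n = 20 participants). This is a minimal sketch of the uncorrected (sphericity-assumed) F only.

```python
# Sums of squares reported in the SPSS output for the alcohol example.
ss_model = 225.100
ss_resid = 904.400
k = 4   # levels of the IV (1-4 pints)
n = 20  # participants

df_model = k - 1              # 3
df_resid = (k - 1) * (n - 1)  # 57
ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
f_ratio = ms_model / ms_resid

print(f"MSM = {ms_model:.3f}, MSR = {ms_resid:.3f}, "
      f"F({df_model}, {df_resid}) = {f_ratio:.3f}")
```

These values reproduce the "Sphericity Assumed" rows of the output: MSM = 75.033, MSR = 15.867, and F(3, 57) = 4.729.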
SPSS Output
Mauchly’s Test of Sphericity
Within-Subjects Effect   Mauchly’s W   Chi-Square   df   Sig.   Epsilon: GG   HF     LB
Alcohol                  .477          13.122       5    .022   .745          .849   .333

Source                          Type III SS   df       Mean Square   F       Sig.
ALCOHOL   Sphericity Assumed    225.100       3        75.033        4.729   .005
          Greenhouse-Geisser    225.100       2.235    100.706       4.729   .011
          Huynh-Feldt           225.100       2.547    88.370        4.729   .008
          Lower-bound           225.100       1.000    225.100       4.729   .042
Error     Sphericity Assumed    904.400       57       15.867
          Greenhouse-Geisser    904.400       42.469   21.296
          Huynh-Feldt           904.400       48.398   18.687
          Lower-bound           904.400       19.000   47.600
Note: If the epsilon estimates are < .75, use Greenhouse-Geisser (GG); if they are > .75, use Huynh-Feldt (HF).
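Several numbers in this output follow from one another. The chi-square is the usual large-sample approximation to Mauchly’s W. The Huynh-Feldt epsilon can be obtained from the Greenhouse-Geisser epsilon. The corrected df are epsilon times the uncorrected df. This sketch uses the standard textbook formulas for a single group of participants. Those formulas are not shown on the slides, and SPSS's own rounding may differ slightly in the last digit.

```python
import math

def mauchly_chi_square(w, n, k):
    """Chi-square approximation to Mauchly's W for a one-way design.

    n = number of participants, k = number of repeated-measures levels.
    """
    p = k - 1  # number of orthonormal contrasts
    correction = (n - 1) - (2 * p ** 2 + p + 2) / (6 * p)
    chi2 = -correction * math.log(w)
    df = p * (p + 1) // 2 - 1
    return chi2, df

def huynh_feldt_epsilon(gg, n, k):
    """Huynh-Feldt epsilon from the Greenhouse-Geisser epsilon (single group)."""
    p = k - 1
    hf = (n * p * gg - 2) / (p * (n - 1 - p * gg))
    return min(hf, 1.0)  # epsilon is capped at 1

n, k = 20, 4
chi2, df = mauchly_chi_square(0.477, n, k)
gg = 0.745
hf = huynh_feldt_epsilon(gg, n, k)
lower_bound = 1 / (k - 1)

# Corrected degrees of freedom = epsilon x uncorrected df (3 and 57 here).
df_model, df_error = k - 1, (k - 1) * (n - 1)
for label, eps in [("Greenhouse-Geisser", gg), ("Huynh-Feldt", hf),
                   ("Lower-bound", lower_bound)]:
    print(f"{label}: epsilon = {eps:.3f}, "
          f"df = {eps * df_model:.3f}, {eps * df_error:.3f}")
print(f"Mauchly chi-square({df}) = {chi2:.2f}")
```

From W = .477, the formulas recover χ²(5) ≈ 13.12 and a Huynh-Feldt epsilon of about .849 (numerator df ≈ 2.547). These match the table.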
SPSS Output: Post hoc tests
(I) Alcohol   (J) Alcohol   Mean Difference (I - J)   Std. Error   Sig.    Lower Bound   Upper Bound
1             2             .050                      .742         1.000   -2.133        2.233
1             3             -3.450                    1.391        .136    -7.544        .644
1             4             -3.200                    1.454        .242    -7.480        1.080
2             1             -.050                     .742         1.000   -2.233        2.133
2             3             -3.500                    1.139        .038    -6.853        -.147
2             4             -3.250                    1.420        .202    -7.429        .929
3             1             3.450                     1.391        .136    -.644         7.544
3             2             3.500                     1.139        .038    .147          6.853
3             4             .250                      1.269        1.000   -3.485        3.985
4             1             3.200                     1.454        .242    -1.080        7.480
4             2             3.250                     1.420        .202    -.929         7.429
4             3             -.250                     1.269        1.000   -3.985        3.485
Calculating Effect Size
$\omega^2 = \dfrac{MS_M - MS_R}{MS_M + (n - 1) \times MS_R}$
$\omega^2 = \dfrac{75.03 - 15.87}{75.03 + ((20 - 1) \times 15.87)} = \dfrac{59.16}{75.03 + 301.53} = \dfrac{59.16}{376.56} = .16$
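The effect-size formula on this slide translates directly into code. This sketch implements exactly the slide's version of ω². Other texts define repeated-measures ω² somewhat differently, so treat the function as specific to this formula.

```python
def omega_squared(ms_model, ms_resid, n):
    """Effect size as defined on this slide: (MSM - MSR) / (MSM + (n - 1) * MSR)."""
    return (ms_model - ms_resid) / (ms_model + (n - 1) * ms_resid)

# Alcohol example: MSM = 75.03, MSR = 15.87, n = 20 participants.
w2 = omega_squared(75.03, 15.87, n=20)
print(f"omega^2 = {w2:.2f}")
```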
Reporting the Results
The results show that the number of people eyed-up was significantly affected by the amount of alcohol consumed, F(2.55, 48.40) = 4.73, p < .05. Mauchly’s test indicated that the assumption of sphericity had been violated, χ²(5) = 13.12, p < .05; therefore, degrees of freedom were corrected using Huynh-Feldt estimates of sphericity (ε = .85). The effect size (ω² = .16) indicated that the effect of alcohol consumption on ogling was substantial. Bonferroni post hoc tests revealed a significant difference in the number of people eyed-up only between 2 and 3 pints, p < .05. No other comparisons were significant (all ps > .05).
Sample Problem: One-way Repeated-measures ANOVA
A stress management therapy group instructor conducts a study to determine the most effective relaxation technique(s) for stress reduction. Twenty members of his stress management group participate in the study. The heart rate of each participant is monitored during each of five conditions. Each participant experienced all five conditions during the same session to control for variations in the amount of stress experienced from day to day.
The five conditions are as follows: (1) baseline (subjects sat quietly for 15 minutes); (2) guided meditation (subjects listened to a tape instructing them to close their eyes, breathe deeply, and relax their muscles for 15 minutes while concentrating on a single word or phrase); (3) comedy (subjects listened to the act of a stand-up comedian on a tape cassette for 15 minutes); (4) nature (subjects listened for 15 minutes to a tape of various sounds of nature, including the sounds of the ocean, wind, rain, leaves rustling, and birds chirping); and (5) music (subjects listened to a tape of a collection of easy-listening music for 15 minutes). Every subject experienced the baseline condition first; however, the four treatment conditions were counterbalanced to guard against order effects.
Each subject’s heart rate was monitored continuously during each of the 15-minute periods. The mean heart rate (beats per minute) for each subject during each condition is presented on the following page.
In this problem, we are testing the null hypothesis that, on average, the heart rates of subjects remain the same during each of the five conditions, or, equivalently, that these conditions do not influence heart rate differentially. What is the IV? What is the DV?
Sample Problem Data Set (mean heart rate, beats per minute)
Subject   Baseline   Meditation   Comedy   Nature   Music
1         85         70           75       71       74
2         79         69           73       70       72
3         91         82           87       83       86
4         93         80           85       79       84
5         92         80           86       81       87
6         87         79           83       80       81
7         84         72           77       73       76
8         78         69           74       71       73
9         79         69           73       70       72
10        80         71           74       72       73
11        80         72           76       74       75
12        97         80           89       82       87
13        88         78           82       80       82
14        94         79           84       80       84
15        75         60           68       62       66
16        76         67           72       69       70
17        90         77           83       76       83
18        86         75           80       77       80
19        94         84           88       85       87
20        70         59           64       58       62
Move each of the within-subjects variables over to the “Within-Subjects Variables” box.
Which estimate do you use?
Remember! If the epsilon estimates are < .75, use Greenhouse-Geisser (GG); if they are > .75, use Huynh-Feldt (HF).
SPSS Output
Within-Subjects Effects
Source                                   SS         df       Mean Square   F         Sig.
Exp. Cond           Sphericity Assumed   1573.100   4        393.275       235.531   .000
                    Greenhouse-Geisser   1573.100   1.654    951.015       235.531   .000
                    Huynh-Feldt          1573.100   1.792    877.881       235.531   .000
                    Lower-bound          1573.100   1.000    1573.100      235.531   .000
Error (Condition)   Sphericity Assumed   126.900    76       1.670
                    Greenhouse-Geisser   126.900    31.428   4.038
                    Huynh-Feldt          126.900    34.047   3.727
                    Lower-bound          126.900    19.000   6.679

Descriptive Statistics
Condition   Mean    Std. Dev   N
Baseline    84.90   7.511      20
Meditate    73.60   6.969      20
Comedy      78.65   7.036      20
Nature      74.65   7.006      20
Music       77.70   7.420      20
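The analysis in this output can be reproduced end to end from the heart-rate data set. The sketch below computes the repeated-measures F, the Greenhouse-Geisser epsilon, and the Huynh-Feldt epsilon. It makes two assumptions. First, the column order (baseline, meditation, comedy, nature, music) is inferred from the condition list and the descriptive means. Second, the Greenhouse-Geisser epsilon is computed with Box's formula on the double-centered covariance matrix, a standard method that the slides do not show. The function `rm_anova` is hypothetical and not part of any library.

```python
# Mean heart rate (bpm) for subjects 1-20, from the data set above.
# Assumed column order: baseline, meditation, comedy, nature, music.
hr = [
    [85, 70, 75, 71, 74], [79, 69, 73, 70, 72], [91, 82, 87, 83, 86],
    [93, 80, 85, 79, 84], [92, 80, 86, 81, 87], [87, 79, 83, 80, 81],
    [84, 72, 77, 73, 76], [78, 69, 74, 71, 73], [79, 69, 73, 70, 72],
    [80, 71, 74, 72, 73], [80, 72, 76, 74, 75], [97, 80, 89, 82, 87],
    [88, 78, 82, 80, 82], [94, 79, 84, 80, 84], [75, 60, 68, 62, 66],
    [76, 67, 72, 69, 70], [90, 77, 83, 76, 83], [86, 75, 80, 77, 80],
    [94, 84, 88, 85, 87], [70, 59, 64, 58, 62],
]

def mean(xs):
    return sum(xs) / len(xs)

def rm_anova(data):
    """One-way repeated-measures ANOVA with Greenhouse-Geisser and Huynh-Feldt epsilons."""
    n, k = len(data), len(data[0])
    cols = [[row[j] for row in data] for j in range(k)]
    col_means = [mean(c) for c in cols]
    grand = mean(col_means)  # equals the overall mean in a balanced design

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_subjects = k * sum((mean(row) - grand) ** 2 for row in data)
    ss_cond = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_cond
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)

    # Sample covariance matrix of the k conditions (n - 1 denominator).
    cov = [[sum((cols[a][i] - col_means[a]) * (cols[b][i] - col_means[b])
                for i in range(n)) / (n - 1)
            for b in range(k)] for a in range(k)]
    # Double-center it (cov is symmetric, so row means equal column means).
    row_means = [mean(r) for r in cov]
    cov_mean = mean(row_means)
    dc = [[cov[a][b] - row_means[a] - row_means[b] + cov_mean for b in range(k)]
          for a in range(k)]

    # Greenhouse-Geisser (Box) epsilon: trace^2 / ((k - 1) * sum of squared entries).
    trace = sum(dc[a][a] for a in range(k))
    gg = trace ** 2 / ((k - 1) * sum(v * v for r in dc for v in r))
    # Huynh-Feldt epsilon for a single group of subjects, capped at 1.
    p = k - 1
    hf = min(1.0, (n * p * gg - 2) / (p * (n - 1 - p * gg)))

    return {"ss_cond": ss_cond, "ss_error": ss_error, "df_cond": df_cond,
            "df_error": df_error, "F": f, "gg": gg, "hf": hf}

res = rm_anova(hr)
print(f"F({res['df_cond']}, {res['df_error']}) = {res['F']:.3f}")
print(f"GG epsilon = {res['gg']:.3f} -> df = {res['gg'] * res['df_cond']:.3f}")
print(f"HF epsilon = {res['hf']:.3f} -> df = {res['hf'] * res['df_cond']:.3f}")
```

Run on these data, the sketch should reproduce the SPSS sums of squares (1573.100 and 126.900) and F = 235.531. The corrected numerator df (4 × epsilon) should come out close to the 1.654 (GG) and 1.792 (HF) in the table. Both epsilons are well below .75 here.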
Sample Problem
Western people can become obsessed with body weight and diets, and because the media continues to glamorize stick-thin celebrities, we end up depressed that we’re not perfect. This gives corporate moguls in the fashion industry the opportunity to jump on our vulnerability by making loads of money on diets that will apparently help us to attain beautiful bodies. A European company bursts onto the scene with a diet called the Mediterranean-Monaco diet. The basic principle is that you eat no meat, drink lots of green tea, eat lots of bread dipped in extra virgin olive oil, eat chocolate at least once per day, and drink red wine (for the health benefits, of course) at the rate of at least one 1.5-oz. glass per day. Ten people in need of losing weight agree to try the diet for two months. Their weight was measured in kilograms at the start of the diet and then after 1 month and 2 months. Did the diet work?

Weight (kg)
Before Diet   After 1 Month   After 2 Months
63.75         65.38           81.34
62.98         66.24           69.31
65.98         67.70           77.89
107.27        102.72          91.33
66.58         69.45           72.87
120.46        119.96          114.26
62.01         66.09           68.01
71.87         73.62           55.43
83.01         75.81           71.63
76.62         67.66           68.60