Effect Sizes for Continuous Variables
William R. Shadish
University of California, Merced
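Indices for Treatment Outcome Studies
• Correlation coefficient (r) between treatment and outcome
• Standardized mean difference statistic (d):
$$d_i = \frac{\bar{X}_i^t - \bar{X}_i^c}{s_i}$$
where $\bar{X}_i^t$ and $\bar{X}_i^c$ are the treatment and comparison group means in study i, and $s_i$ is the pooled standard deviation.
• Either can be transformed into the other, so we will work with d since it is most common.
• Other indices do exist but are rare in social science meta-analyses.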
Estimating d
• d itself
• Algebraic equivalents to d
• Good approximations to d
• Methods that require intraclass correlation
• Methods that require ICC and change scores
• Methods that underestimate effect
Note: Italicized methods will be covered in this workshop.
Sample Data Set I: Two Independent Groups
                      Treatment    Comparison
                      3            2
                      4            2
                      4            4
                      4            5
                      5            5
                      6            6
                      6            6
                      6            7
                      7            8
                      7            9
Mean                  5.2          5.4
Standard Deviation    1.398        2.319
Sample Size           10           10
Correlation between treatment and outcome is r = -.055
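Calculating d
$$d_i = \frac{\bar{X}_i^t - \bar{X}_i^c}{s_i}$$
For Data Set I:
$$d = \frac{\bar{X}^t - \bar{X}^c}{s_p} = \frac{5.2 - 5.4}{1.915} = -.104, \text{ where}$$
$$s_p = \sqrt{\frac{(n_t - 1)s_t^2 + (n_c - 1)s_c^2}{n_t + n_c - 2}} = \sqrt{\frac{(10 - 1)1.398^2 + (10 - 1)2.319^2}{10 + 10 - 2}} = 1.915$$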
Algebraic Equivalent: Between Groups t-test on raw posttest scores
$$d = t\sqrt{\frac{1}{n_T} + \frac{1}{n_C}} = -.23\sqrt{\frac{1}{10} + \frac{1}{10}} = -.103$$
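As a quick check of the t-to-d conversion for Data Set I, here is a small sketch. The helper name `d_from_t` is hypothetical. The slide rounds t to -.23 before converting.

```python
from math import sqrt


def d_from_t(t, n_t, n_c):
    """Convert an independent-groups t statistic to d."""
    return t * sqrt(1 / n_t + 1 / n_c)


# t rounded to two decimals, as on the slide
d_rounded_t = d_from_t(-0.23, 10, 10)
# t carried to four decimals (the value quoted later for Data Set I)
d_precise_t = d_from_t(-0.2336, 10, 10)

print(round(d_rounded_t, 3), round(d_precise_t, 3))
```

The small gap between -.103 here and -.104 from the direct calculation comes only from rounding t.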
Algebraic Equivalent: t-test for two matched groups, sample sizes, correlation between groups
$$d = t\sqrt{\frac{2(1 - r)}{n}}, \text{ then from Dunlap (1996, Table 8):}$$
$$d = 4.513\sqrt{\frac{2(1 - .6857)}{8}} = 1.265$$
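The matched-groups conversion can be sketched the same way. The function name `d_from_matched_t` is an illustrative assumption; the inputs are the values quoted on the slide from Dunlap (1996, Table 8).

```python
from math import sqrt


def d_from_matched_t(t, r, n):
    """Convert a matched-groups t statistic to d.

    t: matched-groups t statistic
    r: correlation between the two sets of scores
    n: number of pairs
    """
    return t * sqrt(2 * (1 - r) / n)


d = d_from_matched_t(4.513, 0.6857, 8)
print(round(d, 3))
```

The factor 2(1 - r) undoes the gain in precision that matching gives the t statistic, so that d is expressed in the same raw-score standard deviation units as in the independent-groups case.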
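Algebraic Equivalent: Two-group between-groups F-statistic on raw posttest scores (Data Set I)
$$|d| = \sqrt{F\left(\frac{1}{n_t} + \frac{1}{n_c}\right)} = \sqrt{.055\left(\frac{1}{10} + \frac{1}{10}\right)} = .105$$
F carries no sign, so the sign of d is taken from the direction of the mean difference (negative for Data Set I).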
Algebraic Equivalent: Multifactor Between Subjects ANOVA with Two Treatment Conditions
1. Sums of Squares and Degrees of Freedom for all sources, and Marginal Means for Treatment Conditions
2. Mean Squares and Degrees of Freedom for all sources, and Marginal Means for Treatment Conditions
3. Sums of Squares and Degrees of Freedom for all sources, with Cell Means and Cell Sample Sizes
4. Mean Squares and Degrees of Freedom for all sources, with Cell Means and Cell Sample Sizes
5. Cell means, cell sample sizes, the F-statistic for the treatment factor, and the degrees of freedom for the error term
6. F-statistics and degrees of freedom for all sources, sample size for treatment and comparison groups, where treatment factor has only two levels
Example: Sums of Squares and Degrees of Freedom for all sources, and Marginal Means for Treatment Conditions: Data Set II

Raw data:
        B1           B2           B3
A1      8, 4, 0      10, 8, 6     8, 6, 4
A2      14, 10, 6    4, 2, 0      15, 12, 9

Cell means (and sample sizes):
                   B1          B2          B3          Row Marginal
A1                 4.0 (3)     8.0 (3)     6.0 (3)     6.0 (9)
A2                 10.0 (3)    2.0 (3)     12.0 (3)    8.0 (9)
Column Marginal    7.0 (6)     5.0 (6)     9.0 (6)     7.0 (18)

ANOVA table:
Source      Sum of Squares   df   Mean Square   F       Probability
A           18.000           1    18.000        2.038   .179
B           48.000           2    24.000        2.717   .106
AB          144.000          2    72.000        8.151   .006
Residual    106.000          12   8.833
Total       316.000          17   18.588
Example: Sums of Squares and Degrees of Freedom for all sources, and Marginal Means for Treatment Conditions
• For a two group one factor ANOVA:
$$s_p = \sqrt{MS_w} = \sqrt{\frac{SS_w}{df_w}}$$
• For a two factor ANOVA:
$$s_p = \sqrt{\frac{SS_b + SS_{ab} + SS_w}{df_b + df_{ab} + df_w}} = \sqrt{\frac{48 + 144 + 106}{2 + 2 + 12}} = 4.32$$
$$d = \frac{6 - 8}{4.32} = -.463$$
• Which is the same as would have been obtained had Factor B not existed (with equal n per cell)
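The claim that pooling Factor B and the interaction back into the error term reproduces the ordinary two-group result can be checked numerically. The sketch below assumes the Data Set II raw scores are assigned to A1 and A2 as read from the flattened table (first nine scores to A1, last nine to A2); `pooled_sd_from_ss` is a hypothetical helper name.

```python
from math import sqrt
from statistics import variance


def pooled_sd_from_ss(ss_terms, df_terms):
    """Pool sums of squares and degrees of freedom into one SD."""
    return sqrt(sum(ss_terms) / sum(df_terms))


# From the ANOVA table: B, AB, and Residual sources
s_p = pooled_sd_from_ss([48.0, 144.0, 106.0], [2, 2, 12])
d = (6.0 - 8.0) / s_p  # marginal means for A1 and A2

# Same quantity computed as if Factor B did not exist
# (assumed assignment of raw scores to the two treatment conditions)
a1 = [8, 4, 0, 10, 8, 6, 8, 6, 4]
a2 = [14, 10, 6, 4, 2, 0, 15, 12, 9]
s_p_raw = sqrt(((len(a1) - 1) * variance(a1) + (len(a2) - 1) * variance(a2))
               / (len(a1) + len(a2) - 2))

print(round(s_p, 2), round(s_p_raw, 2), round(d, 3))
```

Both routes pool the same 298 units of sum of squares over the same 16 degrees of freedom, which is why they agree.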
Algebraic Equivalent: Oneway two-group ANCOVA: Covariance error term, F for covariate, raw score means, and total sample size (Data Set III)
            Time 1    Time 2    Change
Group 1     .00       .00       .00
            3.00      1.00      -2.00
            4.00      3.00      -1.00
Group 2     4.00      2.00      -2.00
            5.00      4.00      -1.00
            7.00      5.00      -2.00
Time 2 Group Mean: Group 1 = 1.333, Group 2 = 3.667

ANCOVA Table, Time 2 as Outcome, Time 1 as covariate
Source       Sum of Squares   df   Mean Square   F        Sig.
Covariate    7.500            1    7.500         12.273   .039
Groups       .005             1    .005          .008     .932
Error        1.833            3    .611
Total        17.500           5    3.500
Note: This table was computed using the unique sum of squares method as defined in SPSS for Windows Version 7.5.
Algebraic Equivalent: Oneway two-group ANCOVA: Covariance error term, F for covariate, raw score means, and total sample size (Data Set III)
$$s_p = \sqrt{\frac{MS_e}{1 - r_w^2} \cdot \frac{N - k - 1}{N - k}} = \sqrt{\frac{.611}{1 - .896^2} \cdot \frac{6 - 2 - 1}{6 - 2}} = 1.524, \text{ where}$$
$$r_w = \sqrt{\frac{F_{cov}}{F_{cov} + (N - k - 1)}} = \sqrt{\frac{12.273}{12.273 + 6 - 2 - 1}} = .896, \text{ so}$$
$$d = \frac{1.333 - 3.667}{1.524} = -1.532$$
Which is the same as would have been obtained had the standard method been applied to the Time 2 scores
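The two ANCOVA steps (recover the within-group correlation from the covariate F, then inflate the covariance-adjusted error term back to raw-score units) might be sketched as follows. The function names are hypothetical, and k is taken to be the number of groups, as the worked numbers imply.

```python
from math import sqrt


def within_r_from_f(f_cov, n_total, k):
    """Pooled within-group correlation implied by the covariate F."""
    df_error = n_total - k - 1
    return sqrt(f_cov / (f_cov + df_error))


def pooled_sd_from_ancova(ms_error, r_w, n_total, k):
    """Raw-score pooled SD recovered from the ANCOVA error term."""
    return sqrt(ms_error / (1 - r_w**2) * (n_total - k - 1) / (n_total - k))


# Data Set III: MS error = .611, F for covariate = 12.273, N = 6, k = 2
r_w = within_r_from_f(12.273, 6, 2)
s_p = pooled_sd_from_ancova(0.611, r_w, 6, 2)
d = (1.333 - 3.667) / s_p

print(round(r_w, 3), round(s_p, 3), round(d, 3))
```

Carrying r_w at full precision rather than rounding it to .896 gives a pooled SD of about 1.527 and d of about -1.528, slightly different from the slide values of 1.524 and -1.532 and closer to the result of the standard method on the Time 2 scores.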
Algebraic Equivalent: Exact Probability and Sample Sizes
• If an exact p value from a t-test or two-group F-test is reported:
• Use the sample size to get df, which in turn allows you to recover the exact t statistic
• Then apply the t-test method previously shown
• From Data Set I
– exact probability for t-test was p = .818.
– for df = 20 - 2 = 18, t = -.2336 (p gives only the magnitude; the sign comes from the direction of the mean difference)
– so d = -.104, same as before
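Recovering t from an exact p value requires inverting the t distribution. A statistics package would normally do this; the sketch below uses only the standard library, integrating the t density with Simpson's rule and inverting by bisection. All function names are illustrative, and the numerical settings (2000 Simpson steps, 60 bisection iterations, search limit of 100) are assumptions chosen for simplicity.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p for |t|, via Simpson's rule on [0, |t|] (steps is even)."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)


def abs_t_from_p(p, df, upper=100.0):
    """Find |t| whose two-tailed p equals p, by bisection."""
    low, high = 0.0, upper
    for _ in range(60):
        mid = (low + high) / 2.0
        if two_tailed_p(mid, df) > p:
            low = mid   # p still too large, so |t| must be bigger
        else:
            high = mid
    return (low + high) / 2.0


# Data Set I: p = .818, df = 18; treatment mean < comparison mean, so t < 0
abs_t = abs_t_from_p(0.818, 18)
t = -abs_t
d = t * sqrt(1 / 10 + 1 / 10)
print(round(abs_t, 3), round(d, 3))
```

Because the reported p is itself rounded, the recovered t matches the slide value only to about three decimals.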
Algebraic Equivalent: r to d
To convert r to d uncorrected for small sample bias, using Data Set I:
$$d = \frac{2r}{\sqrt{(1 - r^2)\dfrac{n_t + n_c}{n_t + n_c - 2}}} = \frac{2(-.055)}{\sqrt{(1 - (-.055)^2)\dfrac{10 + 10}{10 + 10 - 2}}} = -.105$$
Which is the same as originally obtained using the standard formula for d
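A short sketch of the r-to-d conversion for Data Set I follows. The helper name `d_from_r` is hypothetical. The factor of 2 in the numerator corresponds to equal group sizes, which holds for this data set.

```python
from math import sqrt


def d_from_r(r, n_t, n_c):
    """Convert a treatment-outcome correlation to d (equal group sizes),
    uncorrected for small sample bias."""
    n = n_t + n_c
    return 2 * r / sqrt((1 - r**2) * n / (n - 2))


d = d_from_r(-0.055, 10, 10)
print(round(d, 4))
```

The result sits between -.104 and -.105; which of the two it rounds to depends on how many decimals of r are carried.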
Algebraic Equivalent: Raw Data
• Sometimes raw data is tabled as, say,
– Treatment group N = 10: A = 20%, B = 20%, C = 30%, D = 20%, and F = 10%
– Comparison group N = 10: A = 10%, B = 20%, C = 20%, D = 30%, and F = 20%
• Create raw data as, say, A = 4, B = 3, C = 2, D = 1, and F = 0
– treatment group is 4, 4, 3, 3, 2, 2, 2, 1, 1, 0
– comparison group is 4, 3, 3, 2, 2, 1, 1, 1, 0, 0
• Then d = .377
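The reconstruction of raw scores from a tabled grade distribution can be sketched as below. The counts are the slide's percentages applied to N = 10; the names `GRADE_POINTS` and `expand` are illustrative.

```python
from math import sqrt
from statistics import mean, variance

# Numeric coding suggested on the slide
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def expand(counts):
    """Turn a {grade: count} table into a list of numeric scores."""
    scores = []
    for grade, count in counts.items():
        scores.extend([GRADE_POINTS[grade]] * count)
    return scores


# 20/20/30/20/10 percent and 10/20/20/30/20 percent of N = 10
treatment = expand({"A": 2, "B": 2, "C": 3, "D": 2, "F": 1})
comparison = expand({"A": 1, "B": 2, "C": 2, "D": 3, "F": 2})

n_t, n_c = len(treatment), len(comparison)
s_p = sqrt(((n_t - 1) * variance(treatment) + (n_c - 1) * variance(comparison))
           / (n_t + n_c - 2))
d = (mean(treatment) - mean(comparison)) / s_p
print(round(d, 3))
```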
Good Approximation
• Three-group or higher between-groups oneway ANOVA on posttest scores: group means, sample sizes, and F-statistic
$$s_p = \sqrt{\frac{\left[\sum_{i=1}^{k} n_i(\bar{X}_i - G)^2\right] / (k - 1)}{F}}$$
where G is the grand mean and k is the number of groups.
Example Data Set IV
            Posttest             Group Mean
Group 1     .00, 1.00, 3.00      1.333
Group 2     2.00, 4.00, 5.00     3.667
Group 3     2.00, 1.00, 1.00     1.333
G = 2.111, F = 3.267
$$s_p = \sqrt{\frac{\left[3(1.333 - 2.111)^2 + 3(3.667 - 2.111)^2 + 3(1.333 - 2.111)^2\right] / (3 - 1)}{3.267}} = 1.291$$
$$d = \frac{1.333 - 3.667}{1.291} = -1.81$$
This is similar but not identical to d = -1.53 using the standard method comparing groups 1 and 2. Difference due to different sp.
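When only group means, sample sizes, and the omnibus F are reported, the between-groups mean square can be rebuilt from the means and divided by F to recover the within-groups mean square. A minimal sketch with the Data Set IV values follows; `pooled_sd_from_oneway_f` is a hypothetical name.

```python
from math import sqrt


def pooled_sd_from_oneway_f(means, ns, f):
    """Pooled SD implied by a oneway ANOVA F, group means, and group sizes."""
    grand_mean = sum(n * m for n, m in zip(ns, means)) / sum(ns)
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ms_between = ss_between / (len(means) - 1)
    return sqrt(ms_between / f)  # MS between / F = MS within


means = [1.333, 3.667, 1.333]
ns = [3, 3, 3]
s_p = pooled_sd_from_oneway_f(means, ns, 3.267)
d = (means[0] - means[1]) / s_p  # group 1 vs group 2
print(round(s_p, 3), round(d, 2))
```

The pooled SD here uses all three groups, which is why d differs from the value based on groups 1 and 2 alone.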
Good Approximation
• Three-group or higher between-groups oneway ANOVA on raw posttest scores: treatment and comparison group means and mean square error.
• For Data Set IV:
$$s_p = \sqrt{MS_w} = \sqrt{1.667} = 1.291, \text{ so}$$
$$d = \frac{1.333 - 3.667}{1.291} = -1.81, \text{ just as with the previous method}$$
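For reference, the mean square within for Data Set IV can be computed directly from the raw posttest scores. This sketch assumes the nine scores are grouped three per group in the order tabled, which is consistent with the reported group means.

```python
from math import sqrt
from statistics import mean

# Data Set IV posttest scores, three per group (assumed grouping)
groups = [[0, 1, 3], [2, 4, 5], [2, 1, 1]]

# Within-groups sum of squares and degrees of freedom
ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
df_within = sum(len(g) for g in groups) - len(groups)
ms_within = ss_within / df_within

s_p = sqrt(ms_within)
d = (mean(groups[0]) - mean(groups[1])) / s_p  # group 1 vs group 2
print(round(ms_within, 3), round(s_p, 3), round(d, 2))
```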
Good Approximations: Two-Factor RM-ANOVA (groups x time)
• Between-groups mean square error, within-groups mean square error, posttest means, and sample sizes
• F-ratio for groups, F-ratio for time, cell means and sample sizes
• F-ratio for groups, F-ratio for group × time interaction, cell means and sample sizes
Example: Data Set V
This data set is taken from Winer (1972, p. 525). It presents a two-factor model with factor A as a between subjects factor having two levels, A1 and A2, and factor B as a within-subjects factor having four levels (columns B1 through B4). The raw data, with one subject per row, are:
        B1    B2    B3    B4
A1      0     0     5     3
        3     1     5     4
        4     3     6     2
A2      4     2     7     8
        5     4     6     6
        7     5     8     9
Example: Data Set V RM-ANOVA
Here are the cell means (and sample sizes) for the same data, along with marginals and grand mean.
                B1          B2          B3          B4          Row Marginal
A1              2.33 (3)    1.33 (3)    5.33 (3)    3.00 (3)    3.00 (12)
A2              5.33 (3)    3.67 (3)    7.00 (3)    7.67 (3)    5.917 (12)
Col Marginal    3.83 (6)    2.50 (6)    6.17 (6)    5.33 (6)    4.458 (24) = Grand Mean
Repeated Measures ANOVA Table
Tests of Within-Subjects Effects
Source      Sum of Squares   df   Mean Square   F        Probability
B           47.458           3    15.819        12.798   .000
AB          7.458            3    2.486         2.011    .166
WS Error    14.833           12   1.236
Tests of Between-Subjects Effects
Source      Sum of Squares   df   Mean Square   F        Probability
A           51.042           1    51.042        11.893   .026
BS Error    17.167           4    4.292
Between-groups mean square error, within-groups mean square error, posttest means, and sample sizes: Data Set V
• Assuming Time 4 is the time point of interest (e.g., it is the posttest, or the followup), then:
$$s_p = \sqrt{\frac{MS_{e(bs)} + (t - 1)MS_{e(ws)}}{t}} = \sqrt{\frac{4.29 + (4 - 1)1.24}{4}} = 1.415, \text{ so}$$
$$d = \frac{3.00 - 7.67}{1.415} = -3.30$$
where t is the number of time points.
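The pooling of the two repeated-measures error terms can be sketched as follows, using the Data Set V mean squares (BS error = 4.292, WS error = 1.236) and the Time 4 cell means. The function name `pooled_sd_rm` is hypothetical.

```python
from math import sqrt


def pooled_sd_rm(ms_bs_error, ms_ws_error, n_times):
    """Pooled within-cell SD from between- and within-subjects error terms.

    The between-subjects error counts once and the within-subjects
    error counts (n_times - 1) times, averaged over n_times.
    """
    return sqrt((ms_bs_error + (n_times - 1) * ms_ws_error) / n_times)


s_p = pooled_sd_rm(4.292, 1.236, 4)
d = (3.00 - 7.67) / s_p  # Time 4 means for A1 and A2
print(round(s_p, 3), round(d, 2))
```

With the mean squares carried to three decimals the pooled SD is about 1.414; the slide's 1.415 reflects rounding the mean squares to two decimals first.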
Methods that underestimate effect size I
• Results reported verbally as “significant”, or as p < .05 or < .01 etc., with sample size
• Use the previous method to convert p to t, and then use t to compute d as before.
• In Data Set I using p < .05 (two-tailed, df = 18), this method would yield t = -2.101, yielding d = -.939.
• This underestimates d because t increases in absolute value as p decreases, and the true p is smaller than the reported bound of .05.
• Be careful to distinguish 1 vs 2 tailed tests.
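To illustrate why the tail distinction matters, the sketch below converts the critical t for a reported "p < .05" into a bound on d for the Data Set I sample sizes. The critical values 2.101 (two-tailed) and 1.734 (one-tailed) for df = 18 are standard t-table values supplied here as constants, not taken from the slides.

```python
from math import sqrt

# Standard t-table critical values for df = 18 at alpha = .05
T_CRIT_TWO_TAILED = 2.101
T_CRIT_ONE_TAILED = 1.734


def d_bound(t_crit, n_t, n_c):
    """Smallest |d| consistent with a result reported only as p < alpha."""
    return t_crit * sqrt(1 / n_t + 1 / n_c)


bound_two_tailed = d_bound(T_CRIT_TWO_TAILED, 10, 10)
bound_one_tailed = d_bound(T_CRIT_ONE_TAILED, 10, 10)
print(round(bound_two_tailed, 3), round(bound_one_tailed, 3))
```

Treating a one-tailed result as two-tailed would overstate the bound, so the coded value is no longer guaranteed to be conservative.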
Methods that underestimate
effect size II
• Results reported only as nonsignificant.
• Omitting them from the meta-analysis
results in an overestimate of average d.
• A typical solution is to code them as d = 0
(introduces a constant variance problem),
but then do sensitivity analyses.
• More sophisticated solutions exist such as
maximum likelihood imputation.
Discussion
• Many more methods exist
• The standard error for all but d and its algebraic equivalents is typically unknown
• Whether to use the approximations or not involves
the same tradeoffs as with results reported only as
nonsignificant (missing effect sizes vs
approximate results)
• When doing a meta-analysis, good practice is to
code effect size calculation method, and then
explore its effects on outcome.
Computer Programs
• Lipsey and Wilson’s Excel macro (free at
http://mason.gmu.edu/~dwilsonb/ma.html)
• ES program (purchase at
http://www.assess.com/ES.html)
• For more meta-analytic software, see
http://faculty.ucmerced.edu/wshadish/MetaAnalysis%20Links.htm.