Effect Sizes for Continuous Variables
William R. Shadish
University of California, Merced

Indices for Treatment Outcome Studies
• Correlation coefficient (r) between treatment and outcome
• Standardized mean difference statistic (d):
\[ d_i = \frac{\bar{X}_i^{t} - \bar{X}_i^{c}}{s_i} \]
• Either can be transformed into the other, so we will work with d since it is most common.
• Other indices do exist but are rare in social science meta-analyses.

Estimating d
• d itself
• Algebraic equivalents to d
• Good approximations to d
• Methods that require intraclass correlation
• Methods that require ICC and change scores
• Methods that underestimate effect
Note: Italicized methods will be covered in this workshop.

Sample Data Set I: Two Independent Groups

                      Treatment   Comparison
                          3           2
                          4           2
                          4           4
                          4           5
                          5           5
                          6           6
                          6           6
                          6           7
                          7           8
                          7           9
Mean                     5.2         5.4
Standard Deviation      1.398       2.319
Sample Size               10          10

Correlation between treatment and outcome is r = -.055

Calculating d
\[ d = \frac{\bar{X}^{t} - \bar{X}^{c}}{s_p} = \frac{5.2 - 5.4}{1.915} = -.104, \]
where
\[ s_p = \sqrt{\frac{(n_t - 1)s_t^2 + (n_c - 1)s_c^2}{n_t + n_c - 2}} = \sqrt{\frac{(10 - 1)1.398^2 + (10 - 1)2.319^2}{10 + 10 - 2}} = 1.915 \]

Algebraic Equivalent: Between-Groups t-test on Raw Posttest Scores
\[ d = t\sqrt{\frac{1}{n_t} + \frac{1}{n_c}} = -.23\sqrt{\frac{1}{10} + \frac{1}{10}} = -.103 \]

Algebraic Equivalent: t-test for Two Matched Groups, Sample Sizes, Correlation Between Groups
\[ d = t\sqrt{\frac{2(1 - r)}{n}}, \]
then from Dunlap (1996, Table 8):
\[ d = 4.513\sqrt{\frac{2(1 - .6857)}{8}} = 1.265 \]

Algebraic Equivalent: Two-Group Between-Groups F-statistic on Raw Posttest Scores (Data Set I)
\[ d = \sqrt{F\left(\frac{1}{n_t} + \frac{1}{n_c}\right)} = \sqrt{.055\left(\frac{1}{10} + \frac{1}{10}\right)} = .105 \]
Because F carries no direction, the sign of d is taken from the direction of the mean difference (negative here).

Algebraic Equivalent: Multifactor Between-Subjects ANOVA with Two Treatment Conditions
Any one of the following sets of information is sufficient:
1. Sums of Squares and Degrees of Freedom for all sources, and Marginal Means for Treatment Conditions
2. Mean Squares and Degrees of Freedom for all sources, and Marginal Means for Treatment Conditions
3. Sums of Squares and Degrees of Freedom for all sources, with Cell Means and Cell Sample Sizes
4. Mean Squares and Degrees of Freedom for all sources, with Cell Means and Cell Sample Sizes
5. Cell means, cell sample sizes, the F-statistic for the treatment factor, and the degrees of freedom for the error term
6. F-statistics and degrees of freedom for all sources, sample size for treatment and comparison groups, where treatment factor has only two levels

Example: Sums of Squares and Degrees of Freedom for all sources, and Marginal Means for Treatment Conditions: Data Set II

Raw data:

          B1            B2           B3
A1     8, 4, 0      10, 8, 6      8, 6, 4
A2    14, 10, 6      4, 2, 0     15, 12, 9

Cell means (cell sample sizes):

                     B1          B2          B3        Row Marginal
A1                 4.0 (3)     8.0 (3)     6.0 (3)       6.0 (9)
A2                10.0 (3)     2.0 (3)    12.0 (3)       8.0 (9)
Column Marginal    7.0 (6)     5.0 (6)     9.0 (6)       7.0 (18)

ANOVA table:

Source      Sum of Squares    df    Mean Square      F       Probability
A                18.000        1       18.000       2.038       .179
B                48.000        2       24.000       2.717       .106
AB              144.000        2       72.000       8.151       .006
Residual        106.000       12        8.833
Total           316.000       17       18.588

Example: Sums of Squares and Degrees of Freedom for all sources, and Marginal Means for Treatment Conditions
• For a two-group, one-factor ANOVA:
\[ s_p = \sqrt{MS_w} = \sqrt{\frac{SS_w}{df_w}} \]
• For a two-factor ANOVA, pool every source except the treatment factor (A) back into the error term:
\[ s_p = \sqrt{\frac{SS_b + SS_{ab} + SS_w}{df_b + df_{ab} + df_w}} = \sqrt{\frac{48 + 144 + 106}{2 + 2 + 12}} = 4.32 \]
\[ d = \frac{6 - 8}{4.32} = -.463 \]
• Which is the same as would have been obtained had Factor B not existed (with equal n per cell)

Algebraic Equivalent: Oneway Two-Group ANCOVA: Covariance Error Term, F for Covariate, Raw Score Means, and Total Sample Size (Data Set III)

            Time 1    Time 2    Change    Time 2 Group Mean
Group 1       .00       .00       .00
             3.00      1.00     -2.00          1.333
             4.00      3.00     -1.00
Group 2      4.00      2.00     -2.00
             5.00      4.00     -1.00          3.667
             7.00      5.00     -2.00

ANCOVA Table, Time 2 as Outcome, Time 1 as Covariate

Source       Sum of Squares    df    Mean Square      F       Sig.
Covariate         7.500         1       7.500       12.273    .039
Groups             .005         1        .005         .008    .932
Error             1.833         3        .611
Total            17.500         5       3.500

Note: This table was computed using the unique sum of squares method as defined in SPSS for Windows Version 7.5.
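The Data Set I and Data Set II calculations on the earlier slides can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (pooled_sd, d_from_means, d_from_t, d_from_f, pooled_sd_two_factor) are hypothetical and not part of any program mentioned in this workshop, and the sign passed to d_from_f is supplied by hand because F carries no direction.

```python
import math
from statistics import mean, stdev

# Data Set I, as tabled on the slide
treatment = [3, 4, 4, 4, 5, 6, 6, 6, 7, 7]
comparison = [2, 2, 4, 5, 5, 6, 6, 7, 8, 9]


def pooled_sd(sd_t, n_t, sd_c, n_c):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2)
                     / (n_t + n_c - 2))


def d_from_means(mean_t, mean_c, sp):
    """Standardized mean difference from two means and a pooled SD."""
    return (mean_t - mean_c) / sp


def d_from_t(t, n_t, n_c):
    """d from a between-groups t statistic."""
    return t * math.sqrt(1 / n_t + 1 / n_c)


def d_from_f(f, n_t, n_c, sign):
    """d from a two-group F; the sign (+1 or -1) comes from the means."""
    return sign * math.sqrt(f * (1 / n_t + 1 / n_c))


def pooled_sd_two_factor(ss_b, ss_ab, ss_w, df_b, df_ab, df_w):
    """Pool factor B and the A x B interaction back into the error term."""
    return math.sqrt((ss_b + ss_ab + ss_w) / (df_b + df_ab + df_w))


# Data Set I: direct calculation, then the t and F equivalents
sp = pooled_sd(stdev(treatment), len(treatment),
               stdev(comparison), len(comparison))
d_direct = d_from_means(mean(treatment), mean(comparison), sp)
d_t = d_from_t(-0.2336, 10, 10)
d_f = d_from_f(0.055, 10, 10, sign=-1)

# Data Set II: two-factor ANOVA, treatment marginal means 6 and 8
sp2 = pooled_sd_two_factor(48, 144, 106, 2, 2, 12)
d2 = d_from_means(6.0, 8.0, sp2)

print(round(sp, 3), round(d_direct, 3), round(d_t, 3), round(d_f, 3))
print(round(sp2, 2), round(d2, 3))
```

The three Data Set I estimates differ only in the third decimal place, which reflects rounding of the t and F values reported on the slides.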
Algebraic Equivalent: Oneway Two-Group ANCOVA: Covariance Error Term, F for Covariate, Raw Score Means, and Total Sample Size (Data Set III)
\[ s_p = \sqrt{\frac{MS_e}{1 - r_w^2} \cdot \frac{N - k - 1}{N - k}} = \sqrt{\frac{.611}{1 - .896^2} \cdot \frac{6 - 2 - 1}{6 - 2}} = 1.524, \]
where
\[ r_w = \sqrt{\frac{F_{cov}}{F_{cov} + (N - k - 1)}} = \sqrt{\frac{12.273}{12.273 + (6 - 2 - 1)}} = .896, \]
so
\[ d = \frac{1.333 - 3.667}{1.524} = -1.532 \]
Which is the same as would have been obtained had the standard method been applied to the Time 2 scores.

Algebraic Equivalent: Exact Probability and Sample Sizes
• If exact p value from t-test or two-group F-test
• Use sample size to get df, which in turn allows you to get exact t statistic
• Then apply t-test method previously shown
• From Data Set I
  – exact probability for t-test was p = .818.
  – for df = 20 - 2 = 18, t = -.2336
  – so d = -.104, same as before

Algebraic Equivalent: r to d
To convert r to d uncorrected for small sample bias, using Data Set I:
\[ d = \frac{2r}{\sqrt{1 - r^2}}\sqrt{\frac{n_t + n_c - 2}{n_t + n_c}} = \frac{2(-.055)}{\sqrt{1 - (-.055)^2}}\sqrt{\frac{10 + 10 - 2}{10 + 10}} = -.105 \]
Which is the same as originally obtained using the standard formula for d.

Algebraic Equivalent: Raw Data
• Sometimes raw data is tabled as, say,
  – Treatment group N = 10: A = 20%, B = 20%, C = 30%, D = 20%, and F = 10%
  – Comparison group N = 10: A = 10%, B = 20%, C = 20%, D = 30%, and F = 20%
• Create raw data as, say, A = 4, B = 3, C = 2, D = 1, and F = 0
  – treatment group is 4, 4, 3, 3, 2, 2, 2, 1, 1, 0
  – comparison group is 4, 3, 3, 2, 2, 1, 1, 1, 0, 0
• Then d = .377

Good Approximation
• Three-group or higher between-groups oneway ANOVA on posttest scores: group means, sample sizes, and F-statistic
\[ s_p = \sqrt{\frac{\sum_{i=1}^{k} n_i\left(\bar{X}_i - G\right)^2}{(k - 1)F}} \]
where G is the grand mean and k is the number of groups.

Example Data Set IV

               Posttest             Group Mean
Group 1     .00   1.00   3.00         1.333
Group 2    2.00   4.00   5.00         3.667
Group 3    2.00   1.00   1.00         1.333

G = 2.111, F = 3.267

\[ s_p = \sqrt{\frac{3(1.333 - 2.111)^2 + 3(3.667 - 2.111)^2 + 3(1.333 - 2.111)^2}{(3 - 1)3.267}} = 1.291 \]
\[ d = \frac{1.333 - 3.667}{1.291} = -1.81 \]
This is similar but not identical to d = -1.53 using the standard method comparing groups 1 and 2. The difference is due to the different s_p.
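As a rough check on the ANCOVA equivalent (Data Set III) and the three-group approximation (Data Set IV), the two pooled standard deviations can be recomputed as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the code carries r_w at full precision rather than rounding it to .896, so the ANCOVA result lands near 1.527 rather than the 1.524 shown on the slide.

```python
import math


def pooled_sd_from_ancova(ms_error, f_cov, n_total, k):
    """Recover the raw-score pooled SD from an ANCOVA error term.

    r_w^2 is estimated from the covariate F, then the adjusted error
    variance is scaled back up to the unadjusted within-group variance.
    """
    df_error = n_total - k - 1
    r_w_sq = f_cov / (f_cov + df_error)
    return math.sqrt(ms_error / (1 - r_w_sq) * df_error / (n_total - k))


def pooled_sd_from_f(group_means, group_ns, f_stat):
    """Recover sqrt(MS within) from group means, sample sizes, and F."""
    k = len(group_means)
    n_total = sum(group_ns)
    grand_mean = sum(n * m for n, m in zip(group_ns, group_means)) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2
                     for n, m in zip(group_ns, group_means))
    ms_between = ss_between / (k - 1)
    # F = MS_between / MS_within, so MS_within = MS_between / F
    return math.sqrt(ms_between / f_stat)


# Data Set III: ANCOVA error term .611, covariate F 12.273, N = 6, k = 2
sp3 = pooled_sd_from_ancova(0.611, 12.273, n_total=6, k=2)
d3 = (1.333 - 3.667) / sp3

# Data Set IV: three groups of 3, F = 3.267
sp4 = pooled_sd_from_f([1.333, 3.667, 1.333], [3, 3, 3], 3.267)
d4 = (1.333 - 3.667) / sp4

print(round(sp3, 2), round(d3, 2))
print(round(sp4, 3), round(d4, 2))
```

Both results agree with the slide values to about two decimal places; the small ANCOVA discrepancy comes only from when the rounding is applied.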
Good Approximation
• Three-group or higher between-groups oneway ANOVA on raw posttest scores: treatment and comparison group means and mean square error.
• For Data Set IV:
\[ s_p = \sqrt{MS_w} = \sqrt{1.667} = 1.291, \]
so
\[ d = \frac{1.333 - 3.667}{1.291} = -1.81, \]
just as with the previous method.

Good Approximations: Two-Factor RM-ANOVA (groups x time)
• Between-groups mean square error, within-groups mean square error, posttest means, and sample sizes
• F-ratio for groups, F-ratio for time, cell means and sample sizes
• F-ratio for groups, F-ratio for group × time interaction, cell means and sample sizes

Example: Data Set V
This data set is taken from Winer (1972, p. 525). It presents a two-factor model with factor A as a between-subjects factor having two levels, A1 and A2, and factor B as a within-subjects factor having four levels (columns B1 through B4). Each row is one subject. The raw data are:

        B1    B2    B3    B4
A1       0     0     5     3
         3     1     5     4
         4     3     6     2
A2       4     2     7     8
         5     4     6     6
         7     5     8     9

Example: Data Set V RM-ANOVA
Here are the cell means (and sample sizes) for the same data, along with marginals and grand means.

                   B1          B2          B3          B4        Row Marginal
A1              2.33 (3)    1.33 (3)    5.33 (3)    3.00 (3)     3.00 (12)
A2              5.33 (3)    3.67 (3)    7.00 (3)    7.67 (3)     5.917 (12)
Col Marginal    3.83 (6)    2.50 (6)    6.17 (6)    5.33 (6)     4.458 (24) = Grand Mean

Repeated Measures ANOVA Table

Tests of Within-Subjects Effects
Source      Sum of Squares    df    Mean Square      F       Probability
B                47.458        3      15.819       12.798       .000
AB                7.458        3       2.486        2.011       .166
WS Error         14.833       12       1.236

Tests of Between-Subjects Effects
Source      Sum of Squares    df    Mean Square      F       Probability
A                51.042        1      51.042       11.893       .026
BS Error         17.167        4       4.292

Between-groups mean square error, within-groups mean square error, posttest means, and sample sizes: Data Set V
• Assuming Time 4 is the time point of interest (e.g., it is the posttest, or the followup), then, with t the number of time points:
\[ s_p = \sqrt{\frac{MS_{e(ws)}(t - 1) + MS_{e(bs)}}{t}} = \sqrt{\frac{1.24(4 - 1) + 4.29}{4}} = 1.415, \]
so
\[ d = \frac{3.00 - 7.67}{1.415} = -3.30 \]

Methods that underestimate effect size I
• Results reported verbally as "significant", or as p < .05 or < .01 etc., with sample size
• Use the previous method to convert p to t, treating the reported threshold as if it were the exact p, and then use t to compute d as before.
• In Data Set I using p < .05 (two-tailed, df = 18), this method would yield t = -2.101, yielding d = -.939.
• This underestimates d: the exact p is smaller than the reported threshold, and |t| increases as p decreases, so the threshold gives only a lower bound on |t|.
• Be careful to distinguish 1- vs 2-tailed tests.

Methods that underestimate effect size II
• Results reported only as nonsignificant.
• Omitting them from the meta-analysis results in an overestimate of average d.
• A typical solution is to code them as d = 0 (introduces a constant variance problem), but then do sensitivity analyses.
• More sophisticated solutions exist such as maximum likelihood imputation.
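The Data Set V pooled standard deviation and the "p < .05 only" conversion described above can be sketched as follows. The function names are hypothetical, and the two-tailed critical value of 2.101 for df = 18 is entered by hand from a standard t table (an assumption of this sketch) rather than computed, so that the example needs only the Python standard library.

```python
import math


def pooled_sd_rm_anova(ms_error_ws, ms_error_bs, n_times):
    """Pooled SD at a single time point from the two RM-ANOVA error terms."""
    return math.sqrt((ms_error_ws * (n_times - 1) + ms_error_bs) / n_times)


def d_from_t(t, n_t, n_c):
    """d from a between-groups t statistic."""
    return t * math.sqrt(1 / n_t + 1 / n_c)


# Data Set V: within-subjects and between-subjects mean square errors,
# four time points, posttest (Time 4) cell means 3.00 and 7.67
sp5 = pooled_sd_rm_anova(1.236, 4.292, n_times=4)
d5 = (3.00 - 7.67) / sp5

# Result reported only as "p < .05" for two groups of 10 (df = 18).
# Assumed tabled value: two-tailed critical t at .05 with df = 18.
T_CRIT_DF18 = 2.101
d_lower_bound = d_from_t(-T_CRIT_DF18, 10, 10)

print(round(sp5, 3), round(d5, 2))
print(round(d_lower_bound, 3))
```

The last value is a bound rather than an estimate: any exact p below .05 would give a |t| above 2.101 and therefore a larger |d|.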
Discussion
• Many more methods exist.
• The standard error for all but d and its algebraic equivalents is typically unknown.
• Whether to use the approximations or not involves the same tradeoffs as with results reported only as nonsignificant (missing effect sizes vs approximate results).
• When doing a meta-analysis, good practice is to code the effect size calculation method, and then explore its effects on outcome.

Computer Programs
• Lipsey and Wilson's Excel macro (free at http://mason.gmu.edu/~dwilsonb/ma.html)
• ES program (purchase at http://www.assess.com/ES.html)
• For more meta-analytic software, see http://faculty.ucmerced.edu/wshadish/MetaAnalysis%20Links.htm.