Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter 5-18. Correlated Data: Repeated Measures ANOVA Before generalized estimating equations (GEE) and multilevel models became popular, researchers analyzed repeated measures data using repeated measures analysis of variance (repeated measures ANOVA). There is no need to use repeated measures ANOVA anymore, because GEE regression models and multilevel regression models are easier to fit and are not as limited. Still, it is useful to know about repeated measures ANOVA since these models occasionally shown up in published papers (e.g., Tonetti et al, N Engl J Med, 2007). These analyses were the standard approach to repeated measures analysis a few dacades ago, before the new approaches were developed. Isoproterenol Dataset We will use the 11.2.Isoproterenol.dta dataset provided with the Dupont (2002, p.338) textbook, described as, “Lang et al. (1995) studied the effect of isoproterenol, a β-adrenergic agonist, on forearm blood flow in a group of 22 normotensive men. Nine of the study subjects were black and 13 were white. Each subject’s blood flow was measured at baseline and then at escalating doses of isoproterenol.” Reading the data in, File Open Find the directory where you copied the course CD Find the subdirectory datasets & do-files Single click on 11.2.Isoproterenol.dta Open use "C:\Documents and Settings\u0032770.SRVR\Desktop\ regressionclass\datasets & do-files\progesterone.dta", clear which must be all on one line, or use: cd "C:\Documents and Settings\u0032770.SRVR\Desktop\" cd "Biostats & Epi With Stata\datasets & do-files" use 11.2.Isoproterenol.dta, clear _________________ Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah School of Medicine, 2010. Chapter 5-18 (revision 16 May 2010) p. 1 Listing the data, Data Describe data List data Options tab: Display numeric codes rather than label values OK list , nolabel 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. +---------------------------------------------------------------------+ | id race fbf0 fbf10 fbf20 fbf60 fbf150 fbf300 fbf400 | |---------------------------------------------------------------------| | 1 1 1 1.4 6.4 19.1 25 24.6 28 | | 2 1 2.1 2.8 8.3 15.7 21.9 21.7 30.1 | | 3 1 1.1 2.2 5.7 8.2 9.3 12.5 21.6 | | 4 1 2.44 2.9 4.6 13.2 17.3 17.6 19.4 | | 5 1 2.9 3.5 5.7 11.5 14.9 19.7 19.3 | |---------------------------------------------------------------------| | 6 1 4.1 3.7 5.8 19.8 17.7 20.8 30.3 | | 7 1 1.24 1.2 3.3 5.3 5.4 10.1 10.6 | | 8 1 3.1 . . 15.45 . . 31.3 | | 9 1 5.8 8.8 13.2 33.3 38.5 39.8 43.3 | | 10 1 3.9 6.6 9.5 20.2 21.5 30.1 29.6 | |---------------------------------------------------------------------| | 11 1 1.91 1.7 6.3 9.9 12.6 12.7 15.4 | | 12 1 2 2.3 4 8.4 8.3 12.8 16.7 | | 13 1 3.7 3.9 4.7 10.5 14.6 20 21.7 | | 14 2 2.46 2.7 2.54 3.95 4.16 5.1 4.16 | | 15 2 2 1.8 4.22 5.76 7.08 10.92 7.08 | |---------------------------------------------------------------------| | 16 2 2.26 3 2.99 4.07 3.74 4.58 3.74 | | 17 2 1.8 2.9 3.41 4.84 7.05 7.48 7.05 | | 18 2 3.13 4 5.33 7.31 8.81 11.09 8.81 | | 19 2 1.36 2.7 3.05 4 4.1 6.95 4.1 | | 20 2 2.82 2.6 2.63 10.03 9.6 12.65 9.6 | |---------------------------------------------------------------------| | 21 2 1.7 1.6 1.73 2.96 4.17 6.04 4.17 | | 22 2 2.1 1.9 3 4.8 7.4 16.7 21.2 | +---------------------------------------------------------------------+ We see that the data are in wide format, with variables id patient ID (1 to 22) race race (1=white, 2=black) fbf0 forearm blood flow (ml/min/dl) at ioproterenol dose 0 ng/min fbf10 forearm blood flow (ml/min/dl) at ioproterenol dose 10 ng/min … fbf400 forearm blood flow (ml/min/dl) at ioproterenol dose 400 ng/min In this dataset, each of the several occasions represents an increasing dose, so can be thought of as an effect across dose, rather than as an effect across time. Chapter 5-18 (revision 16 May 2010) p. 2 Paired Sample t Test If we are willing to limit the analysis to two repeated measurements, and do not care about covariates, than we can analyze these data with a paired sample t test. Let’s compare the no dose forearm blood flow, fbf0, to the initial dose (10 ng/min), ignoring race for now. Statistics Summaries, tables & tests Classical tests of hypotheses Mean comparison test, paired data First variable: fbf10 Second variable: fbf0 OK ttest fbf10 = fbf0 Paired t test -----------------------------------------------------------------------------Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] ---------+-------------------------------------------------------------------fbf10 | 21 3.057143 .3864103 1.770755 2.251105 3.863181 fbf0 | 21 2.467619 .2532897 1.160719 1.939266 2.995972 ---------+-------------------------------------------------------------------diff | 21 .5895238 .1967903 .9018064 .1790265 1.000021 -----------------------------------------------------------------------------Ho: mean(fbf10 - fbf0) = mean(diff) = 0 Ha: mean(diff) < 0 t = 2.9957 P < t = 0.9964 Ha: mean(diff) != 0 t = 2.9957 P > |t| = 0.0071 Ha: mean(diff) > 0 t = 2.9957 P > t = 0.0036 The paired t test is identically the one sample t test on the absolute change scores (fbf10 – fbf0). To verify this, capture drop diff10 gen diff10 = fbf10-fbf0 list fbf0 fbf10 diff10 in 1/5 1. 2. 3. 4. 5. +-------------------------+ | fbf0 fbf10 diff10 | |-------------------------| | 1 1.4 .4 | | 2.1 2.8 .7 | | 1.1 2.2 1.1 | | 2.44 2.9 .46 | | 2.9 3.5 .5999999 | +-------------------------+ Chapter 5-18 (revision 16 May 2010) p. 3 Statistics Summaries, tables & tests Classical tests of hypotheses One sample mean comparison test Variable name: diff10 Hypothesized mean: 0 OK ttest diff10 = 0 One-sample t test -----------------------------------------------------------------------------Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] ---------+-------------------------------------------------------------------diff10 | 21 .5895238 .1967903 .9018064 .1790265 1.000021 -----------------------------------------------------------------------------Degrees of freedom: 20 Ho: mean(diff10) = 0 Ha: mean < 0 t = 2.9957 P < t = 0.9964 Ha: mean != 0 t = 2.9957 P > |t| = 0.0071 Ha: mean > 0 t = 2.9957 P > t = 0.0036 Comparing this to the paired t test output from above, Paired t test -----------------------------------------------------------------------------Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] ---------+-------------------------------------------------------------------fbf10 | 21 3.057143 .3864103 1.770755 2.251105 3.863181 fbf0 | 21 2.467619 .2532897 1.160719 1.939266 2.995972 ---------+-------------------------------------------------------------------diff | 21 .5895238 .1967903 .9018064 .1790265 1.000021 -----------------------------------------------------------------------------Ho: mean(fbf10 - fbf0) = mean(diff) = 0 Ha: mean(diff) < 0 t = 2.9957 P < t = 0.9964 Ha: mean(diff) != 0 t = 2.9957 P > |t| = 0.0071 Ha: mean(diff) > 0 t = 2.9957 P > t = 0.0036 we see that the t test and p value are identically the same. For both tests, the hypothesis being tested is: H 0 : 0 , where is the population mean difference, estimated by the sample mean difference, d . Chapter 5-18 (revision 16 May 2010) p. 4 The fbf0 and fbf10 variables are correlated. To see this, we can compute the Pearson correlation coefficient, using Statistics Summaries, tables & tests Summary statistics Pairwise correlations Main tab: Variables: fbf0 fbf10 Print number of observatinos for each entry Print significance level for each entry OK pwcorr fbf0 fbf10, obs sig | fbf0 fbf10 -------------+-----------------fbf0 | 1.0000 | | 22 | fbf10 | 0.8927 1.0000 | 0.0000 | 21 21 The paired t test takes the correlation structure of the data into account by being a test on difference scores. The standard error of the difference, by definition, includes the correlation coefficient of the two variables. The formula is (van Belle, 2002, p.61): Var(difference) Var(Yi1 Yi 2 ) 12 22 2 1 2 where ρ is the correlation between the two variables. Let’s verify this: Statistics Summaries, tables & tests Summary statistics Summary statistics Main tab: Variables: fbf0 fbf10 diff10 OK summarize fbf0 fbf10 diff10 Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------fbf0 | 22 2.496364 1.140741 1 5.8 fbf10 | 21 3.057143 1.770755 1.2 8.8 diff10 | 21 .5895238 .9018064 -.3999999 3 Chapter 5-18 (revision 16 May 2010) p. 5 Applying the above formula for the variance of two variables Var(difference) Var(Yi1 Yi 2 ) 12 22 2 1 2 display 1.770755^2+1.160719^2-2*0.8927*1.770755*1.160719 .81322181 Taking the square root to convert this to a standard deviation display sqrt(.81322181) .90178812 which is identically the standard deviation of the difference shown in either t test output, accurate to 4 decimal places. (Accuracy was limited to 4 decimal places since that is all we used for the correlation coefficient in the calculation.) Paired t test -----------------------------------------------------------------------------Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] ---------+-------------------------------------------------------------------fbf10 | 21 3.057143 .3864103 1.770755 2.251105 3.863181 fbf0 | 21 2.467619 .2532897 1.160719 1.939266 2.995972 ---------+-------------------------------------------------------------------diff | 21 .5895238 .1967903 .9018064 .1790265 1.000021 -----------------------------------------------------------------------------Ho: mean(fbf10 - fbf0) = mean(diff) = 0 Ha: mean(diff) < 0 t = 2.9957 P < t = 0.9964 Ha: mean(diff) != 0 t = 2.9957 P > |t| = 0.0071 Ha: mean(diff) > 0 t = 2.9957 P > t = 0.0036 Which, of course, is also identical the standard deviation computed in the summarize command. Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------fbf0 | 22 2.496364 1.140741 1 5.8 fbf10 | 21 3.057143 1.770755 1.2 8.8 diff10 | 21 .5895238 .9018064 -.3999999 3 Chapter 5-18 (revision 16 May 2010) p. 6 Let’s compare this to an independent groups t test on the same data. We will limit the analysis to only those observations which have a non-missing value for both fbf0 and fbf10, so that we will be using identical samples to compare the mean difference between the two versions of the t test. Statistics Summaries, tables & tests Classical tests of hypotheses Two sample mean comparison test Main tab: First variable: fbf10 Second variable: fbf0 by/if/in tab: Restrict to observations: fbf0~=. & fbf10~=. OK ttest fbf10 == fbf0 if fbf0~=. & fbf10~=., unpaired Two-sample t test with equal variances -----------------------------------------------------------------------------Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] ---------+-------------------------------------------------------------------fbf10 | 21 3.057143 .3864103 1.770755 2.251105 3.863181 fbf0 | 21 2.467619 .2532897 1.160719 1.939266 2.995972 ---------+-------------------------------------------------------------------combined | 42 2.762381 .232776 1.508561 2.29228 3.232482 ---------+-------------------------------------------------------------------diff | .5895238 .4620266 -.3442668 1.523314 -----------------------------------------------------------------------------diff = mean(fbf10) - mean(fbf0) t = 1.2760 Ho: diff = 0 degrees of freedom = 40 Ha: diff < 0 Pr(T < t) = 0.8953 Ha: diff != 0 Pr(|T| > |t|) = 0.2093 Ha: diff > 0 Pr(T > t) = 0.1047 Comparing this to the paired t test computed above, Paired t test -----------------------------------------------------------------------------Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] ---------+-------------------------------------------------------------------fbf10 | 21 3.057143 .3864103 1.770755 2.251105 3.863181 fbf0 | 21 2.467619 .2532897 1.160719 1.939266 2.995972 ---------+-------------------------------------------------------------------diff | 21 .5895238 .1967903 .9018064 .1790265 1.000021 -----------------------------------------------------------------------------Ho: mean(fbf10 - fbf0) = mean(diff) = 0 Ha: mean(diff) < 0 t = 2.9957 P < t = 0.9964 Ha: mean(diff) != 0 t = 2.9957 P > |t| = 0.0071 Ha: mean(diff) > 0 t = 2.9957 P > t = 0.0036 First, notice that both approaches produce an identical effect, with the same mean difference. Second, notice the large difference in p values, with significance being lost with the independent samples approach. Chapter 5-18 (revision 16 May 2010) p. 7 For both the independent samples t test and paired t test, the t statistic is computed using t effect standard error of effect where the effect is the difference in means in the first test and the mean of the differences in the second test. The t statistics are thus, display .5895238 / .4620266 display .5895238 / .1967903 1.2759521 2.9956954 which match the t test outputs. The difference, then, is completely due to the way the standard error is computed. For the independent samples t test, which assumes the correlation between the two variables, fbf0 and fbf10, is zero, we have something very close to (not exactly this because a pooled estimate of the variance is used), Var(difference) Var(Yi1 Yi 2 ) 12 22 2 1 2 12 22 2(0) 1 2 12 22 For the paired t test, which estimates the population correlation with the sample correlation coefficient, we have Var(difference) Var(Yi1 Yi 2 ) 12 22 2 1 2 so we get to substract off something to provide a smaller variance, which in turn produces a smaller standard error, which in turn provides a larger t statistic. (We substract something off only in the typical case when the correlation is positive, as discussed in the ANCOVA chapter, chapter 5-16.) Conclusion From This Illustration To correctly model repeated measures data, the model needs to take the correlation structure of the data into account. Chapter 5-18 (revision 16 May 2010) p. 8 Oneway Repeated Measures ANOVA With Two Repeated Measurements We will show that the oneway repeated measures ANOVA is identically the paired t test, when no covariates are included. Thus it extends the paired t test to allow for covariates. Restoring the longitudinal data in wide format and listing it, use 11.2.Isoproterenol.dta, clear list , nolabel 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. +---------------------------------------------------------------------+ | id race fbf0 fbf10 fbf20 fbf60 fbf150 fbf300 fbf400 | |---------------------------------------------------------------------| | 1 1 1 1.4 6.4 19.1 25 24.6 28 | | 2 1 2.1 2.8 8.3 15.7 21.9 21.7 30.1 | | 3 1 1.1 2.2 5.7 8.2 9.3 12.5 21.6 | | 4 1 2.44 2.9 4.6 13.2 17.3 17.6 19.4 | | 5 1 2.9 3.5 5.7 11.5 14.9 19.7 19.3 | |---------------------------------------------------------------------| | 6 1 4.1 3.7 5.8 19.8 17.7 20.8 30.3 | | 7 1 1.24 1.2 3.3 5.3 5.4 10.1 10.6 | | 8 1 3.1 . . 15.45 . . 31.3 | | 9 1 5.8 8.8 13.2 33.3 38.5 39.8 43.3 | | 10 1 3.9 6.6 9.5 20.2 21.5 30.1 29.6 | |---------------------------------------------------------------------| | 11 1 1.91 1.7 6.3 9.9 12.6 12.7 15.4 | | 12 1 2 2.3 4 8.4 8.3 12.8 16.7 | | 13 1 3.7 3.9 4.7 10.5 14.6 20 21.7 | | 14 2 2.46 2.7 2.54 3.95 4.16 5.1 4.16 | | 15 2 2 1.8 4.22 5.76 7.08 10.92 7.08 | |---------------------------------------------------------------------| | 16 2 2.26 3 2.99 4.07 3.74 4.58 3.74 | | 17 2 1.8 2.9 3.41 4.84 7.05 7.48 7.05 | | 18 2 3.13 4 5.33 7.31 8.81 11.09 8.81 | | 19 2 1.36 2.7 3.05 4 4.1 6.95 4.1 | | 20 2 2.82 2.6 2.63 10.03 9.6 12.65 9.6 | |---------------------------------------------------------------------| | 21 2 1.7 1.6 1.73 2.96 4.17 6.04 4.17 | | 22 2 2.1 1.9 3 4.8 7.4 16.7 21.2 | +---------------------------------------------------------------------+ We discarded the difference score between fbf0 and fbf10, which we no longer need. To fit a repeated measures ANOVA, the data must first be converted to long format. Data Create or change variables Other variable transformation commands Convert data between wide and long Main tab: long format from wide ID variable(s) – the i() option: id Subobservation identifier variable – the j() option: dose Base (stub) names of X_ij variables: fbf OK reshape long fbf , i(id) j(dose) Chapter 5-18 (revision 16 May 2010) p. 9 Listing the data for the first two subjects, with the separator line drawn between subjects list if id<=2 , sepby(id) 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. +--------------------------+ | id dose race fbf | |--------------------------| | 1 0 White 1 | | 1 10 White 1.4 | | 1 20 White 6.4 | | 1 60 White 19.1 | | 1 150 White 25 | | 1 300 White 24.6 | | 1 400 White 28 | |--------------------------| | 2 0 White 2.1 | | 2 10 White 2.8 | | 2 20 White 8.3 | | 2 60 White 15.7 | | 2 150 White 21.9 | | 2 300 White 21.7 | | 2 400 White 30.1 | +--------------------------+ Computing a repeated measures ANOVA on the first two repeated measurements (doses 0 and 10), Statistics Linear models and related ANOVA Analysis of variance and covariance Model tab: Dependent variable: fbf Model: id dose Model variables: all categorical Repeated measures variables: dose by/if/in tab: If (expression): dose==0 | dose==10 OK anova fbf id dose if dose==0 | dose==10 , repeated(dose) Chapter 5-18 (revision 16 May 2010) p. 10 Number of obs = 43 Root MSE = .637673 R-squared = Adj R-squared = 0.9129 0.8172 Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------Model | 85.2847538 22 3.87657972 9.53 0.0000 | id | 81.9059936 21 3.90028541 9.59 0.0000 dose | 3.64915245 1 3.64915245 8.97 0.0071 | Residual | 8.13254708 20 .406627354 -----------+---------------------------------------------------Total | 93.4173008 42 2.22422145 Between-subjects error term: Levels: Lowest b.s.e. variable: id 22 id (21 df) Repeated variable: dose Huynh-Feldt epsilon = Greenhouse-Geisser epsilon = Box's conservative epsilon = 1.0000 1.0000 1.0000 ------------ Prob > F -----------Source | df F Regular H-F G-G Box -----------+---------------------------------------------------dose | 1 8.97 0.0071 0.0071 0.0071 0.0071 Residual | 20 -----------+---------------------------------------------------- Comparing this to the paired t test computed above, we see that the term in the anova table for dose has an identical p value to the paired t test. Paired t test -----------------------------------------------------------------------------Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] ---------+-------------------------------------------------------------------fbf10 | 21 3.057143 .3864103 1.770755 2.251105 3.863181 fbf0 | 21 2.467619 .2532897 1.160719 1.939266 2.995972 ---------+-------------------------------------------------------------------diff | 21 .5895238 .1967903 .9018064 .1790265 1.000021 -----------------------------------------------------------------------------Ho: mean(fbf10 - fbf0) = mean(diff) = 0 Ha: mean(diff) < 0 t = 2.9957 P < t = 0.9964 Ha: mean(diff) != 0 t = 2.9957 P > |t| = 0.0071 Ha: mean(diff) > 0 t = 2.9957 P > t = 0.0036 The id term is just ignored in this output. Ignore the second table for now, which is not useful until more than two repeated measurements are included in the model. That table will be explained below. Chapter 5-18 (revision 16 May 2010) p. 11 To see the underlying regression model that the anova fitted, we can use, anova, regress regress // Stata version 10 // Stata version 11 Source | SS df MS -------------+-----------------------------Model | 85.2847538 22 3.87657972 Residual | 8.13254708 20 .406627354 -------------+-----------------------------Total | 93.4173008 42 2.22422145 Number of obs F( 22, 20) Prob > F R-squared Adj R-squared Root MSE = = = = = = 43 9.53 0.0000 0.9129 0.8172 .63767 -----------------------------------------------------------------------------fbf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------id | 2 | 1.25 .6376734 1.96 0.064 -.0801634 2.580163 3 | .45 .6376734 0.71 0.489 -.8801633 1.780163 4 | 1.47 .6376734 2.31 0.032 .1398367 2.800163 5 | 2 .6376734 3.14 0.005 .6698367 3.330163 6 | 2.7 .6376734 4.23 0.000 1.369837 4.030163 7 | .02 .6376734 0.03 0.975 -1.310163 1.350163 8 | 2.194762 .7871611 2.79 0.011 .5527725 3.836751 9 | 6.1 .6376734 9.57 0.000 4.769837 7.430164 10 | 4.05 .6376734 6.35 0.000 2.719837 5.380163 11 | .605 .6376734 0.95 0.354 -.7251634 1.935163 12 | .95 .6376734 1.49 0.152 -.3801634 2.280163 13 | 2.6 .6376734 4.08 0.001 1.269837 3.930163 14 | 1.38 .6376734 2.16 0.043 .0498367 2.710163 15 | .7 .6376734 1.10 0.285 -.6301634 2.030163 16 | 1.43 .6376734 2.24 0.036 .0998366 2.760163 17 | 1.15 .6376734 1.80 0.086 -.1801634 2.480163 18 | 2.365 .6376734 3.71 0.001 1.034837 3.695163 19 | .83 .6376734 1.30 0.208 -.5001633 2.160163 20 | 1.51 .6376734 2.37 0.028 .1798365 2.840163 21 | .45 .6376734 0.71 0.489 -.8801633 1.780163 22 | .8 .6376734 1.25 0.224 -.5301634 2.130163 | 10.dose | .5895238 .1967903 3.00 0.007 .1790265 1.000021 _cons | .9052381 .4615141 1.96 0.064 -.0574635 1.86794 Comparing this to the paired t test computed above, we see that the term in the anova table for dose has an identical p value to the paired t test, and identical standard error, and the same effect size (mean difference) . Paired t test -----------------------------------------------------------------------------Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] ---------+-------------------------------------------------------------------fbf10 | 21 3.057143 .3864103 1.770755 2.251105 3.863181 fbf0 | 21 2.467619 .2532897 1.160719 1.939266 2.995972 ---------+-------------------------------------------------------------------diff | 21 .5895238 .1967903 .9018064 .1790265 1.000021 -----------------------------------------------------------------------------Ho: mean(fbf10 - fbf0) = mean(diff) = 0 Ha: mean(diff) < 0 t = 2.9957 P < t = 0.9964 Chapter 5-18 (revision 16 May 2010) Ha: mean(diff) != 0 t = 2.9957 P > |t| = 0.0071 Ha: mean(diff) > 0 t = 2.9957 P > t = 0.0036 p. 12 We discover that the model created an indicator term for each subject, which makes the dose comparison a “stratified” analysis by subject. Recall from the chapter on conditional logistic regression, that stratifying by the matching variable (subject in this case) inflated the odds ratio (the effect measure). In least squares estimation, such as anova and linear regression, the effect measure, or mean difference is not inflated. However, the model is still subject to overfitting, since each indicator term for subject counts one toward the N/10 = # allowed predictors rule to avoid overfitting. Clearly, this rule is violated in this example. From the output, we see that R-squared = 0.9129, which is R = 0.9555. The Pearson correlation calculated above between dose 0 and dose 10 was r = 0.8927. Thus the R is inflated due to overfitting. The adjusted R-squared = 0.8172, which is R = 0.9040, did a nice job of removing the effect of overfitting. Chapter 5-18 (revision 16 May 2010) p. 13 Two-way Repeated Measures ANOVA With One Repeated Measurements Factor and One Between Groups Factor. Next, we will extend this paired t test to allow for a between subjects covariate, by including race in the model. That is, we are interested to know if the change from 0 dose to dose 10 differs between blacks and whites. This could be described as “a twoway repeated measures ANOVA with one repeated measures factor and one between groups factor.” Computing the ANOVA, anova fbf race if dose==0 | * anova fbf race if dose==0 | / id|race dose dose*race /// dose==10 ,repeated(dose) // Stata version 10 / id|race dose dose#race /// dose==10 ,repeated(dose) // Stata version 11 Number of obs = Root MSE = 43 .64235 R-squared = Adj R-squared = 0.9161 0.8145 Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------Model | 85.5776555 23 3.72076763 9.02 0.0000 | race | 5.07251403 1 5.07251403 1.32 0.2648 id|race | 77.0659916 20 3.85329958 -----------+---------------------------------------------------dose | 3.28830185 1 3.28830185 7.97 0.0109 dose#race | .292901785 1 .292901785 0.71 0.4100 | Residual | 7.8396453 19 .41261291 -----------+---------------------------------------------------Total | 93.4173008 42 2.22422145 Between-subjects error term: Levels: Lowest b.s.e. variable: Covariance pooled over: id|race 22 id race (20 df) (for repeated variable) Repeated variable: dose Huynh-Feldt epsilon *Huynh-Feldt epsilon reset Greenhouse-Geisser epsilon Box's conservative epsilon = to = = 1.0526 1.0000 1.0000 1.0000 ------------ Prob > F -----------Source | df F Regular H-F G-G Box -----------+---------------------------------------------------dose | 1 7.97 0.0109 0.0109 0.0109 0.0109 dose#race | 1 0.71 0.4100 0.4100 0.4100 0.4100 Residual | 19 ---------------------------------------------------------------- We’ll put off the interpretation of the race and dose terms in this model, saving that for below. The dose × race interaction term, however, has a very intuitive interpretation. It tells us that blacks change differently than whites from no dose to initial dose (dose of 10). Chapter 5-18 (revision 16 May 2010) p. 14 In fact, for the two repeated measurements case, this interaction term is identically an independent groups t test on the change score. Let’s verify this. The change score is easier to compute in wide format, so we’ll first return to that reshape wide fbf , i(id) j(dose) list in 1/5 1. 2. 3. 4. 5. +----------------------------------------------------------------------+ | id fbf0 fbf10 fbf20 fbf60 fbf150 fbf300 fbf400 race | |----------------------------------------------------------------------| | 1 1 1.4 6.4 19.1 25 24.6 28 White | | 2 2.1 2.8 8.3 15.7 21.9 21.7 30.1 White | | 3 1.1 2.2 5.7 8.2 9.3 12.5 21.6 White | | 4 2.44 2.9 4.6 13.2 17.3 17.6 19.4 White | | 5 2.9 3.5 5.7 11.5 14.9 19.7 19.3 White | +----------------------------------------------------------------------+ Then, computing the change score, capture drop diff10 gen diff10 = fbf10-fbf0 list fbf0 fbf10 diff10 in 1/5 1. 2. 3. 4. 5. +-------------------------+ | fbf0 fbf10 diff10 | |-------------------------| | 1 1.4 .4 | | 2.1 2.8 .7 | | 1.1 2.2 1.1 | | 2.44 2.9 .46 | | 2.9 3.5 .5999999 | +-------------------------+ Finally, computing the independent groups t test on the change score, ttest diff10, by(race) Two-sample t test with equal variances -----------------------------------------------------------------------------Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] ---------+-------------------------------------------------------------------White | 12 .7341667 .3088259 1.069804 .0544455 1.413888 Black | 9 .3966667 .2071634 .6214902 -.081053 .8743863 ---------+-------------------------------------------------------------------combined | 21 .5895238 .1967903 .9018064 .1790265 1.000021 ---------+-------------------------------------------------------------------diff | .3375 .4005753 -.5009138 1.175914 -----------------------------------------------------------------------------diff = mean(White) - mean(Black) t = 0.8425 Ho: diff = 0 degrees of freedom = 19 Ha: diff < 0 Pr(T < t) = 0.7950 Ha: diff != 0 Pr(|T| > |t|) = 0.4100 Ha: diff > 0 Pr(T > t) = 0.2050 We now see that the p value is identical to the dose × race interaction term p value, which verifies that the interaction term is identically an independent groups t test on the change score. Chapter 5-18 (revision 16 May 2010) p. 15 Repeated Measures ANOVA With More Than Two Measurements In the Repeated Measurements Factor Now, let’s use all dose levels. We reshape the data to long format and leave off the if statement to allow all doses to be included. reshape long fbf , i(id) j(dose) anova fbf race * anova fbf race / id|race dose dose*race ,repeated(dose) // Version 10 / id|race dose dose#race ,repeated(dose) // Version 11 Number of obs = 150 Root MSE = 3.33212 R-squared = Adj R-squared = 0.8915 0.8606 Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------Model | 10580.8603 33 320.632131 28.88 0.0000 | race | 2093.30331 1 2093.30331 16.44 0.0006 id|race | 2546.14177 20 127.307089 -----------+---------------------------------------------------dose | 3898.97947 6 649.829911 58.53 0.0000 dose#race | 1164.13439 6 194.022398 17.47 0.0000 | Residual | 1287.95004 116 11.1030176 -----------+---------------------------------------------------Total | 11868.8104 149 79.6564455 Between-subjects error term: Levels: Lowest b.s.e. variable: Covariance pooled over: id|race 22 id race (20 df) (for repeated variable) Repeated variable: dose Huynh-Feldt epsilon = Greenhouse-Geisser epsilon = Box's conservative epsilon = 0.3092 0.2732 0.1667 ------------ Prob > F -----------Source | df F Regular H-F G-G Box -----------+---------------------------------------------------dose | 6 58.53 0.0000 0.0000 0.0000 0.0000 dose#race | 6 17.47 0.0000 0.0000 0.0000 0.0005 Residual | 116 ---------------------------------------------------------------- When you have more than two time points, you should always make a correction to the p value of a repeated measures ANOVA to account for departures from “sphericity”. Sphericity is a complex form of the analogous “equal variances” assumption of the independent groups t test and ANOVA. The farther the departure from this assumption being justified, the greater the difference of the sphericity corrected and non-corrected p values. This adjustment is reported in the second table. Chapter 5-18 (revision 16 May 2010) p. 16 Each of the three methods adjust the degrees of freedom by computing an epsilon which is then multiplied by the unadjusted degrees of freedom—instead of (t-1) and (n-1)(t-1) degrees of freedom, the adjusted degrees of freedom are (t-1) and (n-1)(t-1) degrees of freedom The most widely used adjustment is the Greenhouse-Geisser adjustment. (Twisk, 2003, p.25) You’ll notice in the case of the two repeated measurements above, the three epilsons were 1, so no adjustment took place. In this second example with more than two repeated measurements, the epsilons are less than 1, reducing the degrees of freedom, and making it more difficult to get a significant p value. Twisk (2003, p.25) describes the sphericity assumption as, “This assumption is also known as the ‘compound symmetry’ assumption. It applies, firstly, when all correlations in outcome variable Y between repeated measurements are equal, irrespective of the time interval between the measurements. Secondly, the variances of otucome variable Y must be the same at each of the repeated measurements.” SPSS describes the sphericity assumption in its repeated measures output as, “the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix”. The best approach for reporting is just to assume that those who know about it are familiar with the term, and those who don’t do not want to know. You could describe the analysis as: Statistical Methods The repeated measurements of forearm blood flow across the of isoproterenol dose levels were analyzed using a two-way repeated measures analysis of variance, with the Greenhouse-Geisser adjustment to the p values to account for any violation of the sphericity assumption. The repeated measures factor was dose level and the between groups factor was race. Example Tonetti et al (N Engl J Med, 2007) reported a repeated measures analysis of variance with a Greenhouse-Geisser adjustment to the F-test, as well as a time × group interaction term to test the hypot. This is how they stated it in their Statistical Methods section, “We performed a repeated-measures analysis of variance to determine differences in flow-mediated dilatation (the primary outcome) and all secondary and other outcomes between the two groups and over time, using a conservative F-test for the interaction between time and treatment group (SPSS software version 13). The Greenhouse-Geisser correction for the F test was used to adjust the degrees of freedom for deviations form sphericity (one of the assumptions of repeated-measures analysis of variance [ANOVA].” Chapter 5-18 (revision 16 May 2010) p. 17 Repeated measures data are frequently displayed using profile plots, which is just a graph of the means across time. Here is such a plot with 95% confidence intervals. * -- profile plot use 11.2.Isoproterenol.dta, clear // wide format #delimit ; collapse (mean) mfbf0=fbf0 mfbf10=fbf10 mfbf20=fbf20 mfbf60=fbf60 mfbf150=fbf150 mfbf300=fbf300 mfbf400=fbf400 (sd) sfbf0=fbf0 sfbf10=fbf10 sfbf20=fbf20 sfbf60=fbf60 sfbf150=fbf150 sfbf300=fbf300 sfbf400=fbf400 (count) nfbf0=fbf0 nfbf10=fbf10 nfbf20=fbf20 nfbf60=fbf60 nfbf150=fbf150 nfbf300=fbf300 nfbf400=fbf400 , by(race); #delimit cr foreach i of numlist 0 10 20 60 150 300 400 { gen fbflcl`i'=mfbf`i'-invttail(nfbf`i'-1,0.025)*sfbf`i'/sqrt(nfbf`i') gen fbfucl`i'=mfbf`i'+invttail(nfbf`i'-1,0.025)*sfbf`i'/sqrt(nfbf`i') } reshape long mfbf sfbf nfbf fbflcl fbfucl, i(race) j(dose) list capture drop dose2 gen dose2=dose replace dose2=dose2+5 if race==2 order dose dose2 #delimit ; twoway (scatter mfbf dose2 if race==1 , connect(direct) clpattern(solid)clcolor(blue) clwidth(medium) msymbol(square) mlcolor(blue) mfcolor(blue) msize(medium)) (scatter mfbf dose2 if race==2 , connect(direct) clpattern(solid)clcolor(green) clwidth(medium) msymbol(triangle) mlcolor(green) mfcolor(green) msize(medium)) (rcap fbfucl fbflcl dose2 if race==1 , blcolor(blue) blwidth(medium) blpattern(solid) ) (rcap fbfucl fbflcl dose2 if race==2 , blcolor(green) blwidth(medium) blpattern(solid) ) , xlabel(0 10 20 60 150 300 400) xtitle("Isoproterenol dose (ng/min)", height(5)) ylabel(0(5)25,angle(horizontal)) ytitle("Mean Forearm Blood Flow (ml/min/dl)") legend(order(1 "White" 2 "Black")) note("Shown are means and 95% confidence intervals") ; #delimit cr Chapter 5-18 (revision 16 May 2010) p. 18 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. +---------------------------------------------------------------------+ | dose dose2 race mfbf sfbf nfbf fbflcl fbfucl | |---------------------------------------------------------------------| | 0 0 White 2.7146 1.3943 13 1.872018 3.557212 | | 10 10 White 3.4167 2.2307 12 1.999342 4.833992 | | 20 20 White 6.4583 2.7348 12 4.720738 8.19593 | | 60 60 White 14.658 7.3405 13 10.22189 19.09349 | | 150 150 White 17.25 8.8903 12 11.60138 22.89862 | |---------------------------------------------------------------------| | 300 300 White 20.2 8.4396 12 14.83772 25.56228 | | 400 400 White 24.408 8.7034 13 19.14828 29.6671 | | 0 5 Black 2.1811 .5571 9 1.752886 2.609337 | | 10 15 Black 2.5778 .73786 9 2.010605 3.144951 | | 20 25 Black 3.2111 1.0397 9 2.411905 4.010317 | |---------------------------------------------------------------------| | 60 65 Black 5.3022 2.1676 9 3.636074 6.96837 | | 150 155 Black 6.2344 2.2373 9 4.514739 7.95415 | | 300 305 Black 9.0567 4.0413 9 5.950222 12.16311 | | 400 405 Black 7.7678 5.4942 9 3.544545 11.99101 | +---------------------------------------------------------------------+ Mean Forearm Blood Flow (ml/min/dl) 25 20 15 10 5 0 01020 60 150 Isoproterenol dose (ng/min) (ng/min)White 300 400 Black Shown are means and 95% confidence intervals Exercise Look at the profile plots in the Tonetti et al (N Engl J Med, 2007) paper. Their plots are just like this, only they chose to use standard errors, rather than confidence intervals. Using the standard errors is just as common as using the confidence intervals; however, standard errors mislead the reader into thinking the separation of the two lines is greater than it really is. Chapter 5-18 (revision 16 May 2010) p. 19 Overlapping Confidence Intervals When the confidence intervals around statistics, such as group means, do not overlap, this indicates that the difference between the means is statistically significant. However, the converse is not always true. (see box). Overlapping Confidence Intervals Do Not Imply Nonsignificance van Belle (2002, p.39) explains, “It is sometimes claimed that if two independent statistics have overlapping confidence intervals, then they are not significantly different. This is certainly true if there is substantial overlap. However, the overlap can be surprisingly large and the means still significantly different. … Confidence intervals associated with statistics can overlap as much as 29% and the statistics can still be significantly different.” Chapter 5-18 (revision 16 May 2010) p. 20 ____________________________________________________________________________ Interpretation (a bit advanced, but might be interesting to MStat students) Just what do the results mean? The repeated measures ANOVA we have seen is called the “univariate approach”. There is also a “multivariate approach”, called MANOVA. The MANOVA approach compares each repeated measure to the preceding repeated measure, and so has an intuitive interpretation. However, the univariate approach is more powerful and thus more popular. The univariate approach is simply a sums of squares approach. The sums of squares for the between factor (between whites and blacks) is computed as (Twisk, 2003, p.26-27): T SSb N ( yt y ) 2 t 1 where N is the number of subjects, T is the number of repeated measurements, yt is the average of outcome variable Y at time-point t, and y is the overall average of outcome variable Y. The sums of squares for the within subjects, or repeated measures, factor is computed as: T N SS w ( yit yt )2 t 1 i 1 where N is the number of subjects, T is the number of repeated measurements, yit is the value of outcome variable Y for individual i at time-point t, and yt is the average value of outcome variable Y at time-point t. The within subjects sums of squares, then, is simply a way to determine if all the subjects’ scores are equal across the dose levels (all on a horizontal line). The between groups sums of squares, then, is simply a way to determine if subjects’ scores across dose level in one group are different from the other group. The between groups factor, race in this example, may be significant simply because the baseline repeated measure was different, forearm blood flow for dose = 0 in this example. Computing change scores from baseline is one way to adjust for this. However, it is generally only the interaction term that is of interest, which does not require equal baseline values. _____________________________________________________________________________ Chapter 5-18 (revision 16 May 2010) p. 21 Limitations of the Repeated Measures ANOVA Approach This approach does not use any information about the spacing between repeated measurements. Thus it does not take into account the unequal intervals in the dose in this example, treating them the same as if they were incremented by a constant amount. Secondly, this approach always uses a listwise deletion of missing data, so a subject is lost if any time point is missing. The two methods will consider next, GEE and multilevel (mixed effects) models, overcome both limitations. For that reason, and because these other methdos are easier to use, there really is no need to analyze your data with repeated measures ANOVA. References Dupont WD. (2002). Statistical Modeling for Biomedical Researchers: a Simple Introduction to the Analysis of Complex Data. Cambridge, Cambridge University Press. Fleiss JL. (1986). The Design and Analysis of Clinical Experiments. New York, John Wiley & Sons. Frison L, Pocock S. (1992). Repeated measures in clinical trials: analysis using mean summary statistics and its implications for design. Statistics in Medicine 11:1685-1704. Lang CC, Stein CM, Brown RM, et al. (1995). Attenuation of isoproterenol-mediated vasodilation in blacks. N Engl J Med 333:155-60. Tonetti MS, D’Aiuto F, Nibali L et al. (2007). Treatment of periodontitis and endothelial function. N Engl J Med 356(9):911-20. Twisk JWR. (2003). Applied Longitudinal Data Analysis for Epidemiology: A Practical Guide. Cambridge, Cambridge University Press. van Belle G. (2002). Statistical Rules of Thumb. New York, John Wiley & Sons. Chapter 5-18 (revision 16 May 2010) p. 22