* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Modeling Variation in Repeated Measures Data
Adherence (medicine) wikipedia , lookup
Compounding wikipedia , lookup
Polysubstance dependence wikipedia , lookup
Pharmacognosy wikipedia , lookup
Neuropharmacology wikipedia , lookup
Pharmaceutical industry wikipedia , lookup
Electronic prescribing wikipedia , lookup
Prescription drug prices in the United States wikipedia , lookup
Prescription costs wikipedia , lookup
Drug interaction wikipedia , lookup
Drug discovery wikipedia , lookup
Drug design wikipedia , lookup
Modeling Variation in Repeated Measures Data Ramon C. Littell, Department of Statistics, University of Florida ABSTRACT through FEV1SH. The second data set, named FEV1 UNI, contained data in a univariate mode with variables DRUG, PATIENT, HR, BASEFEV1, and FEV1. Treatment means are plotted versus HR in Figure 1. The graph shows that means for the three treatment groups are essentially the same at HR=O (baseline). At. HR=l the mean for drug B is larger than the mean for drug A, and both of the drug means are much larger than the placebo mean. Means for drugs A and B continue to be larger than the placebo means for subsequent hours, but the magnitudes of the differences decrease sharply with time. It is of interest to estimate differences between the treatment group means at various times, and to estimate differences between means for the same treatment at different times. Statistical analysis of repeated measures data has been a problem because of covariation among measurements on the same experimental unit. Until recently, analysis techniques available in computer software only permitted the user to ignore or avoid covariance structure rather than incorporate it into the statistical model. Ignoring covariance structure results in erroneous inference and avoiding it results in suboptimal inference. The MIXED procedure provides a rich selection of modeling structures through the RANDOM and REPEATED statements. Examples will illustrate how to choose a covariance structure, and the effects on significance tests and confidence intervals. INTRODUCTION Mixed linear statistical models state that observed data consist of two parts, fixed effects and random effects. Fixed effects define the mean of the population from which the observation was drawn, and the random effects define the variance of the observation and its covariance with other observations. Mixed linear models are often used with repeated measures data to accommodate the covariation between observations on the same subject at different times. PROC MIXED in the SAS System provides a rich selection of covariance structure from which to choose. Most discussion in the literature has focused on the imporlance of proper covariance modeling in order to obtain valid F tests for testing time effects and treatment-by-time interaction effects. In the present paper we examine the effects of choice of covariance structure on estimates of treatment means at various times, and on standard errors of differences between treatment means. 3.8 • Drug A Drug B 3.6 Drug P ~~~ 3,4 :; w 3 .2 u- 3 2.8 2.6 0 1 2 3 4 5 6 7 8 HR Figure 1. FEV1 means for three patient groups at hourly intervals following treatment. EXAMPLE DATA SET An example from the pharmaceutical industry, which compared the effects of two drugs and a placebo on a measure of respiratory ability, called FEV1, will be used to illustrate covariance modeling. Twenty-four patients were assigned to each of the treatment groups, and FEV1 was measured at baseline Ommediately prior to administration of the drugs), and at hourly intervals thereafter for eight hours. Data were analyzed using PROC GLM and PROC MIXED. Two SAS data sets were created. The first data set, named FEV1 MULT, contained data in a multivariate mode and had variables DRUG, PATIENT, BASEFEV1, and FEV11H COMPARISON OF COVARIANCE STRUCTURES We first use PROC GLM with a REPEATED statement to examine the covariance structure in the data. The variables FEV1-FEVS are regarded as repeated measures and the variable BASEFEV1 is considered a baseline covariable. Run the statements proc glm data=fev1 mult ; class drug; model fev1_1h-fev1_Sh = fev_O drug; 1143 contrast 'trt vs co nt' drug 1 1 -2; contrast 'drga vs drgb' drug 1 -1 0; repeated time contrast I summary printe; run; The within-subject correlation matrix is printed because of the PRINTE option, and is shown in Table 1. The correlations between FEVl 1 and FEVl 2-FEVl 8 are in the first row of the matrbc. Correlations generally decrease from 0.893 with FEV1_2 down to 0.642 with FEV1_8. Similar decreases are found between FEVl 2 and FEV1 3-FEVl 8, between FEV1 3 and FEV1-4-FEV1 8, etc. In short, correlations beiween pairs -of FEV1 measurements decrease with the number of hours between the times the measurements were obtained. This is a common phenomenon with repeated measures data. As a consequence, a univariate analysis of variance is likely not appropriate. A formal test of whether the covariance structure meets the necessary assumptions (called the Huhyn-Feldt conditions) for univariate ANOVA is given by the test for sphericily applied to orthogonal components. The approximate chi-square form of the test has the value 183.98 with 27 degrees of freedom, and has p-value equal to 0.0001. According to this test, the conditions are not met. Thus another type of analysis must be used. 3. autoregressive COV(Yijk,Yij ,) = 02plk~1 4. autoregressive with random effect for patient COV(Yijk'Yij,) = of+~plk~1 5. unstructured Models with these covariance structures are implemented with the respective sets of statements: 1. 2. proc mixed data=fev1 uni; class drug patient time; model fev1 = fev1_0 drug time drug'time; repeated I type = cs sub = patient r rcorr; run; 3. proc mixed data=fev1 uni; class drug patient time; model fev1 = fev1_0 drug time drug'time; repeated I type = ar(l) sub = patient r rcorr; run; 4. proc mixed data=fevl uni; class drug patient time; model fev1 = fevl_0 drug time drug'time; random patient; repeated I type = ar(1) sub = patient r rcorr; run; 5. proc mixed data=fevl uni; class drug patient time; model fevl = fevl_0 drug time drug'time; repeated I type = un sub = patient r rcorr; run; Table 1. Partial Correlations of FEVl at8 Times from REPEATED Statement in PROC GLM Time 1 1 Z d ~ § § 1.0 .894 .879 .783 .691 .671 I §. .511 .642 2 .894 3 .879 .908 4 .783 .877 .917 5 .691 .807 .816 .835 6 .671 .896 .743 .734 .859 7 .511 .587 .639 .667 .736 .810 8 .642 .702 .741 .751 .856 .881 .818 1.0 .908 .877 .807 .896 .587 .702 Rather than display results for these statements in sequence, we display comparable parts in groups. First, parameter estimates in the covariance matrices for the various structures (excepting "unstructured,,) are: 1.0 .917 .816 .743 .639 .741 1.0 .835 .734 .667 .751 1.0 .859 .736 .856 1.0 .810 .881 1.0 .818 1.0 simple COV(Yijk,Yij,) (;2 _ 0.267 2. compound symmetric ~ 3. _ 0.206 autoregressive t;2 _ 0.266 P4. autoregressive with ~ 0.856 _ 0.185 random effect for 0; _0.083 patients p _ 0.540 = 0 if k • ! 5. =(J2ifk=! 2. compound symmetric 1. simple 0; - 0.063 We now turn to PROC MIXED for analyses which accommodate structures defined on the covariance matrix. Define Yrk to be FEVl measured at HR = k on PATIENT = j in DRUG = i. The structures in discussion pertain to covariances among measures at different hours on the same patient. Measures on different patients are considered independent in all cases. We consider five covariance structures, defined as follows: 1. proc mixed data=fev1 uni; class drug patient time; model fevl = fev1_0 drug time drug'time; repeated I type = simple sub = patient r rcorr; COV(Yijk,Yij,) = of if k • e = ~+~ifk=e 1144 unstructured (parameter estimates shown in Table 2.) The covariance matrices resulting from the "R" and "RCORR" options in the repeated statements are printed in Table 2. These correlation matrices, except for structure 4, "autoregressive with random effect for patient," are directly comparable with the correlation matrix in Table 1. Ignore structure 4 momentarily and compare the other correlation matrices in Table 2 with the correlation matrix in Table 1. Structures 1, "simple," and 2, "compound symmebic," clearly do not reflect the trends in Table 1. Structure 3, "autoregressive," has the general trend of correlations decreasing with length of time interval, but the values of the correlations in the autoregressive structure are too small, especially for long intervals. Thus none of the first three correlation structures appear to adequately model the correlation structure of the data. Structure 5, "unstructured," shows correlations which are very similar to the partial correlations in Table 1. The "unstructured" type is adequate, and would, in fact, be quite satisfactory for this example. Computing time can by excessive with a large number of times. The structure must be modeled by fewer parameters in order to be useful with a small number of patients. 5. Unstructured 1 .891 ;l ~ .2 § Z § .267 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ~ .2 § Z ~ .259 .233 .243 .220 .181 .156 .195 .880 .908 .254 .252 .219 .191 .168 .204 .784 .892 .915 .299 .240 .204 .190 .226 .688 .807 .813 .822 .286 .232 .204 .247 .675 .698 .745 .735 .855 .258 .214 .245 .516 .590 .643 .670 .733 .812 .270 .233 .646 .701 .742 .755 .845 .882 .820 .299 (Vanances on diagonal, CQII'artances above diagonal, corn:latlons below diagonal) Now turn to structure 4, "autoregressive with random effect for patient." As noted above, the covariance estimates in Table 2 for structure 4 do not produce correlation estimates which are directly comparable with the correlation structure in Table 1 that we are trying to model. The problem is that covariances in Table 2 for structure 4 do not contain the patient variance component, ~. In order to obtain correlation estimates for structure 4 which are comparable to correlations in Table 1, we must add the patient variance component estimate equal to 0.185 to each covariance. Doing so results in the covariance matrix and corresponding correlation matrix in Table 3. Inspection of Table 3 shows good agreement with correlations in Table 1. 1. Simple ~ ;l .226 .216 .211 .204 .175 .163 .128 .168 Table 2. Covariances and Correlations from R and RCORR Options on REPEATED Statement in PROC MIXED for Five Covariance Structures 1 ~ (CoYatlance aM CQY<Inances In top line, COtfelations In bottQm line) Table 3. Covariances and Correlations for Covariance Structure 4, "Autoregressive with Random Effect for Patient" 2. Compound Symmetric 1 ~ ;l ~ .2 § Z § .269 .206 .206 .206 .206 .206 .206 .206 1 1.0 .766 .766 .766 .766 .766 .766 .766 ~ ;l ~ .2 § Z ~ .268 .230 .209 .198 .192 .189 .187 .186 (CoYariance and c:awriance:s In top line, conelations In bettom line) 1.0 .858 .780 .739 .716 .705 .698 .694 3. Autoregressive (1) (V...i:allce and CQv,iiuiilnce:$ In top line. 1 ~ ;l ~ .2 § Z .266 .228 .195 .167 .143 .123 .105 .090 bQttom ik'Ie) 4. Autogressive (1) with Random Effect for Patient 1 ~ ;l ~ .2 § In botWm line) Akaike's Information Criterion (AlC) and Schwarz's Bayesian Criterion (SBC) are indices of relative goodness of fit of estimated covariance structures. Both of these criteria are log fikelihood values penalized for the number of estimated parameters. The larger the AlC or SBC values, the better the fit. AIC and SBC values for the five covariance structures are shown in Table 4. Structure 5, "unstructured," has the largest AlC, but Structure 4, "autoregressive with random effect for patient," has the largest SBC. For this example, SBC yields the best comparison of model fits because 1.0 .856 .733 .629 .538 .461 .394 .338 (C'..o¥arianc:e and CI7Arianoes In top line, correlations In ~ § Z § .083 .045 .024 .013 .007 .004 .002 .001 1.0 .540 .291 .157 .085 .046 .025 .013 {CownallCe and (;O¥';Irtances in top line, COITeIatiOns In bottom line) 1145 it reflects the large number of parameters in structure 5. Based on inspection of the correlation estimates in Tables 3 and 1 and the relative values of SBC, we conclude that structure 4 "autoregressive with random effect for patient" is the best choice of covariance structure. Table 5. Estimates and Standard Errors for Five Covariance Structures 1. Simple Earameter Table 4. Akaike's Information Criterion (AIC) and Schwarz's Bayesian Criterion (SBC) for Five Covariance Structures AIC 1. simple SSC -459.5 -461.6 2. compound symmetric -175.6 -179.9 3. autoregressive -139.5 -143.8 4. autoregressive with random -126.5 -132.9 effect for patient 5. unstructured -110.1 Estimate Std. Error tim1-tim8 drga 0.61541667 0.14909825 tim2-tim8 drga 0.53875000 0.14909825 tim 7-tim8 drga 0.01041667 0.14909825 drgb-drga tim1 0.21844510 0.14909825 2. Compound Symmetric Earameter -187.7 Estimate Std. Error tim1-tim8 drga 0.61541667 0.07252978 tim2-tim8 drga 0.53875000 0.07252978 tim7-tim8 drga 0.01041667 0.07252978 drgb-drga tim1 0.21844510 0.14984024 3. Autoregressive COMPARISON ESTIMATES OF STANDARD ERRORS OF Parameter In the previous section we observed the correlation and covariance matrices produced by five choices of structure. In this section we shall observe the effect of these choices on certain parameter estimates and their standard errors. We select a set of four comparisons among means and illustrate with ESTIMATE statements. The first three comparisons are differences between times at DRUG = A. The fourth is a comparison of drugs a and b at time 1. estimate 'tim1-tim8 drg a' estimate 'tim2-tim8 drg a' estimate 'tim7-tim8 drg a' estimate 'drga-drgb tim l' Estimate Std. Error tim1-tim8 drga 0.61541667 0.12114983 tim2-tim8 drga 0.53875000 0.11584838 tim 7-tim8 drga 0.01041667 0.05640908 drgb-drga tim1 0.21792809 0.14894036 4. Autoregressive with Random Effect for Patient Estimate Std. Error tim1-tim8 drga 0.61541667 0.08265180 tim2-tim8 drga 0.53875000 0.08217230 tim 7-tim8 drga 0.01041667 0.05643158 drgb-drga tim1 0.21824915 0.14944556 Earameter time 1000000-1 drug'time 10000 -1; time 0 1 0 0 0 0 0 -1 drug'time0100000-1; time 0 0 0 0 001 -1 drug'time 0 0 0 0 0 0 1 -1; drug 1 -1 0 drug'time-1 0000000 10000000; on 5. Unstructured Parameter run; Parameter estimates and standard errors from running these four ESTIMATE statements with each of the five covariance structures are in Table 5. We first discuss the results from the first three ESTIMATE statements; that is, those gMng differences between times 1 and 8, 2 and 8, and 7 and 8. It is perhaps surprising that exactly the same values of the three estimates are obtained from each of the first covariances. This will not happen in all cases if data Estimate Std. Error tim1-tim8 drga 0.61541667 0.08880390 tim2-tim8 drga 0.53875000 0.08368152 tim7-tim8 drga 0.01041667 0.06559564 drgb-drga tim1 0.21879074 0.13739961 are unbalanced, nor will it happen if polynomial trends are used to model time effects. In this example the data are balanced and time is treated as a discrete 1146 Enhancements, Release 6.10, Cary, NC. SAS Institute, Inc., 1994, 121 pp. factor. However, distinctions between the five covariance structures appear in the standard errors of the estimates. Each different structure results in different standard error estimates. Structure number 1, "simple; treats the data as if all observations were independent with the same variance. This results in equal standard error estimates for all differences between time means at the same drug and differences between drug means at the same time. These are incorrect, because this structure clearly is inappropriate. Structure number 2, "compound symmetric," acknowledges between-patient variation as being greater that within-patient variation. This results in standard errors for the first three within-patient contrasts being smaller than the standard error for the fourth between patient contrast, which is appropriate. But compound symmetry does not accommodate different standard errors of differences between times being dependent on the length of the time interval. Structure number 3, "autoregressive," results in standard errors between times which depend on the length of the time interval. The standard error is 0.1211 for the cfdference between times 1 and 8, 0.1158 for the difference between times 2 and 8, etc, down to 0.0564 for the difference between times 7 and 8. If the autoregressive structure is correct, then these estimates of standard errors should be estimates of the same quantities provided by the structure number 5 ("unstructuredj estimates. The structure number 5 estimates range from 0.0888 for the difference between times 1 and 8 down to 0.0656 for the difference between times 7 and 8. Thus the autoregressive estimates appear too large for long time intervals (times 1 to 8) and too small for short time intervals (times 7 to 8). Finally, we examine the standard errors provided by structure number 4, "autoregressive with random effect for patient." We see that these estimates are quite similar to the structure 5 estimates. In conclusion, similarity of structure 4 estimates to structure 5 estimates and the fact that the Schwarz Bayesian Criterion was largest for structure 4 leads us to prefer structure 4. For additional information on covariance modeling possibilities, see SAS Technical Report P-229, SAS/STAT Software: Changes and Enhancements, Release 6.07, and SAS/STAT Software: Changes and Enhancements, Release 6.10. REFERENCES SAS Institute Inc., SAS Technical Report P-229, SAS/STAT Software: Changes and Enhancements, Release 6.07, Cary, NC. SAS Institute Inc., 1992,620 pp. SAS Institute Inc., SAS/STAT Software: Changes and 1147