Download Modeling Variation in Repeated Measures Data

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Adherence (medicine) wikipedia , lookup

Compounding wikipedia , lookup

Polysubstance dependence wikipedia , lookup

Pharmacognosy wikipedia , lookup

Neuropharmacology wikipedia , lookup

Medication wikipedia , lookup

Pharmaceutical industry wikipedia , lookup

Bilastine wikipedia , lookup

Electronic prescribing wikipedia , lookup

Prescription drug prices in the United States wikipedia , lookup

Prescription costs wikipedia , lookup

Drug interaction wikipedia , lookup

Drug discovery wikipedia , lookup

Bad Pharma wikipedia , lookup

Drug design wikipedia , lookup

Pharmacokinetics wikipedia , lookup

Pharmacogenomics wikipedia , lookup

Transcript
Modeling Variation in Repeated Measures Data
Ramon C. Littell, Department of Statistics, University of Florida
ABSTRACT
through FEV1SH. The second data set, named
FEV1 UNI, contained data in a univariate mode with
variables DRUG, PATIENT, HR, BASEFEV1, and
FEV1. Treatment means are plotted versus HR in
Figure 1. The graph shows that means for the three
treatment groups are essentially the same at HR=O
(baseline). At. HR=l the mean for drug B is larger than
the mean for drug A, and both of the drug means are
much larger than the placebo mean. Means for drugs
A and B continue to be larger than the placebo means
for subsequent hours, but the magnitudes of the
differences decrease sharply with time. It is of interest
to estimate differences between the treatment group
means at various times, and to estimate differences
between means for the same treatment at different
times.
Statistical analysis of repeated measures data has
been a problem because of covariation among
measurements on the same experimental unit. Until
recently, analysis techniques available in computer
software only permitted the user to ignore or avoid
covariance structure rather than incorporate it into the
statistical model. Ignoring covariance structure results
in erroneous inference and avoiding it results in
suboptimal inference. The MIXED procedure provides
a rich selection of modeling structures through the
RANDOM and REPEATED statements. Examples will
illustrate how to choose a covariance structure, and the
effects on significance tests and confidence intervals.
INTRODUCTION
Mixed linear statistical models state that observed data
consist of two parts, fixed effects and random effects.
Fixed effects define the mean of the population from
which the observation was drawn, and the random
effects define the variance of the observation and its
covariance with other observations. Mixed linear
models are often used with repeated measures data to
accommodate the covariation between observations on
the same subject at different times. PROC MIXED in
the SAS System provides a rich selection of covariance
structure from which to choose. Most discussion in the
literature has focused on the imporlance of proper
covariance modeling in order to obtain valid F tests for
testing time effects and treatment-by-time interaction
effects. In the present paper we examine the effects of
choice of covariance structure on estimates of
treatment means at various times, and on standard
errors of differences between treatment means.
3.8
•
Drug A
Drug B
3.6
Drug P
~~~
3,4
:;
w 3 .2
u-
3
2.8
2.6
0
1
2
3
4
5
6
7
8
HR
Figure 1. FEV1 means for three patient groups
at hourly intervals following treatment.
EXAMPLE DATA SET
An example from the pharmaceutical industry, which
compared the effects of two drugs and a placebo on a
measure of respiratory ability, called FEV1, will be used
to illustrate covariance modeling. Twenty-four patients
were assigned to each of the treatment groups, and
FEV1 was measured at baseline Ommediately prior to
administration of the drugs), and at hourly intervals
thereafter for eight hours. Data were analyzed using
PROC GLM and PROC MIXED. Two SAS data sets
were created. The first data set, named FEV1 MULT,
contained data in a multivariate mode and had
variables DRUG, PATIENT, BASEFEV1, and FEV11H
COMPARISON OF COVARIANCE STRUCTURES
We first use PROC GLM with a REPEATED statement
to examine the covariance structure in the data. The
variables FEV1-FEVS are regarded as repeated
measures and the variable BASEFEV1 is considered a
baseline covariable. Run the statements
proc glm data=fev1 mult ; class drug;
model fev1_1h-fev1_Sh = fev_O drug;
1143
contrast 'trt vs co nt' drug 1 1 -2;
contrast 'drga vs drgb' drug 1 -1 0;
repeated time contrast I summary printe;
run;
The within-subject correlation matrix is printed because
of the PRINTE option, and is shown in Table 1. The
correlations between FEVl 1 and FEVl 2-FEVl 8 are
in the first row of the matrbc. Correlations generally
decrease from 0.893 with FEV1_2 down to 0.642 with
FEV1_8.
Similar decreases are found between
FEVl 2 and FEV1 3-FEVl 8, between FEV1 3 and
FEV1-4-FEV1 8, etc. In short, correlations beiween
pairs -of FEV1 measurements decrease with the
number of hours between the times the measurements
were obtained. This is a common phenomenon with
repeated measures data. As a consequence, a
univariate analysis of variance is likely not appropriate.
A formal test of whether the covariance structure meets
the necessary assumptions (called the Huhyn-Feldt
conditions) for univariate ANOVA is given by the test for
sphericily applied to orthogonal components. The
approximate chi-square form of the test has the value
183.98 with 27 degrees of freedom, and has p-value
equal to 0.0001. According to this test, the conditions
are not met. Thus another type of analysis must be
used.
3.
autoregressive
COV(Yijk,Yij ,)
= 02plk~1
4.
autoregressive with
random effect for
patient
COV(Yijk'Yij,)
= of+~plk~1
5.
unstructured
Models with these covariance structures are
implemented with the respective sets of statements:
1.
2. proc mixed data=fev1 uni; class drug patient time;
model fev1 = fev1_0 drug time drug'time;
repeated I type = cs sub = patient r rcorr;
run;
3.
proc mixed data=fev1 uni; class drug patient time;
model fev1 = fev1_0 drug time drug'time;
repeated I type = ar(l) sub = patient r rcorr;
run;
4.
proc mixed data=fevl uni; class drug patient time;
model fev1 = fevl_0 drug time drug'time;
random patient;
repeated I type = ar(1) sub = patient r rcorr;
run;
5.
proc mixed data=fevl uni; class drug patient time;
model fevl = fevl_0 drug time drug'time;
repeated I type = un sub = patient r rcorr;
run;
Table 1. Partial Correlations of FEVl at8 Times
from REPEATED Statement in PROC GLM
Time
1
1
Z
d
~
§
§
1.0 .894 .879 .783 .691 .671
I
§.
.511 .642
2
.894
3
.879 .908
4
.783 .877 .917
5
.691 .807 .816 .835
6
.671 .896 .743 .734 .859
7
.511 .587 .639 .667 .736 .810
8
.642 .702 .741 .751 .856 .881 .818
1.0 .908 .877 .807 .896 .587 .702
Rather than display results for these statements in
sequence, we display comparable parts in groups.
First, parameter estimates in the covariance matrices
for the various structures (excepting "unstructured,,) are:
1.0 .917 .816 .743 .639 .741
1.0 .835 .734 .667 .751
1.0 .859 .736 .856
1.0 .810 .881
1.0 .818
1.0
simple
COV(Yijk,Yij,)
(;2 _ 0.267
2. compound symmetric
~
3.
_ 0.206
autoregressive
t;2 _ 0.266
P4.
autoregressive with
~
0.856
_ 0.185
random effect for
0; _0.083
patients
p _ 0.540
= 0 if k • !
5.
=(J2ifk=!
2. compound symmetric
1. simple
0; - 0.063
We now turn to PROC MIXED for analyses which
accommodate structures defined on the covariance
matrix. Define Yrk to be FEVl measured at HR = k on
PATIENT = j in DRUG = i. The structures in discussion
pertain to covariances among measures at different
hours on the same patient. Measures on different
patients are considered independent in all cases. We
consider five covariance structures, defined as follows:
1.
proc mixed data=fev1 uni; class drug patient time;
model fevl = fev1_0 drug time drug'time;
repeated I type = simple sub = patient r rcorr;
COV(Yijk,Yij,) = of if k • e
= ~+~ifk=e
1144
unstructured (parameter estimates shown in Table
2.)
The covariance matrices resulting from the "R" and
"RCORR" options in the repeated statements are
printed in Table 2. These correlation matrices, except
for structure 4, "autoregressive with random effect for
patient," are directly comparable with the correlation
matrix in Table 1. Ignore structure 4 momentarily and
compare the other correlation matrices in Table 2 with
the correlation matrix in Table 1. Structures 1, "simple,"
and 2, "compound symmebic," clearly do not reflect the
trends in Table 1. Structure 3, "autoregressive," has
the general trend of correlations decreasing with length
of time interval, but the values of the correlations in the
autoregressive structure are too small, especially for
long intervals. Thus none of the first three correlation
structures appear to adequately model the correlation
structure of the data. Structure 5, "unstructured,"
shows correlations which are very similar to the partial
correlations in Table 1. The "unstructured" type is
adequate, and would, in fact, be quite satisfactory for
this example. Computing time can by excessive with a
large number of times. The structure must be modeled
by fewer parameters in order to be useful with a small
number of patients.
5. Unstructured
1
.891
;l
~
.2
§
Z
§
.267
0.0
0.0
0.0
0.0
0.0
0.0
0.0
1.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
~
.2
§
Z
~
.259 .233 .243 .220 .181 .156 .195
.880 .908 .254 .252 .219 .191 .168 .204
.784 .892 .915 .299 .240 .204 .190 .226
.688 .807 .813 .822 .286 .232 .204 .247
.675 .698 .745 .735 .855 .258 .214 .245
.516 .590 .643 .670 .733 .812 .270 .233
.646 .701 .742 .755 .845 .882 .820 .299
(Vanances on diagonal, CQII'artances above diagonal, corn:latlons below diagonal)
Now turn to structure 4, "autoregressive with random
effect for patient." As noted above, the covariance
estimates in Table 2 for structure 4 do not produce
correlation estimates which are directly comparable
with the correlation structure in Table 1 that we are
trying to model. The problem is that covariances in
Table 2 for structure 4 do not contain the patient
variance component, ~. In order to obtain correlation
estimates for structure 4 which are comparable to
correlations in Table 1, we must add the patient
variance component estimate equal to 0.185 to each
covariance. Doing so results in the covariance matrix
and corresponding correlation matrix in Table 3.
Inspection of Table 3 shows good agreement with
correlations in Table 1.
1. Simple
~
;l
.226 .216 .211 .204 .175 .163 .128 .168
Table 2. Covariances and Correlations from R
and RCORR Options on REPEATED Statement
in PROC MIXED for Five Covariance Structures
1
~
(CoYatlance aM CQY<Inances In top line, COtfelations In bottQm line)
Table 3. Covariances and Correlations for
Covariance Structure 4, "Autoregressive with
Random Effect for Patient"
2. Compound Symmetric
1
~
;l
~
.2
§
Z
§
.269 .206 .206 .206 .206 .206 .206 .206
1
1.0 .766 .766 .766 .766 .766 .766 .766
~
;l
~
.2
§
Z
~
.268 .230 .209 .198 .192 .189 .187 .186
(CoYariance and c:awriance:s In top line, conelations In bettom line)
1.0 .858 .780 .739 .716 .705 .698 .694
3. Autoregressive (1)
(V...i:allce and CQv,iiuiilnce:$ In top line.
1
~
;l
~
.2
§
Z
.266 .228 .195 .167 .143 .123 .105 .090
bQttom
ik'Ie)
4. Autogressive (1) with Random Effect for
Patient
1
~
;l
~
.2
§
In botWm line)
Akaike's Information Criterion (AlC) and Schwarz's
Bayesian Criterion (SBC) are indices of relative
goodness of fit of estimated covariance structures.
Both of these criteria are log fikelihood values penalized
for the number of estimated parameters. The larger
the AlC or SBC values, the better the fit. AIC and SBC
values for the five covariance structures are shown in
Table 4. Structure 5, "unstructured," has the largest
AlC, but Structure 4, "autoregressive with random effect
for patient," has the largest SBC. For this example,
SBC yields the best comparison of model fits because
1.0 .856 .733 .629 .538 .461 .394 .338
(C'..o¥arianc:e and CI7Arianoes In top line, correlations In
~
§
Z
§
.083 .045 .024 .013 .007 .004 .002 .001
1.0 .540 .291 .157 .085 .046 .025 .013
{CownallCe and (;O¥';Irtances in top line, COITeIatiOns In bottom line)
1145
it reflects the large number of parameters in structure
5. Based on inspection of the correlation estimates in
Tables 3 and 1 and the relative values of SBC, we
conclude that structure 4 "autoregressive with random
effect for patient" is the best choice of covariance
structure.
Table 5. Estimates and Standard Errors for
Five Covariance Structures
1. Simple
Earameter
Table 4. Akaike's Information Criterion (AIC)
and Schwarz's Bayesian Criterion (SBC) for Five
Covariance Structures
AIC
1. simple
SSC
-459.5 -461.6
2.
compound symmetric
-175.6 -179.9
3.
autoregressive
-139.5 -143.8
4.
autoregressive with random -126.5 -132.9
effect for patient
5.
unstructured
-110.1
Estimate
Std. Error
tim1-tim8 drga
0.61541667
0.14909825
tim2-tim8 drga
0.53875000
0.14909825
tim 7-tim8 drga
0.01041667
0.14909825
drgb-drga tim1
0.21844510
0.14909825
2. Compound Symmetric
Earameter
-187.7
Estimate
Std. Error
tim1-tim8 drga
0.61541667
0.07252978
tim2-tim8 drga
0.53875000
0.07252978
tim7-tim8 drga
0.01041667
0.07252978
drgb-drga tim1
0.21844510
0.14984024
3. Autoregressive
COMPARISON
ESTIMATES
OF
STANDARD
ERRORS
OF
Parameter
In the previous section we observed the correlation and
covariance matrices produced by five choices of
structure. In this section we shall observe the effect of
these choices on certain parameter estimates and their
standard errors. We select a set of four comparisons
among means and illustrate with ESTIMATE
statements.
The first three comparisons are
differences between times at DRUG = A. The fourth is
a comparison of drugs a and b at time 1.
estimate 'tim1-tim8 drg a'
estimate 'tim2-tim8 drg a'
estimate 'tim7-tim8 drg a'
estimate 'drga-drgb tim l'
Estimate
Std. Error
tim1-tim8 drga
0.61541667
0.12114983
tim2-tim8 drga
0.53875000
0.11584838
tim 7-tim8 drga
0.01041667
0.05640908
drgb-drga tim1
0.21792809
0.14894036
4. Autoregressive with Random Effect for
Patient
Estimate
Std. Error
tim1-tim8 drga
0.61541667
0.08265180
tim2-tim8 drga
0.53875000
0.08217230
tim 7-tim8 drga
0.01041667
0.05643158
drgb-drga tim1
0.21824915
0.14944556
Earameter
time 1000000-1
drug'time 10000
-1;
time 0 1 0 0 0 0 0 -1
drug'time0100000-1;
time 0 0 0 0 001 -1
drug'time 0 0 0 0 0 0 1 -1;
drug 1 -1 0
drug'time-1 0000000
10000000;
on
5. Unstructured
Parameter
run;
Parameter estimates and standard errors from running
these four ESTIMATE statements with each of the five
covariance structures are in Table 5.
We first discuss the results from the first three
ESTIMATE statements; that is, those gMng differences
between times 1 and 8, 2 and 8, and 7 and 8. It is
perhaps surprising that exactly the same values of the
three estimates are obtained from each of the first
covariances. This will not happen in all cases if data
Estimate
Std. Error
tim1-tim8 drga
0.61541667
0.08880390
tim2-tim8 drga
0.53875000
0.08368152
tim7-tim8 drga
0.01041667
0.06559564
drgb-drga tim1
0.21879074
0.13739961
are unbalanced, nor will it happen if polynomial trends
are used to model time effects. In this example the
data are balanced and time is treated as a discrete
1146
Enhancements, Release 6.10, Cary, NC. SAS Institute,
Inc., 1994, 121 pp.
factor.
However, distinctions between the five
covariance structures appear in the standard errors of
the estimates. Each different structure results in
different standard error estimates. Structure number 1,
"simple; treats the data as if all observations were
independent with the same variance. This results in
equal standard error estimates for all differences
between time means at the same drug and differences
between drug means at the same time. These are
incorrect, because this structure clearly is inappropriate.
Structure number 2, "compound symmetric,"
acknowledges between-patient variation as being
greater that within-patient variation. This results in
standard errors for the first three within-patient
contrasts being smaller than the standard error for the
fourth between patient contrast, which is appropriate.
But compound symmetry does not accommodate
different standard errors of differences between times
being dependent on the length of the time interval.
Structure number 3, "autoregressive," results in
standard errors between times which depend on the
length of the time interval. The standard error is 0.1211
for the cfdference between times 1 and 8, 0.1158 for the
difference between times 2 and 8, etc, down to 0.0564
for the difference between times 7 and 8. If the
autoregressive structure is correct, then these
estimates of standard errors should be estimates of the
same quantities provided by the structure number 5
("unstructuredj estimates. The structure number 5
estimates range from 0.0888 for the difference between
times 1 and 8 down to 0.0656 for the difference
between times 7 and 8. Thus the autoregressive
estimates appear too large for long time intervals (times
1 to 8) and too small for short time intervals (times 7 to
8). Finally, we examine the standard errors provided by
structure number 4, "autoregressive with random effect
for patient." We see that these estimates are quite
similar to the structure 5 estimates.
In conclusion, similarity of structure 4 estimates to
structure 5 estimates and the fact that the Schwarz
Bayesian Criterion was largest for structure 4 leads us
to prefer structure 4.
For additional information on covariance modeling
possibilities, see SAS Technical Report P-229,
SAS/STAT Software: Changes and Enhancements,
Release 6.07, and SAS/STAT Software: Changes and
Enhancements, Release 6.10.
REFERENCES
SAS Institute Inc., SAS Technical Report P-229,
SAS/STAT Software: Changes and Enhancements,
Release 6.07, Cary, NC. SAS Institute Inc., 1992,620
pp.
SAS Institute Inc., SAS/STAT Software: Changes and
1147