Generalized Estimating Equations for Depression Dose Regimes
Karen Walker, Walker Consulting LLC, Menifee CA
Generalized Estimating Equations (GEE) produce, on average, consistent estimates of the regression coefficients and
their variances under weak assumptions about the actual correlation as the number of subjects (clusters) becomes
large. Use the GEE to:
• Relate the marginal response to the covariates through a link function, for example the log of the odds.
• Specify the variance function.
• Test the data to choose a working correlation matrix.
The fitting algorithm then proceeds as follows:
1. Compute an initial estimate of $\beta$, for example with an ordinary generalized linear model assuming independence.
2. Compute the working correlation matrix $R_i$.
3. Compute an estimate of the covariance matrix $V_i$.
4. Update $\beta$.
5. Compute residuals and update $V_i$.
6. Iterate until convergence.
ABSTRACT
While working on a study for a depression drug, I came across drug administration data where
subjects had no real treatment definition. Subject dose regimes consisted of scenarios like 10 mg per
day; 10 mg + 20 mg + 40 mg per week; or 20 mg + 10 mg + 10 mg; and so on.
At first glance, I thought, "There has to be missing data, and those regimes will have a definition later."
However, what if there is a need to process this data just as it is? How can this be done? Would it make
sense to define a response level for each of the cumulative logits of the dose values over the course of a study,
then fit them to a proportional odds model? In this paper I will demonstrate how to check whether the cumulative
logits are affected in the same way, using a parallel slopes test. We will use this information to see if the log
cumulative odds are proportional, discover the influence of the explanatory variables, and find the points where
regression lines "connect the dots" for a single continuous explanatory variable.
INTRODUCTION
So what is depression? According to WebMD, clinical depression is diagnosed when a change in one or more
chemicals in the brain causes abnormal brain function. Since the cause of depression can be a mix of chemicals,
there is no single disease to treat, as there is with, say, an infection, chronic pain, or even cancer. To treat
depression we must examine all the risk factors that can change the brain's chemical balance: for example, GENDER,
because of chemicals introduced in the brain for women during pregnancy and menopause, and AGE, because as people
grow older chemicals are elevated in the brain by the grief and trauma experienced with age. Health conditions
like cancer, heart disease, being overweight, or chronic pain are the biggest complaints in persons who are diagnosed
as clinically depressed. Anyone can suffer clinical depression due to physical or emotional abuse or violence.
Other stressful events, such as moving, marriage, divorce, a new baby, or a new job, can
cause clinical depression symptoms that are more than just sadness. What makes treating clinical depression
tricky, and what makes this paper so interesting, is that although millions of people suffer from clinical depression,
there is no definite way to treat it. How can we know when clinical depression is cured when some subjects have a
clear sense of why they became depressed, and other subjects don't know when and where it happened? For the
next 20 minutes or so we'll explore the many ways to treat it to see if we can uncover the best way.
DOSE REGIMES FOR DEPRESSION
First we will recognize that depression has to be managed on a daily basis at the very least, to cover those subjects
who didn't know when or where it happened. We'll build a Trial Arm dataset (TA) that contains the data points
mentioned above and a few others so we can measure the effects of SEX, AGE, OBESITY, SUBSTANCE USE,
DRUG, DOSE, and RESPONSE, where response is coded as "Better=4", "Slightly Better=3", "No Change=2", "Slightly
Worse=1", "Worse=0". This depression dataset will contain all the subject records to be analyzed and assign a
sequential number each time a subject takes a dose, so that the sequential number is both the visit and the
number of times the subject had treatment. Note: we also want to deal with overeating, too much sugar, and
substance use of alcohol, tobacco, or caffeine, so we'll have obesity and substance use as covariates.
There are over 30 known drugs available for the treatment of clinical depression, and many subjects (at least 200
of those surveyed) admit to taking more than one kind of drug when dealing with the sadness they experienced. We'll
consider 12 depression dose regimes for discussion here.
This kind of complex data distribution can be fitted to a generalized linear model because it allows for response
variables that have arbitrary distributions and for an arbitrary link function as well.
The Generalized Linear Model (GLM) relates a mean response to a vector of explanatory variables through a link
function. However, where an ordinary linear model works best for a response with a simple normal distribution, the
generalized model allows for an arbitrary distribution of the response variable, so the link function can take
assorted shapes.
The GLM consists of three elements:
1. A probability distribution from the exponential family.
2. A linear predictor $\eta = X\beta$.
3. A link function $g$ such that $E(Y) = \mu = g^{-1}(\eta)$.
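As a worked illustration of the three elements, consider the binary case used later in this paper (a binomial distribution with the logit link). The block below is a sketch; the symbol $x_i$ for a subject's covariate row is introduced here for illustration and does not appear in the original text.

```latex
\begin{aligned}
Y_i &\sim \text{Bernoulli}(\mu_i) && \text{(exponential-family distribution)}\\
\eta_i &= x_i'\beta && \text{(linear predictor)}\\
g(\mu_i) &= \log\frac{\mu_i}{1-\mu_i} = \eta_i
\quad\Longrightarrow\quad
\mu_i = g^{-1}(\eta_i) = \frac{e^{\eta_i}}{1+e^{\eta_i}} && \text{(logit link)}
\end{aligned}
```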
Generalized Estimating Equations were introduced by Liang and Zeger in 1986 as a method of handling correlated
data that can be modeled as a Generalized Linear Model, for example health outcomes (longitudinal studies) or litters
(clustered data).
Generalized estimating equations are an extension of GLMs to accommodate correlated data; they are an extension
of quasi-score equations.
The GEE approach models a known function of the marginal expectation of the dependent variable as a linear
function of one or more explanatory variables.
With quasi-likelihood, you can pursue statistical models by making assumptions about the link function and the
relationship between the first two moments, but without specifying the complete distribution of the response.
The GEE describes the random component with a common link and variance function.
The GEE accounts for the covariance structure of the correlated measures.
So let $Y_{ij}$ ($j = 1, \dots, n_i$; $i = 1, \dots, K$) represent the $j$th measurement on the $i$th subject.
For our purpose, $j$ indexes the dose regimes and $i$ indexes one clinical depression subject.
There are $n_i$ dose regimes for subject $i$ and $\sum_{i=1}^{K} n_i$ total measurements.
Here's how to make it work.
Step 1
The generalized estimating equation for $\beta$ is an extension of the GLM estimating equation:
$$\sum_{i=1}^{K} \frac{\partial \mu_i'}{\partial \beta} V_i^{-1} \left(Y_i - \mu_i(\beta)\right) = 0$$
where $\mu_i = [\mu_{i1}, \dots, \mu_{in_i}]'$ is the vector of means corresponding to $Y_i$, and $V_i$ is an estimate of
the covariance matrix of $Y_i$.
Step 2
The working correlation matrix $R_i(\alpha)$ is estimated by using the current value of the parameter vector $\beta$ to
compute the appropriate function of the Pearson residual
$$r_{ij} = \frac{y_{ij} - \mu_{ij}}{\sqrt{v(\mu_{ij})}}$$
Step 3
Specify the variance of $Y_i$ by a covariance matrix modeled as
$$V_i = \phi A_i^{1/2} R_i(\alpha) A_i^{1/2}$$
where $A_i$ is an $n_i \times n_i$ diagonal matrix with $v(\mu_{ij})$ as the $j$th diagonal element.
Step 4
Test $\beta$ with the statistic
$$Q_C = (C\hat{\beta})' \left[C V_{\hat{\beta}} C'\right]^{-1} (C\hat{\beta})$$
Update $\beta$:
$$\beta_{r+1} = \beta_r + \left[\sum_{i=1}^{K} \frac{\partial \mu_i'}{\partial \beta} V_i^{-1} \frac{\partial \mu_i}{\partial \beta}\right]^{-1} \left[\sum_{i=1}^{K} \frac{\partial \mu_i'}{\partial \beta} V_i^{-1} \left(Y_i - \mu_i\right)\right]$$
Step 5
Compute residuals and update $V_i$.
Step 6
Iterate until convergence.
DEPRESSION DATA
ID   SEX     AGE  OBESITY  Su       VISIT  Treatment  Regime                RESPONSE
101  Female  39   Yes      Alcohol  1      30mg       standard              2
101  Female  39   Yes      Alcohol  2      30mg       Second time standard  1
101  Female  39   Yes      Tobacco  3      30mg       new                   1
101  Female  39   Yes      Sugar    4      30mg       standard              2
102  Male    27   No       Tobacco  1      10mg       standard              2
Table 1. The Depression dataset looks something like this, except with at least 12 dose regimes…
ID   Baseline  Visit_1  Visit_2  Visit_3  Visit_4  Visit_5  Visit_6  Visit_7  Visit_8 … up to 12
101  2         2        1        1        2        2        1        1        2
102  2         2        4        4        4        2        4        4        4
103  4         4        4        4        4        4        4        4        4
104  4         4        2        4        4        4        2        4        4
105  2         2        2        3        4        2        2        3        4
Table 2. Visit 1 is set to Baseline, and after PROC TRANSPOSE the depression data (depr_t.sas7bdat) is read in for
analysis to the temporary SAS dataset depres.
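The transpose step itself is not shown in the paper. A minimal sketch of how the wide layout in Table 2 might be produced is given below. It assumes a long input dataset named ta with variables id, visit (numeric, 1 to 12), and response; those names, and the intermediate dataset ta_sorted, are assumptions for illustration. Subject-level covariates such as sex and age would still need to be merged back onto the result.

```sas
/* Sketch only: TA, TA_SORTED, ID, VISIT, and RESPONSE are assumed names. */
proc sort data=ta out=ta_sorted;
   by id visit;
run;

/* One row per subject; responses spread into visit_1, visit_2, ... */
proc transpose data=ta_sorted out=depr_t(drop=_name_) prefix=visit_;
   by id;
   id visit;
   var response;
run;

/* Per the Table 2 caption, Visit 1 doubles as the baseline value. */
data depr_t;
   set depr_t;
   baseline = visit_1;
run;
```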
The depression data can be analyzed with a logistic regression using GEE. Create 12 observations per subject, one
for each visit.
/* Reshape one wide record per subject into 12 long records, one per visit. */
/* baseline is kept because the next step uses it to derive di_base.        */
data depres(keep=id regime treatment sex age obesity su baseline visit outcome);
   set depr_t;
   array v{12} visit_1-visit_12;   /* responses from the transposed data */
   do visit = 1 to 12;
      outcome = v{visit};
      output;
   end;
run;
data depression;
set depres;
if outcome>=3 then dichot=1; else dichot=0;
if baseline>=3 then di_base=1; else di_base=0;
run;
GEE ANALYSIS
Using an exchangeable working correlation matrix, patients on either the standard or the new regime are assigned to
treatment doses, with a response measured as worse, slightly worse, no change, slightly better, and better (0, 1, 2, 3,
4). Subjects are measured at baseline and at 12 visits. The response analyzed is slightly better or better versus not.
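proc genmod data=depression descending;
   /* age stays continuous, so it is not listed in the CLASS statement */
   class id regime sex obesity su treatment visit;
   model dichot = treatment sex age regime di_base visit visit*treatment
                  treatment*regime /
                  link=logit dist=bin type3;
   /* one cluster per subject within regime, exchangeable working correlation */
   repeated subject=id*regime / type=exch;
run;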
Model Information
Correlation Structure          Exchangeable
Subject Effect                 id*regime (levels)
Number of Clusters             Equal to number of subjects
Correlation Matrix Dimension   12
Maximum Cluster Size           12
Minimum Cluster Size           12
The analysis shows the following:
1. The Type 3 analysis shows nonsignificant interaction terms.
2. When the interactions are removed, visit remains nonsignificant.
3. Patients on standard treatment have, on average, greater odds of a better or slightly better response.
The GEE procedure (PROC GEE) is now available in SAS/STAT with SAS 9.4. It supports generalized logits as well
as the ESTIMATE, LSMEANS, and OUTPUT statements. It also provides the LOGOR= option in the REPEATED
statement for alternating logistic regression, with an extension for ordinal data.
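For readers who want to try PROC GEE, the exchangeable model might be requested as sketched below. This is an illustration under assumptions rather than code from the study: it reuses the dataset and variable names from the PROC GENMOD step, keeps only the main effects, and has not been run against the fictitious data.

```sas
/* Sketch: same binary logit GEE as the PROC GENMOD step, main effects only. */
proc gee data=depression descending;
   class id regime sex treatment;
   model dichot = treatment sex age regime di_base / dist=bin link=logit;
   repeated subject=id*regime / type=exch;
run;
```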
THE RESULTS
Patients on the standard regime of 30 mg have, on average, $e^{1.2654}$ times the odds of a slightly better or better
response compared with patients on the new regime of 10 mg, adjusted for the other effects in the model.
Output 1. Analysis of GEE Parameter Estimates
Empirical Standard Error Estimates

Parameter  Level     Estimate  Standard Error  95% Confidence Limits       Z  Pr > |Z|
Intercept             -0.2066          0.5776    -1.3388     0.9255    -0.36    0.7206
Regime     Standard   -0.6495          0.3532    -1.3418     0.0428    -1.84    0.0660
Regime     New         0.0000          0.0000     0.0000     0.0000
sex        F           0.1368          0.4402    -0.7261     0.9996     0.31    0.7560
sex        M           0.0000          0.0000     0.0000     0.0000
Treatment  30mg        1.2654          0.3467     0.5859     1.9448     3.65    0.0003
Treatment  10mg        0.0000          0.0000     0.0000     0.0000
age                   -0.0188          0.0130    -0.0442     0.0067    -1.45    0.1480
di_base    9           1.857           0.3460     1.1676     2.5238     5.33    <.0001
obesity    53          0.000           0.0000     0.0000     0.0000
Source: Fictitious data, for illustration purposes only
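To put the Treatment 30mg row on the odds scale, exponentiate its estimate (1.2654) and its 95% confidence limits (0.5859 and 1.9448). The arithmetic below is a rounded illustration of that conversion, not additional output from the procedure.

```latex
\begin{aligned}
\widehat{OR} &= e^{1.2654} \approx 3.54\\
95\%\ \text{CI} &= \left(e^{0.5859},\ e^{1.9448}\right) \approx (1.80,\ 6.99)
\end{aligned}
```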
LET'S DO THAT AGAIN
Using an unstructured working correlation matrix, patients on either the standard or the new regime are assigned to
treatment doses, with a response measured as worse, slightly worse, no change, slightly better, and better (0, 1, 2, 3,
4). Subjects are measured at baseline and at 12 visits. The response analyzed is slightly better or better versus not.
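proc genmod data=depression descending;
   /* age stays continuous, so it is not listed in the CLASS statement */
   class id regime sex obesity su treatment visit;
   model dichot = treatment sex age regime di_base visit visit*treatment
                  treatment*regime /
                  link=logit dist=bin type3;
   /* same clusters as before, now with an unstructured working correlation */
   repeated subject=id*regime / type=unstr;
run;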
Patients on the standard regime of 30 mg have, on average, $e^{1.2442}$ times the odds of a slightly better or better
response compared with patients on the new regime of 10 mg, adjusted for the other effects in the model.
Output 2. Analysis of GEE Parameter Estimates
Empirical Standard Error Estimates

Parameter  Level     Estimate  Standard Error  95% Confidence Limits       Z  Pr > |Z|
Intercept             -0.2324          0.5763    -1.3620     0.8972    -0.40    0.6868
Regime     Standard   -0.6558          0.3512    -1.3442     0.0326    -1.87    0.0619
Regime     New         0.0000          0.0000     0.0000     0.0000
sex        F           0.1128          0.4408    -0.7512     0.9768     0.26    0.7981
sex        M           0.0000          0.0000     0.0000     0.0000
Treatment  30mg        1.2442          0.3455     0.5669     1.9214     3.60    0.0003
Treatment  10mg        0.0000          0.0000     0.0000     0.0000
age                   -0.0175          0.0129    -0.0427     0.0077    -1.36    0.1728
di_base    9           1.8981          0.3441     1.2237     2.5725     5.52    <.0001
obesity    53          0.000           0.0000     0.0000     0.0000
Source: Fictitious data, for illustration purposes only
CONCLUSION
With GEE, the "Exchangeable" and "Unstructured" working correlation matrices yield results that are very close (for
example, Treatment 30mg estimates of 1.2654 and 1.2442). Many statisticians routinely use the independent structure
because the parameter estimates and standard errors are consistent even if the correlation structure isn't correctly
specified. Here the results from the two working correlation structures are consistent as well. With a smaller number
of subjects it is often better to use a simpler structure because that means fewer parameters to estimate. With GEE
even the more complex structures are simplified.
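To make "fewer parameters" concrete for clusters of 12 visits, the sketch below writes out the two working correlation structures for measurements $j \neq k$ on the same subject and counts their correlation parameters. The symbols $\alpha$ and $\alpha_{jk}$ follow common GEE notation and are shown here for illustration.

```latex
\begin{aligned}
\text{Exchangeable:}\quad & \operatorname{Corr}(Y_{ij}, Y_{ik}) = \alpha, && 1 \text{ parameter}\\
\text{Unstructured:}\quad & \operatorname{Corr}(Y_{ij}, Y_{ik}) = \alpha_{jk}, && \frac{12 \times 11}{2} = 66 \text{ parameters}
\end{aligned}
```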
REFERENCES
Stokes, Maura. 2015. "Modeling Longitudinal Categorical Response Data." SAS Global Forum 2015, Dallas, Texas, April 6, 2015.
Diggle, P.J., and Zeger, S.L. 1994. Analysis of Longitudinal Data. Oxford: Oxford Science.
Stokes, Maura. 2015. "Methods for Massive, Missing or Multifaceted Data." Proceedings of the SAS Global Forum
2015 Conference. Dallas, Texas. Available at http://support.sas.com/resources/papers/proceedings09/TOC.html.
ACKNOWLEDGMENTS
Thank you to all my friends working with SAS year after year you are most kind. Bless you, and in God I trust.
RECOMMENDED READING
• Base SAS® Procedures Guide
• SAS® For Dummies®
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Name: Karen Walker
Enterprise: Walker Consulting LLC
Address: 26175 Sunnywood
City, State ZIP: Menifee, California 92586
Work Phone: (480)206-7196
Fax:
E-mail: [email protected]
Web:
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.