Controlling for Group Differences and Counting Rare Events:
Propensity Scores with PROC CATMOD and
Poisson Regression with PROC GENMOD
STEFANIE J. SILVA, LEWIN-TAG, INC.
LORI POTTER, LEWIN-TAG, INC.
ABSTRACT
In a retrospective study comparing treatments for
hypercholesterolemia and their impact on
economic and clinical outcomes, the application
of two SAS procedures is demonstrated. In this
atypical analysis, PROC CATMOD was used to
create propensity scores to control for systematic
differences between treatment groups. We
compare this method, which uses patient
characteristics to predict group membership, to
the more common approach of including a large
number of separate covariates in the regression
model.
Analysis of rare clinical events was done using
PROC GENMOD to perform Poisson regression.
We show how we derived adjusted means with
this procedure.
INTRODUCTION
In comparing multiple treatments in nonrandomized patient populations, controlling for
characteristics such as demographics, comorbid
conditions, coincidental medications, and other
factors is crucial to identifying real treatment
effects. An alternative to including these variables
as separate covariates in an ANCOVA model is to
develop a propensity score for each patient, and
use it to control for group differences
(Rosenbaum and Rubin, 1983).
Propensity scores summarize the characteristics
in a set of classification variables in a way that
reduces bias between groups. PROC CATMOD
provides methods for modeling a categorical
response variable as a function of one or more
continuous or categorical variables. These
response functions can be mean scores,
cumulative logits, or marginal probabilities.
When outcomes or events are very rare, it may
be appropriate to estimate probabilities of these
events occurring using the Poisson distribution.
PROC GENMOD provides a method for
performing Poisson regression via a generalized
linear model, an extension of the traditional linear
model with applications to a broader range of
data analysis situations. GENMOD uses links to
different response functions and different
distributions. For Poisson regression, the Poisson
distribution is specified with the log link function.
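For reference, the model this specification describes can be written out as follows. The notation ($y_i$ for a patient's event count, $x_{ik}$ for covariates, $\beta_k$ for coefficients) is ours, introduced for illustration only.

```latex
\[
  y_i \sim \mathrm{Poisson}(\mu_i), \qquad
  \Pr(Y_i = y) = \frac{e^{-\mu_i}\,\mu_i^{\,y}}{y!}, \quad y = 0, 1, 2, \dots
\]
\[
  \log \mu_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}
  \quad\Longleftrightarrow\quad
  \mu_i = \exp\!\left(\beta_0 + \sum_{k=1}^{p} \beta_k x_{ik}\right)
\]
```

Because of the log link, each coefficient acts multiplicatively on the expected count: a one-unit increase in $x_{ik}$ multiplies $\mu_i$ by $e^{\beta_k}$.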
DATA ANALYSIS
In order to compare five different treatment
regimens for hypercholesterolemia, two cohorts of
patients were analyzed. The first cohort, the
primary prevention group, consisted of patients
with a diagnosis of hypercholesterolemia. In
addition, these patients had no history of the
following clinical events: stroke, myocardial
infarction, hospitalization due to unstable angina,
or revascularization procedure. These patients
were studied for a minimum of six months to a
maximum of three years; individual observation
times (total time enrolled) were recorded.
Compliance times (total time treated) were also
calculated.
The second cohort, the secondary prevention
group, was composed of patients with post-MI
syndrome and/or one of the other specified
events: stroke, myocardial infarction,
hospitalization due to unstable angina, or
revascularization procedure. The same timeframe
applied to secondary prevention patients.
Approximately 23,000 patients were included in
this study; about 19,000 were primary prevention
patients. About half of the primary prevention
group fell into the untreated group.
Comparisons were done for two specific
outcomes. The first set of analyses looked at cost
differences among non-Medicare patients in the
five treatment groups. The second set of analyses
evaluated clinical outcomes. Both analyses were
characterized by the use of propensity scores to
control for group differences; only the clinical
events analysis employed Poisson regression.
Preliminary regression analyses using backwards
selection identified comorbidities and coincidental
medications that impacted resource use
(significance level 0.05). These were included as
dummy variables in the CATMOD models. They
include typical conditions for this patient group,
such as diabetes (diab), high blood pressure
(hibp), arteriosclerosis and related heart disease
(arte), arthritis (arth), and non-specific or
ill-defined chest pain (pain). We also included
significant additional comorbidities
(comorb1-comorb5) that were less common, but
strongly predictive of resource use. Seven
specific types of non-treatment medications that
were associated with resource use were also
included (rx1-rx7).
PROPENSITY SCORES
Patients who take drugs to control high
cholesterol may differ from each other and also
from patients who are untreated; these
differentiating factors may affect both the risk of
certain cardiovascular events and also medical
costs.
As a preliminary step for testing this hypothesis,
we calculated propensity scores, which were used
to control for confounding factors. The score
represents the likelihood that a patient in a
particular prevention cohort will be prescribed a
particular treatment regimen. For this study,
propensity scores were calculated as a function of
several factors: age, sex, health insurance,
comorbidities, and other coincidental drugs.
We applied the categorical modeling techniques
using generalized logit models available in
CATMOD. The code used to perform this analysis
is shown below.
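FIG. 1.
proc catmod data = mylib.probs;
   /* generalized logit model: treatment group as a
      function of the twenty patient characteristics */
   model trxgrp = sex agecat insur diab hibp
                  arte arth pain comorb1-comorb5 rx1-rx7
                  / nodesign noprofile;
   /* save observed and predicted logits and
      probabilities for every profile */
   response logit / out = mylib.propscr;
run;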
The output dataset from the CATMOD procedure
(mylib.propscr) includes several observations for
each unique combination of the independent
variables. Some of the output data is shown
below.
FIG. 2.
TRXGRP  _SAMPLE_  _TYPE_    _NUMBER_  _OBS_     _SEOBS_  _PRED_    _SEPRED_  _RESID_
        1         FUNCTION  1         -2.94444  0.59235  -2.65396  0.09288   -0.29048
        1         FUNCTION  2         -2.65676  0.51725  -1.77673  0.07076   -0.88003
        1         FUNCTION  3         -4.04305  1.00873  -3.79941  0.14922   -0.24364
        1         FUNCTION  4          .         .       -3.06348  0.11226    .
DRUGA   1         PROB      1          0.04615  0.02602   0.05377  0.00463   -0.00762
DRUGB   1         PROB      2          0.06154  0.02981   0.12928  0.00778   -0.06775
DRUGC   1         PROB      3          0.01538  0.01527   0.01710  0.00249   -0.00172
DRUGD   1         PROB      4          0.00000  0.00000   0.03570  0.00381   -0.03570
DRUGE   1         PROB      5          0.87692  0.04075   0.76413  0.01007    0.11279
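The FUNCTION and PROB rows in Fig. 2 are two views of the same fitted model. As a sketch, assuming the untreated category (DRUGE, the last response level) serves as the reference and writing $\pi_j(x)$ for the probability of treatment $j$ given profile $x$ (our notation, not the paper's), the generalized logits and the probabilities recovered from them are:

```latex
\[
  \log\frac{\pi_j(x)}{\pi_E(x)} = \alpha_j + x'\beta_j ,
  \qquad j \in \{A, B, C, D\}
\]
\[
  \pi_j(x) = \frac{\exp(\alpha_j + x'\beta_j)}
                  {1 + \sum_{k \in \{A,B,C,D\}} \exp(\alpha_k + x'\beta_k)},
  \qquad
  \pi_E(x) = \frac{1}{1 + \sum_{k \in \{A,B,C,D\}} \exp(\alpha_k + x'\beta_k)}
\]
% Check against sample 1 in Fig. 2 (predicted values):
\[
  \log\frac{0.05377}{0.76413} \approx -2.654
  \quad\text{(FUNCTION 1, \_PRED\_ = } -2.65396\text{)}
\]
```

The agreement between the log of the ratio of predicted probabilities and the predicted value of function 1 is consistent with this reading of the output.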
We wanted to keep the profile information, plus
the information associated with the probabilities of
each treatment type (_PRED_, _SEPRED_), so
we created a temporary dataset limited to these
observations (where _TYPE_ = 'PROB').
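A minimal sketch of that subsetting step is given here. It assumes the CATMOD output dataset carries the twenty independent variables alongside the response variable and the statistics columns; the dataset name temp matches the one read by the transpose step, and the KEEP list is our guess at what is needed downstream.

```sas
/* Keep only the predicted-probability rows of the CATMOD output. */
data temp;
   set mylib.propscr;
   where _type_ = 'PROB';
   /* profile variables, treatment label, probability and its SE */
   keep sex agecat insur diab hibp arte arth pain
        comorb1-comorb5 rx1-rx7
        trxgrp _pred_ _sepred_;
run;
```

This leaves five rows per profile, one for each treatment group.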
In order to run the regressions controlling for
propensity scores, we had to be able to merge
that data back into our analysis file. Because we
have twenty independent variables, we created a
'profile' variable:
FIG. 3.
profile = sex || agecat || insur || comorb1 ||
          comorb2 || comorb3 || comorb4 || comorb5 ||
          diab || hibp || arte || arth || pain || rx1 ||
          rx2 || rx3 || rx4 || rx5 || rx6 || rx7;
We transposed our data to obtain one
observation per profile:
FIG. 4.
proc transpose data = temp
               out = transp;
   var _pred_ _sepred_;
   id trxgrp;
   by profile;
run;
data propscr (drop = _name_);
   /* one row per profile: predicted probabilities,
      then their standard errors */
   merge transp (where = (upcase(_name_) = '_PRED_')
                 rename = (druga = dapred drugb = dbpred
                           drugc = dcpred drugd = ddpred
                           druge = depred))
         transp (where = (upcase(_name_) = '_SEPRED_')
                 rename = (druga = dasepred drugb = dbsepred
                           drugc = dcsepred drugd = ddsepred
                           druge = desepred));
   by profile;
run;
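Once the profile-level scores exist, they have to be attached to each patient record. A minimal sketch of that merge is shown here. The dataset names patients and analysis are hypothetical, and patients is assumed to already carry the profile variable built as in Fig. 3.

```sas
/* Both inputs must be sorted by the merge key. */
proc sort data = patients;
   by profile;
run;
proc sort data = propscr;
   by profile;
run;

data analysis;
   merge patients (in = inpat)
         propscr;
   by profile;
   if inpat;   /* keep every patient record, drop unmatched profiles */
run;
```

Every patient who shares a profile receives the same five predicted probabilities and standard errors.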
The results obtained were a set of probabilities
and standard errors for each unique patient
profile in the dataset. These probabilities, which
reflect the distribution of patients by treatment,
are merged back into the full dataset by the
profile variable. As a result, each patient record
now contained values associated with probable
group membership for each treatment group. An
example of a profile and associated probabilities
of each treatment type is shown below:
TABLE 1. PATIENT PROFILE 1
(FEMALE, <40 YEARS, PRIVATE INSURANCE, NO COMORBIDITIES,
NO DRUGS = 0)
Thus, patients with different characteristics, for
instance, male patients over 60 years with one or
more comorbidities, would be associated with a
different set of probabilities.
Comparisons of analyses conducted with
propensity scores to those done without
propensity scores allowed us to gauge the impact
of adjusting our models for each patient's
probability of falling into each of the treatment
groups based on demographic and medical
characteristics.
In our ANOVAs comparing costs among the five
treatment groups, we calculated a least-squares
mean for the actual costs and for the log of the
costs and included the 95% confidence interval for
each mean (reported in Table 2). In order to
compare costs among treatment regimens, we
used the log of the costs and reported the p-values
associated with multiple comparison tests of
differences between the means. This approach
allowed us to minimize the effects of some
extremely high costs experienced by a handful of
patients. Analysis of log costs corresponds to
analysis using geometric means. Furthermore, we
describe the percentage difference between each
pair of comparisons (i.e., one treatment regimen
relative to another (Table 3)).
TABLE 2
Type of Cost and             Arithmetic                       Geometric
Drug Regimen                 Adjusted Mean  95% CI            Adjusted Mean  95% CI
Medical Svcs Costs
  Drug A                     14,302         (10,038-18,566)   2,728          (2,496-2,982)
  Drug B                     11,643         (8,226-15,060)    2,595          (2,417-2,787)
  Drug C                     17,247         (10,355-24,138)   2,762          (2,392-3,189)
  Drug D                     12,224         (6,683-17,764)    2,820          (2,512-3,166)
  Category E - no treatment  14,651         (12,653-16,649)   2,416          (2,317-2,519)
TABLE 3
     Relative to     Relative to     Relative to     Relative to
     Drug B          Drug C          Drug D          Drug E
     %       p       %       p       %       p       %       p
A    5.1%    0.38    -1.2%   0.88    -3.3%   0.65    12.9%   0.018
B                    -6.0%   0.44    -8.0%   0.22    7.4%    0.10
C                                    -2.1%   0.82    14.3%   0.082
D                                                    16.7%   0.015
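The percentage differences in Table 3 can be reproduced from the geometric adjusted means in Table 2. As a sketch, writing $\overline{\log c}_j$ for the least-squares mean of log cost in group $j$ and $\mathrm{GM}_j$ for the corresponding geometric mean (our notation):

```latex
\[
  \mathrm{GM}_j = \exp\!\left(\overline{\log c}_j\right),
  \qquad
  \%\Delta_{jk}
  = 100\left(\frac{\mathrm{GM}_j}{\mathrm{GM}_k} - 1\right)
  = 100\left(\exp\!\left(\overline{\log c}_j - \overline{\log c}_k\right) - 1\right)
\]
% Worked from Table 2:
\[
  \%\Delta_{AB} = 100\left(\frac{2728}{2595} - 1\right) \approx 5.1\%,
  \qquad
  \%\Delta_{DE} = 100\left(\frac{2820}{2416} - 1\right) \approx 16.7\%
\]
```

A difference between two means on the log scale therefore translates into a ratio, and hence a percentage difference, on the cost scale.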
COUNTING RARE EVENTS
In addition to determining whether or not different
treatment regimens resulted in different costs, we
also examined the incidence of specific clinical
events. All events that occurred within 30 days of
each other were counted as a single event.
These events were defined based on a predetermined order of precedence, in the case of
more than one diagnosis: revascularization,
stroke, acute MI, unstable angina. Because of
the relatively low probability of one of these
events actually occurring - particularly in the
primary cohort - Poisson regression was used to
analyze the event data.
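The 30-day rule can be sketched in a DATA step. This is an illustration under our own assumptions, not the authors' code: events_raw is a hypothetical dataset with one row per recorded clinical event, patid a hypothetical patient identifier, and evtdate a SAS date. It also reads the rule as "more than 30 days after the preceding record opens a new event" and ignores the order of precedence among diagnoses.

```sas
proc sort data = events_raw out = ev_sorted;
   by patid evtdate;
run;

data evcount (keep = patid events);
   set ev_sorted;
   by patid;
   retain prevdate events;
   if first.patid then do;
      events   = 0;
      prevdate = .;
   end;
   /* a record opens a new event when it is the patient's first,
      or falls more than 30 days after the preceding record */
   if prevdate = . or evtdate - prevdate > 30 then events = events + 1;
   prevdate = evtdate;
   if last.patid then output;   /* one row per patient */
run;
```

Patients with no recorded events do not appear in evcount and would need a count of zero assigned when this file is joined to the analysis dataset.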
These models also adjusted for propensity
scores, ambulatory care group (ACG),
observation time, and compliance time. All
patients, including Medicare-eligible patients,
were included in these analyses.
In other SAS procedures, such as PROC GLM
and PROC MIXED, the LSMEANS option allows
calculation of adjusted means. In PROC
GENMOD, there is no similar option available. In
our study, we wanted to be able to report the
adjusted mean outcome for each of our treatment
groups, so we devised a method that we describe
below.
Adjusted means (analogous to least-squares
means) were calculated by holding covariates at
their respective means and setting up
prespecified contrasts for each treatment group
versus the others. The calculation of the adjusted
means was accomplished through an additional
programming step.
We obtained means for all of the independent
variables and saved them to a temporary dataset.
Within that dataset, we set dependent variables to
missing, created five dummy observations (one
for each treatment group), and appended the
five-record dataset to our patient dataset. Thus,
when we ran our Poisson regression we had five
additional "patients"; the predicted values for
these observations represent the adjusted
means. There are other ways of calculating the
adjusted means; this is perhaps the simplest. The
SAS code we used is included below:
FIG. 5.
data temp;
   set propscr;
   *** set trx group dummies ***;
   trxgrp_a = 0;
   trxgrp_b = 0;
   trxgrp_c = 0;
   trxgrp_d = 0;
   if trxgrp = 'DRUGA' then trxgrp_a = 1;
   else if trxgrp = 'DRUGB' then trxgrp_b = 1;
   else if trxgrp = 'DRUGC' then trxgrp_c = 1;
   else if trxgrp = 'DRUGD' then trxgrp_d = 1;
run;
*** means of the covariates, one output record ***;
proc means data = temp nway;
   var dapred dbpred dcpred ddpred acg pritime comply;
   output out = mtemp mean = ;
run;
data mtemp;
   set mtemp;
   *** set dependents to missing ***;
   events = .;
   rcnt = .;
   acnt = .;
   mcnt = .;
   scnt = .;
   *** create 5 dummy obs ***;
   trxgrp_a = 1; trxgrp_b = 0; trxgrp_c = 0; trxgrp_d = 0;
   output;
   trxgrp_a = 0; trxgrp_b = 1; trxgrp_c = 0; trxgrp_d = 0;
   output;
   trxgrp_a = 0; trxgrp_b = 0; trxgrp_c = 1; trxgrp_d = 0;
   output;
   trxgrp_a = 0; trxgrp_b = 0; trxgrp_c = 0; trxgrp_d = 1;
   output;
   *** all dummies zero = category E, no treatment ***;
   trxgrp_a = 0; trxgrp_b = 0; trxgrp_c = 0; trxgrp_d = 0;
   output;
run;
*** append the five dummy records to the patient data ***;
data rarevnt;
   set temp (keep = events rcnt acnt scnt mcnt
             trxgrp_a trxgrp_b trxgrp_c trxgrp_d
             dapred dbpred dcpred ddpred acg pritime comply)
       mtemp;
run;
The PROC GENMOD statements for the model
are shown in Figure 6. We used the 'make'
statement to write our results to a dataset, and
then used PROC PRINT to look at our five
observations containing adjusted means.
FIG. 6.
proc genmod data = rarevnt;
   model events = trxgrp_a trxgrp_b trxgrp_c trxgrp_d dapred
                  dbpred dcpred ddpred acg pritime comply
                  / dist = poisson link = log obstats;
   contrast 'a-b' trxgrp_a 1 trxgrp_b -1;
   contrast 'a-c' trxgrp_a 1 trxgrp_c -1;
   contrast 'a-d' trxgrp_a 1 trxgrp_d -1;
   contrast 'b-c' trxgrp_b 1 trxgrp_c -1;
   contrast 'b-d' trxgrp_b 1 trxgrp_d -1;
   contrast 'c-d' trxgrp_c 1 trxgrp_d -1;
   make 'obstats' out = obevent noprint;
run;
*** print the adjusted means ***;
proc print data = obevent;
   where events = .;
run;
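In terms of the fitted model, the predicted value for each dummy observation can be sketched as follows. The symbols are ours: $\hat\beta_0$ is the intercept, $\hat\beta_g$ the coefficient of the dummy for treatment group $g$, $\hat\gamma_k$ the coefficients of the remaining covariates, and $\bar{x}_k$ the covariate means. The interval formula assumes the confidence limits are built on the log scale and then back-transformed.

```latex
\[
  \hat\mu_g = \exp\!\Big(\hat\beta_0 + \hat\beta_g + \sum_k \hat\gamma_k \bar{x}_k\Big),
  \quad g \in \{A, B, C, D\},
  \qquad
  \hat\mu_E = \exp\!\Big(\hat\beta_0 + \sum_k \hat\gamma_k \bar{x}_k\Big)
\]
\[
  95\%\ \mathrm{CI}: \quad
  \exp\!\big(\hat\eta_g \pm 1.96\,\mathrm{SE}(\hat\eta_g)\big),
  \qquad
  \hat\eta_g = \log \hat\mu_g
\]
```

Under this reading the untreated category is the reference group, and $e^{\hat\beta_g}$ is the ratio of the adjusted mean for group $g$ to the adjusted mean for the untreated group.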
Sample results are shown below for total clinical
events (any event, regardless of type). Adjusted
means and 95% confidence intervals are reported
in Table 4. Table 5 summarizes the results of the
multiple contrasts.
TABLE 4.
Type of Event and            Adjusted
Drug Regimen                 Mean      95% CI
Total Events
  Drug A                     0.0194    (0.0158-0.0238)
  Drug B                     0.0162    (0.0135-0.0195)
  Drug C                     0.0199    (0.0143-0.0278)
  Drug D                     0.0183    (0.0138-0.0243)
  Category E - no treatment  0.0081    (0.0068-0.0097)
TABLE 5
     Relative to      Relative to      Relative to      Relative to
     Drug B           Drug C           Drug D           Drug E
     χ²      p        χ²      p        χ²      p        χ²      p
A    2.66    0.10     0.02    0.88     0.12    0.73     54.10   <0.001
B                     1.26    0.26     0.65    0.42     39.24   <0.001
C                                      0.16    0.69     24.15   <0.001
D                                                       26.66   <0.001
DISCUSSION
We found that the inclusion of propensity scores
yielded more conservative results for the
comparison of the untreated group (E) to each of
the treated groups. Much more interesting were
the comparisons among the treatment groups. In
many cases the direction of the difference
between one treatment and another changed
when propensity scores were included. The
overall result of including propensity scores was a
more consistent pattern when the analysis was
repeated for different time periods and with
Medicare patients included. The comparison of
the amount of difference between treatments was
strengthened by the inclusion of propensity
scores.
The actual programming required to develop the
propensity scores using PROC CATMOD was
relatively simple, and the payoff in clarification of
results was substantial.
The application of Poisson regression to the
clinical event data also yielded interesting results.
While the unadjusted means show some
differences among the groups, and little
difference between the treated groups and the
untreated group (range 1.4-3.0%), the
regression results demonstrate virtually no
difference among treated groups (1.6-2.0%), but
a drop for the untreated group (<1%). These
results reflect a more plausible clinical scenario.
We found that PROC GENMOD in SAS/STAT
easily allows a programmer to perform a Poisson
regression analysis.
REFERENCES
Koch GG, Atkinson SS, Stokes ME. Poisson Regression.
Found in: Encyclopedia of Statistical Sciences. Kotz S,
Johnson NL, Read CB, Eds. New York. John Wiley &
Sons, Inc. 1986.
Littell RC, Freund RJ, Spector PC. SAS System for
Linear Models, Third Edition. Cary, NC. SAS
Institute Inc. 1991.
Rosenbaum PR, Rubin DB. The central role of the
propensity score in observational studies for causal
effects. Biometrika. 1983;70:41-55.
SAS Institute Inc. SAS/STAT Software, Changes and
Enhancements through Release 6.11. Cary, NC. SAS
Institute Inc. 1990.
SAS Institute Inc. SAS/STAT User's Guide, Version 6,
Fourth Edition: Volume 2. Cary, NC. SAS Institute Inc.
1990.
ACKNOWLEDGMENTS
SAS and SAS/STAT software are registered trademarks or
trademarks of SAS Institute Inc. in the USA and other
countries. ® indicates USA registration.
AUTHOR CONTACT
Stefanie Silva, Statistical Analyst
[email protected]
Lori Potter, Senior Statistical Analyst
[email protected]
Lewin-TAG, Inc.
490 2nd St., Suite 201
San Francisco, CA 94107
(415) 495-8966 (phone)
(415) 495-8669 (fax)