Controlling for Group Differences and Counting Rare Events: Propensity Scores with PROC CATMOD and Poisson Regression with PROC GENMOD

STEFANIE J. SILVA, LEWIN-TAG, INC.
LORI POTTER, LEWIN-TAG, INC.

ABSTRACT

In a retrospective study comparing treatments for hypercholesterolemia and their impact on economic and clinical outcomes, we demonstrate the application of two SAS procedures. In this atypical analysis, PROC CATMOD was used to create propensity scores to control for systematic differences between treatment groups. We compare this method, which uses patient characteristics to predict group membership, with the more common approach of including a large number of separate covariates in the regression model. Rare clinical events were analyzed with Poisson regression using PROC GENMOD. We show how we derived adjusted means with this procedure.

INTRODUCTION

In comparing multiple treatments in nonrandomized patient populations, controlling for characteristics such as demographics, comorbid conditions, coincidental medications, and other factors is crucial to identifying real treatment effects. An alternative to including these variables as separate covariates in an ANCOVA model is to develop a propensity score for each patient and use it to control for group differences (Rosenbaum and Rubin, 1983). Propensity scores summarize the characteristics in a set of classification variables in a way that reduces bias between groups.

PROC CATMOD provides methods for modeling a categorical response variable as a function of one or more continuous or categorical variables. The available response functions include mean scores, cumulative logits, and marginal probabilities.

When outcomes or events are very rare, it may be appropriate to estimate the probability of their occurrence using the Poisson distribution.
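For reference, the Poisson distribution named above can be written out explicitly. The notation is ours and is introduced only for illustration: Y is a patient's event count and mu is its expected value; neither symbol appears in the original analysis.

```latex
P(Y = y) = \frac{e^{-\mu}\,\mu^{y}}{y!}, \qquad y = 0, 1, 2, \ldots, \qquad E(Y) = \operatorname{Var}(Y) = \mu
```

When mu is small, P(Y = 0) = e^{-\mu} is approximately 1 - \mu, so nearly every patient has a count of zero. That is the rare-event situation in which this distribution is a natural choice.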
PROC GENMOD provides a method for performing Poisson regression via a generalized linear model, an extension of the traditional linear model that applies to a broader range of data analysis situations. GENMOD supports a choice of link functions and response distributions. For Poisson regression, the Poisson distribution is specified with the log link function.

DATA ANALYSIS

To compare five different treatment regimens for hypercholesterolemia, two cohorts of patients were analyzed. The first cohort, the primary prevention group, consisted of patients with a diagnosis of hypercholesterolemia and no history of the following clinical events: stroke, myocardial infarction, hospitalization due to unstable angina, or revascularization procedure. These patients were studied for a minimum of six months to a maximum of three years; individual observation times (total time enrolled) were recorded. Compliance times (total time treated) were also calculated.

The second cohort, the secondary prevention group, was composed of patients with post-MI syndrome and/or one of the other specified events: stroke, myocardial infarction, hospitalization due to unstable angina, or revascularization procedure. The same timeframe applied to secondary prevention patients.

Approximately 23,000 patients were included in this study; about 19,000 were primary prevention patients. About half of the primary prevention patients were untreated.

Comparisons were done for two specific outcomes. The first set of analyses looked at cost differences among non-Medicare patients in the five treatment groups. The second set of analyses evaluated clinical outcomes. Both sets of analyses used propensity scores to control for group differences; only the clinical events analysis employed Poisson regression.
Preliminary regression analyses using backward selection (significance level 0.05) identified comorbidities and coincidental medications that affected resource use. These were included as dummy variables in the CATMOD models. They include typical conditions for this patient group, such as diabetes (diab), high blood pressure (hibp), arteriosclerosis and related heart disease (arte), arthritis (arth), and nonspecific or ill-defined chest pain (pain). We also included additional significant comorbidities (comorb1-comorb5) that were less common but strongly predictive of resource use. Seven specific types of non-treatment medications that were associated with resource use were also included (rx1-rx7).

PROPENSITY SCORES

Patients who take drugs to control high cholesterol may differ from each other and also from patients who are untreated; these differentiating factors may affect both the risk of certain cardiovascular events and medical costs. As a preliminary step in testing this hypothesis, we calculated propensity scores, which were used to control for confounding factors. The score represents the likelihood that a patient in a particular prevention cohort will be prescribed a particular treatment regimen. For this study, propensity scores were calculated as a function of several factors: age, sex, health insurance, comorbidities, and other coincidental drugs. We applied the generalized logit models available in PROC CATMOD. The code used to perform this analysis is shown below.

FIG. 1.

   proc catmod data = mylib.probs;
      *** treatment group modeled from the twenty profile variables ***;
      model trxgrp = sex agecat insur diab hibp arte arth pain
                     comorb1-comorb5 rx1-rx7 / nodesign noprofile;
      *** generalized logits; predicted values written to mylib.propscr ***;
      response logit / out = mylib.propscr;
   run;

The output dataset from the CATMOD procedure (mylib.propscr) includes several observations for each unique combination of the independent variables. Some of the output data is shown below.

FIG. 2.
   TRXGRP  _SAMPLE_  _TYPE_    _NUMBER_     _OBS_   _SEOBS_    _PRED_  _SEPRED_   _RESID_
                  1  FUNCTION         1  -2.94444   0.59235  -2.65396   0.09288  -0.29048
                  1  FUNCTION         2  -2.65676   0.51725  -1.77673   0.07076  -0.88003
                  1  FUNCTION         3  -4.04305   1.00873  -3.79941   0.14922  -0.24364
                  1  FUNCTION         4         .         .  -3.06348   0.11226         .
   DRUGA          1  PROB             1   0.04615   0.02602   0.05377   0.00463  -0.00762
   DRUGB          1  PROB             2   0.06154   0.02981   0.12928   0.00778  -0.06775
   DRUGC          1  PROB             3   0.01538   0.01527   0.01710   0.00249  -0.00172
   DRUGD          1  PROB             4   0.00000   0.00000   0.03570   0.00381  -0.03570
   DRUGE          1  PROB             5   0.87692   0.04075   0.76413   0.01007   0.11279

We wanted to keep the profile information, plus the information associated with the probabilities of each treatment type (_PRED_, _SEPRED_), so we created a temporary dataset limited to these observations (where _TYPE_ = 'PROB').

To run the regressions controlling for propensity scores, we had to be able to merge that data back into our analysis file. Because we have twenty independent variables, we created a 'profile' variable:

FIG. 3.

   profile = sex || agecat || insur || comorb1 || comorb2 || comorb3 ||
             comorb4 || comorb5 || diab || hibp || arte || arth || pain ||
             rx1 || rx2 || rx3 || rx4 || rx5 || rx6 || rx7;

We transposed our data to obtain one observation per profile:

FIG. 4.

   *** temp must already be sorted by profile ***;
   proc transpose data = temp out = transp;
      var _pred_ _sepred_;
      id trxgrp;
      by profile;
   run;

   *** one record per profile: predicted probabilities and their standard errors ***;
   data propscr (drop = _name_);
      merge transp (where = (upcase(_name_) = '_PRED_')
                    rename = (druga = dapred drugb = dbpred drugc = dcpred
                              drugd = ddpred druge = depred))
            transp (where = (upcase(_name_) = '_SEPRED_')
                    rename = (druga = dasepred drugb = dbsepred drugc = dcsepred
                              drugd = ddsepred druge = desepred));
      by profile;
   run;

The results obtained were a set of probabilities and standard errors for each unique patient profile in the dataset. These probabilities, which reflect the distribution of patients by treatment, were merged back into the full dataset by the profile variable. As a result, each patient record now contained values associated with probable group membership for each treatment group. An example of a profile and the associated probabilities of each treatment type is shown below:

TABLE 1. PATIENT PROFILE 1 (FEMALE, <40 YEARS, PRIVATE INSURANCE, NO COMORBIDITIES, NO DRUGS = 0)

Thus, patients with different characteristics, for instance male patients over 60 years with one or more comorbidities, would be associated with a different set of probabilities.

Comparing analyses conducted with propensity scores to those done without them allowed us to gauge the impact of adjusting our models for each patient's probability of falling into each of the treatment groups based on demographic and medical characteristics.

In our ANOVAs comparing costs among the five treatment groups, we calculated a least-squares mean for the actual costs and for the log of the costs and included the 95% confidence interval for each mean (reported in Table 2). To compare costs among treatment regimens, we used the log of the costs and reported the p-values associated with multiple comparison tests of differences between the means. This approach allowed us to minimize the effects of some extremely high costs experienced by a handful of patients. Analysis of log costs corresponds to analysis using geometric means. Furthermore, we describe the percentage difference for each pairwise comparison, i.e., one treatment regimen relative to another (Table 3).

TABLE 2

   Type of Cost and             Arithmetic                        Geometric
   Drug Regimen                 Adjusted Mean   95% CI            Adjusted Mean   95% CI
   Medical Svcs Costs
     Drug A                     14,302          (10,038-18,566)   2,728           (2,496-2,982)
     Drug B                     11,643          (8,226-15,060)    2,595           (2,417-2,787)
     Drug C                     17,247          (10,355-24,138)   2,762           (2,392-3,189)
     Drug D                     12,224          (6,683-17,764)    2,820           (2,512-3,166)
     Category E - no treatment  14,651          (12,653-16,649)   2,416           (2,317-2,519)

TABLE 3

        Relative to       Relative to       Relative to       Relative to
        Drug B            Drug C            Drug D            Drug E
        %       p         %       p         %       p         %       p
   A     5.1%   0.38      -1.2%   0.88      -3.3%   0.65      12.9%   0.018
   B                      -6.0%   0.44      -8.0%   0.22       7.4%   0.10
   C                                        -2.1%   0.82      14.3%   0.082
   D                                                          16.7%   0.015

COUNTING RARE EVENTS

In addition to determining whether different treatment regimens resulted in different costs, we also examined the incidence of specific clinical events. All events that occurred within 30 days of each other were counted as a single event. When more than one diagnosis was present, the event was classified by a predetermined order of precedence: revascularization, stroke, acute MI, unstable angina. Because of the relatively low probability of one of these events actually occurring, particularly in the primary cohort, Poisson regression was used to analyze the event data. These models also adjusted for propensity scores, ambulatory care group (ACG), observation time, and compliance time. All patients, including Medicare-eligible patients, were included in these analyses.

In other SAS procedures, such as PROC GLM and PROC MIXED, the LSMEANS statement allows calculation of adjusted means. In PROC GENMOD, there is no similar statement available. In our study, we wanted to report the adjusted mean outcome for each of our treatment groups, so we devised the method described below.

Adjusted means (analogous to least-squares means) were calculated by holding covariates at their respective means and setting up prespecified contrasts for each treatment group versus the others. The calculation of the adjusted means was accomplished through an additional programming step. We obtained means for all of the independent variables and saved them to a temporary dataset. Within that dataset, we set the dependent variables to missing, created five dummy observations (one for each treatment group), and appended the five-record dataset to our patient dataset.
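In terms of the fitted model, the five dummy records amount to the following calculation. This is a sketch in our own notation, assuming the log link described earlier: beta-hat denotes an estimated coefficient, x-bar a covariate mean, and group E is taken as the reference category because it has no indicator variable of its own. It is not output from the original analysis.

```latex
\hat{\mu}_g = \exp\Bigl(\hat{\beta}_0 + \hat{\beta}_g + \sum_{k} \hat{\beta}_k\,\bar{x}_k\Bigr), \quad g \in \{A, B, C, D\},
\qquad
\hat{\mu}_E = \exp\Bigl(\hat{\beta}_0 + \sum_{k} \hat{\beta}_k\,\bar{x}_k\Bigr)
```

Under this form the ratio of two adjusted means, for example \hat{\mu}_A / \hat{\mu}_E = \exp(\hat{\beta}_A), does not depend on the covariate means; the means only set the common level at which the five groups are compared.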
Thus, when we ran our Poisson regression we had five additional "patients"; the predicted values for these observations represent the adjusted means. There are other ways of calculating the adjusted means; this is perhaps the simplest. The SAS code we used is included below:

FIG. 5.

   data temp;
      set propscr;
      *** set trx group dummies ***;
      trxgrp_a = 0; trxgrp_b = 0; trxgrp_c = 0; trxgrp_d = 0;
      if trxgrp = 'DRUGA' then trxgrp_a = 1;
      else if trxgrp = 'DRUGB' then trxgrp_b = 1;
      else if trxgrp = 'DRUGC' then trxgrp_c = 1;
      else if trxgrp = 'DRUGD' then trxgrp_d = 1;
   run;

   *** means of the covariates; acg, pritime, and comply are listed here ***;
   *** so that the dummy records carry a mean for every model covariate  ***;
   proc means data = temp nway;
      var dapred dbpred dcpred ddpred acg pritime comply;
      output out = mtemp mean = ;
   run;

   data mtemp;
      set mtemp;
      *** set dependents to missing ***;
      events = .; rcnt = .; acnt = .; mcnt = .; scnt = .;
      *** create 5 dummy obs, one per treatment group (all zeros = group E) ***;
      trxgrp_a = 1; trxgrp_b = 0; trxgrp_c = 0; trxgrp_d = 0; output;
      trxgrp_a = 0; trxgrp_b = 1; trxgrp_c = 0; trxgrp_d = 0; output;
      trxgrp_a = 0; trxgrp_b = 0; trxgrp_c = 1; trxgrp_d = 0; output;
      trxgrp_a = 0; trxgrp_b = 0; trxgrp_c = 0; trxgrp_d = 1; output;
      trxgrp_a = 0; trxgrp_b = 0; trxgrp_c = 0; trxgrp_d = 0; output;
   run;

   *** append the five dummy records to the patient records ***;
   data rarevnt;
      set temp (keep = events rcnt acnt scnt mcnt
                       trxgrp_a trxgrp_b trxgrp_c trxgrp_d
                       dapred dbpred dcpred ddpred acg pritime comply)
          mtemp;
   run;

The PROC GENMOD statements for the model are shown in Figure 6. We used the MAKE statement to write our results to a dataset, and then used PROC PRINT to look at our five observations containing adjusted means.

FIG. 6.
   proc genmod data = rarevnt;
      model events = trxgrp_a trxgrp_b trxgrp_c trxgrp_d
                     dapred dbpred dcpred ddpred acg pritime comply
                     / dist = poisson link = log obstats;
      *** pairwise contrasts among the treated groups ***;
      contrast 'a-b' trxgrp_a 1 trxgrp_b -1;
      contrast 'a-c' trxgrp_a 1 trxgrp_c -1;
      contrast 'a-d' trxgrp_a 1 trxgrp_d -1;
      contrast 'b-c' trxgrp_b 1 trxgrp_c -1;
      contrast 'b-d' trxgrp_b 1 trxgrp_d -1;
      contrast 'c-d' trxgrp_c 1 trxgrp_d -1;
      make 'obstats' out = obevent noprint;
   run;

   *** print the adjusted means (the five records with a missing response) ***;
   proc print data = obevent;
      where events = .;
   run;

Sample results are shown below for total clinical events (any event, regardless of type). Adjusted means and 95% confidence intervals are reported in Table 4. Table 5 summarizes the results of the multiple contrasts.

TABLE 4

   Type of Event and Drug Regimen   Adjusted Mean   95% CI
   Total Events
     Drug A                         0.0194          (0.0158-0.0238)
     Drug B                         0.0162          (0.0135-0.0195)
     Drug C                         0.0199          (0.0143-0.0278)
     Drug D                         0.0183          (0.0138-0.0243)
     Category E - no treatment      0.0081          (0.0068-0.0097)

TABLE 5

        Relative to       Relative to       Relative to       Relative to
        Drug B            Drug C            Drug D            Drug E
        χ²      p         χ²      p         χ²      p         χ²      p
   A    2.66    0.10      0.02    0.88      0.12    0.73      54.10   <0.001
   B                      1.26    0.26      0.65    0.42      39.24   <0.001
   C                                        0.16    0.69      24.15   <0.001
   D                                                          26.66   <0.001

DISCUSSION

We found that the inclusion of propensity scores yielded more conservative results for the comparison of the untreated group (E) to each of the treated groups. Much more interesting were the comparisons among the treatment groups. In many cases the direction of the difference between one treatment and another changed when propensity scores were included. The overall result of including propensity scores was a more consistent pattern when the analysis was repeated for different time periods and with Medicare patients included. The comparison of the amount of difference between treatments was strengthened by the inclusion of propensity scores. The actual programming required to develop the propensity scores using PROC CATMOD was relatively simple, and the payoff in clarification of results was substantial.

The application of Poisson regression to the clinical event data also yielded interesting results. While the unadjusted means show some differences among the groups, and little difference between the treated groups and the untreated group (range 1.4-3.0%), the regression results demonstrate virtually no difference among treated groups (1.6-2.0%), but a drop for the untreated group (<1%). These results reflect a more plausible clinical scenario.

We found that PROC GENMOD in SAS/STAT easily allows a programmer to perform a Poisson regression analysis.

REFERENCES

Koch GG, Atkinson SS, Stokes ME. Poisson regression. In: Kotz S, Johnson NL, Read CB, eds. Encyclopedia of Statistical Sciences. New York: John Wiley & Sons, Inc.; 1986.

Littell RC, Freund RJ, Spector PC. SAS System for Linear Models, Third Edition. Cary, NC: SAS Institute Inc.; 1991.

Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41-55.

SAS Institute Inc. SAS/STAT Software: Changes and Enhancements through Release 6.11. Cary, NC: SAS Institute Inc.; 1996.

SAS Institute Inc. SAS/STAT User's Guide, Version 6, Fourth Edition, Volume 2. Cary, NC: SAS Institute Inc.; 1990.

ACKNOWLEDGMENTS

SAS and SAS/STAT software are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

AUTHOR CONTACT

Stefanie Silva, Statistical Analyst
[email protected]
Lori Potter, Senior Statistical Analyst
[email protected]
Lewin-TAG, Inc.
490 2nd St., Suite 201
San Francisco, CA 94107
(415) 495-8966 (phone)
(415) 495-8669 (fax)