Propensity Score Analyses: A good looking cousin of an RCT
KCASUG Q1: March 4, 2010
Kevin Kennedy, MS, Saint Luke’s Hospital, Kansas City, MO
John House, MS, Saint Luke’s Hospital, Kansas City, MO
Phil Jones, MS, Saint Luke’s Hospital, Kansas City, MO

Motivation
• Estimating treatment effect is important!
  – Is Drug “A” more effective than placebo?
  – Do same-sex classes increase academic performance?
  – Do titanium golf clubs increase the distance of drives?
• Designing ways to answer these questions should be:
  – Ethical
  – Practical
  – Cost effective

The Gold Standard
• Randomized Controlled Trials
  – Randomization of subjects to treatment groups (essentially a coin flip determines the group)
  – On average, all subject characteristics will be balanced between groups

                            Treatment   Control
                            (n=100)     (n=100)    P-value
  Age                       57±3.2      57±3.1     .78
  Male                      57%         58%        .65
  History Diabetes          22%         22%        .99
  History Heart Failure     8%          9%         .75

Benefits of an RCT
• A pure link between treatment and outcome
  – Random allocation of subjects removes the possibility of a third factor being associated with both treatment and outcome
• Can blind subjects and researchers to treatment allocation

Potential Caveats with an RCT
• Ethical Issues:
  – Withholding a treatment generally thought to improve outcomes is often considered unethical
• Practical Issues:
  – Problems with recruitment of subjects
    • Consenting to “alternatives”, and substantial ‘drop out’
  – Cost and Time Issues:
    • Enrolling subjects, training staff, designing trial, treatment
• May be “too” controlled
  – Specific subject criteria and treatment use
  – Population may not represent the “real world” experience

Spaar A, Frey M, Turk A, Karrer W, Puhan MA. Recruitment barriers in a randomized controlled trial from the physicians' perspective: a postal survey. BMC Med Res Methodol. 2009 Mar 2;9:14

So…what now?
• Observational data is popular
  – Treatment is not assigned by randomization, only observed
  – Unfortunately…subject characteristics will likely not be balanced

                            Treatment   Control
                            (n=100)     (n=100)    P-value
  Age                       57±3.2      62±5       .031
  Male                      57%         42%        .047
  History Diabetes          22%         30%        <.001
  History Heart Failure     8%          15%        .035

So…what now?
• Need to account for the differences between treatment and control
  – Common in modeling to “adjust” away differences between groups
    • However, sample size constraints restrict the # of variables to adjust for
• Solution: Propensity Scores

Propensity Score Outline
I. Introduction
II. How to use the score
   i. Matching
   ii. Stratifying
III. Assessing Balance
   i. Standardized Difference
IV. Propensity Scores Using SAS
V. Concluding remarks
   i. Other uses
   ii. Issues with publications

Introduction
• Definition:
  – Propensity score (PS): the conditional probability of being treated given the individual’s covariates
  – Notation:
      $e(x_i) = P(Z_i = 1 \mid X_i = x_i)$
    where $Z_i = 1$ if treatment and $0$ if control, and $x_i$ are the observed covariates
  – The propensity score can be estimated with an ordinary logistic regression model predicting treatment from the covariates that need to be balanced
  – Will be used to balance characteristics between groups

Introduction

                            Treatment   Control
                            (n=100)     (n=100)    P-value
  Age                       57±3.2      62±5       .031
  Male                      57%         42%        .047
  History Diabetes          22%         30%        <.001
  History Heart Failure     8%          15%        .035

Here we would develop a PS for being in the treatment group conditioned on: age, gender, diabetes history, and heart failure history

Introduction-why important?
• Important: For a specific value of the PS, the difference between treatment and control is an unbiased estimate of the average treatment effect at that PS (Rosenbaum & Rubin, 1983; Theorem 4)
• “Quasi-Randomized” experiment
  – Take 2 subjects (one from treatment and the other from control) with the same PS; then you could “imagine” these 2 subjects were “randomly” assigned to each group (since they are equally likely to be treated).
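The statement attributed to Theorem 4 can be written out as a short formula. This is only a sketch in potential-outcome notation that the slides themselves do not use: $Y_i(1)$ and $Y_i(0)$ (the outcome of subject $i$ under treatment and under control) and $Y_i$ (the observed outcome) are symbols introduced here for illustration. It assumes strongly ignorable treatment assignment, i.e. no unmeasured confounders and $0 < e(x) < 1$, which is the condition under which the theorem holds.

```latex
E\left[\, Y_i \mid e(x_i) = p,\ Z_i = 1 \,\right]
  - E\left[\, Y_i \mid e(x_i) = p,\ Z_i = 0 \,\right]
  = E\left[\, Y_i(1) - Y_i(0) \mid e(x_i) = p \,\right]
```

In words: among subjects who share the same PS value $p$, the observed treated-minus-control difference in mean outcome estimates the average treatment effect at that $p$. Covariates left out of the PS model are not covered by this result.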
Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55.

Introduction
• It’s not just a side analysis anymore……

[Figure: bar chart, # of publications with PS in a PubMed search, by year, 2000 to 2009; y-axis 0 to 500]

Ways to use the PS
• Common strategies include:
  – Matching
    • Match treatment and controls on PS
  – Stratification
    • Keep all subjects but analyze in strata (usually quintiles of PS)
  – Regression adjustment

Matching
• Most common use of PS analyses.
  – Since the PS is a single scalar quantity, matching is comparatively easier (as opposed to matching on: age, gender, history, etc…)
  – Matching 1 control to 1 treatment makes for an easily understood analysis
  – Common to match on the logit of the PS since it is approximately normal
      $L(x) = \log\left(\frac{1 - e(x)}{e(x)}\right)$

Matching
• Nearest Neighbor matching (w/o replacement)
  – Randomly order treated and control subjects
  – Take the first treated subject and find the control with the closest propensity score. Remove both from the list
  – Move to the second treated subject and find the control with the closest PS……continue until you run out of treated patients
• This will create a 1:1 match of treated and control patients
  – Note: methods exist for 1:many matches also

Matching
• Problem: The “Nearest” neighbor may not be that “Near”
• May want to enforce a caliper width for acceptable matches
  – E.g. if there is no control within the ‘caliper’ of a case, then no match occurs and the case will be removed
• Common in the literature to use .2*stddev[L(x)] as the caliper
• For a matching macro see: mayoresearch.mayo.edu/biostat/upload/gmatch.sas

Matching: Ideal Scenario
• Before Match

                            Treatment   Control
                            (n=543)     (n=1598)   P-value
  Age                       57±3.2      62±5       .031
  Male                      57%         42%        .047
  History Diabetes          22%         30%        <.001
  History Heart Failure     8%          15%        .035

• After Match

                            Treatment   Control
                            (n=500)     (n=500)    P-value
  Age                       57±3.2      57.3±3     .45
  Male                      57%         57%        .88
  History Diabetes          22%         23%        .48
  History Heart Failure     8%          7%         .77

Stratification
• Matching will inevitably result in a smaller dataset
• Stratifying analyses on PS will keep all data.
  – Create the PS
  – Cut the PS into equal-sized groups (quartiles, quintiles)
    • (Rosenbaum & Rubin, 1983) claim quintile strata will remove about 90% of the bias from the covariates in the PS model
  – Conduct the analyses within these strata

Example
• Comparison of angiography (vs not) in elderly patients with Chronic Kidney Disease (CKD)
  – Propensity score for receiving an angio
    • Based on demographics, history, and hospital characteristics

  Propensity Quintile   Group      # of patients   1-year Mortality   OR (95% CI)
  1 (0-.06)             Angio      46              56.5%              1.02 (.56-1.84)
                        No Angio   1307            56.2%
  2 (.06-.16)           Angio      133             36.8%              .57 (.39-.82)
                        No Angio   1221            50.7%
  3 (.16-.30)           Angio      303             34.7%              .66 (.50-.86)
                        No Angio   1051            44.7%
  4 (.30-.54)           Angio      557             30.7%              .72 (.57-.90)
                        No Angio   797             38.3%
  5 (.54-1)             Angio      967             18.9%              .45 (.35-.59)
                        No Angio   387             34.1%
  Overall               Angio      2014            26.7%              .62 (.54-.70)
                        No Angio   4780            47.4%

Chertow GM, Normand SL, McNeil BJ. "Renalism": inappropriately low rates of coronary angiography in elderly individuals with renal insufficiency. J Am Soc Nephrol. 2004 Sep;15(9):2462-8

Covariate Adjustment
• This use would be the least recommended.
• Fit a model for the PS, and then include that PS as an adjustment covariate in the model evaluating the association between treatment and outcome
• Advantage over normal covariate adjustment
  – Simpler final model
  – Can have many more covariates in the PS model

Assessing Balance
• Remember: the main purpose of a PS is to balance characteristics between treated and controls…so how do we show success?
• P-values
  – Function of sample size
  – May be misleading for stratification or 1:many match
• Standardized Differences
  – Not a function of sample size
  – Can be used for stratification and 1:many matches

Standardized Differences
• Formula: Continuous Variables
    $d = 100 \times \frac{\bar{x}_{treatment} - \bar{x}_{control}}{\sqrt{\frac{s^2_{treatment} + s^2_{control}}{2}}}$
• Formula: Dichotomous Variables
    $d = 100 \times \frac{\hat{p}_{treatment} - \hat{p}_{control}}{\sqrt{\frac{\hat{p}_t (1 - \hat{p}_t) + \hat{p}_c (1 - \hat{p}_c)}{2}}}$
• For stratified analyses: compute d in each stratum and take the average

Standardized Differences
• Sample calculations for a 1:1 match:
• Before Match

         Treatment   Control
         (n=543)     (n=1598)   P-value
  Age    57±3.2      62±5       .031

    $d = 100 \times \frac{57 - 62}{\sqrt{\frac{3.2^2 + 5^2}{2}}} \approx -119$, i.e. $|d| \approx 119$

• After Match

         Treatment   Control
         (n=500)     (n=500)    P-value
  Age    57±3.2      57.3±3     .45

    $d = 100 \times \frac{57 - 57.3}{\sqrt{\frac{3.2^2 + 3^2}{2}}} \approx -9.7$, i.e. $|d| \approx 9.7$

Standardized Differences
• What value constitutes balance?
  – Peter Austin commonly states that values less than 10 indicate balance between groups
  – The closer to ‘0’, the more balanced

Propensity Analysis (Matching) Using SAS
• Simulated Data
• Data specifics
  – N=5000 (~1000 Group1, ~4000 Group2)

                          Group1            Group2
                          N=1011            N=3989            P-value
  Age                     59.4 ± 4.0        63.5 ± 4.0        < 0.001
  Male_Gender             560 ( 55.4% )     2009 ( 50.4% )    0.004
  History of Diabetes     689 ( 16.9% )     516 ( 21.4% )     < 0.001

Example: Create PS

    proc logistic data=dataset descending;
      model group1 = age gender diabetes /* + others */;
      /* pred  = predicted probability of being in group 1 */
      /* logit = the same prediction on the logit scale    */
      output out=pred p=pred xbeta=logit;
    run;

Example: Define Caliper

    proc means data=pred stddev;
      var logit;
      output out=lstd;
    run;

    /* Create the "caliper" of .2*stddev(logit) */
    data _null_;
      set lstd;
      if _stat_='STD' then do;
        call symputx('std', logit/5);
      end;
    run;

Example: Perform Match

    %gmatch(data=pred, group=group1, id=id,
            mvars=logit, wts=1, dmaxk=&std, ncontls=1,
            seedca=987896, seedco=425632, out=match);

                          Group1            Group2
                          N=858             N=858             P-value
  Age                     60.1 ± 3.6        60.17 ± 3.62      .678
  Male_Gender             469 ( 54.66% )    478 ( 55.71% )    .662
  History of Diabetes     261 ( 30.42% )    256 ( 29.84% )    .792

mayoresearch.mayo.edu/biostat/upload/gmatch.sas

Example: Assess Balance
• Original Data
  – %std_diff(data=fulldata, group=group1, continuous=age {+others}, binary=male diabetes {+others}, out=before)
• Matched Data
  – %std_diff(data=matched_data, group=group1, continuous=age {+others}, binary=male diabetes {+others}, out=after)
• Combine

    data after;
      set after(rename=(stddiff=after_stddiff));
    run;

    proc sql;
      create table both as
      select *
      from before as a join after as b
        on a.variable=b.variable;
    quit;

Example: Assess Balance

  Variable   label      STD DIFF Before   STD DIFF AFTER
  V1         Age        99.65             .3
  V2         Gender     9.22              .45
  V3         Diabetes   15.9              3.3
  …          …          …                 …

    proc gplot data=both;
      title 'Standardized difference plot';
      plot label*StdDiff=1 label*after_stddiff=2 / overlay
           vaxis=axis1 haxis=axis2 href=10 legend=legend1
           AUTOVREF chref=black lhref=3;
    run; quit;

[Figure: Standardized difference plot, Before Match vs After Match, variables listed on the vertical axis in no particular order; horizontal axis: Standardized Difference, 0 to 120]
Hmmm…a bit ugly

Format macro

    /* Sort by stddiff before match */
    proc sort data=both; by stddiff; run;

    /*attach formats to variables*/
    %macro doformat(data=);
      /* Counter variable */
      data &data;
        set &data;
        count+1;
      run;
      /* Read in label names into &label */
      proc sql;
        select label into :label separated by '*' from &data;
      quit;
      /* Count # of variables (%words is a helper macro not shown in these slides) */
      %let numvar=%words(&label,delim=%str(*));
      /* Format counter value (i) with label (i) */
      proc format;
        value fmt
          %do i=1 %to &numvar;
            &i="%qscan(&label,&i,*)"
          %end;
        ;
      run;
      data &data;
        set &data;
        format count fmt.;
      run;
    %mend;
    %doformat(data=both);

Assessing Balance

  Variable   label      STD DIFF Before   STD DIFF AFTER   Count
  V1         Age        99.65             .3               Age
  V3         Diabetes   15.9              3.3              Diabetes
  V2         Gender     9.22              .45              Gender
  …          …          …                 …                …

    proc gplot data=both;
      title 'Standardized difference plot';
      plot count*StdDiff=1 count*after_stddiff=2 / overlay
           vaxis=axis1 haxis=axis2 href=10 legend=legend1
           AUTOVREF chref=black lhref=3;
    run; quit;

[Figure: Standardized difference plot, Before Match vs After Match, variables on the vertical axis sorted by standardized difference before match (Age at top); horizontal axis: Standardized Difference, 0 to 120]

[Figure: Standardized difference plot for a larger example with clinical covariates (stemi, emergency, elective, age, … , renal_failure, underweight), Before Match vs After Match; horizontal axis: Standardized Difference, 0 to 70]

Now What?
• Variable standardized differences are <10, indicating balance
• Now we can see if group membership has an impact on our outcome
  – Caution: this is matched data, so statistically we need to account for the matching
    • Paired t-tests, McNemar’s Test, Conditional Logistic Regression, Stratified Proportional Hazards Regression

Other Uses…
• A way to show just how different 2 groups are…

[Figure: Distribution of Propensity Scores for Group 1 and Group 2; vertical axis: Probability of Group 2, 0 to 1.0]

[Figure: Distribution of Propensity Scores for CEA (Group 1) and CAS (Group 2); vertical axis: Probability of CAS, 0 to 1.0]

Concluding Remarks
• If you want more information: search for Ralph D’Agostino Jr. (Wake Forest) and Peter Austin (Univ of Toronto)
• Introductory Read:
  – D’Agostino JR: Tutorial in Biostatistics: Propensity Score Methods for Bias Reduction in the comparison of treatment to a non-randomized control group. Statist. Med 17 (1998), 2265-2281
• 1:Many Matching
  – Austin P. Assessing balance in measured baseline covariates when using many-to-one matching on the propensity score. Pharmacoepidemiology and drug safety (2008) 17: 1218-1225

Concluding Remarks…things to avoid
• Austin (2008) performed a literature review and found many propensity score matching papers were done incorrectly
  – 47 articles reviewed from the medical literature which did propensity score matching
    • Only 2 studies used standardized differences to assess the match (most relied on p-values)
    • Only 13 used correct statistical methods for matched data
    • See paper for the common errors
  – Only 2 studies assessed balance correctly and used correct statistical methods

Austin PC. A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003. Stat Med. 2008 May 30;27(12):2037-49

Concluding Remarks…things to avoid
• Austin’s Recommendations
  1. Strategy for creating pairings should be specifically stated with appropriate statistical citation
  2. The distribution of baseline characteristics between treated and control should be described
  3. Differences in distributions should be assessed with methods not influenced by sample size
  4. Use appropriate statistical methods to account for the match
     i. McNemar’s Test for binary data
     ii. Use of the strata statement in proc logistic or phreg

What have we learned…if anything
1. An RCT may be the gold standard, but Propensity Scores are its attractive cousin
2. Using a PS can remove a lot of bias in determining treatment effect
3. You can: match, stratify, or adjust for the PS
4. Use the standardized difference to determine balance (unaffected by sample size)

Name: Kevin Kennedy
Company: Mid America Heart Institute: St. Luke’s Hospital
Address: 4401 Wornall Rd, Kansas City, MO
Email: [email protected] or [email protected]

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.