Causal inference for survival analysis (II)

Miguel A. Hernán
Department of Epidemiology, Harvard School of Public Health
Lysebu, September 2004

Outline
1. Definition of causal effect: counterfactuals
2. Estimation of causal effects: inverse probability weighting
3. Causal diagrams: directed acyclic graphs
4. The bias of standard methods
5. Causal models: marginal structural models

So far
- We have described the formal, counterfactual-based definitions of causal effect and conditional exchangeability
- And we have used them to derive methods for the estimation of causal effects under conditional exchangeability
- This approach to causal inference is mathematically/statistically powerful but sometimes cumbersome
- It is uniquely useful to develop (semiparametric) structural models for time-varying exposures

From now on
- The key concepts of causal effect and conditional exchangeability can be represented graphically using causal diagrams
- This approach to causal inference is less mathematically/statistically powerful but natural and simple
- It is used to classify sources of bias (lack of exchangeability) in epidemiology
- And to identify potential problems in study design and analysis

Counterfactuals versus graphs
- The counterfactual approach: statistics
- The causal diagrams approach: computer science / artificial intelligence
- Both approaches are mathematically equivalent: they lead to the same nonparametric estimators
- We try to use the best of both worlds: graphs to conceptualize problems, structural models to analyze data

Diagrams for causal structures (example: L → A → Y)
- DIRECTED edges (arrows) linking nodes (variables)
- ACYCLIC because no arrows point from descendants (effects) to ancestors (causes)
- GRAPHS: hence, DAGs

Causal DAGs and expert knowledge
- Complete DAGs do not exclude any possible causal effect
- Incomplete DAGs encode expert knowledge in the form of missing arrows
- A diagram with no arrow from A to Y means Pr[Y^{a=1}=1] = Pr[Y^{a=0}=1]
- A diagram with an arrow A → Y means either Pr[Y^{a=1}=1] = Pr[Y^{a=0}=1] or Pr[Y^{a=1}=1] ≠ Pr[Y^{a=0}=1]
- The information is in the missing arrows

DAGs and causal DAGs
- A DAG is a causal DAG if the common causes of any pair of variables in the graph are also in the DAG (required by the causal Markov condition)
- That is, a causal DAG does not need to include variables that are not of interest for the analysis and that are not common causes of other variables in the DAG
- Causal DAGs are both (qualitative) structural or causal models and (nonparametric) statistical models

Causal graphs and association
- Causal effects imply associations
- Lack of causal effects implies (conditional) independences
- Let's see this, with emphasis on informal insight rather than formal rigor

Causal effect implies association (A → Y)
- Causal statement: Pr[Y^{a=1}=1] ≠ Pr[Y^{a=0}=1]
- Associational statement: Pr[Y=1|A=1] ≠ Pr[Y=1|A=0]

Common causes imply association (A ← L → Y)
- Causal statement: Pr[Y^{a=1}=1] = Pr[Y^{a=0}=1]
- Associational statement: Pr[Y=1|A=1] ≠ Pr[Y=1|A=0]
- Confounding: lack of exchangeability
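The common-cause structure can be checked numerically. The sketch below is not from the lecture: it is a minimal simulation, with hypothetical probabilities, in which L causes both A and Y, there is no arrow from A to Y (sharp causal null), and yet A and Y are marginally associated; within levels of L the association vanishes.

```python
import random

# Hypothetical data-generating process: L -> A, L -> Y, and NO arrow A -> Y.
random.seed(0)
rows = []
for _ in range(200_000):
    L = random.random() < 0.5                    # common cause
    A = random.random() < (0.8 if L else 0.2)    # exposure depends on L
    Y = random.random() < (0.7 if L else 0.1)    # outcome depends on L only (causal null)
    rows.append((L, A, Y))

def risk(subset):
    return sum(y for _, _, y in subset) / len(subset)

# Marginal association: Pr[Y=1|A=1] - Pr[Y=1|A=0] is far from 0 (confounding)
marginal_rd = risk([r for r in rows if r[1]]) - risk([r for r in rows if not r[1]])

# Conditional on the common cause L, the association disappears
rd_l0 = (risk([r for r in rows if r[1] and not r[0]])
         - risk([r for r in rows if not r[1] and not r[0]]))
rd_l1 = (risk([r for r in rows if r[1] and r[0]])
         - risk([r for r in rows if not r[1] and r[0]]))
```

With these (made-up) parameters the marginal risk difference is roughly 0.36 even though treatment does nothing, while the stratum-specific risk differences are approximately zero.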
What do common effects imply? (A → L ← Y)
- Causal statement: Pr[Y^{a=1}=1] = Pr[Y^{a=0}=1]
- Associational statement: Pr[Y=1|A=1] = Pr[Y=1|A=0]
- Y and A are marginally independent

Two variables are marginally associated if…
- They are cause and effect
- They share common causes
- (By chance)

Aside: two variables may be associated by chance
- Even in the absence of structures that lead to association
- Chance is not a structural source of association: increase the sample size and chance associations disappear (while structural associations remain)
- To focus our discussion on bias rather than chance, assume that we are working with the entire population

Conditional independence (A ← L → Y, conditioning on the common cause L)
- Causal statement: Pr[Y^{a=1}=1] = Pr[Y^{a=0}=1]
- Associational statement: Pr[Y=1|A=1, L=l] = Pr[Y=1|A=0, L=l], i.e., Y ⊥ A | L=l, for all l

Similarly… (A → B → Y, conditioning on the intermediate B)
- Causal statement: Pr[Y^{a=1}=1] ≠ Pr[Y^{a=0}=1]
- Associational statement: Pr[Y=1|A=1, B=b] = Pr[Y=1|A=0, B=b], i.e., Y ⊥ A | B=b, for all b

Conditioning on common effects (A → L ← Y, conditioning on L, or on S, an effect of L)
- Causal statement: Pr[Y^{a=1}=1] = Pr[Y^{a=0}=1]
- Associational statement: for some l, Pr[Y=1|A=1, L=l] ≠ Pr[Y=1|A=0, L=l]; for some s, Pr[Y=1|A=1, S=s] ≠ Pr[Y=1|A=0, S=s]
- Selection bias: lack of conditional exchangeability

Example of ascertainment bias, or of how DAGs can help (A → Y; A → C ← Y; C → Y')
- A: exogenous estrogens; Y: endometrial cancer; C: vaginal bleeding; Y': ascertained endometrial cancer
- A: oral contraceptives; Y: thromboembolism; C: medical care; Y': ascertained thromboembolism

Sources of association
- Cause and effect
- Common causes
- Conditioning on common effects, in design or in analysis

Theory of causal DAGs
- Mathematically formalized by Pearl (1988, 1995, 2000) and by Spirtes, Glymour, and Scheines (1993, 2000)
- Judea Pearl: Professor of Computer Science, UCLA
- d-separation: a set of graphical rules to decide whether two variables are d-separated (= independent) or d-connected (= associated, in general)
- If two variables are d-separated without conditioning on any other variables in the DAG, then they are marginally independent
- If two variables are d-separated after conditioning on a set of third variables, then they are conditionally independent (i.e., independent within every joint stratum of the third variables)

d-separation: terminology
- Parents, children; descendants, non-descendants
- A path is any arrow-based route between two variables in the graph, whether or not it follows the direction of the arrows
- Paths can be either blocked or open according to the following graphical rules

Summary of d-separation rules
- A path is blocked if and only if it contains a noncollider that has been conditioned on, or it contains a collider that has not been conditioned on and has no descendants that have been conditioned on
- Two variables are d-separated if all paths between them are blocked (otherwise they are d-connected)

Identifiability conditions: the causal effect can be identified if
- No common causes (A → Y): no back-door path, no confounding, Y^a ⊥ A
- Common causes (A ← L → Y), but enough data to block the back-door paths: no unmeasured confounding, Y^a ⊥ A | L
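The blocking rule above can be written down directly as code. This is a minimal sketch of my own (the function name and the path/edge representation are not from the lecture): a path is a node sequence, the DAG is a set of directed edges, and the rule is applied to each intermediate node.

```python
def path_blocked(path, edges, conditioned, descendants=None):
    """Apply the d-separation blocking rule to a single path.

    path:        node sequence, e.g. ['A', 'L', 'Y']
    edges:       set of directed edges (parent, child)
    conditioned: set of nodes that have been conditioned on
    descendants: optional map node -> set of its descendants (for colliders)
    """
    descendants = descendants or {}
    for i in range(1, len(path) - 1):
        prev, node, nxt = path[i - 1], path[i], path[i + 1]
        collider = (prev, node) in edges and (nxt, node) in edges
        if collider:
            # a collider blocks unless it, or one of its descendants,
            # has been conditioned on
            if node not in conditioned and not (descendants.get(node, set()) & conditioned):
                return True
        elif node in conditioned:
            # a conditioned-on noncollider blocks the path
            return True
    return False

# Back-door path A <- L -> Y: open marginally, blocked by conditioning on L
fork = {('L', 'A'), ('L', 'Y')}
# Path through a common effect A -> L <- Y: blocked marginally, opened by conditioning on L
collider_graph = {('A', 'L'), ('Y', 'L')}
```

Checking the two canonical structures: `path_blocked(['A','L','Y'], fork, set())` is False (open, confounding) and becomes True once `{'L'}` is conditioned on; for `collider_graph` the answers are exactly reversed (selection bias).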
First, what is bias?

Bias
- Bias is a structural association between exposure and outcome that does not arise from the causal effect of exposure on outcome
- Under the causal null hypothesis, exposure and outcome are associated
- There are a finite number of causal structures that produce associations between two variables
- Therefore biases can be classified by structure

Sources of bias
- Cause and effect (A ← Y): bias only if reverse causation; information bias
- Common causes (A ← L → Y): confounding
- Conditioning on common effects (A → L ← Y): selection bias

Standard methods are based on stratification
- They attempt to eliminate confounding by estimating the effect measure within levels of (conditioning on) L, or of functions of L (the propensity score)
- Nonparametric: stratified analysis (Mantel-Haenszel), …
- Parametric/semiparametric: generalized linear models (OLS, logistic regression, …), (time-dependent) Cox proportional hazards regression, propensity score adjustment

Stratification-based methods are problematic because
- Effect estimates may not have a causal interpretation when dealing with time-varying exposures
- There are two problems

Problem #1 (graph: A(0) → L(1) → A(1) → Y(2), with U → L(1) and U → Y(2))
- Adjusting for L eliminates part of the effect of the exposure
- A "direct effect"?
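Problem #1 can be illustrated with a toy simulation (hypothetical numbers, not from the lecture): here all of the exposure's effect on Y runs through L, so the crude contrast captures the real total effect, while stratifying on L erases it and leaves only the (null) direct effect.

```python
import random

# Hypothetical mediation structure: A -> L -> Y, no direct arrow A -> Y,
# and no confounding, so the crude contrast IS the total causal effect.
random.seed(1)
rows = []
for _ in range(200_000):
    A = random.random() < 0.5
    L = random.random() < (0.7 if A else 0.2)   # L is affected by the exposure
    Y = random.random() < (0.6 if L else 0.1)   # A's entire effect runs through L
    rows.append((A, L, Y))

def risk(subset):
    return sum(y for _, _, y in subset) / len(subset)

# Crude risk difference: the true total effect (about 0.25 with these numbers)
total_rd = risk([r for r in rows if r[0]]) - risk([r for r in rows if not r[0]])

# Within levels of L the effect disappears: adjusting for the mediator
# leaves only the null "direct effect"
rd_within_l1 = (risk([r for r in rows if r[0] and r[1]])
                - risk([r for r in rows if not r[0] and r[1]]))
```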
Problem #2: time-varying confounders "affected" by exposure
- Adjusting for L creates (selection) bias
- Even if L is not on a causal pathway between exposure and outcome

Causal effect of interest
- At: antiretroviral therapy at time t (0: no, 1: yes)
- Y: viral load (1 if detectable, 0 otherwise)
- L: CD4 count (0: high, 1: low)
- U: true immunosuppression level
- We are interested in the effect of duration of treatment A on the risk of Y=1, where A = A0 + A1 = 0, 1, or 2
- For example, suppose we want to compare continuous treatment with no treatment at all
- The causal risk ratio of interest is then Pr[Y^{a=2}=1] / Pr[Y^{a=0}=1]
- (Unknown to the data analyst: At has no effect on Y)

Identifiability conditions
- To identify the causal effect of treatment A on the risk of Y=1, we need to be able to identify the causal effect of each component of A
- That is, we need to be able to block all back-door paths for both A0 and A1
- There are no back-door paths for A0
- The only back-door path for A1 can be blocked if we have data on L1
- We are OK then

Stratification to compute the causal effect of A
- Is the conditional risk ratio equal to the causal risk ratio (i.e., one)?
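This question can be answered numerically. The sketch below uses my own hypothetical parameters for the structure just described: U affects L1 and Y, A0 affects L1, treatment A1 is decided from L1, and Y depends on U alone (so treatment truly has no effect). It computes the crude and the L1-conditional risk ratios, and then an inverse-probability-weighted risk ratio in which each person is weighted by Pr[A1 | A0] / Pr[A1 | L1].

```python
import random

random.seed(2)
rows = []
for _ in range(400_000):
    U  = random.random() < 0.5                    # true immunosuppression (unmeasured)
    A0 = random.random() < 0.5                    # treatment at time 0
    p_l1 = {(True, True): 0.5, (True, False): 0.9,
            (False, True): 0.1, (False, False): 0.4}[(U, A0)]
    L1 = random.random() < p_l1                   # low CD4 at time 1
    A1 = random.random() < (0.8 if L1 else 0.3)   # treatment decision driven by L1
    Y  = random.random() < (0.8 if U else 0.2)    # outcome driven by U alone: null effect of A
    rows.append((U, A0, L1, A1, Y))               # indices: 0=U, 1=A0, 2=L1, 3=A1, 4=Y

def weighted(select, weight=lambda r: 1.0):
    return [(weight(r), r[4]) for r in rows if select(r)]

def risk(pairs):
    return sum(w * y for w, y in pairs) / sum(w for w, _ in pairs)

# Crude risk ratio Pr[Y=1|A=2] / Pr[Y=1|A=0]: biased away from 1
crude_rr = (risk(weighted(lambda r: r[1] and r[3]))
            / risk(weighted(lambda r: not r[1] and not r[3])))

# Conditioning on L1 does not fix it: selection bias for the A0 component
cond_rr = (risk(weighted(lambda r: r[1] and r[3] and r[2]))
           / risk(weighted(lambda r: not r[1] and not r[3] and r[2])))

# IP weighting with stabilized weights Pr[A1|A0] / Pr[A1|L1], estimated from the data
def pr_a1(select):
    sub = [r for r in rows if select(r)]
    return sum(r[3] for r in sub) / len(sub)

p_num = {a0: pr_a1(lambda r, a0=a0: r[1] == a0) for a0 in (True, False)}
p_den = {l1: pr_a1(lambda r, l1=l1: r[2] == l1) for l1 in (True, False)}

def sw(r):
    num = p_num[r[1]] if r[3] else 1 - p_num[r[1]]
    den = p_den[r[2]] if r[3] else 1 - p_den[r[2]]
    return num / den

ipw_rr = (risk(weighted(lambda r: r[1] and r[3], sw))
          / risk(weighted(lambda r: not r[1] and not r[3], sw)))
```

With these parameters the crude risk ratio is about 1.4 and the L1-conditional one about 1.14, while the weighted (pseudopopulation) risk ratio is approximately 1, the true null. This previews the answer developed in the next slides.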
- The conditional risk ratio is Pr[Y=1|A=2, L1=l] / Pr[Y=1|A=0, L1=l]
- NO: conditioning on L1 eliminates confounding (blocks the back-door path) for one component of A (A1), but creates selection bias for the other component (A0)
- As long as one component of A is associated with Y, A is associated with Y

To stratify or not to stratify…
- Not stratifying is bad because there is confounding
- Stratifying is bad because stratification eliminates confounding at the cost of introducing selection bias
- This happens because the confounder for part of the exposure is affected by another part of the exposure

More generally
- There is bias if either the confounder is affected by the exposure or shares a common cause with it
- There is bias even if the confounder is not on the causal pathway from exposure to outcome

In summary: analytic control of confounding
- Methods that estimate the association measure ignoring the data on L1: the association measure does not have a causal interpretation if there is confounding by L1
- Stratification-based methods, which estimate the association measure within levels of L1: the association measure does not have a causal interpretation if L1 is affected by the exposure (or by a cause of the exposure)
- Need for other methods (yes, IPW)

Stratification versus IPW
- Stratification: the back-door path is blocked by conditioning on the confounder L
- IPW: the back-door path is eliminated in the pseudopopulation, not by conditioning on the confounder
- IPW appropriately adjusts for confounding when time-dependent confounders are affected by exposure (or by causes of exposure)
- Because adjustment is achieved by eliminating the arrow from confounder to subsequent exposure (in the pseudopopulation)

Models
- We now need models
- Conditional associational model (a standard statistical model): E[Y|A] = θ0 + θ1 cum(A)
- Marginal structural model (a causal model): E[Y^a] = β0 + β1 cum(a)

Marginal structural models
- MODELS for the MARGINAL distribution of counterfactual outcomes (STRUCTURAL)
- Robins (1998)

Some types of MSMs (we assume all models are correctly specified)
- Linear: E[Y^a] = β0 + β1 cum(a); β1 is the causal mean increase per unit of exposure
- Logistic: logit Pr[D^a=1] = β0 + β1 cum(a); exp(β1) is the causal odds ratio
- Repeated measures: E[Y^a(t+1)] = β0(t) + β1 cum[a(t)]; logit Pr[D^a(t+1)=1] = β0(t) + β1 cum[a(t)]
- Etc.

Marginal structural Cox proportional hazards model
- λ_{T^a}(t) = λ0(t) exp[β1 a(t)]
- exp(β1) is the causal rate (hazard) ratio
- But the outcome of these models is unobserved: how can we fit them and estimate the causal parameter β1?
- Answer: using IPW estimation. How are the weights defined?

Unstabilized versus stabilized weights
- Unstabilized weights W(t) = ∏_{k=0}^{t} 1 / f[A(k) | A(0),…,A(k−1), L(0),…,L(k)]: the inverse of the probability of having one's own observed treatment history given the time-varying covariates
- Problem: they lead to inefficient estimators of the parameters of marginal structural models

(Time-varying) stabilized weights
- SW(t) = ∏_{k=0}^{t} f[A(k) | A(0),…,A(k−1)] / f[A(k) | A(0),…,A(k−1), L(0),…,L(k)]
- Denominator: probability of having one's own observed treatment history given the time-varying covariates
- Numerator: probability of having one's own observed treatment history
- If there is no confounding, SW(t) = 1

An application of the MS Cox model: the data
- MACS (Multicenter AIDS Cohort Study): 5,622 men (1984– )
- WIHS (Women's Interagency HIV Study): 2,628 women (1994– )
- Semiannual visits: questionnaire, blood sample

MACS+WIHS: follow-up
- From 1996, or the first subsequent eligible visit, to April 2002
- Median follow-up time 5.4 years; 6,763 person-years of follow-up

MACS+WIHS: study population
- Inclusion criteria (1996): HIV-positive, AIDS-free, had not started Highly Active Antiretroviral Therapy (HAART)
- 1,498 participants met
the inclusion criteria: 66% female, median age 39 years, 37% Caucasian

Outcome
- Time to AIDS or death
- 329 AIDS cases + 53 deaths = 382 events
- 259 participants censored before the end of follow-up

Exposure and covariates
- Exposure A(t): HAART; 918 subjects initiated therapy; incidence rate 22/100 person-years
- Covariates L(t): age, gender, race, prior ART, HIV-1 RNA, CD4, CD8, HIV-related symptoms
- Measured at baseline and every 6 months
- Time-dependent confounders, themselves affected by previous treatment

Estimation of the causal parameter β1
- Marginal structural Cox model: λ_{T^a}(t) = λ0(t) exp[β1 a(t)]
- Fit a standard Cox model λ_T[t | A(0),…,A(t)] = λ_{0,T}(t) exp[θ1 A(t)], reweighting the subjects in each risk set by SW(t)
- The estimate of θ1 is then an unbiased estimate of the (log) causal hazard ratio β1
- SW(t) is more efficient than W(t); the weights have to be estimated

Programming issue
- The weights SW(t) are time-varying
- SAS Proc Phreg (and other standard software) has a weight statement but does not allow for time-varying weights

Solution
- Fit a pooled logistic regression model to approximate the Cox regression:
  logit Pr[D(t+1)=1 | D(t)=0, A(0),…,A(t)] = θ0(t) + θ1 A(t)
  where D(t) = 0 if the subject is alive at time t, and 1 otherwise
- θ1 is a good approximation to β1 when the probability of D(t)=1 is small in each time period

A confidence interval for the causal parameter β1
- Using weights induces within-subject correlation
- The naïve variance provided by standard software is incorrect, and the correct analytical variance is not implemented in standard software
- Solution: use the robust variance (GEE sandwich estimator), e.g., SAS Proc Genmod with a repeated statement rather than Proc Logistic
- The robust variance provides conservative confidence intervals for β1

Estimation of the weights SW(t)
- SW(t) = ∏_{k=0}^{t} f[A(k) | A(0),…,A(k−1)] / f[A(k) | A(0),…,A(k−1), L(0),…,L(k)]
- To estimate the factors in the denominator, fit a logistic model for logit Pr[A(k)=1 | A(0),…,A(k−1), L(0),…,L(k)]; the covariates are functions of past exposure and past covariate history
- Similarly for the numerator: logit Pr[A(k)=1 | A(0),…,A(k−1)]; the covariates are functions of past exposure
- For a subject in risk set k, multiply the probabilities of having had his/her own treatment history from time 0 to k: sometimes Pr[A(k)=1], sometimes Pr[A(k)=0] = 1 − Pr[A(k)=1]

What about censoring?
- Same strategy: weight by the inverse of the probability of having one's own censoring history (1 = censored, 0 = uncensored)
- Fit two logistic models for censoring and compute the stabilized inverse-probability-of-censoring weights:
  SW*(t) = ∏_{k=0}^{t} Pr[C(k)=0 | C(0)=…=C(k−1)=0, A(0),…,A(k−1)] / Pr[C(k)=0 | C(0)=…=C(k−1)=0, A(0),…,A(k−1), L(0),…,L(k)]
- The final weight is the product SW(t) × SW*(t)

Effect(-measure) modification by baseline covariates
- If interesting from a subject-matter standpoint, one can estimate the causal parameter within levels of baseline covariates V, a subset of L(0):
  λ_{T^a}(t | V) = λ0(t) exp{β1 a(t) + β'2 V + β'3 V × a(t)}
- V is included for effect modification, not to adjust for confounding by V
- The weights are redefined as SW(t) = ∏_{k=0}^{t} f[A(k) | A(0),…,A(k−1), V] / f[A(k) | A(0),…,A(k−1), L(0),…,L(k)]

In our study
- (Results were presented as a figure; not recoverable from this transcript)

MSMs: advantages and disadvantages
- Advantages: they resemble standard models; any type of outcome variable
- Disadvantages: not useful to estimate effects of dynamic treatments (i.e., no interaction with time-varying covariates); require a positive probability of exposure for all covariate histories
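Returning to the weights: the stabilized censoring weight SW*(t) defined above is just a running product over visits. A minimal sketch with made-up fitted probabilities (the logistic model fits themselves are not shown; `sw_treatment` is a hypothetical treatment weight SW(t)):

```python
from math import prod

def stabilized_censoring_weight(p_uncens_num, p_uncens_den):
    """SW*(t): product over visits k = 0..t of
    Pr[C(k)=0 | past C=0, treatment history]                 (numerator model)
    / Pr[C(k)=0 | past C=0, treatment + covariate history]   (denominator model)."""
    return prod(num / den for num, den in zip(p_uncens_num, p_uncens_den))

# Hypothetical fitted probabilities of remaining uncensored at visits 0..3
p_num = [0.97, 0.95, 0.96, 0.94]   # given treatment history only
p_den = [0.98, 0.92, 0.97, 0.90]   # given treatment and covariate history
sw_star = stabilized_censoring_weight(p_num, p_den)

# The final weight at t multiplies the treatment and censoring weights
sw_treatment = 1.10                # hypothetical SW(t) from the treatment models
final_weight = sw_treatment * sw_star
```

A subject whose covariates predicted a high chance of censoring (denominator below numerator) is up-weighted, so the uncensored pseudopopulation resembles the full cohort.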