Causal inference for survival analysis (II)

Miguel A. Hernán
Department of Epidemiology
Harvard School of Public Health
Lysebu, September 2004

Outline
1. Definition of causal effect
   - Counterfactuals
2. Estimation of causal effects
   - Inverse probability weighting
3. Causal diagrams
   - Directed acyclic graphs
4. The bias of standard methods
5. Causal models
   - Marginal structural models
So far
- We have described the formal, counterfactual-based definitions of the key concepts of causal effect and conditional exchangeability
- And we have used them to derive methods for the estimation of causal effects under conditional exchangeability
- This approach to causal inference is mathematically/statistically powerful but sometimes cumbersome

From now on
- The key concepts of causal effect and conditional exchangeability can be represented graphically using causal diagrams
- This approach to causal inference is less mathematically/statistically powerful, but natural and simple
  - Used to classify sources of bias (lack of exchangeability) in epidemiology
  - And to identify potential problems in study design and analysis
  - Uniquely useful to develop (semiparametric) structural models for time-varying exposures
Counterfactual versus graphs
- The counterfactual approach
  - Statistics
- The causal diagrams approach
  - Computer science / Artificial intelligence
- Both approaches are mathematically equivalent
  - They lead to the same nonparametric estimators
- We try to use the best of both worlds
  - Graphs to conceptualize problems
  - Structural models to analyze data

Diagrams for causal structures
[DAG: L → A → Y]
- DIRECTED edges (arrows) linking nodes (variables)
- ACYCLIC, because there are no arrows from descendants (effects) to ancestors (causes)
- GRAPHS: hence, DAGs
Causal DAGs
[DAG: L → A → Y]
- Complete DAGs do not exclude any possible causal effect
- Incomplete DAGs encode expert knowledge in the form of missing arrows

Expert knowledge and causal DAGs
- A missing arrow between A and Y means Pr[Y^{a=1}=1] = Pr[Y^{a=0}=1]
- An arrow A → Y means either Pr[Y^{a=1}=1] = Pr[Y^{a=0}=1] or Pr[Y^{a=1}=1] ≠ Pr[Y^{a=0}=1]
- The information is in the missing arrows
DAGs and causal DAGs
- A DAG is a causal DAG if the common causes of any pair of variables in the graph are also in the DAG
  - Required by the Causal Markov condition
- That is, a causal DAG does not need to include variables that are not of interest for the analysis and that are not common causes of other variables in the DAG
- Causal DAGs are
  - (qualitative) structural or causal models
  - (nonparametric) statistical models

Causal graphs and association
- Causal effects imply associations
- Lack of causal effects implies (conditional) independences
- Let's see this
  - Emphasis on informal insight rather than formal rigor

Causal effect implies association
[DAG: A → Y]
- Causal statement: Pr[Y^{a=1}=1] ≠ Pr[Y^{a=0}=1]
- Associational statement: Pr[Y=1|A=1] ≠ Pr[Y=1|A=0]

Common causes imply association
[DAG: A ← L → Y]
- Causal statement: Pr[Y^{a=1}=1] = Pr[Y^{a=0}=1]
- Associational statement: Pr[Y=1|A=1] ≠ Pr[Y=1|A=0]
- Confounding
  - Lack of exchangeability
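These two contrasts can be checked with a quick simulation (my own illustration, not part of the slides; all probabilities are invented). Under the DAG A ← L → Y with no arrow from A to Y, A and Y are marginally associated but independent within levels of L:

```python
import random

random.seed(0)
n = 200_000

# DAG: A <- L -> Y, with no arrow from A to Y (sharp causal null)
data = []
for _ in range(n):
    L = random.random() < 0.5                    # common cause
    A = random.random() < (0.8 if L else 0.2)    # L -> A
    Y = random.random() < (0.7 if L else 0.1)    # L -> Y; A plays no role
    data.append((L, A, Y))

def p_y(a, l=None):
    """Empirical Pr[Y=1 | A=a] or Pr[Y=1 | A=a, L=l]."""
    rows = [y for (ll, aa, y) in data if aa == a and (l is None or ll == l)]
    return sum(rows) / len(rows)

marginal_gap = abs(p_y(True) - p_y(False))                  # large: association
conditional_gap = abs(p_y(True, True) - p_y(False, True))   # ~0: independence
print(f"marginal gap {marginal_gap:.2f}, gap within L=1 {conditional_gap:.3f}")
```

The marginal contrast is far from zero even though A has no effect on Y; within a stratum of L the contrast vanishes, exactly as the two slides state.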
What do common effects imply?
[DAG: A → L ← Y]
- Causal statement: Pr[Y^{a=1}=1] = Pr[Y^{a=0}=1]
- Associational statement: Pr[Y=1|A=1] = Pr[Y=1|A=0]
  - Y ⫫ A

Two variables are marginally associated if…
- They are cause and effect
- They share common causes
- (By chance)
Aside: Two variables may be associated by chance
- Even in the absence of structures that lead to association
- But chance is not a structural source of association
  - Increase the sample size and chance associations disappear (while structural associations remain)
- To focus our discussion on bias rather than chance, assume that we are working with the entire population

Conditional independence
[DAG: A ← L → Y]
- Causal statement: Pr[Y^{a=1}=1] = Pr[Y^{a=0}=1]
- Associational statement: Pr[Y=1|A=1, L=l] = Pr[Y=1|A=0, L=l]
  - Y ⫫ A | L=l for all l

Similarly…
[DAG: A → B → Y]
- Causal statement: Pr[Y^{a=1}=1] ≠ Pr[Y^{a=0}=1]
- Associational statement: Pr[Y=1|A=1, B=b] = Pr[Y=1|A=0, B=b]
  - Y ⫫ A | B=b for all b

Conditioning on common effects
[DAG: A → L ← Y, conditioning on the collider L]
- Causal statement: Pr[Y^{a=1}=1] = Pr[Y^{a=0}=1]
- Associational statement: for some l, Pr[Y=1|A=1, L=l] ≠ Pr[Y=1|A=0, L=l]
- Selection bias
  - Lack of conditional exchangeability

[DAG: A → L ← Y with L → S, conditioning on S, a descendant of the collider L]
- Causal statement: Pr[Y^{a=1}=1] = Pr[Y^{a=0}=1]
- Associational statement: for some s, Pr[Y=1|A=1, S=s] ≠ Pr[Y=1|A=0, S=s]
- Selection bias
  - Lack of conditional exchangeability
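The selection-bias structure above can also be simulated (again my own sketch with invented probabilities): A and Y are marginally independent, but conditioning on their common effect S creates an association:

```python
import random

random.seed(1)
n = 200_000

# A and Y are marginally independent: no cause-effect, no common cause.
# S is a common effect of both (e.g., selection into the study).
data = []
for _ in range(n):
    A = random.random() < 0.5
    Y = random.random() < 0.5
    S = random.random() < (0.1 + 0.4 * A + 0.4 * Y)   # A -> S <- Y
    data.append((A, Y, S))

def p_y(a, s=None):
    """Empirical Pr[Y=1 | A=a] or Pr[Y=1 | A=a, S=s]."""
    rows = [y for (aa, y, ss) in data if aa == a and (s is None or ss == s)]
    return sum(rows) / len(rows)

marginal_gap = abs(p_y(True) - p_y(False))              # ~0: no association
selected_gap = abs(p_y(True, True) - p_y(False, True))  # opened by conditioning on S
print(f"marginal gap {marginal_gap:.3f}, gap among selected {selected_gap:.2f}")
```

Restricting to S=1 (e.g., analyzing only the selected subjects) induces an A–Y association under the causal null: structurally this is selection bias, not confounding.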
Sources of association
- Cause and effect
- Common causes
- Conditioning on common effects
  - In design or analysis

Example of ascertainment bias, or of how DAGs can help
[DAG with nodes A, Y, C, Y′]
- A: exogenous estrogens; Y: endometrial cancer; C: vaginal bleeding; Y′: ascertained endometrial cancer
- A: oral contraceptives; Y: thromboembolism; C: medical care; Y′: ascertained thromboembolism
Theory of causal DAGs
- Mathematically formalized by
  - Pearl (1988, 1995, 2000)
  - Spirtes, Glymour, and Scheines (1993, 2000)
[Photo: Judea Pearl, Professor of Computer Science, UCLA]

d-separation
- A set of graphical rules to decide whether two variables are
  - d-separated = independent
  - d-connected = associated (in general)
- If two variables are d-separated
  - without conditioning on any other variables in the DAG, then they are marginally independent
  - after conditioning on a set of third variables, then they are conditionally independent (i.e., independent within every joint stratum of the third variables)

d-separation: terminology
- Descendants, non-descendants
  - Parents, children
- A path is any arrow-based route between two variables in the graph
  - whether or not it follows the direction of the arrows
- Paths can be either blocked or open according to the following graphical rules

Summary of d-separation rules
- A path is blocked if and only if it contains a noncollider that has been conditioned on, or it contains a collider that has not been conditioned on and has no descendants that have been conditioned on
- Two variables are d-separated if all paths between them are blocked (otherwise they are d-connected)
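The blocking rule above is mechanical enough to code directly. The following sketch (my own illustration, not code from the lecture) applies the rule to a single path; deciding d-separation of two variables would then mean checking every path between them:

```python
def path_blocked(path, edges, conditioned, descendants=None):
    """Blocking rule from the slide: a path is blocked iff it contains a
    noncollider that has been conditioned on, or a collider that has not
    been conditioned on and has no conditioned-on descendants.

    path         ordered node list, e.g. ['A', 'L', 'Y']
    edges        set of directed edges (parent, child)
    conditioned  set of nodes conditioned on
    descendants  dict node -> set of its descendants (node itself excluded)
    """
    descendants = descendants or {}
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in edges and (nxt, node) in edges
        if not collider and node in conditioned:
            return True          # conditioned noncollider blocks the path
        if collider and node not in conditioned and \
                not (descendants.get(node, set()) & conditioned):
            return True          # unconditioned collider blocks the path
    return False

fork = {('L', 'A'), ('L', 'Y')}        # A <- L -> Y  (back-door path)
collider = {('A', 'L'), ('Y', 'L')}    # A -> L <- Y  (common effect)

print(path_blocked(['A', 'L', 'Y'], fork, set()))        # False: open
print(path_blocked(['A', 'L', 'Y'], fork, {'L'}))        # True: blocked
print(path_blocked(['A', 'L', 'Y'], collider, set()))    # True: blocked
print(path_blocked(['A', 'L', 'Y'], collider, {'L'}))    # False: opened
```

The four calls reproduce the earlier slides: a back-door path is open until the common cause is conditioned on, while a collider path is blocked until the collider (or a descendant of it) is conditioned on.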
Identifiability conditions: the causal effect can be identified if
- There are no common causes
  [DAG: A → Y]
  - No back-door path
  - No confounding
  - Y^a ⫫ A
- There are common causes, but
  [DAG: A ← L → Y]
  - Enough data to block the back-door paths
  - No unmeasured confounding
  - Y^a ⫫ A | L

Outline
1. Definition of causal effect
   - Counterfactuals
2. Estimation of causal effects
   - Inverse probability weighting
3. Causal diagrams
   - Directed acyclic graphs
4. The bias of standard methods
5. Causal models
   - Marginal structural models
First, what is bias?
- Bias is a structural association between exposure and outcome that does not arise from the causal effect of exposure on outcome
  - Under the causal null hypothesis, exposure and outcome are associated
- There are a finite number of causal structures that produce associations between two variables
  - Therefore, biases can be classified by structure

Bias
- Cause and effect [DAG: A → Y]
  - Bias only if reverse causation
  - Information bias
- Common causes [DAG: A ← L → Y]
  - Confounding
- Conditioning on common effects [DAG: A → L ← Y]
  - Selection bias

Standard methods are based on stratification
- They attempt to eliminate confounding by estimating the effect measure within levels of (conditioning on) L
  - or of functions of L (the propensity score)
- Nonparametric
  - Stratified analysis (Mantel-Haenszel), …
- Parametric/Semiparametric
  - Generalized linear models (OLS, logistic regression, …)
  - (Time-dependent) Cox proportional hazards regression
  - Propensity score adjustment

Stratification-based methods are problematic because
- Effect estimates may not have a causal interpretation when dealing with time-varying exposures
  - There are two problems
Problem #1
[DAG with nodes A(0), L(1), A(1), Y(2), and unmeasured U]
- Adjusting for L eliminates part of the effect of the exposure
- A direct effect?

Problem #2
[DAG with nodes A(0), L(1), A(1), Y(2), and unmeasured U]
- Adjusting for L creates (selection) bias
- Even if L is not on a causal pathway between exposure and outcome

Problem #2: Time-varying confounders "affected" by exposure
- At: antiretroviral therapy at time t (0: no, 1: yes)
- Y: viral load (1: detectable, 0: otherwise)
- L: CD4 count (0: high, 1: low)
- U: true immunosuppression level
- (Unknown to the data analyst: no effect of At on Y)

Causal effect of interest
- We are interested in the effect of duration of treatment A on the risk of Y=1
  - A = A0 + A1 = 0, 1, or 2
- For example, suppose we want to compare continuous treatment with no treatment at all
- The causal risk ratio of interest is then
  - Pr[Y^{a=2}=1] / Pr[Y^{a=0}=1]
Identifiability conditions
- To identify the causal effect of treatment A on the risk of Y=1, we need to be able to identify the causal effect of each component of A
- That is, we need to be able to block all back-door paths for both A0 and A1
  - There are no back-door paths for A0
  - The only back-door path for A1 can be blocked if we have data on L1
- So we are OK

Stratification to compute the causal effect of A
- Is the conditional risk ratio equal to the causal risk ratio (i.e., one)?
  - Pr[Y=1|A=2, L1=l] / Pr[Y=1|A=0, L1=l]
- NO
- Conditioning on L1
  - eliminates confounding (blocks the back-door path) for one component of A, i.e., A1
  - creates selection bias for the other component of A, i.e., A0
  - As long as one component of A is associated with Y, A is associated with Y
To stratify or not to stratify…
- Not stratifying is bad because there is confounding
- Stratifying is bad because stratification eliminates confounding at the cost of introducing selection bias
- This happens because the confounder for one part of the exposure is affected by another part of the exposure

More generally
- There is bias if the confounder is either affected by the exposure or shares a common cause with it
- There is bias even if the confounder is not on the causal pathway from exposure to outcome
In summary
- Methods that estimate the association measure ignoring the data on L1
  - The association measure does not have a causal interpretation if there is confounding by L1
- Methods that estimate the association measure within levels of L1
  - The association measure does not have a causal interpretation if L1 is affected by the exposure (or by a cause of the exposure)
- Hence the need for other methods (Yes, IPW)

Analytic control of confounding
- Stratification-based methods
  [DAG: A ← L → Y]
  - The back-door path is blocked by conditioning on L
- IPW-based methods
  [Pseudopopulation DAG: the arrow from L into treatment is removed]
  - The back-door path is eliminated in the pseudopopulation
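As a toy, single-time-point illustration of the pseudopopulation idea (mine, not the lecture's; probabilities invented), inverse probability weighting removes the back-door association that the crude contrast picks up:

```python
import random

random.seed(2)
n = 100_000

# Confounded point treatment under the causal null: A <- L -> Y, no A -> Y
rows = []
for _ in range(n):
    L = random.random() < 0.4
    A = random.random() < (0.75 if L else 0.25)   # L -> A
    Y = random.random() < (0.5 if L else 0.1)     # L -> Y
    rows.append((L, A, Y))

# Nonparametric estimate of Pr[A=1 | L]
pA1 = {}
for l in (True, False):
    in_l = [(a, y) for (ll, a, y) in rows if ll == l]
    pA1[l] = sum(a for a, _ in in_l) / len(in_l)

def crude_risk(a):
    sel = [y for (_, aa, y) in rows if aa == a]
    return sum(sel) / len(sel)

def ipw_risk(a):
    num = den = 0.0
    for L, A, Y in rows:
        if A == a:
            w = 1.0 / (pA1[L] if a else 1.0 - pA1[L])  # inverse probability weight
            num += w * Y
            den += w
    return num / den

print(f"crude risk difference: {crude_risk(True) - crude_risk(False):+.3f}")  # confounded
print(f"IPW risk difference:   {ipw_risk(True) - ipw_risk(False):+.3f}")      # ~0, the truth
```

In the weighted (pseudo)population the back-door path through L is gone, so the weighted contrast recovers the causal null that the crude contrast misses.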
IPW
- IPW appropriately adjusts for confounding when time-dependent confounders are affected by exposure (or by causes of exposure)
- Because the adjustment is achieved by eliminating the arrow from the confounder to subsequent exposure (in the pseudopopulation)
  - Not by conditioning on the confounder
- We now need models

Models
- Conditional associational
  - E[Y|A] = θ0 + θ1 cum(A)
  - Standard statistical models
- Marginal structural
  - E[Y^a] = β0 + β1 cum(a)
  - Causal models
Marginal structural models
- MODELS for the MARGINAL distribution of counterfactual outcomes (STRUCTURAL)
- Robins (1998)

Some types of MSMs
- Linear
  - E[Y^a] = β0 + β1 cum(a)
  - β1 is the causal mean increase per unit of exposure
- Logistic
  - logit Pr[D^a=1] = β0 + β1 cum(a)
  - exp(β1) is the causal odds ratio
- Repeated measures
  - E[Y^a(t+1)] = β0(t) + β1 cum[a(t)]
  - logit Pr[D^a(t+1)=1] = β0(t) + β1 cum[a(t)]
- Etc… (we assume all models are correctly specified)
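For a single time point, cum(a) reduces to a, and the linear MSM E[Y^a] = β0 + β1 a can be fitted by IP-weighted least squares. A self-contained sketch (my own toy simulation with the true β1 set to 0.2; not code from the lecture):

```python
import random

random.seed(3)
n = 100_000
TRUE_B1 = 0.2   # causal effect per unit of a, fixed by the simulation

data = []
for _ in range(n):
    L = random.random() < 0.5
    A = 1 if random.random() < (0.7 if L else 0.3) else 0   # L -> A (confounding)
    Y = TRUE_B1 * A + 1.0 * L + random.gauss(0, 0.5)        # L -> Y as well
    data.append((L, A, Y))

# Stabilized weights SW = Pr[A=a] / Pr[A=a | L], estimated nonparametrically
pA = sum(a for _, a, _ in data) / n
pA_L = {l: sum(a for ll, a, _ in data if ll == l) /
           sum(1 for ll, _, _ in data if ll == l)
        for l in (True, False)}

def sw(l, a):
    num = pA if a else 1 - pA
    den = pA_L[l] if a else 1 - pA_L[l]
    return num / den

# Weighted least squares for E[Y^a] = b0 + b1*a (closed form, one regressor)
W = [(sw(l, a), a, y) for l, a, y in data]
sw_sum = sum(w for w, _, _ in W)
a_bar = sum(w * a for w, a, _ in W) / sw_sum
y_bar = sum(w * y for w, _, y in W) / sw_sum
b1 = sum(w * (a - a_bar) * (y - y_bar) for w, a, y in W) / \
     sum(w * (a - a_bar) ** 2 for w, a, _ in W)
print(f"estimated causal b1 = {b1:.3f}  (truth {TRUE_B1})")
```

An unweighted regression of Y on A would return roughly 0.6 here (the causal 0.2 plus the confounding through L); the IP-weighted fit recovers the structural parameter.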
Marginal structural Cox proportional hazards model
- λ_{T^a}(t) = λ0(t) exp[β1 a(t)]
- exp(β1) is the causal rate (hazard) ratio
- But the outcome of these models is unobserved; how can we fit them and estimate the causal parameter β1?
- Answer: using IPW estimation
[Worked numerical example (tree of subject counts) not reproduced in the transcript]

(Time-varying) weights
In general, how are the weights defined?

  W(t) = ∏_{k=0}^{t} 1 / f[A(k) | A(0), …, A(k−1), L(0), …, L(k)]

- The inverse of the probability of having one's own observed treatment history, given the time-varying covariates
- Problem: these weights lead to inefficient estimators of the parameters of marginal structural models

Stabilized weights

  SW(t) = ∏_{k=0}^{t} f[A(k) | A(0), …, A(k−1)] / f[A(k) | A(0), …, A(k−1), L(0), …, L(k)]

- Denominator: the probability of having one's own observed treatment history, given the time-varying covariates
- Numerator: the probability of having one's own observed treatment history
- If there is no confounding, SW(t) = 1
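The SW(t) formula can be made concrete with a two-interval toy dataset (mine; all treatment probabilities invented). Because A and L are binary, each factor can be estimated by simple counting, and a standard sanity check is that the stabilized weights average to 1:

```python
import random
from collections import Counter

random.seed(4)
n = 50_000

# Two treatment intervals: A0, then covariate L1 (affected by A0), then A1.
subjects = []
for _ in range(n):
    A0 = random.random() < 0.5
    L1 = random.random() < (0.7 if A0 else 0.3)    # A0 -> L1
    A1 = random.random() < (0.8 if L1 else 0.2)    # L1 -> A1 (confounding)
    subjects.append((A0, L1, A1))

patterns = Counter(subjects)        # the 8 possible (A0, L1, A1) patterns

def emp(pred):
    """Empirical probability that a subject satisfies pred."""
    return sum(c for s, c in patterns.items() if pred(s)) / n

# SW for each pattern; the k=0 factors cancel (no covariates before A0),
# leaving f[A1 | A0] / f[A1 | A0, L1].
sw = {}
for (a0, l1, a1) in patterns:
    p_num = emp(lambda s: s[0] == a0 and s[2] == a1) / emp(lambda s: s[0] == a0)
    p_den = (emp(lambda s: s == (a0, l1, a1)) /
             emp(lambda s: s[0] == a0 and s[1] == l1))
    sw[(a0, l1, a1)] = p_num / p_den

mean_sw = sum(sw[s] * c for s, c in patterns.items()) / n
print(f"mean stabilized weight = {mean_sw:.6f}")
```

With the probabilities estimated from the same data, the mean stabilized weight equals 1 (up to floating-point error); a mean far from 1 in practice usually signals a misspecified treatment model.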
An application of the MS Cox model: the data
- MACS
  - Multicenter AIDS Cohort Study
  - 5,622 men (1984– )
- WIHS
  - Women's Interagency HIV Study
  - 2,628 women (1994– )
- Semiannual visits
  - questionnaire, blood sample

MACS+WIHS: Study population
- Inclusion criteria (1996)
  - HIV-positive
  - AIDS-free
  - Had not started Highly Active Antiretroviral Therapy (HAART)
- 1,498 participants met the inclusion criteria
  - 66% female
  - median age 39 years
  - 37% Caucasian

MACS+WIHS: Follow-up
- Follow-up
  - From 1996, or the first subsequent eligible visit, to April 2002
  - Median follow-up time 5.4 years
  - 6,763 person-years of follow-up
- Outcome
  - Time to AIDS or death
  - 329 AIDS cases + 53 deaths = 382 events
  - 259 censored before the end of follow-up

Exposure and covariates
- Exposure A(t): HAART
  - 918 subjects initiated therapy
  - Incidence rate: 22/100 person-years
- Covariates L(t):
  - age, gender, race, prior ART, HIV-1 RNA, CD4, CD8, HIV-related symptoms
  - Measured at baseline and every 6 months
  - Time-dependent confounders that are themselves affected by previous treatment
Estimation of the causal parameter β1
- Marginal structural Cox model
  - λ_{T^a}(t) = λ0(t) exp[β1 a(t)]
- Fit a standard Cox model
  - λ_T[t | A(0), …, A(t)] = λ_{0,T}(t) exp[θ1 A(t)]
- Reweight the subjects in each risk set
  - SW(t) is more efficient than W(t)
  - The weights have to be estimated
- The estimate of θ1 is then an unbiased estimate of the (log) causal hazard ratio β1
Programming issue
- The weights SW(t) are time-varying
- SAS Proc Phreg (and other standard software) has a weight statement, but does not allow for time-varying weights

Solution
- Fit a pooled logistic regression model to approximate the Cox regression
  - logit Pr[D(t+1)=1 | D(t)=0, A(0), …, A(t)] = θ0(t) + θ1 A(t)
  - where D(t)=0 if the subject is alive at time t, and 1 otherwise
  - θ1 is a good approximation to β1 when the probability of D(t)=1 is small in each time period

A confidence interval for the causal parameter β1
- Using weights induces within-subject correlation
  - The naïve variance provided by standard software is incorrect
  - The correct analytical variance is not implemented in standard software
- Solution: use the robust variance (GEE sandwich estimator)
  - e.g., use SAS Proc Genmod with a repeated statement rather than Proc Logistic
  - The robust variance provides conservative confidence intervals for β1

Estimation of the weights SW

  SW(t) = ∏_{k=0}^{t} f[A(k) | A(0), …, A(k−1)] / f[A(k) | A(0), …, A(k−1), L(0), …, L(k)]

- To estimate the factors in the denominator, fit a logistic model for
  - logit Pr[A(k)=1 | A(0), …, A(k−1), L(0), …, L(k)]
  - The covariates are a function of past exposure and past covariate history
- Similarly for the numerator
  - logit Pr[A(k)=1 | A(0), …, A(k−1)]
  - The covariates are a function of past exposure
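The pooled logistic trick requires restructuring the survival data into one record per person per interval at risk, with D(t+1) as the outcome. A sketch of that expansion step (a hypothetical data layout of my own, not the MACS/WIHS programs):

```python
def person_periods(subjects):
    """Expand (id, intervals_at_risk, event, treatment_history) into
    person-period records for pooled logistic regression: one row per
    interval at risk, with D = 1 only in the interval of the event."""
    rows = []
    for sid, n_intervals, event, a_hist in subjects:
        for t in range(n_intervals):
            rows.append({
                'id': sid,
                't': t,
                'A': a_hist[t],
                'D': 1 if event and t == n_intervals - 1 else 0,
            })
    return rows

subjects = [
    (1, 3, True,  [0, 1, 1]),   # starts treatment at t=1, event in interval 2
    (2, 2, False, [1, 1]),      # treated throughout, censored after interval 1
]
rows = person_periods(subjects)
for r in rows:
    print(r)
```

Each expanded row would then carry its estimated weight SW(t) when fitting logit Pr[D(t+1)=1] = θ0(t) + θ1 A(t), with the robust variance clustered on the subject id.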
Estimation of the denominator of SW
- Use a logistic model to estimate the probability
  - Pr[A(k)=1 | A(0), …, A(k−1), L(0), …, L(k)]
- For a subject in risk set k, multiply the probabilities of having had his/her own treatment history from time 0 to k
  - Sometimes the factor is Pr[A(k)=1], and sometimes it is Pr[A(k)=0] = 1 − Pr[A(k)=1]
- Similarly for the numerator

What about censoring?
- Same strategy: weight by the inverse of the probability of having one's own censoring history
- Fit two logistic models for the outcome censoring (1 = censored, 0 = uncensored)
- Compute the stabilized inverse-probability-of-censoring weights:

  SW*(t) = ∏_{k=0}^{t} Pr[C(k)=0 | C(0)=…=C(k−1)=0, A(0), …, A(k−1)] / Pr[C(k)=0 | C(0)=…=C(k−1)=0, A(0), …, A(k−1), L(0), …, L(k)]

- The final weight is the product SW(t) · SW*(t)
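The "multiply the probabilities of having had one's own treatment history" step is a running product that picks p or 1 − p at each time. A small helper (my illustration; the fitted probabilities would come from the logistic models above):

```python
def history_probability(history, fitted_p):
    """Probability of a subject's own observed binary history:
    the product over k of Pr[X(k)=x(k) | past], where fitted_p[k] is the
    model-based Pr[X(k)=1 | past] for that subject at time k."""
    prob = 1.0
    for x, p in zip(history, fitted_p):
        prob *= p if x == 1 else 1.0 - p
    return prob

# e.g. treatment history (1, 0, 0) with fitted probabilities (0.4, 0.7, 0.2):
# 0.4 * (1 - 0.7) * (1 - 0.2) = 0.096
p = history_probability([1, 0, 0], [0.4, 0.7, 0.2])
print(round(p, 6))
```

The same helper applies to the censoring weights SW*(t), using the fitted Pr[C(k)=0 | past] factors; the final weight per person-period is the product SW(t) · SW*(t).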
Effect(-measure) modification by baseline covariates
- If it is interesting from a subject-matter standpoint, one can estimate the causal parameter within levels of baseline covariates
  - e.g., λ_{T^a}(t|V) = λ0(t) exp{β1 a(t) + β′2 V + β′3 V × a(t)}
  - V is a subset of L(0)
- This is not done to adjust for confounding by V
- The weights are redefined as

  SW(t) = ∏_{k=0}^{t} f[A(k) | A(0), …, A(k−1), V] / f[A(k) | A(0), …, A(k−1), L(0), …, L(k)]
In our study
[Results figure/table not reproduced in the transcript]

MSMs
- Advantages
  - They resemble standard models
  - Any type of outcome variable
- Disadvantages
  - Not useful to estimate the effects of dynamic treatments (i.e., no interaction with time-varying covariates)
  - They require a positive probability of exposure for all covariate histories