Causal inference for survival analysis (II)

Miguel A. Hernán
Department of Epidemiology
Harvard School of Public Health
Lysebu, September 2004

Outline
1. Definition of causal effect: counterfactuals
2. Estimation of causal effects: inverse probability weighting
3. Causal diagrams: directed acyclic graphs
4. The bias of standard methods
5. Causal models: marginal structural models
So far
- We have described the formal, counterfactual-based definitions of causal effect and conditional exchangeability
- And we have used them to derive methods for the estimation of causal effects under conditional exchangeability
- This approach to causal inference is mathematically/statistically powerful but sometimes cumbersome
- It is uniquely useful to develop (semiparametric) structural models for time-varying exposures

From now on
- The key concepts of causal effect and conditional exchangeability can be represented graphically using causal diagrams
- This approach to causal inference is less mathematically/statistically powerful but natural and simple
- It is used to classify sources of bias (lack of exchangeability) in epidemiology
- And to identify potential problems in study design and analysis
Counterfactual versus graphs
- The counterfactual approach: statistics
- The causal diagrams approach: computer science / artificial intelligence
- Both approaches are mathematically equivalent: they lead to the same nonparametric estimators
- We try to use the best of both worlds
  - Graphs to conceptualize problems
  - Structural models to analyze data

Diagrams for causal structures
[DAG: L → A → Y]
- DIRECTED edges (arrows) linking nodes (variables)
- ACYCLIC because there are no arrows from descendants (effects) to ancestors (causes)
- GRAPHS: DAGs
Causal DAGs
- Complete DAGs do not exclude any possible causal effect
  [complete DAG over L, A, Y]
- Incomplete DAGs encode expert knowledge in the form of missing arrows
  [DAG over L, A, Y with some arrows missing]

Expert knowledge and causal DAGs
- A   Y (no arrow) means Pr[Ya=1=1] = Pr[Ya=0=1]
- A → Y (arrow) means either Pr[Ya=1=1] ≠ Pr[Ya=0=1] or Pr[Ya=1=1] = Pr[Ya=0=1]
- The information is in the missing arrows
DAGs and causal DAGs
- A DAG is a causal DAG if the common causes of any pair of variables in the graph are also in the DAG
  - Required by the causal Markov condition
- That is, a causal DAG does not need to include variables that are not of interest for the analysis and that are not common causes of other variables in the DAG

Causal graphs and association
- Causal DAGs are
  - (qualitative) structural or causal models
  - (nonparametric) statistical models
- Let's see this
- Emphasis on informal insight rather than formal rigor
Causal effect implies association
[DAG: A → Y]
- Causal statement: Pr[Ya=1=1] ≠ Pr[Ya=0=1]
- Associational statement: Pr[Y=1|A=1] ≠ Pr[Y=1|A=0]
- Causal effects imply associations
- Lack of causal effects implies (conditional) independences

Common causes imply association
[DAG: A ← L → Y]
- Causal statement: Pr[Ya=1=1] = Pr[Ya=0=1]
- Associational statement: Pr[Y=1|A=1] ≠ Pr[Y=1|A=0]
- Confounding = lack of exchangeability
What do common effects imply?
[DAG: A → L ← Y]
- Causal statement: Pr[Ya=1=1] = Pr[Ya=0=1]
- Associational statement: Pr[Y=1|A=1] = Pr[Y=1|A=0], that is, Y ⊥ A

Two variables are marginally associated if…
- They are cause and effect
- They share common causes
- (By chance)

Aside: Two variables may be associated by chance
- Even in the absence of structures that lead to association
- Chance is not a structural source of association: increase the sample size and chance associations disappear (while structural associations remain)
- To focus our discussion on bias rather than chance, assume that we are working with the entire population

Conditional independence
[DAG: A ← L → Y]
- Causal statement: Pr[Ya=1=1] = Pr[Ya=0=1]
- Associational statement: Pr[Y=1|A=1, L=l] = Pr[Y=1|A=0, L=l], that is, Y ⊥ A | L=l for all l

Similarly…
[DAG with nodes A, B, Y]
- Causal statement: Pr[Ya=1=1] = Pr[Ya=0=1]
- Associational statement: Pr[Y=1|A=1, B=b] = Pr[Y=1|A=0, B=b], that is, Y ⊥ A | B=b for all b

Conditioning on common effects
[DAG: A → L ← Y, conditioning on L]
- Causal statement: Pr[Ya=1=1] = Pr[Ya=0=1]
- Associational statement: for some l, Pr[Y=1|A=1, L=l] ≠ Pr[Y=1|A=0, L=l]
- Selection bias = lack of conditional exchangeability

[DAG: A → L ← Y with L → S, conditioning on S]
- Causal statement: Pr[Ya=1=1] = Pr[Ya=0=1]
- Associational statement: for some s, Pr[Y=1|A=1, S=s] ≠ Pr[Y=1|A=0, S=s]
- Selection bias = lack of conditional exchangeability
Sources of association
- Cause and effect
- Common causes
- Conditioning on common effects
  - In design or analysis

Example of ascertainment bias, or of how DAGs can help
[DAG with nodes A, Y, C, Y']
- A: exogenous estrogens; Y: endometrial cancer; C: vaginal bleeding; Y': ascertained endometrial cancer
- A: oral contraceptives; Y: thromboembolism; C: medical care; Y': ascertained thromboembolism
Theory of causal DAGs
- Mathematically formalized by
  - Pearl (1988, 1995, 2000)
  - Spirtes, Glymour, and Scheines (1993, 2000)
- Judea Pearl, Professor of Computer Science, UCLA

d-separation
- A set of graphical rules to decide whether two variables are
  - d-separated = independent
  - d-connected = associated (in general)
- If two variables are d-separated
  - without conditioning on any other variables in the DAG, then they are marginally independent
  - after conditioning on a set of third variables, then they are conditionally independent (i.e., independent within every joint stratum of the third variables)
d-separation: terminology
- Descendants, non-descendants
- Parents, children
- A path is any arrow-based route between two variables in the graph, whether it follows the direction of the arrows or not
- Paths can be either blocked or open according to the following graphical rules

Summary of d-separation rules
- A path is blocked if and only if it contains a noncollider that has been conditioned on, or it contains a collider that has not been conditioned on and has no descendants that have been conditioned on
- Two variables are d-separated if all paths between them are blocked (otherwise they are d-connected)
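The blocking rule above is mechanical enough to check by machine. Below is a minimal sketch in plain Python (my own illustration, not part of the lecture): a DAG is encoded as a dict mapping each node to its set of children, and a path is declared blocked exactly under the rule just stated.

```python
def descendants(dag, node):
    """All nodes reachable from `node` by following arrows (excluding node itself)."""
    out, stack = set(), [node]
    while stack:
        for child in dag.get(stack.pop(), ()):
            if child not in out:
                out.add(child)
                stack.append(child)
    return out

def paths(dag, x, y):
    """All simple undirected paths between x and y (arrows followed either way)."""
    nbrs = {}
    for parent, children in dag.items():
        for c in children:
            nbrs.setdefault(parent, set()).add(c)
            nbrs.setdefault(c, set()).add(parent)
    result = []
    def walk(node, path):
        if node == y:
            result.append(path)
            return
        for n in nbrs.get(node, ()):
            if n not in path:
                walk(n, path + [n])
    walk(x, [x])
    return result

def d_separated(dag, x, y, given=frozenset()):
    """True iff every path between x and y is blocked given the conditioning set."""
    given = set(given)
    for path in paths(dag, x, y):
        blocked = False
        for prev, v, nxt in zip(path, path[1:], path[2:]):
            # v is a collider on this path iff both neighboring arrows point into it
            collider = v in dag.get(prev, ()) and v in dag.get(nxt, ())
            if collider:
                if v not in given and not (descendants(dag, v) & given):
                    blocked = True  # unconditioned collider, no conditioned descendant
            elif v in given:
                blocked = True      # conditioned-on noncollider
        if not blocked:
            return False
    return True
```

On the confounding DAG A ← L → Y this reports d-connection marginally and d-separation given {L}; on the collider DAG A → L ← Y the reverse, matching the slides' examples.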
Identifiability conditions: the causal effect can be identified if
- There are no common causes
  [DAG: A → Y]
  - No back-door path
  - No confounding
  - Ya ⊥ A
- There are common causes, but enough data to block the back-door paths
  [DAG: L → A, L → Y]
  - No unmeasured confounding
  - Ya ⊥ A | L
First, what is bias?
- Bias is a structural association between exposure and outcome that does not arise from the causal effect of exposure on outcome
- Under the causal null hypothesis, exposure and outcome are associated
- There are a finite number of causal structures that produce associations between two variables
- Therefore biases can be classified by structure

Bias
- Cause and effect [DAG: A → Y]: bias only if reverse causation (information bias)
- Common causes [DAG: A ← L → Y]: confounding
- Conditioning on common effects [DAG: A → L ← Y]: selection bias
Standard methods are based on stratification
- They attempt to eliminate confounding by estimating the effect measure within levels of (conditioning on) L, or of functions of L (the propensity score)
- Nonparametric
  - Stratified analysis (Mantel-Haenszel), …
- Parametric/semiparametric
  - Generalized linear models (OLS, logistic regression, …)
  - (Time-dependent) Cox proportional hazards regression
  - Propensity score adjustment

Stratification-based methods are problematic because
- Effect estimates may not have a causal interpretation when dealing with time-varying exposures
- There are two problems
Problem #1
[DAG with nodes A(0), L(1), A(1), Y(2), and unmeasured U]
- Adjusting for L eliminates part of the effect of the exposure
- Direct effect?

Problem #2: time-varying confounders "affected" by exposure
[DAG with nodes A(0), L(1), A(1), Y(2), and unmeasured U]
- Adjusting for L creates (selection) bias
- Even if L is not on the causal pathway between exposure and outcome
Causal effect of interest
- At: antiretroviral therapy at time t (0: no, 1: yes)
- Y: viral load (1 if detectable, 0 otherwise)
- L: CD4 count (0: high, 1: low)
- U: true immunosuppression level
- (Unknown to the data analyst: no effect of At on Y)
- We are interested in the effect of duration of treatment A on the risk of Y=1
  - A = A0 + A1 = 0, 1, or 2
- For example, suppose we are interested in whether continuous treatment has a causal effect compared with no treatment at all
- The causal risk ratio of interest is then Pr[Ya=2=1] / Pr[Ya=0=1]
Identifiability conditions
- To identify the causal effect of treatment A on the risk of Y=1, we need to be able to identify the causal effect of each component of A
- That is, we need to be able to block all back-door paths for both A0 and A1
  - There are no back-door paths for A0
  - The only back-door path for A1 can be blocked if we have data on L1
  - We are OK then

Stratification to compute the causal effect of A
- Is the conditional risk ratio Pr[Y=1|A=2, L1=l] / Pr[Y=1|A=0, L1=l] equal to the causal risk ratio (i.e., one)?
- NO
- Conditioning on L1
  - eliminates confounding (blocks the back-door path) for one component of A, i.e., A1
  - creates selection bias for the other component of A, i.e., A0
- As long as one component of A is associated with Y, A is associated with Y
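That "NO" can be verified numerically. The sketch below is my own illustration, not part of the lecture: it simulates data from the DAG A0 → L1 ← U → Y, L1 → A1, with no effect of either treatment on Y. The "always treated" vs "never treated" risk difference within a stratum of L1 is non-null (selection bias from conditioning on a collider), while an inverse-probability-weighted contrast recovers the null. Only A1 needs weighting here because A0 is randomized by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# DAG: A0 -> L1 <- U -> Y, L1 -> A1; no arrow from A0 or A1 into Y (causal null)
U  = rng.binomial(1, 0.5, n)                    # unmeasured immunosuppression
A0 = rng.binomial(1, 0.5, n)                    # randomized first treatment
L1 = rng.binomial(1, 0.2 + 0.3 * A0 + 0.4 * U)  # covariate affected by A0 and U
A1 = rng.binomial(1, 0.3 + 0.4 * L1)            # second treatment depends on L1
Y  = rng.binomial(1, 0.2 + 0.5 * U)             # outcome depends on U only

# Stratified contrast: always treated (A=2) vs never treated (A=0) within L1 = 1
always, never = (A0 == 1) & (A1 == 1), (A0 == 0) & (A1 == 0)
stratum = L1 == 1
biased = Y[always & stratum].mean() - Y[never & stratum].mean()

# IPW: weight by inverse probability of the observed A1 given L1
pA1 = np.where(L1 == 1, A1[L1 == 1].mean(), A1[L1 == 0].mean())  # Pr[A1=1|L1]
w = 1.0 / np.where(A1 == 1, pA1, 1.0 - pA1)
ipw = (np.average(Y[always], weights=w[always])
       - np.average(Y[never], weights=w[never]))

print(f"stratified (biased): {biased:+.3f}   IPW: {ipw:+.3f}")
```

The stratified estimate is materially non-zero despite the causal null, while the weighted estimate sits near zero, which is exactly the argument of the next slides.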
To stratify or not to stratify…
- Not stratifying is bad because there is confounding
- Stratifying is bad because stratification eliminates confounding at the cost of introducing selection bias
  - Because the confounder for one part of the exposure is affected by another part of the exposure

More generally
- There is bias if the confounder is either affected by the exposure or shares a common cause with it
- There is bias even if the confounder is not on the causal pathway from exposure to outcome
In summary
- Methods that estimate the association measure ignoring data on L1
  - The association measure does not have a causal interpretation if there is confounding by L1
- Methods that estimate the association measure within levels of L1
  - The association measure does not have a causal interpretation if L1 is affected by exposure (or by a cause of the exposure)
- Need for other methods (yes, IPW)

Analytic control of confounding
- Stratification-based methods: the back-door path is blocked by conditioning on L
  [DAG: L → A → Y, conditioning on L]
- IPW-based methods: the back-door path is eliminated in the pseudopopulation
  [DAG: nodes L, a, Ya, with the arrow from L to exposure removed]

IPW
- Appropriately adjusts for confounding when time-dependent confounders are affected by exposure (or by causes of exposure)
- Because adjustment is achieved by eliminating the arrow from confounder to subsequent exposure (in the pseudopopulation)
- Not by conditioning on the confounder

Models
- Now we need models
- Standard statistical models are conditional associational models: E[Y|A] = θ0 + θ1cum(A)
- Marginal structural models are causal models: E[Ya] = β0 + β1cum(a)
Marginal structural models
- MODELS for the MARGINAL distribution of counterfactual outcomes (STRUCTURAL)
- Robins (1998)

Some types of MSMs
- Linear: E[Ya] = β0 + β1cum(a)
  - β1 is the causal mean increase per unit of exposure
- Logistic: logit Pr[Da=1] = β0 + β1cum(a)
  - exp(β1) is the causal odds ratio
- Repeated measures: E[Ya(t+1)] = β0(t) + β1cum[a(t)]
  - or logit Pr[Da(t+1)=1] = β0(t) + β1cum[a(t)]
- Etc… (we assume all models are correctly specified)
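For the linear MSM, the fitting step is just a weighted regression of the observed outcome on treatment, with the stabilized weights creating the pseudopopulation. Here is a single-time-point sketch of my own (numpy only; in the slides' time-varying setting cum(a) would replace the single A): the unweighted slope is confounded, the IPW-weighted slope recovers the true β1 = 0.2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Point-treatment example: L confounds A and Y; true causal effect beta1 = 0.2
L = rng.binomial(1, 0.5, n)
A = rng.binomial(1, 0.2 + 0.5 * L)
Y = 0.2 * A + 0.3 * L + rng.normal(0, 0.1, n)

def slope(y, a, w):
    """Weighted least squares of y on [1, a]; returns the coefficient on a."""
    X = np.column_stack([np.ones_like(a, dtype=float), a])
    XtW = X.T * w                       # X'W with diagonal weights
    return np.linalg.solve(XtW @ X, XtW @ y)[1]

# Associational model E[Y|A] = theta0 + theta1*A  (confounded by L)
theta1 = slope(Y, A, np.ones(n))

# MSM E[Ya] = beta0 + beta1*a, fitted by IPW with stabilized weights
pA_L = np.where(L == 1, A[L == 1].mean(), A[L == 0].mean())  # Pr[A=1|L]
sw = np.where(A == 1, A.mean() / pA_L, (1 - A.mean()) / (1 - pA_L))
beta1 = slope(Y, A, sw)
```

Because the weights make A independent of L in the pseudopopulation, the weighted slope is an estimate of the causal parameter rather than the confounded association.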
Marginal structural Cox proportional hazards model
- λTa(t) = λ0(t) exp[β1a(t)]
- exp(β1) is the causal rate (hazard) ratio
- But the outcome of these models is unobserved; how can we fit them and estimate the causal parameter β1?
- Answer: using IPW estimation
- But how are the weights defined?
[Figure: numerical example of a weighted pseudopopulation]

(Time-varying) weights
W(t) = ∏_{k=0}^{t} 1 / f[A(k) | A(0), …, A(k−1), L(0), …, L(k)]
- The inverse of the probability of having your own observed treatment history given the time-varying covariates
- Problem: these weights lead to inefficient estimators of the parameters of marginal structural models

Stabilized weights
SW(t) = ∏_{k=0}^{t} f[A(k) | A(0), …, A(k−1)] / f[A(k) | A(0), …, A(k−1), L(0), …, L(k)]
- Denominator: probability of having your own observed treatment history given the time-varying covariates
- Numerator: probability of having your own observed treatment history
- If there is no confounding, SW(t) = 1
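A sketch of how SW(t) might be computed from a long-format dataset (one row per subject-visit), using pandas and statsmodels. The simulated data and column names are illustrative assumptions of mine, not the MACS/WIHS data: both numerator and denominator are pooled logistic models for A(k), and the products are cumulative within subject.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)

# Simulated long data: L(k) depends on past treatment, A(k) on current L and past A
rows = []
for i in range(2000):          # subjects
    a_prev = 0
    for k in range(3):         # visits
        L = rng.binomial(1, 0.3 + 0.4 * a_prev)
        A = rng.binomial(1, 0.2 + 0.4 * L + 0.2 * a_prev)
        rows.append(dict(id=i, k=k, A_prev=a_prev, L=L, A=A))
        a_prev = A
df = pd.DataFrame(rows)

# Denominator: logit Pr[A(k)=1 | treatment history, covariate history]
den = smf.logit("A ~ A_prev + L + C(k)", data=df).fit(disp=0)
# Numerator: logit Pr[A(k)=1 | treatment history only]
num = smf.logit("A ~ A_prev + C(k)", data=df).fit(disp=0)

# Probability of the treatment actually received at each visit
p_den = np.where(df["A"] == 1, den.predict(df), 1 - den.predict(df))
p_num = np.where(df["A"] == 1, num.predict(df), 1 - num.predict(df))

# SW(t): cumulative product of numerator/denominator within subject
df["sw"] = pd.Series(p_num / p_den, index=df.index).groupby(df["id"]).cumprod()
```

With correctly specified models the stabilized weights have mean close to 1, which is a useful diagnostic in practice.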
An application of the MS Cox model: the data
- MACS: Multicenter AIDS Cohort Study, 5,622 men (1984– )
- WIHS: Women's Interagency HIV Study, 2,628 women (1994– )
- Semiannual visits: questionnaire, blood sample

MACS+WIHS: study population
- Inclusion criteria (1996)
  - HIV-positive
  - AIDS-free
  - Had not started Highly Active Antiretroviral Therapy (HAART)
- 1,498 participants met the inclusion criteria
  - 66% female, median age 39 years, 37% Caucasian

MACS+WIHS: follow-up
- Follow-up: from 1996, or the first subsequent eligible visit, to April 2002
  - Median follow-up time 5.4 years; 6,763 person-years of follow-up
- Outcome: time to AIDS or death
  - 329 AIDS + 53 deaths = 382 events
  - 259 censored before the end of follow-up

Exposure and covariates
- Exposure A(t): HAART
  - 918 subjects initiated therapy; incidence rate 22/100 person-years
- Covariates L(t): age, gender, race, prior ART, HIV-1 RNA, CD4, CD8, HIV-related symptoms
  - Measured at baseline and every 6 months
  - Time-dependent confounders, themselves affected by previous treatment
Estimation of the causal parameter β1
- Marginal structural Cox model: λTa(t) = λ0(t) exp[β1a(t)]
- Fit a standard Cox model, λT[t|A(0),…,A(t)] = λ0,T(t) exp[θ1A(t)], reweighting the subjects in each risk set
- SW(t) is more efficient than W(t)
- The weights have to be estimated
- The estimate of θ1 is an unbiased estimate of the (log) causal hazard ratio β1
Programming issue
- The weights SW(t) are time-varying
- SAS Proc Phreg (and other standard software) has a weight statement but does not allow for time-varying weights

Solution
- Fit a pooled logistic regression model to approximate the Cox regression:
  logit Pr[D(t+1)=1 | D(t)=0, A(0), …, A(t)] = θ0(t) + θ1A(t)
  where D(t) = 0 if the subject is alive at time t, and 1 otherwise
- θ1 is a good approximation to β1 when the probability of D(t)=1 is small in each time period

A confidence interval for the causal parameter β1
- Using weights induces within-subject correlation
  - The naïve variance provided by standard software is incorrect
  - The correct analytical variance is not implemented in standard software
- Solution: use the robust variance (the GEE sandwich estimator)
  - e.g., use SAS Proc Genmod with a repeated statement rather than Proc Logistic
- The robust variance provides conservative confidence intervals for β1

Estimation of the weights SW
SW(t) = ∏_{k=0}^{t} f[A(k) | A(0), …, A(k−1)] / f[A(k) | A(0), …, A(k−1), L(0), …, L(k)]
- To estimate the factors in the denominator, fit a logistic model for
  logit Pr[A(k)=1 | A(0), …, A(k−1), L(0), …, L(k)]
  - The covariates are a function of past exposure and past covariate history
- Similarly for the numerator:
  logit Pr[A(k)=1 | A(0), …, A(k−1)]
  - The covariates are a function of past exposure
Estimation of the denominator of SW
- Use a logistic model to estimate the probability Pr[A(k)=1 | A(0), …, A(k−1), L(0), …, L(k)]
- For a subject in risk set k, multiply the probabilities of having had his/her own treatment history from time 0 to k
  - Sometimes Pr[A(k)=1], sometimes Pr[A(k)=0] = 1 − Pr[A(k)=1]
- Similarly for the numerator

What about censoring?
- Same strategy: weighting by the inverse of the probability of having one's own censoring history
- Fit two logistic models for the outcome censoring (1 = censored, 0 = uncensored)
- Compute the stabilized inverse-probability-of-censoring weights:
  SW*(t) = ∏_{k=0}^{t} Pr[C(k)=0 | C(0)=…=C(k−1)=0, A(0), …, A(k−1)] / Pr[C(k)=0 | C(0)=…=C(k−1)=0, A(0), …, A(k−1), L(0), …, L(k)]
- The final weight is the product SW(t)·SW*(t)
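The censoring weights SW*(t) have exactly the same cumulative-product form as the treatment weights. A small pandas sketch of my own (the per-visit probabilities of remaining uncensored would in practice come from the two censoring models just described; here they are filled with placeholder values so the mechanics are visible):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)

# Toy long-format data with already-fitted probabilities of remaining
# uncensored at each visit; `sw` is the treatment weight SW(t), assumed computed
df = pd.DataFrame({
    "id": np.repeat(np.arange(500), 4),
    "k": np.tile(np.arange(4), 500),
    "sw": 1.0,
})
df["p_unc_num"] = rng.uniform(0.90, 0.99, len(df))  # Pr[C(k)=0 | past A]
df["p_unc_den"] = rng.uniform(0.88, 0.99, len(df))  # Pr[C(k)=0 | past A, L]

# SW*(t): cumulative product within subject of numerator/denominator
df["sw_cens"] = (df["p_unc_num"] / df["p_unc_den"]).groupby(df["id"]).cumprod()

# Final weight used to fit the marginal structural model
df["w_final"] = df["sw"] * df["sw_cens"]
```

The weighted analysis then treats censoring like any other "treatment": the pseudopopulation is one in which nobody is censored.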
Effect(-measure) modification by baseline covariates
- If interesting from a subject-matter standpoint, one can estimate the causal parameter within levels of baseline covariates
  - e.g., λTa(t|V) = λ0(t) exp{β1a(t) + β′2V + β′3V×a(t)}
- V is a subset of L(0)
  - Not to adjust for confounding by V
- The weights are redefined as
  SW(t) = ∏_{k=0}^{t} f[A(k) | A(0), …, A(k−1), V] / f[A(k) | A(0), …, A(k−1), L(0), …, L(k)]
In our study
[Results figure from the MACS+WIHS analysis]
MSMs
- Advantages
  - Resemble standard models
  - Any type of outcome variable
- Disadvantages
  - Not useful to estimate the effects of dynamic treatments (i.e., no interaction with time-varying covariates)
  - Require a positive probability of exposure for all covariate histories