Causal inference
for survival analysis (I)
Miguel A. Hernán
Department of Epidemiology
Harvard School of Public Health
Lysebu, September 2004

Outline
1. Definition of causal effect
† Counterfactuals
2. Estimation of causal effects:
† Inverse probability weighting
3. Causal diagrams
† Directed acyclic graphs
4. The bias of standard methods
5. Causal models
† Marginal structural models
Causal inference (I)
An intuitive definition of cause
† Ian took the pill on Sept 1, 2003
„ Five days later, he died
† Had Ian not taken the pill on Sept 1, 2003 (all other things being equal)
„ Five days later, he would have been alive
™ Did the pill cause Ian’s death?

An intuitive definition of cause
† Jim didn’t take the pill on Sept 1, 2002
„ Five days later, he was alive
† Had Jim taken the pill on Sept 1, 2002 (all other things being equal)
„ Five days later, he would have been alive
™ Did the pill cause Jim’s survival?
Human reasoning for causal inference
† We compare (often only mentally)
„ the outcome when action A is present with
„ the outcome when action A is absent
„ all other things being equal
† If the two outcomes differ, we say that the action A has a causal effect, causative or preventive.
† In epidemiology, A is commonly referred to as exposure or treatment.

Notation for actual data
† Y=1 if patient died, 0 otherwise
„ Yi=1, Yj=0
† A=1 if patient treated, 0 otherwise
„ Ai=1, Aj=0
ID    A   Y
Ian   1   1
Jim   0   0
Notation for ideal data
† Ya=0=1 if subject would have died, had he not taken the pill
„ Yi, a=0= 0, Yj, a=0= 0
† Ya=1=1 if patient would have died, had he taken the pill
„ Yi, a=1= 1, Yj, a=1= 0
ID    A   Ya=0   Ya=1
Ian   1   0      1
Jim   0   0      0

Clarification:
† Upper-case letters for random variables
„ A, Y, Ya=0 , Ya=1
† Lower-case letters for possible values (realizations) of those variables
„ a is a possible value (0 or 1) of the random variable A
† For our purposes, random variables are variables with different values for different individuals
(Individual) Causal effect
† For Ian:
„ Pill has a causal effect because $Y_{i,a=1} \neq Y_{i,a=0}$
† For Jim:
„ Pill does not have a causal effect because $Y_{j,a=1} = Y_{j,a=0}$
™ Sharp causal null hypothesis holds if, for all subjects, $Y_{a=1} = Y_{a=0}$

Potential or counterfactual outcomes (I)
† Ya=0 and Ya=1
† Random variables
„ Amenable to mathematical treatment, e.g., statistical models
† One of them describes the subject's outcome value that would have been observed under a potential exposure value that the subject did not actually experience
„ Refers to a “counter to the fact” situation

Potential or counterfactual outcomes (II)
† One of them describes the subject's outcome value under the exposure value that the subject actually experienced
„ Refers to an observed (factual) situation
† A given potential outcome is factual for some subjects and counterfactual for others
† Consistency
„ By definition, if Ai=a then Yi, a = Yi, A = Yi

Available data set
ID     A   Y   Ya=0   Ya=1
Ian    1   1   ?      1
Jim    0   0   0      ?
Ken    1   0   ?      0
Leo    0   1   1      ?
Mike   1   1   ?      1
Nick   0   0   0      ?
…
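The consistency rule and the "?" entries of the available data set can be sketched in a few lines of Python. This is only an illustration: the helper name `potential_outcomes` and the use of `None` for an unobserved counterfactual are assumptions made here, not part of the slides.

```python
# Observed rows of the slide's data set: (ID, A, Y).
observed = [
    ("Ian", 1, 1), ("Jim", 0, 0), ("Ken", 1, 0),
    ("Leo", 0, 1), ("Mike", 1, 1), ("Nick", 0, 0),
]

def potential_outcomes(a_obs, y_obs):
    """Return (Y_{a=0}, Y_{a=1}); None marks the unobserved counterfactual.

    Consistency: if A = a then Y_a = Y, so exactly one entry is known.
    """
    return (y_obs, None) if a_obs == 0 else (None, y_obs)

def show(value):
    """Render an unobserved counterfactual as '?', as on the slide."""
    return "?" if value is None else str(value)

table = {name: potential_outcomes(a, y) for name, a, y in observed}

for name, (y0, y1) in table.items():
    print(f"{name:5s} Ya=0={show(y0)}  Ya=1={show(y1)}")
```

Every subject ends up with exactly one known potential outcome, which is the sense in which causal inference is a missing data problem.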
Fundamental problem of causal inference
† Individual causal effects cannot be determined
„ except under extremely strong (and generally unreasonable) assumptions
„ because only one counterfactual outcome is observed
„ Causal inference as a missing data problem
† Whether using a randomized experiment or an observational study
† Need another definition of causal effect that requires weaker assumptions

First, more notation
† Pr[Ya=1]
„ proportion of subjects that would have developed the outcome Y had all subjects in the population of interest received exposure value a
„ (Counterfactual) Risk of Ya
† Unconditional or marginal probability
„ “Calculated” by using data from the whole population

(Population) Causal effect
† In the population, exposure A has a causal effect on the outcome Y if
$\Pr[Y_{a=1}=1] \neq \Pr[Y_{a=0}=1]$
™ Causal null hypothesis holds if
Pr[Ya=1=1] = Pr[Ya=0=1]

Equivalent representations of the causal null hypothesis
† Pr[Ya=1=1] − Pr[Ya=0=1] = 0
† Pr[Ya=1=1] / Pr[Ya=0=1] = 1
† (Pr[Ya=1=1]/Pr[Ya=1=0])/(Pr[Ya=0=1]/Pr[Ya=0=0]) = 1
† Causal effect can be measured in many scales:
„ causal risk difference, causal risk ratio, causal odds ratio, …
„ Effect measures

Individual versus population causal effects
† Individual causal effects cannot be determined
„ except under extremely strong assumptions
† Population causal effects can be determined under
„ no assumptions (randomized studies)
„ strong assumptions (observational studies)
† We’ll refer to (population) causal effects only

Association and causation: More notation
† Pr[Y=1|A=a]
„ proportion of subjects that developed the outcome Y among those who received exposure value a in the population
„ Risk of Y among the exposed/unexposed
† Conditional probability
„ Calculated by using data from a subset of the population
Association
† The exposure A and the outcome Y are associated if
$\Pr[Y=1 \mid A=1] \neq \Pr[Y=1 \mid A=0]$
† No association = independence
$\Pr[Y=1 \mid A=1] = \Pr[Y=1 \mid A=0] \iff A \perp\!\!\!\perp Y \iff Y \perp\!\!\!\perp A$

Equivalent representations of independence
† Pr[Y=1|A=1] − Pr[Y=1|A=0] = 0
† Pr[Y=1|A=1] / Pr[Y=1|A=0] = 1
† (Pr[Y=1|A=1]/Pr[Y=0|A=1]) / (Pr[Y=1|A=0]/Pr[Y=0|A=0]) = 1
† Association can be measured in many scales:
„ Associational risk difference, associational risk ratio, associational odds ratio, …
„ Association measures
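The three scales can be written as small functions of two risks. A minimal sketch, assuming hypothetical risk values chosen only for illustration: plugging in Pr[Y=1|A=1] and Pr[Y=1|A=0] gives association measures, while plugging in the counterfactual risks Pr[Ya=1=1] and Pr[Ya=0=1] gives effect measures.

```python
def risk_difference(p1, p0):
    """Risk in group 1 minus risk in group 0."""
    return p1 - p0

def risk_ratio(p1, p0):
    """Risk in group 1 divided by risk in group 0."""
    return p1 / p0

def odds_ratio(p1, p0):
    """Odds in group 1 divided by odds in group 0."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# Hypothetical risks (not from the slides): equal risks, i.e. the null.
p1, p0 = 0.3, 0.3
null_measures = (risk_difference(p1, p0), risk_ratio(p1, p0), odds_ratio(p1, p0))
print(null_measures)

# Hypothetical non-null risks, again only for illustration.
print(risk_difference(0.5, 0.25), risk_ratio(0.5, 0.25), odds_ratio(0.5, 0.25))
```

When the two risks are equal, the difference is 0 and both ratios are 1, which is why the three representations of the null are equivalent.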
Again, crucial difference
“Association is not causation”
† Association: different risk in two disjoint
subsets of the population determined by
the subjects' actual exposure value
„ Pr[Y=1|A=a] is the risk in subjects of the
population that meet the condition `having
actually received exposure a’
† Causation: different risk in the entire
population under two exposure values
„ Pr[Ya=1] is the risk in all subjects of the
population had they received the counterfactual
exposure a
Statistics and causation
† We need counterfactuals to talk about causation
† Statistics (as a discipline) leaned towards banning counterfactuals from statistical language
† Statistics w/o counterfactuals is a language for association, not for causation
„ Causal concepts cannot be represented using statistics w/o counterfactuals

An example of causal concept: Confounding
† There is confounding when association is not causation
$\Pr[Y_a=1] \neq \Pr[Y=1 \mid A=a]$
† Confounding cannot be defined using associational (statistical) language
Counterfactual theory in statistics
† Neyman (1923)
„ Effects of point exposures in randomized experiments
† Rubin (1974)
„ Effects of point exposures in randomized and observational studies
† Robins (1986)
„ Total and direct effects of time-varying exposures in longitudinal studies

Outline
1. Definition of causal effect
† Counterfactuals
2. Estimation of causal effects
† Inverse probability weighting
3. Causal diagrams
† Directed acyclic graphs
4. The bias of standard methods
5. Causal models
† Marginal structural models
Association measures
† The associational risk ratio
„ Pr[Y=1|A=1]/Pr[Y=1|A=0]
can be directly computed in any study
because Y is observed in all subjects of the population
„ Pr[Y=1|A=1] and Pr[Y=1|A=0] are observed risks

Effect measures
† The causal risk ratio
„ Pr[Ya=1=1] / Pr[Ya=0=1]
cannot be directly computed in general
because Ya=1 and Ya=0 are unobserved in some subjects of the population
„ Pr[Ya=1=1] and Pr[Ya=0=1] are unobserved risks

Effect measures
† can be computed using data from ideal randomized experiments
„ with no assumptions
† More rigorously, effect measures can be consistently estimated using data from ideal randomized experiments
„ For now let’s consider experiments with near-infinite sample sizes only

What is an ideal randomized experiment?
† No loss to follow-up
† Full compliance with (adherence to) assigned exposure or treatment
† Double blind assignment
In ideal randomized experiments
„ Pr[Ya=1=1] is equal to Pr[Y=1|A=1]
„ Pr[Ya=0=1] is equal to Pr[Y=1|A=0]
† Therefore the associational RR
„ Pr[Y=1|A=1] / Pr[Y=1|A=0]
is equal to the causal RR
„ Pr[Ya=1=1] / Pr[Ya=0=1]
† Let’s prove it
„ first we need to describe randomization

Randomization (I)
† One (near-infinite) population
† Divided into two groups
„ Group 1 and group 2
† Membership in each group is randomly assigned
„ e.g., by the flip of a coin
† One group is treated and the other untreated
Randomization (II)
† First option
„ Treat subjects in group 1, don’t treat subjects in group 2
„ The risk is, say, Pr[Y=1|A=1] = 0.57
† Second option
„ Treat subjects in group 2, don’t treat subjects in group 1
„ What is the value of the risk Pr[Y=1|A=1] ?

Randomization (III)
† When group membership is randomly assigned, results are the same
„ whether group 1: treated, group 2: untreated
„ or vice versa
† Both groups are comparable or exchangeable
† Exchangeability is the consequence of randomization
Exchangeability
† Subjects in group 1 would have had the same risk as those in group 2 had they received the treatment of those in group 2
† The counterfactual risk in the treated equals the counterfactual risk in the untreated
$\Pr[Y_a=1 \mid A=1] = \Pr[Y_a=1 \mid A=0] \iff A \perp\!\!\!\perp Y_a \iff Y_a \perp\!\!\!\perp A$

Formal definition of exchangeability
$Y_a \perp\!\!\!\perp A$ for all a
† Exchangeability implies lack of confounding
† Exchangeability is another causal concept that cannot be represented by associational (statistical) language
Proof: Why Pr[Y=1|A=a] = Pr[Ya=1]?
† Two steps:
1. Pr[Y=1|A=a] = Pr[Ya=1|A=a]
„ by definition (consistency)
2. Pr[Ya=1|A=a] = Pr[Ya=1]
„ by randomization (exchangeability)
† Step 2 not generally true in the absence of randomization

In an ideal randomized experiment
† Association is causation
† because randomization produces exchangeability
™ We have a method for causal inference!
„ No need for adjustments of any sort
„ Assumption-free
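The two-step argument can be checked numerically with a small simulation. This is a sketch under stated assumptions: the counterfactual risks (0.3 and 0.6), the sample size, and the seed are hypothetical values chosen here for illustration, and a large finite sample stands in for the "near-infinite" population.

```python
import random

random.seed(2004)   # arbitrary seed, for reproducibility only
n = 200_000         # stand-in for a near-infinite population

# Hypothetical counterfactual risks (assumptions, not from the slides).
risk_a1, risk_a0 = 0.3, 0.6

# Both potential outcomes exist for every subject.
y_a1 = [int(random.random() < risk_a1) for _ in range(n)]
y_a0 = [int(random.random() < risk_a0) for _ in range(n)]

# Randomization: A is a coin flip, independent of the potential outcomes.
a = [int(random.random() < 0.5) for _ in range(n)]

# Consistency: the observed Y is the potential outcome under the received A.
y = [y1 if ai == 1 else y0 for ai, y0, y1 in zip(a, y_a0, y_a1)]

def proportion(values):
    """Share of ones in an iterable of 0/1 values."""
    values = list(values)
    return sum(values) / len(values)

assoc_risk_1 = proportion(yi for yi, ai in zip(y, a) if ai == 1)  # Pr[Y=1|A=1]
assoc_risk_0 = proportion(yi for yi, ai in zip(y, a) if ai == 0)  # Pr[Y=1|A=0]
causal_risk_1 = proportion(y_a1)                                  # Pr[Ya=1=1]
causal_risk_0 = proportion(y_a0)                                  # Pr[Ya=0=1]

print("associational RR:", assoc_risk_1 / assoc_risk_0)
print("causal RR:       ", causal_risk_1 / causal_risk_0)
```

Up to sampling variability, the associational and causal risk ratios agree, because the coin flip makes the treated and the untreated exchangeable.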
Example: Does heart transplant (A) increase 5-year survival (Y)?
† Select a large population of potential recipients of a transplant
† Get funding and IRB/Ethical approval
† Randomly allocate them to either transplant (A=1) or medical treatment (A=0)
† 5 years later, compute the associational RR
Pr[Y=1|A=1] / Pr[Y=1|A=0]
† that equals (cons. estimates) the causal RR
Pr[Ya=1=1] / Pr[Ya=0=1]

Potential problems of real randomized experiments
a) Loss to follow-up
b) Noncompliance
c) Unblinding
d) Other: ethics, feasibility, cost…
Consequence of problems a), b), c)
† Exchangeability still holds in randomized experiments, but
„ “available association” may not be causation (loss to follow-up)
„ exposure is misclassified (noncompliance) or contaminated (unblinding)
† No clear-cut separation between randomized and observational studies
† Causal inference from real randomized studies may require assumptions and analytic methods similar to those for causal inference from observational studies

Conclusion
† Observational studies are needed
„ In fact, most of human knowledge comes from observations, e.g., evolution theory, plate tectonics theory, hot coffee may cause burns…
† And so are methods for causal inference from observational data
In observational studies
† In general,
„ Pr[Ya=1=1] is not equal to Pr[Y=1|A=1]
„ Pr[Ya=0=1] is not equal to Pr[Y=1|A=0]
† Therefore the associational RR
„ Pr[Y=1|A=1] / Pr[Y=1|A=0]
is not generally equal to the causal RR
„ Pr[Ya=1=1] / Pr[Ya=0=1]

In observational studies, can we assume exchangeability?
† Absence of randomization implies that exchangeability is not guaranteed
† Too strong an assumption!
† The exposed and the unexposed are not generally comparable
„ e.g., individuals who receive a heart transplant may have a more severe disease than those who do not receive it
† In general,
$\Pr[Y_a=1 \mid A=1] \neq \Pr[Y_a=1 \mid A=0] \iff A \not\perp\!\!\!\perp Y_a \iff Y_a \not\perp\!\!\!\perp A$

Dead end?
¾ Exchangeability (a consequence of randomization) is a condition for causal inference
¾ Exchangeability is not generally an acceptable assumption in observational studies
† A condition weaker than exchangeability is needed for causal inference from observational data

In search of a weaker condition
† Consider only individuals with the same pre-exposure prognostic factors
† Then the exposed and the unexposed may be exchangeable
„ e.g., among individuals with an ejection fraction of 50%, those who do and do not receive a heart transplant may be comparable
„ e.g., among individuals with CD4 count<100, those who do and do not receive antiretroviral therapy may be comparable
† This is often reasonable
„ Especially if conditioning on many pre-exposure covariates L
Available data set (with covariates)
ID   L   A   Y   Ya=0   Ya=1
1    0   1   1   ?      1
2    0   1   0   ?      0
3    0   0   1   1      ?
4    0   0   0   0      ?
5    1   1   1   ?      1
6    1   1   0   ?      0
7    1   0   1   1      ?
8    1   0   0   0      ?

Conditional exchangeability
† Within levels of the covariates L, exposed subjects would have had the same risk as unexposed subjects had they been unexposed, and vice versa
† Counterfactual risk is the same in the exposed and the unexposed with the same value of L
$\Pr[Y_a=1 \mid A=1, L=l] = \Pr[Y_a=1 \mid A=0, L=l] \iff A \perp\!\!\!\perp Y_a \mid L=l \iff Y_a \perp\!\!\!\perp A \mid L=l$
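A simulation can illustrate the gap between marginal and conditional exchangeability. The sketch below assumes a hypothetical data-generating process (all probabilities, the seed, and the sample size are made up here): a binary prognostic factor L raises both the risk of the outcome and the chance of being treated, and treatment is assigned independently of the potential outcomes only within levels of L.

```python
import random

random.seed(1986)   # arbitrary seed, for reproducibility only
n = 200_000

def draw(p):
    """Bernoulli draw with success probability p."""
    return int(random.random() < p)

rows = []
for _ in range(n):
    l = draw(0.5)                  # prognostic factor (hypothetical)
    y_a0 = draw(0.4 + 0.4 * l)     # potential outcome if untreated
    y_a1 = draw(0.2 + 0.4 * l)     # potential outcome if treated
    a = draw(0.2 + 0.6 * l)        # treatment depends on L only
    y = y_a1 if a == 1 else y_a0   # consistency
    rows.append((l, a, y, y_a1))

def proportion(values):
    """Share of ones in an iterable of 0/1 values."""
    values = list(values)
    return sum(values) / len(values)

# Marginal comparison: association is not causation.
crude = proportion(y for l, a, y, y1 in rows if a == 1)     # Pr[Y=1|A=1]
marginal = proportion(y1 for l, a, y, y1 in rows)           # Pr[Ya=1=1]

# Within L=1: association is causation.
within = proportion(y for l, a, y, y1 in rows if a == 1 and l == 1)
stratum = proportion(y1 for l, a, y, y1 in rows if l == 1)  # Pr[Ya=1=1|L=1]

print("Pr[Y=1|A=1]      =", round(crude, 3), " vs Pr[Ya=1=1]     =", round(marginal, 3))
print("Pr[Y=1|A=1, L=1] =", round(within, 3), " vs Pr[Ya=1=1|L=1] =", round(stratum, 3))
```

Under these assumed probabilities the crude risk among the treated is clearly larger than the counterfactual risk, while the two agree (up to sampling noise) once attention is restricted to one level of L.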
Formal definition of conditional exchangeability
$Y_a \perp\!\!\!\perp A \mid L=l$ for all a
† Conditional exchangeability is equivalent to randomization within levels of L
† It implies no unmeasured (residual) confounding within levels of the measured variables L

OK, conditional exchangeability is a weaker condition, so what?
† Conditional exchangeability is a necessary condition for causal inference from observational data
† Under conditional exchangeability in all strata L=l, we can compute (consistently estimate) the causal risk ratio
„ Pr[Ya=1=1] / Pr[Ya=0=1]
† The assumption of conditional exchangeability implies
„ Pr[Y=1|L=l, A=a] = Pr[Ya=1|L=l]
Proof: Pr[Y=1|L=l, A=a] = Pr[Ya=1|L=l]
† Two steps:
1. Pr[Y=1|L=l, A=a] = Pr[Ya=1|L=l, A=a]
„ by definition of counterfactual variable
2. Pr[Ya=1|L=l, A=a] = Pr[Ya=1|L=l]
„ by assumption (conditional exchangeability)
† Same as for randomized studies but within levels of L

In an observational study
† Association is causation within levels of the covariates
† under the assumption of conditional exchangeability
™ We have a method for causal inference from observational data that is not assumption-free
„ But the need to rely on this assumption is not THE problem

Can we check whether conditional exchangeability holds?
™ No
† This is THE problem
† The assumption of conditional exchangeability is untestable
„ Even if there is conditional exchangeability, there is no way we can know it with certainty
„ (All we are saying is that there may be confounding due to unmeasured factors)

That’s why causal inference from observational data is controversial
† Expert knowledge can be used to enhance the plausibility of the assumption
„ measure as many relevant pre-exposure covariates as possible
† Then one can only hope the assumption of conditional exchangeability is approximately true
Inverse probability weighting (IPW)
† A method to compute causal effects under conditional exchangeability
† Plan of action:
„ YOU will compute the causal risk ratio using IPTW in an observational study
† i.e., you will compute Pr[Ya=1=1]/Pr[Ya=0=1] under conditional exchangeability
„ We will prove that you were right

A simplified example of observational study
† 500 HIV-infected individuals
† Variables:
„ L=1: CD4 cell count<200 cells/microL
„ A=1: on highly active antiretroviral therapy (HAART)
„ Y=1: AIDS
† Treatment status is decided after looking at CD4 cell count
† No loss to follow-up
The data summarized in a table
        L=0           L=1
        Y=1   Y=0     Y=1   Y=0
A=1     20    30      108   252
A=0     40    10      24    16

The data summarized in a tree
L=0, A=0: Y=0: 10, Y=1: 40
L=0, A=1: Y=0: 30, Y=1: 20
L=1, A=0: Y=0: 16, Y=1: 24
L=1, A=1: Y=0: 252, Y=1: 108
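As a point of comparison for what follows, the crude associational risk ratio can be computed directly from these counts. A minimal sketch: the dictionary layout keyed by (L, A, Y) and the helper name `risk_given_a` are choices made here, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Counts from the table, keyed by (L, A, Y).
counts = {
    (0, 1, 1): 20, (0, 1, 0): 30, (0, 0, 1): 40, (0, 0, 0): 10,
    (1, 1, 1): 108, (1, 1, 0): 252, (1, 0, 1): 24, (1, 0, 0): 16,
}

def risk_given_a(a):
    """Pr[Y=1 | A=a], pooling over the levels of L."""
    cases = sum(n for (l, ai, y), n in counts.items() if ai == a and y == 1)
    total = sum(n for (l, ai, y), n in counts.items() if ai == a)
    return Fraction(cases, total)

assoc_rr = risk_given_a(1) / risk_given_a(0)

print("Pr[Y=1|A=1] =", float(risk_given_a(1)))
print("Pr[Y=1|A=0] =", float(risk_given_a(0)))
print("associational RR =", float(assoc_rr))
```

This ratio describes association only; it ignores that treatment was decided after looking at CD4 cell count.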
Your goal
† To compute the effect of HAART on the risk of AIDS on the causal risk ratio scale
„ Pr[Ya=1=1] / Pr[Ya=0=1]
„ Assuming conditional exchangeability within levels of L
† First, compute Pr[Ya=0=1]
† Second, compute Pr[Ya=1=1]

[Figure: tree of the observed data, with leaf counts 10, 40, 30, 20, 16, 24, 252, 108]
[Figure: tree of the pseudopopulation, with leaf counts 20, 80, 60, 40, 160, 240, 280, 120]

Pseudopopulation data analysis
        Ya = 1   Ya = 0
a=1     160      340
a=0     320      180
† Pr[Ya=1=1] = 160 / (160 + 340) = 0.32
† Pr[Ya=0=1] = 320 / (320 + 180) = 0.64
† Causal risk ratio = 0.32 / 0.64 = 0.5
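The pseudopopulation analysis above can be reproduced from the original 500 individuals. A minimal sketch, assuming the counts are stored in a dictionary keyed by (L, A, Y); the helper names `f_a_given_l` and `pseudo_risk` are made up for this illustration. Each cell is divided by the probability of its observed treatment given L.

```python
from fractions import Fraction

# Original counts of the example, keyed by (L, A, Y).
counts = {
    (0, 1, 1): 20, (0, 1, 0): 30, (0, 0, 1): 40, (0, 0, 0): 10,
    (1, 1, 1): 108, (1, 1, 0): 252, (1, 0, 1): 24, (1, 0, 0): 16,
}

def f_a_given_l(a, l):
    """f(a|l) = Pr[A=a | L=l], computed from the counts."""
    num = sum(n for (li, ai, y), n in counts.items() if li == l and ai == a)
    den = sum(n for (li, ai, y), n in counts.items() if li == l)
    return Fraction(num, den)

# Each cell is scaled by 1 / f(A|L) to build the pseudopopulation.
pseudo = {(l, a, y): n / f_a_given_l(a, l) for (l, a, y), n in counts.items()}

def pseudo_risk(a):
    """Risk of Y=1 among pseudopopulation members with exposure a."""
    cases = sum(n for (l, ai, y), n in pseudo.items() if ai == a and y == 1)
    total = sum(n for (l, ai, y), n in pseudo.items() if ai == a)
    return cases / total

causal_rr = pseudo_risk(1) / pseudo_risk(0)

print("Pr[Ya=1=1] =", float(pseudo_risk(1)))
print("Pr[Ya=0=1] =", float(pseudo_risk(0)))
print("causal risk ratio =", float(causal_rr))
```

The result is a causal risk ratio only under the assumption of conditional exchangeability within levels of L.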
Which assumption are you making?
$Y_a \perp\!\!\!\perp A \mid L=l$ for all a
† Conditional exchangeability in the population
„ exposure is randomized within levels of L
„ no unmeasured confounding within levels of the measured variable L
† Within levels of L, the risk among the exposed if unexposed is the same as the risk among the unexposed in the population
„ and vice versa

Under conditional exchangeability…
† The observational study in the original population is a randomized experiment within levels of L
† The study in the pseudopopulation created by IPTW is a randomized experiment
„ Exposed and unexposed subjects are (unconditionally!) exchangeable
„ Because they are the same individuals
„ Exposure is randomized, i.e., equally probable across levels of the covariate L
„ There is no confounding
† In the pseudopopulation, causal effects can be estimated as in a randomized experiment
„ No need for adjustments of any sort
You did it
† You computed the causal risk ratio using inverse-probability-of-treatment weighting
† Right?

[Figure: tree of the pseudopopulation, each leaf labeled with its count and its weight W = 1 / f[A|L]]
Count   W
20      1/.5 = 2
80      1/.5 = 2
60      1/.5 = 2
40      1/.5 = 2
160     1/.1 = 10
240     1/.1 = 10
280     1/.9 = 1.11
120     1/.9 = 1.11
Inverse-probability-of-treatment weights
$W = \dfrac{1}{f[A \mid L]}$
† Each individual in the population is weighted to create W individuals in the pseudopopulation
† The denominator of your W is (informally) the probability of having your observed treatment value given your L value
„ Not equal for all individuals with same L value because it depends on A value as well

Important difference
$W \neq \dfrac{1}{PS}$, where $PS = \Pr[A=1 \mid L]$
† Propensity score
† The PS is the probability of being treated given L
„ Equal for all individuals with same L value because it does not depend on the A value

Notational clarification
† fA(a) or f(a) is the probability density function (pdf) of the random variable A evaluated at the value a
† For discrete A: f(a) = Pr[A=a]
† We need to represent the probability that each subject had his/her own exposure level A
„ Pr[A=A] makes no sense
„ f(A) is the pdf evaluated at the random argument A, exactly what we mean

Proof: Inverse probability weighted mean equals counterfactual mean
$$
\begin{aligned}
E\left[\frac{I(A=a)\,Y}{f[A \mid L]}\right]
&= E\left[\frac{I(A=a)\,Y_a}{f[A=a \mid L]}\right] \\
&= E\left\{ E\left[\frac{I(A=a)}{f[A=a \mid L]}\,Y_a \,\middle|\, L\right] \right\} \\
&= E\left\{ E\left[\frac{I(A=a)}{f[A=a \mid L]} \,\middle|\, L\right] E[Y_a \mid L] \right\} \\
&= E\{E[Y_a \mid L]\} = E[Y_a]
\end{aligned}
$$
† First equality: by definition (consistency)
† Second equality: just algebra
„ E[X] = E[ E[X|Z] ]
† Third equality: by assumption (conditional exchangeability)
„ E[XZ] = E[X] E[Z] if X and Z are independent
† Last equalities: just algebra
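The identity can be checked on the HAART example by computing the weighted mean E[ I(A=a) Y / f[A|L] ] over the 500 individuals. This sketch also spells out the difference between W and the propensity score: the weight is 1/PS for the treated and 1/(1 − PS) for the untreated. The function names are assumptions made for this illustration.

```python
from fractions import Fraction

# Original counts of the example, keyed by (L, A, Y).
counts = {
    (0, 1, 1): 20, (0, 1, 0): 30, (0, 0, 1): 40, (0, 0, 0): 10,
    (1, 1, 1): 108, (1, 1, 0): 252, (1, 0, 1): 24, (1, 0, 0): 16,
}
n_total = sum(counts.values())

def propensity_score(l):
    """PS = Pr[A=1 | L=l]; the same for everyone with the same L."""
    treated = sum(n for (li, a, y), n in counts.items() if li == l and a == 1)
    total = sum(n for (li, a, y), n in counts.items() if li == l)
    return Fraction(treated, total)

def ipt_weight(a, l):
    """W = 1 / f[A|L]; depends on the observed A as well as on L."""
    ps = propensity_score(l)
    return 1 / ps if a == 1 else 1 / (1 - ps)

def ipw_mean(a):
    """Empirical E[ I(A=a) Y / f[A|L] ] over all individuals."""
    total = Fraction(0)
    for (l, ai, y), n in counts.items():
        if ai == a:                  # indicator I(A=a)
            total += n * y * ipt_weight(ai, l)
    return total / n_total

print("E[Ya=1] =", float(ipw_mean(1)))
print("E[Ya=0] =", float(ipw_mean(0)))
```

The two weighted means match the counterfactual risks obtained from the pseudopopulation table, as the proof says they should under conditional exchangeability.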
The positivity condition
† Required for the proof
† In each level of L in the population, there must be exposed and unexposed individuals
† If f(l)>0 then f(a|l)>0 for all a
„ conditional probabilities must be positive
† IPW cannot be used when the positivity condition is not met

IPW estimation
† In general, we refer to the causal risk ratio estimate obtained by using IPT weights W as an IPTW estimate
† The idea is a generalization of Horvitz-Thompson (1952) estimators for survey sampling
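Positivity can be checked mechanically before any weights are computed. A minimal sketch, assuming counts keyed by (L, A, Y); the function name `positivity_holds` and the modified data set `violating` are hypothetical, introduced only to show a failure.

```python
# Original counts of the example, keyed by (L, A, Y).
counts = {
    (0, 1, 1): 20, (0, 1, 0): 30, (0, 0, 1): 40, (0, 0, 0): 10,
    (1, 1, 1): 108, (1, 1, 0): 252, (1, 0, 1): 24, (1, 0, 0): 16,
}

def positivity_holds(counts, exposure_levels=(0, 1)):
    """True if every observed level of L contains every exposure level."""
    levels_of_l = {l for (l, a, y), n in counts.items() if n > 0}
    for l in levels_of_l:
        for a in exposure_levels:
            n_la = sum(n for (li, ai, y), n in counts.items()
                       if li == l and ai == a)
            if n_la == 0:   # f(a|l) = 0 although f(l) > 0
                return False
    return True

# Hypothetical violation: nobody with L=0 is treated.
violating = dict(counts)
violating[(0, 1, 1)] = 0
violating[(0, 1, 0)] = 0

print(positivity_holds(counts))
print(positivity_holds(violating))
```

In the violating data set the weight 1 / f[A=1|L=0] would require dividing by zero, which is why IPW cannot be used there.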
IPW as a simulation
† Weighting is the equivalent of simulating what would happen had all individuals in the population experienced every possible exposure level
† Individuals in the original population who received exposure level a are weighted to represent all individuals (regardless of exposure level) in the population
„ sample size of pseudopopulation is equal to number of exposure levels times the size of original population
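The claim about the size of the pseudopopulation can be verified on the HAART example, where there are two exposure levels and 500 individuals. A short sketch, with helper names chosen here for illustration:

```python
from fractions import Fraction

# Original counts of the example, keyed by (L, A, Y).
counts = {
    (0, 1, 1): 20, (0, 1, 0): 30, (0, 0, 1): 40, (0, 0, 0): 10,
    (1, 1, 1): 108, (1, 1, 0): 252, (1, 0, 1): 24, (1, 0, 0): 16,
}

def f_a_given_l(a, l):
    """f(a|l) = Pr[A=a | L=l], computed from the counts."""
    num = sum(n for (li, ai, y), n in counts.items() if li == l and ai == a)
    den = sum(n for (li, ai, y), n in counts.items() if li == l)
    return Fraction(num, den)

original_size = sum(counts.values())

# Sum of the weights W = 1 / f(A|L) over all individuals.
pseudo_size = sum(n / f_a_given_l(a, l) for (l, a, y), n in counts.items())

# Size of the pseudopopulation by exposure level.
size_by_exposure = {
    a: sum(n / f_a_given_l(ai, l) for (l, ai, y), n in counts.items() if ai == a)
    for a in (0, 1)
}

print(original_size, pseudo_size, size_by_exposure)
```

Each exposure level contributes a full copy of the original population, so the pseudopopulation is twice the size of the original one.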