Download biostats - CMU Philosophy Department Web Server

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Time series wikipedia , lookup

Least squares wikipedia , lookup

Regression analysis wikipedia , lookup

Linear regression wikipedia , lookup

Transcript
Causal Discovery
Richard Scheines
Peter Spirtes, Clark Glymour,
and many others
Dept. of Philosophy & CALD
Carnegie Mellon
Nov. 13th, 2003
1
Outline
1.
Motivation
2.
Representation
3.
Connecting Causation to Probability (Independence)
4.
Searching for Causal Models
5.
Improving on Regression for Causal Inference
Nov. 13th, 2003
2
1. Motivation
Non-experimental Evidence
Day Care
John
A lot
Mary
None
Aggressivenes
s
A lot
A little
Typical Predictive Questions
•
Can we predict aggressiveness from the amount of violent TV watched
•
Can we predict crime rates from abortion rates 20 years ago
Causal Questions:
•
Does watching violent TV cause Aggression?
•
I.e., if we change TV watching, will the level of Aggression change?
Nov. 13th, 2003
3
Causal Estimation
When and how can we use non-experimental data to
tell us about the effect of an intervention?
Manipulated Probability P(Y | X set= x, Z=z)
from
Unmanipulated Probability P(Y | X = x, Z=z)
Nov. 13th, 2003
4
2. Representation
1. Association & causal structure - qualitatively
2. Interventions
3. Statistical Causal Models
1. Bayes Networks
2. Structural Equation Models
Nov. 13th, 2003
5
Causation & Association
X and Y are associated (X _||_ Y) iff
x1  x2 P(Y | X = x1)  P(Y | X = x2)
Association is symmetric: X _||_ Y  Y _||_ X
X is a cause of Y iff
x1  x2 P(Y | X set= x1)  P(Y | X set= x2)
Causation is asymmetric: X
Y 
Nov. 13th, 2003
Y
X
6
Direct Causation
X is a direct cause of Y relative to S, iff
z,x1  x2 P(Y | X set= x1 , Z set= z)
 P(Y | X set= x2 , Z set= z)
where Z = S - {X,Y}
X
Nov. 13th, 2003
Y
7
Causal Graphs
Causal Graph G = {V,E}
Each edge X  Y represents a direct causal claim:
X is a direct cause of Y relative to V
Exposure
Chicken Pox
Exposure
Rash
Infection
Nov. 13th, 2003
Rash
8
Causal Graphs
Not Cause Complete
Exposure
Omitted Causes
Infection
Symptoms
Common Cause Complete
Omitted
Common Causes
Exposure
Infection
Nov. 13th, 2003
Symptoms
9
Modeling Ideal Interventions
Ideal Interventions (on a variable X):
• Completely determine the value or
distribution of a variable X
• Directly Target only X
(no “fat hand”)
E.g., Variables: Confidence, Athletic Performance
Intervention 1: hypnosis for confidence
Intervention 2: anti-anxiety drug (also muscle relaxer)
Nov. 13th, 2003
10
Modeling Ideal Interventions
Interventions on the Effect
Post
Pre-experimental System
Sweaters
On
Nov. 13th, 2003
Room
Temperature
11
Modeling Ideal Interventions
Interventions on the Cause
Post
Pre-experimental System
Sweaters
On
Nov. 13th, 2003
Room
Temperature
12
Interventions & Causal Graphs
• Model an ideal intervention by adding an “intervention” variable
outside the original system
• Erase all arrows pointing into the variable intervened upon
Intervene to change Inf
Pre-intervention graph
Post-intervention graph?
Exp
Exp
Inf
Inf
Rash
Rash
I
Nov. 13th, 2003
13
Conditioning vs. Intervening
P(Y | X = x1) vs. P(Y | X set= x1)
Teeth Slides
Nov. 13th, 2003
14
Causal Bayes Networks
The Joint Distribution Factors
Smoking [0,1]
According to the Causal Graph,
Yellow Fingers
[0,1]
Lung Cancer
[0,1]
P(S = 0) = .7
P(S = 1) = .3
P(YF = 0 | S = 0) = .99
P(YF = 1 | S = 0) = .01
P(YF = 0 | S = 1) = .20
P(YF = 1 | S = 1) = .80
i.e., for all X in V
P(V) = P(X|Immediate
Causes of(X))
P(S,YF, L) = P(S) P(YF | S) P(LC | S)
P(LC = 0 | S = 0) = .95
P(LC = 1 | S = 0) = .05
P(LC = 0 | S = 1) = .80
P(LC = 1 | S = 1) = .20
Nov. 13th, 2003
15
Structural Equation Models
Education
Causal Graph
Income
Statistical
Model
Longevity
1. Structural Equations
2. Statistical Constraints
Nov. 13th, 2003
16
Structural Equation Models
Education
Causal Graph
Income
Longevity
z Structural Equations:
One Equation for each variable V in the graph:
V = f(parents(V), errorV)
for SEM (linear regression) f is a linear function
z Statistical Constraints:
Joint Distribution over the Error terms
Nov. 13th, 2003
17
Structural Equation Models
Causal Graph
Education
Equations:
Education = ed
Longevity
Income = Educationincome
Longevity = EducationLongevity
Income
SEM Graph
Education
(path diagram)
1
Income
Income
2
Longevity
Statistical Constraints:
(ed, Income,Income ) ~N(0,2)
2 diagonal
- no variance is zero
Longevity
Nov. 13th, 2003
18
3. Connecting
Causation to Probability
Nov. 13th, 2003
19
The Markov Condition
Causal
Structure
Statistical
Predictions
Causal Markov Axiom
Independence
Causal Graphs
X
Y
X _||_ Z | Y
Z
i.e.,
P(X | Y) = P(X | Y, Z)
Nov. 13th, 2003
20
Causal Markov Axiom
If G is a causal graph, and P a
probability distribution over the
variables in G, then in P:
every variable V is independent of its
non-effects, conditional on its
immediate causes.
Nov. 13th, 2003
21
Causal Markov Condition
Two Intuitions:
1) Immediate causes make effects independent of
remote causes (Markov).
2) Common causes make their effects independent
(Salmon).
Nov. 13th, 2003
22
Causal Markov Condition
1) Immediate causes make effects independent of
remote causes (Markov).
E = Exposure to Chicken Pox
I = Infected
S = Symptoms
Markov Cond.
E
I
E || S | I
S
Nov. 13th, 2003
23
Causal Markov Condition
2) Effects are independent conditional on their common
causes.
Smoking
(S)
YF || LC | S
Markov Cond.
Yellow Fingers
(YF)
Lung Cancer
(LC)
Nov. 13th, 2003
24
Causal Structure  Statistical Data
Acyclic Causal Graph
X1
X2
X3
Causal Markov Axiom
(D-separation)
Independence
Relations
X1
Nov. 13th, 2003
X3 | X2
25
Causal Markov Axiom
In SEMs, d-separation follows from
assuming independence among error
terms that have no connection in the path
diagram i.e., assuming that the model is common
cause complete.
Nov. 13th, 2003
26
Causal Markov and
D-Separation
• In acyclic graphs: equivalent
• Cyclic Linear SEMs with uncorrelated errors:
• D-separation correct
• Markov condition incorrect
• Cyclic Discrete Variable Bayes Nets:
• If equilibrium --> d-separation correct
• Markov incorrect
Nov. 13th, 2003
27
D-separation:
Conditioning vs. Intervening
T
X1
P(X3 | X2)  P(X3 | X2, X1)
X3
X2
X3 _||_ X1 | X2
T
X1
X2
X3
P(X3 | X2 set= ) = P(X3 | X2 set=, X1)
X3 _||_ X1 | X2 set=
I
Nov. 13th, 2003
28
4. Search
From Statistical Data
to Probability
to Causation
Nov. 13th, 2003
29
Causal Discovery
Statistical Data  Causal Structure
Data
Equivalence Class of
Causal Graphs
X1
X1
X1
X2
X2
X2
X3
Causal Markov Axiom
(D-separation)
X3
X3
Statistical
Inference
Discovery Algorithm
Independence
Relations
X1
X3 | X2
Background Knowledge
- X2 before X3
- no unmeasured common causes
Nov. 13th, 2003
30
Representations of
D-separation Equivalence Classes
We want the representations to:
• Characterize the Independence Relations
Entailed by the Equivalence Class
• Represent causal features that are shared
by every member of the equivalence class
Nov. 13th, 2003
31
Patterns & PAGs
• Patterns (Verma and Pearl, 1990): graphical
representation of an acyclic d-separation
equivalence - no latent variables.
• PAGs: (Richardson 1994) graphical
representation of an equivalence class including
latent variable models and sample selection bias
that are d-separation equivalent over a set of
measured variables X
Nov. 13th, 2003
32
Patterns
Example
Possible Edges
X1
X2
X1
X2
X1
X2
Nov. 13th, 2003
X1
X2
X3
X4
33
Patterns: What the Edges
Mean
X2
X1 and X2 are not adjacent in any
member of the equivalence class
X1
X2
X1  X2 (X1 is a cause of X2) in
every member of the equivalence
class.
X1
X2
X1  X2 in some members of the
equivalence class, and X2  X1 in
others.
X1
Nov. 13th, 2003
34
Patterns
X1
X2
Pattern
X4
X3
Represents
X1
X1
X2
X3
X4
Nov. 13th, 2003
X2
X3
X4
35
PAGs: Partial Ancestral Graphs
What PAG edges mean.
X1
X2
X1 and X2 are not adjacent
X1
X2
X2 is not an ancestor of X1
X1
X2
No set d-separates X2 and X1
X1
X2
X1 is a cause of X2
X1
X2
There is a latent common
cause of X1 and X2
Nov. 13th, 2003
36
PAGs: Partial Ancestral
Graph
X1
X2
PAG
X3
Represents
X1
X2
X1
X3
X2
T1
X3
etc.
X1
X2
X1
T1
X3
T1
X2
X3
Nov. 13th, 2003
T2
37
Tetrad 4 Demo
www.phil.cmu.edu/projects/tetrad_download/
Nov. 13th, 2003
38
Overview of Search Methods
• Constraint Based Searches
• TETRAD
• Scoring Searches
• Scores: BIC, AIC, etc.
• Search: Hill Climb, Genetic Alg., Simulated Annealing
• Very difficult to extend to latent variable models
Heckerman, Meek and Cooper (1999). “A Bayesian Approach to Causal
Discovery” chp. 4 in Computation, Causation, and Discovery, ed. by
Glymour and Cooper, MIT Press, pp. 141-166
Nov. 13th, 2003
39
5. Regession and
Causal Inference
Nov. 13th, 2003
40
Regression to estimate
Causal Influence
• Let V = {X,Y,T}, where
- Y : measured outcome
- measured regressors: X = {X1, X2, …, Xn}
- latent common causes of pairs in X U Y: T = {T1, …, Tk}
• Let the true causal model over V be a Structural Equation
Model in which each V  V is a linear combination of its
direct causes and independent, Gaussian noise.
Nov. 13th, 2003
41
Regression to estimate
Causal Influence
• Consider the regression equation:
Y = b0 + b1X1 + b2X2 + ..…bnXn
• Let the OLS regression estimate bi be the estimated causal
influence of Xi on Y.
• That is, holding X/Xi experimentally constant, bi is an estimate of
the change in E(Y) that results from an intervention that changes
Xi by 1 unit.
• Let the real Causal Influence Xi  Y = i
• When is the OLS estimate bi an unbiased estimate of i
Nov. 13th, 2003
?
42
Linear Regression
Let the other regressors O = {X1, X2,....,Xi-1, Xi+1,...,Xn}
bi = 0 if and only if Xi,Y.O = 0
In a multivariate normal distribuion,
Xi,Y.O = 0 if and only if Xi _||_ Y | O
Nov. 13th, 2003
43
Linear Regression
So in regression:
bi = 0  Xi _||_ Y | O
But provably :
i = 0  S  O, Xi _||_ Y | S
So S  O, Xi _||_ Y | S  i = 0
~ S  O, Xi _||_ Y | S  don’t know
Nov. 13th, 2003
(unless we’re
lucky)
44
Regression Example
X1
X2
X1 _||_ Y | X2
b1 0
X2 _||_ Y | X1
b2 = 0
~S  {X2} X1 _||_ Y | S
Don’t
know
S  {X1} X2 _||_ Y | {X1}
2 = 0
Y
True Model
Nov. 13th, 2003
45
Regression Example
T1
X1
X2
T2
X3
Y
True Model
X1 _||_ Y | {X2,X3}
b1 0
X2 _||_ Y | {X1,X3}
b2 0
X3 _||_ Y | {X1,X2}
b3 0
~S  {X2,X3}, X1 _||_ Y | S
DK
S  {X1,X3}, X2 _||_ Y | {X1}
2 = 0
~S  {X1,X2}, X3 _||_ Y | S
DK
Nov. 13th, 2003
46
Regression Example
T1
X1
X2
T2
X3
X1
X2
X3
Y
Y
True Model
PAG
Nov. 13th, 2003
47
Regression Bias
If
• Xi is d-separated from Y conditional on X/Xi
in the true graph after removing Xi  Y, and
• X contains no descendant of Y, then:
bi is an unbiased estimate of i
See “Using Path Diagrams ….”
Nov. 13th, 2003
48
Applications
• Parenting among Single,
•
•
•
•
•
Black Mothers
Pneumonia
Photosynthesis
Lead - IQ
College Retention
Corn Exports
•
•
•
•
•
•
Rock Classification
Spartina Grass
College Plans
Political Exclusion
Satellite Calibration
Naval Readiness
Nov. 13th, 2003
49
References
•
Causation, Prediction, and Search, 2nd Edition, (2000), by P. Spirtes, C.
Glymour, and R. Scheines ( MIT Press)
•
Causality: Models, Reasoning, and Inference, (2000), Judea Pearl, Cambridge
Univ. Press
•
Computation, Causation, & Discovery (1999), edited by C. Glymour and G.
Cooper, MIT Press
•
Causality in Crisis?, (1997) V. McKim and S. Turner (eds.), Univ. of Notre
Dame Press.
•
TETRAD IV: www.phil.cmu.edu/projects/tetrad
•
Web Course on Causal and Statistical Reasoning :
www.phil.cmu.edu/projects/csr/
Nov. 13th, 2003
50