CAUSAL REASONING FOR
DECISION AIDING SYSTEMS
COGNITIVE SYSTEMS LABORATORY
UCLA
Judea Pearl, Mark Hopkins, Blai Bonet,
Chen Avin, Ilya Shpitser
PRESENTATIONS
Judea Pearl: Robustness of Causal Claims
Ilya Shpitser and Chen Avin: Experimental Testability of Counterfactuals
Blai Bonet: Logic-based Inference on Bayes Networks
Mark Hopkins: Inference using Instantiations
Chen Avin: Inference in Sensor Networks
Blai Bonet: Report from Probabilistic Planning Competition
FROM STATISTICAL TO CAUSAL ANALYSIS:
1. THE DIFFERENCES
Probability and statistics deal with static relations:
Data (from passive observations) → Statistics → joint distribution → Probability → inferences
Causal analysis deals with changes (dynamics):
Data + Causal assumptions + Experiments → Causal Model →
1. Effects of interventions
2. Causes of effects
3. Explanations
TYPICAL CAUSAL MODEL
[Diagram: causal model over X, Y, Z, with labels INPUT and OUTPUT]
TYPICAL CLAIMS
1. Effects of potential interventions
2. Claims about attribution (responsibility)
3. Claims about direct and indirect effects
4. Claims about explanations
ROBUSTNESS:
MOTIVATION
[Diagram: x (Smoking) → y (Cancer) with coefficient a; u (Genetic Factors, unobserved) influences both x and y]
In linear systems: $y = ax + e$
a is non-identifiable.
The effect of smoking on cancer is, in general, non-identifiable (from observational studies).
ROBUSTNESS:
MOTIVATION
[Diagram: z (Price of Cigarettes) → x (Smoking) with coefficient b; x → y (Cancer) with coefficient a; u (Genetic Factors, unobserved) influences both x and y]
Z: instrumental variable; cov(z,u) = 0
a is identifiable:
$R_{yz} = ab$
$R_{xz} = b$
$a = R_{yz} / R_{xz}$
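The ratio estimand on this slide can be checked numerically. The following is a minimal simulation sketch, not part of the original slides: the coefficient values, noise scales, sample size, and the helper names `cov`, `slope`, and `simulate_iv` are all assumptions chosen for illustration. It draws data from a linear model with an unobserved confounder u and an instrument z independent of u, then compares the naive regression of y on x with the ratio of R_yz to R_xz.

```python
import random

def cov(p, q):
    """Sample covariance of two equal-length sequences."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    return sum((pi - mp) * (qi - mq) for pi, qi in zip(p, q)) / (n - 1)

def slope(dep, indep):
    """Regression coefficient R_{dep,indep} = cov(dep, indep) / var(indep)."""
    return cov(dep, indep) / cov(indep, indep)

def simulate_iv(a=0.5, b=0.8, n=50_000, seed=0):
    """Hypothetical linear model: z -> x -> y, with unobserved u -> x and u -> y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # instrument, independent of u
    u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
    x = [b * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [a * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return x, y, z

x, y, z = simulate_iv()
naive_estimate = slope(y, x)             # biased by the confounder u
iv_estimate = slope(y, z) / slope(x, z)  # R_yz / R_xz
print(f"naive: {naive_estimate:.3f}  instrumental: {iv_estimate:.3f}")
```

With these assumed settings the instrumental estimate should land near the true a = 0.5, while the naive slope is pulled upward by the shared dependence on u.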
ROBUSTNESS:
MOTIVATION
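[Diagram: z (Price of Cigarettes) → x (Smoking) with coefficient b; x → y (Cancer) with coefficient a; u (Genetic Factors, unobserved) influences both x and y]
Problem with Instrumental Variables:
The model may be wrong!
If the model is wrong, $R_{yz} = ab$ need not hold, and $R_{yz} / R_{xz}$ need not equal a.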
ROBUSTNESS:
MOTIVATION
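[Diagram: z1 (Price of Cigarettes) → x (Smoking) with coefficient b; z2 (Peer Pressure) → x with coefficient g; x → y (Cancer) with coefficient a; u (Genetic Factors, unobserved) influences both x and y]
Solution: Invoke several instruments
$a_1 = R_{yz_1} / R_{xz_1}$
$a_2 = R_{yz_2} / R_{xz_2}$
Surprise: $a_1 = a_2$, so the model is likely correct.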
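The "surprise" argument can be illustrated by simulation. The sketch below is an assumption-laden illustration rather than anything from the slides: the parameter values and the function names are hypothetical, and the `direct` argument is an invented way of breaking the model by letting z2 affect y directly. When `direct` is zero the two ratio estimates agree; when it is not, they diverge, which is the signal that at least one instrument is invalid.

```python
import random

def slope(dep, indep):
    """Regression coefficient R_{dep,indep} = cov(dep, indep) / var(indep)."""
    n = len(dep)
    md, mi = sum(dep) / n, sum(indep) / n
    covariance = sum((d - md) * (i - mi) for d, i in zip(dep, indep))
    variance = sum((i - mi) ** 2 for i in indep)
    return covariance / variance

def two_instrument_estimates(a=0.5, b=0.8, g=0.6, direct=0.0, n=50_000, seed=1):
    """Return (a1, a2), the ratio estimates of a obtained from z1 and z2.

    `direct` is a hypothetical direct effect of z2 on y; any nonzero value
    violates the assumption that z2 influences y only through x.
    """
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
    x = [b * p + g * q + c + rng.gauss(0, 1) for p, q, c in zip(z1, z2, u)]
    y = [a * xi + c + direct * q + rng.gauss(0, 1) for xi, c, q in zip(x, u, z2)]
    a1 = slope(y, z1) / slope(x, z1)
    a2 = slope(y, z2) / slope(x, z2)
    return a1, a2

ok_a1, ok_a2 = two_instrument_estimates(direct=0.0)    # model assumptions hold
bad_a1, bad_a2 = two_instrument_estimates(direct=0.4)  # z2 is not a valid instrument
print(f"valid model:    a1={ok_a1:.3f}  a2={ok_a2:.3f}")
print(f"violated model: a1={bad_a1:.3f}  a2={bad_a2:.3f}")
```

Agreement between the two estimates does not prove the model, but under a wrong model there is no particular reason for the two ratios to coincide, so observing agreement lends support to the claim.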
ROBUSTNESS:
MOTIVATION
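[Diagram: instruments z1 (Price of Cigarettes), z2 (Peer Pressure), z3 (Anti-smoking Legislation), …, zn, each pointing into x (Smoking); x → y (Cancer) with coefficient a; u (Genetic Factors, unobserved) influences both x and y]
Greater surprise: $a_1 = a_2 = a_3 = \dots = a_n = q$
Claim a = q is highly likely to be correct.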
ROBUSTNESS:
MOTIVATION
[Diagram: x (Smoking) → y (Cancer) with coefficient a; y → s (Symptom); u (Genetic Factors, unobserved) influences both x and y]
Symptoms do not act as instruments.
a remains non-identifiable.
Why? Taking a noisy measurement (s) of an observed variable (y) cannot add new information.
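One way to see why the symptom adds nothing is to exhibit two different models that imply exactly the same covariances over the observed variables (x, y, s). The sketch below does this in closed form. The parameterization, the numeric values, and the function name `implied_covariances` are hypothetical choices made for illustration, not taken from the slides.

```python
import math

def implied_covariances(a, confounding, var_ey, k=0.5, var_es=1.0):
    """Covariance entries over (x, y, s) implied by a linear model.

    Assumed structure (a hypothetical parameterization):
        x = u + ex,   y = a*x + confounding*u + ey,   s = k*y + es
    with var(u) = var(ex) = 1 and all noise terms mutually independent.
    """
    var_x = 1.0 + 1.0                    # var(u) + var(ex)
    cov_xy = a * var_x + confounding     # cov(x, u) = 1
    var_y = a * a * var_x + 2 * a * confounding + confounding ** 2 + var_ey
    return {
        "var_x": var_x,
        "cov_xy": cov_xy,
        "var_y": var_y,
        "cov_xs": k * cov_xy,            # s depends on x only through y
        "cov_ys": k * var_y,
        "var_s": k * k * var_y + var_es,
    }

# Model A: a = 0.5 with confounding.  Model B: a = 1.0 with no confounding.
model_a = implied_covariances(a=0.5, confounding=1.0, var_ey=1.0)
model_b = implied_covariances(a=1.0, confounding=0.0, var_ey=1.5)
same_observables = all(math.isclose(model_a[key], model_b[key]) for key in model_a)
print("identical observed covariances:", same_observables)
```

Because every covariance involving s is a fixed multiple of a covariance involving y, matching the (x, y) moments automatically matches the moments involving s, so the data cannot distinguish a = 0.5 from a = 1.0.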
ROBUSTNESS:
MOTIVATION
[Diagram: x (Smoking) → y (Cancer) with coefficient a; y → S1, S2, …, Sn (Symptoms); u (Genetic Factors, unobserved) influences both x and y]
Adding many symptoms does not help.
a remains non-identifiable.
ROBUSTNESS:
MOTIVATION
Given a parameter a in a general graph
[Diagram: edge x → y with coefficient a]
Find if a can evoke an equality surprise
$a_1 = a_2 = \dots = a_n$
associated with several independent estimands of a.
Formulate: surprise, over-identification, independence.
Robustness: the degree to which a is robust to violations
of model assumptions.
ROBUSTNESS:
FORMULATION
Bad attempt:
Parameter a is robust (over-identified) if:
$a = f_1(\Sigma)$
$a = f_2(\Sigma)$
$f_1, f_2$: two distinct functions
If the model induces a constraint $g(\Sigma) = 0$, then
$a = f(\Sigma) + t_1[g(\Sigma)] = f(\Sigma) + t_2[g(\Sigma)] = \dots$
and the $t_i[g(\Sigma)]$ are distinct.
ROBUSTNESS:
FORMULATION
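[Diagram: chain x → y → z with coefficient b on x → y and coefficient c on y → z; error terms ex, ey, ez]
x = ex
y = bx + ey
z = cy + ez
$R_{yx} = b$, $R_{zx} = bc$, $R_{zy} = c$
(b) $b = R_{yx}$; $b = R_{zx} / R_{zy}$
(c) $c = R_{zy}$; $c = R_{zx} / R_{yx}$
constraint: $R_{zx} = R_{yx} R_{zy}$
y → z irrelevant to derivation of b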
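The chain example lends itself to a quick numerical check. The following sketch is illustrative only: the values of b and c, the sample size, and the helper names are assumptions. It simulates the three structural equations and confirms that the two expressions for b agree and that the induced constraint holds up to sampling error.

```python
import random

def slope(dep, indep):
    """Regression coefficient R_{dep,indep} = cov(dep, indep) / var(indep)."""
    n = len(dep)
    md, mi = sum(dep) / n, sum(indep) / n
    covariance = sum((d - md) * (i - mi) for d, i in zip(dep, indep))
    variance = sum((i - mi) ** 2 for i in indep)
    return covariance / variance

def simulate_chain(b=0.7, c=1.2, n=50_000, seed=2):
    """Simulate x = ex, y = b*x + ey, z = c*y + ez with independent unit-variance errors."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [b * xi + rng.gauss(0, 1) for xi in x]
    z = [c * yi + rng.gauss(0, 1) for yi in y]
    return x, y, z

x, y, z = simulate_chain()
r_yx, r_zx, r_zy = slope(y, x), slope(z, x), slope(z, y)

b_direct = r_yx                       # b = R_yx
b_via_z = r_zx / r_zy                 # b = R_zx / R_zy
constraint_gap = r_zx - r_yx * r_zy   # should be near zero in the chain model
print(f"b_direct={b_direct:.3f}  b_via_z={b_via_z:.3f}  gap={constraint_gap:.4f}")
```

The second expression for b is just the first one rewritten through the constraint, which is why it does not count as independent support: it leans on the y → z part of the model that the first derivation never uses.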
RELEVANCE:
FORMULATION
Definition 8
Let A be an assumption embodied in model M,
and p a parameter in M. A is said to be relevant
to p if and only if there exists a set of assumptions
S in M such that S and A sustain the identification
of p but S alone does not sustain such
identification.
Theorem 2
An assumption A is relevant to p if and only if A is a
member of a minimal set of assumptions sufficient
for identifying p.
ROBUSTNESS:
FORMULATION
Definition 5 (Degree of over-identification)
A parameter p (of model M) is identified to
degree k (read: k-identified) if there are k
minimal sets of assumptions each yielding a
distinct estimand of p.
ROBUSTNESS:
FORMULATION
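[Diagram: chain x → y → z with coefficients b and c]
Minimal assumption sets for c.
[Diagrams G1, G2, G3: three graphs over x, y, z, each retaining the edge y → z with coefficient c]
Minimal assumption sets for b.
[Diagram: graph over x, y, z retaining the edge x → y with coefficient b]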
FROM MINIMAL ASSUMPTION SETS
TO MAXIMAL EDGE SUPERGRAPHS
FROM PARAMETERS TO CLAIMS
Definition
A claim C is identified to degree k in model M (graph
G), if there are k edge supergraphs of G that permit the
identification of C, each yielding a distinct estimand.
e.g., Claim: (Total effect) TE(x,z) = q
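[Diagrams: two edge supergraphs over x, y, z, each permitting identification of TE(x,z)]
First supergraph: $TE(x,z) = R_{zx}$
Second supergraph: $TE(x,z) = R_{yx} R_{zy \cdot x}$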
FROM MINIMAL ASSUMPTION SETS
TO MAXIMAL EDGE SUPERGRAPHS
FROM PARAMETERS TO CLAIMS
Definition
A claim C is identified to degree k in model M (graph
G), if there are k edge supergraphs of G that permit the
identification of C, each yielding a distinct estimand.
e.g., Claim: (Total effect) TE(x,z) = q
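Nonparametric
[Diagrams: two edge supergraphs over x, y, z, each permitting identification of TE(x,z)]
First supergraph: $TE(x,z) = P(z \mid x)$
Second supergraph: $TE(x,z) = \sum_{y} P(y \mid x) \sum_{x'} P(z \mid x', y) P(x')$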
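The second nonparametric estimand on this slide can be evaluated directly from a joint distribution over (x, y, z). The sketch below is a hypothetical worked example, not from the slides: the binary model, all probability values, and the function names are assumptions. It builds the observed joint from a model with an unobserved u affecting both x and z, applies the summation formula, and compares the result with the effect computed from the full model and with the plain conditional P(z | x).

```python
from itertools import product

# Hypothetical binary model: u -> x, u -> z (u unobserved), x -> y, y -> z.
P_U1 = 0.5
P_X1_GIVEN_U = {0: 0.2, 1: 0.8}
P_Y1_GIVEN_X = {0: 0.1, 1: 0.9}
P_Z1_GIVEN_YU = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}
INDEX = {"x": 0, "y": 1, "z": 2}

def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one

def observed_joint():
    """Joint P(x, y, z) with the unobserved u summed out."""
    joint = {}
    for x, y, z in product((0, 1), repeat=3):
        joint[(x, y, z)] = sum(
            bern(P_U1, u) * bern(P_X1_GIVEN_U[u], x)
            * bern(P_Y1_GIVEN_X[x], y) * bern(P_Z1_GIVEN_YU[(y, u)], z)
            for u in (0, 1)
        )
    return joint

def prob(joint, **fixed):
    """Marginal probability of the event given by keyword values for x, y, z."""
    return sum(p for key, p in joint.items()
               if all(key[INDEX[name]] == val for name, val in fixed.items()))

def summation_estimand(joint, x, z):
    """sum_y P(y|x) * sum_x' P(z|x',y) P(x'), computed from observed data only."""
    total = 0.0
    for y in (0, 1):
        p_y_given_x = prob(joint, x=x, y=y) / prob(joint, x=x)
        inner = sum(
            prob(joint, x=xp, y=y, z=z) / prob(joint, x=xp, y=y) * prob(joint, x=xp)
            for xp in (0, 1)
        )
        total += p_y_given_x * inner
    return total

def effect_from_full_model(x, z):
    """Effect of setting x, computed with access to the unobserved u."""
    return sum(bern(P_U1, u) * bern(P_Y1_GIVEN_X[x], y) * bern(P_Z1_GIVEN_YU[(y, u)], z)
               for u in (0, 1) for y in (0, 1))

joint = observed_joint()
estimand = summation_estimand(joint, x=1, z=1)
truth = effect_from_full_model(x=1, z=1)
conditional = prob(joint, x=1, z=1) / prob(joint, x=1)
print(f"estimand={estimand:.4f}  full model={truth:.4f}  P(z=1|x=1)={conditional:.4f}")
```

In this assumed model the summation estimand matches the effect computed from the full model, while P(z | x) does not, because the first supergraph's assumption of no confounding between x and z is violated here.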
CONCLUSIONS
1. Formal definition of ROBUSTNESS
of causal claims.
2. Graphical criteria and algorithms for
computing the degree of robustness
of a given causal claim.