Incorporating Prior Information in Causal Discovery
Rodney O'Donnell, Jahangir Alam, Bin Han, Kevin Korb and Ann Nicholson
Outline
• Methods for learning causal models
– Data mining, Elicitation, Hybrid approach
• Algorithms for learning causal models
– Constraint based
– Metric based (including our CaMML)
• Incorporating priors into CaMML
– 5 different types of priors
• Experimental Design
• Experimental Results
Learning Causal Bayesian Networks
Elicitation:
– Requires domain knowledge
– Expensive and time-consuming
– Partial knowledge may be insufficient
Data mining:
– Requires a large dataset
– Sometimes the algorithms are "stupid" (no prior knowledge → no common sense)
– Data only tells part of the story
A hybrid approach
• Combine the domain knowledge and the facts learned from data
• Minimize the expert's effort in domain knowledge elicitation
[Diagram: Elicitation + Data Mining → Causal BN]
Objectives
• Enhance the efficiency of the learning process
– Reduce / bias the search space
• Generate different prior specification methods
• Comparatively study the influence of priors on BN structure learning
• Future: apply the methods to the Heart Disease modeling project
Causal learning algorithms
• Constraint based
– Pearl & Verma's algorithm, PC
• Metric based
– MML, MDL, BIC, BDe, K2, K2+MWST, GES, CaMML
• Priors on structure
– Optional vs. Required
– Hard vs. Soft
Priors on structure

Algorithm      | Required | Optional | Hard | Soft
K2 (BNT)       | yes      |          | yes  |
K2+MWST (BNT)  |          | yes      | yes  |
GES (Tetrad)   |          | yes      | yes  |
PC (Tetrad)    |          | yes      | yes  |
CaMML          |          | yes      | yes  | yes
CaMML
• MML metric based
• MML vs. MDL
– MML can be derived from Bayes’ Theorem (Wallace)
– MDL is a non-Bayesian method
• Search: MCMC sampling through TOM space
– TOM = DAG + total ordering
– TOM is finer than DAG
[Diagram: the chain A→B→C has one TOM (ABC); the fork A→B, A→C has two TOMs (ABC and ACB)]
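A TOM pairs a DAG with one of its consistent total orderings, so a DAG's TOM count is the number of its linear extensions. A minimal brute-force sketch of that count (the helper name count_toms is illustrative, not CaMML code):

```python
from itertools import permutations

def count_toms(nodes, arcs):
    """Count total orderings (TOMs) consistent with a DAG's arcs."""
    count = 0
    for order in permutations(nodes):
        pos = {v: i for i, v in enumerate(order)}
        # An ordering is a valid TOM if every parent precedes its child.
        if all(pos[p] < pos[c] for p, c in arcs):
            count += 1
    return count

# Chain A->B->C: one TOM (ABC); fork A->B, A->C: two TOMs (ABC, ACB).
print(count_toms("ABC", {("A", "B"), ("B", "C")}))  # 1
print(count_toms("ABC", {("A", "B"), ("A", "C")}))  # 2
```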
Priors in CaMML: arcs
Experts may provide priors on pairwise relations:
1. Directed arcs
– e.g. {A→B 0.7} (soft)
– e.g. {A→D 1.0} (hard)
2. Undirected arcs
– e.g. {A─C 0.6} (soft)
3. Combinations, e.g. {A→B 0.7; B→A 0.8; A─C 0.6}
– Represented by 2 adjacency matrices
Directed arcs:
      A     B     C
A           0.7
B     0.8
C

Undirected arcs:
      A     B     C
A                 0.6
B
C     0.6
Priors in CaMML: arcs (continued)
[Diagram: expert-specified network with arc priors A→B 0.7, B→A 0.8 and A─C 0.6, alongside one candidate network]
• MML cost for each pair:
– AB: log(0.7) + log(1-0.8)
– AC: log(1-0.6)
– BC: log(default arc prior)
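A rough sketch of this per-pair scoring under the slide's convention, where an arc prior p contributes log(p) when the candidate network agrees with it and log(1-p) when it disagrees. The function name, the data layout, and the omission of the default arc prior for unconstrained pairs are assumptions, not CaMML's actual implementation:

```python
import math

def arc_prior_cost(directed, undirected, arcs):
    """directed: {(x, y): p} priors on x->y; undirected: {frozenset({x, y}): p};
    arcs: set of directed arcs (x, y) in the candidate network."""
    cost = 0.0
    for (x, y), p in directed.items():
        # log(p) if the candidate contains the arc, log(1-p) if it does not.
        cost += math.log(p if (x, y) in arcs else 1 - p)
    for pair, p in undirected.items():
        x, y = tuple(pair)
        adjacent = (x, y) in arcs or (y, x) in arcs
        cost += math.log(p if adjacent else 1 - p)
    # Pairs with no expert prior would use CaMML's default arc prior (omitted).
    return cost

# Slide example: {A->B 0.7; B->A 0.8; A-C 0.6}, candidate containing A->B only:
# AB contributes log(0.7) + log(1-0.8); AC contributes log(1-0.6).
print(arc_prior_cost({("A", "B"): 0.7, ("B", "A"): 0.8},
                     {frozenset("AC"): 0.6},
                     {("A", "B")}))
```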
Priors in CaMML: Tiers
• The expert can provide a prior on an additional pairwise relation
• Tier: temporal ordering of variables
– E.g., Tier {A>>C 0.6; B>>C 0.8}
[Diagram: one possible TOM with ordering A, C, B]
IMML(h) = log(0.6) + log(1-0.8)
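A small sketch of this tier cost, assuming each statement {X >> Y, p} contributes log(p) when X precedes Y in the candidate TOM's ordering and log(1-p) otherwise (names are illustrative):

```python
import math

def tier_cost(tiers, ordering):
    """tiers: {(x, y): p} meaning 'x before y with confidence p';
    ordering: the candidate TOM's total ordering, e.g. 'ACB'."""
    pos = {v: i for i, v in enumerate(ordering)}
    return sum(math.log(p if pos[x] < pos[y] else 1 - p)
               for (x, y), p in tiers.items())

# Slide example: {A>>C 0.6; B>>C 0.8} against the ordering A, C, B:
# A precedes C (log 0.6) but B does not precede C (log(1-0.8)).
print(tier_cost({("A", "C"): 0.6, ("B", "C"): 0.8}, "ACB"))
```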
Priors in CaMML: edPrior
• The expert specifies a single network, plus a confidence
– e.g. EdConf = 0.7
• Prior is based on edit distance from this network
[Diagram: expert-specified network and one candidate network at edit distance ED = 2]
IMML(h) = -2 × (log 0.7 - log(1-0.7))
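A hedged sketch of the edit-distance prior: ED counts the arc additions, deletions and reversals separating the candidate DAG from the expert's, and the slide's cost is -ED × (log conf - log(1-conf)). The symmetric-difference counting below is an assumption about how edits are tallied, not CaMML's code:

```python
import math

def edit_distance(expert_arcs, candidate_arcs):
    """Count arcs present in exactly one network; a reversed arc counts once."""
    counted = set()
    ed = 0
    for x, y in expert_arcs ^ candidate_arcs:  # symmetric difference
        if frozenset((x, y)) in counted:
            continue  # the reversal of this arc was already counted
        counted.add(frozenset((x, y)))
        ed += 1
    return ed

def ed_prior_cost(ed, conf):
    # Slide's formula: IMML(h) = -ED * (log conf - log(1 - conf)).
    return -ed * (math.log(conf) - math.log(1 - conf))

# Slide example: candidate two edits from the expert network, EdConf = 0.7.
print(ed_prior_cost(2, 0.7))  # -2 * (log 0.7 - log 0.3)
```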
Priors in CaMML: KTPrior
• Again, the expert specifies a single network, plus a confidence
– e.g. KTConf = 0.7
• Prior is based on Kendall-Tau edit distance from this network
– KTEditDist = KT + undirected ED
[Diagram: expert-specified DAG with TOM ABC, and a candidate TOM ACB]
• The B-C order in the expert TOM disagrees with the candidate TOM
• KTEditDist = KT (1) + undirected ED (2) = 3
IMML(h) = -3 × (log 0.7 - log(1-0.7))
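A sketch of the distance underlying KTPrior: Kendall-tau counts the variable pairs whose relative order differs between the expert TOM and the candidate TOM, and KTEditDist adds the undirected edit distance (helper names are assumed):

```python
from itertools import combinations
import math

def kendall_tau(order1, order2):
    """Count pairs of variables ordered differently in the two TOMs."""
    pos1 = {v: i for i, v in enumerate(order1)}
    pos2 = {v: i for i, v in enumerate(order2)}
    return sum(1 for x, y in combinations(order1, 2)
               if (pos1[x] < pos1[y]) != (pos2[x] < pos2[y]))

def kt_prior_cost(kt_edit_dist, conf):
    # Slide's formula: IMML(h) = -KTEditDist * (log conf - log(1 - conf)).
    return -kt_edit_dist * (math.log(conf) - math.log(1 - conf))

# Slide example: expert TOM ABC vs. candidate TOM ACB disagree only on the
# B-C pair, so KT = 1; with undirected ED = 2, KTEditDist = 3 and KTConf = 0.7.
print(kendall_tau("ABC", "ACB"))  # 1
print(kt_prior_cost(3, 0.7))      # -3 * (log 0.7 - log 0.3)
```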
Experiment 1: Design
• Prior
– weak, strong
– correct, incorrect
• Size of dataset
– 100, 1000, 10k and 100k
– For each size we randomly generate 30 datasets
• Algorithms:
– CaMML
– K2 (BNT)
– K2+MWST (BNT)
– GES (TETRAD)
– PC (TETRAD)
• Models: AsiaNet, "Model6" (an artificial model)
Models: AsiaNet and "Model6"
[Figures: the AsiaNet and Model6 network structures]
Experimental Design
[Diagram: the experimental design crosses algorithms, priors and sample size]
Experiment Design: Evaluation
• ED: difference between structures
• KL: difference between distributions
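For reference, a toy computation of the KL metric between two discrete distributions; the distributions here are invented for illustration, not the talk's models:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for distributions given as aligned probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(kl_divergence([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))  # ~0.0253 nats
```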
[Results plots: Model6 (1000 samples), Model6 (10k samples), AsiaNet (1000 samples)]
Experiment 1: Results
• With default priors, CaMML is comparable to or outperforms the other algorithms
• With full tiers:
– there is no statistically significant difference between CaMML and K2
– GES is slightly behind; PC performs poorly
• CaMML is the only method allowing soft priors:
– with a prior of 0.7, CaMML is comparable to the other algorithms with full tiers
– with a stronger prior, CaMML performs better
• CaMML performs significantly better with the expert's priors than with uniform priors
Experiment 2: Is CaMML well calibrated?
• Biased prior
– The expert's confidence may not be consistent with the expert's skill (e.g., an expert 0.99 sure, but wrong, about a connection)
– Biased hard prior
– A soft prior and data will eventually overcome the bad prior
Is CaMML well calibrated?
• Question: Does CaMML reward well-calibrated experts?
• Experimental design
– Objective measure: how good is a proposed structure?
• ED: 0-14
– Subjective measure: the expert's confidence
• 0.5 to 0.9999
– How good is the learned structure?
• KL distance
Effect of expert skill and confidence on quality of learned model
[Plot: quality of learned model vs. expert skill (better ← → worse); annotations: "Overconfidence penalized", "Unconfident expert", "Justified confidence rewarded"]
Experiment 2: Results
• CaMML improves on the elicited structure and approaches the true structure
• CaMML improves when the expert's confidence matches the expert's skill
Conclusions
• CaMML is comparable to other algorithms when given equivalent prior knowledge
• CaMML can incorporate more flexible prior knowledge
• CaMML's results improve when the expert is skillful or well calibrated
Thanks