Incorporating Prior Information in Causal Discovery
Rodney O'Donnell, Jahangir Alam, Bin Han, Kevin Korb and Ann Nicholson
Outline
• Methods for learning causal models
– Data mining, Elicitation, Hybrid approach
• Algorithms for learning causal models
– Constraint based
– Metric based (including our CaMML)
• Incorporating priors into CaMML
– 5 different types of priors
• Experimental Design
• Experimental Results
Learning Causal Bayesian Networks
Elicitation:
– Requires domain knowledge
– Expensive and time-consuming
– Partial knowledge may be insufficient
Data mining:
– Requires a large dataset
– Sometimes the algorithms are "stupid" (no prior knowledge → no common sense)
– Data only tells part of the story
A hybrid approach
• Combine the domain knowledge and the facts learned from data
• Minimize the expert's effort in domain knowledge elicitation
[Diagram: Elicitation + Data Mining → Causal BN]
Objectives
• Enhance the efficiency of the learning process
– Reduce / bias the search space
• Generate different prior specification methods
• Comparatively study the influence of priors on BN structure learning
• Future: apply the methods to the Heart Disease modeling project
Causal learning algorithms
• Constraint based
– Pearl & Verma's algorithm, PC
• Metric based
– MML, MDL, BIC, BDe, K2, K2+MWST, GES, CaMML
• Priors on structure
– Optional vs. Required
– Hard vs. Soft
Priors on structure

Algorithm      | Required | Optional | Hard | Soft
K2 (BNT)       | yes      |          | yes  |
K2+MWST (BNT)  |          | yes      | yes  |
GES (Tetrad)   |          | yes      | yes  |
PC (Tetrad)    |          | yes      | yes  |
CaMML          |          | yes      | yes  | yes
CaMML
• MML metric based
• MML vs. MDL
– MML can be derived from Bayes’ Theorem (Wallace)
– MDL is a non-Bayesian method
• Search: MCMC sampling through TOM space
– TOM = DAG + total ordering
– TOM is finer than DAG
[Diagram: the chain A→B→C has one TOM (ABC); the fork A→B, A→C has two TOMs (ABC and ACB)]
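A TOM pairs a DAG with one of its consistent total orderings, so a DAG's TOM count is the number of its linear extensions. A minimal brute-force sketch of that count (the helper name count_toms is illustrative, not CaMML code):

```python
from itertools import permutations

def count_toms(nodes, arcs):
    """Count total orderings (TOMs) consistent with a DAG's arcs."""
    count = 0
    for order in permutations(nodes):
        pos = {v: i for i, v in enumerate(order)}
        # An ordering is a valid TOM if every parent precedes its child.
        if all(pos[p] < pos[c] for p, c in arcs):
            count += 1
    return count

# Chain A->B->C: one TOM (ABC); fork A->B, A->C: two TOMs (ABC, ACB).
print(count_toms("ABC", {("A", "B"), ("B", "C")}))  # 1
print(count_toms("ABC", {("A", "B"), ("A", "C")}))  # 2
```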
Priors in CaMML: arcs
Experts may provide priors on pairwise relations:
1. Directed arcs
– e.g. {A→B 0.7} (soft)
– e.g. {A→D 1.0} (hard)
2. Undirected arcs
– e.g. {A─C 0.6} (soft)
3. Combinations, e.g. {A→B 0.7; B→A 0.8; A─C 0.6}
– Represented by 2 adjacency matrices
Directed arcs:
      A     B     C
A           0.7
B     0.8
C

Undirected arcs:
      A     B     C
A                 0.6
B
C     0.6
Priors in CaMML: arcs (continued)
[Diagram: expert-specified network with arc priors A→B 0.7, B→A 0.8 and A─C 0.6, alongside one candidate network]
• MML cost for each pair:
– AB: log(0.7) + log(1-0.8)
– AC: log(1-0.6)
– BC: log(default arc prior)
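A rough sketch of this per-pair scoring under the slide's convention, where an arc prior p contributes log(p) when the candidate network agrees with it and log(1-p) when it disagrees. The function name, the data layout, and the omission of the default arc prior for unconstrained pairs are assumptions, not CaMML's actual implementation:

```python
import math

def arc_prior_cost(directed, undirected, arcs):
    """directed: {(x, y): p} priors on x->y; undirected: {frozenset({x, y}): p};
    arcs: set of directed arcs (x, y) in the candidate network."""
    cost = 0.0
    for (x, y), p in directed.items():
        # log(p) if the candidate contains the arc, log(1-p) if it does not.
        cost += math.log(p if (x, y) in arcs else 1 - p)
    for pair, p in undirected.items():
        x, y = tuple(pair)
        adjacent = (x, y) in arcs or (y, x) in arcs
        cost += math.log(p if adjacent else 1 - p)
    # Pairs with no expert prior would use CaMML's default arc prior (omitted).
    return cost

# Slide example: {A->B 0.7; B->A 0.8; A-C 0.6}, candidate containing A->B only:
# AB contributes log(0.7) + log(1-0.8); AC contributes log(1-0.6).
print(arc_prior_cost({("A", "B"): 0.7, ("B", "A"): 0.8},
                     {frozenset("AC"): 0.6},
                     {("A", "B")}))
```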
Priors in CaMML: Tiers
• The expert can provide a prior on an additional pairwise relation
• Tier: temporal ordering of variables
– E.g., Tier {A>>C 0.6; B>>C 0.8}
[Diagram: one possible TOM with ordering A, C, B]
IMML(h) = log(0.6) + log(1-0.8)
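A small sketch of this tier cost, assuming each statement {X >> Y, p} contributes log(p) when X precedes Y in the candidate TOM's ordering and log(1-p) otherwise (names are illustrative):

```python
import math

def tier_cost(tiers, ordering):
    """tiers: {(x, y): p} meaning 'x before y with confidence p';
    ordering: the candidate TOM's total ordering, e.g. 'ACB'."""
    pos = {v: i for i, v in enumerate(ordering)}
    return sum(math.log(p if pos[x] < pos[y] else 1 - p)
               for (x, y), p in tiers.items())

# Slide example: {A>>C 0.6; B>>C 0.8} against the ordering A, C, B:
# A precedes C (log 0.6) but B does not precede C (log(1-0.8)).
print(tier_cost({("A", "C"): 0.6, ("B", "C"): 0.8}, "ACB"))
```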
Priors in CaMML: edPrior
• The expert specifies a single network, plus a confidence
– e.g. EdConf = 0.7
• Prior is based on edit distance from this network
[Diagram: expert-specified network and one candidate network at edit distance ED = 2]
IMML(h) = -2 × (log 0.7 - log(1-0.7))
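A hedged sketch of the edit-distance prior: ED counts the arc additions, deletions and reversals separating the candidate DAG from the expert's, and the slide's cost is -ED × (log conf - log(1-conf)). The symmetric-difference counting below is an assumption about how edits are tallied, not CaMML's code:

```python
import math

def edit_distance(expert_arcs, candidate_arcs):
    """Count arcs present in exactly one network; a reversed arc counts once."""
    counted = set()
    ed = 0
    for x, y in expert_arcs ^ candidate_arcs:  # symmetric difference
        if frozenset((x, y)) in counted:
            continue  # the reversal of this arc was already counted
        counted.add(frozenset((x, y)))
        ed += 1
    return ed

def ed_prior_cost(ed, conf):
    # Slide's formula: IMML(h) = -ED * (log conf - log(1 - conf)).
    return -ed * (math.log(conf) - math.log(1 - conf))

# Slide example: candidate two edits from the expert network, EdConf = 0.7.
print(ed_prior_cost(2, 0.7))  # -2 * (log 0.7 - log 0.3)
```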
Priors in CaMML: KTPrior
• Again, the expert specifies a single network, plus a confidence
– e.g. KTConf = 0.7
• Prior is based on Kendall-Tau edit distance from this network
– KTEditDist = KT + undirected ED
[Diagram: expert-specified DAG with TOM ABC, and a candidate TOM ACB]
• The B-C order in the expert TOM disagrees with the candidate TOM
• KTEditDist = KT (1) + undirected ED (2) = 3
IMML(h) = -3 × (log 0.7 - log(1-0.7))
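A sketch of the distance underlying KTPrior: Kendall-tau counts the variable pairs whose relative order differs between the expert TOM and the candidate TOM, and KTEditDist adds the undirected edit distance (helper names are assumed):

```python
from itertools import combinations
import math

def kendall_tau(order1, order2):
    """Count pairs of variables ordered differently in the two TOMs."""
    pos1 = {v: i for i, v in enumerate(order1)}
    pos2 = {v: i for i, v in enumerate(order2)}
    return sum(1 for x, y in combinations(order1, 2)
               if (pos1[x] < pos1[y]) != (pos2[x] < pos2[y]))

def kt_prior_cost(kt_edit_dist, conf):
    # Slide's formula: IMML(h) = -KTEditDist * (log conf - log(1 - conf)).
    return -kt_edit_dist * (math.log(conf) - math.log(1 - conf))

# Slide example: expert TOM ABC vs. candidate TOM ACB disagree only on the
# B-C pair, so KT = 1; with undirected ED = 2, KTEditDist = 3 and KTConf = 0.7.
print(kendall_tau("ABC", "ACB"))  # 1
print(kt_prior_cost(3, 0.7))      # -3 * (log 0.7 - log 0.3)
```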
Experiment 1: Design
• Prior
– weak, strong
– correct, incorrect
• Size of dataset
– 100, 1000, 10k and 100k
– For each size we randomly generate 30 datasets
• Algorithms:
– CaMML
– K2 (BNT)
– K2+MWST (BNT)
– GES (TETRAD)
– PC (TETRAD)
• Models: AsiaNet, "Model6" (an artificial model)
Models: AsiaNet and "Model6"
[Figures: the AsiaNet and Model6 network structures]
Experimental Design
[Diagram: the experimental design crosses algorithms, priors and sample size]
Experiment Design: Evaluation
• ED: difference between structures
• KL: difference between distributions
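For reference, a toy computation of the KL metric between two discrete distributions; the distributions here are invented for illustration, not the talk's models:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for distributions given as aligned probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(kl_divergence([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))  # ~0.0253 nats
```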
[Results plots: Model6 (1000 samples), Model6 (10k samples), AsiaNet (1000 samples)]
Experiment 1: Results
• With default priors, CaMML is comparable to or outperforms the other algorithms
• With full tiers:
– there is no statistically significant difference between CaMML and K2
– GES is slightly behind; PC performs poorly
• CaMML is the only method allowing soft priors:
– with a prior of 0.7, CaMML is comparable to the other algorithms with full tiers
– with a stronger prior, CaMML performs better
• CaMML performs significantly better with the expert's priors than with uniform priors
Experiment 2: Is CaMML well calibrated?
• Biased prior
– The expert's confidence may not be consistent with the expert's skill (e.g., an expert 0.99 sure, but wrong, about a connection)
– Biased hard prior
– A soft prior and data will eventually overcome the bad prior
Is CaMML well calibrated?
• Question: Does CaMML reward well-calibrated experts?
• Experimental design
– Objective measure: how good is a proposed structure?
• ED: 0-14
– Subjective measure: the expert's confidence
• 0.5 to 0.9999
– How good is the learned structure?
• KL distance
Effect of expert skill and confidence on quality of learned model
[Plot: quality of learned model vs. expert skill (better ← → worse); annotations: "Overconfidence penalized", "Unconfident expert", "Justified confidence rewarded"]
Experiment 2: Results
• CaMML improves on the elicited structure and approaches the true structure
• CaMML improves when the expert's confidence matches the expert's skill
Conclusions
• CaMML is comparable to other algorithms when given equivalent prior knowledge
• CaMML can incorporate more flexible prior knowledge
• CaMML's results improve when the expert is skillful or well calibrated
Thanks