Lecture 26 of 42
Inference and Software Tools 2
Discussion: MP5, Hugin, BNJ; Term Projects
Friday, 31 October 2008
William H. Hsu
Department of Computing and Information Sciences, KSU
KSOL course page: http://snipurl.com/v9v3
Course web site: http://www.kddresearch.org/Courses/Fall-2008/CIS730
Instructor home page: http://www.cis.ksu.edu/~bhsu
Reading for Next Class:
Chapter 15, Russell & Norvig 2nd edition
Lecture Outline
Wednesday’s Reading: Sections 14.6 – 14.8, R&N 2e, Chapter 15
Friday: Sections 18.1 – 18.2, R&N 2e
Today: Inference in Graphical Models
Bayesian networks and causality
Inference and learning
BNJ interface (http://bnj.sourceforge.net)
Causality
Bayes Net Application:
Car Diagnosis
© 2003 D. Lin, University of Alberta
CS 366, Introduction to Artificial Intelligence, http://www.cs.ualberta.ca/~lindek/366/
Bayes Net Application:
Interpreting Mammograms
© 2003 D. Lin, University of Alberta
CS 366, Introduction to Artificial Intelligence, http://www.cs.ualberta.ca/~lindek/366/
Bayes Net Application:
Forecasting Oil Prices (ARCO1)
© 2003 D. Lin, University of Alberta
CS 366, Introduction to Artificial Intelligence, http://www.cs.ualberta.ca/~lindek/366/
Bayes Net Application:
Forecasting Oil Prices – Real Network
© 2003 D. Lin, University of Alberta
CS 366, Introduction to Artificial Intelligence, http://www.cs.ualberta.ca/~lindek/366/
www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
Bayesian Network Inference
Inference: calculating P(X | Y) for some variables or sets of variables X and Y.
Inference in Bayesian networks is #P-hard!
Inputs: prior probabilities of .5
[Figure: inputs I1, I2, I3, I4, I5 reduce to a single output node O.]
How many satisfying assignments? P(O) must be (# satisfying assignments) × (0.5 ^ # inputs).
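To make this concrete, here is a minimal sketch with a made-up Boolean formula phi standing in for the network between the inputs and O (the formula is not from the slide); it checks that brute-force P(O = true) equals the counting expression above:

```python
from itertools import product

def phi(i1, i2, i3, i4, i5):
    """Hypothetical Boolean formula; the deterministic output node O computes O = phi(inputs)."""
    return (i1 or i2) and (i3 or not i4) and i5

n = 5
assignments = list(product([False, True], repeat=n))

# Brute-force inference: P(O = true) = sum over input assignments of P(assignment) when O is true
p_o = sum(0.5 ** n for a in assignments if phi(*a))

# Counting view from the slide: the same quantity is (# satisfying assignments) * (0.5 ^ # inputs)
num_sat = sum(1 for a in assignments if phi(*a))
assert abs(p_o - num_sat * 0.5 ** n) < 1e-12
print(num_sat, p_o)
```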
Bayesian Network Inference
But…inference is still tractable in some cases.
Let’s look at a special class of networks: trees / forests in which each node has at most one parent.
Decomposing the probabilities
Suppose we want P(Xi | E ) where E is some set of evidence variables.
Let’s split E into two parts:
E_i^-: the part consisting of assignments to variables in the subtree rooted at X_i
E_i^+: the rest of it
[Figure: tree with the subtree rooted at X_i highlighted.]
Decomposing the probabilities
$$P(X_i \mid E) = P(X_i \mid E_i^-, E_i^+) = \frac{P(E_i^- \mid X_i, E_i^+)\, P(X_i \mid E_i^+)}{P(E_i^- \mid E_i^+)} = \frac{P(E_i^- \mid X_i)\, P(X_i \mid E_i^+)}{P(E_i^- \mid E_i^+)} = \alpha\, \pi(X_i)\, \lambda(X_i)$$
(The second step is Bayes' rule; the third uses the fact that X_i d-separates E_i^- from E_i^+.)
Where:
• α is a constant independent of X_i
• π(X_i) = P(X_i | E_i^+)
• λ(X_i) = P(E_i^- | X_i)
Using the decomposition for inference
We can use this decomposition to do inference as follows. First,
compute l(Xi) = P(Ei-| Xi) for all Xi recursively, using the leaves of the tree
as the base case.
If Xi is a leaf:
If Xi is in E : l(Xi) = 0 if Xi matches E, 1 otherwise
If Xi is not in E : Ei- is the null set, so
P(Ei-| Xi) = 1 (constant)
Quick aside: “Virtual evidence”
For theoretical simplicity, but without loss of generality, let’s assume that all
variables in E (the evidence set) are leaves in the tree.
Why can we do this WLOG:
Observing X_i directly is equivalent to adding a new leaf child X_i' of X_i, with P(X_i' | X_i) = 1 if X_i' = X_i and 0 otherwise, and observing X_i' instead.
[Figure: observe X_i vs. the equivalent network with dummy leaf child X_i' observed.]
Calculating λ(X_i) for non-leaves
Suppose X_i has one child, X_j. Then:
$$\lambda(X_i) = P(E_i^- \mid X_i) = \sum_{X_j} P(E_i^-, X_j \mid X_i) = \sum_{X_j} P(X_j \mid X_i)\, P(E_i^- \mid X_i, X_j) = \sum_{X_j} P(X_j \mid X_i)\, P(E_i^- \mid X_j) = \sum_{X_j} P(X_j \mid X_i)\, \lambda(X_j)$$
Calculating λ(X_i) for non-leaves
Now, suppose X_i has a set of children, C. Since X_i d-separates each of its subtrees, the contribution of each subtree to λ(X_i) is independent:
$$\lambda(X_i) = P(E_i^- \mid X_i) = \prod_{X_j \in C} \lambda_j(X_i), \qquad \lambda_j(X_i) = \sum_{X_j} P(X_j \mid X_i)\, \lambda(X_j)$$
where λ_j(X_i) is the contribution to P(E_i^- | X_i) of the part of the evidence lying in the subtree rooted at X_i's child X_j.
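A minimal sketch of this λ recursion on a rooted tree; the data structure (tree, values, children, cpt) and the evidence format are illustrative, not from BNJ or the source slides:

```python
def lam(node, tree, evidence):
    """Compute lambda(X_i) = P(E_i^- | X_i) as a dict {value: probability}.

    tree[node] = {"values": [...], "children": [...],
                  "cpt": {child: {(child_value, node_value): P(child_value | node_value)}}}
    evidence maps observed leaf nodes to their observed value.
    """
    values = tree[node]["values"]
    children = tree[node]["children"]
    if not children:                                   # leaf: base case
        if node in evidence:
            return {v: (1.0 if v == evidence[node] else 0.0) for v in values}
        return {v: 1.0 for v in values}                # E_i^- is empty
    result = {v: 1.0 for v in values}
    for child in children:                             # subtrees contribute independently
        lam_child = lam(child, tree, evidence)
        cpt = tree[node]["cpt"][child]
        for v in values:
            result[v] *= sum(cpt[(cv, v)] * lam_child[cv] for cv in tree[child]["values"])
    return result

# Tiny illustrative tree: root R with a single observed child C.
tree = {
    "R": {"values": [True, False], "children": ["C"],
          "cpt": {"C": {(True, True): 0.9, (False, True): 0.1,
                        (True, False): 0.2, (False, False): 0.8}}},
    "C": {"values": [True, False], "children": [], "cpt": {}},
}
print(lam("R", tree, evidence={"C": True}))            # {True: 0.9, False: 0.2}
```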
We are now λ-happy
So now we have a way to recursively compute all the λ(X_i)'s, starting from the root and using the leaves as the base case.
If we want, we can think of each node in the network as an autonomous processor that passes a little “λ message” to its parent.
[Figure: λ messages flowing from children to parents up the tree.]
The other half of the problem
Remember, P(X_i | E) = α π(X_i) λ(X_i). Now that we have all the λ(X_i)'s, what about the π(X_i)'s?
π(X_i) = P(X_i | E_i^+).
What about the root of the tree, X_r? In that case, E_r^+ is the null set, so π(X_r) = P(X_r). No sweat. Since we also know λ(X_r), we can compute the final P(X_r | E).
So for an arbitrary X_i with parent X_p, let's inductively assume we know π(X_p) and/or P(X_p | E). How do we get π(X_i)?
Computing π(X_i)
$$\pi(X_i) = P(X_i \mid E_i^+) = \sum_{X_p} P(X_i, X_p \mid E_i^+) = \sum_{X_p} P(X_i \mid X_p, E_i^+)\, P(X_p \mid E_i^+) = \sum_{X_p} P(X_i \mid X_p)\, P(X_p \mid E_i^+) = \sum_{X_p} P(X_i \mid X_p)\, \frac{P(X_p \mid E)}{\lambda_i(X_p)} = \sum_{X_p} P(X_i \mid X_p)\, \pi_i(X_p)$$
where π_i(X_p) is defined as P(X_p | E) / λ_i(X_p).
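A matching sketch of the π step, assuming the parent's posterior P(X_p | E) and the λ message that X_i previously sent to X_p are already available (all names are illustrative):

```python
def pi_message(xi_values, xp_values, cpt_xi_given_xp, posterior_xp, lam_from_xi):
    """pi(X_i) = sum over x_p of P(X_i | x_p) * P(x_p | E) / lambda_i(x_p).

    cpt_xi_given_xp[(xi, xp)] = P(X_i = xi | X_p = xp)
    posterior_xp[xp]          = P(X_p = xp | E), already computed for the parent
    lam_from_xi[xp]           = the lambda message X_i previously sent to its parent X_p
    """
    pi_i_of_xp = {}
    for xp in xp_values:
        # If the lambda message is 0, the parent's posterior is 0 for that value as well,
        # so the term contributes nothing.
        pi_i_of_xp[xp] = posterior_xp[xp] / lam_from_xi[xp] if lam_from_xi[xp] > 0 else 0.0
    return {xi: sum(cpt_xi_given_xp[(xi, xp)] * pi_i_of_xp[xp] for xp in xp_values)
            for xi in xi_values}
```

Combining π(X_i) with λ(X_i) and normalizing over the values of X_i then gives P(X_i | E) = α π(X_i) λ(X_i), as above.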
We’re done. Yay!
Thus we can compute all the π(X_i)'s, and, in turn, all the P(X_i | E)'s.
Can think of nodes as autonomous processors passing λ and π messages to their neighbors.
[Figure: λ messages passed up the tree and π messages passed down it.]
Conjunctive queries
What if we want, e.g., P(A, B | C) instead of just marginal
distributions P(A | C) and P(B | C)?
Just use chain rule:
P(A, B | C) = P(A | C) P(B | A, C)
Each of the latter probabilities can be computed using the
technique just discussed.
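For example, a three-variable conjunctive query factors the same way, and each factor is again a single-variable query with an enlarged evidence set:

$$P(A, B, C \mid D) = P(A \mid D)\, P(B \mid A, D)\, P(C \mid A, B, D)$$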
Polytrees
Technique can be generalized to polytrees: undirected versions of the graphs are
still trees, but nodes can have more than one parent
Dealing with cycles
Can deal with undirected cycles in the graph by clustering variables together (e.g., merging B and C into a single compound node BC) or by conditioning (instantiating a variable such as A to each of its values in turn: set to 0, set to 1).
[Figure: a network with an undirected cycle through A, B, C, D, shown clustered (B and C merged into BC) and conditioned (A set to 0 and to 1).]
Join trees
Arbitrary Bayesian network can be transformed via some evil graph-theoretic
magic into a join tree in which a similar method can be employed.
[Figure: an example network over nodes A through G transformed into a join tree with clique nodes such as ABC, BCD, and DF.]
In the worst case the join-tree nodes must take on exponentially many combinations of values, but the method often works well in practice.
BNJ Visualization:
Pseudo-Code Annotation (Code Page)
© 2004 KSU BNJ Development Team
Graphical Models Overview [1]:
Bayesian Networks
Conditional Independence
X is conditionally independent (CI) from Y given Z (sometimes written X ⊥ Y | Z) iff P(X | Y, Z) = P(X | Z) for all values of X, Y, and Z
Example: P(Thunder | Rain, Lightning) = P(Thunder | Lightning), i.e., T ⊥ R | L
Bayesian (Belief) Network
Acyclic directed graph model B = (V, E, Θ) representing CI assertions over the variables
Vertices (nodes) V: denote events (each a random variable)
Edges (arcs, links) E: denote conditional dependencies
Markov Condition for BBNs (Chain Rule):
$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{parents}(X_i)\big)$$
Example BBN:
[Figure: example network over X1 = Age, X2 = Gender, X3 = Exposure-To-Toxins, X4 = Smoking, X5 = Cancer, X6 = Serum Calcium, X7 = Lung Tumor, with the parents, descendants, and non-descendants of the Cancer node marked.]
P(20s, Female, Low, Non-Smoker, No-Cancer, Negative, Negative)
= P(T) · P(F) · P(L | T) · P(N | T, F) · P(N | L, N) · P(N | N) · P(N | N)
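A minimal sketch of the Markov condition in use; the two-node network and its CPT values below are invented for illustration and are not the slide's seven-node example:

```python
# Chain rule for a BBN: P(x_1, ..., x_n) = product over i of P(x_i | parents(x_i)).
# Tiny illustrative network, Smoking -> Cancer; the CPT numbers are made up.
parents = {"Smoking": (), "Cancer": ("Smoking",)}
cpts = {
    "Smoking": {(): {"yes": 0.3, "no": 0.7}},
    "Cancer":  {("yes",): {"yes": 0.05, "no": 0.95},
                ("no",):  {"yes": 0.01, "no": 0.99}},
}

def joint(assignment):
    """P(assignment) for a full assignment of all variables, via the Markov condition."""
    p = 1.0
    for var, val in assignment.items():
        parent_vals = tuple(assignment[q] for q in parents[var])
        p *= cpts[var][parent_vals][val]
    return p

print(joint({"Smoking": "no", "Cancer": "no"}))   # 0.7 * 0.99 = 0.693
```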
Graphical Models Overview [3]:
Inference Problem
Multiply-connected case: exact, approximate inference are #P-complete
Adapted from slides by S. Russell, UC Berkeley
http://aima.cs.berkeley.edu/
Tree Dependent Distributions
Polytrees
aka singly-connected Bayesian networks
Definition: a Bayesian network with no undirected loops
Idea: restrict distributions (CPTs) to single nodes
Theorem: inference in singly-connected BBN requires linear time
• Linear in network size, including CPT sizes
• Much better than for unrestricted (multiply-connected) BBNs
Tree Dependent Distributions
Further restriction of polytrees: every node has at most one parent
Now only need to keep 1 prior, P(root), and n - 1 CPTs (1 per node)
All CPTs are 2-dimensional: P(child | parent)
Independence Assumptions
As for a general BBN: x is independent of its non-descendants given its (single) parent z
Very strong assumption (applies in some domains but not most)
[Figure: tree rooted at a designated node; node x has single parent z.]
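A quick sketch of the parameter count this restriction buys; the node count and arity below are made-up illustration values:

```python
# Parameters needed for a tree-dependent distribution over n nodes of arity k:
# one root prior (k - 1 free parameters) plus n - 1 two-dimensional CPTs
# P(child | parent), each with k * (k - 1) free parameters.
def tree_param_count(n, k):
    return (k - 1) + (n - 1) * k * (k - 1)

n, k = 20, 2
print(tree_param_count(n, k))   # 39 parameters for the tree-structured model
print(k ** n - 1)               # 1048575 parameters for the unrestricted joint
```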
Propagation Algorithm in Singly-Connected
Bayesian Networks – Pearl (1983)
Upward (child-to-parent) λ messages: λ'(C_i') is modified during the λ message-passing phase
Downward π messages: P'(C_i') is computed during the π message-passing phase
[Figure: singly-connected network of clusters C1 … C6 exchanging λ and π messages.]
Multiply-connected case: exact, approximate inference are #P-complete
(counting problem is #P-complete iff decision problem is NP-complete)
Adapted from Neapolitan (1990), Guo (2000)
Inference by Clustering [1]: Graph Operations
(Moralization, Triangulation, Maximal Cliques)
[Figure: an example Bayesian network (acyclic digraph) over nodes A1, B2, E3, C4, G5, F6, H7, D8 is first moralized, then triangulated, and its maximal cliques Clq1 … Clq6 are identified.]
Adapted from Neapolitan (1990), Guo (2000)
Inference by Clustering [2]:
Junction Tree – Lauritzen & Spiegelhalter (1988)
Input: list of cliques of triangulated, moralized graph G_u
Output: tree of cliques, with separator nodes S_i, residual nodes R_i, and potential probability Ψ(Clq_i) for all cliques
Algorithm:
1. S_i = Clq_i ∩ (Clq_1 ∪ Clq_2 ∪ … ∪ Clq_{i-1})
2. R_i = Clq_i − S_i
3. If i > 1 then identify a j < i such that Clq_j is a parent of Clq_i
4. Assign each node v to a unique clique Clq_i such that {v} ∪ c(v) ⊆ Clq_i
5. Compute Ψ(Clq_i) = ∏_{f(v) = Clq_i} P(v | c(v)), or 1 if no v is assigned to Clq_i
6. Store Clq_i, R_i, S_i, and Ψ(Clq_i) at each vertex in the tree of cliques
Adapted from Neapolitan (1990), Guo (2000)
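A minimal sketch of steps 1-3 on an ordered clique list; the clique sets below are the ones from the next slide's example, while the function and field names are illustrative:

```python
def build_clique_tree(cliques):
    """Separators S_i, residuals R_i, and a parent clique for an ordered clique list.

    Relies on the running-intersection property of cliques from a triangulated graph,
    which guarantees that S_i is contained in some earlier clique.
    """
    seen = set()
    tree = []
    for i, clq in enumerate(cliques):
        s = clq & seen                 # step 1: S_i = Clq_i intersect (Clq_1 U ... U Clq_{i-1})
        r = clq - s                    # step 2: R_i = Clq_i - S_i
        parent = next((j for j in range(i) if s <= cliques[j]), None)   # step 3
        tree.append({"clique": clq, "S": s, "R": r, "parent": parent})
        seen |= clq
    return tree

cliques = [{"A", "B"}, {"B", "E", "C"}, {"E", "C", "G"},
           {"E", "G", "F"}, {"C", "G", "H"}, {"C", "D"}]
for entry in build_clique_tree(cliques):
    print(entry)
```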
Inference by Clustering [3]:
Clique-Tree Operations
R_i: residual nodes
S_i: separator nodes
Ψ(Clq_i): potential probability of clique i

Clq1 = {A, B}: R1 = {A, B}, S1 = {}, Ψ(Clq1) = P(B | A) P(A)
Clq2 = {B, E, C}: R2 = {C, E}, S2 = {B}, Ψ(Clq2) = P(C | B, E)
Clq3 = {E, C, G}: R3 = {G}, S3 = {E, C}, Ψ(Clq3) = 1
Clq4 = {E, G, F}: R4 = {F}, S4 = {E, G}, Ψ(Clq4) = P(E | F) P(G | F) P(F)
Clq5 = {C, G, H}: R5 = {H}, S5 = {C, G}, Ψ(Clq5) = P(H | C, G)
Clq6 = {C, D}: R6 = {D}, S6 = {C}, Ψ(Clq6) = P(D | C)

[Figure: the corresponding tree of cliques, with separator sets labeling the edges between cliques.]

Adapted from Neapolitan (1990), Guo (2000)
Inference by Loop Cutset Conditioning
Split vertex in undirected cycle; condition upon each of its state values
[Figure: the Age node X1 of the example cancer network split into instances X1,1 (Age = [0, 10)), X1,2 (Age = [10, 20)), …, X1,10 (Age = [100, ∞)); the other nodes (X2 Gender, X3 Exposure-To-Toxins, X4 Smoking, X5 Cancer, X6 Serum Calcium, X7 Lung Tumor) are unchanged.]
Number of network instantiations: product of arity of nodes in minimal loop cutset
Posterior: marginal conditioned upon cutset variable values
Deciding optimal cutset: NP-hard
Current open problems:
Bounded cutset conditioning: ordering heuristics
Finding randomized algorithms for loop cutset optimization
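A minimal sketch of the conditioning loop itself, assuming a hypothetical polytree_query(network, evidence, query_var) routine that returns both P(query_var | evidence) and the evidence likelihood P(evidence) for a singly-connected instantiation; all names here are illustrative:

```python
from itertools import product

def cutset_conditioning(network, cutset, query_var, evidence, polytree_query):
    """P(query_var | evidence) = sum over cutset instantiations c of
    P(query_var | evidence, c) * P(c | evidence).

    cutset: dict {variable: list of values}; fixing these variables breaks every undirected
    loop, so each instantiation can be handled by singly-connected (polytree) inference.
    """
    weighted = {}
    total = 0.0
    for combo in product(*cutset.values()):            # one polytree run per instantiation
        c = dict(zip(cutset.keys(), combo))
        conditional, likelihood = polytree_query(network, {**evidence, **c}, query_var)
        for value, p in conditional.items():           # weight by P(evidence, c)
            weighted[value] = weighted.get(value, 0.0) + p * likelihood
        total += likelihood                            # accumulates P(evidence)
    return {value: p / total for value, p in weighted.items()}
```

The number of iterations is the product of the arities of the cutset nodes, as noted above.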
BNJ Visualization [2]
Pseudo-Code Annotation (Code Page)
ALARM Network
© 2004 KSU BNJ Development Team
BNJ Visualization [3]:
Poker Network
© 2004 KSU BNJ Development Team
Inference by Variable Elimination [1]:
Intuition
Adapted from slides by S. Russell, UC Berkeley
http://aima.cs.berkeley.edu/
Inference by Variable Elimination [2]:
Factoring Operations
Adapted from slides by S. Russell, UC Berkeley
http://aima.cs.berkeley.edu/
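The slide's figures (from the AIMA slides) are not reproduced here; as a hedged sketch, the two factoring operations that variable elimination is built on, pointwise product and summing out, might look like this (the factor representation and names are illustrative, not AIMA's or BNJ's code):

```python
from itertools import product

# A factor is a pair (variables, table), where table maps a tuple of values
# (one per variable, in order) to a number.

def pointwise_product(f, g, domains):
    """Multiply two factors into a factor over the union of their variables."""
    f_vars, f_tab = f
    g_vars, g_tab = g
    out_vars = list(dict.fromkeys(f_vars + g_vars))    # union, preserving order
    out_tab = {}
    for combo in product(*(domains[v] for v in out_vars)):
        row = dict(zip(out_vars, combo))
        out_tab[combo] = (f_tab[tuple(row[v] for v in f_vars)]
                          * g_tab[tuple(row[v] for v in g_vars)])
    return out_vars, out_tab

def sum_out(var, f):
    """Eliminate a variable from a factor by summing over its values."""
    f_vars, f_tab = f
    out_vars = [v for v in f_vars if v != var]
    out_tab = {}
    for combo, value in f_tab.items():
        key = tuple(c for v, c in zip(f_vars, combo) if v != var)
        out_tab[key] = out_tab.get(key, 0.0) + value
    return out_vars, out_tab
```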
Genetic Algorithms for Parameter Tuning in
Bayesian Network Structure Learning
[Figure: genetic wrapper for change of representation and inductive bias control. D (training data; Dtrain for inductive learning, Dval for inference) and Ie (inference specification) feed [2], a representation evaluator for learning problems, which returns representation fitness f(α) for each candidate representation α produced by [1], a genetic algorithm; the output is an optimized representation α̂.]
Tools for Building Graphical Models
Commercial Tools: Ergo, Netica, TETRAD, Hugin
Bayes Net Toolbox (BNT) – Murphy (1997-present)
Distribution page
http://http.cs.berkeley.edu/~murphyk/Bayes/bnt.html
Development group
http://groups.yahoo.com/group/BayesNetToolbox
Bayesian Network tools in Java (BNJ) – Hsu et al. (1999-present)
Distribution page
http://bnj.sourceforge.net
Development group
http://groups.yahoo.com/group/bndev
Current (re)implementation projects for KSU KDD Lab
• Continuous state: Minka (2002) – Hsu, Guo, Li
• Formats: XML BNIF (MSBN), Netica – Barber, Guo
• Space-efficient DBN inference – Meyer
• Bounded cutset conditioning – Chandak
References [1]:
Graphical Models and Inference Algorithms
Graphical Models
Bayesian (Belief) Networks tutorial – Murphy (2001)
http://www.cs.berkeley.edu/~murphyk/Bayes/bayes.html
Learning Bayesian Networks – Heckerman (1996, 1999)
http://research.microsoft.com/~heckerman
Inference Algorithms
Junction Tree (Join Tree, L-S, Hugin): Lauritzen & Spiegelhalter (1988)
http://citeseer.nj.nec.com/huang94inference.html
(Bounded) Loop Cutset Conditioning: Horvitz & Cooper (1989)
http://citeseer.nj.nec.com/shachter94global.html
Variable Elimination (Bucket Elimination, ElimBel): Dechter (1996)
http://citeseer.nj.nec.com/dechter96bucket.html
Recommended Books
• Neapolitan (1990) – out of print; see Pearl (1988), Jensen (2001)
• Castillo, Gutierrez, Hadi (1997)
• Cowell, Dawid, Lauritzen, Spiegelhalter (1999)
Stochastic Approximation
http://citeseer.nj.nec.com/cheng00aisbn.html
References [2]:
Machine Learning, KDD, and Bioinformatics
Machine Learning, Data Mining, and Knowledge Discovery
K-State KDD Lab: literature survey and resource catalog (1999-present)
http://www.kddresearch.org/Resources
Bayesian Network tools in Java (BNJ): Hsu, Barber, King, Meyer,
Thornton (2002-present)
http://bnj.sourceforge.net
Machine Learning in Java (MLJ): Hsu, Louis, Plummer (2002)
http://mldev.sourceforge.net
Bioinformatics
European Bioinformatics Institute Tutorial: Brazma et al. (2001)
http://www.ebi.ac.uk/microarray/biology_intro.htm
Hebrew University: Friedman, Pe’er, et al. (1999, 2000, 2002)
http://www.cs.huji.ac.il/labs/compbio/
K-State BMI Group: literature survey and resource catalog (2002-2005)
http://www.kddresearch.org/Groups/Bioinformatics
Terminology
Introduction to Reasoning under Uncertainty
Probability foundations
Definitions: subjectivist, frequentist, logicist
(3) Kolmogorov axioms
Bayes’s Theorem
Prior probability of an event
Joint probability of an event
Conditional (posterior) probability of an event
Maximum A Posteriori (MAP) and Maximum Likelihood (ML) Hypotheses
MAP hypothesis: highest conditional probability given observations (data)
ML: highest likelihood of generating the observed data
ML estimation (MLE): estimating parameters to find ML hypothesis
Bayesian Inference: Computing Conditional Probabilities (CPs) in A Model
Bayesian Learning: Searching Model (Hypothesis) Space using CPs
Summary Points
Introduction to Probabilistic Reasoning
Framework: using probabilistic criteria to search H
Probability foundations
Definitions: subjectivist, objectivist; Bayesian, frequentist, logicist
Kolmogorov axioms
Bayes’s Theorem
Definition of conditional (posterior) probability
Product rule
Maximum A Posteriori (MAP) and Maximum Likelihood (ML) Hypotheses
Bayes’s Rule and MAP
Uniform priors: allow use of MLE to generate MAP hypotheses
Relation to version spaces, candidate elimination
Next Week: Chapter 14, Russell and Norvig
Later: Bayesian learning: MDL, BOC, Gibbs, Simple (Naïve) Bayes
Categorizing text and documents, other applications