Bayesian Networks
VISA
Hyoungjune Yi

BN – Intro.
– Introduced by Pearl (1986)
– Resembles human reasoning
– Captures causal relationships
– Used in decision support systems / expert systems

Common Sense Reasoning about Uncertainty
– June is waiting for Larry and Jacobs, who are both late for the VISA seminar.
– June is worried that if the roads are icy, one or both of them may have crashed his car.
– Suddenly June learns that Larry has crashed.
– June thinks: "If Larry has crashed, then probably the roads are icy. So Jacobs has also crashed."
– June then learns that it is warm outside and the roads are salted.
– June thinks: "Larry was unlucky; Jacobs should still make it."

Causal Relationships
– State of Road: icy / not icy
– Larry: crash / no crash
– Jacobs: crash / no crash

Information Flow: Larry Crashed!
[Figure: evidence "Larry: crash" flows up to "State of Road: icy / not icy" and down to "Jacobs: crash / no crash"]

Information Flow: But the Roads Are Dry
[Figure: with "State of Road: not icy" observed, "Larry: crash" no longer influences "Jacobs: crash / no crash"]

Wet Grass
– To avoid icy roads, Larry moves to UCLA; Jacobs moves to USC.
– One morning as Larry leaves for work, he notices that his grass is wet. He wonders whether he has left his sprinkler on or whether it has rained.
– Glancing over at Jacobs' lawn, he notices that it is also wet.
– Larry thinks: "Since Jacobs' lawn is wet, it probably rained last night."
– Larry then thinks: "If it rained, then that explains why my lawn is wet, so probably the sprinkler is off."

Information Flow: Larry's Grass Is Wet
[Figure: Sprinkler (on/off) and Rain (yes/no) are parents of "Larry's grass: wet"; Rain is also the parent of "Jacobs' grass: wet/dry"]

Information Flow: Jacobs' Grass Is Also Wet
[Figure: observing "Jacobs' grass: wet" raises belief in Rain, which explains away the Sprinkler as the cause of "Larry's grass: wet"]

Bayesian Network
– A data structure that represents the dependencies between variables
– Gives a concise specification of the joint probability distribution
– A Bayesian belief network is a graph in which:
  – Nodes are a set of random variables
  – Each node has a conditional probability table (CPT)
  – Edges denote conditional dependencies
  – The graph is a DAG: no directed cycles
  – The Markov condition holds

Markov Assumption
– Each random variable X is independent of its non-descendants given its parents Pa(X)
– Formally: Ind(X; NonDesc(X) | Pa(X)) when G is an I-Map of P (I-Maps are defined below)
[Figure: node X with parents Y1 and Y2]

Markov Assumption – Example
[Figure: Earthquake → Radio; Earthquake → Alarm ← Burglary; Alarm → Call]
In this example:
– Ind(E; B)
– Ind(B; E, R)
– Ind(R; A, B, C | E)
– Ind(A; R | B, E)
– Ind(C; B, E, R | A)

I-Maps
– A DAG G is an I-Map of a distribution P if all the Markov assumptions implied by G are satisfied by P
[Figure: examples X → Y and the disconnected pair X, Y]
– G is a minimal I-Map of P iff G is an I-Map of P, and no subgraph G' ⊂ G is an I-Map of P
– A minimal I-Map is not unique

Factorization
– Given that G is an I-Map of P, can we simplify the representation of P?
– Example: take disconnected X and Y. Since Ind(X; Y), we have P(X|Y) = P(X). Applying the chain rule:
  P(X,Y) = P(X|Y) P(Y) = P(X) P(Y)
– Thus we have a simpler representation of P(X,Y).

Factorization Theorem
Thm: if G is an I-Map of P, then
  P(X1, ..., Xn) = Π_i P(Xi | Pa(Xi))
For the Earthquake/Burglary network, the chain rule gives
  P(C,A,R,E,B) = P(B) P(E|B) P(R|E,B) P(A|R,B,E) P(C|A,R,B,E)
versus the factored form
  P(C,A,R,E,B) = P(B) P(E) P(R|E) P(A|B,E) P(C|A)

So, What?
– We can write P in terms of "local" conditional probabilities.
– If G is sparse, that is, |Pa(Xi)| ≤ k, each conditional probability can be specified compactly; e.g. for binary variables, each CPT requires O(2^k) parameters.
– The representation of P is then compact: linear in the number of variables.
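To make the factorization concrete, here is a minimal Python sketch for the Earthquake/Burglary network above. The CPT numbers are illustrative placeholders, not values from the slides; the point is that the joint probability of a full assignment is just a product of local CPT lookups.

```python
# Sketch of the factorization theorem for the Earthquake/Burglary network:
# P(B,E,R,A,C) = P(B) P(E) P(R|E) P(A|B,E) P(C|A).
# All numeric values are hypothetical, chosen only for illustration.

P_B = 0.01                                   # P(Burglary = True)
P_E = 0.02                                   # P(Earthquake = True)
P_R = {True: 0.9, False: 0.0001}             # P(Radio = True | E)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.3, (False, False): 0.001}  # P(Alarm = True | B, E)
P_C = {True: 0.8, False: 0.05}               # P(Call = True | A)

def bernoulli(p, value):
    """Return P(X = value) given P(X = True) = p."""
    return p if value else 1.0 - p

def joint(b, e, r, a, c):
    """Joint probability of one full assignment via the BN factorization."""
    return (bernoulli(P_B, b)
            * bernoulli(P_E, e)
            * bernoulli(P_R[e], r)
            * bernoulli(P_A[(b, e)], a)
            * bernoulli(P_C[a], c))

# Storing the five CPTs takes 1 + 1 + 2 + 4 + 2 = 10 parameters,
# versus 2^5 - 1 = 31 for an explicit joint table.
print(joint(b=True, e=False, r=False, a=True, c=True))
```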
Formal Definition of a BN
A Bayesian network specifies a probability distribution via two components:
– A DAG G
– A collection of conditional probability distributions P(Xi | Pai)
The joint distribution P is defined by the factorization
  P(X1, ..., Xn) = Π_i P(Xi | Pai)
Additional requirement: G is a minimal I-Map of P.

Bayesian Network – Example
[Figure: Pneumonia (P) and Tuberculosis (T) are parents of Lung Infiltrates (I); I is the parent of XRay (X); T is the parent of Sputum Smear (S)]

CPT for P(I | P, T):

  P    T     P(i | P,T)   P(¬i | P,T)
  p    t     0.8          0.2
  p    ¬t    0.6          0.4
  ¬p   t     0.2          0.8
  ¬p   ¬t    0.01         0.99

– Each node Xi has a conditional probability distribution P(Xi | Pai)
– If the variables are discrete, P is usually multinomial
– P can also be linear Gaussian, a mixture of Gaussians, …

BN Semantics
conditional independencies in the BN structure + local probability models = full joint distribution over the domain
  P(p, t, i, x, s) = P(p) P(t) P(i | p, t) P(x | i) P(s | t)
Compact and natural representation: if nodes have at most k parents, O(2^k · n) parameters instead of O(2^n).

d-Separation
– d-sep(X; Y | Z, G): X is d-separated from Y given Z if all paths from a node in X to a node in Y are blocked given Z
– Meaning? (on the blackboard)
– A path is active if it carries dependency between its end nodes, and blocked if it does not
– Three cases: common cause, intermediate cause (chain), and common effect (on the blackboard)

BN – Belief, Evidence and Query
– A BN is, in part, a machine for answering queries
– A query involves evidence: an assignment of values to a set of variables in the domain
– A query asks for an a posteriori belief: a degree of belief, not a certainty such as P(x) = 1 or P(x) = 0

Learning Structure
Problem definition:
– Given: data D
– Return: a directed graph expressing the BN
Issues:
– Superfluous edges
– Missing edges
– Very difficult; see http://robotics.stanford.edu/people/nir/tutorial/

BN Learning
[Figure: a data table over P, T, I, X, S is used to induce the network]
BN models can be learned from empirical data:
– parameter estimation via numerical optimization
– structure learning via combinatorial search
The BN hypothesis space is biased towards distributions with independence structure.
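As a companion to the learning slides, here is a minimal sketch of parameter estimation for a fixed structure. With complete data, the maximum-likelihood estimate of each CPT entry reduces to counting co-occurrences (no numerical optimization is needed in this special case). The network is the wet-grass example from earlier; the variable names (JG, LG) and the four-record toy dataset are assumptions made purely for illustration.

```python
# Sketch: maximum-likelihood CPT estimation from complete data by counting.
# Each entry P(X = True | Pa = pa) is the fraction of records with X = True
# among records where the parents take the values pa.
from collections import Counter

# Fixed structure (wet-grass example): Rain -> JG, Rain -> LG <- Sprinkler.
parents = {"Rain": [], "Sprinkler": [], "JG": ["Rain"], "LG": ["Rain", "Sprinkler"]}

# Toy dataset: each record assigns True/False to every variable.
data = [
    {"Rain": True,  "Sprinkler": False, "JG": True,  "LG": True},
    {"Rain": False, "Sprinkler": True,  "JG": False, "LG": True},
    {"Rain": False, "Sprinkler": False, "JG": False, "LG": False},
    {"Rain": True,  "Sprinkler": False, "JG": True,  "LG": True},
]

def estimate_cpt(var):
    """Return {parent assignment: MLE of P(var = True | parents)}."""
    pa = parents[var]
    joint_counts, pa_counts = Counter(), Counter()
    for row in data:
        key = tuple(row[p] for p in pa)
        pa_counts[key] += 1
        if row[var]:
            joint_counts[key] += 1
    # Parent configurations never seen in the data get no estimate;
    # in practice one would smooth (e.g. with Dirichlet pseudo-counts).
    return {key: joint_counts[key] / n for key, n in pa_counts.items()}

for var in parents:
    print(var, estimate_cpt(var))
```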