EECS 800 Research Seminar: Mining Biological Data
Instructor: Luke Huan, Fall 2006, The University of Kansas
10/30/2006, Probabilistic Graph Models, KU EECS 800, Luke Huan, Fall '06

Overview
Bayesian networks and other probabilistic graph models.

Bayesian Networks (Informal)
A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions.
Syntax:
* a set of nodes, one per variable
* a directed, acyclic graph (link ≈ "directly influences")
* a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.

Example
The topology of the network encodes conditional independence assertions:
* Weather is independent of the other variables.
* Toothache and Catch are conditionally independent given Cavity.

Example
I'm at work; neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar?
Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls.
Network topology reflects "causal" knowledge:
* A burglar can set the alarm off.
* An earthquake can set the alarm off.
* The alarm can cause Mary to call.
* The alarm can cause John to call.

Example (contd.)
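The "Example contd." slide would show the network's conditional probability tables, which did not survive extraction. As a rough sketch, the code below assumes the standard textbook CPT values for this network (an assumption; the slide's actual numbers are not recoverable here) and computes one full-joint entry as the product of the local conditionals:

```python
# Burglary network: Burglary -> Alarm <- Earthquake; Alarm -> JohnCalls, MaryCalls.
# CPT numbers below are the standard textbook values, assumed for illustration.
P_B = {True: 0.001, False: 0.999}                  # P(Burglary)
P_E = {True: 0.002, False: 0.998}                  # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(Alarm=True | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                    # P(JohnCalls=True | Alarm)
P_M = {True: 0.70, False: 0.01}                    # P(MaryCalls=True | Alarm)

def joint(b, e, a, j, m):
    """One full-joint entry as the product of each node's local conditional."""
    p = P_B[b] * P_E[e]
    p *= P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
    p *= P_J[a] if j else 1.0 - P_J[a]
    p *= P_M[a] if m else 1.0 - P_M[a]
    return p

# P(john calls, mary calls, alarm, no burglary, no earthquake):
print(joint(False, False, True, True, True))  # ~0.000628
```

Note that the five tables need 1 + 1 + 4 + 2 + 2 = 10 numbers in total, the count quoted on the Compactness slide later in the deck.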
Semantics
The full joint distribution is defined as the product of the local conditional distributions:
P(X1, …, Xn) = Π_{i=1..n} P(Xi | Parents(Xi))
e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)

Inference
Given the evidence that "neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call", how do we decide among the following four possible explanations?
* Nothing at all
* Burglary but not Earthquake
* Earthquake but not Burglary
* Burglary and Earthquake

Learning
Suppose that we only have a joint distribution; how do we "learn" the topology of a BN?

Application: Clustering Users
Input: the TV shows that each user watches. Output: TV show "clusters". Assumption: shows watched by the same users are similar.
* Class 1: Power Rangers, Animaniacs, X-Men, Tazmania, Spider-Man
* Class 2: Young and the Restless, Bold and the Beautiful, As the World Turns, Price Is Right, CBS Evening News
* Class 3: Tonight Show, Conan O'Brien, NBC Nightly News, Later with Kinnear, Seinfeld
* Class 4: 60 Minutes, NBC Nightly News, CBS Evening News, Murder She Wrote, Matlock
* Class 5: Seinfeld, Friends, Mad About You, ER, Frasier

App.: Finding Regulatory Networks
P(Level | Module, Regulators): the expression level of a gene in an experiment depends on the module the gene belongs to and on the expression of its regulators.
[Figure: a Bayesian network over Module, Regulators (e.g., HAP4, CMK1), and expression Level, asking "what module does gene g belong to?"]
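Returning to the Inference slide above: the four explanations can be ranked by enumeration, summing the joint over the hidden Alarm variable and normalizing. This is a minimal sketch, again assuming the standard textbook CPT values for the burglary network (not recoverable from this extraction):

```python
# Rank the four explanations given JohnCalls=True, MaryCalls=False,
# by enumerating over the hidden Alarm variable.
# CPT numbers are the standard textbook values, assumed for illustration.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls=True | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls=True | Alarm)

def joint(b, e, a, j, m):
    p = P_B[b] * P_E[e]
    p *= P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
    p *= P_J[a] if j else 1.0 - P_J[a]
    p *= P_M[a] if m else 1.0 - P_M[a]
    return p

# Unnormalized P(B, E, j, not-m): sum out Alarm for each (B, E) pair.
unnorm = {(b, e): sum(joint(b, e, a, True, False) for a in (True, False))
          for b in (True, False) for e in (True, False)}
z = sum(unnorm.values())
posterior = {k: v / z for k, v in unnorm.items()}
```

With these numbers, "nothing at all" (B = false, E = false) receives roughly 0.99 of the posterior mass, and burglary-only is the most likely of the remaining three explanations.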
[Figure: a network with nodes Experiment, Regulator1 (e.g., BMH1), Regulator2 (e.g., GIC2), Regulator3, Module, Gene, and Level.] The expression level in each module is a function of the expression of its regulators.

App.: Finding Regulatory Networks
[Figure: an inferred regulatory-module network, with module groups for energy and cAMP signaling, DNA and RNA processing, and amino acid metabolism, and regulators such as Hap4, Gat1, Msn4, Xbp1, Tpk1, Cmk1, and Bmh1. Legend: inferred regulation; regulation supported in literature; regulator (signaling molecule); regulator (transcription factor); enriched cis-regulatory motif; experimentally tested regulator.]

Constructing Bayesian Networks
Base:
* We know the joint distribution of X = X1, …, Xn.
* We know the "topology" of X: for each Xi ∈ X, we know the parents of Xi.
Goal: we want to create a Bayesian network that captures the joint distribution according to the topology.
Theorem: such a BN exists.

Proof by Construction
A leaf in X is an Xi ∈ X such that Xi has no child. For each Xi:
* add Xi to the network
* select parents from X1, …, Xi-1 such that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi-1)
* set X = X − {Xi}
This choice of parents guarantees:
P(X1, …, Xn) = Π_{i=1..n} P(Xi | X1, …, Xi-1)   (chain rule)
             = Π_{i=1..n} P(Xi | Parents(Xi))

Compactness
A CPT for Boolean Xi with k Boolean parents has 2^k rows, one for each combination of parent values. Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p). If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers, i.e., it grows linearly with n, vs. O(2^n) for the full joint distribution. For the burglary net: 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31).

Reasoning: Probability Theory
A well-understood framework for modeling uncertainty:
* partial knowledge of the state of the world
* noisy observations
* phenomena not covered by our model
* inherent stochasticity
It has clear semantics and can be learned from data.

Probability Theory
A (discrete) probability P over (Ω, S = 2^Ω) is a mapping from elements of S such that:
* Ω is the set of all possible outcomes (the sample space) of a probabilistic experiment, and S is the set of "events"
* P(α) ≥ 0 for all α ∈ S
* P(Ω) = 1
* if α, β ∈ S and α ∩ β = ∅, then P(α ∪ β) = P(α) + P(β)
Conditional probability: P(α | β) = P(α ∩ β) / P(β)
Chain rule: P(α ∩ β) = P(α | β) P(β)
Bayes rule: P(α | β) = P(β | α) P(α) / P(β)
Conditional independence: P(α | β ∩ γ) = P(α | γ)

Random Variables & Notation
Random variable (RV): a function on Ω whose induced distribution assigns a non-negative value to each possible outcome, with the values summing to 1.
* Val(X): the set of possible values of RV X
* Upper-case letters denote RVs (e.g., X, Y, Z)
* Upper-case bold letters denote sets of RVs (e.g., X, Y)
* Lower-case letters denote RV values (e.g., x, y, z)
* Lower-case bold letters denote RV set values (e.g., x)
E.g.:
P(X = x); P(X) denotes {P(X = x) | x ∈ Val(X)}.

Joint Probability Distribution
Given a group of random variables X = X1, …, Xn, where Xi takes values from a set Val(Xi), the joint probability distribution is a function that maps elements of Ω = Π Val(Xi) to non-negative values that sum to 1. For example, RV Weather takes four values "sunny, rainy, cloudy, snow" and RV Cavity takes two values "true, false", so P(Weather, Cavity) is a 4 × 2 table of values:

Weather =        sunny   rainy   cloudy  snow
Cavity = true    0.144   0.02    0.016   0.02
Cavity = false   0.576   0.08    0.064   0.08

Marginal Probability
Given a set of RVs X and its joint probabilities, a marginal probability distribution over X' ⊆ X is:
P(X') = Σ_{X \ X'} P(X)
Using the table above:
P(Weather = sunny) = 0.144 + 0.576 = 0.72
P(Cavity = true) = 0.144 + 0.02 + 0.016 + 0.02 = 0.2

Independence
Two RVs X, Y are independent, denoted X ⟂ Y, if:
for all x ∈ Val(X), y ∈ Val(Y): P(X = x, Y = y) = P(X = x) P(Y = y)
Conditional independence: X is independent of Y given Z if:
for all x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z): P(X = x | Y = y, Z = z) = P(X = x | Z = z)
In the table above, P(Weather = sunny) = 0.72 and P(Cavity = true) = 0.2, and indeed P(sunny, true) = 0.144 = 0.72 × 0.2.

Representing Joint Distributions
Random variables: X1, …, Xn; P is a joint distribution over X1, …, Xn. If X1, …, Xn are binary, we need 2^n parameters to describe P. Can we represent P more compactly?
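The marginal computations above can be checked mechanically. The sketch below uses exactly the table's numbers, recomputes both marginals, and tests whether every cell of the joint factorizes into the product of its marginals:

```python
# Joint P(Weather, Cavity) from the table above.
joint = {
    ("sunny", True): 0.144,  ("rainy", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}

# Marginals: sum the joint over the other variable.
weathers = ("sunny", "rainy", "cloudy", "snow")
p_weather = {w: joint[(w, True)] + joint[(w, False)] for w in weathers}
p_cavity = {c: sum(joint[(w, c)] for w in weathers) for c in (True, False)}
# p_weather["sunny"] ≈ 0.72 and p_cavity[True] ≈ 0.2, matching the slide.

# Independence check: does P(w, c) == P(w) * P(c) hold for every cell?
independent = all(abs(joint[(w, c)] - p_weather[w] * p_cavity[c]) < 1e-12
                  for (w, c) in joint)
```

For this particular table the product check passes in every cell, so Weather and Cavity are independent under these example numbers; exploiting exactly this kind of structure is what makes a compact representation possible.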
Key: exploit independence properties.

Independent Random Variables
If X and Y are independent, then P(X, Y) = P(X | Y) P(Y) = P(X) P(Y). If X1, …, Xn are independent, then P(X1, …, Xn) = P(X1) … P(Xn): only O(n) parameters, and all 2^n probabilities are implicitly defined. But this cannot represent many types of distributions, so we may need to consider conditional independence.

Conditional Parameterization
* S = score on test, Val(S) = {s0, s1}
* I = intelligence, Val(I) = {i0, i1}
* G = grade, Val(G) = {g0, g1, g2}
Assume that G and S are independent given I.
Joint parameterization: 2 · 2 · 3 = 12 entries, i.e., 12 − 1 = 11 independent parameters.
Conditional parameterization: P(I, S, G) = P(I) P(S | I) P(G | I, S) = P(I) P(S | I) P(G | I)
* P(I): 1 independent parameter
* P(S | I): 2 · 1 = 2 independent parameters
* P(G | I): 2 · 2 = 4 independent parameters
In total: 7 independent parameters.

Naïve Bayes Model
Class variable C, Val(C) = {c1, …, ck}; evidence variables X1, …, Xn. The naïve Bayes assumption is that the evidence variables are conditionally independent given C:
P(C, X1, …, Xn) = P(C) Π_{i=1..n} P(Xi | C)
Applications include medical diagnosis and text classification. Used as a classifier, we compare:
P(C = c1 | x1, …, xn) / P(C = c2 | x1, …, xn) = [P(C = c1) / P(C = c2)] · Π_{i=1..n} [P(xi | C = c1) / P(xi | C = c2)]
Problem: double counting of correlated evidence.

Bayesian Network: A Formal Study
A Bayesian network on a group of random variables X = X1, …, Xn is a tuple (T, P) such that:
* the topology T ⊆ X × X is a directed acyclic graph
* P is a joint distribution such that for all i ∈ [1, n] and all possible values xi and xs: P(Xi = xi | XS = xs) = P(Xi = xi | Parents(Xi) = xs), where S = the non-descendants of Xi in X
Or: Xi is conditionally independent of
any of its non-descendant variables, given Parents(Xi).

Factorization Theorem
If G is an independence map (I-map) of P, then
P(X1, …, Xn) = Π_{i=1..n} P(Xi | Pa(Xi))
Proof: let X1, …, Xn be an ordering consistent with G.
* By the chain rule: P(X1, …, Xn) = Π_{i=1..n} P(Xi | X1, …, Xi-1)
* From the ordering: Pa(Xi) ⊆ {X1, …, Xi-1} and {X1, …, Xi-1} ⊆ Pa(Xi) ∪ NonDesc(Xi)
* Since G is an I-map, (Xi ⟂ NonDesc(Xi) | Pa(Xi)) ∈ I(P), hence P(Xi | X1, …, Xi-1) = P(Xi | Pa(Xi))

Factorization Implies I-Map
If P(X1, …, Xn) = Π_{i=1..n} P(Xi | Pa(Xi)), then G is an I-map of P.
Proof: we need to show that P(Xi | ND(Xi)) = P(Xi | Pa(Xi)), where D is the set of descendants of node i and ND is all nodes except i and D:
P(Xi | ND(Xi)) = P(Xi, ND(Xi)) / P(ND(Xi))
Summing the factorized joint over the descendants D leaves
P(Xi, ND(Xi)) = P(Xi | Pa(Xi)) · Π_{j ∈ ND} P(Xj | Pa(Xj))
and
P(ND(Xi)) = Π_{j ∈ ND} P(Xj | Pa(Xj)),
so the ratio is P(Xi | Pa(Xi)).

Probabilistic Graphical Models
A tool for representing complex systems and performing sophisticated reasoning tasks. The fundamental notion is modularity: complex systems are built by combining simpler parts. Why have a model?
* Compact and modular representation of complex systems
* Ability to execute complex reasoning patterns
* Make predictions
* Generalize from particular problems

Probabilistic Graphical Models
Increasingly important in machine learning. Many classical probabilistic problems in statistics, information theory, pattern recognition, and statistical mechanics are special cases of the formalism, and graphical models provide a common framework. Advantage: specialized techniques developed in one field can be transferred between research communities.

Representation: Graphs
An intuitive data structure for modeling highly interacting sets of variables; an explicit model of modularity; a data structure that allows the design of efficient general-purpose algorithms.

Reference
"Bayesian Networks and Beyond", Daphne Koller (Stanford) & Nir Friedman (Hebrew U.)
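As a closing numeric illustration of the Factorization Theorem and I-map slides above, take a three-node chain A → B → C (the names and CPT numbers below are invented for illustration). A joint built as the product of local conditionals should satisfy P(C | A, B) = P(C | B), i.e., C is independent of its non-descendant A given its parent B:

```python
# Chain A -> B -> C with invented CPTs (for illustration only).
pA = {True: 0.3, False: 0.7}
pB = {True: 0.9, False: 0.2}   # P(B=True | A=a), keyed by a
pC = {True: 0.6, False: 0.1}   # P(C=True | B=b), keyed by b

def joint(a, b, c):
    """Factorized joint: P(a) * P(b | a) * P(c | b)."""
    p = pA[a]
    p *= pB[a] if b else 1.0 - pB[a]
    p *= pC[b] if c else 1.0 - pC[b]
    return p

def p_c_given(a=None, b=None):
    """P(C=True | stated evidence), summing out unobserved variables."""
    num = den = 0.0
    for aa in (True, False):
        for bb in (True, False):
            if a is not None and aa != a:
                continue
            if b is not None and bb != b:
                continue
            for cc in (True, False):
                p = joint(aa, bb, cc)
                den += p
                if cc:
                    num += p
    return num / den

# I-map claim: P(C | A, B) == P(C | B) for every assignment of A and B.
ok = all(abs(p_c_given(a=a, b=b) - p_c_given(b=b)) < 1e-9
         for a in (True, False) for b in (True, False))
print(ok)  # True
```

The check passes by construction: because the joint factorizes along the graph, conditioning on the parent B screens C off from the non-descendant A, exactly as the theorem states.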