Dynamic Bayesian Networks
Sushmita Roy
[email protected]
Computational Network Biology
Biostatistics & Medical Informatics 826 / Computer Sciences 838
https://compnetbiocourse.discovery.wisc.edu
Oct 18th 2016

Goals for today
• Dynamic Bayesian Networks
• How does a DBN differ from a static BN?
• Learning a Dynamic Bayesian Network
• DBNs with structure priors
• Application of a DBN to a phospho-proteomics time course
  – Evaluation and insights

Dynamic Bayesian Networks
• A Dynamic Bayesian network (DBN) is a Bayesian network that can model temporal/sequential data
• A DBN is a Bayes net for dynamic processes
• Like a BN, a DBN has a graph structure and conditional probability distributions
• The DBN specifies how observations at a future time point arise from those at previous time points

Notation
• Assume we have a time course with T time points specifying the activity of p different variables
• Let X^t = {X_1^t, …, X_p^t} denote the set of random variables at time t
• A DBN over these variables defines the joint distribution P(X), where X = {X^0, X^1, …, X^T}
• A DBN, like a BN, has a directed acyclic graph G and parameters Θ
• G typically specifies the dependencies between time points
  – In addition, we need to specify the dependencies (if any) at t=0

A DBN for p variables and T time points
[Figure: the DBN unrolled over time points t=0, 1, 2, …, T; each time point t holds the variables X_1^t, …, X_p^t (e.g., X^2 is the set of variables at time t=2), with edges between consecutive time points and a separate dependency structure at the first time point.]

Stationarity assumption in a Bayesian network
• The stationarity assumption states that the dependency structure and the parameters do not change with t
• Due to this assumption, we only need to specify the dependencies between two consecutive sets of variables, X^t and X^{t+1}
[Figure: under stationarity, the unrolled network collapses to a two-slice template relating X_1^t, …, X_p^t to X_1^{t+1}, …, X_p^{t+1}.]

Computing the joint probability distribution in a DBN
• The joint probability distribution factors into a product of conditional distributions across time points and variables:

  P(X) = P(X^0) ∏_{t=0}^{T−1} ∏_{i=1}^{p} P(X_i^{t+1} | Pa_G(X_i^{t+1}), Θ)

• The graph G encodes the dependency structure between variables at consecutive time points and defines the parents Pa_G(X_i^t) of each variable X_i^t
• The parameters Θ specify the form of the conditional distributions

Learning problems in DBNs
• Parameter learning: given known temporal dependencies between the random variables, estimate the parameters from observed measurements
• Structure learning: given data, learn both the graph structure and the parameters
  – The complexity of learning depends on the order of the model

An example DBN
• Consider a simple example with two regulators, B and C, and one target gene, A
• Assume their expression takes on the values H, L and NC (for high, low and no change in expression)
• A's expression level depends on the expression levels of regulators B and C
• B and C mutually regulate each other
• Let X_A^t denote the random variable representing the expression level of gene A at time t

DBN for a three-node network
[Figure: the two-slice DBN over A, B and C, and the corresponding collapsed network.]

Specifying the parameters of the DBN for a three-node network
• Each of these conditional distributions specifies a distribution over {H, L, NC} given the state(s) of the parent variable(s)

Specifying the parameters of the DBN
P(X_B^{t+1} | X_C^t):

  X_C^t | H     L     NC
  H     | 0.5   0.1   0.4
  L     | 0.4   0.4   0.2
  NC    | 0.25  0.25  0.5

Specifying the parameters of the DBN
P(X_A^{t+1} | X_B^t, X_C^t):

  X_B^t  X_C^t | H     L     NC
  H      H     | 0.8   0.1   0.1
  H      L     | 0.2   0.2   0.6
  H      NC    | 0.2   0.1   0.7
  L      H     | 0.2   0.2   0.6
  L      L     | 0.1   0.8   0.1
  L      NC    | 0.1   0.3   0.6
  NC     H     | 0.2   0.1   0.7
  NC     L     | 0.05  0.2   0.75
  NC     NC    | 0.1   0.1   0.8

Parameter estimation: estimating these numbers
Assume the CPDs shown on the previous slides for the three variables: P(X_B^{t+1} | X_C^t) and P(X_C^{t+1} | X_B^t) (both given by the same 3×3 table) and P(X_A^{t+1} | X_B^t, X_C^t) (the nine-row table).
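The CPD tables and the factorization can be put together directly in code. Below is a minimal Python sketch (illustrative, not from the slides) that stores the tables above as dictionaries and computes the probability of the observation time course used in the worked example that follows; it assumes the three rows of that time course correspond to A, B and C, and that the variables are independent at T=0 with P(NC)=0.5 and P(H)=P(L)=0.25, as stated on the slide.

```python
# Initial distribution at T=0, assumed independent and identical across variables.
p0 = {"H": 0.25, "L": 0.25, "NC": 0.5}

# Transition CPDs from the tables above: parent state(s) at time t ->
# distribution over the child's state at time t+1.
cpd_B = {  # P(X_B^{t+1} | X_C^t); P(X_C^{t+1} | X_B^t) uses the same values here
    "H":  {"H": 0.5,  "L": 0.1,  "NC": 0.4},
    "L":  {"H": 0.4,  "L": 0.4,  "NC": 0.2},
    "NC": {"H": 0.25, "L": 0.25, "NC": 0.5},
}
cpd_C = cpd_B
cpd_A = {  # P(X_A^{t+1} | X_B^t, X_C^t)
    ("H", "H"):   {"H": 0.8,  "L": 0.1,  "NC": 0.1},
    ("H", "L"):   {"H": 0.2,  "L": 0.2,  "NC": 0.6},
    ("H", "NC"):  {"H": 0.2,  "L": 0.1,  "NC": 0.7},
    ("L", "H"):   {"H": 0.2,  "L": 0.2,  "NC": 0.6},
    ("L", "L"):   {"H": 0.1,  "L": 0.8,  "NC": 0.1},
    ("L", "NC"):  {"H": 0.1,  "L": 0.3,  "NC": 0.6},
    ("NC", "H"):  {"H": 0.2,  "L": 0.1,  "NC": 0.7},
    ("NC", "L"):  {"H": 0.05, "L": 0.2,  "NC": 0.75},
    ("NC", "NC"): {"H": 0.1,  "L": 0.1,  "NC": 0.8},
}

def time_course_probability(course):
    """course: list of (a, b, c) state triples, one per time point.
    P(X) = P(X^0) * prod_t P(X_A^{t+1}|X_B^t,X_C^t) P(X_B^{t+1}|X_C^t) P(X_C^{t+1}|X_B^t)."""
    a0, b0, c0 = course[0]
    prob = p0[a0] * p0[b0] * p0[c0]          # independence assumed at T=0
    for (_, b_prev, c_prev), (a, b, c) in zip(course, course[1:]):
        prob *= cpd_A[(b_prev, c_prev)][a]   # A's parents are B and C at time t
        prob *= cpd_B[c_prev][b]             # B's parent is C at time t
        prob *= cpd_C[b_prev][c]             # C's parent is B at time t
    return prob

# Observation time course over T=0,1,2 (rows assumed to be A, B, C):
course = [("NC", "NC", "NC"), ("L", "H", "NC"), ("H", "H", "H")]
print(time_course_probability(course))
```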
Computing the probability distribution of an observation
Suppose we are given a new observation time course:

       T=0   T=1   T=2
       NC    L     H
       NC    H     H
       NC    NC    H

Assume P(NC)=0.5 and P(H)=P(L)=0.25 for all variables at T=0. Using the DBN from the previous slides, what is the probability of this time course?
• First we apply the factorization at the time-point level:

  P(X^0, X^1, X^2) = P(X^0) P(X^1 | X^0) P(X^2 | X^1)

• Next, we use the graph structure of the DBN to further decompose these terms:

  P(X^{t+1} | X^t) = P(X_A^{t+1} | X_B^t, X_C^t) P(X_B^{t+1} | X_C^t) P(X_C^{t+1} | X_B^t)

Parameter estimation in DBNs
• The parameter estimation approach differs depending on the form of the CPD
• If the variables are discrete, we need to estimate the entries of each CPD table

Parameter estimation example for the three-node DBN
We need to estimate the table P(X_B^{t+1} | X_C^t). Suppose we had a training time course:

       T=0   T=1   T=2   T=3   T=4
       NC    L     L     L     NC
       NC    H     H     NC    L
       NC    NC    H     H     L

To compute these probabilities, we need to look at the joint assignments of {X_B^{t+1}, X_C^t} for all 0 ≤ t ≤ 3.
• What is P(X_B^{t+1}=H | X_C^t=L)?
• What is P(X_B^{t+1}=NC | X_C^t=L)?

Structure learning in DBNs
• We need to learn the dependency structure between two consecutive time points
• We may also want to learn the within-time-point connectivity
• Structure-search algorithms used for BNs can be applied with a simple extension:
  – the parents of a node can come from the previous or the current time step

DBN with score-based search
• The score of a DBN is a function of the data likelihood and a graph prior:

  Score(G : D) = log P(D | G) + log P(G)

• Data D: a collection of time courses
• Graph prior P(G): this can be uniform, or it can encode some form of model complexity
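Returning to the parameter-estimation example above, here is a minimal sketch of the counting estimator (illustrative, not from the slides). The transcript does not label which row of the training time course belongs to which variable; the assignment below, with C in the first row and B in the second, is an assumption chosen so that the queried parent state X_C^t = L actually occurs in the data.

```python
from collections import Counter

STATES = ["H", "L", "NC"]

def estimate_cpt(parent_series, child_series):
    """Maximum-likelihood estimate of P(child^{t+1} | parent^t) from one time
    course, by counting joint assignments over consecutive time points."""
    counts = Counter(zip(parent_series[:-1], child_series[1:]))
    cpt = {}
    for y in STATES:
        total = sum(counts[(y, x)] for x in STATES)
        # Rows for unobserved parent states stay undefined; in practice a
        # pseudocount (Laplace smoothing) would be added.
        cpt[y] = {x: counts[(y, x)] / total if total else None for x in STATES}
    return cpt

# Training time course over T=0..4 (row-to-variable assignment assumed):
C = ["NC", "L", "L", "L", "NC"]
B = ["NC", "H", "H", "NC", "L"]

cpt_B_given_C = estimate_cpt(parent_series=C, child_series=B)
print(cpt_B_given_C["L"])  # P(X_B^{t+1} | X_C^t = L): H, L, NC each 1/3 here
```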
Goals for today
• Dynamic Bayesian Networks
• How does a DBN differ from a static BN?
• Learning a Dynamic Bayesian Network
• DBNs with structure priors
• Application of a DBN to a phospho-proteomics time course
  – Evaluation and insights

Bayesian Inference of Signaling Network Topology in a Cancer Cell Line (Hill et al. 2012)
• Protein signaling networks are important in many cellular diseases
  – The networks can differ between normal and diseased cell types
• But the structure of these networks remains incomplete
• The temporal activity of proteins of interest can be measured over time and used to infer the network structure
• Build on prior knowledge of signaling networks to learn a better, predictive network
• Static BNs are limiting because they do not model time

Applying DBNs to infer signaling network topology
[Figure: overview of the approach of Hill et al., Bioinformatics 2012.]

Application of DBNs to signaling networks
• Dataset description
  – Phospho-protein levels of 20 proteins
  – Eight time points
  – Four growth conditions
• Use a known signaling network as a graph prior
• Estimate the CPDs as regularized conditional Gaussians
• Assume a first-order Markov model
  – X^t depends only on X^{t−1}

Integrating a prior signaling network into the DBN
• A Bayesian approach to graph learning:

  P(G | D) ∝ P(D | G) P(G)

  where P(D | G) is the data likelihood and P(G) the graph prior
• The graph prior is encoded as (following Mukherjee & Speed 2008):

  P(G) ∝ exp(λ f(G))

  where λ is the prior strength and f(G) encodes graph features
• Here f(G) = −|E(G) \ E*|, minus the number of edges of the graph G, E(G), that are not in the prior edge set E*
• This prior does not promote new edges, but penalizes edges that are not in the prior

Calculating posterior probabilities of edges
• For each edge e, we need to calculate its posterior probability

  P(e ∈ G | D) = Σ_{G : e ∈ G} P(G | D)

• Although this is intractable in general, this work makes some simplifying assumptions:
  – Edges are allowed only forward in time
    • The learning problem then decomposes into smaller per-variable problems that can be solved by variable selection
  – P(G) factorizes over individual edges
    • To compute the posterior probability, the sum goes over all possible parent sets
  – A node can have no more than d_max parents

Results on simulated data
• 20 variables, four time courses, eight time points
• The prior network had 54 extra edges and was missing 10 of the ground-truth edges
• Results are not sensitive to the prior values
[Figure: sensitivity to the choice of the hyperparameter and to a noisy prior graph.]

Inferred signaling network using a DBN
[Figure: the inferred signaling network alongside the prior network. The prior also had self-loops, which are not shown.]

Prior network
Figure S3: Network prior. Existing biology is captured and integrated during modeling using a prior probability distribution on graphs P(G) ∝ exp(λ f(G)), with f(G) = −|E(G) \ E*|, where E(G) is the set of edges contained in G and E* is a set of a priori expected edges. The graph shows the edge set E*. Edges represent interactions through time. Each node also has a self-loop edge (i.e., an edge starting and finishing at the same node; these are not displayed). The edge set includes expected indirect edges which operate via components not included in the study.
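The two ingredients above, the edge-wise prior and the per-variable decomposition, combine into an exact posterior-edge-probability computation. Below is a minimal Python sketch (not the authors' code): `marginal_likelihood` is a hypothetical placeholder for the model-specific score (Hill et al. use regularized conditional Gaussians), and the protein names in the usage example are illustrative.

```python
from itertools import combinations
from math import exp

def log_prior(parents, prior_parents, lam):
    """log P(S) up to an additive constant: each parent edge not in the prior
    edge set E* incurs a penalty of lambda, matching f(G) = -|E(G)\\E*|."""
    return -lam * len(set(parents) - set(prior_parents))

def edge_posteriors(child, candidates, prior_parents, lam, d_max,
                    marginal_likelihood):
    """Exact posterior probability of each candidate edge u -> child.

    Because edges point only forward in time and the prior factorizes over
    edges, the graph posterior decomposes per child variable; summing over
    all parent sets S with |S| <= d_max is then tractable:
      P(u -> child | D) = sum_{S containing u} P(D_child|S) P(S) / Z."""
    weights = {}  # parent set -> unnormalized posterior weight
    for k in range(d_max + 1):
        for S in combinations(candidates, k):
            weights[S] = marginal_likelihood(child, S) * exp(
                log_prior(S, prior_parents, lam))
    Z = sum(weights.values())
    return {u: sum(w for S, w in weights.items() if u in S) / Z
            for u in candidates}

# Toy usage with a flat likelihood, so the posteriors reflect the prior alone:
post = edge_posteriors("STAT3p", ["MAPKp", "AKTp", "MEKp"],
                       prior_parents={"MAPKp"}, lam=2.0, d_max=2,
                       marginal_likelihood=lambda child, S: 1.0)
print(post)
```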
Using the DBN to make predictions
• Although many inferred edges were expected, several edges were unexpected
• Select novel edges based on their posterior probability and test them using inhibitors
• For example, if an edge from X to Y was inferred and X is a regulator of Y, then inhibiting X should affect the level of Y
• Example edges tested:
  – MAPKp to STAT3p (S727), inferred with high posterior probability (0.98)
    • Apply MEKi, which is an inhibitor of MAPK, and measure MAPKp and STAT3p post-inhibition
  – AKTp to p70S6Kp, AKTp to MEKp, and AKTp to cJUNp

Experimental validation of links
• Add the MAPK inhibitor and measure MAPK and STAT3
• MAPK is significantly inhibited (P-value 5×10⁻⁴)
• STAT3 is also inhibited (P-value 3.3×10⁻⁴)
• Success is measured by the difference in the levels of the targets as a function of the levels of the inhibitors

Take-away points
• Network dynamics can be defined in multiple ways; we have seen two ways to capture them
• Skeleton-network-based approaches
  – The universe of networks is fixed
  – Nodes switch on or off
  – No assumption or model of how the network changes over time
• Dynamic Bayesian networks
  – A type of probabilistic graphical model
  – Describe how the system transitions from one state to another
  – Assume that the dependency between t−1 and t is the same for all time points
• Application to cancer signaling data
  – DBNs are powerful for capturing the dynamics
  – However, the prior was important for learning an accurate network

References
• N. Friedman, K. Murphy, and S. Russell, "Learning the structure of dynamic probabilistic networks," in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI '98), San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1998, pp. 139-147. [Online]. Available: http://portal.acm.org/citation.cfm?id=2074111
• S. M. Hill, Y. Lu, J. Molina, L. M. Heiser, P. T. Spellman, T. P. Speed, J. W. Gray, G. B. Mills, and S. Mukherjee, "Bayesian inference of signaling network topology in a cancer cell line," Bioinformatics, vol. 28, no. 21, pp. 2804-2810, Nov. 2012. [Online]. Available: http://dx.doi.org/10.1093/bioinformatics/bts514