Probabilistic models for interpreting perturbations in networks: Nested effect models
Sushmita Roy
Computational Network Biology
Biostatistics & Medical Informatics 826
Computer Sciences 838
https://compnetbiocourse.discovery.wisc.edu
Dec 6th 2016

Types of algorithms used to examine perturbations in networks
• Graph diffusion followed by subnetwork finding methods
  – HOTNET
  – NETBAG
• Information flow-based methods (also widely used for integrating different types of data)
  – Prize-collecting Steiner tree
  – Min cost max flow
• Probabilistic graphical model-based methods
  – Factor graphs
  – Nested Effect Models (NEMs)

Probabilistic graphical models for interpreting network perturbations
• C.-H. H. Yeang, T. Ideker, and T. Jaakkola, "Physical network models," Journal of Computational Biology, vol. 11, no. 2-3, pp. 243-262, Mar. 2004.
• F. Markowetz, D. Kostka, O. G. Troyanskaya, and R. Spang, "Nested effects models for high-dimensional phenotyping screens," Bioinformatics, vol. 23, no. 13, pp. i305-312, Jul. 2007.
• C. J. Vaske, C. House, T. Luu, B. Frank, C.-H. H. Yeang, N. H. Lee, and J. M. Stuart, "A factor graph nested effects model to identify networks from genetic perturbations," PLoS Computational Biology, vol. 5, no. 1, e1000274, Jan. 2009.

Motivation of nested effect models
• Perturbation of genes followed by high-throughput profiling of different phenotypes can be used to characterize functions of genes
• However, most genes do not function independently but interact in a network to drive a particular function
• Phenotypic measurements (e.g. mRNA levels) are indirect measurements of the underlying network structure
  – Includes direct and indirect effects
• Given perturbation data from multiple genes, can we more systematically identify the functions of these genes and how they interact at a pathway level?
Problem overview
• Given
  – Global measurements of gene expression after single-gene deletions of multiple genes
• Do
  – Infer interactions between the deleted genes to enable further characterization of these genes
• Nested Effect Models are probabilistic model-based approaches to solve this problem

Nested Effect Models (Markowetz et al., 2007)

Nested Effect Models: key properties
• A generalization of similarity-based clustering
• Orders the clusters according to subset relationships
  – A gene A is upstream of another gene B if B's effects are a subset of A's effects
• Builds a hierarchy of all perturbed genes by assembling smaller sub-models of pairs and triplets of genes

Subset relationships to order genes
Figure: A complete model. The left part of the figure shows a complete model M'xyz consisting of a transitively closed graph between genes and assignments of genes to specific effects (the dashed arrows). Given the complete model, we can formulate a prediction of what effects to expect: perturbing x should cause all effects, while perturbing y should only cause E3–E6, and perturbing z only E5 and E6 (middle plot). In reality, our observations will be noisy: there can be false positive (FP) and false negative (FN) effect observations (right plot).
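
The subset idea in the figure caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph x -> y -> z and the effect names E1 to E6 follow the caption, the attachment of E1 and E2 to x and of E3 and E4 to y is inferred from the predicted effect sets, and the false positive and false negative rates (`fp`, `fn`) are made-up values, not numbers from the lecture or from Markowetz et al.

```python
import math

# Transitively closed S-gene graph from the figure: x -> y -> z (and x -> z).
# reach[s] is the set of S-genes affected when s is perturbed, including s.
reach = {"x": {"x", "y", "z"}, "y": {"y", "z"}, "z": {"z"}}

# Attachment of effects to S-genes (the dashed arrows); inferred from the caption.
attach = {"E1": "x", "E2": "x", "E3": "y", "E4": "y", "E5": "z", "E6": "z"}


def predicted_effects(s_gene):
    """Effects expected to change when s_gene is perturbed."""
    return {e for e, parent in attach.items() if parent in reach[s_gene]}


def log_likelihood(observed, fp=0.05, fn=0.10):
    """Log-likelihood of binary effect observations under the model.

    observed[s] is the set of effects seen after perturbing s.
    fp and fn are illustrative error rates (assumptions).
    """
    total = 0.0
    for s in reach:
        expected = predicted_effects(s)
        for e in attach:
            seen = e in observed[s]
            if e in expected:
                total += math.log(1 - fn if seen else fn)
            else:
                total += math.log(fp if seen else 1 - fp)
    return total


clean = {s: predicted_effects(s) for s in reach}
print(sorted(predicted_effects("y")))
print(log_likelihood(clean))
```

Perturbing z yields a subset of the effects of y, which in turn yields a subset of the effects of x; that nesting is what places x upstream of y and y upstream of z. Noisy observations lower the likelihood without changing the predicted ordering.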
Key properties of Factor Graph-NEMs (FG-NEMs)
• NEMs assume the perturbed genes interact in a binary manner
• But many interactions have a sign: inhibitory or stimulating action
• FG-NEMs capture a broader set of interactions among the perturbed genes
• Formulation based on a factor graph
  – Provides an efficient search over the space of NEMs

Notation
• S-genes: Set of genes that have been deleted individually
• E-genes: Set of effector genes that are measured
• Θ: The attachment of an effector gene to the S-gene network
• Φ: The interaction matrix of S-genes
• X: The phenotypic profile; each column gives the difference in expression in a knockout compared to wild type
  – Rows: E-genes
  – Columns: S-genes
• Y: Hidden effect matrix; each entry is in {-1, 0, +1} and specifies whether an S-gene affects the E-gene

An example of 4 S-genes and 13 E-genes
[Figure: S-gene graph Φ, attachments Θ, and heatmap X with E-genes as rows and S-genes as columns]
• A->C is reflected in the scatter plot: when XA is up, XC is up; when XA is down, XC is down or shows no change
• B-|D is also reflected in the scatter plot: XD is a subset of opposite changes from XB

S-gene interaction modes and their expression signatures
[Figure: expected trend in E-genes for specific interaction modes; panel labels: Interaction mode, Consistent, Inconsistent]

Figure 1. Predicting Pair-wise Interaction Using Quantitative Nested Effects. (A) Hypothetical example with four S-genes, A, B, C, and D. The S-gene graph contains one inhibitory link, B-|D (left). A heatmap with expectation of E-gene expression under knockdown of each S-gene shows both inhibitory and stimulatory effects (middle). Scatter plots of the expression levels under the C, A, B, and D knock-outs show that expression fits in the shaded preferred regions of each interaction (right). The inhibitory link explains some of the "observed" data: expression changes under ΔD (bright red or bright green entries in the heatmap) occur in a subset of the E-genes for which the opposite changes occur in ΔB. (B) Data from a known inhibitory interaction. Expression levels of genes under the DIG1/DIG2 knock-out (y-axis) plotted against their levels under the STE2 knock-out (x-axis) as detected in [17]. Expression changes significant at α = 0.05 are indicated in gray lines. DIG1/DIG2 is known to inhibit STE12. (C) Interaction modes. Observed E-gene expression changes are compared to five possible types of interactions between two S-genes, A and B (i–v). The top row illustrates the expected nested effects relationship [...]

Factor graphs
• A type of graphical model
• A bipartite graph with variable nodes and factor nodes
• Edges connect variables to the potentials (factors) that take those variables as arguments
• Represents a global function as a product of smaller local functions
• Perhaps the most general graphical model
  – Bayesian networks and Markov networks have factor graph representations

Example factor graph
[Figure: example factor graph with variable nodes and factor nodes. From Kschischang, Frey, Loeliger 2001]

Probabilistic model for NEMs
• Goal is to find a network, Φ and Θ, that best fits the observed data (X)
• This is an inference problem
• Use a maximum a posteriori (MAP) approach
• X is a noisy measurement of Y; Y is the hidden quantity we need to sum over

Probabilistic model continued
The derivation assumes that each E-gene is attached to a single S-gene and that each E-gene observation vector across the knock-downs is independent of other E-gene observations. The maximization function can then be written:

$$J(X) = \max_{\Phi,\Theta} \left\{ \sum_{Y} P(\Phi) \prod_{e \in E} P(Y_e \mid \Phi, \theta_e)\, P(X_e \mid Y_e) \right\} \qquad (5)$$

• Independence over all E-genes

$$= \max_{\Phi,\Theta} \left\{ P(\Phi) \prod_{e \in E} \sum_{Y_e} P(Y_e \mid \Phi, \theta_e)\, P(X_e \mid Y_e) \right\} \qquad (6)$$

• Re-arranging the terms

$$= \max_{\Phi,\Theta} P(\Phi) \prod_{e \in E} L_e \qquad (7)$$

where X_e and Y_e are the row vectors of data and hidden states for E-gene e respectively, and θ_e records the attachment point [...]

How to compute L_e?
Previous approaches decompose L_e over the knock-downs, which assumes the S-gene observations are independent given the network and attachments (see [18] for an example of such a derivation).
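
To make the factor graph idea concrete (a global function written as a product of local functions, then maximized to obtain a MAP estimate), here is a minimal toy sketch. The three binary variables `a`, `b`, `c` and all factor values are invented for illustration, and the maximization is done by brute-force enumeration rather than by message passing, so this shows the objective only, not the FG-NEM inference procedure.

```python
from itertools import product

# Each factor is (variable names, local function returning a non-negative score).
# Variables and scores are made up for illustration.
factors = [
    (("a",), lambda a: 0.7 if a == 1 else 0.3),
    (("a", "b"), lambda a, b: 0.9 if a == b else 0.1),
    (("b", "c"), lambda b, c: 0.8 if b == c else 0.2),
]
variables = ["a", "b", "c"]


def global_score(assignment):
    """Global function: product of all local factor values."""
    score = 1.0
    for names, fn in factors:
        score *= fn(*(assignment[n] for n in names))
    return score


def map_assignment():
    """Brute-force MAP: try every joint assignment of the binary variables."""
    candidates = (
        dict(zip(variables, values))
        for values in product([0, 1], repeat=len(variables))
    )
    best = max(candidates, key=global_score)
    return best, global_score(best)


best, score = map_assignment()
print(best, score)
```

Enumeration is exponential in the number of variables, which is why factor graphs are normally paired with message passing algorithms that exploit the local structure.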
Digging inside the L_e term
• Note: Y_e = {Y_eA, Y_eB, Y_eC, ..., Y_eN}, where N is the total number of S-genes
• Define L'_e, proportional to L_e, using a set of pairwise potentials

To facilitate scoring the expanded set of interaction modes mentioned earlier, we replace L_e with a function proportional to L_e, L'_e. L'_e is defined as a product of pair-wise S-gene terms:

$$L'_e = \prod_{A,B \in S} \sum_{Y_{eA}, Y_{eB}} P(Y_{eA}, Y_{eB} \mid \phi_{AB}, \theta_{eAB})\, P(X_{eA} \mid Y_{eA})\, P(X_{eB} \mid Y_{eB}) \qquad (8)$$

• φ_AB: the S-gene interaction
• θ_eAB: attachment of gene e with respect to A or B

where θ_eAB represents the attachment of E-gene e relative to the pair of S-genes A and B. Note that both θ_eAB and φ_AB are indexed by the unordered pair, {A, B}, so that φ_AB and φ_BA are references for the same variable. We refer to θ_eAB as e's local attachment [...]. φ_AB defines the mode of interaction between S-genes A and B. Assuming the replicates are independent given the E-gene states, P(X_eA | Y_eA) can be written as a product over replicate terms:

$$P(X_{eA} \mid Y_{eA}) = \prod_{r \in R_A} P(X_{eAr} \mid Y_{eA})$$

where P(X_eAr | Y_eA) is modeled with a Gaussian distribution having mean μ·Y_eA and standard deviation σ estimated from the data (see Text S1).
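
The replicate likelihood just described can be sketched as follows. The Gaussian form with mean μ·Y_eA and standard deviation σ comes from the text; the default values `mu=1.5` and `sigma=0.5`, and the helper names, are assumptions chosen only so the example runs (the source estimates these parameters from data, see Text S1).

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_obs_likelihood(replicates, y, mu=1.5, sigma=0.5):
    """log P(X_eA | Y_eA = y): sum of per-replicate Gaussian log densities.

    y is the hidden state in {-1, 0, +1}; each replicate has mean mu * y.
    mu and sigma are illustrative values (assumptions).
    """
    return sum(gaussian_logpdf(x, mu * y, sigma) for x in replicates)


def most_likely_state(replicates, mu=1.5, sigma=0.5):
    """Hidden state that best explains the replicate measurements."""
    return max((-1, 0, 1), key=lambda y: log_obs_likelihood(replicates, y, mu, sigma))


print(most_likely_state([1.4, 1.7]))    # replicates well above zero
print(most_likely_state([-1.2, -1.6]))  # replicates well below zero
```

Working in log space turns the product over replicates into a sum, which avoids numerical underflow when many replicates or many E-genes are combined.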
Now the joint can be written in a more tractable way
Substituting L'_e for L_e into Eq. (7) and distributing the maximization over attachment points, we obtain the maximizing function used in our approach:

$$J(X) = \max_{\Phi} \left\{ P(\Phi) \prod_{e \in E,\; A,B \in S} \max_{\theta_{eAB}} \sum_{Y_{eA}, Y_{eB}} P(Y_{eA}, Y_{eB} \mid \phi_{AB}, \theta_{eAB})\, P(X_{eA} \mid Y_{eA})\, P(X_{eB} \mid Y_{eB}) \right\} \qquad (9)$$

• Each of these conditional distributions will correspond to a factor

Defining the factors
• P(X_eA | Y_eA), P(X_eB | Y_eB): modeled as Gaussian distributions
• P(Y_eA, Y_eB | φ_AB, θ_eAB): a four-variable factor over discrete variables
  – Y_eA, Y_eB: discrete effect states in {-1, 0, +1}
  – φ_AB: four values, one for each possible type of interaction: inhibitory, activating, equivalent, no interaction
  – θ_eAB: interaction of e with A or B: inhibited or activated by A or B, or no action
• This factor has value 1 if the E-gene e is attached to either A or B and e's state is consistent with the interaction mode between A and B

The interaction factors P(Y_eA, Y_eB | φ_AB, θ_eAB) have a value of one if the E-gene e is attached to either A or B and e's state is consistent with the interaction mode between A and B. If e's state is inconsistent with the interaction and attachment, then the factor is zero. While we used hard constraints to model consistent and inconsistent expression changes (corresponding to the [...]

[...] should exhibit transitivity to force pair-wise interaction modes to be consistent among all triples. Using transitivity, all paths between any two genes, A and B, are guaranteed to have the same overall effect; i.e. the product of the signs of individual links along different paths between A and B are equal.
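
The sign rule stated above (the product of link signs along different paths between two genes must agree) can be checked with a small sketch. This is a deliberately simplified, hypothetical version: links carry only a sign of +1 (activating), -1 (inhibiting), or 0 (no link), and the equivalence mode and the full definition from Text S1 are not modeled.

```python
from itertools import permutations


def consistent_triple(ab, bc, ac):
    """Simplified transitivity rule on link signs (an assumption, not Text S1).

    If A->B and B->C both exist, A->C must exist with sign ab * bc.
    Triples without a two-step path are left unconstrained.
    """
    if ab != 0 and bc != 0:
        return ac == ab * bc
    return True


def is_transitive(phi):
    """phi maps ordered pairs (source, target) to a sign; missing pairs mean 0."""
    genes = {g for pair in phi for g in pair}
    return all(
        consistent_triple(
            phi.get((a, b), 0), phi.get((b, c), 0), phi.get((a, c), 0)
        )
        for a, b, c in permutations(genes, 3)
    )


# A activates B, B inhibits C, so the direct link A->C must be inhibitory.
print(is_transitive({("A", "B"): 1, ("B", "C"): -1, ("A", "C"): -1}))
print(is_transitive({("A", "B"): 1, ("B", "C"): -1, ("A", "C"): 1}))
```

A hard check like this corresponds to a factor that assigns zero to intransitive triples; a network that fails it would receive zero prior probability.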
The prior over S-gene graph
• The prior P(Φ) can incorporate prior knowledge of interactions among genes in pathways
• At its simplest, it should encode a transitivity relationship to force all pairwise interactions to be consistent among all triples

In order to preserve the transitivity of identified interaction modes, the prior is decomposed over interaction configurations into transitivity constraints on all triples of S-genes; i.e.:

$$P(\Phi) \propto \prod_{A,B,C \in S} \tau_{ABC}(\phi_{AB}, \phi_{BC}, \phi_{AC}) \prod_{A,B \in S} \rho_{AB}(\phi_{AB})$$

• τ_ABC: transitivity constraint for triples
• ρ_AB: physical network constraints

where τ is zero if the triple of interactions is intransitive, and [...] if the interactions are transitive (see Text S1 for the full definition). Using the transitivity constraints forces the search to find [...] models that best explain the observed changes.
• Example of transitivity: if A->B and B->C, then A->C

Factor graph representation of NEMs
Figure 2. Structure of the factor graph for network inference. The factor graph consists of three classes of variables (circles) and three [...]

Inference on the factor graph
• Find the most likely configurations for [...]
• Use a message passing algorithm (standard for factor graphs)
• Called the Max-Product algorithm
• Message passing happens in two steps
  – Messages are passed from observations X_eA to the [...]
  – Messages are passed between the interaction and transitivity factors until convergence

Does FG-NEM capture activating and inhibitory relationships?
To make the comparison of FG-NEM to uFG-NEM fair, we measured network recovery in two ways. 1) We calculated a measure of structure recovery: a predicted interaction was called correct if it matched an interaction (of either sign) in the simulated [...]
• FG-NEM: captures inhibitory and activating relationships
• uFG-NEM: captures only unsigned interactions
• FG-NEM AVT: FG-NEM run on absolute value data
• Solid lines: structure recovery
• Dashed lines: sign recovery
[...] linearly with increasing fraction of inhibition. [...] results, we expect FG-NEMs to have [...] performance on real genetic networks where [...] amounts of inhibition exist (see Figure S1).

Does FG-NEM expand pathways better than the baseline approach?
• Right shift in percentile rank difference of NEM methods

Pathway expansion
• Attach new E-genes to the S-gene network
• Attaching a gene e to S-gene s asserts that e is directly downstream of s
• All E-genes attached to the S-gene network are called frontier genes
• An E-gene's connectivity is examined based on the Log-likelihood Attachment Ratio (LAR)

[...] pair-wise attachments for a single E-gene connection variable θ_eAB provide local "best guesses" for e's attachment. Rather than aggregate e's collection of local attachments, we use NEM scoring, modified to incorporate both stimulatory and inhibitory attachments, to estimate the attachment point using the full network learned in the previous step (see Text S1). We calculate a log-likelihood ratio that measures the degree to which e's expression data is explained by the network if it is attached to one of the S-genes compared to being disconnected from the network, i.e. its likelihood was generated entirely by the background Gaussian distribution. For E-gene e, we compute the log-likelihood of attachment ratio (LAR):

$$\mathrm{LAR}(e) = \log \left( \frac{\max_{i \neq 0} P(X_e \mid \Phi, \theta_e = i)}{P(X_e \mid \Phi, \theta_e = 0)} \right)$$

where the numerator maximizes over attachment to one of the S-genes, and θ_e here represents Markowetz et al.'s attachment parameter expanded to include inhibitory and stimulatory attachments.
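
A rough sketch of the LAR computation follows. It makes several simplifying assumptions that are not in the source: the hidden states implied by each candidate attachment are supplied directly (rather than derived from Φ and summed over), each knock-down has a single measurement, and the Gaussian parameters `mu` and `sigma` are made-up values. The attachment labels such as `"A+"` are hypothetical names.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def attachment_loglik(x_e, expected, mu=1.5, sigma=0.5):
    """Log-likelihood of E-gene data given expected states per knock-down.

    x_e maps each S-gene knock-down to the observed expression change.
    expected maps each knock-down to the implied state in {-1, 0, +1}.
    """
    return sum(gaussian_logpdf(x_e[s], mu * expected[s], sigma) for s in x_e)


def lar(x_e, expected_by_attachment, mu=1.5, sigma=0.5):
    """Best attached log-likelihood minus the disconnected (background) one."""
    background = {s: 0 for s in x_e}
    best = max(
        attachment_loglik(x_e, expected, mu, sigma)
        for expected in expected_by_attachment.values()
    )
    return best - attachment_loglik(x_e, background, mu, sigma)


# Hypothetical candidates: states expected if e is activated by A (with A
# inhibiting B's targets) versus activated by B only.
candidates = {"A+": {"A": 1, "B": -1}, "B+": {"A": 0, "B": 1}}
print(lar({"A": 1.4, "B": -1.6}, candidates))
print(lar({"A": 0.05, "B": -0.1}, candidates))
```

A positive LAR means some attachment explains the E-gene's data better than the background model, so the gene is a candidate for pathway expansion; a negative LAR means the gene looks disconnected.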
We FG-NEM based pathway expansion in yeast Factor Graph Nested Effects Mode Template matching: rank E genes based on similarity in expression to an “idealized template” FG-NEM infers a more accurate network than the unsigned version in yeast • FG-NEM and uFG-NEM networks inferred in the ion-homeostasis pathway • FG-NEM inferred more genes associated with ion homeostatis compared to uFG-NEM FG-NEM application to colon cancer Factor Graph Nested Effects Model Most interactions in S-gene network are activating Novel expanded genes that have significant effect on the invasive phenotype Figure 5. Invasive colon cancer network predictions. (A) Expression changes of selected E-genes following targeted S-gene knock-downs in Summary • FG-NEMs: A general approach to infer an ordering of genes from knock-down phenotypes • Strengths – FG-NEMs could be used in an iterative computationalexperimental framework – Handles signed interactions between S-genes • Weaknesses – Computational complexity of the inference procedure might be high • Required independence among E-genes • Model pairs of S-genes at a time Concluding remarks • We have seen a suite of problems, algorithms and applications in a real setting • These ranged from network inference, dynamic network inference, network modules, network alignment and network-based interpretation • What we did not see (or saw less of) – Integration of different types of networks – Experimental design for better learning of networks – More topological properties of networks – Subnetwork network analysis • If you remain interested in these topics or would like to learn more, send me an email