Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
CSCE555 Bioinformatics Lecture 18 Network Biology: Comparison of Networks Across Species Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: http://www.scigen.org/csce555 University of South Carolina Department of Computer Science and Engineering 2008 www.cse.sc.edu. In the beginning there was DNA… Liolios K, Tavernarakis N, Hugenholtz P, Kyrpides, NC. The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. NAR 34, D332-334 …then came protein interactions Arabidopsis PPI network E. Coli PPI network Yeast PPI network Comparative Genomics to Comparative Interactomics Evolutionary conservation implies functional relevance ◦ Sequence conservation implies functional conservation ◦ Network conservation implies functional conservation too! What new insights might we gain from network comparisons? (Why should we care?) Network comparisons allow us to: Identify conserved functional modules Query for a module, ala BLAST Predict functions of a module Predict protein functions Validate protein interactions Predict protein interactions Only possible with network comparisons Possible with existing techniques, but improved with network comparisons What is a Protein Interaction Network? Proteins are nodes Interactions are edges Edges may have weights Yeast PPI network H. Jeong et al. Lethality and centrality in protein networks. Nature 411, 41 (2001) The Network Alignment Problem Given k different protein interaction networks belonging to different species, we wish to find conserved subnetworks within these networks Conserved in terms of protein sequence similarity (node similarity) and interaction similarity (network topology similarity) Example Network Alignment Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006 General Framework For Network Alignment Algorithms Network construction Alignment algorithm Scoring function Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006 Building Co-expression Networks Gene B Gene A Arrays Gene C Pearson Correlation Expression Genes Gene A Gene B Gene C 1 .8 1 .8 -.7 -.6 1 = -.7 -.6 Microarray data Balaji S. Srinivasan Two Algorithms NetworkBLAST (covered today) Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS, 102(6):1974-1979, 2005. Græmlin Flannick et al. Græmlin: General and robust alignment of multiple large interaction networks. Genome Res 16: 1169-1181, 2006. Overview of Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS, 102(6):1974-1979, 2005. Estimation of Interaction Probabilities In the preprocessing step, edges in the network are given a reliability score using a logistic regression model based on three features: 1. 2. 3. Number of times an interaction was observed Pearson correlation coefficient between expression profiles Proteins’ small world clustering coefficient Network Alignment Graphs Construct a Network Alignment Graph to represent the alignment Nodes contain groups of sequence similar proteins from the k organisms Edges represent conserved interactions. An edge between two nodes is present if: 1. 2. 3. One pair of proteins directly interacts, the rest are distance at most 2 away All protein pairs are of distance exactly 2 At least max(2, k – 1) protein pairs directly interact Tries to account for interaction deletions Example Network Alignment Graph b a b’’ Species X Species Y Species Z b’ a’’ b’ b a’ a a’ a’’ Nodes c c’’ c c’ Network alignment graph b’’ c’ c’’ Individual species’ PPI network Scoring Function Sharan et al. devise a scoring scheme based on a likelihood model for the fit of a single sub-network to the given structure High scoring subgraphs correspond to structured sub-networks (cliques or pathways) Only network topology is scored, node similarity is not Log Likelihood Ratio Model Measures the likelihood that a subgraph occurs if it is a conserved network vs. that if it were a randomly constructed network Pr(Subgraph occurs | Conserved Network) log Pr(Subgraph occurs | Random Network) Randomly constructed network preserves degree distribution for nodes Log Likelihood Ratio Model (i) in a real subnetwork,each interaction should be present independently with high probability,and (ii) in a random subnetwork, the probability of an interaction between any two proteins depends on their total number of connections in the network. Likelihood Ratio Scoring of a Protein Complex in a Single Species Probability of complex being observed in a conserved network model Probability of subgraph being observed in a random network model U : a subset of vertices (proteins) in the PPI graph OU : collection of all observations on vertex pairs in U Ouv : interaction between proteins u, v observed Ms : conserved network model Mn: random network (null) model Tuv : proteins u, v interact Fuv : proteins u, v do not interact β : probability that proteins u, v interact in conserved model puv : probability that edge u, v exists in a random model Likelihood Ratio Scoring of a Protein Complex in a Single Species Hence, log likelihood for a complex occurring in a single species is given by For multiple complexes across different species, it is the sum of the log likelihoods L(A, B, C) = L(A) + L(B) + L(C) Example of Complex Scoring b a b’’ Complex X1 in Species X Complex Y1 in Species Y Complex Z1 in Species Z b’ a’’ b’ b a’ a a’ a’’ Nodes c b’’ c’ c’’ Individual species’ PPI network c’’ c c’ Conserved complex A in the Network alignment graph L(A) = L(X1) + L(Y1) + L (Z1) Alignment algorithm Problem of identifying conserved subnetworks reduces to finding high scoring subgraphs NP-complete problem Heuristic solution: ◦ Greedy extension of high scoring seeds ◦ (Does this sound familiar? BLAST?) ◦ Common to both papers discussed Alignment algorithm 1. Find seeds for each node v in the alignment graph a. Find high scoring paths of 4 nodes by exhaustive search b. Greedily add 3 other nodes one by one, that maximally increase the score of the seed Alignment algorithm 2. Iteratively add or remove nodes to increase the overall score of the node Original seeds are preserved Limit size of discovered subgraphs to 15 nodes Record up to 4 highest scoring subgraphs discovered around each node Alignment algorithm 3. Filter subgraphs with a high degree of overlap Iteratively find high scoring subgraph and remove all highly overlapping ones remaining Results Conserved network regions within yeast (orange ovals), fly (green rectangles) and worm (blue hexagons) PPI networks. Results Prediction of protein function • ‘Guilt by association’ • If a conserved cluster or path is significantly enriched in a functional annotation Prediction of protein interactions Predictions based on 2 strategies: • Evidence that proteins with similar sequences interact • Co-occurrence of proteins in the same conserved cluster or path • Experimental verification of Yeast interactions using Y2H yielded 40-62% success rate Overview of Fast, scalable, network alignment ◦ Scales linearly in number of networks compared ◦ NetworkBLAST scales exponentially Supports efficient querying of modules Speed-sensitivity control via user defined parameter ◦ Not supported in NetworkBLAST Input to the Algorithm Weighted protein interaction graphs ◦ Weights represent probability that proteins interact ◦ Constructed via network integration algorithm A phylogenetic tree relating the species in the desired alignment ◦ Used for progressive alignment Key Ideas of Graimin Generating An Initial Alignment From The Seed Greedy Seed Extension Phase Progressive alignment technique using the phylogenetic tree Results Functional module identification using network alignment Results Multiple alignment of 10 networks showing possible cell division module Functional annotation using network alignment The Future of Network Comparison Græmlin? Græmlin Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006 Summary The problem: Network comparison/comparative interactomes NetworkBlast algorithm Brief introduction fo The analogy between sequence comparison and network comparison Reference & Acknowledgements Chuan Sheng Foo Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS. February 8, 2005 | vol. 102 | no. 6 | 1974-1979