Gene and Protein Networks
Monday, April 10, 2006
CSCI 7000-005: Computational Genomics
Debra Goldberg

What is a network?
• A collection of objects (nodes, vertices)
• Binary relationships (edges)
• May be directed
• Also called a graph

Networks are everywhere

Social networks
Nodes: People
Edges: Friendship
from www.liberality.org

Sexual networks
Nodes: People
Edges: Romantic and sexual relations

Transportation networks
Nodes: Locations
Edges: Roads

Power grids
Nodes: Power stations
Edges: High-voltage transmission lines

Airline routes
Nodes: Airports
Edges: Flights

Internet
Nodes: MBone routers
Edges: Physical connections

Internet
Nodes: Autonomous systems
Edges: Physical connections

World-Wide-Web
Nodes: Web documents
Edges: Hyperlinks

Gene and protein networks

Metabolic networks
Nodes: Metabolites
Edges: Biochemical reaction (enzyme)
• Drug targets predicted
from web.indstate.edu and www.bact.wisc.edu

Protein interaction networks
Nodes: Proteins
Edges: Observed interaction
• Gene function predicted
from www.embl.de

Gene regulatory networks
Nodes: Genes or gene products
Edges: Regulation of expression
• Inferred from error-prone gene expression data
from Wyrick et al. 2002

Signaling networks
Nodes: Molecules (e.g., proteins or neurotransmitters)
Edges: Activation or deactivation
from pharyngula.org and www.life.uiuc.edu

Synthetic sick or lethal (SSL)
[Figure: four panels showing genes X and Y. Wild type: cells live. Either single mutant: cells live. Double mutant: cells die or grow slowly.]

SSL networks
Nodes: Nonessential genes
Edges: Genes co-lethal
from Tong et al.
2001
• Gene function, drug targets predicted

Other biological networks
• Coexpression
  – Nodes: genes
  – Edges: transcribed at same times, conditions
• Gene knockout / knockdown
  – Nodes: genes
  – Edges: similar phenotype (defects) when suppressed

What they really look like…
We need models!

Traditional graph modeling
from GD2002
[Figure panels: Random; Regular]

Introduce small-world networks

Small-world networks
• Six degrees of separation
• 100 to 1000 friends each
• Six steps: $100^6 = 10^{12}$ to $1000^6 = 10^{18}$
• But… we live in communities

Small-world measures
• Typical separation between two vertices
  – Measured by characteristic path length
• Cliquishness of a typical neighborhood
  – Measured by clustering coefficient
[Figure: two example neighborhoods of a vertex v, with $C_v = 1.00$ and $C_v = 0.33$]

Watts-Strogatz small-world model

Measures of the W-S model
• Path length drops faster than cliquishness
• Wide range of p has both small-world properties

Small-world measures of various graph types
Graph type          Cliquishness   Characteristic path length
Regular graph       High           Long
Random graph        Low            Short
Small-world graph   High           Short

Another network property: Degree distribution P(k)
• The degree (notation: k) of a node is the number of its neighbors
• The degree distribution is a histogram showing the frequency of nodes having each degree

Degree distribution of E-R random networks
Erdős-Rényi random graphs
• Binomial degree distribution, well-approximated by a Poisson
[Plot: P(k) versus degree k]
Network figures from Strogatz, Nature 2001

Degree distribution of many real-world networks
Scale-free networks
• Degree distribution follows a power law, $P(k = x) \propto x^{-\gamma}$ for some exponent $\gamma > 0$
[Plots: P(k) versus degree k on linear axes, and log P(k) versus log k]

Hierarchical networks
Ravasz, et al., Science 2002

Properties of hierarchical networks
1. Scale-free
2. Clustering coefficient independent of N
3.
Scaling clustering coefficient (DGM)

C of 43 metabolic networks
• Independent of N
Ravasz, et al., Science 2002

Scaling of the clustering coefficient C(k)
• Metabolic networks
Ravasz, et al., Science 2002

Many real-world networks are small-world, scale-free
• World-wide-web
• Collaboration of film actors (Kevin Bacon)
• Mathematical collaborations (Erdős number)
• Power grid of US
• Syntactic networks of English
• Neural network of C. elegans
• Metabolic networks
• Protein-protein interaction networks

There is information in a gene’s position in the network
We can use this to predict
• Relationships
  – Interactions
  – Regulatory relationships
• Protein function
  – Process
  – Complex / “molecular machine”

Confidence assessment
• Traditionally, biological networks determined individually
  – High confidence
  – Slow
• New methods look at entire organism
  – Lower confidence (50% false positives)
• Inferences made based on this data

Confidence assessment
• Can use topology to assess confidence if true edges and false edges have different network properties
• Assess how well each edge fits topology of true network
• Can also predict unknown relations
Goldberg and Roth, PNAS 2003

Use clustering coefficient, a local property
• Number of triangles = $|N(v) \cap N(w)|$
• Normalization factor?
N(x) = the neighborhood of node x
[Figure: nodes v and w with common neighbors x, y, ...]

Mutual clustering coefficient
Jaccard index: $\frac{|N(v) \cap N(w)|}{|N(v) \cup N(w)|}$
Meet / Min: $\frac{|N(v) \cap N(w)|}{\min(|N(v)|, |N(w)|)}$
Geometric: $\frac{|N(v) \cap N(w)|^2}{|N(v)| \cdot |N(w)|}$
Hypergeometric: a p-value

Mutual clustering coefficient
Hypergeometric: $-\log P(\text{intersection at least as large by chance})$
[Figure legend: neighbors of node v; neighbors of node w; nodes in graph]

Prediction
• A v-w edge would have a high clustering coefficient

Confidence assessment
• Integrate experimental details with local topology
  – Degree
  – Clustering coefficient
  – Degree of neighbors
  – Etc.
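
The four mutual clustering coefficient variants named above (Jaccard, meet/min, geometric, hypergeometric) can be sketched in a few lines. This is a minimal illustration and not the implementation from Goldberg and Roth: the function name `mutual_clustering`, the dictionary-of-sets graph representation, the choice to leave v and w out of each other's neighborhoods, and the toy graph are all assumptions made for the example. The hypergeometric score is taken here as the negative log of the upper tail of a hypergeometric distribution over the node count of the graph.

```python
from math import comb, log

def mutual_clustering(adj, v, w):
    """Score how strongly the neighborhoods of v and w overlap.

    adj maps each node to the set of its neighbors (undirected graph).
    Assumption: v and w are left out of each other's neighborhoods, so the
    score does not depend on whether the v-w edge itself was observed.
    """
    nv = adj[v] - {w}
    nw = adj[w] - {v}
    total = len(adj)            # number of nodes in the graph
    a, b = len(nv), len(nw)
    shared = len(nv & nw)       # |N(v) ∩ N(w)|, the number of triangles
    union = len(nv | nw)        # |N(v) ∪ N(w)|

    jaccard = shared / union if union else 0.0
    meet_min = shared / min(a, b) if min(a, b) else 0.0
    geometric = shared ** 2 / (a * b) if a * b else 0.0

    # Probability that two random neighborhoods of sizes a and b, drawn from
    # `total` nodes, share at least `shared` members (hypergeometric tail).
    p = sum(comb(a, i) * comb(total - a, b - i)
            for i in range(shared, min(a, b) + 1)) / comb(total, b)

    return {
        "jaccard": jaccard,
        "meet_min": meet_min,
        "geometric": geometric,
        "hypergeometric": -log(p),
    }

# Hypothetical toy graph: v and w share the neighbors x and y.
toy = {
    "v": {"x", "y", "z"},
    "w": {"x", "y"},
    "x": {"v", "w"},
    "y": {"v", "w"},
    "z": {"v"},
    "u": set(),
}

if __name__ == "__main__":
    for name, score in mutual_clustering(toy, "v", "w").items():
        print(f"{name}: {score:.3f}")
```

A pair with a high score under any of these normalizations is a candidate for a true (or not yet observed) v-w edge; which normalization works best is an empirical question that this sketch does not address.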
Bader, et al., Nature Biotechnology 2003

The synthetic lethal network has many triangles
Xiaofeng Xin, Boone Lab

2-hop predictors for SSL
• SSL – SSL (S-S)
• Homology – SSL (H-S)
• Co-expressed – SSL (X-S)
• Physical interaction – SSL (P-S)
• 2 physical interactions (P-P)
S: Synthetic sickness or lethality (SSL)
H: Sequence homology
X: Correlated expression
P: Stable physical interaction
Wong, et al., PNAS 2004

Multi-color motifs
[Figure: example network motifs (Hir1, Hir2, Hhf1, Hht1) and network themes (Hir1, Hir2, Hhf1, Hht1, Hhf2, Hht2, Hta1, Hta2, Htb1, Htb2). Two motif counts are shown: Nreal: 5.6×10^3 and 1.5×10^4; Nrand: (4.3±0.5)×10^2 and (3.6±0.2)×10^3.]
S: Synthetic sickness or lethality
H: Sequence homology
X: Correlated expression
P: Stable physical interaction
R: Transcriptional regulation
Zhang, et al., Journal of Biology 2005

SSL “hubs” might be good cancer drug targets
[Figure labels: Normal cell: Alive. Cancer cells w/ random mutations: Dead, Dead.]
(Tong et al., Science 2004)

Predict protein function from function of neighboring proteins
• “Guilt by association”
• Consider immediate neighbors
  – Schwikowski, et al., Nature Biotechnology 2001
• Consider a given radius
  – Hishigaki, et al., Yeast 2001

Predict protein function from neighboring proteins (2)
• Minimize interactions between proteins with different annotations
  – Vazquez, et al., Nature Biotechnology 2003
  – Karaoz, et al., PNAS 2004
• Use network flow algorithm to “transport” function annotation
  – Nabieva, et al., Bioinformatics 2005

Lethality
• Hubs are more likely to be essential
Jeong, et al., Nature 2001

Degree anti-correlation
• Few edges directly between hubs
• Edges between hubs and low-degree genes
are favored
Maslov and Sneppen, Science 2002

Beware of bias

Protein abundance
• Abundant proteins are
  – More likely to be represented in some types of experiments
  – More likely to be essential
• Correlation between degree (hubs) and essentiality disappears or is reduced when corrected for protein abundance
Bloom and Adami, BMC Evolutionary Biology 2003

Degree correlation
• Anti-correlation of degrees of interacting proteins disappears in unbiased data
[Plot: average degree K1 versus degree k, for essential and non-essential proteins]
Coulomb, et al., Proceedings of the Royal Society B 2005

Community structure
Partitioning methods

Community structure
• Proteins in a community may be involved in a common process or function

Finding the communities
• Hierarchical clustering
• “Betweenness” centrality
• Dense subgraphs
• Similar subgraphs
• Spectral clustering
• Party and date hubs

Hierarchical clustering (1)
Using natural edge weights
• Gene co-expression
• e.g., Eisen MB, et al., PNAS 1998
from www.medscape.com

Hierarchical clustering (2)
Topological overlap
• A measure of neighborhood similarity
$l_{i,j}$ is 1 if there is a direct link between i and j, 0 otherwise
Ravasz, et al., Science 2002

Hierarchical clustering (3)
Adjacency vector
• Function cluster: Tong et al., Science 2004
• Find drug targets: Parsons et al., Nature Biotechnology 2004

“Betweenness” centrality
• Consider the shortest path(s) between all pairs of nodes
• “Betweenness” centrality of an edge is a measure of how many shortest paths traverse this edge
• Edges between communities have higher centrality
Girvan, et al., PNAS 2002

Dense subgraphs
• Spirin and Mirny, PNAS 2003
  – Find fully connected subgraphs (cliques), OR
  – Find subgraphs that maximize density: $2m / (n(n-1))$
• Bader and Hogue, BMC Bioinformatics 2003
  – Weight vertices by neighborhood density, connectedness
  – Find connected communities with high weights

Similar subgraphs
• Across species
• Interaction network and genome sequence
• e.g., Ogata, et al., Nucleic Acids Research 2000

Spectral clustering
• Compute adjacency matrix eigenvectors
• Each eigenvector defines a cluster:
  – Proteins with high magnitude contributions
[Figure panels: positive eigenvalue; negative eigenvalue]
Bu, et al., Nucleic Acids Research 2003

Party and date hubs
• Protein interaction network
• Partition hubs by expression correlation of neighbors
Han, et al., Nature 2004

Network connectivity
• Scale-free networks are:
  – Robust to random failures
  – Vulnerable to attacks on hubs
• Removing hubs quickly disconnects a network and reduces the size of the largest component
Albert, et al., Nature 2000

Removing date hubs shatters network into communities
[Figure labels: Date hubs; Many sub-networks; A single main component]

Temporal partitioning
Luscombe, et al., Nature 2004

Final words
• Network analysis has become an essential tool for analyzing complex systems
  – There is still much biologists can learn from scientists in other disciplines
• The references mentioned are representative, and not comprehensive
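
The claim from the network connectivity slide (robust to random failures, vulnerable to attacks on hubs) can be explored with a small simulation. The sketch below is an illustration under stated assumptions, not the analysis of Albert et al.: it grows a toy graph by preferential attachment as a stand-in for a scale-free network, and the function names (`preferential_attachment`, `largest_component`, `top_hubs`), the graph size, and the number of removed nodes are all hypothetical choices.

```python
import random
from collections import deque

def preferential_attachment(n, m, seed=0):
    """Grow an undirected graph on n nodes (assumes n > m): each new node
    links to m distinct existing nodes chosen with probability proportional
    to their current degree."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    # Seed the growth with a small clique on m + 1 nodes.
    for i in range(m + 1):
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears in `ends` once per incident edge, so a uniform
    # choice from `ends` is a degree-proportional choice of node.
    ends = [u for u in range(m + 1) for _ in adj[u]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            ends += [new, t]
    return adj

def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component after deleting `removed`."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:                      # breadth-first search
            u = queue.popleft()
            size += 1
            for nb in adj[u]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

def top_hubs(adj, count):
    """The `count` highest-degree nodes."""
    return set(sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:count])

if __name__ == "__main__":
    g = preferential_attachment(500, 2)
    k = 25  # remove 5% of the nodes in each scenario
    failures = set(random.Random(1).sample(sorted(g), k))
    print("intact:         ", largest_component(g))
    print("random failures:", largest_component(g, failures))
    print("hub attack:     ", largest_component(g, top_hubs(g, k)))
```

In runs of this kind the hub attack typically leaves a smaller largest component than the same number of random failures, but the size of the gap depends on n, m, and the fraction of nodes removed.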