Biological networks: Global network properties
Bing Zhang
Department of Biomedical Informatics
Vanderbilt University
[email protected]

Lectures on biological networks
  Global network properties (11/9)
  Motifs and modules (11/18)
  Network construction (11/20)
  Network-based applications (11/30)
BMIF310, Fall 2009

Cell as a system
  Metabolic network
  Signaling network
  Transcriptional regulatory network
  Gene co-expression network
  Protein interaction network

Genome-wide protein interaction networks
  Saccharomyces cerevisiae: Uetz et al. Nature, 403:623, 2000; Ito et al. PNAS, 97:1143, 2000
  Drosophila melanogaster: Giot et al. Science, 302:1727, 2003
  Caenorhabditis elegans: Li et al. Science, 303:540, 2004
  Homo sapiens: Rual et al. Nature, 437:1173, 2005; Stelzl et al. Cell, 122:957, 2005

Graph
  Graph: a graph is a set of objects called nodes or vertices connected by links called edges. In mathematics and computer science, a graph is the basic object of study in graph theory.
  Figure labels: node, edge. Example: RNA polymerase II (Cramer et al. Science 292:1863, 2001)

Undirected graph vs directed graph
  Protein interaction network
    Nodes: proteins
    Edges: physical interaction
    Undirected
    Krogan et al. Nature 440:637, 2006
  Metabolic network
    Nodes: metabolites
    Edges: enzymes
    Directed (Substrate->Product)
    Ravasz et al. Science 297:1551, 2002
  Transcriptional regulatory network
    Nodes: transcription factors and genes
    Edges: transcriptional regulation
    Directed (TF->target gene), e.g. Fhl1->RPL2B
    Lee et al. Science 298:799, 2002

Unweighted graph vs weighted graph
  Protein interaction network
    Nodes: proteins
    Edges: physical interaction
    Unweighted
    Krogan et al. Nature 440:637, 2006
  Gene co-expression network
    Nodes: genes
    Edges: co-expression level
    Weighted
    After correlation filtering (>0.8): unweighted

Graph representation
  Adjacency matrix
  Adjacency list
  Space tradeoff
  Operation tradeoff

Node degree
  Degree: the number of edges adjacent to a vertex.
  YDL176W (protein interaction network): degree 3
  dTMP (metabolic network): in degree 3, out degree 2
  Fhl1 (transcriptional regulatory network): out degree 4, in degree 0
  RPL2B (transcriptional regulatory network): out degree 0, in degree 3

Degree distribution
  Node      Degree
  Vid30     10
  Fyv10     5
  YMR135C   4
  Vid24     3
  Vid28     3
  YDL176W   3
  Ald6      1
  YDR255C   1
  Sod1      1
  YMR093W   1
  Hta2      1
  Vma2      1
  RPL23A    1
  YCL039W   1

Random network: Erdős-Rényi model
  Model
    G(n,p): nodes are connected randomly to each other with probability p.
    Example: p = 1/6; n = 10; <k> = 1.5
  Characteristic
    Degree distribution: binomial, K ~ B(n-1, p)
  Binomial distribution
    $$P(K_i = k) = \binom{n-1}{k} p^{k} (1-p)^{n-1-k} \approx \binom{n}{k} p^{k} (1-p)^{n-k}$$
  Average degree
    $$\langle k \rangle = (n-1)p \approx np$$

Web documents network
  Nodes: WWW documents
  Edges: URL links
  Data: 800 million documents
  Network construction: collects all URLs found in a document and follows them recursively
  Albert et al. Nature 401:130, 1999

Degree distribution of the web documents network
  What was expected?
    $\langle k \rangle \sim 6$
    $P(k=500) \sim 10^{-99}$
    $N_{WWW} \sim 10^{9} \Rightarrow N(k=500) \sim 10^{-90}$
  What was found
    $P(k=500) \sim 10^{-6}$
    $N(k=500) \sim 10^{3}$
  Kleinberg et al. Proceedings of ICCC, 1999

Scale-free network
  The straight line on the log-log plot is the signature of a power law.
  Scale-free network: networks whose degree distribution follows the power-law.
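
The degree distribution and the Erdős-Rényi binomial formula described above can be checked with a few lines of Python. This is only a minimal sketch, not code from the lecture: the function names are hypothetical, the degree values are copied from the node degrees listed above, and the random-network parameters are the p = 1/6, n = 10 example.

```python
from collections import Counter
from math import comb

# Node degrees copied from the degree distribution listing in the slides.
degrees = {
    "Vid30": 10, "Fyv10": 5, "YMR135C": 4, "Vid24": 3, "Vid28": 3,
    "YDL176W": 3, "Ald6": 1, "YDR255C": 1, "Sod1": 1, "YMR093W": 1,
    "Hta2": 1, "Vma2": 1, "RPL23A": 1, "YCL039W": 1,
}


def degree_distribution(node_degrees):
    """Return {k: fraction of nodes whose degree is k} (hypothetical helper)."""
    counts = Counter(node_degrees.values())
    n_nodes = len(node_degrees)
    return {k: counts[k] / n_nodes for k in sorted(counts)}


def binomial_degree_pmf(n, p, k):
    """P(K = k) for one node in G(n, p): k of the other n - 1 nodes are linked."""
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def expected_degree(n, p):
    """Average degree <k> = (n - 1) p of a G(n, p) random network."""
    return (n - 1) * p


observed = degree_distribution(degrees)

# Erdős-Rényi example from the slides: n = 10 nodes, p = 1/6.
n, p = 10, 1 / 6
er_pmf = [binomial_degree_pmf(n, p, k) for k in range(n)]

for k, fraction in observed.items():
    print(f"observed P(k={k}) = {fraction:.3f}")
print(f"expected <k> in G({n}, 1/6) = {expected_degree(n, p):.2f}")
```

Under the binomial model the probabilities fall off very quickly for degrees far above the mean, which is why a node such as Vid30 with degree 10 stands out against nodes of degree 1.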
Random network vs scale-free network
  Random network
    130 nodes, 215 edges
    Homogeneous: most nodes have approximately the same number of links
    Five red nodes with the highest number of links reach 27% of the nodes
  Scale-free network
    130 nodes, 215 edges
    Heterogeneous: the majority of the nodes have one or two links but a few nodes (hubs) have a large number of links
    Five red nodes with the highest degrees reach 60% of the nodes
  Albert et al., Nature, 406:378, 2000

Origin of scale-free networks
  Networks are the result of a growth process
  New nodes prefer to connect to nodes that already have many links, i.e. hubs (preferential attachment)
  Examples
    Social network
    Citation network

Degree distribution of metabolic networks
  A. fulgidus
  E. coli
  C. elegans
  43 organisms
  Jeong et al, Nature, 407:651, 2000

Scale-free biological networks
  Metabolic network, C. elegans: Jeong et al, Nature, 407:651, 2000
  Protein interaction network, H. sapiens: Stelzl et al. Cell, 122:957, 2005
  Gene co-expression network, S. cerevisiae: Noort et al, EMBO Reports, 5:280, 2004

Evolutionary origin of scale-free networks
  Networks are the result of a growth process
  New nodes prefer to connect to nodes that already have many links (preferential attachment)
  Growth and preferential attachment have a common origin in protein interaction networks that is probably rooted in gene duplication
  If a randomly selected protein is duplicated, highly connected proteins are more likely to have a link to the duplicated protein
  Barabasi and Oltvai, Nature Rev Genet, 5:101, 2004

Connectivity vs protein age
  Proteins in baker's yeast are divided into four groups: group 1, 872 proteins; group 2, 665 proteins; group 3, 2079 proteins; group 4, 2678 proteins
  Solid symbols: whole interaction database; open symbols: high-confidence interactions
  Older proteins (group 4) have significantly more interactions
  Eisenberg and Levanon, Phys Rev Lett, 91:138701, 2003

Scale-free topology and network robustness
  Robust against random damage
  Yet fragile against selective damage

Connectivity vs protein lethality
  Red, lethal; green, non-lethal; orange, slow growth; yellow, unknown
  Pearson's correlation coefficient r = 0.75 demonstrates a positive correlation between lethality and connectivity
  Jeong et al, Nature, 411:41, 2001

Path and shortest path
  Path: a sequence of nodes such that from each of its nodes there is an edge to the next node in the sequence.
  Shortest path: a path between two nodes such that the sum of the distance of its constituent edges is minimized.

Small world network
  Stanley Milgram's small world experiment
  Figure labels: Wichita, Omaha, Boston
  "If you do not know the target person on a personal basis, do not try to contact him directly. Instead, mail this folder to a personal acquaintance who is more likely than you to know the target person."
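
The path and shortest-path definitions given earlier can be made concrete with a breadth-first search. The sketch below is illustrative only: it assumes an unweighted, undirected graph in which every edge has distance 1, and the toy graph and the names `build_adjacency` and `shortest_path` are hypothetical, not taken from the lecture.

```python
from collections import deque


def build_adjacency(nodes, edges):
    """Build an adjacency list {node: [neighbors]} for an undirected graph."""
    adjacency = {node: [] for node in nodes}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    return adjacency


def shortest_path(adjacency, source, target):
    """Return one shortest path as a list of nodes, or None if unreachable.

    Breadth-first search visits nodes in order of increasing distance, so the
    first time the target is reached the path uses the fewest possible edges.
    """
    if source == target:
        return [source]
    parent = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor in parent:
                continue
            parent[neighbor] = node
            if neighbor == target:
                # Walk back through the parents to rebuild the path.
                path = [target]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical toy graph: two routes from A to D, plus an isolated node F.
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")]
adjacency = build_adjacency(nodes, edges)

print(shortest_path(adjacency, "A", "D"))
print(shortest_path(adjacency, "A", "F"))
```

For weighted graphs, where edge distances differ, breadth-first search is no longer sufficient and an algorithm such as Dijkstra's would be the usual substitute.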
  Average path length between two people
  Small world network: a graph in which most nodes can be reached from every other by a small number of steps.
  Six degrees of separation
  Social network

Metabolic networks are small world networks
  The histogram of the path lengths in the E. coli metabolic network
  The average path lengths for metabolic networks of 43 organisms with different complexity
  Biological interpretation: efficiency in transfer of biological information
  Jeong et al, Nature, 407:651, 2000

Summary
  Graph representation of biological networks
    Node, edge
    Directed/undirected, weighted/unweighted
    Degree, degree distribution
    Path, shortest path, average path length
  Properties of biological networks
    Scale-free
      Degree distribution follows the power-law
      Growth and preferential attachment
      Hubs and robustness
    Small world
      Most nodes can be reached from every other by a small number of steps
      Efficiency

Key references
  Barabasi and Oltvai, Network biology: Understanding the cell's functional organization. Nature Rev Genet, 5:101, 2004
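
The growth and preferential attachment mechanism named in the summary can be illustrated with a small simulation. This is a simplified sketch in the style of the Barabási-Albert model, not code from the lecture: the function name `grow_network`, the fully connected starting core, and the parameters `m` and `seed` are all assumptions made for the example.

```python
import random
from collections import Counter


def grow_network(n_nodes, m=2, seed=0):
    """Grow an undirected network by preferential attachment.

    Each new node attaches to m distinct existing nodes, chosen with
    probability proportional to their current degree.
    """
    rng = random.Random(seed)
    # Assumed starting point: a fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `stubs` once per edge end, so a uniform draw
    # from `stubs` selects a node in proportion to its degree.
    stubs = [node for edge in edges for node in edge]
    for new_node in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for target in sorted(targets):
            edges.append((new_node, target))
            stubs.extend((new_node, target))
    return edges


edges = grow_network(200, m=2, seed=1)
degree = Counter(node for edge in edges for node in edge)

# Nodes added early tend to accumulate many links and become hubs.
print("five highest degrees:", [d for _, d in degree.most_common(5)])
print("nodes with the minimum degree:", sum(1 for d in degree.values() if d == 2))
```

Plotting the resulting degree counts on log-log axes is one way to look for the straight-line signature of a power law, although a network of a few hundred nodes gives only a rough picture.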