Biological Networks
Feng Luo

Copyright notice
• Many of the images in this PowerPoint presentation are the work of other people. The copyright belongs to the original authors. Thanks!

Biological Networks
Biological systems: made of many non-identical elements that interact with each other in diverse ways.
Biological networks: a framework for the study of biological systems.

Why Study Networks?
• It is increasingly recognized that complex systems cannot be described in a reductionist view.
• Understanding the behavior of such systems starts with understanding the topology of the corresponding network.
• Topological information is fundamental in constructing realistic models for the function of the network.

Graph Terminology
• Node
• Edge
• Directed/Undirected
• Degree
• Shortest path/Geodesic distance
• Neighborhood
• Subgraph
• Complete graph
• Clique
• Degree distribution
• Hubs

Types of Biological Networks
• Protein interaction networks
• Gene regulatory networks
• Metabolic networks
• Gene co-expression networks
• Signal transduction networks
• Genetic interaction networks

Protein Interactions
P. Uetz et al., Nature, 2000; Ito et al., PNAS, 2001; …

Protein Interaction Network
Nodes: proteins
Links: physical interactions
(Jeong et al., 2001)

Metabolic network (KEGG)
Graph
Node: object, e.g. chemical compound
Edge: relation between objects, e.g. chemical reaction

Metabolic Network
Nodes: chemicals (substrates)
Links: chemical reactions
(Ravasz et al., 2002)

Gene Regulation
• Proteins are encoded by the DNA of the organism.
• Proteins regulate the expression of other proteins by interacting with the DNA.
[Figure: inducer (external signal), protein, DNA with promoter region (ACCGTTGCAT) and coding region]

Activators increase gene production
[Figure: activator X, signal Sx, active form X*, X binding site, gene Y. Unbound activator: no transcription. Bound activator: INCREASED TRANSCRIPTION.]

Repressors decrease gene production
[Figure: repressor X, signal Sx, active form X*, gene Y. Bound repressor: no transcription. Unbound repressor: Y is produced.]

Gene Regulatory Networks
• Nodes are proteins (or the genes that encode them)
[Figure: nodes X and Y]

The gene regulatory network of E. coli
Shen-Orr et al., Nature Genetics, 2002
• Shallow network, few long cascades
• Modular
• Compact in-degree, scale-free out-degree (promoter size limitation)

Gene regulatory networks
[Figure]

Co-Expression Network Revealed from Yeast Cell Cycle Data
Numbered modules in the figure:
1. Protein fate
2. Amino acid synthesis
3. Galactose metabolism
4. Protein glycosylation and transport
5. Amino acid metabolism
6. Mating
7. Glucogenesis
8. Unknown
9. Cell cycle regulation
10. Stress response
11. Cell differentiation
12. Protein synthesis
13. Cell wall organization
14. Energy transport
15. Ribosomal biogenesis
Other labels in the figure: Y'-cluster, Histone, Cell wall organization, Protein degradation, Mitochondrion, Ribosomal proteins
Yeast cell cycle microarray data (Spellman et al., 1998)

Signal transduction networks
(BD BioScience)

Properties of Biological Networks
• Scale free
• Small world
• Hierarchical
• Modular
• Robust
• Motif

Scale-Free Network
Degree of a node: the number of adjacent nodes (figure examples: degree = 5, degree = 2, degree = 3)
Degree distribution P(k): frequency of nodes with degree k
Scale-free network: P(k) follows a power law, $P(k) \sim k^{-\gamma}$
Different from random networks

Erdős-Rényi model (1960)
Connect with probability p
[Figure: p = 1/6, N = 10, ⟨k⟩ ~ 1.5]
Pál Erdős (1913-1996)
Poisson distribution
- Democratic
- Random

SCALE-FREE NETWORKS
(1) The number of nodes (N) is NOT fixed.
Networks continuously expand by the addition of new nodes.
Examples:
WWW: addition of new documents
Citation: publication of new papers
(2) The attachment is NOT uniform.
A node is linked with higher probability to a node that already has a large number of links.
Examples:
WWW: new documents link to well-known sites (CNN, Yahoo, New York Times, etc.)
Citation: well-cited papers are more likely to be cited again

Scale-free model
(1) GROWTH: At every timestep we add a new node with m edges (connected to the nodes already present in the system).
(2) PREFERENTIAL ATTACHMENT: The probability Π that a new node will be connected to node i depends on the connectivity $k_i$ of that node:
$\Pi(k_i) = \frac{k_i}{\sum_j k_j}$
Resulting degree distribution: $P(k) \sim k^{-3}$
A.-L. Barabási & R. Albert, Science, 1999

Metabolic network
Archaea, Bacteria, Eukaryotes
Organisms from all three domains of life are scale-free networks!
H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.-L. Barabási, Nature, 2000

Topology of the protein network
$P(k) \sim (k + k_0)^{-\gamma} \exp\left(-\frac{k + k_0}{k_c}\right)$
H. Jeong, S.P. Mason, A.-L. Barabási & Z.N. Oltvai, Nature, 2001

Nature 408, 307 (2000)

p53 network (mammals)

Local clustering
Clustering: My friends will likely know each other!
Networks are clustered [large C]

Clustering Coefficient
The density of the network surrounding node I, characterized as the number of triangles through I.
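The clustering coefficient can be checked numerically. The sketch below assumes the standard definition $C_I = 2 n_I / (k(k-1))$, where k is the number of neighbors of node I and $n_I$ is the number of edges among those neighbors. The function names and the toy graph are illustrative only; the graph mirrors the 8-neighbor, 4-edge example given for this slide.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Return C = 2*n / (k*(k-1)) for `node`, or 0.0 when k < 2."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention chosen here; C is undefined for k < 2
    # n: number of edges that connect two neighbors of `node`
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


# Hypothetical toy graph: center node 0 has 8 neighbors (1..8),
# and there are 4 edges among those neighbors.
edges = [(0, i) for i in range(1, 9)] + [(1, 2), (3, 4), (5, 6), (7, 8)]
adj = build_adjacency(edges)
c_center = local_clustering(adj, 0)
print(c_center)  # 4/28 = 1/7
```

Node 1 in the same graph has two neighbors (0 and 2) that are themselves linked, so its coefficient is 1.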
Related to network modularity.
$C_I = \frac{n_I}{\binom{k}{2}} = \frac{2 n_I}{k(k-1)}$
k: number of neighbors of I
$n_I$: number of edges between node I's neighbors
Example: the center node has 8 (grey) neighbors, and there are 4 edges between the neighbors.
C = 4 / ((8 × (8 − 1)) / 2) = 4/28 = 1/7

Shortest path between nodes
[Figures]

Small-world Network
• Every node can be reached from every other by a small number of hops or steps
• High clustering coefficient and low mean shortest path length
  – Random graphs don't necessarily have high clustering coefficients
• Social networks, the Internet, and biological networks all exhibit small-world network characteristics

Modularity in Cellular Networks
Hypothesis: Biological functions are carried out by discrete functional modules.
Hartwell, L.-H., Hopfield, J. J., Leibler, S., & Murray, A. W., Nature, 1999.
Traditional view of modularity: [figure]

Modular vs. Scale-free Topology
(a) Scale-free
(b) Modular

How do we know that metabolic networks are modular?
• Clustering coefficient is the same across metabolic networks in different species with the same substrate
• Corresponding randomized scale-free network: $C(N) \sim N^{-0.75}$ (simulation, no analytical result)
[Figure legend: bacteria; archaea (extreme-environment single-cell organisms); eukaryotes (plants, animals, fungi, protists); scale-free network of the same size]

Real Networks Have a Hierarchical Topology
What does it mean?
Many highly connected small clusters combine into few larger but less connected clusters, which combine into even larger and even less connected clusters.
The degree of clustering follows: [equation image]

Properties of hierarchical networks
1. Scale-free
2. Clustering coefficient independent of N
3. Clustering coefficient scales [equation image]

Hierarchy in biological systems
Metabolic networks
Protein networks

Can we identify the modules?
Topological overlap: $O_T(i,j) = \frac{J(i,j)}{\min(k_i, k_j)}$
J(i,j): number of nodes both i and j link to; +1 if there is a direct (i,j) link

Modules in the E. coli metabolism
E.
Ravasz et al., Science, 2002

Robustness
Complex systems maintain their basic functions even under errors and failures (cell mutations; Internet router breakdowns).
[Figure: S (from 0 to 1) versus fraction of removed nodes f (from 0 to 1) under node failure, with threshold $f_c$]

Robustness of scale-free networks
Failures: topological error tolerance (R. Albert et al., Nature, 2000)
$\gamma \le 3$: $f_c = 1$ (R. Cohen et al., PRL, 2000)
Attacks
[Figure: S versus f, with threshold $f_c$]

Attack Tolerance
[Figure: path length]

Yeast protein network: lethality and topological position
Highly connected proteins are more essential (lethal)...
H. Jeong, S.P. Mason, A.-L. Barabási & Z.N. Oltvai, Nature, 2001

Network Motifs

Network motifs
• Comparable to electronic circuit types (i.e., logic gates)
• The notion of motif, widely used for sequence analysis, is generalizable to the level of networks.
• Network motifs are defined as recurring patterns of interconnections found within networks at frequencies much higher than those found in randomized networks.

Random vs designed/evolved features
• Large networks may contain information about design principles and/or evolution of the complex system
• Which features are there for a reason?
  – Design principles (e.g. feed-forward loops)
  – Constraints (e.g. all nodes on the Internet must be connected to each other)
  – Evolution, growth dynamics (e.g. network growth is mainly due to gene duplication)

Network motifs
• Uri Alon et al.: "Network Motifs: Simple Building Blocks of Complex Networks"; Science, 2002.
• Different networks were found to have different motif abundances.
• The motifs reflect the underlying processes that generate each type of network.
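The motif definition above (a pattern that occurs far more often than in randomized networks) can be illustrated with a small sketch. It counts feed-forward loops (X→Y, Y→Z, X→Z) in a directed network and compares the count with an ensemble of degree-preserving randomizations through a z-score. The edge list, function names, and parameter values are hypothetical, and the edge-swap randomization is one common choice, not necessarily the procedure used in the cited papers.

```python
import random
from statistics import mean, pstdev


def count_ffl(edges):
    """Count feed-forward loops: X->Y, Y->Z and X->Z with X, Y, Z distinct."""
    edge_set = set(edges)
    successors = {}
    for u, v in edge_set:
        successors.setdefault(u, set()).add(v)
    count = 0
    for x, y in edge_set:
        if x == y:
            continue
        for z in successors.get(y, ()):
            if z != x and z != y and (x, z) in edge_set:
                count += 1
    return count


def randomize(edges, n_swaps, rng):
    """Degree-preserving randomization: swap the targets of two random edges.

    Assumes the input has no self-loops and no duplicate edges.
    """
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Reject swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def motif_z_score(edges, n_random=200, seed=0):
    """Return (real count, mean random count, z-score or None if sigma == 0)."""
    rng = random.Random(seed)
    real = count_ffl(edges)
    counts = [count_ffl(randomize(edges, 10 * len(edges), rng))
              for _ in range(n_random)]
    mu, sigma = mean(counts), pstdev(counts)
    z = (real - mu) / sigma if sigma > 0 else None
    return real, mu, z


# Hypothetical toy regulatory network containing two feed-forward loops.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("a", "d"), ("d", "e"), ("a", "e"),
         ("c", "f"), ("e", "f"), ("f", "g"), ("g", "h")]
real, mu, z = motif_z_score(edges)
print(real)  # 2 feed-forward loops in the toy network
```

A large positive z-score would mark the feed-forward loop as over-represented; on a network this small the value is only illustrative.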
Motifs in the network
[Figure: graph; motif to be found; motif matches in the target graph]
http://mavisto.ipk-gatersleben.de/frequency_concepts.html

Detecting network motifs
There are three main tasks in detecting network motifs:
(1) Generating an ensemble of proper random networks
(2) Counting the subgraphs in the real network and in random networks
(3) Searching for subgraphs that appear disproportionately in one list vs. the other

All 3-node connected subgraphs
• 13 non-isomorphic types of 3-node connected subgraph
• There are 199 4-node subgraphs, 9,364 5-node subgraphs, etc.

Motifs detected
• Two significant motifs appearing numerous times in non-homologous gene systems that perform diverse biological functions

Motifs II
S. Wuchty, Z. Oltvai & A.-L. Barabási, Nature Genetics, 2003

Probabilistic algorithm for subgraph sampling
The problem:
• Exhaustive subgraph enumeration complexity scales as the number of subgraphs
• Exponential in subgraph size
• Infeasible for large networks with hubs
Solution: an efficient sampling algorithm

Probabilistic algorithm for subgraph sampling
• Instead of examining absolute subgraph counts we define subgraph concentration:
$C_i = \frac{\text{number of } n\text{-node connected subgraphs of type } i}{\text{total number of all } n\text{-node connected subgraphs}}$
• Sampling algorithm: different probabilities of sampling different subgraphs

Weight of each sample corrects for its sampling probability
[Figure: two sampled subgraphs, one with P = 0.33, W = 3 and one with P = 0.14, W = 7]
$C_i \approx \frac{\text{weighted samples of subgraph } i}{\text{weighted samples of all } n\text{-node subgraph types}}$

Rapid convergence to real concentration (Kashtan et al., Bioinformatics, 2004)
Runtime almost independent of network size (Kashtan et al., Bioinformatics, 2004)
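The weighting step can be sketched in a few lines. Each sample gets the weight W = 1/P, and the concentration of a subgraph type is estimated as its share of the total weight. This covers only the reweighting, not the sampling procedure itself, and the sample list, type labels, and function name are hypothetical.

```python
from collections import defaultdict


def estimate_concentrations(samples):
    """Estimate subgraph concentrations from weighted samples.

    `samples` is an iterable of (subgraph_type, sampling_probability).
    Each sample contributes weight W = 1 / P to its type.
    """
    weighted = defaultdict(float)
    for subgraph_type, p in samples:
        if p <= 0:
            raise ValueError("sampling probability must be positive")
        weighted[subgraph_type] += 1.0 / p
    total = sum(weighted.values())
    return {t: w / total for t, w in weighted.items()}


# Hypothetical samples: (type label, probability with which it was drawn).
samples = [
    ("feed_forward_loop", 0.33),
    ("chain", 0.14),
    ("chain", 0.14),
    ("feed_forward_loop", 0.33),
    ("chain", 0.20),
]
concentrations = estimate_concentrations(samples)
for subgraph_type, c in sorted(concentrations.items()):
    print(subgraph_type, round(c, 3))
```

Samples drawn with low probability count for more, which is what removes the bias of a sampler that visits some subgraphs more often than others. When all probabilities are equal, the estimate reduces to the plain fraction of samples of each type.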