* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download duplicativenetworks
Survey
Document related concepts
Silencer (genetics) wikipedia , lookup
Magnesium transporter wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Protein moonlighting wikipedia , lookup
Western blot wikipedia , lookup
Network motif wikipedia , lookup
List of types of proteins wikipedia , lookup
Protein adsorption wikipedia , lookup
Intrinsically disordered proteins wikipedia , lookup
Proteolysis wikipedia , lookup
Node of Ranvier wikipedia , lookup
Transcript
Proteome Network Evolution by Gene Duplication S. Cenk Şahinalp Simon Fraser University Acknowledgements Colin Cooper, KCL Joe Nadeau, CWRU Petra Berenbrink, SFU Gurkan Bebek, CWRU NSF, NSERC, CRC Programme, Charles Wang Foundation Networks are found in biological systems of varying scales: time units: millions of years 1. the evolutionary tree of life 2. ecological networks 3. the genetic control networks of organisms 4. the protein interaction network in cells 5. the metabolic network in cells time units: millionth of a second Proteins in a cell There are thousands of different active proteins in a cell acting as: enzymes, catalysors to chemical reactions of the metabolism components of cellular machinery (e.g. ribosomes) regulators of gene expression Certain proteins play specific roles in special cellular compartments Others move from one compartment to another as “signals”. Protein Interactions Proteins are produced and degraded all of the time. The rates at which these processes occur depend on what proteins are already present, how they interact with one another directly and how they interact with genes (at DNA or mRNA level). Proteins that bind to DNA or RNA have direct effect on production or degradation of other proteins. One protein can speed up or slow down the rate of production of another by binding to the corresponding DNA or mRNA, What is a proteome network? Represents interactions between pairs of proteins as a binary relationship. Forms a network in which: Vertex = protein Link = interaction Establishes an ordinary graph of all proteins in an organism and all possible interactions between them. MIPS Proteome Network Topology PPI Database Sources DIP (Database of Interacting Proteins - UCLA) BIND (also include other molecule interactions) MIPS (Munich information center for proteins) others including: PROTEOME PRONET CURAGEN PIM see http://www.hgmp.mrc.ac.uk/GenomeWeb/prot-interaction.html Complete Yeast Proteome Network The yeast proteome network seems to reveal two basic graph theoretic properties: The frequency of proteins having interactions with exactly k other proteins follows a power law: f(d) ~ C.d-a The network exhibits the small world phenomena: small degree of separation between individuals Degree Distribution of PPI Network of the yeast [Wagner], [Jeong et al.] Small world phenomena & power-law degree distribution also observed in: communication networks web graphs research citation networks social networks [Albert, Barabasi & Jeong], [Broder et al.], [Faloutsos3] Classical -Erdos-Renyi type random graphs do not exhibit these properties: Links between pairs of fixed set of nodes picked uniformly: Maximum degree logarithmic with network size No hubs to make short connections between nodes Preferential Attachment Model [Yule], [Simon] New node G Power-law graphs can be generated by an iterative process: Add one new node at a time Connect new node to existing ones independently: Probability that a node is connected to the new node is proportional to degree [Bollabas et al] Such graphs also exhibit small world phenomena [Barabasi & Albert], [Bollabas & Riordan] Proteome network modeling The model should capture the underlying mechanisms that generate the network while satisfying known mathematical properties: Ohno’s model of genome growth by duplication Duplication based graphs [Kleinberg et al.], [Kumar et al] [Pastor-Satorras et al], [Chung et al.]: each iteration duplicates a randomly chosen vertex with all its links. it then independently deletes existing edges and inserts new ones. Analysis of incoming degree distribution in directed graphs reveal a power law. Simulations on undirected networks exhibit power law like degree distributions. Duplication Model At each iteration t (= total number of nodes) G 1. Existing vertex is chosen uniformly at random and is ``duplicated'' with all its links. 2. Emulate mutations by a. each link of the new vertex is deleted with probability q = 1-p b. inserting edges between the new node and every other node with probability r/t Degree distribution of the best fitting duplication model Expected degree distribution Iterative process give difference equations based on both degree and time; for r = 0: Ft+1(d) = Ft(d) (1- pd/t) + Ft(d-1) p(d-1)/t + 1/t Sj>d-1Ft(j) pk qj-k [j! / k! (j-k)!] [Pastor-Satorras et al.]: approximate difference equations by differential equations to come up with a power law with exponential cutoff: F(d)/t = f(d) ~ C d-a b-d Underlying assumption: Pr[ t+1 generates a degree d node] depends on ft(d+1) and ft(d) only [Chung et al.]: verify whether power law degree distribution is satisfied by the difference equations: ft(d) ~ C d-a for sufficiently large t,d Underlying assumption: ft(d) is independent of t for all d Counter evidence ft(d) with exponential cutoff will result in a maximum degree of O(log t) as per Erdos-Renyi graphs. Power law degree distribution implies a maximum degree of W(tp) ft(d) can not be independent of t: for r=0 and p=.5, the fraction of singletons approach 1 with growing t What to do with singletons Allow them to exist and duplicate (and let them dominate) Allow them to exist but not duplicate (will have fixed fraction – do not agree with the yeast network) Do not allow them to exist: Either delete as soon as one is created Or, always have one default connection to one of the existing nodes adding a fourth term to the difference equation Ft+1(k) = Ft(k) (1- pk/t) + Ft(k-1) p(k-1)/t + 1/t Sj>k-1Ft(j) pk qj-k [j! / k! (j-k)!] + (Ft(k-1) - Ft(k)) Sj>0 Ft(j) qj / t2 which gives a power law if 1 = pa – p + p-a Other properties: k-reachability rk(n): number of nodes that are at most k hops away from n. r1(n1) = 5 r2(n1) = 9 r3(n1) = 10 k-reachability of individual nodes (nodes sorted by degree) Average k-reachability of nodes with fixed initial degree Average degree distribution of the neighbors is independent of the degree of the original node However the variance is high Sequence Homology Two proteins are sequencewise homologous if their pairwise cDNA alignment results with 50% similarity and above: Dual phase distribution of the total number of protein pairs as a function of percentage similarity: cDNA sequence source: ftp://genome-ftp.stanford.edu/pub/yeast Sequence homology vs interactions If p1 - p2 are similar and p2 - p3 interact, then with .03 chance p1 - p3 interact. if p1 - p2 are similar, p2 - p3 are similar, then p1 - p3 are similar with .64 chance If two genes physically interact with each other, it is very likely that they are not similar (excluding self interactions). Sequence Homology Enhanced Duplication Model • Given duplicate node i: Each interaction edge (i,j) is deleted with probability q. For each similarity edge (j,k), with .03 probabilty, the interaction edge (i,k) is deleted. • Each similarity edge (i,j) is deleted with probability q’. For each similarity edge (j,k), with .64 probabilty the similarity edge (i,k) is deleted. For each interaction edge (j,k), with .03 probabilty the interaction edge (i,k) is deleted. • For each j, a new interaction edge (i,j) is added with probability r/t. For each similarity edge (j,k), with .03 probabilty the interaction edge (i,k) is added. • A new similarity edge (i,j) is added with probability r’/t. For each similarity edge (j,k), with .64 probabilty, the similarity edge (i,k) is added. For each interaction edge (j,k), with .03 probabilty, the interaction edge (i,k) is added. Degree Distribution of the Enhanced Model k-reachability of individual nodes (nodes sorted by degree)