* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download How to reconstruct a large genetic network from n gene
Epigenetics of neurodegenerative diseases wikipedia , lookup
Human genetic variation wikipedia , lookup
Neuronal ceroid lipofuscinosis wikipedia , lookup
Epigenetics of diabetes Type 2 wikipedia , lookup
Epigenetics of human development wikipedia , lookup
Gene therapy of the human retina wikipedia , lookup
Vectors in gene therapy wikipedia , lookup
Genome evolution wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Gene desert wikipedia , lookup
Gene therapy wikipedia , lookup
History of genetic engineering wikipedia , lookup
Public health genomics wikipedia , lookup
Genetic engineering wikipedia , lookup
Gene nomenclature wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Gene expression profiling wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Genome (book) wikipedia , lookup
Gene expression programming wikipedia , lookup
How to Reconstruct a Large Genetic Network from n Gene Perturbations in fewer than n2 Easy Steps Andreas Wagner, Bioinformatics, vol. 17, No. 12, 2001, pp. 1183-1187. Speaker: Chuang Chieh Lin Advisor: Professor R. C. T. Lee National Chi-Nan University CSIE in National Chi-Nan University 1 Outline Introduction and basic definitions Graph theoretical framework Parsimonious network Algorithm and complexity Cycles in genetic networks Conclusions References CSIE in National Chi-Nan University 2 Outline Introduction and basic definitions Graph theoretical framework Parsimonious network Algorithm and complexity Cycles in genetic networks Conclusions References CSIE in National Chi-Nan University 3 Introduction and basic definitions Gene activity includes whether a gene is expressed or not, as mRNA, as protein etc.. Gene network: In this paper, we define a genetic network as a group of genes in which individual gene can influence the activity of other genes. The core task of reconstructing genetic networks is to identify the causal structure of a gene network. CSIE in National Chi-Nan University 4 To reconstruct a genetic network is to identify, for each network gene, which other genes and their activity the gene influences directly. Now, let’s see an illustration of genetic network. CSIE in National Chi-Nan University 5 transcription factor protein kinase protein transcription phosphatase factor inactive inactive P protein P DNA Gene 1 Gene 2 active active Gene 3 Gene 4 Gene 5 This is a hypothetical biochemical pathway involving two transcription factors, a protein kinase and a protein phosphatase, as well as the genes encoding them. CSIE in National Chi-Nan University 6 Genetic perturbation: an experimental manipulation of gene activity by manipulating either a gene itself or its product. It includes point mutations, gene deletions, or other interference with the activity of the product. CSIE in National Chi-Nan University 7 transcription factor protein kinase protein transcription phosphatase factor inactive inactive P protein P DNA Gene 1 Gene 2 Genetic perturbation: gene deletion Aspect of gene activity: mRNA expression G1: G2: G3: G4: G5: active active Gene 3 Gene 4 Gene 5 Genetic perturbation: gene deletion Aspect of gene activity: phosphorlation state G2, G5 G5 G5 G5 G1: G2: G3: G4: G5: CSIE in National Chi-Nan University G3, G4 G3, G4 G4 8 Outline Introduction and basic definitions Graph theoretical framework Parsimonious network Algorithm and complexity Cycles in genetic networks Conclusions References CSIE in National Chi-Nan University 9 Graph theoretical framework As the previous instance indicated, we are concerned with qualitative information on gene interaction. We consider a “digraph”, a graph representation of genetic networks, to this qualitative information. A digraph is a directed graph consisting of nodes and directed edges. Let’s see an example. CSIE in National Chi-Nan University 10 We use a → b to mean that gene a influence the activity of gene b directly. For brevity, genes will be labeled by numbers from now on. 1 13 17 18 4 8 7 19 9 6 3 2 11 20 10 15 5 12 16 0 14 CSIE in National Chi-Nan University 11 Adjacency list: for each gene i, it simply shows which genes’ activity state the gene i influences directly. We denote Adj(G) to be the adjacency list of graph G and Adj(i) to be the set of nodes (genes) adjacent to (directly influenced by) node i. CSIE in National Chi-Nan University 12 Adjacency list of G: 1 13 17 18 4 8 7 19 9 6 3 2 11 20 10 15 5 12 16 0 G 14 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: CSIE in National Chi-Nan University 16 2 5 8 12 5 12 2 17 10 15 1 20 20 14 8 17 0 0 2 8 8 6 18 13 Accessibility list: the list of perturbation effects or the list of regulatory effects. It shows all nodes (genes) that can be accessed (influenced in their activity state) from a given gene by paths of direct interactions. We denote Acc(G) to be the accessibility list of the graph G and Acc(i) to be the set of nodes that can be reached (influenced) from node (gene) i. CSIE in National Chi-Nan University 14 Accessibility list of G: 1 13 17 18 4 8 7 19 9 6 3 2 11 20 10 15 5 12 16 0 14 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 2 16 0 2 5 8 12 14 16 0 2 12 14 16 0 2 5 12 14 16 2 8 17 0 0 0 0 8 0 0 2 8 1 2 5 6 10 12 14 15 16 18 20 1 2 5 6 12 14 16 18 20 2 5 6 12 14 16 18 20 2 14 16 17 2 16 2 16 8 0 2 5 6 12 14 16 18 G CSIE in National Chi-Nan University 15 Outline Introduction and basic definitions Graph theoretical framework Parsimonious network Algorithm and complexity Cycles in genetic networks Conclusions References CSIE in National Chi-Nan University 16 Before proceeding with the algorithm, we have to give some concepts and theorems first. CSIE in National Chi-Nan University 17 The most parsimonious network An acyclic digraph defines its accessibility list, but an accessibility list may have more than one corresponding acyclic digraph. Let’s see an example first. CSIE in National Chi-Nan University 18 (d) is the most parsimonious network of Acc, i.e., (a). 0: 1: 2: 3: 4: 5: 0 1 2 3 4 5 2 3 4 5 3 4 5 1 2 5 4 (a) 5 (b) 0 0 1 1 2 2 4 3 3 5 4 3 5 (c) (d) CSIE in National Chi-Nan University 19 An accessibility list Acc and a digraph G are compatible if G has Acc as its accessibility list. Acc is the accessibility list induced by G. Gpars is called the most parsimonious network compatible with Acc. CSIE in National Chi-Nan University 20 Why we prefer the most parsimonious network? We prefer simplest or most parsimonious one of gene network. For any accessibility list Acc of a digraph G, there exists a most parsimonious network Gpars. (From a result of a theorem.) Therefore Gpars is the core of all the corresponding digraphs. More complicated digraphs make people confused. CSIE in National Chi-Nan University 21 Theorem 1 Let Acc be the accessibility list of an acyclic digraph. Then there exists exactly one graph Gpars that has Acc as its accessibility list and that has fewer edges than any other graph G with Acc as its accessibility list. Before starting the proof, we need to introduce some terminology. CSIE in National Chi-Nan University 22 Range and shortcut Consider two nodes i and j of a digraph that are connected by an edge e. The range r of the edge e is the length of the shortest path between i and j in the absence of e. If there is no other path connecting i and j, then r : = . An edge e with range r ≥ 2 but shortcut. is called a Let’s see an example. CSIE in National Chi-Nan University 23 e i j r (e) = k + 1 zk z1 zk-1 e is a shortcut. When eliminating e, i and j are still connected by a path of length k + 1, so r (e) = k + 1. z2 zk-2 CSIE in National Chi-Nan University 24 Lemma 1 For any accessibility list Acc of a digraph, there exists a compatible graph Gpars that is free of shortcuts. CSIE in National Chi-Nan University 25 Proof of Lemma 1 Assume that there is no such graph Gpars. ei yi xi yi xi deleting ei Pi Pi Length of Pi is greater than 1. If there exists a shortcut ei between xi and yi , delete ei . Then by the definition of shortcut, we’ll derive that xi and yi are still connected via Pi , whose length is greater than 1. CSIE in National Chi-Nan University 26 Suppose that we have n possible (xi , yi), i.e., (x1 , y1), …, (x1, xn). After repeating all possible (xi , yi), i = 1, …, n, we’ll derive a shortcut-free graph compatible with the accessibility list. This is a contradiction to the assumption made in the beginning of this proof. CSIE in National Chi-Nan University 27 Lemma 2 Assume that Acc is the accessibility list of a digraph G. For each node x, the adjacency list Adj(x) of a shortcut-free graph Gpar compatible with Acc is a subset of the adjacency list Adj(x) of any graph compatible with Acc. CSIE in National Chi-Nan University 28 Proof of Lemma 2 Assume that Lemma 2 is false. W. L. O. G., suppose that a shortcut-free graph Gpars and some other graph G induce Acc. By assumption, Gpars contains at least one node x so that Adj(x) of Gpars contains at least one node y that isn’t in Adj(x) of G. CSIE in National Chi-Nan University 29 Because G and Gpars have the same accessibility list Acc, there must exist some path x → z1 → z2 → … → zk → y from x to y in G. For the same reason, z1 is accessible from x in Gpars, z2 from z1 in Gpars, … and zk from zk-1 in Gpars. Therefore we can find two paths (x →…→y) in Gpars: (1) the edge e between x and y (2) the path x → z1 →z2 →… →zk →y This is in contradiction to the assumption that Gpars is shortcut-free because e is a shortcut. Let’s see an example! CSIE in National Chi-Nan University 30 Acc: x: z1 z2 y z1 : z2 y z2 : y Adj(Gpars): x z1 x: z1 y z1: z2 z2 : y Adj(G): x: z1 z2 z1: z2 z2 : y x z1 A shortcut! z2 y z2 y Gpars G CSIE in National Chi-Nan University 31 Corollary 1 The shortcut-free graph Gpars compatible with Acc is a unique graph with the fewest edges among all graphs G compatible with Acc. This corollary follows immediately from Lemma 2. CSIE in National Chi-Nan University 32 Now, we can proceed to the algorithm. CSIE in National Chi-Nan University 33 Outline Introduction and basic definitions Graph theoretical framework Parsimonious network Algorithm and complexity Cycles in genetic networks Conclusions References CSIE in National Chi-Nan University 34 1: 2: for all nodes i of G Adj(i) = Acc(i) 3: 4: 5: 6: for all nodes i of G if node i hasn’t been visited call PRUNE_ACC(i) end if 7: 8: 9: 10: 11: 12: 13: PRUNE_ACC(i) for all nodes j Acc(i) if Acc(j) = declare j as visited. else call PRUNE_ACC(j) end if 14: 15: 16: 17: 18: 19: 20: for all nodes j Acc(i) for all nodes k Adj(j) if k Acc(i) delete k from Adj(i) end if declare node i as visited end PRUNE_ACC(i) CSIE in National Chi-Nan University A recursive pruning algorithm to reconstruct the most parsimonious graph from an accessibility list. 35 This algorithm is based on the following theorem, so we have to get something from the theorem. CSIE in National Chi-Nan University 36 Theorem 2 Let Acc(G) be the accessibility list of an acyclic digraph, Gpars its most parsimonious graph, and V(Gpars) the set of all nodes of Gpars. Then the following identity holds: In stead of proving the theorem, we give an example later. CSIE in National Chi-Nan University 37 0: 1: 2: 3: 4: 5: 1 2 3 4 5 2 3 4 5 3 4 5 0: 1: 2: 3: 4: 5: 5 1 2 3 4 5 3 4 5 0: 1: 2: 3: 4: 5: 5 1 2 3 4 5 5 Original Acc(G) 0 1 via 2, 3, 4, 5 0 via 1, 2, 3, 4, 5 1 1 1 2 4 0 0 3 5 2 2 4 3 5 4 3 5 A possible corresponding G CSIE in National Chi-Nan University 38 0: 1: 2: 3: 4: 5: 0: 1: 2: 3: 4: 5: 1 2 3 4 5 5 2 via 3, 4, 5 1 5 1 2 3 4 5 0 4 via 5 1 1 2 2 5 0: 1: 2: 3: 4: 5: 0 0 4 1 2 3 4 3 4 2 4 3 5 3 5 The most parsimonious network CSIE in National Chi-Nan University 39 Actually, the aforementioned example is an illustration of our algorithm. From this theorem, we can derive Corollary 2. CSIE in National Chi-Nan University 40 Corollary 2 Let i, j and k be any three pairwise different nodes of an acyclic directed shortcut-free graph G. If j is accessible from i, then no node k accessible from j is adjacent to i. i A shortcut !! j k CSIE in National Chi-Nan University 41 Computational complexity Let k < n − 1 be the average number of entries in a node’s accessibility list. Assume that there are n genes, that is, n entries. CSIE in National Chi-Nan University 42 During execution, each node accessible from a node j induces one recursive call of PRUNE_ACC, after which the node accessed from j is declared as visited. Thus each entry of the accessibility list of a node is explored no more than once. Line 15 of the algorithm loops over all nodes adjacent to a node j. Let a denotes the average number of entries in Adj(j). The overall computational complexity would be O(nka). CSIE in National Chi-Nan University 43 For practical matters, large scale experimental gene perturbations in the yeast Saccharomyces cerevisiae (n ≈ 6300) suggest that k < 50 ([HMJRS2000]), a ≤ 1 ([W2001a]) and thus nka << n2. CSIE in National Chi-Nan University 44 Storage complexity The algorithm stores two copies of the accessibility list, as well as a list of the nodes that has been visited. Because the graph is acyclic, the recursion depth can be no greater than n − 1. Note that k < n − 1 is the average number of entries in a node’s accessibility list. The overall storage requirements are O(nk). CSIE in National Chi-Nan University 45 Outline Introduction and basic definitions Graph theoretical framework Parsimonious network Algorithm and complexity Cycles in genetic networks Conclusions CSIE in National Chi-Nan University 46 Dealing with cycles All we have mentioned are restricted on acyclic graphs. Now let us go to see the problems brought by cyclic graphs. CSIE in National Chi-Nan University 47 Problems that single gene perturbation can’t solve 1 2 4 2 3 1 0 4 3 0: 1: 2: 3: 4: 0 1 0 0 0 0 2 2 1 1 1 3 3 3 2 2 4 4 4 4 3 They have the same accessibility list. Therefore, we can not reconstruct the gene network uniquely. CSIE in National Chi-Nan University 48 1 2 4 2 3 1 0 4 3 0: 1: 2: 3: 4: 0 3 4 1 2 0 0: 1: 2: 3: 4: 1 2 3 4 0 Note that the order of direct regulatory interactions in these two networks is different, as reflected in the adjacency lists. CSIE in National Chi-Nan University 49 Instead of solving this problem, we collapse the nodes which form a cycle into a single group of nodes with indistinguishable order of regulatory interactions. Such a single group can be also called a strongly connected component or strong component of a directed graph G. Every two nodes in a strong component are mutually accessible. Let us see an example. CSIE in National Chi-Nan University 50 1 3 5 2 4 15 7 9 14 6 0 8 10 12 13 11 A single group This graph is called a condensation of G. 1, 3, 4, 5, 15 2 7 6 , 9, 12 14 0 10 8 A single group CSIE in National Chi-Nan University 11 13 51 How do we construct a condensation of a gene network? There are a theorem and a corollary before our presenting the algorithm constructing a condensation of a gene network. CSIE in National Chi-Nan University 52 Theorem 3 Let P be the accessibility matrix of a digraph G with n nodes, x1, …, xn. The strong component containing xi is determined by the unit entries of ith row in the matrix . xi CSIE in National Chi-Nan University 53 Corollary 3 Let i and j (i ≠ j) be two nodes of a digraph G. i and j are in the same component iff and We use corollary 3 because we will work with accessibility lists, not matrices. Now we are going to present the algorithm. CSIE in National Chi-Nan University 54 1: 2: 3: 4: 5: 6: 7: 8: 9: for all nodes i of G if component [i] has not been defined create new node x of G* component [i] = x for all nodes j Acc (i) if i Acc (j) component [j] = x end if end if 10: for all nodes i of G* 11: 12: for all nodes i of G 13: for all nodes j Acc (i) 14: if component [i] ≠ component [j] 15: if component [j] 16: add component [j] to 17: end if 18: end if CSIE in National Chi-Nan University 55 1: 2: 3: 4: 5: 6: 7: 1 2 7 5 4 3 6 2 1 1 5 6 5 5 3 3 2 6 7 7 6 4 5 6 7 4 5 6 7 4 5 6 7 7 1 2 7 5 4 3 6 x1 x3 x2 CSIE in National Chi-Nan University 56 1 2 7 5 4 3 6 x1 1: 2: 3: 4: 5: 6: 7: 2 1 1 5 6 5 5 3 3 2 6 7 7 6 4 5 6 7 4 5 6 7 4 5 6 7 7 x3 x2 CSIE in National Chi-Nan University 57 Storage and time complexity The graph G* has at most the same number of nodes and accessibility list. The algorithm generates only one copy of G* and its accessibility list. Therefore both time and storage complexity are O(k), where k is the average number of entries of the accessibility list. (k < n2) CSIE in National Chi-Nan University 58 Outline Introduction and basic definitions Graph theoretical framework Parsimonious network Algorithm and complexity Cycles in genetic networks Conclusions References CSIE in National Chi-Nan University 59 Conclusions Genetics is concerned with identifying the gene interactions and their biological significance. Function genomics takes this concern to the next level, that is, identifying gene interactions among thousands of genes in a genome. There are other ways to simplify gene networks, such as Boolean logic design, reduction in symbolic logic, graph theory, and etc.. CSIE in National Chi-Nan University 60 References [BB2001] Arabidopsis Gene Knockout: Phenotypes Wanted, Bouche, N. and Bouchez, D., Curr. Opin. Plant Biol., vol. 4, pp. 111-117. [DIB97] Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale, DeRisi, J. L., Iyer, V. R., Brown, P. O., Science, Vol. 278, pp. 680-686. [ESBB98] Cluster Analysis and Display of Genome-Wide Expression Patterns, Eisen, M. B., Spellman, P. T., Brown, P. O. and Botstein, D., Proc. Natl Acad. Sci. USA, vol. 95, pp. 14863-14868. [FW2000] The Small World of Metabolism, Fell, D. and Wagner, A., Nature Biotechnology, Vol. 18, pp. 1121-1122. [FKZMS2000] Functional Genomic Analysis of C. elegans Chromosome I by Systematic RNA Interference, Fraser, A. G., Kamath, R. S., Zipperlen, P., MartinezCampos, M. and Sohrmann, M., Nature, Vol. 408, pp. 325-330. CSIE in National Chi-Nan University 61 [GEOCJ2000] Functional Genomic Analysis of Cell Division in C. elegans Using RNAi of Genes on Chromosome III, Gonczy, P., Echeverri, C., Oegema, K., Coulson, A. and Jones, S. J. M. et al., Nature, Vol. 408, pp. 331-336. [H69] Graph Theory, Harary, F., Addison-Wesley, Reading, MA., 1969. [HMJRS2000] Functional Discovery via a Compendium of Expression Profiles, Hughes, T. R., Marton, M. J., Jones, A. R., Roberts, C. J. and Stoughton, R. et al., Cell, Vol. 102, 2000, pp. 109-126. [JTAOB2000] The Large-Scale Organization of Metabolic Networks, Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N. and Barebasi, A. L., Nature, Vol. 407, pp. 651-654. [MN99] LEDA: a Platform for Combinatorial and Geometric Computing, Mehlhorn, K. and Naher, S., Cambrige Unversity Press, Cambrige, 1999. [SSBRL99] The Berkeley Drosophila Genome Project Gene Disruption Project: Single P-element Insertions Mutating 25% of Vital Drosophila Genes, Spradling, A. C., Stern, D., Beaton, A., Rhem, E. J. and Laverty, T. et al., Genetics, Vol. 153, 1999, pp. 135-177. CSIE in National Chi-Nan University 62 [THCCC99] Systematic Determination of Genetic Network Architecture, Tavazoie, S., Hughes, J. D., Campbell, M. J., Cho, R. J. and Church, G. M., Nature Genet., Vol. 22, 1999, pp. 281-285. [W2000] Mutational Robustness in Genetic Networks of Yeast, Wagner, A., Nature Genet., Vol. 24, 2000, pp. 355-361. [W2001a] Genetic Networks Are Sparse: Estimates Based on a LargeScale Genetic Perturbation Experiment, submitted, Wagner, A., 2001. [W2001b] The Yeast Protein Interaction Network Evolves Rapidly and Contains Few Redundant Duplicate Genes, Wagner, A., Mol. Bio. Evol., Vol. 18, 2001, pp. 1283-1292. [WF2001] The Small World Inside Large Metabolic Networks, Wagner, A. and Fell, D., Proceedings of the Royal Society of London, Series B, Vol. 268, pp. 1803-1810. [W97] The Structure and Dynamics of Small World Networks, Watts, D. J., PhD Dissertation, Cornell University, 1999. [WSALA99] Functional Characterization of the S. cerevisiae Geneome by Gene Deletion and Parallel Analysis, Winzeler, E. A., Shoemaker, D. D., Astromoffm A., Liang, H. and Anderson, K. et al., Science, Vol. 285, pp. 901-906. CSIE in National Chi-Nan University 63