Chair of Computational Biology
Integrated Analysis of Metabolic and Regulatory Networks

So far, studies of large-scale cellular networks have focused on their connectivities. The emerging picture shows a densely woven web where almost everything is connected to everything. In the cell's metabolic network, hundreds of substrates are interconnected through biochemical reactions. Although this could in principle lead to the simultaneous flow of substrates in numerous directions, in practice metabolic fluxes pass through specific pathways.

Topological studies so far did not consider how the modulation of this connectivity might also determine network properties. Therefore it is important to correlate the network topology (the picture derived from elementary flux modes, EFMs, and extreme pathways, EPs) with the expression of enzymes in the cell.

Start with a review of the last lecture's final point about the coupling of metabolic and regulatory networks.

22. Lecture WS 2003/04 Bioinformatics III

Application of elementary modes

The metabolic network structure of E. coli determines key aspects of functionality and regulation.

Compute EFMs for the central metabolism of E. coli.

Catabolic part: substrate uptake reactions, glycolysis, pentose phosphate pathway, TCA cycle, excretion of by-products (acetate, formate, lactate, ethanol).

Anabolic part: conversions of precursors into building blocks such as amino acids, into macromolecules, and into biomass.

Stelling et al. Nature 420, 190 (2002)

Robustness analysis

The number of EFMs qualitatively indicates whether a mutant is viable or not, but does not describe quantitatively how well a mutant grows.

Define the maximal biomass yield Ymass as the optimum of

$$Y_{i,X/S_i} = \frac{e_i^{\mu}}{e_i^{S_k}}$$

where $e_i^{\mu}$ and $e_i^{S_k}$ are the single reaction rates (growth and substrate uptake) in EFM i selected for utilization of substrate $S_k$.

Stelling et al. Nature 420, 190 (2002)

Can regulation be predicted by EFM analysis?

Compute control-effective fluxes for each reaction l by determining the efficiency of every EFM $e_i$, relating the system's output to the substrate uptake and to the sum of all absolute fluxes. With flux modes normalized to the total substrate uptake, the efficiencies $\varepsilon_i(S_k, \mu)$ and $\varepsilon_i(S_k, \mathrm{ATP})$ for the two optimization targets, growth and ATP generation, are defined as:

$$\varepsilon_i(S_k, \mu) = \frac{e_i^{\mu}}{\sum_l |e_i^l|} \quad \text{and} \quad \varepsilon_i(S_k, \mathrm{ATP}) = \frac{e_i^{\mathrm{ATP}}}{\sum_l |e_i^l|}$$

Control-effective fluxes $v_l(S_k)$ are obtained by averaged weighting of the product of reaction-specific fluxes and mode-specific efficiencies over all EFMs using the substrate under consideration:

$$v_l(S_k) = \frac{1}{Y^{\max}_{X/S_k}} \cdot \frac{\sum_i \varepsilon_i(S_k, \mu)\,|e_i^l|}{\sum_i \varepsilon_i(S_k, \mu)} + \frac{1}{Y^{\max}_{A/S_k}} \cdot \frac{\sum_i \varepsilon_i(S_k, \mathrm{ATP})\,|e_i^l|}{\sum_i \varepsilon_i(S_k, \mathrm{ATP})}$$

$Y^{\max}_{X/S_k}$ and $Y^{\max}_{A/S_k}$ are the optimal yields of biomass production and of ATP synthesis.

Control-effective fluxes represent the importance of each reaction for efficient and flexible operation of the entire network.

Stelling et al. Nature 420, 190 (2002)

Prediction of gene expression patterns

As cellular control on longer timescales is predominantly achieved by genetic regulation, the control-effective fluxes should correlate with messenger RNA levels.

Compute theoretical transcript ratios for growth on two alternative substrates $S_1$ and $S_2$ as ratios of control-effective fluxes, $v_l(S_1) / v_l(S_2)$.

Compare to experimental DNA-microarray data for E. coli growing on glucose and acetate.

Excellent correlation!

Stelling et al. Nature 420, 190 (2002)

Figure: Calculated ratios between gene expression levels during exponential growth on acetate and exponential growth on glucose (filled circles indicate outliers), based on all elementary modes, versus experimentally determined transcript ratios. Lines indicate 95% confidence intervals for experimental data (horizontal lines), linear regression (solid line), perfect match (dashed line) and two-fold deviation (dotted line).
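
The definitions of mode efficiencies and control-effective fluxes given above can be sketched numerically as follows. This is only an illustration under stated assumptions, not the implementation of Stelling et al.: the function name `control_effective_fluxes`, the array layout (one row per EFM, one column per reaction) and the column indices are hypothetical, every mode is assumed to take up the substrate, and the optimal yields are taken as the maximum over the supplied modes.

```python
import numpy as np

def control_effective_fluxes(efms, growth, atp, uptake):
    """Control-effective flux v_l(S_k) for every reaction l.

    efms   : (n_modes, n_reactions) array, one EFM per row (hypothetical layout)
    growth : column index of the growth reaction
    atp    : column index of the ATP-generating reaction
    uptake : column index of the substrate uptake reaction (assumed > 0 in every mode)
    """
    e = np.asarray(efms, dtype=float)
    e = e / e[:, [uptake]]            # normalize each mode to unit substrate uptake
    abs_e = np.abs(e)
    total = abs_e.sum(axis=1)         # sum of all absolute fluxes per mode
    eff_mu = e[:, growth] / total     # efficiency with respect to growth
    eff_atp = e[:, atp] / total       # efficiency with respect to ATP generation
    y_x = e[:, growth].max()          # optimal biomass yield (assumed: best mode)
    y_a = e[:, atp].max()             # optimal ATP yield (assumed: best mode)
    v_growth = (eff_mu @ abs_e) / (y_x * eff_mu.sum())
    v_atp = (eff_atp @ abs_e) / (y_a * eff_atp.sum())
    return v_growth + v_atp

# Toy network with reactions [uptake, growth, ATP] and two modes.
toy_modes = [[1.0, 0.5, 0.0],
             [1.0, 0.0, 2.0]]
v = control_effective_fluxes(toy_modes, growth=1, atp=2, uptake=0)
```

Calling the function once per substrate and dividing the two resulting vectors element by element gives the theoretical transcript ratios that the slide compares with microarray data.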

Analyze transcriptional control in metabolic networks

Regulatory and metabolic functions of cells are mediated by networks of interacting biochemical components. Metabolic flux is optimized to maximize metabolic efficiency under different conditions.

Control of metabolic flow:
- allosteric interactions
- covalent modifications involving enzymatic activity
- transcription (revealed by genome-wide expression studies)

Here: N. Barkai and colleagues analyzed published experimental expression data of Saccharomyces cerevisiae.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Recurrence signature algorithm

The availability of DNA microarray data makes it possible to study the transcriptional response of a complete genome to different experimental conditions. An essential task in studying the global structure of transcriptional networks is gene classification.

Commonly used clustering algorithms classify genes successfully when applied to relatively small data sets, but their application to large-scale expression data is limited by 2 well-recognized drawbacks:
- commonly used algorithms assign each gene to a single cluster, whereas in fact genes may participate in several functions and should thus be included in several clusters
- these algorithms classify genes on the basis of their expression under all experimental conditions, whereas cellular processes are generally affected only by a small subset of these conditions.

Ihmels et al. Nat Genetics 31, 370 (2002)

Recurrence signature algorithm

Aim: identify transcription "modules" (TMs).

A set of randomly selected genes is unlikely to be identical to the genes of any TM. Yet many such sets do have some overlap with a specific TM. In particular, sets of genes that are compiled according to existing knowledge of their functional (or regulatory) sequence similarity may have a significant overlap with a transcription module.
The algorithm receives a gene set that partially overlaps a TM and then provides the complete module as output. Therefore this algorithm is referred to as the "signature algorithm".

Ihmels et al. Nat Genetics 31, 370 (2002)

Recurrence signature algorithm

Figure (processing steps): normalization of data; identify modules; classify genes into modules.

a, The signature algorithm. b, Recurrence as a reliability measure. The signature algorithm is applied to distinct input sets containing different subsets of the postulated transcription module. If the different input sets give rise to the same module, it is considered reliable. c, General application of the recurrent signature method.

Ihmels et al. Nat Genetics 31, 370 (2002)

Normalize expression matrices

Collect from the literature an expression dataset composed of over 1000 conditions, including environmental stresses, profiles of deletion mutants and natural processes such as the cell cycle.

Element $E_{gc}$ of the gene expression matrix contains the log-expression change of gene $g \in \{1, ..., N_G\}$ at the experimental condition $c \in \{1, ..., N_C\}$, where $N_G$ and $N_C$ denote the total number of genes and conditions, respectively.

Introduce 2 normalized expression matrices $E^G_{gc}$ and $E^C_{gc}$ with zero mean and unit variance with respect to genes and conditions, respectively:

$$\langle E^G_{gc} \rangle_{g \in G} = 0, \quad \langle (E^G_{gc})^2 \rangle_{g \in G} = 1$$

$$\langle E^C_{gc} \rangle_{c \in C} = 0, \quad \langle (E^C_{gc})^2 \rangle_{c \in C} = 1$$

where $\langle ... \rangle_x$ denotes the average with respect to x.

Ihmels et al. Nat Genetics 31, 370 (2002)

Experiment signature SC

The input set consists of $N_I$ genes:

$$G_I = \{g_1, ..., g_{N_I}\} \subset G$$

Score each experimental condition by the average expression change over the genes of the input set.
The condition score is:

$$s_c = \langle E^G_{gc} \rangle_{g \in G_I}$$

The experiment signature $S_C$ contains those conditions whose absolute score is statistically significant:

$$S_C = \{\, c \in C : |s_c - \langle s_c \rangle_{c \in C}| > t_C \sigma_C \,\}$$

Here use $t_C = 2.0$ as the condition threshold level and $\sigma_C = 1/\sqrt{N_I}$, the standard deviation expected for random fluctuations.

Ihmels et al. Nat Genetics 31, 370 (2002)

Gene signature SG

In the next step, score all genes by the weighted average change in expression within the experiment signature. The gene score is:

$$s_g = \langle s_c E^C_{gc} \rangle_{c \in S_C}$$

The gene signature $S_G$ contains those genes whose absolute score is statistically significant:

$$S_G = \{\, g \in G : |s_g - \langle s_g \rangle_{g \in G}| > t_G \sigma_G \,\}$$

Here use $t_G = 3.0$ as the gene threshold level and the measured standard deviation $\sigma_G$.

Ihmels et al. Nat Genetics 31, 370 (2002)

Fusion of signatures

Apply the signature algorithm to a reference input set $G_I^{ref}$ and to a set of input sets $\{G_I^{(i)}\}$ that are obtained from $G_I^{ref}$ (aim: identify robust modules!).

Each set $G_I^{(i)}$ contains a fraction of the "wanted" genes and some unrelated genes that were selected at random.

The result is a reference signature $S_{ref}$ and a collection of modified signatures $\{S_i\}$. The overlap between any of these signatures and the reference signature is defined as

$$OL_i^{ref} = \frac{|S_i \cap S_{ref}|}{\sqrt{|S_i| \cdot |S_{ref}|}}$$

where |...| refers to the size of a set and $\cap$ denotes intersection.

Ihmels et al. Nat Genetics 31, 370 (2002)

Fusion of signatures

All signatures $S_i$ whose overlap with the reference signature exceeds a certain threshold are included in the set of recurrent signatures

$$R = \{\, S_i : OL_i^{ref} > t_R \,\}$$

The threshold $t_R$ must be chosen to be large enough to discriminate against random fluctuations, but small enough to include a significant fraction of signatures. Here, $t_R$ = 70%.

A module is obtained by selecting only those genes that appear in at least 80% of all signatures in R.

Ihmels et al. Nat Genetics 31, 370 (2002)
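
The steps described so far (normalization, condition scores, gene scores, overlap and recurrence) can be sketched in a few lines of NumPy. This is a minimal sketch and not the authors' code: all function names are hypothetical, the perturbed input sets are built by keeping a random 70% of the reference set and adding 10 random genes (an assumption, since the slides do not fix these numbers), and the overlap is normalized by the geometric mean of the two set sizes.

```python
import numpy as np

def normalize(E):
    """Return (EG, EC): E standardized over genes (per condition) and over conditions (per gene)."""
    EG = (E - E.mean(axis=0, keepdims=True)) / E.std(axis=0, keepdims=True)
    EC = (E - E.mean(axis=1, keepdims=True)) / E.std(axis=1, keepdims=True)
    return EG, EC

def signature(EG, EC, input_genes, t_C=2.0, t_G=3.0):
    """One pass of the signature algorithm; returns (gene signature, experiment signature)."""
    idx = np.asarray(sorted(input_genes))
    s_c = EG[idx].mean(axis=0)                      # condition scores
    sigma_C = 1.0 / np.sqrt(len(idx))               # expected random fluctuation
    S_C = np.flatnonzero(np.abs(s_c - s_c.mean()) > t_C * sigma_C)
    # Gene scores: average of s_c * EC over the selected conditions
    # (assumes S_C is not empty).
    s_g = (EC[:, S_C] * s_c[S_C]).mean(axis=1)
    S_G = np.flatnonzero(np.abs(s_g - s_g.mean()) > t_G * s_g.std())
    return set(S_G.tolist()), set(S_C.tolist())

def overlap(A, B):
    """Overlap of two signatures: |A intersect B| / sqrt(|A| * |B|)."""
    if not A or not B:
        return 0.0
    return len(A & B) / np.sqrt(len(A) * len(B))

def recurrent_module(EG, EC, ref_genes, n_sets=20, n_random=10,
                     keep=0.7, t_R=0.7, min_frac=0.8, seed=0):
    """Genes that appear in at least min_frac of the recurrent signatures."""
    rng = np.random.default_rng(seed)
    ref = sorted(ref_genes)
    S_ref, _ = signature(EG, EC, ref)
    n_keep = max(1, int(keep * len(ref)))
    recurrent = []
    for _ in range(n_sets):
        kept = rng.choice(ref, size=n_keep, replace=False)
        noise = rng.choice(EG.shape[0], size=n_random, replace=False)
        S_i, _ = signature(EG, EC, set(kept.tolist()) | set(noise.tolist()))
        if overlap(S_i, S_ref) > t_R:
            recurrent.append(S_i)
    if not recurrent:
        return set()
    counts = {}
    for S in recurrent:
        for g in S:
            counts[g] = counts.get(g, 0) + 1
    return {g for g, n in counts.items() if n >= min_frac * len(recurrent)}
```

On synthetic data with a planted block of co-regulated genes, an input set that contains only part of the block plus a few unrelated genes should return the whole block. The thresholds default to the values quoted on the slides (tC = 2.0, tG = 3.0, tR = 70%, 80% recurrence).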

Fusion of signatures

Generate modules from recurrent signatures. To fuse pairs of recurrent signatures $\{S_i, S_j\}$ into transcription modules:

For each pair, compute the intersection $P_{ij} = S_i \cap S_j$ of genes appearing in both signatures as well as the overlap

$$OL_{ij} = \frac{|P_{ij}|}{\sqrt{|S_i| \cdot |S_j|}}$$

Select the pair signature $P_{ref}$ with the largest associated overlap $OL_{ref}$ as the "seed" of a new module. Assign all pair signatures $P_{ij}$ whose overlap with $P_{ref}$ exceeds a certain fraction $t_R$ of $OL_{ref}$ to the set of recurrent signatures R:

$$R = \{\, P_{ij} : OL(P_{ij}, P_{ref}) > t_R \cdot OL_{ref} \,\}$$

Ihmels et al. Nat Genetics 31, 370 (2002)

Fusion of signatures

Obtain the gene content and scores of the associated module from R. Remove the pairs that were assigned to R from the total "pool" of pair signatures $\{P_{ij}\}$.

To avoid identifying further, less coherent realizations of the same module, also remove from the pool those pairs that would have been assigned to R for a somewhat lower value of the threshold $t_R$, unless they have a significant overlap (~75%) with any other pair signature.

This process is iterated until all sets are assigned.

Ihmels et al. Nat Genetics 31, 370 (2002)

Numerical test

Apply the algorithm to a set of Ncore genes that are known to be co-regulated. Then add Nrand randomly selected genes. The addition of many random genes leaves the output of the signature algorithm essentially unchanged.

In detail: A reference set of Ncore co-regulated genes was composed of genes encoding either ribosomal proteins (dashed lines) or proteins involved in amino-acid biosynthesis (dashed/dotted line). The recurrent signature method was applied to this set as follows. First, a collection of input sets was derived by randomly adding genes to the reference set.
Second, the signature algorithm was applied to the reference set and to the derived sets; this generates a reference signature and a collection of perturbed signatures, respectively. Last, the overlaps between the reference signature and the perturbed signatures were calculated. Shown is the average overlap as a function of the number of genes added to the reference set. The different lines correspond to different choices of Ncore, shown in parentheses.

Ihmels et al. Nat Genetics 31, 370 (2002)

Correlation between genes of the same metabolic pathway

Figure: Distribution of the average correlation between genes assigned to the same metabolic pathway in the KEGG database. The distribution corresponding to random assignment of genes to metabolic pathways of the same size is shown for comparison. Importantly, only genes coding for enzymes were used in the random control.

Interpretation: pairs of genes associated with the same metabolic pathway show a similar expression pattern. However, typically only a subset of the genes assigned to a given pathway are coregulated.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Correlation between genes of the same metabolic pathway

Genes of the glycolysis pathway (according to KEGG) were clustered and ordered based on the correlation in their expression profiles. Shown here is the matrix of their pair-wise correlations. The cluster of highly correlated genes (orange frame) corresponds to genes that encode the central glycolysis enzymes. The linear arrangement of these genes along the pathway is shown at right.

Of the 46 genes assigned to the glycolysis pathway in the KEGG database, only 24 show a correlated expression pattern. In general, the coregulated genes belong to the central pieces of pathways.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
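
The comparison between within-pathway correlation and a size-matched random control can be illustrated with a short sketch. The function names `mean_pairwise_correlation` and `random_control`, the expression-matrix layout (one row per gene) and the number of random draws are assumptions made for this example; the pool passed to the control would contain only enzyme-coding genes, as stated on the slide.

```python
import numpy as np

def mean_pairwise_correlation(E, genes):
    """Average Pearson correlation over all distinct gene pairs in `genes`.

    E is an expression matrix with one row per gene and one column per condition.
    """
    genes = list(genes)
    C = np.corrcoef(E[genes])                 # rows are treated as variables
    upper = np.triu_indices(len(genes), k=1)  # each pair once, no diagonal
    return C[upper].mean()

def random_control(E, size, pool, n_draws=200, seed=0):
    """Average correlations of randomly drawn gene sets of the same size."""
    rng = np.random.default_rng(seed)
    return np.array([
        mean_pairwise_correlation(E, rng.choice(pool, size=size, replace=False))
        for _ in range(n_draws)
    ])
```

A pathway whose average correlation lies far in the upper tail of the values returned by `random_control` shows a shared expression pattern, which is the reading suggested by the distribution on the slide.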

Coexpressed enzymes often catalyze a linear chain of reactions

Figure: Coregulation between enzymes associated with central metabolic pathways. Each branch corresponds to several enzymes. In the cases shown, only one of the branches downstream of the junction point is coregulated with upstream genes.

Interpretation: coexpressed enzymes are often arranged in a linear order, corresponding to a metabolic flow that is directed in a particular direction.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Co-regulation at branch points

To examine more systematically whether coregulation enhances the linearity of metabolic flow, analyze the coregulation of enzymes at metabolic branch points.

Search KEGG for metabolic compounds that are involved in exactly 3 reactions. Only consider reactions that exist in S. cerevisiae.

3-junctions can integrate metabolic flow (convergent junction) or allow the flow to diverge in 2 directions (divergent junction).

In the cases where several reactions are catalyzed by the same enzymes, choose one representative so that all junctions considered are composed of precisely 3 reactions catalyzed by distinct enzymes.

Each 3-junction is categorized according to the correlation pattern found between the enzymes catalyzing its branches. Correlation coefficients > 0.25 are considered significant.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Coregulation pattern in three-point junctions

All junctions corresponding to metabolites that participate in exactly 3 reactions (according to KEGG) were identified and the correlations between the genes associated with each such junction were calculated. The junctions were grouped according to the directionality of the reactions, as shown.
Divergent junctions, which allow the flow of metabolites in two alternative directions, predominantly show a linear coregulation pattern, where one of the emanating reactions is correlated with the incoming reaction (linear regulatory pattern) or the two alternative outgoing reactions are correlated in a context-dependent manner with a distinct isozyme catalyzing the incoming reaction (linear switch). By contrast, the linear regulatory pattern is significantly less abundant in convergent junctions, where the outgoing flow follows a unique direction, and in conflicting junctions that do not support metabolic flow. Most of the reversible junctions comply with linear regulatory patterns. Indeed, similar to divergent junctions, reversible junctions allow metabolites to flow in two alternative directions.

Reactions were counted as coexpressed if at least two of the associated genes were significantly correlated (correlation coefficient > 0.25). As a random control, the identity of all metabolic genes was randomized and the analysis repeated.

In the majority of divergent junctions, only one of the emanating branches is significantly coregulated with the incoming reaction that synthesizes the metabolite.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Co-regulation at branch points: conclusions

The observed co-regulation patterns correspond to a linear metabolic flow, whose directionality can be switched in a condition-specific manner.

When analyzing junctions that allow metabolic flow in a larger number of directions, there, too, only a few important branches are coregulated with the incoming branch.

Therefore: transcription regulation is used to enhance the linearity of metabolic flow, by biasing the flow toward only a few of the possible routes.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
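
As a rough illustration of the junction analysis, the sketch below labels a divergent 3-junction from the expression profiles of the genes on its three branches. It is a simplification under several assumptions: two reactions count as coexpressed if any gene of one correlates above 0.25 with any gene of the other (one reading of the criterion quoted above), the linear-switch case with distinct isozymes is not modelled, and the names `reactions_coexpressed` and `classify_divergent_junction` as well as the labels "linear", "both" and "none" are hypothetical.

```python
import numpy as np

def reactions_coexpressed(E, genes_a, genes_b, threshold=0.25):
    """True if some gene of reaction A and some gene of reaction B are correlated.

    E holds one expression profile per row; genes are row indices.
    """
    return any(
        np.corrcoef(E[a], E[b])[0, 1] > threshold
        for a in genes_a
        for b in genes_b
        if a != b  # a gene shared by both reactions carries no information
    )

def classify_divergent_junction(E, incoming, out1, out2, threshold=0.25):
    """Label a divergent junction by which outgoing branch follows the incoming one."""
    first = reactions_coexpressed(E, incoming, out1, threshold)
    second = reactions_coexpressed(E, incoming, out2, threshold)
    if first != second:
        return "linear"   # exactly one outgoing branch is coregulated
    return "both" if first else "none"
```

Counting the labels over all junctions, and again after shuffling the gene identities, would give the kind of comparison against a random control that the slide describes.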

Connectivity of metabolites

The connectivity of a given metabolite is defined as the number of reactions connecting it to other metabolites.

Shown are the distributions of connectivity between metabolites in an unrestricted network and in a network where only correlated reactions are considered.

In accordance with previous results (Jeong et al. 2000), the connectivity distribution between metabolites follows a power law (log-log plot). In contrast, when coexpression is used as a criterion to distinguish functional links, the connectivity distribution becomes exponential (log-linear plot).

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Differential regulation of isozymes

Observation: isozymes at junction points are often preferentially coexpressed with alternative reactions. Therefore investigate their role in the metabolic network more systematically.

There are two possible functions of isozymes associated with the same metabolic reaction.

An isozyme pair could provide redundancy, which may be needed for buffering genetic mutations or for amplifying metabolite production. Redundant isozymes are expected to be coregulated.

Alternatively, distinct isozymes could be dedicated to separate biochemical pathways using the associated reaction. Such isozymes are expected to be differentially expressed with the two alternative processes.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Differential regulation of isozymes in central metabolic pathways

Figure: Arrows represent metabolic pathways composed of a sequence of enzymes. Coregulation is indicated with the same color (e.g., the isozyme represented by the green arrow is coregulated with the metabolic pathway represented by the green arrow).

Most members of isozyme pairs are separately coregulated with alternative processes.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
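
The connectivity comparison from the "Connectivity of metabolites" slide (power law in a log-log plot versus exponential in a log-linear plot) can be mimicked with a crude diagnostic: count the reactions per metabolite, then check in which of the two plots the degree histogram is closer to a straight line. This is an illustrative sketch only. The names `connectivity` and `fit_quality` are hypothetical, reactions are assumed to be given as collections of metabolite names, and a least-squares line through the histogram is not a rigorous test for a power law.

```python
from collections import Counter

import numpy as np

def connectivity(reactions):
    """Number of reactions each metabolite takes part in.

    reactions: iterable of metabolite-name collections, one per reaction.
    Restricting the input to coexpressed reactions gives the filtered network.
    """
    degree = Counter()
    for metabolites in reactions:
        degree.update(set(metabolites))
    return degree

def fit_quality(degrees):
    """Return (R^2 of log P vs log k, R^2 of log P vs k) for a list of degrees > 0."""
    ks, counts = np.unique(np.asarray(list(degrees)), return_counts=True)
    log_p = np.log(counts / counts.sum())

    def r_squared(x, y):
        slope, intercept = np.polyfit(x, y, 1)
        residual = y - (slope * x + intercept)
        return 1.0 - residual.var() / y.var()

    power_law = r_squared(np.log(ks), log_p)        # straight in a log-log plot
    exponential = r_squared(ks.astype(float), log_p)  # straight in a log-linear plot
    return power_law, exponential
```

For example, `fit_quality(connectivity(reactions).values())` could be evaluated once for all reactions and once for the coexpressed subset, to see which description fits each network better.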

Differential regulation of isozymes

Figure: Regulatory pattern of all gene pairs associated with a common metabolic reaction (according to KEGG). All such pairs were classified into several classes:
(1) parallel, where each gene is correlated with a distinct connected reaction (a reaction that shares a metabolite with the reaction catalyzed by the respective gene pair);
(2) selective, where only one of the enzymes shows a significant correlation with a connected reaction; and
(3) converging, where both enzymes are correlated with the same reaction.

Correlation coefficients > 0.25 were considered significant. To be counted as parallel, rather than converging, the correlation with the alternative reaction had to be < 80% of the correlation with the preferred reaction.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Differential regulation of isozymes: interpretation

The primary role of isozyme multiplicity is to allow for differential regulation of reactions that are shared by separate processes.

Dedicating a specific enzyme to each pathway may offer a way of independently controlling the associated reaction in response to pathway-specific requirements, at both the transcriptional and the post-transcriptional levels.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Genes coexpressed with metabolic pathways

Identify the coregulated subparts of each metabolic pathway and identify relevant experimental conditions that induce or repress the expression of the pathway genes. Also associate additional genes showing similar expression profiles with each pathway using the signature algorithm.

Input: a set of genes, some of which are expected to be coregulated.
Output: the coregulated part of the input and additional coregulated genes, together with the set of conditions where the coregulation is realized.
Numerous genes were found that are not directly involved in enzymatic steps:
- transporters
- transcription factors

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Co-expression of transporters

Transporter genes are co-expressed with the metabolic pathways that they supply with metabolites.

Figure: Co-expression is marked in green.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Co-regulation of transcription factors

Transcription factors are often co-regulated with the pathways they regulate. Shown here are transcription factors that were found to be co-regulated in the analysis.

Figure: Co-regulation is shown by color-coding such that the transcription factor and the associated pathways are of the same color.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Hierarchical modularity in the metabolic network

So far: co-expression analysis revealed a strong tendency toward coordinated regulation of genes involved in individual metabolic pathways.

Does transcription regulation also define a higher-order metabolic organization, by coordinated expression of distinct metabolic pathways?

This question is based on the observation that feeder pathways (which synthesize metabolites) are frequently coexpressed with the pathways using the synthesized metabolites.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Feeder-pathways/enzymes

Figure: Feeder pathways or genes co-expressed with the pathways they fuel. The feeder pathways (light blue) provide the main pathway (dark blue) with metabolites in order to assist the main pathway, indicating that coexpression extends beyond the level of individual pathways.

These results can be interpreted in the following way: the organism produces those enzymes that are needed.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Hierarchical modularity in the metabolic network

Derive the hierarchy by applying an iterative signature algorithm to the metabolic pathways, and decreasing the resolution parameter (coregulation stringency) in small steps.

Figure: Each box contains a group of coregulated genes (transcription module). Strongly associated genes (left) can be associated with a specific function, whereas moderately correlated modules (right) are larger and their function is less coherent. The merging of 2 branches indicates that the associated modules are induced by similar conditions.

All pathways converge to one of 3 low-resolution modules: amino acid biosynthesis, protein synthesis, and stress.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Hierarchical modularity in the metabolic network

Although amino acids serve as building blocks for proteins, the expression of genes mediating these 2 processes is clearly uncoupled!

This may reflect the association of rapid cell growth (which triggers enhanced protein synthesis) with rich growth conditions, where amino acids are readily available and do not need to be synthesized. Amino acid biosynthesis genes are only required when external amino acids are scarce.

In support of this view, a group of amino acid transporters converged to the protein synthesis module, together with other pathways required for rapid cell growth (glucose fermentation, nucleotide synthesis and fatty acid synthesis).

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Global network properties

Jeong et al. showed that the structural connectivity between metabolites imposes a hierarchical organization of the metabolic network. That analysis was based on connectivity between substrates, considering all potential connections.

Here, the analysis is based on coexpression of enzymes.

In both approaches, related metabolic pathways were clustered together!
There are, however, some differences in the particular groupings (not discussed here). Importantly, when expression data are included, the connectivity pattern of metabolites changes from a power-law dependence to an exponential one, corresponding to a network structure with a defined scale of connectivity. This reflects the reduction in the complexity of the network.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Summary

Transcription regulation is prominently involved in shaping the metabolic network of S. cerevisiae.

1. Transcription leads the metabolic flow toward linearity.
2. Individual isozymes are often separately coregulated with distinct processes, providing a means of reducing crosstalk between pathways using a common reaction.
3. Transcription regulation entails a higher-order structure of the metabolic network. There is a hierarchical organization of metabolic pathways into groups of decreasing expression coherence.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)