Bioinformatics
Metabolic pathway analysis
Jacques van Helden

Graph-based analysis of biochemical networks
Examples of metabolic pathways

In the pathway listings, each step reads: main reaction; side reactants; EC number; enzyme; gene.

Methionine biosynthesis in S. cerevisiae
• L-aspartate → L-aspartyl-4-P; ATP → ADP; EC 2.7.2.4; aspartate kinase; HOM3
• L-aspartyl-4-P → L-aspartic semialdehyde; NADPH → NADP+, Pi; EC 1.2.1.11; aspartate semialdehyde dehydrogenase; HOM2
• L-aspartic semialdehyde → L-homoserine; NADPH → NADP+; EC 1.1.1.3; homoserine dehydrogenase; HOM6
• L-homoserine → O-acetyl-homoserine; acetyl-CoA → CoA; EC 2.3.1.31; homoserine O-acetyltransferase; MET2
• O-acetyl-homoserine + sulfide → homocysteine; EC 4.2.99.10; O-acetylhomoserine (thiol)-lyase; MET17
• homocysteine → L-methionine; 5-methyltetrahydropteroyltri-L-glutamate → 5-tetrahydropteroyltri-L-glutamate; EC 2.1.1.14; methionine synthase (vit B12-independent); MET6
• L-methionine → S-adenosyl-L-methionine; H2O, ATP → Pi, PPi; EC 2.5.1.6; S-adenosyl-methionine synthetase I and II; SAM1, SAM2
Connected pathways: aspartate biosynthesis, threonine biosynthesis, sulfur assimilation, cysteine biosynthesis
Regulators shown: Met31p, Met32p (MET31, MET32); Cbf1p/Met4p/Met28p complex (CBF1, MET4, MET28); Gcn4p (GCN4); Met30p (MET30)

Methionine biosynthesis in E. coli
• L-aspartate → L-aspartyl-4-P; ATP → ADP; EC 2.7.2.4; aspartate kinase II / homoserine dehydrogenase II; metL
• L-aspartyl-4-P → L-aspartic semialdehyde; NADPH → NADP+, Pi; EC 1.2.1.11; aspartate semialdehyde dehydrogenase; asd
• L-aspartic semialdehyde → L-homoserine; NADPH → NADP+; EC 1.1.1.3; aspartate kinase II / homoserine dehydrogenase II; metL
• L-homoserine → alpha-succinyl-L-homoserine; succinyl-SCoA → HSCoA; EC 2.3.1.46; homoserine O-succinyltransferase; metA
• alpha-succinyl-L-homoserine + L-cysteine → cystathionine + succinate; EC 4.2.99.9; cystathionine-gamma-synthase; metB
• cystathionine + H2O → homocysteine + pyruvate + NH4+; EC 4.4.1.8; cystathionine-beta-lyase; metC
• homocysteine → L-methionine; 5-methylTHF → THF; either EC 2.1.1.14, cobalamin-independent homocysteine transmethylase, metE, or EC 2.1.1.13, cobalamin-dependent homocysteine transmethylase, metH
• L-methionine → S-adenosyl-L-methionine; ATP, H2O → Pi, PPi; EC 2.5.1.6
Connected pathways: aspartate biosynthesis, lysine biosynthesis, threonine biosynthesis, cysteine biosynthesis
Regulators shown: methionine repressor (metJ); metR

Alternative methionine pathways
• Common steps: L-aspartate → [2.7.2.4] → L-aspartyl-4-P → [1.2.1.11] → L-aspartic semialdehyde → [1.1.1.3] → L-homoserine
• S. cerevisiae branch: L-homoserine → [2.3.1.31] → O-acetyl-homoserine → [4.2.99.10] → homocysteine
• E. coli branch: L-homoserine → [2.3.1.46] → alpha-succinyl-L-homoserine → [4.2.99.9] → cystathionine → [4.4.1.8] → homocysteine
• Common steps: homocysteine → [2.1.1.14] → L-methionine → [2.5.1.6] → S-adenosyl-L-methionine

KEGG "consensus pathway" for methionine metabolism

Lysine biosynthesis in Escherichia coli
• L-aspartate → L-aspartyl-4-P; ATP → ADP; EC 2.7.2.4; aspartate kinase III; lysC
• L-aspartyl-4-P → L-aspartic semialdehyde; NADPH, H+ → NADP+, Pi; EC 1.2.1.11; aspartate semialdehyde dehydrogenase; asd
• L-aspartic semialdehyde + pyruvate → dihydrodipicolinic acid + 2 H2O; EC 4.2.1.52; dihydrodipicolinate synthase; dapA
• dihydrodipicolinic acid → tetrahydrodipicolinate; NADPH or NADH, H+ → NADP+ or NAD+; EC 1.3.1.26; dihydrodipicolinate reductase; dapB
• tetrahydrodipicolinate → N-succinyl-epsilon-keto-L-alpha-aminopimelic acid; succinyl-CoA → CoA; EC 2.3.1.117; tetrahydrodipicolinate N-succinyltransferase; dapD
• N-succinyl-epsilon-keto-L-alpha-aminopimelic acid → succinyl diaminopimelate; glutamate → alpha-ketoglutarate; EC 2.6.1.17; succinyl diaminopimelate aminotransferase; dapC
• succinyl diaminopimelate + H2O → LL-diaminopimelic acid + succinate; EC 3.5.1.18; N-succinyldiaminopimelate desuccinylase; dapE
• LL-diaminopimelic acid → meso-diaminopimelic acid; EC 5.1.1.7; diaminopimelate epimerase; dapF
• meso-diaminopimelic acid → L-lysine + CO2; EC 4.1.1.20; diaminopimelate decarboxylase; lysA
Connected pathways: aspartate biosynthesis, methionine biosynthesis, threonine biosynthesis
Regulator shown: lysR protein (lysR)

Lysine biosynthesis in Saccharomyces cerevisiae
• 2-oxoglutarate + acetyl-CoA → 1,2,4-tricarboxylate + CoA; EC 4.1.3.21; homocitrate synthase; LYS20
• 1,2,4-tricarboxylate → but-1-ene-1,2,4-tricarboxylate + H2O; homocitrate dehydratase; LYS7
• but-1-ene-1,2,4-tricarboxylate → homoisocitrate; EC 4.2.1.36; homoaconitate hydratase; LYS4
• homoisocitrate → oxaloglutarate; NAD+ → H+, NADH; EC 1.1.1.87; homoisocitrate dehydrogenase
• oxaloglutarate → 2-oxoadipate + CO2; EC 1.1.1.87; homoisocitrate dehydrogenase
• 2-oxoadipate → L-2-aminoadipate; L-glutamate → 2-oxoglutarate; EC 2.6.1.39; aminoadipate aminotransferase
• L-2-aminoadipate → L-2-aminoadipate 6-semialdehyde; H+, NADH (or NADPH) → NAD+ (or NADP+), H2O; EC 1.2.1.31; aminoadipate semialdehyde dehydrogenase; LYS5, LYS2
• L-2-aminoadipate 6-semialdehyde → N6-(L-1,3-dicarboxypropyl)-L-lysine; L-glutamate, NADPH (or NADH), H+ → NADP+ (or NAD+), H2O; EC 1.5.1.10; saccharopine dehydrogenase (glutamate forming); LYS9
• N6-(L-1,3-dicarboxypropyl)-L-lysine → L-lysine; NADP+ (or NAD+), H2O → 2-oxoglutarate, NADPH (or NADH), H+; EC 1.5.1.7; saccharopine dehydrogenase
  (lysine forming); LYS1

Lysine biosynthesis in KEGG (yeast enzymes in green)

EcoCyc example - proline utilization

EcoCyc example - proline biosynthesis

EcoCyc - metabolic overview

KEGG example: proline and arginine metabolism (E. coli)
• Where is proline?
• How is proline synthesized in E. coli?
• How is proline catabolized in E. coli?
• Is it obvious that reactions 1.5.99.8 and 1.5.1.2 have distinct side reactants?

Graph-based analysis of biochemical networks
Pathway reconstruction by reaction clustering

A graph of compounds and reactions
Reactions from KEGG
Compound nodes
• 10,166 compounds (only 4,302 used by one reaction)
Reaction nodes
• 5,283 reactions
Arcs
• 10,685 substrate → reaction (7,297 non-trivial)
• 10,621 reaction → product (6,828 non-trivial)

Metabolic pathways as subgraphs
Escherichia coli
• 4219 genes (Blattner)
• 967 enzymes (Swissprot)
• 159 pathways (EcoCyc)

Reconstructing a pathway from a subset of reactions
Input: a set of reactions (the seed reactions)
Output: a metabolic pathway including
• the seed reactions, together with their substrates and products
• optionally, some additional reactions, intercalated to improve the pathway connectivity
The pathway can either be connected, or contain several unconnected components.

Seed nodes
Legend: compound; reaction; seed reaction

Linking seed nodes
Legend: compound; reaction; seed reaction; direct link

Enhance linking by intercalating reactions
Legend: compound; reaction; seed reaction; direct link; intercalated reaction

Subgraph extraction

Validation of the method
Take a set of experimentally characterized pathways, and for each one
• Select a subset of enzymes
• Use the reactions catalysed by these enzymes as seed nodes
• Extract the subgraph
• Compare with the known pathway

Lysine biosynthesis in E. coli
Reference pathway for the validation (same diagram as in the examples of metabolic pathways). Steps in order, as EC number and gene: 2.7.2.4 lysC; 1.2.1.11 asd; 4.2.1.52 dapA; 1.3.1.26 dapB; 2.3.1.117 dapD; 2.6.1.17 dapC; 3.5.1.18 dapE; 5.1.1.7 dapF; 4.1.1.20 lysA. Regulator shown: lysR protein (lysR).

Example: reconstitution of lysine pathway
Gap size: 0
Seeds: all ECs from the original pathway are provided as seeds
1.2.1.11, 1.3.1.26, 2.3.1.117, 2.6.1.17, 2.7.2.4, 3.5.1.18, 4.1.1.20, 4.2.1.52, 5.1.1.7
Result
• Inferring reaction orientation (reverse or forward)
• Ordering

Example: reconstitution of lysine pathway
Gap size: 1
5 seed reactions
Result
• Inferring missing steps
• Inferring reaction orientation
• Ordering

Example: reconstitution of lysine pathway
Gap size: 2
4 seed reactions
Result
• E. coli pathway found
• Alternative pathways also returned

Example: reconstitution of lysine pathway
Gap size: 3
3 seed reactions
Result
• E. coli pathway is not found, because the program finds shortcuts between the seed reactions

Applications of pathway reconstruction
• We have the complete genome for dozens of bacteria, for which there is almost no experimental characterization of metabolism
• For these genomes, enzymes have been predicted by sequence similarity
• In some cases, one expects to find the same pathways as in model organisms, but in other cases, variants or completely distinct pathways
For each known pathway from model organisms
• Select the subset of reactions for which an enzyme exists in the target organism
• If a reasonable number of reactions are present
  • Using these as seeds, reconstruct a pathway
  • Preferentially (but not exclusively) intercalate reactions for which an enzyme has been detected in the target organism

Graph-based analysis of biochemical networks
From gene expression data to pathways

Reaction clustering and gene expression data
• Many biochemical pathways are co-regulated at the transcriptional level.
• Starting from the observation that a group of genes is co-regulated, try to find out whether they could be involved in a common pathway.

Gene expression data: cell cycle
Experiments: Alpha, cdc15, cdc28, Elu
Gene clusters: MCM, CLB2, SIC1, MAT, CLN2, Y', MET
Spellman et al. (1998). Mol Biol Cell 9(12), 3273-97.
Gilbert et al. (2000). Trends Biotech. 18(Dec), 487-495.

Study case: cluster of co-regulated genes
ID        Name    Description
YKR069W   met1    siroheme synthase
YFR030W   met10   subunit of assimilatory sulfite reductase
YGL125W   met13   putative methylenetetrahydrofolate reductase (mthfr)
YKL001C   met14   adenylylsulfate kinase
YPR167C   MET16   3'phosphoadenylylsulfate reductase
YLR303W   MET17   O-acetylhomoserine-O-acetylserine sulfhydrylase
YJR010W   met3    ATP sulfurylase
YER091C   met6    vitamin B12-(cobalamin)-independent isozyme of methionine synthase (also called N5-methyltetrahydrofolate homocysteine methyltransferase or 5-methyltetrahydropteroyl triglutamate homocysteine methyltransferase)
YIR017C   MET28   transcriptional activator of sulfur amino acid metabolism
YGR055W   MUP1    high affinity methionine permease
YJR137C   ECM17   ExtraCellular Mutant
YER042W
YIL074C
YLL061W
YLL062C
YLR302C
YNL276C
YPL250C
YPL274W

KEGG - gene search in pathway maps

KEGG - reaction coloring in pathway maps

Building pathways from gene clusters
[Figure: a matrix of gene expression profiles (genes × chips) is classified into a cluster of co-regulated genes. Each gene is expressed as a protein, and some proteins catalyse reactions. Pathway reconstruction from these reactions gives a putative pathway.]

In the pathway listings, each step reads: main reaction; side reactants; EC number; enzyme; gene.

Pathway found in Spellman’s “MET” cluster
• sulfate → adenylyl sulfate (APS); ATP → PPi; EC 2.7.7.4; sulfate adenylyl transferase; MET3
• adenylyl sulfate (APS) → 3'-phosphoadenylylsulfate (PAPS); ATP → ADP; EC 2.7.1.25; adenylyl sulfate kinase; MET14
• 3'-phosphoadenylylsulfate (PAPS) → sulfite; NADPH → NADP+, AMP, 3'-phosphate (PAP), H+; EC 1.8.99.4; 3'-phosphoadenylylsulfate reductase; MET16
• sulfite → sulfide; 3 NADPH, 5 H+ → 3 NADP+, 3 H2O; EC 1.8.1.2; putative sulfite reductase, MET5; sulfite reductase (NADPH), MET10
• O-acetyl-homoserine + sulfide → homocysteine; EC 4.2.99.10; O-acetylhomoserine (thiol)-lyase; MET17
• homocysteine → L-methionine; 5-methyltetrahydropteroyltri-L-glutamate → 5-tetrahydropteroyltri-L-glutamate; EC 2.1.1.14; methionine synthase (vit B12-independent); MET6

Analysis of Gene Expression Data
• Gene cluster: 20 genes
• Identify genes coding for enzymes: 7 enzymes
• Identify subset of catalyzed reactions: 8 reactions
• Interconnect these reactions to find all possible pathways
• Automatic graph layout: pathway diagram
• Compare with classical (known) pathways: 2 matching pathways

Comparison with sulfur assimilation
• sulfate (extracellular) → sulfate (intracellular); sulfate transport; sulfate transporters; SUL1, SUL2
• sulfate (intracellular) → adenylyl sulfate (APS); ATP → PPi; EC 2.7.7.4; sulfate adenylyl transferase; MET3
• adenylyl sulfate (APS) → 3'-phosphoadenylylsulfate (PAPS); ATP → ADP; EC 2.7.1.25; adenylyl sulfate kinase; MET14
• 3'-phosphoadenylylsulfate (PAPS) → sulfite; NADPH → NADP+, AMP, H+, 3'-phosphate (PAP); EC 1.8.99.4; 3'-phosphoadenylylsulfate reductase; MET16
• sulfite → sulfide; 3 NADPH, 5 H+ → 3 NADP+, 3 H2O; EC 1.8.1.2; putative sulfite reductase, MET5; sulfite reductase (NADPH), MET10
Connected pathway: methionine biosynthesis
Regulators shown: Met31p, Met32p (MET31, MET32); Cbf1p/Met4p/Met28p complex (CBF1, MET4, MET28); Gcn4p (GCN4); MET30

Comparison with methionine biosynthesis
Same diagram as "Methionine biosynthesis in S. cerevisiae" in the examples of metabolic pathways. Steps in order, as EC number and gene: 2.7.2.4 HOM3; 1.2.1.11 HOM2; 1.1.1.3 HOM6; 2.3.1.31 MET2; 4.2.99.10 MET17; 2.1.1.14 MET6; 2.5.1.6 SAM1, SAM2.

Summary
• Starting from an unordered cluster of genes, one gets an ordered set of reactions, connected to form a pathway
• Should permit discovery of novel pathways that are not stored in any pathway database yet
Interpretation of intercalated reactions
• enzyme is not regulated
• DNA chip defect for that gene
• gene was not on the DNA chip
• enzyme remains to be identified in that organism

Analysis of data from Gasch et al.
Gasch et al. (2000).
Molecular Biology of the Cell, 11:4241-4257
• 6152 yeast genes
• Various conditions of stress (heat shock, osmotic shock, peroxide, amino acid starvation, nitrogen depletion)
• Steady-state growth on alternative carbon sources
• Overexpression studies

Selected experiments
[Figure: histograms of log(expression ratio) (x axis) against number of genes (y axis), one panel per experiment.
Carbon sources (car.1): ethanol, galactose, glucose, mannose, raffinose, sucrose.
Carbon source vs reference pool (car.2): ethanol, fructose, galactose, glucose, mannose, raffinose, sucrose.
Overexpression: YAP1, MSN2 (repeat), MSN4.]

Repressed by mannose (at least 3-fold)
Figure labels: Galactose utilization; (redundancy in the database?); inferred; Citrate cycle with shunt; gluconeogenesis
Remark: arrows should be displayed as bi-directional

Repressed by mannose (at least 2-fold)
Figure labels: (redundancy in the database?); gluconeogenesis; Citrate cycle with shunt; Galactose utilization; gluconeogenesis
Remark: arrows should be displayed as bi-directional

Induced by galactose (at least 2-fold)
Figure labels: Galactose utilization
Remark: arrows should be displayed as bi-directional

Repressed by glucose (at least 2-fold)
Figure labels: (redundancy in the database?); gluconeogenesis; Galactose utilization; gluconeogenesis
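
Sketch: linking seed reactions with a gap size

The pathways in these examples come from the seed-linking procedure described under "Reconstructing a pathway from a subset of reactions". The Python sketch below is one possible reading of that procedure, not the program used for the slides. The reaction table, the function names and the choice of a breadth-first search are all assumptions made for illustration. Two reactions are treated as linked when they share a compound, direction is ignored, and at most max_gap reactions are intercalated between two seeds.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network (not KEGG data): reaction -> (substrates, products).
REACTIONS = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"C"}, {"D"}),
    "R4": ({"D"}, {"E"}),
    "R5": ({"X"}, {"Y"}),
}

def build_adjacency(reactions):
    """Link two reactions when they share at least one compound (undirected)."""
    by_compound = {}
    for rid, (substrates, products) in reactions.items():
        for compound in substrates | products:
            by_compound.setdefault(compound, set()).add(rid)
    adjacency = {rid: set() for rid in reactions}
    for rids in by_compound.values():
        for rid in rids:
            adjacency[rid] |= rids - {rid}
    return adjacency

def shortest_link(adjacency, start, goal, max_gap):
    """Breadth-first search for a path with at most max_gap intercalated reactions."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) >= max_gap + 2:  # extending would exceed the allowed gap
            continue
        for neighbour in sorted(adjacency[path[-1]]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None

def link_seeds(reactions, seeds, max_gap):
    """Return the seed reactions plus the reactions intercalated to link them."""
    adjacency = build_adjacency(reactions)
    selected = set(seeds)
    for a, b in combinations(sorted(seeds), 2):
        path = shortest_link(adjacency, a, b, max_gap)
        if path is not None:
            selected.update(path)
    return selected

if __name__ == "__main__":
    for gap in (0, 1, 2):
        print(gap, sorted(link_seeds(REACTIONS, {"R1", "R4"}, gap)))
```

With seeds R1 and R4 the sketch leaves them unconnected for gap sizes 0 and 1 and intercalates R2 and R3 at gap size 2. It keeps only one shortest link per seed pair, so it does not return alternative pathways, and it does not infer reaction orientation. On a real compound and reaction graph, ubiquitous compounds such as ATP or H2O would presumably have to be filtered out first, otherwise they create shortcuts between unrelated reactions.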