Bioinformatics
Metabolic pathway analysis
Jacques van Helden
[email protected]
Graph-based analysis of biochemical networks
Examples of metabolic pathways
Methionine Biosynthesis in S.cerevisiae
Aspartate
biosynthesis
L-Aspartate
ATP
ADP
2.7.2.4
Aspartate kinase
HOM3
Aspartate semialdehyde
dehydrogenase
HOM2
Homoserine
dehydrogenase
HOM6
L-aspartyl-4-P
NADPH
NADP+; Pi
1.2.1.11
L-aspartic semialdehyde
Threonine
biosynthesis
NADPH
NADP+
1.1.1.3
L-Homoserine
Acetyl-CoA
CoA
2.3.1.31
Met31p
met32p
Homoserine
O-acetyltransferase
MET2
O-acetylhomoserine
(thiol)-lyase
MET17
MET31
MET32
O-acetyl-homoserine
Sulfur
assimilation
Sulfide
4.2.99.10
MET28
Homocysteine
Cysteine biosynthesis
5-methyltetrahydropteroyltri-L-glutamate
5-tetrahydropteroyltri-L-glutamate
2.1.1.14
Methionine synthase
(vit B12-independent)
MET6
Cbf1p/Met4p/Met28p
complex
CBF1
MET4
Gcn4p
GCN4
L-Methionine
S-adenosyl-methionine
synthetase I
H2O; ATP
2.5.1.6
S-adenosyl-methionine
Pi, PPi
synthetase II
S-Adenosyl-L-Methionine
SAM1
SAM2
Met30p
MET30
Methionine Biosynthesis in E.coli
Aspartate
biosynthesis
L-Aspartate
ATP
ADP
2.7.2.4
aspartate kinase II/
homoserine dehydrogenase II
metL
L-aspartyl-4-P
Lysine
biosynthesis
Threonine
biosynthesis
Cysteine
biosynthesis
NADPH
Aspartate semialdehyde
1.2.1.11
dehydrogenase
NADP+; Pi
L-aspartic semialdehyde
NADPH
1.1.1.3
NADP+
L-Homoserine
SuccinylSCoA
Homoserine
2.3.1.46
O-succinyltransferase
HSCoA
asd
Methionine
repressor
metA
Alpha-succinyl-L-Homoserine
L-Cysteine
4.2.99.9
Succinate
H2O
Pyruvate; NH4+
THF
metB
Cystathionine-beta-lyase
metC
Cobalamin-independent homocysteine transmethylase
metE
Cobalamin-dependent homocysteine transmethylase
metH
Cystathionine
4.4.1.8
Homocysteine
5-MethylTHF
Cystathionine-gamma-synthase
2.1.1.14
2.1.1.13
L-Methionine
ATP; H2O
Pi; PPi
2.5.1.6
S-Adenosyl-L-Methionine
metR
metR
metJ
Alternative methionine pathways
L-Aspartate
S.cerevisiae
2.7.2.4
E.coli
L-aspartyl-4-P
1.2.1.11
L-aspartic semialdehyde
1.1.1.3
L-Homoserine
2.3.1.31
2.3.1.46
Alpha-succinyl-L-Homoserine
O-acetyl-homoserine
4.2.99.9
Cystathionine
4.2.99.10
4.4.1.8
Homocysteine
2.1.1.14
L-Methionine
2.5.1.6
S-Adenosyl-L-Methionine
KEGG "consensus pathway" for Methionine metabolism
Lysine biosynthesis in Escherichia coli
Aspartate
biosynthesis
L-Aspartate
ATP
ADP
2.7.2.4
aspartate kinase III
lysC
aspartate semialdehyde
dehydrogenase
asd
dihydrodipicolinate
synthase
dapA
dihydrodipicolinate
reductase
dapB
tetrahydrodipicolinate
N-succinyltransferase
dapD
succinyl diaminopimelate
aminotransferase
dapC
N-succinyldiaminopimelate
desuccinylase
dapE
diaminopimelate
epimerase
dapF
diaminopimelate
decarboxylase
lysA
L-aspartyl-4-P
Methionine
biosynthesis
Threonine
biosynthesis
NADPH; H+
NADP+; Pi
1.2.1.11
L-aspartic semialdehyde
pyruvate
2 H2O
4.2.1.52
dihydrodipicolinic acid
NADPH or NADH; H+
NADP+ or NAD+
1.3.1.26
tetrahydrodipicolinate
succinyl CoA
CoA
2.3.1.117
N-succinyl-epsilon-keto-L-alpha-aminopimelic acid
glutamate
alpha-ketoglutarate
2.6.1.17
succinyl diaminopimelate
H2O
succinate
3.5.1.18
LL-diaminopimelic acid
5.1.1.7
meso-diaminopimelic acid
CO2
4.1.1.20
L-lysine
lysR
protein
lysR
Lysine biosynthesis in Saccharomyces cerevisiae
2-Oxoglutarate
Acetyl-CoA
CoA
4.1.3.21
homocitrate synthase
LYS20
homocitrate dehydratase
LYS7
1,2,4-Tricarboxylate
H2O But-1-ene-1,2,4-tricarboxylate
4.2.1.36
homoaconitate hydratase
LYS4
Homoisocitrate
NAD+
H+; NADH
1.1.1.87
Oxaloglutarate
CO2
Homoisocitrate
dehydrogenase
1.1.1.87
2-Oxoadipate
L-Glutamate
2-Oxoglutarate
aminoadipate
aminotransferase
2.6.1.39
L-2-Aminoadipate
H+ ; NADH (or NADPH)
NAD+( or NADP+); H2O
1.2.1.31
aminoadipate semialdehyde
dehydrogenase
LYS5
LYS2
L-2-Aminoadipate 6-semialdehyde
L-Glutamate ; NADPH (or NADH); H+
NADP+ (OR NAD+); H2O
1.5.1.10
saccharopine dehydrogenase
(glutamate forming)
LYS9
N6-(L-1,3-Dicarboxypropyl)-L-lysine
NADP+ (OR NAD+) ; H2O
2-Oxoglutarate ; NADPH (OR NADH) ; H+
1.5.1.7
L-lysine
saccharopine dehydrogenase
(lysine forming)
LYS1
Lysine biosynthesis in KEGG (yeast enzymes in green)
EcoCyc example - proline utilization
EcoCyc example - proline biosynthesis
EcoCyc - metabolic overview
KEGG example : proline and arginine metabolism (E.coli)




• Where is proline?
• How is proline synthesized in E.coli?
• How is proline catabolized in E.coli?
• Is it obvious that reactions 1.5.99.8 and 1.5.1.2 have distinct side reactants?
Graph-based analysis of biochemical networks
Pathway reconstruction
by reaction clustering
A graph of compounds and reactions
Reactions from KEGG
Compound nodes
• 10,166 compounds
(only 4302 used by one reaction)
Reaction nodes
• 5,283 reactions
Arcs
• 10,685 substrate → reaction (7,297 non-trivial)
• 10,621 reaction → product (6,828 non-trivial)
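A minimal sketch of how such a compound/reaction graph could be assembled from a reaction list. The `build_arcs` helper and the `ubiquitous` set are illustrative assumptions. In particular, the slides do not define "non-trivial" arcs; the sketch simply assumes they are the arcs that do not involve ubiquitous compounds such as ATP or ADP. The example reaction (EC 2.7.2.4, aspartate kinase) is taken from the pathway diagrams above.

```python
# Toy reaction table: reaction id -> (substrates, products).
REACTIONS = {
    "2.7.2.4": ({"L-Aspartate", "ATP"}, {"L-aspartyl-4-P", "ADP"}),
}

def build_arcs(reactions, ubiquitous=frozenset()):
    """Return (substrate->reaction arcs, reaction->product arcs).

    Compounds listed in `ubiquitous` are skipped (assumed meaning of
    "non-trivial" arcs; not defined in the slides).
    """
    substrate_arcs, product_arcs = [], []
    for rid, (substrates, products) in sorted(reactions.items()):
        for compound in sorted(substrates - ubiquitous):
            substrate_arcs.append((compound, rid))
        for compound in sorted(products - ubiquitous):
            product_arcs.append((rid, compound))
    return substrate_arcs, product_arcs

all_in, all_out = build_arcs(REACTIONS)
core_in, core_out = build_arcs(REACTIONS, ubiquitous=frozenset({"ATP", "ADP"}))
print(len(all_in), len(all_out))
print(core_in, core_out)
```

Keeping reactions as nodes, rather than as edges between compounds, preserves which substrates and products belong to the same reaction.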
Metabolic Pathways as subgraphs
Escherichia coli



4219 Genes (Blattner)
967 enzymes (Swissprot)
159 pathways (EcoCyc)
Reconstructing a pathway from a subset of reactions

Input:
• a set of reactions (the seed reactions)
Output:
• a metabolic pathway including
  - the seed reactions, together with their substrates and products
  - optionally, some additional reactions, intercalated to improve the pathway connectivity
• the pathway can either be connected, or contain several unconnected components
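The simplest case (no intercalated reactions) can be sketched as follows. This is an illustration under stated assumptions, not the author's program: the toy network `REACTIONS` and the helpers `direct_links` and `connected_components` are hypothetical. Two seed reactions are linked directly when a product of one is a substrate of the other, and the result may fall into several unconnected components.

```python
from collections import defaultdict

# Hypothetical toy network: reaction id -> (substrates, products).
REACTIONS = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"C"}, {"D"}),
    "R4": ({"X"}, {"Y"}),
}

def direct_links(reactions, seeds):
    """List (seed_a, compound, seed_b) where seed_a produces a substrate of seed_b."""
    links = []
    for a in sorted(seeds):
        for b in sorted(seeds):
            if a != b:
                for compound in sorted(reactions[a][1] & reactions[b][0]):
                    links.append((a, compound, b))
    return links

def connected_components(seeds, links):
    """Group seed reactions that are joined by at least one direct link."""
    neighbours = defaultdict(set)
    for a, _, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in sorted(seeds):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        groups.append(sorted(group))
    return groups

seeds = {"R1", "R2", "R4"}
links = direct_links(REACTIONS, seeds)
print(links)
print(connected_components(seeds, links))
```

With these seeds, R1 and R2 are joined through compound B, while R4 stays in a component of its own.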
Seed nodes
(figure legend: compound, reaction, seed reaction)
Linking seed nodes
(figure legend: compound, reaction, seed reaction, direct link)
Enhance linking by intercalating reactions
(figure legend: compound, reaction, seed reaction, direct link, intercalated reaction)
Subgraph extraction
Validation of the method

Take a set of experimentally characterized pathways, and for each one:
• Select a subset of enzymes
• Use the reactions catalysed by these enzymes as seed nodes
• Extract the subgraph
• Compare with known pathway
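The last step, comparing an extracted subgraph with the known pathway, could be scored along these lines. The slides do not say which measure was used; recall and precision over reaction sets are an assumption made for illustration, and `compare_pathways` is a hypothetical helper. The first four EC numbers come from the lysine pathway; `9.9.9.9` is a made-up placeholder for a wrongly inferred reaction.

```python
def compare_pathways(inferred, known):
    """Compare two sets of reaction identifiers (here EC numbers)."""
    inferred, known = set(inferred), set(known)
    common = inferred & known
    return {
        "common": sorted(common),
        "missing": sorted(known - inferred),   # known reactions not recovered
        "extra": sorted(inferred - known),     # inferred reactions not in the known pathway
        "recall": len(common) / len(known) if known else 0.0,
        "precision": len(common) / len(inferred) if inferred else 0.0,
    }

known = {"2.7.2.4", "1.2.1.11", "4.2.1.52", "1.3.1.26"}
inferred = {"2.7.2.4", "1.2.1.11", "4.2.1.52", "9.9.9.9"}  # placeholder result
report = compare_pathways(inferred, known)
print(report)
```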
Lysine biosynthesis in E.coli
Aspartate
biosynthesis
L-Aspartate
ATP
ADP
2.7.2.4
aspartate kinase III
lysC
aspartate semialdehyde
dehydrogenase
asd
dihydrodipicolinate
synthase
dapA
dihydrodipicolinate
reductase
dapB
tetrahydrodipicolinate
N-succinyltransferase
dapD
succinyl diaminopimelate
aminotransferase
dapC
N-succinyldiaminopimelate
desuccinylase
dapE
diaminopimelate
epimerase
dapF
diaminopimelate
decarboxylase
lysA
L-aspartyl-4-P
Methionine
biosynthesis
Threonine
biosynthesis
NADPH; H+
NADP+; Pi
1.2.1.11
L-aspartic semialdehyde
pyruvate
2 H2O
4.2.1.52
dihydrodipicolinic acid
NADPH or NADH; H+
NADP+ or NAD+
1.3.1.26
tetrahydrodipicolinate
succinyl CoA
CoA
2.3.1.117
N-succinyl-epsilon-keto-L-alpha-aminopimelic acid
glutamate
alpha-ketoglutarate
2.6.1.17
succinyl diaminopimelate
H2O
succinate
3.5.1.18
LL-diaminopimelic acid
5.1.1.7
meso-diaminopimelic acid
CO2
4.1.1.20
L-lysine
lysR
protein
lysR
Example: reconstitution of lysine pathway

Gap size: 0
Seeds: all ECs from the original pathway are provided as seeds
• 1.2.1.11
• 1.3.1.26
• 2.3.1.117
• 2.6.1.17
• 2.7.2.4
• 3.5.1.18
• 4.1.1.20
• 4.2.1.52
• 5.1.1.7
Result:
• Inferring reaction orientation (reverse or forward)
• Ordering
Example: reconstitution of lysine pathway



Gap size: 1
5 seed reactions
Result:
• Inferring missing steps
• Inferring reaction orientation
• Ordering
Example: reconstitution of lysine pathway



Gap size: 2
4 seed reactions
Result:
• E.coli pathway found
• Alternative pathways also returned
Example: reconstitution of lysine pathway



Gap size: 3
3 seed reactions
Result:
• E.coli pathway is not found, because the program finds shortcuts between the seed reactions
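The effect of the gap size can be illustrated with a bounded breadth-first search. This is a sketch under assumptions, not the program used for the slides: the toy network, the shortcut reaction `S1` and the helper names are hypothetical. Here `max_gap` is taken to mean the maximum number of reactions that may be intercalated between two seed reactions.

```python
from collections import deque

# Hypothetical toy network: reaction id -> (substrates, products).
REACTIONS = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"C"}, {"D"}),
    "R4": ({"D"}, {"E"}),
    "S1": ({"B"}, {"D"}),  # hypothetical shortcut bypassing R2 and R3
}

def successors(reactions, rid):
    """Reactions that consume at least one product of `rid`."""
    products = reactions[rid][1]
    return sorted(n for n, (subs, _) in reactions.items()
                  if n != rid and products & subs)

def shortest_link(reactions, source, target, max_gap):
    """Shortest chain source -> ... -> target with at most `max_gap` intercalated reactions."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 > max_gap:  # extending would exceed the allowed gap
            continue
        for nxt in successors(reactions, path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_link(REACTIONS, "R1", "R4", max_gap=0))
print(shortest_link(REACTIONS, "R1", "R4", max_gap=1))
print(shortest_link(REACTIONS, "R1", "R3", max_gap=1))
```

In this toy network, linking R1 to R4 with a gap of 1 goes through the shortcut S1 rather than through R2 and R3, which mirrors the behaviour described for gap size 3 with few seed reactions.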
Applications of pathway reconstruction




• We have the complete genome for dozens of bacteria, for which there is almost no experimental characterization of metabolism
• For these genomes, enzymes have been predicted by sequence similarity
• In some cases, one expects to find the same pathways as in model organisms, but in other cases, variants or completely distinct pathways
• For each known pathway from model organisms
  - Select the subset of reactions for which an enzyme exists in the target organism
  - If a reasonable number of reactions are present
    • Using these as seeds, reconstruct a pathway
    • Preferentially (but not exclusively) intercalate reactions for which an enzyme has been detected in the target organism
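One way to express "preferentially, but not exclusively" is a weighted shortest-path search in which reactions with a detected enzyme are cheaper to intercalate. This is only a sketch: the slides do not describe the actual weighting, so Dijkstra's algorithm, the cost values, the toy network and the `cheapest_link` helper are all assumptions.

```python
import heapq

# Hypothetical toy network: reaction id -> (substrates, products).
REACTIONS = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"C"}, {"D"}),
    "R4": ({"D"}, {"E"}),
    "S1": ({"B"}, {"D"}),
}

def cheapest_link(reactions, source, target, detected,
                  cost_detected=1.0, cost_missing=3.0):
    """Dijkstra over reactions; entering a reaction without a detected enzyme costs more."""
    def cost(rid):
        return cost_detected if rid in detected else cost_missing

    heap = [(0.0, [source])]
    best = {source: 0.0}
    while heap:
        dist, path = heapq.heappop(heap)
        last = path[-1]
        if last == target:
            return dist, path
        if dist > best.get(last, float("inf")):
            continue  # stale heap entry
        for nxt, (substrates, _) in reactions.items():
            if nxt in path or not (reactions[last][1] & substrates):
                continue
            new_dist = dist + cost(nxt)
            if new_dist < best.get(nxt, float("inf")):
                best[nxt] = new_dist
                heapq.heappush(heap, (new_dist, path + [nxt]))
    return None

# Enzymes detected for every reaction except the shortcut S1.
print(cheapest_link(REACTIONS, "R1", "R4", detected={"R1", "R2", "R3", "R4"}))
# Enzymes detected for all reactions, including S1.
print(cheapest_link(REACTIONS, "R1", "R4", detected=set(REACTIONS)))
```

Because undetected reactions are penalised rather than forbidden, a link through them is still returned when no cheaper route exists.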
Graph-based analysis of biochemical networks
From gene expression data
to pathways
Reaction clustering and gene expression data


• Many biochemical pathways are co-regulated at the transcriptional level.
• Starting from the observation that a group of genes is co-regulated, try to find if they could be involved in a common pathway.
Gene expression data: cell cycle
[Figure: cell-cycle expression profiles. Experiments: Alpha, cdc15, cdc28, Elu. Gene clusters: MCM, CLB2, SIC1, MAT, CLN2, Y', MET.]
Spellman et al. (1998).
Mol Biol Cell 9(12), 3273-97.
Gilbert et al. (2000).
Trends Biotech. 18(Dec), 487-495.
Study case : cluster of co-regulated genes
ID        name    description
YKR069W   met1    siroheme synthase
YFR030W   met10   subunit of assimilatory sulfite reductase
YGL125W   met13   putative methylenetetrahydrofolate reductase (mthfr)
YKL001C   met14   adenylylsulfate kinase
YPR167C   MET16   3'phosphoadenylylsulfate reductase
YLR303W   MET17   O-Acetylhomoserine-O-Acetylserine Sulfhydrylase
YJR010W   met3    ATP sulfurylase
YER091C   met6    vitamin B12-(cobalamin)-independent isozyme of methionine synthase (also called N5-methyltetrahydrofolate homocysteine methyltransferase or 5-methyltetrahydropteroyl triglutamate homocysteine methyltransferase)
YIR017C   MET28   Transcriptional activator of sulfur amino acid metabolism
YGR055W   MUP1    high affinity methionine permease
YJR137C   ECM17   ExtraCellular Mutant
YER042W
YIL074C
YLL061W
YLL062C
YLR302C
YNL276C
YPL250C
YPL274W
KEGG - gene search in pathway maps
KEGG - reaction coloring in pathway maps
Building pathways from gene clusters
[Figure: workflow diagram. An expression table (genes in rows; experiments chip 1, chip 2, chip 3, ... in columns; numeric values, with NA for missing measurements) provides the gene expression profiles. Classification yields a cluster of co-regulated genes. Each gene is expressed as a protein (gene 1 → protein 1, ..., gene 9 → protein 9), and some proteins catalyse reactions (react 1 to react 4). Pathway reconstruction connects these reactions into a putative pathway.]
Pathway found in Spellman’s “MET” cluster
Sulfate
ATP
PPi
Sulfate adenylyl
transferase
MET3
Adenylyl sulfate
kinase
MET14
3'-phosphoadenylylsulfate
reductase
MET16
1.8.1.2
Putative
Sulfite reductase
MET5
sulfide
Sulfite reductase
(NADPH)
MET10
4.2.99.10
O-acetylhomoserine
(thiol)-lyase
MET17
Methionine synthase
(vit B12-independent)
MET6
2.7.7.4
Adenylyl sulfate (APS)
ATP
ADP
2.7.1.25
3'-phosphoadenylylsulfate (PAPS)
NADPH
NADP+; AMP; 3'-phosphate (PAP); H+
1.8.99.4
sulfite
3 NADPH; 5H+
3 NADP+; 3 H2O
O-acetyl-homoserine
Homocysteine
5-methyltetrahydropteroyltri-L-glutamate
5-tetrahydropteroyltri-L-glutamate
2.1.1.14
L-Methionine
Analysis of Gene Expression Data
Gene cluster
20 genes
Identify genes coding for enzymes
7 enzymes
Identify subset of
catalyzed reactions
8 reactions
Interconnect these reactions to
find all possible pathways
Automatic Graph Layout
Pathway Diagram
Compare with Classical Pathways
Known
Pathways
2 matching
pathways
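The first two steps (identify the enzyme-coding genes in a cluster, then collect the reactions they catalyse) can be sketched as a simple lookup. The gene-to-EC assignments below are read from the "MET" cluster pathway diagram in these slides; the `seed_reactions` helper and the small example cluster are illustrative assumptions, not the actual annotation source.

```python
# Gene -> EC numbers, as shown on the "MET" cluster pathway slide.
GENE_TO_EC = {
    "MET3": ["2.7.7.4"],
    "MET14": ["2.7.1.25"],
    "MET16": ["1.8.99.4"],
    "MET10": ["1.8.1.2"],
    "MET17": ["4.2.99.10"],
    "MET6": ["2.1.1.14"],
}

def seed_reactions(cluster, gene_to_ec):
    """Return (enzyme-coding genes, sorted EC numbers) for a gene cluster."""
    enzymes = [gene for gene in cluster if gene in gene_to_ec]
    seeds = sorted({ec for gene in enzymes for ec in gene_to_ec[gene]})
    return enzymes, seeds

# MET28 (transcriptional activator) and MUP1 (permease) are not enzymes.
cluster = ["MET3", "MET14", "MET28", "MUP1", "MET6"]
enzymes, seeds = seed_reactions(cluster, GENE_TO_EC)
print(enzymes)
print(seeds)
```

The resulting EC numbers are the seed reactions handed to the pathway reconstruction step.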
Comparison with Sulfur assimilation
Sulfate (extracellular)
Sulfate transport
Sulfate transporter
SUL1
Sulfate transporter
SUL2
Sulfate adenylyl
transferase
MET3
Sulfate (intracellular)
ATP
PPi
2.7.7.4
Met31p
Met32p
Adenylyl sulfate (APS)
ATP
ADP
2.7.1.25
Adenylyl sulfate
kinase
MET14
3'-phosphoadenylylsulfate
reductase
MET16
MET31
MET32
3'-phosphoadenylylsulfate (PAPS)
NADPH
NADP+; AMP; H+;
3'-phosphate (PAP)
3 NADPH; 5H+
3 NADP+; 3 H2O
1.8.99.4
Cbf1p/Met4p/Met28p
complex
sulfite
1.8.1.2
sulfide
MET28
Putative
Sulfite reductase
MET5
Sulfite reductase
(NADPH)
MET10
Methionine biosynthesis
Gcn4p
Met31p
CBF1
MET4
GCN4
MET30
Comparison with methionine biosynthesis
Aspartate
biosynthesis
L-Aspartate
ATP
ADP
2.7.2.4
Aspartate kinase
HOM3
Aspartate semialdehyde
dehydrogenase
HOM2
Homoserine
dehydrogenase
HOM6
L-aspartyl-4-P
NADPH
NADP+; Pi
1.2.1.11
L-aspartic semialdehyde
Threonine
biosynthesis
NADPH
NADP+
1.1.1.3
L-Homoserine
Acetyl-CoA
CoA
2.3.1.31
Met31p
met32p
Homoserine
O-acetyltransferase
MET2
O-acetylhomoserine
(thiol)-lyase
MET17
MET31
MET32
O-acetyl-homoserine
Sulfur
assimilation
Sulfide
4.2.99.10
MET28
Homocysteine
Cysteine biosynthesis
5-methyltetrahydropteroyltri-L-glutamate
5-tetrahydropteroyltri-L-glutamate
2.1.1.14
Methionine synthase
(vit B12-independent)
MET6
Cbf1p/Met4p/Met28p
complex
CBF1
MET4
Gcn4p
GCN4
L-Methionine
S-adenosyl-methionine
synthetase I
H2O; ATP
2.5.1.6
S-adenosyl-methionine
Pi, PPi
synthetase II
S-Adenosyl-L-Methionine
SAM1
SAM2
Met30p
MET30
Summary



• Starting from an unordered cluster of genes, one gets an ordered set of reactions, connected to form a pathway
• Should permit discovery of novel pathways that are not yet stored in any pathway database
Interpretation of intercalated reactions




• enzyme is not regulated
• DNA chip defect for that gene
• gene was not on the DNA chip
• enzyme remains to be identified in that organism
Analysis of data from Gasch et al.





• Gasch et al. (2000). Molecular Biology of the Cell, 11:4241-4257
• 6152 yeast genes
• Various conditions of stress (heat shock, osmotic shock, peroxide, amino acid starvation, nitrogen depletion)
• Steady-state growth on alternative carbon sources
• Overexpression studies
[Figure: histograms of log(expression ratio) (x axis) against number of genes (y axis), one panel per experiment. Carbon sources vs reference pool (car.2): galactose, raffinose, mannose, fructose, glucose, sucrose, ethanol. Carbon sources (car.1): raffinose, mannose, sucrose, glucose, galactose, ethanol. Overexpression: YAP1, MSN4, MSN2 (repeat).]
Selected experiments
[Figure: histograms of log(expression ratio) against number of genes for the selected experiments.]
Repressed by mannose (at least 3-fold)
Galactose utilization
(redundancy in the database ?)
inferred
Citrate cycle with shunt
gluconeogenesis
Remark: arrows should be displayed as bi-directional
Repressed by mannose (at least 2-fold)
(redundancy in the database ?)
gluconeogenesis
Citrate cycle with shunt
Galactose utilization
gluconeogenesis
Remark: arrows should be displayed as bi-directional
Induced by galactose (at least 2-fold)
Galactose utilization
Remark: arrows should be displayed as bi-directional
Repressed by glucose (at least 2-fold)
(redundancy in the database ?)
gluconeogenesis
Galactose utilization
gluconeogenesis
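The gene sets behind headings such as "Repressed by mannose (at least 3-fold)" can be obtained by thresholding the log ratios. The sketch below assumes the ratios are log2-transformed, which the slides do not state; the `select_genes` helper, the gene names and the values are hypothetical.

```python
import math

def select_genes(log_ratios, fold, direction):
    """Select genes changed at least `fold`-fold; missing values (None) are skipped.

    Assumes log2 ratios (an assumption, not stated in the slides).
    """
    threshold = math.log2(fold)
    selected = []
    for gene, value in sorted(log_ratios.items()):
        if value is None:
            continue
        if direction == "repressed" and value <= -threshold:
            selected.append(gene)
        elif direction == "induced" and value >= threshold:
            selected.append(gene)
    return selected

# Hypothetical log2 ratios for one experiment (None stands for NA).
profile = {"g1": -2.0, "g2": -1.2, "g3": 0.5, "g4": None, "g5": 1.7}
print(select_genes(profile, fold=3, direction="repressed"))
print(select_genes(profile, fold=2, direction="repressed"))
print(select_genes(profile, fold=2, direction="induced"))
```

A looser threshold (2-fold instead of 3-fold) selects more genes, hence more seed reactions for the reconstruction.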