Bioinformatics
Metabolic pathway analysis
Jacques van Helden
[email protected]
Graph-based analysis of biochemical networks
Examples of metabolic pathways
Methionine Biosynthesis in S.cerevisiae
Aspartate
biosynthesis
L-Aspartate
ATP
ADP
2.7.2.4
Aspartate kinase
HOM3
Aspartate semialdehyde
dehydrogenase
HOM2
Homoserine
dehydrogenase
HOM6
L-aspartyl-4-P
NADPH
NADP+; Pi
1.2.1.11
L-aspartic semialdehyde
Threonine
biosynthesis
NADPH
NADP+
1.1.1.3
L-Homoserine
Acetyl-CoA
CoA
2.3.1.31
Met31p
met32p
Homoserine
O-acetyltransferase
MET2
O-acetylhomoserine
(thiol)-lyase
MET17
MET31
MET32
O-acetyl-homoserine
Sulfur
assimilation
Sulfide
4.2.99.10
MET28
Homocysteine
Cysteine biosynthesis
5-methyltetrahydropteroyltri-L-glutamate
5-tetrahydropteroyltri-L-glutamate
2.1.1.14
Methionine synthase
(vit B12-independent)
MET6
Cbf1p/Met4p/Met28p
complex
CBF1
MET4
Gcn4p
GCN4
L-Methionine
S-adenosyl-methionine
synthetase I
H2O; ATP
2.5.1.6
S-adenosyl-methionine
Pi, PPi
synthetase II
S-Adenosyl-L-Methionine
Met30p
SAM1
SAM2
MET30
Methionine Biosynthesis in E.coli
Aspartate
biosynthesis
L-Aspartate
ATP
ADP
aspartate kinase II/
homoserine dehydrogenase II
2.7.2.4
metL
L-aspartyl-4-P
Lysine
biosynthesis
Threonine
biosynthesis
NADPH
1.2.1.11
NADP+; Pi
L-aspartic semialdehyde
NADPH
NADP+
Aspartate semialdehyde
dehydrogenase
asd
1.1.1.3
L-Homoserine
SuccinylSCoA
HSCoA
Homoserine
O-succinyltransferase
metA
Cystathionine-gamma-synthase
metB
Cystathionine-beta-lyase
metC
Cobalamin-independent homocysteine transmethylase
metE
Cobalamin-dependent homocysteine transmethylase
metH
2.3.1.46
Methionine
repressor
Alpha-succinyl-L-Homoserine
Cysteine
biosynthesis
L-Cysteine
4.2.99.9
Succinate
H2O
Pyruvate; NH4+
Cystathionine
4.4.1.8
Homocysteine
5-MethylTHF
THF
2.1.1.14
2.1.1.13
L-Methionine
ATP; H2O
Pi; PPi
2.5.1.6
S-Adenosyl-L-Methionine
metR
metR
metJ
Alternative methionine pathways
L-Aspartate
2.7.2.4
S.cerevisiae
E.coli
L-aspartyl-4-P
1.2.1.11
L-aspartic semialdehyde
1.1.1.3
L-Homoserine
2.3.1.31
2.3.1.46
Alpha-succinyl-L-Homoserine
O-acetyl-homoserine
4.2.99.9
Cystathionine
4.2.99.10
4.4.1.8
Homocysteine
2.1.1.14
L-Methionine
2.5.1.6
S-Adenosyl-L-Methionine
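The two organisms share most steps of the route and differ only around the conversion of L-homoserine to homocysteine. A minimal sketch of that comparison, assuming each pathway is reduced to the list of EC numbers transcribed from the two methionine slides above (the function name `compare_pathways` is illustrative, not part of the original material):

```python
# EC numbers transcribed from the S.cerevisiae and E.coli methionine slides.
YEAST = ["2.7.2.4", "1.2.1.11", "1.1.1.3", "2.3.1.31",
         "4.2.99.10", "2.1.1.14", "2.5.1.6"]
ECOLI = ["2.7.2.4", "1.2.1.11", "1.1.1.3", "2.3.1.46", "4.2.99.9",
         "4.4.1.8", "2.1.1.14", "2.1.1.13", "2.5.1.6"]


def compare_pathways(a, b):
    """Split two EC-number lists into shared and organism-specific steps."""
    set_a, set_b = set(a), set(b)
    return {
        "shared": set_a & set_b,
        "only_a": set_a - set_b,
        "only_b": set_b - set_a,
    }


comparison = compare_pathways(YEAST, ECOLI)
print("shared:", sorted(comparison["shared"]))
print("S.cerevisiae only:", sorted(comparison["only_a"]))
print("E.coli only:", sorted(comparison["only_b"]))
```

Comparing EC numbers alone ignores the compounds involved, so it only flags where the variants diverge; it does not say how the alternative branches reconnect.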
KEGG "consensus pathway" for Methionine metabolism
Lysine biosynthesis in Escherichia coli
Aspartate
biosynthesis
L-Aspartate
ATP
ADP
2.7.2.4
aspartate kinase III
lysC
aspartate semialdehyde
dehydrogenase
asd
dihydrodipicolinate
synthase
dapA
dihydrodipicolinate
reductase
dapB
tetrahydrodipicolinate
N-succinyltransferase
dapD
succinyl diaminopimelate
aminotransferase
dapC
N-succinyldiaminopimelate
desuccinylase
dapE
diaminopimelate
epimerase
dapF
diaminopimelate
decarboxylase
lysA
L-aspartyl-4-P
NADPH; H+
Methionine
biosynthesis
NADP+; Pi
1.2.1.11
L-aspartic semialdehyde
Threonine
biosynthesis
pyruvate
2 H2O
4.2.1.52
dihydropicolinic acid
NADPH or NADH; H+
NADP+ or NAD+
1.3.1.26
tetrahydrodipicolinate
succinyl CoA
CoA
2.3.1.117
N-succinyl-epsilon-keto-L-alpha-aminopimelic acid
glutamate
alpha-ketoglutarate
2.6.1.17
succinyl diaminopimelate
H2O
succinate
3.5.1.18
LL-diaminopimelic acid
5.1.1.7
meso-diaminopimelic acid
CO2
4.1.1.20
L-lysine
lysR
protein
lysR
Lysine biosynthesis in Saccharomyces cerevisiae
2-Oxoglutarate
Acetyl-CoA
CoA
homocitrate synthase
LYS20
homocitrate dehydratase
LYS7
4.1.3.21
1,2,4-Tricarboxylate
H2O
But-1-ene-1,2,4-tricarboxylate
4.2.1.36
homoaconitate hydratase
LYS4
Homoisocitrate
NAD+
H+; NADH
1.1.1.87
Oxaloglutarate
CO2
Homoisocitrate
dehydrogenase
1.1.1.87
2-Oxoadipate
L-Glutamate
2-Oxoglutarate
aminoadipate
aminotransferase
2.6.1.39
L-2-Aminoadipate
H+ ; NADH (or NADPH)
NAD+( or NADP+); H2O
1.2.1.31
aminoadipate semialdehyde
dehydrogenase
LYS5
LYS2
L-2-Aminoadipate 6-semialdehyde
L-Glutamate ; NADPH (or NADH); H+
NADP+ (OR NAD+); H2O
1.5.1.10
saccharopine dehydrogenase
(glutamate forming)
LYS9
N6-(L-1,3-Dicarboxypropyl)-L-lysine
NADP+ (OR NAD+) ; H2O
2-Oxoglutarate ; NADPH (OR NADH) ; H+
1.5.1.7
L-lysine
saccharopine dehydrogenase
(lysine forming)
LYS1
Lysine biosynthesis in KEGG (yeast enzymes in green)
EcoCyc example - proline utilization
EcoCyc example - proline biosynthesis
Ecocyc - metabolic overview
KEGG example : proline and arginine metabolism (E.coli)




Where is proline?
How is proline synthesized in E.coli?
How is proline catabolized in E.coli?
Is it obvious that reactions 1.5.99.8 and 1.5.1.2 have distinct side reactants?
Graph-based analysis of biochemical networks
Pathway reconstruction
by reaction clustering
A graph of compounds and reactions
Reactions from KEGG
Compound nodes
• 10,166 compounds
(only 4302 used by one reaction)
Reaction nodes
• 5,283 reactions
Arcs
• 10,685 substrate → reaction (7,297 non-trivial)
• 10,621 reaction → product (6,828 non-trivial)
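The graph described above has two kinds of nodes (compounds and reactions) and two kinds of arcs (substrate → reaction and reaction → product). A minimal sketch of such a structure, using two reactions from the pathway slides; the `TRIVIAL` set is an assumption about what makes an arc "trivial" here (arcs through ubiquitous compounds such as ATP or NADPH), since the slides do not define the term:

```python
from collections import defaultdict

# Two reactions from the aspartate branch: (substrates, products).
REACTIONS = {
    "2.7.2.4": (["L-Aspartate", "ATP"], ["L-aspartyl-4-P", "ADP"]),
    "1.2.1.11": (["L-aspartyl-4-P", "NADPH"],
                 ["L-aspartic semialdehyde", "NADP+", "Pi"]),
}

# Assumption: compounds whose arcs are treated as trivial.
TRIVIAL = {"ATP", "ADP", "NADPH", "NADP+", "Pi"}


def build_graph(reactions, skip=frozenset()):
    """Return arcs as {node: set(successor nodes)}; nodes are (kind, name)."""
    arcs = defaultdict(set)
    for rid, (substrates, products) in reactions.items():
        for compound in substrates:
            if compound not in skip:
                arcs[("compound", compound)].add(("reaction", rid))
        for compound in products:
            if compound not in skip:
                arcs[("reaction", rid)].add(("compound", compound))
    return arcs


def count_arcs(graph):
    return sum(len(targets) for targets in graph.values())


full_graph = build_graph(REACTIONS)
core_graph = build_graph(REACTIONS, skip=TRIVIAL)
print(count_arcs(full_graph), "arcs in total,",
      count_arcs(core_graph), "after dropping the assumed trivial compounds")
```

Keeping reactions as nodes, rather than collapsing them into compound-to-compound edges, preserves which substrates and products belong to the same reaction.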
Metabolic Pathways as subgraphs
Escherichia coli



4219 Genes (Blattner)
967 enzymes (Swissprot)
159 pathways (EcoCyc)
Reconstructing a pathway from a subset of reactions

Input:
• a set of reactions (the seed reactions)
Output:
• a metabolic pathway including
   • the seed reactions, together with their substrates and products
   • optionally, some additional reactions, intercalated to improve the pathway connectivity
• the pathway can either be connected, or contain several unconnected components
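The input/output contract above can be illustrated with a small sketch. It is not the program used for these slides: it assumes a simplified reaction graph in which each reaction has one main substrate and one main product, treats reactions as irreversible, and links two seeds by breadth-first search whenever at most `max_gap` reactions have to be intercalated between them. The names `link` and `reconstruct` are hypothetical.

```python
from collections import deque

# Main substrate and product of each step of the E.coli lysine pathway.
REACTIONS = {
    "2.7.2.4": ("L-aspartate", "L-aspartyl-4-P"),
    "1.2.1.11": ("L-aspartyl-4-P", "L-aspartic semialdehyde"),
    "4.2.1.52": ("L-aspartic semialdehyde", "dihydropicolinic acid"),
    "1.3.1.26": ("dihydropicolinic acid", "tetrahydrodipicolinate"),
    "2.3.1.117": ("tetrahydrodipicolinate", "N-succinyl-keto-aminopimelate"),
    "2.6.1.17": ("N-succinyl-keto-aminopimelate", "succinyl diaminopimelate"),
    "3.5.1.18": ("succinyl diaminopimelate", "LL-diaminopimelic acid"),
    "5.1.1.7": ("LL-diaminopimelic acid", "meso-diaminopimelic acid"),
    "4.1.1.20": ("meso-diaminopimelic acid", "L-lysine"),
}


def successors(reaction):
    """Reactions that consume the product of `reaction`."""
    product = REACTIONS[reaction][1]
    return [r for r, (substrate, _) in REACTIONS.items() if substrate == product]


def link(start, goal, max_gap):
    """Shortest reaction chain from start to goal with at most max_gap
    intercalated reactions, or None if no such chain exists."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 > max_gap:  # extending would exceed the gap limit
            continue
        for nxt in successors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def reconstruct(seeds, max_gap):
    """Seed reactions plus every reaction intercalated to link two seeds."""
    pathway = set(seeds)
    for a in seeds:
        for b in seeds:
            if a != b:
                path = link(a, b, max_gap)
                if path:
                    pathway.update(path)
    return pathway


# Arbitrary choice of 5 seeds (every second step of the pathway).
SEEDS = ["2.7.2.4", "4.2.1.52", "2.3.1.117", "3.5.1.18", "4.1.1.20"]
print(sorted(reconstruct(SEEDS, max_gap=0)))  # seeds stay unconnected
print(sorted(reconstruct(SEEDS, max_gap=1)))  # missing steps are intercalated
```

On this toy graph there is only one route between seeds, so a larger gap size cannot introduce shortcuts; on a full reaction network, larger gaps return alternative routes as well.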
Seed nodes
Compound
Reaction
Seed Reaction
Linking seed nodes
Compound
Reaction
Seed Reaction
Direct link
Enhance linking by intercalating reactions
Compound
Reaction
Seed Reaction
Direct link
Intercalated reaction
Subgraph extraction
Validation of the method

Take a set of experimentally characterized pathways, and for each one
• Select a subset of enzymes
• Use the reactions catalysed by these enzymes as seed nodes
• Extract the subgraph
• Compare with known pathway
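The last validation step, comparing the extracted subgraph with the known pathway, could be scored as follows. This is only a sketch: the slides do not state which measure was used, so the fraction of known reactions recovered and the fraction of inferred reactions that are correct are assumptions, and `hypothetical_shortcut` is a made-up label standing for a wrongly intercalated reaction.

```python
def compare_with_known(known, inferred):
    """Score an inferred reaction set against an annotated pathway."""
    known, inferred = set(known), set(inferred)
    common = known & inferred
    return {
        "recovered": len(common) / len(known),
        "precision": len(common) / len(inferred) if inferred else 0.0,
        "missed": sorted(known - inferred),
        "extra": sorted(inferred - known),
    }


# EC numbers of the E.coli lysine pathway, as listed on the slides.
KNOWN = ["2.7.2.4", "1.2.1.11", "4.2.1.52", "1.3.1.26", "2.3.1.117",
         "2.6.1.17", "3.5.1.18", "5.1.1.7", "4.1.1.20"]

# Illustrative result: one step missed, one spurious reaction added.
INFERRED = [ec for ec in KNOWN if ec != "5.1.1.7"] + ["hypothetical_shortcut"]

score = compare_with_known(KNOWN, INFERRED)
print(score)
```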
Lysine biosynthesis in E.coli
Aspartate
biosynthesis
L-Aspartate
ATP
ADP
2.7.2.4
aspartate kinase III
lysC
aspartate semialdehyde
dehydrogenase
asd
dihydrodipicolinate
synthase
dapA
dihydrodipicolinate
reductase
dapB
tetrahydrodipicolinate
N-succinyltransferase
dapD
succinyl diaminopimelate
aminotransferase
dapC
N-succinyldiaminopimelate
desuccinylase
dapE
diaminopimelate
epimerase
dapF
diaminopimelate
decarboxylase
lysA
L-aspartyl-4-P
NADPH; H+
Methionine
biosynthesis
NADP+; Pi
1.2.1.11
L-aspartic semialdehyde
Threonine
biosynthesis
pyruvate
2 H2O
4.2.1.52
dihydropicolinic acid
NADPH or NADH; H+
NADP+ or NAD+
1.3.1.26
tetrahydrodipicolinate
succinyl CoA
CoA
2.3.1.117
N-succinyl-epsilon-keto-L-alpha-aminopimelic acid
glutamate
alpha-ketoglutarate
2.6.1.17
succinyl diaminopimelate
H2O
succinate
3.5.1.18
LL-diaminopimelic acid
5.1.1.7
meso-diaminopimelic acid
CO2
4.1.1.20
L-lysine
lysR
protein
lysR
Example: reconstitution of lysine pathway

Gap size: 0
Seeds: all EC numbers from the original pathway are provided as seeds
• 1.2.1.11
• 1.3.1.26
• 2.3.1.117
• 2.6.1.17
• 2.7.2.4
• 3.5.1.18
• 4.1.1.20
• 4.2.1.52
• 5.1.1.7
Result:
• Inferring reaction orientation (reverse or forward)
• Ordering
Example: reconstitution of lysine pathway



Gap size: 1
5 seed reactions
Result:
• Inferring missing steps
• Inferring reaction orientation
• Ordering
Example: reconstitution of lysine pathway



Gap size: 2
4 seed reactions
Result:
• E.coli pathway found
• Alternative pathways also returned
Example: reconstitution of lysine pathway



Gap size: 3
3 seed reactions
Result:
• E.coli pathway is not found, because the program finds shortcuts between the seed reactions
Applications of pathway reconstruction




• We have the complete genome for dozens of bacteria, for which there is almost no experimental characterization of metabolism
• For these genomes, enzymes have been predicted by sequence similarity
• In some cases, one expects to find the same pathways as in model organisms, but in other cases, variants or completely distinct pathways
• For each known pathway from model organisms
   • Select the subset of reactions for which an enzyme exists in the target organism
   • If a reasonable number of reactions are present
      • Using these as seeds, reconstruct a pathway
      • Preferentially (but not exclusively) intercalate reactions for which an enzyme has been detected in the target organism
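The seed-selection step of this procedure can be sketched as below. The slides only ask for "a reasonable number" of reactions, so the `min_fraction` threshold of 0.5 is an arbitrary assumption, and the target-organism annotation is invented for illustration.

```python
def select_seeds(pathway_ecs, target_ecs, min_fraction=0.5):
    """Reactions of a reference pathway with a predicted enzyme in the
    target organism, or None when too few are present to be useful."""
    present = [ec for ec in pathway_ecs if ec in target_ecs]
    if len(present) < min_fraction * len(pathway_ecs):
        return None
    return present


# Reference pathway: E.coli lysine biosynthesis (EC numbers from the slides).
LYSINE_ECS = ["2.7.2.4", "1.2.1.11", "4.2.1.52", "1.3.1.26", "2.3.1.117",
              "2.6.1.17", "3.5.1.18", "5.1.1.7", "4.1.1.20"]

# Hypothetical enzyme predictions for two target genomes.
well_annotated = {"2.7.2.4", "1.2.1.11", "4.2.1.52", "1.3.1.26", "4.1.1.20"}
poorly_annotated = {"2.7.2.4"}

print(select_seeds(LYSINE_ECS, well_annotated))    # 5 of 9 steps: usable seeds
print(select_seeds(LYSINE_ECS, poorly_annotated))  # too few steps: None
```

The returned seeds would then be passed to the reconstruction step; preferring intercalated reactions with a detected enzyme could be expressed as a lower cost on those reactions during the search.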
Graph-based analysis of biochemical networks
From gene expression data
to pathways
Reaction clustering and gene expression data


• Many biochemical pathways are co-regulated at the transcriptional level.
• Starting from the observation that a group of genes is co-regulated, try to determine whether they could be involved in a common pathway.
Gene expression data: cell cycle
Alpha
cdc15 cdc28 Elu
MCM
CLB2
SIC1
MAT
CLN2
Y'
MET
Spellman et al. (1998).
Mol Biol Cell 9(12), 3273-97.
Gilbert et al. (2000).
Trends Biotech. 18(Dec), 487-495.
Study case : cluster of co-regulated genes
ID        name    description
YKR069W   met1    siroheme synthase
YFR030W   met10   subunit of assimilatory sulfite reductase
YGL125W   met13   putative methylenetetrahydrofolate reductase (mthfr)
YKL001C   met14   adenylylsulfate kinase
YPR167C   MET16   3'phosphoadenylylsulfate reductase
YLR303W   MET17   O-Acetylhomoserine-O-Acetylserine Sulfhydrylase
YJR010W   met3    ATP sulfurylase
YER091C   met6    vitamin B12-(cobalamin)-independent isozyme of methionine synthase (also called N5-methyltetrahydrofolate homocysteine methyltransferase or 5-methyltetrahydropteroyl triglutamate homocysteine methyltransferase)
YIR017C   MET28   Transcriptional activator of sulfur amino acid metabolism
YGR055W   MUP1    high affinity methionine permease
YJR137C   ECM17   ExtraCellular Mutant
YER042W
YIL074C
YLL061W
YLL062C
YLR302C
YNL276C
YPL250C
YPL274W
KEGG - gene search in pathway maps
KEGG - reaction coloring in pathway maps
Building pathways from gene clusters
[Figure: workflow. Gene expression profiles (a gene × experiment matrix of expression values for each chip) are classified into clusters of co-regulated genes. Each gene is linked to the protein it expresses, and each enzyme to the reactions it catalyses. Pathway reconstruction then connects these reactions into a putative pathway.]
Pathway found in Spellman’s “MET” cluster
Sulfate
ATP
PPi
Sulfate adenylyl
transferase
MET3
Adenylyl sulfate
kinase
MET14
3'-phosphoadenylylsulfate
reductase
MET16
Putative
Sulfite reductase
MET5
sulfide
Sulfite reductase
(NADPH)
MET10
4.2.99.10
O-acetylhomoserine
(thiol)-lyase
MET17
Methionine synthase
(vit B12-independent)
MET6
2.7.7.4
Adenylyl sulfate (APS)
ATP
ADP
2.7.1.25
3'-phosphoadenylylsulfate (PAPS)
NADPH
NADP+; AMP; 3'-phosphate (PAP); H+
1.8.99.4
sulfite
3 NADPH; 5H+
3 NADP+; 3 H2O
1.8.1.2
O-acetyl-homoserine
Homocysteine
5-methyltetrahydropteroyltri-L-glutamate
5-tetrahydropteroyltri-L-glutamate
2.1.1.14
L-Methionine
Analysis of Gene Expression Data
Gene cluster
20 genes
Identify genes coding for enzymes
7 enzymes
Identify subset of
catalyzed reactions
8 reactions
Interconnect these reactions to
find all possible pathways
Automatic Graph Layout
Compare with Classical Pathways
Pathway Diagram
Known
Pathways
2 matching
pathways
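The first two steps of this flow (identify the genes coding for enzymes, then the reactions they catalyse) amount to a lookup from genes to EC numbers. A minimal sketch, using only gene-to-EC pairs that appear on the "MET" cluster pathway slide and a partial, illustrative cluster rather than the full 20-gene set:

```python
# Gene -> EC number pairs read from the "MET" cluster pathway slide.
GENE_TO_EC = {
    "MET3": "2.7.7.4",     # sulfate adenylyltransferase
    "MET14": "2.7.1.25",   # adenylyl sulfate kinase
    "MET16": "1.8.99.4",   # 3'-phosphoadenylylsulfate reductase
    "MET10": "1.8.1.2",    # sulfite reductase (NADPH)
    "MET17": "4.2.99.10",  # O-acetylhomoserine (thiol)-lyase
    "MET6": "2.1.1.14",    # methionine synthase (vit B12-independent)
}


def seed_reactions(cluster, gene_to_ec):
    """Return (enzyme-coding genes, seed EC numbers) for a gene cluster."""
    enzymes = [gene for gene in cluster if gene in gene_to_ec]
    seeds = sorted({gene_to_ec[gene] for gene in enzymes})
    return enzymes, seeds


# Partial cluster: six enzymes, a regulator (MET28) and a permease (MUP1).
CLUSTER = ["MET3", "MET14", "MET16", "MET10", "MET17", "MET6", "MET28", "MUP1"]

enzymes, seeds = seed_reactions(CLUSTER, GENE_TO_EC)
print(len(CLUSTER), "genes ->", len(enzymes), "enzymes ->", len(seeds), "seed reactions")
```

Genes without an EC number, such as regulators and transporters, drop out at this stage; the seed list is what the reconstruction step then tries to interconnect.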
Comparison with Sulfur assimilation
Sulfate (extracellular)
Sulfate transporter
SUL1
Sulfate transporter
SUL2
Sulfate adenylyl
transferase
MET3
Sulfate transport
Sulfate (intracellular)
ATP
PPi
2.7.7.4
Met31p
Met32p
Adenylyl sulfate (APS)
ATP
ADP
2.7.1.25
Adenylyl sulfate
kinase
MET14
3'-phosphoadenylylsulfate
reductase
MET16
MET31
MET32
3'-phosphoadenylylsulfate (PAPS)
NADPH
NADP+; AMP; H+;
3'-phosphate (PAP)
3 NADPH; 5H+
3 NADP+; 3 H2O
1.8.99.4
Cbf1p/Met4p/Met28p
complex
sulfite
1.8.1.2
sulfide
MET28
Putative
Sulfite reductase
MET5
Sulfite reductase
(NADPH)
MET10
Methionine biosynthesis
Gcn4p
Met31p
CBF1
MET4
GCN4
MET30
Comparison with methionine biosynthesis
Aspartate
biosynthesis
L-Aspartate
ATP
ADP
2.7.2.4
Aspartate kinase
HOM3
Aspartate semialdehyde
dehydrogenase
HOM2
Homoserine
dehydrogenase
HOM6
L-aspartyl-4-P
NADPH
NADP+; Pi
1.2.1.11
L-aspartic semialdehyde
Threonine
biosynthesis
NADPH
NADP+
1.1.1.3
L-Homoserine
Acetyl-CoA
CoA
2.3.1.31
Met31p
met32p
Homoserine
O-acetyltransferase
MET2
O-acetylhomoserine
(thiol)-lyase
MET17
MET31
MET32
O-acetyl-homoserine
Sulfur
assimilation
Sulfide
4.2.99.10
MET28
Homocysteine
Cysteine biosynthesis
5-methyltetrahydropteroyltri-L-glutamate
5-tetrahydropteroyltri-L-glutamate
2.1.1.14
Methionine synthase
(vit B12-independent)
MET6
Cbf1p/Met4p/Met28p
complex
CBF1
MET4
Gcn4p
GCN4
L-Methionine
S-adenosyl-methionine
synthetase I
H2O; ATP
2.5.1.6
S-adenosyl-methionine
Pi, PPi
synthetase II
S-Adenosyl-L-Methionine
Met30p
SAM1
SAM2
MET30
Summary



Starting from an unordered cluster of genes, one gets an ordered set
of reactions, connected to form a pathway
Should permit discovery of novel pathways, that are not stored in
any pathway database yet
Interpretation of intercalated reactions




enzyme is not regulated
DNA chip defect for that gene
gene was not on the DNA chip
enzyme remains to be identified in that organism
Analysis of data from Gasch et al.





• Gasch et al. (2000). Molecular Biology of the Cell, 11:4241-4257
• 6152 yeast genes
• Various conditions of stress (heat shock, osmotic shock, peroxide, amino acid starvation, nitrogen depletion)
• Steady-state growth on alternative carbon sources
• Overexpression studies
Selected experiments
[Figure: histograms of the number of genes as a function of log(expression ratio) for the selected experiments. Panels: growth on galactose, sucrose, raffinose, fructose, mannose, glucose and ethanol, each versus the reference pool (YP.*.vs.reference.pool.car.2); the car.1 series for raffinose, sucrose, mannose, glucose, galactose and ethanol; and overexpression of YAP1, MSN4 and MSN2.]
Repressed by mannose (at least 3-fold)
Galactose utilization
(redundancy in the database ?)
inferred
Citrate cycle with shunt
gluconeogenesis
Remark: arrows should be displayed as bi-directional
Repressed by mannose (at least 2-fold)
(redundancy in the database ?)
gluconeogenesis
Citrate cycle with shunt
Galactose utilization
gluconeogenesis
Remark: arrows should be displayed as bi-directional
Induced by galactose (at least 2-fold)
Galactose utilization
Remark: arrows should be displayed as bi-directional
Repressed by glucose (at least 2-fold)
(redundancy in the database ?)
gluconeogenesis
Galactose utilization
gluconeogenesis
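The gene sets named in the slide titles above ("at least 2-fold", "at least 3-fold") are selected by thresholding the expression ratio. A minimal sketch of that selection, assuming the log ratios are base-2 logarithms (the slides do not state the base) and using made-up gene names and values:

```python
import math


def select_genes(log_ratios, fold=2.0, direction="repressed"):
    """Genes whose log2 ratio passes a fold-change threshold.

    Missing measurements (None, shown as NA in expression tables) are skipped.
    """
    cutoff = math.log2(fold)
    selected = []
    for gene, value in log_ratios.items():
        if value is None:
            continue
        if direction == "repressed" and value <= -cutoff:
            selected.append(gene)
        elif direction == "induced" and value >= cutoff:
            selected.append(gene)
    return sorted(selected)


# Hypothetical log2(expression ratio) values for one experiment.
PROFILE = {"geneA": -1.7, "geneB": -1.2, "geneC": 0.3, "geneD": 2.1, "geneE": None}

print(select_genes(PROFILE, fold=2, direction="repressed"))
print(select_genes(PROFILE, fold=3, direction="repressed"))
print(select_genes(PROFILE, fold=2, direction="induced"))
```

A stricter threshold gives a smaller set of seed reactions, which is why the 3-fold and 2-fold selections for mannose yield different sets of pathways.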