Integration of omics data with
biochemical reaction networks
Maike Kathrin Aurich
Department of Life and Environmental Sciences
University of Iceland
2014
INTEGRATION OF OMICS DATA WITH
BIOCHEMICAL REACTION NETWORKS
Maike Kathrin Aurich
Dissertation submitted in partial fulfillment of a
Philosophiae Doctor degree in Biology
Advisor
Professor Ines Thiele
Thesis Committee
Professor Ólafur S. Andrésson
Professor Ines Thiele
Professor Jón J. Jónsson
Professor Sigurður Brynjolfsson
Opponents
Professor Dietmar Schomburg
Professor Fabien Jourdan
Department of Life and Environmental Sciences
School of Engineering and Natural Sciences
University of Iceland
Reykjavik, June 2014
Integration of omics data with biochemical reaction networks
Integration of omics data with biochemical networks
Dissertation submitted in partial fulfillment of a Ph.D. degree in Biology
Copyright © 2014 Maike Kathrin Aurich
All rights reserved
Department of Life and Environmental Sciences
School of Engineering and Natural Sciences
University of Iceland
Sturlugata 7 (Askja)
101, Reykjavik, Reykjavik
Iceland
Telephone: 525 4000
Bibliographic information:
Maike Kathrin Aurich, 2014, Integration of omics data with biochemical reaction
networks, Ph.D. thesis, Department of Life and Environmental Sciences, University
of Iceland.
ISBN XX
Printing: Háskólaprent, Fálkagata 2, 107 Reykjavik
Reykjavik, Iceland, June 2014
Abstract
The appearance of omics data sets has contributed to the rapid development of
systems biology, which seeks the understanding of complex biological systems.
Constraint-based modeling is one modeling formalism applied in systems biology,
which relies on genome-scale network reconstructions. Metabolic reconstructions
are increasingly used to understand normal cellular and disease states, which often
involves the generation of cell-line or tissue-specific metabolic models through the
integration of omics data. Metabolomic data can be easily obtained. Yet, methods
for the generation of condition-specific metabolic models are less well developed. In
this thesis, a workflow is established for the generation of condition-specific models
from extracellular metabolomic data and the human metabolic model. The analysis
of the models enables the investigation of metabolic phenotypes among cancer cell
line specific models, based on model predictions of ATP yield, and the robustness
of the models towards environmental and genetic perturbation. The models are built
through a rigid reduction of exchange reactions, which emphasizes the detected
metabolite concentration changes. However, the internal pathway redundancy remains widely preserved. Integration of transcriptomic data reduces the internal pathway
redundancy. Hence, in a following study, two lymphoblastic leukemia cell line
models are generated, combining metabolomic and transcriptomic data. The models explain distinctive concentration changes in the spent medium of the two cancer
cell lines by different utilization of glycolysis and oxidative phosphorylation. Analysis further reveals the accumulation of differential gene regulation and alternative
splicing events at key steps of central metabolic pathways. Metabolism is closely
intertwined with other cellular processes, namely signaling pathways, which play
a key role in diseases like cancer. Hence, a contextualization procedure for signaling networks was developed, opening yet another avenue for omics data analysis.
This approach is demonstrated through the contextualization of the Toll-like receptor (TLR) signaling network towards a generic monocyte TLR signaling network
at first, and subsequently towards an LPS activated TLR signaling network. Taken
together, my work extends the scope of omics data integration within the COBRA
field. The inference of internal network states from extracellular measurements, as
demonstrated herein, holds great potential for personalized medicine. However, further development is needed for the interpretation of metabolomic data derived from
bio-fluids. Additionally, contextualization of signaling and metabolic networks can
become crucial to understand the interplay between different cellular processes that
collectively give rise to complex diseases.
Útdráttur (Abstract, translated from Icelandic)
The advent of omics data has spurred the rapid development of systems biology, a discipline that
aims to increase the understanding of complex biological systems. Among the models
used in systems biology are constraint-based models of metabolic networks, which cover
a large part of the genomes of organisms. Models of metabolic networks are increasingly used
to understand the behavior of cells in healthy or diseased states. This
often involves building specialized models of a particular cell line or tissue type under particular
conditions. Such condition-specific models can be built by integrating omics data with generic models. Extracellular measurements of the metabolome of cells under
given conditions can be used to build specialized metabolic models. Such measurements are easy to obtain, but methods for building models from them have so far not
been sufficiently developed. This thesis presents a workflow for building condition-specific metabolic models from extracellular metabolomic measurements and
a generic model of the human metabolic network. Specialized models of cancer cell lines
can be used in studies of the metabolic phenotypes of such cell lines to predict
ATP yield and sensitivity to environmental and genetic changes. The models are built by
reducing the number of exchange reactions in accordance with measured changes in
metabolite concentrations. This reduction alone does not lead to a large decrease in the redundancy of internal metabolic pathways. A decrease in the redundancy of internal metabolic pathways is achieved
with additional data on the transcriptome of the cell lines. In a study described here,
we integrated both metabolomic and transcriptomic data to
build models of two lymphoblastic leukemia cell lines. The models explain
differences in concentration changes in the culture medium of these two cell lines by different use of glycolysis and oxidative phosphorylation. Our analysis also revealed
an accumulation of differential gene regulation events and alternative splicing at key steps of central metabolic pathways. Metabolism is closely linked to other cellular processes, especially signaling pathways, which play a key role in diseases such as cancer. We therefore developed a method to contextualize signaling networks, thereby opening
yet another way to analyze omics data. We demonstrate this method by contextualizing
the signaling network of Toll-like receptors (TLR network), first to a generic TLR network in monocytes, then to an LPS-activated TLR network. My work as a whole extends the scope
of omics data integration within systems biology. Methods for inferring
the internal state of metabolic networks from extracellular measurements open up great possibilities for
personalized medicine, as demonstrated here. However, further development of methods
is needed to interpret metabolomic data obtained from biofluids. In
addition, contextualization of signaling and metabolic networks can become key to understanding the interplay
of different cellular processes that together cause complex diseases.
To Inge Aurich, Miriam, Elias, Finn & Friederike
Contents
List of Figures
List of Tables
Abbreviations
Acknowledgements
1 Introduction
  1.1 Systems biology
  1.2 COBRA
    1.2.1 Methods to explore the solution space
    1.2.2 Flux balance analysis
    1.2.3 Flux variability analysis
    1.2.4 Sampling analysis
  1.3 Biochemical networks
  1.4 Signaling networks
    1.4.1 Innate Immunity
    1.4.2 Reconstruction of human Toll-like receptor signaling network
  1.5 Metabolism
    1.5.1 Human metabolic genome-scale reconstructions
    1.5.2 Cancer as a metabolic disease
    1.5.3 The importance of extracellular membrane transporters in Cancer
    1.5.4 Using COBRA to investigate cancer metabolism
  1.6 High-throughput data
    1.6.1 Transcriptomics
    1.6.2 Proteomics
    1.6.3 Metabolomics
  1.7 Analysis of omics data in the context of COBRA models
    1.7.1 Methods for network contextualization
    1.7.2 Human cell-type specific metabolic models
    1.7.3 Integration of metabolomic data sets
    1.7.4 COBRA for biomedical applications and personalized health
    1.7.5 Existing challenges
  1.8 Preview of this thesis
2 Metabolic heterogeneity and robustness among the NCI-60 cancer cell lines
  2.1 Introduction
  2.2 Results
    2.2.1 Generation of heterogeneous cancer cell line models
    2.2.2 Distinction of metabolic phenotypes
    2.2.3 Robustness towards genetic and environmental perturbation
  2.3 Discussion
  2.4 Material and Methods
  2.5 Supplementary material
3 Prediction of intracellular metabolic states from extracellular metabolomic data
  3.1 Introduction
  3.2 Results
  3.3 Pipeline for generation of condition-specific metabolic cell line models
    3.3.1 Generation of experimental data
    3.3.2 Analysis of experimental data
    3.3.3 Generation of the condition-specific models
    3.3.4 Condition-specific metabolic models for CCRF-CEM and Molt-4 cells
    3.3.5 Condition-specific cell line models predict distinct metabolic strategies
  3.4 Experimental validation of energy and redox status of CCRF-CEM and Molt-4 cells
  3.5 Comparison of network utilization and alteration in gene expression
  3.6 Accumulation of DEGs and AS genes at key metabolic steps
  3.7 Single gene deletion
  3.8 Discussion
  3.9 Materials and methods
  3.10 Supplementary material
4 Contextualization Procedure and Modeling of Monocyte Specific TLR Signaling
  4.1 Introduction
  4.2 Results
    4.2.1 Extensions of gene results in ihsTLRv2
    4.2.2 Protein-Protein Interactions (PPI) in InnateDB and ihsTLRv2
    4.2.3 SNPs in the TLR signaling network
    4.2.4 Tissue specific TLR expression
    4.2.5 Protein abundance of ihsTLRv2 in cancer cell lines
    4.2.6 Generation of a draft monocyte specific TLR model based on gene expression data
    4.2.7 Literature based curation of the draft monocyte specific TLR model
    4.2.8 Tailoring the monocyte TLR model to a LPS stimulation specific model
    4.2.9 Condition specific network states of monocyte TLR signaling
    4.2.10 Sensitivity analysis
    4.2.11 Setting quantitative gene expression changes into context
  4.3 Discussion
  4.4 Materials and Methods
  4.5 Supplementary material
5 Conclusions and future directions
  5.1 Conclusions
  5.2 Future applications
    5.2.1 Extension of the TLR signaling network
    5.2.2 Future directions in the integration of metabolomics data
    5.2.3 COBRA modeling of cancer and beyond
Bibliography
6 List of Publications
List of Figures
1.1 COBRA: Definition and methods for the functional analysis of the feasible solution space.
1.2 Applications of the human metabolic model.
1.3 Omics data sets provide a snap-shot of the cellular components at large scale.
1.4 Ideal systems biological approach.
2.1 Metabolic models provide a context for the analysis of metabolomic data.
2.2 Distinction of the models based on energy and cofactor production.
2.3 Distinct phenotypes with regard to oxygen requirements.
2.4 Six model clusters were distinguished according to the models' robustness towards environmental changes.
2.5 The models have different sets of essential genes.
2.6 Variation between samples of the same cell line.
2.7 ATP yield is not informative for the division of OxPhos models.
2.8 Distinct solution spaces were observed for the 120 models.
2.9 Highest or lowest number of KOs were not associated with any phenotype defined by the previous analysis.
2.10 ATP yield does not correlate with maximal growths.
2.11 Metabolic strategies considering both ATP producing glycolysis reactions.
2.12 ATP yields do not correspond to the separated clusters of models from the Phase plane analysis.
3.1 Combined experimental and computational pipeline to study human metabolism.
3.2 Sampling reveals different utilization of glycolysis.
3.3 Differences in the use of the TCA cycle by the CCRF-CEM model and the Molt-4 model.
3.4 Sampling reveals different utilization of oxidative phosphorylation.
3.5 Experimental validation of model predictions.
3.6 Growth and apoptosis of Molt-4 and CCRF-CEM cells.
4.1 Expression of ihsTLRv2 gene products in normal human tissues.
4.2 Workflow leading from ihsTLRv1 to a data driven monocyte and LPS stimulated monocyte model.
4.3 Definition of cutoff for initial monocyte draft-model.
4.4 Sensitivity analysis.
4.5 Network resulting from mapping of the up-regulated genes onto the LPS stimulation specific monocyte model.
4.6 Comparison of (chemical compound) connectivity in the LPS stimulation specific versus the up-regulated sub-network.
4.7 Network modules resulting from mapping of the down-regulated genes onto the LPS stimulation specific monocyte TLR model.
List of Tables
1.1 Metabolite transporters relevant to cancer and their current coverage in Recon 2.
1.2 Methods for network contextualization.
2.1 Reactions discarded from flux split analysis (and ATP yield).
2.2 Distinct Phenotypes.
2.3 Sampling results of the isocitrate dehydrogenase and pyruvate dehydrogenase.
2.4 Excluded were uncalibrated metabolites and those that could not be produced nor consumed by Recon.
2.5 Added exchanges.
2.6 Metabolite uptake and secretion not possible in the model.
2.7 Reactions added to the starting model.
2.8 Models that were infeasible when constrained to experimental growth.
3.1 Differentially expressed genes (DEGs) and alternative splicing (AS) events of central metabolic and cancer-related pathways.
3.2 Reactions added to Recon 2 and the global model.
3.3 Comparison of flux changes and gene expression changes of genes more highly expressed in Molt-4 cells.
3.4 Unique Knock-out (KO) genes for each cancer cell line model.
3.5 Metabolomic data of CCRF-CEM cells (mapped).
3.6 Metabolomic data of CCRF-CEM cells (not mapped).
3.7 Metabolomic data of Molt-4 cells (mapped).
3.8 Metabolomic data of Molt-4 cells (not mapped).
3.9 Tables of absent genes.
3.10 Differentially expressed Recon 1 genes (down-regulated).
3.11 Differentially expressed Recon 1 genes (up-regulated).
3.12 Detection limits for the definition of model bounds.
3.13 Calculation of the growth rates and definition of upper (ub) and lower bounds (lb) imposed on the CCRF-CEM model.
3.14 Calculation of the growth rates and definition of upper (ub) and lower bounds (lb) imposed on the Molt-4 model.
3.15 Lower bounds of commonly exchanged metabolites were adjusted according to the relation of change in uptake/secretion in the experiment.
4.1 Statistics of the gene extension of the generic human TLR model.
4.2 Comparison between InnateDB interactions among ihsTLRv2 genes and interactions of ihsTLRv2 network species within ihsTLRv2.
4.3 Table summarizing ihsTLRv2 genes with clinically linked SNPs, corresponding clinical phenotypes and consequences of in silico knock out on ihsTLRv2 function.
4.4 Distribution of absent genes.
4.5 Inputs and outputs covered by generic (ihsTLRv2) and monocyte specific (hMonoTLR & hMonoTLR_LPS) TLR signaling models.
4.6 Maximum possible flux values for output reactions in the different TLR signaling models.
4.7 TLR11 receptor was removed from ihsTLRv2 along with 10 reactions associated.
4.8 TLR11 receptor was removed from ihsTLRv2 along with seven other metabolites.
4.9 Added exchange reactions.
4.10 Literature evidence for the presence of proteins in monocytes.
4.11 Pathway curation of the monocyte draft-model based on output capabilities.
4.12 Curation of hMonoTLR.
4.13 Significantly up-regulated hMonoTLR_LPS genes.
4.14 Significantly down-regulated hMonoTLR_LPS genes.
4.15 I/O relationships.
4.16 Changes due to mapping of quantitative gene expression changes.
4.17 Changes due to mapping of quantitative gene expression changes (part 2).
Glossary
• AS - Alternatively spliced genes
• ADP - Adenosine diphosphate
• AP-1 - Activating protein-1
• ATP - Adenosine triphosphate
• ACHR - Artificial centering hit-and-run sampler
• BIGG - Biochemically, genetically, and genomically structured knowledgebase of the target organism
• CASP8 - Caspase-8
• COBRA - Constraint-based reconstruction and analysis
• CNS - Central nervous system
• CHO - Chinese Hamster ovary cells
• DEGs - Differentially expressed genes
• DABG - Detection above background
• DIOS - Distinct input/output pathways
• ETC - Electron transport chain
• EGFR - Epidermal growth factor receptor
• FC - Fold change
• FBA - Flux Balance Analysis
• FVA - Flux Variability Analysis
• FH - Fumarate hydratase
• FADD - Fas (TNFRSF6)-associated via death domain
• LC-MS - Liquid chromatography-mass spectrometry
• GC-MS - Gas chromatography-mass spectrometry
• GSH - Reduced glutathione
• G6PD - Glucose-6-phosphate dehydrogenase
• GEO - Gene Expression Omnibus
• GRAs - Gene-reaction associations
• GTP - Guanosine triphosphate
• GPRs - Gene-Protein-Reaction associations
• GENRE - Genome-scale network reconstruction
• GIMME - Gene Inactivity Moderated by Metabolism and Expression
• GIM(3)E - Gene Inactivation Moderated by Metabolism, Metabolomics and
Expression
• HMDB - Human Metabolomics Database
• HCC - Hepatocellular carcinoma
• HHT - Hereditary Hemorrhagic Telangiectasia
• HDHC - Histone acetylase
• HMR - Human metabolic reaction database
• HT - High-throughput data
• HLRCC - Hereditary leiomyomatosis and renal-cell cancer
• iBMK - Immortalized baby mouse kidney epithelial cells
• IRFs - Interferon regulatory factors
• iMAT - Integrative Metabolic Analysis tool
• IDH - Isocitrate dehydrogenase
• IL1R1 - Interleukin-1 receptor 1
• IRAK1 - IL-1 receptor associated kinase 1
• I/O - Input-output
• IRF3 - Interferon regulatory factor 3
• IRF7 - Interferon regulatory factor 7
• IKK - Inhibitor of the kappa light polypeptide gene enhancer in B-cells kinase
• KO - (Gene) knock-out
• LPS - Lipopolysaccharide
• LB - Lower bound
• LBP - Lipopolysaccharide-binding protein
• LP - Linear programming
• MDR - Multidrug resistance
• MBA - Model Building Algorithm
• mCADRE - Context-specificity Assessed by Deterministic Reaction Evaluation
• MILP - Mixed-integer linear programming
• MS - Mass-spectrometry
• NMR - Nuclear magnetic resonance spectroscopy
• NCF2 - Neutrophil cytosolic factor 2
• ORAC - Oxygen Radical Absorbance Capacity
• OF - Objective function
• PC - Pyruvate carboxylase
• PhPP - Phenotypic phase plane analysis
• PGDH - Phosphoglycerate dehydrogenase
• PPP - Pentose phosphate pathway
• PKA - Protein kinase A
• PELI3 - Pellino homolog 3
• PPI - Protein-Protein Interactions
• PI3K1A - Phosphoinositide 3-kinase
• PGK - Phosphoglycerate kinase
• ROS - Reactive oxygen species
• RMA - Robust Multi-array Analysis
• RC - Reductive carboxylation
• Recon - The human metabolic genome-scale reconstruction
• SOG (pathway) - Serine biosynthesis, one-carbon metabolism, and the glycine
cleavage system
• SDH - Succinate dehydrogenase
• SNP - Single nucleotide polymorphism
• S - Stoichiometric matrix
• SPs - Side populations
• TLR - Toll-like receptor
• TF - Transcription factors
• TIRAP - TIR domain containing adaptor protein
• TCA - Tricarboxylic acid
• UB - Upper bound
Acknowledgments
I thank the ERC that funded my doctorate studies at the Center for Systems Biology,
at the University of Iceland. Further, I thank Bernhard Ø. Palsson for giving me the
opportunity to conduct this PhD.
Foremost, I would like to thank my advisor Professor Ines Thiele, for providing me
with the opportunity to complete my PhD thesis at the Center for Systems Biology,
at the University of Iceland. I am very grateful for all the support and guidance
that made my thesis work possible. I am very grateful for the patience, motivation, and
enthusiasm that she dedicated to me.
Many thanks to the remaining members of the Center for Systems Biology who
have provided their support, guidance, encouragement and friendship. Thank you
all so much for the time we spent together.
I want to thank my family for their support during the long years I spent abroad, the
understanding and uplifting words that I always received from them.
I thank all my friends in Iceland, Germany, and Austria for their patience and encouragement. Also, I am very happy about the support I received from my friends
in Luxembourg, who were a great help to make the final steps happen.
Particularly, I want to thank my friend Friederike. I would not be at this point
without you. Although separated by an ocean, I was always aware of your belief in
me and your support. Thank you so much.
1 Introduction
1.1 Systems biology
Methodological developments in molecular biology now allow simultaneous measurements of thousands of cellular components at different hierarchical levels, including mRNA, proteins, and metabolites (Figure 1.3, [1]). The flood of data has
challenged data management and analysis in biological laboratories, but at the same
time such comprehensive information provides valuable resources for systems biology, which seeks to investigate the behavior of biological systems at large scale [1].
Network reconstructions are a common tool applied by systems biologists, some of
which can be converted into in silico models and used to interrogate the respective
system’s functional properties [2]. Systems biologists apply different formalisms
to modeling and simulation [3, 4]. Generally, there is a separation between top-down and bottom-up network reconstruction. Top-down approaches infer networks directly from the data, yet the resulting connections of
these networks might not represent actual biological interactions. Networks reconstructed in a bottom-up fashion from extensive amounts of biochemical literature, on the other hand, provide a mechanistic framework for the analysis of omics
data sets, and can therefore elucidate genotype-phenotype relationships. Detailed
dynamic models are still limited to small scale (e.g., single pathways) due to the lack
of the kinetic parameters that describe each of the reactions in these complex systems.
The amount of data needed to obtain these parameters and their condition-specific
variation make the acquisition even more complicated [4, 5]. Constraint-based reconstruction and analysis (COBRA) circumvents this parameter bottleneck by assuming a
quasi steady-state. This assumption allows the simulation of the system's behavior at
large scale, and because of its comprehensiveness, it constitutes an ideal framework
for the analysis of high-dimensional omics data sets [3, 4].
The focus of this PhD thesis in the field of systems biology is the fusion of omics
data sets and biochemical networks using COBRA for systems-wide analysis of biochemical processes, e.g., the innate immune response and metabolism in health and
disease. The following sections will provide an overview of the COBRA approach,
biochemical networks of human metabolism and innate immune signaling, as well
as the state-of-the-art of network contextualization and its application to biomedical
research, relevant to this thesis.
1.2 COBRA
The COBRA approach uses stoichiometry of biochemical reactions to mathematically represent biochemical networks of cellular processes, i.e., metabolism, signaling or transcriptional/translational networks [2, 6, 7, 8]. The genome-scale network reconstruction (GENRE) is assembled in a bottom-up reconstruction process,
according to standardized operating procedures, based on extensive amounts of
organism-specific literature [9]. It presents a biochemically, genetically, and genomically (BIGG) structured knowledge-base of the target organism that is curated
and validated to ensure correct prediction of biological functions by the resulting
model [2, 9, 10].
GENREs contain a hierarchical structure where known genes are connected to the
proteins and enzymes and the catalyzed reactions. These Gene-Protein-Reaction associations (GPRs) are formulated as Boolean rules considering isozymes (OR) and
all subunits of protein complexes (AND). The GPRs are the entry points for the integration of transcriptomic and proteomic data into the network context, and correct
formulation of the GPRs is an important prerequisite for any network contextualization.
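To illustrate how a GPR rule can serve as the entry point for expression data, the following minimal Python sketch evaluates a Boolean rule against a set of expressed genes. The gene identifiers, the rule strings, and the helper name `gpr_is_active` are hypothetical and chosen for illustration only; COBRA software packages parse GPRs with their own routines.

```python
import re

def gpr_is_active(rule, expressed_genes):
    """Evaluate a GPR rule: AND joins complex subunits, OR joins isozymes."""
    def substitute(match):
        token = match.group(0)
        if token.lower() in ("and", "or"):
            return token.lower()               # keep the Boolean operators
        return str(token in expressed_genes)   # gene id -> "True" or "False"

    # After substitution only True, False, and, or, and parentheses remain.
    expression = re.sub(r"[A-Za-z0-9_]+", substitute, rule)
    return eval(expression, {"__builtins__": {}}, {})

# Hypothetical rule: a two-subunit complex (geneA, geneB) or an isozyme (geneC).
rule = "(geneA AND geneB) OR geneC"
active_via_isozyme = gpr_is_active(rule, {"geneC"})
active_incomplete_complex = gpr_is_active("geneA AND geneB", {"geneA"})
```

In this sketch the reaction stays active when only the isozyme is expressed, whereas a protein complex with a missing subunit is treated as inactive.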
Once a comprehensive reaction list has been compiled and all known genes have
been associated with those reactions, the reconstruction can be converted into a
mathematical model. This is done by converting the reaction list into a matrix
format (the stoichiometric matrix, S) and formulating the system's boundaries,
equivalent to the constraints on the in vivo system (Figure 1.1, [2, 10]). The stoichiometric matrix (S) contains a row for each metabolite and a column for each reaction
[5, 10, 11]. The non-zero entries of S describe which metabolites participate in
each reaction, with negative entries identifying substrates and positive entries
identifying products [5].
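The conversion of a reaction list into a stoichiometric matrix can be sketched in a few lines of Python. The three-reaction toy network and the function name `build_stoichiometric_matrix` below are assumptions made for illustration; they are not taken from Recon or from the COBRA Toolbox.

```python
def build_stoichiometric_matrix(reactions):
    """Return (metabolites, reaction_ids, S); S[i][j] is the coefficient
    of metabolite i in reaction j (negative: substrate, positive: product)."""
    metabolites = sorted({m for stoich in reactions.values() for m in stoich})
    reaction_ids = list(reactions)
    S = [[reactions[r].get(m, 0) for r in reaction_ids] for m in metabolites]
    return metabolites, reaction_ids, S

# Hypothetical toy network: uptake of A, conversion of A to B, export of B.
toy_reactions = {
    "uptake_A": {"A": 1},            #   -> A
    "A_to_B":   {"A": -1, "B": 1},   # A -> B
    "export_B": {"B": -1},           # B ->
}
metabolites, reaction_ids, S = build_stoichiometric_matrix(toy_reactions)
```

Here `S` has one row per metabolite (A, B) and one column per reaction, mirroring the layout described in the text.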
Constraints restricting biological systems can be divided into three groups: physicochemical (hard) constraints (mass and energy conservation), environmental constraints that express time- and condition-specific differences (e.g., pH or nutrients),
and self-imposed, regulatory constraints [12]. In the model, constraints are routinely applied as either balances or bounds. According to the physical law of mass
conservation, net production and consumption of a metabolite are balanced at steady
state. The steady-state assumption (including all mass balance equations) is expressed mathematically by S · v = 0, where v is the flux vector containing all
reaction fluxes of one of the possible steady states of the system [13]. The
steady-state assumption is biologically justified by the notion that transients in biochemical reaction networks are much faster than other cellular events, e.g., cellular growth, and environmental and regulatory changes [13]. Bounds constitute
upper and lower limits (v_min ≤ v ≤ v_max), and restrictions on reaction directions
(v_min = 0 or v_max = 0) [5, 14] (Figure 1.1). The upper and lower bounds can further
be set in accordance with experimental data (e.g., metabolite uptake and secretion
fluxes) [5, 14], for a more realistic definition of the solution space, selecting the
condition-specific subset of feasible flux distributions from the entirety of possible
network states (Figure 1.1). The bounds thus provide the entry point for the integration of the
extracellular metabolomic data, as carried out in chapter 2 and chapter 3 of this
thesis.
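How measured uptake and secretion rates could be turned into bounds can be sketched as follows. The reaction identifiers, the flux values, the 10% error margin and the helper `constrain_exchanges` are all assumptions made for illustration. The sign convention (uptake as negative and secretion as positive exchange flux) follows common COBRA practice and is likewise assumed here.

```python
def constrain_exchanges(bounds, measured, rel_error=0.1):
    """Return a copy of `bounds` in which each measured exchange reaction is
    restricted to its measured flux plus/minus a relative error margin."""
    new_bounds = dict(bounds)
    for rxn, flux in measured.items():
        margin = abs(flux) * rel_error
        new_bounds[rxn] = (flux - margin, flux + margin)
    return new_bounds


# Hypothetical default bounds (vmin, vmax) of an unconstrained model.
bounds = {
    "EX_glc": (-1000.0, 1000.0),
    "EX_lac": (-1000.0, 1000.0),
    "R1": (0.0, 1000.0),
}

# Hypothetical measured fluxes: glucose uptake (negative), lactate secretion (positive).
measured = {"EX_glc": -2.0, "EX_lac": 3.5}

constrained = constrain_exchanges(bounds, measured)
for rxn, (lo, hi) in constrained.items():
    print(rxn, lo, hi)
```

Allowing a margin around each measurement, rather than fixing the flux to a single value, is one simple way to keep the model feasible in the presence of measurement error.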
Once brought into model format, the set of feasible network states can be interrogated (Figure 1.1), e.g., using MATLAB and the COBRA toolbox [15, 16].
Figure 1.1: COBRA: Definition and methods for the functional analysis of the feasible solution space. Figure redrawn based on Orth et al. 2010 and Price et al. 2004,
[11, 17]
1.2.1 Methods to explore the solution space
Numerous methods exist to interrogate COBRA models [15, 16]. These can be divided into biased and unbiased methods. Biased methods rely on the optimality principle and require a user-defined objective function (OF), which biologically
translates into the "cellular goal" (Figure 1.1). Biomass generation and ATP production are commonly used OFs [18, 19]. A biomass OF is used to identify the
subset of model states which support optimal biomass production by the model.
This optimality principle is, at least in microorganisms, thought to be the outcome
of an evolutionary process, driving the organism to maximal proliferation rates and
the optimal use of the available, usually limited resources. The definition of an
OF is more difficult for cells of multi-cellular organisms, e.g., differentiated, non-proliferating cells. In contrast, highly proliferating cancer cells might indeed seek
optimal biomass production [20].
Unbiased methods in contrast allow the interrogation of the allowable solution space
without any prior optimality assumption. In the course of this PhD, sampling methods have increasingly been applied to investigate cell-type specific metabolic networks [5, 12, 21]. Below, a subset of biased and unbiased interrogation methods
relevant to the presented studies will be briefly introduced.
1.2.2 Flux balance analysis
Flux balance analysis (FBA) is used to predict a single flux distribution through
the formulation of a linear programming (LP) problem, either minimizing or maximizing the flux through the objective function subject to all imposed constraints
(Figure 1.1) [11, 12]. The output flux vector v describes how much each reaction in
the network contributes to the phenotype [11].
The following LP problem is solved to maximize the stated objective Z (adapted
from [11, 22]):
max Z = c · v
s.t. S · v = 0
vmin ≤ v ≤ vmax,
where c is a vector that identifies the objective, and the column vector v indicates
how much each reaction contributes [11].
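The LP problem above can be illustrated on a toy network with a general-purpose LP solver. The sketch below uses `scipy.optimize.linprog` rather than the COBRA toolbox; the five-reaction network, its bounds and the choice of objective are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network (hypothetical). Columns: EX_A, R1, R2, R3, EX_B; rows: A, B, C.
S = np.array([
    [1, -1, -1,  0,  0],   # A
    [0,  1,  0,  1, -1],   # B
    [0,  0,  1, -1,  0],   # C
], dtype=float)

v_min = np.zeros(5)                                        # all reactions irreversible
v_max = np.array([10.0, 1000.0, 1000.0, 1000.0, 1000.0])   # uptake of A capped at 10
c = np.array([0.0, 0.0, 0.0, 0.0, 1.0])                    # objective: secretion of B

# linprog minimizes, so c · v is maximized by minimizing -c · v.
result = linprog(-c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                 bounds=list(zip(v_min, v_max)))

Z = -result.fun   # optimal objective value
v = result.x      # one optimal flux distribution (others may exist)
print("Z =", Z)
print("v =", v)
```

In this toy network the objective is limited by the uptake bound on A, while the split between the two routes from A to B (R1 versus R2 and R3) is left to the solver.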
Since the system S · v = 0 is under-determined, i.e., the number of reactions exceeds the number of
metabolites, many flux distributions can exist with the same maximum objective
value [13, 22]. FBA, however, returns only one solution, which lies at a corner
of the allowable solution space (Figure 1.1) [11].
Whereas the multitude of alternate optimal solutions reflects the system's flexibility, the actual cellular state depends on additional factors such as the interplay of
enzymatic and genetic regulatory events [5, 13]. In the absence of constraints detailed enough
to exclude unlikely network states, all alternate optimal solutions could
represent biologically meaningful solutions, which makes them worth investigating
(Figure 1.1).
1.2.3 Flux variability analysis
Flux Variability Analysis (FVA) can provide insights into alternate optimal solutions, which are an expression of the network redundancy. This redundancy contributes to the robustness of the metabolic network (Figure 1.1) [22]. FVA is a
variation of FBA, which reports for each reaction in the model the minimal and the
maximal allowable flux [22]. The analysis returns the range of allowable fluxes for
each reaction, and can identify reactions that are never used, or are used differently, under
distinct sets of environmental or genetic conditions [5, 22].
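A minimal sketch of FVA on a toy network is given below, again using `scipy.optimize.linprog` instead of the COBRA toolbox. The network, the bounds, the function name `flux_variability` and its parameters `fraction` and `slack` are assumptions for illustration. The objective is first optimized by FBA and then held at (a fraction of) its optimum while every reaction flux is minimized and maximized in turn.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network (hypothetical). Columns: EX_A, R1, R2, R3, EX_B; rows: A, B, C.
S = np.array([
    [1, -1, -1,  0,  0],
    [0,  1,  0,  1, -1],
    [0,  0,  1, -1,  0],
], dtype=float)
bounds = [(0.0, 10.0)] + [(0.0, 1000.0)] * 4
c = np.array([0.0, 0.0, 0.0, 0.0, 1.0])   # objective: secretion of B


def flux_variability(S, bounds, c, fraction=1.0, slack=1e-9):
    """Return (min, max) flux per reaction while c · v stays at `fraction` of its optimum."""
    n_mets, n_rxns = S.shape
    b_eq = np.zeros(n_mets)
    # Step 1: FBA to obtain the optimal objective value.
    z_opt = -linprog(-c, A_eq=S, b_eq=b_eq, bounds=bounds).fun
    # Step 2: require c · v >= fraction * z_opt, written as -c · v <= -(fraction * z_opt - slack).
    # The small slack guards against numerical infeasibility.
    A_ub = -c.reshape(1, -1)
    b_ub = np.array([-(fraction * z_opt - slack)])
    ranges = []
    for j in range(n_rxns):
        e = np.zeros(n_rxns)
        e[j] = 1.0
        lo = linprog(e, A_ub=A_ub, b_ub=b_ub, A_eq=S, b_eq=b_eq, bounds=bounds).fun
        hi = -linprog(-e, A_ub=A_ub, b_ub=b_ub, A_eq=S, b_eq=b_eq, bounds=bounds).fun
        ranges.append((lo, hi))
    return ranges


ranges = flux_variability(S, bounds, c)
for name, (lo, hi) in zip(["EX_A", "R1", "R2", "R3", "EX_B"], ranges):
    print(f"{name}: [{lo:.3f}, {hi:.3f}]")
```

In this example the exchange fluxes are fixed by the optimum, whereas the two alternative routes from A to B can each carry anything between none and all of the flux, which is the kind of redundancy FVA is meant to expose.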
Another biased FBA-based interrogation method is single gene deletion. Here,
gene knock-outs (KO) are simulated by constraining gene-associated reactions to
zero, followed by the assessment of the impact of the network perturbation through
FBA [23]. This analysis is used to identify the weak links in the network, e.g., for
the prediction of drug targets to combat cancer [24].
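The first step of such a simulation, mapping a deleted gene onto reaction bounds via the GPRs, can be sketched as follows. The gene names, the GPRs and the representation of each GPR as a list of gene sets (alternative enzymes joined by OR, subunits within one set joined by AND) are hypothetical choices made for this example. The subsequent FBA on the modified bounds is not shown.

```python
# Each GPR is a list of gene sets: the sets are alternatives (OR),
# and all genes within one set are required together (AND).
gprs = {
    "R1": [{"g1"}],             # single enzyme
    "R2": [{"g2", "g3"}],       # protein complex: g2 AND g3
    "R3": [{"g3"}, {"g4"}],     # isozymes: g3 OR g4
}

bounds = {
    "EX_A": (0.0, 10.0),
    "R1": (0.0, 1000.0),
    "R2": (0.0, 1000.0),
    "R3": (0.0, 1000.0),
    "EX_B": (0.0, 1000.0),
}


def reaction_active(gpr, deleted):
    """A reaction stays active if at least one enzyme (gene set) lost no gene."""
    return any(not (gene_set & deleted) for gene_set in gpr)


def knock_out(bounds, gprs, deleted_genes):
    """Return new bounds in which reactions disabled by the deletion are set to zero."""
    deleted = set(deleted_genes)
    new_bounds = dict(bounds)
    for rxn, gpr in gprs.items():
        if not reaction_active(gpr, deleted):
            new_bounds[rxn] = (0.0, 0.0)
    return new_bounds


ko_g3 = knock_out(bounds, gprs, ["g3"])
blocked = [rxn for rxn, b in ko_g3.items() if b == (0.0, 0.0)]
print(blocked)  # reactions constrained to zero flux by deleting g3
```

Deleting g3 disables the complex-catalyzed reaction R2 but not R3, for which the isozyme encoded by g4 remains available. Reactions without a GPR, such as the exchange reactions here, are left untouched.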
1.2.4 Sampling analysis
A more comprehensive resolution of the alternate flux distributions compared to
FVA can be achieved through sampling analysis. During the sampling process randomly distributed points (each comprising a flux distribution) are picked from the
feasible solution space, as a representation for the entire solution space (Figure 1.1,
[25]). The artificial centering hit-and-run sampler (ACHR) is implemented in the
COBRA toolbox, and has been used to study the solution space of larger networks
[15, 25, 26]. The procedure starts from an initial point moving through the space
with a randomly chosen direction and step length [25]. Only every i-th point is collected to support a random distribution of the sampling points. However, in large-scale
networks, the high dimensionality and size of the solution space render the coverage of the entire solution space in finite time uncertain, which has also been
referred to as the slow mixing problem [25]. The outcome of the sampling analysis is commonly illustrated as ranges of feasible fluxes, which are comparable to
the results of FVA (but unbiased), or as reaction-wise probability distributions
[12].
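The basic idea of the random walk can be sketched with a plain hit-and-run sampler. This is a simplified stand-in and not the ACHR implementation of the COBRA toolbox: directions are drawn from the null space of S without any artificial centering. The toy network, the bounds, the starting point and all function names are assumptions for illustration.

```python
import numpy as np


def null_space(S, tol=1e-10):
    """Columns of the returned matrix span {v : S · v = 0} (via SVD)."""
    _, s, vt = np.linalg.svd(S)
    rank = int((s > tol).sum())
    return vt[rank:].T


def hit_and_run(S, v_min, v_max, v0, n_samples=200, thinning=10, seed=0):
    """Plain hit-and-run: random direction in the null space, uniform step length."""
    rng = np.random.default_rng(seed)
    N = null_space(S)
    v = np.array(v0, dtype=float)
    samples = []
    for step in range(n_samples * thinning):
        d = N @ rng.standard_normal(N.shape[1])      # direction that keeps S · v = 0
        nz = np.abs(d) > 1e-12
        with np.errstate(divide="ignore", invalid="ignore"):
            t1 = (v_min - v) / d
            t2 = (v_max - v) / d
        t_lo = np.max(np.minimum(t1, t2)[nz])        # largest lower limit on the step
        t_hi = np.min(np.maximum(t1, t2)[nz])        # smallest upper limit on the step
        v = v + rng.uniform(t_lo, t_hi) * d
        if (step + 1) % thinning == 0:               # keep only every i-th point
            samples.append(v.copy())
    return np.array(samples)


# Toy network (hypothetical). Columns: EX_A, R1, R2, R3, EX_B; rows: A, B, C.
S = np.array([
    [1, -1, -1,  0,  0],
    [0,  1,  0,  1, -1],
    [0,  0,  1, -1,  0],
], dtype=float)
v_min, v_max = np.zeros(5), np.full(5, 10.0)
v0 = [5.0, 3.0, 2.0, 2.0, 5.0]                       # a feasible interior starting point

samples = hit_and_run(S, v_min, v_max, v0)
print(samples.mean(axis=0))                          # mean sampled flux per reaction
```

Each row of `samples` is one flux distribution; per-reaction histograms of the columns would give the reaction-wise distributions mentioned above.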
1.3 Biochemical networks
A cell comprises a multitude of components that support its structure, its functions
or both. Even more complex than the component list are the interactions among
the components, which make up the global cellular network. COBRA models to
date cover only parts of the global cellular network, which are classified based on
biological functionality, e.g., metabolism or signaling [4]. However, the integration
of multiple functions, and the development towards whole-cell models is an active
field of research [27, 28].
1.4 Signaling networks
The purpose of signaling networks is to convey information between the environment and the cell [2]. Signaling networks transmit extracellular signals, emitted
either by other cells or as cues from the environment, into the cell and into the nucleus, where transcription factors induce gene expression [2]. The process of signal
transduction consecutively involves the binding of an extracellular ligand to a specific receptor and the intracellular transmission of the evoked signal, e.g., by phosphorylation cascades that amplify the signal, and might evoke adaptations of the cell
through changes in gene expression programs [2]. About two thousand genes in
the human genome encode receptors, kinases and phosphatases, which participate in
numerous signaling cascades [29]. These pathways are highly interconnected (cross-talk), which renders signal transduction networks highly complex [2]. Signaling networks are further connected to other cellular processes, such as metabolism,
and to regulatory networks. In the case of metabolism, connections exist through the
dependency of the signaling pathways on energy and the utilization of common
components [29]. Mammalian signaling networks that have been reconstructed include the
Toll-like receptor (TLR) signaling network [7] and the JAK-STAT signaling network [30].
1.4.1 Innate Immunity
In vertebrates, two immune systems exist, the innate and the adaptive immune system [31]. Innate immunity is the first line of defense, and provides rapid response to
the invasion of pathogens [32, 33]. The human TLR signaling network is involved
in both innate and adaptive immunity [31, 34]. At least ten distinct TLRs have
been identified in humans [35]. These TLRs are involved in the response to various microbial components, such as lipopolysaccharide (LPS), lipoprotein, porins,
peptidoglycan, flagellin, single- and double-stranded RNA, and unmethylated CpG
oligonucleotides ([36] and references therein). Recognition of microbial components
by TLRs initiates signal transduction that leads to the activation of transcription factors, which induce the expression of cytokines and other genes [33, 37]. Individual TLRs
interact with different combinations of adapter proteins and activate transcription
factors such as NF-κB, activating protein-1 (AP-1), and interferon regulatory factors (IRFs) to drive an immune response [31]. Activation of TLR signaling has
been observed in a number of human diseases, including cancer, and tissue-specific
differences in TLR expression and cell response to environmental stimuli have been
recognized as a major challenge in cell signaling [38, 39, 40, 41, 42].
1.4.2 Reconstruction of human Toll-like receptor signaling network
The mammalian TLR signaling network [7] has been manually assembled based on
published literature and a comprehensive map of TLR signaling [43]. It accounts for
909 reactions and 752 distinct chemical components. A total of 14 Toll-like receptors, 49 distinct ligands (including many microbial components), and six possible
outputs have been considered. Overall, the functions of 158 protein-kinases and
16 phosphatases have been included in the TLR network. The outputs include NF-κB, CRE, AP-1, reactive oxygen species (ROS) production, IRF3, and IRF7. The
metabolites that were part of the signaling network were among the most highly
connected network species [7]. Interrogation of the network led to the identification of ten distinct input/output (DIOS) pathways. The DIOS pathways were used
to predict potential candidates to selectively interrupt ROS production, IL-1, and
MyD88 pathways without compromising other DIOS pathways [7]. However, the
reconstruction did not include genes or GPRs, such that integration of omics data
into the network was not possible.
1.5 Metabolism
Metabolism is a vital cellular process and comprises thousands of enzymatic reactions that generate energy and metabolites used to support cellular functions [44].
The multitude of reactions are arranged into sequential biochemical pathways, which
are generally divided into anabolism and catabolism. Anabolism supplies the cell
with building blocks such as amino acids and nucleic acids for maintenance and
proliferation [2, 44]. Catabolism, on the other hand, mediates the breakdown and
salvage of nutrients and cellular components for energy generation [2, 44]. Within
the cell, membranes separate metabolite pools, and a multitude of membrane transporters is necessary to connect metabolic pathways in different cellular compartments.
Gene expression programs define the set of enzymes present in a cell, and can be
altered in response to changes in environmental, cellular or genetic conditions. In
humans, single cells and tissue types contribute only a specific subset of metabolic
functions to the systemic, whole-body metabolism. These differences arise through
differences in the expression of enzymes and isoforms and through alternative splicing of transcripts, which alter the utilization of reactions and pathways. Given its central role
in the maintenance and proliferation of cells, metabolism is strictly regulated, and
alterations of metabolism have been connected to various human diseases [2].
The following sections will provide an overview of the human metabolic network
and of human cancer, one of the most successful biomedical applications of COBRA
and a biological topic recurring in this thesis.
1.5.1 Human metabolic genome-scale reconstructions
Published in 2007, Homo sapiens Recon 1 was the first genome-scale reconstruction of human metabolism [45]. It captured the functions of 2004 proteins, 2766
metabolites, and 3311 metabolic and transport reactions, which were assembled
in a bottom-up reconstruction process based on an extensive body of literature.
Its pathways are distributed over eight cellular compartments (cytoplasm, mitochondria, nucleus, endoplasmic reticulum, Golgi apparatus, lysosome, peroxisome and
the extracellular environment). It was validated based on 288 metabolic functions known to occur in cells throughout the human body. Since its publication,
it has been extensively used as a knowledge base of human metabolism, to investigate general and cell-type specific metabolism, to close knowledge and network
gaps in human metabolism, and for data mapping and the generation of tissue
specific models, as well as to investigate human disease processes (Figure 1.2,
[24, 46, 47, 48, 49, 50, 51]). Further, Recon 1 and the tissue-specific networks
derived from it have been used to investigate host-pathogen or host-gut microbial
interactions [21, 52].
The process of network reconstruction is a laborious task. In order to extend the
scope and improve the predictive capability of Recon as the most comprehensive knowledge base of human metabolism and for its various applications (Figure 1.2), the incorporation of newly emerging, additive, and corrective knowledge constitutes an ongoing
iterative process (Figure 1.4, see also section 1.5.3).
Figure 1.2: Applications of the human metabolic model.
As an expression of this iterative process, Recon 2 was recently published [53]. Recon 2 was created in a community-driven effort, combining Recon 1 with four other
resources of human metabolism, namely EHMN [54], HepatoNet1 [55], Ac-FAO
module [56] and the human small intestinal enterocyte reconstruction [57]. It covers
a total of 1789 genes, 7440 reactions and 2626 unique metabolites distributed over
eight cellular compartments, and its predictive capability has been demonstrated,
e.g., through mapping of inborn errors of metabolism and different omics data sets
[53]. As a further development, corrections and additions to the content of extracellular metabolite transporters in both Recon 1 and Recon 2 have recently been reviewed
[58].
The use of the human GENREs in combination with omics data sets will be discussed in more detail towards the end of this chapter.
1.5.2 Cancer as a metabolic disease
Cancer is a major burden for health systems worldwide. In the US, estimates indicate that cancer is the cause of every fourth death [59]. Both primary cancer
tissue and cell lines are used to unravel the mechanisms of cancer biology [60]. Tumors carry numerous and heterogeneous somatic mutations. As diverse as these mutations might be, they frequently affect signaling pathways that regulate metabolism
[61, 62, 63]. This connection to metabolism seems straightforward, given that
cancer cells proliferate at high rates, and each cell division requires the duplication
of the biomass and extensive amounts of energy. The importance of metabolic alterations in cancer was already noted when Otto Warburg described the differences in
the utilization of central metabolic pathways between cancer and normal body cells
[64]. Even though it is known today that oxidative phosphorylation is functional in
most cancer cells [65], the switch from mitochondrial respiration to aerobic
glycolysis, connected to the high secretion of lactate, remains one of the most important observations of cancer metabolism to date, i.e., the Warburg effect. Extensive amounts
of glucose are thereby oxidized to pyruvate, which is subsequently converted into
lactate and secreted to restore NAD+ and maintain the high glycolytic flux [66].
The amount of ATP produced by glycolysis easily exceeds that produced by mitochondrial oxidative
phosphorylation, yet cells depend on the constant supply of glucose [67].
A number of reasons why cancer cells might favor inefficient aerobic glycolysis have been discussed, including defective mitochondria, transformation under hypoxic conditions, faster ATP production, an upper limit on the possible mitochondrial
density in the cytosol, and the support of biosynthetic pathways and redox control through the diversion of glycolytic intermediates into the pentose phosphate pathway and
one-carbon metabolism [61, 64, 68]. Support for the latter diversion of glycolytic
intermediates also comes from the cancer-characteristic expression of the pyruvate
kinase isoform PKM2. This isoform slows down the glycolytic flux and thereby
supports distribution of glycolytic intermediates into adjacent pathways [61].
Besides glucose, cancer cell metabolism heavily relies on glutamine [66]. Once
inside the cell and converted to glutamate, glutamine either enters glutathione biosynthesis
or the TCA cycle as α-ketoglutarate in a process called anaplerosis [61]. Anaplerosis
replaces the carbon lost through the efflux of TCA cycle intermediates into biosynthetic
pathways, referred to as cataplerosis. The reasons for the addiction of cancer cells
to glutamine remain unresolved, yet genetic and micro-environmental factors have
been proposed [61, 66]. Reductive TCA cycle flux involving the NADPH-associated
IDH1 has been observed in different cell lines, allowing them to direct glutamine
towards cytosolic lipid synthesis, at least under hypoxic conditions [69, 70]. However, the micro-environment of cancer cells within tumors can vary greatly, and cells
within the tumor might face starvation due to insufficient vascularization. It is therefore
not surprising to find that cancer cells depend on catabolic processes and use fatty
acids and ketone bodies [71].
The following section of this chapter is in full a reprint of a section that appears
in Sahoo, S, Aurich, MK, Jonsson, JJ, Thiele, I (2014) Membrane transporters in
a human genome-scale metabolic knowledgebase and their implications for disease. Front. Physiol., 2014, 5:91. I was a contributing author of this publication
and the author of the part that forms the basis for the following section.
1.5.3 The importance of extracellular membrane transporters in Cancer
As described above, some of the metabolic characteristics of cancer cells are the
high uptake of glucose, aerobic glycolysis including the secretion of lactate (Warburg effect), and a high rate of glutaminolysis to compensate for the efflux of TCA
cycle intermediates into biosynthetic pathways [66]. Alterations in metabolite uptake (e.g., amino acids and glucose) and secretion through specific sets of metabolite transporters constitute key factors for how these continuously proliferating cells
meet their metabolic demands [72]. Redundancy and overlapping substrate specificity exist within and between metabolite transporter families. Cancer cells have to
operate sets of transporters that best nourish their metabolic dependencies. In fact,
the distinctive transporter expression between cancerous and normal cells could provide good opportunities for targeted treatment [72]. The contribution of transporters
in cancer discussed above has been reviewed elsewhere [72, 73, 74, 75, 76] and is
summarized in Table 1.1.
Coverage and accurate representation of transport systems are essential to perform
valuable simulations using COBRA. Recon 1 has been used for the generation and
analysis of cancer-specific metabolic models [24, 77, 78, 79], and these studies have recently been
summarized [20, 51].
Of the 22 extracellular transporters (Table 1.1, individual bicarbonate exchangers
count as one) that play a role in cancer metabolic reprogramming and proliferation, 13 transporters are correctly represented in Recon 2 (Table 1.1), three need to
be modified, and four are still missing or require further curation. This section discusses the cancer-relevant transporters currently missing or requiring revision (Table
1.1).
The pyruvate to lactate conversion is necessary to sustain a high glycolytic flux
[66]. The accumulation of lactate and a decreasing pyruvate level put cell survival
at risk due to increasing acidification of the cytoplasm. Cancer cells counteract the
decrease in intracellular pH by specific ion transport (i.e., bicarbonate and protons)
and lactate export via lactate/H+ symport, which is mediated by one of the four
MCT transporters (SLC16A1, GeneID: 6566; SLC16A7, GeneID: 9194; SLC16A8,
GeneID: 23539; SLC16A4, GeneID: 9122). The high-affinity lactate transporter
SMCT1 (SLC5A8, GeneID: 160728) favors the import of lactate [80] and is suppressed in a number of cancer cell types, as summarized in [72].
For example, SLC5A8 is silenced by methylation in human astrocytomas and oligodendrogliomas [81] and in primary colon cancers and colon cancer cell lines [82].
In addition to its transporter function, the SLC5A8 protein has a demonstrated role
in tumor suppression through the active import of endogenous inhibitors of histone
deacetylases (HDACs) (i.e., butyrate, which originates from gut microbes, and pyruvate [83, 84]).
Recently, SLC5A8 was shown to counteract tumor progression independent from its
transport function. Instead, SLC5A8 acts through an unknown mechanism involving a decrease in the anti-apoptotic protein survivin [85]. Recon 2 includes passive
iodide transport via SLC5A8 and the Na+-coupled transport of lactate, pyruvate,
and the short-chain fatty acids acetate, propionate, and butyrate (Table 1.1, [86]).
Hence, these data were added in the transport module. SLC5A8 was not included in
Recon 2, most likely because this protein has been mainly discussed in the context
of cancer. ABC transporters mediate the efflux of cytotoxic drugs, causing multidrug resistance (MDR) and chemotherapy failure [76, 87].
Two of the four major drug transporters, MDR1 (ABCB1, GeneID: 5243) and ABCG2
(ABCG2, GeneID: 9429), are missing in Recon 2. Both are known to be overexpressed in different cancer types [76].
A subpopulation of cancer cells with enriched stem cell activity, so-called side populations (SPs), has been extracted from six human lung cancer cell lines (H460,
H23, HTB-58, A549, H441, and H2170). When tested for an elevation in ABC
transporter expression, all of the SPs displayed significantly higher mRNA expression of ABCG2 compared to their non-SP counterparts [88]. Four SPs also
showed significantly higher expression of MDR1 transporters. All six showed
resistance to exposure to different chemotherapeutic drugs. The survival of such
cells with stem cell activity upon drug treatment could be connected to a relapse
in vivo [88], and ABC transporter expression might be an indicator of this cancer cell phenotype. Strong expression of aquaporins has been observed in various
tumors, especially aggressive tumors [74]. Some aquaporins are exclusively expressed in malignant tissue [74]. The aquaglyceroporin aquaporin-3 (AQP3,
GeneID: 360), which transports glycerol in addition to water, is expressed in
normal epidermis and overexpressed in basal cell carcinoma and human skin squamous cell carcinomas [89]. AQP3-facilitated glycerol transport was found to determine cellular ATP levels and therefore to be important for hyperproliferation and tumor cell proliferation in mouse epidermal cells [89]. Correspondingly, the resistance
of AQP3-null mice toward skin tumors might arise through reduced tumor cell glycerol metabolism and ATP generation [89]. This property renders AQP3 inhibition a
possible approach for the prevention and treatment of skin, and possibly other, cancers
associated with aquaglyceroporin overexpression [89]. AQP3 is currently missing
in Recon 2 and covered in the transport module. Although many of the transporters
associated with cancer are present in Recon 2 (Table 1.1), important mediators of
intra- and extracellular pH, drug resistance, and proliferative energy metabolism are
still missing.
Table 1.1: Yellow shading indicates genes encoding either absent transport proteins
or transport proteins with limited substrate specificity in Recon 2. Blue shading
indicates improvement in the transporter data (either the addition of the protein
and its associated reactions, the expansion of its substrates, or modification of the
GPRs) in Recon 2 over Recon 1.
Entrez Gene ID | Transporter | Relevant cargo | Reference
4363 | MRP1 (ABCC1) | Xenobiotics, cytotoxic drugs | [76]
6510 | ASCT2 (SLC1A5) | Glutamine | [72, 73, 90]
6513 | GLUT1 | Glucose | [72]
6515 | GLUT3 | Glucose | [72]
6566 | SLC16A1/MCT1 | Lactate | [72, 90]
6584 | OCTN2 (SLC22A5) | Carnitine | [79, 91]
8884 | SMVT | Biotin | [92]
9123 | SLC16A4/MCT4 | Lactate | [72, 90]
9194 | SLC16A2/MCT2 | Lactate | [72, 90]
23539 | SLC16A3/MCT3 | Lactate | [72, 90]
23657, 6520 | xCT (SLC7A11)/4F2hc (SLC3A2) | Cysteine | [72, 73]
80704 | SLC19A3 | Thiamine | [93]
154091 | GLUT12 | Glucose | [72]
6523 | SGLT1 | Glucose | [72]
8140, 6520 | LAT1 (SLC7A5)/4F2hc (SLC3A2) | Glutamine/cysteine antiport | [72, 73]
11254 | ATB0,+ (SLC6A14) | Proteinogenic amino acids except glutamate and aspartate | [72]
360 | AQP3 | Glycerol | [74, 89]
5243 | MDR1 (ABCB1) | Xenobiotics, cytotoxic drugs | [76]
9429 | ABCG2 | Xenobiotics, cytotoxic drugs | [76]
160728 | SLC5A8/SMCT-1 | Lactate, pyruvate, and butyrate (gut microbes) | [72, 83, 84]
The addiction of cancer cells to glucose and glutamine, and the expression of distinct isozymes frequently detected in cancer cells, altogether demonstrate the important role of metabolism in cancer, and emphasize the potential for metabolic targets
in cancer therapy [51, 94].
1.5.4 Using COBRA to investigate cancer metabolism
Because metabolism constitutes a central part in the disease, Recon 1 and COBRA
provide ideal frameworks for the investigation of cancer [20], which has been reviewed extensively [51, 95, 96, 97]. Since 2010, a growing number of COBRA
studies investigated the Warburg effect and other aspects of cancer metabolism
[24, 77, 78, 94, 98, 99, 100, 101].
The first study of cancer metabolism used a small model that captured only the
most experimentally studied pathways in cancer, namely glycolysis, the TCA cycle,
the pentose phosphate pathway, glutaminolysis and oxidative phosphorylation. This
model was able to represent the physiological conditions in HeLa cells and predicted lactate dehydrogenase and pyruvate dehydrogenase as metabolic drug targets
[94]. Further, solvent capacity constraints, i.e., the limit of mitochondrial density in
the cytoplasm, evoked a glucose uptake dependent dichotomy of metabolic regimes
in a reduced flux balance model of ATP production. This dichotomy consisted in
a switch from oxidative phosphorylation to aerobic glycolysis [68, 102]. The important role of enzyme mass restrictions at high proliferation rates, and potentially
in the emergence of the Warburg effect, was corroborated by another group using Recon 1 and a cancer biomass objective function [98]. Whereas the above-mentioned
results were obtained assuming normal glycolysis, further work pointed towards the existence of different pathway alternatives in cancer cells [100]. Specifically, it predicted
the redistribution of metabolic flux into an alternative glycolytic pathway with net
zero ATP production that involved reactions in serine biosynthesis, one-carbon
metabolism, and the glycine cleavage system (SOG pathway) [100]. It was further
predicted that aerobic glycolysis arises from solvent capacity limits in cancer and
proliferating normal muscle cells equally, both on a small-scale and a large-scale
model. Aerobic glycolysis provided higher ATP yield per volume density than mitochondrial oxidative phosphorylation [68]. Tedeschi et al. (2013) further investigated the predictions of ATP generation through the SOG pathway. Their results
supported the view that the SOG pathway supports cancer proliferation with ATP,
NADPH and purines [101].
The first genome-scale model of cancer metabolism was derived from Recon 1 [45]
using a version of the Model Building Algorithm (MBA) [24, 103]. MBA was further applied to generate a non-small cell lung cancer model using multiple gene
expression data sets, which showed a predictive superiority for cell line specific,
growth-supporting genes compared to the generic cancer model [24]. The generic
cancer model was used to predict synthetic lethal gene pairs as potential drug targets, of which a subset was non-toxic to the global model (Recon 1). Succinate
dehydrogenase (SDH) and fumarate hydratase (FH), both frequently mutated in different cancer types, were both predicted to be synthetically lethal with pyruvate
carboxylase (PC). PC was a valid therapeutic target to specifically target SDH- and
FH-deficient cancer cells [24]. In a follow-up study, lethal synergy between FH and
enzymes of the heme metabolic pathway was experimentally validated, providing
insight into the so far unresolved mechanism by which FH-deficient cells survive a
non-functional TCA cycle caused by the mutation in the fumarate hydratase gene,
e.g., in renal-cell cancer (HLRCC) [77]. Gatto et al. (2014) used cancer type specific metabolic models as an estimator to confirm the reduced metabolic network of
ccRCC cancers. The authors observed unique metabolic reprogramming in ccRCC
based on transcriptomic and proteomic data that was not shared by any other tumor tissue [104]. A kidney cancer model that was largely based on data of ccRCC
cells [99] was reduced in size (20% of reactions and 35% of genes) and metabolic
functionality compared to a normal kidney model [99] and other cancer models
reconstructed in the same way. This reduction was in line with the observed downregulation of genes in metabolic pathways in ccRCC [104]. Taken together, these
studies reveal that the use of COBRA in the field of cancer research has developed
along with the evolving views in the field: the earliest studies investigated the
Warburg effect as the prevalent hallmark of cancer at the time and asked
why aerobic glycolysis might provide an advantage to cancer cells
[68], whereas more recent work has gone one step beyond the
mere search for causes towards the prediction of potential treatment possibilities [24].
1.6 High-throughput data
Cellular phenotypes differ although all human cells carry the same genetic information. A cell type specific set of genes is transcribed and mRNAs subsequently
translated into proteins, which, once activated, can carry out their (catalytic) function. Proteins, however, are subject to degradation processes or might be inactivated
otherwise, since the abundance and activity of enzymes and proteins within the cell are
tightly regulated [105]. Phenotypic and functional differences among the approximately 200 human cell types arise from the regulation of gene expression and active cellular protein
contents, pathway, and reaction fluxes (Figure 1.3, [106]).
Figure 1.3: Omics data sets provide a snap-shot of the cellular components at large
scale. These data sets are ideal for the interrogation of cellular networks from a systems
perspective.
High-throughput (HT) technologies measure a multitude of cellular components at
a time, and provide snap-shots of the cellular network on the level of DNA, RNA,
protein and metabolites at distinct environmental conditions, in health and disease
(Figure 1.3).
1.6.1 Transcriptomics
The cellular transcriptome comprises all transcripts including mRNAs, non-coding
RNAs and small RNAs of the cell at a specific condition [107]. RNA is an important determinant of the molecular constituents of a cell, and quantification of transcript expression is often applied to define differences between normal and disease
conditions [107]. Alternative splicing is an important source of variation in the transcriptome
and greatly contributes to biological complexity [108]. It generally describes the
process in which multiple transcripts of distinct lengths are produced from one gene
locus, e.g., by joining different numbers of short exons after the removal of 'non-coding' intron sequences from the pre-mRNA [108]. The generation of transcript
isoforms from the same gene through alternative splicing is known to differ between
tissues, developmental or disease conditions [108]. Cancer specific PKM2 isoform
expression has gained much attention [109].
Methods to define the transcriptome include custom-made and commercial microarrays, which are incubated with fluorescently labeled cDNA and can cover the
'entire' transcriptome at a time, and sequence-based RNA-Seq, which relies on deep-sequencing technologies [106, 107]. Hybridization-based approaches are relatively
inexpensive. However, they depend on pre-defined gene coding sequences, and cross-hybridization leads to high background noise [107]. RNA-Seq also allows high-throughput and quantitative determination of the entire transcriptome, surpassing the
shortcomings of the microarray [107].
1.6.2 Proteomics
The proteome comprises the entirety of proteins in a cell, which, among other functions, participate in signaling cascades and catalyze metabolic reactions. Until the emergence
of mass-spectrometry (MS), two-dimensional gel electrophoresis was used for protein analysis [106]. MS has further been applied to elucidate post-translational modifications and protein interactions [106]. Additionally, high-resolution MS-based
proteomics has enabled quantitative determination of the entire cellular proteome
[106, 110, 111]. To determine peptide sequence, quantity, or to identify proteins,
protein samples are digested to peptides, chromatographically fractionated, and
fragmented into fragment-ion spectra in the mass spectrometer. The spectra are
subsequently analyzed [112]. Challenges of MS-based proteomics include the impossibility of
amplifying protein samples, the large number of proteolytic peptides produced through
digestion, and the limited ability to identify proteins based on existing resources [106].
1.6.3 Metabolomics
The metabolome comprises the low molecular weight chemicals of a biological sample, and metabolomics denotes the comprehensive profiling of metabolites in biofluids, tissues or cells [113]. Metabolomics is the youngest
of the omics techniques and is stable, relatively cheap, and highly reproducible [113].
Additionally, it directly profiles the cellular phenotype [113], and thereby represents
the most straightforward resource for the integration with metabolic networks. Hundreds to thousands of metabolites can be identified and quantified [114]. Nuclear
magnetic resonance (NMR) spectroscopy, liquid chromatography-mass spectrometry (LC-MS) and gas chromatography-mass spectrometry (GC-MS) are the most
common analytical methods in metabolomics [113]. Additionally, the potential of metabolomics for
pharmaceutics and individualized drug therapy has been pointed out [113].
1.7 Analysis of omics data in the context of COBRA models
The development of methods to analyze omics data sets, reduce their complexity for
interpretation, and to make the contents accessible to even non-biochemical experts,
is of major interest to a broad research community. Biochemical reconstructions can
provide a context for the analysis of omics data sets, since they provide a mechanistic and biologically well defined framework for different kinds of experimental data
[2]. Generally, there are multiple ways to combine COBRA models and omics data
(Figure 1.2). First, the topology of the reconstruction can be used for the structured
visualization of the data, e.g., by pathways, so as to facilitate interpretation. As an
example, Recon 1 was used for the interpretation of gene expression data, revealing the
effects of gastric bypass surgery on skeletal muscle metabolism [45]. Comparison
of the size and pathway topology has further been used for the verification of a reduced metabolic network in ccRCC cancer [104]. Another way to use the network
topology for the interpretation of omics data is to identify correlations that could
not be derived from the data alone. Analysis of SNPs in the network context revealed
examples where SNPs with similar pathological impact were mapped to reactions
that belonged to the same correlated reaction set (reactions with 100% correlated
activity) [115]. Second, the predictive nature of COBRA models can be exploited.
Here, the omics data are used to formulate additional constraints to reduce or alter
the shape of the solution space [116]. One example was the integration of quantitative extracellular metabolomic profiles of yeast cells into the network context [117].
Based on sampling analysis, models tailored to different environmental conditions
were compared and changes in intracellular metabolic flux states were identified
along with the regions of the metabolic network that were perturbed [117]. Finally,
omics data can be used to generate cell type or condition specific metabolic models.
These constitute subnetworks of the global human models, and the decision as to
which reactions to keep and which ones to discard is made based on the data and
different criteria outlined below (Table 1.2, Figure 1.4).
1.7.1 Methods for network contextualization
A number of algorithms have been published for the integration of omics data sets
with GENREs, which have previously been summarized (Table 1.2, [129, 130]).
These methods mainly emphasize the integration of transcriptomic and proteomic data. They differ in whether or not they depend on a defined objective function, whether exchange profiles need to be predefined for the target cell type or are
predicted as part of the output model [47, 131], and in which form the input data
is compiled and incorporated. The decision as to which of the algorithms to choose
depends on the data set at hand. Methods that depend on a user-defined threshold to distinguish reaction activity from inactivity (through GPRs) can be applied
even if only data from one condition is available. Others require data from multiple
conditions (Table 1.2). It is important to distinguish cell type and condition specific models. Cell type specific models should be able to perform the entire range of
metabolic functions, and can only be built through compilation of a large number of data
sets, comprising many different conditions. This requires substantial manual curation based on literature to ensure cell type specific functionality. Tissue (or organ)
specific models capture the metabolic functions of cells that comprise the tissue.
Condition specific models have to be distinguished from the former subnetworks,
since they capture only a subset of cell type specific metabolic functions, namely
those active under a particular set of environmental conditions. Based on their limited scope, these subnetworks can be built from single (extracellular metabolomic
or transcriptomic) data sets, either from the generic human metabolic model (chapters 2 & 3) or from cell type specific metabolic or signaling models (chapter 4).
However, noise in the data will play a role in models that are built from a single
data set, which calls for robust methods to build and analyze such condition specific
models. Figure 1.4 illustrates the cycle for the generation, analysis and validation
of condition specific models. This cycle comprises data generation followed by the
extension and refinement of the starting GENRE to enable best possible mapping of
the data to the model. Subsequently, the data is integrated into the model context
giving rise to the condition-specific metabolic models, using one of the published
methods (Table 1.2), or approaches introduced herein (Chapters 2-4). Analysis of
the generated models, and potentially comparison of the model predictions to additional data, ideally results in testable hypotheses, which can lead to generation of
additional data and subsequent cycles of the systems approach (Figure 1.4).
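The threshold-based distinction between active and inactive reactions through GPRs, mentioned above, can be illustrated with a minimal sketch. It is not the implementation of any of the published algorithms. The gene identifiers, reaction identifiers, expression values and threshold are hypothetical, and the choice to keep reactions without a GPR is an assumption made for this example.

```python
import re


def presence_calls(expression, threshold):
    """Return the set of genes whose expression reaches the user-defined threshold."""
    return {gene for gene, value in expression.items() if value >= threshold}


def gpr_is_active(rule, present_genes):
    """Evaluate a Boolean GPR rule ('and'/'or', parentheses) for a set of present genes."""
    if not rule.strip():
        return True  # assumption: reactions without a GPR are kept
    tokens = re.findall(r"\(|\)|[^\s()]+", rule)
    expr = []
    for tok in tokens:
        if tok in ("(", ")"):
            expr.append(tok)
        elif tok.lower() in ("and", "or"):
            expr.append(tok.lower())
        else:
            # every remaining token is a gene identifier, replaced by its call
            expr.append(str(tok in present_genes))
    # the expression now only contains True/False, and/or and parentheses
    return eval(" ".join(expr), {"__builtins__": {}}, {})


# Hypothetical expression values and GPR rules
expression = {"g1": 120.0, "g2": 8.0, "g3": 45.0, "g4": 2.0}
gprs = {
    "R1": "g1",
    "R2": "g2 and g3",         # enzyme complex: all subunits required
    "R3": "g2 or g3",          # isozymes: one is sufficient
    "R4": "(g1 and g4) or g3",
    "R5": "",                  # no gene association
}

present = presence_calls(expression, threshold=10.0)
activity = {rxn: gpr_is_active(rule, present) for rxn, rule in gprs.items()}
print(activity)
```

In this toy case only R2 is called inactive, because one subunit of the complex falls below the threshold. The result depends directly on the chosen threshold, which is why noise in a single data set matters for condition specific models.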
Figure 1.4: The ideal systems biological approach using human metabolic GENREs
and omics data integration describes a circular process, where iterative cycles involve and benefit both experimental discovery and the development of an increasingly comprehensive model that can give rise to high-precision predictions of the
healthy and disease perturbed states of biochemical systems.
Algorithms like GIMME and iMAT were designed to support completely automated
subnetwork generation (Table 1.2), and to achieve this despite general noisiness of
transcriptomic data, and post-transcriptional and post-translational regulatory impact (Figure 1.3) [47, 131]. These algorithms predict post-transcriptional or post-translational regulation based on the model context. In the case of GIMME, this produces
a functional model with regard to the stated objective function [131]. More recent
algorithms, such as MBA and FASTCORE, require the manual definition of high
confidence reaction sets, around which functional models are built. This development emphasizes that manual work and biological insight are needed in addition to
the data sets in order to ensure the quality of the generated models [103, 126].
1.7.2 Human cell-type specific metabolic models
The number of cell-type specific metabolic models is constantly increasing. To date,
condition and cell-type specific metabolic models have been reconstructed for many
human tissue and cell types. Most of these networks have been generated using one
of the above discussed algorithms and omics data sets, especially transcriptomic
and proteomic data (Table 1.2) [46, 47]. In humans the reconstructed cell types
include brain [49], heart [125] and cardiomyocyte [121], liver [55, 103], kidney
[120], macrophage [21, 119], red blood cell [132], and enterocyte [57]. Among
these models, the enterocyte has entirely been reconstructed in a bottom-up process
without consideration of omics data sets [57]. Moreover, reconstruction efforts have
generated high numbers of metabolic cell line models of normal and cancer tissues
[99, 128].
Apart from single cell type models, multi-cell assemblies have also been reconstructed for human brain cells [49], and whole-body systems physiology comprising adipocyte, hepatocyte and myocyte [133]. These models have been applied
to interrogate the metabolic aspects in diverse human disease conditions, such as
cancer [99], neurodegeneration [49] and diabetes [133].
1.7.3 Integration of metabolomic data sets
A major part of this thesis deals with the integration of metabolomic data into
the network context and the generation of condition specific metabolic models.
Metabolomic data can be integrated with metabolic networks as qualitative, quantitative, and thermodynamic constraints [117, 134, 135, 136] to increase the precision of the model predictions [116]. The capacity of Recon 2 for the integration
of metabolomic data has been demonstrated based on published data of the NCI-60
cell line collection [53, 137]. A number of the recent algorithms consider the use
of metabolomic data (Table 1.2). For mCADRE, metabolomic data are discussed
as potential clues and for the definition of metabolic functions that are checked for
during network pruning [128]. (t)INIT allows the inclusion of metabolomic data as
clues for the model building, and the ability of the final model to produce detected
metabolites [99, 127]. In addition, MBA and FASTCORE depend on the a priori definition of core reaction sets, which includes the consideration of metabolomic data
[103, 126]. Recently, Gene Inactivation Moderated by Metabolism, Metabolomics
and Expression (GIM(3)E) became available, which enforces minimum turnover of
detected metabolites [138]. As an example, Chang et al. (2010) used gene expression data and the GIMME algorithm to generate a kidney model, and set exchange
constraints based on literature clues and metabolomic data derived from the Human
Metabolome Database (HMDB) [120].
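How quantitative and qualitative metabolomic data can be turned into exchange constraints may be sketched as follows. This is an illustration under stated assumptions, not the procedure of any cited study: the sign convention (negative for uptake, positive for secretion), the relative tolerance, the default maximal rate and all metabolite names and rates are hypothetical choices for this example.

```python
def quantitative_bounds(measured_rates, rel_tolerance=0.25):
    """Map measured exchange rates to (lower, upper) flux bounds.

    Assumed convention: negative rate = uptake, positive rate = secretion.
    The tolerance widens each bound to allow for measurement error.
    """
    bounds = {}
    for metabolite, rate in measured_rates.items():
        slack = abs(rate) * rel_tolerance
        bounds[metabolite] = (rate - slack, rate + slack)
    return bounds


def qualitative_bounds(direction, max_rate=1000.0):
    """Restrict only the direction of an exchange when no rate is available."""
    if direction == "uptake":
        return (-max_rate, 0.0)
    if direction == "secretion":
        return (0.0, max_rate)
    raise ValueError("direction must be 'uptake' or 'secretion'")


# Hypothetical measured rates (arbitrary units)
rates = {"glucose": -2.0, "lactate": 3.0, "glutamine": -0.5}
bounds = quantitative_bounds(rates, rel_tolerance=0.25)
bounds["glycine"] = qualitative_bounds("uptake")  # detected as consumed, not quantified
print(bounds)
```

Quantitative data narrow an exchange flux to a small interval, whereas qualitative data only close one direction. Both reduce the solution space, but to a very different degree.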
One example that uses a multi-omics approach including metabolomic data is Cakir
et al. (2006). The authors used a small set of metabolites to constrain a yeast model,
which they subsequently used to identify reporter reactions associated with changing metabolite levels, as a consequence of environmental or genetic perturbations.
The reporter reactions were subsequently related to transcriptomic data to infer different forms of regulation [139].
Metabolomic data has further been applied for refinement during the model building process. Selvarasu et al. (2012) published a framework for the integrated analysis of fed-batch culture and metabolomic data with in silico modeling, in order
to aid quantitative improvements in the industrial production of recombinant therapeutics in Chinese hamster ovary (CHO) cells [140]. After the reconstruction of a
metabolic CHO model from a mouse model, metabolomic data was used to improve
the model [140]. The authors simulated the experimental condition by constraining the
model according to the uptake rates of nutrients (glucose, glutamine, amino acids and
oxygen) as well as the secretion rates of cell biomass, IgG, ammonia, lactate and CO2.
Insights into metabolic pathways involved in growth limitation in these cells were
gained from combination of experimental metabolite trends and flux data obtained
from the model [140].
The integration of metabolomic data was further used to drive metabolic discovery.
Recon 1, in combination with urine metabolomic profiles and transcriptomic data,
was used to predict novel putative endogenous substrates of the OAT1 transporter.
This was accomplished by comparing predictions from models constrained based on
metabolomic or transcriptomic data from wild-type and mOAT1 knock-out mice. Intermediates of the polyamine pathway were subsequently experimentally confirmed
as putative substrates of mOAT1 [141]. Recon 1, FBA and published metabolite
uptake and secretion rates [137] were further used to support findings derived from
LC-MS-based isotope tracer studies and a metabolic flux model, and congruently
highlighted oxidative phosphorylation as major contributor to ATP production (on
average 84% across the NCI-60 cell lines) in cancer cells [142].
Taken together, metabolomic data find variable applications in numerous recent
biomedical studies, and this diversity is likely to expand.
1.7.4 COBRA for biomedical applications and personalized health
COBRA models are increasingly applied to biomedical questions that extend far beyond
cancer. In the course of this PhD, a number of studies applied COBRA to personalized medicine. Jamshidi et al. (2011) analyzed differences in serum metabolome
profiles of a Hereditary Hemorrhagic Telangiectasia (HHT) patient versus non-HHT
controls using Recon 1 [143]. Recon 1 thereby took the role of the whole body
metabolic network (including all the organs), and the differences in the plasma were
interpreted as the net metabolite changes (uptake and secretion) mediated by cells
throughout the human body. The differences in the metabolic profiles were integrated through differential scaling of the coefficients of a non-growth associated
biomass objective, which distinguished the HHT patient and the controls. Subsequently, the authors used flux span ratios (FVA) to identify decreased energy
production and increased flux potentials in nitrogen handling and disposition pathways in the HHT patients, which was linked to an anti-VEGF drug (bevacizumab
(Avastin)) [143]. After treatment, the HHT metabolomic profiles of patient and
controls became more similar as compared to the pre-treatment HHT sample [143].
Concerning the relevance of the steady-state assumption, the authors argued that,
since the plasma profiles were derived after overnight fasting, the body would be
at or close to a homeostatic state [143].
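The idea behind comparing flux spans between two models can be sketched with a few lines of arithmetic. The definition used here, the width of the FVA range in one model divided by the width in the other, is an assumption for illustration, and the exact definition in [143] may differ. The reaction names and flux ranges are hypothetical.

```python
def flux_span_ratio(range_a, range_b):
    """Ratio of the FVA span (max - min) in model A to the span in model B.

    Returns None when the span in model B is zero, since the ratio is undefined.
    """
    span_a = range_a[1] - range_a[0]
    span_b = range_b[1] - range_b[0]
    if span_b == 0:
        return None
    return span_a / span_b


# Hypothetical (min, max) flux ranges from flux variability analysis
fva_patient = {"ATP_synthase": (0.0, 4.0), "urea_export": (0.0, 3.0)}
fva_control = {"ATP_synthase": (0.0, 8.0), "urea_export": (0.0, 2.0)}

ratios = {
    rxn: flux_span_ratio(fva_patient[rxn], fva_control[rxn])
    for rxn in fva_patient
}
print(ratios)
```

A ratio below one indicates a reduced flux potential in the first model relative to the second, and a ratio above one an increased potential.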
Others combined transcriptomic and proteomic data with uptake rates and plasma and white
adipose tissue lipid concentrations (metabolomic data) to generate and analyze a
metabolic adipocyte model. The authors used the model to investigate metabolic
alterations in adipocytes that would allow the stratification of obese patients [144,
145]. The set of differentially expressed genes between lean and obese males and females was used for the model building and additionally correlated with the predicted
reporter metabolites. Their predictions coincided with the differential transcriptional (down-) regulation of the mitochondrial pathways in obese subjects. Consequently,
the authors proposed increasing mitochondrial acetyl-CoA as a potential therapeutic
target to decrease fat in the patients [144]. Additionally, the androsterone level in
plasma was suggested as a biomarker for metabolic alteration in these patients [144].
Another study from the same group predicted new potential drugs (anti-metabolites)
to target hepatocellular carcinoma (HCC) specifically, while sparing normal cells
[127]. Six HCC patient specific metabolic models, a generic HCC model and 83
normal tissue models were generated, through integration of proteomic data with
the human metabolic reactions database (HMR) 2.0 and tINIT [127, 146]. Among
the predicted anti-metabolites was L-carnitine, which was shown to selectively inhibit growth in HepG2 cells [127].
Although the first steps towards personalized health supported by COBRA and
omics data integration have been taken, the use of metabolomic data sets is
still underrepresented. The integration of metabolomic data is often limited to a few
uptake and secretion constraints, or the data are used otherwise (e.g., to define a non-growth associated objective, or to define high confidence reaction sets for the model building), but
they are seldom the primary source for the generation of condition-specific subnetworks,
with the study of Mo et al. being one exception [117].
1.7.5 Existing challenges
Methods for the use of metabolomic data for subnetwork generation are less well
developed, and further approaches are needed to make better use of these data, as
well as to simplify integration of multiple omics data sets. Great potential exists
for the interpretation of extracellular metabolomic profiles in the context of human
GENREs, in particular for diagnostics and personalized health. However, the integration of biofluid metabolomic profiles is difficult due to the uncertainty of the
actual cellular origin of detected metabolites (within the human body), and the fact
that only limited sets of metabolites are detected, while the exact composition of the
fluids remains unknown. One question is therefore whether it is possible to overcome the uncertainty connected to these metabolomic data sets and provide insight
into the metabolic mechanisms using these data.
Table 1.2: Methods for network contextualization.

Method: Binary
Description: Flux through reactions associated with absent genes are constrained to zero.
Objective function (OF) required: no
Data: transcriptomic
Input format: P/A calls based on user defined threshold
Solver: no
Examples for models derived: metabolic behavior in S. cerevisiae batch cultures [118]
Reference: [118]

Method: GIMME
Description: Functional flux is defined through FBA, and active and suppressed reactions are defined based on the GPR associations. Subsequently inactive reactions are removed. Removed reactions required for the predetermined functional flux are reinserted to produce a functional submodel. Incorporates the idea of post-transcriptional regulation, but relies on arbitrarily chosen flux distribution (FBA).
Objective function (OF) required: yes
Data: transcriptomic or proteomic
Input format: P/A calls
Solver: LP
Examples for models derived: murine macrophage [119], human macrophage [21], brain cells [49], kidney [120]
Reference: [46]

Method: GIMMEp
Description: Creates models that each satisfies a proteome-based objective which are combined into one final GIMMEp model.
Objective function (OF) required: proteome-based OF
Data: transcriptomic and proteomic
Input format: P/A calls
Solver: LP
Examples for models derived: murine macrophage [119]
Reference: [119]

Method: iMAT
Description: Maximizes the number of enzymes whose flux activity is consistent with their measured expression level (high flux to highly expressed and low flux to lowly expressed) along with pathway length. Enzyme expression levels are considered as cues rather than fixed determinants of enzyme activity and connected flux. Post-transcriptional regulation is assumed to be the difference between measured expression level and predicted flux. Predicts tissue-specific metabolite exchanges.
Objective function (OF) required: no
Data: transcriptomic or proteomic
Input format: high, medium and low expression
Solver: MILP
Examples for models derived: murine macrophage [119], human macrophage [21], brain cells [49], cardiomyocyte [121]
Reference: [47], [122]

Method: E-flux
Description: Predicts metabolic states by setting maximum flux constraints as a function of measured gene expression. Reactions associated with lowly expressed genes are tightly constrained, and those associated with highly expressed genes subject to loose constraints.
Objective function (OF) required: yes
Data: transcriptomic
Input format: gene expression levels
Solver: LP
Examples for models derived: M. tuberculosis bacterium [105]
Reference: [105]

Method: PROM
Description: Generate integrated metabolic-regulatory networks. Requires a metabolic network, a regulatory network, gene expression data from different conditions, and additional regulatory interactions. It uses probabilities, which are estimated from the expression data and different conditions, to represent gene states and gene–transcription factor interactions.
Objective function (OF) required: yes
Data: transcriptomic
Input format: P/A calls
Solver: LP
Examples for models derived: E. coli and M. tuberculosis [123]
Reference: [123]

Method: MBA
Description: All user defined, high-probability reactions, a maximum number of medium-probability reactions, and a minimal number of reactions comprise the functional output model. This minimal but consistent output model is compiled based on confidence values assigned to each reaction. The confidence values are based on the frequency of appearance of a reaction in the 1000 candidate models, each generated with a random pruning order.
Objective function (OF) required: no
Data: literature-based knowledge, transcriptomic, proteomic, metabolomic and phenotypic data
Input format: high- and medium-probability reactions
Solver: MILP
Examples for models derived: human heart [125], liver [103], cancer metabolic model [24, 77]
Reference: [103]

Method: MADE
Description: Matches most closely the genes that exhibit the most statistically significant changes in gene expression levels. It creates a series of models with high accuracy with respect to direction of differential expression.
Objective function (OF) required: yes
Data: transcriptomic or proteomic
Input format: gene expression
Solver: MILP
Examples for models derived: transition from fermentative to glycerol-based respiration in S. cerevisiae [124]
Reference: [124]

Method: INIT
Description: Finds a sub-network by maximizing the sum of evidence scores, and provides a connected and functional model. All the included reactions should be able to carry flux. Additionally, production of specified metabolites by the output model is ensured.
Objective function (OF) required: flux consistency
Data: tailored for the use of HPA (proteomic), transcriptomic data, observed metabolites
Input format: arbitrary scores for high, medium, low and absent proteins (color codes in the HPA)
Solver: MILP
Examples for models derived: 69 human cell types and 16 cancer types [99]
Reference: [99]

Method: FASTCORE
Description: Searches for a flux consistent subnetwork which contains all user defined core reactions and a minimal set of additional reactions.
Objective function (OF) required: flux consistency
Data: literature-based knowledge, multi-omics
Input format: core set of reactions
Solver: LP
Examples for models derived: liver and macrophage [126]
Reference: [126]

Method: tINIT
Description: In comparison to the preceding version (INIT), which delivered a connected and consistent network, tINIT generates functional networks based on a user-defined, cell type specific set of metabolic functions. The algorithm defines the reaction set necessary for the realization of the specified metabolic tasks. In case the resulting model fails to perform a task in the test phase, gap-filling is applied to ensure the functionality of the output model. Additionally, the output model has only irreversible reactions. Compared to INIT it is optional if net production of metabolites is allowed.
Objective function (OF) required: metabolic tasks
Data: proteomic
Solver: MILP
Examples for models derived: personalized HCC models [127]
Reference: [127]

Method: mCADRE
Description: Generates a subnetwork by removing reactions that are not part of the high-confidence core reaction set, which is defined based on gene expression data, and connectivity clues, while preserving flux capacity of core reactions, and defined metabolic functionality.
Objective function (OF) required: flux capacity of core reactions and metabolic functions
Data: transcriptomic
Input format: a high-confidence core reaction set, based on expression evidence
Solver: LP
Examples for models derived: 126 human tumor and normal tissue and cell types [128]
Reference: [128]
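The E-flux entry of Table 1.2 describes maximum flux constraints set as a function of measured gene expression. A minimal sketch of this idea is given below. It is not the published E-flux implementation: the linear scaling to the highest expression value, the default maximal flux, and the assumption that expression has already been mapped from genes to reactions are all illustrative choices, and the reaction identifiers and values are hypothetical.

```python
def expression_scaled_bounds(reaction_expression, reversible, max_flux=1000.0):
    """Scale the upper flux bound of each reaction with its expression level.

    reaction_expression: expression value per reaction (assumed to be already
        mapped from genes to reactions, e.g., through the GPRs).
    reversible: dict marking reactions that may also carry negative flux.
    """
    top = max(reaction_expression.values())
    if top <= 0:
        raise ValueError("at least one reaction needs a positive expression value")
    bounds = {}
    for rxn, level in reaction_expression.items():
        upper = max_flux * level / top  # high expression -> loose constraint
        lower = -upper if reversible.get(rxn, False) else 0.0
        bounds[rxn] = (lower, upper)
    return bounds


# Hypothetical reaction-level expression values
reaction_expression = {"R1": 100.0, "R2": 25.0, "R3": 0.0}
reversible = {"R2": True}

bounds = expression_scaled_bounds(reaction_expression, reversible)
print(bounds)
```

In contrast to the binary approach, a lowly expressed reaction is not removed but only tightly constrained, so no presence or absence threshold has to be chosen.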
1.8 Preview of this thesis
The integration of transcriptomic and proteomic data with metabolic models is well
established and frequently applied for biomedical applications. However, the contextualization with metabolomic data is less well explored. This thesis advances the
integration of omics data by demonstrating the use of both quantitative and semi-quantitative extracellular metabolomic data for the inference of internal metabolic
network states from extracellular metabolomic profiles. Furthermore, integrative
analysis of multiple omics data sets (e.g., metabolomic and transcriptomic data) is
carried out. That cell-type specific differences not only exist in metabolic networks,
but also concern innate immune signaling, is demonstrated along with the contextualization of a signaling network.
The chapters of this thesis focus as follows:
• Chapter 1: This chapter describes the background on constraint-based modeling, state-of-the-art omics data integration, and biomedical applications.
The text in chapter 1 is in part a reprint from a section that appears in Sahoo, S, Aurich, MK, Jonsson, JJ, Thiele, I (2014) Membrane transporters in
a human genome-scale metabolic knowledgebase and their implications for
disease. Front. Physiol., 2014, 5:91. I am a contributing author of this publication, and author of the part which forms the basis for the marked section
of this chapter.
• Chapter 2: A large collection of cancer cell line specific models is generated
from quantitative extracellular metabolomic profiles. The models are classified into distinct metabolic phenotypes based on their metabolic strategies and
robustness of the models towards environmental and genetic perturbations is
explored.
The text of chapter 2 is in full a reprint of the manuscript: Aurich, M.K.,
Fleming, R.M.T., Thiele, I. Metabolic heterogeneity and robustness among
the NCI-60 cancer cell lines. Manuscript in preparation. I am the first author
of this manuscript, which forms the basis for this chapter.
• Chapter 3: This chapter demonstrates the integrative analysis of semi-quantitative
extracellular metabolomic profiles and transcriptomic data for two lymphoblastic leukemia cell lines. Interrogation of the inferred internal metabolic network through sampling analysis reveals differences in the use of the internal
metabolic networks that are supported by experimental data. Additionally,
high incidence of differentially expressed and alternatively spliced genes at
rate limiting and commitment steps is observed.
The text in chapter 3 is in full a reprint of the manuscript: Aurich, M.K.,
Paglia, P., Rolfsson, Ó, Hrafnsdóttir, S., Magnúsdóttir, M., Stefaniak, M.M.,
Palsson, B.Ø., Fleming, R.M.T., Thiele, I. Prediction of intracellular metabolic
states from extracellular metabolomic data. (2014) Metabolomics, 1-17. I am
the first author of this publication, which forms the basis for this chapter.
• Chapter 4: The cell-type specific differences in TLR signaling and the relevance of the network to human diseases are investigated. A gene set is identified and GPRs are formulated for the proteins of the network. Contextualization of a TLR signaling network towards a cell-type specific TLR signaling
network and further towards a condition specific, LPS activated TLR signaling network is demonstrated. Finally, prediction of the energy costs of
respective input-output pathways describes one link between the signaling and
metabolic networks.
The text of chapter 4 is in part a reprint of the material as it appears in
Aurich, M.K. and Thiele, I. (2012) Contextualization Procedure and Modeling
of Monocyte Specific TLR Signaling. PLoS ONE, 7, e49978. I am the first
author of this publication, which forms the basis for this chapter.
• Chapter 5: Conclusions and future directions
2 Metabolic heterogeneity and robustness
among the NCI-60 cancer cell lines
The role of metabolic alterations in human diseases is increasingly recognized and
methods are needed that allow fast inference of intracellular metabolic states from
extracellular metabolomic profiles. Herein, we generate a set of 120 cancer cell line
models based on quantitative, extracellular metabolic profiles using a novel computational method. We explore the metabolic heterogeneity inherent to these cancer
cell line models. We provide, for the first time, a systematic assessment of metabolic
strategies that the cancer cells may apply to generate energy and cofactors. We observe different oxotypes, which describe distinct ranges of feasible oxygen uptake
rates. The power of the presented approach was to reflect such distinct phenotypic
behavior solely based on extracellular samples, and despite the uncertainty connected to the medium composition due to the serum. Similar approaches, applied to
patient data, could be a milestone for clinical applications.
2.1 Introduction
The incomplete oxidation of glucose to lactate under normoxic conditions [64] is
referred to as aerobic glycolysis. It has been a main focus of cancer research during past decades [147]. However, this irrevocable view is increasingly replaced by
the notion that cancer cells employ heterogeneous metabolic strategies beyond aerobic glycolysis [69, 71, 148, 149]. Many cancer cells generate substantial amounts
of their energy through mitochondrial oxidative phosphorylation [142, 147, 150].
Moreover, cancer cells use additional fuels, such as glutamine and fatty acids, to
support proliferation [71, 151]. These carbon sources can yet again be used in
different ways, e.g., different parts of the tricarboxylic acid (TCA) cycle can be
employed for glutaminolysis [69, 150, 152, 153]. Reductive carboxylation involves
only two TCA cycle reactions run in reverse direction and without producing energy, whereas glutaminolysis in forward direction does yield energy [69, 150, 153].
Apart from different metabolic strategies, cancer cells display distinct robustness towards environmental changes, e.g., nutrient supply or oxygenation [154, 155, 156].
Despite the evident phenotypic differences among cancer cells, no comprehensive
assessment exists of the metabolic heterogeneity among cancer cell lines, as well as
their flexibility towards environmental changes.
Metabolic models can be developed using constraint-based modeling and analysis
(COBRA), and comprise comprehensive knowledge-bases of the metabolic network
of an organism [2, 9]. COBRA relies on physico-chemical principles and assumes
a steady-state of the modeled system [2]. Constraints (e.g., limitation of metabolite
uptake and secretion) can be added to increase the precision of the model predictions by eliminating network states that exceed the constraints [116]. A human reconstruction is readily available [45, 53], along with numerous analytical methods
to investigate the metabolic differences that arise through the imposed constraints
[15, 16]. Metabolomic data derived from body fluids and cell culture supernatant
have previously been integrated into metabolic reconstructions [117, 142, 144]. One
existing challenge nevertheless remains the handling of serum, or data derived from
cells grown with serum, since the exact composition is unknown. As a consequence,
the model cannot be adequately constrained. Despite these difficulties, approaches
that allow rapid classification of metabolic strategies from metabolomic profiles
could have a broad impact on both researchers and clinicians.
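The steady-state assumption and the role of added constraints described in this paragraph can be written compactly. The block below gives the standard flux balance analysis formulation as a sketch. The symbols (stoichiometric matrix S, flux vector v, objective coefficients c, and the lower and upper bounds) are notation assumed here and are not defined in the text above.

```latex
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\mathrm{lb}}_{j} \le v_{j} \le v^{\mathrm{ub}}_{j}
  \quad \text{for all reactions } j .
\end{aligned}
```

Measured uptake and secretion rates enter through the bounds of the corresponding exchange reactions. Tightening these bounds removes flux distributions that are inconsistent with the data and thereby shrinks the set of feasible network states.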
Recently, liquid chromatography-tandem mass spectrometry (LC-MS/MS) was used to
determine the metabolites that were consumed and released by the NCI-60 cancer
cell lines [137]. Through combination of the obtained metabolomic profiles with
doubling times and transcriptomic data, rapid proliferation was associated with cellular glycine requirements [137]. However, the intracellular pathways that gave rise
to distinct metabolomic profiles remained largely a black-box. This data set was,
because of its comprehensiveness [137], particularly well suited to define metabolic
differences among cancer cells at large scale.
Herein, we developed and applied a novel method, termed minExCard, for the inference of internal metabolic states from extracellular metabolomic data in the context of the metabolic model, while dealing with the uncertainty connected to serum
composition. We applied this method to generate 120 cancer cell line models from
extracellular metabolomic data and found that the models exhibited a high metabolic
heterogeneity in silico and distinct levels of robustness when perturbed in silico.
This work demonstrates how the combination of extracellular metabolomic data
with metabolic modeling can lead to unprecedented insight into different metabolic
strategies of cancer cell lines.
Figure 2.1: Metabolic models provide a context for the analysis of metabolomic
data. a. 1. The refinement step denotes the addition of transport and exchange reactions to allow the uptake and secretion of metabolites detected in the metabolomic
profiles of the NCI-60 cell lines [137]. 2. The cancer cell line models were generated using minExCard. In total, 120 cancer models (NCI-60 times 2) are generated
from published metabolomic data and the extended metabolic model. 3. The models
are analyzed using a set of computational methods. Based on the computational
results the models are divided into different metabolic phenotypes and drug targets
are predicted for each individual model. 4. The approach is applicable to a variety
of biomedical applications. Analysis of patient-specific omics data could be used
for the stratification of disease phenotypes and for the prediction of personalized
disease intervention strategies. b. Differences in the number of reactions, metabolites and genes across the large set of models. c. Distribution of the number of
reactions, metabolites, genes and exchanges among the 120 cell line models.
2.2 Results
2.2.1 Generation of heterogeneous cancer cell line models
Published metabolomic profiles comprising the uptake and secretion of metabolites
from and into the culture medium were integrated into the metabolic model (Fig.
2.1A) [137]. The metabolomic data consisted of two samples per cell line, and
considerable variations between samples (Supplementary Fig. 2.6) led us to generate one model for each sample rather than averaging the data for each cell line
(replicate samples will be referred to as ’-2’). To generate a cancer cell model, the
starting model was constrained according to the quantitative metabolite uptakes and
secretions measured for the respective sample. Next, a minimal set of additional
exchange reactions needed to sustain a growth phenotype was identified based on
the model structure, using minExCard. All other metabolite exchanges and internal
reactions no longer used by the model were removed, giving rise to the individual
cancer cell line model (Fig. 2.1A). The generated 120 cancer cell line models differed with respect to completeness of subsystems, and the numbers of reactions,
metabolites, and genes (Fig. 2.1C-D). Variations in the metabolic model content involved all major metabolite classes (Fig. 2.1B-E). Further, large variations existed
with respect to maximal growth rates achieved by the models. Many far exceeded
the growth rates expected from any human cell. However, only 15 models could
not grow when constrained according to experimental growth rates (Supplementary
material). ACHN-2 and UACC-257 were limited to experimental growth rates
by the imposed metabolic profiles alone (Supplementary material). Taken together, the
diversity of the models suggested that they were a good starting point to investigate
metabolic heterogeneity among the cell lines.
2.2.2 Distinction of metabolic phenotypes
Metabolic strategies yield different amounts of ATP, e.g., full oxidation of glucose
to CO2 yields 30-32 ATP and aerobic glycolysis yields two ATP [157, 158]. Herein,
we used the ATP yield as an estimator for distinct pathway utilization by the models. The entire range of ATP yields across the models was large (Fig. 2.2A, ATP
yield: min = 2.93, max = 55.3) and exceeded the theoretical measure for aerobic
glycolysis. Exact fit with the theoretical ATP yields was not expected, since the
models could use additional substrates and reactions to produce ATP (Supplementary Fig. 2.6). Rank-ordered ATP yields described a fairly continuous increase,
occasionally interrupted between groups of models (Fig. 2.2A). One gap between
groups of models was associated with the switch of the major ATP producing reaction identified through flux splits. This analysis estimates the contribution of each
Figure 2.2: Distinction of the models based on energy and cofactor production. a.
Rank-ordered ATP yield achieved by the models described a gradual increase rather
than accumulated clusters around the theoretical ATP yields of different metabolic
strategies. The spread of ATP yields highlights the metabolic heterogeneity among
the 120 models, potentially using a mixture of pathways and metabolic fuels for ATP
production. Two major strategies for ATP production could be distinguished based
on a jump of ATP yield. The mechanistic difference was resolved based on the calculated flux splits enumerating the contributions of all ATP producing reactions to
the total ATP production in each individual model, as consisting in the higher contribution of either phosphoglycerate kinase (green squares) or ATP synthase (red
squares) to the total ATP production. b. Plotting the contributions of phosphoglycerate kinase, ATP synthase, and succinate-CoA ligase allowed additional distinction
of two OxPhos subtypes. c. An even more fine-grained division of the OxPhos models was achieved considering production strategies of NADPH, NADH and FADH2.
Two types of glycolysis models could be distinguished through their
production routes of FADH2 and six OxPhos subtypes were distinguished based on
the main routes of NADH and NADPH production. The table lists for each phenotype (I-VIII) the reactions contributing most to ATP, NADH, NADPH, and FADH2
production.
model reaction producing ATP to the total amount of ATP produced [159]. Models with an ATP yield < 4.2 (’glycolytic’ models, n=37, Fig. 2.2A) produced the
highest fraction of ATP through phosphoglycerate kinase (PGK). In contrast, models with an ATP yield > 7.36 produced ATP mainly by ATP synthase (’OxPhos’
models, n=83, Fig. 2.2A). Thus, ATP yield and ATP production strategy divided
the models into glycolytic and OxPhos phenotypes. Considering differences in the
utilization of the TCA cycle, i.e., ATP production of succinate-CoA ligase, allowed
further identification of two OxPhos subtypes (Fig. 2.2B). This division was not
obvious from the ATP yield (Supplementary Fig. 2.7). Besides ATP, cells need
cofactors to support proliferation. Distinct strategies used by the models to produce different cofactors, again identified through flux splits, allowed the division
of glycolytic models into two subtypes (Supplementary Tab. 2.2). The previously
identified two OxPhos subtypes were subdivided into altogether six subtypes (Fig.
2.2C). Glycolytic subtypes differed only in the major FADH2 producing reaction.
Two OxPhos subtypes were associated with high TCA cycle contribution to ATP
production, which coincided with high utilization of cytosolic malic enzyme
as the leading NADPH source. The four remaining OxPhos subtypes used predominantly either isocitrate dehydrogenase (IDH) or dihydroceramide desaturase for
NADPH production. Glyceraldehyde-3-phosphate dehydrogenase was the major
NADH producer in OxPhos models with relatively higher glycolysis-based ATP
production. 2-oxoglutarate dehydrogenase was favored by models with higher ATP
synthase contribution (Fig. 2.2C). Thus, predicted strategies of cofactor production
allowed an even more fine-grained model classification.
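The flux-split classification used in this section can be illustrated with a minimal Python sketch. It is not the code used for the analysis; the function names, the reaction identifiers and the toy numbers are hypothetical. It only follows the definition of a flux split: each reaction's production flux for a metabolite is its stoichiometric coefficient times its flux, only positive contributions are kept, excluded reactions (e.g., transporters) are dropped, and the reaction with the largest share is reported as the major producer.

```python
def flux_split(stoich_row, fluxes, excluded=()):
    """Return {reaction: fraction of total production} for one metabolite.

    stoich_row: {reaction_id: stoichiometric coefficient of the metabolite}
    fluxes:     {reaction_id: flux value from a flux distribution}
    excluded:   reactions ignored before summing (e.g., transporters)
    """
    production = {}
    for rxn, coeff in stoich_row.items():
        if rxn in excluded:
            continue
        p = coeff * fluxes.get(rxn, 0.0)  # production flux of this reaction
        if p > 0:  # keep producing reactions only
            production[rxn] = p
    total = sum(production.values())  # total production flux
    if total == 0:
        return {}
    return {rxn: p / total for rxn, p in production.items()}


def major_producer(split):
    """Reaction with the largest contribution, or None if nothing is produced."""
    return max(split, key=split.get) if split else None


# Toy ATP row and flux vector (hypothetical values, not from the 120 models).
# A reaction written in the ATP-consuming direction (negative coefficient)
# produces ATP when it carries negative flux.
atp_row = {"PGK": -1, "ATPS": 1, "SUCOAS": -1, "HEX1": -1, "ATPtm": 1}
v = {"PGK": -2.0, "ATPS": 6.0, "SUCOAS": -0.5, "HEX1": 1.0, "ATPtm": 3.0}

split = flux_split(atp_row, v, excluded={"ATPtm"})
print(split)
print(major_producer(split))
```

In this toy case the ATP synthase stand-in carries the largest share, so the model would be labeled as an 'OxPhos' type under the scheme described above.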
2.2.3 Robustness towards genetic and environmental perturbation
So far, we characterized the models based on the imposed constraints and the distinct use of central metabolic pathways. In the following, we predict the behavior of
each model towards environmental and genetic perturbations. The course of transformation events shapes the metabolic network and might influence the robustness
of cancer cells towards environmental changes later on [160]. Variation of glucose
and glutamine uptake, and lactate secretion, each along with variation of oxygen uptake (phenotypic phase plane analysis, PhPP), led to two major observations [161].
First, the solution space, which contains all possible network states and which was
defined through variation of oxygen uptake, divided the models into three groups:
(1) glycolytic models could only grow at low oxygen uptake rates. The group of
OxPhos models comprised (2) models growing only at high oxygen uptake rates
and (3) models that were indifferent with respect to oxygen uptake rates (Fig. 2.3).
The latter two groups provided a separation of the OxPhos models that was distinct
from the previous analysis. Second, size and form of the solution spaces varied
across models (Fig. 2.4). By using form and size of the solution spaces as visual
cues (Supplementary Fig. 2.8), we divided the models into six distinct clusters (Fig.
2.4, Supplemental material). Thus, robustness of the models towards environmental
changes yielded yet another division of our models.
Figure 2.3: Distinct phenotypes with regard to oxygen requirements. This distinction between the OxPhos models (blue) was different compared to the phenotypic
classification performed based on energy and cofactor production strategies.
In silico gene knock-outs can predict novel drug targets [77]. Herein, we used single
gene deletion to investigate the robustness of the models to genetic perturbations.
Constraining enzyme function associated with 1279 genes remained without effect
on growth capability. Another 34 genes were essential to all models and could
constitute metabolic targets for all previously defined phenotypes. Additionally, 11
essential genes were present only in a subset of the models, yet essential to all models that contained them.
The number of essential genes varied across models (min = 92, max = 182, Fig.
2.5A), and was not associated with any phenotype (Supplementary Fig. 2.9). The
remaining 228 essential genes affected only a subset of those models that contained
the gene. The effect consisted either in complete termination of growth or in partial
reduction (growth <95%). Some genes appeared in subsets of models and affected
yet another subset of those, terminating (n=4) or reducing growth (n=21). Finally,
203 genes terminated or reduced growth in 1-119 models, while being present in
all. Surprisingly, many of the essential genes that affected only few models (n<20
models) were associated with central metabolism, e.g., TCA cycle (Fig. 2.5B-C).
Figure 2.4: Six model clusters were distinguished according to the models' robustness towards environmental changes. Variation of glucose, glutamine, lactate
and oxygen allowed for an improved discrimination of OxPhos models. In contrast, no sensible distinction between glycolysis models was achieved through PhPP.
Heatmaps display PhPP results for one model of each cluster (and subcluster).
Lines in the heatmaps indicate the constraints imposed on the respective, exemplified model.
Rare incidence of these essential genes hints towards high dependencies of these
models on central metabolic pathways. To further investigate the dependencies, we
plotted the rare essential genes ordered according to the PhPP clusters (Fig. 2.4).
This revealed an accumulation of rare essential genes in models of cluster 2 and
cluster 4 (Fig. 2.5B). The models of cluster 4C (SK-MEL-28, SK-MEL-28-2, and
SK-MEL-5) were characterized by particularly small solution spaces (Fig. 2.5B).
Besides large variations in the number of essential genes, the combination of gene
deletion and PhPP revealed the connection between a small solution space and high
incidence of rare essential genes.
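The labeling of single gene deletions described above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the 95% cutoff is read here as a fraction of the unperturbed growth rate, and the numerical tolerance for 'no growth' is an assumption that is not given in the text.

```python
def classify_knockout(ko_growth, wt_growth, reduce_cutoff=0.95, zero_tol=1e-6):
    """Label one gene deletion in one model.

    'terminating' : growth drops to (numerically) zero
    'reducing'    : growth falls below reduce_cutoff * unperturbed growth
    'no effect'   : otherwise
    """
    if wt_growth <= 0:
        raise ValueError("the unperturbed model must grow")
    ratio = ko_growth / wt_growth
    if ratio <= zero_tol:  # assumed tolerance for 'no growth'
        return "terminating"
    if ratio < reduce_cutoff:
        return "reducing"
    return "no effect"


def rare_essential_genes(effects, max_models=20):
    """Genes that affect at least one but fewer than max_models models.

    effects: {gene: {model: label}} with labels from classify_knockout.
    """
    rare = []
    for gene, per_model in effects.items():
        hit = sum(1 for label in per_model.values() if label != "no effect")
        if 0 < hit < max_models:
            rare.append(gene)
    return sorted(rare)
```

Applied to a table of knock-out growth rates per model, the second function returns the genes that the text refers to as rare essential genes (n<20 models).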
Cancer cells use the TCA cycle in different ways [69, 150]. Accordingly, diverse
Figure 2.5: The models have different sets of essential genes. a. Essential genes for
each cancer model. b. Essential genes affecting at most 20 models, either reducing
or terminating growth in subsets of models that include the gene. For the heatmap,
we combined genes with the same effects and same pathways (e.g., 42 genes of
NADH dehydrogenase) and displayed only the rare essential genes (n<20 models).
Appearance of rare essential genes associated with central metabolic pathways,
e.g., the TCA cycle characterized clusters 2 and 4C (phase plane analysis), which
were characterized by their small phenotypic space. Models of cluster 2 were selectively affected by, e.g., ALDH1L1, NADH dehydrogenase genes, and SLC25A19 KO.
c. The diverse ways in which cancer cells use the TCA cycle were reflected in the variety
of essential genes associated with the TCA cycle, which included rare essential
genes associated with the reactions mediating mitochondrial reductive carboxylation. Deletion of IDH and aconitase terminated growth in four models only. These models,
including both SK-MEL-28 models, relied on reductive carboxylation.
KOs including the rare KOs ACO2 and IDH2 were associated with this pathway
(Fig. 2.5A,C). Interestingly, the reactions associated with the rare KOs carried out
reductive carboxylation. Deletion of these two genes terminated growth in four models (SK-MEL-28, SK-MEL-28-2, MALME-3-2, and BT-549) and reduced growth in 14 and
11 additional models, respectively. Flux variability analysis (FVA) revealed that these models had
to operate reductive carboxylation [162], whereas this pathway remained optional
for the other models, even when constrained to experimental growth rates. Sampling
analysis conducted for the SK-MEL-28 models further confirmed mandatory reductive carboxylation (Supplementary Tab. 2.3, [15, 163]). In agreement with an observed increase in reductive carboxylation under hypoxic conditions [69], reduction
of the oxygen uptake rate (lb=ub=-100) rendered 14 additional models dependent on
reductive carboxylation. Fourteen models, including the four reductive carboxylation models, belonged to PhPP cluster 4A-B. The remainder belonged to cluster 1B,
characterized by a heavily constricted solution space at low oxygen uptake rates
compared to, e.g., cluster 4C models (Fig. 2.5B). Our models were therefore not
only able to predict reductive carboxylation, but further reproduced the connectivity
between low oxygen and reductive carboxylation in cancer cell lines.
Phosphoglycerate dehydrogenase (PHGDH) was another KO shared among the
four models that depended on reductive carboxylation. Interestingly, SK-MEL-28 and
MALME-3M had previously been associated with amplifications of PHGDH due
to 1p12 gain [148, 164]. Cells with high PHGDH activity produce up to 50% of
α-ketoglutarate through this pathway, and PHGDH silencing decreases proliferation through decreased α-ketoglutarate supply [165]. The correct prediction of the
dependency of SK-MEL-28 and MALME-3M on PHGDH provides additional support for the presented approach, and the predicted dependency of SK-MEL-28 on
reductive carboxylation.
2.3 Discussion
Extracellular metabolic profiles can be interpreted in the context of the metabolic
model [117, 142, 144]. To date, exploitation of the methods for clinical applications is hampered by the uncertainty connected to the serum composition. Herein,
we present minExCard, a novel method that predicts a minimal set of additional
metabolite exchanges, absent from measured metabolomics uptake and secretion
profiles. This method enabled the generation of condition-specific metabolic models from individual extracellular metabolomic profiles. The combined use of various computational methods allowed the large-scale assessment of metabolic heterogeneity among the models of the NCI-60 cell lines. The cancer cell line models
had different ‘oxotypes’, which separated models that changed pathway utilization
with oxygen uptake from others that were more restricted to low or high oxygen
uptake rates, regardless of oxygen availability. Distinct robustness of cell lines towards environmental and genetic perturbations, as predicted herein, could have important implications for conclusions drawn from experiments performed under normoxic conditions. The context of the metabolic models allows novel biological
insights that could not have been drawn from data analysis alone. The integration of
metabolomic samples can be applied to many cellular systems and constitutes an important step towards the application of metabolic models in personalized medicine.
Our novel model generation method constitutes an important step towards the clinical application of metabolic models. In contrast to previous studies [117, 142, 144],
we not only included quantitative constraints, but also used the context of the
metabolic models to predict a minimal set of hypothetical undetected exchanges.
Although the added metabolite exchanges could hypothetically constitute valid exchanges (Supplementary Tab. 2.5, Supplementary material), it should be noted
that addition of exchanges may differ depending on the chosen objective function.
Herein, we refrained from experimental validation of the predicted additional exchanges, since we used published data and small differences in culture conditions
can have profound effects on cellular behavior [154]. The discrepancies between
duplicate samples in the published data set (Supplementary Fig. 2.6) illustrate how
vulnerable the phenotype might be. Then again, such heterogeneity among clonal cell
lines is a known phenomenon. Noise in gene and protein expression has been connected to structural and behavioral differences [166, 167]. Identification of these
’less determined’ exchanges could be an interesting follow-up question, since the
discrepancies in the metabolic profiles frequently led to distinct classification of the
two replicate cell line models, and main conclusions could not be based on such
variation.
The combination of established network interrogation methods allowed the distinction of glycolytic and OxPhos phenotypes. The range of predicted contributions
of glycolysis and oxidative phosphorylation to ATP production was generally supported by literature (Supplemental Tab. 2) [147, 168, 169]. The contribution of
ATP synthase in MCF-7 (68% and 62%) underestimated literature reports of > 80%
[147, 168], potentially due to condition-specific differences, or our 20% allowance
around the experimentally defined metabolite uptake and secretion rates. Visual
classification revealed a more gradual transition between size and form of the solution spaces. Together, this corresponds to the observation of glycolytic and OxPhos
phenotypes of various specificity among cancer cell lines [150, 160].
Although the utilization of glycolysis, TCA cycle and ETC broadly explained the
differences in the data, we did observe outliers (e.g., Fig. 2.4, cluster 1B). This
indicates that the heterogeneity among the models is even greater and depends on
more factors than the ones we investigated, and argues for the interrogation of pathway and fuel utilization beyond commonly monitored pathways [69, 71, 148, 149,
151].
However, pathway utilization was not the only determinant of phenotypic differences, since the dependency of the cluster 4 models was not explained by pathway
utilization but coincided with the susceptibility towards rare KOs (Fig. 2.5B). Both
SK-MEL-28 models had KOs in the reactions associated with reductive carboxylation (Fig. 2.5B-C). In which way this reverse-directed flux might be compensated by
the opposed flux in the cytosolic analog remains to be investigated. However, it has been indicated that
such a combination of fluxes might contribute to the transmission of NADPH between
compartments [150]. In contrast to previous studies, in which reductive carboxylation was found to be connected to IDH1 in the cytosol but not to
mitochondrial IDH2, the mandatory reductive carboxylation predicted herein was
connected to IDH2. Nevertheless, it has not been excluded that IDH2 may still promote
reductive carboxylation in a tissue- or condition-dependent manner [69].
Another interesting observation was that the constraints set based on the experimental uptake and secretion rates were cytotoxic in a subset of models. These models
had to dedicate resources to deal with excess nutrients, which became obvious by a
reduction in growth rates (Fig. 2.4, e.g., cluster 4B). One important factor that could
drive excessive nutrient uptake could be non-physiological experimental conditions,
e.g., excessive nutrient supply. Glucose uptake might surpass cellular requirements,
if the receptors are stimulated maximally [170]. Additionally, nutrient uptake is
decoupled from growth factor signals in cancer cells [171].
Hypoxia is believed to drive transformation [150], and tumor cells are exposed to
temporal fluctuations of oxygenation [156]. Such environmental changes necessitate metabolic flexibility, which varied among the models (Fig. 2.3). High glycolytic
rates are often connected to low oxygen consumption in cancer cells [150]. In comparison, all glycolytic models were limited to low oxygen uptake rates (Fig. 2.3).
It has been mentioned that ’physoxia’, which differs between tissues, is closer to
experimental ’hypoxia’ conditions than to the usual ’normoxic’ experimental conditions [172]. This can have important implications for the conclusions
drawn from experiments conducted in ’normoxia’ [172]. This applies particularly
to cells that are not limited to low oxygen uptake, which is illustrated by our predicted group of oxygen-indifferent models and by the fact that the limitation of oxygen
induced reductive carboxylation in additional models (Fig. 2.3).
In conclusion, our study furthers the interpretation of extracellular metabolomic
profiles in the context of metabolic models and provides biological insights into the
metabolic heterogeneity among the NCI-60 cancer cell lines. Moreover, it emphasizes the importance of oxygenation conditions on the behavior of the cancer cell
lines. The approach carried out herein is applicable to various cellular systems and
holds great potential for personalized health, e.g., to predict the effectiveness of
drugs with metabolic targets on cancer or any cell affected by metabolic diseases.
2.4 Material and Methods
The starting model
The genome-scale metabolic reconstruction Recon 2 covers a total of 1789 genes,
7440 reactions and 2626 unique metabolites distributed over eight cellular compartments. Its predictive capability has been demonstrated, e.g., through mapping of
inborn errors of metabolism and different omics data sets [53]. The starting model
that was used herein constitutes a subset of Recon 2, and is the same as that used in a
previous study [173]. Infinite constraints were set to lb=-2000, ub=2000, and all
exchange reactions in the model were initially opened. Subsequently, constraints
were set on exchange reactions of ions (lb=-100), vitamins (lb=-1), essential amino
acids (lb=-10) and compounds such as water or protons (lb=-100). Oxygen uptake
was constrained to lb=-1000 and ub=0. This range was defined based on reported
oxygen uptake rates of a cancer cell line (2.85 × 10^-6 ml O2/10^5 cells/min = 646.013
fmol/cell/hr) [174]. Additionally, the lower bounds of the superoxide anion and hydrogen peroxide exchanges were set to zero to prevent the generation of models that
did not require oxygen uptake.
The biomass reaction is usually in units of mmol/gDW/hr. Yet herein, the metabolite
uptake and secretion profiles that were mapped were provided in the unit fmol/cell/h
[137]. We assumed a unitary cell weight of 1e-12 g, which was in the range of the
dry weight (3.645e-12 g) we calculated for lymphocytes in an earlier study [173].
There, the dry weight had been inferred from the dry mass (range 35-60 pg [175])
and cellular volume (4000 µm3 [110]) of the human osteosarcoma cell line U2OS,
which we related to the cell volume of lymphocytes (243 µm3) [176]: 4000/243 = 16.46 and 60 pg/16.46 = 3.645 pg (3.645e-12 g) [173]. Since
1 mmol/gDW = 1e+12 fmol/1e+12 cells, no scaling of the biomass was necessary. The
lb of the biomass objective function was fixed to a minimal value of lb=0.008 to
match the lb defined for the slowest growing cell line in the data set (HOP-92,
88 hrs) [177], to ensure that the model building resulted in functional models with
non-zero growth.
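The unit arithmetic in this paragraph can be retraced with a short Python sketch. The function names are hypothetical, and the relation between doubling time and the growth bound (ln 2 divided by the doubling time) is an assumption made here for illustration; the text only states that lb=0.008 matches the slowest growing cell line.

```python
import math

CELL_DRY_WEIGHT_G = 1e-12  # unitary cell weight assumed in the text


def fmol_per_cell_to_mmol_per_gdw(flux_fmol_cell_hr, cell_dw_g=CELL_DRY_WEIGHT_G):
    """Convert a flux from fmol/cell/hr to mmol/gDW/hr."""
    cells_per_gdw = 1.0 / cell_dw_g  # about 1e12 cells per gram dry weight by default
    mmol_per_fmol = 1e-12            # 1 fmol = 1e-12 mmol
    return flux_fmol_cell_hr * cells_per_gdw * mmol_per_fmol


def growth_rate_from_doubling_time(doubling_time_hr):
    """Specific growth rate (1/hr), assuming exponential growth."""
    return math.log(2) / doubling_time_hr


def scaled_dry_weight_pg(ref_dry_mass_pg=60.0, ref_volume_um3=4000.0,
                         target_volume_um3=243.0):
    """Scale a reference dry mass by the ratio of cell volumes."""
    return ref_dry_mass_pg / (ref_volume_um3 / target_volume_um3)


print(fmol_per_cell_to_mmol_per_gdw(1.0))
print(growth_rate_from_doubling_time(88.0))
print(scaled_dry_weight_pg())
```

With a cell weight of 1e-12 g the conversion factor is one, which is why the measured fluxes could be mapped without rescaling the biomass reaction.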
Constraint-based modeling
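We used flux balance analysis (FBA, [11]) to solve the following problem:
$$
\begin{aligned}
\max_{v}\quad & Z = c^{T} \cdot v \\
\text{s.t.}\quad & S \cdot v = 0 \\
& lb \le v \le ub \\
& lb \le ub
\end{aligned}
\tag{2.1}
$$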
where S is the stoichiometric matrix consisting of m metabolites and n reactions
as defined by the metabolic reconstruction. Z is the objective function and c is
the vector of length n that contains the weights with which each of the reactions
contributes to the objective function. The lower bound, lb, on the reaction flux v_i
has a negative value in the case of reversible reactions and is zero or greater
in the case of irreversible reactions. The upper bound, ub, is greater than zero for
reversible or forward reactions. The bounds on exchange reactions, which supply or
remove metabolites from the model, are defined as follows: if lb < 0 the metabolite
can be taken up; if lb ≥ 0 the metabolite cannot be taken up, and if lb > 0 it needs to be secreted.
FBA solutions are inherently degenerate. Therefore, we minimized the Euclidean
norm of the flux vector while maximizing the stated objective function. This method
ensures that the computed flux vector is unique and assumes that the most likely
solution is the one that minimizes the sum of squared reaction fluxes, $\min \sum_i v_i^2$ [15].
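One common way to write this norm minimization is as a two-step problem, given here as a hedged sketch; the symbol $Z^{*}$ for the optimal objective value is introduced only for illustration, and the exact formulation used by the COBRA toolbox may differ in detail.

```latex
\begin{aligned}
\text{Step 1:}\quad & Z^{*} = \max_{v}\; c^{T} v
  && \text{s.t. } S v = 0,\; lb \le v \le ub \\
\text{Step 2:}\quad & \min_{v}\; \sum_{i=1}^{n} v_i^{2}
  && \text{s.t. } S v = 0,\; c^{T} v = Z^{*},\; lb \le v \le ub
\end{aligned}
```

Because the second objective is strictly convex and the feasible set is convex, Step 2 has a single minimizer, which is what makes the reported flux vector unique.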
Flux variability analysis (FVA) calculates minimal and maximal flux through each
reaction in the model through performing FBA. This analysis provides insight into
robustness and redundancy of the metabolic network [162]. Herein, FVA was used
to define the flux span of the reductive carboxylation reactions.
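For a single reaction $j$, FVA can be sketched as the pair of optimization problems below. The optional growth requirement with a fraction $\gamma \in [0, 1]$ of the optimal objective value $Z^{*}$ is an assumption added for illustration; the text does not state which fraction, if any, was enforced.

```latex
\begin{aligned}
v_j^{\min} &= \min_{v}\; v_j, \qquad v_j^{\max} = \max_{v}\; v_j \\
\text{s.t.}\quad & S v = 0,\quad lb \le v \le ub,\quad c^{T} v \ge \gamma Z^{*}
\end{aligned}
```

The flux span is $v_j^{\max} - v_j^{\min}$. A reaction is mandatory in one direction when the interval $[v_j^{\min}, v_j^{\max}]$ excludes zero, which is the sense in which reductive carboxylation was obligatory for some of the models.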
Sampling analysis does not depend on the definition of an objective function, and
has previously been applied to investigate differences between metabolic networks
[117, 163]. Herein, the ACHRsampler implemented in the COBRA toolbox [15]
was used to investigate the feasible steady-state flux space under the given set of
constraints for the SK-MEL-28 cell line models. After calculating a set of random points, i.e., warm-up points (n=10,000), the sampling points are collected, by
choosing a random direction and a random step length from the calculated center,
while remaining within the model’s solution space defined by the constraints. In
total, we generated 500,000 sampling points (nFiles = 100, pointsPerFile = 5,000),
with 2,500 steps in between two collected sampling points (stepsPerPoint=2,500),
to better support mixing of the sampling points throughout the solution space. The
set of sampling points can be seen as the probability distribution of the flux through
each single reaction in the network.
Data integration and model building
The metabolite consumption and release profiles of 140 metabolites comprised two
samples for each of the 60 NCI-60 cell lines (120 samples) [137]. From the entire
set of detected metabolites, we only used the calibrated (quantitative) uptake and
secretion fluxes (115 metabolites), which were provided in fmol/cell/hr. Metabolite
identifiers in the data were mapped to the metabolite abbreviations in the starting
model. The metabolite aminoisobutyrate was not part of the starting model and
was excluded. Based on the metabolite abbreviations, we identified the existing
metabolite exchange reactions. If no exchange reaction existed in the model but
the metabolite itself was part of the model, a new exchange reaction was added to
the model. In addition to the exchange reactions, transport reactions need to be
present in the model to allow the transport of the metabolites between extracellular
space and model cytosol. Transport reactions need to be added for all metabolites
for which we added exchange reactions. These transport reactions were identified
from the literature. We added a diffusion reaction, if no transporter for the metabolite could be identified. The additions that we made to the model based on the
metabolomic data comprised 43 transport and 36 exchange reactions (Supplementary Tab. 2.7).
The presence of exchange and transport reactions does not ensure that a metabolite
can be consumed or secreted by the model, since anabolic and/or catabolic pathways might not be present or even still be unknown [53, 58]. To identify the subset
of metabolites the model could consume and secrete, we performed FBA, while
enforcing small uptake (ub=-0.00001) or secretion (lb=0.00001) for all mapped
metabolite exchanges. All metabolites that could not be consumed (15) or secreted
(15) by the model were discarded (Supplementary Tab. 2.6). The identification
of metabolites that are not part of the metabolic reconstruction is common, and
pathways for these metabolites need to be added in future releases for the generic
model [53, 173]. If uptake of a metabolite was possible in the generic model but
not secretion, only metabolite secretions were discarded from the metabolic profiles, while uptakes remained present, and vice versa. After the sets of
’qualitatively’ feasible metabolite exchanges were identified, we mapped the sets of
metabolite uptakes and secretions of one sample at a time to the starting model. We
imposed each detected, quantitative flux X as a constraint on the bounds of the respective metabolite exchange reaction while considering a 20% allowance around X
(lb = X ∗ 0.8 and ub = X ∗ 1.2). The set of exchanges detected for one sample was
consecutively mapped to the starting model. After constraints were placed on one
exchange reaction, FBA was performed to check if the model was still feasible. Although the starting model was able to perform all qualitative metabolite exchanges
that were mapped, certain quantities, or combination of constraints could still render the model infeasible. In case of infeasibility, the original bounds of the model
were restored, and we proceeded to the next set of constraints. Quantitative constraints rendered 27 preliminary cell line models infeasible (Figure 2.1 A). In these
27 models, the original bounds were restored for two exchange constraints in 25 models, for one in one model, and for four in one model during the data
integration.
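The consecutive mapping with a 20% allowance can be sketched in Python as follows. This is an illustration under stated assumptions, not the original implementation: the function names and the `is_feasible` callback (standing in for an FBA feasibility check) are hypothetical. For an uptake flux (negative X), multiplying by 0.8 and 1.2 would give lb > ub, so the sketch sorts the two values; whether the original code did the same is an assumption.

```python
def allowance_bounds(x, allowance=0.2):
    """Return (lb, ub) spanning +/- allowance around a measured flux x.

    Uptake is negative and secretion positive. The two scaled values are
    sorted so that lb <= ub also holds for negative x (assumption).
    """
    a = x * (1.0 - allowance)
    b = x * (1.0 + allowance)
    return (min(a, b), max(a, b))


def map_sample(bounds, measured, is_feasible, allowance=0.2):
    """Apply the measured exchange fluxes of one sample, one at a time.

    bounds:      {exchange_id: (lb, ub)} of the starting model
    measured:    {exchange_id: flux} for one sample
    is_feasible: callable taking the bounds dict (e.g., a wrapper around FBA)
    Returns the constrained bounds and the list of exchanges whose original
    bounds were restored because the new constraint was infeasible.
    """
    constrained = dict(bounds)
    restored = []
    for rxn, x in measured.items():
        original = constrained[rxn]
        constrained[rxn] = allowance_bounds(x, allowance)
        if not is_feasible(constrained):
            constrained[rxn] = original  # undo and move on to the next one
            restored.append(rxn)
    return constrained, restored
```

Passing the feasibility check as a callback keeps the sketch independent of any particular solver; in practice it would run FBA on the model with the candidate bounds.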
All 120 preliminary models, each with the individual constraints detected for one
sample, were subjected to our new model building method. The idea behind the
method was that although metabolic profiles are likely to be incomplete, the context
of the metabolic model can be used to predict a hypothetical, minimal set of missing
exchanges, to maximally constrain the solution space (containing the set of feasible flux distributions) of the model. Incompleteness of the metabolic profiles results
from limitations of detection methods (LC-MS or NMR), or from the fact that the composition of the medium, e.g., containing or consisting of serum, is not entirely defined,
and therefore neither is the list of metabolites utilized and released by the cells. In comparison to the number of metabolites detected in metabolomics approaches (herein 115),
the number of exchanges in the starting model is very high (464 metabolite exchanges). Adding constraints to the model limits the number of feasible network
states, and ideally, only biologically relevant ones would remain. Constraining the
model too much by removing all but the detected metabolite exchanges would inevitably lead to an infeasible model because not all metabolites might have been
measured (e.g., oxygen uptake rates or undetected substrates herein). In order to
maintain a functional model on the one side, while constraining the model as far as
possible on the other, we formulated an LP problem. This LP problem would return
a solution relying on a minimal set of exchanges (minimize cardinality) needed, in
addition to the experimentally defined uptakes and secretions, to sustain a feasible
model.
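The exact LP is not written out here. One plausible form, given purely as an assumption, minimizes the total flux through the non-measured exchange reactions of the irreversible model, which is a common linear proxy for cardinality. The symbols $E_{\text{opt}}$ (exchanges not covered by the metabolomic profile), $S_{\text{irr}}$ (stoichiometric matrix of the irreversible model) and $v_{\text{biomass}}$ are introduced only for this sketch.

```latex
\begin{aligned}
\min_{v}\quad & \sum_{j \in E_{\text{opt}}} v_j \\
\text{s.t.}\quad & S_{\text{irr}}\, v = 0, \\
& lb_j \le v_j \le ub_j,\quad v_j \ge 0 \quad \forall j, \\
& v_{\text{biomass}} \ge 0.008
\end{aligned}
```

Here the bounds on measured exchanges are the data-derived ones. Minimizing the number of active exchanges exactly would require binary variables, i.e., a mixed-integer program, so an LP of this kind yields a small, but not necessarily the smallest, set of additional exchanges.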
The procedure was the same for all 120 models: First, an irreversible model
was generated by duplicating the reversible reactions. Subsequently, all exchange
reactions not part of the respective metabolic profile were identified and flagged to
be part of the minimized set of exchange reactions. The LP problem was solved.
All unused exchange reactions were identified from the LP solution, and all exchange reactions that were not part of the minimal set of exchanges were closed
in the reversible model. The submodel was extracted using a function (identifyBlockedRxns, epsilon=1e-4 ) from the FASTCORE algorithm for reconstruction of
context-specific metabolic networks [126]. The steps were iterated until the minimal
exchange reaction network was identified to build the final cancer cell line specific
model.
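One way to write down the LP described above is sketched here. The exact objective is not given in this section, so the sum of fluxes through the flagged exchanges is an assumption (a linear stand-in for the cardinality of the added exchange set). The symbols $E_m$ (exchanges with measured uptake or secretion), $E_u$ (flagged, unmeasured exchanges), $l_j$ and $u_j$ (flux bounds) are introduced for illustration only.

```latex
\begin{aligned}
\min_{v} \quad & \sum_{j \in E_u} v_j \\
\text{s.t.} \quad & S\,v = 0, \\
& l_j \le v_j \le u_j \quad \text{for } j \in E_m \text{ (bounds from the metabolic profile)}, \\
& 0 \le v_j \le u_j \quad \text{for all other reactions } j \text{ of the irreversible model.}
\end{aligned}
```

Because every flux in the irreversible model is non-negative, the objective equals the L1 norm of the flagged exchange fluxes; exchanges in $E_u$ with $v_j = 0$ in the solution are the candidates to be closed in the reversible model.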
Growth rates
Cell line specific growth rates [177], which agreed with [137], were used as constraints to analyze the ability of the models to realize experimental growth rates. An
alternative set of NCI-60 growth rates (http://dtp.nci.nih.gov/docs/misc/common_files/cell_list.html) did not yield any
different results. Growth rates were only used as constraints if explicitly stated.
Flux split ratios and ATP yield
Flux splits can be used to investigate metabolism in a metabolite-centric view in
addition to commonly used fluxes [159]. Herein, we calculated flux splits to obtain
information on the distinct production strategies of the cancer cell line models for
ATP and cofactors (NADH, NADPH, and FADH2). The flux splits were calculated
based on the flux vectors identified through optimizing ATP production for each
model. All reaction fluxes producing the metabolite $i$ were identified: $P_{i,j} = S_{i,j} \times V_j$
for all reactions $j$ with $P_{i,j} > 0$. From the sum of production fluxes $\Phi_i = \sum_j P_{i,j}$, the
percent contributions were calculated as $P_i^* = P_{i,j} / \Phi_i$, as specified in [159]. However,
prior to summing up the total production flux $\Phi_i$, certain reactions, e.g., transport
reactions or other transformations of no interest to our analysis, were removed (Tab.
2.1). Subsequently, the reaction with the maximal $P_i^*$ was identified as the major
producer of ATP, NADH, NADPH, and FADH2. Based on the combination of major
producer reactions, the 120 models were classified into eight different phenotypes
(Supplementary Tab. 2.2). The ATP yield was defined by dividing $\Phi_{ATP}$ by the
glucose uptake of each respective model. It should be noted that although we formulated the ATP yield according to glucose uptake, uptake of other carbon sources,
e.g., glutamine, was still possible since no additional constraints were applied in
this analysis.
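The flux split calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the thesis (which ran in MATLAB): the function names, the dictionary-based data layout and the toy flux values are assumptions, and PYK and HEX1 are reaction identifiers added only for the example.

```python
def flux_split(stoich_row, fluxes, excluded=()):
    """Fractional contribution of each reaction to the production of one metabolite.

    stoich_row: reaction id -> stoichiometric coefficient S[i, j] of metabolite i
    fluxes:     reaction id -> flux V[j]
    excluded:   reaction ids dropped before summing (e.g. transport reactions)
    """
    production = {}
    for rxn, coeff in stoich_row.items():
        if rxn in excluded:
            continue
        p = coeff * fluxes.get(rxn, 0.0)  # P[i, j] = S[i, j] * V[j]
        if p > 0:                         # keep producing reactions only
            production[rxn] = p
    total = sum(production.values())      # Phi_i, the total production flux
    if total == 0:
        return {}
    return {rxn: p / total for rxn, p in production.items()}


def major_producer(split):
    """Reaction with the largest contribution, or None for an empty split."""
    return max(split, key=split.get) if split else None


# Toy ATP row and flux vector (hypothetical numbers).
atp_row = {"PGK": 1, "PYK": 1, "ATPS4m": 1, "HEX1": -1, "ATPtm": 1}
v = {"PGK": 20.0, "PYK": 20.0, "ATPS4m": 60.0, "HEX1": 10.0, "ATPtm": 50.0}

split = flux_split(atp_row, v, excluded={"ATPtm"})
print(split, major_producer(split))
```

Excluding the transport reaction before summing matters: with ATPtm included, it would account for a third of the apparent ATP production in this toy example and change every contribution.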
Phenotypic Phase Plane Analysis
The robustness of the 120 models towards environmental perturbations was investigated using phenotypic phase plane analysis (PhPP) [161]. Thereby, fluxes through
two exchange reactions representing metabolite uptake or secretion are fixed at different intervals while setting biomass production as the objective function, using
normal FBA. For each step, the optimal value was computed and plotted as heat
maps. Herein, oxygen uptake was varied in combination with either glucose uptake, glutamine uptake, or lactate secretion. All other reaction constraints remained
unchanged. The range that was tested was defined based on the variability of the
Table 2.1: Reactions discarded from flux split analysis (and ATP yield).
ATP:   ATPtm, ATPtn, ATPtx, ATP1ter, ATP2ter, EX_atp(e), DNDPt13m, DNDPt2m, DNDPt31m, DNDPt56m, DNDPt32m, DNDPt57m, DNDPt20m, DNDPt44m, DNDPt19m, DNDPt43m, ADK1, ADK1m
NADH:  NADHtpu, NADHtru, NADtpu
NADPH: NADPHtru, NADPHtxu
FADH:  FADH2tru, FADH2tx
constraints set throughout the set of 120 models: oxygen uptake rate was initially 0
and decreased in steps of 20 units until an uptake rate of -1000 was reached. Glucose uptake rate was initially 0 and decreased in steps of 20 units to -1080 (lowest
and highest glucose uptake was -38*0.8=-30 and -860*1.2=-1032 among the models). Glutamine uptake rate was initially 0 and decreased in steps of 20 units to -400
(lowest and highest glutamine uptake was -13.87*0.8=-11.096 and -304.27*1.2=-365.124 among the models). Lactate secretion rate was initially 1620 and decreased
in steps of 20 units to 0 (lowest and highest lactate secretion was 32.35*0.8=25.880
and 1345.14*1.2=1614.2). The unit of all flux values is fmol/cell/hr.
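The scan ranges above can be reproduced with a short Python sketch. The helper names are hypothetical, and treating the end value as part of the scan (an inclusive range) is an assumption; the FBA call that would be made at each grid point is left out.

```python
def scan_values(start, stop, step=20):
    """Flux values from start to stop (both included) in fixed steps."""
    n_steps = int(round(abs(stop - start) / step))
    sign = 1 if stop >= start else -1
    return [start + sign * step * k for k in range(n_steps + 1)]


def padded_range(lowest, highest, margin=0.2):
    """Measured extremes relaxed by the margin: lowest * 0.8 and highest * 1.2."""
    return lowest * (1 - margin), highest * (1 + margin)


oxygen = scan_values(0, -1000)     # uptake is negative by convention
glucose = scan_values(0, -1080)
glutamine = scan_values(0, -400)
lactate = scan_values(1620, 0)     # secretion is positive

# One phase plane: every combination of oxygen and glucose uptake.
grid = [(o2, glc) for o2 in oxygen for glc in glucose]

print(len(oxygen), len(glucose), len(glutamine), len(lactate), len(grid))
print(padded_range(-13.87, -304.27))   # glutamine extremes from the text
```

Each pair in `grid` would be fixed as the bounds of the two exchange reactions before optimizing biomass production, and the optimal values would then be arranged as a heat map.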
Gene deletion
We performed single gene deletion for each of the 120 models, using the function implemented in the COBRA
toolbox [15]. A KO was defined as a growth rate of the
perturbed model of ≤5% of the growth rate of the unperturbed model. Reduced
growth was defined as ≤95% but ≥5% of the growth of the unperturbed model.
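The two thresholds translate into a small classification rule. The sketch below is illustrative only: the function name and labels are hypothetical, and because the two definitions overlap at exactly 5%, assigning that boundary value to the KO class is an assumption.

```python
def classify_deletion(ko_growth, wt_growth, ko_cutoff=0.05, reduced_cutoff=0.95):
    """Classify a single gene deletion by the ratio of perturbed to unperturbed growth."""
    if wt_growth <= 0:
        raise ValueError("the unperturbed model must be able to grow")
    ratio = ko_growth / wt_growth
    if ratio <= ko_cutoff:        # at most 5% of unperturbed growth
        return "KO"
    if ratio <= reduced_cutoff:   # above 5% and at most 95%
        return "reduced"
    return "unaffected"


# A deletion that leaves 93% of the unperturbed growth counts as reduced growth.
print(classify_deletion(0.93, 1.0))
```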
All calculations were performed using the TomLab CPLEX linear solver and MATLAB.
2.5 Supplementary material
This section captures tables published as supplemental material.
Table 2.2: Distinct Phenotypes.
ATP      NADH    NADPH     FADH       Type  Frequency
ATPS4m   AKGDm   ICDHyrm   SUCD1m     8     6
ATPS4m   AKGDm   ME2       SUCD1m     7     41
ATPS4m   GAPD    DHCRD1    SUCD1m     6     4
ATPS4m   GAPD    ICDHyrm   SUCD1m     5     16
ATPS4m   GAPD    ME2       SUCD1m     4     12
ATPS4m   MDHm    DHCRD1    SUCD1m     3     4
PGK      GAPD    LALDD     SUCD1m     2     29
PGK      GAPD    LALDD     FAOXC160   1     8
In total, 112 metabolites were mapped to our model. For those, we performed Spearman's rank clustering after normalization of each column for both metabolites and
cell lines (Supplementary Fig. 2.6). Although the samples of the same cell line cluster together, differences do exist between the samples, e.g., for the two samples of M14,
there are a number of metabolites which differ in whether they were consumed
(blue) or released (red). Based on these discrepancies between the samples, we decided to build a model for each sample rather than combining the replicates and
building one model for each cell line.
Additional exchanges
The number of exchanges added to maintain functional models varied between 13
and 28. Each of the unique additional metabolite exchanges (n=54) was added
to at least one model, and two metabolites were added to all 120 models (Supplementary Tab. 2.5), i.e., O2, which was only taken up, and bilirubin-glucuronoside (bilglcur), which had to be either secreted or consumed in all 120 models. The pattern
of uptake (n=76) and secretion (n=44) of bilglcur was opposite to the exchange
profiles of bilirubin, which was subject to uptake in 44 samples and secreted in
76 samples (Supplementary Tab. 2.5). The next most frequently added exchange
was 2-hydroxybutanoic acid, which had to be secreted by 116 models. Interestingly, 2-hydroxybutanoic acid has been suggested as a biomarker for the detection
of colorectal cancer by a multiple logistic regression model [178]. The additions
further contain a number of fatty acid uptakes, e.g., phytanic
acid (n=101, Supplementary Tab. 2.5). Dietary branched-chain lipids like phytanic
acid have been linked to various cancers as well as neurological diseases [179, 180].
Intake of phytanic acid or phytanic acid-containing foods has been connected to an
Table 2.3: Sampling results of the isocitrate dehydrogenase and pyruvate dehydrogenase.
                           ACHN           ACHN_2         SK-MEL-28      SK-MEL-28-2
isocitrate dehydrogenase
  mean                     -620.8973661   -588.0353818   -887.6620988   -854.0857531
  median                   -549.1930966   -508.2469954   -815.4456521   -771.9925793
  std                      566.6515157    549.0111082    572.7923604    553.0852402
  min                      -1780.561244   -1779.061376   -1999.99014    -1999.976868
  max                      255.0627464    251.4236693    -19.18158557   -31.52032852
pyruvate dehydrogenase
  mean                     29.07586611    21.57907528    0.009248541    0.006737046
  median                   29.35353201    21.90132532    0.006416373    0.004685093
  std                      5.39244374     4.793963776    0.009230659    0.006756143
  min                      2.948301163    0.027426137    1.04495E-09    9.15917E-09
  max                      52.74281973    41.57521337    0.088072245    0.062690636
Table 2.4: Excluded were uncalibrated metabolites and those that could not be produced nor consumed by Recon.
Method (a)  Metabolite (b)                     BIGG metabolite  Direct Recon 1                   Comment
HILIC       homoserine                         hom_L            EX_hom_L(e)                      no uptake/secretion in Recon, therefore excluded right away
IPR         4-hydroxybenzoate                  4hbz             Ex_4hbz[e]                       no uptake/secretion in Recon, therefore excluded right away
HILIC       aminoisobutyrate                                    not in Recon 2
IPR         ascorbate                          ascb_L           excluded because not calibrated
IPR         pyruvate                           pyr              excluded because not calibrated
IPR         homocystine                        N/A              excluded because not calibrated
HILIC       ADMA                               N/A              excluded because not calibrated
HILIC       allantoin                          alltn            excluded because not calibrated
HILIC       cotinine                           N/A              excluded because not calibrated
HILIC       glycerol_2                         glyc             excluded because not calibrated
HILIC       NMMA                               N/A              excluded because not calibrated
HILIC       trimethylamine-N-oxide             N/A              excluded because not calibrated
IPR         aconitate                          N/A              excluded because not calibrated
IPR         adipate                            N/A              excluded because not calibrated
IPR         biotin                             btn              excluded because not calibrated
IPR         citrate/isocitrate                 cit/icit         excluded because not calibrated
IPR         fru-1,6-DP/fru-2,6-DP/glc-1,6-DP   fdp/f26bp/       excluded because not calibrated
IPR         maleate                            N/A              excluded because not calibrated
IPR         hippurate                          N/A              excluded because not calibrated
IPR         malonate                           HC00319          excluded because not calibrated
IPR         salicylurate                       N/A              excluded because not calibrated
IPR         methylmalonate                     HC00900          excluded because not calibrated
IPR         UDP-galactose/UDP-glucose          udpgal/udpg      excluded because not calibrated
IPR         hyodeoxycholate/ursodeoxycholate   N/A              excluded because not calibrated
IPR         chenodeoxycholate/deoxycholate     N/A              excluded because not calibrated
IPR         lithocholate                       HC02191          excluded because not calibrated
IPR         taurolithocholate                  HC02192          excluded because not calibrated
IPR         phenylacetylglycine                N/A              excluded because not calibrated
Table 2.5: Added exchanges.
Exchange added      Frequency (n)  secretion (n)  uptake (n)  Comment
EX_bilglcur(e)      120            44             76          The pattern of uptake and secretion corresponds exactly to the CORE profiles of bilirubin, which was 44x uptake and 76x secretion.
EX_o2(e)            120            0              120
EX_2hb(e)           116            116            0           2-Hydroxybutanoic acid; FA; Biomarkers for detecting colorectal cancer selected by the multiple logistic regression model. DOI: 10.1371/journal.pone.0040459, [178]
EX_thmmp(e)         111            10             101         In the CORE profile thiamine (thm) is secreted by 106 models; thm is generated from Thiamin monophosphate by THMP thiamine phosphatase. Further there is secretion of thiaminetriphosphate in 4 of the models (see below)
EX_urea(e)          111            111            0
EX_his_L(e)         107            0              107         is also produced from L-Carnosine which is taken up by 68 samples of the CORE profiles, maybe it's a matter of how much is needed if carnosine is not sufficient
EX_dmhptcrn(e)      101            101            0           carnitine usual that they are secreted (excreted in urine)
EX_phyt(e)          101            0              101         FA
EX_tdchola(e)       92             92             0           The CORE profile says 94 times uptake, however seems not possible for the model, and during model building 92 secretions are added to the 26 present ones.
EX_pydx(e)          85             0              85          the remaining 35 models have 4-Pyridoxate uptake, which can alternatively be produced from Pyridoxal
EX_utp(e)           85             0              85
EX_co2(e)           83             83             0
EX_gtp(e)           82             0              82
EX_gthrd(e)         77             0              77
EX_i(e)             76             0              76
EX_triodthysuf(e)   72             72             0           uptake of Triiodothyronine (triodthy) > -1e-07 in 72 samples in the CORE profiles
EX_atp(e)           68             0              68
EX_5mthf(e)         56             56             0
EX_7thf(e)          56             0              56
EX_h(e)             51             51             0
EX_udp(e)           35             0              35
EX_gdp(e)           32             0              32
EX_adp              27             0              27
EX_lpchol_hs(e)     24             0              24
EX_tchola(e)        20             20             0
EX_tag_hs(e)        19             19             0
EX_vitd3(e)         14             0              14
EX_5mta(e)          8              8              0
EX_nh4(e)           7              7              0
EX_so4(e)           7              0              7
Ex_5hoxindoa[e]     7              7              0
EX_cbasp(e)         6              0              6
EX_Lcystin(e)       5              0              5
EX_cmp(e)           5              0              5
EX_nac(e)           5              0              5
EX_dopa(e)          4              4              0
EX_nad(e)           4              0              4
EX_thmtp(e)         4              4              0
EX_tymsf(e)         4              4              0
EX_34dhphe(e)       3              3              0
EX_pheacgln(e)      3              3              0
EX_s2l2n2m2masn(e)  3              0              3           N-glycan
EX_strch1(e)        3              0              3
EX_ach(e)           2              2              0
EX_estradiolglc(e)  2              2              0
EX_estrones(e)      2              0              2
EX_pchol_hs(e)      2              0              2
EX_tmndnc(e)        2              0              2           FA
EX_3aib_D(e)        1              1              0
EX_4mop(e)          1              1              0
EX_fru(e)           1              1              0
EX_strdnc(e)        1              0              1           FA
EX_ttdca(e)         1              0              1           FA
EX_urate(e)         1              1              0
increased risk for follicular lymphoma, small lymphocytic lymphoma/chronic lymphocytic leukemia, and non-Hodgkin lymphoma [181]. Plasma phytanic acid
concentrations were significantly associated with intake of dairy fat, however no direct causal relationship to prostate cancer could be established [180, 182]. Prostate
as well as other cancers overexpress alpha-methylacyl-CoA racemase (AMACR),
an enzyme that regulates the entrance of branched-chain fatty acids into peroxisomal alpha- and beta-oxidation [179, 183]. Further, alternative splicing produces
Table 2.6: Metabolite uptake and secretion not possible in the model.
Secretion not possible   uptake not possible
EX_4HPRO(e)              EX_4pyrdx(e)
Ex_carn[e]               Ex_crtn[e]
EX_cmp(e)                EX_gchola(e)
EX_crn(e)                EX_gthox(e)
EX_fol(e)                Ex_kynate[e]
EX_nac(e)                EX_oxa(e)
EX_phe_L(e)              EX_tchola(e)
EX_pnto_R(e)             EX_tdchola(e)
EX_ppa(e)                EX_thyox_L(e)
EX_sucr(e)               EX_urate(e)
EX_thr_L(e)              Ex_5hoxindoa[e]
EX_trp_L(e)              EX_gdchola(e)
a distinct transcript of AMACR with distinct biochemical properties, and single-nucleotide polymorphisms (SNPs) have been observed that are elevated in prostate
cancer compared to normal tissue [183, 184, 185].
Most models can grow at experimental growth rates
Cancer cell lines are known to be heterogeneous [160, 186, 187, 188]. One distinctive feature is the variability of doubling times of individual cell lines
[137, 177]. Large variations of minimal and maximal achievable growth rates were
observed across models. These differences were the consequence of the imposed
quantitative constraints (Supplementary Fig.
2.6). Yet, the heterogeneity with respect to growth rates supported that the quantitative metabolomic differences had successfully been translated into distinct solution
spaces of the generated models [116]. Of the 15 models that were infeasible when constrained to experimental growth (Table 2.8), two did not reach the in vivo
growth rates, whereas the remaining 13 models exceeded the experimentally reported
growth rates under the enforced quantitative constraints. For example,
the SK-MEL-28-2 model achieved a minimal growth rate (0.034 fmol/cell/hr, 20.5 hr) that
exceeded the experimental measurement (upper bound ub=0.023 fmol/cell/hr, 35.1 hr)
(Table 2.8). In silico growth of the ACHN-2 model agreed particularly well with the
experimental growth (in silico: max = 0.0206 fmol/cell/hr, min = 0.008 fmol/cell/hr
versus experimental: lb=0.020 fmol/cell/hr, ub=0.030 fmol/cell/hr). Additionally,
the UACC_257 model was limited to the experimental growth rate +/-20% (growth
rates max=0.0155, min=0.008; lb=0.0136, ub=0.0163). The ACHN-2 and UACC_257 models were good examples of how the models specifically predicted experimental growth rates as a consequence of the applied metabolomic constraints.
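The comparison between in silico and experimental growth can be sketched as follows. The conversion rate = ln(2) / doubling time is an assumption; it is consistent with the paired values quoted above (0.034 per hr and 20.5 hr) but is not stated explicitly in this section. Function names and labels are hypothetical, and the model values are taken from Table 2.8 and the text.

```python
import math


def growth_rate_from_doubling_time(hours):
    """Specific growth rate for a given doubling time (assumed relation ln 2 / t_d)."""
    return math.log(2) / hours


def growth_bounds(rate, tolerance=0.2):
    """Lower and upper bound as the experimental rate -/+ 20%."""
    return rate * (1 - tolerance), rate * (1 + tolerance)


def compare_to_experiment(min_growth, max_growth, lb, ub):
    """Place the achievable in silico growth range relative to the experimental bounds."""
    if max_growth < lb:
        return "below"        # model cannot reach the experimental growth rate
    if min_growth > ub:
        return "above"        # model is forced to grow faster than measured
    return "compatible"


print(round(growth_rate_from_doubling_time(20.5), 4))
print(compare_to_experiment(0.034, 5.479, 0.015, 0.023))   # SK-MEL-28-2
print(compare_to_experiment(0.008, 0.012, 0.021, 0.032))   # MCF7-2
print(compare_to_experiment(0.008, 0.0206, 0.020, 0.030))  # ACHN-2
```

A model classified as "below" or "above" becomes infeasible once the experimental bounds are imposed on the biomass reaction, which is the situation summarized in Table 2.8.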
Figure 2.6: Flux data mapped to the reconstruction shows variation between replicates although the samples cluster together.
Replicate models were more similar in ATP production than growth
Growth rates
of models of the same cell line could be very different (Supplementary Table 1). We
found no correlation (anti-correlation) between the ATP yield achieved by the models
and maximal in silico growth, as can be expected since the two are competing objectives (Supplementary Fig. 2.10). We compared the similarity of the replicate
Figure 2.7: ATP yield is not informative for the division of OxPhos models (blue
and red).
Figure 2.8: Phase plane analysis revealed distinct solution spaces for variation of
nutrients and oxygen among the NCI-60 models. This distinction was performed
through visual inspection, however the transitions between the depicted examples
were rather fluid.
models with regard to the maximum growth rate predicted for each model. Ordering the models accordingly placed six pairs of cell line models in directly consecutive
order. Similarly, using the ATP yield per unit of glucose, 16 models appeared in
directly consecutive order. ATP yield was therefore the stronger binding factor for
models generated from duplicate samples.
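The replicate comparison above amounts to ranking the models by one quantity and counting replicates that end up next to each other. A minimal sketch, assuming a hypothetical function name and made-up values for four models:

```python
def adjacent_replicate_pairs(values, cell_line_of):
    """Pairs of models from the same cell line that are direct neighbours in the ranking."""
    ranked = sorted(values, key=values.get)
    return [
        (a, b)
        for a, b in zip(ranked, ranked[1:])
        if cell_line_of[a] == cell_line_of[b]
    ]


cell_line_of = {
    "ACHN": "ACHN", "ACHN-2": "ACHN",
    "SK-MEL-28": "SK-MEL-28", "SK-MEL-28-2": "SK-MEL-28",
}

# Hypothetical values, for illustration only.
max_growth = {"ACHN": 0.03, "ACHN-2": 0.9, "SK-MEL-28": 0.5, "SK-MEL-28-2": 0.6}
atp_yield = {"ACHN": 2.1, "ACHN-2": 2.2, "SK-MEL-28": 1.0, "SK-MEL-28-2": 1.1}

print(adjacent_replicate_pairs(max_growth, cell_line_of))
print(adjacent_replicate_pairs(atp_yield, cell_line_of))
```

In this toy case the ATP yield ranking keeps both replicate pairs together, while the growth ranking separates the two ACHN models.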
Considering flux through both ATP producing reactions in glycolysis (PGK and
pyruvate kinase) against ATP production in the electron transport chain (ETC) converted five OxPhos models into glycolytic models (Supplementary Fig. 2.11).
When looking for the reaction with the highest contribution, the fact that glycolysis has
two steps which produce ATP was neglected. We therefore checked what impact the summation of the contributions of the two reactions had on the ranked ATP yield plot, and
hence on the classification into glycolytic and OxPhos phenotypes. As a result, five models switched from the OxPhos to the glycolysis phenotype (NCI-H522, both SF-295 and both
SNB-19 models). Both NCI-H522 models thereby became glycolysis models;
the four others were CNS models, and the only glycolytic CNS models.
Figure 2.9: Highest or lowest numbers of KOs were not associated with any phenotype defined by the previous analysis. The higher (dark blue) and the lower group
(light blue) were defined as mean+/-2STD. Average was 132 KO genes (+/-14).
Figure 2.10: ATP yield does not correlate with maximal growth rate of the models.
Figure 2.11: Metabolic strategies considering both ATP producing glycolysis reactions. Five models switch to a higher collective contribution of glycolysis compared
to ATP synthetase contribution to total ATP producing flux.
Glycolysis, TCA cycle and ETC together were the major sources of ATP
Cells can produce ATP in many different ways. Altogether, 26 reactions were identified that could contribute to the total amount of ATP produced. Among those,
the reactions in glycolysis, the TCA cycle, and ATP synthase were the major ATP
producers in each of the models. Glycolysis contributed between 7.8% and 86.6%
of total ATP production per model. Equally, the contribution of ATP synthase varied
between 4.7% and 68.1%. The combined contribution of glycolysis and ATP synthase
ranged between 73.0% and 97.5%. Including the contribution of succinate-CoA ligase (0.4%-10.1% of total ATP production), the three pathways contributed 81.1% to
98.1% of the total amount of ATP per model. Although the combined contribution
of glycolysis, TCA cycle and ATP synthase was very high in all models, the
fraction contributed by either glycolysis or ATP synthase could be very different.
Distinction of the clusters derived from phase plane analysis
Most but not all cells depend on exogenous sources of glutamine for nucleotide and
hexosamine biosynthesis [189], whereas other cancer cells depend on constant glucose supply [160]. Accordingly, we classified the models into distinct clusters based on their
dependencies on uptake of glucose, glutamine, and oxygen, as well as lactate
secretion. Cluster 1 (n=56) was characterized by the requirement for high oxygen
uptake and no dependency upon glutamine uptake or lactate secretion, yet subtypes
among the models of this cluster existed. Cluster 1B could only grow if lactate
secretion was limited, which coincided with the low utilization of succinate-CoA
ligase by these models. Cluster 1C was characterized by high glucose uptake and high lactate secretion (SF-295 models). Accordingly, these two models were most shifted
towards the glycolytic phenotype (Fig. 2.4). Cluster 2 (n=7) was characterized by
the need for sufficiently high uptake of oxygen, low glucose uptake and lactate secretion, as well as indifference with regard to glutamine uptake. Cluster 3 (n=6)
showed similar characteristics but with a larger overall solution space compared to
cluster 2. Cluster 4 models could use oxygen only to a limited extent. Three
subclusters were distinguished. Cluster 4A was characterized by an increasing oxygen
requirement as a consequence of increasing glucose uptake. This increase was also
indicated in cluster 4B, however the models belonging to this subcluster had a very
restricted solution space and operated only at very high glucose uptake and lactate secretion rates. In comparison, cluster 4A was limited to low lactate secretion
rates. Both subclusters were only able to grow with relatively low glutamine uptake.
Cluster 4 comprised all glycolytic models. However, within the subtypes of cluster
4, the glycolytic models of subtype 1 (as defined by flux split analysis) spread across
clusters 4A and 4C (Fig. 2.4), such that, based on this analysis, these two glycolytic
subtypes could not have been distinguished. The division into subtypes 4A-C did
not describe distinct clusters in the 3D visualization, or in other words with regard
to PGK, ATP synthetase or succinate-CoA ligase utilization by those models (Fig.
2.4). We observed an accumulation of melanoma models in cluster 4. However, not
the entire set of melanoma models was part of cluster 4. Two melanoma models
each were associated with clusters 5 and 6.
The models of cluster 5 (n=6) and cluster 6 (n=8) were independent of oxygen
uptake (provided uptake > 0 fmol/cell/h). Further, the models of these clusters
were largely unlimited with regard to glutamine or glucose uptake. Cluster 5 was distinguished from cluster 6 by its limited lactate secretion ability.
KO genes in the TCA cycle
Cancer cells are known to operate the TCA cycle differently [69, 150]. Robustness
of a model towards gene KO depends on the part of the TCA cycle that is used by the
individual model. KO genes appeared throughout the TCA cycle (Fig. 2.5). Except
for citrate synthase, all TCA cycle reactions were associated with one or more KO
genes. Herein, all KOs apart from those mentioned in the main text are discussed.
KO of malate dehydrogenase 2 (MDH2, Entrez gene ID:4191) and succinate-CoA ligase
(SUCLG1, Entrez gene ID:8802) led to a reduction in growth of a few models, but
did not terminate growth in any. Two genes associated with the pyruvate dehydrogenase complex, dihydrolipoamide S-acetyltransferase (DLAT, Entrez
gene ID:1737) and PDHB (Entrez gene ID:5162), had a growth reducing effect on one model only (ACHN,
KO was 93% of WT growth). Dihydrolipoamide dehydrogenase (DLD, Entrez gene
ID:1738) was associated with 12 reactions in the model and was a KO gene for all 120
models. Among those 12 reactions were the TCA cycle reaction alpha-ketoglutarate
dehydrogenase and pyruvate dehydrogenase. Further simulation of the impact of
constraining flux through each reaction individually revealed that the reaction behind the KO was 2-Oxoadipate:lipoamide 2-oxidoreductase (in lysine metabolism),
and that the KO had nothing to do with the enzyme function in the TCA cycle, pyruvate dehydrogenase or glycine cleavage. This demonstrates the potential of using Recon
as a knowledge base, since the effect of the KO of this gene could have falsely been
connected to its function in any of the other reactions. Succinate dehydrogenase
was a KO gene for 118 models, the two SK-OV-3 models being the exception. Fumarase was a KO gene in all 120 models. Additionally, three mitochondrial
metabolite transporters, for H2O, CoA and alpha-ketoglutarate/malate, affected a subset of models. The two models SF-539-2 and NCI-H226-2 were sensitive to KO of the mitochondrial water transport (AQP8, Entrez gene ID:343), along
with their requirement for complex I of the electron transport chain. CoA transport (SLC25A16, Entrez gene ID:8034) was slightly reduced in five models: SK-MEL-5, both SK-MEL-28 models and both HCC-2998 models. Finally, growth rates were reduced in SK-MEL-5 and SF-539-2 after KO of the gene associated with the alpha-ketoglutarate/malate transporter (SLC25A11, Entrez gene
ID:8402).
Further discussion of the four models with reductive TCA cycle flux
The four models were all glycolytic models, and all but MALME-3M-2 belonged
to the glycolytic subtype 1 (Supplemental Tab. 2.2). Additionally, they belonged to
cluster 4 of the phase plane analysis, distributing over subclusters 4A and 4B. Cluster 4 models could use oxygen only to a limited extent. Cluster 4A was characterized by an increasing oxygen requirement as a consequence of increasing glucose
uptake. This increase was also indicated in cluster 4B, however the models belonging to this subcluster had a very restricted solution space and operated only at very
high glucose uptake and lactate secretion rates. In comparison, cluster 4A was limited to low lactate secretion rates. Both subclusters were only able to grow with
relatively low glutamine uptake. Although all four models exceeded the mean number of KO genes (132 +/-14), they did not have the highest overall number of
KO genes.
Figure 2.12: ATP yields do not correspond to the separated clusters of models from
the Phase plane analysis
Table 2.7: Reactions added to the starting model.
Name              Formulas
5hoxindoatr       so4[c] + 5hoxindoa[e] <=> 5hoxindoa[c] + so4[e]
glyaldtr          na1[e] + glyald[e] <=> na1[c] + glyald[c]
PEPtr             hco3[e] + pep[c] <=> hco3[c] + pep[e]
gudactr           gudac[e] <=> gudac[c]
Lkynrtr           Lkynr[e] <=> Lkynr[c]
Lkynrtr2          phe_L[e] + Lkynr[c] -> phe_L[c] + Lkynr[e]
Lkynrtr3          leu_L[e] + Lkynr[c] -> leu_L[c] + Lkynr[e]
LikeBALAPAT1tc    2 na1[c] + cl[c] + cala[c] -> 2 na1[e] + cl[e] + cala[e]
LikeBALABETAtc    h[c] + cala[c] -> h[e] + cala[e]
CALAtr            cala[e] <=> cala[c]
CRTNtr            crtn[c] -> crtn[e]
KYNATEtr          akg[c] + kynate[e] <=> akg[e] + kynate[c]
KYNATEtr2         kynate[e] <=> kynate[c]
CITRtr            citr_L[e] <=> citr_L[c]
3ANTHRNtr         3hanthrn[e] <=> 3hanthrn[c]
SPMDtr            spmd[e] <=> spmd[c]
SPRMtr            sprm[e] <=> sprm[c]
SBT_Dtr           sbt_D[e] <=> sbt_D[c]
THYMDtr2          thymd[c] <=> thymd[e]
HKYNRtr           trp_L[c] + hLkynr[e] <=> hLkynr[c] + trp_L[e]
QULNtr            so4[c] + quln[e] <=> so4[e] + quln[c]
2PGtr             2pg[e] <=> 2pg[c]
4hbztr            4hbz[m] <=> 4hbz[e]
carntr            carn[e] <=> carn[c]
cholptr           cholp[e] <=> cholp[c]
cyst_Ltr          cyst_L[e] <=> cyst_L[c]
dcmptr            dcmp[e] <=> dcmp[c]
dhaptr            dhap[e] <=> dhap[c]
dmglytr           dmgly[e] <=> dmgly[c]
ethamptr          ethamp[e] <=> ethamp[c]
fumtr             fum[e] <=> fum[c]
g3pctr            g3pc[e] <=> g3pc[c]
glcurtr           glcur[e] <=> glcur[c]
hcys_Ltr          hcys_L[e] <=> hcys_L[c]
icittr            icit[e] <=> icit[c]
L2aadptr          L2aadp[e] <=> L2aadp[c]
phpyrtr           phpyr[e] <=> phpyr[c]
xantr             xan[e] <=> xan[c]
xmptr             xmp[e] <=> xmp[c]
xtsntr            xtsn[e] <=> xtsn[c]
3pgtr             3pg[e] <=> 3pg[c]
udpglcurtr        udpglcur[e] <=> udpglcur[c]
IMPtr             imp[e] <=> imp[c]
Ex_2pg[e]         2pg[e] <=>
Ex_4hbz[e]        4hbz[e] <=>
Ex_5hoxindoa[e]   5hoxindoa[e] <=>
Ex_cala[e]        cala[e] <=>
Ex_carn[e]        carn[e] <=>
Ex_cholp[e]       cholp[e] <=>
Ex_citr_L[e]      citr_L[e] <=>
Ex_crtn[e]        crtn[e] <=>
Ex_cyst_L[e]      cyst_L[e] <=>
Ex_dcmp[e]        dcmp[e] <=>
Ex_dhap[e]        dhap[e] <=>
Ex_dmgly[e]       dmgly[e] <=>
Ex_ethamp[e]      ethamp[e] <=>
Ex_fum[e]         fum[e] <=>
Ex_g3pc[e]        g3pc[e] <=>
Ex_glcur[e]       glcur[e] <=>
Ex_glyald[e]      glyald[e] <=>
Ex_gudac[e]       gudac[e] <=>
Ex_hcys_L[e]      hcys_L[e] <=>
Ex_icit[e]        icit[e] <=>
Ex_kynate[e]      kynate[e] <=>
Ex_L2aadp[e]      L2aadp[e] <=>
Ex_Lkynr[e]       Lkynr[e] <=>
Ex_pep[e]         pep[e] <=>
Ex_phpyr[e]       phpyr[e] <=>
Ex_quln[e]        quln[e] <=>
Ex_sbt_D[e]       sbt_D[e] <=>
Ex_spmd[e]        spmd[e] <=>
Ex_sprm[e]        sprm[e] <=>
Ex_xan[e]         xan[e] <=>
Ex_xmp[e]         xmp[e] <=>
Ex_xtsn[e]        xtsn[e] <=>
Ex_3pg[e]         3pg[e] <=>
Ex_3hanthrn[e]    3hanthrn[e] <=>
Ex_udpglcur[e]    udpglcur[e] <=>
Ex_hLkynr[e]      hLkynr[e] <=>
Table 2.8: Models that were infeasible when constrained to experimental growth (ub
+20%/ lb-20%).
cell line     max growth  min growth       lb     ub
OVCAR-8       4.462       3.473       0    0.022  0.033
OVCAR-5-2     4.052       0.716       0    0.01   0.015
SF-295        30.879      30.457      0    0.018  0.028
SF-295-2      33.785      20.833      0    0.018  0.028
CAKI-1        15.635      13.818      0    0.015  0.022
MDA-MB-435    0.019       0.008       0    0.02   0.03
NCI-H23       7.427       7.274       0    0.015  0.022
NCI-H322M-2   7.45        1.181       0    0.015  0.022
MALME-3M-2    5.939       2.945       0    0.016  0.024
MCF7-2        0.012       0.008       0    0.021  0.032
SF-539        11.05       10.316      0    0.016  0.024
OVCAR-3-2     7.898       7.223       0    0.015  0.022
SK-MEL-28-2   5.479       0.034       0    0.015  0.023
NCI-H226-2    10.131      8.177       0    0.009  0.013
SN12C         8.068       4.066       0    0.018  0.028
3 Prediction of intracellular metabolic
states from extracellular metabolomic
data
Metabolic models can provide a mechanistic framework to analyze information-rich
omics data sets, and are increasingly being used to investigate metabolic alterations in human diseases. An expression of the altered metabolic pathway utilization
is the selection of metabolites consumed and released by cells. However, methods
for the inference of intracellular metabolic states from extracellular measurements
in the context of metabolic models remain underdeveloped compared to methods for
other omics data. Herein, we describe a workflow for such an integrative analysis
with an emphasis on extracellular metabolomic data. We demonstrate, using the lymphoblastic leukemia cell lines Molt-4 and CCRF-CEM, how our methods can reveal
differences in cell metabolism. Our models explain metabolite uptake and secretion
by predicting a more glycolytic phenotype for the CCRF-CEM model and a more
oxidative phenotype for the Molt-4 model, which was supported by our experimental data. Gene expression analysis revealed altered expression of gene products at
key regulatory steps in those central metabolic pathways, and literature query emphasized the role of these genes in cancer metabolism. Moreover, in silico gene
knock-outs identified unique control points for each cell line model, e.g., phosphoglycerate dehydrogenase for the Molt-4 model. Thus, our workflow is well-suited to
the characterization of cellular metabolic traits based on extracellular metabolomic
data, and it allows the integration of multiple omics data sets into a cohesive picture
based on a defined model context.
3.1 Introduction
Modern high-throughput techniques have increased the pace of biological data generation. Also referred to as the “omics avalanche”, this wealth of data provides
great opportunities for metabolic discovery. Omics data sets contain a snapshot of
almost the entire repertoire of mRNA, protein, or metabolites at a given time point
or under a particular set of experimental conditions. Because of the high complexity
of the data sets, computational modeling is essential for their integrative analysis.
Currently, such data analysis is a bottleneck in the research process and methods
are needed to facilitate the use of these data sets, e.g., through meta-analysis of data
available in public databases (e.g., the human protein atlas [190] or the gene expression omnibus [191]), and to increase the accessibility of valuable information for
the biomedical research community.
Constraint-based modeling and analysis (COBRA) is a computational approach
that has been successfully used to investigate and engineer microbial metabolism
through the prediction of steady-states [192]. The basis of COBRA is network reconstruction: networks are assembled in a bottom-up fashion based on genomic
data and extensive organism-specific information from the literature. Metabolic reconstructions capture information on the known biochemical transformations taking
place in a target organism to generate a biochemical, genetic and genomic (BIGG)
knowledge base [10]. Once assembled, a metabolic reconstruction can be converted
into a mathematical model [9], and model properties can be interrogated using a
great variety of methods [15]. The ability of COBRA models to represent genotype-phenotype and environment-phenotype relationships arises through the imposition
of constraints, which limit the system to a subset of possible network states [16].
Currently, COBRA models exist for more than 100 organisms, including humans
[45, 53].
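The steady-state assumption and the imposed constraints are commonly written as a linear program (flux balance analysis). The formulation below is the generic textbook form rather than a statement taken from this thesis; $S$ (stoichiometric matrix), $v$ (flux vector), $c$ (objective weights) and $l$, $u$ (lower and upper flux bounds) are the usual symbols and are introduced here for illustration.

```latex
\begin{aligned}
\max_{v} \quad & c^{T} v \\
\text{s.t.} \quad & S\,v = 0, \\
& l \le v \le u .
\end{aligned}
```

Condition- or cell-type-specific information enters through $l$ and $u$: tightening the bounds of exchange reactions to measured uptake and secretion rates, for example, shrinks the set of flux vectors $v$ that satisfy all constraints.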
Since the first human metabolic reconstruction was described (Recon 1 [45]), biomedical applications of COBRA have increased [50]. One way to contextualize networks is to define their system boundaries according to the metabolic states of the
system e.g., disease or dietary regimes. The consequences of the applied constraints
can then be assessed for the entire network [57]. Additionally, omics data sets have
frequently been used to generate cell-type or condition-specific metabolic models.
Models exist for specific cell types, such as enterocytes [57], macrophages[21], and
adipocytes [144], and even multi-cell assemblies that represent the interactions of
brain cells [49]. All of these cell type specific models, except the enterocyte reconstruction were generated based on omics data sets. Cell-type-specific models have
been used to study diverse human disease conditions. For example, an adipocyte
models was generated using transciptomic, proteomic, and metabolomics data. This
model was subsequently used to investigate metabolic alternations in adipocytes that
would allow for the stratification of obese patients [144]. One highly active field
within the biomedical applications of COBRA is cancer metabolism [51]. Omicsdriven large-scale models have been used to predict drug targets [24, 51]. A cancer
model was generated using multiple gene expression data sets and subsequently
used to predict synthetic lethal gene pairs as potential drug targets selective for the
cancer model, but non-toxic to the global model (Recon 1), a consequence of the
reduced redundancy in the cancer-specific model [24]. In a follow-up study, lethal
synergy between FH and enzymes of the heme metabolic pathway was experimentally validated, which resolved the mechanism by which FH-deficient cells, e.g.,
renal-cell cancer cells, survive a non-functional TCA cycle [77]. Contextualized
models, which contain only the subset of reactions active in a particular cell or
tissue (or cell-) type, can be generated in different ways [51, 131]. However, the
existing algorithms mainly consider gene expression and proteomic data to define
the reaction sets that comprise the contextualized metabolic models. These subsets
of reactions are usually defined based on the expression or absence of expression
of the genes or proteins (present and absent calls), or inferred from expression values or differential gene expression. Comprehensive reviews of the methods are
available [129, 193]. Only the compilation of a large set of omics data sets can
result in a tissue (or cell-type) specific metabolic model, whereas the representation of one particular experimental condition is achieved through the integration
of an omics data set generated from one experiment only (condition-specific cell line
model). Recently, metabolomic data sets have become more comprehensive and using these data sets allows direct determination of the metabolic network components
(the metabolites). Additionally, metabolomics has proven to be stable, relatively inexpensive, and highly reproducible [113]. These factors make metabolomic data
sets particularly valuable for interrogation of metabolic phenotypes. Thus, the integration of these data sets is now an active field of research [117, 138, 194, 195].
Generally, metabolomic data can be incorporated into metabolic networks as qualitative, quantitative, and thermodynamic constraints [117, 134]. Mo et al. used
metabolites detected in the spent medium of yeast cells to determine intracellular
flux states through a sampling analysis [117], which allowed unbiased interrogation of the possible network states [25] and prediction of internal pathway use.
Such analyses have also been used to reveal the effects of enzymopathies on red
blood cells [17], to study effects of diet on diabetes [163] and to define macrophage
metabolic states [21]. This type of analysis is available as a function in the COBRA
toolbox [15].
In this study, we established a workflow for the generation and analysis of condition-specific cell line metabolic models that can facilitate the interpretation of metabolomic
data. Our modeling yields meaningful predictions regarding metabolic differences
between two lymphoblastic leukemia cell lines (Figure 3.1A).
3.2 Results
We set up a pipeline that could be used to infer intracellular metabolic states from
semi-quantitative data regarding metabolites exchanged between cells and their environment. Our pipeline combined the following four steps: data acquisition, data
analysis, metabolic modeling and experimental validation of the model predictions
(Figure 3.1A). We demonstrated the pipeline and its potential to predict metabolic alterations in diseases such as cancer based on two lymphoblastic
Figure 3.1: a. Combined experimental and computational pipeline to study human metabolism. Experimental work and omics data analysis steps precede computational modeling. Model predictions are validated based on targeted experimental data. Metabolomic and transcriptomic data are used for model refinement
and submodel extraction. Functional analysis methods are used to characterize the
metabolism of cell-line models and compare it to additional experimental data. The
validated models are subsequently used for the prediction of drug targets. b. Uptake and secretion pattern of model metabolites. All metabolite uptake and secretion
steps that were mapped during model generation are shown. Metabolite uptakes are
depicted on the left, and secreted metabolites are shown on the right. A number of
metabolite exchanges mapped to the model were unique to one cell line. Differences between cell lines were used to set quantitative constraints for the sampling
analysis. c. Statistics about the T-cell-specific network generation. d. Quantitative constraints. For the sampling analysis, an additional set of constraints was
imposed on the cell line specific models, emphasizing the differences in metabolite
uptake and secretion between cell lines. Higher uptake of a metabolite was allowed
in the model of the cell line that consumed more of the metabolite in vitro, whereas
the supply was restricted for the model with lower in vitro uptake.
leukemia cell lines. The resulting Molt-4 and CCRF-CEM condition-specific cell
line models were able to explain metabolite uptake and secretion by predicting the
distinct utilization of central metabolic pathways by the two cell lines. Whereas the
CCRF-CEM model more closely resembled a glycolytic phenotype, commonly referred to as the ‘Warburg’ phenotype, our predictions suggested a more respiratory phenotype for the
Molt-4 model. We found these predictions to be in agreement with measured gene
expression differences at key regulatory steps in the central metabolic pathways,
and they were also consistent with additional experimental data regarding the energy and redox states of the cells. After a brief discussion of the data generation
and analysis steps, the results derived from model generation and analysis will be
described in detail.
3.3 Pipeline for generation of condition-specific
metabolic cell line models
3.3.1 Generation of experimental data
We monitored the growth and viability of lymphoblastic leukemia cell lines in
serum-free medium (Figure 3.6). Multiple omics data sets were derived from these
cells. Extracellular metabolomics (exo-metabolomic) data, comprising measurements of the metabolites in the spent medium of the cell cultures [196], were collected along with transcriptomic data, and these data sets were used to construct the
models.
3.3.2 Analysis of experimental data
Data analysis included defining the sets of metabolites that were taken up or secreted (qualitatively for the generation of the models), and it included determining the quantitative differences in uptake and secretion between cell lines (Figure
3.1B). These differences were later used to define model constraints. The final sets
of metabolite exchanges that were used for model generation comprised the uptake
and secretion of 14 and 10 metabolites by both models, unique secretion of seven
and unique uptake of four metabolites by the CCRF-CEM model, and secretion of
one and uptake of one unique metabolite in Molt-4 cells (Figure 3.1B). Additionally, sets of genes treated as expressed and unexpressed (present and absent calls),
and groups of differentially expressed genes (DEGs) and alternatively spliced genes
(AS) were predicted by comparing expression in CCRF-CEM and Molt-4 cells (see
Methods section).
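The qualitative part of this analysis step can be sketched as follows. This is a minimal illustration, not the procedure used in this study: the function name, the relative-change threshold of 20%, and the metabolite levels are all hypothetical, and the actual criteria are those described in the Methods section.

```python
def classify_exchange(spent, fresh, threshold=0.2):
    """Label each metabolite as 'uptake', 'secretion', or 'unchanged'.

    spent, fresh: dicts mapping metabolite -> relative abundance in spent
    and fresh medium (arbitrary units).
    threshold: hypothetical minimum relative change counted as an exchange.
    """
    calls = {}
    for met in sorted(set(fresh) | set(spent)):
        fresh_level = fresh.get(met, 0.0)
        spent_level = spent.get(met, 0.0)
        if fresh_level == 0:
            # Not detected in fresh medium: any signal is treated as secretion.
            calls[met] = "secretion" if spent_level > 0 else "unchanged"
            continue
        change = (spent_level - fresh_level) / fresh_level
        if change <= -threshold:
            calls[met] = "uptake"
        elif change >= threshold:
            calls[met] = "secretion"
        else:
            calls[met] = "unchanged"
    return calls


# Made-up abundances for illustration only.
fresh_medium = {"glucose": 100.0, "glutamine": 50.0, "alanine": 10.0}
spent_medium = {"glucose": 60.0, "glutamine": 30.0, "alanine": 10.5, "lactate": 40.0}
calls = classify_exchange(spent_medium, fresh_medium)
print(calls)
```

Applying the same function to both cell lines and comparing the resulting sets would give the shared and unique exchanges of the kind summarized in Figure 3.1B.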
3.3.3 Generation of the condition-specific models
Model generation involves three steps: refinement of the global model, data mapping and submodel extraction. We added transport and exchange reactions for
metabolites that could not be transported between the extracellular space and the
cytosol (Table 3.2). Nutrient supply (for metabolite uptake) was restricted to the
RPMI medium composition.
First, the detected metabolite uptakes and secretions for each cell line were mapped
separately to the model. The model was thereby constrained to represent a minimal
set of metabolite exchange reactions required to support all of the observed metabolite uptakes and secretions and to explain the experimentally observed growth rates
of the cells (Figure 3.1B, see Methods). The result was a vast reduction of the
number of possible metabolite uptakes and secretions in the two preliminary models (Figure 3.1C), which placed major emphasis on the experimentally observed
metabolite uptake and secretion profiles.
In addition to the (qualitative) exo-metabolomic constraints, genomic data were
mapped to the preliminary models. In general, the mapping of transcriptomic data,
which meant the deletion of all reactions associated with the set of absent genes,
and which was performed after the integration of the exo-metabolomic data, did not
prevent either model from representing the detected metabolite uptake, metabolite secretion, or biomass production. Curation beyond the initial definition of the
minimal sets of mandatory exchanges was therefore not necessary.
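The mapping of absent calls to reactions can be illustrated with a small sketch. It assumes that gene-reaction associations are written as Boolean rules with "and" (all subunits required) and "or" (isozymes), which is the usual convention in COBRA models; the helper names and the toy rules below are hypothetical and are not taken from the global model.

```python
def gpr_is_active(rule, absent_genes):
    """Evaluate a gene-protein-reaction rule with absent genes set to False."""
    if not rule.strip():
        return True  # reactions without a gene association are kept
    tokens = rule.replace("(", " ( ").replace(")", " ) ").split()
    expr = []
    for tok in tokens:
        if tok in ("and", "or", "(", ")"):
            expr.append(tok)
        else:
            # Every gene identifier is replaced by a Boolean literal.
            expr.append(str(tok not in absent_genes))
    # Only Boolean literals, operators, and parentheses remain in the string.
    return eval(" ".join(expr), {"__builtins__": {}})


def prune_reactions(rules, absent_genes):
    """Return the names of reactions that survive the absent calls."""
    return [rxn for rxn, rule in rules.items() if gpr_is_active(rule, absent_genes)]


# Toy rules for illustration only (Entrez gene IDs as strings).
toy_rules = {
    "NADH2_u10m": "4709 and 4723",  # complex: every subunit is required
    "PYK": "5315 or 5313",          # isozymes: one expressed gene suffices
    "EX_glc": "",                   # exchange reaction without genes
}
kept = prune_reactions(toy_rules, absent_genes={"4709"})
print(kept)
```

In this toy case the complex is removed because one mandatory subunit is absent, whereas the isozyme-catalyzed reaction and the gene-free exchange reaction are retained.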
Subsequently, the condition-specific CCRF-CEM and Molt-4 models were extracted
through network pruning. Model reactions unable to support flux were identified
through flux variability analysis (FVA) and removed, leaving the functional reaction
sets to compose the final Molt-4 and CCRF-CEM models.
3.3.4 Condition-specific metabolic models for CCRF-CEM and Molt-4
cells
To determine whether we had obtained two distinct models, we evaluated the reactions, metabolites, and genes of the two models. Both the Molt-4 and CCRF-CEM
models contained approximately half of the reactions and metabolites present in
the global model (Figure 3.1C). They were very similar to each other in terms of
their reactions, metabolites, and genes. The Molt-4 model contained seven reactions that were not present in the CCRF-CEM model (Co-A biosynthesis pathway
and exchange reactions). In contrast, the CCRF-CEM model contained 31 unique reactions (arginine and proline metabolism, vitamin B6 metabolism, fatty acid activation, transport, and exchange reactions). There were two and 15 unique metabolites
in the Molt-4 and CCRF-CEM models, respectively. Approximately three quarters of the global model genes remained in the condition-specific cell line models
(Figure 3.1C). The Molt-4 model contained 15 unique genes, and the CCRF-CEM
model had four unique genes. Both models lacked NADH dehydrogenase (complex
I of the electron transport chain (ETC)), which was determined by the absence of expression of a mandatory subunit (NDUFB3, Entrez gene ID 4709). Rather, the ETC
was fueled by FADH2 originating from succinate dehydrogenase and from fatty acid
oxidation, which through flavoprotein electron transfer could contribute to the same
ubiquinone pool as complex I and complex II (succinate dehydrogenase). Despite
their different in vitro growth rates (which differed by 11%, see methods) and differences in exo-metabolomic data (Figure 3.1B) and transcriptomic data, the internal
networks were largely conserved in the two condition-specific cell line models.
3.3.5 Condition-specific cell line models predict distinct metabolic
strategies
Despite the overall similarity of the metabolic models, differences in their cellular
uptake and secretion patterns suggested distinct metabolic states in the two cell lines
(Figure 3.1B and see Methods section for more detail). To interrogate the metabolic
differences, we sampled the solution space of each model using an Artificial Centering Hit-and-Run (ACHR) sampler [163]. For this analysis, additional constraints
were applied, emphasizing the quantitative differences in metabolites that were taken up or
secreted by both cell lines. The maximum possible uptake and maximum possible secretion flux rates were reduced according to the measured relative differences between
the cell lines (Figure 3.1D, see Methods section).
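One way to picture these quantitative constraints is sketched below. The linear scaling by the ratio of the measured changes is an assumption made for illustration, and the function name and numbers are hypothetical; the exact rule applied to the models is given in the Methods section.

```python
def scale_exchange_bounds(max_flux, change_a, change_b):
    """Return (bound_a, bound_b) for a metabolite exchanged by both cell lines.

    max_flux: largest flux magnitude allowed for this exchange.
    change_a, change_b: measured magnitudes of the change in the spent
    medium of cell lines A and B (same units, both positive).
    The line with the larger measured change keeps max_flux; the bound of
    the other line is reduced in proportion to the measured difference.
    """
    if change_a <= 0 or change_b <= 0:
        raise ValueError("measured changes must be positive")
    if change_a >= change_b:
        return max_flux, max_flux * change_b / change_a
    return max_flux * change_a / change_b, max_flux


# Made-up example: line A consumed 35% more of a metabolite than line B.
bound_a, bound_b = scale_exchange_bounds(10.0, change_a=1.35, change_b=1.0)
print(bound_a, round(bound_b, 2))
```

The same helper applies to secretion, since only magnitudes are compared; the sign convention of the exchange reaction would be handled when the bounds are written to the model.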
We plotted the number of sample points containing a particular flux rate for each
reaction. The resulting binned histograms can be understood as representing the
probability that a particular reaction can have a certain flux value. A comparison
of the sample points obtained for the Molt-4 and CCRF-CEM models revealed a
considerable shift in the distributions, suggesting a higher utilization of glycolysis
by the CCRF-CEM model (Figure 3.2). This result was further supported by differences in medians calculated from sampling points. The shift persisted throughout
all reactions of the pathway and was induced by the higher glucose uptake (35%)
from the extracellular medium in CCRF-CEM cells. The sampling median for glucose uptake was 34% higher in the CCRF-CEM model than in Molt-4 model (Figure
3.2).
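The comparison of sampled flux distributions can be sketched in a few lines. The sampler itself is not reproduced here; the snippet only shows how sample points for one reaction could be binned and how a relative difference in medians could be computed. The function names and the flux values are made up for illustration.

```python
import statistics


def histogram(samples, n_bins, lo, hi):
    """Count samples per equal-width bin on [lo, hi]; values outside are ignored."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        if x < lo or x > hi:
            continue
        idx = min(int((x - lo) / width), n_bins - 1)  # x == hi goes to the last bin
        counts[idx] += 1
    return counts


def median_shift_percent(samples_a, samples_b):
    """Percent by which the median of samples_a exceeds that of samples_b."""
    med_a = statistics.median(samples_a)
    med_b = statistics.median(samples_b)
    return 100.0 * (med_a - med_b) / abs(med_b)


# Made-up sampled fluxes of one reaction in two models (arbitrary units).
flux_model_a = [48.0, 52.5, 54.7, 57.1, 60.3]
flux_model_b = [30.2, 33.9, 36.1, 38.4, 41.0]

print(histogram(flux_model_a, 3, 30.0, 60.0))
print(round(median_shift_percent(flux_model_a, flux_model_b), 1))
```

A shift of the whole histogram, rather than of a few sample points, is what indicates that a reaction is used differently by the two models.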
Figure 3.2: Histograms of sampling points were different between the CCRF-CEM
model (red) and the Molt-4 model (blue) for 10 glycolysis reactions. Negative values
in the histograms and the table describe reaction fluxes in the reverse direction of
reversible reactions. The table provides the median values of the sampling results.
The usage of the TCA cycle was also distinct in the two condition-specific cell-line models (Figure 3.3). Interestingly, the models used succinate dehydrogenase
differently (Figure 3.3, Figure 3.4). The Molt-4 model utilized an associated reaction to generate FADH2, whereas in the CCRF-CEM model, the histogram was
shifted in the opposite direction, toward the generation of succinate. Additionally,
there was a higher efflux of citrate toward amino acid and lipid metabolism in the
CCRF-CEM model (Figure 3.3). There was higher flux through anaplerotic and
cataplerotic reactions in the CCRF-CEM model than in the Molt-4 model (Figure
3.3); these reactions include the efflux of citrate through ATP-citrate lyase, uptake
of glutamine, generation of glutamate from glutamine, transamination of pyruvate
and glutamate to alanine and to 2-oxoglutarate, secretion of nitrogen, and secretion
of alanine. The Molt-4 model showed higher utilization of oxidative phosphorylation (Figure 3.4), again supported by elevated median flux through ATP synthase
(36%) and other enzymes, which contributed to higher oxidative metabolism. The
sampling analysis therefore revealed different usage of central metabolic pathways
by the condition-specific models.
3.4 Experimental validation of energy and redox
status of CCRF-CEM and Molt-4 cells
Cancer cells have to balance their needs for energy and biosynthetic precursors, and
they have to maintain redox homeostasis to proliferate [61]. We conducted enzymatic assays of cell lysates to measure levels and/or ratios of ATP, NADPH + NADP,
NADH + NAD, and glutathione. These measurements were used to provide support
for the in silico predicted metabolic differences (Figure 3.5). Additionally, an Oxygen Radical Absorbance Capacity (ORAC) assay was used to evaluate the cellular
antioxidant status (Figure 3.5B). Total concentrations of NADH + NAD, GSH +
GSSG, NADPH + NADP and ATP, were higher in Molt-4 cells (Figure 3.5A). The
higher ATP concentration in Molt-4 cells could either result from high production
rates, or intracellular accumulation connected to high or low reaction fluxes (Figure
3.5A). Our simplified view that the oxidative Molt-4 cells produce less ATP was contradicted by the higher ATP concentrations measured (Figure 3.5L). Yet we want
to emphasize that concentrations cannot be compared to flux values, since we are
modeling at steady-state. NADH/NAD+ ratios for both cell lines were shifted toward NADH (Figure 3.5D-E), but the shift toward NADH was more pronounced in
CCRF-CEM (Figure 3.5E), which matched our expectation based on the higher utilization of glycolysis and 2-oxoglutarate dehydrogenase in the CCRF-CEM model
(Figure 3.5L).
The mitochondrial membrane has been suggested to be the quantitatively most important physiological source of superoxide in higher organisms [197]. If the Molt-4
cells were relying more on mitochondrial respiration, we expected them to counteract the increased oxidative stress by using antioxidant systems such as glutathione
and NADPH (Figure 3.5L). Indeed, Molt-4 cells showed a higher capacity for reactive oxygen species (ROS) detoxification than CCRF-CEM cells (Figure 3.5B),
which was supported by the higher utilization of oxidative phosphorylation and
superoxide dismutase by the Molt-4 model (SPODM, median CCRF-CEM = 0.0010
U, and Molt-4 = 0.0011 U) (Figure 3.5L). Reduced glutathione (GSH) is of major
importance for the clearance of ROS [198]. GSH/GSSG ratios were shifted toward
GSH in both cell lines (CCRF-CEM = 747:51, Molt-4 = 1182:56), and the shift was
more pronounced in Molt-4 cells (Figure 3.5K).
Both cell lines had low NADPH/NADP+ ratios (CCRF-CEM: 4.7:2.8, Molt-4: 6:11.5).
However, in Molt-4 cells, the ratio was shifted toward NADP+, whereas CCRF-CEM cells contained higher amounts of NADPH (Figure 3.5G-H). This matched our
expectation that the glycolytic CCRF-CEM model would produce more NADPH
(Figure 3.5L) and that it would exhibit higher flux through the oxidative phase of
the pentose phosphate pathway. Taken together, the experimental data agreed well
with our expectations based on the predicted phenotypes. We sought additional
support for the predicted metabolic differences in the transcriptomic data.
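The reported pairs of values can be turned into single ratios to make the comparison between the cell lines explicit. The snippet below only re-expresses the numbers quoted above; the helper and variable names are hypothetical.

```python
def ratio(reduced, oxidized):
    """Ratio of the reduced to the oxidized form of a redox couple."""
    return reduced / oxidized


# (reduced, oxidized) pairs as quoted in the text.
measurements = {
    "GSH/GSSG": {"CCRF-CEM": (747, 51), "Molt-4": (1182, 56)},
    "NADPH/NADP+": {"CCRF-CEM": (4.7, 2.8), "Molt-4": (6, 11.5)},
}

ratios = {
    couple: {line: round(ratio(*pair), 2) for line, pair in lines.items()}
    for couple, lines in measurements.items()
}
for couple, lines in ratios.items():
    print(couple, lines)
```

Expressed this way, the GSH/GSSG ratio is roughly 14.6 in CCRF-CEM and 21.1 in Molt-4 cells, whereas the NADPH/NADP+ ratio is roughly 1.7 in CCRF-CEM and 0.5 in Molt-4 cells, which matches the direction of the shifts described above.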
3.5 Comparison of network utilization and alteration
in gene expression
With the assumption that differential expression of particular genes would cause reaction flux changes, we determined how the differences in gene expression (between
CCRF-CEM and Molt-4) compared to the flux differences observed in the models.
Specifically, we checked whether the reactions associated with genes upregulated
(significantly more expressed in CCRF-CEM cells compared to Molt-4 cells) were
indeed more utilized by the CCRF-CEM model, and we checked whether downregulated genes were associated with reactions more utilized by the Molt-4 model.
The set of 12 downregulated genes was associated with 15 reactions, and the set of
49 upregulated genes was associated with 113 reactions in the models. Reactions
were defined as differently utilized if the difference in flux exceeded 10% (considering only non-loop reactions). Of the reactions associated with upregulated genes,
72.57% were more utilized by the CCRF-CEM model, and 2.65% were more utilized by the Molt-4 model (Table 3.3). In contrast, all 15 reactions associated with
the 12 downregulated genes were more utilized in the CCRF-CEM model. After this
initial analysis, we approached the question from a different angle, asking whether
the majority of the reactions associated with each individual gene upregulated in
CCRF-CEM were more utilized by the CCRF-CEM model. We found that this
was the case for 77.55% of the upregulated genes. The majority of reactions associated with two (16.67%) downregulated genes were more utilized by the Molt-4
model. Taken together, our comparisons of the direction of gene expression with
the fluxes of the two cancer cell-line models confirmed that reactions associated
with upregulated genes in the CCRF-CEM cells were generally more utilized by the
CCRF-CEM model.
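The reaction-level comparison described in this section can be sketched as follows. The text states only that a flux difference above 10% counted as differential utilization; taking the larger of the two median magnitudes as the reference for that percentage is an assumption of this sketch, and the function names and median pairs are illustrative.

```python
def utilization_call(median_a, median_b, cutoff=0.10):
    """Return 'A', 'B', or 'similar' for one reaction.

    median_a, median_b: median sampled flux in models A and B.
    cutoff: relative difference above which a reaction counts as
    differently utilized (reference: the larger magnitude, an assumption).
    """
    a, b = abs(median_a), abs(median_b)
    reference = max(a, b)
    if reference == 0 or abs(a - b) / reference <= cutoff:
        return "similar"
    return "A" if a > b else "B"


def fraction_more_utilized(medians, model="A", cutoff=0.10):
    """Fraction of reactions that are more utilized by the given model."""
    calls = [utilization_call(a, b, cutoff) for a, b in medians.values()]
    return calls.count(model) / len(calls)


# Illustrative median pairs (model A first, model B second).
example_medians = {
    "PYK": (54.746, 36.115),
    "FBA": (18.800, 12.898),
    "PDHm": (0.162, 0.351),
    "ALCD21_L": (128.372, 129.365),
}
print({rxn: utilization_call(a, b) for rxn, (a, b) in example_medians.items()})
print(fraction_more_utilized(example_medians, model="A"))
```

Restricting the input dictionary to reactions associated with upregulated or downregulated genes would give percentages of the kind reported above.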
3.6 Accumulation of DEGs and AS genes at key
metabolic steps
After we confirmed that most reactions associated with upregulated genes were
more utilized by the CCRF-CEM model, we checked the locations of differentially
expressed genes within the network. In this analysis, we paid special attention to
the central metabolic pathways that we had found to be distinctively utilized by the
two models. Several differentially expressed genes (DEGs) and alternative splicing
(AS) events were associated with glycolysis, the ETC, pyruvate metabolism, and
the pentose phosphate pathway (PPP) (Table 3.1).
Moreover, in glycolysis, the DEGs and/or AS genes were associated with all three
rate-limiting steps, i.e., the steps mediated by hexokinase, pyruvate kinase, and
phosphofructokinase. Of these key enzymes, hexokinase 1 (Entrez Gene ID: 3098)
was alternatively spliced, and pyruvate kinase (PKM, Entrez gene ID: 5315) was significantly more expressed in the CCRF-CEM cells (Table 3.1), in agreement with the
higher in silico predicted flux. However, in contrast to the observed higher utilization of glycolysis in the CCRF-CEM model, we found that the gene associated with
the rate-limiting glycolysis step, phosphofructokinase (Entrez Gene ID: 5213), was
significantly upregulated in Molt-4 cells relative to CCRF-CEM cells. This higher
expression was detected for only a single isozyme, however. Two of the three genes
associated with phosphofructokinase were also subject to alternative splicing (Table
3.1). In addition to the key enzymes, fructose bisphosphate aldolase (Entrez Gene
ID: 230) was also significantly upregulated in Molt-4 cells relative to CCRF-CEM
cells, which was in contrast to the predicted higher utilization of glycolysis in the
CCRF-CEM model.
Additionally, glucose-6P-dehydrogenase (G6PD), which catalyzes the first reaction
and commitment step of the pentose phosphate pathway (PPP), was an AS gene
(Table 3.1). A second AS gene associated with the PPP reaction of the deoxyribokinase was RBKS (Entrez Gene ID: 64080). This gene is also associated with
ribokinase, but ribokinase was removed during model construction because of the
lack of ribose uptake or secretion. Single AS genes were associated with different
complexes of the ETC (Table 3.1). A literature query revealed that at least 13 genes
associated with alternative splicing events were mentioned previously in connection
with both alternative splicing and cancer, and 37 genes were associated with cancer,
e.g., upregulated, downregulated at the level of mRNA or protein, or otherwise connected to cancer metabolism and signaling. One general observation was that there
was a surprising accumulation of metabolite transporters among the alternatively
spliced genes. Overall, the high incidence of differential gene expression events at
metabolic control points increases the plausibility of the in silico predictions.
Table 3.1: Differentially expressed genes (DEGs) and alternative splicing (AS) events of central metabolic and cancer-related pathways. Full lists of DEGs and AS are provided in the supplementary material. Upregulated = significantly more expressed in CCRF-CEM compared to Molt-4 cells. PPP = pentose phosphate pathway. Glycolysis/Glucon. = glycolysis/gluconeogenesis. Pyruvate Met. = pyruvate metabolism. OxPhos = oxidative phosphorylation.
DEG associated reactions | AS Entrez Gene ID | Subsystem | Entrez Gene ID | Direction of change | Median CCRF-CEM | Median Molt-4
ALDD2xm |  | Glycolysis/Glucon. | 219 | upregulated | 0.051 | 0.040
FBA |  | Glycolysis/Glucon. | 230 | downregulated | 18.800 | 12.898
G3PD2m |  | Glycolysis/Glucon. | 2820 | upregulated | 0.191 | 0.068
PYK |  | Glycolysis/Glucon. | 5315 | upregulated | 54.746 | 36.115
ALDD2x | 8854 | Glycolysis/Glucon. | 223 | upregulated | 0.050 | 0.035
ALDD2y | 8854 | Glycolysis/Glucon. | 223 | upregulated | 0.052 | 0.039
G6PPer | 92579 | Glycolysis/Glucon. | 92579 | upregulated | 0.138 | 0.099
PDHm | 1737 | Glycolysis/Glucon. | 1737 | upregulated | 0.162 | 0.351
PFK | 5211;5213 | Glycolysis/Glucon. | 5213 | downregulated | 18.995 | 13.041
HEX1 | 3098 | Glycolysis/Glucon. |  |  | 9.835 | 6.217
PGK | 5230 | Glycolysis/Glucon. |  |  | -54.935 | -36.230
ALCD21_D | 284273 | Pyruvate Met. | 284273 | upregulated | 327.300 | 326.100
ALCD21_L | 284273 | Pyruvate Met. | 284273 | upregulated | 128.372 | 129.365
ALCD22_D | 284273 | Pyruvate Met. | 284273 | upregulated | 289.357 | 291.679
ALCD22_L | 284273 | Pyruvate Met. | 284273 | upregulated | 128.219 | 129.260
LCADi | 8854 | Pyruvate Met. | 223 | upregulated | 0.100 | 0.073
LCADi_D | 8854 | Pyruvate Met. | 223 | upregulated | 0.100 | 0.072
PCm | 5091 | Pyruvate Met. | 5091 | downregulated | 1.300 | 0.241
LALDD | 9380 | Pyruvate Met. |  |  | 345.473 | 338.276
ME2m | 10873 | Pyruvate Met. |  |  | 0.178 | 0.221
NADH2_u10m |  | OxPhos |  | upregulated |  | 
ATPS4m | 4905 | OxPhos |  |  | 2.455 | 3.825
CYOR_u10m | 1537 | OxPhos |  |  | 1.563 | 2.506
DRBK | 64080 | PPP |  |  | 0.196 | 0.146
G6PDH2r | 2539 | PPP |  |  | 0.109 | 0.125
3.7 Single gene deletion
Analyses of essential genes in metabolic models have been used to predict candidate
drug targets for cancer cells [24]. Here, we conducted an in silico gene deletion
study for all model genes to identify a unique set of knock-out (KO) genes for each
condition-specific cell line model. The analysis yielded 63 shared lethal KO genes
and distinct sets of KO genes for the CCRF-CEM model (11 genes) and the Molt-4
model (three genes). For three of the unique CCRF-CEM KO genes, the genes were
only present in the CCRF-CEM model (Table 3.4).
The essential genes for both models were then related to the cell-line-specific differences in metabolite uptake and secretion (Figure 3.1B). The CCRF-CEM model
needed to generate putrescine from ornithine (ORNDC, Entrez Gene ID: 4953) to
subsequently produce 5-methylthioadenosine for secretion (Figure 3.1B).
S-adenosylmethioninamine produced by adenosylmethionine decarboxylase (arginine and proline metabolism, associated with Entrez Gene ID: 262) is a substrate
required for generation of 5-methylthioadenosine. Another example of a KO gene
connected to an enforced exchange reaction was glutamic-oxaloacetic transaminase
1 (GOT1, Entrez Gene ID: 2805). Without GOT1, the CCRF-CEM model was
forced to secrete 4-hydroxyphenylpyruvate (Figure 3.1B), the second product of
tyrosine transaminase, which is produced only by that enzyme.
One KO gene in the Molt-4 model (Entrez Gene ID: 26227) was associated with
phosphoglycerate dehydrogenase, which catalyzes the conversion of 3-phospho-D-glycerate to 3-phosphohydroxypyruvate while generating NADH from NAD+. This
KO gene is particularly interesting, given the involvement of this reaction in a novel
pathway for ATP generation in rapidly proliferating cells [100, 148, 199]. Reactions
associated with unique KO genes were in many cases utilized more by the model
in which the gene KO was lethal, underlining the potential importance of these
reactions for the models. Thus, single gene deletion provided unique sets of lethal
genes that could be specifically targeted to kill these cells.
3.8 Discussion
In the current study, we explored the possibility of semi-quantitatively integrating
metabolomic data with the human genome-scale reconstruction to facilitate analysis. By constructing condition-specific cell line models to provide a structured
framework, we derived insights that could not have been obtained from data analysis alone.
We derived condition-specific cell line models for CCRF-CEM and Molt-4 cells
that were able to explain the observed exo-metabolomic differences (Figure 3.1B).
Despite the overall similarities between the models, the analysis revealed distinct
usage of central metabolic pathways (Figures 3.2-3.4), which we validated based on experimental data and differential gene expression. The additional data sufficiently supported metabolic differences in these cell lines, providing confidence in the generated models and the
model-based predictions. We used the validated models to predict unique sets of
lethal genes to identify weak links in each model. These weak links may represent
potential drug targets.
Integrating omics data with the human genome-scale reconstruction provides a structured framework (i.e., pathways) that is based on careful consideration of the available biochemical literature [9]. This network context can simplify omics data analysis, and it allows even non-biochemical experts to gain fast and comprehensive
insights into the metabolic aspects of omics data sets. Compared to transcriptomic
data, methods for the integration and analysis of metabolomic data in the context
of metabolic models are less well established, although it is an active field of research [194, 195]. In contrast to other studies, our approach emphasizes the representation of experimental conditions rather than the reconstruction of a generic,
cell-line-specific network, which would require the combination of data sets from
many experimental conditions and extensive manual curation. Rather, our way of
model construction allowed us to efficiently assess the metabolic characteristics of
cells. Although only a limited number of exchanged metabolites can be
measured by available metabolomics platforms at a reasonable time-scale, and
pathways of measured metabolites might still be unknown to date (Table 3.6
& 3.8), our methods have the potential to reveal metabolic characteristics of
cells that could be useful for biomedicine and personalized health. The reasons
why some cancers respond to certain treatments and not others remain unclear, and
choosing a treatment for a specific patient is often difficult [199]. One potential
application of our approach could be the characterization of cancer phenotypes to
explore how cancer cells or other cell types with particular metabolic characteristics
respond to drugs.
The generation of our condition-specific cell line models involved only limited manual curation, making this approach a fast way to place metabolomic data into a
network context. Model building mainly involves the rigid reduction of metabolite
exchanges to match the observed metabolite exchange pattern with as few additional
metabolite exchanges as possible. It should be noted that this reduction determines
which pathways can be utilized by the model. Our approach mostly conserved
the internal network redundancy. However, a more significant reduction may be
achieved using different data. Generally, a trade-off exists between the reduction of
the internal network and the increasing number of network gaps that need to be curated by using additional omics data, such as transcriptomics and proteomics. One
way to prevent the emergence of network gaps would be to use mapping algorithms
that conserve network functionality, such as GIMME [131]. However, several additional methods exist for the integration of transcriptomic data [129], and which
model-building method is best depends on the available data. Interestingly, the lack
of a significant contribution of our gene expression data to the reduction of network size suggests that the use of transcriptomic data is not necessary to identify
distinct metabolic strategies; rather, the integration of exo-metabolomic data alone
may provide sufficient insight. However, sampling of the cell line models constrained according to the exo-metabolomic profiles only, or increasing the cutoff for
the generation of absent and present calls (p<0.01), did not yield the same insights
as presented herein. Only recently, Gene Inactivation Moderated by Metabolism,
Metabolomics and Expression (GIM(3)E) became available, which enforces minimum turnover of detected metabolites based on intracellular metabolomics data as
well as gene expression microarray data [138]. In contrast to this approach, we
focused our analysis on the relative differences in the exo-metabolomic data of
two cell lines. GIM(3)E constitutes another integration method when the analysis
should focus on intracellular metabolomics data [138].
The metabolic differences predicted by the models are generally plausible. Cancers
are known to be heterogeneous [61], and the contribution of oxidative phosphorylation to cellular ATP production may vary [147]. Moreover, leukemia cell lines
have been shown to depend on glucose, glutamine, and fatty acids to varying extents to support proliferation. Such dependence may cause the cells to adapt their
metabolism to the environmental conditions [187]. In addition to identifying supporting data in the literature, we performed several analyses to validate the models
and model predictions. Our expectations regarding the levels and ratios of metabolites relevant to energy and redox state were largely met (Figure 3.5L). The more
pronounced shift of the NADH/NAD+ ratio toward NADH in the CCRF-CEM cells
was in agreement with the predicted Warburg phenotype (Figure 3.5), and the higher
lactate secretion in the CCRF-CEM cells (Figure S2) implies an increase in NADH
relative to NAD+ [200, 201], again matching the known Warburg phenotype.
ROS production is enhanced in certain types of cancer [198, 202], and the generation of ROS is thought to contribute to mutagenesis, tumor promotion, and tumor
progression [202, 203]. However, decreased mitochondrial glucose oxidation and a
transition to aerobic glycolysis protect cells against ROS damage during biosynthesis and cell division [204]. The higher ROS detoxification capability in Molt-4 cells,
in combination with higher spermidine dismutase utilization by the Molt-4 model
(Figure 3.5), provided a consistent picture of the predicted respiratory phenotype
(Figure 3.5L).
Control of NADPH maintains the redox potential through GSH and protects against
oxidative stress, yet changes in the NADPH ratio in response to oxidative damage
are not well understood [205]. Under stress conditions, as assumed for Molt-4 cells,
the NADPH/NADP+ ratio is expected to decrease because of the continuous reduction of GSSG (Figure 3.5L), and this was confirmed in the Molt-4 cells (Figure
3.5). The higher amounts of GSH found in Molt-4 cells in vitro may demonstrate
an additional need for ROS scavengers because of a greater reliance on oxidative
metabolism.
Cancer is related to metabolic reprogramming, which results from alterations of
gene expression and the expression of specific isoforms or splice forms to support
proliferation [206, 207]. The gene expression differences detected between the two
cell lines in the present study supported the existence of metabolic differences in
these cell lines, particularly because key steps of the metabolic pathways central to
cancer metabolism seemed to be differentially regulated (Table 3.1). The detailed
analysis of the respective differences on the pathway fluxes exceeds the scope of this
study, which was to demonstrate the potential of the integration of exo-metabolomic
data into the network context. We found discrepancies between differential gene
regulation and the flux differences between the two models as well as the utilization
of AS gene-associated reactions. This is not surprising, since analysis of the detailed
system is required to make any further assumptions on the impact that the differential regulation or splicing might have on the reaction flux, given that isozymes exist for many of
the concerned enzymes, or only one of multiple subunits of a protein
complex was concerned. Additionally, reaction fluxes are regulated by numerous
post-translational factors, e.g., protein modification and inhibition through proteins or
metabolites [208], which are out of the scope of constraint-based steady-state modeling. Rather, the results of the presented approach demonstrate how the models can be used to generate informed hypotheses that can guide
experimental work.
The combination of our tailored metabolic models and differential gene expression
analysis seems well-suited to determine the potential drivers involved in metabolic
differences between cells. Such information could be valuable for drug discovery, especially when more peripheral metabolic pathways are considered. Additionally, statistical comparisons of gene expression data with sampling-derived flux data
could be useful in future studies [144].
A single-gene-deletion analysis revealed that phosphoglycerate dehydrogenase
(PGDH) was a lethal KO gene for the Molt-4 model only. Differences in PGDH
protein levels correspond to the amount of glycolytic carbon diverted into glycine
biosynthesis. Rapidly proliferating cells may use an alternative glycolytic pathway
for ATP generation, which may provide an advantage in the case of extensive oxidative phosphorylation and proliferation [100, 148, 199]. For breast cancer cell lines,
variable dependency on the expression of PGDH has already been demonstrated
[148]. This example of a unique KO gene demonstrates how in silico gene deletion
in metabolomics-driven models can identify the metabolic pathways used by cancer
cells. This approach can provide valuable information for drug discovery.
In conclusion, our contextualization method produced metabolic models that agreed
in many ways with the validation data sets. The analyses described in this study have
great potential to reveal the mechanisms of metabolic reprogramming, not only in
cancer cells but also in other cells affected by diseases, and for drug discovery in
general.
3.9 Materials and methods
Global model
The model we used (global model) was a subset of Recon 2 [53], which is freely
available (http://humanmetabolism.org/). Transport and exchange reactions for
metabolites identified according to metabolite uptakes and secretions detected herein
were already considered in the construction of Recon 2. The model captured additional reactions (Table 3.2).
Cell culture
Molt-4 and CCRF-CEM cells were obtained from ATCC (CRL-1582 and CCL-119)
and routinely grown in RPMI 1640 with 2 mM GlutaMax and 10% FBS (Invitrogen; 61870-010, 10108-57) in a humidified incubator at 37 °C and 5% CO2. At least
3 days before experiments, the medium was changed to serum-free medium (Advanced RPMI 1640 containing 2 mM GlutaMax; Invitrogen; 12633-012, 35050-038). The medium was refreshed the day before starting the experiment. For experiments, cells were centrifuged at 201 × g for 5 min and resuspended in serum-free
medium containing DMSO (0.67%) at a cell concentration of 5 × 10^5 cells/mL. Cell suspension (1
or 2 mL) was seeded in triplicate in 24-well or 12-well plates, respectively. At the indicated times, the cells were removed by centrifugation and the
spent medium was frozen at −80 °C. Cell number, size and viability (Trypan blue exclusion) were obtained by counting cells using an automatic cell counter, Countess
(Invitrogen) (Figure 3.6).
Analysis of the extracellular metabolome
Mass spectrometry analysis of the exo-metabolome was performed by Metabolon,
Inc (Durham, NC, USA) using a standardized analytical platform. In total, 75 extracellular metabolites were detected in the initial data set for at least one of the
two cell lines [196]. Of these metabolites, 15 were not part of our global model
and were discarded. Apart from being absent in our global model, an independent
search in HMDB [209] revealed no pathway information was available for most of
these metabolites. It should be noted that metabolites such as N-acetylisoleucine, N-acetylmethionine or pseudouridine constitute protein and RNA degradation products, which were out of the scope of the metabolic network.
Thiamin (Vitamin B1) was part of the minimal medium of essential compounds
supplied to both models. Riboflavin (Vitamin B2) and Trehalose were excluded
since these compounds cannot be produced by human cells. Erythrose and fructose
were also excluded. In contrast, 46 metabolites were part of the global model.
The data set included two different time points, which allowed us to treat the increase/decrease of a metabolite signal between time points as evidence for uptake
or secretion when the change was greater than 5% from what was observed in the
control (Table 3.5, 3.6, 3.7, 3.8). We found 12 metabolites that were taken up by
both cell lines and 10 metabolites that were commonly secreted by both cell lines
over the course of the experiment. Additionally, Molt-4 cells took up three metabolites not taken up by CCRF-CEM cells, and secreted one metabolite not secreted
by CCRF-CEM cells. Two of the three metabolites taken up only by Molt-4 cells were essential
amino acids: valine and methionine. However, it is unlikely that these metabolites
were not taken up by the CCRF-CEM cells, so the CCRF-CEM model was allowed
to take them up. Because of this adjustment, no quantitative constraints
were applied for the sampling analysis either. CCRF-CEM cells had four unique
uptaken and seven unique secreted metabolites (exchange not detected in Molt-4
cells).
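The 5% rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fold change is taken as early signal divided by late signal, the cutoff is applied to the absolute difference between the cell and control fold changes, and the function name classify_exchange is hypothetical. Manual overrides, such as those applied to the essential amino acids, are not modeled.

```python
def classify_exchange(fc_cells, fc_medium, cutoff=0.05):
    """Classify a metabolite as taken up, secreted, or unchanged.

    fc_cells and fc_medium are fold changes (early / late signal) measured
    in the spent medium of the cells and in the cell-free control medium.
    Assumption: a signal that drops faster with cells than in the control
    indicates uptake; a signal that rises indicates secretion.
    """
    diff = fc_cells - fc_medium
    if abs(diff) <= cutoff:
        return "no change"
    return "uptake" if diff > 0 else "secretion"


print(classify_exchange(3.41, 0.97))
print(classify_exchange(0.06, 1.34))
```

Whether the cutoff should act on the difference or on the ratio of the two fold changes is a design choice that the text does not fix.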
Network refinement based on exo-metabolic data
Despite its comprehensiveness, the human metabolic reconstruction is not complete
with respect to extracellular metabolite transporters [53, 58]. Accordingly, we identified metabolite transport systems from the literature for metabolites that were already part of the global model, but whose extracellular transport was not yet accounted for. Diffusion reactions were included whenever a respective transporter
could not be identified. In total, 31 reactions (11 exchange reactions, 16 transport
reactions and seven demand reactions; Table 3.2) were added to Recon 2 [53], and
two additional reactions were added to the global model (Table 3.2).
Expression profiling
Molt-4 and CCRF-CEM cells were grown in advanced RPMI 1640 and 2 mM GlutaMax, and the cells were resuspended in medium containing DMSO (0.67%) at
a concentration of 5 × 10^5 cells/mL. The cell suspension (2 mL) was seeded in
12-well plates in triplicate. After 48 h of growth, the cells were collected by centrifugation at 201 ×g for 5 min. Cell pellets were snap-frozen in liquid N2 and kept
frozen until RNA extraction and analysis by Aros (Aarhus, Denmark).
Analysis of transcriptomic data
We used the Affymetrix GeneChip Human Exon 1.0 ST Array to measure whole
genome exon expression. We generated Detection above background (DABG) calls
using ROOT (version 22) and the XPS package for R (version 11.1), with Robust
Multi-array Analysis (RMA) summarization. Calls for data mapping were assigned
based on p < 0.05 as the cutoff probability to distinguish presence versus absence
for the 1,278 model genes (Table 3.9; Table S12, http://link.springer.com/article/
10.1007%2Fs11306-014-0721-3 for mapping of probe sets to model genes).
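As a minimal sketch of the call assignment described above, the following Python snippet turns DABG p-values into presence/absence calls at the p < 0.05 cutoff. The gene identifiers and p-values are hypothetical, and the sketch assumes one p-value per gene. How several probe sets per gene are combined is not addressed here.

```python
def assign_calls(dabg_p_values, cutoff=0.05):
    """Map each gene to "present" (p < cutoff) or "absent" (otherwise)."""
    return {gene: ("present" if p < cutoff else "absent")
            for gene, p in dabg_p_values.items()}


# Hypothetical example values, not taken from the data set.
calls = assign_calls({"geneA": 0.001, "geneB": 0.20, "geneC": 0.05})
print(calls)
```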
Differential gene expression and alternative splicing analysis were performed by using AltAnalyze software (v2.02beta) with default options on the raw data files (CEL
files). The Homo sapiens Ensembl 65 database was used, probe set filtering was
kept as DABG p < 0.05, and non-log expression < 70 was used for constitutive probe
sets to determine gene expression levels. For the comparison, CCRF-CEM was the
experimental group and Molt-4 was the baseline group. The set of differentially
expressed genes between cell lines was identified based on a p < 0.05 FDR cutoff
(Table 3.10, 3.11). Alternative splicing analysis was performed on core probe sets
with a minimum alternative exon score of 2 and a maximum absolute gene expression change of 3 because alternative splicing is a less critical factor among highly
differentially expressed genes.
Gene expression data, complete lists of DABG p-values, differentially expressed
genes and alternative splicing events have been deposited in the Gene Expression
Omnibus (GEO) database (accession number: GSE53123).
Deriving cell-type-specific subnetworks
Transcriptomic data were mapped to the model in a manual fashion (COBRA function: deleteModelGenes). Specifically, reactions dependent on gene products that
were called as “absent” were constrained to zero, such that fluxes through these reactions were disabled. Submodels were extracted based on the set of reactions carrying flux (network pruning) by running fastFVA [162] after mapping the metabolomic
and transcriptomic data using the COBRA toolbox [15].
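The two steps described above (closing reactions that depend on absent gene products, then keeping only reactions that can carry flux) can be illustrated with a small, self-contained Python sketch. It does not call the COBRA toolbox. The function names, the toy reactions and the flux ranges are hypothetical, and every gene listed for a reaction is treated as required, which ignores isozymes.

```python
def close_absent_reactions(bounds, required_genes, absent_genes):
    """Return new bounds with reactions that need an absent gene set to (0, 0).

    Simplification: every gene listed for a reaction is treated as required
    (an AND rule); isozymes (OR rules) are not modeled here.
    """
    closed = dict(bounds)
    for rxn, genes in required_genes.items():
        if any(g in absent_genes for g in genes):
            closed[rxn] = (0.0, 0.0)
    return closed


def prune_by_flux_range(flux_ranges, tol=1e-9):
    """Keep reactions whose (min, max) flux range allows nonzero flux."""
    return sorted(r for r, (lo, hi) in flux_ranges.items()
                  if abs(lo) > tol or abs(hi) > tol)


# Hypothetical toy data, not taken from the models in the text.
bounds = {"R1": (-1000.0, 1000.0), "R2": (0.0, 1000.0), "R3": (0.0, 1000.0)}
required_genes = {"R1": ["g1"], "R2": ["g2", "g3"], "R3": []}
closed = close_absent_reactions(bounds, required_genes, absent_genes={"g3"})

# Stand-in for a flux variability result computed on the constrained model.
flux_ranges = {"R1": (-5.0, 12.0), "R2": (0.0, 0.0), "R3": (0.0, 3.5)}
submodel_reactions = prune_by_flux_range(flux_ranges)
print(submodel_reactions)
```

In practice the flux ranges would come from a flux variability analysis of the constrained model rather than from a hand-written dictionary.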
Cell weight
We calculated the cell dry weight based on the relative volume difference and comparison to human osteosarcoma (U2OS) cells. The cell dry weight of U2OS cells,
60 pg [175], and cell volume, 4000 µm3 [110], were derived from the literature. The
cell volume of lymphocytes (243 µm3 , the average volume of lymphoblasts from
patients with ALL, [176]) was derived from the literature. Cell dry weight was
calculated accordingly: 4000/243=16.46, and 60 pg/16.46 = 3.645 pg (3.645e-12 g).
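The arithmetic above can be reproduced with a short Python snippet. The variable names are chosen for illustration only, and the numbers are the literature values quoted in the text.

```python
# Literature values quoted in the text.
U2OS_DRY_WEIGHT_PG = 60.0        # dry weight of a U2OS cell, pg
U2OS_VOLUME_UM3 = 4000.0         # volume of a U2OS cell, cubic micrometers
LYMPHOBLAST_VOLUME_UM3 = 243.0   # average lymphoblast volume, cubic micrometers

# Scale the U2OS dry weight by the relative cell volume.
volume_ratio = U2OS_VOLUME_UM3 / LYMPHOBLAST_VOLUME_UM3
dry_weight_pg = U2OS_DRY_WEIGHT_PG / volume_ratio
dry_weight_g = dry_weight_pg * 1e-12

print(round(volume_ratio, 2), round(dry_weight_pg, 3))
```

The calculation assumes that dry weight scales linearly with cell volume across the two cell types.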
Definition of maximum uptake rate and minimum uptake rate
The maximum uptake rate was defined by the RPMI medium concentrations, and
the minimum uptake was defined by mass spectrometry detection limits. Therefore,
both medium concentration (mM) and detection limit (mM) were converted to flux
values (mmol/gDW/hr) by using a cell concentration of 2.17*1e6 (the concentration
of viable CCRF-CEM cells after 48 h), an experimental duration of 48 h, and the calculated dry weight of 3.645e-12 g per cell:
Flux = MetConc/(CellConc*CellWeight*T*1000).
In the case of uptake, the bounds were
defined by the RPMI medium concentration (lower bound, lb) and the detection
limit (upper bound, ub), and in the case of secretion, they were defined by the detection limit (lb) or left unconstrained (ub).
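A minimal Python sketch of the conversion formula above follows. It assumes that the cell concentration is given in cells per mL, so that the factor of 1000 converts it to cells per liter. The text does not state the unit, so this reading is an assumption. The function name and the 10 mM example concentration are hypothetical.

```python
def concentration_to_flux(met_conc_mM, cell_conc=2.17e6,
                          cell_weight_g=3.645e-12, hours=48.0):
    """Convert a concentration (mM) into a flux bound (mmol/gDW/hr).

    Implements Flux = MetConc / (CellConc * CellWeight * T * 1000), with
    cell_conc assumed to be in cells/mL and cell_weight_g in g per cell.
    """
    return met_conc_mM / (cell_conc * cell_weight_g * hours * 1000.0)


# Hypothetical medium concentration of 10 mM.
flux = concentration_to_flux(10.0)
print(round(flux, 2))
```

For an uptake reaction, the sign convention of the model (negative flux for uptake) still has to be applied to the returned magnitude.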
Setting general and qualitative exo-metabolomic constraints during
model building
Medium concentration to flux calculations were based on 3.645*1e-12 g cell weight,
an initial cell concentration of 2.17*1e6, T = 48 h, and
Flux = MetConc/(CellConc*CellWeight*T). We constrained the model by enforcing minimal flux through exchange reactions for secreted or uptaken metabolites
in the correct directions (qualitative constraints). In the case of uptake, the upper
bound of the corresponding exchange reaction was set to the flux equivalent of the
minimal detection limit [196] using the same equation used for the concentrations
in the medium. In the case of secretion, the lower bound of the exchange was set
to be the minimum flux value based on the minimal detection limit (Table 3.12).
The biomass reaction was constrained in a cell-line-specific manner. The experimental growth rate was 0.035 hr^-1 for CCRF-CEM and 0.032 hr^-1 for Molt-4 (Table
3.13 & 3.14). Vmax and Vmin were set to allow 20% deviation from the experimental growth rate in each direction. Oxygen uptake was constrained to Vmin =
-2.346 mmol/gDW/hr [163]. All infinite fluxes were set to the maximum: -500/500
mmol/gDW/hr. Alanine and glutamine are the breakdown products of GlutaMax in
an external reaction. The model did not account for these reactions. However, the
glutamine concentration was used to calculate the uptake flux of glutamine, which
otherwise was not present in the medium. The increase of both compounds therefore did not necessarily reflect actual secretion by the cells, as it may have simply
reflected the breakdown of GlutaMax, although additional secretion by the cells
cannot be excluded. In the case of glutamine and alanine, the model exchanges
remained unconstrained (qualitative and quantitative constraints) because the actual
cell behavior could not be derived from the data, as it was overshadowed by accumulation resulting from the breakdown of GlutaMax (Table 3.5, 3.6, 3.7, 3.8). Uptake
of the conditionally essential amino acid cysteine (of which adequate amounts may
not be produced) was enabled. Repeated profiling of the two cell lines supported
the uptake of these amino acids (unpublished data). All other exchange reactions
were constrained to zero, except those for basic ions, basic medium compounds and
essential amino acids.
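The biomass bounds described above can be sketched as follows. The function name is hypothetical, and the growth rates are the experimental values quoted in the text.

```python
def growth_rate_bounds(rate, deviation=0.20):
    """Return (Vmin, Vmax) allowing the given relative deviation from rate."""
    return rate * (1.0 - deviation), rate * (1.0 + deviation)


ccrf_cem_bounds = growth_rate_bounds(0.035)  # experimental rate, hr^-1
molt4_bounds = growth_rate_bounds(0.032)     # experimental rate, hr^-1
print(ccrf_cem_bounds, molt4_bounds)
```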
Definition of quantitative constraints
The constraints on the exchange reactions defined during model building were the
same in both condition-specific cell line models (Figure 3.1D). For the analysis, we
used the relative quantitative differences of commonly uptaken or secreted metabolites to further constrain the models (quantitative constraints). The model of the cell
line that secreted more in the experiment was forced to secrete more by increasing
the lower bound of the respective exchange reaction. The new lower bound was set
to be proportionate to the difference in metabolite secretion in the experimental data
(Figure 3.1D, C-D). Accordingly, we decreased the lower bound of the model for
the cell line that showed less uptake of the influx metabolites (Figure 3.1D, A-B).
For a list of the adjusted bounds, see Table 3.15. To estimate the ratio for adjustment, we first calculated the fold change (FC) of each metabolite in the medium
and in each cell line by comparing the zero and 48 h time points. Next, we compared the FC values to generate a slope (Slope = FCcellline/FCmedium) for each
cell line. In the last step, we calculated the slope ratios (Slope Ratio = slopeCCRF-CEM/slopeMolt-4), which were used for the adjustments (Figure 3.1D, colored x
= Slope Ratio). Some metabolite exchanges were not adjusted, including those of
phosphate and the essential amino acids histidine, L-cysteine, valine, methionine,
alanine, and glutamine. The additional quantitative bounds were established to get
a closer match to the phenotypes, so we refrained from adding constraints based on
inconclusive data. Glutamine and alanine were the breakdown products
of GlutaMax; however, instead of modeling the breakdown of GlutaMax, we did not
constrain the bounds for these compounds. The ACHRsampler implemented in the
COBRA toolbox [15] was used with 10,000 generated warm-up points, nFiles =
100, pointsPerFile = 5000, and stepsPerPoint=2500, and the cell-line models were
used as inputs.
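The fold change, slope and slope ratio defined above can be written as a short Python sketch. The direction of the fold change (early signal divided by late signal) is an assumption, and the function names and signal values are hypothetical.

```python
def fold_change(early_signal, late_signal):
    """Fold change between the first and the 48 h time point (early / late)."""
    return early_signal / late_signal


def slope(fc_cell_line, fc_medium):
    """Slope = FC of the cell line relative to FC of the control medium."""
    return fc_cell_line / fc_medium


def slope_ratio(slope_ccrf_cem, slope_molt4):
    """Slope Ratio = slope of CCRF-CEM divided by slope of Molt-4."""
    return slope_ccrf_cem / slope_molt4


# Hypothetical signals for one metabolite that both cell lines take up.
fc_medium = fold_change(100.0, 100.0)
ratio = slope_ratio(slope(fold_change(100.0, 25.0), fc_medium),
                    slope(fold_change(100.0, 50.0), fc_medium))
print(ratio)
```

How the resulting ratio is translated into a new bound (for example, by scaling the bound of one model) follows Table 3.15 and Figure 3.1D and is not reproduced here.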
Comparison of network utilization and DEGs/AS
The models shared a set of 1,907 reactions. We defined a reaction as differently
utilized if the median value calculated from the sampling points differed by more
than 10%. The shared reaction set was divided into three groups: x (reactions with
median difference > 10% and higher in CCRF-CEM cells) = 1381, y (reactions
with median difference > 10% and higher in Molt-4 cells) = 158, and z (reactions
with median difference < 10% and reactions with opposite directionality in addition
to loop reactions) = 368. Loop reactions were defined by flux variability analysis
(FVA) with the criteria minFlux = -500 and maxFlux = 500 (219 reactions in Molt-4,
220 reactions in CCRF-CEM).
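The grouping into x, y and z described above can be sketched as follows. The text does not state the reference for the 10% difference, so this sketch assumes that the difference is taken relative to the larger absolute median. Loop reactions, which the text assigns to group z based on FVA, are not detected here. The function name and the sample values are hypothetical.

```python
from statistics import median


def classify_utilization(samples_ccrf_cem, samples_molt4, threshold=0.10):
    """Assign a shared reaction to group "x", "y" or "z".

    x: median flux differs by more than the threshold and is higher in CCRF-CEM
    y: median flux differs by more than the threshold and is higher in Molt-4
    z: difference within the threshold, or opposite flux directions
    """
    m_c = median(samples_ccrf_cem)
    m_m = median(samples_molt4)
    if m_c * m_m < 0:  # opposite directionality
        return "z"
    larger = max(abs(m_c), abs(m_m))
    if larger == 0:
        return "z"
    relative_difference = abs(abs(m_c) - abs(m_m)) / larger
    if relative_difference <= threshold:
        return "z"
    return "x" if abs(m_c) > abs(m_m) else "y"


print(classify_utilization([7.1, 7.1, 7.2], [4.4, 4.4, 4.5]))
```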
Models cover equal amounts of differentially expressed and alternatively
spliced genes
The GeneChip Human Exon 1.0 ST Array had been used to measure gene expression and exon variation in Molt-4 and CCRF-CEM cells. We derived sets of differentially expressed (DEGs) and alternatively spliced (AS) genes by comparing gene
expression between the two cell lines (Tables 3.10 & 3.11). The analysis yielded
57 Recon 1 genes with significantly higher expression in CCRF-CEM compared to
Molt-4 cells (upregulated), and 16 genes with significantly lower expression in
CCRF-CEM compared to Molt-4 cells (downregulated). To validate the models, we
investigated how many of the genes remained part of the condition-specific models
after the integration of the metabolomic data. The CCRF-CEM and Molt-4 specific
models covered the same subset of DEGs, namely 49 (66 transcripts) upregulated
genes and 12 (13 transcripts) downregulated genes (missing downregulated genes,
Entrez Gene IDs: 19, 875, 23657, 2944; missing upregulated genes, Entrez Gene IDs:
64131, 64772, 2581, 256435, 6799, 4697, 9951, 7263), and also the same number
of reactions associated with these DEGs: 144 reactions associated with upregulated
genes and 15 reactions associated with downregulated genes (Figure 3.1C). We
identified 90 AS genes in the set of Recon 1 genes, and all AS genes remained in
the condition-specific models. The CCRF-CEM and Molt-4 models both covered an
equal set of 211 AS gene-associated reactions (Figure 3.1C). The gene expression
data contributed only in a minor way to the differentiation of the models. Both models included the same number of DEGs and AS genes, as well as the same number of
reactions associated with these gene sets.
Enzyme assays
Molt-4 and CCRF-CEM cells were grown as described previously, and harvested
in their respective log growth phase. Cell number, size and viability (Trypan blue
exclusion) were obtained by counting cells using an automatic cell counter, Countess
(Invitrogen). Cells were collected by centrifugation at 201 × g for 5 min, washed
once with PBS and pelleted again by centrifugation. The cells were then resuspended in extraction buffer (0.1 M Tris, 2.5 mM EDTA, pH 7.75) to afford 1 × 10^5
cells/µL and heated on a heat block set to 100 °C for 2 min followed by cooling
on ice. Following centrifugation at 20,000 × g, the supernatant fraction (hereafter
metabolite extract: ME) was removed and stored at −80 °C prior to biochemical
assays. ATP content was measured in 100× diluted ME using the CellTiter-Glo kit
(Promega) according to the manufacturer's instructions employing a Spectramax M3
microplate reader. NAD+ and NADH were measured in 5× diluted ME using the
Amplite fluorometric NAD/NADH ratio assay kit (AAT Bioquest) according to the
manufacturer's instructions. NADP+ and NADPH were measured similarly using
the Amplite fluorometric NADP+/NADPH ratio assay kit (AAT Bioquest). Oxidized and reduced glutathione were measured similarly in 10× diluted ME using the
Amplite fluorometric GSH/GSSG ratio assay kit (AAT Bioquest). ROS were evaluated using a modified ORAC assay based on a method described by Ganske and
Dell [210]. Briefly, 25 µL of ME or 25 µL of the standard 6-hydroxy-2,5,7,8-tetramethylchroman-2-carboxylic acid (Trolox, Sigma) were mixed with 150 µL 10 nM
fluorescein (Sigma) and 25 µL 120 nM [2,2’-azobis(2-methylpropionamidine) dihydrochloride (AAPH)] (Sigma) in a transparent 96-well microplate. Following 15
sec mechanical shaking, fluorescence (ex: 485 nm, em: 580 nm) was monitored at 1
min intervals for 80 min at 37 °C. ORAC values were extrapolated from a Trolox standard curve using Softmax Pro software and expressed as µmol of Trolox equivalent
(T.E.) per 1 × 10^6 cells. All biochemical assay data shown represent triplicate averages,
n = 2.
All calculations were performed by using TomLab cplex linear solver and MATLAB.
Figure 3.3: Differences in the use of the TCA cycle by the CCRF-CEM model (red)
and the Molt-4 model (blue). The table provides the median values of the sampling
results. Negative values in histograms and in the table describe reactions with flux
in the reverse direction of reversible reactions. There are multiple reversible reactions for the transformation of isocitrate and α-ketoglutarate, malate and fumarate,
and succinyl-CoA and succinate. These reactions are unbounded, and therefore
histograms are not shown. The details of participating cofactors have been removed. Atp = ATP, cit = citrate, adp = ADP, pi = phosphate, oaa = oxaloacetate,
accoa = acetyl-CoA, coa = coenzyme-A; icit = isocitrate; αkg = α-ketoglutarate;
succ-coa = succinyl-CoA; succ = succinate; fum = fumarate; mal = malate, oxa =
oxaloacetate; pyr = pyruvate; lac = lactate; ala = alanine; gln = glutamine; ETC
= electron transport chain.
Figure 3.4: Sampling reveals different utilization of oxidative phosphorylation by
the generated models. Different distributions are observed for the CCRF-CEM
model (red) and the Molt-4 model (blue). Molt-4 has higher median flux through
ETC reactions II-IV. The table provides the median values of the sampling results.
Negative values in the histograms and in the table describe reactions with flux in the
reverse direction of reversible reactions. Both models lack Complex I of the ETC
because of constraints arising from the mapping of transcriptomic data. Electron
transfer flavoprotein and electron transfer flavoprotein-ubiquinone oxidoreductase
both also carry higher flux in the Molt-4 model.
Figure 3.5: A-K) Experimentally determined ATP, NADH + NAD, NADPH + NADP,
and GSH + GSSG concentrations, and ROS detoxification in the CCRF-CEM and
Molt-4 cells. L) Expectations for cellular energy and redox states. Expectations are
based on predicted metabolic differences of the Molt-4 and CCRF-CEM models.
3.10 Supplementary material
This section captures tables published as supplementary material.
rxns | Formulas | RxnNames | lb | ub
34HPPte | 34hpp[e] <=> 34hpp[c] | 34HPPte | -1000 | 1000
3MOBte | 3mob[e] <=> 3mob[c] | 3MOBte | -1000 | 1000
3MOPte | 3mop[e] <=> 3mop[c] | 3MOPte | -1000 | 1000
4HPRO_LTte | 4hpro_LT[e] <=> 4hpro_LT[m] | trans-4-hydroxy-L-proline-transport | -1000 | 1000
4MOPte | 4mop[e] <=> 4mop[c] | 4MOPte | -1000 | 1000
5MTAte | 5mta[e] <=> 5mta[c] | 5MTAte | -1000 | 1000
5OXPROt | 2 na1[e] + 5oxpro[e] <=> 2 na1[c] + 5oxpro[c] | 5-oxoproline transport (sodium symport) (2:1) | -1000 | 1000
AHCYSte | ahcys[e] <=> ahcys[c] | AHCYSte | -1000 | 1000
AICARte | aicar[e] <=> aicar[c] | aicar transport | -1000 | 1000
ANTHte | anth[e] <=> anth[c] | p-aminobenzoate (PABA)/anthranilate transport | -1000 | 1000
ARGte | arg_L[e] <=> arg_L[c] | diffusion reaction for arginin | -1000 | 1000
CBASPte | cbasp[e] <=> cbasp[c] | N-carbamoylaspartate transport | -1000 | 1000
DM_4hrpo | 4hpro_LT[m] -> | demand reaction for trans-4-hydroxy-L-proline | 0 | 1000
DM_Lcystin | Lcystin[c] -> | demand reaction for Lcystin | 0 | 1000
DM_anth | anth[c] -> | demand reaction for PABA | 0 | 1000
DM_btn | btn[c] -> | demand reaction for biotin | 0 | 1000
DM_fol | fol[c] -> | demand reaction for Folate | 0 | 1000
DM_ncam | ncam[c] -> | demand reaction for Nicotinamide | 0 | 1000
DM_pnto_R | pnto_R[c] -> | demand reaction for (R)-Pantothenate | 0 | 1000
EX_34hpp | 34hpp[e] <=> | EX_34hpp | -1000 | 1000
EX_3mob(e) | 3mob[e] <=> | EX_3mob(e) | -1000 | 1000
EX_3mop(e) | 3mop[e] <=> | EX_3mop(e) | -1000 | 1000
EX_4mop(e) | 4mop[e] <=> | EX_4mop(e) | -1000 | 1000
EX_5mta(e) | 5mta[e] <=> | EX_5mta(e) | -1000 | 1000
EX_5oxpro(e) | 5oxpro[e] <=> | EX_5oxpro(e) | -1000 | 1000
EX_ahcys(e) | ahcys[e] <=> | EX_ahcys(e) | -1000 | 1000
EX_aicar(e) | aicar[e] <=> | aicar exchange reaction | -1000 | 1000
EX_anth(e) | anth[e] <=> | EX_anth(e) | -1000 | 1000
EX_cbasp(e) | cbasp[e] <=> | EX_cbasp(e) | -1000 | 1000
EX_mal_L(e) | mal_L[e] <=> | L-Malate exchange | -1000 | 1000
MAL_Lte | mal_L[e] <=> mal_L[c] | malate transport | -1000 | 1000
ORNt | orn[e] <=> orn[c] | ornithine transport via diffusion (extracellular to periplasm) | -1000 | 1000
OROTGLUt | glu_L[c] + orot[e] <=> glu_L[e] + orot[c] | antiport of orotate and glutamate | -1000 | 1000
PNTOte | pnto_R[e] <=> pnto_R[c] | diffusion reaction of pantothenate | -1000 | 1000
The remaining columns (grRules, Recon 2, global model) contain one gene association (10864.1) and 34 entries of 1; their assignment to individual rows is not recoverable here.
Table 3.2: Reactions added to Recon 2 and the global model. Exchange and transport reactions were added for those intracellular
metabolites of the global model that had been detected in the extracellular metabolome of the cell lines. Added transport mechanisms
were almost exclusively diffusion reactions and not gene associated.
Reaction Abbreviation | CCRF-CEM median (mmol/gdw/hr) | Molt-4 median (mmol/gdw/hr) | Differentially expressed down-regulated Genes | average expression Molt-4 | average expression CCRF-CEM | adjusted p-value | Subsystem | upreg CCRF-CEM / correct | upreg Molt-4 / incorrect
ALATA_L | 1.222 | 0.83 | 84706 | 7.986 | 6.753 | 0.015 | Glutamate metabolism | 1 | 0
CORE2GTg | 0.025 | 0.018 | 51301 | 6.808 | 5.335 | 0.02 | O-Glycan Biosynthesis | 1 | 0
FBA | 7.103 | 4.437 | 230 | 8.263 | 6.469 | 0.011 | Glycolysis/Gluconeogenesis | 1 | 0
FBA2 | 7.102 | 4.431 | 230 | 8.263 | 6.469 | 0.011 | Fructose and Mannose Metabolism | 1 | 0
FBA4 | 0.046 | 0.037 | 230 | 8.263 | 6.469 | 0.011 | Glyoxylate and Dicarboxylate Metabolism | 1 | 0
GALNTg | 0.057 | 0.039 | 63917 | 8.366 | 4.957 | 0.018 | O-Glycan Biosynthesis | 1 | 0
GUACYC | 0.145 | 0.101 | 2982 | 8.672 | 4.795 | 0.006 | Nucleotides | 1 | 0
GUACYC | 0.145 | 0.101 | 2983 | 8.228 | 4.235 | 0.011 | Nucleotides | 1 | 0
METAT | 0.128 | 0.078 | 4143 | 6.963 | 5.727 | 0.026 | Methionine Metabolism | 1 | 0
ORNTArm | 0.354 | 0.232 | 4942 | 8.376 | 4.113 | 0.004 | Urea cycle/amino group metabolism | 1 | 0
PCm | 0.814 | 0.287 | 5091 | 7.982 | 6.714 | 0.016 | Pyruvate Metabolism | 1 | 0
PDE1 | 0.099 | 0.07 | 10846 | 6.894 | 5.278 | 0.035 | Nucleotides | 1 | 0
PDE4 | 0.103 | 0.072 | 10846 | 6.894 | 5.278 | 0.035 | Nucleotides | 1 | 0
PFK | 7.109 | 4.441 | 5213 | 8.111 | 6.591 | 0.038 | Glycolysis/Gluconeogenesis | 1 | 0
PI345P3P | 0.229 | 0.166 | 5728 | 10.119 | 6.832 | 0.005 | Inositol Phosphate Metabolism | 1 | 0
PI345P3Pn | 0.216 | 0.154 | 5728 | 10.119 | 6.832 | 0.005 | Inositol Phosphate Metabolism | 1 | 0
Table 3.3: Comparison of flux changes and gene expression changes of genes more highly expressed in Molt-4 cells.
Table 3.4: Unique Knock-out (KO) genes for each cancer cell line model.
Molt-4 unique KO genes: 5832.1, 2271.1, 26227.1
CCRF-CEM unique KO genes: 2805.1, 316.1, 6539.1, 262.1, 55349.1, 1468.1, 57026.1, 4953.1, 55163.1, 8566.1, 6723.1
Figure 3.6: Growth and apoptosis of Molt-4 and CCRF-CEM cells. Cells were
resuspended in RPMI advanced containing DMSO (0.67%) and cultured at 37 °C
and 5% CO2 in 24 well plates. A Cells were counted using an automatic cell
counter. The graphs show viable cells (excluding Trypan blue). Data shown is the
average and standard deviation of biological triplicates. B Inhibition of growth by
DMSO (0.67%) after 48 hrs. Data shown is the average and standard deviation of
7 (CCRF-CEM) and 6 (Molt-4) independent experiments. C Cell viability is shown
as the fraction of cells excluding Trypan blue in each sample. Data shown is the
average and standard deviation of 7 (CCRF-CEM) and 6 (Molt-4) independent experiments. D Apoptosis was measured after 48 hrs using Annexin V binding and flow
cytometry. Data shown are all apoptotic cells, including cells undergoing necrosis
(stained with Annexin V-PE and 7-AAD). Data shown is the average and standard
deviation of three independent experiments.
Metabolite | Mean control medium (2 hrs) | Mean control medium (48 hrs) | FC medium over time | Mean CCRF-CEM (2 hrs) | Mean CCRF-CEM (48 hrs) | FC CCRF-CEM over time | Difference of FC | Exchange reaction | Comment
choline | 876300.4333 | 905132.5 | 0.97 | 884112.5667 | 259273.9333 | 3.41 | 2.44 | EX_chol(e) | uptake
p-aminobenzoate (PABA) | 159124.46 | 178538.2167 | 0.89 | 158271.14 | 60631.19333 | 2.61 | 1.72 | EX_anth(e) | uptake
glucose | 76555640.67 | 71459886.33 | 1.07 | 65009057.67 | 24330565.33 | 2.67 | 1.60 | EX_glc(e) | uptake
pyruvate | 465074.6 | 387569.1333 | 1.20 | 376779.1 | 249036.3333 | 1.51 | 0.31 | EX_pyr(e) | uptake
tryptophan | 2153692 | 2132723.667 | 1.01 | 2068977 | 1570648 | 1.32 | 0.31 | EX_trp_L(e) | uptake
threonine | 406102.2667 | 417512.6333 | 0.97 | 386495.2 | 303808.2 | 1.27 | 0.30 | EX_thr_L(e) | uptake
lysine | 198435.8 | 195675.8 | 1.01 | 196447.1 | 149861.6667 | 1.31 | 0.30 | EX_lys_L(e) | uptake
leucine | 20823770.33 | 20801258.67 | 1.00 | 21119935.67 | 16346765.67 | 1.29 | 0.29 | EX_leu_L(e) | uptake
phenylalanine | 8087955 | 8345511.333 | 0.97 | 8237784.667 | 6540301.667 | 1.26 | 0.29 | EX_phe_L(e) | uptake
folate | 95419.73667 | 105904.7067 | 0.90 | 100629.24 | 84807.62333 | 1.19 | 0.29 | EX_fol(e) | uptake
isoleucine | 21229254.67 | 21225778.33 | 1.00 | 20790535.33 | 17219085 | 1.21 | 0.21 | EX_ile_L(e) | uptake
proline | 5107317.333 | 5168599.333 | 0.99 | 5134219 | 4445918.333 | 1.15 | 0.17 | EX_pro_L(e) | uptake
tyrosine | 4142302 | 4063607.667 | 1.02 | 4023284.333 | 3489981.333 | 1.15 | 0.13 | EX_tyr_L(e) | uptake
trans-4-hydroxyproline | 814465.8333 | 786011.5667 | 1.04 | 679862.7 | 582257.4667 | 1.17 | 0.13 | EX_4hpro_LT(e) | uptake
methionine | 2995910.333 | 3018536.333 | 0.99 | 2890029.333 | 2538211 | 1.14 | 0.15 | EX_met_L(e) | EssAA -uptake
histidine | 69077.16333 | 67843.12 | 1.02 | 74035.24 | 86165.55 | 0.86 | 0.16 | EX_his_L(e) | EssAA -uptake
valine | 2857012.667 | 2900419.667 | 0.99 | 2668140 | 2790196.333 | 0.96 | 0.03 | EX_val_L(e) | EssAA -uptake
5-oxoproline | 29168.15 | 21808.73 | 1.34 | 62146.47333 | 1012932.38 | 0.06 | 1.28 | EX_5oxpro(e) | secretion
lactate | 2038946.433 | 1917042.967 | 1.06 | 5222377.933 | 134980059.9 | 0.04 | 1.02 | EX_lac-L(e) | secretion
citrate | 8681.527333 | 8704.7345 | 1.00 | 8317.144 | 86546.77933 | 0.10 | 0.90 | EX_cit(e) | secretion
glycine | 163882.9467 | 186682.92 | 0.88 | 219683.7 | 460476.5267 | 0.48 | 0.40 | EX_gly(e) | secretion
glutamate | 473539.8667 | 455197.4667 | 1.04 | 437398.3667 | 630407.2667 | 0.69 | 0.35 | EX_glu-L(e) | secretion
5-methylthioadenosine (MTA) | - | - | - | 19417.58 | 62796.41667 | 0.31 | 0.31 | EX_5mta(e) | secretion
ornithine | 65245.09667 | 68680.93 | 0.95 | 73850.77 | 98489.89 | 0.75 | 0.20 | EX_orn(e) | secretion
asparagine | 506043.8667 | 616352.4333 | 0.82 | 744471.0333 | 1010001.033 | 0.74 | 0.08 | EX_asn_L(e) | secretion
4-methyl-2-oxopentanoate | - | - | - | 9918.992 | 129433.4973 | 0.08 | 0.08 | EX_4mop(e) | secretion
3-methyl-2-oxovalerate | - | - | - | 7222.259333 | 145547.7347 | 0.05 | 0.05 | EX_3mop(e) | secretion
uridine | - | - | - | - | 4952.689333 | 0.00 | 0.00 | EX_uri(e) | secretion
succinate | - | - | - | - | 82806.571 | - | 0.00 | EX_succ(e) | secretion
betaine | - | - | - | - | 442897.2833 | - | 0.00 | EX_glyb(e) | secretion
malate | - | - | - | - | 94181.77233 | - | 0.00 | EX_mal_L(e) | secretion
pyridoxate | - | - | - | - | 12895.19667 | - | 0.00 | EX_4pyrdx(e) | secretion
3-methyl-2-oxobutyrate | - | - | - | - | 17641.55667 | - | 0.00 | EX_3mob(e) | secretion
4-hydroxyphenylpyruvate | - | - | - | - | 6115.220333 | - | 0.00 | EX_34hpp(e) | secretion
glutamine | - | - | - | 323185.6667 | 2063962.067 | 0.16 | 0.16 | EX_gln_L(e) | no direction
alanine | 1613345.1 | 1258710.1 | 1.28 | 2788313.567 | 30868376.53 | 0.09 | 1.19 | EX_ala-L(e) | no direction
cysteine | 45304.84667 | 52977.77333 | 0.86 | 62076.23333 | 64524.22333 | 0.96 | 0.11 | EX_cys_L(e) | no direction
Two entries of the source table could not be assigned to a cell: one control-medium value (30970.784) and one fold-change value (0.00).
Table 3.5: Metabolomic data of CCRF-CEM cells (mapped).
Metabolite
pyridoxine (Vitamin B6)
phosphate
valine
biotin
myo-inositol
aspartate
pantothenate
arginine
nicotinamide
serine
cystine
thiamin (Vitamin B1)
riboflavin (Vitamin B2)
fructose
erythrose
trehalose
caprylate (8:0)
alanylglutamine
cysteine-glutathione disulfide
4-guanidinobutanoate
caproate (6:0)
p-cresol sulfate
phenol red
dimethylarginine (SDMA + ADMA)
N-acetylalanine
N-acetylisoleucine
N-acetylleucine
N-acetylmethionine
O-acetylhomoserine
pseudouridine
beta-hydroxyisovalerate
uracil
orotate
inosine
hypoxanthine
guanine
N-carbamoylaspartate
S-adenosylhomocysteine (SAH)
adenine
Mean control medium (2 hrs)
889550.5333
216828142.3
2857012.667
133350.2667
4614936
632160.0333
84638.70667
379985.8667
376635.7
993272.0667
135388.0233
48437.09
23051.91333
240068.45
74911.88667
19662.35233
2766223.23
71265.4
32599.79
11700.92733
9962.496667
589185.2333
Mean control medium (48 hrs)
899254.5
221118425
2900419.667
128943.8
3920500
612562.3
86751.96
377714.6333
405942.7667
1166496.7
221451.05
38485.935
23033.54
259747.7433
86557.93
0.86
0.91
0.97
1.05
0.91
1.01
0.95
FC medium over time
0.99
0.98
0.99
1.03
1.18
1.03
0.98
1.01
0.93
0.85
0.61
1.26
1.00
0.92
0.87
14023.93033
2575210.853
68153.81667
28704.82333
7450.419
9284.687
585617.5333
Mean CCRF-CEM (2 hrs)
932496.1333
212276379
2668140
133599.3667
3505897
680373.4333
88002.12
374009.2667
386861.5667
1367977.267
206287.07
45028.005
22609.52333
254332.1633
105577.8733
Mean CCRF-CEM (48 hrs)
905903.4667
208623151.3
2790196.333
129767.1333
3971405.333
770903.9333
99449.36667
399764.1333
379674.8
1422040.5
231801.3733
119204.6633
26433.13667
117301.67
66799.03333
274029.1833
8147.758667
952614.66
0.00
0.93
1.02
0.96
1.00
FC CCRF-CEM over time
1.03
1.02
0.96
1.03
0.88
0.88
0.88
0.94
1.02
0.96
0.89
0.38
0.86
2.17
1.58
0.00
1.72
2.70
Table 3.6: Metabolomic data of CCRF-CEM cells (not mapped).
16937.25333
2529354.017
69277.14
34301.30333
10626.237
10082.20967
560994.5
30830.90333
7272.74
9692.281667
584484.7
109093.6367
16868.47433
7383.618
11703.27667
6690.47
46041.62267
11813.504
Exchange reaction
0.04
0.04
0.03
0.00
0.29
0.15
0.09
0.07
0.09
0.11
0.28
0.88
0.15
1.24
0.72
0.00
0.86
1.79
0.97
0.12
0.12
0.05
0.05
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
Difference of FC
Comment
5% cutoff
5% cutoff
5% cutoff
5% cutoff
small difference
small difference
small difference
small difference
small difference
increasing signal, not clear
increasing signal, not clear
excluded
excluded
excluded
excluded
not in Recon1
not in the medium
not in Recon1
not in Recon1
not in Recon1
not in Recon1
not in Recon1
not in Recon1
not in Recon1
not in Recon1
not in Recon1
not in Recon1
not in Recon1
not in Recon1
not in Recon1
not in Recon1
not detected
not detected
not detected
not detected
not detected
not detected
not detected
not detected
Metabolite
p-aminobenzoate (PABA) 159124.46 178538.2167 0.89 162567.13 EX_anth(e) 0.89 uptake
pyruvate
glucose
choline
lysine
valine
phenylalanine
threonine
tryptophan
methionine
pantothenate
leucine
tyrosine
isoleucine
histidine
5-oxoproline
lactate
malate
citrate
glutamate
glycine
aspartate
4-methyl-2-oxopentanoate
3-methyl-2-oxovalerate
ornithine
3-methyl-2-oxobutyrate
glutamine
alanine
cysteine
68680.93
1258710.1
52977.77333
65245.09667
1613345.1
45304.84667
8681.527333
473539.8667
163882.9467
632160.0333
Mean control medium (48 hrs)
387569.1333
71459886.33
905132.5
195675.8
2900419.667
8345511.333
417512.6333
2132723.667
3018536.333
86751.96
20801258.67
4063607.667
21225778.33
67843.12
21808.73
1917042.967
30970.784
8704.7345
455197.4667
186682.92
612562.3
465074.6
76555640.67
876300.4333
198435.8
2857012.667
8087955
406102.2667
2153692
2995910.333
84638.70667
20823770.33
4142302
21229254.67
69077.16333
29168.15
2038946.433
1.28
0.86
0.95
1.00
1.04
0.88
1.03
1.20
1.07
0.97
1.01
0.99
0.97
0.97
1.01
0.99
0.98
1.00
1.02
1.00
1.02
1.34
1.06
FC medium over time
824549.3667
3430342.067
56566.27667
439148.0667
61697085.33
892182.2
188473.1
2853523.667
8215168.333
381085.2333
2037735.333
3024630.333
89717.10667
19725086.67
3934639.333
20799761
69406.69
120655.9867
5654513.467
20292.406
9459.837
462903.8333
121762.3567
590881.7333
34436.50433
25108.829
54272.41667
Mean Molt-4 (2 hrs)
210407.8333
34981419.33
541860.4667
112386.1667
1793173.667
5360276
259555.2667
1387754.333
2266832.333
68882.68333
15148808
3075783.333
17160163
95624.28
2060525.467
101768253
27226.6555
34177.945
1024508.5
310547.7
940705.6
113668.5123
121927.3673
65159.50333
14717.55667
2283200.867
25970024.1
60759.23
Mean Molt-4 (48 hrs)
2.09
1.76
1.65
1.68
1.59
1.53
1.47
1.47
1.33
1.30
1.30
1.28
1.21
0.73
0.06
0.06
0.75
0.28
0.45
0.39
0.63
0.30
0.21
0.83
0.00
0.36
0.13
0.93
FC Molt-4 over time
Table 3.7: Metabolomic data of Molt-4 cells (mapped).
Mean control medium (2 hrs)
EX_pyr(e)
EX_glc(e)
EX_chol(e)
EX_lys_L(e)
EX_val_L(e)
EX_phe_L(e)
EX_thr_L(e)
EX_trp_L(e)
EX_met_L(e)
EX_pnto-R(e)
EX_leu_L(e)
EX_tyr_L(e)
EX_ile_L(e)
EX_his_L(e)
EX_5oxpro(e)
EX_lac-L(e)
EX_mal_L(e)
EX_cit(e)
EX_glu-L(e)
EX_gly(e)
EX_asp_L(e)
EX_4mop(e)
EX_3mop(e)
EX_orn(e)
EX_3mob(e)
EX_gln_L(e)
EX_ala-L(e)
EX_cys_L(e)
Exchange reaction
0.89
0.69
0.68
0.66
0.61
0.56
0.50
0.46
0.34
0.33
0.30
0.26
0.21
0.29
1.28
1.01
0.75
0.72
0.59
0.49
0.40
0.30
0.21
0.12
0.00
0.36
1.15
0.08
Difference of FC
uptake
uptake
uptake
uptake
uptake
uptake
uptake
uptake
uptake
uptake
uptake
uptake
uptake
EssAA -uptake
secretion
secretion
secretion
secretion
secretion
secretion
secretion
secretion
secretion
secretion
secretion
no direction
no direction
no direction, added for modeling
Comment
Metabolite
myo-inositol
nicotinamide
phosphate
folate
serine
cystine
arginine
trans-4-hydroxyproline
caprylate (8:0)
pyridoxine (Vitamin B6)
asparagine
proline
biotin
fructose
erythrose
riboflavin (Vitamin B2)
thiamin (Vitamin B1)
alanylglutamine
cysteine-glutathione disulfide
4-guanidinobutanoate
caproate (6:0)
phenol red
p-cresol sulfate
4-hydroxyphenylpyruvate
5-methylthioadenosine (MTA)
adenine
beta-hydroxyisovalerate
betaine
dimethylarginine (SDMA + ADMA)
guanine
hypoxanthine
inosine
N-acetylalanine
N-acetylisoleucine
N-acetylleucine
N-acetylmethionine
N-carbamoylaspartate
O-acetylhomoserine
orotate
pseudouridine
pyridoxate
S-adenosylhomocysteine (SAH)
succinate
trehalose
uracil
uridine
Mean control medium (2 hrs)
4614936
376635.7
216828142.3
95419.73667
993272.0667
135388.0233
379985.8667
814465.8333
16937.25333
889550.5333
506043.8667
5107317.333
133350.2667
240068.45
74911.88667
23051.91333
48437.09
2529354.017
69277.14
34301.30333
10626.237
560994.5
10082.20967
FC medium over time
1.18
0.93
0.98
0.90
0.85
0.61
1.01
1.04
0.86
0.99
0.82
0.99
1.03
0.92
0.87
1.00
1.26
0.91
0.97
1.05
0.91
0.95
1.01
Mean Molt-4 (2 hrs)
4195275
367096.5667
223518663
97550.78667
814496.1667
121897.495
391354.4333
630513.4
12992.01433
906371.4667
549097.2
5163708.333
127379.2667
321092.33
97682.54333
23085.68333
32363.01
2409678.433
71769.5
32887.87667
8926.486
566563.1667
9632.704667
Mean Molt-4 (48 hrs)
3998122
361576.3667
216863897.3
102678.49
762597.6
130375.3933
375180.1
622493.9
15446.087
930993.3
677509.1333
5263614.333
123131.3
161924.9033
71213.15333
24688.79
87165.94333
173792.14
169136.6333
27785.40667
10492.25633
566028.5
9219.116
FC Molt-4 over time
1.05
1.02
1.03
0.95
1.07
0.93
1.04
1.01
0.84
0.97
0.81
0.98
1.03
1.98
1.37
0.94
0.37
13.87
0.42
1.18
0.85
1.00
1.05
Exchange reaction
Table 3.8: Metabolomic data of Molt-4 cells (not mapped).
Mean control medium (48 hrs)
3920500
405942.7667
221118425
105904.7067
1166496.7
221451.05
377714.6333
786011.5667
19662.35233
899254.5
616352.4333
5168599.333
128943.8
259747.7433
86557.93
23033.54
38485.935
2766223.23
71265.4
32599.79
11700.92733
589185.2333
9962.496667
Difference of FC
0.13
0.09
0.05
0.05
0.22
0.32
0.04
0.02
0.02
0.02
0.01
0.01
0.00
1.06
0.51
0.07
0.89
12.95
0.55
0.13
0.06
0.05
0.03
Comment
small difference
small difference
small difference
small difference
small difference, increasing signal, not clear
small difference, increasing signal, not clear
5% cutoff
5% cutoff
5% cutoff
5% cutoff
5% cutoff
5% cutoff
5% cutoff
excluded
excluded
excluded
excluded
not in Recon1
not in Recon1
not in Recon1
not in Recon1
not in Recon1
not in Recon1
not detected
not detected
not detected
not detected
not detected
not detected
not detected
not detected
not detected
not detected
not detected
not detected
not detected
not detected
not detected
not detected
not detected
not detected
not detected
not detected
not detected
not detected
not detected
Table 3.9: Tables of absent genes (Entrez Gene IDs). Cutoff p<=0.05.
Molt-4: 535.1, 1548.1, 2591.1, 3037.1, 4248.1, 4709.1, 6522.1, 7167.1, 7367.1, 8399.1, 23545.1, 129807.1, 221823.1
CCRF-CEM: 239.1, 443.1, 535.1, 1548.1, 2683.1, 3037.1, 4248.1, 4709.1, 5232.1, 6522.1, 7364.1, 7367.1, 8399.1, 23545.1, 54363.1, 66002.1, 129807.1, 221823.1
Table 3.10: Differentially expressed Recon 1 genes. Genes significantly lower expressed in CCRF-CEM compared to Molt-4 cells (down-regulated)
GeneID | SystemCode | Avg-M | Avg-C | Log fold-C_vs_M | FC_vs_M | Rawp-C_vs_M | Adjp-C_vs_M | ANOVA-rawp | ANOVA-adjp | Largest FC | Entrez Gene ID
ENSG00000065154 | En | 8.375521333 | 4.112782667 | -4.262738667 | -19.19606458 | 7.99E-06 | 0.00430063 | 7.99E-06 | 0.00430063 | 4.262738667 | 4942
ENSG00000171862 | En | 10.11926708 | 6.8316375 | -3.287629583 | -9.765064567 | 1.78E-05 | 0.005406007 | 1.78E-05 | 0.005406007 | 3.287629583 | 5728
ENSG00000165029 | En | 8.496148571 | 4.805906667 | -3.690241905 | -12.9084324 | 2.06E-05 | 0.005541875 | 2.06E-05 | 0.005541875 | 3.690241905 | 19
ENSG00000164116 | En | 8.672461364 | 4.795136667 | -3.877324697 | -14.69572569 | 2.45E-05 | 0.005886867 | 2.45E-05 | 0.005886867 | 3.877324697 | 2982
ENSG00000160200 | En | 9.263808333 | 7.12825 | -2.135558333 | -4.394071463 | 7.77E-05 | 0.00947567 | 7.77E-05 | 0.00947567 | 2.135558333 | 875
ENSG00000109107 | En | 8.262743 | 6.468515 | -1.794228 | -3.468298332 | 0.000109391 | 0.011092184 | 0.000109391 | 0.011092184 | 1.794228 | 230
ENSG00000061918 | En | 8.228284667 | 4.234786333 | -3.993498333 | -15.92805644 | 0.000117929 | 0.011422963 | 0.000117929 | 0.011422963 | 3.993498333 | 2983
ENSG00000151012 | En | 10.87135917 | 9.849012917 | -1.02234625 | -2.03121964 | 0.000167936 | 0.013752713 | 0.000167936 | 0.013752713 | 1.02234625 | 23657
ENSG00000166123 | En | 7.985629815 | 6.752667593 | -1.232962222 | -2.350491107 | 0.000205075 | 0.015409421 | 0.000205075 | 0.015409421 | 1.232962222 | 84706
ENSG00000173599 | En | 7.981523333 | 6.713636667 | -1.267886667 | -2.408085584 | 0.00022421 | 0.016241419 | 0.00022421 | 0.016241419 | 1.267886667 | 5091
ENSG00000178234 | En | 8.365903333 | 4.956553333 | -3.40935 | -10.62469852 | 0.000270385 | 0.01767911 | 0.000270385 | 0.01767911 | 3.40935 | 63917
ENSG00000176928 | En | 6.808208333 | 5.335471667 | -1.472736667 | -2.775478787 | 0.000348653 | 0.020012748 | 0.000348653 | 0.020012748 | 1.472736667 | 51301
ENSG00000151224 | En | 6.962758667 | 5.726715333 | -1.236043333 | -2.355516329 | 0.000572536 | 0.025892296 | 0.000572536 | 0.025892296 | 1.236043333 | 4143
ENSG00000134184 | En | 9.71058 | 7.897661667 | -1.812918333 | -3.513522977 | 0.000729345 | 0.02911905 | 0.000729345 | 0.02911905 | 1.812918333 | 2944
ENSG00000112541 | En | 6.893592727 | 5.277731515 | -1.615861212 | -3.064945057 | 0.000999696 | 0.0347777 | 0.000999696 | 0.0347777 | 1.615861212 | 10846
ENSG00000152556 | En | 8.110786667 | 6.590863333 | -1.519923333 | -2.867758096 | 0.001231408 | 0.038406061 | 0.001231408 | 0.038406061 | 1.519923333 | 5213
Table 3.11: Differentially expressed Recon 1 genes. Genes significantly higher expressed in CCRF-CEM compared to Molt-4 cells (up-regulated)
GeneID | SystemCode | Avg-M | Avg-C | Log fold-C_vs_M | FC_vs_M | Rawp-C_vs_M | Adjp-C_vs_M | ANOVA-rawp | ANOVA-adjp | Largest FC | Entrez Gene ID
ENSG00000165646 | En | 6.348980833 | 8.381608125 | 2.032627292 | 4.091492739 | 9.38E-07 | 0.002369837 | 9.38E-07 | 0.002369837 | 2.032627292 | 6571
ENSG00000103056 | En | 6.507780667 | 10.13391733 | 3.626136667 | 12.34741102 | 3.09E-07 | 0.002369837 | 3.09E-07 | 0.002369837 | 3.626136667 | 55512
ENSG00000151229 | En | 4.470424444 | 8.107204444 | 3.63678 | 12.4388396 | 2.15E-06 | 0.002680364 | 2.15E-06 | 0.002680364 | 3.63678 | 114134
ENSG00000110090 | En | 5.915162667 | 9.526362667 | 3.6112 | 12.22023395 | 2.78E-06 | 0.002873331 | 2.78E-06 | 0.002873331 | 3.6112 | 1374
ENSG00000169692 | En | 6.958078 | 8.541751333 | 1.583673333 | 2.997320449 | 3.42E-06 | 0.003231507 | 3.42E-06 | 0.003231507 | 1.583673333 | 10555
ENSG00000103489 | En | 6.802452821 | 8.719742308 | 1.917289487 | 3.777127508 | 3.95E-06 | 0.003409965 | 3.95E-06 | 0.003409965 | 1.917289487 | 64131
ENSG00000100092 | En | 6.92734619 | 8.805102857 | 1.877756667 | 3.675031629 | 8.87E-06 | 0.004323043 | 8.87E-06 | 0.004323043 | 1.877756667 | 57026
ENSG00000131844 | En | 4.655619333 | 8.792884 | 4.137264667 | 17.59708633 | 1.12E-05 | 0.004629771 | 1.12E-05 | 0.004629771 | 4.137264667 | 64087
ENSG00000047230 | En | 4.632352333 | 8.514437167 | 3.882084833 | 14.74429395 | 1.16E-05 | 0.004734584 | 1.16E-05 | 0.004734584 | 3.882084833 | 56474
ENSG00000137124 | En | 5.721805714 | 8.771057143 | 3.049251429 | 8.277823161 | 1.22E-05 | 0.004783728 | 1.22E-05 | 0.004783728 | 3.049251429 | 219
ENSG00000151689 | En | 5.938046667 | 7.491598333 | 1.553551667 | 2.935388926 | 1.46E-05 | 0.005133602 | 1.46E-05 | 0.005133602 | 1.553551667 | 3628
ENSG00000114805 | En | 5.515985556 | 10.07648289 | 4.560497333 | 23.59644036 | 1.53E-05 | 0.005216399 | 1.53E-05 | 0.005216399 | 4.560497333 | 23007
ENSG00000110719 | En | 7.503504167 | 9.165249167 | 1.661745 | 3.163989912 | 1.63E-05 | 0.005329837 | 1.63E-05 | 0.005329837 | 1.661745 | 10312
ENSG00000197142 | En | 4.759571389 | 8.206302222 | 3.446730833 | 10.90358636 | 1.98E-05 | 0.005541875 | 1.98E-05 | 0.005541875 | 3.446730833 | 51703
ENSG00000198610 | En | 6.75731 | 9.349666667 | 2.592356667 | 6.030830411 | 4.14E-05 | 0.007352644 | 4.14E-05 | 0.007352644 | 2.592356667 | 1109
ENSG00000115159 | En | 2.876411667 | 7.917331667 | 5.04092 | 32.92062909 | 4.23E-05 | 0.007381679 | 4.23E-05 | 0.007381679 | 5.04092 | 2820
ENSG00000176463 | En | 5.475066 | 7.960023333 | 2.484957333 | 5.598177898 | 4.91E-05 | 0.00778803 | 4.91E-05 | 0.00778803 | 2.484957333 | 28232
ENSG00000105655 | En | 6.937104074 | 8.705852222 | 1.768748148 | 3.407581466 | 5.84E-05 | 0.008323233 | 5.84E-05 | 0.008323233 | 1.768748148 | 51477
ENSG00000167280 | En | 6.451015 | 7.742110833 | 1.291095833 | 2.447138632 | 7.09E-05 | 0.009009148 | 7.09E-05 | 0.009009148 | 1.291095833 | 64772
ENSG00000054983 | En | 3.950183333 | 7.531426667 | 3.581243333 | 11.96910468 | 8.40E-05 | 0.009951944 | 8.40E-05 | 0.009951944 | 3.581243333 | 2581
ENSG00000008513 | En | 6.702956944 | 7.935448194 | 1.23249125 | 2.349723907 | 8.63E-05 | 0.010059905 | 8.63E-05 | 0.010059905 | 1.23249125 | 6482
ENSG00000175198 | En | 6.313908333 | 8.556496667 | 2.242588333 | 4.73245351 | 9.53E-05 | 0.01058427 | 9.53E-05 | 0.01058427 | 2.242588333 | 5095
ENSG00000119673 | En | 6.076504167 | 7.37894 | 1.302435833 | 2.466449645 | 9.87E-05 | 0.01068324 | 9.87E-05 | 0.01068324 | 1.302435833 | 10965
ENSG00000184005 | En | 5.121033333 | 6.938466667 | 1.817433333 | 3.52453598 | 0.000106542 | 0.010966345 | 0.000106542 | 0.010966345 | 1.817433333 | 256435
ENSG00000163754 | En | 7.105963333 | 9.165953333 | 2.05999 | 4.16983414 | 0.000120482 | 0.011531411 | 0.000120482 | 0.011531411 | 2.05999 | 2992
ENSG00000126264 | En | 6.687294444 | 8.290237222 | 1.602942778 | 3.037622895 | 0.000125061 | 0.011691684 | 0.000125061 | 0.011691684 | 1.602942778 | 10870
ENSG00000196502 | En | 7.036531111 | 9.30005 | 2.263518889 | 4.801612197 | 0.000137686 | 0.012416874 | 0.000137686 | 0.012416874 | 2.263518889 | 6817
ENSG00000185527 | En | 5.553538333 | 8.002308333 | 2.44877 | 5.459504427 | 0.00016152 | 0.013424253 | 0.00016152 | 0.013424253 | 2.44877 | 5148
ENSG00000164574 | En | 6.763128333 | 7.7879425 | 1.024814167 | 2.034697278 | 0.000185194 | 0.014508219 | 0.000185194 | 0.014508219 | 1.024814167 | 55568
ENSG00000067225 | En | 10.22047364 | 11.34921576 | 1.128742121 | 2.186680015 | 0.000185153 | 0.014508219 | 0.000185153 | 0.014508219 | 1.128742121 | 5315
ENSG00000056998 | En | 5.887866 | 8.236556 | 2.34869 | 5.09361529 | 0.000209733 | 0.015616513 | 0.000209733 | 0.015616513 | 2.34869 | 8908
ENSG00000197165 | En | 6.423324286 | 8.217031429 | 1.793707143 | 3.467046396 | 0.00024208 | 0.016816003 | 0.00024208 | 0.016816003 | 1.793707143 | 6799
ENSG00000189043 | En | 7.824286667 | 9.003066667 | 1.17878 | 2.263852558 | 0.000261637 | 0.017430384 | 0.000261637 | 0.017430384 | 1.17878 | 4697
ENSG00000182621 | En | 7.643203667 | 9.415810667 | 1.772607 | 3.416708102 | 0.000336372 | 0.019708994 | 0.000336372 | 0.019708994 | 1.772607 | 23236
ENSG00000233276 | En | 8.879731111 | 10.53635556 | 1.656624444 | 3.152779873 | 0.00034123 | 0.019800001 | 0.00034123 | 0.019800001 | 1.656624444 | 2876
ENSG00000136908 | En | 8.261582222 | 10.29223333 | 2.030651111 | 4.085892115 | 0.000362879 | 0.020267225 | 0.000362879 | 0.020267225 | 2.030651111 | 8818
ENSG00000182601 | En | 5.548533333 | 8.326338333 | 2.777805 | 6.858081267 | 0.000391522 | 0.020706009 | 0.000391522 | 0.020706009 | 2.777805 | 9951
ENSG00000017483 | En | 7.55413 | 9.102361667 | 1.548231667 | 2.924584486 | 0.00046156 | 0.023107631 | 0.00046156 | 0.023107631 | 1.548231667 | 92745
ENSG00000118402 | En | 7.319834444 | 8.975881111 | 1.656046667 | 3.151517484 | 0.000477661 | 0.023572077 | 0.000477661 | 0.023572077 | 1.656046667 | 6785
ENSG00000053371 | En | 8.741796667 | 10.02813833 | 1.286341667 | 2.439087757 | 0.000551609 | 0.025530913 | 0.000551609 | 0.025530913 | 1.286341667 | 8574
ENSG00000139629 | En | 8.827946061 | 10.04615788 | 1.218211818 | 2.326581649 | 0.000624148 | 0.02669763 | 0.000624148 | 0.02669763 | 1.218211818 | 11226
ENSG00000150768 | En | 7.273722778 | 8.661583333 | 1.387860556 | 2.616903193 | 0.000643874 | 0.027265716 | 0.000643874 | 0.027265716 | 1.387860556 | 1737
ENSG00000180011 | En | 8.622386667 | 10.18323667 | 1.56085 | 2.950276152 | 0.000763683 | 0.029858445 | 0.000763683 | 0.029858445 | 1.56085 | 284273
ENSG00000004864 | En | 6.083869048 | 8.008005714 | 1.924136667 | 3.795096753 | 0.000777083 | 0.030274263 | 0.000777083 | 0.030274263 | 1.924136667 | 10165
ENSG00000140374 | En | 8.671333333 | 9.890636667 | 1.219303333 | 2.328342562 | 0.000799335 | 0.030846084 | 0.000799335 | 0.030846084 | 1.219303333 | 2108
ENSG00000114480 | En | 7.359098056 | 8.772863333 | 1.413765278 | 2.664316139 | 0.000844445 | 0.03175414 | 0.000844445 | 0.03175414 | 1.413765278 | 2632
ENSG00000101846 | En | 9.246466667 | 10.41815667 | 1.17169 | 2.252754343 | 0.000862407 | 0.032115557 | 0.000862407 | 0.032115557 | 1.17169 | 412
ENSG00000103876 | En | 8.654008333 | 9.846961667 | 1.192953333 | 2.286202718 | 0.000870648 | 0.032199289 | 0.000870648 | 0.032199289 | 1.192953333 | 2184
ENSG00000141349 | En | 8.104913333 | 9.541003333 | 1.43609 | 2.705865257 | 0.001120531 | 0.036746456 | 0.001120531 | 0.036746456 | 1.43609 | 92579
ENSG00000160216 | En | 5.906705 | 7.820725 | 1.91402 | 3.768577339 | 0.001146319 | 0.037188559 | 0.001146319 | 0.037188559 | 1.91402 | 56894
ENSG00000128311 | En | 6.151493333 | 7.3064675 | 1.154974167 | 2.226803362 | 0.001184066 | 0.037877908 | 0.001184066 | 0.037877908 | 1.154974167 | 7263
ENSG00000143149 | En | 8.648609333 | 9.735004667 | 1.086395333 | 2.123428209 | 0.001225217 | 0.038371413 | 0.001225217 | 0.038371413 | 1.086395333 | 223
ENSG00000160752 | En | 9.592555 | 11.01180722 | 1.419252222 | 2.67446852 | 0.001235009 | 0.038472621 | 0.001235009 | 0.038472621 | 1.419252222 | 2224
ENSG00000143179 | En | 9.339323333 | 10.51800667 | 1.178683333 | 2.263700875 | 0.001338504 | 0.040231461 | 0.001338504 | 0.040231461 | 1.178683333 | 7371
ENSG00000152270 | En | 7.220595417 | 8.993825833 | 1.773230417 | 3.418184847 | 0.001373503 | 0.040746602 | 0.001373503 | 0.040746602 | 1.773230417 | 5140
ENSG00000100504 | En | 6.391890606 | 7.489673333 | 1.097782727 | 2.140255046 | 0.00164339 | 0.044737426 | 0.00164339 | 0.044737426 | 1.097782727 | 5836
ENSG00000106392 | En | 7.17008 | 8.825687778 | 1.655607778 | 3.150558892 | 0.001955335 | 0.048840237 | 0.001955335 | 0.048840237 | 1.655607778 | 56913
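The columns of Tables 3.10 and 3.11 appear to be related in a simple way: the log fold value equals Avg-C minus Avg-M, and the fold change is 2 raised to the magnitude of the log fold, carrying its sign. The following minimal sketch reproduces that relation. It assumes the averages are log2-scale expression values, which the tabulated numbers are consistent with but which is not stated here; the function names are illustrative only.

```python
def log_fold(avg_c: float, avg_m: float) -> float:
    """Log fold of CCRF-CEM (C) versus Molt-4 (M).

    Assumption: both averages are log2-scale expression values.
    """
    return avg_c - avg_m


def signed_fold_change(log2_fold: float) -> float:
    """Convert a log2 fold value to a signed linear fold change.

    Negative log folds map to negative fold changes, matching the
    sign convention visible in the FC_vs_M column.
    """
    magnitude = 2.0 ** abs(log2_fold)
    return magnitude if log2_fold >= 0 else -magnitude


if __name__ == "__main__":
    # First row of Table 3.10 (ENSG00000065154).
    lf = log_fold(avg_c=4.112782667, avg_m=8.375521333)
    print(lf, signed_fold_change(lf))
```

The "Largest FC" column then corresponds to the absolute value of the log fold.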
Table 3.12: Detection limits (derived from Paglia et al., 2012) for the definition of model bounds. For metabolites that were not captured in the paper, we queried HMDB using the displayed name.
Exchange metabolites | Exchange reaction | Theoretical mass (g/mol) | LOD (ng/mL) | LOD (mM) Input for
5-methylthioadenosine (MTA) | EX_5mta(e) | 298.0974 | 0.3 | 1.00638E-06
uridine | EX_uri(e) | 243.0617 | 1.7 | 6.99411E-06
choline | EX_chol(e) | 104.1075 | 2.8 | 2.68953E-05
nicotinamide | EX_ncam(e) | 123.0558 | 3 | 2.43792E-05
3-methyl-2-oxovalerate | EX_3mop(e) | 129.0552 | 3.5 | 2.71202E-05
succinate | EX_succ(e) | 117.0188 | 3.9 | 3.3328E-05
pantothenate | EX_pnto_R(e) | 220.1185 | 4 | 1.8172E-05
5-oxoproline | EX_5oxpro(e) | 128.0348 | 4.8 | 3.74898E-05
thiamin (Vitamin B1) | EX_thm(e) | 265.1123 | 6.1 | 2.30091E-05
p-aminobenzoate (PABA) | EX_anth(e) | 138.0555 | 7.7 | 5.57747E-05
trans-4-hydroxyproline | EX_4HPRO(e) | 132.0661 | 8.1 | 6.13329E-05
lactate | EX_lac_L(e) | 89.0239 | 10.9 | 0.000122439
3-methyl-2-oxobutyrate | EX_3mob(e) | 115.0395 | 11.2 | 9.73579E-05
histidine | EX_his_L(e) | 156.0773 | 13.6 | 8.71363E-05
tryptophan | EX_trp_L(e) | 205.0977 | 15.7 | 7.65489E-05
ornithine | EX_orn(e) | 133.0977 | 16.9 | 0.000126974
arginine | EX_arg_L(e) | 175.1195 | 24.8 | 0.000141618
threonine | EX_thr_L(e) | 120.0661 | 25.6 | 0.000213216
folate | EX_fol(e) | 440.1319 | 25.7 | 5.83916E-05
glutamine | EX_gln_L(e) | 147.077 | 28.4 | 0.000193096
pyridoxate | EX_4pyrdx(e) | 182.0453 | 32.7 | 0.000179626
serine | EX_ser_L(e) | 106.0504 | 37.5 | 0.000353605
glucose | EX_glc(e) | 179.0556 | 44 | 0.000245734
riboflavin (Vitamin B2) | EX_ribflv(e) | 377.1461 | 45 | 0.000119317
glutamate | EX_glu_L(e) | 148.061 | 45 | 0.000303929
tyrosine | EX_tyr_L(e) | 182.0817 | 47.4 | 0.000260323
phenylalanine | EX_phe_L(e) | 166.0868 | 48.4 | 0.000291414
myo-inositol | EX_inost(e) | 179.0556 | 59 | 0.000329507
cystine | EX_Lcystin(e) | 241.0317 | 59.7 | 0.000247685
leucine | EX_leu_L(e) | 132.1025 | 68.9 | 0.000521565
methionine | EX_met_L(e) | 150.0589 | 74.1 | 0.000493806
cysteine | EX_cys_L(e) | 122.0276 | 77 | 0.000631005
asparagine | EX_asn_L(e) | 133.0613 | 82.1 | 0.000617009
malate | EX_mal_L(e) | 133.0137 | 99.2 | 0.000745788
isoleucine | EX_ile_L(e) | 132.1025 | 112.9 | 0.000854639
pyruvate | EX_pyr(e) | 87.0082 | 121.3 | 0.001394121
lysine | EX_lys_L(e) | 147.1134 | 131.7 | 0.000895228
alanine | EX_ala_L(e) | 90.0555 | 133.5 | 0.001482419
citrate | EX_cit(e) | 191.0192 | 150.8 | 0.000789449
proline | EX_pro_L(e) | 116.0712 | 169.2 | 0.001457726
glycine | EX_gly(e) | 74.0242 | 214.3 | 0.002894999
aspartate | EX_asp_L(e) | 134.0453 | 229.5 | 0.001712108
4-hydroxyphenylpyruvate | EX_34hpp | 180.157 | 537.3 | 0.002982399
4-methyl-2-oxopentanoate | EX_4mop(e) | 130.142 | 3.5 | 2.68937E-05
betaine | EX_glyb(e) | 118.0868 | 2.8 | 2.37114E-05
Valine | EX_val_L(e) | 118.0868 | 28.2 | 0.000238807
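The millimolar detection limits in Table 3.12 follow from the mass concentration and the theoretical mass by a unit conversion. The sketch below shows that conversion; the function name is illustrative and not taken from the thesis.

```python
def lod_ng_per_ml_to_mm(lod_ng_per_ml: float, molar_mass_g_per_mol: float) -> float:
    """Convert a limit of detection from ng/mL to mM (mmol/L).

    1 ng/mL equals 1e-6 g/L; dividing by the molar mass gives mol/L,
    and multiplying by 1e3 gives mmol/L.
    """
    grams_per_litre = lod_ng_per_ml * 1e-6
    mol_per_litre = grams_per_litre / molar_mass_g_per_mol
    return mol_per_litre * 1e3


if __name__ == "__main__":
    # 5-methylthioadenosine and glucose rows of Table 3.12.
    print(lod_ng_per_ml_to_mm(0.3, 298.0974))
    print(lod_ng_per_ml_to_mm(44, 179.0556))
```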
Table 3.13: Calculation of the growth rates and definition of upper (ub) and lower bounds (lb) imposed on the CCRF-CEM model.
Time hrs | Viable Concentration | Stdev
0.5 | 4.47E+05 | 2.52E+04
9 | 6.20E+05 | 6.24E+04
24 | 1.20E+06 | 1.00E+05
48 | 2.17E+06 | 5.77E+04
Counted over hours | 47.5
Doubling time | 19.6
 | | -20% | 20%
 | 19.6 | 23.52 | 15.68
Growth rate | 0.035 | 0.029 | 0.044
 | | lb | ub
Table 3.14: Calculation of the growth rates and definition of upper (ub) and lower bounds (lb) imposed on the Molt-4 model.
Time hrs | Viable Concentration | Stdev
0.5 | 4.63E+05 | 5.86E+04
9 | 6.37E+05 | 8.96E+04
24 | 1.00E+06 | 8.39E+04
48 | 2.00E+06 | 2.00E+05
Counted over hours | 47.5
Doubling time | 22.0
 | | -20% | 20%
 | 22 | 26.4 | 17.6
Growth rate | 0.032 | 0.026 | 0.039
 | | lb | ub
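The growth rates in Tables 3.13 and 3.14 are consistent with the usual relation between growth rate and doubling time, growth rate = ln(2) / doubling time, with the doubling time varied by 20% in each direction: the longer doubling time gives the lower bound and the shorter one the upper bound. A minimal sketch of that calculation follows; the function names and the `tolerance` parameter are illustrative. How the doubling times themselves were derived from the cell counts is not reproduced here.

```python
import math


def growth_rate(doubling_time_h: float) -> float:
    """Specific growth rate (1/h) for a given doubling time in hours."""
    return math.log(2) / doubling_time_h


def growth_rate_bounds(doubling_time_h: float, tolerance: float = 0.2):
    """Return (lb, rate, ub) for the growth rate.

    A doubling time longer by `tolerance` yields the lower bound,
    a doubling time shorter by `tolerance` yields the upper bound.
    """
    lb = growth_rate(doubling_time_h * (1 + tolerance))
    ub = growth_rate(doubling_time_h * (1 - tolerance))
    return lb, growth_rate(doubling_time_h), ub


if __name__ == "__main__":
    for name, td in (("CCRF-CEM", 19.6), ("Molt-4", 22.0)):
        lb, rate, ub = growth_rate_bounds(td)
        print(name, round(lb, 3), round(rate, 3), round(ub, 3))
```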
Table 3.15: Lower bounds of commonly exchanged metabolites were adjusted according to the relation of change in uptake/secretion in the experiment.
direction | Exchange of metabolite | adjustment | relation | CCRF-CEM lb | Molt-4 lb
secr | EX_mal_L(e) | higher secretion | 23.4 | 0.04597 | 0.00196
secr | EX_3mop(e) | higher secretion | 4.15 | 0.00030 | 0.00007
secr | EX_4mop(e) | higher secretion | 3.95 | 0.00028 | 0.00007
secr | EX_cit(e) | higher secretion | 2.88 | 0.00599 | 0.00208
secr | EX_lac_L(e) | higher secretion | 1.44 | 0.00046 | 0.00032
secr | EX_3mob(e) | higher secretion | 1.2 | 0.00031 | 0.00026
secr | EX_orn(e) | higher secretion | 1.11 | 0.00037 | 0.00033
secr | EX_glu_L(e) | higher secretion | 1.35 | 0.00080 | 0.00108
secr | EX_gly(e) | higher secretion | 1.18 | 0.00763 | 0.00900
secr | EX_5oxpro(e) | higher secretion | 1.05 | 0.00010 | 0.00010
upt | EX_chol(e) | lower uptake | 0.48 | -0.05637 | -0.02706
upt | EX_glc(e) | lower uptake | 0.66 | -29.26278 | -19.31343
upt | EX_pyr(e) | lower uptake | 0.62 | -1.63303 | -2.63391
upt | EX_lys_L(e) | lower uptake | 0.72 | -0.51962 | -0.72169
upt | EX_phe_L(e) | lower uptake | 0.78 | -0.18675 | -0.23942
upt | EX_thr_L(e) | lower uptake | 0.85 | -0.37612 | -0.44250
upt | EX_tyr_L(e) | lower uptake | 0.89 | -0.30240 | -0.33977
upt | EX_trp_L(e) | lower uptake | 0.89 | -0.05743 | -0.06453
upt | EX_leu_L(e) | lower uptake | 0.99 | -0.99609 | -1.00615
upt | EX_ile_L(e) | no difference | 1 | -1.00615 | -1.00615
4 Contextualization Procedure and Modeling of Monocyte Specific TLR Signaling
Innate immunity is the first line of defense against invasion of pathogens. Toll-like
receptor (TLR) signaling is involved in a variety of human diseases extending far
beyond immune system related diseases, affecting a number of different tissues and
cell-types. Computational models often do not account for cell-type specific differences in signaling networks. Investigation of these differences and their phenotypic
implications could increase understanding of cell signaling and processes such as
inflammation. The wealth of knowledge for TLR signaling has been recently summarized in a stoichiometric signaling network applicable for constraint-based modeling and analysis (COBRA). COBRA methods have been applied to investigate tissue specific metabolism using omics data integration. Comparable approaches have
not been conducted using signaling networks. In this study, we present ihsTLRv2,
an updated TLR signaling network accounting for the association of 314 genes with
558 network reactions. We present a mapping procedure for transcriptomic data
onto signaling networks and demonstrate the generation of a monocyte specific TLR
network. The generated monocyte network is characterized through expression of
a specific set of isozymes rather than reduction of pathway contents. While further tailoring the network to a specific stimulation condition, we observed that the
quantitative changes in gene expression due to LPS stimulation affected the tightly
connected set of genes. Differential expression influenced about one third of the entire TLR signaling network, in particular, NF-κB activation. Thus, a cell-type and
condition specific signaling network can provide functional insight into signaling
cascades. Furthermore, we demonstrate the energy dependence of TLR signaling
pathways in monocytes.
4.1 Introduction
Toll-like receptors (TLRs) play a major role in innate immunity by sensing
pathogens and inducing the innate immune response [31]. Each TLR specifically recognizes one or more exogenous and endogenous ligands. Exogenous ligands are
highly conserved microbe-associated molecular patterns, e.g., CpG sequences within
DNA or lipopolysaccharide (LPS), a cell wall component of gram-negative bacteria. Upon stimulation, downstream pathways and transcription factors (TFs) are activated, which modify gene expression and protein levels and induce the production of
pro-inflammatory cytokines and chemokines, amongst others. Human cells express
up to ten TLRs [31]. LPS specifically induces TLR4 signaling pathways [31].
Disturbance of TLR signaling is thought to play a role in chronic inflammatory
diseases affecting cells of the gastrointestinal tract, the central nervous system, kidneys, skin, lungs, and joints [38]. TLRs also seem to be involved in both inhibiting
and promoting cancer [39]. TLR expression has been confirmed for a large number
of human tissues, yet the sets of expressed TLRs vary [211, 212, 213]. Activation of
differing downstream pathways has been suggested as a response to viruses and to TLR7
and TLR8 agonists in distinct monocyte subsets [214]. Differences and similarities
in the expression of isoforms, TLRs, and downstream pathways of the TLR network
in different cells can have important implications for the design of therapeutic approaches. Drugs targeting TLR signaling pathways have considerable therapeutic
potential in inflammatory diseases and cancer [42].
Monocytes are essential for the inflammatory response to microbial pathogens [215].
Blood circulating monocytes migrate into tissues and differentiate into a range of tissue macrophages and dendritic cells. However, monocytes themselves are involved
in the defense against pathogens as they possess an extensive set of pathogen receptors and produce large quantities of effector molecules [215, 216]. Aberrant
TLR signaling in the monocyte/macrophage cell lineage has been implicated in
chronic inflammatory and auto-inflammatory diseases [217]. However, the reason
for the increased IL-1β secretion observed in some of these diseases remains unknown
[217]. Taken together, understanding TLR signaling at cell-type and tissue specific
resolution seems to be of major importance for unraveling mechanisms underlying
disease development and progression.
Signaling networks comprise a complex meshwork of multiple pathways, feedback
loops, and cross-talk. Such complex networks may be best investigated using computational approaches. Constraint-based modeling and analysis (COBRA) techniques facilitate investigation of large-scale biological networks without depending on detailed kinetic and concentration information [2]. Instead, COBRA relies
on physicochemical constraints. A requirement for constraint-based modeling is
a genome-scale reconstruction, which is subsequently converted into a mathematical
format. The protocols for biochemical network generation are well established [9],
and tools to interrogate the model are freely available [15]. These networks are
applied to study metabolism under various conditions, yet in multi-cellular organisms this is complicated by the fact that individual cell-types are capable of only a limited
range of metabolic functions. Hence, automated procedures have been developed
that aim to tailor global genome-scale reconstructions to specific tissues and cell-types
based on 'omics' data sets [47, 131, 218]. COBRA procedures have also been
successfully applied to study other cellular processes, including transcription and
translation [6, 219], transcriptional regulation [220, 221], and signaling networks
[7, 8, 222, 223].
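For orientation, the constraints that COBRA methods rely on are commonly written as a steady-state mass balance over the stoichiometric matrix together with bounds on each reaction flux. The notation below (S, v, lb, ub, c) is the conventional one and is assumed here rather than taken from this chapter; the objective is shown only as the typical optional ingredient of a flux balance problem.

```latex
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& lb_j \le v_j \le ub_j \quad \text{for every reaction } j .
\end{aligned}
```

Here S is the stoichiometric matrix (compounds by reactions), v the vector of reaction fluxes, lb and ub the lower and upper bounds, and c a vector selecting the reaction or reactions to be optimized.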
The published generic TLR signaling network, ihsTLRv1 [7], represents a stoichiometric, predictive model comprising 963 reactions and 781 proteins. It includes
the input receptors TLR1-11, NOD1, NOD2, and Interleukin-1 receptor 1 (IL1R1).
These receptors are connected to up to six outputs, ROS, CREB, AP-1, IRF-7, IRF-3, and NF-κB (Table 4.5), through an extensive set of kinases and phosphatases.
Due to its coverage, it is ideal for investigating TLR signaling pathways on a broader
scale and for use as a context template for gene expression data sets. However, no
gene identifiers and no gene-reaction associations were included, such that mapping
of gene expression data and analysis of TLR signaling in a cell-type or disease specific context, in analogy to applications of metabolic networks, has not been possible so
far. Tissue specific differences in the cell response to environmental stimuli have
been recognized as a major challenge in cell signaling, yet many models of signaling
pathways neglect these cell-type specific differences [40].
The aim of the study was to explore the possibility of using COBRA methods and
the human TLR signaling network to investigate tissue and disease specific differences in TLR signaling. Therefore, we first identified the set of genes associated
with the reactions in ihsTLRv1, and we then generated an updated version of the
TLR signaling network (ihsTLRv2). We used the expression pattern of the identified
TLR genes in human blood derived monocytes to reduce ihsTLRv2 to only contain
the cell-type specific set of expressed isoforms, proteins, and reactions (Figure 4.2,
see File S1 below for details on the procedure). We then investigated the extent and
propagation of the changes induced through LPS stimulation on pathway utilization.
4.2 Results
4.2.1 Extensions of gene results in ihsTLRv2
Gene-reaction associations (GRAs), which connect each network reaction with the genes
encoding participating proteins, form the basis for cell-type or condition specific tailoring based on gene expression data. This contextualization was not possible with
ihsTLRv1 due to missing GRAs. We employed the NCBI Entrez gene database
[224], UniProtKB/Swiss-Prot [225], and primary literature to identify Homo sapiens specific genes and established GRAs using AND and OR Boolean logic.
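To illustrate how a GRA written in AND/OR Boolean logic can be used for tailoring, the sketch below evaluates a rule against a set of expressed genes. It assumes the common COBRA convention that AND joins genes whose products are all required (e.g., subunits of a complex) and OR joins alternative isoforms; the rule representation, the function name, and the gene names are hypothetical and are not the data structures used in this study.

```python
from typing import Set, Tuple, Union

# A rule is either a gene identifier or a tuple ("and" | "or", rule, rule, ...).
Rule = Union[str, Tuple]


def gra_is_satisfied(rule: Rule, expressed: Set[str]) -> bool:
    """Return True if the gene-reaction association is satisfied.

    A single gene is satisfied when it is in the expressed set;
    "and" requires every operand, "or" requires at least one.
    """
    if isinstance(rule, str):
        return rule in expressed
    operator, *operands = rule
    results = [gra_is_satisfied(operand, expressed) for operand in operands]
    if operator == "and":
        return all(results)
    if operator == "or":
        return any(results)
    raise ValueError(f"unknown operator: {operator!r}")


if __name__ == "__main__":
    # Hypothetical reaction: needs geneA plus either isoform geneB1 or geneB2.
    rule = ("and", "geneA", ("or", "geneB1", "geneB2"))
    print(gra_is_satisfied(rule, {"geneA", "geneB2"}))
    print(gra_is_satisfied(rule, {"geneB1", "geneB2"}))
```

Under this convention, a reaction whose rule evaluates to False for a given expression data set would be a candidate for removal from the cell-type specific network, while reactions carried by several isoforms remain as long as one isoform is expressed.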
ihsTLRv1 represented mammalian TLR signaling and included TLR1 through TLR11.
However, the human open reading frame for TLR11 contains multiple stop codons,
indicating that this receptor may not be expressed [226]. Therefore, we removed
TLR11, ten associated reactions (Table 4.7), and eight chemical compounds from
the network (Table 4.8). We added exchange reactions to resolve dead-ends, i.e.,
reactants that were only produced or only consumed, in the network (Table 4.9). Gene
extension and tailoring of receptor content led to the human gene-extended TLR
signaling network, termed ihsTLRv2.
In total, we included 314 genes in ihsTLRv2, of which 312 genes were identified for 178 unique chemical compounds, and two genes associated with a choline
uniport reaction were taken from the human metabolic reconstruction [45]. The choline
uniport transporter encoded by these two genes was not a chemical compound in ihsTLRv2.
The 178 unique chemical compounds can be divided into receptors (14), kinases
(64), phosphatases (7), and the remaining chemical compounds (93), also referred
to as other proteins. Receptors were only encoded by single genes, while isoforms
were much more common among the kinases (58%), the phosphatases (96%), and
the other proteins (63%) (Table 4.1). Overall, redundant genes comprised 55% of
the ihsTLRv2 gene content. We established GRAs for 558 of the 980 ihsTLRv2 reactions. A total of 291 modeling related reactions (i.e., sink, demand, and exchange
reactions) were not assigned with GRAs. The remaining reactions without GRAs
split into transport reactions of metabolites (37), TLR ligand expression, transport
and binding reactions (87), reactions involving generic chemical compounds (3),
and orphan chemical compounds (4). Ras family small GTP-binding protein generic
(Ras) genes were not included in the current version of ihsTLRv2 due to functional
ambiguity. The current version of ihsTLRv2 further did not include gene association for lipopolysaccharide-binding protein (LBP) due to its external origin. The
chemical compounds SRC (c-Src), SRCK (Src family kinase (generic)), and SRTK
(Src-related tyrosine kinase) were not unambiguously defined in ihsTLRv1. After
thorough literature review, we assigned one gene to SRC (c-Src), while we treated
SRCK and SRTK as the same chemical compound.
Table 4.1: Statistics of the gene extension of the generic human TLR model.

Group of             Chemical        Genes          Unique      Redundant
chemical compounds   compounds (n)   assigned (n)   genes (n)   genes (n)
Receptors            14              14             14          0
Kinases              64              100            42          58
Phosphatases         7               54             2           52
Proteins             93              144            83          61
Total                178             312            141         171

Redundant genes comprise all genes that are associated with chemical
compounds having isoforms.
4.2.2 Protein-Protein Interactions (PPI) in InnateDB and ihsTLRv2
A compendium of genes, proteins, and interactions specific to the innate immune response of humans and mice to microbial infection has been collected in the Innate Immunity database, InnateDB [227]. We compared the ihsTLRv2 genes to the
set of genes captured in the InnateDB PPIs to understand how the proteins involved in
TLR signaling are connected to each other. The query returned interactions among 242 of the 314 genes. The majority of genes without interactions
encoded isoforms, which were distributed mainly among 15 ihsTLRv2 chemical compounds.
Five genes were not present in InnateDB, namely calpain small subunit 2 (EntrezGene
ID: 84290), diacylglycerol kinase kappa (EntrezGene ID: 139189), and thioredoxin
reductase 1, 2, and 3 (EntrezGene IDs: 7296, 10587, and 114112, respectively).
We also computed the connectivity of the TLR network components based on the
number of network components that co-appear with them in the network reactions. From this
analysis, we excluded metabolites and ligands. We then ranked the ihsTLRv2 genes
according to their number of PPIs derived from the InnateDB query as well as according to their connectivity in the model and compared the two ranking lists (Table
4.2). As one may expect, the ranking order in the two lists was comparable for
highly connected gene products, even though the number of connections was much
smaller in the ihsTLRv2-based connectivity list. The ten most highly connected
genes were all involved in the MyD88-dependent signaling pathway, a pathway
employed by all TLRs except TLR3 [37]. In this pathway, MyD88 associates with a TLR
and recruits IL-1 receptor associated kinase 1 (IRAK1) and TNF receptor-associated
factor 6 (TRAF6) [228]. Poly-ubiquitination of TRAF6 is necessary for downstream
activation of IKK and of NF-κB [229]. For the comparison of the connectivity,
we removed 159 chemical compounds, some of which were the most highly connected chemical compounds in ihsTLRv2, including protons, adenosine diphosphate (ADP), and adenosine triphosphate (ATP). Their high connectivity emphasizes the energy requirements
of the network and its dependence on metabolic processes.
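The model-based connectivity described above can be sketched in a few lines. The sketch assumes that connectivity is the number of distinct partners co-appearing in at least one reaction and that tied components share a rank (dense ranking); both are assumptions, as are the toy reactions and the names `connectivity` and `dense_rank`.

```python
from collections import defaultdict

def connectivity(reactions, excluded=frozenset()):
    """Count distinct co-appearing partners per component, ignoring excluded ones."""
    partners = defaultdict(set)
    for members in reactions.values():
        kept = set(members) - set(excluded)
        for member in kept:
            partners[member] |= kept - {member}
    return {member: len(others) for member, others in partners.items()}

def dense_rank(scores):
    """Rank from highest to lowest score; ties share a rank, no ranks are skipped."""
    ordered = sorted(set(scores.values()), reverse=True)
    rank_of = {value: position + 1 for position, value in enumerate(ordered)}
    return {key: rank_of[value] for key, value in scores.items()}

# Toy reactions (not the ihsTLRv2 content); 'atp' stands in for an excluded metabolite.
toy_reactions = {
    "R1": ["MYD88", "IRAK1", "atp"],
    "R2": ["IRAK1", "TRAF6", "atp"],
    "R3": ["TRAF6", "IKK", "UBIQ"],
}
degree = connectivity(toy_reactions, excluded={"atp"})
ranks = dense_rank(degree)
```

The resulting ranks can then be set side by side with a ranking by interaction counts from an external resource.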
4.2.3 SNPs in the TLR signaling network
SNPs have been linked to human pathophysiology [230]. However, the complexity
of human genotype-phenotype relationships is challenging to elucidate. Although the
ultimate consequences of sequence variation are relatively easy to assess, changes
propagating through the entire cellular network might not be as obvious. For instance,
sequence variation can affect the kinetic properties of individual enzymes and/or their
expression levels. Mapping these changes to a metabolic network of the red blood
cell demonstrated the functional consequences of selected SNPs in cell function in
Table 4.2: Comparison between InnateDB interactions among ihsTLRv2 genes and
interactions of ihsTLRv2 network species within ihsTLRv2.

          InnateDB                                  ihsTLRv2
Ranking   Entrez    Gene     Interactions   Corresponding          Connections   Ranking (model
          Gene ID   Symbol   (n)            chemical compound      (n)           connectivity)
1         7189      TRAF6    180            TRAF6-D[c]             14            5
2         5970      RELA     161            NFKB(p50/p65)[c]       13            6
3         7316      UBC      143            UBIQ[c]                23            2
4         3551      IKBKB    137            IKK[c]                 25            1
5         3654      IRAK1    136            IRAK1_TIFA-2P3U[c]     16            4
6         1147      CHUK     133            IKK[c]                 25            1
7         4790      NFKB1    130            NFKB(p50/p65)[c]       13            6
8         8517      IKBKG    123            IKK[c]                 25            1
9         4792      NFKBIA   121            NFKB_IKBA[c]           5             13
10        4615      MYD88    119            MYD88-D[c]             16            4

Genes (InnateDB) and corresponding chemical compounds (ihsTLRv2) were
ranked according to their number of interactions. Highly ranked genes were also highly
connected in ihsTLRv2, despite the smaller number of connections.
chronic and non-chronic anemia patients [231]. In a similar manner, testing the
consequences of functional loss of proteins might offer insights into downstream
effects of SNPs in TLR signaling. In total, we identified SNPs linked to known
clinical phenotypes for 12 distinct genes. We simulated how loss of
protein function of those genes, due to the presence of SNPs, affected the input-output
(I/O) relationships in ihsTLRv2 (Table 4.3). Four of these genes code for the receptors TLR1, TLR3, NOD1, and NOD2. In silico knock-out (KO) of these four
genes disabled the receptor-dependent I/O pathways. The TIR domain containing adaptor protein (TIRAP), another gene with disease-linked SNPs, specifically mediates
the MyD88-dependent pathway via TLR2 and TLR4 [228]. The KO of TIRAP led
to complete disruption of the downstream pathways of TLR1/2 and TLR2/6, disabling
all outputs induced by these inputs. However, TLR4 signaling was not affected in
our simulations. In addition to MyD88-dependent signaling, TLR4 also induces
a MyD88-independent pathway through the TIR domain-containing adaptor (TRIF).
Both pathways induce activation of NF-κB, although they are distinguished as early and late
response [228]. Our constraint-based steady-state simulation approach does not
contain a time component. Thus, differential effects of early and late response could
not be resolved, and fluxes were redirected in the case of TLR4 signaling. To mimic
fast and slow responses, one could, for example, add constraints on the upper bounds
of the reactions involved in the slow signaling pathway.
In our simulations, the outputs NF-κB and reactive oxygen species (ROS) were
each selectively disrupted through the KO of some disease-related SNP genes. For
instance, ROS production was disabled through KO of either of two subunits of NADPH
oxidase, neutrophil cytosolic factor 2 (NCF2) or p22-phox protein (CYBA). NF-κB
activation was selectively disabled through the KO of the alpha subunit of inhibitor
of kappa light polypeptide gene enhancer in B-cells kinase (IKK), which phosphorylates IkBα and activates NF-κB [228].
We found that the KO of four of the 12 genes was insufficient to influence the flux distributions, as these genes encoded exchangeable subunits of phosphatidic acid phosphatase, protein kinase C, protein kinase A, and an A20-binding inhibitor of NF-κB
activation homolog. The KO of any of these four genes did not elicit any effect in
the simulation because the model used an alternative isoform. However, in a particular tissue the
actual number of expressed isoforms may be limited, and thus a phenotype could be
observed using a more tailored, cell-type specific TLR network. Taken together, our
analysis provided some insight into the possible effects of SNP-dependent protein
KO. The results also demonstrate the need for contextualization to better resemble
the actual conditions in certain tissues or disease states.
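Why a KO can remain silent when isoforms exist can be illustrated with a qualitative sketch. Note that it is a simple reachability check and not the constraint-based flux simulation used in this work; the toy pathway, the gene names, and the function `reachable` are hypothetical. Each reaction carries a list of alternative gene sets (isoforms), and every gene within one set is required.

```python
def reachable(reactions, inputs, knocked_out=frozenset()):
    """Propagate availability from the inputs through all still-active reactions."""
    available = set(inputs)
    knocked_out = set(knocked_out)
    changed = True
    while changed:
        changed = False
        for substrates, products, isoform_sets in reactions:
            # A reaction is active if at least one alternative gene set is intact.
            active = any(not (set(genes) & knocked_out) for genes in isoform_sets)
            if active and set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available

# Toy pathway: receptor binding, a kinase step with two isoforms, a two-subunit enzyme.
toy_pathway = [
    (("ligand", "receptor"), ("complex",), [("geneR",)]),
    (("complex",), ("signal",), [("geneK1",), ("geneK2",)]),
    (("signal",), ("output",), [("geneA", "geneB")]),
]
inputs = {"ligand", "receptor"}

wild_type = "output" in reachable(toy_pathway, inputs)
isoform_ko = "output" in reachable(toy_pathway, inputs, {"geneK1"})
subunit_ko = "output" in reachable(toy_pathway, inputs, {"geneA"})
receptor_ko = "output" in reachable(toy_pathway, inputs, {"geneR"})
```

In this toy case the output survives the loss of one kinase isoform but not the loss of the receptor or of an obligatory subunit.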
4.2.4 Tissue specific TLR expression
TLR expression is cell-type specific, reflecting distinct functions and exposure
to pathogens. In order to assess these differences, not only for TLR expression but for all proteins involved in TLR signaling, we obtained protein expression
information for the ihsTLRv2 gene set from the HPA for 66 normal cell types [190].
Data could be obtained for 77 ihsTLRv2 genes (24.5%), which were present in at least one cell
type. These 77 genes encoded 66 distinct proteins. Gene products with moderate/medium and strong/high expression were assumed to be present, while the other
gene products were assumed to be absent. On average, each tissue expressed 40
proteins, with the lowest number of proteins expressed in ovarian stroma cells (18)
and the highest number in lung macrophages (53). A large number of airborne pathogens inevitably reach the lung with inhaled air
each day [232]. Expression of a high number of proteins involved in TLR signaling
by lung tissue macrophages might reflect such tissue-specific, constant pathogen
exposure. However, gene coverage was on average only 13% per tissue.
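The presence/absence calls and the per-tissue coverage can be sketched as follows. The level labels follow the wording above (moderate/medium and strong/high count as present); the toy expression data, the other level labels, and the names `presence_calls` and `summarize` are hypothetical.

```python
PRESENT_LEVELS = {"moderate", "medium", "strong", "high"}

def presence_calls(expression):
    """Map each cell type to the set of genes called present."""
    return {
        cell_type: {gene for gene, level in genes.items() if level.lower() in PRESENT_LEVELS}
        for cell_type, genes in expression.items()
    }

def summarize(calls, n_model_genes):
    """Return per-cell-type counts, their mean, and the mean coverage of the gene set."""
    counts = {cell_type: len(genes) for cell_type, genes in calls.items()}
    mean_count = sum(counts.values()) / len(counts)
    return counts, mean_count, mean_count / n_model_genes

# Toy data for three genes in two cell types (not HPA values).
toy_expression = {
    "cell_type_1": {"g1": "strong", "g2": "weak", "g3": "medium"},
    "cell_type_2": {"g1": "negative", "g2": "moderate", "g3": "weak"},
}
calls = presence_calls(toy_expression)
counts, mean_count, coverage = summarize(calls, n_model_genes=314)
```

With 40 proteins present on average and 314 model genes, the same ratio gives the coverage of roughly 13% quoted above.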
Using clustering of cell types and genes based on the Euclidean distance measure
and subsequent visual inspection, we separated the genes into five clusters with distinct abundance among cell types (Figure 4.1). Clusters one and three contained
sparsely expressed genes, which were on average expressed in 12 and 17 tissues, respectively.
The genes of cluster one were found to be almost exclusively expressed in lymphoid and hematopoietic cell types. In contrast, frequently expressed genes accumulated in the second, fourth, and fifth clusters, which were expressed on average in 41, 61, and
47 cell types, respectively. Cluster four consisted of 15 genes expressed in 92%
Table 4.3: ihsTLRv2 genes with clinically linked SNPs, corresponding clinical phenotypes, and consequences of in silico knock-out on ihsTLRv2
function.

Entrez    Gene       Phenotype    Phenotype                                                      KO effect
gene ID   symbol     MIM number
9663      LPIN2      609628       Majeed syndrome                                                no
7096      TLR1       613223       Leprosy, protection against                                    TLR1/10, TLR1/2 signaling
                     613223       Leprosy, susceptibility to                                     TLR1/10, TLR1/2 signaling
64127     NOD2       609464       Sarcoidosis, early-onset                                       NOD2 signaling
                     186580       Blau syndrome                                                  NOD2 signaling
                     266600       Inflammatory bowel disease 1                                   NOD2 signaling
                     607507       Psoriatic arthritis, susceptibility to                         NOD2 signaling
5573      PRKAR1A    101800       Acrodysostosis with hormone resistance                         no
                     160980       Carney complex, type 1                                         no
                     255960       Myxoma, intracardiac                                           no
                     610489       Pigmented nodular adrenocortical disease, primary              no
                     188550       Thyroid carcinoma, papillary, somatic                          no
4688      NCF2       233710       Chronic granulomatous disease due to deficiency of NCF-2       ROS production
5578      PRKCA      612967       Body mass index QTL 15                                         no
7128      TNFAIP3    612378       Systemic lupus erythematosus, susceptibility to                no
1147      CHUK       612363       Plasma level of alanine aminotransferase, QTL 1                NF-κB
1535      CYBA       233690       Chronic granulomatous disease, autosomal, due to               ROS production
                                  deficiency of CYBA
114609    TIRAP      606252       Bacteremia, protection against                                 TLR1/2, TLR2/6 signaling
                     611162       Malaria, protection against                                    TLR1/2, TLR2/6 signaling
                     610799       Pneumococcal disease, invasive, protection against             TLR1/2, TLR2/6 signaling
                     607948       Tuberculosis, protection against                               TLR1/2, TLR2/6 signaling
7098      TLR3       613002       Herpes simplex encephalitis, susceptibility to                 TLR3 signaling
10392     NOD1       266600       Inflammatory bowel disease 1                                   NOD1 signaling

Listed are those genes for which SNPs with a clinical phenotype could be identified,
as provided by the Exome Variant Server (URL: http://evs.gs.washington.edu/EVS/).
of the cell types. After hierarchical clustering of the presence/absence data, we
divided the different cell types into two obvious clusters based on the number of
expressed genes. The first cluster contained cells with high numbers of expressed genes, mostly neuronal cells, glandular cells, and hematopoietic and lymphoid cell
types. The data mapping revealed one example of tissue specific isoform expression: calcium/calmodulin-dependent protein kinase II beta (CaMK-II subunit
β) was exclusively expressed in six neuronal cell types. However, neural
glial cells did not express CaMK-II subunit β. Four isoforms of CaMK-II (α, β,
γ, and δ), expressed from separate genes, exist in mammals, two of which are brain
specific (α and β) [233]. While our HPA data accurately captured the β isoform
specific expression, no reliable tissue specific expression of the other isoforms could
be observed.
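The clustering of presence/absence profiles by Euclidean distance can be sketched with a naive agglomerative procedure. The linkage criterion is not stated above, so average linkage is an assumption here, as are the toy profiles and the names `euclidean` and `agglomerate`.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two equally long presence/absence vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        distances = [euclidean(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(distances) / len(distances)

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters

# Toy gene profiles across four cell types (1 = present, 0 = absent).
toy_profiles = {
    "g1": [1, 1, 1, 1],
    "g2": [1, 1, 1, 0],
    "g3": [0, 0, 0, 1],
    "g4": [0, 0, 0, 0],
}
gene_clusters = agglomerate(toy_profiles, n_clusters=2)
```

Broadly expressed genes (g1, g2) and sparsely expressed genes (g3, g4) end up in separate clusters; transposing the profiles would cluster cell types instead.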
Overall, analysis of the expression pattern of the TLR signaling network specific
gene set using HPA data provided evidence of distinct expression patterns of genes
and isoforms at the tissue and cell level, e.g., for brain or lymphoid tissues, which
we found to be in agreement with experimental data. However, the low coverage of the
ihsTLRv2 genes in the HPA data would not be sufficient for the generation of tissue-specific networks of TLR signaling.
Figure 4.1: Expression of ihsTLRv2 gene products in normal human tissues. Clustering revealed distinct expression patterns of the ihsTLRv2 genes with respect to
genes and tissues. Genes divided into five clusters with distinct mean abundance
across tissues. The gene clusters were on average expressed in 12, 41, 17, 61, and 47
tissues. Clustering of tissues and cell-types revealed two clusters. The first cluster (left)
terminates after the closely assembled lymphoid cell-types. Within the first tissue
cluster, certain groups of related cell-types, such as lymphoid or CNS neuronal cell-types, clustered close together. Lymphoid cell-types selectively expressed genes of
the first gene cluster. Tissues and cell-types in the second cluster expressed genes of
the third and fifth gene cluster less frequently.
4.2.5 Protein abundance of ihsTLRv2 in cancer cell lines
After characterizing distinct expression patterns among cell types, we were interested in the abundance of proteins involved in TLR signaling within cells. Recent
advances in high-resolution mass spectrometry (MS)-based proteomics have enabled
large-scale investigation of cellular concentrations of proteins [110, 111]. Although the
abundance of distinct protein categories, such as metabolic enzymes, transcription
factors, and kinases, has been assessed [111], context-dependent analysis of a signaling network comprising different protein categories (receptors, kinases, phosphatases) has not been performed yet. Using our well-defined, well-curated TLR
signaling network, we asked whether our specific set of signaling proteins
in general and TLR receptors in particular were differentially abundant in different
cell types. To address this question, we employed data from two recent studies,
which conducted large-scale measurements of protein abundance in two cell lines,
cervical cancer originating HeLa cells [111] and human osteosarcoma originating
U2OS cells [110]. We identified 164 ihsTLRv2 model gene products in the HeLa
cell line data and 155 ihsTLRv2 gene products in the U2OS cell data. In HeLa cells, the concentrations of the ihsTLRv2 gene products ranged from 27 copies per cell
(serine/threonine-protein phosphatase 2A regulatory subunit B’, beta) up to 14.5
million copies per cell for ubiquitin. While the entire HeLa proteome ranged from
0.2 to 33 million copies per cell, ihsTLRv2 gene products also covered almost the
entire concentration range, with ubiquitin being the 10th most abundant protein in
the HeLa proteome. In the U2OS cells, measured concentrations in the entire U2OS
proteome ranged from <500 up to >20 million copies per cell. The concentrations
of ihsTLRv2 gene products ranged from <500 up to 11.7 million copies per cell.
The most abundant one was the heterogeneous nuclear ribonucleoprotein A1 (HNRNPA1),
which was among the 30 most abundant proteins in the U2OS proteome.
We observed similarities in the protein concentrations of ihsTLRv2 gene products between
the two cell lines, for proteins with high and low concentrations. One example was
the heat shock protein beta-1 (Hsp27). Heat shock proteins are known to be overexpressed in human cancer cells, and Hsp27 expression has been associated with
poor prognosis in several types of cancer [234]. Therefore, it is not surprising to
find Hsp27 expressed in high quantities in both cancer cell lines. On the other end
of the scale, a subunit of serine/threonine-protein phosphatase 2A was expressed in
very low concentrations. Although the very same regulatory subunit was not part of
the data we derived from the U2OS cell data set, we found alternate subunits of the
serine/threonine-protein phosphatase 2A among the proteins with low concentrations (<500
copies). A total of 118 ihsTLRv2 model gene products were identified in both cell
lines, corresponding to 38% of all ihsTLRv2 genes. This high overlap indicates
that the majority of the TLR network is expressed in these two cell lines but isoform usage may change. Indeed, shared gene products made up 50% of the chemical compounds in ihsTLRv2. Considering the proteins expressed in each cell line,
59% of the ihsTLRv2 chemical compounds were expressed in U2OS cells and 62% in
HeLa cells. Expression of distinct/additional isoforms was observed in a number of
cases, such as the 14-3-3 protein family, casein kinase 2 (CK2), phosphoinositide
3-kinase (PI3K1A), and protein kinase A (PKA). Both cell types expressed additional isotypes of the 14-3-3 proteins, which are involved in a wide range of cellular
functions [235]. The 14-3-3 proteins can modulate interactions between proteins.
For example, different 14-3-3 isotypes mediate complex formation of Raf-1 with
distinct PKC isotypes, leading to tissue-specific differences of the resulting complex
[236].
In the previous section, CaMK-II subunit β was observed to be specifically expressed in neurons of the central nervous system (CNS). In our current analysis,
both cell lines expressed CaMK-II subunit δ. Additionally, HeLa cells expressed
CaMK-II subunit γ. Both isoforms have been reported to be expressed in numerous tissues in rat [237].
The only TLR receptor expressed was TLR9 in HeLa cells. Over-expression of
TLR9 has previously been reported in lung cancer and HeLa cells, and it has been
suggested that it might contribute to cancer proliferation, although the mechanism
is unknown [238]. Apart from this, cervical cells seem to lack TLR expression in
the absence of infection, possibly preventing excessive activation of TLR signaling
through the normal vaginal flora [239]. Bacterial, viral, or protozoan infection in the
cervix has been associated with the development of cervical cancer [239]. However,
human papillomavirus infection alone is insufficient to cause cervical cancer [239].
On the other hand, no information was found on TLR signaling in U2OS cells.
Apart from the similarities discussed above, no correlation was observed between
the expression levels of the entire set of 118 overlapping proteins, indicating that the
usage of the TLR signaling network in these two cell lines is quite distinct.
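A comparison of this kind can be sketched as an overlap followed by a correlation of log-transformed copy numbers. The correlation measure is not specified above; Pearson correlation on log10 values is an assumption, and the toy copy numbers and the names `pearson` and `compare_abundance` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def compare_abundance(copies_a, copies_b):
    """Return the shared proteins and the correlation of their log10 copy numbers."""
    shared = sorted(set(copies_a) & set(copies_b))
    log_a = [math.log10(copies_a[protein]) for protein in shared]
    log_b = [math.log10(copies_b[protein]) for protein in shared]
    return shared, pearson(log_a, log_b)

# Toy copy numbers per cell (not measured values); P4 and P5 are cell-line specific.
toy_cell_line_1 = {"P1": 100, "P2": 10_000, "P3": 1_000_000, "P4": 50}
toy_cell_line_2 = {"P1": 1_000_000, "P2": 10_000, "P3": 100, "P5": 70}
shared, r = compare_abundance(toy_cell_line_1, toy_cell_line_2)
```

A coefficient close to zero over the shared proteins would correspond to the absence of correlation reported above; the toy data are deliberately anti-correlated to keep the arithmetic easy to follow.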
4.2.6 Generation of a draft monocyte specific TLR model based on gene expression data
In order to derive a monocyte specific model of TLR signaling (ihsMonoTLR), we
mapped gene expression data from untreated monocytes [240] onto the network.
To find a suitable cutoff for distinguishing between presence and absence of expressed
genes, we generated draft reconstructions based on two different cutoffs, p≤0.01
and p≤0.05. A set of 37 genes received absent calls solely for the more stringent
cutoff. The cutoff had a major impact on the number of dead-end metabolites and
blocked reactions, i.e., reactions that cannot carry any flux in the network due to
dead-end metabolites (Figure 4.3). The decision for a particular cutoff therefore has
a major impact on the network capabilities as well as on the time required for
manual curation of the model to ensure similar functionality as in the cell. Protein expression data were obtained for 23 genes in two monocytic leukemia cell
lines (THP-1 and U-937) [241]. Most of the genes (17) were moderately expressed,
which was mostly the case for both cell lines. The remaining six gene products had
not been detected using immunohistochemistry, four of which were absent in both
cell lines, while two gene products were only expressed in one of the cell lines.
There was no correspondence between statistical detection probability and negative
detection that would make us favor the more stringent cutoff, p≤0.01 (Figure 4.3).
A literature search yielded experimental evidence for the presence of four genes (Table 4.10). Since the majority of genes rejected by the stringent cutoff were found to
be present in monocytes based on immunohistochemistry and literature evidence,
we proceeded with our network tailoring using the p≤0.05 cutoff, as it seemed more
suitable for monocytes and the given data set.
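The effect of the two cutoffs on the presence calls can be sketched as a set comparison. The sketch assumes one detection p-value per gene, with a gene called present when its p-value does not exceed the cutoff; the toy p-values and the name `present_genes` are hypothetical.

```python
def present_genes(detection_p, cutoff):
    """Return the genes whose detection p-value passes the given cutoff."""
    return {gene for gene, p_value in detection_p.items() if p_value <= cutoff}

# Toy detection p-values (not taken from the monocyte data set).
toy_detection_p = {"g1": 0.001, "g2": 0.03, "g3": 0.2, "g4": 0.01}

stringent = present_genes(toy_detection_p, cutoff=0.01)
moderate = present_genes(toy_detection_p, cutoff=0.05)

# Genes called absent only under the stringent cutoff; these are the candidates
# to check against independent protein expression or literature evidence.
absent_only_when_stringent = moderate - stringent
```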
4.2.7 Literature based curation of the draft monocyte specific TLR model
Literature provided evidence for the production of all six outputs in human monocytes [214, 242, 243, 244]. Using flux variability analysis (FVA) [11, 162] on the
draft monocyte TLR model, we found that ROS production was only partly possible through one of two defined output reactions, and that NF-κB production was
completely impaired (Table 4.6). We completed the corresponding output pathways
by adding the protein kinase C, zeta (EntrezGene ID: 5590), which recovered both
outputs (Table 4.11; see also Methods section). This gene product is known to be
important in NF-κB activation, and its presence in U-937 cells has been demonstrated [245]. Three genes (Entrez Gene ID: 815-818), encoding isoforms of
CaMKII, had a direct impact on the model's output capabilities. Reincorporating the
genes encoding CaMKII resulted in a major increase in CREB output production
(Table 4.11). We further curated ihsMonoTLR based on known monocyte function,
instead of relying solely on a pathway-driven approach. Only genes that were absent in ihsMonoTLR were considered, while isoforms of already captured genes
were ignored. Subsequently, 14 genes were reintroduced to the ihsMonoTLR network based on literature support (Table 4.12).
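For orientation, FVA in its basic form can be written as a pair of linear programs per reaction. The notation below is the standard constraint-based notation and is an assumption here, not taken from this section: S is the stoichiometric matrix, v the flux vector, and v^lb and v^ub the lower and upper flux bounds. Any additional constraints used in the actual analysis are not shown.

```latex
\begin{aligned}
v_i^{\max} &= \max_{v} \; v_i, \qquad v_i^{\min} = \min_{v} \; v_i \\
\text{subject to} \quad & S\,v = 0, \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}} .
\end{aligned}
```

An output reaction with a maximum of zero cannot carry flux under the given constraints, which is how an impaired output shows up in such an analysis.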
The final monocyte specific TLR signaling network, ihsMonoTLR, contained 62
fewer genes than the generic TLR signaling network, ihsTLRv2 (Figure 4.2). The
gene reduction mainly affected the presence and absence of redundant genes, while
the signaling pathways mostly remained complete. The genes absent in the final
hMonoTLR model encode proteins of 22 chemical compounds in the network (Table 4.4). We found a large decrease in the number of expressed isozymes, e.g., for calpains, which are ubiquitously expressed, calcium-dependent cysteine proteases.
Their functions include, among others, pro-IL-1 processing [246]. In total, nine out
of the 16 calpain genes were found not to be expressed in the monocytes. Despite
the reduced number of genes, functional calpain complexes could still be assembled
and the model could produce IL-1.
During the curation step, the number of dead-end metabolites and blocked reactions
was reduced (Figure 4.3B). The output capabilities of ihsMonoTLR remained equal
to those of ihsTLRv2 (Table 4.6).
Figure 4.2: Workflow leading from ihsTLRv1 to a data driven monocyte and LPS
stimulated monocyte model. The workflow describes the process of generating cell-type specific, and subsequently cell and condition specific, models of TLR signaling
in four steps. (1) In the first step, Homo sapiens genes and gene-reaction associations were added to the model. Further, reactions and chemical compounds connected to the signaling of TLR11 were deleted and exchange reactions added. (2)
Transcriptomic data were mapped to the model, leading to preliminary monocyte
specific models of TLR signaling using different cutoffs during the mapping process. (3) The most suitable preliminary model was chosen based on comparison
with cell-type specific proteomic data (HPA) and literature evidence. Manual curation was essential to ensure monocyte specific input-output capabilities of the final
monocyte model ihsMonoTLR. (4) Transcriptomic data derived from LPS stimulated
monocytes were mapped to ihsMonoTLR to tailor the model to this specific condition.
Statistics on the network sizes at each stage reveal how the network size remained comparable while the gene content was reduced with increasing modeling resolution.
TLR3 and TLR10 were absent in the monocyte specific model, while all other TLRs
and NODs were present (Table 4.5), in agreement with literature [212, 214, 247,
248, 249].
The wealth of supporting literature evidence for TLR signaling specific components
in monocytes underlines that we defined a biologically meaningful cutoff for generating a monocyte model of TLR signaling. We also demonstrated that the monocyte
specific model generation required substantial manual curation after gene expression data mapping to reflect the known cell-type specific receptor content well.
Figure 4.3: Definition of the cutoff for the initial monocyte draft model. A. The procedure for the generation of the monocyte specific model was divided into two parts.
First, a suitable cutoff was defined for mapping the gene expression data. To this end, preliminary monocyte models were generated for two cutoffs (p≤0.01 and
p≤0.05). Both cutoffs led to high numbers of blocked reactions and dead-end nodes
in the networks. We identified the set of genes only absent for the more stringent
cutoff, validated expression of the gene products using the Human Protein Atlas immunohistochemical data of two monocytic leukemia cell lines (THP-1 and
U-937), and chose the cutoff that represented monocyte protein expression
best. The second part of the procedure concerned the assurance of monocyte specific network functionality. Input and output capabilities of the monocyte model
were curated according to cell-type specific literature evidence. B. Statistics of the
number of deleted genes, reactions constrained during the data mapping, blocked
reactions, and dead-end nodes in the preliminary monocyte models, the curated
monocyte model (ihsMonoTLR), and the LPS stimulation specific monocyte model
(ihsMonoTLR_LPS). C. Graph illustrating the detection probability of the genes
absent for the stringent and present for the moderate cutoff. Genes are colored according to whether they were expressed (red), they were not expressed (blue), no
data were available (pale), or data differed between the cell lines (purple). In
many cases, no data were available, or the proteins were expressed in the cell lines.
Only in a few cases were the genes not expressed in any of the cell lines. Also, absent
gene expression was distributed across the entire range of the thresholds, such that
no intermediate cutoff could be established. As a result, the monocyte model was
based on the more moderate cutoff.
Table 4.4: Distribution of absent genes

Chemical compound                               Genes absent   Genes encoding      Group of
                                                in hMonoTLR    chemical compound   chemical compound
Ajuba                                           1              1                   Protein
Kinase suppressor of RAS 2                      1              1                   Kinase
Toll-like receptor 10                           1              1                   Receptor
Toll-like receptor 3                            1              1                   Receptor
beta-transducin repeat containing protein 2     1              2                   Protein
A20-binding inhibitor of NF-κB activation       1              3                   Protein
Sarco/endoplasmic reticulum Ca(2+)-ATPase       1              3                   Kinase
Serum/glucocorticoid regulated kinase           1              3                   Kinase
Thioredoxin reductase                           1              3                   Protein
Ubiquitin-conjugating enzyme E2D                1              3                   Protein
cAMP responsive element binding protein         1              4                   Protein
Ubiquitin                                       1              4                   Protein
Protein phosphatase 2B                          1              5                   Phosphatase
Protein kinase A                                1              7                   Kinase
Choline uniport                                 2              2                   Metabolite transporter*
Phosphoinositide 3-kinase                       2              6                   Kinase
MAP kinase phosphatase                          3              16                  Phosphatase
Protein phosphatase 2A                          3              17                  Phosphatase
Src family kinase/Src-related tyrosine kinase   5              10                  Kinase
Diacylglycerol kinase (generic)                 6              10                  Kinase
Histone H3                                      9              12                  Protein
Phosphatidic acid phosphatase (generic)         9              14                  Phosphatase
Calpain                                         9              16                  Protein

*The metabolite transporter did not have a chemical compound, as the genes were only
added to the transport reaction.
4.2.8 Tailoring the monocyte TLR model to a LPS stimulation specific model
In monocytes, TLR4 stimulation activates several signaling pathways and transcription factors (TFs) and induces inflammatory gene expression programs [242].
In order to investigate this distinct network state, we used gene expression data
from the aforementioned experiment [240] to tailor ihsMonoTLR to this specific condition.
Two ihsMonoTLR genes, pellino homolog 3 (PELI3) and TLR6, were no longer
expressed upon LPS stimulation. It has been experimentally shown that LPS stim-
Table 4.5: Inputs and outputs covered by generic (ihsTLRv2) and monocyte specific (hMonoTLR & hMonoTLR_LPS) TLR signaling
models. *IRF7 can only be produced after stimulation of combinations of TLR receptors, TLR3 or TLR4 combined with either TLR7,
TLR8 or TLR9.

Ligand abbrev.   Ligand name                                     Receptor type activated
26dap-LL         diaminopimelic acid                             NOD1
ALPS             atypical lipopolysaccharide                     TLR2
BDFN2            beta defensin 2                                 TLR4
BPM              bropirimine                                     TLR7
CPGCIGC          CpG chromatin IgG2a complexes                   TLR9
CSGA             CsgA                                            TLR2
DCLDLPP          diacetylated lipopeptides                       TLR2/6
DCLLPP           diacyl lipopeptides                             TLR2/6
DSRNA            double stranded RNA                             TLR3
ENVP             envelope protein                                TLR4
FBNG             fibrinogen                                      TLR4
FLGN             flagellin                                       TLR5
FUSP             fusion protein                                  TLR4
GCSPL            glycoinositol phospholipids                     TLR2
GLC              glycolipids                                     TLR2
HSP60            heat shock protein (60 kDa)                     TLR4
HSP70            heat shock protein (70 kDa)                     TLR2, TLR4
IMQ              imidazoquinoline                                TLR7, TLR8
LAM              lipoarabinomannan                               TLR2
LP               lipoprotein                                     TLR2
LPPS             lipopeptides                                    TLR2
LPS_HS           lipopolysaccharide (Homo sapiens)               TLR2, TLR4
LTA              lipoteichoic acid                               TLR2/6, TLR2
LXR              loxoribine                                      TLR7
MRAP             mannuronic acid polymer                         TLR2, TLR4
MRDP             muramyl dipeptide                               NOD2
MRNA             mRNA                                            TLR3
OLSCHYA          oligosaccharides of hyaluronic acid             TLR4
OMPA             outer membrane protein A                        TLR2
OSPALP           outer surface protein A                         TLR2/6
IL1A             IL-1A                                           IL1R1
IL1B             IL-1B                                           IL1R1
PRNS             porins                                          TLR2
PSCHPS           polysaccharide fragment of heparan sulphate     TLR4
PSM              phenol-soluble modulin                          TLR2/6, TLR2
PTG_HS           peptidoglycan (Homo sapiens)                    TLR2
SF               soluble factors                                 TLR1/2
SSRNA            single stranded RNA                             TLR7, TLR8
STF              soluble tuberculosis factor                     TLR2/6
T3RFBN           type III repeat extra domain A of fibronectin   TLR4
TCLDLPP          triacetylated lipoproteins                      TLR1/2
TLRL1/10         TLR1/10 ligand                                  TLR1/10
TLRL10           TLR10 ligand                                    TLR10
TLRL2/10         TLR2/10 ligand                                  TLR2/10
TXL              taxol                                           TLR4
UMLCPGD          unmethylated CpG DNA                            TLR9
ZMS              zymosan                                         TLR2/6, TLR2

[Per-ligand marks for the output columns NF-κB, AP-1, CREB, ROS, IRF-3, and IRF-7*, and the entries of the column "First model absent" (hMonoTLR or hMonoTLR_LPS), are not shown.]
Table 4.6: Maximum possible flux values for output reactions in the different TLR
signaling models.

Output   ihsTLRv2   ihsMonoTLR      ihsMonoTLR      final        ihsMonoTLR_LPS
                    draft p≤0.05    draft p≤0.01    ihsMonoTLR
IRF3     25.00      0.00            0.00            25.00        25.00
IRF7     11.11      11.11           11.11           11.11        11.11
ROS      25.00      25.00           0.00            25.00        25.00
ROS      50.00      50.00           27.00           50.00        50.00
AP-1     25.00      1.00            1.00            25.00        25.00
CRE      12.50      12.50           0.00            12.50        12.50
AP-1     25.00      25.00           0.00            25.00        25.00
NF-κB    14.29      0.00            0.00            14.29        14.29
NF-κB    14.29      0.00            0.00            14.29        14.29

Fluxes are given in µmol/(g protein · min).
ulation led to degradation of the PELI3 gene product in human peripheral blood
mononuclear cells and that the protein levels only recovered after several hours
[250]. Absence of the PELI3 gene was therefore unlikely to be an artifact. The
resulting ihsMonoTLR_LPS model contained 960 reactions and 763 chemical compounds (Figure 4.2). Compared to the monocyte model, the number of reactions was reduced by three, all of which were associated with the two absent genes, and two chemical compounds were absent.
The number of dead-ends and blocked reactions increased,
while functionality with respect to the outputs remained the same (Table 4.6). Two
genes that were absent in unstimulated monocytes appeared to be expressed after
LPS stimulation: a Src family kinase (Entrez Gene ID: 7525) and TNFAIP3 interacting protein 3 (Entrez Gene ID: 79931). Since these genes were not expressed
in unstimulated monocytes, they were not part of ihsMonoTLR and were not
further considered in ihsMonoTLR_LPS. Both of the genes encode isoforms, and
in both cases at least one other isoform was present in unstimulated and stimulated
monocytes. Addition of the genes would therefore not have altered the number of
active reactions. This observation highlights that the generation of a truly generic,
condition unspecific monocyte model requires the compilation of multiple data sets
and that curation of redundant genes would be needed.
4.2.9 Condition specific network states of monocyte TLR signaling
The monocyte models reconstructed herein allow for simulation and analysis of
changes in energy levels and altered gene expression that occur in case of innate
immune response. Both cases will be investigated in the following sections.
4.2.10 Sensitivity analysis
Signaling and the innate immune response are energy-dependent cellular processes.
During an antibacterial innate immune response, intracellular ATP levels might deplete
rapidly [251]. To evaluate the energy dependency of the TLR signaling network, we performed a sensitivity analysis testing the ATP and guanosine triphosphate
(GTP) requirements of the distinct outputs produced after stimulation through one
of 13 input receptors in hMonoTLR_LPS (File S3, see also Methods section). We
found the same qualitative dependencies on energy species for 12 input receptors
and production of ROS, CREB, AP-1 and NF-κB (File S3 below). As depicted for
TLR4 stimulation (Figure 4.4), production of all outputs was dependent on ATP, and
ROS production was additionally dependent on GTP. TLR4 is the only input receptor in the
monocyte model to produce both interferon regulatory factor 3 (IRF3) and IRF7,
the production of which was dependent on ATP but not GTP. NOD receptors produced outputs
other than NF-κB due to thermodynamically infeasible loops in the network (discussed
in [7]), which are necessary for network function (File S3 below, Figures S2-3). Production of NF-κB from NOD receptor (NOD1 and NOD2) stimulation was ATP- but not GTP-dependent. This sensitivity analysis demonstrates the requirement of
the TLR model for ATP and GTP and that the availability of energy could indeed
modulate the signaling outputs.
4.2.11 Setting quantitative gene expression changes into context
Quantitative changes in gene expression could alter flux distributions and
the outputs produced within the network. In contrast to the relatively small differences in qualitative gene expression, LPS stimulation induced up-regulation of 28
ihsMonoTLR_LPS genes (Table 4.13) and down-regulation of three genes (Table
4.14). Together, they represented 12% of the genes. Of the 28 up-regulated genes,
ten encoded isoforms. None of the down-regulated genes were isoforms. We called
a gene differentially expressed when at least 50% of its probe sets
were differentially expressed. Eight genes with regulated probe sets, three up- and five down-regulated,
were rejected due to this threshold and will be referred to as
subthreshold genes in the following sections. Subthreshold genes represented a further 3% of the ihsMonoTLR_LPS genes. Taken together, only a small number of
the ihsMonoTLR_LPS genes showed altered expression levels two hours post-stimulation with LPS.
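The gene-level calling rule described above can be illustrated with a minimal sketch. It assumes that each probe set has already been classified as up-regulated (+1), down-regulated (-1), or unchanged (0); the function name, the input layout, and the handling of ties are hypothetical choices made for illustration and are not taken from the analysis pipeline used in this work.

```python
from typing import Dict, List


def call_gene_regulation(probe_calls: Dict[str, List[int]],
                         min_fraction: float = 0.5) -> Dict[str, str]:
    """Classify genes from per-probe-set calls (+1 up, -1 down, 0 unchanged).

    A gene is called 'up' or 'down' when at least `min_fraction` of its
    probe sets are regulated in that direction. Genes with some regulated
    probe sets that miss the threshold are labeled 'subthreshold'
    (assumption: ties between up and down are also 'subthreshold').
    """
    calls_out: Dict[str, str] = {}
    for gene, calls in probe_calls.items():
        n = len(calls)
        up = sum(1 for c in calls if c > 0)
        down = sum(1 for c in calls if c < 0)
        if n and up / n >= min_fraction and up > down:
            calls_out[gene] = "up"
        elif n and down / n >= min_fraction and down > up:
            calls_out[gene] = "down"
        elif up or down:
            calls_out[gene] = "subthreshold"
        else:
            calls_out[gene] = "unchanged"
    return calls_out


# Hypothetical toy input: gene identifiers and calls are invented.
example = {
    "geneA": [1, 1, 0],   # 2 of 3 probe sets up
    "geneB": [1, 0, 0],   # 1 of 3 probe sets up
    "geneC": [-1, -1],    # all probe sets down
    "geneD": [0, 0],      # no regulated probe set
}
result = call_gene_regulation(example)
```

In this toy input, only geneB carries a regulated probe set without reaching the 50% threshold, which corresponds to the subthreshold category used in the text.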
Figure 4.4: Sensitivity analysis. hMonoTLR_LPS was used for the sensitivity analysis. The network contains nine output reactions for six distinct outputs ROS, IRF3,
IRF7, CRE, AP-1, and NF-κB. ROS, CRE, AP-1, and NF-κB could be produced by
all receptor inputs. Energy dependencies of output production did not differ among
input receptors. IRF3 was only produced after stimulation of TLR4. IRF7 was only
produced when TLR4 and either TLR7, TLR8 or TLR9 were stimulated together. In
case of IRF7, we stimulated the network via TLR4 and TLR8.
Estimation of the impact of the up-regulated genes on network topology
We were interested in the impact of the regulated genes on the functionality of the TLR signaling network. Since ihsMonoTLR_LPS accurately represents the functions
of each gene product, we extracted a sub-network consisting of all reactions associated with the 28 up-regulated genes (ihsMonoTLR_LPS_upreg), which included
185 reactions (19% of ihsMonoTLR_LPS) and 296 chemical compounds (39% of
ihsMonoTLR_LPS) (Figure 4.5). The sub-network also included output reactions
for NF-κB and AP-1, implying an influence of the up-regulated gene set on these two model outputs in particular, which is in agreement with experimental data [242, 253]. We compared the connectivity of the chemical compounds
within the sub-network with that in ihsMonoTLR_LPS. The high connectivity
of protons, ATP, and ADP was conserved in the sub-network, even though the relative connectivity was smaller in the sub-network than in ihsMonoTLR_LPS (Figure
4.6). Chemical compounds such as ubiquitin and the inhibitor of the kappa light
polypeptide gene enhancer in B-cells kinase (IKK) had fewer connections in the sub-network than in ihsMonoTLR_LPS, but relative to the number of chemical
compounds in ihsMonoTLR_LPS (n=763) and in the sub-network (n=296), their
connectivity was higher in the sub-network. In contrast, we found chemical compounds such as TRAF-6 and MyD88 to be more highly connected in ihsMonoTLR_LPS. These differences in connectivity arose because the set of up-regulated genes
centered on NF-κB activation, while the chemical compounds with higher relative
connectivity in ihsMonoTLR_LPS appear further upstream in the signaling cascades of the network. Since ihsMonoTLR_LPS_upreg comprises all reactions
and functions that are used more heavily upon LPS stimulation, it can be interpreted as
the active sub-network used by monocytes to process the information and initiate the corresponding program. The high connectivity in ihsMonoTLR_LPS_upreg
indicates that the retrieved sub-network mediates NF-κB activation subsequent to
LPS stimulation.
Figure 4.5: Network resulting from mapping of the up-regulated genes onto the
LPS stimulation specific monocyte model. We extracted a sub-network from the LPS
stimulation specific monocyte model (ihsMonoTLR_LPS) consisting of all reactions
associated with the 28 up-regulated genes, which included 19% of the reactions and
39% of the chemical compounds of ihsMonoTLR_LPS. The visualization revealed
a comprehensively connected network. The network illustration was generated using
the Paint4Net software [252].
Figure 4.6: Comparison of (chemical compound) connectivity in the LPS stimulation specific versus the up-regulated sub-network. We report the connectivity
as a ratio of compound i and ∑(chemical compounds) in the respective model
(ihsMonoTLR_LPS subnetwork and ihsMonoTLR_LPS).
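The relative connectivity reported in Figure 4.6 can be illustrated with a minimal sketch. It assumes that the connectivity of a compound is the number of reactions in which it participates and that this count is divided by the number of chemical compounds in the respective model; both the function name and the toy network are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def relative_connectivity(reactions: Dict[str, Iterable[str]]) -> Dict[str, float]:
    """Return compound -> (number of reactions containing it) / (number of compounds).

    `reactions` maps a reaction identifier to the compounds it involves.
    Assumption: a compound is counted once per reaction, regardless of
    its stoichiometric coefficient.
    """
    counts: Counter = Counter()
    for compounds in reactions.values():
        for compound in set(compounds):
            counts[compound] += 1
    n_compounds = len(counts)
    return {c: k / n_compounds for c, k in counts.items()}


# Hypothetical toy network with five compounds and three reactions.
toy_network = {
    "R1": ["atp", "adp", "A"],
    "R2": ["atp", "adp", "B"],
    "R3": ["B", "C"],
}
connectivity = relative_connectivity(toy_network)
```

Because the count is normalized by model size, a compound can have fewer absolute connections in a sub-network and still show a higher relative connectivity there, as described for ubiquitin and IKK.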
Analysis of the down-regulated sub-network module
Sub-network extraction was also performed based on the three down-regulated
genes. The resulting sub-network comprised eleven reactions and 26 chemical
compounds (Figure 4.7). It did not include any output reaction. The impact of
down-regulation, assessed by involvement as in the previous section, was therefore
rather small. The sub-network consisted of three separate modules centered on either mitogen-activated protein kinase kinase kinase 14 (MAP3K14), TLR1, or Fas
(TNFRSF6)-associated via death domain (FADD). In the case of FADD, the product of another gene,
a minority gene that was not used for sub-network extraction, appeared in this
context as a direct interaction partner, i.e., caspase-8 (CASP8). CASP8 is known
to interact with FADD in monocytes as part of the differentiation pathway and
to prevent sustained NF-κB activation during macrophage differentiation [254].
This example shows how ihsMonoTLR can serve as a resource for context-specific
analysis by providing functional relationships.
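The gene-based sub-network extraction used for the up- and down-regulated gene sets can be sketched as follows. This is an illustrative simplification, assuming that each reaction is annotated with the set of genes in its gene-reaction association and with the compounds it involves; the function name and the toy data are hypothetical and do not reproduce the actual model content.

```python
from typing import Dict, Iterable, List, Set, Tuple


def extract_subnetwork(rxn_genes: Dict[str, Set[str]],
                       rxn_compounds: Dict[str, Iterable[str]],
                       gene_set: Set[str]) -> Tuple[List[str], List[str]]:
    """Select all reactions associated with at least one gene in `gene_set`.

    Returns the sorted reaction identifiers and the sorted compounds
    that participate in the selected reactions.
    """
    selected = sorted(r for r, genes in rxn_genes.items() if gene_set & set(genes))
    compounds = sorted({c for r in selected for c in rxn_compounds[r]})
    return selected, compounds


# Hypothetical toy annotation: identifiers are invented for illustration.
toy_rxn_genes = {"R1": {"g1"}, "R2": {"g2", "g3"}, "R3": {"g4"}}
toy_rxn_compounds = {"R1": ["a", "b"], "R2": ["b", "c"], "R3": ["d"]}
sub_rxns, sub_cpds = extract_subnetwork(toy_rxn_genes, toy_rxn_compounds, {"g1", "g3"})
```

A reaction is included as soon as one of its associated genes is in the query set, so gene products of genes outside the set (such as CASP8 in the FADD module) can still appear as compounds of the extracted reactions.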
Figure 4.7: Network modules resulting from mapping of the down-regulated genes
onto the LPS stimulation specific monocyte TLR model. The sub-network that was
extracted from ihsMonoTLR_LPS based on the three down-regulated genes comprised 26 metabolites and eleven reactions. Illustration of the sub-network revealed three separate modules, confirming that the impact of down-regulation,
based on involvement, was rather small. The network illustration was generated using the Paint4Net software [252].
Functional representation of quantitative changes induced through LPS
stimulation
We used the computed fold changes (FCs) to represent the LPS-activated state of
ihsMonoTLR_LPS. Up- and down-regulation were mimicked by either enforcing
a minimal reaction flux or reducing the maximum possible flux through reactions
associated with regulated genes. Mapping was performed separately for each of 117
I/O relationships covering 13 input reactions and 9 output reactions in hMonoTLR_LPS. Subsequently, we assessed the consequences of gene regulation based on the
altered flux ranges of the 9 output reactions, obtained through FVA. The TLR model
contains thermodynamically infeasible loops [7], which cause baseline flux through
output reactions in the model. First, we investigated the effect of stimulation beyond
baseline flux values for each I/O relationship. To this end, we computed the difference between the fluxes obtained after stimulation and the baseline fluxes (Table 4.15). The pattern of outputs
produced by stimulation of an input was as expected in the majority of cases. Stimulation caused flux through ROS, CREB, AP-1, and NF-κB for all TLRs and for
IL1R1. Stimulation of TLR4 additionally induced IRF3. Combined stimulation of
TLR4 and TLR8 led to IRF7. Stimulation of NOD receptors produced NF-κB, and
AP-1 could also be produced (through ’AP1_FOS_JUN_BIND’). After confirming
the I/O relationships, we investigated the effect of quantitative gene expression changes on output production. In total, 183 reactions were associated
with regulated genes, of which only a subset was active in any particular I/O relationship. As expected, mapping of differential expression onto the network enforced
AP-1 and NF-κB production across all ihsMonoTLR_LPS inputs, as genes directly
associated with the output reactions of AP-1 and NF-κB were up-regulated (Table
4.16, Table 4.17). Flux was further enforced through the AP-1 output reactions (’AP1_FOS_JUN_BIND’ and ’AP1_JUN_BIND’) equally for all 13 inputs, except for
the NOD receptors. We predicted a lower flux through ’AP1_FOS_JUN_BIND’ when
the NOD receptors were stimulated than for the other receptors. Data mapping
enforced the production of IRF3 output in the model after TLR4 stimulation, and
of IRF7 after stimulation through TLR4 and TLR8. ROS and CREB output production was not affected by the mapping of differentially expressed genes. No effects
of the mapping of down-regulation were observed among the output reactions. This
analysis demonstrated how the model can be used to predict differences in cellular
phenotypes due to quantitative gene expression differences.
4.3 Discussion
The aim of this study was to establish a method for omics data driven contextualization of signaling networks after gene-extension of the human TLR signaling
network (Figure 4.2, see File S1 below for details on the procedure). Our key results demonstrate that i) substantial manual curation is required after specializing the
generic TLR signaling network to a cell-type and condition specific sub-network;
ii) the monocyte TLR signaling network captured most of the functionality of the
generic network but gene redundancy was removed, indicating cell-type specific use
of isoforms; and iii) TLR signaling is highly energy dependent as all TLR signaling
pathways required ATP availability and ROS production was additionally dependent
on GTP availability. Taken together, we demonstrated that the contextualization of
the TLR network enables the functional analysis of TLR signaling in health and
disease.
We employed the gene-extended TLR signaling network together with gene expression data and literature evidence, which ensured monocyte-specific functionality
with respect to I/O pathway content (Table 4.5). Manual curation
has been emphasized as an important step in the generation of biologically meaningful
cell-type or tissue-specific models, despite a growing number of sophisticated algorithms [129]. Curation with respect to function was important because the monocyte
model was the template for subsequent condition-specific tailoring and for analysis of
the consequences of LPS stimulation for network structure and function. TLR
expression in monocytes, as determined at the chosen cutoff and from cell-type specific literature, was
in good agreement with some but not all experimental studies [212, 214],
indicating the importance of reproducible, consistent experimental conditions and
of using identical monocyte subsets. For instance, infection states or stimulation can
drastically alter cellular processes and induce the production of effector molecules,
such as cytokines [215], for which the cell has to provide energy for the transcriptional and translational machinery. Such cellular changes can even involve
central metabolic pathways, including a switch to glycolysis for faster energy
allocation [255, 256]. ihsTLRv2 was redundant in its pathways connecting inputs
with specific sets of outputs and with respect to genes encoding isoforms. The transition
from ihsTLRv2 to the cell-type specific ihsMonoTLR was characterized by isoform reduction, while network size remained comparable. This may be partly because
the expression state of isoforms was not manually curated. Monocytes
are central to the host innate immune defense and are known to express many TLRs
[211, 216, 257]. Our finding that the majority of the signaling network is preserved
in monocytes is thus plausible. The transition from the unstimulated to the LPS-stimulated model of TLR signaling was characterized by only a few qualitative differences,
whereas differences in quantitative gene expression prevailed. The set of
up-regulated genes was found to be tightly connected (Figure 4.5). The impact of
the up-regulated genes spread across one third of ihsMonoTLR_LPS, reflecting the
strong influence that LPS stimulation has on the monocyte TLR signaling network. The LPS stimulation specific sub-network of up-regulated genes correctly
contained the transcription factors NF-κB and AP-1, as their activation is an expected response of a monocyte to LPS stimulation [242, 258].
TLR signaling is highly energy dependent, as demonstrated by the sensitivity analysis (Figure 4.4). The TLR signaling network accounts for a number of other metabolites that link it to further metabolic processes. Integration of models of different
cellular processes, such as metabolism, signaling, and gene regulatory networks
[259, 260, 261, 262], will enable important insights into the crosstalk between
signaling and metabolism. Corresponding modeling tools are currently being developed
[6, 27, 263]. In fact, the interaction between metabolism and innate immunity is
of great interest for both health and disease [256]. For instance, TLR agonists can
stimulate a switch from oxidative phosphorylation to glycolysis in murine dendritic
cells and macrophages [255, 256]. This switch leads to faster yet less efficient ATP
production, similar to the Warburg effect observed in cancer cells, and may function
as a protective mechanism to preserve cellular ATP levels and maintain cell viability
and function during an immune response [255, 256]. Moreover, it has been suggested that neuronal TLR signaling is involved in triggering cell death in response
to brain injury [213]. Combined signaling and metabolic COBRA modeling could
help to consolidate the complexity of the diseases by highlighting cross-relations.
Taken together, we demonstrated that a stoichiometric model of the TLR signaling
network combined with transcriptomic data can provide functional insight into its
signaling cascades. The presented gene extension and the method for integrating transcriptomic data open up an avenue for more detailed, disease-directed research, including
drug target discovery, thus rendering signaling models amenable to contextualization similar to that already established for metabolic models.
4.4 Materials and Methods
Gene extension
Genes for chemical components were identified using NCBI Entrez gene database
[224], UniProtKB/Swiss-Prot [225], and primary literature. Gene-Reaction Associations (GRAs) were subsequently generated using the rBioNet software [264]. The rBioNet software requires a gene index file, which
contains the Entrez gene ID, gene symbol, location, gene type, and description of the added
model genes and of the genes encoding the members of the Ras family. To generate
the gene index file, Homo sapiens gene information was downloaded from NCBI
(4/13/2011). The software allows loading of model structures and easy alteration of the
model content, such as reactions and GRAs. Genes were associated with reactions
using Boolean logic: AND was assigned for complexes requiring multiple subunits and for reactions
requiring multiple proteins, and OR was assigned for functional isoforms.
Gene association additional information
Ras protein family is encoded by 35 genes [265], but they were not included in the
current version of ihsTLRv2 due to functional ambiguity. However, the genes were
included in the gene index file and can easily be added using rBioNet [264]. Additionally, no gene associations were added for reactions involving lipopolysaccharide-binding protein (LBP), which has been described as a protein produced in the liver
and transported in the blood [266, 267]. Cytokine production can even be induced in
the absence of LBP, as was demonstrated for monocytes stimulated with LPS in the presence of rsCD14 [268]. The primary purpose here was to enable data mapping, and
by not adding the gene we ensured that absence of the LBP gene in gene expression
data could not interfere with TLR4 signaling. The LBP gene was included in the
gene index file so that it could easily be added. Due to the lack of gene-reaction associations, reactions connected to these chemical compounds will always be active when
data are mapped onto the network.
Model tailoring
In addition to the identification and association of model reactions with human
genes, a number of reactions were removed from the model content using the rBioNet
software [264] in order to make the model human-specific. The removed reactions
concerned the transmission of input signal from TLR11 stimulation. Furthermore,
27 exchange reactions were added for dead-end chemical compounds such as extracellular invaders, IL1A and IL1B.
Identification of mouse orthologs
We downloaded the file containing mouse orthologs
(ftp://ftp.informatics.jax.org/pub/reports/HMD_Human5.rpt) from Mouse Genome
Informatics (Mouse Genome Informatics (MGI) Web, The Jackson Laboratory, Bar
Harbor, Maine; World Wide Web URL: http://www.informatics.jax.org, 11/16/2011) [269]. We identified matches
for the 314 identified human genes. Additionally, we searched for missing genes
using NCBI Entrez gene database [224]. We added murine neutrophil cytosolic
factor 1 gene (Entrez Gene ID: 17969) to our list of genes. However, six genes
could not be found (cAMP-dependent protein kinase catalytic subunit gamma; H3
histone family, member M; H3 histone family, member J; serine/threonine protein
phosphatase 2A subunit B, PR48 isoform; toll-like receptor 10; calpain 14). Five
genes encoded isoforms. The absence of murine TLR10 gene was expected [270].
Although TLR10 is missing, mice express TLR11, TLR12, and TLR13 [271],
of which TLR12 and TLR13 were not part of ihsTLRv1.
SNPs
The Exome Variant Server (NHLBI Exome Sequencing Project
(ESP), Seattle, WA; URL: http://evs.gs.washington.edu/EVS/, 12/2011) was queried
for SNPs associated with ihsTLRv2 genes. Additional information about clinical
links of the SNPs was derived from the OMIM webpage (Online Mendelian Inheritance in Man, OMIM, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, 12/2011; World Wide Web URL: http://omim.org/), using the links provided in the Exome Variant Server overview
results file.
InnateDB PPIs
InnateDB was queried for interactions among the 314 ihsTLRv2 genes
(http://innatedb.com/, 12/02/2011) [227]. In total, 3765 interaction entries were
returned among 242 interacting ihsTLRv2 gene products. No entries were found
for 5 genes (calpain small subunit 2, Entrez Gene ID: 84290; diacylglycerol kinase
kappa, Entrez Gene ID: 139189; and thioredoxin reductase 1, 2, and 3, Entrez
Gene IDs: 7296, 10587, 114112, respectively) and no interactions could be found
for 67 gene products.
Human Protein Atlas data mapping
Expression profiles of proteins in normal human tissues based on immunohistochemistry were obtained from the Human Protein Atlas (version 9.0 and Ensembl version 64.37, 12/2011) [190]. Ensembl IDs were mapped onto ihsTLRv2 genes.
Data with low or uncertain reliability were excluded from the analysis. Gene products with moderate/medium and strong/high levels of expression were assumed to
be present, while all others were assumed to be absent. Gene products without data
were assumed to be absent. Hierarchical clustering of cell types and genes was done using GenePattern
(http://genepattern.broadinstitute.org/) [272] with the Euclidean distance measure.
Analysis of gene expression data and mapping
Gene expression data for unstimulated and LPS stimulated human monocytes were
obtained from Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/). We
employed data from two experimental groups (vehicle (control), LPS low 2hrs)
[240]. Three chips were excluded from the analysis (GSM252451, GSM252454,
and GSM252479) after visual inspection. Absence and presence calls, dividing
the set of ihsTLRv2 genes into sets of expressed (present) and unexpressed (absent)
genes, were generated using the PANP package [273], the R (2.13.0) computational platform [274], and Affymetrix annotation files, for the loose cutoff p ≤ 0.05 and
for the stringent cutoff p ≤ 0.01. For genes with multiple identifiers, we only used
the identifier showing the highest mean expression intensity in the control group.
Therefore, presence calls are more likely to be assigned to absent genes than the other
way around. For the mapping of the transcriptomic data, we took advantage of the
previously defined GRAs. Reactions associated with gene
products that had an absence call were disabled. In the case of functional isoforms, reactions were only disabled if all isoforms were called absent. In this way, the protein and
reaction content of the TLR network was reduced to form the preliminary monocyte
models for the two different cutoffs.
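The mapping of presence/absence calls through the GRAs can be illustrated with a minimal sketch. It assumes that each GRA is represented as a nested tuple of the form ("and", ...) or ("or", ...) over gene identifiers, and that reactions without a GRA are never disabled; this representation and the function names are hypothetical and are not the rBioNet or COBRA toolbox data structures.

```python
from typing import Dict, List, Optional, Set, Tuple, Union

# A rule is a gene identifier, or a tuple ("and" | "or", rule, rule, ...).
Rule = Union[str, Tuple]


def gra_is_active(rule: Optional[Rule], present: Set[str]) -> bool:
    """Evaluate a Boolean gene-reaction association against presence calls.

    AND (complexes, multiple required proteins) needs all parts present;
    OR (functional isoforms) needs at least one part present.
    Reactions without a GRA (rule is None) are treated as always active.
    """
    if rule is None:
        return True
    if isinstance(rule, str):
        return rule in present
    operator, *operands = rule
    results = [gra_is_active(operand, present) for operand in operands]
    if operator == "and":
        return all(results)
    if operator == "or":
        return any(results)
    raise ValueError(f"unknown operator: {operator!r}")


def disabled_reactions(gras: Dict[str, Optional[Rule]], present: Set[str]) -> List[str]:
    """Return the sorted reactions whose GRA evaluates to absent."""
    return sorted(r for r, rule in gras.items() if not gra_is_active(rule, present))


# Hypothetical toy GRAs: identifiers are invented for illustration.
toy_gras = {
    "R_complex": ("and", "g1", "g2"),   # both subunits required
    "R_isoform": ("or", "g3", "g4"),    # either isoform suffices
    "R_no_gra": None,                   # no gene association
}
to_disable = disabled_reactions(toy_gras, present={"g1", "g3"})
```

With these toy calls, only the complex-dependent reaction is disabled, because one isoform is sufficient for the OR rule and the reaction without a GRA stays active.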
Cutoff definition
The Human Protein Atlas (http://www.proteinatlas.org/) was queried using gene
symbols of 33 genes with different P/A calls using two different gene expression
cutoffs, for expression of the encoded proteins in two monocytic leukemia cell lines,
THP-1 and U-937. If the corresponding antibody yielded at least weak staining
for the majority of tests in one cell sample, we called the protein present.
I/O pathway curation using illustration tool
In order to enable all I/O pathways in the monocyte draft model, network reactions
connecting missing outputs to inputs were identified using the Paint4Net software [252].
This tool facilitated curation of incomplete I/O relationships in ihsMonoTLR. We
first derived a list of reactions involved in the signaling pathway towards NF-κB using ihsTLRv2, which contained the complete pathways, as a reference. Subsequently,
we did the same for the uncurated ihsMonoTLR and the disconnected output pathways. Comparison of the resulting lists of participating reactions revealed the missing links in ihsMonoTLR. Through the GRAs of the missing reactions, we quickly identified six candidate genes with potential impact on output production. Reincorporation of a single gene at a time revealed the impact of the absence of that particular
gene on the output capability of the model.
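The comparison of pathway reaction lists can be sketched as a set difference followed by a lookup of the associated genes. This is a minimal illustration under the assumption that both pathways are available as lists of reaction identifiers; the function names and toy identifiers are hypothetical, and the actual curation relied on visual inspection with Paint4Net.

```python
from typing import Dict, Iterable, List, Set


def missing_links(reference_pathway: Iterable[str],
                  draft_pathway: Iterable[str]) -> List[str]:
    """Reactions present in the reference pathway but not in the draft model."""
    return sorted(set(reference_pathway) - set(draft_pathway))


def candidate_genes(missing: Iterable[str],
                    rxn_genes: Dict[str, Set[str]],
                    absent_genes: Set[str]) -> List[str]:
    """Absent genes that are associated with at least one missing reaction."""
    found: Set[str] = set()
    for rxn in missing:
        found |= rxn_genes.get(rxn, set()) & absent_genes
    return sorted(found)


# Hypothetical toy data: reaction and gene identifiers are invented.
reference = ["R1", "R2", "R3", "R4"]
draft = ["R1", "R4"]
gaps = missing_links(reference, draft)
candidates = candidate_genes(gaps, {"R2": {"gA"}, "R3": {"gB", "gC"}}, {"gA", "gC", "gZ"})
```

Each candidate can then be restored individually and the output capability re-evaluated, mirroring the one-gene-at-a-time reincorporation described above.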
Sensitivity analysis
All exchange reactions of ligands and ligand-to-receptor binding reactions in
hMonoTLR_LPS were constrained to zero µmol/(g protein · min). To simulate the distinct I/O
relationships, the input combinations were as follows: ’EX_26dap-LL[e]’ and ’NOD1P_BIND’; ’EX_ALPS[e]’ and ’TLR2/L-D_BIND’; ’EX_LPS_HS[e]’ and ’TLR4/L_MD2_BIND’; ’EX_FLGN[e]’ and ’TLR5_BIND’; ’EX_TCLDLPP[e]’ and ’TLR1/2_BIND’;
’EX_BPM[e]’ and ’TLR7_BIND’; ’EX_SSRNA[e]’ and
’TLR8_BIND’; ’EX_UMLCPGD[e]’ and ’TLR9_BIND’ or ’TLR9_BINDII’; ’EX_MRDP[e]’ and ’NOD2P_BIND’. For interleukin-1, no exchange existed to specifically drive IL1R1 stimulation. We therefore added an exchange reaction for IL1R1
(’EX_IL1R1_LIG[e]’). This exchange reaction was enabled in combination with
’IL1R1_BIND’ in order to simulate single-receptor IL1R1 stimulation. The nine
output reactions were ’DM_PHOX_GTP-3P[v]’, ’DM_PHOX_GTP-8P[v]’, ’DM_ISRE_IRF3[n]’, ’DM_ISRE_IRF7[n]’, ’CREB_CRE_BIND’, ’AP1_FOS_JUN_BIND’, ’AP1_JUN_BIND’, ’NFKB_IKBA_DISS’, and ’NFKB_IKBB_DISS’. As
implemented in the network structure of the TLR model, the IRF7 output could only
carry flux if at least two different inputs were activated ((TLR4) and (TLR7 or
TLR8 or TLR9 or TLR9II)). Thus, in the case of IRF7, we additionally enabled flux
through ’TLR8_BIND’ and ’EX_SSRNA[e]’. Note that flux through the remaining output reactions remained possible. To simulate the energy requirements of the
I/O relationships, we enabled one exchange reaction (lb = ub = -1 µmol/(g protein · min)) and
one corresponding binding reaction (lb = ub = 1 µmol/(g protein · min)) of the specified input
combinations and used the COBRA robustness analysis function. Either the atp or the gtp
exchange reaction was the reaction of interest, and nPoints = 50. Sensitivity analysis was performed for each I/O relationship. Prior to analysis, the atp and gtp exchange
reactions were constrained to lb = -25 µmol/(g protein · min) and ub = 0 µmol/(g protein · min).
Quantitative gene expression analysis
Gene expression data for unstimulated and LPS stimulated human monocytes were
obtained from Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/). We
employed data from two experimental groups (vehicle (control), LPS low 2hrs)
[240]. Three chips were excluded from the analysis (GSM252451, GSM252454,
and GSM252479) after visual inspection. Lists of up- and down-regulated genes
(Table 4.13 & Table 4.14) were generated using AltAnalyze_v2.02beta [275] with default settings, the EnsMart65 database, and Affymetrix
annotation files. The cutoffs were a twofold change and p ≤ 0.05 (FDR)
for at least 50% of the identifiers per gene.
Mapping of quantitative expression changes
For this analysis, exchanges were closed and the same I/O relationships were used as described for the sensitivity analysis. However, for this analysis, the energy supply of the
model was restricted to exchange of atp (ub = 0 µmol/(g protein · min), lb = -100 µmol/(g protein · min))
and exchange of gtp (ub = 0 µmol/(g protein · min), lb = -50 µmol/(g protein · min)). For each I/O relationship, FBA was run using the minNorm option. The sets of model reactions connected to the up-regulated and down-regulated genes were identified and assigned
a fold change (FCrxn). FCrxn was derived from the change in expression of
the regulated gene that was associated with a reaction. If more than one gene associated with a reaction was significantly regulated, the mean fold change was calculated. The highest fold changes for up- and down-regulation in the data set served as
reference fold changes (FC-up and FC-down). Reaction bounds were adjusted based
on the following equations: (1) $\mathrm{model.lb} = \frac{FC_{rxn}}{FC_{up}} \cdot \mathrm{FBAsol.x}$
and (2) $\mathrm{model.ub} = \frac{FC_{rxn}}{FC_{down}} \cdot \mathrm{FBAsol.x}$. We compared minimum and maximum flux values derived
through FVA [162] for each of the nine output reactions in response to stimulation
through each of the input receptor types.
All computations were carried out using Matlab (Mathworks, Inc), the COBRA
toolbox [15], and TomOpt (Tomlab, Inc) as linear programming solver.
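The fold-change-based adjustment of reaction bounds can be illustrated with a minimal sketch. It assumes one reading of the bound equations, in which the FBA flux of a reaction is scaled by the ratio of its fold change to the reference fold change, and it assumes a non-negative FBA flux; the function names and the toy numbers are hypothetical, and the original computations were carried out in Matlab with the COBRA toolbox rather than in Python.

```python
from statistics import mean
from typing import Iterable, Tuple


def reaction_fold_change(gene_fold_changes: Iterable[float]) -> float:
    """Mean fold change over all significantly regulated genes of a reaction."""
    return mean(gene_fold_changes)


def adjust_bounds(bounds: Tuple[float, float],
                  fba_flux: float,
                  fc_rxn: float,
                  fc_reference: float,
                  direction: str) -> Tuple[float, float]:
    """Scale the FBA flux by fc_rxn / fc_reference and use it as a new bound.

    direction == "up":   the scaled flux becomes the lower bound (enforced flux).
    direction == "down": the scaled flux becomes the upper bound (capped flux).
    Assumption: fba_flux is non-negative; no check is made that lb <= ub.
    """
    lower, upper = bounds
    scaled = fc_rxn / fc_reference * fba_flux
    if direction == "up":
        return (scaled, upper)
    if direction == "down":
        return (lower, scaled)
    raise ValueError(f"unknown direction: {direction!r}")


# Hypothetical toy values, not taken from the data set.
fc = reaction_fold_change([2.0, 6.0])                 # two regulated genes
up_bounds = adjust_bounds((0.0, 100.0), 10.0, fc, 8.0, "up")
down_bounds = adjust_bounds((0.0, 100.0), 10.0, 2.0, 4.0, "down")
```

Under this reading, the reaction with the highest up-regulation is forced to carry at least its full FBA flux, while less strongly up-regulated reactions are forced to carry a proportionally smaller minimum flux.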
4.5 Supplementary material
This section captures tables published as supplementary material.
Table 4.7: TLR11 receptor was removed from ihsTLRv2 along with its 10 associated
reactions.
RXNS             Description
TLRL11_PLACE     TLR11 ligand placeholder
TLR11_BIND       Toll like receptor 11 ligand binding
EX_UNKN[e]       Unknown TLR11 ligand exchange
TIR_MYD_BIND5    TLR11-mediated TIR-MyD88 binding
DM_TLRL11[c]     Toll-like receptor 11 ligand (generic) demand
PLP_TLRL1        Toll-like receptor 11 ligand (Profilin-like protein)
UNKN_TLRL11      Toll-like receptor 11 ligand (Unknown)
UROPBC_EXPR      Expression of uropathogenic bacteria (invader)
EX_PLP[e]        Profilin-like protein exchange
TXPLGD_EXPR      Expression of Toxoplasma gondii parasite (invader)
Table 4.8: TLR11 receptor was removed from ihsTLRv2 along with seven other
metabolites.
Metabolite      Rxns participation (Formula)
TLR11/L-D (c)   2 TLR11[c] + TLRL11[e] -> TLR11/L-D[c]; MYD88-D[c] + TIR[c] + TLR11/L-D[c] -> TIR_MYD[c] + 2 TLR11[c] + TLRL11[c]
UROPBC(e)       UROPBC[e] -> UNKN[e]
TXPLGD(e)       TXPLGD[e] -> PLP[e]
PLP(e)          PLP[e] -> TLRL11[e]; PLP[e] <=>; TXPLGD[e] -> PLP[e]
TLRL11 (e)      TLRL11[e] -> TLRL_PLACEHOLDER[c]; 2 TLR11[c] + TLRL11[e] -> TLR11/L-D[c]; PLP[e] -> TLRL11[e]
Unkn(e)         UNKN[e] <=>; UROPBC[e] -> UNKN[e]
TLRL11(c)       MYD88-D[c] + TIR[c] + TLR11/L-D[c] -> TIR_MYD[c] + 2 TLR11[c] + TLRL11[c]; TLRL11[c] ->
TLR 11 (c)      2 TLR11[c] + TLRL11[e] -> TLR11/L-D[c]; MYD88-D[c] + TIR[c] + TLR11/L-D[c] -> TIR_MYD[c] + 2 TLR11[c] + TLRL11[c]
Table 4.9: Added exchange reactions.
Reaction        Description                                         Reaction formula
EX_MYCBTB[e]    Exchange of mycobacterium tuberculosis (invader)    MYCBTB[e] <=>
EX_MYCB[e]      Exchange of mycobacteria (invader)                  MYCB[e] <=>
EX_MYCP[e]      Exchange of mycoplasma (invader)                    MYCP[e] <=>
EX_NEISMN[e]    Exchange of neisseria meningitidis (invader)        NEISMN[e] <=>
EX_NEIS[e]      Exchange of neisseria (invader)                     NEIS[e] <=>
EX_PLNT[e]      Exchange of plants (invader)                        PLNT[e] <=>
EX_RSV[e]       Exchange of RS virus (invader)                      RSV[e] <=>
EX_SALMEN[e]    Exchange of salmonella enterica (invader)           SALMEN[e] <=>
EX_STLCEP[e]    Exchange of staphylococcus epidermidis (invader)    STLCEP[e] <=>
EX_SYNCPD[e]    Exchange of synthetic compounds (invader)           SYNCPD[e] <=>
EX_PPYMGG[e]    Exchange of porphyromonas gingivalis (invader)      PPYMGG[e] <=>
EX_PSDMAR[e]    Exchange of pseudomonas aerug (invader)             PSDMAR[e] <=>
EX_TRPNMT[e]    Exchange of treponema maltophilum (invader)         TRPNMT[e] <=>
EX_TRYPCR[e]    Exchange of trypanosoma cruzi parasite (invader)    TRYPCR[e] <=>
EX_VIRS[e]      Exchange of viruses (invader)                       VIRS[e] <=>
EX_KLBS[e]      Exchange of klebsiella (invader)                    KLBS[e] <=>
EX_LPTSIG[e]    Exchange of leptospira interrogans (invader)        LPTSIG[e] <=>
EX_MMTV[e]      Exchange of MMT virus (invader)                     MMTV[e] <=>
EX_CLMYPN[e]    Exchange of chlamydia pneumoniae (invader)          CLMYPN[e] <=>
EX_FUNG[e]      Exchange of fungi (invaders)                        FUNG[e] <=>
EX_GRAMN[e]     Exchange of Gram-negative bacteria (invader)        GRAMN[e] <=>
EX_GRAMP[e]     Exchange of Gram-positive bacteria (invader)        GRAMP[e] <=>
EX_HOST[e]      Exchange of host (invader)                          HOST[e] <=>
EX_BORRBG[e]    Exchange of borrelia burgdorferi (invader)          BORRBG[e] <=>
EX_IL1A[e]      Exchange of IL-1A                                   IL1A[e] <=>
EX_IL1B[e]      Exchange of IL-1B                                   IL1B[e] <=>
EX_BACT[e]      Exchange of bacteria (invader)                      BACT[e] <=>
Table 4.10: Literature evidence for the presence of proteins in monocytes. The four
proteins listed here were not expressed in two human cell lines according to
immunohistological data downloaded from the Human Protein Atlas [42].
HPA negative    Literature evidence for expression in monocytes
TRAF2           PMID: 18827186
HSPB1           PMID: 20557877
MAP2K6          PMID: 11257452
ITPR1           PMID: 15995150
Table 4.11: Pathway curation of the monocyte draft-model based on output capabilities. Candidate genes were identified among the absent genes that appeared to be
connected to the blocked output NF-κB. Maximum flux (µmol × gprotein⁻¹ × min⁻¹) through each of the output reactions, generated using FBA for the unconstrained
hMonoTLR model after recovery of one of the five candidate genes (TIRAP = toll-interleukin 1 receptor (TIR) domain containing adapter protein; PKCZ = protein
kinase C (zeta isoform); MAPK11 = mitogen-activated protein kinase 11; CAMK-II
= calmodulin-dependent kinase 2; RPS6KA5 = ribosomal protein S6 kinase, 90kDa,
polypeptide 5).
                             Recovered gene
Output   Output reaction     TIRAP    PKCZ     MAPK11   CAMK-II  RPS6KA5
ROS      DM_PHOX_GTP-3P      0        41       0        0        0
ROS      DM_PHOX_GTP-8P      11.45    11.45    11.45    11.45    11.45
IRF3     DM_ISRE_IRF3        3        3        3        3        3
IRF7     DM_ISRE_IRF7        6        6        6        6        6
CREB     CREB_CRE_BIND       1        1        1        21.5     1
AP-1     AP1_FOS_JUN_BIND    10.71    10.71    10.71    10.75    10.71
AP-1     AP1_JUN_BIND        21       21       21       21       21
NF-κB    NFKB_IKBA_DISS      0        25.14    0        0        0
NF-κB    NFKB_IKBB_DISS      0        25.14    0        0        0
Gene      Gene Name
815       CAMK2A
816       CAMK2B
817       CAMK2D
818       CAMK2G
5578      PRKCA
5590      PRKCZ
5600      MAPK11
7100      TLR5
8844      KSR1
9252      RPS6KA5
10392     NOD1
25998     IBTK
114609    TIRAP
151742    PPM1L
6582      SLC22A2
6584      SLC22A5
283455    KSR2
8945      BTRC
81793     TLR10
84962     JUB
15
12
4
9
11
7
1
17
2
1
1
1
2
6
3
rxns
4
4
4
4
weak
moderate/weak
PMID:11561001
PMID:20923704
moderate/strong
weak
strong
strong
strong/weak
moderate/strong weak/negative
HPA
moderate
negative/moderate
PMID:20227498
Citation
PMID:16154993
PMID:16154993
PMID:16154993
PMID:16154993
PMID:8288312
PMID: 8523529
PMID: 15356147
PMID:11561001; PMID:15096475
PMID:20227498
PMID: 11257452
PMID:20584763
PMID:18596081
PMID:16439361; PMID:20525286
PMID:12121439
Gene Name   Decision           Group of chemical compound
CAMK2A      no longer absent   Kinase
CAMK2B      no longer absent   Kinase
CAMK2D      no longer absent   Kinase
CAMK2G      no longer absent   Kinase
PRKCA       no longer absent   Kinase
PRKCZ       no longer absent   Kinase
MAPK11      no longer absent   Kinase
TLR5        no longer absent   Receptor
KSR1        no longer absent   Protein (enzyme, no kinase activity)
RPS6KA5     no longer absent   Kinase
NOD1        no longer absent   Receptor
IBTK        no longer absent   Kinase
TIRAP       no longer absent   Protein (adapter protein)
PPM1L       no longer absent   Phosphatase
SLC22A2     absent             metabolite transporter
SLC22A5     absent             metabolite transporter
KSR2        absent             Protein (enzyme, no kinase activity)
BTRC        absent             Protein
TLR10       absent             Receptor
JUB         absent             Protein
Table 4.12: Curation of hMonoTLR. In order to represent monocyte function, a number of genes were reintroduced according to
literature evidence for expression in human monocytes. The table lists the genes alongside the number of reactions each gene was
assigned to, the corresponding literature, expression according to the Human Protein Atlas (HPA) for cell lines [32], and the
resulting decision whether or not to reincorporate the gene. The genes for KSR1 and NOD1 were no longer considered absent
while generating the LPS stimulation specific model.
Table 4.13: Significantly up-regulated hMonoTLR_LPS genes. The table lists the
number of significantly regulated identifiers (probesets), the total number of identifiers per
gene, and the resulting percentage of regulated identifiers. Note that for
three genes (IRAK2, TLR4, and PLD1) only a minority of identifiers was differentially
expressed after LPS stimulation.
Gene name    Entrez Gene ID   Number of probesets significant   Number of total probesets   % differentially expressed
NFKB1        4790             1                                 1                           100
DUSP1        1843             2                                 2                           100
IL1B         3553             2                                 2                           100
MAP3K7IP2    23118            2                                 2                           100
TNFAIP3      7128             2                                 2                           100
TIFA         92610            2                                 2                           100
EIF4E        1977             4                                 4                           100
MAP3K8       1326             2                                 2                           100
MYC          4609             1                                 1                           100
TXN          7295             2                                 2                           100
RIPK2        8767             2                                 2                           100
NFKBIZ       64332            2                                 2                           100
TNIP3        79931            1                                 1                           100
CASP1        834              4                                 5                           80
UBE2D1       7321             2                                 3                           67
PELI1        57162            2                                 3                           67
PPP3CC       5533             2                                 4                           50
IL1R1        3554             1                                 2                           50
TBK1         29110            1                                 2                           50
JUN          3725             2                                 4                           50
SOCS1        8651             2                                 4                           50
TNIP1        10318            1                                 2                           50
IL1A         3552             1                                 2                           50
IRAK2        3656             1                                 3                           33
TLR4         7099             1                                 4                           25
PLD1         5337             1                                 7                           14
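The percentage column of Table 4.13 follows directly from the two probeset counts. The following is a minimal Python sketch of that arithmetic, using five rows of the table; the function name and the rounding to whole percentages are assumptions made for illustration and are not taken from the original analysis.

```python
def percent_regulated(n_significant, n_total):
    """Share of a gene's probesets that are significantly regulated, as a whole percentage."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return round(100 * n_significant / n_total)


# (significant probesets, total probesets) for selected genes of Table 4.13
rows = {
    "CASP1": (4, 5),
    "UBE2D1": (2, 3),
    "IRAK2": (1, 3),
    "TLR4": (1, 4),
    "PLD1": (1, 7),
}

percentages = {gene: percent_regulated(sig, total) for gene, (sig, total) in rows.items()}

# Genes for which only a minority of probesets was differentially expressed
minority = [gene for gene, pct in percentages.items() if pct < 50]
```

With these inputs, the genes flagged as "minority" are the three named in the table caption.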
Table 4.14: Significantly down-regulated hMonoTLR_LPS genes. The table lists the
number of significantly regulated identifiers (probesets), the total number of identifiers per
gene, and the resulting percentage of regulated identifiers. Note that for five
genes (CBL, CASP8, MAP3K3, YWHAH, and MBP) only a minority of identifiers
was differentially expressed after LPS stimulation.
Gene name    Entrez Gene ID   Number of probesets significant   Number of total probesets   % differentially expressed
TLR1         7096             1                                 1                           100
MAP3K14      9020             1                                 1                           100
TLR6         10333            1                                 1                           100
FADD         8772             1                                 1                           100
CBL          867              2                                 5                           40
CASP8        841              1                                 3                           33
MAP3K3       4215             1                                 3                           33
YWHAH        7533             1                                 3                           33
MBP          4155             1                                 8                           13
Receptor input      DM_PHOX_GTP-3P[v]  DM_PHOX_GTP-8P[v]  DM_ISRE_IRF3[n]  DM_ISRE_IRF7[n]  CREB_CRE_BIND  AP1_FOS_JUN_BIND  AP1_JUN_BIND  NFKB_IKBA_DISS  NFKB_IKBB_DISS
                    max/min            max/min            max/min          max/min          max/min        max/min           max/min       max/min         max/min
IL1R1_BIND          1/0                0.375/0            0/0              0/0              0.5/0          0.25/0            0.5/0         0.333/0         0.333/0
NOD1P_BIND          0/0                0/0                0/0              0/0              0/0            0.167/0           0/0           1/0             1/0
NOD2P_BIND          0/0                0/0                0/0              0/0              0/0            0.167/0           0/0           0.667/0         0.667/0
STLR4_BIND          1/0                0.375/0            0/0              0/0              0.5/0          0.25/0            0.5/0         0.333/0         0.333/0
TLR1/2_BIND         1/0                0.375/0            0/0              0/0              0.5/0          0.25/0            0.5/0         0.333/0         0.333/0
TLR2/L-D_BIND       1/0                0.375/0            0/0              0/0              0.5/0          0.25/0            0.5/0         0.667/0         0.667/0
TLR4/L_MD2_BIND     1/0                0.375/0            0.25/0           0.5/0            0.5/0          0.25/0            0.5/0         0.667/0         0.667/0
TLR5_BIND           1/0                0.375/0            0/0              0/0              0.5/0          0.25/0            0.5/0         0.333/0         0.333/0
TLR7_BIND           1/0                0.375/0            0/0              0/0              0.5/0          0.25/0            0.5/0         0.333/0         0.333/0
TLR8_BIND           1/0                0.375/0            0/0              0/0              0.5/0          0.25/0            0.5/0         0.333/0         0.333/0
TLR9_BIND           1/0                0.375/0            0/0              0/0              0.5/0          0.25/0            0.5/0         0.333/0         0.333/0
TLR9_BINDII         1/0                0.375/0            0/0              0/0              0.5/0          0.25/0            0.5/0         0.333/0         0.333/0
STLR2/L_SCD14_BIND  1/0                0.375/0            0/0              0/0              0.5/0          0.25/0            0.5/0         0.333/0         0.333/0
Table 4.15: I/O relationships. Outputs produced when each receptor input was set to 1 (µmol × gprotein⁻¹ × min⁻¹). Baseline flux values
resulting from thermodynamically infeasible loops have been subtracted.
                    IRF3             IRF7             AP-1              AP-1          NFκB            NFκB
                    DM_ISRE_IRF3[n]  DM_ISRE_IRF7[n]  AP1_FOS_JUN_BIND  AP1_JUN_BIND  NFKB_IKBA_DISS  NFKB_IKBB_DISS
                    min              min              min               min           min             min
IL1R1_BIND          0.000            0.000            0.010             0.019         0.011           0.027
NOD1P_BIND          0.000            0.000            0.006             0.000         0.034           0.082
NOD2P_BIND          0.000            0.000            0.006             0.000         0.023           0.055
STLR4_BIND          0.000            0.000            0.010             0.019         0.011           0.027
TLR1/2_BIND         0.000            0.000            0.010             0.019         0.011           0.027
TLR2/L-D_BIND       0.000            0.000            0.010             0.019         0.023           0.055
TLR4/L_MD2_BIND     0.007            0.008            0.010             0.019         0.023           0.055
Table 4.16: Changes due to mapping of quantitative gene expression changes. Observed changes in min flux values (µmol ×
gprotein⁻¹ × min⁻¹) of model output reactions, defined by FVA after mapping of quantitative gene expression changes. No changes
were observed in the maximum values.
                    IRF3             IRF7             AP-1              AP-1          NFκB            NFκB
                    DM_ISRE_IRF3[n]  DM_ISRE_IRF7[n]  AP1_FOS_JUN_BIND  AP1_JUN_BIND  NFKB_IKBA_DISS  NFKB_IKBB_DISS
                    min              min              min               min           min             min
TLR7_BIND           0.000            0.000            0.010             0.019         0.011           0.027
TLR8_BIND           0.000            0.000            0.010             0.019         0.011           0.027
TLR9_BIND           0.000            0.000            0.010             0.019         0.011           0.027
TLR9_BINDII         0.000            0.000            0.010             0.019         0.011           0.027
STLR2/L_SCD14_BIND  0.000            0.000            0.010             0.019         0.011           0.027
Table 4.17: Changes due to mapping of quantitative gene expression changes. Observed changes in min flux values (µmol ×
gprotein⁻¹ × min⁻¹) of model output reactions, defined by FVA after mapping of quantitative gene expression changes. No changes
were observed in the maximum values.
File S1
Maike K. Aurich and Ines Thiele, Contextualization procedure and modeling
of monocyte specific TLR signaling.
Workflow for the generation of a cell-type specific network of
TLR-signaling
Requirements of software and packages:
– R (integrated suite of software facilities for data manipulation, calculation and graphical display, http://www.r-project.org/, [1]) and PANP package [2]
– Matlab (Mathworks, Inc)
– COBRA toolbox (http://opencobra.sourceforge.net/openCOBRA/Welcome.html, [3]) and a
linear programming solver
– Paint4Net [4]
Step 1: Select data
• Download CEL files from e.g. Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/,
[5]).
Step 2: Generate P/A calls from gene expression data using R
• Use PANP package [2] in R for data processing and the generation of P/A calls.
Workflow in R:
> source("http://bioconductor.org/biocLite.R")
> biocLite("panp")   # on current Bioconductor releases, use BiocManager::install("panp") instead
> setwd("C:/...")   # e.g., the directory containing the CEL files
> library(gcrma)
> Data <- ReadAffy()   # read the CEL files in the working directory
> eset <- gcrma(Data)   # background correction and normalization
> library(panp)
> PA <- pa.calls(eset, looseCutoff = 0.05, tightCutoff = 0.01, verbose = FALSE)
> myPcalls <- PA$Pcalls   # presence/absence calls per probeset
> write.table(myPcalls, "A P Calls", append = FALSE, quote = TRUE, sep = " ", eol = "\n",
    na = "NA", dec = ".", row.names = TRUE)
> myPvals <- PA$Pvals   # p-values underlying the calls
> write.table(myPvals, "A P vals", append = FALSE, quote = TRUE, sep = " ", eol = "\n",
    na = "NA", dec = ".", row.names = TRUE)
> write.exprs(eset, file = "myresults.txt")
This is the only analysis step performed in R; all subsequent steps use Matlab.
Step 3: Derive cell-type specific gene lists of absent genes from P/A calls at
two different cutoffs
• Map Affymetrix IDs (A/P calls) to Entrez Gene IDs (ihsTLRv2.genes), e.g., using IDconverter
(http://idconverter.bioinfo.cnio.es/ [6]).
• If multiple Affymetrix IDs matched one model gene, we used the Affymetrix ID showing the
highest mean expression intensity in the untreated group. For the generation of the monocyte
model, we derived calls using the Affymetrix IDs specified in File S2, Table S14.
• Summarize the calls for the untreated replicates (to generate the monocyte model) to obtain one call (A or P)
per ihsTLRv2 gene and cutoff (p≤ 0.01 and p≤ 0.05).
• Marginal calls are absent in the tight (p≤ 0.01) and present in the loose (p≤ 0.05) cutoff. If a gene
received absent calls in the majority of replicates, call the gene absent.
• Lists of absent genes used for the generation of the monocyte draft models (p≤ 0.01 and p≤ 0.05)
are provided in the supplementary information (File S2, Table S15).
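The summarization rule of Step 3 can be sketched as follows. This is a minimal Python illustration rather than part of the original workflow: the function names, the gene labels and the example calls are hypothetical, and it assumes that each replicate carries one PANP call per gene ('P' present, 'M' marginal, 'A' absent) and that a tie between absent and present replicates counts as present.

```python
def call_at_cutoff(call, cutoff):
    """Translate a 'P', 'M' or 'A' call into 'P' or 'A' at the 'tight' or 'loose' cutoff."""
    if call not in ("P", "M", "A"):
        raise ValueError(f"unknown call: {call!r}")
    if cutoff not in ("tight", "loose"):
        raise ValueError(f"unknown cutoff: {cutoff!r}")
    if call == "M":
        # Marginal calls are absent at the tight and present at the loose cutoff
        return "P" if cutoff == "loose" else "A"
    return call


def summarize_gene(replicate_calls, cutoff):
    """Return 'A' if the majority of replicates is absent at the cutoff, otherwise 'P'."""
    n_absent = sum(call_at_cutoff(c, cutoff) == "A" for c in replicate_calls)
    return "A" if n_absent > len(replicate_calls) / 2 else "P"


# Hypothetical calls for three genes in three untreated replicates
calls = {
    "geneA": ["P", "P", "M"],
    "geneB": ["M", "M", "A"],
    "geneC": ["A", "A", "P"],
}

absent_tight = [g for g, c in calls.items() if summarize_gene(c, "tight") == "A"]
absent_loose = [g for g, c in calls.items() if summarize_gene(c, "loose") == "A"]
```

In this example, geneB is absent only at the tight cutoff, which makes it one of the genes that Step 5 would check against protein-level evidence.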
Step 4: Generate two draft models from lists of absent genes
• Use ihsTLRv2, the lists of absent genes and the COBRA toolbox [3] function deleteModelGenes to
generate two draft models.
• Set columns of constrained reactions in the draftmodel S matrix to zero.
Commands in Matlab:
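[draftmodel,hasEffect,constrRxnNames,deletedGenes] = deleteModelGenes(model,geneList);
% indices of the reactions that were constrained by the gene deletion
constrRxn = find(ismember(draftmodel.rxns,constrRxnNames));
for i = 1 : length(constrRxn)
    draftmodel.S(:,constrRxn(i)) = 0; % set the column of the constrained reaction to zero
end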
• The applied method relies entirely on the input (list of absent genes) and disables flux through any
reaction associated with a deleted gene if no isozymes are associated. Gene expression data are
known to be noisy, therefore it is important to find a suitable cutoff when using this method, and
to perform manual curation afterwards.
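The isozyme rule described above can be illustrated with a small Python sketch. It is not the COBRA Toolbox implementation: the representation of gene-protein-reaction rules as a list of alternative gene sets (isozymes or complexes), the function name, and all reaction and gene labels are assumptions made for this example.

```python
def constrained_reactions(gpr_rules, absent_genes):
    """Return the reactions that lose every catalyzing isozyme or complex.

    gpr_rules maps a reaction to a list of alternatives; each alternative is the
    set of genes that must all be present for that isozyme or complex to work.
    """
    absent = set(absent_genes)
    constrained = []
    for rxn, alternatives in gpr_rules.items():
        if not alternatives:
            # Reactions without gene association cannot be affected by a deletion
            continue
        # An alternative fails if at least one of its genes is absent
        if all(set(alt) & absent for alt in alternatives):
            constrained.append(rxn)
    return constrained


# Hypothetical rules: R2 has two isozymes, R3 needs a two-gene complex, R4 has no genes
gpr_rules = {
    "R1": [{"g1"}],
    "R2": [{"g1"}, {"g2"}],
    "R3": [{"g2", "g3"}],
    "R4": [],
}

closed = constrained_reactions(gpr_rules, absent_genes=["g1", "g3"])
```

Here R2 stays open because the isozyme encoded by g2 is still present, whereas R1 and R3 are closed; a single false absent call for g1 or g3 would therefore change the draft model, which is why the cutoff choice and manual curation matter.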
Step 5: Find the best draft reconstruction with respect to cell-type
• Check if absent genes are absent at protein level in target cell-type using the Human Protein Atlas
and decide based on this additional information, on the biologically most conclusive cutoff.
Step by step:
– Find the set of genes absent in the tight (p≤ 0.01) and present in the loose (p≤ 0.05) cutoff.
– Check for protein expression using the Human Protein Atlas (http://www.proteinatlas.org/)
in the cell-type (herein, two monocytic leukemia cell lines (THP-1 and U-937)).
– If the antibody yielded at least weak staining for the majority of tests in one cell sample, call
the gene product present.
– Use expression information and probability values from P/A calls (A P vals file generated by
PANP) to define the most suitable cutoff.
– Optionally, also take into consideration the number of blocked reactions, dead ends and input
receptors covered by either draftmodel.
Commands in Matlab:
[minFlux,maxFlux] = fluxVariability(draftmodel,0);
Flux = [minFlux maxFlux];
% a reaction is blocked if neither its minimum nor its maximum flux differs from zero
Blockedrxns = all(abs(Flux) <= 1e-9, 2);
% metabolite connectivity: number of reactions in which each metabolite participates
MetConn = zeros(length(draftmodel.mets),1);
for i = 1:length(draftmodel.mets)
    MetConn(i,1) = length(find(draftmodel.S(i,:)));
end
MetConnCompare1 = sort(MetConn,'descend');
% dead ends: metabolites that participate in a single reaction only
deadends(1,1) = length(find(MetConnCompare1(:,1)==1));
• Decide on a suitable cutoff.
• For the monocyte, we decided to continue with the draftmodel generated using data from the loose
(p≤ 0.05) cutoff.
• If necessary, repeat draft model generation based on absent gene set of the newly defined cutoff
(Step 4).
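The two network statistics used to compare the draft models in Step 5 can be sketched in plain Python. This is an illustration under stated assumptions, not the Matlab code of the workflow: the stoichiometric matrix is a small hypothetical list of rows (metabolites) by columns (reactions), the flux ranges are made-up values standing in for FVA output, and the tolerance of 1e-9 is an assumed numerical threshold.

```python
def blocked_reactions(min_flux, max_flux, tol=1e-9):
    """A reaction is blocked if both its minimum and maximum flux are (numerically) zero."""
    return [abs(lo) <= tol and abs(hi) <= tol for lo, hi in zip(min_flux, max_flux)]


def metabolite_connectivity(S):
    """Number of reactions (non-zero entries) each metabolite row participates in."""
    return [sum(1 for coeff in row if coeff != 0) for row in S]


def count_dead_ends(S):
    """Dead ends are metabolites that take part in exactly one reaction."""
    return sum(1 for c in metabolite_connectivity(S) if c == 1)


# Hypothetical linear pathway: R1: -> A, R2: A -> B, R3: B -> C (C is never consumed)
S = [
    [1, -1, 0],   # A
    [0, 1, -1],   # B
    [0, 0, 1],    # C
]

connectivity = metabolite_connectivity(S)
n_dead_ends = count_dead_ends(S)

# Made-up flux ranges for three reactions
blocked = blocked_reactions(min_flux=[0.0, 0.0, 0.0], max_flux=[0.0, 1.0, 1e-12])
```

Comparing these counts between the tight-cutoff and loose-cutoff draft models gives a rough indication of how much of the network each cutoff disconnects.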
Step 6: Curate draftmodel: complete disconnected output pathways
• Check using Flux variability analysis (FVA) if the draftmodel can produce all outputs known to be
produced in the cell-type.
Command in Matlab:
[minFlux,maxFlux] = fluxVariability(model,0);
– Monocytes produce all six ihsTLRv2 outputs. However, the monocyte draftmodel was not
able to produce e.g. NF-κB.
• Search for candidate genes to complete disconnected output pathways using the pathway illustration
tool, Paint4Net [4]:
– First, identify a list of reactions involved in the signaling pathway towards NF-κB using ihsTLRv2, which contains the complete pathways, as reference. Choose a radius wide enough that
involvedRxns captures the complete pathway from an input to the specified output.
Command in Matlab:
[involvedRxns,involvedMets,deadEnds] = draw_by_met(model,metAbbr,drawMap,...
radius,direction,excludeMets,flux);
– Do the same for the draftmodel and the disconnected output pathways.
– Compare the resulting list of participating reactions to reveal the missing links in the draftmodel. Find the genes associated with these reactions. These are the candidate genes.
– Generate draft models, while reincorporating one candidate gene at a time and check the
impact on output production.
– Candidate genes identified using this approach and output fluxes derived after reincorporation
of candidate genes were added to the supplementary information (File S2, Table S6).
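The comparison of reaction lists in Step 6 amounts to a set difference followed by a gene lookup. The Python sketch below illustrates this under assumptions: the reaction lists stand in for the involvedRxns output obtained for the reference and the draft model, and the reaction names, gene names and the reaction-to-gene mapping are hypothetical.

```python
def candidate_genes(reference_rxns, draft_rxns, rxn_genes):
    """Return the reactions missing from the draft pathway and the genes associated with them."""
    missing = sorted(set(reference_rxns) - set(draft_rxns))
    genes = sorted({gene for rxn in missing for gene in rxn_genes.get(rxn, ())})
    return missing, genes


# Hypothetical pathway towards one output in the reference and in the draft model
reference_rxns = ["R1", "R2", "R3", "R4"]
draft_rxns = ["R1", "R4"]

# Hypothetical gene associations of the reference reactions
rxn_genes = {
    "R2": ["geneB"],
    "R3": ["geneA", "geneB"],
}

missing_rxns, candidates = candidate_genes(reference_rxns, draft_rxns, rxn_genes)
```

Each candidate gene would then be removed from the list of absent genes one at a time, and the draft model regenerated, to test whether the blocked output is restored.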
Step 7: Curate draftmodel: Curate model based on cell-type specific literature
• Search for literature evidence of absent genes being expressed in the specific cell-type.
– We only considered genes, which were absent in the monocyte draftmodel, while isoforms of
already captured genes were ignored.
– During manual curation, 14 genes were reintroduced to the ihsMonoTLR model based on
literature support (File S2, Table S7).
Step 8: Derive curated cell-type model
• Update gene list by removing curated genes from list of absent genes at defined cutoff (p≤ 0.05).
• Generate final cell-type model based on deletion of curated gene list (Step 4).
• The absent gene list used for the generation of the final monocyte model is provided in the supplementary information (File S2, Table S15).
• Use the COBRA function extractSubNetwork and the set of reactions that remained unconstrained during model generation to extract the final model (the constrRxnNames output of deleteModelGenes provides the list of constrained
reactions).
Command in Matlab:
subModel = extractSubNetwork(model,rxnNames)
Step 9: Tailor the cell-type model to a specific condition
• Find list of cell-type model genes absent in the specific condition (e.g. LPS treatment of monocytes).
• Repeat Step 4 using the final cell-type model as model and the set of absent genes.
• In the case of the monocyte model, two genes (Entrez Gene ID: 246330 and 10333) were deleted in order
to derive the LPS-stimulation specific monocyte network (hMonoTLR_LPS).
References
1. Gentleman R, Carey V, Bates D, Bolstad B, Dettling M, et al. (2004) Bioconductor: open software
development for computational biology and bioinformatics. Genome biology 5: R80.
2. Warren P, Taylor D, Martini P, Jackson J, Bienkowska J (2007) PANP-a new method of gene
detection on oligonucleotide expression arrays. In: Bioinformatics and Bioengineering, 2007. BIBE
2007. Proceedings of the 7th IEEE International Conference on. IEEE, pp. 108–115.
3. Schellenberger J, Que R, Fleming R, Thiele I, Orth J, et al. (2011) Quantitative prediction of
cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nature Protocols 6:
1290–1307.
4. Kostromins A, Stalidzans E (2012) Paint4Net: COBRA Toolbox extension for visualization of
stoichiometric models of metabolism. BioSystems.
5. Edgar R, Domrachev M, Lash A (2002) Gene Expression Omnibus: NCBI gene expression and
hybridization array data repository. Nucleic acids research 30: 207–210.
6. Alibés A, Yankilevich P, et al. (2007) IDconverter and IDClight: Conversion and annotation of gene
and protein IDs. BMC bioinformatics 8: 9.
File S3
Figure 1. Sensitivity analysis
Figure 2. Sensitivity analysis
Figure 3. Sensitivity analysis
Figure 4. Sensitivity analysis
Figure 5. Sensitivity analysis
Figure 6. Sensitivity analysis
Figure 7. Sensitivity analysis
Figure 8. Sensitivity analysis
Figure 9. Sensitivity analysis
Figure 10. Sensitivity analysis
Figure 11. Sensitivity analysis
Figure 12. Sensitivity analysis
Figure 13. Sensitivity analysis
5 Conclusions and future directions
5.1 Conclusions
Modern high-throughput techniques offer immense opportunities to investigate the behavior of whole
systems, with human diseases being one application. However, the inherent complexity of the data challenges their interpretation, and new avenues need to
be taken in order to handle the complexity of both diseases and data. In chapter 1 of
this thesis, I provided an overview of the COBRA approach and described how its biomedical
applications and the number of algorithms for the integration of omics data into the
network context are constantly increasing [24, 127, 146, 276].
At the starting point of this thesis, existing approaches to build cell type or condition
specific metabolic models mainly emphasized the integration of proteomic and
transcriptomic data. This PhD work initially focused on the use of metabolomics
data as primary data type during condition specific network generation. It provides
approaches for the integration of both quantitative (chapter 2) and semi-quantitative
(chapter 3) extracellular metabolomic data, and shows how COBRA can be used
to gain insights into the intracellular metabolic network that could not have been
drawn from the data alone.
In chapter 2, I used published quantitative metabolomic data [137] and the human
metabolic model to investigate metabolic heterogeneity among a large set of cancer
cell lines. I generated and analyzed a set of 120 cancer cell line specific metabolic
models from the same number of single metabolomic profiles. The data set was particularly interesting, because all cell lines were grown under the same set of environmental conditions, emphasizing genetic differences among the cancer cell lines.
Computational analysis of the model set allowed me to reveal the vast metabolic
heterogeneity among the models, and to classify the models into distinct phenotypic
groups. The classifications were based on distinct utilization of ATP and cofactor
production pathways, as well as the flexibility of the models to variation of nutrient
uptake and secretion rates. Thus, the inference of internal metabolic states from
extracellular metabolomic data revealed large differences of metabolic strategies
among the cell line models.
The cancer cell line models differed with respect to their feasible range of oxygen
uptake rates, which was one example of how predictions by models generated
from extracellular metabolomic data alone can provide novel insight into cellular
metabolic traits. Some of the cell line models had a limited range of feasible uptake
rates, whereas others were indifferent towards oxygenation. Moreover, limitation of
oxygen uptake rates induced a dependency on reductive carboxylation in an increasing number of cancer cell line models compared to the ‘unlimited’ condition. This
result could have broad implications for experimental work, since it demonstrated
that the oxygenation conditions define pathway usage. Experiments are usually
conducted under normoxic conditions, in which oxygen levels are much higher than in tissues or
tumors [172]. Thus, normoxic conditions might introduce a cell line specific
bias into experimental results and the conclusions drawn from these experiments.
One methodological challenge addressed in this study was the identification of ‘undetected exchanges’, which arise either from the limitations of the detection method or from the uncertain composition of serum. This problem was addressed by predicting a minimal
number of ‘hypothetically undetected’ additional exchanges, based on the context
of the human metabolic model and subject to the stated objective function. Frequently added metabolite exchanges could be linked to cancer through a literature search.
Additionally, some of the models correctly predicted essential genes, i.e., PGDH
dependency in the SK-MEL-28 models. These correct predictions provided additional support for the approach to model generation from extracellular metabolomic
data alone.
Although the strategy of minimizing the number of exchanges provided a vast reduction of the overall model size, the internal network redundancy was preserved.
Nevertheless, meaningful predictions could be drawn from the models, suggesting
that the metabolic profiles were comprehensive enough to sufficiently define the solution spaces of the models. However, metabolic profiles often capture
fewer metabolites, which raises the question of whether meaningful predictions can be drawn
from less comprehensive or semi-quantitative extracellular metabolomic profiles,
and whether additional transcriptomic data would improve predictions or even provide additional insights in combination with the model predictions.
In chapter 3, I used another set of metabolomic data to generate metabolic models
of two lymphoblastic leukemia cell lines. I developed an approach for the mapping
of semi-quantitative extracellular metabolomic data. Additionally, transcriptomic
data was used to apply constraints to the internal reactions. Quantitative differences
in the metabolite exchange profiles of the two cell lines were translated into quantitative differences in the constraints imposed on metabolite exchange reactions in
the two models. Further, unique uptake and secretion of metabolites were detected
in one cell line or the other. As a consequence of the imposed constraints, the two
cancer cell line models explained the differences in the extracellular metabolomic
and transcriptomic data by different utilization of glycolysis and oxidative phosphorylation,
suggesting that aerobic glycolysis was utilized more by the CCRF-CEM model
than by the Molt-4 model, which relied more on oxidative phosphorylation. Differential gene expression analysis and analysis of alternative splicing events distinguished the two cell lines and revealed an accumulation of these regulatory events
at rate-limiting steps of central metabolic pathways, which further supported that
the two cell lines had distinct metabolic phenotypes. Thus, integration of semi-quantitative extracellular metabolomics data into the context of a human metabolic
model enabled the interrogation of metabolic differences among cell lines.
Additionally, the potential of integrating multiple omics data sets was emphasized.
Differential gene expression is suitable to push the systems analysis to the next level,
i.e., to find indicators that might explain the metabolic differences between the cell
lines. Such a comparison of differential gene expression and predicted reaction
fluxes was also performed to compare adipocyte metabolism in obese versus lean
subjects, and again for the investigation of disease mechanisms in non-alcoholic
fatty liver disease [144, 146]. However, to my knowledge, differential splice variants
have so far not been taken into consideration. Considering the differential regulation
of genes as a potential determinant of the observed metabolic differences between cell
lines ultimately reveals the limits of metabolic models, which, to date,
do not account for regulatory effects in humans.
In chapter 1 of this thesis, I discussed that in cancer, rewiring of the metabolic
network is connected to genetic alterations affecting characteristic signaling pathways. A first step was to ask how far signaling networks
vary at the level of cell types or conditions, and whether the COBRA approach could provide valuable insights into such cell type and condition specific differences through
network contextualization using omics data sets.
One of only two reconstructed signaling networks in higher organisms was the
mammalian TLR signaling network [7, 8]. Cell type specific differences
in TLR expression had been observed, and an involvement of TLR signaling
in diseases such as cancer was evident [38, 39, 211, 212, 213]. The published TLR
signaling network did not include genes and GPRs, which was an obstacle to data
integration. Hence, I manually identified the genes encoding the proteins in the
network and formulated the GPRs to enable mapping of transcriptomic data.
Subsequently, I demonstrated, using the newly identified set of genes and various
data sets, e.g., InnateDB, and the Human Protein Atlas [190, 227], that the network
was indeed subject to cell type specific differences, and that cell type or condition specificity can be displayed by the signaling network. Further, comparison of
the quantitative differences in the abundance of the respective signaling proteins
and TLR receptors could reveal differences between cancer cell lines, for which
large-scale proteomic data sets had been recently published [110, 111]. Apart from
quantitative and qualitative tissue specific differences of protein expression, I also
investigated the disease relevance of the TLR signaling network. Based on the newly
identified gene set, I identified SNPs for 12 distinct network genes linked to known
clinical phenotypes including cancer (Table 4.3).
COBRA is limited to the exploration of steady-states. Nevertheless, network contextualization could provide valuable insights into TLR signaling. I set up a pipeline
for the contextualization of signaling networks, which was based on transcriptomic
data. To establish the contextualization of a signaling network, I used data from a
cell type which was well studied with respect to TLR signaling, such that the outcome could be readily validated based on existing literature. I first generated a cell
type specific, monocyte model of TLR signaling. The threshold for the definition of
absent and present gene sets was based on manual evaluation of the overlap between
presence of gene expression and presence of protein expression [190]. Additionally, the preliminary monocyte model was manually curated to fill in the emerging
gaps. The final cell type specific monocyte model was further tailored towards an
LPS activation condition specific TLR signaling network of the human monocyte.
There were only minor differences in qualitative gene expression. However, quantitative differential gene expression between LPS activated and non-activated human
monocytes revealed topological differences. Differential gene regulation post LPS
stimulation involved about one third of the TLR signaling network of the monocyte
including the transcription factors NF-κB and AP-1, which was an expected response [242, 258]. Thus, the contextualization of the signaling network constitutes
a tool to investigate the impact of changes in the network topology and generation
of input/output relationships in a cell type or condition specific manner. In addition
to the contextualization, I demonstrated linkage of the TLR signaling network to
metabolism by elucidating the dependency of the individual input-output relationships on energy metabolites.
Thus, this thesis furthers the application of COBRA to the analysis of omics data,
and for biomedical applications in multiple ways. It introduces novel approaches to
the generation of condition specific metabolic models from extracellular metabolomics
data sets alone (Chapter 2), or in combination with transcriptomic data (Chapter 3),
and opens further the avenue for the contextualization of signaling networks (Chapter 4).
5.2 Future applications
The outcomes of this thesis are working procedures for the contextualization of
metabolic models with an emphasis on the integration of extracellular metabolomic
data [15, 252], and for the contextualization of a signaling network. Multiple future
steps arise as potential continuations of the work presented in this thesis.
5.2.1 Extension of the TLR signaling network
TLR signaling is, not least for its contribution to human disease, an active field of
research. The mammalian signaling network was compiled based on a map published in 2006 [7, 43]. New knowledge has been generated since then, and it would
be worthwhile to add this emerging biochemical knowledge into ihsTLRv2 in order
to provide a better framework for future contextualization. In order to widen the
scope of ihsTLRv2 for the modeling of neurodegenerative diseases, the microglial
MAC1 receptor could be incorporated into ihsTLRv2. This receptor has been shown to
become activated through β-amyloid peptide, followed by activation of NADPH oxidase (PHOX), production of superoxide and potentially the death of dopaminergic
neurons [277]. In order to integrate the response to β-amyloid into the TLR network, two network species need to be added to ihsTLRv2: the MAC1 receptor and
RAC.
Besides the use of transcriptomic or proteomic data as demonstrated herein, the
TLR signaling network could provide a context for the analysis of additional omics
data sets, e.g., phosphoproteomics. The consequences of LPS stimulation on the
protein phosphorylation in primary macrophages could be a good starting point to
establish the integration of this data, since it again deals with a well studied system
and the quality of the insights gained through the analysis could be validated based
on the existing literature [278].
5.2.2 Future directions in the integration of metabolomics data
The use of metabolomic data seems particularly promising for personalized health
and diagnostics. These data sets are particularly well suited to investigate metabolic
phenotypes, since metabolites constitute the entities of the metabolic networks and are
subject to less regulatory influence than transcripts or proteins (Figure 1.3).
Furthermore, metabolomic data sets have become increasingly comprehensive in
recent years, and metabolomics profiles from the spent medium of cell cultures or
patients’ biofluids can be obtained relatively easily. One future goal would be the
use of COBRA as a tool to predict disease risk (e.g., IEM or cancer), or to support
personalized medication. However, a number of challenges have to be overcome in
order to reach this point. Methods need to be established to integrate data from patient-derived
body fluid samples (e.g., blood, urine, interstitial fluid or even mixtures of these)
into the context of the metabolic reconstruction. Currently, uptake rates are taken
into account, or the data are incorporated into a non-growth associated biomass
objective function [117, 143]. The prediction of a minimized set of ‘undetected’
metabolite exchanges, as demonstrated in chapter 2, will enable the integration of
incomplete extracellular metabolomics profiles. However, experimental validation
will need to confirm the correctness of the predicted additional exchanges, and the
potential impact of false prediction on the phenotype of the patient-specific model
needs to be evaluated. Furthermore, the prediction of metabolite exchanges carried
out herein depends on a stated OF. Whereas biomass production can be a valid assumption for proliferating cancer cells or cell lines, is the definition of an OF of
normal human cells problematic. The variability of metabolic profiles from cells of
the same cell line and the impact on the metabolic phenotype of the cell line model
became evident in chapter 2. Further work is needed to investigate the variability of
the exchanges across studies, to prevent false predictions, and to evaluate the robustness
of the approach towards natural variability. Samples from body fluids bear the problem that the cellular origin of increased or decreased metabolite levels is unknown.
However, applying Recon as a whole-body metabolic network (including
all organs), and explaining the differences in the plasma as the net metabolite
changes (uptake and secretion) mediated by cells throughout the human body [53, 143],
could be a good starting point.
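As a minimal sketch of how measured uptake and secretion rates might be taken into account, the Python snippet below converts rates into lower and upper bounds on exchange reactions. The function name exchange_bounds, the reaction identifiers, the example rates and the relative tolerance are illustrative assumptions and are not taken from the studies in this thesis. The only convention relied upon is the usual COBRA sign convention, in which negative exchange fluxes denote uptake and positive fluxes denote secretion.

```python
def exchange_bounds(measured_rates, rel_tolerance=0.2):
    """Turn measured exchange rates into (lower, upper) flux bounds.

    measured_rates: dict mapping a (hypothetical) exchange reaction ID to a
        rate, negative for uptake and positive for secretion.
    rel_tolerance: assumed relative measurement uncertainty (illustrative).
    """
    if rel_tolerance < 0:
        raise ValueError("rel_tolerance must be non-negative")
    bounds = {}
    for rxn_id, rate in measured_rates.items():
        slack = abs(rate) * rel_tolerance
        # Widen the measured value by the tolerance on both sides.
        bounds[rxn_id] = (rate - slack, rate + slack)
    return bounds


# Hypothetical example values (not data from this thesis).
rates = {"EX_glc": -2.0, "EX_lac": 3.0}
bounds = exchange_bounds(rates, rel_tolerance=0.5)
```

Bounds of this kind could then be imposed on the corresponding exchange reactions of a model before fluxes are predicted. Exchanges without a measurement are not covered by this sketch.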
5.2.3 COBRA modeling of cancer and beyond
The number of aspects of cancer metabolism that are investigated using COBRA
is constantly expanding, and even extends beyond metabolism, e.g. to solvent capacity
[102]. Two of the studies comprising this thesis (chapters 2 and 3) add to the growing number of cancer studies, providing novel insights into the metabolic
heterogeneity and the robustness towards system perturbations among cancer cell
lines. Similarly, the methods could be applied to other cellular systems and disease
conditions.
TLR signaling has been connected to cancer and cancer-associated metabolic phenotypes; e.g., TLR agonists have been observed to stimulate a switch from
oxidative phosphorylation to glycolysis in murine immune cells [255, 256]. Part of
chapter 4 described the absence of TLRs from the quantitative proteomic
data of two cancer cell lines. It would be interesting to investigate TLR expression using alternative data sets, and to test for a correlation between phenotype and the pathways comprising the TLR signaling cascades. Extensive cross-talk exists between signaling
pathways. As an example, cross-talk between the epidermal growth factor receptor (EGFR) and TLR signaling was shown to compromise the immune response to
viruses [279]. However, to investigate such cross-talk, the EGFR network would need to be reconstructed.
Chapter 4 emphasized the energy requirements of the distinct input-output pathways
in the TLR signaling network that constitute direct links between the signaling and
metabolic networks. Another good example of the interlocking of signaling and
metabolism is the protein kinase function of the glycolytic enzyme PKM2 and its
impact on gene transcription [280, 281]. Taken together, important insights into the
cross-talk between signaling and metabolism can be expected from the integration of
models of different cellular processes, i.e., metabolism and signaling, or metabolism
and gene regulatory networks [256, 259, 260, 261, 262]. Tools to accomplish the
coupling are being developed [6, 27, 263]; however, the size of such networks likely
exceeds currently available computing power.
Finally, predictions from contextualized networks can only be as good as the starting model. Continuous improvement and extension of the starting model based on
existing and emerging biochemical knowledge or data, as carried out throughout
this work (Figure 1.4) [53, 58, 173], is one of the most important tasks to facilitate
the analysis of omics data in the model context, and to improve the predictions and
the applicability of the biochemical reaction networks in general.
Hence, this thesis provides starting points for a wide range of applications of biochemical reaction networks in health and disease.
Bibliography
[1] Kitano, H. Foundations of Systems Biology. MIT Press, Cambridge, MA,
(2001).
[2] Palsson, B. Systems Biology: Properties of Reconstructed Networks. Cambridge University Press, Cambridge; New York, (2006).
[3] Kitano, H. Science 295(5560), 1662–1664 (2002).
[4] Machado, D., Costa, R. S., Rocha, M., Ferreira, E. C., Tidor, B., and Rocha,
I. AMB Express 1(1), 1–14 (2011).
[5] Durot, M., Bourguignon, P.-Y., and Schachter, V. FEMS Microbiology Reviews 33(1), 164–190 (2009).
[6] Thiele, I., Jamshidi, N., Fleming, R., and Palsson, B. PLoS Computational
Biology 5(3), e1000312 (2009).
[7] Li, F., Thiele, I., Jamshidi, N., and Palsson, B. PLoS Computational Biology
5(2), e1000292 (2009).
[8] Papin, J. and Palsson, B. Biophysical Journal 87(1), 37–46 (2004).
[9] Thiele, I. and Palsson, B. Nature Protocols 5(1), 93–121 (2010).
[10] Reed, J. L., Famili, I., Thiele, I., and Palsson, B. Ø. Nature Reviews Genetics
7(2), 130–141 (2006).
[11] Orth, J., Thiele, I., and Palsson, B. Nature Biotechnology 28(3), 245–248
(2010).
[12] Lewis, N. E., Jamshidi, N., Thiele, I., and Palsson, B. Ø. Encyclopedia of
Complexity and Systems Science, Robert A Meyers (ed) (2009).
[13] Varma, A. and Palsson, B. Ø. Nature Biotechnology 12, 994–998 (1994).
[14] Terzer, M., Maynard, N. D., Covert, M. W., and Stelling, J. Wiley Interdisciplinary Reviews: Systems Biology and Medicine 1(3), 285–297 (2009).
[15] Schellenberger, J., Que, R., Fleming, R., Thiele, I., Orth, J., Feist, A., Zielinski, D., Bordbar, A., Lewis, N., Rahmanian, S., et al. Nature Protocols 6(9),
1290–1307 (2011).
[16] Lewis, N. E., Nagarajan, H., and Palsson, B. Ø. Nature Reviews Microbiology
10(4), 291–305 (2012).
[17] Price, N. D., Reed, J. L., and Palsson, B. Ø. Nature Reviews Microbiology
2(11), 886–897 (2004).
[18] Savinell, J. M. and Palsson, B. Ø. Journal of Theoretical Biology 154(4),
421–454 (1992).
[19] Vo, T. D., Greenberg, H. J., and Palsson, B. Ø. Journal of Biological Chemistry 279(38), 39532–39540 (2004).
[20] Patino, C. E. H., Jaime-Munoz, G., and Resendis-Antonio, O. Frontiers in
Physiology 3, 481 (2012).
[21] Bordbar, A., Lewis, N., Schellenberger, J., Palsson, B. Ø., and Jamshidi, N.
Molecular Systems Biology 6(1), 422 (2010).
[22] Mahadevan, R. and Schilling, C. Metabolic Engineering 5(4), 264–276
(2003).
[23] Edwards, J. S. and Palsson, B. Ø. BMC Bioinformatics 1(1), 1–1 (2000).
[24] Folger, O., Jerby, L., Frezza, C., Gottlieb, E., Ruppin, E., and Shlomi, T.
Molecular Systems Biology 7, 501 (2011).
[25] Schellenberger, J. and Palsson, B. Ø. Journal of Biological Chemistry 284(9),
5457–5461 (2009).
[26] Kaufman, D. E. and Smith, R. L. Operations Research 46(1), 84–95 (1998).
[27] Thiele, I., Fleming, R. M., Que, R., Bordbar, A., Diep, D., and Palsson, B. O.
PLoS ONE 7(9), e45635 (2012).
[28] Jensen, P., Lutz, K., and Papin, J. BMC Systems Biology 5(1), 147 (2011).
[29] Papin, J. A., Hunter, T., Palsson, B. Ø., and Subramaniam, S. Nature Reviews
Molecular Cell Biology 6(2), 99–111 February (2005).
[30] Papin, J. and Palsson, B. Journal of Theoretical Biology 227(2), 283–297
(2004).
[31] Kawai, T. and Akira, S. Cell Death & Differentiation 13(5), 816–825 (2006).
[32] Akira, S., Takeda, K., and Kaisho, T. Nature Immunology 2(8), 675–680
(2001).
[33] Beutler, B. Molecular Immunology 40(12), 845–859 (2004).
[34] Kaisho, T. and Akira, S. Journal of Allergy and Clinical Immunology 117(5),
979–987 (2006).
[35] Xu, D., Komai-Koma, M., and Liew, F. Y. Cellular Immunology 233(2),
85–89 (2005).
[36] MacLeod, H. and Wetzler, L. M. Science Signaling 2007(402), pe48 (2007).
[37] Takeda, K. and Akira, S. International Immunology 17(1), 1–14 (2005).
[38] Ospelt, C. and Gay, S. The International Journal of Biochemistry & Cell
Biology 42(4), 495–505 (2010).
[39] Rakoff-Nahoum, S. and Medzhitov, R. Nature Reviews Cancer 9(1), 57–63
(2008).
[40] Bajikar, S. S. and Janes, K. A. Annals of Biomedical Engineering (2012).
[41] Kawai, T., Takeuchi, O., Fujita, T., Inoue, J.-i., Mühlradt, P. F., Sato, S.,
Hoshino, K., and Akira, S. The Journal of Immunology 167(10), 5887–5894
(2001).
[42] Dunne, A., Marshall, N., and Mills, K. Current Opinion in Pharmacology
11(4), 404–411 (2011).
[43] Oda, K. and Kitano, H. Molecular Systems Biology 2(1), 2006.0015 (2006).
[44] Voet, D., Voet, J., and Pratt, C. Fundamentals of Biochemistry, Upgraded
Edition. John Wiley & Sons, Inc, United States of America, (2002).
[45] Duarte, N., Becker, S., Jamshidi, N., Thiele, I., Mo, M., Vo, T., Srivas, R.,
and Palsson, B. Proceedings of the National Academy of Sciences 104(6),
1777–1782 (2007).
[46] Becker, S. A., Feist, A. M., Mo, M. L., Hannum, G., Palsson, B. Ø., and
Herrgard, M. J. Nature Protocols 2(3), 727–738 March (2007).
[47] Shlomi, T., Cabili, M., Herrgard, M., Palsson, B., and Ruppin, E. Nature
Biotechnology 26(9), 1003–1010 (2008).
[48] Rolfsson, O., Palsson, B. Ø., and Thiele, I. BMC Systems Biology 5(1), 155
(2011).
[49] Lewis, N., Schramm, G., Bordbar, A., Schellenberger, J., Andersen, M.,
Cheng, J., Patel, N., Yee, A., Lewis, R., Eils, R., et al. Nature Biotechnology
28(12), 1279–1285 (2010).
[50] Bordbar, A. and Palsson, B. Ø. Journal of Internal Medicine 271(2), 131–141
(2012).
[51] Jerby, L. and Ruppin, E. Clinical Cancer Research 18(20), 5572–5584 Oct
(2012).
[52] Heinken, A., Sahoo, S., Fleming, R. M., Thiele, I., et al. Gut Microbes 4(1),
28–40 (2013).
[53] Thiele, I., Swainston, N., Fleming, R. M., Hoppe, A., Sahoo, S., Aurich,
M. K., Haraldsdottir, H., Mo, M. L., Rolfsson, O., Stobbe, M. D., et al.
Nature Biotechnology 31, 419–425 (2013).
[54] Hao, T., Ma, H.-W., Zhao, X.-M., and Goryanin, I. BMC Bioinformatics
11(1), 393 (2010).
[55] Gille, C., Bölling, C., Hoppe, A., Bulik, S., Hoffmann, S., Hübner, K., Karlstädt, A., Ganeshan, R., König, M., Rother, K., et al. Molecular Systems
Biology 6(1) (2010).
[56] Sahoo, S., Franzson, L., Jonsson, J. J., and Thiele, I. Molecular BioSystems
8, 2545–2558 (2012).
[57] Sahoo, S. and Thiele, I. Human Molecular Genetics 22(13), 2705–2722
(2013).
[58] Sahoo, S., Aurich, M. K., Jonsson, J. J., and Thiele, I. Frontiers in Physiology
5 (2014).
[59] Siegel, R., Naishadham, D., and Jemal, A. CA: A Cancer Journal for Clinicians 63(1), 11–30 (2013).
[60] Masters, J. and Palsson, B. Human Cell Culture: Volume I: Cancer Cell
Lines. Cancer Cell Lines. Springer, (1999).
[61] Cairns, R. A., Harris, I. S., and Mak, T. W. Nature Reviews Cancer 11(2),
85–95 (2011).
[62] Stratton, M. R., Campbell, P. J., and Futreal, P. A. Nature 458(7239), 719–
724 (2009).
[63] Hudson, T. J., Anderson, W., Aretz, A., Barker, A. D., Bell, C., Bernabé,
R. R., Bhan, M., Calvo, F., Eerola, I., Gerhard, D. S., et al. Nature 464(7291),
993–998 (2010).
[64] Warburg, O. et al. Science 123(3191), 309–314 (1956).
[65] Frezza, C. and Gottlieb, E. Seminars in Cancer Biology 19(1), 4–11 (2009).
[66] Feron, O. Radiotherapy & Oncology 92(3), 329–333 Sep (2009).
[67] Guppy, M., Greiner, E., and Brand, K. European Journal of Biochemistry
212(1), 95–99 (1993).
[68] Vazquez, A. and Oltvai, Z. N. PLoS ONE 6(4), e19538 (2011).
[69] Metallo, C. M., Gameiro, P. A., Bell, E. L., Mattaini, K. R., Yang, J., Hiller,
K., Jewell, C. M., Johnson, Z. R., Irvine, D. J., Guarente, L., et al. Nature
481(7381), 380–384 (2011).
[70] Fan, J., Kamphorst, J. J., Rabinowitz, J. D., and Shlomi, T. Journal of Biological Chemistry 288(43), 31363–31369 (2013).
[71] Carracedo, A., Cantley, L. C., and Pandolfi, P. P. Nature Reviews Cancer
13(4), 227–232 (2013).
[72] Ganapathy, V., Thangaraju, M., and Prasad, P. D. Pharmacology & Therapeutics 121(1), 29–40 Jan (2009).
[73] Fuchs, B. C. and Bode, B. P. Seminars in Cancer Biology 15(4), 254–266
Aug (2005).
[74] Verkman, A. S., Hara-Chikuma, M., and Papadopoulos, M. C. Journal of
Molecular Medicine 86(5), 523–529 May (2008).
[75] Calvo, M. B., Figueroa, A., Pulido, E. G., Campelo, R. G., and Aparicio,
L. A. International Journal of Endocrinology 2010 (2010).
[76] Fletcher, J. I., Haber, M., Henderson, M. J., and Norris, M. D. Nature Reviews
Cancer 10(2), 147–156 Feb (2010).
[77] Frezza, C., Zheng, L., Folger, O., Rajagopalan, K. N., MacKenzie, E. D.,
Jerby, L., Micaroni, M., Chaneton, B., Adam, J., Hedley, A., Kalna, G., Tomlinson, I. P. M., Pollard, P. J., Watson, D. G., Deberardinis, R. J., Shlomi, T.,
Ruppin, E., and Gottlieb, E. Nature 477(7363), 225–228 Sep (2011).
[78] Jerby, L., Wolf, L., Denkert, C., Stein, G. Y., Hilvo, M., Oresic, M., Geiger,
T., and Ruppin, E. Cancer Research 72(22), 5712–5720 Nov (2012).
[79] Wang, C., Uray, I. P., Mazumdar, A., Mayer, J. A., and Brown, P. H. Breast
Cancer Research and Treatment 134(1), 101–115 Jul (2012).
[80] Gopal, E., Fei, Y.-J., Sugawara, M., Miyauchi, S., Zhuang, L., Martin, P.,
Smith, S. B., Prasad, P. D., and Ganapathy, V. Journal of Biological Chemistry 279(43), 44522–44532 (2004).
[81] Hong, C., Maunakea, A., Jun, P., Bollen, A. W., Hodgson, J. G., Goldenberg,
D. D., Weiss, W. A., and Costello, J. F. Cancer Research 65(9), 3617–3623
May (2005).
[82] Li, H., Myeroff, L., Smiraglia, D., Romero, M. F., Pretlow, T. P., Kasturi,
L., Lutterbaugh, J., Rerko, R. M., Casey, G., Issa, J.-P., Willis, J., Willson, J.
K. V., Plass, C., and Markowitz, S. D. Proceedings of the National Academy
of Sciences 100(14), 8412–8417 Jul (2003).
[83] Thangaraju, M., Gopal, E., Martin, P. M., Ananth, S., Smith, S. B., Prasad,
P. D., Sterneck, E., and Ganapathy, V. Cancer Research 66(24), 11560–
11564 Dec (2006).
[84] Thangaraju, M., Cresci, G., Itagaki, S., Mellinger, J., Browning, D., Berger,
F., Prasad, P., and Ganapathy, V. Journal of Gastrointestinal Surgery 12(10),
1773–1782 (2008).
[85] Coothankandaswamy, V., Elangovan, S., Singh, N., Prasad, P. D.,
Thangaraju, M., and Ganapathy, V. Biochemical Journal 450(1), 169–178
(2013).
[86] Miyauchi, S., Gopal, E., Fei, Y.-J., and Ganapathy, V. Journal of Biological
Chemistry 279(14), 13293–13296 (2004).
[87] Falasca, M. and Linton, K. J. Expert Opinion on Investigational Drugs 21(5),
657–666 May (2012).
[88] Ho, M. M., Ng, A. V., Lam, S., and Hung, J. Y. Cancer Research 67(10),
4827–4833 May (2007).
[89] Hara-Chikuma, M. and Verkman, A. S. Molecular and Cellular Biology
28(1), 326–332 (2008).
[90] Pouyssegur, J., Dayan, F., and Mazure, N. M. Nature 441(7092), 437–443
May (2006).
[91] Scalise, M., Galluccio, M., Accardi, R., Cornet, I., Tommasino, M., and Indiveri, C. Cell Biochemistry and Function 30(5), 419–425 (2012).
[92] Vadlapudi, A. D., Vadlapatla, R. K., Pal, D., and Mitra, A. K. International
Journal of Pharmaceutics 441, 535–543 (2013).
[93] Sweet, R., Paul, A., and Zastre, J. Cancer Biology & Therapy 10(11), 1101–
1111 (2010).
[94] Resendis-Antonio, O., Checa, A., and Encarnacion, S. PLoS One 5(8),
e12383 (2010).
[95] Bordbar, A., Monk, J. M., King, Z. A., and Palsson, B. O. Nature Reviews
Genetics 15(2), 107–120 (2014).
[96] Masoudi-Nejad, A. and Asgari, Y. Seminars in Cancer Biology (0), – (2014).
[97] Lewis, N. E. and Abdel-Haleem, A. M. Frontiers in Physiology 4 (2013).
[98] Shlomi, T., Benyamini, T., Gottlieb, E., Sharan, R., and Ruppin, E. PLoS
Computational Biology 7(3), e1002018 (2011).
[99] Agren, R., Bordel, S., Mardinoglu, A., Pornputtapong, N., Nookaew, I., and
Nielsen, J. PLoS Computational Biology 8(5), e1002518 (2012).
[100] Vazquez, A., Markert, E. K., and Oltvai, Z. N. PLoS ONE 6(11), e25881
(2011).
[101] Tedeschi, P., Markert, E., Gounder, M., Lin, H., Dvorzhinski, D., Dolfi, S.,
Chan, L. L., Qiu, J., DiPaola, R., Hirshfield, K., et al. Cell Death & Disease
4(10), e877 (2013).
[102] Vazquez, A., Liu, J., Zhou, Y., and Oltvai, Z. BMC Systems Biology 4(1), 58
(2010).
[103] Jerby, L., Shlomi, T., and Ruppin, E. Molecular Systems Biology 6, 401 Sep
(2010).
[104] Gatto, F., Nookaew, I., and Nielsen, J. Proceedings of the National Academy
of Sciences 111(9), E866–E875 (2014).
[105] Colijn, C., Brandes, A., Zucker, J., Lun, D. S., Weiner, B., Farhat, M. R.,
Cheng, T.-Y., Moody, D. B., Murray, M., and Galagan, J. E. PLoS Computational Biology 5(8), e1000489 (2009).
[106] Cox, J. and Mann, M. Cell 130(3), 395–398 (2007).
[107] Wang, Z., Gerstein, M., and Snyder, M. Nature Reviews Genetics 10(1),
57–63 (2009).
[108] Barash, Y., Calarco, J., Gao, W., Pan, Q., Wang, X., Shai, O., Blencowe, B.,
and Frey, B. Nature 465(7294), 53–59 May (2010).
[109] Sun, Q., Chen, X., Ma, J., Peng, H., Wang, F., Zha, X., Wang, Y., Jing, Y.,
Yang, H., Chen, R., Chang, L., Zhang, Y., Goto, J., Onda, H., Chen, T., Wang,
M.-R., Lu, Y., You, H., Kwiatkowski, D., and Zhang, H. Proceedings of the
National Academy of Sciences 108(10), 4129–4134 Mar (2011).
[110] Beck, M., Schmidt, A., Malmstroem, J., Claassen, M., Ori, A., Szymborska,
A., Herzog, F., Rinner, O., Ellenberg, J., and Aebersold, R. Molecular Systems Biology 7(1), 549 (2011).
[111] Nagaraj, N., Wisniewski, J., Geiger, T., Cox, J., Kircher, M., Kelso, J., Pääbo,
S., and Mann, M. Molecular Systems Biology 7(1), 548 (2011).
[112] Sabidó, E., Selevsek, N., and Aebersold, R. Current Opinion in Biotechnology 23(4), 591–597 (2012).
[113] Antonucci, R., Pilloni, M. D., Atzori, L., and Fanos, V. Journal of Maternal-Fetal and Neonatal Medicine 25(S5), 22–26 (2012).
[114] Kaddurah-Daouk, R., Kristal, B. S., and Weinshilboum, R. M. Annual Review of Pharmacology and Toxicology 48, 653–683 (2008).
[115] Jamshidi, N. and Palsson, B. Ø. Molecular Systems Biology 2(1) (2006).
[116] Reed, J. L. PLoS Computational Biology 8(8), e1002662 (2012).
[117] Mo, M. L., Palsson, B. Ø., and Herrgård, M. J. BMC Systems Biology 3(1)
(2009).
[118] Åkesson, M., Förster, J., and Nielsen, J. Metabolic Engineering 6(4), 285–
293 (2004).
[119] Bordbar, A., Mo, M. L., Nakayasu, E. S., Schrimpe-Rutledge, A. C., Kim,
Y.-M., Metz, T. O., Jones, M. B., Frank, B. C., Smith, R. D., Peterson, S. N.,
et al. Molecular Systems Biology 8(1) (2012).
[120] Chang, R., Xie, L., Xie, L., Bourne, P., and Palsson, B. PLoS Computational
Biology 6(9), e1000938 (2010).
[121] Karlstädt, A., Fliegner, D., Kararigas, G., Ruderisch, H. S., Regitz-Zagrosek,
V., and Holzhütter, H.-G. BMC Systems Biology 6(1), 114 (2012).
[122] Zur, H., Ruppin, E., and Shlomi, T. Bioinformatics 26(24), 3140–3142
(2010).
[123] Chandrasekaran, S. and Price, N. Proceedings of the National Academy of
Sciences 107(41), 17845–17850 (2010).
[124] Jensen, P. and Papin, J. Bioinformatics 27(4), 541–547 (2011).
[125] Zhao, Y. and Huang, J. Biochemical and Biophysical Research Communications 415(3), 450–454 (2011).
[126] Vlassis, N., Pacheco, M. P., and Sauter, T. arXiv preprint arXiv:1304.7992
(2013).
[127] Agren, R., Mardinoglu, A., Asplund, A., Kampf, C., Uhlen, M., and Nielsen,
J. Molecular Systems Biology 10(3) (2014).
[128] Wang, Y., Eddy, J., and Price, N. BMC Systems Biology 6(1), 153 (2012).
[129] Blazier, A. and Papin, J. Frontiers in Computational Physiology and
Medicine 3, 299 (2012).
[130] Shlomi, T. Biotechnology and Genetic Engineering Reviews 26(1), 281–296
(2009).
[131] Becker, S. and Palsson, B. Ø. PLoS Computational Biology 4(5), e1000082
(2008).
[132] Bordbar, A., Jamshidi, N., and Palsson, B. Ø. BMC Systems Biology 5(1),
110 (2011).
[133] Bordbar, A., Feist, A., Usaite-Black, R., Woodcock, J., Palsson, B., and
Famili, I. BMC Systems Biology 5(1), 180 (2011).
[134] Fleming, R., Thiele, I., and Nasheuer, H. Biophysical Chemistry 145(2), 47–
56 (2009).
[135] Kümmel, A., Panke, S., and Heinemann, M. BMC Bioinformatics 7(1), 512
(2006).
[136] Yizhak, K., Benyamini, T., Liebermeister, W., Ruppin, E., and Shlomi, T.
Bioinformatics 26(12), i255–i260 (2010).
[137] Jain, M., Nilsson, R., Sharma, S., Madhusudhan, N., Kitami, T., Souza,
A. L., Kafri, R., Kirschner, M. W., Clish, C. B., and Mootha, V. K. Science
336(6084), 1040–1044 (2012).
[138] Schmidt, B. J., Ebrahim, A., Metz, T. O., Adkins, J. N., Palsson, B. Ø., and
Hyduke, D. R. Bioinformatics 29(22), 2900–2908 (2013).
[139] Cakir, T., Patil, K. R., Onsan, Z. I., Ulgen, K. O., Kirdar, B., and Nielsen, J.
Molecular Systems Biology 2(50) September (2006).
[140] Selvarasu, S., Ho, Y. S., Chong, W. P., Wong, N. S., Yusufi, F. N., Lee, Y. Y.,
Yap, M. G., and Lee, D.-Y. Biotechnology and Bioengineering 109(6), 1415–
1429 (2012).
[141] Ahn, S.-Y., Jamshidi, N., Mo, M. L., Wu, W., Eraly, S. A., Dnyanmote, A.,
Bush, K. T., Gallegos, T. F., Sweet, D. H., Palsson, B. Ø., et al. Journal of
Biological Chemistry 286(36), 31522–31531 (2011).
[142] Fan, J., Kamphorst, J. J., Mathew, R., Chung, M. K., White, E., Shlomi, T.,
and Rabinowitz, J. D. Molecular Systems Biology 9(1) (2013).
[143] Jamshidi, N., Miller, F. J., Mandel, J., Evans, T., and Kuo, M. D. BMC
Systems Biology 5(1), 200 (2011).
[144] Mardinoglu, A., Agren, R., Kampf, C., Asplund, A., Nookaew, I., Jacobson,
P., Walley, A. J., Froguel, P., Carlsson, L. M., Uhlen, M., et al. Molecular
Systems Biology 9(1) (2013).
[145] Bordel, S., Agren, R., and Nielsen, J. PLoS Computational Biology 6(7),
e1000859 (2010).
[146] Mardinoglu, A., Agren, R., Kampf, C., Asplund, A., Uhlen, M., and Nielsen,
J. Nature Communications 5 (2014).
[147] Zu, X. L. and Guppy, M. Biochemical and Biophysical Research Communications 313(3), 459–465 (2004).
[148] Locasale, J. W., Grassian, A. R., Melman, T., Lyssiotis, C. A., Mattaini,
K. R., Bass, A. J., Heffron, G., Metallo, C. M., Muranen, T., Sharfi, H.,
et al. Nature Genetics 43(9), 869–874 (2011).
[149] Vander Heiden, M. G., Locasale, J. W., Swanson, K. D., Sharfi, H., Heffron,
G. J., Amador-Noguez, D., Christofk, H. R., Wagner, G., Rabinowitz, J. D.,
Asara, J. M., et al. Science 329(5998), 1492–1499 (2010).
[150] Smolková, K., Plecitá-Hlavatá, L., Bellance, N., Benard, G., Rossignol, R.,
and Ježek, P. The International Journal of Biochemistry & Cell Biology
43(7), 950–968 (2011).
[151] Zielke, H. R., Ozand, P. T., Tildon, J. T., Sevdalian, D. A., and Cornblath, M.
Proceedings of the National Academy of Sciences 73(11), 4110–4114 Nov
(1976).
[152] DeBerardinis, R. J., Mancuso, A., Daikhin, E., Nissim, I., Yudkoff, M.,
Wehrli, S., and Thompson, C. B. Proceedings of the National Academy of
Sciences 104(49), 19345–19350 Dec (2007).
[153] Holleran, A. L., Briscoe, D. A., Fiskum, G., and Kelleher, J. K. Molecular
and Cellular Biochemistry 152(2), 95–101 Nov (1995).
[154] Gstraunthaler, G., Seppi, T., and Pfaller, W. Cellular Physiology and Biochemistry 9(3), 150–172 (1999).
[155] Marroquin, L. D., Hynes, J., Dykens, J. A., Jamieson, J. D., and Will, Y.
Toxicological Sciences 97(2), 539–547 (2007).
[156] Dewhirst, M. W., Braun, R. D., and Lanzen, J. L. International Journal of
Radiation Oncology Biology Physics 42(4), 723–726 (1998).
[157] Saks, V. Molecular System Bioenergetics: Energy for Life. Wiley, (2008).
[158] Voet, D., Voet, J., and Pratt, C. Fundamentals of Biochemistry: Life at the
Molecular Level. 2nd edit. Wiley, New York, (2006).
[159] Riemer, S. A., Rex, R., and Schomburg, D. BMC Systems Biology 7(1), 33
(2013).
[160] Griguer, C. E., Oliva, C. R., and Gillespie, G. Y. Journal of Neuro-Oncology
74(2), 123–133 (2005).
[161] Edwards, J. S., Ramakrishna, R., and Palsson, B. O. Biotechnology and
Bioengineering 77(1), 27–36 (2002).
[162] Gudmundsson, S. and Thiele, I. BMC Bioinformatics 11(1), 489 (2010).
[163] Thiele, I., Price, N., Vo, T., and Palsson, B. Journal of Biological Chemistry
280(12), 11683–11695 (2005).
[164] Greshock, J., Feng, B., Nogueira, C., Ivanova, E., Perna, I., Nathanson, K.,
Protopopov, A., Weber, B. L., and Chin, L. Cancer Research 67(21), 10173–
10180 (2007).
[165] Possemato, R., Marks, K. M., Shaul, Y. D., Pacold, M. E., Kim, D., Birsoy,
K., Sethumadhavan, S., Woo, H.-K., Jang, H. G., Jha, A. K., Chen, W. W.,
Barrett, F. G., Stransky, N., Tsun, Z.-Y., Cowley, G. S., Barretina, J., Kalaany,
N. Y., Hsu, P. P., Ottina, K., Chan, A. M., Yuan, B., Garraway, L. A., Root,
D. E., Mino-Kenudson, M., Brachtel, E. F., Driggers, E. M., and Sabatini,
D. M. Nature 476(7360), 346–350 Aug (2011).
[166] Sanz-Moreno, V., Gadea, G., Ahn, J., Paterson, H., Marra, P., Pinner, S.,
Sahai, E., and Marshall, C. J. Cell 135(3), 510–523 (2008).
[167] Spencer, S. L., Gaudet, S., Albeck, J. G., Burke, J. M., and Sorger, P. K.
Nature 459(7245), 428–432 (2009).
[168] Guppy, M., Leedman, P., Zu, X., and Russell, V. Biochemical Journal 364(Pt
1), 309–315 May (2002).
[169] Bellance, N., Benard, G., Furt, F., Begueret, H., Smolková, K., Passerieux,
E., Delage, J., Baste, J., Moreau, P., and Rossignol, R. The International
Journal of Biochemistry & Cell Biology 41(12), 2566–2577 (2009).
[170] Thompson, C. B. Cancer & Metabolism 2(Suppl 1), O32 (2014).
[171] Wellen, K. E., Lu, C., Mancuso, A., Lemons, J. M., Ryczko, M., Dennis,
J. W., Rabinowitz, J. D., Coller, H. A., and Thompson, C. B. Genes & Development 24(24), 2784–2799 (2010).
[172] Carreau, A., Hafny-Rahbi, B. E., Matejuk, A., Grillon, C., and Kieda, C.
Journal of Cellular and Molecular Medicine 15(6), 1239–1253 (2011).
[173] Aurich, M. K., Paglia, G., Rolfsson, O., Hrafnsdóttir, S., Magnúsdóttir,
M., Stefaniak, M. M., Palsson, B. O., Fleming, R. M., and Thiele, I.
Metabolomics , 1–17 (2014).
[174] Chunta, J. L., Vistisen, K. S., Yazdi, Z., and Braun, R. D. PLoS ONE 7(5),
e37471 (2012).
[175] Mir, M., Wang, Z., Shen, Z., Bednarz, M., Bashir, R., Golding, I., Prasanth,
S. G., and Popescu, G. Proceedings of the National Academy of Sciences
108(32), 13124–13129 (2011).
[176] Chapman, E. H., Kurec, A. S., and Davey, F. Journal of Clinical Pathology
34(10), 1083–1090 (1981).
[177] O’Connor, P. M., Jackman, J., Bae, I., Myers, T. G., Fan, S., Mutoh, M.,
Scudiero, D. A., Monks, A., Sausville, E. A., Weinstein, J. N., et al. Cancer
Research 57(19), 4285–4300 (1997).
[178] Nishiumi, S., Kobayashi, T., Ikeda, A., Yoshie, T., Kibi, M., Izumi, Y.,
Okuno, T., Hayashi, N., Kawano, S., Takenawa, T., Azuma, T., and Yoshida,
M. PLoS ONE 7(7), e40459 (2012).
[179] Lloyd, M. D., Darley, D. J., Wierzbicki, A. S., and Threadgill, M. D. FEBS
Journal 275(6), 1089–1102 (2008).
[180] Hellgren, L. I. Annals of the New York Academy of Sciences 1190(1), 42–49
(2010).
[181] Ollberding, N. J., Aschebrook-Kilfoy, B., Caces, D. B. D., Wright, M. E.,
Weisenburger, D. D., Smith, S. M., and Chiu, B. C.-H. Carcinogenesis 34(1),
170–175 (2013).
[182] Price, A. J., Allen, N. E., Appleby, P. N., Crowe, F. L., Jenab, M., Rinaldi,
S., Slimani, N., Kaaks, R., Rohrmann, S., Boeing, H., et al. The American
Journal of Clinical Nutrition 91(6), 1769–1776 (2010).
[183] Lloyd, M. D., Yevglevskis, M., Lee, G. L., Wood, P. J., Threadgill, M. D.,
and Woodman, T. J. Progress in Lipid Research 52(2), 220–230 (2013).
[184] Mubiru, J. N., Valente, A. J., and Troyer, D. A. The Prostate 65(2), 117–123
(2005).
[185] Ouyang, B., Leung, Y.-K., Wang, V., Chung, E., Levin, L., Bracken, B.,
Cheng, L., and Ho, S.-M. Urology 77(1), 249.e1–249.e7 (2011).
[186] Hu, J., Locasale, J. W., Bielas, J. H., O’Sullivan, J., Sheahan, K., Cantley,
L. C., Vander Heiden, M. G., and Vitkup, D. Nature Biotechnology 31(6),
522–529 (2013).
[187] Suganuma, K., Miwa, H., Imai, N., Shikami, M., Gotou, M., Goto, M.,
Mizuno, S., Takahashi, M., Yamamoto, H., Hiramatsu, A., et al. Leukemia &
Lymphoma 51(11), 2112–2119 (2010).
[188] Zheng, J. Oncology Letters 4(6), 1151 (2012).
[189] Cheng, T., Sudderth, J., Yang, C., Mullen, A. R., Jin, E. S., Mates, J. M.,
and DeBerardinis, R. J. Proceedings of the National Academy of Sciences
108(21), 8674–8679 (2011).
[190] Uhlen, M., Oksvold, P., Fagerberg, L., Lundberg, E., Jonasson, K., Forsberg,
M., Zwahlen, M., Kampf, C., Wester, K., Hober, S., et al. Nature Biotechnology 28(12), 1248–1250 (2010).
[191] Barrett, T., Troup, D. B., Wilhite, S. E., Ledoux, P., Evangelista, C., Kim,
I. F., Tomashevsky, M., Marshall, K. A., Phillippy, K. H., Sherman, P. M.,
et al. Nucleic Acids Research 39(suppl 1), D1005–D1010 (2011).
[192] Durot, M., Bourguignon, P.-Y., and Schachter, V. FEMS Microbiology Reviews 33(1), 164–190 (2008).
[193] Hyduke, D. R., Lewis, N. E., and Palsson, B. O. Molecular BioSystems 9,
167–174 (2013).
[194] Li, S., Park, Y., Duraisingham, S., Strobel, F. H., Khan, N., Soltow, Q. A.,
Jones, D. P., and Pulendran, B. PLOS Computational Biology 9(7), e1003123
(2013).
[195] Paglia, G., Palsson, B. O., and Sigurjonsson, O. E. Journal of Proteomics
76(0), 163–167 (2012). Special Issue: Integrated Omics.
[196] Paglia, G., Hrafnsdóttir, S., Magnúsdóttir, M., Fleming, R., Thorlacius, S.,
Palsson, B., and Thiele, I. Analytical and Bioanalytical Chemistry , 1–16
(2012).
[197] Chance, B., Sies, H., and Boveris, A. Physiological Reviews 59(3), 527–605
(1979).
[198] Dröge, W. Physiological Reviews 82(1), 47–95 (2002).
[199] Vander Heiden, M. Nature Reviews Drug Discovery 10(9), 671–684 (2011).
[200] Chiarugi, A., Dölle, C., Felici, R., and Ziegler, M. Nature Reviews Cancer
12(11), 741–752 (2012).
[201] Nikiforov, A., Dölle, C., Niere, M., and Ziegler, M. Journal of Biological
Chemistry 286(24), 21767–21778 (2011).
[202] Ha, H., Thiagalingam, A., Nelkin, B., and Casero, R. Clinical Cancer Research 6(9), 3783–3787 (2000).
[203] Dreher, D. and Junod, A. European Journal of Cancer 32(1), 30–38 (1996).
[204] Brand, K. and Hermfisse, U. The FASEB Journal 11(5), 388–395 (1997).
[205] Ogasawara, Y., Funakoshi, M., and Ishii, K. Biological and Pharmaceutical
Bulletin 32(11), 1819–1823 (2009).
[206] Cortés-Cros, M., Hemmerlin, C., Ferretti, S., Zhang, J., Gounarides, J. S.,
Yin, H., Muller, A., Haberkorn, A., Chene, P., Sellers, W. R., et al. Proceedings of the National Academy of Sciences 110(2), 489–494 (2013).
[207] Marin-Hernandez, A., Gallardo-Perez, J. C., Ralph, S. J., Rodriguez-Enriquez, S., and Moreno-Sanchez, R. Mini Reviews in Medicinal Chemistry
9(9), 1084–1101 (2009).
[208] Lenzen, S. Journal of Biological Chemistry 289(18), 12189–12194 (2014).
[209] Wishart, D. S., Jewison, T., Guo, A. C., Wilson, M., Knox, C., Liu, Y.,
Djoumbou, Y., Mandal, R., Aziat, F., Dong, E., et al. Nucleic Acids Research
41(D1), D801–D807 (2013).
[210] Ganske, F. and Dell, E. BMG LABTECH (2006).
[211] Zarember, K. and Godowski, P. The Journal of Immunology 168(2), 554–561
(2002).
[212] Kadowaki, N., Ho, S., Antonenko, S., de Waal Malefyt, R., Kastelein, R.,
Bazan, F., and Liu, Y. The Journal of Experimental Medicine 194(6), 863–
869 (2001).
[213] Tang, S., Arumugam, T., Xu, X., Cheng, A., Mughal, M., Jo, D., Lathia, J.,
Siler, D., Chigurupati, S., Ouyang, X., et al. Proceedings of the National
Academy of Sciences 104(34), 13798–13803 (2007).
[214] Cros, J., Cagnard, N., Woollard, K., Patey, N., Zhang, S., Senechal, B., Puel,
A., Biswas, S., Moshous, D., Picard, C., et al. Immunity 33(3), 375–386
(2010).
[215] Serbina, N., Jia, T., Hohl, T., and Pamer, E. Annual Review of Immunology
26, 421–452 (2008).
[216] Auffray, C., Sieweke, M., and Geissmann, F. Annual Review of Immunology
27, 669–692 (2009).
[217] Dinarello, C. Blood 117(14), 3720–3732 (2011).
[218] Jensen, P. and Papin, J. Bioinformatics 27(4), 541–547 (2011).
[219] Thiele, I., Fleming, R., Bordbar, A., Schellenberger, J., and Palsson, B. Biophysical Journal 98(10), 2072–2081 (2010).
[220] Gianchandani, E., Papin, J., Price, N., Joyce, A., and Palsson, B. PLoS Computational Biology 2(8), e101 (2006).
[221] Gianchandani, E., Joyce, A., Palsson, B., and Papin, J. PLoS Computational
Biology 5(6), e1000403 (2009).
[222] Dasika, M., Burgard, A., and Maranas, C. Biophysical Journal 91(1), 382–
398 (2006).
[223] Richard, G., Belta, C., Julius, A., and Amar, S. PLoS ONE 7(2), e31341
(2012).
[224] Maglott, D., Ostell, J., Pruitt, K. D., and Tatusova, T. Nucleic Acids Research
39, D52–D57 (2011).
[225] Apweiler, R., Martin, M. J., O’Donovan, C., Magrane, M., Alam-Faruque,
Y., Antunes, R., Barrell, D., Bely, B., Bingley, M., Binns, D., Bower, L.,
Browne, P., Chan, W. M., Dimmer, E., Eberhardt, R., Fazzini, F., Fedotov, A.,
Foulger, R., Garavelli, J., Castro, L. G., Huntley, R., Jacobsen, J., Kleen, M.,
Laiho, K., Legge, D., Lin, Q. A., Liu, W. D., Luo, J., Orchard, S., Patient, S.,
Pichler, K., Poggioli, D., Pontikos, N., Pruess, M., Rosanoff, S., Sawford, T.,
Sehra, H., Turner, E., Corbett, M., Donnelly, M., van Rensburg, P., Xenarios,
I., Bougueleret, L., Auchincloss, A., Argoud-Puy, G., Axelsen, K., Bairoch,
A., Baratin, D., Blatter, M. C., Boeckmann, B., Bolleman, J., Bollondi, L.,
Boutet, E., Quintaje, S. B., Breuza, L., Bridge, A., deCastro, E., Coudert, E.,
Cusin, I., Doche, M., Dornevil, D., Duvaud, S., Estreicher, A., Famiglietti, L.,
Feuermann, M., Gehant, S., Ferro, S., Gasteiger, E., Gateau, A., Gerritsen,
V., Gos, A., Gruaz-Gumowski, N., Hinz, U., Hulo, C., Hulo, N., James,
J., Jimenez, S., Jungo, F., Kappler, T., Keller, G., Lara, V., Lemereier, P.,
Lieberherr, D., Martin, X., Masson, P., Moinat, M., Morgat, A., Paesano, S.,
Pedruzzi, I., Pilbout, S., Poux, S., Pozzato, M., Redaschi, N., Rivoire, C.,
Roechert, B., Schneider, M., Sigrist, C., Sonesson, K., Staehli, S., Stanley,
E., et al. Nucleic Acids Research 39, D214–D219 (2011).
[226] Zhang, D., Zhang, G., Hayden, M., Greenblatt, M., Bussey, C., Flavell, R.,
and Ghosh, S. Science 303(5663), 1522–1526 (2004).
[227] Lynn, D., Winsor, G., Chan, C., Richard, N., Laird, M., Barsky, A., Gardy,
J., Roche, F., Chan, T., Shah, N., et al. Molecular Systems Biology 4(1), 218
(2008).
[228] Takeda, K. and Akira, S. Seminars in Immunology 16(1), 3–9 (2004).
[229] Lowe, E., Doherty, T., Karahashi, H., and Arditi, M. Journal of Endotoxin
Research 12(6), 337–345 (2006).
[230] Syvanen, A. Nature Reviews Genetics 2(12), 930–942 (2001).
[231] Jamshidi, N., Wiback, S., and Palsson, B. Genome Research 12(11), 1687–
1692 (2002).
[232] Bals, R. and Hiemstra, P. European Respiratory Journal 23(2), 327–333
(2004).
[233] Swulius, M. and Waxham, M. Cellular and Molecular Life Sciences 65(17),
2637–2657 (2008).
[234] Ciocca, D. and Calderwood, S. Cell Stress & Chaperones 10(2), 86–103
(2005).
[235] Aitken, A. Seminars in Cancer Biology 16(3), 162–172 (2006).
[236] Van Der Hoeven, P., Van Der Wal, J., Ruurs, P., Van Dijk, M., and Van Blitterswijk, J. Biochemical Journal 345(2), 297–306 (2000).
[237] Tobimatsu, T. and Fujisawa, H. Journal of Biological Chemistry 264(30),
17907–17912 (1989).
[238] Droemann, D., Albrecht, D., Gerdes, J., Ulmer, A., Branscheid, D., Vollmer,
E., Dalhoff, K., Zabel, P., Goldmann, T., et al. Respiratory Research 6(1),
1–6 (2005).
[239] Werner, J., DeCarlo, C., Escott, N., Zehbe, I., and Ulanova, M. Innate Immunity 18(1), 55–69 (2011).
[240] Dower, K., Ellis, D., Saraf, K., Jelinsky, S., and Lin, L. The Journal of
Immunology 180(5), 3520–3534 (2008).
[241] Berglund, L., Björling, E., Oksvold, P., Fagerberg, L., Asplund, A., Al-Khalili Szigyarto, C., Persson, A., Ottosson, J., Wernérus, H., Nilsson, P.,
et al. Molecular & Cellular Proteomics 7(10), 2019–27 (2008).
[242] Guha, M. and Mackman, N. Cellular Signalling 13(2), 85–94 (2001).
[243] Izaguirre, A., Barnes, B., Amrute, S., Yeow, W., Megjugorac, N., Dai, J.,
Feng, D., Chung, E., Pitha, P., and Fitzgerald-Bocarsly, P. Journal of Leukocyte Biology 74(6), 1125–1138 (2003).
[244] Cachia, O., Benna, J., Pedruzzi, E., Descomps, B., Gougerot-Pocidalo,
M., and Leger, C. Journal of Biological Chemistry 273(49), 32801–32805
(1998).
[245] Rahman, M. and McFadden, G. PLoS Pathogens 2(2), e4 (2006).
[246] Kavita, U. and Mizel, S. Journal of Biological Chemistry 270(46), 27758–
27765 (1995).
[247] Farina, C., Theil, D., Semlinger, B., Hohlfeld, R., and Meinl, E. International
Immunology 16(6), 799–809 (2004).
[248] Lech, M., Avila-Ferrufino, A., Skuginna, V., Susanti, H., and Anders, H.
International Immunology 22(9), 717–728 (2010).
[249] Ogura, Y., Inohara, N., Benito, A., Chen, F., Yamaoka, S., and Núñez, G.
Journal of Biological Chemistry 276(7), 4812–4818 (2001).
[250] Moynagh, P. N. Trends in Immunology 30(1), 33–42 (2009).
[251] Cohn, Z. A. and Benson, B. The Journal of Experimental Medicine 121(1),
153–170 (1965).
[252] Kostromins, A. and Stalidzans, E. Biosystems 109(2), 233–239 (2012).
[253] Krappmann, D., Wegener, E., Sunami, Y., Esen, M., Thiel, A., Mordmuller,
B., and Scheidereit, C. Molecular and Cellular Biology 24(14), 6488–6500
(2004).
[254] Rebe, C., Cathelin, S., Launay, S., Filomenko, R., Prevotat, L., L’Ollivier,
C., Gyan, E., Micheau, O., Grant, S., Dubart-Kupperschmitt, A., et al. Blood
109(4), 1442–1450 (2007).
[255] Krawczyk, C., Holowka, T., Sun, J., Blagih, J., Amiel, E., DeBerardinis, R.,
Cross, J., Jung, E., Thompson, C., Jones, R., et al. Blood 115(23), 4742–4749
(2010).
[256] Tannahill, G. and O’Neill, L. FEBS Letters 585(11), 1568–1572 (2011).
[257] Dinarello, C. Blood 87(6), 2095–2147 (1996).
[258] Mackman, N., Brand, K., and Edgington, T. The Journal of Experimental
Medicine 174(6), 1517–1526 (1991).
[259] Richard, G., Chang, H., Cizelj, I., Belta, C., Julius, A., and Amar, S. 50th
IEEE Conference on Decision and Control and European Control Conference (CDC-ECC), Orlando, FL, USA, 2227–2232 (2011).
[260] Lee, J., Gianchandani, E., Eddy, J., and Papin, J. PLoS Computational Biology 4(5), e1000086 (2008).
[261] Karr, J., Sanghvi, J., Macklin, D., Gutschow, M., Jacobs, J., Bolival, B.,
Assad-Garcia, N., Glass, J., and Covert, M. Cell 150(2), 389–401 (2012).
[262] Covert, M., Xiao, N., Chen, T., and Karr, J. Bioinformatics 24(18), 2044–
2050 (2008).
[263] Vardi, L., Ruppin, E., and Sharan, R. Journal of Computational Biology
19(2), 232–240 (2012).
[264] Thorleifsson, S. and Thiele, I. Bioinformatics 27(14) (2011).
[265] Colicelli, J. Science Signaling 2004(250), re13–re13 (2004).
[266] Schumann, R., Leong, S. R., Flaggs, G., Gray, P., Wright, S., Mathison, J.,
Tobias, P., and Ulevitch, R. Science 249(4975), 1429–1431 (1990).
[267] Grube, B., Cochane, C., Ye, R., Green, C., McPhail, M., Ulevitch, R., and
Tobias, P. Journal of Biological Chemistry 269(11), 8477–8482 (1994).
[268] Thomas, C., Kapoor, M., Sharma, S., Bausinger, H., Zyilan, U., Lipsker, D.,
Hanau, D., and Surolia, A. FEBS Letters 531(2), 184–188 (2002).
[269] Blake, J. A., Bult, C. J., Kadin, J. A., Richardson, J. E., and Eppig, J. T.
Nucleic Acids Research 39(suppl 1), D842–D848 (2011).
[270] Hasan, U., Chaffois, C., Gaillard, C., Saulnier, V., Merck, E., Tancredi, S.,
Guiet, C., Brière, F., Vlach, J., Lebecque, S., et al. The Journal of Immunology 174(5), 2942–2950 (2005).
[271] Mishra, S., Mishra, J., Gee, K., McManus, D., LaCasse, E., and Kumar, A.
Journal of Biological Chemistry 280(45), 37536–37546 (2005).
[272] Reich, M., Liefeld, T., Gould, J., Lerner, J., Tamayo, P., and Mesirov, J.
Nature Genetics 38(5), 500–501 (2006).
[273] Warren, P., Taylor, D., Martini, P., Jackson, J., and Bienkowska, J. In Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th
IEEE International Conference on, 108–115. IEEE, (2007).
[274] Gentleman, R., Carey, V., Bates, D., Bolstad, B., Dettling, M., Dudoit, S.,
Ellis, B., Gautier, L., Ge, Y., Gentry, J., et al. Genome Biology 5(10), R80
(2004).
[275] Emig, D., Salomonis, N., Baumbach, J., Lengauer, T., Conklin, B., and Albrecht, M. Nucleic Acids Research 38(suppl 2), W755–W762 (2010).
[276] Machado, D. and Herrgård, M. PLoS Computational Biology 10(4),
e1003580 (2014).
[277] Zhang, D., Hu, X., Qian, L., Chen, S.-H., Zhou, H., Wilson, B., Miller, D. S.,
and Hong, J.-S. J. Neuroinflammation 8(3) (2011).
[278] Weintz, G., Olsen, J. V., Frühauf, K., Niedzielska, M., Amit, I., Jantsch, J.,
Mages, J., Frech, C., Dölken, L., Mann, M., et al. Molecular Systems Biology
6(1) (2010).
[279] Yamashita, M., Chattopadhyay, S., Fensterl, V., Saikia, P., Wetzel, J. L., and
Sen, G. C. Science Signaling 5(233), ra50 (2012).
[280] Gao, X., Wang, H., Yang, J., Liu, X., and Liu, Z.-R. Molecular Cell 45(5),
598–609 (2012).
[281] Yang, W., Xia, Y., Hawke, D., Li, X., Liang, J., Xing, D., Aldape, K., Hunter,
T., Alfred Yung, W., and Lu, Z. Cell 150(4), 685–696 (2012).
6 List of Publications
• Aurich, M.K., Paglia, P., Rolfsson, Ó, Hrafnsdóttir, S., Magnúsdóttir, M.,
Stefaniak, M.M., Palsson, B.Ø., Fleming, R.M.T., Thiele, I. Prediction of
intracellular metabolic states from extracellular metabolomic data. (2014)
Metabolomics, 1-17.
• Sahoo, S., Aurich, M.K., Jónsson, J.J., Thiele, I. Membrane transporters in
a human genome-scale metabolic knowledgebase and their implications for
disease. (2014) Frontiers in Physiology 5, 91.
• Thiele, I., Swainston, N., Fleming, R. M., Hoppe, A., Sahoo, S., Aurich,
M. K., Haraldsdottir, H., Mo, M. L., Rolfsson, O., Stobbe, M. D., et al. A
community-driven global reconstruction of human metabolism. (2013) Nature Biotechnology, 31, 419-425.
• Mednis, M. and Aurich, M.K. Application of string similarity ratio and edit
distance in automatic metabolite reconciliation comparing reconstructions and
models. (2012) Biosystems and Information Technology 1 (1), 14-18.
• Aurich, M.K., Thiele, I. Contextualization procedure and modeling of monocyte specific TLR signaling. (2012) PLoS ONE 7 (12), e49978.