Integration of omics data with biochemical reaction networks

Maike Kathrin Aurich

Department of Life and Environmental Sciences
University of Iceland
2014

INTEGRATION OF OMICS DATA WITH BIOCHEMICAL REACTION NETWORKS

Maike Kathrin Aurich

Dissertation submitted in partial fulfillment of a Philosophiae Doctor degree in Biology

Advisor
Professor Ines Thiele

Thesis Committee
Professor Ólafur S. Andrésson
Professor Ines Thiele
Professor Jón J. Jónsson
Professor Sigurður Brynjolfsson

Opponents
Professor Dietmar Schomburg
Professor Fabien Jourdan

Department of Life and Environmental Sciences
School of Engineering and Natural Sciences
University of Iceland
Reykjavik, June 2014

Copyright © 2014 Maike Kathrin Aurich
All rights reserved

Department of Life and Environmental Sciences
School of Engineering and Natural Sciences
University of Iceland
Sturlugata 7 (Askja)
101 Reykjavik
Iceland
Telephone: 525 4000

Bibliographic information:
Maike Kathrin Aurich, 2014, Integration of omics data with biochemical reaction networks, Ph.D. thesis, Department of Life and Environmental Sciences, University of Iceland.

ISBN XX

Printing: Háskólaprent, Fálkagata 2, 107 Reykjavik
Reykjavik, Iceland, June 2014

Abstract

The availability of omics data sets has contributed to the rapid development of systems biology, which seeks to understand complex biological systems. Constraint-based modeling is one modeling formalism applied in systems biology; it relies on genome-scale network reconstructions. Metabolic reconstructions are increasingly used to understand normal cellular and disease states, which often involves generating cell-line- or tissue-specific metabolic models through the integration of omics data.
Metabolomic data are readily obtained, yet methods for generating condition-specific metabolic models from such data are less well developed. In this thesis, a workflow is established for generating condition-specific models from extracellular metabolomic data and the human metabolic model. Analysis of the resulting models enables the investigation of metabolic phenotypes among cancer cell line specific models, based on model predictions of ATP yield and on the robustness of the models towards environmental and genetic perturbations. The models are built through a rigid reduction of exchange reactions, which emphasizes the detected changes in metabolite concentrations. However, internal pathway redundancy remains largely preserved. Integrating transcriptomic data reduces this internal pathway redundancy. Hence, in a subsequent study, models of two lymphoblastic leukemia cell lines are generated by combining metabolomic and transcriptomic data. The models explain the distinct concentration changes in the spent medium of the two cancer cell lines through different utilization of glycolysis and oxidative phosphorylation. The analysis further reveals an accumulation of differential gene regulation and alternative splicing events at key steps of central metabolic pathways. Metabolism is closely intertwined with other cellular processes, notably signaling pathways, which play a key role in diseases such as cancer. Hence, a contextualization procedure for signaling networks was developed, opening yet another avenue for omics data analysis. This approach is demonstrated by contextualizing the Toll-like receptor (TLR) signaling network, first towards a generic monocyte TLR signaling network and subsequently towards an LPS-activated TLR signaling network. Taken together, my work extends the scope of omics data integration within the COBRA field.
The inference of internal network states from extracellular measurements, as demonstrated herein, holds great potential for personalized medicine. However, further development is needed for the interpretation of metabolomic data derived from biofluids. Additionally, contextualization of signaling and metabolic networks may become crucial for understanding the interplay between the different cellular processes that collectively give rise to complex diseases.

Útdráttur

The advent of omics data has driven the rapid development of systems biology, a discipline that aims to increase our understanding of complex biological systems. Among the models used in systems biology are constraint-based models of metabolic networks, which cover a large part of the genomes of organisms. Metabolic network models are increasingly used to understand cell behavior in healthy and diseased states. This often involves building specialized models of a particular cell line or tissue type under particular conditions. Such condition-specific models can be built by integrating omics data with generic models. Extracellular measurements of the cellular metabolome under given conditions can be used to build specialized metabolic models. Such measurements are easy to obtain, but methods for building models from them have so far not been sufficiently developed. This thesis presents a workflow for building condition-specific metabolic models from extracellular metabolome measurements and a generic model of the human metabolic network. Specialized models of cancer cell lines can be used to study the metabolic phenotypes of these cell lines, predicting ATP yield and sensitivity to environmental and genetic perturbations. The models are built by reducing the exchange reactions in accordance with the measured changes in metabolite concentrations. This reduction alone does not substantially reduce the redundancy of internal metabolic pathways. A reduction of internal pathway redundancy is achieved with additional transcriptome data for the cell lines.
In a study described here, we combined metabolome and transcriptome data to build models of two lymphoblastic leukemia cell lines. The models explain differences in the concentration changes in the culture medium of these two cell lines by different use of glycolysis and oxidative phosphorylation. Our analysis also revealed an accumulation of differential gene regulation events and alternative splicing at key steps of central metabolic pathways. Metabolism is closely interconnected with other cellular processes, especially signaling pathways, which play a key role in diseases such as cancer. We therefore developed a method for contextualizing signaling networks, thereby opening yet another avenue for analyzing omics data. We demonstrate this method by contextualizing the Toll-like receptor (TLR) signaling network, first to a generic TLR network in monocytes and then to an LPS-activated TLR network. Taken together, my work extends the scope of omics data integration within systems biology. Methods for inferring the internal state of metabolic networks from extracellular measurements, as demonstrated here, open up great possibilities for personalized medicine. However, further development of methods is needed to interpret metabolome data obtained from biofluids. Moreover, contextualization of signaling and metabolic networks may become key to understanding the interplay of the different cellular processes that together cause complex diseases.

To Inge Aurich, Miriam, Elias, Finn & Friederike

Contents
List of Figures
List of Tables
Abbreviations
Acknowledgements
1 Introduction
1.1 Systems biology
1.2 COBRA
1.2.1 Methods to explore the solution space
1.2.2 Flux balance analysis
1.2.3 Flux variability analysis
1.2.4 Sampling analysis
1.3 Biochemical networks
1.4 Signaling networks
1.4.1 Innate Immunity
1.4.2 Reconstruction of human Toll-like receptor signaling network
1.5 Metabolism
1.5.1 Human metabolic genome-scale reconstructions
1.5.2 Cancer as a metabolic disease
1.5.3 The importance of extracellular membrane transporters in Cancer
1.5.4 Using COBRA to investigate cancer metabolism
1.6 High-throughput data
1.6.1 Transcriptomics
1.6.2 Proteomics
1.6.3 Metabolomics
1.7 Analysis of omics data in the context of COBRA models
1.7.1 Methods for network contextualization
1.7.2 Human cell-type specific metabolic models
1.7.3 Integration of metabolomic data sets
1.7.4 COBRA for biomedical applications and personalized health
1.7.5 Existing challenges
1.8 Preview of this thesis
2 Metabolic heterogeneity and robustness among the NCI-60 cancer cell lines
2.1 Introduction
2.2 Results
2.2.1 Generation of heterogeneous cancer cell line models
2.2.2 Distinction of metabolic phenotypes
2.2.3 Robustness towards genetic and environmental perturbation
2.3 Discussion
2.4 Material and Methods
2.5 Supplementary material
3 Prediction of intracellular metabolic states from extracellular metabolomic data
3.1 Introduction
3.2 Results
3.3 Pipeline for generation of condition-specific metabolic cell line models
3.3.1 Generation of experimental data
3.3.2 Analysis of experimental data
3.3.3 Generation of the condition-specific models
3.3.4 Condition-specific metabolic models for CCRF-CEM and Molt-4 cells
3.3.5 Condition-specific cell line models predict distinct metabolic strategies
3.4 Experimental validation of energy and redox status of CCRF-CEM and Molt-4 cells
3.5 Comparison of network utilization and alteration in gene expression
3.6 Accumulation of DEGs and AS genes at key metabolic steps
3.7 Single gene deletion
3.8 Discussion
3.9 Materials and methods
3.10 Supplementary material
4 Contextualization Procedure and Modeling of Monocyte Specific TLR Signaling
4.1 Introduction
4.2 Results
4.2.1 Extensions of gene results in ihsTLRv2
4.2.2 Protein-Protein Interactions (PPI) in InnateDB and ihsTLRv2
4.2.3 SNPs in the TLR signaling network
4.2.4 Tissue specific TLR expression
4.2.5 Protein abundance of ihsTLRv2 in cancer cell lines
4.2.6 Generation of a draft monocyte specific TLR model based on gene expression data
4.2.7 Literature based curation of the draft monocyte specific TLR model
4.2.8 Tailoring the monocyte TLR model to a LPS stimulation specific model
4.2.9 Condition specific network states of monocyte TLR signaling
4.2.10 Sensitivity analysis
4.2.11 Setting quantitative gene expression changes into context
4.3 Discussion
4.4 Materials and Methods
4.5 Supplementary material
5 Conclusions and future directions
5.1 Conclusions
5.2 Future applications
5.2.1 Extension of the TLR signaling network
5.2.2 Future directions in the integration of metabolomics data
5.2.3 COBRA modeling of cancer and beyond
Bibliography
6 List of Publications

List of Figures
1.1 COBRA: Definition and methods for the functional analysis of the feasible solution space.
1.2 Applications of the human metabolic model.
1.3 Omics data sets provide a snap-shot of the cellular components at large scale.
1.4 Ideal systems biological approach.
2.1 Metabolic models provide a context for the analysis of metabolomic data.
2.2 Distinction of the models based on energy and cofactor production.
2.3 Distinct phenotypes with regard to oxygen requirements.
2.4 Six model clusters were distinguished according to the models' robustness towards environmental changes.
2.5 The models have different sets of essential genes.
2.6 Variation between samples of the same cell line.
2.7 ATP yield is not informative for the division of OxPhos models.
2.8 Distinct solution spaces were observed for the 120 models.
2.9 Highest or lowest number of KOs were not associated with any phenotype defined by the previous analysis.
2.10 ATP yield does not correlate with maximal growth.
2.11 Metabolic strategies considering both ATP producing glycolysis reactions.
2.12 ATP yields do not correspond to the separated clusters of models from the phase plane analysis.
3.1 Combined experimental and computational pipeline to study human metabolism.
3.2 Sampling reveals different utilization of glycolysis.
3.3 Differences in the use of the TCA cycle by the CCRF-CEM model and the Molt-4 model.
3.4 Sampling reveals different utilization of oxidative phosphorylation.
3.5 Experimental validation of model predictions.
3.6 Growth and apoptosis of Molt-4 and CCRF-CEM cells.
4.1 Expression of ihsTLRv2 gene products in normal human tissues.
4.2 Workflow leading from ihsTLRv1 to a data driven monocyte and LPS stimulated monocyte model.
4.3 Definition of cutoff for initial monocyte draft-model.
4.4 Sensitivity analysis
4.5 Network resulting from mapping of the up-regulated genes onto the LPS stimulation specific monocyte model.
4.6 Comparison of (chemical compound) connectivity in the LPS stimulation specific versus the up-regulated sub-network.
4.7 Network modules resulting from mapping of the down-regulated genes onto the LPS stimulation specific monocyte TLR model.

List of Tables
1.1 Metabolite transporters relevant to cancer and their current coverage in Recon 2.
1.2 Methods for network contextualization.
2.1 Reactions discarded from flux split analysis (and ATP yield).
2.2 Distinct Phenotypes.
2.3 Sampling results of the isocitrate dehydrogenase and pyruvate dehydrogenase.
2.4 Excluded were uncalibrated metabolites and those that could be neither produced nor consumed by Recon.
2.5 Added exchanges.
2.6 Metabolite uptake and secretion not possible in the model.
2.7 Reactions added to the starting model.
2.8 Models that were infeasible when constrained to experimental growth.
3.1 Differentially expressed genes (DEGs) and alternative splicing (AS) events of central metabolic and cancer-related pathways.
3.2 Reactions added to Recon 2 and the global model.
3.3 Comparison of flux changes and gene expression changes of genes more highly expressed in Molt-4 cells.
3.4 Unique Knock-out (KO) genes for each cancer cell line model.
3.5 Metabolomic data of CCRF-CEM cells (mapped).
3.6 Metabolomic data of CCRF-CEM cells (not mapped).
3.7 Metabolomic data of Molt-4 cells (mapped).
3.8 Metabolomic data of Molt-4 cells (not mapped).
3.9 Tables of absent genes.
3.10 Differentially expressed Recon 1 genes (down-regulated).
3.11 Differentially expressed Recon 1 genes (up-regulated).
3.12 Detection limits for the definition of model bounds.
3.13 Calculation of the growth rates and definition of upper (ub) and lower bounds (lb) imposed on the CCRF-CEM model.
3.14 Calculation of the growth rates and definition of upper (ub) and lower bounds (lb) imposed on the Molt-4 model.
3.15 Lower bounds of commonly exchanged metabolites were adjusted according to the relation of change in uptake/secretion in the experiment.
4.1 Statistics of the gene extension of the generic human TLR model.
4.2 Comparison between InnateDB interactions among ihsTLRv2 genes and interactions of ihsTLRv2 network species within ihsTLRv2.
4.3 Table summarizing ihsTLRv2 genes with clinically linked SNPs, corresponding clinical phenotypes and consequences of in silico knock out on ihsTLRv2 function.
4.4 Distribution of absent genes
4.5 Inputs and outputs covered by generic (ihsTLRv2) and monocyte specific (hMonoTLR & hMonoTLR_LPS) TLR signaling models.
4.6 Maximum possible flux values for output reactions in the different TLR signaling models.
4.7 TLR11 receptor was removed from ihsTLRv2 along with 10 associated reactions.
4.8 TLR11 receptor was removed from ihsTLRv2 along with seven other metabolites.
4.9 Added exchange reactions.
4.10 Literature evidence for the presence of proteins in monocytes.
4.11 Pathway curation of the monocyte draft-model based on output capabilities.
4.12 Curation of hMonoTLR.
4.13 Significantly up-regulated hMonoTLR_LPS genes.
4.14 Significantly down-regulated hMonoTLR_LPS genes.
4.15 I/O relationships.
4.16 Changes due to mapping of quantitative gene expression changes.
4.17 Changes due to mapping of quantitative gene expression changes (part 2).
Glossary
• AS - Alternatively spliced genes
• ADP - Adenosine diphosphate
• AP-1 - Activating protein-1
• ATP - Adenosine triphosphate
• ACHR - Artificial centering hit-and-run sampler
• BIGG - Biochemically, genetically, and genomically structured knowledge base of the target organism
• CASP8 - Caspase-8
• COBRA - Constraint-based reconstruction and analysis
• CNS - Central nervous system
• CHO - Chinese hamster ovary cells
• DEGs - Differentially expressed genes
• DABG - Detection above background
• DIOS - Distinct input/output pathways
• ETC - Electron transport chain
• EGFR - Epidermal growth factor receptor
• FC - Fold change
• FBA - Flux Balance Analysis
• FVA - Flux Variability Analysis
• FH - Fumarate hydratase
• FADD - Fas (TNFRSF6)-associated via death domain
• LC-MS - Liquid chromatography-mass spectrometry
• GC-MS - Gas chromatography-mass spectrometry
• GSH - Reduced glutathione
• G6PD - Glucose-6-phosphate dehydrogenase
• GEO - Gene Expression Omnibus
• GRAs - Gene-reaction associations
• GTP - Guanosine triphosphate
• GPRs - Gene-Protein-Reaction associations
• GENRE - Genome-scale network reconstruction
• GIMME - Gene Inactivity Moderated by Metabolism and Expression
• GIM(3)E - Gene Inactivation Moderated by Metabolism, Metabolomics and Expression
• HMDB - Human Metabolome Database
• HCC - Hepatocellular carcinoma
• HHT - Hereditary Hemorrhagic Telangiectasia
• HDHC - Histone acetylase
• HMR - Human metabolic reaction database
• HT - High-throughput data
• HLRCC - Hereditary leiomyomatosis and renal-cell cancer
• iBMK - Immortalized baby mouse kidney epithelial cells
• IRFs - Interferon regulatory factors
• iMAT - Integrative Metabolic Analysis Tool
• IDH - Isocitrate dehydrogenase
• IL1R1 - Interleukin-1 receptor 1
• IRAK1 - IL-1 receptor associated kinase 1
• I/O - Input-output
• IRF3 - Interferon regulatory factor 3
• IRF7 - Interferon regulatory factor 7
• IKK - Inhibitor of the kappa light polypeptide gene enhancer in B-cells kinase
• KO - (Gene) knock-out
• LPS - Lipopolysaccharide
• LB - Lower bound
• LBP - Lipopolysaccharide-binding protein
• LP - Linear programming
• MDR - Multidrug resistance
• MBA - Model Building Algorithm
• mCADRE - Metabolic Context-specificity Assessed by Deterministic Reaction Evaluation
• MILP - Mixed-integer linear programming
• MS - Mass spectrometry
• NMR - Nuclear magnetic resonance spectroscopy
• NCF2 - Neutrophil cytosolic factor 2
• ORAC - Oxygen Radical Absorbance Capacity
• OF - Objective function
• PC - Pyruvate carboxylase
• PhPP - Phenotypic phase plane analysis
• PGDH - Phosphoglycerate dehydrogenase
• PPP - Pentose phosphate pathway
• PKA - Protein kinase A
• PELI3 - Pellino homolog 3
• PPI - Protein-Protein Interactions
• PI3K1A - Phosphoinositide 3-kinase
• PGK - Phosphoglycerate kinase
• ROS - Reactive oxygen species
• RMA - Robust Multi-array Analysis
• RC - Reductive carboxylation
• Recon - The human metabolic genome-scale reconstruction
• SOG (pathway) - Serine biosynthesis, one-carbon metabolism, and the glycine cleavage system
• SDH - Succinate dehydrogenase
• SNP - Single nucleotide polymorphism
• S - Stoichiometric matrix
• SPs - Side populations
• TLR - Toll-like receptor
• TF - Transcription factors
• TIRAP - TIR domain containing adaptor protein
• TCA - Tricarboxylic acid
• UB - Upper bound

Acknowledgments

I thank the ERC, which funded my doctoral studies at the Center for Systems Biology at the University of Iceland. Further, I thank Bernhard Ø. Palsson for giving me the opportunity to conduct this PhD. Foremost, I would like to thank my advisor, Professor Ines Thiele, for providing me with the opportunity to complete my PhD thesis at the Center for Systems Biology at the University of Iceland. I am very grateful for all the support and guidance that made my thesis work possible, and for the patience, motivation, and enthusiasm she dedicated to me.
Many thanks to the other members of the Center for Systems Biology, who have provided their support, guidance, encouragement, and friendship. Thank you all so much for the time we spent together. I want to thank my family for their support during the long years I spent abroad, and for the understanding and uplifting words I always received from them. I thank all my friends in Iceland, Germany, and Austria for their patience and encouragement. I am also very happy about the support I received from my friends in Luxembourg, who were a great help in making the final steps happen. In particular, I want to thank my friend Friederike. I would not be at this point without you; although we were separated by an ocean, I was always aware of your belief in me and of your support. Thank you so much.

1 Introduction

1.1 Systems biology

Methodological developments in molecular biology now allow simultaneous measurements of thousands of cellular components at different hierarchical levels, including mRNA, proteins, and metabolites (Figure 1.3, [1]). This flood of data has challenged data management and analysis in biological laboratories, but at the same time such comprehensive information provides valuable resources for systems biology, which seeks to investigate the behavior of biological systems at large scale [1]. Network reconstructions are a common tool of systems biologists; some of them can be converted into in silico models and used to interrogate the functional properties of the respective system [2]. Systems biologists apply different formalisms to modeling and simulation [3, 4]. Generally, network reconstruction approaches are classified as top-down or bottom-up. Top-down approaches infer networks directly from the data, yet the resulting connections might not represent actual biological interactions.
Networks reconstructed in a bottom-up fashion from extensive biochemical literature, on the other hand, provide a mechanistic framework for the analysis of omics data sets and can therefore elucidate genotype-phenotype relationships. Detailed dynamic models are still limited to small scales (e.g., single pathways) because the kinetic parameters describing each reaction in complex systems are lacking. The amount of data needed to obtain these parameters, together with their condition-specific variation, makes their acquisition even more complicated [4, 5]. Constraint-based reconstruction and analysis (COBRA) circumvents this parameter bottleneck by assuming a quasi-steady state. This assumption allows the simulation of the system's behavior at large scale, and because of its comprehensiveness, COBRA constitutes an ideal framework for the analysis of high-dimensional omics data sets [3, 4]. The focus of this PhD thesis in the field of systems biology is the fusion of omics data sets and biochemical networks using COBRA for the systems-wide analysis of biochemical processes, e.g., the innate immune response and metabolism in health and disease. The following sections provide an overview of the COBRA approach, of the biochemical networks of human metabolism and innate immune signaling, and of the state of the art of network contextualization and its application to biomedical research, as relevant to this thesis.

1.2 COBRA

The COBRA approach uses the stoichiometry of biochemical reactions to mathematically represent biochemical networks of cellular processes, such as metabolism, signaling, or transcriptional/translational networks [2, 6, 7, 8]. The genome-scale network reconstruction (GENRE) is assembled in a bottom-up reconstruction process that follows standard operating procedures and draws on extensive organism-specific literature [9].
The GENRE represents a biochemically, genetically, and genomically (BIGG) structured knowledge base of the target organism that is curated and validated to ensure that the resulting model correctly predicts biological functions [2, 9, 10]. GENREs have a hierarchical structure in which known genes are connected to the proteins and enzymes they encode and to the reactions these catalyze. These gene-protein-reaction associations (GPRs) are formulated as Boolean rules that account for isozymes (OR) and for all subunits of protein complexes (AND). The GPRs are the entry points for integrating transcriptomic and proteomic data into the network context, and their correct formulation is an important prerequisite for any network contextualization. Once a comprehensive reaction list has been compiled and all known genes have been associated with those reactions, the reconstruction can be converted into a mathematical model. This is achieved by converting the reaction list into a matrix format, the stoichiometric matrix (S), and by formulating system boundaries equivalent to the constraints on the in vivo system (Figure 1.1, [2, 10]). S contains a row for each metabolite and a column for each reaction [5, 10, 11]. Its non-zero entries describe which metabolites participate in each reaction, with negative entries identifying substrates and positive entries identifying products [5]. Constraints restricting biological systems can be divided into three groups: physicochemical (hard) constraints, such as mass and energy conservation; environmental constraints, which express time- and condition-specific differences (e.g., pH or nutrient availability); and self-imposed, regulatory constraints [12]. In the model, constraints are routinely applied as either balances or bounds. According to the physical law of mass conservation, the net production and consumption of each metabolite are balanced at steady state.
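To make the conversion from reaction list to model more concrete, the following minimal sketch builds S for a small hypothetical network and evaluates GPR rules. The metabolites (A, B, C), reactions (EX_A, R1, R2, R3, BIO) and genes (g1 to g5) are invented for illustration and are not taken from Recon; the nested-tuple encoding of GPRs is likewise an assumption of this sketch, not the data structure used by the COBRA Toolbox.

```python
import numpy as np

# Hypothetical toy reconstruction (illustrative only, not from Recon).
# Each reaction: (stoichiometry {metabolite: coefficient}, GPR rule).
# Negative coefficients mark substrates, positive ones products.
# GPR rules are nested tuples: ("or", ...) for isozymes, ("and", ...) for subunits.
REACTIONS = {
    "EX_A": ({"A": 1}, None),                        # uptake of A, no gene
    "R1": ({"A": -1, "B": 1}, "g1"),
    "R2": ({"A": -1, "C": 1}, ("or", "g2", "g3")),   # two isozymes
    "R3": ({"B": -1, "C": 1}, ("and", "g4", "g5")),  # two-subunit complex
    "BIO": ({"C": -1}, None),                        # drain standing in for biomass
}


def build_stoichiometric_matrix(reactions):
    """Return S (one row per metabolite, one column per reaction) and the labels."""
    rxn_ids = list(reactions)
    mets = sorted({m for stoich, _ in reactions.values() for m in stoich})
    S = np.zeros((len(mets), len(rxn_ids)))
    for j, rxn in enumerate(rxn_ids):
        for met, coeff in reactions[rxn][0].items():
            S[mets.index(met), j] = coeff
    return S, mets, rxn_ids


def gpr_active(rule, present_genes):
    """Evaluate a GPR rule: True if the reaction can still be catalyzed."""
    if rule is None:              # no gene association: treated as always active
        return True
    if isinstance(rule, str):     # a single gene
        return rule in present_genes
    op, *args = rule
    results = [gpr_active(arg, present_genes) for arg in args]
    return any(results) if op == "or" else all(results)


S, mets, rxns = build_stoichiometric_matrix(REACTIONS)
print(mets, rxns)
print(S)
genes = {"g1", "g2", "g3", "g4", "g5"}
print(gpr_active(REACTIONS["R2"][1], genes - {"g2"}))  # True: isozyme g3 remains
print(gpr_active(REACTIONS["R3"][1], genes - {"g4"}))  # False: a subunit is missing
```

The OR/AND evaluation is what later allows gene-level data (presence, absence or knock-out) to be translated into reaction-level constraints.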
The steady-state assumption (comprising all mass balance equations) is expressed mathematically by $\mathbf{S} \cdot \mathbf{v} = 0$, where $\mathbf{v}$ is the flux vector containing the fluxes of all reactions in one of the feasible steady states of the system [13]. The steady state is biologically justified by the notion that transients in biochemical reaction networks are much faster than other cellular events, e.g., cellular growth and environmental and regulatory changes [13]. Bounds constitute upper and lower limits ($\mathbf{v}_{\min} \le \mathbf{v} \le \mathbf{v}_{\max}$) and restrictions on reaction directionality, i.e., setting $v_{j,\min} = 0$ or $v_{j,\max} = 0$ for a reaction $j$ [5, 14] (Figure 1.1). The upper and lower bounds can further be set in accordance with experimental data (e.g., metabolite uptake and secretion fluxes) [5, 14] for a more realistic definition of the solution space, selecting the condition-specific subset of feasible flux distributions from the totality of possible network states (Figure 1.1). Bounds thus provide the entry point for the integration of extracellular metabolomic data, as carried out in Chapters 2 and 3 of this thesis. Once brought into model format, the set of feasible network states can be interrogated (Figure 1.1), e.g., using MATLAB and the COBRA Toolbox [15, 16].

Figure 1.1: COBRA: Definition and methods for the functional analysis of the feasible solution space. Figure redrawn based on Orth et al. 2010 and Price et al. 2004 [11, 17].

1.2.1 Methods to explore the solution space

Numerous methods exist to interrogate COBRA models [15, 16]. They can be divided into biased and unbiased methods. Biased methods rely on the optimality principle and require a user-defined objective function (OF), which biologically translates into the "cellular goal" (Figure 1.1). Biomass generation and ATP production are commonly used OFs [18, 19]. A biomass OF is used to identify the subset of model states that support optimal biomass production.
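Before any objective is optimized, the mass balances ($\mathbf{S} \cdot \mathbf{v} = 0$) and the flux bounds described above already decide which flux vectors are admissible. The minimal numpy sketch below checks exactly these two conditions for a hypothetical toy network; the matrix, the bounds (including an assumed uptake limit of 10 on EX_A, standing in for a measured rate) and the test vectors are illustrative only.

```python
import numpy as np

# Hypothetical toy stoichiometric matrix: rows A, B, C; columns EX_A, R1, R2, R3, BIO.
S = np.array([
    [1, -1, -1,  0,  0],
    [0,  1,  0, -1,  0],
    [0,  0,  1,  1, -1],
], dtype=float)

# Flux bounds; a lower bound of 0 makes a reaction irreversible.
# The upper bound on EX_A plays the role of an (assumed) measured uptake rate.
lb = np.array([0, 0, 0, 0, 0], dtype=float)
ub = np.array([10, 1000, 6, 1000, 1000], dtype=float)


def is_feasible(v, S, lb, ub, tol=1e-9):
    """True if v is a steady state (S v = 0) that also respects the flux bounds."""
    v = np.asarray(v, dtype=float)
    steady = np.allclose(S @ v, 0.0, atol=tol)
    within = np.all(v >= lb - tol) and np.all(v <= ub + tol)
    return bool(steady and within)


print(is_feasible([5, 3, 2, 3, 5], S, lb, ub))    # True: balanced and within bounds
print(is_feasible([5, 3, 2, 2, 5], S, lb, ub))    # False: B accumulates (R1 != R3)
print(is_feasible([12, 6, 6, 6, 12], S, lb, ub))  # False: uptake exceeds the EX_A bound
```

Unbiased methods characterize the whole set of vectors that pass this check, whereas biased methods select the best of them with respect to an objective.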
This optimality principle is, at least in microorganisms, thought to be the outcome of an evolutionary process that drives the organism towards maximal proliferation rates and the optimal use of the available, usually limited, resources. The definition of an OF is more difficult for cells of multicellular organisms, e.g., differentiated, non-proliferating cells. In contrast, highly proliferating cancer cells might indeed seek optimal biomass production [20]. Unbiased methods, on the other hand, allow the interrogation of the allowable solution space without any prior optimality assumption. During the course of this PhD, sampling methods have increasingly been applied to investigate cell-type specific metabolic networks [5, 12, 21]. Below, a subset of biased and unbiased interrogation methods relevant to the presented studies is briefly introduced.
1.2.2 Flux balance analysis
Flux balance analysis (FBA) predicts a single flux distribution by formulating a linear programming (LP) problem that either minimizes or maximizes the flux through the objective function subject to all imposed constraints (Figure 1.1) [11, 12]. The output flux vector v describes how much each reaction in the network contributes to the phenotype [11]. The following LP problem is solved to maximize the stated objective Z (adapted from [11, 22]):
$$\max_{v}\; Z = c \cdot v \quad \text{s.t.} \quad S \cdot v = 0, \quad v_{min} \le v \le v_{max},$$
where c is a vector that identifies the objective, and the column vector v indicates how much each reaction contributes [11]. Since the system $S \cdot v = 0$ is underdetermined, i.e., the number of reactions exceeds the number of metabolites, many flux distributions can attain the same maximal objective value [13, 22]. FBA, however, returns only one of these solutions, which lies at a corner of the allowable solution space (Figure 1.1) [11].
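The LP above can be solved with any LP solver. The following minimal sketch uses SciPy's `scipy.optimize.linprog` (HiGHS backend) rather than the COBRA toolbox, on a hypothetical network in which two parallel reactions, R1 and R2, both convert A into B, and the uptake of A is limited to 10 flux units. Because `linprog` minimizes, the objective vector c is negated.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (rows: A, B; columns: EX_A, R1, R2, BIOMASS).
# R1 and R2 are two parallel routes converting A into B.
rxns = ["EX_A", "R1", "R2", "BIOMASS"]
S = np.array([
    [1, -1, -1,  0],   # A: produced by uptake, consumed by R1 and R2
    [0,  1,  1, -1],   # B: produced by R1 and R2, drained by BIOMASS
], dtype=float)
lb = np.array([0, 0, 0, 0], dtype=float)
ub = np.array([10, 1000, 1000, 1000], dtype=float)  # uptake of A limited to 10
c = np.array([0, 0, 0, 1], dtype=float)             # objective: BIOMASS flux


def fba(S, lb, ub, c):
    """Maximize c . v subject to S v = 0 and lb <= v <= ub."""
    res = linprog(-c,  # linprog minimizes, so negate the objective to maximize
                  A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=list(zip(lb, ub)), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun, res.x


z, v = fba(S, lb, ub, c)
print(f"max Z = {z:.1f}")
print(dict(zip(rxns, np.round(v, 3))))
```

The optimal value is 10, but the split of flux between R1 and R2 in v depends on the solver: any split with R1 + R2 = 10 is an alternate optimum, which is exactly the non-uniqueness of FBA solutions described above.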
Whereas the multitude of alternate optimal solutions reflects the system's flexibility, the actual cellular state depends on additional factors such as the interplay of enzymatic and genetic regulatory events [5, 13]. In the absence of sufficiently detailed constraints to exclude unlikely network states, all alternate optimal solutions could represent biologically meaningful states, which makes them worth investigating (Figure 1.1).
1.2.3 Flux variability analysis
Flux variability analysis (FVA) can provide insights into alternate optimal solutions, which are an expression of network redundancy. This redundancy contributes to the robustness of the metabolic network (Figure 1.1) [22]. FVA is a variation of FBA that reports, for each reaction in the model, the minimal and the maximal allowable flux [22]. The analysis returns the range of allowable fluxes for each reaction and can identify reactions that are never used, or used differently, under distinct sets of environmental or genetic conditions [5, 22]. Another biased, FBA-based interrogation method is single gene deletion. Here, gene knock-outs (KO) are simulated by constraining the fluxes of the gene-associated reactions to zero, followed by an assessment of the impact of the network perturbation through FBA [23]. This analysis is used to identify weak links in the network, e.g., for the prediction of drug targets to combat cancer [24].
1.2.4 Sampling analysis
A more comprehensive resolution of the alternate flux distributions than that provided by FVA can be achieved through sampling analysis. During the sampling process, randomly distributed points (each comprising a flux distribution) are picked from the feasible solution space as a representation of the entire solution space (Figure 1.1, [25]). The artificial centering hit-and-run (ACHR) sampler is implemented in the COBRA toolbox and has been used to study the solution space of larger networks [15, 25, 26].
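Both FVA and single gene deletion reduce to repeated LP solves. The minimal sketch below uses SciPy's `linprog` on the same kind of hypothetical network (two parallel routes R1 and R2 from A to B). FVA first computes the optimum and then minimizes and maximizes every flux while requiring the objective to stay at (a fraction of) that optimum. The one-gene-per-reaction mapping, the gene names, and the helper functions are assumptions for illustration; a full model would evaluate GPR rules instead.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: EX_A: -> A (uptake <= 10), R1: A -> B,
# R2: A -> B (parallel route), BIOMASS: B ->
rxns = ["EX_A", "R1", "R2", "BIOMASS"]
S = np.array([[1, -1, -1, 0],
              [0, 1, 1, -1]], dtype=float)
lb = np.zeros(4)
ub = np.array([10, 1000, 1000, 1000], dtype=float)
c = np.array([0, 0, 0, 1], dtype=float)


def lp_min(obj, lb, ub, A_ub=None, b_ub=None):
    """Minimize obj . v subject to S v = 0, lb <= v <= ub and A_ub v <= b_ub."""
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=list(zip(lb, ub)), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.fun


def fba_value(lb, ub):
    """Optimal objective value c . v (maximization via a negated objective)."""
    return -lp_min(-c, lb, ub)


def fva(lb, ub, fraction=1.0):
    """Min/max flux of each reaction while keeping c . v >= fraction * optimum."""
    z_opt = fba_value(lb, ub)
    A_ub = -c.reshape(1, -1)               # -c . v <= -fraction * z_opt
    b_ub = np.array([-fraction * z_opt])
    ranges = {}
    for j, rxn in enumerate(rxns):
        e = np.zeros(len(rxns))
        e[j] = 1.0
        v_min = lp_min(e, lb, ub, A_ub, b_ub)
        v_max = -lp_min(-e, lb, ub, A_ub, b_ub)
        ranges[rxn] = (v_min, v_max)
    return ranges


# Hypothetical one-gene-per-reaction mapping (not derived from real GPRs).
gene_to_reactions = {"gEX": ["EX_A"], "gR1": ["R1"], "gR2": ["R2"]}


def single_gene_deletion(lb, ub):
    """Optimal objective after knocking out each gene (its reactions fixed to 0)."""
    results = {}
    for gene, targets in gene_to_reactions.items():
        lb_ko, ub_ko = lb.copy(), ub.copy()
        for rxn in targets:
            j = rxns.index(rxn)
            lb_ko[j] = ub_ko[j] = 0.0
        results[gene] = fba_value(lb_ko, ub_ko)
    return results


for rxn, (lo, hi) in fva(lb, ub).items():
    print(f"{rxn}: [{lo:.1f}, {hi:.1f}]")
print(single_gene_deletion(lb, ub))
```

Under these assumptions, FVA should report R1 and R2 each ranging from 0 to 10 (the redundancy), while BIOMASS is fixed at 10. Correspondingly, deleting gR1 or gR2 should leave the optimum unchanged, whereas deleting gEX abolishes it, i.e., only the uptake gene is a weak link in this toy network.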
The procedure starts from an initial point and moves through the space with a randomly chosen direction and step length [25]. Only every i-th point is collected to support a random distribution of the sampling points. However, in large-scale networks, the high dimensionality and size of the solution space make it uncertain whether the entire solution space is covered in finite time, which has also been referred to as the slow mixing problem [25]. The outcome of the sampling analysis is commonly illustrated as ranges of feasible fluxes, which are comparable to the results of FVA (but unbiased), or as reaction-wise probability distributions [12].
1.3 Biochemical networks
A cell comprises a vast number of components that support its structure, its functions, or both. Even more complex than the component list are the interactions among the components, which make up the global cellular network. COBRA models to date cover only parts of the global cellular network, which are classified based on biological functionality, e.g., metabolism or signaling [4]. However, the integration of multiple functions and the development towards whole-cell models is an active field of research [27, 28].
1.4 Signaling networks
The purpose of signaling networks is to convey information between the environment and the cell [2]. Signaling networks transmit extracellular signals, emitted either by other cells or originating as cues from the environment, into the cell and into the nucleus, where transcription factors induce gene expression [2]. Signal transduction consecutively involves the binding of an extracellular ligand to a specific receptor, the intracellular transmission of the evoked signal, e.g., by phosphorylation cascades that amplify the signal, and possibly adaptations of the cell through changes in gene expression programs [2]. About two thousand genes in the human genome encode receptors, kinases, and phosphatases, which participate in numerous signaling cascades [29].
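The hit-and-run idea described above can be sketched in a few lines. The following minimal Python example implements plain hit-and-run rather than ACHR (which additionally derives its search directions from a set of centered warm-up points). It restricts directions to the null space of S, so that every sample satisfies $S \cdot v = 0$, and picks the step length uniformly within the flux bounds. The toy network, the interior starting point, and the parameter names are assumptions, and the sketch presumes a bounded solution space.

```python
import numpy as np
from scipy.linalg import null_space

# Hypothetical toy network (EX_A, R1, R2, BIOMASS) with R1 and R2 in parallel.
S = np.array([[1, -1, -1, 0],
              [0, 1, 1, -1]], dtype=float)
lb = np.zeros(4)
ub = np.array([10, 1000, 1000, 1000], dtype=float)


def hit_and_run(S, lb, ub, v_start, n_samples, thinning=10, seed=0, tol=1e-12):
    """Plain hit-and-run sampling of {v : S v = 0, lb <= v <= ub}.

    Assumes v_start is feasible (ideally interior) and the space is bounded.
    """
    rng = np.random.default_rng(seed)
    N = null_space(S)                  # orthonormal basis of {d : S d = 0}
    v = np.asarray(v_start, dtype=float).copy()
    samples, step = [], 0
    while len(samples) < n_samples:
        d = N @ rng.standard_normal(N.shape[1])   # random direction within S v = 0
        # Feasible interval [t_lo, t_hi] such that lb <= v + t * d <= ub.
        t_lo, t_hi = -np.inf, np.inf
        for di, vi, lo, hi in zip(d, v, lb, ub):
            if di > tol:
                t_lo, t_hi = max(t_lo, (lo - vi) / di), min(t_hi, (hi - vi) / di)
            elif di < -tol:
                t_lo, t_hi = max(t_lo, (hi - vi) / di), min(t_hi, (lo - vi) / di)
        v = v + rng.uniform(t_lo, t_hi) * d       # random step length along the line
        step += 1
        if step % thinning == 0:                  # keep only every i-th point
            samples.append(v.copy())
    return np.array(samples)


samples = hit_and_run(S, lb, ub, v_start=[5.0, 2.5, 2.5, 5.0], n_samples=500)
print(samples.min(axis=0), samples.max(axis=0))   # per-reaction sampled flux ranges
```

The per-reaction minima and maxima of the samples resemble FVA ranges, whereas a histogram of each column gives the reaction-wise probability distributions mentioned above. For large networks, this plain variant mixes slowly, which is one motivation for ACHR.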
These pathways are highly interconnected (cross-talk), which renders signal transduction networks highly complex [2]. Signaling networks are further connected to other cellular processes, such as metabolism, and to regulatory networks. In the case of metabolism, connections exist through the dependency of signaling pathways on energy and through the utilization of common components [29]. Mammalian signaling networks have been reconstructed for Toll-like receptor (TLR) signaling [7] and Jak-Stat signaling [30].
1.4.1 Innate Immunity
In vertebrates, two immune systems exist, the innate and the adaptive immune system [31]. Innate immunity is the first line of defense and provides a rapid response to the invasion of pathogens [32, 33]. The human TLR signaling network is involved in both innate and adaptive immunity [31, 34]. At least ten distinct TLRs have been identified in humans [35]. These TLRs are involved in the response to various microbial components, such as lipopolysaccharide (LPS), lipoprotein, porins, peptidoglycan, flagellin, single- and double-stranded RNA, and unmethylated CpG oligonucleotides ([36] and references therein). Recognition of microbial components by TLRs initiates signal transduction that leads to the activation of transcription factors, which induce the expression of cytokines and other genes [33, 37]. Individual TLRs interact with different combinations of adapter proteins and activate transcription factors such as NF-κB, activating protein-1 (AP-1), and interferon regulatory factors (IRFs) to drive an immune response [31]. Activation of TLR signaling has been observed in a number of human diseases, including cancer, and tissue-specific differences in TLR expression and in cell responses to environmental stimuli have been recognized as a major challenge in cell signaling [38, 39, 40, 41, 42].
1.4.2 Reconstruction of human Toll-like receptor signaling network
The mammalian TLR signaling network [7] has been manually assembled based on published literature and a comprehensive map of TLR signaling [43]. It accounts for 909 reactions and 752 distinct chemical components. A total of 14 Toll-like receptors, 49 distinct ligands (including many microbial components), and six possible outputs have been considered. Overall, the functions of 158 protein kinases and 16 phosphatases have been included in the TLR network. The outputs include NF-κB, CRE, AP-1, reactive oxygen species (ROS) production, IRF3, and IRF7. The metabolites that were part of the signaling network were among the most highly connected network species [7]. Interrogation of the network led to the identification of ten distinct input/output (DIOS) pathways. The DIOS pathways were used to predict potential candidates to selectively interrupt ROS production, IL-1, and MyD88 pathways without compromising other DIOS pathways [7]. However, the reconstruction did not include genes or GPRs, such that integration of omics data into the network was not possible.
1.5 Metabolism
Metabolism is a vital cellular process and comprises thousands of enzymatic reactions that generate energy and metabolites used to support cellular functions [44]. These reactions are arranged into sequential biochemical pathways, which are generally divided into anabolism and catabolism. Anabolism supplies the cell with building blocks, such as amino acids and nucleic acids, for maintenance and proliferation [2, 44]. Catabolism, on the other hand, mediates the breakdown and salvage of nutrients and cellular components for energy generation [2, 44]. Within the cell, membranes separate metabolite pools, and a multitude of membrane transporters is necessary to connect metabolic pathways in different cellular compartments.
Gene expression programs define the set of enzymes present in a cell and can be altered in response to changes in environmental, cellular, or genetic conditions. In humans, individual cell types and tissues contribute only a specific subset of metabolic functions to systemic, whole-body metabolism. These differences arise through differences in the expression of enzymes and isoforms, and through alternative splicing of transcripts, which alter the utilization of reactions and pathways. Given its central role in the maintenance and proliferation of cells, metabolism is strictly regulated, and alterations of metabolism have been connected to various human diseases [2]. The following sections provide an overview of the human metabolic network and of human cancer, one of the most successful biomedical applications of COBRA and a biological topic recurring in this thesis.
1.5.1 Human metabolic genome-scale reconstructions
Published in 2007, Homo sapiens Recon 1 was the first genome-scale reconstruction of human metabolism [45]. It captured the functions of 2004 proteins, 2766 metabolites, and 3311 metabolic and transport reactions, which were assembled in a bottom-up reconstruction process based on extensive literature. Its pathways are distributed over eight cellular compartments (cytoplasm, mitochondria, nucleus, endoplasmic reticulum, Golgi apparatus, lysosome, peroxisome, and the extracellular environment). It was validated based on 288 metabolic functions known to occur in cells throughout the human body. Since its publication, it has been extensively used as a knowledge-base of human metabolism, to investigate general and cell-type specific metabolism, to close knowledge and network gaps in human metabolism, for data mapping and the generation of tissue-specific models, and to investigate human disease processes (Figure 1.2, [24, 46, 47, 48, 49, 50, 51]).
Further, Recon 1 and the tissue-specific networks derived from it have been used to investigate host-pathogen and host-gut microbe interactions [21, 52]. Network reconstruction is a laborious task. In order to extend the scope and improve the predictive capability of Recon as the most comprehensive knowledge-base of human metabolism and for its various applications (Figure 1.2), the incorporation of newly emerging, additive, and corrective knowledge constitutes an ongoing iterative process (Figure 1.4, see also section 1.5.3). As an expression of this iterative process, Recon 2 was recently published [53].
Figure 1.2: Applications of the human metabolic model.
Recon 2 was created in a community-driven effort, combining Recon 1 with four other resources of human metabolism, namely EHMN [54], HepatoNet1 [55], the Ac-FAO module [56], and the human small intestinal enterocyte reconstruction [57]. It covers a total of 1789 genes, 7440 reactions, and 2626 unique metabolites distributed over eight cellular compartments, and its predictive capability has been demonstrated, e.g., through the mapping of inborn errors of metabolism and different omics data sets [53]. As a further development, corrections and additions to the content of extracellular metabolite transporters in both Recon 1 and Recon 2 have recently been reviewed [58]. The use of the human GENREs in combination with omics data sets is discussed in more detail towards the end of this chapter.
1.5.2 Cancer as a metabolic disease
Cancer is a major burden for health systems worldwide. In the US, estimates indicate that cancer causes one in four deaths [59]. Both primary cancer tissue and cell lines are used to unravel the mechanisms of cancer biology [60]. Tumors carry numerous and heterogeneous somatic mutations. However diverse these mutations might be, they frequently affect signaling pathways that regulate metabolism [61, 62, 63].
This connection to metabolism seems straightforward, given that cancer cells proliferate at high rates, and each cell division requires the duplication of the biomass and extensive amounts of energy. The importance of metabolic alterations in cancer was already noted when Otto Warburg described the differences in the utilization of central metabolic pathways between cancer cells and normal body cells [64]. Even though it is known today that oxidative phosphorylation is functional in most cancer cells [65], the switch from mitochondrial respiration to aerobic glycolysis, associated with high lactate secretion and known as the Warburg effect, remains one of the most important observations of cancer metabolism to date. Extensive amounts of glucose are thereby oxidized to pyruvate, which is subsequently converted into lactate to regenerate NAD+ and maintain the high glycolytic flux, and the lactate is secreted [66]. The rate of ATP production by glycolysis can easily exceed that of mitochondrial oxidative phosphorylation, yet cells then depend on a constant supply of glucose [67]. A number of reasons why cancer cells might favor inefficient aerobic glycolysis have been discussed, including defective mitochondria, transformation under hypoxic conditions, faster ATP production, an upper limit on the possible mitochondrial density in the cytosol, and the support of biosynthetic pathways and redox control through the diversion of glycolytic intermediates into the pentose phosphate pathway and one-carbon metabolism [61, 64, 68]. Further support for the diversion of glycolytic intermediates comes from the cancer-characteristic expression of the pyruvate kinase isoform PKM2. This isoform slows down the glycolytic flux and thereby supports the distribution of glycolytic intermediates into adjacent pathways [61]. Besides glucose, cancer cell metabolism heavily relies on glutamine [66].
Once inside the cell and converted to glutamate, glutamine either enters glutathione biosynthesis or enters the TCA cycle as α-ketoglutarate, in a process called anaplerosis [61]. Anaplerosis replaces the carbon lost through the efflux of TCA cycle intermediates into biosynthetic pathways, referred to as cataplerosis. The reasons for the addiction of cancer cells to glutamine remain unresolved, yet genetic and micro-environmental factors have been proposed [61, 66]. Reductive TCA cycle flux involving the NADPH-associated IDH1 has been observed in different cell lines, allowing them to direct glutamine towards cytosolic lipid synthesis, at least under hypoxic conditions [69, 70]. However, the micro-environment of cancer cells within tumors can vary greatly, and cells within the tumor might face starvation due to lacking vascularization. It is therefore not surprising to find that cancer cells depend on catabolic processes and use fatty acids and ketone bodies [71]. The following section of this chapter is a full reprint of a section that appears in Sahoo, S, Aurich, MK, Jonsson, JJ, Thiele, I (2014) Membrane transporters in a human genome-scale metabolic knowledgebase and their implications for disease. Front. Physiol. 5:91. I was a contributing author of this publication and the author of the part that forms the basis for the following section.
1.5.3 The importance of extracellular membrane transporters in Cancer
As described above, some of the metabolic characteristics of cancer cells are the high uptake of glucose, aerobic glycolysis including the secretion of lactate (Warburg effect), and a high rate of glutaminolysis to compensate for the efflux of TCA cycle intermediates into biosynthetic pathways [66]. Alterations in metabolite uptake (e.g., amino acids and glucose) and secretion through specific sets of metabolite transporters constitute key factors in how these continuously proliferating cells meet their metabolic demands [72].
Redundancy and overlapping substrate specificity exist within and between metabolite transporter families. Cancer cells have to operate the sets of transporters that best serve their metabolic dependencies. In fact, the distinctive transporter expression in cancerous versus normal cells could provide good opportunities for targeted treatment [72]. The contribution of transporters to cancer discussed above has been reviewed elsewhere [72, 73, 74, 75, 76] and is summarized in Table 1.1. Coverage and accurate representation of transport systems are essential to perform valuable simulations using COBRA. Recon 1 has been used for the generation and analysis of cancer-specific metabolic models [24, 77, 78, 79], and these studies have recently been summarized [20, 51]. Of the 22 extracellular transporters (Table 1.1; the individual bicarbonate exchangers count as one) that play a role in cancer metabolic reprogramming and proliferation, 13 transporters are correctly represented in Recon 2 (Table 1.1), three need to be modified, and four are still missing or require further curation. This section discusses the cancer-relevant transporters currently missing or requiring revision (Table 1.1). The pyruvate to lactate conversion is necessary to sustain a high glycolytic flux [66]. The accumulation of lactate and a decreasing pyruvate level put cell survival at risk due to increasing acidification of the cytoplasm. Cancer cells counteract the decrease in intracellular pH by specific ion transport (i.e., of bicarbonate and protons) and by lactate export via lactate/H+ symport, which is mediated by one of the four monocarboxylate transporters (MCTs) (SLC16A1, GeneID: 6566; SLC16A7, GeneID: 9194; SLC16A8, GeneID: 23539; SLC16A4, GeneID: 9122). The high-affinity lactate transporter SMCT1 (SLC5A8, GeneID: 160728) favors the import of lactate [80] and is suppressed in a number of cancer cell types, as summarized in [72].
For example, SLC5A8 is silenced by methylation in human astrocytomas and oligodendrogliomas [81] and in primary colon cancers and colon cancer cell lines [82]. In addition to its transporter function, the SLC5A8 protein has a demonstrated role in tumor suppression through the active import of endogenous inhibitors of histone deacetylases (HDACs), i.e., butyrate, which originates from gut microbes, and pyruvate [83, 84]. Recently, SLC5A8 was shown to counteract tumor progression independently of its transport function. Instead, SLC5A8 acts through an unknown mechanism involving a decrease in the anti-apoptotic protein survivin [85]. Recon 2 includes passive iodide transport via SLC5A8 and the Na+-coupled transport of lactate, pyruvate, and the short-chain fatty acids acetate, propionate, and butyrate (Table 1.1, [86]). Hence, these data were added in the transport module. SLC5A8 was not included in Recon 2, most likely because this protein has been mainly discussed in the context of cancer. ABC transporters mediate the efflux of cytotoxic drugs, causing multidrug resistance (MDR) and chemotherapy failure [76, 87]. Two of the four major drug transporters, MDR1 (ABCB1, GeneID: 5243) and ABCG2 (ABCG2, GeneID: 9429), are missing in Recon 2. Both are known to be overexpressed in different cancer types [76]. A subpopulation of cancer cells with enriched stem cell activity, so-called side populations (SPs), has been isolated from six human lung cancer cell lines (H460, H23, HTB-58, A549, H441, and H2170). When tested for elevated ABC transporter expression, all of the SPs displayed significantly higher mRNA expression of ABCG2 than their non-SP counterparts [88]. Four SPs also showed significantly higher expression of MDR1. All six SPs showed resistance to different chemotherapeutic drugs.
The survival of such cells with stem cell activity upon drug treatment could be connected to relapse in vivo [88], and ABC transporter expression might be an indicator of this cancer cell phenotype. Strong expression of aquaporins has been observed in various tumors, especially aggressive tumors [74]. Some aquaporins are exclusively expressed in malignant tissue [74]. The aquaglyceroporin aquaporin-3, AQP3 (AQP3, GeneID: 360), which transports glycerol in addition to water, is expressed in normal epidermis and overexpressed in basal cell carcinoma and human skin squamous cell carcinomas [89]. AQP3-facilitated glycerol transport was found to determine cellular ATP levels and therefore to be important for hyperproliferation and tumor cell proliferation in mouse epidermal cells [89]. Correspondingly, the resistance of AQP3-null mice to skin tumors might arise from reduced tumor cell glycerol metabolism and ATP generation [89]. This property renders AQP3 inhibition a possible target for the prevention and treatment of skin cancers, and possibly other cancers associated with aquaglyceroporin overexpression [89]. AQP3 is currently missing in Recon 2 and is covered in the transport module. Although many of the transporters associated with cancer are present in Recon 2 (Table 1.1), important mediators of intra- and extracellular pH, drug resistance, and proliferative energy metabolism are still missing.
Table 1.1: Yellow shading indicates genes encoding either absent transport proteins or transport proteins with limited substrate specificity in Recon 2. Blue shading indicates improvement in the transporter data (either the addition of the protein and its associated reactions, the expansion of its substrates, or modification of the GPRs) in Recon 2 over Recon 1.
Entrez Gene ID | Transporter | Relevant cargo | Reference
4363 | MRP1 (ABCC1) | Xenobiotics, cytotoxic drugs | [76]
6510 | ASCT2 (SLC1A5) | Glutamine | [72, 73, 90]
6513 | GLUT1 | Glucose | [72]
6515 | GLUT3 | Glucose | [72]
6566 | SLC16A1/MCT1 | Lactate | [72, 90]
6584 | OCTN2 (SLC22A5) | Carnitine | [79, 91]
8884 | SMVT | Biotin | [92]
9123 | SLC16A4/MCT4 | Lactate | [72, 90]
9194 | SLC16A2/MCT2 | Lactate | [72, 90]
23539 | SLC16A3/MCT3 | Lactate | [72, 90]
23657, 6520 | xCT (SLC7A11)/4F2hc (SLC3A2) | Cysteine | [72, 73]
80704 | SLC19A3 | Thiamine | [93]
154091 | GLUT12 | Glucose | [72]
6523 | SGLT1 | Glucose | [72]
8140, 6520 | LAT1 (SLC7A5)/4F2hc (SLC3A2) | Glutamine/cysteine antiport | [72, 73]
11254 | ATB0,+ (SLC6A14) | Proteinogenic amino acids except glutamate and aspartate | [72]
360 | AQP3 | Glycerol | [74, 89]
5243 | MDR1 (ABCB1) | Xenobiotics, cytotoxic drugs | [76]
9429 | ABCG2 | Xenobiotics, cytotoxic drugs | [76]
160728 | SLC5A8/SMCT-1 | Lactate, pyruvate, and butyrate (gut microbes) | [72, 83, 84]
The addiction of cancer cells to glucose and glutamine, and the expression of distinct isozymes frequently detected in cancer cells, together demonstrate the important role of metabolism in cancer and emphasize the potential of metabolic targets in cancer therapy [51, 94].
1.5.4 Using COBRA to investigate cancer metabolism
Because metabolism constitutes a central part of the disease, Recon 1 and COBRA provide ideal frameworks for the investigation of cancer [20], which has been reviewed extensively [51, 95, 96, 97]. Since 2010, a growing number of COBRA studies have investigated the Warburg effect and other aspects of cancer metabolism [24, 77, 78, 94, 98, 99, 100, 101]. The first COBRA study of cancer metabolism used a small model that captured only the most experimentally studied pathways in cancer, namely glycolysis, the TCA cycle, the pentose phosphate pathway, glutaminolysis, and oxidative phosphorylation. This model was able to represent the physiological conditions in HeLa cells and predicted lactate dehydrogenase and pyruvate dehydrogenase as metabolic drug targets [94].
Further, solvent capacity constraints, i.e., the limit of mitochondrial density in the cytoplasm, evoked a glucose-uptake-dependent dichotomy of metabolic regimes in a reduced flux balance model of ATP production. This dichotomy consisted of a switch from oxidative phosphorylation to aerobic glycolysis [68, 102]. The important role of enzyme mass restrictions at high proliferation rates, and potentially in the emergence of the Warburg effect, was corroborated by another group using Recon 1 and a cancer biomass objective function [98]. Whereas the above-mentioned results were obtained using canonical glycolysis, further work pointed towards the existence of alternative pathways in cancer cells [100]. This was the redistribution of metabolic flux into an alternative glycolytic pathway with net zero ATP production that involved reactions of serine biosynthesis, one-carbon metabolism, and the glycine cleavage system (SOG pathway) [100]. It was further predicted, with both a small-scale and a large-scale model, that aerobic glycolysis arises from solvent capacity limits equally in cancer cells and in proliferating normal muscle cells. Aerobic glycolysis provided a higher ATP yield per volume density than mitochondrial oxidative phosphorylation [68]. Tedeschi et al. (2013) further investigated the predictions of ATP generation through the SOG pathway. Their results supported the view that the SOG pathway supports cancer proliferation with ATP, NADPH, and purines [101]. The first genome-scale model of cancer metabolism was derived from Recon 1 [45] using a version of the Model Building Algorithm (MBA) [24, 103]. MBA was further applied to generate a non-small cell lung cancer model using multiple gene expression data sets, which showed superior prediction of cell-line-specific, growth-supporting genes compared to the generic cancer model [24].
The generic cancer model was used to predict synthetic lethal gene pairs as potential drug targets, of which a subset was non-toxic to the global model (Recon 1). Succinate dehydrogenase (SDH) and fumarate hydratase (FH), both frequently mutated in different cancer types, were both predicted to be synthetically lethal with pyruvate carboxylase (PC). PC thus represented a valid therapeutic target for selectively attacking SDH- and FH-deficient cancer cells [24]. In a follow-up study, lethal synergy between FH and enzymes of the heme metabolic pathway was experimentally validated, which provided insight into the previously unresolved mechanism by which FH-deficient cells survive a non-functional TCA cycle caused by the mutation in the fumarate hydratase gene, e.g., in hereditary leiomyomatosis and renal-cell cancer (HLRCC) [77]. Gatto et al. (2014) used cancer-type-specific metabolic models as an estimator to confirm the reduced metabolic network of clear cell renal cell carcinoma (ccRCC). The authors observed unique metabolic reprogramming in ccRCC, based on transcriptomic and proteomic data, that was not shared by any other tumor tissue [104]. A kidney cancer model that was mainly based on data of ccRCC cells [99] was reduced in size (20% of reactions and 35% of genes) and metabolic functionality compared to a normal kidney model [99] and other cancer models reconstructed in the same way. This reduction was in line with the observed downregulation of genes in metabolic pathways in ccRCC [104]. Taken together, these studies reveal that the use of COBRA in cancer research has developed along with the evolving views in the field: starting from the earliest studies, which investigated the Warburg effect as the prevalent hallmark of cancer at the time and tried to answer why aerobic glycolysis might provide an advantage to cancer cells [68], COBRA has progressed beyond the mere search for causes to the prediction of potential treatment options [24].
1.6 High-throughput data
Cellular phenotypes differ although all human cells carry the same genetic information. A cell-type-specific set of genes is transcribed, and the mRNAs are subsequently translated into proteins which, once activated, can carry out their (catalytic) function. Proteins, however, are subject to degradation processes or might be inactivated otherwise, since the abundance and activity of enzymes and proteins within the cell are tightly regulated [105]. Phenotypic and functional differences among the approximately 200 human cell types arise from the regulation of gene expression and of active cellular protein contents, pathways, and reaction fluxes (Figure 1.3, [106]).
Figure 1.3: Omics data sets provide a snap-shot of the cellular components at large scale. These data sets are ideal for the interrogation of the cellular network from a systems perspective.
High-throughput (HT) technologies measure a multitude of cellular components at a time and provide snap-shots of the cellular network on the level of DNA, RNA, proteins, and metabolites under distinct environmental conditions, in health and disease (Figure 1.3).
1.6.1 Transcriptomics
The cellular transcriptome comprises all transcripts of the cell under a specific condition, including mRNAs, non-coding RNAs, and small RNAs [107]. RNA is an important determinant of the molecular constituents of a cell, and quantification of transcript expression is often applied to define differences between normal and disease conditions [107]. Alternative splicing is an important source of variation in the transcriptome and greatly contributes to biological complexity [108]. It generally describes the process in which multiple transcripts of distinct lengths are produced from one gene locus, e.g., by joining different numbers of short exons after the removal of 'noncoding' intron sequences from the pre-mRNA [108].
The generation of transcript isoforms from the same gene through alternative splicing is known to differ between tissues, developmental stages, and disease conditions [108]. Cancer-specific expression of the PKM2 isoform has gained much attention [109]. Methods to define the transcriptome include custom-made and commercial microarrays, which are incubated with fluorescently labeled cDNA and can cover the 'entire' transcriptome at once, and sequence-based RNA-Seq, which relies on deep-sequencing technologies [106, 107]. Hybridization-based approaches are relatively inexpensive. However, they depend on pre-defined gene coding sequences, and cross-hybridization leads to high background noise [107]. RNA-Seq also allows high-throughput, quantitative determination of the entire transcriptome while overcoming the shortcomings of microarrays [107].
1.6.2 Proteomics
The proteome comprises the entirety of proteins in a cell, which, among other functions, participate in signaling cascades and catalyze metabolic reactions. Until the emergence of mass spectrometry (MS), two-dimensional gel electrophoresis was used for protein analysis [106]. MS has further been applied to elucidate post-translational modifications and protein interactions [106]. Additionally, high-resolution MS-based proteomics has enabled the quantitative determination of the entire cellular proteome [106, 110, 111]. To determine peptide sequence and quantity, or to identify proteins, protein samples are digested to peptides, chromatographically fractionated, and fragmented in the mass spectrometer to yield fragment-ion spectra. The spectra are subsequently analyzed [112]. Challenges of MS include the inability to amplify protein samples, the large number of proteolytic peptides produced through digestion, and the limited ability to identify proteins based on existing resources [106].
1.6.3 Metabolomics

The metabolome comprises the low-molecular-weight chemicals of a biological sample. Metabolomics is the comprehensive profiling of these metabolites in biofluids, tissues, or cells [113]. It is the youngest of the omics techniques, and it is stable, relatively cheap, and highly reproducible [113]. Metabolomics also profiles the cellular phenotype directly [113] and is therefore the most straightforward resource for integration with metabolic networks. Hundreds to thousands of metabolites can be identified and quantified [114]. The most common analytical methods in metabolomics are nuclear magnetic resonance (NMR) spectroscopy, liquid chromatography-mass spectrometry (LC-MS), and gas chromatography-mass spectrometry (GC-MS) [113]. Its potential for pharmaceutics and individualized drug therapy has also been pointed out [113].

1.7 Analysis of omics data in the context of COBRA models

A broad research community has a strong interest in methods that analyze omics data sets, reduce their complexity for interpretation, and make their contents accessible to non-biochemical experts. Biochemical reconstructions can provide a context for analyzing omics data sets, because they offer a mechanistic and biologically well-defined framework for different kinds of experimental data [2]. There are multiple ways to combine COBRA models and omics data (Figure 1.2). First, the topology of the reconstruction can be used to visualize the data in a structured way, e.g., by pathway, which facilitates interpretation. For example, Recon 1 was used to interpret gene expression data, revealing the effects of gastric bypass surgery on skeletal muscle metabolism [45]. Comparisons of network size and pathway topology have also been used to verify a reduced metabolic network in clear cell renal cell carcinoma (ccRCC) [104].
The network topology can also be used to identify correlations that could not be derived from the data alone. For example, analysis of SNPs in the network context showed that SNPs with similar pathological impact mapped to reactions in the same correlated reaction set (reactions with 100% correlated activity) [115]. Second, the predictive nature of COBRA models can be exploited. Here, the omics data are used to formulate additional constraints that reduce or alter the shape of the solution space [116]. One example is the integration of quantitative extracellular metabolomic profiles of yeast cells into the network context [117]. Using sampling analysis, the authors compared models tailored to different environmental conditions. They identified changes in intracellular metabolic flux states as well as the regions of the metabolic network that were perturbed [117]. Finally, omics data can be used to generate cell type or condition specific metabolic models. These are subnetworks of the global human models. Which reactions to keep and which to discard is decided based on the data and on the different criteria outlined below (Table 1.2, Figure 1.4).

1.7.1 Methods for network contextualization

A number of algorithms have been published for integrating omics data sets with GENREs, and they have been summarized previously (Table 1.2, [129, 130]). These methods mainly focus on transcriptomic and proteomic data. They differ in three respects: whether they depend on a defined objective function; whether exchange profiles must be predefined for the target cell type or are predicted as part of the output model [47, 131]; and the form in which the input data are compiled and incorporated. The choice of algorithm depends on the data set at hand.
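Omics data can be formulated as additional constraints that shrink the solution space. The following Python sketch illustrates this with flux balance analysis (FBA) on a hypothetical five-reaction network, solved with SciPy's `linprog`. The network, its bounds, and the "measured" secretion rate are invented for illustration and do not come from any published model. When the secretion of metabolite C is fixed to the measured value, part of the substrate is diverted away from biomass, and the maximal biomass flux drops from 10 to 6.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (not part of any published reconstruction).
# Columns: EX_A (-> A), R1 (A -> B), R2 (A -> C), EX_C (C ->), BIO (B ->)
S = np.array([
    [1, -1, -1,  0,  0],   # A
    [0,  1,  0,  0, -1],   # B
    [0,  0,  1, -1,  0],   # C
], dtype=float)
BIO = 4


def max_biomass(bounds):
    """Maximize flux through BIO subject to S v = 0 and the given bounds."""
    c = np.zeros(S.shape[1])
    c[BIO] = -1.0  # linprog minimizes, so the objective is negated
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun


# Uptake of A limited to 10 units; internal reactions effectively unbounded.
unconstrained = [(0, 10), (0, 1000), (0, 1000), (0, 1000), (0, 1000)]
# Same network, with secretion of C fixed to a hypothetical measured rate of 4.
constrained = list(unconstrained)
constrained[3] = (4, 4)

print(max_biomass(unconstrained), max_biomass(constrained))  # about 10.0 and 6.0
```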
Methods that rely on a user-defined threshold to distinguish active from inactive reactions (through GPRs) can be applied even when data from only one condition are available. Other methods require data from multiple conditions (Table 1.2). It is important to distinguish cell type specific from condition specific models. Cell type specific models should be able to perform the entire range of metabolic functions of the cell type. They can only be built by compiling many data sets that cover many different conditions, and they require substantial literature-based manual curation to ensure cell type specific functionality. Tissue (or organ) specific models capture the metabolic functions of the cells that make up the tissue. Condition specific models differ from both, because they capture only a subset of the cell type specific metabolic functions, namely those active under a particular set of environmental conditions. Because of their limited scope, these subnetworks can be built from single (extracellular metabolomic or transcriptomic) data sets. They can be derived either from the generic human metabolic model (Chapters 2 & 3) or from cell type specific metabolic or signaling models (Chapter 4). However, noise in the data affects models built from a single data set, so robust methods are needed to build and analyze such condition specific models. Figure 1.4 illustrates the cycle for generating, analyzing, and validating condition specific models. The cycle begins with data generation, followed by extension and refinement of the starting GENRE so that the data can be mapped to the model as completely as possible. The data are then integrated into the model context to produce condition specific metabolic models, using one of the published methods (Table 1.2) or the approaches introduced here (Chapters 2-4).
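Threshold-based methods involve two steps. First, each gene receives a binary presence/absence (P/A) call. Second, each reaction's GPR rule is evaluated: "and" (enzyme complex) requires all subunits, and "or" (isozymes) requires at least one. The following Python sketch shows both steps. The nested-tuple representation of GPR rules, the gene IDs, the expression values, and the threshold are all assumptions made for illustration. Reconstructions usually store GPRs as strings, which would first need to be parsed.

```python
from typing import Union

# A GPR rule is either a gene ID or a tuple ("and" | "or", subrule, subrule, ...).
GPR = Union[str, tuple]


def present_genes(expression: dict[str, float], threshold: float) -> set[str]:
    """Binary presence/absence (P/A) calls from a user-defined threshold."""
    return {gene for gene, value in expression.items() if value >= threshold}


def is_active(rule: GPR, present: set[str]) -> bool:
    """Evaluate a GPR rule: 'and' needs every subunit, 'or' needs any isozyme."""
    if isinstance(rule, str):
        return rule in present
    op, *args = rule
    if op == "and":
        return all(is_active(a, present) for a in args)
    if op == "or":
        return any(is_active(a, present) for a in args)
    raise ValueError(f"unknown GPR operator: {op!r}")


# Hypothetical genes, expression values, and rules, for illustration only.
expression = {"g1": 120.0, "g2": 5.0, "g3": 80.0, "g4": 3.0}
rules = {
    "R1": ("or", "g1", "g2"),                 # isozymes
    "R2": ("and", "g3", "g4"),                # enzyme complex
    "R3": ("or", ("and", "g2", "g3"), "g1"),  # nested rule
}
present = present_genes(expression, threshold=10.0)
active = {rxn: is_active(rule, present) for rxn, rule in rules.items()}
print(active)  # {'R1': True, 'R2': False, 'R3': True}
```

R2 is called inactive because its complex lacks subunit g4. Such calls feed directly into the choice of user-defined threshold discussed above.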
Analyzing the generated models, and possibly comparing their predictions with additional data, ideally yields testable hypotheses. These can lead to new data and further cycles of the systems approach (Figure 1.4).

Figure 1.4: The ideal systems biology approach using human metabolic GENREs and omics data integration is a circular process. Its iterative cycles involve and benefit both experimental discovery and the development of an increasingly comprehensive model, which can give rise to high-precision predictions of the healthy and disease-perturbed states of biochemical systems.

Algorithms like GIMME and iMAT were designed for fully automated subnetwork generation (Table 1.2), despite the general noisiness of transcriptomic data and the impact of post-transcriptional and post-translational regulation (Figure 1.3) [47, 131]. These algorithms predict post-transcriptional or post-translational regulation from the model context. GIMME additionally produces an output model that is functional with respect to the stated objective function [131]. More recent algorithms, such as MBA and FASTCORE, require the manual definition of high-confidence reaction sets, around which functional models are built. This development emphasizes that manual work and biological insight are needed in addition to the data sets to ensure the quality of the generated models [103, 126].

1.7.2 Human cell-type specific metabolic models

The number of cell-type specific metabolic models is increasing steadily. To date, condition and cell-type specific metabolic models have been reconstructed for many human tissues and cell types. Most of these networks were generated using one of the algorithms discussed above together with omics data sets, especially transcriptomic and proteomic data (Table 1.2) [46, 47].
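The GIMME idea described in Section 1.7.1 can be sketched as two linear programs. The first finds the optimal objective flux. The second requires a minimum fraction of that optimum while minimizing flux through reactions whose expression falls below a threshold. The following Python sketch is a simplified illustration under stated assumptions: all reactions are irreversible, expression values have already been mapped to reactions through GPRs, and the `fraction` of 0.9 is hypothetical. It is not the published GIMME implementation. In the toy network, the objective can be reached through a lowly expressed route (R1) or a highly expressed route (R2), and the expression-based penalty steers all flux through R2.

```python
import numpy as np
from scipy.optimize import linprog


def gimme_like(S, bounds, objective, expression, threshold, fraction=0.9):
    """Two-step LP in the spirit of GIMME (simplified sketch).

    Assumes irreversible reactions (all lower bounds >= 0), so that the
    penalty sum(c_j * v_j) equals sum(c_j * |v_j|). expression[j] is a
    reaction-level expression value, or None when no data are available.
    """
    if any(lb < 0 for lb, _ in bounds):
        raise ValueError("this sketch expects irreversible reactions only")
    n = S.shape[1]
    b_eq = np.zeros(S.shape[0])

    # Step 1: maximal flux through the objective reaction.
    c = np.zeros(n)
    c[objective] = -1.0
    res = linprog(c, A_eq=S, b_eq=b_eq, bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    v_max = -res.fun

    # Step 2: require at least `fraction` of that optimum ...
    new_bounds = list(bounds)
    lb, ub = bounds[objective]
    new_bounds[objective] = (max(lb, fraction * v_max), ub)

    # ... and penalize flux through reactions expressed below the threshold,
    # weighted by how far below the threshold they are.
    penalty = np.array([threshold - x if x is not None and x < threshold else 0.0
                        for x in expression])
    res = linprog(penalty, A_eq=S, b_eq=b_eq, bounds=new_bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.x


# Hypothetical network with two parallel routes from A to B.
# Columns: EX_A (-> A), R1 (A -> B), R2 (A -> B), BIO (B ->)
S = np.array([[1, -1, -1,  0],
              [0,  1,  1, -1]], dtype=float)
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]
expression = [None, 2.0, 50.0, None]  # R1 lowly, R2 highly expressed
v = gimme_like(S, bounds, objective=3, expression=expression, threshold=10.0)
print(np.round(v, 3))  # R1 carries no flux; BIO stays at >= 90% of its maximum
```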
In humans, the reconstructed cell types include brain [49], heart [125] and cardiomyocyte [121], liver [55, 103], kidney [120], macrophage [21, 119], red blood cell [132], and enterocyte [57]. Of these, the enterocyte model was reconstructed entirely bottom-up, without using omics data sets [57]. Reconstruction efforts have also produced large numbers of metabolic cell line models of normal and cancer tissues [99, 128]. Beyond single cell type models, multi-cell assemblies have been reconstructed for human brain cells [49] and for whole-body systems physiology comprising adipocytes, hepatocytes, and myocytes [133]. These models have been applied to investigate metabolic aspects of diverse human diseases, such as cancer [99], neurodegeneration [49], and diabetes [133].

1.7.3 Integration of metabolomic data sets

A major part of this thesis deals with integrating metabolomic data into the network context and generating condition specific metabolic models. Metabolomic data can be integrated with metabolic networks as qualitative, quantitative, and thermodynamic constraints [117, 134, 135, 136] to increase the precision of model predictions [116]. The capacity of Recon 2 to integrate metabolomic data has been demonstrated using published data on the NCI-60 cell line collection [53, 137]. Several recent algorithms consider metabolomic data (Table 1.2). For mCADRE, metabolomic data are discussed as potential clues and as a way to define metabolic functions that are checked during network pruning [128]. (t)INIT allows metabolomic data to be used as clues during model building and checks that the final model can produce the detected metabolites [99, 127]. In addition, MBA and FASTCORE depend on an a priori definition of core reaction sets, which can take metabolomic data into account [103, 126].
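A metabolomic profile can be turned into qualitative and quantitative exchange constraints. The following Python sketch shows one way to do this. The sign convention (negative exchange flux = uptake, positive = secretion) follows common COBRA practice. The profile format, the small `eps` that forces a qualitatively detected metabolite to be exchanged in the observed direction, and the use of mean plus or minus standard deviation as bounds are assumptions for illustration. They do not describe any specific published method.

```python
def exchange_bounds(profile, default=(-1000.0, 1000.0), eps=1e-3):
    """Translate a metabolomic exchange profile into flux bounds.

    Assumed sign convention: for an exchange reaction 'm <=>', negative flux
    is uptake and positive flux is secretion.

    profile maps an exchange reaction ID to (direction, rate, sd):
      - direction: "uptake" or "secretion"
      - rate, sd: measured magnitude and its spread, or None if the metabolite
        was only detected qualitatively.
    Exchanges missing from the profile keep the default (open) bounds.
    """
    lo_default, hi_default = default
    bounds = {}
    for rxn, (direction, rate, sd) in profile.items():
        if direction not in ("uptake", "secretion"):
            raise ValueError(f"{rxn}: unknown direction {direction!r}")
        if rate is None:
            # Qualitative constraint: force net flux in the detected direction.
            bounds[rxn] = (eps, hi_default) if direction == "secretion" else (lo_default, -eps)
        else:
            # Quantitative constraint: measured rate +/- sd, signed by direction.
            mean = rate if direction == "secretion" else -rate
            spread = sd or 0.0
            bounds[rxn] = (mean - spread, mean + spread)
    return bounds


# Hypothetical profile for illustration only.
profile = {
    "EX_glc": ("uptake", 2.0, 0.5),
    "EX_lac": ("secretion", None, None),
}
print(exchange_bounds(profile))  # {'EX_glc': (-2.5, -1.5), 'EX_lac': (0.001, 1000.0)}
```

Keeping undetected exchanges open, rather than closing them, reflects the uncertainty about medium composition. Whether to close them is a modeling decision that depends on the data set.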
More recently, Gene Inactivation Moderated by Metabolism, Metabolomics and Expression (GIM3E) became available; it enforces a minimum turnover of detected metabolites [138]. As an example, Chang et al. (2010) used gene expression data and the GIMME algorithm to generate a kidney model. They set exchange constraints based on literature clues and on metabolomic data derived from the Human Metabolome Database (HMDB) [120]. An example of a multi-omics approach that includes metabolomic data is Cakir et al. (2006). The authors constrained a yeast model with a small set of metabolites. They then used the model to identify reporter reactions associated with metabolite levels that changed in response to environmental or genetic perturbations. Finally, they related the reporter reactions to transcriptomic data to infer different forms of regulation [139].

Metabolomic data have also been used to refine models during the model building process. Selvarasu et al. (2012) published a framework that integrates fed-batch culture and metabolomic data with in silico modeling, with the aim of improving the industrial production of recombinant therapeutics in Chinese hamster ovary (CHO) cells [140]. After reconstructing a metabolic CHO model from a mouse model, the authors used metabolomic data to improve it [140]. They simulated the experimental conditions by constraining the model with the uptake rates of nutrients (glucose, glutamine, amino acids, and oxygen) and the production or secretion rates of cell biomass, IgG, ammonia, lactate, and CO2. By combining experimental metabolite trends with flux data from the model, they gained insight into the metabolic pathways that limit growth in these cells [140]. The integration of metabolomic data has also been used to drive metabolic discovery.
Recon 1, together with urine metabolomic profiles and transcriptomic data, was used to predict novel putative endogenous substrates of the OAT1 transporter. The authors compared predictions from models constrained with metabolomic or transcriptomic data from wild-type and mOAT1 knock-out mice. Intermediates of the polyamine pathway were subsequently confirmed experimentally as putative substrates of mOAT1 [141]. Recon 1, FBA, and published metabolite uptake and secretion rates [137] were also used to support findings from LC-MS-based isotope tracer studies and a metabolic flux model. Both approaches identified oxidative phosphorylation as the major contributor to ATP production in cancer cells (84% on average across the NCI-60 cell lines) [142]. Taken together, metabolomic data have found diverse applications in many recent biomedical studies, and this diversity is likely to grow.

1.7.4 COBRA for biomedical applications and personalized health

COBRA models are increasingly applied to biomedical questions that extend far beyond cancer. During this PhD, a number of studies applied COBRA to personalized medicine. Jamshidi et al. (2011) used Recon 1 to analyze differences between the serum metabolome profiles of a Hereditary Hemorrhagic Telangiectasia (HHT) patient and non-HHT controls [143]. In that study, Recon 1 served as the whole-body metabolic network (including all organs), and the differences in plasma were interpreted as net metabolite changes (uptake and secretion) mediated by cells throughout the human body. The differences in the metabolic profiles were integrated by scaling the coefficients of a non-growth associated biomass objective differently for the HHT patient and the controls.
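The HHT study then compared flux span ratios derived from flux variability analysis (FVA). FVA computes, for each reaction, the minimal and maximal flux compatible with steady state and the imposed bounds. The span (max - min) under one condition can then be divided by the span under another. The following Python sketch uses a hypothetical toy network with invented "case" and "control" bounds. It omits the optimality constraint (a minimum objective value) that FVA is often combined with, and it is not the implementation used in [143].

```python
import numpy as np
from scipy.optimize import linprog


def fva(S, bounds):
    """Flux variability analysis: min and max of every flux over the feasible set."""
    n = S.shape[1]
    b_eq = np.zeros(S.shape[0])
    vmin, vmax = np.zeros(n), np.zeros(n)
    for j in range(n):
        c = np.zeros(n)
        c[j] = 1.0
        lo = linprog(c, A_eq=S, b_eq=b_eq, bounds=bounds, method="highs")
        hi = linprog(-c, A_eq=S, b_eq=b_eq, bounds=bounds, method="highs")
        if lo.status != 0 or hi.status != 0:
            raise RuntimeError(f"LP failed for reaction {j}")
        vmin[j], vmax[j] = lo.fun, -hi.fun
    return vmin, vmax


def flux_span_ratio(S, bounds_case, bounds_control, tol=1e-9):
    """Flux span (max - min) in the case divided by that in the control.

    Returns NaN where the control span is (numerically) zero.
    """
    lo_c, hi_c = fva(S, bounds_case)
    lo_k, hi_k = fva(S, bounds_control)
    span_case, span_ctrl = hi_c - lo_c, hi_k - lo_k
    ratio = np.full(S.shape[1], np.nan)
    np.divide(span_case, span_ctrl, out=ratio, where=span_ctrl > tol)
    return ratio


# Hypothetical toy network. Columns: EX_A (-> A), R1 (A -> B), R2 (A -> C), EX_C (C ->), BIO (B ->)
S = np.array([[1, -1, -1,  0,  0],
              [0,  1,  0,  0, -1],
              [0,  0,  1, -1,  0]], dtype=float)
control = [(0, 10), (0, 1000), (0, 1000), (0, 1000), (0, 1000)]
# Hypothetical "case": lower uptake of A and a minimum secretion of C.
case = [(0, 5), (0, 1000), (0, 1000), (2, 1000), (0, 1000)]
print(np.round(flux_span_ratio(S, case, control), 3))  # 0.3 for every reaction here
```

A ratio below 1 means the case has less flux flexibility than the control for that reaction. Note that FVA solves two LPs per reaction, so genome-scale runs are usually parallelized.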
Next, the authors used flux span ratios derived from flux variability analysis (FVA). They identified decreased energy production and increased flux potential in nitrogen handling and disposition pathways in the HHT patient, and linked these findings to the effects of the anti-VEGF drug bevacizumab (Avastin) [143]. After treatment, the metabolomic profiles of the HHT patient and the controls were more similar than before treatment [143]. Regarding the steady-state assumption, the authors argued that, because the plasma profiles were obtained after overnight fasting, the body would be at or close to a homeostatic state [143]. Other authors combined transcriptomic and proteomic data, uptake rates, and plasma and white adipose tissue lipid concentrations (metabolomic data) to generate and analyze a metabolic adipocyte model. They used the model to investigate metabolic alterations in adipocytes that could allow obese patients to be stratified [144, 145]. The genes differentially expressed between lean and obese males and females were used for model building and were also correlated with the predicted reporter metabolites. The predictions matched the transcriptional down-regulation of mitochondrial pathways in obese subjects. The authors therefore proposed increasing mitochondrial acetyl-CoA as a potential therapeutic target to reduce fat in these patients [144]. They also suggested plasma androsterone level as a biomarker for metabolic alterations in these patients [144]. Another study from the same group predicted new potential drugs (anti-metabolites) that target hepatocellular carcinoma (HCC) specifically while sparing normal cells [127]. Six HCC patient-specific metabolic models, a generic HCC model, and 83 normal tissue models were generated by integrating proteomic data with the human metabolic reactions database (HMR) 2.0 using tINIT [127, 146].
Among the predicted anti-metabolites was L-carnitine, which was shown to selectively inhibit the growth of HepG2 cells [127]. Although the first steps towards personalized health supported by COBRA and omics data integration have been taken, metabolomic data sets are still underrepresented. Metabolomic data are often integrated only as a few uptake and secretion constraints, or used in other ways (e.g., to define a non-growth associated objective or high-confidence reaction sets for model building). They are seldom the primary source for generating condition specific subnetworks; the study of Mo et al. is one exception [117].

1.7.5 Existing challenges

Methods that use metabolomic data for subnetwork generation are less well developed. Further approaches are needed to make better use of these data and to simplify the integration of multiple omics data sets. There is great potential for interpreting extracellular metabolomic profiles in the context of human GENREs, in particular for diagnostics and personalized health. However, integrating biofluid metabolomic profiles is difficult for two reasons: the actual cellular origin (within the human body) of the detected metabolites is uncertain, and only limited sets of metabolites are detected while the exact composition of the fluids remains unknown. One question, therefore, is whether this uncertainty can be overcome and whether these data can provide insight into metabolic mechanisms.

Table 1.2: Methods for network contextualization. For each method, the table lists a description, whether an objective function (OF) is required, the data and input format, the solver, examples of derived models, and the reference.

Binary. Flux through reactions associated with absent genes is constrained to zero. OF required: no. Data: transcriptomic; input: P/A calls based on a user-defined threshold. Example: metabolic behavior in S. cerevisiae batch cultures [118]. Reference: [118].

GIMME. Functional flux is defined through FBA, and active and suppressed reactions are defined based on the GPR associations. Inactive reactions are then removed, and removed reactions required for the predetermined functional flux are reinserted to produce a functional submodel. GIMME incorporates the idea of post-transcriptional regulation but relies on an arbitrarily chosen flux distribution (FBA). OF required: yes. Data: transcriptomic or proteomic; input: P/A calls; solver: LP. Examples: murine macrophage [119], human macrophage [21], brain cells [49], kidney [120]. Reference: [46].

GIMMEp. Creates models that each satisfy a proteome-based objective; these are combined into one final GIMMEp model. OF required: proteome-based OF. Data: transcriptomic and proteomic; input: P/A calls; solver: LP. Example: murine macrophage [119]. Reference: [119].

iMAT. Maximizes the number of enzymes whose flux activity is consistent with their measured expression level (high flux for highly expressed and low flux for lowly expressed enzymes), along with pathway length. Enzyme expression levels are treated as cues rather than fixed determinants of enzyme activity and associated flux. Post-transcriptional regulation is assumed to account for the difference between the measured expression level and the predicted flux. iMAT predicts tissue-specific metabolite exchanges. OF required: no. Data: transcriptomic or proteomic; input: high, medium, and low expression; solver: MILP. Examples: murine macrophage [119], human macrophage [21], brain cells [49], cardiomyocyte [121]. Reference: [47], [122].

E-flux. Predicts metabolic states by setting maximum flux constraints as a function of measured gene expression. Reactions associated with lowly expressed genes are tightly constrained, and those associated with highly expressed genes are loosely constrained. OF required: yes. Data: transcriptomic; input: gene expression levels; solver: LP. Example: M. tuberculosis bacterium [105]. Reference: [105].

PROM. Generates integrated metabolic-regulatory networks. Requires a metabolic network, a regulatory network, gene expression data from different conditions, and additional regulatory interactions. It uses probabilities estimated from the expression data across conditions to represent gene states and gene–transcription factor interactions. OF required: yes. Data: transcriptomic. Examples: E. coli and M. tuberculosis [123]. Reference: [123].

MADE. Matches most closely the genes that show the most statistically significant changes in expression level. It creates a series of models with high accuracy with respect to the direction of differential expression. OF required: yes. Data: transcriptomic; solver: MILP. Example: transition from fermentative to glycerol-based respiration in S. cerevisiae [124]. Reference: [124].

MBA. The functional output model comprises all user-defined high-probability reactions, the maximal number of medium-probability reactions, and the minimal number of additional reactions. This minimal but consistent output model is compiled based on confidence values assigned to each reaction. A reaction's confidence value is the frequency with which it appears in 1000 candidate models, each generated with a random pruning order. OF required: no. Data: literature-based knowledge, transcriptomic, proteomic, metabolomic, and phenotypic data; input: core sets of high- and medium-probability reactions. Examples: human heart [125], liver [103], cancer metabolic model [24, 77]. Reference: [103].

FASTCORE. Searches for a flux-consistent subnetwork that contains all user-defined core reactions and a minimal set of additional reactions. OF required: flux consistency. Input: a high-confidence core reaction set based on expression evidence; solver: LP. Examples: liver and macrophage [126]. Reference: [126].

INIT. Finds a subnetwork by maximizing the sum of evidence scores and yields a connected, functional model in which all included reactions should be able to carry flux. It also ensures that the output model can produce specified metabolites. OF required: flux consistency. Data: proteomic; input: arbitrary scores for high, medium, low, and absent proteins (color codes in the HPA); solver: MILP. Examples: 69 human cell types and 16 cancer types [99]. Reference: [99].

tINIT. Its predecessor INIT delivered a connected and consistent network; tINIT instead generates functional networks based on a user-defined, cell type specific set of metabolic functions (tasks). The algorithm defines the reaction set needed to perform the specified metabolic tasks. If the resulting model fails to perform a task in the test phase, gap-filling is applied to ensure that the output model is functional. The output model contains only irreversible reactions. Unlike in INIT, allowing net production of metabolites is optional. OF required: metabolic tasks. Data: tailored for HPA (proteomic) data, transcriptomic data, and observed metabolites; solver: MILP. Example: personalized HCC models [127]. Reference: [127].

mCADRE. Generates a subnetwork by removing reactions outside the high-confidence core reaction set, which is defined from gene expression data and connectivity clues. The flux capacity of the core reactions and defined metabolic functionality are preserved. OF required: flux capacity of core reactions and metabolic functions. Data: gene expression, literature-based knowledge, multi-omics data. Examples: 126 human tumor and normal tissue and cell types [128]. Reference: [128].

1.8 Preview of this thesis

The integration of transcriptomic and proteomic data with metabolic models is well established and frequently used in biomedical applications. Contextualization with metabolomic data, however, is less well explored. This thesis advances the integration of omics data by showing how both quantitative and semi-quantitative extracellular metabolomic data can be used to infer internal metabolic network states. It also carries out integrative analysis of multiple omics data sets (e.g., metabolomic and transcriptomic data).
The contextualization of a signaling network further shows that cell-type specific differences exist not only in metabolic networks but also in innate immune signaling. The chapters of this thesis are organized as follows:

• Chapter 1: This chapter describes the background on constraint-based modeling, state-of-the-art omics data integration, and biomedical applications. The text in chapter 1 is in part a reprint of a section that appears in Sahoo, S., Aurich, M.K., Jonsson, J.J., Thiele, I. (2014) Membrane transporters in a human genome-scale metabolic knowledgebase and their implications for disease. Front. Physiol., 5:91. I am a contributing author of this publication and the author of the part that forms the basis for the marked section of this chapter.
• Chapter 2: A large collection of cancer cell line specific models is generated from quantitative extracellular metabolomic profiles. The models are classified into distinct metabolic phenotypes based on their metabolic strategies, and their robustness towards environmental and genetic perturbations is explored. The text of chapter 2 is in full a reprint of the manuscript: Aurich, M.K., Fleming, R.M.T., Thiele, I. Metabolic heterogeneity and robustness among the NCI-60 cancer cell lines. Manuscript in preparation. I am the first author of this manuscript, which forms the basis for this chapter.
• Chapter 3: This chapter demonstrates the integrative analysis of semi-quantitative extracellular metabolomic profiles and transcriptomic data for two lymphoblastic leukemia cell lines. Interrogating the inferred internal metabolic networks through sampling analysis reveals differences in how the cell lines use them, and these differences are supported by experimental data. In addition, differentially expressed and alternatively spliced genes are found frequently at rate-limiting and commitment steps.
The text in chapter 3 is in full a reprint of the manuscript: Aurich, M.K., Paglia, P., Rolfsson, Ó., Hrafnsdóttir, S., Magnúsdóttir, M., Stefaniak, M.M., Palsson, B.Ø., Fleming, R.M.T., Thiele, I. (2014) Prediction of intracellular metabolic states from extracellular metabolomic data. Metabolomics, 1-17. I am the first author of this publication, which forms the basis for this chapter.
• Chapter 4: This chapter investigates cell-type specific differences in TLR signaling and the relevance of the network to human diseases. A gene set is identified, and GPRs are formulated for the proteins of the network. The chapter demonstrates contextualization of a TLR signaling network into a cell-type specific TLR signaling network, and further into a condition specific, LPS-activated TLR signaling network. Finally, predicting the energy costs of the respective input-output pathways provides one link between the signaling and metabolic networks. The text of chapter 4 is in part a reprint of the material as it appears in Aurich, M.K. and Thiele, I. (2012) Contextualization Procedure and Modeling of Monocyte Specific TLR Signaling. PLoS ONE, 7, e49978. I am the first author of this publication, which forms the basis for this chapter.
• Chapter 5: Conclusions and future directions

2 Metabolic heterogeneity and robustness among the NCI-60 cancer cell lines

The role of metabolic alterations in human diseases is increasingly recognized, and methods are needed that can quickly infer intracellular metabolic states from extracellular metabolomic profiles. Here, we use a novel computational method to generate a set of 120 cancer cell line models from quantitative extracellular metabolomic profiles. We explore the metabolic heterogeneity inherent to these cancer cell line models. We provide, for the first time, a systematic assessment of the metabolic strategies that cancer cells may use to generate energy and cofactors.
We observe different oxotypes, which describe distinct ranges of feasible oxygen uptake rates. The strength of the presented approach is that it captures such distinct phenotypic behavior from extracellular samples alone, despite the uncertainty in medium composition introduced by the serum. Similar approaches applied to patient data could be a milestone for clinical applications.

2.1 Introduction

The incomplete oxidation of glucose to lactate under normoxic conditions [64] is referred to as aerobic glycolysis. It has been a main focus of cancer research in recent decades [147]. However, this long-held view is increasingly being replaced by the notion that cancer cells employ heterogeneous metabolic strategies beyond aerobic glycolysis [69, 71, 148, 149]. Many cancer cells generate substantial amounts of their energy through mitochondrial oxidative phosphorylation [142, 147, 150]. Cancer cells also use additional fuels, such as glutamine and fatty acids, to support proliferation [71, 151]. These carbon sources can in turn be used in different ways; e.g., different parts of the tricarboxylic acid (TCA) cycle can be employed for glutaminolysis [69, 150, 152, 153]. Reductive carboxylation involves only two TCA cycle reactions running in reverse and produces no energy, whereas forward glutaminolysis does yield energy [69, 150, 153]. Besides using different metabolic strategies, cancer cells differ in their robustness towards environmental changes, e.g., in nutrient supply or oxygenation [154, 155, 156]. Despite these evident phenotypic differences, no comprehensive assessment exists of the metabolic heterogeneity among cancer cell lines or of their flexibility towards environmental changes.

Metabolic models can be developed using constraint-based modeling and analysis (COBRA). They are comprehensive knowledge-bases of the metabolic network of an organism [2, 9].
COBRA relies on physico-chemical principles and assumes that the modeled system is at steady state [2]. Constraints (e.g., limits on metabolite uptake and secretion) can be added to increase the precision of model predictions by eliminating network states that violate them [116]. A human reconstruction is readily available [45, 53], along with numerous analytical methods for investigating the metabolic differences that arise from the imposed constraints [15, 16]. Metabolomic data from body fluids and cell culture supernatants have previously been integrated into metabolic reconstructions [117, 142, 144]. One remaining challenge, however, is the handling of serum, or of data from cells grown with serum, because its exact composition is unknown. As a result, the model cannot be adequately constrained. Despite these difficulties, approaches that rapidly classify metabolic strategies from metabolomic profiles could have a broad impact on both researchers and clinicians. Recently, liquid chromatography-tandem mass spectrometry (LC-MS/MS) was used to determine the metabolites consumed and released by the NCI-60 cancer cell lines [137]. By combining these metabolomic profiles with doubling times and transcriptomic data, the authors associated rapid proliferation with cellular glycine requirements [137]. However, the intracellular pathways that gave rise to the distinct metabolomic profiles remained largely a black box. Because of its comprehensiveness [137], this data set was particularly well suited for defining metabolic differences among cancer cells at large scale. Here, we developed and applied a novel method, termed minExCard, to infer internal metabolic states from extracellular metabolomic data in the context of a metabolic model while dealing with the uncertainty in serum composition.
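The core idea behind minExCard, as described in this chapter, is to find a minimal set of additional exchange reactions that lets a model constrained by the measured profile sustain growth. This can be phrased as a minimal-cardinality problem. The following Python sketch formulates it as a mixed-integer linear program with SciPy's `milp` (SciPy 1.9 or later). It uses one binary indicator per candidate exchange and big-M style linking bounds. The formulation, the function name `min_exchange_set`, and the toy network are assumptions made for illustration; this is not the minExCard implementation used in this work.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def min_exchange_set(S, lb, ub, candidates, biomass, min_growth):
    """Return the smallest subset of candidate exchanges that permits growth.

    Variables are the fluxes v (one per reaction) followed by one binary y_i
    per candidate exchange. If y_i = 0, that candidate's flux is forced to 0.
    Assumes finite bounds with lb <= 0 <= ub for every candidate.
    """
    m, n = S.shape
    k = len(candidates)
    c = np.concatenate([np.zeros(n), np.ones(k)])  # minimize sum(y)
    integrality = np.concatenate([np.zeros(n, dtype=int), np.ones(k, dtype=int)])
    lower = np.concatenate([lb, np.zeros(k)]).astype(float)
    upper = np.concatenate([ub, np.ones(k)]).astype(float)
    lower[biomass] = max(lower[biomass], min_growth)  # demand growth

    # Steady state: S v = 0 (the indicator columns do not enter mass balance).
    steady_state = LinearConstraint(np.hstack([S, np.zeros((m, k))]), 0.0, 0.0)

    # Linking rows: v_j - ub_j * y_i <= 0 and v_j - lb_j * y_i >= 0.
    A = np.zeros((2 * k, n + k))
    for i, j in enumerate(candidates):
        A[2 * i, j], A[2 * i, n + i] = 1.0, -ub[j]
        A[2 * i + 1, j], A[2 * i + 1, n + i] = 1.0, -lb[j]
    linking = LinearConstraint(A, np.tile([-np.inf, 0.0], k), np.tile([0.0, np.inf], k))

    res = milp(c, constraints=[steady_state, linking],
               integrality=integrality, bounds=Bounds(lower, upper))
    if res.status != 0:
        raise RuntimeError(res.message)
    return [j for i, j in enumerate(candidates) if res.x[n + i] > 0.5]


# Hypothetical toy network. Metabolites: A, B, D, E, F. Reactions:
# 0 EX_A (A <=>, measured uptake), 1 R1 (A -> B), 2 BIO (B + D ->), 3 EX_D (D <=>),
# 4 EX_E (E <=>), 5 EX_F (F <=>), 6 R2 (E + F -> D), 7 EX_B (B <=>)
S = np.array([
    [-1, -1,  0,  0,  0,  0,  0,  0],  # A
    [ 0,  1, -1,  0,  0,  0,  0, -1],  # B
    [ 0,  0, -1, -1,  0,  0,  1,  0],  # D
    [ 0,  0,  0,  0, -1,  0, -1,  0],  # E
    [ 0,  0,  0,  0,  0, -1, -1,  0],  # F
], dtype=float)
lb = np.array([-10, 0, 0, -1000, -1000, -1000, 0, -1000], dtype=float)
ub = np.array([0, 1000, 1000, 1000, 1000, 1000, 1000, 1000], dtype=float)
candidates = [3, 4, 5, 7]  # exchanges not covered by the hypothetical measured profile
print(min_exchange_set(S, lb, ub, candidates, biomass=2, min_growth=1.0))  # [3]
```

Growth needs a source of D. Opening EX_D alone costs one exchange, whereas the route through R2 needs both EX_E and EX_F, so the minimal-cardinality solution selects only EX_D. The big-M bounds rely on finite exchange bounds, which is why the sketch assumes them.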
We applied this method to generate 120 cancer cell line models from extracellular metabolomic data. The models showed high metabolic heterogeneity and distinct levels of robustness when perturbed in silico. This work demonstrates how combining extracellular metabolomic data with metabolic modeling can provide unprecedented insight into the different metabolic strategies of cancer cell lines.

Figure 2.1: Metabolic models provide a context for the analysis of metabolomic data. a. 1. The refinement step denotes the addition of transport and exchange reactions that allow the uptake and secretion of metabolites detected in the metabolomic profiles of the NCI-60 cell lines [137]. 2. The cancer cell line models were generated using minExCard. In total, 120 cancer models (NCI-60 times 2) are generated from published metabolomic data and the extended metabolic model. 3. The models are analyzed using a set of computational methods. Based on the computational results, the models are divided into different metabolic phenotypes, and drug targets are predicted for each individual model. 4. The approach is applicable to a variety of biomedical applications. Analysis of patient-specific omics data could be used to stratify disease phenotypes and to predict personalized disease intervention strategies. b. Differences in the number of reactions, metabolites, and genes across the large set of models. c. Distribution of the number of reactions, metabolites, genes, and exchanges among the 120 cell line models.

2.2 Results

2.2.1 Generation of heterogeneous cancer cell line models

Published metabolomic profiles of the uptake and secretion of metabolites from and into the culture medium were integrated into the metabolic model (Fig. 2.1A) [137]. The metabolomic data consisted of two samples per cell line, and considerable variations between samples (Supplementary Fig.
2.6) let us to generate one model for each sample rather than averaging the data for each cell line (replicate samples will be referred to as ’-2’). To generate a cancer cell model, the starting model was constrained according to the quantitative metabolite uptakes and secretions measured for the respective sample. Next, a minimal set of additional exchange reactions needed to sustain a growth phenotype was identified based on the model structure, using minExCard. All other metabolite exchanges and internal reactions no longer used by the model were removed, giving rise to the individual cancer cell line model (Fig. 2.1A). The generated 120 cancer cell line models differed with respect to completeness of subsystems, and the numbers of reactions, metabolites, and genes (Fig. 2.1C-D). Variations in the metabolic model content involved all major metabolite classes (Fig. 2.1B-E). Further, large variations existed with respect to maximal growth rates achieved by the models. Many exceeded by far the growth rates expected from any human cell. However, only 15 models could not grow when constrained according to experimental growth rates (Supplementary material). ACHN-2 and UACC-257 were limited to experimental growth rates, only by the imposed metabolic profiles (Supplementary material). Taken together, the diversity of the models suggested that they were a good starting point to investigate metabolic heterogeneity among the cell lines. 2.2.2 Distinction of metabolic phenotypes Metabolic strategies yield different amounts of ATP, e.g., full oxidation of glucose to CO2 yields 30-32 ATP and aerobic glycolysis yields two ATP [157, 158]. Herein, we used the ATP yield as an estimator for distinct pathway utilization by the models. The entire range of ATP yields across the models was large (Fig. 2.2A, ATP yield: min = 2.93, max = 55.3) and exceeded the theoretical measure for aerobic glycolysis. 
An exact fit with the theoretical ATP yields was not expected, since the models could use additional substrates and reactions to produce ATP (Supplementary Fig. 2.6). Rank-ordered ATP yields described a fairly continuous increase, occasionally interrupted by gaps between groups of models (Fig. 2.2A). One gap between groups of models was associated with a switch in the major ATP-producing reaction, identified through flux splits. This analysis estimates the contribution of each model reaction producing ATP to the total amount of ATP produced [159]. Models with an ATP yield < 4.2 ('glycolytic' models, n = 37, Fig. 2.2A) produced the highest fraction of ATP through phosphoglycerate kinase (PGK). In contrast, models with an ATP yield > 7.36 produced ATP mainly through ATP synthase ('OxPhos' models, n = 83, Fig. 2.2A). Thus, ATP yield and ATP production strategy divided the models into glycolytic and OxPhos phenotypes. Considering differences in the utilization of the TCA cycle, i.e., ATP production by succinate-CoA ligase, allowed the further identification of two OxPhos subtypes (Fig. 2.2B). This division was not obvious from the ATP yield (Supplementary Fig. 2.7).

Figure 2.2: Distinction of the models based on energy and cofactor production. a. Rank-ordered ATP yields achieved by the models described a gradual increase rather than clusters around the theoretical ATP yields of different metabolic strategies. The spread of ATP yields highlights the metabolic heterogeneity among the 120 models, which potentially use a mixture of pathways and metabolic fuels for ATP production. Two major strategies for ATP production could be distinguished based on a jump in ATP yield. The mechanistic difference, resolved from flux splits enumerating the contributions of all ATP-producing reactions to the total ATP production in each model, consisted in a higher contribution of either phosphoglycerate kinase (green squares) or ATP synthase (red squares). b. Plotting the contributions of phosphoglycerate kinase, ATP synthase, and succinate-CoA ligase allowed the additional distinction of two OxPhos subtypes. c. An even more fine-grained division of the OxPhos models was achieved by considering the production strategies for NADPH, NADH, and FADH2. Two types of glycolytic models could be distinguished by their FADH2 production routes, and six OxPhos subtypes were distinguished by their main routes of NADH and NADPH production. The table lists, for each phenotype (I-VIII), the reactions contributing most to ATP, NADH, NADPH, and FADH2 production.

Besides ATP, cells need cofactors to support proliferation. Distinct strategies used by the models to produce different cofactors, again identified through flux splits, allowed the division of the glycolytic models into two subtypes (Supplementary Tab. 2.2). The two previously identified OxPhos subtypes were subdivided into six subtypes altogether (Fig. 2.2C). The glycolytic subtypes differed only in the major FADH2-producing reaction. Two OxPhos subtypes were associated with a high TCA cycle contribution to ATP production, which coincided with high utilization of cytosolic malic enzyme as the leading NADPH source. The four remaining OxPhos subtypes predominantly used either isocitrate dehydrogenase (IDH) or dihydroceramide desaturase for NADPH production. Glyceraldehyde-3-phosphate dehydrogenase was the major NADH producer in OxPhos models with relatively higher glycolysis-based ATP production, whereas 2-oxoglutarate dehydrogenase was favored by models with a higher ATP synthase contribution (Fig. 2.2C). Thus, the predicted cofactor production strategies allowed an even more fine-grained model classification.

2.2.3 Robustness towards genetic and environmental perturbations

So far, we have characterized the models based on the imposed constraints and the distinct use of central metabolic pathways.
In the following, we predict the behavior of each model under environmental and genetic perturbations. The course of transformation events shapes the metabolic network and might later influence the robustness of cancer cells towards environmental changes [160]. Varying glucose uptake, glutamine uptake, and lactate secretion, each together with oxygen uptake (phenotypic phase plane analysis, PhPP [161]), led to two major observations. First, the solution space, which contains all possible network states and was defined through variation of oxygen uptake, divided the models into three groups: (1) glycolytic models could only grow at low oxygen uptake rates, while the OxPhos models comprised (2) models growing only at high oxygen uptake rates and (3) models that were indifferent to oxygen uptake rates (Fig. 2.3). The latter two groups separated the OxPhos models differently from the previous analysis. Second, the size and form of the solution spaces varied across models (Fig. 2.4). Using the form and size of the solution spaces as visual cues (Supplementary Fig. 2.8), we divided the models into six distinct clusters (Fig. 2.4, Supplementary material). Thus, the robustness of the models towards environmental changes yielded yet another division of our models.

Figure 2.3: Distinct phenotypes with regard to oxygen requirements. This distinction between the OxPhos models (blue) differed from the phenotypic classification based on energy and cofactor production strategies.

In silico gene knock-outs can predict novel drug targets [77]. Herein, we used single gene deletion to investigate the robustness of the models to genetic perturbations. Constraining the enzyme function associated with 1279 genes had no effect on growth capability. Another 34 genes were essential in all models and could constitute metabolic targets for all previously defined phenotypes.
Additionally, 11 genes were present only in a subset of the models, yet were essential in all models containing them. The number of essential genes varied across models (min = 92, max = 182, Fig. 2.5A) and was not associated with any phenotype (Supplementary Fig. 2.9). The remaining 228 essential genes affected only a subset of the models that contained the gene, either terminating growth completely or reducing it partially (growth < 95%). Some genes appeared in subsets of models and affected yet another subset of those, either terminating (n = 4) or reducing growth (n = 21). Finally, 203 genes were present in all models but terminated or reduced growth in only 1-119 of them. Surprisingly, many of the essential genes that affected only a few models (n < 20 models) were associated with central metabolism, e.g., the TCA cycle (Fig. 2.5B-C).

Figure 2.4: Six model clusters were distinguished according to the models' robustness towards environmental changes. Variation of glucose, glutamine, lactate, and oxygen allowed an improved discrimination of the OxPhos models. In contrast, PhPP did not yield a sensible distinction between glycolytic models. Heatmaps display PhPP results for one model of each cluster (and subcluster). Lines in the heatmaps indicate the constraints imposed on the respective exemplary model.

The rare incidence of these essential genes hints at high dependencies of these models on central metabolic pathways. To further investigate these dependencies, we plotted the rare essential genes ordered according to the PhPP clusters (Fig. 2.4). This revealed an accumulation of rare essential genes in models of clusters 2 and 4 (Fig. 2.5B). The models of cluster 4C (SK-MEL-28, SK-MEL-28-2, and SK-MEL-5) were characterized by particularly small solution spaces (Fig. 2.5B).
Besides revealing large variations in the number of essential genes, the combination of gene deletion and PhPP revealed a connection between a small solution space and a high incidence of rare essential genes. Cancer cells use the TCA cycle in different ways [69, 150]. Accordingly, diverse KOs, including the rare KOs ACO2 and IDH2, were associated with this pathway (Fig. 2.5A,C).

Figure 2.5: The models have different sets of essential genes. a. Essential genes for each cancer model. b. Essential genes affecting at most 20 models, either reducing or terminating growth in subsets of the models that include the gene. For the heatmap, we combined genes with the same effects and the same pathways (e.g., 42 genes of NADH dehydrogenase) and displayed only the rare essential genes (n < 20 models). The appearance of rare essential genes associated with central metabolic pathways, e.g., the TCA cycle, characterized clusters 2 and 4C (phase plane analysis), which had small phenotypic spaces. Models of cluster 2 were selectively affected by, e.g., ALDH1L1, NADH dehydrogenase genes, and SLC25A19 KO. c. The diverse ways in which cancer cells use the TCA cycle were reflected in the variety of essential genes associated with the TCA cycle, which included rare essential genes associated with the reactions mediating mitochondrial reductive carboxylation. IDH and aconitase terminated growth in four models only. These models, including both SK-MEL-28 models, relied on reductive carboxylation.

Interestingly, the reactions associated with the rare KOs carried out reductive carboxylation. These two genes terminated growth in four models (SK-MEL-28, SK-MEL-28-2, MALME-3-2, and BT-549) and reduced growth in 14 and 11 additional models. Flux variability analysis (FVA) revealed that these models had to operate reductive carboxylation [162], whereas this pathway remained optional for the other models, even when constrained to experimental growth rates.
Sampling analysis of the SK-MEL-28 models further confirmed mandatory reductive carboxylation (Supplementary Tab. 2.3, [15, 163]). In agreement with the observed increase in reductive carboxylation under hypoxic conditions [69], reducing the oxygen uptake rate (lb = ub = -100) rendered 14 additional models dependent on reductive carboxylation. Fourteen models, including the four reductive carboxylation models, belonged to PhPP cluster 4A-B. The remainder belonged to cluster 1B, which is characterized by a heavily constricted solution space at low oxygen uptake rates compared to, e.g., cluster 4C models (Fig. 2.5B). Our models were therefore not only able to predict reductive carboxylation but also reproduced the connection between low oxygen and reductive carboxylation in cancer cell lines. Phosphoglycerate dehydrogenase (PHGDH) was another KO shared among the four models obliged to perform reductive carboxylation. Interestingly, SK-MEL-28 and MALME-3M had previously been associated with amplification of PHGDH due to a 1p12 gain [148, 164]. Cells with high PHGDH activity produce up to 50% of α-ketoglutarate through this pathway, and PHGDH silencing decreases proliferation through decreased α-ketoglutarate supply [165]. The correct prediction of the dependency of SK-MEL-28 and MALME-3M on PHGDH provides additional support for the presented approach and for the predicted dependency of SK-MEL-28 on reductive carboxylation.

2.3 Discussion

Extracellular metabolic profiles can be interpreted in the context of the metabolic model [117, 142, 144]. To date, the exploitation of these methods for clinical applications has been hampered by the uncertainty connected to serum composition. Herein, we present minExCard, a novel method that predicts a minimal set of additional metabolite exchanges absent from the measured metabolomic uptake and secretion profiles. This method enabled the generation of condition-specific metabolic models from individual extracellular metabolomic profiles.
The combined use of various computational methods allowed the large-scale assessment of metabolic heterogeneity among the models of the NCI-60 cell lines. The cancer cell line models had different 'oxotypes', which separated models that changed pathway utilization with oxygen uptake from others that were restricted to low or high oxygen uptake rates regardless of oxygen availability. The distinct robustness of cell lines towards environmental and genetic perturbations predicted herein could have important implications for conclusions drawn from experiments performed under normoxic conditions. The context of the metabolic models allows novel biological insights that could not have been obtained from data analysis alone. The integration of metabolomic samples can be applied to many cellular systems and constitutes an important step towards the application of metabolic models in personalized medicine.

Our novel model generation method constitutes an important step towards the clinical application of metabolic models. In contrast to previous studies [117, 142, 144], we did not only include quantitative constraints but also used the context of the metabolic models to predict a minimal set of hypothetical undetected exchanges. Although the added metabolite exchanges could plausibly constitute valid exchanges (Supplementary Tab. 2.5, Supplementary material), it should be noted that the added exchanges may differ depending on the chosen objective function. Herein, we refrained from experimental validation of the predicted additional exchanges, since we used published data and small differences in culture conditions can have profound effects on cellular behavior [154]. The discrepancies between duplicate samples in the published data set (Supplementary Fig. 2.6) illustrate how vulnerable the phenotype might be. Then again, such heterogeneity among clonal cell lines is a known phenomenon.
Noise in gene and protein expression has been connected to structural and behavioral differences [166, 167]. Identifying these 'less determined' exchanges could be an interesting follow-up question, since the discrepancies in the metabolic profiles frequently led to different classifications of the two replicate cell line models, and main conclusions could not be based on such variation. The combination of established network interrogation methods allowed the distinction of glycolytic and OxPhos phenotypes. The predicted contributions of glycolysis and oxidative phosphorylation to ATP production were generally supported by the literature (Supplementary Tab. 2) [147, 168, 169]. The predicted contribution of ATP synthase in MCF-7 (68% and 62%) was lower than the literature reports of > 80% [147, 168], potentially due to condition-specific differences or to our 20% allowance around the experimentally defined metabolite uptake and secretion rates. Visual classification revealed a more gradual transition in the size and form of the solution spaces. Together, this corresponds to the observation of glycolytic and OxPhos phenotypes of varying specificity among cancer cell lines [150, 160]. Although the utilization of glycolysis, the TCA cycle, and the ETC broadly explained the differences in the data, we did observe outliers (e.g., Fig. 2.4, cluster 1B). This indicates that the heterogeneity among the models is even greater and depends on more factors than the ones we investigated, and it argues for the interrogation of pathway and fuel utilization beyond commonly monitored pathways [69, 71, 148, 149, 151]. However, pathway utilization was not the only determinant of phenotypic differences, since the dependency of the cluster 4 models was not explained by pathway utilization but coincided with susceptibility to rare KOs (Fig. 2.5B). Both SK-MEL-28 models had KOs in the reactions associated with reductive carboxylation (Fig. 2.5B-C).
How this reverse-directed flux might be compensated by the opposing flux in the cytosolic analog remains to be investigated; however, it has been indicated that such a combination of fluxes might contribute to the transfer of NADPH between compartments [150]. In contrast to previous studies, in which reductive carboxylation was found to be connected to cytosolic IDH1 but not to mitochondrial IDH2, the mandatory reductive carboxylation predicted herein was connected to IDH2. Nevertheless, it has not been excluded that IDH2 may promote reductive carboxylation in a tissue- or condition-dependent manner [69]. Another interesting observation was that the constraints based on the experimental uptake and secretion rates were cytotoxic in a subset of models. These models had to dedicate resources to dealing with excess nutrients, which became obvious as a reduction in growth rates (Fig. 2.4, e.g., cluster 4B). One important factor that could drive excessive nutrient uptake is non-physiological experimental conditions, e.g., excessive nutrient supply. Glucose uptake might surpass cellular requirements if the receptors are maximally stimulated [170]. Additionally, nutrient uptake is decoupled from growth factor signals in cancer cells [171]. Hypoxia is believed to drive transformation [150], and tumor cells are exposed to temporal fluctuations in oxygenation [156]. Such environmental changes necessitate metabolic flexibility, which varied among the models (Fig. 2.3). High glycolytic rates are often connected to low oxygen consumption in cancer cells [150]. Correspondingly, all glycolytic models were limited to low oxygen uptake rates (Fig. 2.3). It has been noted that 'physoxia', which differs between tissues, is closer to experimental 'hypoxia' than to the usual 'normoxic' experimental conditions, which can have important implications for conclusions drawn from experiments conducted in 'normoxia' [172].
This applies particularly to cells that are not limited to low oxygen uptake, as illustrated by our predicted group of oxygen-indifferent models and by the fact that limiting oxygen induced reductive carboxylation in additional models (Fig. 2.3). In conclusion, our study furthers the interpretation of extracellular metabolomic profiles in the context of metabolic models and provides biological insights into the metabolic heterogeneity among the NCI-60 cancer cell lines. Moreover, it emphasizes the importance of oxygenation conditions for the behavior of cancer cell lines. The approach carried out herein is applicable to various cellular systems and holds great potential for personalized health, e.g., to predict the effectiveness of drugs with metabolic targets on cancer cells or on any cells affected by metabolic diseases.

2.4 Material and Methods

The starting model

The genome-scale metabolic reconstruction Recon 2 covers a total of 1789 genes, 7440 reactions, and 2626 unique metabolites distributed over eight cellular compartments. Its predictive capability has been demonstrated, e.g., through the mapping of inborn errors of metabolism and of different omics data sets [53]. The starting model used herein constitutes a subset of Recon 2 and is the same as that used in a previous study [173]. Infinite bounds were replaced by lb = -2000 and ub = 2000, and all exchange reactions in the model were initially opened. Subsequently, constraints were set on the exchange reactions of ions (lb = -100), vitamins (lb = -1), essential amino acids (lb = -10), and compounds such as water or protons (lb = -100). Oxygen uptake was constrained to lb = -1000 and ub = 0. This range was defined based on the reported oxygen uptake rate of a cancer cell line (2.85*10^-6 ml O2/10^5 cells/min = 646.013 fmol/cell/hr [174]). Additionally, the lower bounds of the superoxide anion and hydrogen peroxide exchanges were set to zero to prevent the generation of models that did not require oxygen uptake.
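The bound settings of the starting model can be summarized as a small configuration sketch. This is a hypothetical illustration, not the original COBRA toolbox code: the category labels and the exchange IDs in the usage line are assumptions (the IDs merely imitate Recon naming), and only the numeric bounds are taken from the text (fluxes in fmol/cell/hr).

```python
# Hypothetical sketch of the starting-model exchange bounds described above.
# Category labels are illustrative assumptions; the numbers come from the text.

INF = 2000.0  # finite value used in place of infinite bounds

# Lower bounds on exchange reactions, by (assumed) metabolite class
CLASS_LOWER_BOUND = {
    "ion": -100.0,
    "vitamin": -1.0,
    "essential_amino_acid": -10.0,
    "water_or_proton": -100.0,
}


def starting_exchange_bounds(exchanges):
    """Map {exchange_id: category} to {exchange_id: (lb, ub)}."""
    bounds = {}
    for rxn_id, category in exchanges.items():
        lb, ub = -INF, INF  # every exchange is initially open
        if category in CLASS_LOWER_BOUND:
            lb = CLASS_LOWER_BOUND[category]
        elif category == "oxygen":
            lb, ub = -1000.0, 0.0  # uptake only
        elif category == "ros":
            lb = 0.0  # superoxide / hydrogen peroxide cannot be taken up
        bounds[rxn_id] = (lb, ub)
    return bounds


# Usage with illustrative, Recon-style exchange IDs
print(starting_exchange_bounds({"EX_o2(e)": "oxygen", "EX_h2o2(e)": "ros"}))
```

Exchanges that fall into no listed class keep the fully open range of -2000 to 2000, mirroring the statement that all exchanges were opened before the class-specific limits were applied.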
The biomass reaction is usually expressed in units of mmol/gDW/hr. Herein, however, the mapped metabolite uptake and secretion profiles were provided in fmol/cell/hr [137]. We assumed a unitary cell dry weight of 1e-12 g, which is in the range of the dry weight (3.645e-12 g) we calculated for lymphocytes in an earlier study [173]. There, the dry weight had been inferred from the dry mass (range 35-60 pg [175]) and cellular volume (4000 µm³ [110]) of the human osteosarcoma cell line U2OS, which we related to the cell volume of lymphocytes (243 µm³) [176]: 4000/243 = 16.46 and 60 pg/16.46 = 3.645 pg (3.645e-12 g) [173]. Since 1 mmol/gDW = 1e+12 fmol/1e+12 cells, no scaling of the biomass reaction was necessary. The lb of the biomass objective function was fixed to a minimal value of 0.008 to match the lb defined for the slowest-growing cell line in the data set (HOP-92, doubling time 88 hrs) [177], ensuring that model building resulted in functional models with non-zero growth.

Constraint-based modeling

We used flux balance analysis (FBA, [11]) to solve the following problem:

$$
\max_{v} \; Z = c^{T} v \quad \text{s.t.} \quad S \cdot v = 0, \qquad lb \le v \le ub \qquad (2.1)
$$

with lb ≤ ub, where S is the stoichiometric matrix of m metabolites and n reactions as defined by the metabolic reconstruction, and v is the flux vector of length n. Z is the objective function, and c is the vector of length n containing the weights with which each reaction contributes to the objective function. The lower bound lb on a reaction flux v_i is negative for reversible reactions and zero or greater for irreversible reactions. The upper bound ub is greater than zero for reversible or forward reactions. The bounds on exchange reactions, which supply or remove metabolites from the model, are interpreted as follows: if lb < 0, the metabolite can be taken up; if lb ≥ 0, it cannot be taken up and can only be secreted (lb > 0 enforces secretion). FBA solutions are inherently degenerate.
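Two pieces of bookkeeping from the starting-model and FBA descriptions can be checked numerically: that, at an assumed 1e-12 g dry weight per cell, 1 fmol/cell/hr equals 1 mmol/gDW/hr (together with the lymphocyte dry-weight estimate), and how exchange bounds read as uptake or secretion (negative flux = uptake). A minimal sketch; the function names and the finer direction labels are illustrative assumptions, not COBRA toolbox functions.

```python
# Unit bookkeeping and exchange-bound reading; all names are illustrative.

CELL_DRY_WEIGHT_G = 1e-12  # assumed dry weight per cell, as stated in the text


def fmol_per_cell_to_mmol_per_gdw(flux, cell_dry_weight_g=CELL_DRY_WEIGHT_G):
    """Convert a flux in fmol/cell/hr to mmol/gDW/hr."""
    mol_per_cell = flux * 1e-15                      # fmol -> mol
    return mol_per_cell / cell_dry_weight_g * 1e3    # per gram, mol -> mmol


def lymphocyte_dry_weight_g(u2os_dry_mass_pg=60.0, u2os_volume=4000.0,
                            lymphocyte_volume=243.0):
    """Scale the U2OS dry mass by the lymphocyte/U2OS volume ratio."""
    volume_ratio = u2os_volume / lymphocyte_volume   # ~16.46
    return u2os_dry_mass_pg / volume_ratio * 1e-12   # pg -> g


def exchange_direction(lb, ub):
    """Read an exchange reaction's bounds (uptake flux is negative)."""
    if lb < 0 and ub > 0:
        return "uptake or secretion"
    if lb < 0:
        return "uptake only"
    if lb > 0:
        return "forced secretion"
    return "secretion only" if ub > 0 else "blocked"


print(fmol_per_cell_to_mmol_per_gdw(1.0), lymphocyte_dry_weight_g())
print(exchange_direction(-1000.0, 0.0))  # the oxygen exchange of the starting model
```

The conversion factor comes out as 1 (up to floating-point rounding), which is the arithmetic behind the statement that no scaling of the biomass reaction was necessary.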
Therefore, we minimized the Euclidean norm of the flux vector while maximizing the stated objective function. This ensures that the computed flux vector is unique and assumes that the most likely solution is the one that minimizes the sum of squared reaction fluxes, $\min \sum_i v_i^2$ [15].

Flux variability analysis (FVA) calculates the minimal and maximal flux through each reaction in the model by performing FBA. This analysis provides insight into the robustness and redundancy of the metabolic network [162]. Herein, FVA was used to define the flux span of the reductive carboxylation reactions.

Sampling analysis does not depend on the definition of an objective function and has previously been applied to investigate differences between metabolic networks [117, 163]. Herein, the ACHRsampler implemented in the COBRA toolbox [15] was used to investigate the feasible steady-state flux space of the SK-MEL-28 cell line models under the given set of constraints. After a set of random points, i.e., warm-up points (n = 10,000), is calculated, sampling points are collected by choosing a random direction and a random step length from the calculated center while remaining within the model's solution space defined by the constraints. In total, we generated 500,000 sampling points (nFiles = 100, pointsPerFile = 5,000), with 2,500 steps between two collected sampling points (stepsPerPoint = 2,500), to better support mixing of the sampling points throughout the solution space. The set of sampling points approximates the probability distribution of the flux through each single reaction in the network.

Data integration and model building

The metabolite consumption and release profiles of 140 metabolites comprised two samples for each of the 60 NCI-60 cell lines (120 samples) [137]. From the entire set of detected metabolites, we used only the calibrated (quantitative) uptake and secretion fluxes (115 metabolites), which were provided in fmol/cell/hr.
Metabolite identifiers in the data were mapped to the metabolite abbreviations in the starting model. The metabolite aminoisobutyrate was not part of the starting model and was excluded. Based on the metabolite abbreviations, we identified the existing metabolite exchange reactions. If no exchange reaction existed in the model but the metabolite itself was part of the model, a new exchange reaction was added. In addition to exchange reactions, transport reactions must be present to allow the transport of metabolites between the extracellular space and the cytosol; they therefore had to be added for all metabolites for which we added exchange reactions. These transport reactions were identified from the literature, and we added a diffusion reaction if no transporter could be identified for a metabolite. In total, the additions made to the model based on the metabolomic data comprised 43 transport and 36 exchange reactions (Supplementary Tab. 2.7). The presence of exchange and transport reactions does not ensure that a metabolite can be consumed or secreted by the model, since the anabolic and/or catabolic pathways might be absent or even still unknown [53, 58]. To identify the subset of metabolites the model could consume and secrete, we performed FBA while enforcing a small uptake (ub = -0.00001) or secretion (lb = 0.00001) for all mapped metabolite exchanges. All metabolites that could not be consumed (15) or secreted (15) by the model were discarded (Supplementary Tab. 2.6). Identifying metabolites that are not part of the metabolic reconstruction is common, and pathways for these metabolites need to be added in future releases of the generic model [53, 173]. If the generic model could take up a metabolite but not secrete it, only the secretions of that metabolite were discarded from the metabolic profiles, while its uptakes were retained, and vice versa.
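The discard rule at the end of this paragraph is direction-specific. A minimal sketch, assuming a profile is stored as {metabolite: flux} with negative values for uptake (the exchange sign convention above) and that the sets of metabolites passing the forced-uptake and forced-secretion FBA checks are already known; the function name and the metabolite abbreviations in the usage line are hypothetical.

```python
def filter_profile(profile, can_consume, can_secrete):
    """Split a measured profile into exchanges the generic model can and cannot perform.

    profile: {metabolite: flux}, flux < 0 for uptake and > 0 for secretion
    can_consume / can_secrete: metabolites whose forced uptake / secretion
    was feasible in the generic model
    """
    kept, discarded = {}, {}
    for met, flux in profile.items():
        if (flux < 0 and met in can_consume) or (flux > 0 and met in can_secrete):
            kept[met] = flux
        else:
            # also catches metabolites absent from the model and, as an
            # assumption of this sketch, fluxes of exactly zero
            discarded[met] = flux
    return kept, discarded


# Hypothetical abbreviations: "abc" can be taken up but not secreted
kept, discarded = filter_profile(
    {"glc": -120.0, "lac": 300.0, "abc": 4.0},
    can_consume={"glc", "abc"},
    can_secrete={"lac"},
)
print(kept, discarded)
```

Because the check is made per direction, a metabolite can survive in one sample (where it is consumed) and be dropped in another (where it is released), as described above.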
After the sets of 'qualitatively' feasible metabolite exchanges were identified, we mapped the metabolite uptakes and secretions of one sample at a time to the starting model. Each detected quantitative flux X was imposed as a constraint on the bounds of the respective metabolite exchange reaction, allowing 20% around X, i.e., between X*0.8 and X*1.2. The exchanges detected for one sample were mapped to the starting model consecutively. After constraints were placed on one exchange reaction, FBA was performed to check whether the model was still feasible. Although the starting model was able to perform all mapped qualitative metabolite exchanges, certain quantities, or combinations of constraints, could still render the model infeasible. In case of infeasibility, the original bounds of the model were restored, and we proceeded to the next set of constraints. Quantitative constraints rendered 27 preliminary cell line models infeasible (Figure 2.1 A). Of these 27 models, two exchange constraints were restored during data integration in 25 models, one in one model, and four in one model. All 120 preliminary models, each with the individual constraints detected for one sample, were subjected to our new model building method. The idea behind the method was that, although metabolic profiles are likely to be incomplete, the context of the metabolic model can be used to predict a hypothetical, minimal set of missing exchanges that maximally constrains the solution space (the set of feasible flux distributions) of the model. Incompleteness of the metabolic profiles results from the limitations of detection methods (LC-MS or NMR) or from the fact that the composition of the medium, e.g., when it contains or consists of serum, is not entirely defined, and therefore neither is the list of metabolites utilized and released by the cells.
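The consecutive mapping with a 20% allowance can be sketched as follows. Two points are assumptions rather than statements from the text: `is_feasible` is a hypothetical placeholder for the FBA feasibility check, and the bounds are ordered with min/max because, for a negative (uptake) flux X, X*0.8 is larger than X*1.2 (consistent with the PhPP ranges in the Methods, where -860*1.2 = -1032 is the largest glucose uptake).

```python
def allowance_bounds(x, allowance=0.2):
    """Bounds within +/- allowance of the measured flux x, ordered so lb <= ub."""
    a, b = x * (1.0 - allowance), x * (1.0 + allowance)
    return (min(a, b), max(a, b))


def map_profile(bounds, profile, is_feasible):
    """Constrain exchanges one at a time, restoring any that cause infeasibility.

    bounds: {exchange_id: (lb, ub)} of the starting model (not modified)
    profile: {exchange_id: measured flux X}
    is_feasible: hypothetical callable, e.g. an FBA solve on the bounds dict
    """
    current = dict(bounds)
    restored = []
    for rxn_id, x in profile.items():
        previous = current[rxn_id]
        current[rxn_id] = allowance_bounds(x)
        if not is_feasible(current):
            current[rxn_id] = previous  # restore the original bounds
            restored.append(rxn_id)
    return current, restored


# Toy check in which forcing secretion through EX_b counts as "infeasible"
start = {"EX_a": (-2000.0, 2000.0), "EX_b": (-2000.0, 2000.0)}
constrained, restored = map_profile(
    start, {"EX_a": -100.0, "EX_b": 10.0}, lambda b: b["EX_b"][0] <= 0
)
print(constrained, restored)
```

Because the loop restores only the constraint that broke feasibility and then moves on, the outcome can depend on the order in which exchanges are mapped; the sketch simply follows the order of the profile.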
Compared to the number of metabolites detected in metabolomic approaches (here 115), the number of exchanges in the starting model is very high (464 metabolite exchanges). Adding constraints to the model limits the number of feasible network states, and ideally only biologically relevant ones would remain. Constraining the model too much, by removing all but the detected metabolite exchanges, would inevitably lead to an infeasible model, because not all metabolites might have been measured (e.g., oxygen uptake rates or undetected substrates herein). To maintain a functional model on the one hand while constraining it as far as possible on the other, we formulated an LP problem. This LP problem returns a solution relying on a minimal set of exchanges (minimal cardinality) needed, in addition to the experimentally defined uptakes and secretions, to sustain a feasible model. The procedure was the same for all 120 models. First, an irreversible model was generated by splitting each reversible reaction into a forward and a reverse reaction. Subsequently, all exchange reactions not part of the respective metabolic profile were identified and flagged as part of the set of exchange reactions to be minimized. The LP problem was then solved. All unused exchange reactions were identified from the LP solution, and all exchange reactions not part of the minimal set of exchanges were closed in the reversible model. The submodel was extracted using a function (identifyBlockedRxns, epsilon = 1e-4) from the FASTCORE algorithm for the reconstruction of context-specific metabolic networks [126]. These steps were iterated until the minimal exchange reaction network was identified, from which the final cancer cell line-specific model was built.

Growth rates

Cell line-specific growth rates [177], which agreed with [137], were used as constraints to analyze the ability of the models to achieve experimental growth rates.
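Growth-rate constraints are typically derived from doubling times. Assuming exponential growth, the specific growth rate is μ = ln 2 / t_d. The thesis does not spell out this conversion, so treat the sketch below as an assumption; it does, however, reproduce after rounding the biomass lower bound of 0.008 quoted for HOP-92 (doubling time 88 hrs).

```python
import math


def growth_rate_from_doubling_time(doubling_time_hr):
    """Specific growth rate (1/hr), assuming exponential growth."""
    return math.log(2.0) / doubling_time_hr


def doubling_time_from_growth_rate(growth_rate_per_hr):
    """Inverse conversion, doubling time in hours."""
    return math.log(2.0) / growth_rate_per_hr


# HOP-92, the slowest-growing line in the data set (88 hrs)
mu_hop92 = growth_rate_from_doubling_time(88.0)
print(round(mu_hop92, 4))  # close to the biomass lower bound of 0.008
```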
An alternative set of NCI-60 growth rates (http://dtp.nci.nih.gov/docs/misc/common_files/cell_list.html) did not yield different results. Growth rates were used as constraints only where explicitly stated.

Flux split ratios and ATP yield

Flux splits can be used to investigate metabolism from a metabolite-centric view, complementing the commonly used fluxes [159]. Herein, we calculated flux splits to obtain information on the distinct production strategies of the cancer cell line models for ATP and cofactors (NADH, NADPH, and FADH2). The flux splits were calculated from the flux vectors obtained by optimizing ATP production in each model. All reaction fluxes producing metabolite i were identified as $P_{i,j} = S_{i,j} \cdot v_j$ for all reactions j with $P_{i,j} > 0$. From the total production flux $\Phi_i = \sum_j P_{i,j}$, the percent contributions were calculated as $P^{*}_{i,j} = P_{i,j} / \Phi_i$, as specified in [159]. However, before summing the total production flux $\Phi_i$, certain reactions, e.g., transport reactions or other transformations of no interest to our analysis, were removed (Tab. 2.1). Subsequently, the reaction with the maximal $P^{*}_{i,j}$ was identified as the major producer of ATP, NADH, NADPH, and FADH2, respectively. Based on the combination of major producer reactions, the 120 models were classified into eight different phenotypes (Supplementary Tab. 2.2). The ATP yield was defined as $\Phi_{ATP}$ divided by the glucose uptake of the respective model. It should be noted that although we formulated the ATP yield relative to glucose uptake, uptake of other carbon sources, e.g., glutamine, was still possible, since no additional constraints were applied in this analysis.

Phenotypic phase plane analysis

The robustness of the 120 models towards environmental perturbations was investigated using phenotypic phase plane analysis (PhPP) [161].
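The flux-split and ATP-yield definitions given in the preceding subsection translate directly into code. A minimal sketch, assuming one row of S and a flux vector as plain lists; the excluded set reproduces the ATP rows of Table 2.1, while the reaction IDs and fluxes in the usage lines are toy values ("R_back" is hypothetical). The ATP yield divides by the absolute glucose exchange flux, since uptake is negative under the bound convention of Eq. 2.1; how the sign was handled originally is an assumption.

```python
# Flux splits: P_ij = S_ij * v_j for producing reactions, normalized by Phi_i.

# ATP rows of Table 2.1 (reactions ignored before summing production)
EXCLUDED_ATP = {
    "ATPtm", "ATPtn", "ATPtx", "ATP1ter", "ATP2ter", "EX_atp(e)",
    "DNDPt13m", "DNDPt2m", "DNDPt31m", "DNDPt56m", "DNDPt32m", "DNDPt57m",
    "DNDPt20m", "DNDPt44m", "DNDPt19m", "DNDPt43m", "ADK1", "ADK1m",
}


def flux_splits(s_row, fluxes, reaction_ids, excluded=frozenset()):
    """Return ({reaction_id: P_ij / Phi_i}, Phi_i) for metabolite row i."""
    production = {}
    for rxn, s_ij, v_j in zip(reaction_ids, s_row, fluxes):
        p_ij = s_ij * v_j
        if p_ij > 0 and rxn not in excluded:
            production[rxn] = p_ij
    total = sum(production.values())
    if total == 0:
        return {}, 0.0
    return {rxn: p / total for rxn, p in production.items()}, total


def major_producer(splits):
    """Reaction with the largest fractional contribution, or None."""
    return max(splits, key=splits.get) if splits else None


def atp_yield(total_atp_production, glucose_exchange_flux):
    """Phi_ATP per unit of glucose taken up (uptake flux is negative)."""
    return total_atp_production / abs(glucose_exchange_flux)


# Toy usage: "R_back" is written in reverse and runs backwards, so it produces ATP
ids = ["PGK", "ATPS4m", "ATPtm", "R_back"]
s_row = [1.0, 1.0, 1.0, -1.0]
fluxes = [20.0, 50.0, 5.0, -10.0]
splits, phi_atp = flux_splits(s_row, fluxes, ids, EXCLUDED_ATP)
print(major_producer(splits), phi_atp, atp_yield(phi_atp, -10.0))
```

Multiplying by the stoichiometric coefficient rather than testing the sign of the flux alone is what lets a reaction written in the opposite direction count as a producer when it runs backwards.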
In PhPP, the fluxes through two exchange reactions representing metabolite uptake or secretion are fixed at a series of values while biomass production is maximized using standard FBA. The optimal value was computed for each combination and plotted as a heat map. Here, oxygen uptake was varied in combination with glucose uptake, glutamine uptake, or lactate secretion. All other reaction constraints remained unchanged. The tested ranges were defined based on the variability of the constraints across the set of 120 models. The oxygen uptake rate started at 0 and was decreased in steps of 20 units to -1000. The glucose uptake rate started at 0 and was decreased in steps of 20 units to -1080 (the lowest and highest glucose uptake among the models were -38 × 0.8 = -30.4 and -860 × 1.2 = -1032). The glutamine uptake rate started at 0 and was decreased in steps of 20 units to -400 (lowest and highest glutamine uptake: -13.87 × 0.8 = -11.096 and -304.27 × 1.2 = -365.124). The lactate secretion rate started at 1620 and was decreased in steps of 20 units to 0 (lowest and highest lactate secretion: 32.35 × 0.8 = 25.880 and 1345.14 × 1.2 = 1614.2). The unit of all flux values is fmol/cell/hr.

Table 2.1: Reactions discarded from flux split analysis (and ATP yield).
ATP:    ATPtm, ATPtn, ATPtx, ATP1ter, ATP2ter, EX_atp(e), DNDPt13m, DNDPt2m, DNDPt31m, DNDPt56m, DNDPt32m, DNDPt57m, DNDPt20m, DNDPt44m, DNDPt19m, DNDPt43m, ADK1, ADK1m
NADH:   NADHtpu, NADHtru, NADtpu
NADPH:  NADPHtru, NADPHtxu
FADH2:  FADH2tru, FADH2tx

Gene deletion
We performed single gene deletion for each of the 120 models using the function implemented in the COBRA toolbox [15]. A gene was defined as a KO gene if the growth rate of the perturbed model was ≤5% of the growth rate of the unperturbed model. Reduced growth was defined as >5% and ≤95% of the growth rate of the unperturbed model. All calculations were performed in MATLAB using the TomLab CPLEX linear solver.
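The PhPP scan grids and the growth-based classification of single gene deletions described above can be sketched in Python. This is an illustration under stated assumptions, not the COBRA toolbox implementation that was used. `max_growth` is a hypothetical callback that fixes the two exchange fluxes and returns the FBA-optimal biomass flux. Counting a ratio of exactly 5% as KO, and using the sample standard deviation for the mean ± 2 SD grouping of KO gene counts (as in Figure 2.9), are also assumptions.

```python
from statistics import mean, stdev


def scan_grid(start, stop, step=20.0):
    """Flux values from start to stop (inclusive) in fixed steps."""
    n = int(round(abs(stop - start) / step))
    direction = 1.0 if stop >= start else -1.0
    return [start + direction * k * step for k in range(n + 1)]


def phpp(max_growth, xs, ys):
    """Heat-map matrix: one row per y value, one column per x value.

    max_growth(x, y) is a hypothetical callback returning the optimal
    biomass flux with the two exchange fluxes fixed to x and y.
    """
    return [[max_growth(x, y) for x in xs] for y in ys]


def classify_deletion(wt_growth, ko_growth):
    """Classify a single gene deletion by growth relative to the unperturbed model."""
    if wt_growth <= 0:
        raise ValueError("unperturbed model must grow")
    ratio = ko_growth / wt_growth
    if ratio <= 0.05:
        return "KO"
    if ratio <= 0.95:
        return "reduced growth"
    return "no effect"


def ko_count_groups(ko_counts):
    """Split models into high/low groups at mean +/- 2 SD of KO gene counts."""
    values = list(ko_counts.values())
    mu, sd = mean(values), stdev(values)
    high = {m for m, n in ko_counts.items() if n > mu + 2 * sd}
    low = {m for m, n in ko_counts.items() if n < mu - 2 * sd}
    return high, low


oxygen = scan_grid(0.0, -1000.0)     # 0, -20, ..., -1000
lactate = scan_grid(1620.0, 0.0)     # 1620, 1600, ..., 0
print(len(oxygen), len(lactate))     # 51 82
print(classify_deletion(1.0, 0.93))  # reduced growth
```

With the step of 20 units, the oxygen, glucose, glutamine, and lactate ranges above give 51, 55, 21, and 82 grid points, respectively.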
2.5 Supplementary material
This section contains the tables published as supplementary material.

Table 2.2: Distinct Phenotypes.
Type  ATP     NADH   NADPH    FADH2     Frequency
8     ATPS4m  AKGDm  ICDHyrm  SUCD1m    6
7     ATPS4m  AKGDm  ME2      SUCD1m    41
6     ATPS4m  GAPD   DHCRD1   SUCD1m    4
5     ATPS4m  GAPD   ICDHyrm  SUCD1m    16
4     ATPS4m  GAPD   ME2      SUCD1m    12
3     ATPS4m  MDHm   DHCRD1   SUCD1m    4
2     PGK     GAPD   LALDD    SUCD1m    29
1     PGK     GAPD   LALDD    FAOXC160  8

In total, 112 metabolites were mapped to our model. For these, we performed clustering based on Spearman's rank correlation after normalizing each column, for both metabolites and cell lines (Supplementary Fig. 2.6). Although the replicate samples cluster together, differences between them do exist; e.g., for the two samples of M14, a number of metabolites differ in whether they were consumed (blue) or released (red). Based on these discrepancies, we decided to build a model for each sample rather than combining the replicates and building one model per cell line.

Additional exchanges
The number of exchanges added to maintain functional models varied between 13 and 28. Each of the 54 unique additional metabolite exchanges was added to at least one model, and two were added to all 120 models (Supplementary Tab. 2.5): O2, which was only taken up, and bilirubin glucuronoside (bilglcur), which had to be either secreted or consumed in all 120 models. The pattern of uptake (n=76) and secretion (n=44) of bilglcur was the reverse of the exchange profile of bilirubin, which was taken up in 44 samples and secreted in 76 samples (Supplementary Tab. 2.5). The next most frequently added exchange was 2-hydroxybutanoic acid, which had to be secreted by 116 models. Interestingly, 2-hydroxybutanoic acid has been suggested as a biomarker for the detection of colorectal cancer, selected by a multiple logistic regression model [178]. The additions further included a number of fatty acid uptakes, e.g.,
phytanic acid (n=101, Supplementary Tab. 2.5). Dietary branched-chain lipids such as phytanic acid have been linked to various cancers as well as to neurological diseases [179, 180]. Intake of phytanic acid or phytanic acid-containing foods has been associated with an increased risk of follicular lymphoma, small lymphocytic lymphoma/chronic lymphocytic leukemia, and non-Hodgkin lymphoma [181]. Plasma phytanic acid concentrations were significantly associated with dairy fat intake; however, no direct causal relationship with prostate cancer could be established [180, 182]. Prostate cancer, as well as other cancers, overexpresses alpha-methylacyl-CoA racemase (AMACR), an enzyme that regulates the entry of branched-chain fatty acids into peroxisomal alpha- and beta-oxidation [179, 183]. Furthermore, alternative splicing produces a distinct AMACR transcript with distinct biochemical properties, and single-nucleotide polymorphisms (SNPs) that are elevated in prostate cancer compared with normal tissue have been observed [183, 184, 185].

Table 2.3: Sampling results of the isocitrate dehydrogenase and pyruvate dehydrogenase.
Reaction / statistic      ACHN          ACHN_2        SK-MEL-28     SK-MEL-28-2
Isocitrate dehydrogenase
  mean                    -620.8973661  -588.0353818  -887.6620988  -854.0857531
  median                  -549.1930966  -508.2469954  -815.4456521  -771.9925793
  std                     566.6515157   549.0111082   572.7923604   553.0852402
  min                     -1780.561244  -1779.061376  -1999.99014   -1999.976868
  max                     255.0627464   251.4236693   -19.18158557  -31.52032852
Pyruvate dehydrogenase
  mean                    29.07586611   21.57907528   0.009248541   0.006737046
  median                  29.35353201   21.90132532   0.006416373   0.004685093
  std                     5.39244374    4.793963776   0.009230659   0.006756143
  min                     2.948301163   0.027426137   1.04495E-09   9.15917E-09
  max                     52.74281973   41.57521337   0.088072245   0.062690636

Table 2.4: Excluded were uncalibrated metabolites and those that could neither be produced nor consumed by Recon.
Metabolite (b)                        Method (a)  BIGG metabolite  Direct Recon 1  Comment
homoserine                            HILIC       hom_L            EX_hom_L(e)     no uptake/secretion in Recon, therefore excluded right away
4-hydroxybenzoate                     IPR         4hbz             Ex_4hbz[e]      no uptake/secretion in Recon, therefore excluded right away
aminoisobutyrate                      HILIC       -                -               not in Recon 2
ascorbate                             IPR         ascb_L           -               excluded because not calibrated
pyruvate                              IPR         pyr              -               excluded because not calibrated
homocystine                           IPR         N/A              -               excluded because not calibrated
ADMA                                  HILIC       N/A              -               excluded because not calibrated
allantoin                             HILIC       alltn            -               excluded because not calibrated
cotinine                              HILIC       N/A              -               excluded because not calibrated
glycerol_2                            HILIC       glyc             -               excluded because not calibrated
NMMA                                  HILIC       N/A              -               excluded because not calibrated
trimethylamine-N-oxide                HILIC       N/A              -               excluded because not calibrated
aconitate                             IPR         N/A              -               excluded because not calibrated
adipate                               IPR         N/A              -               excluded because not calibrated
biotin                                IPR         btn              -               excluded because not calibrated
citrate/isocitrate                    IPR         cit/icit         -               excluded because not calibrated
fru-1,6-DP/fru-2,6-DP/glc-1,6-DP      IPR         fdp/f26bp/       -               excluded because not calibrated
maleate                               IPR         N/A              -               excluded because not calibrated
hippurate                             IPR         N/A              -               excluded because not calibrated
malonate                              IPR         HC00319          -               excluded because not calibrated
salicylurate                          IPR         N/A              -               excluded because not calibrated
methylmalonate                        IPR         HC00900          -               excluded because not calibrated
UDP-galactose/UDP-glucose             IPR         udpgal/udpg      -               excluded because not calibrated
hyodeoxycholate/ursodeoxycholate      IPR         N/A              -               excluded because not calibrated
chenodeoxycholate/deoxycholate        IPR         N/A              -               excluded because not calibrated
lithocholate                          IPR         HC02191          -               excluded because not calibrated
taurolithocholate                     IPR         HC02192          -               excluded because not calibrated
phenylacetylglycine                   IPR         N/A              -               excluded because not calibrated

Table 2.5: Added exchanges.
Exchange added          Frequency (n)  Secretion (n)  Uptake (n)  Comment
EX_bilglcur(e)          120            44             76          The pattern of uptake and secretion corresponds exactly to the CORE profiles of bilirubin, which was taken up 44 times and secreted 76 times.
EX_o2(e)                120            0              120
EX_2hb(e)               116            116            0           2-Hydroxybutanoic acid; FA; biomarker for detecting colorectal cancer selected by the multiple logistic regression model. DOI: 10.1371/journal.pone.0040459 [178]
EX_thmmp(e)             111            10             101         In the CORE profiles, thiamine (thm) is secreted by 106 models; thm is generated from thiamin monophosphate by thiamine phosphatase (THMP). Further, thiamine triphosphate is secreted in 4 of the models (see below).
EX_urea(e)              111            111            0
EX_his_L(e)             107            0              107         Also produced from L-carnosine, which is taken up in 68 samples of the CORE profiles; possibly the amount supplied by carnosine is not sufficient.
EX_dmhptcrn(e)          101            101            0           Carnitine; carnitines are usually secreted (excreted in urine).
EX_phyt(e)              101            0              101         FA
EX_tdchola(e)           92             92             0           The CORE profiles indicate uptake in 94 samples, which appears not to be possible for the model; during model building, 92 secretions were added to the 26 present ones.
EX_pydx(e)              85             0              85          The remaining 35 models have 4-pyridoxate uptake, which can alternatively be produced from pyridoxal.
EX_utp(e)               85             0              85
EX_co2(e)               83             83             0
EX_gtp(e)               82             0              82
EX_gthrd(e)             77             0              77
EX_i(e)                 76             0              76
EX_triodthysuf(e)       72             72             0           Uptake of triiodothyronine (triodthy) > -1e-07 in 72 samples of the CORE profiles.
EX_atp(e)               68             0              68
EX_5mthf(e)             56             56             0
EX_7thf(e)              56             0              56
EX_h(e)                 51             51             0
EX_udp(e)               35             0              35
EX_gdp(e)               32             0              32
EX_adp                  27             0              27
EX_lpchol_hs(e)         24             0              24
EX_tchola(e)            20             20             0
EX_tag_hs(e)            19             19             0
EX_vitd3(e)             14             0              14
EX_5mta(e)              8              8              0
EX_nh4(e)               7              7              0
EX_so4(e)               7              0              7
Ex_5hoxindoa[e]         7              7              0
EX_cbasp(e)             6              0              6
EX_Lcystin(e)           5              0              5
EX_cmp(e)               5              0              5
EX_nac(e)               5              0              5
EX_dopa(e)              4              4              0
EX_nad(e)               4              0              4
EX_thmtp(e)             4              4              0
EX_tymsf(e)             4              4              0
EX_34dhphe(e)           3              3              0
EX_pheacgln(e)          3              3              0
EX_s2l2n2m2masn(e)      3              0              3           N-glycan
EX_strch1(e)            3              0              3
EX_ach(e)               2              2              0
EX_estradiolglc(e)      2              2              0
EX_estrones(e)          2              0              2
EX_pchol_hs(e)          2              0              2
EX_tmndnc(e)            2              0              2           FA
EX_3aib_D(e)            1              1              0
EX_4mop(e)              1              1              0
EX_fru(e)               1              1              0
EX_strdnc(e)            1              0              1           FA
EX_ttdca(e)             1              0              1           FA
EX_urate(e)             1              1              0

Table 2.6: Metabolite uptake and secretion not possible in the model.
Secretion not possible: EX_4HPRO(e), Ex_carn[e], EX_cmp(e), EX_crn(e), EX_fol(e), EX_nac(e), EX_phe_L(e), EX_pnto_R(e), EX_ppa(e), EX_sucr(e), EX_thr_L(e), EX_trp_L(e)
Uptake not possible: EX_4pyrdx(e), Ex_crtn[e], EX_gchola(e), EX_gthox(e), Ex_kynate[e], EX_oxa(e), EX_tchola(e), EX_tdchola(e), EX_thyox_L(e), EX_urate(e), Ex_5hoxindoa[e], EX_gdchola(e)

Most models can grow at experimental growth rates
Cancer cell lines are known to be heterogeneous [160, 186, 187, 188]. One distinctive feature is the variability in the doubling times of individual cell lines [137, 177]. Large variations in the minimal and maximal achievable growth rates were observed across models. These differences were a consequence of the quantitative constraints imposed on the models (Supplementary Fig. 2.6). Yet, the heterogeneity in growth rates indicated that the quantitative metabolomic differences had been successfully translated into distinct solution spaces of the generated models [116]. Of the 15 models that were infeasible when constrained to the experimental growth rate ±20% (Table 2.8), two did not reach the experimental growth rates, whereas the remaining 13 exceeded the experimentally reported growth rates under the enforced quantitative constraints. For example, the SK-MEL-28-2 model achieved a minimal growth rate (0.034 fmol/cell/hr, 20.5 hr) that exceeded the experimental measurement (upper bound ub = 0.023 fmol/cell/hr, 35.1 hr) (Table 2.8). In silico growth of the ACHN-2 model agreed particularly well with the experimental growth (in silico: max = 0.0206 fmol/cell/hr, min = 0.008 fmol/cell/hr; experimental: lb = 0.020 fmol/cell/hr, ub = 0.030 fmol/cell/hr). Additionally, the UACC_257 model was compatible with the experimental growth rate ±20% (in silico: max = 0.0155, min = 0.008; experimental: lb = 0.0136, ub = 0.0163).
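The comparison summarized in Table 2.8 amounts to an interval check. Because the solution space of a constraint-based model is convex, the growth rates a model can achieve form an interval [min, max]. Constraining growth to [lb, ub] is therefore infeasible exactly when the two intervals do not overlap. The following minimal Python sketch uses values quoted in the text and in Table 2.8; the function names are hypothetical.

```python
def experimental_bounds(rate, tolerance=0.2):
    """Growth-rate bounds of +/- tolerance around an experimental value."""
    return rate * (1.0 - tolerance), rate * (1.0 + tolerance)


def compare_growth(min_growth, max_growth, lb, ub):
    """Relate a model's achievable growth range [min, max] to [lb, ub]."""
    if max_growth < lb:
        return "below experimental range"
    if min_growth > ub:
        return "above experimental range"
    return "feasible"  # intervals overlap, so the constrained model is feasible


# (min growth, max growth, lb, ub) in fmol/cell/hr, as reported in the text
models = {
    "MDA-MB-435": (0.008, 0.019, 0.02, 0.03),
    "MCF7-2": (0.008, 0.012, 0.021, 0.032),
    "SK-MEL-28-2": (0.034, 5.479, 0.015, 0.023),
    "ACHN-2": (0.008, 0.0206, 0.020, 0.030),
    "UACC_257": (0.008, 0.0155, 0.0136, 0.0163),
}
results = {name: compare_growth(*bounds) for name, bounds in models.items()}
for name, verdict in results.items():
    print(name, verdict)
```

The two "below" cases correspond to the two models that did not reach the experimental growth rates. The remaining 13 models in Table 2.8 fall into the "above" case.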
The ACHN-2 and UACC_257 models were good examples of how the applied metabolomic constraints enabled the models to specifically predict experimental growth rates.

Figure 2.6: Flux data mapped to the reconstruction show variation between replicates, although the samples cluster together.

Replicate models were more similar in ATP production than in growth
Growth rates of models of the same cell line could be very different (Supplementary Table 1). We found no correlation (or anti-correlation) between the ATP yield achieved by the models and their maximal in silico growth rate (Supplementary Fig. 2.10); an anti-correlation might have been expected because the two are competing objectives. We compared the similarity of the replicate models with regard to the maximum growth rate predicted for each model. When the models were ordered by this value, six pairs of replicate models appeared in direct consecutive order. Similarly, when the models were ordered by ATP yield per unit of glucose, 16 models appeared in direct consecutive order. ATP yield therefore grouped models generated from duplicate samples more closely than growth rate did.

Figure 2.7: ATP yield is not informative for the division of OxPhos models (blue and red).
Figure 2.8: Phase plane analysis revealed distinct solution spaces for variation of nutrients and oxygen among the NCI-60 models. This distinction was made by visual inspection; however, the transitions between the depicted examples were rather fluid.

Comparing the combined flux through both ATP-producing reactions of glycolysis (PGK and pyruvate kinase) with ATP production in the electron transport chain (ETC) converted five OxPhos models into glycolytic models (Supplementary Fig. 2.11). When we identified the reaction with the highest contribution, the fact that glycolysis has two ATP-producing steps was neglected. We therefore checked how summing the contributions of the two reactions affected the ranked ATP yield plot and, hence, the classification into glycolytic and OxPhos phenotypes. As a result, five models switched from the OxPhos to the glycolytic phenotype (NCI-H522, both SF-295 models, and both SNB-19 models). The four models other than NCI-H522 were CNS models, and they became the only CNS glycolytic models.

Figure 2.9: The highest and lowest numbers of KO genes were not associated with any phenotype defined by the previous analyses. The higher (dark blue) and lower (light blue) groups were defined by mean ± 2 SD. The average was 132 KO genes (± 14).
Figure 2.10: ATP yield does not correlate with the maximal growth rate of the models.
Figure 2.11: Metabolic strategies considering both ATP-producing glycolytic reactions. Five models switch to a higher collective contribution of glycolysis compared with the contribution of ATP synthase to the total ATP-producing flux.

Glycolysis, TCA cycle and ETC together were the major sources of ATP
Cells can produce ATP in many different ways. Altogether, 26 reactions were identified that could contribute to the total amount of ATP produced. Among those, the reactions of glycolysis, the TCA cycle, and ATP synthase were the major ATP producers in each model. Glycolysis contributed between 7.8% and 86.6% of total ATP production per model. Similarly, the contribution of ATP synthase varied between 4.7% and 68.1%. The combined contribution of glycolysis and ATP synthase ranged between 73.0% and 97.5%. Including the contribution of succinate-CoA ligase (0.4% to 10.1% of total ATP production), the three pathways contributed 81.1% to 98.1% of the total amount of ATP per model. Although the combined contribution of glycolysis, the TCA cycle, and ATP synthase was very high in all models, the fractions contributed individually by glycolysis or ATP synthase could differ greatly.
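The reclassification described above, in which the two ATP-producing glycolytic steps are summed before the major ATP producer is chosen, can be sketched in Python. The reaction identifiers used for illustration (PGK, PYK for pyruvate kinase, ATPS4m for ATP synthase, SUCOAS1m for succinate-CoA ligase) and the percentages in the example are assumptions, not values taken from the models.

```python
GLYCOLYTIC_ATP = ("PGK", "PYK")  # assumed IDs of the two ATP-producing glycolytic steps


def atp_phenotype(atp_splits, combine_glycolysis=False):
    """Label a model by its major ATP producer.

    atp_splits maps reaction IDs to percent contributions to total ATP
    production. With combine_glycolysis=True, the glycolytic steps are
    summed into a single 'glycolysis' entry before the maximum is taken.
    """
    splits = dict(atp_splits)  # copy, so the caller's dict is not modified
    if combine_glycolysis:
        splits["glycolysis"] = sum(splits.pop(r, 0.0) for r in GLYCOLYTIC_ATP)
    top = max(splits, key=splits.get)
    if top == "ATPS4m":
        return "OxPhos"
    if top in GLYCOLYTIC_ATP or top == "glycolysis":
        return "glycolytic"
    return "other"


# Hypothetical model in which ATP synthase is the single largest producer,
# but the two glycolytic steps together contribute more.
example = {"ATPS4m": 40.0, "PGK": 30.0, "PYK": 25.0, "SUCOAS1m": 5.0}
print(atp_phenotype(example))                           # OxPhos
print(atp_phenotype(example, combine_glycolysis=True))  # glycolytic
```

A model switches phenotype only when no single glycolytic step, but the sum of both, exceeds the contribution of ATP synthase.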
Distinction of the clusters derived from phase plane analysis
Most, but not all, cancer cells depend on exogenous glutamine for nucleotide and hexosamine biosynthesis [189], whereas other cancer cells depend on a constant glucose supply [160]. Accordingly, we classified the models into distinct clusters based on their dependence on the uptake of glucose, glutamine, and oxygen, as well as on lactate secretion. Cluster 1 (n=56) was characterized by a requirement for high oxygen uptake and no dependency on glutamine uptake or lactate secretion, although subtypes existed among the models of this cluster. Cluster 1B could grow only when lactate secretion was limited, which coincided with the low utilization of succinate-CoA ligase by these models. Cluster 1C was characterized by high glucose uptake and high lactate secretion (SF-295 models). Accordingly, these two models were shifted furthest towards the glycolytic phenotype (Fig. 2.4). Cluster 2 (n=7) was characterized by the need for sufficiently high oxygen uptake, low glucose uptake and lactate secretion, and indifference to glutamine uptake. Cluster 3 (n=6) showed similar characteristics but a larger overall solution space than cluster 2. Cluster 4 models could use oxygen only to a limited extent. Three subclusters were distinguished. Cluster 4A was characterized by an increasing oxygen requirement as a consequence of increasing glucose uptake. This increase was also indicated in cluster 4B; however, the models of this subcluster had a very restricted solution space and operated only at very high glucose uptake and lactate secretion rates. In comparison, cluster 4A was limited to low lactate secretion rates. Both subclusters were able to grow only with relatively low glutamine uptake. Cluster 4 comprised all glycolytic models; however, the glycolytic models of subtype 1 (as defined by flux split analysis) were spread across clusters 4A and 4C (Fig. 2.4), such that the two glycolytic subtypes could not have been distinguished based on this analysis. The division into subclusters 4A-C did not correspond to distinct clusters in the 3D visualization, i.e., with regard to the utilization of PGK, ATP synthase, or succinate-CoA ligase by these models (Fig. 2.4). We observed an accumulation of melanoma models in cluster 4. However, not all melanoma models were part of cluster 4; two melanoma models each were associated with clusters 5 and 6. The models of cluster 5 (n=6) and cluster 6 (n=8) were independent of oxygen uptake (provided uptake > 0 fmol/cell/hr). Further, the models of these clusters were largely unconstrained with regard to glutamine or glucose uptake. Cluster 5 differed from cluster 6 in its limited capacity for lactate secretion.

KO genes in the TCA cycle
Cancer cells are known to operate the TCA cycle differently [69, 150]. The robustness of a model towards gene KO depends on the part of the TCA cycle used by that model. KO genes appeared throughout the TCA cycle (Fig. 2.5). Except for citrate synthase, all TCA cycle reactions were associated with one or more KO genes. Here, all KOs apart from those mentioned in the main text are discussed. Malate dehydrogenase 2 (MDH2, Entrez gene ID: 4191) and succinate-CoA ligase (SUCLG1, Entrez gene ID: 8802) led to reduced growth in a few models but did not abolish growth in any. Two genes associated with the pyruvate dehydrogenase complex, dihydrolipoamide S-acetyltransferase (DLAT, Entrez gene ID: 1737) and PDHB (Entrez gene ID: 5162), had a growth-reducing effect on one model only (ACHN; KO growth was 93% of WT growth). Dihydrolipoamide dehydrogenase (DLD, Entrez gene ID: 1738) was associated with 12 reactions in the model and was a KO gene in all 120 models. Among those 12 reactions were the TCA cycle reaction alpha-ketoglutarate dehydrogenase and pyruvate dehydrogenase.
Constraining the flux through each of these reactions individually revealed that the reaction responsible for the KO phenotype was 2-oxoadipate:lipoamide 2-oxidoreductase (lysine metabolism), and not the enzyme's function in the TCA cycle, pyruvate dehydrogenase, or glycine cleavage. This demonstrates the value of Recon as a knowledge base, since the effect of this gene KO could otherwise have been falsely attributed to the gene's function in any of the other reactions. Succinate dehydrogenase was a KO gene in 118 models; the two SK-OV-3 models were the exception. Fumarase was a KO gene in all 120 models. Additionally, the genes of three mitochondrial transporters, for H2O, CoA, and alpha-ketoglutarate/malate, affected a subset of models. The two models SF-539-2 and NCI-H226-2 were sensitive to KO of the mitochondrial water transporter (AQP8, Entrez gene ID: 343), in addition to their requirement for complex I of the electron transport chain. KO of the CoA transporter (SLC25A16, Entrez gene ID: 8034) slightly reduced growth in five models (SK-MEL-5 and both SK-MEL-28 and both HCC-2998 models). Finally, growth rates were reduced in SK-MEL-5 and SF-539-2 after KO of the gene associated with the alpha-ketoglutarate/malate transporter (SLC25A11, Entrez gene ID: 8402).

Further discussion of the four models with reductive TCA cycle flux
The four models were all glycolytic models, and all but MALME-3M-2 belonged to glycolytic subtype 1 (Supplementary Tab. 2.2). Additionally, they belonged to cluster 4 of the phase plane analysis, distributed over subclusters 4A and 4B. Cluster 4 models could use oxygen only to a limited extent. Cluster 4A was characterized by an increasing oxygen requirement as a consequence of increasing glucose uptake. This increase was also indicated in cluster 4B; however, the models of this subcluster had a very restricted solution space and operated only at very high glucose uptake and lactate secretion rates. In comparison, cluster 4A was limited to low lactate secretion rates.
Both subclusters were able to grow only with relatively low glutamine uptake. Although all four models exceeded the mean number of KO genes (132 ± 14), they did not have the highest overall number of KO genes.

Figure 2.12: ATP yields do not correspond to the separated clusters of models from the phase plane analysis.

Table 2.7: Reactions added to the starting model.
Name              Formula
5hoxindoatr       so4[c] + 5hoxindoa[e] <=> 5hoxindoa[c] + so4[e]
glyaldtr          na1[e] + glyald[e] <=> na1[c] + glyald[c]
PEPtr             hco3[e] + pep[c] <=> hco3[c] + pep[e]
gudactr           gudac[e] <=> gudac[c]
Lkynrtr           Lkynr[e] <=> Lkynr[c]
Lkynrtr2          phe_L[e] + Lkynr[c] -> phe_L[c] + Lkynr[e]
Lkynrtr3          leu_L[e] + Lkynr[c] -> leu_L[c] + Lkynr[e]
LikeBALAPAT1tc    2 na1[c] + cl[c] + cala[c] -> 2 na1[e] + cl[e] + cala[e]
LikeBALABETAtc    h[c] + cala[c] -> h[e] + cala[e]
CALAtr            cala[e] <=> cala[c]
CRTNtr            crtn[c] -> crtn[e]
KYNATEtr          akg[c] + kynate[e] <=> akg[e] + kynate[c]
KYNATEtr2         kynate[e] <=> kynate[c]
CITRtr            citr_L[e] <=> citr_L[c]
3ANTHRNtr         3hanthrn[e] <=> 3hanthrn[c]
SPMDtr            spmd[e] <=> spmd[c]
SPRMtr            sprm[e] <=> sprm[c]
SBT_Dtr           sbt_D[e] <=> sbt_D[c]
THYMDtr2          thymd[c] <=> thymd[e]
HKYNRtr           trp_L[c] + hLkynr[e] <=> hLkynr[c] + trp_L[e]
QULNtr            so4[c] + quln[e] <=> so4[e] + quln[c]
2PGtr             2pg[e] <=> 2pg[c]
4hbztr            4hbz[m] <=> 4hbz[e]
carntr            carn[e] <=> carn[c]
cholptr           cholp[e] <=> cholp[c]
cyst_Ltr          cyst_L[e] <=> cyst_L[c]
dcmptr            dcmp[e] <=> dcmp[c]
dhaptr            dhap[e] <=> dhap[c]
dmglytr           dmgly[e] <=> dmgly[c]
ethamptr          ethamp[e] <=> ethamp[c]
fumtr             fum[e] <=> fum[c]
g3pctr            g3pc[e] <=> g3pc[c]
glcurtr           glcur[e] <=> glcur[c]
hcys_Ltr          hcys_L[e] <=> hcys_L[c]
icittr            icit[e] <=> icit[c]
L2aadptr          L2aadp[e] <=> L2aadp[c]
phpyrtr           phpyr[e] <=> phpyr[c]
xantr             xan[e] <=> xan[c]
xmptr             xmp[e] <=> xmp[c]
xtsntr            xtsn[e] <=> xtsn[c]
3pgtr             3pg[e] <=> 3pg[c]
udpglcurtr        udpglcur[e] <=> udpglcur[c]
IMPtr             imp[e] <=> imp[c]
Ex_2pg[e]         2pg[e] <=>
Ex_4hbz[e]        4hbz[e] <=>
Ex_5hoxindoa[e]   5hoxindoa[e] <=>
Ex_cala[e]        cala[e] <=>
Ex_carn[e]        carn[e] <=>
Ex_cholp[e]       cholp[e] <=>
Ex_citr_L[e]      citr_L[e] <=>
Ex_crtn[e]        crtn[e] <=>
Ex_cyst_L[e]      cyst_L[e] <=>
Ex_dcmp[e]        dcmp[e] <=>
Ex_dhap[e]        dhap[e] <=>
Ex_dmgly[e]       dmgly[e] <=>
Ex_ethamp[e]      ethamp[e] <=>
Ex_fum[e]         fum[e] <=>
Ex_g3pc[e]        g3pc[e] <=>
Ex_glcur[e]       glcur[e] <=>
Ex_glyald[e]      glyald[e] <=>
Ex_gudac[e]       gudac[e] <=>
Ex_hcys_L[e]      hcys_L[e] <=>
Ex_icit[e]        icit[e] <=>
Ex_kynate[e]      kynate[e] <=>
Ex_L2aadp[e]      L2aadp[e] <=>
Ex_Lkynr[e]       Lkynr[e] <=>
Ex_pep[e]         pep[e] <=>
Ex_phpyr[e]       phpyr[e] <=>
Ex_quln[e]        quln[e] <=>
Ex_sbt_D[e]       sbt_D[e] <=>
Ex_spmd[e]        spmd[e] <=>
Ex_sprm[e]        sprm[e] <=>
Ex_xan[e]         xan[e] <=>
Ex_xmp[e]         xmp[e] <=>
Ex_xtsn[e]        xtsn[e] <=>
Ex_3pg[e]         3pg[e] <=>
Ex_3hanthrn[e]    3hanthrn[e] <=>
Ex_udpglcur[e]    udpglcur[e] <=>
Ex_hLkynr[e]      hLkynr[e] <=>

Table 2.8: Models that were infeasible when constrained to the experimental growth rate (ub +20%, lb -20%).
Cell line     Max growth  Min growth  (unlabeled)  lb     ub
OVCAR-8       4.462       3.473       0            0.022  0.033
OVCAR-5-2     4.052       0.716       0            0.01   0.015
SF-295        30.879      30.457      0            0.018  0.028
SF-295-2      33.785      20.833      0            0.018  0.028
CAKI-1        15.635      13.818      0            0.015  0.022
MDA-MB-435    0.019       0.008       0            0.02   0.03
NCI-H23       7.427       7.274       0            0.015  0.022
NCI-H322M-2   7.45        1.181       0            0.015  0.022
MALME-3M-2    5.939       2.945       0            0.016  0.024
MCF7-2        0.012       0.008       0            0.021  0.032
SF-539        11.05       10.316      0            0.016  0.024
OVCAR-3-2     7.898       7.223       0            0.015  0.022
SK-MEL-28-2   5.479       0.034       0            0.015  0.023
NCI-H226-2    10.131      8.177       0            0.009  0.013
SN12C         8.068       4.066       0            0.018  0.028

3 Prediction of intracellular metabolic states from extracellular metabolomic data
Metabolic models can provide a mechanistic framework to analyze information-rich omics data sets and are increasingly being used to investigate metabolic alterations in human diseases. One expression of altered metabolic pathway utilization is the selection of metabolites consumed and released by cells.
However, methods for inferring intracellular metabolic states from extracellular measurements in the context of metabolic models remain underdeveloped compared with methods for other omics data. Here, we describe a workflow for such an integrative analysis, with an emphasis on extracellular metabolomic data. We demonstrate, using the lymphoblastic leukemia cell lines Molt-4 and CCRF-CEM, how our methods can reveal differences in cell metabolism. Our models explain metabolite uptake and secretion by predicting a more glycolytic phenotype for the CCRF-CEM model and a more oxidative phenotype for the Molt-4 model, which was supported by our experimental data. Gene expression analysis revealed altered expression of gene products at key regulatory steps in these central metabolic pathways, and a literature search emphasized the role of these genes in cancer metabolism. Moreover, in silico gene knock-outs identified unique control points for each cell line model, e.g., phosphoglycerate dehydrogenase for the Molt-4 model. Thus, our workflow is well suited to characterizing cellular metabolic traits based on extracellular metabolomic data, and it allows the integration of multiple omics data sets into a cohesive picture based on a defined model context.

3.1 Introduction
Modern high-throughput techniques have increased the pace of biological data generation. Also referred to as the “omics avalanche”, this wealth of data provides great opportunities for metabolic discovery. Omics data sets contain a snapshot of almost the entire repertoire of mRNAs, proteins, or metabolites at a given time point or under a particular set of experimental conditions. Because of the high complexity of these data sets, computational modeling is essential for their integrative analysis.
Currently, such data analysis is a bottleneck in the research process, and methods are needed to facilitate the use of these data sets, e.g., through meta-analysis of data available in public databases (e.g., the Human Protein Atlas [190] or the Gene Expression Omnibus [191]), and to make valuable information more accessible to the biomedical research community. Constraint-based modeling and analysis (COBRA) is a computational approach that has been successfully used to investigate and engineer microbial metabolism through the prediction of steady states [192]. The basis of COBRA is network reconstruction: networks are assembled in a bottom-up fashion based on genomic data and extensive organism-specific information from the literature. Metabolic reconstructions capture information on the known biochemical transformations taking place in a target organism to generate a biochemical, genetic and genomic (BIGG) knowledge base [10]. Once assembled, a metabolic reconstruction can be converted into a mathematical model [9], and model properties can be interrogated using a great variety of methods [15]. The ability of COBRA models to represent genotype-phenotype and environment-phenotype relationships arises from the imposition of constraints, which limit the system to a subset of possible network states [16]. Currently, COBRA models exist for more than 100 organisms, including humans [45, 53]. Since the first human metabolic reconstruction was described (Recon 1 [45]), biomedical applications of COBRA have increased [50]. One way to contextualize networks is to define their system boundaries according to the metabolic state of the system, e.g., a disease state or a dietary regime. The consequences of the applied constraints can then be assessed for the entire network [57]. Additionally, omics data sets have frequently been used to generate cell-type- or condition-specific metabolic models.
Models exist for specific cell types, such as enterocytes [57], macrophages [21], and adipocytes [144], and even for multi-cell assemblies that represent the interactions of brain cells [49]. All of these cell-type-specific models, except the enterocyte reconstruction, were generated based on omics data sets. Cell-type-specific models have been used to study diverse human disease conditions. For example, an adipocyte model was generated using transcriptomic, proteomic, and metabolomic data. This model was subsequently used to investigate metabolic alterations in adipocytes that would allow the stratification of obese patients [144]. One highly active field within the biomedical applications of COBRA is cancer metabolism [51]. Omics-driven large-scale models have been used to predict drug targets [24, 51]. A cancer model was generated using multiple gene expression data sets and subsequently used to predict synthetic lethal gene pairs as potential drug targets that were selective for the cancer model but non-toxic to the global model (Recon 1), a consequence of the reduced redundancy in the cancer-specific model [24]. In a follow-up study, the lethal synergy between fumarate hydratase (FH) and enzymes of the heme metabolic pathway was experimentally validated, resolving the mechanism by which FH-deficient cells, e.g., renal cell cancer cells, survive with a non-functional TCA cycle [77]. Contextualized models, which contain only the subset of reactions active in a particular cell or tissue type, can be generated in different ways [51, 131]. However, the existing algorithms mainly consider gene expression and proteomic data to define the reaction sets that comprise the contextualized metabolic models. These subsets of reactions are usually defined based on the presence or absence of expression of the genes or proteins (present and absent calls), or inferred from expression values or differential gene expression. Comprehensive reviews of the methods are available [129, 193].
A tissue-specific (or cell-type-specific) metabolic model can only result from the compilation of a large set of omics data sets, whereas one particular experimental condition is represented by integrating omics data generated in a single experiment (condition-specific cell line model). Recently, metabolomic data sets have become more comprehensive, and these data sets allow direct determination of the metabolic network components (the metabolites). Additionally, metabolomics has proven to be stable, relatively inexpensive, and highly reproducible [113]. These factors make metabolomic data sets particularly valuable for the interrogation of metabolic phenotypes. Thus, the integration of these data sets is now an active field of research [117, 138, 194, 195]. Generally, metabolomic data can be incorporated into metabolic networks as qualitative, quantitative, and thermodynamic constraints [117, 134]. Mo et al. used metabolites detected in the spent medium of yeast cells to determine intracellular flux states through a sampling analysis [117], which allowed unbiased interrogation of the possible network states [25] and prediction of internal pathway use. Such analyses have also been used to reveal the effects of enzymopathies on red blood cells [17], to study the effects of diet on diabetes [163], and to define macrophage metabolic states [21]. This type of analysis is available as a function in the COBRA toolbox [15]. In this study, we established a workflow for the generation and analysis of condition-specific cell line metabolic models that can facilitate the interpretation of metabolomic data. Our modeling yields meaningful predictions regarding metabolic differences between two lymphoblastic leukemia cell lines (Figure 3.1A).

3.2 Results
We set up a pipeline to infer intracellular metabolic states from semi-quantitative data on the metabolites exchanged between cells and their environment.
Figure 3.1: a. Combined experimental and computational pipeline to study human metabolism. Experimental work and omics data analysis steps precede computational modeling. Model predictions are validated based on targeted experimental data. Metabolomic and transcriptomic data are used for model refinement and submodel extraction. Functional analysis methods are used to characterize the metabolism of the cell line models and compare it to additional experimental data. The validated models are subsequently used for the prediction of drug targets. b. Uptake and secretion pattern of model metabolites. All metabolite uptake and secretion steps that were mapped during model generation are shown. Metabolite uptakes are depicted on the left, and secreted metabolites are shown on the right. A number of the metabolite exchanges mapped to the model were unique to one cell line. Differences between the cell lines were used to set quantitative constraints for the sampling analysis. c. Statistics of the T-cell-specific network generation. d. Quantitative constraints. For the sampling analysis, an additional set of constraints was imposed on the cell-line-specific models, emphasizing the differences in metabolite uptake and secretion between the cell lines. Higher uptake of a metabolite was allowed in the model of the cell line that consumed more of the metabolite in vitro, whereas the supply was restricted for the model with lower in vitro uptake.
Our pipeline combined the following four steps: data acquisition, data analysis, metabolic modeling, and experimental validation of the model predictions (Figure 3.1A). We demonstrated the pipeline and its potential to predict metabolic alterations in diseases such as cancer using two lymphoblastic leukemia cell lines. The resulting Molt-4 and CCRF-CEM condition-specific cell line models were able to explain metabolite uptake and secretion by predicting distinct utilization of central metabolic pathways by the two cell lines.
Whereas the CCRF-CEM model more closely resembled a glycolytic phenotype, commonly referred to as the 'Warburg' phenotype, our predictions suggested a more respiratory phenotype for the Molt-4 model. We found these predictions to be in agreement with measured gene expression differences at key regulatory steps in the central metabolic pathways, and they were also consistent with additional experimental data regarding the energy and redox states of the cells. After a brief discussion of the data generation and analysis steps, the results derived from model generation and analysis will be described in detail.
3.3 Pipeline for generation of condition-specific metabolic cell line models
3.3.1 Generation of experimental data
We monitored the growth and viability of lymphoblastic leukemia cell lines in serum-free medium (Figure 3.6). Multiple omics data sets were derived from these cells. Extracellular metabolomic (exo-metabolomic) data, comprising measurements of the metabolites in the spent medium of the cell cultures [196], were collected along with transcriptomic data, and these data sets were used to construct the models.
3.3.2 Analysis of experimental data
Data analysis included defining the sets of metabolites that were taken up or secreted (used qualitatively for the generation of the models) and determining the quantitative differences in uptake and secretion between the cell lines (Figure 3.1B). These differences were later translated into model constraints. The final sets of metabolite exchanges used for model generation comprised the uptake of 14 and the secretion of 10 metabolites by both models, the unique secretion of seven and unique uptake of four metabolites by the CCRF-CEM model, and the unique secretion of one and unique uptake of one metabolite in Molt-4 cells (Figure 3.1B).
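The partition of exchanges into shared and cell-line-specific sets, and the scaling of uptake bounds by relative differences (Figure 3.1D), can be sketched in a few lines of Python. The scaling rule below is a hypothetical reading of "reduced according to the measured relative differences": the line with the higher measured uptake keeps a base bound, and the other line's bound is multiplied by the ratio of the two measured rates. The metabolite names and the base bound are illustrative; the 1.35 ratio echoes the 35% higher glucose uptake reported for CCRF-CEM cells.

```python
def partition_exchanges(line_a, line_b):
    """Split the observed exchanges of two cell lines into shared and unique sets."""
    return {
        "shared": line_a & line_b,
        "only_a": line_a - line_b,
        "only_b": line_b - line_a,
    }

def scaled_max_uptake(rate_a, rate_b, base_bound=1.0):
    """Hypothetical scaling rule for a metabolite taken up by both lines.

    The line with the higher measured uptake keeps base_bound; the other line's
    maximum uptake is reduced by the measured ratio of the two rates.
    Returns (max_uptake_a, max_uptake_b) as positive magnitudes.
    """
    if rate_a >= rate_b:
        return base_bound, base_bound * rate_b / rate_a
    return base_bound * rate_a / rate_b, base_bound

# Hypothetical observed uptakes for two cell lines.
uptake_a = {"glucose", "glutamine", "serine"}
uptake_b = {"glucose", "glutamine", "arginine"}
parts = partition_exchanges(uptake_a, uptake_b)
print(sorted(parts["shared"]), sorted(parts["only_a"]), sorted(parts["only_b"]))

# Relative glucose uptake: line A consumes 35% more than line B.
print(scaled_max_uptake(1.35, 1.0, base_bound=10.0))
```

In a COBRA-style model, uptake is usually encoded as a negative exchange flux, so such magnitudes would be applied as lower bounds with a negative sign.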
Additionally, sets of genes treated as expressed and unexpressed (present and absent calls), as well as groups of differentially expressed genes (DEGs) and alternatively spliced (AS) genes, were identified by comparing expression in CCRF-CEM and Molt-4 cells (see Methods section).
3.3.3 Generation of the condition-specific models
Model generation involved three steps: refinement of the global model, data mapping, and submodel extraction. For metabolites that could not be transported between the extracellular space and the cytosol, we added transport and exchange reactions (Table 3.2). Nutrient supply (for metabolite uptake) was restricted to the RPMI medium composition. First, the detected metabolite uptakes and secretions for each cell line were mapped separately to the model. The model was thereby constrained to a minimal set of metabolite exchange reactions required to support all of the observed metabolite uptakes and secretions and to explain the experimentally observed growth rates of the cells (Figure 3.1B, see Methods). The result was a vast reduction in the number of possible metabolite uptakes and secretions in the two preliminary models (Figure 3.1C), which placed major emphasis on the experimentally observed metabolite uptake and secretion profiles. In addition to the (qualitative) exo-metabolomic constraints, transcriptomic data were mapped to the preliminary models. This mapping, performed after the integration of the exo-metabolomic data, consisted of deleting all reactions associated with the set of absent genes. In general, it did not prevent either model from representing the detected metabolite uptake, metabolite secretion, or biomass production. Curation beyond the initial definition of the minimal sets of mandatory exchanges was therefore not necessary. Subsequently, the condition-specific CCRF-CEM and Molt-4 models were extracted through network pruning.
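In COBRA-style models, reactions are typically removed for absent genes via gene-protein-reaction (GPR) rules, so that a reaction is lost only when the absent genes make its rule false; this is consistent with complex I being lost through a single missing mandatory subunit (NDUFB3). The sketch below assumes that convention and a hypothetical present/absent call based on detection p-values. The cutoff, gene names, and reaction names are invented for illustration.

```python
import re

def absent_genes(detection_p, cutoff=0.05):
    """Hypothetical present/absent call: a gene counts as absent when its
    detection p-value is not below the cutoff."""
    return {gene for gene, p in detection_p.items() if p >= cutoff}

def gpr_is_active(rule, absent):
    """Evaluate a GPR rule such as '(G1 and G2) or G3' with absent genes set to False."""
    if not rule.strip():
        return True  # reactions without a GPR rule are kept
    tokens = re.findall(r"\(|\)|\band\b|\bor\b|[^\s()]+", rule)
    expr = " ".join(t if t in ("(", ")", "and", "or") else str(t not in absent)
                    for t in tokens)
    # expr now contains only True/False, and/or, and parentheses.
    return eval(expr, {"__builtins__": {}}, {})

def reactions_to_delete(gpr_rules, absent):
    """Reactions whose GPR rule evaluates to False given the absent genes."""
    return sorted(rxn for rxn, rule in gpr_rules.items() if not gpr_is_active(rule, absent))

# Hypothetical data: 4709 stands for a mandatory complex subunit, G3/G4 for isozymes.
detection_p = {"4709": 0.40, "G2": 0.001, "G3": 0.30, "G4": 0.002}
gpr_rules = {"R_complex_toy": "4709 and G2", "R_isozymes": "G3 or G4", "R_noGPR": ""}
absent = absent_genes(detection_p)
print(sorted(absent), reactions_to_delete(gpr_rules, absent))
```

The isozyme reaction survives because one of its alternative genes is still present, whereas the complex is removed as soon as one required subunit is absent.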
Model reactions unable to carry flux were identified through flux variability analysis (FVA) and removed, leaving the functional reaction sets that compose the final Molt-4 and CCRF-CEM models.
3.3.4 Condition-specific metabolic models for CCRF-CEM and Molt-4 cells
To determine whether we had obtained two distinct models, we evaluated the reactions, metabolites, and genes of the two models. Both the Molt-4 and CCRF-CEM models contained approximately half of the reactions and metabolites present in the global model (Figure 3.1C). They were very similar to each other in terms of their reactions, metabolites, and genes. The Molt-4 model contained seven reactions that were not present in the CCRF-CEM model (CoA biosynthesis pathway and exchange reactions). In contrast, the CCRF-CEM model contained 31 unique reactions (arginine and proline metabolism, vitamin B6 metabolism, fatty acid activation, transport, and exchange reactions). There were two and 15 unique metabolites in the Molt-4 and CCRF-CEM models, respectively. Approximately three quarters of the global model genes remained in the condition-specific cell line models (Figure 3.1C). The Molt-4 model contained 15 unique genes, and the CCRF-CEM model had four unique genes. Both models lacked NADH dehydrogenase (complex I of the electron transport chain (ETC)) because a mandatory subunit (NDUFB3, Entrez Gene ID: 4709) was not expressed. Instead, the ETC was fueled by FADH2 originating from succinate dehydrogenase and from fatty acid oxidation, which, through flavoprotein electron transfer, could contribute to the same ubiquinone pool as complex I and complex II (succinate dehydrogenase). Despite their different in vitro growth rates (which differed by 11%, see Methods) and the differences in exo-metabolomic (Figure 3.1B) and transcriptomic data, the internal networks were largely conserved in the two condition-specific cell line models.
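The pruning step can be illustrated with a simplified FVA: for each reaction, minimize and maximize its flux subject to Sv = 0 and the flux bounds, and flag reactions whose range is [0, 0] as blocked. The sketch uses SciPy's `linprog` with the HiGHS solver on a hypothetical four-reaction network. Unlike FVA as commonly run in the COBRA Toolbox, it does not constrain the objective to a fraction of its optimum, which is the usual setting when searching for blocked reactions.

```python
import numpy as np
from scipy.optimize import linprog

def flux_variability(S, lb, ub):
    """Minimum and maximum steady-state flux of every reaction (no objective constraint)."""
    S = np.asarray(S, dtype=float)
    n_rxn = S.shape[1]
    bounds = list(zip(lb, ub))
    b_eq = np.zeros(S.shape[0])
    ranges = []
    for j in range(n_rxn):
        c = np.zeros(n_rxn)
        c[j] = 1.0
        low = linprog(c, A_eq=S, b_eq=b_eq, bounds=bounds, method="highs")
        high = linprog(-c, A_eq=S, b_eq=b_eq, bounds=bounds, method="highs")
        ranges.append((low.fun, -high.fun))
    return ranges

def blocked_reactions(S, lb, ub, tol=1e-9):
    """Indices of reactions whose flux range is [0, 0] (candidates for pruning)."""
    return [j for j, (vmin, vmax) in enumerate(flux_variability(S, lb, ub))
            if abs(vmin) < tol and abs(vmax) < tol]

# Hypothetical toy network: EX_A (-> A), R1 (A -> B), EX_B (B ->), R2 (A -> C).
# C has no consuming reaction, so R2 cannot carry flux at steady state.
S_toy = np.array([
    [1, -1, 0, -1],  # A
    [0, 1, -1, 0],   # B
    [0, 0, 0, 1],    # C
])
lb_toy = [0, 0, 0, 0]
ub_toy = [10, 1000, 1000, 1000]
print(blocked_reactions(S_toy, lb_toy, ub_toy))
```

Removing the flagged reactions (and any metabolites left without reactions) yields the functional subnetwork.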
3.3.5 Condition-specific cell line models predict distinct metabolic strategies
Despite the overall similarity of the metabolic models, differences in their cellular uptake and secretion patterns suggested distinct metabolic states in the two cell lines (Figure 3.1B; see Methods section for more detail). To interrogate the metabolic differences, we sampled the solution space of each model using an Artificial Centering Hit-and-Run (ACHR) sampler [163]. For this analysis, additional constraints were applied, emphasizing the quantitative differences in the metabolites taken up and secreted by both cell lines. The maximum possible uptake and secretion flux rates were reduced according to the measured relative differences between the cell lines (Figure 3.1D, see Methods section). For each reaction, we plotted a histogram of the flux values across all sample points. These binned histograms approximate the probability that a particular reaction carries a certain flux value. A comparison of the sample points obtained for the Molt-4 and CCRF-CEM models revealed a considerable shift in the distributions, suggesting a higher utilization of glycolysis by the CCRF-CEM model (Figure 3.2). This result was further supported by differences in the medians calculated from the sample points. The shift persisted throughout all reactions of the pathway and was induced by the higher glucose uptake (35%) from the extracellular medium by CCRF-CEM cells. The sampling median for glucose uptake was 34% higher in the CCRF-CEM model than in the Molt-4 model (Figure 3.2). The usage of the TCA cycle was also distinct in the two condition-specific cell line models (Figure 3.3). Interestingly, the models used succinate dehydrogenase differently (Figure 3.3, Figure 3.4). The Molt-4 model utilized an associated reaction to generate FADH2, whereas in the CCRF-CEM model the histogram was shifted in the opposite direction, toward the generation of succinate. Additionally, there was a higher efflux of citrate toward amino acid and lipid metabolism in the CCRF-CEM model (Figure 3.3). Flux through anaplerotic and cataplerotic reactions was higher in the CCRF-CEM model than in the Molt-4 model (Figure 3.3); these reactions include the efflux of citrate through ATP-citrate lyase, uptake of glutamine, generation of glutamate from glutamine, transamination of pyruvate and glutamate to alanine and 2-oxoglutarate, secretion of nitrogen, and secretion of alanine. The Molt-4 model showed higher utilization of oxidative phosphorylation (Figure 3.4), again supported by elevated median flux through ATP synthase (36%) and other enzymes contributing to higher oxidative metabolism. The sampling analysis therefore revealed different usage of central metabolic pathways by the condition-specific models.
Figure 3.2: Histograms of sampling points differed between the CCRF-CEM model (red) and the Molt-4 model (blue) for 10 glycolysis reactions. Negative values in the histograms and the table describe reaction fluxes in the reverse direction of reversible reactions. The table provides the median values of the sampling results.
3.4 Experimental validation of energy and redox status of CCRF-CEM and Molt-4 cells
Cancer cells have to balance their needs for energy and biosynthetic precursors, and they have to maintain redox homeostasis to proliferate [61]. We conducted enzymatic assays of cell lysates to measure levels and/or ratios of ATP, NADPH + NADP+, NADH + NAD+, and glutathione. These measurements were used to provide support for the in silico predicted metabolic differences (Figure 3.5). Additionally, an Oxygen Radical Absorbance Capacity (ORAC) assay was used to evaluate the cellular antioxidant status (Figure 3.5B). Total concentrations of NADH + NAD+, GSH + GSSG, NADPH + NADP+, and ATP were higher in Molt-4 cells (Figure 3.5A).
The higher ATP concentration in Molt-4 cells could result either from high production rates or from intracellular accumulation connected to high or low reaction fluxes (Figure 3.5A). Our simplified expectation that the oxidative Molt-4 cells produce less ATP was contradicted by the higher ATP concentrations measured (Figure 3.5L). However, we emphasize that concentrations cannot be compared directly with flux values, since we model at steady state. NADH/NAD+ ratios for both cell lines were shifted toward NADH (Figure 3.5D-E), but the shift was more pronounced in CCRF-CEM cells (Figure 3.5E), which matched our expectation based on the higher utilization of glycolysis and 2-oxoglutarate dehydrogenase in the CCRF-CEM model (Figure 3.5L). The mitochondrial membrane has been suggested to be the quantitatively most important physiological source of superoxide in higher organisms [197]. If Molt-4 cells relied more on mitochondrial respiration, we expected them to counteract the increased oxidative stress by using antioxidant systems such as glutathione and NADPH (Figure 3.5L). Indeed, Molt-4 cells showed a higher capacity for reactive oxygen species (ROS) detoxification than CCRF-CEM cells (Figure 3.5B), which was supported by the higher utilization of oxidative phosphorylation and superoxide dismutase by the Molt-4 model (SPODM; median CCRF-CEM = 0.0010 U, Molt-4 = 0.0011 U) (Figure 3.5L). Reduced glutathione (GSH) is of major importance for the clearance of ROS [198]. GSH/GSSG ratios were shifted toward GSH in both cell lines (CCRF-CEM = 747:51, Molt-4 = 1182:56), and the shift was more pronounced in Molt-4 cells (Figure 3.5K). Both cell lines had low NADPH/NADP+ ratios (CCRF-CEM: 4.7:2.8, Molt-4: 6:11.5). However, in Molt-4 cells the ratio was shifted toward NADP+, whereas CCRF-CEM cells contained higher amounts of NADPH (Figure 3.5G-H).
This matched our expectation that the glycolytic CCRF-CEM model would produce more NADPH (Figure 3.5L) and that it would exhibit higher flux through the oxidative phase of the pentose phosphate pathway. Taken together, the experimental data agreed well with our expectations based on the predicted phenotypes. We sought additional support for the predicted metabolic differences in the transcriptomic data.
3.5 Comparison of network utilization and alteration in gene expression
Assuming that differential expression of particular genes would cause changes in reaction flux, we determined how the differences in gene expression (between CCRF-CEM and Molt-4 cells) compared to the flux differences observed in the models. Specifically, we checked whether the reactions associated with upregulated genes (significantly more expressed in CCRF-CEM cells than in Molt-4 cells) were indeed more utilized by the CCRF-CEM model, and whether downregulated genes were associated with reactions more utilized by the Molt-4 model. The set of 12 downregulated genes was associated with 15 reactions, and the set of 49 upregulated genes was associated with 113 reactions in the models. Reactions were defined as differentially utilized if the difference in flux exceeded 10% (considering only non-loop reactions). Of the reactions associated with upregulated genes, 72.57% were more utilized by the CCRF-CEM model, and 2.65% were more utilized by the Molt-4 model (Table 3.3). Contrary to expectation, all 15 reactions associated with the 12 downregulated genes were also more utilized by the CCRF-CEM model. After this initial analysis, we approached the question from a different angle, asking whether the majority of the reactions associated with each individual gene upregulated in CCRF-CEM cells were more utilized by the CCRF-CEM model. We found that this was the case for 77.55% of the upregulated genes.
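The text defines a reaction as differentially utilized when the flux difference exceeds 10%, but does not state how the difference is normalized. The Python sketch below assumes, hypothetically, that utilization is the absolute sampling median (so reactions running in reverse are compared by magnitude) and that the relative difference is taken with respect to the larger of the two medians. The reaction names and sampled fluxes are invented.

```python
from statistics import median

def utilization_call(flux_a, flux_b, threshold=0.10):
    """Return 'A', 'B' or None for one reaction, given sampled fluxes of two models.

    Assumptions (not stated in the text): utilization is the absolute sampling
    median, and the relative difference is taken with respect to the larger median.
    """
    m_a, m_b = abs(median(flux_a)), abs(median(flux_b))
    larger = max(m_a, m_b)
    if larger == 0 or (larger - min(m_a, m_b)) / larger <= threshold:
        return None
    return "A" if m_a > m_b else "B"

def percent_more_utilized(samples_by_reaction, reactions):
    """Percentage of `reactions` more utilized by model A and by model B."""
    calls = [utilization_call(*samples_by_reaction[r]) for r in reactions]
    n = len(calls)
    return round(100 * calls.count("A") / n, 2), round(100 * calls.count("B") / n, 2)

# Hypothetical sampled fluxes (model A, model B) for three reactions.
toy_samples = {
    "r1": ([1.0, 1.2, 1.1], [0.5, 0.6, 0.55]),
    "r2": ([2.0, 2.0, 2.0], [2.05, 2.0, 2.1]),
    "r3": ([-3.0, -3.2, -3.1], [-1.0, -1.1, -0.9]),  # reverse direction: compare magnitudes
}
print(percent_more_utilized(toy_samples, ["r1", "r2", "r3"]))
```

Applied to the reactions linked to a gene set (for example, all reactions associated with upregulated genes), the two percentages correspond to the kind of summary reported in Table 3.3.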
The majority of reactions associated with two (16.67%) of the downregulated genes were more utilized by the Molt-4 model. Taken together, our comparisons of the direction of gene expression with the fluxes of the two cancer cell line models confirmed that reactions associated with genes upregulated in CCRF-CEM cells were generally more utilized by the CCRF-CEM model.
3.6 Accumulation of DEGs and AS genes at key metabolic steps
After confirming that most reactions associated with upregulated genes were more utilized by the CCRF-CEM model, we examined where the differentially expressed genes were located within the network. In this analysis, we paid special attention to the central metabolic pathways that we had found to be utilized differently by the two models. Several differentially expressed genes (DEGs) and alternative splicing (AS) events were associated with glycolysis, the ETC, pyruvate metabolism, and the pentose phosphate pathway (PPP) (Table 3.1). Moreover, in glycolysis, DEGs and/or AS genes were associated with all three rate-limiting steps, i.e., the steps catalyzed by hexokinase, phosphofructokinase, and pyruvate kinase. Of these key enzymes, hexokinase 1 (Entrez Gene ID: 3098) was alternatively spliced, and pyruvate kinase (PKM, Entrez Gene ID: 5315) was significantly more expressed in CCRF-CEM cells (Table 3.1), in agreement with the higher in silico predicted flux. However, in contrast to the higher utilization of glycolysis in the CCRF-CEM model, a gene associated with the rate-limiting phosphofructokinase step (Entrez Gene ID: 5213) was significantly upregulated in Molt-4 cells relative to CCRF-CEM cells. Notably, this higher expression was detected for only a single isozyme. Two of the three genes associated with phosphofructokinase were also subject to alternative splicing (Table 3.1).
In addition to the key enzymes, fructose-bisphosphate aldolase (Entrez Gene ID: 230) was also significantly upregulated in Molt-4 cells relative to CCRF-CEM cells, again in contrast to the predicted higher utilization of glycolysis in the CCRF-CEM model. Additionally, glucose-6-phosphate dehydrogenase (G6PD), which catalyzes the first reaction and committed step of the pentose phosphate pathway (PPP), was an AS gene (Table 3.1). A second AS gene, RBKS (Entrez Gene ID: 64080), was associated with the PPP reaction catalyzed by deoxyribokinase. This gene is also associated with ribokinase, but ribokinase was removed during model construction because neither ribose uptake nor ribose secretion was observed. Single AS genes were associated with different complexes of the ETC (Table 3.1). A literature search revealed that at least 13 of the genes with alternative splicing events had previously been mentioned in connection with both alternative splicing and cancer, and that 37 genes had been associated with cancer, e.g., reported as upregulated or downregulated at the mRNA or protein level, or otherwise connected to cancer metabolism and signaling. Notably, metabolite transporters were surprisingly frequent among the alternatively spliced genes. Overall, the high incidence of differential gene expression events at metabolic control points increases the plausibility of the in silico predictions.
Table 3.1: Differentially expressed genes (DEGs) and alternative splicing (AS) events of central metabolic and cancer-related pathways. Full lists of DEGs and AS events are provided in the supplementary material. Upregulated = significantly more expressed in CCRF-CEM than in Molt-4 cells. PPP = pentose phosphate pathway; Glycolysis/Glucon. = glycolysis/gluconeogenesis; Pyruvate Met. = pyruvate metabolism; OxPhos = oxidative phosphorylation.
Subsystem | DEG/AS-associated reaction | Median Molt-4 | Median CCRF-CEM
Glycolysis/Glucon. | ALDD2xm | 0.040 | 0.051
Glycolysis/Glucon. | FBA | 12.898 | 18.800
Glycolysis/Glucon. | G3PD2m | 0.068 | 0.191
Glycolysis/Glucon. | PYK | 36.115 | 54.746
Glycolysis/Glucon. | ALDD2x | 0.035 | 0.050
Glycolysis/Glucon. | ALDD2y | 0.039 | 0.052
Glycolysis/Glucon. | G6PPer | 0.099 | 0.138
Glycolysis/Glucon. | PDHm | 0.351 | 0.162
Glycolysis/Glucon. | PFK | 13.041 | 18.995
Glycolysis/Glucon. | HEX1 | 6.217 | 9.835
Glycolysis/Glucon. | PGK | -36.230 | -54.935
Pyruvate Met. | ALCD21_D | 326.100 | 327.300
Pyruvate Met. | ALCD21_L | 129.365 | 128.372
Pyruvate Met. | ALCD22_D | 291.679 | 289.357
Pyruvate Met. | ALCD22_L | 129.260 | 128.219
Pyruvate Met. | LCADi | 0.073 | 0.100
Pyruvate Met. | LCADi_D | 0.072 | 0.100
Pyruvate Met. | PCm | 0.241 | 1.300
Pyruvate Met. | LALDD | 338.276 | 345.473
Pyruvate Met. | ME2m | 0.221 | 0.178
OxPhos | NADH2_u10m | – | –
OxPhos | ATPS4m | 3.825 | 2.455
OxPhos | CYOR_u10m | 2.506 | 1.563
PPP | DRBK | 0.146 | 0.196
PPP | G6PDH2r | 0.125 | 0.109
DEG Entrez Gene IDs: 219, 230, 2820, 5315, 223, 223, 92579, 1737, 5213, 284273, 284273, 284273, 284273, 223, 223, 5091.
Direction of change: upregulated (6×), downregulated, upregulated, downregulated, upregulated (6×), downregulated, upregulated.
AS Entrez Gene IDs: 4905, 1537, 64080, 2539, 8854, 8854, 92579, 1737, 5211;5213, 3098, 5230, 284273, 284273, 284273, 284273, 8854, 8854, 5091, 9380, 10873.
3.7 Single gene deletion
Analyses of essential genes in metabolic models have been used to predict candidate drug targets for cancer cells [24].
Here, we conducted an in silico gene deletion study for all model genes to identify a unique set of knock-out (KO) genes for each condition-specific cell line model. The analysis yielded 63 shared lethal KO genes and distinct sets of lethal KO genes for the CCRF-CEM model (11 genes) and the Molt-4 model (three genes). Three of the unique CCRF-CEM KO genes were present only in the CCRF-CEM model (Table 3.4). The essential genes of both models were then related to the cell-line-specific differences in metabolite uptake and secretion (Figure 3.1B). The CCRF-CEM model needed to generate putrescine from ornithine (ORNDC, Entrez Gene ID: 4953) to subsequently produce 5-methylthioadenosine for secretion (Figure 3.1B). S-adenosylmethioninamine, produced by adenosylmethionine decarboxylase (arginine and proline metabolism, associated with Entrez Gene ID: 262), is a substrate required for the generation of 5-methylthioadenosine. Another example of a KO gene connected to an enforced exchange reaction was glutamic-oxaloacetic transaminase 1 (GOT1, Entrez Gene ID: 2805). The CCRF-CEM model was forced to secrete 4-hydroxyphenylpyruvate (Figure 3.1B), the second product of tyrosine transaminase, which is produced only by that enzyme; without GOT1, this enforced secretion could not be sustained. One KO gene of the Molt-4 model (Entrez Gene ID: 26227) was associated with phosphoglycerate dehydrogenase, which catalyzes the conversion of 3-phospho-D-glycerate to 3-phosphohydroxypyruvate while generating NADH from NAD+. This KO gene is particularly interesting, given the involvement of this reaction in a novel pathway for ATP generation in rapidly proliferating cells [100, 148, 199]. Reactions associated with unique KO genes were in many cases utilized more by the model in which the gene KO was lethal, underlining the potential importance of these reactions for the models. Thus, single gene deletion provided unique sets of lethal genes that could be specifically targeted to kill these cells.
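A minimal sketch of in silico single gene deletion, assuming the usual constraint-based procedure: for each gene, switch off the reactions whose GPR rule becomes false, maximize the biomass flux by linear programming (flux balance analysis), and call the gene lethal if growth drops below a small fraction of the wild-type optimum. The 1% threshold, the toy network, and the GPR rules are hypothetical; the text does not state the lethality cutoff used. The LP is solved with SciPy's `linprog` (HiGHS).

```python
import re
import numpy as np
from scipy.optimize import linprog

def gpr_active(rule, knocked_out):
    """Evaluate a GPR rule such as '(G2 and G3) or G4' with knocked-out genes set to False."""
    if not rule.strip():
        return True  # reactions without a GPR rule are unaffected
    tokens = re.findall(r"\(|\)|\band\b|\bor\b|[^\s()]+", rule)
    expr = " ".join(t if t in ("(", ")", "and", "or") else str(t not in knocked_out)
                    for t in tokens)
    return eval(expr, {"__builtins__": {}}, {})  # expr holds only True/False, and/or, parentheses

def max_growth(S, lb, ub, biomass_idx):
    """Flux balance analysis: maximize the biomass flux subject to S v = 0 and bounds."""
    c = np.zeros(S.shape[1])
    c[biomass_idx] = -1.0  # linprog minimizes, so minimize -v_biomass
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=list(zip(lb, ub)), method="highs")
    return -res.fun if res.status == 0 else 0.0

def lethal_genes(S, lb, ub, rules, biomass_idx, fraction=0.01):
    """Genes whose single deletion lowers maximal growth below fraction * wild type."""
    wild_type = max_growth(S, lb, ub, biomass_idx)
    genes = sorted({tok for rule in rules for tok in re.findall(r"[^\s()]+", rule)}
                   - {"and", "or"})
    lethal = []
    for gene in genes:
        lb_ko, ub_ko = list(lb), list(ub)
        for j, rule in enumerate(rules):
            if not gpr_active(rule, {gene}):
                lb_ko[j] = ub_ko[j] = 0.0  # reaction switched off by the deletion
        if max_growth(S, lb_ko, ub_ko, biomass_idx) < fraction * wild_type:
            lethal.append(gene)
    return lethal

# Hypothetical toy network: EX_A (-> A), R1 (A -> B), R2 and R3 (B -> C), BIO (C ->).
S_toy = np.array([
    [1, -1, 0, 0, 0],    # A
    [0, 1, -1, -1, 0],   # B
    [0, 0, 1, 1, -1],    # C
])
lb_toy = [0, 0, 0, 0, 0]
ub_toy = [10, 1000, 1000, 1000, 1000]
rules_toy = ["", "G1", "G2 and G3", "G4", ""]  # R2 is a two-subunit complex, R3 an alternative
print(lethal_genes(S_toy, lb_toy, ub_toy, rules_toy, biomass_idx=4))
```

Only G1 is essential here, because the loss of either R2 subunit is compensated by the alternative route through R3; running the same loop on two models and comparing the resulting sets gives shared and model-specific lethal genes.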
3.8 Discussion
In the current study, we explored the possibility of semi-quantitatively integrating metabolomic data with the human genome-scale reconstruction to facilitate data analysis. By constructing condition-specific cell line models to provide a structured framework, we derived insights that could not have been obtained from data analysis alone. We derived condition-specific cell line models for CCRF-CEM and Molt-4 cells that were able to explain the observed exo-metabolomic differences (Figure 3.1B). Despite the overall similarities between the models, the analysis revealed distinct usage of central metabolic pathways (Figures 3.2-3.4), which we validated based on experimental data and differential gene expression. The additional data supported the metabolic differences between these cell lines, providing confidence in the generated models and the model-based predictions. We used the validated models to predict unique sets of lethal genes to identify weak links in each model. These weak links may represent potential drug targets. Integrating omics data with the human genome-scale reconstruction provides a structured framework (i.e., pathways) that is based on careful consideration of the available biochemical literature [9]. This network context can simplify omics data analysis, and it allows even non-experts in biochemistry to gain fast and comprehensive insights into the metabolic aspects of omics data sets. Compared to transcriptomic data, methods for the integration and analysis of metabolomic data in the context of metabolic models are less well established, although this is an active field of research [194, 195]. In contrast to other studies, our approach emphasizes the representation of experimental conditions rather than the reconstruction of a generic, cell-line-specific network, which would require the combination of data sets from many experimental conditions and extensive manual curation.
Instead, our way of constructing models allowed us to efficiently assess the metabolic characteristics of cells. Although only a limited number of exchanged metabolites can be measured by available metabolomics platforms within a reasonable time scale, and the pathways of some measured metabolites may still be unknown (Tables 3.6 and 3.8), our methods have the potential to reveal metabolic characteristics of cells that could be useful for biomedicine and personalized health. The reasons why some cancers respond to certain treatments and others do not remain unclear, and choosing a treatment for a specific patient is often difficult [199]. One potential application of our approach could be the characterization of cancer phenotypes to explore how cancer cells, or other cell types with particular metabolic characteristics, respond to drugs. The generation of our condition-specific cell line models involved only limited manual curation, making this approach a fast way to place metabolomic data into a network context. Model building mainly involves the stringent reduction of metabolite exchanges to match the observed metabolite exchange pattern with as few additional metabolite exchanges as possible. It should be noted that this reduction determines which pathways can be utilized by the model. Our approach mostly conserved the internal network redundancy. However, a more substantial reduction may be achieved using different data. Generally, a trade-off exists between the reduction of the internal network and the increasing number of network gaps that need to be curated using additional omics data, such as transcriptomics and proteomics. One way to prevent the emergence of network gaps would be to use mapping algorithms that conserve network functionality, such as GIMME [131]. However, several additional methods exist for the integration of transcriptomic data [129], and the most suitable model-building method depends on the available data.
Interestingly, the lack of a significant contribution of our gene expression data to the reduction of network size suggests that the use of transcriptomic data is not necessary to identify distinct metabolic strategies; rather, the integration of exo-metabolomic data alone may provide sufficient insight. However, sampling of the cell line models constrained according to the exo-metabolomic profiles only, or increasing the cutoff for the generation of absent and present calls (p<0.01), did not yield the same insights as presented herein. Only recently has Gene Inactivation Moderated by Metabolism, Metabolomics and Expression (GIM(3)E) become available, which enforces a minimum turnover of detected metabolites based on intracellular metabolomic data as well as gene expression microarray data [138]. In contrast to this approach, we focused our analysis on the relative differences in the exo-metabolomic data of two cell lines. GIM(3)E is an alternative integration method when the analysis focuses on intracellular metabolomic data [138]. The metabolic differences predicted by the models are generally plausible. Cancers are known to be heterogeneous [61], and the contribution of oxidative phosphorylation to cellular ATP production may vary [147]. Moreover, leukemia cell lines have been shown to depend on glucose, glutamine, and fatty acids to varying extents to support proliferation. Such dependence may cause the cells to adapt their metabolism to the environmental conditions [187]. In addition to identifying supporting data in the literature, we performed several analyses to validate the models and model predictions. Our expectations regarding the levels and ratios of metabolites relevant to the energy and redox state were largely met (Figure 3.5L).
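The redox ratios behind these comparisons are simple quotients of the values reported in Section 3.4. The short Python sketch below computes them from those reported numbers and flags the direction of the NADPH/NADP+ shift; it adds no new data, only the arithmetic.

```python
# GSH/GSSG and NADPH/NADP+ values as reported in Section 3.4.
measurements = {
    "CCRF-CEM": {"GSH": 747, "GSSG": 51, "NADPH": 4.7, "NADP+": 2.8},
    "Molt-4": {"GSH": 1182, "GSSG": 56, "NADPH": 6.0, "NADP+": 11.5},
}

def redox_ratios(m):
    """Return the GSH/GSSG and NADPH/NADP+ ratios for one cell line."""
    return {
        "GSH/GSSG": m["GSH"] / m["GSSG"],
        "NADPH/NADP+": m["NADPH"] / m["NADP+"],
    }

for cell_line, values in measurements.items():
    r = redox_ratios(values)
    # A ratio below 1 means the pool is shifted toward the oxidized form.
    shift = "NADPH" if r["NADPH/NADP+"] > 1 else "NADP+"
    print(f"{cell_line}: GSH/GSSG = {r['GSH/GSSG']:.2f}, "
          f"NADPH/NADP+ = {r['NADPH/NADP+']:.2f} (shifted toward {shift})")
```

This prints a GSH/GSSG ratio of about 14.65 for CCRF-CEM versus 21.11 for Molt-4, and an NADPH/NADP+ ratio of about 1.68 versus 0.52, which makes the stronger shift toward GSH and toward NADP+ in Molt-4 cells explicit.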
The more pronounced shift of the NADH/NAD+ ratio toward NADH in CCRF-CEM cells was in agreement with the predicted Warburg phenotype (Figure 3.5), and the higher lactate secretion by CCRF-CEM cells (Figure S2) implies an increase in NADH relative to NAD+ [200, 201], again matching the known Warburg phenotype. ROS production is enhanced in certain types of cancer [198, 202], and the generation of ROS is thought to contribute to mutagenesis, tumor promotion, and tumor progression [202, 203]. However, decreased mitochondrial glucose oxidation and a transition to aerobic glycolysis protect cells against ROS damage during biosynthesis and cell division [204]. The higher ROS detoxification capability of Molt-4 cells, in combination with the higher superoxide dismutase utilization by the Molt-4 model (Figure 3.5), provided a consistent picture of the predicted respiratory phenotype (Figure 3.5L). Control of NADPH maintains the redox potential through GSH and protects against oxidative stress, yet changes in the NADPH/NADP+ ratio in response to oxidative damage are not well understood [205]. Under stress conditions, as assumed for Molt-4 cells, the NADPH/NADP+ ratio is expected to decrease because of the continuous reduction of GSSG (Figure 3.5L), and this was confirmed in Molt-4 cells (Figure 3.5). The higher amounts of GSH found in Molt-4 cells in vitro may reflect an additional need for ROS scavengers due to a greater reliance on oxidative metabolism. Cancer is related to metabolic reprogramming, which results from alterations of gene expression and the expression of specific isoforms or splice forms to support proliferation [206, 207]. The gene expression differences detected between the two cell lines in the present study supported the existence of metabolic differences between these cell lines, particularly because key steps of the metabolic pathways central to cancer metabolism seemed to be differentially regulated (Table 3.1).
A detailed analysis of the effects of these differences on pathway fluxes exceeds the scope of this study, which was to demonstrate the potential of integrating exo-metabolomic data into the network context. We found discrepancies between differential gene regulation and the flux differences between the two models, as well as in the utilization of AS gene-associated reactions. This is not surprising: a detailed analysis of the system would be required before drawing conclusions about the impact of differential regulation or splicing on reaction flux, since isozymes exist for many of the affected enzymes, or only one of several subunits of a protein complex was affected. Additionally, reaction fluxes are regulated by numerous post-translational factors, e.g., protein modification and inhibition by proteins or metabolites [208], which are beyond the scope of constraint-based steady-state modeling. Rather, the results of the presented approach demonstrate how the models can be used to generate informed hypotheses that can guide experimental work. The combination of our tailored metabolic models and differential gene expression analysis seems well suited to determine the potential drivers of metabolic differences between cells. Such information could be valuable for drug discovery, especially when more peripheral metabolic pathways are considered. Additionally, statistical comparisons of gene expression data with sampling-derived flux data could be useful in future studies [144]. A single gene deletion analysis revealed that phosphoglycerate dehydrogenase (PGDH) was a lethal KO gene for the Molt-4 model only. Differences in PGDH protein levels correspond to the amount of glycolytic carbon diverted into glycine biosynthesis.
Rapidly proliferating cells may use an alternative glycolytic pathway for ATP generation, which may provide an advantage in the case of extensive oxidative phosphorylation and proliferation [100, 148, 199]. For breast cancer cell lines, a variable dependency on PGDH expression has already been demonstrated [148]. This example of a unique KO gene demonstrates how in silico gene deletion in metabolomics-driven models can identify the metabolic pathways used by cancer cells, which can provide valuable information for drug discovery. In conclusion, our contextualization method produced metabolic models that agreed in many ways with the validation data sets. The analyses described in this study have great potential to reveal the mechanisms of metabolic reprogramming, not only in cancer cells but also in other cells affected by disease, and for drug discovery in general.
3.9 Materials and methods
Global model
The model we used (global model) was a subset of Recon 2 [53], which is freely available (http://humanmetabolism.org/). Transport and exchange reactions for the metabolite uptakes and secretions detected herein had already been considered in the construction of Recon 2. The model captured additional reactions (Table 3.2).
Cell culture
Molt-4 and CCRF-CEM cells were obtained from ATCC (CRL-1582 and CCL-119) and routinely grown in RPMI 1640 with 2 mM GlutaMax and 10% FBS (Invitrogen; 61870-010, 10108-57) in a humidified incubator at 37 °C and 5% CO2. At least 3 days before experiments, the medium was changed to serum-free medium (Advanced RPMI 1640 containing 2 mM GlutaMax; Invitrogen; 12633-012, 35050038). The medium was refreshed the day before starting the experiment. For experiments, cells were centrifuged at 201 × g for 5 min and resuspended in serum-free medium containing DMSO (0.67%) at a concentration of 5 × 10^5 cells/mL.
Cell suspension (1 or 2 mL) was seeded in triplicate in 24-well or 12-well plates, respectively. At the indicated times, the cells were removed by centrifugation and the spent medium was frozen at −80 °C. Cell number, size and viability (Trypan blue exclusion) were determined using an automatic cell counter, Countess (Invitrogen) (Figure 3.6).
Analysis of the extracellular metabolome
Mass spectrometry analysis of the exo-metabolome was performed by Metabolon, Inc. (Durham, NC, USA) using a standardized analytical platform. In total, 75 extracellular metabolites were detected in the initial data set for at least one of the two cell lines [196]. Of these metabolites, 15 were not part of our global model and were discarded. Besides their absence from our global model, an independent search in HMDB [209] showed that no pathway information was available for most of these metabolites. It should be noted that some of these metabolites, e.g., N-acetylisoleucine, N-acetylmethionine or pseudouridine, are protein and RNA degradation products, which were outside the scope of the metabolic network. Thiamin (vitamin B1) was part of the minimal medium of essential compounds supplied to both models. Riboflavin (vitamin B2) and trehalose were excluded since these compounds cannot be produced by human cells. Erythrose and fructose were also excluded. In contrast, 46 metabolites were part of the global model. The data set included two different time points, which allowed us to treat the increase or decrease of a metabolite signal between time points as evidence for secretion or uptake, respectively, when the change differed by more than 5% from that observed in the control (Tables 3.5, 3.6, 3.7 and 3.8). We found 12 metabolites that were taken up by both cell lines and 10 metabolites that were secreted by both cell lines over the course of the experiment.
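The 5% criterion above can be expressed as a small decision rule. The sketch below is a hypothetical helper, not the authors' code; it assumes that the fold change (FC) is the late-over-early signal ratio, and that the difference between the FC in spent medium and the FC in cell-free control medium is compared with an absolute cutoff of 0.05, as suggested by the "Difference of FC" and "5% cutoff" entries in Tables 3.5 to 3.8.

```python
from dataclasses import dataclass

# Changes within 5% of the cell-free control are not called (from the text).
CUTOFF = 0.05


@dataclass
class Signal:
    """Mean metabolite signal at the two sampled time points."""
    early: float
    late: float

    @property
    def fold_change(self) -> float:
        # Assumption: FC is defined as late / early.
        return self.late / self.early


def call_exchange(cells: Signal, control: Signal, cutoff: float = CUTOFF) -> str:
    """Return 'uptake', 'secretion' or 'no call' for one metabolite.

    Hypothetical helper: the FC in spent medium is compared with the FC
    in cell-free control medium over the same interval.
    """
    diff = cells.fold_change - control.fold_change
    if diff < -cutoff:
        return "uptake"      # signal fell by more than in the control
    if diff > cutoff:
        return "secretion"   # signal rose by more than in the control
    return "no call"         # within the 5% band around the control


# Example with made-up signals: a 40% drop against a flat control.
print(call_exchange(Signal(100.0, 60.0), Signal(100.0, 100.0)))  # prints: uptake
```

Comparing against the control FC rather than against a fixed value means that spontaneous degradation or accumulation in cell-free medium is not mistaken for cellular exchange.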
Additionally, Molt-4 cells took up three metabolites not taken up by CCRF-CEM cells and secreted one metabolite not secreted by CCRF-CEM cells. Two of the three metabolites taken up only by Molt-4 cells were essential amino acids: valine and methionine. However, it is unlikely that these metabolites were not taken up by the CCRF-CEM cells, so the CCRF-CEM model was allowed to take them up. Because of this adjustment, no quantitative constraints were applied to these metabolites in the sampling analysis either. CCRF-CEM cells took up four and secreted seven metabolites whose exchange was not detected in Molt-4 cells.
Network refinement based on exo-metabolic data
Despite its comprehensiveness, the human metabolic reconstruction is not complete with respect to extracellular metabolite transporters [53, 58]. Accordingly, we identified from the literature transport systems for metabolites that were already part of the global model but whose extracellular transport was not yet accounted for. Diffusion reactions were included whenever a respective transporter could not be identified. In total, 31 reactions (11 exchange reactions, 16 transport reactions and seven demand reactions; Table 3.2) were added to Recon 2 [53], and two additional reactions were added to the global model (Table 3.2).
Expression profiling
Molt-4 and CCRF-CEM cells were grown in Advanced RPMI 1640 with 2 mM GlutaMax, and the cells were resuspended in medium containing DMSO (0.67%) at a concentration of 5 × 10^5 cells/mL. The cell suspension (2 mL) was seeded in 12-well plates in triplicate. After 48 h of growth, the cells were collected by centrifugation at 201 × g for 5 min. Cell pellets were snap-frozen in liquid N2 and kept frozen until RNA extraction and analysis by Aros (Aarhus, Denmark).
Analysis of transcriptomic data
We used the Affymetrix GeneChip Human Exon 1.0 ST Array to measure whole-genome exon expression.
We generated detection above background (DABG) calls using ROOT (version 22) and the XPS package for R (version 11.1), with Robust Multi-array Analysis (RMA) summarization. Calls for data mapping were assigned using p < 0.05 as the cutoff to distinguish presence from absence for the 1,278 model genes (Table 3.9; see Table S12 at http://link.springer.com/article/10.1007%2Fs11306-014-0721-3 for the mapping of probe sets to model genes). Differential gene expression and alternative splicing analyses were performed with AltAnalyze (v2.02beta) using default options on the raw data files (CEL files). The Homo sapiens Ensembl 65 database was used, probe set filtering was kept at DABG p < 0.05, and non-log expression < 70 was used for constitutive probe sets to determine gene expression levels. For the comparison, CCRF-CEM was the experimental group and Molt-4 the baseline group. Differentially expressed genes between the cell lines were identified using an FDR cutoff of p < 0.05 (Tables 3.10, 3.11). Alternative splicing analysis was performed on core probe sets with a minimum alternative exon score of 2 and a maximum absolute gene expression change of 3, because alternative splicing is a less critical factor among highly differentially expressed genes. Gene expression data, complete lists of DABG p-values, differentially expressed genes and alternative splicing events have been deposited in the Gene Expression Omnibus (GEO) database (accession number: GSE53123).
Deriving cell-type-specific subnetworks
Transcriptomic data were mapped to the model manually (COBRA function: deleteModelGenes). Specifically, reactions dependent on gene products that were called "absent" were constrained to zero, such that fluxes through these reactions were disabled.
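The mapping step can be illustrated with a minimal sketch. It is not the COBRA Toolbox implementation of deleteModelGenes; it assumes that gene-protein-reaction (GPR) rules are written as an OR of enzyme complexes, each complex being an AND of genes, so that a reaction is disabled only when no complex has all of its genes called present. Gene and reaction names in the example are hypothetical.

```python
from typing import Dict, List, Set

P_CUTOFF = 0.05  # DABG p-value cutoff used for presence calls


def present_genes(dabg_p: Dict[str, float], cutoff: float = P_CUTOFF) -> Set[str]:
    """Genes called 'present' (DABG p < cutoff); all others count as 'absent'."""
    return {gene for gene, p in dabg_p.items() if p < cutoff}


def reaction_active(gpr: List[Set[str]], present: Set[str]) -> bool:
    """Evaluate a GPR given as a list of complexes (OR of ANDs).

    Reactions without a GPR (empty list) are not affected by the mapping.
    """
    if not gpr:
        return True
    return any(subunits <= present for subunits in gpr)


def reactions_to_zero(gprs: Dict[str, List[Set[str]]], present: Set[str]) -> List[str]:
    """Reactions whose lower and upper bounds would be set to zero."""
    return sorted(rxn for rxn, gpr in gprs.items() if not reaction_active(gpr, present))


# Hypothetical example: R_iso has two isozymes, R_cplx needs two subunits.
calls = present_genes({"g1": 0.01, "g2": 0.20, "g3": 0.03})
gprs = {
    "R_iso": [{"g1"}, {"g2"}],
    "R_cplx": [{"g2", "g3"}],
    "R_nogene": [],
}
print(reactions_to_zero(gprs, calls))  # ['R_cplx']
```

The isozyme case (R_iso) shows why an absent call for one gene need not remove a reaction from the model.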
Submodels were extracted based on the set of reactions carrying flux (network pruning) by running fastFVA [162] after mapping the metabolomic and transcriptomic data using the COBRA toolbox [15].
Cell weight
We calculated the cell dry weight based on the relative volume difference in comparison with human osteosarcoma (U2OS) cells. The cell dry weight of U2OS cells, 60 pg [175], and their cell volume, 4,000 µm³ [110], were taken from the literature, as was the cell volume of lymphocytes (243 µm³, the average volume of lymphoblasts from patients with ALL [176]). The cell dry weight was calculated accordingly: 4000/243 = 16.46, and 60 pg/16.46 = 3.645 pg (3.645 × 10^-12 g).
Definition of maximum uptake rate and minimum uptake rate
The maximum uptake rate was defined by the RPMI medium concentrations, and the minimum uptake rate by the mass spectrometry detection limits. Therefore, both medium concentrations (mM) and detection limits (mM) were converted to flux values (mmol/gDW/hr) using a cell concentration of 2.17 × 10^6 (the concentration of viable CCRF-CEM cells after 48 h), an experimental duration of 48 h, and the calculated dry weight of 3.645 × 10^-12 g per cell: Flux = MetConc/(CellConc*CellWeight*T*1000). For uptake, the bounds were defined by the RPMI medium concentration (lower bound, lb) and the detection limit (upper bound, ub); for secretion, they were defined by the detection limit (lb) or left unconstrained (ub).
Setting general and qualitative exo-metabolomic constraints during model building
Medium-concentration-to-flux calculations were based on a cell weight of 3.645 × 10^-12 g, an initial cell concentration of 2.17 × 10^6, T = 48 h, and Flux = MetConc/(CellConc*CellWeight*T). We constrained the model by enforcing minimal flux through the exchange reactions of secreted or taken-up metabolites in the correct directions (qualitative constraints).
For uptake, the upper bound of the corresponding exchange reaction was set to the flux equivalent of the minimal detection limit [196], using the same equation as for the medium concentrations. For secretion, the lower bound of the exchange reaction was set to the minimum flux value based on the minimal detection limit (Table 3.12). The biomass reaction was constrained in a cell-line-specific manner. The experimental growth rate was 0.035 hr^-1 for CCRF-CEM and 0.032 hr^-1 for Molt-4 (Tables 3.13 & 3.14). Vmax and Vmin were set to allow a 20% deviation from the experimental growth rate in each direction. Oxygen uptake was constrained to Vmin = -2.346 mmol/gDW/hr [163]. All infinite flux bounds were set to the maximum of -500/500 mmol/gDW/hr. Alanine and glutamine are the breakdown products of GlutaMax in an external reaction, which the model did not account for. However, the glutamine concentration was used to calculate the uptake flux of glutamine, which was otherwise not present in the medium. The increase of both compounds therefore did not necessarily reflect actual secretion by the cells, as it may simply have reflected the breakdown of GlutaMax, although additional secretion by the cells cannot be excluded. For glutamine and alanine, the model exchanges therefore remained unconstrained (no qualitative or quantitative constraints), because the actual cell behavior could not be derived from the data, as it was overshadowed by accumulation resulting from the breakdown of GlutaMax (Tables 3.5, 3.6, 3.7 and 3.8). Uptake of the conditionally essential amino acid cysteine (which may not be produced in adequate amounts) was enabled. Repeated profiling of the two cell lines supported the uptake of these amino acids (unpublished data). All other exchange reactions were constrained to zero, except those for basic ions, basic medium compounds and essential amino acids.
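The unit conversions and bounds described in the preceding subsections can be collected in a short sketch. The numerical inputs (60 pg, 4,000 µm³, 243 µm³, 2.17 × 10^6 cells, 48 h, 20% growth tolerance, 500 as the open bound) are taken from the text; the sketch follows the form Flux = MetConc/(CellConc*CellWeight*T*1000) and assumes that the cell concentration is given per mL, so that the factor 1000 converts it to cells per liter. The function names are hypothetical, and exchange fluxes follow the COBRA sign convention in which uptake is negative.

```python
# Inputs taken from the text.
U2OS_DRY_WEIGHT_PG = 60.0
U2OS_VOLUME_UM3 = 4000.0
LYMPHOBLAST_VOLUME_UM3 = 243.0
CELL_CONC_PER_ML = 2.17e6     # assumed unit: viable cells per mL
DURATION_H = 48.0
FLUX_MAX = 500.0              # stand-in for an unbounded flux

# Dry weight scaled by volume: 60 pg * 243 / 4000 = 3.645 pg per cell, in grams.
CELL_WEIGHT_G = U2OS_DRY_WEIGHT_PG * LYMPHOBLAST_VOLUME_UM3 / U2OS_VOLUME_UM3 * 1e-12


def conc_to_flux(conc_mM: float,
                 cell_conc: float = CELL_CONC_PER_ML,
                 cell_weight_g: float = CELL_WEIGHT_G,
                 hours: float = DURATION_H) -> float:
    """Convert a concentration (mM) to a flux (mmol/gDW/hr)."""
    return conc_mM / (cell_conc * cell_weight_g * hours * 1000.0)


def exchange_bounds(direction: str, detection_limit_mM: float,
                    medium_conc_mM: float = 0.0) -> tuple:
    """(lb, ub) for an exchange reaction; negative flux means uptake."""
    if direction == "uptake":
        # lb from the medium concentration, ub from the detection limit
        return (-conc_to_flux(medium_conc_mM), -conc_to_flux(detection_limit_mM))
    if direction == "secretion":
        # lb from the detection limit, ub left open
        return (conc_to_flux(detection_limit_mM), FLUX_MAX)
    raise ValueError(f"unknown direction: {direction!r}")


def growth_bounds(growth_rate: float, tolerance: float = 0.2) -> tuple:
    """Biomass bounds allowing a 20% deviation in each direction."""
    return (growth_rate * (1.0 - tolerance), growth_rate * (1.0 + tolerance))


print(f"cell dry weight: {CELL_WEIGHT_G:.4e} g")
print("CCRF-CEM biomass bounds:", growth_bounds(0.035))
print("Molt-4 biomass bounds:", growth_bounds(0.032))
```

Because the uptake upper bound is the (negative) flux equivalent of the detection limit, any metabolite called as taken up must carry at least that much uptake flux, which is exactly what the qualitative constraints require.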
Definition of quantitative constraints
The constraints on the exchange reactions defined during model building were the same in both condition-specific cell line models (Figure 3.1D). For the analysis, we used the relative quantitative differences of commonly taken-up or secreted metabolites to further constrain the models (quantitative constraints). The model of the cell line that secreted more in the experiment was forced to secrete more by increasing the lower bound of the respective exchange reaction. The new lower bound was set proportionally to the difference in metabolite secretion in the experimental data (Figure 3.1D, C-D). Accordingly, we decreased the lower bound of the model for the cell line that showed less uptake of the influx metabolites (Figure 3.1D, A-B). For a list of the adjusted bounds, see Table 3.15. To estimate the adjustment ratio, we first calculated the fold change (FC) of each metabolite in the medium and in each cell line by comparing the 0 h and 48 h time points. Next, we compared the FC values to generate a slope (Slope = FC_cellline/FC_medium) for each cell line. In the last step, we calculated the slope ratios (Slope Ratio = Slope_CCRF-CEM/Slope_Molt-4), which were used for the adjustments (Figure 3.1D, colored x = Slope Ratio). Some metabolite exchanges were not adjusted, including those of phosphate, the essential amino acids histidine, valine and methionine, and L-cysteine, alanine and glutamine. The additional quantitative bounds were established to obtain a closer match to the phenotypes, so we refrained from adding constraints based on inconclusive data. Glutamine and alanine were the breakdown products of GlutaMax; instead of modeling the breakdown of GlutaMax, we left the bounds for these compounds unconstrained.
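The three-step ratio estimate can be written out directly. In this sketch (hypothetical function names, made-up signals), each argument is a pair of signals at 0 h and 48 h. Note that FC_medium enters both slopes, so it cancels in the Slope Ratio, which reduces to FC_CCRF-CEM/FC_Molt-4; the normalization to medium matters for the individual slopes but not for the ratio.

```python
from typing import Tuple

Pair = Tuple[float, float]  # (signal at 0 h, signal at 48 h)


def fold_change(pair: Pair) -> float:
    """FC between the 0 h and 48 h samples."""
    t0, t48 = pair
    return t48 / t0


def slope(cell_line: Pair, medium: Pair) -> float:
    """Slope = FC_cellline / FC_medium."""
    return fold_change(cell_line) / fold_change(medium)


def slope_ratio(ccrf_cem: Pair, molt4: Pair, medium: Pair) -> float:
    """Slope Ratio = Slope_CCRF-CEM / Slope_Molt-4."""
    return slope(ccrf_cem, medium) / slope(molt4, medium)


# Made-up secreted metabolite: CCRF-CEM triples the signal, Molt-4 raises it 1.5-fold,
# while the cell-free medium drifts up by 10%.
print(slope_ratio((100.0, 300.0), (100.0, 150.0), (100.0, 110.0)))
```

In this example the ratio is 2 regardless of the medium drift, so the CCRF-CEM model would be the one forced to secrete more.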
The ACHRsampler implemented in the COBRA toolbox [15] was used with 10,000 generated warm-up points, nFiles = 100, pointsPerFile = 5000 and stepsPerPoint = 2500, with the cell-line models as inputs.
Comparison of network utilization and DEGs/AS
The models shared a set of 1,907 reactions. We defined a reaction as differently utilized if the median values calculated from the sampling points differed by more than 10%. The shared reaction set was divided into three groups: x (median difference > 10% and higher in CCRF-CEM cells) = 1,381 reactions, y (median difference > 10% and higher in Molt-4 cells) = 158, and z (median difference < 10%, opposite directionality, or loop reactions) = 368. Loop reactions were defined by flux variability analysis (FVA) with the criteria minFlux = -500 and maxFlux = 500 (219 reactions in Molt-4, 220 reactions in CCRF-CEM).
Models cover equal amounts of differentially expressed and alternatively spliced genes
The GeneChip Human Exon 1.0 ST Array had been used to measure gene expression and exon variation in Molt-4 and CCRF-CEM cells. We derived sets of differentially expressed (DEG) and alternatively spliced (AS) genes by comparing gene expression between the two cell lines (Tables 3.10 & 3.11). The analysis yielded 57 Recon 1 genes with significantly higher expression in CCRF-CEM than in Molt-4 cells (upregulated) and 16 genes with significantly lower expression in CCRF-CEM than in Molt-4 cells (downregulated). To validate the models, we investigated how many of these genes remained part of the condition-specific models after the integration of the metabolomic data.
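The two comparisons described in this section, the median-based grouping of shared reactions and the check of DEG coverage, reduce to short computations. In the sketch below, the relative median difference is taken with respect to the larger absolute median, a reaction counts as a loop if its FVA range reaches ±500 in either model, and medians of opposite sign fall into group z; these are assumptions, since the text does not spell them out. For the coverage example, the 12 downregulated genes listed in Table 3.3 plus the four missing Entrez Gene IDs reported in this section stand in for the 16 downregulated DEGs, and the Table 3.3 genes stand in for the model gene set.

```python
import statistics
from typing import Sequence, Set, Tuple

THRESHOLD = 0.10    # >10% difference in median flux
FLUX_LIMIT = 500.0  # FVA range used to flag loop reactions


def is_loop(fva_range: Tuple[float, float], limit: float = FLUX_LIMIT) -> bool:
    """Loop reaction: FVA reaches both -limit and +limit."""
    min_flux, max_flux = fva_range
    return min_flux <= -limit and max_flux >= limit


def classify(ccrf: Sequence[float], molt: Sequence[float],
             fva_ccrf: Tuple[float, float], fva_molt: Tuple[float, float],
             threshold: float = THRESHOLD) -> str:
    """Assign a shared reaction to group x, y or z from sampled fluxes."""
    if is_loop(fva_ccrf) or is_loop(fva_molt):
        return "z"
    m_c, m_m = statistics.median(ccrf), statistics.median(molt)
    if m_c * m_m < 0:
        return "z"  # opposite directionality
    a_c, a_m = abs(m_c), abs(m_m)
    larger = max(a_c, a_m)
    if larger == 0 or (larger - min(a_c, a_m)) / larger <= threshold:
        return "z"  # medians differ by 10% or less
    return "x" if a_c > a_m else "y"  # x: higher in CCRF-CEM, y: higher in Molt-4


def coverage(degs: Set[int], model_genes: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Split DEGs into those kept in a model and those missing from it."""
    return degs & model_genes, degs - model_genes


# Entrez Gene IDs: downregulated genes in Table 3.3 and the four reported as missing.
TABLE_3_3 = {84706, 51301, 230, 63917, 2982, 2983, 4143, 4942, 5091, 10846, 5213, 5728}
MISSING_DOWN = {19, 875, 23657, 2944}

covered, missing = coverage(TABLE_3_3 | MISSING_DOWN, TABLE_3_3)
print(len(covered), sorted(missing))  # 12 [19, 875, 2944, 23657]
```

With real sampling output, classify would be applied to the 1,907 shared reactions and coverage to the full upregulated and downregulated DEG lists against each model's gene set.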
The CCRF-CEM and Molt-4 specific models covered the same subset of DEGs: 49 (66 transcripts) of the 57 upregulated genes and 12 (13 transcripts) of the 16 downregulated genes (missing downregulated genes, Entrez Gene IDs: 19, 875, 23657, 2944; missing upregulated genes, Entrez Gene IDs: 64131, 64772, 2581, 256435, 6799, 4697, 9951, 7263). They also covered the same number of reactions associated with these DEGs: 144 reactions for upregulated genes and 15 reactions for downregulated genes (Figure 3.1C). We identified 90 AS genes in the set of Recon 1 genes, and all AS genes remained in the condition-specific models. The CCRF-CEM and Molt-4 models both covered an equal set of 211 AS gene-associated reactions (Figure 3.1C). The gene expression data therefore contributed only marginally to the differentiation of the models: both models included the same DEGs and AS genes, as well as the same reactions associated with these gene sets.
Enzyme assays
Molt-4 and CCRF-CEM cells were grown as described previously and harvested in their respective log growth phase. Cell number, size and viability (Trypan blue exclusion) were determined using an automatic cell counter, Countess (Invitrogen). Cells were collected by centrifugation at 201 × g for 5 min, washed once with PBS and pelleted again by centrifugation. The cells were then resuspended in extraction buffer (0.1 M Tris, 2.5 mM EDTA, pH 7.75) to a density of 1 × 10^5 cells/µL and heated on a heat block set to 100 °C for 2 min, followed by cooling on ice. Following centrifugation at 20,000 × g, the supernatant fraction (hereafter metabolite extract: ME) was removed and stored at −80 °C prior to biochemical assays. ATP content was measured in 100× diluted ME using the CellTiter-Glo kit (Promega) according to the manufacturer's instructions, employing a SpectraMax M3 microplate reader.
NAD+ and NADH were measured in 5× diluted ME using the Amplite fluorometric NAD+/NADH ratio assay kit (AAT Bioquest) according to the manufacturer's instructions. NADP+ and NADPH were measured similarly using the Amplite fluorometric NADP+/NADPH ratio assay kit (AAT Bioquest). Oxidized and reduced glutathione were measured similarly in 10× diluted ME using the Amplite fluorometric GSH/GSSG ratio assay kit (AAT Bioquest). ROS were evaluated using a modified ORAC assay based on a method described by Ganske and Dell [210]. Briefly, 25 µL of ME or 25 µL of the standard 6-hydroxy-2,5,7,8-tetramethylchroman-2-carboxylic acid (Trolox, Sigma) were mixed with 150 µL of 10 nM fluorescein (Sigma) and 25 µL of 120 nM 2,2’-azobis(2-methylpropionamidine) dihydrochloride (AAPH, Sigma) in a transparent 96-well microplate. Following 15 s of mechanical shaking, fluorescence (ex: 485 nm, em: 580 nm) was monitored at 1 min intervals for 80 min at 37 °C. ORAC values were extrapolated from a Trolox standard curve using SoftMax Pro software and expressed as µmol of Trolox equivalents (T.E.) per 1 × 10^6 cells. All biochemical assay data shown represent triplicate averages, n = 2. All calculations were performed using the TomLab CPLEX linear solver and MATLAB.
Figure 3.3: Differences in the use of the TCA cycle by the CCRF-CEM model (red) and the Molt-4 model (blue). The table provides the median values of the sampling results. Negative values in the histograms and in the table describe reactions with flux in the reverse direction of reversible reactions. There are multiple reversible reactions for the transformation of isocitrate and α-ketoglutarate, malate and fumarate, and succinyl-CoA and succinate. These reactions are unbounded, and therefore histograms are not shown. The details of participating cofactors have been removed.
Atp = ATP; cit = citrate; adp = ADP; pi = phosphate; oaa = oxaloacetate; accoa = acetyl-CoA; coa = coenzyme A; icit = isocitrate; αkg = α-ketoglutarate; succ-coa = succinyl-CoA; succ = succinate; fum = fumarate; mal = malate; oxa = oxaloacetate; pyr = pyruvate; lac = lactate; ala = alanine; gln = glutamine; ETC = electron transport chain.
Figure 3.4: Sampling reveals different utilization of oxidative phosphorylation by the generated models. Different distributions are observed for the CCRF-CEM model (red) and the Molt-4 model (blue). Molt-4 has higher median flux through ETC reactions II-IV. The table provides the median values of the sampling results. Negative values in the histograms and in the table describe reactions with flux in the reverse direction of reversible reactions. Both models lack Complex I of the ETC because of constraints arising from the mapping of transcriptomic data. Electron transfer flavoprotein and electron transfer flavoprotein-ubiquinone oxidoreductase also both carry higher flux in the Molt-4 model.
Figure 3.5: A-K) Experimentally determined ATP, NADH + NAD, NADPH + NADP, and GSH + GSSG concentrations, and ROS detoxification in the CCRF-CEM and Molt-4 cells. L) Expectations for cellular energy and redox states, based on the predicted metabolic differences between the Molt-4 and CCRF-CEM models.
3.10 Supplementary material
This section captures tables published as supplementary material.
rxns | Formula | RxnName | lb | ub
34HPPte | 34hpp[e] <=> 34hpp[c] | 34HPPte | -1000 | 1000
3MOBte | 3mob[e] <=> 3mob[c] | 3MOBte | -1000 | 1000
3MOPte | 3mop[e] <=> 3mop[c] | 3MOPte | -1000 | 1000
4HPRO_LTte | 4hpro_LT[e] <=> 4hpro_LT[m] | trans-4-hydroxy-L-proline-transport | -1000 | 1000
4MOPte | 4mop[e] <=> 4mop[c] | 4MOPte | -1000 | 1000
5MTAte | 5mta[e] <=> 5mta[c] | 5MTAte | -1000 | 1000
5OXPROt | 2 na1[e] + 5oxpro[e] <=> 2 na1[c] + 5oxpro[c] | 5-oxoproline transport (sodium symport) (2:1) | -1000 | 1000
AHCYSte | ahcys[e] <=> ahcys[c] | AHCYSte | -1000 | 1000
AICARte | aicar[e] <=> aicar[c] | aicar transport | -1000 | 1000
ANTHte | anth[e] <=> anth[c] | p-aminobenzoate (PABA)/anthranilate transport | -1000 | 1000
ARGte | arg_L[e] <=> arg_L[c] | diffusion reaction for arginin | -1000 | 1000
CBASPte | cbasp[e] <=> cbasp[c] | N-carbamoylaspartate transport | -1000 | 1000
DM_4hrpo | 4hpro_LT[m] -> | demand reaction for trans-4-hydroxy-L-proline | 0 | 1000
DM_Lcystin | Lcystin[c] -> | demand reaction for Lcystin | 0 | 1000
DM_anth | anth[c] -> | demand reaction for PABA | 0 | 1000
DM_btn | btn[c] -> | demand reaction for biotin | 0 | 1000
DM_fol | fol[c] -> | demand reaction for Folate | 0 | 1000
DM_ncam | ncam[c] -> | demand reaction for Nicotinamide | 0 | 1000
DM_pnto_R | pnto_R[c] -> | demand reaction for (R)-Pantothenate | 0 | 1000
EX_34hpp | 34hpp[e] <=> | EX_34hpp | -1000 | 1000
EX_3mob(e) | 3mob[e] <=> | EX_3mob(e) | -1000 | 1000
EX_3mop(e) | 3mop[e] <=> | EX_3mop(e) | -1000 | 1000
EX_4mop(e) | 4mop[e] <=> | EX_4mop(e) | -1000 | 1000
EX_5mta(e) | 5mta[e] <=> | EX_5mta(e) | -1000 | 1000
EX_5oxpro(e) | 5oxpro[e] <=> | EX_5oxpro(e) | -1000 | 1000
EX_ahcys(e) | ahcys[e] <=> | EX_ahcys(e) | -1000 | 1000
EX_aicar(e) | aicar[e] <=> | aicar exchange reaction | -1000 | 1000
EX_anth(e) | anth[e] <=> | EX_anth(e) | -1000 | 1000
EX_cbasp(e) | cbasp[e] <=> | EX_cbasp(e) | -1000 | 1000
EX_mal_L(e) | mal_L[e] <=> | L-Malate exchange | -1000 | 1000
MAL_Lte | mal_L[e] <=> mal_L[c] | malate transport | -1000 | 1000
ORNt | orn[e] <=> orn[c] | ornithine transport via diffusion (extracellular to periplasm) | -1000 | 1000
OROTGLUt | glu_L[c] + orot[e] <=> glu_L[e] + orot[c] | antiport of orotate and glutamate | -1000 | 1000
PNTOte | pnto_R[e] <=> pnto_R[c] | diffusion reaction of pantothenate | -1000 | 1000
Additional columns: grRules 10864.1; Recon 2 (1 in 22 rows); global model (1 in 12 rows).
Table 3.2: Reactions added to Recon 2 and the global model. Exchange and transport reactions were added for those intracellular metabolites of the global model that had been detected in the extracellular metabolome of the cell lines. Added transport mechanisms were almost exclusively diffusion reactions and not gene-associated.
Reaction Abbreviation | Subsystem | Differentially expressed down-regulated Genes (Entrez) | CCRF-CEM median (mmol/gDW/hr) | Molt-4 median (mmol/gDW/hr) | average expression Molt-4 | average expression CCRF-CEM | adjusted p-value | upreg CCRF-CEM correct | upreg Molt-4 incorrect
ALATA_L | Glutamate metabolism | 84706 | 1.222 | 0.83 | 7.986 | 6.753 | 0.015 | 1 | 0
CORE2GTg | O-Glycan Biosynthesis | 51301 | 0.025 | 0.018 | 6.808 | 5.335 | 0.02 | 1 | 0
FBA | Glycolysis/Gluconeogenesis | 230 | 7.103 | 4.437 | 8.263 | 6.469 | 0.011 | 1 | 0
FBA2 | Fructose and Mannose Metabolism | 230 | 7.102 | 4.431 | 8.263 | 6.469 | 0.011 | 1 | 0
FBA4 | Glyoxylate and Dicarboxylate Metabolism | 230 | 0.046 | 0.037 | 8.263 | 6.469 | 0.011 | 1 | 0
GALNTg | O-Glycan Biosynthesis | 63917 | 0.057 | 0.039 | 8.366 | 4.957 | 0.018 | 1 | 0
GUACYC | Nucleotides | 2982 | 0.145 | 0.101 | 8.672 | 4.795 | 0.006 | 1 | 0
GUACYC | Nucleotides | 2983 | 0.145 | 0.101 | 8.228 | 4.235 | 0.011 | 1 | 0
METAT | Methionine Metabolism | 4143 | 0.128 | 0.078 | 6.963 | 5.727 | 0.026 | 1 | 0
ORNTArm | Urea cycle/amino group metabolism | 4942 | 0.354 | 0.232 | 8.376 | 4.113 | 0.004 | 1 | 0
PCm | Pyruvate Metabolism | 5091 | 0.814 | 0.287 | 7.982 | 6.714 | 0.016 | 1 | 0
PDE1 | Nucleotides | 10846 | 0.099 | 0.07 | 6.894 | 5.278 | 0.035 | 1 | 0
PDE4 | Nucleotides | 10846 | 0.103 | 0.072 | 6.894 | 5.278 | 0.035 | 1 | 0
PFK | Glycolysis/Gluconeogenesis | 5213 | 7.109 | 4.441 | 8.111 | 6.591 | 0.038 | 1 | 0
PI345P3P | Inositol Phosphate Metabolism | 5728 | 0.229 | 0.166 | 10.119 | 6.832 | 0.005 | 1 | 0
PI345P3Pn | Inositol Phosphate Metabolism | 5728 | 0.216 | 0.154 | 10.119 | 6.832 | 0.005 | 1 | 0
Table 3.3: Comparison of flux changes and gene expression changes of genes more highly expressed in Molt-4 cells.
Table 3.4: Unique knock-out (KO) genes for each cancer cell line model.
Molt-4 unique KO genes: 5832.1, 2271.1, 26227.1
CCRF-CEM unique KO genes: 2805.1, 316.1, 6539.1, 262.1, 55349.1, 1468.1, 57026.1, 4953.1, 55163.1, 8566.1, 6723.1
Figure 3.6: Growth and apoptosis of Molt-4 and CCRF-CEM cells. Cells were resuspended in Advanced RPMI containing DMSO (0.67%) and cultured at 37 °C and 5% CO2 in 24-well plates. A) Cells were counted using an automatic cell counter. The graphs show viable cells (excluding Trypan blue). Data shown are the average and standard deviation of biological triplicates. B) Inhibition of growth by DMSO (0.67%) after 48 h. Data shown are the average and standard deviation of 7 (CCRF-CEM) and 6 (Molt-4) independent experiments. C) Cell viability, shown as the fraction of cells excluding Trypan blue in each sample. Data shown are the average and standard deviation of 7 (CCRF-CEM) and 6 (Molt-4) independent experiments. D) Apoptosis was measured after 48 h using Annexin V binding and flow cytometry. Data shown are all apoptotic cells, including cells undergoing necrosis (stained with Annexin V-PE and 7-AAD), as the average and standard deviation of three independent experiments.
92 93 Metabolite choline p-aminobenzoate (PABA) glucose pyruvate tryptophan threonine lysine leucine phenylalanine folate isoleucine proline tyrosine trans-4-hydroxyproline methionine histidine valine 5-oxoproline lactate citrate glycine glutamate 5-methylthioadenosine (MTA) ornithine asparagine 4-methyl-2-oxopentanoate 3-methyl-2-oxovalerate uridine succinate betaine malate pyridoxate 3-methyl-2-oxobutyrate 4-hydroxyphenylpyruvate glutamine alanine cysteine 1613345.1 45304.84667 1258710.1 52977.77333 30970.784 68680.93 616352.4333 65245.09667 506043.8667 1.28 0.86 0.95 0.82 FC medium over time 0.97 0.89 1.07 1.20 1.01 0.97 1.01 1.00 0.97 0.90 1.00 0.99 1.02 1.04 0.99 1.02 0.99 1.34 1.06 1.00 0.88 1.04 323185.6667 2788313.567 62076.23333 Mean CCRF-CEM (2 hrs) 884112.5667 158271.14 65009057.67 376779.1 2068977 386495.2 196447.1 21119935.67 8237784.667 100629.24 20790535.33 5134219 4023284.333 679862.7 2890029.333 74035.24 2668140 62146.47333 5222377.933 8317.144 219683.7 437398.3667 19417.58 73850.77 744471.0333 9918.992 7222.259333 Mean CCRF-CEM (48 hrs) 259273.9333 60631.19333 24330565.33 249036.3333 1570648 303808.2 149861.6667 16346765.67 6540301.667 84807.62333 17219085 4445918.333 3489981.333 582257.4667 2538211 86165.55 2790196.333 1012932.38 134980059.9 86546.77933 460476.5267 630407.2667 62796.41667 98489.89 1010001.033 129433.4973 145547.7347 4952.689333 82806.571 442897.2833 94181.77233 12895.19667 17641.55667 6115.220333 2063962.067 30868376.53 64524.22333 0.00 0.16 0.09 0.96 FC MOLT over time 3.41 2.61 2.67 1.51 1.32 1.27 1.31 1.29 1.26 1.19 1.21 1.15 1.15 1.17 1.14 0.86 0.96 0.06 0.04 0.10 0.48 0.69 0.31 0.75 0.74 0.08 0.05 0.00 Table 3.5: Metabolomic data of CCRF-CEM cells (mapped). 
Mean control medium (48 hrs) 905132.5 178538.2167 71459886.33 387569.1333 2132723.667 417512.6333 195675.8 20801258.67 8345511.333 105904.7067 21225778.33 5168599.333 4063607.667 786011.5667 3018536.333 67843.12 2900419.667 21808.73 1917042.967 8704.7345 186682.92 455197.4667 Mean control medium (2 hrs) 876300.4333 159124.46 76555640.67 465074.6 2153692 406102.2667 198435.8 20823770.33 8087955 95419.73667 21229254.67 5107317.333 4142302 814465.8333 2995910.333 69077.16333 2857012.667 29168.15 2038946.433 8681.527333 163882.9467 473539.8667 Exchange reaction 2.44 1.72 1.60 0.31 0.31 0.30 0.30 0.29 0.29 0.29 0.21 0.17 0.13 0.13 0.15 0.16 0.03 1.28 1.02 0.90 0.40 0.35 0.31 0.20 0.08 0.08 0.05 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.16 1.19 0.11 Difference of FC EX_chol(e) EX_anth(e) EX_glc(e) EX_pyr(e) EX_trp_L(e) EX_thr_L(e) EX_lys_L(e) EX_leu_L(e) EX_phe_L(e) EX_fol(e) EX_ile_L(e) EX_pro_L(e) EX_tyr_L(e) EX_4hpro_LT(e) EX_met_L(e) EX_his_L(e) EX_val_L(e) EX_5oxpro(e) EX_lac-L(e) EX_cit(e) EX_gly(e) EX_glu-L(e) EX_5mta(e) EX_orn(e) EX_asn_L(e) EX_4mop(e) EX_3mop(e) EX_uri(e) EX_succ(e) EX_glyb(e) EX_mal_L(e) EX_4pyrdx(e) EX_3mob(e) EX_34hpp(e) EX_gln_L(e) EX_ala-L(e) EX_cys_L(e) Comment uptake uptake uptake uptake uptake uptake uptake uptake uptake uptake uptake uptake uptake uptake EssAA -uptake EssAA -uptake EssAA -uptake secretion secretion secretion secretion secretion secretion secretion secretion secretion secretion secretion secretion secretion secretion secretion secretion secretion no direction no direction no direction Metabolite pyridoxine (Vitamin B6) phosphate valine biotin myo-inositol aspartate pantothenate arginine nicotinamide serine cystine thiamin (Vitamin B1) riboflavin (Vitamin B2) fructose erythrose trehalose caprylate (8:0) alanylglutamine cysteine-glutathione disulfide 4-guanidinobutanoate caproate (6:0) p-cresol sulfate phenol red dimethylarginine (SDMA + ADMA) N-acetylalanine N-acetylisoleucine N-acetylleucine N-acetylmethionine 
O-acetylhomoserine pseudouridine beta-hydroxyisovalerate uracil orotate inosine hypoxanthine guanine N-carbamoylaspartate S-adenosylhomocysteine (SAH) adenine Mean control medium (2 hrs) 889550.5333 216828142.3 2857012.667 133350.2667 4614936 632160.0333 84638.70667 379985.8667 376635.7 993272.0667 135388.0233 48437.09 23051.91333 240068.45 74911.88667 19662.35233 2766223.23 71265.4 32599.79 11700.92733 9962.496667 589185.2333 Mean control medium (48 hrs) 899254.5 221118425 2900419.667 128943.8 3920500 612562.3 86751.96 377714.6333 405942.7667 1166496.7 221451.05 38485.935 23033.54 259747.7433 86557.93 0.86 0.91 0.97 1.05 0.91 1.01 0.95 FC medium over time 0.99 0.98 0.99 1.03 1.18 1.03 0.98 1.01 0.93 0.85 0.61 1.26 1.00 0.92 0.87 14023.93033 2575210.853 68153.81667 28704.82333 7450.419 9284.687 585617.5333 Mean CCRF-CEM (2 hrs) 932496.1333 212276379 2668140 133599.3667 3505897 680373.4333 88002.12 374009.2667 386861.5667 1367977.267 206287.07 45028.005 22609.52333 254332.1633 105577.8733 Mean CCRF-CEM (48 hrs) 905903.4667 208623151.3 2790196.333 129767.1333 3971405.333 770903.9333 99449.36667 399764.1333 379674.8 1422040.5 231801.3733 119204.6633 26433.13667 117301.67 66799.03333 274029.1833 8147.758667 952614.66 0.00 0.93 1.02 0.96 1.00 FC CCRF-CEM over time 1.03 1.02 0.96 1.03 0.88 0.88 0.88 0.94 1.02 0.96 0.89 0.38 0.86 2.17 1.58 0.00 1.72 2.70 Table 3.6: Metabolomic data of CCRF-CEM cells (not mapped). 
16937.25333 2529354.017 69277.14 34301.30333 10626.237 10082.20967 560994.5 30830.90333 7272.74 9692.281667 584484.7 109093.6367 16868.47433 7383.618 11703.27667 6690.47 46041.62267 11813.504 Exchange reaction 0.04 0.04 0.03 0.00 0.29 0.15 0.09 0.07 0.09 0.11 0.28 0.88 0.15 1.24 0.72 0.00 0.86 1.79 0.97 0.12 0.12 0.05 0.05 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Difference of FC Comment 5% cutoff 5% cutoff 5% cutoff 5% cutoff small difference small difference small difference small difference small difference increasing signal, not clear increasing signal, not clear excluded excluded excluded excluded not in Recon1 not in the medium not in Recon1 not in Recon1 not in Recon1 not in Recon1 not in Recon1 not in Recon1 not in Recon1 not in Recon1 not in Recon1 not in Recon1 not in Recon1 not in Recon1 not in Recon1 not in Recon1 not detected not detected not detected not detected not detected not detected not detected not detected

Table 3.7: Metabolomic data of Molt-4 cells (mapped).

| Metabolite | Mean control medium (2 hrs) | Mean control medium (48 hrs) | FC medium over time | Mean Molt-4 (2 hrs) | Mean Molt-4 (48 hrs) | FC Molt-4 over time | Exchange reaction | Difference of FC | Comment |
|---|---|---|---|---|---|---|---|---|---|
| p-aminobenzoate (PABA) | 159124.46 | 178538.2167 | 0.89 | 162567.13 | | | EX_anth(e) | 0.89 | uptake |
| pyruvate | 465074.6 | 387569.1333 | 1.20 | 439148.0667 | 210407.8333 | 2.09 | EX_pyr(e) | 0.89 | uptake |
| glucose | 76555640.67 | 71459886.33 | 1.07 | 61697085.33 | 34981419.33 | 1.76 | EX_glc(e) | 0.69 | uptake |
| choline | 876300.4333 | 905132.5 | 0.97 | 892182.2 | 541860.4667 | 1.65 | EX_chol(e) | 0.68 | uptake |
| lysine | 198435.8 | 195675.8 | 1.01 | 188473.1 | 112386.1667 | 1.68 | EX_lys_L(e) | 0.66 | uptake |
| valine | 2857012.667 | 2900419.667 | 0.99 | 2853523.667 | 1793173.667 | 1.59 | EX_val_L(e) | 0.61 | uptake |
| phenylalanine | 8087955 | 8345511.333 | 0.97 | 8215168.333 | 5360276 | 1.53 | EX_phe_L(e) | 0.56 | uptake |
| threonine | 406102.2667 | 417512.6333 | 0.97 | 381085.2333 | 259555.2667 | 1.47 | EX_thr_L(e) | 0.50 | uptake |
| tryptophan | 2153692 | 2132723.667 | 1.01 | 2037735.333 | 1387754.333 | 1.47 | EX_trp_L(e) | 0.46 | uptake |
| methionine | 2995910.333 | 3018536.333 | 0.99 | 3024630.333 | 2266832.333 | 1.33 | EX_met_L(e) | 0.34 | uptake |
| pantothenate | 84638.70667 | 86751.96 | 0.98 | 89717.10667 | 68882.68333 | 1.30 | EX_pnto_R(e) | 0.33 | uptake |
| leucine | 20823770.33 | 20801258.67 | 1.00 | 19725086.67 | 15148808 | 1.30 | EX_leu_L(e) | 0.30 | uptake |
| tyrosine | 4142302 | 4063607.667 | 1.02 | 3934639.333 | 3075783.333 | 1.28 | EX_tyr_L(e) | 0.26 | uptake |
| isoleucine | 21229254.67 | 21225778.33 | 1.00 | 20799761 | 17160163 | 1.21 | EX_ile_L(e) | 0.21 | uptake |
| histidine | 69077.16333 | 67843.12 | 1.02 | 69406.69 | 95624.28 | 0.73 | EX_his_L(e) | 0.29 | EssAA - uptake |
| 5-oxoproline | 29168.15 | 21808.73 | 1.34 | 120655.9867 | 2060525.467 | 0.06 | EX_5oxpro(e) | 1.28 | secretion |
| lactate | 2038946.433 | 1917042.967 | 1.06 | 5654513.467 | 101768253 | 0.06 | EX_lac_L(e) | 1.01 | secretion |
| malate | | 30970.784 | | 20292.406 | 27226.6555 | 0.75 | EX_mal_L(e) | 0.75 | secretion |
| citrate | 8681.527333 | 8704.7345 | 1.00 | 9459.837 | 34177.945 | 0.28 | EX_cit(e) | 0.72 | secretion |
| glutamate | 473539.8667 | 455197.4667 | 1.04 | 462903.8333 | 1024508.5 | 0.45 | EX_glu_L(e) | 0.59 | secretion |
| glycine | 163882.9467 | 186682.92 | 0.88 | 121762.3567 | 310547.7 | 0.39 | EX_gly(e) | 0.49 | secretion |
| aspartate | 632160.0333 | 612562.3 | 1.03 | 590881.7333 | 940705.6 | 0.63 | EX_asp_L(e) | 0.40 | secretion |
| 4-methyl-2-oxopentanoate | | | | 34436.50433 | 113668.5123 | 0.30 | EX_4mop(e) | 0.30 | secretion |
| 3-methyl-2-oxovalerate | | | | 25108.829 | 121927.3673 | 0.21 | EX_3mop(e) | 0.21 | secretion |
| ornithine | 65245.09667 | 68680.93 | 0.95 | 54272.41667 | 65159.50333 | 0.83 | EX_orn(e) | 0.12 | secretion |
| 3-methyl-2-oxobutyrate | | | | | 14717.55667 | 0.00 | EX_3mob(e) | 0.00 | secretion |
| glutamine | | | | 824549.3667 | 2283200.867 | 0.36 | EX_gln_L(e) | 0.36 | no direction |
| alanine | 1613345.1 | 1258710.1 | 1.28 | 3430342.067 | 25970024.1 | 0.13 | EX_ala_L(e) | 1.15 | no direction |
| cysteine | 45304.84667 | 52977.77333 | 0.86 | 56566.27667 | 60759.23 | 0.93 | EX_cys_L(e) | 0.08 | no direction, added for modeling |

Table 3.8: Metabolomic data of Molt-4 cells (not mapped).

| Metabolite | Mean control medium (2 hrs) | Mean control medium (48 hrs) | FC medium over time | Mean Molt-4 (2 hrs) | Mean Molt-4 (48 hrs) | FC Molt-4 over time | Exchange reaction | Difference of FC | Comment |
|---|---|---|---|---|---|---|---|---|---|
| myo-inositol | 4614936 | 3920500 | 1.18 | 4195275 | 3998122 | 1.05 | | 0.13 | small difference |
| nicotinamide | 376635.7 | 405942.7667 | 0.93 | 367096.5667 | 361576.3667 | 1.02 | | 0.09 | small difference |
| phosphate | 216828142.3 | 221118425 | 0.98 | 223518663 | 216863897.3 | 1.03 | | 0.05 | small difference |
| folate | 95419.73667 | 105904.7067 | 0.90 | 97550.78667 | 102678.49 | 0.95 | | 0.05 | small difference |
| serine | 993272.0667 | 1166496.7 | 0.85 | 814496.1667 | 762597.6 | 1.07 | | 0.22 | small difference, increasing signal, not clear |
| cystine | 135388.0233 | 221451.05 | 0.61 | 121897.495 | 130375.3933 | 0.93 | | 0.32 | small difference, increasing signal, not clear |
| arginine | 379985.8667 | 377714.6333 | 1.01 | 391354.4333 | 375180.1 | 1.04 | | 0.04 | 5% cutoff |
| trans-4-hydroxyproline | 814465.8333 | 786011.5667 | 1.04 | 630513.4 | 622493.9 | 1.01 | | 0.02 | 5% cutoff |
| caprylate (8:0) | 16937.25333 | 19662.35233 | 0.86 | 12992.01433 | 15446.087 | 0.84 | | 0.02 | 5% cutoff |
| pyridoxine (Vitamin B6) | 889550.5333 | 899254.5 | 0.99 | 906371.4667 | 930993.3 | 0.97 | | 0.02 | 5% cutoff |
| asparagine | 506043.8667 | 616352.4333 | 0.82 | 549097.2 | 677509.1333 | 0.81 | | 0.01 | 5% cutoff |
| proline | 5107317.333 | 5168599.333 | 0.99 | 5163708.333 | 5263614.333 | 0.98 | | 0.01 | 5% cutoff |
| biotin | 133350.2667 | 128943.8 | 1.03 | 127379.2667 | 123131.3 | 1.03 | | 0.00 | 5% cutoff |
| fructose | 240068.45 | 259747.7433 | 0.92 | 321092.33 | 161924.9033 | 1.98 | | 1.06 | excluded |
| erythrose | 74911.88667 | 86557.93 | 0.87 | 97682.54333 | 71213.15333 | 1.37 | | 0.51 | excluded |
| riboflavin (Vitamin B2) | 23051.91333 | 23033.54 | 1.00 | 23085.68333 | 24688.79 | 0.94 | | 0.07 | excluded |
| thiamin (Vitamin B1) | 48437.09 | 38485.935 | 1.26 | 32363.01 | 87165.94333 | 0.37 | | 0.89 | excluded |
| alanylglutamine | 2529354.017 | 2766223.23 | 0.91 | 2409678.433 | 173792.14 | 13.87 | | 12.95 | not in Recon1 |
| cysteine-glutathione disulfide | 69277.14 | 71265.4 | 0.97 | 71769.5 | 169136.6333 | 0.42 | | 0.55 | not in Recon1 |
| 4-guanidinobutanoate | 34301.30333 | 32599.79 | 1.05 | 32887.87667 | 27785.40667 | 1.18 | | 0.13 | not in Recon1 |
| caproate (6:0) | 10626.237 | 11700.92733 | 0.91 | 8926.486 | 10492.25633 | 0.85 | | 0.06 | not in Recon1 |
| phenol red | 560994.5 | 589185.2333 | 0.95 | 566563.1667 | 566028.5 | 1.00 | | 0.05 | not in Recon1 |
| p-cresol sulfate | 10082.20967 | 9962.496667 | 1.01 | 9632.704667 | 9219.116 | 1.05 | | 0.03 | not in Recon1 |
| 4-hydroxyphenylpyruvate | | | | | | | | | not detected |
| 5-methylthioadenosine (MTA) | | | | | | | | | not detected |
| adenine | | | | | | | | | not detected |
| beta-hydroxyisovalerate | | | | | | | | | not detected |
| betaine | | | | | | | | | not detected |
| dimethylarginine (SDMA + ADMA) | | | | | | | | | not detected |
| guanine | | | | | | | | | not detected |
| hypoxanthine | | | | | | | | | not detected |
| inosine | | | | | | | | | not detected |
| N-acetylalanine | | | | | | | | | not detected |
| N-acetylisoleucine | | | | | | | | | not detected |
| N-acetylleucine | | | | | | | | | not detected |
| N-acetylmethionine | | | | | | | | | not detected |
| N-carbamoylaspartate | | | | | | | | | not detected |
| O-acetylhomoserine | | | | | | | | | not detected |
| orotate | | | | | | | | | not detected |
| pseudouridine | | | | | | | | | not detected |
| pyridoxate | | | | | | | | | not detected |
| S-adenosylhomocysteine (SAH) | | | | | | | | | not detected |
| succinate | | | | | | | | | not detected |
| trehalose | | | | | | | | | not detected |
| uracil | | | | | | | | | not detected |
| uridine | | | | | | | | | not detected |

Table 3.9: Absent genes (Entrez Gene IDs). Cutoff p <= 0.05.
Molt-4: 535.1, 1548.1, 2591.1, 3037.1, 4248.1, 4709.1, 6522.1, 7167.1, 7367.1, 8399.1, 23545.1, 129807.1, 221823.1
CCRF-CEM: 239.1, 443.1, 535.1, 1548.1, 2683.1, 3037.1, 4248.1, 4709.1, 5232.1, 6522.1, 7364.1, 7367.1, 8399.1, 23545.1, 54363.1, 66002.1, 129807.1, 221823.1

Table 3.10: Differentially expressed Recon1 genes. Genes significantly lower expressed in CCRF-CEM compared to Molt-4 cells (down-regulated).

| GeneID | SystemCode | Avg-M | Avg-C | Log fold-C_vs_M | FC_vs_M | Rawp-C_vs_M | Adjp-C_vs_M | ANOVA-rawp | ANOVA-adjp | Largest FC | Entrez Gene ID |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ENSG00000065154 | En | 8.375521333 | 4.112782667 | -4.262738667 | -19.19606458 | 7.99E-06 | 0.00430063 | 7.99E-06 | 0.00430063 | 4.262738667 | 4942 |
| ENSG00000171862 | En | 10.11926708 | 6.8316375 | -3.287629583 | -9.765064567 | 1.78E-05 | 0.005406007 | 1.78E-05 | 0.005406007 | 3.287629583 | 5728 |
| ENSG00000165029 | En | 8.496148571 | 4.805906667 | -3.690241905 | -12.9084324 | 2.06E-05 | 0.005541875 | 2.06E-05 | 0.005541875 | 3.690241905 | 19 |
| ENSG00000164116 | En | 8.672461364 | 4.795136667 | -3.877324697 | -14.69572569 | 2.45E-05 | 0.005886867 | 2.45E-05 | 0.005886867 | 3.877324697 | 2982 |
| ENSG00000160200 | En | 9.263808333 | 7.12825 | -2.135558333 | -4.394071463 | 7.77E-05 | 0.00947567 | 7.77E-05 | 0.00947567 | 2.135558333 | 875 |
| ENSG00000109107 | En | 8.262743 | 6.468515 | -1.794228 | -3.468298332 | 0.000109391 | 0.011092184 | 0.000109391 | 0.011092184 | 1.794228 | 230 |
| ENSG00000061918 | En | 8.228284667 | 4.234786333 | -3.993498333 | -15.92805644 | 0.000117929 | 0.011422963 | 0.000117929 | 0.011422963 | 3.993498333 | 2983 |
| ENSG00000151012 | En | 10.87135917 | 9.849012917 | -1.02234625 | -2.03121964 | 0.000167936 | 0.013752713 | 0.000167936 | 0.013752713 | 1.02234625 | 23657 |
| ENSG00000166123 | En | 7.985629815 | 6.752667593 | -1.232962222 | -2.350491107 | 0.000205075 | 0.015409421 | 0.000205075 | 0.015409421 | 1.232962222 | 84706 |
| ENSG00000173599 | En | 7.981523333 | 6.713636667 | -1.267886667 | -2.408085584 | 0.00022421 | 0.016241419 | 0.00022421 | 0.016241419 | 1.267886667 | 5091 |
| ENSG00000178234 | En | 8.365903333 | 4.956553333 | -3.40935 | -10.62469852 | 0.000270385 | 0.01767911 | 0.000270385 | 0.01767911 | 3.40935 | 63917 |
| ENSG00000176928 | En | 6.808208333 | 5.335471667 | -1.472736667 | -2.775478787 | 0.000348653 | 0.020012748 | 0.000348653 | 0.020012748 | 1.472736667 | 51301 |
| ENSG00000151224 | En | 6.962758667 | 5.726715333 | -1.236043333 | -2.355516329 | 0.000572536 | 0.025892296 | 0.000572536 | 0.025892296 | 1.236043333 | 4143 |
| ENSG00000134184 | En | 9.71058 | 7.897661667 | -1.812918333 | -3.513522977 | 0.000729345 | 0.02911905 | 0.000729345 | 0.02911905 | 1.812918333 | 2944 |
| ENSG00000112541 | En | 6.893592727 | 5.277731515 | -1.615861212 | -3.064945057 | 0.000999696 | 0.0347777 | 0.000999696 | 0.0347777 | 1.615861212 | 10846 |
| ENSG00000152556 | En | 8.110786667 | 6.590863333 | -1.519923333 | -2.867758096 | 0.001231408 | 0.038406061 | 0.001231408 | 0.038406061 | 1.519923333 | 5213 |

Table 3.11: Differentially expressed Recon1 genes. Genes significantly higher expressed in CCRF-CEM compared to Molt-4 cells (up-regulated).

| GeneID | SystemCode | Avg-M | Avg-C | Log fold-C_vs_M | FC_vs_M | Rawp-C_vs_M | Adjp-C_vs_M | ANOVA-rawp | ANOVA-adjp | Largest FC | Entrez Gene ID |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ENSG00000165646 | En | 6.348980833 | 8.381608125 | 2.032627292 | 4.091492739 | 9.38E-07 | 0.002369837 | 9.38E-07 | 0.002369837 | 2.032627292 | 6571 |
| ENSG00000103056 | En | 6.507780667 | 10.13391733 | 3.626136667 | 12.34741102 | 3.09E-07 | 0.002369837 | 3.09E-07 | 0.002369837 | 3.626136667 | 55512 |
| ENSG00000151229 | En | 4.470424444 | 8.107204444 | 3.63678 | 12.4388396 | 2.15E-06 | 0.002680364 | 2.15E-06 | 0.002680364 | 3.63678 | 114134 |
| ENSG00000110090 | En | 5.915162667 | 9.526362667 | 3.6112 | 12.22023395 | 2.78E-06 | 0.002873331 | 2.78E-06 | 0.002873331 | 3.6112 | 1374 |
| ENSG00000169692 | En | 6.958078 | 8.541751333 | 1.583673333 | 2.997320449 | 3.42E-06 | 0.003231507 | 3.42E-06 | 0.003231507 | 1.583673333 | 10555 |
| ENSG00000103489 | En | 6.802452821 | 8.719742308 | 1.917289487 | 3.777127508 | 3.95E-06 | 0.003409965 | 3.95E-06 | 0.003409965 | 1.917289487 | 64131 |
| ENSG00000100092 | En | 6.92734619 | 8.805102857 | 1.877756667 | 3.675031629 | 8.87E-06 | 0.004323043 | 8.87E-06 | 0.004323043 | 1.877756667 | 57026 |
| ENSG00000131844 | En | 4.655619333 | 8.792884 | 4.137264667 | 17.59708633 | 1.12E-05 | 0.004629771 | 1.12E-05 | 0.004629771 | 4.137264667 | 64087 |
| ENSG00000047230 | En | 4.632352333 | 8.514437167 | 3.882084833 | 14.74429395 | 1.16E-05 | 0.004734584 | 1.16E-05 | 0.004734584 | 3.882084833 | 56474 |
| ENSG00000137124 | En | 5.721805714 | 8.771057143 | 3.049251429 | 8.277823161 | 1.22E-05 | 0.004783728 | 1.22E-05 | 0.004783728 | 3.049251429 | 219 |
| ENSG00000151689 | En | 5.938046667 | 7.491598333 | 1.553551667 | 2.935388926 | 1.46E-05 | 0.005133602 | 1.46E-05 | 0.005133602 | 1.553551667 | 3628 |
| ENSG00000114805 | En | 5.515985556 | 10.07648289 | 4.560497333 | 23.59644036 | 1.53E-05 | 0.005216399 | 1.53E-05 | 0.005216399 | 4.560497333 | 23007 |
| ENSG00000110719 | En | 7.503504167 | 9.165249167 | 1.661745 | 3.163989912 | 1.63E-05 | 0.005329837 | 1.63E-05 | 0.005329837 | 1.661745 | 10312 |
| ENSG00000197142 | En | 4.759571389 | 8.206302222 | 3.446730833 | 10.90358636 | 1.98E-05 | 0.005541875 | 1.98E-05 | 0.005541875 | 3.446730833 | 51703 |
| ENSG00000198610 | En | 6.75731 | 9.349666667 | 2.592356667 | 6.030830411 | 4.14E-05 | 0.007352644 | 4.14E-05 | 0.007352644 | 2.592356667 | 1109 |
| ENSG00000115159 | En | 2.876411667 | 7.917331667 | 5.04092 | 32.92062909 | 4.23E-05 | 0.007381679 | 4.23E-05 | 0.007381679 | 5.04092 | 2820 |
| ENSG00000176463 | En | 5.475066 | 7.960023333 | 2.484957333 | 5.598177898 | 4.91E-05 | 0.00778803 | 4.91E-05 | 0.00778803 | 2.484957333 | 28232 |
| ENSG00000105655 | En | 6.937104074 | 8.705852222 | 1.768748148 | 3.407581466 | 5.84E-05 | 0.008323233 | 5.84E-05 | 0.008323233 | 1.768748148 | 51477 |
| ENSG00000167280 | En | 6.451015 | 7.742110833 | 1.291095833 | 2.447138632 | 7.09E-05 | 0.009009148 | 7.09E-05 | 0.009009148 | 1.291095833 | 64772 |
| ENSG00000054983 | En | 3.950183333 | 7.531426667 | 3.581243333 | 11.96910468 | 8.40E-05 | 0.009951944 | 8.40E-05 | 0.009951944 | 3.581243333 | 2581 |
| ENSG00000008513 | En | 6.702956944 | 7.935448194 | 1.23249125 | 2.349723907 | 8.63E-05 | 0.010059905 | 8.63E-05 | 0.010059905 | 1.23249125 | 6482 |
| ENSG00000175198 | En | 6.313908333 | 8.556496667 | 2.242588333 | 4.73245351 | 9.53E-05 | 0.01058427 | 9.53E-05 | 0.01058427 | 2.242588333 | 5095 |
| ENSG00000119673 | En | 6.076504167 | 7.37894 | 1.302435833 | 2.466449645 | 9.87E-05 | 0.01068324 | 9.87E-05 | 0.01068324 | 1.302435833 | 10965 |
| ENSG00000184005 | En | 5.121033333 | 6.938466667 | 1.817433333 | 3.52453598 | 0.000106542 | 0.010966345 | 0.000106542 | 0.010966345 | 1.817433333 | 256435 |
| ENSG00000163754 | En | 7.105963333 | 9.165953333 | 2.05999 | 4.16983414 | 0.000120482 | 0.011531411 | 0.000120482 | 0.011531411 | 2.05999 | 2992 |
| ENSG00000126264 | En | 6.687294444 | 8.290237222 | 1.602942778 | 3.037622895 | 0.000125061 | 0.011691684 | 0.000125061 | 0.011691684 | 1.602942778 | 10870 |
| ENSG00000196502 | En | 7.036531111 | 9.30005 | 2.263518889 | 4.801612197 | 0.000137686 | 0.012416874 | 0.000137686 | 0.012416874 | 2.263518889 | 6817 |
| ENSG00000185527 | En | 5.553538333 | 8.002308333 | 2.44877 | 5.459504427 | 0.00016152 | 0.013424253 | 0.00016152 | 0.013424253 | 2.44877 | 5148 |
| ENSG00000164574 | En | 6.763128333 | 7.7879425 | 1.024814167 | 2.034697278 | 0.000185194 | 0.014508219 | 0.000185194 | 0.014508219 | 1.024814167 | 55568 |
| ENSG00000067225 | En | 10.22047364 | 11.34921576 | 1.128742121 | 2.186680015 | 0.000185153 | 0.014508219 | 0.000185153 | 0.014508219 | 1.128742121 | 5315 |
| ENSG00000056998 | En | 5.887866 | 8.236556 | 2.34869 | 5.09361529 | 0.000209733 | 0.015616513 | 0.000209733 | 0.015616513 | 2.34869 | 8908 |
| ENSG00000197165 | En | 6.423324286 | 8.217031429 | 1.793707143 | 3.467046396 | 0.00024208 | 0.016816003 | 0.00024208 | 0.016816003 | 1.793707143 | 6799 |
| ENSG00000189043 | En | 7.824286667 | 9.003066667 | 1.17878 | 2.263852558 | 0.000261637 | 0.017430384 | 0.000261637 | 0.017430384 | 1.17878 | 4697 |
| ENSG00000182621 | En | 7.643203667 | 9.415810667 | 1.772607 | 3.416708102 | 0.000336372 | 0.019708994 | 0.000336372 | 0.019708994 | 1.772607 | 23236 |
| ENSG00000233276 | En | 8.879731111 | 10.53635556 | 1.656624444 | 3.152779873 | 0.00034123 | 0.019800001 | 0.00034123 | 0.019800001 | 1.656624444 | 2876 |
| ENSG00000136908 | En | 8.261582222 | 10.29223333 | 2.030651111 | 4.085892115 | 0.000362879 | 0.020267225 | 0.000362879 | 0.020267225 | 2.030651111 | 8818 |
| ENSG00000182601 | En | 5.548533333 | 8.326338333 | 2.777805 | 6.858081267 | 0.000391522 | 0.020706009 | 0.000391522 | 0.020706009 | 2.777805 | 9951 |
| ENSG00000017483 | En | 7.55413 | 9.102361667 | 1.548231667 | 2.924584486 | 0.00046156 | 0.023107631 | 0.00046156 | 0.023107631 | 1.548231667 | 92745 |
| ENSG00000118402 | En | 7.319834444 | 8.975881111 | 1.656046667 | 3.151517484 | 0.000477661 | 0.023572077 | 0.000477661 | 0.023572077 | 1.656046667 | 6785 |
| ENSG00000053371 | En | 8.741796667 | 10.02813833 | 1.286341667 | 2.439087757 | 0.000551609 | 0.025530913 | 0.000551609 | 0.025530913 | 1.286341667 | 8574 |
| ENSG00000139629 | En | 8.827946061 | 10.04615788 | 1.218211818 | 2.326581649 | 0.000624148 | 0.02669763 | 0.000624148 | 0.02669763 | 1.218211818 | 11226 |
| ENSG00000150768 | En | 7.273722778 | 8.661583333 | 1.387860556 | 2.616903193 | 0.000643874 | 0.027265716 | 0.000643874 | 0.027265716 | 1.387860556 | 1737 |
| ENSG00000180011 | En | 8.622386667 | 10.18323667 | 1.56085 | 2.950276152 | 0.000763683 | 0.029858445 | 0.000763683 | 0.029858445 | 1.56085 | 284273 |
| ENSG00000004864 | En | 6.083869048 | 8.008005714 | 1.924136667 | 3.795096753 | 0.000777083 | 0.030274263 | 0.000777083 | 0.030274263 | 1.924136667 | 10165 |
| ENSG00000140374 | En | 8.671333333 | 9.890636667 | 1.219303333 | 2.328342562 | 0.000799335 | 0.030846084 | 0.000799335 | 0.030846084 | 1.219303333 | 2108 |
| ENSG00000114480 | En | 7.359098056 | 8.772863333 | 1.413765278 | 2.664316139 | 0.000844445 | 0.03175414 | 0.000844445 | 0.03175414 | 1.413765278 | 2632 |
| ENSG00000101846 | En | 9.246466667 | 10.41815667 | 1.17169 | 2.252754343 | 0.000862407 | 0.032115557 | 0.000862407 | 0.032115557 | 1.17169 | 412 |
| ENSG00000103876 | En | 8.654008333 | 9.846961667 | 1.192953333 | 2.286202718 | 0.000870648 | 0.032199289 | 0.000870648 | 0.032199289 | 1.192953333 | 2184 |
| ENSG00000141349 | En | 8.104913333 | 9.541003333 | 1.43609 | 2.705865257 | 0.001120531 | 0.036746456 | 0.001120531 | 0.036746456 | 1.43609 | 92579 |
| ENSG00000160216 | En | 5.906705 | 7.820725 | 1.91402 | 3.768577339 | 0.001146319 | 0.037188559 | 0.001146319 | 0.037188559 | 1.91402 | 56894 |
| ENSG00000128311 | En | 6.151493333 | 7.3064675 | 1.154974167 | 2.226803362 | 0.001184066 | 0.037877908 | 0.001184066 | 0.037877908 | 1.154974167 | 7263 |
| ENSG00000143149 | En | 8.648609333 | 9.735004667 | 1.086395333 | 2.123428209 | 0.001225217 | 0.038371413 | 0.001225217 | 0.038371413 | 1.086395333 | 223 |
| ENSG00000160752 | En | 9.592555 | 11.01180722 | 1.419252222 | 2.67446852 | 0.001235009 | 0.038472621 | 0.001235009 | 0.038472621 | 1.419252222 | 2224 |
| ENSG00000143179 | En | 9.339323333 | 10.51800667 | 1.178683333 | 2.263700875 | 0.001338504 | 0.040231461 | 0.001338504 | 0.040231461 | 1.178683333 | 7371 |
| ENSG00000152270 | En | 7.220595417 | 8.993825833 | 1.773230417 | 3.418184847 | 0.001373503 | 0.040746602 | 0.001373503 | 0.040746602 | 1.773230417 | 5140 |
| ENSG00000100504 | En | 6.391890606 | 7.489673333 | 1.097782727 | 2.140255046 | 0.00164339 | 0.044737426 | 0.00164339 | 0.044737426 | 1.097782727 | 5836 |
| ENSG00000106392 | En | 7.17008 | 8.825687778 | 1.655607778 | 3.150558892 | 0.001955335 | 0.048840237 | 0.001955335 | 0.048840237 | 1.655607778 | 56913 |

Table 3.12: Detection limits (derived from Paglia et al., 2012) for the definition of model bounds. For metabolites that were not captured in the paper, we queried HMDB using the displayed name.

| Exchange metabolite | Exchange reaction | Theoretical mass (g/mol) | LOD (ng/mL) | LOD (mM), model input |
|---|---|---|---|---|
| 5-methylthioadenosine (MTA) | EX_5mta(e) | 298.0974 | 0.3 | 1.00638E-06 |
| uridine | EX_uri(e) | 243.0617 | 1.7 | 6.99411E-06 |
| choline | EX_chol(e) | 104.1075 | 2.8 | 2.68953E-05 |
| nicotinamide | EX_ncam(e) | 123.0558 | 3 | 2.43792E-05 |
| 3-methyl-2-oxovalerate | EX_3mop(e) | 129.0552 | 3.5 | 2.71202E-05 |
| succinate | EX_succ(e) | 117.0188 | 3.9 | 3.3328E-05 |
| pantothenate | EX_pnto_R(e) | 220.1185 | 4 | 1.8172E-05 |
| 5-oxoproline | EX_5oxpro(e) | 128.0348 | 4.8 | 3.74898E-05 |
| thiamin (Vitamin B1) | EX_thm(e) | 265.1123 | 6.1 | 2.30091E-05 |
| p-aminobenzoate (PABA) | EX_anth(e) | 138.0555 | 7.7 | 5.57747E-05 |
| trans-4-hydroxyproline | EX_4HPRO(e) | 132.0661 | 8.1 | 6.13329E-05 |
| lactate | EX_lac_L(e) | 89.0239 | 10.9 | 0.000122439 |
| 3-methyl-2-oxobutyrate | EX_3mob(e) | 115.0395 | 11.2 | 9.73579E-05 |
| histidine | EX_his_L(e) | 156.0773 | 13.6 | 8.71363E-05 |
| tryptophan | EX_trp_L(e) | 205.0977 | 15.7 | 7.65489E-05 |
| ornithine | EX_orn(e) | 133.0977 | 16.9 | 0.000126974 |
| arginine | EX_arg_L(e) | 175.1195 | 24.8 | 0.000141618 |
| threonine | EX_thr_L(e) | 120.0661 | 25.6 | 0.000213216 |
| folate | EX_fol(e) | 440.1319 | 25.7 | 5.83916E-05 |
| glutamine | EX_gln_L(e) | 147.077 | 28.4 | 0.000193096 |
| pyridoxate | EX_4pyrdx(e) | 182.0453 | 32.7 | 0.000179626 |
| serine | EX_ser_L(e) | 106.0504 | 37.5 | 0.000353605 |
| glucose | EX_glc(e) | 179.0556 | 44 | 0.000245734 |
| riboflavin (Vitamin B2) | EX_ribflv(e) | 377.1461 | 45 | 0.000119317 |
| glutamate | EX_glu_L(e) | 148.061 | 45 | 0.000303929 |
| tyrosine | EX_tyr_L(e) | 182.0817 | 47.4 | 0.000260323 |
| phenylalanine | EX_phe_L(e) | 166.0868 | 48.4 | 0.000291414 |
| myo-inositol | EX_inost(e) | 179.0556 | 59 | 0.000329507 |
| cystine | EX_Lcystin(e) | 241.0317 | 59.7 | 0.000247685 |
| leucine | EX_leu_L(e) | 132.1025 | 68.9 | 0.000521565 |
| methionine | EX_met_L(e) | 150.0589 | 74.1 | 0.000493806 |
| cysteine | EX_cys_L(e) | 122.0276 | 77 | 0.000631005 |
| asparagine | EX_asn_L(e) | 133.0613 | 82.1 | 0.000617009 |
| malate | EX_mal_L(e) | 133.0137 | 99.2 | 0.000745788 |
| isoleucine | EX_ile_L(e) | 132.1025 | 112.9 | 0.000854639 |
| pyruvate | EX_pyr(e) | 87.0082 | 121.3 | 0.001394121 |
| lysine | EX_lys_L(e) | 147.1134 | 131.7 | 0.000895228 |
| alanine | EX_ala_L(e) | 90.0555 | 133.5 | 0.001482419 |
| citrate | EX_cit(e) | 191.0192 | 150.8 | 0.000789449 |
| proline | EX_pro_L(e) | 116.0712 | 169.2 | 0.001457726 |
| glycine | EX_gly(e) | 74.0242 | 214.3 | 0.002894999 |
| aspartate | EX_asp_L(e) | 134.0453 | 229.5 | 0.001712108 |
| 4-hydroxyphenylpyruvate | EX_34hpp | 180.157 | 537.3 | 0.002982399 |
| 4-methyl-2-oxopentanoate | EX_4mop(e) | 130.142 | 3.5 | 2.68937E-05 |
| betaine | EX_glyb(e) | 118.0868 | 2.8 | 2.37114E-05 |
| valine | EX_val_L(e) | 118.0868 | 28.2 | 0.000238807 |

Table 3.13: Calculation of the growth rates and definition of upper (ub) and lower bounds (lb) imposed on the CCRF-CEM model.

| Time (hrs) | Viable concentration | Stdev |
|---|---|---|
| 0.5 | 4.47E+05 | 2.52E+04 |
| 9 | 6.20E+05 | 6.24E+04 |
| 24 | 1.20E+06 | 1.00E+05 |
| 48 | 2.17E+06 | 5.77E+04 |

| | Value | -20% | 20% |
|---|---|---|---|
| Doubling time (hrs), counted over 47.5 hrs | 19.6 | 23.52 | 15.68 |
| Growth rate | 0.035 | 0.029 (lb) | 0.044 (ub) |

Table 3.14: Calculation of the growth rates and definition of upper (ub) and lower bounds (lb) imposed on the Molt-4 model.

| Time (hrs) | Viable concentration | Stdev |
|---|---|---|
| 0.5 | 4.63E+05 | 5.86E+04 |
| 9 | 6.37E+05 | 8.96E+04 |
| 24 | 1.00E+06 | 8.39E+04 |
| 48 | 2.00E+06 | 2.00E+05 |

| | Value | -20% | 20% |
|---|---|---|---|
| Doubling time (hrs), counted over 47.5 hrs | 22.0 | 26.4 | 17.6 |
| Growth rate | 0.032 | 0.026 (lb) | 0.039 (ub) |

Table 3.15: Lower bounds of commonly exchanged metabolites were adjusted according to the relation of change in uptake/secretion in the experiment.
| Direction | Exchange metabolite | Adjustment | Relation | CCRF-CEM lb | Molt-4 lb |
|---|---|---|---|---|---|
| secr | EX_mal_L(e) | higher secretion | 23.4 | 0.04597 | 0.00196 |
| secr | EX_3mop(e) | higher secretion | 4.15 | 0.00030 | 0.00007 |
| secr | EX_4mop(e) | higher secretion | 3.95 | 0.00028 | 0.00007 |
| secr | EX_cit(e) | higher secretion | 2.88 | 0.00599 | 0.00208 |
| secr | EX_lac_L(e) | higher secretion | 1.44 | 0.00046 | 0.00032 |
| secr | EX_3mob(e) | higher secretion | 1.2 | 0.00031 | 0.00026 |
| secr | EX_orn(e) | higher secretion | 1.11 | 0.00037 | 0.00033 |
| secr | EX_glu_L(e) | higher secretion | 1.35 | 0.00080 | 0.00108 |
| secr | EX_gly(e) | higher secretion | 1.18 | 0.00763 | 0.00900 |
| secr | EX_5oxpro(e) | higher secretion | 1.05 | 0.00010 | 0.00010 |
| upt | EX_chol(e) | lower uptake | 0.48 | -0.05637 | -0.02706 |
| upt | EX_glc(e) | lower uptake | 0.66 | -29.26278 | -19.31343 |
| upt | EX_pyr(e) | lower uptake | 0.62 | -1.63303 | -2.63391 |
| upt | EX_lys_L(e) | lower uptake | 0.72 | -0.51962 | -0.72169 |
| upt | EX_phe_L(e) | lower uptake | 0.78 | -0.18675 | -0.23942 |
| upt | EX_thr_L(e) | lower uptake | 0.85 | -0.37612 | -0.44250 |
| upt | EX_tyr_L(e) | lower uptake | 0.89 | -0.30240 | -0.33977 |
| upt | EX_trp_L(e) | lower uptake | 0.89 | -0.05743 | -0.06453 |
| upt | EX_leu_L(e) | lower uptake | 0.99 | -0.99609 | -1.00615 |
| upt | EX_ile_L(e) | no difference | 1 | -1.00615 | -1.00615 |

4 Contextualization Procedure and Modeling of Monocyte Specific TLR Signaling

Innate immunity is the first line of defense against invading pathogens. Toll-like receptor (TLR) signaling is involved in a variety of human diseases extending far beyond immune system-related diseases and affects a number of different tissues and cell types. Computational models often do not account for cell-type specific differences in signaling networks. Investigating these differences and their phenotypic implications could increase our understanding of cell signaling and of processes such as inflammation. The wealth of knowledge on TLR signaling has recently been summarized in a stoichiometric signaling network applicable to constraint-based modeling and analysis (COBRA). COBRA methods have been applied to investigate tissue-specific metabolism using omics data integration.
Comparable approaches have not been applied to signaling networks. In this study, we present ihsTLRv2, an updated TLR signaling network that associates 314 genes with 558 network reactions. We present a procedure for mapping transcriptomic data onto signaling networks and demonstrate the generation of a monocyte-specific TLR network. The generated monocyte network is characterized by the expression of a specific set of isozymes rather than by a reduction in pathway content. When further tailoring the network to a specific stimulation condition, we observed that the quantitative changes in gene expression due to LPS stimulation affected a tightly connected set of genes. Differential expression influenced about one third of the entire TLR signaling network, in particular NF-κB activation. Thus, a cell-type and condition specific signaling network can provide functional insight into signaling cascades. Furthermore, we demonstrate the energy dependence of TLR signaling pathways in monocytes.

4.1 Introduction

Toll-like receptors (TLRs) play a major role in innate immunity by sensing pathogens and inducing the innate immune response [31]. Each TLR specifically recognizes one or more exogenous and endogenous ligands. Exogenous ligands are highly conserved microbe-associated molecular patterns, e.g., CpG sequences within DNA or lipopolysaccharide (LPS), a cell wall component of gram-negative bacteria. Upon stimulation, downstream pathways and transcription factors (TFs) are activated, which modify gene expression and protein levels and induce the production of pro-inflammatory cytokines and chemokines, among others. Human cells express up to ten TLRs [31]. LPS specifically induces TLR4 signaling pathways [31]. Disturbance of TLR signaling is thought to play a role in chronic inflammatory diseases affecting cells of the gastrointestinal tract, the central nervous system, kidneys, skin, lungs, and joints [38].
TLRs also seem to be involved in both inhibiting and promoting cancer [39]. TLR expression has been confirmed for a large number of human tissues, yet the sets of expressed TLRs vary [211, 212, 213]. Activation of different downstream pathways has been suggested in response to viruses and to TLR7 and TLR8 agonists in distinct monocyte subsets [214]. Differences and similarities in the expression of isoforms, TLRs, and downstream pathways of the TLR network in different cells can have important implications for the design of therapeutic approaches. Drugs targeting TLR signaling pathways have considerable therapeutic potential in inflammatory diseases and cancer [42].

Monocytes are essential for the inflammatory response to microbial pathogens [215]. Blood-circulating monocytes migrate into tissues and differentiate into a range of tissue macrophages and dendritic cells. However, monocytes themselves are also involved in the defense against pathogens, as they possess an extensive set of pathogen receptors and produce large quantities of effector molecules [215, 216]. Aberrant TLR signaling in the monocyte/macrophage cell lineage has been implicated in chronic inflammatory and auto-inflammatory diseases [217]. However, the reason for the increased IL-1β secretion in some of these diseases remains unknown [217]. Taken together, understanding TLR signaling at cell-type and tissue-specific resolution appears to be of major importance for unraveling the mechanisms underlying disease development and progression.

Signaling networks comprise a complex meshwork of multiple pathways, feedback loops, and cross-talk. Such complex networks may be best investigated using computational approaches. Constraint-based modeling and analysis (COBRA) techniques facilitate the investigation of large-scale biological networks without depending on detailed kinetic and concentration information [2]. Instead, COBRA relies on physicochemical constraints.
A requirement for constraint-based modeling is a genome-scale reconstruction, which is subsequently converted into a mathematical format. The protocols for biochemical network generation are well established [9], and tools to interrogate the model are freely available [15]. These networks are applied to study metabolism under various conditions; in multicellular organisms, however, such studies are challenged by the fact that individual cell types are capable of only a limited range of metabolic functions. Hence, automated procedures have been developed that tailor global genome-scale reconstructions to specific tissues and cell types based on omics data sets [47, 131, 218]. COBRA procedures have also been successfully applied to study other cellular processes, including transcription and translation [6, 219], transcriptional regulation [220, 221], and signaling networks [7, 8, 222, 223].

The published generic TLR signaling network, ihsTLRv1 [7], represents a stoichiometric, predictive model comprising 963 reactions and 781 proteins. It includes the input receptors TLR1-11, NOD1, NOD2, and interleukin-1 receptor 1 (IL1R1). These receptors are connected to up to six outputs, ROS, CREB, AP-1, IRF-7, IRF3, and NF-κB (Table 4.5), through an extensive set of kinases and phosphatases. Owing to its coverage, it is well suited for investigating TLR signaling pathways on a broader scale and for use as a context template for gene expression data sets. However, it included no gene identifiers and no gene-reaction associations, so mapping of gene expression data and analysis of TLR signaling in a cell-type or disease specific context, in analogy to applications of metabolic networks, has not been possible so far. Tissue-specific differences in the cellular response to environmental stimuli have been recognized as a major challenge in cell signaling, yet many models of signaling pathways neglect these cell-type specific differences [40].
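As a reference point for the "mathematical format" mentioned above, the standard flux balance formulation used in COBRA can be written as the linear program below. This is the generic textbook form, not necessarily the exact objective or constraint set used for ihsTLRv1 or ihsTLRv2. Here $S$ is the stoichiometric matrix (rows are network components such as metabolites, proteins and complexes; columns are reactions), $v$ is the flux vector, $c$ is a weight vector selecting the reaction to optimize (for example, an output reaction in an input-output analysis), and $v_{\mathrm{lb}}$, $v_{\mathrm{ub}}$ are the lower and upper flux bounds.

$$
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v_{\mathrm{lb}} \le v \le v_{\mathrm{ub}}.
\end{aligned}
$$

The equality $S\,v = 0$ encodes the steady-state assumption, which is why such simulations carry no time component. A gene knock-out is commonly represented by additionally setting $v_{\mathrm{lb},j} = v_{\mathrm{ub},j} = 0$ for every reaction $j$ whose gene-reaction association can no longer be satisfied without that gene.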
The aim of this study was to explore the possibility of using COBRA methods and the human TLR signaling network to investigate tissue- and disease-specific differences in TLR signaling. We therefore first identified the set of genes associated with the reactions in ihsTLRv1 and then generated an updated version of the TLR signaling network (ihsTLRv2). We used the expression pattern of the identified TLR genes in human blood-derived monocytes to reduce ihsTLRv2 to the cell-type specific set of expressed isoforms, proteins, and reactions (Figure 4.2; see File S1 below for details on the procedure). We then investigated the extent and propagation of the changes induced by LPS stimulation on pathway utilization.

4.2 Results

4.2.1 Gene extension results in ihsTLRv2

Gene-reaction associations (GRAs), which connect each network reaction with the genes encoding the participating proteins, form the basis for cell-type or condition specific tailoring based on gene expression data. This contextualization was not possible with ihsTLRv1 because GRAs were missing. We employed the NCBI Entrez Gene database [224], UniProtKB/Swiss-Prot [225], and primary literature to identify Homo sapiens specific genes, and established GRAs using AND and OR Boolean logic.

ihsTLRv1 represented mammalian TLR signaling and included TLR1 through TLR11. However, the human open reading frame for TLR11 contains multiple stop codons, indicating that this receptor may not be expressed [226]. Therefore, we removed TLR11, ten associated reactions (Table 4.7), and eight chemical compounds from the network (Table 4.8). We added exchange reactions to resolve dead ends in the network, i.e., reactants that were only produced or only consumed (Table 4.9). Gene extension and tailoring of the receptor content led to the human gene-extended TLR signaling network, termed ihsTLRv2.
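The AND/OR logic of GRAs can be illustrated with a minimal Python sketch; it is not the code used to build ihsTLRv2. It assumes that each GRA is stored as a text rule such as `(G3 and G4) or G5`, where `and` joins obligatory subunits, `or` joins interchangeable isoforms, and `and` binds more tightly than `or`. The gene tokens (`G1` to `G5`), the reaction names, and the choice to keep reactions that have no GRA (for example exchange or sink reactions) are hypothetical illustrations. The same evaluation decides which reactions survive when the network is restricted to expressed genes or when a single gene is knocked out.

```python
from __future__ import annotations

import re

# Hypothetical GRA rules: 'and' = obligatory subunits, 'or' = interchangeable isoforms.
GRAS = {
    "R_isoform": "G1 or G2",
    "R_complex": "G3 and G4",
    "R_mixed": "(G3 and G4) or G5",
    "R_exchange": "",  # modeling reaction without a GRA
}
ALL_GENES = {"G1", "G2", "G3", "G4", "G5"}


def tokenize(rule: str) -> list[str]:
    """Split a rule into '(', ')', operators and gene tokens."""
    return re.findall(r"\(|\)|[^\s()]+", rule)


def evaluate_gra(rule: str, present: set[str]) -> bool:
    """Return True if the Boolean GRA is satisfied by the genes in `present`.

    'and' binds more tightly than 'or'. An empty rule means "no GRA" and is
    treated as satisfied, so such reactions are never removed.
    """
    tokens = tokenize(rule)
    if not tokens:
        return True
    pos = 0

    def peek() -> str | None:
        return tokens[pos].lower() if pos < len(tokens) else None

    def parse_or() -> bool:
        nonlocal pos
        value = parse_and()
        while peek() == "or":
            pos += 1
            rhs = parse_and()  # always parse so that pos moves past the operand
            value = value or rhs
        return value

    def parse_and() -> bool:
        nonlocal pos
        value = parse_atom()
        while peek() == "and":
            pos += 1
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom() -> bool:
        nonlocal pos
        tok = peek()
        if tok is None or tok in (")", "and", "or"):
            raise ValueError(f"unexpected token {tok!r} in rule {rule!r}")
        if tok == "(":
            pos += 1
            value = parse_or()
            if peek() != ")":
                raise ValueError(f"missing ')' in rule {rule!r}")
            pos += 1
            return value
        pos += 1
        return tokens[pos - 1] in present  # a gene token: is it present?

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unparsed tokens {tokens[pos:]} in rule {rule!r}")
    return result


def active_reactions(gras: dict[str, str], present: set[str]) -> set[str]:
    """Reactions that remain active given the genes considered present."""
    return {rxn for rxn, rule in gras.items() if evaluate_gra(rule, present)}


if __name__ == "__main__":
    for knocked_out in ("G1", "G3"):
        kept = active_reactions(GRAS, ALL_GENES - {knocked_out})
        print(knocked_out, sorted(kept))
```

With these example rules, removing the isoform `G1` leaves every reaction active because `G2` can substitute for it, whereas removing the subunit `G3` switches off `R_complex` while `R_mixed` stays active through `G5`.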
In total, we included 314 genes in ihsTLRv2, of which 312 genes were identified for 178 unique chemical compounds, and two genes associated with a choline uniport reaction were taken from the human metabolic reconstruction [45]. The choline uniport transporter encoded by these two genes was not a chemical compound in ihsTLRv2. The 178 unique chemical compounds can be divided into receptors (14), kinases (64), phosphatases (7), and the remaining chemical compounds (93), also referred to as other proteins. Receptors were each encoded by a single gene, while isoforms were much more common among the kinases (58%), the phosphatases (96%), and the other proteins (63%) (Table 4.1). Overall, redundant genes comprised 55% of the ihsTLRv2 gene content. We established GRAs for 558 of the 980 ihsTLRv2 reactions. A total of 291 modeling-related reactions (i.e., sink, demand, and exchange reactions) were not assigned GRAs. The remaining reactions without GRAs comprised transport reactions of metabolites (37), TLR ligand expression, transport, and binding reactions (87), reactions involving generic chemical compounds (3), and reactions involving orphan chemical compounds (4). Genes for the Ras family small GTP-binding protein generic (Ras) were not included in the current version of ihsTLRv2 because of functional ambiguity. The current version of ihsTLRv2 also does not include a gene association for lipopolysaccharide-binding protein (LBP) because of its external origin. The chemical compounds SRC (c-Src), SRCK (Src family kinase (generic)), and SRTK (Src-related tyrosine kinase) were not unambiguously defined in ihsTLRv1. After a thorough literature review, we assigned one gene to SRC (c-Src) and treated SRCK and SRTK as the same chemical compound.

Table 4.1: Statistics of the gene extension of the generic human TLR model.

| Group of chemical compounds | Chemical compounds (n) | Genes assigned (n) | Unique genes (n) | Redundant genes (n) |
|---|---|---|---|---|
| Receptors | 14 | 14 | 14 | 0 |
| Kinases | 64 | 100 | 42 | 58 |
| Phosphatases | 7 | 54 | 2 | 52 |
| Proteins | 93 | 144 | 83 | 61 |
| Total | 178 | 312 | 141 | 171 |

Redundant genes comprise all genes that are associated with chemical compounds having isoforms.

4.2.2 Protein-Protein Interactions (PPI) in InnateDB and ihsTLRv2

A compendium of genes, proteins, and interactions specific to the innate immune response of humans and mice to microbial infection has been collected in the Innate Immunity database, InnateDB [227]. We compared the ihsTLRv2 genes to the set of genes captured in the InnateDB PPIs to understand how the proteins involved in TLR signaling are connected to each other. The query resulted in interactions among 242 of the 314 genes. The majority of genes without interactions encoded isoforms, distributed mainly among 15 ihsTLRv2 chemical compounds. Five genes were not present in InnateDB: calpain small subunit 2 (Entrez Gene ID: 84290), diacylglycerol kinase kappa (Entrez Gene ID: 139189), and thioredoxin reductase 1, 2, and 3 (Entrez Gene IDs: 7296, 10587, and 114112, respectively).

We also computed the connectivity of the TLR network components based on the number of network components that co-appear in the network reactions. Metabolites and ligands were excluded from this analysis. We then ranked the ihsTLRv2 genes according to their number of PPIs derived from the InnateDB query as well as according to their connectivity in the model, and compared the two ranking lists (Table 4.2). As one may expect, the ranking order in the two lists was comparable for highly connected gene products, even though the number of connections was much smaller in the ihsTLRv2-based connectivity list. The ten most highly connected genes were all involved in the MyD88-dependent signaling pathway, a pathway employed by all TLRs except TLR3 [37].
In this pathway, MyD88 associates with a TLR and recruits IL-1 receptor-associated kinase 1 (IRAK1) and TNF receptor-associated factor 6 (TRAF6) [228]. Poly-ubiquitination of TRAF6 is necessary for the downstream activation of IKK and NF-κB [229]. For the connectivity comparison, we removed 159 chemical compounds. Some of these were among the most highly connected chemical compounds in ihsTLRv2, including protons, adenosine diphosphate (ADP), and adenosine triphosphate (ATP). Their high connectivity emphasizes the energy requirements of TLR signaling and its dependence on metabolic processes.

Table 4.2: Comparison between InnateDB interactions among ihsTLRv2 genes and interactions of ihsTLRv2 network species within ihsTLRv2.
Ranking (InnateDB) | Entrez Gene ID | Gene symbol | InnateDB interactions (n) | Corresponding ihsTLRv2 chemical compound | ihsTLRv2 connections (n) | Ranking (model connectivity)
1 | 7189 | TRAF6 | 180 | TRAF6-D[c] | 14 | 5
2 | 5970 | RELA | 161 | NFKB(p50/p65)[c] | 13 | 6
3 | 7316 | UBC | 143 | UBIQ[c] | 23 | 2
4 | 3551 | IKBKB | 137 | IKK[c] | 25 | 1
5 | 3654 | IRAK1 | 136 | IRAK1_TIFA-2P3U[c] | 16 | 4
6 | 1147 | CHUK | 133 | IKK[c] | 25 | 1
7 | 4790 | NFKB1 | 130 | NFKB(p50/p65)[c] | 13 | 6
8 | 8517 | IKBKG | 123 | IKK[c] | 25 | 1
9 | 4792 | NFKBIA | 121 | NFKB_IKBA[c] | 5 | 13
10 | 4615 | MYD88 | 119 | MYD88-D[c] | 16 | 4
Genes (InnateDB) and corresponding chemical compounds (ihsTLRv2) were ranked according to their number of interactions. Highly ranked genes were also highly connected in ihsTLRv2, despite the smaller number of connections.

4.2.3 SNPs in the TLR signaling network
SNPs have been linked to human pathophysiology [230]. However, human genotype-phenotype relationships are complex and challenging to elucidate. The immediate consequences of a sequence variation are relatively easy to assess, but changes that propagate through the entire cellular network may be less obvious. For instance, sequence variation can affect the kinetic properties of individual enzymes and/or their expression levels. Mapping such changes onto a metabolic network of the red blood cell demonstrated the functional consequences of selected SNPs on cell function in chronic and non-chronic anemia patients [231]. In a similar manner, testing the consequences of a functional loss of proteins might offer insights into the downstream effects of SNPs in TLR signaling. In total, we identified SNPs linked to known clinical phenotypes for 12 distinct genes. We simulated how a SNP-induced loss of function of these proteins affects the input-output (I/O) relationships in ihsTLRv2 (Table 4.3). Four of these genes encode receptors: TLR1, TLR3, NOD1, and NOD2. In silico knock-out (KO) of these four genes disabled the receptor-dependent I/O pathways. The TIR domain-containing adaptor protein (TIRAP), another gene with disease-linked SNPs, specifically mediates the MyD88-dependent pathway via TLR2 and TLR4 [228]. The KO of TIRAP completely disrupted the downstream pathways of TLR1/2 and TLR2/6, disabling all outputs induced by these inputs. However, TLR4 signaling was not affected in our simulations. In addition to MyD88-dependent signaling, TLR4 also induces a MyD88-independent pathway through the TIR domain-containing adaptor inducing interferon-β (TRIF). Both pathways activate NF-κB, although they are distinguished as early and late responses [228]. Our constraint-based steady-state simulation approach does not contain a time component. Thus, the differential effects of early and late responses could not be resolved, and in the case of TLR4 signaling, flux was redirected through the alternative pathway. To mimic fast and slow responses, one could, for example, add constraints on the upper bounds of the reactions involved in the slow signaling pathway. In our simulations, the outputs NF-κB and reactive oxygen species (ROS) were each selectively disrupted by the KO of some of the disease-related SNP genes. For instance, ROS production was disabled by the KO of either of two NADPH oxidase subunits: neutrophil cytosolic factor 2 (NCF2) or p22-phox protein (CYBA).
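The in silico KO logic can be illustrated with a small qualitative sketch. It is a hypothetical simplification and not the constraint-based simulation used for ihsTLRv2. Each reaction carries a gene-protein-reaction rule: "and" joins obligate complex subunits, and "or" joins interchangeable isoforms. A reaction stays active after a KO if its rule still evaluates to true. Output reachability is then checked by network expansion from the stimulating ligand. The toy network, reaction names, and rules are invented for illustration. They only mirror the TIRAP observation above, where TLR2 signaling is lost while TLR4 can still reach NF-κB via TRIF.

```python
def gpr_active(rule, knocked_out):
    """Evaluate a gene-protein-reaction rule after knocking out genes.

    rule: a gene name (str), ("and", ...) for obligate complex subunits,
    or ("or", ...) for interchangeable isoforms.
    """
    if isinstance(rule, str):
        return rule not in knocked_out
    operator, *operands = rule
    states = [gpr_active(r, knocked_out) for r in operands]
    return all(states) if operator == "and" else any(states)


def reachable(reactions, seeds, knocked_out=frozenset()):
    """Network expansion: fire every active reaction whose substrates are available."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products, rule in reactions.values():
            if not gpr_active(rule, knocked_out):
                continue  # reaction lost through the knock-out
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


# Hypothetical toy network (not ihsTLRv2): TLR2 signals only via TIRAP/MyD88,
# TLR4 additionally via TRIF; IRAK1/IRAK2 are treated as interchangeable isoforms.
toy = {
    "TLR4_bind": (["LPS"], ["TLR4_act"], "TLR4"),
    "TLR2_bind": (["LP"], ["TLR2_act"], "TLR2"),
    "TIRAP_TLR4": (["TLR4_act"], ["MYD88_cplx"], ("and", "TIRAP", "MYD88")),
    "TIRAP_TLR2": (["TLR2_act"], ["MYD88_cplx"], ("and", "TIRAP", "MYD88")),
    "MYD88_NFKB": (["MYD88_cplx"], ["NFKB"], ("or", "IRAK1", "IRAK2")),
    "TRIF_TLR4": (["TLR4_act"], ["TRIF_act"], "TICAM1"),
    "TRIF_NFKB": (["TRIF_act"], ["NFKB"], "TICAM1"),
}

for ko in [set(), {"TIRAP"}, {"IRAK1"}]:
    print(sorted(ko), {lig: "NFKB" in reachable(toy, [lig], ko) for lig in ["LPS", "LP"]})
```

In this sketch, knocking out one isoform (IRAK1) leaves the output reachable. This mirrors why KO of genes encoding exchangeable subunits had no effect in the generic model.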
NF-κB activation was selectively disabled by the KO of the alpha subunit of the inhibitor of kappa light polypeptide gene enhancer in B-cells kinase (IKK), which phosphorylates IκBα and thereby activates NF-κB [228]. Four of the 12 genes did not influence the flux distributions. These genes encoded exchangeable subunits of phosphatidic acid phosphatase, protein kinase C, protein kinase A, and an A20-binding inhibitor of NF-κB activation homolog. The KO of any of these four genes had no effect in the simulation, because the model could use an alternative isoform. However, in a particular tissue the number of isoforms actually expressed may be limited, so a phenotype might be observed in a more tailored, cell-type-specific TLR network. Taken together, our analysis provided some insight into the possible effects of SNP-dependent protein KO. The results also demonstrate the need for contextualization to better resemble the actual conditions in specific tissues or disease states.

4.2.4 Tissue-specific TLR expression
TLR expression is cell-type specific, reflecting distinct functions and exposure to pathogens. We wanted to assess these differences not only for TLR expression but for all proteins involved in TLR signaling. We therefore obtained protein expression information for the ihsTLRv2 gene set from the HPA for 66 normal cell types [190]. Data were available for 77 ihsTLRv2 genes (24.5%) present in at least one cell type. These 77 genes encoded 66 distinct proteins. Gene products with moderate/medium or strong/high expression were assumed to be present, while all other gene products were assumed to be absent. On average, each tissue expressed 40 proteins. Ovarian stroma cells expressed the fewest (18) and lung macrophages the most (53). Every day, large numbers of airborne pathogens inevitably reach the lung with inhaled air [232].
The expression of a high number of proteins involved in TLR signaling by lung macrophages might reflect such tissue-specific, constant pathogen exposure. However, gene coverage was on average only 13% per tissue. We clustered cell types and genes using the Euclidean distance measure and then inspected the result visually. This separated the genes into five clusters with distinct abundance among cell types (Figure 4.1). Clusters one and three contained sparsely expressed genes, expressed on average in 12 and 17 tissues, respectively. The genes of cluster one were almost exclusively expressed in lymphoid and hematopoietic cell types. In contrast, frequently expressed genes accumulated in the second, fourth, and fifth clusters, which were expressed on average in 41, 61, and 47 cell types, respectively. Cluster four consisted of 15 genes expressed in 92% of the cell types. After hierarchical clustering of the presence/absence data, we divided the cell types into two clear clusters based on the number of expressed genes. The first cluster contained cells with high numbers of expressed genes, mostly neuronal, glandular, hematopoietic, and lymphoid cell types.

Table 4.3: ihsTLRv2 genes with clinically linked SNPs, corresponding clinical phenotypes, and consequences of in silico knock-out on ihsTLRv2 function.
Entrez gene ID | Gene symbol | Phenotype MIM number | Phenotype | KO effect
9663 | LPIN2 | 609628 | Majeed syndrome | no
7096 | TLR1 | 613223 | Leprosy, protection against | TLR1/10, TLR1/2 signaling
7096 | TLR1 | 613223 | Leprosy, susceptibility to | TLR1/10, TLR1/2 signaling
64127 | NOD2 | 609464 | Sarcoidosis, early-onset | NOD2 signaling
64127 | NOD2 | 186580 | Blau syndrome | NOD2 signaling
64127 | NOD2 | 266600 | Inflammatory bowel disease 1 | NOD2 signaling
64127 | NOD2 | 607507 | Psoriatic arthritis, susceptibility to | NOD2 signaling
5573 | PRKAR1A | 101800 | Acrodysostosis with hormone resistance | no
5573 | PRKAR1A | 160980 | Carney complex, type 1 | no
5573 | PRKAR1A | 255960 | Myxoma, intracardiac | no
5573 | PRKAR1A | 610489 | Pigmented nodular adrenocortical disease, primary | no
5573 | PRKAR1A | 188550 | Thyroid carcinoma, papillary, somatic | no
4688 | NCF2 | 233710 | Chronic granulomatous disease due to deficiency of NCF-2 | ROS production
5578 | PRKCA | 612967 | Body mass index QTL 15 | no
7128 | TNFAIP3 | 612378 | Systemic lupus erythematosus, susceptibility to | no
1147 | CHUK | 612363 | Plasma level of alanine aminotransferase, QTL 1 | NF-κB
1535 | CYBA | 233690 | Chronic granulomatous disease, autosomal, due to deficiency of CYBA | ROS production
114609 | TIRAP | 606252 | Bacteremia, protection against | TLR1/2, TLR2/6 signaling
114609 | TIRAP | 611162 | Malaria, protection against | TLR1/2, TLR2/6 signaling
114609 | TIRAP | 610799 | Pneumococcal disease, invasive, protection against | TLR1/2, TLR2/6 signaling
114609 | TIRAP | 607948 | Tuberculosis, protection against | TLR1/2, TLR2/6 signaling
7098 | TLR3 | 613002 | Herpes simplex encephalitis, susceptibility to | TLR3 signaling
10392 | NOD1 | 266600 | Inflammatory bowel disease 1 | NOD1 signaling
Listed are those genes for which SNPs with a clinical phenotype could be identified, as provided by the Exome Variant Server (URL: http://evs.gs.washington.edu/EVS/).
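The presence/absence calls and the distance measure used for clustering can be sketched as below. The sketch assumes that HPA levels are given as text labels, with "medium/moderate" and "high/strong" counted as present, as described above. The gene and cell-type names and all levels are invented. On 0/1 vectors, the Euclidean distance reduces to the square root of the number of cell types in which two genes disagree. The linkage method used for the hierarchical clustering is not stated in the text, so it is not shown here.

```python
import math

# Assumption: these HPA level labels count as "present"; everything else is absent.
PRESENT_LEVELS = {"medium", "moderate", "high", "strong"}


def presence_matrix(levels):
    """Convert {gene: {cell_type: level}} into {gene: {cell_type: 0 or 1}}."""
    return {
        gene: {cell: int(level.lower() in PRESENT_LEVELS) for cell, level in row.items()}
        for gene, row in levels.items()
    }


def euclidean(u, v):
    """Euclidean distance; on 0/1 vectors this is sqrt(number of mismatches)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Invented toy levels for three genes in three cell types (not HPA data).
levels = {
    "geneA": {"cell1": "High", "cell2": "Not detected", "cell3": "Low"},
    "geneB": {"cell1": "Medium", "cell2": "Medium", "cell3": "High"},
    "geneC": {"cell1": "Low", "cell2": "Medium", "cell3": "High"},
}
cells = ["cell1", "cell2", "cell3"]
present = presence_matrix(levels)
vectors = {g: [present[g][c] for c in cells] for g in present}
per_cell = {c: sum(present[g][c] for g in present) for c in cells}  # proteins per cell type
print(per_cell)
print(euclidean(vectors["geneB"], vectors["geneC"]))
```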
The data mapping revealed one example of tissue-specific isoform expression: calcium/calmodulin-dependent protein kinase II beta (CaMK-II subunit β), which was exclusively expressed in six neuronal cell types. However, neural glial cells did not express CaMK-II subunit β. In mammals, four isoforms of CaMK-II (α, β, γ, and δ) are expressed from separate genes, and two of them (α and β) are brain specific [233]. Our HPA data accurately captured the specific expression of the β isoform, but no reliable tissue-specific expression of the other isoforms could be observed. Overall, analyzing the TLR signaling network gene set with HPA data provided evidence of distinct expression patterns of genes and isoforms at the tissue and cell level, e.g., for brain or lymphoid tissues, in agreement with experimental data. However, the low coverage of the ihsTLRv2 genes in the HPA data would not be sufficient for the generation of tissue-specific networks of TLR signaling.

Figure 4.1: Expression of ihsTLRv2 gene products in normal human tissues. Clustering revealed distinct expression patterns of the ihsTLRv2 genes with respect to genes and tissues. Genes divided into five clusters with distinct mean abundance across tissues. The gene clusters were expressed on average in 12, 41, 17, 61, and 47 tissues. Clustering of tissues and cell types revealed two clusters. The first cluster (left) ends after the closely grouped lymphoid cell types. Within the first tissue cluster, certain groups of related cell types, such as lymphoid or CNS neuronal cell types, clustered close together. Lymphoid cell types selectively expressed genes of the first gene cluster. Tissues and cell types in the second cluster expressed genes of the third and fifth gene clusters less frequently.
4.2.5 Protein abundance of ihsTLRv2 in cancer cell lines
After characterizing distinct expression patterns among cell types, we were interested in the abundance of proteins involved in TLR signaling within cells. Recent advances in high-resolution mass spectrometry (MS)-based proteomics have enabled large-scale investigation of cellular protein concentrations [110, 111]. The abundance of distinct protein categories, such as metabolic enzymes, transcription factors, and kinases, has been assessed [111]. However, a context-dependent analysis of a signaling network comprising different protein categories (receptors, kinases, phosphatases) has not yet been performed. Using our well-defined, well-curated TLR signaling network, we asked whether our specific set of signaling proteins, and TLRs in particular, were differentially abundant in different cell types. To address this question, we used data from two recent studies that measured protein abundance at large scale in two cell lines. These were HeLa cells, which originate from cervical cancer [111], and U2OS cells, which originate from human osteosarcoma [110]. We identified 164 ihsTLRv2 gene products in the HeLa cell line data and 155 in the U2OS cell data. In HeLa cells, the concentrations of the ihsTLRv2 gene products ranged from 27 copies per cell (serine/threonine-protein phosphatase 2A regulatory subunit B', beta) up to 14.5 million copies per cell (ubiquitin). The entire HeLa proteome ranged from 0.2 to 33 million copies per cell, and the ihsTLRv2 gene products covered almost this entire range, with ubiquitin being the 10th most abundant protein in the HeLa proteome. In U2OS cells, the measured concentrations of the entire proteome ranged from <500 up to >20 million copies per cell, and those of the ihsTLRv2 gene products from <500 up to 11.7 million copies per cell.
The latter was heterogeneous nuclear ribonucleoprotein A1 (HNRNPA1), which was among the 30 most abundant proteins in the U2OS proteome. Concentrations of ihsTLRv2 gene products were similar between the two cell lines, both for proteins with high concentrations and for those with low concentrations. One example was heat shock protein beta-1 (Hsp27). Heat shock proteins are known to be overexpressed in human cancer cells, and Hsp27 expression has been associated with poor prognosis in several types of cancer [234]. Therefore, it is not surprising to find Hsp27 expressed in high quantities in both cancer cell lines. At the other end of the scale, a subunit of serine/threonine-protein phosphatase 2A was present at very low concentrations. This particular regulatory subunit was not part of the U2OS data set, but we found alternative subunits of serine/threonine-protein phosphatase 2A among the proteins with low concentrations (<500 copies). A total of 118 ihsTLRv2 gene products were identified in both cell lines, corresponding to 38% of all ihsTLRv2 genes. This high overlap indicates that the majority of the TLR network is expressed in these two cell lines, but isoform usage may differ. Indeed, the shared gene products made up 50% of the chemical compounds in ihsTLRv2. Considering the proteins expressed in each cell line, 59% of the ihsTLRv2 chemical compounds were expressed in U2OS cells and 62% in HeLa cells. Expression of distinct or additional isoforms was observed in a number of cases, such as the 14-3-3 protein family, casein kinase 2 (CK2), phosphoinositide 3-kinase (PI3K1A), and protein kinase A (PKA). Both cell types expressed additional isotypes of the 14-3-3 proteins, which are involved in a wide range of cellular functions [235]. The 14-3-3 proteins can modulate interactions between proteins.
For example, different 14-3-3 isotypes mediate the complex formation of Raf-1 with distinct PKC isotypes, leading to tissue-specific differences in the resulting complex [236]. In the previous section, CaMK-II subunit β was specifically expressed in neurons of the central nervous system (CNS). In the current analysis, both cell lines expressed CaMK-II subunit δ, and HeLa cells additionally expressed CaMK-II subunit γ. Both isoforms have been reported to be expressed in numerous rat tissues [237]. The only TLR expressed was TLR9, in HeLa cells. Over-expression of TLR9 has previously been reported in lung cancer and HeLa cells. It has been suggested that TLR9 might contribute to cancer proliferation, although the mechanism is unknown [238]. Apart from this, cervical cells appear to lack TLR expression in the absence of infection, possibly preventing excessive activation of TLR signaling by the normal vaginal flora [239]. Bacterial, viral, or protozoan infection of the cervix has been associated with the development of cervical cancer [239]. However, human papillomavirus infection alone is insufficient to cause cervical cancer [239]. For U2OS cells, in contrast, no information on TLR signaling was found. Apart from the similarities discussed above, no correlation was observed between the expression levels of the entire set of 118 overlapping proteins. This indicates that the two cell lines use the TLR signaling network quite differently.

4.2.6 Generation of a draft monocyte-specific TLR model based on gene expression data
To derive a monocyte-specific model of TLR signaling (ihsMonoTLR), we mapped gene expression data from untreated monocytes [240] onto the network. We needed a cutoff to distinguish between the presence and absence of expressed genes. To find a suitable one, we generated draft reconstructions based on two different cutoffs, p ≤ 0.01 and p ≤ 0.05. A set of 37 genes received absent calls only under the more stringent cutoff.
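The cutoff comparison amounts to two set operations, sketched below. The sketch assumes one detection p-value per gene. Arrays often report several probe sets per gene that would first have to be combined, and the text does not say how this was done. The p-values and validation calls are invented. The genes present at p ≤ 0.05 but not at p ≤ 0.01 are the contested set. The fraction of contested genes confirmed as present by external evidence (immunohistochemistry or literature) is the quantity that guided the choice of the moderate cutoff.

```python
def present_genes(detection_p, cutoff):
    """Call a gene present when its detection p-value is at or below the cutoff."""
    return {gene for gene, p in detection_p.items() if p <= cutoff}


def fraction_confirmed(genes, evidence):
    """Fraction of genes with external evidence that were found 'present'.

    Raises ZeroDivisionError if none of the genes has any evidence.
    """
    judged = [g for g in genes if g in evidence]
    return sum(evidence[g] == "present" for g in judged) / len(judged)


# Invented detection p-values (one per gene) and invented validation calls.
detection_p = {"geneA": 0.003, "geneB": 0.02, "geneC": 0.04, "geneD": 0.30, "geneE": 0.03}
evidence = {"geneB": "present", "geneC": "absent", "geneE": "present"}

stringent = present_genes(detection_p, 0.01)
moderate = present_genes(detection_p, 0.05)
contested = moderate - stringent  # absent only under the stringent cutoff
print(sorted(contested), fraction_confirmed(contested, evidence))
```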
The cutoff had a major impact on the number of dead-end metabolites and blocked reactions, i.e., reactions that cannot carry any flux in the network due to dead-end metabolites (Figure 4.3). The choice of cutoff therefore strongly affects the network capabilities. It also affects the time needed for manual curation to give the model functionality similar to that of the cell. Protein expression data were obtained for 23 of these genes in two monocytic leukemia cell lines (THP-1 and U-937) [241]. Most of these gene products (17) were moderately expressed, in most cases in both cell lines. The remaining six gene products had not been detected by immunohistochemistry. Four of them were absent in both cell lines, and two were expressed in only one of the cell lines. Statistical detection probability did not correspond to negative detection, so the data gave no reason to favor the more stringent cutoff, p ≤ 0.01 (Figure 4.3). A literature search yielded experimental evidence for the presence of four genes (Table 4.10). Immunohistochemistry and literature evidence showed that most genes rejected by the stringent cutoff were present in monocytes. We therefore continued the network tailoring with the p ≤ 0.05 cutoff, as it seemed more suitable for monocytes and the given data set.

4.2.7 Literature-based curation of the draft monocyte-specific TLR model
The literature provided evidence for the production of all six outputs in human monocytes [214, 242, 243, 244]. Using flux variability analysis (FVA) [11, 162] on the draft monocyte TLR model, we found that ROS production was only partly possible, through one of two defined output reactions, and that NF-κB production was completely impaired (Table 4.6). We completed the corresponding output pathways by adding protein kinase C, zeta (EntrezGene ID: 5590), which recovered both outputs (Table 4.11; see also the Methods section).
This gene product is known to be important in NF-κB activation, and its presence in U-937 cells has been demonstrated [245]. Three genes (Entrez Gene IDs: 815-818), encoding isoforms of CaMKII, had a direct impact on the model's output capabilities. Reincorporating the genes encoding CaMKII resulted in a major increase in CREB output production (Table 4.11). We further curated ihsMonoTLR based on known monocyte function, instead of relying solely on a pathway-driven approach. Only genes absent from ihsMonoTLR were considered, while isoforms of already captured genes were ignored. In this step, 14 genes were reintroduced into the ihsMonoTLR network based on literature support (Table 4.12). The final monocyte-specific TLR signaling network, ihsMonoTLR, contained 62 fewer genes than the generic TLR signaling network, ihsTLRv2 (Figure 4.2). The gene reduction mainly affected redundant genes, while the signaling pathways mostly remained complete. The genes absent from the final ihsMonoTLR model encode proteins of 22 chemical compounds in the network (Table 4.4). The number of expressed isozymes decreased strongly, e.g., for calpains, which are ubiquitously expressed calcium-dependent cysteine proteases. Their functions include, among others, pro-IL-1 processing [246]. In total, nine of the 16 calpain genes were not expressed in the monocytes. Despite the reduced number of genes, functional calpain complexes could still be assembled, and the model could produce IL-1. During the curation step, the number of dead-end metabolites and blocked reactions was reduced (Figure 4.3B). The output capabilities of ihsMonoTLR remained equal to those of ihsTLRv2 (Table 4.6). TLR3 and TLR10 were absent from the monocyte-specific model, while all other TLRs and NODs were present (Table 4.5), in agreement with the literature [212, 214, 247, 248, 249]. The wealth of supporting literature evidence for TLR signaling components in monocytes indicates that we defined a biologically plausible cutoff for generating a monocyte model of TLR signaling. We also showed that generating the monocyte-specific model required substantial manual curation after mapping the gene expression data, so that the model reflects the known cell-type-specific receptor content.

Figure 4.2: Workflow leading from ihsTLRv1 to a data-driven monocyte model and an LPS-stimulated monocyte model. The workflow describes the generation of cell-type-specific, and subsequently cell- and condition-specific, models of TLR signaling in four steps. (1) In the first step, Homo sapiens genes and gene-reaction associations were added to the model. Further, reactions and chemical compounds connected to TLR11 signaling were deleted, and exchange reactions were added. (2) Transcriptomic data were mapped to the model, leading to preliminary monocyte-specific models of TLR signaling using different cutoffs during the mapping process. (3) The most suitable preliminary model was chosen based on comparison with cell-type-specific proteomic data (HPA) and literature evidence. Manual curation was essential to ensure monocyte-specific input-output capabilities of the final monocyte model, ihsMonoTLR. (4) Transcriptomic data derived from LPS-stimulated monocytes were mapped to ihsMonoTLR to make the model condition specific. Statistics on the network sizes at each stage show that network size remains comparable while gene content decreases with increasing modeling resolution.

Figure 4.3: Definition of the cutoff for the initial monocyte draft model. A. The procedure for generating the monocyte-specific model was divided into two parts. First, a suitable cutoff was defined for mapping the gene expression data. To this end, preliminary monocyte models were generated for two cutoffs (p ≤ 0.01 and p ≤ 0.05). Both cutoffs led to high numbers of blocked reactions and dead-end nodes in the networks. We identified the set of genes absent only under the more stringent cutoff. We then validated expression of their gene products using the Human Protein Atlas immunohistochemical data of two monocytic leukemia cell lines (THP-1 and U-937), and chose the cutoff that best represented monocyte protein expression. The second part of the procedure ensured monocyte-specific network functionality: input and output capabilities of the monocyte model were curated according to cell-type-specific literature evidence. B. Statistics on the number of deleted genes, reactions constrained during data mapping, blocked reactions, and dead-end nodes in the preliminary monocyte models, the curated monocyte model (ihsMonoTLR), and the LPS stimulation-specific monocyte model (ihsMonoTLR_LPS). C. Graph illustrating the detection probability of the genes absent under the stringent cutoff and present under the moderate cutoff. Genes are colored according to whether they were expressed (red), not expressed (blue), had no data available (pale), or showed discordant data between cell lines (purple). In many cases, no data were available or the proteins were expressed in the cell lines. Only in a few cases were the genes not expressed in any of the cell lines. Also, absent gene expression was distributed across the entire range of thresholds, so no intermediate cutoff could be established. As a result, the monocyte model was based on the more moderate cutoff.
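Dead-end metabolites and the reactions they block, as counted in Figure 4.3B, can be identified topologically. The sketch below rests on two assumptions. First, reactions are stored as stoichiometry dictionaries with a reversibility flag. Second, a metabolite is a dead end if no reaction can produce it or none can consume it, where reversible reactions count as both. Removing reactions that touch dead ends and repeating the check propagates the block along a pathway. This is only a topological approximation: in FVA, a blocked reaction is one whose minimum and maximum flux are both zero. The toy network is hypothetical.

```python
def dead_end_metabolites(reactions):
    """reactions: {rxn_id: ({metabolite: stoichiometric coefficient}, reversible)}.

    A metabolite is a dead end if no reaction can produce it or no reaction
    can consume it; reversible reactions can do both.
    """
    produced, consumed = set(), set()
    for stoichiometry, reversible in reactions.values():
        for met, coeff in stoichiometry.items():
            if coeff > 0 or reversible:
                produced.add(met)
            if coeff < 0 or reversible:
                consumed.add(met)
    return sorted((produced | consumed) - (produced & consumed))


def blocked_by_dead_ends(reactions):
    """Iteratively remove reactions that touch dead-end metabolites.

    Topological approximation only; FVA (minimum = maximum = 0) is the
    rigorous test for blocked reactions.
    """
    active, blocked = dict(reactions), set()
    while True:
        dead = set(dead_end_metabolites(active))
        hit = {r for r, (stoich, _) in active.items() if dead & set(stoich)}
        if not hit:
            return sorted(blocked)
        blocked |= hit
        for r in hit:
            del active[r]


# Toy network: the receptor "tlr4" has no source, so the whole cascade is blocked.
toy = {
    "EX_lps": ({"lps": -1}, True),                             # ligand exchange
    "bind": ({"lps": -1, "tlr4": -1, "lps_tlr4": 1}, False),
    "signal": ({"lps_tlr4": -1, "nfkb": 1}, False),
    "DM_nfkb": ({"nfkb": -1}, False),                          # output reaction
}
print(dead_end_metabolites(toy))
print(blocked_by_dead_ends(toy))
```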
Table 4.4: Distribution of absent genes
Chemical compound | Genes absent in hMonoTLR | Genes encoding chemical compound | Group of chemical compound
Ajuba | 1 | 1 | Protein
Kinase suppressor of RAS 2 | 1 | 1 | Kinase
Toll-like receptor 10 | 1 | 1 | Receptor
Toll-like receptor 3 | 1 | 1 | Receptor
beta-transducin repeat containing protein 2 | 1 | 2 | Protein
A20-binding inhibitor of NF-κB activation | 1 | 3 | Protein
Sarco/endoplasmic reticulum Ca(2+)-ATPase | 1 | 3 | Kinase
Serum/glucocorticoid regulated kinase | 1 | 3 | Kinase
Thioredoxin reductase | 1 | 3 | Protein
Ubiquitin-conjugating enzyme E2D | 1 | 3 | Protein
cAMP responsive element binding protein | 1 | 4 | Protein
Ubiquitin | 1 | 4 | Protein
Protein phosphatase 2B | 1 | 5 | Phosphatase
Protein kinase A | 1 | 7 | Kinase
Choline uniport | 2 | 2 | Metabolite transporter*
Phosphoinositide 3-kinase | 2 | 6 | Kinase
MAP kinase phosphatase | 3 | 16 | Phosphatase
Protein phosphatase 2A | 3 | 17 | Phosphatase
Src family kinase/Src-related tyrosine kinase | 5 | 10 | Kinase
Diacylglycerol kinase (generic) | 6 | 10 | Kinase
Histone H3 | 9 | 12 | Protein
Phosphatidic acid phosphatase (generic) | 9 | 14 | Phosphatase
Calpain | 9 | 16 | Protein
*The metabolite transporter has no chemical compound, as its genes were only added to the transport reaction.

4.2.8 Tailoring the monocyte TLR model to an LPS stimulation-specific model
In monocytes, TLR4 stimulation activates several signaling pathways and transcription factors (TFs) and induces inflammatory gene expression programs [242]. To investigate this distinct network state, we used gene expression data from the aforementioned experiment [240] to make ihsMonoTLR condition specific. Two ihsMonoTLR genes, pellino homolog 3 (PELI3) and TLR6, were no longer expressed upon LPS stimulation. In human peripheral blood mononuclear cells, LPS stimulation has been shown experimentally to cause degradation of the PELI3 gene product, with protein levels recovering only after several hours [250]. The absence of the PELI3 gene was therefore unlikely to be an artifact. The resulting ihsMonoTLR_LPS model contained 960 reactions and 763 chemical compounds (Figure 4.2). Compared to the monocyte model, it had three fewer reactions (those associated with the two absent genes) and two fewer chemical compounds. The number of dead ends and blocked reactions increased, while the functionality with respect to the outputs remained the same (Table 4.6). Two genes that were absent in unstimulated monocytes appeared to be expressed after LPS stimulation: a Src family kinase (Entrez Gene ID: 7525) and TNFAIP3 interacting protein 3 (Entrez Gene ID: 79931). Since these genes were not expressed in unstimulated monocytes, they were no longer part of ihsMonoTLR and were not further considered in ihsMonoTLR_LPS.

Table 4.5: Inputs and outputs covered by the generic (ihsTLRv2) and monocyte-specific (hMonoTLR and hMonoTLR_LPS) TLR signaling models. *IRF7 can only be produced after stimulation of combinations of TLRs: TLR3 or TLR4 combined with TLR7, TLR8, or TLR9.
Ligand abbrev. | Ligand name | Receptor type activated
26dap-LL | diaminopimelic acid | NOD1
ALPS | atypical lipopolysaccharide | TLR2
BDFN2 | beta defensin 2 | TLR4
BPM | bropirimine | TLR7
CPGCIGC | CpG chromatin IgG2a complexes | TLR9
CSGA | CsgA | TLR2
DCLDLPP | diacetylated lipopeptides | TLR2/6
DCLLPP | diacyl lipopeptides | TLR2/6
DSRNA | double-stranded RNA | TLR3
ENVP | envelope protein | TLR4
FBNG | fibrinogen | TLR4
FLGN | flagellin | TLR5
FUSP | fusion protein | TLR4
GCSPL | glycoinositol phospholipids | TLR2
GLC | glycolipids | TLR2
HSP60 | heat shock protein (60 kDa) | TLR4
HSP70 | heat shock protein (70 kDa) | TLR2, TLR4
IMQ | imidazoquinoline | TLR7, TLR8
LAM | lipoarabinomannan | TLR2
LP | lipoprotein | TLR2
LPPS | lipopeptides | TLR2
LPS_HS | lipopolysaccharide (Homo sapiens) | TLR2, TLR4
LTA | lipoteichoic acid | TLR2/6, TLR2
LXR | loxoribine | TLR7
MRAP | mannuronic acid polymer | TLR2, TLR4
MRDP | muramyl dipeptide | NOD2
MRNA | mRNA | TLR3
OLSCHYA | oligosaccharides of hyaluronic acid | TLR4
OMPA | outer membrane protein A | TLR2
OSPALP | outer surface protein A | TLR2/6
IL1A | IL-1A | IL1R1
IL1B | IL-1B | IL1R1
PRNS | porins | TLR2
PSCHPS | polysaccharide fragment of heparan sulphate | TLR4
PSM | phenol-soluble modulin | TLR2/6, TLR2
PTG_HS | peptidoglycan (Homo sapiens) | TLR2
SF | soluble factors | TLR1/2
SSRNA | single-stranded RNA | TLR7, TLR8
STF | soluble tuberculosis factor | TLR2/6
T3RFBN | type III repeat extra domain A of fibronectin | TLR4
TCLDLPP | triacetylated lipoproteins | TLR1/2
TLRL1/10 | TLR1/10 ligand | TLR1/10
TLRL10 | TLR10 ligand | TLR10
TLRL2/10 | TLR2/10 ligand | TLR2/10
TXL | taxol | TLR4
UMLCPGD | unmethylated CpG DNA | TLR9
ZMS | zymosan | TLR2/6, TLR2
Further columns of the original table: outputs (NF-κB, AP-1, CREB, ROS, IRF-3, IRF-7*) and the first model in which the input is absent (hMonoTLR or hMonoTLR_LPS).

Table 4.6: Maximum possible flux values for output reactions in the different TLR signaling models. Fluxes are given in µmol/(g protein · min).
Output | ihsTLRv2 | ihsMonoTLR draft p≤0.05 | ihsMonoTLR draft p≤0.01 | final ihsMonoTLR | ihsMonoTLR_LPS
IRF3 | 25.00 | 0.00 | 0.00 | 25.00 | 25.00
IRF7 | 11.11 | 11.11 | 11.11 | 11.11 | 11.11
ROS | 25.00 | 25.00 | 0.00 | 25.00 | 25.00
ROS | 50.00 | 50.00 | 27.00 | 50.00 | 50.00
AP-1 | 25.00 | 1.00 | 1.00 | 25.00 | 25.00
CRE | 12.50 | 12.50 | 0.00 | 12.50 | 12.50
AP-1 | 25.00 | 25.00 | 0.00 | 25.00 | 25.00
NF-κB | 14.29 | 0.00 | 0.00 | 14.29 | 14.29
NF-κB | 14.29 | 0.00 | 0.00 | 14.29 | 14.29
Both genes encode isoforms, and in both cases at least one other isoform was present in unstimulated and stimulated monocytes. Adding the genes would therefore not have altered the number of active reactions. This observation highlights two points. First, generating a truly generic, condition-unspecific monocyte model requires compiling multiple data sets. Second, redundant genes would need to be curated.

4.2.9 Condition-specific network states of monocyte TLR signaling
The monocyte models reconstructed here allow for the simulation and analysis of changes in energy levels and altered gene expression that occur during the innate immune response. Both cases are investigated in the following sections.

4.2.10 Sensitivity analysis
Signaling and the innate immune response are energy-dependent cellular processes. During the antibacterial innate immune response, intracellular ATP levels might be rapidly depleted [251]. To evaluate the energy dependency of the TLR signaling network, we performed a sensitivity analysis in ihsMonoTLR_LPS (File S3; see also the Methods section). It tested the ATP and guanosine triphosphate (GTP) requirements of the distinct outputs produced after stimulation through one of 13 input receptors. We found the same qualitative dependencies on energy species for 12 input receptors and the production of ROS, CREB, AP-1, and NF-κB (File S3). As depicted for TLR4 stimulation (Figure 4.4), the production of all outputs depended on ATP, and ROS production additionally depended on GTP. TLR4 is the only input receptor in the monocyte model that produces both interferon regulatory factor 3 (IRF3) and IRF7, whose production depended on ATP but not on GTP. NOD receptors produced outputs other than NF-κB due to thermodynamically infeasible loops in the network (discussed in [7]), which are necessary for network function (File S3, Figures S2-S3). Production of NF-κB upon NOD receptor (NOD1 and NOD2) stimulation was ATP- but not GTP-dependent.
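One way to formalize such a sensitivity analysis is sketched below. It may differ from the exact protocol in File S3. The symbols follow standard constraint-based notation: S is the stoichiometric matrix, v the flux vector, and v_k the output reaction of interest. The following are assumptions made for this illustration. The stimulated input reaction v_in is fixed at a value \bar{v}_{in}. Each energy species E has a single uptake flux v_E, written as nonnegative. The upper bound α on that uptake is scanned.

```latex
\begin{aligned}
f_k(\alpha) \;=\; \max_{v}\ & v_k \\
\text{s.t.}\ & S\,v = 0, \\
& v_{\mathrm{in}} = \bar{v}_{\mathrm{in}}, \\
& 0 \le v_E \le \alpha, \qquad E \in \{\mathrm{ATP}, \mathrm{GTP}\}, \\
& lb_j \le v_j \le ub_j \quad \text{for all other reactions } j.
\end{aligned}
```

Scanning α from zero upward traces how the maximal output flux responds to energy availability. Under this formulation, an output would be classified as dependent on E if f_k(0) = 0, i.e., if the output cannot be produced once the uptake of E is blocked.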
This sensitivity analysis demonstrates that the TLR model requires ATP and GTP and that energy availability could indeed modulate the signaling outputs.

4.2.11 Setting quantitative gene expression changes into context
Quantitative changes in gene expression could alter flux distributions and the outputs produced within the network. Compared to the relatively small differences in qualitative gene expression, LPS stimulation induced the up-regulation of 28 ihsMonoTLR_LPS genes (Table 4.13) and the down-regulation of three genes (Table 4.14). Together, they represented 12% of the genes. Of the 28 up-regulated genes, ten encoded isoforms, while none of the down-regulated genes did. We called a gene differentially expressed when at least 50% of its probe sets were differentially expressed. Eight genes with regulated probe sets (three up- and five down-regulated) were rejected by this threshold. We refer to them as subthreshold genes in the following sections. Subthreshold genes represented a further 3% of the ihsMonoTLR_LPS genes. Taken together, only a small number of the ihsMonoTLR_LPS genes showed altered expression levels two hours after LPS stimulation.

Figure 4.4: Sensitivity analysis. hMonoTLR_LPS was used for the sensitivity analysis. The network contains nine output reactions for six distinct outputs: ROS, IRF3, IRF7, CRE, AP-1, and NF-κB. ROS, CRE, AP-1, and NF-κB could be produced by all receptor inputs. Energy dependencies of output production did not differ among input receptors. IRF3 was only produced after stimulation of TLR4. IRF7 was only produced when TLR4 and either TLR7, TLR8, or TLR9 were stimulated together; in the case of IRF7, we stimulated the network via TLR4 and TLR8.

Estimation of the impact of the up-regulated genes on network topology
We were interested in the impact of the regulated genes on TLR signaling network functionality.
Since ihsMonoTLR_LPS accurately represents the functions of each gene product, we extracted a sub-network consisting of all reactions associated with the 28 up-regulated genes (ihsMonoTLR_LPS_upreg). It included 185 reactions (19% of ihsMonoTLR_LPS) and 296 chemical compounds (39% of ihsMonoTLR_LPS) (Figure 4.5). The sub-network also included the output reactions for NF-κB and AP-1. This implies that the up-regulated gene set influences these two model outputs in particular, in agreement with experimental data [242, 253]. We compared the connectivity of the chemical compounds within the sub-network with that in ihsMonoTLR_LPS. The high connectivity of protons, ATP, and ADP was conserved in the sub-network, even though their relative connectivity was smaller there than in ihsMonoTLR_LPS (Figure 4.6). Chemical compounds such as ubiquitin and the inhibitor of kappa light polypeptide gene enhancer in B-cells kinase (IKK) had fewer connections in the sub-network than in ihsMonoTLR_LPS. Relative to the number of chemical compounds in each network (n=763 in ihsMonoTLR_LPS and n=296 in the sub-network), however, their connectivity was higher in the sub-network. In contrast, chemical compounds such as TRAF6 and MyD88 were more highly connected in ihsMonoTLR_LPS.

Figure 4.5: Network resulting from mapping the up-regulated genes onto the LPS stimulation-specific monocyte model. We extracted a sub-network from the LPS stimulation-specific monocyte model (ihsMonoTLR_LPS) consisting of all reactions associated with the 28 up-regulated genes, which included 19% of the reactions and 39% of the chemical compounds of ihsMonoTLR_LPS. The visualization revealed a comprehensively connected network. The network illustration was generated using the software Paint4Net [252].
These differences in connectivity arose because the set of up-regulated genes centered on NF-κB activation, whereas the chemical compounds with higher relative connectivity in ihsMonoTLR_LPS lie further upstream in the signaling cascades of the network. Since ihsMonoTLR_LPS_upreg comprises all reactions and functions that are used more upon LPS stimulation, it can be interpreted as the active sub-network that monocytes use to process the information and initiate the corresponding program. The high connectivity in ihsMonoTLR_LPS_upreg indicates that the retrieved sub-network mediates NF-κB activation subsequent to LPS stimulation.

Figure 4.6: Comparison of chemical compound connectivity in the LPS stimulation specific model versus the up-regulated sub-network. We report the connectivity of compound i as the ratio of its number of connections to the total number of chemical compounds in the respective model (ihsMonoTLR_LPS sub-network and ihsMonoTLR_LPS).

Analysis of the down-regulated sub-network module

Sub-network extraction was also performed based on the three down-regulated genes. The resulting sub-network comprised eleven reactions and 26 chemical compounds (Figure 4.7) and did not include any output reaction. Judged by the involvement of these genes, as assessed in the previous section, the impact of down-regulation was rather small. The sub-network consisted of three separate modules centered on mitogen-activated protein kinase kinase kinase 14 (MAP3K14), TLR1, or Fas (TNFRSF6)-associated via death domain (FADD). In the case of FADD, caspase-8 (CASP8), the product of a subthreshold gene that was not used for sub-network extraction, appeared in this context as a direct interaction partner. CASP8 is known to interact with FADD in monocytes as part of the differentiation pathway and to prevent sustained NF-κB activation during macrophage differentiation [254]. This example shows how ihsMonoTLR can serve as a resource for context-specific analysis by providing functional relationships.
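Sub-network extraction and the relative connectivity of Figure 4.6 can be illustrated on a toy network. In this Python sketch, which is not the procedure used in the study, a reaction is assigned to the sub-network if any gene in its GRA belongs to the selected gene set, and a compound's connectivity is assumed to be the number of reactions it participates in, divided by the number of chemical compounds in the respective network. The reactions, gene sets, and compounds of the toy model are hypothetical.

```python
from collections import Counter

# Hypothetical toy model: reaction -> (genes in its GRA, participating compounds)
MODEL = {
    "R1": ({"NFKB1"}, {"atp", "adp", "h", "IKK"}),
    "R2": ({"TRAF6"}, {"TRAF6", "MyD88"}),
    "R3": ({"NFKB1", "TNFAIP3"}, {"IKK", "ubiquitin", "h"}),
    "R4": ({"MYD88"}, {"MyD88", "TRAF6", "atp", "adp"}),
}


def extract_subnetwork(model, genes):
    """Keep every reaction whose GRA mentions at least one gene of `genes`."""
    selected = set(genes)
    return {rxn: entry for rxn, entry in model.items() if entry[0] & selected}


def relative_connectivity(model):
    """Compound -> (number of reactions it takes part in) / (number of compounds)."""
    counts = Counter(c for _, compounds in model.values() for c in compounds)
    n_compounds = len(counts)
    return {c: k / n_compounds for c, k in counts.items()}


sub = extract_subnetwork(MODEL, ["NFKB1", "TNFAIP3"])
full_rc = relative_connectivity(MODEL)
sub_rc = relative_connectivity(sub)
for compound in sorted(sub_rc):
    print(f"{compound:10s} full={full_rc[compound]:.3f} sub={sub_rc[compound]:.3f}")
```

In this toy example ATP loses relative connectivity in the sub-network (2/7 versus 1/5) while ubiquitin gains it (1/7 versus 1/5), the same kind of shift described for ihsMonoTLR_LPS_upreg.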
Figure 4.7: Network modules resulting from mapping of the down-regulated genes onto the LPS stimulation specific monocyte TLR model. The sub-network extracted from ihsMonoTLR_LPS based on the three down-regulated genes comprised 26 metabolites and eleven reactions. Illustration of the sub-network revealed three separate modules, confirming that the impact of down-regulation, based on involvement, was rather small. The network illustration was generated using the software Paint4Net [252].

Functional representation of quantitative changes induced through LPS stimulation

We used the computed fold changes (FCs) to represent the LPS activated state of ihsMonoTLR_LPS. Up- and down-regulation was mimicked by either enforcing a minimal flux or reducing the maximal possible flux through reactions associated with regulated genes. Mapping was performed separately for each of the 117 I/O relationships covering 13 input reactions and 9 output reactions in hMonoTLR_LPS. Subsequently, we assessed the consequences of gene regulation based on the altered flux ranges of the 9 output reactions, obtained through FVA. The TLR model contains thermodynamically infeasible loops [7], which cause baseline flux through output reactions in the model. We therefore first investigated the effect of stimulation beyond baseline flux values for each I/O relationship by subtracting the baseline fluxes from the fluxes derived after stimulation (Table 4.15). In the majority of cases, the pattern of outputs produced by stimulation of an input was as expected. Stimulation caused flux through ROS, CREB, AP-1, and NF-κB for all TLRs and for IL1R1. Stimulation of TLR4 additionally induced IRF3, and combined stimulation of TLR4 and TLR8 led to IRF7. Stimulation of NOD receptors produced NF-κB and AP-1 (the latter only through ’AP1_FOS_JUN_BIND’). After confirming the I/O relationships, we investigated the effect of quantitative gene expression changes on output production.
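Mimicking up- and down-regulation through flux bounds can be sketched as follows. This is a minimal Python illustration, not the Matlab/COBRA code used in the study. It assumes the rules stated in Materials and Methods: FC_rxn is the mean fold change of the regulated genes of a reaction, FC_up and FC_down are the largest up- and down-regulation in the data set, the lower bound of an up-regulated reaction becomes FC_rxn/FC_up times its FBA flux, and the upper bound of a down-regulated reaction becomes FC_rxn/FC_down times its FBA flux. The signed fold-change convention (negative values for down-regulation), the forward-only flux direction, and all gene values below are assumptions.

```python
from statistics import mean


def reaction_fold_change(gene_fc, rxn_genes):
    """FC_rxn: mean fold change of the regulated genes associated with a reaction.

    gene_fc maps regulated genes to signed fold changes (e.g. +3.0, -2.5);
    genes missing from gene_fc are treated as unregulated.
    Returns None when no associated gene is regulated.
    """
    fcs = [gene_fc[g] for g in rxn_genes if g in gene_fc]
    return mean(fcs) if fcs else None


def adjusted_bounds(fc_rxn, fba_flux, fc_up, fc_down, lb, ub):
    """Return the new (lb, ub) of one reaction.

    Up-regulation enforces a minimal flux:  lb = FC_rxn / FC_up   * v_FBA
    Down-regulation caps the maximal flux:  ub = FC_rxn / FC_down * v_FBA
    Assumes a forward (non-negative) FBA flux v_FBA.
    """
    if fc_rxn is None:
        return lb, ub
    if fc_rxn > 0:
        return fc_rxn / fc_up * fba_flux, ub
    return lb, fc_rxn / fc_down * fba_flux


# Hypothetical fold changes; FC_up and FC_down are the extreme values in the data set.
gene_fc = {"NFKB1": 4.0, "TNFAIP3": 8.0, "TLR1": -2.5}
fc_up = max(gene_fc.values())
fc_down = min(gene_fc.values())

fc_nfkb = reaction_fold_change(gene_fc, ["NFKB1", "TNFAIP3"])  # mean of 4.0 and 8.0
print(adjusted_bounds(fc_nfkb, 0.5, fc_up, fc_down, 0.0, 1000.0))  # -> (0.375, 1000.0)
fc_tlr1 = reaction_fold_change(gene_fc, ["TLR1"])
print(adjusted_bounds(fc_tlr1, 0.2, fc_up, fc_down, 0.0, 1000.0))  # -> (0.0, 0.2)
```

After such bounds are set, FVA on the constrained model yields the altered minimum and maximum output fluxes that are compared across inputs.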
In total, 183 reactions were associated with regulated genes, of which only a subset was active in a particular I/O relationship. As expected, mapping of differential expression onto the network enforced AP-1 and NF-κB production across all ihsMonoTLR_LPS inputs, because genes directly associated with the AP-1 and NF-κB output reactions were up-regulated (Table 4.16, Table 4.17). Flux through the AP-1 output reactions (’AP1_FOS_JUN_BIND’ and ’AP1_JUN_BIND’) was enforced equally for all 13 inputs except the NOD receptors; for NOD receptor stimulation, we predicted a lower flux through ’AP1_FOS_JUN_BIND’ than for the other receptors. Data mapping enforced the production of the IRF3 output after TLR4 stimulation, and of IRF7 after stimulation through TLR4 and TLR8. ROS and CREB output production was not affected by the mapping of differentially expressed genes, and mapping of down-regulation had no effect on any output reaction. This analysis demonstrated how the model can be used to predict differences in cellular phenotypes due to quantitative gene expression differences.

4.3 Discussion

The aim of this study was to establish a method for omics data driven contextualization of signaling networks after gene extension of the human TLR signaling network (Figure 4.2; see File S1 below for details on the procedure). Our key results demonstrate that i) substantial manual curation is required after specializing the generic TLR signaling network to a cell-type and condition specific sub-network; ii) the monocyte TLR signaling network captured most of the functionality of the generic network, but gene redundancy was removed, indicating cell-type specific use of isoforms; and iii) TLR signaling is highly energy dependent, as all TLR signaling pathways required ATP and ROS production additionally depended on GTP.
Taken together, we demonstrated that the contextualization of the TLR network enables the functional analysis of TLR signaling in health and disease. We employed the gene-extended TLR signaling network together with gene expression data and literature evidence, which ensured monocyte specific functionality with respect to I/O pathway content (Table 4.5). Manual curation has been emphasized as an important step in the generation of biologically meaningful cell-type or tissue specific models, despite a growing number of sophisticated algorithms [129]. Curation with respect to function was important because the monocyte model served as the template for subsequent condition specific tailoring and for analysing the consequences of LPS stimulation for network structure and function. TLR expression in monocytes at the chosen cutoff and in cell-type specific literature agreed well with some, but not all, experimental studies [212, 214], indicating the importance of reproducible, consistent experimental conditions and of using identical monocyte subsets. For instance, infection states or stimulation can drastically alter cellular processes and induce the production of effector molecules such as cytokines [215], for which the cell has to provide energy to the transcriptional and translational machinery. Such cellular changes can even involve central metabolic pathways, including the switch to glycolysis for faster energy allocation [255, 256]. ihsTLRv2 was redundant both in the pathways connecting inputs with specific sets of outputs and with respect to genes encoding isoforms. The transition from ihsTLRv2 to the cell-type specific ihsMonoTLR was characterized by a reduction in isoforms, while network size remained comparable. This may be partly because the expression state of isoforms was not manually curated. Monocytes are central to the host's innate immune defense and are known to express many TLRs [211, 216, 257].
Our finding that the majority of the signaling network is preserved in monocytes is thus plausible. The transition from the unstimulated to the LPS stimulated model of TLR signaling was characterized by only a few qualitative differences, whereas quantitative gene expression differences prevailed. The set of up-regulated genes was found to be tightly connected (Figure 4.5). The impact of the up-regulated genes spread across one third of ihsMonoTLR_LPS, reflecting the strong influence of LPS stimulation on the monocyte TLR signaling network. The LPS stimulation specific sub-network of up-regulated genes correctly contained the transcription factors NF-κB and AP-1, whose activation is an expected response of monocytes to LPS stimulation [242, 258]. As demonstrated by the sensitivity analysis, TLR signaling is highly energy dependent (Figure 4.4). The TLR signaling network also accounts for a number of other metabolites that link it to further metabolic processes. Integrating models of different cellular processes, such as metabolism, signaling, and gene regulatory networks [259, 260, 261, 262], will enable important insights into the crosstalk between signaling and metabolism. Corresponding modeling tools are currently being developed [6, 27, 263]. In fact, the interaction between metabolism and innate immunity is of great interest in both health and disease [256]. For instance, TLR agonists can stimulate a switch from oxidative phosphorylation to glycolysis in murine dendritic cells and macrophages [255, 256]. This switch leads to faster yet less efficient ATP production, similar to the Warburg effect observed in cancer cells, and may function as a protective mechanism to preserve cellular ATP levels and maintain cell viability and function during an immune response [255, 256]. Moreover, it has been suggested that neuronal TLR signaling is involved in triggering cell death in response to brain injury [213].
Combined signaling and metabolic COBRA modeling could help to unravel the complexity of these diseases by highlighting cross-relations. Taken together, we demonstrated that a stoichiometric model of the TLR signaling network combined with transcriptomic data can provide functional insight into its signaling cascades. The presented gene extension and the method to integrate transcriptomic data open up an avenue for more detailed, disease directed research, including drug target discovery, and thus render signaling models amenable to the same kind of contextualization as is already established for metabolic models.

4.4 Materials and Methods

Gene extension

Genes for chemical components were identified using the NCBI Entrez Gene database [224], UniProtKB/Swiss-Prot [225], and primary literature. Gene-Reaction Associations (GRAs) were subsequently generated using the rBioNet software [264]. rBioNet requires a gene index file, which contains the Entrez Gene ID, gene symbol, location, gene type, and description of the added model genes and of the genes encoding the members of the Ras family. To generate the gene index file, Homo sapiens gene information was downloaded from NCBI (4/13/2011). The software allows loading model structures and easily altering the model content, such as reactions and GRAs. Genes were associated with reactions using Boolean logic: AND for complexes requiring multiple subunits and for reactions requiring multiple proteins, and OR for functional isoforms.

Gene association additional information

The Ras protein family is encoded by 35 genes [265], but these were not included in the current version of ihsTLRv2 due to functional ambiguity. However, the genes were included in the gene index file and can easily be added using rBioNet [264].
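Because OR joins functional isoforms and AND joins subunits or jointly required proteins, the effect of absent genes on a reaction can be decided by evaluating its GRA with the absent genes set to false. The following Python sketch assumes GRAs written as strings such as "(A and B) or C"; it is an illustration, not the parser used by rBioNet or the COBRA toolbox, and the rules in the example are hypothetical. A reaction without a GRA is treated as always active, which is how reactions lacking gene associations (such as those involving LBP) behave during data mapping.

```python
import re

# Tokens: parentheses, the operators "and"/"or", or a gene identifier.
TOKEN = re.compile(r"\(|\)|\band\b|\bor\b|[^\s()]+", re.IGNORECASE)


def reaction_active(rule, absent):
    """Evaluate a GRA string; genes in `absent` are False, all others True.

    An empty rule (no gene association) leaves the reaction active.
    AND binds tighter than OR, as in the usual GPR notation.
    """
    tokens = TOKEN.findall(rule)
    if not tokens:
        return True
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            rhs = parse_and()  # always parse the right operand to advance pos
            value = value or rhs
        return value

    def parse_and():
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_or()
            pos += 1  # skip the closing ")"
            return value
        return tok not in absent

    return parse_or()


absent_genes = {"TIRAP", "CAMK2A"}
print(reaction_active("CAMK2A or CAMK2B", absent_genes))  # isoform left -> active
print(reaction_active("MYD88 and TIRAP", absent_genes))   # subunit missing -> disabled
```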
Additionally, no gene associations were added for reactions involving lipopolysaccharide-binding protein (LBP), which has been described as a protein produced in the liver and transported in the blood [266, 267]. Cytokine production can even be induced in the absence of LBP, as was demonstrated for monocytes stimulated with LPS in the presence of rsCD14 [268]. The primary purpose here was to enable data mapping; by not adding the gene, we ensured that absence of the LBP gene in gene expression data could not interfere with TLR4 signaling. The LBP gene was included in the gene index file so that it could easily be added. Due to the lack of a gene-reaction association, reactions connected to these chemical compounds will always be active when data are mapped onto the network.

Model tailoring

In addition to identifying model reactions and associating them with human genes, we removed a number of reactions from the model using the rBioNet software [264] in order to make the model human specific. The removed reactions concerned the transmission of the input signal from TLR11 stimulation. Furthermore, 27 exchange reactions were added for dead-end chemical compounds, such as extracellular invaders, IL1A, and IL1B.

Identification of mouse orthologs

We downloaded the file containing mouse orthologs (ftp://ftp.informatics.jax.org/pub/reports/HMD_Human5.rpt) from Mouse Genome Informatics (MGI) Web, The Jackson Laboratory, Bar Harbor, Maine (URL: http://www.informatics.jax.org, 11/16/2011) [269]. We identified matches for the 314 human genes. Additionally, we searched for missing genes using the NCBI Entrez Gene database [224] and added the murine neutrophil cytosolic factor 1 gene (Entrez Gene ID: 17969) to our list of genes.
However, six genes could not be found (cAMP-dependent protein kinase catalytic subunit gamma; H3 histone family, member M; H3 histone family, member J; serine/threonine protein phosphatase 2A subunit B, PR48 isoform; toll-like receptor 10; calpain 14). Five of these genes encoded isoforms. The absence of a murine TLR10 gene was expected [270]. Although they lack TLR10, mice express TLR11, TLR12, and TLR13 [271], of which TLR12 and TLR13 were not part of ihsTLRv1.

SNPs

The Exome Variant Server (NHLBI Exome Sequencing Project (ESP), Seattle, WA; URL: http://evs.gs.washington.edu/EVS/, 12/2011) was queried for SNPs associated with ihsTLRv2 genes. Additional information about clinical links of the SNPs was derived from the OMIM webpage (Online Mendelian Inheritance in Man, OMIM, using links provided in the Exome Variant Server overview results file; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, 12/2011; URL: http://omim.org/).

InnateDB PPIs

InnateDB was queried for interactions among the 314 ihsTLRv2 genes (http://innatedb.com/, 12/02/2011) [227]. In total, 3765 interaction entries were returned among 242 interacting ihsTLRv2 gene products. No entries were found for five genes (calpain small subunit 2, Entrez Gene ID: 84290; diacylglycerol kinase kappa, Entrez Gene ID: 139189; and thioredoxin reductase 1, 2, and 3, Entrez Gene IDs: 7296, 10587, and 114112, respectively), and no interactions could be found for 67 gene products.

Human Protein Atlas data mapping

Protein expression profiles for normal human tissues based on immunohistochemistry were obtained from the Human Protein Atlas (version 9.0 and Ensembl version 64.37, 12/2011) [190]. Ensembl IDs were mapped onto ihsTLRv2 genes. Data with low or uncertain reliability were excluded from the analysis. Gene products with moderate/medium and strong/high levels of expression were assumed to be present, while all others were assumed to be absent.
Gene products without data were assumed to be absent. Hierarchical clustering of cell types and genes was performed using GenePattern (http://genepattern.broadinstitute.org/) [272] with the Euclidean distance measure.

Analysis of gene expression data and mapping

Gene expression data for unstimulated and LPS stimulated human monocytes were obtained from Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/). We employed data from two experimental groups (vehicle (control) and LPS low 2hrs) [240]. Three chips (GSM252451, GSM252454, and GSM252479) were excluded from the analysis after visual inspection. Absence and presence calls, which divide the set of ihsTLRv2 genes into expressed (present) and unexpressed (absent) genes, were generated using the PANP package [273] on the R (2.13.0) computational platform [274] with Affymetrix annotation files, for the loose cutoff p ≤ 0.05 and the stringent cutoff p ≤ 0.01. For genes with multiple identifiers, we only used the identifier showing the highest mean expression intensity in the control group. This makes it more likely that an absent gene is called present than vice versa. For the mapping of the transcriptomic data, we took advantage of the previously defined GRAs. Reactions associated with gene products that had an absence call were disabled; in the case of functional isoforms, reactions were only disabled if all isoforms were called absent. In this way, the protein and reaction content of the TLR network was reduced to form the preliminary monocyte models for the two cutoffs.

Cutoff definition

The Human Protein Atlas (http://www.proteinatlas.org/) was queried with the gene symbols of 33 genes whose P/A calls differed between the two gene expression cutoffs, for expression of the encoded proteins in two monocytic leukemia cell lines, THP-1 and U-937.
If the corresponding antibody yielded at least weak staining in the majority of tests in one cell sample, we called the protein present.

I/O pathway curation using an illustration tool

In order to enable all I/O pathways in the monocyte draft model, network reactions connecting missing outputs to inputs were identified using the software Paint4Net [252]. This tool facilitated the curation of incomplete I/O relationships in ihsMonoTLR. We first derived a list of the reactions involved in the signaling pathway towards NF-κB, using ihsTLRv2, which contained the complete pathways, as a reference. Subsequently, we did the same for the uncurated ihsMonoTLR and the disconnected output pathways. Comparison of the resulting lists of participating reactions revealed the missing links in ihsMonoTLR. Through the GRAs of the missing reactions, we quickly identified six candidate genes with potential impact on output production. Reincorporating a single gene at a time revealed the impact of its absence on the output capability of the model.

Sensitivity analysis

All ligand exchange reactions and all ligand-to-receptor binding reactions in hMonoTLR_LPS were constrained to zero µmol/(g protein · min). To simulate the distinct I/O relationships, the input combinations were as follows: ’EX_26dap-LL[e]’ and ’NOD1P_BIND’, ’EX_ALPS[e]’ and ’TLR2/L-D_BIND’, ’EX_LPS_HS[e]’ and ’TLR4/L_MD2_BIND’, ’EX_FLGN[e]’ and ’TLR5_BIND’, ’EX_TCLDLPP[e]’ and ’TLR1/2_BIND’, ’EX_BPM[e]’ and ’TLR7_BIND’, ’EX_SSRNA[e]’ and ’TLR8_BIND’, ’EX_UMLCPGD[e]’ and ’TLR9_BIND’ or ’TLR9_BINDII’, ’EX_MRDP[e]’ and ’NOD2P_BIND’. For interleukin-1, no exchange reaction existed to specifically drive IL1R1 stimulation. We therefore added an exchange reaction for IL1R1 (’EX_IL1R1_LIG[e]’), which was enabled in combination with ’IL1R1_BIND’ to simulate single-receptor IL1R1 stimulation.
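Closing all ligand inputs and then fixing one exchange/binding pair per I/O relationship amounts to building a table of reaction bounds. The Python sketch below does only that; the receptor labels used as dictionary keys and the helper name are hypothetical, while the reaction identifiers come from the input combinations listed in this section and the fixed fluxes (-1 for the exchange and +1 for the binding reaction, in µmol/(g protein · min)) are those used for the energy-requirement simulations. In the actual workflow such bounds would be applied to the COBRA model in Matlab, for example with changeRxnBounds.

```python
# (exchange reaction, binding reaction) per input; keys are hypothetical labels.
INPUT_PAIRS = {
    "NOD1": ("EX_26dap-LL[e]", "NOD1P_BIND"),
    "TLR2": ("EX_ALPS[e]", "TLR2/L-D_BIND"),
    "TLR4": ("EX_LPS_HS[e]", "TLR4/L_MD2_BIND"),
    "TLR5": ("EX_FLGN[e]", "TLR5_BIND"),
    "TLR1/2": ("EX_TCLDLPP[e]", "TLR1/2_BIND"),
    "TLR7": ("EX_BPM[e]", "TLR7_BIND"),
    "TLR8": ("EX_SSRNA[e]", "TLR8_BIND"),
    "TLR9": ("EX_UMLCPGD[e]", "TLR9_BIND"),
    "TLR9II": ("EX_UMLCPGD[e]", "TLR9_BINDII"),  # alternative TLR9 binding route
    "NOD2": ("EX_MRDP[e]", "NOD2P_BIND"),
    "IL1R1": ("EX_IL1R1_LIG[e]", "IL1R1_BIND"),
}


def input_bounds(receptors, exchange_flux=-1.0, binding_flux=1.0):
    """Return {reaction: (lb, ub)}: all inputs closed except the selected pairs,
    which are fixed to lb = ub (uptake is negative for exchange reactions)."""
    bounds = {rxn: (0.0, 0.0) for pair in INPUT_PAIRS.values() for rxn in pair}
    for receptor in receptors:
        exchange, binding = INPUT_PAIRS[receptor]
        bounds[exchange] = (exchange_flux, exchange_flux)
        bounds[binding] = (binding_flux, binding_flux)
    return bounds


# IRF7 needs TLR4 together with TLR7, TLR8 or TLR9; here TLR4 + TLR8 as in the text.
irf7_setup = input_bounds(["TLR4", "TLR8"])
open_inputs = sorted(r for r, b in irf7_setup.items() if b != (0.0, 0.0))
print(open_inputs)
```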
The nine output reactions were ’DM_PHOX_GTP-3P[v]’, ’DM_PHOX_GTP-8P[v]’, ’DM_ISRE_IRF3[n]’, ’DM_ISRE_IRF7[n]’, ’CREB_CRE_BIND’, ’AP1_FOS_JUN_BIND’, ’AP1_JUN_BIND’, ’NFKB_IKBA_DISS’, and ’NFKB_IKBB_DISS’. As implemented in the network structure of the TLR model, the IRF7 output could only carry flux if at least two different inputs were activated ((TLR4) and (TLR7 or TLR8 or TLR9 or TLR9II)). Thus, for IRF7, we additionally enabled flux through ’TLR8_BIND’ and ’EX_SSRNA[e]’. Note that flux through the remaining output reactions remained possible. To simulate the energy requirements of the I/O relationships, we enabled one exchange reaction (lb = ub = -1 µmol/(g protein · min)) and one corresponding binding reaction (lb = ub = 1 µmol/(g protein · min)) of the specified input combinations and used the COBRA robustness analysis function, with either the atp or the gtp exchange reaction as the reaction of interest and nPoints = 50. Sensitivity analysis was performed for each I/O relationship. Prior to the analysis, the atp and gtp exchange reactions were constrained to lb = -25 and ub = 0 µmol/(g protein · min).

Quantitative gene expression analysis

Gene expression data for unstimulated and LPS stimulated human monocytes were obtained from Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/). We employed data from two experimental groups (vehicle (control) and LPS low 2hrs) [240]. Three chips (GSM252451, GSM252454, and GSM252479) were excluded from the analysis after visual inspection. Lists of up- and down-regulated genes (Table 4.13 and Table 4.14) were generated with AltAnalyze_v2.02beta [275] (default settings, EnsMart65 database, Affymetrix annotation files), using as cutoffs a twofold change, an FDR-adjusted p ≤ 0.05, and regulation of at least 50% of the identifiers per gene.

Mapping of quantitative expression changes

For this analysis, exchanges were closed and the same I/O relationships were used as described for the sensitivity analysis.
However, for this analysis, the energy supply of the model was restricted to exchange of atp (ub = 0, lb = -100 µmol/(g protein · min)) and exchange of gtp (ub = 0, lb = -50 µmol/(g protein · min)). For each I/O relationship, FBA was run using the minNorm option. The sets of model reactions connected to the up-regulated and down-regulated genes were identified and each reaction was assigned a fold change (FC_rxn). FC_rxn was derived from the change in expression of the regulated gene associated with the reaction; if more than one gene associated with a reaction was significantly regulated, the mean fold change was calculated. The highest fold changes for up- and down-regulation in the data set served as reference fold changes (FC_up and FC_down). Reaction bounds were adjusted according to the following equations, where FBAsol.x denotes the flux of the reaction in the FBA solution:

$$\text{model.lb} = \frac{FC_{rxn}}{FC_{up}} \cdot \text{FBAsol.x} \tag{1}$$

$$\text{model.ub} = \frac{FC_{rxn}}{FC_{down}} \cdot \text{FBAsol.x} \tag{2}$$

We compared the minimum and maximum flux values derived through FVA [162] for each of the nine output reactions in response to stimulation through each of the input receptor types. All computations were carried out using Matlab (Mathworks, Inc), the COBRA toolbox [15], and TomOpt (Tomlab, Inc) as the linear programming solver.

4.5 Supplementary material

This section captures tables published as supplementary material.

Table 4.7: TLR11 receptor was removed from ihsTLRv2 along with 10 associated reactions.
Reaction | Description
TLRL11_PLACE | TLR11 ligand placeholder
TLR11_BIND | Toll like receptor 11 ligand binding
EX_UNKN[e] | Unknown TLR11 ligand exchange
TIR_MYD_BIND5 | TLR11-mediated TIR-MyD88 binding
DM_TLRL11[c] | Toll-like receptor 11 ligand (generic) demand
PLP_TLRL1 | Toll-like receptor 11 ligand (Profilin-like protein)
UNKN_TLRL11 | Toll-like receptor 11 ligand (Unknown)
UROPBC_EXPR | Expression of uropathogenic bacteria (invader)
EX_PLP[e] | Profilin-like protein exchange
TXPLGD_EXPR | Expression of Toxoplasma gondii parasite (invader)

Table 4.8: TLR11 receptor was removed from ihsTLRv2 along with seven other metabolites.

Metabolite | Reaction participation (formula)
TLR11/L-D(c) | 2 TLR11[c] + TLRL11[e] -> TLR11/L-D[c]; MYD88-D[c] + TIR[c] + TLR11/L-D[c] -> TIR_MYD[c] + 2 TLR11[c] + TLRL11[c]
UROPBC(e) | UROPBC[e] -> UNKN[e]
TXPLGD(e) | TXPLGD[e] -> PLP[e]
PLP(e) | PLP[e] -> TLRL11[e]; PLP[e] <=>; TXPLGD[e] -> PLP[e]
TLRL11(e) | TLRL11[e] -> TLRL_PLACEHOLDER[c]; 2 TLR11[c] + TLRL11[e] -> TLR11/L-D[c]; PLP[e] -> TLRL11[e]
Unkn(e) | UNKN[e] <=>; UROPBC[e] -> UNKN[e]
TLRL11(c) | MYD88-D[c] + TIR[c] + TLR11/L-D[c] -> TIR_MYD[c] + 2 TLR11[c] + TLRL11[c]; TLRL11[c] ->
TLR11(c) | 2 TLR11[c] + TLRL11[e] -> TLR11/L-D[c]; MYD88-D[c] + TIR[c] + TLR11/L-D[c] -> TIR_MYD[c] + 2 TLR11[c] + TLRL11[c]

Table 4.9: Added exchange reactions.
Reaction | Description | Reaction formula
EX_MYCBTB[e] | Exchange of mycobacterium tuberculosis (invader) | MYCBTB[e] <=>
EX_MYCB[e] | Exchange of mycobacteria (invader) | MYCB[e] <=>
EX_MYCP[e] | Exchange of mycoplasma (invader) | MYCP[e] <=>
EX_NEISMN[e] | Exchange of neisseria meningitidis (invader) | NEISMN[e] <=>
EX_NEIS[e] | Exchange of neisseria (invader) | NEIS[e] <=>
EX_PLNT[e] | Exchange of plants (invader) | PLNT[e] <=>
EX_RSV[e] | Exchange of RS virus (invader) | RSV[e] <=>
EX_SALMEN[e] | Exchange of salmonella enterica (invader) | SALMEN[e] <=>
EX_STLCEP[e] | Exchange of staphylococcus epidermidis (invader) | STLCEP[e] <=>
EX_SYNCPD[e] | Exchange of synthetic compounds (invader) | SYNCPD[e] <=>
EX_PPYMGG[e] | Exchange of porphyromonas gingivalis (invader) | PPYMGG[e] <=>
EX_PSDMAR[e] | Exchange of pseudomonas aerug (invader) | PSDMAR[e] <=>
EX_TRPNMT[e] | Exchange of treponema maltophilum (invader) | TRPNMT[e] <=>
EX_TRYPCR[e] | Exchange of trypanosoma cruzi parasite (invader) | TRYPCR[e] <=>
EX_VIRS[e] | Exchange of viruses (invader) | VIRS[e] <=>
EX_KLBS[e] | Exchange of klebsiella (invader) | KLBS[e] <=>
EX_LPTSIG[e] | Exchange of leptospira interrogans (invader) | LPTSIG[e] <=>
EX_MMTV[e] | Exchange of MMT virus (invader) | MMTV[e] <=>
EX_CLMYPN[e] | Exchange of chlamydia pneumoniae (invader) | CLMYPN[e] <=>
EX_FUNG[e] | Exchange of fungi (invaders) | FUNG[e] <=>
EX_GRAMN[e] | Exchange of Gram-negative bacteria (invader) | GRAMN[e] <=>
EX_GRAMP[e] | Exchange of Gram-positive bacteria (invader) | GRAMP[e] <=>
EX_HOST[e] | Exchange of host (invader) | HOST[e] <=>
EX_BORRBG[e] | Exchange of borrelia burgdorferi (invader) | BORRBG[e] <=>
EX_IL1A[e] | Exchange of IL-1A | IL1A[e] <=>
EX_IL1B[e] | Exchange of IL-1B | IL1B[e] <=>
EX_BACT[e] | Exchange of bacteria (invader) | BACT[e] <=>

Table 4.10: Literature evidence for the presence of proteins in monocytes. The four proteins listed here were not detected in two human cell lines according to immunohistochemical data downloaded from the Human Protein Atlas [42].
HPA negative | Literature evidence for expression in monocytes
TRAF2 | PMID: 18827186
HSPB1 | PMID: 20557877
MAP2K6 | PMID: 11257452
ITPR1 | PMID: 15995150

Table 4.11: Pathway curation of the monocyte draft model based on output capabilities. Candidate genes were identified among the absent genes that appeared to be connected to the blocked output NF-κB. The table gives the maximum flux (µmol × gprotein-1 × min-1) through each of the output reactions, computed using FBA for the unconstrained hMonoTLR model after recovery of one of the five candidate genes (TIRAP = toll-interleukin 1 receptor (TIR) domain containing adapter protein; PKCZ = protein kinase C (zeta isoform); MAPK11 = mitogen-activated protein kinase 11; CAMK-II = calmodulin-dependent kinase 2; RPS6KA5 = ribosomal protein S6 kinase, 90kDa, polypeptide 5).

Output | Output reaction | TIRAP | PKCZ | MAPK11 | CAMK-II | RPS6KA5
ROS | DM_PHOX_GTP-3P | 0 | 41 | 0 | 0 | 0
ROS | DM_PHOX_GTP-8P | 11.45 | 11.45 | 11.45 | 11.45 | 11.45
IRF3 | DM_ISRE_IRF3 | 3 | 3 | 3 | 3 | 3
IRF7 | DM_ISRE_IRF7 | 6 | 6 | 6 | 6 | 6
CREB | CREB_CRE_BIND | 1 | 1 | 1 | 21.5 | 1
AP-1 | AP1_FOS_JUN_BIND | 10.71 | 10.71 | 10.71 | 10.75 | 10.71
AP-1 | AP1_JUN_BIND | 21 | 21 | 21 | 21 | 21
NF-κB | NFKB_IKBA_DISS | 0 | 25.14 | 0 | 0 | 0
NF-κB | NFKB_IKBB_DISS | 0 | 25.14 | 0 | 0 | 0

Gene 815 816 817 818 5578 5590 5600 7100 8844 9252 10392 25998 114609 151742 6582 6584 283455 8945 81793 84962 Gene Name CAMK2A CAMK2B CAMK2D CAMK2G PRKCA PRKCZ MAPK11 TLR5 KSR1 RPS6KA5 NOD1 IBTK TIRAP PPM1L SLC22A2 SLC22A5 KSR2 BTRC TLR10 JUB 15 12 4 9 11 7 1 17 2 1 1 1 2 6 3 rxns 4 4 4 4 weak moderate/weak PMID:11561001 PMID:20923704 moderate/strong weak strong strong strong/weak moderate/strong weak/negative HPA moderate negative/moderate PMID:20227498 Citation PMID:16154993 PMID:16154993 PMID:16154993 PMID:16154993 PMID:8288312 PMID: 8523529 PMID: 15356147 PMID:11561001; PMID:15096475 PMID:20227498 PMID: 11257452 PMID:20584763 PMID:18596081 PMID:16439361; PMID:20525286 PMID:12121439 Decision no longer absent no longer absent no longer absent no longer absent no longer absent no longer absent no longer absent no longer absent no longer absent no longer absent no longer absent no longer absent no longer absent no longer absent absent absent absent absent absent absent Group of chemical compound Kinase Kinase Kinase Kinase Kinase Kinase Kinase Receptor Protein (enzyme, no kinase activity) Kinase Receptor Kinase Protein (adapter protein) Phosphatase metabolite transporter metabolite transporter Protein (enzyme, no kinase activity) Protein Receptor Protein

Table 4.12: Curation of hMonoTLR. In order to represent monocyte function, a number of genes were reintroduced according to literature evidence for expression in human monocytes. The table lists the genes together with the number of reactions each gene was assigned to, the corresponding literature, expression in cell lines according to the Human Protein Atlas (HPA) [32], and, based on this evidence, the decision whether to reincorporate the gene. The genes for KSR1 and NOD1 were no longer considered absent when generating the LPS stimulation specific model.

Table 4.13: Significantly up-regulated hMonoTLR_LPS genes. For each gene, the table lists the number of identifiers (probe sets) tested significant, the total number of identifiers, and the resulting percentage of regulated identifiers. Note that for three genes (IRAK2, TLR4, and PLD1) only a minority of identifiers was differentially expressed after LPS stimulation.
Gene name | Entrez Gene ID | Significant probe sets | Total probe sets | % differentially expressed
NFKB1 | 4790 | 1 | 1 | 100
DUSP1 | 1843 | 2 | 2 | 100
IL1B | 3553 | 2 | 2 | 100
MAP3K7IP2 | 23118 | 2 | 2 | 100
TNFAIP3 | 7128 | 2 | 2 | 100
TIFA | 92610 | 2 | 2 | 100
EIF4E | 1977 | 4 | 4 | 100
MAP3K8 | 1326 | 2 | 2 | 100
MYC | 4609 | 1 | 1 | 100
TXN | 7295 | 2 | 2 | 100
RIPK2 | 8767 | 2 | 2 | 100
NFKBIZ | 64332 | 2 | 2 | 100
TNIP3 | 79931 | 1 | 1 | 100
CASP1 | 834 | 4 | 5 | 80
UBE2D1 | 7321 | 2 | 3 | 67
PELI1 | 57162 | 2 | 3 | 67
PPP3CC | 5533 | 2 | 4 | 50
IL1R1 | 3554 | 1 | 2 | 50
TBK1 | 29110 | 1 | 2 | 50
JUN | 3725 | 2 | 4 | 50
SOCS1 | 8651 | 2 | 4 | 50
TNIP1 | 10318 | 1 | 2 | 50
IL1A | 3552 | 1 | 2 | 50
IRAK2 | 3656 | 1 | 3 | 33
TLR4 | 7099 | 1 | 4 | 25
PLD1 | 5337 | 1 | 7 | 14

Table 4.14: Significantly down-regulated hMonoTLR_LPS genes. For each gene, the table lists the number of identifiers (probe sets) tested significant, the total number of identifiers, and the resulting percentage of regulated identifiers. Note that for five genes (CBL, CASP8, MAP3K3, YWHAH, and MBP) only a minority of identifiers was differentially expressed after LPS stimulation.

Gene name | Entrez Gene ID | Significant probe sets | Total probe sets | % differentially expressed
TLR1 | 7096 | 1 | 1 | 100
MAP3K14 | 9020 | 1 | 1 | 100
TLR6 | 10333 | 1 | 1 | 100
FADD | 8772 | 1 | 1 | 100
CBL | 867 | 2 | 5 | 40
CASP8 | 841 | 1 | 3 | 33
MAP3K3 | 4215 | 1 | 3 | 33
YWHAH | 7533 | 1 | 3 | 33
MBP | 4155 | 1 | 8 | 13

Table 4.15: I/O relationships. Outputs produced when each receptor input was 1 µmol × gprotein-1 × min-1. Baseline flux values resulting from thermodynamically infeasible loops have been subtracted. Values are maximum fluxes; the minimum flux was 0 for every output and input.

Input | ROS (DM_PHOX_GTP-3P[v]) | ROS (DM_PHOX_GTP-8P[v]) | IRF3 (DM_ISRE_IRF3[n]) | IRF7 (DM_ISRE_IRF7[n]) | CREB (CREB_CRE_BIND) | AP-1 (AP1_FOS_JUN_BIND) | AP-1 (AP1_JUN_BIND) | NF-κB (NFKB_IKBA_DISS) | NF-κB (NFKB_IKBB_DISS)
IL1R1_BIND | 1 | 0.375 | 0 | 0 | 0.5 | 0.25 | 0.5 | 0.333 | 0.333
NOD1P_BIND | 0 | 0 | 0 | 0 | 0 | 0.167 | 0 | 1 | 1
NOD2P_BIND | 0 | 0 | 0 | 0 | 0 | 0.167 | 0 | 0.667 | 0.667
STLR4_BIND | 1 | 0.375 | 0 | 0 | 0.5 | 0.25 | 0.5 | 0.333 | 0.333
TLR1/2_BIND | 1 | 0.375 | 0 | 0 | 0.5 | 0.25 | 0.5 | 0.333 | 0.333
TLR2/L-D_BIND | 1 | 0.375 | 0 | 0 | 0.5 | 0.25 | 0.5 | 0.667 | 0.667
TLR4/L_MD2_BIND | 1 | 0.375 | 0.25 | 0.5 | 0.5 | 0.25 | 0.5 | 0.667 | 0.667
TLR5_BIND | 1 | 0.375 | 0 | 0 | 0.5 | 0.25 | 0.5 | 0.333 | 0.333
TLR7_BIND | 1 | 0.375 | 0 | 0 | 0.5 | 0.25 | 0.5 | 0.333 | 0.333
TLR8_BIND | 1 | 0.375 | 0 | 0 | 0.5 | 0.25 | 0.5 | 0.333 | 0.333
TLR9_BIND | 1 | 0.375 | 0 | 0 | 0.5 | 0.25 | 0.5 | 0.333 | 0.333
TLR9_BINDII | 1 | 0.375 | 0 | 0 | 0.5 | 0.25 | 0.5 | 0.333 | 0.333
STLR2/L_SCD14_BIND | 1 | 0.375 | 0 | 0 | 0.5 | 0.25 | 0.5 | 0.333 | 0.333

Table 4.16: Changes due to mapping of quantitative gene expression changes. Observed changes in minimum flux values (µmol × gprotein-1 × min-1) of model output reactions, determined by FVA after mapping of quantitative gene expression changes. No changes were observed in the maximum values.

Input | IRF3 (DM_ISRE_IRF3[n]) min | IRF7 (DM_ISRE_IRF7[n]) min | AP-1 (AP1_FOS_JUN_BIND) min | AP-1 (AP1_JUN_BIND) min | NF-κB (NFKB_IKBA_DISS) min | NF-κB (NFKB_IKBB_DISS) min
IL1R1_BIND | 0.000 | 0.000 | 0.010 | 0.019 | 0.011 | 0.027
NOD1P_BIND | 0.000 | 0.000 | 0.006 | 0.000 | 0.034 | 0.082
NOD2P_BIND | 0.000 | 0.000 | 0.006 | 0.000 | 0.023 | 0.055
STLR4_BIND | 0.000 | 0.000 | 0.010 | 0.019 | 0.011 | 0.027
TLR1/2_BIND | 0.000 | 0.000 | 0.010 | 0.019 | 0.011 | 0.027
TLR2/L-D_BIND | 0.000 | 0.000 | 0.010 | 0.019 | 0.023 | 0.055
TLR4/L_MD2_BIND | 0.007 | 0.008 | 0.010 | 0.019 | 0.023 | 0.055

Table 4.17: Changes due to mapping of quantitative gene expression changes. Observed changes in minimum flux values (µmol × gprotein-1 × min-1) of model output reactions, determined by FVA after mapping of quantitative gene expression changes. No changes were observed in the maximum values.

Input | IRF3 (DM_ISRE_IRF3[n]) min | IRF7 (DM_ISRE_IRF7[n]) min | AP-1 (AP1_FOS_JUN_BIND) min | AP-1 (AP1_JUN_BIND) min | NF-κB (NFKB_IKBA_DISS) min | NF-κB (NFKB_IKBB_DISS) min
TLR7_BIND | 0.000 | 0.000 | 0.010 | 0.019 | 0.011 | 0.027
TLR8_BIND | 0.000 | 0.000 | 0.010 | 0.019 | 0.011 | 0.027
TLR9_BIND | 0.000 | 0.000 | 0.010 | 0.019 | 0.011 | 0.027
TLR9_BINDII | 0.000 | 0.000 | 0.010 | 0.019 | 0.011 | 0.027
STLR2/L_SCD14_BIND | 0.000 | 0.000 | 0.010 | 0.019 | 0.011 | 0.027

File S1

Maike K. Aurich and Ines Thiele, Contextualization procedure and modeling of monocyte specific TLR signaling.

Workflow for the generation of a cell-type specific network of TLR signaling

Requirements of software and packages:
– R (integrated suite of software facilities for data manipulation, calculation and graphical display, http://www.r-project.org/, [1]) and the PANP package [2]
– Matlab (Mathworks, Inc)
– COBRA toolbox (http://opencobra.sourceforge.net/openCOBRA/Welcome.html, [3]) and a linear programming solver
– Paint4Net [4]

Step 1: Select data
• Download CEL files from, e.g., Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/, [5]).

Step 2: Generate P/A calls from gene expression data using R
• Use the PANP package [2] in R for data processing and the generation of P/A calls.

Workflow in R:
> # Install the packages (current Bioconductor; the original R 2.13 setup used
> # source("http://bioconductor.org/biocLite.R") followed by biocLite("panp"))
> if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
> BiocManager::install(c("affy", "gcrma", "panp"))
> setwd("C:/...")         # e.g. the directory containing the CEL files
> library(affy)           # ReadAffy()
> library(gcrma)          # gcrma() background correction and normalization
> library(panp)           # pa.calls()
> Data <- ReadAffy()      # read all CEL files in the working directory
> eset <- gcrma(Data)
> PA <- pa.calls(eset, looseCutoff = 0.05, tightCutoff = 0.01, verbose = FALSE)
> myPcalls <- PA$Pcalls   # P/M/A calls per probe set and chip
> write.table(myPcalls, "A P Calls", append = FALSE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = TRUE)
> myPvals <- PA$Pvals     # corresponding p-values
> write.table(myPvals, "A P vals", append = FALSE, quote = TRUE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = TRUE)
> write.exprs(eset, file = "myresults.txt")   # normalized expression values

This was the only analysis step performed in R; all subsequent steps use Matlab.

Step 3: Derive cell-type specific lists of absent genes from P/A calls at two different cutoffs
• Map Affymetrix IDs (A/P calls) to Entrez Gene IDs (ihsTLRv2.genes), e.g., using IDconverter (http://idconverter.bioinfo.cnio.es/ [6]).
• If multiple Affymetrix IDs matched one model gene, we used the Affymetrix ID showing the highest mean expression intensity in the untreated group.
  For the generation of the monocyte model, we derived calls using the Affymetrix IDs specified in File S2, Table S14.
• Summarize the calls of the untreated replicates (to generate the monocyte model) into one call (A or P) per ihsTLRv2 gene and cutoff (p ≤ 0.01 and p ≤ 0.05).
• Marginal calls are absent at the tight cutoff (p ≤ 0.01) and present at the loose cutoff (p ≤ 0.05). If a gene received absent calls in the majority of replicates, call the gene absent.
• The lists of absent genes used for the generation of the monocyte draft models (p ≤ 0.01 and p ≤ 0.05) are provided in the supplementary information (File S2, Table S15).

Step 4: Generate two draft models from the lists of absent genes
• Use ihsTLRv2, the lists of absent genes and the COBRA toolbox [3] function deleteModelGenes to generate two draft models.
• Set the columns of constrained reactions in the S matrix of each draft model to zero.

Commands in Matlab:

    % Delete the absent genes; reactions without a remaining isozyme are constrained
    [draftmodel, hasEffect, constrRxnNames, deletedGenes] = deleteModelGenes(model, geneList);
    % Indices of the constrained reactions in the draft model
    constrRxn = find(ismember(draftmodel.rxns, constrRxnNames));
    % Remove these reactions from the network topology
    draftmodel.S(:, constrRxn) = 0;

• The applied method relies entirely on the input (list of absent genes) and disables flux through any reaction associated with a deleted gene unless an isozyme is associated. Gene expression data are known to be noisy; it is therefore important to find a suitable cutoff when using this method and to perform manual curation afterwards.

Step 5: Find the best draft reconstruction for the cell type
• Check whether absent genes are also absent at the protein level in the target cell type using the Human Protein Atlas, and use this additional information to decide on the biologically most conclusive cutoff. Step by step:
– Find the set of genes absent at the tight cutoff (p ≤ 0.01) and present at the loose cutoff (p ≤ 0.05).
– Check for protein expression in the cell type using the Human Protein Atlas (http://www.proteinatlas.org/) (herein, two monocytic leukemia cell lines, THP-1 and U-937).
– If the antibody yielded at least weak staining in the majority of tests in one cell sample, call the gene product present.
– Use the expression information and the probability values from the P/A calls (A P vals file generated by PANP) to define the most suitable cutoff.
– Optionally, also consider the number of blocked reactions, dead ends and input receptors covered by each draft model.

Commands in Matlab:

    % Flux variability analysis without requiring a minimal objective value
    [minFlux, maxFlux] = fluxVariability(draftmodel, 0);
    Flux = [minFlux maxFlux];
    % A reaction is blocked if both its minimum and maximum flux are (numerically) zero
    Blockedrxns = all(abs(Flux) <= 1e-9, 2);
    nBlocked = sum(Blockedrxns);
    % Connectivity of each metabolite = number of reactions it participates in
    MetConn = full(sum(draftmodel.S ~= 0, 2));
    % Dead ends: metabolites participating in a single reaction only
    deadends = sum(MetConn == 1);

• Decide on a suitable cutoff.
• For the monocyte, we decided to continue with the draft model generated using data from the loose cutoff (p ≤ 0.05).
• If necessary, repeat the draft model generation based on the absent gene set of the newly defined cutoff (Step 4).

Step 6: Curate the draft model: complete disconnected output pathways
• Use flux variability analysis (FVA) to check whether the draft model can produce all outputs known to be produced in the cell type.

Command in Matlab:

    [minFlux, maxFlux] = fluxVariability(draftmodel, 0);

– Monocytes produce all six ihsTLRv2 outputs. However, the monocyte draft model was not able to produce, e.g., NF-κB.
• Search for candidate genes to complete disconnected output pathways using the pathway illustration tool Paint4Net [4]:
– First, identify the reactions involved in the signaling pathway towards NF-κB using ihsTLRv2, which contains the complete pathways, as reference. Choose a radius wide enough that involvedRxns captures the entire pathway from an input to the specified output.

Command in Matlab:

    [involvedRxns, involvedMets, deadEnds] = draw_by_met(model, metAbbr, drawMap, ...
        radius, direction, excludeMets, flux);

– Do the same for the draft model and the disconnected output pathways.
– Compare the resulting lists of participating reactions to reveal the missing links in the draft model, and find the genes associated with these reactions. These are the candidate genes.
– Generate draft models while reincorporating one candidate gene at a time, and check the impact on output production.
– The candidate genes identified using this approach and the output fluxes obtained after their reincorporation are provided in the supplementary information (File S2, Table S6).

Step 7: Curate the draft model based on cell-type specific literature
• Search for literature evidence of absent genes being expressed in the specific cell type.
– We only considered genes that were absent in the monocyte draft model; isoforms of already captured genes were ignored.
– During manual curation, 14 genes were reintroduced into the ihsMonoTLR model based on literature support (File S2, Table S7).

Step 8: Derive the curated cell-type model
• Update the gene list by removing the curated genes from the list of absent genes at the defined cutoff (p ≤ 0.05).
• Generate the final cell-type model based on deletion of the curated gene list (Step 4).
• The absent gene list used for the generation of the final monocyte model is provided in the supplementary information (File S2, Table S15).
• Use the COBRA function extractSubNetwork and the set of reactions that remained unconstrained during model generation (the constrRxnNames output of deleteModelGenes provides the list of constrained reactions).

Command in Matlab:

    rxnNames = setdiff(model.rxns, constrRxnNames);  % reactions that remained unconstrained
    subModel = extractSubNetwork(model, rxnNames);

Step 9: Tailor the cell-type model to a specific condition
• Find the list of cell-type model genes absent in the specific condition (e.g., LPS treatment of monocytes).
• Repeat Step 4 using the final cell-type model as model and the set of absent genes.
• In the case of the monocyte model, two genes (Entrez Gene IDs 246330 and 10333) were deleted to derive the LPS-stimulation specific monocyte network (hMonoTLR LPS).

References
1. Gentleman R, Carey V, Bates D, Bolstad B, Dettling M, et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biology 5: R80.
2. Warren P, Taylor D, Martini P, Jackson J, Bienkowska J (2007) PANP - a new method of gene detection on oligonucleotide expression arrays. In: Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering (BIBE 2007). IEEE, pp. 108–115.
3. Schellenberger J, Que R, Fleming R, Thiele I, Orth J, et al. (2011) Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nature Protocols 6: 1290–1307.
4. Kostromins A, Stalidzans E (2012) Paint4Net: COBRA Toolbox extension for visualization of stoichiometric models of metabolism. BioSystems.
5. Edgar R, Domrachev M, Lash A (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Research 30: 207–210.
6. Alibés A, Yankilevich P, et al. (2007) IDconverter and IDClight: Conversion and annotation of gene and protein IDs. BMC Bioinformatics 8: 9.

File S3
Figures 1 to 13: Sensitivity analysis.

5 Conclusions and future directions

5.1 Conclusions

Modern high-throughput techniques offer immense opportunities to investigate the behavior of whole systems, with human diseases being one application.
However, the inherent complexity of the data challenges its interpretation, and new avenues need to be taken to handle the complexity of both diseases and data. In chapter 1 of this thesis, I provided an overview of the COBRA approach and showed how both its biomedical applications and the number of algorithms for the integration of omics data into the network context are constantly increasing [24, 127, 146, 276]. At the start of this thesis, existing approaches to building cell type or condition specific metabolic models mainly emphasized the integration of proteomic and transcriptomic data. This PhD work initially focused on the use of metabolomic data as the primary data type during condition specific network generation. It provides approaches for the integration of both quantitative (chapter 2) and semi-quantitative (chapter 3) extracellular metabolomic data, and shows how COBRA can be used to gain insights into the intracellular metabolic network that could not have been drawn from the data alone.

In chapter 2, I used published quantitative metabolomic data [137] and the human metabolic model to investigate metabolic heterogeneity among a large set of cancer cell lines. I generated and analyzed a set of 120 cancer cell line specific metabolic models from the same number of single metabolomic profiles. The data set was particularly interesting because all cell lines were grown under the same environmental conditions, which emphasizes genetic differences among the cancer cell lines. Computational analysis of the model set revealed the vast metabolic heterogeneity among the models and allowed me to classify the models into distinct phenotypic groups. The classifications were based on the distinct utilization of ATP and cofactor production pathways, as well as on the flexibility of the models towards variation of nutrient uptake and secretion rates.
Thus, the inference of internal metabolic states from extracellular metabolomic data revealed large differences in metabolic strategies among the cell line models. The cancer cell line models differed with respect to their feasible range of oxygen uptake rates, which is one example of how predictions by models generated from extracellular metabolomic data alone can provide novel insight into cellular metabolic traits. Some of the cell line models had a limited range of feasible uptake rates, whereas others were indifferent towards oxygenation. Moreover, limiting oxygen uptake rates induced a dependency on reductive carboxylation in a larger number of cancer cell line models than in the 'unlimited' condition. This result could have broad implications for experimental work, since it demonstrated that the oxygenation conditions define pathway usage. Experiments are usually conducted under normoxic conditions, which correspond to oxygen levels much higher than those in tissues or tumors [172]. Thus, normoxic conditions might introduce a cell line specific bias into experimental results and into the conclusions drawn from these experiments.

One methodological challenge addressed in this study was the identification of 'undetected exchanges', arising either from the limitations of the detection method or from the uncertain composition of serum. This problem was addressed by predicting a minimal number of 'hypothetically undetected' additional exchanges, based on the context of the human metabolic model and subject to the stated objective function. Frequently added metabolite exchanges were connected to cancer by literature query. Additionally, some of the models correctly predicted essential genes, e.g., the PGDH dependency of the SK-MEL-28 models. These correct predictions provided additional support for generating models from extracellular metabolomic data alone.
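The strategy of adding a minimal number of 'hypothetically undetected' exchanges can be illustrated with a toy example. The sketch is not the procedure of chapter 2, which operates on the human metabolic model with steady-state mass balance and the stated objective function. As a plainly stated simplification, it replaces that flux-based feasibility test with topological network expansion (a reaction fires once all of its substrates are available) and searches the candidate uptakes by increasing subset size. The reactions, metabolite names and the functions `expand` and `minimal_added_uptakes` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy network: reaction -> (substrates, products).
REACTIONS = {
    "R1": ({"glc"}, {"g6p"}),
    "R2": ({"g6p"}, {"pyr"}),
    "R3": ({"pyr", "nh4"}, {"ala"}),
    "R4": ({"gln"}, {"glu", "nh4"}),
    "R5": ({"glu"}, {"akg"}),
}


def expand(seeds, reactions):
    """Network expansion: return all metabolites reachable from `seeds`.

    A reaction fires once all of its substrates are in the current scope.
    This topological criterion stands in for an FBA feasibility check.
    """
    scope = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= scope and not products <= scope:
                scope |= products
                changed = True
    return scope


def minimal_added_uptakes(measured, candidates, targets, reactions):
    """Return a smallest set of candidate uptakes that, together with the
    measured uptakes, makes every target metabolite reachable.

    Subsets are tried in order of increasing size; returns None if even
    adding all candidates does not reach the targets.
    """
    for size in range(len(candidates) + 1):
        for extra in combinations(sorted(candidates), size):
            if set(targets) <= expand(set(measured) | set(extra), reactions):
                return set(extra)
    return None


if __name__ == "__main__":
    added = minimal_added_uptakes(
        measured={"glc"},
        candidates={"gln", "nh4", "o2"},
        targets={"ala", "akg"},
        reactions=REACTIONS,
    )
    print(added)  # {'gln'}
```

Exhaustive enumeration grows exponentially with the number of candidate exchanges; at genome scale, such a minimal set would instead be determined with a mixed-integer linear program.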
Although minimizing the number of exchanges vastly reduced the overall model size, the internal network redundancy was preserved. Nevertheless, meaningful predictions could be drawn from the models, suggesting that the metabolic profiles were comprehensive enough to sufficiently define the solution spaces of the models. However, metabolic profiles often capture fewer metabolites, which raises two questions: can meaningful predictions be drawn from less comprehensive or semi-quantitative extracellular metabolomic profiles, and would additional transcriptomic data improve predictions, or even provide additional insights in combination with the model predictions?

In chapter 3, I used another set of metabolomic data to generate metabolic models of two lymphoblastic leukemia cell lines. I developed an approach for the mapping of semi-quantitative extracellular metabolomic data. Additionally, transcriptomic data were used to apply constraints to the internal reactions. Quantitative differences in the metabolite exchange profiles of the two cell lines were translated into quantitative differences in the constraints imposed on the metabolite exchange reactions of the two models. Further, some metabolites were taken up or secreted by only one of the two cell lines. As a consequence of the imposed constraints, the two cancer cell line models explained the differences in the extracellular metabolomic and transcriptomic data by different utilization of glycolysis and oxidative phosphorylation, suggesting that aerobic glycolysis was used more by the CCRF-CEM model than by the Molt-4 model, which used more oxidative phosphorylation.
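Chapter 3 translated quantitative differences in metabolite exchange between the two cell lines into differences in the bounds of the corresponding exchange reactions. The following sketch shows one simple way such a translation could look; it is a hypothetical scheme, not the exact procedure of chapter 3. It assumes a signal change (spent minus fresh medium) per cell line for one metabolite and a common reference bound: the line with the largest change receives the reference bound, the other line a bound scaled by its relative signal, and the sign of the change decides between uptake (negative lower bound, following the usual COBRA convention for exchange reactions) and secretion (positive upper bound). The function name `scale_exchange_bounds`, the reference value and the numbers are invented for illustration.

```python
def scale_exchange_bounds(signal_changes, reference_bound=1.0):
    """Translate relative exchange signals into bounds of one exchange reaction.

    signal_changes: {sample: change of the metabolite signal in the medium},
    with negative values read as uptake and positive values as secretion
    (a sign convention assumed for this sketch).
    Returns {sample: (lower_bound, upper_bound)}.
    """
    largest = max(abs(change) for change in signal_changes.values())
    bounds = {}
    for sample, change in signal_changes.items():
        if change == 0:
            # No detectable change: close the exchange in this sample.
            bounds[sample] = (0.0, 0.0)
            continue
        # The sample with the largest change gets the reference bound;
        # the others are scaled by their signal relative to it.
        magnitude = reference_bound * abs(change) / largest
        if change < 0:
            bounds[sample] = (-magnitude, 0.0)  # uptake only
        else:
            bounds[sample] = (0.0, magnitude)   # secretion only
    return bounds


if __name__ == "__main__":
    # Invented signal changes for two hypothetical cell lines.
    print(scale_exchange_bounds({"line_A": -8.0, "line_B": -2.0}))
    # {'line_A': (-1.0, 0.0), 'line_B': (-0.25, 0.0)}
```

Because only ratios of signals enter the bounds, a scheme of this kind suits semi-quantitative data, which lack absolute concentrations.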
Differential gene expression analysis and the analysis of alternative splicing events distinguished the two cell lines and revealed an accumulation of these regulatory events at rate-limiting steps of central metabolic pathways, which further supported that the two cell lines had distinct metabolic phenotypes. Thus, the integration of semi-quantitative extracellular metabolomic data into the context of a human metabolic model enabled the interrogation of metabolic differences among cell lines. Additionally, the potential of integrating multiple omics data sets was emphasized. Differential gene expression can push the systems analysis to the next level, i.e., to find indicators that might explain the metabolic differences between the cell lines. Such a comparison of differential gene expression and predicted reaction fluxes was also performed to compare adipocyte metabolism in obese versus lean subjects, and to investigate disease mechanisms in non-alcoholic fatty liver disease [144, 146]. However, to my knowledge, differential splice variants have so far not been taken into consideration. Considering the differential regulation of genes as a potential determinant of the observed metabolic differences between cell lines ultimately reveals the limits of metabolic models, which, to date and in humans, do not consider regulatory effects.

In chapter 1 of this thesis, I discussed that in cancer, rewiring of the metabolic network is connected to genetic alterations affecting characteristic signaling pathways. A first step was to ask how far signaling networks vary between cell types or conditions, and whether the COBRA approach could provide valuable insights into such cell type and condition specific differences through network contextualization using omics data sets. One of the only two reconstructed signaling networks in higher organisms was the mammalian TLR signaling network [7, 8].
Cell type specific differences in TLR expression had generally been observed, and an involvement of TLR signaling in diseases such as cancer was evident [38, 39, 211, 212, 213]. The published TLR signaling network did not include genes and GPRs, which was an obstacle to data integration. Hence, I manually identified the genes encoding the proteins in the network and formulated the GPRs to enable the mapping of transcriptomic data. Subsequently, using the newly identified set of genes and various data sets, e.g., InnateDB and the Human Protein Atlas [190, 227], I demonstrated that the network was indeed subject to cell type specific differences and that the signaling network can represent cell type or condition specificity. Further, comparing the quantitative differences in the abundance of the respective signaling proteins and TLR receptors could reveal differences between cancer cell lines, for which large-scale proteomic data sets had recently been published [110, 111]. Apart from quantitative and qualitative tissue specific differences in protein expression, I also investigated the disease relevance of the TLR signaling network. Based on the newly identified gene set, I identified SNPs in 12 distinct network genes linked to known clinical phenotypes, including cancer (Table 4.3).

COBRA is limited to the exploration of steady states. Nevertheless, network contextualization could provide valuable insights into TLR signaling. I set up a pipeline for the contextualization of signaling networks based on transcriptomic data. To establish the contextualization of a signaling network, I used data from a cell type that was well studied with respect to TLR signaling, such that the outcome could be readily validated against the existing literature. I first generated a cell type specific monocyte model of TLR signaling.
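The core of this pipeline (Steps 3 and 4 of File S1) can be sketched in a few lines: replicate P/A calls are summarized by majority vote at a given p-value cutoff, and every reaction whose GPR association can no longer be fulfilled is constrained, while reactions with a remaining isozyme stay active. This is an illustrative sketch, not the code used in the thesis, which relies on PANP in R and on the COBRA toolbox function deleteModelGenes in Matlab. The representation of GPRs as lists of isozymes (each a set of subunit genes), the functions `absent_genes` and `constrained_reactions`, and all gene names, reaction names and p-values are assumptions made for the example.

```python
# Hypothetical PANP-style p-values: gene -> one p-value per untreated replicate.
PVALS = {
    "g1": [0.001, 0.002, 0.004],
    "g2": [0.03, 0.04, 0.2],
    "g3": [0.3, 0.5, 0.02],
    "g4": [0.001, 0.001, 0.001],
}

# Hypothetical GPRs: reaction -> list of isozymes, each a set of subunit genes.
# An empty list means the reaction has no gene association.
GPRS = {
    "RXN_A": [{"g1", "g2"}],    # one complex formed by g1 and g2
    "RXN_B": [{"g3"}, {"g4"}],  # two isozymes
    "RXN_C": [{"g3"}],
    "RXN_D": [],
}


def absent_genes(pvals, cutoff):
    """Genes called absent (p > cutoff) in the majority of replicates."""
    absent = set()
    for gene, replicate_pvals in pvals.items():
        n_absent = sum(p > cutoff for p in replicate_pvals)
        if n_absent > len(replicate_pvals) / 2:
            absent.add(gene)
    return absent


def constrained_reactions(gprs, absent):
    """Reactions in which every isozyme contains at least one absent gene.

    Such reactions lose all their catalysts and would be constrained to zero
    flux; reactions with a remaining isozyme, or without a GPR, stay active.
    """
    return sorted(
        rxn
        for rxn, isozymes in gprs.items()
        if isozymes and all(genes & absent for genes in isozymes)
    )


if __name__ == "__main__":
    for cutoff in (0.01, 0.05):
        absent = absent_genes(PVALS, cutoff)
        print(cutoff, sorted(absent), constrained_reactions(GPRS, absent))
```

Comparing the two cutoffs shows why the choice matters: every gene absent at the loose cutoff is also absent at the tight cutoff, so the tight cutoff constrains at least as many reactions.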
The threshold for the definition of absent and present gene sets was based on manual evaluation of the overlap between the presence of gene expression and the presence of protein expression [190]. Additionally, the preliminary monocyte model was manually curated to fill the emerging gaps. The final cell type specific monocyte model was further tailored towards a TLR signaling network of the human monocyte specific to the LPS activation condition. There were only minor differences in qualitative gene expression. However, quantitative differential gene expression between LPS-activated and non-activated human monocytes revealed topological differences. Differential gene regulation after LPS stimulation involved about one third of the TLR signaling network of the monocyte, including the transcription factors NF-κB and AP-1, which was an expected response [242, 258]. Thus, the contextualization of the signaling network constitutes a tool to investigate the impact of changes in network topology and the generation of input/output relationships in a cell type or condition specific manner. In addition to the contextualization, I demonstrated the linkage of the TLR signaling network to metabolism by elucidating the dependency of the individual input/output relationships on energy metabolites.

Thus, this thesis furthers the application of COBRA to the analysis of omics data and to biomedical applications in multiple ways. It introduces novel approaches to the generation of condition specific metabolic models from extracellular metabolomic data sets alone (Chapter 2) or in combination with transcriptomic data (Chapter 3), and further opens the avenue for the contextualization of signaling networks (Chapter 4).

5.2 Future applications

The outcomes of this thesis are working procedures for the contextualization of metabolic models, with an emphasis on the integration of extracellular metabolomic data [15, 252], and for the contextualization of a signaling network.
Multiple future steps arise as potential continuations of the work presented in this thesis.

5.2.1 Extension of the TLR signaling network

TLR signaling is, not least because of its contribution to human disease, an active field of research. The mammalian signaling network was compiled based on a map published in 2006 [7, 43]. New knowledge has been generated since then, and it would be worthwhile to add this emerging biochemical knowledge to ihsTLRv2 in order to provide a better framework for future contextualization. To widen the scope of ihsTLRv2 to the modeling of neurodegenerative diseases, the microglial MAC1 receptor could be incorporated; it has been shown to become activated through β-amyloid peptide, followed by activation of NADPH oxidase (PHOX), production of superoxide and potentially the death of dopaminergic neurons [277]. To integrate the response to β-amyloid into the TLR network, two network species, the MAC1 receptor and RAC, need to be added to ihsTLRv2. Besides the use of transcriptomic or proteomic data as demonstrated herein, the TLR signaling network could provide a context for the analysis of additional omics data sets, e.g., phosphoproteomics. The consequences of LPS stimulation on protein phosphorylation in primary macrophages could be a good starting point to establish the integration of these data, since it again concerns a well-studied system and the insights gained through the analysis could be validated against the existing literature [278].

5.2.2 Future directions in the integration of metabolomics data

The use of metabolomic data seems particularly promising for personalized health and diagnostics. These data sets are particularly well suited to investigate metabolic phenotypes, since metabolites constitute the entities of metabolic networks and, compared with transcripts or proteins, are subject to less regulatory influence (Figure 1.3).
Furthermore, metabolomic data sets have become increasingly comprehensive in recent years, and metabolomic profiles from the spent medium of cell cultures or from patients' body fluids can be obtained relatively easily. One future goal would be the use of COBRA as a tool to predict disease risk (e.g., IEM or cancer) or to support personalized medication. However, a number of challenges have to be overcome to reach this point. Methods need to be established to integrate patient-derived body fluid samples (e.g., blood, urine, interstitial fluid or even mixtures of these) into the context of the metabolic reconstruction. Currently, uptake rates are taken into account, or the data are incorporated into a non-growth associated biomass objective function [117, 143]. The prediction of a minimized set of 'undetected' metabolite exchanges, as demonstrated in chapter 2, will enable the integration of incomplete extracellular metabolomic profiles. However, experimental validation will need to confirm the correctness of the predicted additional exchanges, and the potential impact of false predictions on the phenotype of the patient-specific model needs to be evaluated. Furthermore, the prediction of metabolite exchanges carried out herein depends on a stated objective function (OF). Whereas biomass production can be a valid assumption for proliferating cancer cells or cell lines, defining an OF for normal human cells is problematic. The variability of metabolic profiles from cells of the same cell line, and its impact on the metabolic phenotype of the cell line model, became evident in chapter 2. Further work is needed to investigate the variability of the exchanges across studies, to prevent false predictions, and to evaluate the robustness of the approach towards natural variability. Samples from body fluids bear the problem that the cellular origin of increased or decreased metabolite levels is unknown.
However, applying Recon as a whole-body metabolic network (including all organs) and explaining the differences in plasma as the net metabolite changes (uptake and secretion) mediated by cells throughout the human body [53, 143] could be a good starting point.

5.2.3 COBRA modeling of cancer and beyond

The number of aspects of cancer metabolism investigated using COBRA is constantly expanding and even goes beyond metabolism, e.g., solvent capacity [102]. Two of the studies comprising this thesis (chapters 2 and 3) add to the growing number of cancer studies, providing novel insights into metabolic heterogeneity among cancer cell lines and into their robustness towards systems perturbations. Similarly, the methods could be applied to other cellular systems and disease conditions. TLR signaling has been connected to cancer and cancer-associated metabolic phenotypes; for example, TLR agonists have been observed to stimulate a switch from oxidative phosphorylation to glycolysis in murine immune cells [255, 256]. Part of chapter 4 described the absence of TLR receptors from the quantitative proteomic data of two cancer cell lines. It would be interesting to investigate their expression in alternative data sets for a correlation between phenotype and the pathways comprising the TLR signaling cascades. Extensive cross-talk exists between signaling pathways. For example, cross-talk between epidermal growth factor receptor (EGFR) and TLR signaling was shown to compromise the immune response to viruses [279]. However, to investigate such cross-talk, the EGFR network would need to be reconstructed. Chapter 4 emphasized the energy requirements of the distinct input-output pathways in the TLR signaling network, which constitute direct links between the signaling and metabolic networks.
Another good example of the interlocking of signaling and metabolism is the protein kinase function of the glycolytic enzyme PKM2 and its impact on gene transcription [280, 281]. Taken together, important insights into the crosstalk between signaling and metabolism can be expected from the integration of models of different cellular processes, i.e., metabolism and signaling, or metabolism and gene regulatory networks [256, 259, 260, 261, 262]. Tools to accomplish the coupling are being developed [6, 27, 263]; however, the size of such networks likely exceeds currently available computing power. Finally, predictions from contextualized networks can only be as good as the starting model. Continuous improvement and extension of the starting model based on existing and emerging biochemical knowledge or data, as carried out throughout this work (Figure 1.4) [53, 58, 173], is one of the most important tasks to facilitate the analysis of omics data in the model context, and to improve the predictions and the applicability of biochemical reaction networks in general. Hence, this thesis provides starting points for a wide range of applications of biochemical reaction networks in health and disease.

Bibliography
[1] Kitano, H. Foundations of Systems Biology. MIT Press, Cambridge, MA, (2001).
[2] Palsson, B. Systems Biology: Properties of Reconstructed Networks. Cambridge University Press, Cambridge; New York, (2006).
[3] Kitano, H. Science 295(5560), 1662–1664 (2002).
[4] Machado, D., Costa, R. S., Rocha, M., Ferreira, E. C., Tidor, B., and Rocha, I. AMB Express 1(1), 1–14 (2011).
[5] Durot, M., Bourguignon, P.-Y., and Schachter, V. FEMS Microbiology Reviews 33(1), 164–190 (2009).
[6] Thiele, I., Jamshidi, N., Fleming, R., and Palsson, B. PLoS Computational Biology 5(3), e1000312 (2009).
[7] Li, F., Thiele, I., Jamshidi, N., and Palsson, B. PLoS Computational Biology 5(2), e1000292 (2009).
[8] Papin, J. and Palsson, B. Biophysical Journal 87(1), 37–46 (2004).
[9] Thiele, I. and Palsson, B. Nature Protocols 5(1), 93–121 (2010).
[10] Reed, J. L., Famili, I., Thiele, I., and Palsson, B. Ø. Nature Reviews Genetics 7(2), 130–141 (2006).
[11] Orth, J., Thiele, I., and Palsson, B. Nature Biotechnology 28(3), 245–248 (2010).
[12] Lewis, N. E., Jamshidi, N., Thiele, I., and Palsson, B. Ø. Encyclopedia of Complexity and Systems Science, Robert A Meyers (ed) (2009).
[13] Varma, A. and Palsson, B. Ø. Nature Biotechnology 12, 994–998 (1994).
[14] Terzer, M., Maynard, N. D., Covert, M. W., and Stelling, J. Wiley Interdisciplinary Reviews: Systems Biology and Medicine 1(3), 285–297 (2009).
[15] Schellenberger, J., Que, R., Fleming, R., Thiele, I., Orth, J., Feist, A., Zielinski, D., Bordbar, A., Lewis, N., Rahmanian, S., et al. Nature Protocols 6(9), 1290–1307 (2011).
[16] Lewis, N. E., Nagarajan, H., and Palsson, B. Ø. Nature Reviews Microbiology 10(4), 291–305 (2012).
[17] Price, N. D., Reed, J. L., and Palsson, B. Ø. Nature Reviews Microbiology 2(11), 886–897 (2004).
[18] Savinell, J. M. and Palsson, B. Ø. Journal of Theoretical Biology 154(4), 421–454 (1992).
[19] Vo, T. D., Greenberg, H. J., and Palsson, B. Ø. Journal of Biological Chemistry 279(38), 39532–39540 (2004).
[20] Patino, C. E. H., Jaime-Munoz, G., and Resendis-Antonio, O. Frontiers in Physiology 3, 481 (2012).
[21] Bordbar, A., Lewis, N., Schellenberger, J., Palsson, B. Ø., and Jamshidi, N. Molecular Systems Biology 6(1), 422 (2010).
[22] Mahadevan, R. and Schilling, C. Metabolic Engineering 5(4), 264–276 (2003).
[23] Edwards, J. S. and Palsson, B. Ø. BMC Bioinformatics 1(1), 1–1 (2000).
[24] Folger, O., Jerby, L., Frezza, C., Gottlieb, E., Ruppin, E., and Shlomi, T. Molecular Systems Biology 7, 501 (2011).
[25] Schellenberger, J. and Palsson, B. Ø. Journal of Biological Chemistry 284(9), 5457–5461 (2009).
[26] Kaufman, D. E. and Smith, R. L. Operations Research 46(1), 84–95 (1998).
[27] Thiele, I., Fleming, R. M., Que, R., Bordbar, A., Diep, D., and Palsson, B. Ø. PLoS ONE 7(9), e45635 (2012).
[28] Jensen, P., Lutz, K., and Papin, J. BMC Systems Biology 5(1), 147 (2011).
[29] Papin, J. A., Hunter, T., Palsson, B. Ø., and Subramaniam, S. Nature Reviews Molecular Cell Biology 6(2), 99–111 February (2005).
[30] Papin, J. and Palsson, B. Journal of Theoretical Biology 227(2), 283–297 (2004).
[31] Kawai, T. and Akira, S. Cell Death & Differentiation 13(5), 816–825 (2006).
[32] Akira, S., Takeda, K., and Kaisho, T. Nature Immunology 2(8), 675–680 (2001).
[33] Beutler, B. Molecular Immunology 40(12), 845–859 (2004).
[34] Kaisho, T. and Akira, S. Journal of Allergy and Clinical Immunology 117(5), 979–987 (2006).
[35] Xu, D., Komai-Koma, M., and Liew, F. Y. Cellular Immunology 233(2), 85–89 (2005).
[36] MacLeod, H. and Wetzler, L. M. Science Signaling 2007(402), pe48 (2007).
[37] Takeda, K. and Akira, S. International Immunology 17(1), 1–14 (2005).
[38] Ospelt, C. and Gay, S. The International Journal of Biochemistry & Cell Biology 42(4), 495–505 (2010).
[39] Rakoff-Nahoum, S. and Medzhitov, R. Nature Reviews Cancer 9(1), 57–63 (2008).
[40] Bajikar, S. S. and Janes, K. A. Annals of Biomedical Engineering (2012).
[41] Kawai, T., Takeuchi, O., Fujita, T., Inoue, J.-i., Mühlradt, P. F., Sato, S., Hoshino, K., and Akira, S. The Journal of Immunology 167(10), 5887–5894 (2001).
[42] Dunne, A., Marshall, N., and Mills, K. Current Opinion in Pharmacology 11(4), 404–411 (2011).
[43] Oda, K. and Kitano, H. Molecular Systems Biology 2(1), 2006.0015 (2006).
[44] Voet, D., Voet, J., and Pratt, C. Fundamentals of Biochemistry, Upgraded Edition. John Wiley & Sons, Inc, United States of America, (2002).
[45] Duarte, N., Becker, S., Jamshidi, N., Thiele, I., Mo, M., Vo, T., Srivas, R., and Palsson, B. Proceedings of the National Academy of Sciences 104(6), 1777–1782 (2007).
[46] Becker, S. A., Feist, A. M., Mo, M. L., Hannum, G., Palsson, B. Ø., and Herrgard, M. J. Nature Protocols 2(3), 727–738 March (2007).
[47] Shlomi, T., Cabili, M., Herrgard, M., Palsson, B., and Ruppin, E. Nature Biotechnology 26(9), 1003–1010 (2008).
[48] Rolfsson, O., Palsson, B. Ø., and Thiele, I. BMC Systems Biology 5(1), 155 (2011).
[49] Lewis, N., Schramm, G., Bordbar, A., Schellenberger, J., Andersen, M., Cheng, J., Patel, N., Yee, A., Lewis, R., Eils, R., et al. Nature Biotechnology 28(12), 1279–1285 (2010).
[50] Bordbar, A. and Palsson, B. Ø. Journal of Internal Medicine 271(2), 131–141 (2012).
[51] Jerby, L. and Ruppin, E. Clinical Cancer Research 18(20), 5572–5584 Oct (2012).
[52] Heinken, A., Sahoo, S., Fleming, R. M., Thiele, I., et al. Gut Microbes 4(1), 28–40 (2013).
[53] Thiele, I., Swainston, N., Fleming, R. M., Hoppe, A., Sahoo, S., Aurich, M. K., Haraldsdottir, H., Mo, M. L., Rolfsson, O., Stobbe, M. D., et al. Nature Biotechnology 31, 419–425 (2013).
[54] Hao, T., Ma, H.-W., Zhao, X.-M., and Goryanin, I. BMC Bioinformatics 11(1), 393 (2010).
[55] Gille, C., Bölling, C., Hoppe, A., Bulik, S., Hoffmann, S., Hübner, K., Karlstädt, A., Ganeshan, R., König, M., Rother, K., et al. Molecular Systems Biology 6(1) (2010).
[56] Sahoo, S., Franzson, L., Jonsson, J. J., and Thiele, I. Molecular BioSystems 8, 2545–2558 (2012).
[57] Sahoo, S. and Thiele, I. Human Molecular Genetics 22(13), 2705–2722 (2013).
[58] Sahoo, S., Aurich, M. K., Jonsson, J. J., and Thiele, I. Frontiers in Physiology 5 (2014).
[59] Siegel, R., Naishadham, D., and Jemal, A. CA: A Cancer Journal for Clinicians 63(1), 11–30 (2013).
[60] Masters, J. and Palsson, B. Human Cell Culture, Volume I: Cancer Cell Lines. Springer (1999).
[61] Cairns, R. A., Harris, I. S., and Mak, T. W. Nature Reviews Cancer 11(2), 85–95 (2011).
[62] Stratton, M. R., Campbell, P. J., and Futreal, P. A. Nature 458(7239), 719–724 (2009).
[63] Hudson, T. J., Anderson, W., Aretz, A., Barker, A. D., Bell, C., Bernabé, R. R., Bhan, M., Calvo, F., Eerola, I., Gerhard, D. S., et al. Nature 464(7291), 993–998 (2010).
[64] Warburg, O. et al. Science 123(3191), 309–314 (1956).
[65] Frezza, C. and Gottlieb, E. Seminars in Cancer Biology 19(1), 4–11 (2009).
[66] Feron, O. Radiotherapy & Oncology 92(3), 329–333 Sep (2009).
[67] Guppy, M., Greiner, E., and Brand, K. European Journal of Biochemistry 212(1), 95–99 (1993).
[68] Vazquez, A. and Oltvai, Z. N. PLoS ONE 6(4), e19538 (2011).
[69] Metallo, C. M., Gameiro, P. A., Bell, E. L., Mattaini, K. R., Yang, J., Hiller, K., Jewell, C. M., Johnson, Z. R., Irvine, D. J., Guarente, L., et al. Nature 481(7381), 380–384 (2011).
[70] Fan, J., Kamphorst, J. J., Rabinowitz, J. D., and Shlomi, T. Journal of Biological Chemistry 288(43), 31363–31369 (2013).
[71] Carracedo, A., Cantley, L. C., and Pandolfi, P. P. Nature Reviews Cancer 13(4), 227–232 (2013).
[72] Ganapathy, V., Thangaraju, M., and Prasad, P. D. Pharmacology & Therapeutics 121(1), 29–40 Jan (2009).
[73] Fuchs, B. C. and Bode, B. P. Seminars in Cancer Biology 15(4), 254–266 Aug (2005).
[74] Verkman, A. S., Hara-Chikuma, M., and Papadopoulos, M. C. Journal of Molecular Medicine 86(5), 523–529 May (2008).
[75] Calvo, M. B., Figueroa, A., Pulido, E. G., Campelo, R. G., and Aparicio, L. A. International Journal of Endocrinology 2010 (2010).
[76] Fletcher, J. I., Haber, M., Henderson, M. J., and Norris, M. D. Nature Reviews Cancer 10(2), 147–156 Feb (2010).
[77] Frezza, C., Zheng, L., Folger, O., Rajagopalan, K. N., MacKenzie, E. D., Jerby, L., Micaroni, M., Chaneton, B., Adam, J., Hedley, A., Kalna, G., Tomlinson, I. P. M., Pollard, P. J., Watson, D. G., Deberardinis, R. J., Shlomi, T., Ruppin, E., and Gottlieb, E. Nature 477(7363), 225–228 Sep (2011).
[78] Jerby, L., Wolf, L., Denkert, C., Stein, G. Y., Hilvo, M., Oresic, M., Geiger, T., and Ruppin, E. Cancer Research 72(22), 5712–5720 Nov (2012).
[79] Wang, C., Uray, I. P., Mazumdar, A., Mayer, J. A., and Brown, P. H. Breast Cancer Research and Treatment 134(1), 101–115 Jul (2012).
[80] Gopal, E., Fei, Y.-J., Sugawara, M., Miyauchi, S., Zhuang, L., Martin, P., Smith, S. B., Prasad, P. D., and Ganapathy, V. Journal of Biological Chemistry 279(43), 44522–44532 (2004).
[81] Hong, C., Maunakea, A., Jun, P., Bollen, A. W., Hodgson, J. G., Goldenberg, D. D., Weiss, W. A., and Costello, J. F. Cancer Research 65(9), 3617–3623 May (2005).
[82] Li, H., Myeroff, L., Smiraglia, D., Romero, M. F., Pretlow, T. P., Kasturi, L., Lutterbaugh, J., Rerko, R. M., Casey, G., Issa, J.-P., Willis, J., Willson, J. K. V., Plass, C., and Markowitz, S. D. Proceedings of the National Academy of Sciences 100(14), 8412–8417 Jul (2003).
[83] Thangaraju, M., Gopal, E., Martin, P. M., Ananth, S., Smith, S. B., Prasad, P. D., Sterneck, E., and Ganapathy, V. Cancer Research 66(24), 11560–11564 Dec (2006).
[84] Thangaraju, M., Cresci, G., Itagaki, S., Mellinger, J., Browning, D., Berger, F., Prasad, P., and Ganapathy, V. Journal of Gastrointestinal Surgery 12(10), 1773–1782 (2008).
[85] Coothankandaswamy, V., Elangovan, S., Singh, N., Prasad, P. D., Thangaraju, M., and Ganapathy, V. Biochemical Journal 450(1), 169–178 (2013).
[86] Miyauchi, S., Gopal, E., Fei, Y.-J., and Ganapathy, V. Journal of Biological Chemistry 279(14), 13293–13296 (2004).
[87] Falasca, M. and Linton, K. J. Expert Opinion on Investigational Drugs 21(5), 657–666 May (2012).
[88] Ho, M. M., Ng, A. V., Lam, S., and Hung, J. Y. Cancer Research 67(10), 4827–4833 May (2007).
[89] Hara-Chikuma, M. and Verkman, A. S. Molecular and Cellular Biology 28(1), 326–332 (2008).
[90] Pouyssegur, J., Dayan, F., and Mazure, N. M. Nature 441(7092), 437–443 May (2006).
[91] Scalise, M., Galluccio, M., Accardi, R., Cornet, I., Tommasino, M., and Indiveri, C. Cell Biochemistry and Function 30(5), 419–425 (2012).
[92] Vadlapudi, A. D., Vadlapatla, R. K., Pal, D., and Mitra, A. K. International Journal of Pharmaceutics 441, 535–543 (2013).
[93] Sweet, R., Paul, A., and Zastre, J. Cancer Biology & Therapy 10(11), 1101–1111 (2010).
[94] Resendis-Antonio, O., Checa, A., and Encarnacion, S. PLoS ONE 5(8), e12383 (2010).
[95] Bordbar, A., Monk, J. M., King, Z. A., and Palsson, B. O. Nature Reviews Genetics 15(2), 107–120 (2014).
[96] Masoudi-Nejad, A. and Asgari, Y. Seminars in Cancer Biology (2014).
[97] Lewis, N. E. and Abdel-Haleem, A. M. Frontiers in Physiology 4 (2013).
[98] Shlomi, T., Benyamini, T., Gottlieb, E., Sharan, R., and Ruppin, E. PLoS Computational Biology 7(3), e1002018 (2011).
[99] Agren, R., Bordel, S., Mardinoglu, A., Pornputtapong, N., Nookaew, I., and Nielsen, J. PLoS Computational Biology 8(5), e1002518 (2012).
[100] Vazquez, A., Markert, E. K., and Oltvai, Z. N. PLoS ONE 6(11), e25881 (2011).
[101] Tedeschi, P., Markert, E., Gounder, M., Lin, H., Dvorzhinski, D., Dolfi, S., Chan, L. L., Qiu, J., DiPaola, R., Hirshfield, K., et al. Cell Death & Disease 4(10), e877 (2013).
[102] Vazquez, A., Liu, J., Zhou, Y., and Oltvai, Z. BMC Systems Biology 4(1), 58 (2010).
[103] Jerby, L., Shlomi, T., and Ruppin, E. Molecular Systems Biology 6, 401 Sep (2010).
[104] Gatto, F., Nookaew, I., and Nielsen, J. Proceedings of the National Academy of Sciences 111(9), E866–E875 (2014).
[105] Colijn, C., Brandes, A., Zucker, J., Lun, D. S., Weiner, B., Farhat, M. R., Cheng, T.-Y., Moody, D. B., Murray, M., and Galagan, J. E. PLoS Computational Biology 5(8), e1000489 (2009).
[106] Cox, J. and Mann, M. Cell 130(3), 395–398 (2007).
[107] Wang, Z., Gerstein, M., and Snyder, M. Nature Reviews Genetics 10(1), 57–63 (2009).
[108] Barash, Y., Calarco, J., Gao, W., Pan, Q., Wang, X., Shai, O., Blencowe, B., and Frey, B. Nature 465(7294), 53–59 May (2010).
[109] Sun, Q., Chen, X., Ma, J., Peng, H., Wang, F., Zha, X., Wang, Y., Jing, Y., Yang, H., Chen, R., Chang, L., Zhang, Y., Goto, J., Onda, H., Chen, T., Wang, M.-R., Lu, Y., You, H., Kwiatkowski, D., and Zhang, H. Proceedings of the National Academy of Sciences 108(10), 4129–4134 Mar (2011).
[110] Beck, M., Schmidt, A., Malmstroem, J., Claassen, M., Ori, A., Szymborska, A., Herzog, F., Rinner, O., Ellenberg, J., and Aebersold, R. Molecular Systems Biology 7(1), 549 (2011).
[111] Nagaraj, N., Wisniewski, J., Geiger, T., Cox, J., Kircher, M., Kelso, J., Pääbo, S., and Mann, M. Molecular Systems Biology 7(1), 548 (2011).
[112] Sabidó, E., Selevsek, N., and Aebersold, R. Current Opinion in Biotechnology 23(4), 591–597 (2012).
[113] Antonucci, R., Pilloni, M. D., Atzori, L., and Fanos, V. Journal of Maternal-Fetal and Neonatal Medicine 25(S5), 22–26 (2012).
[114] Kaddurah-Daouk, R., Kristal, B. S., and Weinshilboum, R. M. Annual Review of Pharmacology and Toxicology 48, 653–683 (2008).
[115] Jamshidi, N. and Palsson, B. Ø. Molecular Systems Biology 2(1) (2006).
[116] Reed, J. L. PLoS Computational Biology 8(8), e1002662 (2012).
[117] Mo, M. L., Palsson, B. Ø., and Herrgård, M. J. BMC Systems Biology 3(1) (2009).
[118] Åkesson, M., Förster, J., and Nielsen, J. Metabolic Engineering 6(4), 285–293 (2004).
[119] Bordbar, A., Mo, M. L., Nakayasu, E. S., Schrimpe-Rutledge, A. C., Kim, Y.-M., Metz, T. O., Jones, M. B., Frank, B. C., Smith, R. D., Peterson, S. N., et al. Molecular Systems Biology 8(1) (2012).
[120] Chang, R., Xie, L., Xie, L., Bourne, P., and Palsson, B. PLoS Computational Biology 6(9), e1000938 (2010).
[121] Karlstädt, A., Fliegner, D., Kararigas, G., Ruderisch, H. S., Regitz-Zagrosek, V., and Holzhütter, H.-G. BMC Systems Biology 6(1), 114 (2012).
[122] Zur, H., Ruppin, E., and Shlomi, T. Bioinformatics 26(24), 3140–3142 (2010).
[123] Chandrasekaran, S. and Price, N. Proceedings of the National Academy of Sciences 107(41), 17845–17850 (2010).
[124] Jensen, P. and Papin, J. Bioinformatics 27(4), 541–547 (2011).
[125] Zhao, Y. and Huang, J. Biochemical and Biophysical Research Communications 415(3), 450–454 (2011).
[126] Vlassis, N., Pacheco, M. P., and Sauter, T. arXiv preprint arXiv:1304.7992 (2013).
[127] Agren, R., Mardinoglu, A., Asplund, A., Kampf, C., Uhlen, M., and Nielsen, J. Molecular Systems Biology 10(3) (2014).
[128] Wang, Y., Eddy, J., and Price, N. BMC Systems Biology 6(1), 153 (2012).
[129] Blazier, A. and Papin, J. Frontiers in Computational Physiology and Medicine 3, 299 (2012).
[130] Shlomi, T. Biotechnology and Genetic Engineering Reviews 26(1), 281–296 (2009).
[131] Becker, S. and Palsson, B. Ø. PLoS Computational Biology 4(5), e1000082 (2008).
[132] Bordbar, A., Jamshidi, N., and Palsson, B. Ø. BMC Systems Biology 5(1), 110 (2011).
[133] Bordbar, A., Feist, A., Usaite-Black, R., Woodcock, J., Palsson, B., and Famili, I. BMC Systems Biology 5(1), 180 (2011).
[134] Fleming, R., Thiele, I., and Nasheuer, H. Biophysical Chemistry 145(2), 47–56 (2009).
[135] Kümmel, A., Panke, S., and Heinemann, M. BMC Bioinformatics 7(1), 512 (2006).
[136] Yizhak, K., Benyamini, T., Liebermeister, W., Ruppin, E., and Shlomi, T. Bioinformatics 26(12), i255–i260 (2010).
[137] Jain, M., Nilsson, R., Sharma, S., Madhusudhan, N., Kitami, T., Souza, A. L., Kafri, R., Kirschner, M. W., Clish, C. B., and Mootha, V. K. Science 336(6084), 1040–1044 (2012).
[138] Schmidt, B. J., Ebrahim, A., Metz, T. O., Adkins, J. N., Palsson, B. Ø., and Hyduke, D. R. Bioinformatics 29(22), 2900–2908 (2013).
[139] Cakir, T., Patil, K. R., Onsan, Z. I., Ulgen, K. O., Kirdar, B., and Nielsen, J. Molecular Systems Biology 2(50) September (2006).
[140] Selvarasu, S., Ho, Y. S., Chong, W. P., Wong, N. S., Yusufi, F. N., Lee, Y. Y., Yap, M. G., and Lee, D.-Y. Biotechnology and Bioengineering 109(6), 1415–1429 (2012).
[141] Ahn, S.-Y., Jamshidi, N., Mo, M. L., Wu, W., Eraly, S. A., Dnyanmote, A., Bush, K. T., Gallegos, T. F., Sweet, D. H., Palsson, B. Ø., et al. Journal of Biological Chemistry 286(36), 31522–31531 (2011).
[142] Fan, J., Kamphorst, J. J., Mathew, R., Chung, M. K., White, E., Shlomi, T., and Rabinowitz, J. D. Molecular Systems Biology 9(1) (2013).
[143] Jamshidi, N., Miller, F. J., Mandel, J., Evans, T., and Kuo, M. D. BMC Systems Biology 5(1), 200 (2011).
[144] Mardinoglu, A., Agren, R., Kampf, C., Asplund, A., Nookaew, I., Jacobson, P., Walley, A. J., Froguel, P., Carlsson, L. M., Uhlen, M., et al. Molecular Systems Biology 9(1) (2013).
[145] Bordel, S., Agren, R., and Nielsen, J. PLoS Computational Biology 6(7), e1000859 (2010).
[146] Mardinoglu, A., Agren, R., Kampf, C., Asplund, A., Uhlen, M., and Nielsen, J. Nature Communications 5 (2014).
[147] Zu, X. L. and Guppy, M. Biochemical and Biophysical Research Communications 313(3), 459–465 (2004).
[148] Locasale, J. W., Grassian, A. R., Melman, T., Lyssiotis, C. A., Mattaini, K. R., Bass, A. J., Heffron, G., Metallo, C. M., Muranen, T., Sharfi, H., et al. Nature Genetics 43(9), 869–874 (2011).
[149] Vander Heiden, M. G., Locasale, J. W., Swanson, K. D., Sharfi, H., Heffron, G. J., Amador-Noguez, D., Christofk, H. R., Wagner, G., Rabinowitz, J. D., Asara, J. M., et al. Science 329(5998), 1492–1499 (2010).
[150] Smolková, K., Plecitá-Hlavatá, L., Bellance, N., Benard, G., Rossignol, R., and Ježek, P. The International Journal of Biochemistry & Cell Biology 43(7), 950–968 (2011).
[151] Zielke, H. R., Ozand, P. T., Tildon, J. T., Sevdalian, D. A., and Cornblath, M. Proceedings of the National Academy of Sciences 73(11), 4110–4114 Nov (1976).
[152] DeBerardinis, R. J., Mancuso, A., Daikhin, E., Nissim, I., Yudkoff, M., Wehrli, S., and Thompson, C. B. Proceedings of the National Academy of Sciences 104(49), 19345–19350 Dec (2007).
[153] Holleran, A. L., Briscoe, D. A., Fiskum, G., and Kelleher, J. K. Molecular and Cellular Biochemistry 152(2), 95–101 Nov (1995).
[154] Gstraunthaler, G., Seppi, T., and Pfaller, W. Cellular Physiology and Biochemistry 9(3), 150–172 (1999).
[155] Marroquin, L. D., Hynes, J., Dykens, J. A., Jamieson, J. D., and Will, Y. Toxicological Sciences 97(2), 539–547 (2007).
[156] Dewhirst, M. W., Braun, R. D., and Lanzen, J. L. International Journal of Radiation Oncology Biology Physics 42(4), 723–726 (1998).
[157] Saks, V. Molecular System Bioenergetics: Energy for Life. Wiley (2008).
[158] Voet, D., Voet, J., and Pratt, C. Fundamentals of Biochemistry: Life at the Molecular Level. 2nd edition. Wiley, New York (2006).
[159] Riemer, S. A., Rex, R., and Schomburg, D. BMC Systems Biology 7(1), 33 (2013).
[160] Griguer, C. E., Oliva, C. R., and Gillespie, G. Y. Journal of Neuro-Oncology 74(2), 123–133 (2005).
[161] Edwards, J. S., Ramakrishna, R., and Palsson, B. O. Biotechnology and Bioengineering 77(1), 27–36 (2002).
[162] Gudmundsson, S. and Thiele, I. BMC Bioinformatics 11(1), 489 (2010).
[163] Thiele, I., Price, N., Vo, T., and Palsson, B. Journal of Biological Chemistry 280(12), 11683–11695 (2005).
[164] Greshock, J., Feng, B., Nogueira, C., Ivanova, E., Perna, I., Nathanson, K., Protopopov, A., Weber, B. L., and Chin, L. Cancer Research 67(21), 10173–10180 (2007).
[165] Possemato, R., Marks, K. M., Shaul, Y. D., Pacold, M. E., Kim, D., Birsoy, K., Sethumadhavan, S., Woo, H.-K., Jang, H. G., Jha, A. K., Chen, W. W., Barrett, F. G., Stransky, N., Tsun, Z.-Y., Cowley, G. S., Barretina, J., Kalaany, N. Y., Hsu, P. P., Ottina, K., Chan, A. M., Yuan, B., Garraway, L. A., Root, D. E., Mino-Kenudson, M., Brachtel, E. F., Driggers, E. M., and Sabatini, D. M. Nature 476(7360), 346–350 Aug (2011).
[166] Sanz-Moreno, V., Gadea, G., Ahn, J., Paterson, H., Marra, P., Pinner, S., Sahai, E., and Marshall, C. J. Cell 135(3), 510–523 (2008).
[167] Spencer, S. L., Gaudet, S., Albeck, J. G., Burke, J. M., and Sorger, P. K. Nature 459(7245), 428–432 (2009).
[168] Guppy, M., Leedman, P., Zu, X., and Russell, V. Biochemical Journal 364(Pt 1), 309–315 May (2002).
[169] Bellance, N., Benard, G., Furt, F., Begueret, H., Smolková, K., Passerieux, E., Delage, J., Baste, J., Moreau, P., and Rossignol, R. The International Journal of Biochemistry & Cell Biology 41(12), 2566–2577 (2009).
[170] Thompson, C. B. Cancer & Metabolism 2(Suppl 1), O32 (2014).
[171] Wellen, K. E., Lu, C., Mancuso, A., Lemons, J. M., Ryczko, M., Dennis, J. W., Rabinowitz, J. D., Coller, H. A., and Thompson, C. B. Genes & Development 24(24), 2784–2799 (2010).
[172] Carreau, A., Hafny-Rahbi, B. E., Matejuk, A., Grillon, C., and Kieda, C. Journal of Cellular and Molecular Medicine 15(6), 1239–1253 (2011).
[173] Aurich, M. K., Paglia, G., Rolfsson, O., Hrafnsdóttir, S., Magnúsdóttir, M., Stefaniak, M. M., Palsson, B. O., Fleming, R. M., and Thiele, I. Metabolomics, 1–17 (2014).
[174] Chunta, J. L., Vistisen, K. S., Yazdi, Z., and Braun, R. D. PLoS ONE 7(5), e37471 (2012).
[175] Mir, M., Wang, Z., Shen, Z., Bednarz, M., Bashir, R., Golding, I., Prasanth, S. G., and Popescu, G. Proceedings of the National Academy of Sciences 108(32), 13124–13129 (2011).
[176] Chapman, E. H., Kurec, A. S., and Davey, F. Journal of Clinical Pathology 34(10), 1083–1090 (1981).
[177] O’Connor, P. M., Jackman, J., Bae, I., Myers, T. G., Fan, S., Mutoh, M., Scudiero, D. A., Monks, A., Sausville, E. A., Weinstein, J. N., et al. Cancer Research 57(19), 4285–4300 (1997).
[178] Nishiumi, S., Kobayashi, T., Ikeda, A., Yoshie, T., Kibi, M., Izumi, Y., Okuno, T., Hayashi, N., Kawano, S., Takenawa, T., Azuma, T., and Yoshida, M. PLoS ONE 7(7), e40459 (2012).
[179] Lloyd, M. D., Darley, D. J., Wierzbicki, A. S., and Threadgill, M. D. FEBS Journal 275(6), 1089–1102 (2008).
[180] Hellgren, L. I. Annals of the New York Academy of Sciences 1190(1), 42–49 (2010).
[181] Ollberding, N. J., Aschebrook-Kilfoy, B., Caces, D. B. D., Wright, M. E., Weisenburger, D. D., Smith, S. M., and Chiu, B. C.-H. Carcinogenesis 34(1), 170–175 (2013).
[182] Price, A. J., Allen, N. E., Appleby, P. N., Crowe, F. L., Jenab, M., Rinaldi, S., Slimani, N., Kaaks, R., Rohrmann, S., Boeing, H., et al. The American Journal of Clinical Nutrition 91(6), 1769–1776 (2010).
[183] Lloyd, M. D., Yevglevskis, M., Lee, G. L., Wood, P. J., Threadgill, M. D., and Woodman, T. J. Progress in Lipid Research 52(2), 220–230 (2013).
[184] Mubiru, J. N., Valente, A. J., and Troyer, D. A. The Prostate 65(2), 117–123 (2005).
[185] Ouyang, B., Leung, Y.-K., Wang, V., Chung, E., Levin, L., Bracken, B., Cheng, L., and Ho, S.-M. Urology 77(1), 249.e1–249.e7 (2011).
[186] Hu, J., Locasale, J. W., Bielas, J. H., O’Sullivan, J., Sheahan, K., Cantley, L. C., Vander Heiden, M. G., and Vitkup, D. Nature Biotechnology 31(6), 522–529 (2013).
[187] Suganuma, K., Miwa, H., Imai, N., Shikami, M., Gotou, M., Goto, M., Mizuno, S., Takahashi, M., Yamamoto, H., Hiramatsu, A., et al. Leukemia & Lymphoma 51(11), 2112–2119 (2010).
[188] Zheng, J. Oncology Letters 4(6), 1151 (2012).
[189] Cheng, T., Sudderth, J., Yang, C., Mullen, A. R., Jin, E. S., Mates, J. M., and DeBerardinis, R. J. Proceedings of the National Academy of Sciences 108(21), 8674–8679 (2011).
[190] Uhlen, M., Oksvold, P., Fagerberg, L., Lundberg, E., Jonasson, K., Forsberg, M., Zwahlen, M., Kampf, C., Wester, K., Hober, S., et al. Nature Biotechnology 28(12), 1248–1250 (2010).
[191] Barrett, T., Troup, D. B., Wilhite, S. E., Ledoux, P., Evangelista, C., Kim, I. F., Tomashevsky, M., Marshall, K. A., Phillippy, K. H., Sherman, P. M., et al. Nucleic Acids Research 39(suppl 1), D1005–D1010 (2011).
[192] Durot, M., Bourguignon, P.-Y., and Schachter, V. FEMS Microbiology Reviews 33(1), 164–190 (2008).
[193] Hyduke, D. R., Lewis, N. E., and Palsson, B. O. Molecular BioSystems 9, 167–174 (2013).
[194] Li, S., Park, Y., Duraisingham, S., Strobel, F. H., Khan, N., Soltow, Q. A., Jones, D. P., and Pulendran, B. PLoS Computational Biology 9(7), e1003123 (2013).
[195] Paglia, G., Palsson, B. O., and Sigurjonsson, O. E. Journal of Proteomics 76, 163–167 (2012). Special Issue: Integrated Omics.
[196] Paglia, G., Hrafnsdóttir, S., Magnúsdóttir, M., Fleming, R., Thorlacius, S., Palsson, B., and Thiele, I. Analytical and Bioanalytical Chemistry, 1–16 (2012).
[197] Chance, B., Sies, H., and Boveris, A. Physiological Reviews 59(3), 527–605 (1979).
[198] Dröge, W. Physiological Reviews 82(1), 47–95 (2002).
[199] Vander Heiden, M. Nature Reviews Drug Discovery 10(9), 671–684 (2011).
[200] Chiarugi, A., Dölle, C., Felici, R., and Ziegler, M. Nature Reviews Cancer 12(11), 741–752 (2012).
[201] Nikiforov, A., Dölle, C., Niere, M., and Ziegler, M. Journal of Biological Chemistry 286(24), 21767–21778 (2011).
[202] Ha, H., Thiagalingam, A., Nelkin, B., and Casero, R. Clinical Cancer Research 6(9), 3783–3787 (2000).
[203] Dreher, D. and Junod, A. European Journal of Cancer 32(1), 30–38 (1996).
[204] Brand, K. and Hermfisse, U. The FASEB Journal 11(5), 388–395 (1997).
[205] Ogasawara, Y., Funakoshi, M., and Ishii, K. Biological and Pharmaceutical Bulletin 32(11), 1819–1823 (2009).
[206] Cortés-Cros, M., Hemmerlin, C., Ferretti, S., Zhang, J., Gounarides, J. S., Yin, H., Muller, A., Haberkorn, A., Chene, P., Sellers, W. R., et al. Proceedings of the National Academy of Sciences 110(2), 489–494 (2013).
[207] Marin-Hernandez, A., Gallardo-Perez, J. C., Ralph, S. J., Rodriguez-Enriquez, S., and Moreno-Sanchez, R. Mini Reviews in Medicinal Chemistry 9(9), 1084–1101 (2009).
[208] Lenzen, S. Journal of Biological Chemistry 289(18), 12189–12194 (2014).
[209] Wishart, D. S., Jewison, T., Guo, A. C., Wilson, M., Knox, C., Liu, Y., Djoumbou, Y., Mandal, R., Aziat, F., Dong, E., et al. Nucleic Acids Research 41(D1), D801–D807 (2013).
[210] Ganske, F. and Dell, E. BMG LABTECH (2006).
[211] Zarember, K. and Godowski, P. The Journal of Immunology 168(2), 554–561 (2002).
[212] Kadowaki, N., Ho, S., Antonenko, S., de Waal Malefyt, R., Kastelein, R., Bazan, F., and Liu, Y. The Journal of Experimental Medicine 194(6), 863–869 (2001).
[213] Tang, S., Arumugam, T., Xu, X., Cheng, A., Mughal, M., Jo, D., Lathia, J., Siler, D., Chigurupati, S., Ouyang, X., et al. Proceedings of the National Academy of Sciences 104(34), 13798–13803 (2007).
[214] Cros, J., Cagnard, N., Woollard, K., Patey, N., Zhang, S., Senechal, B., Puel, A., Biswas, S., Moshous, D., Picard, C., et al. Immunity 33(3), 375–386 (2010).
[215] Serbina, N., Jia, T., Hohl, T., and Pamer, E. Annual Review of Immunology 26, 421–452 (2008).
[216] Auffray, C., Sieweke, M., and Geissmann, F. Annual Review of Immunology 27, 669–692 (2009).
[217] Dinarello, C. Blood 117(14), 3720–3732 (2011).
[218] Jensen, P. and Papin, J. Bioinformatics 27(4), 541–547 (2011).
[219] Thiele, I., Fleming, R., Bordbar, A., Schellenberger, J., and Palsson, B. Biophysical Journal 98(10), 2072–2081 (2010).
[220] Gianchandani, E., Papin, J., Price, N., Joyce, A., and Palsson, B. PLoS Computational Biology 2(8), e101 (2006).
[221] Gianchandani, E., Joyce, A., Palsson, B., and Papin, J. PLoS Computational Biology 5(6), e1000403 (2009).
[222] Dasika, M., Burgard, A., and Maranas, C. Biophysical Journal 91(1), 382–398 (2006).
[223] Richard, G., Belta, C., Julius, A., and Amar, S. PLoS ONE 7(2), e31341 (2012).
[224] Maglott, D., Ostell, J., Pruitt, K. D., and Tatusova, T. Nucleic Acids Research 39, D52–D57 (2011).
[225] Apweiler, R., Martin, M. J., O’Donovan, C., Magrane, M., Alam-Faruque, Y., Antunes, R., Barrell, D., Bely, B., Bingley, M., Binns, D., Bower, L., Browne, P., Chan, W. M., Dimmer, E., Eberhardt, R., Fazzini, F., Fedotov, A., Foulger, R., Garavelli, J., Castro, L. G., Huntley, R., Jacobsen, J., Kleen, M., Laiho, K., Legge, D., Lin, Q. A., Liu, W. D., Luo, J., Orchard, S., Patient, S., Pichler, K., Poggioli, D., Pontikos, N., Pruess, M., Rosanoff, S., Sawford, T., Sehra, H., Turner, E., Corbett, M., Donnelly, M., van Rensburg, P., Xenarios, I., Bougueleret, L., Auchincloss, A., Argoud-Puy, G., Axelsen, K., Bairoch, A., Baratin, D., Blatter, M. C., Boeckmann, B., Bolleman, J., Bollondi, L., Boutet, E., Quintaje, S. B., Breuza, L., Bridge, A., deCastro, E., Coudert, E., Cusin, I., Doche, M., Dornevil, D., Duvaud, S., Estreicher, A., Famiglietti, L., Feuermann, M., Gehant, S., Ferro, S., Gasteiger, E., Gateau, A., Gerritsen, V., Gos, A., Gruaz-Gumowski, N., Hinz, U., Hulo, C., Hulo, N., James, J., Jimenez, S., Jungo, F., Kappler, T., Keller, G., Lara, V., Lemereier, P., Lieberherr, D., Martin, X., Masson, P., Moinat, M., Morgat, A., Paesano, S., Pedruzzi, I., Pilbout, S., Poux, S., Pozzato, M., Redaschi, N., Rivoire, C., Roechert, B., Schneider, M., Sigrist, C., Sonesson, K., Staehli, S., Stanley, E., et al. Nucleic Acids Research 39, D214–D219 (2011).
[226] Zhang, D., Zhang, G., Hayden, M., Greenblatt, M., Bussey, C., Flavell, R., and Ghosh, S. Science 303(5663), 1522–1526 (2004).
[227] Lynn, D., Winsor, G., Chan, C., Richard, N., Laird, M., Barsky, A., Gardy, J., Roche, F., Chan, T., Shah, N., et al. Molecular Systems Biology 4(1), 218 (2008).
[228] Takeda, K. and Akira, S. Seminars in Immunology 16(1), 3–9 (2004).
[229] Lowe, E., Doherty, T., Karahashi, H., and Arditi, M. Journal of Endotoxin Research 12(6), 337–345 (2006).
[230] Syvanen, A. Nature Reviews Genetics 2(12), 930–942 (2001).
[231] Jamshidi, N., Wiback, S., and Palsson, B. Genome Research 12(11), 1687–1692 (2002).
[232] Bals, R. and Hiemstra, P. European Respiratory Journal 23(2), 327–333 (2004).
[233] Swulius, M. and Waxham, M. Cellular and Molecular Life Sciences 65(17), 2637–2657 (2008).
[234] Ciocca, D. and Calderwood, S. Cell Stress & Chaperones 10(2), 86–103 (2005).
[235] Aitken, A. Seminars in Cancer Biology 16(3), 162–172 (2006).
[236] Van Der Hoeven, P., Van Der Wal, J., Ruurs, P., Van Dijk, M., and Van Blitterswijk, J. Biochemical Journal 345(2), 297–306 (2000).
[237] Tobimatsu, T. and Fujisawa, H. Journal of Biological Chemistry 264(30), 17907–17912 (1989).
[238] Droemann, D., Albrecht, D., Gerdes, J., Ulmer, A., Branscheid, D., Vollmer, E., Dalhoff, K., Zabel, P., Goldmann, T., et al. Respiratory Research 6(1), 1–6 (2005).
[239] Werner, J., DeCarlo, C., Escott, N., Zehbe, I., and Ulanova, M. Innate Immunity 18(1), 55–69 (2011).
[240] Dower, K., Ellis, D., Saraf, K., Jelinsky, S., and Lin, L. The Journal of Immunology 180(5), 3520–3534 (2008).
[241] Berglund, L., Björling, E., Oksvold, P., Fagerberg, L., Asplund, A., Al-Khalili Szigyarto, C., Persson, A., Ottosson, J., Wernérus, H., Nilsson, P., et al. Molecular & Cellular Proteomics 7(10), 2019–2027 (2008).
[242] Guha, M. and Mackman, N. Cellular Signalling 13(2), 85–94 (2001).
[243] Izaguirre, A., Barnes, B., Amrute, S., Yeow, W., Megjugorac, N., Dai, J., Feng, D., Chung, E., Pitha, P., and Fitzgerald-Bocarsly, P. Journal of Leukocyte Biology 74(6), 1125–1138 (2003).
[244] Cachia, O., Benna, J., Pedruzzi, E., Descomps, B., Gougerot-Pocidalo, M., and Leger, C. Journal of Biological Chemistry 273(49), 32801–32805 (1998).
[245] Rahman, M. and McFadden, G. PLoS Pathogens 2(2), e4 (2006).
[246] Kavita, U. and Mizel, S. Journal of Biological Chemistry 270(46), 27758–27765 (1995).
[247] Farina, C., Theil, D., Semlinger, B., Hohlfeld, R., and Meinl, E. International Immunology 16(6), 799–809 (2004).
[248] Lech, M., Avila-Ferrufino, A., Skuginna, V., Susanti, H., and Anders, H. International Immunology 22(9), 717–728 (2010).
[249] Ogura, Y., Inohara, N., Benito, A., Chen, F., Yamaoka, S., and Núñez, G. Journal of Biological Chemistry 276(7), 4812–4818 (2001).
[250] Moynagh, P. N. Trends in Immunology 30(1), 33–42 (2009).
[251] Cohn, Z. A. and Benson, B. The Journal of Experimental Medicine 121(1), 153–170 (1965).
[252] Kostromins, A. and Stalidzans, E. Biosystems 109(2), 233–239 (2012).
[253] Krappmann, D., Wegener, E., Sunami, Y., Esen, M., Thiel, A., Mordmuller, B., and Scheidereit, C. Molecular and Cellular Biology 24(14), 6488–6500 (2004).
[254] Rebe, C., Cathelin, S., Launay, S., Filomenko, R., Prevotat, L., L’Ollivier, C., Gyan, E., Micheau, O., Grant, S., Dubart-Kupperschmitt, A., et al. Blood 109(4), 1442–1450 (2007).
[255] Krawczyk, C., Holowka, T., Sun, J., Blagih, J., Amiel, E., DeBerardinis, R., Cross, J., Jung, E., Thompson, C., Jones, R., et al. Blood 115(23), 4742–4749 (2010).
[256] Tannahill, G. and O’Neill, L. FEBS Letters 585(11), 1568–1572 (2011).
[257] Dinarello, C. Blood 87(6), 2095–2147 (1996).
[258] Mackman, N., Brand, K., and Edgington, T. The Journal of Experimental Medicine 174(6), 1517–1526 (1991).
[259] Richard, G., Chang, H., Cizelj, I., Belta, C., Julius, A., and Amar, S. 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC), Orlando, FL, USA, 2227–2232 (2011).
[260] Lee, J., Gianchandani, E., Eddy, J., and Papin, J. PLoS Computational Biology 4(5), e1000086 (2008).
[261] Karr, J., Sanghvi, J., Macklin, D., Gutschow, M., Jacobs, J., Bolival, B., Assad-Garcia, N., Glass, J., and Covert, M. Cell 150(2), 389–401 (2012).
[262] Covert, M., Xiao, N., Chen, T., and Karr, J. Bioinformatics 24(18), 2044–2050 (2008).
[263] Vardi, L., Ruppin, E., and Sharan, R. Journal of Computational Biology 19(2), 232–240 (2012).
[264] Thorleifsson, S. and Thiele, I. Bioinformatics 27(14) (2011).
[265] Colicelli, J. Science Signaling 2004(250), re13 (2004).
[266] Schumann, R., Leong, S. R., Flaggs, G., Gray, P., Wright, S., Mathison, J., Tobias, P., and Ulevitch, R. Science 249(4975), 1429–1431 (1990).
[267] Grube, B., Cochane, C., Ye, R., Green, C., McPhail, M., Ulevitch, R., and Tobias, P. Journal of Biological Chemistry 269(11), 8477–8482 (1994).
[268] Thomas, C., Kapoor, M., Sharma, S., Bausinger, H., Zyilan, U., Lipsker, D., Hanau, D., and Surolia, A. FEBS Letters 531(2), 184–188 (2002).
[269] Blake, J. A., Bult, C. J., Kadin, J. A., Richardson, J. E., and Eppig, J. T. Nucleic Acids Research 39(suppl 1), D842–D848 (2011).
[270] Hasan, U., Chaffois, C., Gaillard, C., Saulnier, V., Merck, E., Tancredi, S., Guiet, C., Brière, F., Vlach, J., Lebecque, S., et al. The Journal of Immunology 174(5), 2942–2950 (2005).
[271] Mishra, S., Mishra, J., Gee, K., McManus, D., LaCasse, E., and Kumar, A. Journal of Biological Chemistry 280(45), 37536–37546 (2005).
[272] Reich, M., Liefeld, T., Gould, J., Lerner, J., Tamayo, P., and Mesirov, J. Nature Genetics 38(5), 500–501 (2006).
[273] Warren, P., Taylor, D., Martini, P., Jackson, J., and Bienkowska, J. In Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering (BIBE 2007), 108–115. IEEE (2007).
[274] Gentleman, R., Carey, V., Bates, D., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., et al. Genome Biology 5(10), R80 (2004).
[275] Emig, D., Salomonis, N., Baumbach, J., Lengauer, T., Conklin, B., and Albrecht, M. Nucleic Acids Research 38(suppl 2), W755–W762 (2010).
[276] Machado, D. and Herrgård, M. PLoS Computational Biology 10(4), e1003580 (2014).
[277] Zhang, D., Hu, X., Qian, L., Chen, S.-H., Zhou, H., Wilson, B., Miller, D. S., and Hong, J.-S. Journal of Neuroinflammation 8(3) (2011).
[278] Weintz, G., Olsen, J. V., Frühauf, K., Niedzielska, M., Amit, I., Jantsch, J., Mages, J., Frech, C., Dölken, L., Mann, M., et al. Molecular Systems Biology 6(1) (2010).
[279] Yamashita, M., Chattopadhyay, S., Fensterl, V., Saikia, P., Wetzel, J. L., and Sen, G. C. Science Signaling 5(233), ra50 (2012).
[280] Gao, X., Wang, H., Yang, J., Liu, X., and Liu, Z.-R. Molecular Cell 45(5), 598–609 (2012).
[281] Yang, W., Xia, Y., Hawke, D., Li, X., Liang, J., Xing, D., Aldape, K., Hunter, T., Alfred Yung, W., and Lu, Z. Cell 150(4), 685–696 (2012).

6 List of Publications

• Aurich, M.K., Paglia, G., Rolfsson, Ó., Hrafnsdóttir, S., Magnúsdóttir, M., Stefaniak, M.M., Palsson, B.Ø., Fleming, R.M.T., Thiele, I. Prediction of intracellular metabolic states from extracellular metabolomic data. (2014) Metabolomics, 1-17.
• Sahoo, S., Aurich, M.K., Jónsson, J.J., Thiele, I. Membrane transporters in a human genome-scale metabolic knowledgebase and their implications for disease. (2014) Frontiers in Physiology 5, 91.
• Thiele, I., Swainston, N., Fleming, R. M., Hoppe, A., Sahoo, S., Aurich, M. K., Haraldsdottir, H., Mo, M. L., Rolfsson, O., Stobbe, M. D., et al. A community-driven global reconstruction of human metabolism. (2013) Nature Biotechnology, 31, 419-425.
• Mednis, M. and Aurich, M.K. Application of string similarity ratio and edit distance in automatic metabolite reconciliation comparing reconstructions and models. (2012) Biosystems and Information Technology 1 (1), 14-18.
• Aurich, M.K., Thiele, I. Contextualization Procedure and Modeling of Monocyte Specific TLR Signaling. (2012) PLoS ONE 7 (12), e49978.