Supplementary Online Material

Predicting Selective Drug Targets in Cancer through Metabolic Networks

Ori Folger1, Livnat Jerby1, Christian Frezza2, Eyal Gottlieb2, Eytan Ruppin1,3,*, Tomer Shlomi4,*

1 The Blavatnik School of Computer Science – Tel Aviv University
2 The Beatson Institute for Cancer Research, Glasgow, Scotland, UK
3 The Sackler School of Medicine – Tel Aviv University
4 Computer Science Department - Technion - Israel Institute of Technology
* Equal contribution, and to whom correspondence should be addressed.

Table of Contents
The Model Building Algorithm
Similarity in metabolic activity across cancer cell-lines
Grouping functionally related targets
Reconstructing a lung cancer metabolic network model
Predicting synthetic lethal gene targets suggests novel combinatorial drug therapies

Figures
1. The distribution of Kolmogorov-Smirnov (KS) scores for an enrichment test of randomly chosen sets of 199 genes with genes ranked as highly contributing to growth based on the shRNA data of Luo et al.
2. Synthetic lethal gene pairs whose knockdown is predicted to selectively inhibit proliferation of cancer cells
3. The number of predicted synergistic drug target pairs involving different genes with known somatic mutations in cancer

Tables (in Data Set 1)
1. All genes in the generic human metabolic model.
2. Genes predicted to be highly cytostatic in the cancer model.
3. Gene pairs predicted to be highly synergetic and cytostatic.
4. The connections between pathways by the cancer synergic pairs.
5. All drugs from DrugBank.
6. Detailed biomass calculation.
7. The simulation details of RPMI 1640, the growth medium used.
8. The list of reactions appearing in the generic human model, cancer model and lung model.
9. Drugs predicted to target specific cancer types by chromosomal loss of synthetic lethal participant genes.
10. Pathway content of metabolic models.
11. Grouping of cytostatic genes prediction by affected biomass metabolites.
12. Grouping of cytostatic SL prediction by affected biomass metabolites.
13. Frequency of gene deletion through karyotyping across cancer types

The Model Building Algorithm (MBA; as described in Jerby et al.1)

A metabolic network model, amenable to constraint-based modeling (CBM), can be represented by a 4-tuple, $(M, R, S, L)$, in which $M$ denotes a set of metabolites, $R$ denotes a set of biochemical reactions, $S$ denotes reactions' stoichiometry, and $L$ denotes constraints on reactions' directionality (i.e. reflecting a lower bound on reactions' flux rate). Reactions' stoichiometry is represented by a stoichiometric matrix, $S \in \mathbb{R}^{|M| \times |R|}$, in which $S_{i,j}$ represents the stoichiometric coefficient of metabolite $i$ in reaction $j$. A feasible flux distribution within a metabolic network model is a vector $v \in \mathbb{R}^{|R|}$ satisfying mass-balance ($S \cdot v = 0$) and directionality constraints ($v \geq L$). A metabolic network model is considered consistent if all of its reactions can be activated, i.e. for each reaction $r \in R$ there exists a feasible flux distribution $v$ such that $v_r \neq 0$.

Given a metabolic network model, $(M, R, S, L)$, referred to as the generic model (the human model of Duarte et al.2, in our case), a partial model, $(M_P, R_P, S_P, L_P)$, including only a subset of the generic model's reactions ($R_P \subseteq R$), can be defined based on the corresponding subsets of metabolites, stoichiometry and directionality constraints. Notably, a partial model of a consistent generic model is not necessarily consistent itself (i.e., it may contain dead-end reactions that cannot be activated considering all possible feasible flux distributions). Given a generic model, GM, and a set of core reactions, $R_C \subseteq R$, known to have a high probability to be included in some partial model, our goal is to derive the most parsimonious, consistent partial model, PM (the cancer model, in our case), consisting of all reactions in $R_C$. Specifically, we would like to identify $R_P$ by solving the following optimization problem:

Minimize $|R_P|$
s.t. PM is consistent, $R_C \subseteq R_P$

To solve the above optimization problem, we employ the following simple search heuristic:
1. Define $R_P = R$
2. Choose a random permutation, $P$, of reactions from the set $R \setminus R_C$
3. For each reaction $r \in P$:
   a. Compute $(I_C, I_N)$ for the removal of $r$ from $R_P$
   b. If $I_C = \emptyset$ then $R_P = R_P \setminus (I_N \cup \{r\})$

First, the current set of reactions in the partial model, $R_P$, is initialized to include all reactions in the generic model (1). Then, a random permutation of all reactions that are not in the high reliability core, $R_C$ (and hence their potential removal from the partial model can improve the optimization objective function), is generated (2). In (3), the potential removal of each reaction in turn (based on the random permutation $P$) is evaluated. The procedure in (3.a) computes the set of high reliability reactions in $R_P$ that cannot be activated after the removal of reaction $r$ from $R_P$, and returns this set as $I_C$; $I_N$ denotes a set of reactions that can be removed together with reaction $r$ such that, aside from $I_C$, the core is active. In case the removal of $r$ from $R_P$ would affect the ability to activate a high reliability reaction, $r$ would not be removed from $R_P$ (3.b). Otherwise, if the removal of $r$ would improve the optimization objective value, $r$ will be removed from $R_P$, along with the additional reactions that cannot be activated following its removal.

A naive implementation of the procedure in (3.a) can simply iterate through all reactions in $R_P$, applying FVA3 to check whether they can carry non-zero flux within a feasible flux distribution when $r$ is removed. The naive algorithm is computationally prohibitive, with an overall time complexity of $O(n \cdot m \cdot T_{LP})$, where $n$ denotes the number of reactions in $R$, $m$ denotes the number of non-core reactions ($m = |R \setminus R_C|$), and $T_{LP}$ denotes the (polynomial) complexity of each LP problem. Instead, a simple speed-up technique was implemented that aims to concurrently activate multiple core reactions in the same LP problem, by trying to maximize their total sum of flux. These optimizations are repeated several times on a monotonically decreasing set of core reactions: each iteration is performed on the smaller subset of core reactions that received zero flux in all previous iterations. These speed-ups reduce the running time by 97% in comparison to the naive approach. The total running time of the algorithm is ~10-20 minutes, depending on the size of the core reaction set.

Since the resulting model depends on the chosen reaction scanning order, the algorithm is executed repeatedly (1,000 times in the results presented here) with different, random scanning orders. Each run results in a candidate model. All 1,000 candidate models are then processed to assign the non-core reactions confidence scores, which are the fraction of candidate models in which they appear. An aggregate model is built by considering the scores across all runs, starting with the core reactions and incrementally adding reactions in order of descending confidence score until a consistent, viable model is obtained.
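The pruning heuristic and the aggregation over repeated runs can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and the consistency check is passed in as a callable. The toy check used here (`toy_find_inactive`, where a reaction stays active as long as one of its listed alternative support sets is present) merely stands in for the LP-based FVA test described above.

```python
import random
from collections import Counter


def prune_once(all_reactions, core, find_inactive, rng):
    """One pruning pass; returns a candidate partial model as a set."""
    core = set(core)
    partial = set(all_reactions)              # step 1: start from the generic model
    order = sorted(partial - core)            # step 2: non-core reactions ...
    rng.shuffle(order)                        # ... scanned in random order
    for r in order:                           # step 3
        if r not in partial:                  # already removed alongside another reaction
            continue
        inactive = find_inactive(partial, r)  # 3.a: reactions blocked once r is gone
        if not (inactive & core):             # 3.b: every core reaction must stay active
            partial -= inactive | {r}
    return partial


def confidence_scores(candidates, core):
    """Fraction of candidate models in which each non-core reaction appears."""
    core = set(core)
    counts = Counter(r for model in candidates for r in set(model) - core)
    return {r: n / len(candidates) for r, n in counts.items()}


def aggregate_model(core, scores, is_consistent):
    """Add reactions by descending score until the caller's check passes."""
    model = set(core)
    for r in sorted(scores, key=lambda x: (-scores[x], x)):
        if is_consistent(model):
            break
        model.add(r)
    return model


def toy_find_inactive(alternatives):
    """Hypothetical stand-in for FVA: x is active if any support set is present."""
    def find_inactive(partial, removed):
        present = set(partial) - {removed}
        blocked, changed = set(), True
        while changed:                        # propagate cascading blockages
            changed = False
            for x in sorted(present - blocked):
                options = alternatives.get(x)
                if options and not any(o <= present - blocked for o in options):
                    blocked.add(x)
                    changed = True
        return blocked
    return find_inactive


# Toy model: core reaction "c" needs "a" or "b"; "d" is never needed.
check = toy_find_inactive({"c": [{"a"}, {"b"}]})
candidates = [prune_once("abcd", {"c"}, check, random.Random(seed)) for seed in range(20)]
scores = confidence_scores(candidates, {"c"})
model = aggregate_model({"c"}, scores, lambda m: "a" in m or "b" in m)
```

In this toy setting every run keeps exactly one of the two interchangeable reactions, which is why the scanning order matters and why scores are aggregated over many runs.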
Similarity in metabolic activity across cancer cell-lines

The similarity in metabolic activity across cell-lines is evident via high-throughput molecular data such as those used to reconstruct and validate the generic cancer metabolic model:
(i) Microarray data, showing significant up-regulation of central metabolic pathways, including oxidative phosphorylation and nucleotide biosynthesis, across many cancer cell-lines4.
(ii) Metabolomics, showing a highly statistically significant correlation between metabolomic profiles measured across NCI-60 cancer cell-lines (p-value < 1e-5). Specifically, the metabolomics data includes measurements of the concentrations of 98 metabolites across 57 cancer cell lines from the NCI collection (measured by Metabolon). In order to assess the significance of the similarity of metabolite concentrations between cell lines, we performed the following permutation test. The mean Pearson correlation between the concentration vectors of each pair of cell lines is a statistic measuring similarity between all cell lines. We shuffle the concentration values of each metabolite between the different cell lines and calculate the similarity statistic for the new, shuffled dataset. Repeating this process over 100,000 times, we found that no shuffled dataset reaches the similarity statistic of the original dataset, resulting in a p-value < 1e-5.
(iii) shRNA knockdowns, showing a substantial correlation in gene requirements for proliferation of many cell-lines5. Repeating the original analysis performed in Luo et al. while focusing specifically on metabolic genes has shown that a significantly high number of metabolic genes rank among the top 5% of essential genes based on shRNA across the different cancer cell-lines (p-value < 1e-5; Supp. Material).
The p-value was computed by comparing the number of genes that are in the top 5% of essential genes in at least 5 cell-lines (out of 12) to that obtained when randomly shuffling the essentiality rankings.

Validating predicted growth-supporting genes via the experimental shRNA data

As described in the main text, the set of 199 predicted growth-supporting genes is found to be ranked as highly essential based on shRNA gene knockdown data (Kolmogorov-Smirnov p-value = 0.0045) when aggregating the shRNA measurements across the 12 cell lines. When inspecting each cell-line separately, the set of 199 predicted genes is ranked as highly essential in 5 out of these 12 cancer cell-lines (where significance is computed as described in the Methods, via a one-sided Kolmogorov-Smirnov test, considering a false-discovery rate of 0.05).

Supplementary Figure 1: The distribution of Kolmogorov-Smirnov (KS) scores for an enrichment test of randomly chosen sets of 199 genes with genes ranked as highly contributing to growth based on the shRNA data of Luo et al. The KS scores are computed for the set of probes associated with each randomly chosen set of genes. The red area denotes the fraction of randomly chosen gene sets with an equal or better KS score than that obtained for the set of 199 growth-supporting genes predicted by the cancer model (showing that the latter set is significantly enriched with genes ranked as highly contributing to growth; p-value of 0.0045).

As a further control, we evaluated the predictive performance of the original generic human metabolic network itself. We found that the human model predicts a smaller set of 92 growth-supporting genes that are also ranked as essential based on shRNA data, but with a p-value of 0.029, which is roughly six times larger than that obtained by the cancer model (0.0045).
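The permutation test used for the metabolomics data (item (ii) above) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the toy matrix are hypothetical, and the add-one p-value estimator is an assumption of this sketch (the text above simply reports that none of the shuffled datasets reached the observed statistic).

```python
import random
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def similarity(data):
    """Mean Pearson correlation over all pairs of cell lines.

    data[m][c] is the concentration of metabolite m in cell line c.
    """
    n_lines = len(data[0])
    columns = [[row[c] for row in data] for c in range(n_lines)]
    return mean(pearson(columns[i], columns[j])
                for i, j in combinations(range(n_lines), 2))


def permutation_p_value(data, n_perm, rng):
    """Shuffle each metabolite's values across cell lines and compare statistics."""
    observed = similarity(data)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(row, len(row)) for row in data]
        if similarity(shuffled) >= observed:
            hits += 1
    # Add-one estimator (an assumption here) so the p-value is never exactly 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy matrix: 4 metabolites (rows) x 3 cell lines (columns).
toy = [[1.0, 2.0, 3.0], [2.0, 1.0, 4.0], [5.0, 3.0, 1.0], [0.5, 2.5, 1.5]]
p = permutation_p_value(toy, 50, random.Random(0))
```

With real data the number of permutations bounds the smallest reportable p-value, which is why 100,000 shuffles correspond to the reported bound of 1e-5.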
Examining the shRNA data of each cell-line separately, the set of growth-supporting genes predicted by the human model is ranked as highly essential in only 2 out of the 12 cancer cell-lines. The enrichment of these genes with known cancer mutation genes6 is markedly lower than that obtained with the cancer model's predictions, with a p-value of 0.025 versus 0.0021 obtained with the cancer model.

As another control, we evaluated the predictive performance of the gene expression data alone (i.e. without utilizing a metabolic network model). We found that the set of 197 genes that are highly expressed across 90% of the cancer cell-lines (used as input to reconstruct the cancer model) is not enriched with genes ranked as highly contributing to cancer growth based on the shRNA data (p-value = 0.075 when aggregating the shRNA data across the 12 cell-lines). These genes also provide a lower enrichment with known cancer mutation genes (p-value = 0.0077). Next, we evaluated the predictive performance of gene expression data measured specifically in these cell-lines, finding that genes with high expression levels are ranked as highly essential based on the shRNA data in only 3 of these cancer cell-lines. For the latter analysis, we utilized the following gene expression datasets from GEO: GSE8045, GSE8332, GSE10841, GSE8332, GSE10841, GSE7562, GSE11058, GSE12056, GSE4536, GSE13313, and GSE4536.

We estimated the sensitivity of our predictions of growth-supporting genes to uncertainty in biomass composition (and its potential variation across cancer types) by adding random noise to the biomass coefficients and evaluating its effect on the predictive performance of growth-supporting genes. The random noise was drawn from a Gaussian distribution with a standard deviation of 50% of the original biomass coefficient values.
For each choice of random biomass coefficients (1,000 runs in total), we repeated the prediction of growth-supporting genes via FBA and tested for enrichment of genes that contribute to growth based on the shRNA data from Luo et al., as described in the ranking procedure above. We found that the cancer model is robust to different choices of biomass composition (in accordance with previous similar findings in microbial networks7), with a mean p-value of 0.0505 and a standard deviation of 0.0168 even at these fairly high noise levels.

We further estimated the sensitivity of this result to the specific choice of threshold for predicting growth-supporting genes, by repeating the analysis for a wide range of thresholds, from 0.001 to 0.2. We find that the predictive performance is highly robust to the specific choice of threshold, with the corresponding p-values remaining significant, in the range 0.0045 to 0.042.

The sensitivity of the model reconstruction approach and of the predicted growth-inducing genes to uncertainty in the definition of the core reaction set was estimated as follows: we applied MBA 1,000 times to reconstruct a cancer metabolic model while randomly replacing 10% of the core genes with random genes from the generic human model. We found that the variability in overlap between the sets of growth-inducing genes predicted by the 1,000 models is at most 10%, testifying to the highly robust nature of our approach.

Grouping functionally related targets

We analyzed the specific set of biomass compounds inhibited by the knockdown of each identified single and double target. To investigate how targets differ from one another, we partitioned the single and double targets into groups based on the set of biomass compounds affected by their knockdown. The set of 52 single drug targets formed 19 target groups (Supp. Table 11), while the 133 synthetic lethal pairs formed 44 target groups (Supp. Table 12).
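The partition into target groups amounts to keying each target by the set of biomass compounds its knockdown affects. A minimal Python sketch, with a hypothetical function name and made-up target and compound labels, could look like this:

```python
from collections import defaultdict


def group_targets(affected):
    """Group targets that inhibit exactly the same set of biomass compounds.

    affected maps a target (a gene, or a tuple of two genes for a synthetic
    lethal pair) to the biomass compounds inhibited by its knockdown.
    """
    groups = defaultdict(list)
    for target, compounds in affected.items():
        groups[frozenset(compounds)].append(target)
    return dict(groups)


# Illustrative labels only; they are not taken from the supplementary tables.
example = group_targets({
    "geneA": ["cholesterol"],
    "geneB": ["cholesterol"],
    ("geneC", "geneD"): ["dATP", "dGTP"],
})
```

Using a `frozenset` as the key makes the grouping independent of the order in which affected compounds are listed.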
The target group that specifically inhibits cholesterol production consists of a long chain of enzymes directly involved in cholesterol biosynthesis (in accordance with the pathway annotation of these enzymes). In some cases, the pathway annotation of targets did not match their partition into target groups, as was already shown earlier in another context8. For example, various single targets in nucleotide and glycerophospholipid metabolism are predicted to affect different biomass compounds. Notably, targets whose knockdown is predicted to inhibit the same set of biomass compounds may still differ from one another in the set of additional (non-biomass related) metabolic alterations.

Reconstructing a lung cancer metabolic network model

We applied our cancer model building approach to reconstruct a model of non-small cell lung cancer metabolism, utilizing multiple gene expression datasets from a specific NCI-60 cell-line that has a relative abundance of such data (A549)9. The resulting model consists of 791 genes, 957 reactions and 730 metabolites (in comparison with 696 genes, 813 reactions and 665 metabolites in the generic cancer model). Comparing the pathway content of both models shows a markedly higher number of reactions in Vitamin A and B6 metabolism and in phenylalanine and tyrosine metabolism in the lung cancer model (in accordance with findings regarding altered metabolic function of these pathways in lung cancer10-12; Supp. Table 10). The growth-supporting genes predicted by the lung cancer model are ranked as highly essential based on shRNA gene silencing data measured in this cell-line (Kolmogorov-Smirnov p-value = 0.025), notably outperforming the predictions made by the generic cancer model, which are not ranked as significantly essential in this cell-line (the predictions made by the generic cancer model are ranked as essential via shRNA in 5 out of the studied 12 cancer cell-lines, while those made by the lung cancer model are uniquely significant for the lung data; Supp. Material). Importantly, the set of highly expressed genes in lung cancer that was used as input in the reconstruction of the lung cancer model is also not ranked as highly essential, testifying to the added value of building a cancer type-specific model in this case.

Filtering the set of predicted growth-supporting genes in lung cancer to identify those that do not damage normal cell activity (as done with the generic cancer model) results in a set of 60 targets with a high cytostatic score, 50 of which overlap with those predicted by the generic cancer model (testifying to the general utility of the generic model; Supp. Table 2). This set of genes includes a target of the drug Pemetrexed, which is approved by the FDA for treatment of non-small cell lung cancer. Many of the targets predicted only by the lung cancer model belong to the same metabolic pathways as the known and experimental anticancer drug targets, including nucleotide metabolism and glycerophospholipid metabolism (Supp. Table 2). One specifically interesting case among these additional targets is SLC29A1 (nucleoside transporter), targeted by Troglitazone13, which was shown14 to have an anti-proliferative effect on the analyzed lung cancer cell-line A549. Overall, these results demonstrate the potential of our computational approach for the future development and study of more refined cancer type-specific models.

Supplementary Figure 2: Synthetic lethal gene pairs whose knockdown is predicted to selectively inhibit proliferation of cancer cells:
(a) The cancer model lacks the alternative pathway for glycine production via SARDH, and hence the double knockdown of both SHMT1 and AGXT activity is predicted to inhibit glycine production in cancer cells. The existence of this pathway in normal cells, however, is predicted to still enable glycine synthesis in the face of such a combinatorial knockdown.
(b) The cancer model lacks several histidine transporters due to low expression levels, and hence the double knockdown of both isozymes SLC6A14 and SLC38A2 (that do belong to the cancer model) is predicted to be essential only for cancer cells.
(c) The cancer model lacks several transporters of inorganic phosphate, making the double knockout of the isozymes SLC20A1 and SLC20A2 essential for viability only for cancer cells.

Predicting synthetic lethal gene targets suggests novel combinatorial drug therapies

The predicted synthetic lethal gene pairs suggest novel combinatorial drug therapies. For 9 of the predicted synergistic selective gene pairs there are either approved or experimental drugs (not necessarily cancer related) that target both genes (this relative paucity is expected given the scarcity of treatments attacking multiple metabolic targets). In one such pair, both genes (IMPDH1 and IMPDH2, which code for isoenzymes of inosine monophosphate dehydrogenase) are targeted by the same approved anticancer drug (either Mercaptopurine or Thioguanine). Another pair is FH (fumarate hydratase) and UROD (uroporphyrinogen decarboxylase), the latter targeted by a recently developed inhibitor15,16. Another pair consists of two isoenzymes of inositol monophosphatase (IMPA1 and IMPA2), both targeted by Lithium (clinically used for treating bipolar affective disorder), which was previously shown to reduce proliferation of esophageal cancer17. Three additional gene pairs include the gene SHMT1 (serine hydroxymethyltransferase), targeted by the drug Mimosine, which was previously shown to inhibit proliferation of human cancer cells18. Overall, the 133 synthetic lethal gene pairs include 4 additional known anticancer drug targets not identified in the single knockdown analysis described above, representing a highly significant enrichment (hypergeometric p-value < 2e-6).
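A hypergeometric enrichment p-value of the kind quoted above is the upper tail of the hypergeometric distribution. The sketch below shows the calculation in Python; the function name is hypothetical, and because the population and set sizes behind the reported value are not spelled out in this section, the numbers in the usage line are purely illustrative.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement from a
    population of `population` items, `successes` of which are marked."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


# Illustrative numbers only: at least 1 marked item among 4 draws from
# 10 items of which 3 are marked.
p_value = hypergeom_tail(1, 10, 3, 4)
```

Exact integer binomial coefficients are used throughout, so the only rounding happens in the final division.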
For 60 out of the 133 predicted synergistic gene pairs there already exists a drug targeting one of the genes, making them quite appealing from an applicative standpoint, with the additionally predicted gene forming an interesting candidate for potential drug targeting. One example is the pair PNP (purine nucleoside phosphorylase) and AK5 (adenylate kinase), the former targeted by Cladribine, which is used to treat hairy cell leukemia19.

Supplementary Figure 3: The number of predicted synergistic drug target pairs involving different genes with known somatic mutations in cancer20. For each gene in which cancer mutations were previously identified, the figure shows the number of predicted synergistic gene pairs in which it participates. The number of paired genes targeted by existing drugs is shown in red, while the number of paired genes not targeted by currently available drugs is shown in green.

References:
1. Jerby, L., Shlomi, T. & Ruppin, E. Computational reconstruction of tissue-specific metabolic models: application to human liver metabolism. Molecular systems biology 6, 1 (2010).
2. Duarte, N.C. et al. Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc Natl Acad Sci U S A 104, 1777-82 (2007).
3. Mahadevan, R. & Schilling, C.H. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab Eng 5, 264-76 (2003).
4. Ertel, A., Verghese, A., Byers, S.W., Ochs, M. & Tozeren, A. Pathway-specific differences between tumor cell lines and normal and tumor tissue cells. Mol Cancer 5, 55 (2006).
5. Luo, B. et al. Highly parallel identification of essential genes in cancer cells. Proc Natl Acad Sci U S A 105, 20380-5 (2008).
6. Futreal, P.A. et al. A census of human cancer genes. Nature Reviews Cancer 4, 177-183 (2004).
7. Varma, A., Boesch, B.W. & Palsson, B.O. Biochemical production capabilities of Escherichia coli. Biotechnol Bioeng 42, 59-73 (1993).
8. Shlomi, T. et al. Systematic condition-dependent annotation of metabolic genes. Genome Res 17, 1626-33 (2007).
9. Grever, M.R., Schepartz, S.A. & Chabner, B.A. The National Cancer Institute: cancer drug discovery and development program. Semin Oncol 19, 622-38 (1992).
10. Johansson, M. et al. Serum B vitamin levels and risk of lung cancer. JAMA 303, 2377-85 (2010).
11. Willett, W.C. Vitamin A and lung cancer. Nutr Rev 48, 201-11 (1990).
12. Yang, Q. et al. Urinary metabonomic study of lung cancer by a fully automatic hyphenated hydrophilic interaction/RPLC MS system. Journal of Separation Science 33, 1495-1503 (2010).
13. Leung, G., Man, R. & Tse, C. Effect of thiazolidinediones on equilibrative nucleoside transporter-1 in human aortic smooth muscle cells. Biochemical pharmacology 70, 355-362 (2005).
14. Tsubouchi, Y. et al. Inhibition of human lung cancer cell growth by the peroxisome proliferator-activated receptor-[gamma] agonists through induction of apoptosis. Biochemical and Biophysical Research Communications 270, 400-405 (2000).
15. Ito, E. et al. Uroporphyrinogen decarboxylase is a radiosensitizing target for head and neck cancer. Sci Transl Med 3, 67ra7 (2011).
16. Phillips, J.D., Bergonia, H.A., Reilly, C.A., Franklin, M.R. & Kushner, J.P. A porphomethene inhibitor of uroporphyrinogen decarboxylase causes porphyria cutanea tarda. Proc Natl Acad Sci U S A 104, 5079-84 (2007).
17. Cohen, Y., Chetrit, A., Sirota, P. & Modan, B. Cancer morbidity in psychiatric patients: influence of lithium carbonate treatment. Medical Oncology 15, 32-36 (1998).
18. Chang, H., Lee, T., Chuang, L., Yen, M. & Hung, W. Inhibitory effect of mimosine on proliferation of human lung cancer cells is mediated by multiple mechanisms. Cancer letters 145, 1-8 (1999).
19. Else, M. et al. Long term follow up of 233 patients with hairy cell leukaemia, treated initially with pentostatin or cladribine, at a median of 16 years from diagnosis. British journal of haematology 145, 733-740 (2009).
20. Forbes, S.A. et al. The Catalogue of Somatic Mutations in Cancer (COSMIC). Curr Protoc Hum Genet Chapter 10, Unit 10 11 (2008).