Supplementary Online Material
Predicting Selective Drug Targets in Cancer through Metabolic Networks
Ori Folger1, Livnat Jerby1, Christian Frezza2, Eyal Gottlieb2, Eytan Ruppin1,3,*, Tomer Shlomi4,*
1 The Blavatnik School of Computer Science – Tel Aviv University
2 The Beatson Institute for Cancer Research, Glasgow, Scotland, UK
3 The Sackler School of Medicine – Tel Aviv University
4 Computer Science Department - Technion - Israel Institute of Technology
* Equal contribution, and to whom correspondence should be addressed.
Table of Contents
The Model Building Algorithm
Similarity in metabolic activity across cancer cell-lines
Grouping functionally related targets
Reconstructing a lung cancer metabolic network model
Predicting synthetic lethal gene targets suggests novel combinatorial drug therapies
Figures
1. The distribution of Kolmogorov-Smirnov (KS) scores for an enrichment test of randomly chosen sets of 199 genes with genes ranked as highly contributing to growth based on the shRNA data of Luo et al.
2. Synthetic lethal gene pairs whose knockdown is predicted to selectively inhibit proliferation of cancer cells
3. The number of predicted synergistic drug target pairs involving different genes with known somatic mutations in cancer
Tables (in Data Set 1)
1. All genes in the generic human metabolic model.
2. Genes predicted to be highly cytostatic in the cancer model.
3. Gene pairs predicted to be highly synergistic and cytostatic.
4. The connections between pathways by the cancer synergistic pairs.
5. All drugs from DrugBank.
6. Detailed biomass calculation.
7. The simulation details of RPMI 1640, the growth medium used.
8. The list of reactions appearing in the generic human model, cancer model and lung model.
9. Drugs predicted to target specific cancer types by chromosomal loss of synthetic lethal participant genes.
10. Pathway content of metabolic models.
11. Grouping of cytostatic gene predictions by affected biomass metabolites.
12. Grouping of cytostatic SL predictions by affected biomass metabolites.
13. Frequency of gene deletion through karyotyping across cancer types.
The Model Building Algorithm (MBA; as described in Jerby et al.1)
A metabolic network model, amenable to constraint-based modeling (CBM), can be represented by a 4-tuple, $(M, R, S, L)$, in which $M$ denotes a set of metabolites, $R$ denotes a set of biochemical reactions, $S$ denotes the reactions' stoichiometry, and $L$ denotes constraints on the reactions' directionality (i.e., a lower bound on each reaction's flux rate). The reactions' stoichiometry is represented by a stoichiometric matrix, $S \in \mathbb{R}^{|M| \times |R|}$, in which $S_{i,j}$ represents the stoichiometric coefficient of metabolite $i$ in reaction $j$.
A feasible flux distribution within a metabolic network model is a vector $v \in \mathbb{R}^{|R|}$ satisfying mass balance ($S v = 0$) and directionality constraints ($v \geq L$). A metabolic network model is considered consistent if all of its reactions can be activated, i.e., for each reaction $r \in R$ there exists a feasible flux distribution $v$ such that $v_r \neq 0$.
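As an illustration of the consistency definition above, the following minimal sketch checks, reaction by reaction, whether a non-zero flux is attainable under mass balance and the flux bounds. It is not the authors' implementation: the toy network, the upper bound of 10, and the names `active_reactions` and `is_consistent` are assumptions made for the example, and SciPy's `linprog` stands in for whichever LP solver was actually used.

```python
import numpy as np
from scipy.optimize import linprog

def active_reactions(S, lb, ub, tol=1e-9):
    """Flag each reaction that can carry non-zero flux with S v = 0, lb <= v <= ub."""
    n_met, n_rxn = S.shape
    bounds = list(zip(lb, ub))
    active = np.zeros(n_rxn, dtype=bool)
    for j in range(n_rxn):
        for sign in (1.0, -1.0):            # minimise v_j, then maximise v_j
            c = np.zeros(n_rxn)
            c[j] = sign
            res = linprog(c, A_eq=S, b_eq=np.zeros(n_met),
                          bounds=bounds, method="highs")
            if res.status == 0 and abs(res.x[j]) > tol:
                active[j] = True
                break
    return active

def is_consistent(S, lb, ub):
    """A model is consistent if every one of its reactions can be activated."""
    return bool(active_reactions(S, lb, ub).all())

# Hypothetical toy network (rows: metabolites A, B, C; columns: reactions).
# r0: -> A, r1: A -> B, r2: B ->, r3: A -> C (C is never consumed: a dead end).
S = np.array([[1, -1,  0, -1],
              [0,  1, -1,  0],
              [0,  0,  0,  1]], dtype=float)
lb = [0.0] * 4    # all reactions irreversible
ub = [10.0] * 4   # assumed upper bound; the text only specifies lower bounds

flags = active_reactions(S, lb, ub)
```

In this toy case the dead-end reaction r3 is the only one flagged as inactive, so the full network is inconsistent while the sub-network without r3 and metabolite C is consistent.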
Given a metabolic network model, $GM = (M_{GM}, R_{GM}, S_{GM}, L_{GM})$, referred to as the generic model (the human model of Duarte et al.2, in our case), a partial model, $PM = (M_{PM}, R_{PM}, S_{PM}, L_{PM})$, including only a subset of the generic model's reactions ($R_{PM} \subseteq R_{GM}$), can be defined based on the corresponding subsets of metabolites, stoichiometry and directionality constraints. Notably, a partial model of a consistent generic model is not necessarily consistent itself (i.e., it may contain dead-end reactions that cannot be activated under any feasible flux distribution).
Given a generic model, $GM$, and a set of core reactions, $R_C \subseteq R_{GM}$, known to have a high probability of being included in some partial model, our goal is to derive the most parsimonious, consistent partial model, $PM$ (the cancer model, in our case), containing all reactions in $R_C$. Specifically, we would like to identify $R_{PM}$ by solving the following optimization problem:
Minimize $|R_{PM}|$
s.t.
$PM$ is consistent,
$R_C \subseteq R_{PM} \subseteq R_{GM}$
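To solve the above optimization problem, we employ the following simple search heuristic:
1. Define $R_P = R_{GM}$
2. Choose a random permutation, $P$, of reactions from the set $R_{GM} \setminus R_C$
3. For each reaction $r \in P$:
a. Compute $e_C$ and $e_X$, the core and non-core reactions in $R_P$, respectively, that cannot be activated once $r$ is removed from $R_P$
b. If $e_C = \emptyset$ then $R_P = R_P \setminus (e_X \cup \{r\})$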
First, the current set of reactions in the partial model, $R_P$, is initialized to include all reactions in the generic model, $R_{GM}$ (1). Then, a random permutation, $P$, of all reactions that are not in the high-reliability core, $R_C$ (and hence whose potential removal from the partial model can improve the optimization objective function), is generated (2). In (3), the potential removal of each reaction $r$ in turn (in the order given by the random permutation $P$) is evaluated. The consistency-check procedure (3.a) computes the set of high-reliability reactions in $R_C$ that cannot be activated after the removal of reaction $r$ from $R_P$, and returns this set as $e_C$. $e_X$ denotes the set of non-core reactions that can be removed together with reaction $r$ from $R_P$ such that, aside from $r$ and $e_X$, the core remains active. If the removal of $r$ from $R_P$ would affect the ability to activate a high-reliability reaction, $r$ is not removed from $R_P$ (3.b). Otherwise, if the removal of $r$ would improve the optimization objective value, $r$ is removed from $R_P$, along with the additional reactions $e_X$ that cannot be activated following its removal.
A naive implementation of the consistency check can simply iterate through all reactions in $R_P$, applying FVA3 to check whether they can carry non-zero flux within a feasible flux distribution when $r$ is removed.
The naïve algorithm is computationally prohibitive, with an overall time complexity of $O(n \cdot m \cdot l)$, where $n$ denotes the number of reactions in $GM$, $m$ denotes the number of non-core reactions ($|R_{GM} \setminus R_C|$), and $l$ denotes the (polynomial) complexity of each LP problem. Instead, a simple speed-up technique was implemented that aims to concurrently activate multiple core reactions in the same LP problem by maximizing their total flux. These optimizations are repeated several times on a monotonically decreasing set of core reactions: each iteration is performed on the smaller subset of core reactions that have received zero flux in all previous iterations. These speed-ups reduce the running time by 97% in comparison with the naïve approach. The total running time of the algorithm is ~10-20 minutes, depending on the size of the core reaction set.
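The naive variant of the search heuristic (steps 1 to 3, with one LP per reaction and none of the speed-ups) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: all reactions are taken to be irreversible with an assumed upper bound, the function names `inactive_reactions` and `prune` are hypothetical, and the five-reaction network is a toy.

```python
import random
import numpy as np
from scipy.optimize import linprog

def inactive_reactions(S, keep, ub=10.0, tol=1e-9):
    """Reactions in `keep` that cannot carry flux when only `keep` is available.

    Assumes every reaction is irreversible (0 <= v <= ub), so maximising v_j suffices.
    """
    n_met, n_rxn = S.shape
    bounds = [(0.0, ub if j in keep else 0.0) for j in range(n_rxn)]
    inactive = set()
    for j in keep:
        c = np.zeros(n_rxn)
        c[j] = -1.0                          # linprog minimises, so this maximises v_j
        res = linprog(c, A_eq=S, b_eq=np.zeros(n_met),
                      bounds=bounds, method="highs")
        if res.status != 0 or res.x[j] <= tol:
            inactive.add(j)
    return inactive

def prune(S, core, seed=0):
    """One run of the pruning heuristic; returns the retained reaction indices."""
    rng = random.Random(seed)
    R_P = set(range(S.shape[1]))                     # step 1: start from the generic model
    order = sorted(R_P - core)
    rng.shuffle(order)                               # step 2: random scanning order
    for r in order:                                  # step 3
        if r not in R_P:
            continue                                 # already removed with an earlier reaction
        inactive = inactive_reactions(S, R_P - {r})  # 3.a
        if not (inactive & core):                    # 3.b: no core reaction is lost
            R_P -= inactive | {r}
    return R_P

# Hypothetical toy network (rows: metabolites A, B).
# r0: -> A, r1: A -> B (core), r2: B ->, r3: A -> (extra drain), r4: -> B (extra uptake).
S = np.array([[1, -1,  0, -1, 0],
              [0,  1, -1,  0, 1]], dtype=float)
pruned = prune(S, core={1})
```

Here the core reaction r1 needs r0 as its only source of A and r2 as its only sink for B, so every scanning order ends with the same retained set; in a realistic network the result depends on the order, which is why the procedure is repeated with many random permutations.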
Since the resulting model depends on the chosen reaction scanning order, the algorithm is executed repeatedly (1,000 times in the results presented here) with different, random scanning orders. Each run results in a candidate model. All 1,000 candidate models are then processed to assign each non-core reaction a confidence score, which is the fraction of candidate models in which it appears. An aggregate model is built by considering the scores across all runs, starting with the core reactions and incrementally adding reactions in order of descending confidence score until a consistent, viable model is obtained.
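The aggregation step can be sketched in a few lines. This is a simplified illustration: the reaction names are hypothetical, and `is_consistent` is a caller-supplied predicate standing in for the real consistency and viability check, which the text does not spell out.

```python
from collections import Counter

def confidence_scores(candidate_models, core):
    """Fraction of candidate models in which each non-core reaction appears."""
    counts = Counter(r for model in candidate_models for r in model if r not in core)
    n_models = len(candidate_models)
    return {r: counts[r] / n_models for r in counts}

def aggregate_model(candidate_models, core, is_consistent):
    """Start from the core and add reactions by descending score until consistent."""
    scores = confidence_scores(candidate_models, core)
    model = set(core)
    ranked = sorted(scores, key=lambda r: (-scores[r], r))   # ties broken by name
    for r in ranked:
        if is_consistent(model):
            break
        model.add(r)
    return model, scores

# Hypothetical candidate models (sets of reaction names) from three runs.
core = {"r1"}
candidates = [{"r0", "r1", "r2"}, {"r0", "r1", "r2"}, {"r0", "r1", "r4"}]

# Stand-in predicate: pretend the model is viable once r0 and r2 are present.
toy_check = lambda model: {"r0", "r2"} <= model

model, scores = aggregate_model(candidates, core, toy_check)
```

In the toy run, r0 (score 1.0) and r2 (score 2/3) are added before the predicate is satisfied, and the low-confidence r4 is left out.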
Similarity in metabolic activity across cancer cell-lines
The similarity in metabolic activity across cell-lines is evident via high-throughput molecular data such as those used
to reconstruct and validate the generic cancer metabolic model:
(i) Microarray data - showing significant up-regulation of central metabolic pathways, including oxidative phosphorylation and nucleotide biosynthesis, across many cancer cell-lines4.
(ii) Metabolomics - showing a highly statistically significant correlation between metabolomic profiles measured across NCI-60 cancer cell-lines (p-value < 1e-5). Specifically, the metabolomics data include measurements of the concentrations of 98 metabolites across 57 cancer cell-lines from the NCI collection (measured by Metabolon). To assess the significance of the similarity of metabolite concentrations between cell-lines, we performed the following permutation test. The mean Pearson correlation between the concentration vectors of each pair of cell-lines serves as a statistic measuring the similarity between all cell-lines. We shuffled the concentration values of each metabolite between the different cell-lines and calculated the similarity statistic for the new, shuffled dataset. Repeating this process over 100,000 times, we found that no shuffled dataset reached the similarity statistic of the original dataset, resulting in a p-value < 1e-5.
(iii) shRNA knockdowns - showing a substantial correlation in gene requirements for proliferation of many cell-lines5. Repeating the original analysis performed in Luo et al. while focusing specifically on metabolic genes has shown that a significantly high number of metabolic genes are ranked among the top 5% of essential genes based on shRNA across the different cancer cell-lines (p-value < 1e-5; Supp. Material). The p-value was computed by comparing the number of genes that are in the top 5% of essential genes in at least 5 cell-lines (out of 12) to that obtained when randomly shuffling the essentiality rankings.
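The permutation test described in (ii) can be sketched with NumPy as follows. The data matrix here is synthetic, the function names are hypothetical, and the reported p-value uses the conservative (hits + 1) / (permutations + 1) estimate, which is an assumption of this sketch rather than something stated in the text.

```python
import numpy as np

def mean_pairwise_correlation(X):
    """X is metabolites x cell-lines; average Pearson r over all cell-line pairs."""
    corr = np.corrcoef(X, rowvar=False)          # columns (cell-lines) are the variables
    upper = np.triu_indices_from(corr, k=1)      # each unordered pair once
    return float(corr[upper].mean())

def permutation_test(X, n_perm=200, seed=0):
    """Shuffle each metabolite's values across cell-lines and recompute the statistic."""
    rng = np.random.default_rng(seed)
    observed = mean_pairwise_correlation(X)
    hits = 0
    for _ in range(n_perm):
        shuffled = np.array([rng.permutation(row) for row in X])
        if mean_pairwise_correlation(shuffled) >= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)          # conservative upper estimate
    return observed, hits, p_value

# Synthetic example: 6 metabolites, 5 cell-lines whose profiles are exact
# multiples of one base profile, so every pair of cell-lines correlates perfectly.
base = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
X = np.outer(base, [1.0, 2.0, 3.0, 4.0, 5.0])

observed, hits, p_value = permutation_test(X)
```

With no shuffled dataset reaching the observed statistic, the estimate is bounded by the number of permutations, which is why a bound of 1e-5 requires on the order of 100,000 shuffles.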
Validating predicted growth-supporting genes via the experimental shRNA data
As described in the main text, the set of 199 predicted growth-supporting genes is found to be ranked as highly essential based on shRNA gene knockdown data (Kolmogorov-Smirnov p-value = 0.0045) when aggregating the shRNA measurements across the 12 cell-lines. When inspecting each cell-line separately, the set of 199 predicted genes is ranked as highly essential in 5 out of these 12 cancer cell-lines (where significance is computed as described in the Methods, via a one-sided Kolmogorov-Smirnov test, considering a false-discovery rate of 0.05).
Supplementary Figure 1: The distribution of Kolmogorov-Smirnov (KS) scores for an enrichment test of randomly chosen sets of 199 genes with genes ranked as highly contributing to growth based on the shRNA data of Luo et al. The KS scores are computed for the set of probes associated with each randomly chosen set of genes. The red area denotes the fraction of randomly chosen gene sets with an equal or better KS score than that obtained for the set of 199 growth-supporting genes predicted by the cancer model (showing that the latter set is significantly enriched with genes ranked as highly contributing to growth; p-value of 0.0045).
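The comparison against randomly chosen gene sets shown in the figure can be sketched as below. This is a gene-level simplification (the text computes scores over the probes associated with each gene), the one-sided KS score is written out by hand, and the names `ks_score` and `empirical_pvalue` as well as the 100-gene ranking are assumptions made for the example.

```python
import numpy as np

def ks_score(ranks, gene_set):
    """One-sided KS score: how far gene_set is shifted toward the top of the ranking."""
    ranks = np.asarray(ranks)
    in_set = np.zeros(len(ranks), dtype=bool)
    in_set[list(gene_set)] = True
    hits = in_set[np.argsort(ranks)]             # walk from most to least essential
    cdf_set = np.cumsum(hits) / hits.sum()
    cdf_rest = np.cumsum(~hits) / (~hits).sum()
    return float(np.max(cdf_set - cdf_rest))

def empirical_pvalue(ranks, gene_set, n_random=1000, seed=0):
    """Fraction of random, equally sized gene sets scoring at least as well."""
    rng = np.random.default_rng(seed)
    observed = ks_score(ranks, gene_set)
    k = len(gene_set)
    better = sum(
        ks_score(ranks, rng.choice(len(ranks), size=k, replace=False)) >= observed
        for _ in range(n_random)
    )
    return observed, better / n_random

# Synthetic ranking of 100 genes (rank 0 = most essential); the tested set is
# exactly the top 10, the most extreme enrichment possible.
ranks = np.arange(100)
observed, p_value = empirical_pvalue(ranks, range(10))
```

The empirical p-value corresponds to the red area in the figure: the share of random gene sets with an equal or better score than the predicted set.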
As a further control, we evaluated the predictive performance of the original generic human metabolic network itself. We found that the human model predicts a smaller set of 92 growth-supporting genes that are also ranked as essential based on shRNA data, but with a p-value of 0.029, which is larger by an order of magnitude than that obtained by the cancer model (0.0045). Examining the shRNA data of each cell-line separately, this set of growth-supporting genes is ranked as highly essential in only 2 out of the 12 cancer cell-lines. The enrichment of these genes with known cancer mutation genes6 is markedly lower than that obtained with the cancer model's predictions, with a p-value of 0.025 versus 0.0021 obtained with the cancer model.
As another control, we evaluated the predictive performance of the gene expression data alone (i.e., without utilizing a metabolic network model). We found that the set of 197 genes that are highly expressed across 90% of the cancer cell-lines (used as input to reconstruct the cancer model) is not enriched with genes ranked as highly contributing to cancer growth based on the shRNA data (p-value = 0.075 when aggregating the shRNA data across the 12 cell-lines). This set also shows a lower enrichment with known cancer mutation genes (p-value = 0.0077). Next, we evaluated the predictive performance of gene expression data measured specifically in these cell-lines, finding that genes with high expression levels are ranked as highly essential based on the shRNA data in only 3 of these cancer cell-lines. For the latter analysis, we utilized the following gene expression datasets from GEO: GSE8045, GSE8332, GSE10841, GSE8332, GSE10841, GSE7562, GSE11058, GSE12056, GSE4536, GSE13313, and GSE4536.
We estimated the sensitivity of our predictions of growth-supporting genes to uncertainty in biomass composition (and its potential variation across cancer types) by adding random noise to the biomass coefficients and evaluating its effect on the predictive performance of growth-supporting genes. The random noise was drawn from a Gaussian distribution with a standard deviation of 50% of the original biomass coefficient values. For each choice of random biomass coefficients, we repeated (for 1,000 runs) the prediction of growth-supporting genes via FBA and tested for enrichment of genes that contribute to growth based on the shRNA data from Luo et al.5, as described in the ranking procedure above. We found that the cancer model is robust to different choices of biomass composition (in accordance with previous similar findings in microbial networks7), with a mean p-value of 0.0505 and a standard deviation of 0.0168 even at these fairly high noise levels.
We further estimated the sensitivity of this result to the specific choice of threshold for predicting growth-supporting genes, by repeating the analysis for a wide range of thresholds between 0.001 and 0.2. We find that the predictive performance is highly robust to the specific choice of threshold, with the corresponding p-values remaining significant, in the range 0.0045 to 0.042.
The sensitivity of the model reconstruction approach and of the predicted growth-inducing genes to uncertainty in the definition of the core reaction set was estimated as follows: we applied MBA 1,000 times to reconstruct a cancer metabolic model while randomly replacing 10% of the core genes with random genes from the generic human model. We found that the variability in overlap between the sets of growth-inducing genes predicted by the 1,000 models is at most 10%, testifying to the highly robust nature of our approach.
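One perturbation of the core set, as used in this robustness analysis, might look like the sketch below. The gene identifiers are made up, `perturb_core` is a hypothetical helper, and drawing the replacements only from non-core genes (so that the core size stays fixed) is an assumption; the text only says the replacements come from the generic human model.

```python
import random

def perturb_core(core, all_genes, frac=0.10, rng=None):
    """Swap a fraction of the core genes for randomly chosen non-core genes."""
    rng = rng or random.Random()
    core_list = sorted(core)                       # sorted for reproducible sampling
    k = round(frac * len(core_list))
    dropped = set(rng.sample(core_list, k))
    pool = sorted(set(all_genes) - set(core))      # assumption: replacements are non-core
    added = set(rng.sample(pool, k))
    return (set(core) - dropped) | added

# Hypothetical identifiers: 100 genes in the generic model, the first 20 in the core.
all_genes = [f"g{i}" for i in range(100)]
core = set(all_genes[:20])

perturbed = perturb_core(core, all_genes, rng=random.Random(0))
```

Each perturbed core would then be passed to the model building procedure, and the resulting gene predictions compared across runs.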
Grouping functionally related targets
We analyzed the specific set of biomass compounds inhibited by the knockdown of each identified single and double target. To investigate how targets differ from one another, we partitioned the single and double targets into groups based on the set of biomass compounds affected by their knockdown. The set of 52 single drug targets formed 19
target groups (Supp. Table 11), while the 133 synthetic lethal pairs formed 44 target groups (Supp. Table 12). The
target group that specifically inhibits cholesterol production consists of a long chain of enzymes directly involved in
cholesterol biosynthesis (in accordance with the pathway annotation of these enzymes). In some cases, the pathway
annotation of targets did not match their partition into target groups, as was already shown earlier in another
context8. For example, various single targets in nucleotide and glycerophospholipid metabolism are predicted to
affect different biomass compounds. Notably, a group of targets whose knockdown is predicted to inhibit the same
set of biomass compounds may still differ from one another in the set of additional (non-biomass related) metabolic
alterations.
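The partition into target groups amounts to grouping targets by the exact set of biomass compounds their knockdown affects. A minimal sketch, with invented target and compound names and a hypothetical `group_targets` helper:

```python
from collections import defaultdict

def group_targets(affected):
    """Group targets whose knockdown inhibits exactly the same biomass compounds."""
    groups = defaultdict(list)
    for target, compounds in affected.items():
        groups[frozenset(compounds)].append(target)   # frozenset: hashable, order-free key
    return {key: sorted(members) for key, members in groups.items()}

# Hypothetical knockdown results: target -> biomass compounds it inhibits.
affected = {
    "geneA": ["cholesterol"],
    "geneB": ["cholesterol"],
    "geneC": ["dATP", "dGTP"],
    "geneD": ["dGTP", "dATP"],
    "geneE": ["dATP"],
}

groups = group_targets(affected)
```

Targets in the same group may still differ in their non-biomass effects, which this grouping does not capture.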
Reconstructing a lung cancer metabolic network model
We applied our cancer model building approach to reconstruct a model of non-small cell lung cancer metabolism, utilizing multiple gene expression datasets from a specific NCI-60 cell-line that has a relative abundance of such data (A549)9. The resulting model consists of 791 genes, 957 reactions and 730 metabolites (in comparison with 696 genes, 813 reactions and 665 metabolites in the generic cancer model). Comparing the pathway content of both models shows a markedly higher number of reactions in Vitamin A and B6 metabolism and in phenylalanine and tyrosine metabolism in the lung cancer model (in accordance with findings regarding altered metabolic function of these pathways in lung cancer10-12; Supp. Table 10). The growth-supporting genes predicted by the lung cancer model are ranked as highly essential based on shRNA gene silencing data measured in this cell-line (Kolmogorov-Smirnov p-value = 0.025), notably outperforming the predictions made by the generic cancer model, which are not ranked as significantly essential in this cell-line (the predictions made by the generic cancer model are ranked as essential via shRNA in 5 out of the 12 studied cancer cell-lines; Supp. Material, while those made by the lung cancer model are uniquely significant for the lung data). Importantly, the set of highly expressed genes in lung cancer that was used as input in the reconstruction of the lung cancer model is also not ranked as highly essential, testifying to the added value of building a cancer type-specific model in this case. Filtering the set of predicted growth-supporting genes in lung cancer to identify those that do not damage normal cell activity (as done with the generic cancer model) results in a set of 60 targets with a high cytostatic score, 50 of which overlap with those predicted by the generic cancer model (testifying to the general utility of the generic model; Supp. Table 2). This set of genes includes a target of the drug Pemetrexed, which is approved by the FDA for treatment of non-small cell lung cancer. Many of the targets predicted only by the lung cancer model belong to the same metabolic pathways as the known and experimental anticancer drug targets, including nucleotide metabolism and glycerophospholipid metabolism (Supp. Table 2). One particularly interesting case among these additional targets is SLC29A1 (nucleoside transporter), targeted by Troglitazone13, which was shown to have an anti-proliferative effect on the analyzed lung cancer cell-line A54914. Overall, these results demonstrate the potential of our computational approach for the future development and study of more refined cancer type-specific models.
Supplementary Figure 2: Synthetic lethal gene pairs whose knockdown is predicted to selectively inhibit
proliferation of cancer cells: (a) The cancer model lacks the alternative pathway for glycine production via SARDH,
and hence the double knockdown of both SHMT1 and AGXT activity is predicted to inhibit their glycine production.
Its existence in normal cells, however, is predicted to still enable glycine synthesis in face of such a combinatorial
knockdown. (b) The cancer model lacks several histidine transporters due to low expression levels, and hence the
double knockdown of both isozymes SLC6A14 and SLC38A2 (that do belong to the cancer model) is predicted to be
essential only for cancer cells. (c) The cancer model lacks several transporters of inorganic phosphate, making the
double knockout of the isozymes SLC20A1 and SLC20A2 essential for viability only for cancer cells.
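The logic of panel (a) can be reproduced on a toy network: a pair is selectively lethal when the cancer model has lost the alternative route that still rescues the normal model. The sketch below is illustrative only. The single-metabolite network, the flux bounds, and the names `max_growth` and `selective_pairs` are assumptions, and "lethal" is simplified to zero attainable biomass flux rather than the cytostatic score used in the main text.

```python
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

def max_growth(S, ub, biomass, knocked_out=()):
    """Maximal biomass flux with S v = 0 and 0 <= v <= ub after knocking out reactions."""
    ub = np.array(ub, dtype=float)
    for j in knocked_out:
        ub[j] = 0.0
    c = np.zeros(S.shape[1])
    c[biomass] = -1.0                              # linprog minimises, so maximise biomass
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=[(0.0, u) for u in ub], method="highs")
    return -res.fun if res.status == 0 else 0.0

def selective_pairs(S, ub_cancer, ub_normal, biomass, candidates, tol=1e-9):
    """Pairs that are individually tolerated, jointly lethal in cancer, tolerated in normal."""
    pairs = []
    for a, b in combinations(candidates, 2):
        singles_ok = all(max_growth(S, ub_cancer, biomass, (g,)) > tol for g in (a, b))
        lethal_in_cancer = max_growth(S, ub_cancer, biomass, (a, b)) <= tol
        viable_in_normal = max_growth(S, ub_normal, biomass, (a, b)) > tol
        if singles_ok and lethal_in_cancer and viable_in_normal:
            pairs.append((a, b))
    return pairs

# Toy network with one metabolite (glycine). Reactions 0, 1, 2 each produce it
# (loosely mimicking the SHMT1, AGXT and SARDH routes); reaction 3 is the biomass drain.
S = np.array([[1.0, 1.0, 1.0, -1.0]])
ub_normal = [10.0, 10.0, 10.0, 30.0]
ub_cancer = [10.0, 10.0, 0.0, 30.0]                # third route absent in the cancer model

pairs = selective_pairs(S, ub_cancer, ub_normal, biomass=3, candidates=[0, 1, 2])
```

Only the pair of the first two routes qualifies: in the cancer model their joint loss leaves no glycine source, whereas the normal model still grows via the third route.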
Predicting synthetic lethal gene targets suggests novel combinatorial drug therapies
The predicted synthetic lethal gene pairs suggest novel combinatorial drug therapies. For 9 of the predicted synergistic selective gene pairs there are either approved or experimental drugs (not necessarily cancer related) that target both genes (this relative paucity is expected given the relative scarcity of treatments attacking multiple metabolic targets). In one such pair, both genes (IMPDH1 and IMPDH2, which code for isoenzymes of inosine monophosphate dehydrogenase) are targeted by the same approved anticancer drug (either Mercaptopurine or Thioguanine). Another pair is FH (fumarate hydratase) and UROD (uroporphyrinogen decarboxylase), the latter targeted by a recently developed inhibitor15,16. Another pair consists of two isoenzymes of inositol monophosphatase (IMPA1 and IMPA2), both targeted by Lithium (clinically used for treating bipolar affective disorder), which was previously shown to reduce proliferation of esophageal cancer17. Three additional gene pairs include the gene SHMT1 (serine hydroxymethyltransferase), targeted by the drug Mimosine, which was previously shown to inhibit proliferation of human cancer cells18. Overall, the 133 synthetic lethal gene pairs include 4 additional known anticancer drug targets not identified in the single knockdown analysis described above, representing a highly significant enrichment (hypergeometric p-value < 2e-6). For 60 out of the 133 predicted synergistic gene pairs there already exists a drug targeting one of the genes, making them quite appealing from a practical standpoint, with the additionally predicted gene forming an interesting candidate for potential drug targeting. One example is the pair PNP (purine nucleoside phosphorylase) and AK5 (adenylate kinase), the former targeted by Cladribine, which is used to treat hairy cell leukemia19.
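A hypergeometric enrichment p-value of the kind quoted above is the upper tail of the hypergeometric distribution. The sketch below shows the calculation with the standard library; the function name `hypergeom_sf` and the numbers in the example are purely illustrative, since the background gene count and the number of known targets used in the actual test are not given here.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items from a population of N that contains K successes."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Illustrative numbers only: a population of 10 genes containing 3 known drug
# targets, from which 2 genes are predicted and at least 1 is a known target.
p_value = hypergeom_sf(k=1, N=10, K=3, n=2)
```

For the toy numbers this gives 24/45, i.e. one minus the probability (21/45) that neither predicted gene is a known target.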
Supplementary Figure 3: The number of predicted synergistic drug target pairs involving different genes with known somatic mutations in cancer20. For each gene in which cancer mutations were previously identified, the figure shows the number of predicted synergistic gene pairs in which it participates. The number of paired genes targeted by existing drugs is shown in red, while the number of paired genes that are not targeted by currently available drugs is shown in green.
References:
1. Jerby, L., Shlomi, T. & Ruppin, E. Computational reconstruction of tissue-specific metabolic models: application to human liver metabolism. Molecular systems biology 6, 1 (2010).
2. Duarte, N.C. et al. Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc Natl Acad Sci U S A 104, 1777-82 (2007).
3. Mahadevan, R. & Schilling, C.H. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab Eng 5, 264-76 (2003).
4. Ertel, A., Verghese, A., Byers, S.W., Ochs, M. & Tozeren, A. Pathway-specific differences between tumor cell lines and normal and tumor tissue cells. Mol Cancer 5, 55 (2006).
5. Luo, B. et al. Highly parallel identification of essential genes in cancer cells. Proc Natl Acad Sci U S A 105, 20380-5 (2008).
6. Futreal, P.A. et al. A census of human cancer genes. Nature Reviews Cancer 4, 177-183 (2004).
7. Varma, A., Boesch, B.W. & Palsson, B.O. Biochemical production capabilities of Escherichia coli. Biotechnol Bioeng 42, 59-73 (1993).
8. Shlomi, T. et al. Systematic condition-dependent annotation of metabolic genes. Genome Res 17, 1626-33 (2007).
9. Grever, M.R., Schepartz, S.A. & Chabner, B.A. The National Cancer Institute: cancer drug discovery and development program. Semin Oncol 19, 622-38 (1992).
10. Johansson, M. et al. Serum B vitamin levels and risk of lung cancer. JAMA 303, 2377-85 (2010).
11. Willett, W.C. Vitamin A and lung cancer. Nutr Rev 48, 201-11 (1990).
12. Yang, Q. et al. Urinary metabonomic study of lung cancer by a fully automatic hyphenated hydrophilic interaction/RPLC MS system. Journal of Separation Science 33, 1495-1503 (2010).
13. Leung, G., Man, R. & Tse, C. Effect of thiazolidinediones on equilibrative nucleoside transporter-1 in human aortic smooth muscle cells. Biochemical pharmacology 70, 355-362 (2005).
14. Tsubouchi, Y. et al. Inhibition of human lung cancer cell growth by the peroxisome proliferator-activated receptor-[gamma] agonists through induction of apoptosis. Biochemical and Biophysical Research Communications 270, 400-405 (2000).
15. Ito, E. et al. Uroporphyrinogen decarboxylase is a radiosensitizing target for head and neck cancer. Sci Transl Med 3, 67ra7 (2011).
16. Phillips, J.D., Bergonia, H.A., Reilly, C.A., Franklin, M.R. & Kushner, J.P. A porphomethene inhibitor of uroporphyrinogen decarboxylase causes porphyria cutanea tarda. Proc Natl Acad Sci U S A 104, 5079-84 (2007).
17. Cohen, Y., Chetrit, A., Sirota, P. & Modan, B. Cancer morbidity in psychiatric patients: influence of lithium carbonate treatment. Medical Oncology 15, 32-36 (1998).
18. Chang, H., Lee, T., Chuang, L., Yen, M. & Hung, W. Inhibitory effect of mimosine on proliferation of human lung cancer cells is mediated by multiple mechanisms. Cancer letters 145, 1-8 (1999).
19. Else, M. et al. Long term follow up of 233 patients with hairy cell leukaemia, treated initially with pentostatin or cladribine, at a median of 16 years from diagnosis. British journal of haematology 145, 733-740 (2009).
20. Forbes, S.A. et al. The Catalogue of Somatic Mutations in Cancer (COSMIC). Curr Protoc Hum Genet Chapter 10, Unit 10 11 (2008).