Pan-cancer analysis of prognostic genes
Jordan Anaya
Omnes Res, omnesres.com, @omnesresnetwork
Abstract
Overlaps of prognostic genes
In this study I used publicly available clinical and RNA-SEQ data from the
TCGA to investigate each gene’s correlation with survival in 16 different cancers,
comprising 6,495 patients. As the measure of correlation I used multivariate
Cox regression, with gene expression, grade, sex, and age as the covariates. To
improve the performance of the models, gene expression was inverse normal
transformed.
Cancers showed large differences in numbers of significantly
correlated genes, which could not be explained by sample size or events. However,
even cancers with low signal to noise displayed meaningful expression patterns of
protective and harmful genes, and gene set enrichments with MSigDB. The most
significant protective and harmful genes were not shared across cancers, but these
genes were enriched in gene sets that were shared across certain groups of
cancers. These groups of cancers were independently recapitulated by
unsupervised clustering of Cox coefficients, both for individual genes and for gene
programs. This is the first time comprehensive lists of prognostic genes have been
made publicly available, and the first time cancers have been compared using a
measure of correlation to survival, which contains more information than expression,
including hidden information such as treatment response.
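The inverse normal transform mentioned above can be sketched as follows. This is a minimal rank-based version using only the Python standard library. The Blom offset `c = 3/8` and the averaging of tied ranks are assumptions, since the poster does not state which variant was used, and the expression values are made up for illustration.

```python
from statistics import NormalDist

def inverse_normal_transform(values, c=0.375):
    """Rank-based inverse normal transform.

    The offset c (Blom's 3/8) and the tie handling are assumptions,
    not details taken from the poster.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank across the tie
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()  # standard normal: mean 0, standard deviation 1
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]

# Made-up expression values for one gene across five patients.
expression = [5.0, 0.1, 250.0, 12.0, 0.1]
transformed = inverse_normal_transform(expression)
```

The transform keeps only the rank order of the expression values, so a single extreme value such as 250.0 no longer dominates the covariate passed to the Cox model.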
Clustering of cancers using gene Cox
coefficients
Cancers display a range of p-value distributions
The overlaps of the 100 most significant prognostic genes in each
cancer.
Overlaps of gene sets
Cancers were clustered using normalized Cox coefficients of prognostic genes. In
general, cancers that shared gene sets clustered together.
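As an illustration of clustering cancers on coefficient profiles, here is a small agglomerative clustering sketch. Euclidean distance and average linkage are assumptions, since the poster does not state the metric or linkage. The cancer labels and coefficient values are hypothetical toy data.

```python
from itertools import combinations
from math import dist

def cluster_distance(profiles, members_a, members_b):
    """Average Euclidean distance over all cross-cluster pairs."""
    pairs = [(x, y) for x in members_a for y in members_b]
    return sum(dist(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order made."""
    clusters = {name: [name] for name in profiles}
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(
            combinations(sorted(clusters), 2),
            key=lambda p: cluster_distance(profiles, clusters[p[0]], clusters[p[1]]),
        )
        merges.append((a, b))
        clusters[a + "+" + b] = clusters.pop(a) + clusters.pop(b)
    return merges

# Hypothetical normalized Cox coefficients for three genes in four cancers.
profiles = {
    "cancer_A": [1.0, 0.9, -0.5],
    "cancer_B": [0.9, 1.0, -0.4],
    "cancer_C": [-1.0, -0.8, 0.6],
    "cancer_D": [-0.9, -1.0, 0.5],
}
merges = average_linkage(profiles)
```

In this toy example the two cancers with positive coefficients for the first two genes merge first, and the two with negative coefficients merge next.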
Clustering of cancers using gene programs
While some cancers, such as LGG, show a large number of genes with p-values below 0.05,
other cancers, such as STAD, have a nearly flat distribution of p-values. See
Table 1 for more details.
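To show how a flat p-value distribution translates into few genes passing an FDR cutoff, here is a minimal sketch of the Benjamini-Hochberg adjustment. The poster does not say which FDR procedure was used, so Benjamini-Hochberg is an assumption, and the p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def count_significant(pvalues, alpha=0.05):
    """Number of tests whose adjusted p-value is at or below alpha."""
    return sum(1 for q in benjamini_hochberg(pvalues) if q <= alpha)

# Made-up p-values standing in for per-gene Cox regression results.
pvalues = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(pvalues)
```

When p-values are roughly uniform, as in a flat histogram, each adjusted value is pushed toward 1 and few or no genes remain below the cutoff.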
Harmful and protective genes display opposite
expression patterns
I clustered patients with the 100 most significant protective genes and 100 most
significant harmful genes. In general, protective genes showed very similar
expression patterns across patients, and this same trend was seen for harmful
genes. This has important implications for identifying a small gene set to predict
patient survival. There are thousands of combinations of genes that would give very
similar predictions, making identification of one set of genes of questionable utility. It
is important to note that while none of the genes for STAD passed an FDR cutoff, they
still showed meaningful expression patterns.
The prognostic genes in any cancer can cluster
patients into two statistically different groups
Using the 250 most significant harmful genes in each cancer, I found the
100 most enriched gene sets with MSigDB. These are the overlaps of
those 100 gene sets. Some cancers, such as LIHC, LUAD, and KIRP,
share large numbers of gene sets.
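The overlap counts described above amount to pairwise set intersections. A minimal sketch follows. The cancer labels and gene set names are hypothetical placeholders, not actual MSigDB results.

```python
from itertools import combinations

def pairwise_overlaps(top_items):
    """Map each pair of cancers to the number of items they share.

    top_items: {cancer: iterable of gene or gene set names}
    """
    return {
        (a, b): len(set(top_items[a]) & set(top_items[b]))
        for a, b in combinations(sorted(top_items), 2)
    }

# Hypothetical top enriched gene sets per cancer (placeholder names).
top_sets = {
    "cancer_A": ["SET_1", "SET_2", "SET_3"],
    "cancer_B": ["SET_2", "SET_3", "SET_4"],
    "cancer_C": ["SET_5"],
}
overlaps = pairwise_overlaps(top_sets)
```

The same function applies to the overlaps of the 100 most significant prognostic genes, since it only needs one list of names per cancer.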
Characteristics of datasets and patients
included in this study
Using established gene programs from Hoadley et al., I found the average
normalized Cox coefficient for each cancer across each program. The programs and
cancers were then clustered with hierarchical clustering. Overall, the same cancers
that shared gene sets and clustered together on individual gene Cox
coefficients again clustered together.
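The per-program averaging can be sketched as below. The sketch assumes the Cox coefficients have already been normalized (the poster does not describe the normalization) and that genes missing from a cancer are simply skipped. All names and values are hypothetical.

```python
def program_scores(coefficients, programs):
    """Average normalized Cox coefficient per (cancer, program).

    coefficients: {cancer: {gene: normalized Cox coefficient}}
    programs: {program name: [genes in the program]}
    Genes absent from a cancer are skipped (an assumption); a program
    with no measured genes in a cancer gets None.
    """
    scores = {}
    for cancer, coefs in coefficients.items():
        scores[cancer] = {}
        for program, genes in programs.items():
            present = [coefs[g] for g in genes if g in coefs]
            scores[cancer][program] = sum(present) / len(present) if present else None
    return scores

# Hypothetical inputs: two cancers, three gene programs.
coefficients = {
    "cancer_A": {"g1": 0.5, "g2": 1.5, "g3": -1.0},
    "cancer_B": {"g1": -0.5, "g3": 1.0},
}
programs = {
    "program_X": ["g1", "g2"],
    "program_Y": ["g3", "g4"],
    "program_Z": ["g9"],
}
scores = program_scores(coefficients, programs)
```

Each cancer then has one score per program, giving a small cancer-by-program matrix that can be passed to a hierarchical clustering routine.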
Conclusions
• RNA-SEQ can find meaningful survival correlations across 16
different cancers
• Cancers show a wide range in the number of genes that meet an FDR
cutoff, which should inform the interpretation of p-values found for
individual genes
• Cancers do not share prognostic genes, but do share gene sets
• Cancers can be clustered with Cox coefficients of individual genes and
gene programs
• The same groupings of cancers (for example LIHC/LUAD/KIRP and
COAD/GBM/LUSC) were found with three independent methods,
indicating that using prognostic information to compare cancers can
find unappreciated commonalities among cancers.
These Kaplan-Meier plots were made with the clusters from above. Not surprisingly,
the survival of the LGG clusters differed with high statistical significance. More
surprising was that the STAD clusters also differed significantly, indicating that even the
cancers with low signal to noise contain useful biological information. As a result, all
cancers were included in subsequent analyses.
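For reference, the Kaplan-Meier estimate behind such plots can be sketched in a few lines. This is a generic product-limit estimator, not the code used for the poster. The follow-up times and event flags are made up, and the statistical test used to compare clusters is not specified here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    times: follow-up time per patient
    events: True if the patient died at that time, False if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Made-up follow-up data for one patient cluster.
times = [2, 3, 3, 5, 8]
events = [True, True, False, True, False]
curve = kaplan_meier(times, events)
```

Running the estimator once per patient cluster gives the curves that are drawn side by side in a Kaplan-Meier plot.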
References
Hoadley, K. A. et al. Multiplatform analysis of 12 cancer types reveals molecular
classification within and across tissues of origin. Cell 158, 929-944 (2014).