Pan-cancer analysis of prognostic genes

Jordan Anaya
Omnes Res, omnesres.com, @omnesresnetwork

Abstract

In this study I used publicly available clinical and RNA-SEQ data from the TCGA to investigate each gene’s correlation with survival in 16 different cancers, comprising 6,495 patients. As the measure of correlation I used multivariate Cox regression, with gene expression, grade, sex, and age as the covariates. To improve the performance of the models, gene expression was inverse normal transformed. Cancers showed large differences in the number of significantly correlated genes, which could not be explained by sample size or number of events. However, even cancers with low signal-to-noise displayed meaningful expression patterns of protective and harmful genes, and gene set enrichments with MSigDB. The most significant protective and harmful genes were not shared across cancers, but these genes were enriched in gene sets that were shared across certain groups of cancers. These groups of cancers were independently recapitulated by unsupervised clustering of Cox coefficients, both for individual genes and for gene programs. This is the first time comprehensive lists of prognostic genes have been made publicly available, and the first time cancers have been compared using a measure of correlation with survival, which contains more information than expression, including hidden information such as treatment response.

Overlaps of prognostic genes

The overlaps of the 100 most significant prognostic genes in each cancer.

Overlaps of gene sets

Clustering of cancers using gene Cox coefficients

Cancers were clustered using normalized Cox coefficients of prognostic genes. In general, cancers that shared gene sets clustered together.

Clustering of cancers using gene programs

Cancers display a range of p-value distributions

While some cancers, such as LGG, show a large number of genes with p-values below 0.05, other cancers, such as STAD, have a nearly flat distribution of p-values.
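The difference between a p-value distribution skewed toward zero and a nearly flat one can be illustrated with a false discovery rate correction. The poster does not state which FDR procedure was used, so the sketch below assumes Benjamini-Hochberg; the function names and the toy p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order.

    Assumption: BH is used here only for illustration; the poster does not
    name its FDR procedure.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def count_significant(p_values, fdr=0.05):
    """Number of genes whose q-value meets the FDR cutoff."""
    return sum(q <= fdr for q in benjamini_hochberg(p_values))


# Hypothetical toy data, 20 "genes" each (not real TCGA results).
# A skewed distribution: half of the p-values are very small.
skewed = [0.0001 * (i + 1) for i in range(10)] + [0.1 * (i + 1) for i in range(10)]
# A flat distribution: p-values spread evenly between 0 and 1.
flat = [(i + 0.5) / 20 for i in range(20)]

if __name__ == "__main__":
    print(count_significant(skewed))  # 10
    print(count_significant(flat))    # 0
```

In the flat toy example one gene has a raw p-value below 0.05, yet none survives the correction, which is the situation a uniform p-value distribution produces.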
See Table 1 for more details.

Characteristics of datasets and patients included in this study

Harmful and protective genes display opposite expression patterns

I clustered patients with the 100 most significant protective genes and the 100 most significant harmful genes. In general, protective genes showed very similar expression patterns across patients, and the same trend was seen for harmful genes. This has important implications for identifying a small gene set to predict patient survival: thousands of combinations of genes would give very similar predictions, making the identification of any one set of genes of questionable utility. It is important to note that although none of the genes for STAD passed an FDR cutoff, they still showed meaningful expression patterns.

The prognostic genes in any cancer can cluster patients into two statistically different groups

Using the 250 most significant harmful genes in each cancer, I found the 100 most enriched gene sets with MSigDB. The overlaps of those 100 gene sets are shown. Some cancers, such as LIHC, LUAD, and KIRP, share large numbers of gene sets.

Using established gene programs from Hoadley et al., I found the average normalized Cox coefficient for each cancer across each program. The programs and cancers were then clustered with hierarchical clustering. Overall, the same cancers that shared gene sets and clustered together with individual gene Cox coefficients again clustered together.
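The gene program analysis described above can be sketched in two steps: average the normalized Cox coefficients over the genes of each program, then cluster the cancers on those averages. The poster specifies neither the normalization, the distance metric, nor the linkage, so the sketch assumes already-normalized coefficients, Euclidean distance, and average linkage, and it clusters cancers only (not programs). All names and numbers are hypothetical placeholders, not values from the study or from Hoadley et al.

```python
from math import sqrt


def program_scores(cox_coeffs, programs):
    """Average normalized Cox coefficient per program for one cancer."""
    return [sum(cox_coeffs[g] for g in genes) / len(genes)
            for genes in programs.values()]


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (label_a, label_b, distance)."""
    clusters = {name: [name] for name in profiles}  # cluster label -> members
    merges = []
    while len(clusters) > 1:
        best = None
        labels = sorted(clusters)
        for i, a in enumerate(labels):
            for b in labels[i + 1:]:
                # Average pairwise distance between the members of a and b.
                pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
                d = sum(euclidean(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((a, b, d))
        clusters[a + "+" + b] = clusters.pop(a) + clusters.pop(b)
    return merges


# Hypothetical gene programs and normalized Cox coefficients (illustration only).
programs = {"program_1": ["g1", "g2"], "program_2": ["g3", "g4"]}
cox = {
    "cancer_A": {"g1": 1.0, "g2": 0.8, "g3": -0.9, "g4": -0.7},
    "cancer_B": {"g1": 0.9, "g2": 0.7, "g3": -1.0, "g4": -0.6},
    "cancer_C": {"g1": -0.8, "g2": -1.0, "g3": 0.6, "g4": 1.0},
}
profiles = {name: program_scores(coeffs, programs) for name, coeffs in cox.items()}
merges = average_linkage(profiles)

if __name__ == "__main__":
    for a, b, d in merges:
        print(f"merge {a} with {b} at distance {d:.3f}")
```

In this toy input, cancer_A and cancer_B have similar program averages and merge first, while cancer_C joins last. With real data a library routine for hierarchical clustering would normally replace the hand-written loop.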
These Kaplan-Meier plots were made with the clusters from above. Not surprisingly, the LGG clusters differed with high statistical significance. More surprisingly, the STAD clusters were also highly significantly different, indicating that even cancers with low signal-to-noise contain useful biological information. As a result, all cancers were included in subsequent analyses.

Conclusions

• RNA-SEQ can find meaningful survival correlations across 16 different cancers
• Cancers show a wide range in the number of genes that meet an FDR cutoff, which should inform the interpretation of p-values found for individual genes
• Cancers do not share prognostic genes, but do share gene sets
• Cancers can be clustered with Cox coefficients of individual genes and gene programs
• The same groupings of cancers (for example LIHC/LUAD/KIRP and COAD/GBM/LUSC) were found with three independent methods, indicating that using prognostic information to compare cancers can reveal unappreciated commonalities among them

References

Hoadley, K. A. et al. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell 158, 929-944 (2014).