Integration of methylation, long non-coding RNA and mRNA expression data in Lung Cancer

Travers Ching, Sijia Huang, Fangxiu Xu, Jinli Qu, Jingxin Li, Herbert Yu, Biyun Qian, Lana Garmire
Dept. of Molecular Biosciences and Bioengineering, University of Hawaiʻi at Mānoa
Epidemiology Program, University of Hawaii Cancer Center

Introduction
Background:
• Lung cancer is the most commonly diagnosed cancer (12.7%)
• Lung cancer accounts for the most cancer deaths (18.2%)
The samples:
• 24 tissue samples (12 tumor/normal pairs) collected from 12 lung cancer patients
The high-throughput microarray platforms:
• SBC lncRNA chip (Shanghai Biotechnology Co., Ltd.)
• Illumina HumanMethylation450 BeadChip array (450K)

Significant differences between tumor and normal tissues
Methods: a general linear model (GLM) is used to test differential expression (DE).
• Model: $\mu_i = \mathbf{x}_{1i}^{T}\boldsymbol{\beta} + \mathbf{x}_{2i}^{T}\mathbf{P} + \beta_0$, where $\mu_i$ is the expression level of sample $i$, $\mathbf{x}_{1i}$ encodes the tissue type (tumor or normal) and $\mathbf{x}_{2i}$ encodes the patient pairing.
Results:
• Significant DE (after multiple hypothesis testing (MHT) correction):
  • mRNA: 10,000 DE transcripts
  • lncRNA: 9,000 DE transcripts
  • methylation: 100,000 DE CpG sites
• Strong separation between tissue types (Fig. 1)
Fig. 1. Top: PCA of expression data. Middle: volcano plot of 450K data. Bottom: example correlation plot of an lncRNA and its target gene.

LncRNAs play important roles in cancer
Predictions of lncRNA targets:
• Cis targets: targets based on genomic proximity
• Trans targets: targets based on sequence homology
• Enrichment analysis of BIOCARTA and KEGG pathways

Term                 Trans count   Trans P-value   Cis count   Cis P-value
Focal adhesion       22            0.0046          50          0.00011
Adherens junction    10            0.029           25          0.00014
Axon guidance        31            0.0049          31          0.0049
Pathways in cancer   26            0.077           66          0.0041
ATM Signaling        6             0.014           8           0.041
Cell Cycle           5             0.091           8           0.083

Methylation modules correlate well with gene expression
Methylation network analysis using the "Spin Glass" community detection algorithm:
• Map epigenetic hotspots onto a protein interaction network (HPRD reference)
• Validate methylation results with mRNA data
• Hierarchical clustering of hotspots
• Clustering of the seed genes (Fig. 2B) separates cancer and normal tissue samples
Fig. 2. A: example epigenetically modified hotspot (epimod). B: mRNA expression heatmap of the 23 epimod "seeds".

Effective prediction of gene expression from methylation features
Question: "How well can DNA methylation predict gene expression?"
• Split data into training (80%) and testing (20%) sets
• Build the model on the training set
• Evaluate performance on the test set
• Feature set (287 total): methylation features and transcript features
• Correlation-based Feature Selection (CFS) selects 32 features
• Performance (AUC):
  • Random Forest: 0.806
  • Linear SVM: 0.770
  • Gaussian SVM: 0.794
  • Logistic Regression: 0.76
Fig. 3. ROC curves on the holdout test set for the prediction of up-regulated genes.

Future work
• Additional features for the gene prediction classifier (e.g., histone modification, intron information)
• Apply other non-linear classification methods
• Find better metrics for lncRNA target prediction
• Generalize the spin glass algorithm to other types of datasets (it was originally intended for Illumina 27K)

Acknowledgements
Funding: This work is supported by the University of Hawaii Cancer Center Faculty Startup Grant and the Collaboration Enhancement Award from NIMHD Grant 5 U54 MD008149-07.
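The DE model in the Methods section includes a tissue term and a patient-pairing term. The sketch below fits that model for a single feature, assuming a Gaussian GLM (ordinary least squares) with one indicator column per patient. The poster does not state the error distribution or link function, and the function name `paired_glm_tissue_effect` is hypothetical. It returns the tissue coefficient $\beta$ and its t-statistic.

```python
import numpy as np


def paired_glm_tissue_effect(y, tissue, patient):
    """Fit y ~ intercept + tissue + patient indicators by ordinary least squares.

    Assumption: Gaussian errors with identity link, one feature at a time.
    Returns (tissue coefficient, t-statistic of that coefficient).
    """
    y = np.asarray(y, dtype=float)
    tissue = np.asarray(tissue, dtype=float)
    levels = sorted(set(patient))
    # One indicator column per patient; the first patient is the reference level.
    pairing = np.column_stack(
        [[1.0 if p == level else 0.0 for p in patient] for level in levels[1:]]
    )
    X = np.column_stack([np.ones_like(y), tissue, pairing])
    coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - rank)  # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the coefficient estimates
    beta_tissue = coef[1]
    return beta_tissue, beta_tissue / np.sqrt(cov[1, 1])


# Hypothetical values for four tumor/normal pairs.
tumor = [5.1, 6.3, 4.8, 7.0]
normal = [4.0, 5.9, 4.1, 5.2]
beta, t = paired_glm_tissue_effect(tumor + normal, [1] * 4 + [0] * 4, [0, 1, 2, 3] * 2)
```

With exactly one tumor and one normal sample per patient, this additive model gives the same tissue estimate and t-statistic as a paired t-test on the tumor minus normal differences. This is why the patient term accounts for the pairing.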
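The DE counts above are reported "after MHT correction", but the poster does not name the procedure. As an assumption, the sketch below uses the Benjamini-Hochberg false discovery rate adjustment, a common choice for genome-wide tests. The function names are illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues, alpha=0.05):
    """Indices of features whose adjusted p-value is at most alpha."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q <= alpha]
```

For example, raw p-values `[0.01, 0.04, 0.03, 0.5]` adjust to roughly `[0.04, 0.053, 0.053, 0.5]`. At `alpha=0.05`, only the first feature passes.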
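The poster defines cis targets as genes close to the lncRNA in the genome. It gives no distance threshold. The sketch below therefore uses a hypothetical 10 kb window, together with a hypothetical `Locus` record (chromosome, start, end), to show how a proximity rule selects candidate targets.

```python
from typing import List, NamedTuple


class Locus(NamedTuple):
    """Hypothetical genomic interval record (1-based, inclusive coordinates)."""
    name: str
    chrom: str
    start: int
    end: int


def cis_targets(lncrna: Locus, genes: List[Locus], window: int = 10_000) -> List[str]:
    """Genes on the same chromosome that overlap the lncRNA extended by `window` bp.

    The 10 kb default is an illustrative assumption, not the poster's threshold.
    """
    lo, hi = lncrna.start - window, lncrna.end + window
    return [g.name for g in genes
            if g.chrom == lncrna.chrom and g.end >= lo and g.start <= hi]
```

Trans targets (sequence homology) would need an alignment step instead, so this sketch does not cover them.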
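The pathway table reports, for each term, a count of target genes and a p-value. The poster does not name the enrichment test. A one-sided hypergeometric test is a common choice, and the sketch below assumes it: `k` is the number of predicted targets in the pathway, `n` the number of predicted targets, `K` the pathway size, and `N` the background gene count. All inputs are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the pathway, n drawn).

    Assumed enrichment test; the poster does not state which test it used.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

For example, with 10 background genes, a 4-gene pathway and 3 predicted targets, observing at least 2 targets in the pathway has probability (6·6 + 4·1)/120 = 1/3.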
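CFS reduces the 287 features to 32. The sketch below is a simplified version. It scores a subset with Hall's merit, $k\bar r_{cf} / \sqrt{k + k(k-1)\bar r_{ff}}$, using absolute Pearson correlations, and grows the subset by greedy forward search. The original CFS typically uses best-first search, and the poster does not give its search strategy or correlation measure, so both are assumptions here.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cfs_merit(subset, features, target):
    """Merit = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[j], target)) for j in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(features, target, max_features=None):
    """Greedy forward selection: add the feature that most improves the merit."""
    selected, best = [], 0.0
    remaining = list(range(len(features)))
    while remaining and (max_features is None or len(selected) < max_features):
        candidate = max(remaining, key=lambda j: cfs_merit(selected + [j], features, target))
        merit = cfs_merit(selected + [candidate], features, target)
        if merit <= best:  # stop when no candidate improves the subset
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = merit
    return selected
```

The denominator penalizes redundancy. A duplicated feature raises $\bar r_{ff}$ to 1, so adding it leaves the merit unchanged.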
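The prediction workflow splits the data 80/20, trains on the larger part, and reports AUC on the holdout set. The sketch below shows an index split and an AUC computed as the Mann-Whitney probability that a random positive scores above a random negative, with ties counting one half. The function names and the fixed seed are illustrative, and the classifiers themselves are not reproduced here.

```python
import random


def split_80_20(n_samples, seed=0):
    """Randomly split sample indices into 80% training and 20% testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(0.2 * n_samples)
    return indices[n_test:], indices[:n_test]


def roc_auc(labels, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For labels `[1, 0, 1, 0]` and scores `[0.9, 0.3, 0.4, 0.6]`, three of the four positive/negative pairs are ranked correctly, giving an AUC of 0.75.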