Integration of methylation, long non-coding RNA and mRNA expression
data in Lung Cancer
Travers Ching, Sijia Huang, Fangxiu Xu, Jinli Qu, Jingxin Li, Herbert Yu, Biyun Qian, Lana Garmire
Dept. of Molecular Biosciences and Bioengineering, University of Hawaiʻi at Mānoa
Epidemiology Program, University of Hawaii Cancer Center
Introduction
Background:
• Lung cancer is the most commonly diagnosed cancer (12.7%)
• Lung cancer accounts for the most cancer deaths (18.2%)
The samples:
• 24 paired tissue samples collected from 12 lung cancer patients
The high-throughput microarray platforms:
• SBC lncRNA chip (Shanghai Biotechnology Co., Ltd.)
• Illumina HumanMethylation450 BeadChip array (450K)
Methylation modules correlate well with gene expression
Methylation network analysis using “Spin Glass” community detection algorithm
• Map epigenetic hotspots onto a protein interaction network (HPRD reference)
• Validate methylation results with mRNA data
• Hierarchical clustering of hotspots
• Clustering of seed genes (Fig. 2B) shows separation of cancer and normal tissue samples
Fig. 2: A: example epigenetically modified hotspot (epimod). B: mRNA expression heatmap of the 23 epimod “seeds”
Effective prediction of gene expression from methylation features
Question: “How well can DNA methylation predict gene expression?”
• Split data into training (80%) and testing (20%) sets
• Build model on training set
• Evaluate performance on test set
• Feature set (287 total): methylation features, transcript features
• Correlation Feature Selection (CFS) selects 32 features
Significant differences between tumor and normal tissues
Methods: Use of a general linear model (GLM) to test differential expression (DE).
• Model: $\mu_i = x_{1i}^{T}\beta + x_{2i}^{T}P + \beta_0$, where $\mu_i$ is the expression level, $x_{1i}^{T}\beta$ is the tissue term, and $x_{2i}^{T}P$ is the patient pairing term
Results:
• Significant DE (after multiple hypothesis testing (MHT) correction)
  • mRNA: 10,000 DE transcripts
  • lncRNA: 9,000 DE transcripts
  • methylation: 100,000 DE CpG sites
• Strong separation between tissue types (Fig. 1)
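The paired differential-expression test and the MHT correction described above can be illustrated with a minimal sketch. It rests on two assumptions that the poster does not state: the GLM is fitted by ordinary least squares on a balanced paired design (one tumor and one normal sample per patient), in which case the tissue coefficient equals the mean within-patient difference and its t-statistic matches the paired t-test; and the correction is Benjamini-Hochberg. The function names `paired_tissue_effect` and `benjamini_hochberg` are hypothetical.

```python
import math


def paired_tissue_effect(tumor, normal):
    """Tissue coefficient and t-statistic for one feature.

    Assumes a balanced paired design, so the least-squares tissue
    effect reduces to the mean tumor-minus-normal difference.
    """
    if len(tumor) != len(normal) or len(tumor) < 2:
        raise ValueError("need at least two matched tumor/normal pairs")
    diffs = [t - n for t, n in zip(tumor, normal)]
    n_pairs = len(diffs)
    beta = sum(diffs) / n_pairs
    variance = sum((d - beta) ** 2 for d in diffs) / (n_pairs - 1)
    std_err = math.sqrt(variance / n_pairs)
    if std_err == 0:
        raise ValueError("zero variance in paired differences")
    return beta, beta / std_err


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    This correction is an assumption; the poster only says "MHT".
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to keep the result monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In practice each transcript or CpG site would get its own t-statistic, the statistics would be converted to p-values with a t-distribution on (number of pairs minus 1) degrees of freedom, and the resulting p-values would be passed through the correction together.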
LncRNAs play important roles in cancer
Predictions of lncRNA targets
• Cis targets: targets based on proximity
• Trans targets: targets based on sequence homology
• Enrichment analysis of BIOCARTA and KEGG pathways
Term                 Trans-target count   Trans-target P-value   Cis-target count   Cis-target P-value
Focal adhesion       22                   0.0046                 50                 0.00011
Adherens junction    10                   0.029                  25                 0.00014
Axon guidance        31                   0.0049                 31                 0.0049
Pathways in cancer   26                   0.077                  66                 0.0041
ATM Signaling        6                    0.014                  8                  0.041
Cell Cycle           5                    0.091                  8                  0.083
Fig. 1. Top: PCA of expression data. Middle: volcano plot of 450K. Bottom: example correlation plot of lncRNA/target gene
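To make the Count and P-value columns concrete, here is a minimal sketch of a pathway over-representation test. The poster does not say which enrichment test or tool produced its values, so the one-sided hypergeometric test below is an assumption, and the function name `hypergeom_enrichment_p` and all counts in the usage note are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(hits, n_targets, pathway_size, background):
    """One-sided P(X >= hits) under the hypergeometric distribution.

    hits:         predicted target genes that fall in the pathway
    n_targets:    total predicted target genes
    pathway_size: genes annotated to the pathway
    background:   all genes considered
    """
    if not (0 <= hits <= min(n_targets, pathway_size)):
        raise ValueError("hits is inconsistent with the other counts")
    if max(n_targets, pathway_size) > background:
        raise ValueError("background is smaller than a subset")
    total = comb(background, n_targets)
    upper = min(n_targets, pathway_size)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(pathway_size, k) * comb(background - pathway_size, n_targets - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

For example, with toy numbers, `hypergeom_enrichment_p(2, 3, 4, 10)` asks how surprising it is to see 2 or more pathway genes among 3 targets when 4 of 10 background genes belong to the pathway.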
• Performance (AUC):
  • Random Forest: 0.806
  • Linear SVM: 0.770
  • Gaussian SVM: 0.794
  • Logistic Regression: 0.76
Fig. 3. ROC curves on holdout test set for the prediction of up-regulated genes
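The evaluation protocol behind these AUC values (an 80%/20% training/testing split, then scoring on the held-out set) can be sketched as follows. This is a classifier-agnostic illustration, not the authors' pipeline: the function names `split_indices` and `roc_auc` are hypothetical, the random seed is arbitrary, and the AUC is computed with the rank-based definition (the probability that a randomly chosen positive example scores higher than a randomly chosen negative one).

```python
import random


def split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and return (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = up-regulated).

    Counts, over all positive/negative pairs, how often the positive
    example gets the higher score; ties count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        (p > q) + 0.5 * (p == q) for p in positives for q in negatives
    )
    return wins / (len(positives) * len(negatives))
```

Any of the listed classifiers could be trained on the `train` indices, and its predicted scores for the `test` indices passed to `roc_auc` together with the true labels.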
Future work
• Additional features for gene prediction classifier (e.g., histone modification, intron information)
• Apply other non-linear classification methods
• Find better metrics for lncRNA target prediction
• Generalize spin glass algorithm to other types of datasets (originally intended for Illumina 27K)
Acknowledgements
Funding: This work is supported by the University of Hawaii Cancer Center Faculty Startup Grant, and the
Collaboration Enhancement Award from NIMHD Grant 5 U54 MD008149-07