Advances and challenges in computational modeling and statistical learning of biological systems

Qi Liu
Department of Biomedical Informatics
Vanderbilt University School of Medicine
[email protected]

BIG Data, Methods, Applications
(source: http://jdr.sagepub.com/content/90/5/561)

BIG Data
• Genomics
• Transcriptomics
• Proteomics
• Epigenomics
• Clinical Data

Methods
• Data mining
• Machine learning
• Regression models
• Exploratory analysis
• Statistics

Applications
• Disease marker
• Prediction model
• Classification
• Precise treatment
• Hypothesis test

NGS Technologies
http://www.slideshare.net/mkim8/a-comparison-of-ngs-platforms
A decade's perspective on DNA sequencing technology. Elaine R. Mardis, Nature (2011) 470, 198-203

Patient, Technologies, Data Analysis, Integration and interpretation
• Genomics (WGS, WES)
  – Point mutation
  – Small indels
  – Copy number variation
  – Structural variation
• Transcriptomics (RNA-Seq)
  – Differential expression
  – Gene fusion
  – Alternative splicing
  – RNA editing
• Epigenomics (Bisulfite-Seq, ChIP-Seq)
  – Methylation
  – Histone modification
  – Transcription factor binding
• Integration and interpretation
  – Functional effect of mutation
  – Network and pathway analysis
  – Integrative analysis
• Further understanding of cancer and clinical applications
Shyr D, Liu Q. Biol Proced Online. (2013) 15, 4

Objectives
1. Understand relationships between different types of molecular data
2. Understand the phenotype
  – Latent: disease subtype
  – Observable: patient outcome

Data resources
• GTEx: http://www.gtexportal.org/home/
• TCGA: https://tcga-data.nci.nih.gov/tcga/
• http://www.nature.com/ng/journal/v45/n10/full/ng.2764.html

Inferring regulation networks
• Transcription (DNA to RNA): transcriptional regulation network (TF)
• Post-transcription (RNA to protein): post-transcription regulation network (miRNA, TF)
• Reveal the relationships between different molecular layers
  – The strength of association indicates regulation in trans.
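One simple way to quantify the strength of association between two molecular layers is a Pearson correlation across matched samples. The sketch below is illustrative only: the function name `cross_correlation` and the toy expression values are assumptions made for the example, not code or data from the talk.

```python
import numpy as np

def cross_correlation(regulators, targets):
    """Pearson correlation between every regulator and every target.

    Both inputs are (features x samples) arrays that share the same
    sample order. Returns a (n_regulators x n_targets) matrix.
    """
    r = np.asarray(regulators, dtype=float)
    t = np.asarray(targets, dtype=float)
    # Center each feature across samples.
    r = r - r.mean(axis=1, keepdims=True)
    t = t - t.mean(axis=1, keepdims=True)
    # Scale each feature to unit length so the dot product is Pearson r.
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    return r @ t.T

# Hypothetical toy data: one regulator and two candidate targets, five samples.
regulator_expr = [[1, 2, 3, 4, 5]]
target_expr = [
    [10, 8, 6, 4, 2],  # falls as the regulator rises
    [1, 3, 2, 5, 4],   # rises with the regulator
]

corr = cross_correlation(regulator_expr, target_expr)
print(corr)
```

A strongly negative entry, as for the first toy target, is the kind of inverse association one would look for between a repressor such as a miRNA and its target.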
miRNA integrative method
• Data sets: GSE10843, GSE10833 (mRNA, microRNA and protein profiles)
• 79 miRNAs, 5144 genes
• miRNA-mRNA correlation: mRNA decay
• miRNA-ratio correlation (protein/mRNA ratio): translational repression
• miRNA-protein correlation: combined effect

Sequence features on site efficacy
Association of sequence features with estimated mRNA decay or translational repression
• microRNA-target interactions: 7235 functional relationships
  – Significant inverse correlation (p<0.005)
  – Binding evidence: supported by TargetScan, miRanda or MirTarget2
• Sequence features
  – Site type
  – Site location
  – Local AU-context
  – Additional 3' pairing

microRNA-target interactions: the relative contribution of translational repression
• 580 interactions
• 60 miRNAs
• 423 genes

Features on site efficacy for these two regulation types
• Site type
  – mRNA decay: 8mer is efficient
  – Translational repression: 8mer sites do not show significant efficacy
• Site location
  – mRNA decay: 3'UTR > ORF > 5'UTR
  – Translational repression: marginal significance in ORF

Features on site efficacy for these two regulation types
• AU-rich context appears to favor both mRNA decay and translational repression
• 3' pairing enhances mRNA decay, but disfavors efficacy for translational repression

miR-138 prefers translational repression
SW620 and SW480 (derived from the same patient)

                 SW620        SW480
source           lymph node   primary
metastasis       high         poor
miR-138 (log2)   3.06         6.39

[Figure, panels A-D: enriched gene sets, UP and DOWN]
Panel C
• GOLGI_VESICLE_TRANSPORT (FDR=0.07)
• KEGG_AMINOACYL_TRNA_BIOSYNTHESIS (FDR=0.03)
• KEGG_PROTEASOME (FDR=0.03)
• CYTOKINE_METABOLIC_PROCESS (FDR=0.09)
• FEEDING_BEHAVIOR (FDR=0.005)
Panel D
• (gene set name missing) (FDR=0.00001)
• KEGG_PRIMARY_IMMUNODEFICIENCY (FDR=0.002)
• GPROTEIN_COUPLED_RECEPTOR_SIGNALING (FDR=0.005)
• KEGG_ALLOGRAFT_REJECTION (FDR=0.005)
• KEGG_CELL_ADHESION_MOLECULES_CAMS (FDR=0.003)
• T_CELL_ACTIVATION (FDR=0.002)

Stage-dependent TF activity changes
• Samples: 123 Stage I, 55 Stage IV
• Data: mRNA, CNV, methylation, TF-target
• Limma: stage-dependent alterations
• Correlation: CNV effect
• Correlation: methylation effect
• Regression model: stage-dependent TF activity changes
[Figure, panels A-D]

Stage-dependent TF activity changes

Regulator            Target regulation  Effect size  FDR
GATA6                Up                  0.14        1.2e-13
NFIL3                Down               -0.12        1.0e-08
SREBF2               Up                  0.12        7.3e-08
SREBF1               Down               -0.08        1.0e-07
TBP                  Up                  0.05        1.4e-07
HLF                  Up                  0.11        7.5e-07
TCF12                Up                  0.10        3.1e-06
GATA1                Down               -0.07        1.6e-05
FOSB                 Up                  0.10        1.7e-05
RARA/RARB/RARG/RXRB  Up                  0.21        6.5e-05
REST                 Up                  0.14        9.2e-05
FOXF2                Down               -0.05        1.3e-04
FOXC1                Up                  0.09        1.7e-04
HMGA1                Up                  0.09        1.9e-04
E2F7                 Up                  0.12        3.6e-04
NKX2-1               Up                  0.06        8.2e-04

Challenges
• Complex structure, but limited sample size
• Cooperative regulation
• Incorporate prior knowledge
• Nonlinear effect
• Long range chromatin interaction
• Data heterogeneity
• Complexity and model sparsity

Individual omics analysis
Integrative omics analysis

Illustrative example of SNF steps
The advantage of the integrative procedure is that weak similarities (low-weight edges) disappear, helping to reduce the noise, and strong similarities (high-weight edges) present in one or more networks are added to the others. Additionally, low-weight edges supported by all networks are retained depending on how tightly connected their neighborhoods are across networks.

Methods
• Extension to more than 2 data types
• Inspired by the theoretical multiview learning framework developed for computer vision and image processing applications

Patient similarities for each data type compared to SNF fused similarity

Comparison of SNF with iCluster and concatenation

Challenges
• Systems-level probabilistic modeling of multiple data types
• Correlated data
• Missing values
• Dependence among genes

Thank you very much for your attention!
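The fusion step described in the SNF slides above can be sketched in a few lines. This is a simplified illustration of the cross-diffusion idea behind similarity network fusion, not the reference implementation: it uses plain row normalization, and the function names, the `k` and `iterations` defaults, and the toy patient data are all assumptions made for the example.

```python
import numpy as np

def similarity(x):
    """Gaussian similarity between patients (rows of x)."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / d2.mean())

def row_normalize(w):
    """Scale each row so that it sums to 1."""
    return w / w.sum(axis=1, keepdims=True)

def knn_kernel(w, k):
    """Keep each patient's k strongest similarities and zero the rest."""
    s = np.zeros_like(w)
    for i in range(w.shape[0]):
        idx = np.argsort(w[i])[-k:]  # indices of the k largest entries
        s[i, idx] = w[i, idx]
    return row_normalize(s)

def fuse(similarities, k=3, iterations=10):
    """Fuse two or more patient similarity matrices by cross-diffusion."""
    p = [row_normalize(np.asarray(w, dtype=float)) for w in similarities]
    s = [knn_kernel(np.asarray(w, dtype=float), k) for w in similarities]
    for _ in range(iterations):
        updated = []
        for v in range(len(p)):
            # Diffuse the average of the other networks through this
            # network's local (k-nearest-neighbor) structure.
            others = [p[u] for u in range(len(p)) if u != v]
            avg = sum(others) / len(others)
            updated.append(row_normalize(s[v] @ avg @ s[v].T))
        p = updated
    fused = sum(p) / len(p)
    return (fused + fused.T) / 2  # symmetrize the final network

# Hypothetical toy data: 6 patients, two data types, two patient groups.
view1 = np.array([[0.0], [0.2], [0.4], [3.0], [3.2], [3.4]])
view2 = np.array([[1.0], [1.1], [0.9], [5.0], [5.2], [4.9]])

fused = fuse([similarity(view1), similarity(view2)], k=3)
print(np.round(fused, 3))
```

Because each network only propagates information through its strongest local edges, similarities that are weak in every data type fade over the iterations, while edges supported by tightly connected neighborhoods persist. Passing more than two matrices to `fuse` corresponds to the extension to more than two data types.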