Advances and challenges in computational
modeling and statistical learning of
biological systems
Qi Liu
Department of Biomedical Informatics
Vanderbilt University School of Medicine
[email protected]
Applications
• Disease marker
• Prediction model
• Classification
• Precise treatment
• Hypothesis test
Methods
• Data mining
• Machine learning
• Regression models
• Exploratory analysis
• Statistics
BIG Data
• Genomics
• Transcriptomics
• Proteomics
• Epigenomics
• Clinical Data
http://jdr.sagepub.com/content/90/5/561
NGS Technologies
http://www.slideshare.net/mkim8/a-comparison-of-ngs-platforms
A decade’s perspective on DNA sequencing technology
Elaine R. Mardis, Nature (2011) 470, 198-203
Patient -> Technologies -> Data Analysis -> Integration and interpretation
Genomics
• Technologies: WGS, WES
• Data analysis: point mutation, small indels, copy number variation, structural variation
• Integration and interpretation: functional effect of mutation
Transcriptomics
• Technologies: RNA-Seq
• Data analysis: differential expression, gene fusion, alternative splicing, RNA editing
• Integration and interpretation: network and pathway analysis
Epigenomics
• Technologies: Bisulfite-Seq, ChIP-Seq
• Data analysis: methylation, histone modification, transcription factor binding
• Integration and interpretation: integrative analysis
Further understanding of cancer and clinical applications
Shyr D, Liu Q. Biol Proced Online (2013) 15, 4
Objectives
1. Understand relationships between different types of molecular data
2. Understand the phenotype
– Latent: disease subtype
– Observable: patient outcome
GTEx
http://www.gtexportal.org/home/
TCGA
https://tcga-data.nci.nih.gov/tcga/
http://www.nature.com/ng/journal/v45/n10/full/ng.2764.html
Inferring regulation networks
DNA -> (transcription) -> RNA -> (post-transcription) -> Protein
• Transcriptional regulation network: TF
• Post-transcription regulation network: miRNA
Reveal the relationships between different molecular layers
– The strength of association indicates trans-regulation.
Integrative method
Data: GSE10843 and GSE10833 (mRNA and microRNA), protein; 79 miRNAs, 5144 genes
• miRNA-mRNA correlation: mRNA decay
• miRNA-ratio correlation (protein/mRNA ratio): translational repression
• miRNA-protein correlation: combined effect
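The three correlations on this slide can be sketched in a few lines. This is a minimal illustration, not the analysis behind the slides: it assumes log2-scale matrices with matched samples in columns, uses Pearson correlation (the slides do not say which correlation was used), and the function names `pearson_rows` and `regulation_correlations` are hypothetical.

```python
import numpy as np

def pearson_rows(a, b):
    """Pearson correlation between every row of a and every row of b.

    Rows are features (miRNAs or genes), columns are matched samples.
    Rows with zero variance would give NaN and should be filtered first.
    """
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def regulation_correlations(mirna, mrna, protein):
    """Correlate miRNA levels with mRNA, protein/mRNA ratio, and protein.

    Assumption: all inputs are log2 values, so the protein/mRNA ratio
    is a simple difference. mrna and protein share the same gene order.
    """
    ratio = protein - mrna  # log2(protein / mRNA)
    return {
        "mrna_decay": pearson_rows(mirna, mrna),                 # miRNA-mRNA
        "translational_repression": pearson_rows(mirna, ratio),  # miRNA-ratio
        "combined_effect": pearson_rows(mirna, protein),         # miRNA-protein
    }
```

Each returned matrix has one row per miRNA and one column per gene; strongly negative entries are the candidate inverse relationships.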
Sequence features on site efficacy
Association of sequence features with estimated mRNA decay or translational repression
microRNA-target interactions
• Significant inverse correlation (p<0.005)
• 7235 functional relationships
• Binding evidence: supported by TargetScan, miRanda or MirTarget2
• 580 interactions, 60 miRNAs, 423 genes
• The relative contribution of translational repression
Sequence features
• Site type
• Site location
• Local AU-context
• Additional 3’ pairing
Features on site efficacy for these two regulation types
• mRNA decay: 8mer is efficient
• Translational repression: 8mer sites do not show significant efficacy
• mRNA decay: 3’UTR > ORF > 5’UTR
• Translational repression: marginal significance in ORF
• AU-rich context appears to favor both mRNA decay and translational repression
• 3’ pairing enhances mRNA decay but disfavors efficacy for translational repression
miR-138 prefers translational repression
SW620 and SW480 (derived from the same patient)

                  SW620        SW480
source            lymph node   primary
metastasis        high         poor
miR-138 (log2)    3.06         6.39

[Figure, panels A-D: gene sets labeled UP and DOWN]
GOLGI_VESICLE_TRANSPORT (FDR=0.07)
KEGG_AMINOACYL_TRNA_BIOSYNTHESIS (FDR=0.03)
KEGG_PROTEASOME (FDR=0.03)
CYTOKINE_METABOLIC_PROCESS (FDR=0.09)
FEEDING_BEHAVIOR (FDR=0.005)
DOWN (FDR=0.00001)
KEGG_PRIMARY_IMMUNODEFICIENCY (FDR=0.002)
GPROTEIN_COUPLED_RECEPTOR_SIGNALING (FDR=0.005)
KEGG_ALLOGRAFT_REJECTION (FDR=0.005)
KEGG_CELL_ADHESION_MOLECULES_CAMS (FDR=0.003)
T_CELL_ACTIVATION (FDR=0.002)
Samples: 123 Stage I, 55 Stage IV
Data: mRNA, CNV, methylation, TF-target
• mRNA: Limma, stage-dependent alterations
• CNV: correlation, CNV effect
• Methylation: correlation, methylation effect
• Regression model: stage-dependent TF activity changes
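One way to read the regression step is sketched below. The slides do not give the model, so this is an assumption: each gene's stage-dependent expression change is regressed on its CNV change, its methylation change, and a TF-target indicator matrix, and the TF coefficients are read as activity changes. The function name `tf_activity_changes` and all argument names are hypothetical.

```python
import numpy as np

def tf_activity_changes(expr_change, cnv_change, meth_change, tf_target):
    """Ordinary least squares across genes (an assumed model, not the author's).

    expr_change : (n_genes,) stage IV vs stage I expression change per gene
    cnv_change  : (n_genes,) matching copy-number change
    meth_change : (n_genes,) matching methylation change
    tf_target   : (n_genes, n_tfs) 1 if the gene is a target of the TF, else 0
    """
    n_genes = expr_change.shape[0]
    # Design matrix: intercept, CNV effect, methylation effect, one column per TF.
    X = np.column_stack([np.ones(n_genes), cnv_change, meth_change, tf_target])
    coef, *_ = np.linalg.lstsq(X, expr_change, rcond=None)
    return {
        "intercept": coef[0],
        "cnv": coef[1],
        "methylation": coef[2],
        "tf": coef[3:],  # one effect size per TF
    }
```

With many TFs and few genes, a sparse or regularized fit would be the more cautious choice; plain least squares is used here only to keep the sketch short.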
Stage-dependent TF activity changes

Regulator              Target regulation   Effect size   FDR
GATA6                  Up                   0.14         1.2e-13
NFIL3                  Down                -0.12         1.0e-08
SREBF2                 Up                   0.12         7.3e-08
SREBF1                 Down                -0.08         1.0e-07
TBP                    Up                   0.05         1.4e-07
HLF                    Up                   0.11         7.5e-07
TCF12                  Up                   0.10         3.1e-06
GATA1                  Down                -0.07         1.6e-05
FOSB                   Up                   0.10         1.7e-05
RARA/RARB/RARG/RXRB    Up                   0.21         6.5e-05
REST                   Up                   0.14         9.2e-05
FOXF2                  Down                -0.05         1.3e-04
FOXC1                  Up                   0.09         1.7e-04
HMGA1                  Up                   0.09         1.9e-04
E2F7                   Up                   0.12         3.6e-04
NKX2-1                 Up                   0.06         8.2e-04
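The FDR values reported for the regulators come from a multiple-testing correction. The slides do not say which procedure was used; the sketch below assumes Benjamini-Hochberg, a common default, and the function name is hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed procedure, see lead-in)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adjusted, 0.0, 1.0)  # back to the input order
    return out
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` adjusts the four raw p-values jointly and returns them in the input order.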
Challenges
• Complex structure, but limited sample size
• Cooperative regulation
• Incorporate prior knowledge
• Nonlinear effect
• Long range chromatin interaction
• Data heterogeneity
• Complexity and model sparsity
Individual omics analysis
Integrative omics analysis
Illustrative example of SNF steps
The advantage of the integrative procedure is that weak similarities (low-weight edges)
disappear, helping to reduce the noise, and strong similarities (high-weight edges)
present in one or more networks are added to the others. Additionally, low-weight edges
supported by all networks are retained depending on how tightly connected their
neighborhoods are across networks.
Methods
Extension to more than 2 data types
Inspired by the theoretical multiview learning framework developed for computer
vision and image processing applications.
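The fusion step and its extension to more than two data types can be sketched as follows. This follows the published description of similarity network fusion as best understood here, with simplifications that are assumptions of this sketch: a Gaussian kernel with a fixed `sigma` replaces the scaled exponential kernel, and the function names and defaults (`k=3`, `n_iter=20`) are hypothetical. At least two views are required.

```python
import numpy as np

def affinity(X, sigma=1.0):
    """Patient-by-patient Gaussian similarity (rows of X are patients)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def full_kernel(W):
    """Row-normalized similarity: 1/2 on the diagonal, 1/2 spread over the rest."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    P = W / (2.0 * W.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P

def local_kernel(W, k):
    """Keep only each patient's k most similar neighbors, then row-normalize."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    S = np.zeros_like(W)
    idx = np.argsort(-W, axis=1)[:, :k]
    rows = np.arange(W.shape[0])[:, None]
    S[rows, idx] = W[rows, idx]
    return S / S.sum(axis=1, keepdims=True)

def snf(views, k=3, n_iter=20, sigma=1.0):
    """Fuse the similarity networks of m >= 2 data types into one network."""
    Ws = [affinity(X, sigma) for X in views]
    Ps = [full_kernel(W) for W in Ws]
    Ss = [local_kernel(W, k) for W in Ws]
    m = len(views)
    for _ in range(n_iter):
        updated = []
        for v in range(m):
            # Average the current networks of all the other data types...
            others = sum(Ps[u] for u in range(m) if u != v) / (m - 1)
            # ...and diffuse them through this data type's local neighborhoods.
            P = Ss[v] @ others @ Ss[v].T
            updated.append(full_kernel((P + P.T) / 2.0))
        Ps = updated
    return sum(Ps) / m
```

Because diffusion only passes through each patient's nearest neighbors, edges that are weak in every network fade, while edges that are strong in at least one network are propagated to the others.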
Patient similarities for each data type
compared to SNF fused similarity
Comparison of SNF with iCluster and
concatenation
Challenges
• Systems-level probabilistic modeling of
multiple data types
• Correlated data
• Missing values
• Dependence among genes
Thank you very much for
your attention!