Metabolic pathway activity estimation from RNA-Seq data
Yvette Temate-Tiagueu, Qiong Cheng, Meril Mathew, Igor Mandric, Olga Glebova, Nicole Beth Lopanik, Ion Mandoiu and Alex Zelikovsky
Department of Computer Science, Department of Biology, Georgia State University
Computer Science and Engineering, University of Connecticut
Abstract
The application of RNA-Seq has enabled a variety of differential analysis studies, including differential expression analysis for pathways. A standard approach to studying the metabolic differences between species is metabolic pathway analysis. In this study, we introduce a novel approach to characterizing the pathway activity levels of two samples. We present XPathway, a set of pathway activity analysis tools based on KEGG KAAS mapping of proteins to pathways. We applied our proposed methods to Bugula neritina RNA-Seq metagenomics data. We successfully identified several pathways with differential activity levels using our novel computational approaches implemented in XPathway. Further validation of initial results is conducted through qPCR.
Our Contribution
Using KEGG: a database resource for understanding high-level functions and utilities of the biological system from molecular-level information [Kanehisa M. and Goto S., 2000]
(1) A novel graph-based approach to analyze pathway significance
(2) Representing a pathway as a set and inferring activity from the information extracted from those sets
(3) Validating the two approaches through differential expression analysis at the transcript and gene level and also through a qPCR experiment
[Figure label: RNA-seq reads, 2 samples]
Objectives
• Develop efficient algorithms for reliable estimation of pathway activity level
• Identify pathways whose activity levels differ significantly between two conditions
Model 1: permutation of labels
[Figure: example pathway graph with nodes a, b, c, d, e]
In induced graph:
• # nodes N
• # edges M
• # green connected components
• # 0 in- & out-degrees
• Density of the induced graph: M/(N-1)
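The induced-graph statistics listed above, and the two permutation models named on this poster (permutation of labels, permutation of edges), can be sketched in a few lines of Python. This is a minimal illustration under assumptions that the source does not state: the "green" nodes are taken to be the active (expressed) nodes, edges are directed pairs, zero in-degree and zero out-degree nodes are counted separately, and the p-value is the rank of the observed density M/(N-1) among randomized densities. The poster's bootstrap uses the density of the largest component; the sketch uses the density of the whole induced graph for brevity. All function names are hypothetical.

```python
import random
from collections import defaultdict


def components(nodes, edges):
    """Connected components of a graph, ignoring edge direction."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        seen.add(start)
        while stack:
            for nxt in adj[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    comp.add(nxt)
                    stack.append(nxt)
        comps.append(comp)
    return comps


def induced_stats(edges, active):
    """N, M, component count, zero-degree counts and density M/(N-1)."""
    active = set(active)
    sub = [(u, v) for u, v in edges if u in active and v in active]
    n, m = len(active), len(sub)
    return {
        "N": n,
        "M": m,
        "components": len(components(sorted(active), sub)),
        "zero_in": n - len({v for _, v in sub}),
        "zero_out": n - len({u for u, _ in sub}),
        "density": m / (n - 1) if n > 1 else 0.0,
    }


def label_permutation_pvalue(nodes, edges, active, trials=1000, seed=0):
    """Model 1 (assumed reading): reshuffle the active labels, keep the graph."""
    rng = random.Random(seed)
    observed = induced_stats(edges, active)["density"]
    pool, k = sorted(nodes), len(set(active))
    hits = sum(
        induced_stats(edges, rng.sample(pool, k))["density"] >= observed
        for _ in range(trials)
    )
    return (hits + 1) / (trials + 1)


def switch_edges(edges, swaps, rng):
    """Degree-preserving rewiring: (a,b),(c,d) becomes (a,d),(c,b)."""
    edges = list(edges)
    present = set(edges)
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue  # skip swaps that would create self-loops or duplicates
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def edge_permutation_pvalue(edges, active, trials=1000, seed=0):
    """Model 2 (assumed reading): keep the labels, randomly switch edges."""
    rng = random.Random(seed)
    observed = induced_stats(edges, active)["density"]
    hits = sum(
        induced_stats(switch_edges(edges, 10 * len(edges), rng), active)["density"]
        >= observed
        for _ in range(trials)
    )
    return (hits + 1) / (trials + 1)


# Toy pathway graph (made up for illustration, not data from the poster).
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
active = {"a", "b", "c"}
stats = induced_stats(edges, active)
p_labels = label_permutation_pvalue(nodes, edges, active)
p_edges = edge_permutation_pvalue(edges, active)
print(stats, p_labels, p_edges)
```

Adding one to both the hit count and the number of trials keeps the empirical p-value strictly positive, which is a common convention for permutation tests rather than something the poster specifies.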
Bootstrapping:
- Repeat 1000 times
  1. Randomly switch edges
  2. Compute density of the largest component
- Sort wrt density
- Find the rank of the observed induced subgraph

Illumina sequence paired-end reads:
Sample 1: Bugula with symbiont
Sample 2: Bugula without symbiont
• 50bp paired-end reads
• 200bp mean fragment length
• Assembly into contigs by Trinity
• BLAST with Swissprot database

EM-based estimation of pathway activity
w = pathway
g = ortholog group
δ(w) = binary activity status of w
g_w = participation of ortholog group g in pathway w
f_w = activity level of pathway w
T_w = threshold of w

$$ f_w = \sum_{g \in w} g_w $$

$$ g_w = \Bigl(1 + \sum_{w' \ni g,\; w' \neq w} \delta(w')\Bigr)^{-1} $$

$$ \delta(w) = \begin{cases} 1, & \text{if } f_w \geq T_w \\ 0, & \text{if } f_w < T_w \end{cases} $$
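One way to read the EM-based estimation on this poster is as a fixed-point iteration: an ortholog group's participation in a pathway shrinks as more of the other pathways containing it are active, a pathway's activity level is the sum of its groups' participations, and a pathway is active when that level reaches its threshold. The Python sketch below iterates these three steps until the binary statuses stop changing. The initialization (all pathways active), the stopping rule, the function name `binary_em` and the toy input are assumptions for illustration, not details given in the source.

```python
def binary_em(pathways, thresholds, max_iter=100):
    """Iterate participation -> activity level -> binary status to a fixed point.

    pathways:   dict mapping pathway w to the set of ortholog groups g in w
    thresholds: dict mapping pathway w to its threshold T_w
    Returns (f, delta): activity levels f_w and binary statuses delta(w).
    """
    # Reverse index: ortholog group -> pathways that contain it.
    member = {}
    for w, groups in pathways.items():
        for g in groups:
            member.setdefault(g, set()).add(w)

    delta = {w: 1 for w in pathways}  # assumed start: every pathway active
    f = {}
    for _ in range(max_iter):
        # g_w = 1 / (1 + number of OTHER active pathways containing g)
        # f_w = sum of g_w over the groups of w
        f = {
            w: sum(
                1.0 / (1 + sum(delta[v] for v in member[g] if v != w))
                for g in groups
            )
            for w, groups in pathways.items()
        }
        new_delta = {w: int(f[w] >= thresholds[w]) for w in pathways}
        if new_delta == delta:
            break  # statuses are stable, so f is consistent with delta
        delta = new_delta
    return f, delta


# Toy input with made-up pathway and ortholog group names.
toy_pathways = {
    "w1": {"g1", "g2", "g3"},
    "w2": {"g3"},
    "w3": {"g2", "g4"},
}
toy_thresholds = {"w1": 1.0, "w2": 1.0, "w3": 1.0}
levels, status = binary_em(toy_pathways, toy_thresholds)
print(levels, status)
```

In the toy input, w2 shares its only group with w1 and falls below its threshold, so it is switched off; g3 is then credited entirely to w1, which raises w1's activity level on the next pass.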
Validation
• Selected pathways for qPCR validation
Pathway   #Mapped contigs   DE contigs   Ratio of DE   Pathway name
ko00062   14                3            21.43%        Fatty acid elongation
ko00100   8                 1            12.50%        Steroid biosynthesis
ko00250   39                4            10.26%        Alanine, aspartate and glutamate metabolism
ko04146   98                15           15.31%        Peroxisome
ko03008   67                10           14.93%        Ribosome biogenesis in eukaryotes
ko03013   148               22           14.86%        RNA transport
ko00983   28                4            14.29%        Drug metabolism - other enzymes
ko04530   237               15           6.33%         Tight junction
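The "Ratio of DE" column is consistent with the number of differentially expressed (DE) contigs divided by the number of mapped contigs, expressed as a percentage. The short Python check below recomputes it from the counts in the table; the helper name `de_ratio` is hypothetical, and the interpretation of the column is inferred from the numbers rather than stated on the poster.

```python
# (pathway, mapped contigs, DE contigs, reported ratio) copied from the table.
selected = [
    ("ko00062", 14, 3, "21.43%"),
    ("ko00100", 8, 1, "12.50%"),
    ("ko00250", 39, 4, "10.26%"),
    ("ko04146", 98, 15, "15.31%"),
    ("ko03008", 67, 10, "14.93%"),
    ("ko03013", 148, 22, "14.86%"),
    ("ko00983", 28, 4, "14.29%"),
    ("ko04530", 237, 15, "6.33%"),
]


def de_ratio(mapped, de):
    """Percentage of mapped contigs that are differentially expressed."""
    return f"{100.0 * de / mapped:.2f}%"


for pathway, mapped, de, reported in selected:
    print(pathway, de_ratio(mapped, de), "reported:", reported)
```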
Experimental studies: Bugula neritina
In United States - Three sibling species:
1. Deep-water (West coast of United States)
2. Shallow-water (West and Southern East coasts)
3. Northern Atlantic (Northern East coast)

Methods
• Topology-based estimation of pathway significance
[Figure: Model 2: permutation of edges, example pathway graph with nodes a, b, c, e]
[Figure: analysis workflow. Labels: Sample 1, Sample 2, Trinity, Contigs, BLAST, Proteins, Ortholog groups (K00161, K00162, K00163), KEGG, SEED, Graph-based, Pathway significance, Binary EM, Differentially expressed pathways, IsoDE, Contigs validation, qPCR]

Experimental validation
• For gene expression analyses:
- Select pathways with significantly different activity
- Select DE transcripts from these pathways
- Select the genes from these transcripts
- Primers are created to test genes per condition
• Preliminary results
• More primers ordered
Results
Vertex labels swapping
Model 1: Pvalue
Pathway   L1    L2    Prob_Diff_Significance
ko04146   99%   5%    0.94
ko03008   99%   5%    0.94
ko03013   99%   5%    0.94
ko00983   99%   5%    0.94
ko04530   99%   5%    0.94
ko00062   1%    75%   0.74
ko00400   1%    99%   0.98
ko00071   99%   1%    0.98
ko00100   99%   1%    0.98
ko00910   4%    99%   0.95
ko04122   99%   3%    0.97
ko04713   99%   1%    0.99
Edges swapping
Model 2: Pvalue
Pathway   L1    L2    Prob_Diff_Significance
ko04146   99%   5%    0.94
ko03008   99%   5%    0.94
ko03013   99%   5%    0.94
ko00983   99%   5%    0.94
ko04530   99%   5%    0.94
ko00130   99%   2%    0.97
ko00120   4%    58%   0.55
ko00072   1%    99%   0.98
ko00120   4%    58%   0.55
ko00400   1%    99%   0.98
ko00230   99%   5%    0.94
ko00627   1%    99%   0.99
ko00770   3%    99%   0.97
ko00980   99%   1%    0.99
ko04122   99%   1%    0.98
ko04630   99%   4%    0.96
ko04713   99%   4%    0.96

Pathway activity levels with ratio
Highest_Diff_Activity_Level   Expression1   Expression2   Diff_Express
ko04068                       23.83         19.77         1.21
ko04145                       17.35         25.78         0.67
ko04610                       9.83          6.83          1.44
ko00051                       13.06         9.34          1.40
ko00740                       7.83          5.83          1.34
ko01230                       30.38         23.81         1.28
ko04020                       17.75         23.72         0.75
ko05012                       25.71         20.07         1.28
ko00983                       8.63          12.20         0.71
ko05034                       17.83         14.30         1.25
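The Diff_Express values agree, to two decimals, with Expression1 divided by Expression2, which matches the "with ratio" wording of the table title. The Python snippet below recomputes the ratio for each row. Pairing the pathway identifiers with the value triples in listed order is an assumption, as is the reading of the column as a plain ratio.

```python
# (pathway, Expression1, Expression2, reported Diff_Express), in listed order.
activity = [
    ("ko04068", 23.83, 19.77, "1.21"),
    ("ko04145", 17.35, 25.78, "0.67"),
    ("ko04610", 9.83, 6.83, "1.44"),
    ("ko00051", 13.06, 9.34, "1.40"),
    ("ko00740", 7.83, 5.83, "1.34"),
    ("ko01230", 30.38, 23.81, "1.28"),
    ("ko04020", 17.75, 23.72, "0.75"),
    ("ko05012", 25.71, 20.07, "1.28"),
    ("ko00983", 8.63, 12.20, "0.71"),
    ("ko05034", 17.83, 14.30, "1.25"),
]


def expression_ratio(expr1, expr2):
    """Ratio of the two samples' activity levels, rounded to two decimals."""
    return f"{expr1 / expr2:.2f}"


for pathway, expr1, expr2, reported in activity:
    print(pathway, expression_ratio(expr1, expr2), "reported:", reported)
```

A ratio above 1 means the estimated activity is higher in the first sample, and a ratio below 1 means it is higher in the second.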
Conclusions and Future Work
Our experimental studies on Bugula neritina RNA-seq data (mutualistic symbiosis data vs none) show that, by analyzing metabolic pathways with our tool XPathway, we can effectively locate pathways whose activity levels differ significantly. This result is being validated through qPCR.
This project is supported in part by the Molecular Basis of Disease fellowship of GSU