Metabolic pathway activity estimation from RNA-Seq data

Yvette Temate-Tiagueu, Qiong Cheng, Meril Mathew, Igor Mandric, Olga Glebova, Nicole Beth Lopanik, Ion Mandoiu and Alex Zelikovsky

Department of Computer Science, Department of Biology, Georgia State University
Computer Science and Engineering, University of Connecticut

Abstract

The application of RNA-Seq has enabled a range of differential analysis studies, including differential expression analysis of pathways. A standard approach to studying the metabolic differences between species is metabolic pathway analysis. In this study, we introduce a novel approach to characterizing the pathway activity levels of two samples. We present XPathway, a set of pathway activity analysis tools based on KEGG-KAAS mapping of proteins to pathways. We applied the proposed methods to Bugula neritina RNA-Seq metagenomics data and identified several pathways with differential activity levels using the computational approaches implemented in XPathway. Further validation of the initial results is conducted through qPCR.

Our Contribution

Using KEGG, a database resource for understanding high-level functions and utilities of the biological system from molecular-level information
[Kanehisa M. and Goto S., 2000]:
(1) A novel graph-based approach to analyzing pathway significance
(2) Representing a pathway as a set and inferring activity from the information extracted from those sets
(3) Validating the two approaches through differential expression analysis at the transcript and gene levels, and through a qPCR experiment

Objectives

- Develop efficient algorithms for reliable estimation of pathway activity levels
- Identify pathways whose activities differ significantly between two conditions

Experimental studies: Bugula neritina

In the United States, there are three sibling species:
1. Deep-water (West coast of United States)
2. Shallow-water (West and Southern East coasts)
3. Northern Atlantic (Northern East coast)

Illumina paired-end sequencing reads:
- Sample 1: Bugula with symbiont
- Sample 2: Bugula without symbiont
- 50 bp paired-end reads
- 200 bp mean fragment length
- Assembly into contigs by Trinity
- BLAST against the Swiss-Prot database

Validation

Selected pathways for qPCR validation

| Pathway | #Mapped contigs | DE contigs | Ratio of DE | Pathway name |
|---|---|---|---|---|
| ko00062 | 14 | 3 | 21.43% | Fatty acid elongation |
| ko00100 | 8 | 1 | 12.50% | Steroid biosynthesis |
| ko00250 | 39 | 4 | 10.26% | Alanine, aspartate and glutamate metabolism |
| ko04146 | 98 | 15 | 15.31% | Peroxisome |
| ko03008 | 67 | 10 | 14.93% | Ribosome biogenesis in eukaryotes |
| ko03013 | 148 | 22 | 14.86% | RNA transport |
| ko00983 | 28 | 4 | 14.29% | Drug metabolism - other enzymes |
| ko04530 | 237 | 15 | 6.33% | Tight junction |

Experimental validation

For gene expression analyses:
- Select pathways with significantly different activity
- Select DE transcripts from these pathways
- Select the genes from these transcripts
- Primers are created to test genes per condition

Preliminary results; more primers ordered.

Methods

[Figure: analysis workflow. Panel labels: RNA-Seq reads (2 samples: Sample 1, Sample 2); Trinity; Contigs; BLAST; Proteins; KEGG, SEED; Ortholog groups (K00161, K00162, K00163); Graph-based; Binary EM; Pathway significance; Pathway activity; Differentially expressed pathways; qPCR; IsoDE; Contigs validation.]

Topology-based estimation of pathway significance

Model 1: permutation of labels
Model 2: permutation of edges

In induced graph:
• # nodes N
• # edges M
• # green connected components
• # nodes with 0 in- & out-degrees
• Density of the induced graph: M/(N-1)

Bootstrapping:
- Repeat 1000 times
  1. Randomly switch edges
  2. Compute density of the largest component
- Sort with respect to density
- Find the rank of the observed induced subgraph

EM-based estimation of pathway activity

$w$ = pathway
$g$ = ortholog group
$\delta(w)$ = binary activity status of $w$
$p_{gw}$ = participation of ortholog group $g$ in pathway $w$
$f_w$ = activity level of pathway $w$
$T_w$ = threshold of $w$

$$f_w = \sum_{g \in w} p_{gw}, \qquad p_{gw} = \Bigl(1 + \sum_{w' \ni g,\; w' \neq w} \delta(w')\Bigr)^{-1}$$

$$\delta(w) = \begin{cases} 1, & \text{if } f_w \ge T_w \\ 0, & \text{if } f_w < T_w \end{cases}$$
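The EM-based estimation of pathway activity described above (activity level f_w, binary status δ(w), threshold T_w) can be read as a fixed-point iteration. The following is a minimal Python sketch of that reading, not the XPathway implementation. Several things are assumptions made for illustration only: the function name `estimate_activity`, starting with every pathway active, the iteration cap, and the toy thresholds (half of each pathway's size). The poster does not say how T_w is chosen.

```python
def estimate_activity(pathways, thresholds, max_iter=100):
    """Iterate f_w and delta(w) until the binary statuses stop changing.

    pathways:   dict mapping pathway id -> set of ortholog group ids
    thresholds: dict mapping pathway id -> threshold T_w (assumed given)
    Returns (activity, delta); activity[w] is f_w, delta[w] is 0 or 1.
    """
    # Index: ortholog group -> set of pathways that contain it.
    members = {}
    for w, groups in pathways.items():
        for g in groups:
            members.setdefault(g, set()).add(w)

    delta = {w: 1 for w in pathways}  # assumption: all pathways start active
    activity = {}
    for _ in range(max_iter):
        activity = {}
        for w, groups in pathways.items():
            f_w = 0.0
            for g in groups:
                # Participation of g in w: 1 / (1 + number of OTHER active
                # pathways that also contain g).
                others_active = sum(delta[v] for v in members[g] if v != w)
                f_w += 1.0 / (1 + others_active)
            activity[w] = f_w
        new_delta = {w: 1 if activity[w] >= thresholds[w] else 0
                     for w in pathways}
        if new_delta == delta:  # converged: statuses agree with activities
            break
        delta = new_delta
    return activity, delta


# Toy data (hypothetical): ortholog group g3 is shared by all three pathways.
pathways = {"A": {"g1", "g2", "g3"}, "B": {"g3", "g4"}, "C": {"g3"}}
thresholds = {w: 0.5 * len(groups) for w, groups in pathways.items()}
activity, delta = estimate_activity(pathways, thresholds)
print(activity, delta)
```

In the toy data, pathway C consists only of the shared group g3, so its activity stays below its threshold and it is switched off, which in turn raises the share of g3 credited to A and B. The iteration is not guaranteed to converge in general; the cap `max_iter` only bounds the running time.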
Results

Model 1: vertex label swapping (p-values)

| Pathway | L1 | L2 | Prob_Diff_Significance |
|---|---|---|---|
| ko04146 | 99% | 5% | 0.94 |
| ko03008 | 99% | 5% | 0.94 |
| ko03013 | 99% | 5% | 0.94 |
| ko00983 | 99% | 5% | 0.94 |
| ko04530 | 99% | 5% | 0.94 |
| ko00062 | 1% | 75% | 0.74 |
| ko00400 | 1% | 99% | 0.98 |
| ko00071 | 99% | 1% | 0.98 |
| ko00100 | 99% | 1% | 0.98 |
| ko00910 | 4% | 99% | 0.95 |
| ko04122 | 99% | 3% | 0.97 |
| ko04713 | 99% | 1% | 0.99 |

Model 2: edge swapping (p-values)

| Pathway | L1 | L2 | Prob_Diff_Significance |
|---|---|---|---|
| ko04146 | 99% | 5% | 0.94 |
| ko03008 | 99% | 5% | 0.94 |
| ko03013 | 99% | 5% | 0.94 |
| ko00983 | 99% | 5% | 0.94 |
| ko04530 | 99% | 5% | 0.94 |
| ko00130 | 99% | 2% | 0.97 |
| ko00120 | 4% | 58% | 0.55 |
| ko00072 | 1% | 99% | 0.98 |
| ko00120 | 4% | 58% | 0.55 |
| ko00400 | 1% | 99% | 0.98 |
| ko00230 | 99% | 5% | 0.94 |
| ko00627 | 1% | 99% | 0.99 |
| ko00770 | 3% | 99% | 0.97 |
| ko00980 | 99% | 1% | 0.99 |
| ko04122 | 99% | 1% | 0.98 |
| ko04630 | 99% | 4% | 0.96 |
| ko04713 | 99% | 4% | 0.96 |

Pathway activity levels with ratio

| Highest_Diff_Activity_Level | Expression1 | Expression2 | Diff_Express |
|---|---|---|---|
| ko04068 | 23.83 | 19.77 | 1.21 |
| ko04145 | 17.35 | 25.78 | 0.67 |
| ko04610 | 9.83 | 6.83 | 1.44 |
| ko00051 | 13.06 | 9.34 | 1.40 |
| ko00740 | 7.83 | 5.83 | 1.34 |
| ko01230 | 30.38 | 23.81 | 1.28 |
| ko04020 | 17.75 | 23.72 | 0.75 |
| ko05012 | 25.71 | 20.07 | 1.28 |
| ko00983 | 8.63 | 12.20 | 0.71 |
| ko05034 | 17.83 | 14.30 | 1.25 |

Conclusions and Future Work

Our experimental studies on Bugula neritina RNA-Seq data (mutualistic symbiosis vs. none) show that, by analyzing metabolic pathways with our tool XPathway, we can effectively locate pathways whose activity levels differ significantly. This result has been validated through qPCR.

This project is supported in part by the Molecular Basis of Disease fellowship of GSU.
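Returning to the topology-based method: its bootstrapping procedure (randomly switch edges, compute the density M/(N-1) of the largest component, sort, and find the rank of the observed induced subgraph) might be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the XPathway code. The poster does not define "switch edges"; a degree-preserving swap of edge endpoints is assumed here. Components are taken to be weakly connected, ties in component size are broken by density, and all function names and the toy graph are hypothetical.

```python
import bisect
import random


def largest_component_density(edges, nodes):
    """Density M/(N-1) of the largest weakly connected component of the
    subgraph induced by `nodes` (0.0 if that component has one node)."""
    nodes = set(nodes)
    induced = [(u, v) for u, v in edges if u in nodes and v in nodes]
    adj = {n: set() for n in nodes}
    for u, v in induced:
        adj[u].add(v)
        adj[v].add(u)
    best, seen = (0, 0.0), set()  # (component size, density)
    for start in sorted(nodes):
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:  # depth-first search over undirected adjacency
            for nb in adj[stack.pop()]:
                if nb not in comp:
                    comp.add(nb)
                    stack.append(nb)
        seen |= comp
        m = sum(1 for u, v in induced if u in comp and v in comp)
        dens = m / (len(comp) - 1) if len(comp) > 1 else 0.0
        best = max(best, (len(comp), dens))
    return best[1]


def swap_edges(edges, n_swaps, rng):
    """Assumed meaning of 'randomly switch edges': replace (a,b),(c,d)
    with (a,d),(c,b), skipping swaps that create loops or duplicates."""
    edges = list(edges)
    if len(edges) < 2:
        return edges
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges


def permutation_rank(edges, observed_nodes, n_rounds=1000, seed=0):
    """Rank of the observed density among densities of randomized graphs."""
    rng = random.Random(seed)
    observed = largest_component_density(edges, observed_nodes)
    null = sorted(
        largest_component_density(
            swap_edges(edges, 10 * len(edges), rng), observed_nodes)
        for _ in range(n_rounds)
    )
    rank = bisect.bisect_left(null, observed)  # randomized densities below observed
    return observed, rank, (n_rounds - rank) / n_rounds


# Toy pathway graph (hypothetical); nodes a, b, c are the observed ones.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("e", "a")]
observed, rank, fraction = permutation_rank(edges, {"a", "b", "c"}, n_rounds=200)
print(observed, rank, fraction)
```

The third return value is the fraction of randomized graphs whose density is at least the observed one, which is one way to turn the rank into an empirical p-value. The number of swaps per round (ten times the edge count) is an arbitrary choice for the sketch.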