In silico analysis of expressed sequence tags (EST) from Trichostrongylus vitrinus (Nematoda): comparison of the automated ESTExplorer workflow platform with database searches.

Shivashankar H. Nagaraj and Shoba Ranganathan

Professor and Chair – Bioinformatics, Biotechnology Research Institute, Dept. of Chemistry & Biomolecular Sciences, Macquarie University, Sydney, Australia ([email protected])
and
Adjunct Professor, Dept. of Biochemistry, National University of Singapore, Singapore ([email protected])

Expressed Sequence Tags (ESTs)
• Unedited, short, single-pass sequences generated from the 5' or 3' end of randomly selected clones from cDNA libraries of the desired cells, tissues or organs
• Length: 200-700 bp (average 360 bp)
• Can be generated quickly and at low cost (“poor-man’s genome”)
• EST data are highly fragmented
• EST annotations carry very little biological information
• High-throughput in nature

EST Applications
• Gene discovery
• Gene structure prediction
• Expression maps
• Alternative splicing
• Identification and characterization of SNPs
• Gene expression studies
    • tissue or disease specific
    • developmental stage
• Proteomics (for example, peptide mass fingerprinting)
• Identification of drug and vaccine candidates

Properties of ESTs
[Figure: genomic DNA, mRNA, cDNA and ESTs (5’ ESTs and 3’ ESTs). An EST sequence is drawn with vector, repeats and high-quality sequence regions; length labels on the figure: ~50-500 bp, ~500-700 bp, ~1-50 bp.]

EST data resources
• Available in plenty
• Several dedicated databases
• Fragmented
• Quality dubious
• Need cleaning, clustering and annotation!
EST data repositories
dbEST release 061507 (June, 2007), www.ncbi.nlm.nih.gov/dbEST/
43,396,096 ESTs from 659 different organisms

Organism                               Number of ESTs
Homo sapiens (human)                   8,119,106
Mus musculus (mouse)                   4,850,243
Danio rerio (zebrafish)                1,350,105
Bos taurus (cattle)                    1,318,208
Arabidopsis thaliana (thale cress)     1,276,692
Xenopus tropicalis                     1,271,375
Oryza sativa (rice)                    1,211,418
Zea mays (maize)                       1,161,241
Triticum aestivum (wheat)              1,050,267

Overview of EST sequence analysis
[Flowchart. Box labels: Submit data; Raw EST sequence data; Contamination check; Vector clipping; Poly-A removal; Repeat masking; Clustering; Assembly; Consensus generation; Conceptual translation; Gene annotation; RNAi; Gene mapping; Alternative splicing; SNPs; Peptide annotation; Protein interactors; Gene Ontologies; KEGG; Visualize results.]

Evolution of ESTExplorer
• Comparison of current methods for EST analysis
• Critical evaluation of contemporary tools and EST analysis pipelines
• Benchmarking of tools using EST datasets
• Lack of downstream functional annotation at DNA and protein levels
• ESTExplorer

Description of ESTExplorer
ESTExplorer – features
• Suite of programs to pre-process, assemble and functionally annotate ESTs
• User-defined input and analysis – parameter control
• Species-specific analysis
• Input: ESTs or assembled contigs
• Output: assembled ESTs, Gene Ontologies, mapping to domains/motifs, pathway mapping

Workflow
[Figure: ESTExplorer analysis and annotation workflow, showing Phase I (pre-processing and assembly), Phase II (nucleotide-level annotation) and Phase III (protein-level annotation). Labels: Input option 1: EST sequences, with quality values (.qual); short sequences removed from the analysis; SeqClean; RepeatMasker; CAP3; assembled ESTs; Input option 2: assembled ESTs; Phase II (DNA-level annotation); Phase III (protein-level annotation); ESTScan; BLASTX; BLAST2GO; InterProScan; KOBAS; final output: annotation summary for assembled ESTs.]
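The Phase I steps named above include poly-A removal and discarding short sequences. As a rough illustration only, the sketch below does both in plain Python. The function names, the 100 bp cut-off and the 10-base poly-A run are assumptions made here and are not taken from the slides, which list SeqClean and RepeatMasker for this phase.

```python
import re

# Assumed threshold for illustration; the slides do not state the cut-off used.
MIN_LENGTH = 100

# A trailing run of 10 or more A's is treated as a poly-A tail (assumed run length).
POLY_A_TAIL = re.compile(r"A{10,}$")


def trim_poly_a(seq: str) -> str:
    """Upper-case the sequence and remove a trailing poly-A run, if any."""
    return POLY_A_TAIL.sub("", seq.upper())


def preprocess(ests: dict[str, str], min_length: int = MIN_LENGTH) -> dict[str, str]:
    """Trim poly-A tails, then drop sequences shorter than min_length."""
    cleaned = {}
    for est_id, seq in ests.items():
        trimmed = trim_poly_a(seq)
        if len(trimmed) >= min_length:
            cleaned[est_id] = trimmed
    return cleaned
```

A real pipeline would also clip vector sequence and mask repeats before assembly; this sketch leaves those steps to the dedicated tools.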
estexplorer.biolinfo.org
[Screenshot: annotation summary page]

The worm in question
• Trichostrongylus vitrinus (order Strongylida) is a parasitic nematode.
• Principal causative nematode associated with parasitic diseases in sheep and cattle
• Current treatment for the disease: chemotherapeutic agents (anthelmintics)
• Disadvantages of current treatments:
    a. Expensive and only partially effective
    b. Anthelmintic drug resistance over the last decade
    c. Residue problems in meat and milk
• Possible alternative: the development of anti-parasite drugs and/or vaccines
Nisbet AJ, et al. Int J Parasitol, 2004

[Flowchart: project phases I to IV. Labels, in extraction order: Creation of cDNA libraries and EST generation from the parasite Trichostrongylus vitrinus; Phase I; Bioinformatics analysis of the ESTs; Categorization of differentially expressed ESTs; Phase II; Subset of potential drug target genes; Isolation of full-length genes; Functional genomics via RNAi; Biochemical activity assays; Proteomics; Phase III; Virtual and high-throughput screening; Phase IV; Pre-clinical and clinical evaluation; Comparative genomics with nematodes.]

EST analysis schema

Phase I: EST pre-processing
Step                                                    Male                           Female
Raw ESTs                                                910                            866
EST pre-processing (SeqClean & RepeatMasker)            902                            857
EST clustering and assembly (CAP3)                      180 contigs, 251 singletons    143 contigs, 122 singletons
Conceptual translation (ESTScan), peptide sequences     400                            240

Phase II: DNA-level annotation
• Database similarity searches against NR and Wormpep (BLASTX) to update the results of Nisbet et al.
• Database similarity searches to locate parasitic nematode homologues (BLASTX)
• Locating RNAi phenotypes from C. elegans (BLASTX against Wormpep)
• Database similarity searches to locate mammalian homologues (BLASTX against NR)
• Gene Ontologies (BLAST2GO): male 134; female 133

Phase III: Protein-level annotation
• Secretome analysis (SignalP, TMHMM, PSORT): male 28; female 12
• Domain/motif analysis (InterProScan): male 141; female 120
• Pathway mapping (KOBAS): male 120; female 110

Results of overall EST analysis
Number of ESTs analysed: 1776 (male: 910; female: 866)

Category                                                           Number (%)
Caenorhabditis elegans homologues                                  290 (41%)
Homologues in parasitic nematodes                                  329 (42%)
Homologues in non-nematodes                                        202 (28%)
No significant match to any sequence in the current databases      218 (31%)
Gene Ontologies (GO) assigned                                      267 (38%)
Pathway associations established                                   230 (33%)

Of the C. elegans homologues, 90 entries had observed ‘non-wildtype’ RNAi phenotypes, including embryonic lethality, maternal sterility, sterile progeny, larval arrest and slow growth.

Results from BLAST vs. ESTExplorer
EST ID: TVm02_C07

Manual annotation using BLAST
    BLAST result: PP1-gamma serine/threonine protein phosphatase
    E-value: 2.00E-37

Annotations obtained automatically from ESTExplorer
    BLAST result: protein phosphatase catalytic gamma isoform isoform 1
    E-value: 1.00E-36
    Gene Ontologies: chromatin modification, protein amino acid dephosphorylation, embryonic cleavage, cytokinesis, meiosis, oviposition, manganese ion binding, protein phosphatase type 1 activity, mitochondrial outer membrane, protein binding, mitosis, glycogen metabolic process, iron ion binding, nucleus
    Domain/motif data: Metallophosphoesterase; Serine/threonine-specific protein phosphatase and bis(5-nucleosyl)-tetraphosphatase
    Metabolic pathway mapping: Long-term potentiation, Regulation of actin cytoskeleton, Focal adhesion, Insulin signaling pathway

Trichostrongylus vitrinus (Nematoda: Strongylida): Molecular characterization and transcriptional analysis of Tv-stp-1, a serine/threonine phosphatase gene. Hu M, Abs El-Osta YG, Campbell BE, Boag PR, Nisbet AJ, Beveridge I, Gasser RB. Exp Parasitol. 2007 Mar 24;

Redefining parameters for possible drug/vaccine targets in parasitic nematodes

Secreted proteins
• Parasites must secrete biologically active mediators to manipulate the host environment in order to survive immune attack
• Inhibit host antigen-processing pathways
• Examples:
    • Aspartyl protease inhibitor (API-1)
    • Cystatin (cysteine protease inhibitor)
    • Acetylcholinesterase (AChE)

Strong RNAi phenotypes in C. elegans
• Embryonic lethality
• Larval lethality
• Sterile progeny
• Larval arrest
• Maternal sterility
• Slow growth

Harcus YM, et al. Genome Biol, 2004
Delaney A, et al. Int J Parasitol, 2005
Vanholme B, et al. Gene, 2004

Absence of homologues in mammalian host (nematode-specific genes)
• Genes with specificity to nematodes may serve as excellent targets for drugs/vaccines with low toxicity to humans and other vertebrates.
• A better understanding of the unusual nematode biochemistry can also have industrial or therapeutic value.

T. vitrinus male EST data comparison
[Venn diagram]
Group totals: C. elegans 169 (39.21%); parasitic nematodes 191 (44.31%); non-nematodes 100 (23.20%)
Region counts: C. elegans only 19; parasitic nematodes only 45; non-nematodes only 3; C. elegans and parasitic nematodes 55; C. elegans and non-nematodes 6; parasitic nematodes and non-nematodes 2; all three 89

T. vitrinus female EST data comparison
[Venn diagram]
Group totals: C. elegans 121 (45.6%); parasitic nematodes 138 (52.1%); non-nematodes 102 (38.4%)
Region counts: C. elegans only 6; parasitic nematodes only 26; non-nematodes only 8; C. elegans and parasitic nematodes 24; C. elegans and non-nematodes 6; parasitic nematodes and non-nematodes 3; all three 85

SimiTri: visualizing similarity relationships for groups of sequences
SimiTri provides a two-dimensional display of relative similarity relationships among three different datasets.
• Java/Perl-based application
• Display of relative similarity relationships
• Analysis of relative similarity relationships
• Based on raw bit score from BLAST output
[Figure: the query dataset (EST sequences in this study) is compared by BLAST against Database 1, Database 2 and Database 3, followed by visualization.]
Parkinson J, et al. Bioinformatics, 2003
Parkinson J, et al. Nat Genetics, 2004

a. SimiTri: male dataset
[Figure: SimiTri plot of 431 male ESTs. Vertices: C. elegans 169 (39.21%), parasitic nematodes 191 (44.31%), non-nematodes 100 (23.20%). No match for 114 ESTs. Colour scale of maximal BLAST scores for tiles: 100, 150, 200, 250, 300.]

b. SimiTri: female dataset
[Figure: SimiTri plot of 265 female ESTs. Vertices: C. elegans 121 (45.6%), parasitic nematodes 138 (52.1%), non-nematodes 102 (38.4%). No match for 78 ESTs. Colour scale of maximal BLAST scores for tiles: 100, 150, 200, 250, 300.]

SimiTri results: T. vitrinus ESTs are closer to parasitic nematodes and C. elegans than to non-nematode organisms.

BLAST vs. ESTExplorer
ESTExplorer reliably and rapidly annotated 301 ESTs with pathway and GO information, eliminating 60 low-quality hits from database searches.

Analysis of individual ESTs using BLAST (1776 ESTs)
• Slow (took several weeks)
• BLAST results are the only evidence for functional assignment
• Peripheral annotation

Analysis using the semi-automated approach via ESTExplorer (1776 ESTs)
• Fast (took a few minutes)
• Multiple lines of evidence for annotation, supported by GO, InterProScan and pathway mapping
• In-depth annotation

Secreted protein analysis
Number of putative secreted proteins: 40
• Immune-response related genes
• Ion channels
• Signalling molecules
• Proteases
• Protease inhibitors

Candidate target genes in Trichostrongylus vitrinus

Tvmale_Contig 9
    Sequence length (aa): 113
    Homology (Wormpep): Translation initiation factor 3, subunit f (eIF-3f)
    RNAi phenotype (Wormbase): embryonic lethal (Emb), larval arrest (Lva), sterile progeny (Stp), slow growth (Gro)
    Gene Ontology: GO:0003743: translation initiation factor activity
    Mammalian homolog: NO
    Secreted protein: YES

Tvfemale_Contig105
    Sequence length (aa): 115
    Homology (Wormpep): pbs-2 (Proteasome Beta Subunit)
    RNAi phenotype (Wormbase): embryonic lethal (Emb), locomotion abnormal, larval arrest (Lva), maternal sterile, larval lethal (Let)
    Gene Ontology: GO:0005839: proteasome core; GO:0006511: ubiquitin-dependent protein catabolism; GO:0008233: peptidase activity; GO:0004175: endopeptidase activity
    Mammalian homolog: YES (weakly similar)
    Secreted protein: YES

Tvmale 04_F02
    Sequence length (aa): 96
    Homology (Wormpep): asb-2 (ATP Synthase B homolog)
    RNAi phenotype (Wormbase): embryonic lethal (Emb), larval arrest (Lva), sterile progeny (Stp), slow growth (Gro), maternal sterile
    Gene Ontology: GO:0046933: ATP synthase activity
    Mammalian homolog: YES (weakly similar)
    Secreted protein: YES

Tvmale 02_C01
    Sequence length (aa): 136
    Homology (Wormpep): RNA splicing
    RNAi phenotype (Wormbase): embryonic lethal (Emb)
    Gene Ontology: GO:0006375: nuclear mRNA splicing
    Mammalian homolog: NO
    Secreted protein: YES

ESTExplorer: applications so far
1. In silico analysis of expressed sequence tags (EST) from Trichostrongylus vitrinus (Nematoda): comparison of the automated ESTExplorer workflow platform with database searches. Nagaraj SH, Gasser RB, Ranganathan S.
2. A transcriptomic analysis of the adult stage of the bovine lungworm, Dictyocaulus viviparus. Ranganathan S, Nagaraj SH, Hu M, Strube C, Schnieder T and Gasser RB. BMC Genomics, 2007, accepted
3. Gender-enriched transcripts in adult Haemonchus contortus (Nematoda) – predicted functions and genetic interactions based on comparative analyses with Caenorhabditis elegans. Campbell BE, Nagaraj SH, Hu M, Zhong W, Sternberg PW, Ong EK, Loukas A, Ranganathan S, Beveridge A and Gasser RB.
4. Transcriptional changes in the third-stage larva of Ancylostoma caninum (Nematoda) following in vitro serum stimulation, employing a suppressive-subtractive hybridisation-based microarray approach. Datu BJD, Gasser RB, Nagaraj SH, Ong EK, O’Donoghue P, McInnes R, Ranganathan S and Loukas A.
5. Trichostrongylus vitrinus (Nematoda: Strongylida): Molecular characterization and transcriptional analysis of Tv-stp-1, a serine/threonine phosphatase gene. Hu M, Abs El-Osta YG, Campbell BE, Boag PR, Nisbet AJ, Beveridge I, Gasser RB. Exp Parasitol. 2007, accepted

Acknowledgements
Prof. Robin Gasser (University of Melbourne)
Genetics Technologies Pty. Ltd.
Australian Research Council Linkage Project (LP0667795)

Some more examples of secreted proteins
• M41 family metalloprotease; mitochondrial membrane proteinase: Schistosoma
• Pathogenesis-related protein similar to helminth venom allergen homologues: Schistosoma
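The three target criteria discussed in these slides (a strong RNAi phenotype in C. elegans, predicted secretion, and no mammalian homologue) can be combined into a simple filter. The sketch below is a hypothetical illustration and not part of ESTExplorer: the `Candidate` class and `strict_targets` function are names invented here, and the strict rule that drops "weakly similar" mammalian hits is an assumption, since the slides list all four entries as candidates. The values are copied from the candidate target gene table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    est_id: str
    rnai_phenotypes: tuple[str, ...]
    mammalian_homologue: str  # "NO" or "YES (weakly similar)", as in the table
    secreted: bool


# Values copied from the candidate target gene table in these slides.
CANDIDATES = [
    Candidate("Tvmale_Contig 9", ("Emb", "Lva", "Stp", "Gro"), "NO", True),
    Candidate(
        "Tvfemale_Contig105",
        ("Emb", "locomotion abnormal", "Lva", "maternal sterile", "Let"),
        "YES (weakly similar)",
        True,
    ),
    Candidate(
        "Tvmale 04_F02",
        ("Emb", "Lva", "Stp", "Gro", "maternal sterile"),
        "YES (weakly similar)",
        True,
    ),
    Candidate("Tvmale 02_C01", ("Emb",), "NO", True),
]


def strict_targets(candidates):
    """Keep IDs with an RNAi phenotype, predicted secretion and no mammalian homologue."""
    return [
        c.est_id
        for c in candidates
        if c.rnai_phenotypes and c.secreted and c.mammalian_homologue == "NO"
    ]


print(strict_targets(CANDIDATES))  # prints ['Tvmale_Contig 9', 'Tvmale 02_C01']
```

Relaxing the last condition to also accept "YES (weakly similar)" returns all four entries, which matches the list shown in the table.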