In silico analysis of expressed sequence tags (EST)
from Trichostrongylus vitrinus (Nematoda):
comparison of the automated ESTExplorer
workflow platform with database searches.
Shivashankar H. Nagaraj and Shoba Ranganathan
Professor and Chair, Bioinformatics, Biotechnology Research Institute and Dept. of Chemistry & Biomolecular Sciences, Macquarie University, Sydney, Australia ([email protected])
Adjunct Professor, Dept. of Biochemistry, National University of Singapore, Singapore ([email protected])
Expressed Sequence Tags (ESTs)
• Unedited, short, single-pass sequences generated from the 5' or 3' end of randomly selected clones from cDNA libraries of the desired cells, tissues or organs
• Length: 200-700 bp (average 360 bp)
• Can be generated quickly at low cost ("poor man's genome")
• EST data are highly fragmented
• EST annotations carry very little biological information
• High-throughput in nature
EST Applications
• Gene discovery
• Gene structure prediction
• Expression maps
• Alternative splicing
• Identification and characterization of SNPs
• Gene expression studies
  • tissue- or disease-specific
  • developmental stage
• Proteomics (for example, peptide mass fingerprinting)
• Identification of drug and vaccine candidates
Properties of ESTs
[Figure: genomic DNA → mRNA → cDNA → 5' and 3' ESTs. Anatomy of an EST sequence, with regions labelled vector, repeats and high-quality sequence, and position ranges ~1-50 bp, ~50-500 bp and ~500-700 bp.]
EST data resources
• Available in plenty
• Several dedicated databases
• Fragmented
• Quality dubious
  • Need cleaning
  • Clustering
  • Annotation!
EST data repositories
dbEST release 061507 (June, 2007)
www.ncbi.nlm.nih.gov/dbEST/
43,396,096 ESTs from 659 different organisms
Homo sapiens (human) 8,119,106
Mus musculus (mouse) 4,850,243
Danio rerio (zebrafish) 1,350,105
Bos taurus (cattle) 1,318,208
Arabidopsis thaliana (thale cress) 1,276,692
Xenopus tropicalis 1,271,375
Oryza sativa (rice) 1,211,418
Zea mays (maize) 1,161,241
Triticum aestivum (wheat) 1,050,267
Overview of EST sequence analysis
1. Submit data: raw EST sequence data
2. Pre-processing: contamination check, vector clipping, poly-A removal, repeat masking
3. Clustering, assembly and consensus generation
4. Gene annotation: RNAi, gene mapping, alternative splicing, SNPs
5. Conceptual translation
6. Peptide annotation: protein interactors, Gene Ontologies, KEGG
7. Visualize results
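The cleaning steps in this overview (poly-A removal, discarding short reads) can be sketched in a few lines of Python. This is a minimal illustration, not the code used by any particular pipeline; the function names and the thresholds `min_run=10` and `min_length=100` are assumptions chosen for the example, and the input sequences are made up.

```python
import re

def trim_poly_a(seq: str, min_run: int = 10) -> str:
    """Remove a trailing poly-A run (or a leading poly-T run) of at least min_run bases."""
    seq = seq.upper()
    seq = re.sub(r"A{%d,}$" % min_run, "", seq)   # 3' poly-A tail
    seq = re.sub(r"^T{%d,}" % min_run, "", seq)   # poly-T at the start of a 3' read
    return seq

def preprocess(ests: dict, min_length: int = 100) -> dict:
    """Trim every EST and drop those shorter than min_length after trimming."""
    cleaned = {}
    for name, seq in ests.items():
        trimmed = trim_poly_a(seq)
        if len(trimmed) >= min_length:
            cleaned[name] = trimmed
    return cleaned

# Toy input (made-up sequences, not data from the study)
toy = {
    "est_1": "ACGT" * 30 + "A" * 20,   # 120 bp insert plus a poly-A tail
    "est_2": "ACGT" * 10 + "A" * 15,   # only 40 bp left after trimming
}
result = preprocess(toy)
print({name: len(seq) for name, seq in result.items()})
```

Vector clipping and repeat masking need reference libraries and are left to dedicated tools; only the two steps that can be expressed without external data are shown here.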
Evolution of ESTExplorer
Comparison of current methods for EST analysis
Critical evaluation of contemporary tools and EST analysis pipelines
Benchmarking of tools using EST datasets
Lack of downstream functional annotation at DNA and protein levels
ESTExplorer
Description of ESTExplorer
ESTExplorer: features
• Suite of programs to pre-process, assemble and functionally annotate ESTs
• User-defined input and analysis, with parameter control
• Species-specific analysis
• Input: ESTs or assembled contigs
• Output: assembled ESTs, Gene Ontologies, mapping to domains/motifs, pathway mapping
ESTExplorer workflow (estexplorer.biolinfo.org)
Phase I (EST pre-processing)
• Input Option 1: EST sequences and quality values (.qual)
• SeqClean, then RepeatMasker; short sequences are removed from the analysis
• CAP3 produces assembled ESTs
• Input Option 2: assembled ESTs
Phase II (DNA-level annotation) and Phase III (protein-level annotation)
• Tools shown: ESTScan, BLASTX, BLAST2GO, InterProScan, KOBAS
• Final output: annotation summary for assembled ESTs
[Figure: ESTExplorer analysis and annotation workflow, showing Phase I (pre-processing and assembly), Phase II (nucleotide-level annotation) and Phase III (protein-level annotation).]
[Figure: annotation summary page]
The worm in question
• Trichostrongylus vitrinus (order Strongylida) is a parasitic nematode.
• Principal causative nematode associated with parasitic diseases in sheep and cattle
• Current treatment for the disease: chemotherapeutic agents (anthelmintics)
• Disadvantages of current treatments:
  a. Expensive and only partially effective
  b. Anthelmintic drug resistance has emerged over the last decade
  c. Residue problems in meat and milk
• Possible alternative: the development of anti-parasite drugs and/or vaccines
Nisbet AJ, et al. Int J Parasitol, 2004
Creation of cDNA libraries and EST generation from the parasite Trichostrongylus vitrinus
Phase I: Bioinformatics analysis of the ESTs; categorization of differentially expressed ESTs; comparative genomics with nematodes
Phase II: Subset of potential drug target genes
• Isolation of full-length genes
• Functional genomics via RNAi
• Biochemical activity assays
• Proteomics
Phase III: Virtual and high-throughput screening
Phase IV: Pre-clinical and clinical evaluation
EST analysis schema
Phase I: EST pre-processing
• Raw ESTs: male 910; female 866
• EST pre-processing (SeqClean & RepeatMasker): male 902; female 857
• EST clustering and assembly (CAP3): male 180 contigs and 251 singletons; female 143 contigs and 122 singletons
• Conceptual translation (ESTScan), peptide sequences: male 400; female 240
Phase II: DNA-level annotation
• Database similarity searches against NR and Wormpep (BLASTX), to update the results of Nisbet et al.
• Database similarity searches for locating parasitic nematode homologues (BLASTX)
• Locating RNAi phenotypes from C. elegans (BLASTX against Wormpep)
• Database similarity searches for locating mammalian homologues (BLASTX against NR)
• Gene Ontologies (BLAST2GO): male 134; female 133
Phase III: Protein-level annotation
• Secretome analysis (SignalP, TMHMM, PSORT): male 28; female 12
• Domain/motif analysis (InterProScan): male 141; female 120
• Pathway mapping (KOBAS): male 120; female 110
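The counts in the schema can be cross-checked with a little arithmetic: contigs plus singletons give the number of assembled sequences per library. The sketch below uses the figures from the slide; the dictionary layout and the function name are illustrative, not part of ESTExplorer.

```python
# Counts taken from the EST analysis schema above
libraries = {
    "male":   {"raw": 910, "clean": 902, "contigs": 180, "singletons": 251},
    "female": {"raw": 866, "clean": 857, "contigs": 143, "singletons": 122},
}

def summarize(lib: dict) -> dict:
    """Derive simple summary figures for one library."""
    assembled = lib["contigs"] + lib["singletons"]       # unique sequences after CAP3
    return {
        "removed": lib["raw"] - lib["clean"],            # ESTs discarded in pre-processing
        "assembled": assembled,
        "in_contigs": lib["clean"] - lib["singletons"],  # cleaned ESTs that joined a contig
    }

summary = {name: summarize(lib) for name, lib in libraries.items()}
for name, s in summary.items():
    print(name, s)
```

The assembled totals (431 male, 265 female) match the dataset sizes quoted on the SimiTri slides.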
Results of overall EST analysis
Number of ESTs analysed: 1776 (male: 910; female: 866)
Caenorhabditis elegans homologues: 290 (41%)
Homologues in parasitic nematodes: 329 (42%)
Homologues in non-nematodes: 202 (28%)
No significant match to any sequence in the current databases: 218 (31%)
Gene Ontologies (GO) assigned: 267 (38%)
Pathway associations established: 230 (33%)
Of the C. elegans homologues, 90 entries had observed 'non-wildtype' RNAi phenotypes, including embryonic lethality, maternal sterility, sterile progeny, larval arrest and slow growth.
Results from BLAST vs. ESTExplorer
Manual annotation using BLAST
EST ID: TVm02_C07
E-value: 2.00E-37
BLAST result: PP1-gamma serine/threonine protein phosphatase
Trichostrongylus vitrinus (Nematoda: Strongylida): Molecular
characterization and transcriptional analysis of Tv-stp-1, a serine/threonine
phosphatase gene. Hu M, Abs El-Osta YG, Campbell BE, Boag PR, Nisbet
AJ, Beveridge I, Gasser RB. Exp Parasitol. 2007 Mar 24;
Results from BLAST vs. ESTExplorer
Manual annotation using BLAST
EST ID: TVm02_C07
E-value: 2.00E-37
BLAST result: PP1-gamma serine/threonine protein phosphatase
Annotations obtained automatically from ESTExplorer
BLAST result: protein phosphatase catalytic gamma isoform isoform 1
E-value: 1.00E-36
Gene Ontologies: chromatin modification, protein amino acid dephosphorylation, embryonic cleavage, cytokinesis, meiosis, oviposition, manganese ion binding, protein phosphatase type 1 activity, mitochondrial outer membrane, protein binding, mitosis, glycogen metabolic process, iron ion binding, nucleus
Domain/Motif: Metallophosphoesterase; Serine/threonine-specific protein phosphatase and bis(5-nucleosyl)-tetraphosphatase
Pathway mapping: Long-term potentiation; Regulation of actin cytoskeleton; Focal adhesion; Insulin signaling pathway
Redefining parameters for possible drug/vaccine targets in parasitic nematodes
Secreted proteins
• Parasites must secrete biologically active mediators to manipulate the host environment in order to survive immune attack
• Inhibit host antigen-processing pathways
Examples:
• Aspartyl protease inhibitor (API-1)
• Cystatin (cysteine protease inhibitor)
• Acetylcholinesterase (AChE)
Strong RNAi phenotypes in C. elegans
• Embryonic lethality
• Larval lethality
• Sterile progeny
• Larval arrest
• Maternal sterility
• Slow growth
Absence of homologues in mammalian host (nematode-specific genes)
• Genes with specificity to nematodes may serve as excellent targets for drugs/vaccines with low toxicity to humans and other vertebrates.
• Better understanding of the unusual nematode biochemistry can also have industrial or therapeutic value.
Harcus YM, et al. Genome Biol, 2004
Delaney A, et al. Int J Parasitol, 2005
Vanholme B, et al. Gene, 2004
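The three criteria on this slide (secreted, strong RNAi phenotype in C. elegans, no mammalian homologue) lend themselves to a simple ranking. The slide does not say how the criteria were combined, so the sketch below makes an assumption: it counts how many criteria each sequence meets. The `Candidate` class, the `score` function and the records are hypothetical and for illustration only.

```python
from dataclasses import dataclass, field

# Phenotype names listed on the slide as strong RNAi phenotypes
STRONG_PHENOTYPES = {
    "embryonic lethality", "larval lethality", "sterile progeny",
    "larval arrest", "maternal sterility", "slow growth",
}

@dataclass
class Candidate:
    name: str
    secreted: bool
    mammalian_homologue: bool
    rnai_phenotypes: set = field(default_factory=set)

def score(c: Candidate) -> int:
    """Count how many of the three criteria a candidate meets (0 to 3)."""
    return (
        int(c.secreted)
        + int(bool(c.rnai_phenotypes & STRONG_PHENOTYPES))
        + int(not c.mammalian_homologue)
    )

# Hypothetical records, not sequences from the study
candidates = [
    Candidate("toy_A", True, False, {"embryonic lethality"}),
    Candidate("toy_B", False, True, {"slow growth"}),
    Candidate("toy_C", True, True),
]
ranked = sorted(candidates, key=score, reverse=True)
print([(c.name, score(c)) for c in ranked])
```

A count is the least opinionated way to combine the criteria; a real prioritisation might weight them differently or require all three.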
T. vitrinus male EST data comparison
[Venn diagram. Set totals: C. elegans 169 (39.21%); parasitic nematodes 191 (44.31%); non-nematodes 100 (23.20%). Region counts shown: 19, 6, 55, 89, 3, 45, 2.]
T. vitrinus female EST data comparison
[Venn diagram. Set totals: C. elegans 121 (45.6%); parasitic nematodes 138 (52.1%); non-nematodes 102 (38.4%). Region counts shown: 6, 6, 8, 85, 3, 24, 26.]
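The Venn totals can be checked against the region counts. The transcript does not preserve which number sits in which region, so the assignment below is an assumption, chosen because it is the one that reproduces all three male set totals. The denominator of 431 is the number of assembled male sequences quoted on the SimiTri slide.

```python
# Region counts from the male Venn diagram. The assignment of each number to a
# region is an assumption, chosen because it reproduces the three set totals.
# Ce = C. elegans, Pn = parasitic nematodes, Nn = non-nematodes.
male_regions = {
    ("Ce",): 19, ("Pn",): 45, ("Nn",): 3,
    ("Ce", "Pn"): 55, ("Ce", "Nn"): 6, ("Pn", "Nn"): 2,
    ("Ce", "Pn", "Nn"): 89,
}

def set_totals(regions: dict) -> dict:
    """Sum the regions belonging to each set."""
    totals = {}
    for members, count in regions.items():
        for m in members:
            totals[m] = totals.get(m, 0) + count
    return totals

totals = set_totals(male_regions)
percentages = {k: round(100 * v / 431, 2) for k, v in totals.items()}
print(totals)
print(percentages)
```

The totals come out as 169, 191 and 100, matching the slide. The parasitic nematode share computes to about 44.32%, so the slide's 44.31% looks truncated rather than rounded.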
SimiTri: visualizing similarity relationships for groups of sequences
SimiTri provides a two-dimensional display of relative similarity relationships among three different datasets.
• Java/Perl-based application
• Display of relative similarity relationships
• Analysis of relative similarity relationships
• Based on raw bit score from BLAST output
[Figure: the query dataset (EST sequences in this study) is compared by BLAST against Database 1, Database 2 and Database 3, followed by visualization.]
Parkinson J, et al. Bioinformatics, 2003
Parkinson J, et al. Nat Genetics, 2004
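One simple way to turn three BLAST scores into a position inside a triangle is a score-weighted average of the three vertices (barycentric coordinates). The sketch below illustrates that idea only; it is not taken from the SimiTri source code, and the vertex layout, database labels and function name are assumptions.

```python
import math

# Vertices of an equilateral triangle, one per database (layout is arbitrary)
VERTICES = {
    "db1": (0.5, math.sqrt(3) / 2),  # top
    "db2": (0.0, 0.0),               # bottom left
    "db3": (1.0, 0.0),               # bottom right
}

def triangle_position(scores: dict) -> tuple:
    """Place a sequence by weighting each vertex with its BLAST score."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("sequence has no hit in any database")
    x = sum(scores[db] * VERTICES[db][0] for db in VERTICES) / total
    y = sum(scores[db] * VERTICES[db][1] for db in VERTICES) / total
    return (x, y)

# Equal scores land at the centroid; a hit in a single database lands on its vertex
centre = triangle_position({"db1": 120, "db2": 120, "db3": 120})
corner = triangle_position({"db1": 0, "db2": 200, "db3": 0})
print(centre, corner)
```

Under this scheme a sequence that matches one database much more strongly than the others is pulled towards that database's corner, which is the kind of relative similarity the plots on the following slides display.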
a. SimiTri: male dataset (431 male ESTs; no match for 114 ESTs)
[Figure: SimiTri plot with vertices C. elegans 169 (39.21%), parasitic nematodes 191 (44.31%) and non-nematodes 100 (23.20%); tiles coloured by maximal BLAST score on a scale of 100 to 300.]
b. SimiTri: female dataset (265 female ESTs; no match for 78 ESTs)
[Figure: SimiTri plot with vertices C. elegans 121 (45.6%), parasitic nematodes 138 (52.1%) and non-nematodes 102 (38.4%); tiles coloured by maximal BLAST score on a scale of 100 to 300.]
SimiTri results: T. vitrinus ESTs are closer to parasitic nematodes and C. elegans than to non-nematode organisms.
BLAST vs. ESTExplorer
• ESTExplorer reliably and rapidly annotated 301 ESTs with pathway and GO information, eliminating 60 low-quality hits from database searches.
Analysis of individual ESTs using BLAST (1776 ESTs)
• Slow (took several weeks)
• BLAST results are the only evidence for functional assignment
• Peripheral annotation
Analysis using the semi-automated approach via ESTExplorer (1776 ESTs)
• Fast (took a few minutes)
• Multiple lines of evidence for annotation, supported by GO, InterProScan and pathway mapping
• In-depth annotation
Secreted protein analysis
Number of putative secreted proteins: 40
• Immune-response related genes
• Ion channels
• Signalling molecules
• Proteases
• Protease inhibitors
Candidate target genes in Trichostrongylus vitrinus
Columns: EST contig/singleton; sequence length (aa); homology (Wormpep); RNAi phenotype (WormBase); Gene Ontology; mammalian homologue; secreted protein

Tvmale_Contig9
  Sequence length: 113 aa
  Homology (Wormpep): translation initiation factor 3, subunit f (eIF-3f)
  RNAi phenotype (WormBase): embryonic lethal (Emb), larval arrest (Lva), sterile progeny (Stp), slow growth (Gro)
  Gene Ontology: GO:0003743: translation initiation factor activity
  Mammalian homologue: NO
  Secreted protein: YES

Tvfemale_Contig105
  Sequence length: 115 aa
  Homology (Wormpep): pbs-2 (Proteasome Beta Subunit)
  RNAi phenotype (WormBase): embryonic lethal (Emb), locomotion abnormal, larval arrest (Lva), maternal sterile, larval lethal (Let)
  Gene Ontology: GO:0005839: proteasome core; GO:0006511: ubiquitin-dependent protein catabolism; GO:0008233: peptidase activity; GO:0004175: endopeptidase activity
  Mammalian homologue: YES (weakly similar)
  Secreted protein: YES

Tvmale 04_F02
  Sequence length: 96 aa
  Homology (Wormpep): asb-2 (ATP Synthase B homolog)
  RNAi phenotype (WormBase): embryonic lethal (Emb), larval arrest (Lva), sterile progeny (Stp), slow growth (Gro), maternal sterile
  Gene Ontology: GO:0046933: ATP synthase activity
  Mammalian homologue: YES (weakly similar)
  Secreted protein: YES

Tvmale 02_C01
  Sequence length: 136 aa
  Homology (Wormpep): RNA splicing
  RNAi phenotype (WormBase): embryonic lethal (Emb)
  Gene Ontology: GO:0006375: nuclear mRNA splicing
  Mammalian homologue: NO
  Secreted protein: YES
ESTExplorer: applications so far
1. In silico analysis of expressed sequence tags (EST) from Trichostrongylus vitrinus (Nematoda): comparison of the automated ESTExplorer workflow platform with database searches. Nagaraj SH, Gasser RB, Ranganathan S.
2. A transcriptomic analysis of the adult stage of the bovine lungworm, Dictyocaulus viviparus. Ranganathan S, Nagaraj SH, Hu M, Strube C, Schnieder T, Gasser RB. BMC Genomics, 2007, accepted.
3. Gender-enriched transcripts in adult Haemonchus contortus (Nematoda) – predicted functions and genetic interactions based on comparative analyses with Caenorhabditis elegans. Campbell BE, Nagaraj SH, Hu M, Zhong W, Sternberg PW, Ong EK, Loukas A, Ranganathan S, Beveridge I, Gasser RB.
4. Transcriptional changes in the third-stage larva of Ancylostoma caninum (Nematoda) following in vitro serum stimulation, employing a suppressive-subtractive hybridisation-based microarray approach. Datu BJD, Gasser RB, Nagaraj SH, Ong EK, O'Donoghue P, McInnes R, Ranganathan S, Loukas A.
5. Trichostrongylus vitrinus (Nematoda: Strongylida): Molecular characterization and transcriptional analysis of Tv-stp-1, a serine/threonine phosphatase gene. Hu M, Abs El-Osta YG, Campbell BE, Boag PR, Nisbet AJ, Beveridge I, Gasser RB. Exp Parasitol. 2007, accepted.
Ref papers
Acknowledgements
Prof. Robin Gasser (University of Melbourne)
Genetics Technologies Pty. Ltd.
Australian Research Council LINKAGE PROJECT (LP0667795)
Some more examples of secreted proteins
• M41 family metalloprotease, mitochondrial membrane proteinase: Schistosoma
• Pathogenesis-related protein similar to helminth venom allergen homologues: Schistosoma