Catalogue of Expressed Sequence Tags (ESTs) from Dust Mites
Dang, P.K.1 and Chew, F.T.2
Department of Biological Sciences, The National University of Singapore
Blk S3, Science Dr 4, Singapore 117543.
ABSTRACT
A large part of the currently available nucleotide data comes from the
partial sequences of expressed sequence tags (ESTs). Large numbers of
randomly selected cDNA clones are partially sequenced to investigate the
level and complexity of gene expression in the sampled organism. Here, we
present a small-scale production of ESTs from the storage dust mite Acarus
siro and its cross-study with the EST database of another storage mite,
Glycyphagus domesticus. A total of 824 Acarus siro ESTs were analyzed, which
represented 626 unique transcripts. Putative functions were assigned to 585
of these transcripts. Of particular interest are the 7% of ESTs which encode
homologues of allergenic proteins, the most abundant ones being glutathione
S-transferase, thioredoxin and aldehyde dehydrogenase, besides matches to
allergens of other mite species. In addition, 7% of the ESTs show no
significant similarity to any other DNA or protein sequence in the
database. This lack of similarity may indicate that these sequences have
roles specific to Acarus siro, which opens up new possibilities in allergy
research.
INTRODUCTION
In order to identify the allergenic components in mite species,
expressed sequence tags (ESTs), which are clones selected at random from
cDNA libraries, can be used efficiently. These have already been generated
for several genome research projects, such as those of zebrafish [2] and mouse, and
have revealed many new genes. ESTs represent a particular type of
sequence-tagged sites useful for the physical mapping of genomes. This has
accelerated the process of gene discovery and genome annotation
considerably. The aim of this investigation is to isolate and putatively
identify gene fragments (ESTs) that could be used in molecular
investigations of the storage mite Acarus siro. Besides Acarus siro, this
study also includes the cross-study of catalogued EST sequence data of
another storage mite, Glycyphagus domesticus.
1 Undergraduate Research Student
2 Assistant Professor / Supervisor
MATERIALS AND METHODS
The cDNA library of Acarus siro used in this project for the generation
of ESTs had an insert size of >1 kb. Single clones were cultured from this
library and their plasmid DNA was isolated and quantified. The plasmids
were used as templates for the sequencing polymerase chain reaction (PCR),
in which partial sequences (ESTs) were generated from the 5’ ends. The PCR
products were purified and run on sequencers. The sequence
electropherograms obtained were converted into FASTA sequences using the
PHRED base caller and then screened for vector sequences. The screened
sequences were searched against the protein database using the BLASTX
program. Top-scoring matches with an E-value <= 0.001 were accepted. The
putative proteins represented by the ESTs were classified into 11 groups
based on their putative functions. The ESTs were also assembled into
contiguous sequences using the PHRAP program to determine the number of
genes they represent and the redundancy rate.
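The acceptance rule for BLASTX matches can be illustrated with a minimal sketch. It assumes the hits are available in the standard 12-column tabular BLAST output (query, subject, ..., E-value, bit score), which the text does not state; the EST and protein identifiers below are hypothetical. For each EST it keeps the top-scoring hit whose E-value is at or below 0.001.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-3  # acceptance threshold stated in the text


def best_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return {query: (subject, evalue, bitscore)} for the top-scoring accepted hit.

    Assumes 12-column tabular BLAST output: column 11 is the E-value and
    column 12 is the bit score (an assumption, not taken from the source).
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue  # hit is not significant enough to accept
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits, for illustration only.
SAMPLE = "\n".join(
    "\t".join(fields)
    for fields in [
        ["EST001", "protA", "62.1", "140", "53", "0", "1", "420", "10", "149", "2e-45", "160"],
        ["EST001", "protB", "40.0", "90", "54", "0", "1", "270", "5", "94", "5e-04", "45.1"],
        ["EST002", "protC", "30.0", "50", "35", "0", "1", "150", "1", "50", "0.8", "28.0"],
    ]
)

for query, (subject, evalue, bitscore) in best_hits(SAMPLE).items():
    print(query, subject, evalue, bitscore)
```

In this sketch EST002 has no accepted hit and would be counted among the ESTs with no significant similarity.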
RESULTS AND DISCUSSION
Using blue/white plaque selection, 1010 colonies were selected. Their
plasmids were isolated, with DNA concentrations ranging from 50 to 700 µg/ml.
All 1010 samples were sequenced from the 5’ end. PHRED base calling
yielded 871 high-quality sequences, a success rate of 86.2%. The average
read length of the “cleaned” sequences was 530 bp. Sequences shorter than
100 bases were removed. The final number of ESTs used for the BLAST search
was 824.
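The success rate and the length filter described above can be sketched as follows. The FASTA parser and the function names are illustrative assumptions, not the tools used in the study; only the counts (871 of 1010) and the 100-base minimum come from the text.

```python
MIN_LENGTH = 100  # sequences shorter than this were removed (from the text)


def parse_fasta(text):
    """Parse FASTA text into {record_name: sequence}; a minimal illustrative parser."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = (line[1:].split() or [""])[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def drop_short(records, min_length=MIN_LENGTH):
    """Keep only sequences with at least min_length bases."""
    return {n: s for n, s in records.items() if len(s) >= min_length}


# Success rate reported in the text: 871 high-quality reads out of 1010.
success_rate = 871 / 1010
print(f"Success rate: {success_rate:.1%}")

# Hypothetical reads, for illustration only.
sample = ">est1 clone A\n" + "A" * 120 + "\n>est2\n" + "C" * 99 + "\n>est3\n" + "G" * 100 + "\n"
kept = drop_short(parse_fasta(sample))
print(sorted(kept))
```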
Functional classification: ESTs with putative protein functions were
classified into 11 functional groups, based on the catalogues for human brain
cDNA libraries (Adams et al., 1991), Blomia tropicalis, Dermatophagoides
farinae and Tyrophagus putrescentiae (Chew et al., unpublished data) and
an extensive literature search. For both Acarus siro and Glycyphagus
domesticus, the distribution of proteins across the groups was
non-uniform, the group ‘Unclassified’ being the largest in both species
(31% and 38%, respectively). The majority of the classified proteins (a
total of 27% and 26%, respectively) had functions related to the basic
housekeeping responsibilities of the cells, namely metabolism, gene
expression and protein synthesis, and DNA synthesis.
Heat shock proteins (HSPs) made up the bulk of the defence and homeostasis
group (2% and 5%, respectively). This is plausible because storage mites
survive only at humidity levels of 60% or more [3] and need HSPs to
maintain homeostasis, which increases their survival rate.
The genes of more direct interest were the allergens. This group constituted
7% of the putative proteins in Acarus siro and 4% of the putative proteins in
Glycyphagus domesticus. It contained both putative mite and non-mite
allergens, the mite allergens being about 30% of the total allergens. There
were 18 distinct allergens in Acarus siro and 27 in Glycyphagus domesticus.
Dust mite allergen groups 1, 3, 6, 8, 13 and 15 and the 98 kDa protein were
found in both Acarus siro and Glycyphagus domesticus. In addition,
Glycyphagus domesticus had 8 allergens unique to itself.
These were dust mite allergen groups 5 and 7, tropomyosin and superoxide
dismutase. It also contained allergens from other organisms such as plants
(Juniperus virginiana and Hevea brasiliensis), yeast (Malassezia
sympodialis) and fruit (Actinidia chinensis). Non-mite allergens were
calcium-binding protein, venom allergen (Sol i 3), heat shock protein 70,
profilin, enolase, ovalbumin, alcohol dehydrogenase, thioredoxin, and Mal s
and actinidain-like allergen homologues. Since Glycyphagus domesticus had
more unique allergens than Acarus siro, its allergenic effects are probably
more varied than those of Acarus siro.
Seven percent of the Acarus siro sequences and 10% of the Glycyphagus
domesticus sequences had no hits in the BLASTX results. This could be
interpreted in two ways: either the sequences were of low quality (with
mismatches and errors) and hence could not be aligned to give a
high-scoring match, or a sequence with no hits comes from a gene unique to
either Acarus siro or Glycyphagus domesticus that has not yet been
identified. Thus, even sequences with no BLAST hits could be biologically
significant, if not statistically significant.
Contig assembly and redundancy analysis: Since genes in the library may be
represented by more than one EST, the purpose of contig assembly is to
align ESTs with overlapping regions into contiguous sequences to get a
clearer representation of the genes of the organism.
From the 824 ESTs analyzed for Acarus siro using PHRAP, 128 contiguous
sequences (contigs) and 521 singletons were obtained. The number of reads
in the contigs varied from 2 to 29. The number of unique genes was
therefore 649 (128 contigs plus 521 singletons), or 79% of the analyzed
ESTs. The corresponding redundancy rate is 21%; i.e., there is a 21% chance
that any new sequence from the library will already be represented in the
data set. For the Glycyphagus domesticus library, 1600 sequences were
assembled to give 151 contigs comprising 729 ESTs. The maximum number of
ESTs found in a contig was 35. There were 871 singletons. The number of
unique genes in the library was estimated to be 1022, a redundancy of 36%.
Considering that both libraries were non-normalized, the percentage
redundancy in both cDNA libraries is relatively low.
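The redundancy figures follow directly from the assembly counts. As a short worked sketch, treating each contig and each singleton as one unique gene (the function name is illustrative):

```python
def redundancy(total_ests, contigs, singletons):
    """Return (unique_genes, redundancy_rate) from assembly counts.

    Each contig and each singleton is treated as one unique gene, so the
    redundancy rate is the fraction of ESTs that add no new gene.
    """
    unique = contigs + singletons
    return unique, 1 - unique / total_ests


# Counts reported in the text for the two libraries.
libraries = [
    ("Acarus siro", 824, 128, 521),
    ("Glycyphagus domesticus", 1600, 151, 871),
]

for name, total, contigs, singletons in libraries:
    unique, rate = redundancy(total, contigs, singletons)
    print(f"{name}: {unique} unique genes, redundancy {rate:.0%}")
```

This reproduces 649 unique genes with 21% redundancy for Acarus siro and 1022 unique genes with 36% redundancy for Glycyphagus domesticus.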
Predicting gene function: The occurrence of functional domains, patterns
and motifs in proteins is very specific. Through evolution, many mutations
occurring at the gene and protein level cause the diversification of genes.
However, the domains with functional significance are conserved as
numerous short sequence combinations, as seen in the above patterns. This
leads us to an important application of ESTs – gene discovery.
As noted above, many ESTs in the database returned ‘no hits’ in
BLASTX. When such sequences were examined for functional domains (using
the multiple sequence alignment program CLUSTALW [4] and the PROSITE [5]
database), even ESTs showing no similarity to any other sequences in the
database contained certain conserved functional domains. These domains
could be instrumental in predicting the functions of the translated
products of such genes, with wet-lab experiments then used to test the
predictions. This could accelerate gene discovery. Thus, a BLAST result of
‘no hit’ does not necessarily mean that the search for a putative gene
function should end there. In fact, it could open another avenue for
research.
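To illustrate how a PROSITE-style pattern can be matched against a sequence, the sketch below converts the PROSITE pattern syntax into a regular expression and reports every match. It is not the procedure used in the study: it assumes the EST has already been translated into a peptide, the peptide shown is hypothetical, and only the basic pattern elements (x, [..], {..}, repetition counts, < and > anchors) are handled. The example pattern is the PROSITE N-glycosylation site, N-{P}-[ST]-{P}.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE pattern (e.g. 'N-{P}-[ST]-{P}') to a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repetition such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.group(1), match.group(2), match.group(3)
        if core == "x":  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # excluded residues
            core = "[^" + core[1:-1] + "]"
        # Bracketed sets and single residues are already valid regex.
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_motifs(peptide, pattern):
    """Return (start, matched_text) for every match, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(peptide)]


# Hypothetical translated EST fragment, for illustration only.
peptide = "MKNGSAVLNPTQ"
print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_motifs(peptide, "N-{P}-[ST]-{P}"))
```

Short patterns such as this one match frequently by chance, so a hit of this kind would only be a starting point for the experimental tests mentioned above.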
References:
1. Adams, M.D., Kelley, J.M., Gocayne, J.D., Dubnick, M.,
Polymeropoulos, M.H., Xiao, H., Merril, C.R., Wu, A., Olde, B., Moreno,
R.F., Kerlavage, A.R., McCombie, W.R., Venter, J.C. (1991)
“Complementary DNA Sequencing: Expressed Sequence Tags and Human
Genome Project” Science 252, 1651-1656.
2. Clark, M.D., Hennig, S., Herwig, R., Clifton, S.W., Marra, M.A.,
Lehrach, H., Johnson, S.L., and the WU-GSC EST Group (2001) “An
oligonucleotide fingerprint normalized and expressed sequence tag
characterized zebrafish cDNA library” Genome Res. 11, 1594-1602.
3. Studies at Medical Entomology Centre, Cambridge. Can be accessed at
www.baxicleanairsystems.co.uk/health_housing_pages/erad_mites.htm
4. CLUSTALW - http://www.ebi.ac.uk/clustalw
5. PROSITE - http://tw.expasy.org/prosite