Catalogue of Expressed Sequence Tags (ESTs) from Dust Mites

Dang, P.K.¹ and Chew, F.T.²
Department of Biological Sciences, The National University of Singapore
Blk S3, Science Dr 4, Singapore 117543.

ABSTRACT

A large part of the currently available nucleotide data comes from the partial sequences of expressed sequence tags (ESTs). Large numbers of randomly selected cDNA clones are partially sequenced to investigate the level and complexity of gene expression in the sampled organism. Here, I present a small-scale production of ESTs from the storage dust mite Acarus siro and its cross-study with the EST database of another storage mite, Glycyphagus domesticus. A total of 824 Acarus siro ESTs were analyzed, representing 626 unique transcripts. Putative functions were assigned to 585 of these transcripts. Of particular interest are the 7% of ESTs that encode homologues of allergenic proteins, the most abundant being glutathione S-transferase, thioredoxin and aldehyde dehydrogenase, besides matches to allergens of other mite species. In addition, 7% of the ESTs show no significant similarity to any other DNA or protein sequence in the database. This lack of similarity may indicate a role for these sequences that is specific to Acarus siro, which opens up new possibilities in allergy research.

INTRODUCTION

Expressed sequence tags (ESTs), which are partial sequences of clones selected at random from cDNA libraries, can be used efficiently to identify the allergenic components in mite species. ESTs have already been generated for several genome research projects, such as those of zebrafish [2] and mouse, and have revealed many new genes. ESTs represent a particular type of sequence-tagged site useful for the physical mapping of genomes. Their use has considerably accelerated gene discovery and genome annotation.
The aim of this investigation is to isolate and putatively identify gene fragments (ESTs) that could be used in molecular investigations of the storage mite Acarus siro. Besides Acarus siro, this study also includes the cross-study of catalogued EST sequence data of another storage mite, Glycyphagus domesticus.

¹ Undergraduate Research Student
² Assistant Professor / Supervisor

MATERIALS AND METHODS

The cDNA library of Acarus siro used in this project for the generation of ESTs had an insert size of >1 kb. Single clones were cultured from this library, and their plasmid DNA was isolated and quantified. These plasmids were used as templates for the sequencing polymerase chain reaction (PCR), in which partial sequences (ESTs) were generated from the 5' ends. The PCR products were purified and run on sequencers. The sequence electropherograms obtained were converted into FASTA sequences using the PHRED base caller and then screened for vector sequences. The resulting ESTs were searched against the protein database using the BLASTX program. Top-scoring matches with an e-value <= 0.001 were accepted. The putative proteins represented by the ESTs were classified into 11 groups based on their putative functions. The ESTs were also assembled into contiguous sequences using the PHRAP program to determine the number of genes they represent and the redundancy rate.

RESULTS AND DISCUSSION

Using blue/white plaque selection, 1010 colonies were selected. Their plasmids were isolated, with DNA concentrations ranging from 50 to 700 µg/ml. All 1010 samples were sequenced from the 5' end. PHRED base calling yielded 871 high-quality sequences, a success rate of 86.2%. The average read length of the "cleaned" sequences was 530 bp. Sequences shorter than 100 bases were removed. The final number of ESTs used for the BLAST search was 824.
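The filtering rules stated above (discard reads shorter than 100 bases, accept a top-scoring BLASTX match only at an e-value <= 0.001) can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline actually used in the study: the `Hit` record, the function names, and the choice of bit score to decide which match is "top-scoring" are all assumptions made for the example.

```python
from typing import Dict, Iterable, NamedTuple

MIN_LENGTH = 100    # reads shorter than 100 bases were removed
MAX_EVALUE = 1e-3   # top-scoring matches were accepted at e-value <= 0.001


class Hit(NamedTuple):
    """One BLASTX match (hypothetical record layout for this sketch)."""
    query: str       # EST identifier
    subject: str     # matched protein identifier
    evalue: float
    bitscore: float


def keep_long_reads(reads: Dict[str, str],
                    min_length: int = MIN_LENGTH) -> Dict[str, str]:
    """Drop reads shorter than min_length bases."""
    return {name: seq for name, seq in reads.items() if len(seq) >= min_length}


def best_accepted_hits(hits: Iterable[Hit],
                       max_evalue: float = MAX_EVALUE) -> Dict[str, Hit]:
    """Keep the top-scoring hit per EST, then apply the e-value cutoff.

    Assumption: "top-scoring" is taken to mean the highest bit score.
    """
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return {q: h for q, h in best.items() if h.evalue <= max_evalue}


def success_rate(high_quality: int, total: int) -> float:
    """Percentage of sequenced samples that gave a high-quality read."""
    return 100.0 * high_quality / total
```

With the figures reported here, `success_rate(871, 1010)` rounds to the 86.2% quoted above. ESTs whose best match fails the cutoff are simply absent from the returned dictionary, which corresponds to the "no hits" category discussed later.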
Functional classification: ESTs with putative protein functions were classified into 11 functional groups, based on the catalogues for human brain cDNA libraries (Adams et al., 1991), Blomia tropicalis, Dermatophagoides farinae and Tyrophagus putrescentiae (Chew et al., unpublished data) and an extensive literature search. For both Acarus siro and Glycyphagus domesticus, the distribution of proteins across the groups was non-uniform, with the 'Unclassified' group being the largest in both species (31% and 38%, respectively). The majority of the classified proteins (a total of 27% and 26%, respectively) had functions related to basic housekeeping in the cell, namely metabolism, gene expression, protein synthesis and DNA synthesis. Heat shock proteins (HSPs) made up the bulk of the defence and homeostasis group (2% and 5%, respectively). This is plausible because storage mites can survive only at humidity levels of 60% or more [3] and need HSPs to maintain homeostasis, which increases their survival rate.

The genes of more direct interest were the allergens. This group constituted 7% of the putative proteins in Acarus siro and 4% of the putative proteins in Glycyphagus domesticus. It contained both putative mite and non-mite allergens, with mite allergens making up about 30% of the total. There were 18 distinct allergens in Acarus siro and 27 in Glycyphagus domesticus. The mite allergens found in both species were dust mite allergen groups 1, 3, 6, 8, 13 and 15 and the 98 kDa protein. Glycyphagus domesticus had a further 8 allergens not found in Acarus siro. These included dust mite allergen groups 5 and 7, tropomyosin and superoxide dismutase. It also contained allergens from other organisms such as plants (Juniperus virginiana and Hevea brasiliensis), yeast (Malassezia sympodialis) and fruit (Actinidia chinensis).
Non-mite allergens were homologues of calcium-binding protein, venom allergen (Sol i 3), heat shock protein 70, profilin, enolase, ovalbumin, alcohol dehydrogenase, thioredoxin, Mal s and actinidain-like allergens. Since Glycyphagus domesticus had more unique allergens than Acarus siro, its allergic effects are probably more varied than those of Acarus siro.

In total, 7% of the Acarus siro sequences and 10% of the Glycyphagus domesticus sequences had no hits in the BLASTX results. This could be interpreted in two ways: either the sequences were of low quality (with mismatches and errors) and hence could not be aligned to give a high-scoring match, or a sequence with no hits comes from a gene unique to Acarus siro or Glycyphagus domesticus that has not yet been identified. Thus, even sequences with no BLAST results could be biologically significant, if not statistically significant.

Contig assembly and redundancy analysis: Since a gene in the library may be represented by more than one EST, the purpose of contig assembly is to align ESTs with overlapping regions into contiguous sequences, giving a clearer representation of the genes of the organism. From the 824 Acarus siro ESTs analyzed using PHRAP, 128 contiguous sequences and 521 singletons were obtained. The number of reads per contig varied from 2 to 29. The number of unique genes was therefore estimated to be 649 (128 contigs plus 521 singletons), which accounts for 79% of the analyzed ESTs. The corresponding redundancy rate is 21%; i.e., there is a 21% chance that any new sequence from the library will already be represented in the data set.

For the Glycyphagus domesticus library, 1600 sequences were assembled to give 151 contigs comprising 729 ESTs. The maximum number of ESTs found in a cluster was 35. There were 871 singletons. The number of unique genes in the library was expected to be 1022 (151 contigs plus 871 singletons), a redundancy of 36%.
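The arithmetic behind these redundancy figures can be checked with a short calculation. The sketch below assumes, consistently with the numbers reported above, that each contig and each singleton counts as one unique gene and that redundancy is the fraction of ESTs not contributing a new gene; the `AssemblySummary` type and function name are hypothetical and introduced only for illustration.

```python
from typing import NamedTuple


class AssemblySummary(NamedTuple):
    """Derived counts for one assembled EST library (hypothetical type)."""
    unique_genes: int          # contigs + singletons
    ests_in_contigs: int       # ESTs that were merged into some contig
    unique_percent: float      # unique genes as a share of all ESTs
    redundancy_percent: float  # 100 - unique_percent


def summarize_assembly(total_ests: int, contigs: int,
                       singletons: int) -> AssemblySummary:
    """Summarize a contig assembly from three reported counts."""
    unique = contigs + singletons
    unique_pct = 100.0 * unique / total_ests
    return AssemblySummary(
        unique_genes=unique,
        ests_in_contigs=total_ests - singletons,
        unique_percent=unique_pct,
        redundancy_percent=100.0 - unique_pct,
    )


acarus = summarize_assembly(total_ests=824, contigs=128, singletons=521)
glycyphagus = summarize_assembly(total_ests=1600, contigs=151, singletons=871)
```

For Acarus siro this gives 649 unique genes and a redundancy that rounds to 21%; for Glycyphagus domesticus it gives 1022 unique genes, 729 ESTs in contigs, and a redundancy that rounds to 36%, matching the values quoted in the text.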
Considering that both libraries were non-normalized, the percentage redundancy in both cDNA libraries is relatively low.

Predicting gene function: The occurrence of functional domains, patterns and motifs in proteins is very specific. Through evolution, many mutations occurring at the gene and protein level cause the diversification of genes. However, the domains with functional significance are conserved as numerous short sequence combinations, as seen in the above patterns. This leads to an important application of ESTs: gene discovery. As noted above, many ESTs in the database returned 'no hits' in BLASTX. When such sequences were studied for the existence of functional domains (using the multiple sequence alignment program CLUSTALW [4] and the PROSITE [5] database), even ESTs showing no similarity to any other sequence in the database contained certain conserved functional domains. These domains could be instrumental in predicting the functions of the translated products of such genes, with wet-lab experiments to follow to test the predictions, and could thereby aid gene discovery. Thus, a BLAST result that returns 'no hit' does not necessarily mean that the search for a putative gene function should end there. In fact, it could open another avenue for research.

References:

1. Adams, M.D., Kelly, J.M., Gocayne, J.D., Dubnick, M., Polymeropoulos, M.H., Xiao, H., Merril, C.R., Wu, A., Olde, B., Moreno, R.F., Kerlavage, A.R., McCombie, W.R., Venter, J.C. (1991) "Complementary DNA Sequencing: Expressed Sequence Tags and Human Genome Project" Science 252, 1651-1656.
2. Clark, M.D., Hennig, S., Herwig, R., Clifton, S.W., Marra, M.A., Lehrach, H., Johnson, S.L., and the WU-GSC EST Group (2001) "An oligonucleotide fingerprint normalized and expressed sequence tag characterized zebrafish cDNA library" Genome Res. 11, 1594-1602.
3. Studies at Medical Entomology Centre, Cambridge. Can be accessed at www.baxicleanairsystems.co.uk/health_housing_pages/erad_mites.htm
4. CLUSTALW - http://www.ebi.ac.uk/clustalw
5. PROSITE - http://tw.expasy.org/prosite