Generation and Analysis of Penaeus monodon Expressed Sequence Tags

Anchalee Tassanakajon
Shrimp Molecular Biology and Genomics Laboratory, Department of Biochemistry, Faculty of Science, Chulalongkorn University

Research Team

Chulalongkorn University
  Shrimp Molecular Biology and Genomics Laboratory
    Dr. Anchalee Tassanakajon
    Dr. Siriporn Pongsomboon
    Dr. Premruethai Supungul
    Dr. Piti Amparyup
    Ms. Sureerat Tang
  CE for Marine Biotechnology
    Dr. Sirawut Klinbunga
    Dr. Narongsak Paunglarp
  Advanced Virtual and Intelligent Computing Research Center
    Dr. Chidchanok Lursinsap
    Mr. Kasemsant Kuphanumat
Mahidol University
    Dr. Apinunt Udomkit (IMBG)
    Dr. Sarawut Jitrapakdee (Faculty of Sci.)
    Dr. Kallaya Dangtip (Centex Shrimp)
Prince of Songkla University
    Dr. Amornrat Pongdara

Shrimp ESTs from GenBank

dbEST release 012508
Summary by Organism - January 25, 2008
Number of public entries: 49,284,356

Objectives

• To generate a collection of ESTs from Penaeus monodon
• To establish a database for mining of genes, repetitive sequences, and SNP detection

cDNA Libraries

Non-normalized cDNA libraries
• Hemocyte
• Hematopoietic tissue
• Lymphoid organ
• Intestine
• Gill and epipodite
• Hepatopancreas
• Antennal gland
• Eye stalk
• Brain and thoracic ganglia
• Heart
• Ovary
• Testis

Normalized cDNA libraries
• Hemocyte (11,008 ESTs)
• Hepatopancreas (4,122 ESTs)
• Ovary
• Antennal gland
• Gill-epipodite

Subtractive cDNA libraries
• Heat-induced gill subtraction
• Ovary subtraction (different stages of female broodstock)
• Testes subtraction (broodstock / juvenile)

Experimental animals
• Normal shrimp
• Pathogen-infected shrimp
• Heat-induced shrimp

Summary of Penaeus monodon EST analysis

Total no. of ESTs             40,001
Total no. of Mt sequences     6,858 (17%)
Total ESTs analyzed           33,143
No. of ESTs in contigs        25,834
No. of contigs                3,227
No. of singletons             7,309
No. of unique transcripts     10,536

The distribution of cluster size

No. of ESTs per contig = 2 to 454
Average contig length = 945 bp
Longest assembled sequence = 6,309 bp
Shortest assembled sequence = 109 bp
[Figure: distribution of cluster size; x-axis: cluster size]

Functional Annotation

Blastx and Blastn
Matched ESTs (E-value < 10^-4)   5,648 (53.6%)
Unmatched                        4,888 (46.4%)

[Figure: Matched species. Pie charts of the taxonomic distribution of the closest matches, by major group (arthropods, chordates, and others) and by genus within crustaceans, insects, and chordates]

Gene Ontology Annotations

Total no. of GO hits                    5,002 (47.5%)
GO hits within "Molecular Function"     3,859
GO hits within "Biological Process"     3,427
GO hits within "Cellular Component"     3,797

Highly represented EST transcripts from P. monodon libraries

Contig   No. of      Putative gene [Closest species]                                               Accession No.   E value
number   sequences
CT95     454         hypothetical protein [Rattus norvegicus]                                      XP_001054782    3.00E-17
CT115    443         unknown
CT19     393         elongation factor 1-alpha [Pocillopora damicornis]                            BAE66714        0
CT255    393         thrombospondin [Penaeus monodon]                                              AAN17670        0
CT148    374         hypothetical protein [Eimeria tenella str. Houghton]                          XP_001238639    1.00E-09
CT263    275         conserved hypothetical protein [Aedes aegypti]                                EAT47957        2.00E-27
CT111    260         penaeidin [Penaeus monodon]                                                   AAQ05769        6.00E-39
CT151    254         beta-actin [Litopenaeus vannamei]                                             AAG16253        0
CT283    214         ovarian peritrophin 2 precursor [Penaeus monodon]                             AAM44050        1.00E-169
CT242    161         similar to secreted nidogen domain protein [Strongylocentrotus purpuratus]    XP_788074       2.00E-30
CT82     159         crustin-like peptide type 2 [Marsupenaeus japonicus]                          BAD15063        7.00E-74
CT42     148         ribosomal protein S26 [Branchiostoma belcheri]                                ABK32080        6.00E-44
CT48     147         putative senescence-associated protein [Pisum sativum]                        BAB33421        5.00E-47
CT170    132         hemocyte kazal-type proteinase inhibitor [Penaeus monodon]                    AAP92779        1.00E-167
CT100    131         profilin [Branchiostoma belcheri]                                             Q8T938          2.00E-23
CT251    129         ovarian peritrophin 2 precursor [Penaeus monodon]                             AAM44050        1.00E-128
CT169    128         mFLJ00348 protein [Mus musculus]                                              BAD90390        6.00E-16
CT156    125         ovarian peritrophin 1 precursor [Penaeus monodon]                             AAM44049        1.00E-170
CT219    124         hemocyanin [Litopenaeus vannamei]                                             CAA57880        0

Differentially expressed immune-related genes

Mining for microsatellites

Total clones searched                             10,100
No. of ESTs containing microsatellites            1,381 (13.7%)
No. of unique ESTs containing microsatellites     997
Total no. of microsatellite loci                  2,165

Distribution of microsatellite repeat types

85 new polymorphic microsatellite markers were developed.
No. of alleles per locus: 3–30 (an average of 12.6 alleles/locus)

SNP Prediction

Total clones subjected to prediction    8,091
No. of clones in contigs                3,846
No. of contigs                          356
Potential SNP sites                     595
Estimated SNP frequency                 1 site per 644 bp

SNP Prediction in various putative genes

Contig name    Putative gene                                                  Length   No. of sequences   No. of
                                                                                       in contig          SNP sites
Contig 1274    thrombospondin                                                 3428     139                35
Contig 1256    hemocyanin                                                     2243     32                 33
Contig 1267    ovarian peritrophin 2 precursor                                2658     62                 26
Contig 1270    unknown                                                        1881     85                 19
Contig 1251    hemocyte kazal-type proteinase inhibitor                       1554     26                 17
Contig 1255    unknown                                                        1690     29                 16
Contig 1271    anti-lipopolysaccharide factor                                 723      86                 14
Contig 1272    Oryza sativa (japonica cultivar-group) cDNA clone:J023007E09   2918     96                 12
Contig 1260    ovarian peritrophin 2 precursor                                900      36                 12
Contig 1258    penaeidin antimicrobial peptide                                662      34                 12
Contig 1261    elongation factor-1 alpha                                      1795     39                 12
Contig 1223    thymosin isoform 1                                             2500     15                 11
Contig 1254    ribosomal protein L10                                          1337     29                 0
Contig 1249    40S ribosomal protein                                          696      23                 0
Contig 1227    ATP/ADP translocase                                            552      15                 0
Contig 1207    eukaryotic initiation factor 4A                                1302     13                 0
Contig 1198    Rps16 protein                                                  1548     12                 0
Contig 1204    oncoprotein nm23                                               523      12                 0
Contig 1189    trypsin                                                        721      11                 0
Contig 1193    profilin                                                       762      11                 0
Contig 1176    actin depolymerizing factor                                    1082     10                 0
Contig 1177    vacuolar ATP synthase subunit E                                1353     10                 0
Contig 1180    fructose 1,6-bisphosphate aldolase                             1985     10                 0
Contig 1124    ficolin                                                        2315     8                  0
Contig 1117    cathepsin A                                                    1912     7                  0
Contig 1112    polehole                                                       2166     7                  0
Contig 1118    chaperonin                                                     2637     7                  0
Contig 1099    calcium-binding protein                                        1875     6                  0
Contig 1037    Calnexin                                                       2244     5                  0

Microarray Fabrication

• 9,991 genes on the array, printed as duplicated spots
• 7,256 gene spots (72.6%) showed acceptable signal intensity for data analysis

Outcome and Future Prospects

• 40,001 high-quality EST sequences representing 10,536 unique genes
• P. monodon EST database and a user-friendly web site (http://pmonodon.biotec.or.th)
• A large number of potential genetic markers from microsatellites and potential SNP sites
• A cDNA microarray containing 9,991 unigenes
• 11 international publications

Dr. Prasit Palittapongarnpim
Prof. Boonsirm Withyachamnarnkul

This research received financial support from BIOTEC.
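
Supplementary check: the summary counts reported in the EST analysis, functional annotation, microsatellite, and microarray slides are related by simple arithmetic (ESTs analyzed = total ESTs minus Mt sequences; unique transcripts = contigs plus singletons; each percentage is a share of its parent count). The sketch below recomputes those relations from the figures quoted in the slides. The variable names and the helper `pct` are illustrative assumptions and are not part of the original analysis pipeline.

```python
# Counts copied from the slides; names are illustrative, not from the original pipeline.
total_ests = 40_001
mt_seqs = 6_858
ests_in_contigs = 25_834
contigs = 3_227
singletons = 7_309
matched = 5_648
unmatched = 4_888
go_hits = 5_002
clones_searched = 10_100
ests_with_ssr = 1_381
array_genes = 9_991
usable_spots = 7_256


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# ESTs left after removing mitochondrial (Mt) sequences.
analyzed = total_ests - mt_seqs
# Unique transcripts are the assembled contigs plus the unassembled singletons.
unique_transcripts = contigs + singletons

report = {
    "ESTs analyzed": analyzed,
    "unique transcripts": unique_transcripts,
    "Mt share of all ESTs (%)": pct(mt_seqs, total_ests),
    "matched share (%)": pct(matched, unique_transcripts),
    "unmatched share (%)": pct(unmatched, unique_transcripts),
    "GO-hit share (%)": pct(go_hits, unique_transcripts),
    "ESTs with microsatellites (%)": pct(ests_with_ssr, clones_searched),
    "usable array spots (%)": pct(usable_spots, array_genes),
}

for label, value in report.items():
    print(f"{label}: {value}")
```

Under these inputs the recomputed values agree with the slides to the precision shown there, which suggests that the annotation percentages are expressed relative to the 10,536 unique transcripts rather than to the raw EST count.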