Supercomputer Assembly and Annotation of Transcriptomes for Assessing Impacts of Army Stressors on Ecological Receptors
Xianfeng Chen, Kurt A. Gust, and Edward J. Perkins
Environmental Laboratory, ERDC, Vicksburg, MS 39180

Introduction
• Although genomic tool development for ecologically relevant non-model species has lagged behind that for model species, advances in sequencing technology, bioinformatics processing, and gene expression platforms have led to a growing number of non-model species with deep-coverage, well-annotated transcriptomes from which high-quality genomic tools have been produced.
• We have developed a bioinformatics infrastructure and data processing pipeline that transforms raw sequence data into robustly annotated coding genes. It supports gene expression profiling and biological impact assessment of Army stressors on ecological receptors such as the Western fence lizard (Sceloporus occidentalis) and the Japanese quail (Coturnix coturnix).
• These gene expression and cyber-infrastructure tools are proving indispensable as the focus of biological research and regulatory decision frameworks continues to shift toward systems biology and predictive toxicology.

Figure 1. Japanese quail and Western fence lizard.

Unigene Dataset Results
• The sequencing effort produced over 328 million bases for the Western fence lizard (WFL) [Figure 1] and over 189 million bases for the Japanese quail (JQ), in 928,780 and 559,833 sequence reads, respectively (Table 1).
• A total of 928,759 WFL and 559,819 JQ sequences were clustered and assembled using the Gene Indices Clustering Tools (TGICL, J. Craig Venter Institute) into 58,962 and 44,455 unigenes, respectively (Table 2).
• Assembled unigenes were annotated with the Basic Local Alignment Search Tool (BLAST) against five publicly available protein sequence databases on the DoD supercomputers Diamond (SGI Altix ICE) and Jade (Cray XT4) [Figure 2]. Depending on the dataset and database, 33.64% to 44.71% of unigenes were characterized (Tables 2 and 3).
• Sequences with significant similarity to known proteins were used to design custom high-density gene expression microarrays for assessing the impacts of Army activities on the health of JQ and WFL as environmental models.
• Thus, this effort has developed a cyber-infrastructure capability (http://jeff.ifxworks.com/EGGT/) at the Environmental Laboratory to rapidly develop genomic infrastructure and gene expression tools for any environmental model that emerges as a species of interest [Figures 3, 4, and 5].

Table 1. Results of GS-FLX pyrosequencing of normalized cDNA libraries for Western fence lizard (WFL) and Japanese quail (JQ).
Sequencing Parameter            WFL            JQ
Raw Wells                       2,125,263      1,157,019
Key Pass Wells                  2,061,220      1,103,565
Passed Filter Wells             928,780        559,833
Total Bases                     328,540,934    189,239,672
Average Read Length (bases)     354            338
Median Read Length (bases)      397            388
Longest Read Length (bases)     2,043          686
Shortest Read Length (bases)    2              11

Table 2. Summary of sequence clustering and assembly for Western fence lizard (WFL) and Japanese quail (JQ).
Sequence Assembly          WFL        JQ
Total ESTs Available       928,759    559,819
Total Assembled Contigs    53,897     41,066
Total Singlets             5,065      3,389
Total Unigenes             58,962     44,455

Table 3. Unigene homology-based coding potential detection and annotation against the following protein databases: NR.aa (10,606,545 proteins), RefSeq (6,392,535 proteins), UniProt-SwissProt (515,203 proteins), UniRef90 (6,544,144 proteins), and UniRef100 (9,865,668 proteins).
Dataset / Database       Coding Detected    Non-coding Detected    % Coding
WFL Contigs
  NR.aa                  23,385             30,512                 43.39%
  RefSeq                 23,173             30,724                 43.00%
  UniProt-SwissProt      21,593             32,304                 40.06%
  UniRef100              23,463             30,434                 43.53%
  UniRef90               23,508             30,389                 43.62%
WFL Singlets
  NR.aa                  1,425              1,825                  44.33%
  RefSeq                 1,440              1,837                  43.94%
  UniProt-SwissProt      1,457              1,820                  44.46%
  UniRef100              1,465              1,812                  44.71%
  UniRef90               1,298              1,979                  39.61%
JQ Contigs
  NR.aa                  17,873             23,193                 43.52%
  RefSeq                 17,732             23,334                 43.18%
  UniProt-SwissProt      15,513             25,553                 37.78%
  UniRef100              18,034             23,032                 43.92%
  UniRef90               18,031             23,035                 43.91%
JQ Singlets
  NR.aa                  1,208              2,181                  35.65%
  RefSeq                 1,195              2,194                  35.26%
  UniProt-SwissProt      1,140              2,249                  33.64%
  UniRef100              1,217              2,172                  35.91%
  UniRef90               1,211              2,178                  35.73%

Figure 2. Diamond and Jade supercomputers at ITL, ERDC.
Figure 3. Web dissemination of the JQ and WFL transcriptome datasets (http://jeff.ifxworks.com/EGGT/Quail_Lizard.html).
Figure 4. Proposed bioinformatics system architecture. [Diagram components: web services with XML-based SOAP data exchange; data query and data upload via HTTP; Oracle relational database; data management in Perl and Java; private and public file servers; batch processing for high-performance and high-throughput computing on supercomputers; workflow steps (1) data uploading, (2) data validation, (3) data analysis, (4) data processing.]
Figure 5. Web-based tools for transcriptome and unigene analysis.
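The read-length parameters in Table 1 (total bases, average, median, longest, and shortest read length) are standard summary statistics over the reads that passed filtering. A minimal Python sketch, assuming the read lengths are already available as a list of integers; the function name `read_length_summary` is hypothetical and not part of the authors' pipeline:

```python
from statistics import median


def read_length_summary(lengths: list[int]) -> dict[str, float]:
    """Summarize read lengths with the statistics reported in Table 1.

    `lengths` holds one length (in bases) per read that passed filtering.
    Hypothetical helper for illustration, not taken from the authors' pipeline.
    """
    if not lengths:
        raise ValueError("no reads to summarize")
    total_bases = sum(lengths)
    return {
        "reads": len(lengths),
        "total_bases": total_bases,
        # Table 1 reports the average as a whole number of bases.
        "average_length": round(total_bases / len(lengths)),
        "median_length": median(lengths),
        "longest_length": max(lengths),
        "shortest_length": min(lengths),
    }


# Toy read lengths (illustrative values, not the sequencing data).
summary = read_length_summary([2, 350, 397, 410, 2043])
print(summary)
```

The same arithmetic applied to the WFL totals gives 328,540,934 bases / 928,780 reads ≈ 353.7, consistent with the reported average read length of 354 bases.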
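Tables 2 and 3 are linked by simple arithmetic: a unigene set consists of the assembled contigs plus the remaining singlets, and "% Coding" is the share of searched sequences with detected coding potential. The small Python sketch below reproduces WFL and JQ contig values from the tables; the helper names are hypothetical:

```python
def unigene_count(contigs: int, singlets: int) -> int:
    """Unigenes = assembled contigs + unassembled singlets (Table 2)."""
    return contigs + singlets


def percent_coding(coding: int, noncoding: int) -> float:
    """Percentage of sequences with detected coding potential (Table 3)."""
    total = coding + noncoding
    return round(100 * coding / total, 2)


# Values taken from Tables 2 and 3 (contigs searched against NR.aa).
print(unigene_count(53_897, 5_065))    # WFL total unigenes
print(percent_coding(23_385, 30_512))  # WFL contigs, NR.aa
print(percent_coding(17_873, 23_193))  # JQ contigs, NR.aa
```

Because "Coding Detected" plus "Non-coding Detected" equals the number of sequences searched, each row of Table 3 can be checked this way against the contig totals in Table 2.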
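Coding potential was detected through BLAST homology searches against the five protein databases. The poster does not state the search parameters, so the following is only a sketch under assumptions: BLAST+ tabular output (`-outfmt 6`, whose 11th column is the e-value) and a hypothetical e-value cutoff of 1e-5. It collects the unigenes with at least one significant hit and expresses them as a percentage of the unigene set:

```python
import io
from typing import Iterable

# Hypothetical significance threshold; the poster does not report one.
EVALUE_CUTOFF = 1e-5


def unigenes_with_hits(rows: Iterable[str], cutoff: float = EVALUE_CUTOFF) -> set[str]:
    """Collect query IDs that have at least one hit with e-value <= cutoff.

    Assumes BLAST+ tabular output (-outfmt 6) with the default 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits: set[str] = set()
    for line in rows:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= cutoff:
            hits.add(query_id)
    return hits


def percent_annotated(hit_ids: set[str], total_unigenes: int) -> float:
    """Share of the unigene set with a significant database hit, in percent."""
    return 100 * len(hit_ids) / total_unigenes


# Made-up BLAST rows for illustration (IDs and scores are not real results).
demo = io.StringIO(
    "contig1\tsp|P00001\t88.0\t200\t24\t0\t1\t600\t1\t200\t1e-50\t350\n"
    "contig1\tsp|P00002\t60.0\t180\t72\t1\t1\t540\t5\t185\t1e-20\t150\n"
    "contig2\tsp|P00003\t30.0\t50\t35\t2\t10\t160\t3\t52\t0.5\t25\n"
    "singlet7\tsp|P00004\t75.0\t120\t30\t0\t1\t360\t1\t120\t3e-8\t210\n"
)
ids = unigenes_with_hits(demo)
print(sorted(ids), percent_annotated(ids, total_unigenes=4))
```

Unigenes without any hit do not appear in tabular BLAST output at all, which is why the denominator (for example, the 53,897 WFL contigs) is passed in separately rather than counted from the file.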
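Figure 4 lists four stages for submitted data: uploading, validation, analysis, and processing. The poster does not say what the validation stage checks, so the rules below are assumptions chosen for illustration: a minimal FASTA validator (hypothetical `validate_fasta`) that rejects empty headers, empty sequences, repeated identifiers, and characters outside the IUPAC nucleotide codes before records would be loaded into the database. Python is used here only as an illustration; the diagram names Perl and Java for data management.

```python
import re

# Assumed rule: sequences may contain only IUPAC nucleotide codes (either case).
VALID_SEQUENCE = re.compile(r"[ACGTURYSWKMBDHVN]+", re.IGNORECASE)


def validate_fasta(text: str) -> tuple[dict[str, str], list[str]]:
    """Split uploaded FASTA text into accepted records and error messages.

    The checks are hypothetical examples of a validation step; the poster
    does not specify which rules the actual system applies.
    """
    records: dict[str, str] = {}
    errors: list[str] = []
    body = text.strip()
    if not body.startswith(">"):
        return records, ["input does not start with a '>' header line"]
    for block in body[1:].split("\n>"):
        lines = block.splitlines() or [""]
        header = lines[0].strip()
        seq_id = header.split()[0] if header else ""
        sequence = "".join(line.strip() for line in lines[1:])
        if not seq_id:
            errors.append("record with an empty header")
        elif seq_id in records:
            errors.append(f"{seq_id}: duplicates an accepted identifier")
        elif not sequence:
            errors.append(f"{seq_id}: empty sequence")
        elif not VALID_SEQUENCE.fullmatch(sequence):
            errors.append(f"{seq_id}: invalid sequence characters")
        else:
            records[seq_id] = sequence
    return records, errors


upload = ">c1 lizard contig\nACGTN\nacgt\n>c2\nACGXT\n>c1\nAAAA\n>c3\n"
accepted, problems = validate_fasta(upload)
print(accepted)
print(problems)
```

Returning accepted records and error messages separately lets a web front end report every problem in one pass instead of failing on the first bad record.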
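Figure 4 also names XML-based SOAP as the data-exchange format for the web services. The sketch below shows the general shape of a SOAP 1.1 request envelope built with Python's standard `xml.etree.ElementTree`, plus how a receiving service might read a parameter back. The operation name `GetUnigeneAnnotation`, its parameters, and the service namespace are hypothetical, since the poster does not describe the actual service interface.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:transcriptome"  # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("tx", SERVICE_NS)


def build_soap_request(operation: str, params: dict[str, object]) -> str:
    """Wrap a hypothetical operation call in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_parameter(xml_text: str, operation: str, name: str) -> Optional[str]:
    """Extract one parameter from a request, as a receiving service might."""
    root = ET.fromstring(xml_text)
    path = f"{{{SOAP_ENV}}}Body/{{{SERVICE_NS}}}{operation}/{{{SERVICE_NS}}}{name}"
    element = root.find(path)
    return None if element is None else element.text


request = build_soap_request(
    "GetUnigeneAnnotation", {"species": "WFL", "unigeneId": "contig1"}
)
print(request)
print(read_parameter(request, "GetUnigeneAnnotation", "unigeneId"))
```

A production service would normally generate this envelope from a WSDL description with a SOAP toolkit; building it by hand here only makes the XML structure visible.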