Name: Jeff Newman
Institution: Lycoming College
Project Title: Correlation of Genome Sequence and Phenotype Microarray Results in Novel Bacterial Species Identified by Undergraduates

In this project, we will compare the complete genome sequences and phenotype microarray results of closely related species to investigate the function of genes specific to each species. We are completing the work necessary to publish and name a new genus, Lycomia, within the family Flavobacteriaceae. One member of this new genus, to be named Lycomia vostokensis, was isolated from Lake Vostok in Antarctica and has already had its genome sequenced by a group at LSU to identify ice-binding proteins. Another member, to be named Lycomia zaccaria, was isolated from Loyalsock Creek, a tributary of the Susquehanna River, and was identified as potentially novel by an undergraduate student during the unknown microbe identification lab.

We intend to conduct phenotype microarray studies on these and several other organisms during summer and fall 2011. Biolog's phenotype microarrays are a series of twenty 96-well plates used to measure nearly 2000 nutritional phenotypes and chemical sensitivity profiles of bacteria. Our hypothesis is that differences in the phenotypes of the two species will suggest functional roles for genes and operons that differ between the two organisms.

Several other closely related organisms whose classification is under debate will also be studied with phenotype microarrays. The ability to generate genome sequence for them would clarify the evolutionary relationships among them and define the levels of average nucleotide identity (ANI) observed in interspecific versus intergeneric comparisons. As genome sequencing costs continue to decrease, there is a movement in the microbial systematics and taxonomy community to have 95% ANI replace 70% DNA-DNA hybridization as the gold standard for defining separate species. These sequences would place us at the forefront of this movement and would contribute to the Genomic Encyclopedia of Bacteria and Archaea (GEBA). Having multiple gene sets would also facilitate correlation of specific genes with specific phenotypes.

Finally, I am collaborating with Susquehanna University biochemist Wade Johnson to characterize a potentially novel thermophilic bacterial species from soils above the Centralia, PA mine fire. Based on 16S rRNA sequence, it is unclear whether this organism, tentatively named Meiothermus centralius, merits description as a novel species. However, its closest relative, Meiothermus sylvanus, and the type species for the genus, Meiothermus ruber, have both had their genomes sequenced. The low level of homology between M. sylvanus and all other Meiothermi suggests that M. sylvanus and M. centralius should be placed in a new genus, separate from M. ruber.

My ranked priority list for genome sequencing is as follows:
1. Lycomia zaccaria
2. “Chryseobacterium” haifense
3. “Chryseobacterium”/Kaistella koreensis
4. Chryseobacterium piperi
5. Meiothermus centralius
6. Chryseobacterium angstadti

Each of the organisms has an approximate genome size of ~2.5 Mbp, so 20X coverage requires ~50 Mbp of sequence per organism, or ~300 Mbp for all 6 organisms. We would gladly pay $1000 toward as many genome sequences as we could get, perhaps by splitting an Ion Torrent 316 chip with someone else, or by running three barcoded genomes on a single 316 chip. DNA will be isolated as recommended by the Penn State genomics core or, if they have no preference, with a Qiagen or MoBio genomic DNA isolation kit. I could have the samples ready to send at any time, depending on the amount needed. I could even deliver them personally in advance of the workshop, or bring them with me to the workshop.

After obtaining the data, average nucleotide identities will be determined using the JSpecies software package.
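The sequencing requirement estimated above (genome size x target coverage x number of organisms) can be checked with a few lines of Python. This is only a sketch of the arithmetic in the proposal; the function name `required_sequence_mbp` is a hypothetical helper introduced here, and the inputs are the approximate figures quoted in the text.

```python
def required_sequence_mbp(genome_size_mbp: float, coverage: float, n_genomes: int = 1) -> float:
    """Raw sequence (in Mbp) needed to reach the target coverage for n_genomes genomes.

    Hypothetical helper for illustration: it multiplies the three quantities
    and ignores read-quality losses or uneven barcode representation.
    """
    return genome_size_mbp * coverage * n_genomes


# Approximate figures from the proposal: ~2.5 Mbp genomes, 20X coverage, 6 organisms.
per_genome = required_sequence_mbp(2.5, 20)
all_six = required_sequence_mbp(2.5, 20, n_genomes=6)
print(f"per genome: {per_genome} Mbp, all six: {all_six} Mbp")
```

The same helper gives the requirement for a smaller run, for example three barcoded genomes on one chip, by changing `n_genomes`.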
I do not yet know what software will be used to annotate the genomes. What I would like to do is identify all ORFs and group those with known functions into the appropriate COGs. Hypothetical proteins would be used in BLAST searches to determine which other sequenced organisms have the gene and, if possible, to infer biochemical function from domain homology (e.g. transcription factor, transporter, etc.). Additional clues to function would come from the position of the gene relative to other genes in an operon. The genomes of closely related organisms will be compared to each other to identify the common genes that make up the core genome, and the genes specific to one or just a few species. These results will be compared to phenotype microarray results to correlate the presence or absence of specific genes or operons with the presence or absence of specific traits. Specific examples of how we would integrate the sequence data into courses within our curriculum are outlined below.
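The comparison described above (core genome, species-specific genes, and gene presence or absence matched against traits) can be sketched with plain set operations. This is a minimal illustration, not the annotation pipeline, which has not been chosen. All genome labels, gene-family IDs, and phenotype calls below are made-up placeholders, and the function names are hypothetical.

```python
# Placeholder data only: gene-family IDs per genome (not real annotations).
gene_sets = {
    "species_A": {"fam1", "fam2", "fam3", "fam4"},
    "species_B": {"fam1", "fam2", "fam5"},
    "species_C": {"fam1", "fam2", "fam3"},
}

# Placeholder phenotype calls (True = positive result for that trait).
phenotypes = {
    "species_A": {"carbon_source_X": True},
    "species_B": {"carbon_source_X": False},
    "species_C": {"carbon_source_X": True},
}


def core_genome(gene_sets):
    """Gene families present in every genome."""
    return set.intersection(*gene_sets.values())


def specific_genes(gene_sets, name):
    """Gene families found in one genome and in none of the others."""
    others = set().union(*(g for n, g in gene_sets.items() if n != name))
    return gene_sets[name] - others


def candidate_genes(gene_sets, phenotypes, trait):
    """Gene families present in all trait-positive genomes and absent from all trait-negative ones."""
    positive = [gene_sets[n] for n, p in phenotypes.items() if p[trait]]
    negative = [gene_sets[n] for n, p in phenotypes.items() if not p[trait]]
    if not positive:
        return set()
    return set.intersection(*positive) - set().union(*negative)


print(sorted(core_genome(gene_sets)))
print(sorted(specific_genes(gene_sets, "species_A")))
print(sorted(candidate_genes(gene_sets, phenotypes, "carbon_source_X")))
```

A perfect presence/absence match like this only suggests a candidate. With a handful of genomes many gene families will match by chance, which is why having multiple gene sets helps.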




Bio110 (Introduction to Biology I): In the bioinformatics lab activity, instead of “annotating” a plasmid sequence containing GFP (pGLO), each student (108 this year) will be given overlapping 30 kb genome segments in which to identify ORFs and which will be assembled into a scaffold.

Bio222 (Genetics): The target for the PCR, cloning (alpha complementation), and plasmid prep/restriction digest lab will switch from the human clotting factor IX gene to the gaps between contigs, in an attempt to close the gaps and “finish” the genome.

Bio321 (Microbiology): A new lab activity will be developed to test average nucleotide identity across the genome to determine whether organisms are members of the same species. Hypotheses generated by biochemistry students (see below) will be tested.

Bio437 (Molecular Biology): A half-semester project will focus on assembly and annotation of microbial genome sequences (the other half will still focus on microarrays).

Bio444/Chem444 (Biochemistry): Students will use our genome sequences for the metabolic modeling activity instead of others from the database, as we have done previously.
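As an illustration of the ORF-identification step mentioned for the Bio110 activity, the sketch below scans a DNA segment in all six reading frames for ATG-to-stop ORFs. It is an assumption about how such an exercise could be scripted, not the tool the course will use. The function names and the `min_codons` parameter are hypothetical. Real bacterial gene finders also consider alternative start codons such as GTG and TTG, which this sketch ignores.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, start, end, orf) tuples for ATG...stop ORFs in six frames.

    start and end are 0-based, end-exclusive positions on the scanned strand
    (for the "-" strand, positions refer to the reverse complement).
    The first ATG after the previous stop is used; ORFs without a stop
    codon inside the segment are not reported.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    # Length counts the start and stop codons.
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs


# Toy segment (made up for illustration), not real genome sequence.
for orf in find_orfs("CCATGAAATTTTAGGG"):
    print(orf)
```

On a 30 kb segment, `min_codons` would normally be raised (for example to 100) so that short chance ORFs are filtered out.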