* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download presentation - Genome-to-Genome Distance Calculator
Vectors in gene therapy wikipedia , lookup
Genomic imprinting wikipedia , lookup
Short interspersed nuclear elements (SINEs) wikipedia , lookup
Human genetic variation wikipedia , lookup
Cell-free fetal DNA wikipedia , lookup
Genealogical DNA test wikipedia , lookup
Microevolution wikipedia , lookup
Cre-Lox recombination wikipedia , lookup
Epigenomics wikipedia , lookup
Zinc finger nuclease wikipedia , lookup
Designer baby wikipedia , lookup
DNA sequencing wikipedia , lookup
Molecular Inversion Probe wikipedia , lookup
Bisulfite sequencing wikipedia , lookup
Adeno-associated virus wikipedia , lookup
Segmental Duplication on the Human Y Chromosome wikipedia , lookup
Copy-number variation wikipedia , lookup
Genetic engineering wikipedia , lookup
Comparative genomic hybridization wikipedia , lookup
Genome (book) wikipedia , lookup
Extrachromosomal DNA wikipedia , lookup
Oncogenomics wikipedia , lookup
DNA barcoding wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Transposable element wikipedia , lookup
Public health genomics wikipedia , lookup
Mitochondrial DNA wikipedia , lookup
History of genetic engineering wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
No-SCAR (Scarless Cas9 Assisted Recombineering) Genome Editing wikipedia , lookup
Metagenomics wikipedia , lookup
Helitron (biology) wikipedia , lookup
Non-coding DNA wikipedia , lookup
Minimal genome wikipedia , lookup
Human genome wikipedia , lookup
Pathogenomics wikipedia , lookup
Whole genome sequencing wikipedia , lookup
Genomic library wikipedia , lookup
Genome editing wikipedia , lookup
Methods for the phylogenetic inference from whole genome sequences and their use in Prokaryote taxonomy M. Göker, A.F. Auch, H.P. Klenk Contents 1. Short introduction into the role of DNADNA hybridization (DDH) in Prokaryote taxonomy 2. Current developments in genome sequencing techniques and projects 3. Correlation between in vitro DDH data and in silico wholegenome distances: can we replace DDH? 4. Outlook: the impact of genome sequencing on Prokaryote taxonomy Genomebased taxonomy Traditional Methods for Polyphasic Taxonomy laborious time consuming technically demanding standardization required not suitable for all organisms Picture from Vandamme et al. 1996 Genomebased taxonomy DNADNA hybridization ultimately decides about acceptance of new species To establish a strain as type strain of a novel species, the similarity between its genomic DNA and the DNA of closely related reference species must be sufficiently low Measured using DNADNA hybridization (DDH) with a threshold of 70% Several wetlab methods available: Free solution (photometry, nuclease, thermal stability) or membrane bound (binding, thermal stability) Techniques are cumbersome and cannot easily be made reproducible; different methods yield different results Currently carried out in only a few specialized molecular labs in the world => Can DDH be replaced by comparing sequenced genomes? Genomebased taxonomy Genome sequencing gets easier and cheaper each day March 29th 2010: 1117 published archaeal and bacterial genomes from >250 genera 4045 ongoing bacterial genome sequencing projects (data from www.genomesonline.org/gold) Genomes Genera Modified from www.ncbi.nlm.nih.gov/genomes/MICROBES/microbial_growth.html Genomebased taxonomy Growth of microbial genome sequences July 1995 to December 2008 Most phyla with cultured species are poorly sampled in genomics ...lineages with no cultured taxa are even more poorly sampled! Genomebased taxonomy Current genomic coverage of type strains is unsatisfactory Situation in April 2009 Genomebased taxonomy Picture from Göker & Klenk, in press The GEBA project aims at filling the gaps GEBA: Genomic Enzyclopedia of Bacteria and Archaea Approach: Take rRNA Tree of Life Overlay finished and in progress genome sequencing projects onto it Rank phylotypes by how they fill into gaps of the tree Use phylogenetic diversity as Strain selection for GEBA main project: Wu et al., Phylogenetic Distance (PD) procedure measure for strain selection Genomebased taxonomy Several approaches to directly compare whole genomes are available “Direct” methods: HSPbased (e.g., GBDP) Based on wordcount frequencies Complexitybased (e.g., OS distance) Based on common substrings (e.g., ACS) => all use alignmentfree distance calculation to infer genome togenome distances (GGD) More widespread alternative: Extraction methods, starting with gene finding and orthology determination Genomebased taxonomy GBDP is based on highscoring segment pairs (HSPs) GBDP = Gen(om)e Blast Distance Phylogeny Pairwise homology search via BLAST or related algorithms Calculate coverage of genomes Apply specific distance functions, e.g.: Genomebased taxonomy d(x,y) := distance between genomes x and y Ixy := number of identical bases/amino acids in HSPs Hxy := total HSP length using x as query and y as subject (x,y) := sum of genome lengths (alternative: 2* minimum length) Several methods to obtain HSPs have been described Blast (Altschul et al. 1990): probably most frequently used algorithm in bioinformatics; about 50x faster than exhaustive sequence alignment Blat (Kent 2002): faster than Blast when it was introduced (while using more memory) Blastz (Schwartz et al. 2003) developed for whole genome humanmouse alignments Mummer (Kurtz et al. 2004): even faster alternative, obtains exact matches of specified minimum length only (MUMs instead of HSPs) Genomebased taxonomy Empirical assessment of wholegenome comparisons Goris et al. (2007) published DDH values for 62 pairs of sequenced genomes for comparison with a genome sequence and Blastbased method called ”ANI” Search in GOLD and in the taxonomic literature revealed additional 31 genome pairs for which DDH values have been published Covering 15 „hybridization groups“ (sets of potentially closely related strains, e.g. from the same genus) Dataset includes fully sequenced as well as draft genomes Correlation between DDH values obtained in vitro and pair wise genome distances calculated in silico determined Genomebased taxonomy Blat all in all performs best, followed by NCBI blastn Genomebased taxonomy Analogon of 70% DDH determined via regression or minimum error ratio s(d) = d * 122.5151 + 100.6280 Genomebased taxonomy Mummer performs better for larger minimum exact match lengths Genomebased taxonomy GBDP slightly outperforms ANI Best correlation obtained Minimal error ratio GBDP/NCBI Blastn 0.771 0.065 GBDP/Blat 0.760 0.065 ANI (Goris et al. 2007) 0.717 0.065 „Percentage conserved DNA“ (Goris et al. 2007) 0.686 0.081 As 1st step, ANI splits the genome sequence in 1,000 bp chunks; this seems to be unnecessary GBDP includes a richer set of distance functions and HSP/MUM determination programs Genomebased taxonomy Some distance functions accurately predict 70% DDH using only 20% of the genome sequence Genomebased taxonomy DDH correlates less well with 16S rDNA distances than genometogenome distances do Picture from Stackebrandt & Evers 2006 Best correlation with 16S rRNA distances obtained DDH 0.533 GBDP/NCBI Blastn 0.602 GBDP/Blat 0.612 GBDP/MUMmer 0.611 Genomebased taxonomy Wholegenome comparison available as web service: http://ggdc.gbdp.org/ Genomes received either through upload or from Genbank ... Genomebased taxonomy Results are received via email Dear User, please find below the results of your GGDC job. A short explanation is found at the end of this email. Further information is provided on our website at http://www.gbdp.org/species/. If you use this service in a publication, please cite the appropriate references listed there. Also, please report any apparent calculation errors or other anomalous results to the authors. Sincerely, Your GGDC team ======================================== Submission: 09112315384417682 Query: taxon1 Reference: CP001407.1 One entry per reference Program: NCBIBLAST Formula: 1 (HSP length / total length) genome and distance formula Distance: 0.1661101 DDH estimate (regressionbased): 76.7180478 Estimate of DDH <=70% (thresholdbased): no (threshold=0.2676) ======================================== Submission: 09112315384417682 Query: taxon1 Reference: CP001407.1 ... Genomebased taxonomy A standard operation procedure (SOP) for the use of the server is available Standards in Genomic Sciences (2010) 2: 142-148 DOI:10.4056/sigs.541628 The Genomic Standards Consortium Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs Alexander F. Auch1, Hans-Peter Klenk2*, Markus Goeker2 1 Center for Bioinformatics Tübingen, Eberhard-Karls-Universitaet, Tübingen, Germany 2 DSMZ – German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany. * Corresponding author : Hans-Peter Klenk. Keywords : BLAST, GBDP, GGDC web server, genomics, MUMmer, phylogeny, species delineation, microbial taxonomy. Standards in Genomic Sciences (2010) 2: 117-134 DOI:10.4056/sigs.531120 The Genomic Standards Consortium Digital DNA-DNA hybridization for microbial species delineation by means of genome-to-genome sequence comparison Alexander F. Auch1, Mathias von Jan2, Hans-Peter Klenk2*, Markus Goeker2 1 Center for Bioinformatics Tübingen, Eberhard-Karls-Universitaet, Tübingen, Germany 2 DSMZ – German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany * Corresponding author : Hans-Peter Klenk. Keywords : Archaea , Bacteria , BLAST, GBDP, genomics, MUMmer, phylogeny, species concept, taxonomy. Genomebased taxonomy Next steps in genomebased taxonomy Midterm goals: Enlargement of reference datasets for empirical tests and test of more distance functions Automatically updated server for the genomes of type strains (GEBA etc.) Genome sequencing as routine part of the type strain accession process in culture collections Prognosis: Significantly increased impact of genome sequencing on Prokaryote classification Genomebased taxonomy GBDP can be used to infer whole genome phylogenetic trees Neighbournet phylogenetic network Genomebased taxonomy Henz et al. 2005 En route to a genomebased taxonomy Analysis of genome sequences can be applied at all taxonomic levels Techniques will soon be sufficiently fast and affordable Picture from Vandamme et al. 1996, modified Genomebased taxonomy