Introduction to polyphasic taxonomy
Peter Vandamme
EUROBIOFILMS - Third European Congress on Microbial Biofilms, Ghent, Belgium, 9-12 September 2013
http://www.lm.ugent.be/

Content
• The observation of diversity: phenotypic and genotypic coherence allows bacterial species to be defined
• Taxonomy and species definitions vary with technology: old and new practices
  – Phenotypic and numerical taxonomy
  – DNA
  – Phylogeny
  – Polyphasic taxonomy
  – Whole genome sequences

Observation of diversity in species: Campylobacter lari whole-cell protein patterns

Observation of diversity in species: Campylobacter jejuni RAPD patterns

Observation of diversity in species: AFLP

Origin of diversity: genetic drift

Evolution
• Growth, genetic drift, physical separation and periods of selection lead to evolution and variation in bacterial genomes:
  – Size and organization
  – Content
  – Sequence

Genome size and organization
• Genome size varies from 580,074 bp (Mycoplasma genitalium) to 9,105,828 bp (Bradyrhizobium japonicum)
• One circular chromosome (e.g. Escherichia coli, 4.6-5.4 Mbp)
• Multiple circular chromosomes (e.g. Ralstonia solanacearum, 3.7 Mbp and 2.1 Mbp; Burkholderia cenocepacia, 3.8 Mbp, 3.2 Mbp and 0.9 Mbp)
• One linear chromosome (e.g. Borrelia burgdorferi, 0.9 Mbp)
• One linear and one circular chromosome (e.g. Agrobacterium tumefaciens, 2.8 and 2.1 Mbp)

Variability in gene content
• Venn diagram showing core and accessory genes for Streptococcus species. The surfaces are approximately proportional to the number of genes (Lefébure and Stanhope 2007 Genome Biol. 8: R71)

Gene content
• The number of genes two genomes have in common depends on their evolutionary distance (figure: fraction of shared genes plotted against the average number of nucleotide substitutions per site in 16S rRNA)

The species core and pangenome
• Fig. 2. GBS (group B Streptococcus) core genome. Tettelin et al. (2005) Proc. Natl. Acad. Sci. USA 102, 13950-13955. Copyright ©2005 by the National Academy of Sciences
• Fig. 3. GBS pan-genome. Tettelin et al. (2005) Proc. Natl. Acad. Sci. USA 102, 13950-13955. Copyright ©2005 by the National Academy of Sciences

Lefébure et al. 2010: whole-genome sequences of 96 C. coli and C. jejuni strains
• The two species have a similar pangenome size; however, C. coli has acquired a larger core genome, and each species has evolved a number of species-specific core genes, possibly reflecting different adaptive strategies in spite of their occurrence in the same niche (the gastrointestinal tract of several hosts).
• Recombination within the core genome is frequent within species, rare between sister species and extremely rare with other species.
• The pan-genomes of both species have acquired unique and cohesive features that define their genomic identity.

Difference in sequence?
• Relative occurrence of di-, tri-, tetra- (…) nucleotides: Karlin signatures
• Genes that are shared between organisms can differ considerably in sequence. The percentage sequence divergence in orthologous genes is described by the ANI parameter (ANI: average nucleotide identity)

Variability in gene content
• Genomes seem to be composed of a core set of genes that is conserved among strains of the same species and accessory genes that are strain specific.
• The content and size of the core vary with species.
• Although mechanisms clearly exist for abundant and widespread genetic transfer between microbial lineages, the observation of phenotypic and genotypic clustering argues for genomic stability and cohesion. In particular, lateral gene transfer (LGT) and recombination are now considered cohesive rather than disruptive forces in bacterial species.
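The split into a core genome and strain-specific accessory genes described above comes down to set operations, provided each genome has already been reduced to a set of orthologous gene families. The clustering step is not shown and is assumed to have been done. A minimal Python sketch; the strain and gene-family names are hypothetical placeholders:

```python
from collections import Counter
from functools import reduce


def core_and_pan(genomes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (core, pan) gene families for a collection of genomes.

    Each genome is a set of orthologous gene-family IDs; grouping genes
    into families is assumed to have been done beforehand.
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    gene_sets = list(genomes.values())
    core = reduce(set.intersection, gene_sets)  # families found in every genome
    pan = reduce(set.union, gene_sets)          # families found in any genome
    return set(core), set(pan)


def accessory(genomes: dict[str, set[str]], core: set[str]) -> dict[str, set[str]]:
    """Accessory families of each genome, i.e. everything outside the core."""
    return {name: genes - core for name, genes in genomes.items()}


def strain_specific(genomes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Families that occur in exactly one genome of the collection."""
    counts = Counter(g for genes in genomes.values() for g in genes)
    return {name: {g for g in genes if counts[g] == 1}
            for name, genes in genomes.items()}


if __name__ == "__main__":
    # Toy data: three hypothetical strains of one species
    strains = {
        "strain_A": {"fam1", "fam2", "fam3", "fam4"},
        "strain_B": {"fam1", "fam2", "fam3", "fam5"},
        "strain_C": {"fam1", "fam2", "fam3", "fam4", "fam6"},
    }
    core, pan = core_and_pan(strains)
    print("core:", sorted(core))
    print("pan-genome size:", len(pan))
    print("strain-specific:", strain_specific(strains))
```

By construction, adding strains can only keep or shrink the core and only keep or enlarge the pan-genome, which is why both depend on how many strains are sampled.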
* Konstantinidis and Tiedje. 2005. Genomic insights that advance the species definition for prokaryotes. PNAS 102:2567-2572

How is this information used to define bacterial species?

“...Taxonomy is written by taxonomists for taxonomists; in this form the subject is so dull that few, if any, non-taxonomists are tempted to read it, and presumably even fewer try their hand at it. It is the most subjective branch of any biological discipline, and in many ways is more of an art than a science...” (S. T. Cowan, 1971)

The bacterial species concept, definition & taxonomy
• There is a practical need to define bacterial species, as a name bears information.
• The approaches used to define bacterial species, past and present, reflect the state of the art in science and technology.
• The observation of phenotypic and genotypic clustering argues for genomic stability and cohesion.
• Such clusters could be called species.
• Progress in the field of taxonomy has been dominated by technological progress. Initially (until the 1950s), ‘conventional’ bacterial taxonomy placed heavy emphasis on analyses of the phenotypic properties of the organism.
• To define and identify an organism, one must assess several of its phenotypic properties, from general to specific.

Phenotypic characterisation

Numerical taxonomy
• In the 1950s-1960s it became evident that the analysis of large numbers of characteristics provided a more stable classification and a superior means to classify and identify bacteria.
• The first generations of computers were used to analyse large data sets of biochemical and phenotypic characteristics.

Discovery of the secret of life
• DNA was used to classify bacteria!
• Determining the guanine plus cytosine base ratio (GC ratio) of the organism's DNA can be part of this process.

DNA-DNA hybridisation
• Single-stranded whole genomic DNA of two strains is hybridised.
• The degree of hybridisation (expressed as a percentage value) and the thermal stability of the resulting heterologous hybrid are measures of whole genome sequence similarity.

Ad Hoc Committee on Reconciliation of Approaches to Bacterial Systematics (Wayne et al. 1987; cited 3,261 times as of 08/09/2013)
• The complete genome should be the reference standard to determine phylogeny and taxonomy.
• Pending routine access to whole genome sequences, measuring the thermal stability between two genomes through DNA-DNA hybridisation represents the best indirect assessment of the level of whole genome sequence similarity.
• The phylogenetic definition of coherent phenotypic clusters, called species, generally would include strains with at least 60-70% DNA-DNA hybridisation.

What about phylogeny?
• DNA-DNA hybridisations between organisms considered closely related very often yielded values as low as those obtained between completely different bacteria.
• Perhaps, if evolution of the whole genome cannot be measured, similarities in more conserved parts of the genome might be more accessible?
• A gene encoding a highly conserved function (a chronometer) might be a good target: rRNA genes?
• DNA-rRNA hybridisations provided a framework of five rRNA superfamilies, which corresponded with the five subdivisions of the Proteobacteria.
• Technological progress allowed ‘isolation’ and sequence analysis of conserved genes.

Molecular clocks (chronometers)
• The most widely used molecular clocks (‘single locus approaches’) are small subunit ribosomal RNA (SSU rRNA) genes:
  – Found in all domains of life (not the case with other chronometers): 16S rRNA in prokaryotes and 18S rRNA in eukaryotes
  – Functionally constant
  – Sufficiently conserved (they change slowly), with variable regions (V1-V9), but too conserved to discriminate between closely related species
  – Sufficient length
  – Presumably (?) free of lateral gene transfer or recombination: differences should be caused primarily by point mutations, so that the number of nucleotide differences correlates with the number of changes through evolution

Ribosomal Database Project
• The Ribosomal Database Project (RDP)
• A large collection of rRNA sequences
• Provides a variety of analytical programs
• RDP Release 10, Update 32 (May 14, 2013): 2,765,278 16S rRNAs
• http://rdp.cme.msu.edu/

• Phylogenetic trees reflecting similarity in ribosomal RNA sequences, and assumed to reflect organismal phylogeny, have now been prepared for all the major prokaryotic and eukaryotic groups.

'The All-Species Living Tree' Project
• Public databases have accumulated poor-quality and erroneously annotated sequences.
• Hence the need for curated databases!
• http://www.arb-silva.de/projects/living-tree/

16S rRNA sequence analysis: advantages
• There are several technological and scientific advantages to using 16S rRNA gene sequences for studying the phylogeny of bacteria. The main assets are:
  – The availability of a near-universal database
  – The availability of primers targeting highly conserved regions of the 16S rRNA gene

16S rRNA sequence analysis: caveats
• Often insufficient diversity to distinguish closely related species (Fox et al., 1992. How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity).
• Often too much diversity within species:
  – 2.5-3% (Stackebrandt and Goebel. 1994. Taxonomic note: a place for DNA-DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology)
  – 4-5% in 16S rRNA genes of epsilon proteobacteria
• Only a tentative representation of the phylogeny of closely related bacteria

Limits of 16S rRNA based phylogeny

Evolutionary relationships of prokaryotes
• Chronometers (genes of):
  – ribosomal proteins and RNAs
  – cytochromes
  – Fe-S proteins (e.g. ferredoxins)
  – ATPase (synthesis/hydrolysis of ATP)
  – recA (recombination protein)
  – gyrB, groEL, rpoB...

Analysis of other chronometers to study the phylogeny of bacteria?
• Pro: the less conserved nature of these genes gives a higher taxonomic resolution between closely related bacteria
• Con:
  – Not universally present
  – No universal databases
  – Development of universal primers proved impossible
  – Interference of recombination and lateral gene transfer

Limits of recA based phylogeny

Polyphasic taxonomy
• There is no single molecule that represents all organismal relationships adequately.
• Different molecules carry different types of information.
• A wealth of other methods was developed which, just like the original biochemical tests, were used to classify and identify bacteria. All of these methods carried some information that could be used as an indirect measure of whole genome similarity between isolates.
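The 16S rRNA caveats above are expressed as percentage differences between aligned gene sequences (e.g. 2.5-3% within a species). A minimal Python sketch of how such a percentage can be computed, assuming the two sequences are already aligned to equal length with '-' as the gap character and that gapped columns are simply skipped; alignment tools differ in how they treat gaps and ambiguous bases, so values will not match any particular program exactly. The 97% cut-off in the helper (a 3% difference, the upper end of the Stackebrandt and Goebel range cited above) is an assumption of this sketch:

```python
def percent_identity(seq1: str, seq2: str, gap: str = "-") -> float:
    """Percentage of identical positions between two pre-aligned sequences.

    Columns with a gap in either sequence are skipped; this is only one of
    several conventions in use, so results may differ from other tools.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap or b == gap:
            continue  # ignore alignment gaps
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no overlapping positions to compare")
    return 100.0 * identical / compared


def suggests_distinct_species(identity: float, threshold: float = 97.0) -> bool:
    """True if 16S rRNA identity falls below the (assumed) 97% cut-off.

    The converse does not hold: identity above the cut-off does not prove
    that two strains belong to the same species.
    """
    return identity < threshold


if __name__ == "__main__":
    # Toy 20-nt aligned fragments differing at a single position
    s1 = "AGAGTTTGATCCTGGCTCAG"
    s2 = "AGAGTTTGATCATGGCTCAG"
    pid = percent_identity(s1, s2)
    print(pid)  # 95.0
    print(suggests_distinct_species(pid))  # True
```

The asymmetric helper mirrors the first caveat: a low 16S identity argues against conspecificity, but a high one leaves the question open.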
Chemotaxonomy - Respiratory quinones

Chemotaxonomy - Phospholipid analysis

Chemotaxonomy - Polyamine analysis

Chemotaxonomy - Whole cell fatty acids

Whole-cell protein electrophoresis: SDS-PAGE and DNA-DNA hybridisation in Azospirillum (figure: protein patterns and % DNA binding for strains of A. halopraeferens, A. amazonense, A. brasilense and A. lipoferum)

Comparison of MALDI-TOF MS spectral patterns

Raman spectroscopy

Genotyping - Ribotyping (figure: Lactobacillus sakei and Lactobacillus curvatus)

Genotyping - AFLP (figure: Campylobacter)

Polyphasic taxonomy
• Consensus approach to bacterial taxonomy which integrates several generally accepted ideas for the classification of bacteria:
  – Species delineation is based on DNA-DNA hybridisation experiments
  – Bacterial phylogeny can be studied through comparative sequence analysis of conserved macromolecules such as 16S rRNA
  – Polyphasic taxonomy determines and acknowledges the value of other methods for the delineation of bacteria at different hierarchical levels
• The aim is to collect as much information as possible in order to define a pragmatic consensus classification that facilitates identification

Polyphasic species definition
• The bacterial species appears to be an assemblage of isolates originating from a common ancestral population in which genetic drift resulted in clones with different degrees of recombination, characterized by:
  – a certain degree of phenotypic consistency
  – a significant degree of DNA-DNA hybridisation
  – over 97% 16S rRNA gene sequence similarity

Polyphasic and genomic taxonomy (figure: observations 1, 2 and 3)

Now that we have access to whole-genome sequences: what do they tell us?

Gene content could be used to define species … and higher taxonomic units
• Venn diagram showing core and accessory genes for Streptococcus species. The surfaces are approximately proportional to the number of genes (Lefébure and Stanhope 2007 Genome Biol. 8: R71)

Average Nucleotide Identity?
• Genomes seem to be composed of a core set of genes that is conserved among strains of the same species and accessory genes that are strain specific
• Phylogenetic signal present in core genes (ANI values): 95% ANI corresponds with 70% DNA-DNA hybridisation
• ANI does not necessarily correlate with gene content:
  – ANI values reflect phylogeny
  – Gene content reflects ecology
• Bacteria with considerable differences in gene content are classified in the same species

ANI based phylogeny (Figure 3)

Conclusions (1)
– Whole genome sequences can become part of polyphasic taxonomy and of the standard description of bacterial species.
– Whole genome sequences provide parameters for a superior reconstruction of organismal phylogeny and for the delineation of species as defined by DNA-DNA hybridisation.
– Why hold on to the DNA-DNA hybridisation level as a standard?

Conclusions (2)
– Currently, fewer than 10,000 bacterial species have been described, representing far less than 0.1% of the existing bacterial diversity.
– The present practice of polyphasic taxonomy, as requested by the editorial boards of taxonomic journals, is counterproductive in light of the vast microbial diversity that remains to be described.
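As summarised above, ANI averages the identity of the homologous regions two genomes share, and about 95% ANI corresponds with 70% DNA-DNA hybridisation. The Python sketch below covers only the averaging and thresholding step. It assumes an external aligner has already split the query genome into fragments and reported, for each fragment, the percent identity and aligned length of its best match in the reference genome. The 30% identity and 70% coverage filters are commonly used settings adopted here as assumptions, and `FragmentHit` is a hypothetical helper type:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FragmentHit:
    """Best alignment of one query-genome fragment against the reference genome."""
    identity: float       # percent identity of the alignment
    aligned_length: int   # aligned length in nucleotides
    fragment_length: int  # length of the query fragment in nucleotides


def average_nucleotide_identity(
    hits: list[FragmentHit],
    min_identity: float = 30.0,
    min_coverage: float = 0.70,
) -> float:
    """Mean percent identity over fragment hits that pass both filters.

    The default filter values are assumptions, not part of the talk.
    """
    kept = [
        h.identity
        for h in hits
        if h.fragment_length > 0
        and h.identity >= min_identity
        and h.aligned_length / h.fragment_length >= min_coverage
    ]
    if not kept:
        raise ValueError("no fragment hits passed the filters")
    return mean(kept)


def same_species_by_ani(ani: float, threshold: float = 95.0) -> bool:
    """Apply the ~95% ANI boundary that corresponds with ~70% DNA-DNA hybridisation."""
    return ani >= threshold


if __name__ == "__main__":
    hits = [
        FragmentHit(98.2, 1020, 1020),
        FragmentHit(96.5, 990, 1020),
        FragmentHit(91.0, 400, 1020),  # covers < 70% of the fragment: discarded
    ]
    ani = average_nucleotide_identity(hits)
    print(f"ANI = {ani:.2f}%, same species: {same_species_by_ani(ani)}")
```

Because the fragments come from only one of the two genomes, many implementations also compute ANI in the reverse direction and report both values or their average.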