Introduction to polyphasic taxonomy
Peter Vandamme
EUROBIOFILMS - Third European Congress on Microbial Biofilms
Ghent, Belgium, 9 - 12 September 2013
http://www.lm.ugent.be/
Content
The observation of diversity: phenotypic and genotypic
coherence allows us to define bacterial species
Taxonomy and species definitions vary with technology:
old and new practices
Phenotypic and numerical taxonomy
DNA
Phylogeny
Polyphasic taxonomy
Whole genome sequences
Observation of diversity in species
Campylobacter lari whole cell protein patterns
Observation of diversity in species
Campylobacter jejuni RAPD patterns
Observation of diversity in species: AFLP
Origin of diversity: genetic drift
Evolution
• Growth, genetic drift, physical separation and
periods of selection lead to evolution and
variation in bacterial genomes
– Size & organization
– Content
– Sequence
Genome size and organization
Genome size varies from 580,074 bp (Mycoplasma genitalium) to
9,105,828 bp (Bradyrhizobium japonicum)
1 circular chromosome (e.g. Escherichia coli 4.6-5.4 Mbp)
Multiple circular chromosomes
e.g. Ralstonia solanacearum 3.7 Mbp and 2.1 Mbp;
Burkholderia cenocepacia 3.8 Mbp, 3.2 Mbp and 0.9 Mbp
1 linear chromosome (e.g. Borrelia burgdorferi 0.9 Mbp)
1 linear and 1 circular chromosome (e.g. Agrobacterium
tumefaciens 2.8 and 2.1 Mbp)
Variability in gene content
• Venn diagram showing core and accessory genes for
Streptococcus species. The surfaces are approximately
proportional to the number of genes (Lefébure and
Stanhope 2007 Genome Biol. 8: R71)
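The core and accessory partition in a Venn diagram like this reduces to set operations once each strain's genes have been grouped into gene families. The following is a minimal sketch. It assumes that orthology clustering has already been done (not shown). The strain names and gene-family identifiers are hypothetical.

```python
def partition_gene_content(genomes: dict[str, set[str]]):
    """Return (core, accessory, pan) gene-family sets for a group of strains.

    genomes maps a strain name to the set of gene-family IDs it carries;
    the IDs are assumed to come from an earlier orthology-clustering step.
    """
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)  # families present in every strain
    pan = set.union(*gene_sets)          # families present in at least one strain
    accessory = pan - core               # families missing from at least one strain
    return core, accessory, pan


def strain_specific(genomes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Families found in exactly one strain (a subset of the accessory genome)."""
    unique = {}
    for name, genes in genomes.items():
        others = set().union(*(g for n, g in genomes.items() if n != name))
        unique[name] = genes - others
    return unique


# Hypothetical toy data: three strains, six gene families.
genomes = {
    "strain_A": {"fam1", "fam2", "fam3", "fam4"},
    "strain_B": {"fam1", "fam2", "fam3", "fam5"},
    "strain_C": {"fam1", "fam2", "fam3", "fam4", "fam6"},
}
core, accessory, pan = partition_gene_content(genomes)
print(sorted(core))        # ['fam1', 'fam2', 'fam3']
print(sorted(accessory))   # ['fam4', 'fam5', 'fam6']
print(strain_specific(genomes)["strain_B"])  # {'fam5'}
```

With real genomes the hard part is the orthology clustering that produces the family IDs. The set algebra itself stays this simple.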
Gene content
The number of genes two genomes have in common
depends on their evolutionary distance
[Figure: fraction of shared genes plotted against the average number of nucleotide substitutions per site in 16S rRNA]
The species core and pangenome
Fig. 2. GBS (group B Streptococcus, Streptococcus agalactiae) core genome
Tettelin et al. (2005) Proc. Natl. Acad. Sci. USA 102, 13950-13955
Copyright ©2005 by the National Academy of Sciences
Fig. 3. GBS pan-genome
Tettelin et al. (2005) Proc. Natl. Acad. Sci. USA 102, 13950-13955
Copyright ©2005 by the National Academy of Sciences
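Curves like the core- and pan-genome plots can be approximated empirically. Genomes are added one at a time in random order, recording the running intersection (core) and union (pan-genome), and the sizes are averaged over many orderings. This is a simplified sketch. Tettelin et al. additionally fitted models to such curves to extrapolate, which is not reproduced here. The toy genomes are hypothetical.

```python
import random
from statistics import mean


def accumulation_curves(genomes: dict[str, set[str]], n_perm: int = 200, seed: int = 1):
    """Mean core- and pan-genome size after sampling k = 1..N genomes.

    For each random ordering, genomes are added one at a time: the core is the
    running intersection and the pan-genome the running union of gene families.
    """
    rng = random.Random(seed)
    gene_sets = list(genomes.values())
    n = len(gene_sets)
    core_by_k = [[] for _ in range(n)]
    pan_by_k = [[] for _ in range(n)]
    for _ in range(n_perm):
        order = rng.sample(gene_sets, n)  # one random ordering of all genomes
        core, pan = set(order[0]), set(order[0])
        for k, genes in enumerate(order):
            core &= genes
            pan |= genes
            core_by_k[k].append(len(core))
            pan_by_k[k].append(len(pan))
    return [mean(v) for v in core_by_k], [mean(v) for v in pan_by_k]


# Hypothetical toy data (real analyses use full gene-family tables).
genomes = {
    "strain_A": {"fam1", "fam2", "fam3", "fam4"},
    "strain_B": {"fam1", "fam2", "fam3", "fam5"},
    "strain_C": {"fam1", "fam2", "fam3", "fam4", "fam6"},
    "strain_D": {"fam1", "fam2", "fam7"},
}
core_curve, pan_curve = accumulation_curves(genomes)
for k, (c, p) in enumerate(zip(core_curve, pan_curve), start=1):
    print(f"{k} genomes: core {c:.2f}, pan {p:.2f}")
```

The core curve can only fall and the pan-genome curve can only rise as genomes are added. The shape of the pan-genome curve, whether it levels off or keeps growing, is what distinguishes "closed" from "open" pan-genomes.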
Lefébure et al. 2010: WGS of 96
C. coli and C. jejuni strains
The two species have a similar pangenome size; however, C. coli has
acquired a larger core genome and
each species has evolved a number of
species-specific core genes, possibly
reflecting different adaptive
strategies, in spite of their occurrence
in the same niche (the
gastrointestinal tract of several
hosts).
Recombination within the core
genome is frequent within species,
rare between sister species, and
extremely rare with other species.
Each species' pan-genome has acquired
unique and cohesive features that
define its genomic identity.
Difference in sequence?
• Relative occurrence of di-, tri-, tetra- (…)
nucleotides: Karlin signatures
• Genes that are shared between organisms can
differ considerably in sequence. The level of
sequence identity between orthologous genes is
summarised by the ANI parameter (ANI: average
nucleotide identity)
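Karlin's dinucleotide relative abundance compares the observed frequency of each dinucleotide XY with the frequency expected from its component bases: rho_XY = f_XY / (f_X * f_Y). The sketch below has three caveats:

- It assumes the commonly used symmetrised form, with counts taken over the sequence plus its reverse complement.
- It summarises the difference between two signatures as delta*, the mean absolute difference over the 16 dinucleotides.
- The example sequences are hypothetical and far shorter than the genome-scale input the signature is intended for.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def relative_abundance(seq: str) -> dict[str, float]:
    """Dinucleotide relative abundance rho_XY = f_XY / (f_X * f_Y).

    Counts cover the sequence and its reverse complement, so the profile does
    not depend on which strand was sequenced. Symbols other than A, C, G, T
    are ignored; rho is NaN for a dinucleotide whose bases are absent.
    """
    seq = seq.upper()
    strands = (seq, seq.translate(COMPLEMENT)[::-1])
    mono = dict.fromkeys("ACGT", 0)
    di = dict.fromkeys(DINUCLEOTIDES, 0)
    for strand in strands:
        for base in strand:
            if base in mono:
                mono[base] += 1
        for i in range(len(strand) - 1):
            pair = strand[i:i + 2]
            if pair in di:
                di[pair] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    rho = {}
    for pair in DINUCLEOTIDES:
        expected = (mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono)
        rho[pair] = (di[pair] / n_di) / expected if expected else float("nan")
    return rho


def delta_star(rho_f: dict[str, float], rho_g: dict[str, float]) -> float:
    """Mean absolute difference between two dinucleotide signatures."""
    return sum(abs(rho_f[p] - rho_g[p]) for p in DINUCLEOTIDES) / len(DINUCLEOTIDES)


# Hypothetical short sequences, for illustration only.
seq_f = "ATGCGCGCATATCGCGGCTAGCTAGGCGCGATCG"
seq_g = "ATTTAAATTATAAATCGATTTAAAGGCCTTAAAT"
rho_f, rho_g = relative_abundance(seq_f), relative_abundance(seq_g)
print(f"delta* = {delta_star(rho_f, rho_g):.3f}")
```

Because of the reverse-complement symmetrisation, complementary dinucleotides such as AC and GT always receive the same rho value.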
Variability in gene content
• Genomes seem to be composed of a core set of genes
that is conserved among strains of the same species and
accessory genes that are strain specific.
• The content and size of the core vary with species.
• Although it is clear that mechanisms exist for abundant
and widespread genetic transfer between microbial
lineages, the observation of phenotypic and genotypic
clustering argues for genomic stability and cohesion.
Lateral gene transfer (LGT) and recombination in particular
are now considered cohesive rather than disruptive forces
in bacterial species.
* Konstantinidis and Tiedje. 2005. Genomic insights that advance the species definition for prokaryotes. PNAS 102:2567-2572
How is this information used to
define bacterial species?
“...Taxonomy is written by taxonomists for
taxonomists; in this form the subject is so
dull that few, if any, non-taxonomists are
tempted to read it, and presumably even
fewer try their hand at it. It is the most
subjective branch of any biological
discipline, and in many ways is more of an
art than a science...”
(S. T. Cowan, 1971)
The bacterial species concept,
definition & taxonomy
• There is a practical need to define bacterial
species because a name conveys information.
• The approaches used to define bacterial species,
past and present, reflect the state of the art in
science and technology.
• The observation of phenotypic and genotypic
clustering argues for genomic stability and
cohesion.
• Such clusters could be called species.
The bacterial species concept,
definition & taxonomy
• Progress in the field of taxonomy has been
dominated by technological progress. Initially
(until the 1950s), ‘conventional’ bacterial
taxonomy placed heavy emphasis on analyses of
phenotypic properties of the organism.
• To define and identify an organism, one must
assess several of its phenotypic properties, from
general to specific.
Phenotypic characterisation
Numerical taxonomy
• In the 1950s – 1960s it became evident that
the analysis of large numbers of characteristics
provided a more stable classification and a
superior means to classify and identify bacteria.
• First generations of computers were used to
analyze large data sets of biochemical and
phenotypic characteristics
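Numerical taxonomy typically works in three steps:

1. Score each strain for many binary characters.
2. Compute a similarity coefficient for every pair of strains.
3. Cluster the resulting matrix.

The sketch below uses the simple matching coefficient and UPGMA (average-linkage) clustering, two classic choices. The strains and test results are hypothetical. Cluster similarities are recomputed from scratch at each step for clarity rather than speed.

```python
from itertools import combinations


def simple_matching(a, b) -> float:
    """S_SM: fraction of tests on which two strains agree (both + or both -)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_similarity(c1, c2, profiles) -> float:
    """Average S_SM over all strain pairs taken across two clusters (UPGMA linkage)."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(simple_matching(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def upgma(profiles):
    """Agglomerative UPGMA clustering; returns merges as (cluster1, cluster2, similarity)."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the highest mean between-cluster similarity.
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: mean_similarity(pair[0], pair[1], profiles))
        merges.append((c1, c2, mean_similarity(c1, c2, profiles)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return merges


# Hypothetical results of eight binary tests (1 = positive, 0 = negative).
profiles = {
    "S1": (1, 1, 0, 1, 0, 1, 1, 0),
    "S2": (1, 1, 0, 1, 0, 1, 0, 0),
    "S3": (0, 0, 1, 0, 1, 1, 1, 1),
    "S4": (0, 0, 1, 0, 1, 0, 1, 1),
}
for c1, c2, s in upgma(profiles):
    print(c1, "+", c2, f"at S_SM = {s:.3f}")
```

The merge list is the information a dendrogram draws. Here S1/S2 and S3/S4 join at high similarity, and the two groups join only at a low level.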
Discovery of the secret of life
• DNA was used to
classify bacteria!
• Determining the guanine
plus cytosine base ratio
(GC ratio) of the DNA of
the organism can be part
of this process.
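The G+C content is usually reported in mol%. Before genome sequencing, it was determined physico-chemically, for example from the DNA melting temperature or by HPLC. From a sequence it reduces to a count. A minimal sketch, assuming ambiguous symbols such as N should simply be ignored:

```python
def gc_content(seq: str) -> float:
    """G+C content in mol%, counted over unambiguous bases (A, C, G, T) only."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("no A, C, G or T in sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


print(gc_content("ATGCGCAT"))            # 50.0
print(round(gc_content("GGCCNNAT"), 2))  # 66.67
```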
DNA-DNA hybridisation
• Single-stranded whole
genomic DNA of two strains
is hybridised. The thermal
stability of the obtained
heterologous hybrid
(expressed as a percentage
value) is a measure of
whole genome sequence
similarity.
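DNA-DNA hybridisation results are commonly reported in two ways:

- Relative binding: the heterologous reassociation signal expressed as a percentage of the homologous (self) reaction.
- Delta Tm: the drop in melting temperature of the heterologous hybrid compared with the homologous one.

The functions below only perform this final arithmetic. How the signals are measured depends on the laboratory protocol and is not modelled. The readings are hypothetical.

```python
def relative_binding(heterologous: float, homologous: float) -> float:
    """% DDH: heterologous hybrid signal relative to the homologous (self) reaction."""
    return 100.0 * heterologous / homologous


def delta_tm(tm_homologous: float, tm_heterologous: float) -> float:
    """Decrease in melting temperature (deg C) of the heterologous hybrid."""
    return tm_homologous - tm_heterologous


# Hypothetical readings: signals in arbitrary units, temperatures in deg C.
ddh = relative_binding(heterologous=875.0, homologous=1250.0)
dtm = delta_tm(tm_homologous=85.0, tm_heterologous=81.5)
print(f"DDH = {ddh:.1f}%, delta Tm = {dtm:.1f} C")  # DDH = 70.0%, delta Tm = 3.5 C
```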
Ad Hoc Committees on Reconciliation of
Approaches to Bacterial Systematics
(Wayne et al. 1987; times cited as of 08/09/2013: 3,261)
• The complete genome should be the reference
standard to determine phylogeny and taxonomy
• Pending routine access to whole genome sequences,
measuring the thermal stability between two genomes
through DNA-DNA hybridization represents the best
indirect assessment of the level of whole genome
sequence similarity
• The phylogenetic definition of coherent phenotypic
clusters, called species, generally would include strains
with at least 60 - 70% DNA-DNA hybridization
What about phylogeny?
• DNA-DNA hybridisations between organisms considered
closely related very often yielded low DNA-DNA
hybridisation values, just like DNA-DNA hybridisations
between completely different bacteria, so DNA-DNA
hybridisation could not resolve relationships above the
species level.
• Perhaps, if evolution of the whole genome cannot be
measured, similarities in more conserved parts of the
genome might be more accessible?
• A gene encoding a highly conserved function
(chronometer) might be a good target: rRNA genes?
• DNA-rRNA hybridisations provided a framework of five
rRNA superfamilies which corresponded with the five
subdivisions in the Proteobacteria.
• Technological progress allowed ‘isolation’ and sequence
analysis of conserved genes.
Molecular clocks
(chronometers)
• The most widely used molecular clocks (‘single-locus
approaches’) are small subunit ribosomal RNA (SSU
rRNA) genes
– Found in all domains of life (not the case with other
chronometers)
• 16S rRNA in prokaryotes and 18S rRNA in eukaryotes
– Functionally constant
– Sufficiently conserved (change slowly), with variable regions (V1-V9), but often too conserved to discriminate between closely related
species
– Sufficient length
– Without (?) lateral gene transfer or recombination: differences
should be caused primarily by point mutations, such that the
number of nucleotide differences correlates with the number of
changes through evolution
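The last point assumes that counted differences track evolutionary changes. However, the observed proportion of differing sites (the p-distance) underestimates the true number of substitutions once sites have changed more than once. The Jukes-Cantor model is the simplest standard correction, d = -3/4 ln(1 - 4p/3). It is swapped in here for illustration; the slide does not name a model. A minimal sketch with hypothetical aligned fragments:

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """Estimated substitutions per site under the Jukes-Cantor model."""
    if not 0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned fragments; real comparisons use aligned full-length 16S rRNA genes.
p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
d = jukes_cantor(p)
print(f"p = {p:.3f}, JC distance = {d:.3f}")  # p = 0.200, JC distance = 0.233
```

The corrected distance is always at least as large as p, and the gap between the two widens as sequences diverge.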
Ribosomal Database Project
• The Ribosomal Database Project (RDP)
• A large collection of rRNA sequences
• Provides a variety of analytical programs
• RDP Release 10, Update 32: May 14,
2013: 2,765,278 16S rRNAs
• http://rdp.cme.msu.edu/
• Phylogenetic trees reflecting similarity in
ribosomal RNA sequences, but assumed to
reflect organismal phylogeny have now been
prepared for all the major prokaryotic and
eukaryotic groups.
'The All-Species Living Tree'
Project
• Public databases accumulated poor quality and
erroneously annotated sequences.
• The need for curated databases!
• http://www.arb-silva.de/projects/living-tree/
16S rRNA sequence analysis:
advantages
• There are several technological and scientific
advantages to using 16S rRNA gene sequences
for studying the phylogeny of bacteria. The main
assets are:
• The availability of a near-universal database
• The availability of 16S rRNA primers targeting
highly conserved regions
16S rRNA sequence analysis:
caveats
• Often insufficient
diversity to distinguish
closely related species
(Fox et al., 1992. How close is
close: 16S rRNA sequence
identity may not be sufficient
to guarantee species
identity).
• Often too much diversity within species:
– 2.5-3% (Stackebrandt and Goebel. 1994. Taxonomic
note: a place for DNA-DNA reassociation and 16S rRNA
sequence analysis in the present species definition in
bacteriology)
– 4-5% in 16S rRNA genes of epsilon proteobacteria
Limits of 16S rRNA based
phylogeny
• Tentative representation of the phylogeny of closely
related bacteria
Evolutionary relationships
of prokaryotes
• Chronometers: (genes of)
– ribosomal proteins and RNAs
– Cytochrome
– Fe-S proteins (e.g. ferredoxins)
– ATPase (synthesis/hydrolysis of ATP)
– recA (recombination protein)
– gyrB, groEL, rpoB...
Analysis of other chronometers
to study phylogeny of bacteria?
• Pro: the less conserved nature of these
genes facilitates a higher taxonomic
resolution between closely related bacteria
• Con:
– Not universally present
– No universal databases
– Development of universal primers proved
impossible
– Interference of recombination and lateral
gene transfer
Limits of recA based
phylogeny
Polyphasic taxonomy
• There is no single molecule that represents all
organismal relationships adequately.
• Different molecules carry different types of information.
• A wealth of other methods was developed which were,
just like the original biochemical tests, used to classify
and identify bacteria. All of these methods carried some
information that could be used as an indirect measure of
whole genome similarity between isolates.
Chemotaxonomy - Respiratory
quinones
Chemotaxonomy - Phospholipid
analysis
Chemotaxonomy - Polyamine
analysis
Chemotaxonomy - Whole cell
fatty acids
Whole-cell protein
electrophoresis: Azospirillum
SDS-PAGE and DNA-DNA hybridisation (Azospirillum)
[Figure: whole-cell protein patterns of Azospirillum strains (A. halopraeferens, LMG 7108T, DSM 2787T, ATCC 29145T and isolates Au 2, Au 5, Au 7, Au 9, Au 10, Au 11, Au 12, Y9, Y13, SpBr17) with % DNA binding values against 7108T and 2787T; reference species A. amazonense, A. brasilense and A. lipoferum]
Comparison of MALDI-TOF MS
spectral patterns
Raman spectroscopy
Genotyping - Ribotyping
[Figure: ribotype patterns of Lactobacillus sakei and Lactobacillus curvatus strains]
Genotyping - AFLP Campylobacter
Polyphasic taxonomy
• Consensus approach to bacterial taxonomy which
integrates several generally accepted ideas for the
classification of bacteria
– Species delineation is based on DNA-DNA
hybridisation experiments
– Bacterial phylogeny can be studied through
comparative sequence analysis of conserved
macromolecules such as 16S rRNA
– Polyphasic taxonomy determines and
acknowledges the value of other methods for the
delineation of bacteria at different hierarchical
levels
• The aim is to collect as much information as possible
in order to define a pragmatic consensus classification
that facilitates identification
Polyphasic species definition
• The bacterial species appears to be an
assemblage of isolates originating from a
common ancestor population in which genetic
drift resulted in clones with different degrees
of recombination and characterized by:
– a certain degree of phenotypic consistency
– a significant degree of DNA-DNA hybridization
– over 97% 16S rRNA gene sequence similarity
[Diagram: polyphasic taxonomy and genomic taxonomy, observations 1-3]
Now that we have access to whole-genome
sequences: what do they tell us?
Gene content could be used to
define species …
• Venn diagram showing core and accessory genes for
Streptococcus species. The surfaces are approximately
proportional to the number of genes (Lefébure and
Stanhope 2007 Genome Biol. 8: R71)
… and higher taxonomic units
Average Nucleotide Identity?
• Genomes seem to be composed of a core set of genes
that is conserved among strains of the same species and
accessory genes that are strain specific
• Phylogenetic signal present in core genes (ANI values):
95% ANI corresponds with 70% DNA-DNA hybridisation
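The ANI idea can be illustrated with a simplified, gene-based calculation: the mean percent identity over the orthologous gene pairs that two genomes share. This sketch rests on several assumptions:

- The orthologs are taken as already identified and aligned.
- The fragmentation, BLAST search and hit-filtering steps used by standard ANI tools are not reproduced.
- The sequences are hypothetical.

The 95% cut-off is the value quoted above as corresponding to 70% DNA-DNA hybridisation.

```python
def percent_identity(aln1: str, aln2: str) -> float:
    """Identity (%) over the ungapped columns of a pairwise alignment."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aln1.upper(), aln2.upper()) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("alignment has no ungapped columns")
    return 100.0 * sum(a == b for a, b in columns) / len(columns)


def gene_based_ani(ortholog_alignments: list[tuple[str, str]]) -> float:
    """Mean identity over aligned orthologous gene pairs shared by two genomes."""
    identities = [percent_identity(a, b) for a, b in ortholog_alignments]
    return sum(identities) / len(identities)


# Hypothetical, pre-aligned orthologous gene pairs from genomes X and Y.
orthologs = [
    ("ACGTACGTAC", "ACGTACGTAC"),
    ("ACGTACGTAC", "ACGTACGTAA"),
    ("ACGT-CGTAC", "ACGTTCGTAC"),
]
ani = gene_based_ani(orthologs)
print(f"ANI = {ani:.2f}%")            # ANI = 96.67%
print("above 95% ANI:", ani >= 95.0)  # above 95% ANI: True
```

Only shared genes enter this calculation, so ANI carries no information about the accessory genes one genome has and the other lacks.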
• ANI does not necessarily correlate with gene content
– ANI values reflect phylogeny
– Gene content reflects ecology
• Bacteria with considerable differences in gene content
are classified in the same species
ANI based phylogeny
Figure 3
Conclusions (1)
– Whole genome sequences can become part of
polyphasic taxonomy and the standard
description of bacterial species.
– Whole genome sequences provide parameters
for a superior reconstruction of organismal
phylogeny and for the delineation of species
as defined by DNA-DNA hybridization.
– Why hold on to DNA-DNA hybridization level
as a standard?
Conclusions (2)
– Currently, fewer than 10,000 bacterial
species have been described, representing
far less than 0.1% of the existing
bacterial diversity.
– The present practice of polyphasic
taxonomy as requested by the editorial
boards of taxonomic journals is
counterproductive in light of the vast
microbial diversity that remains to be
described.