REVIEWS
Intraspecies variation in bacterial genomes: the need for a species genome concept
Ruiting Lan and Peter R. Reeves
Bacterial populations are clonal. Their evolution involves not only divergence between orthologous genes but also gain of genes from other clones or species, which has only recently been widely appreciated through macrorestriction mapping, genomic subtraction and complete genome sequencing. Genes can also be lost in response to selection or by random mutation after becoming redundant. The bacterial genome is a dynamic structure and intraspecies variation needs to be included in genome analysis if we are to gain insight into the full species genome.
R. Lan and P.R. Reeves* are in the Dept of Microbiology, Bldg G08, University of Sydney, NSW 2006, Sydney, Australia. *tel: +61 2 9351 6045, fax: +61 2 9351 4571, e-mail: [email protected]
Bacteria are characterized by extensive intraspecies variation. There is not only the sequence variation found in all species, but also the presence or absence of whole genes or clusters of genes. In most eukaryotes, an individual genome sequence will provide us with the vast majority of the genes for that species. This is not the case for bacteria. Only if we have the sequence for all the DNA important for a species do we really have the ‘species’ genome, as distinct from an individual genome. It would probably require the sequence of many individual genomes to even approximate the full species genome, and alternative approaches are therefore required. This review will focus on the genome differences between clones of a species and the forces involved in generating this variation.
Assessing genome variation within species
It has long been known that bacteria can carry plasmids or lysogenic bacteriophages and that these elements are, in general, present in only some strains. There have been no extensive studies of the number of forms that can exist in a species, but it is certainly very large. For example, there are nine prophages/cryptic prophages in Escherichia coli K-12 (Ref. 1). Some are simply parasitic and not an integral part of the genome, but others carry genes that make them important for cell survival. There are also groups of genes found in the chromosome of some strains but not others. These are clearly part of the genome and it is this variation that will most concern us. The best known of these groups are the pathogenicity islands (PAIs), which carry genes that confer specific aspects of pathogenicity. As another example, strains of many bacterial species vary in metabolic capability, in areas such as which sugars or other substrates can be utilized; in some cases, this has been shown to be the result of the presence or absence of the relevant genes, and this is perhaps generally the case.
One could of course sequence a large number of isolates for each species to obtain this all-important sequence, but that is expensive and has only been completed for two isolates of two species: Helicobacter pylori2,3 and Neisseria meningitidis4,5. Work is in progress on more than one isolate of a number of other species [see TIGR’s website (http://www.tigr.org) for a list of genomes currently being sequenced] but this will still give only a small sample of the variation, and other approaches are being developed.
Genome-comparison techniques
Macrorestriction mapping or pulsed-field gel electrophoresis, which separates large DNA fragments after digestion of chromosomal DNA by rare-cutting enzymes, has greatly advanced our understanding of species genome structure. Macrorestriction mapping detects genome rearrangements as well as substantial gene additions; this technique has now been carried out for many species and has demonstrated the ubiquity of intraspecies genome size variation6,7. Bloch and colleagues8 developed a novel macrorestriction mapping method using extremely rare enzyme sites that are introduced into the chromosome on transposons and then moved between strains. The chromosome can then be cut into large-sized fragments with fixed-reference terminal loci, allowing comparison of fragments with the same end points from different isolates.
Genomic subtraction is based on the hybridization of DNA from two genomes and removal of any common sequences9, allowing direct analysis of strain-specific DNA. A simplification of the procedure has been reported recently10. Genomic subtraction is probably the best method to explore genome differences for a large number of isolates. One can use the subtracted DNA to screen plasmid/cosmid libraries,
0966-842X/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S0966-842X(00)01791-1
TRENDS IN MICROBIOLOGY 396 VOL. 8 NO. 9 SEPTEMBER 2000
or it can be cloned directly, allowing access to the unique DNA regions11.
Another powerful tool for comparative genome analysis is DNA microarray or DNA chip technology. Behr et al.12 exploited this technology to compare the genomes of 13 variants of the tuberculosis Bacille Calmette–Guérin (BCG) vaccine strain, which was derived from a virulent isolate of Mycobacterium bovis early in the twentieth century, with that of Mycobacterium tuberculosis H37Rv. Of the 3924 open reading frames (ORFs) of M. tuberculosis H37Rv, 3902 were spotted on a microarray and hybridized with labeled BCG total genomic DNA. One region was absent from all BCG strains, perhaps representing loss during the initial process of attenuation. Four regions were apparently deleted in various BCG strains during subsequent passages. Nine regions, of a total of 50 kb containing 61 ORFs, were found to be absent in BCG strains and virulent M. bovis strains. Although they were regarded as M. bovis deletions12,13, it is surely equally likely that many of these genes were gained by M. tuberculosis when it became a human-specific pathogen. Behr et al.12 reported that their current microarray detects deletions as small as 2 kb (i.e. 1/2000th of a genome), illustrating the power of this technology. However, a serious drawback of this approach is that it is unidirectional, that is, we can only use it to screen for genes we have already identified.
Genome variation in Salmonella enterica and E. coli
Genome size in S. enterica and E. coli has been shown to vary by as much as 20% between isolates.
S. enterica
There are >2000 serovars of S. enterica. Initially, full species names were given but all serovars are now in one species, S. enterica, with subspecies V having arguable status as a separate species, Salmonella bongori. S. enterica has a well-defined subspecies structure with deep branch lengths14. We have analysed the genome differences of four strains by genomic subtraction11. Strain LT2 (sv. Typhimurium) was used as the target and separately subtracted by subspecies I strains of serovars Typhimurium, Muenchen and Typhi, and a subspecies V strain. The amount of LT2 DNA not present in each of the other strains was estimated by using the LT2 DNA left after subtraction to probe LT2 DNA cosmids. The Typhimurium, Muenchen, Typhi and subspecies V strains were estimated to differ from LT2 by 2%, 5%, 9% and 20%, respectively. This correlates with the divergence of the four strains found by multilocus enzyme electrophoresis (MLEE), sequence variation of housekeeping genes and DNA–DNA hybridization data (Fig. 1).
[Figure 1: dendrogram not reproduced. Values read from the figure:]
Strain                  DNA–DNA hybridization   Non-homologous DNA
sv. Typhimurium LT2     –                       –
sv. Typhimurium (#2)    98%                     2%
sv. Muenchen            92%                     5%
sv. Typhi               88%                     9%
Subspecies V            N/A                     20%
E. coli K-12            45%                     30%
Nodes marked on the dendrogram: SPI-1, SPI-2
Boxed text (39 fragments): 12 Metabolic; 6 Phages; 2 O-antigens; 3 Virulence; 14 Not defined
Fig. 1. Genomic variation in Salmonella enterica. The dendrogram shows the relationships of three S. enterica subspecies I serovars (including two strains of sv. Typhimurium), S. enterica subspecies V and Escherichia coli K-12. S. enterica subspecies II, IIIa, IIIb, IV, VI and VII were omitted as no subtractive hybridization data were available for these subspecies. The branching order was inferred from multilocus enzyme electrophoresis (MLEE) and sequence data14,39. Two significant events in the evolution of virulence occurred through the gain of pathogenicity islands: Salmonella pathogenicity island 1 (SPI-1; found in all subspecies) and SPI-2 (found in all subspecies except V)22 are marked on the nodes. DNA–DNA hybridization data with reference to S. enterica LT2 are from Crosa40. The percentage of non-homologous DNA for S. enterica strains was determined with reference to S. enterica LT2 by analysis of residual DNA after subtractive hybridization11, and for E. coli K-12 with reference to S. enterica LT2 by comparison of genome data41 (incomplete for LT2). 39 random fragments sampled from the 20% of DNA present in LT2 but absent in subspecies V were sequenced11 and the sequences have been used to identify 35 of the corresponding genes using databases for genomes being sequenced for S. enterica sv. Typhimurium and Typhi (see websites http://genome.wustl.edu/gsc/bacterial/salmonella.shtml and http://www.sanger.ac.uk/Projects/S_typhi/). The boxed text shows the general categories for the 23 of these genes for which such information was obtained in BLAST searches in Genbank. The majority are metabolic pathway genes, possibly gained to expand ecological niches during divergence of S. enterica.
Further insight was gained by sequencing 39 fragments from the subspecies-V-strain-subtracted LT2 DNA11. Sixteen had a G+C content below the species average, indicating derivation from distantly related species, but over half had a G+C content within the normal range for the species. Of six for which a database search indicated a function, the majority related to the utilization of specific substrates. We have now used the partial genome sequences of S. enterica sv. Typhimurium LT2 and Typhi (http://genome.wustl.edu/gsc/bacterial/salmonella.shtml and http://www.sanger.ac.uk/Projects/S_typhi/) to obtain the full sequence of the genes sampled by 35 of the 39 fragments. BLAST searches suggest that 12 are related to metabolism, six to phage, two to O-antigen synthesis and three to virulence (Fig. 1). Of course, some of the variation in, for example, metabolic pathway genes could relate to
virulence but, if so, the relevance is not immediately
obvious, and it appears that much of the variation
relates to aspects of S. enterica ecology other than
virulence.
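The logic of the subtraction design used for S. enterica can be sketched as a set difference. This is only an illustration under strong assumptions: genomes are modelled as sets of equally sized, exactly matching fragments, and the names `subtract`, `percent_unique` and the toy fragment labels are hypothetical, not part of the published method.

```python
def subtract(target_fragments, driver_fragments):
    """Fragments of the target strain with no counterpart in the driver strain."""
    return set(target_fragments) - set(driver_fragments)


def percent_unique(target_fragments, driver_fragments):
    """Share of the target genome (as a percentage) left after subtraction."""
    target = set(target_fragments)
    return 100.0 * len(subtract(target, driver_fragments)) / len(target)


# Hypothetical toy genomes: the target strain has 100 fragments.
target = {f"frag{i:03d}" for i in range(100)}
# A closely related driver lacking 2 of the target fragments.
driver_close = {f"frag{i:03d}" for i in range(2, 100)}
# A distant driver lacking 20 target fragments but carrying DNA of its own.
driver_distant = {f"frag{i:03d}" for i in range(20, 100)} | {"other1", "other2"}

leftover = subtract(target, driver_distant)
```

Note that `leftover` never contains the driver-only fragments: like the experiment, the comparison is one-directional, so DNA unique to the driver strain would need the reciprocal subtraction.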
E. coli
E. coli is a diverse species with commensal isolates as
well as many pathogenic clones. There is a great variation in genome size. By genomic subtraction against
the reference K-12 strain, an avian enteropathogenic
E. coli strain was found to carry a total of 12 unique
regions with an estimated 350-kb unique DNA15.
Using their modified macrorestriction mapping,
Bloch’s group16 identified the differences from the
K-12 genome in RS218, a newborn-sepsis-associated
strain, and J96, a urinary tract infection (UTI) isolate.
Strain RS218 has 10 additions totalling 537 kb and
one 20-kb deletion. Strain J96 has 493 kb of additional DNA at four locations and two deletions of
53 and 35 kb. Two of the J96 segments are known
PAIs17. Our final example is the well-known clone
O157:H7, which is being sequenced and is reported
to have a genome 20% larger than that of K-12
(Ref. 18).
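As a quick check on the figures quoted for RS218 and J96, the net size difference from K-12 can be tallied as additions minus deletions. The sketch below uses only the totals given in the text; the dictionary layout and the function name `net_change_kb` are illustrative choices.

```python
# Sizes in kb relative to K-12, as quoted in the text.
strains = {
    "RS218": {"additions": [537], "deletions": [20]},
    "J96": {"additions": [493], "deletions": [53, 35]},
}


def net_change_kb(record):
    """Net gain of DNA relative to the reference: additions minus deletions."""
    return sum(record["additions"]) - sum(record["deletions"])


net = {name: net_change_kb(record) for name, record in strains.items()}
```

On these numbers both strains carry several hundred kilobases more DNA than K-12 even after deletions are taken into account.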
A more comprehensive but less detailed picture of
genome differences in this species was obtained by
Bergthorsson and Ochman19, who measured the
genome sizes of 35 isolates of the ECOR set (a widely
used reference collection of E. coli strains) by I-CeuI
macrorestriction mapping. The chromosome lengths
ranged from 4.5 to 5.5 Mb, with the variation being
dispersed throughout the genome. The major E. coli
groups differ in average genome size, with groups B2
and D having the largest genomes. PAIs have contributed to this variably present DNA. Boyd and
Hartl20 showed that of four virulence genes (hly, kps,
pap and sfa) from UTI-associated PAIs17, at least two
are present in the majority of B2 and D strains but
only sporadically elsewhere. This provides just a hint
of the patterns we can expect to see as we explore the
species genome.
Genome variation in H. pylori and N. meningitidis
There are complete genome sequences available for
two isolates each of H. pylori and N. meningitidis, allowing gene-by-gene comparison of genome variation.
H. pylori
H. pylori occupies a single niche, the gastric mucosa,
but causes a range of diseases and was the first
bacterial species for which the complete genome
sequence of two isolates was determined2,3. The two
isolates of H. pylori essentially provide a random
sampling of the species variation. One strain, J99,
was from a patient with a duodenal ulcer and the
other, 26695, was from a gastritis patient. Eighty-nine
of the 1495 genes (6%) in J99 and 117 of the
1552 genes (7.5%) in 26695 are specific to their respective strain, with almost half of these genes being
clustered in a single hypervariable region. Sequence
divergence in other genes ranges from 0–30%. As yet,
little is known of the features of the strain-specific
genes, with only 51 of the total of 206 assigned to a
functional category, including DNA restriction or
modification (31 genes), cell-envelope synthesis (6),
cellular processes (6), DNA replication (4), energy
metabolism (3) and phospholipid metabolism (1).
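The percentages and category counts quoted for H. pylori can be recomputed directly. The sketch below uses only numbers from the text; the variable names are illustrative.

```python
# (strain-specific genes, total genes) for each sequenced isolate.
strain_specific = {"J99": (89, 1495), "26695": (117, 1552)}

percent_specific = {
    strain: round(100 * specific / total, 1)
    for strain, (specific, total) in strain_specific.items()
}

# Functional categories assigned to strain-specific genes.
categories = {
    "DNA restriction or modification": 31,
    "cell-envelope synthesis": 6,
    "cellular processes": 6,
    "DNA replication": 4,
    "energy metabolism": 3,
    "phospholipid metabolism": 1,
}

total_specific = sum(specific for specific, _ in strain_specific.values())
total_assigned = sum(categories.values())
percent_assigned = round(100 * total_assigned / total_specific, 1)
```

The category counts add up to the 51 assigned genes, so roughly three-quarters of the 206 strain-specific genes have no functional category.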
N. meningitidis
N. meningitidis is generally a commensal that lives on
the mucosa of the nasopharynx. However, some
strains are invasive and can enter first the bloodstream and then the cerebrospinal fluid, causing
meningitis. Most of the pathogenic strains fall into
three serogroups and representatives of the group B
(strain MC58) and group A (strain Z2491) serogroups
have been sequenced4,5. We downloaded sequences
for all ORFs of MC58 and Z2491 (2155 and 2121
ORFs, respectively) and performed BLASTN searches
for homologous genes. Two hundred and thirty nine
(11.1%) of the MC58 ORFs and 208 (9.8%) of the
Z2491 ORFs are not present in the other strain. We
used a P value of 10^-20 for the BLASTN non-homologous cut-off. However, a proportion of the ORFs
of MC58 and Z2491 with a BLASTN P value of
10^-20–10^-50 (42 and 59 ORFs, respectively) are only
partially homologous but not included. Of the strain-specific ORFs, 87% are of unknown function. Only
31 of the 239 MC58 ORFs and 28 of the 207 Z2491
ORFs have been assigned (putative) functions including, for example, capsule biosynthesis, as expected,
and membrane proteins.
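The cut-offs described above amount to a three-way classification of each ORF by its best BLASTN P value against the other strain. The sketch below is one possible reading: the handling of values falling exactly on a boundary, the treatment of ORFs with no hit, and the names `classify_orf`, `NON_HOMOLOGOUS_CUTOFF` and `PARTIAL_CUTOFF` are assumptions, not details given in the text.

```python
NON_HOMOLOGOUS_CUTOFF = 1e-20  # P values above this count as non-homologous
PARTIAL_CUTOFF = 1e-50         # between the two cut-offs: only partial homology


def classify_orf(best_p_value):
    """Classify an ORF from its best BLASTN P value against the other strain.

    best_p_value is None when no hit was found at all (an assumption here).
    """
    if best_p_value is None or best_p_value > NON_HOMOLOGOUS_CUTOFF:
        return "strain-specific"
    if best_p_value > PARTIAL_CUTOFF:
        return "partially homologous"
    return "homologous"


# Proportions of strain-specific ORFs, from the counts quoted in the text.
percent_mc58 = round(100 * 239 / 2155, 1)
percent_z2491 = round(100 * 208 / 2121, 1)
```

Because the partially homologous ORFs are not counted as strain-specific, the quoted proportions are conservative estimates of the difference between the two strains.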
In summary, studies based on subtractive hybridization or genome comparisons show that a substantial
amount of DNA present in one strain can be absent in
another.
Horizontal gene transfer and niche adaptation
Bacterial clones are often adapted to specific niches21
(see Box 1). In E. coli and S. enterica, both well-studied organisms at the population level, there are many
clones that are specialized in the disease they cause
and the hosts they colonize. It has been proposed that
such clones are maintained by niche adaptation21,
and that new clones arise by horizontal transfer of
beneficial genes when clones are adapting to new
niches. It is important to note that the transfer of a
gene or genes beneficial to the recipient will be followed by selection of the recombinant, which gives a
high probability of fixation in the clone and allows
the expansion of niches. This can occur even in highly
clonal species with very low levels of recombination21. The variation in genes present in different
clones provides the major genetic diversity within a
species, with the extent of the diversity depending on
the range of niches occupied.
PAIs
PAIs have been a major focus of the study of pathogenicity. The PAIs of S. enterica are particularly interesting in terms of the evolution of virulence and adaptation. There are five PAIs known in this species. The
distribution of the known PAIs in different subspecies
of S. enterica suggests that the acquisition of PAIs by
horizontal transfer was the essential step in becoming
a pathogen22. SPI-1 contains genes required for the invasion of epithelial cells23 and is present in all subspecies but absent in E. coli24. This suggests that, after
acquisition of SPI-1, S. enterica became capable of invading intestinal epithelium and multiplying within
gut-associated lymphoid tissue and effectively established a niche different from the niche in the gut
lumen occupied by its close relative E. coli. The acquisition of additional pathogenicity islands, after
the divergence of subspecies V, probably allowed
S. enterica to spread from the intestinal tissue into the
bloodstream and multiply within macrophages, thus
expanding the niche to intracellular locations. SPI-2
is required for systemic infections and is present in all
subspecies apart from subspecies V (Refs 24,25). SPI-3
(Ref. 26) and possibly SPI-4 (Ref. 27) are required for
survival within macrophages, whereas SPI-5 (Ref. 28)
carries genes required for enteropathogenicity. SPI-3
is variably present in all subspecies and must have
a complex history26, but we have no distribution
information for SPI-4 or SPI-5.
O-antigens
The variation within a species can be looked at either
as polymorphism or as a result of lateral transfer between clones. In the case of S. enterica, as a highly
clonal species, O-antigen variation can be viewed as a
stable polymorphism within a species, maintained by
niche selection. Antigenic diversity is common and is
probably a widely used mechanism of evading the immune system. The O-antigen is an extremely variable
and antigenic surface polysaccharide, with 46 known
forms in S. enterica. Twenty five forms are present in
at least three subspecies and only five rare forms are
limited to a single subspecies. The distribution of
O-antigens provides evidence of extensive movement
of the O-antigen-encoding genes in the seven subspecies29. Evidence for such transfer can be seen in the
gnd gene, which is adjacent to the O-antigen locus. As
for other genes, subspecies-specific forms of gnd can
be recognized, but frequently a part of the gene is cotransferred with the O-antigen genes, carrying its
subspecies signature30.
Gene loss in bacterial evolution
Genes that for some reason become deleterious or are
no longer beneficial to a clone will be lost by the accumulation of mutations and deletions; this presumably balances gains by lateral transfer. Lysine decarboxylase (LDC) is widely present in E. coli but is
usually absent in Shigella and enteroinvasive (EIEC)
strains of E. coli. When the LDC-encoding gene was
cloned into a Shigella strain there was a significant reduction in virulence (fitness)31. LDC catalyses a reaction to produce cadaverine, which inhibits the enterotoxin and hence attenuates virulence, and the region
containing the LDC-encoding gene was found to be
deleted in Shigella and EIEC clones. This is an excellent example of a gene becoming deleterious in a new
niche, thereby generating strong selection pressure
for gene loss. The selection pressure for loss of activity will be high, as these strains live mainly within gut epithelial cells, and the expression of LDC effectively removes their ability to live in this niche.
Box 1. Definition of species in higher organisms and bacteria
The species concept is generally applied to bacteria, yet it has long been recognized that there are difficulties. In sexually reproducing higher organisms, species are, in general, readily recognized in any well-worked groups. There are many definitions of a species but that of Mayr (Ref. a), ‘groups of actually or potentially interbreeding natural populations, which are reproductively isolated from other such groups’, is widely accepted and appropriate for our purpose.
The important aspects of such species that differentiate them from bacteria are: (1) that each individual acquires its genome almost equally from two parents, with the corollary that genetic polymorphisms are reassorted each generation such that one can treat each individual as having a near-random sample, at least for the local gene pool; and (2) that as species diverge hybrids become less viable, as they will at best combine an assortment of all characters that differentiate the two species, and so be adapted to neither niche, and at worst will suffer genetic incompatibility, leading to imperfect development.
Bacteria propagate by binary fission; gene exchange is therefore relatively rare, occurring by transfer of one or few genes from one individual to another. This enables extreme diversity to develop by niche adaptation of clones (Ref. b) and also enables genes to transfer between species. Perhaps surprisingly, despite occasional suggestions that the latter aspect renders the traditional species concept inapplicable, bacterial species are also readily recognized in well-studied cases. When housekeeping genes (part of the core set of genes discussed in the text) are sequenced, each species has a well-defined cluster of related sequences, with only occasional evidence of interspecific transfer. It seems clear that, despite the opportunity for gene transfer between species, many and perhaps most genes in a species can be recognized by their sequence as being that particular species. That is not to say that at some time in the past the gene did not arrive from outside. Thus, although there are differences between the species concepts in bacteria and in sexually reproducing higher organisms, in both cases species can be recognized by phenotypes and by reference to a gene pool.
References
a Mayr, E. (1940) Speciation phenomena in birds. Am. Nat. 74, 249–278
b Reeves, P.R. (1992) Variation in O antigens, niche specific selection and bacterial populations. FEMS Microbiol. Lett. 100, 509–516
Another Shigella example involves Shigella sonnei,
which has an O-antigen identical to that of serotype
17 of Plesiomonas shigelloides. The S. sonnei O-antigen gene cluster is on a plasmid and appears to have
been acquired by lateral transfer from P. shigelloides32. The original chromosomal O-antigen genes
have been inactivated by a deletion that fused the
O-antigen cluster and the upstream colanic acid
cluster33. One can speculate that the new O-antigen
offered a selective advantage in adapting to a new
environment, whereas the old O-antigen might have
exerted a severe burden on the cell, leading to its
eventual inactivation by deletion. There are other
examples of genes in O-antigen gene clusters that have
clearly undergone substantial mutational changes
after becoming redundant, such as wbaE in the O-antigen IIA gene cluster of Yersinia pseudotuberculosis34 and wzy genes in the B and D1 antigen gene clusters of S. enterica35. The wzy gene in group B strains, for example, has many regions deleted, which total 73% of the gene, but there are sufficient regions remaining to identify it as a remnant of a wzy gene that is very similar to the functional wzy gene of D3. This loss of DNA will presumably continue until there is no remnant of the gene left. We suggest that this whole process be called gene decay.
Questions for future research
• How can strain-specific DNA be collected systematically? Can novel methods be developed?
• What is the best way to define a bacterial species (i.e. a good species concept) and the best way to define strains for coverage of a species genome?
• How many genes are present in the core set of genes and how extensive are auxiliary genes in a species?
• What is the extent of gene polymorphism present within a species?
• How many decaying genes are present in a genome?
The species genome concept
Even the limited data available on pairwise comparisons show that up to 20% of the DNA in one strain
can be absent in another. The total amount of such
DNA is, at present, unknown, but it is part of the
genome of that particular species. Clearly, comparison with different strains will add to the DNA in this
class. If we accept the concept of the species genome,
comprising all genes found in the species, then the
genes of any individual will include two components:
the core set of genes and the auxiliary genes. Genes
found in most individuals, which we can call the core
set of genes for that species, are the genes that determine those properties characteristic of all members of
the species. Additionally, each strain will have some
auxiliary genes, which determine properties found in
some but not all members of the species. The distinction will not be absolute but we believe it provides a
useful framework. Suitable boundaries might become
obvious as our knowledge of intraspecies variation
grows, but we suggest as a starting point that genes
found in 95% or more of isolates form the core set
and genes found in 1–95% of isolates form part of the
auxiliary set of genes; those present in <1% are provisionally treated as foreign genes or genes being lost
from the species. The cutoffs are arbitrary, and with
better knowledge one would define the lower cutoff
in terms of genes that persisted in the species for long
enough to show that their presence was maintained
by selection.
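The proposed starting-point cut-offs can be written down as a small classification rule. In this sketch the thresholds (95% and 1%) come from the text, while the function name `classify_gene`, its parameters and the example counts are illustrative; assigning a gene found in exactly 95% of isolates to the core set follows the wording "95% or more".

```python
def classify_gene(n_present, n_isolates, core_cutoff=0.95, rare_cutoff=0.01):
    """Assign a gene to a species-genome component from its frequency among isolates."""
    if n_isolates <= 0:
        raise ValueError("at least one isolate is needed")
    frequency = n_present / n_isolates
    if frequency >= core_cutoff:
        return "core"
    if frequency >= rare_cutoff:
        return "auxiliary"
    return "foreign or being lost"


# Hypothetical survey of 200 isolates: gene name -> number of isolates carrying it.
survey = {"geneA": 199, "geneB": 100, "geneC": 1}
components = {gene: classify_gene(count, 200) for gene, count in survey.items()}
```

Keeping the cut-offs as parameters reflects the point made above that they are arbitrary and would be revised as knowledge of intraspecies variation grows.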
The tools of subtractive hybridization and microarray technology will enable these components of the
genome to be identified. For the core genome, one can
use microarrays to determine how many genes of a
reference strain are present in 95% or more of a set of
isolates. The auxiliary set of genes comprises those in
the species genome but not in the core set of genes. To
determine the latter is not straightforward but, for
example, if a number of strains are compared with a
reference strain by subtractive hybridization, for each
strain one can collect the DNA present in it but not
the reference strain. From comparison of the individual pools of subtracted DNA one can determine how
much of it is present in one, two, three and so on,
of the strains, and then determine statistically how
many isolates need to be studied to achieve a reasonable approximation of the species genome. The application of these methods will give us an overview of
the genome of a bacterial species, and of course provide the DNA to enable characterization of the two
species genome components.
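The comparison of subtracted DNA pools described above can be sketched with sets: count how many strains each fragment occurs in, and track how the pooled auxiliary DNA grows as strains are added. This is a simplified illustration that assumes fragments from different strains can be matched exactly; the function names and toy data are hypothetical, and the statistical extrapolation to the number of isolates required is not attempted.

```python
from collections import Counter


def sharing_spectrum(pools):
    """Map k -> number of fragments found in the pools of exactly k strains.

    pools: dict of strain name -> set of fragments absent from the reference.
    """
    occurrences = Counter()
    for fragments in pools.values():
        occurrences.update(fragments)
    return Counter(occurrences.values())


def accumulation_curve(pools, order):
    """Cumulative number of distinct fragments as strains are added in order."""
    seen = set()
    curve = []
    for strain in order:
        seen |= pools[strain]
        curve.append(len(seen))
    return curve


# Hypothetical pools of subtracted DNA from three strains.
pools = {
    "strainA": {"f1", "f2", "f3"},
    "strainB": {"f2", "f3", "f4"},
    "strainC": {"f3", "f5"},
}
spectrum = sharing_spectrum(pools)
curve = accumulation_curve(pools, ["strainA", "strainB", "strainC"])
```

A curve that flattens as strains are added would suggest that most of the auxiliary DNA has been sampled, whereas a steadily rising curve would indicate that more isolates are needed.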
The core set of genes as we define it is not equivalent to the ‘minimal’ set of genes described by Hutchison et al.36 Our core set includes all genes generally
found in any individual of the species and will include
genes not required for growth in the specific experimental growth conditions used to define the ‘minimal’
gene set.
Conclusions and prospects
Bacterial taxa at all levels seem to be defined in large
part by the presence or absence of genes relative to
other taxa. This is apparent from the description of,
for example, species and genera, which are often defined by a combination of biochemical functions, and
confirmed by the genome sequences now available.
The phenomenon also applies to clones of a species,
as they also vary in the genes present. It is important
that in the new genome era this intraspecies variation
be recognized by exploring the gene content of the
whole species, the ‘species genome’; only rarely will
the genome of a single isolate represent the genetic
potential of a bacterial species. DNA chip technology
provides a powerful means to scan for the presence or
absence of the genes already sequenced12. However,
the major task will be to find the genes not present in
the sequenced isolates. A systematic approach is
needed, as it is this which will provide the ultimate
gateway to an understanding of the evolution and biology of the species. We have reviewed the currently
used methods. Subtractive hybridization remains the
method of choice for extending the species genome, as
discussed above for an avian pathogenic strain of E. coli15.
Microarray technology seems best suited for studying
the distribution of known genes, which, until now,
has been done mainly by subtractive hybridization
for S. enterica11 as discussed, and more recently for
Neisseria species37.
It is also interesting to speculate on why horizontal
transfer is so common in bacteria. It probably relates
to the very nature of single-celled organisms. A newly
transferred gene is of immediate benefit to the whole
organism and, furthermore, will be easily passed on
to the next generation. In a multicellular organism it
could benefit only one cell and its descendants within
the organism, and in most animals only if it can be
passed on in the germline to the next generation.
Also, in animals and plants the major adaptations
involve changes in developmental processes, which
are more likely to require fine-tuning than added
functions. Even desirable new functions will generally only be useful if expression is regulated to occur
at the appropriate time and place, whereas bacteria
can often gain benefit without regulation, which can
be added later. In the circumstances, it is hardly surprising that horizontal gene transfer is so important
in prokaryote evolution and indeed seems to go back
to the very origins of the three major domains38.
Acknowledgements
Research in the authors’ laboratory is supported by the Australian
Research Council and National Health and Medical Research Council.
We thank the reviewers for helpful suggestions.
References
1 Blattner, F.R. et al. (1997) The complete genome sequence of
Escherichia coli K-12. Science 277, 1453–1474
2 Tomb, J.F. et al. (1997) The complete genome sequence of the
gastric pathogen Helicobacter pylori. Nature 388, 539–547
3 Alm, R.A. et al. (1999) Genomic-sequence comparison of two
unrelated isolates of the human gastric pathogen Helicobacter
pylori. Nature 397, 176–180
4 Tettelin, H. et al. (2000) Complete genome sequence of
Neisseria meningitidis serogroup B strain MC58. Science 287,
1809–1815
5 Parkhill, J. et al. (2000) Complete DNA sequence of a serogroup
A strain of Neisseria meningitidis Z2491. Nature 404, 502–506
6 Liu, S-L. and Sanderson, K.E. (1996) Highly plastic chromosomal
organization in Salmonella typhi. Proc. Natl. Acad. Sci. U. S. A.
93, 10303–10308
7 Leblond, P. and Decaris, B. (1998) Chromosome geometry and
intraspecific genetic polymorphism in Gram-positive bacteria
revealed by pulsed-field gel electrophoresis. Electrophoresis
19, 582–588
8 Rode, C.K. et al. (1995) New tools for integrated genetic and
physical analysis of the Escherichia coli chromosome. Gene
166, 1–9
9 Straus, D. and Ausubel, F.M. (1990) Genomic subtraction for
cloning DNA corresponding to deletion mutations. Proc. Natl.
Acad. Sci. U. S. A. 87, 1889–1893
10 Akopyants, N.S. et al. (1998) PCR-based subtractive hybridization
and differences in gene content among strains of Helicobacter
pylori. Proc. Natl. Acad. Sci. U. S. A. 95, 13108–13113
11 Lan, R. and Reeves, P.R. (1996) Gene transfer is a major factor
in bacterial evolution. Mol. Biol. Evol. 13, 47–55
12 Behr, M.A. et al. (1999) Comparative genomics of BCG vaccines
by whole-genome DNA microarray. Science 284, 1520–1523
13 Young, D.B. and Robertson, B.D. (1999) TB vaccines: global
solutions for global problems. Science 284, 1479–1480
14 Selander, R.K. et al. (1996) Evolutionary genetics of Salmonella
enterica. In Escherichia coli and Salmonella: Cellular and
Molecular Biology (2nd edn) (Vol. 2) (Neidhardt, F.C. et al.,
eds), pp. 2691–2707, ASM Press
15 Brown, P. and Curtiss, R. (1996) Unique chromosomal regions
associated with virulence of an avian pathogenic Escherichia coli
strain. Proc. Natl. Acad. Sci. U. S. A. 93, 11149–11154
16 Rode, C.K. et al. (1999) Type-specific contribution to chromosome
size differences in Escherichia coli. Infect. Immun. 67, 230–236
17 Hacker, J. et al. (1997) Pathogenicity islands of virulent bacteria:
structure, function and impact on microbial evolution. Mol.
Microbiol. 23, 1089–1097
18 Perna, N.T. et al. (1998) Comparative genomics of E. coli K-12,
O157:H7 and related enterobacterial pathogens. In ASM
Conference on Small Genomes, p. 7, ASM Press
19 Bergthorsson, U. and Ochman, H. (1998) Distribution of
chromosome length variation in natural isolates of Escherichia
coli. Mol. Biol. Evol. 15, 6–16
20 Boyd, E.F. and Hartl, D.L. (1998) Chromosomal regions specific
to pathogenic isolates of Escherichia coli have a phylogenetically
clustered distribution. J. Bacteriol. 180, 1159–1165
21 Reeves, P.R. (1992) Variation in O antigens, niche-specific
selection and bacterial populations. FEMS Microbiol. Lett.
100, 509–516
22 Baumler, A.J. (1997) The record of horizontal gene transfer in
Salmonella. Trends Microbiol. 5, 318–322
23 Mills, D.M. et al. (1995) A 40-kb chromosomal fragment
encoding Salmonella typhimurium invasion genes is absent from
the corresponding region of the Escherichia coli K-12
chromosome. Mol. Microbiol. 15, 749–759
24 Ochman, H. and Groisman, E.A. (1996) Distribution of
pathogenicity islands in Salmonella spp. Infect. Immun.
64, 5410–5412
25 Hensel, M. et al. (1997) Analysis of the boundaries of Salmonella
pathogenicity island 2 and the corresponding chromosomal
region of Escherichia coli K-12. J. Bacteriol. 179, 1105–1111
26 Blanc-Potard, A.B. et al. (1999) The SPI-3 pathogenicity island of
Salmonella enterica. J. Bacteriol. 181, 998–1004
27 Wong, K.K. et al. (1998) Identification and sequence analysis of a
27-kilobase chromosomal fragment containing a Salmonella
pathogenicity island located at 92 minutes on the chromosome
map of Salmonella enterica serovar Typhimurium LT2. Infect.
Immun. 66, 3365–3371
28 Wood, M.W. et al. (1998) Identification of a pathogenicity island
required for Salmonella enteropathogenicity. Mol. Microbiol.
29, 883–891
29 Reeves, P.R. (1995) Role of O-antigen variation in the immune
response. Trends Microbiol. 3, 381–386
30 Thampapillai, G. et al. (1994) Molecular evolution in the gnd
locus of Salmonella enterica. Mol. Biol. Evol. 11, 813–828
31 Maurelli, A.T. et al. (1998) ‘Black holes’ and bacterial
pathogenicity: a large genomic deletion that enhances the
virulence of Shigella spp. and enteroinvasive Escherichia coli.
Proc. Natl. Acad. Sci. U. S. A. 95, 3943–3948
32 Houng, H.H. et al. (1997) The roles of IS630 sequence in the
expression of the form I antigen of Shigella sonnei: molecular and
evolutionary aspects. In Ecology of Pathogenic Bacteria:
Molecular and Evolutionary Aspects (van der Zeijst, B.A.M. et
al., eds), pp. 282–283, Elsevier Science
33 Lai, V. et al. (1998) Escherichia coli clone Sonnei (Shigella
sonnei) had a chromosomal O-antigen gene cluster prior to
gaining its current plasmid-borne O-antigen genes. J. Bacteriol.
180, 2983–2986
34 Hobbs, M. and Reeves, P.R. (1995) Genetic organisation and
evolution of Yersinia pseudotuberculosis 3,6-dideoxyhexose
biosynthetic genes. Biochim. Biophys. Acta 1245, 273–277
35 Curd, H. et al. (1998) Relationships among the O-antigen gene
clusters of Salmonella enterica groups B, D1, D2 and D3.
J. Bacteriol. 180, 1002–1007
36 Hutchison, C.A. et al. (1999) Global transposon mutagenesis and
a minimal Mycoplasma genome. Science 286, 2165–2169
37 Perrin, A. et al. (1999) Identification of regions of the
chromosome of Neisseria meningitidis and Neisseria gonorrhoeae
which are specific to the pathogenic Neisseria species. Infect.
Immun. 67, 6119–6129
38 Nelson, K.E. et al. (1999) Evidence for lateral gene transfer
between Archaea and bacteria from genome sequence of
Thermotoga maritima. Nature 399, 323–329
39 Beltran, P. et al. (1991) Reference collection of strains of the
Salmonella typhimurium complex from natural populations.
J. Gen. Microbiol. 137, 601–606
40 Crosa, J.H. et al. (1973) Molecular relationships among the
Salmonellae. J. Bacteriol. 115, 307–315
41 Wong, R.M-Y. et al. (1999) Sample sequencing of a Salmonella
typhimurium LT2 lambda library: comparison to the Escherichia
coli K12 genome. FEMS Microbiol. Lett. 173, 411–423
VOL. 8
NO. 9
SEPTEMBER 2000