Are there bacterial species, and what is the goal of metagenomics
Tom Doak, Lynch lab, Biology; [email protected]
20% time, labs of HaiXu Tang and YuZhen Ye
The concept of species
•  Reproductive isolation (however, morphology is always the first pass method)
•  Species we can be sure about; all higher taxonomic levels we can argue about
•  Common ancestor (for the genome as a set): Phylogenies
Old fashioned bacterial classification: shape, staining, metabolism
Figure 25-4 Bacterial shapes and cell-surface structures
Bacteria are classified into three different shapes: (A) spheres (cocci), (B) rods (bacilli), and (C) spiral cells (spirochetes). (D) They are also classified as
Gram-positive or Gram-negative. Bacteria such as Streptococci and Staphylococci have a single membrane and a thick cell wall made of cross-linked
peptidoglycan. They retain the violet dye used in the Gram staining procedure and are thus called Gram-positive. Gram-negative bacteria such as E. coli
and Salmonella have two membranes, separated by a periplasmic space (see Figure 11-17). The peptidoglycan layer in the cell wall of these organisms is
located in the periplasmic space and is thinner than in Gram-positives; they therefore fail to retain the dye in the Gram staining procedure. The inner
membrane of Gram-negative bacteria is a phospholipid bilayer, and the inner leaflet of the outer membrane is also made primarily of phospholipids; the
outer leaflet of the outer membrane, however, is composed of a unique glycosylated lipid called lipopolysaccharide (LPS) (see Figure 25-40). (E) Cell-surface projections are important for bacterial behavior. Many bacteria swim using the rotation of helical flagella (see Figure 15-68). The bacterium
illustrated has only a single flagellum at one pole; others such as E. coli are decorated with multiple flagella all over the surface. Straight pili (also called
fimbriae) are used to adhere to surfaces in the host and to facilitate genetic exchange between bacteria. Both flagella and pili are anchored to the cell
surface by large multiprotein complexes.
Bacterial pathogenomics
Mark J. Pallen & Brendan W. Wren
Nature 449, 835-842(18 October 2007)
The problem of horizontal transfer
•  Genes in a genome do not share common descent. In the worst case, a mosaic of genes from different sources (from different “species”).
•  Two general types,
–  selfish elements (transposons and phage)
–  Metabolic genes (generally, enzymes)
•  Identified by:
–  Genome comparisons
–  Compositional differences across a genome
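As a rough illustration of the "compositional differences across a genome" criterion, the sketch below scans a sequence in sliding windows and flags windows whose GC content is unusual relative to the rest. The function names, the window and step sizes, and the z-score cutoff are all assumptions made for illustration; real island-prediction tools combine several signals (codon usage, dinucleotide bias, mobility genes) rather than GC content alone.

```python
from statistics import mean, pstdev


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(genome: str, window: int = 1000, step: int = 500,
                     z_cutoff: float = 2.0):
    """Return (start, gc, z) for windows with atypical GC content.

    A window is reported when its GC fraction lies more than z_cutoff
    standard deviations from the mean over all windows. The defaults
    are illustrative assumptions, not values from the slides.
    """
    starts = list(range(0, len(genome) - window + 1, step))
    gcs = [gc_fraction(genome[s:s + window]) for s in starts]
    if len(gcs) < 2:
        return []
    mu, sigma = mean(gcs), pstdev(gcs)
    if sigma == 0:
        return []  # perfectly uniform composition: nothing stands out
    hits = []
    for s, gc in zip(starts, gcs):
        z = (gc - mu) / sigma
        if abs(z) > z_cutoff:
            hits.append((s, gc, z))
    return hits
```

On a toy genome made of a 50% GC background with a 1 kb AT-only insert, the windows overlapping the insert are the ones reported; a recently acquired region whose composition has not yet drifted toward that of its host would stand out in the same way.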
We found that 755 of 4,288 ORFs (547.8 kb) have been introduced into the E. coli genome in at least 234 lateral transfer events since this species diverged from the Salmonella lineage 100 million years (Myr) ago. The average age of introduced genes was 14.4 Myr, yielding a rate of transfer 16 kb/Myr/lineage since divergence. Although most of the acquired genes subsequently were deleted, the sequences that have persisted (~18% of the current chromosome) have conferred properties permitting E. coli to explore otherwise unreachable ecological niches.
IslandViewer | An integrated interface for computational identification and
visualization of genomic islands:
Salmonella enterica subsp. enterica serovar Typhi str. CT18
Nature 413, 852-856. 2001
Complete genome sequence of Salmonella enterica serovar Typhimurium LT2
30 genes for vitamin B12 synthesis
Genomic islands in pathogenic and environmental microorganisms
Ulrich Dobrindt, Bianca Hochhut, Ute Hentschel & Jörg Hacker
Nature Reviews Microbiology 2, 414-424 (May 2004)
Genomic islands in two strains of a marine
phototroph, Prochlorococcus
The two strains of Prochlorococcus marinus are cyanobacteria, major oxygenic producers in the
oceans, consuming a large part of atmospheric CO2. The strains MED4 and MIT9312 differ by
only 0.8% of their genome, yet their distributions throughout the ocean are very different, for
unknown reasons. The reason for the difference in distribution may have to do with genes encoded
within five genomic islands specific to MED4 (ISL1, ISL4, ISL5) or to MIT9312 (ISL2, ISL3).
Identification of Genomic Islands: Synechococcus sp. WH8102 vs. Sargasso Sea
MUMmerplot (alignment of Sargasso Sea reads against the genome of this marine cyanobacterium)
Evolution of virulence in Pseudomonas syringae pv. phaseolicola. (John Mansfield)
Sequence and functional analyses of Haemophilus spp. genomic islands
Gene islands and genome diversity in Pseudomonas aeruginosa: Different Pseudomonas aeruginosa strains show a remarkable
genomic diversity mainly caused by insertion and deletion of mobile DNA blocks such as (pro)phages, plasmids, genomic islands and
other elements.
We have monitored large genomic islands in several P. aeruginosa strains and analysed these DNA blocks both for function of their
encoded proteins and mobilisation from the host genome.
Although these islands represent strain-specific insertions and can be excised and mobilised with different frequencies, the islands
have apparently evolved from a common ancestor with phage- and plasmid-like characteristics and belong to a family of related
genetic elements. All contain homologous parts with genes found in all related islands. Within these conserved parts unrelated blocks
of DNA are interspersed.
By screening larger collections of P. aeruginosa strains we could show that members of this family of genomic islands are widespread
within this species, and in the meantime more than 30 related DNA elements have been detected in the genomes of many different β- and γ-proteobacteria.
Pseudomonas aeruginosa is an opportunistic pathogen for plants,
animals and man. It is responsible for severe nosocomial infections
and chronically colonizes lungs of patients with cystic fibrosis (CF)
leading to morbidity and mortality. The genomes of the reference strain
PAO1 (www.pseudomonas.com) and of several other strains have
been sequenced completely; the results are invaluable for research
on Pseudomonas genomics.
What metabolic abilities are found on islands?
Pathogenesis (e.g. host range, and disease effectors)
Drug resistance (e.g. multidrug resistance)
Alternative fuels (e.g. degrading PCBs)
Novel biosynthetic pathways (e.g. vitamin B12)
Microbial competition effectors (antibiotics)
The genome of Salinibacter ruber:
Convergence and gene exchange among
hyperhalophilic bacteria and archaea
A schematic representation of the hypersalinity island identified in the genome of
Salinibacter
Mongodin E F et al. PNAS 2005;102:18147-18152
©2005 by National Academy of Sciences
The notion that all prokaryotes belong to genomically and phenomically cohesive clusters that we might legitimately call “species” is a contentious one. At issue are (1) whether such clusters actually exist; (2) what species definition might most reliably identify them, if they do; and (3) what species concept (by which is meant a genetic and ecological theory of speciation) might best explain species existence and rationalize a species definition, if we could agree on one. We review existing theories and some relevant data. We conclude that microbiologists now understand in some detail the various genetic, population, and ecological processes that effect the evolution of prokaryotes. There will be on occasion circumstances under which these, working together, will form groups of related organisms sufficiently like each other that we might all agree to call them “species,” but there is no reason that this must always be so. Thus, there is no principled way in which questions about prokaryotic species, such as how many there are, how large their populations are, or how globally they are distributed, can be answered. These questions can, however, be reformulated so that metagenomic methods and thinking will meaningfully address the biological patterns and processes whose understanding is our ultimate target.
“. . . in the end, I think the debate about species reality boils down, sadly, to different interpretations of the word ‘real’.” J. Mallet (2005)
Our quotation is from a review of Coyne and Orr’s recent authoritative monograph, Speciation (Coyne and Orr 2004). The book deals overwhelmingly with the problems and practices of systematists who work with nonmicrobes (mostly animals) and the arguments of philosophers and historians who have taken an interest in what these systematists do. But Mallet’s conclusion applies equally to debates among microbiologists. We too remain deeply divided, in our case about whether or not prokaryotes (i.e., Bacteria and Archaea; pace Pace 2006) have real species and if so how we might recognize, enumerate, and integrate them into existing theoretical frameworks in ecology, population genetics, and evolutionary biology. To the philosophically inclined, this should be more interesting than sad, however. At the end of this essay we will conclude that prokaryotic genomics shows us that there is no reasonable interpretation of the word “real” that can be applied to microbial species generally, but that thinking about species has been highly productive, and learning to do without them will be even more so.
Figure 1. The problematics of any metapopulation lineage-based general species concept. Arrowheads represent populations or subpopulations that might or might not comprise a single species. Often, phylogenetic relationships between such clusters of individuals will be unknown or ambiguous: (left panel) Common memberships in a “metapopulation lineage” cannot be established. (middle panel) As well, there is in principle no way of knowing at what degree of divergence subpopulations assume independent “evolutionary roles and tendencies,” [i.e., bacteria don’t have sex to define species] and thus no way of recognizing minimally inclusive groupings (that is, of distinguishing species from higher taxonomic groupings). (right panel) And, when individuals are the product of extensive gene exchange, the very notion of lineage becomes problematic.
HGT (gene exchange between non-related organisms) appears commonplace among bacteria, but contributes just small fragments of genetic information, leaving the traditional tree of life intact.
From: Comparing Gene Trees and Genome Trees: A Cobweb of Life? PLoS Biol 3:e347
773 genomes available in NCBI’s RefSeq database were initially clustered using 16S rRNA identity of at least 97% as a guide to form groups. A dozen clusters were selected (list of genomes within each cluster is available in Supplemental Table 1).
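The grouping step can be sketched as follows. The slide states only that 16S rRNA identity of at least 97% was used "as a guide"; the single-linkage rule below (two genomes end up in one group if a chain of pairs at or above the threshold connects them) is an assumption chosen for simplicity, and the function name and input layout are hypothetical.

```python
def cluster_by_identity(names, identity, threshold=0.97):
    """Group genomes by pairwise 16S rRNA identity (single linkage).

    names:     iterable of genome labels
    identity:  dict mapping (name_a, name_b) -> identity in [0, 1]
    threshold: minimum identity for joining two genomes (0.97 per the slide)

    Single linkage is an illustrative assumption; the source does not
    say which linkage rule was applied.
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent pointers to the group representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), ident in identity.items():
        if ident >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```

Note that single linkage can chain together genomes whose direct identity is below 97%, which is one reason a 16S threshold is best treated as a guide rather than as a species definition.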
Clustering of cores: A possible
recourse for species monism
and realism?
Figure 2. Comparison of average nucleotide identities (ANI) with gene content. 773 genomes available in NCBI’s RefSeq database were initially clustered using 16S rRNA identity of at least 97% as a guide to form groups. A dozen clusters were selected (list of genomes within each cluster is available in Supplemental Table 1). For genomes within each cluster, pair-wise ANI was calculated essentially as described in Konstantinidis and Tiedje (2005). Shared genes for each pair of genomes were identified as reciprocal top scoring BLASTP matches (E-value < 0.001, z = 20,000,000). The proportion of shared genes was calculated as a ratio of the number of shared genes over the average number of genes in two genomes. Each ORF in a genome was assigned to a functional category according to the Clusters of Orthologous Groups (COG) database (August 2005 release), and three selected categories are depicted in this figure: categories J, P, and Q in COG category one-letter designation. Note that genomes of the E. coli/Shigella group have similar ANI values, but dramatically varying gene content. Some groups form tight clusters (e.g., Legionella spp.), while others exhibit a continuum of ANI/shared genes values (e.g., Burkholderia spp.). The clustering also exhibits a large variability in the number of shared genes if genes are considered by functional category.
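The two gene-content quantities in the caption can be sketched in a few lines. This assumes the BLASTP searches have already been run and reduced to one top-scoring hit per query in each direction (the E-value filter is applied upstream); the dictionary layout and function names are hypothetical and not taken from the paper.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return the set of (gene_a, gene_b) reciprocal top-scoring pairs.

    best_a_to_b: dict mapping each gene of genome A to its top hit in B
    best_b_to_a: dict mapping each gene of genome B to its top hit in A
    A pair counts as shared only if each gene is the other's top hit.
    """
    return {
        (a, b)
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    }


def shared_gene_proportion(n_shared, n_genes_a, n_genes_b):
    """Shared genes divided by the average gene count of the two genomes."""
    average_genes = (n_genes_a + n_genes_b) / 2
    return n_shared / average_genes
```

Dividing by the average gene count, rather than by the smaller genome, means that a small genome fully contained in a larger one still scores below 1, so the measure reflects differences in gene content as well as overlap.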
Figure 3. Phylogenetic relationships among selected genomes in the Prochlorococcus marinus/marine Synechococcus group. Each point in a triangle (simplex) represents a set of orthologous genes that contains at least four analyzed genomes (and as many as 19 genomes from this group). Position of the point in the barycentric coordinate system (triangle) depends on bootstrap support values for each of three possible tree topologies with which each vertex is associated. The closer the point to the vertex, the higher its bootstrap support for that tree topology. Poorly resolved relationships result in points located closer to the center of the triangle. Values at each vertex refer to the number of sets of orthologous genes that support the tree topology at the vertex overall, with at least 80% and at least 90% bootstrap support, respectively. For a full description of the methodology used to analyze embedded quartets, see Zhaxybayeva and Gogarten (2003) and Zhaxybayeva et al. (2006). Genomes are designated by their strain names. (Bold) Genomes of marine Synechococcus spp., (italics) low-light adapted Prochlorococcus marinus genomes, (plain font) Prochlorococcus marinus high-light adapted strains (all genomes are from NCBI’s RefSeq database). Full analyses of the phylogenetic relationships within this group as well as details on the selection of sets of orthologous genes and phylogenetic analyses performed will be presented elsewhere (O. Zhaxybayeva, F. Doolittle, T. Papke, and P. Gogarten, in prep.).
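The placement of a point in the triangle can be made concrete with a small sketch. It assumes the three bootstrap support values are non-negative and are normalized to sum to one before being used as barycentric weights; the vertex positions (a unit equilateral triangle) and the function name are arbitrary choices for illustration, not details from the cited methodology.

```python
import math

# Vertices of a unit equilateral triangle, one per quartet topology.
# The orientation is an arbitrary illustrative choice.
VERTICES = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))


def simplex_point(supports):
    """Map three bootstrap supports to (x, y) inside the triangle.

    supports: three non-negative numbers, one per tree topology.
    They are rescaled to sum to one and used as barycentric weights,
    so full support for one topology lands exactly on its vertex and
    equal support for all three lands at the center.
    """
    if len(supports) != 3 or any(s < 0 for s in supports):
        raise ValueError("expected three non-negative support values")
    total = sum(supports)
    if total <= 0:
        raise ValueError("at least one support value must be positive")
    weights = [s / total for s in supports]
    x = sum(w * vx for w, (vx, _) in zip(weights, VERTICES))
    y = sum(w * vy for w, (_, vy) in zip(weights, VERTICES))
    return x, y
```

A gene family with supports of 90, 5, and 5 plots close to the first vertex, while one with roughly equal supports plots near the center, matching the caption's reading of poorly resolved relationships.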
Clustering of cores: A possible
recourse for species monism
and realism?
Residual Ks
Synonymous substitutions per site (Ks)
We separated sequence divergence into rate and time components, revealing that different regions of the Escherichia coli and Salmonella enterica chromosomes diverged over a ~70-million-year period. Genetic isolation first occurred at regions carrying species-specific genes, indicating that physiological distinctiveness between the nascent Escherichia and Salmonella lineages was maintained for tens of millions of years before the complete genetic isolation of their chromosomes.
Distance from the Escherichia coli replication origin (Mb)
Codon Usage Bias (standard deviations from the CAI mean)