Bacterial Genetics and Molecular Biology - a Genomics Perspective (Ch. 3)
Trudy M. Wassenaar, David W. Ussery
Chapter 3. DNA production in the bacterial cell
Complex molecules in biology are usually made of polymers from simple 'building block'
entities, and different functions are obtained from variation in the sequence of these building
blocks. Polymerization processes involve three stages: initiation, elongation, and then
termination. This is true for the production of all the major polymers in the cell: DNA, RNA,
proteins and carbohydrates. In the case of DNA, individual nucleotides are polymerized into
long molecules. The production of a copy of a DNA molecule is called replication and it
occurs through the same three stages: initiation (at a specific site in the chromosome, called
the replication origin), elongation (the synthesis of DNA strands - in this case, copying of a
template strand) and termination (usually on the opposite end of the chromosome from the
origin, called replication terminus). The main enzyme responsible for elongation is DNA
polymerase, which produces two copies of each genome needed for cell division.
3.1. Production of the chromosome
DNA polymerase produces a DNA strand from a single-strand template
The main enzyme responsible for the polymerization of DNA nucleotides is DNA
polymerase. However, it is important to note that DNA polymerase (DNA pol for short) is not
the only enzyme involved in the production of genome copies, and it functions as an enzyme
complex, in which many different proteins partake. At the core of this process is a DNA pol
that connects nucleotides (dNTPs) to each other to form a DNA strand. It can only do this
when an existing denatured (single-stranded) DNA strand dictates the order of the nucleotides
it connects: single-stranded DNA has to act like a template. The enzyme attaches nucleotides
at the 3'-OH end of a growing DNA strand, as shown by the grey strand in Figure 3.1, using
the existing single-strand DNA as a template to connect the nucleotides into the correct order.
However, the DNA polymerase can only extend a nucleotide that is already present, which
means at least a piece of double-strand DNA must already exist, as in the figure. DNA
polymerase can't start on a completely single-stranded template, as it wouldn't have a
nucleotide to extend.
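The template and primer requirements described above can be illustrated with a minimal Python sketch. It is a toy model, not a description of the enzyme: the function name `extend_primer` and the example sequences are hypothetical, and the template is assumed to be written 3'-to-5' so that it lines up with the new strand written 5'-to-3'.

```python
# Watson-Crick pairing rules for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3to5: str, primer_5to3: str) -> str:
    """Extend a primer along a template, one nucleotide at a time at its 3' end.

    template_3to5: template strand written 3'->5' (index 0 pairs with the primer's 5' end).
    primer_5to3:   existing short strand written 5'->3'; it must already pair with the template.
    """
    if not primer_5to3:
        # Mirrors the text: the polymerase needs an existing nucleotide to extend.
        raise ValueError("no primer: nothing to extend")
    if len(primer_5to3) > len(template_3to5):
        raise ValueError("primer is longer than the template")
    for template_base, primer_base in zip(template_3to5, primer_5to3):
        if COMPLEMENT[template_base] != primer_base:
            raise ValueError("primer does not pair with the template")

    new_strand = list(primer_5to3)
    # The template dictates which nucleotide is attached next.
    for template_base in template_3to5[len(primer_5to3):]:
        new_strand.append(COMPLEMENT[template_base])
    return "".join(new_strand)


# Hypothetical example: an 8-nucleotide template with a 3-nucleotide primer.
print(extend_primer("TACGGATC", "ATG"))  # ATGCCTAG
```

Calling the function with an empty primer raises an error, which corresponds to the statement that DNA polymerase cannot start on a completely single-stranded template.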
Figure 3.1. DNA nucleotides are connected by DNA polymerase. A single-strand DNA
molecule, with a short double-strand part (which can be DNA or RNA), allows DNA
polymerase to produce a complementary strand. A few nucleotides are specified for
clarification. When the enzyme has reached the end of the template, the product is a complete
double-stranded DNA that consists of one old strand (the template, grey) and one newly formed
strand (black).
The three stages of DNA replication
DNA polymerase (together with multiple other enzymes) is responsible for production of a
genome copy in the cell, but how are its two requirements met: single-strand DNA as a
template, and a piece of double-strand DNA to start from? During initiation, a bubble opens
up at the origin of replication (ori). This is a specific section of the chromosome, and to open
it up, a crucial protein called DnaA is required. The gene coding for DnaA is usually located
close to the ori. The opened bubble has to be stabilized by proteins, as single-strand DNA
(ssDNA) is an unfavorable conformation and is sensitive to both mutation and degradation in a
cell. A protein called SSB, for single-strand DNA binding protein, provides stabilization of
ssDNA. An enzyme called primase (an RNA polymerase which can start all by itself) will
next produce a short RNA fragment complementary to each of the two strands. These so-called
primers will serve as a starting point for DNA polymerase for each of the strands that
have to be produced. The opening of the origin of replication will be explained
in detail in the next chapter. Once the two primers are in place, two copies of the enzyme
DNA polymerase each produce a DNA strand complementary to their own template.
This process is described as elongation. Because the two strands of DNA are anti-parallel, the
two enzymes work in opposite directions so that they are moving away from each other, as
shown in Figure 3.2. For the other strand, a primer is produced a little further downstream,
and DNA pol will extend that strand in the opposite direction.
Finally, the last step, termination, will take place. When the two replichores near
completion, the two replication forks on both sides of the chromosome will meet each other.
This happens at the termination region, which is not as strictly defined as the origin of
replication. Termination may be a spontaneous process when the two forks meet in some
bacteria, but in many bacterial chromosomes it is regulated. Usually, multiple repeat
sequences (each repeat unit, called ter, is 22 basepairs long) act as blocks to the polymerase,
to prevent either of the two approaching polymerase enzymes from overshooting: these
repeats allow passage of the enzyme only in the orientation towards the middle. Specific
proteins bind to ter and stop the replication machinery. The enzyme ligase then joins the two
loose ends together.
It is important to realize that when a chromosome is copied, the two products will consist
of one old strand (which served as template) and one novel strand. This is called
semiconservative DNA synthesis, which means that the DNA of two cells after cell division
will always be a mixture of old and new DNA.
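Semiconservative synthesis can be followed over several rounds of replication with a small bookkeeping sketch. The representation is an assumption made for illustration only: each double-strand molecule is a pair of numbers giving the generation in which each strand was synthesized, and the function names are hypothetical.

```python
def replicate(duplexes, generation):
    """Copy every duplex once; each old strand becomes the template for one new strand.

    A duplex is a tuple (template_generation, new_strand_generation).
    """
    return [(template, generation) for duplex in duplexes for template in duplex]


def grow(generations):
    """Start from one chromosome whose strands are both labelled generation 0."""
    duplexes = [(0, 0)]
    for generation in range(1, generations + 1):
        duplexes = replicate(duplexes, generation)
    return duplexes


print(grow(1))  # [(0, 1), (0, 1)]
print(grow(2))  # [(0, 2), (1, 2), (0, 2), (1, 2)]
```

After every round, each molecule pairs a template strand from an earlier generation with a strand made in the current one, which is the mixture of old and new DNA described above.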
Figure 3.2. DNA replication of leading and lagging strands at the origin of replication.
A. To start replication, a bacterial genome opens at the origin of replication and is stabilized
by proteins (not shown here). A leading strand is continuously produced in both directions
from the origin, though it is the opposite strand in opposite directions. Newly produced DNA
is shown in black, the template strands are shown in grey.
B. At the same time, the enzyme produces the lagging strand as fragments, each starting from
a short RNA primer (blue); primers are laid down at intervals to keep up with the extending
bubble. The RNA primers are later replaced by DNA and the lagging strand fragments are joined
together.
C. The final DNA copies are a mixture of an existing strand (grey in the figure) and a novel
DNA strand (black). Following one strand along a complete circular chromosome, one moves
from one lagging half to one leading half, separated by the origin and terminus of replication.
Production of the leading and the lagging strand
During elongation, the bubble increases as more and more DNA is being copied. The borders
of the bubble are called the replication forks and as the new DNA is being elongated, the
growing chromosomes are called replichores. The replication forks move with a speed of 600
to 1000 basepairs per second, the speed with which DNA polymerase can synthesize DNA.
For each replichore, the enzyme will produce one continuous DNA strand, which is called the
leading strand. The complementary strand of each replichore can only be made with multiple
starts, from multiple primers (spaced 1000 to 2000 basepairs apart), as it is elongated in the
direction opposite to that of the moving replication fork. The strand that is produced from these
multiple primers is called the lagging strand. DNA polymerase will continue with the lagging
strand until it reaches the previous RNA primer, where it stops, to continue from the next
primer, closer to the replication fork. This results in a strand that consists of DNA
fragments, called Okazaki fragments, interrupted by short RNA fragments.
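The fork speed and primer spacing quoted above allow a rough back-of-the-envelope calculation. The sketch below is only an estimate under stated assumptions: a chromosome of 4.6 million basepairs (an illustrative value, roughly the size of an E. coli chromosome, not taken from this chapter), two forks that each copy half of the chromosome at constant speed, and one Okazaki fragment per primer. The function names are hypothetical.

```python
import math


def replication_time_seconds(genome_bp, fork_speed_bp_per_s, forks=2):
    """Time to copy the chromosome when `forks` forks work simultaneously."""
    return genome_bp / (forks * fork_speed_bp_per_s)


def okazaki_fragment_count(genome_bp, primer_spacing_bp):
    """Each fork makes a lagging strand for half the chromosome, so the two
    lagging strands together span one genome length."""
    return math.ceil(genome_bp / primer_spacing_bp)


GENOME_BP = 4_600_000  # assumed genome size, for illustration only

for speed in (600, 1000):
    minutes = replication_time_seconds(GENOME_BP, speed) / 60
    print(f"{speed} bp/s per fork: about {minutes:.0f} minutes")

for spacing in (1000, 2000):
    print(f"primer every {spacing} bp: about {okazaki_fragment_count(GENOME_BP, spacing)} fragments")
```

Under these assumptions a full round of replication takes somewhere between roughly 40 minutes and an hour, and a few thousand Okazaki fragments have to be processed and joined.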
The enzyme that produces the leading and lagging strand is called DNA polymerase III
(Pol III); it is the fastest DNA polymerase known. A different type of DNA polymerase will
perform the next step for the lagging strand: DNA polymerase I (Pol I), which has 5'-to-3'
exonuclease activity, will remove the RNA of each primer from the 5'-end, in the same
direction in which it then synthesizes DNA. Following the combined exonuclease and DNA
polymerase activity of Pol I, the lagging strand still only exists as disconnected DNA
fragments. A separate ligase can join these by forming the phosphodiester bond between
adjacent nucleotides; this is something DNA polymerases can't do.
Multiple enzymes are required for DNA replication
Replication does not only depend on DnaA, two types of DNA polymerase, primase and
ligase; additional proteins are necessary for the complete replication machinery, with many
proteins to solve specific problems. For instance, after separation of the two strands by DnaA,
during initiation at the origin of replication, the two strands of the growing replication fork are
not separated by DNA polymerase itself, but by a protein called helicase. Stabilizing proteins
are needed to prevent DNA polymerase from detaching from the DNA, and other proteins
bind to the temporarily single-strand DNA to protect it from degradation. As was already
mentioned, replication causes positive supercoiling upstream, and negative supercoiling
downstream of the opening bubble. Gyrase will relax the upstream positive supercoiling by
introducing extra negative coils, but this causes the two growing strands to become
intermingled, resulting in intertwined catenated structures that have to be untangled again,
which topoisomerase IV will do. However, this enzyme can stitch two chromosomes together
as well as release them (it cannot tell which crossings are parts of the same molecule and
which are two molecules crossing each other), unless the two molecules are separated
spatially at the same time as the enzyme untangles them. This is done by condensins, proteins
that condense and separate the two DNA strands leaving the replication fork so that they
don't remain intermingled.
A replication fork that halts for whatever reason (for instance due to a strand break, a
false base pair, or a chemical modification of a nucleotide that DNA polymerase can't
recognize) is detrimental to the dividing cell. A number of checks and balances are
in place to keep the replication machinery going and to repair any damage along the way;
these are described in the next chapter.
Figure 3.3. Four base atlases of various bacterial genomes. The genome of T. tengcongensis (a
thermophilic Firmicute, shown in top left) displays a strong GC skew (blue and magenta) and a
strong AT skew (green and red). Note that its genes are strongly preferred on the leading strand, as
can be seen from the blue and red coding sequences (CDS) separating on the two halves of the
chromosome. In contrast, P. gingivalis (a member of the Bacteroidetes causing gingivitis, top right)
has no bias in the bases, so that it is hard to see where its origin of replication is. The genome of
Veillonella (an anaerobic, Gram-negative Firmicute that lives in dental plaque, bottom left) has a
strong GC skew but a weak AT skew, and its genes are again preferred on the leading strand, whereas
V. fischeri (a marine Gammaproteobacterium, bottom right) has no strand preference for its genes,
despite having a strong GC skew and AT skew.
Base composition differences between leading and lagging strands
The leading and lagging strands of the chromosomal DNA can differ considerably in base
composition. For instance, the G's of all G-C basepairs tend to be found more often on the
leading strand. This is common in bacteria, and is called a GC-skew. Most bacteria display a
GC-skew in their genomes, so they have more G's on the leading strand than on the lagging
strand. In some bacterial genomes the leading strand also contains more A's than T's.
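A skew of this kind can be quantified from a written sequence. The sketch below uses the commonly applied measure (G - C) / (G + C), which is an assumption here because the chapter does not give a formula; the function names, window size and example sequences are hypothetical.

```python
def gc_skew(seq):
    """Return (G - C) / (G + C) for one stretch of sequence; 0.0 if it has no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc_skew(seq, window, step):
    """GC skew in consecutive windows along a linear (written) sequence."""
    return [gc_skew(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]


print(gc_skew("GGGC"))                      # 0.5
print(windowed_gc_skew("GGGGCCCC", 4, 4))   # [1.0, -1.0]
```

A positive value means an excess of G over C on the strand that was written down, and a negative value means the opposite. The analogous AT skew would use counts of A and T instead.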
Circular chromosomes can be shown in an atlas. Figure 3.3 presents four examples of
bacterial genomes with different GC-skews. Circular graphical representations are very useful
to display particular features. However, it isn't practical to write down their DNA sequence in
a circular manner. For written genome sequences, a circular chromosome or plasmid is
arbitrarily cut open to write the sequence in a linear way. By convention, the twelve o'clock
position of an atlas is where a written DNA sequence starts (and thus was artificially opened).
It makes sense to start a written bacterial chromosome at the origin of replication, since
replication of a chromosome is initiated here. In most cases, this would result in a written
sequence in which the first gene that appears would be dnaA, since that gene is most often
located next to the origin of replication. For historical reasons, this practice is not followed
with E. coli and related genomes. Moreover, for many sequenced bacterial genomes, the
origin of replication is either not known or not taken into account when preparing the final
sequence to be deposited in public databases, so that ori is not always located at the top of a
genome atlas, and dnaA doesn't always appear as the first gene in a genome sequence.
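Because the point at which a circular sequence is cut open is arbitrary, a written sequence can be re-opened at any other position without changing the molecule it describes. The following sketch shows this for a toy string; the function name and the 0-based position argument are assumptions made for illustration, and finding the actual origin of replication is not attempted here.

```python
def rotate_circular(seq, new_start):
    """Re-open a circular sequence so that position `new_start` (0-based) is written first."""
    if not seq:
        return seq
    new_start %= len(seq)
    return seq[new_start:] + seq[:new_start]


# Toy example: the same circle written from two different starting points.
print(rotate_circular("ABCDEF", 2))  # CDEFAB
```

Both strings describe the same circle; only the twelve o'clock position differs.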
As Figure 3.2 illustrated, replication starts at the ori but proceeds in diverging directions.
This means that both halves of a chromosome have a leading and a lagging strand, but it is not
the same strand in the complete chromosome molecule that is always leading. If one follows
one strand from ori, say in clockwise direction, to Ter and further up the circle again, one
would read the leading strand for the first half up to the terminus of replication, after which
this same strand becomes the lagging strand. It means that, for the chromosomes shown in
Figure 3.3, one complete sequence (which is one DNA strand written out in full) would report
an over-representation of G's in its first half (where we write the leading strand), and an
under-representation of G's in its second half (where the lagging strand is written down). The
over-representation of G's in the first half of a GC-skewed chromosome is compensated by an
under-representation in the other half. Moreover, whereas the terminus of replication is the
last bit of DNA produced during replication, it is not the last sequence we can find in our
DNA file: the Ter is somewhere in the middle of a written circular chromosome sequence, at
least when that sequence is opened up at the origin of replication.
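The switch from leading to lagging strand along one written sequence can be made visible with a cumulative GC skew, a running count that goes up by one for every G and down by one for every C. This is a sketch under assumptions: the 20-base toy sequence is invented, with a G-rich first half and a C-rich second half, to mimic a GC-skewed chromosome opened at the origin, and the function names are hypothetical.

```python
def cumulative_gc_skew(seq):
    """Running total of (+1 for G, -1 for C) along the written strand."""
    total = 0
    curve = []
    for base in seq.upper():
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        curve.append(total)
    return curve


def skew_extremes(seq):
    """Return (index of minimum, index of maximum) of the cumulative skew curve."""
    curve = cumulative_gc_skew(seq)
    lowest = min(range(len(curve)), key=curve.__getitem__)
    highest = max(range(len(curve)), key=curve.__getitem__)
    return lowest, highest


# Invented toy chromosome, written starting at the origin of replication.
toy = "GGATGGATGG" + "CCATCCATCC"
print(skew_extremes(toy))  # (19, 9)
```

In this toy case the curve rises through the first half, peaks at position 9 where the written strand changes from leading to lagging (the terminus), and falls back to its minimum at the last position, which on the circle lies next to the origin.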
3.2. Production of plasmid DNA
In many bacteria, chromosomes are not the only DNA molecules that need to be replicated.
When plasmids are present, these have to be multiplied and divided over the two sister cells
as well. Depending on the plasmid, it is maintained as a single copy per cell, at low copy
number or at high copy number. The copy number of a plasmid is largely dictated by the
initiation of its replication. Initiation of
plasmid replication typically depends on an initiation protein (often called RepA or RepC)
that is coded by the plasmid, which binds to a specific repeat sequence in the ori of the
plasmid. The repeat binding sites on plasmids are called iterons. Plasmid replication can be
bidirectional, just like the production of chromosomes, or unidirectional, which is a simplified
version of circular replication; a third mechanism of plasmid production is called rolling
circle replication.
Unidirectional or bidirectional plasmid replication
Plasmids undergoing bidirectional replication are produced with two replication forks, each
producing a leading and a lagging strand, just like a small chromosome. Plasmids undergoing
unidirectional replication produce two leading strands from an origin of replication that
continue all the way along the circle; the two strands meet again at the ori, as shown in
Figure 3.4. As with replication of a chromosome, DNA pol doesn't act alone, but needs a
number of other proteins to produce a plasmid copy. Most of these proteins are encoded on
the chromosome of the host cell in which the plasmids replicate, though some plasmids code
for their own helicase or primase. Unfortunately, the nomenclature of plasmid genes is rather
messy and confusing, and sometimes identical names are used for different genes (and
proteins) with different functions, depending on the type of plasmid.
Figure 3.4. Unidirectional plasmid replication. From the origin of replication (oriV), two
leading strands are produced that meet again at the origin. A lagging strand is not produced.
Plasmids can be divided into incompatibility families: some plasmids cannot be maintained
together in the same cell without external selective pressure, in which case they are said to
be incompatible. This is partly dictated by their iterons, which, when competing for the same
initiation protein, inhibit each other's replication. Incompatibility of plasmids will be further
explained in Chapter 7.
Plasmids of alpha-Proteobacteria that belong to a large group called RepABC replicons
contain genes for a DnaA-like initiation protein (called RepC for these plasmids) that
specifically initiates replication of the plasmid. Two other proteins, RepA and RepB, are
involved in segregation of plasmid during cell division; the three genes are located in one
single locus called RepABC which gave these plasmids their general name. A typical example
of a RepABC replicon is the Ti-plasmid of Agrobacterium tumefaciens.
Broad host-range plasmids of the IncQ family (found in Gram-negative bacteria but also in
Mycobacteria and Cyanobacteria) are extremely promiscuous, which means they can replicate
in a wide range of bacterial species. They can do so because they contain a relatively large
number of replication genes, making them less dependent on their host. They contain their
own initiation protein RepC, a specific helicase (here called RepA) and their own primase
(here called RepB; note that the same names are used as in the RepABC plasmids, though their
functions differ); sometimes SSB (the protein stabilizing ssDNA), gyrase or DNA Pol III
subunit genes are present as well. Typical examples of plasmids belonging to the IncQ family
are R300B (from Salmonella typhimurium), and closely related R1162 of Pseudomonas
aeruginosa.
The R1 plasmid of Salmonella typhimurium (a member of the IncFII family) and related
low-copy-number plasmids code for their own initiator protein RepA, whose production is
tightly regulated; after initiation, replication is unidirectional, terminating at the origin.
As a last example of some major plasmid families, members of the ColE1 family, which
have found many applications in biotechnology, initiate replication by transcription: an RNA
of 130 nucleotides is produced by RNA polymerase, after which DNA Pol I starts production
of the leading strand. Only after 300 nucleotides or so, Pol III takes over.
Rolling circle replication
An alternative mechanism to produce plasmid DNA copies, called rolling circle replication,
includes a stage in which a complete single-strand DNA plasmid is present in the cell. This is
illustrated in Figure 3.5. Rolling circle replication is mostly used by small, circular,
multicopy plasmids of Gram-positive bacteria. (However, rolling-circle replicating plasmids are
also known in some Gram-negatives and in archaea.)
Figure 3.5. Rolling circle replication. In this type of plasmid replication the two strands are
produced in two separate steps. First a leading strand is formed from the dso, with the help of
RepD and helicase PcrA; the existing leading strand is covalently bound to RepD as single-strand
DNA. This is recircularized by RepD, after which production of the lagging strand is
initiated at sso by primase. Newly-synthesized DNA is black and template DNA is grey.
Initiation of this type of replication depends on initiation protein RepD (the nomenclature
here is taken from Bacillus plasmids), whose gene is found on the plasmid. As a first step, the
RepD dimer binds to a specific sequence called dso for 'double-strand DNA origin of
replication', which forms a hairpin structure. Upon binding, RepD introduces a nick (a
single-strand break) in one of the loops. It then becomes covalently linked with one of its tyrosine
amino acids to the 5'-phosphate of the nicked DNA strand. RepD recruits a helicase (called
PcrA in Gram-positive bacteria), which will separate the two strands; SSB helps to keep the
two strands apart. Since the nick has produced a free 3'-OH end, DNA polymerase III can use
this for extension, which produces a leading strand. This strand is produced all the way
around to the dso site, and even extends a few nucleotides beyond its start. RepD then cuts the
displaced complementary strand of the original template, again via formation of a hairpin
structure. This results in a double-strand copy and a single-strand template that still is
covalently bound to the RepD dimer. The protein restores this to a complete circular template,
after which an alternative origin of replication is used to complete it to double-strand DNA:
the sso (single-strand origin of replication). Here, a primase produces an RNA primer that is
extended by DNA Pol III, and completed by DNA Pol I and ligase. Rolling circle replication
is also used by single-strand bacteriophages.
3.3. Production of bacteriophage DNA
Just like eukaryotes (including humans), bacteria suffer from viral infections. Viruses
replicating in bacteria are called bacteriophages. They cannot replicate by themselves, as they
depend on the transcription and translation machinery of their host to produce all necessary
proteins. Some bacteriophages carry genes for (a number of) replication proteins, but the
simplest viruses are nothing more than a piece of DNA containing genes that code for their
own structural components. Phage genomes can, however, contain a wide variety of other
genes as well. Since bacteria eventually reproduce all the bacteriophage genomes, their genes
can be considered as bacterial, although for part of the time the genes reside outside a
bacterial cell, temporarily protected against degradation by the virus proteins surrounding it.
Lytic phages cause viral infections in bacteria
A virus particle (which is not considered a living cell) is called a virion and consists of
nucleic acid (DNA or RNA) that is covered with a protein capsule (occasionally the capsule
contains both protein and lipids). The viral genome contains the genes required to produce the
proteins of which the virion is composed, as well as signals that are required to force an
infected cell to produce virion copies. However, a virus cannot independently replicate, as it
does not possess a translation machinery to produce the necessary proteins. A virus particle is
in most cases so small that it is only visible by electron microscopy. Viruses infecting bacteria
can be visualized as plaques on a lawn of cells growing on an agar plate. One cell that is
infected by a single virion will produce more virus copies, which infect neighboring cells, either
killing them as they leave the cells, or slowing down their growth. The result is a round hole
in the lawn where there is no bacterial growth, like a negative colony. These plaques gave
bacteria-infecting viruses their name bacteriophage ('bacteria eaters') that is usually shortened
to phage. Note, however, that not all phages kill their bacterial hosts. The term phage is now
used for any virus that infects prokaryotes.
Phages come in many types and sizes. The simplest particles are basically DNA or RNA
protected by a coat of protein, usually assembled into regular icosahedrons (three-dimensional
shapes with a regular triangular spatial build). One of the smallest bacterial viruses known is
Enterobacteria phage GA, whose genome measures a mere 3466 bp, coding for four proteins.
Other phages are more complex, and their particles consist of a head in which their
genome is stored, and a tail, which assists in exporting the genome into a host cell during
infection. Further structural components can add complexity to the morphology of phages. An
example of an exceptionally large bacteriophage is Pseudomonas phage 201phi2-1, with a
genome of 316,674 bp on which 461 genes have been identified. Obviously this much DNA
requires a larger head (also called capsid), but these are not even the largest viruses known.
The record holders are giant viruses infecting unicellular eukaryotes, with genomes
containing over a thousand genes. Their virions are as big as small bacterial cells, but they
cannot independently perform protein synthesis, which is the hallmark of the living world.
Margin box: The Russian doll effect
Even viruses can suffer from viral infections. The giant mamavirus (a cousin of the
mimivirus that was the first giant virus to be identified) is a eukaryotic virus that infects
amoebas. Inside its virus particle, which is as big as a small bacterial cell, replicating
viruses were identified that use the mamavirus as a host. In analogy to 'bacteriophage', a
virus parasitizing on other viruses has been named a 'virophage'. Virophages that use
bacteriophages as a host have not yet been discovered. Probably there are size constraints
that wouldn't allow a bacteriophage, which has to be small enough to infect and replicate
inside a bacterial cell, to harbor a virophage that has to be smaller still. However, giant
bacterial cells have been discovered, such as Thiomargarita or Epulopiscium species that are
visible to the naked eye, so it is not impossible that one day a virophage is discovered that
preys on a bacteriophage that preys on very large bacteria.
When a virus infects a host cell, it will inject its genome (which can be RNA or DNA,
either in single-strand or in double-strand form) into the cell. From then on, cellular proteins
will start reading the information on the viral DNA, which results in production of more virus
particles. Bacterial viruses that reproduce in this way are called lytic phages. However, there
are a number of phages that can alternatively insert their DNA inside the chromosome of the
cell, where it will reside, and be replicated, for generations to come. These are called
temperate phages. Such an integrated bacteriophage genome is called a prophage. Prophages
can eventually excise and be replicated to produce new virus particles. Lytic phages are often
detrimental to their host but there are many examples of bacteria that profit from the presence
of prophage genes. The life cycle of temperate phages will be treated in Chapter 7.
The infective cycle of lytic phages consists of distinct, closely regulated and well-timed
stages, illustrated in Figure 3.6. Infection starts with a virus particle binding to a host cell,
recognizing a specific receptor so that this binding determines the host specificity of a phage.
Following binding (which is sometimes called adsorption), the viral DNA or RNA is injected
inside the cell's cytoplasm. Filamentous phages, which have their nucleic acid strand packed in
a single layer of protein, form an exception, in that they cross the outer membrane of
Gram-negative cells completely to end up in the periplasm; crossing the inner membrane strips them
of their protein coat. The fate of viral RNA genomes will be treated in Chapter 14. Phages
with a DNA genome can immediately start the next step of their infective cycle: expression of
their genes. Phage genes that are regulated by promoters recognizable by the cellular sigma
factors and RNA polymerase will be transcribed as soon as the DNA enters the cell, so that
messenger RNA is being produced within minutes. Genes that are expressed during this phase
of the infective cycle are called early genes. They typically code for phage proteins that will
direct the cellular replication machinery towards producing more phage DNA: DNA
polymerase, helicase, primase, etc. As a result of this protein production, making use of the
host cell's translation machinery, the viral genome is being replicated. In addition, early genes
can code for regulators that will, once they have been produced in sufficient quantities, switch
on viral genes that produce the protein building blocks of the virion. Since these genes are
expressed with a delay upon viral entry, these are called late genes. As soon as sufficient
genome copies and phage building blocks have been produced, virus particles will assemble
spontaneously, and these accumulate inside the cell.
Phages that do not kill their host will leave the cell by diffusion to start a new infection
cycle. Not all host cells survive a viral infection, though. Cells can explode due to the
abundant presence of virus particles, while some phages produce lysozyme to degrade the
peptidoglycan layer, and will actively lyse the cell. Obviously, the production of early and
late virus proteins must be carefully regulated: lysing the cell too early would result in too few
virus copies, and the production of viral DNA and protein components has to be well
coordinated.
Figure 3.6. The infective cycle of a lytic phage. Chromosomal DNA is not shown and the cell
and virus are not drawn to scale. One infected cell can produce hundreds of phage virions.
Note that the virus shape shown here is only one of several existing phage morphologies.
In nature, bacteria can be frequently infected by phages, and the bacterial and viral populations
may reach an equilibrium in which neither is eliminated. Lytic phages are extremely
abundant in the ocean. It has been estimated that the complete bacterial biosphere of the ocean
is regenerated every few days as a result of phage-induced lysis. There are approximately ten
times more phage particles in a drop of seawater than there are bacteria present. Although less
generally recognized, bacteriophages are also very common in soil. Box 3.1 provides a brief
overview of the taxonomy of bacteriophages. Probably, wherever bacteria or archaea live,
bacteriophages are present as well.
Information box 3.1. Taxonomy of viruses
Viruses parasitize all kinds of living cells, and taxonomists have grouped them into families based on
the nature of their genetic material and their morphology, which is related to their gene content.
All bacteria share the 16S rRNA gene, which can be used as a taxonomic reference;
unfortunately, there is not a single gene that is conserved in every virus or bacteriophage.
Taxonomic divisions of viruses do not match their host ranges, and both archaea and eubacteria
can be infected by a wide variety of virus families.
The vast majority of bacteriophages belong to the Caudovirales, which are tailed viruses
containing dsDNA. They comprise three families: the Myoviridae, Siphoviridae and
Podoviridae. Most tailed bacteriophages are Siphoviridae, to which λ belongs. The
second most abundant family is the Myoviridae, to which T4, P2 and phage Mu belong.
Tailless viruses are less frequently observed as bacteriophages, but they are far more diverse
than the tailed phages, and are divided into more families. Tailless bacteriophages belong to at
least 12 families. All four types of nucleic acids (dsDNA, ssDNA, dsRNA or ssRNA) are
represented in tailless phages. Their morphology can be filamentous (containing dsDNA or
ssDNA), polyhedral (all four nucleic acid types are represented in this morphology) or
pleomorphic (e.g. dsDNA-containing phages of Sulfolobus species). Polyhedral, ssDNA-containing
Microviridae that parasitize Enterobacteria have been well studied. Phage M13 is an
example of an ssDNA filamentous phage. Polyhedral, dsDNA-containing Tectiviridae that use
Bacillus or Enterobacteria as a host contain a lipid vesicle inside their protein capsid.
3.4. Variations on the theme of replication
Production of linear chromosomes
Bacteria with a linear chromosome usually have their ori located in the middle, and start
bidirectional replication from there, just like with circular chromosomes. However, there is a
problem producing the last Okazaki fragment at the very ends of the lagging strand, since
removal of the very last primer leaves an overhanging 3'-end that can't be 'patched'. This is
solved by various mechanisms in bacteria. Borrelia species contain linear DNA replicons
(both plasmids and chromosomes) whose ends form an internal loop, called hairpin telomeres:
the two strands are fused into one continuous circular strand. This is somewhat similar to
eukaryotic chromosomes (which are linear as well, but initiate replication at multiple internal
origins rather than at a single central ori); their ends can form four-stranded structures, in which the
DNA folds back on itself. In Borrelia, replication produces a circular intermediate, in which the two
daughter replicons are connected. These are separated into two molecules by a special enzyme,
resolvase (ResT). This also restores the hairpins, and Figure 3.7 illustrates its action.
Streptomyces species with linear chromosomes and plasmids have solved the problem of
patching the 3'-ends of lagging strands differently: the single-strand end of the lagging strand
(estimated 230 nucleotides long) is stabilized by 'terminal proteins' (TPs) that are covalently
attached to the telomere 5'-ends. These proteins serve as a primer to complete the ends of the
lagging strands. The telomere ends of most known linear plasmids of Streptomyces are
strongly conserved, as are the genes coding for TPs, which are located near the ends of the
chromosome, and on some of the linear plasmids. How a covalently bound protein can serve
as a primer for DNA polymerase is not completely clear. The presence of long inverted
repeats in the terminal regions of the linear replicons suggests that hairpin structures can be
formed, which could serve as a primer. However, this does not seem to be the case; instead, the
inverted repeats are needed to bind a second protein, 'telomere-associated protein' (Tap), that
is essential for telomere completion. The covalent attachment of proteins to DNA ends is also
a strategy used by some bacteriophages, though these usually start unidirectional replication
from those ends, whereas bacterial linear chromosomes and plasmids seem to prefer
bidirectional replication. Another feature of Streptomyces is that it can go through multinucleoid stages, in which more than one chromosome copy is present in the cell, and in the
last section of this chapter we will see there are more bacteria with multiple chromosome
copies in their cells.
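What is meant by 'inverted repeats in the terminal regions' can be made concrete with a small sketch. The Python code below checks whether the two ends of a linear sequence are reverse complements of each other, which is the property that would allow a single strand to fold back on itself. The function names and the toy sequence are hypothetical illustrations, not real Streptomyces telomere data, and the sketch says nothing about whether such a hairpin actually forms or acts as a primer in the cell.

```python
# Minimal sketch: detect a terminal inverted repeat in a linear DNA sequence.
# All names and the example sequence are illustrative assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (letters A, C, G, T)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(seq: str) -> int:
    """Length of the longest left end whose reverse complement forms the right end.

    If the left end of a sequence is the reverse complement of its right end,
    then the sequence and its own reverse complement start with the same bases.
    The shared prefix length is therefore the inverted repeat length.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    length = 0
    for base, rc_base in zip(seq, rc):
        if base != rc_base:
            break
        length += 1
    # The two arms of the repeat cannot overlap, so cap at half the length.
    return min(length, len(seq) // 2)


# Toy replicon: a 6-nt left arm, a 5-nt spacer, and the arm's reverse complement.
toy_replicon = "ACGGTC" + "TTTTT" + reverse_complement("ACGGTC")
arm_length = terminal_inverted_repeat_length(toy_replicon)
```

For the toy sequence the function reports an arm of six nucleotides. Real terminal inverted repeats are much longer and rarely perfect, so a practical search would have to tolerate mismatches.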
Figure 3.7. Replication of Borrelia linear chromosome and plasmids. To the left, the
formation of a circular intermediate during bidirectional replication from the ori is shown,
which is separated into two chromosomes by ResT. The protein introduces two nicks and a
conformational change in the hairpin structures of the telomere ends as shown to the right.
Coordination of replication of multiple chromosomes
Bacteria with multiple chromosomes have to coordinate their replication carefully. This
coordination has been studied in Vibrio cholerae, which contains two circular chromosomes.
The origin of replication of chromosome I resembles that of E. coli, and initiation of its
replication depends on DnaA. Chromosome II, however, has a slightly different ori region
that more closely resembles that of plasmids; it requires the protein RctB for initiation. The presence of
two separate initiators may prevent competition, but requires coordinated expression of the
two proteins. Box 3.2 lists some organisms that replicate their chromosomes using alternative
strategies, compared to E. coli. Notably, replication in archaea differs significantly at
several steps and resembles that of eukaryotes more than that of bacteria. This is one of the
observations that have led to the proposal that eukaryotes evolved from archaeal ancestor
cells, living in symbiosis with eubacteria that eventually specialized into mitochondria.
Information Box 3.2: Chromosomes that beg to differ
• Replication in archaea resembles that of eukaryotes more than that of bacteria; for example,
there can be multiple replication starts per chromosome.
• Many halophilic archaea maintain mega-plasmids (or mini-chromosomes, depending on
the definition) in addition to their relatively small chromosomes, that carry ribosomal
RNA genes.
• Both Streptomyces and Borrelia species have linear chromosomes, but they do not use
the same strategy for replication termination.
• In Vibrio cholerae, production of the second, smaller chromosome is only started after
the major chromosome is nearly completely replicated, so that both are finished at the
same time.
• Fast-growing cells may start a new replication round when the first round isn't yet
completed.
• In E. coli, asymmetrical replichores, due to large insertions/deletions on one side of the
chromosome but not the other, decrease the fitness of the bacteria; however, in the closely
related Salmonella enterica, asymmetrical replichores have little effect on fitness when
tested under experimental conditions.
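The timing described in the Vibrio cholerae entry of the box can be illustrated with a short calculation. The Python sketch below assumes that both chromosomes are replicated bidirectionally at the same fork speed, so that replication time is proportional to chromosome size. The chromosome sizes used (about 2.96 Mbp and 1.07 Mbp) are approximate values that are not given in this chapter, and the function name is hypothetical.

```python
# Minimal sketch: when must the smaller chromosome start replicating so that
# both chromosomes finish at the same time?
# Assumption: equal fork speed on both chromosomes, so time scales with size.

def delayed_start_fraction(size_large_bp: int, size_small_bp: int) -> float:
    """Fraction of the large chromosome already replicated when the small
    chromosome has to initiate in order to terminate simultaneously."""
    if size_small_bp <= 0 or size_large_bp < size_small_bp:
        raise ValueError("expected 0 < size_small_bp <= size_large_bp")
    return 1.0 - size_small_bp / size_large_bp


# Approximate sizes, assumed for illustration only.
CHROMOSOME_I_BP = 2_960_000
CHROMOSOME_II_BP = 1_070_000

start_fraction = delayed_start_fraction(CHROMOSOME_I_BP, CHROMOSOME_II_BP)
```

Under these assumptions the smaller chromosome would have to initiate when roughly two-thirds of the larger one has been copied. The exact figure depends on the real fork speeds and chromosome sizes, which this sketch does not model.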
Polyploid bacteria: multiple copies of identical chromosomes
Instead of a unique single chromosome per cell, a number of bacteria prefer to maintain
multiple identical copies of their chromosome. This is called polyploidy and such organisms
are polyploid. In contrast, organisms with a single copy of a chromosome per cell are called
monoploid, whereas the term haploid is reserved for the phase of sexually reproducing
organisms where their reproductive cells contain one copy of each chromosome. Finally, the
term oligoploid is used to describe cells with reduced numbers of multiple genome copies,
compared to their true polyploid stage. Oligoploid cells are typically observed in a particular
growth phase of a polyploid species.
Polyploidy has been observed for various species from a number of bacterial phyla. These
include Cyanobacteria (e.g. Synechococcus species), the Spirochete Borrelia hermsii, the
Firmicute Epulopiscium fishelsoni, members from the Deinococcus-Thermus phylum, and a
number of Proteobacteria.
An impressive example of a polyploid γ-Proteobacterium is Azotobacter vinelandii,
because it produces so many copies of its chromosome. That the species is polyploid was
discovered by the inactivation of essential genes, since the cells would maintain at least one
chromosome copy still bearing an intact gene, while other copies of the gene were
successfully inactivated. Fast-growing A. vinelandii cells accumulate, in the late-stationary
phase, 50 to 100 copies of their genome per cell, which causes the cells to swell up
considerably. However, when this species is grown on minimal medium, the slow-growing
cells remain monoploid. This has led to the interpretation that all this extra DNA may serve
as a store, possibly of nitrogen and phosphate, though the use of such stored DNA in times
when food becomes scarce has not yet been demonstrated.
A more moderate example of polyploidy is the β-Proteobacterium Neisseria gonorrhoeae.
Two identical chromosome copies are present before, and four copies after replication (before
cell division), so that this species is in fact diploid. In a population of exponentially growing
cells (with a mixture of cells being present before and after replication) this results in an
average of three chromosomes per cell (though none of the cells present will actually contain
three copies). This DNA is located in different nucleoid regions inside the cell. During cell
division, there is only one pair of replication forks. As a result, the diploid cells are
homozygous, since the multiple DNA copies are all derived from one replicating molecule.
This was established by production of a mutant in which two chromosome copies received
different antibiotic resistance inserts, targeted at the same chromosomal location. The
resulting double-resistant mutants had rescued one of these resistance markers by homologous
recombination, which placed the gene in a different location, as is schematically shown in
Figure 3.8. Heterozygous offspring of cells bearing two chromosomes with the two different
resistance markers in the same location were never observed. Such observations can only be
explained when one of the two chromosome copies is exclusively replicated during cell
division. Another member of the Neisseria genus, N. meningitidis, was also found to be
diploid, but the property is not conserved in all members of the genus, as the commensal
Neisseria lactamica contains just one chromosome copy per cell.
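The average of three chromosome copies mentioned above for N. gonorrhoeae follows from simple arithmetic, sketched below in Python. The sketch assumes that a cell carries either two copies (before replication) or four copies (after replication) and that half of the cells in the population are in each state. This equal split is an illustrative assumption, not a measured value, and the function name is hypothetical.

```python
# Minimal sketch: population-average chromosome copy number for a mixture of
# cells holding two copies (pre-replication) or four copies (post-replication).

def mean_copy_number(fraction_post_replication: float,
                     copies_before: int = 2,
                     copies_after: int = 4) -> float:
    """Weighted mean copy number over the two cell states."""
    if not 0.0 <= fraction_post_replication <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    fraction_pre = 1.0 - fraction_post_replication
    return copies_before * fraction_pre + copies_after * fraction_post_replication


# Assumed equal split between the two states.
average_copies = mean_copy_number(0.5)
```

With an equal split the mean is three, although every individual cell holds either two or four copies.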
Figure 3.8. Experimental demonstration that Neisseria gonorrhoeae is homozygous.
In this transformation experiment, two different mutants were first produced (here called
Mutant R1 and Mutant R2) that had different resistance genes introduced in the same
location. Two chromosome copies are shown, called A and A' for clarity. DNA of one mutant
was then transformed into the other, and double-resistant transformants were selected. These
were discovered to be homozygous and their combined DNA contained the two resistance
genes invariably in two locations. No heterozygous bacteria were identified that carried both
genes in the same location, suggesting that the multiple chromosome copies of the cells have
to be identical.
An interesting polyploid γ-Proteobacterium is Buchnera aphidicola. It lives as an
endosymbiont in aphids and its genome is amongst the smallest bacterial genomes known. Its
chromosome is believed to have undergone severe gene reduction (a process that may still be
ongoing) as an adaptation to its symbiotic life. Apart from a minute chromosome of a mere
420 to 650 kbp (depending on the strain), some strains also contain up to two plasmids. This
very small genome is packed in a very large cell that in fact contains a very large amount of DNA: a
Buchnera cell is approximately 15 times bigger than an E. coli cell, and contains 10 times as
much DNA, since the genome is multiplied to 50 to 200 copies, depending on the age of the
aphid.
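The DNA content of a Buchnera cell can be checked against the numbers given above with a short calculation. The Python sketch below multiplies the genome size by the copy number for both extremes of the quoted ranges and compares the result with a single E. coli chromosome. The E. coli genome size of about 4.6 Mbp is an assumed reference value that is not stated in this section, and the names are hypothetical.

```python
# Minimal sketch: total chromosomal DNA per cell = genome size x copy number.
# Genome sizes and copy numbers for Buchnera are the ranges quoted in the text;
# the E. coli genome size is an assumed reference value.

def total_dna_bp(genome_size_bp: int, copies: int) -> int:
    """Total chromosomal DNA per cell in base pairs."""
    return genome_size_bp * copies


ECOLI_GENOME_BP = 4_600_000  # assumption, approximate

buchnera_low = total_dna_bp(420_000, 50)    # smallest genome, fewest copies
buchnera_high = total_dna_bp(650_000, 200)  # largest genome, most copies

# Express both extremes relative to one E. coli chromosome.
fold_low = buchnera_low / ECOLI_GENOME_BP
fold_high = buchnera_high / ECOLI_GENOME_BP
```

The two extremes are 21 Mbp and 130 Mbp, which bracket the tenfold excess over E. coli mentioned in the text. The comparison ignores plasmids and any additional chromosome copies in growing E. coli cells.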
Polyploidy may not at all be unusual for Proteobacteria; it had just not been studied until
recently. A default of one copy for a bacterial chromosome had been assumed, based on
extrapolation from E. coli, but it had rarely been verified experimentally. When the genome
copy number was determined for four Proteobacterial species, using real-time PCR, E. coli
was indeed found to be monoploid, as were Caulobacter crescentus and Wolinella
succinogenes. However, Pseudomonas putida was polyploid, with 14 copies of terminus regions
per cell on average.
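How a copy number per cell can be derived from real-time PCR data is sketched below in Python. The sketch assumes a common approach: a standard curve relates the threshold cycle (Ct) to the logarithm of the number of template copies, and the number of copies found in a lysate is divided by the number of cells that went into the reaction. The chapter does not describe the protocol that was actually used, so the function names, the curve parameters and the cell count are all hypothetical.

```python
# Minimal sketch: genome copies per cell from a real-time PCR measurement.
# Assumption: a linear standard curve Ct = slope * log10(copies) + intercept,
# obtained beforehand from a dilution series of known template amounts.

def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to get template copies in the reaction."""
    return 10 ** ((ct - intercept) / slope)


def genome_copies_per_cell(ct: float, slope: float, intercept: float,
                           cells_in_reaction: float) -> float:
    """Template copies divided by the number of lysed cells in the reaction."""
    if cells_in_reaction <= 0:
        raise ValueError("cells_in_reaction must be positive")
    return copies_from_ct(ct, slope, intercept) / cells_in_reaction


# Hypothetical numbers: a slope near -3.32 corresponds to a doubling per cycle.
SLOPE = -3.32
INTERCEPT = 38.0
example_ct = INTERCEPT + SLOPE * 6  # Ct expected for one million copies

copies_per_cell = genome_copies_per_cell(example_ct, SLOPE, INTERCEPT,
                                         cells_in_reaction=1e5)
```

In this made-up example one million template copies from one hundred thousand cells give about ten copies per cell. Measuring a target near the origin and one near the terminus separately would give two different values for a replicating population, which is why the text specifies terminus regions.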
Genetic manipulation experiments had led to the observation that Thermus thermophilus (a
member of the Deinococcus-Thermus phylum) is polyploid, because a resistance gene could
be introduced by homologous recombination, but at the same time the target gene that was
supposed to have been inactivated in the transformants remained intact. A chromosomal copy
number of four to five was subsequently determined, and that same number of copies was also
found for its large plasmid. It had already been known that Deinococcus radiodurans (another
member of this phylum) was polyploid, and this organism may use its chromosomal copies
for rapid DNA recombination repair as a mechanism to provide resistance to extreme
radiation.
Linear chromosomes can also be found in multiple copies, as the Spirochete Borrelia
hermsii demonstrates. This organism causes tick-borne relapsing fever in North America, and
its linear chromosome is present on average as 16 copies, as was detected in cells that were
grown in mice (the species doesn't replicate outside a host). The bacteria also contain a
number of linear plasmids, whose copy numbers seem to be slightly lower than that of the
chromosome.
The most complex polyploidy is possibly found in a Firmicute. Epulopiscium fishelsoni is
a symbiont (it can't be cultured in the laboratory) that lives in the gut of the Red Sea brown
surgeonfish (Acanthurus nigrofuscus; similar bacteria have been found in related surgeonfish
species). The bacteria display gigantism, and their cells are among the largest bacterial cells known.
The size of the cigar-shaped cells can reach over 0.6 mm, and varies 20-fold in length, or over
2,000-fold in volume. The largest observed cells exceed the volume of an E. coli cell by five
orders of magnitude, but their size varies considerably during the day. This variation in cell size
reflects a complex daily life cycle that is probably regulated by the fish's dietary intake. The
cells contain one or two nucleoids that increase in size as the cell grows during the day. This
increase in nucleoid size is related to an increase in DNA content, presumably by
multiplication of its chromosome, though chromosomal copy numbers have not yet been
determined.
It is not known why some bacterial species prefer to maintain multiple chromosome copies
per cell. The trait is found in endosymbionts as well as in free-living cells, in fast-growing or
slower-growing organisms, and a transition from exponential growth into stationary phase can
increase or decrease copy numbers, depending on the species. Most likely, the biological
function of polyploidy depends on the specific requirements of the species.
Polyploidy is not restricted to Eubacteria, as it has also been demonstrated for a number of
Archaea. Some Sulfolobus species, which are Crenarchaeota, were found to contain two
copies during most of their cell cycle, and one copy prior to replication; this resembles a G2
phase typical for many eukaryotes. However, most Crenarchaeota seem to be monoploid. In
contrast, polyploidy is quite common for Euryarchaeota. An example is Archaeoglobus
fulgidus, which also seems to go through a G2 phase, with two chromosome copies. Other
Euryarchaeotes follow different strategies. Methanothermobacter thermoautotrophicus grows
in filaments with cells that contain several nucleoids, each of which contains a single
chromosome. Methanococcus jannaschii, on the other hand, contains multiple copies of the
chromosome throughout the cell, and these are not always evenly distributed to the daughter
cells during cell division. Halobacterium salinarum was shown to contain an average of 25
chromosome copies in exponential phase, but decreases this to 15 copies during stationary
phase. Other Euryarchaeota also go through alternating phases of oligoploidy and polyploidy,
strictly regulated during their growth phase. Experiments have shown that polyploid
Euryarchaeotes can even be heterozygous, though in absence of selection, the multiple copies
of their chromosome rapidly converge to a homozygous state. A temporary heterozygous
stage could provide an evolutionary advantage, as it increases the genetic repertoire of an
organism. Thus different patterns of polyploidy exist, and it is a more general phenomenon in
the prokaryotic world than once thought.
3.5. Concluding remarks
Bacteria are not simple and uniform bags of DNA and protein; they are highly organized and
diverse living cells that have evolved various ways to multiply their DNA. DNA replication is
an essential process for all living cells, and the multiple proteins involved cooperate in a
complex manner. Regulatory interactions exist at various levels, and different species have
organized their genomes in different ways, with variations on the production of their DNA.
Although many of the processes are currently best studied in E. coli, there is no reason to
assume that the solutions this bacterium came up with are superior, or even more generally
conserved, than alternatives found in other species; the latter have just been less frequently
studied.
Recommended reading
Mechanism and evolution of DNA primases. Kuchta RD and Stengel G. 2010. Biochim
Biophys Acta 1804:1180-1189.
One-way traffic control in replication termination. Theis K. 2006. Nat Chem Biol. 2:455-456.
Characterization and in vitro reaction properties of 19 unique hairpin telomeres from the
linear plasmids of the Lyme disease spirochete. Tourand Y, Deneke J, Moriarty TJ,
Chaconas G. 2009. J Biol Chem. 284:7264-7272.
Soil to genomics: the Streptomyces chromosome. Hopwood DA. 2006. Annu Rev Genet.
40:1-23.
The physics of virus assembly. Stockley PG and Twarock R. 2010. Phys Biol. 7(4):040301.
Regulation of the initiation of chromosomal replication in bacteria. Zakrzewska-Czerwińska
J, Jakimowicz D, Zawilak-Pawlik A, Messer W. 2007. FEMS Microbiol Rev. 31:378-387.
Plasmid rolling-circle replication: highlights of two decades of research. Khan SA. 2005.
Plasmid. 53:126-136.