Bacterial Genetics and Molecular Biology - a Genomics Perspective (Ch. 3)
Trudy M. Wassenaar, David W. Ussery

Chapter 3. DNA production in the bacterial cell

Complex molecules in biology are usually polymers made from simple 'building block' entities, and different functions are obtained from variation in the sequence of these building blocks. Polymerization processes involve three stages: initiation, elongation, and then termination. This is true for the production of all the major polymers in the cell: DNA, RNA, proteins and carbohydrates. In the case of DNA, individual nucleotides are polymerized into long molecules. The production of a copy of a DNA molecule is called replication and it occurs through the same three stages: initiation (at a specific site in the chromosome, called the replication origin), elongation (the synthesis of DNA strands - in this case, copying of a template strand) and termination (usually on the opposite side of the chromosome from the origin, at the replication terminus). The main enzyme responsible for elongation is DNA polymerase, which produces the two genome copies needed for cell division.

3.1. Production of the chromosome

DNA polymerase produces a DNA strand from a single-strand template

The main enzyme responsible for the polymerization of DNA nucleotides is DNA polymerase. However, it is important to note that DNA polymerase (DNA pol for short) is not the only enzyme involved in the production of genome copies, and it functions as an enzyme complex in which many different proteins take part. At the core of this process is a DNA pol that connects nucleotides (dNTPs) to each other to form a DNA strand. It can only do this when an existing denatured (single-stranded) DNA strand dictates the order of the nucleotides it connects: single-stranded DNA has to act as a template.
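The template-directed nature of this reaction can be illustrated with a small sketch. The Python below is only a toy model under stated assumptions, not a description of the enzyme mechanism: the function name `copy_template` and the pairing table are illustrative, and a strand is treated as a plain string written 5' to 3'. The function returns the strand that would be produced when the given template is copied, which is its reverse complement.

```python
# Toy model of template-directed DNA synthesis (illustrative only).
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def copy_template(template_5to3: str) -> str:
    """Return the complementary strand, written 5' to 3'.

    The new strand is antiparallel to the template, so the template is
    read from its 3' end while the product grows at its own 3' end.
    """
    new_strand = []
    for base in reversed(template_5to3.upper()):
        if base not in PAIRING:
            raise ValueError(f"unexpected base: {base!r}")
        new_strand.append(PAIRING[base])
    return "".join(new_strand)


if __name__ == "__main__":
    template = "ATGCGT"  # hypothetical example sequence
    print(copy_template(template))
```

Copying the product a second time returns the original sequence, which is a quick way to check that the pairing rules are applied consistently.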
The enzyme attaches nucleotides at the 3'-OH end of a growing DNA strand, as shown by the grey strand in Figure 3.1, using the existing single-strand DNA as a template to connect the nucleotides in the correct order. However, DNA polymerase can only extend a nucleotide that is already present, which means at least a piece of double-strand DNA must already exist, as in the figure. DNA polymerase can't start on a completely single-stranded template, as it wouldn't have a nucleotide to extend.

Figure 3.1. DNA nucleotides are connected by DNA polymerase
A single-strand DNA molecule, with a short double-strand part (which can be DNA or RNA), allows DNA polymerase to produce a complementary strand. A few nucleotides are specified for clarification. When the enzyme has reached the end of the template, the product is a complete double-stranded DNA that consists of one old strand (the template, grey) and one newly formed strand (black).

The three stages of DNA replication

DNA polymerase (together with multiple other enzymes) is responsible for production of a genome copy in the cell, but how are the requirements met of using single-strand DNA as a template, and a piece of double-strand DNA to start with? During initiation, a bubble opens up at the origin of replication (ori). This is a specific section of the chromosome, and to open it up, a crucial protein called DnaA is required. The gene coding for DnaA is usually located close to the ori. The opened bubble has to be stabilized by proteins, as single-strand DNA (ssDNA) is an unfavorable conformation and is sensitive to both mutation and degradation in a cell. A protein called SSB, for single-strand DNA binding protein, provides stabilization of ssDNA. An enzyme called primase (an RNA polymerase which can start all by itself) will next produce a short RNA fragment complementary to each of the two strands.
These so-called primers will serve as a starting point for DNA polymerase for each of the strands that have to be produced. The opening of the origin of replication will be explained in detail in the next chapter.

Once the two primers are in place, two copies of the enzyme DNA polymerase each produce a DNA strand complementary to their own template. This process is described as elongation. Because the two strands of DNA are anti-parallel, the two enzymes work in opposite directions so that they are moving away from each other, as shown in Figure 3.2. For the other strand, a primer is produced a little further downstream, and DNA pol will extend that strand in the opposite direction.

Finally, the last step, termination, will take place. When the two replichores near completion, the two replication forks on both sides of the chromosome will meet each other. This happens at the termination region, which is not as strictly defined as the origin of replication. Termination may be a spontaneous process when the two forks meet in some bacteria, but in many bacterial chromosomes it is regulated. Usually, multiple repeat sequences (each repeat unit, called ter, is 22 basepairs long) act as blocks to the polymerase, to prevent one of the two approaching polymerase enzymes from overshooting: these repeats allow passage of the enzyme only in the orientation towards the middle. Specific proteins bind to ter and stop the replication machinery. The enzyme ligase joins the two loose ends together.

It is important to realize that when a chromosome is copied, the two products will each consist of one old strand (which served as template) and one novel strand. This is called semiconservative DNA synthesis, which means that the DNA of two cells after cell division will always be a mixture of old and new DNA.

Figure 3.2.
DNA replication of leading and lagging strands at the origin of replication. A. To start replication, a bacterial genome opens at the origin of replication and is stabilized by proteins (not shown here). A leading strand is continuously produced in both directions from the origin, though it is the opposite strand in opposite directions. Newly produced DNA is shown in black, the template strands are shown in grey. B. At the same time, the enzyme produces the lagging strand as fragments, each starting from short RNA primers (blue) at intervals to keep up with the extending bubble. The RNA primers are later replaced by DNA and the lagging strand fragments are joined together. C. The final DNA copies are a mixture of an existing strand (grey in the figure) and a novel DNA strand (black). Following one strand along a complete circular chromosome, one moves from one lagging half to one leading half, separated by the origin and terminus of replication.

Production of the leading and the lagging strand

During elongation, the bubble increases as more and more DNA is being copied. The borders of the bubble are called the replication forks and as the new DNA is being elongated, the growing chromosomes are called replichores. The replication forks move with a speed of 600 to 1000 basepairs per second, the speed with which DNA polymerase can synthesize DNA. For each replichore, the enzyme will produce one continuous DNA strand, which is called the leading strand. The complementary strand of each replichore can only be made with multiple starts, from multiple primers (one every 1000 - 2000 basepairs), as it is elongated in the opposite direction of the moving replication fork. The strand that is produced from these multiple primers is called the lagging strand.
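The numbers quoted above allow a rough back-of-the-envelope estimate of how long a chromosome takes to copy and how many lagging-strand fragments are needed. The sketch below is only an illustration under simplifying assumptions that are not from the text: both forks move at a constant speed and each copies exactly half of the chromosome, and the genome size of 4.6 million basepairs is a hypothetical, roughly E. coli-sized example. The function names are likewise made up for this example.

```python
def replication_time_seconds(genome_bp: int,
                             fork_speed_bp_per_s: float,
                             forks: int = 2) -> float:
    """Time for `forks` replication forks to copy the whole chromosome.

    Assumes constant fork speed and an equal share of the genome per fork.
    """
    return genome_bp / (forks * fork_speed_bp_per_s)


def okazaki_fragment_count(genome_bp: int, primer_spacing_bp: int) -> int:
    """Approximate number of lagging-strand fragments per replication round.

    Each replichore covers half the chromosome and has one lagging strand,
    so the two lagging strands together span one genome length.
    """
    # Ceiling division without floating point.
    return -(-genome_bp // primer_spacing_bp)


if __name__ == "__main__":
    genome_bp = 4_600_000  # assumed example size, not taken from the text
    for speed in (600, 1000):  # fork speeds quoted in the text (bp/s)
        minutes = replication_time_seconds(genome_bp, speed) / 60
        print(f"{speed} bp/s: about {minutes:.0f} minutes")
    for spacing in (1000, 2000):  # primer spacing quoted in the text (bp)
        count = okazaki_fragment_count(genome_bp, spacing)
        print(f"primer every {spacing} bp: about {count} fragments")
```

Under these assumptions a chromosome of this size would need on the order of 40 to 60 minutes and a few thousand primers per round of replication.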
DNA polymerase will continue with the lagging strand until it reaches the previous RNA primer, where it stops, to continue further downstream along the replication fork with the next primer. This results in a strand that consists of DNA fragments, called Okazaki fragments, interrupted by short RNA fragments. The enzyme that produces the leading and lagging strand is called DNA polymerase III (Pol III); it is the fastest DNA polymerase known. A different type of DNA polymerase will perform the next step for the lagging strand: DNA polymerase I (Pol I), which has specific exonuclease activity, will eat away the RNA of each primer from the 5'-end, in the same direction in which it then synthesizes DNA. Following the combined exonuclease and DNA polymerase activity of Pol I, the lagging strand still only exists as disconnected DNA fragments. A separate ligase can join these by closing the sugar-phosphate bond between adjacent nucleotides; this is something DNA polymerases can't do.

Multiple enzymes are required for DNA replication

Replication does not only depend on DnaA, two types of DNA polymerase, primase and ligase; additional proteins are necessary for the complete replication machinery, many of them solving specific problems. For instance, after separation of the two strands by DnaA during initiation at the origin of replication, the two strands of the growing replication fork are not separated by DNA polymerase itself, but by a protein called helicase. Stabilizing proteins are needed to prevent DNA polymerase from detaching from the DNA, and other proteins bind to the temporarily single-strand DNA to protect it from degradation. As was already mentioned, replication causes positive supercoiling upstream, and negative supercoiling downstream of the opening bubble.
Gyrase will relax the upstream positive supercoiling by introducing extra negative coils, but this causes the two growing strands to become intermingled, resulting in intertwined catenated structures that have to be untangled again, which topoisomerase IV will do. However, this enzyme can stitch two chromosomes together as well as release them (it cannot tell which crossings are parts of the same molecule and which are two molecules crossing each other), unless the two molecules are separated spatially at the same time as the enzyme untangles them. This is done by condensins, proteins that condense and separate the two DNA strands leaving the replication fork so that they don't remain intermingled.

A replication fork that halts for whatever reason (for instance due to a strand break, a false base pair, or a chemical modification of a nucleotide that DNA polymerase can't recognize) would be detrimental to the dividing cell. A number of checks and balances are in place to keep the replication machinery going and to repair any damage along the way; these are described in the next chapter.

Figure 3.3. Four base atlases of various bacterial genomes.
The genome of T. tengcongensis (a thermophilic Firmicute, shown in top left) displays a strong GC skew (blue and magenta) and a strong AT skew (green and red). Notice that its genes are strongly preferred on the leading strand, as can be seen from the blue and red coding sequences (CDS) separating on the two halves of the chromosome. In contrast, P. gingivalis (a Bacteroidetes causing gingivitis, top right) has no bias in the bases, so that it is hard to see where its origin of replication is.
The genome of Veillonella (an anaerobic, Gram-negative Firmicute that lives in dental plaque, bottom left) has a strong GC base skew but a weak AT base skew, and its genes are again preferred on the leading strand, whereas the genes of V. fischeri (a marine Gammaproteobacterium, bottom right) show no strand preference, despite a strong GC skew and AT skew.

Base composition differences between leading and lagging strands

The leading and lagging strands of the chromosomal DNA can differ considerably in base composition. For instance, the G's of all G-C basepairs tend to be found more often on the leading strand. This is common in bacteria, and is called a GC-skew. Most bacteria display a GC-skew in their genomes, so they have more G's on the leading strand than on the lagging strand. For some bacterial genomes the leading strand also contains more A's than T's.

Circular chromosomes can be shown in an atlas. Figure 3.3 presents four examples of bacterial genomes with different GC-skews. Circular graphical representations are very useful to display particular features. However, it isn't practical to write down their DNA sequence in a circular manner. For written genome sequences, a circular chromosome or plasmid is arbitrarily cut open to write the sequence in a linear way. By convention, the twelve o'clock position of an atlas is where a written DNA sequence starts (and thus was artificially opened). It makes sense to start a written bacterial chromosome at the origin of replication, since replication of a chromosome is initiated here. In most cases, this would result in a written sequence in which the first gene that appears would be dnaA, since that gene is most often located next to the origin of replication. For historical reasons, this practice is not followed with E. coli and related genomes.
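The GC-skew described above is easy to measure on a written sequence. The sketch below uses the commonly applied definition (G - C) / (G + C), computed in consecutive windows along one written strand; this formula, the function names and the window size are assumptions made for illustration and are not taken from this chapter. The test sequence is synthetic: its first half is made G-rich and its second half C-rich, so the skew is positive in the first windows and negative in the last ones.

```python
def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for a sequence, or 0.0 if it has no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc_skew(seq: str, window: int) -> list[float]:
    """GC skew in consecutive, non-overlapping windows along the sequence.

    A trailing piece shorter than `window` is ignored.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        gc_skew(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


if __name__ == "__main__":
    # Synthetic example: G-rich first half, C-rich second half.
    synthetic = "GGCAT" * 20 + "CCGAT" * 20
    for index, skew in enumerate(windowed_gc_skew(synthetic, window=50)):
        print(f"window {index}: skew {skew:+.2f}")
```

On real genomes much larger windows (thousands of basepairs) are typically used, because the skew is a small statistical bias rather than a clean switch.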
Moreover, for many sequenced bacterial genomes, the origin of replication is either not known or not taken into account when preparing the final sequence to be deposited in public databases, so that ori is not always located at the top of a genome, and dnaA doesn't always appear as the first gene in a genome sequence.

As Figure 3.2 illustrated, replication starts at the ori but proceeds in diverging directions. This means that both halves of a chromosome have a leading and a lagging strand, but it is not the same strand in the complete chromosome molecule that is always leading. If one follows one strand from ori, say in clockwise direction, to the terminus (Ter) and further up the circle again, one would read the leading strand for the first half up to the terminus of replication, after which this same strand becomes the lagging strand. It means that, for the chromosomes shown in Figure 3.3, one complete sequence (which is one DNA strand written out in full) would report an over-representation of G's in its first half (where we write the leading strand), and an under-representation of G's in its second half (where the lagging strand is written down). The over-representation of G's in the first half of a GC-skewed chromosome is compensated by an under-representation in the other half. Moreover, whereas the terminus of replication is the last bit of DNA produced during replication, it is not the last sequence we can find in our DNA file: the Ter is somewhere in the middle of a written circular chromosome sequence, at least when that sequence is opened up at the origin of replication.

3.2. Production of plasmid DNA

In many bacteria, chromosomes are not the only DNA molecules that need to be replicated. When plasmids are present, these have to be multiplied and divided over the two sister cells as well. Depending on the plasmid, it is maintained as a single copy per cell, at low copy number or at high copy number.
The copy number of a plasmid is largely dictated by initiation of its replication. Initiation of plasmid replication typically depends on an initiation protein (often called RepA or RepC) that is coded by the plasmid, which binds to a specific repeat sequence in the ori of the plasmid. The repeat binding sites on plasmids are called iterons. Plasmid replication can be bidirectional, just like the production of chromosomes, or unidirectional, which is a simplified version of circular replication; a third mechanism of plasmid production is called rolling circle replication.

Unidirectional or bidirectional plasmid replication

Plasmids undergoing bidirectional replication are produced with two replication forks, each producing a leading and a lagging strand, just like a small chromosome. Plasmids undergoing unidirectional replication produce two leading strands from an origin of replication that continue all the way along the circle; the two strands meet again at the ori, as shown in Figure 3.4. As with replication of a chromosome, DNA pol doesn't act alone, but needs a number of other proteins to produce a plasmid copy. Most of these proteins are encoded on the chromosome of the host cell in which the plasmids replicate, though some plasmids code for their own helicase or primase. Unfortunately, the nomenclature of plasmid genes is rather messy and confusing, and sometimes identical names are used for different genes (and proteins) with different functions, depending on the type of plasmid.

Figure 3.4. Unidirectional plasmid replication
From the origin of replication (oriV), two leading strands are produced that meet again at the origin. A lagging strand is not produced.
Plasmids can be divided into incompatibility families: some plasmids cannot be maintained together in the same cell without external selective pressure, in which case they are said to be incompatible. This is partly dictated by their iterons, which, when competing for the same initiation protein, inhibit each other's replication. Incompatibility of plasmids will be further explained in Chapter 7.

Plasmids of alpha-Proteobacteria that belong to a large group called RepABC replicons contain genes for a DnaA-like initiation protein (called RepC for these plasmids) that specifically initiates replication of the plasmid. Two other proteins, RepA and RepB, are involved in segregation of the plasmid during cell division; the three genes are located in one single locus called RepABC, which gave these plasmids their general name. A typical example of a RepABC replicon is the Ti-plasmid of Agrobacterium tumefaciens.

Broad host-range plasmids of the IncQ family (found in Gram-negative bacteria but also in Mycobacteria and Cyanobacteria) are extremely promiscuous, which means they can replicate in a wide range of bacterial species. They can do so because they contain a relatively large number of replication genes, making them less dependent on their host. They contain their own initiation protein RepC, a specific helicase (here called RepA) and their own primase (here called RepB; note that the same names are used as in the RepABC plasmids though their functions differ); sometimes SSB (the protein stabilizing ssDNA), gyrase or DNA Pol III subunit genes are present as well. Typical examples of plasmids belonging to the IncQ family are R300B (from Salmonella typhimurium), and the closely related R1162 of Pseudomonas aeruginosa.

The R1 plasmid of Salmonella typhimurium (a member of the IncFII family) and related low-copy-number plasmids code for their own initiator protein RepA, whose production is tightly regulated; after initiation, replication is unidirectional, terminating at the origin.
As a last example of some major plasmid families, members of the ColE1 family, which have found many applications in biotechnology, initiate replication by transcription: an RNA of 130 nucleotides is produced by RNA polymerase, after which DNA Pol I starts production of the leading strand. Only after 300 nucleotides or so does Pol III take over.

Rolling circle replication

An alternative mechanism to produce plasmid DNA copies, called rolling circle replication, includes a stage in which a complete single-strand DNA plasmid is present in the cell. This is illustrated in Figure 3.5. Rolling circle replication is mostly used by small, circular, multicopy plasmids of Gram-positive bacteria. (However, rolling-circle replicating plasmids are also known in some Gram-negatives and in archaea.)

Figure 3.5. Rolling circle replication.
In this type of plasmid replication the two strands are produced in two separate steps. First a leading strand is formed from the dso, with the help of RepD and helicase PcrA; the existing leading strand is covalently bound to RepD as single-strand DNA. This is recircularized by RepD, after which production of the lagging strand is initiated at sso by primase. Newly synthesized DNA is black and template DNA is grey.

Initiation of this type of replication depends on initiation protein RepD (the nomenclature here is taken from Bacillus plasmids), whose gene is found on the plasmid. As a first step, the dimer RepD binds to a specific sequence called dso for 'double-strand DNA origin of replication', which forms a hairpin structure. Upon binding, RepD introduces a nick (a single-strand break) in one of the loops. It then becomes covalently linked with one of its tyrosine amino acids to the 5'-phosphate of the nicked DNA strand.
RepD recruits a helicase (called PcrA in Gram-positive bacteria), which will separate the two strands; SSB helps to keep the two strands apart. Since the nick has produced a free 3'-OH end, DNA polymerase III can use this for extension, which produces a leading strand. This strand is produced all the way around to the dso site, and even extends a few nucleotides beyond its start. RepD then cuts the displaced complementary strand of the original template, again via formation of a hairpin structure. This results in a double-strand copy and a single-strand template that still is covalently bound to the RepD dimer. The protein restores this to a complete circular template, after which an alternative origin of replication is used to complete it to double-strand DNA: the sso (single-strand origin of replication). Here, a primase produces an RNA primer that is extended by DNA Pol III, and completed by DNA Pol I and ligase. Rolling circle replication is also used by single-strand bacteriophages.

3.3. Production of bacteriophage DNA

Just like eukaryotes (including humans), bacteria suffer from viral infections. Viruses replicating in bacteria are called bacteriophages. They cannot replicate by themselves, as they depend on the transcription and translation machinery of their host to produce all necessary proteins. Some bacteriophages carry genes for (a number of) replication proteins, but the simplest viruses are nothing more than a piece of DNA containing genes that code for their own structural components. Phage genomes can, however, contain a wide variety of other genes as well. Since bacteria eventually reproduce all the bacteriophage genomes, their genes can be considered as bacterial, although for part of the time the genes reside outside a bacterial cell, temporarily protected against degradation by the virus proteins surrounding them.
Lytic phages cause viral infections in bacteria

A virus particle (which is not considered a living cell) is called a virion and consists of nucleic acid (DNA or RNA) that is covered with a protein capsule (occasionally the capsule contains both protein and lipids). The viral genome contains the genes required to produce the proteins of which the virion is composed, as well as signals that are required to force an infected cell to produce virion copies. However, a virus cannot independently replicate, as it does not possess a translation machinery to produce the necessary proteins. A virus particle is in most cases so small that it is only visible by electron microscopy.

Viruses infecting bacteria can be visualized as plaques on a lawn of cells growing on an agar plate. One cell that is infected by a single virion will produce more virus copies, which infect neighboring cells, either killing them as they leave the cells, or slowing down their growth. The result is a round hole in the lawn where there is no bacterial growth, like a negative colony. These plaques gave bacteria-infecting viruses their name bacteriophage ('bacteria eaters'), which is usually shortened to phage. Note, however, that not all phages kill their bacterial hosts. The term phage is now used for any virus that infects prokaryotes.

Phages come in many types and sizes. The simplest particles are basically DNA or RNA protected by a coat of protein, usually assembled into regular icosahedrons (three-dimensional shapes with a regular triangular spatial build). One of the smallest bacterial viruses known is Enterobacteria phage GA, whose genome measures a mere 3466 bp, coding for four proteins. Other phages are more complex, and their particles consist of a head in which their genome is stored, and a tail, which assists in exporting the genome into a host cell during infection. Further structural components can add complexity to the morphology of phages.
An example of an exceptionally large bacteriophage is Pseudomonas phage 201phi2-1, with a genome of 316,674 bp on which 461 genes have been identified. Obviously this much DNA requires a larger head (also called capsid), but these are not even the largest viruses known. The record holders are giant viruses infecting unicellular eukaryotes, with genomes containing over a thousand genes. Their virions are as big as small bacterial cells, but they cannot independently perform protein synthesis, which is the hallmark of the living world.

Margin box: The Russian doll effect
Even viruses can suffer from viral infections. The giant mamavirus (a cousin of the mimivirus that was the first giant virus to be identified) is a eukaryotic virus that infects amoebas. Inside its virion, which is as big as a small bacterial cell, replicating viruses were identified that use the mamavirus as a host. In analogy to 'bacteriophage', a virus parasitizing other viruses has been named a 'virophage'. Virophages that use bacteriophages as a host have not yet been discovered. Probably there are size constraints that wouldn't allow a bacteriophage, which has to be small enough to infect and replicate inside a bacterial cell, to harbor a virophage that has to be smaller still. However, giant bacterial cells have been discovered, such as Thiomargarita or Epulopiscium species that are visible to the naked eye, so it is not impossible that one day a virophage is discovered that preys on a bacteriophage that preys on very large bacteria.

When a virus infects a host cell, it will inject its genome (which can be RNA or DNA, either in single-strand or in double-strand form) into the cell. From then on, cellular proteins will start reading the information on the viral DNA, which results in production of more virus particles.
Bacterial viruses that reproduce in this way are called lytic phages. However, there are a number of phages that can alternatively insert their DNA into the chromosome of the cell, where it will reside, and be replicated, for generations to come. These are called temperate phages. Such an integrated bacteriophage genome is called a prophage. Prophages can eventually excise and be replicated to produce new virus particles. Lytic phages are often detrimental to their host but there are many examples of bacteria that profit from the presence of prophage genes. The life cycle of temperate phages will be treated in Chapter 7.

The infective cycle of lytic phages consists of distinct, closely regulated and well-timed stages, illustrated in Figure 3.6. Infection starts with a virus particle binding to a host cell by recognizing a specific receptor, so that this binding determines the host specificity of a phage. Following binding (which is sometimes called adsorption), the viral DNA or RNA is injected into the cell's cytoplasm. Filamentous phages, which have their nucleic acid strand packed by a single layer of protein, form an exception, in that they cross the outer membrane of Gram-negative cells completely to end up in the cytoplasm; crossing the inner membrane strips them of their protein coat. The fate of viral RNA genomes will be treated in Chapter 14.

Phages with a DNA genome can immediately start the next step of their infective cycle: expression of their genes. Phage genes that are regulated by promoters recognizable by the cellular sigma factors and RNA polymerase will be transcribed as soon as the DNA enters the cell, so that messenger RNA is being produced within minutes. Genes that are expressed during this phase of the infective cycle are called early genes. They typically code for phage proteins that will direct the cellular replication machinery towards producing more phage DNA: DNA polymerase, helicase, primase, etc.
As a result of this protein production, which makes use of the host cell's translation machinery, the viral genome is replicated. In addition, early genes can code for regulators that will, once they have been produced in sufficient quantities, switch on viral genes that produce the protein building blocks of the virion. Since these genes are expressed with a delay after viral entry, they are called late genes. As soon as sufficient genome copies and phage building blocks have been produced, virus particles will assemble spontaneously, and these accumulate inside the cell.

Phages that do not kill their host will leave the cell by diffusion to start a new infection cycle. Not all host cells survive a viral infection, though. Cells can explode due to the abundant presence of virus particles, while some phages produce lysozyme to degrade the peptidoglycan layer, and will actively lyse the cell. Obviously, the production of early and late virus proteins must be carefully regulated: lysing the cell too early would result in too few virus copies, and the production of viral DNA and protein components has to be well coordinated.

Figure 3.6. The infective cycle of a lytic phage.
Chromosomal DNA is not shown and the cell and virus are not drawn to scale. One infected cell can produce hundreds of phage virions. Note that the virus shape shown here is only one of several existing phage morphologies.

In nature, bacteria can be frequently infected by phages, and the bacterial and viral population may reach an equilibrium, which doesn't eliminate either. Lytic phages are extremely abundant in the ocean. It has been estimated that the complete bacterial biosphere of the ocean is regenerated every few days as a result of phage-induced lysis. There are approximately ten times more phage particles in a drop of seawater than there are bacteria present.
Although less generally recognized, bacteriophages are also very common in soil. Probably, wherever bacteria or archaea live, bacteriophages are present as well. Box 3.1 provides a brief overview of the taxonomy of bacteriophages.

Information box 3.1. Taxonomy of viruses
Viruses parasitize all living cells, and taxonomists have grouped them into families based on the nature of their genetic material and their morphology, which is related to their gene content. All bacteria share the 16S rRNA gene, which can be used as a taxonomic reference; unfortunately, there is not a single gene that is conserved in every virus or bacteriophage. Taxonomic divisions of viruses do not match their host ranges, and both archaea and eubacteria can be infected by a wide variety of virus families.

The vast majority of bacteriophages belong to the Caudovirales, which are tailed viruses containing dsDNA. They comprise three families: the Myoviridae, Siphoviridae and Podoviridae. Most tailed bacteriophages are Siphoviridae, to which λ belongs. The second most abundant family is the Myoviridae, to which T4, P2 and phage Mu belong.

Tailless viruses are less frequently observed as bacteriophages, but they are far more diverse than the tailed phages, and are divided into more families. Tailless bacteriophages belong to at least 12 families. All four types of nucleic acids (dsDNA, ssDNA, dsRNA or ssRNA) are represented in tailless phages. Their morphology can be filamentous (containing dsDNA or ssDNA), polyhedral (all four nucleic acid types are represented in this morphology) or pleomorphic (e.g. dsDNA-containing phages of Sulfolobus species). Polyhedral, ssDNA-containing Microviridae parasitizing Enterobacteria have been well studied. Phage M13 is an example of an ssDNA filamentous phage.
Polyhedral, dsDNA-containing Tectiviridae that use Bacillus or an enterobacterium as a host contain a lipid vesicle inside their protein capsid.

3.4. Variations on the theme of replication

Production of linear chromosomes

Bacteria with a linear chromosome usually have their ori located in the middle and start bidirectional replication from there, just as with circular chromosomes. However, there is a problem producing the last Okazaki fragment at the very end of the lagging strand: removal of the very last primer leaves an overhanging 3'-end that can't be 'patched', because there is no upstream 3'-OH end from which DNA polymerase could fill the gap. Bacteria solve this by various mechanisms.

Borrelia species contain linear DNA replicons (both plasmids and chromosomes) whose ends form a closed loop, called hairpin telomeres: the two strands are fused into one continuous circular strand. This is somewhat similar to eukaryotic chromosomes, which are linear as well (though they start replication from multiple internal origins rather than from a single central ori); their ends can form four-stranded structures, in which the DNA folds back on itself. In Borrelia, replication then produces a circular intermediate, in which the two replicons are connected. These are separated into two molecules by a special enzyme, the telomere resolvase ResT. This also restores the hairpins; Figure 3.7 illustrates its action.

Streptomyces species with linear chromosomes and plasmids have solved the problem of patching the 3'-ends of lagging strands differently: the single-strand end of the lagging strand (estimated to be 230 nucleotides long) is stabilized by 'terminal proteins' (TPs) that are covalently attached to the telomere 5'-ends. These proteins serve as a primer to complete the ends of the lagging strands. The telomere ends of most known linear plasmids of Streptomyces are strongly conserved, as are the genes coding for TPs, which are located near the ends of the chromosome and on some of the linear plasmids.
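The end-replication problem described at the start of this section can be made concrete with a toy model. The Python sketch below is only an illustration under simplifying assumptions, not a description of a real replisome: the function names, the fragment and primer lengths, and the choice to place the last primer exactly at the template's 3'-end are all hypothetical. Positions are counted from the 3'-end of the template, so the new strand grows 5' to 3' with increasing index.

```python
def replicate_lagging_end(template_len, fragment_len, primer_len):
    """Toy model of lagging-strand synthesis on a linear template.

    Returns the new strand as a string: 'D' = DNA, '-' = unfilled gap.
    All parameters are illustrative assumptions, not measured values.
    """
    strand = ["-"] * template_len
    # Lay down Okazaki fragments; each starts with an RNA primer ('R').
    for start in range(0, template_len, fragment_len):
        end = min(start + fragment_len, template_len)
        for i in range(start, end):
            strand[i] = "R" if i < start + primer_len else "D"
    # Primer removal leaves gaps.
    strand = ["-" if base == "R" else base for base in strand]
    # Gap filling: DNA polymerase can only extend an existing 3'-OH,
    # i.e. a gap is filled only if DNA is present immediately upstream.
    for i in range(1, template_len):
        if strand[i] == "-" and strand[i - 1] == "D":
            strand[i] = "D"
    return "".join(strand)


def unreplicated_overhang(new_strand):
    """Number of template bases left single-stranded at the 3'-end."""
    return len(new_strand) - len(new_strand.lstrip("-"))


new_strand = replicate_lagging_end(template_len=30, fragment_len=10, primer_len=3)
print(new_strand)                         # ---DDDDDDDDDDDDDDDDDDDDDDDDDDD
print(unreplicated_overhang(new_strand))  # 3
```

Internal primer gaps are filled because the neighbouring fragment supplies a 3'-OH, whereas the terminal gap has no upstream neighbour and stays open. In this model that is the gap that hairpin telomeres or terminal proteins have to deal with.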
How a covalently bound protein can serve as a primer for DNA polymerase is not completely clear. The presence of long inverted repeats in the terminal regions of the linear replicons suggests that hairpin structures can be formed, which could serve as a primer. However, this does not seem to be the case; instead, the inverted repeats are needed to bind a second protein, 'telomere-associated protein' (Tap), that is essential for telomere completion. The covalent attachment of proteins to DNA ends is also a strategy used by some bacteriophages, though these usually start unidirectional replication from those ends, whereas bacterial linear chromosomes and plasmids seem to prefer bidirectional replication.

Another feature of Streptomyces is that it can go through multinucleoid stages, in which more than one chromosome copy is present in the cell; in the last section of this chapter we will see that there are more bacteria with multiple chromosome copies in their cells.

Figure 3.7. Replication of the Borrelia linear chromosome and plasmids. On the left, the formation of a circular intermediate during bidirectional replication from the ori is shown, which is separated into two chromosomes by ResT. The protein introduces two nicks and a conformational change in the hairpin structures of the telomere ends, as shown on the right.

Coordination of replication of multiple chromosomes

Bacteria with multiple chromosomes have to coordinate their replication carefully. This coordination has been studied in Vibrio cholerae, which contains two circular chromosomes. The origin of replication of chromosome I resembles that of E. coli, and initiation of its replication depends on DnaA. Chromosome II, however, has a slightly different ori region that more closely resembles that of plasmids; it requires the protein RctB for initiation.
The presence of two separate initiators may prevent competition, but requires coordinated expression of the two proteins. Box 3.2 lists some bacteria that replicate their chromosomes using strategies that differ from those of E. coli. Notably, replication in archaea differs significantly at several steps and more closely resembles that of eukaryotes than that of bacteria. This is one of the observations that have led to the proposal that eukaryotes evolved from archaeal ancestor cells living in symbiosis with eubacteria that eventually specialized into mitochondria.

Information box 3.2. Chromosomes that beg to differ
• Replication in archaea more closely resembles that of eukaryotes than that of bacteria, for example in having multiple replication starts per chromosome.
• Many halophilic archaea maintain, in addition to their relatively small chromosomes, mega-plasmids (or mini-chromosomes, depending on the definition) that carry ribosomal RNA genes.
• Both Streptomyces and Borrelia species have linear chromosomes, but they do not use the same strategy to complete replication at the chromosome ends.
• In Vibrio cholerae, production of the second, smaller chromosome is only started after the major chromosome is nearly completely replicated, so that both are finished at the same time.
• Fast-growing cells may start a new replication round when the first round isn't yet completed.
• In E. coli, asymmetrical replichores, due to large insertions/deletions on one side of the chromosome but not the other, decrease the fitness of the bacteria; however, in the closely related Salmonella enterica, asymmetrical replichores have little effect on fitness when tested under experimental conditions.

Polyploid bacteria: multiple copies of identical chromosomes

Instead of a unique single chromosome per cell, a number of bacteria prefer to maintain multiple identical copies of their chromosome.
This is called polyploidy, and such organisms are polyploid. In contrast, organisms with a single copy of a chromosome per cell are called monoploid, whereas the term haploid is reserved for the phase of sexually reproducing organisms in which their reproductive cells contain one copy of each chromosome. Finally, the term oligoploid is used to describe cells that carry multiple genome copies, but fewer than in their true polyploid stage. Oligoploid cells are typically observed in a particular growth phase of a polyploid species.

Polyploidy has been observed in various species from a number of bacterial phyla. These include Cyanobacteria (e.g. Synechococcus species), the spirochete Borrelia hermsii, the Firmicute Epulopiscium fishelsoni, members of the Deinococcus-Thermus phylum, and a number of Proteobacteria.

An impressive example of a polyploid γ-Proteobacterium is Azotobacter vinelandii, because it produces so many copies of its chromosome. That the species is polyploid was discovered by the inactivation of essential genes: the cells would maintain at least one chromosome copy still bearing an intact gene, while other copies of the gene were successfully inactivated. Fast-growing A. vinelandii cells accumulate, in the late-stationary phase, 50 to 100 copies of their genome per cell, which causes the cells to swell up considerably. However, when this species is grown on minimal medium, the slow-growing cells remain monoploid. This has led to the interpretation that all this extra DNA may serve as a store, possibly of nitrogen and phosphate, though the use of such stored DNA in times when food becomes scarce has not yet been demonstrated.

A more moderate example of polyploidy is the β-Proteobacterium Neisseria gonorrhoeae. Two identical chromosome copies are present before replication and four copies after replication (before cell division), so that this species is in fact diploid.
In a population of exponentially growing cells (a mixture of cells before and after replication), this results in an average of three chromosomes per cell (though none of the cells present will actually contain three copies). This DNA is located in different nucleoid regions inside the cell. During cell division, there is only one pair of replication forks. As a result, the diploid cells are homozygous, since the multiple DNA copies are all derived from one replicating molecule. This was established by production of a mutant in which two chromosome copies received different antibiotic resistance inserts, targeted at the same chromosomal location. The resulting double-resistant mutants had rescued one of these resistance markers by homologous recombination, which placed the gene in a different location, as is schematically shown in Figure 3.8. Heterozygous offspring of cells bearing two chromosomes with the two different resistance markers in the same location were never observed. Such observations can only be explained if only one of the two chromosome copies is replicated during cell division. Another member of the Neisseria genus, N. meningitidis, was also found to be diploid, but the property is not conserved in all members of the genus, as the commensal Neisseria lactamica contains just one chromosome copy per cell.

Figure 3.8. Experimental demonstration that Neisseria gonorrhoeae is homozygous. In this transformation experiment, two different mutants were first produced (here called Mutant R1 and Mutant R2) that had different resistance genes introduced in the same location. Two chromosome copies are shown, called A and A' for clarity. DNA of one mutant was then transformed into the other, and double-resistant transformants were selected.
These were discovered to be homozygous, and their combined DNA invariably contained the two resistance genes in two locations. No heterozygous bacteria were identified that carried both genes in the same location, suggesting that the multiple chromosome copies of the cells have to be identical.

An interesting polyploid γ-Proteobacterium is Buchnera aphidicola. It lives as an endosymbiont in aphids and its genome is amongst the smallest bacterial genomes known. Its chromosome is believed to have undergone severe gene reduction (a process that may still be ongoing) as an adaptation to its symbiotic life. Apart from a minute chromosome of a mere 420 to 650 kbp (depending on the strain), some strains also contain up to two plasmids. This very small genome is packed in a very large cell that in fact contains a great deal of DNA: a Buchnera cell is approximately 15 times bigger than an E. coli cell and contains 10 times as much DNA, since the genome is multiplied to 50 to 200 copies, depending on the age of the aphid.

Polyploidy may not be at all unusual for Proteobacteria; it had just not been studied until recently. A default of one copy for a bacterial chromosome had been assumed, based on extrapolation from E. coli, but it had rarely been verified experimentally. When the genome copy number was determined for four Proteobacterial species, using real-time PCR, E. coli was indeed found to be monoploid, as were Caulobacter crescentus and Wolinella succinogenes. However, Pseudomonas putida was polyploid, with on average 14 copies of the terminus region per cell.
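As a rough plausibility check on the Buchnera figures quoted above, the total DNA content per cell can be compared with that of a monoploid E. coli cell. The sketch below uses the genome sizes and copy numbers from the text. The E. coli genome size of about 4.6 Mbp is an assumption added here (it is not stated in this chapter), and the function name is hypothetical.

```python
# Assumption (not from the text): approximate genome size of E. coli K-12.
ECOLI_GENOME_BP = 4_600_000


def dna_content_ratio(genome_bp, copies, reference_bp=ECOLI_GENOME_BP):
    """Total DNA per cell relative to one copy of the reference genome."""
    return genome_bp * copies / reference_bp


# Extremes of the ranges given in the text: 420-650 kbp, 50-200 copies.
low = dna_content_ratio(420_000, 50)
high = dna_content_ratio(650_000, 200)
print(f"{low:.1f}x to {high:.1f}x the DNA of a monoploid E. coli cell")
# prints: 4.6x to 28.3x the DNA of a monoploid E. coli cell
```

Under these assumptions the quoted factor of about 10 falls within the range spanned by the smallest and largest combinations of genome size and copy number.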
Genetic manipulation experiments had led to the observation that Thermus thermophilus (a member of the Deinococcus-Thermus phylum) is polyploid, because a resistance gene could be introduced by homologous recombination, but at the same time the target gene that was supposed to have been inactivated in the transformants remained intact. A chromosomal copy number of four to five was subsequently determined, and that same number of copies was also found for its large plasmid. It had already been known that Deinococcus radiodurans (another member of this phylum) was polyploid, and this organism may use its chromosomal copies for rapid recombinational DNA repair as a mechanism to provide resistance to extreme radiation.

Linear chromosomes can also be found in multiple copies, as the spirochete Borrelia hermsii demonstrates. This organism causes tick-borne relapsing fever in North America, and its linear chromosome is present on average as 16 copies, as was detected in cells that were grown in mice (the species doesn't replicate outside a host). The bacteria also contain a number of linear plasmids, whose copy numbers seem to be slightly lower than that of the chromosome.

The most complex polyploidy is possibly found in a Firmicute. Epulopiscium fishelsoni is a symbiont (it can't be cultured in the laboratory) that lives in the gut of the Red Sea brown surgeonfish (Acanthurus nigrofuscus; similar bacteria have been found in related surgeonfish species). The bacteria display gigantism, and their cells are among the biggest bacterial cells known. The size of the cigar-shaped cells can reach over 0.6 mm, and varies 20-fold in length, or over 2,000-fold in volume. The largest observed cells exceed the volume of an E. coli cell by five orders of magnitude, but their size varies considerably during the day. This variation in cell size reflects a complex daily life cycle that is probably regulated by the fish's dietary intake.
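The two size ranges quoted for Epulopiscium (20-fold in length, 2,000-fold in volume) can be related with a little arithmetic. The sketch below is a back-of-the-envelope illustration that treats the cigar-shaped cell as a cylinder, so that volume scales with length times width squared. That geometric simplification and the function name are assumptions made here, not statements from the text.

```python
import math


def width_fold_for_cylinder(length_fold, volume_fold):
    """Fold-change in width implied by V ~ L * w**2 (cylinder assumption)."""
    return math.sqrt(volume_fold / length_fold)


length_fold = 20      # from the text
volume_fold = 2000    # from the text ("over 2,000-fold")

width_fold = width_fold_for_cylinder(length_fold, volume_fold)
isometric_volume_fold = length_fold ** 3  # if all dimensions scaled equally

print(width_fold)             # 10.0
print(isometric_volume_fold)  # 8000
```

Under the cylinder assumption, a 2,000-fold volume range combined with a 20-fold length range implies roughly a 10-fold range in width, which is less than the 8,000-fold volume range that fully proportional growth would give.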
The cells contain one or two nucleoids that increase in size as the cell grows during the day. This increase in nucleoid size is related to an increase in DNA content, presumably by multiplication of its chromosome, though chromosomal copy numbers have not yet been determined.

It is not known why some bacterial species prefer to maintain multiple chromosome copies per cell. The trait is found in endosymbionts as well as in free-living cells, in fast-growing or slower-growing organisms, and a transition from exponential growth into stationary phase can increase or decrease copy numbers, depending on the species. Most likely, the biological function of polyploidy depends on the specific requirements of the species.

Polyploidy is not restricted to Eubacteria, as it has also been demonstrated in a number of Archaea. Some Sulfolobus species, which are Crenarchaeota, were found to contain two copies during most of their cell cycle, and one copy prior to replication; this resembles a G2 phase typical for many eukaryotes. However, most Crenarchaeota seem to be monoploid. In contrast, polyploidy is quite common in Euryarchaeota. An example is Archaeoglobus fulgidus, which also seems to go through a G2 phase, with two chromosome copies. Other Euryarchaeota follow different strategies. Methanothermobacter thermoautotrophicus grows in filaments with cells that contain several nucleoids, each of which contains a single chromosome. Methanococcus jannaschii, on the other hand, contains multiple copies of the chromosome throughout the cell, and these are not always evenly distributed to the daughter cells during cell division. Halobacterium salinarum was shown to contain an average of 25 chromosome copies in exponential phase, but decreases this to 15 copies during stationary phase.
Other Euryarchaeota also go through alternating phases of oligoploidy and polyploidy, strictly regulated during their growth phase. Experiments have shown that polyploid Euryarchaeota can even be heterozygous, though in the absence of selection the multiple copies of their chromosome rapidly converge to a homozygous state. A temporary heterozygous stage could provide an evolutionary advantage, as it increases the genetic repertoire of an organism. Thus different patterns of polyploidy exist, and it is a more general phenomenon in the prokaryotic world than once thought.

3.5. Concluding remarks

Bacteria are not simple and uniform bags of DNA and protein; they are highly organized and diverse living cells that have evolved various ways to multiply their DNA. DNA replication is an essential process for all living cells, and the multiple proteins involved cooperate in a complex manner. Regulatory interactions exist at various levels, and different species have organized their genomes in different ways, with variations on the production of their DNA. Although many of the processes are currently best studied in E. coli, there is no reason to assume that the solutions this bacterium came up with are superior to, or even more generally conserved than, alternatives found in other species; the latter have just been less frequently studied.