This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the author's institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or TeX form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/copyright

Author's personal copy

FEBS Letters 586 (2012) 2043–2048
journal homepage: www.FEBSLetters.org

Review

The layout of a bacterial genome

François Képès a,b,⇑, Brian C. Jester a, Thibaut Lepage a, Nafiseh Rafiei a, Bianca Rosu a, Ivan Junier a,c

a Epigenomics Project, Institute of Systems and Synthetic Biology, Genopole®, CNRS, University of Evry, France
b Institute of Systems and Synthetic Biology, Division of Molecular Biosciences, Imperial College, London, UK
c Center for Genomic Regulation, Barcelona, Spain

Article info

Article history: Received 2 March 2012; Revised 25 March 2012; Accepted 26 March 2012; Available online 31 March 2012
Edited by Thomas Reiss and Wilhelm Just
Keywords: Genome layout and design; Bacteria; Replication; Transcription; Codon bias; Co-functional gene

Abstract

Recently the mismatch between our newly acquired capacity to synthesize DNA at genome scale, and our low capacity to design ab initio a functional genome has become conspicuous. This essay gathers a variety of constraints that globally shape natural genomes, with a focus on eubacteria.
These constraints originate from chromosome replication (leading/lagging strand asymmetry; gene dosage gradient from origin to terminus; collisions with the transcription complexes), from biased codon usage, from noise control in gene expression, and from genome layout for co-functional genes. On the basis of this analysis, lessons are drawn for full genome design.

© 2012 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

⇑ Corresponding author at: Epigenomics Project, Institute of Systems and Synthetic Biology, Genopole®, CNRS, University of Evry, France. E-mail address: [email protected] (F. Képès).
0014-5793/$36.00. http://dx.doi.org/10.1016/j.febslet.2012.03.051

1. Introduction

After 40 years of genetic engineering, human technology has produced millions of genomes that differ, often in a fully-defined way, from the original genomes of genetically tractable species, mostly bacteria. These precise differences generally amount to the alteration, insertion or deletion of one or a few genes. A few cases have been reported of deletions or inversions of long genomic segments, carried out to ask specific scientific questions [1–5]. More recently, under the remit of synthetic biology, we have witnessed more systematic attempts at deleting a series of short and long segments (e.g., [6]). Various goals were pursued, such as to increase genome stability or to shorten genomes; on the application side, a major objective was to improve cell performance as a host or chassis for heterologous gene expression.

Another line of research, called synthetic genomics, aims at chemically synthesizing full genomes to replace the original within live cells. This approach has already been validated with the synthesis and replacement of a 1 Mbp bacterial genome, even though its sequence was nearly identical to that of the original strain [7]. This near identity symbolizes the mismatch between our newly acquired capacity to synthesize DNA at genome scale, and our low capacity to design ab initio a functional genome. This discordance was eloquently expressed by Porcar et al. in the following terms: “paradoxically, the success in synthesizing (copying) DNA also highlights the very poor ability to de novo design (writing) genomes, which is certainly a consequence of the limited knowledge we have of the inherent complexity of living forms.” [8].

Operationally, what we are missing is the whole set of the essential constraints that shape a functional genome. It is anticipated that for a given set of constraints, myriads of solutions must exist, among them the natural genomes. It is sometimes estimated that there exist about 10^8 species on the Earth, and strain-to-strain variability within each species further increases the number of viable genomes that constitute the global hereditary complement of our planet. Within this scope, comparative genomics has efficiently delineated fundamental bacterial processes because it is now supported by a thousand sequenced genomes. Hence, several sets of major constraints may be uncovered in the realm of bacteria, and testing this hypothesis may pave the way to many fundamental insights.

This essay highlights the known constraints that shape a functional bacterial genome. It aims at proposing strategies for full design of a new-to-nature genome layout in the vein of synthetic biology [9]. Constraints at the gene scale, such as transcriptional and translational start and stop signals, or the making of an operon, as important as they are, are out of the scope of this discussion, which focuses on the genome scale.

Genome layout comprises the position, order and orientation of genes in the genome. These features are highly non-random, reflecting in part functional adaptation. Genes, through recombination and related processes, repeatedly sample various positions along the chromosome. Over evolutionary time, genes come to be fixed at one among many positions that confer high fitness to the cell. Often, relative gene position matters more than absolute location: a gene would fit better at a given position with respect to other genes, e.g., co-functional genes.

Fig. 1. Genome-scale properties of eubacterial chromosomes. (a) Head-on (upper unit) and codirectional (lower unit) collisions between replication (“Rep”) and transcription (“T”) complexes. At the time shown, the upper replication complex has already collided with all transcription complexes that were on the upper unit when it was near the end of this unit. It will now collide with the transcription complexes that started meanwhile. Consequently, if solving a collision takes longer than the time between two transcription initiations, the replication could be blocked. By contrast, the other replication complex is at the start of the lower unit and will collide only with its first transcription complexes, as the last ones will reach the end of the unit before being caught up by the replication complex. “Start”, transcription initiation site; “Stop”, transcription termination site; dark triangle, transcription complex that will be dislodged by the replication complex; light triangle, transcription complex that will not be dislodged by the replication complex. (b) The circular chromosome of bacteria is generally replicated bi-directionally from an origin ‘o’ to a terminus which, roughly, is diametrically opposed. The duration of one replication round from start to finish is rather constant for medium to fast growth rates. For instance, it is close to 40 min for E. coli at 37 °C. Under conditions of medium growth (top), the replicative period of 40 min is therefore shorter than the generation time, e.g., 60 min. No re-initiation of replication happens before the termination of the preceding round of replication. Thus, replication is followed by a period devoid of DNA synthesis. Under conditions of rapid growth (bottom), the replicative period is longer than the generation time, e.g., 20 min. However, replication re-initiations must take place at the pace of the generation time. Consequently, two or three levels of replication forks may coexist, following the successive initiation rounds. (c) Example of codon frequencies for the six synonymous codons encoding the amino acid Leucine. From these frequencies, a codon weight is simply calculated as the ratio of its frequency over that of the favorite codon, here UUG. (d) Chromosomal proximity and periodical spacing of co-functional genes. Co-functional genes tend to be proximal along the bacterial chromosome, with a limit of about 20 genes. Beyond this limit, they often display a periodical spacing. This periodicity is interpreted as a means to cluster co-functional genes in space through the appropriate folding of chromosomes, here symbolized by a solenoid.

2. Control of collisions between replication and transcription complexes

The fundamental processes of replication and transcription take place on the same physical chromosome. In Escherichia coli, the replication forks progress along the two halves of the circular chromosome, from the oriC locus [10,11] to the ter region [12], thus defining two “replichores”. These specific loci have been well characterized and are generally predictable with a bioinformatics approach [13]. A similar organization is observed in Bacillus subtilis [14,15], although the two replichores are of unequal length. Replication of the DNA double helix occurs on each strand from 5′ to 3′. Thus, on a given replichore, one DNA strand, called the “leading strand”, is replicated co-directionally with the replication fork. The other strand is called the “lagging strand”.
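The replichore and strand definitions above lend themselves to a small classifier. The following Python sketch is an illustration only, not a tool from this essay. It assumes a circular chromosome with integer coordinates, known ori and ter positions, and a gene strand encoded as +1 (transcribed toward increasing coordinates) or -1. All function names and the toy gene list are hypothetical.

```python
def on_right_replichore(pos, ori, ter, genome_len):
    """True if pos lies on the arc running from ori to ter through increasing coordinates."""
    return (pos - ori) % genome_len < (ter - ori) % genome_len


def is_codirectional(pos, strand, ori, ter, genome_len):
    """True if a gene is transcribed in the direction of fork travel.

    strand: +1 if transcribed toward increasing coordinates, -1 otherwise.
    """
    # On the right replichore the fork moves toward increasing coordinates,
    # on the left replichore toward decreasing ones.
    fork_direction = 1 if on_right_replichore(pos, ori, ter, genome_len) else -1
    return strand == fork_direction


def codirectional_fraction(genes, ori, ter, genome_len):
    """Fraction of (pos, strand) genes that are co-oriented with replication."""
    hits = sum(is_codirectional(p, s, ori, ter, genome_len) for p, s in genes)
    return hits / len(genes)


# Toy example with hypothetical coordinates (not a real genome).
toy_genes = [(100, 1), (200, 1), (300, -1), (900, -1)]
print(codirectional_fraction(toy_genes, ori=0, ter=500, genome_len=1000))
```

Applied to an annotated genome, such a fraction would be the quantity that a design tool could track when checking the co-orientation of highly expressed or essential genes.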
Transcription occurs on shorter stretches, on average in the 1 kbp range. It may progress along the leading strand (co-orientation with the replication fork), or the lagging strand (head-on). In E. coli, transcription speed is 42 bp/s [16] and replication speed is 600–1000 bp/s [17], so the ratio of replication to transcription speeds is estimated to be between 14 and 24. As a consequence, the replication complex catches up with or runs into the much slower transcription complex, and such collisions have been observed in different organisms (Fig. 1a). Collisions have been proposed to negatively affect cellular fitness by interrupting the expression of highly-transcribed genes [18], or by leading to the formation of incomplete transcripts, which subsequently results in toxic truncated polypeptides [19]. The most deleterious effects, however, are those on chromosome replication. For genome design, the important fact is that the arrest of replication forks due to collisions with transcription complexes leads to genomic instability and cell death [5]. Mechanisms that promote the progression of replication forks past transcription complexes are therefore essential for propagation and preservation of the genome. Numerous factors are involved in recovering from collisions and minimizing their molecular consequences [20,21]. The genes encoding these factors must therefore be present in any chassis genome.

Besides this recovery mechanism, a second type of mechanism aims at preventing collisions by an adapted genome layout. The functional constraints are as follows. In fast-growing cells, both replication of the replichores and transcription of the highly-expressed genes are initiated more frequently, further elevating this potential conflict [22–24]. The outcome of their encounter should depend strongly on their relative direction. RNA polymerase is dislodged by replication in either direction [25,26]. On the other hand, replication is affected mostly by head-on transcription (on the lagging strand) [26–30].
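The speed figures quoted above, and the geometric argument of Fig. 1a, can be checked with a few lines of arithmetic. The Python sketch below is a simplified kinematic illustration, not a model taken from this essay. It assumes constant speeds, a steady-state polymerase density equal to the initiation rate divided by the transcription speed, and no slowing of the fork by the encounters themselves. The function names and the initiation rate of 0.1 per second are hypothetical.

```python
TRANSCRIPTION_SPEED = 42.0            # bp/s, value quoted from [16]
REPLICATION_SPEEDS = (600.0, 1000.0)  # bp/s, range quoted from [17]


def speed_ratio(v_rep, v_tx=TRANSCRIPTION_SPEED):
    """Ratio of replication speed to transcription speed."""
    return v_rep / v_tx


def expected_encounters(unit_len_bp, init_rate_per_s, v_rep, v_tx, head_on):
    """Expected number of polymerases met by one fork crossing one transcription unit.

    Assumption: polymerases are spaced at a steady-state density of
    init_rate_per_s / v_tx per bp and keep moving at v_tx.
    """
    density = init_rate_per_s / v_tx
    # Head-on: fork and polymerase approach each other; codirectional: fork catches up.
    relative_speed = v_rep + v_tx if head_on else v_rep - v_tx
    transit_time = unit_len_bp / v_rep
    return density * relative_speed * transit_time


for v_rep in REPLICATION_SPEEDS:
    print(v_rep, round(speed_ratio(v_rep), 1))

# Hypothetical 1 kbp unit, one initiation every 10 s.
head_on = expected_encounters(1000, 0.1, 600.0, TRANSCRIPTION_SPEED, head_on=True)
codirectional = expected_encounters(1000, 0.1, 600.0, TRANSCRIPTION_SPEED, head_on=False)
print(head_on, codirectional)
```

Under these assumptions the encounter count grows linearly with unit length, and the gap between the head-on and codirectional counts shrinks as the replication/transcription speed ratio grows, which is in line with the trends reported for natural chromosomes.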
The following observations on natural chromosomes are direct consequences of these functional constraints and should be turned into principles to design the layout of a genetically stabilized chassis genome. Firstly, highly-expressed genes are preferentially transcribed co-directionally with replication (on the leading strand) across numerous species [19,28]. For instance, in B. subtilis and E. coli, all rRNA operons are transcribed on the leading strand [31–33]. Secondly, essential genes are enriched to a greater extent than non-essential genes on the leading strand [19]. Thirdly, there is a global bias for co-directionality of replication and transcription. In B. subtilis and E. coli, this bias is 75% and 55% of all genes, respectively [32,33]. Finally, it is noteworthy that collision probability is proportional to the length of transcription units (Fig. 1a). Consequently, longer transcription units are enriched on the leading strand. This enrichment decreases as the replication/transcription speed ratio increases [34].

3. Replication-associated dosage of genes involved in translation and transcription

Replication-associated gene dosage is an important determinant of chromosome organization and dynamics, especially among fast-growing bacteria. The bidirectional replication of bacterial chromosomes leads to transient gene dosage effects. Indeed, DNA replication must be initiated once per cell cycle, yet a replication round may take longer than one generation time, particularly at high growth rates (Fig. 1b). It appears that dozens of replication forks may be simultaneously present in the cells of certain species. Gene dosage effects strongly constrain the position of genes involved in translation and transcription, but not the position of other highly expressed genes [35].
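The size of the dosage gradient can be estimated with a relation that is standard in the bacterial cell-cycle literature but is not spelled out in this essay: in steady-state exponential growth with a constant fork speed, a locus at fractional distance m from the origin (m = 0 at the origin, m = 1 at the terminus) is present, on average, at 2^(C(1-m)/τ) copies relative to the terminus, where C is the duration of a replication round and τ the generation time. The Python sketch below applies it to the E. coli figures of Fig. 1b. The function name is hypothetical.

```python
def relative_dosage(m, c_period_min, generation_time_min):
    """Average copy number of a locus relative to the terminus.

    m: fractional distance from the origin (0.0) to the terminus (1.0).
    Assumes steady-state exponential growth and constant fork speed.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie between 0 (origin) and 1 (terminus)")
    return 2.0 ** (c_period_min * (1.0 - m) / generation_time_min)


C_PERIOD = 40.0  # min, replication round for E. coli at 37 °C (Fig. 1b)

for tau in (60.0, 20.0):  # medium and rapid growth, as in Fig. 1b
    ratio = relative_dosage(0.0, C_PERIOD, tau)
    print(f"generation time {tau:.0f} min: origin/terminus dosage ratio {ratio:.2f}")
```

With these numbers the origin-proximal advantage is modest at a 60 min generation time and reaches a factor of four at 20 min, which illustrates why the effect matters most for fast growers.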
Among the genes involved in translation and transcription, proximity to the origin of replication is strongest for RNA polymerase, then rRNA, then ribosomal proteins and the most abundant tRNAs. In eubacteria bearing multiple chromosomes, highly expressed genes are preferentially found in the largest chromosome, a feature that maximizes the gene dosage effects because this chromosome takes longest to replicate [36]. Biotechnology most often makes use of fast-growing bacteria, precisely those whose genome layout has been shown to most strongly exhibit replication-associated gene dosage biases. The principles discussed above must therefore be integrated into any design study of the genome layout of fast-growing bacteria. Leaving them out would presumably result in imbalances in the translation and transcription machineries, with dire consequences on growth rate and bioproduction capacity.

4. Codon and strand compositional biases

The genetic code is degenerate. Synonymous codons are codons that match the same amino acid. Synonymous codons are seldom used with equal frequencies [37,38]. Both natural selection and mutational biases contribute to changes in codon usage bias within a genome. Mutational bias is caused by the bias in the errors made by the DNA replication and repair machineries. It may cause a preference for codons with G or C at the third nucleotide position, thus altering the GC-content [39], or it may enrich the leading strand in G + C compared to the lagging strand [39]. Codon usage bias may also arise in some specific locations as a consequence of horizontal transfer of genes with different base content [40]. Selective forces also shape codon usage bias. In particular, “translational bias” plays a major role in fast-growing prokaryotes and eukaryotes. From the genome sequence and its predicted protein-coding regions, this bias can easily be quantified [41], even in the absence of prior biological knowledge about the organism under study [42] (Fig. 1c).
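The codon weight described in the legend of Fig. 1c (the frequency of a codon divided by that of the favorite synonymous codon) is straightforward to compute. The Python sketch below is a minimal illustration restricted to the six leucine codons. The function names and the toy RNA sequence are hypothetical, and real analyses would count codons over many annotated coding sequences rather than one short string.

```python
from collections import Counter

LEUCINE_CODONS = ("UUA", "UUG", "CUU", "CUC", "CUA", "CUG")


def count_codons(rna):
    """Count in-frame codons of an RNA coding sequence (trailing bases are ignored)."""
    usable = len(rna) - len(rna) % 3
    return Counter(rna[i:i + 3] for i in range(0, usable, 3))


def codon_weights(codon_counts, synonymous_codons):
    """Weight of each synonymous codon: its count over that of the most used one."""
    counts = {codon: codon_counts.get(codon, 0) for codon in synonymous_codons}
    top = max(counts.values())
    if top == 0:
        raise ValueError("none of the synonymous codons occurs in the input")
    return {codon: n / top for codon, n in counts.items()}


# Toy sequence (hypothetical): UUG is the favorite leucine codon here.
toy_rna = "UUG" * 5 + "CUG" * 2 + "UUA" + "AAA" * 3
weights = codon_weights(count_codons(toy_rna), LEUCINE_CODONS)
for codon in LEUCINE_CODONS:
    print(codon, weights[codon])
```

Because the weights are ratios of counts within one synonymous family, dividing by the family total first would give the same result.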
Three main facts support the idea of translational bias: highly expressed genes tend to use only a limited number of codons and thus to display a high codon bias [37,41]; preferred codons and iso-acceptor tRNA content exhibit a strong positive correlation [43]; and tRNA iso-acceptor pools affect the rate of polypeptide chain elongation [44]. Thus, a non-optimal codon will cause the ribosome to pause while waiting for the correct aminoacyl-tRNA to reach its site [45]. The resulting slow-down in ribosome movement is expected to be inversely proportional to the aminoacyl-tRNA concentration [44]. As this essay is about genome-scale design, translational bias at the gene and at the codon levels will not be discussed further.

These biases alter strand composition and codon choice [46,47]. In turn, both codon and mutational biases shape the ability of the organism to exchange genetic material with other species by homologous recombination. Hence, these compositional biases affect the capacity and species profile for horizontal gene transfer and they have consequences on the process of speciation. They may thus be considered as ways to reduce the probability of horizontal gene transfer, e.g., for strains that are planned to be released in the environment for bio-remediation. As similar physiology or habitat appears to result in comparable codon biases [48], the choice of preferred codons could be elaborated to decrease the probability of horizontal gene transfer by avoiding the dominant bias in the targeted habitat. Keeping in mind that biotechnological strains are often fast growers, the situation is different for translational bias because its impact is not only on recombinational pattern, but also on protein biosynthesis. It is generally believed that codon bias is maintained by a balance between selection, mutation, and genetic drift. The actual codon bias for any given organism is a rather arbitrary feature.
While there is a strong requirement for self-consistency within a genome, there is no known constraint on picking one bias rather than another one. If there were any lesson to draw from this knowledge in designing a full genome ab initio, it would be to respect self-consistency of translational codon bias throughout the genome. The choice of the preferred codons could in principle be arbitrary but should be matched by the cognate tRNA concentrations, which can be tuned by adjusting the number of tRNA gene copies, and their transcription and degradation rates [49]. The absence of self-consistency would likely result in mild to medium effects on cell physiology, due to stoichiometric imbalances among proteins involved in common complexes or pathways, as well as among complementary pathways.

5. Control of noise in gene expression at the molecular scale

In bacterial cells, the number of copies of many key proteins is very small. For instance, many transcription factors are present at 10–100 copies per cell. It is therefore expected that the events they contribute to should fluctuate strongly in time and space. It has been argued that noise may have positive effects, e.g. to allow a population to cope with an uncertain future by not having all individuals follow the same route [50,51]. It may also have negative effects that various mechanisms may attempt to control [52]. For physiological reasons, expression of some genes must be rather constant, despite the stochastic nature of most molecular events that determine the rate of transcriptional initiation, such as binding of transcription factors to DNA. Interestingly, it was observed in yeast that the degree of noise in the expression levels is highly variable from gene to gene [53]. This suggests that noise for individual genes might be under adaptive pressure [54]; reviewed in [55]. Similar phenomena are likely to be at play in bacteria.
One possible avenue to control noise would be to implement synthetic circuits at genome scale that comprise appropriate loops [56]. In particular, with proper parameters, negative feedback loops have stabilizing properties, while feedforward loops with time delays can filter out short-term fluctuations [57]. In natural regulatory circuits, such loops are statistically over-represented, suggesting that indeed, noise control is an important and general feature encoded in natural genomes [58,59]. Moreover, it appears that yeast genes with comparable noise levels tend to cluster together [60]. This fact, if it turns out to also apply to bacteria, would constitute a further constraint on their genome layout, even though it is currently impossible to predict the extent of its physiological consequences. Arguably, the strategies of chromosomal proximity and periodicity for co-functional genes, observed both in eukaryotes and in prokaryotes, also constitute ways to reduce noise at transcriptional initiation [61]. Because the logic and consequences of these strategies go beyond the issue of noise control, they are discussed separately below.

6. Interplay between genome layout, chromosome conformation and genome function

Spatial conformations of bacterial chromosomes have been known for a long time to affect gene regulation; reviewed in [62]. From a structural point of view, bacterial chromosomes are organized into a compact nucleoid comprising DNA and proteins. They are often negatively supercoiled, which causes the double helix to adopt a branched and plectonemic or interwound structure. Supercoiling level is determined by a balance between opposing effects carried out by topoisomerases and ATP-dependent DNA gyrase [63,64]. The whole process is additionally cross-regulated by factors known as nucleoid-associated proteins [65].
Because supercoiled DNA strands rotate quickly [66], chromosomes are actually partitioned into small domains to prevent the spreading of supercoiling loss. These supercoiling domains are further reorganized by transcription [67], leading to a highly integrated and regulated system. Interestingly, some physiological transitions induced by external stimuli are accompanied by changes in DNA supercoiling [63]; reviewed in [64]. Along the same line, it has been proposed that supercoiling may play a major role in control of gene expression in small genomes that do not seem to encode protein transcription factors [68]. Finally, genes encoding the above-mentioned players of this integrated system appear along the replichores at ordered positions that largely correspond to their temporal expression patterns during growth [69]. Even though this peculiar genome layout has not been proven to be essential for genome function, this recent observation suggests ordering the supercoiling- and nucleoid-associated genes accordingly.

More generally, genome layout, chromosome conformation and genome function/expression are expected to be interdependent. Evidence for non-random genome positioning of co-functional genes has accumulated since 2003. Here, gene co-functionality may refer to transcriptional co-regulation or co-expression, complex formation by the products, participation in a common pathway, or a combination of two or three of the above. Investigation of the genomic organization of co-functional groups has revealed regularities of two types, either chromosomal proximity or long-range periodical spacing along the chromosomes (Fig. 1d). On the one hand, chromosomal proximity has been found to be effectively limited to 20 successive genes in enterobacteria [70]. On the other hand, periodic positioning has been observed for microbial genes that are either co-regulated [71–73], co-expressed [74], evolutionarily correlated [75,70], or highly codon-biased [76].
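To give a feel for what periodic positioning means operationally, the Python sketch below scores how well a set of gene positions shares a common phase for a candidate period. This is a generic phase-coherence score chosen here for illustration, and it is not the statistical method used in the studies cited above. Each position is mapped onto a circle of circumference equal to the candidate period, and the length of the mean unit vector is returned: 1 means that all genes sit at the same phase, values near 0 mean no common phase. The function names and toy positions are hypothetical.

```python
import cmath
import math


def phase_coherence(positions, period):
    """Length of the mean unit vector of the positions' phases for a given period."""
    vectors = [cmath.exp(2j * math.pi * (x % period) / period) for x in positions]
    return abs(sum(vectors) / len(vectors))


def best_period(positions, candidate_periods):
    """Candidate period with the highest phase coherence."""
    return max(candidate_periods, key=lambda p: phase_coherence(positions, p))


# Toy positions (hypothetical): eight genes spaced every 100 kbp.
toy_positions = [5_000 + k * 100_000 for k in range(8)]
candidates = [70_000, 100_000, 130_000]
for period in candidates:
    print(period, round(phase_coherence(toy_positions, period), 3))
print(best_period(toy_positions, candidates))
```

A caveat of this simple score is that any exact divisor of the true period scores equally well, and that real data would require a significance assessment against randomized gene positions, so the candidate list and the interpretation both need care.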
Képès has postulated the existence of a positive feedback loop that connects periodic genome layout, through the cellular conformation of chromosomes, to transcriptional control [71]. This has been demonstrated using a thermodynamic model of chromosome folding where transcription factors cross-link distant binding sites [77]. In this chromosome model, a periodic gene positioning has been shown to be crucial for achieving chromosome conformations that favor gene spatial clustering. In the case of transcriptional co-regulation, as demonstrated for the lac operon, this increased local concentration of transcription factors and of their cognate binding sites leads to strongly enhanced [78,61] and more robust [61,79] transcriptional control. More generally, in bacteria, proteins tend to be made close to their encoding genes [80] as a consequence of the kinetic coupling between transcription and translation. Therefore, it is expected that elevated concentrations of the co-functional products will build up locally [70]. This should result in reduced diffusion times and enhanced rate kinetics [81].

A further layer of complexity arises in the case of membrane or secreted proteins. In bacteria, the amino-terminal hydrophobic signal sequence of these proteins gets inserted in the cell membrane while translation is underway, and often while transcription is not finished: this coupled transcription/translation/insertion process has been named “transertion” [82]. While it is intuitively clear that the numerous genes encoding secreted or membrane proteins should be organized to allow efficient secretion/insertion, transertion is not understood sufficiently to elaborate any useful guideline for genome design.

The above features are likely to take an increasingly important role in the design of new genome layouts, even though the extent of such effects on cell physiology still awaits an evaluation.
On the side of chromosomal proximity, the main lesson is to limit it to less than 20 successive genes. A more hypothetical guideline would be to co-orient bacterial genes that encode products that must co-fold into a single complex, given that folding of large proteins often starts before their mRNA synthesis ends. On the side of periodic positioning, it is important to evaluate the situation for the genome at hand [83] in order to introduce synthetic genes at appropriate sites that favor their spatial clustering, and consequently improve their co-functioning.

7. Conclusion

This essay has gathered a number of known genome-scale observations on natural eubacterial chromosomes. An important issue is whether each of these observations constitutes a constraint that is essential to the success of synthetic genomics, which is the thesis defended in this essay. Alternatively, these observations could merely suggest incremental improvements that might impart minor fitness advantages. This issue has been briefly evaluated in each of the above sections. We wish to provide here a broader perspective. The alternative view of a genome as a bag of genes has been steadily losing ground since bioinformatics made it possible to decipher partial genome sequences in the 1980s. Yet, in many published molecular genetics experiments, functional complementation seems to depend little on gene location. However, the vast majority of such experiments done on chromosomes rather than on plasmids have made use of strong gene promoters that are expected to stand alone. Furthermore, negative or highly surprising results tend not to be reported, with interesting exceptions when the natural promoter of the displaced gene is kept (e.g., [84]). It is also noteworthy that significant chromosomal inversions are generally deleterious. In sum, this important issue cannot receive a univocal answer at present.
Full genome design and construction should actually prove an effective route to clarify these fundamental issues. In a sense, this essay is work in progress, as new major constraints on natural chromosomes are likely to emerge in the coming years which will have an impact on the design of new-to-nature genomes. While these constraints are described here in a natural language, they should be expressed in a more formal way to automate design. Among other formalisms, we trust that linguistic models could adequately be used, as they have already been proposed both to describe [85] and to design genes [86,87]. Investigations on genome design beautifully illustrate how synthetic biology can simultaneously contribute to fundamental insights and to biotechnological advances. Sufficient prior knowledge, summarized here, has been accumulated so far to warrant partial success in designing and constructing new genomes. In turn, such pioneering attempts will be invaluable to improve our fundamental understanding of Author's personal copy F. Képès et al. / FEBS Letters 586 (2012) 2043–2048 the layout of bacterial genomes. Moreover, these and previous insights will open the way to a new generation of more robust and efficient biotechnological strains. Acknowledgments This work was supported by the Sixth European Research Framework (GENNETEC project number 034952), by the Seventh European Research Framework (ST-FLOW project number 289326), and by CNRS, CRIF and GenopoleÒ. I.J. is supported by a Novartis grant (CRG). References [1] Rebollo, J.E., Francois, V. and Louarn, J.M. (1988) Detection and possible role of two large nondivisible zones on the Escherichia coli chromosome. Proc. Natl. Acad. Sci. U S A 85, 9391–9395. [2] Hill, C.W. and Gray, J.A. (1998) Effects of chromosomal inversion on cell fitness in Escherichia coli K-12. Genetics 119, 771–778. [3] Campo, N., Dias, M.J., Daveran-Mingot, M.-L., Ritzenthaler, P. and Le Bourgeois, P. 
(2004) Chromosomal constraints in Gram-positive bacteria revealed by artificial inversions. Mol. Microbiol. 51, 511–522. [4] Boccard, F., Esnault, E. and Valens, M. (2005) Spatial arrangement and macrodomain organization of bacterial chromosomes. Mol. Microbiol. 57, 9–16. [5] Srivatsan, A., Tehranchi, A., MacAlpine, D.M. and Wang, J.D. (2010) Coorientation of replication and transcription preserves genome integrity. PLoS Genet. 6 (1), e1000810. [6] Pósfai, G., Plunkett 3rd, G., Fehér, T., Frisch, D., Keil, G.M., Umenhoffer, K., Kolisnychenko, V., Stahl, B., Sharma, S.S., de Arruda, M., Burland, V., Harcum, S.W. and Blattner, F.R. (2006) Emergent properties of reduced-genome Escherichia coli. Science 312, 1044–1046. [7] Gibson, D.G., Glass, J.I., Lartigue, C., Noskov, V.N., Chuang, R.Y., et al. (2010) Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329, 52–56. [8] Porcar, M., Danchin, A., de Lorenzo, V., Dos Santos, V.A., Krasnogor, N., Rasmussen, S. and Moya, A. (2011) The ten grand challenges of synthetic life. Syst. Synth. Biol. 5, 1–9. [9] Lux, M.W., Bramlett, B.W., Ball, D.A. and Peccoud, J. (2012) Genetic design automation: engineering fantasy or scientific renewal? Trends Biotechnol. 30, 120–126. [10] Hirose, S., Hiraga, S. and Okazaki, T. (1983) Initiation site of deoxyribonucleotide polymerization at the replication origin of the Escherichia coli chromosome. Mol. Gen. Genet. 189, 422–431. [11] Campbell, J.L. and Kleckner, N. (1990) E. coli oriC and the dnaA gene promoter are sequestered from dam methyltransferase following the passage of the chromosomal replication fork. Cell 62, 967–979. [12] Duggin, I.G. and Bell, S.D. (2009) Termination structures in the Escherichia coli chromosome replication fork trap. J. Mol. Biol. 387, 532–539. [13] Sernova, N.V. and Gelfand, M.S. (2008) Identification of replication origins in prokaryotic genomes. Brief. Bioinform. 9, 376–391. [14] Lemon, K.P., Moriya, S., Ogasawara, N. 
and Grossman, A.D. (2002) Chromosome replication and segregation in: Bacillus subtilis and its Closest Relatives: From Genes to Cells (Sonenshein, A., Hoch, J. and Losick, R., Eds.), pp. 73–86, ASM Press, WA.
[15] Duggin, I.G. and Wake, R.G. (2002) Termination of chromosome replication in: Bacillus subtilis and its Closest Relatives: From Genes to Cells (Sonenshein, A., Hoch, J. and Losick, R., Eds.), pp. 87–95, ASM Press, WA.
[16] Gotta, S., Miller Jr., O. and French, S. (1991) rRNA transcription rate in Escherichia coli. J. Bacteriol. 173, 6647–6649.
[17] Mok, M. and Marians, K.J. (1987) The Escherichia coli preprimosome and DNA B helicase can form replication forks that move at the same rate. J. Biol. Chem. 262, 16644–16654.
[18] Price, M.N., Alm, E.J. and Arkin, A.P. (2005) Interruptions in gene expression drive highly expressed operons to the leading strand of DNA replication. Nucleic Acids Res. 33, 3224–3234.
[19] Rocha, E.P. and Danchin, A. (2003) Essentiality, not expressiveness, drives gene-strand bias in bacteria. Nat. Genet. 34, 377–378.
[20] Trautinger, B.W., Jaktaji, R.P., Rusakova, E. and Lloyd, R.G. (2005) RNA polymerase modulators and DNA repair activities resolve conflicts between DNA replication and transcription. Mol. Cell 19, 247–258.
[21] Boubakri, H., de Septenville, A.L., Viguera, E. and Michel, B. (2010) The helicases DinG, Rep and UvrD cooperate to promote replication across transcription units in vivo. EMBO J. 29, 145–157.
[22] Miura, A., Krueger, J.H., Itoh, S., de Boer, H.A. and Nomura, M. (1981) Growth-rate-dependent regulation of ribosome synthesis in E. coli: expression of the lacZ and galK genes fused to ribosomal promoters. Cell 25, 773–782.
[23] Bremer, H. and Dennis, P. (1996) Modulation of chemical composition and other parameters of the cell by growth rate in: Escherichia coli and Salmonella: Cellular and Molecular Biology (Neidhardt, F.C., Curtiss, R.I., Ingraham, J.L., Lin, E.C.C. and Low, K.B., et al., Eds.), pp.
1553–1569, ASM Press, WA.
[24] Cooper, S. and Helmstetter, C.E. (1968) Chromosome replication and the division cycle of Escherichia coli B/r. J. Mol. Biol. 31, 519–540.
[25] French, S. (1992) Consequences of replication fork movement through transcription units in vivo. Science 258, 1362–1365.
[26] Pomerantz, R.T. and O’Donnell, M. (2008) The replisome uses mRNA as a primer after colliding with RNA polymerase. Nature 456, 762–766.
[27] Yao, N.Y. and O’Donnell, M. (2009) Replisome structure and conformational dynamics underlie fork progression past obstacles. Curr. Opin. Cell Biol. 21, 336–343.
[28] Rocha, E.P. (2008) The organization of the bacterial genome. Annu. Rev. Genet. 42, 211–233.
[29] Liu, B. and Alberts, B.M. (1995) Head-on collision between a DNA replication apparatus and RNA polymerase transcription complex. Science 267, 1131–1137.
[30] Elias-Arnanz, M. and Salas, M. (1999) Resolution of head-on collisions between the transcription machinery and bacteriophage phi29 DNA polymerase is dependent on RNA polymerase translocation. EMBO J. 18, 5675–5682.
[31] Guy, L. and Roten, C.A. (2004) Genometric analyses of the organization of circular chromosomes: a universal pressure determines the direction of ribosomal RNA genes transcription relative to chromosome replication. Gene 340, 45–52.
[32] Ellwood, M. and Nomura, M. (1982) Chromosomal locations of the genes for rRNA in Escherichia coli K-12. J. Bacteriol. 149, 458–468.
[33] Zeigler, D.R. and Dean, D.H. (1990) Orientation of genes in the Bacillus subtilis chromosome. Genetics 125, 703–708.
[34] Omont, N. and Képès, F. (2004) Transcription/replication collisions cause bacterial transcription units to be longer on the leading strand of replication. Bioinformatics 20, 2719–2725.
[35] Abby, S. and Daubin, V. (2007) Comparative genomics and the evolution of prokaryotes. Trends Microbiol. 15, 135–141.
[36] Couturier, E. and Rocha, E.P.
(2006) Replication-associated gene dosage effects shape the genomes of fast-growing bacteria but only for transcription and translation genes. Mol. Microbiol. 59, 1506–1518.
[37] Grantham, R., Gautier, C., Gouy, M., Mercier, R. and Pavé, A. (1980) Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 8, r49–r62.
[38] Wada, K.S., Aota, R., Tsuchiya, F., Ishibashi, T., Gojobori, T. and Ikemura, T. (1990) Codon usage tabulated from GenBank genetic sequence data. Nucleic Acids Res. 18 (Suppl), 2367–2411.
[39] Lafay, B., Lloyd, A.T., McLean, M.J., Devine, K.M., Sharp, P.M. and Wolfe, K.H. (1999) Proteome composition and codon usage in spirochaetes: species-specific and DNA strand-specific mutational biases. Nucleic Acids Res. 27, 1642–1649.
[40] Moszer, I., Rocha, E.P. and Danchin, A. (1999) Codon usage and lateral gene transfer in Bacillus subtilis. Curr. Opin. Microbiol. 2, 524–528.
[41] Sharp, P.M. and Li, W.-H. (1987) The codon adaptation index – a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281–1295.
[42] Carbone, A., Zinovyev, A. and Képès, F. (2003) Codon Adaptation Index as a measure of dominating codon bias. Bioinformatics 19, 2005–2015.
[43] Ikemura, T. (1985) Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2, 13–34.
[44] Varenne, S., Buc, J., Lloubès, R. and Lazdunski, C. (1984) Translation is a nonuniform process. Effect of tRNA availability on the rate of elongation of nascent polypeptide chains. J. Mol. Biol. 180, 549–576.
[45] Talkad, V., Schneider, E. and Kennell, D. (1976) Evidence for variable rates of ribosome movement in Escherichia coli. J. Mol. Biol. 104, 299–303.
[46] Palidwor, G.A., Perkins, T.J. and Xia, X. (2010) A general model of codon bias due to GC mutational bias. PLoS ONE 5 (10), e13431.
[47] Hershberg, R. and Petrov, D.A. (2008) Selection on codon bias. Annu. Rev. Genet. 42, 287–299.
[48] Carbone, A., Képès, F.
and Zinovyev, A. (2004) Codon bias signatures, organisation of microorganisms in codon space and lifestyle. Mol. Biol. Evol. 22, 547–561.
[49] Dittmar, K.A., Mobley, E.M., Radek, A.J. and Pan, T. (2004) Exploring the regulation of tRNA distribution on the genomic scale. J. Mol. Biol. 337, 31–47.
[50] Arkin, A., Ross, J. and McAdams, H.H. (1998) Stochastic kinetic analysis of developmental pathway bifurcation in phage lambda-infected Escherichia coli cells. Genetics 149, 1633–1648.
[51] Balaban, N.Q., Merrin, J., Chait, R., Kowalik, L. and Leibler, S. (2004) Bacterial persistence as a phenotypic switch. Science 305, 1622–1625.
[52] Raser, J.M. and O’Shea, E.K. (2005) Noise in gene expression: origins, consequences, and control. Science 309, 2010–2013.
[53] Newman, J.R., Ghaemmaghami, S., Ihmels, J., Breslow, D.K., Noble, M., DeRisi, J.L. and Weissman, J.S. (2006) Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441, 840–846.
[54] Wang, Z. and Zhang, J. (2011) Impact of gene expression noise on organismal fitness and the efficacy of natural selection. Proc. Natl. Acad. Sci. U S A 108, E67–E76.
[55] Warnecke, T. and Hurst, L.D. (2011) Error prevention and mitigation as forces in the evolution of genes and genomes. Nat. Rev. Genet. 12, 875–881.
[56] Thomas, R. and D’Ari, R. (1990) Biological feedback, CRC Press, Boca Raton, FL.
[57] Alon, U. (2007) Network motifs: theory and experimental approaches. Nat. Rev. Genet. 8, 450–461.
[58] Shen-Orr, S.S., Milo, R., Mangan, S. and Alon, U. (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet. 31, 64–68.
[59] Guelzim, N., Bottani, S., Bourgine, P. and Képès, F. (2002) Topological and causal structure of the yeast transcriptional regulatory network. Nat. Genet. 31, 60–63.
[60] Batada, N.N. and Hurst, L.D.
(2007) Evolution of chromosome organization driven by selection for reduced gene expression noise. Nat. Genet. 39, 945–949.
[61] Vilar, J.M.G. and Leibler, S. (2003) DNA looping and physical constraints on transcription regulation. J. Mol. Biol. 331, 981–989.
[62] Thanbichler, M., Viollier, P.H. and Shapiro, L. (2005) The structure and function of the bacterial chromosome. Curr. Opin. Genet. Dev. 15, 153–162.
[63] Balke, V.L. and Gralla, J.D. (1987) Changes in the linking number of supercoiled DNA accompany growth transitions in Escherichia coli. J. Bacteriol. 169, 4499–4506.
[64] Travers, A. and Muskhelishvili, G. (2005) DNA supercoiling – a global transcriptional regulator for enterobacterial growth? Nat. Rev. Microbiol. 3, 157–169.
[65] Dame, R.T. (2005) The role of nucleoid-associated proteins in the organization and compaction of bacterial chromatin. Mol. Microbiol. 56, 858–870.
[66] Oram, M., Shipstone, E. and Halford, S.E. (1994) Synapsis by Tn3 resolvase: speed and dependence on DNA supercoiling. Biochem. Soc. Trans. 22, 303.
[67] Deng, S., Stein, R.A. and Higgins, N.P. (2005) Organization of supercoil domains and their reorganization by transcription. Mol. Microbiol. 57, 1511–1521.
[68] Dorman, C.J. (2011) Regulation of transcription by DNA supercoiling in Mycoplasma genitalium: global control in the smallest known self-replicating genome. Mol. Microbiol. 81, 302–304.
[69] Sobetzko, P., Travers, A. and Muskhelishvili, G. (2012) Gene order and chromosome dynamics coordinate spatiotemporal gene expression during the bacterial growth cycle. Proc. Natl. Acad. Sci. U S A 109, E42–E50.
[70] Junier, I., Hérisson, J. and Képès, F. Genomic organization of evolutionarily correlated genes in bacteria: limits and strategies. J. Mol. Biol., in press (doi http://dx.doi.org/10.1016/j.jmb.2012.03.009).
[71] Képès, F. and Vaillant, C. (2003) Transcription-based solenoidal model of chromosomes. ComPlexUs 1, 171–180.
[72] Képès, F.
(2004) Periodic transcriptional organization of the E. coli genome. J. Mol. Biol. 340, 957–964.
[73] Képès, F. (2003) Periodic epi-organization of the yeast genome revealed by the distribution of promoter sites. J. Mol. Biol. 329, 859–865.
[74] Mercier, G., Berthault, N., Touleimat, N., Képès, F., Fourel, G., et al. (2005) A haploid-specific transcriptional response to irradiation in Saccharomyces cerevisiae. Nucleic Acids Res. 33, 6635–6643.
[75] Wright, M., Kharchenko, P., Church, G. and Segrè, D. (2007) Chromosomal periodicity of evolutionarily conserved gene pairs. Proc. Natl. Acad. Sci. U S A 104, 10559–10564.
[76] Mathelier, A. and Carbone, A. (2010) Chromosomal periodicity and positional networks of genes in Escherichia coli. Mol. Syst. Biol. 6, 366.
[77] Junier, I., Martin, O. and Képès, F. (2010) Spatial and topological organization of DNA chains induced by gene co-localization. PLoS Comput. Biol. 6, e1000678.
[78] Dröge, P. and Müller-Hill, B. (2001) High local protein concentrations at promoters: strategies in prokaryotic and eukaryotic cells. Bioessays 23, 179–183.
[79] Fraser, P. and Bickmore, W. (2007) Nuclear organization of the genome and the potential for gene regulation. Nature 447, 413–417.
[80] Miller Jr., O.L., Hamkalo, B.A. and Thomas Jr., C.A. (1970) Visualization of bacterial genes in action. Science 169, 392–395.
[81] Kuriyan, J. and Eisenberg, D. (2007) The origin of protein interactions and allostery in colocalization. Nature 450, 983–990.
[82] Norris, V. and Madsen, M.S. (1995) Autocatalytic gene expression occurs via transertion and membrane domain formation and underlies differentiation in bacteria: a model. J. Mol. Biol. 253, 739–748.
[83] Junier, I., Hérisson, J. and Képès, F. (2010) Efficient detection of periodic patterns within small datasets. Algorithms Mol. Biol. 5, 31.
[84] Martin, R.G. and Rosner, J.L.
(2002) Genomics of the marA/soxS/rob regulon of Escherichia coli: identification of directly activated promoters by application of molecular genetics and informatics to microarray data. Mol. Microbiol. 44, 1611–1624.
[85] Searls, D.B. (2002) The language of genes. Nature 420, 211–217.
[86] Cai, Y. et al. (2007) A syntactic model to design and verify synthetic genetic constructs derived from standard biological parts. Bioinformatics 23, 2760–2767.
[87] Cai, Y. et al. (2009) Modeling structure–function relationships in synthetic DNA sequences using attribute grammars. PLoS Comput. Biol. 5, e1000529.