Download FEBS Lett. 586, 2043-2048 - iSSB

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Gene desert wikipedia , lookup

Metagenomics wikipedia , lookup

Epigenetics in learning and memory wikipedia , lookup

Mitochondrial DNA wikipedia , lookup

Copy-number variation wikipedia , lookup

Short interspersed nuclear elements (SINEs) wikipedia , lookup

Extrachromosomal DNA wikipedia , lookup

Oncogenomics wikipedia , lookup

Polyploid wikipedia , lookup

Chromosome wikipedia , lookup

Biology and consumer behaviour wikipedia , lookup

Transcription factor wikipedia , lookup

Transposable element wikipedia , lookup

Long non-coding RNA wikipedia , lookup

Genetic engineering wikipedia , lookup

NEDD9 wikipedia , lookup

X-inactivation wikipedia , lookup

Ridge (biology) wikipedia , lookup

Nutriepigenomics wikipedia , lookup

Polycomb Group Proteins and Cancer wikipedia , lookup

Public health genomics wikipedia , lookup

Point mutation wikipedia , lookup

Gene expression programming wikipedia , lookup

Replisome wikipedia , lookup

No-SCAR (Scarless Cas9 Assisted Recombineering) Genome Editing wikipedia , lookup

Genomic imprinting wikipedia , lookup

Vectors in gene therapy wikipedia , lookup

RNA-Seq wikipedia , lookup

Human Genome Project wikipedia , lookup

Human genome wikipedia , lookup

Gene expression profiling wikipedia , lookup

Genomics wikipedia , lookup

Pathogenomics wikipedia , lookup

Primary transcript wikipedia , lookup

Non-coding DNA wikipedia , lookup

Genomic library wikipedia , lookup

Epigenetics of human development wikipedia , lookup

Designer baby wikipedia , lookup

Genome (book) wikipedia , lookup

Gene wikipedia , lookup

History of genetic engineering wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Microevolution wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Genome editing wikipedia , lookup

Helitron (biology) wikipedia , lookup

Minimal genome wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Genome evolution wikipedia , lookup

Transcript
This article appeared in a journal published by Elsevier. The attached
copy is furnished to the author for internal non-commercial research
and education use, including for instruction at the authors institution
and sharing with colleagues.
Other uses, including reproduction and distribution, or selling or
licensing copies, or posting to personal, institutional or third party
websites are prohibited.
In most cases authors are permitted to post their version of the
article (e.g. in Word or Tex form) to their personal website or
institutional repository. Authors requiring further information
regarding Elsevier’s archiving and manuscript policies are
encouraged to visit:
http://www.elsevier.com/copyright
Author's personal copy
FEBS Letters 586 (2012) 2043–2048
journal homepage: www.FEBSLetters.org
Review
The layout of a bacterial genome
François Képès a,b,⇑, Brian C. Jester a, Thibaut Lepage a, Nafiseh Rafiei a, Bianca Rosu a, Ivan Junier a,c
a
Epigenomics Project, institute of Systems and Synthetic Biology, GenopoleÒ, CNRS, University of Evry, France
Institute of Systems and Synthetic Biology, Division of Molecular Biosciences, Imperial College, London, UK
c
Center for Genomic Regulation, Barcelona, Spain
b
a r t i c l e
i n f o
Article history:
Received 2 March 2012
Revised 25 March 2012
Accepted 26 March 2012
Available online 31 March 2012
Edited by Thomas Reiss and Wilhelm Just
Keywords:
Genome layout and design
Bacteria
Replication
Transcription
Codon bias
Co-functional gene
a b s t r a c t
Recently the mismatch between our newly acquired capacity to synthetize DNA at genome scale, and
our low capacity to design ab initio a functional genome has become conspicuous. This essay gathers
a variety of constraints that globally shape natural genomes, with a focus on eubacteria. These constraints originate from chromosome replication (leading/lagging strand asymmetry; gene dosage
gradient from origin to terminus; collisions with the transcription complexes), from biased codon
usage, from noise control in gene expression, and from genome layout for co-functional genes.
On the basis of this analysis, lessons are drawn for full genome design.
Ó 2012 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
1. Introduction
After 40 years of genetic engineering, human technology has
produced millions of genomes that differ, often in a fully-defined
way, from the original genomes of genetically tractable species,
mostly bacteria. These precise differences generally amount to
the alteration, insertion or deletion of one or a few genes. A few
cases have been reported of deletions or inversions of long genomic segments, carried out to ask specific scientific questions
[1–5]. More recently, under the remit of synthetic biology, we have
witnessed more systematic attempts at deleting a series of short
and long segments (e.g., [6]). Various goals were pursued, such as
to increase genome stability or to shorten genomes; on the application side, a major objective was to improve cell performance as
a host or chassis for heterologous genes expression. Another line
of research, called synthetic genomics, aims at chemically synthesizing full genomes to replace the original within live cells. This
approach has already been validated with the synthesis and
replacement of a 1 Mbp bacterial genome, even though its sequence was nearly identical to that of the original strain [7]. This
near identity symbolizes the mismatch between our newly acquired capacity to synthetize DNA at genome scale, and our low
capacity to design ab initio a functional genome. This discordance
⇑ Corresponding author at: Epigenomics Project, institute of Systems and
Synthetic Biology, GenopoleÒ, CNRS, University of Evry, France.
E-mail address: [email protected] (F. Képès).
was eloquently expressed by Porcar et al. in the following terms:
‘‘paradoxically, the success in synthesizing (copying) DNA also
highlights the very poor ability to de novo design (writing) genomes, which is certainly a consequence of the limited knowledge
we have of the inherent complexity of living forms.’’ [8]. Operationally, what we are missing is the whole set of the essential constraints that shape a functional genome. It is anticipated that for a
given set of constraints, myriads of solutions must exist, among
them the natural genomes. It is sometimes estimated that there exist about 108 species on the Earth, and strain to strain variability
within each species increases even further the number of viable
genomes that constitute the global hereditary complement of our
planet. Within this scope, comparative genomics has efficiently
delineated fundamental bacterial processes because it is now supported by a thousand sequenced genomes. Hence, several sets of
major constraints may be uncovered in the realm of bacteria, and
testing this hypothesis may pave the way to many fundamental
insights.
This essay highlights the known constraints that shape a functional bacterial genome. It aims at proposing strategies for full
design of a new-to-nature genome layout in the vein of synthetic
biology [9]. Constraints at the gene scale, such as transcriptional
and translational start and stop signals, or the making of an operon,
as important as they are, are out of the scope of this discussion,
which focuses on the genome scale. Genome layout comprises
the position, order and orientation of genes in the genome. These
features are highly non-random, reflecting in part functional
0014-5793/$36.00 Ó 2012 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.febslet.2012.03.051
Author's personal copy
2044
F. Képès et al. / FEBS Letters 586 (2012) 2043–2048
Fig. 1. Genome-scale properties of eubacterial chromosomes. (a) Head-on (upper unit) and codirectional (lower unit) collisions between replication (‘‘Rep’’) and transcription
(‘‘T’’) complexes. At the time shown, the upper replication complex has already collided with all transcription complexes that were on the upper unit when it was near the end
of this unit. It will now collide with the transcription complexes that started meanwhile. Consequently, if solving a collision takes longer than the time between two
transcription initiations, the replication could be blocked. By contrast, the other replication complex is at the start of the lower unit and will collide only with its first
transcription complexes, as the last ones will reach the end of the unit before being caught up by the replication complex. ‘‘Start’’, transcription initiation site; ‘‘Stop’’,
transcription termination site; dark triangle, transcription complex that will be dislodged by the replication complex; light triangle, transcription complex that will not be
dislodged by the replication complex. (b) The circular chromosome of bacteria is generally replicated bi-directionally from an origin ‘o’ to a terminus which, roughly, is
diametrically opposed. The duration of one replication round from start to finish is rather constant for medium to fast growth rates. For instance, it is close to 40 min for E. coli
at 37 °C. Under conditions of medium growth (top), the replicative period of 40 min is therefore shorter than the generation time, e.g., 60 min. No re-initiation of replication
happens before the termination of the preceding round of replication. Thus, replication is followed by a period devoid of DNA synthesis. Under conditions of rapid growth
(bottom), the replicative period is longer than the generation time, e.g., 20 min. However, replication re-initiations must take place at the pace of the generation time.
Consequently, two or three levels of replication forks may coexist, following the successive initiation rounds. (c) Example of codon frequencies for the six synonymous codons
encoding the amino acid Leucine. From these frequencies, a codon weight is simply calculated as the ratio of its frequency over that of the favorite codon, here UUG. (d)
Chromosomal proximity and periodical spacing of co-functional genes. Co-functional genes tend to be proximal along the bacterial chromosome, with a limit of about 20
genes. Beyond this limit, they often display a periodical spacing. This periodicity is interpreted as a means to cluster co-functional genes in space through the appropriate
folding of chromosomes, here symbolized by a solenoid.
adaptation. Genes, through recombination and related processes,
repeatedly sample various positions along the chromosome. Over
evolutionary times, genes come to be fixed at one among many
positions that confer high fitness to the cell. Often, relative gene
position matters more than absolute location: a gene would fit
better at a given position with respect to other genes, e.g., cofunctional genes.
2. Control of collisions between replication and transcription
complexes
The fundamental processes of replication and transcription take
place on the same physical chromosome. In Escherichia coli, the
replication forks progress along the two halves of the circular chromosome, from the oriC locus [10,11] to the ter region [12], thus
defining two ‘‘replichores’’. These specific loci have been well characterized and are generally predictable with a bioinformatics approach [13]. A similar organization is observed in Bacillus subtilis
[14,15], although the two replichores are of unequal length. Replication of the DNA double helix occurs on each strand from 50 to 30 .
Thus, on a given replichore, one DNA strand, called the ‘‘leading
strand’’, is replicated co-directionally with the replication fork.
The other strand is called the ‘‘lagging strand’’. Transcription occurs on shorter stretches, on average in the 1 kbp range. It may
progress along the leading strand (co-orientation with the replication fork), or the lagging strand (head-on). In E. coli, the ratio of
replication to transcription speeds is estimated to be between 14
and 24. Transcription speed is 42 bp/s [16], replication speed is
600–1000 bp/s [17]. As a consequence, collisions between the replication complex and the much slower transcription complex have
been observed in different organisms (Fig. 1a). Collisions have been
proposed to negatively affect cellular fitness by interrupting the
expression of highly-transcribed genes [18], or by leading to the
formation of incomplete transcripts, which subsequently results
in toxic truncated polypeptides [19].
The effects on chromosome replication are actually most deleterious. For genome design, the important fact is that the arrest of
replication forks due to collisions with transcription complexes
leads to genomic instability and cell death [5]. Mechanisms that
promote the progression of replication forks past transcription
complexes are therefore essential for propagation and preservation
of the genome. Numerous factors are involved in recovering from
collisions and minimizing their molecular consequences [20,21].
The genes encoding these factors must therefore be present in
any chassis genome.
Besides this recovery mechanism, a second type of mechanism
aims at preventing collisions by an adapted genome layout. Functional constraints are as follows. In fast-growing cells, both replication of the replichores and transcription of the highly-expressed
genes are initiated more frequently, further elevating this potential
conflict [22–24]. The outcome of their encounter should depend
strongly on their relative direction. RNA polymerase is dislodged
by replication in either direction [25,26]. On the other hand, replication is affected mostly by head-on transcription (on the lagging
strand) [26–30]. The following observations on natural chromosomes are direct consequences of these functional constraints
and should be turned into principles to design the layout of a
genetically stabilized chassis genome. Firstly, highly-expressed
genes are preferentially transcribed co-directionally with replication (on the leading strand) across numerous species [19,28]. For
instance, in B. subtilis and E. coli, all rRNA operons are transcribed
on the leading strand [31–33]. Secondly, essential genes are
Author's personal copy
F. Képès et al. / FEBS Letters 586 (2012) 2043–2048
enriched to a greater extent than non-essential genes on the leading strand [19]. Thirdly, there is a global bias for co-directionality
of replication and transcription. In B. subtilis and E. coli, this bias
is 75% and 55% of all genes, respectively [32,33]. Finally, it is noteworthy that collision probability is proportional to the length of
transcription units (Fig. 1a). Consequently, longer transcription
units are enriched on the leading strand. This enrichment decreases as the replication / transcription speed ratio increases [34].
3. Replication-associated dosage of genes involved in
translation and transcription
Replication-associated gene dosage is an important determinant of chromosome organization and dynamics, especially among
fast-growing bacteria. The bidirectional replication of bacterial
chromosomes leads to transient gene dosage effects. Indeed, DNA
replication must be initiated once per cell cycle, while it may take
longer than one generation time and particularly so at high growth
rates (Fig. 1b). It appears that dozens of replication forks may be
simultaneously present in the cells of certain species. Gene dosage
effects strongly constrain the position of genes involved in translation and transcription, but not the position of other highly expressed genes [35]. The relative proximity of the former genes to
the origin of replication is stronger for RNA polymerase, then rRNA,
then ribosomal proteins and the most abundant tRNAs. In eubacteria bearing multiple chromosomes, highly expressed genes are
preferentially found in the largest chromosome, a feature that
maximizes the gene dosage effects because this chromosome is
longest to replicate [36].
Biotechnology most often makes use of fast-growing bacteria,
precisely those whose genome layout has been shown to most
strongly exhibit replication-associated gene dosage biases. The
principles discussed above must therefore be integrated into any
design study of the genome layout of fast-growing bacteria. Leaving them out would presumably result in imbalances in the translation and transcription machineries, with dire consequences on
growth rate and bioproduction capacity.
4. Codon and strand compositional biases
The genetic code is degenerate. Synonymous codons are codons
that match the same amino acid. Synonymous codons are seldom
used with equal frequencies [37,38]. Both natural selection and
mutational biases contribute to changes in codon usage bias within
a genome. Mutational bias is caused by the bias in the errors made
by the DNA replication and repair machineries. It may cause a preference for codons with G or C at the third nucleotide position, thus
altering the GC-content [39], or it may enrich the leading strand in
G + C compared to the lagging strand [39]. Codon usage bias may
also arise in some specific locations as a consequence of horizontal
transfer of genes with different base content [40]. Selective forces
are also shaping codon usage bias. In particular, ‘‘translational
bias’’ plays a major role in fast growing prokaryotes and eukaryotes. From the genome sequence and its predicted protein-coding
regions, this bias can easily be quantified [41], even in the absence
of prior biological knowledge about the organism under study [42]
(Fig. 1c). Three main facts support the idea of translational bias:
highly expressed genes tend to use only a limited number of
codons and thus to display a high codon bias [37,41]; preferred
codons and iso-acceptor tRNA content exhibit a strong positive
correlation [43]; and tRNA iso-acceptor pools affect the rate of
polypeptide chain elongation [44]. Thus, a non-optimal codon will
cause the ribosome to pause while waiting for the correct
aminoacyl-tRNA to reach its site [45]. The resulting slow-down
in ribosome movement is expected to be inversely proportional
2045
to the aminoacyl-tRNA concentration [44]. As this essay is about
genome-scale design, translational bias at the gene and at the codon levels will not be discussed further.
These biases alter strand composition and codon choice [46,47].
In turn, both codon and mutational biases shape the ability of the
organism to exchange genetic material with other species by
homologous recombination. Hence, these compositional biases affect the capacity and species profile for horizontal gene transfer
and they have consequences on the process of speciation. They
may thus be considered as ways to reduce the probability of horizontal gene transfer, e.g., for strains that are planned to be released
in the environment for bio-remediation. As similar physiology or
habitat appear to result in comparable codon biases [48], the
choice of preferred codons could be elaborated to decrease the
probability of horizontal gene transfer by avoiding the dominant
bias in the targeted habitat.
Keeping in mind that biotechnological strains are often fast
growers, the situation is different for translational bias because its
impact is not only on recombinational pattern, but also on protein
biosynthesis. It is generally believed that codon bias is maintained
by a balance between selection, mutation, and genetic drift. The actual codon bias for any given organism is a rather arbitrary feature.
While there is a strong requirement for self-consistency within a
genome, there is no known constraint on picking one bias rather
than another one. If there were any lesson to draw from this knowledge in designing a full genome ab initio, it would be to respect selfconsistency of translational codon bias throughout the genome. The
choice of the preferred codons could in principle be arbitrary but
should be matched by the cognate tRNA concentrations, which
can be tuned by playing with the tRNA number of gene copies,
and their transcription and degradation rates [49]. The absence of
self-consistency would likely result in mild to medium effects on
cell physiology, due to stoichiometric imbalances among proteins
involved in common complexes or pathways, as well as among complementary pathways.
5. Control of noise in gene expression at the molecular scale
In bacterial cells, the number of copies of many key proteins is
very small. For instance, many transcription factors are in 10–100
copies per cell. It is therefore expected that the events they contribute to should be strongly fluctuating in time and space. It has
been argued that noise may have positive effects, e.g. to allow a
population to cope with an uncertain future by not having all individuals follow the same route [50,51].
It may also have negative effects that various mechanisms may
attempt to control [52]. For physiological reasons, expression of
some genes must be rather constant, despite the stochastic nature
of most molecular events that determine the rate of transcriptional
initiation, such as binding of transcription factors to DNA. Interestingly, it was observed in yeast that the degree of noise in the
expression levels is highly variable from gene to gene [53]. This
suggests that noise for individual genes might be under adaptive
pressure [54]; reviewed in [55]. Similar phenomena are likely to
be at play in bacteria. One possible avenue to control noise would
be to implement synthetic circuits at genome scale that comprise
appropriate loops [56]. In particular, with proper parameters, negative feedback loops have stabilizing properties, while feedforward
loops with time delays can filter out short-term fluctuations [57].
In natural regulatory circuits, such loops are statistically over-represented, suggesting that indeed, noise control is an important and
general feature encoded in natural genomes [58,59].
Moreover, it appears that yeast genes with comparable noise
levels tend to cluster together [60]. This fact, if it turns out to also
apply to bacteria, would constitute a further constraint on their
Author's personal copy
2046
F. Képès et al. / FEBS Letters 586 (2012) 2043–2048
genome layout, even though it is currently impossible to predict
the extent of its physiological consequences. Arguably, the strategies of chromosomal proximity and periodicity for co-functional
genes, observed both in eukaryotes and in prokaryotes, also constitute ways to reduce noise at transcriptional initiation [61]. Because
the logics and consequences of these strategies go beyond the issue
of noise control, they are discussed separately below.
6. Interplay between genome layout, chromosome
conformation and genome function
Spatial conformations of bacterial chromosomes have been
known for a long time to affect gene regulation; reviewed in [62].
From a structural point of view, bacterial chromosomes are organized into a compact nucleoid comprising DNA and proteins. They
are often negatively supercoiled, which causes the double helix to
adopt a branched and plectonemic or interwound structure. Supercoiling level is determined by a balance between opposing effects
carried out by topoisomerases and ATP-dependent DNA gyrase
[63,64]. The whole process is additionally cross-regulated by factors
known as nucleoid-associated proteins [65]. Because supercoiled
DNA strands rotate quickly [66], chromosomes are actually partitioned into small domains to prevent the spreading of supercoiling
loss. These supercoiling domains are further reorganized by transcription [67], leading to a highly integrated and regulated system.
Interestingly, some physiological transitions induced by external
stimuli are accompanied by changes in DNA supercoiling [63];
reviewed in [64]. Along the same line, it has been proposed that
supercoiling may play a major role in control of gene expression
in small genomes that do not seem to encode proteic transcription
factors [68]. Finally, genes encoding the above-mentioned players
of this integrated system appear along the replichores at ordered
positions that largely correspond to their temporal expression patterns during growth [69]. Even though this peculiar genome layout
has not been proven to be essential for genome function, this recent
observation suggests to order the supercoiling- and nucleoid-associated genes accordingly.
More generally, genome layout, chromosome conformation and
genome function/expression are expected to be interdependent.
Evidence for non-random genome positioning of co-functional
genes has raised since 2003. Here, gene co-functionality may refer
to transcriptional co-regulation or co-expression, complex formation by the products, participation to a common pathway, or a
combination of two or three of the above. Investigation of the
genomic organization of co-functional groups has revealed regularities of two types, either chromosomal proximity or long-range
periodical spacing along the chromosomes (Fig. 1d). On the one
hand, chromosomal proximity has been found to be effectively limited to 20 successive genes in enterobacteria [70]. On the other
hand, periodic positioning has been observed for microbial genes
that are either co-regulated [71–73], co-expressed [74], evolutionarily correlated [75,70], or highly codon-biased [76].
Képès has postulated the existence of a positive feedback loop
connecting periodic genome layout to the cellular conformation
of chromosomes onto transcriptional control [71]. This has been
demonstrated using a thermodynamic model of chromosome folding where transcription factors cross-link distant binding sites
[77]. In this chromosome model, a periodic gene positioning has
been shown to be crucial for achieving chromosome conformations
that favor gene spatial clustering. In the case of transcriptional coregulation, as demonstrated for the lac operon, this increased local
concentration of transcription factors and of their cognate binding
sites leads to strongly enhanced [78,61] and more robust [61,79]
transcriptional control. More generally, in bacteria, proteins tend
to be made close to their encoding genes [80] as a consequence
of the kinetic coupling between transcription and translation.
Therefore, it is expected that elevated concentrations of the cofunctional products will build up locally [70]. This should result
in reduced diffusion times and enhanced rate kinetics [81].
A further layer of complexity arises in the case of membrane or
secreted proteins. In bacteria, the amino-terminal hydrophobic signal sequence of these proteins gets inserted in the cell membrane
while translation is underway, and often while transcription is not
finished: this coupled transcription/translation/insertion process
has been named ‘‘transertion’’ [82]. While it is intuitively clear that
the numerous genes encoding secreted or membrane protein
should be organized to allow efficient secretion/insertion, transertion is not understood sufficiently to elaborate any useful guideline
for genome design.
The above features are likely to take an increasingly important
role in the design of new genome layouts, even though the extent
of such effects on cell physiology still awaits an evaluation. On the
side of chromosomal proximity, the main lesson is to limit it to less
than 20 successive genes. A more hypothetical guideline would be
to co-orient bacterial genes that encode products that must co-fold
into a single complex, given that folding of large proteins often
starts before their mRNA synthesis ends. On the side of periodic
positioning, it is important to evaluate the situation for the genome
at hand [83] in order to introduce synthetic genes at appropriate
sites that favor their spatial clustering, and consequently improve
their co-functioning.
7. Conclusion
This essay has gathered a number of known genome-scale
observations on natural eubacterial chromosomes. An important
issue is whether these observations constitute as many constraints
that are essential to the success of synthetic genomics, which is the
thesis defended in this essay. Alternatively, these observations
could merely suggest incremental improvements that might impart minor fitness advantages. This issue has been briefly evaluated
in each of the above sections. We wish to provide here a broader
perspective.
The alternative view of a genome as a bag of genes has been
consistently loosing weight since bioinformatics has allowed to
decipher partial genome sequences in the 1980s. Yet, in many published molecular genetics experiments, functional complementation seems to depend little on gene location. However, the vast
majority of such experiments done on chromosomes rather than
on plasmids have made use of strong gene promoters that are expected to stand alone. Furthermore, negative or highly surprising
results tend to not be reported, with interesting exceptions when
the natural promoter of the displaced gene is kept (e.g., [84]). It
is also noteworthy that significant chromosomal inversions are
generally deleterious. In sum, this important issue cannot receive
an univocal answer at present. Full genome design and construction should actually prove an effective route to clarify these fundamental issues.
In a sense, this essay is work in progress, as new major constraints on natural chromosomes are likely to emerge in the coming
years which will have an impact on the design of new-to-nature
genomes. While these constraints are described here in a natural
language, they should be expressed in a more formal way to automate design. Among other formalisms, we trust that linguistic models could adequately be used, as they have already been proposed
both to describe [85] and to design genes [86,87]. Investigations
on genome design beautifully illustrate how synthetic biology can
simultaneously contribute to fundamental insights and to biotechnological advances. Sufficient prior knowledge, summarized here,
has been accumulated so far to warrant partial success in designing
and constructing new genomes. In turn, such pioneering attempts
will be invaluable to improve our fundamental understanding of
Author's personal copy
F. Képès et al. / FEBS Letters 586 (2012) 2043–2048
the layout of bacterial genomes. Moreover, these and previous insights will open the way to a new generation of more robust and
efficient biotechnological strains.
Acknowledgments
This work was supported by the Sixth European Research
Framework (GENNETEC project number 034952), by the Seventh
European Research Framework (ST-FLOW project number
289326), and by CNRS, CRIF and GenopoleÒ. I.J. is supported by a
Novartis grant (CRG).
References
[1] Rebollo, J.E., Francois, V. and Louarn, J.M. (1988) Detection and possible role of
two large nondivisible zones on the Escherichia coli chromosome. Proc. Natl.
Acad. Sci. U S A 85, 9391–9395.
[2] Hill, C.W. and Gray, J.A. (1998) Effects of chromosomal inversion on cell fitness
in Escherichia coli K-12. Genetics 119, 771–778.
[3] Campo, N., Dias, M.J., Daveran-Mingot, M.-L., Ritzenthaler, P. and Le Bourgeois,
P. (2004) Chromosomal constraints in Gram-positive bacteria revealed by
artificial inversions. Mol. Microbiol. 51, 511–522.
[4] Boccard, F., Esnault, E. and Valens, M. (2005) Spatial arrangement and
macrodomain organization of bacterial chromosomes. Mol. Microbiol. 57, 9–16.
[5] Srivatsan, A., Tehranchi, A., MacAlpine, D.M. and Wang, J.D. (2010) Coorientation of replication and transcription preserves genome integrity. PLoS
Genet. 6 (1), e1000810.
[6] Pósfai, G., Plunkett 3rd, G., Fehér, T., Frisch, D., Keil, G.M., Umenhoffer, K.,
Kolisnychenko, V., Stahl, B., Sharma, S.S., de Arruda, M., Burland, V., Harcum,
S.W. and Blattner, F.R. (2006) Emergent properties of reduced-genome
Escherichia coli. Science 312, 1044–1046.
[7] Gibson, D.G., Glass, J.I., Lartigue, C., Noskov, V.N., Chuang, R.Y., et al. (2010)
Creation of a bacterial cell controlled by a chemically synthesized genome.
Science 329, 52–56.
[8] Porcar, M., Danchin, A., de Lorenzo, V., Dos Santos, V.A., Krasnogor, N.,
Rasmussen, S. and Moya, A. (2011) The ten grand challenges of synthetic life.
Syst. Synth. Biol. 5, 1–9.
[9] Lux, M.W., Bramlett, B.W., Ball, D.A. and Peccoud, J. (2012) Genetic design
automation: engineering fantasy or scientific renewal? Trends Biotechnol. 30,
120–126.
[10] Hirose, S., Hiraga, S. and Okazaki, T. (1983) Initiation site of
deoxyribonucleotide polymerization at the replication origin of the
Escherichia coli chromosome. Mol. Gen. Genet. 189, 422–431.
[11] Campbell, J.L. and Kleckner, N. (1990) E. coli oriC and the dnaA gene promoter
are sequestered from dam methyltransferase following the passage of the
chromosomal replication fork. Cell 62, 967–979.
[12] Duggin, I.G. and Bell, S.D. (2009) Termination structures in the Escherichia coli
chromosome replication fork trap. J. Mol. Biol. 387, 532–539.
[13] Sernova, N.V. and Gelfand, M.S. (2008) Identification of replication origins in
prokaryotic genomes. Brief. Bioinform. 9, 376–391.
[14] Lemon, K.P., Moriya, S., Ogasawara, N. and Grossman, A.D. (2002)
Chromosome replication and segregation in: Bacillus subtilis and its Closest
Relatives: From Genes to Cells (Sonenshein, A., Hoch, J. and Losick, R., Eds.), pp.
73–86, ASM Press, WA.
[15] Duggin, I.G. and Wake, R.G. (2002) Termination of chromosome replication in:
Bacillus subtilis and its Closest Relatives: From Genes to Cells (Sonenshein, A.,
Hoch, J. and Losick, R., Eds.), pp. 87–95, ASM Press, WA.
[16] Gotta, S., Miller Jr., O. and French, S. (1991) RRNA transcription rate in
Escherichia coli. J. Bacteriol. 173, 6647–6649.
[17] Mok, M. and Marians, K.J. (1987) The Escherichia coli preprimosome and DNA B
helicase can form replication forks that move at the same rate. J. Biol. Chem.
262, 16644–16654.
[18] Price, M.N., Alm, E.J. and Arkin, A.P. (2005) Interruptions in gene expression
drive highly expressed operons to the leading strand of DNA replication.
Nucleic Acids Res. 33, 3224–3234.
[19] Rocha, E.P. and Danchin, A. (2003) Essentiality, not expressiveness, drives
gene-strand bias in bacteria. Nat. Genet. 34, 377–378.
[20] Trautinger, B.W., Jaktaji, R.P., Rusakova, E. and Lloyd, R.G. (2005) RNA
polymerase modulators and DNA repair activities resolve conflicts between
DNA replication and transcription. Mol. Cell 19, 247–258.
[21] Boubakri, H., de Septenville, A.L., Viguera, E. and Michel, B. (2010) The
helicases DinG, Rep and UvrD cooperate to promote replication across
transcription units in vivo. EMBO J. 29, 145–157.
[22] Miura, A., Krueger, J.H., Itoh, S., de Boer, H.A. and Nomura, M. (1981) Growthrate-dependent regulation of ribosome synthesis in E. coli: expression of the
lacZ and galK genes fused to ribosomal promoters. Cell 25, 773–782.
[23] Bremer, H. and Dennis, P. (1996) Modulation of chemical composition and
other parameters of the cell by growth rate in: Escherichia coli and Salmonella:
Cellular and Molecular Biology (Neidhardt, F.C., Curtiss, R.I., Ingraham, J.L., Lin,
E.C.C. and Low, K.B., et al., Eds.), pp. 1553–1569, ASM Press, WA.
[24] Cooper, S. and Helmstetter, C.E. (1968) Chromosome replication and the
division cycle of Escherichia coli B/r. J. Mol. Biol. 31, 519–540.
2047
[25] French, S. (1992) Consequences of replication fork movement through
transcription units in vivo. Science 258, 1362–1365.
[26] Pomerantz, R.T. and O’Donnell, M. (2008) The replisome uses mRNA as a
primer after colliding with RNA polymerase. Nature 456, 762–766.
[27] Yao, N.Y. and O’Donnell, M. (2009) Replisome structure and conformational
dynamics underlie fork progression past obstacles. Curr. Opin. Cell Biol. 21,
336–343.
[28] Rocha, E.P. (2008) The organization of the bacterial genome. Annu. Rev. Genet.
42, 211–233.
[29] Liu, B. and Alberts, B.M. (1995) Head-on collision between a DNA replication
apparatus and RNA polymerase transcription complex. Science 267, 1131–1137.
[30] Elias-Arnanz, M. and Salas, M. (1999) Resolution of head-on collisions
between the transcription machinery and bacteriophage phi29 DNA
polymerase is dependent on RNA polymerase translocation. EMBO J. 18,
5675–5682.
[31] Guy, L. and Roten, C.A. (2004) Genometric analyses of the organization of
circular chromosomes: a universal pressure determines the direction of
ribosomal RNA genes transcription relative to chromosome replication. Gene
340, 45–52.
[32] Ellwood, M. and Nomura, M. (1982) Chromosomal locations of the genes for
rRNA in Escherichia coli K-12. J. Bacteriol. 149, 458–468.
[33] Zeigler, D.R. and Dean, D.H. (1990) Orientation of genes in the Bacillus subtilis
chromosome. Genetics 125, 703–708.
[34] Omont, N. and Képès, F. (2004) Transcription/replication collisions cause
bacterial transcription units to be longer on the leading strand of replication.
Bioinformatics 20, 2719–2725.
[35] Abby, S. and Daubin, V. (2007) Comparative genomics and the evolution of
prokaryotes. Trends Microbiol. 15, 135–141.
[36] Couturier, E. and Rocha, E.P. (2006) Replication-associated gene dosage effects
shape the genomes of fast-growing bacteria but only for transcription and
translation genes. Molec. Microbiol. 59, 1506–1518.
[37] Grantham, R., Gautier, C., Gouy, M., Mercier, R. and Pavé, A. (1980) Codon
catalog usage and the genome hypothesis. Nucleic Acids Res. 8, r49–r62.
[38] Wada, K.S., Aota, R., Tsuchiya, F., Ishibashi, T., Gojobori, T. and Ikemura, T.
(1990) Codon usage tabulated from GenBank genetic sequence data. Nucleic
Acids Res. 18 (Suppl), 2367–2411.
[39] Lafay, B., Lloyd, A.T., McLean, M.J., Devine, K.M., Sharp, P.M. and Wolfe, K.H.
(1999) Proteome composition and codon usage in spirochaetes: speciesspecific and DNA strand-specific mutational biases. Nucleic Acids Res. 27,
1642–1649.
[40] Moszer, I., Rocha, E.P. and Danchin, A. (1999) Codon usage and lateral gene
transfer in Bacillus subtilis. Curr. Opin. Microbiol. 2, 524–528.
[41] Sharp, P.M. and Li, W.-H. (1987) The codon adaptation index—a measure of
directional synonymous codon usage bias, and its potential applications.
Nucleic Acid Res. 15, 1281–1295.
[42] Carbone, A., Zinovyev, A. and Képès, F. (2003) Codon Adaptation Index as a
measure of dominating codon bias. Bioinformatics 19, 2005–2015.
[43] Ikemura, T. (1985) Codon usage and tRNA content in unicellular and
multicellular organisms. Mol. Biol. Evol. 2, 13–34.
[44] Varenne, S., Buc, J., Lloubès, R. and Lazdunski, C. (1984) Translation is a nonuniform process. Effect of tRNA availability on the rate of elongation of
nascent polypeptide chains. J. Mol. Biol. 180, 549–576.
[45] Talkad, V., Schneider, E. and Kennell, D. (1976) Evidence for variable rates of
ribosome movement in Escherichia coli. J. Mol. Biol. 104, 299–303.
[46] Palidwor, G.A., Perkins, T.J. and Xia, X. (2010) A general model of codon bias
due to GC mutational bias. PLoS ONE 5 (10), e13431.
[47] Hershberg, R. and Petrov, D.A. (2008) Selection on codon bias. Annu. Rev.
Genet. 42, 287–299.
[48] Carbone, A., Képès, F. and Zinovyev, A. (2004) Codon bias signatures,
organisation of microorganisms in codon space and lifestyle. Mol. Biol. Evol.
22, 547–561.
[49] Dittmar, K.A., Mobley, E.M., Radek, A.J. and Pan, T. (2004) Exploring the
regulation of tRNA distribution on the genomic scale. J. Mol. Biol. 337, 31–47.
[50] Arkin, A., Ross, J. and McAdams, H.H. (1998) Stochastic kinetic analysis of
developmental pathway bifurcation in phage lambda-infected Escherichia coli
cells. Genetics 149, 1633–1648.
[51] Balaban, N.Q., Merrin, J., Chait, R., Kowalik, L. and Leibler, S. (2004) Bacterial
persistence as a phenotypic switch. Science 305, 1622–1625.
[52] Raser, J.M. and O’Shea, E.K. (2005) Noise in gene expression: origins,
consequences, and control. Science 309, 2010–2013.
[53] Newman, J.R., Ghaemmaghami, S., Ihmels, J., Breslow, D.K., Noble, M., DeRisi,
J.L. and Weissman, J.S. (2006) Single-cell proteomic analysis of S. cerevisiae
reveals the architecture of biological noise. Nature 441, 840–846.
[54] Wang, Z. and Zhang, J. (2011) Impact of gene expression noise on organismal
fitness and the efficacy of natural selection. Proc. Natl Acad. Sci. US A 108,
E67–E76.
[55] Warnecke, T. and Hurst, L.D. (2011) Error prevention and mitigation as forces
in the evolution of genes and genomes. Nat. Rev. Genet. 12, 875–881.
[56] Thomas, R. and D’Ari, R. (1990) Biological feedback, CRC Press, Boca Raton, FL.
[57] Alon, U. (2007) Network motifs: theory and experimental approaches. Nat.
Rev. Genet. 8, 450–461.
[58] Shen-Orr, S.S., Milo, R., Mangan, S. and Alon, U. (2002) Network motifs in the
transcriptional regulation network of Escherichia coli. Nat. Genet. 31, 64–68.
[59] Guelzim, N., Bottani, S., Bourgine, P. and Képès, F. (2002) Topological and
causal structure of the yeast transcriptional regulatory network. Nat. Genet.
31, 60–63.
Author's personal copy
2048
F. Képès et al. / FEBS Letters 586 (2012) 2043–2048
[60] Batada, N.N. and Hurst, L.D. (2007) Evolution of chromosome organization
driven by selection for reduced gene expression noise. Nature Genet. 39, 945–
949.
[61] Vilar, J.M.G. and Leibler, S. (2003) DNA looping and physical constraints on
transcription regulation. J. Mol. Biol. 331, 981–989.
[62] Thanbichler, M., Viollier, P.H. and Shapiro, L. (2005) The structure and function
of the bacterial chromosome. Curr. Opin. Genet. Dev. 15, 153–162.
[63] Balke, V.L. and Gralla, J.D. (1987) Changes in the linking number of supercoiled
DNA accompany growth transitions in Escherichia coli. J. Bacteriol. 169, 4499–
4506.
[64] Travers, A. and Muskhelishvili, G. (2005) DNA supercoiling – a global
transcriptional regulator for enterobacterial growth? Nat. Rev. Microbiol. 3,
157–169.
[65] Dame, R.T. (2005) The role of nucleoid-associated proteins in the organization
and compaction of bacterial chromatin. Mol. Microbiol. 56, 858–870.
[66] Oram, M., Shipstone, E. and Halford, S.E. (1994) Synapsis by Tn3 resolvase:
speed and dependence on DNA supercoiling. Biochem. Soc. Trans. 22, 303.
[67] Deng, S., Stein, R.A. and Higgins, N.P. (2005) Organization of supercoil domains
and their reorganization by transcription. Molec. Microbiol. 57, 1511–1521.
[68] Dorman, C.J. (2011) Regulation of transcription by DNA supercoiling in
Mycoplasma genitalium: global control in the smallest known self-replicating
genome. Mol. Microbiol. 81, 302–304.
[69] Sobetzko, P., Travers, A. and Muskhelishvili, G. (2012) Gene order and
chromosome dynamics coordinate spatiotemporal gene expression during
the bacterial growth cycle. Proc. Natl. Acad. Sci. U S A 109, E42–E50.
[70] Junier, I., Hérisson, J., and Képès, F. Genomic organization of evolutionarily
correlated genes in bacteria: limits and strategies. J. Mol. Biol., in press (doi
http://dx.doi.org/10.1016/j.jmb.2012.03.009).
[71] Képès, F. and Vaillant, C. (2003) Transcription-based solenoidal model of
chromosomes. ComPlexUs 1, 171–180.
[72] Képès, F. (2004) Periodic transcriptional organization of the E. coli genome. J.
Mol. Biol. 340, 957–964.
[73] Képès, F. (2003) Periodic epi-organization of the yeast genome revealed by the
distribution of promoter sites. J. Mol. Biol. 329, 859–865.
[74] Mercier, G., Berthault, N., Touleimat, N., Képès, F., Fourel, G., et al. (2005) A
haploid-specific transcriptional response to irradiation in Saccharomyces
cerevisiae. Nucleic Acids Res. 33, 6635–6643.
[75] Wright, M., Kharchenko, P., Church, G. and Segrè, D. (2007) Chromosomal
periodicity of evolutionarily conserved gene pairs. Proc. Natl. Acad. Sci. U S A
104, 10559–10564.
[76] Mathelier, A. and Carbone, A. (2010) Chromosomal periodicity and positional
networks of genes in Escherichia coli. Mol. Syst. Biol. 6, 366.
[77] Junier, I., Martin, O. and Képès, F. (2010) Spatial and topological organization
of DNA chains induced by gene co-localization. PLoS Comput. Biol. 6,
e1000678.
[78] Dröge, P. and Müller-Hill, B. (2001) High local protein concentrations at
promoters: strategies in prokaryotic and eukaryotic cells. Bioessays 23, 179–
183.
[79] Fraser, P. and Bickmore, W. (2007) Nuclear organization of the genome and
the potential for gene regulation. Nature 447, 413–417.
[80] Miller Jr., O.L., Hamkalo, B.A. and Thomas Jr., C.A. (1970) Visualization of
bacterial genes in action. Science 169, 392–395.
[81] Kuriyan, J. and Eisenberg, D. (2007) The origin of protein interactions and
allostery in colocalization. Nature 450, 983–990.
[82] Norris, V. and Madsen, M.S. (1995) Autocatalytic gene expression occurs via
transertion and membrane domain formation and underlies differentiation in
bacteria: a model. J. Mol. Biol. 253, 739–748.
[83] Junier, I., Hérisson, J. and Képès, F. (2010) Efficient detection of periodic
patterns within small datasets. Algorithms Mol. Bio. 5, 31.
[84] Martin, R.G. and Rosner, J.L. (2002) Genomics of the marA/soxS/rob regulon of
Escherichia coli: identification of directly activated promoters by application of
molecular genetics and informatics to microarray data. Mol. Microbiol. 44,
1611–1624.
[85] Searls, D.B. (2002) The language of genes. Nature 420, 211–217.
[86] Cai, Y. et al. (2007) A syntactic model to design and verify synthetic genetic
constructs derived from standard biological parts. Bioinformatics 23, 2760–2767.
[87] Cai, Y. et al. (2009) Modeling structure–function relationships in synthetic
DNA sequences using attribute grammars. PLoS Comput. Biol. 5, e1000529.