Int. J. Bioinformatics Research and Applications, Vol. 11, No. 4, 2015
Opposite nucleotide usage biases in different parts of
the Corynebacterium diphtheriae spaC gene
Vladislav Victorovich Khrustalev* and
Eugene Victorovich Barkovsky
Department of General Chemistry,
Belarusian State Medical University,
Dzerzinskogo 83, Minsk, Belarus
*Corresponding author
Valentina Leonidovna Kolodkina
Vaccine Preventable Diseases Laboratory,
Republican Research and Practical Centre for
Epidemiology and Microbiology,
Filimonova 23, Minsk, Belarus
Tatyana Aleksandrovna Khrustaleva
Laboratory of Cellular Technologies,
Institute of Physiology of the National
Academy of Sciences of Belarus,
Academicheskaya 28, Minsk, Belarus
Abstract: In this work we describe a bacterial open reading frame with two
different directions of nucleotide usage bias in its two parts. The level of
GC-content in third codon positions (3GC) is equal to 40.17 ± 0.22% along
most of the length of the Corynebacterium diphtheriae spaC gene. However, in the
3'-end of the same gene (from codon #1600 to codon #1873) the 3GC level is equal
to 64.61 ± 0.91%. Using original methodology (‘VVTAK Sliding window’
and ‘VVTAK VarInvar’) we confirmed that there is an ongoing mutational
AT-pressure along most of the length of the spaC gene (up to codon #1599),
and an ongoing mutational G-pressure in the 3'-end of spaC. Intragenic
promoters predicted by three different methods may be the cause of the
differences in the preferred types of nucleotide mutations in the parts of spaC,
because of their autonomous transcription.
Keywords: mutational pressure; transcription-associated mutational pressure;
asymmetric mutational pressure; genomic islands; pathogenicity islands;
Corynebacterium diphtheriae; Corynebacterium ulcerans; spaC; adhesion; pili;
nucleotide mutations; promoter prediction; terminator prediction; intragenic
promoters; intragenic terminators.
Copyright © 2015 Inderscience Enterprises Ltd.
V.V. Khrustalev et al.
Reference to this paper should be made as follows: Khrustalev, V.V.,
Barkovsky, E.V., Kolodkina, V.L. and Khrustaleva, T.A. (2015) ‘Opposite
nucleotide usage biases in different parts of the Corynebacterium diphtheriae
spaC gene’, Int. J. Bioinformatics Research and Applications, Vol. 11, No. 4,
pp.347–365.
Biographical notes: Vladislav Victorovich Khrustalev, PhD, is an Associate
Professor in the Department of General Chemistry at the Belarusian State
Medical University. His research interests are in the areas of biochemistry,
computational biology, immunology, virology, microbiology, genomics,
proteomics and bioinformatics. Most of his scientific projects are connected
with the mutational pressure theory.
Eugene Victorovich Barkovsky, PhD, Professor, is the Head of the Department
of General Chemistry at the Belarusian State Medical University. His research
interests are in the areas of biochemistry, molecular biology, molecular
evolution, proteomics, immunology, computational biology and bioinformatics.
He has written over 400 research articles and six monographs.
Valentina Leonidovna Kolodkina, PhD, is a senior researcher in Vaccine
Preventable Diseases Laboratory at the Republican Research and Practical
Centre for Epidemiology and Microbiology, Minsk, Belarus. Her research
interests are in the field of microbiology, immunology, vaccine design,
molecular phylogenetics and molecular evolution.
Tatyana Aleksandrovna Khrustaleva is a researcher in the Laboratory of
Cellular Technologies at the Institute of Physiology of the National Academy of
Sciences of Belarus. Her research interests are in the field of biochemistry,
ligand-protein interactions, bioinformatics and comparative genomics.
1 Introduction
Mutational pressure is the resulting direction of all the nucleotide mutations happening in
a genome, a gene, or even a fragment of a gene. Some types of nucleotide mutations
either occur more frequently than others, or are repaired less efficiently than others (Sueoka,
1988). As a result, the usage of nucleotides prone to more frequent mutations decreases
in the genome, gene or fragment of a gene in a given population.
If we consider a bacterial genome (consisting of double-stranded DNA), then we can
use a sum of guanine and cytosine usages (G+C) to characterise the direction of
symmetric mutational pressure (Sueoka, 1988; Khrustalev and Barkovsky, 2010a).
Mutations leading to the increase or decrease of GC-content are fixed more frequently in
third codon positions (Sueoka, 1988). This happens because there are 32 fourfold
degenerated sites in third codon positions: all nucleotide substitutions in those sites are
synonymous (they do not cause amino acid substitutions in the protein) (Nei and Kumar,
2000). There are also 14 twofold degenerated sites from third codon positions in which
thymine to cytosine and cytosine to thymine mutations are synonymous and 12 twofold
degenerated sites in which adenine to guanine and guanine to adenine mutations are
synonymous (Khrustalev and Barkovsky, 2010b).
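The site counts quoted above can be checked with a short script. The following is only a minimal sketch, assuming the standard genetic code (bacterial translation table 11 assigns the same amino acids to codons) and assuming that the stop codon pair TAA/TAG is counted among the A/G twofold degenerated sites, which is the reading under which the total of 12 is reproduced. The names `CODE`, `third_position_class` and `count_classes` are illustrative and are not taken from the authors' software.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in NCBI order: first, second and third base each
# cycle through T, C, A, G, with the third base changing fastest.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def third_position_class(codon):
    """Classify the third position of a codon as '4f', '2f_TC', '2f_AG' or None."""
    # Third-position nucleotides that leave the encoded product unchanged.
    same = {b for b in BASES if CODE[codon[:2] + b] == CODE[codon]}
    if len(same) == 4:
        return "4f"
    if same == {"T", "C"}:
        return "2f_TC"
    if same == {"A", "G"}:
        return "2f_AG"
    return None  # ATG, TGG, TGA and the threefold degenerated Ile codons


def count_classes():
    """Count codons of the genetic code in each degeneracy class."""
    counts = {"4f": 0, "2f_TC": 0, "2f_AG": 0}
    for codon in CODE:
        cls = third_position_class(codon)
        if cls is not None:
            counts[cls] += 1
    return counts


print(count_classes())  # {'4f': 32, '2f_TC': 14, '2f_AG': 12}
```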
Opposite nucleotide usage biases
If we consider a single bacterial gene, then we cannot use the sum of guanine and
cytosine levels to characterise nucleotide usage biases properly. There can be significant
differences between usages of guanine and cytosine, as well as between usages of adenine
and thymine in a single gene. Those biases are formed by asymmetric mutational
pressures (Lobry and Sueoka, 2002). That is why such indices as nucleotide content in
fourfold (T4f, A4f, C4f and G4f) and twofold degenerated sites from third codon
positions (T2f3p, A2f3p, C2f3p and G2f3p) were suggested for the description of
asymmetric mutational pressure (Khrustalev and Barkovsky, 2012).
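As an illustration of how such indices could be obtained for one coding sequence, here is a minimal sketch. It assumes that each index is the percentage of a nucleotide among all fourfold (or all twofold) degenerated third-position sites of the sequence, and that stop codons are skipped; both choices are assumptions made for this example, and the function names are hypothetical rather than those of the authors' tools.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def site_class(codon):
    """Return '4f', '2f3p' or None for the third position of a sense codon."""
    if codon not in CODE or CODE[codon] == "*":
        return None  # assumption: ambiguous codons and stop codons are ignored
    same = {b for b in BASES if CODE[codon[:2] + b] == CODE[codon]}
    if len(same) == 4:
        return "4f"
    if len(same) == 2:
        return "2f3p"
    return None


def degenerate_site_usage(cds):
    """Percentage of A, T, G and C in fourfold and twofold third-position sites."""
    cds = cds.upper()
    counts = {"4f": Counter(), "2f3p": Counter()}
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        cls = site_class(codon)
        if cls is not None:
            counts[cls][codon[2]] += 1
    usage = {}
    for cls, counter in counts.items():
        total = sum(counter.values())
        for base in "ATGC":
            usage[base + cls] = 100.0 * counter[base] / total if total else 0.0
    return usage


# Toy sequence: GCT, GCC, GGG are fourfold sites; TTT and AAA are twofold sites.
example = degenerate_site_usage("GCTGCCGGGTTTAAA")
print(example["G4f"], example["T2f3p"])
```

The returned dictionary uses the same index names as the text (for example `A4f` or `C2f3p`).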
Replication-associated mutational pressure makes nucleotide usage biases in genes
from lagging strands of DNA different from those in genes from leading strands (Lobry
and Sueoka, 2002). Transcription-associated mutational pressure is responsible for
different nucleotide usage biases for differentially expressed genes (Chen and Chen,
2007). From that point of view, one may use the information on nucleotide usage biases
in studies on differential transcription.
Previously we described a eukaryotic gene (the gene encoding platelet
phosphofructokinase of birds from the order Passeriformes) with short regions of elevated
GC-content which may be associated with autonomous transcription of microRNA
precursors (Khrustalev et al., 2014). In the present work we describe a situation in which
autonomously transcribed elements can be suggested inside a single bacterial open
reading frame because of the presence of opposite nucleotide usage biases. To
visualise those biases we used an original MS Excel-based algorithm entitled ‘VVTAK
Sliding Window’ (http://chemres.bsmu.by/VVK%20SW.htm).
An interesting fact about nucleotide usage biases is that they are retrospective: they
still exist long after the point in time when the change in mutational pressure direction
happened (Khrustalev and Barkovsky, 2010a). Mutational pressure can change its
direction for a reason such as a mutation in a gene encoding an enzyme of the DNA
repair system. After such a change in mutational pressure direction, nucleotide usage biases
start to change too. However, it may take a long time to establish a new equilibrium
between the rates of nucleotide mutations and usages of nucleotides. That is why it is
important not just to study nucleotide usage biases, but to check whether mutational
pressure is ongoing.
The aim of the present study was to check the existence of ongoing mutational
pressures of different directions inside a single bacterial open reading frame.
With that aim we developed a new computer algorithm (http://chemres.bsmu.by/
VVK%20VarInvar.htm) entitled ‘VVTAK VarInvar’.
Pili (polymeric adhesins linked to the bacterial cell wall) play an important role in
corynebacterial pathogenesis. There are three separate pilus gene clusters in the first
complete genome sequence of Corynebacterium diphtheriae (a clinical isolate from the
UK – strain NCTC13129) (Cerdeño-Tárraga et al., 2003). There are three types of
polymeric pili available for C. diphtheriae: spaA-, spaD- and spaH-type pili (Ton-That
and Schneewind, 2003). All three types of pili have a similar architecture: a polymeric shaft
made of numerous major pilins is joined to a specific tip pilin and a base pilin (minor
pilins). The specific binding of corynebacteria to pharyngeal epithelial cells is attributed
to the two minor pilins (spaB and spaC), which can not only exist as a base and a tip of
the polymeric pilus, but most intriguingly, can be linked to the bacterial cell wall in
monomeric and heterodimeric forms as well (Chang et al., 2011).
In this study we showed that there are different directions of mutational pressure in
different regions of the same open reading frame coding for spaC. This heterogeneous
GC-content distribution along the length of a single gene cannot be understood without
description of heterogeneous GC-content distribution along the length of C. diphtheriae
genome. According to our hypothesis, both heterogeneities are consequences of
transcription-associated mutational pressure. Other genes with heterogeneous GC-content
distribution along their lengths have been found in C. diphtheriae genome and genomes
of its close relatives from the same genus.
There are genes homologous to spaC in genomes of other Corynebacterium species.
3GC is higher than 50% along most of their lengths. So, one cannot suggest that the
specific nucleotide usage biases in the spaC gene have been formed because of recent
gene fusion event(s). There is a possibility of partial gene transfer (Boc et al., 2012) of the
GC-rich 3’-end of spaC from other species to the GC-poor C. diphtheriae spaC gene.
Anyway, there should be some intrinsic causes which do not let those nucleotide usage biases
fade away over the course of time. According to the results of promoter and terminator
predictions there might be autonomously transcribed elements in spaC gene which may
be involved in adhesion regulation.
2 Materials and methods
2.1 Materials
As material we used the nucleotide sequences of the spaC gene from completely sequenced and
annotated (Trost et al., 2012) genomes of different C. diphtheriae strains: NCTC 13129
(NC_002935), Va01 (NC_016790), 241 (NC_016782), BH8 (NC_016800), CDCE 8392
(NC_016785), HC01 (NC_016786), HC02 (NC_016802), HC03 (NC_016787), NCTC
5011 (AJVH01000021), and HC04 (NC_016788). The last genome contains two copies
of spaC.
BLAST-search (http://blast.ncbi.nlm.nih.gov/Blast.cgi) has been performed with
amino acid sequence of spaC from C. diphtheriae NCTC 13129 strain as a query. Several
sequences of homologous proteins (also entitled spaC in some species) which belong to
other species from Corynebacterium genus have been found. Nucleotide sequences coding
for those spaC homologues have been included in the current study: Corynebacterium
ulcerans 809 spaC (NC_017317: nucleotides 2229085 – 2234643; 38.3% of identical
amino acid residues with C. diphtheriae spaC); C. ulcerans 0102 (AP012284; 2302504 –
2308062; 38.4%); C. ulcerans BR-AD22 spaC (NC_015683; 2324724 – 2330282;
38.0%); Corynebacterium pseudotuberculosis 258 (NC_017945; 2058211 – 2063823;
40.1%); Corynebacterium casei UCMA 3821 (NZ_CAFW01000104; 509 – 6160;
38.3%).
The data on codon usage in each coding region along the complete genome of C.
diphtheriae NCTC 13129 has been downloaded from the Codon Usage Database
(Nakamura et al., 2000) (www.kazusa.or.jp/codon). Since there are no records on
complete C. ulcerans, C. pseudotuberculosis and C. casei genomes in the Codon Usage
Database, we used their nucleotide sequences and protein tables from the GenBank to
calculate nucleotide usage biases in each coding region along their lengths. We used the
genomes of C. ulcerans strain 809 (NC_017317), C. pseudotuberculosis strain 258
(NC_017945) and C. casei strain LMG S-19264 (CP004350).
2.2 Methods
Each of the nucleotide sequences coding for spaC (from C. diphtheriae and other
Corynebacterium species) has been studied with the help of the ‘VVTAK Sliding
Window’ algorithm (http://chemres.bsmu.by/VVK%20SW.htm). Nucleotide content
distribution along the length of each coding region has been calculated in sliding
windows 150 codons in length (the longest window length for that algorithm). The
algorithm calculates nucleotide usage in three codon positions, as well as in fourfold and
twofold degenerated sites from third codon positions in each sliding window. The
‘VVTAK Sliding Window’ algorithm works with a single nucleotide sequence of
no more than 3000 nucleotides in length. The user should copy a nucleotide sequence into
the designated cell of the ‘Sequence’ sheet and write the length of the sliding window
(no more than 150 codons) in another designated cell of the same sheet. The inserted
nucleotide sequence will be cut into short fragments of the required length (the step of the
sliding window is equal to one codon) and all the indices describing nucleotide usage in
each of those fragments will appear in columns of the ‘Results’ sheet.
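The sliding window idea described above can be sketched in a few lines. This is a hedged illustration for the 3GC index only, not a reimplementation of the MS Excel-based ‘VVTAK Sliding Window’ tool: the function name `sliding_3gc` and its parameters are hypothetical, and the length limits of the original tool are not enforced.

```python
def sliding_3gc(cds, window=150, step=1):
    """Return 3GC (%) for each window of `window` codons, moved by `step` codons."""
    cds = cds.upper()
    # 1 if the third codon position holds G or C, otherwise 0.
    is_gc = [1 if cds[i + 2] in "GC" else 0
             for i in range(0, len(cds) - len(cds) % 3, 3)]
    # Prefix sums make each window total a single subtraction.
    prefix = [0]
    for value in is_gc:
        prefix.append(prefix[-1] + value)
    return [100.0 * (prefix[start + window] - prefix[start]) / window
            for start in range(0, len(is_gc) - window + 1, step)]


# Toy sequence: four codons ending in G followed by four codons ending in A.
print(sliding_3gc("AAG" * 4 + "AAA" * 4, window=4))
# [100.0, 75.0, 50.0, 25.0, 0.0]
```

A sequence shorter than one window yields an empty list. The other indices (1GC, 2GC, and nucleotide content in fourfold and twofold degenerated sites) could be computed per window in the same way.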
Alignment of spaC from different C. diphtheriae strains has been performed by PAM
algorithm included in the MEGA 5.1 program (Tamura et al., 2011). All the gaps have
been deleted from that alignment. Then the alignment has been cut into three parts, according
to the changes of biases in nucleotide content distribution along spaC coding regions:
part 1 (from codon #1 to codon #400 relative to the NCTC 13129 spaC), part 2 (from
codon #401 to codon #1599), and part 3 (from codon #1600 to codon #1873). The first
border (codon #400) is a compromise between the intersection points for G4f and A4f
(codon #350), for C4f and T4f (codon #450), for G2f3p and A2f3p (codon #300) and for
C2f3p and T2f3p (codon #500). The second border (codon #1600) is the point at which
3GC becomes higher than 50%.
Percentage of different types of variable fourfold degenerated sites has been
calculated in three alignments with the help of the ‘VVTAK VarInvar’ algorithm
(http://chemres.bsmu.by/VVK%20VarInvar.htm). The ‘VVTAK VarInvar’ algorithm
works with the alignment of nucleotide sequences (the alignment may contain up to 100
sequences no more than 4000 nucleotides in length each). The logical flow path of the
‘VVTAK VarInvar’ algorithm includes: (a) determination of fourfold and twofold
degenerated sites in third codon positions in each of the sequences; (b) finding those
twofold and fourfold sites which stay twofold and fourfold degenerated, respectively, in
all the sequences from the alignment (we called them stable twofold and stable fourfold
degenerated sites); (c) calculation of nucleotide content in stable twofold and fourfold
degenerated sites for each of the sequences; (d) finding invariable sites among stable
twofold and fourfold degenerated sites; (e) calculation of nucleotide content in invariable
twofold and fourfold degenerated sites (that index is the same for each of the sequences);
(f) finding whether the difference between nucleotide usage in stable and invariable sites
is significant using a two-tailed t-test. The ‘VVTAK VarInvar’ algorithm does not use
those sites which are fourfold (or twofold) degenerated in only some of the sequences.
Those ‘unstable’ sites are formed because of mutations in first or second codon positions and
might introduce significant bias in calculations.
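Steps (a) to (e) of that flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' ‘VVTAK VarInvar’ implementation: the alignment is assumed to be gap-free and in frame, stop codons are skipped, and the significance test of step (f) is left out. All function and variable names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def site_class(codon):
    """Step (a): '4f', '2f' or None for the third position of a sense codon."""
    if codon not in CODE or CODE[codon] == "*":
        return None
    same = {b for b in BASES if CODE[codon[:2] + b] == CODE[codon]}
    return {4: "4f", 2: "2f"}.get(len(same))


def content(seq, sites):
    """Percentage of each nucleotide in the third positions of the given codons."""
    counter = Counter(seq[3 * k + 2] for k in sites)
    return {b: 100.0 * counter[b] / len(sites) for b in "ATGC"} if sites else {}


def var_invar(alignment, cls="4f"):
    """Return (usage in stable sites per sequence, usage in invariable sites)."""
    seqs = [s.upper() for s in alignment]
    stable, invariable = [], []
    for k in range(len(seqs[0]) // 3):
        codons = [s[3 * k:3 * k + 3] for s in seqs]
        # Step (b): the site must belong to the same class in every sequence.
        if all(site_class(c) == cls for c in codons):
            stable.append(k)
            # Step (d): invariable sites carry one nucleotide in all sequences.
            if len({c[2] for c in codons}) == 1:
                invariable.append(k)
    stable_usage = [content(s, stable) for s in seqs]   # step (c)
    invariable_usage = content(seqs[0], invariable)     # step (e)
    return stable_usage, invariable_usage


toy = ["GCTGCAGGGTTT", "GCTGCGGGGTTT", "GCTGCAGGATTT"]
stable_usage, invariable_usage = var_invar(toy)
print(invariable_usage["T"], stable_usage[0]["A"])
```

In the toy alignment the first three codons are stable fourfold degenerated sites and only the first of them is invariable. A nucleotide whose usage is higher in invariable sites than in all stable sites is the one that mutates less often, which is how the comparison is read in the Results.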
We used three methods for promoter prediction: the ‘BPROM’ available via the
SoftBerry server (http://linux1.softberry.com); the ‘NNPP’ program (http://fruitfly.org/
seq_tools/promoter.html) (Reese, 2001); the ‘PromPredict’ (http://nucleix.mbu.iisc.
ernet.in/prompredict/prompredict.html) program (Rangannan and Bansal, 2009).
The three methods for prediction of Rho-independent transcription terminators used in
this study are: the ‘Erpin’ (Gautheret and Lambert, 2001) and the ‘RNAmotif’ (Macke
et al., 2001; Lesnik et al., 2001) methods both included into the ‘ARNold’ algorithm
(http://rna.igmors.u-psud.fr/toolbox/arnold/index.php), as well as the ‘RibEx’ server
(http://132.248.32.45/cgi-bin/ribex.cgi) (Abreu-Goodger and Merino, 2005).
Consensus secondary structures have been predicted for regions homologous to two
predicted Rho-independent terminators from C. diphtheriae Va01 strain spaC with the
help of the CentroidFold algorithm (Hamada et al., 2009) (http://www.ncrna.org/
centroidfold/).
Promoter and Rho-independent terminator predictions have been performed in the 11 spaC
coding regions from C. diphtheriae strains and in their homologues from other
Corynebacterium species.
The graph demonstrating GC-content distribution between three codon positions
along the length of C. diphtheriae NCTC 13129 strain chromosome has been built
with the help of the ‘Chore Viewer’ algorithm (Khrustalev and Barkovsky, 2012)
(http://chemres.bsmu.by/CRV.htm). As an input that algorithm uses a record from the
Codon Usage Database (Nakamura et al., 2000) (www.kazusa.or.jp/codon) describing
codon usage in each of the coding regions along the length of a genome.
The graphs demonstrating GC-content distribution in three codon positions along the
length of C. ulcerans 809, C. pseudotuberculosis 258 and C. casei LMG S-19264
genomes have been created with the help of the ‘VVK Protective Buffer’ algorithm
(http://chemres.bsmu.by/VVK%20Protective%20buffer.htm). That algorithm calculates
a number of indices describing nucleotide usage (including G+C, 1GC, 2GC and 3GC)
and amino acid usage in a set of protein coding regions (it can work with 100 sequences
of no more than 10000 nucleotides in length each simultaneously).
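For a single coding region, the four GC indices named above can be computed as in the following sketch. It is an illustration only, assuming that 1GC, 2GC and 3GC are the percentages of G and C in the first, second and third codon positions and that G+C is the overall percentage; the function name is hypothetical and the amino acid usage part of the tool is not covered.

```python
def gc_by_codon_position(cds):
    """Return 1GC, 2GC, 3GC and total G+C (all in %) for an in-frame coding region."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    if n_codons == 0:
        return {}
    result = {}
    for pos in range(3):
        gc = sum(1 for k in range(n_codons) if cds[3 * k + pos] in "GC")
        result[f"{pos + 1}GC"] = 100.0 * gc / n_codons
    total_gc = sum(1 for base in cds[:3 * n_codons] if base in "GC")
    result["G+C"] = 100.0 * total_gc / (3 * n_codons)
    return result


# Toy coding region with two codons, GAT and GCC.
print(gc_by_codon_position("GATGCC"))
```

Applying the function to every coding region of a genome, in chromosomal order, gives the kind of per-gene profile plotted along a genome.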
We searched for additional examples of coding regions with heterogeneous 3GC
distribution in complete genomes of C. diphtheriae NCTC 13129, C. ulcerans 809, C.
pseudotuberculosis 258 and C. casei LMG S-19264. Each gene has been studied with the
help of the ‘VVTAK Sliding Window’ algorithm with sliding window length equal to
150 codons. We calculated the difference between the 3GC levels in the windows with the highest and
the lowest 3GC (∆3GC150) for each of those genes. Genes were sorted according to
that value for each genome.
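The ∆3GC150 screen can be sketched as follows. This is a hedged example: the descending sort order and the decision to skip genes shorter than one window are assumptions made for the illustration, and the names `delta_3gc` and `rank_genes` are hypothetical.

```python
def delta_3gc(cds, window=150):
    """Highest minus lowest 3GC (%) among all windows; None if the gene is too short."""
    cds = cds.upper()
    is_gc = [1 if cds[i + 2] in "GC" else 0
             for i in range(0, len(cds) - len(cds) % 3, 3)]
    if len(is_gc) < window:
        return None
    prefix = [0]
    for value in is_gc:
        prefix.append(prefix[-1] + value)
    values = [100.0 * (prefix[s + window] - prefix[s]) / window
              for s in range(len(is_gc) - window + 1)]
    return max(values) - min(values)


def rank_genes(genes, window=150):
    """Sort genes (a name -> sequence mapping) by decreasing delta 3GC."""
    scored = [(name, delta_3gc(seq, window)) for name, seq in genes.items()]
    scored = [(name, d) for name, d in scored if d is not None]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy genes with a 4-codon window: one homogeneous, one heterogeneous.
toy_genes = {"flat": "AAA" * 8, "mixed": "AAG" * 4 + "AAA" * 4}
print(rank_genes(toy_genes, window=4))  # [('mixed', 100.0), ('flat', 0.0)]
```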
3 Results
3.1 Distribution of GC-content in three codon positions along
the length of spaC from C. diphtheriae NCTC 13129
As one can see in Figure 1, 3GC distribution along the length of the gene coding for spaC
is quite heterogeneous. In the first 400 codons of this gene 3GC level varies
approximately from 45 to 50% (average level for all the sliding windows is equal to
45.51±0.36%). In the next part of that gene (from codon #401 to codon #1599) 3GC level
becomes lower: it varies around the level of 40% (average level is equal to 38.72±0.20%).
Interestingly, in the last part of the gene coding for spaC (it starts from the codon #1600)
3GC becomes much higher (64.61±0.91%). That last (GC-rich) part of
the gene coding for spaC is rather long (273 codons), even though it is shorter than the
two previous parts.
Figure 1  GC-content in three codon positions (1GC, 2GC and 3GC) in sliding windows
150 codons in length along the spaC gene from C. diphtheriae NCTC 13129
There are proteins homologous to C. diphtheriae spaC in other species from the
Corynebacterium genus. 3GC is higher than 50% in spaC homologues from C. casei, C.
ulcerans, and C. pseudotuberculosis.
The data described above allow us to reject the hypothesis of complete spaC lateral
transfer to C. diphtheriae from species with low GC-content. First, 3GC is high in its
close relatives from other species of the same genus. Second, 3GC is not decreased in
the last part of C. diphtheriae spaC: it is even higher than that in homologous regions
of C. ulcerans and C. pseudotuberculosis. Theoretically, partial lateral transfer might
have happened, although the probability of such an event is not very high (Boc et al., 2012).
3.2 Nucleotide usage biases in fourfold and twofold degenerated sites from
third codon positions along the length of C. diphtheriae NCTC 13129 spaC
More specific changes in nucleotide content distribution along the length of the spaC
gene from C. diphtheriae NCTC 13129 are represented in Figure 2.
In Figure 2A one can see that in the middle of the spaC gene adenine content in
fourfold degenerated sites (A4f) is much higher than guanine content (G4f). In the
beginning of the coding region A4f is close to G4f, while at the end of the gene G4f is
much higher than A4f (see Figure 2A). A similar tendency can be observed in Figure 2C.
The usage of T4f is higher than the usage of C4f along most of the spaC gene.
However, there are peaks of C4f usage near the codons #200, #400 and #1600
(Figure 2B). In twofold degenerated sites the usage of thymine (T2f3p) is much higher
than the usage of cytosine (C2f3p) in the middle part of the spaC coding region
(Figure 2D).
Figure 2  Guanine and adenine (A, C), and cytosine and thymine (B, D) content in fourfold
degenerated (A, B) and twofold degenerated sites from third codon positions (C, D) in
sliding windows 150 codons in length along the spaC gene from C. diphtheriae NCTC
13129
According to the nucleotide usage biases along the length of the spaC gene, there should
be mutational AT-pressure in the middle part of that gene and G-pressure in its 3’-end.
We separated the spaC coding region into three parts. The first part ends at the codon
#400, according to the positions of intersection points for G4f and A4f (codon #350), for
C4f and T4f (codon #450), for G2f3p and A2f3p (codon #300) and for C2f3p and T2f3p
(codon #500). The last part starts from the codon #1600 where lines corresponding to G4f
and A4f usages (and G2f3p and A2f3p usages) cross each other making 3GC level higher
than 50% for the 3’-end of the gene.
3.3 Evidence of the ongoing mutational AT-pressure in the middle part of C.
diphtheriae spaC and the ongoing mutational G-pressure in its 3'-end
In the next step of the study we used 11 sequences of spaC. In Table 1 one can see that
levels of A4f, T4f, A2f3p and T2f3p are higher in invariable sites (in sites without
nucleotide mutations) than in all the stable sites from third codon positions (in sites which
stay twofold degenerated or fourfold degenerated in all the sequences from the alignment)
of the spaC Part 1 (from codon #1 to codon #400). In contrast, levels of G4f, C4f, G2f3p
and C2f3p are lower in invariable sites than in all the stable sites from third codon
positions of the spaC Part 1. These results confirm that cytosine and guanine residues are
more mutable than adenine and thymine in that part of spaC. In other words, there is an
ongoing mutational AT-pressure in the spaC Part 1.
Table 1  Nucleotide content in fourfold and twofold degenerated sites from third codon
positions in three parts of spaC gene. The usage of nucleotide in invariable sites is
written in bold underlined type if it is significantly higher than the average
usage of the same nucleotide in all the stable sites. Insignificant differences are written
in italic font

                  Part 1: codons 1 – 400        Part 2: codons 401 – 1599     Part 3: codons 1600 – 1873
Nucleotide usage  Invariable  Stable   P-value  Invariable  Stable   P-value  Invariable  Stable   P-value
                  sites       sites             sites       sites             sites       sites
A4f               23.94       22.60    0.032    29.02       29.10    0.594    6.41        7.69     0.003
T4f               40.85       32.88    < 10^-3  33.16       31.48    < 10^-3  30.77       33.55    < 10^-3
G4f               18.31       19.24    0.045    19.69       18.93    0.142    42.31       36.94    < 10^-3
C4f               16.90       25.28    < 10^-3  18.13       20.45    < 10^-3  20.51       21.82    0.005
A2f3p             29.33       23.57    < 10^-3  32.27       29.06    < 10^-3  5.66        8.09     < 10^-3
T2f3p             33.33       31.36    < 10^-3  38.65       33.88    < 10^-3  18.87       21.42    0.015
G2f3p             14.67       18.80    < 10^-3  11.95       15.94    < 10^-3  56.60       45.33    < 10^-3
C2f3p             22.67       26.27    < 10^-3  17.13       21.13    < 10^-3  18.87       25.16    < 10^-3
In the Part 2 of spaC (from codon #401 to codon #1599) level of T4f is significantly
higher in invariable sites than in all the stable sites, while level of C4f is significantly
lower (see Table 1). There are no significant differences between levels of A4f and G4f in
invariable and all the stable sites. One may conclude that cytosine residues are more
mutable than thymine residues. The level of thymine keeps growing in fourfold
degenerated sites, while the level of adenine does not. Probably, the levels of adenine and
guanine have already reached their equilibrium. On the other hand, there is clear
evidence of the ongoing mutational AT-pressure in twofold degenerated sites: levels of
A2f3p and T2f3p are higher in invariable sites than in all the stable sites.
In the Part 3 of spaC (from codon #1600 to codon #1873) the frequency of guanine
mutations should be lower than frequencies of adenine, thymine and cytosine mutations.
Indeed, the usage of G4f in invariable sites is significantly higher than its average usage
in all the stable fourfold degenerated sites, while usages of A4f, T4f and C4f are lower in
invariable sites than in all the stable ones. It means that in the Part 3 of spaC mutations
leading to appearance of guanine in place of other nucleotides are more frequent than
mutations leading to the replacement of guanine by other nucleotides. There is clear
evidence of mutational G-pressure in twofold degenerated sites: the level of G2f3p is higher
in invariable sites than in all the stable sites of that kind, while levels of A2f3p and T2f3p
are lower (see Table 1).
The most probable cause of G-pressure in Part 3 of spaC is the elevated
rate of T to G transversions. As one can see in Table 2, 21.88% of variable fourfold
degenerated sites in spaC Part 3 are represented by sites containing G and T: this is more
than two times higher than in Part 2 and more than three times higher than in Part 1.
Interestingly, the percentage of sites containing G and T nucleotides (sites with a transversion)
is even higher than the percentage of sites containing A and G nucleotides (sites with a
transition) in Part 3 of spaC. Once again, according to the biases in nucleotide content
(see Figure 2), those sites occurred mostly due to G to T transversions in Parts 1 and 2,
and mostly due to T to G transversions in Part 3 of the spaC gene.
Table 2  Percentage of different types of variable fourfold degenerated sites in the alignment of
spaC genes.

Type of the     Part 1:           Part 2:              Part 3:
variable site   codons 1 – 400    codons 401 – 1599    codons 1600 – 1873
TC              40.00             27.54                37.50
AG              21.33             28.50                18.75
GC              8.00              4.83                 9.38
AT              4.00              9.18                 9.38
AC              13.33             10.14                3.13
TG              6.67              9.66                 21.88
ATG             0.00              2.90                 0.00
ATC             0.00              4.83                 0.00
AGC             5.33              0.97                 0.00
TGC             1.33              0.97                 0.00
ATGC            0.00              0.48                 0.00
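The classification of variable sites into the types listed in Table 2 can be illustrated with a short sketch. It assumes that each stable fourfold degenerated site is given as one alignment column (one nucleotide per sequence) and that a site type is simply the set of nucleotides observed in that column; the function name and the input format are assumptions made for this example.

```python
from collections import Counter

ORDER = "ATGC"  # fixed order so that labels read TC, AG, TG, ATG and so on


def variable_site_types(columns):
    """Percentage of each type of variable site among all variable sites."""
    types = Counter()
    for column in columns:
        seen = set(column.upper())
        if len(seen) > 1:  # invariable sites are not counted
            types["".join(b for b in ORDER if b in seen)] += 1
    total = sum(types.values())
    return {t: 100.0 * n / total for t, n in types.items()} if total else {}


# Toy columns from four sequences: two T/C sites, one T/G site, one invariable site.
print(variable_site_types(["TTTC", "GGTG", "AAAA", "CTCT"]))
```

A site labelled TG contains only T and G across the alignment, so it records a transversion between those two nucleotides without indicating its direction; the direction has to be inferred from the nucleotide usage biases.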
In general, the data represented above showed that there are at least two regions with
different mutational pressure directions along the length of the same open reading frame.
Moreover, the first of those regions can be divided into two parts with different intensities of
the mutational AT-pressure.
3.4 Prediction of intragenic promoters and terminators in spaC from C.
diphtheriae NCTC 13129 and its homologues
Theoretically, there may be autonomous promoters and transcription terminators inside
the spaC ORF responsible for the differential transcription of its parts. Unfortunately,
large-scale RNA expression data for C. diphtheriae are not currently available,
unlike those for C. glutamicum (Pfeifer-Sancar et al., 2013). So, we tested the hypothesis of
autonomous transcription in silico: we predicted promoter regions and Rho-independent
transcription terminators along the length of spaC from C. diphtheriae and
its homologues from other species of the same genus. Rho-dependent transcription
terminators cannot be predicted using computer software, although it is known that those
terminators are enriched with cytosine residues (Ciampi, 2006).
There are many putative promoter regions along the length of the spaC ORF.
However, there are just two regions in which promoters have been predicted by all the
three methods: near the codon #400 and near the codon #1550. It is important to
highlight that these putative promoters are situated near the areas in which nucleotide
usage biases change their direction or intensity.
There are also two regions in which promoters were predicted by three methods in the
spaC homologue from C. ulcerans strain 0102. These putative promoters are situated in
the same regions as in the C. diphtheriae spaC. Moreover, two abovementioned putative
promoters from C. ulcerans spaC are homologous to promoters from C. diphtheriae
spaC. Interestingly, 3GC between those two predicted promoters is higher than in 5′ and
3′ parts of the spaC gene from C. ulcerans (see Figure 3), in contrast to the C. diphtheriae
spaC gene (see Figure 1).
Figure 3  GC-content in three codon positions (1GC, 2GC and 3GC) in sliding windows
150 codons in length along the spaC gene from C. ulcerans strain 0102
The search for Rho-independent transcription terminators was less productive than the
search for promoters. Two methods (RibEx and Erpin) failed to predict any terminator in
all the sequences studied. However, the RNAmotif method predicted terminators in one
sequence.
Two Rho-independent transcription terminators have been predicted by RNAmotif
method in the spaC gene from C. diphtheriae Va01 strain. In ten other spaC gene
sequences RNAmotif failed to predict terminators. Nucleotide sequences of two predicted
terminators from Va01 strain have been aligned with sequences from other strains.
Homologous regions containing a few nucleotide substitutions have been found in each of
them. It is known that an L-type Rho-independent terminator consists of a short inverted
repeat which is able to form a hairpin and a 3'-tail enriched with thymine residues
(uracil residues in mRNA) (Naville et al., 2011). We predicted consensus secondary
structures of both regions homologous to terminators predicted in Va01 strain spaC with
the help of the CentroidFold algorithm. As one can see in Figure 4, hairpins were
predicted by the CentroidFold in both cases. It means that those hairpins may be formed
not only in spaC from Va01 strain, but in sequences from other strains too. The 3'-tails of
both putative terminators are enriched with uracil residues (see Figure 4).
The first putative Rho-independent terminator (Figure 4A) is located near the codon
#1300, while the second one is located near the codon #1500. The second one is situated
just upstream of the putative promoter region.
The only known feature of sequences which serve as Rho-dependent transcription
terminators is their elevated cytosine content (Ciampi, 2006). Looking at Figures 2B and
2D one may see several regions in which cytosine content is elevated. The highest peak
of cytosine content in twofold degenerated sites is situated near the codon #1650 of C.
diphtheriae spaC (see Figure 2D).
Figure 4  Consensus secondary structures of the regions homologous to Rho-independent
terminator predicted near the codon #1330 (A) and near the codon #1500 (B) of spaC
from the C. diphtheriae Va01 strain
In general, regions in which 3GC increases or decreases along the length of spaC gene
are associated with such sequence features as putative promoters or terminators. These
data confirm that changes in nucleotide usage biases (and so, changes of mutational
pressure direction or intensity) along the length of the same bacterial ORF can be
associated with the existence of intragenic transcription terminators and additional
promoter regions.
4 Discussion
As shown in Figure 5A, 3GC levels of genes in the C. diphtheriae strain NCTC
13129 genome vary greatly. For most genes 3GC lies between 50 and 80%, while
there is a group of genes with 3GC lower than 50%. Most of the GC-poor genes are
grouped together near the region of replication termination. However, there are also many
genes with low 3GC levels outside the borders of the large GC-poor genomic island
(see Figure 5). One of them is the gene coding for the SpaC minor pilin. So, there is a
single large GC-poor island and many small GC-poor islands in the C.
diphtheriae genome. Interestingly, a similar pattern of 3GC distribution along the genome can be
observed in C. pseudotuberculosis (Figure 5B) and C. ulcerans (Figure 5C). However,
the difference between 3GC levels for genes from the GC-poor islands and the rest of the
genome is lower for C. pseudotuberculosis and C. ulcerans than for C.
diphtheriae. The genome of C. casei has no large GC-poor island in the middle, but it has
a GC-rich island with 3GC levels around 85% in its third quarter (Figure 5D).
Figure 5
GC-content in three codon positions (1GC, 2GC and 3GC) in each coding region along
the length of the C. diphtheriae NCTC 13129 (A), C. pseudotuberculosis 258 (B), C.
ulcerans 809 (C) and C. casei LMG S-19264 (D) genomes. The location of the spaC gene is shown
by the arrow
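The three quantities plotted in Figure 5 can be obtained for any coding region with a few lines of code. The following is a minimal sketch, assuming an in-frame DNA sequence given as a string; the function name `gc_by_codon_position` is hypothetical and is not part of the software used in this study.

```python
def gc_by_codon_position(cds):
    """Return (1GC, 2GC, 3GC): the percentage of G or C in the first,
    second and third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # ignore an incomplete final codon
    if usable == 0:
        raise ValueError("sequence shorter than one codon")
    result = []
    for pos in range(3):
        bases = cds[pos:usable:3]         # every third base from offset pos
        result.append(100.0 * sum(b in "GC" for b in bases) / len(bases))
    return tuple(result)


# Two codons, ATG and GCC: one of two first positions, one of two second
# positions and both third positions carry G or C.
print(gc_by_codon_position("ATGGCC"))
```

Applying the function to every annotated coding region of a genome and plotting the values against gene coordinates gives a profile of the kind shown in Figure 5.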
According to a recent study (D’Afonseca et al., 2012), there are 24 pathogenicity
islands in the C. diphtheriae NCTC 13129 genome. How do they relate to the GC-poor
genomic islands? Approximately 50% of the pathogenicity islands are situated in GC-poor
regions of the chromosome. Only one relatively short pathogenicity island has been
identified in the long GC-poor area described above. The other half of the pathogenicity
islands demonstrate elevated GC-content. Interestingly, approximately one half of the small
GC-poor islands contain no genes known to be involved in pathogenesis.
There are 30 genes in the C. diphtheriae strain NCTC 13129 genome which demonstrate a
heterogeneous 3GC distribution along their lengths: the maximal difference between two
sliding windows 150 codons in length (∆3GC150) is higher than 25% (see Table 3).
Interestingly, 14 of them code for membrane-anchored, surface-anchored or
membrane proteins. Theoretically, those proteins may be involved in the adhesion process,
just as SpaC is. The total number of genes encoding membrane, surface-anchored or
membrane-anchored proteins (according to the latest genome annotation) is 86.
A chi-square test showed that the percentage of genes coding for membrane, surface-anchored
and membrane-anchored proteins among those with ∆3GC150 higher than 25%
(14 out of 30; 46.7%) is significantly higher (P < 0.001) than the percentage of genes
coding for membrane, surface-anchored and membrane-anchored proteins among those
with ∆3GC150 lower than 25% (72 out of 2314; 3.11%).
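The comparison above can be rechecked from the four counts given in the text. The sketch below computes a Pearson chi-square statistic for a 2 × 2 table without continuity correction; whether a correction was applied in the original analysis is not stated, so this choice and the function name `chi_square_2x2` are assumptions.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and P value for the
    2 x 2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# C. diphtheriae NCTC 13129: 14 of 30 genes with high heterogeneity and
# 72 of 2314 remaining genes code for membrane or anchored proteins.
chi2, p_value = chi_square_2x2(14, 30 - 14, 72, 2314 - 72)
print(f"chi2 = {chi2:.1f}; P < 0.001: {p_value < 0.001}")
```

The critical value of the chi-square distribution with one degree of freedom at P = 0.001 is 10.83, so any statistic above it supports the significance level reported in the text.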
Table 3
Percentage of genes distributed according to their maximal difference in 3GC between sliding windows 150 codons in length (∆3GC150) in genomes of four species from the Corynebacterium genus

| ∆3GC150, % | C. diphtheriae NCTC 13129 | C. ulcerans 809 | C. casei LMG S-19264 | C. pseudotuberculosis 258 |
|---|---|---|---|---|
| > 40 ≤ 45 | 0.088 | 0.000 | 0.000 | 0.000 |
| > 35 ≤ 40 | 0.080 | 0.000 | 0.036 | 0.000 |
| > 30 ≤ 35 | 0.264 | 0.046 | 0.107 | 0.000 |
| > 25 ≤ 30 | 0.880 | 0.734 | 0.641 | 0.287 |
| > 20 ≤ 25 | 2.817 | 2.615 | 2.136 | 2.059 |
| > 15 ≤ 20 | 8.803 | 10.275 | 9.220 | 8.764 |
| > 10 ≤ 15 | 19.058 | 19.266 | 20.007 | 20.259 |
| > 5 ≤ 10 | 30.766 | 35.229 | 33.571 | 32.519 |
| > 0 ≤ 5 | 15.845 | 14.725 | 14.667 | 14.655 |
| = 0 or < 150 codons | 21.391 | 17.110 | 19.616 | 21.456 |
| Genes coding for membrane, surface-anchored or membrane-anchored proteins (∆3GC150 > 25%) | 14 of 30 | 5 of 17 | 0 of 22 | 0 of 6 |
The spaC gene itself demonstrates a maximal difference between 3GC levels for two sliding
windows of 43.33%. There is just a single gene with a higher ∆3GC150 (44.00%)
in that genome; it codes for a bifunctional alpha-amylase endo-alpha-glucosidase. Some
so-called housekeeping genes (such as those coding for a DNA polymerase III subunit,
DNA methylases, translation initiation factor IF-2, serine-threonine protein kinase and
asparagine synthetase) also demonstrate 3GC heterogeneity in the C. diphtheriae strain
NCTC 13129 genome.
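The ∆3GC150 measure used here and in Table 3 can be sketched as follows. This is not the ‘VVTAK Sliding window’ implementation: the function names are hypothetical, the step of one codon between consecutive windows is an assumption, and genes shorter than one window are returned as None, which corresponds to the "< 150 codons" row of Table 3.

```python
def gc3_windows(cds, window=150, step=1):
    """3GC (%) in every sliding window of `window` codons along a CDS."""
    cds = cds.upper()
    third = [cds[i + 2] in "GC" for i in range(0, len(cds) - len(cds) % 3, 3)]
    prefix = [0]                          # prefix sums of G/C third positions
    for flag in third:
        prefix.append(prefix[-1] + flag)
    return [100.0 * (prefix[s + window] - prefix[s]) / window
            for s in range(0, len(third) - window + 1, step)]


def delta_3gc(cds, window=150, step=1):
    """Maximal difference in 3GC between any two windows (None if the
    sequence is shorter than one window)."""
    values = gc3_windows(cds, window, step)
    return max(values) - min(values) if values else None


# Toy gene: 150 AT-ending codons followed by 150 G-ending codons.
print(delta_3gc("AAA" * 150 + "AAG" * 150))
```

With a window of 150 codons, every attainable 3GC value is a multiple of 1/150, which is consistent with reported values such as 43.33% (65/150) and 44.00% (66/150).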
In the C. ulcerans 809 genome there are 17 genes with ∆3GC150 > 25%, and 5 of
them code for membrane proteins which may be directly involved in
adhesion (see Table 3). A chi-square test showed that the percentage of genes coding for
membrane proteins among those with ∆3GC150 higher than 25% (5 out of 17; 29.41%) is
significantly higher (P < 0.001) than the percentage of genes coding for membrane,
surface-anchored and membrane-anchored proteins among those with ∆3GC150 lower
than 25% (19 out of 2156; 0.88%). Once again, we used the current annotation of the C.
ulcerans 809 genome to determine which of the proteins are membrane, surface-anchored
or membrane-anchored ones.
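Because only 17 genes fall into the high-heterogeneity group, the expected count of membrane protein genes in that group is well below 5, and the chi-square approximation is then usually considered unreliable. As a cross-check that is not part of the original analysis, a one-sided Fisher's exact test can be computed from the same counts. The sketch below uses the hypergeometric distribution directly; the function name `fisher_upper_tail` is hypothetical.

```python
from math import comb


def fisher_upper_tail(k, n, K, N):
    """One-sided Fisher's exact P value: the probability of drawing at least
    k marked items when n items are drawn without replacement from a
    population of N items, K of which are marked."""
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, upper + 1))
    return favourable / total


# C. ulcerans 809: 5 of 17 genes with high heterogeneity and 19 of 2156
# remaining genes code for membrane or anchored proteins.
N = 17 + 2156      # all genes considered
K = 5 + 19         # all genes coding for membrane or anchored proteins
p_value = fisher_upper_tail(5, 17, K, N)
print(f"one-sided P < 0.001: {p_value < 0.001}")
```

The exact test makes no large-sample assumption, so agreement between its result and the chi-square result indicates that the reported significance does not depend on the approximation.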
In contrast, in the C. casei genome there are no genes encoding membrane,
membrane-anchored or surface-anchored proteins among the 22 genes with ∆3GC150 >
25%, while the total number of membrane, surface-anchored and membrane-anchored
proteins (according to the current annotation) is 25. In the C. pseudotuberculosis
genome there are also no genes coding for membrane, surface-anchored or membrane-anchored
proteins among the 6 genes with ∆3GC150 > 25% (see Table 3), while the total
number of genes coding for membrane, surface-anchored or membrane-anchored proteins
annotated in that genome is 24.
Table 3 indicates that cases similar to that
described for the spaC gene are relatively rare among bacteria from the Corynebacterium genus.
In four complete genomes we found 75 genes with anomalous 3GC content variations.
In the genomes of C. diphtheriae and C. ulcerans the percentage of membrane-anchored,
surface-anchored and membrane proteins among the genes with ∆3GC150 > 25% is
higher than that among other genes. This suggests that 3GC heterogeneity may be linked to
the adhesion process. However, that tendency was not found in the C. pseudotuberculosis and C.
casei genomes. In general, heterogeneities in 3GC distribution along the length of genes
of 5–10% are widespread in the four studied genomes (Table 3).
Transcription-associated mutational pressure occurs due to a bias in the rates of
nucleotide mutations which take place during transcription (Beletskii and Bhagwat,
1996). If bacteria can exist in different environments, differential expression of
genes should take place: certain groups of genes should be expressed only in certain periods of
the infection process. During at least one of those periods some genes encoding
enzymes of the base excision repair system may become repressed or overexpressed. As a
result, transcription-associated mutational pressure may differ between a group of
genes expressed only during a particular period of bacterial life and those genes which
are not expressed in that period (Khrustalev and Barkovsky, 2010a). Theoretically,
suppression of uracil-DNA glycosylase(s) expression may lead to the accumulation of C
to T transitions (Gros et al., 2002) in genes (and their parts) expressed during a certain
period of bacterial life and cause transcription-associated AT-pressure. In contrast,
overexpression of the MutY enzyme (it excises adenine from 8-oxo-G:A mispairs and
leads to T to G transversions) during certain periods of bacterial life may lead to
transcription-associated GC-pressure (Gros et al., 2002). In the 3'-end of the spaC
gene, T to G transversions are frequent only in the non-transcribed strand. Probably, 8-oxo-G
residues are removed effectively from the transcribed strand by transcription-associated
repair during the phase of autonomous expression of the spaC 3'-end.
There are several different environments in which C. diphtheriae is able to survive:
nasopharyngeal epithelium, skin (Orouji et al., 2012) and the cytoplasm of host cells (during
invasive infection of inner organs) (Viguetti et al., 2012). There are also asymptomatic
carriers of C. diphtheriae in which the bacteria should survive but should not express genes
responsible for the onset of acute infection with clear clinical symptoms.
The possibility of autonomous transcription from the inner area of a coding region has
been discussed in several studies. Specific areas enriched with putative promoters have
been found in bacterial genomes using predictive software (Shavkunov et al., 2009).
Most of the putative intragenic promoters were considered to direct the synthesis of short RNAs
(Tutukina et al., 2007). Transcription from some of those promoters was confirmed
experimentally (Shavkunov et al., 2009). Transcription beginning from intragenic
promoters is used in archaeal genomes even more frequently than transcription beginning
from intergenic regions (Koide et al., 2009). The existence of intragenic transcription
terminators was confirmed in many experimental works (Naville and Gautheret, 2009).
There are two putative transcription start sites inside the long open reading frame of C.
diphtheriae spaC. If transcription starts from the promoter near codon #400
or near codon #1550 of spaC, the resulting proteins will lack the N-terminal domain
which is homologous to von Willebrand factor. It was shown that a recombinant protein
containing only the domain of SpaC homologous to von Willebrand factor is able to
interact with human pharyngeal epithelium cells (Mandlik et al., 2007). In case of
transcription start from intragenic promoters, the resulting proteins may not be able to
promote adhesion to pharyngeal epithelium, while they may theoretically promote
adhesion to some other surfaces.
If transcription ends near codon #1300 or #1500, where Rho-independent
transcriptional terminators were mapped, the resulting transcript may be
nonfunctional, since there is no stop codon at its 3'-end. On the other hand, tmRNA may
be involved in releasing an mRNA without a stop codon from the ribosome (Fu et al.,
2011). In this case, the resulting protein will lack its C-terminal part with the sortase motif.
The protein will not be transported to the cell wall without that conserved motif (Mandlik
et al., 2007).
Full-length spaC is expressed together with other genes of the spaA-type pili operon,
at least in the period of C. diphtheriae infection when adhesion to the
pharyngeal epithelium is needed. Since the 3GC level is low in the middle of the spaC gene (from
codon #400 to codon #1600), it is likely that this part of the coding region may be expressed
autonomously under certain conditions. Transcription of this long mRNA may begin from the
promoter near codon #400 and end in one of the two regions with Rho-independent
terminators (near codon #1300 or #1500), or in the region enriched with cytosine residues
near codon #1600 (it may contain a Rho-dependent terminator). A protein without any
known functional domains or motifs may be translated from that mRNA with the help of
tmRNA.
Different types of proteins may theoretically be translated from different mRNAs
transcribed from a single spaC ORF when C. diphtheriae survives in different conditions.
However, transcription-associated mutational AT-pressure should take place during
the suggested autonomous transcription of the central spaC part, while the suggested autonomous
expression of its 3'-end should take place during the period(s) with transcription-associated
G-pressure.
Nucleotide mutations in putative promoters, in transcription factor binding sites, as
well as in Rho-dependent and Rho-independent transcription terminators inside the spaC
gene may cause differences in the success of adhesion to certain surfaces between
different strains of C. diphtheriae.
According to the results of phylogenetic analyses (Ruimy et al., 1995), C. ulcerans
and C. pseudotuberculosis are the closest known relatives of C. diphtheriae. Interestingly,
C. ulcerans and C. pseudotuberculosis are more closely related to each other than to C.
diphtheriae. The fourth genome from the Corynebacterium genus which has a spaC homologue
belongs to C. casei, which is an outgroup for the three abovementioned species
(Brennan et al., 2001). According to these data, the specific large GC-poor island
was formed in the common ancestor of C. diphtheriae, C. ulcerans and C.
pseudotuberculosis, while the ancestor of C. casei did not have such a large GC-poor
area. This indicates that different transcription-associated mutational pressure
directions existed during an evolutionary period long enough to make 3GC
levels significantly lower, while that period was not long enough to decrease 1GC and
2GC in genes from the GC-poor area (see Figure 5A, B and C). Moreover, genes coding
for membrane, surface-anchored and membrane-anchored proteins (including spaC) had
already been involved in autonomous transcription in the common ancestor of C.
diphtheriae, C. ulcerans and C. pseudotuberculosis. Since the divergence of C. ulcerans and C.
pseudotuberculosis, the genes coding for membrane, surface-anchored and
membrane-anchored proteins of C. pseudotuberculosis have acquired a more homogeneous
3GC distribution along their lengths. Returning to the C. casei genome, we have to
highlight that heterogeneity in 3GC distribution can be clearly observed along the length
of both the whole genome and several genes of this bacterium, while the pattern of its
3GC heterogeneity is different from that in C. diphtheriae.
The cause of 3GC heterogeneity along a coding region may be different: it can be
formed by the insertion of gene fragments from different species or by
transcription-associated mutational pressure. As we have shown in this study, local
transcription-associated mutational pressure may become weaker and even disappear or change its
direction. The ‘VVTAK VarInvar’ algorithm is necessary for answering the question
about the direction of ongoing mutational pressure in different parts of the same gene or
in different genes from the same genome. The aim of the algorithm is to separate the current
mutational bias from previously existing nucleotide usage biases.
5 Conclusions
In this study we showed that mutational pressure theory (Sueoka, 1988) can be applied
not only to complete bacterial genomes or full-length genes, but even to different parts of
the same bacterial open reading frame. The preferred direction of single nucleotide mutations
may differ between parts of the same coding region. Indeed, we confirmed the
existence of ongoing AT-pressure in the 5'-part of the C. diphtheriae spaC gene (up to
codon #1599), and the existence of ongoing G-pressure in its 3'-part (from codon #1600
to codon #1873) with the help of original bioinformatic algorithms.
The repertoire of adhesins of C. diphtheriae and C. ulcerans may be wider than is
currently thought, due to the possibility of transcription from intragenic promoters inside
spaC (and several other genes coding for membrane, surface-anchored and membrane-anchored
proteins) and termination of transcription at intragenic Rho-dependent or Rho-independent
terminators.
Bioinformatic analyses of nucleotide usage biases along the length of bacterial
genome and even along the length of a single gene may be used as promising procedures
in studies on transcription regulation and differential transcription.
References
Abreu-Goodger, C. and Merino, E. (2005) ‘RibEx: a web server for locating riboswitches and other
conserved bacterial regulatory elements’, Nucleic Acids Research, Vol. 33, pp.W690–W692.
Beletskii, A. and Bhagwat, A.S. (1996) ‘Transcription-induced mutations: Increase in C to T
mutations in the nontranscribed strand during transcription in Escherichia coli’, Proceedings of
the National Academy of Sciences USA, Vol. 93, pp.13919–13924.
Boc, A., Diallo, A.B. and Makarenkov, V. (2012) ‘T-REX: a web server for inferring,
validating and visualizing phylogenetic trees and networks’, Nucleic Acids Research, Vol. 40,
pp.W573–W579.
Brennan, N.M., Brown, R., Goodfellow, M., Ward, A.C., Beresford, T.P., Simpson, P.J., Fox, P.F.
and Cogan, T.M. (2001) ‘Corynebacterium mooreparkense sp. nov. and Corynebacterium
casei sp. nov., isolated from the surface of a smear-ripened cheese’, International Journal of
Systematic and Evolutionary Microbiology, Vol. 51, pp.843–852.
Cerdeño-Tárraga, A.M., Efstratiou, A., Dover, L.G., Holden, M.T, Pallen, M., Bentley, S.D., Besra,
G.S., Churcher, C., James, K.D., De Zoysa, A., Chillingworth, T., Cronin, A., Dowd, L.,
Feltwell, T., Hamlin, N., Holroyd, S., Jagels, K., Moule, S., Quail, M.A., Rabbinowitsch, E.,
Rutherford, K.M., Thomson, N.R., Unwin, L., Whitehead, S., Barrell, B.G. and Parkhill, J.
(2003) ‘The complete genome sequence and analysis of Corynebacterium diphtheriae
NCTC13129’, Nucleic Acids Research, Vol. 31, pp.6516–6523.
Chang, C., Mandlik, A., Das, A. and Ton-That, H. (2011) ‘Cell surface display of minor pilin
adhesins in the form of a simple heterodimeric assembly in Corynebacterium diphtheriae’,
Molecular Microbiology, Vol. 79, pp.1236–1247.
Chen, C. and Chen, C.W. (2007) ‘Quantitative analysis of mutation and selection pressures on base
composition skews in bacterial chromosomes’, BMC Genomics, Vol. 8, No. 286.
Ciampi, M.S. (2006) ‘Rho-dependent terminators and transcription termination’, Microbiology,
Vol. 152, pp.2515–2528.
D’Afonseca, V., Soares, S.C., Ali, A., Santos, A.R., Pinto, A.C., Magalhaes, A.A.C., Faria, C.J.,
Barbosa, E., Guimaraes, L.C., Eslabao, M., Almeida, S.S., Abreu, V.A.C., Zerlotini, A.,
Carneiro, A.R., Cerdeira, L.T., Ramos, R.T.J., Hirata, R. Jr., Mattos-Guaraldi, A.L., Trost, E.,
Tauch, A., Silva, A., Schneider, M.P., Miyoshi, A. and Azevedo, V. (2012) ‘Reannotation of
the Corynebacterium diphtheriae NCTC13129 genome as a new approach to studying gene
targets connected to virulence and pathogenicity in diphtheria’, Open Access Bioinformatics,
Vol. 4, pp.1–13.
Fu, J., Hashem, Y., Wower, J. and Frank, J. (2011) ‘tmRNA on its way through the ribosome:
two steps of resume and what next?’ RNA Biology, Vol. 8, pp.586–590.
Gautheret, D. and Lambert, A. (2001) ‘Direct RNA motif definition and identification from
multiple sequence alignments using secondary structure profiles’, Journal of Molecular
Biology, Vol. 313, pp.1003–1011.
Gros, L., Saparbaev, M.K. and Laval, J. (2002) ‘Enzymology of the repair of free radicals-induced
DNA damage’, Oncogene, Vol. 21, pp.8905–8925.
Hamada, M., Kiryu, H., Sato, K., Mituyama, T. and Asai, K. (2009) ‘Prediction of RNA secondary
structure using generalized centroid estimators’, Bioinformatics, Vol. 25, pp.465–473.
Khrustalev, V.V. and Barkovsky, E.V. (2010a) ‘Study of completed archaeal genomes and
proteomes: hypothesis of strong mutational AT pressure existed in their common predecessor’,
Genomics, Proteomics & Bioinformatics, Vol. 8, pp.22–32.
Khrustalev, V.V. and Barkovsky, E.V. (2010b) ‘The level of cytosine is usually much higher than
the level of guanine in two-fold degenerated sites from third codon positions of genes from
Simplex- and Varicelloviruses with G+C higher than 50%’, Journal of Theoretical Biology,
Vol. 266, pp.88–98.
Khrustalev, V.V. and Barkovsky, E.V. (2012) ‘A blueprint for a mutationist theory of replicative
strand asymmetries formation’, Current Genomics, Vol. 13, pp.55–64.
Khrustalev, V.V., Barkovsky, E.V., Khrustaleva, T.A. and Lelevich, S.V. (2014) ‘Intragenic
isochores (intrachores) in the platelet phosphofructokinase gene of Passeriform birds’, Gene,
Vol. 546, pp.16–24.
Koide, T., Reiss, D.J., Bare, J.C. et al. (2009) ‘Prevalence of transcription promoters within
archaeal operons and coding sequences’, Molecular Systems Biology, Vol. 5, p.285.
Lesnik, E.A., Sampath, R., Levene, H.B., Henderson, T.J., McNeil, J.A. and Ecker, D.J. (2001)
‘Prediction of Rho-independent transcriptional terminators in Escherichia coli’, Nucleic Acids
Research, Vol. 29, pp.3583–3594.
Lobry, J.R. and Sueoka, N. (2002) ‘Asymmetric directional mutation pressures in bacteria’,
Genome Biology, Vol. 3.
Macke, T., Ecker, D., Gutell, R., Gautheret, D., Case, D.A. and Sampath, R. (2001) ‘RNAMotif – a
new RNA secondary structure definition and discovery algorithm’, Nucleic Acids Research,
Vol. 29, pp.4724–4735.
Mandlik, A., Swierczynski, A., Das, A. and Ton-That, H. (2007) ‘Corynebacterium diphtheriae
employs specific minor pilins to target human pharyngeal epithelial cells’, Molecular
Microbiology, Vol. 64, pp.111–124.
Nakamura, Y., Gojobori, T. and Ikemura, T. (2000) ‘Codon usage tabulated from the international
DNA sequence databases: status for the year 2000’, Nucleic Acids Research, Vol. 28, p.292.
Naville, M. and Gautheret, D. (2009) ‘Transcription attenuation in bacteria: theme and variations’,
Briefings in Functional Genomics & Proteomics, Vol. 8, pp.482–492.
Naville, M., Ghuillot-Gaudefroy, A., Marchais, A. and Gautheret, D. (2011) ‘ARNold: A web
tool for the prediction of Rho-independent transcription terminators’, RNA Biology, Vol. 8,
pp.11–13.
Nei, M. and Kumar, S. (2000) Molecular Evolution and Phylogenetics, Oxford University Press,
New York.
Orouji, A., Kiewert, A., Filser, T., Goerdt, S. and Peitsch, W.K. (2012) ‘Cutaneous diphtheria in a
German man with travel history’, Acta Dermato Venereologica, Vol. 92, pp.179–180.
Pfeifer-Sancar, K., Mentz, A., Ruckert, C. and Kalinowski, J. (2013) ‘Comprehensive analysis
of the Corynebacterium glutamicum transcriptome using an improved RNAseq technique’,
BMC Genomics, Vol. 14, No. 888.
Rangannan, V. and Bansal, M. (2009) ‘Relative stability of DNA as a generic criterion for promoter
prediction: whole genome annotation of microbial genomes with varying nucleotide base
composition’, Molecular BioSystems, Vol. 5, pp.1758–1769.
Reese, M.G. (2001) ‘Application of a time-delay neural network to promoter annotation in the
Drosophila melanogaster genome’, Computers and Chemistry, Vol. 26, pp.51–56.
Ruimy, R., Riegel, P., Boiron, P., Monteil, H. and Christen, R. (1995) ‘Phylogeny of the genus
Corynebacterium deduced from analyses of small-subunit ribosomal DNA sequences’,
International Journal of Systematic Bacteriology, Vol. 45, pp.740–746.
Shavkunov, K.S., Masulis, I.S., Tutukina, M.N., Deev, A.A. and Ozoline, O.N. (2009) ‘Gains and
unexpected lessons from genome-scale promoter mapping’, Nucleic Acids Research, Vol. 37,
pp.4919–4931.
Sueoka, N. (1988) ‘Directional mutation pressure and neutral molecular evolution’, Proceedings of
the National Academy of Sciences USA, Vol. 85, pp.2653–2657.
Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M. and Kumar, S. (2011) ‘MEGA5:
molecular evolutionary genetics analysis using maximum likelihood, evolutionary
distance, and maximum parsimony methods’, Molecular Biology and Evolution, Vol. 28,
pp.2731–2739.
Ton-That, H. and Schneewind, O. (2003) ‘Assembly of pili on the surface of Corynebacterium
diphtheriae’, Molecular Microbiology, Vol. 50, pp.1429–1438.
Trost, E., Blom, J., Soares, S.C., Huang, I.H., Al-Dilaimi, A., Schreder, J., Jaenicke, S., Dorella,
F.A., Rocha, F.S., Miyoshi, A., Azevedo, V., Schneider, M.P., Silva, A., Camello, T.C.,
Sabbadini, P.S., Santos, C.S., Santos, L.S., Hirata, R. Jr., Mattos-Guaraldi, A.L., Efstratiou, A.,
Schmitt, M.P., Ton-That, H. and Tauch, A. (2012) ‘Pan-genomics of Corynebacterium
diphtheriae: insights into the genomic diversity of pathogenic isolates from cases of classical
diphtheria, endocarditis and pneumonia’, Journal of Bacteriology, Vol. 194, pp.3199–3215.
Tutukina, M.N., Shavkunov, K.S., Masulis, I.S. and Ozoline, O.N. (2007) ‘Intragenic promotor-like
sites in the genome of Escherichia coli discovery and functional implication’, Journal of
Bioinformatics and Computational Biology, Vol. 5, pp.549–560.
Viguetti, S.Z., Pacheco, L.G.C., Santos, L.S., Soares, S.C., Bolt, F., Baldwin, A., Dowson, C.G.,
Rosso, M.L., Guiso, N., Miyoshi, A., Hirata, R. Jr., Mattos-Guaraldi, A.L. and Azevedo, V.
(2012) ‘Multilocus sequence types of invasive Corynebacterium diphtheriae isolated in the
Rio de Janeiro urban area, Brazil’, Epidemiology & Infection, Vol. 140, pp.617–620.