Journal of General Virology (2007), 88, 2137–2143
DOI 10.1099/vir.0.82906-0
Independent evolution of overlapping polymerase
and surface protein genes of hepatitis B virus
Hans L. Zaaijer,1 Formijn J. van Hemert,2 Marco H. Koppelman3
and Vladimir V. Lukashov2,4
Correspondence
Vladimir V. Lukashov
[email protected]
1 Laboratory of Clinical Virology, Department of Medical Microbiology, Center for Infection and
Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam,
The Netherlands
2 Laboratory of Experimental Virology, Department of Medical Microbiology, CINIMA, Academic
Medical Center, University of Amsterdam, Amsterdam, The Netherlands
3 Department of Virology, Sanquin, Amsterdam, The Netherlands
4 Laboratory of Immunochemistry, D. I. Ivanovsky Institute of Virology, Russian Academy of Medical
Sciences, Moscow, Russia
Received 6 February 2007
Accepted 19 April 2007
The genome of hepatitis B virus (HBV) provides a striking example of gene overlapping. In
particular, the surface protein gene S is overlapped completely by the polymerase gene P.
Evolutionary constraints in overlapping genes have been demonstrated for many viruses, with
one of the two overlapping genes being subjected to positive selection (adaptive evolution), while
the other one is subjected to purifying selection. Yet, for HBV to persist successfully, adaptive
evolution of both the P and S genes is essential. We propose that HBV employs a mechanism that
allows the independent adaptive evolution of both genes. We hypothesize that (i) the adaptive
evolution of P occurs via p1/s3 non-synonymous substitutions, which are synonymous in S, (ii) the
adaptive evolution of S occurs via p3/s2 non-synonymous substitutions, which are synonymous
in P, and (iii) p2/s1 substitutions are rare. Analysis of 450 HBV sequences demonstrated that
this mechanism is operational in HBV evolution both within and among genotypes. Positions were
identified in both genes where adaptive evolution is operational. Whilst significant parts of the
P and S genes were subjected to positive selection, with the Ka/Ks ratio for either the P or the S
gene being >1, there were only a few regions where the Ka/Ks ratios in both genes were >1.
This mechanism of independent evolution of the overlapping regions could also apply to other
viruses, taking into account the increased frequency of amino acids with a high level of
degeneracy in the proteins encoded by overlapping genes of viruses.
INTRODUCTION
Overlapping of genes is a strategy used widely by viruses to
condense a maximal amount of information into short
genomes (Barrell et al., 1976; reviewed by Gibbs & Keese,
1995). The circular genome of hepatitis B virus (HBV)
provides a striking example of gene overlapping, with 50 %
of the genome consisting of overlapping reading frames
(Fig. 1). In particular, the HBV surface protein gene (S;
681 nt) is overlapped completely by the polymerase gene
(P; 2532 nt).
Gene overlapping influences the independent evolution of
the encoded proteins. Recently, evolution of overlapping
versus non-overlapping regions of virus genomes has been
compared for several viruses, including papillomaviruses
(Hughes & Hughes, 2005; Narechania et al., 2005),
bacteriophages from the family Microviridae (Pavesi,
2006), potato leafroll virus (Guyader & Ducray, 2002),
simian immunodeficiency virus (SIV) (Hughes et al., 2001)
and HBV (Mizokami et al., 1997). Mizokami et al. (1997)
demonstrated decreased rates of synonymous nucleotide
substitutions in overlapping versus non-overlapping
regions of the HBV genome, providing evidence for
evolutionary constraints associated with overlapping regions.
This trend was also found in other viruses. Moreover, for
SIV, potato leafroll virus, members of the family
Microviridae and papillomaviruses, the overlapping regions were
characterized by adaptive evolution (high rates of
non-synonymous substitutions, Dn/Ds ratios >1) of one reading
frame, mirrored by negative selection (high rates of
synonymous substitutions, Dn/Ds ratios <1) operational in
the other reading frame, a trend considered to be general
for the evolution of the overlapping regions.

A list of genotypes and GenBank accession numbers for the human
HBV sequences used in this study is available with the online version of
this paper.

0008-2906 © 2007 SGM
Printed in Great Britain
Downloaded from www.microbiologyresearch.org by
IP: 88.99.165.207
On: Fri, 16 Jun 2017 07:48:10

Fig. 1. The HBV genome. The P, S, X and C genes encode the
polymerase, surface, X- and core proteins, respectively. The
YMDD motif is involved in resistance to lamivudine. Vaccine-induced
antibodies and specific immunoglobulins neutralize the
virus via the a-determinant of the surface protein.

Fig. 2. Frame-shifted position of the overlapping HBV polymerase
and surface genes. (a) A nucleotide substitution in the third codon
position is likely to be synonymous, allowing a non-synonymous
substitution in the other reading frame (arrows). (b) Example of
point mutations in an overlapping, frame-shifted sequence of the
HBV genome, being simultaneously synonymous in the polymerase
and non-synonymous in the surface protein.

(a) Nucleotide position in codons
Polymerase gene:   1 2 3 1 2 3 1 2 3
Surface gene:      3 1 2 3 1 2 3 1 2

(b)          Nucleotide sequence   Polymerase codon   Surface codon
Wild type:   ..CTTT..              CTT = Leu          TTT = Phe
Mutants:     ..CTCT..              CTC = Leu          TCT = Ser
             ..CTAT..              CTA = Leu          TAT = Tyr
             ..CTGT..              CTG = Leu          TGT = Cys
However, for HBV to persist successfully and to overcome
new challenges, adaptive evolution of both the overlapping
P and S genes is essential. In particular, adaptive evolution
of the P gene in response to antiviral therapy renders the
virus resistant to antiviral drugs, and mutations in the S
gene allow virus escape from neutralizing antibodies
(Cooreman et al., 2001; Hannoun et al., 2000; Moskovitz
et al., 2005). In this study, we analysed the possibility that
the HBV polymerase and surface proteins evolve independently, despite being encoded by the same nucleotide
sequence.
We hypothesized that HBV may employ a mechanism that
allows the independent adaptive evolution of both proteins
encoded by the same sequence. In the overlapping, frame-shifted
polymerase/surface region of the HBV genome, the
first nucleotide position in a P codon corresponds to the
third position in the S codon (p1/s3), the second position
in a P codon to the first position in the S codon (p2/s1)
and the third position in a P codon to the second position
in the S codon (p3/s2) (Fig. 2). The position of a nucleotide
substitution within a codon strongly influences the chance
for the substitution to be synonymous or not. A nucleotide
substitution in the first codon position causes an amino
acid change in 60 of 64 cases, in the second position in 63
of 64 cases and in the third position in only 16 of 64 cases.
Hence, nucleotide substitution in the first position of a P
codon (p1/s3) is likely to cause amino acid changes in P,
but not in S, whereas substitutions in the third position of
a P codon (p3/s2) are probably synonymous in P, but non-synonymous
in S. Thus, an adaptive, non-synonymous
nucleotide substitution in the first position of a P codon is
likely to remain synonymous in S, whereas an adaptive,
non-synonymous nucleotide substitution in the second
position of an S codon is likely to remain synonymous in P.
On the other hand, nucleotide substitutions in the p2/s1
positions in most cases will be non-synonymous in both
genes. We predicted that adaptive evolution of HBV occurs
via p1/s3 mutations in the P gene and via p3/s2 mutations
in the S gene, and that p2/s1 mutations are rare.
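The frame relationship described above can be illustrated with a short sketch. It reproduces the example of Fig. 2(b) under the standard genetic code; the function name `effect_in_both_frames` and the four-nucleotide input convention are illustrative assumptions, not part of the original analysis.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def effect_in_both_frames(four_nt, offset, new_base):
    """Classify one substitution in a 4-nt stretch read in two frames.

    four_nt:  4 nucleotides; the P codon is four_nt[0:3] and the S codon is
              four_nt[1:4] (S is read one nucleotide downstream of P).
    offset:   0-based index of the substituted nucleotide (0..3).
    new_base: replacement nucleotide.
    Returns (P change, S change) as strings such as 'L->L' or 'F->S'.
    """
    mutant = four_nt[:offset] + new_base + four_nt[offset + 1:]
    p_change = f"{CODON_TABLE[four_nt[:3]]}->{CODON_TABLE[mutant[:3]]}"
    s_change = f"{CODON_TABLE[four_nt[1:]]}->{CODON_TABLE[mutant[1:]]}"
    return p_change, s_change


# Wild type ..CTTT..; index 2 is the p3/s2 nucleotide.
for base in "CAG":
    print(base, effect_in_both_frames("CTTT", 2, base))
```

Each of the three substitutions leaves the polymerase codon as leucine while changing the surface codon from phenylalanine to serine, tyrosine or cysteine, which is the pattern the hypothesis relies on for p3/s2 sites.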
METHODS
To validate this hypothesis, we studied variation of nucleotides and
amino acids in the overlapping P and S gene region (227 codons) of
450 internationally obtained HBV genomes. Sequences of human
HBV were retrieved from GenBank, including the 11 HBV genotype
standards as proposed by Bartholomeusz & Schaefer (2004). The
sequences were genotyped by using the HBV STAR program (Myers
et al., 2006). Genotypes and GenBank accession numbers of the 450
HBV sequences are reported in the Supplementary Material, available
in JGV Online. The sequences were aligned by using the CLUSTAL_W
software (Chenna et al., 2003). The variation of 227 aa in the polymerase and surface protein and the variation in the three sets of
nucleotide sites, comprising respectively the 227 p1/s3 sites, the 227
p2/s1 sites and the 227 p3/s2 sites, were studied by analysing the
entropy values for every individual amino acid and nucleotide
position, as implemented in the BioEdit software (Hall, 1999). As a
measure of the variation of amino acids at any given position, the
entropy values theoretically vary from zero (no variation) to 3.04 (all
amino acids and the stop codon occur in equal proportions). For the
variation of nucleotides at a given position, the entropy values range
from zero (no variation) to 1.39 (A, C, G and T occur at an equal
frequency) (Hall, 1999).
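The quoted maxima of 3.04 and 1.39 are consistent with a Shannon entropy computed with the natural logarithm over 21 and 4 equally frequent symbols, respectively. The sketch below assumes that definition; the function names and the treatment of gaps as ordinary symbols are illustrative choices, and BioEdit itself may differ in detail.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of the symbols observed in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def cumulative_entropy_by_codon_position(aligned_seqs):
    """Sum column entropies separately for P-frame codon positions 1, 2 and 3.

    aligned_seqs: equal-length nucleotide strings assumed to start at a
    P-frame codon boundary (hypothetical input format).
    Returns [sum for p1/s3 sites, sum for p2/s1 sites, sum for p3/s2 sites].
    """
    totals = [0.0, 0.0, 0.0]
    for i, column in enumerate(zip(*aligned_seqs)):
        totals[i % 3] += column_entropy(column)
    return totals


# Upper bounds quoted in the text: ln(4) for nucleotides, ln(21) for amino acids plus stop.
print(round(math.log(4), 2), round(math.log(21), 2))
```

Applied to a toy alignment such as `["CTT", "CTC", "CTA", "CTG"]`, all variation falls in the third class (p3/s2), with an entropy equal to ln 4.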
The overall mean synonymous (Ds) and non-synonymous (Dn)
distances in the overlapping region of the P and S genes were calculated
by using the MEGA 3 software (Kumar et al., 2004). The Nei–Gojobori
method with Jukes–Cantor correction was used; standard error was
calculated by using bootstrap resampling with 1000 replications.
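For orientation, the Jukes–Cantor correction mentioned above converts an observed proportion of differences p into a distance d = -(3/4) ln(1 - 4p/3). The sketch below shows only this correction step; the counting of synonymous and non-synonymous sites and differences (the Nei–Gojobori part) is not reproduced, and the function name is an assumption.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites.

    Defined for 0 <= p < 0.75; the correction diverges as p approaches 0.75.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


# Example: an observed proportion of 0.10 corresponds to a slightly larger distance.
print(round(jukes_cantor(0.10), 4))
```

In a Nei–Gojobori calculation the same function would be applied separately to the proportion of synonymous differences (giving Ds) and of non-synonymous differences (giving Dn).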
The CODEML module of PAML 3.15 (Yang, 1997; Yang et al., 2000) was
applied for estimation of the rates of synonymous and non-synonymous
nucleotide substitutions at sites in the overlapping
reading frames of P and S separately by maximum-likelihood (ML)
approximation. The nested site models (1 and 2, 7 and 8) of PAML
were run on an array of 680 computers called LISA, hosted by SARA
Computing and Networking Services, Amsterdam, The Netherlands.
Each computer (node) in the array was provided with two Intel Xeon
processors working at 3.4 GHz on 2–4 GB EM64T memory under OS
Debian Linux. Jobs submitted individually consisted of a single model
preceded by model 0 in order to trace inconsistency of input data.
The approach allowed for the simultaneous estimation of Dn/Ds
parameters by parallel jobs assigned to different nodes of the array.
When parameter trapping near the border of the parameter space
did not occur, it took the most complex model, 8 [default settings
(Yang et al., 2000)], 9–13 h to converge to an optimal likelihood
estimate, given 400 HBV sequences of 226 codons each. For proper
convergence, the presence of identical sequences (Dn/Ds=0/0) should
be avoided. In addition, the use of an initial input tree with
ML-estimated branch lengths (copied from models 0, 1 or 7 into models
2 or 8) may prevent parameter trapping and hence a very slow
convergence of model 2 and 8 values for ML to those of models 1 and
7. Likelihood-ratio tests (LRT) and Bayes Empirical Bayes (BEB)
statistics (Anisimova et al., 2001, 2002; Yang et al., 2005) were applied
as described in the PAML manual
(http://abacus.gene.ucl.ac.uk/software/paml.html).
Analysis of the rate of synonymous substitutions (the number of
synonymous substitutions per synonymous site, Ks) and non-synonymous
substitutions (the number of non-synonymous substitutions per
non-synonymous site, Ka), as well as their ratios (Ka/Ks),
in a sliding window was performed by using the SWAAP 1.0.2
software (http://www.bacteriamuseum.org/SWAAP/SwaapPage.htm).
The Nei–Gojobori distance estimation method was used for a sliding
window (window length, 15 nt; window step, 3 nt, which is the
maximal resolution of the program). Due to the limited number of
sequences that can be analysed simultaneously by the program, 99
sequences, selected randomly from the total of 450, were analysed
each time. So, for the total of 450 sequences, four sets of 99 sequences
and one set of 54 sequences were analysed and the data were pooled,
together giving 1110 window comparisons.
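The window arithmetic can be checked with a small sketch. It assumes that 226 complete codons are available to both reading frames (the length quoted above for the PAML input); with that assumption a 5-codon window moved in 1-codon steps yields 222 windows per subset, and five subsets give the 1110 comparisons. The function name is illustrative.

```python
def window_starts(n_codons, window_codons=5, step_codons=1):
    """0-based start codons of every full window along a region of n_codons codons."""
    return list(range(0, n_codons - window_codons + 1, step_codons))


subset_sizes = [99, 99, 99, 99, 54]          # four sets of 99 and one of 54 sequences
windows_per_subset = len(window_starts(226))  # 15-nt window, 3-nt step

print(sum(subset_sizes), windows_per_subset, windows_per_subset * len(subset_sizes))
```

Under the stated assumption this prints 450, 222 and 1110, matching the totals given in the text.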
RESULTS
Nucleotide variation at the 227 p1/s3, p2/s1 and p3/s2
positions of the overlapping P/S gene region for all 450
sequences is shown in Fig. 3(a). As predicted by our
hypothesis, variation at the p2/s1 nucleotide sites, which,
with high probability, affects the amino acid sequences of
both the polymerase and the surface protein, was the
lowest. The cumulative entropy value for all p2/s1 positions
was 10.5, compared with the cumulative entropy values of
26.7 and 24.3 for the p1/s3 and p3/s2 positions, respectively. A cluster of p2/s1 variation occurred at codons 110–
140, which encode the immunodominant a-determinant of
the surface protein.
As predicted, amino acid variation of the surface protein
was largely determined by nucleotide substitutions at the
p3/s2 sites, whereas amino acid variation of the polymerase
was primarily caused by substitutions of the p1/s3 nucleotides. Some additional variation was imposed on both
proteins by rare p2/s1 mutations. In some instances, p1/s3
nucleotide variation did not cause amino acid changes in
the polymerase (Fig. 3a). This was due to partial degeneration of the first nucleotide position, as four of 64 possible
substitutions are synonymous: TTA–CTA and TTG–CTG
coding for leucine, CGA–AGA and CGG–AGG coding for
arginine. The only substitution at the second nucleotide
position that is synonymous is in stop codons (TAA–
TGA); hence, all p3/s2 nucleotide variation translates into
amino acid changes of the surface protein.
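The synonymous codon pairs named above can be enumerated directly from the standard genetic code. The sketch below counts unordered pairs of codons that differ at exactly one position and encode the same residue (stop codons are treated as one class); this gives 96 possible pairs per position, a different denominator from the "of 64" shorthand used in the text, so only the lists of pairs are meant to be compared. The function name is an assumption.

```python
from itertools import combinations, product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_pairs(position):
    """Unordered codon pairs differing only at `position` (0, 1 or 2) with the same meaning."""
    pairs = []
    for a, b in combinations(sorted(CODON_TABLE), 2):
        differing = [i for i in range(3) if a[i] != b[i]]
        if differing == [position] and CODON_TABLE[a] == CODON_TABLE[b]:
            pairs.append((a, b))
    return pairs


for pos in range(3):
    found = synonymous_pairs(pos)
    print(f"codon position {pos + 1}: {len(found)} synonymous pairs")
print(synonymous_pairs(0))
print(synonymous_pairs(1))
```

The first position yields exactly the four leucine and arginine pairs listed in the text, the second position yields only the stop-codon pair TAA/TGA, and the third position yields 64 pairs.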
To establish whether the difference in variation of the p1/s3
and p3/s2 versus p2/s1 positions is related to HBV
diversification within or among genotypes and to confirm
that our findings are not limited to certain HBV genotypes,
we subsequently analysed nucleotide variation at the p1/s3,
p2/s1 and p3/s2 positions among sequences belonging to
each genotype and among unassigned sequences, as well as
among genotype consensus sequences of genotypes A–H
(Table 1). For every genotype from A to H, as well as for
unassigned sequences, the cumulative entropy values for
the p2/s1 positions were markedly lower than those for the
p1/s3 and p3/s2 positions. For instance, for genotype B,
which was represented by 120 sequences (118 plus two
reference sequences), the cumulative entropy value for the
p2/s1 positions was 2.3, compared with the values of 9.8
and 9.5 for the p1/s3 and p3/s2 positions, respectively. The
same trend was observed for genotype E, which was
represented by five sequences (four plus one reference
sequence): 1.2, compared with 3.0 and 2.2. This pattern
of markedly lower variation at the p2/s1 than at the p1/s3
and p3/s2 positions was also observed when genotype
consensus sequences were compared with each other
(Table 1).
The overall mean Dn and Ds in the overlapping region of
the P and S genes, calculated by using the MEGA program,
were similar for both genes: for P, respectively
0.035±0.005 and 0.109±0.014 (Dn/Ds ratio of 0.32), and
for S, respectively 0.041±0.005 and 0.092±0.015 (Dn/Ds
ratio of 0.45). Hence, negative selection is generally
dominating in the evolution of both genes. However, this
does not preclude certain regions of both genes being
subjected to positive selection. Our analysis of the Ka/Ks
distribution throughout the P and S genes by using the
sliding-window approach has identified these regions (Fig.
3b). To analyse the relationships between the Ka/Ks ratios
in overlapping sequences of P and S, all 450 sequences were
separated randomly into five subsets and, for each subset,
the Ka/Ks ratio was calculated by using the sliding-window
approach for each 5 aa long P and corresponding (+1 nt)
S peptide with a step of 1 aa. Subsequently, the data for
1110 comparisons of P and S peptides were plotted
together (Fig. 4). For the vast majority of window comparisons,
the Ka/Ks ratios of both P and corresponding S
peptides were <1. Yet, of the total 1110 P and 1110 S
peptide subsets analysed, the Ka/Ks ratios in 170 P and 222
S subsets were >1. However, a Ka/Ks ratio of >1 in a P
peptide subset was associated with a strong decline of the
Ka/Ks ratio in the corresponding S peptides, and vice versa.
Among the 1110 subsets analysed, there were only 28 P and
corresponding S subsets for both of which the Ka/Ks ratios
were >1 (Fig. 4).
[Fig. 3 plot area. (a) Variation (entropy value) of polymerase amino acids,
p1/s3 nucleotides, surface protein amino acids, p3/s2 nucleotides and p2/s1
nucleotides against codon number in the HBV surface gene, with the
a-determinant marked. (b) P Ka/Ks and S Ka/Ks against nucleotide position.
(c) Dn/Ds for S and P against amino acid position.]
Table 1. Intra- and intergenotype variation at the p1/s3, p2/s1
and p3/s2 positions

                        Cumulative entropy values for sites:
Genotype                p1/s3    p2/s1    p3/s2
A                       12.9     5.2       7.5
B                        9.8     2.3       9.5
C                       11.4     3.5      11.0
D                       11.4     4.7       6.9
E                        3.0     1.2       2.2
F                       10.1     1.4       3.3
G                        1.9     0.0       0.8
H                        2.5     0.8       2.6
Unassigned sequences    18.0     7.4      11.1
Consensus A–H           23.6     7.3      21.8
Fig. 3. Variation of amino acids and nucleotides in the overlapping HBV polymerase and
surface protein genes. (a) Variation of the three
sets of nucleotides: p1/s3 sites (blue), p2/s1
sites (red) and p3/s2 sites (green), in relation
to the amino acid variation in the P and S
proteins. Orange dots indicate synonymous
variation at p1/s3 positions, caused by redundancy at the first nucleotide position in codons.
(b) Ka/Ks ratios were measured in a sliding
window throughout the P and S genes. The
data for 99 randomly selected sequences are
plotted. The Ka/Ks axis is cut off at the value of
5 for the purpose of presentation. (c) Dn/Ds
values were taken from the BEB output of PAML
model 8. Clusters of sites prone to positive
selection are prominent at regions around
positions 45, 125 and 208. In the a-determinant
domain of S (codons 110–135), Dn/Ds
values of the corresponding sites in P tended
to escape from purifying selection towards
neutrality.
To identify sites subjected to positive selection in the
overlapping region of P and S, several models implemented
in the PAML program were applied. The basic model 0,
which assumes rate constancy among all sites and branches,
generated mean posterior values for Dn/Ds ratios of 0.338
(P) and 0.430 (S) (Table 2), similar to ratios obtained by
the less complex and laborious approach implemented in
the MEGA program. The transition/transversion ratios
varied between 2.3 and 2.7 in all models and are presented
for model 0 only. Model 2 differs from model 1 by having
an extra site class accounting for sites with Dn/Ds ratios >1.
According to model 1, 77.8 and 72.7 % of sites in P and S
displayed mean posterior values for Dn/Ds of 0.109 and
0.166, respectively. The extra site class in model 2 has
accumulated sites of P and S with mean Dn/Ds values of
3.276 (4.2 %) and 2.412 (6.5 %), respectively. This indicated
that positive selection is operational in the overlapping
region of P and S. The LRT provided further support for
this observation. Twice the difference between lnL(model
1) and lnL(model 2) amounted to 163 (P) and 85 (S),
indicating a rejection of model 1 in favour of model 2 at a
confidence level of P<0.001. The nested models 7 and 8
employ 10 and 11 site classes, respectively, for the
accumulation of Dn/Ds values at sites. Site class 11 (model
8, Dn/Ds>1) is presented in Table 2, showing Dn/Ds ratios
and LRT values similar to those of the model pair 1 and 2.
The parameters p and q describe the β-distribution – the
Dn/Ds ratio as a function of the proportion of sites with a
certain ratio. A skewed distribution indicates the proportion
of sites that are either highly conserved or nearly
neutral. The values for p and q for P and S pointed to a
similarly shaped β-distribution.

Table 2. Log-likelihood values and parameter estimates

‘Model’ indicates the PAML CODEML model applied for the determination
of nucleotide-substitution rates at codon sites. Figures indicative
of positive selection are in bold. Transition/transversion ratios varied
between 2.3 and 2.7 for all models and are presented for the basic
model 0 only. Values for likelihood convergence (lnL) support the
validity of models 2 and 8 versus 1 and 7, respectively, for the
detection of positive selection (P<0.001). p0–p11 show the proportion
of sites in the category indicated by the corresponding mean posterior
value for Dn/Ds. In models 7 and 8, the parameters p and q indicate
the β-distribution, i.e. the Dn/Ds ratio as a function of the proportion
of sites with a certain Dn/Ds ratio.

Model   Parameter      P                S
0       ts/tv          2.534            2.311
        lnL            −13 042.470      −13 461.882
        Dn/Ds          0.338            0.430
1       lnL            −12 626.902      −13 185.952
        Dn/Ds (p0)     0.109 (0.778)    0.166 (0.727)
        Dn/Ds (p1)     1.000 (0.222)    1.000 (0.273)
2       lnL            −12 545.411      −13 143.474
        Dn/Ds (p0)     0.117 (0.761)    0.182 (0.713)
        Dn/Ds (p1)     1.000 (0.196)    1.000 (0.222)
        Dn/Ds (p2)     3.276 (0.042)    2.412 (0.065)
7       lnL            −12 641.481      −13 208.874
        p              0.347            0.553
        q              0.723            0.835
8       lnL            −12 549.478      −13 144.096
        p              0.490            0.994
        q              1.349            2.236
        p0             0.955            0.905
        Dn/Ds (p11)    2.970 (0.045)    2.017 (0.095)

Fig. 4. Relationships between the Ka/Ks ratios in P and
corresponding S peptides. The axes of the main figure are linear
and cut off at the value of 10 for the purpose of presentation. As
the Ka/Ks ratio for either P or the corresponding S peptide for a few
comparisons was above 10, all 1110 data points are also plotted
on the log scale (inset).
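The likelihood-ratio statistics quoted for the model pairs can be recomputed from the log-likelihoods in Table 2 (all of which are negative). The sketch below assumes two degrees of freedom for each comparison, the value commonly used for these nested site-model pairs; for two degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed. The dictionary layout and function name are illustrative.

```python
import math


def lrt(lnl_null, lnl_alt, df=2):
    """Return (2 * (lnL_alt - lnL_null), chi-square p-value) for df = 2 only."""
    if df != 2:
        raise NotImplementedError("the closed form used here holds for df = 2 only")
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, math.exp(-stat / 2.0)


# Log-likelihoods as listed in Table 2, keyed by gene and PAML site model.
TABLE2_LNL = {
    "P": {1: -12626.902, 2: -12545.411, 7: -12641.481, 8: -12549.478},
    "S": {1: -13185.952, 2: -13143.474, 7: -13208.874, 8: -13144.096},
}

for gene, lnl in TABLE2_LNL.items():
    for null_model, alt_model in ((1, 2), (7, 8)):
        stat, p_value = lrt(lnl[null_model], lnl[alt_model])
        print(f"{gene}: model {null_model} vs {alt_model}: 2*dlnL = {stat:.1f}, p = {p_value:.1e}")
```

For the pair of models 1 and 2 the statistic rounds to 163 for P and 85 for S, in agreement with the values given in the text, and every comparison gives a p-value far below 0.001.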
Having validated models 2 and 8 by means of the LRT, we
used the BEB output of these models to identify the amino
acid sites subjected to positive selection (Table 3). Positively selected sites identified by models 2 and 8 were the
same. Differences between the figures for statistical support
were marginal. Positively selected sites
mainly occurred in clusters along both P and S reading
frames. Regions N40–P49 of S and S45–T46 in P illustrate
this feature. Also, the a-determinant domain in S (K122–
G130) was embedded in a region of impressively positive
selection in P (P101–R145). Positive selection was also
prominent at the S204–L213 region of S, C-terminally
adjacent to the conserved YMDD motif in P. A complete
overview of Dn/Ds values along both reading frames
(Fig. 3c, model 8) showed an increase of Dn/Ds values
from conservation (negative selection) towards neutrality
specifically at clusters with an enhanced proportion of
positively selected sites.
DISCUSSION
Gene overlapping is an answer to the pressure to minimize
the length of the genome, as present in small-sized organisms
and in RNA-encoded organisms employing error-prone
RNA polymerases (Barrell et al., 1976; reviewed by
Gibbs & Keese, 1995). Alternatively, overlapping genes are
thought to simplify the control of gene expression
(Johnson & Chisholm, 2004). Various factors are thought
to have influenced the evolution of redundancy in the
genetic code (Ardell & Sella, 2001). Possibly, the skewed
redundancy in the genetic code reflects that, in early organisms, overlapping, frame-shifted genes were common.
Table 3. Positively selected sites in the P and S overlapping
reading frames

Residue      Probability    Dn/Ds*
P residue
30 T         0.999†         3.059±0.499
45 S         1.000†         3.060±0.496
46 T         1.000†         3.059±0.497
101 P        0.984‡         3.023±0.563
114 I        1.000†         3.060±0.496
116 Y        0.999†         3.058±0.499
126 D        1.000†         3.060±0.496
131 H        1.000†         3.060±0.496
141 K        0.996†         3.050±0.515
145 R        0.753          2.443±0.989
S residue
3 N          0.937          2.398±0.394
24 R         0.753          2.090±0.718
40 N         0.985‡         2.474±0.200
44 G         1.000†         2.499±0.028
45 A         1.000†         2.499±0.026
47 T         0.987‡         2.478±0.186
49 P         0.958‡         2.432±0.323
122 K        0.999†         2.498±0.046
126 I        0.999†         2.498±0.048
130 G        0.907          2.346±0.482
161 F        0.858          2.270±0.563
204 S        1.000†         2.499±0.026
207 N        0.920          2.370±0.441
210 N        0.999†         2.498±0.050
213 L        0.984‡         2.473±0.203

*Dn/Ds ratios are provided with their standard errors.
†Posterior probability level of 99 %.
‡Posterior probability level of 95 %.
The evolution of a genetic region containing overlapping,
frame-shifted genes is subjected to extra constraints. A
slower evolution rate of overlapping compared with non-overlapping
genome regions has been demonstrated for a
number of viruses, including HBV (Mizokami et al., 1997).
The extra constraints in the evolution of overlapping
compared with non-overlapping genetic regions are related
to the fact that a neutral or beneficial substitution in one
reading frame could be deleterious in the other reading
frame. Therefore, synonymous substitutions and adaptive
amino acid changes in one reading frame, which would be
evolutionarily neutral or beneficial in non-overlapping
genetic regions, will nevertheless be selected against, as they
could be deleterious in the other reading frame. As a result,
independent evolution of overlapping genes is restricted.
The general trend demonstrated for the evolution of the
overlapping regions is that one of the two overlapping
genes is subjected to positive selection (adaptive evolution), whilst the other is subjected to purifying selection.
In particular, this has been shown for papillomaviruses,
in which the adaptive evolution of the E2 region and
purifying selection in the overlapping E4 region have been
demonstrated (Hughes & Hughes, 2005; Narechania et al.,
2005). The same phenomenon was demonstrated for the
evolution of potato leafroll virus (Guyader & Ducray,
2002). For members of the family Microviridae, in which
the A and B as well as D and E genes are overlapping, in
both overlapping regions, one gene was subjected to positive
selection (Ka/Ks ratio >1, B and E genes), whilst in the
other gene, purifying selection was operational (Ka/Ks ratio
<1, A and D genes) (Pavesi, 2006). Similarly, in SIV, the
tat gene was found to be the most variable among the nine
virus genes at the amino acid level, whereas the overlapping
vpr gene appeared to be one of the most conserved
(Hughes et al., 2001). Moreover, for SIV, the adaptive
evolution of tat mirrored by purifying selection in vpr has
been demonstrated in vivo in experimentally infected
monkeys (Hughes et al., 2001).
Among viruses with overlapping genes, HBV provides a
striking example, with 50 % of its genome containing overlapping reading frames (Fig. 1). For many other viruses,
the overlapping of reading frames is partial and adaptive
evolution of both genes can occur in non-overlapping
regions. In contrast, the HBV surface protein gene S is
overlapped completely by the polymerase gene P. Whilst
the independent adaptive evolution of both P and S genes
was shown to be constrained (Mizokami et al., 1997), it
should not be precluded, as both genes must adapt to the
versatile environment. We hypothesized that HBV may
employ a mechanism that allows the independent adaptive
evolution of both overlapping genes, by which the evolution of P occurs via p1/s3 non-synonymous substitutions,
which are synonymous in S, and the evolution of S mainly
occurs via p3/s2 non-synonymous substitutions, which
are synonymous in P (Fig. 2). To test this hypothesis, we
analysed variation of nucleotides and amino acids among
450 internationally obtained HBV genomes in the overlapping P and S gene region.
We demonstrated that the evolution of the overlapping
region of the P and S genes indeed occurs mainly via p1/s3
and p3/s2 substitutions, respectively, and that substitutions
at the p2/s1 positions, which would affect the amino acids of
both proteins, are rare (Fig. 3a). This mechanism was
operational in HBV evolution both within and among
genotypes (Table 1). A Dn/Ds ratio of <1 for the whole
gene does not mean that adaptive evolution is not
operational, as adaptive mutations could have accumulated
in short regions of the gene, or even in a few nucleotide
positions. By using PAML analysis, we identified positions
of both the P and S genes where adaptive evolution is
operational (Fig. 3c). Sliding-window analysis demonstrated
that, whilst significant parts of the P or S genes were
subjected to positive selection, with the Ka/Ks ratio for
either P or S gene being >1, there were only a few regions
where the Ka/Ks ratios in both genes were >1 (Fig. 4).
Whilst HBV is a rather unique example of overlapping
reading frames, this mechanism of independent evolution
of the overlapping regions could also apply to other
viruses. This point is supported by the observation of an
increased frequency of amino acid residues with a high
level of degeneracy (arginine, leucine and serine) in the
proteins encoded by overlapping genes of several viruses
(Pavesi et al., 1997).
ACKNOWLEDGEMENTS
We thank Willem Vermin, SARA Computing and Networking
Services, Amsterdam, The Netherlands, for providing the information
on the LISA computer array and help with software.
REFERENCES
Anisimova, M., Bielawski, J. P. & Yang, Z. (2001). Accuracy and
power of the likelihood ratio test in detecting adaptive molecular
evolution. Mol Biol Evol 18, 1585–1592.
Anisimova, M., Bielawski, J. P. & Yang, Z. (2002). Accuracy and
power of Bayes prediction of amino acid sites under positive
selection. Mol Biol Evol 19, 950–958.
Ardell, D. H. & Sella, G. (2001). On the evolution of redundancy in
genetic codes. J Mol Evol 53, 269–281.
Barrell, B. G., Air, G. M. & Hutchison, C. A., III (1976). Overlapping
genes in bacteriophage φX174. Nature 264, 34–41.
Bartholomeusz, A. & Schaefer, S. (2004). Hepatitis B virus
genotypes: comparison of genotyping methods. Rev Med Virol 14,
3–16.
Chenna, R., Sugawara, H., Koike, T., Lopez, R., Gibson, T. J., Higgins,
D. G. & Thompson, J. D. (2003). Multiple sequence alignment with the
CLUSTAL series of programs. Nucleic Acids Res 31, 3497–3500.
Cooreman, M. P., Leroux-Roels, G. & Paulij, W. P. (2001). Vaccine-
and hepatitis B immune globulin-induced escape mutations of
hepatitis B virus surface antigen. J Biomed Sci 8, 237–247.
Gibbs, A. & Keese, P. K. (1995). In search of the origin of viral genes.
In Molecular Basis of Virus Evolution, pp. 76–90. Edited by A. Gibbs,
C. H. Calisher & F. Garcia-Arenal. Cambridge, UK: Cambridge
University Press.
Guyader, S. & Ducray, D. G. (2002). Sequence analysis of potato
leafroll virus isolates reveals genetic stability, major evolutionary
events and differential selection pressure between overlapping reading
frame products. J Gen Virol 83, 1799–1807.
Hall, T. A. (1999). BioEdit: a user-friendly biological sequence
alignment editor and analysis program for Windows 95/98/NT.
Nucleic Acids Symp Ser 41, 95–98.
Hannoun, C., Horal, P. & Lindh, M. (2000). Long-term mutation rates
in the hepatitis B virus genome. J Gen Virol 81, 75–83.
Hughes, A. L. & Hughes, M. A. (2005). Patterns of nucleotide
difference in overlapping and non-overlapping reading frames of
papillomavirus genomes. Virus Res 113, 81–88.
Hughes, A. L., Westover, K., da Silva, J., O’Connor, D. H. & Watkins,
D. I. (2001). Simultaneous positive and purifying selection on
overlapping reading frames of the tat and vpr genes of simian
immunodeficiency virus. J Virol 75, 7966–7972.
Johnson, Z. I. & Chisholm, S. W. (2004). Properties of overlapping genes
are conserved across microbial genomes. Genome Res 14, 2268–2272.
Kumar, S., Tamura, K. & Nei, M. (2004). MEGA3: integrated software
for Molecular Evolutionary Genetics Analysis and sequence
alignment. Brief Bioinform 5, 150–163.
Mizokami, M., Orito, E., Ohba, K., Ikeo, K., Lau, J. Y. & Gojobori, T.
(1997). Constrained evolution with respect to gene overlap of
hepatitis B virus. J Mol Evol 44 (Suppl. 1), S83–S90.
Moskovitz, D. N., Osiowy, C., Giles, E., Tomlinson, G. & Heathcote,
E. J. (2005). Response to long-term lamivudine treatment (up to
5 years) in patients with severe chronic hepatitis B, role of genotype
and drug resistance. J Viral Hepat 12, 398–404.
Myers, R., Clark, C., Khan, A., Kellam, P. & Tedder, R. (2006).
Genotyping hepatitis B virus from whole- and sub-genomic fragments
using position-specific scoring matrices in HBV STAR. J Gen Virol 87,
1459–1464.
Narechania, A., Terai, M. & Burk, R. D. (2005). Overlapping reading
frames in closely related human papillomaviruses result in modular
rates of selection within E2. J Gen Virol 86, 1307–1313.
Pavesi, A. (2006). Origin and evolution of overlapping genes in the
family Microviridae. J Gen Virol 87, 1013–1017.
Pavesi, A., De Iaco, B., Granero, M. I. & Porati, A. (1997). On the
informational content of overlapping genes in prokaryotic and
eukaryotic viruses. J Mol Evol 44, 625–631.
Yang, Z. (1997). PAML: a program package for phylogenetic analysis by
maximum likelihood. Comput Appl Biosci 13, 555–556.
Yang, Z., Nielsen, R., Goldman, N. & Pedersen, A. M. (2000). Codon-
substitution models for heterogeneous selection pressure at amino
acid sites. Genetics 155, 431–449.
Yang, Z., Wong, W. S. & Nielsen, R. (2005). Bayes empirical Bayes
inference of amino acid sites under positive selection. Mol Biol Evol
22, 1107–1118.