Journal of General Virology (2007), 88, 2137–2143 DOI 10.1099/vir.0.82906-0

Independent evolution of overlapping polymerase and surface protein genes of hepatitis B virus

Hans L. Zaaijer,1 Formijn J. van Hemert,2 Marco H. Koppelman3 and Vladimir V. Lukashov2,4

1 Laboratory of Clinical Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
2 Laboratory of Experimental Virology, Department of Medical Microbiology, CINIMA, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
3 Department of Virology, Sanquin, Amsterdam, The Netherlands
4 Laboratory of Immunochemistry, D. I. Ivanovsky Institute of Virology, Russian Academy of Medical Sciences, Moscow, Russia

Correspondence: Vladimir V. Lukashov ([email protected])

Received 6 February 2007
Accepted 19 April 2007

The genome of hepatitis B virus (HBV) provides a striking example of gene overlapping. In particular, the surface protein gene S is overlapped completely by the polymerase gene P. Evolutionary constraints in overlapping genes have been demonstrated for many viruses, with one of the two overlapping genes being subjected to positive selection (adaptive evolution), while the other one is subjected to purifying selection. Yet, for HBV to persist successfully, adaptive evolution of both the P and S genes is essential. We propose that HBV employs a mechanism that allows the independent adaptive evolution of both genes. We hypothesize that (i) the adaptive evolution of P occurs via p1/s3 non-synonymous substitutions, which are synonymous in S, (ii) the adaptive evolution of S occurs via p3/s2 non-synonymous substitutions, which are synonymous in P, and (iii) p2/s1 substitutions are rare. Analysis of 450 HBV sequences demonstrated that this mechanism is operational in HBV evolution both within and among genotypes.
Positions were identified in both genes where adaptive evolution is operational. Whilst significant parts of the P and S genes were subjected to positive selection, with the Ka/Ks ratio for either the P or the S gene being >1, there were only a few regions where the Ka/Ks ratios in both genes were >1. This mechanism of independent evolution of the overlapping regions could also apply to other viruses, taking into account the increased frequency of amino acids with a high level of degeneracy in the proteins encoded by overlapping genes of viruses.

A list of genotypes and GenBank accession numbers for the human HBV sequences used in this study is available with the online version of this paper.

INTRODUCTION

Gene overlapping is a strategy used widely by viruses to condense a maximal amount of information into short genomes (Barrell et al., 1976; reviewed by Gibbs & Keese, 1995). The circular genome of hepatitis B virus (HBV) provides a striking example of gene overlapping, with 50 % of the genome consisting of overlapping reading frames (Fig. 1). In particular, the HBV surface protein gene (S; 681 nt) is overlapped completely by the polymerase gene (P; 2532 nt). Gene overlapping constrains the independent evolution of the encoded proteins. Recently, the evolution of overlapping versus non-overlapping regions of virus genomes has been compared for several viruses, including papillomaviruses (Hughes & Hughes, 2005; Narechania et al., 2005), bacteriophages from the family Microviridae (Pavesi, 2006), potato leafroll virus (Guyader & Ducray, 2002), simian immunodeficiency virus (SIV) (Hughes et al., 2001) and HBV (Mizokami et al., 1997). Mizokami et al. (1997) demonstrated decreased rates of synonymous nucleotide substitutions in overlapping versus non-overlapping regions of the HBV genome, providing evidence for evolutionary constraints associated with overlapping regions. This trend was also found in other viruses.
Fig. 1. The HBV genome. The P, S, X and C genes encode the polymerase, surface, X- and core proteins, respectively. The YMDD motif is involved in resistance to lamivudine. Vaccine-induced antibodies and specific immunoglobulins neutralize the virus via the a-determinant of the surface protein.

Fig. 2. Frame-shifted position of the overlapping HBV polymerase and surface genes. (a) Nucleotide positions in codons:
Polymerase gene   1 2 3 1 2 3 1 2 3
Surface gene      3 1 2 3 1 2 3 1 2
A nucleotide substitution in the third codon position is likely to be synonymous, allowing a non-synonymous substitution in the other reading frame (arrows). (b) Example of point mutations in an overlapping, frame-shifted sequence of the HBV genome, being simultaneously synonymous in the polymerase and non-synonymous in the surface protein:
Nucleotide sequence     Polymerase codon   Surface codon
Wild type: ..CTTT..     CTT = Leu          TTT = Phe
Mutant:    ..CTCT..     CTC = Leu          TCT = Ser
Mutant:    ..CTAT..     CTA = Leu          TAT = Tyr
Mutant:    ..CTGT..     CTG = Leu          TGT = Cys

Moreover, for SIV, potato leafroll virus, members of the family Microviridae and papillomaviruses, the overlapping regions were characterized by adaptive evolution (high rates of non-synonymous substitutions, Dn/Ds ratios >1) of one reading frame, mirrored by negative selection (relatively high rates of synonymous substitutions, Dn/Ds ratios <1) operational in the other reading frame, a trend considered to be general for the evolution of overlapping regions. However, for HBV to persist successfully and to overcome new challenges, adaptive evolution of both the overlapping P and S genes is essential.
In particular, adaptive evolution of the P gene in response to antiviral therapy renders the virus resistant to antiviral drugs, and mutations in the S gene allow the virus to escape from neutralizing antibodies (Cooreman et al., 2001; Hannoun et al., 2000; Moskovitz et al., 2005).

In this study, we analysed the possibility that the HBV polymerase and surface proteins evolve independently, despite being encoded by the same nucleotide sequence. We hypothesized that HBV may employ a mechanism that allows the independent adaptive evolution of both proteins encoded by the same sequence. In the overlapping, frame-shifted polymerase/surface region of the HBV genome, the first nucleotide position in a P codon corresponds to the third position in the S codon (p1/s3), the second position in a P codon to the first position in the S codon (p2/s1) and the third position in a P codon to the second position in the S codon (p3/s2) (Fig. 2). The position of a nucleotide substitution within a codon strongly influences the chance that the substitution is synonymous. A nucleotide substitution in the first codon position causes an amino acid change in 60 of 64 cases, in the second position in 63 of 64 cases and in the third position in only 16 of 64 cases. Hence, a nucleotide substitution in the first position of a P codon (p1/s3) is likely to cause an amino acid change in P, but not in S, whereas substitutions in the third position of a P codon (p3/s2) are probably synonymous in P, but non-synonymous in S. Thus, an adaptive, non-synonymous nucleotide substitution in the first position of a P codon is likely to remain synonymous in S, whereas an adaptive, non-synonymous nucleotide substitution in the second position of an S codon is likely to remain synonymous in P. On the other hand, nucleotide substitutions in the p2/s1 positions will in most cases be non-synonymous in both genes.
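The codon-position argument above can be checked by enumerating the standard genetic code. The Python sketch below is illustrative only; the helper names (`translate`, `site_class`, `synonymous_pairs`) are hypothetical and not part of any software used in this study. It maps each overlap nucleotide to its p1/s3, p2/s1 or p3/s2 class, translates the Fig. 2(b) example in both frames, and lists the synonymous single-nucleotide changes at each codon position. It tallies all 192 possible changes per codon position, a different framing from the per-codon counts quoted above, so the numbers are not expected to coincide; the ordering (second position almost never synonymous, first position rarely, third position usually) is the point of comparison.

```python
from itertools import product

# Standard genetic code; codons are enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG
# (base order T, C, A, G) and "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

SITE_CLASSES = ("p1/s3", "p2/s1", "p3/s2")


def translate(seq: str) -> str:
    """Translate the complete codons of seq, read from its first nucleotide."""
    return "".join(CODE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def site_class(i: int) -> str:
    """Site class of overlap nucleotide i (0-based, counted from the first
    nucleotide of a P codon). S is read one nucleotide downstream of P, so
    P codon positions 1, 2, 3 coincide with S codon positions 3, 1, 2."""
    return SITE_CLASSES[i % 3]


def synonymous_pairs(pos: int) -> set:
    """Unordered codon pairs that differ only at codon position pos (1-3)
    and encode the same amino acid; stop-to-stop counts as synonymous."""
    pairs = set()
    for codon, aa in CODE.items():
        for base in BASES:
            mutant = codon[:pos - 1] + base + codon[pos:]
            if mutant != codon and CODE[mutant] == aa:
                pairs.add(tuple(sorted((codon, mutant))))
    return pairs


if __name__ == "__main__":
    # Fig. 2(b): the P codon is nucleotides 1-3, the S codon nucleotides 2-4.
    for seq in ("CTTT", "CTCT", "CTAT", "CTGT"):
        print(seq, "P:", translate(seq[0:3]), "S:", translate(seq[1:4]))
    # Each unordered pair stands for two directed substitutions out of the
    # 64 x 3 = 192 possible changes at one codon position.
    for pos in (1, 2, 3):
        syn = 2 * len(synonymous_pairs(pos))
        print(f"position {pos}: {syn} synonymous, {192 - syn} non-synonymous")
```

Run as a script, it translates CTTT to Leu/Phe and the three mutants to Leu/Ser, Leu/Tyr and Leu/Cys, and reports 8, 2 and 128 synonymous changes at codon positions 1, 2 and 3. The first- and second-position pairs it finds (TTA–CTA, TTG–CTG, CGA–AGA, CGG–AGG and TAA–TGA) are the ones listed in the Results.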
We predicted that adaptive evolution of HBV occurs via p1/s3 mutations in the P gene and via p3/s2 mutations in the S gene, and that p2/s1 mutations are rare.

METHODS

To validate this hypothesis, we studied the variation of nucleotides and amino acids in the overlapping P and S gene region (227 codons) of 450 internationally obtained HBV genomes. Sequences of human HBV were retrieved from GenBank, including the 11 HBV genotype standards proposed by Bartholomeusz & Schaefer (2004). The sequences were genotyped by using the HBV STAR program (Myers et al., 2006). Genotypes and GenBank accession numbers of the 450 HBV sequences are reported in the Supplementary Material, available in JGV Online. The sequences were aligned by using the CLUSTAL_W software (Chenna et al., 2003). The variation at the 227 amino acid positions of the polymerase and surface proteins, and the variation in the three sets of nucleotide sites (the 227 p1/s3 sites, the 227 p2/s1 sites and the 227 p3/s2 sites), were studied by analysing the entropy value of every individual amino acid and nucleotide position, as implemented in the BioEdit software (Hall, 1999). As a measure of the variation of amino acids at a given position, the entropy value theoretically varies from zero (no variation) to 3.04 (all amino acids and the stop codon occur in equal proportions). For the variation of nucleotides at a given position, the entropy value ranges from zero (no variation) to 1.39 (A, C, G and T occur at equal frequencies) (Hall, 1999). The overall mean synonymous (Ds) and non-synonymous (Dn) distances in the overlapping region of the P and S genes were calculated by using the MEGA 3 software (Kumar et al., 2004). The Nei–Gojobori method with Jukes–Cantor correction was used; standard errors were calculated by bootstrap resampling with 1000 replications.
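The entropy bounds quoted above (1.39 for nucleotides, 3.04 for amino acids plus stop) are those of the natural-log Shannon entropy, H = −Σ p_i ln p_i, since ln 4 ≈ 1.386 and ln 21 ≈ 3.045. A minimal sketch of the per-site entropy and of the cumulative values per site set follows, assuming that form. It is not BioEdit's code; skipping gap and unknown characters is an assumption, and `site_entropy`, `cumulative_entropy` and the toy alignment are hypothetical.

```python
import math
from collections import Counter

SITE_CLASSES = ("p1/s3", "p2/s1", "p3/s2")


def site_entropy(column: str, ignore: str = "-?") -> float:
    """Shannon entropy H = -sum(p * ln p) of one alignment column.

    Characters in `ignore` (gaps, unknowns) are skipped, which is an
    assumption of this sketch rather than documented BioEdit behaviour."""
    counts = Counter(ch for ch in column.upper() if ch not in ignore)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    # max() turns the -0.0 of an invariant column into 0.0
    return max(0.0, -sum((c / n) * math.log(c / n) for c in counts.values()))


def cumulative_entropy(alignment: list, start: int = 0) -> dict:
    """Sum site entropies over the p1/s3, p2/s1 and p3/s2 site sets.

    `start` is the alignment column holding the first nucleotide of a P
    codon; columns are assumed to keep the codon frame (no frame-shifting gaps)."""
    totals = dict.fromkeys(SITE_CLASSES, 0.0)
    for col in range(start, len(alignment[0])):
        column = "".join(seq[col] for seq in alignment)
        totals[SITE_CLASSES[(col - start) % 3]] += site_entropy(column)
    return totals


if __name__ == "__main__":
    # Upper bounds quoted in the text: ln 4 for nucleotides,
    # ln 21 for 20 amino acids plus stop.
    print(round(site_entropy("ACGT"), 2), round(math.log(21), 2))
    toy = ["ATGCTT", "ATGCTC", "ACGCTA"]  # hypothetical toy alignment
    print(cumulative_entropy(toy))
```

The cumulative values reported in the Results and in Table 1 come from the real 450-sequence alignment; the toy alignment only shows how column entropies are routed to the three site sets.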
The CODEML module of PAML 3.15 (Yang, 1997; Yang et al., 2000) was applied to estimate, by maximum likelihood (ML), the rates of synonymous and non-synonymous nucleotide substitutions at sites in the overlapping reading frames of P and S separately. The nested site models of PAML (1 and 2; 7 and 8) were run on an array of 680 computers called LISA, hosted by SARA Computing and Networking Services, Amsterdam, The Netherlands. Each computer (node) in the array was equipped with two Intel Xeon (EM64T) processors running at 3.4 GHz and 2–4 GB of memory, under Debian Linux. Each individually submitted job consisted of a single model preceded by model 0, in order to detect inconsistencies in the input data. This approach allowed the simultaneous estimation of Dn/Ds parameters by parallel jobs assigned to different nodes of the array. In the absence of parameter trapping near the boundary of the parameter space, the most complex model, model 8 [default settings (Yang et al., 2000)], took 9–13 h to converge to an optimal likelihood estimate, given 400 HBV sequences of 226 codons each. For proper convergence, identical sequences (Dn/Ds = 0/0) should be avoided. In addition, the use of an initial input tree with ML-estimated branch lengths (copied from models 0, 1 or 7 into models 2 or 8) may prevent parameter trapping and the resulting very slow convergence of the ML values of models 2 and 8 towards those of models 1 and 7. Likelihood-ratio tests (LRT) and Bayes Empirical Bayes (BEB) statistics (Anisimova et al., 2001, 2002; Yang et al., 2005) were applied as described in the PAML manual (http://abacus.gene.ucl.ac.uk/software/paml.html).
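Both the MEGA distances and the SWAAP sliding windows rest on the Nei–Gojobori counting method with a Jukes–Cantor correction, d = −(3/4) ln(1 − 4p/3), applied separately to the proportions of synonymous and non-synonymous differences. The sketch below is a simplified pairwise version written for illustration, not the MEGA or SWAAP implementation. Treating changes to stop codons as non-synonymous when counting sites, and discarding mutational pathways through stop codons, are assumptions of this sketch, and all function names are hypothetical.

```python
import math
from itertools import permutations, product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites of a codon: for each position, the share of the three
    possible changes that keep the amino acid. Changes to a stop codon count
    as non-synonymous here (a simplifying assumption)."""
    aa = CODE[codon]
    changes = [codon[:p] + b + codon[p + 1:]
               for p in range(3) for b in BASES if b != codon[p]]
    return sum(CODE[m] == aa for m in changes) / 3


def codon_differences(c1: str, c2: str) -> tuple:
    """(synonymous, non-synonymous) differences between two sense codons,
    averaged over all orders in which the differing positions could have
    changed; pathways through a stop codon are dropped when any remain."""
    diff = [p for p in range(3) if c1[p] != c2[p]]
    every, valid = [], []
    for order in permutations(diff):
        current, sd, nd, via_stop = c1, 0, 0, False
        for p in order:
            nxt = current[:p] + c2[p] + current[p + 1:]
            if CODE[nxt] == "*":
                via_stop = True
            if CODE[nxt] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = nxt
        every.append((sd, nd))
        if not via_stop:
            valid.append((sd, nd))
    paths = valid or every  # fall back if every pathway passes a stop codon
    return (sum(s for s, _ in paths) / len(paths),
            sum(n for _, n in paths) / len(paths))


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; undefined for p >= 0.75."""
    return -0.75 * math.log(1 - 4 * p / 3)


def nei_gojobori(seq1: str, seq2: str) -> tuple:
    """(Dn, Ds) between two aligned, in-frame coding sequences."""
    S = N = Sd = Nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # skip gapped/ambiguous codons and stop codons
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s
        N += 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)


if __name__ == "__main__":
    # Hypothetical toy pair: one synonymous (CTT->CTC) and one
    # non-synonymous (GCT->ACT, Ala->Thr) difference.
    dn, ds = nei_gojobori("CTTGCTAAA", "CTCACTAAA")
    print(f"Dn = {dn:.3f}, Ds = {ds:.3f}, Dn/Ds = {dn / ds:.2f}")
```

MEGA's overall mean corresponds roughly to averaging such pairwise values over all sequence pairs, and applying `nei_gojobori` to 15-nt slices stepped by 3 nt mimics one sliding window; both are approximations of what the programs do internally.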
Analysis of the rate of synonymous substitutions (the number of synonymous substitutions per synonymous site, Ks) and non-synonymous substitutions (the number of non-synonymous substitutions per non-synonymous site, Ka), as well as their ratio (Ka/Ks), in a sliding window was performed by using the SWAAP 1.0.2 software (http://www.bacteriamuseum.org/SWAAP/SwaapPage.htm). The Nei–Gojobori distance estimation method was used for a sliding window (window length, 15 nt; window step, 3 nt, which is the maximal resolution of the program). Because the program can analyse only a limited number of sequences simultaneously, 99 sequences, selected randomly from the total of 450, were analysed each time. Accordingly, for the total of 450 sequences, four sets of 99 sequences and one set of 54 sequences were analysed and the data were pooled, together giving 1110 window comparisons.

RESULTS

Nucleotide variation at the 227 p1/s3, p2/s1 and p3/s2 positions of the overlapping P/S gene region for all 450 sequences is shown in Fig. 3(a). As predicted by our hypothesis, variation at the p2/s1 nucleotide sites, which with high probability affects the amino acid sequences of both the polymerase and the surface protein, was the lowest. The cumulative entropy value for all p2/s1 positions was 10.5, compared with cumulative entropy values of 26.7 and 24.3 for the p1/s3 and p3/s2 positions, respectively. A cluster of p2/s1 variation occurred at codons 110–140, which encode the immunodominant a-determinant of the surface protein. As predicted, amino acid variation of the surface protein was largely determined by nucleotide substitutions at the p3/s2 sites, whereas amino acid variation of the polymerase was primarily caused by substitutions at the p1/s3 sites. Some additional variation was imposed on both proteins by rare p2/s1 mutations. In some instances, p1/s3 nucleotide variation did not cause amino acid changes in the polymerase (Fig. 3a).
This was due to partial degeneracy of the first nucleotide position, as four of 64 possible substitutions are synonymous: TTA–CTA and TTG–CTG coding for leucine, CGA–AGA and CGG–AGG coding for arginine. The only synonymous substitution at the second nucleotide position occurs in stop codons (TAA–TGA); hence, all p3/s2 nucleotide variation translates into amino acid changes of the surface protein.

To establish whether the difference in variation of the p1/s3 and p3/s2 versus p2/s1 positions is related to HBV diversification within or among genotypes, and to confirm that our findings are not limited to certain HBV genotypes, we subsequently analysed nucleotide variation at the p1/s3, p2/s1 and p3/s2 positions among sequences belonging to each genotype and among unassigned sequences, as well as among the consensus sequences of genotypes A–H (Table 1). For every genotype from A to H, as well as for unassigned sequences, the cumulative entropy values for the p2/s1 positions were markedly lower than those for the p1/s3 and p3/s2 positions. For instance, for genotype B, which was represented by 120 sequences (118 plus two reference sequences), the cumulative entropy value for the p2/s1 positions was 2.3, compared with values of 9.8 and 9.5 for the p1/s3 and p3/s2 positions, respectively. The same trend was observed for genotype E, which was represented by five sequences (four plus one reference sequence): 1.2, compared with 3.0 and 2.2. This pattern of markedly lower variation at the p2/s1 than at the p1/s3 and p3/s2 positions was also observed when genotype consensus sequences were compared with each other (Table 1).

The overall mean Dn and Ds in the overlapping region of the P and S genes, calculated by using the MEGA program, were similar for both genes: for P, 0.035±0.005 and 0.109±0.014, respectively (Dn/Ds ratio of 0.32), and for S, 0.041±0.005 and 0.092±0.015, respectively (Dn/Ds ratio of 0.45).
Hence, negative selection generally dominates the evolution of both genes. However, this does not preclude certain regions of both genes from being subjected to positive selection. Our analysis of the Ka/Ks distribution throughout the P and S genes by using the sliding-window approach identified these regions (Fig. 3b). To analyse the relationships between the Ka/Ks ratios in overlapping sequences of P and S, all 450 sequences were separated randomly into five subsets and, for each subset, the Ka/Ks ratio was calculated by using the sliding-window approach for each 5 aa long P peptide and the corresponding (+1 nt) S peptide, with a step of 1 aa. Subsequently, the data for 1110 comparisons of P and S peptides were plotted together (Fig. 4). For the vast majority of window comparisons, the Ka/Ks ratios of both P and corresponding S peptides were <1. Yet, of the total of 1110 P and 1110 S peptide subsets analysed, the Ka/Ks ratios in 170 P and 222 S subsets were >1. However, a Ka/Ks ratio of >1 in a P peptide subset was associated with a strong decline of the Ka/Ks ratio in the corresponding S peptides, and vice versa. Among the 1110 subsets analysed, there were only 28 P and corresponding S subsets for both of which the Ka/Ks ratios were >1 (Fig. 4).
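The pairing of each 5-codon P window with an S window shifted by one nucleotide can be written out explicitly. Assuming the 681-nt (227-codon) overlap starts on a P codon boundary and that a pair is kept only when both 15-nt windows lie inside the overlap, each subset yields 222 window pairs, and five subsets give the 1110 comparisons reported above; this reconciliation is our inference, not a statement from the study. `paired_windows` and `count_selected` are hypothetical helpers, and the Ka/Ks values in the usage lines are made up.

```python
def paired_windows(overlap_len: int = 681, window: int = 15, step: int = 3,
                   s_offset: int = 1) -> list:
    """Half-open (start, end) nucleotide coordinates of paired P and S windows.

    P windows start on P codon boundaries (0, 3, 6, ...); the matching S
    window is shifted by `s_offset` nt so it starts on an S codon boundary.
    A pair is kept only when both windows fit inside the overlap."""
    pairs = []
    start = 0
    while start + s_offset + window <= overlap_len:
        pairs.append(((start, start + window),
                      (start + s_offset, start + s_offset + window)))
        start += step
    return pairs


def count_selected(ratios) -> dict:
    """Count window pairs with Ka/Ks > 1 in P, in S and in both.

    `ratios` holds (P Ka/Ks, S Ka/Ks) tuples; None marks a window where the
    ratio is undefined (e.g. Ks = 0), and such pairs are skipped."""
    counts = {"P>1": 0, "S>1": 0, "both>1": 0}
    for p, s in ratios:
        if p is None or s is None:
            continue
        counts["P>1"] += p > 1
        counts["S>1"] += s > 1
        counts["both>1"] += p > 1 and s > 1
    return counts


if __name__ == "__main__":
    pairs = paired_windows()
    print(len(pairs), 5 * len(pairs))  # pairs per subset, and over five subsets
    print(pairs[0], pairs[-1])
    # Hypothetical Ka/Ks values for five window pairs.
    print(count_selected([(2.0, 0.3), (0.5, 1.5), (1.2, 1.1), (0.2, 0.4), (None, 2.0)]))
```

Counting P>1, S>1 and both>1 separately mirrors the three figures in the text (170, 222 and 28 of 1110), which is how the near-exclusion of jointly elevated ratios shows up.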
Table 1. Intra- and intergenotype variation at the p1/s3, p2/s1 and p3/s2 positions

Genotype               Cumulative entropy values for sites
                       p1/s3    p2/s1    p3/s2
A                      12.9     5.2      7.5
B                      9.8      2.3      9.5
C                      11.4     3.5      11.0
D                      11.4     4.7      6.9
E                      3.0      1.2      2.2
F                      10.1     1.4      3.3
G                      1.9      0.0      0.8
H                      2.5      0.8      2.6
Unassigned sequences   18.0     7.4      11.1
Consensus A–H          23.6     7.3      21.8

Fig. 3. Variation of amino acids and nucleotides in the overlapping HBV polymerase and surface protein genes. (a) Variation of the three sets of nucleotides: p1/s3 sites (blue), p2/s1 sites (red) and p3/s2 sites (green), in relation to the amino acid variation in the P and S proteins. Orange dots indicate synonymous variation at p1/s3 positions, caused by redundancy at the first nucleotide position in codons. (b) Ka/Ks ratios were measured in a sliding window throughout the P and S genes. The data for 99 randomly selected sequences are plotted. The Ka/Ks axis is cut off at the value of 5 for the purpose of presentation. (c) Dn/Ds values were taken from the BEB output of PAML model 8. Clusters of sites prone to positive selection are prominent in regions around positions 45, 125 and 208. In the a-determinant domain of S (codons 110–135), Dn/Ds values of the corresponding sites in P tended to escape from purifying selection towards neutrality.

To identify sites subjected to positive selection in the overlapping region of P and S, several models implemented in the PAML program were applied. The basic model 0, which assumes rate constancy among all sites and branches, generated mean posterior values for Dn/Ds ratios of 0.338 (P) and 0.430 (S) (Table 2), similar to the ratios obtained by the less complex and laborious approach implemented in the MEGA program. The transition/transversion ratios varied between 2.3 and 2.7 in all models and are presented for model 0 only. Model 2 differs from model 1 by having an extra site class accounting for sites with Dn/Ds ratios >1.
According to model 1, 77.8 and 72.7 % of sites in P and S displayed mean posterior values for Dn/Ds of 0.109 and 0.166, respectively. The extra site class in model 2 accumulated sites of P and S with mean Dn/Ds values of 3.276 (4.2 %) and 2.412 (6.5 %), respectively. This indicated that positive selection is operational in the overlapping region of P and S. The LRT provided further support for this observation: twice the difference between lnL(model 1) and lnL(model 2) amounted to 163 (P) and 85 (S), indicating rejection of model 1 in favour of model 2 at P<0.001. The nested models 7 and 8 employ 10 and 11 site classes, respectively, for the accumulation of Dn/Ds values at sites. Site class 11 (model 8, Dn/Ds>1) is presented in Table 2, showing Dn/Ds ratios and LRT values similar to those of the model pair 1 and 2. The parameters p and q describe the β-distribution, i.e. the Dn/Ds ratio as a function of the proportion of sites with a certain ratio. A skewed distribution indicates the proportion of sites that are either highly conserved or nearly neutral. The values of p and q for P and S pointed to a similarly shaped β-distribution. Having validated models 2 and 8 by means of the LRT, we used the BEB output of these models to identify the amino acid sites subjected to positive selection (Table 3). The positively selected sites identified by models 2 and 8 were the same; differences between the figures for statistical support were marginal.

Table 2. Log-likelihood values and parameter estimates
'Model' indicates the PAML CODEML model applied for the determination of nucleotide-substitution rates at codon sites. Transition/transversion ratios (ts/tv) varied between 2.3 and 2.7 for all models and are presented for the basic model 0 only. Values for likelihood convergence (lnL) support the validity of models 2 and 8 versus 1 and 7, respectively, for the detection of positive selection (P<0.001). p0–p11 show the proportion of sites (in parentheses) in the category indicated by the corresponding mean posterior value for Dn/Ds. In models 7 and 8, the parameters p and q describe the β-distribution, i.e. the Dn/Ds ratio as a function of the proportion of sites with a certain Dn/Ds ratio.

Model   Parameter      P                S
0       ts/tv          2.534            2.311
        lnL            −13 042.470      −13 461.882
        Dn/Ds          0.338            0.430
1       lnL            −12 626.902      −13 185.952
        Dn/Ds (p0)     0.109 (0.778)    0.166 (0.727)
        Dn/Ds (p1)     1.000 (0.222)    1.000 (0.273)
2       lnL            −12 545.411      −13 143.474
        Dn/Ds (p0)     0.117 (0.761)    0.182 (0.713)
        Dn/Ds (p1)     1.000 (0.196)    1.000 (0.222)
        Dn/Ds (p2)     3.276 (0.042)    2.412 (0.065)
7       lnL            −12 641.481      −13 208.874
        p              0.347            0.553
        q              0.723            0.835
8       lnL            −12 549.478      −13 144.096
        p              0.490            0.994
        q              1.349            2.236
        p0             0.955            0.905
        Dn/Ds (p11)    2.970 (0.045)    2.017 (0.095)

Fig. 4. Relationships between the Ka/Ks ratios in P and corresponding S peptides. The axes of the main figure are linear and cut off at the value of 10 for the purpose of presentation. As the Ka/Ks ratio for either P or the corresponding S peptide was above 10 for a few comparisons, all 1110 data points are also plotted on a log scale (inset).
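The LRT statistics quoted above can be recomputed from the lnL values in Table 2. The sketch assumes 2 degrees of freedom for both model pairs (models 2 and 8 each add two parameters to models 1 and 7), for which the χ² upper-tail probability has the closed form exp(−x/2), so P < 0.001 corresponds to 2ΔlnL > 13.8. The names `LNL`, `lrt` and `chi2_sf_df2` are hypothetical.

```python
import math

# lnL values for P and S, taken from Table 2 (PAML CODEML site models).
LNL = {
    "P": {"M1": -12626.902, "M2": -12545.411, "M7": -12641.481, "M8": -12549.478},
    "S": {"M1": -13185.952, "M2": -13143.474, "M7": -13208.874, "M8": -13144.096},
}


def lrt(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood-ratio statistic 2 * (lnL_alt - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_df2(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 2 df: exp(-x / 2)."""
    return math.exp(-x / 2.0)


if __name__ == "__main__":
    for gene, lnl in LNL.items():
        for null, alt in (("M1", "M2"), ("M7", "M8")):
            stat = lrt(lnl[null], lnl[alt])
            print(f"{gene} {null} vs {alt}: 2dlnL = {stat:.3f}, "
                  f"P = {chi2_sf_df2(stat):.1e}")
```

For the model pair 1/2 the statistics come out at about 163 (P) and 85 (S), as in the text; for the pair 7/8 they are about 184 (P) and 130 (S).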
Positively selected sites mainly occurred in clusters along both the P and S reading frames. Regions N40–P49 of S and S45–T46 of P illustrate this feature. Also, the a-determinant domain in S (K122–G130) was embedded in a region of strong positive selection in P (P101–R145). Positive selection was also prominent in the S204–L213 region of S, C-terminally adjacent to the conserved YMDD motif in P. A complete overview of Dn/Ds values along both reading frames (Fig. 3c, model 8) showed an increase of Dn/Ds values from conservation (negative selection) towards neutrality, specifically at clusters with an enhanced proportion of positively selected sites.

DISCUSSION

Gene overlapping is a response to the pressure to minimize genome length, as found in small organisms and in organisms with RNA genomes that employ error-prone RNA polymerases (Barrell et al., 1976; reviewed by Gibbs & Keese, 1995). Alternatively, overlapping genes are thought to simplify the control of gene expression (Johnson & Chisholm, 2004). Various factors are thought to have influenced the evolution of redundancy in the genetic code (Ardell & Sella, 2001). Possibly, the skewed redundancy in the genetic code reflects that, in early organisms, overlapping, frame-shifted genes were common.
Table 3. Positively selected sites in the P and S overlapping reading frames

P residue   Probability   Dn/Ds*
30 T        0.999†        3.059±0.499
45 S        1.000†        3.060±0.496
46 T        1.000†        3.059±0.497
101 P       0.984‡        3.023±0.563
114 I       1.000†        3.060±0.496
116 Y       0.999†        3.058±0.499
126 D       1.000†        3.060±0.496
131 H       1.000†        3.060±0.496
141 K       0.996†        3.050±0.515
145 R       0.753         2.443±0.989

S residue   Probability   Dn/Ds*
3 N         0.937         2.398±0.394
24 R        0.753         2.090±0.718
40 N        0.985‡        2.474±0.200
44 G        1.000†        2.499±0.028
45 A        1.000†        2.499±0.026
47 T        0.987‡        2.478±0.186
49 P        0.958‡        2.432±0.323
122 K       0.999†        2.498±0.046
126 I       0.999†        2.498±0.048
130 G       0.907         2.346±0.482
161 F       0.858         2.270±0.563
204 S       1.000†        2.499±0.026
207 N       0.920         2.370±0.441
210 N       0.999†        2.498±0.050
213 L       0.984‡        2.473±0.203

*Dn/Ds ratios are provided with their standard errors.
†Posterior probability level of 99 %.
‡Posterior probability level of 95 %.

The evolution of a genetic region containing overlapping, frame-shifted genes is subject to extra constraints. A slower evolutionary rate of overlapping compared with non-overlapping genome regions has been demonstrated for a number of viruses, including HBV (Mizokami et al., 1997). These extra constraints arise because a neutral or beneficial substitution in one reading frame could be deleterious in the other reading frame. Therefore, synonymous substitutions and adaptive amino acid changes in one reading frame, which would be evolutionarily neutral or beneficial in non-overlapping genetic regions, may nevertheless be selected against because they are deleterious in the other reading frame. As a result, independent evolution of overlapping genes is restricted. The general trend demonstrated for the evolution of overlapping regions is that one of the two overlapping genes is subjected to positive selection (adaptive evolution), whilst the other is subjected to purifying selection.
In particular, this has been shown for papillomaviruses, in which adaptive evolution of the E2 region and purifying selection in the overlapping E4 region have been demonstrated (Hughes & Hughes, 2005; Narechania et al., 2005). The same phenomenon was demonstrated for the evolution of potato leafroll virus (Guyader & Ducray, 2002). For members of the family Microviridae, in which the A and B genes, as well as the D and E genes, overlap, one gene in each overlapping region was subjected to positive selection (Ka/Ks ratio >1, B and E genes), whilst purifying selection was operational in the other gene (Ka/Ks ratio <1, A and D genes) (Pavesi, 2006). Similarly, in SIV, the tat gene was found to be the most variable of the nine virus genes at the amino acid level, whereas the overlapping vpr gene appeared to be one of the most conserved (Hughes et al., 2001). Moreover, for SIV, adaptive evolution of tat mirrored by purifying selection in vpr has been demonstrated in vivo in experimentally infected monkeys (Hughes et al., 2001). Among viruses with overlapping genes, HBV provides a striking example, with 50 % of its genome containing overlapping reading frames (Fig. 1). For many other viruses, the overlapping of reading frames is partial and adaptive evolution of both genes can occur in non-overlapping regions. In contrast, the HBV surface protein gene S is overlapped completely by the polymerase gene P. Whilst the independent adaptive evolution of the P and S genes was shown to be constrained (Mizokami et al., 1997), it should not be ruled out, as both genes must adapt to a changing environment. We hypothesized that HBV may employ a mechanism that allows the independent adaptive evolution of both overlapping genes, by which the evolution of P occurs via p1/s3 non-synonymous substitutions, which are synonymous in S, and the evolution of S mainly occurs via p3/s2 non-synonymous substitutions, which are synonymous in P (Fig. 2).
To test this hypothesis, we analysed the variation of nucleotides and amino acids among 450 internationally obtained HBV genomes in the overlapping P and S gene region. We demonstrated that the evolution of the overlapping region of the P and S genes indeed occurs mainly via p1/s3 and p3/s2 substitutions, respectively, and that substitutions at the p2/s1 positions, which would affect the amino acids of both proteins, are rare (Fig. 3a). This mechanism was operational in HBV evolution both within and among genotypes (Table 1). A Dn/Ds ratio of <1 for the whole gene does not mean that adaptive evolution is not operational, as adaptive mutations could have accumulated in short regions of the gene, or even at a few nucleotide positions. By using PAML analysis, we identified positions in both the P and S genes where adaptive evolution is operational (Fig. 3c). Sliding-window analysis demonstrated that, whilst significant parts of the P or S genes were subjected to positive selection, with the Ka/Ks ratio for either the P or the S gene being >1, there were only a few regions where the Ka/Ks ratios in both genes were >1 (Fig. 4). Whilst HBV is an exceptional example of overlapping reading frames, this mechanism of independent evolution of the overlapping regions could also apply to other viruses. This point is supported by the observation of an increased frequency of amino acid residues with a high level of degeneracy (arginine, leucine and serine) in the proteins encoded by overlapping genes of several viruses (Pavesi et al., 1997).

ACKNOWLEDGEMENTS

We thank Willem Vermin, SARA Computing and Networking Services, Amsterdam, The Netherlands, for providing the information on the LISA computer array and help with software.

REFERENCES

Anisimova, M., Bielawski, J. P. & Yang, Z. (2001). Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol Biol Evol 18, 1585–1592.
Anisimova, M., Bielawski, J. P. & Yang, Z. (2002). Accuracy and power of Bayes prediction of amino acid sites under positive selection. Mol Biol Evol 19, 950–958.
Ardell, D. H. & Sella, G. (2001). On the evolution of redundancy in genetic codes. J Mol Evol 53, 269–281.
Barrell, B. G., Air, G. M. & Hutchison, C. A., III (1976). Overlapping genes in bacteriophage ΦX174. Nature 264, 34–41.
Bartholomeusz, A. & Schaefer, S. (2004). Hepatitis B virus genotypes: comparison of genotyping methods. Rev Med Virol 14, 3–16.
Chenna, R., Sugawara, H., Koike, T., Lopez, R., Gibson, T. J., Higgins, D. G. & Thompson, J. D. (2003). Multiple sequence alignment with the CLUSTAL series of programs. Nucleic Acids Res 31, 3497–3500.
Cooreman, M. P., Leroux-Roels, G. & Paulij, W. P. (2001). Vaccine- and hepatitis B immune globulin-induced escape mutations of hepatitis B virus surface antigen. J Biomed Sci 8, 237–247.
Gibbs, A. & Keese, P. K. (1995). In search of the origin of viral genes. In Molecular Basis of Virus Evolution, pp. 76–90. Edited by A. Gibbs, C. H. Calisher & F. Garcia-Arenal. Cambridge, UK: Cambridge University Press.
Guyader, S. & Ducray, D. G. (2002). Sequence analysis of potato leafroll virus isolates reveals genetic stability, major evolutionary events and differential selection pressure between overlapping reading frame products. J Gen Virol 83, 1799–1807.
Hall, T. A. (1999). BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41, 95–98.
Hannoun, C., Horal, P. & Lindh, M. (2000). Long-term mutation rates in the hepatitis B virus genome. J Gen Virol 81, 75–83.
Hughes, A. L. & Hughes, M. A. (2005). Patterns of nucleotide difference in overlapping and non-overlapping reading frames of papillomavirus genomes. Virus Res 113, 81–88.
Hughes, A. L., Westover, K., da Silva, J., O'Connor, D. H. & Watkins, D. I. (2001). Simultaneous positive and purifying selection on overlapping reading frames of the tat and vpr genes of simian immunodeficiency virus. J Virol 75, 7966–7972.
Johnson, Z. I. & Chisholm, S. W. (2004). Properties of overlapping genes are conserved across microbial genomes. Genome Res 14, 2268–2272.
Kumar, S., Tamura, K. & Nei, M. (2004). MEGA3: integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform 5, 150–163.
Mizokami, M., Orito, E., Ohba, K., Ikeo, K., Lau, J. Y. & Gojobori, T. (1997). Constrained evolution with respect to gene overlap of hepatitis B virus. J Mol Evol 44 (Suppl. 1), S83–S90.
Moskovitz, D. N., Osiowy, C., Giles, E., Tomlinson, G. & Heathcote, E. J. (2005). Response to long-term lamivudine treatment (up to 5 years) in patients with severe chronic hepatitis B, role of genotype and drug resistance. J Viral Hepat 12, 398–404.
Myers, R., Clark, C., Khan, A., Kellam, P. & Tedder, R. (2006). Genotyping hepatitis B virus from whole- and sub-genomic fragments using position-specific scoring matrices in HBV STAR. J Gen Virol 87, 1459–1464.
Narechania, A., Terai, M. & Burk, R. D. (2005). Overlapping reading frames in closely related human papillomaviruses result in modular rates of selection within E2. J Gen Virol 86, 1307–1313.
Pavesi, A. (2006). Origin and evolution of overlapping genes in the family Microviridae. J Gen Virol 87, 1013–1017.
Pavesi, A., De Iaco, B., Granero, M. I. & Porati, A. (1997). On the informational content of overlapping genes in prokaryotic and eukaryotic viruses. J Mol Evol 44, 625–631.
Yang, Z. (1997). PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13, 555–556.
Yang, Z., Nielsen, R., Goldman, N. & Pedersen, A. M. (2000). Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155, 431–449.
Yang, Z., Wong, W. S. & Nielsen, R. (2005). Bayes empirical Bayes inference of amino acid sites under positive selection. Mol Biol Evol 22, 1107–1118.