Journal of General Virology (2008), 89, 703–708. DOI 10.1099/vir.0.83451-0

Positively selected sites on the surface glycoprotein (G) of infectious hematopoietic necrosis virus

Scott E. LaPatra,1 Caryn Evilia2 and Vern Winston2

1 Clear Springs Foods Inc., PO Box 712, Buhl, ID 83316, USA
2 Department of Biological Sciences, Campus Box 8007, Idaho State University, Pocatello, ID 83209-8007, USA

Correspondence: Vern Winston

Received 17 September 2007. Accepted 1 November 2007.

Mutations in the surface glycoprotein (G) of infectious hematopoietic necrosis virus (IHNV), a rhabdovirus that causes significant losses in hatcheries raising salmonid fish, were studied. A 303 nt segment (mid-G region) of the gene encoding this protein was sequenced from 88 Idaho isolates of IHNV. Evidence of positive selection at individual codon sites was estimated by using a Bayesian method (MrBayes). A software algorithm (CPHmodels) was used to construct a three-dimensional (3D) representation of the IHNV protein. The software identified structural homologies between the IHNV G protein and the surface glycoprotein of vesicular stomatitis virus (VSV) and used the VSV structure as a template for predicting the IHNV structure. The amino acids predicted to be under positive selection were mapped onto the proposed IHNV 3D structure and appeared at sites on the surface of the protein where antigen–antibody interaction should be possible. The sites identified as being under positive selection on the IHNV protein corresponded to those reported by others as active sites of mutation for IHNV, and also as antigenic sites on VSV. Knowledge of the sites where genetic variation is positively selected enables a better understanding of the interaction of the virus with its host, and with the host immune system. This information could be used to develop strategies for vaccine development for IHNV, as well as for other viruses.
0008-3451 © 2008 SGM

INTRODUCTION

Infectious hematopoietic necrosis virus (IHNV) is an important rhabdoviral pathogen of salmonid fishes (Tordo et al., 2005), causing a large economic impact on commercial fish farms as well as hatcheries raising fish for restocking and mitigation efforts (http://usda.mannlib.cornell.edu/usda/current/TrouProd/TrouProd-02-26-2007.pdf). Because of its economic importance, IHNV has been the subject of intense study. Much of this effort has been directed toward an understanding of the evolution of the virus (Nichol et al., 1995; Oshima et al., 1995; Huang et al., 1996; Emmenegger et al., 2000; Troyer et al., 2000; Kurath et al., 2003; Troyer & Kurath, 2003), with particular focus on identifying the sites at which the virus proteins change as they evolve.

Because of physical and functional constraints, very few mutations result in an increase in virus fitness (Domingo, 2006). Mutations that result in decreased fitness of the virus are removed from the gene pool by negative selection. Some (if not most) changes are neutral: they have no negative effect on fitness, but also do not provide a selective advantage (Domingo, 2006). In the case of virus surface proteins, changes that enhance fitness might act in a number of ways. In one instance, the change might result in an improved interaction with the host, by more efficient host binding, entry or uncoating of the virus. Alternatively, the changes might disrupt the interaction of the virus with the proteins of the host immune system. Specific antibodies or receptors on the cells of the immune system might not recognize the altered epitopes as effectively, resulting in an enhanced ability of the virus to escape the defence systems of the host. Sites where such changes occur are under positive selection and are identifiable because the number of non-synonymous (amino acid-altering) changes at these sites exceeds the number of synonymous changes (Domingo, 2006; Yang et al., 2000).
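The distinction between synonymous and non-synonymous changes can be illustrated with a minimal sketch. This is not the method used in this study, which relied on a Bayesian codon model in MrBayes; it only classifies codon differences between two aligned coding sequences by using the standard genetic code. The function name and the example sequences are hypothetical, and raw counts such as these are not rates: codon-based methods also account for the number of synonymous and non-synonymous sites and for the phylogeny.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(seq_a, seq_b):
    """Return (synonymous, non_synonymous) counts of codons that differ.

    Both sequences must be aligned, in frame, gap-free and of equal length.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 nt long")
    synonymous = non_synonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue  # identical codons carry no change
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1      # same amino acid encoded
        else:
            non_synonymous += 1  # amino acid replaced
    return synonymous, non_synonymous


# Hypothetical three-codon example: CTG -> CTA is silent (both Leu),
# whereas AAA -> AGA replaces Lys with Arg.
print(count_codon_changes("ATGCTGAAA", "ATGCTAAGA"))
```

A site-by-site excess of the second count over the first, judged against the expectation under neutrality, is the signal that codon models formalize.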
Until the advent of high-speed computers, identification of sites of positive selection was not reliable. With the widespread availability of such computers, it has become possible to develop methods that identify individual sites under positive selection. Two of the most commonly used programs for this purpose are PAML (Yang, 2007) and MrBayes (Huelsenbeck & Dyer, 2004).

In our study, recent (1990–2006) isolates of IHNV were obtained from commercial fish farms in the state of Idaho, USA. A 303 nt segment of the major surface (G) protein gene was amplified and sequenced. The sequences were evaluated by using a fully Bayesian method (MrBayes; Huelsenbeck & Dyer, 2004) to identify codons where the rate of non-synonymous mutation exceeded that of synonymous mutation in a manner consistent with positive selection.

To test the relevance of these predictions, attempts were made to correlate the location of positively selected sites on the amino acid sequence with the three-dimensional (3D) structure of the virus protein. As the structure of the IHNV G surface protein has not been determined, we used a software algorithm (CPHmodels; Lund et al., 2002) to construct a 3D representation of the IHNV G protein. This approach utilized the structural and sequence homologies between the IHNV G sequence and the vesicular stomatitis virus (VSV) surface protein sequence (Roche et al., 2007) to construct a model of the IHNV molecule. The locations of positively selected sites were mapped onto the predicted 3D model of the IHNV protein.
The predicted sites on the IHNV sequence mapped onto the surface of the protein that would be expected to be in contact with antibodies and/or cellular receptors, and at sites reported by others (Huang et al., 1996; Troyer et al., 2000) to be sites of mutation in the IHNV molecule. These regions also corresponded to the major antigenic sites of the VSV surface protein (Vandepol et al., 1986).

METHODS

Source of virus. Isolates of IHNV were collected from outbreaks at commercial fish farms located in Idaho, USA, over the course of 16 years (1990–2006). Samples were inoculated into cell culture (EPC or CHSE-214 cells) and identified as IHNV by virus neutralization. Viral lysates were stored at −75 °C. None of the isolates were passaged more than three times in culture.

Isolation of RNA, RT-PCR and sequencing. RNA was isolated from cell lysates by using a QIAamp viral RNA mini kit (Qiagen) following the manufacturer's instructions. DNA was synthesized from viral RNA by using an RT-PCR kit (Qiagen One-Step RT-PCR) as directed by the manufacturer, using outer primers described by Emmenegger et al. (2000). Before sequencing, PCR primers were removed by using ExoSAP shrimp alkaline phosphatase (USB). PCR products were sequenced by using an ABI 3100 Genetic Analyzer (Idaho State University Molecular Research Core Facility), BigDye chemistry (ABI) and the inner primers described by Emmenegger et al. (2000) as sequencing primers. Each PCR product was sequenced in both directions and the output was analysed by using the Staden Package (Staden et al., 2000) for evaluation of base calls and production of a contiguous alignment of the complementary sequence.

Sequence analysis. Sequences were aligned by codon and trimmed to the 303 nt mid-G sequence reported by others (Emmenegger et al., 2000; Troyer et al., 2000; Kurath et al., 2003; Troyer & Kurath, 2003). Reference sequences included GenBank accession numbers AF237983–AF237992, which represent earlier Idaho isolates (Troyer et al., 2000), L40878, a representative of M clade isolates (Nichol et al., 1995), and L40881 (SRCV), used by others (Troyer & Kurath, 2003) as an outgroup sequence. Bayesian analysis was performed by using MrBayes 3.1.2 (Huelsenbeck & Dyer, 2004) in parallel mode. Specific commands for MrBayes were: 'lset nucmodel=codon nst=2 omegavar=m3', 'report possel=yes'. Each chain was run for 1.2×10⁶ cycles. The sump and sumt commands were used to tabulate posterior probabilities of positive selection of each amino acid site, and to build consensus trees. Results of estimations obtained before the process reached convergence were discarded. Typically, the first 200 000 cycles were discarded.

Mapping of IHNV sites on VSV 3D structure. The 3D structure of the IHNV G protein was predicted by using the CPHmodels 2.0 homology modelling server (http://www.cbs.dtu.dk/services/CPHmodels/) (Lund et al., 2002). The amino acid sequence of the IHNV G protein (GenBank accession no. AAC42146; SRCV strain) was used as input to the web interface. The software identified the B chain of the VSV surface protein (2CMZ.pdb) as the highest-scoring template candidate, and constructed a 3D representation of the IHNV protein based on this template. The locations of positively selected sites were visualized by using VMD software (Visual Molecular Dynamics). A CLUSTAL W alignment (Thompson et al., 1994) was used to confirm the correspondence between IHNV and VSV sequences (data not shown).

RESULTS AND DISCUSSION

Sequence analysis

In total, 88 sequences were obtained. Of these, seven were unique. The rest were observed between two and 31 times (Table 1). Representatives of each of these sequence groups were aligned by codon and analysed by using MrBayes.

Table 1. Representatives of each of the sequence groups and frequencies of occurrence among the 88 sequences obtained
The Fa18 sequence is identical to that of GenBank accession no. AF237991. The Fw38 sequence is identical to that of GenBank accession no. AF237983. The Fw40 sequence is identical to that of GenBank accession no. AF237987 (WRAC strain).

Representative isolate   GenBank accession no.   No. occurrences
Fa1                      EU249526                1
Fa2                      EU249527                9
Fa6                      EU249528                1
Fa9                      EU249529                22
Fa11                     EU249524                2
Fa13                     EU249525                2
Fa18                     EU249539                1
Fw4                      EU249535                1
Fw6                      EU249536                4
Fw7                      EU249537                1
Fw12                     EU249530                2
Fw14                     EU249531                31
Fw33                     EU249532                1
Fw34                     EU249533                1
Fw35                     EU249534                4
Fw38                     EU249538                2
Fw40                     EU249540                3

To validate the software used in these experiments, preliminary experiments used MrBayes to evaluate datasets of influenza HA protein (haemagglutinin), human immunodeficiency virus envelope protein and β-globin for evidence of positive selection. The results obtained were in strong agreement with those obtained by other computational approaches (Yang et al., 2000; Huelsenbeck & Dyer, 2004) (data not shown).

The consensus tree obtained by using MrBayes on the IHNV sequence is shown in Fig. 1. This tree was in agreement with the trees describing the relationships of Idaho strains reported by others (Troyer et al., 2000; Troyer & Kurath, 2003) in the following respects: (i) all (except the outgroup) were descended from group M strains; (ii) the two major clades that were observed were divided so that clade A–B contained the reference sequences associated with subgroups A and B of group M as previously reported (Troyer et al., 2000); and (iii) clade C–D contained the reference sequences associated with subgroups C and D reported by the same authors (Troyer et al., 2000).

Fig. 1. Tree produced by MrBayes using non-redundant Idaho sequences (indicated by a prefix of Fw or Fa). GenBank sequences with accession numbers AF237983–AF237992, L40881 (outgroup) and L40878 are also included. Suffixes following sequence names indicate the number of times that the sequence was observed. Clade credibility values (%) are shown above branches.

Estimates of mean posterior probability that a codon was under positive selection are represented in Table 2, Fig. 2 and Fig. 3(a–c). When all of the sequences (clade A–B and clade C–D) represented in Fig. 1 were analysed together, the amino acids with a mean probability of being under positive selection >95 % were aa 252, 256, 270, 272 and 277 (using the numbering scheme for the whole protein) (Table 2; Figs 2a, 3a). Inspection of the results showed a number of sites with mean probabilities >70 % of being under positive selection. These sites were 220, 247, 251 and 284 (Table 2; Figs 2a, 3a).

Table 2. Correlation between amino acid sites and posterior probabilities of positive selection
–, Values <70 %.

              Probability of positive selection
Amino acid    Clade A–B and C–D   Clade A–B   Clade C–D
220           0.746178            0.955825    –
247           0.812231            –           0.956028
251           0.776420            –           0.884004
252           0.964077            –           0.997489
256           0.999696            0.993278    0.998601
270           0.990682            0.994983    0.915735
272           0.968439            0.989305    0.960275
277           0.979865            0.964256    0.878237
284           0.728598            0.964192    –

These results are in general agreement with those of others. Huang et al. (1996) reported that changes at aa 78, 81, 230–231, 272–273 and 275–276 (Fig. 3e) of IHNV enabled mutants to escape neutralizing monoclonal antibodies. In another study, Troyer et al. (2000) reported observing non-synonymous mutations at IHNV aa 252, 256, 270, 275–277, 284 and 285 (Fig. 3f).
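The thresholds applied to Table 2, and the comparison with the sites listed by Troyer et al. (2000), can be restated as a short sketch. The probabilities and site lists are copied from the text and Table 2; the variable and function names are hypothetical, and the code only reproduces the classification by threshold, not the MrBayes analysis itself.

```python
# Mean posterior probabilities of positive selection from Table 2.
# Columns: (all sequences, clade A-B, clade C-D); None marks values <70 %.
TABLE_2 = {
    220: (0.746178, 0.955825, None),
    247: (0.812231, None, 0.956028),
    251: (0.776420, None, 0.884004),
    252: (0.964077, None, 0.997489),
    256: (0.999696, 0.993278, 0.998601),
    270: (0.990682, 0.994983, 0.915735),
    272: (0.968439, 0.989305, 0.960275),
    277: (0.979865, 0.964256, 0.878237),
    284: (0.728598, 0.964192, None),
}
DATASETS = ("all sequences", "clade A-B", "clade C-D")

# Sites of non-synonymous mutation reported by Troyer et al. (2000).
TROYER_2000 = {252, 256, 270, 275, 276, 277, 284, 285}


def classify_sites(column, strong=0.95, weak=0.70):
    """Split sites into those above the strong and the weak threshold."""
    strong_sites, weak_sites = [], []
    for site in sorted(TABLE_2):
        probability = TABLE_2[site][column]
        if probability is None:
            continue  # below 70 % in this dataset
        if probability > strong:
            strong_sites.append(site)
        elif probability > weak:
            weak_sites.append(site)
    return strong_sites, weak_sites


for column, name in enumerate(DATASETS):
    strong_sites, weak_sites = classify_sites(column)
    print(f"{name}: >95 % {strong_sites}; >70 % {weak_sites}")

# Sites in Table 2 that were also reported by Troyer et al. (2000).
shared = sorted(set(TABLE_2) & TROYER_2000)
print("shared with Troyer et al. (2000):", shared)
```

Run as written, the first loop gives the same site lists as stated in the text for each dataset, and the final line lists the five Table 2 sites (252, 256, 270, 277 and 284) that Troyer et al. (2000) also reported.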
The major difference between our study and that of Huang et al. (1996) was their observation of mutations at positions 230 and 231. We saw no amino acid changes at these positions in any of our sequences. Because we did not sequence the whole molecule, we could not observe the state of positions 78 and 81. It has been suggested that passage in cell culture may select for anomalous changes in virus proteins (Novella et al., 2005). The fact that our isolates had not been passaged extensively in culture may explain why we did not observe changes at these positions. The closer agreement between our results and those of Troyer et al. (2000) may be because both that study and ours used isolates that had not been passaged extensively.

As it has been suggested previously (Huelsenbeck et al., 2006) that patterns of selection can be different in different lineages, the sequences in each of the main clades shown in Fig. 1 were analysed separately by using MrBayes. The results obtained by using the sequences in clade A–B are shown in Table 2, Fig. 2(b) and Fig. 3(b). In this case, the amino acids with mean posterior probabilities of positive selection >95 % were 220, 256, 270, 272, 277 and 284. No other sites with a mean posterior probability >70 % were observed. The results when the sequences of clade C–D were analysed separately are shown in Table 2, Fig. 2(c) and Fig. 3(c). In this case, the amino acids with mean posterior probabilities of positive selection >95 % were 247, 252, 256 and 272. Amino acids with mean posterior probabilities >70 % were 251, 270 and 277 (Table 2; Figs 2c, 3c). Further research is needed to determine whether these changes in the virus protein provide the mutated virus with a selective advantage.

Fig. 2. Histograms of posterior probabilities that amino acids are under positive selection, obtained by using (a) all of the sequence data that produced the tree shown in Fig. 1; (b) the sequences in clade A–B (Fig. 1); (c) the sequences in clade C–D (Fig. 1). See Table 2 for values.

Structural comparisons

If positive selection is arising as a result of the interaction between virus and host, either as a function of antigen–antibody interaction or as a result of enhanced binding to cells, the sites involved must be located on the outside surface of the virus so that these interactions can occur. To visualize the possible location of positively selected sites on the virus, the 3D modelling software CPHmodels (Lund et al., 2002) was used. This software identified the B chain of the surface protein of VSV (2CMZ.pdb) as the highest-scoring template candidate. A 3D model of the IHNV protein was constructed by using the VSV protein as a template. The resulting structures are shown in Fig. 3(a–c), with the corresponding prefusion form of the VSV structure shown in Fig. 3(d) [note that, in Fig. 3(a–c), only the mid-G portion of the IHNV molecule is represented]. In this figure, the molecules are oriented so that the view is from the top, facing directly toward the membrane. Fig. 3(a–c) are labelled to represent sites identified as being under positive selection at the >95 % (red) and >70 % (green) confidence levels by MrBayes. Fig. 3(a) represents sites identified when all members of the dataset were included. Fig. 3(b, c) represent sites identified when only the members of clade A–B or clade C–D, respectively, were used in the analysis. Fig. 3(d) is labelled to reflect the locations of amino acids that change in monoclonal antibody escape mutants of VSV (Vandepol et al., 1986). Fig. 3(e, f) represent the result of mapping sites of IHNV G amino acid changes reported by others (Huang et al., 1996; Troyer et al., 2000) on the prefusion form of the molecule.

Fig. 3. Visualization of the possible locations of positively selected sites, obtained by using the 3D modelling software CPHmodels (Lund et al., 2002). (a) Clades A–B and C–D. (b) Clade A–B analysed alone. (c) Clade C–D analysed alone. For (a–c), the IHNV amino acids with a probability of >95 or >70 % (MrBayes) of undergoing positive selection are represented by red and green areas, respectively. (d) VSV epitopes (VSV numbering). (e) IHNV amino acids identified by Huang et al. (1996) as providing resistance to antibody neutralization. (f) IHNV sites identified by Troyer et al. (2000) as sites of non-synonymous mutation.

All of the IHNV G sites identified by the Bayesian approach as undergoing positive selection were on or near the top surface of the molecule. This is consistent with the hypothesis that these sites may be involved in interaction with host antibodies. The cluster of IHNV G sites from aa 270 to 277 is in a prominent α-helix (helix E), which should be readily accessible to antibodies. This region corresponds to the VSV A2 region, which is one of the two major epitopic regions of that virus (Fig. 3d). Helix E is also in a region of the molecule that does not change shape as the molecule converts from its prefusion state in the extracellular virion to the pH-activated state in the lysosome (Roche et al., 2007). This would allow for more amino acid substitutions in this region. Amino acid changes in a hinge region, for example, would be more damaging to the function of the protein. Notable is aa 274, which is conserved in all of these sequences. This could indicate that this site is critical for the stabilization of this helix or for the binding of the virus to the host cell.

The pattern of sites that were identified as undergoing positive selection by the Bayesian approach was in general agreement with reports of others (Huang et al., 1996; Troyer et al., 2000) (Fig. 3e, f). However, Huang et al. (1996) observed variation at aa 230 and 231 but, in our study, these sites were absolutely conserved. The difference between our study and theirs could be explained by the fact that, in their study, the virus had been passaged repeatedly in culture to produce antibody-escape mutants. It is known that repeated passage in culture can allow amino acid changes that are not observed in vivo (Novella et al., 2005). Troyer et al. (2000) reported no changes at these sites, which may reflect the fact that the virus used in that study had not been passaged extensively. Finally, the A1 epitopic region of the VSV structure appeared to share the same general region of the molecule as aa 78 and 81 of IHNV G, identified by Huang et al. (1996) as an epitope of IHNV. This would suggest that it might be informative if future studies also sequenced this region of this molecule.

Further work is needed to explore the suggestions provided by these results. A determination of the 3D structure of the IHNV G protein is needed to confirm the location of the positively selected sites on the surface of the protein identified by the Bayesian approach. If it is possible to identify the regions of the G protein that are undergoing rapid selection, it might be possible to design vaccines whose sequences mirror the specific patterns of change being observed.
Conversely, the fact that areas of the protein are conserved may imply that change in these areas is impossible if the virus is to remain viable. Vaccines directed toward these vital regions might be more effective, because the virus is prevented by structural constraints from mutating in these areas.

ACKNOWLEDGEMENTS

This work was partially supported by NIH grant P20 RR16454 from the Biomedical Research Infrastructure Network/Idea Network of Biomedical Research Excellence BRIN/INBRE Program of the National Center for Research Resources. The authors acknowledge Luobin Yang for his assistance with software, and Gael Kurath and Ryan Troyer for sharing unpublished sequences. Eric Anderson provided careful reading and helpful comments on the manuscript. George Vidaver provided important guidance on protein structure questions.

REFERENCES

Domingo, E. (2006). Virus evolution. In Fields Virology, 5th edn, pp. 389–406. Edited by D. M. Knipe & P. M. Howley. Baltimore, MD: Lippincott Williams & Wilkins.

Emmenegger, E. J., Meyers, T. R., Burton, T. O. & Kurath, G. (2000). Genetic diversity and epidemiology of infectious hematopoietic necrosis virus in Alaska. Dis Aquat Organ 40, 163–176.

Huang, C., Chien, M. S., Landolt, M., Batts, W. & Winton, J. (1996). Mapping the neutralizing epitopes on the glycoprotein of infectious haematopoietic necrosis virus, a fish rhabdovirus. J Gen Virol 77, 3033–3040.

Huelsenbeck, J. P. & Dyer, K. A. (2004). Bayesian estimation of positively selected sites. J Mol Evol 58, 661–672.

Huelsenbeck, J. P., Jain, S., Frost, S. W. D. & Kosakovsky-Pond, S. L. (2006). A Dirichlet process model for detecting positive selection in protein-coding DNA sequences. Proc Natl Acad Sci U S A 103, 6263–6268.

Kurath, G., Garver, K. A., Troyer, R. M., Emmenegger, E. J., Einer-Jensen, K. & Anderson, E. D. (2003). Phylogeography of infectious haematopoietic necrosis virus in North America. J Gen Virol 84, 803–814.

Lund, O., Nielsen, M., Lundegaard, C. & Worning, P. (2002). CPHmodels 2.0: X3M a computer program to extract 3D models. In Abstracts of the CASP5 conference (the Fifth Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction), Asilomar, CA, USA, 2002, A102. http://www.cbs.dtu.dk/services/CPHmodels/abstract.php

Nichol, S. T., Rowe, J. E. & Winton, J. R. (1995). Molecular epizootiology and evolution of the glycoprotein and non-virion protein genes of infectious hematopoietic necrosis virus, a fish rhabdovirus. Virus Res 38, 159–173.

Novella, I. S., Gilbertson, D. L., Borrego, B., Domingo, E. & Holland, J. J. (2005). Adaptability costs in immune escape variants of vesicular stomatitis virus. Virus Res 107, 27–34.

Oshima, K. H., Arakawa, C. K., Higman, K. H., Landolt, M. L., Nichol, S. T. & Winton, J. R. (1995). The genetic diversity and epizootiology of infectious hematopoietic necrosis virus. Virus Res 35, 123–141.

Roche, S., Rey, F., Gaudin, Y. & Bressanelli, S. (2007). Structure of the prefusion form of the vesicular stomatitis virus glycoprotein G. Science 315, 843–848.

Staden, R., Beal, K. F. & Bonfield, J. K. (2000). The Staden package, 1998. Methods Mol Biol 132, 115–130.

Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.

Tordo, N., Benmansour, A., Calisher, C., Dietzgen, R. G., Fang, R.-X., Jackson, A. O., Kurath, G., Nadin-Davis, S., Tesh, R. B. & Walker, P. J. (2005). Family Rhabdoviridae. In Virus Taxonomy: Eighth Report of the International Committee on Taxonomy of Viruses, pp. 623–644. Edited by C. Fauquet, M. A. Mayo, J. Maniloff, U. Desselberger & L. A. Ball. San Diego, CA: Academic Press.

Troyer, R. M. & Kurath, G. (2003). Molecular epidemiology of infectious hematopoietic necrosis virus reveals complex virus traffic and evolution within Southern Idaho aquaculture. Dis Aquat Organ 55, 175–185.

Troyer, R. M., Lapatra, S. E. & Kurath, G. (2000). Genetic analyses reveal unusually high diversity of infectious haematopoietic necrosis virus in rainbow trout aquaculture. J Gen Virol 81, 2823–2832.

Vandepol, S. B., Lefrancois, L. & Holland, J. J. (1986). Sequences of the major antibody binding epitopes of the Indiana serotype of vesicular stomatitis virus. Virology 148, 312–325.

Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24, 1586–1591.

Yang, Z., Nielsen, R., Goldman, N. & Pedersen, A.-M. (2000). Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155, 431–449.
