Selection on Rev during persistent equine infectious anemia virus infection

Wendy O. Sparks1, Karin Dorman2,3, Sijun Liu1, and Susan Carpenter*1,4

1 Department of Veterinary Microbiology and Preventive Medicine, 2 Department of Genetics, Development and Cell Biology, 3 Department of Statistics, Iowa State University, Ames, IA, 50011 and 4 Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA 99164

*Corresponding author
Susan L. Carpenter, Dept. of Veterinary Microbiology & Pathology, Washington State University, Pullman, WA 99164-7040
E-mail: [email protected]
Telephone: 509-335-6043
FAX: 509-335-8529

Running title: EIAV Rev selection in vivo
Total # words in summary: 240
Total # words in main text and summary: 5397
Total # of figures and tables: 5

SUMMARY

Longitudinal analyses of Rev variation in horses infected with equine infectious anemia virus (EIAV) have revealed the presence of two subpopulations of Rev that co-existed and differed in genotype and phenotype. To better understand the role of Rev variation in EIAV persistence, computational and genetic analyses were used to examine Rev selection and fitness in vivo. Rev evolution is complicated by the fact that it overlaps with the transmembrane protein coding region, so we developed a novel technique for quantitating selection in both reading frames. Overall, the Rev protein was highly conserved, with purifying selection dominating evolution of Rev in a sample of over 300 clones. However, mutations nonsynonymous in both reading frames were surprisingly well tolerated, especially among the most frequently sampled mutations. To investigate whether the most common nonsynonymous mutations could modulate Rev phenotype, we studied the phenotypic effect of ten amino acid mutations observed at a frequency greater than 10% in the sample population.
Nine of the 10 mutations were found to significantly alter Rev nuclear export activity, either as single mutations or in the context of cumulatively fixed mutations. These results indicate that limited genetic variation in Rev can result in significant phenotype changes that may confer a selective advantage in vivo. Indeed, special sites, nonessential in both reading frames, may be especially tolerant of genetic variation. The resulting phenotypic variation in Rev may be an important mechanism of immune evasion and lentivirus persistence in vivo.

INTRODUCTION

Equine infectious anemia virus (EIAV) is a member of the lentivirus genus within the family of retroviruses. EIAV has the characteristic features shared by all lentiviruses, such as a complex genome, tropism for cells of the monocyte/macrophage lineage, and lifelong persistent infection. However, it is unique among lentiviruses in that it results in a dynamic disease course characterized by recurrent cycles of fever, viremia and thrombocytopenia. Most animals eventually gain control of viral replication, progressing to a clinically inapparent stage of disease, yet remaining inapparent carriers of the virus for life. The dynamics of clinical disease and immune control make EIAV a good model to study the role of both host and viral mechanisms contributing to lentiviral persistence and pathogenesis.

High genetic variation has been observed in the rev/tm overlapping reading frames of the EIAV genome, which encode the regulatory protein Rev and the cytoplasmic tail of the transmembrane (TM) protein (Alexandersen and Carpenter, 1991; Leroux et al., 1997; Belshan et al., 1998). Rev is an essential regulatory protein that acts to transport partially spliced and unspliced RNA into the cytoplasm. These RNAs are translated into the structural proteins necessary for replication, and also provide full-length viral genomes for encapsidation.
Variation in HIV-1 Rev has been shown to down-regulate the expression of viral late genes and alter sensitivity to Gag-specific cytotoxic T lymphocytes (CTL) (Bobbitt et al., 2003). In addition, CTL epitopes have been identified within HIV-1 Rev (Aldo et al., 2001), as well as within EIAV Rev (Mealey et al., 2003). In EIAV-infected horses, nonprogressors exhibited a strong-avidity CTL response to epitopes within Rev, while progressors did not (Mealey et al., 2003). Genetic changes within rev may facilitate immune evasion directly by altering CTL epitopes in Rev, and/or indirectly through altering Rev nuclear export activity and decreasing expression of structural proteins and virion production.

Previously, we undertook longitudinal analyses of EIAV Rev variation throughout a clinically dynamic disease course in one pony experimentally infected with the virulent EIAVWYO2078 (Belshan et al., 2001; Baccam et al., 2003). This pony exhibited a classical disease course, with an acute stage of disease followed by a chronic stage of recurrent febrile episodes, which decreased in frequency and severity over time. Concurrent with maturation of the humoral immune response, this pony entered the inapparent stage of disease, which was then followed by two late febrile episodes. Phylogenetic and partition analyses identified multiple subpopulations of EIAV Rev that were independently evolving, coexisted throughout disease, and exhibited different phenotypes (Baccam et al., 2003). Moreover, these phenotypically distinct subpopulations fluctuated in dominance in a manner coincident with disease state, such that the subpopulation with high Rev phenotype was dominant during the chronic and late chronic stages of disease, whereas a subpopulation with lower Rev phenotype was dominant during the inapparent stage of disease.
These studies indicated that in vivo selection on EIAV may drive genetic and phenotypic variation in Rev. In the present study, we used genetic and biological analyses to characterize the evolution and selection of Rev variants in vivo. We conclude that although Rev is under largely purifying selection, there is enough genetic flexibility, possibly in nonessential regions of both Rev and TM, to substantially modulate Rev phenotype. Selection at these sites could contribute to virus escape in the face of a maturing immune response.

MATERIALS AND METHODS

Experimental infections and identification of Rev variants

The virulent Wyoming strain of EIAV was used to infect pony 524, and sequential serum samples were collected from different stages of clinical disease as previously described (Belshan et al., 2001). The inoculum has been maintained by serial in vivo passage and contains a heterogeneous population of EIAV, similar to a natural infection. Virion RNA was isolated from the inoculum and from serum samples collected post infection. The rev exon 2/tm overlapping region of EIAV was amplified via RT-PCR. A total of 61 rev clones were obtained from the inoculum and 23-25 clones from each of 11 serum samples taken at 12, 35, 67, 89, 118, 201, 289, 385, 437, 754, and 800 dpi, for a total sample population of 320 clones. All nucleotide sequences were translated in the Rev open reading frame. Amino acid variants were named in the order they were identified, with identical variants given the same name, e.g. R1. Nucleotide variants were named based on the corresponding amino acid variant name, e.g. R1A, R1B, etc.

Genetic analysis of Rev variation and evolution

The consensus rev sequence from the inoculum was calculated using Bioedit 5.0.9 (Hall, 1999), and corresponded to the variant R1A. The amino acid consensus sequence of the inoculum corresponded to the Rev amino acid variant R1.
This variant was used for comparison in both computational and biological analyses.

To test for evidence of selection on the Rev sequences, we developed a novel technique inspired by the Nei & Gojobori (1986) method (NG method) for non-overlapping reading frames (manuscript in preparation). Because rev overlaps with the tm reading frame, we classified mutations into four distinct selection classes: double synonymous (SS), double nonsynonymous (NN), synonymous in rev and nonsynonymous in tm (SN), or nonsynonymous in rev and synonymous in tm (NS). Transitions are much more common than transversions; therefore, we also distinguished two subclasses (transitions and transversions) within each of the four selection classes. In the NG method, the number of observed mutations in each class is computed and compared to the number of opportunities for each type of mutation.

The connection between observed mutations and opportunities for mutations was made through various plausible models. The neutral model was a fully specified model with no free parameters. It assumed that each type of mutation occurs in direct proportion to the number of opportunities it was given. As a very simple example, suppose you observe a single nucleotide A. It has two opportunities to experience a transversion (to C or T) and one opportunity to experience a transition (to G), thus we expect 2/3 of the observed mutations at this site to be transversions. We considered more complex models with up to four parameters $s_t$, $s_r$, $s_{tr}$, $s_v$, where $s_t$ is the selection coefficient acting on SN type changes (nonsynonymous in TM), $s_r$ against NS changes (nonsynonymous in Rev), $s_{tr}$ against NN changes, and $s_v$ against transversions.
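The classification into selection classes and the single-site neutral example above can be sketched as follows. This is only an illustration, not the authors' implementation (their method is described as a manuscript in preparation). The function names, the 0-based frame offsets `rev_frame` and `tm_frame`, and the toy sequence are assumptions made for the example; the codon table is the standard genetic code. Changes to stop codons are simply reported as nonsynonymous here, whereas the study removed them before analysis.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
PURINES = {"A", "G"}


def is_transition(old, new):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return old != new and ((old in PURINES) == (new in PURINES))


def effect_in_frame(seq, pos, new_base, frame):
    """Return 'S' or 'N' for a single-base change as read in one frame.

    `frame` is the 0-based index of the first base of the first codon
    (a hypothetical convention chosen for this sketch).
    """
    start = frame + 3 * ((pos - frame) // 3)
    if pos < frame or start + 3 > len(seq):
        raise ValueError("position is not covered by a complete codon")
    codon = seq[start:start + 3]
    offset = pos - start
    mutant = codon[:offset] + new_base + codon[offset + 1:]
    return "S" if CODON_TABLE[codon] == CODON_TABLE[mutant] else "N"


def classify(seq, pos, new_base, rev_frame, tm_frame):
    """Selection class (rev effect, then tm effect) and mutation subclass."""
    cls = (effect_in_frame(seq, pos, new_base, rev_frame)
           + effect_in_frame(seq, pos, new_base, tm_frame))
    sub = "transition" if is_transition(seq[pos], new_base) else "transversion"
    return cls, sub


def neutral_subclass_fractions(base):
    """Expected transition/transversion fractions at one site under neutrality."""
    targets = [b for b in BASES if b != base]
    n_ts = sum(is_transition(base, b) for b in targets)
    return {"transition": n_ts / 3, "transversion": (3 - n_ts) / 3}


# Toy sequence (not EIAV data): rev read from index 0, tm from index 1.
print(classify("GGCAAAG", 2, "T", rev_frame=0, tm_frame=1))
print(neutral_subclass_fractions("A"))
```

In a full analysis the same classification would be applied to every possible single-base change at every site to obtain the opportunity counts for each class.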
Transversion selection was presumed multiplicative with protein selection, justified by the fact that selection against transversions is actually a mechanistic bias occurring at the level of replication and independent of protein selection. Again, as a simple example, suppose the transversion A to C is a type SN mutation. Then the fitness of the C mutant relative to any double synonymous (SS) transition is $(1 - s_t)(1 - s_v)$. Fitness effects of mutations at multiple sites were assumed independent.

We describe two methods for computing observed mutation counts. The original NG method compares pairs of sequences and records the observed number of mutations in each mutation class by averaging over all possible mutation pathways; counts may be fractional because of averaging. Because there was insufficient variation in any pair of sequences to perform statistical tests, we summed pairwise observed counts over all pairs of sequences, and divided by the total number of mutations across all pairwise comparisons to obtain observed mutation probabilities, $p_c$ for class $c$. We used the expected number of unique mutations in each class as the observed data. Since the total number of unique mutations in the data set was 126 (after removing 4 mutations to stop codons in rev and 2 in tm), we rounded $126\,p_c$ to the nearest integer to obtain the observed number of mutations in class $c$. By limiting the total number of mutations to 126, we neglected the possibility of parallel or back substitutions and therefore obtained a minimum estimate of the observed number of mutations. We call this method for obtaining observed mutation counts the pairwise comparison method.

The above method may give excess importance to mutations appearing deep in the evolutionary tree, because many pairs of sequences differ by these mutations.
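As a minimal sketch of the two calculations just described, the snippet below computes the multiplicative relative fitness of a mutation and converts summed pairwise counts into integer observed counts. The function names and all numeric inputs are hypothetical and chosen only for illustration; they are not values from this study.

```python
def relative_fitness(sel_class, transversion, s_t=0.0, s_r=0.0, s_tr=0.0, s_v=0.0):
    """Fitness relative to a double synonymous (SS) transition.

    Protein-level selection depends on the selection class; the transversion
    penalty multiplies it, as assumed in the text.
    """
    protein = {"SS": 1.0, "SN": 1.0 - s_t, "NS": 1.0 - s_r, "NN": 1.0 - s_tr}[sel_class]
    mechanistic = 1.0 - s_v if transversion else 1.0
    return protein * mechanistic


def observed_counts(pairwise_totals, n_unique=126):
    """Scale summed pairwise counts to n_unique mutations and round.

    Rounding each class separately means the result is not guaranteed to sum
    exactly to n_unique.
    """
    total = sum(pairwise_totals.values())
    return {c: round(n_unique * v / total) for c, v in pairwise_totals.items()}


# Hypothetical selection coefficients: an SN transversion.
print(relative_fitness("SN", True, s_t=0.5, s_v=0.9))

# Hypothetical summed pairwise counts per selection class.
print(observed_counts({"SS": 10.0, "NS": 30.0, "SN": 20.0, "NN": 40.0}))
```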
Given the low diversity of this data set, it was possible to identify 60 non-overlapping blocks of mutations. Within blocks, mutations are dependent because of codons and overlapping reading frames, but between blocks, mutations were assumed independent. Of these 60 blocks, only six blocks had variants with two or three simultaneous mutations different from the block consensus; some blocks had more than one distinct multiply mutated variant. In all cases, it was possible to identify a most parsimonious pathway for accumulation of these mutations because intermediate mutants were always available in our sample. For the seven cases where there were multiple intermediates and thus multiple equally parsimonious pathways to a multi-mutant, six cases had one intermediate far more prevalent than the other, and we assumed the mutation pathway with the most prevalent intermediates. In the last case, where one intermediate was present in one copy and the other in two, we gave each pathway equal probability and averaged over pathways, resulting in ½ count each in the NS and SN categories. In the end, we identified 137 mutational events, a few more than the number of unique mutations (126) because seven mutations happened in two contexts and one mutation happened in three contexts. We also fit these 137 mutations to the selection models described above in what we call the most parsimonious reconstruction method.

Given observed mutation counts, we then maximized the likelihood of observing the expected mutations over all or some of the parameters and compared nested models using the likelihood ratio test (LRT) statistic (Casella & Berger, 2001). For comparing non-nested models, say M1 and M2, of equal complexity, we performed a parametric bootstrap to obtain an empirical approximation of the sampling distribution of the likelihood ratio under M1.
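The nested-model comparison can be illustrated with the simplest case: the neutral model against a one-parameter model with a transversion coefficient $s_v$. The sketch below assumes a multinomial likelihood in which class probabilities are proportional to opportunities times relative fitness. That likelihood form, the function names, and the counts in the usage example are assumptions for illustration, not the authors' code or data. With only two categories the maximum likelihood estimate of $s_v$ has a closed form, and the p-value uses the chi-square approximation with one degree of freedom.

```python
import math


def log_likelihood(counts, probs):
    """Multinomial log-likelihood, omitting the constant multinomial
    coefficient (it cancels in likelihood ratios)."""
    return sum(n * math.log(probs[c]) for c, n in counts.items() if n > 0)


def class_probs(opportunities, s_v):
    """Class probabilities proportional to opportunities times fitness."""
    weights = {
        "transition": opportunities["transition"],
        "transversion": opportunities["transversion"] * (1.0 - s_v),
    }
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def fit_s_v(counts, opportunities):
    """Closed-form MLE of s_v; requires at least one observed transition."""
    ratio = (counts["transversion"] / counts["transition"]) * (
        opportunities["transition"] / opportunities["transversion"])
    return 1.0 - ratio


def lrt_neutral_vs_transversion(counts, opportunities):
    """Return (s_v estimate, LRT statistic, chi-square p-value with 1 df)."""
    s_v = fit_s_v(counts, opportunities)
    ll0 = log_likelihood(counts, class_probs(opportunities, 0.0))
    ll1 = log_likelihood(counts, class_probs(opportunities, s_v))
    stat = 2.0 * (ll1 - ll0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return s_v, stat, p_value


# Hypothetical counts: 90 transitions and 10 transversions observed, with
# twice as many transversion opportunities as transition opportunities.
print(lrt_neutral_vs_transversion(
    {"transition": 90, "transversion": 10},
    {"transition": 1, "transversion": 2},
))
```

Models with more parameters generally have no closed-form estimate and would need numerical maximization; comparing non-nested models would replace the chi-square step with simulation under M1.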
We then computed the p-value to reject M1 as the proportion of times the likelihood ratio fell below the observed likelihood ratio (Coulibaly and Brorsen 1999). We caution that the maximum likelihood estimates of selection coefficients may be biased (Ina 1995, Yang 2000), but the biases tend to be small for sequences with low diversity (Nei & Gojobori 1986). While parameter estimates varied somewhat (within the reported confidence intervals), the results of model comparisons were consistent for several different methods of computing observed and opportunity distributions. For example, in the pairwise comparison method when codons are mutated at two positions, either mutation may have occurred first and can impact how the mutations are classified. We obtained the same results when assuming one mutation order (selected at random) vs. averaging over all orders.

Assays of Rev nuclear export activity

A Rev expression vector was constructed by replacing the second exon of pRevWT (Belshan et al., 1998) with R1, the consensus sequence in the inoculum. Specific mutations were introduced into the R1 background using PCR-based mutagenesis, and all mutations were confirmed by sequencing. The Rev nuclear export activity of each of the mutants was determined in transient transfection assays using the pDM138-based CAT reporter plasmid pERRE-All, which contains the EIAV RRE (nt 5280-7534) and the chloramphenicol acetyltransferase (CAT) gene as previously described (Belshan et al., 1998; Harris et al., 1998). Briefly, 293 cells were seeded in triplicate at 1-5 × 10^5 cells/well in 6-well tissue culture dishes, and transfected with 0.2 μg of pERRE-All, 0.2 μg of β-galactosidase reporter plasmid pCH110 (Pharmacia, Uppsala, Sweden), 1 μg of Rev expression plasmid or empty vector along with 0.60 μg pUC19 for a total of 2 μg DNA per reaction.
Each experiment included a sham group that contained no reporter plasmid, but an additional 0.2 μg of pUC19. At 48 hours post transfection, cells were harvested in phosphate-buffered saline (PBS) containing 0.5 mM EDTA, pelleted, resuspended in 500 μl 0.25 M Tris, pH 7.5, and lysed by three rounds of freeze-thawing. Cell lysates were assayed for β-galactosidase activity and these values were used to normalize for transfection efficiency. Cell lysates were assayed for CAT enzyme using a commercial CAT ELISA kit (Roche Molecular Biochemicals, Indianapolis, IN). Experiments were performed in triplicate, and results represent at least 6 independent transfections. Statistical analysis was performed using analysis of variance (ANOVA) and Student's t-test assuming unequal variance among groups.

Nucleotide sequence accession numbers

GenBank accession numbers are AF314257 to AF314404.

RESULTS

EIAV Rev variation in vivo

To accurately reflect the genetic diversity of an in vivo infection, pony 524 was inoculated with the highly virulent Wyoming strain of EIAV (Belshan et al., 2000), which has been maintained by serial in vivo passage (Oaks et al., 1998). Following experimental infection, pony 524 experienced a variable clinical disease course characterized by recurring fever cycles interspersed with afebrile periods ranging from days to months. In the data set of 320 rev clones, there were 146 unique nucleotide variants and 99 unique amino acid variants. This included 61 clones from the inoculum, with 39 unique nucleotide variants and 25 unique amino acid variants. The amino acid variant R1 was the consensus sequence of the inoculum, and the most frequently observed variant overall. All genotypic and phenotypic analyses were performed relative to R1A, which was the dominant nucleotide sequence encoding the amino acid variant R1.

Purifying selection dominates evolution of both Rev and TM

Analyses of Rev evolution are complicated by the fact that the second exon of Rev, which contains the functional domains (Fridell et al., 1993; Lee et al., 2006), overlaps with the cytoplasmic tail of the transmembrane protein coding region. We tested various models of dual-protein selection to explain all non-stop codon mutations observed in the sample of 320 nucleotide sequences. Tables 1 and 2 display the models in order of increasing complexity or degrees of freedom for pairwise comparison of sequences (Table 1) and the most parsimonious reconstruction of mutations (Table 2); only the best-fitting model at each level of complexity is shown. The parameters that can be included in each model are the selection coefficient against transversions $s_v$, the selection coefficient against nonsynonymous changes in TM $s_t$, the selection coefficient against nonsynonymous changes in Rev $s_r$, and the selection coefficient against double nonsynonymous changes in both reading frames $s_{tr}$. These parameters generally range from zero, indicating no selection, to one, indicating maximal negative selection against change; negative values indicate positive selection. NE means the parameter was not estimated in this model, i.e. the corresponding selection coefficient was set to zero. NA means the parameter does not exist in this model. For each level of complexity, i = 0, 1, 2, 3, and 4, we fit all possible models with i parameters. We also tested multiplicative selection models where $s_{tr} = s_r s_t$ wherever applicable.

The neutral model assumed no mutation bias or selection. Under the pairwise sequence comparison method, Table 1 shows the neutral model fit significantly worse than the one-parameter model where transitions were highly favored over transversions (p-value < 0.0001).
The transition model, in turn, fit significantly worse than the best two-parameter selection model, which selected against single frame nonsynonymous mutations, but treated double nonsynonymous mutations as neutral (p-value < 0.0001). Finally, the best three-parameter model implied strong selection in the Rev reading frame but left double nonsynonymous mutations essentially “neutral” (p-value 0.03). An alternative three-parameter model that could not be rejected using the bootstrap test set $s_t = 0$ and estimated $s_r = 0.53$, $s_{tr} = -1.01$, $s_v = 0.99$, emphasizing again the strong selection against change in Rev and the apparent selection for double nonsynonymous mutations, especially relative to single nonsynonymous changes. The best fitting three-parameter model fit no worse than the full four-parameter model or the most general seven-parameter model (p-values > 0.4).

The above analysis generated information about the observed mutations by comparing pairs of sequences. The approach gave excess weight to mutations occurring deep in the phylogenetic tree, since many pairs of sequences differed by these early mutational events. Assuming no temporal change in selection, old mutations should display the same patterns as recent mutations, but the presence of relatively few old mutations means random variation in patterns can be accentuated by the pairwise analysis. To overcome this bias, we reconstructed the most parsimonious series of mutational events by examination of the data. Most mutations occurred in a single local context in the reconstruction. Eight mutations occurred in multiple contexts and each occurrence counted as one observed mutation. Although this approach avoids excess weighting of deep branch mutations, the results are conditional on the hypothesized pathways for accumulation of mutations.
In addition, the separation of mutations into independent blocks could under-count mutations that appeared in the same local sequence background but distinct global backgrounds. As we observed for the pairwise comparison method, the results in Table 2 indicated that the best fitting one-parameter model involved a significant selection coefficient against transversions (p-value < 0.0001). The best two-parameter selection model equally disfavored Rev nonsynonymous and double nonsynonymous mutations, leaving changes in TM effectively neutral (p-value 0.03). Several nearly equivalent fits at this level suggested that the protein selection coefficients $s_t$, $s_r$, $s_{tr}$ were statistically indistinguishable and substantially greater than $s_v$. Trends in the relationship between these parameters are shown by the fit of the four-parameter model, but again, these differences were not statistically supported by the data. No three-parameter or greater complexity model fit this data better than the best-fitting two-parameter model.

Both tables include bootstrap-derived confidence intervals for each estimated parameter. “Selection” (selection coefficient 0.93 to 0.99) against transversions was powerful and consistently estimated across all models and methods. Selection against Rev (selection coefficient 0.32 or 0.77) tended to be stronger than selection against TM (selection coefficient 0.0 or 0.6), though the confidence intervals did not rule out equal effects. In both the two- and three-parameter model based on the pairwise analysis, double nonsynonymous mutations were effectively neutral and favored over single-frame nonsynonymous mutations. In the parsimonious reconstruction, double nonsynonymous mutations were about as disfavored as any single nonsynonymous mutation. Comparison of these analyses suggested a difference between high and low frequency mutations.
A statistical test for equal distributions of mutation type between high frequency (>0.10) and low frequency (<0.10) mutations indicated double nonsynonymous mutations were highly over-represented among high frequency mutations (p-value: 0.001252). Together, the tests showed that selection acts to suppress amino acid change in both reading frames, and especially in Rev, but that double nonsynonymous mutations were sufficiently well tolerated to become the most frequent mutation type observed.

Single amino acid substitutions significantly alter Rev phenotype

Ten nucleotide mutations were present at a frequency greater than 0.10 of the total population (Fig. 1A). Surprisingly, given the selection against amino acid change in Rev, nine of these mutations were nonsynonymous in Rev. All but one were also nonsynonymous in TM, and this Rev codon experienced a second high frequency mutation that was nonsynonymous in both reading frames. Except for this case, only one amino acid variant dominated at all the sites with frequent nonsynonymous mutations (Fig. 1B). Therefore, we identified nine highly variable amino acid positions in Rev, which resulted in ten different amino acid variants.

To examine the effect of amino acid changes on Rev phenotype, each of the ten single amino acid mutations was introduced in the backbone of R1, the consensus of the inoculum (Fig. 2A). Rev nuclear export activity was quantified using transient transfection assays and activity was normalized relative to R1. Seven of the ten amino acid mutations significantly altered Rev phenotype: six increased Rev activity and one decreased Rev activity (Fig. 2B). The only mutations located within a known functional domain of Rev (Fridell et al., 1993; Mancuso et al., 1994; Harris et al., 1998; Lee et al., 2006) were the changes at position 55, at the C-terminal end of the nuclear export signal.
Both S55L and S55P significantly increased Rev activity as compared to R1. Interestingly, three of the four mutations located within a non-essential region of Rev, amino acids 131-143 (Lee et al., 2006), resulted in significant changes in activity: D135G and Q138R increased activity whereas G134D decreased activity. These findings indicate that EIAV Rev nuclear export activity is highly sensitive to point mutations, even those that occur outside known functional domains. Further, the majority of mutations observed at a high frequency in vivo were sufficient to cause significant changes in Rev nuclear export activity.

Fixation of preexisting and in vivo mutations in EIAV Rev

To gain further insight into how the genetic mutations were related to the temporal evolution and selection of EIAV Rev, we examined the mutations over time. R1 was the dominant variant in the inoculum, as well as during the acute and inapparent stages of disease. Four of the ten high frequency amino acid mutations observed in vivo were preexisting in the inoculum and persisted throughout the course of disease. These included S55L, G134D, D135G and Q138R. The V112A mutation was observed near the end of the acute stage of disease in the background of the G134D mutation. Although no further high frequency mutations were observed in this background, G134D/V112A variants persisted through the last time point of the inapparent stage of disease. Variants containing the S55L mutation were observed in the inoculum, as well as the acute and inapparent stages of disease. The persistence of S55L was due primarily to the recurrence of variants that had been observed previously, and not to new variants which had the S55L mutation. The remaining five mutations arose during the course of infection and were fixed in the background of the D135G/Q138R mutations.
The mutation R127K appeared at dpi 118 in the chronic stage of disease, followed by G110D and V105A at dpi 201 in the inapparent stage of disease, and culminating with the simultaneous appearance of S55P and R143H at dpi 754, during the late chronic stage of disease. At dpi 800, 91% of the variants sampled contained these 7 amino acid changes, which resulted in a significant increase in Rev nuclear export activity (Belshan et al., 2001).

It was of interest to determine if the cumulative fixation of mutations in the D135G/Q138R background conferred greater fitness, as indicated by higher Rev activity. A series of constructs was created that reflected the appearance and fixation of the high frequency mutations through 800 days post infection (Fig. 3A). Rev phenotype was quantified in transient expression assays and results were expressed as activity relative to R1 (Fig. 3B). Evo-1 contains the Q138R mutation in the backbone of R1 and showed a significant increase in Rev activity, to 183. Evo-2 adds the D135G mutation, while constructs Evo-3 through Evo-6 represent the cumulative fixation of the five remaining mutations in the backbone of Evo-2. The cumulative fixation of high frequency mutations resulted in Rev activity significantly greater than the variant R1; however, there did not appear to be selection for ever-increasing relative Rev activity.

In several instances, the effect of specific mutations on Rev phenotype was dependent on the sequence context of the mutation. R127K and V105A showed no effect on Rev activity when introduced singly in the backbone of R1 (Fig. 2); however, R127K significantly decreased activity in the context of the cumulative mutations Q138R and D135G (Fig. 3B) and V105A significantly increased activity when added to the Evo-4 background.
The G134D mutation significantly decreased activity of R1 (Fig. 2), but resulted in a significant increase in activity in the background of Evo-1 (Fig. 3). These results suggest that specific sites in EIAV Rev can not only accommodate genetic variation, but that the variation can have a positive, negative or neutral effect on Rev phenotype, depending on the sequence context of the change. These phenotypic assays provide experimental support for our hypothesis that special sites in constrained regions of the virus genome may be permissive for genetic and phenotypic variation.

DISCUSSION

Lentiviruses are characterized by high rates of mutation, recombination, and replication, resulting in diverse populations of viral variants that rapidly adapt to changes in the host environment (Coffin, 1995). Understanding the virus and host factors that shape the evolution and selection of viral variants in vivo is an essential component of preventive and therapeutic strategies to control lentivirus infections. Previously, we identified genetic and phenotypic variation in Rev coincident with changes in clinical stages of EIA, and suggested that Rev phenotype contributes to variant selection in vivo (Belshan et al., 2001; Baccam et al., 2003). Here, we examined in more detail the genetic variation in rev and its effect on Rev phenotype in order to further understand the evolution and selection of Rev during disease progression. Within the population of 320 Rev clones, 121 of 135 amino acid positions varied in less than 2% of sequences, and 70 amino acid positions were 100% conserved. However, ten amino acid positions varied in more than 10% of the sequences, and changes at nine positions resulted in significant changes in Rev nuclear export activity.
Both Rev and the overlapping region of TM were overall subject to purifying selection, with Rev somewhat more highly selected than TM. Interestingly, despite widespread purifying selection, mutations that were nonsynonymous in both reading frames were, on average, highly tolerated. Especially among the common mutations, double nonsynonymous mutations appeared to be effectively neutral, like double synonymous mutations. We hypothesize that there are specialized sites that can mutate without severe consequence in either reading frame, and that these sites may be selected in vivo. If these sites also modulate activity of one encoded protein without disrupting function in the second reading frame, they provide a mechanism for the virus to diverge functionally, despite heavy selective constraints in regions of overlapping reading frames.

The variation of Rev in pony 524 was dominated by the presence of four mutations that pre-existed in the inoculum, and six mutations that arose throughout the course of disease in vivo. Nine of these ten mutations, including the six novel mutations, were specific to the previously described subpopulation A, which had significantly higher Rev activity and was the dominant population during recurrent febrile episodes of EIA (Belshan et al., 2001; Baccam et al., 2003). Evolution of subpopulation A during disease was best characterized by two mutations, Q138R and D135G, present in the inoculum and five mutations that occurred in vivo during subsequent febrile episodes; the two other mutations seemed to be evolutionary dead-ends. Pre-existing mutations, Q138R and D135G, both alone and together conferred a dramatic increase in Rev activity relative to R1 (Figs. 2 and 3).
Many of the five novel mutations that arose during infection substantially altered Rev phenotype, but none significantly decreased Rev activity below that of Q138R or altered phenotype more than Q138R alone in the backbone of R1. The maintenance of high Rev activity despite continued mutation suggests that high Rev activity is important for the virus, especially during febrile stages of disease. In pony 524, pre-existing mutant Q138R was critical for achieving high Rev activity, but it is clear from our single mutant analyses (Fig. 2) that an arginine at position 138 is not necessary for the high Rev phenotype. Indeed, there are a number of variable positions where a single amino acid change was found to significantly alter Rev phenotype. The presence of multiple mutational pathways to high Rev activity confers flexibility on a protein whose evolution is constrained by an overlapping reading frame and occasional immune epitopes.

Rev nuclear-export activity is dependent on several defined functional domains that mediate protein-protein or protein-RNA interactions essential for nuclear import, RNA-binding and interaction with Crm-1. The highly variable amino acid positions observed in vivo were found outside the known functional domains of EIAV Rev, which varied in less than 2% of the sequences. In fact, four of the 10 variable positions were located in a region found to be non-essential for Rev nuclear export activity (Lee et al., 2006). Nonetheless, nine of the 10 amino acid changes that occurred at a high frequency in vivo were found to significantly alter Rev nuclear export activity, either as single mutations or in the context of cumulatively fixed mutations. Further, three of the four changes within the non-essential region significantly increased, or decreased, nuclear export activity.
The non-essential region may function as a regulatory domain, allowing a high rate of genetic variation that modulates, but does not eliminate, an activity essential for virus replication.

Rev overlaps the intracytoplasmic tail (ICT) of TM, and selection may act on nonsynonymous changes in TM. The ICT of lentiviruses is unusually long and analyses of primate lentiviruses indicate the ICT affects multiple steps in virus replication, including infectivity, cytopathicity, and assembly (Lee et al., 1989; Gabuzda et al., 1992; Dubay et al., 1992; Kalia et al., 2003; Freed and Martin, 1996; Cosson, 1996). In addition, the ICT has been shown to be a locus for SIV attenuation in vivo (Shacklett et al., 2000; Fultz et al., 2001). The domains of ICT that mediate these various functions are not well defined. Functional motifs identified in the ICT include the endocytotic sequence motifs, YXXL and di-leucine sequences (Boge et al., 1998; Wyss et al., 2001). In addition, amphipathic α-helical domains designated as lentivirus lytic peptides (LLP-1 and LLP-2) play distinct roles in lentivirus infectivity and fusogenicity (Kalia et al., 2003). Little information is available regarding functional domains of the EIAV ICT or the role of the ICT in virus replication. The EIAV TM contains a proteolytic cleavage site, and viruses producing a truncated TM were found to be more infectious in vitro than wild-type viruses (Rice et al., 1990). Limited analyses of rev/tm variants in the context of infectious molecular clones correlated Rev activity in transient expression assays with replication phenotype in vitro (Baccam et al., 2003). However, it is likely that at least some of the variants would alter phenotype due to nonsynonymous changes in TM.
Further characterization of the replication and antigenic phenotype of Rev/TM variants will provide further insight into the virologic and immunologic factors important in lentivirus selection and persistence in vivo.

The results of the phenotype analyses provide experimental support for our hypothesis that specialized sites in constrained regions of the viral genome allow limited genetic variability that can alter phenotype and confer a selective advantage in vivo. Importantly, the fact that so many of the high frequency mutations induced measurable phenotypic differences suggests that their abundance may be at least partly explained by selection. Evaluation of the phenotypic effects of minor variants would help establish this theory. Further support for a role of selection is found in the observations that phenotypes with high Rev activity were dominant during febrile periods, while Rev variants with lower activity were predominant during the inapparent stages of infection (Belshan et al., 2001; Baccam et al., 2003). Although new mutations continued to accumulate during febrile episodes, they did not progressively increase Rev activity, but rather maintained a consistent, high level of Rev activity relative to the R1 variant that dominated the inoculum and inapparent disease. The selective advantage of the observed variation in Rev is not clear, but may include direct immune evasion resulting from down-regulation of structural gene expression. If so, inapparent disease may indicate a healthy immune response that requires evasion through decreased Rev activity, while febrile episodes mark immune escape allowing vigorous virus production provided by accelerated Rev activity.

ACKNOWLEDGEMENTS

The authors thank Yvonne Wannemuehler and Susan Vleck for excellent technical assistance.
This work was supported in part by funding from the National Institutes of Health grant CA97936 and the National Research Initiative of the USDA Cooperative State Research, Education and Extension Service grant number 2002-35204-12699. WOS was partially supported by USDA HEP National Needs Fellowship 2000-3842-8824.

REFERENCES

Addo, M.M., Altfeld, M., Rosenberg, E.S., Eldridge, R.L., Phillips, M.N., Habeeb, K., Khatri, A., Brander, C., Robbins, G.K., Mazzara, G.P., Goulder, P.J.R., Walker, B.D. and the HIV Controller Study Collaboration. (2001). The HIV-1 regulatory proteins Tat and Rev are frequently targeted by cytotoxic T lymphocytes derived from HIV-1-infected individuals. Proc. Natl. Acad. Sci. USA 98, 1781-1786.

Alexandersen, S., and Carpenter, S. (1991). Characterization of variable regions in the envelope and S3 open reading frame of equine infectious anemia virus. J. Virol. 65, 4255-4262.

Baccam, P., Thompson, R.J., Li, Y., Sparks, W. O., Belshan, M., Dorman, K. S., Wannemuehler, Y., Oaks, J. L., Cornette, J. L. and Carpenter, S. (2003). Subpopulations of equine infectious anemia virus Rev coexist in vivo and differ in phenotype. J. Virol. 77, 12122-12131.

Belshan, M., Baccam, P., Oaks, J. L., Sponseller, B. A., Murphy, S. C., Cornette, J. and Carpenter, S. (2001). Genetic and biological variation in equine infectious anemia virus Rev correlates with variable stages of clinical disease in an experimentally infected pony. Virology 279, 185-200.

Belshan, M., Harris, M. E., Shoemaker, A. E., Hope, T. J. and Carpenter, S. (1998). Biological characterization of Rev variation in equine infectious anemia virus. J. Virol. 72, 4421-4426.

Bobbitt, K. R., Addo, M. M., Altfeld, M., Filzen, T., Onafuwa, A. A., Walker, B. D. and Collins, K. L. (2003). Rev activity determines sensitivity of HIV-1-infected primary T cells to CTL killing.
Immunity 18, 289-299.

Casella, G. and Berger, R. L. (2001). Statistical Inference. Duxbury Press, Belmont, CA.

Coffin, J. M. (1995). HIV population dynamics in vivo: implications for genetic variation, pathogenesis, and therapy. Science 267, 483-489.

Cosson, P. (1996). Direct interaction between the envelope and matrix proteins of HIV-1. EMBO J. 15, 5783-5788.

Coulibaly, N., and Brorsen, B.W. (1999). Monte Carlo sampling approach to testing nonnested hypotheses: Monte Carlo results. Econometric Reviews 18, 195-209.

de Oliveira, T., Salemi, M., Gordon, M., Vandamme, A.-M., van Rensburg, E. J., Engelbrecht, S., Coovadia, H. M. and Cassol, S. (2004). Mapping sites of positive selection and amino acid diversification in the HIV genome: an alternative approach to vaccine design? Genetics 167, 1047-1058.

Dubay, J.W., Roberts, S.J., Hahn, B.H. and Hunter, E. (1992). Truncation of the human immunodeficiency virus type 1 transmembrane glycoprotein cytoplasmic domain blocks virus infectivity. J. Virol. 66, 6616-6625.

Freed, E.O. and Martin, M.A. (1996). Domains of the human immunodeficiency virus type 1 matrix and gp41 cytoplasmic tail required for envelope incorporation into virions. J. Virol. 70, 341-351.

Fridell, R. A., Partin, K. M., Carpenter, S. and Cullen, B. R. (1993). Identification of the activation domain of equine infectious anemia virus rev. J. Virol. 67, 7317-7323.

Fultz, P.N., Vance, P.J., Endres, M.J., Tao, B., Dvorin, J.D., Davis, I.C., Lifson, J.D., Montefiori, D.C., Marsh, M., Malim, M.H. and Hoxie, J.A. (2001). In vivo attenuation of simian immunodeficiency virus by disruption of a tyrosine-dependent sorting signal in the envelope glycoprotein cytoplasmic tail. J. Virol. 75, 278-291.

Gabuzda, D.H., Lever, A., Terwilliger, E. and Sodroski, J. (1992).
Effects of deletions in the cytoplasmic domain on biological functions of human immunodeficiency virus type 1 envelope glycoproteins. J. Virol. 66, 3306-3315.

Hall, T. A. (1999). BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl. Acids. Symp. Ser. 45, 95-98.

Harris, M. E., Gontarek, R. R., Derse, D. and Hope, T. J. (1998). Differential requirements for alternative splicing and nuclear export functions of equine infectious anemia virus Rev protein. Mol. Cell. Biol. 18, 3889-3899.

Ina, Y. (1995). New methods for estimating the numbers of synonymous and nonsynonymous substitutions. J. Mol. Evol. 40, 190-226.

Kalia, V., Sarkar, S., Gupta, P. and Montelaro, R.C. (2003). Rational site-directed mutations of LLP-1 and LLP-2 lentivirus lytic peptide domains in the intracytoplasmic tail of human immunodeficiency virus type 1 gp41 indicate common functions in cell-cell fusion but distinct roles in virion envelope incorporation. J. Virol. 77, 3634-3646.

Kumar, S., Tamura, K., Jakobsen, I. B. and Nei, M. (2001). MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17, 1244-1245.

Lee, J.-H., Murphy, S.C., Belshan, M., Sparks, W.O., Wannemuehler, Y., Liu, S., Hope, T.J., Dobbs, D. and Carpenter, S. (2006). Characterization of functional domains of equine infectious anemia virus Rev suggests a bipartite RNA-binding domain. J. Virol. 80, 3844-3852.

Lee, S.J., Hu, W., Fisher, A.G., Looney, D.J., Kao, V.F., Mitsuya, H., Ratner, L. and Wong-Staal, F. (1989). Role of the carboxy-terminal portion of the HIV-1 transmembrane protein in viral transmission and cytopathogenicity. AIDS Res. Hum. Retrovir. 5, 441-449.

Leroux, C., Issel, C. J. and Montelaro, R. C. (1997).
Novel and dynamic evolution of equine infectious anemia virus genomic quasispecies associated with sequential disease cycles in an experimentally infected pony. J. Virol. 71, 9627-9639.

Mancuso, V. A., Hope, T. J., Zhu, L., Derse, D., Phillips, T. and Parslow, T. G. (1994). Posttranscriptional effector domains in the rev proteins of feline immunodeficiency virus and equine infectious anemia virus. J. Virol. 68, 1998-2001.

Mealey, R. H., Zhang, B., Leib, S. R., Littke, M. H. and McGuire, T. C. (2003). Epitope specificity is critical for high and moderate avidity cytotoxic T lymphocytes associated with control of viral load and clinical disease in horses with equine infectious anemia virus. Virology 313, 537-552.

Moya, A., Holmes, E. C. and Gonzalez-Candelas, F. (2004). The population genetics and evolutionary epidemiology of RNA viruses. Nat. Rev. Microbiol. 2, 279-288.

Nei, M. and Gojobori, T. (1986). Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3, 418-426.

Oaks, J.L., McGuire, T.C., Ulibarri, C. and Crawford, T.B. (1998). Equine infectious anemia virus is found in tissue macrophages during subclinical infection. J. Virol. 72, 7263-7269.

Rice, N.R., Henderson, L. E., Sowder, R.C., Copeland, T.D., Oroszlan, S. and Edwards, J.F. (1990). Synthesis and processing of the transmembrane envelope protein of equine infectious anemia virus. J. Virol. 64, 3770-3778.

Shacklett, B.L., Weber, C.J., Shaw, K.E.S., Keddie, E.M., Gardner, M.B., Sonigo, P. and Luciw, P.A. (2000). The intracytoplasmic domain of the Env transmembrane protein is a locus for attenuation of simian immunodeficiency virus SIVmac in rhesus macaques. J. Virol. 74, 5836-5844.

Yang, Z. and Nielsen, R. (2000). Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models.
Mol. Biol. Evol. 17, 32-43.

Table 1. Fit of various selection rev/tm models for pairwise sequence comparisons^a.

Model          sv                  st                  sr                  str     Log likelihood   AIC
Neutral        NE                  NE                  NE                  NE      -282.15          564.30
1-parameter    0.98 (0.96, 1.00)   NE                  NE                  NE      -165.57**        333.14
2-parameter    0.99 (0.97, 1.00)   --------- 0.66 (0.49, 0.77) ---------   NE      -147.85**        299.70
3-parameter    0.99 (0.97, 1.00)   0.56 (0.34, 0.73)   0.77 (0.61, 0.88)   NE      -145.62*         297.24
4-parameter    0.99                0.41                0.69                -0.37   -145.41          298.82
Unrestricted   NA                  NA                  NA                  NA      -143.93          301.86

^a Various models in order of increasing complexity are compared. Estimates of selection coefficients sv, st, sr, and str are shown where applicable, along with bootstrapped 95% confidence intervals. Each model is compared to the nested model in the preceding line.
** implies a p-value <0.001, * implies a p-value of 0.03, and no star implies a non-significant result.
NE means the selection coefficient was set to 0.
NA means the selection coefficient does not exist in the specified model.

Table 2. Fit of various rev/tm selection models for parsimonious reconstruction of mutations^a.

Model          sv                  st                  sr                  str     Log likelihood   AIC
Neutral        NE                  NE                  NE                  NE      -305.85          611.70
1-parameter    0.94 (0.90, 0.96)   NE                  NE                  NE      -212.10**        426.20
2-parameter    0.93 (0.89, 0.97)   --------- 0.32 (0.04, 0.52) ---------   NE      -209.69*         423.38
4-parameter    0.93                0.26                0.36                0.49    -209.27          426.54
Unrestricted   NA                  NA                  NA                  NA      -208.94          431.88

^a See notes for Table 1.

FIGURE LEGENDS

Figure 1. Genetic variation in EIAV rev in vivo. (A) Frequency of non-consensus amino acids in Rev exon 2, relative to the founder variant, R1. (B) Frequency of individual amino acids observed at the nine positions with frequency of non-consensus amino acids greater than 0.10. The first amino acid shown is the consensus of the inoculum.

Figure 2. The effect of high frequency mutations on Rev nuclear export activity. (A)
Amino acid sequence of Rev exon 2 showing the location of single high frequency amino acid changes introduced into the backbone of R1 cDNA. The functional domains required for Rev activity are boxed and include the nuclear export signal (a.a. 31-55) and the RNA binding/nuclear localization signal (RRDRW and KRRRK). The shaded area indicates a region not essential for Rev nuclear export activity (Lee et al., 2006). (B) Rev nuclear export activity of single amino acid mutants. Results are expressed relative to the consensus of the inoculum, R1, and represent the mean activity of at least six independent transfections, ± standard error. Variants that differed significantly from the activity of R1 are indicated by asterisks, *p<0.05; **p<0.005; ***p<0.0005.

Figure 3. Genetic and phenotypic variation in Rev over time. (A) Cumulative fixation of high frequency Rev mutations based on the inferred ancestry of Rev variants observed in vivo. (B) Phenotype of Rev evolution mutants relative to R1. Variants that differed significantly from the activity of R1 are indicated, with p values represented by (*) p < 0.05, (**) p < 0.005, (***) p < 0.0005.
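
Supplementary note on Tables 1 and 2. The AIC column in both tables can be reproduced from the reported log likelihoods with the standard definition AIC = 2k - 2 ln L, where k is the number of free selection coefficients in the model. The Python sketch below is illustrative only. The function names are hypothetical, and the chi-square likelihood ratio test with one degree of freedom is an assumption about how adjacent nested models were compared, since the table notes do not name the test. The unrestricted model is omitted because its parameter count is not stated.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def lrt_p_value_df1(loglik_null, loglik_alt):
    """P-value of a likelihood ratio test with one degree of freedom.

    Assumption: the test statistic 2 * (lnL_alt - lnL_null) follows a
    chi-square distribution with 1 df, whose upper tail probability is
    erfc(sqrt(x / 2)).
    """
    stat = 2 * (loglik_alt - loglik_null)
    return math.erfc(math.sqrt(stat / 2))


# (model, free parameters, log likelihood, reported AIC), copied from Table 1.
TABLE_1 = [
    ("Neutral", 0, -282.15, 564.30),
    ("1-parameter", 1, -165.57, 333.14),
    ("2-parameter", 2, -147.85, 299.70),
    ("3-parameter", 3, -145.62, 297.24),
    ("4-parameter", 4, -145.41, 298.82),
]

if __name__ == "__main__":
    previous_loglik = None
    for name, k, loglik, reported in TABLE_1:
        line = f"{name:12s} AIC = {aic(loglik, k):7.2f} (reported {reported:.2f})"
        if previous_loglik is not None:
            # Compare each model with the simpler model on the preceding row.
            p = lrt_p_value_df1(previous_loglik, loglik)
            line += f", LRT p = {p:.3g}"
        print(line)
        previous_loglik = loglik
```

Under these assumptions the 3-parameter model has the lowest AIC in Table 1 and the 2-parameter model the lowest in Table 2, and the single-starred comparisons give p-values of roughly 0.03, in line with the table notes.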