Genetics: Published Articles Ahead of Print, published on July 18, 2006 as 10.1534/genetics.106.062885
Site-Specific Amino Acid Frequency, Fitness, and the
Mutational Landscape Model of Adaptation in HIV-1
Jack da Silva
School of Molecular and Biomedical Science, The University of Adelaide, Adelaide,
SA 5005, Australia
Running Head: Site-Specific Amino Acid Frequency and Fitness
Keywords: site-specific amino acid frequency; marginal fitness; selection coefficient;
adaptation; HIV
Corresponding Author:
Jack da Silva
School of Molecular and Biomedical Science
Molecular Life Sciences Building
Gate 8, Victoria Drive
The University of Adelaide
Adelaide, SA 5005
Tel. +61 (8) 8303 8083
Fax +61 (8) 8303 4362
ABSTRACT
Analysis of the intensely studied HIV-1 gp120 V3 protein region reveals that the
among-population mean site-specific frequency of an amino acid is a measure of its
relative marginal fitness. This surprising result may arise if populations are displaced
from mutation-selection equilibrium by fluctuating selection and if the probability of
fixation of a beneficial amino acid is proportional to its selection coefficient.
Knowing the effect on fitness of every amino acid at every site of a protein would
greatly facilitate the study of adaptive evolution at the molecular level. Such data would
help determine the frequency distribution of fitness effects of new beneficial mutations,
which is key to understanding the dynamics of adaptation (ORR 2002), allow protein
variation to be more easily interpreted, and allow the realistic simulation of protein
evolution. But such data are difficult to obtain empirically and are therefore not
available for any protein. However, for human immunodeficiency virus type 1 (HIV-1)
proteins, amino acid site-specific frequencies calculated across virus populations
infecting different patients appear to correlate with the marginal fitness effects of the
amino acids (HUNG et al. 1999; REZA et al. 2003), suggesting a simple method of
estimating the site-specific marginal fitnesses of amino acids.
HIV-1 V3: I investigated this correlation using sequence data from the third
variable region (V3) of the HIV-1 exterior envelope glycoprotein (gp120), located on
the surface of the virus particle (virion). V3 is the main determinant of which of two
cell-surface chemokine receptors (CCR5 or CXCR4) is used by a virion as a coreceptor
to enter a cell, which determines the type of cell infected (SPECK et al. 1997). V3 also
modulates the use of each coreceptor (DE JONG et al. 1992; HUNG et al. 1999) and
thereby affects the rate-limiting step in cell entry (PLATT et al. 2005). Finally, V3 is
the main target of antibodies that interfere with cell entry (ZOLLA-PAZNER 2004).
Because of its critical function and its exposure to neutralizing antibodies, V3 has been
the focus of intense study, resulting in the data necessary to test the correlation.
Site-specific marginal fitness: Data on V3 site-specific fitness effects of amino
acids are available from a study that employed site-directed mutagenesis to modify the
amino acid at V3 site 25 (HUNG et al. 1999). This study measured the effects of
different amino acids at site 25 on the number of cells infected (infectivity) in a
standardized assay. Note that infectivity is the appropriate measure of the effect of V3
on fitness under the assay conditions; the sole function of V3 appears to be in cell entry,
and this component of fitness is unlikely to trade off against the remaining components
of fitness, such as the rate of viral genome integration into the host cell genome, the rate
of provirus (integrated viral genome) replication, and virion budding, which are
controlled by other viral proteins (COFFIN 1999). The site-directed mutagenesis study
was conducted using CCR5-utilizing HIV-1 of Subtype B, the phylogenetic clade most
common in Europe and North America, and the most sequenced and studied.
Site-specific amino acid frequency: I calculated site-specific frequencies of
Subtype B V3 amino acids for the viral population infecting a patient and then averaged
frequencies across patients. Among-population means of frequencies were used because
the site-specific frequency of an amino acid in any one population is expected to be
dynamic, being dependent on the population’s level of adaptation, and thus a poor
predictor of the amino acid’s marginal fitness. This procedure was repeated for each of
the three coreceptor usage viral phenotypes: exclusively CCR5-utilizing (R5) (269
sequences from 58 patients), exclusively CXCR4-utilizing (X4) (109 sequences from 11
patients), and dual coreceptor utilizing (R5X4) (60 sequences from 16 patients). The
rank order of amino acid frequencies at most sites, including site 25, varies among
coreceptor-usage phenotypes (Fig. 1), as would be expected if the sites function in
determining or modulating coreceptor usage.
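The averaging step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's code: it assumes gap-free, equal-length aligned sequences grouped under hypothetical patient identifiers, uses 1-based site numbering, and runs on toy two-residue sequences rather than V3 data.

```python
from collections import Counter


def site_frequencies(sequences, site):
    """Residue frequencies at a 1-based alignment site within one patient."""
    residues = [seq[site - 1] for seq in sequences]
    n = len(residues)
    return {aa: count / n for aa, count in Counter(residues).items()}


def among_population_mean(patients, site):
    """Mean of within-patient frequencies, weighting every patient equally.

    A residue absent from a patient counts as frequency 0 for that patient.
    """
    per_patient = [site_frequencies(seqs, site) for seqs in patients.values()]
    residues = set().union(*per_patient)
    k = len(per_patient)
    return {aa: sum(f.get(aa, 0.0) for f in per_patient) / k for aa in residues}


# Hypothetical toy alignment: two patients, two-residue "sequences".
patients = {
    "patient_1": ["AD", "AD", "AE"],
    "patient_2": ["AE", "AE"],
}
mean_freq = among_population_mean(patients, site=2)
print(mean_freq)
```

Weighting each patient equally, rather than pooling all sequences, keeps heavily sampled patients from dominating the mean; here the call gives D = 1/3 and E = 2/3 at site 2.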
Site-specific fitness and frequency: I regressed relative infectivity, scaled
between 0 and 1, on the among-population mean relative frequency of the amino acid at
site 25, also scaled between 0 and 1. This was done only for R5 virus because
infectivity was measured for R5 virus. The regression shows a highly significant
relationship (slope = 0.84; df = 1, 5; P = 6.64 × 10⁻⁷), with frequency explaining 99% of
the variance in infectivity (r² = 0.99) (Fig. 2). This surprisingly strong and linear
relationship demonstrates that the among-population mean site-specific frequency of an
amino acid is indeed a very good predictor of the amino acid’s marginal fitness.
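A sketch of the regression in Python follows. It is illustrative only: the scaling rule (dividing by the largest value, so the top residue scores 1) is an assumption about what "scaled between 0 and 1" means, and the seven input values are made up for demonstration, not taken from HUNG et al. (1999) or the sequence database.

```python
def scale_to_max(values):
    """Assumed scaling: divide by the largest value so the top residue scores 1."""
    top = max(values)
    return [v / top for v in values]


def least_squares(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical inputs for seven residues (NOT the study's measurements).
mean_freq = [0.40, 0.33, 0.12, 0.05, 0.04, 0.03, 0.03]
infectivity = [95.0, 80.0, 40.0, 25.0, 22.0, 20.0, 18.0]
slope, intercept, r2 = least_squares(scale_to_max(mean_freq), scale_to_max(infectivity))
print(f"slope={slope:.2f} intercept={intercept:.2f} r2={r2:.2f}")
```

With seven residues the fit has 5 residual degrees of freedom, matching the df reported above; a P value would additionally require a t or F distribution (for example from SciPy), which the sketch omits.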
The intercept of the regression line (0.15) is also significantly different from
zero (P = 1.18 × 10⁻⁴). Assuming that amino acids are not naturally observed at V3 site
25 if they render V3 non-functional, the 15% infectivity observed for amino acids with
zero frequency suggests that some other factor accounts for this level of infectivity and
that V3 accounts for only 85% of the infectivity component of fitness. Since we are
only interested in the portion of fitness attributable to V3, infectivity can be scaled from
the intercept to 1, giving a one-to-one relationship between marginal relative fitness (w)
and relative frequency (f), both scaled between 0 and 1: w = f. The selection coefficient
of amino acid j relative to any reference amino acid i at the same site can then be
calculated by rescaling fitness to w_i = 1 and calculating the absolute difference between
fitnesses: s_j = |1 − w_j|. If the reference amino acid is the most common at a site, the
selection coefficient is simply s_j = 1 − f_j.
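The rescaling and the selection-coefficient rule can be written as two small Python functions. This is a sketch under the assumption that relative frequencies are divided by the site maximum; the residue frequencies in the example are hypothetical. Rescaling the fitted line y = 0.15 + 0.84f from the intercept to 1 gives w = (0.84/0.85)f ≈ 0.99f, which is the near one-to-one relationship w = f used in the text.

```python
def rescale_from_intercept(y, intercept):
    """Rescale infectivity so the regression intercept maps to 0 and 1 stays at 1."""
    return (y - intercept) / (1.0 - intercept)


def selection_coefficients(rel_freq):
    """s_j = 1 - f_j relative to the most common residue, using w = f."""
    top = max(rel_freq.values())
    return {aa: 1.0 - f / top for aa, f in rel_freq.items()}


# Fitted line y = 0.15 + 0.84 f, rescaled: w = (0.84 / 0.85) f, i.e. w is close to f.
for f in (0.0, 0.5, 1.0):
    w = rescale_from_intercept(0.15 + 0.84 * f, 0.15)
    print(f, round(w, 3))

# Hypothetical relative frequencies at one site (most common residue = 1).
print(selection_coefficients({"D": 1.0, "E": 0.82, "G": 0.01}))
```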
Selection coefficients: V3 site-specific amino acid selection coefficients were
calculated relative to the most common residue at each respective site for the R5
phenotype. These calculations assume no linkage among sites, which is reasonable
given the high rate of effective recombination (between viral variants) (1.38 × 10⁻⁴
recombination events/adjacent nucleotide site/generation) (SHRINER et al. 2004a)
relative to the mutation rate (2.4 × 10⁻⁵ point mutations/nucleotide site/generation)
(MANSKY and TEMIN 1995). Site-specific selection coefficients range from 0.180 to
0.999, with a mean of 0.923 (median = 0.970; Fig. 3), suggesting strong selection. Even
if a small intra-patient effective population size of N_e = 10³ infected cells (ACHAZ et al.
2004; SHRINER et al. 2004b) is assumed and the minimum selection coefficient is used,
2N_e s = 360 ≫ 1, indicating that selection will prevail over genetic drift.
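The two numerical checks in this paragraph, the selection-versus-drift product and the ratio of recombination to mutation rate, are simple enough to verify directly. The Python lines below use only the values quoted in the text; the variable names are illustrative.

```python
NE = 1e3          # intra-patient effective population size (infected cells)
S_MIN = 0.180     # smallest site-specific selection coefficient in V3
R = 1.38e-4       # effective recombination rate per adjacent site per generation
MU = 2.4e-5       # point mutation rate per nucleotide site per generation

selection_vs_drift = 2 * NE * S_MIN     # 2 N_e s; selection dominates drift when >> 1
recombination_vs_mutation = R / MU      # factor by which recombination exceeds mutation

print(f"2 N_e s = {selection_vs_drift:.0f}")
print(f"r / mu = {recombination_vs_mutation:.2f}")
```

They print 2N_e s = 360 and r/µ = 5.75, so recombination exceeds mutation by a factor of more than five.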
Selection: Strong selection on V3 by coreceptors is consistent with a lack of
genetic drift in this region in comparison to other similar-sized regions under severe
population bottlenecks in culture (YUSTE et al. 2000). It is also consistent with the
convergent evolution, among patients shortly after infection, of Subtype B V3
sequences toward the R5 sequence with the most common residue at each site (Fig. 1)
(ZHANG et al. 1993). Given such apparent strong selection by the coreceptor, the high
variability of V3 within and among patients suggests that equally strong forces displace
sequences away from the sequence with the highest fitness relative to the interaction
with CCR5. These forces may be migration, in the sense of repeat infections, or
selection by either the alternative coreceptor or the immune system. Migration on its
own may be excluded, since in the absence of strong opposing selection it would have
no effect (all populations would consist of the sequence with the highest fitness). This
leaves opposing selection by the alternative coreceptor, CXCR4, or the immune system.
Numerous studies, using a wide variety of methods of comparative sequence analysis,
have implicated coreceptors or the immune system or both as sources of positive
selection of V3 (e.g. BONHOEFFER et al. 1995; GERRISH 2001; NIELSEN and YANG
1998; TEMPLETON et al. 2004; WILLIAMSON 2003; YAMAGUCHI and GOJOBORI 1997).
Mutation-selection balance: Mutation-selection balance cannot explain the
high frequencies of less favored amino acids. Given a genetically homogeneous virus
population at the time of infection (DERDEYN et al. 2004) and the early evolution of V3
toward the R5 sequence with the most common residue at each site (ZHANG et al.
1993), and assuming no migration between patients, amino acid polymorphism within a
population infecting a patient is assumed to arise via mutation. At mutation-selection
equilibrium, the most favored amino acid at each site is the most frequent, with each
alternative (deleterious) amino acid present at a mean frequency of p = µ/s, where µ is
the rate of mutation to the deleterious amino acid and s is its selection coefficient
relative to the most preferred amino acid (CROW and KIMURA 1970). For example, in
the case of Glutamic acid (E), the second most common residue at site 25 of R5 V3
sequences (Fig. 1), the mutation rate from the most common residue, Aspartic acid (D),
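ignoring any mutation bias, is predicted to be 1.6 × 10⁻⁵/codon/generation (two of the three
possible third-position changes in a GAU or GAC codon give E, so the rate is 2µ/3) and
s = 0.18. This gives p = 8.9 × 10⁻⁵, which is much lower than the observed mean frequency of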
0.35.
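The D to E mutation rate and the resulting equilibrium frequency can be reproduced by enumerating codons. This Python sketch is illustrative only; it assumes the standard genetic code (written with T for U), equal use of the two aspartic acid codons, and no mutation bias, so each site mutates at µ split equally among the three alternative bases.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def point_mutants(codon):
    """Yield the nine codons one nucleotide substitution away."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def rate_to(source_aa, target_aa, mu):
    """Per-codon rate of mutation from source_aa to target_aa, averaged over source codons."""
    codons = [c for c, aa in CODE.items() if aa == source_aa]
    rates = [sum(CODE[m] == target_aa for m in point_mutants(c)) * mu / 3 for c in codons]
    return sum(rates) / len(rates)


MU = 2.4e-5                      # point mutations/nucleotide site/generation
S_E = 0.18                       # selection coefficient of E relative to D at site 25
mu_de = rate_to("D", "E", MU)    # GAT and GAC each have two single-step mutants encoding E
p_eq = mu_de / S_E               # mutation-selection equilibrium frequency p = mu/s
print(f"mu(D->E) = {mu_de:.2e}, p = {p_eq:.2e}")
```

The script prints a rate of 1.60e-05 and p = 8.89e-05, more than three orders of magnitude below the observed mean frequency of 0.35.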
Episodic adaptation: The high frequencies of less favored residues may be
explained if V3 is displaced from mutation-selection balance, either at the time of
infection, after which V3 evolves rapidly toward the R5 sequence with the most
common residue at each site, or during chronic infection when it is subject to strong
opposing selection, as argued above. Although the alternative coreceptor, CXCR4, may
impose strong selection during the late stages of disease progression (TEMPLETON et al.
2004), the immune system is expected to periodically select escape mutants throughout
the chronic stage of infection (DA SILVA 2003; FROST et al. 2005; TEMPLETON et al.
2004; WEI et al. 2003; WILLIAMSON 2003). For example, antibodies may impose
frequency-dependent selection by targeting epitopes consisting of the most common
residues at sites, but such sites may eventually become shielded from antibody
surveillance by changes in other protein regions (FROST et al. 2005; WEI et al. 2003).
This scenario could result in the episodic adaptation of V3 to CCR5.
The mutational landscape model: Under weak selection and mutation (see
ORR 2002), each step in an episode of adaptation can be described by the mutational
landscape model of Gillespie (1984; 1991). In this model, a beneficial amino acid of
fitness rank j spreads to fixation as the next step in adaptation, when an amino acid of
fitness rank i is the current wild type at the site, with probability:
$$P_{ij} = \frac{\Pi_j}{\Pi_1 + \Pi_2 + \cdots + \Pi_{i-1}} \tag{1}$$
where Π is the probability of fixation and the subscripts denote the marginal fitness
ranks of the beneficial amino acids within one mutational step of the wild type, with 1
indicating the highest fitness and j < i. This model assumes no recombination and
considers mutations at all sites of a sequence. However, the model can be applied to a
single amino acid site if free recombination is assumed or if it is assumed that the most
preferred amino acid is present at all other sites (i.e., that there is no potential for
adaptation at other sites). Since Π_j ≈ 2s_j for a new mutation (HALDANE 1927) and Π_j ≈
2N s_j p_j for standing variation (ORR and BETANCOURT 2001), where p_j is the frequency of
the amino acid in the standing variation, P_ij is, in either case, proportional to s_j. For
example, in the case of new mutations,
$$P_{ij} = \frac{s_j}{s_1 + s_2 + \cdots + s_{i-1}} \tag{2}$$
Therefore, P_ij is linearly related to s_j and w_j, and, after scaling both variables between 0
and 1, P_ij is approximately equal to w_j regardless of which amino acid is the current wild
type (Fig. 4). This means that regardless of the level of adaptation of the population, a
particular beneficial amino acid will be the next to spread to fixation with a probability
that is proportional to its marginal fitness. If P_ij is taken to be the proportion of
populations fixed for an amino acid of fitness rank j at a particular site, or the site-specific
frequency of this amino acid averaged across many populations, then this
relationship may explain why among-population mean site-specific frequencies of
amino acids and their marginal relative fitnesses, both scaled between 0 and 1, are
equal.
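Equation (2) can be evaluated directly. The Python sketch below is illustrative, not the computation behind Fig. 4: the fitness values are hypothetical, it treats every listed residue fitter than the wild type as reachable in one mutational step (pass only one-step neighbors to respect the model's restriction), and it scales both P_ij and w_j by dividing by their maxima, which is an assumed scaling rule.

```python
def next_step_probabilities(fitness, wild_type):
    """Equation (2): P_ij = s_j / (sum of s over all residues fitter than the wild type).

    Selection coefficients are relative to the wild type (w_i rescaled to 1),
    so s_j = w_j / w_i - 1 for every residue fitter than the wild type.
    """
    w_i = fitness[wild_type]
    s = {aa: w / w_i - 1.0 for aa, w in fitness.items() if w > w_i}
    total = sum(s.values())
    return {aa: s_j / total for aa, s_j in s.items()}


def scale_to_max(values):
    """Divide by the largest value so the top entry is 1 (assumed scaling)."""
    top = max(values.values())
    return {k: v / top for k, v in values.items()}


# Hypothetical marginal fitnesses at one site (NOT measured values).
fitness = {"D": 1.0, "E": 0.8, "Q": 0.3, "A": 0.2, "G": 0.1}
p = next_step_probabilities(fitness, wild_type="G")
p_scaled = scale_to_max(p)
w_scaled = scale_to_max({aa: fitness[aa] for aa in p})
for aa in sorted(p, key=p.get, reverse=True):
    print(aa, round(p_scaled[aa], 3), round(w_scaled[aa], 3))
```

For this example (wild type G), the scaled P_ij values (1, 0.78, 0.22 and 0.11 for D, E, Q and A) track the scaled fitnesses (1, 0.8, 0.3 and 0.2) closely but not exactly, in line with the statement that P_ij is approximately equal to w_j.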
Assumptions of the model: The mutational landscape model of adaptive
evolution is based on two fundamental assumptions: that selection prevails over genetic
drift (2N_e s ≫ 1) and that mutation is rare (N_e µ ≪ 1, where here µ is the number of
mutations per nucleotide site per generation) (ORR 2002). Under these conditions, a
single beneficial mutation is expected to spread to fixation before the next beneficial
mutation begins spreading. Although the model was originally described as assuming
strong selection (GILLESPIE 1984; GILLESPIE 1991), selection in the model is strong only
in relative terms, compared to 1/N_e; in absolute terms selection is assumed to be weak:
1/N_e ≪ s ≪ 1 (ORR 2002). HIV-1 clearly violates the assumptions of weak selection and
mutation; V3 experiences strong selection, as shown above, and the point mutation rate
for HIV-1 is high. Strong selection and mutation may mean that more than one
beneficial mutation spreads through the population simultaneously. Such clonal
interference (GERRISH and LENSKI 1998) among beneficial mutations at different amino
acid sites on different genomes would tend to reduce probabilities of fixation. However,
clonal interference is expected to be weak for HIV-1 because the effective
recombination rate is more than five times the mutation rate and, therefore, amino acid
sites can be considered unlinked.
The question then is, is there clonal interference among beneficial mutations at a
single amino acid site? The answer depends on the probability that more than one
beneficial mutation segregates at the same site. This, of course, depends on the
beneficial mutation rate, which depends on the level of adaptation at a given site (the
wild-type amino acid at the site). I calculated the proportion of nucleotide point
mutations that produce a beneficial amino acid change using the rank order of fitnesses
of amino acids at V3 site 25 (Fig. 2). Excluding mutations at multiple codon positions,
because the frequency of these is very low (µ² and µ³) compared to the frequency of
mutations at a single codon position (µ), and ignoring any mutation bias, the proportion
of nucleotide point mutations that are beneficial, averaged among codons for each
amino acid, ranges from 0.037 for Arginine (R) to 0.389 for Glycine (G), and is 0.182
averaged across all amino acids observed at site 25 (except Aspartic acid (D), which has
the highest fitness). Then, the rate of beneficial mutations per codon is the product of
the probability of a mutation in the codon (including synonymous mutations) and the
proportion of these mutations that are beneficial. Allowing for the highest beneficial
mutation rate at site 25, that is, with Glycine as the wild-type amino acid, the rate of
beneficial mutations per codon is equal to 3 nucleotide sites/codon × 2.4 × 10⁻⁵
mutations/nucleotide site/generation × 0.389 beneficial mutations/mutation = 2.8 × 10⁻⁵
beneficial mutations/codon/generation. Therefore, with a census population size of
approximately N = 10⁷ infected cells (CHUN et al. 1997), there will be a maximum total
of 2.8 × 10² beneficial mutations at V3 site 25 each generation within a population.
However, the probability that a new beneficial mutation becomes fixed is proportional
to N_e/N (KIMURA 1964), which for HIV-1 is 10³/10⁷ = 10⁻⁴. Therefore, at most 10⁻⁴ × 2.8
× 10² = 2.8 × 10⁻² new beneficial mutations each generation ultimately become fixed.
This is equivalent to one beneficial mutation that ultimately becomes fixed arising every
36 generations or more, on average, for the highest beneficial mutation rate at site 25.
For the mean beneficial mutation rate at site 25 (1.3 × 10⁻⁵), one beneficial mutation that
ultimately becomes fixed arises every 77 generations or more. To determine whether
multiple beneficial mutations will segregate at the same site, we need to know how long
it takes a single new beneficial mutation to spread to fixation. This can be calculated
numerically from the frequency of a beneficial mutation after one generation of
selection in a haploid organism (the HIV-1 provirus is haploid): p′ = p(1 + s)/(ps + 1),
where p is the frequency in the current generation. With initial frequency p = 1/N = 10⁻⁷
for a new mutation, the number of generations taken to reach a frequency of 99% ranges
from six, for mutation G → D (s = 34.9), to 51, for mutation G → R (s = 0.5).
Therefore, in the extreme case of the highest beneficial mutation rate and lowest
selection coefficient, we can expect some clonal interference (on average, a new
beneficial mutation fixes every 36 generations or more and it takes such a mutation 51
generations to spread to near fixation). However, with the average beneficial mutation
rate, clonal interference is not expected regardless of the magnitude of the selection
coefficient (on average, a new beneficial mutation fixes only every 77 generations or
more).
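The arithmetic in the preceding paragraph can be checked with a short Python sketch. It is illustrative, not the author's code. It assumes the standard genetic code, no mutation bias, and, as a labeled assumption, that every other residue observed at site 25 (D, E, Q, K, R and A) is fitter than G; under that assumption the enumeration reproduces the 0.389 quoted for glycine. The full fitness ranking needed for the other residues (and for the 0.182 average) is not given in the text, so the sketch does not attempt them. The fixation-time loop iterates the haploid recursion from p = 1/N.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def point_mutants(codon):
    """Yield the nine codons one nucleotide substitution away."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def beneficial_fraction(wild_aa, fitter):
    """Proportion of point mutations giving a fitter residue, averaged over wild-type codons."""
    codons = [c for c, aa in CODE.items() if aa == wild_aa]
    per_codon = [sum(CODE[m] in fitter for m in point_mutants(c)) / 9 for c in codons]
    return sum(per_codon) / len(per_codon)


def generations_to_reach(s, p0, target=0.99):
    """Iterate p' = p(1 + s)/(ps + 1) until p >= target; return the number of generations."""
    p, t = p0, 0
    while p < target:
        p = p * (1 + s) / (p * s + 1)
        t += 1
    return t


MU, N, NE = 2.4e-5, 1e7, 1e3
frac_g = beneficial_fraction("G", set("DEQKRA"))  # assumption: all other site-25 residues beat G
rate_g = 3 * MU * frac_g                          # beneficial mutations/codon/generation
fixing_per_gen = rate_g * N * (NE / N)            # new mutations per generation that go on to fix
print(f"beneficial fraction = {frac_g:.3f}, rate = {rate_g:.2e}")
print(f"one fixing mutation every {1 / fixing_per_gen:.1f} generations")
print(generations_to_reach(34.9, 1 / N), generations_to_reach(0.5, 1 / N))
```

With a strict p ≥ 0.99 stopping rule the loop returns 6 generations for s = 34.9 and 52 for s = 0.5; after 51 generations the s = 0.5 frequency is about 0.9897, which rounds to the 99% used in the text.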
A recent study with a DNA bacteriophage system under moderately strong
selection (s = 0.11 – 0.39) and weak mutation is the only empirical examination of the
mutational landscape model to date (ROKYTA et al. 2005). The fitnesses of nine
beneficial amino acid replacements that occurred as first steps in episodes of adaptation
were measured, and the frequencies of the replacements across 20 replicate populations
were compared to expectations under the model. Modifying the model to account for
mutation bias and population bottlenecks resulting from the experimental protocol
improved its fit to the observed distribution of amino acid replacement frequencies. In
the case of HIV-1, population size is constant during chronic infection and mutation
bias may be unimportant since selection is strong. The empirical support from the
Rokyta et al. study suggests that the model is robust to violations of the assumption of
weak selection. Indeed, the model requires only that fixation probabilities be
proportional to s (ROKYTA et al. 2005).
Most of the arguments above assume that the source of beneficial variants is
new mutations within the population. However, variation may also be introduced by
migration in the form of multiple infections of the same patient. Nevertheless, long-term
studies of subtype B HIV-1-infected patients have not reported evidence of repeat
infections (e.g., SHANKARAPPA et al. 1999). Repeat infections may be rare because
sexual transmission is the main route of infection in North America and western Europe
(UNAIDS 2004), where subtype B predominates, and the probability of infection per
coital act is < 0.5% (GRAY et al. 2001). And, perhaps more importantly, sexual
transmission involves a severe bottleneck of the donor viral population, possibly due to
strong selection by host target cells, resulting in typically only a single variant
transmitting successfully (DERDEYN et al. 2004). Consequently, the impact of migration
on genetic variation is expected to be minor compared to that of mutation and selection.
For example, even with the successful transmission and integration of 10² distinct
genomes, a migration event would contribute only a very small fraction of the 2.4 × 10⁶
mutant genomes generated by mutation each generation (approximately 2 days)
(MARKOWITZ et al. 2003) in a population (for an HIV-1 genome size of approximately
10⁴ nucleotides). Therefore, mutation produces several orders of magnitude more
variation in a population in a single generation than can be contributed by a rare
migration event.
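The comparison between migration and mutation reduces to a product of quantities given in the text: the mutation rate, the genome size, and the census size N = 10⁷ used earlier. The Python sketch follows the text in treating each new mutation as one mutant genome, which slightly overstates the count because a few genomes carry more than one mutation.

```python
MU = 2.4e-5          # point mutations/nucleotide site/generation
GENOME_SIZE = 1e4    # approximate HIV-1 genome length (nucleotides)
N = 1e7              # infected cells (census population size)
MIGRANT_GENOMES = 1e2

# One mutant genome per new mutation (the text's approximation).
mutants_per_generation = MU * GENOME_SIZE * N
migrant_share = MIGRANT_GENOMES / mutants_per_generation
print(f"{mutants_per_generation:.1e} mutants/generation; "
      f"one migration event adds {migrant_share:.1e} of that")
```

The result is 2.4 × 10⁶ mutant genomes per generation, so 10² migrant genomes amount to roughly 4 × 10⁻⁵ of that.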
Another assumption made in applying the mutational landscape model to HIV-1
is that the HIV-1 populations studied here are true replicates. Strictly speaking, this
assumption is violated because populations differ in a variety of ways, including
differences in specific immunity (cytotoxic T-lymphocyte and antibody responses)
among patients. However, the main source of selection of V3 of interest here is the
chemokine receptor CCR5, which is expressed intact on cells targeted by HIV-1 in
every infected individual. The intact, expressed protein appears to exhibit little adaptive
variation within humans and among higher primates with respect to HIV-1 (ZHANG et
al. 2003). Therefore, CCR5 represents a homogeneous source of selection among
patients. Indeed, HIV-1 adapts to CCR5 shortly after infection, rapidly evolving the V3
sequence with the most common amino acid at each site for the CCR5-utilizing
phenotype (Figure 1) (ZHANG et al. 1993). Therefore, although the subtype B HIV-1
populations analysed here are not true replicates, they are as similar with respect to the
source of selection under study as one might expect to find in a non-experimental
setting.
Conclusion: It will be interesting to see if the correlation between mean
frequency and fitness observed here holds more generally, which should depend on
whether fluctuating selection within populations combined with constant, strong
selection across populations is a common condition. If so, the ability to estimate site-specific marginal fitnesses of amino acids from their among-population mean site-specific frequencies should open new avenues of research into the dynamics of
adaptation at the molecular level.
I wish to acknowledge the support of the Discipline of Genetics and the School
of Molecular and Biomedical Science of The University of Adelaide. I also thank
Alexei Drummond, for suggesting the interpretation of the intercept of the regression
line in Figure 2, and two anonymous reviewers whose comments greatly improved the
manuscript.
Figure Legends
FIGURE 1.—V3 amino acids in order of decreasing (top to bottom) site-specific
frequency for each coreceptor usage phenotype. Residues in bold are unique to a
phenotype at that site. Site 25, which was the target of a site-directed mutagenesis
experiment, is indicated. I downloaded V3 amino acid sequences from the HIV
Sequence Database (http://hiv-web.lanl.gov) on May 2, 2005. Since V3 varies in
sequence among subtypes (clades) of HIV-1 (KUIKEN et al. 1999), I restricted the
analysis to Subtype B, which is the subtype used in the site-directed mutagenesis
experiment. V3 also varies in sequence among coreceptor usage phenotypes. Therefore,
I used only sequences from virus with known coreceptor usage, identified in the
database as using CCR5 (R5), CXCR4 (X4), or both coreceptors (R5X4). Finally, since
the objective was to calculate site-specific amino acid frequencies within patients
(populations) and then average these values across patients, I considered only sequences
associated with an identified patient. I omitted from the dataset sequences identified as
contaminants and sequences with unidentified residues or missing either conserved
terminal Cysteine residue (these residues are absolutely conserved and form a disulfide
bond between them). Subtype B V3 is most commonly 35 amino acids long. To avoid
uncertainties in sequence alignment (site homology), I used only sequences 35 amino
acids long that aligned unambiguously (without gaps) with the vast majority of other
sequences in the dataset.
FIGURE 2.—Simple least-squares linear regression of scaled virion relative
infectivity on the scaled among-population mean relative frequency of the amino acid at
site 25 of Subtype B V3 from R5 virus. Data point labels are amino acid one-letter
codes. The regression equation is y = 0.15 + 0.84x.
FIGURE 3.—Histogram of Subtype B R5 V3 site-specific amino acid selection
coefficients relative to the most common residue at each respective site. N = 71.
FIGURE 4.—The probability that a beneficial amino acid of fitness rank j will
be the next to spread to fixation, given that an amino acid of fitness rank i is the current
wild type (P_ij), plotted against the amino acid’s marginal relative fitness (w_j), both
scaled between 0 and 1. P_ij was calculated for V3 site 25 residues using Equation 2 (see
text) with selection coefficients calculated relative to residues K (circles), A (squares),
and Q (triangles). One-letter amino acid codes are shown. The diagonal line is for y = x.
LITERATURE CITED
ACHAZ, G., S. PALMER, M. KEARNEY, F. MALDARELLI, J. W. MELLORS et al., 2004 A
Robust Measure of HIV-1 Population Turnover Within Chronically Infected
Individuals. Molecular Biology and Evolution 21: 1902-1912.
BONHOEFFER, S., E. C. HOLMES and M. A. NOWAK, 1995 Causes of HIV diversity.
Nature 376: 125.
CHUN, T. W., L. CARRUTH, D. FINZI, X. SHEN, J. A. DIGIUSEPPE et al., 1997
Quantification of latent tissue reservoirs and total body viral load in HIV-1
infection. Nature 387: 183-188.
COFFIN, J. M., 1999 Molecular Biology of HIV, pp. 3-40 in The Evolution of HIV,
edited by K. A. CRANDALL. The Johns Hopkins University Press, Baltimore.
CROW, J. F., and M. KIMURA, 1970 An Introduction to Population Genetics Theory.
Harper & Row, New York.
DA SILVA, J., 2003 The Evolutionary Adaptation of HIV-1 to Specific Immunity.
Current HIV Research 1: 363-371.
DE JONG, J. J., A. DE RONDE, W. KEULEN, M. TERSMETTE and J. GOUDSMIT, 1992
Minimal requirements for the human immunodeficiency virus type 1 V3 domain
to support the syncytium-inducing phenotype: analysis by single amino acid
substitution. Journal of Virology 66: 6777-6780.
DERDEYN, C. A., J. M. DECKER, F. BIBOLLET-RUCHE, J. L. MOKILI, M. MULDOON et
al., 2004 Envelope-constrained neutralization-sensitive HIV-1 after heterosexual
transmission. Science 303: 2019-2022.
FROST, S. D. W., T. WRIN, D. M. SMITH, S. L. K. POND, Y. LIU et al., 2005
Neutralizing antibody responses drive the evolution of human
immunodeficiency virus type 1 envelope during recent HIV infection. Proc. Natl.
Acad. Sci. USA 102: 18514-18519.
GERRISH, P., 2001 The rhythm of microbial adaptation. Nature 413: 299-302.
GERRISH, P. J., and R. E. LENSKI, 1998 The fate of competing beneficial mutations in an
asexual population. Genetica 102-103: 127-144.
GILLESPIE, J. H., 1984 Molecular evolution over the mutational landscape. Evolution
38: 1116-1129.
GILLESPIE, J. H., 1991 The causes of molecular evolution. Oxford University Press,
New York.
GRAY, R. H., M. J. WAWER, R. BROOKMEYER, N. K. SEWANKAMBO, D. SERWADDA et
al., 2001 Probability of HIV-1 transmission per coital act in monogamous,
heterosexual, HIV-1-discordant couples in Rakai, Uganda. Lancet 357: 1149-1153.
HALDANE, J. B. S., 1927 A mathematical theory of natural and artificial selection. V.
Selection and mutation. Proceedings of the Cambridge Philosophical Society 28:
838-844.
HUNG, C. S., N. VANDER HEYDEN and L. RATNER, 1999 Analysis of the critical domain
in the V3 loop of human immunodeficiency virus type 1 gp120 involved in
CCR5 utilization. Journal of Virology 73: 8216-8226.
KIMURA, M., 1964 Diffusion models in population genetics. Journal of Applied
Probability 1: 177-232.
KUIKEN, C. L., B. FOLEY, E. GUZMAN and B. T. M. KORBER, 1999 Determinants of
HIV-1 Protein Evolution, pp. 432-468 in The Evolution of HIV, edited by K. A.
CRANDALL. The Johns Hopkins University Press, Baltimore.
MANSKY, L. M., and H. M. TEMIN, 1995 Lower in vivo mutation rate of human
immunodeficiency virus type 1 than that predicted from the fidelity of purified
reverse transcriptase. Journal of Virology 69: 5087-5094.
MARKOWITZ, M., M. LOUIE, A. HURLEY, E. SUN, M. DI MASCIO et al., 2003 A novel
antiviral intervention results in more accurate assessment of human
immunodeficiency virus type 1 replication dynamics and T-cell decay in vivo.
Journal of Virology 77: 5037-5038.
NIELSEN, R., and Z. YANG, 1998 Likelihood models for detecting positively selected
amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:
929-936.
ORR, H. A., 2002 The population genetics of adaptation: The adaptation of DNA
sequences. Evolution 56: 1317-1330.
ORR, H. A., and A. J. BETANCOURT, 2001 Haldane's Sieve and Adaptation From the
Standing Genetic Variation. Genetics 157: 875-884.
PLATT, E. J., J. P. DURNIN and D. KABAT, 2005 Kinetic Factors Control Efficiencies of
Cell Entry, Efficacies of Entry Inhibitors, and Mechanisms of Adaptation of
Human Immunodeficiency Virus. Journal of Virology 79: 4347-4356.
REZA, S. M., L.-M. SHEN, R. MUKHOPADHYAY, M. ROSETTI, T. PE'ERY et al., 2003 A
Naturally Occurring Substitution in Human Immunodeficiency Virus Tat
Increases Expression of the Viral Genome. Journal of Virology 77: 8602-8606.
ROKYTA, D. R., P. JOYCE, S. B. CAUDLE and H. A. WICHMAN, 2005 An empirical test
of the mutational landscape model of adaptation using a single-stranded DNA
virus. Nature Genetics 37: 441-444.
SHANKARAPPA, R., J. B. MARGOLICK, S. J. GANGE, A. G. RODRIGO, D. UPCHURCH et
al., 1999 Consistent viral evolutionary changes associated with the progression
of human immunodeficiency virus type 1 infection. Journal of Virology 73:
10489-10502.
SHRINER, D., A. G. RODRIGO, D. C. NICKLE and J. I. MULLINS, 2004a Pervasive
Genomic Recombination of HIV-1 in Vivo. Genetics 167: 1573-1583.
SHRINER, D., R. SHANKARAPPA, M. A. JENSEN, D. C. NICKLE, J. E. MITTLER et al.,
2004b Influence of random genetic drift on human immunodeficiency virus type
1 env evolution during chronic infection. Genetics 166: 1155-1164.
SPECK, R. F., K. WEHRLY, E. J. PLATT, R. E. ATCHISON, I. F. CHARO et al., 1997
Selective employment of chemokine receptors as human immunodeficiency
virus type 1 coreceptors determined by individual amino acids within the
envelope V3 loop. Journal of Virology 71: 7136-7139.
TEMPLETON, A. R., R. A. REICHERT, A. E. WEISSTEIN, X.-F. YU and R. B. MARKHAM,
2004 Selection in Context: Patterns of Natural Selection in the Glycoprotein 120
Region of Human Immunodeficiency Virus 1 within Infected Individuals.
Genetics 167: 1547-1561.
UNAIDS, 2004 2004 report on the global HIV/AIDS epidemic: 4th global report. Joint
United Nations Programme on HIV/AIDS (UNAIDS).
WEI, X., J. M. DECKER, S. WANG, H. HUI, J. C. KAPPES et al., 2003 Antibody
neutralization and escape by HIV-1. Nature 422: 307-312.
WILLIAMSON, S., 2003 Adaptation in the env gene of HIV-1 and evolutionary theories
of disease progression. Molecular Biology and Evolution 20: 1318-1325.
YAMAGUCHI, Y., and T. GOJOBORI, 1997 Evolutionary mechanisms and population
dynamics of the third variable envelope region of HIV within single hosts. Proc.
Natl. Acad. Sci. USA 94: 1264-1269.
YUSTE, E., C. LOPEZ-GALINDEZ and E. DOMINGO, 2000 Unusual Distribution of
Mutations Associated with Serial Bottleneck Passages of Human
Immunodeficiency Virus Type 1. Journal of Virology 74: 9546-9552.
ZHANG, L. Q., P. MACKENZIE, A. CLELAND, E. C. HOLMES, A. J. BROWN et al., 1993
Selection for specific sequences in the external envelope protein of human
immunodeficiency virus type 1 upon primary infection. Journal of Virology 67:
3345-3356.
ZHANG, Y.-W., O. A. RYDER and Y.-P. ZHANG, 2003 Intra- and Interspecific Variation
of the CCR5 Gene in Higher Primates. Molecular Biology and Evolution 20:
1722-1729.
ZOLLA-PAZNER, S., 2004 Identifying epitopes of HIV-1 that induce protective
antibodies. Nature Reviews Immunology 4: 199-210.
Figure 1 (image not reproduced; panels R5, R5X4 and X4, with site 25 marked in the R5 panel). Most common residue at each site (top row of each panel):
R5: CTRPNNNTRKSIHIGPGRAFYATGDIIGDIRQAHC
R5X4: CTRPNNNTRKRISIGPGRAFYTTGQIIGDIRKAHC
X4: CTRPNNNTRKRISIGPGRAFYTTGQIIGDIRQAHC
Figure 2 (image not reproduced; y axis: Scaled Relative Infectivity; x axis: Scaled Mean Relative Frequency; points labeled D, E, Q, K, R, A, G)
Figure 3 (image not reproduced; histogram with y axis: Frequency; x axis: Selection Coefficient)
Figure 4 (image not reproduced; y axis: Scaled P_ij; x axis: Scaled w_j; points labeled D, E, Q, A, G, R)