1 Natural Selection
1.1 Maximum Likelihood Estimation
Maximum Likelihood Estimation of Selection
We have previously derived recurrence relations for allele frequencies over time, given relative fitnesses $w_{uv}$ and starting allele frequencies $p_u$:
$$p_u(t+1) = \frac{p_u^2(t)\,w_{uu} + \sum_{v \neq u} p_u(t)\,p_v(t)\,w_{uv}}{\bar{w}(t)}.$$
Suppose you observe allele counts over multiple generations, $\{n_1(1), n_2(1), \ldots, n_1(2), n_2(2), \ldots\}$.
The allele frequencies in each generation are $p_u(1), p_u(2), \ldots$, where $p_u(t+1)$ is a function (given above) of $p_u(t)$ for all $t > 0$. Then the likelihood for the fitness model is
$$L(\{w_{uv}, p_u(1)\}) \propto \prod_t \prod_u [p_u(t)]^{n_u(t)}.$$
Numerical methods are required to maximize this likelihood over $w_{uv}$ and $p_u(1)$.
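As a rough illustration of the quantities involved, the following Python sketch evaluates the log of the likelihood above for a given fitness matrix and starting frequencies. The function names (`next_freqs`, `log_likelihood`), the fitness values and the "data" are assumptions made up for this example; a real analysis would pass `log_likelihood` to a numerical optimizer rather than compare two hand-picked models.

```python
import math

def next_freqs(p, w):
    """Apply the selection recurrence once: p[u] -> p[u] * sum_v p[v] * w[u][v] / wbar."""
    k = len(p)
    marginal = [sum(p[v] * w[u][v] for v in range(k)) for u in range(k)]
    wbar = sum(p[u] * marginal[u] for u in range(k))  # mean fitness
    return [p[u] * marginal[u] / wbar for u in range(k)]

def log_likelihood(w, p1, counts):
    """Sum over generations t and alleles u of n_u(t) * log p_u(t), up to a constant."""
    p = list(p1)
    total = 0.0
    for n in counts:  # counts[t] holds the allele counts observed in generation t + 1
        total += sum(n_u * math.log(p_u) for n_u, p_u in zip(n, p))
        p = next_freqs(p, w)
    return total

# Hypothetical data: expected counts (sample size 100) under made-up "true" fitnesses.
w_true = [[1.0, 0.9], [0.9, 0.7]]
p_start = [0.3, 0.7]
counts, p = [], list(p_start)
for _ in range(5):
    counts.append([100 * x for x in p])
    p = next_freqs(p, w_true)

ll_true = log_likelihood(w_true, p_start, counts)
ll_neutral = log_likelihood([[1.0, 1.0], [1.0, 1.0]], p_start, counts)
```

With these made-up counts, the generating fitnesses score a higher log-likelihood than the neutral model, which is the comparison a maximizer would exploit.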
HIV Example Wu06
2 Mutation
2.1 Theory
Mutation
Mutation provides the raw material for evolution. All mutations are ultimately changes at
the nucleotide level.
The vast majority of mutations that have an effect are deleterious and incompletely dominant
(fitnesses 1 : 1 − hs : 1 − s for AA : AB : BB, with h < 1). These mutations are present in populations because they arise
by accident during genome copying in meiosis. How does selection act upon them?
Such deleterious mutations ultimately achieve an equilibrium state, wherein their production
by mutation is balanced by their removal by selection.
Overview of mutation: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=hmg.chapter.1049
Modeling Mutation
Consider a locus with two possible alleles, A and B.
Suppose the mutation rate (per replication cycle per locus) for mutating A → B is $u$. Let $v$ be the rate from B → A.
Let $p_A(t)$ be the frequency of allele A in the $t$th generation.
In the next generation, type A alleles will arise by faithful copying of type A alleles from the previous generation, or by mutation during copying of type B alleles from the previous generation. So,
$$p_A(t+1) = (1-u)\,p_A(t) + v\,[1 - p_A(t)]$$
$$\Delta p_A(t+1) = p_A(t+1) - p_A(t) = -u\,p_A(t) + v\,[1 - p_A(t)].$$
If the mutation rate per site per generation is $10^{-9}$ to $10^{-10}$ and we consider a gene locus of about 1000 nucleotides, the mutation rate per gene per replication cycle is therefore $10^{-6}$ to $10^{-7}$.
A Stable Mutation Equilibrium
A mutation equilibrium occurs when $\Delta p_A(t) = 0$ and $p_A(t) = p_A$. Solving the equation for $\Delta p_A(t) = 0$,
$$0 = -u\,p_A + v\,[1 - p_A],$$
yields
$$p_A = \frac{v}{u+v}$$
at equilibrium.
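A minimal numerical check of this equilibrium, assuming illustrative mutation rates that are far larger than realistic ones so the iteration converges quickly. The name `mutate` and all numbers are made up for the sketch.

```python
def mutate(p_a, u, v):
    """One generation of two-way mutation: A -> B at rate u, B -> A at rate v."""
    return (1 - u) * p_a + v * (1 - p_a)

u, v = 1e-3, 4e-3          # illustrative rates, far larger than realistic ones
p_a = 0.1                  # arbitrary starting frequency of allele A
for _ in range(5000):
    p_a = mutate(p_a, u, v)

p_eq = v / (u + v)         # predicted equilibrium frequency of allele A
```

After 5000 generations the iterated frequency agrees with $v/(u+v)$ to many decimal places, whatever the starting value.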
Exercise. The equilibrium is stable. To verify this, plug in $p_A = \frac{v}{u+v} - \delta$ in the equation for $\Delta p_A$. Will $p_A$ increase or decrease in the next generation? Repeat with $p_A = \frac{v}{u+v} + \delta$.
Exercise. If A is by far the most common allele ($p_A \approx 1$), show $\Delta p_A(t) \approx -u$. How much relative error is introduced in making this approximation? Relative error is the difference between the exact and approximate values divided by the exact value.
Rate of Approach to Equilibrium
Take the recurrence relation for $p_A(t)$ and subtract the equilibrium $p_A$:
$$\begin{aligned}
p_A(t+1) - p_A &= (1-u)\,p_A(t) + v\,[1 - p_A(t)] - p_A \\
&= (1-u)\,p_A(t) + v\,[1 - p_A(t)] - (1-u)\,p_A - v\,[1 - p_A] \\
&= (1-u)\,[p_A(t) - p_A] + v\,[1 - p_A(t) - 1 + p_A] \\
&= [1 - u - v]\,[p_A(t) - p_A],
\end{aligned}$$
and we’re overly familiar with this kind of equation:
$$p_A(t) - p_A = (1 - u - v)^t\,(p_{A0} - p_A),$$
where $p_{A0}$ is the initial frequency of type A alleles. The approach to equilibrium is very slow since $1 - u - v \approx 1$.
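To get a feel for how slow the approach is, the sketch below compares the geometric closed form with direct iteration of the recurrence. The rates are hypothetical values in the per-gene range quoted earlier, and the function names are made up for this example.

```python
def p_closed_form(p0, u, v, t):
    """p_A(t) from the geometric decay of the distance to the equilibrium v / (u + v)."""
    p_eq = v / (u + v)
    return p_eq + (1 - u - v) ** t * (p0 - p_eq)

def p_iterated(p0, u, v, t):
    """p_A(t) obtained by applying the one-generation recurrence t times."""
    p = p0
    for _ in range(t):
        p = (1 - u) * p + v * (1 - p)
    return p

u, v, p0 = 1e-6, 1e-7, 1.0            # hypothetical per-gene rates; start with only A
p_iter = p_iterated(p0, u, v, 10_000)
p_closed = p_closed_form(p0, u, v, 10_000)
```

With these rates the equilibrium is about 0.09, yet after 10,000 generations the frequency of A is still above 0.98.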
Exercise. By Taylor’s series, we know $(1 - u - v)^t \approx e^{-(u+v)t}$. Use this approximation
to compute the landmark time $t_{1/2}$, the time it takes to decrease the starting disequilibrium
$p_{A0} - p_A$ by one-half.
Neglecting Back Mutation
You will commonly hear someone say or write, "and we neglected back mutation."
If A is the normal (wild type) form and B is the mutant form of the allele, then neglecting
back mutation is equivalent to setting $v = 0$. Back mutation mutates the mutant form back to
the wild type form of the allele.
It is biologically reasonable to neglect back mutation because often we are speaking of an
allele variant of a protein. Either the protein works (normal/wild type) or it doesn’t (mutant).
There are many more ways to make a protein that doesn’t work than one that does, so generally
$u \gg v$.
However, when considering DNA sequences it is not reasonable to neglect back mutation.
If A → C with probability $u$, then it is normally not all right to assume C → A is virtually
impossible (i.e. $v = 0$).
Mutation with Multiple Alleles
For DNA sequences it is often the case that you are dealing with a large number $n$ of possible
alleles $A_1, A_2, \ldots, A_n$.
Let $u_{ij}$ be the mutation rate from allele $i$ to allele $j$. Then the recursion equation for type $i$
alleles is
$$p_i(t+1) = p_i(t)\left[1 - \sum_{j \neq i} u_{ij}\right] + \sum_{j \neq i} p_j(t)\,u_{ji}.$$
Equations can be established for the equilibrium allele frequencies $p_i$ by setting $p_i(t+1) = p_i(t) = p_i$ in the equations. For given $u_{ij}$, the result is a linear system of equations that can be
solved for $p_i$.
Equilibrium with Multiple Alleles
For the special case that $u_{ij} = u$ for all $i \neq j$,
$$\begin{aligned}
p_i &= p_i\,[1 - (n-1)u] + \sum_{j \neq i} u\,p_j \\
(n-1)\,p_i &= \sum_{j \neq i} p_j \\
n\,p_i &= 1 \qquad \text{(since } \textstyle\sum_{j \neq i} p_j = 1 - p_i\text{)} \\
p_i &= \frac{1}{n}.
\end{aligned}$$
So, when all mutations are equally likely, all alleles are equally prevalent at equilibrium.
This model of evolution at the DNA level is called the Jukes-Cantor model of nucleotide
substitution. It implies that all nucleotides A, C, G, and T are equally likely at every position in
the alignment, when mutational equilibrium has been achieved.
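The multi-allele recursion and its equal-rates equilibrium can be sketched as follows. The function name `mutate_multi`, the starting frequencies and the (unrealistically large) rate are assumptions chosen so that the iteration converges in a few thousand steps.

```python
def mutate_multi(p, rates):
    """One generation of mutation among n alleles; rates[i][j] is the i -> j mutation rate."""
    n = len(p)
    new_p = []
    for i in range(n):
        stay = 1 - sum(rates[i][j] for j in range(n) if j != i)   # allele i copied faithfully
        gain = sum(p[j] * rates[j][i] for j in range(n) if j != i)  # other alleles mutating to i
        new_p.append(p[i] * stay + gain)
    return new_p

n, u = 4, 1e-3     # four nucleotides; u is an illustrative, unrealistically large rate
rates = [[0.0 if i == j else u for j in range(n)] for i in range(n)]
p = [0.7, 0.1, 0.1, 0.1]     # arbitrary starting frequencies
for _ in range(10_000):
    p = mutate_multi(p, rates)
```

Starting from unequal frequencies, all four values end up at 1/4, as the equal-rates derivation predicts. Passing an unequal `rates` matrix iterates the general recursion instead.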
2.2 With selection - haploid
Mutation and Selection
Because there are more ways to make a bad protein than a good, functional protein,
it would seem that mutation generally pushes toward a worse equilibrium. How does
nature handle/control mutation?
DNA Repair http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=hmg.section.1151
Mutation and Selection - Haploid
Consider two alleles (genotypes) A and B. Let genotype A have allele frequency $p$ at birth.
We will consider how selection followed by mutation, in the circle of life, will impact the allele
frequency.
If the fitness of genotype A is 1 and the fitness of genotype B is $1 - s$, then after selection
the genotype frequency will shift from the initial $p$ to
$$p^* = \frac{p}{p + (1-s)(1-p)} = \frac{p}{1 - (1-p)s}.$$
Next, the individuals copy themselves and the possibility of mutation is introduced. The
allele frequency after mutation is
$$p' = (1-u)\,p^*,$$
where we have neglected back mutation.
Mutation and Selection Equilibrium - Haploid
Combining selection and mutation, we have
$$p' = \frac{(1-u)\,p}{1 - (1-p)s}.$$
At equilibrium $p' = p$. Make the substitution and solve for the equilibrium:
$$\begin{aligned}
p\,[1 - (1-p)s] &= p\,(1-u) \\
u\,p - s\,p\,(1-p) &= 0 \\
p\,[u - s(1-p)] &= 0.
\end{aligned}$$
When will the system be at equilibrium?
Mutation and Selection Equilibrium - Haploid
$$q_e = \frac{u}{s},$$
where $q_e = 1 - p_e$.
If $s \gg u$, then $q_e$ is predicted to be small. There won’t be much allele B around despite the
efforts of mutation to increase its numbers!
The force of selection is generally much stronger than the force of mutation.
Only at the DNA level can you see mutations that may have little impact on fitness and hence
have small s, even on the order of magnitude of u.
What happens when u > s?
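The haploid selection-then-mutation cycle is easy to iterate numerically. The sketch below is only an illustration under made-up parameter values; `haploid_step` and `equilibrium_q` are names invented for it.

```python
def haploid_step(p, s, u):
    """Selection against B (fitness 1 - s), then mutation A -> B at rate u, no back mutation."""
    p_star = p / (1 - (1 - p) * s)      # frequency of A after selection
    return (1 - u) * p_star             # frequency of A after mutation

def equilibrium_q(s, u, p0=0.5, generations=2000):
    """Iterate the cycle and return the resulting frequency of the mutant allele B."""
    p = p0
    for _ in range(generations):
        p = haploid_step(p, s, u)
    return 1 - p

q_balance = equilibrium_q(s=0.1, u=1e-3)   # s much larger than u
q_swamped = equilibrium_q(s=0.1, u=0.2)    # u larger than s
```

In the first case the iteration settles at $q = u/s = 0.01$. In the second, $u/s$ exceeds 1, so the only admissible root of $p\,[u - s(1-p)] = 0$ is $p = 0$, and in this model (which ignores back mutation) allele A is lost.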
Mutation/Selection Balance Example
Ribeiro et al. 1998. The frequency of resistant mutant virus before antiviral therapy. AIDS. 12:461–465.
$$\begin{aligned}
\dot{x} &= \lambda - d\,x - \beta x\,[w + (1-s)\,m] \\
\dot{w} &= \beta x\,[(1-\mu)\,w + (1-s)\,\mu\,m] - a\,w \\
\dot{m} &= \beta x\,[\mu\,w + (1-s)(1-\mu)\,m] - a\,m
\end{aligned}$$
Mutation/Selection Balance Example
$$\begin{aligned}
x_{t+1} - x_t &= \lambda - d\,x_t - \beta x_t\,[w_t + (1-s)\,m_t] \\
w_{t+1} - w_t &= \beta x_t\,[(1-\mu)\,w_t + (1-s)\,\mu\,m_t] - a\,w_t \\
m_{t+1} - m_t &= \beta x_t\,[\mu\,w_t + (1-s)(1-\mu)\,m_t] - a\,m_t
\end{aligned}$$

Parameter    Meaning
x_t          Count of susceptible cells at generation t
w_t          Count of wild type-infected cells in generation t
m_t          Count of mutant-infected cells in generation t
λ            Count of new susceptible cells born in each generation
d            Prob. susceptible cell dies in a generation
β            Prob. encounter of susc. cell and wild type-infected cell infects susc. cell
(1 − s)β     Prob. encounter of susc. and mutant-infected cell infects susc. cell
µ            Mutation rate
a            Prob. of death of infected cell
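One way to explore this model is to iterate the difference equations until they stop changing and then compare the mutant to wild type ratio with µ/s. The parameter values below are hypothetical, chosen only so that the discrete iteration settles to a positive equilibrium; they are not taken from Ribeiro et al.

```python
def step(x, w, m, lam, d, beta, s, mu, a):
    """One generation of the difference equations for susceptible (x),
    wild type-infected (w) and mutant-infected (m) cell counts."""
    new_x = x + lam - d * x - beta * x * (w + (1 - s) * m)
    new_w = w + beta * x * ((1 - mu) * w + (1 - s) * mu * m) - a * w
    new_m = m + beta * x * (mu * w + (1 - s) * (1 - mu) * m) - a * m
    return new_x, new_w, new_m

# Hypothetical parameter values, not from the paper.
params = dict(lam=10.0, d=0.01, beta=0.001, s=0.1, mu=1e-4, a=0.5)
x, w, m = 500.0, 10.0, 0.0        # start near the infected steady state, with no mutants
for _ in range(20_000):
    x, w, m = step(x, w, m, **params)

ratio = m / w                               # mutant to wild type ratio after many generations
predicted = params["mu"] / params["s"]      # the simple mutation/selection ratio
```

For these values the iterated ratio agrees with µ/s to within a fraction of a percent; the small remaining difference comes from the back mutation terms.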
They find at equilibrium
$$\frac{m_e}{w_e} = \frac{\mu}{s}.$$
Does this look familiar?
2.3 With selection - diploid
Mutation and Selection - Diploid Recessive
First consider recessive mutants such that AA and AB individuals have relative fitness
$w_A = 1$, while the homozygous recessive mutant BB has decreased relative fitness $w_B = 1 - s$.
Assume that the frequency of allele A is $p$ in the starting gamete pool.
After selection
$$p^* = \frac{p^2 + p(1-p)}{p^2 + 2p(1-p) + (1-s)(1-p)^2} = \frac{p}{1 - s(1-p)^2}.$$
After mutation, where again we neglect back mutation,
$$p' = \frac{p\,(1-u)}{1 - s(1-p)^2}.$$
Diploid Recessive - Mutation/Selection Balance
To find the mutation/selection balance for the recessive diploid case, again assume equilibrium so $p' = p = p_e$. Then,
$$1 - s(1-p_e)^2 = 1 - u.$$
Therefore, at equilibrium
$$q_e = 1 - p_e = \sqrt{\frac{u}{s}}.$$
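The recessive balance can be checked by iterating the selection-then-mutation map directly. This is a sketch with illustrative values (the mutation rate is far larger than realistic ones so that convergence is quick), and `recessive_step` is a name made up for it.

```python
import math

def recessive_step(p, s, u):
    """Selection against BB only (fitness 1 - s), then mutation A -> B at rate u."""
    q = 1 - p
    return p * (1 - u) / (1 - s * q * q)

s, u = 0.5, 1e-4        # illustrative values; u is far larger than realistic rates
p = 1.0                 # start with no mutant alleles at all
for _ in range(20_000):
    p = recessive_step(p, s, u)

q_iterated = 1 - p
q_predicted = math.sqrt(u / s)   # square-root balance for a recessive mutant
q_haploid = u / s                # haploid balance with the same s and u, for comparison
```

Here the recessive mutant settles near 0.014, roughly seventy times the haploid value of 0.0002 for the same $s$ and $u$.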
Comparing to the haploid case indicates that the gene frequency of the mutant allele B will
be higher in the diploid recessive case than the haploid case. Can you explain this biologically?
Exercise. What is the frequency of affected individuals at equilibrium?
Example
Cystic fibrosis is a disease caused by a recessive allele, which we shall call B. The frequency of
affected individuals at birth is 1 in 2,500.
What is the cystic fibrosis allele B frequency $q$ in the population?
Until recently, cystic fibrosis was fatal before affected individuals reached reproductive age,
so the fitness of BB is $w_B = 1 - s = 0$, i.e. $s = 1$. At equilibrium
$$q_e = \sqrt{u}.$$
Is the required mutation rate reasonable for a disease caused by a single mutant protein?
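The arithmetic behind this example can be laid out as below, assuming random mating so that the frequency of affected (BB) births equals $q^2$. The variable names are made up for the sketch.

```python
import math

affected_at_birth = 1 / 2500          # frequency of BB individuals at birth, from the text
q = math.sqrt(affected_at_birth)      # allele frequency, assuming BB frequency = q^2
s = 1.0                               # BB individuals historically did not reproduce
u_required = s * q ** 2               # mutation rate implied by q_e = sqrt(u / s)
carriers = 2 * q * (1 - q)            # heterozygous carrier frequency under random mating
```

Under these assumptions $q = 0.02$ and the mutation rate needed to maintain it is $4 \times 10^{-4}$, several hundred times the per-gene rate of $10^{-6}$ to $10^{-7}$ quoted earlier.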
Mutation/Selection Balance - Diploid Dominant
Now we consider the case where the mutant allele is completely or partially dominant to the wild type
allele.
First, we assume geometric (multiplicative) fitness.

Genotype   AA   AB      BB
Fitness    1    1 − s   (1 − s)^2

After selection
$$p^* = \frac{p\,\bar{w}_A}{\bar{w}} = \frac{p}{1 - (1-p)s},$$
where $\bar{w}_A = 1 - (1-p)s$ is the marginal fitness of allele A and $\bar{w} = [1 - (1-p)s]^2$ is the mean fitness. This is the same as the haploid case. Mutation affects allele frequencies in diploids just the same way it does in
haploids, so the haploid results apply to diploid loci under multiplicative selection and $q_e \approx u/s$.
Selection in Homozygotes vs. Heterozygotes
Every generation a fraction $2sq(1-q)$ of the population is “killed” by selection as heterozygotes. Each
killing destroys one mutant B allele.
Similarly, a fraction $q^2\,[1 - (1-s)^2]$ of the population is “killed” by selection as homozygotes each generation. Each of these killings destroys 2 B alleles. The ratio of mutant alleles destroyed in heterozygotes to homozygotes is
$$\frac{2sq(1-q)}{2q^2\,[1 - (1-s)^2]} = \frac{s(1-q)}{q\,(2s - s^2)} = \frac{1-q}{q\,(2-s)},$$
which falls between $\frac{1-q}{2q}$ and $\frac{1-q}{q}$ because $s \in [0, 1]$.
Since $q \ll 1$, we conclude that this ratio is very large, and most mutant alleles are destroyed
by selection on heterozygotes. That’s because most mutant alleles are present in heterozygotes
when the mutant allele is rare.
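A quick numerical instance of this ratio, assuming the multiplicative fitness scheme and hypothetical values of $q$ and $s$; the function name is invented for the example.

```python
def allele_loss_ratio(q, s):
    """Mutant alleles removed via heterozygotes divided by those removed via homozygotes,
    under multiplicative fitnesses 1, 1 - s, (1 - s)^2."""
    het = 2 * s * q * (1 - q)                 # one B allele lost per heterozygote removed
    hom = 2 * q ** 2 * (1 - (1 - s) ** 2)     # two B alleles lost per homozygote removed
    return het / hom

q, s = 0.01, 0.1                      # hypothetical rare mutant with a 10% fitness cost
ratio = allele_loss_ratio(q, s)
lower, upper = (1 - q) / (2 * q), (1 - q) / q
```

With $q = 0.01$ and $s = 0.1$, roughly 52 mutant alleles are removed through heterozygotes for every one removed through homozygotes, between the bounds 49.5 and 99.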
Mut./Sel. Balance - Diploid Partial Dominance
Let’s parameterize partial dominance as follows:

Genotype   AA   AB       BB
Fitness    1    1 − hs   1 − s

h indicates how much of the fitness detriment in homozygote mutants BB is also shared by the
heterozygote mutant carriers AB. So, for example, if h ≈ 1, then heterozygotes are nearly as
affected as homozygotes.
Because the mutant allele B will be rare when $s \gg u$, homozygote BB will be rare in the
population. Selection will mostly be acting on heterozygotes, so there cannot possibly be much
practical difference between the above selection scheme and

Genotype   AA   AB       BB
Fitness    1    1 − hs   (1 − hs)^2
Mut./Sel. Balance - Diploid Partial Dominance
The latter selection scheme is the familiar multiplicative one.
Therefore, the mutant allele frequency for the general partial dominance fitness landscape is
$$q_e \approx \frac{u}{hs}.$$
Caution. The above result is true only when the mutant allele B is rare, i.e. $u \ll hs$.
Even fairly moderate heterozygote effects, e.g. h small, can still keep the mutant allele
frequency $q_e$ low as long as $u \ll s$.
Counterintuitively, as far as population impact, the small fitness effects on mutant carriers
(heterozygote AB) are much more important than the potentially huge impacts on affected homozygotes BB.
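To see how good the approximation is, one can iterate the full partial dominance model (fitnesses 1, 1 − hs, 1 − s, without the multiplicative simplification). The sketch below uses illustrative values with $u$ much smaller than $hs$; `diploid_step` is a made-up name.

```python
def diploid_step(q, h, s, u):
    """Selection with fitnesses AA: 1, AB: 1 - h*s, BB: 1 - s, then mutation A -> B at rate u."""
    p = 1 - q
    wbar = p * p + 2 * p * q * (1 - h * s) + q * q * (1 - s)     # mean fitness
    q_star = (p * q * (1 - h * s) + q * q * (1 - s)) / wbar      # B frequency after selection
    return q_star + u * (1 - q_star)                              # A alleles mutating to B

h, s, u = 0.5, 0.1, 1e-5     # illustrative values, with u much smaller than h*s
q = 0.0
for _ in range(5000):
    q = diploid_step(q, h, s, u)

q_approx = u / (h * s)       # approximate balance from the multiplicative argument
```

For these values the iterated equilibrium and $u/(hs) = 2 \times 10^{-4}$ differ by far less than one percent.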
2.4 Genetic load
Haldane-Muller Principle for Haploids
We will now concern ourselves with the magnitude of the detrimental effect of mutation on
a population.
For haploids, the mean relative fitness of the population is
w̄ = 1 − q + (1 − s)q = 1 − sq
and at equilibrium q = qe = u/s, so mean relative fitness is
w̄e = 1 − u.
Surprisingly, the effect of mutation on the mean relative fitness of the population is to decrease it by fraction u, which is independent of the fitness of the mutant!
Haldane-Muller Principle for Diploids
For diploids, the mean relative fitness with a recessive mutant allele is
$$\bar{w} = (1-q)^2 + 2q(1-q) + (1-s)\,q^2 = 1 - s\,q^2.$$
At equilibrium $q_e = \sqrt{u/s}$, so again
$$\bar{w}_e = 1 - u.$$
For partially or fully dominant alleles, mean relative fitness is
$$\bar{w} = (1-q)^2 + 2(1-hs)\,q(1-q) + (1-s)\,q^2 = 1 - 2hs\,q(1-q) - s\,q^2.$$
At equilibrium $q_e \approx \frac{u}{hs}$,
$$\bar{w}_e = 1 - 2u + \frac{2u^2}{hs} - \frac{u^2}{h^2 s} \approx 1 - 2u,$$
since $u$ is very small and $u^2$ is negligible.
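The independence of equilibrium mean fitness from $s$ can be checked numerically. The sketch below evaluates the diploid mean fitness at the equilibrium frequencies for a few hypothetical combinations of $h$ and $s$; the function and variable names are assumptions made for the example.

```python
def mean_fitness(q, h, s):
    """Diploid mean fitness with genotype fitnesses AA: 1, AB: 1 - h*s, BB: 1 - s."""
    return (1 - q) ** 2 + 2 * (1 - h * s) * q * (1 - q) + (1 - s) * q ** 2

u = 1e-6                                   # illustrative mutation rate
# Strongly and weakly deleterious partially dominant mutants, at q_e = u / (h * s).
w_strong = mean_fitness(u / (0.5 * 0.5), h=0.5, s=0.5)
w_weak = mean_fitness(u / (0.1 * 0.02), h=0.1, s=0.02)
# Fully recessive mutant (h = 0), at q_e = sqrt(u / s).
w_recessive = mean_fitness((u / 0.5) ** 0.5, h=0.0, s=0.5)
```

Both partially dominant cases give a mean fitness very close to $1 - 2u$ despite a 25-fold difference in $s$, while the recessive case gives $1 - u$.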
Genetic Load
The fraction of mean relative fitness lost because of mutation is called genetic load. It represents the
cost of mutation.
$$L = \frac{w_{\max} - \bar{w}}{w_{\max}},$$
where $w_{\max}$ is the fitness of the maximally fit genotype in the population. If $w_{\max} = 1$, then $L \approx 2u$ for
diploids with partially or fully dominant mutants.
Consider $n$ independent loci each mutating and contributing to the load. If we assume fitness effects
across loci are multiplicative, then for $n$ partially dominant loci, the mean fitness is
$$(1 - 2u)^n \approx e^{-2un}$$
and genetic load is $1 - e^{-2un}$.
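A small numerical comparison of the exact multi-locus load with its exponential approximation, using a hypothetical per-locus mutation rate and number of loci (neither value comes from the notes):

```python
import math

def load_exact(u, n):
    """Load when each of n loci multiplies mean fitness by (1 - 2u)."""
    return 1 - (1 - 2 * u) ** n

def load_approx(u, n):
    """Exponential approximation 1 - exp(-2un)."""
    return 1 - math.exp(-2 * u * n)

u, n = 1e-6, 20_000         # hypothetical per-locus rate and number of loci
exact, approx = load_exact(u, n), load_approx(u, n)
```

With these numbers $2un = 0.04$, so the load is about 3.9% and the two expressions agree to better than one part in a million.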
Exercise. Show the cost of recessive mutations is less than the cost of dominant mutations.
Mutation and Linkage Disequilibrium
Let the gamete frequencies for two loci be $P_{AC}, P_{AD}, P_{BC}, P_{BD}$, where locus 1 has alleles A and
B and locus 2 has alleles C and D.
Suppose the mutation rate A → B is $u_1$ and B → A is $v_1$ for locus 1. Similarly define $u_2$ and $v_2$ for
locus 2.
Follow $P_{AC}$ over time and assume linkage equilibrium at generation $t$:
$$\begin{aligned}
P_{AC}(t+1) &= (1-u_1)(1-u_2)\,P_{AC}(t) + (1-u_1)\,v_2\,P_{AD}(t) + v_1(1-u_2)\,P_{BC}(t) + v_1 v_2\,P_{BD}(t) \\
&= (1-u_1)(1-u_2)\,p_A(t)\,p_C(t) + (1-u_1)\,v_2\,p_A(t)\,p_D(t) + v_1(1-u_2)\,p_B(t)\,p_C(t) + v_1 v_2\,p_B(t)\,p_D(t) \\
&= [(1-u_1)\,p_A(t) + v_1\,p_B(t)]\,[(1-u_2)\,p_C(t) + v_2\,p_D(t)] \\
&= p_A(t+1)\,p_C(t+1),
\end{aligned}$$
which is linkage equilibrium.
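The same bookkeeping can be done numerically for all four gametes. The sketch below assumes the usual disequilibrium coefficient $D = P_{AC}P_{BD} - P_{AD}P_{BC}$ (a standard definition that these notes do not spell out), and the function names and rates are made up for illustration.

```python
def mutate_gametes(P, u1, v1, u2, v2):
    """One generation of mutation on two-locus gamete frequencies.
    P maps 'AC', 'AD', 'BC', 'BD' to frequencies; the two loci mutate independently."""
    to_A = {"A": 1 - u1, "B": v1}      # probability the locus 1 allele is copied as A
    to_C = {"C": 1 - u2, "D": v2}      # probability the locus 2 allele is copied as C
    new = {"AC": 0.0, "AD": 0.0, "BC": 0.0, "BD": 0.0}
    for gamete, freq in P.items():
        a, c = to_A[gamete[0]], to_C[gamete[1]]
        new["AC"] += freq * a * c
        new["AD"] += freq * a * (1 - c)
        new["BC"] += freq * (1 - a) * c
        new["BD"] += freq * (1 - a) * (1 - c)
    return new

def disequilibrium(P):
    """Linkage disequilibrium coefficient D = P_AC * P_BD - P_AD * P_BC."""
    return P["AC"] * P["BD"] - P["AD"] * P["BC"]

pA, pC = 0.7, 0.4                       # arbitrary allele frequencies
P = {"AC": pA * pC, "AD": pA * (1 - pC), "BC": (1 - pA) * pC, "BD": (1 - pA) * (1 - pC)}
P_next = mutate_gametes(P, u1=1e-3, v1=2e-3, u2=3e-3, v2=5e-4)
```

Starting from gametes in linkage equilibrium, `disequilibrium(P_next)` stays at zero up to floating-point rounding.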
Mutation and Linkage Disequilibrium II
So, by the next generation, the two loci are still in linkage equilibrium.
We conclude that mutation cannot create linkage disequilibrium.
Unless....
If mutation is so rare that it becomes a random force, then mutation creates temporary linkage disequilibrium. Imagine a mutation that occurs on average once every million years. When it is first introduced, it
will be introduced on a particular chromosome background.
[Figure: a new mutation (*) arising on a single chromosome background carrying marker alleles A3 B6 C2 D9 E1 F1 G2 H4 I2 J6 K1.]