Genome Evolution. Amos Tanay 2010
Genome evolution
Lecture 4:
population genetics III: selection
Population genetics
Drift: the process by which allele frequencies change through generations
Mutation: the process by which new alleles are introduced
Recombination: the process by which multi-allelic genomes are mixed
Selection: the effect of fitness on the dynamics of allele frequencies
Epistasis: the effects of fitness dependencies among different alleles on their dynamics
“Organismal” effects: ecology, geography, behavior
Wright-Fisher model for genetic drift
[Figure: N individuals produce an infinite pool of gametes, from which the N individuals of the next generation are drawn]
We follow the number of copies of an allele in the population until fixation (X = 2N) or loss (X = 0).
We can model this as a Markov process on a variable X (the number of A alleles) with transition probabilities:
$$T_{ij} = \binom{2N}{j}\left(\frac{i}{2N}\right)^{j}\left(1-\frac{i}{2N}\right)^{2N-j}$$
i.e., the probability of drawing j copies of A when sampling 2N alleles from a population of 2N alleles, i of which are A.
In a larger population the frequency changes more slowly: the variance of the allele frequency after one round of binomial sampling is pq/2N, so sampling changes it only a little.
[Figure: Markov chain on the states 0 (loss), 1, ..., 2N-1, 2N (fixation)]
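The transition probabilities can be checked numerically. A minimal sketch (Python standard library only; the function name `wright_fisher_matrix` is hypothetical) that builds T for a small population and confirms that 0 and 2N are absorbing and that the expected number of A copies does not change from one generation to the next:

```python
from math import comb

def wright_fisher_matrix(two_n):
    """T[i][j] = P(X_{t+1} = j | X_t = i) for neutral Wright-Fisher sampling of 2N copies."""
    matrix = []
    for i in range(two_n + 1):
        p = i / two_n  # current frequency of allele A
        # Binomial(2N, i/2N) probability of drawing j copies of A
        row = [comb(two_n, j) * p**j * (1 - p) ** (two_n - j) for j in range(two_n + 1)]
        matrix.append(row)
    return matrix

T = wright_fisher_matrix(10)  # 2N = 10 gene copies (N = 5 diploids)
print(T[0][0], T[10][10])     # states 0 (loss) and 2N (fixation) are absorbing
print(sum(j * t for j, t in enumerate(T[3])))  # expected count stays at i = 3 (up to rounding)
```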
The Moran model
Instead of working with discrete generations, we replace at most one individual at each time step.
[Figure: at each time step one individual (X) is removed and replaced by sampling from the current population of A and a alleles]
Assuming the time steps are small (Δt → 0), what kind of mathematical model describes the process?
The Moran model
Assume the rate of replacement for each individual is 1.
We derive a model similar to Wright-Fisher, but in continuous time: a process on a random variable counting the number of A alleles:
[Figure: birth-death chain on the states 0 (loss), 1, ..., i-1, i, i+1, ..., 2N-1, 2N (fixation)]
Rates:
“Birth” $i \to i+1$: $b_i = (2N-i)\,\frac{i}{2N}$
“Death” $i \to i-1$: $d_i = i\,\frac{2N-i}{2N}$
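The continuous-time chain can be simulated directly. The sketch below is a standard Gillespie-style simulation under the slide's assumption that every copy is replaced at rate 1; `moran_rates` and `simulate_moran` are hypothetical helper names, and the population size is arbitrary:

```python
import random

def moran_rates(i, two_n):
    """Neutral Moran rates when every copy is replaced at rate 1 (X = number of A copies)."""
    birth = (two_n - i) * i / two_n  # an a copy dies and is replaced by an A offspring
    death = i * (two_n - i) / two_n  # an A copy dies and is replaced by an a offspring
    return birth, death

def simulate_moran(i, two_n, rng):
    """Gillespie simulation until loss or fixation; returns (final_count, elapsed_time)."""
    t = 0.0
    while 0 < i < two_n:
        b, d = moran_rates(i, two_n)
        total = b + d
        t += rng.expovariate(total)                  # exponential waiting time to the next event
        i += 1 if rng.random() < b / total else -1   # choose birth or death
    return i, t

rng = random.Random(1)
print(simulate_moran(5, 20, rng))
```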
Fixation probability
In fact, in the limit, the Moran model converges to the Wright-Fisher model; for example:
Theorem: Going backward in time, the Moran model generates the same distribution of genealogies as Wright-Fisher, only with time running twice as fast.
Theorem: In the Moran model, the probability that A becomes fixed when there are initially i copies is i/2N.
Proof: as for the Wright-Fisher model. Since $b_i = d_i$, births and deaths are equally likely, so the expected value of X never changes. Hence $i = E[X_\infty] = 2N \cdot P_i(\text{fixation})$, which gives $P_i(\text{fixation}) = i/2N$.
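A quick Monte Carlo check of the i/2N theorem. It assumes that only the embedded jump chain matters for which absorbing state is reached (holding times do not change the order of jumps), and because $b_i = d_i$ each jump is up or down with probability 1/2. The function name and the number of trials are illustrative choices:

```python
import random

def moran_fixation_probability(i, two_n, trials=20000, seed=0):
    """Monte Carlo estimate of P_i(T_2N < T_0) for the neutral Moran model,
    simulating only the embedded jump chain (up or down with probability 1/2)."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        x = i
        while 0 < x < two_n:
            x += 1 if rng.random() < 0.5 else -1
        fixed += x == two_n
    return fixed / trials

estimate = moran_fixation_probability(5, 20)
print(estimate, 5 / 20)  # Monte Carlo estimate vs. the theorem's i/2N
```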
Fixation time
$$E_i = E_i(\tau \mid T_{2N} < T_0)$$
Expected fixation time, assuming fixation.
Theorem: In the Moran model, let p = i/2N; then:
$$E_i \approx -\frac{2N(1-p)}{p}\log(1-p)$$
Proof: not given here.
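Evaluating the formula takes one line. This sketch (the helper name is hypothetical; time is in the slide's units, where each copy is replaced at rate 1) shows that for a single new copy (p = 1/2N) the expected conditional fixation time is close to 2N, and that at p = 1/2 it equals 2N ln 2:

```python
import math

def conditional_fixation_time(i, two_n):
    """E_i(tau | T_2N < T_0) ≈ -2N (1 - p) log(1 - p) / p with p = i / 2N, for 0 < i < 2N."""
    p = i / two_n
    return -two_n * (1 - p) * math.log(1 - p) / p

two_n = 2000
single_copy = conditional_fixation_time(1, two_n)   # close to 2N as p -> 0
half = conditional_fixation_time(1000, two_n)       # equals 2N ln 2 at p = 1/2
print(single_copy, half)
```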
Selection
Fitness: the relative reproductive success of an individual (or genome)
Fitness is only defined with respect to the current population.
Fitness is unlikely to remain constant in all conditions and environments
Sampling probability is multiplied by a selection factor 1+s.
Mutations can change fitness.
A deleterious mutation decreases fitness; it is therefore selected against. This process is called negative or purifying selection.
An advantageous or beneficial mutation increases fitness; it is therefore subject to positive selection.
A neutral mutation is one that does not change fitness.
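To make the “multiply the sampling probability by 1+s” idea concrete, here is a sketch of a haploid Wright-Fisher population of 2N copies with selection. The renormalisation p(1+s)/(p(1+s)+q), the parameter values, and the helper names are assumptions of this sketch rather than the lecture's own code; the run only estimates how often a single new A copy fixes:

```python
import random

def selected_sampling_probability(p, s):
    """Probability of drawing an A copy when A's sampling weight is multiplied by 1+s."""
    return p * (1 + s) / (p * (1 + s) + (1 - p))

def wright_fisher_with_selection(i, two_n, s, rng):
    """Run one haploid Wright-Fisher line of 2N copies until loss (0) or fixation (2N)."""
    while 0 < i < two_n:
        q = selected_sampling_probability(i / two_n, s)
        i = sum(rng.random() < q for _ in range(two_n))  # Binomial(2N, q) draw
    return i

rng = random.Random(42)
runs = [wright_fisher_with_selection(1, 50, 0.05, rng) for _ in range(1000)]
fraction_fixed = sum(r == 50 for r in runs) / len(runs)
print(fraction_fixed)  # estimated fixation probability of a single new A copy
```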
Adaptive evolution in a tumor model
Selection
Human fibroblasts + telomerase
Passaged in the lab for many months
Spontaneously increasing growth rate
V. Rotter
Selection in haploids: infinite populations, discrete generations
This is a common situation:
•Bacteria gaining antibiotic resistance
•Yeast evolving to adapt to a new environment
•Tumor cells taking over a tissue
From generation t-1 to generation t:
Allele   Frequency   Relative fitness   Gamete frequency after selection
A        p_{t-1}     w                  p_t = p_{t-1}w / (p_{t-1}w + q_{t-1})
B        q_{t-1}     1                  q_t = q_{t-1} / (p_{t-1}w + q_{t-1})
Ratio as a function of time:
$$\frac{p_t}{q_t} = w^t\,\frac{p_0}{q_0}$$
Fitness represents the relative growth rate of the strain carrying allele A.
It is common to write w = 1+s, where s is the selection coefficient.
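The recursion and the geometric growth of the ratio p_t/q_t can be checked by direct iteration. A minimal sketch (hypothetical helper name, arbitrary values w = 1.2 and p_0 = 0.01):

```python
def haploid_selection(p0, w, generations):
    """Iterate p_t = p_{t-1} w / (p_{t-1} w + q_{t-1}) for an infinite haploid population."""
    p = p0
    freqs = [p]
    for _ in range(generations):
        p = p * w / (p * w + (1 - p))
        freqs.append(p)
    return freqs

w, p0, t = 1.2, 0.01, 30
freqs = haploid_selection(p0, w, t)
odds = freqs[t] / (1 - freqs[t])
print(odds, w**t * p0 / (1 - p0))  # the two numbers agree: p_t/q_t = w^t p_0/q_0
```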
Selection in haploid populations: dynamics
[Figure: population size vs. generation for two strains with per-generation growth factors 1.5 and 1.2]
We can model it in continuous time:
$$\frac{dA}{dt} = aA(t), \qquad \frac{dB}{dt} = bB(t)$$
In an infinite population, we can just consider the ratios:
$$\frac{A(t)}{B(t)} = \frac{A(0)}{B(0)}\,e^{(a-b)t}$$
[Figure: ratio A/B vs. generation]
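A small numerical check of the continuous-time model: integrating the two growth equations with forward Euler reproduces A(0)/B(0)·e^{(a-b)t}. Choosing a = ln 1.5 and b = ln 1.2, to match per-generation growth factors 1.5 and 1.2, is an assumption of this sketch; with that choice e^{(a-b)t} equals (1.5/1.2)^t. The helper name is hypothetical:

```python
import math

def euler_ratio(a, b, a0, b0, t_end, dt=1e-4):
    """Integrate A' = aA and B' = bB with forward Euler; return A(t_end)/B(t_end)."""
    A, B = a0, b0
    for _ in range(round(t_end / dt)):
        A += a * A * dt
        B += b * B * dt
    return A / B

a, b = math.log(1.5), math.log(1.2)          # continuous rates for growth factors 1.5 and 1.2
numeric = euler_ratio(a, b, 1.0, 1.0, 10.0)
closed_form = math.exp((a - b) * 10.0)       # A(0)/B(0) * e^{(a-b)t} with A(0) = B(0)
print(numeric, closed_form, (1.5 / 1.2) ** 10)
```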
Computing w
$$\frac{A(t)}{B(t)} = \frac{A(0)}{B(0)}\,w^t$$
$$\log\frac{A(t)}{B(t)} - \log\frac{A(0)}{B(0)} = (a-b)t = t\log w$$
Example (Hartl & Dykhuizen 81):
E. coli with two gnd alleles. One allele is beneficial for growth on gluconate.
A population of E. coli was tracked for 35 generations while evolving on two media; the observed allele frequencies were:
Gluconate: 0.455 → 0.898
Ribose: 0.594 → 0.587
For gluconate:
log(0.898/0.102) - log(0.455/0.545) = 35 log w
log10(w) ≈ 0.0292, w ≈ 1.0696
Compare to w ≈ 0.999 on ribose.
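The same calculation in code, using the frequencies quoted above (the helper name is hypothetical). The natural logarithm is used here, so log w ≈ 0.0673; the base of the logarithm does not change w itself:

```python
import math

def estimate_w(p_start, p_end, generations):
    """Solve log(p_t/q_t) - log(p_0/q_0) = t log w for w (natural logarithm)."""
    delta = math.log(p_end / (1 - p_end)) - math.log(p_start / (1 - p_start))
    return math.exp(delta / generations)

w_gluconate = estimate_w(0.455, 0.898, 35)
w_ribose = estimate_w(0.594, 0.587, 35)
print(round(w_gluconate, 4), round(w_ribose, 4))  # 1.0696 0.9992
```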
Fixation probability: selection in the Moran model
When the population is finite, we should consider the effect of selection more carefully.
[Figure: birth-death chain on the states 0 (loss), 1, ..., i-1, i, i+1, ..., 2N-1, 2N (fixation)]
Rates:
“Birth” $i \to i+1$: $b_i = (2N-i)\,\frac{i}{2N}$
“Death” $i \to i-1$: $d_i = i\,\frac{2N-i}{2N}\,(1-s)$
The model assumes that fitness is the probability that the offspring is viable; if it is not viable, no replacement takes place (here an a offspring is viable with probability 1-s).
Theorem: In the Moran model, with selection s > 0:
$$P_i(T_{2N} < T_0) = \frac{1-(1-s)^i}{1-(1-s)^{2N}}$$
Fixation probability: selection in the Moran model
Theorem: In the Moran model, with selection s > 0 (rates $b_i = (2N-i)\frac{i}{2N}$, $d_i = i\frac{2N-i}{2N}(1-s)$):
$$P_i(T_{2N} < T_0) = \frac{1-(1-s)^i}{1-(1-s)^{2N}}$$
Note: for i = 1 and 2Ns ≫ 1, $P_1(T_{2N} < T_0) \approx s$.
Note: for s ≪ 1, $1-s \approx e^{-s}$, so
$$P_i(T_{2N} < T_0) \approx \frac{1-e^{-is}}{1-e^{-2Ns}}$$
Variant (Kimura 62): The probability of fixation in the Wright-Fisher model with selection is:
$$P_{2Np}(T_{2N} < T_0) = \frac{1-e^{-4Nsp}}{1-e^{-4Ns}}$$
Reminder: we should be using the effective population size N_e.
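The two formulas can be compared numerically. This sketch (hypothetical helper names, N = 1000 and s = 0.01 chosen arbitrarily) uses `math.expm1` for numerical stability. For a single new copy the Moran result is close to s, whereas Kimura's Wright-Fisher formula with p = 1/2N gives about 2s:

```python
import math

def moran_fixation(i, two_n, s):
    """P_i(T_2N < T_0) = (1 - (1-s)^i) / (1 - (1-s)^(2N)) in the Moran model with selection."""
    if s == 0:
        return i / two_n  # neutral limit
    return (1 - (1 - s) ** i) / (1 - (1 - s) ** two_n)

def kimura_fixation(p, n, s):
    """Kimura (1962): (1 - e^{-4Nsp}) / (1 - e^{-4Ns}) for Wright-Fisher with selection."""
    if s == 0:
        return p  # neutral limit
    return math.expm1(-4 * n * s * p) / math.expm1(-4 * n * s)

N, s = 1000, 0.01
p_moran = moran_fixation(1, 2 * N, s)          # close to s when 2Ns >> 1
p_kimura = kimura_fixation(1 / (2 * N), N, s)  # close to 2s for a single new copy
print(p_moran, p_kimura)
```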
Fixation probability: selection in the Moran model
Theorem: In the Moran model, with selection s > 0:
$$P_i(T_{2N} < T_0) = \frac{1-(1-s)^i}{1-(1-s)^{2N}}, \qquad b_i = (2N-i)\,\frac{i}{2N}, \quad d_i = i\,\frac{2N-i}{2N}\,(1-s)$$
Proof: First define the hitting time $T_y = \min\{t : X_t = y\}$ and the fixation probability given i initial A's, $h(i) = P_i(T_{2N} < T_0)$.
The rate of births is b_i and of deaths is d_i, so the probability that a birth occurs before a death is b_i/(b_i+d_i). Therefore:
$$h(i) = \frac{b_i}{b_i+d_i}\,h(i+1) + \frac{d_i}{b_i+d_i}\,h(i-1)$$
Multiplying by b_i + d_i and rearranging:
$$h(i+1) - h(i) = \frac{d_i}{b_i}\bigl(h(i) - h(i-1)\bigr) = (1-s)\bigl(h(i) - h(i-1)\bigr)$$
With h(0) = 0 this gives $h(i+1) - h(i) = h(1)(1-s)^i$, so, writing c = h(1):
$$h(j) = \sum_{i=0}^{j-1} c(1-s)^i = c\,\frac{1-(1-s)^j}{s}$$
Finally, $h(2N) = 1$ gives $c = \frac{s}{1-(1-s)^{2N}}$, which yields the theorem.
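The recursion in the proof can also be checked by simulating the embedded jump chain, in which each step is a birth with probability b_i/(b_i+d_i) = 1/(1+(1-s)) = 1/(2-s). A sketch with arbitrary parameters (the function name and trial count are illustrative):

```python
import random

def moran_selection_mc(i, two_n, s, trials=10000, seed=1):
    """Estimate h(i) = P_i(T_2N < T_0) by simulating the embedded jump chain,
    where a birth (i -> i+1) precedes a death with probability 1 / (2 - s)."""
    rng = random.Random(seed)
    up = 1 / (2 - s)
    fixed = 0
    for _ in range(trials):
        x = i
        while 0 < x < two_n:
            x += 1 if rng.random() < up else -1
        fixed += x == two_n
    return fixed / trials

i, two_n, s = 2, 20, 0.05
exact = (1 - (1 - s) ** i) / (1 - (1 - s) ** two_n)
estimate = moran_selection_mc(i, two_n, s)
print(estimate, exact)
```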
Fixation probabilities and population size
$$P_{2Np}(T_{2N} < T_0) = \frac{1-e^{-4Nsp}}{1-e^{-4Ns}} \approx \frac{2s}{1-e^{-4Ns}} \quad \text{for a single new copy, } p = 1/2N$$
[Figure: fixation probability of a new mutation vs. selection coefficient s (-0.005 to 0.009) for Ne = 100, 1000, 10000, 100000; one panel on a linear scale (0 to 0.02), one on a log scale (down to 1E-40)]
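Curves like those in the figure can be recomputed from the formula for a single new copy, p = 1/2N_e. For deleterious alleles in large populations e^{-4N_e s} overflows a double, so this sketch (an implementation choice, with a hypothetical function name) returns log10 of the probability and works in log space when s < 0:

```python
import math

def log10_fixation(n_e, s):
    """log10 of (1 - e^{-2s}) / (1 - e^{-4 N_e s}), Kimura's formula with p = 1/(2 N_e)."""
    if s == 0:
        return -math.log10(2 * n_e)  # neutral case: 1 / (2 N_e)
    if s > 0:
        return math.log10(math.expm1(-2 * s) / math.expm1(-4 * n_e * s))
    # s < 0: numerator and denominator are both positive, and e^x may overflow
    x = -4 * n_e * s
    log_num = math.log(math.expm1(-2 * s))
    log_den = x + math.log1p(-math.exp(-x))  # log(e^x - 1) without computing e^x
    return (log_num - log_den) / math.log(10)

for n_e in (100, 1000, 10000, 100000):
    row = [log10_fixation(n_e, s) for s in (-0.005, -0.001, 0.001, 0.005)]
    print(n_e, [round(v, 2) for v in row])
```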
Selection and fixation
Recall that the fixation time for a neutral mutation (assuming fixation occurred) equals the coalescence time:
t ≈ 4N
Theorem: In the Moran model:
$$E_1(\tau \mid T_{2N} < T_0) \approx \frac{2}{s}\log N$$
Theorem (Kimura): $t \approx (2/s)\ln(2N)$
(As said: twice slower)
Fixation process:
1. Allele is rare: the number of A's is a supercritical branching process; time ≈ (1/s) log 2N
2. Allele at intermediate frequency (0 ≪ p ≪ 1): logistic differential equation, essentially deterministic; time ≈ log log 2N
3. Allele close to fixation: the number of a's is a subcritical branching process; time ≈ (1/s) log 2N
[Figure with regions labeled “Selection” and “Drift”]
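As a rough cross-check of the (2/s) ln 2N scale, this sketch ignores drift entirely (an assumption, unlike the branching-process phases above) and integrates the deterministic logistic equation dp/dt = s p(1-p) from one copy, p = 1/2N, to one copy short of fixation. The exact logistic solution takes (1/s)[logit(1-1/2N) - logit(1/2N)] = (2/s) ln(2N-1). The helper name and parameters are hypothetical:

```python
import math

def logistic_sweep_time(two_n, s, dt=0.01):
    """Forward-Euler integration of dp/dt = s p (1 - p) from p = 1/2N to p = 1 - 1/2N."""
    p, t = 1 / two_n, 0.0
    while p < 1 - 1 / two_n:
        p += s * p * (1 - p) * dt
        t += dt
    return t

two_n, s = 2000, 0.01
numeric = logistic_sweep_time(two_n, s)
analytic = (2 / s) * math.log(two_n - 1)  # logit(1 - 1/2N) - logit(1/2N) = 2 ln(2N - 1)
print(numeric, analytic, (2 / s) * math.log(two_n))
```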
Selection in diploids
Assume:
Genotype:   AA     Aa     aa
Fitness:    w11    w12    w22
Frequency:  p²     2pq    q²    (Hardy-Weinberg!)
There are different alternatives for the interaction between alleles:
a is completely dominant (one a is enough): f(Aa) = f(aa)
a is completely recessive: f(Aa) = f(AA)
codominance: f(AA) = 1, f(Aa) = 1+s, f(aa) = 1+2s
overdominance: f(Aa) > f(AA), f(aa)
The simple (linear) cases are not qualitatively different from the haploid scenario.
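One generation of diploid selection under Hardy-Weinberg proportions follows the standard recursion p' = (p²w11 + pq·w12)/w̄, where w̄ = p²w11 + 2pq·w12 + q²w22 is the mean fitness. A sketch with the codominance fitnesses from the list (s = 0.05 is arbitrary; the helper name is hypothetical) shows the favoured allele a rising steadily, much as in the haploid case:

```python
def diploid_step(p, w11, w12, w22):
    """One generation of selection on Hardy-Weinberg genotypes; p is the frequency of A."""
    q = 1 - p
    mean_fitness = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return (p * p * w11 + p * q * w12) / mean_fitness

s = 0.05
p = 0.99                 # frequency of A; allele a starts rare
q_traj = [1 - p]
for _ in range(500):
    p = diploid_step(p, 1.0, 1 + s, 1 + 2 * s)  # codominance: f(AA)=1, f(Aa)=1+s, f(aa)=1+2s
    q_traj.append(1 - p)
print(q_traj[0], q_traj[100], q_traj[-1])
```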
Mutation-Selection balance
When an allele is weakly deleterious, mutations can play a major role in driving allele frequencies.
Genotype:        AA     Aa       aa
Fitness:         1      1-hs     1-s
Frequency (HW):  p²     2pq      q²
Mutation A → a occurs at rate μ; back mutation a → A is ignored (q ≪ 1).
New allele frequency, without mutation:
$$p' = \frac{p^2 w_{11} + pq\,w_{12}}{p^2 w_{11} + 2pq\,w_{12} + q^2 w_{22}}$$
New allele frequency, assuming mutation:
$$p' = \frac{p^2 + pq(1-hs)}{p^2 + 2pq(1-hs) + q^2(1-s)}\,(1-\mu)$$
What is the equilibrium frequency of the deleterious allele?
$$h = 0:\ \hat q \approx \sqrt{\mu/s}, \qquad h > 0:\ \hat q \approx \frac{\mu}{hs}$$
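The equilibrium approximations can be checked by iterating the recursion with mutation until it settles. A sketch with arbitrary parameters (μ = 10^-5, s = 0.1, hypothetical helper name), starting from a population with no copies of a:

```python
import math

def mutation_selection_equilibrium(s, h, mu, generations=50000):
    """Iterate selection (fitnesses 1, 1-hs, 1-s for AA, Aa, aa) followed by
    A -> a mutation at rate mu; returns the frequency q of the deleterious allele a."""
    p = 1.0  # start with no copies of a
    for _ in range(generations):
        q = 1 - p
        w_bar = p * p + 2 * p * q * (1 - h * s) + q * q * (1 - s)
        p_sel = (p * p + p * q * (1 - h * s)) / w_bar
        p = p_sel * (1 - mu)  # back mutation a -> A is ignored
    return 1 - p

mu, s = 1e-5, 0.1
q_recessive = mutation_selection_equilibrium(s, 0.0, mu)
q_partial = mutation_selection_equilibrium(s, 0.5, mu)
print(q_recessive, math.sqrt(mu / s))  # h = 0: close to sqrt(mu/s)
print(q_partial, mu / (0.5 * s))       # h > 0: close to mu/(hs)
```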
Mutation-Selection balance: Huntington disease
A neurological genetic disease appearing after age 35.
It results from a dominant mutation: how does this disease survive in the human population?
Although it may be fatal, the fitness is not very low due to the late age of onset (estimated w12 = 0.81).
Human population: 70 per million (Europe) to 1 per million (Africa).
Here h > 0, so we can estimate the mutation rate at the Huntington locus as $\mu \approx hs\,\hat q = 10^{-6}(1-0.81) = 1.9\times10^{-7}$ to $70\times10^{-6}(1-0.81) \approx 1.3\times10^{-5}$.
$$h = 0:\ \hat q \approx \sqrt{\mu/s}, \qquad h > 0:\ \hat q \approx \frac{\mu}{hs}$$
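The slide's arithmetic as a short script, following the slide in treating the quoted population frequencies as q̂ and using hs = 1 - w12 (the variable names are illustrative):

```python
hs = 1 - 0.81  # heterozygote fitness reduction, from w12 = 0.81
# Frequencies quoted on the slide, treated as the equilibrium frequency q_hat
q_hat = {"Africa": 1e-6, "Europe": 70e-6}
mu_estimate = {region: hs * q for region, q in q_hat.items()}  # mu ≈ hs * q_hat
for region, mu in mu_estimate.items():
    print(region, f"{mu:.2g}")  # Africa 1.9e-07, Europe 1.3e-05
```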
Mutation-Selection balance: Haldane-Muller
The average fitness of the population, given recurrent mutations at rate μ at a locus with negative fitness effect s:
Assuming perfect recessivity (h = 0), $\hat q \approx \sqrt{\mu/s}$:
$$\bar w = 1 - \hat q^2 s \approx 1 - \mu$$
Assuming partial dominance (h > 0), $\hat q \approx \mu/(hs)$:
$$\bar w = 1 - 2\hat p\hat q\,hs - \hat q^2 s \approx 1 - 2\left(1-\frac{\mu}{hs}\right)\frac{\mu}{hs}\,hs - \left(\frac{\mu}{hs}\right)^2 s \approx 1 - 2\mu$$
The Haldane-Muller principle: the effect of mutation on the average population fitness depends only on the mutation rate, not on the fitness of the alleles!!
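The Haldane-Muller principle can be seen numerically: run the mutation-selection recursion to equilibrium and divide the load 1 - w̄ by μ. A sketch with arbitrary parameter values (hypothetical helper name); the ratio should come out near 1 for h = 0 and near 2 for h > 0, whatever s is:

```python
def equilibrium_load(s, h, mu, generations=50000):
    """Iterate selection (fitnesses 1, 1-hs, 1-s) plus A -> a mutation at rate mu,
    starting with no copies of a, and return the equilibrium load 1 - w_bar."""
    def mean_fitness(p):
        q = 1 - p
        return p * p + 2 * p * q * (1 - h * s) + q * q * (1 - s)

    p = 1.0
    for _ in range(generations):
        q = 1 - p
        p = (p * p + p * q * (1 - h * s)) / mean_fitness(p) * (1 - mu)
    return 1 - mean_fitness(p)

mu = 1e-5
ratios = {(h, s): equilibrium_load(s, h, mu) / mu for h in (0.0, 0.5) for s in (0.1, 0.5)}
print(ratios)  # roughly 1 when h = 0 and roughly 2 when h = 0.5, regardless of s
```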
Overdominance
A SNP affecting the beta-globin gene makes the encoded protein defective. The resulting red blood cells are curved and elongated (sickled), and are removed from the circulation.
Individuals homozygous for the mutation usually die from anemia without intensive care.
Heterozygous individuals have mild anemia, but cope better with the malaria parasite Plasmodium falciparum (maybe because infected red cells become sickled).
[Figure: (historical) malaria distribution vs. sickle-cell anemia distribution; source: Wikipedia]
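For overdominance the diploid recursion has a stable interior equilibrium, the standard result p̂ = (w12 - w22)/(2w12 - w11 - w22). The fitness values below are hypothetical, chosen only to mimic the qualitative sickle-cell pattern (heterozygote fittest, sickle homozygote strongly selected against); they are not estimates from data:

```python
def diploid_step(p, w11, w12, w22):
    """One generation of selection on Hardy-Weinberg genotypes; p is the frequency of A."""
    q = 1 - p
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return (p * p * w11 + p * q * w12) / w_bar

# Hypothetical fitnesses for illustration only (A = normal allele, S = sickle allele)
w_AA, w_AS, w_SS = 0.9, 1.0, 0.2
p = 0.5
for _ in range(1000):
    p = diploid_step(p, w_AA, w_AS, w_SS)
p_hat = (w_AS - w_SS) / (2 * w_AS - w_AA - w_SS)  # overdominance equilibrium
print(1 - p, 1 - p_hat)  # equilibrium frequency of S from iteration and from the formula
```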
Other types of selection
Different fitness for different individuals, e.g., males vs. females.
For example, male genes that take up female resources in mammals.
This was suggested to lead to the phenomenon of imprinting, where cells express only the maternal or only the paternal allele.
Imprinted genes behave much like haploids.
Other types of selection
Frequency- and density-dependent selection: when fitness depends on the frequency of the allele or on the population size.
Fecundity selection: different reproductive potential for different mating pairs.
Effects of a heterogeneous environment.
Effects that apply directly to the haplotype: gametic selection/meiotic drive (e.g., killing the reproductive potential of the homologous chromosome).
Sexual selection: males advertising their reproductive potential, or confronting other males.
Kin selection (“origin of altruism”).
Recombination and selection
Linkage and selection
Linkage interferes with the purging of deleterious mutations and reduces the efficiency of positive selection!
[Figure: linked haplotypes carrying alleles labeled “Beneficial” and “Weakly deleterious”]
Selective sweep, or hitchhiking effect, or genetic draft (Gillespie)
Hill-Robertson effect
Linkage and selection
The variance in allele frequency is used to define the effective population size:
$$V(p) = \frac{p(1-p)}{2N_e}$$
Simplistically, assume a neutral locus is evolving such that selective sweeps at a fully linked locus occur at rate λ. A sweep fixes the neutral allele with probability p (and eliminates it otherwise), and we further assume that the sweep happens instantly:
$$V(p) \approx p(1-p)\left(\frac{1}{2N_e} + \lambda\right) = \frac{p(1-p)}{2N_l}, \qquad N_l = \frac{N_e}{1 + 2N_e\lambda}$$
This is very rough, but it demonstrates the basic intuition: sweeps reduce the efficacy of selection at linked loci, in a way that can be quantified as a reduction in the effective population size:
$$N_l = \frac{N_e}{1 + 2N_e\lambda C}$$
C: the average frequency of the neutral allele after the sweep.
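The toy sweep model can be checked exactly rather than by simulation. Writing λ for the per-generation sweep rate (a symbol chosen for this sketch), the one-generation variance is a mixture of the sweep variance p(1-p) (with probability λ) and the drift variance p(1-p)/(2N_e), and both components leave the mean at p. The sketch compares this with p(1-p)/(2N_l); the parameter values and helper names are hypothetical, and c = 1 recovers the fully linked formula:

```python
def reduced_ne(n_e, lam, c=1.0):
    """N_l = N_e / (1 + 2 N_e lam C); c = 1 gives the fully linked formula N_e / (1 + 2 N_e lam)."""
    return n_e / (1 + 2 * n_e * lam * c)

def one_generation_variance(p, n_e, lam):
    """Exact one-generation variance of p in the toy model: with probability lam an
    instantaneous sweep fixes (prob p) or loses the neutral allele, otherwise binomial drift."""
    sweep_var = p * (1 - p)
    drift_var = p * (1 - p) / (2 * n_e)
    return lam * sweep_var + (1 - lam) * drift_var

p, n_e, lam = 0.5, 1000, 1e-3
exact = one_generation_variance(p, n_e, lam)
approx = p * (1 - p) / (2 * reduced_ne(n_e, lam))
print(reduced_ne(n_e, lam), exact, approx)
```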
Don’t let it confuse you…
Purifying = Negative selection: forces that drive genomic conservation
Neutrality: background
Directed = Adaptive = Positive selection: forces that drive genome change