Distribution of Mutations
Biostatistics 666
Lecture 5
Last Lecture:
Introduction to the Coalescent
• Coalescent approach
  • Proceed backwards through time.
  • Genealogy of a sample of sequences.
• Infinite sites model
  • All mutations distinguishable.
  • No reverse mutation.
Some key ideas …
• Probability of coalescence events
• Length of genealogy and its branches
• Expected number of mutations
• Parameter θ which combines population size and mutation rate
Building Blocks…
• Probability of sampling distinct ancestors for n sequences
  $$P(n) = \prod_{i=1}^{n-1} \left(1 - \frac{i}{N}\right) \approx 1 - \frac{\binom{n}{2}}{N}$$
• Coalescence time t is approximately exponentially distributed
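A quick numerical check of the approximation above may help. The sketch below compares the exact product with 1 − C(n,2)/N; the function names and the example value N = 10,000 are illustrative choices, not part of the lecture.

```python
from math import comb


def prob_distinct_exact(n: int, N: int) -> float:
    """Exact product: probability that n sequences have n distinct ancestors."""
    p = 1.0
    for i in range(1, n):
        p *= 1.0 - i / N
    return p


def prob_distinct_approx(n: int, N: int) -> float:
    """First-order approximation 1 - C(n, 2) / N, valid when n is much smaller than N."""
    return 1.0 - comb(n, 2) / N


# Illustrative values only: N = 10,000 is an assumed population size.
for n in (2, 4, 10):
    print(n, prob_distinct_exact(n, 10_000), prob_distinct_approx(n, 10_000))
```

The two values agree closely as long as n is small relative to N, which is the regime in which the exponential approximation for coalescence times is used.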
Some Key Results…
• Coalescence Time (population size units)
  $$E(T_j) = 1 \Big/ \binom{j}{2}$$
• Total Length (population size units)
  $$E(T_{tot}) = \sum_{i=1}^{n-1} \frac{2}{i}$$
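The total length follows from the coalescence times: while j lineages remain, each of them contributes T_j to the genealogy, so E(T_tot) is the sum of j · E(T_j) for j = 2, …, n. A minimal sketch of that step, with hypothetical function names:

```python
from math import comb


def expected_coalescence_time(j: int) -> float:
    """E(T_j) = 1 / C(j, 2), in population size units."""
    return 1.0 / comb(j, 2)


def expected_total_length(n: int) -> float:
    """Sum j * E(T_j) over the intervals with j = 2..n lineages."""
    return sum(j * expected_coalescence_time(j) for j in range(2, n + 1))


def expected_total_length_closed_form(n: int) -> float:
    """Closed form from the slide: sum of 2 / i for i = 1..n-1."""
    return sum(2.0 / i for i in range(1, n))


print(expected_total_length(10), expected_total_length_closed_form(10))
```

Both functions return the same value because j / C(j, 2) = 2 / (j − 1).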
Some More Key Results …
• Expected Number of Polymorphisms
For a diploid sample
$$E(S) = 4N\mu \sum_{i=1}^{n-1} \frac{1}{i} = \theta \sum_{i=1}^{n-1} \frac{1}{i}$$
For a haploid sample
$$E(S) = 2N\mu \sum_{i=1}^{n-1} \frac{1}{i} = \theta \sum_{i=1}^{n-1} \frac{1}{i}$$
Inferences about θ
• Could be estimated from S
  • Divide by expected length of genealogy
  $$\hat{\theta} = \frac{S}{\sum_{i=1}^{n-1} 1/i}$$
• Could then be used to:
  • Estimate N, if mutation rate µ is known
  • Estimate µ, if population size N is known
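The estimator and the two follow-up uses can be sketched in a few lines. The function names are hypothetical, and the conversion between θ and N or µ assumes the diploid definition θ = 4Nµ used earlier in these slides.

```python
def harmonic_sum(n: int) -> float:
    """Sum of 1 / i for i = 1..n-1, the denominator of the estimator."""
    return sum(1.0 / i for i in range(1, n))


def theta_from_segregating_sites(S: int, n: int) -> float:
    """Estimate theta from S segregating sites observed in n sequences."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return S / harmonic_sum(n)


def population_size_from_theta(theta: float, mu: float) -> float:
    """Solve theta = 4 * N * mu for N (diploid assumption)."""
    return theta / (4.0 * mu)


def mutation_rate_from_theta(theta: float, N: float) -> float:
    """Solve theta = 4 * N * mu for mu (diploid assumption)."""
    return theta / (4.0 * N)


# Made-up observation: 11 segregating sites in 4 sequences.
print(theta_from_segregating_sites(11, 4))
```

For a haploid sample the factor 4 in the last two functions would be replaced by 2.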
Alternative Estimator for θ …
• Count pairwise differences between sequences
• Compute average number of differences
  $$\tilde{\theta} = \binom{n}{2}^{-1} \sum_{i=1}^{n} \sum_{j=i+1}^{n} S_{ij}$$
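As an illustration of the pairwise estimator, the sketch below counts differences between every pair of aligned sequences and averages them. The input format (equal-length strings) and the function name are assumptions made for the example.

```python
from itertools import combinations


def theta_from_pairwise_differences(sequences: list[str]) -> float:
    """Average number of differing sites over all C(n, 2) pairs of sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    total = 0
    for a, b in combinations(sequences, 2):
        # S_ij: number of sites at which sequences i and j differ
        total += sum(x != y for x, y in zip(a, b))
    return total / (n * (n - 1) / 2)


# Toy alignment, invented for illustration.
print(theta_from_pairwise_differences(["AAAA", "AAAT", "AATT"]))
```

In the toy alignment the three pairs differ at 1, 2, and 1 sites, so the average is 4/3.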
Var(θ̂) as a function of sample size n
[Figure: Variance in Estimate of Theta (0.0 to 1.2) vs. Sample Size (2 to 50). Parameters: N = 10,000 individuals, µ = 10⁻⁴, θ = 4]
Today ...
• More applications of the coalescent
• Predicting allele frequency distributions
• The full distribution of S
  • Using simulations
  • Using analytical calculations
A Coalescent Simulation …
• Let’s consider tracing the ancestry of 4 sequences
When n = 4
Probability of Coalescent Event
$$P(4) \approx \frac{\binom{4}{2}}{2N}$$
Time to Next Coalescent Event
$$T(4) \approx \frac{2N}{\binom{4}{2}}$$
Sample time from exponential distribution
Pick two sequences at random to coalesce
Next n = 3 …
Let’s assume that sequences 3 and 4 are selected …
Then, we repeat the process for a sample of 3 sequences
[Figure: genealogy so far, with interval T(4) marked]
Next n = 2 …
Let’s assume that sequences 1 and 2 are selected to coalesce
Then, we repeat the process for a sample of 2 sequences
[Figure: genealogy so far, with intervals T(3) and T(4) marked]
The Simulated Coalescent
At this point, we could place mutations on the genealogy. Most often, these would fall on the longer branches.
[Figure: complete genealogy, with intervals T(2), T(3), and T(4) marked]
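The procedure walked through above (sample a waiting time, pick two lineages to coalesce, repeat with one fewer lineage) can be sketched as follows. This is a minimal illustration, not the lecture's own code: the function name, the event tuple layout, and the labelling of ancestral lineages are all assumptions. Times are in generations, with coalescence rate C(k,2)/2N while k lineages remain.

```python
import random
from math import comb


def simulate_coalescent(n: int, N: int, rng: random.Random) -> list[tuple]:
    """Simulate one genealogy for n sequences.

    Returns one tuple per coalescent event:
    (lineages before the event, waiting time, pair that coalesced, ancestor label).
    """
    lineages = list(range(1, n + 1))  # sampled sequences are labelled 1..n
    next_label = n + 1                # labels for ancestral lineages (assumed scheme)
    events = []
    while len(lineages) > 1:
        k = len(lineages)
        rate = comb(k, 2) / (2 * N)   # per-generation probability of coalescence
        t = rng.expovariate(rate)     # exponential waiting time, mean 2N / C(k, 2)
        a, b = rng.sample(lineages, 2)  # pick two lineages at random to coalesce
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(next_label)
        events.append((k, t, (a, b), next_label))
        next_label += 1
    return events


for event in simulate_coalescent(4, 10_000, random.Random(1)):
    print(event)
```

Each run produces n − 1 events, corresponding to T(4), T(3), and T(2) in the four-sequence example.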
A Coalescent Simulation …
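Mutations in these branches affect a pair of sequences.
[Figure: genealogy with the branches ancestral to a pair of sequences indicated; intervals T(2), T(3), and T(4) marked]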
A Coalescent Simulation …
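Mutations in these branches affect a single sequence.
[Figure: genealogy with the branches ancestral to a single sequence indicated; intervals T(2), T(3), and T(4) marked]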
Frequency Spectrum
• Repeating the simulation multiple times would give us a predicted mutation spectrum.
[Figure: Probability vs. Frequency (out of n)]
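One way to turn repeated simulations into a predicted spectrum is sketched below. Rather than dropping individual mutations, it uses the fact that, under the infinite sites model, mutations fall on branches in proportion to their length: the total branch length sitting above exactly i sampled sequences gives the expected share of variants at frequency i. This shortcut, the function name, and the use of population size units are choices made for the example.

```python
import random
from math import comb


def simulated_spectrum(n: int, replicates: int, rng: random.Random) -> list[float]:
    """Estimated proportion of variants carried by i of n sequences, for i = 1..n-1."""
    weight = [0.0] * (n + 1)  # weight[i]: branch length above exactly i sampled sequences
    for _ in range(replicates):
        sizes = [1] * n       # sampled descendants under each current lineage
        while len(sizes) > 1:
            k = len(sizes)
            t = rng.expovariate(comb(k, 2))  # waiting time, population size units
            for s in sizes:
                weight[s] += t               # every lineage persists for time t
            i, j = rng.sample(range(k), 2)   # two lineages coalesce
            merged = sizes[i] + sizes[j]
            sizes = [s for idx, s in enumerate(sizes) if idx not in (i, j)]
            sizes.append(merged)
    total = sum(weight[1:n])
    return [w / total for w in weight[1:n]]


spectrum = simulated_spectrum(10, 5000, random.Random(42))
print([round(p, 3) for p in spectrum])
```

Because only proportions are reported, the population size and mutation rate cancel out of the result.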
[Figure: Frequency Spectrum (n = 10). Proportion of All Variants vs. Frequency (out of n)]
[Figure: Frequency Spectrum (n = 100). Proportion of All Variants vs. Frequency (out of n)]
Frequency Spectrum
• Constant size population
• Exponentially growing population
• Most variants are rare
  • For n = 100, ~44% of variants occur at a frequency of 5/100 or less.
  • For n = 10, ~35% of variants are observed once.
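The percentages quoted above can be checked against the standard neutral expectation for a constant-size population, in which the expected number of variants carried by i of n sequences is proportional to 1/i. The slides do not state this formula, so treat it as an assumption brought in for the check; the function name is hypothetical.

```python
def neutral_spectrum(n: int) -> list[float]:
    """Expected proportion of variants at frequency i / n, for i = 1..n-1 (constant size)."""
    h = sum(1.0 / i for i in range(1, n))
    return [(1.0 / i) / h for i in range(1, n)]


singletons_n10 = neutral_spectrum(10)[0]         # variants observed once, n = 10
rare_n100 = sum(neutral_spectrum(100)[:5])       # variants at frequency 1..5 out of 100
print(round(singletons_n10, 3), round(rare_n100, 3))
```

Under this expectation the ~35% figure for n = 10 is the share of singletons, and the ~44% figure for n = 100 corresponds to variants carried by five or fewer sequences.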
Mutation Spectrum
• Depends on genealogy
  • Population size
  • Population growth
  • Population subdivision
• Does not depend on
  • Mutation rate!
Deviations from Neutral Spectrum
• When would you expect deviations from the spectra we described?
• What would you expect for …
  • A rapidly growing population?
  • A population whose size is decreasing?
• Why?
Effect of Polymorphism Type
Data from Cargill et al., 1999
Number of Mutations
• Can be derived from coalescent tree
• Analytical results possible
  • What are the key features?
  • Trace back in time until MRCA, tracking mutation events
Sample of Two Sequences
• Track coalescences and mutations
  • Probability of a coalescent event?
    • Depends on population size …
  • Probability of a mutation?
    • Depends on mutation rate …
• Proceed backwards until either occurs…
  • Conditional probability for each outcome?
Two Identical Sequences
$$P_2(S = 0) \approx \frac{P_{CA}}{P_{CA} + P_{mut}} = \frac{1/2N}{1/2N + 2\mu} = \frac{1}{1 + \theta}$$
Full distribution of S…
• Probability that first j events are mutations…
$$P_2(j) = \left( \frac{\theta}{1 + \theta} \right)^{j} \left( \frac{1}{1 + \theta} \right)$$
Example…
• 2 sequences
• Population size N = 25,000
• Mutation rate µ = 10⁻⁵
• Probability of 0, 1, 2, 3… mutations
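With these values θ = 4Nµ = 1, so the two-sequence formula P_2(j) = (θ/(1+θ))^j · 1/(1+θ) reduces to (1/2)^(j+1). A short sketch of the calculation, with a hypothetical function name:

```python
def prob_mutations_two_sequences(j: int, theta: float) -> float:
    """P_2(j): the first j events are mutations, followed by a coalescence."""
    return (theta / (1.0 + theta)) ** j / (1.0 + theta)


N = 25_000
mu = 1e-5
theta = 4 * N * mu  # diploid definition, theta = 4 * N * mu

for j in range(4):
    print(j, prob_mutations_two_sequences(j, theta))
```

The probabilities for 0, 1, 2, and 3 mutations are 1/2, 1/4, 1/8, and 1/16, up to floating-point rounding.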
And for multiple sequences…
• Describe number of mutations until the next coalescence event
• Proceed back in time, until:
  • One of n sequences mutates…
  • A coalescent event occurs…
  • Then track mutations in (n-1) sequences
Formulae …
$$Q_n(j) = \left( \frac{n\mu}{n\mu + \binom{n}{2} / 2N} \right)^{j} \frac{\binom{n}{2} / 2N}{n\mu + \binom{n}{2} / 2N} = \left( \frac{\theta}{\theta + n - 1} \right)^{j} \frac{n - 1}{\theta + n - 1}$$
$$P_n(j) = \sum_{i=0}^{j} P_{n-1}(j - i) \, Q_n(i)$$
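The recursion can be evaluated directly by starting from P_2 = Q_2 and convolving with Q_3, Q_4, and so on up to Q_n. The sketch below is one straightforward way to do this; the function names and the truncation at `max_j` mutations are assumptions made for the example.

```python
def q(n: int, j: int, theta: float) -> float:
    """Q_n(j): j mutations occur among n lineages before the next coalescence."""
    return (theta / (theta + n - 1)) ** j * (n - 1) / (theta + n - 1)


def mutation_distribution(n: int, theta: float, max_j: int) -> list[float]:
    """Return [P_n(0), ..., P_n(max_j)] using P_n(j) = sum_i P_{n-1}(j - i) Q_n(i)."""
    p = [q(2, j, theta) for j in range(max_j + 1)]  # base case: P_2 = Q_2
    for m in range(3, n + 1):
        p = [
            sum(p[j - i] * q(m, i, theta) for i in range(j + 1))
            for j in range(max_j + 1)
        ]
    return p


# Illustrative call: three sequences with theta = 1.
print(mutation_distribution(3, 1.0, 3))
```

As a consistency check, the mean of the resulting distribution should match the expected number of polymorphisms, θ times the sum of 1/i for i = 1, …, n − 1.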
Example…
• 3 sequences
• Population size N = 25,000
• Mutation rate µ = 10⁻⁵
• Probability of 0, 1, 2, 3… mutations
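The first few probabilities can be worked out by hand as a check on the recursion. With θ = 4Nµ = 1, the formulae give P_2(j) = (1/2)^{j+1} and Q_3(i) = (1/3)^i (2/3), so:

```latex
\begin{aligned}
P_3(0) &= P_2(0)\,Q_3(0) = \tfrac{1}{2}\cdot\tfrac{2}{3} = \tfrac{1}{3} \\
P_3(1) &= P_2(1)\,Q_3(0) + P_2(0)\,Q_3(1)
        = \tfrac{1}{4}\cdot\tfrac{2}{3} + \tfrac{1}{2}\cdot\tfrac{2}{9}
        = \tfrac{5}{18} \approx 0.278 \\
P_3(2) &= P_2(2)\,Q_3(0) + P_2(1)\,Q_3(1) + P_2(0)\,Q_3(2)
        = \tfrac{1}{12} + \tfrac{1}{18} + \tfrac{1}{27}
        = \tfrac{19}{108} \approx 0.176
\end{aligned}
```

Compared with the two-sequence case, where P_2(0) = 1/2, a sample of three sequences is less likely to show no mutations at all, since the genealogy is longer.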
Number of Mutations
[Figure: Probability (0.00 to 0.20) vs. Number of Mutations (0 to 10+). Parameters: N = 10,000, n = 10, µ = 2 × 10⁻⁵; approximately 2 kb of sequence, sequenced in 10 individuals]
So far …
• One homogeneous population
  • Coalescence times
  • Number of mutations
  • Expectation
  • Distribution
  • Spectrum of mutations
• Several assumptions, including …
  • No recombination
Recombination …
• No recombination
  • Single genealogy
• Free recombination
  • Two independent genealogies
  • Same population history
• Intermediate case
  • Correlated genealogies
Recommended Reading
Richard R. Hudson (1990)
Gene genealogies and the coalescent process
Oxford Surveys in Evolutionary Biology, Vol. 7.
D. Futuyma and J. Antonovics (Eds).
Oxford University Press, New York.