Calculation of numbers of synonymous and
non-synonymous substitutions per site using the method of
Nei & Gojobori (1986).
Show that syn and non-syn sites evolve at different rates.
Need to calculate:
S = no. syn sites
N = no. non-syn sites
Sd = no. syn differences
Nd = no. non-syn differences
Now define:
DS = Sd/S (fraction of syn sites that differ)
DN = Nd/N (fraction of non-syn sites that differ)
These are equivalent to D in the Jukes-Cantor model.
We can use the JC distance formula to calculate two
evolutionary distances.
dS = -3/4 ln(1 - 4DS/3)   (no. of syn subs per syn site)
dN = -3/4 ln(1 - 4DN/3)   (no. of non-syn subs per non-syn site)
These are equivalent to the usual Jukes-Cantor d, which is
the number of substitutions per site if all sites are
equivalent.
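As a reminder of where the distance formula comes from, here is the standard Jukes-Cantor relation between the observed fraction of differing sites D and the number of substitutions per site d. This is a textbook result written out for convenience, not something taken from the slide itself; D stands for either DS or DN.

```latex
% Expected fraction of differing sites after d substitutions per site (Jukes-Cantor)
D = \frac{3}{4}\left(1 - e^{-4d/3}\right)

% Solving for d gives the distance formula used above
d = -\frac{3}{4}\,\ln\!\left(1 - \frac{4D}{3}\right)

% Limits: d \approx D when D is small, and D \to 3/4 as d \to \infty,
% so the formula is only defined for D < 3/4.
```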
For most pairs of homologous protein-coding sequences, we expect dS > dN
because purifying selection slows down the rate of non-syn subs.
If we know the time t since two species diverged, we can
calculate the rates of syn and non-syn subs:
dS/2t and dN/2t.
If t is measured in millions of years, these rates are numbers of subs
per site per million years.
If we don’t know t, we can still compare the two distances.
The ratio dN/dS tells us how much slower the non-syn subs
are.
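A minimal sketch of the rate calculation. The distances and the divergence time below are made-up numbers chosen only for illustration, and `substitution_rate` is a hypothetical helper name. The factor of 2 is there because substitutions accumulate along both lineages since the split.

```python
def substitution_rate(d, t):
    """Substitutions per site per unit time.

    d is the distance between the two sequences (subs per site) and t is
    the time since divergence; the two lineages together span 2t.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    return d / (2 * t)

# Hypothetical inputs, for illustration only.
dS, dN = 0.50, 0.10   # distances in subs per site (assumed values)
t = 25.0              # divergence time in millions of years (assumed value)

rate_S = substitution_rate(dS, t)   # syn subs per syn site per million years
rate_N = substitution_rate(dN, t)   # non-syn subs per non-syn site per million years
ratio = dN / dS                     # does not depend on t

print(rate_S, rate_N, ratio)
```

Because t cancels, the ratio dN/dS equals the ratio of the two rates, which is why it can be used when t is unknown.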
Notation:
d is sometimes called K
dS is sometimes called KS
dN is sometimes called KA (where the A means amino acid
subs)
dN/dS is the same thing as KA/KS
Codon   Seq 1        Seq 2
1       CCC (Pro)    CCC (Pro)
2       UUU (Phe)    UUC (Phe)
3       GGG (Gly)    GCG (Ala)
4       UUA (Leu)    CUA (Leu)
5       UUU (Phe)    GUA (Val)
Calculate S for each codon.
Check the genetic code. For each site, S is the fraction of the three
possible base changes that are synonymous.
A fourfold degenerate site counts as S = 1 (N = 0)
A non-degenerate site counts as S = 0 (N = 1)
A twofold degenerate site counts as S = 1/3 (N = 2/3)
1. S = 0 + 0 + 1 = 1
2. S = 0 + 0 + 1/3 = 1/3
3. S = 0 + 0 + 1 = 1 (whether we look at Gly or Ala codons)
4. for UUA, S = 1/3 + 0 + 1/3 = 2/3
for CUA, S = 1/3 + 0 + 1 = 4/3
Take the average of these: S = 1 for codon 4.
5. for UUU, S = 1/3
for GUA, S = 1
Take average: S = 2/3
For whole sequence, S = 1 + 1/3 + 1 + 1 + 2/3 = 4
N = total number of sites - S = 15 - 4 = 11
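The site counting above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a full Nei & Gojobori implementation: the names `CODE` and `syn_sites` are made up for this sketch, a change to a stop codon is simply counted as non-synonymous, and codon 3 of Seq 2 is entered as GCG, the Ala codon that differs from GGG at one position.

```python
from fractions import Fraction

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def syn_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3)."""
    s = Fraction(0)
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            # Each of the 3 possible changes at a site contributes 1/3.
            # Simplification: a change to a stop codon counts as non-synonymous.
            if CODE[mutant] == CODE[codon]:
                s += Fraction(1, 3)
    return s

seq1 = ["CCC", "UUU", "GGG", "UUA", "UUU"]
seq2 = ["CCC", "UUC", "GCG", "CUA", "GUA"]  # codon 3 assumed to be GCG (Ala)

# Average the two sequences codon by codon, then sum over the sequence.
per_codon = [(syn_sites(a) + syn_sites(b)) / 2 for a, b in zip(seq1, seq2)]
S = sum(per_codon)
N = 3 * len(seq1) - S

print([str(x) for x in per_codon], S, N)
```

Exact fractions are used so that thirds add up without rounding error.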
Calculate Sd and Nd for each codon.
1. Sd = 0, Nd = 0
2. Sd = 1, Nd = 0
3. Sd = 0, Nd = 1
4. Sd = 1, Nd = 0
5. UUU (Phe) vs GUA (Val): two positions differ, so this could happen two ways
   route 1: UUU --> GUU --> GUA   Sd = 1, Nd = 1
   route 2: UUU --> UUA --> GUA   Sd = 0, Nd = 2
   Take average of these two: Sd = 0.5, Nd = 1.5
(note that if all three positions were different there would be
6 routes to average)
Total Sd = 2.5
Total Nd = 2.5
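The route averaging can be sketched in Python with `itertools.permutations`, which lists every order in which the differing positions could have changed (2 routes for two differences, 6 for three). The function name `count_differences` is made up for this sketch, codon 3 of Seq 2 is entered as GCG (Ala), and routes that pass through a stop codon are not treated specially, which is a simplification; none occur in this example.

```python
from fractions import Fraction
from itertools import permutations

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def count_differences(c1, c2):
    """Return (Sd, Nd) for one codon pair, averaged over all routes."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    routes = list(permutations(diff))  # one empty route if codons are identical
    sd = nd = Fraction(0)
    for route in routes:
        current = c1
        for pos in route:
            # Change one position at a time and classify that single step.
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = nxt
    return sd / len(routes), nd / len(routes)

seq1 = ["CCC", "UUU", "GGG", "UUA", "UUU"]
seq2 = ["CCC", "UUC", "GCG", "CUA", "GUA"]  # codon 3 assumed to be GCG (Ala)

pairs = [count_differences(a, b) for a, b in zip(seq1, seq2)]
Sd = sum(p[0] for p in pairs)
Nd = sum(p[1] for p in pairs)

print(Sd, Nd)
```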
DS = 2.5/4 = 0.625
dS = 1.34
DN = 2.5/11 = 0.227
dN = 0.271
The non-syn rate is much slower than the syn rate in this example (dN/dS is about 0.20).
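The final arithmetic can be checked with a few lines of Python. `jc_distance` is a hypothetical helper name; the inputs S, N, Sd and Nd are the totals from the worked example.

```python
import math

def jc_distance(p):
    """Jukes-Cantor distance for a fraction p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the formula to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)

S, N = 4, 11          # numbers of syn and non-syn sites
Sd, Nd = 2.5, 2.5     # numbers of syn and non-syn differences

DS, DN = Sd / S, Nd / N
dS, dN = jc_distance(DS), jc_distance(DN)

print(f"DS = {DS:.3f}, dS = {dS:.2f}")
print(f"DN = {DN:.3f}, dN = {dN:.3f}")
print(f"dN/dS = {dN / dS:.2f}")
```

Note that DS = 0.625 is already close to the saturation limit of 0.75, so dS is very sensitive to small changes in Sd for a sequence this short.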