Taming the Beast Workshop
Molecular Evolution Models
David Rasmussen & Carsten Magnus
June 27, 2016

Sections:
- Levels of evolution
- Sequence alignment
- Substitution models
- Substitution rate matrices
- Substitutions modelled as Markov chains
- Variable substitution rates across sites
- Codons and data partitions
- The universal genetic code
- Let BEAST2 choose the right model
- References

Outline
- Models of sequence evolution:
  - rate matrices
  - Markov chain model
- Variable rates amongst different sites: "+Γ"
- Codons and data partitions
- Implementation in BEAST2

Levels of evolution
- phenotype, e.g. antigenic level: antibody binding to HIV
- genotype: a codon of three nucleotides encodes one amino acid
- sequence level:
    ACUGAACGUGACUACUG
    ACUGAACGUAACUACUG
  one nucleotide change can already change the phenotype

alphabet:
- 4 nucleotides: DNA: TCAG, RNA: UCAG
- 20 amino acids

When comparing two nucleotide sequences we have to keep in mind that they are the result of mutation during replication (genotypic level) and selection (phenotypic level).

Sequence alignment

ATTACGAC
TCTACGAC

A way of arranging sequences to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.
- To find an alignment: concept of positional homology: nucleotides (or amino acids) show positional homology if they exist at equivalent positions in the respective sequences.
- Programs for alignment: MUSCLE, CLUSTAL, which can be called from e.g. AliView, MegAlign, ...

BEAST analysis starts with aligned sequences!!!
→ file format .fas, .fasta, .nexus
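
To illustrate why aligned input matters, the following minimal sketch compares the two example sequences column by column. The helper name `count_differences` is invented for this illustration and is not part of BEAST2 or of any alignment program.

```python
def count_differences(seq_a, seq_b):
    """Count mismatching columns between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        # Column-wise comparison only makes sense for aligned sequences.
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


# The two example sequences from the alignment slide.
seq_1 = "ATTACGAC"
seq_2 = "TCTACGAC"
n_diff = count_differences(seq_1, seq_2)
# Number of differing sites and the proportion of differing sites.
print(n_diff, n_diff / len(seq_1))
```

The comparison is only meaningful because each column is assumed to hold positionally homologous nucleotides.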

Models for nucleotide substitutions

The fundamental problem

[Figure: an ancestral sequence A C T T G A T G evolves into the three observed sequences below. The panels illustrate a single substitution (C > G), multiple substitutions at one site (T > C followed by C > A) and convergent substitutions (A > C on two lineages).]

taxon 1: A C T A G C T G
taxon 2: A G T T G C T G
taxon 3: A C T T G A T G

Problem of phylogenetics:
We observe sequences but not their evolutionary history. Thus we
have to take all possible evolutionary trajectories into account.
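→ The sequence evolution model appears in the posterior:
$$P(\text{tree}, \text{parameters} \mid \text{alignment}) = \frac{P(\text{alignment} \mid \text{tree}, \text{substitution model})\, P(\text{tree} \mid \text{parameters})\, P(\text{parameters})}{P(\text{alignment})}$$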

A model for nucleotide substitutions

State space of each nucleotide position: S = {T, C, A, G}

Example: Assume the process is at state T.
[Figure: from state T the process moves to C at rate a, to A at rate b and to G at rate c; the total rate of leaving T is a+b+c.]

Substitution rate matrix (row: from, column: to):

      T           C           A           G
T     -(a+b+c)    a           b           c
C     d           -(d+e+f)    e           f
A     g           h           -(g+h+i)    i
G     j           k           l           -(j+k+l)
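
The structure of the rate matrix can be sketched in a few lines: off-diagonal entries hold the rates, and each diagonal entry is the negative sum of its row, so every row sums to zero. The function name `make_rate_matrix` and the numerical values of a, b and c are assumptions made for this illustration only.

```python
STATES = "TCAG"


def make_rate_matrix(off_diagonal):
    """Build Q from a dict {(from, to): rate}; diagonals make each row sum to zero."""
    q = [[0.0] * 4 for _ in range(4)]
    for (src, dst), rate in off_diagonal.items():
        q[STATES.index(src)][STATES.index(dst)] = rate
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q


# Hypothetical rates for leaving state T (a: T->C, b: T->A, c: T->G).
a, b, c = 0.3, 0.1, 0.2
Q = make_rate_matrix({("T", "C"): a, ("T", "A"): b, ("T", "G"): c})
print(Q[0])  # row T: [-(a+b+c), a, b, c]
```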
Site models in BEAST2

The simplest substitution model: JC69

JC69:
- named after TH Jukes, CR Cantor: Evolution of protein molecules. 1969 [Jukes and Cantor, 1969].
- all substitutions have the same rate, λ

Substitution rates:

      T   C   A   G
T     ·   λ   λ   λ
C     λ   ·   λ   λ
A     λ   λ   ·   λ
G     λ   λ   λ   ·

Accounting for transition/transversion: K80

K80:
- named after M Kimura: A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. 1980. [Kimura, 1980]
- transitions happen at rate α, transversions at rate β

[Figure: pyrimidines (one ring): T, C; purines (two rings): A, G. A transition stays within one group, a transversion crosses between the groups.]

Substitution rates:

      T   C   A   G
T     ·   α   β   β
C     α   ·   β   β
A     β   β   ·   α
G     β   β   α   ·

Accounting for transition/transversion: HKY

HKY:
- named after [Hasegawa et al., 1984, Hasegawa et al., 1985]
- accounting for transitions (rate α), transversions (rate β)
- after a long period of evolution, equilibrium frequencies are reached

[Figure: pyrimidines (one ring): T, C; purines (two rings): A, G. A transition stays within one group, a transversion crosses between the groups.]

Substitution rates (row: from, column: to):

      T      C      A      G
T     ·      απC    βπA    βπG
C     απT    ·      βπA    βπG
A     βπT    βπC    ·      απG
G     βπT    βπC    απA    ·

This equals the K80 rate matrix multiplied by the diagonal matrix of equilibrium frequencies:

      T   C   A   G
T     ·   α   β   β         πT  0   0   0
C     α   ·   β   β    ×    0   πC  0   0
A     β   β   ·   α         0   0   πA  0
G     β   β   α   ·         0   0   0   πG
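
As a hedged sketch of the HKY structure, the code below fills the off-diagonal entries with α or β times the equilibrium frequency of the target nucleotide and then checks time-reversibility numerically. The function names and all parameter values are assumptions for illustration; this is not BEAST2 code.

```python
STATES = "TCAG"
PYRIMIDINES = {"T", "C"}


def is_transition(x, y):
    """True if x -> y stays within the pyrimidines (T, C) or within the purines (A, G)."""
    return (x in PYRIMIDINES) == (y in PYRIMIDINES)


def hky_rate_matrix(alpha, beta, freqs):
    """Return Q with q[i][j] = alpha * pi_j (transition) or beta * pi_j (transversion)."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(STATES):
        for j, y in enumerate(STATES):
            if i != j:
                rate = alpha if is_transition(x, y) else beta
                q[i][j] = rate * freqs[y]
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this is minus the row sum
    return q


# Hypothetical parameter values, chosen only for illustration.
pi = {"T": 0.1, "C": 0.2, "A": 0.3, "G": 0.4}
Q = hky_rate_matrix(alpha=2.0, beta=1.0, freqs=pi)

# Time-reversibility (detailed balance): pi_i * q_ij equals pi_j * q_ji.
reversible = all(
    abs(pi[x] * Q[i][j] - pi[y] * Q[j][i]) < 1e-12
    for i, x in enumerate(STATES)
    for j, y in enumerate(STATES)
)
print(reversible)
```

Setting all four frequencies to 0.25 recovers the K80 pattern up to a constant factor.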

Accounting for transition/transversion: TN93

TN93:
- named after [Tamura and Nei, 1993]
- accounting for different transition rates between T and C as well as A and G
- after a long period of evolution, equilibrium frequencies are reached

[Figure: pyrimidines (one ring): T, C, exchanged by transitions at rate α1; purines (two rings): A, G, exchanged by transitions at rate α2; transversions cross between the groups.]

Substitution rates (row: from, column: to):

      T       C       A       G
T     ·       α1πC    βπA     βπG
C     α1πT    ·       βπA     βπG
A     βπT     βπC     ·       α2πG
G     βπT     βπC     α2πA    ·

A more general substitution model: GTR

GTR (REV):
- generalised time-reversible model
- based on three papers: [Tavaré, 1986, Yang, 1994, Zharkikh, 1994]

Substitution rates (row: from, column: to):

      T      C      A      G
T     ·      aπC    bπA    cπG
C     aπT    ·      dπA    eπG
A     bπT    dπC    ·      fπG
G     cπT    eπC    fπA    ·

+ quite flexible
+ time-reversible
- not completely general

The most general substitution model: implemented in BEAST2 but not in BEAUti

UNREST:
- unrestricted model first described in [Yang, 1994]
- each substitution has a (different) rate

Substitution rates (row: from, column: to):

      T   C   A   G
T     ·   a   b   c
C     d   ·   e   f
A     g   h   ·   i
G     j   k   l   ·

+ most general case
+ all other models are special cases of UNREST
- mathematically very complicated and unwieldy to use
- not time-reversible

Substitution models in BEAUti

model    parameters   description
JC69     1            all substitutions have the same rate
K80      2            accounts for transitions and transversions, not in BEAUti
HKY      2+3∗         distinction between transitions and transversions, including equilibrium frequencies
TN93     3+3∗         different rates for the two kinds of transitions
GTR      6+3∗         general, but still time-reversible
UNREST   12           most general, not time-reversible, not in BEAUti

∗ Equilibrium frequencies: can be empirically estimated from the alignment or inferred alongside the substitution rates.

The fundamental problem - again

[Figure: ancestral sequence A C T T G A T G and the three observed sequences.]

taxon 1: A C T A G C T G
taxon 2: A G T T G C T G
taxon 3: A C T T G A T G

Problem of phylogenetics:
We observe sequences but not their evolutionary history. Thus we have to take all possible evolutionary trajectories into account.

So far we determined rates of nucleotide substitutions. But we need probabilities.

Nucleotide substitutions as Markov chains (MC)

Definition of a Markov chain (see also [Ross, 1996]):
- stochastic process, i.e. a series of random experiments through time
- lives on a state space and jumps between the different states
- memorylessness: the probability of jumping to a state depends only on the current state

[Figure: nucleotide substitutions as MC. A site moves between the states T, C, A and G through time, with transition probabilities such as pTA, pTC, pCC and pCT labelling the jumps.]
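
The definition can be made concrete with a small simulation. This is a minimal sketch of a discrete-time chain on the four nucleotides; the function name `simulate_chain` and the per-step probabilities are hypothetical and are not taken from the slides or from BEAST2.

```python
import random

STATES = "TCAG"


def simulate_chain(start, trans_prob, n_steps, rng):
    """Simulate n_steps of a discrete-time Markov chain on STATES."""
    path = [start]
    for _ in range(n_steps):
        current = path[-1]
        # Memorylessness: the next state depends only on the current state.
        weights = [trans_prob[current][nxt] for nxt in STATES]
        path.append(rng.choices(STATES, weights=weights)[0])
    return "".join(path)


# Hypothetical per-step probabilities: stay with 0.7, move to each other state with 0.1.
P = {x: {y: (0.7 if x == y else 0.1) for y in STATES} for x in STATES}
path = simulate_chain("T", P, n_steps=20, rng=random.Random(1))
print(path)
```

Each printed character is the state of one site at one time step; nothing earlier than the current state is consulted when drawing the next one.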

Why Markov chains are a great model for nucleotide substitutions
- memorylessness: a nucleotide substitution happens independently of the substitution history at this site
- the substitution rate matrix defines the transition probabilities
- applying linear algebra, we can calculate the transition probability matrix according to:
  $$P(t) = e^{Qt} = U \,\mathrm{diag}(e^{\lambda_1 t}, e^{\lambda_2 t}, e^{\lambda_3 t}, e^{\lambda_4 t})\, U^{-1}$$
  where $\lambda_1, \dots, \lambda_4$ are the eigenvalues of Q and the columns of U are the corresponding eigenvectors
- the transition probabilities take into account every possible substitution path (Chapman-Kolmogorov theorem)

Example of transition probabilities: JC69

Substitution rates (rows and columns in the order T, C, A, G):
$$Q = \begin{pmatrix} -3\lambda & \lambda & \lambda & \lambda \\ \lambda & -3\lambda & \lambda & \lambda \\ \lambda & \lambda & -3\lambda & \lambda \\ \lambda & \lambda & \lambda & -3\lambda \end{pmatrix}$$

→ $P(t) = e^{Qt}$ gives the transition probability matrix:
$$P(t) = \begin{pmatrix} p_0(t) & p_1(t) & p_1(t) & p_1(t) \\ p_1(t) & p_0(t) & p_1(t) & p_1(t) \\ p_1(t) & p_1(t) & p_0(t) & p_1(t) \\ p_1(t) & p_1(t) & p_1(t) & p_0(t) \end{pmatrix}$$
with $p_0(t) = \frac{1}{4} + \frac{3}{4} e^{-4\lambda t}$ and $p_1(t) = \frac{1}{4} - \frac{1}{4} e^{-4\lambda t}$

[Figure: transition probabilities $p_0(t)$ and $p_1(t)$ plotted against time in days (0 to 100) for λ = 0.015 substitutions per site per day.]
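
The closed-form JC69 probabilities can be checked against a direct evaluation of the matrix exponential. The sketch below uses a truncated Taylor series in plain Python, which is an illustrative stand-in and not the eigendecomposition route; the helper names and the choice t = 20 days are assumptions.

```python
import math


def jc69_probs(lam, t):
    """Closed-form JC69 probabilities: (no change, change to one specific base)."""
    decay = math.exp(-4.0 * lam * t)
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm_series(q, t, terms=60):
    """Truncated Taylor series of exp(Q t); adequate for small matrices and moderate Q t."""
    n = len(q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    qt = [[q[i][j] * t for j in range(n)] for i in range(n)]
    for k in range(1, terms):
        # term now holds (Q t)^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


lam, t = 0.015, 20.0  # rate per site per day from the plot; t is an arbitrary example
Q = [[-3 * lam if i == j else lam for j in range(4)] for i in range(4)]
p0, p1 = jc69_probs(lam, t)
P = expm_series(Q, t)
print(round(p0, 4), round(p1, 4), round(P[0][0], 4), round(P[0][1], 4))
```

Both routes should agree to numerical precision, and each row of P(t) sums to one because p0 + 3 p1 = 1.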

JC69: Stationary distribution

Suppose we have a sequence that evolves with rate $\lambda = 2.2/3 \times 10^{-9}$ substitutions per site per year. We follow the evolution of 4 different sites with T at site 1, C at site 2, A at site 3 and G at site 4 at time point 0. How likely is it that, after time t has passed, there is a T, C, A or G at the four different positions? To answer this question, we follow the time evolution of the transition probability matrix P(t):

time/years     P(t)
0              1    0    0    0
               0    1    0    0
               0    0    1    0
               0    0    0    1
4.5 × 10^8     0.46 0.18 0.18 0.18
               0.18 0.46 0.18 0.18
               0.18 0.18 0.46 0.18
               0.18 0.18 0.18 0.46
9 × 10^8       0.31 0.23 0.23 0.23
               0.23 0.31 0.23 0.23
               0.23 0.23 0.31 0.23
               0.23 0.23 0.23 0.31
1.8 × 10^9     0.25 0.25 0.25 0.25
               0.25 0.25 0.25 0.25
               0.25 0.25 0.25 0.25
               0.25 0.25 0.25 0.25

- when t → ∞ the stationary distribution is reached
- any long sequence (e.g. TTTTTT...) at time 0 will be composed of equal amounts of T, C, A, G after time t → ∞

JC69: Time transformation

The times we look at, e.g. in species evolution, are often very large. Thus, instead of real time, we display an evolutionary time scale in terms of sequence distances. As a substitution happens at rate 3λ in JC69 (keep in mind that in other models the expected time to a substitution is different!), we expect one substitution to happen after time 1/(3λ). This is due to the exponentially distributed waiting times for an event happening at a certain rate. This means that we expect one substitution after $1/(2.2 \times 10^{-9}) \approx 4.5 \times 10^{8}$ years in our example.

time in years                                            0    4.5 × 10^8    9 × 10^8    1.8 × 10^9
time in units of the expected time to 1 substitution     0    1             2           4

The matrices P(t) at these four time points are those of the stationary distribution example.

In JC69: t = d/(3λ)

Trick from physics: compare units:
[t] = years
[d/(3λ)] = substitutions / (substitutions/year) = years
d = time × (3λ)
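
The conversion between calendar time and evolutionary distance is a single multiplication, sketched here with the rate from the example. The function name `expected_substitutions` is made up for this illustration.

```python
def expected_substitutions(years, lam):
    """Evolutionary distance d = time * 3 * lambda under JC69."""
    return years * 3.0 * lam


lam = 2.2e-9 / 3.0  # substitutions per site per year, from the example
for years in (0.0, 4.5e8, 9.0e8, 1.8e9):
    print(years, round(expected_substitutions(years, lam), 2))

# Expected waiting time for one substitution, in years.
print(1.0 / (3.0 * lam))
```

The distances come out close to 1, 2 and 4 rather than exactly, because 4.5 × 10^8 years is itself a rounded value of 1/(3λ).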
Variable substitution rates across sites

Variable rates
- so far: all sites in the sequence evolve at the same rate
- but: substitution rates might differ over the genome
  - mutation rates might differ over sites
  - selective pressure might be different on the phenotypic level

We extend the existing models by replacing the constant rates by Γ-distributed random variables (notation: JC69+Γ, HKY+Γ, ...)

Example: JC69+Γ

We replace the substitution rate λ by λR, where R is a Γ-distributed random variable with shape parameter α and mean 1:
λ ↦ λR

[Figure: density g(r) of R for r between 0 and 3, shown for α = 0.2, α = 1, α = 2 and α = 20.]

In BEAUti:
Change the Gamma Category Count to allow for rate variation. 4 to 6 categories normally work well.
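
A Γ distribution with shape α has mean 1 when its rate parameter is also α. The sketch below writes down that density and confirms the mean numerically; it does not reproduce how BEAST2 discretises the distribution into categories, and the function names and integration settings are assumptions for illustration.

```python
import math


def gamma_density(r, alpha):
    """Density of a Gamma distribution with shape alpha and rate alpha (mean 1)."""
    return alpha ** alpha * r ** (alpha - 1.0) * math.exp(-alpha * r) / math.gamma(alpha)


def numerical_mean(alpha, upper=60.0, steps=60000):
    """Midpoint-rule approximation of E[R], the integral of r * g(r) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * h
        total += r * gamma_density(r, alpha)
    return total * h


for alpha in (1.0, 2.0, 20.0):
    print(alpha, round(numerical_mean(alpha), 3))
```

Small α concentrates most sites near rate 0 with a few fast sites, while large α pushes all rates towards 1, which approaches the model without rate variation.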
Codons and data partitions

The codon sun

A codon consists of three nucleotides, translating to one of the 20 amino acids:

[Figure: the codon sun [Sanger, 2015]]

Amino Acid                     Three-Letter Abbreviation   One-Letter Symbol   Molecular Weight
Alanine                        Ala                         A                   89 Da
Arginine                       Arg                         R                   174 Da
Asparagine                     Asn                         N                   132 Da
Aspartic acid                  Asp                         D                   133 Da
Asparagine or aspartic acid    Asx                         B                   133 Da
Cysteine                       Cys                         C                   121 Da
Glutamine                      Gln                         Q                   146 Da
Glutamic acid                  Glu                         E                   147 Da
Glutamine or glutamic acid     Glx                         Z                   147 Da
Glycine                        Gly                         G                   75 Da
Histidine                      His                         H                   155 Da
Isoleucine                     Ile                         I                   131 Da
Leucine                        Leu                         L                   131 Da
Lysine                         Lys                         K                   146 Da
Methionine                     Met                         M                   149 Da
Phenylalanine                  Phe                         F                   165 Da
Proline                        Pro                         P                   115 Da
Serine                         Ser                         S                   105 Da
Threonine                      Thr                         T                   119 Da
Tryptophan                     Trp                         W                   204 Da
Tyrosine                       Tyr                         Y                   181 Da
Valine                         Val                         V                   117 Da

[Promega, 2015]

Example: Codon CTA

Overview of substitution rates to the same codon CTA; the thickness of the arrows represents the different rates:

[Figure: the codon CTA (Leu) in the centre, with arrows from its nine single-nucleotide neighbours: GTA (Val), ATA (Ile), TTA (Leu), CTT (Leu), CTC (Leu), CTG (Leu), CGA (Arg), CCA (Pro), CAA (Gln).]

- synonymous substitutions: AA does not change
- nonsynonymous substitutions: AA does change
- bigger arrows: transition
- smaller arrows: transversion

adapted from [Yang, 2014]
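
The two distinctions on this slide can be worked through for each neighbour of CTA. The lookup table below contains only the ten codons named on the slide, and the function name `classify` is an assumption for this sketch.

```python
# Amino acids of CTA and its nine single-nucleotide neighbours, as listed on the slide.
AMINO_ACID = {
    "CTA": "Leu", "GTA": "Val", "ATA": "Ile", "TTA": "Leu", "CTT": "Leu",
    "CTC": "Leu", "CTG": "Leu", "CGA": "Arg", "CCA": "Pro", "CAA": "Gln",
}
PYRIMIDINES = {"T", "C"}


def classify(source, target="CTA"):
    """Classify the single-nucleotide substitution source -> target."""
    diffs = [(x, y) for x, y in zip(source, target) if x != y]
    if len(diffs) != 1:
        raise ValueError("codons must differ at exactly one position")
    x, y = diffs[0]
    # Transition: both nucleotides are pyrimidines or both are purines.
    kind = "transition" if (x in PYRIMIDINES) == (y in PYRIMIDINES) else "transversion"
    effect = "synonymous" if AMINO_ACID[source] == AMINO_ACID[target] else "nonsynonymous"
    return effect, kind


for codon in AMINO_ACID:
    if codon != "CTA":
        print(codon, *classify(codon))
```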

Varying substitution rates amongst the codon positions

[Bofkin and Goldman, 2007] have shown that in protein-encoding regions
- second codon positions evolve more slowly than first codon positions
- third codon positions evolve faster than first codon positions

⇒ Different codon positions can have different evolutionary rates. BEAST2 allows for estimating these rates separately.
file BEAST2.4.x/examples/nexus/primate-mtDNA.nex
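
Partitioning by codon position amounts to taking every third column of an in-frame coding alignment. This is a minimal sketch of that idea only; it is not how BEAST2 or BEAUti implement partitions, and the function name and the example sequence are hypothetical.

```python
def split_codon_positions(sequence):
    """Return the 1st, 2nd and 3rd codon positions of an in-frame coding sequence."""
    if len(sequence) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return sequence[0::3], sequence[1::3], sequence[2::3]


# Hypothetical in-frame sequence made of the codons CTA, CTG and TTA.
first, second, third = split_codon_positions("CTACTGTTA")
print(first, second, third)
```

Each of the three resulting strings could then be given its own rate, which is the idea behind estimating the codon-position rates separately.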

Including the choice of substitution rate model in your BEAST analysis

Rate models in BEAST2
- BEAST2 allows for including different site models in your analysis (→ Site Model tab in BEAUti)
- Which site model is the best for your data?

→ package bModelTest: Bayesian site model selection for nucleotide data
→ package SubstBMA: modelling across-site variation in the nucleotide

References
- Bofkin, L. and Goldman, N. (2007). Variation in Evolutionary Processes at Different Codon Positions. Molecular Biology and Evolution, 24(2):513–521.
- Hasegawa, M., Kishino, H., and Yano, T. (1985). Dating of the Human Ape Splitting by a Molecular Clock of Mitochondrial-DNA. Journal of Molecular Evolution, 22(2):160–174.
- Hasegawa, M., Yano, T., and Kishino, H. (1984). A New Molecular Clock of Mitochondrial-DNA and the Evolution of Hominoids. Proceedings of the Japan Academy Series B-Physical and Biological Sciences, 60(4):95–98.
- Jukes, T. and Cantor, C. (1969). Evolution of protein molecules. Mammalian Protein Metabolism, pages 21–123.
- Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16(2):111–120.
- Promega (2015). The amino acids: https://www.promega.com/ /media/files/resources/technical references/amino acid abbreviations and molecular weights.pdf.
- Ross, S. M. (1996). Stochastic Processes. Second edition. Wiley.
- Sanger (2015). The codon sun: ftp://ftp.sanger.ac.uk/pub/yourgenome/downloads/activities/kras-cancer-mutation/krascodonwheel.pdf.
- Tamura, K. and Nei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution, 10(3):512–526.
- Tavaré, S. (1986). Some probabilistic and statistical problems in the analysis of DNA sequences. In Some mathematical questions in biology: DNA sequence analysis (New York, 1984), pages 57–86. Amer. Math. Soc., Providence, RI.
- Yang, Z. (1994). Estimating the pattern of nucleotide substitution. Journal of Molecular Evolution, 39(1):105–111.
- Yang, Z. (2014). Molecular Evolution: A Statistical Approach. Oxford University Press.
- Zharkikh, A. (1994). Estimation of evolutionary distances between nucleotide sequences. Journal of Molecular Evolution, 39(3):315–329.