Molecular Clocks
Carolin Kosiol
Molecular clock hypothesis
The rate of molecular sequence evolution is
constant over time and among evolutionary
lineages (Zuckerkandl & Pauling 1962)
Based on observations from data:
• haemoglobin (Zuckerkandl & Pauling 1962)
• cytochrome c (Margoliash 1963)
• fibrinopeptides (Doolittle & Blomback 1964)
[Review on the history of molecular clock: Kumar
2005]
Molecular clock hypothesis
Attention to Details!
• The clock is stochastic: changes happen at
random intervals rather than regularly (waiting
times are exponential under a Markov process)
• Different proteins (or their regions) have “their
own clocks”, i.e., evolve at different rates
• Rate constancy may not hold globally but applies
to groups of species
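A minimal simulation sketch of the stochastic clock, assuming substitutions on one lineage arrive as a Poisson process with a constant rate. The function name, the rate of 0.5 and the duration of 20 time units are illustrative choices, not values from the slides. Waiting times between changes are drawn from an exponential distribution, so the number of changes in a fixed time span varies between replicates even though the rate is constant.

```python
import random


def count_substitutions(rate, duration, rng):
    """Count events of a constant-rate Poisson process within `duration`."""
    elapsed = 0.0
    n_changes = 0
    while True:
        # Waiting time to the next substitution is exponential with mean 1/rate.
        elapsed += rng.expovariate(rate)
        if elapsed > duration:
            return n_changes
        n_changes += 1


rng = random.Random(1)  # fixed seed so the run is repeatable
counts = [count_substitutions(0.5, 20.0, rng) for _ in range(2000)]

mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
print(f"mean = {mean:.2f}, variance = {var:.2f}")
```

For a Poisson process both the mean and the variance of the count should be close to rate × duration (10 here), which is why a "ticking" clock with constant rate still gives noisy distance estimates.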
The utility of molecular clock:
• Reconstruct phylogenies and estimate
divergence times between species
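As a rough sketch of how dating works under a strict clock: if two lineages diverged t time units ago and each accumulates r substitutions per site per time unit, their expected distance is d = 2rt. One calibrated pair fixes r, and any other pair can then be dated. The function names and the numbers below are hypothetical and only illustrate the arithmetic.

```python
def calibrate_rate(distance_cal, age_cal):
    """Rate per site per time unit from a pair with known divergence age (d = 2rt)."""
    return distance_cal / (2.0 * age_cal)


def divergence_time(distance, rate):
    """Divergence time of a pair from its distance, assuming the same rate."""
    return distance / (2.0 * rate)


# Hypothetical calibration: 0.2 substitutions/site between a pair that split 100 Myr ago.
rate = calibrate_rate(0.2, 100.0)

# Hypothetical query pair with distance 0.05 substitutions/site.
t = divergence_time(0.05, rate)
print(f"rate = {rate:.4f} per site per Myr, divergence about {t:.1f} Myr ago")
```

The estimate is only as good as the clock assumption and the calibration age; both enter the result linearly.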
Neutralist-selectionist debate:
• A constant rate of evolution among species is
incompatible with the selectionist view: species
that differ in habitat, life span, etc. must be under
different selective regimes
• The observed clock-like behaviour of molecular
evolution is “perhaps the strongest evidence” for
the neutral theory [Kimura & Ohta 1971]
Difficulties in molecular dating
• The molecular clock is often violated: e.g., changes in
generation times and population sizes, selective forces,
species-specific differences (e.g., Ayala 1999)
• Assumptions about substitution rates affect time estimation
• Fossil calibrations always involve uncertainties:
  * errors in fossil dating
  * an incomplete fossil record
  * errors in placing a fossil on the phylogeny
• Rates and patterns of substitution are different at different loci
How to improve molecular dating:
• Use multiple genes which may be evolving in different ways
• Use multiple fossil calibrations to constrain the rates
• Use statistically sound estimation methods
Testing for molecular clock
Likelihood ratio test
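With s species:
• Null hypothesis (clock): s-1 parameters
• Alternative (no clock): 2s-3 parameters
• Twice the log-likelihood difference is compared with a chi-square
distribution with (2s-3) - (s-1) = s-2 degrees of freedom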
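A minimal sketch of the test arithmetic, assuming the two maximized log-likelihoods have already been obtained from a phylogenetics program. The function names and the log-likelihood values are hypothetical. The statistic 2(lnL_noclock - lnL_clock) is compared with a chi-square distribution whose degrees of freedom equal the difference in parameter counts, (2s-3) - (s-1) = s-2. The tail probability is computed with the standard library only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Start from the closed form for df = 2 (even) or df = 1 (odd) ...
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # ... and apply Q(a + 1, z) = Q(a, z) + z^a e^(-z) / Gamma(a + 1) up to a = df / 2.
    while a < df / 2.0 - 1e-9:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def clock_lrt(lnl_clock, lnl_noclock, s):
    """Likelihood ratio test of the clock (null) against the no-clock model."""
    df = (2 * s - 3) - (s - 1)  # equals s - 2
    stat = 2.0 * (lnl_noclock - lnl_clock)
    return stat, df, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a 6-species alignment.
stat, df, p = clock_lrt(-1000.0, -995.0, s=6)
print(f"2*delta lnL = {stat:.2f}, df = {df}, p = {p:.4f}")
```

A small p-value rejects the clock; a large one does not show that the rate is constant, only that the data do not contradict equal root-to-tip distances.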
Limitations of molecular clock tests
• Molecular clock tests typically evaluate a (weak)
hypothesis that the tips of the tree are equally distant
from the root
• Cannot distinguish a constant rate from an average
variable rate within a lineage
• None of the molecular clock tests examines whether the
rate is constant over time
Failure to reject molecular clock may be due to
* lack of information in the data (small sample, little
divergence)
* lack of power of the test:
e.g., the relative rate test applied to only 3 species
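To illustrate the low power of a three-species comparison, here is a sketch of one specific relative rate test, Tajima's (1993) site-counting version, which the slides do not name; it is swapped in as a simple concrete instance. It counts sites where only lineage A differs from the other two sequences and sites where only lineage B does, and compares (mA - mB)^2 / (mA + mB) with a chi-square distribution with 1 degree of freedom. The function name and the toy sequences are hypothetical.

```python
import math


def tajima_relative_rate(seq_a, seq_b, outgroup):
    """Tajima-style relative rate test on three aligned sequences."""
    m_a = m_b = 0
    for a, b, o in zip(seq_a, seq_b, outgroup):
        if a != b:
            if b == o:
                m_a += 1  # change unique to lineage A
            elif a == o:
                m_b += 1  # change unique to lineage B
    if m_a + m_b == 0:
        return m_a, m_b, 0.0, 1.0
    stat = (m_a - m_b) ** 2 / (m_a + m_b)
    # Upper tail of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return m_a, m_b, stat, p_value


# Toy alignment: lineage A carries four times as many unique changes as lineage B.
out = "AAAAAAAAAAAA"
sa = "CCCCCCCCAAAA"
sb = "AAAAAAAAGGAA"
m_a, m_b, stat, p_value = tajima_relative_rate(sa, sb, out)
print(m_a, m_b, round(stat, 2), round(p_value, 3))
```

Even with a fourfold difference in unique changes, so few informative sites leave the test short of the conventional 5% threshold, which is the kind of failure to reject described above.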
Likelihood: local clock
Problem:
• How to choose number of rates and subdivide the tree?
• [rate smoothing was suggested]
• Some rate assignments make the model unidentifiable
Likelihood methods
Advantages:
• Well-studied statistical framework
• Multiple loci can be analyzed simultaneously (accounting
for differences)
Disadvantages:
• Assignment of rates to branches is arbitrary
• Calibration node ages are assumed to be known without
error. [The penalized likelihood approach uses constrained
minimization to incorporate fossil uncertainties, but this is problematic]
• Incorporating uncertainties and sophisticated rate
models is computationally expensive
Bayesian methods
Uncertainties in fossil ages
• Incorporated within the prior
on divergence times.
• Thorne et al. (1998) allowed
lower and upper bounds on
node ages
• During MCMC sampling:
fossil calibration age t ~ U(tL,
tU)
• Hard bound = values outside
(tL, tU) have 0-probability
• Soft bounds (Yang & Rannala
2006) allow a distribution with
5% of weight for values
outside of interval (tL, tU)
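The difference between hard and soft bounds can be sketched as two prior densities on a calibration age t. The soft version below keeps 95% of the mass uniformly inside (tL, tU) and puts 2.5% in each tail, with a power-law tail below tL and an exponential tail above tU, joined continuously at the bounds. The equal split of the 5% and the tail shapes are assumptions made for illustration, in the spirit of soft bounds rather than a guaranteed reproduction of any published form; the function names are hypothetical.

```python
import math


def hard_bound_density(t, t_lower, t_upper):
    """Uniform prior U(tL, tU): zero density outside the bounds."""
    if t_lower < t < t_upper:
        return 1.0 / (t_upper - t_lower)
    return 0.0


def soft_bound_density(t, t_lower, t_upper, tail=0.025):
    """Soft-bound prior with mass `tail` below tL and `tail` above tU (assumed shapes)."""
    if t <= 0.0:
        return 0.0
    core = (1.0 - 2.0 * tail) / (t_upper - t_lower)  # density inside the bounds
    if t <= t_lower:
        # Power-law tail on (0, tL]; theta chosen so the density is continuous at tL.
        theta = core * t_lower / tail
        return tail * (theta / t_lower) * (t / t_lower) ** (theta - 1.0)
    if t <= t_upper:
        return core
    # Exponential tail above tU; theta chosen so the density is continuous at tU.
    theta = core / tail
    return tail * theta * math.exp(-theta * (t - t_upper))


# Hypothetical fossil bounds of 10 and 20 Myr.
for age in (9.5, 15.0, 20.5):
    print(age, hard_bound_density(age, 10.0, 20.0), soft_bound_density(age, 10.0, 20.0))
```

With hard bounds an MCMC sampler can never visit ages outside (tL, tU), however strongly the sequence data favour them; with soft bounds such ages are penalized but remain reachable.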
Summary
• Error-fossil model performs well on simulated data
• In real data, there seems to be a lot of conflict, and the
Bayesian method appears to place high confidence in
whatever results it ends up with
• The Bayesian approach provides a flexible framework to
incorporate different kinds of information and
uncertainties
• When models are complex it becomes difficult to
understand the effects of each single component
Molecular clock
For an indefinite time I clung to the [time]
machine as it swayed and vibrated, quite
unheeding how I went, and when I brought myself
to look at the dials again I was amazed to find
where I had arrived. One dial records days, and
another thousands of days, another millions of
days, and another thousands of millions […]
H.G. Wells “The Time Machine” 1895