Molecular Clocks
Carolin Kosiol

Molecular clock hypothesis
The rate of molecular sequence evolution is constant over time and among evolutionary lineages (Zuckerkandl & Pauling 1962).
Based on observations from data:
• haemoglobin (Zuckerkandl & Pauling 1962)
• cytochrome c (Margoliash 1963)
• fibrinopeptides (Doolittle & Blomback 1964)
[Review on the history of the molecular clock: Kumar 2005]

Molecular clock hypothesis (cont.)
Attention to details!
• The clock is stochastic: changes happen at random intervals rather than regularly (waiting times are exponential under a Markov process)
• Different proteins (or their regions) have “their own clocks”, i.e., evolve at different rates
• Rate constancy may not hold globally but applies to groups of species

The utility of the molecular clock
• Reconstruct phylogenies and estimate divergence times between species
Neutralist-selectionist debate:
• Constant evolution among species is incompatible with the selectionist view: species that differ in habitat, life span, etc. must be under different selective regimes
• The observed clock-like behaviour of molecular evolution is “perhaps the strongest evidence” for the neutral theory [Kimura & Ohta 1971]

Difficulties in molecular dating
• The molecular clock is often violated: e.g., changes in generation times and population sizes, selective forces, species-specific differences (e.g., Ayala 1999)
• Assumptions about substitution rates affect time estimation
• Fossil calibrations always involve uncertainties:
  * errors in fossil dating
  * an incomplete fossil record
  * errors in assigning a fossil to a position on the phylogeny
• Rates and patterns of substitution differ among loci
How to improve molecular dating:
• Use multiple genes, which may be evolving in different ways
• Use multiple fossil calibrations to constrain the rates
• Use statistically sound estimation methods

Testing for the molecular clock
Likelihood ratio test with s species:
• Null hypothesis (clock): (s-1) parameters
• Alternative (no clock): (2s-3) parameters
• The two models therefore differ by (2s-3) - (s-1) = s-2 parameters

Limitations of molecular clock tests
• Molecular clock tests typically evaluate a (weak) hypothesis that the tips of the tree are equally distant from the root
• They cannot distinguish a constant rate from a variable rate with the same average within a lineage
• None of the molecular clock tests examines whether the rate is constant over time
Failure to reject the molecular clock may be due to:
  * lack of information in the data (small sample, little divergence)
  * lack of power of the test: e.g., the relative rate test applied to only 3 species

Likelihood: local clock
Problems:
• How to choose the number of rates and subdivide the tree? [rate smoothing was suggested]
• Some rate assignments make the model unidentifiable

Likelihood methods
Advantages:
• Well-studied statistical framework
• Multiple loci can be analyzed simultaneously (accounting for differences)
Disadvantages:
• Assignment of rates to branches is arbitrary
• Calibration node ages are assumed to be known without error. [The penalized likelihood approach uses constrained minimization to incorporate fossil uncertainties; this is problematic]
• Incorporating uncertainties and sophisticated rate models is computationally expensive

Bayesian methods
Uncertainties in fossil ages
• Incorporated within the prior on divergence times
• Thorne et al. (1998) allowed lower and upper bounds on node ages
• During MCMC sampling: fossil calibration age t ~ U(tL, tU)
• Hard bounds: values outside (tL, tU) have zero probability
• Soft bounds (Yang & Rannala 2006) allow a distribution with 5% of the weight on values outside the interval (tL, tU)

Summary
• The error-fossil model performs well on simulated data
• In real data, there seems to be a lot of conflict, and the Bayesian method appears to place high confidence in whatever results it ends up with
• The Bayesian approach provides a flexible framework for incorporating different kinds of information and uncertainties
• When models are complex, it becomes difficult to understand the effect of each single component

Molecular clock
“For an indefinite time I clung to the [time] machine as it swayed and vibrated, quite unheeding how I went, and when I brought myself to look at the dials again I was amazed to find where I had arrived. One dial records days, and another thousands of days, another millions of days, and another thousands of millions […]”
H.G. Wells, “The Time Machine”, 1895
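
The likelihood ratio test for the clock, with (s-1) parameters under the null and (2s-3) under the alternative, can be sketched in a few lines. This is a minimal illustration, not the method of any particular package: the function names `chi2_sf` and `clock_lrt` and the log-likelihood values in the usage lines are hypothetical. The sketch assumes the usual chi-squared approximation with s-2 degrees of freedom, and the log-likelihoods would normally come from fitting both models to the same alignment and tree topology.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variable, integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x).
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.erfc(math.sqrt(half))
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def clock_lrt(n_species: int, lnl_clock: float, lnl_no_clock: float):
    """Return (statistic, degrees of freedom, p-value) for the clock test."""
    if n_species < 3:
        raise ValueError("the test needs at least 3 species (df = s - 2 >= 1)")
    # Clock model: s - 1 parameters; unconstrained model: 2s - 3 parameters.
    df = (2 * n_species - 3) - (n_species - 1)
    # Guard against tiny negative values caused by optimisation error.
    stat = max(0.0, 2.0 * (lnl_no_clock - lnl_clock))
    return stat, df, chi2_sf(stat, df)


# Usage with made-up log-likelihoods for s = 6 species (df = 4).
stat, df, p = clock_lrt(6, lnl_clock=-2345.6, lnl_no_clock=-2339.1)
print(f"2*delta lnL = {stat:.2f}, df = {df}, p = {p:.4f}")
```

A small p-value rejects the clock. As the slides note, a large p-value may reflect only a lack of information or power rather than genuine rate constancy.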
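
The contrast between hard and soft bounds on a fossil calibration age can be made concrete with two prior densities. The sketch below rests on assumptions beyond the slides: the 5% of weight outside (tL, tU) is split evenly as 2.5% per tail, and the tails use a power-law shape below tL and an exponential decay above tU, chosen so that the density is continuous at both bounds. The function names and the example bounds are hypothetical.

```python
import math


def hard_bound_density(t: float, t_lower: float, t_upper: float) -> float:
    """Uniform prior U(tL, tU): zero probability outside the bounds."""
    if t_lower <= t <= t_upper:
        return 1.0 / (t_upper - t_lower)
    return 0.0


def soft_bound_density(t: float, t_lower: float, t_upper: float,
                       tail: float = 0.025) -> float:
    """Flat inside (tL, tU), with probability `tail` in each tail (assumed split)."""
    if not 0.0 < t_lower < t_upper:
        raise ValueError("require 0 < t_lower < t_upper")
    if t <= 0.0:
        return 0.0
    # Height of the flat part so that it carries 1 - 2*tail of the mass.
    height = (1.0 - 2.0 * tail) / (t_upper - t_lower)
    if t < t_lower:
        # Power-law lower tail, integrating to `tail` over (0, tL).
        theta = height * t_lower / tail
        return tail * (theta / t_lower) * (t / t_lower) ** (theta - 1.0)
    if t > t_upper:
        # Exponential upper tail, integrating to `tail` over (tU, infinity).
        theta = height / tail
        return tail * theta * math.exp(-theta * (t - t_upper))
    return height


# Usage: a calibration between 10 and 20 time units (illustrative values).
for age in (9.5, 15.0, 20.5):
    print(age, hard_bound_density(age, 10.0, 20.0),
          soft_bound_density(age, 10.0, 20.0))
```

Under the hard bound an MCMC sampler can never propose an accepted age outside the interval, so a misplaced fossil cannot be overruled by the sequence data. The soft bound leaves a small but nonzero density there.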