Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Building Phylogenies Parsimony 1 Methods • Distance-based • Parsimony • Maximum likelihood Note • Some of the following figures come from: – [S05] Swofford http://www.csit.fsu.edu/~swofford/bioin formatics_spring05 – [F05] Felsenstein http://evolution.gs.washington.edu/gs54 1/2005/ Parsimony methods • Goal: Find the tree that allows evolution of the sequences with the fewest changes. • This is called a most parsimonious (MP) tree • Parsimony is implemented in PAUP* http://paup.csit.fsu.edu/ • Compatibility methods are closely related to parsimony: – Goal: Find tree that perfectly fits the most characters. Evolutionary Steps A G G A G Steps can have weights Parsimony A B C D a 0 1 1 1 b 0 1 1 1 c 0 0 1 1 d 0 1 1 0 e 0 0 0 1 f 1 0 0 0 a, b f A d B c d C Typically, each site is treated separately e D Some numbers Number of unrooted trees on n 2 species: Un = (2n5)(2n7)(2n9) . . . (3)(1), Number of rooted trees on n 3 species: Rn = (2n5) Un The number of rooted trees [F05] Small versus Large Parsimony • Parsimony score of a tree: The smallest (weighted) number of steps required by the tree • (Large) Parsimony: Find the tree with the lowest parsimony score • Small Parsimony: Given a tree, find its parsimony score • Small parsimony is by far the easier problem. – Used to solve large parsimony A DNA data set [F05] An example tree [F05] Most parsimonious states for site 1 Most parsimonious states for site 2 Most parsimonious states for site 3 Most parsimonious states for sites 4 and 5 Most parsimonious states for site 6 Evolutionary steps on tree Only one choice of reconstruction at each site is shown 9 steps in all Algorithms for Small Parsimony • Fitch’s algorithm: – Based on set operations – Evolutionary steps have same weight • Sankoff’s algorithm: – Based on dynamic programming – Allows steps to have different weights • Both algorithms compute the minimum (weighted) number of steps a tree requires at a given site. Fitch’s Algorithm • Each node v in tree has a set X(v) • If v is a leaf (tip), X(v) is the nucleotide observed at v – if there is ambiguity, X(v) contains all possible nucleotides at v • If v is a node with descendants u and w, – Let Y X(u) X(w) – If Y make X(v) Y, – If Y make X(v) X(u)X(w) and count one step. Fitch’s Algorithm: Example [F05] Sankoff’s Algorithm • Let cij be the cost of going from state i to state j. • E.g., transitions (AG or CT) are more probable than transversions, so give lower weight to transitions • Let Sv(k) be the smallest (weighted) number of steps needed to evolve the subtree at or above node v, given that node v is in state k. Sankoff’s Algorithm • If v is a leaf (tip) if node v has (or could have) state k 0 Sv (k ) otherwise • If v is a node with descendants u and w Sv k min cki Su i min ckj Sw j i j • The minimum number of (weighted) steps is S * min Sroot k k Sankoff’s Algorithm: Example Sankoff’s Algorithm: Traceback Searching for an MP tree • Exhaustive search (exact) • Branch-and-bound search (exact) • Heuristic search methods – Stepwise addition – Branch swapping – Star decomposition Homology, orthology, and paralogy • Homology: Similarity attributed to descent from a common ancestor. • Orthologous sequences: Homologous sequences in different species that arose from a common ancestral gene during speciation; may or may not be responsible for a similar function. • Paralogous sequences: Homologous sequences within a single species that arose by gene duplication. Orthology and Paralogy http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html