The Genome Access Course
Phylogenetic Analysis

Phylogenetics
• Developed by Willi Hennig (Grundzüge einer Theorie der Phylogenetischen Systematik, 1950; Phylogenetic Systematics, 1966)

What is the ancestral sequence?
• pfeffer
• pepper
• (pf/p)e(ff/pp)er

Evolutionary Trees
• A tree is a connected, acyclic 2D graph
• Leaf: Taxon
• Node: Vertex
• Branch: Edge
• Tree length = sum of all branch lengths
• Phylogenetic trees are binary trees

[Figure: A Generic Tree]

Evolutionary Trees
• Rooted
  – common ancestor
  – unique path to any leaf
  – directed
• Unrooted
  – root could be placed anywhere
  – fewer possible than rooted

[Figure: Rooted Tree generated by DRAWGRAM (PHYLIP)]
[Figure: Unrooted Tree generated by DRAWTREE (PHYLIP)]

Possible Evolutionary Trees
Rooted:   (2n-3)! / (2^{n-2} (n-2)!)
Unrooted: (2n-5)! / (2^{n-3} (n-3)!)

Taxa (n)    Rooted      Unrooted
2           1           1
3           3           1
4           15          3
5           105         15
6           945         105
7           10395       945
8           135135      10395
9           2027025     135135
10          34459425    2027025

Genes vs. Species
• Sequences show gene relationships, but phylogenetic histories may be different for gene and species
• Genes evolve at different speeds
• Horizontal gene transfer

Methods for Phylogenetic Analysis
• Character-State
  – Maximum Parsimony
  – Maximum Likelihood
• Genetic Distance
  – Fitch & Margoliash
  – Neighbor-Joining
  – Unweighted Pair Group

Phylogenetic Software
• PHYLIP
• PAUP (Available in GCG)
• TREE-PUZZLE
• PhyloBLAST
• Felsenstein maintains an extensive list of programs on the PHYLIP site

PHYLIP Programs
• dnapars/protpars
• dnadist/protdist
• dnaml (use fastDNAml instead)
• neighbor
• fitch/kitsch
• drawtree/drawgram

Maximum Parsimony
• Most common method
• Allows use of all evolutionary information
• Build and score all possible trees
• Each node is a transformation in a character state
• Minimize tree length
• Best tree requires the fewest changes to derive all sequences

Which is the more parsimonious tree?
[Figure: two candidate trees. First tree: 3 nodes, 9 node crossings. Second tree: 3 nodes, 8 node crossings.]

Maximum Likelihood
• Reconstruction using an explicit evolutionary model
• The likelihood of a tree is calculated separately for each nucleotide site; the product of the per-site likelihoods gives the overall likelihood of the observed data
• Demanding computationally
• Slowest method
• Use to test (or improve) an existing tree

Clustering Algorithms
• Use distances to calculate phylogenetic trees
• Trees are based on the relative numbers of similarities and differences between sequences
• A distance matrix is constructed by computing pairwise distances for all sequences
• Clustering links successively more distant taxa

DNA Distances
• Distances between pairs of DNA sequences are relatively simple to compute as the sum of all base pair differences between the two sequences
• Can only work for pairs of sequences that are similar enough to be aligned
• All base changes are considered equal
• Insertions/deletions are generally given a larger weight than replacements (gap penalties)
• Possible to correct for multiple substitutions at a single site, which are common in distant relationships and at rapidly evolving sites
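
The DNA distance described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular program: it computes the proportion of differing sites between two aligned sequences and then applies the Jukes-Cantor correction for multiple substitutions. The slides do not name a correction model, so Jukes-Cantor is an assumption here, chosen because it treats all base changes as equal. The function names `p_distance` and `jukes_cantor` are hypothetical, and gap columns are simply skipped rather than given a gap penalty.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are ignored, not penalised
        compared += 1
        if a != b:
            differences += 1  # every base change counts equally
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: sequences too divergent to correct")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(p, jukes_cantor(p))
```

The corrected distance is always at least as large as the raw proportion, and the gap between the two grows as the sequences diverge, which is why the correction matters most for distant relationships.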

Amino Acid Distances
• More difficult to compute
• Substitutions have differing effects on structure
• Some substitutions require more than one DNA mutation
• Use replacement frequencies (PAM, BLOSUM)

Fitch & Margoliash
• 3 sequences are combined at a time to define branches and calculate their length
• Additive branch lengths
• Accurate for short branches

Neighbor Joining
• Most common method of tree construction
• Distance matrix adjusted for each taxon depending on its rate of evolution
• Good for simulation studies
• Most efficient computationally

UPGMA – Unweighted Pair Group Method Using Arithmetic Averages
• Simplest method
• Calculates branch lengths between most closely related sequences
• Averages distance to next sequence or cluster
• Predicts a position for the root

Phylogenetic Complications
• Errors
• Loss of function
• Convergent evolution
• Lateral gene transfer

Validation
• Use several different algorithms and data sets
• NJ methods generate one tree, possibly supporting a tree built by parsimony or maximum likelihood
• Bootstrapping
  – Perturb data and note effect on tree
  – Repeat many times
  – If the tree is unchanged in ~90% of replicates, its correctness is supported

Are there bugs in our genome?
• N-acetylneuraminate lyase

The End
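
As a supplementary sketch of the UPGMA slide, the clustering loop can be written out in Python. This is an illustrative implementation under stated assumptions, not the code of PHYLIP or any other package: the function name `upgma`, the nested-tuple tree representation, and the example distance matrix are all hypothetical. It repeatedly joins the closest pair of clusters, averages distances to the remaining clusters weighted by cluster size, and places the root at half the distance of the final join.

```python
def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon labels
    dist:  symmetric matrix (list of lists) of pairwise distances
    Returns (tree, height): a leaf is its label, an internal node is a
    (left, right) tuple, and height is the root's distance to the leaves.
    """
    # Active clusters: id -> (subtree, number of leaves, node height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Join the two closest clusters.
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        tree_a, size_a, _ = clusters.pop(a)
        tree_b, size_b, _ = clusters.pop(b)
        # Distance to each remaining cluster is the size-weighted average.
        new_d = {}
        for c in clusters:
            d_ac = d[tuple(sorted((a, c)))]
            d_bc = d[tuple(sorted((b, c)))]
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop distances that involve the two merged clusters.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        # The new node sits halfway between the clusters it joins.
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b, d_ab / 2.0)
        next_id += 1
    tree, _, height = next(iter(clusters.values()))
    return tree, height


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D"]
    matrix = [
        [0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0],
    ]
    print(upgma(taxa, matrix))
```

Because UPGMA assumes a constant rate of evolution, it always produces a rooted tree with all leaves equidistant from the root; when rates differ between lineages, the tree it predicts can be misleading.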