Phylogeny

Modified by Dietlind Gerloff for use in BME110/BIOL181, Fall 2009, Winter 2009, Winter 2008
© Wiley Publishing. 2007. All Rights Reserved.

Learning Objectives
• Understand the most basic concepts of phylogeny
• Be able to compute simple phylogenetic trees
• Understand/remember the difference between orthologs and paralogs
• Understand what bootstrapping means in phylogeny

Outline
• Finding out what to do with phylogeny
• Gathering sequences to make a tree
• Preparing your multiple-sequence alignment
• Computing a tree
• Bootstrapping your tree to check its reliability
• Displaying your tree

Why Build a Phylogenetic Tree?
• Phylogenetic trees reconstruct the evolutionary history of your sequences
• They tell you who is closer to whom in the big tree of life
• Phylogenetic trees are based on sequence similarity rather than morphological characters

3 Ways to Use Your Gene Tree
• Finding the closest relative of your organism
    • Usually done with a tree based on the ribosomal RNA
• Discovering the function of a gene
    • Finding the orthologs of your gene
• Finding the origin of your gene
    • Finding whether your gene comes from another species

Orthology and Paralogy
• Orthologous genes
    • Separated by speciation
    • Often have the same function (but this is not a requirement!)
• Paralogous genes
    • Separated by duplications
    • Can have different functions
• In the graph:
    • A is paralogous with B
    • A1 is orthologous with A2
    • A1 is also paralogous with B2!

Working on the Right Data
• Garbage in, garbage out
• The quality of your tree depends on the quality of the data
• Your first task is to assemble a very accurate MSA

DNA or Proteins
• Most phylogenetic methods work on protein and DNA sequences
• If possible, always compute a multiple-sequence alignment on the protein sequences
    • Translate the sequences if the DNA is coding
    • Align the sequences
    • Thread the DNA sequences back onto the protein MSA with pal2nal (coot.embl.de/pal2nal, http://www.bork.embl.de/pal2nal/)
• If your DNA sequences are coding and have more than 70% identity . . .
    • Compute the tree on the DNA multiple-sequence alignment
• If your DNA sequences are coding and have less than 70% identity . . .
    • Compute the tree on the protein multiple-sequence alignment

Which Sequences?
• Orthologous sequences
    • If you want to produce a species tree
    • Show how the considered species have diverged
• Paralogous sequences
    • If you want to produce a gene tree, include them
    • Show the evolution of a protein family
    • May help speculate about the function of a protein of interest

Establishing Orthology
• Establishing orthology is very complicated
• It is common practice to establish orthology using reciprocal best BLAST hits
    • A is a gene of Genome X
    • B is a gene of Genome Y
    • BLAST (Gene A against Genome Y) = B
    • BLAST (Gene B against Genome X) = A
    (Note: the COG functional annotation that we encountered early in the course approximately follows this same idea, only it considers more than two species.)
• A is B’s best friend and B is A’s best friend…
• Phylogeny purists dislike this method

Creating the Perfect Dataset
(Note: “recombinant” in the sense of hybrid, or chimeric, sequences; cloned expression is OK if the sequence is not altered)

Building the Right MSA
• Your MSA should have as few gaps as possible
    • Typical alignment editing step: consolidating gaps
• Some variability but not too much!
• Some conservation but not too much!

Building the Right Tree
• There are two types of tree-reconstruction methods
    • Distance-based methods
    • Statistical methods
• Statistical methods are the most accurate
    • Maximum likelihood
    • Parsimony
    (Note: parsimony methods are intuitive (they favor using as few mutations as possible) but are only deemed to be accurate for very closely related genes/proteins.)
• Statistical methods take more time
    • Limited to small datasets

Distance-based Methods for Tree Reconstruction
• Distance-based methods are the most popular
    • Neighbor Joining (NJ)
    • UPGMA (Note: most MSA guide trees are UPGMA trees. NEVER use those for actual phylogeny and/or publications.)
• Distance-based methods involve 2 steps:
    • Measure the distances between pairs of sequences in the MSA
    • Transform the distance matrix into a tree
    (Note: some distance-based methods do not use MSAs but pair-wise alignments; this is not optimal, but one can obtain decent trees that way if the proper stats/computation is applied!)
• The two most popular packages for making trees are
    • ClustalW: very simple and easily obtained/learned, but not very sophisticated (best avoided for phylogeny, see above)
    • Phylip: very powerful, less user-friendly
    • TreeTop (Genebee server): a web server that produces high-quality trees

Which Format?
• Trees are displayed in graphic formats
• Always keep a version of your tree in newick format
    • Also called new-hampshire, or nh
    • Note the parentheses in this format
• Display your nh tree with "Phylodendron" (for example)
    • Use iubio.bio.indiana.edu/treeapp

Reading Your Tree
• There’s a lot of vocabulary in a tree
• Nodes correspond to common ancestors
• The root is the oldest ancestor
    • Often artificial
    • Only meaningful with a good outgroup
• Trees can be un-rooted
• Branch lengths are only meaningful when the tree is scaled*
    • Cladograms are often scaled
    • Phenograms are usually unscaled
*(Spotting the difference is easy: unscaled trees look very/too regular!)
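The two steps of a distance-based method, and the parentheses of the newick format, can be made concrete with a small Python sketch. Everything in it is an illustrative assumption rather than part of the course material: the function names `p_distance` and `upgma_newick`, the toy alignment, and the choice of the uncorrected p-distance. UPGMA is used only because it fits in a few lines; as noted above, it should not be used for actual phylogeny, where Neighbor Joining or a statistical method is the better choice.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of differing positions, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def upgma_newick(msa):
    """Step 1: pairwise distances. Step 2: cluster them into a tree (UPGMA).

    `msa` maps sequence names to aligned sequences of equal length.
    Returns the tree as a newick string with branch lengths.
    """
    # Each cluster is keyed by its newick text; the value is (size, height).
    clusters = {name: (1, 0.0) for name in msa}
    dist = {frozenset((a, b)): p_distance(msa[a], msa[b])
            for a, b in combinations(msa, 2)}
    while len(clusters) > 1:
        # Join the two closest clusters (ties broken by name, for determinism).
        pair = min(dist, key=lambda k: (dist[k], sorted(k)))
        a, b = sorted(pair)
        (na, ha), (nb, hb) = clusters.pop(a), clusters.pop(b)
        height = dist.pop(pair) / 2
        merged = f"({a}:{height - ha:.3f},{b}:{height - hb:.3f})"
        # Distance from the new cluster to every other one: size-weighted average.
        for c in clusters:
            dac = dist.pop(frozenset((a, c)))
            dbc = dist.pop(frozenset((b, c)))
            dist[frozenset((merged, c))] = (na * dac + nb * dbc) / (na + nb)
        clusters[merged] = (na + nb, height)
    return next(iter(clusters)) + ";"

# Toy alignment (made-up sequences, for illustration only).
toy_msa = {
    "human": "ACGTACGTAC",
    "chimp": "ACGTACGTAT",
    "mouse": "ACGAACTTAT",
    "fly":   "TCGAATTTGT",
}
print(upgma_newick(toy_msa))
```

The printed string nests one pair of parentheses per internal node and ends with a semicolon, which is the form that tree viewers expect when they read a newick file.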
A good tree figure contains an indication of distance (scale) and of branching confidence (e.g. bootstrap values). Sadly, we only rarely see these…

Bootstrapping
• Use bootstrapping to verify the solidity of each node
• ClustalW and Phylip do bootstrap operations automatically
• Bootstrapping involves these steps:
    • Select a random sample of columns from your MSA (drawn with replacement)
    • Redo the tree
    • Repeat this operation N times (100 or 1000 times if you can)
    • Compute a consensus tree of the N trees
    • Measure how many of the N trees agree with the consensus tree on each node
• Each node gets a bootstrap figure between 0 and N
• A high bootstrap figure indicates a good node

A Bootstrapped Tree
• This tree was produced with 2 bootstrap cycles
• It shows some nodes as more robust than others
• In practice, always use at least 100 cycles (e.g. TreeTop offers 100)
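The bootstrap loop can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not what ClustalW or Phylip actually run: columns are resampled with replacement (the usual bootstrap scheme), the "tree" is reduced to a single grouping (the closest pair of sequences), and each replicate is compared with the grouping found on the original alignment instead of with a consensus tree. The names `resample_columns`, `closest_pair`, `bootstrap_support` and the toy alignment are hypothetical.

```python
import random
from itertools import combinations

def resample_columns(msa, rng):
    """Draw alignment columns with replacement, keeping the original length."""
    length = len(next(iter(msa.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in msa.items()}

def closest_pair(msa):
    """Toy stand-in for a tree builder: return only the closest pair of sequences."""
    def mismatches(a, b):
        return sum(x != y for x, y in zip(a, b))
    pair = min(combinations(sorted(msa), 2),
               key=lambda p: mismatches(msa[p[0]], msa[p[1]]))
    return frozenset(pair)

def bootstrap_support(msa, grouping, n_replicates=100, seed=0):
    """Count in how many resampled alignments the original grouping reappears."""
    rng = random.Random(seed)
    reference = grouping(msa)
    hits = sum(grouping(resample_columns(msa, rng)) == reference
               for _ in range(n_replicates))
    return reference, hits

# Toy alignment (made-up sequences, for illustration only).
toy_msa = {
    "human": "ACGTACGTAC",
    "chimp": "ACGTACGTAT",
    "mouse": "ACGAACTTAT",
    "fly":   "TCGAATTTGT",
}
group, hits = bootstrap_support(toy_msa, closest_pair, n_replicates=100)
print(sorted(group), f"supported in {hits} of 100 replicates")
```

In a real analysis the `grouping` argument would be replaced by a full tree-building step, and the support would be counted for every node of the tree.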