“Nothing in biology makes sense except in the light of evolution.”
“Scientists often have a naive faith that if only they could discover enough facts about a problem, these facts would somehow arrange themselves in a compelling and true solution.”
Theodosius Dobzhansky (1900-1975)

Good sources of information on molecular phylogenetics and tree reconstruction
• Freeman and Herron, 4th ed.
• Hillis, Moritz, Mable, 2nd ed.
• Hartl and Clark, 4th ed.

Phylogenetic Estimation
• Deriving hypotheses about the evolutionary history of lineages based on molecular data
  – Species delineation
  – Phylogeography
  – Character evolution
  – Lots more

Pedagogical Considerations
• Phylogenetics is an interdisciplinary science
  – Genetics
  – Evolutionary processes (multiple levels)
  – Life history and natural history of organisms
  – Geological history
  – Statistics and mathematical algorithms

Intro to Phylogenetics
Types of characters:
• nuclear DNA
• mt/chloroplast DNA
• restriction fragment data (RFLPs)
• DNA fingerprinting (microsatellites)
• proteins (allozymes, aa sequence)
Samples or representations of genetic material to capture “phylogenetic signal”

Phylogenetics in the genomics age…
• Extensions to genomic-level analyses and questions:
  – Genome-wide sequence divergence and phylogenetic analysis: whole mtDNA genome vs. parts
    • How many bp best resolve phylogenetic hypotheses?
  – How are mtDNA and nuclear DNA variation related?
  – Does mtDNA sequence diversity within lineages correlate with genome size variation?
  – Can you use functional/structural protein sequences for phylogenetic analyses if you survey enough of them?

Genetic diversity/no morphological diversity
Plethodontid salamanders
[Figure: mtDNA sequence tree with tips P. hubrichti, RM, BM, DG VA, GF, WT VA, SI, CW, RBB, BR, CM, PG, SM, ML and CD, grouped into clades N1, N2, S1, S2, S3; Desmognathus wrighti (pygmy salamander) clades N1, N2, S1, S2, S3 from combined mtDNA and allozyme data]

Morphological diversity/no genetic diversity
Finches and widowbirds
• Sexual dimorphism
• Resource partitioning
• Nutritional effects
• Recent isolation
• Founder effect

Genomic/mtDNA extraction → PCR of target gene sequence → DNA sequence → Alignment → Phylogenetic analysis

Assumptions of phylogenetic analyses
• Common descent
• Characters must reflect genetic inheritance
• Characters evolve independently
• No homoplasy – i.e., event-by-event recounting of fixed mutations in a lineage over time
• No polarity in character states unless an outgroup is specified (based on other types of data)
• Intertaxon variation > intrataxon variation

Forefathers of Phylogenetics
Charles Darwin (1809-1882), Sewall Wright (1889-1988), Motoo Kimura (1924-1994)

Neutral Theory Paradigm
• The majority of base substitutions that become fixed in populations are neutral with respect to fitness
• Regions of the genome that are under selection are not appropriate for detection of phylogenetic signal
• Genetic mutation is the source of genetic variation
• Genetic drift dominates evolution at the level of DNA sequence

Mutation
• Heritable change in the DNA sequence
  – Point mutation
  – Insertions/deletions (recombination)
  – Transposable elements
• Mutation rates are not equal throughout the genome

Variation in mutation rates among genomic regions
• Coding sequences (exons, code for proteins)
• Non-coding sequences (introns)
• Regulatory regions (5’UTR, 3’UTR, promoters)
• Pseudogenes (non-functional gene relicts)
• Wobble position nucleotides
  – Synonymous (silent) vs. non-synonymous (replacement)
• Variation due to function of protein product
(Hartl & Clark, Principles of Population Genetics)

Kinds of mutations occur at different rates
• Genes/regions that best detect phylogenetic signal conform to neutral theory predictions
• Models of evolution are used to incorporate variation in mutation rates within the data (based on molecular genetic processes) for more realistic estimations of evolutionary history
(Hartl & Clark, Principles of Population Genetics)

Molecular clocks
• Implicitly used when choosing a region to assay for variation, given the expected evolutionary distance of interest
• Explicitly used when attempting to date divergence times
• Divergence times estimated from DNA variation need to be calibrated with historical geological dates/events
• Lots of debate and criticism about the use of molecular clocks
[Figure from Hartl & Clark, Principles of Population Genetics]

Molecular clocks
When is a molecule not appropriate?
• Saturation (homoplasy)
Questions to ask yourself
• Do molecular clocks tick evenly through time?
• Is there a geological date/event for calibration?
• Are geological calibrations useful?
Molecules can evolve at different rates than organisms (or other molecules)!
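
The saturation and calibration points above can be made concrete with a small sketch. It is not taken from the slides: it assumes the simple Jukes-Cantor correction (one of several possible models of evolution) and a strict clock, and the function names and the rate argument are hypothetical, chosen only for illustration.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites that differ; sites with a gap are skipped."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor estimate of substitutions per site from an observed p-distance.

    The correction grows without bound as p approaches 0.75, which is the
    saturation problem: multiple hits at the same site hide earlier changes.
    """
    if p >= 0.75:
        raise ValueError("p >= 0.75: saturated under the Jukes-Cantor model")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def divergence_time(d: float, rate_per_site_per_myr: float) -> float:
    """Strict-clock divergence time: distance d accumulates along two lineages.

    rate_per_site_per_myr is an assumed calibration (e.g. from a dated
    geological event); it is a hypothetical input, not a value from the slides.
    """
    return d / (2.0 * rate_per_site_per_myr)


if __name__ == "__main__":
    for p in (0.05, 0.30, 0.60, 0.70):
        print(f"observed p = {p:.2f}  corrected d = {jukes_cantor(p):.3f}")
```

At small observed distances the corrected and observed values nearly agree, while near p = 0.75 the correction becomes very large and unstable. That is one way to see why a saturated molecule is a poor clock for deep divergences.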
28S rRNA partial sequence

gi|38154450|gb|AY452491.1|      CGCCCGATGCCGACGCTCATCAGACCCCAGAAAAGGTGTTGGTCGATATA
gi|1144505|gb|U34341.1|OMU3434  CGCCCGATGCCGACGCTCATCAGACCCCAGAAAAGGTGTTGGTCGATATA
gi|1144500|gb|U34340.1|ABU3434  CGCCCGATGCCGACGCTCATCAGACCCCAGAAAAGGTGTTGGTTGATATA
                                ******************************************* ******

FOR PRIMER (forward primer)

gi|38154450|gb|AY452491.1|      GACAGCAGGACGGTGGCCATGGAAGTCGGAATCCGCTAAGGAGTGTGTAA
gi|1144505|gb|U34341.1|OMU3434  GACAGCAGGACGGTGGCCATGGAAGTCGGAATCCGCTAAGGAGTGTGTAA
gi|1144500|gb|U34340.1|ABU3434  GACAGCAGGACGGTGGCCATGGAAGTCGGAATCCGCTAAGGAGTGTGTAA
                                **************************************************

gi|38154450|gb|AY452491.1|      CAACTCACCTGCCGAATCAACTAGCCCTGAAAATGGATGGCGCTGGAGCG
gi|1144505|gb|U34341.1|OMU3434  CAACTCACCTGCCGAATCAACTAGCCCTGAAAATGGATGGCGCTGGAGCG
gi|1144500|gb|U34340.1|ABU3434  CAACTCACCTGCCGAATCAACTAGCCCTGAAAATGGATGGCGCTGTAGCG
                                ********************************************* ****

gi|38154450|gb|AY452491.1|      TCGGGCCCATACCCGGCCGTCGCCGGCAACAGGAGCCGCGAGGGCTATGC
gi|1144505|gb|U34341.1|OMU3434  TCGGGCCCATACCCGGCCGTCGCTGGCAACGAGAGCCTCGAGGGCTATGC
gi|1144500|gb|U34340.1|ABU3434  TCGGGCCCATACCCGGCCGTCGCCGGCCACGGGAGCCTCGCAGGCTATGC
                                *********************** *** **  ***** **  ********

gi|38154450|gb|AY452491.1|      CGCGACGAGTAGGAGGGCCGCCGCGGTGAGCACGGAAGCCTAGGGCGTGG
gi|1144505|gb|U34341.1|OMU3434  CGCGACGAGTAGGAGGGCCGCCGCGGTGAGCACGGAAGCCTAGGGCGCGG
gi|1144500|gb|U34340.1|ABU3434  CGCGACGAGTAGGAGGGCCGCCGCGGTGGGCACTGAAGCCTAGGGCGAGG
                                **************************** **** ************* **

gi|38154450|gb|AY452491.1|      GCCCGGGTGGAGCCGCCGCGGGTGCAGATCTTGGTGGTAGTAGCAAATAT
gi|1144505|gb|U34341.1|OMU3434  GCCCGGGTGGAGCCGCCGCGGGTGCAGATCTTGGTGGTAGTAGCAAATAT
gi|1144500|gb|U34340.1|ABU3434  GCCCGGGTGGAGCCGCCGCAGGTGCAGATCTTGGTGGTAGTAGCAAATAT
                                ******************* ******************************

gi|38154450|gb|AY452491.1|      TCAAACGAGAACTTTGAAGGCCGAAGTGGAGAAGGGTTCCATGTGAACAG
gi|1144505|gb|U34341.1|OMU3434  TCAAACGAGAACTTTGAAGGCCGAAGTGGAGAAGGGTTCCATGTGAACAG
gi|1144500|gb|U34340.1|ABU3434  TCAAACGAGAACTTTGAAGACCGAAGTGGAGAAGGGTTCCATGTGAACAG
                                ******************* ******************************

gi|38154450|gb|AY452491.1|      CAGTTGAACATGGGTCAGTCGGTCCTAAGAGATGGGCGAACGCCGTTCGG
gi|1144505|gb|U34341.1|OMU3434  CAGTTGAACATGGGTCAGTCGGTCCTAAGAGATGGCCGAACGCCGTTCGG
gi|1144500|gb|U34340.1|ABU3434  CAGTTGAACATGGGTCAGTCGGTCCTAAGAGATAGGCGAATCCCGTTCTG
                                ********************************* * ****  ****** *

gi|38154450|gb|AY452491.1|      AAGGGTGGGGCGATGGCCTACGTCGCCCCCGGCCGATCGAAAGGGAGTCG
gi|1144505|gb|U34341.1|OMU3434  AAGGGAGGGGCGATGCCCTCCGTCGCCCCCGGCCGATCGAAAGGGAGTCG
gi|1144500|gb|U34340.1|ABU3434  AAAGGAGGGACGATGACCTCCGTCGCCCCCGGCTGATCGAAAGGGAGTCG
                                ** ** *** ***** *** ************* ****************

gi|38154450|gb|AY452491.1|      GGTTCAGATCCCCGAATCTGGAGTGGCGGAGATAGGCGCCGCGAGGCGTC
gi|1144505|gb|U34341.1|OMU3434  GGTTCAGATCCCCGAATCCGGAGTGGCGGAGATGGGCGCCGCGAGGCGTC
gi|1144500|gb|U34340.1|ABU3434  GGTTCAGATCCCCGAATCCGGAGTGGCGGAGACGGCCGCCGCGAGGCGTC
                                ****************** *************  * **************

gi|38154450|gb|AY452491.1|      CAGTGCGGTAACGCAAACGATCCCGGAGGAGCTGGCGGGAGCCCCGGGGA
gi|1144505|gb|U34341.1|OMU3434  CAGTGCGGTAACGCGACCGATCCCGGAGAAGCTGGCGGGAGCCCCGGGGA
gi|1144500|gb|U34340.1|ABU3434  CAGTGCGGTAACGCAACCGATCCCGGAGAAGCCGGCGAGAGCCCCGGAGA
                                ************** * *********** *** **** ********* **

gi|38154450|gb|AY452491.1|      GAGTTCTCTTTTCTTTGTGAAGGGCAGGGCGCCCTGGAATGGGTTCGCCC
gi|1144505|gb|U34341.1|OMU3434  GAGTTCTCTTTTCTTTGTGAAGGGCAGGGCGCCCTGGAATGGGTTCGCCC
gi|1144500|gb|U34340.1|ABU3434  GAGTTCTCTTTTCTTTGTGAAGGGCAGGCCACCCTGGAATGGGTTCCCCC
                                **************************** * *************** ***

gi|38154450|gb|AY452491.1|      CGAGAGAGGGGCCCGTGCCCTGGAAAGCGTCGCGGTTCCGGCGGCGTCCG
gi|1144505|gb|U34341.1|OMU3434  CGAGAGAGGGGCCCAAGCCCTGGAAAGCGTCGCGGTTCCGGCGGCGTCCG
gi|1144500|gb|U34340.1|ABU3434  CGAGAGAGGGGCCCGCGCCTTGGAAAGCGTCGCGGTTCCGGCGGCGTCCG
                                **************  *** ******************************

gi|38154450|gb|AY452491.1|      GTGAGCTCTCGCTGGCCCTTGAAAATCCGGGGGAGAAGGTGTAAATCTCG
gi|1144505|gb|U34341.1|OMU3434  GTGAGCTCTCGCTGGCCCTTGAAAATCCGGGGGAGAGGGTGTAAATCTCG
gi|1144500|gb|U34340.1|ABU3434  GTGAGCTCTCGCTGGTCCTTGAAAATCCGGGGGAGAAGGTGTAAATCTCG
                                *************** ******************** *************

gi|38154450|gb|AY452491.1|      CGCCAGGCCGTACCCATATCCGCAGCAGGTCTCCAAGGTGAACAGCCTCT
gi|1144505|gb|U34341.1|OMU3434  CGCCGGGCCGTACCCATATCCGCAGCAGGTCTCCAAGGTGAACAGCCTCT
gi|1144500|gb|U34340.1|ABU3434  CGCCGGGCCGTACCCATATCCGCAGCAGGTCTCCAAGGTGAACAGCCTCT
                                **** *********************************************

REV PRIMER (reverse primer)

gi|38154450|gb|AY452491.1|      GGCATGTTAGATCAAGGTAGATAAGGGAAGTCGGCAAATCAGATCCGTAA
gi|1144505|gb|U34341.1|OMU3434  GGCATGTTAGAACAATGTATGTAAGGGAAGTCGGCAAGTCAGATCCGTAA
gi|1144500|gb|U34340.1|ABU3434  GGCATGTTAGAACAATGTAGGTAAGGGAAGTCGGCAAGTCAGATCCGTAA
                                *********** *** ***  **************** ************

gi|38154450|gb|AY452491.1|      CTTCGGGATAAGGATTGGCTCTAAGGGCTGGGTCGGTCGGGCTGGAGTGC
gi|1144505|gb|U34341.1|OMU3434  CTTCGGGATAAGGATTGGCTCTAAGGGCTGGGTCGGTCGGGCTGGGGTGC
gi|1144500|gb|U34340.1|ABU3434  CTTCGGGATAAGGATTGGCTCTAAGGGCTGGGTCGGTCGGGCTGGGGTGC
                                ********************************************* ****

gi|38154450|gb|AY452491.1|      GAAGCGGGGCTGGGCTCGTGCCGCGGCTGGGGGAGCAGTCGCCCCGTCGC
gi|1144505|gb|U34341.1|OMU3434  GAAGCGGGGCTGGGCTCGAGCCGCGGCTGGGGGAGCAGTTGCTCCGCCTC
gi|1144500|gb|U34340.1|ABU3434  GAAGCGGGGCTGGGCACGCGCCGCGGCTGGACGAG-----GCGTCGCCT
                                *************** ** ***********  ***     **  ** *

Alignment rules of thumb:
• Assumption that similarity in sequence reflects homology
• Best to use the same number of characters across operational taxonomic units (OTUs)
• Gaps are problematic for algorithms even though they may be evolutionarily important
  – minimize gaps
  – check for reliability of sequence, etc.
  – gaps can be considered a 5th character state, or included in some other way in the analysis, in some programs

Forefathers of Phylogenetic Analyses
• Willi Hennig (father of cladistics)
• Masatoshi Nei and Joseph Felsenstein (fathers of our favorite phylogenetic algorithms and phylogenetic statistics)

Basic steps of phylogenetic estimation…
1. Define a specific sequence of steps (algorithm) for constructing the best tree from a set of possible phylogenies
2. Define criteria for comparing alternate phylogenies to determine which is best (optimality criterion statistic)
[Figure: example tree of D. affinidisjuncta, D. heteroneura, D. adiastola, D. mimica, D. nigra, S. albovittata, D. crassifemur, D. mulleri, S. lebanonensis, D. melanogaster and D. pseudoobscura, with node support values (74-100) and a 0.02 scale bar]

Types of tree construction methods…
(computation intensity increases down the list)
• Distance Methods (minimum evolution)
  • based on calculated pairwise distance statistics
  • the smallest value of the sum of all branches is taken as the estimate of the correct tree (additive tree)
• Maximum Parsimony
  • based only on characters that vary among sequences
  • calculates the most efficient tree length (tree value is the least number of changes needed to create the phylogeny)
• Maximum Likelihood**
• Bayesian Analyses**
** beyond scope of MEGA, most undergraduates

Distance methods
Kinds: UPGMA, Neighbor-Joining, Wagner, etc.
Additive (e.g., neighbor joining) or ultrametric (UPGMA)
[Figure: the same example tree]

Distance matrix
        OTU1    OTU2    OTU3
OTU2    .256
OTU3    .056    .139
OTU4    .176    .222    .312

Pros:
• uses similarity and differences in measure
• simple to calculate and faster to compute
• statistical methods to evaluate trees
• can estimate genetic distances from branch lengths
Cons:
• doesn’t take into consideration models of evolution
• reduced phylogenetic information

Maximum Parsimony
Moderate computing intensity
• Exhaustive searches are most intense (all trees are found and evaluated)
• Heuristic searches (not all trees are found and evaluated independently): branch and bound, closest neighbor swapping, min-mini algorithm
Pros:
• Follows philosophy of evolutionary theory: intuitive
• Multiple data sets (genes) can be combined in one analysis
• statistical methods to evaluate trees
• can estimate genetic distances from branch lengths
Cons:
• doesn’t use models of evolution as sophisticated as those of maximum likelihood
• Only uses parsimony-informative characters (differences)

Statistical tests for reliability of tree
Are nodes found repeatedly and not due to chance arrangements?
1. Bootstrapping
  • Resampling the data with replacement
  • Repeating 500-1000 times
  • Statistical probability of node formation
  • Strong phylogenetic signals should form nodes despite this rearrangement
  • Parsimony, neighbor joining, minimum evolution
2. Compare total branch lengths among trees
  • Neighbor joining and minimum evolution algorithms
3. Interior Branch Length Test
  • Are interior branch lengths significantly different from 0, using standard errors (maybe a node should be trifurcating)?
  • Neighbor joining and minimum evolution algorithms
[Figure: the same example tree, with node support values]

Consensus and collapsed trees
Collapse uncertain nodes

Consensus vs. Combination

New topologies and gene trees
Gene tree of CRF family peptides in vertebrates (Boorse and Denver, 2005)

Phylogenetics software
383 phylogeny packages and 52 free servers
PAUP, PHYLIP, MacClade, Mesquite, MrBayes, MEGA
http://evolution.genetics.washington.edu/phylip/software.html

MEGA tutorial
1. Importing sequences
2. Alignment
3. Sequence statistics
4. Phylogenetic estimation
5. Visualization of trees
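
As a worked illustration of the distance methods described earlier, the sketch below clusters the four-OTU distance matrix from the "Distance methods" slide with UPGMA. It is a minimal sketch, not MEGA's implementation: the `upgma` function and its nested-tuple output are hypothetical choices made for this example, and reading .312 as the OTU3-OTU4 distance is an assumption about the slide's layout.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a tree (nested tuples) by UPGMA, i.e. average-linkage clustering.

    names: list of OTU labels
    dist:  dict mapping frozenset({a, b}) to the pairwise distance
    """
    # Each cluster label maps to (subtree, number of OTUs inside it).
    clusters = {name: (name, 1) for name in names}
    d = dict(dist)  # work on a copy so the caller's matrix is untouched
    while len(clusters) > 1:
        # Join the two clusters separated by the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        tree_a, n_a = clusters.pop(a)
        tree_b, n_b = clusters.pop(b)
        merged = f"({a},{b})"
        # Distance from the new cluster to every other cluster is the
        # size-weighted mean of the two distances it replaces.
        for other in clusters:
            d[frozenset((merged, other))] = (
                n_a * d[frozenset((a, other))] + n_b * d[frozenset((b, other))]
            ) / (n_a + n_b)
        clusters[merged] = ((tree_a, tree_b), n_a + n_b)
    tree, _size = next(iter(clusters.values()))
    return tree


# Values from the slide's distance matrix (placement of .312 is assumed).
names = ["OTU1", "OTU2", "OTU3", "OTU4"]
dist = {
    frozenset(("OTU1", "OTU2")): 0.256,
    frozenset(("OTU1", "OTU3")): 0.056,
    frozenset(("OTU2", "OTU3")): 0.139,
    frozenset(("OTU1", "OTU4")): 0.176,
    frozenset(("OTU2", "OTU4")): 0.222,
    frozenset(("OTU3", "OTU4")): 0.312,
}
tree = upgma(names, dist)
print(tree)
```

With these values OTU1 and OTU3 (distance .056) are joined first, then OTU2, then OTU4. UPGMA assumes an ultrametric (clock-like) tree, so on real data neighbor joining may recover a different topology.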