Lecture 1.1: Phylogenetic Trees
Introduction to Molecular Phylogenetics
Nathan Lo

What is a phylogenetic tree?
• Phylogeny: the true evolutionary relationships among a set of organisms

Phylogenetic trees
• Topology (relationships)
• Branch lengths (amount of evolutionary change or time)

Phylogenetic trees
[Figure labels: root; internal node (divergence event); branch or edge; tip, leaf, or terminal node; clade; sister clade; stem lineage; crown group; total group]

Cladistic terms
[Figure labels: monophyletic; paraphyletic; polyphyletic; polytomy]
• Polytomies can be hard or soft

Phylogenetic trees: Cladogram
• Branch lengths have no meaning

Phylogenetic trees: Phylogram
• Branch lengths measure amount of evolutionary change

Phylogenetic trees: Chronogram
• Branch lengths measure time (yr or Myr)

Phylogenetic trees
• Vertical scale does not represent anything
• Rotating around nodes does not change the meaning of a tree

Phylogenetic trees: Circular
• Root is placed in centre
• Cladogram, phylogram, or chronogram
• Often used to display large trees
• Difficult to interpret
[Figure: Jetz et al. (2012) Nature]

Phylogenetic trees: Unrooted
• When position of root is unknown
• Root by including outgroup taxa

Rooting
• Include outgroup taxa
  • Taxon closely related to ingroup
  • Taxon is not part of ingroup
• Root at midpoint
  • Highly unreliable if internal branches are short
• Use a molecular clock
  • Automatically estimates position of root

Phylogenetic trees: Newick format
Without branch lengths (cladogram):
(Monotremes,(Marsupials,((Elephant,Armadillo),(((Squirrel,Rabbit),(Monkey,Treeshrew)),(Shrew,(Whale,(Bat,(Cat,Rhinoceros))))))));
With branch lengths (phylogram/chronogram):
(Monotremes:12.0,(Marsupials:11.0,((Elephant:1.0,Armadillo:1.0):9.0,(((Squirrel:1.0,Rabbit:1.0):2.0,(Monkey:1.0,Treeshrew:1.0):2.0):5.0,(Shrew:4.0,(Whale:3.0,(Bat:2.0,(Cat:1.0,Rhinoceros:1.0):1.0):1.0):1.0):4.0):2.0):1.0):1.0);

Phylogenetic analysis
• Sometimes we know the phylogeny
  • Viral transmission histories
  • Pedigrees (humans, domesticated animals, lab organisms, etc.)
• Usually we do not know the phylogeny but we can estimate it
  • Morphological data
  • Molecular data

Molecular phylogenetics
1. Data preparation
  • Taxon and gene sampling
  • Data filtering
  • Sequence alignment
2. Phylogenetic inference
  • Model selection
  • Estimation of tree
  • Further analysis and interpretation

Molecular data
• Select data to optimise signal:noise
  • Slowly evolving markers for deep evolutionary events
  • Rapidly evolving markers for recent evolutionary events
• Homoplasy
  • Taxa share similarities that do not reflect evolutionary history
• Take advantage of existing resources

Molecular data
• Binary data (presence/absence of genomic features)
• Microsatellites (repeat numbers)
• Single-nucleotide polymorphisms (SNPs)
• Reduced-representation sequences
• Sequence data
  • Nucleotides
  • Amino acids

Single-nucleotide polymorphisms
• Single sites sampled from throughout the genome
• More common in intraspecific (population) studies
Issues to consider:
• Recombination: SNPs are usually unlinked, so they are likely to have different (gene) trees
• Ascertainment bias: SNPs are selected for variability, and this can mislead estimates of population sizes, rates, and other parameters

Reduced-representation sequences
• Markers identified by cutting genome with restriction enzymes
• Process creates binary data and short sequences
• Examples include RADseq and DArTseq
Issues to consider:
• Recombination: markers are usually unlinked, so they are likely to have different (gene) trees
• Missing data: typically a large proportion of missing data

Sequence data
• Coding sequences
  • Ribosomal RNA
  • Protein-coding genes
• Non-coding sequences
  • Intergenic sites
  • Introns
  • Often have indels (insertions/deletions)
  • Need to align sequences

Example: Whales

DNA sequence alignment
[Figure: five sequences related by a tree; one change is labelled "Insertion of T"]
Unaligned sequences:
AACATTAGT
AACATAGGT
ACCAAAGT
CACAAAT
ATAAACAA
Aligned sequences:
AACATTAGT
AACATAGGT
ACCA-AAGT
CACA--AAT
ATAA-ACAA
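As a rough illustration of the Newick format slide, the sketch below reads the branch-length version of the mammal tree with a small hand-written parser. The function names (`parse_newick`, `tip_names`, `sister_tips`, `root_to_tip`) are illustrative and do not come from the lecture, and the parser assumes plain names with no quotes or comments. Real analyses would normally rely on an established phylogenetics library instead.

```python
# Branch-length Newick string copied from the "Newick format" slide.
NEWICK = (
    "(Monotremes:12.0,(Marsupials:11.0,((Elephant:1.0,Armadillo:1.0):9.0,"
    "(((Squirrel:1.0,Rabbit:1.0):2.0,(Monkey:1.0,Treeshrew:1.0):2.0):5.0,"
    "(Shrew:4.0,(Whale:3.0,(Bat:2.0,(Cat:1.0,Rhinoceros:1.0):1.0):1.0):1.0)"
    ":4.0):2.0):1.0):1.0);"
)


def parse_newick(text):
    """Parse a simple Newick string into nested dicts (no quoted names or comments)."""
    text = "".join(text.split())  # remove whitespace and line breaks
    pos = 0

    def read_until(stop_chars):
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stop_chars:
            pos += 1
        return text[start:pos]

    def read_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            children.append(read_node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # skip ","
                children.append(read_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1  # skip ")"
        name = read_until(",():;")
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1  # skip ":"
            length = float(read_until(",();"))
        return {"name": name, "length": length, "children": children}

    root = read_node()
    if text[pos:] != ";":
        raise ValueError("expected ';' at end of tree")
    return root


def tip_names(node):
    """List the names of all tips (leaves) below a node."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in tip_names(child)]


def sister_tips(node, name):
    """Return the tips in the sister group of the named tip, or None if absent."""
    for child in node["children"]:
        if not child["children"] and child["name"] == name:
            return [n for other in node["children"] if other is not child
                    for n in tip_names(other)]
    for child in node["children"]:
        found = sister_tips(child, name)
        if found is not None:
            return found
    return None


def root_to_tip(node, depth=0.0):
    """Map each tip name to the summed branch length from the root."""
    depth += node["length"] or 0.0  # the root has no branch length
    if not node["children"]:
        return {node["name"]: depth}
    distances = {}
    for child in node["children"]:
        distances.update(root_to_tip(child, depth))
    return distances


tree = parse_newick(NEWICK)
print(len(tip_names(tree)), "tips")
print("Sister of Cat:", sister_tips(tree, "Cat"))
print("Root-to-tip distances:", sorted(set(root_to_tip(tree).values())))
```

If every tip turns out to be the same distance from the root, the branch lengths are consistent with a chronogram, where tips sampled at the same time line up. The same `sister_tips` helper can be pointed at any other tip name in the tree.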
DNA sequence alignment
[Figure: further sequences AACATAAGT, AACAAAGT, AACAAAAT, and AAAAAAAA; one change is labelled "Deletion of A"]
• Homologous site
  • Inherited from the common ancestor of all sequences in the alignment
• The aim of sequence alignment is to maximise the number of sites for which you can infer homology

DNA sequence alignment
AACATTAGT
AACATAGGT
ACCA-AAGT
CACA--AAT
ATAA-ACAA
[Figure: one site of the alignment is highlighted]
• Groups together the first 3 sequences
• Groups together the last 2 sequences
• Informative for all phylogenetic methods

DNA sequence alignment
[Figure: a different site of the same alignment is highlighted]
• Does not group any sequences
• Not useful for maximum parsimony
• But informative for estimating amount of evolutionary change
• Useful for other methods

DNA sequence alignment
• Indel: insertion or deletion
• Potentially informative
• Most phylogenetic methods do not really use indel data
• Maximum-likelihood and Bayesian methods typically treat them in the same way as missing data

A practical approach
1. Align sequences using automated methods
2. Adjust alignments by eye
3. Delete sites with uncertain homology
4. Further data filtering
Example alignment:
CTATGTGGCACCCAGCCCATGCA--AGC
ATATGTGGCA-----CCCAGGCA--AG-
ATATGTGGCACCCAGCCCATGCATTT-
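The site categories described on the alignment slides can be made concrete with a short sketch. It classifies each column of the five-sequence alignment as constant, variable but uninformative, or parsimony-informative, using the usual parsimony criterion (at least two states, each present in at least two sequences). Treating `-` and `?` as missing data is an assumption that follows the slide on indels, and the names `ALIGNMENT` and `classify_site` are illustrative.

```python
from collections import Counter

# Five-sequence alignment from the "DNA sequence alignment" slides.
ALIGNMENT = [
    "AACATTAGT",
    "AACATAGGT",
    "ACCA-AAGT",
    "CACA--AAT",
    "ATAA-ACAA",
]


def classify_site(column):
    """Classify one alignment column, ignoring gaps and '?' as missing data."""
    counts = Counter(base for base in column if base not in "-?")
    if len(counts) <= 1:
        return "constant"
    # States found in two or more sequences can support a grouping.
    shared_states = [base for base, n in counts.items() if n >= 2]
    if len(shared_states) >= 2:
        return "parsimony-informative"
    return "variable, uninformative"


# zip(*ALIGNMENT) walks the alignment column by column.
site_classes = [classify_site(column) for column in zip(*ALIGNMENT)]

for index, (column, label) in enumerate(zip(zip(*ALIGNMENT), site_classes), start=1):
    print(index, "".join(column), label)
```

Variable but uninformative sites still carry information about the amount of evolutionary change, which is why the slides describe them as useful for methods other than maximum parsimony.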
Gaps and missing data
• Delete sites with any missing data
  • Potential loss of informative data
• Treat gaps as unresolved data
  • Gap is simultaneously A, C, G, and T
  • Most common approach
• Treat gaps as a 5th (nucleotide) or 21st (amino acid) state
  • Not appropriate when there are long gaps
• Code gaps as binary characters

Gaps and missing data
• Problematic in analyses of data supermatrices
• Impact of missing data remains poorly understood
• Filter data according to chosen threshold of missing data
[Figure: data matrix of Taxon 1 to Taxon 6 by Gene 1 to Gene 5, contrasting "Maximise gene sampling" with "Maximise taxon sampling"]

Mutational saturation
• Some sites can evolve very rapidly
  • 3rd codon positions
  • Loop regions in RNA
• Multiple hits can erode phylogenetic signal
• Various ways of testing for saturation
• Saturated sites can be removed to improve signal:noise

Useful references
[Slide content not recovered]

Test your understanding
• The now-defunct mammal order Insectivora included the shrew and treeshrew. Is Insectivora monophyletic, polyphyletic, or paraphyletic?
• This tree has 11 taxa. How many internal branches does it contain?
• What is the sister taxon to the armadillo?
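The two filtering options from the "Gaps and missing data" slides, deleting every site with missing data or filtering by a chosen threshold, can be sketched as one function. The example reuses the five-sequence alignment from the DNA sequence alignment slides. The name `filter_sites` and the `max_missing` parameter are illustrative, and counting only `-` and `?` as missing is an assumption.

```python
# Five-sequence alignment from the "DNA sequence alignment" slides.
ALIGNMENT = [
    "AACATTAGT",
    "AACATAGGT",
    "ACCA-AAGT",
    "CACA--AAT",
    "ATAA-ACAA",
]


def filter_sites(alignment, max_missing):
    """Keep columns whose fraction of '-' or '?' characters is at most max_missing."""
    n_sequences = len(alignment)
    kept_columns = [
        column
        for column in zip(*alignment)
        if sum(ch in "-?" for ch in column) / n_sequences <= max_missing
    ]
    if not kept_columns:
        return ["" for _ in alignment]
    # Transpose the kept columns back into one string per sequence.
    return ["".join(chars) for chars in zip(*kept_columns)]


# Delete sites with any missing data.
strict = filter_sites(ALIGNMENT, max_missing=0.0)
# Keep sites where at most half of the sequences are missing.
lenient = filter_sites(ALIGNMENT, max_missing=0.5)

print(len(ALIGNMENT[0]), "sites before filtering")
print(len(strict[0]), "sites with no missing data")
print(len(lenient[0]), "sites with at most 50% missing data")
```

The strict setting removes more sites, which illustrates the potential loss of informative data noted on the slide. The threshold value itself is a choice for the analyst, not something this sketch can recommend.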