What is a phylogenetic tree? Phylogenetic trees

Lecture 1.1
Phylogenetic Trees
Introduction to Molecular Phylogenetics
Nathan Lo

What is a phylogenetic tree?
• Phylogeny: the true evolutionary relationships among a set of organisms

Phylogenetic trees
• Topology (relationships)
• Branch lengths (amount of evolutionary change or time)

Phylogenetic trees
[Figure: rooted tree with labelled parts: root; branch or edge; internal node (divergence event); tip, leaf, or terminal node; clade; sister clade; stem lineage; crown group; total group]

Cladistic terms
• Monophyletic
• Paraphyletic
• Polyphyletic
• Polytomy
  • Polytomies can be hard or soft
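
Whether a group of taxa is monophyletic can be checked mechanically on a rooted tree. The sketch below assumes a tree written as nested Python tuples and a toy five-taxon tree; both the representation and the helper name `is_monophyletic` are illustrative assumptions, not part of the lecture. A group is treated as monophyletic when some clade contains exactly those tips and no others.

```python
def tips(node):
    """Return the set of tip names below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= tips(child)
    return result


def clades(node):
    """Yield the tip set of every clade in the tree (single tips included)."""
    yield frozenset(tips(node))
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if some clade in the tree contains exactly the taxa in group."""
    return frozenset(group) in set(clades(tree))


# Hypothetical toy tree, equivalent to the Newick string ((A,B),(C,(D,E)));
toy_tree = (("A", "B"), ("C", ("D", "E")))

print(is_monophyletic(toy_tree, {"D", "E"}))       # True
print(is_monophyletic(toy_tree, {"C", "D", "E"}))  # True
print(is_monophyletic(toy_tree, {"C", "D"}))       # False: descendant E is left out
print(is_monophyletic(toy_tree, {"B", "C"}))       # False: not a clade
```

This check only separates monophyletic from non-monophyletic groups; it does not attempt to distinguish paraphyletic from polyphyletic ones.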

Phylogenetic trees: Cladogram
Branch lengths have no meaning

Phylogenetic trees: Phylogram
Branch lengths measure amount of evolutionary change

Phylogenetic trees: Chronogram
Branch lengths measure time (yr or Myr)

Phylogenetic trees
Vertical scale does not represent anything

Phylogenetic trees
Rotating around nodes does not change the meaning of a tree

Phylogenetic trees: Circular
• Root is placed in centre
• Cladogram, phylogram, or chronogram
• Often used to display large trees
• Difficult to interpret
[Figure: circular tree from Jetz et al. (2012) Nature]
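
The point that rotating around nodes does not change the meaning of a tree can be illustrated in a few lines. This is a minimal sketch assuming trees are written as nested Python tuples; the helper name `canonical` is hypothetical. It sorts the children of every node, so two trees that differ only by rotations end up identical.

```python
def canonical(node):
    """Return a rotation-independent form: children sorted at every node."""
    if isinstance(node, str):
        return node
    # key=repr lets tip names (str) and subtrees (tuple) be sorted together
    return tuple(sorted((canonical(child) for child in node), key=repr))


# Same relationships, drawn with different rotations
tree_a = (("Cat", "Rhinoceros"), "Bat")
tree_b = ("Bat", ("Rhinoceros", "Cat"))

# A genuinely different topology
tree_c = (("Cat", "Bat"), "Rhinoceros")

print(canonical(tree_a) == canonical(tree_b))  # True
print(canonical(tree_a) == canonical(tree_c))  # False
```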

Phylogenetic trees: Unrooted
• When position of root is unknown
• Root by including outgroup taxa

Rooting
• Include outgroup taxa
  • Taxon closely related to ingroup
  • Taxon is not part of ingroup
• Root at midpoint
  • Highly unreliable if internal branches are short
• Use a molecular clock
  • Automatically estimates position of root

Phylogenetic trees: Newick format
• Without branch lengths (cladogram):
(Monotremes,(Marsupials,((Elephant,Armadillo),(((Squirrel,Rabbit),(Monkey,Treeshrew)),(Shrew,(Whale,(Bat,(Cat,Rhinoceros))))))));
• With branch lengths (phylogram/chronogram):
(Monotremes:12.0,(Marsupials:11.0,((Elephant:1.0,Armadillo:1.0):9.0,(((Squirrel:1.0,Rabbit:1.0):2.0,(Monkey:1.0,Treeshrew:1.0):2.0):5.0,(Shrew:4.0,(Whale:3.0,(Bat:2.0,(Cat:1.0,Rhinoceros:1.0):1.0):1.0):1.0):4.0):2.0):1.0):1.0);
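
To make the Newick notation concrete, the two strings from the slide can be read with a small parser. The sketch below is a minimal illustration, not a full Newick implementation: it assumes unquoted names, no comments, and optional `:length` suffixes, and the function names (`parse_newick`, `tip_names`, `root_to_tip`) are made up for this example. It lists the tips and sums branch lengths from the root to each tip.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts (name, length, children)."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]  # empty string for unnamed internal nodes
        length = None
        if text[pos] == ":":
            pos += 1  # consume ":"
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected character at position {pos}")
    return root


def tip_names(node):
    """Return tip names in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in tip_names(child)]


def root_to_tip(node, depth=0.0):
    """Map each tip name to the summed branch length from the root."""
    depth += node["length"] or 0.0
    if not node["children"]:
        return {node["name"]: depth}
    distances = {}
    for child in node["children"]:
        distances.update(root_to_tip(child, depth))
    return distances


cladogram = (
    "(Monotremes,(Marsupials,((Elephant,Armadillo),"
    "(((Squirrel,Rabbit),(Monkey,Treeshrew)),"
    "(Shrew,(Whale,(Bat,(Cat,Rhinoceros))))))));"
)
chronogram = (
    "(Monotremes:12.0,(Marsupials:11.0,((Elephant:1.0,Armadillo:1.0):9.0,"
    "(((Squirrel:1.0,Rabbit:1.0):2.0,(Monkey:1.0,Treeshrew:1.0):2.0):5.0,"
    "(Shrew:4.0,(Whale:3.0,(Bat:2.0,(Cat:1.0,Rhinoceros:1.0):1.0):1.0):1.0):4.0):2.0):1.0):1.0);"
)

print(len(tip_names(parse_newick(cladogram))))                   # 13
print(sorted(set(root_to_tip(parse_newick(chronogram)).values())))  # [12.0]
```

Every tip in the second string sits at the same distance (12.0) from the root, which is what would be expected of a chronogram whose tips are all sampled at the same time.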

Phylogenetic analysis
• Sometimes we know the phylogeny
  • Viral transmission histories
  • Pedigrees (humans, domesticated animals, lab organisms, etc.)
• Usually we do not know the phylogeny but we can estimate it
  • Morphological data
  • Molecular data

Molecular phylogenetics
1. Data preparation
  • Taxon and gene sampling
  • Data filtering
  • Sequence alignment
2. Phylogenetic inference
  • Model selection
  • Estimation of tree
  • Further analysis and interpretation

Molecular data
• Select data to optimise signal:noise
  • Slowly evolving markers for deep evolutionary events
  • Rapidly evolving markers for recent evolutionary events
• Homoplasy
  • Taxa share similarities that do not reflect evolutionary history
• Take advantage of existing resources

Molecular data
• Binary data (presence/absence of genomic features)
• Microsatellites (repeat numbers)
• Single-nucleotide polymorphisms (SNPs)
• Reduced-representation sequences
• Sequence data
  • Nucleotides
  • Amino acids

Single-nucleotide polymorphisms
• Single sites sampled from throughout the genome
• More common in intraspecific (population) studies
• Issues to consider:
  • Recombination: SNPs are usually unlinked so they are likely to have different (gene) trees
  • Ascertainment bias: SNPs are selected for variability and this can mislead estimates of population sizes, rates, and other parameters

Reduced-representation sequences
• Markers identified by cutting genome with restriction enzymes
• Process creates binary data and short sequences
• Examples include RADseq and DArTseq
• Issues to consider:
  • Recombination: Markers are usually unlinked so they are likely to have different (gene) trees
  • Missing data: Typically a large proportion of missing data

Sequence data
• Coding sequences
  • Ribosomal RNA
  • Protein-coding genes
• Non-coding sequences
  • Intergenic sites
  • Introns
• Often have indels (insertions/deletions)
• Need to align sequences

Example: Whales

DNA sequence alignment
[Figure: sequences evolving along a tree, with changes labelled "Insertion of T" and "Deletion of A". Other sequences shown: AACATAAGT, AACAAAGT, AACAAAAT, AAAAAAAA]

Unaligned    Aligned
AACATTAGT    AACATTAGT
AACATAGGT    AACATAGGT
ACCAAAGT     ACCA-AAGT
CACAAAT      CACA--AAT
ATAAACAA     ATAA-ACAA

DNA sequence alignment
• Homologous site
  • Inherited from the common ancestor of all sequences in the alignment
• The aim of sequence alignment is to maximise the number of sites for which you can infer homology

AACATTAGT
AACATAGGT
ACCA-AAGT
CACA--AAT
ATAA-ACAA

DNA sequence alignment
• A site can group sequences together (here, site 8: G in the first three sequences, A in the last two)
  • Groups together the first 3 sequences
  • Groups together the last 2 sequences
• Informative for all phylogenetic methods

DNA sequence alignment
• Does not group any sequences
  • Not useful for maximum parsimony
• But informative for estimating amount of evolutionary change
  • Useful for other methods

DNA sequence alignment
• Indel: insertion or deletion
• Potentially informative
• Most phylogenetic methods do not really use indel data
• Maximum-likelihood and Bayesian methods typically treat them in the same way as missing data
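
The difference between sites that group sequences and sites that do not can be made explicit by classifying each column of the five-sequence alignment from the slides. This is a minimal sketch: the function name `classify_site` and the category labels are illustrative, gaps are simply ignored (treated like missing data), and "parsimony-informative" uses the usual definition of at least two states that each occur in at least two sequences.

```python
from collections import Counter

# Alignment from the slides
alignment = [
    "AACATTAGT",
    "AACATAGGT",
    "ACCA-AAGT",
    "CACA--AAT",
    "ATAA-ACAA",
]


def classify_site(column):
    """Classify one alignment column, ignoring gap characters ('-')."""
    counts = Counter(base for base in column if base != "-")
    if len(counts) <= 1:
        return "constant"
    # States present in two or more sequences can group sequences together
    shared_states = [base for base, n in counts.items() if n >= 2]
    if len(shared_states) >= 2:
        return "parsimony-informative"
    return "variable but uninformative for parsimony"


for index, column in enumerate(zip(*alignment), start=1):
    print(index, "".join(column), classify_site(column))
```

Under these assumptions only site 8 (GGGAA) is parsimony-informative. The other variable sites still record evolutionary change, which is why they remain useful for methods that estimate branch lengths.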

A practical approach
1. Align sequences using automated methods
2. Adjust alignments by eye
3. Delete sites with uncertain homology
4. Further data filtering

CTATGTGGCACCCAGCCCATGCA--AGC
ATATGTGGCA-----CCCAGGCA--AG-
ATATGTGGCACCCAGCCCATGCATTT-

Gaps and missing data
• Delete sites with any missing data
  • Potential loss of informative data
• Treat gaps as unresolved data
  • Gap is simultaneously A, C, G, and T
  • Most common approach
  • Not appropriate when there are long gaps
• Treat gaps as a 5th (nucleotide) or 21st (amino acid) state
• Code gaps as binary characters

Gaps and missing data
• Problematic in analyses of data supermatrices
• Impact of missing data remains poorly understood
• Filter data according to chosen threshold of missing data
[Figure: data supermatrix with rows Taxon 1 to Taxon 6 and columns Gene 1 to Gene 5, annotated "Maximise gene sampling" and "Maximise taxon sampling"]
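
Filtering sites by a chosen threshold of missing data can be sketched as follows. The function name `filter_sites`, its parameters, and the choice of `-` and `?` as missing-data characters are assumptions made for this example; dedicated alignment-trimming tools offer more refined criteria. Setting the threshold to 0 reproduces the "delete sites with any missing data" option.

```python
def filter_sites(alignment, max_missing=0.5, missing_chars="-?"):
    """Keep only columns whose fraction of missing characters is <= max_missing."""
    n_seqs = len(alignment)
    kept = [
        column
        for column in zip(*alignment)
        if sum(ch in missing_chars for ch in column) / n_seqs <= max_missing
    ]
    # Rebuild one string per sequence from the retained columns
    return ["".join(column[i] for column in kept) for i in range(n_seqs)]


# Alignment from the earlier slides
alignment = [
    "AACATTAGT",
    "AACATAGGT",
    "ACCA-AAGT",
    "CACA--AAT",
    "ATAA-ACAA",
]

# Drop every site with any gap (sites 5 and 6)
print(filter_sites(alignment, max_missing=0.0))

# Drop only sites where more than half of the sequences have a gap (site 5)
print(filter_sites(alignment, max_missing=0.5))
```

The stricter setting discards site 6 even though four of the five sequences have data there, which is the potential loss of informative data noted above.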

Mutational saturation
• Some sites can evolve very rapidly
  • 3rd codon positions
  • Loop regions in RNA
• Multiple hits can erode phylogenetic signal
• Various ways of testing for saturation
• Saturated sites can be removed to improve signal:noise
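
One way to see why multiple hits erode signal is to compare the observed proportion of differing sites (the p-distance) with a distance corrected for unseen substitutions. The sketch below uses the Jukes-Cantor (JC69) correction, d = -3/4 ln(1 - 4p/3), as an illustration; it is not one of the saturation tests referred to in the lecture, and the function names are made up for this example.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping positions where either has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    differences = sum(a != b for a, b in pairs)
    return differences / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor corrected distance in substitutions per site."""
    if p >= 0.75:
        return math.inf  # at or beyond saturation the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Observed difference between the first two sequences of the slide alignment
print(p_distance("AACATTAGT", "AACATAGGT"))

# The gap between observed and corrected distance widens as p approaches 0.75
for p in (0.05, 0.30, 0.60, 0.70):
    print(f"p = {p:.2f}  JC69 d = {jc69_distance(p):.3f}")
```

As p approaches 0.75, the value expected between two unrelated nucleotide sequences under this model, the corrected distance grows without bound: the observed differences no longer say much about how much change actually occurred.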

Test your understanding
• The now-defunct mammal order Insectivora included the shrew and treeshrew. Is Insectivora monophyletic, polyphyletic, or paraphyletic?
• This tree has 11 taxa. How many internal branches does it contain?
• What is the sister taxon to the armadillo?