“Nothing in biology makes sense except in the light of evolution.”
“Scientists often have a naive faith that if only they could discover enough facts about a problem, these facts would somehow arrange themselves in a compelling and true solution.”
Theodosius Dobzhansky
1900-1975
Good sources of information on molecular
phylogenetics and tree reconstruction
Freeman and Herron, 4th ed
Hillis, Moritz, Mable, 2nd ed
Hartl and Clark, 4th ed
Phylogenetic Estimation
• Deriving hypotheses about the evolutionary history of lineages based on molecular data
– Species delineation
– Phylogeography
– Character evolution
– Lots more
Pedagogical Considerations
• Phylogenetics is an interdisciplinary science
– Genetics
– Evolutionary processes (multiple levels)
– Life history and natural history of organisms
– Geological history
– Statistics and mathematical algorithms
Intro to Phylogenetics
Types of characters:
• nuclear DNA
• mt/chloroplast DNA
• restriction fragment data (RFLPs)
• DNA fingerprinting (microsatellites)
• proteins (allozymes, aa sequence)
Samples or representations of genetic material to capture “phylogenetic signal”
Phylogenetics in the genomics age…
• Extensions to genomic-level analyses and questions:
– Genome-wide sequence divergence and phylogenetic analysis: whole mtDNA genome vs. parts
• How many base pairs are needed to best resolve phylogenetic hypotheses?
– How are mtDNA and nuclear DNA variation related?
– Does mtDNA sequence diversity within lineages correlate with genome size variation?
– Can you use functional/structural protein sequences for phylogenetic analyses if you survey enough of them?
Genetic diversity/no morphological diversity
Plethodontid salamanders
[Figure: mtDNA sequence tree of P. hubrichti samples (RM, BM, DG VA, GF, WT VA, SI, CW, RBB, BR, CM, PG, SM., ML and CD, two samples each) with clade labels N1, N2, S1, S2 and S3; Desmognathus wrighti (pygmy salamander) tree with the same clade labels; combination of mtDNA and allozyme data]
Morphological diversity/no genetic diversity
Finches and widowbirds
Sexual dimorphism
Resource partitioning
Nutritional effects
Recent isolation
Founder effect
Genomic/mtDNA extraction
PCR amplification of target gene sequence
DNA sequence
Alignment
[Figure: excerpt of the 28S rRNA alignment of gi|38154450|gb|AY452491.1|, gi|1144505|gb|U34341.1|OMU3434 and gi|1144500|gb|U34340.1|ABU3434, with the reverse primer site marked]
Phylogenetic analysis
Assumptions of phylogenetic analyses
• Common descent
• Characters must reflect genetic inheritance
• Characters evolve independently
• No homoplasy
– i.e., event-by-event recounting of fixed mutations in a lineage over time
• No polarity in character states unless an outgroup is specified (based on other types of data)
• Intertaxon variation > intrataxon variation
Forefathers of Phylogenetics
Charles Darwin
(1809-1882)
Sewall Wright
(1889-1988)
Motoo Kimura
(1924-1994)
Neutral Theory Paradigm
• The majority of base substitutions that become fixed in populations are neutral with respect to fitness
• Regions of the genome that are under selection are not appropriate for detecting phylogenetic signal
• Genetic mutation is the source of genetic variation
• Genetic drift dominates evolution at the level of DNA sequence
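The neutral-theory link between drift and a steady substitution rate can be written in three lines. This is the standard textbook derivation, sketched under the assumption of a diploid population of constant size N and a neutral mutation rate μ per site per generation (these symbols are introduced here, not on the slide):

```latex
\begin{aligned}
\text{new neutral mutations per generation} &= 2N\mu \\
\Pr(\text{a given new neutral allele eventually fixes}) &= \frac{1}{2N} \\
k = 2N\mu \cdot \frac{1}{2N} &= \mu
\end{aligned}
```

N cancels: the neutral substitution rate k equals the mutation rate μ regardless of population size, which is why neutral regions are candidates for molecular clocks.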
Mutation
• Heritable change in the genetic material (DNA sequence)
– Point mutation
– Insertions/deletions (recombination)
– Transposable elements
• Mutation rates are not equal throughout the genome
Variation in mutation rates among genomic regions
• Coding sequences (exons, code for proteins)
• Non-coding sequences (introns)
• Regulatory regions (5’UTR, 3’UTR, promoters)
• Pseudogenes (non-functional gene relicts)
• Wobble position nucleotides
– Synonymous (silent) vs. non-synonymous (replacement)
• Variation due to function of protein product
Hartl & Clark, Principles of Population Genetics
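A minimal Python sketch of the synonymous/non-synonymous distinction at wobble positions. It assumes the standard nuclear genetic code (NCBI translation table 1); vertebrate mtDNA uses a different code, so mitochondrial codons would need another table. The function names are hypothetical, for illustration only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ...) to line up with the amino-acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_a: str, codon_b: str) -> str:
    """Label a single-base difference between two codons as synonymous or not."""
    codon_a, codon_b = codon_a.upper(), codon_b.upper()
    diffs = [i for i in range(3) if codon_a[i] != codon_b[i]]
    if len(diffs) != 1:
        raise ValueError("codons must differ at exactly one position")
    same_amino_acid = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
    kind = "synonymous" if same_amino_acid else "non-synonymous"
    return f"{kind} (codon position {diffs[0] + 1})"


def synonymous_fraction_by_position() -> list[float]:
    """Fraction of single-base changes between sense codons that are silent,
    for codon positions 1, 2 and 3 (changes to or from stop codons are skipped)."""
    fractions = []
    for pos in range(3):
        silent = total = 0
        for codon, aa in CODON_TABLE.items():
            if aa == "*":
                continue
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == "*":
                    continue
                total += 1
                silent += CODON_TABLE[mutant] == aa
        fractions.append(silent / total)
    return fractions


print(classify_substitution("CTT", "CTC"))  # synonymous (codon position 3)
print(classify_substitution("CTT", "TTT"))  # non-synonymous (codon position 1)
for position, fraction in enumerate(synonymous_fraction_by_position(), start=1):
    print(f"codon position {position}: {fraction:.1%} of changes are silent")
```

Under the standard code, second-position changes between sense codons are never silent, and third-position changes are silent far more often than first-position ones, which is why wobble sites accumulate substitutions fastest.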
Kinds of mutations occur at different rates
• Genes/regions that best detect
phylogenetic signal conform to
neutral theory predictions
• Models of evolution are used to
incorporate variation in mutation
rates within the data (based on
molecular genetic processes)
for more realistic estimations of
evolutionary history
Hartl & Clark, Principles of Population Genetics
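One simple way a model of evolution incorporates different rates for different kinds of mutations is the Kimura two-parameter (K2P) distance, which treats transitions (A/G, C/T) and transversions separately. The slide does not name a specific model; K2P is used here only as an illustrative choice. The sketch assumes aligned DNA sequences and skips gapped or ambiguous columns; the example sequences are the reverse-primer block of the 28S alignment in this presentation.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions
    of compared sites with transition and transversion differences. Columns with
    a gap or non-ACGT symbol in either sequence are skipped.
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    P = transitions / compared
    Q = transversions / compared
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("distance undefined: sequences are too divergent (saturated)")
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


# Reverse-primer block of the 28S alignment: AY452491.1 vs U34341.1.
ay452491 = "GGCATGTTAGATCAAGGTAGATAAGGGAAGTCGGCAAATCAGATCCGTAA"
u34341 = "GGCATGTTAGAACAATGTATGTAAGGGAAGTCGGCAAGTCAGATCCGTAA"
print(f"K2P distance: {k2p_distance(ay452491, u34341):.4f}")
```

These two sequences differ at five sites: two transitions and three transversions, so P = 0.04 and Q = 0.06.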
Molecular clocks
• Implicitly used when choosing a region to assay for
variation given the expected evolutionary distance of
interest
• Explicitly used when attempting to date divergence
times
• Need to calibrate divergence times estimated with
DNA variation with historical geological dates/events
• Lots of debate and criticism about the use of
molecular clocks
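The clock arithmetic behind calibration, as a minimal sketch assuming a constant substitution rate r per site per lineage per unit time (symbols introduced here, not on the slide). Two lineages that split T time units ago each accumulate changes independently, so their distance grows as 2rT:

```latex
\begin{aligned}
K &= 2rT
  && \text{expected distance between two lineages that split } T \text{ ago} \\
\hat{r} &= \frac{K_{\mathrm{cal}}}{2\,T_{\mathrm{cal}}}
  && \text{rate calibrated from a pair with a geologically dated split} \\
\hat{T} &= \frac{K}{2\hat{r}}
  && \text{estimated divergence time for a new pair}
\end{aligned}
```

With hypothetical numbers: if a calibration pair whose split is dated at 5 million years shows K_cal = 0.10, then r = 0.01 per site per lineage per million years, and a pair with K = 0.04 would be dated at about 2 million years. Any error in the calibration date carries straight through to every estimate.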
Molecular clocks
Hartl & Clark, Principles of Population Genetics
Molecular clocks
When is a molecule not appropriate?
Saturation
(homoplasy)
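Saturation can be illustrated with the Jukes-Cantor (JC69) model, the simplest substitution model (equal base frequencies and equal rates for all changes). It is a swapped-in illustration, not a model named on the slide. The sketch shows the observed proportion of differing sites p levelling off toward 0.75 as the true number of substitutions per site d grows, plus the correction that inverts it; function names are hypothetical.

```python
import math


def expected_p_distance(d: float) -> float:
    """Expected proportion of differing sites after d substitutions per site (JC69)."""
    return 0.75 * (1 - math.exp(-4 * d / 3))


def jc_distance(p: float) -> float:
    """Jukes-Cantor corrected distance from an observed p-distance."""
    if not 0 <= p < 0.75:
        raise ValueError("undefined for p >= 0.75: the sites are saturated")
    return -0.75 * math.log(1 - 4 * p / 3)


# As true divergence grows, observed differences level off near 0.75:
# additional substitutions hit already-changed sites and become invisible.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"true d = {d:>4}   observed p = {expected_p_distance(d):.3f}")
```

Once p approaches 0.75, tiny differences in p correspond to large differences in d, so the region no longer carries a usable signal at that evolutionary depth.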
Molecular clocks
When is a molecule not appropriate?
Questions to ask yourself
Do molecular clocks tick evenly through time?
Is there a geological date/event for calibration?
Are geological calibrations useful?
Molecules can evolve at different rates than
organisms (or other molecules)!
28S rRNA partial sequence (FOR PRIMER = forward primer; REV PRIMER = reverse primer)

gi|38154450|gb|AY452491.1|      CGCCCGATGCCGACGCTCATCAGACCCCAGAAAAGGTGTTGGTCGATATA
gi|1144505|gb|U34341.1|OMU3434  CGCCCGATGCCGACGCTCATCAGACCCCAGAAAAGGTGTTGGTCGATATA
gi|1144500|gb|U34340.1|ABU3434  CGCCCGATGCCGACGCTCATCAGACCCCAGAAAAGGTGTTGGTTGATATA
                                ******************************************* ******

FOR PRIMER
gi|38154450|gb|AY452491.1|      GACAGCAGGACGGTGGCCATGGAAGTCGGAATCCGCTAAGGAGTGTGTAA
gi|1144505|gb|U34341.1|OMU3434  GACAGCAGGACGGTGGCCATGGAAGTCGGAATCCGCTAAGGAGTGTGTAA
gi|1144500|gb|U34340.1|ABU3434  GACAGCAGGACGGTGGCCATGGAAGTCGGAATCCGCTAAGGAGTGTGTAA
                                **************************************************

gi|38154450|gb|AY452491.1|      CAACTCACCTGCCGAATCAACTAGCCCTGAAAATGGATGGCGCTGGAGCG
gi|1144505|gb|U34341.1|OMU3434  CAACTCACCTGCCGAATCAACTAGCCCTGAAAATGGATGGCGCTGGAGCG
gi|1144500|gb|U34340.1|ABU3434  CAACTCACCTGCCGAATCAACTAGCCCTGAAAATGGATGGCGCTGTAGCG
                                ********************************************* ****

gi|38154450|gb|AY452491.1|      TCGGGCCCATACCCGGCCGTCGCCGGCAACAGGAGCCGCGAGGGCTATGC
gi|1144505|gb|U34341.1|OMU3434  TCGGGCCCATACCCGGCCGTCGCTGGCAACGAGAGCCTCGAGGGCTATGC
gi|1144500|gb|U34340.1|ABU3434  TCGGGCCCATACCCGGCCGTCGCCGGCCACGGGAGCCTCGCAGGCTATGC
                                *********************** *** **  ***** **  ********

gi|38154450|gb|AY452491.1|      CGCGACGAGTAGGAGGGCCGCCGCGGTGAGCACGGAAGCCTAGGGCGTGG
gi|1144505|gb|U34341.1|OMU3434  CGCGACGAGTAGGAGGGCCGCCGCGGTGAGCACGGAAGCCTAGGGCGCGG
gi|1144500|gb|U34340.1|ABU3434  CGCGACGAGTAGGAGGGCCGCCGCGGTGGGCACTGAAGCCTAGGGCGAGG
                                **************************** **** ************* **

gi|38154450|gb|AY452491.1|      GCCCGGGTGGAGCCGCCGCGGGTGCAGATCTTGGTGGTAGTAGCAAATAT
gi|1144505|gb|U34341.1|OMU3434  GCCCGGGTGGAGCCGCCGCGGGTGCAGATCTTGGTGGTAGTAGCAAATAT
gi|1144500|gb|U34340.1|ABU3434  GCCCGGGTGGAGCCGCCGCAGGTGCAGATCTTGGTGGTAGTAGCAAATAT
                                ******************* ******************************

gi|38154450|gb|AY452491.1|      TCAAACGAGAACTTTGAAGGCCGAAGTGGAGAAGGGTTCCATGTGAACAG
gi|1144505|gb|U34341.1|OMU3434  TCAAACGAGAACTTTGAAGGCCGAAGTGGAGAAGGGTTCCATGTGAACAG
gi|1144500|gb|U34340.1|ABU3434  TCAAACGAGAACTTTGAAGACCGAAGTGGAGAAGGGTTCCATGTGAACAG
                                ******************* ******************************

gi|38154450|gb|AY452491.1|      CAGTTGAACATGGGTCAGTCGGTCCTAAGAGATGGGCGAACGCCGTTCGG
gi|1144505|gb|U34341.1|OMU3434  CAGTTGAACATGGGTCAGTCGGTCCTAAGAGATGGCCGAACGCCGTTCGG
gi|1144500|gb|U34340.1|ABU3434  CAGTTGAACATGGGTCAGTCGGTCCTAAGAGATAGGCGAATCCCGTTCTG
                                ********************************* * ****  ****** *

gi|38154450|gb|AY452491.1|      AAGGGTGGGGCGATGGCCTACGTCGCCCCCGGCCGATCGAAAGGGAGTCG
gi|1144505|gb|U34341.1|OMU3434  AAGGGAGGGGCGATGCCCTCCGTCGCCCCCGGCCGATCGAAAGGGAGTCG
gi|1144500|gb|U34340.1|ABU3434  AAAGGAGGGACGATGACCTCCGTCGCCCCCGGCTGATCGAAAGGGAGTCG
                                ** ** *** ***** *** ************* ****************

gi|38154450|gb|AY452491.1|      GGTTCAGATCCCCGAATCTGGAGTGGCGGAGATAGGCGCCGCGAGGCGTC
gi|1144505|gb|U34341.1|OMU3434  GGTTCAGATCCCCGAATCCGGAGTGGCGGAGATGGGCGCCGCGAGGCGTC
gi|1144500|gb|U34340.1|ABU3434  GGTTCAGATCCCCGAATCCGGAGTGGCGGAGACGGCCGCCGCGAGGCGTC
                                ****************** *************  * **************

gi|38154450|gb|AY452491.1|      CAGTGCGGTAACGCAAACGATCCCGGAGGAGCTGGCGGGAGCCCCGGGGA
gi|1144505|gb|U34341.1|OMU3434  CAGTGCGGTAACGCGACCGATCCCGGAGAAGCTGGCGGGAGCCCCGGGGA
gi|1144500|gb|U34340.1|ABU3434  CAGTGCGGTAACGCAACCGATCCCGGAGAAGCCGGCGAGAGCCCCGGAGA
                                ************** * *********** *** **** ********* **

gi|38154450|gb|AY452491.1|      GAGTTCTCTTTTCTTTGTGAAGGGCAGGGCGCCCTGGAATGGGTTCGCCC
gi|1144505|gb|U34341.1|OMU3434  GAGTTCTCTTTTCTTTGTGAAGGGCAGGGCGCCCTGGAATGGGTTCGCCC
gi|1144500|gb|U34340.1|ABU3434  GAGTTCTCTTTTCTTTGTGAAGGGCAGGCCACCCTGGAATGGGTTCCCCC
                                **************************** * *************** ***

gi|38154450|gb|AY452491.1|      CGAGAGAGGGGCCCGTGCCCTGGAAAGCGTCGCGGTTCCGGCGGCGTCCG
gi|1144505|gb|U34341.1|OMU3434  CGAGAGAGGGGCCCAAGCCCTGGAAAGCGTCGCGGTTCCGGCGGCGTCCG
gi|1144500|gb|U34340.1|ABU3434  CGAGAGAGGGGCCCGCGCCTTGGAAAGCGTCGCGGTTCCGGCGGCGTCCG
                                **************  *** ******************************

gi|38154450|gb|AY452491.1|      GTGAGCTCTCGCTGGCCCTTGAAAATCCGGGGGAGAAGGTGTAAATCTCG
gi|1144505|gb|U34341.1|OMU3434  GTGAGCTCTCGCTGGCCCTTGAAAATCCGGGGGAGAGGGTGTAAATCTCG
gi|1144500|gb|U34340.1|ABU3434  GTGAGCTCTCGCTGGTCCTTGAAAATCCGGGGGAGAAGGTGTAAATCTCG
                                *************** ******************** *************

gi|38154450|gb|AY452491.1|      CGCCAGGCCGTACCCATATCCGCAGCAGGTCTCCAAGGTGAACAGCCTCT
gi|1144505|gb|U34341.1|OMU3434  CGCCGGGCCGTACCCATATCCGCAGCAGGTCTCCAAGGTGAACAGCCTCT
gi|1144500|gb|U34340.1|ABU3434  CGCCGGGCCGTACCCATATCCGCAGCAGGTCTCCAAGGTGAACAGCCTCT
                                **** *********************************************

REV PRIMER
gi|38154450|gb|AY452491.1|      GGCATGTTAGATCAAGGTAGATAAGGGAAGTCGGCAAATCAGATCCGTAA
gi|1144505|gb|U34341.1|OMU3434  GGCATGTTAGAACAATGTATGTAAGGGAAGTCGGCAAGTCAGATCCGTAA
gi|1144500|gb|U34340.1|ABU3434  GGCATGTTAGAACAATGTAGGTAAGGGAAGTCGGCAAGTCAGATCCGTAA
                                *********** *** ***  **************** ************

gi|38154450|gb|AY452491.1|      CTTCGGGATAAGGATTGGCTCTAAGGGCTGGGTCGGTCGGGCTGGAGTGC
gi|1144505|gb|U34341.1|OMU3434  CTTCGGGATAAGGATTGGCTCTAAGGGCTGGGTCGGTCGGGCTGGGGTGC
gi|1144500|gb|U34340.1|ABU3434  CTTCGGGATAAGGATTGGCTCTAAGGGCTGGGTCGGTCGGGCTGGGGTGC
                                ********************************************* ****

gi|38154450|gb|AY452491.1|      GAAGCGGGGCTGGGCTCGTGCCGCGGCTGGGGGAGCAGTCGCCCCGTCGC
gi|1144505|gb|U34341.1|OMU3434  GAAGCGGGGCTGGGCTCGAGCCGCGGCTGGGGGAGCAGTTGCTCCGCCTC
gi|1144500|gb|U34340.1|ABU3434  GAAGCGGGGCTGGGCACGCGCCGCGGCTGGACGAG-----GCGTCGCCT
                                *************** ** ***********  ***     **  ** *
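In the alignment above, an asterisk marks each column where all three sequences share the same base (the Clustal convention for nucleotide alignments). A minimal Python sketch of that rule, which also lists the variable positions; the function names are hypothetical:

```python
def conservation_line(seqs: list[str]) -> str:
    """Clustal-style conservation line: '*' where every sequence has the same base.

    Gap columns ('-') are never marked. If one sequence in a block is shorter,
    the columns it lacks are left unmarked. Trailing blanks are stripped.
    """
    width = max(len(s) for s in seqs)
    marks = []
    for i in range(width):
        column = [s[i] if i < len(s) else None for s in seqs]
        conserved = None not in column and "-" not in column and len(set(column)) == 1
        marks.append("*" if conserved else " ")
    return "".join(marks).rstrip()


def variable_sites(seqs: list[str]) -> list[int]:
    """1-based positions of columns that are not fully conserved."""
    line = conservation_line(seqs).ljust(max(len(s) for s in seqs))
    return [i + 1 for i, mark in enumerate(line) if mark != "*"]


# Reverse-primer block and final block of the 28S alignment.
rev_primer_block = [
    "GGCATGTTAGATCAAGGTAGATAAGGGAAGTCGGCAAATCAGATCCGTAA",
    "GGCATGTTAGAACAATGTATGTAAGGGAAGTCGGCAAGTCAGATCCGTAA",
    "GGCATGTTAGAACAATGTAGGTAAGGGAAGTCGGCAAGTCAGATCCGTAA",
]
last_block = [
    "GAAGCGGGGCTGGGCTCGTGCCGCGGCTGGGGGAGCAGTCGCCCCGTCGC",
    "GAAGCGGGGCTGGGCTCGAGCCGCGGCTGGGGGAGCAGTTGCTCCGCCTC",
    "GAAGCGGGGCTGGGCACGCGCCGCGGCTGGACGAG-----GCGTCGCCT",
]
print(conservation_line(rev_primer_block))
print(variable_sites(rev_primer_block))  # [12, 16, 20, 21, 38]
print(conservation_line(last_block))
```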
Alignment rules of thumb:
• Assumption that similarity in sequence reflects homology
• Best to use the same number of characters across operational taxonomic units (OTUs)
• Gaps are problematic for algorithms even though they may be evolutionarily important
– minimize gaps; check the reliability of the sequence, etc.
– some programs can treat gaps as a 5th character state or include them in the analysis in some other way
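The two ways of handling gaps mentioned above give different distances. A minimal sketch using the proportion of differing sites (p-distance); the function name and toy sequences are hypothetical:

```python
def p_distance(seq1: str, seq2: str, gaps_as_fifth_state: bool = False) -> float:
    """Proportion of differing aligned sites between two sequences.

    gaps_as_fifth_state=False: columns with '-' in either sequence are dropped
        (pairwise deletion), so indels contribute nothing.
    gaps_as_fifth_state=True: '-' is scored like a fifth base, so a gap opposite
        a nucleotide counts as one difference.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if not gaps_as_fifth_state and "-" in (a, b):
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no sites left to compare")
    return differences / compared


print(p_distance("ACGTAC", "ACG--T"))                            # 0.25
print(p_distance("ACGTAC", "ACG--T", gaps_as_fifth_state=True))  # 0.5
```

Scoring gaps as a fifth state counts a single two-base indel as two differences, which is one reason gaps are often excluded instead.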
Forefathers of Phylogenetic Analyses
Willi Hennig
(father of cladistics)
Masatoshi Nei
Joseph Felsenstein
(father of our favorite (phylogenetic algorithms)
phylogenetic statistics)
Basic steps of phylogenetic estimation…
1. Define a specific sequence of steps (algorithm) for constructing the best tree from a set of possible phylogenies
2. Define criteria for comparing alternate phylogenies to determine which is best (optimality criterion)
[Figure: example phylogenetic tree of D. affinidisjuncta, D. heteroneura, D. adiastola, D. mimica, D. nigra, S. albovittata, D. crassifemur, D. mulleri, S. lebanonensis, D. melanogaster and D. pseudoobscura, with support values (74 to 100) at nodes and a 0.02 scale bar]
Types of tree construction methods (in order of increasing computation intensity)…
• Distance methods (minimum evolution)
• based on calculated pairwise distance statistics
• takes the tree with the smallest sum of all branch lengths as the estimate of the correct tree (additive tree)
• Maximum parsimony
• based only on characters that vary among sequences
• finds the shortest tree (tree length = the least number of changes needed to produce the phylogeny)
• Maximum likelihood**
• Bayesian analyses**
** beyond the scope of MEGA and of most undergraduates
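Tree length under parsimony can be computed with Fitch's algorithm for unweighted parsimony; the slides do not specify an algorithm, so this is an illustrative choice. The sketch scores a binary tree written as nested tuples. Unweighted parsimony length does not depend on where the root is placed, so one rooted form per unrooted topology is enough. Taxon names and sequences are hypothetical.

```python
def fitch(tree, states):
    """Fitch small-parsimony pass for one site.

    tree:   a leaf name (str) or a (left, right) tuple of subtrees.
    states: dict mapping each leaf name to its character state at this site.
    Returns (possible ancestral states at this node, number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_states, left_changes = fitch(tree[0], states)
    right_states, right_changes = fitch(tree[1], states)
    shared = left_states & right_states
    if shared:
        return shared, left_changes + right_changes
    # No shared state: one extra change is needed on one of the two branches.
    return left_states | right_states, left_changes + right_changes + 1


def tree_length(tree, alignment):
    """Parsimony tree length: minimum number of changes summed over all sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical four-OTU alignment; score the three possible unrooted topologies.
alignment = {"OTU1": "AAGT", "OTU2": "AAGC", "OTU3": "GTAC", "OTU4": "GTAT"}
topologies = {
    "((OTU1,OTU2),(OTU3,OTU4))": (("OTU1", "OTU2"), ("OTU3", "OTU4")),
    "((OTU1,OTU3),(OTU2,OTU4))": (("OTU1", "OTU3"), ("OTU2", "OTU4")),
    "((OTU1,OTU4),(OTU2,OTU3))": (("OTU1", "OTU4"), ("OTU2", "OTU3")),
}
for label, tree in topologies.items():
    print(label, tree_length(tree, alignment))
# prints lengths 5, 8 and 7: the first topology is the most parsimonious
```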
Distance methods
Kinds: UPGMA, Neighbor-Joining, Wagner, etc.
additive (e.g., neighbor joining) or ultrametric (UPGMA)
[Figure: example tree of Drosophila and related species with node support values and a 0.02 scale bar]
Distance matrix
        OTU1   OTU2   OTU3
OTU2    .256
OTU3    .056   .139
OTU4    .176   .222   .312
Pros:
• uses overall similarities and differences between sequences as the measure
• simple to calculate and faster to compute
• statistical methods to evaluate trees
• can estimate genetic distances from branch lengths
Cons:
• doesn’t take into consideration models of character evolution (at most, a model is used to correct the pairwise distances)
• reduced phylogenetic information (character data are collapsed into pairwise distances)
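A minimal UPGMA sketch applied to the OTU1 to OTU4 distance matrix on this slide (read as a lower triangle). UPGMA repeatedly joins the closest pair of clusters and averages distances, weighted by cluster size; because each join sits at half the pairwise distance, the result is ultrametric, which implicitly assumes a molecular clock. The function name is hypothetical.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA tree and return it as a Newick string with branch lengths.

    names: OTU labels.
    dist:  dict mapping frozenset({a, b}) -> distance for every pair of labels.
    """
    # Active clusters: label -> (Newick subtree, number of OTUs, node height).
    clusters = {name: (name, 1, 0.0) for name in names}
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(clusters) > 1:
        # Join the closest pair; the new node sits at half their distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2
        tree_a, size_a, height_a = clusters.pop(a)
        tree_b, size_b, height_b = clusters.pop(b)
        merged = f"{a}+{b}"
        # Distance to the new cluster: average over its OTUs (size-weighted).
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        subtree = f"({tree_a}:{height - height_a:.5f},{tree_b}:{height - height_b:.5f})"
        clusters[merged] = (subtree, size_a + size_b, height)
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


# Distance matrix from the slide, read as a lower triangle.
matrix = {
    frozenset({"OTU1", "OTU2"}): 0.256,
    frozenset({"OTU1", "OTU3"}): 0.056,
    frozenset({"OTU2", "OTU3"}): 0.139,
    frozenset({"OTU1", "OTU4"}): 0.176,
    frozenset({"OTU2", "OTU4"}): 0.222,
    frozenset({"OTU3", "OTU4"}): 0.312,
}
print(upgma(["OTU1", "OTU2", "OTU3", "OTU4"], matrix))
# (OTU4:0.11833,(OTU2:0.09875,(OTU1:0.02800,OTU3:0.02800):0.07075):0.01958);
```

Neighbor joining works from the same matrix but corrects for unequal rates among lineages, so its tree need not be ultrametric.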
Maximum Parsimony
Moderate computing intensity
• Exhaustive searches: most intense (all trees are found and evaluated)
• Heuristic searches (not all trees are found and evaluated independently)
– branch and bound, closest neighbor swapping, min-mini algorithm
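Why exhaustive searches quickly become infeasible: the number of distinct unrooted, fully bifurcating trees for n taxa is (2n - 5)!! = 3 × 5 × ... × (2n - 5). A short sketch of that standard count (function names hypothetical):

```python
def count_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted, fully bifurcating trees for n taxa: (2n - 5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(4, n_taxa + 1):
        # Taxon k can be attached to any of the 2k - 5 branches of a tree on k - 1 taxa.
        count *= 2 * k - 5
    return count


def count_rooted_trees(n_taxa: int) -> int:
    """Rooted trees: (2n - 3)!!, the same as unrooted trees with one extra taxon."""
    return count_unrooted_trees(n_taxa + 1)


for n in (4, 5, 10, 11, 20):
    print(n, count_unrooted_trees(n))
# first four lines: 3, 15, 2027025, 34459425
```

An 11-taxon data set, such as the Drosophila example tree in these slides, already has 34,459,425 possible unrooted trees.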
Pros:
• Follows the philosophy of evolutionary theory; intuitive
• Multiple data sets (genes) can be combined in one analysis
• statistical methods to evaluate trees
• can estimate genetic distances from branch lengths
Cons:
• doesn’t take into consideration models of evolution as sophisticated as those used in maximum likelihood
• Only uses parsimony-informative characters (differences)
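Which characters parsimony actually uses: a site is parsimony-informative only if at least two different states each occur in at least two taxa; constant sites and single-taxon differences fit every topology equally well. A minimal sketch (names and toy sequences hypothetical; gaps and ambiguity codes are ignored):

```python
from collections import Counter


def is_parsimony_informative(column: str) -> bool:
    """True if at least two different states each occur in at least two taxa."""
    counts = Counter(base for base in column.upper() if base in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def informative_sites(seqs: list[str]) -> list[int]:
    """1-based positions of parsimony-informative columns in an alignment."""
    return [
        i + 1
        for i, column in enumerate(zip(*seqs))
        if is_parsimony_informative("".join(column))
    ]


print(informative_sites(["ACGTA", "ACGTC", "ATGCA", "ATACA"]))  # [2, 4]
```

With only three sequences, such as the three 28S sequences in this presentation, no site can be informative; at least four taxa are needed.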
Statistical tests for reliability of tree
Are nodes found repeatedly and not due to chance arrangements?
1. Bootstrapping
• Resampling the characters (alignment columns) with replacement to build pseudo-replicate data sets
• Repeating 500-1000 times
• Bootstrap value = percentage of replicates in which a node is recovered
• Strong phylogenetic signals should form nodes despite this resampling
• Used with parsimony, neighbor joining, minimum evolution
2. Compare total branch lengths among trees
• neighbor joining and minimum evolution algorithms
3. Interior Branch Length Test
• Are interior branch lengths significantly different from 0, judged using standard errors? (If not, maybe a node should be trifurcating.)
• Neighbor joining and minimum evolution algorithms
[Figure: example tree of Drosophila and related species with bootstrap values at nodes]
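A minimal bootstrap sketch. The resampling step is the part described on the slide: alignment columns are drawn with replacement to build each pseudo-replicate, and support is the percentage of replicates recovering a grouping. To stay self-contained it uses a simple stand-in tree builder for four OTUs (pick the pairing AB|CD with the smallest d(A,B) + d(C,D) on p-distances); in practice each replicate tree would come from parsimony, neighbor joining or minimum evolution. All names and sequences are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ (gaps are not treated specially)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def best_quartet_split(names, seqs):
    """Stand-in tree builder for four OTUs: choose the pairing AB|CD that
    minimises d(A,B) + d(C,D). Returns the pair containing names[0]."""
    d = {pair: p_distance(seqs[pair[0]], seqs[pair[1]]) for pair in combinations(range(4), 2)}
    pairings = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    best = min(pairings, key=lambda p: d[p[0]] + d[p[1]])
    return frozenset(names[i] for i in best[0])


def bootstrap_support(names, seqs, replicates=1000, seed=1):
    """Percentage of bootstrap replicates in which each split is recovered."""
    rng = random.Random(seed)
    n_sites = len(seqs[0])
    counts = Counter()
    for _ in range(replicates):
        # Resample whole alignment columns with replacement.
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]
        replicate = ["".join(seq[c] for c in columns) for seq in seqs]
        counts[best_quartet_split(names, replicate)] += 1
    return {split: 100 * n / replicates for split, n in counts.items()}


names = ["OTU1", "OTU2", "OTU3", "OTU4"]
seqs = ["AAGTACGTAC", "AAGTACGTAT", "GTACACGTAC", "GTATACGAAT"]
for split, percent in sorted(bootstrap_support(names, seqs).items(), key=lambda kv: -kv[1]):
    print(" + ".join(sorted(split)), f"{percent:.1f}%")
```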
Consensus and collapsed trees
Collapse uncertain nodes
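Majority-rule consensus and collapsing can be sketched by counting clades across a set of trees (for example, bootstrap replicates) and keeping those found in more than half of them; a clade below the cutoff is simply dropped, which collapses its node into a polytomy. Representing each tree as a set of clades is a simplifying assumption, and the taxa and trees here are hypothetical.

```python
from collections import Counter


def majority_rule_consensus(trees, threshold=0.5):
    """Return {clade: frequency} for clades found in more than `threshold` of the trees.

    Each tree is given as a collection of its clades (frozensets of taxon names).
    Clades at or below the threshold are dropped, i.e. their nodes are collapsed.
    """
    counts = Counter(clade for tree in trees for clade in set(tree))
    n_trees = len(trees)
    return {clade: n / n_trees for clade, n in counts.items() if n / n_trees > threshold}


# Three hypothetical trees on taxa A to E; frozenset("AB") is the clade {A, B}.
trees = [
    [frozenset("AB"), frozenset("ABC"), frozenset("DE")],
    [frozenset("AB"), frozenset("ABD"), frozenset("CE")],
    [frozenset("AB"), frozenset("ABC"), frozenset("DE")],
]
for clade, freq in sorted(majority_rule_consensus(trees).items(), key=lambda kv: len(kv[0])):
    print("".join(sorted(clade)), f"{freq:.0%}")
```

With a cutoff of 50% or more, the retained clades are always mutually compatible, so together they define a single consensus tree.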
Consensus vs. Combination
New topologies and gene trees
Gene tree of CRF family peptides in vertebrates (Boorse and Denver, 2005)
Phylogenetics software
383 phylogeny packages and 52 free servers
PAUP
PHYLIP
MacClade
Mesquite
MrBayes
MEGA
http://evolution.genetics.washington.edu/phylip/software.html
MEGA tutorial
1. Importing sequences
2. Alignment
3. Sequence statistics
4. Phylogenetic estimation
5. Visualization of trees