Lecture 6
Phylogenetic Inference
From Darwin’s notebook in 1837
Charles Darwin
Willi Hennig
From “The Origin” in 1859
Cladistics
Phylogenetic inference
Willi Hennig, Cladistics
1.  Clade, monophyletic group, natural group
a.  All individuals in the clade are derived from a single ancestor
b.  All of this ancestor’s descendants are in the clade
Monophyletic groups
[Figure: tree descending from a common Ancestor, with the labels Fishes, Coelacanths, Lungfishes, Tetrapods, and Sarcopterygians]
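The two-part definition of a clade can be checked mechanically. The sketch below is only an illustration: the tree is written as nested tuples, and the "Ray-finned fishes" outgroup is an assumption added so that "fishes" has more than sarcopterygian members. A group counts as monophyletic when some node's descendants are exactly the members of the group.

```python
# Tree as nested tuples; leaves are taxon names.
# "Ray-finned fishes" is an assumed outgroup, not a label from the slide.
TREE = ("Ray-finned fishes", ("Coelacanths", ("Lungfishes", "Tetrapods")))


def leaves(node):
    """Return the set of leaf names at or below a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def is_monophyletic(tree, group):
    """True if some node's descendants are exactly the taxa in `group`."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


sarcopterygians = {"Coelacanths", "Lungfishes", "Tetrapods"}
fishes = {"Ray-finned fishes", "Coelacanths", "Lungfishes"}

print(is_monophyletic(TREE, sarcopterygians))  # True
print(is_monophyletic(TREE, fishes))           # False
```

Under this assumed tree, "fishes" fails the test because the ancestor of all fishes also has tetrapods among its descendants.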
Phylogenetic inference
Definitions:
2. Ancestral vs. derived characters
[Figure: tree of taxa A, B, C, D]
Phylogenetic inference
Definitions:
3. Apomorphy: derived character
   Synapomorphy: shared derived character
[Figure: tree of taxa A, B, C, D with an apomorphy and a synapomorphy marked]
Phylogenetic inference
Definitions:
4. Evolutionary reversal
[Figure: tree with arrows marking a character that returns to its earlier state]
Phylogenetic inference
5. Homoplasy, convergent evolution
Fossa, Madagascar: mongoose
Mountain lion, California: cat
Thylacine, Tasmania: marsupial
Phylogenetic inference
6. Parallel evolution
Phylogenetic Inference
• phylogenetic trees are built from “characters”.
• characters can be morphological, behavioral,
physiological, or molecular.
• there are two important assumptions about the
characters used to build trees:
1. they are independent.
2. they are homologous.
What is a homologous character?
• a homologous character is shared by two species
because it was inherited from a common
ancestor.
• if a character is possessed by two species but was not
present in their recent ancestors, it is said to exhibit
“homoplasy”.
Types of homoplasy:
1. Convergent evolution
Example: evolution of eyes, flight.
Examples of convergent evolution
Convergent evolution between
placental and marsupial mammals
What is the difference between convergent and parallel evolution?
                     Convergent                               Parallel
Species compared     distantly related                        closely related
Trait produced by    different genes/developmental pathways   same genes/developmental pathways
Types of homoplasy:
1. Convergent evolution
Example: evolution of eyes, flight.
2. Parallel evolution
Example: lactose tolerance in human adults
3. Evolutionary reversals
Example: back mutations at the DNA sequence level (C → A → C).
Phylogenetic reconstructions
1.  Phenetics (Neighbor - Joining)
2.  Cladistics (Maximum Parsimony)
3.  Statistics (Maximum Likelihood)
Phylogenetic reconstructions
Phenetics (Distance Methods)
Aligned sequences (asterisks on the slide mark the differing sites):
A  ATGTTGCCA
B  AAGTTGCCA
C  ATCAACCCA
D  CTCAACTTA
Pairwise distance matrix:
   A  B  C  D
A
B  1
C  4  5
D  7  8  4
Phylogenetic reconstructions
Phenetics (Distance Methods)
(A,B) = 1
(A,B)C = (4+5)/2 = 4.5
(A,B)D = (7+8)/2 = 7.5
(A,B,C)D = (7+8+4)/3 = 6.3
[Figure: the distance matrix above next to the resulting tree ((A,B),C),D, with branch lengths 0.5, 2.25, 1.75, 3.15, and 0.9]
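The averaging on this slide can be reproduced in a few lines. This is a sketch of the slide's arithmetic only: the joining order ((A,B),C),D is taken from the slide as given, and the helper names are made up for illustration. A full clustering method would instead pick the smallest distance at every step.

```python
from itertools import product

# Pairwise distances copied from the slide's matrix.
D = {
    ("A", "B"): 1,
    ("A", "C"): 4, ("B", "C"): 5,
    ("A", "D"): 7, ("B", "D"): 8, ("C", "D"): 4,
}


def dist(x, y):
    """Look up a pairwise distance regardless of argument order."""
    return D[(x, y)] if (x, y) in D else D[(y, x)]


def mean_distance(cluster, other):
    """Average distance over all pairs drawn from the two clusters."""
    pairs = list(product(cluster, other))
    return sum(dist(x, y) for x, y in pairs) / len(pairs)


ab_c = mean_distance(("A", "B"), ("C",))        # 4.5
ab_d = mean_distance(("A", "B"), ("D",))        # 7.5
abc_d = mean_distance(("A", "B", "C"), ("D",))  # 19/3, shown as 6.3 on the slide

# Node heights are half the distance at which two clusters join.
h_ab = dist("A", "B") / 2   # 0.5
h_abc = ab_c / 2            # 2.25
h_root = abc_d / 2          # about 3.17 (3.15 on the slide, from the rounded 6.3)

print(h_ab, h_abc, h_abc - h_ab)                      # 0.5 2.25 1.75
print(round(h_root, 2), round(h_root - h_abc, 2))     # 3.17 0.92
```

The small differences from the slide's 3.15 and 0.9 come from the slide rounding 19/3 to 6.3 before halving.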
Phylogenetic reconstructions
Cladistics: Maximum Parsimony Method
[Figure: three alternative trees for taxa A, B, C, D, which carry the character states G, G, A, A. One tree requires 1 step; the other two require 3 steps each.]
Phylogenetic reconstructions
Cladistics: Maximum Parsimony
Number of possible rooted and unrooted trees
Number of taxa    Number of rooted trees    Number of unrooted trees
4                 15                        3
7                 10,395                    945
10                34,459,425                2,027,025
How do we select the “best” tree?
No. of taxa    No. of possible trees
4              3
5              15
6              105
7              945
10             2 x 10^6
11             34 x 10^6
50             3 x 10^74
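The counts in these tables follow the standard double-factorial formulas for strictly bifurcating trees: (2n-5)!! unrooted trees and (2n-3)!! rooted trees for n taxa. A short sketch, with function names chosen here for illustration:

```python
def double_factorial(k):
    """Product k * (k-2) * (k-4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees for n_taxa >= 3."""
    return double_factorial(2 * n_taxa - 5)


def rooted_trees(n_taxa):
    """Number of rooted bifurcating trees for n_taxa >= 2."""
    return double_factorial(2 * n_taxa - 3)


for n in (4, 5, 6, 7, 10, 11, 50):
    print(n, unrooted_trees(n), rooted_trees(n))
```

Because the count grows this fast, an exhaustive search over all trees quickly becomes impractical as taxa are added.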
Independent gain of camera eye requires two changes
Evolution and loss of camera eye requires six changes
Phylogenetic reconstructions
Phenetics (Distance Methods)
- what are the principles pheneticists use to construct phylogenies?
1. tree should reflect overall degree of similarity.
2. tree should be based on as many characters as possible.
3. tree should minimize the distance between taxa.
Phylogenetic reconstructions
Cladistics
1. tree should reflect the true phylogeny.
2. phylogeny should be based on characters that are shared (by
more than one taxon) and derived (from some known ancestral
state).
3.  the ancestral states of characters are inferred from an outgroup
that roots the tree.
- an outgroup is ideally picked from fossil evidence, i.e., it
belongs to a genus or family that existed prior to the taxa forming
the ingroup.
Each subspecies of seaside sparrow has a restricted range.
[Figure: range map of the subspecies maritima, junicola, macgillivraii, nigrescens, fisheri, and peninsulae along the Atlantic and Gulf coasts]
The subspecies separate into two groups when DNA sequences are compared.
[Figure: tree of the same six subspecies, split into an Atlantic coast group and a Gulf coast group]
How do distance trees differ from cladograms?
                   Distance trees         Cladograms
Characters used    as many as possible    synapomorphies only
Monophyly          not required           absolute requirement
Emphasis           branch lengths         branch-splitting
Outgroup           not required           absolute requirement
Phylogenetic reconstructions
3. Statistics (Maximum Likelihood)
The only method based on a mutation model!
Phylogenetic reconstructions
3. Maximum Likelihood
Jukes-Cantor Model
[Figure: substitution diagram for A, C, G, T in which every substitution occurs at the same rate α]
pAn = 3α
Kimura 2-parameter Model
[Figure: substitution diagram for A, C, G, T with transitions at rate α and transversions at rate β]
pAn = α + 2β
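The two totals, 3α and α + 2β, can be read off the rate tables of the models. The sketch below builds both tables as plain dictionaries; the function names and the example rates are assumptions chosen for illustration. It also includes the well-known Jukes-Cantor probability that a site shows the same base after time t, 1/4 + 3/4·exp(-4αt).

```python
import math

BASES = "ACGT"
# Transitions are purine<->purine (A, G) and pyrimidine<->pyrimidine (C, T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_rates(alpha, beta):
    """Kimura 2-parameter: transitions at rate alpha, transversions at beta."""
    return {
        (x, y): (alpha if (x, y) in TRANSITIONS else beta)
        for x in BASES for y in BASES if x != y
    }


def jc_rates(alpha):
    """Jukes-Cantor: every substitution at the same rate alpha."""
    return k2p_rates(alpha, alpha)


def rate_away_from(rates, base):
    """Total rate of change from `base` to any other nucleotide."""
    return sum(rate for (x, _), rate in rates.items() if x == base)


def jc_prob_same(alpha, t):
    """Jukes-Cantor probability that a site has the same base after time t."""
    return 0.25 + 0.75 * math.exp(-4 * alpha * t)


print(rate_away_from(jc_rates(0.1), "A"))         # 3 * alpha
print(rate_away_from(k2p_rates(0.2, 0.05), "A"))  # alpha + 2 * beta
print(jc_prob_same(0.1, 1.0))
```

Maximum likelihood uses such probabilities to score how well each candidate tree explains the observed sequences.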
Infer relationships among three species:
Outgroup:
Markov chain Monte Carlo
1. Start at an arbitrary point
2. Make a small random move
3. Calculate height ratio (r) of new state to old state:
   a. r > 1 -> new state accepted
   b. r < 1 -> new state accepted with probability r. If new state not accepted, stay in the old state
4. Go to step 2
[Figure: posterior landscape over three trees. An uphill move (2a) is always accepted; a downhill move (2b) is accepted sometimes. Time spent: tree 1, 20 %; tree 2, 48 %; tree 3, 32 %]
The proportion of time the MCMC procedure samples from a particular parameter region is an estimate of that region’s posterior probability density
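The four steps above can be sketched as a tiny Metropolis sampler. Everything specific here is an assumption for illustration: the three "heights" simply reuse the percentages from the figure, a "move" jumps to one of the other two trees with equal probability, and the seed and step count are arbitrary.

```python
import random
from collections import Counter

# Assumed posterior heights, borrowed from the figure's percentages.
HEIGHTS = {"tree 1": 0.20, "tree 2": 0.48, "tree 3": 0.32}


def metropolis(heights, steps, seed=0):
    """Run the accept/reject loop and return the share of time in each state."""
    rng = random.Random(seed)
    states = list(heights)
    current = states[0]                      # step 1: arbitrary start
    visits = Counter()
    for _ in range(steps):
        # step 2: propose one of the other states (a symmetric move)
        proposal = rng.choice([s for s in states if s != current])
        # step 3: height ratio of new state to old state
        r = heights[proposal] / heights[current]
        if r >= 1 or rng.random() < r:
            current = proposal               # accepted
        visits[current] += 1                 # otherwise stay in the old state
    return {s: visits[s] / steps for s in states}


freqs = metropolis(HEIGHTS, 100_000)
for tree, share in freqs.items():
    print(tree, round(share, 3))
```

With enough steps the shares should settle close to the heights, which is the point made in the slide's closing sentence.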
Phylogenetic Inference
Two points to keep in mind:
1. Phylogenetic trees are hypotheses
2. Gene trees are not the same as species trees
• a species tree depicts the evolutionary history of a
group of species.
• a gene tree depicts the evolutionary history of a
specific locus.
Conflict between gene trees and species trees
How do we select the “best” tree?
Evaluating tree support by bootstrapping
Species 1  A A C G C C T…
Species 2  A T C G C C T…
Species 3  A T T G A C C…
Species 4  A T T G A C C…
Step 1. Randomly select a site (alignment column) to represent position 1
Species 1  T
Species 2  T
Species 3  C
Species 4  C
Step 2. Randomly select a site to represent position 2
Species 1  T G
Species 2  T G
Species 3  C G
Species 4  C G
Step 3. Generate complete data set (sampling with replacement).
Step 4. Build tree and record if groupings match original tree.
Step 5. Repeat 1,000 times.
Evaluating tree support by bootstrapping
[Figure: tree of Species 1, Species 2, Species 3, and Species 4 with bootstrap support values 98 and 92 on its branches]
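The bootstrap steps can be sketched as follows. Several things here are assumptions: the seven-column alignment is one reading of the slide's example, the "build tree" step is replaced by a simple stand-in (two species count as grouped when they are closer to each other than either is to any other species), and the seed is arbitrary. A real analysis would rebuild the tree with the original method for every replicate.

```python
import random

# Assumed toy alignment (first seven columns as read from the slide).
ALIGNMENT = {
    "Species 1": "AACGCCT",
    "Species 2": "ATCGCCT",
    "Species 3": "ATTGACC",
    "Species 4": "ATTGACC",
}


def resample_columns(alignment, rng):
    """Steps 1-3: draw columns with replacement until the length is matched."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks)
            for name, seq in alignment.items()}


def hamming(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def groups_together(alignment, first, second):
    """Stand-in for step 4: is the pair closer than any cross-pair distance?"""
    within = hamming(alignment[first], alignment[second])
    others = [name for name in alignment if name not in (first, second)]
    across = min(hamming(alignment[a], alignment[o])
                 for a in (first, second) for o in others)
    return within < across


def bootstrap_support(alignment, first, second, replicates=1000, seed=0):
    """Step 5: percentage of replicates in which the grouping is recovered."""
    rng = random.Random(seed)
    hits = sum(
        groups_together(resample_columns(alignment, rng), first, second)
        for _ in range(replicates)
    )
    return 100 * hits / replicates


support = bootstrap_support(ALIGNMENT, "Species 1", "Species 2")
print(f"Support for (Species 1, Species 2): {support:.1f}%")
```

Values such as the 98 and 92 on the slide's tree are percentages of this kind, reported per grouping.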
Cospeciation of aphids and their
bacterial endosymbionts