An Introduction to Phylogenetics

Introduction to Phylogenies
Dr Laura Emery
[email protected]
www.ebi.ac.uk/training
Objectives
After this tutorial you should be able to…
• Use essential phylogenetic terminology effectively
• Discuss aspects of phylogenies and their implications for phylogenetic interpretation
• Apply phylogenetic principles to interpret simple trees
Outline
• Applications of phylogenetics
• What is a phylogeny or tree?
• Aspects of a tree
• Phylogenetic Interpretation
What can I do with phylogenetics?
• Deduce relationships among species or genes
• Deduce the origin of pathogens
• Identify biological processes that affect how your sequence has evolved, e.g. identify genes or residues undergoing positive selection
• Explore the evolution of traits through history
• Estimate the timing of major historical events
• Explore the impact of geography on species diversification
What is a phylogenetic tree?
Darwin 1837
A tree is an explanation of how sequences evolved, their genealogical relationships and thus how they came to be the way they are today (or at the time of sampling).
Phylogenies explain genealogical relationships
• Family tree
Aspects of a tree
1. Topology (branching order)
2. Branch lengths (indication of genetic change)
3. Nodes
   i. Tips (sampled sequences known as taxa)
   ii. Internal nodes (hypothetical ancestors)
   iii. Root (oldest point on the tree)
4. Confidence (bootstraps/probabilities)
1. Topology
The topology describes the branching structure of the tree, which indicates patterns of relatedness.
[Figure: example trees with tips A, B and C. One set of trees displays the same topology; the other set displays different topologies.]
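One way to check whether two drawings share a topology is to reduce each tree to a form that ignores the order in which branches are drawn at a node. The sketch below is illustrative only: it assumes rooted trees written as nested tuples of tip labels, and the names `canonical` and `same_topology` are hypothetical helpers, not part of the tutorial.

```python
def canonical(tree):
    """Return an order-independent form of a rooted tree given as nested tuples."""
    if isinstance(tree, str):  # a tip label
        return tree
    # Sort the children so that rotating branches around a node has no effect.
    return tuple(sorted((canonical(child) for child in tree), key=repr))


def same_topology(tree1, tree2):
    """True if the two rooted trees have the same branching order."""
    return canonical(tree1) == canonical(tree2)


t1 = (("A", "B"), "C")
t2 = ("C", ("B", "A"))  # t1 with branches rotated around its nodes
t3 = (("A", "C"), "B")  # a different branching order

print(same_topology(t1, t2))  # True
print(same_topology(t1, t3))  # False
```

Rotating branches around a node changes how a tree looks but not which taxa group together, which is why `t1` and `t2` compare as equal.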
Topology Question
Are these topologies the same?
Answer = yes
Topology Question II
Which of these trees has a different topology from the
others?
[Figure: several trees with tips A, B, C, D, E and F, drawn with different layouts]
2. Branch lengths indicate genetic change
[Figure: tree with branch lengths 0.8, 1.2, 0.5, 0.6, 0.5 and 0.5]
• Longer branches indicate greater change
• Change is typically represented in units of number of
substitutions per site (but check the legend)
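Because branch lengths measure change, the genetic distance between two tips can be read off a tree by adding up the branch lengths on the path that joins them. The following is a minimal sketch with a hypothetical four-node tree; the branch lengths and the names `PARENT`, `path_to_root` and `tip_distance` are assumptions made for illustration and do not come from the slides.

```python
# Hypothetical rooted tree: each node maps to (parent, length of the branch above it).
PARENT = {
    "A": ("n1", 0.5),
    "B": ("n1", 0.5),
    "n1": ("root", 0.6),
    "C": ("root", 1.2),
}


def path_to_root(node):
    """Return {ancestor: distance from node}, including the node itself."""
    distances = {node: 0.0}
    total = 0.0
    while node in PARENT:
        parent, length = PARENT[node]
        total += length
        distances[parent] = total
        node = parent
    return distances


def tip_distance(tip1, tip2):
    """Sum the branch lengths on the path joining two tips."""
    d1, d2 = path_to_root(tip1), path_to_root(tip2)
    # The most recent common ancestor is the shared ancestor closest to tip1.
    shared = [n for n in d1 if n in d2]
    mrca = min(shared, key=lambda n: d1[n])
    return d1[mrca] + d2[mrca]


print(tip_distance("A", "B"))
print(tip_distance("A", "C"))
```

In this made-up tree, A and C are separated by more substitutions per site than A and B, because the path between them crosses longer branches.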
A scale bar can represent branch lengths
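[Figure: the same tree drawn twice, once with labelled branch lengths (0.8, 1.2, 0.5, 0.6, 0.5, 0.5) and once with a scale bar (0.5)]
These are alternative representations of the same phylogeny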
Branch Length Question
Which of these statements are true?
1. For both gene trees, the Fish is the most genetically different of the four species compared
2. For both gene trees, more substitutions have occurred since the divergence of Dog and Snake than since the divergence of Cat and Snake
3. Gene B has accumulated more substitutions than Gene A on the Snake lineage
4. Gene B has accumulated more substitutions than Gene A on the Fish lineage
[Figure: gene trees for Gene A (scale value 0.5) and Gene B]
Alternative representations of phylogenies
All of these representations depict the same topology
Branch lengths are indicated in blue
Red lengths are meaningless
Not all trees include branch length data
Cladogram
Phylogram
Distance and substitution rate are confounded
• Branch lengths indicate the genetic change that has occurred
• We often don’t know if long branch lengths reflect:
  • A rapid evolutionary rate
  • An ancient divergence time
  • A combination of both
• Genetic change (substitutions/site) = Evolutionary rate (substitutions/site/year) × Divergence time (years)
[Figure: tree with tips A, B, C, D and E]
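A short worked example may make the confounding concrete. The rates and times below are invented purely for illustration; the sketch only applies the relationship genetic change = evolutionary rate × divergence time.

```python
import math


def genetic_change(rate, years):
    """Expected substitutions/site = rate (substitutions/site/year) x time (years)."""
    return rate * years


# Two hypothetical lineages (values are made up for illustration).
fast_and_recent = genetic_change(rate=1e-8, years=1e7)
slow_and_ancient = genetic_change(rate=1e-9, years=1e8)

# Both scenarios give the same branch length, so the branch length alone
# cannot tell us which scenario actually happened.
print(math.isclose(fast_and_recent, slow_and_ancient))  # True
```

Separating rate from time needs extra information beyond the branch lengths themselves.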
Alternative Representations Question
3. Nodes
[Figure: tree with tips A, B, C, D and E]
• Nodes occur at the ends of branches
• There are three types of nodes:
  i. Tips (sampled sequences known as taxa)
  ii. Internal nodes (hypothetical ancestors)
  iii. Root (oldest point on the tree)
Figures Andrew Rambaut
The root is the oldest point on the tree
[Figure: rooted tree with tips A, B, C, D and E, with a time axis running from past to present]
• The root indicates the direction of evolution
• It is also the (hypothesised) most recent common ancestor (MRCA) of all of the samples in the tree
Figures Andrew Rambaut
Trees can be drawn in an unrooted form
[Figure: the same tree, with tips A, B, C, D and E, drawn in rooted and unrooted form]
These are alternative representations of the same topology
There are multiple rooted tree topologies for any given unrooted tree
• Most tree-building methods produce unrooted trees
• Identifying the correct root is often critical for interpretation!
Figure Aiden Budd
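To see how many rooted trees one unrooted tree can give rise to, note that a root can be placed on any branch. A fully resolved (bifurcating) unrooted tree with n tips has 2n − 3 branches, so it has 2n − 3 possible root positions. The sketch below assumes a bifurcating tree; `root_positions` is a hypothetical helper name.

```python
def root_positions(n_tips):
    """Number of branches (possible root positions) in a bifurcating unrooted tree."""
    if n_tips < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 tips")
    return 2 * n_tips - 3


for n in (3, 5, 10):
    print(n, "tips:", root_positions(n), "possible rooted trees")

print(root_positions(5))  # 7
```

For the five-tip tree used in these slides, that means seven different rooted trees share the same unrooted topology.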
How to root a tree
• Midpoint rooting
  • Assumes a constant evolutionary rate
  • Often not the case!
• Outgroup rooting (recommended)
  • The outgroup is one or more taxa that are known to have diverged prior to the group being studied
  • The node where the outgroup lineage joins the other taxa is the root
[Figure: an unrooted tree alongside its midpoint-rooted and outgroup-rooted versions]
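Midpoint rooting can be sketched in a few lines: find the two tips that are furthest apart and place the root halfway along the path between them. The tree, its branch lengths and the function names below are hypothetical, and the approach inherits the constant-rate assumption noted above.

```python
from itertools import combinations

# Hypothetical unrooted tree as an undirected graph: node -> {neighbour: branch length}.
EDGES = [("A", "x", 0.2), ("B", "x", 0.3), ("x", "y", 0.1),
         ("C", "y", 0.3), ("D", "y", 0.9)]
TREE = {}
for u, v, length in EDGES:
    TREE.setdefault(u, {})[v] = length
    TREE.setdefault(v, {})[u] = length


def path(start, end, visited=None):
    """Return the list of nodes from start to end (the path is unique in a tree)."""
    visited = visited or {start}
    if start == end:
        return [start]
    for nxt in TREE[start]:
        if nxt not in visited:
            rest = path(nxt, end, visited | {nxt})
            if rest:
                return [start] + rest
    return None


def path_length(nodes):
    """Sum the branch lengths along a list of consecutive nodes."""
    return sum(TREE[a][b] for a, b in zip(nodes, nodes[1:]))


def midpoint_root():
    """Return (a, b, offset): the root lies on branch a-b, `offset` away from a."""
    tips = [n for n in TREE if len(TREE[n]) == 1]
    longest = max((path(t1, t2) for t1, t2 in combinations(tips, 2)),
                  key=path_length)
    half = path_length(longest) / 2
    walked = 0.0
    for a, b in zip(longest, longest[1:]):
        if walked + TREE[a][b] >= half:
            return a, b, half - walked
        walked += TREE[a][b]


print(midpoint_root())
```

In this made-up tree the most distant tips are B and D, so the root falls on the long branch leading to D. If D had simply evolved faster than the other taxa, that placement would be wrong, which is why outgroup rooting is preferred when a suitable outgroup is available.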
Root Question
This tree shows a cladogram i.e. the branch lengths do not
indicate genetic change.
Indicate any root positions where bird and crocodile are not
sister taxa (each other's closest relatives).
4. Confidence
How good is a tree?
A tree is a collection of hypotheses, so we assess our confidence in each of its parts or branches independently
[Figure: tree with support values on its branches (0.99, 100, 0.81, 63, 0.93, 85)]
There are three main approaches:
• Bootstraps
• Bayesian methods (probabilistic)
• Approximate likelihood ratio test (aLRT) methods (probabilistic)
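As a rough illustration of what a bootstrap value means: the alignment columns are resampled with replacement many times, a tree is built from each resampled alignment, and the support for a grouping is the percentage of those replicate trees that contain it. The sketch below covers only the final counting step. The replicate trees are invented, each is represented as a set of clades, and `bootstrap_support` is a hypothetical helper name.

```python
# Hypothetical replicate trees, each summarised as the set of clades it contains.
replicates = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AC"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
]


def bootstrap_support(replicate_trees, group):
    """Percentage of replicate trees in which `group` appears as a clade."""
    group = frozenset(group)
    hits = sum(group in clades for clades in replicate_trees)
    return 100 * hits / len(replicate_trees)


print(bootstrap_support(replicates, "AB"))  # 75.0
print(bootstrap_support(replicates, "AC"))  # 25.0
```

Four replicates are used here only to keep the example readable; real analyses typically use hundreds or more.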
What is a monophyletic group?
A monophyletic group (also described as a clade) is a group of taxa that share a more recent common ancestor with each other than with any other taxa.
[Figure: tree with a monophyletic group highlighted]
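The definition can be turned into a simple check: a set of taxa is monophyletic if some node in the rooted tree has exactly those taxa, and no others, as its descendants. This is a minimal sketch assuming a rooted tree written as nested tuples; the example tree and the function names are hypothetical.

```python
def tips(tree):
    """Return the set of tip labels below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(tips(child) for child in tree))


def is_monophyletic(tree, group):
    """True if some node of the rooted tree has exactly `group` as its tips."""
    group = set(group)
    if tips(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical rooted tree with six tips.
tree = ((("A", "B"), ("C", "D")), ("E", "F"))

print(is_monophyletic(tree, {"A", "B", "C", "D"}))  # True
print(is_monophyletic(tree, {"A", "B", "C"}))       # False: D shares their ancestor
```

The second call fails because the most recent common ancestor of A, B and C is also an ancestor of D, so leaving D out breaks the group.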
Confidence Question
Which of the bootstrap values indicates our confidence in the grouping of A, B, C, and D together as a monophyletic group? Do you think we can be confident in this grouping?
[Figure: tree with tips A, B, C, D, E and F and bootstrap values 100, 91, 63 and 84]
Review
1. Topology (branching order)
2. Branch lengths (indication of genetic change)
3. Nodes
   i. Tips (sampled sequences known as taxa)
   ii. Internal nodes (hypothetical ancestors)
   iii. Root (oldest point on the tree)
4. Confidence (bootstraps/probabilities)
Simple phylogenetic interpretation question
• Which is true?
  • A) Mouse is more closely related to fish than frog is to fish
  • B) Lizard is more closely related to fish than mouse is to fish
  • C) Human and frog are equally related to fish
Now it is your turn…
• Open your tutorial manual and begin Tree-thinking quiz 1 (appendix 1)
• The manual is available to download from:
http://www.ebi.ac.uk/training/course/scuola-di-bioinformatica2013
• When you are finished you can mark your own (the answers are at the end of the quiz).
• Remember to ask for help at any stage!