Download Introduction to Phylogenetics - Lectures For UG-5

Introduction to Phylogenetics
Phylogenetics
• Phylogenetics is the study of evolutionary
relationships among a group of organisms.
• Phylogenetic analysis is the means of inferring
or estimating these relationships.
• The result of phylogenetic studies is a
hypothesis about the evolutionary history of
taxonomic groups: their phylogeny.
Phylogenetic tree
• The evolutionary history inferred from phylogenetic analysis is
usually depicted as branching, treelike diagrams that represent an
estimated pedigree of the inherited relationships among molecules
("gene trees"), organisms, or both.
[Figure: species tree with tips Human, Mouse, Rat, Frog, Fugu, Tetraodon, Zebrafish]
Why phylogenetic trees?
• Phylogenetic trees are extremely handy tools used by
biologists to understand:
- the composition of genomes,
- relationships among genes in humans and other
species,
- the functions of proteins that run our cells,
- historical relationships among diverse species,
- the processes that generate unique body shapes,
- the origins of remarkable abilities in living
organisms.
Types of trees
Rooted trees reflect the most
basal ancestor of the tree in
question
Unrooted trees do not imply
a known ancestral root.
Terminologies
[Figure: annotated tree labelling the root, nodes (hypothetical common ancestors), internal branches, external branches, leaves/operational taxonomic units (OTUs), the outgroup, and branch lengths measured in nucleotide or amino acid substitutions]
Terminologies
Nodes: represent taxonomic units. Internal nodes correspond to ancestral
species that are not part of the data.
Internal branch: a branch between two internal nodes.
External branch: a branch between an internal node and a leaf (OTU). Leaves
are connected to the rest of the tree by external branches.
Leaf/operational taxonomic unit (OTU): the observed species (corresponding
to the data), which appear at the tips of the branches. OTUs may be
species, genes, or populations.
Horizontal branch length: the branch length usually represents the
number of changes that have occurred along that branch.
Outgroup: a taxon that is clearly more distantly related to the taxa of
interest than any of them is to one another.
Tree topology: the branching pattern of a tree.
Clade: the set of all taxa derived from a particular common ancestor.
Cladogenesis: the process of branching.
Unscaled trees: branch lengths are not proportional to the number of
changes.
Scaled trees: branch lengths are proportional to the number of changes.
Altering the position of the root changes the meaning of a phylogenetic tree
[Figure: trees of the taxa A, B, C, and D drawn with different root positions and taxon orders; panel captions: "C and D branch late", "C and D branch early"]
Changing the taxon order doesn't matter
Types of trees
• Simply shows relative recency of common ancestor
• A cladogram with branch lengths
• A dendrogram having all tips equidistant from root
Types of Phylogenetic Data
• Biomolecular sequences: DNA, RNA, amino acid,
in a multiple alignment
• Molecular markers (e.g., SNPs, RFLPs, etc.)
• Morphology
• Gene order and content
Molecular Data
There are two types of molecular data: characters and distances.
Characters: a character can be the nucleotide or amino acid at a site in a
DNA or protein sequence, or the presence or absence of a deletion or
insertion at a site. That is, each nucleotide or amino acid site in a DNA or
protein sequence can be considered a character site.
Taxa        Characters
Species A   ATGGCTATTCTTATAGTACG
Species B   ATCGCTAGTCTTATATTACA
Species C   TTCACTAGACCT--TGGTCCA
Species D   TTGACCAGACCT--TGGTCCG
Species E   TTGACCAGTTCT--TAGTTCG
Making trees using character-based methods
The main idea of character-based methods is to search for a tree
that requires the smallest number of evolutionary changes to
explain the differences among the OTUs under study.
Such a tree is called a maximum parsimony (most parsimonious, "simplest") tree.
As an example of tree-building using
maximum parsimony, consider these four
taxa:
AAG
AAA
GGA
AGA
How might they have evolved from a
common ancestor such as AAA?
Tree-building methods: Maximum
parsimony
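As a rough illustration of the idea, the Python sketch below counts the minimum number of changes each possible grouping of the four example taxa would need. It uses Fitch's small-parsimony algorithm, which the slides do not name, so treat the choice of algorithm and the function names `fitch_site` and `parsimony_score` as assumptions made for this example. Trees are written as nested tuples whose leaves are the sequences themselves.

```python
def fitch_site(node, site):
    """Return (possible states, changes needed) for one site of a subtree."""
    if isinstance(node, str):          # a leaf is simply an observed sequence
        return {node[site]}, 0
    left_set, left_cost = fitch_site(node[0], site)
    right_set, right_cost = fitch_site(node[1], site)
    common = left_set & right_set
    if common:                          # children agree: no extra change
        return common, left_cost + right_cost
    # children disagree: one substitution is required on this part of the tree
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, n_sites):
    """Sum the per-site minimum number of changes over all sites."""
    return sum(fitch_site(tree, site)[1] for site in range(n_sites))


# The three possible groupings of the four taxa from the example.
topologies = [
    (("AAG", "AAA"), ("GGA", "AGA")),
    (("AAG", "GGA"), ("AAA", "AGA")),
    (("AAG", "AGA"), ("AAA", "GGA")),
]
scores = [parsimony_score(tree, n_sites=3) for tree in topologies]
best_tree = topologies[scores.index(min(scores))]

for tree, score in zip(topologies, scores):
    print(tree, "needs", score, "changes")
```

Under these assumptions the grouping (AAG, AAA) with (GGA, AGA) needs three changes and the other two groupings need four each, so it is the most parsimonious of the three.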
Molecular Data
Distance: the other type of data is distance
data, which are computed from DNA or amino
acid sequence data. These data are also called
distance matrix data, because the distances
are usually presented in matrix form.
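One simple way a distance might be computed from sequences is the proportion of sites that differ between two aligned sequences (often called the p-distance). The slides do not say which distance measure is intended, so this choice and the function name `p_distance` are assumptions for illustration. The two sequences are Species A and Species B from the character table.

```python
def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return differences / len(seq1)


species_a = "ATGGCTATTCTTATAGTACG"
species_b = "ATCGCTAGTCTTATATTACA"

d_ab = p_distance(species_a, species_b)
print(d_ab)  # 4 differing sites out of 20
```

Repeating this for every pair of taxa would fill in one cell of the distance matrix per pair.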
Tree construction using distances
The simplest distance-based method is UPGMA.
UPGMA employs a sequential clustering algorithm, in which local topological
relationships are inferred in order of decreasing similarity and a phylogenetic
tree is built in a stepwise manner.
First we identify the two OTUs that are most similar to each other (having the
shortest distance) and treat them as a new single OTU.
Such an OTU is referred to as a composite OTU.
Subsequently, from among the new group of OTUs, we identify the pair with
the highest similarity, and so on, until only two OTUs are left.
UPGMA
Consider a case of four OTUs. The pairwise evolutionary distances are given by the
following matrix:
OTU   1     2     3
2     d12
3     d13   d23
4     d14   d24   d34
UPGMA: Construct a phylogenetic tree following the UPGMA algorithm.
Distance Matrix
      A   B   C   D   E   F
A     0
B     2   0
C     4   4   0
D     6   6   6   0
E     6   6   6   4   0
F     8   8   8   8   8   0
TRY THIS!!!
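For checking a hand-worked answer, the stepwise clustering described for UPGMA can be sketched in Python as below. This is a minimal illustration, not a reference implementation: the function name `upgma`, the nested-tuple tree representation, and the way ties are broken (first pair found) are all assumptions. The input is the six-OTU distance matrix from the exercise.

```python
from itertools import combinations

names = ["A", "B", "C", "D", "E", "F"]
lower_triangle = {          # distances from the exercise matrix
    "B": [2],
    "C": [4, 4],
    "D": [6, 6, 6],
    "E": [6, 6, 6, 4],
    "F": [8, 8, 8, 8, 8],
}
dist = {
    frozenset((row, names[col])): value
    for row, values in lower_triangle.items()
    for col, value in enumerate(values)
}


def upgma(names, dist):
    """Cluster OTUs by average distance; return (tree, root height)."""
    # label -> (subtree as nested tuples, number of leaves)
    clusters = {name: (name, 1) for name in names}
    d = dict(dist)
    height = 0.0
    while len(clusters) > 1:
        # pick the closest pair of current (possibly composite) OTUs
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = f"({a},{b})"
        for other in clusters:
            # size-weighted average of the distances to the two merged OTUs
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = ((tree_a, tree_b), size_a + size_b)
    (tree, _), = clusters.values()
    return tree, height


tree, root_height = upgma(names, dist)
print(tree, root_height)
```

With this matrix, A and B join first, C joins them next, D and E pair up, those two groups merge, and F joins last.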
Neighbor joining Method
Neighbor-joining (Saitou and Nei, 1987) is a method that is related
to the cluster method but does not require the data to be
ultrametric. In other words it does not require that all lineages have
diverged by equal amounts.
The method is especially suited for datasets comprising lineages
with widely varying rates of evolution.
Neighbor joining Method
The neighbor-joining method is especially useful for making a tree
having a large number of taxa.
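Begin by placing all the taxa in a star-like structure.
Compute pairwise distances among all OTUs.
Select the pair of OTUs (i and j) whose joining gives the smallest total
branch length for the tree; these are the neighbors. This is not always the
pair with the smallest raw distance. Group i and j in the tree and connect
these neighbors to the other OTUs via an internal branch, XY.
When two nodes are linked, their common ancestral node is added to the
tree and the terminal nodes with their respective branches are removed
from the tree.
The procedure is repeated on the reduced set of nodes until the tree is
fully resolved.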
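A single pair-selection step of neighbor joining can be sketched in Python as follows. The sketch uses the standard criterion Q(i, j) = (n - 2) d(i, j) - sum of row i - sum of row j and picks the pair with the smallest Q. The five-taxon matrix is hypothetical illustrative data, not taken from these slides, and the function name `nj_pick_neighbors` is likewise an assumption.

```python
from itertools import combinations

# Hypothetical distance matrix for five OTUs (illustrative values only).
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]


def nj_pick_neighbors(names, matrix):
    """Return the pair to join, its Q value, and the two new branch lengths."""
    n = len(names)
    row_sums = [sum(row) for row in matrix]
    best_pair, best_q = None, None
    for i, j in combinations(range(n), 2):
        q = (n - 2) * matrix[i][j] - row_sums[i] - row_sums[j]
        if best_q is None or q < best_q:
            best_pair, best_q = (i, j), q
    i, j = best_pair
    # branch lengths from each neighbor to the new ancestral node
    limb_i = matrix[i][j] / 2 + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    limb_j = matrix[i][j] - limb_i
    return names[i], names[j], best_q, limb_i, limb_j


first, second, q_value, limb_first, limb_second = nj_pick_neighbors(names, matrix)
print(first, second, q_value, limb_first, limb_second)
```

In this example the pair with the smallest raw distance is d and e (distance 3), yet the criterion selects a and b, which shows how the method allows for lineages evolving at different rates.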
Neighbor joining Method
[Figure: successive neighbor-joining steps, with panels labelled N=6, N=5, and N=4]
Applications of phylogeny
"Species" trees recover the genealogy of taxa, individuals of a
population, etc. Internal nodes represent speciation or other taxonomic
events. Species trees should contain sequences from only orthologous genes.
"Gene" trees represent the evolutionary history of the genes included in
the study. Gene trees can provide evidence for gene duplication events, as
well as speciation events. Sequences from different homologs can be
included in a gene tree; the subsequent analyses should cluster orthologs,
thus demonstrating the evolutionary history of the orthologs.
Tools/Software
• CLUSTALW
http://www.ebi.ac.uk/Tools/services/web/toolform.ebi?tool=clustalw2
• PHYLIP
http://evolution.genetics.washington.edu/phylip.html
• PAUP
http://paup.csit.fsu.edu/
• MEGA5
http://www.ncbi.nlm.nih.gov/pubmed/21546353