Computational Molecular Biology
Lecture Eleven: Introduction to phylogenetic trees
Semester I, 2009-10
Graham Ellis
NUI Galway, Ireland
Page from Darwin’s notebooks (c. 1837)
On the origin of species (excerpt)
The affinities of all the beings of the same class have sometimes
been represented by a great tree. I believe this simile largely speaks
the truth. The green and budding twigs may represent existing
species; and those produced during former years may represent the
long succession of extinct species. At each period of growth all the
growing twigs have tried to branch out on all sides, and to overtop
and kill the surrounding twigs and branches, in the same manner
as species and groups of species have at all times overmastered
other species in the great battle for life. The limbs divided into
great branches, and these into lesser and lesser branches, were
themselves once, when the tree was young, budding twigs; and this
connection of the former and present buds by ramifying branches
may well represent the classification of all extinct and living species
in groups subordinate to groups. Of the many twigs which
flourished when the tree was a mere bush, only two or three, now
grown into great branches, yet survive and bear the other branches;
so with the species which lived during long-past geological periods,
very few have left living and modified descendants.
Jargon
A eukaryote is an organism whose cells contain complex structures
enclosed within membranes.
Almost all species of large organisms are eukaryotes, including
animals, plants and fungi, although most species of eukaryotic
protists are microorganisms.
The defining membrane-bound structure that sets eukaryotic cells
apart from prokaryotic cells is the nucleus.
Tree of life today
Darwin’s tree model is still considered valid for eukaryotic life
forms. The earliest branch of the eukaryote tree yields four
supergroups:
◮ Plants (green and red algae, and plants),
◮ Unikonts (amoebas, fungi, and all animals - including humans),
◮ Excavates (free-living organisms and parasites),
◮ and SAR (a recently identified main group, abbreviated from
  Stramenopiles, Alveolates, and Rhizaria, the names of some of
  its members).
Tree of life today
Biologists now recognize that the prokaryotes (the bacteria and
archaea) can transfer genetic information between unrelated
organisms through horizontal gene transfer (HGT).
Recombination, gene loss, duplication, and gene creation are a few
of the processes by which genes can be transferred within and
between bacterial and archaeal species, causing variation that is
not due to vertical transfer.
Darwin’s tree is a useful tool in understanding the basic processes
of evolution but cannot explain the full complexity of the situation.
More jargon
A graph consists of a set of vertices and a set of edges joining
certain pairs of vertices.
A graph is connected if, for any pair of vertices A, B, there exists a
path of edges starting at A and ending at B.
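The definition of connectedness can be checked mechanically. The following is a minimal sketch, assuming a graph is given as a list of vertices and a list of vertex pairs; the function name `is_connected` and the example graphs are hypothetical and chosen only for illustration.

```python
from collections import deque

def is_connected(vertices, edges):
    """Return True if every vertex can be reached from every other one."""
    # Build an adjacency map: vertex -> set of neighbouring vertices.
    neighbours = {v: set() for v in vertices}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    if not neighbours:
        return True  # assumption: the empty graph counts as connected
    # Breadth-first search from an arbitrary start vertex.
    start = next(iter(neighbours))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in neighbours[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    # Connected exactly when the search reached every vertex.
    return len(seen) == len(neighbours)

# Hypothetical example graphs on the vertices A, B, C, D.
vertices = ["A", "B", "C", "D"]
path_graph = [("A", "B"), ("B", "C"), ("C", "D")]  # one piece
two_pieces = [("A", "B"), ("C", "D")]              # no path from A to C
```

Here `is_connected(vertices, path_graph)` holds, while `is_connected(vertices, two_pieces)` does not.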
And more
A tree is a connected graph with no cycles, that is, no path of
distinct edges that returns to its starting vertex.
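One possible way to test whether a graph is a tree is sketched below. It relies on two standard facts: a tree on n vertices has exactly n - 1 edges, and a graph with n - 1 edges and no cycle is automatically connected. The union-find bookkeeping and the name `is_tree` are illustrative choices, not part of the lecture.

```python
def is_tree(vertices, edges):
    """Return True if the graph is connected and has no cycles."""
    vertices = list(vertices)
    # A tree on n vertices has exactly n - 1 edges.
    if len(edges) != len(vertices) - 1:
        return False
    # Union-find: parent[v] points towards the representative of v's component.
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # a and b are already joined, so this edge closes a cycle
        parent[ra] = rb
    return True

# Hypothetical examples on the vertices A, B, C, D.
star = [("A", "B"), ("A", "C"), ("A", "D")]      # a tree
triangle = [("A", "B"), ("B", "C"), ("C", "A")]  # a cycle, with D left isolated
```

With vertices A, B, C, D, the star is a tree, whereas the triangle has the right number of edges but contains a cycle and leaves D unreachable.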
Phylogenetic trees
A phylogenetic tree is a tree showing the evolutionary relationships
among various biological species or other entities that are believed
to have a common ancestor.
In a phylogenetic tree, each node with descendants represents the
most recent common ancestor of the descendants.
The edge lengths in some trees correspond to time estimates or
"distances" between species.
Each node is called a taxonomic unit. Internal nodes are generally
called hypothetical taxonomic units as they cannot be directly
observed.
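To illustrate the idea of a most recent common ancestor, here is a minimal sketch that assumes a rooted tree stored as a child-to-parent map. The taxa `s1`, `s2`, `s3` and the internal nodes `h1`, `root` are hypothetical, and a node is treated as an ancestor of itself for simplicity.

```python
def ancestors(parent, node):
    """List node followed by all its ancestors, up to the root."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def most_recent_common_ancestor(parent, a, b):
    """Return the lowest node lying above (or equal to) both a and b."""
    above_a = set(ancestors(parent, a))
    # Walk upwards from b; the first node also above a is the answer.
    for node in ancestors(parent, b):
        if node in above_a:
            return node
    return None  # a and b lie in different trees

# Hypothetical rooted tree: s1, s2, s3 are observed taxonomic units,
# h1 and root are hypothetical taxonomic units (internal nodes).
parent = {"s1": "h1", "s2": "h1", "h1": "root", "s3": "root"}
```

In this small tree the most recent common ancestor of `s1` and `s2` is `h1`, while that of `s1` and `s3` is `root`.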
Edit distance between DNA strands
Given two strings X, Y of letters (or nucleotides) we can think of
the score of an optimal alignment (for some choice of µ, ρ) as being
a measure of the distance between X and Y.
Alternatively, we can count the minimum number of
insertions/deletions and substitutions/mismatches needed to
convert X into Y, and use this count as a measure of the distance
between X and Y. This is known as the edit distance between X
and Y.
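The edit distance can be computed by dynamic programming. The sketch below uses the standard table-filling recurrence with unit cost for each insertion, deletion and substitution; the function name `edit_distance` is an illustrative choice, and the code is one common formulation rather than necessarily the one used in this course.

```python
def edit_distance(x, y):
    """Minimum number of insertions, deletions and substitutions turning x into y."""
    m, n = len(x), len(y)
    # d[i][j] holds the edit distance between the prefixes x[:i] and y[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i letters of x[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all j letters of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion from x
                d[i][j - 1] + 1,         # insertion into x
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[m][n]
```

For instance, `edit_distance("AACCGGTT", "TACTGGTT")` is 2, since the strings have equal length and differ in exactly two positions, and `edit_distance("ACGT", "AGT")` is 1 (a single deletion).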
Example
Consider V=AACCGGTT.
One substitution/mismatch produces W=AACCGGTA.
A different substitution/mismatch of V produces X=TACCGGTT.
A substitution/mismatch of X produces Y=TACTGGTT.
A different substitution/mismatch of X produces Z=TACCGATT.
The words V, W, X, Y, Z can be represented in a phylogenetic
tree where each edge is of length 1, and the edit distance between
two words is the number of edges in the path joining the words.
[Figure: tree on the vertices V, W, X, Y, Z with edges V-W, V-X, X-Y and X-Z]
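The claim that path lengths in this tree agree with edit distances can be checked directly. The sketch below assumes the tree has the edges V-W, V-X, X-Y and X-Z, as read off from the substitutions in the example; the helper names `edit_distance` and `path_length` are illustrative.

```python
from collections import deque
from itertools import combinations

words = {"V": "AACCGGTT", "W": "AACCGGTA", "X": "TACCGGTT",
         "Y": "TACTGGTT", "Z": "TACCGATT"}
# Assumed tree edges, one per substitution in the example.
tree_edges = [("V", "W"), ("V", "X"), ("X", "Y"), ("X", "Z")]

def edit_distance(x, y):
    """Unit-cost edit distance, keeping only two rows of the table."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        curr = [i]
        for j, b in enumerate(y, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (a != b)))  # match or substitution
        prev = curr
    return prev[-1]

def path_length(edges, start, goal):
    """Number of edges on the path from start to goal (breadth-first search)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in neighbours.get(v, []):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist[goal]

# For each pair of words, record whether the two distances agree.
agreement = {
    (p, q): edit_distance(words[p], words[q]) == path_length(tree_edges, p, q)
    for p, q in combinations(sorted(words), 2)
}
```

Every one of the ten pairs agrees; for example W and Y are joined by the three-edge path W-V-X-Y and have edit distance 3.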
ClustalW2
Try using the ClustalW2 software to reproduce the above tree,
starting from the data
V=AACCGGTT
W=AACCGGTA
X=TACCGGTT
Y=TACTGGTT
Z=TACCGATT
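ClustalW2 accepts sequences in FASTA format, so one convenient way to prepare the data is to generate FASTA text from the five words. This is a minimal sketch; the function name `to_fasta` and the file name in the comment are hypothetical, and no ClustalW2 options are assumed.

```python
sequences = {
    "V": "AACCGGTT",
    "W": "AACCGGTA",
    "X": "TACCGGTT",
    "Y": "TACTGGTT",
    "Z": "TACCGATT",
}

def to_fasta(seqs):
    """Format a name -> sequence mapping as FASTA text (one record per sequence)."""
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())

fasta_text = to_fasta(sequences)

# To save the text for use with ClustalW2 (hypothetical file name):
# with open("words.fasta", "w") as handle:
#     handle.write(fasta_text)
```

Each record consists of a header line beginning with `>` followed by the sequence on its own line.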