Computational Molecular Biology
Lecture Eleven: Introduction to phylogenetic trees
Semester I, 2009-10
Graham Ellis
NUI Galway, Ireland

[Figure: page from Darwin’s notebooks (c. 1837)]

On the origin of species (excerpt)
The affinities of all the beings of the same class have sometimes been represented by a great tree. I believe this simile largely speaks the truth. The green and budding twigs may represent existing species; and those produced during former years may represent the long succession of extinct species. At each period of growth all the growing twigs have tried to branch out on all sides, and to overtop and kill the surrounding twigs and branches, in the same manner as species and groups of species have at all times overmastered other species in the great battle for life. The limbs divided into great branches, and these into lesser and lesser branches, were themselves once, when the tree was young, budding twigs; and this connection of the former and present buds by ramifying branches may well represent the classification of all extinct and living species in groups subordinate to groups. Of the many twigs which flourished when the tree was a mere bush, only two or three, now grown into great branches, yet survive and bear the other branches; so with the species which lived during long-past geological periods, very few have left living and modified descendants.

Jargon
A eukaryote is an organism whose cells contain complex structures enclosed within membranes. Almost all species of large organisms are eukaryotes, including animals, plants and fungi, although most species of eukaryotic protists are microorganisms.
The defining membrane-bound structure that sets eukaryotic cells apart from prokaryotic cells is the nucleus.

Tree of life today
Darwin’s tree model is still considered valid for eukaryotic life forms. The earliest branch of the eukaryote tree yields four supergroups:
◮ Plants (green and red algae, and plants),
◮ Unikonts (amoebas, fungi, and all animals, including humans),
◮ Excavates (free-living organisms and parasites),
◮ and SAR (a recently identified main group, abbreviated from Stramenopiles, Alveolates, and Rhizaria, the names of some of its members).

Biologists now recognize that the prokaryotes, the bacteria and archaea, have the ability to transfer genetic information between unrelated organisms through horizontal gene transfer (HGT). Recombination, gene loss, duplication, and gene creation are a few of the processes by which genes can be transferred within and between bacterial and archaeal species, causing variation that is not due to vertical transfer. Darwin’s tree is a useful tool in understanding the basic processes of evolution but cannot explain the full complexity of the situation.
More jargon
A graph consists of a set of vertices and a set of edges joining certain pairs of vertices. A graph is connected if, for any pair of vertices A, B, there exists a path of edges starting at A and ending at B.

And more
A tree is a connected graph with no loops (cycles).

Phylogenetic trees
A phylogenetic tree is a tree showing the evolutionary relationships among various biological species or other entities that are believed to have a common ancestor. In a phylogenetic tree, each node with descendants represents the most recent common ancestor of the descendants. The edge lengths in some trees correspond to time estimates or "distances" between species. Each node is called a taxonomic unit. Internal nodes are generally called hypothetical taxonomic units as they cannot be directly observed.
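The graph definitions from the jargon slides (connected, and a tree as a connected graph with no loops) can be checked mechanically. The following is a minimal sketch, not part of the lecture: the function name `is_tree` and the example vertex labels are illustrative assumptions, and it relies on the standard fact that a connected graph on n vertices has no cycles exactly when it has n - 1 edges.

```python
from collections import deque


def is_tree(vertices, edges):
    """Return True if the undirected graph is connected and has no cycles.

    `vertices` is an iterable of labels; `edges` is a list of pairs of labels.
    Assumption: an empty vertex set is treated as "not a tree".
    """
    vertices = list(vertices)
    if not vertices:
        return False

    # Build adjacency sets for the undirected graph.
    adjacency = {v: set() for v in vertices}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Breadth-first search from an arbitrary vertex to test connectivity.
    seen = {vertices[0]}
    queue = deque([vertices[0]])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    connected = len(seen) == len(vertices)

    # A connected graph on n vertices is acyclic exactly when it has n - 1 edges.
    return connected and len(edges) == len(vertices) - 1


star = [("A", "B"), ("B", "C"), ("B", "D")]
print(is_tree("ABCD", star))                 # connected, no cycle
print(is_tree("ABCD", star + [("C", "D")]))  # contains the cycle B-C-D
print(is_tree("ABCD", star[:2]))             # D is disconnected
```

Only the first of the three example graphs is a tree; the second has a loop and the third is not connected.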
Edit distance between DNA strands
Given two strings X, Y of letters (or nucleotides) we can think of the score of an optimal alignment (for some choice of µ, ρ) as being a measure of the distance between X and Y. Alternatively, we can count the minimum number of insertions/deletions and substitutions/mismatches needed to convert X into Y, and use this count as a measure of the distance between X and Y. This is known as the edit distance between X and Y.

Example
Consider V=AACCGGTT. One substitution/mismatch produces W=AACCGGTA. A different substitution/mismatch of V produces X=TACCGGTT. A substitution/mismatch of X produces Y=TACTGGTT. A different substitution/mismatch of X produces Z=TACCGATT. The words V, W, X, Y, Z can be represented in a phylogenetic tree where each edge is of length 1, and the edit distance between two words is the number of edges in the path joining the words.

[Figure: the tree on V, W, X, Y, Z, with edges V-W, V-X, X-Y and X-Z as constructed above]

ClustalW2
Have a go at using the ClustalW2 software to reproduce the above tree, starting from the data
V=AACCGGTT
W=AACCGGTA
X=TACCGGTT
Y=TACTGGTT
Z=TACCGATT
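The claim in the example, that the edit distance between two words equals the number of edges on the path joining them, can be checked directly. The sketch below is an illustration rather than the lecture's own method: it computes edit distance with the usual unit-cost dynamic programming recurrence, and it takes the tree edges V-W, V-X, X-Y, X-Z from the way the words were constructed. The names `edit_distance`, `path_length`, `words` and `tree_edges` are hypothetical, and the code is independent of ClustalW2.

```python
from collections import deque
from itertools import combinations


def edit_distance(x, y):
    """Minimum number of insertions, deletions and substitutions turning x into y."""
    previous = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, a in enumerate(x, start=1):
        current = [i]
        for j, b in enumerate(y, start=1):
            current.append(min(
                previous[j] + 1,             # delete a from x
                current[j - 1] + 1,          # insert b into x
                previous[j - 1] + (a != b),  # substitute a by b, or match
            ))
        previous = current
    return previous[-1]


def path_length(edges, start, goal):
    """Number of edges on the path from start to goal in a tree (breadth-first search)."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adjacency.get(v, []):
            if w not in distance:
                distance[w] = distance[v] + 1
                queue.append(w)
    return distance[goal]


words = {
    "V": "AACCGGTT",
    "W": "AACCGGTA",
    "X": "TACCGGTT",
    "Y": "TACTGGTT",
    "Z": "TACCGATT",
}
# Assumed edges, read off from the construction in the example.
tree_edges = [("V", "W"), ("V", "X"), ("X", "Y"), ("X", "Z")]

for a, b in combinations(words, 2):
    d = edit_distance(words[a], words[b])
    assert d == path_length(tree_edges, a, b)
    print(a, b, d)
```

For these five words every pairwise edit distance agrees with the path length in the tree, which is what makes the tree a faithful picture of the data.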