Key Aspects of Molecular Evolution

1. Why study phylogenetics and molecular evolution?
2. Key aspects: genetic code, mutations
3. Alignments and homology
4. Construction of phylogenetic trees
5. Substitution models

Why study phylogenetics and molecular evolution?

"Nothing in biology makes sense except in the light of evolution" - Theodosius Dobzhansky, 1973 (The American Biology Teacher 35:125)

"Nothing in evolutionary biology makes sense except in the light of a phylogeny" - Jeff Palmer, Douglas Soltis, Mark Chase, 2004 (American J. Botany 91: 1437-1445)

Evolutionary thinking
• Alfred Russel Wallace writes to Charles Darwin (June 17th 1858)
• Ernst Haeckel (mid-19th century): the tree of life
• The neo-synthesis (Fisher, Haldane, and Wright, 1930-1950)

The molecular REvolution
• Nuttall, 1904: serological cross-reactions used to study phylogenetic relationships among various groups of animals.
• Watson and Crick: the beautiful helix!
• Zuckerkandl and Pauling, 1965: molecular clocks.
• Fitch & Margoliash, 1967: Construction of phylogenetic trees. A method based on mutation distances as estimated from cytochrome c sequences is of general applicability (Science, 155:279-284).
• Kimura, 1968: Evolutionary rate at the molecular level (Nature, 217:624-626).
The birth of molecular evolution

Key aspects: genetic code, mutations
Phylogenetic Inference

Genetic code
In the RNA that encodes a protein, each triplet of bases is recognized by the ribosome as a code for a specific amino acid. The genetic code is nearly universal across organisms, with only a few exceptions such as mitochondria. There are 64 possible triplets: 61 sense codons (encoding the 20 amino acids) and 3 nonsense codons (stop codons). An open reading frame (ORF), that is, a reading frame able to encode a protein, starts with a codon for methionine and ends with a stop codon.
Point mutations
• Errors in the replication of genetic information can result in the incorporation of a noncomplementary nucleotide: point mutations.
• Point mutations at the 1st, 2nd, and 3rd codon positions usually (96%), always (100%), and rarely (30%) result in an amino-acid change, respectively.
• Point mutations that do not result in an amino-acid change are called synonymous.
• Point mutations that result in an amino-acid change are called non-synonymous.

Transitions and transversions
• Transitions are changes between two purines (A, G) or between two pyrimidines (C, T): Pu-Pu, Py-Py.
• Transversions are changes from a purine to a pyrimidine or the reverse: Pu-Py or Py-Pu.

Point mutations and the genetic code
• 4 possible transitions (counting each direction): A↔G, C↔T
• 8 possible transversions (counting each direction): A↔C, A↔T, G↔C, G↔T
• Thus, if mutations were random, transversions would be twice as likely as transitions.
• Due to steric hindrance (as well as negative selection!), the opposite is true: transitions generally occur more often than transversions (2-15 times more, depending on the gene region and the species).

Indel mutations
• Errors in the replication of genetic information can also result in deletions or insertions of one or more nucleotides: indel mutations.
• When three nucleotides (or a multiple of three) are inserted or deleted in a coding region, the ORF remains intact, but one or more amino acids are inserted or deleted.
• In any other case, the indel disrupts the ORF, and the resulting gene codes for an entirely different protein, with a different length than the original one.
• Viruses often encode several proteins from a single gene by using overlapping ORFs.
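The transition/transversion counts and the multiple-of-three rule for indels can be checked with a minimal Python sketch. The function names `classify_substitution` and `indel_keeps_frame` are illustrative assumptions, not part of the slides, and each direction of change is counted separately.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = "ACGT"


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition', 'transversion', or 'identical' for a pair of bases."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES:
        raise ValueError("expected one of A, C, G, T")
    if ref == alt:
        return "identical"
    # A transition stays within the same chemical class (Pu-Pu or Py-Py).
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def indel_keeps_frame(length: int) -> bool:
    """An indel leaves the reading frame intact only if its length is a multiple of 3."""
    return length > 0 and length % 3 == 0


# Enumerate all 12 directed changes between distinct bases.
changes = [(a, b) for a in BASES for b in BASES if a != b]
n_transitions = sum(classify_substitution(a, b) == "transition" for a, b in changes)
n_transversions = len(changes) - n_transitions
print(n_transitions, n_transversions)
```

Under purely random mutation the 4:8 ratio printed here is what one would expect; the observed excess of transitions is an empirical finding that this sketch does not model.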
Genetic mechanisms exploited by viruses to generate variability
• Mutation and hypermutation
• Insertions and deletions
• Recombination
• Reassortment

Quasispecies dynamics
[Figure: master sequence and mutant spectrum; axis label: concentration (Schuster P, 2008)]
• A complex population of closely genetically related variants
• Low fidelity of the RNA-dependent RNA polymerase
• Cooperation and complementation

Alignments and homology

Alignments
• There are three main methods of sequence alignment: 1) Manual 2) Automatic (dynamic programming, progressive alignment) 3) Combined
• An alignment is a hypothesis of positional homology between nucleotides/amino acids.
• Homologous sequences are usually aligned such that homologous sites form columns in the alignment.

Easy:
Human beta       PEEKSAVTALWGKV
Horse beta       GEEKAAVLALWDKV
Human alpha      PADKTNVKAAWGKV
Horse alpha      AADKTNVKAAWSKV
Whale myoglobin  EGEWQLVLHVWAKV
Lamprey globin   AAEKTKIRSAWAPV
Lupin globin     ESQAALVKSSWEEF

Difficult due to indels:
Human beta       VHLTN--FFESFGDLST
Horse beta       VQLSN--FFDSFGDLSN
Human alpha      -VLSGAHYFPHF-DLS-
Horse alpha      -VLSGGHYFPHF-DLS-
Whale myoglobin  -VLSEADKFDRFKHLKT
Lamprey globin   APLSYSTFFPKFKGLTT
Lupin globin     GALTNANLFSFLKGTSE

Dynamic programming
• Dynamic programming (Needleman and Wunsch, 1970; Gotoh, 1982) is an exhaustive method that finds the best alignment by assigning substitution scores to all pairs of aligned residues and gap penalties (GP) to indels.

Unaligned:   Aligned:
GATTTC       GAT-TTC
GAATTC       GA-ATTC

• To prevent excessive use of gaps, indels are usually penalized using so-called gap penalties.
• Alignment programs have separate penalties for inserting a gap (gap opening) and for extending a gap (gap extension).
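A minimal sketch of global alignment by dynamic programming in the style of Needleman and Wunsch follows. It assumes a single linear gap penalty rather than the separate gap-opening and gap-extension penalties mentioned above, and the default scores (match +1, mismatch 0, gap -1) and the function name `needleman_wunsch` are illustrative choices, not a reference implementation.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = 0, gap: int = -1):
    """Return (best score, aligned a, aligned b) for a global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best_score, aligned_a, aligned_b = needleman_wunsch("GATTC", "GAATTC")
print(best_score)
print(aligned_a)
print(aligned_b)
```

Several alignments can share the best score, so the traceback returns only one of them; which one depends on the order in which the three moves are tested.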
Dynamic programming: scoring scheme
• Match: +1
• Mismatch: 0
• Indel: -1

GATTC-
GAATTC
1 1 0 1 0 -1    Alignment score = 2

GA-TTC
GAATTC
1 1 -1 1 1 1    Alignment score = 4

G-ATTC
GAATTC
1 -1 1 1 1 1    Alignment score = 4

Multiple sequence alignments
• Phylogenetic trees are based on multiple sequence alignments.
• Dynamic programming can be used to align multiple sequences, but the time required grows exponentially with the number of sequences.
• Until the end of 1989, multiple sequence alignments were assembled by hand because the exhaustive alignment of more than five or six sequences is computationally unfeasible.
• Now most multiple sequence alignments are constructed by the method known as progressive sequence alignment (Feng and Doolittle, 1987; Higgins and Sharp, 1988).

Progressive alignment
• Progressive alignment is a heuristic method: it does not guarantee an alignment with the best score under a given scoring formula.
• 1) Perform all possible pairwise alignments between each pair of sequences using a fast, approximate method.
• 2) Calculate the "distance" between each pair of sequences and construct a crude "guide tree" with the Neighbor-Joining method.
• 3) Build up the alignment gradually by following the branching order in the tree. Each step is treated as a dynamic programming pairwise alignment, sometimes with each member of a "pair" containing more than one sequence.

Progressive alignment - step 1
[Figure: guide tree with tips 1-5]
1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgacagcta
3. gctcgatacacgatgactagcta
4. gctcgatacacgatgacgagcga
5. ctcgaacgatacgatgactagct

Sequences 1 and 2 are aligned first:
1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgac-agcta

Progressive alignment - step 2
Sequences 3 and 4 are aligned next:
3. gctcgatacacgatgactagcta
4. gctcgatacacgatgacgagcga

Progressive alignment - step 3
The alignment of 1 and 2 is aligned with the alignment of 3 and 4:
1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgac-agcta
3. gctcgatacacga---tgactagcta
4. gctcgatacacga---tgacgagcga

Progressive alignment - final step
Sequence 5 is added to the alignment of 1-4:
1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgac-agcta
3. gctcgatacacga---tgactagcta
4. gctcgatacacga---tgacgagcga
5. -ctcga-acgatacgatgactagct-

Construction of phylogenetic trees
[Figure: large phylogenetic tree of BR sequences (tip labels such as BR.96.RJ081, BR.03.56ST, BR.01.M19); node support values 77 and 100; scale bar 0.05]
• Inferring phylogenetic relationships from molecular sequences requires selecting one of the many available methods.
• Phylogenetic inference is often regarded as a "black box" in which "sequences go in and trees come out".

Fundamental objectives
1. Develop a conceptual framework for understanding the theoretical (philosophical) foundations that distinguish the different inference methods (classification of methods).
2. Present the use of models and assumptions in phylogenetics.

Tree reconstruction methods

                                              Character-based methods                   Distance-based methods
Based on an explicit model of evolution       Maximum likelihood, Bayesian inference    Neighbour-Joining, Minimum Evolution, UPGMA
Not based on an explicit model of evolution   Maximum parsimony                         (none)
Neighbor-Joining method
The NJ method (Saitou and Nei, 1987) is a heuristic for estimating the minimum-evolution tree. It is based on the minimum evolution principle and constructs internal nodes by joining nearest neighbors (two taxa connected by a single node) at each step.

Note: the walkthrough below is simplified. It joins the pair with the smallest distance and averages the distances to the new cluster, whereas the full NJ algorithm selects the pair using distances corrected for each taxon's total divergence from all others.

Distance matrix (PAM)
            Spinach   Rice   Mosquito   Monkey   Human
Spinach        0.0    84.9     105.6     90.8     86.3
Rice          84.9     0.0     117.8    122.4    122.6
Mosquito     105.6   117.8       0.0     84.7     80.8
Monkey        90.8   122.4      84.7      0.0      3.3
Human         86.3   122.6      80.8      3.3      0.0

Neighbor-Joining (1)
The distance 3.3 (Human - Monkey) is the minimum, so we join Human and Monkey into MonHum.
[Tree so far: Mon-Hum joins Monkey and Human; Mosquito, Spinach, and Rice are not yet joined]
Recalculate the distance matrix. Each distance to MonHum is the average of the Monkey and Human distances, for example Spinach-MonHum = (90.8 + 86.3) / 2 ≈ 88.6.

Neighbor-Joining (2)
            Spinach   Rice   Mosquito   MonHum
Spinach        0.0    84.9     105.6     88.6
Rice          84.9     0.0     117.8    122.5
Mosquito     105.6   117.8       0.0     82.8
MonHum        88.6   122.5      82.8      0.0
The minimum is now 82.8 (Mosquito - MonHum), so we join them into Mos-(Mon-Hum).
[Tree so far: Mos-(Mon-Hum); Rice and Spinach are not yet joined]
Recalculate the distance matrix.

Neighbor-Joining (3)
             Spinach   Rice   MosMonHum
Spinach         0.0    84.9      97.1
Rice           84.9     0.0     120.2
MosMonHum      97.1   120.2       0.0
The minimum is now 84.9 (Spinach - Rice), so we join them into Spin-Rice.
[Tree so far: Mos-(Mon-Hum) and Spin-Rice]
Recalculate the distance matrix.

Neighbor-Joining (4)
             SpinRice   MosMonHum
SpinRice         0.0      108.7
MosMonHum      108.7        0.0
The last join gives (Spin-Rice)-(Mos-(Mon-Hum)).

Unrooted Neighbor-Joining tree
[Figure: unrooted tree with tips Human, Monkey, Mosquito, Spinach, Rice]
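The join-and-recalculate steps shown on the slides can be sketched in Python as below. This reproduces the simplified procedure of the walkthrough (join the closest pair, then average the two joined rows); it is not the full Saitou and Nei algorithm, and the function name `cluster_by_closest_pair` is a hypothetical label chosen for illustration. Intermediate values are not rounded, so the final distance differs slightly from the 108.7 on the slide.

```python
from itertools import combinations


def cluster_by_closest_pair(names, dist):
    """Repeatedly join the closest pair; distances to the new cluster are
    the mean of the distances to its two members (slide procedure, not full NJ)."""
    d = {frozenset(pair): value for pair, value in dist.items()}
    active = list(names)
    joins = []
    while len(active) > 1:
        # Pick the pair of active clusters with the smallest distance.
        a, b = min(combinations(active, 2), key=lambda p: d[frozenset(p)])
        joins.append((a, b, d[frozenset((a, b))]))
        merged = f"({a},{b})"
        rest = [x for x in active if x not in (a, b)]
        for x in rest:
            d[frozenset((merged, x))] = (d[frozenset((a, x))] + d[frozenset((b, x))]) / 2
        active = rest + [merged]
    return active[0], joins


names = ["Spinach", "Rice", "Mosquito", "Monkey", "Human"]
pam = {
    ("Spinach", "Rice"): 84.9, ("Spinach", "Mosquito"): 105.6,
    ("Spinach", "Monkey"): 90.8, ("Spinach", "Human"): 86.3,
    ("Rice", "Mosquito"): 117.8, ("Rice", "Monkey"): 122.4,
    ("Rice", "Human"): 122.6, ("Mosquito", "Monkey"): 84.7,
    ("Mosquito", "Human"): 80.8, ("Monkey", "Human"): 3.3,
}

tree, joins = cluster_by_closest_pair(names, pam)
print(tree)
for left, right, distance in joins:
    print(f"join {left} + {right} at distance {distance:.2f}")
```

The returned string is a nested-parentheses grouping without branch lengths; estimating branch lengths would need the additional NJ formulas that the slides do not cover.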
Distance-based methods
Advantages:
- very fast
- allow the use of an explicit model of evolution
Disadvantages:
- produce only one best tree (they give no idea about other potential trees)
- reduce all sequence information to a single distance value
- are generally outperformed by maximum likelihood or Bayesian methods in choosing the correct tree in computer simulations