Key Aspects of Molecular Evolution
1. Why study phylogenetics and molecular evolution
2. Key aspects: genetic code, mutations
3. Alignments and homology
4. Construction of phylogenetic trees
5. Substitution models
Why study phylogenetics and molecular evolution?
“Nothing in biology makes sense except in the light of evolution”
- Theodosius Dobzhansky, 1973
(The American Biology Teacher 35:125)
“Nothing in evolutionary biology makes sense except in the light of a phylogeny”
- Jeff Palmer, Douglas Soltis, Mark Chase, 2004
(American J. Botany 91: 1437-1445)
The evolutionary thinking
• Alfred Russel Wallace writes to Charles Darwin (June 17th 1858)
• Ernst Haeckel (mid-19th century): the tree of life
• The neo-synthesis (Fisher, Haldane, and Wright, 1930-1950)
The molecular REvolution
• Nuttall, 1904: Serological cross-reactions to study
phylogenetic relationships among various groups of
animals.
• Watson and Crick: the beautiful helix!
• Zuckerkandl and Pauling, 1965: molecular clocks.
• Fitch & Margoliash, 1967: Construction of
phylogenetic trees. A method based on mutation
distances as estimated from cytochrome c sequences
is of general applicability (Science, 155:279-284).
• Kimura, 1968: Evolutionary rate at the molecular level
(Nature, 217:624-626).
The birth of molecular evolution
Key aspects: genetic code, mutations
Phylogenetic inference
Genetic code
• In the RNA that encodes a protein, each triplet of bases is
recognized by the ribosome as the code for a specific amino
acid.
• This genetic code is nearly universal: all organisms share it, with
only a few exceptions such as mitochondria.
• There are 64 possible triplets: 61 sense codons (encoding the 20
amino acids) and 3 nonsense codons (stop codons).
• A reading frame that is able to encode a protein (open
reading frame, ORF) starts with a codon for methionine and
ends with a stop codon.
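The codon counts and the ORF definition above can be checked with a short sketch. It assumes the standard genetic code written in the usual TCAG order; the helper name `find_orf` and the test sequence are hypothetical, and only the forward strand of an A/C/G/T sequence is scanned.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def find_orf(dna):
    """Return the first ATG...stop stretch on the forward strand, or None."""
    dna = dna.upper()
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Read codon by codon from this ATG until a stop codon is met.
        for pos in range(start, len(dna) - 2, 3):
            if CODON_TABLE[dna[pos:pos + 3]] == "*":
                return dna[start:pos + 3]
    return None

stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense = [c for c, aa in CODON_TABLE.items() if aa != "*"]
print(len(CODON_TABLE), len(sense), stops)  # 64 61 ['TAA', 'TAG', 'TGA']
print(find_orf("ccATGGCTTAAgg"))            # ATGGCTTAA
```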
Point Mutations
• Errors in duplication of genetic information can result
in the incorporation of a noncomplementary
nucleotide: point mutations.
• Point mutations at the 1st, 2nd, and 3rd codon position
usually (96%), always (100%), and rarely (30%) result
in an amino-acid change, respectively.
• Point mutations that do not result in an amino-acid
change are called synonymous.
• Point mutations that result in an amino-acid change
are called non-synonymous.
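The per-position percentages can be approximated by enumerating every single-base change in the standard genetic code. This is a sketch under a stated counting convention (only sense codons are mutated, and changes that create a stop codon are skipped), so the exact figures may differ slightly from those quoted above; the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def nonsynonymous_fraction(position):
    """Fraction of single-base changes at a codon position (0, 1 or 2) that alter the amino acid.

    Assumption: only sense codons are mutated; changes that create a stop codon are skipped.
    """
    changed = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for base in BASES:
            if base == codon[position]:
                continue
            mutant_aa = CODON_TABLE[codon[:position] + base + codon[position + 1:]]
            if mutant_aa == "*":
                continue
            total += 1
            changed += mutant_aa != aa
    return changed / total

for pos in range(3):
    print(f"codon position {pos + 1}: {nonsynonymous_fraction(pos):.0%} non-synonymous")
```

Under this convention the second position always changes the amino acid, the first almost always, and the third only in a minority of cases.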
Transitions and transversions
[Figure: the four bases A, C, G and T connected by arrows marking transitions and transversions]
Transitions are purine-to-purine (A ↔ G) or pyrimidine-to-pyrimidine (C ↔ T) mutations: Pu-Pu, Py-Py.
Transversions are purine-to-pyrimidine mutations or the
reverse: Pu-Py or Py-Pu.
Point mutations and the genetic code
• 4 possible transitions: A ↔ G, C ↔ T
• 8 possible transversions: A ↔ C, A ↔ T,
G ↔ C, G ↔ T
• Thus, if mutations were random, transversions
would be twice as likely as transitions.
• Due to steric hindrance (as well as negative
selection!), the opposite is true: transitions
generally occur more often than transversions
[2-15 times more, depending on the gene
region and the species].
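The 4-versus-8 count can be reproduced by listing all twelve ordered base changes and classifying each one. A minimal sketch; the function name `classify` is hypothetical.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify(ref, alt):
    """Label a substitution as a transition (same chemical class) or a transversion."""
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

substitutions = list(permutations("ACGT", 2))  # 12 ordered base changes
transitions = [s for s in substitutions if classify(*s) == "transition"]
transversions = [s for s in substitutions if classify(*s) == "transversion"]
print(len(transitions), len(transversions))  # 4 8
```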
Indel Mutations
• Errors in duplication of genetic information can also result in
deletions or insertions of one or more nucleotides: indel
mutations.
• When three nucleotides (or a multiple of three) are inserted
or deleted in a coding region, the ORF remains intact, but
one or more amino acids are inserted or deleted.
• In any other case, the indel shifts the reading frame and the
resulting gene codes for an entirely different protein, usually with a
different length than the original one.
• Viruses often encode several proteins from a single gene by
using overlapping ORFs.
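The difference between an in-frame indel and a frameshift can be seen on a toy example. The ORF below is hypothetical and chosen only so that a one-base deletion exposes an early stop codon; the translation uses the standard genetic code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Translate codon by codon from the first base, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

orf = "ATGGCTGAAATGACCTAA"      # hypothetical ORF
in_frame = orf[:3] + orf[6:]     # delete one whole codon (3 nucleotides)
frameshift = orf[:3] + orf[4:]   # delete a single nucleotide

print(translate(orf))         # MAEMT
print(translate(in_frame))    # MEMT  (one residue lost, rest unchanged)
print(translate(frameshift))  # MLK   (different residues, different length)
```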
Genetic mechanisms exploited by viruses to generate variability
Mutation and hypermutation
Insertions and deletions
Recombination
Reassortment
Quasispecies dynamics
Quasispecies dynamics
[Figure: concentration of variants across the mutant spectrum, centred on the master sequence (Schuster P, 2008)]
Complex population of closely genetically related variants
Low fidelity of the RNA-dependent RNA polymerase
Cooperation and complementation
Alignments and homology
Alignments
• There are three main methods of sequence
alignment:
1) Manual
2) Automatic (Dynamic programming, Progressive
alignment)
3) Combined
• An alignment is a hypothesis of positional
homology between nucleotides or amino acids.
• Homologous sequences are usually aligned such that
homologous sites form columns in the alignment.
Easy:
Human beta       PEEKSAVTALWGKV
Horse beta       GEEKAAVLALWDKV
Human alpha      PADKTNVKAAWGKV
Horse alpha      AADKTNVKAAWSKV
Whale myoglobin  EGEWQLVLHVWAKV
Lamprey globin   AAEKTKIRSAWAPV
Lupin globin     ESQAALVKSSWEEF
Difficult due to indels:
Human beta       VHLTN--FFESFGDLST
Horse beta       VQLSN--FFDSFGDLSN
Human alpha      -VLSGAHYFPHF-DLS-
Horse alpha      -VLSGGHYFPHF-DLS-
Whale myoglobin  -VLSEADKFDRFKHLKT
Lamprey globin   APLSYSTFFPKFKGLTT
Lupin globin     GALTNANLFSFLKGTSE
Dynamic programming
• Dynamic programming (Needleman and Wunsch, 1970; Gotoh, 1982) is
an exhaustive method that finds the best alignment by assigning
substitution scores to all pairs of aligned residues and gap
penalties (GP) to indels.
GATTTC
GAATTC
GAT-TTC
GA-ATTC
• To prevent excessive use of gaps, indels are usually penalized
with a gap penalty (GP).
• Alignment programs have separate penalties for inserting a
gap (gap opening) and for extending a gap (gap extension).
Scoring Scheme:
• Match: +1
• Mismatch: 0
• Indel: -1
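Applying this scoring scheme to an existing alignment is a column-by-column sum. The sketch below is illustrative only; the function name `alignment_score` is hypothetical, and it is applied to two gapped pairings of GATTC and GAATTC.

```python
def alignment_score(row_a, row_b, match=1, mismatch=0, indel=-1):
    """Sum the column scores of two gapped rows of equal length ('-' marks a gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            score += indel      # column contains a gap
        elif a == b:
            score += match      # identical residues
        else:
            score += mismatch   # different residues
    return score

print(alignment_score("GATTC-", "GAATTC"))  # 2
print(alignment_score("GA-TTC", "GAATTC"))  # 4
```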
Dynamic programming
GATTC-
GAATTC
Column scores: 1 1 0 1 0 -1
Alignment score = 2
Dynamic programming
GA-TTC
GAATTC
Column scores: 1 1 -1 1 1 1
Alignment score = 4
Dynamic programming
G-ATTC
GAATTC
Column scores: 1 -1 1 1 1 1
Alignment score = 4
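Rather than scoring candidate alignments one at a time, dynamic programming fills a score matrix and traces back through it. The following is a minimal Needleman-Wunsch sketch with a single linear gap cost (no separate opening and extension penalties), using the match/mismatch/indel values of the scoring scheme; it returns one optimal alignment, and ties may be broken differently by other implementations.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, indel=-1):
    """Global alignment with a linear gap cost; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * indel
    for j in range(1, cols):
        score[0][j] = j * indel

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + indel,
                              score[i][j - 1] + indel)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + indel:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))

best, row_a, row_b = needleman_wunsch("GATTC", "GAATTC")
print(best)
print(row_a)
print(row_b)
```

For GATTC and GAATTC the best achievable score is 4, reached by placing a single gap opposite either of the two adjacent A residues.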
Multiple Sequence Alignments
• Phylogenetic trees are based on multiple sequence
alignments.
• Dynamic programming can be used to align multiple
sequences, but the time required grows exponentially with
the number of sequences.
• Until the end of 1989, multiple sequence alignments were
assembled by hand because the exhaustive alignment of
more than five or six sequences is computationally
infeasible.
• Now, most multiple sequence alignments are constructed
by the method known as progressive sequence alignment
(Feng and Doolittle, 1987; Higgins and Sharp, 1988).
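To get a feel for the exponential growth, one can count the cells of the dynamic-programming lattice that an exhaustive alignment would have to fill. This is a back-of-the-envelope sketch that assumes all sequences have the same length and ignores the work done per cell; the function name is hypothetical.

```python
def dp_cells(num_sequences, length):
    """Cells in the full dynamic-programming lattice for sequences of equal length."""
    return (length + 1) ** num_sequences

# Two sequences of 100 residues need about 10^4 cells; six need about 10^12.
for n in (2, 3, 6):
    print(n, dp_cells(n, 100))
```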
Progressive Alignment
• Progressive alignment is a heuristic method: it does not
guarantee an alignment with the best possible score
under a given scoring formula.
1) Perform all possible pairwise alignments between each
pair of sequences using a fast/approximate method.
2) Calculate the ‘distance’ between each pair of sequences
and construct a crude “guide tree” with the Neighbor-Joining
method.
3) Build up the alignment gradually by following the
branching order in the tree; each step is treated as
a dynamic programming pairwise alignment, sometimes
with each member of a ‘pair’ containing more than one
sequence.
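As a rough illustration of steps 1 and 2, the sketch below computes all pairwise distances with an alignment-free shortcut (shared 3-letter words) and picks the closest pair, which a guide tree would merge first. The k-mer distance is an assumption standing in for the "fast/approximate method"; it is not the measure used by any particular program, and the function names and five toy sequences are for illustration only.

```python
from itertools import combinations

def kmers(seq, k=3):
    """Set of all overlapping k-letter words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_distance(a, b, k=3):
    """1 - Jaccard similarity of the two k-mer sets: a crude, alignment-free distance."""
    ka, kb = kmers(a, k), kmers(b, k)
    return 1 - len(ka & kb) / len(ka | kb)

seqs = {
    "1": "gctcgatacgatacgatgactagcta",
    "2": "gctcgatacaagacgatgacagcta",
    "3": "gctcgatacacgatgactagcta",
    "4": "gctcgatacacgatgacgagcga",
    "5": "ctcgaacgatacgatgactagct",
}

# Step 1: every pair is compared once.
distances = {(x, y): kmer_distance(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}

# Step 2 (first decision only): the closest pair is the first candidate to be joined.
closest = min(distances, key=distances.get)
print(closest, round(distances[closest], 3))
```

A full implementation would feed the whole distance table to a tree-building method and then align sequences and groups in the order the tree dictates.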
Progressive alignment - step 1
[Guide tree with tips 1-5]
1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgacagcta
3. gctcgatacacgatgactagcta
4. gctcgatacacgatgacgagcga
5. ctcgaacgatacgatgactagct
Sequences 1 and 2 aligned:
1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgac-agcta
Progressive alignment - step 2
[Guide tree with tips 1-5]
1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgac-agcta
3. gctcgatacacgatgactagcta
4. gctcgatacacgatgacgagcga
5. ctcgaacgatacgatgactagct
Sequences 3 and 4 aligned:
3. gctcgatacacgatgactagcta
4. gctcgatacacgatgacgagcga
Progressive alignment - step 3
[Guide tree with tips 1-5]
1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgac-agcta
+
3. gctcgatacacgatgactagcta
4. gctcgatacacgatgacgagcga
Result:
1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgac-agcta
3. gctcgatacacga---tgactagcta
4. gctcgatacacga---tgacgagcga
Progressive alignment - final step
[Guide tree with tips 1-5]
1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgac-agcta
3. gctcgatacacga---tgactagcta
4. gctcgatacacga---tgacgagcga
+
5. ctcgaacgatacgatgactagct
Result:
1. gctcgatacgatacgatgactagcta
2. gctcgatacaagacgatgac-agcta
3. gctcgatacacga---tgactagcta
4. gctcgatacacga---tgacgagcga
5. -ctcga-acgatacgatgactagct-
Construction of phylogenetic trees
• Inferring phylogenetic relationships from molecular sequences
requires choosing one of the many available methods.
• Phylogenetic inference is often regarded as a “black box” in which
“sequences go in and trees come out”.
[Figure: large phylogenetic tree with tip labels of the form BR.96.RJ081; support values shown on some branches; scale bar 0.05]
Fundamental objectives
1. Develop a conceptual framework for understanding the theoretical
(philosophical) foundations that distinguish the different inference
methods (classification of methods).
2. Present the use of models and assumptions in phylogenetics.
Tree Reconstruction Methods
                               Character-based methods   Distance-based methods
Methods based on an            Maximum-likelihood        Neighbour-Joining
explicit model of evolution    Bayesian Inference        Minimum Evolution
                                                         UPGMA
Methods not based on an        Maximum-parsimony
explicit model of evolution
Neighbor-Joining method
• The NJ method (Saitou and Nei, 1987) is a heuristic method for
estimating the minimum evolution tree.
• The NJ method is based on the minimum evolution principle and
constructs internal nodes by joining nearest neighbors (two taxa
connected by a single node) in each step.
Distance Matrix (PAM)
            Spinach    Rice  Mosquito  Monkey   Human
Spinach         0.0    84.9     105.6    90.8    86.3
Rice           84.9     0.0     117.8   122.4   122.6
Mosquito      105.6   117.8       0.0    84.7    80.8
Monkey         90.8   122.4      84.7     0.0     3.3
Human          86.3   122.6      80.8     3.3     0.0
Neighbor-Joining (1)
Distance 3.3 (Human - Monkey) is the minimum, so we join
Human and Monkey into MonHum.
[Tree so far: Mon-Hum joins Monkey and Human; Mosquito, Spinach and Rice not yet joined]
Recalculate the distance matrix again...
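Strictly speaking, the NJ method of Saitou and Nei does not pick the smallest raw distance; it picks the pair that minimizes a criterion corrected for each taxon's total distance to all others, commonly written Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of row i. The sketch below evaluates that criterion on the five-taxon PAM matrix; the function name is hypothetical. For this matrix the corrected criterion and the raw minimum select the same pair.

```python
from itertools import combinations

TAXA = ["Spinach", "Rice", "Mosquito", "Monkey", "Human"]
D = [
    [0.0, 84.9, 105.6, 90.8, 86.3],
    [84.9, 0.0, 117.8, 122.4, 122.6],
    [105.6, 117.8, 0.0, 84.7, 80.8],
    [90.8, 122.4, 84.7, 0.0, 3.3],
    [86.3, 122.6, 80.8, 3.3, 0.0],
]

def nj_pair(dist):
    """Return the index pair (i, j) minimizing Q(i, j) = (n - 2) * d(i, j) - r_i - r_j."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]

    def q(pair):
        i, j = pair
        return (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]

    return min(combinations(range(n), 2), key=q)

i, j = nj_pair(D)
print(TAXA[i], TAXA[j])  # Monkey Human
```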
Neighbor-Joining (2)
Distance Matrix (PAM)
            Spinach    Rice  Mosquito  MonHum
Spinach         0.0    84.9     105.6    88.6
Rice           84.9     0.0     117.8   122.5
Mosquito      105.6   117.8       0.0    82.8
MonHum         88.6   122.5      82.8     0.0
[Tree so far: Mos-(Mon-Hum); Rice and Spinach not yet joined]
Recalculate the distance matrix again...
Neighbor-Joining (3)
Distance Matrix (PAM)
            Spinach    Rice  MosMonHum
Spinach         0.0    84.9       97.1
Rice           84.9     0.0      120.2
MosMonHum      97.1   120.2        0.0
[Tree so far: Mos-(Mon-Hum) and Spin-Rice]
Recalculate the distance matrix again...
Neighbor-Joining (4)
Distance Matrix (PAM)
            SpinRice  MosMonHum
SpinRice         0.0      108.7
MosMonHum      108.7        0.0
[Final tree: (Spin-Rice)-(Mos-(Mon-Hum))]
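The recalculated matrices in this walkthrough are consistent with a simplified rule: join the pair with the smallest distance, then set the new cluster's distance to every other taxon to the plain average of its two members' distances. The sketch below implements that simplified averaging rule, not the full NJ update; the function name is hypothetical. Because it does not round intermediate values to one decimal, the last digit can differ slightly from the matrices shown.

```python
TAXA = ["Spinach", "Rice", "Mosquito", "Monkey", "Human"]
D = [
    [0.0, 84.9, 105.6, 90.8, 86.3],
    [84.9, 0.0, 117.8, 122.4, 122.6],
    [105.6, 117.8, 0.0, 84.7, 80.8],
    [90.8, 122.4, 84.7, 0.0, 3.3],
    [86.3, 122.6, 80.8, 3.3, 0.0],
]

def join_by_averaging(names, dist):
    """Repeatedly merge the closest pair; the merged cluster's distance to each
    remaining cluster is the mean of its two members' distances."""
    names = list(names)
    d = {(a, b): dist[i][j]
         for i, a in enumerate(names) for j, b in enumerate(names) if i != j}
    joins = []
    while len(names) > 1:
        pairs = [(x, y) for i, x in enumerate(names) for y in names[i + 1:]]
        a, b = min(pairs, key=lambda p: d[p])
        joins.append((a, b, d[(a, b)]))
        merged = f"({a},{b})"
        rest = [n for n in names if n not in (a, b)]
        for n in rest:
            d[(merged, n)] = d[(n, merged)] = (d[(a, n)] + d[(b, n)]) / 2
        names = rest + [merged]
    return joins

joins = join_by_averaging(TAXA, D)
for a, b, dist_ab in joins:
    print(f"join {a} + {b} at distance {dist_ab:.2f}")
```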
Unrooted Neighbor-Joining Tree
[Figure: unrooted tree with tips Human, Monkey, Mosquito, Spinach and Rice]
Distance-based methods
• Advantages:
- very fast
- allows the use of an explicit model of evolution
• Disadvantages:
- only produces one best tree (we do not get any idea about other
potential trees)
- reduces all sequence information to a single distance value
- generally outperformed by maximum-likelihood or Bayesian methods
in choosing the correct tree in computer simulations