Introduction to bioinformatics
lecture 9
Multiple sequence alignment (II)
Scoring a profile position
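[Figure: two profiles, Profile 1 and Profile 2, each with one row per amino acid (A, C, D, ..., Y) and one column per alignment position.]
• At each position (column) we have different residue frequencies for each amino acid (rows)
SO:
• Instead of saying S = M(aa1, aa2) (one residue pair)
• For frequency f > 0 (amino acid is actually there) we take:
$$S = \sum_{i=1}^{20} \sum_{j=1}^{20} f_{aa_i} \, f_{aa_j} \, M(aa_i, aa_j)$$
where f_{aa_i} is the frequency of amino acid i in the column of Profile 1 and f_{aa_j} is the frequency of amino acid j in the column of Profile 2.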
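As a minimal sketch of the profile position score above, the snippet below sums frequency-weighted substitution scores over all residue pairs that actually occur in two columns. The names `column_frequencies`, `toy_matrix` and `profile_position_score` are hypothetical, and the toy match/mismatch scores stand in for a real substitution matrix M.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_frequencies(column):
    """Residue frequencies of one alignment column; gaps and unknowns are ignored."""
    residues = [r for r in column if r in AMINO_ACIDS]
    if not residues:
        return {}
    return {aa: residues.count(aa) / len(residues) for aa in set(residues)}

def toy_matrix(a, b):
    """Stand-in for a substitution matrix M (assumption: +2 match, -1 mismatch)."""
    return 2.0 if a == b else -1.0

def profile_position_score(freqs1, freqs2, M=toy_matrix):
    """S = sum_i sum_j f(aa_i) * f(aa_j) * M(aa_i, aa_j), over residues with f > 0."""
    return sum(f1 * f2 * M(a, b)
               for a, f1 in freqs1.items()
               for b, f2 in freqs2.items())

# Column "AAC-" (A: 2/3, C: 1/3) against column "AA" (A: 1)
example_score = profile_position_score(column_frequencies("AAC-"),
                                       column_frequencies("AA"))
```

With single-residue columns the sum collapses to one term, which is the ordinary pairwise score M(aa1, aa2).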
Progressive alignment
1. Perform pair-wise alignments of all of the sequences;
2. Use the alignment scores to produce a dendrogram using neighbour-joining methods (guide-tree);
3. Align the sequences sequentially, guided by the relationships indicated by the tree.
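Steps 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the method of any particular program: the score-to-distance conversion (d = 1 - s / s_max) is assumed, and simple average-linkage clustering is swapped in for neighbour-joining to keep the example short. All function names are hypothetical.

```python
from itertools import combinations

def scores_to_distances(scores):
    """Assumed conversion: d = 1 - s / s_max, so the best-scoring pair gets d = 0."""
    s_max = max(scores.values())
    return {pair: 1.0 - s / s_max for pair, s in scores.items()}

def leaf_distance(dist, a, b):
    """Look up a pairwise distance regardless of key order."""
    return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

def average_distance(dist, members1, members2):
    """Mean distance between all leaves of two clusters (average linkage)."""
    total = sum(leaf_distance(dist, a, b) for a in members1 for b in members2)
    return total / (len(members1) * len(members2))

def guide_tree(names, dist):
    """Repeatedly join the two closest clusters; returns the tree as nested tuples."""
    clusters = [(name, [name]) for name in names]  # (subtree, leaves in subtree)
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda p: average_distance(dist, p[0][1], p[1][1]))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]

# Made-up pairwise alignment scores for five sequences
scores = {("1", "2"): 90, ("1", "3"): 60, ("2", "3"): 55, ("4", "5"): 80,
          ("1", "4"): 20, ("1", "5"): 20, ("2", "4"): 20, ("2", "5"): 20,
          ("3", "4"): 25, ("3", "5"): 25}
tree = guide_tree(["1", "2", "3", "4", "5"], scores_to_distances(scores))
```

The nested tuples give the order for step 3: innermost pairs are aligned first, and each enclosing tuple is a later sequence-to-profile or profile-to-profile alignment.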
Biopat (first method ever)
MULTAL (Taylor 1987)
DIALIGN (1&2, Morgenstern 1996)
PRRP (Gotoh 1996)
ClustalW (Thompson et al 1994)
PRALINE (Heringa 1999)
T-Coffee (Notredame 2000)
POA (Lee 2002)
MUSCLE (Edgar 2004)
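Progressive multiple alignment
[Figure: pairwise alignments of sequences 1-5 (Score 1-2, Score 1-3, ..., Score 4-5) fill a 5×5 similarity matrix of scores; the scores are converted to distances, from which a guide tree is built; the guide tree drives the multiple alignment, with iteration possibilities.]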
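General progressive multiple alignment technique (follow generated tree)
[Figure: guide tree with distance axis d; the sequences shown (1, 3, 2, 5) are aligned in the order given by the tree, ending at the root.]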
PRALINE progressive strategy
[Figure: distance axis d; the alignment grows step by step (1 3; then 1 3 2; then 1 3 2 5 4).]
There are problems …
Accuracy is very important!

Alignment errors made during the construction of the MSA cannot be repaired later: they are propagated into the subsequent progressive steps.

The comparisons of sequences at early steps during progressive alignment cannot make use of information from other sequences.

It is only later during the alignment progression that more information from other sequences (e.g. through profile representation) becomes employed in the alignment steps.
“Once a gap, always a gap”
Feng & Doolittle, 1987
Additional strategies for multiple
sequence alignment
• Profile pre-processing
• Secondary structure-induced
alignment
• Globalised local alignment
• Matrix extension
Objective: try to avoid (early) errors
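Profile pre-processing
[Figure: pairwise alignments of sequences 1-5 (Score 1-2, Score 1-3, ..., Score 4-5); for key sequence 1, a pre-alignment (master-slave, N-to-1 alignment) of sequences 2-5 against the key sequence is converted into a pre-profile (rows A, C, D, ..., Y; columns Pi ... Px).]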
Pre-profile generation
[Figure: pairwise alignment scores (Score 1-2, Score 1-3, ..., Score 4-5) are compared with a cut-off; each of the sequences 1-5 gets a pre-alignment, and each pre-alignment is turned into a pre-profile (rows A, C, D, ..., Y).]
Pre-profile alignment
[Figure: the pre-profiles of sequences 1-5 (rows A, C, D, ..., Y) are aligned to give the final alignment of sequences 1-5.]
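Pre-profile alignment
[Figure: each pre-profile is a master-slave alignment headed by its key sequence (1 with 2, 3, 4, 5; 2 with 1, 3, 4, 5; 3 with 1, 2, 4, 5; 4 with 1, 2, 3, 5; 5 with 1, 2, 3, 4); aligning the five pre-profiles gives the final alignment of sequences 1-5.]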
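Pre-profile alignment
Alignment consistency
[Figure: the same five pre-profiles, each headed by its key sequence; residue labels (Ala131, A131, A131, L133, C126, A131) mark the positions matched to one residue in the different pre-alignments.]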
PRALINE pre-profile generation
• Idea: use the information from all query sequences to make a pre-profile for each query sequence that contains information from other sequences
• You can use all sequences in each pre-profile, or use only those sequences that will probably align ‘correctly’. Incorrectly aligned sequences in the pre-profiles will increase the noise level.
• Select using alignment score: only allow sequences in a pre-profile if their alignment with the key sequence scores higher than a given threshold value. In PRALINE, this threshold is given as prepro=1500 (alignment score threshold value is 1500 – see next two slides)
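The selection step can be sketched as a simple filter on pairwise scores. This is only an illustration of the thresholding idea: `select_preprofile_members` and the score table are hypothetical, and the snippet does not reproduce PRALINE's actual implementation.

```python
def pair_score(scores, a, b):
    """Look up a pairwise alignment score regardless of key order."""
    return scores[(a, b)] if (a, b) in scores else scores[(b, a)]

def select_preprofile_members(names, scores, threshold=1500):
    """For each key sequence, keep only the sequences whose pairwise
    alignment score with the key sequence is higher than the threshold."""
    return {key: [other for other in names
                  if other != key and pair_score(scores, key, other) > threshold]
            for key in names}

# Made-up pairwise scores for three sequences
names = ["seqA", "seqB", "seqC"]
scores = {("seqA", "seqB"): 1800, ("seqA", "seqC"): 900, ("seqB", "seqC"): 1600}

strict = select_preprofile_members(names, scores, threshold=1500)
permissive = select_preprofile_members(names, scores, threshold=0)
```

With a threshold of 0 every positively scoring sequence enters every pre-profile; raising the threshold trades information for lower noise.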
Flavodoxin-cheY consistency scores
(PRALINE prepro=0)
1fx1
FLAV_DESVH
FLAV_DESDE
FLAV_DESGI
FLAV_DESSA
4fxn
FLAV_MEGEL
2fcr
FLAV_ANASP
FLAV_ECOLI
FLAV_AZOVI
FLAV_ENTAG
FLAV_CLOAB
3chy
--7899999999999TEYTAETIARQL8776-6657777777777777553799VL999ST97775599989-435566677798998878AQGRKVACF
-46788999999999TEYTAETIAREL7777-7757777777777777553799VL999ST97775599989-435566677798998878AQGRKVACF
-47899999999999999999999988776695658888777777778763YDAVL999SAW9877789877753556666669777776789GRKVAAF
-46788999999999TEGVAEAIAKTL9997-76678888777777887539DVVL999ST987776--9889546667776697776557777888888
93677799999999999999999999988759765777888888888876399999999STW77765--9999536666677797998779999999999
-878779999999999999999999776666967567788888888888777999999988777776--9889577788888897773237888888888
9776779999999999999999997777766-665666677788899976799999999987777669--887362334466695555455778888888
--87899999999999TEVADFIGK996541900300000112233355679DLLF99999855312888111224555555407777777888888888
-47899LFYGTQTGKTESVAEIIR9777653922356677777777897779999999999988843--9998555778777899998879999999999
997789999GSDTGNTENIAKMIQ8774222922456678889999995569999999999755553----99262225555495777767778999999
--79IGLFFGSNTGKTRKVAKSIK99887759657577888888999777899999999999877761112222222244555-5555555778999999
94789999999999999999999998755229223234555555555555688899999998875521111111133477777-7777777999999999
-86999ILYSSKTGKTERVAK9997555555057678887888887777765778899998522223--9888342234455597777777777777777
0122222223333335666665555555222922222222222221112163335555755553222888877674533344493332222222222222
Avrg Consist
Conservation
8667778888888889999999998776554844455566666666665557888888888766544887666334445566586666556778888888
0125538675848969746963946463343045244355446543473516658868567554455000000314365446505575435547747759
1fx1
FLAV_DESVH
FLAV_DESDE
FLAV_DESGI
FLAV_DESSA
4fxn
FLAV_MEGEL
2fcr
FLAV_ANASP
FLAV_ECOLI
FLAV_AZOVI
FLAV_ENTAG
FLAV_CLOAB
3chy
G888799955555559888888888899777----7777797787787978---555555566776555677777778888799-----G888799955555559888888888899777----7777797787787978---555555566776555677777778888799-----A88878685555555999988888889998879--8777788-98777777--8555555554433245667777777777599-----87775977755555677777777777777778---88888887667778777775555555555542424667888887777-------977768777555556777777777777777767887777777778888-978985555555556536556888888888877-------867777555555552666666666555555577887767999877777977777665555555555444466666666555798-----8577775666666525556777778888888689977888988776558677885544333222222212233223355557-------877773573333333777766667777765533333333333333322833333333332244444567777777888777633-----977773775333344777888888777777733334444444444433833333344444444444455577777788777734-----977743786444444777788888888888833334444444444444244444555554555775667788888888877734110000
97776355333333466666667777777773333444444444444482333355555555555545558888888877772311---977773886555555866666666677666633333333333333322123333344444444455555665566666555582-----766627222222212444444444455555587882222222222222111111122222222222344443333333233399-----222227222222224111355431113324578-87778997666556877776322222222222322222323344444422------
Avrg Consist
Conservation
866656564444444666666666666666656665555565555555655565444443444443344455666666666666889999
73663057433334163464534444*746710000011010011000000010434744645443225474454448434301000000
Iteration 0
SP= 135136.00
AvSP= 10.473
SId= 3838
AvSId= 0.297
Consistency values are scored from 0 to 10; the value 10 is represented by the corresponding amino acid (red)
Flavodoxin-cheY consistency scores
(PRALINE prepro=1500)
1fx1
FLAV_DESVH
FLAV_DESSA
FLAV_DESGI
FLAV_DESDE
4fxn
FLAV_MEGEL
2fcr
FLAV_ANASP
FLAV_AZOVI
FLAV_ENTAG
FLAV_ECOLI
FLAV_CLOAB
3chy
-42444IVYGSTTGNTEYTAETIARQL886666666577777775667888DLVLLGCSTW77766----995476666769-77888788AQGRKVACF
-34444IVYGSTTGNTEYTAETIAREL776666666577777775667888DLVLLGCSTW77766----995476666769-77888788AQGRKVACF
-33444IVYGSTTGNTET99999888777655777668888899666686YDIVLFGCSTW77777----996466666779-88SL98ADLKGKKVSVF
-34444IVYGSTTGNTEGVA9999999999765555677777886666678DVVLLGCSTW77777----995466666779-88887688888KKVGVF
-44777IVFGSSTGNTE988777666655566777778899999777777YDAVLFGCSAW88877----997587777779-8887766777GRKVAAF
-32222IVYWSGTGNTE8888888876666778888888888NI8888586DILILGCSA888888------8-8888886--66665378ISGKKVALF
-12222IVYWSGTGNTEAMA8888888888888888555555555555485DVILLGCPAMGSE77------572222288--8888755588GKKVGLF
-41456IFFSTSTGNTTEVA999998865432222765554443244779YDLLFLGAPT944411999-111112454441-8DKLPEVDMKDLPVAIF
-00456LFYGTQTGKTESVAEII987755323322427776666623589YQYLIIGCPTW55532--999843678W988899998888888GKLVAYF
-42445LFFGSNTGKTRKVAKSIK87777434333536666665467777YQFLILGTPTLGEG862222222222355558-45666666888KTVALF
-266IGIFFGSDTGQTRKVAKLIHQKL6664664424DVRRATR88888SYPVLLLGTPT88888644444444446WQEF8-8NTLSEADLTGKTVALF
-51114IFFGSDTGNTENIAKMI987743311111555555588355599YDILLLGIPT954431----88355225544--44666666779KLVALF
-63666ILYSSKTGKTERVAKLIE63333333333333333333366LQESEGIIFGTPTY63--6--------66SWE33333333333333GKLGAAF
ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQ-AGGYGFVI---SDWNMPNM----------DGLEL--LKTIRADGAMSALPVLM
Avrg Consist
Conservation
9334459999999999999999988776655555555666667756667889999999999767658888775555566668967777677889999999
0236428675848969746963946463344354312564565414344366588685675544550000003144654460055575345547747759
1fx1
FLAV_DESVH
FLAV_DESSA
FLAV_DESGI
FLAV_DESDE
4fxn
FLAV_MEGEL
2fcr
FLAV_ANASP
FLAV_AZOVI
FLAV_ENTAG
FLAV_ECOLI
FLAV_CLOAB
3chy
G98879-89-999877977--7788899999999955--88888-9988887798999777778766553344588776666222266899899
G98879-89-999877977--7788899999999955--88888-9988887798999777778766553344588776666222266899899
G98878-688688888-88--88999999999999979988888887788889-89-9787777666756645577776666654466899899
G98879-898688888987--788888999GATLV7698899-9998789888-8899787878776663122477788888333276899899
AS8888-68-888888899--9999999999988888-99988888988778897888776668854222212255555555333277999999
GS2228-228222222222--2388888888888888888888888888888888888887778866765535577555533221288888888
G4888--28-8888882MD--AWKQRTEDTGATVI77---------------------77222--224444222222244222112-------GLGDA5-8Y5DNFC88-88--8877777777777765444555555555544385555777774465333357799999987555333899899
GTGDQ5-GY5899999-99--99EEKISQRGG99975555544444444433284444466665555555556666676666433333899899
GLGDQ5-885777555-55--55555788888888555555555555555554855555555555666555555888855555544442--288
GLGDQL-NYSKNFVSA-MR--ILYDLVIARGACVVG8888EGYKFSFSAA6664NEFVGLPLDQEN88888EERIDSWLE88842242688688
GC99549784688888987997777777778888855444444444444444114444777774455775567788888887433322100100
STANS6366663333333333336666666666666666663333363366336663333336EDENARIFGERIANKVKQI333333666666
VTAEA---KKENIIAA-----------AQAGAS-------------------------GYVVK-----PFTAATLEEKLNKIFEKLGM------
Avrg Consist
Conservation
9988779787777777777997788888888888866777777777767766677777676667766655455577776666433355788788
746640037154545706300354534444*745753000001010010000000010683760144442335574454448434301000000
Iteration 0
SP= 136702.00
AvSP= 10.654
SId= 3955
AvSId= 0.308
Consistency values are scored from 0 to 10; the value 10 is represented by the corresponding amino acid (red)
Strategies for multiple sequence
alignment
• Profile pre-processing
• Secondary structure-induced
alignment
• Globalised local alignment
• Matrix extension
Objective: integrate secondary structure
information to anchor alignments and avoid
errors
Protein structure hierarchical levels
PRIMARY STRUCTURE (amino acid sequence)
VHLTPEEKSAVTALWGKVNVDE
VGGEALGRLLVVYPWTQRFFE
SFGDLSTPDAVMGNPKVKAHG
KKVLGAFSDGLAHLDNLKGTFA
TLSELHCDKLHVDPENFRLLGN
VLVCVLAHHFGKEFTPPVQAAY
QKVVAGVANALAHKYH
SECONDARY STRUCTURE (helices, strands)
TERTIARY STRUCTURE (fold)
QUATERNARY STRUCTURE (oligomers)
Why use (predicted) structural information
• “Structure more conserved than sequence”
– Many structural protein families (e.g. globins) have family members with very low sequence similarities. For example, globin sequence identities can be as low as 10% while the proteins still have the same fold.
• This means that you can still observe equivalent secondary structures in homologous proteins even if sequence similarities are extremely low.
• But you are dependent on the quality of prediction methods. For example, secondary structure prediction is currently about 76% correct, so roughly 1 out of 4 amino acids is still predicted incorrectly.
Two superposed protein structures with two well-superposed helices
Red: well superposed
Blue: low match quality
C5 anaphylatoxin: human (PDB code 1kjs) and pig (1c5a) proteins are superposed
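How to combine ss and aa info
[Figure: dynamic programming search matrix for two sequences (MDAASTILCGS and MDAGSTVILCFV, the latter shown with secondary structure HHHCCCEEEEEE); cells are scored with amino acid substitution matrices specific to helix (H), strand (E) or coil (C), or with a Default matrix.]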
In terms of scoring…
• So how would you score a profile using this extra information?
– Same formula as in lecture 6, but you can use sec. struct. specific substitution scores in various combinations.
• Where does it fit in?
– Very important: structure is always more conserved than sequence, so it can help with the insertion (or not) of gaps.
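One possible combination can be sketched as follows, assuming that a structure-specific matrix is used when both positions share the same predicted state and a default matrix otherwise. That rule, the function names and the toy scores are all assumptions made for illustration; the slides only say that such scores can be used "in various combinations".

```python
def toy_matrix(match, mismatch):
    """Build a stand-in substitution matrix from a match and a mismatch score."""
    return lambda a, b: match if a == b else mismatch

# Hypothetical matrices per secondary structure state: helix, strand, coil
SS_MATRICES = {
    "H": toy_matrix(3.0, -1.0),
    "E": toy_matrix(4.0, -2.0),
    "C": toy_matrix(2.0, -1.0),
}
DEFAULT_MATRIX = toy_matrix(2.0, -1.0)

def ss_aware_score(aa1, ss1, aa2, ss2):
    """Score one residue pair with the matrix selected by the two
    predicted secondary structure states (assumed selection rule)."""
    if ss1 == ss2 and ss1 in SS_MATRICES:
        return SS_MATRICES[ss1](aa1, aa2)
    return DEFAULT_MATRIX(aa1, aa2)

def score_ungapped(seq1, ss1, seq2, ss2):
    """Sum of ss-aware scores over two equal-length, ungapped sequences."""
    return sum(ss_aware_score(a, s, b, t)
               for a, s, b, t in zip(seq1, ss1, seq2, ss2))

helix_match = ss_aware_score("A", "H", "A", "H")     # helix matrix
mixed_match = ss_aware_score("A", "H", "A", "E")     # default matrix
strand_mismatch = ss_aware_score("L", "E", "V", "E") # strand matrix
```

The same selection can be applied inside the profile formula by replacing M with the matrix chosen for each pair of columns.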
Sequences to be aligned
Predict secondary structure
Secondary structure
HHHHCCEEECCCEEECCHH
CCCCCCEECCCEEEECCHH
HHHCCCCEECCCEEHHH
HHHHHCCEEEECCCEECCC
HHHHHHHHHHHHHCCCEEEE
Align sequences using secondary structure
Multiple alignment
Using predicted secondary structure
1fx1
FLAV_DESVH
FLAV_DESGI
FLAV_DESSA
FLAV_DESDE
2fcr
FLAV_ANASP
FLAV_ECOLI
FLAV_AZOVI
FLAV_ENTAG
4fxn
FLAV_MEGEL
FLAV_CLOAB
3chy
1fx1
FLAV_DESVH
FLAV_DESGI
FLAV_DESSA
FLAV_DESDE
2fcr
FLAV_ANASP
FLAV_ECOLI
FLAV_AZOVI
FLAV_ENTAG
4fxn
FLAV_MEGEL
FLAV_CLOAB
3chy
-PK-ALIVYGSTTGNTEYTAETIARQLANAG-YEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPLFDS-LEETGAQGRKVACF
e eeee b ssshhhhhhhhhhhhhhttt eeeee stt
tttttt seeee b ee sss
ee ttthhhhtt ttss tt eeeee
MPK-ALIVYGSTTGNTEYTaETIARELADAG-YEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPLFDS-LEETGAQGRKVACf
e eeeeee
hhhhhhhhhhhhhhh
eeeeee
eeeeee
hhhhhh
eeeee
MPK-ALIVYGSTTGNTEGVaEAIAKTLNSEG-METTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPLYED-LDRAGLKDKKVGVf
e eeeeee
hhhhhhhhhhhhhh
eeeeee
hhhhhh eeeeeee
hhhhhh
eeeeee
MSK-SLIVYGSTTGNTETAaEYVAEAFENKE-IDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPLYDS-LENADLKGKKVSVf
eeeeee
hhhhhhhhhhhhhh
eeeee
eeeee
hhhhhhh h
eeeee
MSK-VLIVFGSSTGNTESIaQKLEELIAAGG-HEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSLFEE-FNRFGLAGRKVAAf
eeee
hhhhhhhhhhhhhh
eeeee
hhhhhhhhhhheeeee
hhhhhhh hh
eeeee
--K-IGIFFSTSTGNTTEVADFIGKTLGAK---ADAPIDVDDVTDPQALKDYDLLFLGAPTWNTGAD----TERSGTSWDEFLYDKLPEVDMKDLPVAIF
eeeee ssshhhhhhhhhhhhhggg
b
eeggg s gggggg seeeeeee stt s
s s sthhhhhhhtggg
tt eeeee
SKK-IGLFYGTQTGKTESVaEIIRDEFGND--VVTL-HDVSQAE-VTDLNDYQYLIIgCPTWNIGEL--------QSDWEGLYSE-LDDVDFNGKLVAYf
eeeee
hhhhhhhhhhhh
eee
hhh hhhhhhheeeeee
hhhhhhhhh
eeeeee
-AI-TGIFFGSDTGNTENIaKMIQKQLGKD--VADV-HDIAKSS-KEDLEAYDILLLgIPTWYYGEA--------QCDWDDFFPT-LEEIDFNGKLVALf
eee
hhhhhhhhhhhh
eee
hhh hhhhhhheeeee
hhhhh
eeeeee
-AK-IGLFFGSNTGKTRKVaKSIKKRFDDET-MSDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEFLPK-IEGLDFSGKTVALf
eee
hhhhhhhhhhhhh
hhh hhhhhhheeeee
hhhhhhhhh
eeeeee
MAT-IGIFFGSDTGQTRKVaKLIHQKLDG---IADAPLDVRRAT-REQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEFTNT-LSEADLTGKTVALf
eeee
hhhhhhhhhhhh
hhh hhhhhhheeeee
hhhhh
eeeee
----MKIVYWSGTGNTEKMAELIAKGIIESG-KDVNTINVSDVNIDELLNE-DILILGCSAMGDEVL------E-ESEFEPFIEE-IST-KISGKKVALF
eeeee ssshhhhhhhhhhhhhhhtt
eeeettt sttttt seeeeee btttb
ttthhhhhhh hst t tt eeeee
M---VEIVYWSGTGNTEAMaNEIEAAVKAAG-ADVESVRFEDTNVDDVASK-DVILLgCPAMGSEEL------E-DSVVEPFFTD-LAP-KLKGKKVGLf
hhhhhhhhhhhhhh
eeeee
hhhhhhhh eeeee
eeeee
M-K-ISILYSSKTGKTERVaKLIEEGVKRSGNIEVKTMNL-DAVDKKFLQESEGIIFgTPTY-YANI--------SWEMKKWIDE-SSEFNLEGKLGAAf
eee
hhhhhhhhhhhhhh
eeeeee
hhhhhhhhhh eeee
hhhhhhhhh
eeeee
ADKELKFLVVDDFSTMRRIVRNLLKELGFNN-VEEAEDGV-DALNKLQAGGYGFVISD---WNMPNM----------DGLELLKTIRADGAMSALPVLMV
tt eeee s hhhhhhhhhhhhhht
eeeesshh hhhhhhhh
eeeee
s sss
hhhhhhhhhh ttttt eeee
GCGDS-SY-EYFCGAVDAIEEKLKNLGAEIVQD---------------------GLRIDGD--PRAARDDIVGWAHDVRGAI-------eee s ss sstthhhhhhhhhhhttt ee s
eeees
gggghhhhhhhhhhhhhh
GCGDS-SY-EYFCGAVDAIEEKLKNLgAEIVQD---------------------GLRIDGD--PRAARDDIVGwAHDVRGAI-------eee
hhhhhhhhhhhh
eeeee
eeeee
hhhhhhhhhhhhhh
GCGDS-SY-TYFCGAVDVIEKKAEELgATLVAS---------------------SLKIDGE--P--DSAEVLDwAREVLARV-------eee
hhhhhhhhhhhh
eeeee
hhhhhhhhhhh
GCGDS-DY-TYFCGAVDAIEEKLEKMgAVVIGD---------------------SLKIDGD--P--ERDEIVSwGSGIADKI-------hhhhhhhhhhhh
eeeee
e
eee
ASGDQ-EY-EHFCGAVPAIEERAKELgATIIAE---------------------GLKMEGD--ASNDPEAVASfAEDVLKQL-------e
hhhhhhhhhhhhhh
eeeee
ee
hhhhhhhhhhh
GLGDAEGYPDNFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRD-GKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV-----eee ttt ttsttthhhhhhhhhhhtt eee b gggs s tteet teesseeeettt ss hhhhhhhhhhhhhhhht
GTGDQIGYADNFQDAIGILEEKISQRgGKTVGYWSTDGYDFNDSKALR-NGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL-----hhhhhhhhhhhhhh
eeee
hhhhhhhhhhhhhhhh
GCGDQEDYAEYFCDALGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKwVKQISEELHLDEILNA
hhhhhhhhhhhhhh
eeee
hhhhhhhhhhhhhhhhhh
GLGDQVGYPENYLDALGELYSFFKDRgAKIVGSWSTDGYEFESSEAVVD-GKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGLS--L-e
hhhhhhhhhhhhhh
eeeee
hhhhhhhhhhh
GLGDQLNYSKNFVSAMRILYDLVIARgACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSwLEKLKPAV-L-----hhhhhhhhhhhhhhh
eeee
hhhhhhh
hhhhhhhhhhhh
G-----SYGWGDGKWMRDFEERMNGYGCVVVET---------------------PLIVQNE--PDEAEQDCIEFGKKIANI--------e
eesss shhhhhhhhhhhhtt ee s
eeees
ggghhhhhhhhhhhht
G-----SYGWGSGEWMDAWKQRTEDTgATVIGT----------------------AIVNEM--PDNAPE-CKElGEAAAKA--------hhhhhhhhhhh
eeeee
eeee
h hhhhhhhh
STANSIA-GGSDIALLTILNHLMVK-gMLVYSG----GVAFGKPKTHLG-----YVHINEI--QENEDENARIfGERiANkV--KQIF-hhhhhhhhhhhhhh eeeee
hhhh hhh
hhhhhhhhhhhh h
-----------TAEAKKENIIAAAQAGASGY-------------------------VVK----P-FTAATLEEKLNKIFEKLGM-----ess hhhhhhhhhtt see
ees
s
hhhhhhhhhhhhhhht
G
Strategies for multiple sequence
alignment
• Profile pre-processing
• Secondary structure-induced alignment
• Globalised local alignment
• Matrix extension
Objectives:
Instead of single amino acid positions, focus on local alignments
Consider best local alignment through each cell in DP matrix
Try to avoid (early) errors
Globalised local alignment
1. Local (SW) alignment (M + Po,e)
2. Global (NW) alignment (no M or Po,e)
Together (1 + 2): double dynamic programming
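The two passes can be sketched as below. This is one possible reading, not the published algorithm: it assumes a linear gap penalty, and it assumes the "best local alignment through each cell" is obtained by combining a forward and a backward Smith-Waterman matrix. All function names and the toy scores are hypothetical.

```python
def sw_matrix(x, y, score, gap):
    """Smith-Waterman matrix H with a linear gap penalty; every H[i][j] >= 0."""
    H = [[0.0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                          H[i - 1][j] - gap,
                          H[i][j - 1] - gap)
    return H

def through_scores(x, y, score, gap):
    """T[i][j]: score of the best local alignment that pairs x[i] with y[j]."""
    n, m = len(x), len(y)
    fwd = sw_matrix(x, y, score, gap)              # part before the pair
    bwd = sw_matrix(x[::-1], y[::-1], score, gap)  # part after the pair
    return [[fwd[i][j] + score(x[i], y[j]) + bwd[n - 1 - i][m - 1 - j]
             for j in range(m)] for i in range(n)]

def globalised_local_alignment(x, y, score, gap=2.0):
    """Pass 1: local through-scores. Pass 2: global DP over those scores
    without a substitution matrix or gap penalties. Returns aligned index pairs."""
    T = through_scores(x, y, score, gap)
    n, m = len(x), len(y)
    G = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            G[i][j] = max(G[i - 1][j - 1] + T[i - 1][j - 1],
                          G[i - 1][j], G[i][j - 1])
    pairs, i, j = [], n, m
    while i > 0 and j > 0:  # traceback, preferring the diagonal move
        if G[i][j] == G[i - 1][j - 1] + T[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif G[i][j] == G[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

def toy_score(a, b):
    """Stand-in for a substitution matrix (assumption: +2 match, -1 mismatch)."""
    return 2.0 if a == b else -1.0

pairs = globalised_local_alignment("ACDEF", "ACDEF", toy_score)
```

Because each cell already carries the score of a whole local alignment, the second pass needs only to pick a consistent path through high-scoring cells.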
Strategies for multiple sequence
alignment
• Profile pre-processing
• Secondary structure-induced
alignment
• Globalised local alignment
• Matrix extension
Objective: try to avoid (early) errors
Integrating alignment methods
and alignment information with
T-Coffee
• Integrating different pair-wise alignment
techniques (NW, SW, ..)
• Combining different multiple alignment
methods (consensus multiple alignment)
• Combining sequence alignment methods
with structural alignment techniques
• Plug in user knowledge
Matrix extension
T-Coffee
Tree-based Consistency Objective Function
For alignmEnt Evaluation
Cedric Notredame
Des Higgins
Jaap Heringa
J. Mol. Biol., 302, 205-217;2000
Using different sources of alignment information
[Figure: alignments from several sources (Clustal, shown twice; structure alignments; Dialign; Lalign; manual) are fed into T-Coffee.]
Search matrix extension – alignment transitivity
[Figure: T-Coffee search matrix extension; the direct alignment of two sequences is combined with alignments that pass through other sequences.]
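The transitivity idea can be sketched as a library extension in which the weight of a residue pair is its direct weight plus, for every residue in a third sequence aligned to both, the smaller of the two connecting weights. The data layout (residues as `(sequence_name, position)` tuples, pairs as `frozenset` keys) and the function name are assumptions made for illustration, and the weights are made up.

```python
from collections import defaultdict

def extend_library(primary):
    """primary maps (residue_a, residue_b) -> weight, where a residue is a
    (sequence_name, position) tuple and each pair joins two different sequences.
    Returns extended weights keyed by frozenset({residue_a, residue_b})."""
    partners = defaultdict(dict)
    for (a, b), w in primary.items():
        partners[a][b] = w
        partners[b][a] = w
    extended = defaultdict(float)
    for (a, b), w in primary.items():          # direct alignment
        extended[frozenset((a, b))] += w
    for c, aligned in partners.items():        # alignment via a third residue c
        items = list(aligned.items())
        for idx, (a, w_ac) in enumerate(items):
            for b, w_cb in items[idx + 1:]:
                if a[0] != b[0]:               # a and b must be in different sequences
                    extended[frozenset((a, b))] += min(w_ac, w_cb)
    return dict(extended)

primary = {
    (("A", 0), ("B", 0)): 88,
    (("A", 0), ("C", 0)): 77,
    (("C", 0), ("B", 0)): 100,
    (("A", 1), ("C", 1)): 50,   # A1 and B2 are never aligned directly...
    (("C", 1), ("B", 2)): 60,   # ...but both are aligned to C1
}
extended = extend_library(primary)
```

Pairs supported by other sequences gain weight, and pairs that were never aligned directly can still receive a weight through a shared partner.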
but.....
T-COFFEE (V1.23) multiple sequence alignment
Flavodoxin-cheY
1fx1
FLAV_DESVH
FLAV_DESGI
FLAV_DESSA
FLAV_DESDE
4fxn
FLAV_MEGEL
FLAV_CLOAB
2fcr
FLAV_ENTAG
FLAV_ANASP
FLAV_AZOVI
FLAV_ECOLI
3chy
1fx1
FLAV_DESVH
FLAV_DESGI
FLAV_DESSA
FLAV_DESDE
4fxn
FLAV_MEGEL
FLAV_CLOAB
2fcr
FLAV_ENTAG
FLAV_ANASP
FLAV_AZOVI
FLAV_ECOLI
3chy
----PKALIVYGSTTGNTEYTAETIARQLANAG-YEVDSRDAASVE-AGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPL-FDSLEETGAQGRK-------MPKALIVYGSTTGNTEYTAETIARELADAG-YEVDSRDAASVE-AGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPL-FDSLEETGAQGRK-------MPKALIVYGSTTGNTEGVAEAIAKTLNSEG-METTVVNVADVT-APGLAEGYDVVLLGCSTWGDDEIE------LQEDFVPL-YEDLDRAGLKDKK-------MSKSLIVYGSTTGNTETAAEYVAEAFENKE-IDVELKNVTDVS-VADLGNGYDIVLFGCSTWGEEEIE------LQDDFIPL-YDSLENADLKGKK-------MSKVLIVFGSSTGNTESIAQKLEELIAAGG-HEVTLLNAADAS-AENLADGYDAVLFGCSAWGMEDLE------MQDDFLSL-FEEFNRFGLAGRK----------MKIVYWSGTGNTEKMAELIAKGIIESG-KDVNTINVSDVN-IDELL-NEDILILGCSAMGDEVLE-------ESEFEPF-IEEIS-TKISGKK---------MVEIVYWSGTGNTEAMANEIEAAVKAAG-ADVESVRFEDTN-VDDVA-SKDVILLGCPAMGSEELE-------DSVVEPF-FTDLA-PKLKGKK--------MKISILYSSKTGKTERVAKLIEEGVKRSGNIEVKTMNLDAVD-KKFLQ-ESEGIIFGTPTYYAN---------ISWEMKKW-IDESSEFNLEGKL---------KIGIFFSTSTGNTTEVADFIGKTLGAKA---DAPIDVDDVTDPQAL-KDYDLLFLGAPTWNTGA----DTERSGTSWDEFLYDKLPEVDMKDLP-------MATIGIFFGSDTGQTRKVAKLIHQKLDGIA---DAPLDVRRAT-REQF-LSYPVLLLGTPTLGDGELPGVEAGSQYDSWQEF-TNTLSEADLTGKT-------SKKIGLFYGTQTGKTESVAEIIRDEFGNDV---VTLHDVSQAE-VTDL-NDYQYLIIGCPTWNIGEL--------QSDWEGL-YSELDDVDFNGKL--------AKIGLFFGSNTGKTRKVAKSIKKRFDDET-M-SDALNVNRVS-AEDF-AQYQFLILGTPTLGEGELPGLSSDCENESWEEF-LPKIEGLDFSGKT--------AITGIFFGSDTGNTENIAKMIQKQLGKDV---ADVHDIAKSS-KEDL-EAYDILLLGIPTWYYGEA--------QCDWDDF-FPTLEEIDFNGKL----ADKELKFLVVD--DFSTMRRIVRNLLKELGFN-NVE-EAEDGVDALNKLQ-AGGYGFVISDWNMPNMDGLE--------------LLKTIRADGAMSALPVLMV
:.
.
. :
.
::
---------VACFGCGDSS--YEYFCGA-VDAIEEKLKNLGAEIVQDG---------------------LRIDGDPRAA--RDDIVGWAHDVRGAI----------------VACFGCGDSS--YEYFCGA-VDAIEEKLKNLGAEIVQDG---------------------LRIDGDPRAA--RDDIVGWAHDVRGAI----------------VGVFGCGDSS--YTYFCGA-VDVIEKKAEELGATLVASS---------------------LKIDGEPDSA----EVLDWAREVLARV----------------VSVFGCGDSD--YTYFCGA-VDAIEEKLEKMGAVVIGDS---------------------LKIDGDPE----RDEIVSWGSGIADKI----------------VAAFASGDQE--YEHFCGA-VPAIEERAKELGATIIAEG---------------------LKMEGDASND--PEAVASFAEDVLKQL----------------VALFGS------YGWGDGKWMRDFEERMNGYGCVVVETP---------------------LIVQNEPD--EAEQDCIEFGKKIANI-----------------VGLFGS------YGWGSGEWMDAWKQRTEDTGATVIGTA---------------------IV--NEMP--DNAPECKELGEAAAKA-----------------GAAFSTANSI--AGGSDIA-LLTILNHLMVKGMLVY----SGGVAFGKPKTHLGYVHINEIQENEDENARIFGERIANKVKQIF-------------------VAIFGLGDAEGYPDNFCDA-IEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRDG-KFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV--------------VALFGLGDQLNYSKNFVSA-MRILYDLVIARGACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSWLEKLKPAVL---------------VAYFGTGDQIGYADNFQDA-IGILEEKISQRGGKTVGYWSTDGYDFNDSKALRNG-KFVGLALDEDNQSDLTDDRIKSWVAQLKSEFGL--------------VALFGLGDQVGYPENYLDA-LGELYSFFKDRGAKIVGSWSTDGYEFESSEAVVDG-KFVGLALDLDNQSGKTDERVAAWLAQIAPEFGLSL------------VALFGCGDQEDYAEYFCDA-LGTIRDIIEPRGATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKWVKQISEELHLDEILNA
TAEAKKENIIAAAQAGASGYVVKPFT---AATLEEKLNKIFEKLGM---------------------------------------------------------.
Multiple alignment methods
• Multi-dimensional dynamic programming
– extension of pairwise sequence alignment.
• Progressive alignment
– incorporates phylogenetic information to guide the alignment process
• Iterative alignment
– corrects for problems with progressive alignment by repeatedly realigning subgroups of sequences