Multiple Sequence Alignment
Julie Thompson
Laboratory of Integrative Bioinformatics and Genomics
IGBMC, Strasbourg, France
[email protected]

Multiple Sequence Alignment
- Introduction: what is a multiple alignment?
- Multiple alignment construction
  - Traditional approaches: optimal, progressive
  - Alignment parameters
  - Iterative and co-operative approaches
- Multiple alignment analysis
  - Quality analysis/error detection
  - Conserved/homologous regions
- Multiple alignment applications

Julie Thompson – IGBMC

What is a multiple alignment?
A representation of a set of sequences, where equivalent residues (e.g. functional, structural) are aligned in rows or, more usually, columns.
Example: part of an alignment of SH2 domains from 14 sequences
[Figure: alignment rows labelled lnk_rat, crk1_mouse, nck_human, ht16_hydat, pip5_human, fer_human, 1ab2, 1mil, 1blj, 1shd, 1lkkA, 1csy, 1bfi, 1gri]
* conserved identical residues
: conserved similar residues

What is a multiple alignment?
[Figure: the same alignment annotated with conserved residues, conservation profile and secondary structure]

Multiple Alignment Construction
Optimal multiple alignment
Example: MSA (Lipman et al. 1989, Gupta et al.
1995)

Optimal multiple alignment
Extension of dynamic programming for 2 sequences => N dimensions
Example: alignment of 3 sequences
Problem: calculation time and memory requirements
Time proportional to N^k for k sequences of length N => limited to fewer than 10 sequences
Alignment of 5 sulfate binding proteins, length 224-263 residues:

  Program    Time
  MSA        >12 hours
  OMA        62.9 min
  ClustalW   0.6 sec

Multiple Alignment Construction
- Optimal multiple alignment: MSA, OMA
- Progressive multiple alignment: ClustalW (Thompson et al. NAR. 1994), ClustalX (Thompson et al. NAR. 1997)

Progressive multiple alignment
Idea: progressively align pairs of sequences (or groups of sequences)
Problem: which sequences to start with? How to decide the order of alignment?
- first align the most closely related sequences
How to measure the similarity of the sequences?
- align all the sequences pairwise
- calculate the similarity between each pair from the alignment

Progressive multiple alignment
1) Pairwise alignments of all sequences
The alignment can be obtained by:
- local or global method
- dynamic programming or heuristic method (e.g. K-tuple count)
Ex: local pairwise alignments of globin sequences

Hbb_human  3 LTPEEKSAVTALWGKV..NVDEVGGEALGRLLVVYPWTQRFFESFGDLST ...
Hba_human  2 LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF.DLS. ...

Hbb_human  1 VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLST ...
Hbb_horse  1 VQLSGEEKAAVLALWDKVNEEEVGGEALGRLLVVYPWTQRFFDSFGDLSN ...

Hba_human  2 LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF.DLSH ...
Hbb_horse  3 LSGEEKAAVLALWDKVNEE..EVGGEALGRLLVVYPWTQRFFDSFGDLSN ...
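Step 1 needs an alignment for every pair of sequences. As a rough illustration of the dynamic programming option, the following is a minimal sketch of a global (Needleman-Wunsch style) pairwise alignment. The function names, the simple match/mismatch scores and the linear gap cost are assumptions chosen for brevity; they are not the scoring scheme of ClustalW or of any other program named in these slides.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b (illustrative scores, linear gap cost)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,     # gap in b
                              score[i][j - 1] + gap)     # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def fraction_identical(x, y):
    """Identical residues divided by aligned (non-gap) positions."""
    aligned = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not aligned:
        return 0.0
    return sum(p == q for p, q in aligned) / len(aligned)


if __name__ == "__main__":
    x, y, s = needleman_wunsch("ACGT", "AGT")
    print(x, y, s, fraction_identical(x, y))
```

A real implementation would replace the match/mismatch scores with a residue similarity matrix and use separate gap opening and extension costs; the sketch only shows the shape of the recursion and the traceback.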

Progressive multiple alignment
2) Construction of a distance matrix
Example in ClustalW/X: distance between 2 sequences = 1 - (No. identical residues / No. aligned residues)
Ex: 7 globin sequences

                  1     2     3     4     5     6     7
Hbb_human    1    -
Hbb_horse    2   .17    -
Hba_human    3   .59   .60    -
Hba_horse    4   .59   .59   .13    -
Myg_phyca    5   .77   .77   .75   .75    -
Glb5_petma   6   .81   .82   .73   .74   .80    -
Lgb2_lupla   7   .87   .86   .86   .88   .93   .90    -

Progressive multiple alignment
3) Decide order of alignment
• Sequential branching
• Construction of a ‘guide tree’
  - Neighbor-Joining (NJ)
  - UPGMA
  - Maximum likelihood

Progressive alignment using sequential branching
[Figure: sequential branching diagram with leaves Hba_human, Hba_horse, Hbb_horse, Hbb_human, Glb5_petma, Myg_phyca, Lgb2_lupla]

Progressive alignment following a guide tree
[Figure: guide tree for the 7 globin sequences with alignment steps numbered 1 to 6; leaves Hbb_human, Hbb_horse, Hba_human, Hba_horse, Myg_phyca (.398), Glb5_petma (.389), Lgb2_lupla (.442); other branch lengths shown: .081, .226, .061, .015, .062, .084, .055, .219, .065]

Progressive multiple alignment
4) Progressive multiple alignment
The sequences are aligned progressively (global or local algorithm):
- alignment of 2 sequences
- alignment of 1 sequence and a profile (group of sequences)
- alignment of 2 profiles (groups of sequences)
[Figure: schematic of groups of sequences being aligned to each other]

Progressive multiple alignment
[Alignment of the 7 globin sequences; helices H1 to H4 are marked above this block]

HBB_HUMAN   --------VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDN
HBB_HORSE   --------VQLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDLSNPGAVMGNPKVKAHGKKVLHSFGEGVHHLDN
HBA_HUMAN   ---------VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-----HGSAQVKGHGKKVADALTNAVAHVDD
HBA_HORSE   ---------VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHF-DLS-----HGSAQVKAHGKKVGDALTLAVGHLDD
MYG_PHYCA   ---------VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGH
GLB5_PETMA  PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTTADQLKKSADVRWHAERIINAVNDAVASMDD
LGB2_LUPLU  --------GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGTSEVP--QNNPELQAHAGKVFKLVYEAAIQLQV

[Second block of the alignment; helices H5 to H7 are marked above this block]

HBB_HUMAN   -----LKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH-----
HBB_HORSE   -----LKGTFAALSELHCDKLHVDPENFRLLGNVLVVVLARHFGKDFTPELQASYQKVVAGVANALAHKYH-----
HBA_HUMAN   -----MPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR-----
HBA_HORSE   -----LPGALSNLSDLHAHKLRVDPVNFKLLSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVSTVLTSKYR-----
MYG_PHYCA   -----HEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG
GLB5_PETMA  T--EKMSMKLRDLSGKHAKSFQVDPQYFKVLAAVIADTVAAG---------DAGFEKLMSMICILLRSAY------
LGB2_LUPLU  TGVVVTDATLKNLGSVHVSKG-VADAHFPVVKEAILKTIKEVVGAKWSEELNSAWTIAYDELAIVIKKEMNDAA--

Progressive multiple alignment
[Figure: progressive programs classified by alignment type (global or local) and by the method used to decide the order of alignment (SB, NJ, ML, UPGMA); programs shown: multal, multalign, pileup, clustalx, SBpima, MLpima]
SB - sequential branching
UPGMA - Unweighted Pair Grouping Method
ML - maximum likelihood
NJ - neighbor-joining

Alignment parameters: similarity matrices
Dynamic programming methods score an alignment using residue similarity matrices, containing a score for matching all pairs of residues.
For nucleotide sequences:

       A    C    G    T
  A    2   -2   -1   -2
  C   -2    2   -2   -1
  G   -1   -2    2   -2
  T   -2   -1   -2    2

Transitions (A-G or C-T) are more frequent than transversions (e.g. A-T or C-G).
More complex matrices exist where matches between ambiguous nucleotides are given values whenever there is any overlap in the sets of nucleotides represented.

Alignment parameters: similarity matrices
For proteins, a wide variety of matrices exist: Identity, PAM, Blosum, Gonnet etc.
Matrices are generally constructed by observing the mutations in large sets of alignments, either sequence-based or structure-based.
Matrices range from strict ones for comparing closely related sequences to soft ones for very divergent sequences.
e.g.
PAM250 corresponds to an evolutionary distance of 250%, or approximately 80% residue divergence
PAM1 corresponds to less than 1% divergence

Alignment parameters: similarity matrices
A single best matrix does not exist!
Altschul, 1991 suggests PAM250 for related sequences, PAM120 when the sequences are not known to be related and PAM40 to search for short segments of highly similar sequences.
Henikoff, Henikoff, 1993 suggest Blosum62 as a good all-round matrix, Blosum45 for more divergent sequences and Blosum100 for strongly related sequences.
ClustalW automatically selects a suitable matrix depending on the observed pairwise % identity. By default:

  Pairwise identity (ID)    Matrix
  ID > 35%                  Gonnet 80
  35% > ID > 25%            Gonnet 250
  ID < 25%                  Gonnet 350

Alignment parameters: gap penalties
A gap penalty is a cost for introducing gaps into the alignment, corresponding to insertions or deletions in the sequences.

  SFGDLSNPGAVMG
  HF-DLS-----HG

Proportional gap costs charge a fixed penalty for each residue aligned with a gap, so the cost of a gap is proportional to its length:
  GAP_COST = uk, where k is the length of the gap
Linear or ‘affine’ gap costs define a cost for introducing or ‘opening’ a gap, plus a length-dependent ‘extension’ cost:
  GAP_COST = v + uk, where v is the gap opening cost and u is the gap extension cost

Alignment parameters: gap penalties
ClustalW uses position-specific gap penalties to make gaps more or less likely at different positions in the alignment.
[Figure: gap penalty profile (axis ticks 0, 10, 20, 30) plotted along the alignment below]

  HLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDL
  QLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDL
  VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLS
  VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHFDLS

- Gap penalties are lowered at existing gaps and increased near to existing gaps
- Gap penalties are lowered in hydrophilic stretches
- Otherwise, gap opening penalties are modified according to their observed relative frequencies adjacent to gaps (Pascarella & Argos, 1992)
Goal is to introduce
gaps in sequence segments corresponding to flexible regions of the protein structure

Multiple Alignment Construction
- Optimal multiple alignment: MSA, OMA
- Progressive multiple alignment: ClustalW, ClustalX
- Iterative multiple alignment: PRRP (Gotoh, 1993), SAGA (Notredame et al. NAR. 1996), DIALIGN (Morgenstern et al. 1999), HMMER (Eddy 1998), SAM (Karplus et al. 2001)

Iterative refinement
PRRP (Gotoh, 1993) refines an initial progressive multiple alignment by iteratively dividing the alignment into 2 profiles and realigning them.
1. initial alignment (global progressive)
2. divide sequences into 2 groups (profile 1, profile 2)
3. pairwise profile alignment
4. refined alignment
5. converged? if no, return to step 2

Genetic Algorithms
SAGA (Notredame et al. 1996) evolves a population of alignments in a quasi-evolutionary manner, iteratively improving the fitness of the population.
1. population n
2. select a number of individuals to be parents
3. modify the parents by shuffling gaps, merging 2 alignments etc.
4. population n+1
5. evaluation of the fitness using the objective function (OF): sum-of-pairs or COFFEE
6. END

Segment-to-segment alignment
Dialign (Morgenstern et al. 1996) compares segments of sequences instead of single residues.
1. construct dot-plots of all possible pairs of sequences
[Figure: dot-plot of Sequence i against Sequence j]
2. find a maximal set of consistent diagonals in all the sequences

.......aeyVRALFDFngndeedlpfkKGDILRIrdkpeeq...............WWNAedsegkr.GMIPVPYVek..........
........nlFVALYDFvasgdntlsitKGEKLRVlgynhnge..............WCEAqtkngq..GWVPSNYItpvns.......
ieqvpqqptyVQALFDFdpqedgelgfrRGDFIHVmdnsdpn...............WWKGachgqt..GMFPRNYVtpvnrnv.....
gsmstselkkVVALYDYmpmnandlqlrKGDEYFIleesnlp...............WWRArdkngqe.GYIPSNYVteaeds......
.....tagkiFRAMYDYmaadadevsfkDGDAIINvqaideg...............WMYGtvqrtgrtGMLPANYVeai.........
..gsptfkcaVKALFDYkaqredeltfiKSAIIQNvekqegg...............WWRGdyggkkq.LWFPSNYVeemvnpegihrd
.......gyqYRALYDYkkereedidlhLGDILTVnkgslvalgfsdgqearpeeigWLNGynettgerGDFPGTYVeyigrkkisp..

3. Local alignment: residues between the diagonals are not aligned

Multiple alignment methods
[Figure: classification of the programs. Progressive (global or local; SB, NJ, ML, UPGMA): SBpima, MLpima, multal, multalign, pileup, clustalx. Iterative: prrp, dialign. Genetic algorithm: saga. HMM: hmmt]

Comparison of programs
League table based on the BAliBASE benchmark database (Thompson et al. 1999)
[Table: ranking of multal, multalign, pileup, clustalx, prrp, saga, hmmt, MLpima, SBpima and dialign on each reference set; some entries are N/A]
- Reference 1: < 6 sequences (all, < 100 residues, > 400 residues)
- Reference 2: a family with an orphan
- Reference 3: several sub-families
- Reference 4: long N/C terminal extensions
- Reference 5: long insertions
• Iterative algorithms can improve alignment quality, but can be slow
• Global algorithms work well when sequences are homologous over their full lengths; local algorithms are better for non-colinear sequences

Multiple Alignment Construction
- Optimal multiple alignment: MSA, OMA
- Progressive multiple alignment: ClustalW, ClustalX
- Iterative multiple alignment: PRRP, SAGA, DIALIGN, HMMER, SAM
- Co-operative multiple alignment:
  T-COFFEE (Notredame et al. 2000) http://igs-server.cnrs-mrs.fr/Tcoffee/
  DbClustal (Thompson et al. 2000) http://www-igbmc.u-strasbg.fr/BioInfo/
  MAFFT (Katoh et al. 2002) http://www.biophys.kyoto-u.ac.jp/~katoh/programs/align/mafft/
  MUSCLE (Edgar, 2004) http://www.drive5.com/muscle
  Probcons (Do et al. 2005)
  Kalign (Lassmann et al.
2005)

DbClustal
http://bips.u-strasbg.fr/PipeAlign/
[Figure: DbClustal pipeline. Query sequence -> Blast database search -> database hits -> Ballast anchors (domains A, B, C) -> DbClustal alignment]

Comparison ClustalW / DbClustal
[Figure: the same sequences aligned by ClustalW and by DbClustal]

MAFFT
• Local homologous segments are detected using a Fast Fourier Transform
• Pairwise alignments are performed using restricted global dynamic programming
• Multiple alignment is built up using a progressive algorithm, similar to ClustalW
• Multiple alignment is then iteratively refined by dividing the alignment into 2 parts and realigning

MAFFT
Pairwise alignments
[Figure: correlation c(k) plotted against the lag k, with peaks at k = -1 and k = 2]

  K=2    GLWGKAAAEEEGLWLFF--
         --KGVFGAEQEGLFVFFGG

  K=-1   -GLWGKAAAEEEGLWLFF
         KGVFGAEQEGLFVFFGG-

1. Fast Fourier Transform to detect local conserved segments
2. Segment Level Dynamic Programming to select ‘consistent’ segments
3. Fix residues at the centre of each segment pair and realign between fixed points (white regions only)

State-of-the-art
Co-operative algorithms have led to significant improvements…
[Figure: scores of ClustalW (1994), Dialign (1996), Mafft (2002) and Probcons (2005) on the BAliBASE 3 reference sets: Ref 11 (<20% ID), Ref 12 (20-40% ID), Ref 2 (orphan), Ref 3 (sub-families), Ref 4 (extensions), Ref 5 (insertions)]
… but none of the methods currently available is capable of producing high-quality alignments for all test cases (Thompson et al. 2005, 2006)

RNA alignment methods
Comparison using ‘BRAliBASE’ RNA structure alignments (Gardner et al, 2005)
Above 60% identity, sequence and structure based approaches have similar scores.
Algorithms incorporating structural information outperform pure sequence methods. However, these algorithms are computationally demanding, which severely limits their use in practice.
Some more recent methods:
- Sequence: R-Coffee (Wilm, 2008), MAFFT (Katoh, 2008)
- Structure: LARA (Bauer, 2007), FoldalignM (Torarinsson, 2007), SCARNA (Tabei, 2008)

DNA alignment methods
Complete genomes
- Local alignments (BlastZ, MultiZ, MUMmer,…)
- Global alignments (MGA, Multi-LAGAN, MAVID, MAUVE, MAP2, Mulan,…)
Reviewed in Dewey and Pachter, Human Molecular Genetics, 2006

Multiple alignment analysis
Are the sequences correctly aligned?
- Quality analysis: alignment objective functions (SP, NorMD)
- Error detection and correction (RASCAL, Refiner)
Are the sequences in the alignment homologous?
- Conserved/homologous regions (MCOFFEE, LEON)
- Conserved (functional) residues

Objective functions
Sum-of-pairs (Carrillo, Lipman, 1988): sum of scores for all pairs of sequences

  Sequence 1   N N N
  Sequence 2   N N N
  Sequence 3   N C N
  Sequence 4   N C C

  Blosum62 scores: N-N = 6, N-C = -3, C-C = 9

  Pair     Residue pairs                         Score
  Seq1-2   3 pairs N-N                           3x6 = 18
  Seq1-3   2 pairs N-N, 1 pair N-C               2x6+(-3) = 9
  Seq1-4   1 pair N-N, 2 pairs N-C               6+2x(-3) = 0
  Seq2-3   2 pairs N-N, 1 pair N-C               2x6+(-3) = 9
  Seq2-4   1 pair N-N, 2 pairs N-C               6+2x(-3) = 0
  Seq3-4   1 pair N-N, 1 pair N-C, 1 pair C-C    6+(-3)+9 = 12
  Total                                          48

Information content (Hertz et al, 1999)
- Entropy column scores (between 0 and 1), summed over all columns in the alignment
norMD (Thompson et al, 2001)
- Column scores
- Normalisation for the sequence set to be aligned (number, length, similarity)
  <0.3      bad alignment
  0.3-0.7   some local errors
  >0.7      good alignment

Objective functions: NorMD
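The sum-of-pairs arithmetic from the objective functions slide (four sequences of three residues, with the Blosum62 values N-N = 6, N-C = -3, C-C = 9) can be checked with a few lines of Python. This is a minimal sketch: the function name, the dictionary holding only the three quoted scores, and the grouping of the slide's residues into rows are assumptions made for illustration, and gaps are not handled.

```python
from itertools import combinations

# Only the Blosum62 entries quoted on the slide (N and C); stored symmetrically.
SCORES = {
    ("N", "N"): 6,
    ("N", "C"): -3,
    ("C", "N"): -3,
    ("C", "C"): 9,
}


def sum_of_pairs(alignment, scores):
    """Sum the substitution scores over every pair of sequences and every column.

    Assumes all rows have the same length and contain no gap characters.
    """
    total = 0
    for row_a, row_b in combinations(alignment, 2):
        total += sum(scores[(a, b)] for a, b in zip(row_a, row_b))
    return total


# Rows as read off the flattened slide (assumed grouping of the residues).
alignment = ["NNN", "NNN", "NCN", "NCC"]

if __name__ == "__main__":
    print(sum_of_pairs(alignment, SCORES))  # 48, the total given on the slide
```

Handling real alignments would additionally require a full substitution matrix and a rule for scoring residue-gap and gap-gap pairs, neither of which the slide specifies.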
[Figure: NorMD example. Multiple alignment of GluRS and GlnRS sequences in two groups (Archaeal/Eukaryotic GluRS + GlnRS; Bacterial GluRS), with the ‘HIGH’ and ‘KMSKS’ motifs and H8 marked. Rows shown include syq_luplu, syq_human, syq_ecoli, syq_haein, sye_metja, sye_metth, sye_mettm, pyro_hori1, pyro_aby1, sye_arcfu, aero_perni, sye_sulso, syep_human, caeno_eleg, syep_drome, schizo_pom, syec_yeast and arab_thali.]
PRFPTVQGIVRR-GLKIEALIQFILEQGA---------------------------SKN-LNLMEWDKLWSINKRIIDP-VCPRHTAVVAERRVLFTLTDGPDEPFVRMIPKHKKFEGA---GEKATTFTK--SIWLEEADASAISV-----------GEEVTLMDWGNAIVKEITKDEEGRVTALSGVLNLQG---------SVKTTKLKLTWLPDTN----ELVNLTLTEFDYLITKKK----LEDDDEVADFVNPNTKKET-LALGDSNMRNLKCGDVIQLERKGYFRCDVP------FVKSSKPIVLFSIPDGRAA arab_thali arab_thali PRFPTVQGIVRR-GLKIEALIQFILEQGA---------------------------SKN-LNLMEWDKLWSINKRIIDP-VCPRHTAVVAERRVLFTLTDGPDEPFVRMIPKHKKFEGA---GEKATTFTK--SIWLEEADASAISV-----------GEEVTLMDWGNAIVKEITKDEEGRVTALSGVLNLQG---------SVKTTKLKLTWLPDTN----ELVNLTLTEFDYLITKKK----LEDDDEVADFVNPNTKKET-LALGDSNMRNLKCGDVIQLERKGYFRCDVP------FVKSSKPIVLFSIPDGRAA IKEDIHPSLPVRTRFAPSPTGFLHLGSLRTALYNYLLARNTNGQFLLRLEDTDQ--KRLIEGAEENIYEILKWCNINYDET---------PIKQSERKLIYDKYVKILLSSGKAYRCFCSKERLNDLRHSAMELKPPSMASYDRCCAHLGEEEIKSKLAQ--------GIPFTVRFKSP-ERYPTFTDLLHGQINLQPQVNFNDKRYDDLILVKSD---------------KLPTYHLANVVDDHLMGITHVIRGEEWLPSTPKHIALYNAFGWA-----CPKFIHIPLLTTVG-DKKLSKRKGD--------------syem_yeast ::: ---MSISDLKRQ-GVLPEALINFCALFGWSPPRDLASKKHECFSMEELETIFNLNGLTKGNAKVDDKKLWFFNKHFLQKRILNPSTLRELVDDIMPSLESIYNTSTISREKVAKILLNCGGSLSRINDF---HDEFYYFFEKPKYN-----------DNDAVTKFLSKNESRHIA--------HLLKKLGQFQEG------TDAQEVESMVETMYYEN-----GFSRKVTYQAMRFALA-------------------------------GCHPGAKIAAMIDILG-IKESNKRLSEGLQFLQREKK------------syem_yeast syem_yeast ---MSISDLKRQ-GVLPEALINFCALFGWSPPRDLASKKHECFSMEELETIFNLNGLTKGNAKVDDKKLWFFNKHFLQKRILNPSTLRELVDDIMPSLESIYNTSTISREKVAKILLNCGGSLSRINDF---HDEFYYFFEKPKYN-----------DNDAVTKFLSKNESRHIA--------HLLKKLGQFQEG------TDAQEVESMVETMYYEN-----GFSRKVTYQAMRFALA-------------------------------GCHPGAKIAAMIDILG-IKESNKRLSEGLQFLQREKK-------------------MTTVRTRIAPSPTGDPHVGTAYIALFNLCFARQHGGQFILRIEDTDQ--LRSTRESEQQIYDALRWLGIEWDEGPDVGGP-HGPYRQSERGHIYKRYSDELVEKGHAFTCFCTPERLDAVRAEQMARK--ETPRYDGHCMHLPKDEVQRRLAA--------GESHVTRMKVPTEGVCVVPDMLRGDVEIPWDRMD------MQVLMKAD---------------GLPTYFLANVVDDHLMGITHVLRGEEWLPSAPKLIKLYEYFGWE-----QPQLCYMPLLRNPD-KSKLSKRKNP--------------pseudo_aer ::: 
---TSITFYERM-GYLPQALLNYLGRMGWSMP-----DEREKFTLAEMIEHFDLSRVSLGGPIFDLEKLSWLNGQWIREQSV-EEFAREVQKWALNP------------EYLMKIAPHVQGRVENFSQIAP-LAGFFFSGGVPLDASLF--------EHKKLDPTQVRQVLQLVL--------WKLESLRQWE-----------KERITGCIQAVAEH----LQLKLRDVM-PLMFPAIT------------------------------GHASSVSVLDAMEILG-ADLSRYRLRQALELLGGASKKETKEWEKIRDAI pseudo_aer pseudo_aer ---TSITFYERM-GYLPQALLNYLGRMGWSMP-----DEREKFTLAEMIEHFDLSRVSLGGPIFDLEKLSWLNGQWIREQSV-EEFAREVQKWALNP------------EYLMKIAPHVQGRVENFSQIAP-LAGFFFSGGVPLDASLF--------EHKKLDPTQVRQVLQLVL--------WKLESLRQWE-----------KERITGCIQAVAEH----LQLKLRDVM-PLMFPAIT------------------------------GHASSVSVLDAMEILG-ADLSRYRLRQALELLGGASKKETKEWEKIRDAI -----MADSAVRVRIAPSPTGEPHVGTAYIALFNYLFAKKHGGKFILRIEDTDA--TRSTPEFEKKVLDALKWCGLEWSEGPDIGGP-YGPYRQSDRKDIYKPYVEKIVANGHGFRCFCTPERLEQMREAQRAAG--KPPKYDGLCLSLSAEEVTSRVDA--------GEPHVVRMKIPTEGSCKFRDGVYGDVEIPWEAVD------MQVLLKAD---------------GMPTYHMANVVDDHLMKITHVARGEEWLASVPKHILIYQYLGLE-----PPVFMHLSLMRNAD-KSKLSKRKNP--------------sye_rhime ::: ---TSISYYTAL-GYLPEALMNFLGLFFIQIA-----EGEELLTMEELAEKFDPENLSKAGAIFDIQKLDWLNARWIREKLSEEEFAARVLAWAMDN------------ERLKEGLKLSQTRISKLGELPD-LAAFLFKSDLGLQPAAF--------AGVKASPEEMLKILNTVQ--------PDLEKILEWN-----------KDSIETELR-ASER----MGKKLKAVVAPLFVACS-------------------------------GSQRSLPLFDSMELLG-RSVVRQRLKVAAQVVASMAGSGKQ--------sye_rhime sye_rhime ---TSISYYTAL-GYLPEALMNFLGLFFIQIA-----EGEELLTMEELAEKFDPENLSKAGAIFDIQKLDWLNARWIREKLSEEEFAARVLAWAMDN------------ERLKEGLKLSQTRISKLGELPD-LAAFLFKSDLGLQPAAF--------AGVKASPEEMLKILNTVQ--------PDLEKILEWN-----------KDSIETELR-ASER----MGKKLKAVVAPLFVACS-------------------------------GSQRSLPLFDSMELLG-RSVVRQRLKVAAQVVASMAGSGKQ-------------MAWENVRVRVAPSPTGDPHVGTAYMALFNEIFAKRFNGKMILRIEDTDQ--TRSRDDYEKNIFSALQWCGIQWDEGPDIGGP-HGPYRQSERTEIYREYAELLLKTDYAYKCFATPKELEEMRAVATTLG--YRGGYDRRYRYLSPEEIEARTQE--------GQPYTIRLKVPLTGECVLEDYCKGRVVFPWADVD------DQVLMKSD---------------GFPTYHFANVVDDHLMGITHVLRGEEWLSSTPKHLLLYEAFGWE-----PPIFLHMPLLLNPD-GTKLSKRKNP--------------chlamy_psi ::: 
---TSIFYYRDA-GYIKEAFMNFLTLMGYSME-----GDEEVYSLEKLIANFDPKRIGKSGAVFDVRKLDWMNKHYLNHEGSPENLLARLKDWLVND------------EFLLKILPLCQSRMATLAEFVG-LSEFFFSVLPEYSKEEL--------LPAAISQEKAAILFYSYV--------KYLEKTDLWV-----------KDQFYLGSKWLSEA----FQVHHKKVVIPLLYVAIT------------------------------GKKQGLPLFDSMELLG-KPRTRMRMVHAQNLLGGVPKKIQTAIDKVLKEE chlamy_psi chlamy_psi ---TSIFYYRDA-GYIKEAFMNFLTLMGYSME-----GDEEVYSLEKLIANFDPKRIGKSGAVFDVRKLDWMNKHYLNHEGSPENLLARLKDWLVND------------EFLLKILPLCQSRMATLAEFVG-LSEFFFSVLPEYSKEEL--------LPAAISQEKAAILFYSYV--------KYLEKTDLWV-----------KDQFYLGSKWLSEA----FQVHHKKVVIPLLYVAIT------------------------------GKKQGLPLFDSMELLG-KPRTRMRMVHAQNLLGGVPKKIQTAIDKVLKEE -------MEKIRTRYAPSPTGYLHVGGTRTAIFNFLLAKHFNGEFIIRIEDTDT--ERNIKEGINSQFDNLRWLGVIADESVYNPGN-YGPYLQSQKLAVYKKLAFDLIEKNLAYRCFCSKEKLESDRKQAINNH--KTPKYLGHCRNLHSKKITNHLEK--------NDPFTIRLKINNEAEYSWNDLVRGQITIPGSALT------DIVILKAN---------------GVATYNFAVVIDDYDMEITDVLRGAEHISNTAYQLAIYQALGFKR----IPRFGHLSVIVDES-GKKLSKRDEKTT------------sye_mycge ::: ---QFIEQFKQQ-GYLPEALLNFLALLGWHP-----QYNQEFFNLKQLIENFSLSRVVSAPAFFDIKKLQWINANYIKQ-LTDNAYFNFIDNYLDVKVDYLK-------DKNREISLLFKNQITHGVQINE-LIRESFATKIGVENLA---------KKSHILFKNIKLFLEQLA--------KSLQGLEEWK-----------AEQIKTTINKVGAV----FNLKGKQLFMPIRLIFT-------------------------------NKEHGPDLAHIIEIFD-KESAINLIKQFINATNLF--------------sye_mycge sye_mycge ---QFIEQFKQQ-GYLPEALLNFLALLGWHP-----QYNQEFFNLKQLIENFSLSRVVSAPAFFDIKKLQWINANYIKQ-LTDNAYFNFIDNYLDVKVDYLK-------DKNREISLLFKNQITHGVQINE-LIRESFATKIGVENLA---------KKSHILFKNIKLFLEQLA--------KSLQGLEEWK-----------AEQIKTTINKVGAV----FNLKGKQLFMPIRLIFT-------------------------------NKEHGPDLAHIIEIFD-KESAINLIKQFINATNLF---------------------MEKIRTRYAPSPTGYLHVGGARTAIFNFLLAKHFNGEFIIRIEDTDT--ERNVEGGIESQLENLRWLGIIPDESIYNPGN-YGPYIQSQKLATYKKLAYELVGKGLAYRCFCTKEKLEHERQLALEHH--QTPKYLGTCRNLHSKHIQTNLDN--------QVPFTIRLKINQDAEFAWNDQVRGKITIPGNSLT------DIVLLKAN---------------GIATYNFAVVIDDHDMEITDVLRGAEHISNTAYQLAINQALGYQR----IPRFGHLSVIVDKS-GKKLSKRDTKTI------------sye_mycpn ::: 
---QFIEQFKQE-GYLPEAVVNFLALLGWNS-----DFNREFFTINQLIESFTVNRVVGAPAFFDIKKLQWINAHYIKE-LSDNAYFNFIDNYLTIDFDYLK-------NKRKEVSLLFKNQLAFGIEINQ-LIKETFAPKLGVQHLS---------VKHRELFKELQSALQQLS--------EQLQALPDWT-----------KDNVKSTLTQIGEQ----FNLKGKKLFMPLRLIFT-------------------------------NKEHGPDLAGIMVLHG-KTQVLALLQEFIHATNLF--------------sye_mycpn sye_mycpn ---QFIEQFKQE-GYLPEAVVNFLALLGWNS-----DFNREFFTINQLIESFTVNRVVGAPAFFDIKKLQWINAHYIKE-LSDNAYFNFIDNYLTIDFDYLK-------NKRKEVSLLFKNQLAFGIEINQ-LIKETFAPKLGVQHLS---------VKHRELFKELQSALQQLS--------EQLQALPDWT-----------KDNVKSTLTQIGEQ----FNLKGKKLFMPLRLIFT-------------------------------NKEHGPDLAGIMVLHG-KTQVLALLQEFIHATNLF---------------------MKKLRTRYAPSPTGYLHIGGARTALFNYLLAKHYNGDFIIRIEDTDV--KRNIADGEASQIENLKWLNIEANESPLKPNEKYGPYRQSQKLEKYLKIAHELIEKGYAYKAYDNSEELEEQKKHSEKLG-VASFRYQRDFLKISEEEKQKRDAS--------G-AYSIRVICPKNTTYQWDDLVRGNIAVNSNDIG------DWIIIKSD---------------DYPTYNFAVVIDDIDMEISHILRGEEHITNTPKQMMIYDYLNAP-----KPLFGHLTIITNME-GKKLSKRDLSLK------------sye_mycpu ::: ---QFIHEYKEE-GYNSQAIFNFLTLLGWTD-----EKARELMDHDEIIKSFLYTRLSKSPSKFDITKMQWFSKQYWKN-TPNEELIKILNLNDYDN------------DWINLFLDLYKENIYSLNQLKN-YLKIYKQANLNQ-------------EKDLDLNDAEKNVVKSFS--------SYIDYS-NFS-----------VNQIQEAINKTQEK----LSIKGKNLFLPIRKATT-------------------------------FQEHGPELAKAIYLFG-SEIIEKRMKKWK--------------------sye_mycpu sye_mycpu ---QFIHEYKEE-GYNSQAIFNFLTLLGWTD-----EKARELMDHDEIIKSFLYTRLSKSPSKFDITKMQWFSKQYWKN-TPNEELIKILNLNDYDN------------DWINLFLDLYKENIYSLNQLKN-YLKIYKQANLNQ-------------EKDLDLNDAEKNVVKSFS--------SYIDYS-NFS-----------VNQIQEAINKTQEK----LSIKGKNLFLPIRKATT-------------------------------FQEHGPELAKAIYLFG-SEIIEKRMKKWK-----------------------------MVVTRIAPSPTGDPHVGTAYIALFNYAWARRNGGRFIVRIEDTDR--ARYVPGAEERILAALKWLGLSYDEGPDVAAP-TGPYRQSERLPLYQKYAEELLKRGWAYRAFETPEELEQIRKEK--------GGYDGRARNIPPEEAEERARR--------GEPHVIRLKVPRPGTTEVKDELRGVVVYDNQEIP------DVVLLKSD---------------GYPTYHLANVVDDHLMGVTDVIRAEEWLVSTPIHVLLYRAFGWE-----APRFYHMPLLRNPD-KTKISKRKSH--------------sye_theth ::: 
---TSLDWYKAE-GFLPEALRNYLCLMGFSMP-----DGREIFTLEEFIQAFTWERVSLGGPVFDLEKLRWMNGKYIREVLSLEEVAERVKPFLREAGLSWESE-----AYLRRAVELMRPRFDTLKEFPE-KARYLFTEDYPVS------------EKAQRKLEEGLPLLKELY--------PRLRAQEEWT-----------EAALEALLRGFAAE----KGVKLGQVAQPLRAALT-------------------------------GSLETPGLFEILALLG-KERALRRLERALA-------------------sye_theth sye_theth ---TSLDWYKAE-GFLPEALRNYLCLMGFSMP-----DGREIFTLEEFIQAFTWERVSLGGPVFDLEKLRWMNGKYIREVLSLEEVAERVKPFLREAGLSWESE-----AYLRRAVELMRPRFDTLKEFPE-KARYLFTEDYPVS------------EKAQRKLEEGLPLLKELY--------PRLRAQEEWT-----------EAALEALLRGFAAE----KGVKLGQVAQPLRAALT-------------------------------GSLETPGLFEILALLG-KERALRRLERALA-------------------ASADSGGSGPVRVRFAPSPTGNLHVGGARTALFNYLFARSRGGKFVLRVEDTDL--ERSTKKSEEAVLTDLSWLGLDWDEGPDIGGD-FGPYRQSERNALYKEHAQKLMESGAVYRCFCSNEELEKMKETANRMK--IPPVYMGKWATASDAEVQQELEK--------GTPYTYRFRVPKEGSLKINDLIRGEVSWNLNTLG------DFVIMRSN---------------GQPVYNFCVTVDDATMRISHVIRAEEHLPNTLRQALIYKALGFA-----MPLFAHVSLILAPD-KSKLSKRHGA--------------sye_horvu ::: ---TSVGQYKEM-GYLPQAMVNYLALLGWGD-----GTENEFFTIDDLVEKFTIDRVNKSGAVFDATKLKWMNGQHLRS-LPSDLLIKDFEDQWRSTGILLESES----GFAKEAAELLKEGIDLITDADAALCKLLSYPLHETLSSD---------EAKSVVEDKLSEVASGLI--------SAYDSG-ELD--------QALAEGHDGWKKWVKSFGKT-HKRKGKSLFMPLRVLLT-------------------------------GKLHGPAMDSTVILVH-KAGTSGAVAPQSGFVSLDERFKILKEVNWESLQ sye_horvu sye_horvu ---TSVGQYKEM-GYLPQAMVNYLALLGWGD-----GTENEFFTIDDLVEKFTIDRVNKSGAVFDATKLKWMNGQHLRS-LPSDLLIKDFEDQWRSTGILLESES----GFAKEAAELLKEGIDLITDADAALCKLLSYPLHETLSSD---------EAKSVVEDKLSEVASGLI--------SAYDSG-ELD--------QALAEGHDGWKKWVKSFGKT-HKRKGKSLFMPLRVLLT-------------------------------GKLHGPAMDSTVILVH-KAGTSGAVAPQSGFVSLDERFKILKEVNWESLQ VYASAGDGGDVRVRFAPSPTGNLHVGGARTALFNYLYARAKGGKFILRIEDTDL--ERSTKESEEAVLRDLSWLGPAWDEGPGIGGE-YGPYRQSERNALYKQFAEKLLQSGHVYRCFCSNEELEKMKEIAKLKQ--LPPVYTGRWASATEEEVVEELAK--------GTPYTYRFRVPKEGSLKIDDLIRGEVSWNLDTLG------DFVIMRSN---------------GQPVYNFCVTVDDATMAISHVIRAEEHLPNTLRQALIYKALGFP-----MPHFAHVSLILAPD-RSKLSKRHGA--------------sye_tobac ::: 
---TSVGQFRDM-GYLPQAMVNYLALLGWGD-----GTENEFFTLEQLVEKFTIERVNKSGAIFDSTKLRWMNGQHLRS-LPSEELNRIIGERWKDAGIATESQG----IFIQDAVLLLKDGIDLITDSEKALSSLLSYPLYETLASA---------EGKPILEDGVSEVAKSLL--------AAYDSG-ELS--------GALAEGQPGWQKWAKNFGKL-LKRKGKSLFMPLRVLLT-------------------------------GKLHGPDIGATTVLLY-KAGTSGSVVPQAGFVTFDERFKILREVQWESFS sye_tobac sye_tobac ---TSVGQFRDM-GYLPQAMVNYLALLGWGD-----GTENEFFTLEQLVEKFTIERVNKSGAIFDSTKLRWMNGQHLRS-LPSEELNRIIGERWKDAGIATESQG----IFIQDAVLLLKDGIDLITDSEKALSSLLSYPLYETLASA---------EGKPILEDGVSEVAKSLL--------AAYDSG-ELS--------GALAEGQPGWQKWAKNFGKL-LKRKGKSLFMPLRVLLT-------------------------------GKLHGPDIGATTVLLY-KAGTSGSVVPQAGFVTFDERFKILREVQWESFS ---------MVRVRFAPSPTGFLHVGGARTALFNFLFARKEKGKFILRIEDTDL--ERSEREYEEKLMESLRWLGLLWDEGPDVGGD-HGPYRQSERVEIYREHAERLVKEGKAYYVYAYPEEIEEMREKLLSEG--KAPHYSQEMFEKFDTPERRREYEEK------GLRPAVFFKMPR-KDYVLNDVVKGEVVFKTGAIG------DFVIMRSN---------------GLPTYNFACVVDDMLMEITHVIRGDDHLSNTLRQLALYEAFEKA-----PPVFAHVSTILGPD-GKKLSKRHGA--------------thermo_mar ::: ---TSVEAFRDM-GYLPEALVNYLALLGWSH-----PEGKELLTLEELISSFSLDRLSPNPAIFDPQKLKWMNGYYLRN-MPIEKLAELAKPFFEKAGIKIIDE-----EYFKKVLEITKERVEVLSEFPE-ESRFFFEDP-----------------APVEIPEEMKEVFSQLK--------EELQNV-RWT-----------MEEITPVFKKVLKQ----HGVKPKEFYMTLRRVLT-------------------------------GREEGPELVNIIPLLG-KEIFLRRIERSLGG------------------thermo_mar thermo_mar ---TSVEAFRDM-GYLPEALVNYLALLGWSH-----PEGKELLTLEELISSFSLDRLSPNPAIFDPQKLKWMNGYYLRN-MPIEKLAELAKPFFEKAGIKIIDE-----EYFKKVLEITKERVEVLSEFPE-ESRFFFEDP-----------------APVEIPEEMKEVFSQLK--------EELQNV-RWT-----------MEEITPVFKKVLKQ----HGVKPKEFYMTLRRVLT-------------------------------GREEGPELVNIIPLLG-KEIFLRRIERSLGG--------------------MASASGSPVRVRFCPSPTGNPHVGLVRTALFNWAFARHHQGTLVFRIEDTDA--ARDSEESYDQLLDSMRWLGFDWDEGPEVGGP-HAPYRQSQRMDIYQDVAQKLLDAGHAYRCYCSQEELDTRREAARAAG--KPSGYDGHCRELTDAQVEEYTSQ--------GREPIVRFRMPDE-AITFTDLVRGEITYLPENVP------DYGIVRAN---------------GAPLYTLVNPVDDALMEITHVLRGEDLLSSTPRQIALYKALIELGVAKEIPAFGHLPYVMGEG-NKKLSKRDPQ--------------strepto_co ::: 
---SSLNLYRER-GFLPEGLLNYLSLLGWSLS-----ADQDIFTIEEMVAAFDVSDVQPNPARFDLKKCEAINGDHIRL-LEVKDFTERCRPWLKA-PVAPWAPEDFDEAKWQAIAPHAQTRLKVLSEITD-NVDFLFLPEPVFDEA----------SWTKAMKEGSDALLTTAR--------EKLD-AADWTS----------PEALKEAVLAAGEA----HGLKLGKAQAPVRVAVT-------------------------------GRTVGLPLFESLEVLG-KEKALARIDAALARLAA---------------strepto_co strepto_co ---SSLNLYRER-GFLPEGLLNYLSLLGWSLS-----ADQDIFTIEEMVAAFDVSDVQPNPARFDLKKCEAINGDHIRL-LEVKDFTERCRPWLKA-PVAPWAPEDFDEAKWQAIAPHAQTRLKVLSEITD-NVDFLFLPEPVFDEA----------SWTKAMKEGSDALLTTAR--------EKLD-AADWTS----------PEALKEAVLAAGEA----HGLKLGKAQAPVRVAVT-------------------------------GRTVGLPLFESLEVLG-KEKALARIDAALARLAA--------------------MANKKIRVRYAPSPTGHLHIGNARTALFNYLFARHNKGTLVLRIEDADT--ERNVEGGAESQIENLHWLGIDWDEGPDIGGD-YGPYKQSERKDIYQKYIDQLLEEGKAYYSFKTEEELEAQREEQRAMG--IAPHYVYEYEGMTTDEIKQAQAEARAK----GLKPVVRIHIPEGVTYEWDDIVKGHLSFESDTIG-----GDFVIQKRD---------------GMPTYNFAVVIDDHLMEISHVLRGDDHISNTPKQLCVYEALGWE-----APVFGHMTLIINSATGKKLSKRDESVL------------sye_lacde ::: ---QFIEQYREL-VSCQKPCSTSSSLLGWSP-----VGESEIFSKREFIKQFDPARLSKSPAAFDQKKLDWVNNQYMKT-ADRDELLDLALHNLQEAGLVEANPAPGKMEWVRQLVNMYANQMSYTKQIVD-LSKIFFTEAKYLTDE----------EVEEIKKDEARPAIEEFK--------KQLDKLDNFT-----------AKKIMGAIMATRRE----TGIKGRKLFMPIRIATT-------------------------------RSMVGPGIGEAMELMG-KDTVMKHLDLTLKQLSEAGIE-----------sye_lacde sye_lacde ---QFIEQYREL-VSCQKPCSTSSSLLGWSP-----VGESEIFSKREFIKQFDPARLSKSPAAFDQKKLDWVNNQYMKT-ADRDELLDLALHNLQEAGLVEANPAPGKMEWVRQLVNMYANQMSYTKQIVD-LSKIFFTEAKYLTDE----------EVEEIKKDEARPAIEEFK--------KQLDKLDNFT-----------AKKIMGAIMATRRE----TGIKGRKLFMPIRIATT-------------------------------RSMVGPGIGEAMELMG-KDTVMKHLDLTLKQLSEAGIE-----------------MGNEVRVRYAPSPTGHLHIGNARTALFNYLFARNQGGKFIIRVEDTDK--KRNIEGGEQSQLNYLKWLGIDWDESVDVGGE-YGPYRQSERNDIYKVYYEELLEKGLAYKCYCTEEELEKEREEQIARG--EMPRYSGKHRDLTQEEQEKFIAE--------GRKPSIRFRVPEGKVIAFNDIVKGEISFESDGIG------DFVIVKKD---------------GTPTYNFAVAIDDYLMKMTHVLRGEDHISNTPKQIMIYQAFGWD-----IPQFGHMTLIVNES-RKKLSKRDESII------------sye_bacsu ::: 
---QFIEQYKEL-GYLPEALFNFIGLLGWSP-----VGEEELFTKEQFIEIFDVNRLSKSPALFDMHKLKWVNNQYVKK-LDLDQVVELTLPHLQKAGKVGTELSAEEQEWVRKLISLYHEQLSYGAEIVE-LTDLFFTDEIEYNQE----------AKAVLEEEQVPEVLSTFA--------AKLEELEEFT-----------PDNIKASIKAVQKE----TGHKGKKLFMPIRVAVT-------------------------------GQTHGPELPQSIELIG-KETAIQRLKNI---------------------sye_bacsu sye_bacsu ---QFIEQYKEL-GYLPEALFNFIGLLGWSP-----VGEEELFTKEQFIEIFDVNRLSKSPALFDMHKLKWVNNQYVKK-LDLDQVVELTLPHLQKAGKVGTELSAEEQEWVRKLISLYHEQLSYGAEIVE-LTDLFFTDEIEYNQE----------AKAVLEEEQVPEVLSTFA--------AKLEELEEFT-----------PDNIKASIKAVQKE----TGHKGKKLFMPIRVAVT-------------------------------GQTHGPELPQSIELIG-KETAIQRLKNI---------------------------MAKDVRVGYAPSPTGHLHIGGARTALFNYLFARHHGGKMIVRIEDTDI--ERNVEGGEQSQLENLQWLGIDYDESVDKDGG-YGPYRQTERLDIYRKYVDELLEQGHAYKCFCTPEELEREREEQRAAG-IAAPQYSGKCRRLTPEQVAELEAQ--------GKPYTIRLKVPEGKTYEVDDLVRGKVTFESKDIG------DWVIVKAN---------------GIPTYNFAVVIDDHLMEISHVFRGEEHLSNTPKQLMVYEYFGWE-----PPQFAHLTLIVNEQ-RKKLSKRDESII------------sye_bacst ::: ---QFVSQYKEL-GYLPEAMFNFFALLGWSP-----EGEEEIFSKDELIRIFDVSRLSKSPSMFDTKKLTWMNNQYIKK-LDLDRLVELALPHLVKAGRLPADMSDEQRQWARDLIALYQEQMSYGAEIVP-LSELFFKEEVEYEDE----------ARQVLAEEQVPDVLSAFL--------AHVRDLDPFT-----------ADEIKAAIKAVQKA----TGQKGKKLFMPIRAAVT-------------------------------GQTHGPELPFAIQLLG-KQKVIERLERALQEKF----------------sye_bacst sye_bacst ---QFVSQYKEL-GYLPEAMFNFFALLGWSP-----EGEEEIFSKDELIRIFDVSRLSKSPSMFDTKKLTWMNNQYIKK-LDLDRLVELALPHLVKAGRLPADMSDEQRQWARDLIALYQEQMSYGAEIVP-LSELFFKEEVEYEDE----------ARQVLAEEQVPDVLSAFL--------AHVRDLDPFT-----------ADEIKAAIKAVQKA----TGQKGKKLFMPIRAAVT-------------------------------GQTHGPELPFAIQLLG-KQKVIERLERALQEKF----------------TSDGTPQAAKVRVRFCPSPTGVPHVGMVRTALFNWAYARHTGGTFVLRIEDTDA--DRDSEESYLALLDALRWLGLNWDEGPEVGGP-YGPYRQSQRTDIYREVVAKLLATGEAYYAFSTPEEVENRHLAAGRNP---KLGYDNFDRDLTDAQFSAYLAE--------GRKPVVRLRMPDE-DISWDDLVRGTTTFAVGTVP------DYVLTRAS---------------GDPLYTLVNPCDDALMKITHVLRGEDLLSSTPRQVALYQALIRIGMAERIPEFGHFPSVLGEG-TKKLSKREPQ--------------mycob_lepr : mycob_lepr 
mycob_lepr :: ---SNLFAHRDR-GFIPEGLLNYLALLGWAIA-----DDHDLFSLDEMVAAFDVVDVNSNPARFDQKKADAVNAEHIRM-LDSEDFAGRLRDYFTTHGYHIALDPANYEAGFVAAAQLVQTRIVVLGDAWD-LLKFLNDDEYSIDSK----------AAAKELDADAGPVLDVAC--------AVLDSLVDWT-----------TASIEDVLKVALIE---GLGLKPRKVFGPIRVAAT-------------------------------GALVSPPLFESLELLG-RARSLQRLSAARARVTSA-----------------SNLFAHRDR-GFIPEGLLNYLALLGWAIA-----DDHDLFSLDEMVAAFDVVDVNSNPARFDQKKADAVNAEHIRM-LDSEDFAGRLRDYFTTHGYHIALDPANYEAGFVAAAQLVQTRIVVLGDAWD-LLKFLNDDEYSIDSK----------AAAKELDADAGPVLDVAC--------AVLDSLVDWT-----------TASIEDVLKVALIE---GLGLKPRKVFGPIRVAAT-------------------------------GALVSPPLFESLELLG-RARSLQRLSAARARVTSA----------------------MSTRVRYAPSPTGLQHIGGIRTALFNYFFAKSCGGKFLLRIEDTDQ--SRYSPEAENDLYSSLKWLGISFDEGPVVGGD-YAPYVQSQRSAIYKQYAKYLIESGHAYYCYCSPERLERIKKIQNINK--MPPGYDRHCRNLSNEEVENALIK--------KIKPVVRFKIPLEGDTSFDDILLGRITWANKDIS-----PDPVILKSD---------------GLPTYHLANVVDDYLMKITHVLRAQEWVSSGPLHVLLYKAFKWK-----PPIYCHLPMVMGND-GQKLSKRHGS--------------sye_borbu ::: ---TALRQFIED-GYLPEAIINYVTLLGWSYD-----DKREFFSKNDLEQFFSIEKINKSPAIFDYHKLDFFNSYYIRE-KKDEDLFNLLLPFFQKKGYVSKPSTLEENQKLKLLIPLIKSRIKKLSDALN-MTKFFYEDIKSWNLDEF--------LSRKKTAKEVCSILELIK--------PILEGFEKRS-----------SEENDKIFYDFAES----NGFKLGEILLPIRIAAL-------------------------------GSKVSPPLFDSLKLIG-KSKVFERIKLAQEFLRINE-------------sye_borbu sye_borbu ---TALRQFIED-GYLPEAIINYVTLLGWSYD-----DKREFFSKNDLEQFFSIEKINKSPAIFDYHKLDFFNSYYIRE-KKDEDLFNLLLPFFQKKGYVSKPSTLEENQKLKLLIPLIKSRIKKLSDALN-MTKFFYEDIKSWNLDEF--------LSRKKTAKEVCSILELIK--------PILEGFEKRS-----------SEENDKIFYDFAES----NGFKLGEILLPIRIAAL-------------------------------GSKVSPPLFDSLKLIG-KSKVFERIKLAQEFLRINE-------------APFNLDPNVKVRTRFAPSPTGYLHVGGARTALYSWLYAKHNNGEFVLRIEDTDL--ERSTPEATAAIIEGMEWLNLPWEH---------GPYYQTKRFDRYNQVIDEMIEQGLAYRCYCTKEHLEELRHTQEQNK--EKPRYDRHCLHDH-NHSP-------------DEPHVVRFKNPTEGSVVFDDAVRGRIEISNSELD------DLIIRRTD---------------GSPTYNFCVVVDDWDMGITHVVRGEDHINNTPRQINILKAIGAP-----IPTYAHVSMINGDD-GQKLSKRHGA--------------sye_haein ::: 
---VSVMQYRDD-GYLPEALINYLVRLGWGH------GDQEIFSREEMINYFELDHVSKSASAFNTEKLQWLNQHYIRE-LPPEYVAKHLEWHYKDQGIDTSNG-----PALTEIVTMLAERCKTLKEMAR-SSRYFFEEFETFDEA----------AAKKHFKGNAAEALAKVK--------EKLTALSSWD-----------LHSIHEAIEQTAAE----LEVGMGKVGMPLRVAVT-------------------------------GSGQSPSMDVTLVGIG-RDRVLARIQRAIDFIHAQNA------------sye_haein sye_haein ---VSVMQYRDD-GYLPEALINYLVRLGWGH------GDQEIFSREEMINYFELDHVSKSASAFNTEKLQWLNQHYIRE-LPPEYVAKHLEWHYKDQGIDTSNG-----PALTEIVTMLAERCKTLKEMAR-SSRYFFEEFETFDEA----------AAKKHFKGNAAEALAKVK--------EKLTALSSWD-----------LHSIHEAIEQTAAE----LEVGMGKVGMPLRVAVT-------------------------------GSGQSPSMDVTLVGIG-RDRVLARIQRAIDFIHAQNA--------------------MKIKTRFAPSPTGYLHVGGARTALYSWLFARNHGGEFVLRIEDTDL--ERSTPEAIEAIMDGMNWLSLEWDE---------GPYYQTKRFDRYNAVIDQMLEEGTAYKCYCSKERLEALREEQMAKG--EKPRYDGRCRHSHEHHAD-------------DEPCVVRFANPQEGSVVFDDQIRGPIEFSNQELD------DLIIRRTD---------------GSPTYNFCVVVDDWDMEITHVIRGEDHINNTPRQINILKALKAP-----VPVYAHVSMINGDD-GKKLSKRHGA--------------sye_ecoli : sye_ecoli sye_ecoli :: ---VSVMQYRDD-GYLPEALLNYLVRLGWSH------GDQEIFTREEMIKYFTLNAVSKSASAFNTDKLLWLNHHYINA-LPPEYVATHLQWHIEQENIDTRNG-----PQLADLVKLLGERCKTLKEMAQ-SCRYFYEDFAEFDAD----------AAKKHLRPVARQPLEVVR--------DKLAAITDWT-----------AENVHHAIQATADE----LEVGMGKVGMPLRVAVT-------------------------------GAGQSPALDVTVHAIG-KTRSIERINKALDFIAERENQQ-------------VSVMQYRDD-GYLPEALLNYLVRLGWSH------GDQEIFTREEMIKYFTLNAVSKSASAFNTDKLLWLNHHYINA-LPPEYVATHLQWHIEQENIDTRNG-----PQLADLVKLLGERCKTLKEMAQ-SCRYFYEDFAEFDAD----------AAKKHLRPVARQPLEVVR--------DKLAAITDWT-----------AENVHHAIQATADE----LEVGMGKVGMPLRVAVT-------------------------------GAGQSPALDVTVHAIG-KTRSIERINKALDFIAERENQQ---------------------MLRFAPSPTGDMHIGNLRAAIFNYIVAKQQYKPFLIRIEDTDK--ERNIEGKDQEILEILKLMGISWDKL----------VYQSHNIDYHREMAEKLLKENKAFYCYASAEFLEREKEKAKNEK--RPFRYSDEWATLEKDK---------------HHAPVVRLKAP-NHAVSFNDAIKKEVKFEPDELD------SFVLLRQD---------------KSPTYNFACACDDLLYKISLIIRGEDHVSNTPKQILIQQALGSND----PIVYAHLPIILDEVSGKKMSKRDEA--------------heli_pylor ::: 
---SSVKWLLNQ-GFLPVAIANYLITIGN-------KVPKEVFSLDEAIEWFSLENLSSSPAHFNLKYLKHLNHEHLKL-LDDDKLLELTSIKD---------------KNLLGLLRLFIEECGTLLELRE-KISLFLEPKD----------------IVKTYENEDFKERCLAL--------FNALTSMDFQA----------YKDFESFKKEAMRL----SQLKGKDFFKPLRILLT-------------------------------GNSHGVELPLIFPYIQSHHQEVLRLKA----------------------heli_pylor heli_pylor ---SSVKWLLNQ-GFLPVAIANYLITIGN-------KVPKEVFSLDEAIEWFSLENLSSSPAHFNLKYLKHLNHEHLKL-LDDDKLLELTSIKD---------------KNLLGLLRLFIEECGTLLELRE-KISLFLEPKD----------------IVKTYENEDFKERCLAL--------FNALTSMDFQA----------YKDFESFKKEAMRL----SQLKGKDFFKPLRILLT-------------------------------GNSHGVELPLIFPYIQSHHQEVLRLKA----------------------MKLTGFLKQNVRVRFAPSPTGHLHIGGLRTAFFNYLFAKKYGGDFILRIEDTDR--TRFIY-------SSLNFYNLLPDEGPREGGK-FGPYEQSKRLEIYRNAAYRLIDSGHAYRCFCSENRLDLLRKTAEKRG--EIPKYDRKCANLSSRDAVKMEQN--------GEKFVIRFKLD-KQNVQFHDEVFGSVNQFIDES-------DPVLLKSD---------------GFPTYHLANVIDDRKMEISHVIRGMEWLSSTGKHTILYKAFNWT-----PPKFVHLSLIMRSA-TKKLSKRDKD--------------caeno_eleg ::: ---AFVSYYSEQLGALPEAVLNLMIRNGAGIRN---FDAEHFYSLDEMIEQFDLSLLGRRNLLLDSDVLQKYSRMAFQK-SDFKELYPRIIDILNKKSNYSTSREDI--QKIVTFLKAKEENFGFLSSLST-EFSWFFTRPQ---------------SSQLLKESHPNVDLRNIL--------NSLLEIEVFN-----------SESLEYLAKNH--------QLNLAKAMGIVRISLI-------------------------------GSKKGPPISELVEFFG-MTECHRRI----RIMQELL-------------caeno_eleg caeno_eleg ---AFVSYYSEQLGALPEAVLNLMIRNGAGIRN---FDAEHFYSLDEMIEQFDLSLLGRRNLLLDSDVLQKYSRMAFQK-SDFKELYPRIIDILNKKSNYSTSREDI--QKIVTFLKAKEENFGFLSSLST-EFSWFFTRPQ---------------SSQLLKESHPNVDLRNIL--------NSLLEIEVFN-----------SESLEYLAKNH--------QLNLAKAMGIVRISLI-------------------------------GSKKGPPISELVEFFG-MTECHRRI----RIMQELL---------------------MTVRVRIAPSPTGNLHIGTARTAVFNWLFARHTGGTFILRVEDTDL--ERSKAEYTENIQSGLQWLGLNWDEG---------PFFQTQRLDHYRKAIQQLLDQGLAYRCYCTSEELEQMREAQKAKN--QAPRYDNRHRNLTPDQEQALRAE--------GRQPVIRFRIDDDRQIVWQDQIRGQVVWQGSDLG-----GDMVIARAS--------ENPEEAFGQPLYNLAVVVDDIDMAITHVIRGEDHIANTAKQILLYEALGGA-----VPTFAHTPLILNQE-GKKLSKRDGV--------------sye_syny3 : sye_syny3 
sye_syny3 :: ---TSIDDFRAM-GFLPQAIANYMCLLGWTPP----DSTQEIFTLAEAAEQFSLERVNKAGAKFDWQKLDWINSQYLHA-LPAAELVPLLIPHLEAGGHQVDPDRDQ--AWLVGLATLIGPSLTRLTDAAT-ESQLLFGDRLELKED----------GQKQLAVEGAKAVLEAAL--------TFSQNTPELT-----------LDEAKGEINRLTKE----LGLKKGVVMKSLRAGLM-------------------------------GTVQGPDLLQSWLLLQQKGWATTRLTQAIAAE--------------------TSIDDFRAM-GFLPQAIANYMCLLGWTPP----DSTQEIFTLAEAAEQFSLERVNKAGAKFDWQKLDWINSQYLHA-LPAAELVPLLIPHLEAGGHQVDPDRDQ--AWLVGLATLIGPSLTRLTDAAT-ESQLLFGDRLELKED----------GQKQLAVEGAKAVLEAAL--------TFSQNTPELT-----------LDEAKGEINRLTKE----LGLKKGVVMKSLRAGLM-------------------------------GTVQGPDLLQSWLLLQQKGWATTRLTQAIAAE------------------------MSKVKTRFAPSPTGYLHLGNARTAIFSYLFARHNNGGFVLRIEDTDP--ERSKKEYEEMLIEDLKWLGIDWDEF----------YRQSERFDIYREYVNKLLESGHAYPCFCTPEELEKEREEARKKG--IPYRYSGKCRHLTPEEVEKFKKE--------GKPFAIRFKVPENRTVVFEDLIKGHIAINTDDFG------DFVIVRSD---------------GSPTYNFVVVVDDALMGITHVIRGEDHIPNTPKQILIYEALGFP-----VPKFAHLPVILGED-RSKLSKRHGA--------------sye_aquae ::: ---VSVRAYREE-GYMPEALFNYLCLLGWSPP----EEGREIFSKEELIKIFDLKDVNDSPAVFNKEKLKWMNGVYIREVLPLDVLLERAIPFLEKAG--YDTSDR---EYIKKVLEYTRDSFDTLSEMVD-RLRPFFVDEFEIPEE----------LWSFLDDEKAYQVLSAFL--------EKIREKKPET-----------PQEVKKLAKEIQKA----LKVKPPQVWKPLRIALT-------------------------------GELEGVGIDILIAVLP-KEKIEKRILRVLEKLS----------------sye_aquae sye_aquae ---VSVRAYREE-GYMPEALFNYLCLLGWSPP----EEGREIFSKEELIKIFDLKDVNDSPAVFNKEKLKWMNGVYIREVLPLDVLLERAIPFLEKAG--YDTSDR---EYIKKVLEYTRDSFDTLSEMVD-RLRPFFVDEFEIPEE----------LWSFLDDEKAYQVLSAFL--------EKIREKKPET-----------PQEVKKLAKEIQKA----LKVKPPQVWKPLRIALT-------------------------------GELEGVGIDILIAVLP-KEKIEKRILRVLEKLS-----------------------MSLIVTRFAPSPTGYLHIGGLRTAIFNYLFARANQGKFFLRIEDTDL--SRNSIEAANAIIEAFKWVGLEYDG---------EILYQSKRFEIYKEYIQKLLDEDKAYYCYMSKEELDALREEQKARK--ETPRYDNRYRDFKGTPPK-------------GIEPVVRIKVPQNEVIGFNDGVKGEVKVNTNELD------DFIIARSD---------------GTPTYNFVVTIDDALMGITDVIRGDDHLSNTPKQIVLYKALNFK-----IPNFFHVPMILNEE-GQKLSKRHGA--------------sye_helpy ::: 
---TNVMDYQEM-GYLKEALVNFLARLGWSY------QDKEVFSMQELLELFDPKDLNSSPSCFSWHKLNWLNAHYLKN-QSVQELLKLLKPFSFSDLSHLNP------TQLDRLLDALKERSQTLKELAL-KIDEVLIAPVEYEEK----------VFKKLNQALVMPLLEKFK--------LELNKANFND-----------ESALENAMRQIIEE----EKIKAGSFMQPLRLALL-------------------------------GKGGGIGLKEALFILG-KTESVKRIEDFLKN------------------sye_helpy sye_helpy ---TNVMDYQEM-GYLKEALVNFLARLGWSY------QDKEVFSMQELLELFDPKDLNSSPSCFSWHKLNWLNAHYLKN-QSVQELLKLLKPFSFSDLSHLNP------TQLDRLLDALKERSQTLKELAL-KIDEVLIAPVEYEEK----------VFKKLNQALVMPLLEKFK--------LELNKANFND-----------ESALENAMRQIIEE----EKIKAGSFMQPLRLALL-------------------------------GKGGGIGLKEALFILG-KTESVKRIEDFLKN-------------------------MTNIITRFAPSPTGFLHIGSARTALFNYLFARHNNGKFFLRIEDTDK--KRSTKEAVEAIFSGLKWLGLNWDG---------EVIFQSKRNSLYKEAALKLLKEGKAYYCFTRQEEIAKQRQQALKDK--QHFIFNSEWRDKGPSTYPADIK------------PVIRLKVPREGSITIHDTLQGEIVIENSHID------DMILIRTD---------------GTATYMLAVIVDDHDMGITHIIRGDDHLTNAARQIAIYHAFGYE-----VPNMTHIPLIHGAD-GTKLSKRHGA--------------ricket_pro : ricket_pro ricket_pro :: ---LGVEAYKDM-GYLPESLCNYLLRLGWSH------GDDEIISMNQAIEWFNLASLGKSPSKLDFAKMNSINSHYLRM-LDNDSLTSKTVEILKQNYKISEKEV----SYIKQAMPSLIVRSETLRDLAQ-LAYIYLVDSPMIYSQ----------DAKEVINNCDKDLIKQVI--------ENLSKLEQFN-----------KECVQNKFKEIAIY----NGLKLNDIMKPVRALIT-------------------------------GMTASPSVFEIAETLG-KENILKRLKIIYYNNLNF-----------------LGVEAYKDM-GYLPESLCNYLLRLGWSH------GDDEIISMNQAIEWFNLASLGKSPSKLDFAKMNSINSHYLRM-LDNDSLTSKTVEILKQNYKISEKEV----SYIKQAMPSLIVRSETLRDLAQ-LAYIYLVDSPMIYSQ----------DAKEVINNCDKDLIKQVI--------ENLSKLEQFN-----------KECVQNKFKEIAIY----NGLKLNDIMKPVRALIT-------------------------------GMTASPSVFEIAETLG-KENILKRLKIIYYNNLNF----------------MPAASDKPVVTRFAPSPTGYLHIGGGRTALFNWLYARGRKGTFLLRIEDTDR--ERSTPEATDAILRGLTWLGLDWDG---------EVVSQFARKDRHAEVAREMLERGAAYKCFSTQEEIEAFRESARAEG--RSTLFRSPWRDADPTSHPDA-------------PFVIRMKAPRSGETVIEDEVQGTVRFQNETLD------DMVVLRSD---------------GTPTYMLAVVVDDHDMGVTHVIRGDDHLNNAARQTMVYEAMGWE-----VPVWAHIPLIHGPD-GKKLSKRHGA--------------rhodo_spha ::: 
---LGVEEYQAM-GYPAAGMRNYLARLGWSH------GDDEFFTSEQAMDWFDLGGIGRSPARLDFKKLESVCGQHIAV-MEDAELMREIAAYLAAARKPALTDLQA--ARLEKGLYALKDRAKTFPELLE-KARFALESRPIVADD----------AAAKALDPVSRGILRELT--------P-MLQAASWS-----------KQDLEAILTAFASE----KGMGFGKLAAPLRTALA-------------------------------GRTVTPSVYDMMLVIG-RDETIARLEDAAAA------------------rhodo_spha rhodo_spha ---LGVEEYQAM-GYPAAGMRNYLARLGWSH------GDDEFFTSEQAMDWFDLGGIGRSPARLDFKKLESVCGQHIAV-MEDAELMREIAAYLAAARKPALTDLQA--ARLEKGLYALKDRAKTFPELLE-KARFALESRPIVADD----------AAAKALDPVSRGILRELT--------P-MLQAASWS-----------KQDLEAILTAFASE----KGMGFGKLAAPLRTALA-------------------------------GRTVTPSVYDMMLVIG-RDETIARLEDAAAA-------------------------MTKVITRFAPSPTGMLHVGNIRVALLNWLYAKKHNGKFILRFDDTDL--ERSKQKYKNDIERDLKFLNINWDQ----------TFNQLSRVSRYHEIKNLLINKKRLYACYETKEELELKRKLQLSKG--LPPIYDRASLNLTEKQIQKYIEQ--------GRKPHYRFFLSYE-PISWFDMIKGEIKYDGKTLS------DPIVIRAD---------------GSMTYMLCSVIDDIDYDITHIIRGEDHVSNTAIQIQMFEALNKI-----PPVFAHLSLIINKE--EKISKRVGG--------------ricket_pro ::: ---FEIAYLKKEVGLEAMTIASFFSLLGSSLH-----IF-PYKSIEKLVAQFEISSFSKSPTIYQQYDLERLNHKLLIS-LDFNEVKERLKEIDAD-------------YIDENFWLSVR---PNLQKLSD-IKDWWDICYQTPKIKNLN-------LDKEYLKQASKLLP-LKI--------TKDSWSIWT-------------KEITNIT-----------GRKGKELFLPLRLALT-------------------------------GRESGPEIAGILPLID-REEIIRRLISIA--------------------ricket_pro ricket_pro ---FEIAYLKKEVGLEAMTIASFFSLLGSSLH-----IF-PYKSIEKLVAQFEISSFSKSPTIYQQYDLERLNHKLLIS-LDFNEVKERLKEIDAD-------------YIDENFWLSVR---PNLQKLSD-IKDWWDICYQTPKIKNLN-------LDKEYLKQASKLLP-LKI--------TKDSWSIWT-------------KEITNIT-----------GRKGKELFLPLRLALT-------------------------------GRESGPEIAGILPLID-REEIIRRLISIA----------------------------MSVAVPFAPSPTGLLHVGNVRLALVNWLFARKAGGNFLVRLDDTDE--ERSKPEYAEGIERDLTWLGLTWDR----------FARESDRYGATDEVAAALKASGRLYPCYETPEELNLKRASLSSQG--RPPIYDRAALRLGDADRARLEAE--------GRKPHWRFKLEHT-PVEWTDLVRGPVHFEGSALS------DPVLIAED---------------GRPLYTLTSVVDDADLAITHVIRGEDHLANTAVQIQIFEAVGGA-----VPVFAHLPLLTDAT-GQGLSKRLGS--------------sye_azobr : sye_azobr 
[Figure: plot for 1gln; axis ticks 1.0 and 0.5; panels "Window length = 8" and "Window length = 40"]

Error detection and correction
RASCAL (Thompson et al, 2003), Refiner (Chakrabarti et al, 2006)
RASCAL:
• Define sequence groups with the Secator program (Wicker et al, 2001)
• Define core blocks: regions with average NorMD_sw above a specified threshold
• Calculate a Gribskov profile for each block in each group

Error detection and correction
RASCAL, errors within core blocks
[Figure: metalloprotease alignment, motif HExxH]

Error detection and correction
RASCAL, errors between core blocks
[Figure: methyltransferase alignment, motif DxxxG[AST]GxF[ILV]]

Homology detection methods
Sequence percent identity:
• >30% identity: sequences are homologous
• 15-30% identity: 'twilight zone'
Local analysis of positional conservation:
• AL2CO (Pei, Grishin, 2001), SEGID (Wang, Zu, 2003), NorMD
Conserved regions:
• LEON (Thompson et al, 2004), MCOFFEE (Moretti et al, 2007)

Homology analysis with LEON
• vertical analysis: sequence clustering, intermediate sequences
• horizontal analysis: residue conservation, motif context information
• composition analysis: prediction of compositionally biased segments
Homologous regions are delineated
Removal of sequences non-homologous to query
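
A minimal Python sketch of the percent-identity rule of thumb from the "Homology detection methods" slide above. It assumes identity is counted over alignment columns where both sequences carry a residue (other denominators are in common use) and that '-' and '.' mark gaps. The function names, the example sequences and the label used for values below 15% are illustrative assumptions, not part of the slides.

```python
def percent_identity(a: str, b: str, gap_chars: str = "-.") -> float:
    """Percent identity between two rows of an alignment.

    Assumption: only columns where both rows have a residue are compared.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gap_chars or y in gap_chars:
            continue  # skip columns containing a gap in either row
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def classify(pid: float) -> str:
    """Apply the thresholds quoted on the slide (>30%, 15-30%)."""
    if pid > 30.0:
        return "homologous"
    if pid >= 15.0:
        return "twilight zone"
    # The slide gives no label below 15%; this name is a placeholder.
    return "below twilight zone"


if __name__ == "__main__":
    # Hypothetical aligned pair, for illustration only.
    row1 = "ACDE-FG"
    row2 = "ACDQKFG"
    pid = percent_identity(row1, row2)
    print(f"{pid:.1f}% -> {classify(pid)}")  # prints "83.3% -> homologous"
```

Pairs that fall in the twilight zone are the ones for which the slides turn to conservation-based methods such as LEON rather than a single identity cutoff.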
BlastP results (query sequence: DKK1_HUMAN)

Hit          Description                               E-value
DKK1_HUMAN   Dickkopf related protein-1 precursor      1e-151
DKK3_MOUSE   Dickkopf related protein-3 precursor      8e-07
TXCA_CAEEX   Neurotoxic peptide caeron precursor       0.007
PRK1_RAT     Prokineticin 1 precursor                  0.021
VPRA_DENPO   Intestinal toxin 1 _MIT                   0.10
Q8BKK7       MEGF11 protein                            0.10
COL_RABIT    Colipase precursor                        0.13
PRK2_HUMAN   Prokineticin 2 precursor                  0.17
Q7XZ34       Growth factor (Fragment)                  0.17
1imt_        VENOM. MAMBA INTESTINAL TOXIN 1,          0.23
Q863H5       Bv8/prokineticin 2-like protein           0.30
VE6_RHPV1    E6 protein                                1.1
COL_CANFA    Colipase precursor                        3.3
Q9Y7V5       Conidiospore surface protein              3.3
COLA_HORSE   Procolipase A precursor (Fragment)        4.3
O00508       Latent TGF-beta binding protein-4         5.6
1pco_        LIPASE PROTEIN COFACTOR                   7.3
Q8SRF4       GTP binding protein                       7.3
NTC1_MOUSE   Neurogenic locus notch homolog            9.6

[Asterisks on the slide mark a subset of these hits; the marked rows are not recoverable from the text.]

Homology analysis with LEON
[Figure: homologous regions for dkk1, dkk2, dkk3, Prokineticin/Intestinal toxin and Lipase protein cofactor. Pfam: Dickkopf N-terminal domain, Colipase, Colipase C-terminal domain]

Structural proteomics: target characterisation
Detection of structural homologs for targets in the SPINE (Structural Proteomics in Europe) project
For a training set of 510 potential targets:

Method                    No. of targets with at least 1 PDB neighbour
BlastP (E<10^-7)          142 (28%)
BlastP (E<10^-4)          166 (33%)
PipeAlign (BlastP E<10)   196 (38%)
PipeAlign (PDB-Blast)     223 (44%)

Conserved residue analysis
Active site residues are under evolutionary pressure to maintain their functional integrity and undergo fewer mutations than less functionally important amino acids.
Methods:
• Evolutionary trace (Lichtarge et al, 1996): sequence conservation patterns in homologous proteins are mapped onto the protein surface to generate clusters identifying functional interfaces

Conserved residue analysis
Comparison of sequence-based methods
FRcons (Fischer et al, Bioinformatics 2008) combines information on:
• conservation at each site
• amino acid distribution
• predicted secondary structure (ss)
• predicted relative solvent accessibility (rsa)

OrdAli: Ordered Alignment Analysis
Color scheme:
• residues conserved in all sequences in family: structural or functional importance, characteristic motifs
• residues conserved within a sub-group of sequences: discriminant residues

Schematic alignment of aspartyl-tRNA synthetases
• universal proteins, play a key role in translation
[Figure: schematic alignment with rows Euc, Arc and Bac, position scale 180 to 930. Labelled regions: Anticodon binding domain, Motif I, Flipping loop, Motif II, Catalytic core I, Insertion domain, Motif III, Catalytic core II. Legend: Family conserved, Archaea+Bacteria, Archaea+Eukaryote]

PipeAlign: automatic protein analysis
[Flowchart labels: Query Sequence, BlastP search, Ballast, Anchors, LMS (local maximum segments), DbClustal, Alignment, RASCALED MACS (Multiple Alignment of Complete Sequences), Homologous regions]
Plewniak et al. (2000) Bioinformatics.
Thompson et al. (2000) Nucl Acids Res.
Thompson et al. (2003) Bioinformatics.
Thompson et al. (2004) Nucl Acids Res.
Thompson et al. (2001) J Mol Biol.
• Secator/DPC : automatic clustering algorithms. Wicker et al. (2001) Mol Biol Evol. Wicker et al. (2002) Nucl Acids Res.
• Phylogeny
• Conserved residues/domains
• 2D/3D structure prediction
• Cellular location prediction
• …
http://www-igbmc.u-strasbg.fr/PipeAlign/

Multiple sequence alignment editors
No automatic method is 100% reliable: manual verification and refinement are essential!
• SeqLab (GCG Wisconsin Package)
• SeaView (Galtier et al, 1996) http://pbil.univ-lyon1.fr/software/seaview.html (UNIX/Linux, Windows 95+, Mac OS 8, 9, X)
WEB servers :
• GeneAlign (Kurukawa) http://www.gen-info.osaka-u.ac.jp/geneweb2/genealign/
• Jalview (Clamp, 1998) http://www.ebi.ac.uk/~michele/jalview/
• CINEMA (Lord et al, 2002) http://www.bioinf.man.ac.uk/dbbrowser/cinema-mx

Multiple Sequence Alignment
Introduction: what is a multiple alignment?
Multiple alignment construction
Multiple alignment analysis
Traditional approaches: optimal, progressive
Alignment parameters
Iterative and co-operative approaches
Conserved/homologous regions
Quality analysis/error detection
Multiple alignment applications

Central role of multiple alignments
Figure: alignment of euk, bac and arc sequences annotated with domain structure and conserved, functional sites

Central role of multiple alignments
The multiple alignment is central to:
• Comparative genomics
• Phylogenetic studies
• Hierarchical function annotation: homologs, domains, motifs
• Gene identification, validation
• Structure comparison, modelling
• Interaction networks
• RNA sequence, structure, function
• Human genetics, SNPs
• Therapeutics, drug design
Figure: therapeutics, drug discovery example showing the insertion domain, DBD and LBD, with binding sites / mutations

Example: protein, RNA complexes : ASP tRNA synthetase
aspRS, tRNA interactions :
• AspRS in complex with tRNA(Asp) (Cavarelli et al, 1993)
• aspartate determinants are conserved in prokaryotes and eukaryotes (Becker et al, 1996)
Figure: global alignment of euk, arc and bac sequences showing the eukaryotic extension, anticodon-binding domain, hinge region and catalytic domain; cloverleaf representation of the tRNA showing the amino acid acceptor stem and the anticodon loop and stem, with sequence labels U A GG A U GUC and GGUUC.A.UC
Westhof et al, 1988; Ruff et al, 1991

Example: Bardet Biedl Syndrome
Identification of new genes responsible for BBS : a rare genetic disease, autosomal recessive, probably caused by a defect at the basal body of ciliated cells
Phenotypes : obesity, retinopathy, polydactyly, mental retardation, hypogonadism, renal failure
9 genes are known to be involved : BBS1-BBS9
In a comparative genomics study, Li et al (2004) identified 688 genes implicated in cilia and flagella
Clinical studies have identified a candidate chromosomal region of 8 Mb with approx. 23 genes
• including 4 genes from the set of 688
Multiple alignment based analysis identified a new gene (BBS10) with a chaperonin-like fold
Figure: global alignment of BBS10, BBS6 and chaperonin showing insertion 1, insertion 2, insertion 3 and a deletion
BBS10 shows a high frequency of mutation (~20% of patients)
J. Muller et al 2006
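
The candidate-gene filtering step in the Bardet Biedl example (genes in the candidate chromosomal region that also belong to the cilia and flagella gene set) amounts to a set intersection. The sketch below only illustrates that idea: the function name and all gene identifiers are hypothetical placeholders, not data from the cited studies.

```python
def shortlist_candidates(region_genes, reference_set):
    """Return the genes of a candidate region that also occur in a reference set.

    The order of the region genes is preserved.
    """
    reference = set(reference_set)
    return [gene for gene in region_genes if gene in reference]


if __name__ == "__main__":
    # Hypothetical placeholders standing in for the genes of a candidate region
    # and for a comparative-genomics gene set.
    region = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
    cilia_set = {"GENE_B", "GENE_E", "GENE_X"}
    print(shortlist_candidates(region, cilia_set))
```

The shortlisted genes would then be the ones examined further, for example by multiple alignment based analysis as on the slide.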