Genetic semihomology algorithm
A new approach to the comparative analysis of protein sequences

It is accepted that, from an evolutionary point of view, the genetic code and the amino acid 'language' have evolved simultaneously. They also act in strict coherence with each other. Therefore, in the analysis of protein differentiation and variability, both levels should be considered simultaneously.

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

The tools currently used for comparative sequence analysis apply the Markovian model of amino acid replacement and are based on stochastic matrices of the observed amino acid substitution frequencies.

BLOSUM62 matrix of amino acid replacements

    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
A   4  -1  -2  -2   0  -1  -1   0  -2  -1  -1  -1  -1  -2  -1   1   0  -3  -2   0
R       5   0  -2  -3   1   0  -2   0  -3  -2   2  -1  -3  -2  -1  -1  -3  -2  -3
N           6   1  -3   0   0   0   1  -3  -3   0  -2  -3  -2   1   0  -4  -2  -3
D               6  -3   0   2  -1  -1  -3  -4  -1  -3  -3  -1   0  -1  -4  -3  -3
C                   9  -3  -4  -3  -3  -1  -1  -3  -1  -2  -3  -1  -1  -2  -2  -1
Q                       5   2  -2   0  -3  -2   1   0  -3  -1   0  -1  -2  -1  -2
E                           5  -2   0  -3  -3   1  -2  -3  -1   0  -1  -3  -2  -2
G                               6  -2  -4  -4  -2  -3  -3  -2   0  -2  -2  -3  -3
H                                   8  -3  -3  -1  -2  -1  -2  -1  -2  -2   2  -3
I                                       4   2  -3   1   0  -3  -2  -1  -3  -1   3
L                                           4  -2   2   0  -3  -2  -1  -2  -1   1
K                                               5  -1  -3  -1   0  -1  -3  -2  -2
M                                                   5   0  -2  -1  -1  -1  -1   1
F                                                       6  -4  -2  -2   1   3  -1
P                                                           7  -1  -1  -4  -3  -2
S                                                               4   1  -3  -2  -2
T                                                                   5  -2  -2   0
W                                                                      11   2  -3
Y                                                                           7  -1
V                                                                               4

BLAST protein search - output
BLASTP 2.2.2 [Dec-14-2001]

Genetic semihomology algorithm

The new algorithm was developed to overcome the basic disadvantages of the existing protein sequence analysis tools and to eliminate some basic errors in the assumptions of the existing statistical methods. It is meant to explain the mechanism and pathway of protein evolution and differentiation, rather than only describe the initial and final steps of the observed changes.

Genetic semihomology algorithm

Another goal is to make the algorithm applicable to any group of proteins, regardless of their nature, function and location. This can be achieved for two reasons:
1) the basic assumptions are kept to a minimum: the general amino acid/codon translation table, and single point mutation as the principal, most common mechanism of protein variability;
2) the approach is non-statistical (no stochastic matrices).

Genetic semihomology algorithm

The algorithm of genetic semihomology assumes a close relation between the compared amino acids and their codons in related proteins. The algorithm is based on the network of genetic relationships between amino acids. As a consequence, identical residues at different positions of a sequence are not equal with respect to their changeability.

Genetic semihomology algorithm

The algorithm assumes that the basic mechanism of differentiation among related proteins is the single point mutation.
The general part of the algorithm is a three-dimensional diagram reflecting the network of genetic relationships between amino acids.

Diagram of amino acid genetic relationships
Diagram of codon genetic relationships

[Figure: three-dimensional network in which every node is labelled with an amino acid and its codon (for example K AAA, E GAA, M AUG, W UGG); the stop codons UAA, UAG and UGA are marked with "–". The labels 1, 2, 3 and the nucleotide order A, G, C, U appear on the diagram.]

Semihomology algorithm
Input requirements

The minimum data required to start an analysis with the algorithm of genetic semihomology are the protein sequences (at least two). The more sequences are used for the analysis and for multiple alignment construction, the more concise and accurate the results are. Although the nucleotide sequences of the genes are not strictly required, it is very helpful if they are known for at least some of the analyzed proteins. This significantly increases the amount of information accessible from the analysis. The results are also best for sequences showing a sufficiently high degree of diversity.
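The network of genetic relationships described above can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the author's implementation: it uses the standard genetic code in DNA notation (T instead of U) and treats two amino acids as semihomologous when some pair of their codons differs by exactly one nucleotide. The names `codons_of` and `relation`, and the rule of reporting a transition when both a transition and a transversion path exist, are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

PURINES = {"A", "G"}


def codons_of(aa):
    """Return all codons that encode the one-letter amino acid `aa`."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def relation(aa1, aa2):
    """Classify the genetic relationship between two amino acids.

    Returns "identical", "transition", "transversion" or "none".
    Assumption: if both kinds of single point mutation are possible,
    "transition" is reported.
    """
    if aa1 == aa2:
        return "identical"
    kinds = set()
    for c1 in codons_of(aa1):
        for c2 in codons_of(aa2):
            diffs = [(x, y) for x, y in zip(c1, c2) if x != y]
            if len(diffs) == 1:
                x, y = diffs[0]
                # purine<->purine or pyrimidine<->pyrimidine is a transition
                same_class = (x in PURINES) == (y in PURINES)
                kinds.add("transition" if same_class else "transversion")
    if "transition" in kinds:
        return "transition"
    if "transversion" in kinds:
        return "transversion"
    return "none"


if __name__ == "__main__":
    for pair in [("V", "I"), ("H", "D"), ("W", "M"), ("C", "C")]:
        print(pair, relation(*pair))
```

In this sketch Val/Ile are related by a transition (GTT/ATT), His/Asp by a transversion (CAT/GAT), and Trp/Met are not related by any single point mutation.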
Comparison of the fragments of the 1st and 2nd domain of chicken ovomucoid using the unitary matrix, the genetic code matrix (GCM), PAM250 and the algorithm of genetic semihomology

Position                 1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
Domain 1 codons        GTT AAT TGC AGC CTG TAT GCC AGC GGC ATC GGC AAG GAT GGG ACG AGT TGG GTA GCC
Domain 1 residues        V   N   C   S   L   Y   A   S   G   I   G   K   D   G   T   S   W   V   A
Domain 2 codons        ATT GAT TGC TCT CCG TAC CTC CAA   - GTT GTA AGA GAT GGT AAC ACC ATG GTA GCC
Domain 2 residues        I   D   C   S   P   Y   L   Q   -   V   V   R   D   G   N   T   M   V   A
Unitary matrix           0   0   1   1   0   1   0   0   0   0   0   0   1   1   0   0   0   1   1
Genetic code matrix      2   2   3   0   2   2   1   0   0   1   1   1   3   2   1   1   1   3   3
PAM250                   1   1   2   2   0   2   0   0   0   1   0   1   2   2   1   1   0   2   2
Genetic semihomology     2   2   3   3   2   3   0   0   0   2   1   2   3   3   1   1   0   3   3

Angle brackets on the slide mark the fragment <L Q V V R> in the unitary matrix comparison and the residue <Q> (codon <CAA>) in the other three comparisons.

Method                  Score    %
Unitary matrix          7/19     36.8
Genetic code matrix     29/57    50.9
PAM250                  42/97    43.3
                        42/89    47.2
                        20/38    52.6
Genetic semihomology    34/57    59.6

Semihomology algorithm
Advantages of the approach

The results obtained with this method are more comprehensive than those of the currently used methods and reflect the actual mechanism of protein differentiation and evolution.
They concern:
1) location of homologous and semihomologous sites in the compared proteins,
2) precise estimation of gap location in non-identical fragments of different length,
3) analysis of internal homology and semihomology,
4) precise location of domains in multidomain proteins,
5) estimation of the genetic code of non-homologous fragments,
6) construction of genetic probes,
7) studies on differentiation processes among related proteins,
8) estimation of the degree of relationship among related proteins,
9) studies on the evolution mechanism within homologous protein families,
10) confirmation of the actual relationship between sequences showing a low degree of identity/similarity.

Semihomology algorithm
Advantages of the approach

Application of the semihomology approach has led to the discovery and description of some important mechanisms affecting protein evolution and differentiation. The most important are:
1) the mechanism and role of cryptic mutations at unusually variable positions (Leluk, 2000b-c),
2) the phenomenon of very long distance (dispersed) mutational correlation within sets of variable positions (Leluk, 2000a; Leluk and Grabiec, 2001; Leluk et al., 2001b; Leluk et al., 2002).

Semihomology algorithm
Limitations of the approach

The semihomology approach is of limited use when:
- too few sequences are taken for analysis,
- the degree of identity among the sequences is too high (too low diversity),
- long fragments of the compared sequences show no identity at all (e.g.
N-terminal signal fragments of homologous proteins)

The analysis of genetic semihomology rules out the applicability of the Markov model to studies of protein variability at the amino acid level. The amino acid codons do contain information about the "ancestral" amino acids whose codons were the starting point for the codon of the current residue. This applies mainly to positions undergoing single point mutations, the most basic mechanism of evolutionary variability.

Software based on genetic semihomology algorithm
GEISHA
- Protein sequence similarity and homology analysis
- Multiple alignment construction on the basis of genetic relationships between amino acids
- Analysis of variability within homologous protein families
- Molecular phylogenetic studies

Software based on genetic semihomology algorithm
FQS Semihomology tool
http://www.fqspl.com.pl/sh/

The semihomologous correlation between amino acids occurring at selected non-homologous positions of inhibitors from squash seeds

[Figure: networks of the residues found at position 7 (L, Y, W), position 14 (R, S, D, A), position 19 (D, E, G, Q, K) and position 25 (N, H, E, Q, D, S).]

The solid lines indicate the transition type of semihomology; the dashed lines refer to the transversion type of semihomology.

Application of the genetic semihomology algorithm to identifying fragments with a possible mechanism other than single point mutation

Positions 18 to 34:
18. VILE      19. NDH       20. C         21. [STR][D]  22. LPKQE
23. YF        24. ALPKQ     25. SQTK      26. GTRS–     27. IVKNT
28. GVSTL     29. KRTQ–     30. DGN–      31. G–        32. TNRKE–
33. STLAQP    34. WMLIV–

Positions 1 to 17:
1. VTI        2. [A][R]–    3. C          4. PT         5. [RM][F]
6. [NI][E]    7. [L][Y]     8. KSLQDV     9. [P][E]     10. [V][H]
11. C         12. GA        13. TS        14. DN        15. GS
16. SFV       17. T

Positions 1 to 17 (second block on the slide):
1. Y          2. SDA        3. NS         4. [ED][R]    5. C
6. GSTF       7. ILF        8. C          9. [L][A][N]  10. [YH][A]
11. NY        12. RAILV     13. EQ        14. HQLS      15. GHRN
16. ATR       17. [NHST][E]

Positions 35 to 49:
35. VIL       36. ESKAGN    37. [K][L]    38. ELKSRV    39. [YHS][K]
40. [DN][M]   41. GA        42. EKRA      43. C         44. RKE
45. PLQE      46. KERD      47. [ISV][H]  48. [VG][PT]  49. [MEK][PS]

Dot matrix pairwise alignment
Noise reduction

[Figure: dot matrix plots.]

Dot matrix pairwise alignment
Internal homology (gene multiplication)

[Figure: dot matrix plots obtained with BLAST 2 SEQUENCES and with SEMIHOM for the chicken ovoinhibitor precursor (7 domains) and the chicken ovomucoid precursor (3 domains).]

Dot matrix comparison of selected homologous Kazal inhibitors

[Figure: dot matrix plots for the pairs ovoinhibitor-ovoinhibitor, ovoinhibitor-ovomucoid and ovoinhibitor-PSTI, obtained with BLASTP 2.0.9 (BLOSUM62) and with SEMIHOM (algorithm of genetic semihomology).]

Consecutive steps of the dot matrix results obtained from the comparison of chicken ovoinhibitor with itself by the program SEMIHOM

[Figure: four panels: raw dot matrix; adding semihomology; noise filtering; marking of whole fragments.]

Precise gap location in non-identical fragments of different length by using the genetic semihomology algorithm. The compared proteins are trypsin inhibitors from squash seeds: CPGTI-I (horizontal) and CSTI-IIb (vertical).
Identities only (two alternative gap positions, each marked "?" on the slide):
?  RVCPKILMECKKDSDCLAECICLEH-GYCG
   MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS
?  RVCPKILMECKKDSDCLAECICLE-HGYCG
   MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS

Identities and semihomology:
   RVCPKILMECKKDSDCLAECICLEH-GYCG
   MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS

CPGTI-I has a non-homologous His25, while CSTI-IIb has two non-homologous residues, Asp25 and Ile26, at the corresponding site. The analysis of semihomology shows a genetic relationship between His25 and Asp25; therefore the gap in CPGTI-I should be located opposite Ile26 of CSTI-IIb. Window size = 10; minimum number of homologous positions in window = 4.

Multiple alignment
Proteinase inhibitors from squash seeds

Acc. number   Sequence
sp|P01074        RVCPRILMECKKDSDCLAECVCLEH-GYCG
sp|P07853        RVCPRILMKCKKDSDCLAECVCLEH-GYCG
sp|P07853     HEERVCPRILMKCKKDSDCLAECVCLEH-GYCG
sp|P10293        RVCPKILMECKKDSDCLAECICLEH-GYCG
sp|P10293        RVCPKILMECKKDSDCLAECICLEH-GYCG
sp|P10293     HEERVCPKILMECKKDSDCLAECICLEH-GYCG
sp|P10291        MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS
sp|P10292        MMCPRILMKCKHDSDCLPGCVCLEHIEYCG
sp|P11969       GRRCPRIYMECKRDADCLADCVCLQH-GICG
sp|P11968        RGCPRILMRCKRDSDCLAGCVCQKN-GYCG
sp|P17680        GICPRILMECKRDSDCLAQCVCKRQ-GYCG
sp|P12071         GCPRILMRCKQDSDCLAGCVCGPN-GFCG
sp|P10294      <ERRCPRILKQCKRDSDCPGECICMAH-GFCG

Multiple alignment of seven chicken ovoinhibitor domains obtained with Markovian and non-Markovian methods

[Figure: multiple alignments.]

Comparison of multiple alignment consensus results for three inhibitor families achieved with ClustalW, Multalin (both applying BLOSUM62) and the algorithm of genetic semihomology

Consensus alignment results for Bowman-Birk inhibitors (positions 1 to 60)
ClustalW  .. ...**: * ** * ** * * *: ::*** *. * *: * * :* * * .*** *:
Multalin  .d....aCC#.C.CTkS.PPqCrC.Dir.#tCHSaCksCiCtrS.PpqCrC.Dtt.FCYk.C.
GS AA     S.s.SSS**S.*S**s*S**.*S*S*SS*.S***S*.S*s*Ss*s*SS*s*s*SSS***ss*S
GS NA     sss.SSSS*S.*.**.*SS*sSs*s*SS*sSSS*SSssSsSSsSsSSSS.SsSSsS*SSs.Ss

Consensus alignment results for squash inhibitors (positions 1 to 30)
ClustalW  ** * * : ** * * **
Multalin  ....r.CPrIlm.Ck.DsDCla.C.Cl...G.CG..
GS AA     SS**S*sSS*SsSs**SSS*S*SSS.SS**

Consensus alignment results for ovoinhibitor domains (Kazal type) (positions 1 to 60)
ClustalW  .* : *. *.::. ** . * :* : : . *
Multalin  .dcs.y......dg...vaCp.il.pvCgt#gvTYsneC..Cahn.#..t...k..dg.C......
GS AA     SS*sSSSSSSSSS*SSSS.*Sss..ss*sSSSS**sSs*sS*.sSSSsSSsSS.sssSS*SSSsss
GS NA     sSSs.S.S....sSSssss*Sss..ssssSSSSSSsSsS.sS.sS.S...sSSs..sSSSssss.s

The dot matrix comparison of human erythrocyte α-spectrin with itself
For the best visualization of the repeats, the identity threshold is set to 15 and the frame size to 75.

[Figure: dot matrix; both axes: α-spectrin, residues 500 to 2000.]

The multiple alignment of proteins homologous to the 10th segment of human erythrocyte α-spectrin

[Figure: multiple alignment.]

The multiple alignment of the 23 segments of human erythrocyte α-spectrin achieved with the MultAlin (BLOSUM62) method and the genetic semihomology approach

[Figure: multiple alignments.]

The dot matrix comparison of human erythrocyte α-spectrin with the consensus of 106-residue repeats

[Figure: dot matrix; axes: α-spectrin (residues 500 to 2000) and α-spectrin repeats consensus (Sahr et al.) (residues 20 to 100).]

The consensus described by Sahr et al. [1990]. The identity threshold and frame size are set to 8 and 40, respectively.
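The dot matrix comparisons in this part of the presentation are controlled by two parameters: a frame (window) size and a minimum number of matching positions within the frame. The sketch below shows one plausible reading of that filtering step. The function name `dot_matrix` and the `match` parameter are hypothetical, and the exact filtering used by SEMIHOM is not described on the slides, so this is an assumption rather than the program's actual procedure. By default only identities count as matches; a semihomology-aware `match` function could be passed instead.

```python
def dot_matrix(seq1, seq2, frame, threshold, match=lambda a, b: a == b):
    """Return the set of (i, j) frame start positions that pass the filter.

    A frame starting at seq1[i] and seq2[j] is kept when at least
    `threshold` of its `frame` aligned positions satisfy `match`.
    """
    hits = set()
    for i in range(len(seq1) - frame + 1):
        for j in range(len(seq2) - frame + 1):
            score = sum(match(seq1[i + k], seq2[j + k]) for k in range(frame))
            if score >= threshold:
                hits.add((i, j))
    return hits


if __name__ == "__main__":
    # Squash inhibitor sequences quoted in the slides (gap removed).
    cpgti_1 = "RVCPKILMECKKDSDCLAECICLEHGYCG"
    csti_2b = "MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS"
    # Window size 10 and at least 4 matching positions, as quoted for the
    # gap location example.
    hits = dot_matrix(cpgti_1, csti_2b, frame=10, threshold=4)
    print((0, 0) in hits)
```

Raising `threshold` relative to `frame` removes more background noise at the cost of missing weaker similarities, which is the trade-off the quoted parameter pairs (for example 15 of 75, or 8 of 40) appear to tune.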
[Figure: dot matrix; axes: α-spectrin (residues 500 to 2000) and α-spectrin repeats consensus (genetic semihomology) (residues 20 to 100).]

The consensus achieved with the genetic semihomology approach [Leluk, 1998]. The identity threshold and frame size are set to 8 and 40, respectively. The identity and genetic semihomology of the compared residues are visualized.

The use of α-spectrin consensus sequence achieved with different algorithms for sequence similarities search (BLAST)

Query sequence: α-spectrin segment consensus sequence obtained with the MultAlin program (BLOSUM62)

Score (bits)   E-value   Sequences producing significant alignments
          17     21352   sp|P02549|SPCA_HUMAN SPECTRIN ALPHA CHAIN, ERYTHROCYTE
          16     27970   sp|Q01082|SPCO_HUMAN SPECTRIN BETA CHAIN, BRAIN (SPECTRIN,...
          16     27970   sp|P08032|SPCA_MOUSE SPECTRIN ALPHA CHAIN, ERYTHROCYTE
          16     27970   sp|P07751|SPCN_CHICK SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,...
          16     27970   sp|Q13813|SPCN_HUMAN SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN...
          16     27970   sp|Q62261|SPCO_MOUSE SPECTRIN BETA CHAIN, BRAIN (SPECTRIN, ...
          16     36639   sp|P16086|SPCN_RAT SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN, N...
          16     47996   sp|Q00963|SPCB_DROME SPECTRIN BETA CHAIN
          15     62874   sp|P13395|SPCA_DROME SPECTRIN ALPHA CHAIN
          14    141334   sp|P16546|SPCN_MOUSE SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,...
          14    185142   sp|P15508|SPCB_MOUSE SPECTRIN BETA CHAIN, ERYTHROCYTE
          14    185142   sp|P11277|SPCB_HUMAN SPECTRIN BETA CHAIN, ERYTHROCYTE
          11    935541   sp|P39254|Y04O_BPT4 HYPOTHETICAL 36.3 KD PROTEIN IN NRDC-M...

The use of α-spectrin consensus sequence achieved with different algorithms for sequence similarities search (BLAST)

Query sequence: α-spectrin segment consensus sequence attained by Sahr et al.
[1990]

Score (bits)   E-value   Sequences producing significant alignments
          43     2e-04   sp|P13395|SPCA_DROME SPECTRIN ALPHA CHAIN
          40     0.001   sp|Q13813|SPCN_HUMAN SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN...
          39     0.003   sp|P07751|SPCN_CHICK SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,...
          39     0.004   sp|P16546|SPCN_MOUSE SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,...
          37     0.012   sp|P16086|SPCN_RAT SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN, N...
          37     0.016   sp|P02549|SPCA_HUMAN SPECTRIN ALPHA CHAIN, ERYTHROCYTE
          36     0.021   sp|P08032|SPCA_MOUSE SPECTRIN ALPHA CHAIN, ERYTHROCYTE
          35     0.048   sp|P15508|SPCB_MOUSE SPECTRIN BETA CHAIN, ERYTHROCYTE
          35     0.063   sp|Q00963|SPCB_DROME SPECTRIN BETA CHAIN
          34     0.082   sp|Q01082|SPCO_HUMAN SPECTRIN BETA CHAIN, BRAIN (SPECTRIN,...
          34      0.11   sp|Q62261|SPCO_MOUSE SPECTRIN BETA CHAIN, BRAIN (SPECTRIN, ...
          32      0.32   sp|P11277|SPCB_HUMAN SPECTRIN BETA CHAIN, ERYTHROCYTE
          21       797   sp|P05095|AACT_DICDI ALPHA-ACTININ 3, NON MUSCULAR (F-ACTIN...
          20      1791   sp|P34367|YLJ2_CAEEL HYPOTHETICAL 256.3 KD PROTEIN C50C3.2 ...
          19      3073   sp|Q03001|BPA1_HUMAN BULLOUS PEMPHIGOID ANTIGEN 1 (BPA) (H...
          19      4026   sp|P30427|PLEC_RAT PLECTIN
          19      4026   sp|P31670|GT27_FASHE GLUTATHIONE S-TRANSFERASE 26 KD 47 (GS...
          18      6908   sp|P46125|YEDI_ECOLI HYPOTHETICAL 32.2 KD PROTEIN IN DSRB-V...
          18      6908   sp|P42094|PHYT_BACSU 3-PHYTASE PRECURSOR (PHYTATE 3-PHOSPHA...
          18      9049   sp|P56288|E2BG_SCHPO PROBABLE TRANSLATION INITIATION FACTOR...
          18      9049   sp|O00273|DFFA_HUMAN DNA FRAGMENTATION FACTOR ALPHA SUBUNI...
          18      9049   sp|P12311|ADH1_BACST ALCOHOL DEHYDROGENASE (ADH-T)

The use of α-spectrin consensus sequence achieved with different algorithms for sequence similarities search (BLAST)

Query sequence: α-spectrin segment consensus sequence attained by the genetic semihomology algorithm

Score (bits)   E-value   Sequences producing significant alignments
          49     3e-06   sp|P13395|SPCA_DROME SPECTRIN ALPHA CHAIN
          47     1e-05   sp|Q13813|SPCN_HUMAN SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN...
          42     3e-04   sp|P07751|SPCN_CHICK SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,...
          41     6e-04   sp|P16086|SPCN_RAT SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN, N...
          38     0.005   sp|P02549|SPCA_HUMAN SPECTRIN ALPHA CHAIN, ERYTHROCYTE
          37     0.014   sp|Q00963|SPCB_DROME SPECTRIN BETA CHAIN
          36     0.019   sp|P08032|SPCA_MOUSE SPECTRIN ALPHA CHAIN, ERYTHROCYTE
          36     0.024   sp|P15508|SPCB_MOUSE SPECTRIN BETA CHAIN, ERYTHROCYTE
          36     0.032   sp|P16546|SPCN_MOUSE SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,...
          34     0.094   sp|Q01082|SPCO_HUMAN SPECTRIN BETA CHAIN, BRAIN (SPECTRIN,...
          34     0.094   sp|Q62261|SPCO_MOUSE SPECTRIN BETA CHAIN, BRAIN (SPECTRIN, ...
          33      0.21   sp|P11277|SPCB_HUMAN SPECTRIN BETA CHAIN, ERYTHROCYTE
          20      1195   sp|P34367|YLJ2_CAEEL HYPOTHETICAL 256.3 KD PROTEIN C50C3.2 ...
          20      1566   sp|Q99001|AACB_CHICK ALPHA-ACTININ, BRAIN ISOFORM (F-ACTIN ...
          20      1566   sp|P35609|AAC2_HUMAN ALPHA-ACTININ 2, SKELETAL MUSCLE ISOF...
          20      1566   sp|P12814|AAC1_HUMAN ALPHA-ACTININ 1, CYTOSKELETAL ISOFORM ...
          20      1566   sp|P05094|AACT_CHICK ALPHA-ACTININ, SMOOTH MUSCLE ISOFORM (...
          19      2687   sp|P30427|PLEC_RAT PLECTIN
          19      3520   sp|P20111|AACS_CHICK ALPHA-ACTININ, SKELETAL MUSCLE ISOFORM...
          18      4611   sp|P47493|SYG_MYCGE GLYCYL-TRNA SYNTHETASE (GLYCINE--TRNA L...
          18      7913   sp|Q03001|BPA1_HUMAN BULLOUS PEMPHIGOID ANTIGEN 1 (BPA) (H...

Kinase project
Multiple alignment of hexokinase domains

[Figure: multiple alignment of HEXOKINASE_I (2 domains), HEXOKINASE_II (2 domains), HEXOKINASE_III (2 domains) and HEXOKINASE_IV (1 domain).]

Kinase project
- Complex comparative studies at the primary structure level
- Construction of molecular phylogenetic trees
- Studies on the sequence/structure/function relationship
- Studies on the mechanisms of correlated mutations and variability
- Genetic principles of differentiation within this protein family

Simplified (planar) diagram of genetic relationships between amino acids

[Figure: planar diagram with sixteen nodes: IM, T, RS, KN, V, A, G, DE, L, P, R, HQ, FL, S, CW, Y.]

In the planar diagram the encoding role of the third codon position is ignored. Only the first two codon positions are taken into account.

Simplified (planar) diagram of genetic relationships between amino acids

The simplified planar diagram emphasizes the special encoding character of the six-codon amino acids: Leu, Arg and Ser.

Conclusions?
The six-codon amino acids may play the role of "mutational passages" that are not subject to selection restrictions. These amino acids may contribute to an increase in the variability range.

In fact, the six-codon amino acids occur unusually frequently at highly variable positions. This applies especially to serine and, to a lesser extent, to arginine. Leucine shows no correlation between the frequency of occurrence and the variability range.
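The special position of the six-codon amino acids can be checked directly against the codon table. The sketch below is an illustration only: it assumes the standard genetic code in DNA notation and uses hypothetical helper names (`codons_of`, `neighbours`). It lists the amino acids encoded by six codons and counts how many other amino acids each of them can reach by one single point mutation, with Met (one codon) shown for comparison.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_of(aa):
    """Return all codons that encode the one-letter amino acid `aa`."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def neighbours(aa):
    """Amino acids reachable from any codon of `aa` by one nucleotide change."""
    reachable = set()
    for codon in codons_of(aa):
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutated = codon[:pos] + base + codon[pos + 1:]
                other = CODON_TABLE[mutated]
                if other not in (aa, "*"):  # skip silent changes and stops
                    reachable.add(other)
    return reachable


six_codon = sorted(aa for aa in set(AMINO)
                   if aa != "*" and len(codons_of(aa)) == 6)

if __name__ == "__main__":
    print(six_codon)  # ['L', 'R', 'S']
    for aa in six_codon + ["M"]:
        print(aa, len(neighbours(aa)), "".join(sorted(neighbours(aa))))
```

The count of reachable amino acids is only a rough proxy for the "mutational passage" idea; it says nothing about selection, which the slides treat separately.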
Frequency of six-codon amino acids as a function of position variability in randomly selected proteins of different origin and nature
The results for 2686 residues at 606 corresponding positions

[Figure: all sequences (2686 residues at 606 positions), discrete data. Series: Ser, Arg, Leu. Horizontal axis: number of residues occurring at aligned position (1 to 8). Vertical axis: % of occurrence (0 to 100).]

Phylogenetic tree verification
The ovoinhibitor domains homology comparison. The similarity scores (%) for the aligned DNA coding sequences are shown above the diagonal. The similarity scores for the aligned amino acid sequences are shown below the diagonal.

The results obtained by Scott et al.

        I   II  III   IV    V   VI  VII
I       -   53   56   47   44   38   25
II     61    -   64   56   44   41   28
III    64   66    -   61   56   52   26
IV     54   65   66    -   50   42   25
V      54   57   59   55    -   42   26
VI     53   54   66   57   58    -   29
VII    45   45   42   39   42   40    -

[Figure: phylogenetic trees of ovoinhibitor (OI) domains I to VII, ovomucoid (OM) and PSTI.]

Scott M. J., Huckaby C. S., Kato I., Kohr W. J., Laskowski M. Jr, Tsai M.-J. and O'Malley B. W. (1987) J. Biol. Chem. 262, 5899-5907

The results obtained by semihomology analysis

        I   II  III   IV    V   VI  VII
I       -   72   75   68   67   60   52
II     52    -   81   71   66   65   55
III    57   64    -   77   76   75   53
IV     39   59   56    -   68   70   52
V      45   66   51   49    -   68   52
VI     42   65   62   45   46    -   56
VII    33   55   33   30   31   38    -

[Figure: phylogenetic trees of ovoinhibitor (OI) domains I to VII, ovomucoid (OM) and PSTI.]

Virtual reverse translation
Deducing the genetic code

QRCRRDSDC
KKCRMDSDC

Arg codons: AGA AGG CGA CGG CGC CGT
Met codons: ATG

Virtual reverse translation

[Figure: the diagram of genetic relationships between amino acids shown in successive steps, with the Arg nodes replaced by their codons (AGA, AGG, CGA, CGG, CGC, CGT) and the Met node replaced by ATG.]

Genetic relationships between Arg and Met/Gln

[Figure: the diagram of genetic relationships between amino acids with the Arg, Met and Gln nodes marked.]
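The deduction illustrated on the "Virtual reverse translation" slides can be sketched as follows. This is a minimal sketch, assuming that two residues aligned at the same position are related by a single point mutation, so that only codons one nucleotide away from some codon of the partner residue are retained. The helper names (`codons_of`, `compatible_codons`) are hypothetical and the standard genetic code in DNA notation is assumed; the procedure actually used in the author's software may differ.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_of(aa):
    """Return all codons that encode the one-letter amino acid `aa`."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def compatible_codons(aa, partner):
    """Codons of `aa` that differ by exactly one nucleotide from a codon of `partner`."""
    partner_codons = codons_of(partner)
    return sorted(
        codon for codon in codons_of(aa)
        if any(sum(x != y for x, y in zip(codon, p)) == 1 for p in partner_codons)
    )


if __name__ == "__main__":
    # Position 5 of QRCRRDSDC / KKCRMDSDC: Arg aligned with Met.
    print(compatible_codons("R", "M"))  # ['AGG']
    # Arg aligned with Gln.
    print(compatible_codons("R", "Q"))  # ['CGA', 'CGG']
```

Of the six Arg codons, only AGG lies one point mutation away from the Met codon ATG, so an Arg/Met pair narrows the choice from six codons to one under the stated assumption.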
Genetic semihomology prediction of genetic code for chicken ovoinhibitor domains

Residues at the considered positions, in the order listed for domains I to VII:
 1: V I V I L L E
 4: S S S D T S R
 5: L P K Q Q K E
 7: A L P P L K Q
 8: S Q S T S T K
12: K R K T Q K -
13: D D D G N D -
15: T N R K E R -
22: R R R R F M M
33: S F F F V V V
37: N N N N N S N
49: G H R G G R N
50: A T T T T T R
59: E E K R R R A
61: R K R K R E
62: P L Q E E E
63: K E E R E D
66: M S E P E K

Predicted codons:
ATA AGY CCR GCR ACR ACR GGY AGR AGG GTY AGY GGY AGR AGR AGR CCR AGR AAG GTA CGY CTR CCR TCR TCY CGY ACR GCR CTR GAG GAA CTR GCR YTA

Prediction accuracy = 69.7%

The predicted gene sequence is compared with the known DNA sequence. Only those positions where the prediction reduces the number of possible codons are considered. The predicted codons that are consistent with those present in the ovoinhibitor gene are shaded on the slide.

References (1)

Leluk, J., Pham, T.-C. (1985) Calculation and analysis of protein sequence homology using microcomputer ZX Spectrum. VIIth Polish Conference "Chemistry of Amino Acids and Peptides", Abstracts, 83.
Kubiak, Z.J., Leluk, J. (1986) Application of the ZX Spectrum microcomputer for prediction of the secondary structure of proteins. Pr. Nauk. Inst. Chem. Nieorg. Metal. Pierwiastków Rzadkich. Wrocław 55, 186-189.
Leluk, J. (1993) Analysis of protein primary structure using program HOMOLOGY. XXIXth Meeting of Polish Biochemical Society, Abstracts, 384.
Leluk, J., Krowarsch, D. (1994) A new algorithm for analysis the homology in protein primary structure. 3rd Conference "Computers in Chemistry", Wrocław, Poland, Abstracts, 60.
Leluk, J. (1994) Application of program SEMIHOM based on a new algorithm for homology analysis of protein primary structure. Ist Conference "Computer-based Scientific Research", Wrocław, Poland, Abstracts, 189-194.
Leluk, J. (1996) Comparison of the algorithm of genetic semihomology with currently applied algorithms for protein homology analysis. IIIrd Conference "Computer-based Scientific Research", Wrocław, Polanica-Zdrój, Abstracts, 53-58.

References (2)

Leluk, J., Kuźnicki, T. (1996) Analysis of homology of selected proteinase inhibitor families using programs HOMOLOGY and SEMIHOM. IVth Conference "Computers in Chemistry '96", Polanica-Zdrój, Abstracts, 37.
Leluk, J. (1998) A new algorithm for analysis of the homology in protein primary structure. Computers & Chemistry, 22, 123-131.
Leluk, J. (1999a) Mutational variability in proteins and the Markovian model of replacement. 5th International Conference "Computers in Chemistry '99", Szklarska Poręba, Poland, Abstracts, P47.
Leluk, J. (1999b) Studies on mutational regularities in selected proteinase inhibitory families. 1st Polish Congress of Biotechnology, Wrocław, Poland, Abstracts (lectures), 183-184.
Leluk, J. (2000a) Serine proteinase inhibitor family in squash seeds: Mutational variability mechanism and correlation. Cell. Mol. Biol. Lett., 5, 91-106.
Leluk, J. (2000b) A non-statistical approach to protein mutational variability. BioSystems, 56, 83-93.
Leluk, J. (2000c) Regularities in mutational variability in selected protein families and the Markovian model of amino acid replacement. Computers & Chemistry, 24, 659-672.

References (3)

Leluk, J., Grabiec, M., Sobczyk, M. (2000) The application of genetic semihomology algorithm for theoretical studies on various protein families. International Conference on Conformation of Peptides, Proteins and Nucleic Acids, August 29 - September 2, 2000, Debrzyno, Poland, Abstracts, p. 27.
Leluk, J., Hanus-Lorenz, B., Sikorski, A.F. (2001a) Application of genetic semihomology algorithm to theoretical studies on various protein families. Acta Biochim. Polon., 48, 21-33.
Leluk, J. (2001a) An introduction to theoretical studies on protein primary structure. International Workshop "Information Theory Days", Warsaw, April 23-29 2001.
Leluk, J. (2001b) Benefits from theoretical comparative studies on diverse protein families. International Workshop "Information Theory Days", Warsaw, April 23-29 2001.
Leluk, J. and Grabiec, M. (2001) Sequence similarity estimation and correlated mutations in selected protein families. I. An approach to protein sequence similarity estimation. Ist Summer School on "Parallel Computing in Biomolecular Simulations", September 1-3 2001, Gdańsk, Poland; Abstracts L-5.
Hanus-Lorenz, B., Hryniewicz-Jankowska, A., Leluk, J., Lorenz, M., Skała, J. and Sikorski, A.F. (2001) Spectrin motifs are detected in plant and yeast genomes. 8th International W. Mejbaum-Katzenellenbogen's Seminar on Membrane Skeleton and Its Regulatory Functions, Szklarska Poręba 2001, Cell. Mol. Biol. Lett., 6, 207.

References (4)

Leluk, J., Sobczyk, M. and Becella, Ł. (2001b) Sequence similarity estimation and correlated mutations in selected protein families. II. Correlated mutations in selected protein families. Ist Summer School on "Parallel Computing in Biomolecular Simulations", September 1-3 2001, Gdańsk, Poland; Abstracts L-5.
Leluk, J., Sobczyk, M. and Becella, Ł. (2002) Correlated mutations in selected protein families. TASK Quarterly, 6(3), 469-482.
Leluk, J., Konieczny, L. and Roterman, I. (2003) Search for structural similarity in proteins. Bioinformatics, 19(1), 117-124.
Barbara Bereza, Agnieszka Kubiak, Jacek Leluk, Wacław Hendrich (2003) Photoenzyme Protochlorophyllide-NADPH oxidoreductase (LPOR) – the key for chlorophyll biosynthesis, Postępy Biochemii, 49, 46-55. Jacek Kuśka, Jacek Leluk (2003) Theoretical studies on sugar and protein kinases: Structural and mutational variability, 3rd International Conference “Inhibitors of Protein Kinases”, Cell. Mol. Biol. Lett., 8, 592 Anna Zdyb, Jacek Leluk (2003) Comparative studies on sugar and protein kinases: Genetic relationships and consensus sequences, 3rd International Conference “Inhibitors of Protein Kinases”, Cell. Mol. Biol. Lett., 8, 593 Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra References (5) Jacek Leluk, Bogdan Lesyng (2003) A comparative study of the primary structures of protein and sugar kinases, 3rd International Conference “Inhibitors of Protein Kinases”, Cell. Mol. Biol. Lett., 8, 620-621 A. Czogalla, P. Kwołek, A. Hryniewicz-Jankowska, M. Nietubyć, J. Leluk, A.F. Sikorski (2003) A protein isolated from Escherichia coli, identified as GroEL, reacts with anti-beta spectrin antibodies, Arch Biochem Biophys., 415, 94100. M. Kukuła, B. Hanus-Lorenz, E. Bok, J. Leluk, and A. F. Sikorski (2004) Proteins with Spectrin Motifs Which Do Not Belong to the Spectrin-a-ActininDystrophin Family, Z. Naturforsch., 59c, 565-571 Jacek Leluk (2002) Studies on length-dependent estimation of sequence identity significance, XXXVIIIth Meeting of Polish Biochemical Society, Abstracts, 198-199. Anna Zdyb, Jacek Leluk (2002) Theoretical studies on homology among kinase family, XXXVIIIth Meeting of Polish Biochemical Society, Abstracts, 215. Jacek Kuśka, Jacek Leluk (2002) Contact-variability-structure relationship within the kinase molecule, XXXVIIIth Meeting of Polish Biochemical Society, Abstracts, 215. 
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra References (6) Jacek Kuśka, Jacek Leluk (2003) Studies on kinase structure and mutational variability, 2nd Polish Congress of Biotechnology, Łódź, Poland 2327.06.2003, Abstracts, 75. Anna Zdyb, Jacek Leluk (2003) Kinases – Comparative analysis, genetic relationships and consensus sequence, 2nd Polish Congress of Biotechnology, Łódź, Poland 23-27.06.2003, Abstracts, 75. Jacek Leluk, Bogdan Lesyng (2003) Comparative analysis of homologous kinases, 2nd Polish Congress of Biotechnology, Łódź, Poland 23-27.06.2003, Abstracts, 235. Jacek Leluk (2003) Wrong assumptions and misinterpretations in molecular biology, biochemistry and bioinformatics, Gliwice Scientific Meetings 2003, Gliwice, Poland, Abstracts, 16 Anna Fogtman, Jacek Leluk, Bogdan Lesyng (2004) Construction of consensus sequences of the β-spectrin family with variable parameterthresholds and validation of the applied procedure, 29th FEBS Congress, 26 June - 1 July 2004, Warsaw, POLAND, Eur. J. Biochem., 271, Supplement, 29-30. Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra References (7) Adam Górecki, Jacek Leluk, Bogdan Lesyng (2004) A Java-implementation of a genetic semihomology algorithm (GEISHA), and its applications for analyses of selected protein families, 29th FEBS Congress, 26 June - 1 July 2004, Warsaw, POLAND, Eur. J. Biochem., 271, Supplement, 30. Jacek Leluk, Artur Mikołajczyk (2004) A new approach to sequence comparison and similarity estimation, 29th FEBS Congress, 26 June - 1 July 2004, Warsaw, POLAND, Eur. J. Biochem., 271, Supplement, 29. Agata Meglicz, Jacek Leluk, Bogdan Lesyng (2004) Protein inhibitors of kinases – homology analysis, mechanisms of differentiation and correlated mutations, 29th FEBS Congress, 26 June - 1 July 2004, Warsaw, POLAND, Eur. J. Biochem., 271, Supplement, 30. 
Anna Fogtman, Jacek Leluk, Bogdan Lesyng (2005) β-Spectrin consensus sequence construction with variable threshold parameters; verification of their usefulness; The International Conference Sequence-Structure-Function Relationships; Theoretical and Experimental Approaches, Warsaw, Poland, April 6-10 2005, Abstracts, 12. Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra References (8) Jacek Kuśka, Jacek Leluk, Bogdan Lesyng (2005) The variability patterns in kinase subfamilies; The International Conference Sequence-StructureFunction Relationships; Theoretical and Experimental Approaches, Warsaw, Poland, April 6-10 2005, Abstracts, 23. Elżbieta Gajewska, Jacek Leluk (2005) An approach to sequence similarity significance estimation; Theoretical and Experimental Approaches, Warsaw, Poland, April 6-10 2005, Abstracts, 14. Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra