Genetic semihomology algorithm
A new approach to the comparative analysis of protein sequences

It is admitted that from evolutionary point of view the genetic code
and amino acid ‘language’ have evolved simultaneously. They also act
with strict coherence with each other. Therefore, in analysis of
protein differentiation and variability, both levels should be
considered simultaneously.
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
The tools currently used for
comparative sequence analysis
apply the Markovian model of amino
acid replacement and are based on
stochastic matrices of the observed
amino acid substitution frequencies
BLOSUM62 matrix of amino acid replacements
    A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
A   4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
R      5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
N         6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
D            6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
C               9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
Q                  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
E                     5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
G                        6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
H                           8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
I                              4  2 -3  1  0 -3 -2 -1 -3 -1  3
L                                 4 -2  2  0 -3 -2 -1 -2 -1  1
K                                    5 -1 -3 -1  0 -1 -3 -2 -2
M                                       5  0 -2 -1 -1 -1 -1  1
F                                          6 -4 -2 -2  1  3 -1
P                                             7 -1 -1 -4 -3 -2
S                                                4  1 -3 -2 -2
T                                                   5 -2 -2  0
W                                                     11  2 -3
Y                                                         7 -1
V                                                            4
BLAST protein search - output
BLASTP 2.2.2 [Dec-14-2001]
Genetic semihomology algorithm
The new algorithm was developed to overcome the basic disadvantages
of the existing protein sequence analysis tools and to avoid some
basic errors in the assumptions of the existing statistical methods.
It is meant to explain the mechanism and pathway of protein evolution
and differentiation, rather than only describe the initial and final
steps of the observed changes.
Genetic semihomology algorithm
Another goal of this algorithm is to make it applicable to any group
of proteins, regardless of their nature, function and location.
This can be achieved for two reasons:
1) the basic assumptions are kept to a minimum: the general amino
acid/codon translation table, and the assumption that single point
mutation is the principal, most common mechanism of protein
variability;
2) the approach is non-statistical (no stochastic matrices).
Genetic semihomology algorithm
The algorithm of genetic semihomology assumes a close relation
between the compared amino acids and their codons in related
proteins. It is based on the network of genetic relationships
between amino acids.
Under this assumption, identical residues at different positions of
the sequence are not equal with respect to their changeability.
Genetic semihomology algorithm
The algorithm assumes that the basic mechanism of differentiation
among related proteins is the single point mutation. The central
part of the algorithm is a three-dimensional diagram reflecting the
network of genetic relationships between amino acids.
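The network of genetic relationships can be illustrated with a short sketch. It is not the author's implementation: it assumes the standard genetic code (written with the DNA alphabet), and the function names `codons_for` and `semihomology` are hypothetical. Two amino acids are treated as semihomologous when some pair of their codons differs at exactly one nucleotide, and the link is labelled as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

PURINES = {"A", "G"}


def codons_for(aa):
    """Return all codons that encode the given one-letter amino acid."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def semihomology(aa1, aa2):
    """Return the kinds of single point mutation that link aa1 and aa2.

    The result is a set containing "transition" and/or "transversion";
    an empty set means no codon pair differs at exactly one position.
    """
    kinds = set()
    for c1 in codons_for(aa1):
        for c2 in codons_for(aa2):
            diffs = [(x, y) for x, y in zip(c1, c2) if x != y]
            if len(diffs) == 1:
                x, y = diffs[0]
                same_class = (x in PURINES) == (y in PURINES)
                kinds.add("transition" if same_class else "transversion")
    return kinds


if __name__ == "__main__":
    for pair in [("H", "D"), ("D", "N"), ("R", "M"), ("W", "D")]:
        print(pair, sorted(semihomology(*pair)))
```

Because the lookup works on codons rather than on substitution frequencies, the same residue pair always receives the same answer, independent of any protein family statistics.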
Diagram of amino acid genetic relationships
Diagram of codon genetic relationships
[Figure: three-dimensional diagram in which every codon (AAA to UUU, including the stop codons UAA, UAG and UGA) is shown together with the amino acid it encodes. Axis labels: 1, 2, 3; nucleotide order: AGCU.]
Semihomology algorithm
Input requirements
The minimum data required to start an analysis with the algorithm of
genetic semihomology are at least two protein sequences. The more
sequences are used for the analysis and multiple alignment
construction, the more concise and accurate the results are. The
nucleotide sequences of the genes are not strictly required, but it
is very helpful if they are known for at least some of the analyzed
proteins; this significantly increases the amount of information
accessible from the analysis.
The results are also best for sequences showing a sufficiently high
degree of diversity.
Comparison of the fragments of 1st and 2nd domain of chicken ovomucoid
using unitary matrix, GCM, PAM250 and algorithm of genetic semihomology

1) GTT AAT TGC AGC CTG TAT GCC AGC GGC ATC GGC AAG GAT GGG ACG AGT TGG GTA GCC
    V   N   C   S   L   Y   A   S   G   I   G   K   D   G   T   S   W   V   A
2) ATT GAT TGC TCT CCG TAC CTC CAA  -  GTT GTA AGA GAT GGT AAC ACC ATG GTA GCC
    I   D   C   S   P   Y   L   Q   -   V   V   R   D   G   N   T   M   V   A

In the unitary matrix and genetic code matrix comparisons the gap is not assigned to a fixed position within the fragment A S G I G K / <L Q V V R>. In the PAM250 and genetic semihomology comparisons A is aligned with L, the gap falls within S G / <Q>, and I, G, K are aligned with V, V, R.

Genetic code matrix, matching nucleotides per codon:
2 2 3 0 2 2 1 0 0 1 1 1 3 2 1 1 1 3 3

METHOD                 SCORE    %
Unitary matrix         7/19     36.8
Genetic code matrix    29/57    50.9
PAM250 scoring         42/97    43.3
                       42/89    47.2
                       20/38    52.6
Genetic semihomology   34/57    59.6
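The two simplest scores in this comparison can be recomputed from the codon sequences of the two ovomucoid fragments. The sketch below is only an illustration under stated assumptions: the gap is placed after the Gln codon (CAA), the standard genetic code is used for translation, and the function names `unitary_score` and `gcm_score` are hypothetical. PAM250 and semihomology scoring are not reproduced here.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Aligned codon sequences from the slide; "---" is the assumed gap position.
SEQ1 = ("GTT AAT TGC AGC CTG TAT GCC AGC GGC ATC "
        "GGC AAG GAT GGG ACG AGT TGG GTA GCC").split()
SEQ2 = ("ATT GAT TGC TCT CCG TAC CTC CAA --- GTT "
        "GTA AGA GAT GGT AAC ACC ATG GTA GCC").split()


def translate(codon):
    """Translate one codon; anything that is not a codon becomes a gap."""
    return CODON_TABLE.get(codon, "-")


def unitary_score(codons_a, codons_b):
    """Count positions with identical amino acids (unitary matrix)."""
    matches = 0
    for a, b in zip(codons_a, codons_b):
        aa_a, aa_b = translate(a), translate(b)
        if aa_a == aa_b and aa_a != "-":
            matches += 1
    return matches, len(codons_a)


def gcm_score(codons_a, codons_b):
    """Count identical nucleotides at corresponding codon positions."""
    matches = sum(
        1
        for a, b in zip(codons_a, codons_b)
        for x, y in zip(a, b)
        if x == y and x != "-"
    )
    return matches, 3 * len(codons_a)


if __name__ == "__main__":
    for name, score in [("unitary", unitary_score), ("GCM", gcm_score)]:
        hits, total = score(SEQ1, SEQ2)
        print(f"{name}: {hits}/{total} = {100 * hits / total:.1f}%")
```

With these inputs the sketch gives 7/19 for the unitary matrix and 29/57 for the genetic code matrix, the same totals as reported on the slide.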
Semihomology algorithm
Advantages of the approach
The results obtained by using this method are more
comprehensive than those of the methods used currently and
reflect the actual mechanism of protein differentiation and
evolution. They concern:
1) location of homologous and semihomologous sites in
compared proteins,
2) precise estimation of gap location in non-identical
fragments of different length,
3) analysis of internal homology and semihomology,
4) precise location of domains in multidomain proteins,
5) estimation of genetic code of non homologous fragments,
6) construction of genetic probes,
7) studies on differentiation processes among related
proteins,
8) estimation of the degree of relationship among related
proteins,
9) studies on the evolution mechanism within homologous
protein families,
10) confirmation of the actual relationship between
sequences revealing low degree of identity/similarity.
Semihomology algorithm
Advantages of the approach
Application of the semihomology approach has led to the discovery
and description of some important mechanisms affecting protein
evolution and differentiation. The most important are:
1) the mechanism and role of cryptic mutations at unusually
variable positions (Leluk, 2000b; Leluk, 2000c);
2) the phenomenon of very long distance (dispersed) mutational
correlation within sets of variable positions (Leluk, 2000a;
Leluk and Grabiec, 2001; Leluk et al., 2001b; Leluk et al., 2002).
Semihomology algorithm
Limitations of the approach
The semihomology approach is of limited use when:
- too few sequences are taken for analysis,
- the degree of identity among the sequences is too high (too low
diversity),
- long fragments of the compared sequences show no identity at all
(e.g. N-terminal signal fragments of homologous proteins).
The analysis of genetic semihomology rules out the applicability of
the Markov model to studies of protein variability at the amino acid
level.
The amino acid codons do contain information about the “ancestral”
amino acids whose codons were the starting point for the codon of
the current residue.
This refers mainly to positions undergoing single point mutations,
the most basic mechanism of evolutionary variability.
Software based on genetic semihomology
algorithm
GEISHA
- Protein sequence similarity and homology analysis.
- Multiple alignment construction on the basis of genetic relationships
between amino-acids.
- Analysis of variability within homologous protein families
- Molecular phylogenetic studies
FQS Semihomology tool
http://www.fqspl.com.pl/sh/
The semihomologous correlation between amino acids occurring at
selected non-homologous positions of inhibitors from squash seeds.
[Figure: networks of semihomologous relationships for position 7 (L, Y, W), position 14 (R, S, D, A), position 19 (D, E, G, Q, K) and position 25 (N, H, E, Q, D, S).]
The solid lines indicate the transition type of semihomology;
the dashed lines refer to the transversion type of semihomology.
Application of the genetic semihomology algorithm for identifying
fragments that possibly arose by a mechanism other than single point
mutation
 1. VILE        18. VTI         35. Y           52. VIL
 2. NDH         19. [A][R]–     36. SDA         53. ESKAGN
 3. C           20. C           37. NS          54. [K][L]
 4. [STR][D]    21. PT          38. [ED][R]     55. ELKSRV
 5. LPKQE       22. [RM][F]     39. C           56. [YHS][K]
 6. YF          23. [NI][E]     40. GSTF        57. [DN][M]
 7. ALPKQ       24. [L][Y]      41. ILF         58. GA
 8. SQTK        25. KSLQDV      42. C           59. EKRA
 9. GTRS–       26. [P][E]      43. [L][A][N]   60. C
10. IVKNT       27. [V][H]      44. [YH][A]     61. RKE
11. GVSTL       28. C           45. NY          62. PLQE
12. KRTQ–       29. GA          46. RAILV       63. KERD
13. DGN–        30. TS          47. EQ          64. [ISV][H]
14. G–          31. DN          48. HQLS        65. [VG][PT]
15. TNRKE–      32. GS          49. GHRN        66. [MEK][PS]
16. STLAQP      33. SFV         50. ATR
17. WMLIV–      34. T           51. [NHST][E]
Dot matrix pairwise alignment
Noise reduction
Dot matrix pairwise alignment
Internal homology (gene multiplication)
BLAST 2 SEQUENCES
SEMIHOM
Chicken
ovoinhibitor
precursor
(7 domains)
Chicken
ovomucoid
precursor
(3 domains)
Dot matrix comparison of selected homologous Kazal inhibitors
ovoinhibitor-ovoinhibitor
ovoinhibitor-ovomucoid
ovoinhibitor-PSTI
BLASTP 2.0.9 (Blosum62)
SEMIHOM (algorithm of genetic semihomology)
Consecutive steps of dot matrix results obtained from comparison
of chicken ovoinhibitor with itself by program SEMIHOM
Raw dot matrix
adding semihomology
Noise filtering
marking of whole fragments
Precise gap location in non-identical fragments of different length using the
genetic semihomology algorithm. The compared proteins are trypsin inhibitors
from squash seeds: CPGTI-I (horizontal) and CSTI-IIb (vertical).

Identities only (gap position uncertain):
RVCPKILMECKKDSDCLAECICLEH-GYCG
MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS
or
RVCPKILMECKKDSDCLAECICLE-HGYCG
MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS

Identities and semihomology:
RVCPKILMECKKDSDCLAECICLEH-GYCG
MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS

CPGTI-I has a non-homologous His25, while CSTI-IIb has two non-homologous
residues, Asp25 and Ile26, at the corresponding site.
The analysis of semihomology shows a genetic relationship between His25 and
Asp25; therefore the gap in CPGTI-I should be located opposite Ile26 of
CSTI-IIb. Window size = 10; minimum number of homologous positions in
window = 4.
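The window parameters quoted for this dot matrix (window size 10, at least 4 homologous positions) suggest a simple diagonal filter. The following is a minimal sketch of such a filter, not the SEMIHOM program itself: the function name `dot_matrix` is hypothetical, and only identities are counted, whereas the slide also counts semihomologous pairs.

```python
def dot_matrix(seq_a, seq_b, window=10, min_matches=4):
    """Return the set of (i, j) dots that survive window filtering.

    A diagonal window of the given length is slid over every start
    position; identical residue pairs inside a window are kept only
    if the window contains at least `min_matches` identities.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            hits = [k for k in range(window) if seq_a[i + k] == seq_b[j + k]]
            if len(hits) >= min_matches:
                dots.update((i + k, j + k) for k in hits)
    return dots


if __name__ == "__main__":
    cpgti_i = "RVCPKILMECKKDSDCLAECICLEHGYCG"       # CPGTI-I, gap removed
    csti_iib = "MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS"  # CSTI-IIb
    dots = dot_matrix(cpgti_i, csti_iib)
    # Dots on the main diagonal (j == i) and on the shifted one (j == i + 1)
    main = sorted(d for d in dots if d[1] == d[0])
    shifted = sorted(d for d in dots if d[1] == d[0] + 1)
    print(len(main), "dots on the main diagonal")
    print(len(shifted), "dots on the diagonal shifted by one")
```

The change from the main diagonal to the diagonal shifted by one residue is where the gap has to be placed; deciding its exact position requires the semihomology information described on the slide.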
Multiple alignment
Proteinase inhibitors from squash seeds
Acc.number
sp|P01074     RVCPRILMECKKDSDCLAECVCLEH-GYCG
sp|P07853     RVCPRILMKCKKDSDCLAECVCLEH-GYCG
sp|P07853  HEERVCPRILMKCKKDSDCLAECVCLEH-GYCG
sp|P10293     RVCPKILMECKKDSDCLAECICLEH-GYCG
sp|P10293     RVCPKILMECKKDSDCLAECICLEH-GYCG
sp|P10293  HEERVCPKILMECKKDSDCLAECICLEH-GYCG
sp|P10291     MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS
sp|P10292     MMCPRILMKCKHDSDCLPGCVCLEHIEYCG
sp|P11969    GRRCPRIYMECKRDADCLADCVCLQH-GICG
sp|P11968     RGCPRILMRCKRDSDCLAGCVCQKN-GYCG
sp|P17680     GICPRILMECKRDSDCLAQCVCKRQ-GYCG
sp|P12071      GCPRILMRCKQDSDCLAGCVCGPN-GFCG
sp|P10294   <ERRCPRILKQCKRDSDCPGECICMAH-GFCG
Multiple alignment of seven chicken ovoinhibitor domains obtained
with Markovian and non-Markovian methods
Comparison of multiple alignment consensus results for
three inhibitor families achieved with ClustalW, Multalin (both
applied BLOSUM62) and algorithm of genetic semihomology

Consensus alignment results for Bowman Birk inhibitors
         1       10        20        30        40        50        60
ClustalW .. ...**: * ** * ** * * *: ::*** *. * *: * * :* * * .*** *:
Multalin .d....aCC#.C.CTkS.PPqCrC.Dir.#tCHSaCksCiCtrS.PpqCrC.Dtt.FCYk.C.
GS AA    S.s.SSS**S.*S**s*S**.*S*S*SS*.S***S*.S*s*Ss*s*SS*s*s*SSS***ss*S
GS NA    sss.SSSS*S.*.**.*SS*sSs*s*SS*sSSS*SSssSsSSsSsSSSS.SsSSsS*SSs.Ss

Consensus alignment results for squash inhibitors
         1       10        20        30
ClustalW ** * * : ** * * **
Multalin ....r.CPrIlm.Ck.DsDCla.C.Cl...G.CG..
GS AA    SS**S*sSS*SsSs**SSS*S*SSS.SS**

Consensus alignment results for ovoinhibitor domains (Kazal type)
         1       10        20        30        40        50        60
ClustalW .* : *. *.::. ** . * :* : : . *
Multalin .dcs.y......dg...vaCp.il.pvCgt#gvTYsneC..Cahn.#..t...k..dg.C......
GS AA    SS*sSSSSSSSSS*SSSS.*Sss..ss*sSSSS**sSs*sS*.sSSSsSSsSS.sssSS*SSSsss
GS NA    sSSs.S.S....sSSssss*Sss..ssssSSSSSSsSsS.sS.sS.S...sSSs..sSSSssss.s
The dot matrix comparison of human erythrocyte
α-spectrin with itself
For the best visualization of the repeats, the identity threshold is set to 15 and the frame size to 75.
[Figure: dot matrix; both axes: α-spectrin, ticks at 500, 1000, 1500 and 2000.]
The multiple alignment of proteins homologous
to the 10th segment of human erythrocyte α-spectrin
The multiple alignment of the 23 segments of human erythrocyte
α-spectrin achieved with the MultAlin (BLOSUM62) method
and the genetic semihomology approach
The dot matrix comparison of human erythrocyte
α-spectrin with the consensus of 106-residue repeats
[Figure, first panel: α-spectrin (ticks at 500, 1000, 1500, 2000) against the α-spectrin repeats consensus (Sahr et al.) (ticks at 20, 60, 100).]
The consensus described by Sahr et al. [1990]. The identity threshold
and frame size are set to 8 and 40, respectively.
[Figure, second panel: α-spectrin (ticks at 500, 1000, 1500, 2000) against the α-spectrin repeats consensus (genetic semihomology) (ticks at 20, 60, 100).]
The consensus achieved with the genetic semihomology approach
[Leluk, 1998]. The identity threshold and frame size are set to 8 and
40, respectively. Both the identity and the genetic semihomology of
the compared residues are visualized.
The use of α-spectrin consensus sequence achieved with
different algorithms for sequence similarities search (BLAST)
Query sequence: α-Spectrin segment consensus sequence obtained with
MultAlin program (BLOSUM62)
Score   E-value  Sequences producing significant alignments:
(bits)
17      21352    sp|P02549|SPCA_HUMAN SPECTRIN ALPHA CHAIN, ERYTHROCYTE
16      27970    sp|Q01082|SPCO_HUMAN SPECTRIN BETA CHAIN, BRAIN (SPECTRIN,...
16      27970    sp|P08032|SPCA_MOUSE SPECTRIN ALPHA CHAIN, ERYTHROCYTE
16      27970    sp|P07751|SPCN_CHICK SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,...
16      27970    sp|Q13813|SPCN_HUMAN SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN...
16      27970    sp|Q62261|SPCO_MOUSE SPECTRIN BETA CHAIN, BRAIN (SPECTRIN, ...
16      36639    sp|P16086|SPCN_RAT SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN, N...
16      47996    sp|Q00963|SPCB_DROME SPECTRIN BETA CHAIN
15      62874    sp|P13395|SPCA_DROME SPECTRIN ALPHA CHAIN
14      141334   sp|P16546|SPCN_MOUSE SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,...
14      185142   sp|P15508|SPCB_MOUSE SPECTRIN BETA CHAIN, ERYTHROCYTE
14      185142   sp|P11277|SPCB_HUMAN SPECTRIN BETA CHAIN, ERYTHROCYTE
11      935541   sp|P39254|Y04O_BPT4 HYPOTHETICAL 36.3 KD PROTEIN IN NRDC-M...
The use of α-spectrin consensus sequence achieved with
different algorithms for sequence similarities search (BLAST)
Query sequence: α-Spectrin segment consensus sequence attained by Sahr et al.
[1990]
Score   E-value  Sequences producing significant alignments:
(bits)
43      2e-04    sp|P13395|SPCA_DROME SPECTRIN ALPHA CHAIN
40      0.001    sp|Q13813|SPCN_HUMAN SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN...
39      0.003    sp|P07751|SPCN_CHICK SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,...
39      0.004    sp|P16546|SPCN_MOUSE SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,...
37      0.012    sp|P16086|SPCN_RAT SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN, N...
37      0.016    sp|P02549|SPCA_HUMAN SPECTRIN ALPHA CHAIN, ERYTHROCYTE
36      0.021    sp|P08032|SPCA_MOUSE SPECTRIN ALPHA CHAIN, ERYTHROCYTE
35      0.048    sp|P15508|SPCB_MOUSE SPECTRIN BETA CHAIN, ERYTHROCYTE
35      0.063    sp|Q00963|SPCB_DROME SPECTRIN BETA CHAIN
34      0.082    sp|Q01082|SPCO_HUMAN SPECTRIN BETA CHAIN, BRAIN (SPECTRIN,...
34      0.11     sp|Q62261|SPCO_MOUSE SPECTRIN BETA CHAIN, BRAIN (SPECTRIN, ...
32      0.32     sp|P11277|SPCB_HUMAN SPECTRIN BETA CHAIN, ERYTHROCYTE
21      797      sp|P05095|AACT_DICDI ALPHA-ACTININ 3, NON MUSCULAR (F-ACTIN...
20      1791     sp|P34367|YLJ2_CAEEL HYPOTHETICAL 256.3 KD PROTEIN C50C3.2 ...
19      3073     sp|Q03001|BPA1_HUMAN BULLOUS PEMPHIGOID ANTIGEN 1 (BPA) (H...
19      4026     sp|P30427|PLEC_RAT PLECTIN
19      4026     sp|P31670|GT27_FASHE GLUTATHIONE S-TRANSFERASE 26 KD 47 (GS...
18      6908     sp|P46125|YEDI_ECOLI HYPOTHETICAL 32.2 KD PROTEIN IN DSRB-V...
18      6908     sp|P42094|PHYT_BACSU 3-PHYTASE PRECURSOR (PHYTATE 3-PHOSPHA...
18      9049     sp|P56288|E2BG_SCHPO PROBABLE TRANSLATION INITIATION FACTOR...
18      9049     sp|O00273|DFFA_HUMAN DNA FRAGMENTATION FACTOR ALPHA SUBUNI...
18      9049     sp|P12311|ADH1_BACST ALCOHOL DEHYDROGENASE (ADH-T)
The use of α-spectrin consensus sequence achieved with
different algorithms for sequence similarities search (BLAST)
Query sequence: α-Spectrin segment consensus sequence attained by genetic
semihomology algorithm
Score   E-value  Sequences producing significant alignments:
(bits)
49      3e-06    sp|P13395|SPCA_DROME SPECTRIN ALPHA CHAIN
47      1e-05    sp|Q13813|SPCN_HUMAN SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN...
42      3e-04    sp|P07751|SPCN_CHICK SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,...
41      6e-04    sp|P16086|SPCN_RAT SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN, N...
38      0.005    sp|P02549|SPCA_HUMAN SPECTRIN ALPHA CHAIN, ERYTHROCYTE
37      0.014    sp|Q00963|SPCB_DROME SPECTRIN BETA CHAIN
36      0.019    sp|P08032|SPCA_MOUSE SPECTRIN ALPHA CHAIN, ERYTHROCYTE
36      0.024    sp|P15508|SPCB_MOUSE SPECTRIN BETA CHAIN, ERYTHROCYTE
36      0.032    sp|P16546|SPCN_MOUSE SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,...
34      0.094    sp|Q01082|SPCO_HUMAN SPECTRIN BETA CHAIN, BRAIN (SPECTRIN,...
34      0.094    sp|Q62261|SPCO_MOUSE SPECTRIN BETA CHAIN, BRAIN (SPECTRIN, ...
33      0.21     sp|P11277|SPCB_HUMAN SPECTRIN BETA CHAIN, ERYTHROCYTE
20      1195     sp|P34367|YLJ2_CAEEL HYPOTHETICAL 256.3 KD PROTEIN C50C3.2 ...
20      1566     sp|Q99001|AACB_CHICK ALPHA-ACTININ, BRAIN ISOFORM (F-ACTIN ...
20      1566     sp|P35609|AAC2_HUMAN ALPHA-ACTININ 2, SKELETAL MUSCLE ISOF...
20      1566     sp|P12814|AAC1_HUMAN ALPHA-ACTININ 1, CYTOSKELETAL ISOFORM ...
20      1566     sp|P05094|AACT_CHICK ALPHA-ACTININ, SMOOTH MUSCLE ISOFORM (...
19      2687     sp|P30427|PLEC_RAT PLECTIN
19      3520     sp|P20111|AACS_CHICK ALPHA-ACTININ, SKELETAL MUSCLE ISOFORM...
18      4611     sp|P47493|SYG_MYCGE GLYCYL-TRNA SYNTHETASE (GLYCINE--TRNA L...
18      7913     sp|Q03001|BPA1_HUMAN BULLOUS PEMPHIGOID ANTIGEN 1 (BPA) (H...
Kinase project
Multiple alignment of hexokinase domains
HEXOKINASE_I (2 domains)
HEXOKINASE_II (2 domains)
HEXOKINASE_III (2 domains)
HEXOKINASE_IV (1 domain)
Kinase project
Complex comparative studies at the primary structure level
Construction of molecular phylogenetic trees
Studies on sequence/structure/function relationship
Studies on the mechanisms of correlated mutations and
variability
Genetic principles of differentiation within this protein family
Simplified (planar) diagram of genetic relationships between
amino acids
[Figure: planar diagram with the amino acid groups IM, T, RS, KN, V, A, G, DE, L, P, R, HQ, FL, S, CW and Y.]
In the planar diagram the encoding role of the third codon position
is ignored. Only the first two codon positions are taken into account.
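The grouping used in the planar diagram can be derived directly from the codon table. The sketch below is an illustration only, assuming the standard genetic code written with the DNA alphabet; the dictionary name `planar_groups` is hypothetical. Amino acids whose codons share the same first two nucleotides fall into one group, and stop codons are left out.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Collect the amino acids encoded under each two-nucleotide prefix.
groups = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":  # stop codons carry no amino acid
        groups.setdefault(codon[:2], set()).add(aa)

# One label per prefix, e.g. "AT" -> "IM", "AG" -> "RS", "TG" -> "CW".
planar_groups = {prefix: "".join(sorted(aas)) for prefix, aas in groups.items()}

if __name__ == "__main__":
    for prefix in sorted(planar_groups):
        print(prefix, planar_groups[prefix])
```

The six-codon amino acids Leu, Arg and Ser each appear under two different prefixes, which is why they occupy two nodes of the planar diagram.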
Simplified (planar) diagram of genetic relationships between
amino acids
[Figure: planar diagram with the amino acid groups IM, T, RS, KN, V, A, G, DE, L, P, R, HQ, FL, S, CW and Y, shown twice.]
The simplified planar diagram emphasizes the special encoding
character of the six-codon amino acids: Leu, Arg and Ser.
The six-codon amino acids may play the role of “mutational passages”
that are not liable to the selection restrictions.
These amino acids may contribute to an increase in the variability
range.
Conclusions?
In fact, the six-codon amino acids occur unusually frequently at very
variable positions. This concerns especially serine and, to a lesser
extent, arginine.
Leucine does not show a correlation between the frequency of
occurrence and the variability range.
Frequency of six-codon amino acids as a function of position
variability in randomly selected proteins of different origin and
nature
The results for 2686 residues at 606 corresponding positions
[Figure: all sequences (2686 residues at 606 positions), discrete data; series: Ser, Arg, Leu; x axis: number of residues occurring at aligned position (1 to 8); y axis: % of occurrence (0 to 100).]
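A statistic of this kind can be sketched as follows. The exact definition of "% of occurrence" used for the plot is not given in the slides, so the sketch assumes one plausible reading: for each variability level (number of different residues in an aligned column), the percentage of columns at that level that contain the amino acid of interest. The function name `occurrence_by_variability` and the toy alignment are hypothetical.

```python
from collections import defaultdict


def occurrence_by_variability(alignment, residue):
    """Percentage of columns containing `residue`, per variability level.

    `alignment` is a list of equal-length aligned sequences; "-" marks
    a gap and is not counted as a residue. The variability of a column
    is the number of different residues found in it.
    """
    columns_total = defaultdict(int)
    columns_with_residue = defaultdict(int)
    for column in zip(*alignment):
        residues = {r for r in column if r != "-"}
        if not residues:
            continue  # column made of gaps only
        level = len(residues)
        columns_total[level] += 1
        if residue in residues:
            columns_with_residue[level] += 1
    return {
        level: 100.0 * columns_with_residue[level] / columns_total[level]
        for level in sorted(columns_total)
    }


if __name__ == "__main__":
    toy_alignment = ["ASCT", "ASCS", "GRCS"]  # invented example data
    print(occurrence_by_variability(toy_alignment, "S"))
```

Running the same function for Ser, Arg and Leu on a real multiple alignment would give three curves comparable in shape to those described above, under the stated assumption about the measure.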
Phylogenetic tree verification
The ovoinhibitor domains homology comparison. The similarity scores (%) for
the aligned DNA-coding sequences are shown above the diagonal. Below the
diagonal are the similarity scores for the aligned amino acid sequences.

The results obtained by Scott et al.
       I   II  III   IV    V   VI  VII
I      -   53   56   47   44   38   25
II    61    -   64   56   44   41   28
III   64   66    -   61   56   52   26
IV    54   65   66    -   50   42   25
V     54   57   59   55    -   42   26
VI    53   54   66   57   58    -   29
VII   45   45   42   39   42   40    -
[Figure: phylogenetic tree of domains I to VII with OI, OM and PSTI.]
Scott M. J., Huckaby C. S., Kato I., Kohr W. J., Laskowski M. Jr, Tsai M.-J. and O'Malley B. W. (1987) J.
Biol. Chem. 262, 5899-5907

The results obtained by semihomology analysis
       I   II  III   IV    V   VI  VII
I      -   72   75   68   67   60   52
II    52    -   81   71   66   65   55
III   57   64    -   77   76   75   53
IV    39   59   56    -   68   70   52
V     45   66   51   49    -   68   52
VI    42   65   62   45   46    -   56
VII   33   55   33   30   31   38    -
[Figure: phylogenetic tree of domains I to VII with OI, OM and PSTI.]
Virtual reverse translation
Deducing the genetic code
QRCRRDSDC
KKCRMDSDC
Arg codons: AGA AGG CGA CGG CGC CGT
Met codons: ATG
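The idea of virtual reverse translation can be sketched in code. This is a minimal illustration, not the author's program: it assumes the standard genetic code (DNA alphabet) and that aligned, non-identical residues are related by a single point mutation; the function names `codons_for`, `compatible_codons` and `narrowed_codons` are hypothetical. For an aligned pair of residues, only those codons that lie one nucleotide away from a codon of the partner residue are kept.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa):
    """Return all codons that encode the given one-letter amino acid."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def compatible_codons(aa1, aa2):
    """Return codon pairs (for aa1, for aa2) differing at one nucleotide."""
    return [
        (c1, c2)
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
        if sum(x != y for x, y in zip(c1, c2)) == 1
    ]


def narrowed_codons(aa1, aa2):
    """Codons of aa1 that remain possible if aa1 and aa2 are semihomologous."""
    return sorted({c1 for c1, _ in compatible_codons(aa1, aa2)})


if __name__ == "__main__":
    seq_a, seq_b = "QRCRRDSDC", "KKCRMDSDC"  # sequences from the slide
    for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1):
        if a != b:
            print(pos, a, b, narrowed_codons(a, b), narrowed_codons(b, a))
```

For the Arg/Met pair in the example, only AGG of the six Arg codons is a single nucleotide away from ATG, so the codon at that position is deduced without any DNA data; the Arg/Lys pair narrows Arg to AGA or AGG.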
Virtual reverse translation
[Figure, shown in successive steps: the diagram of amino acid genetic relationships, in which the Arg and Met nodes are replaced by their codons (AGA, AGG, CGA, CGG, CGC, CGT and ATG).]
Genetic relationships between Arg and Met/Gln
[Figure: the diagram of amino acid genetic relationships.]
Genetic semihomology prediction of genetic code for
chicken ovoinhibitor domains.
[Table: residues of domains I to VII at positions 1, 4, 5, 7, 8, 12, 13, 15, 22, 33, 37, 49, 50, 59, 61, 62, 63 and 66, with the predicted codons below each position.]
Predicted codons (first row, one per position):
ATA AGY CCR GCR ACR ACR GGY AGR AGG GTY AGY GGY AGR AGR AGR CCR AGR AAG
Alternative predicted codons (column positions not preserved):
GTA CGY CTR CCR TCR TCY CGY ACR GCR CTR GAG GAA CTR GCR YTA
Prediction accuracy = 69.7%
The predicted gene sequence is compared with the known DNA sequence. Only those
positions where the prediction reduces the number of possible codons are
considered. The predicted codons that are consistent with those present in the
ovoinhibitor gene are shaded.
References
Leluk, J., Pham, T.-C. (1985) Calculation and analysis of protein sequence
homology using microcomputer ZX Spectrum. VIIth Polish Conference
"Chemistry of Amino Acids and Peptides", Abstracts, 83
Kubiak, Z.J., Leluk, J. (1986) Application of the ZX Spectrum microcomputer
for prediction of the secondary structure of proteins. Pr. Nauk. Inst. Chem.
Nieorg. Metal. Pierwiastków Rzadkich. Wrocław 55, 186-189
Leluk, J. (1993) Analysis of protein primary structure using program
HOMOLOGY. XXIXth Meeting of Polish Biochemical Society, Abstracts, 384
Leluk, J., Krowarsch, D. (1994) A new algorithm for analysis the homology in
protein primary structure, 3rd Conference "Computers in Chemistry", Wrocław,
Poland, Abstracts, 60.
Leluk, J. (1994) Application of program SEMIHOM based on a new algorithm for
homology analysis of protein primary structure. Ist Conference "Computer-based
Scientific Research", Wrocław, Poland, Abstracts, 189-194.
Leluk, J. (1996) Comparison of the algorithm of genetic semihomology with
currently applied algorithms for protein homology analysis. IIIrd Conference
"Computer-based Scientific Research", Wrocław, Polanica-Zdrój, Abstracts, 53-58.
Leluk, J., Kuźnicki, T. (1996) Analysis of homology of selected proteinase
inhibitor families using programs HOMOLOGY and SEMIHOM. IVth
Conference "Computers in Chemistry '96", Polanica-Zdrój, Abstracts, 37.
Leluk, J. (1998) A new algorithm for analysis of the homology in protein
primary structure. Computers & Chemistry, 22, 123-131.
Leluk, J. (1999a) Mutational variability in proteins and the Markovian model of
replacement. 5th International Conference "Computers in Chemistry '99",
Szklarska Poręba, Poland, Abstracts, P47.
Leluk, J. (1999b) Studies on mutational regularities in selected proteinase
inhibitory families. 1st Polish Congress of Biotechnology, Wrocław, Poland,
Abstracts (lectures), 183-184.
Leluk, J. (2000a) Serine proteinase inhibitor family in squash seeds: Mutational
variability mechanism and correlation. Cell. Mol. Biol. Lett., 5, 91-106.
Leluk, J. (2000b) A non-statistical approach to protein mutational
variability. BioSystems, 56, 83-93.
Leluk, J. (2000c) Regularities in mutational variability in selected protein
families and the Markovian model of amino acid replacement. Computers
& Chemistry, 24, 659-672.
Leluk, J., Grabiec, M., Sobczyk, M. (2000) The application of genetic semihomology
algorithm for theoretical studies on various protein families. International Conference
on Conformation of Peptides, Proteins and Nucleic Acids, August 29 - September 2,
2000, Debrzyno, Poland - Abstracts, p. 27.
Leluk, J., Hanus-Lorenz, B., Sikorski, A.F. (2001a) Application of genetic
semihomology algorithm to theoretical studies on various protein families.
Acta Biochim. Polon., 48, 21-33.
Leluk, J. (2001a) An introduction to theoretical studies on protein primary structure.
International Workshop “Information Theory Days” Warsaw, April 23-29 2001.
Leluk, J. (2001b) Benefits from theoretical comparative studies on diverse protein
families. International Workshop “Information Theory Days” Warsaw, April 23-29
2001.
Leluk, J. and Grabiec, M. (2001) Sequence similarity estimation and correlated
mutations in selected protein families. I. An approach to protein sequence similarity
estimation. Ist Summer School on “Parallel Computing in Biomolecular Simulations”,
September 1-3 2001, Gdańsk, Poland; Abstracts L-5.
Hanus-Lorenz, B., Hryniewicz-Jankowska, A., Leluk, J., Lorenz, M., Skała, J. and
Sikorski, A.F. (2001) Spectrin motifs are detected in plant and yeast genomes, 8th
International W. Mejbaum-Katzenellenbogen's Seminar on Membrane Skeleton and
Its Regulatory Functions, Szklarska Poręba 2001, Cell. Mol. Biol. Lett., 6, 207.
Leluk, J., Sobczyk, M. and Becella, Ł. (2001b) Sequence similarity estimation
and correlated mutations in selected protein families. II. Correlated mutations
in selected protein families. Ist Summer School on “Parallel Computing in
Biomolecular Simulations”, September 1-3 2001, Gdańsk, Poland; Abstracts
L-5.
Leluk, J., Sobczyk, M. and Becella, Ł. (2002) Correlated mutations in
selected protein families, TASK Quarterly, 6(3), 469-482.
Leluk, J., Konieczny, L. and Roterman, I. (2003) Search for structural
similarity in proteins, Bioinformatics, 19(1), 117-124.
Barbara Bereza, Agnieszka Kubiak, Jacek Leluk, Wacław Hendrich (2003)
Photoenzyme Protochlorophyllide-NADPH oxidoreductase (LPOR) – the key
for chlorophyll biosynthesis, Postępy Biochemii, 49, 46-55.
Jacek Kuśka, Jacek Leluk (2003) Theoretical studies on sugar and protein
kinases: Structural and mutational variability, 3rd International Conference
“Inhibitors of Protein Kinases”, Cell. Mol. Biol. Lett., 8, 592
Anna Zdyb, Jacek Leluk (2003) Comparative studies on sugar and protein
kinases: Genetic relationships and consensus sequences, 3rd International
Conference “Inhibitors of Protein Kinases”, Cell. Mol. Biol. Lett., 8, 593
References (5)
Jacek Leluk, Bogdan Lesyng (2003) A comparative study of the primary
structures of protein and sugar kinases, 3rd International Conference
“Inhibitors of Protein Kinases”, Cell. Mol. Biol. Lett., 8, 620-621
A. Czogalla, P. Kwołek, A. Hryniewicz-Jankowska, M. Nietubyć, J. Leluk, A.F.
Sikorski (2003) A protein isolated from Escherichia coli, identified as GroEL,
reacts with anti-beta spectrin antibodies, Arch Biochem Biophys., 415, 94-100.
M. Kukuła, B. Hanus-Lorenz, E. Bok, J. Leluk, and A. F. Sikorski (2004) Proteins
with Spectrin Motifs Which Do Not Belong to the Spectrin-α-Actinin-Dystrophin Family, Z. Naturforsch., 59c, 565-571
Jacek Leluk (2002) Studies on length-dependent estimation of sequence
identity significance, XXXVIIIth Meeting of Polish Biochemical Society,
Abstracts, 198-199.
Anna Zdyb, Jacek Leluk (2002) Theoretical studies on homology among kinase
family, XXXVIIIth Meeting of Polish Biochemical Society, Abstracts, 215.
Jacek Kuśka, Jacek Leluk (2002) Contact-variability-structure relationship within
the kinase molecule, XXXVIIIth Meeting of Polish Biochemical Society,
Abstracts, 215.
References (6)
Jacek Kuśka, Jacek Leluk (2003) Studies on kinase structure and mutational
variability, 2nd Polish Congress of Biotechnology, Łódź, Poland 23-27.06.2003, Abstracts, 75.
Anna Zdyb, Jacek Leluk (2003) Kinases – Comparative analysis, genetic
relationships and consensus sequence, 2nd Polish Congress of
Biotechnology, Łódź, Poland 23-27.06.2003, Abstracts, 75.
Jacek Leluk, Bogdan Lesyng (2003) Comparative analysis of homologous
kinases, 2nd Polish Congress of Biotechnology, Łódź, Poland 23-27.06.2003,
Abstracts, 235.
Jacek Leluk (2003) Wrong assumptions and misinterpretations in molecular
biology, biochemistry and bioinformatics, Gliwice Scientific Meetings 2003,
Gliwice, Poland, Abstracts, 16
Anna Fogtman, Jacek Leluk, Bogdan Lesyng (2004) Construction of
consensus sequences of the β-spectrin family with variable parameter thresholds and validation of the applied procedure, 29th FEBS
Congress, 26 June - 1 July 2004, Warsaw, POLAND, Eur. J. Biochem.,
271, Supplement, 29-30.
References (7)
Adam Górecki, Jacek Leluk, Bogdan Lesyng (2004) A Java-implementation of a
genetic semihomology algorithm (GEISHA), and its applications for analyses
of selected protein families, 29th FEBS Congress, 26 June - 1 July 2004,
Warsaw, POLAND, Eur. J. Biochem., 271, Supplement, 30.
Jacek Leluk, Artur Mikołajczyk (2004) A new approach to sequence comparison
and similarity estimation, 29th FEBS Congress, 26 June - 1 July 2004,
Warsaw, POLAND, Eur. J. Biochem., 271, Supplement, 29.
Agata Meglicz, Jacek Leluk, Bogdan Lesyng (2004) Protein inhibitors of kinases
– homology analysis, mechanisms of differentiation and correlated mutations,
29th FEBS Congress, 26 June - 1 July 2004, Warsaw, POLAND, Eur. J.
Biochem., 271, Supplement, 30.
Anna Fogtman, Jacek Leluk, Bogdan Lesyng (2005) β-Spectrin consensus
sequence construction with variable threshold parameters; verification of their
usefulness; The International Conference Sequence-Structure-Function
Relationships; Theoretical and Experimental Approaches, Warsaw, Poland,
April 6-10 2005, Abstracts, 12.
References (8)
Jacek Kuśka, Jacek Leluk, Bogdan Lesyng (2005) The variability patterns in
kinase subfamilies; The International Conference Sequence-Structure-Function Relationships; Theoretical and Experimental Approaches, Warsaw,
Poland, April 6-10 2005, Abstracts, 23.
Elżbieta Gajewska, Jacek Leluk (2005) An approach to sequence similarity
significance estimation; Theoretical and Experimental Approaches, Warsaw,
Poland, April 6-10 2005, Abstracts, 14.