Chem-Bio Informatics Journal, Vol. 3, No. 4, pp.194-200(2003)
Common Pattern of Coarse-Grained Charge Distribution of
Structurally Analogous Proteins
Kenichiro Imai* and Shigeki Mitaku
Nagoya University, Graduate School of Engineering, Department of Applied Physics, Nagoya,
Chikusa-ku, Furocho, 464-8606, Japan
*E-mail: [email protected]
(Received November 5, 2003; accepted December 22, 2003; published online December 31, 2003)
Abstract
Structurally analogous protein pairs with low sequence identity, such as analogues and remote homologues, make up a large fraction of structurally similar pairs, which complicates the relationship between sequence and structure. To obtain clues for clarifying this intricate relationship, we developed a method to analyze the coarse-grained charge distribution in an amino acid sequence and analyzed the charge distribution patterns of structurally similar protein pairs with sequence identities lower than 20%. We found two types of pairs: those with similar patterns of charge distribution and those with inverted charge distributions. This finding suggests that the charge distribution in a sequence may be a good parameter for clustering the structures of analogues and remote homologues. The possibility of automatic fold recognition is discussed on the basis of a quantitative comparison of charge distribution patterns.
Key Words: analogue, remote homologue, charge distribution, structural biology, bioinformatics
Area of Interest: Bioinformatics and Bio computing
1. Introduction
Proteins with high amino acid sequence similarity generally adopt similar structures. The majority of structurally similar pairs, however, have low sequence identity (less than 20%) [1][2]. Such pairs with low sequence identity are classified as remote homologues or analogues according to their functional similarity [3][4]. The existence of protein pairs with low sequence homology but high structural similarity complicates the analysis of the relationship between sequence and structure. If well-defined physical parameters can be used to model this intricate relationship, they will provide a new method for annotating orphan genes in a genome.
Copyright 2003 Chem-Bio Informatics Society
http://www.cbi.or.jp

One efficient way of clustering protein shapes is the so-called coarse-graining of physicochemical parameters of amino acid sequences. For example, a hydropathy plot is a coarse-graining of hydropathy values [5], and transmembrane regions in an amino acid sequence correspond well to the peaks in hydropathy profiles. Up to now, however, similar coarse-graining approaches have rarely been applied to soluble proteins. Sippl and co-workers reported a prediction system for protein fold recognition [6]; their system uses effective potentials between amino acids, although the physical meaning of these effective potentials is not clear.
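The hydropathy plot mentioned above is, in essence, a sliding-window average of per-residue hydropathy values. A minimal sketch using the Kyte-Doolittle scale [5] follows; the default window of 19 residues is an assumption (a width often used for transmembrane segments), and non-standard residue letters are simply rejected.

```python
# Kyte-Doolittle hydropathy scale [5], one-letter residue codes
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Average hydropathy over every run of `window` consecutive residues.

    Returns len(seq) - window + 1 values (empty if the sequence is shorter
    than the window); in a plot each value is placed at the window centre.
    Non-standard residue letters raise KeyError.
    """
    values = [KD[aa] for aa in seq.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    for k in range(window, len(values)):
        total += values[k] - values[k - window]  # slide the window by one residue
        profile.append(total / window)
    return profile
```

Peaks of this coarse-grained profile, rather than individual residue values, are what line up with transmembrane segments.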
In this study, we focused on the net charges of amino acid sequences and found very similar patterns of coarse-grained charge distribution for pairs of analogues and remote homologues. In general, electrostatic interactions in water are much weaker than in a nonpolar environment because of the high dielectric constant (relative permittivity) of water. However, it is well known that electrostatic interactions between large colloidal particles carrying large electrical charges are very important for their stability. By analogy with colloidal particles, electrostatic interactions may be an important factor in protein folding when an amino acid sequence carries clusters of electrical charges. We therefore devised a charge density plot (CD plot) for estimating the coarse-grained charge distribution of an amino acid sequence. By comparing the CD plots of analogous protein pairs, we found that the charge distributions of several pairs were very similar or showed inverted patterns.
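The dielectric argument above can be made quantitative with the Bjerrum length, the separation at which the Coulomb energy of two elementary charges equals the thermal energy k_B T. In this short sketch the relative permittivities (78.5 for water at 25 °C, 2.0 for a nonpolar medium) are assumed textbook values, not values from this study.

```python
import math

# Physical constants in SI units
E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
K_B = 1.380649e-23           # Boltzmann constant, J/K


def bjerrum_length_nm(eps_r, temperature=298.15):
    """Distance (nm) at which two elementary charges interact with energy k_B T."""
    lb = E_CHARGE ** 2 / (4 * math.pi * EPS0 * eps_r * K_B * temperature)
    return lb * 1e9


water = bjerrum_length_nm(78.5)    # water at 25 °C
nonpolar = bjerrum_length_nm(2.0)  # assumed value for a hydrocarbon-like medium
print(f"water: {water:.2f} nm, nonpolar: {nonpolar:.1f} nm")  # water: 0.71 nm, nonpolar: 28.0 nm
```

The roughly 40-fold difference is simply the ratio of the two dielectric constants, since the Coulomb energy scales as 1/ε_r.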
2. Methods
2.1 Charge density plot (CD plot)
The charge density plot (CD plot) is a method in which the net electrical charge densities of polypeptide segments of various lengths are plotted in pseudo-color according to the following procedure, shown in Figure 1. (1) An amino acid sequence is transformed into a sequence of elementary charges, in which Lys, Arg and His are +1, Asp and Glu are −1, and all other residues are 0. The charge of His depends on pH, but the pattern of the CD plot did not depend strongly on the charge value assigned to His, because His residues are neither abundant nor clustered in most proteins. The charges at the N- and C-termini also had no effect on the pattern of the CD plot. (2) The net charge density of every segment from the i-th to the j-th residue of the amino acid sequence was calculated by the following equation.
$$CD(i,j) = CD(j,i) = \frac{1}{|j-i|+1}\sum_{k=i}^{j} C_T(k) \tag{1}$$
(3) CD(i,j) is represented by a pseudo-color and plotted at both positions (i,j) and (j,i). Here, C_T(k) is +1, 0, or −1 when the k-th residue carries a positive, no, or a negative charge, respectively. As shown in Figure 1, blue and red represent positive and negative charges, respectively.
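Steps (1) to (3) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses one-letter residue codes, 0-based indices (the paper numbers residues from 1), and prefix sums so that each segment sum in equation (1) costs constant time.

```python
# Step (1): residue charges as described above (Lys/Arg/His +1, Asp/Glu -1, others 0)
CHARGE = {"K": 1, "R": 1, "H": 1, "D": -1, "E": -1}


def charge_sequence(seq):
    """Map a one-letter amino acid sequence to the values C_T(k)."""
    return [CHARGE.get(aa, 0) for aa in seq.upper()]


def cd_matrix(seq):
    """Steps (2)-(3): symmetric N x N matrix of CD(i, j) from equation (1)."""
    c = charge_sequence(seq)
    n = len(c)
    # prefix[k] = C_T(0) + ... + C_T(k - 1)
    prefix = [0]
    for value in c:
        prefix.append(prefix[-1] + value)
    cd = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            density = (prefix[j + 1] - prefix[i]) / (j - i + 1)
            cd[i][j] = cd[j][i] = density  # plotted at both (i, j) and (j, i)
    return cd


cd = cd_matrix("MKKLLDEE")
print(cd[0][7])  # -0.125: net charge -1 spread over 8 residues
```

To render the pseudo-color plot, one option is matplotlib's `imshow` with the `bwr_r` colormap and limits of −1 to +1, so that positive densities appear blue and negative ones red, as in Figure 1.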
2.2 Comparison of two charge density plots
The comparison was represented by equation (2). We used the summation of the squares of
CD A (i, j ) − CD B (i, j ) or CD A (i, j ) + CD B (i, j ) to compare the charge density plots of two proteins
in which the suffixes A and B indicate two amino acid sequences. The inverse similarity of the
195
Chem-Bio Informatics Journal, Vol. 3, No. 4, pp.194-200(2003)
charge distribution can be estimated by CD A (i, j ) + CD B (i, j ) .
Figure 1. The procedures for the calculation of the CD plot
Each point of the CD plot is colored according to the charge density calculated by equation (1). Blue and red are the pseudo-colors for positive and negative charges, respectively. The position of a point (i, j) is given by the residue numbers i and j of the N- and C-terminal ends of a segment. The CD plot of interferon α-2A (1itf) is shown as an example. The rainbow bar to the right of the CD plot is colored in the order of the secondary structure elements.
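$$\langle S_{\pm} \rangle = \frac{\sum_{|i-j| \ge 20} \left( CD_A(i,j) \pm CD_B(i,j) \right)^2}{\sqrt{\sum_{|i-j| \ge 20} \left( CD_A(i,j) \right)^2 \sum_{|i-j| \ge 20} \left( CD_B(i,j) \right)^2}} \tag{2}$$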
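The similarity and the inversion of patterns are estimated by the parameters <S-> and <S+>, respectively: <S-> is small when the two CD plots are alike, and <S+> is small when one is the sign-inverted image of the other. Local properties were neglected; only segments longer than 20 residues (|i − j| ≥ 20) were used for the comparison.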
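A minimal sketch of the comparison, assuming both CD matrices were computed as in Section 2.1 (lists of lists) and reading the denominator of equation (2) as the square root of the product of the two sums of squares. The paper does not state how sequences of different length are compared, so, as a labeled assumption, the sketch uses only the overlapping top-left block.

```python
import math


def s_pm(cd_a, cd_b, min_sep=20):
    """Return (S_minus, S_plus) for two CD matrices, following equation (2).

    Only entries with |i - j| >= min_sep (segments longer than 20 residues)
    are summed. Assumption: matrices of different size are compared over
    their overlapping n x n block.
    """
    n = min(len(cd_a), len(cd_b))
    diff_sq = sum_sq = norm_a = norm_b = 0.0
    for i in range(n):
        for j in range(n):
            if abs(i - j) < min_sep:
                continue  # local (short-segment) properties are neglected
            a, b = cd_a[i][j], cd_b[i][j]
            diff_sq += (a - b) ** 2
            sum_sq += (a + b) ** 2
            norm_a += a * a
            norm_b += b * b
    denom = math.sqrt(norm_a * norm_b)
    if denom == 0.0:
        raise ValueError("no nonzero charge density in the compared region")
    return diff_sq / denom, sum_sq / denom
```

Under this normalization, identical plots give S- = 0 and S+ = 4, and exactly inverted plots give the reverse, so the two parameters can be read on the same scale.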
Figure 2. CD plots of (a) thiamin phosphate synthase (PDB id 2tpsA) and (b) KDPG aldolase (1fq0A), both of which adopt a TIM barrel fold.
2.3 Dataset of analogous protein pairs
We constructed a dataset of protein pairs with pairwise sequence identity lower than 20% and structural similarity with an RMSD of less than 4.0 Å from the DBAli database [7]. We also removed pairs whose sequence lengths differed by more than 30 residues and whose structural similarity was only partial, which finally left 256 protein pairs.
Information on the 3D structures and the sequences was taken from the PDB, and DS ViewerPro 5.0 (Accelrys) was used for graphical representation.
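The numerical selection criteria can be expressed as a simple filter. This sketch operates on a hypothetical `AlignedPair` record whose field names are illustrative only; it does not reflect the DBAli file format or any DBAli API, and the partial-similarity criterion is not modelled because no quantitative definition is given.

```python
from dataclasses import dataclass


@dataclass
class AlignedPair:
    """Hypothetical record for one structurally aligned pair."""
    id_a: str
    id_b: str
    seq_identity: float  # percent
    rmsd: float          # angstroms
    len_a: int           # residues
    len_b: int           # residues


def select_pairs(pairs, max_identity=20.0, max_rmsd=4.0, max_len_diff=30):
    """Keep pairs with identity < 20 %, RMSD < 4.0 Å and length difference <= 30."""
    return [
        p for p in pairs
        if p.seq_identity < max_identity
        and p.rmsd < max_rmsd
        and abs(p.len_a - p.len_b) <= max_len_diff
    ]
```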
Figure 3. CD plots of (a) pyrrolidone carboxyl peptidase (1a2zA) and (b) purine nucleoside phosphorylase (1ecpB).
3. Results and Discussion
The CD plots of protein pairs with low sequence homology and high structural similarity showed that several pairs have similar charge distributions. For example, thiamin phosphate synthase (PDB id 2tpsA) and KDPG aldolase (1fq0A), which both adopt a TIM barrel fold, showed very similar charge distributions despite their low pairwise sequence identity (Figure 2). The structural similarity of this pair cannot be identified from the amino acid sequences alone. However, when each sequence is transformed into a sequence of net electric charges, the similarity becomes visible, suggesting that the coarse-grained charge distribution may be a good physical parameter of protein folding.
We also found several pairs with an inverted pattern of charge distribution. Pyrrolidone carboxyl peptidase (1a2zA) and purine nucleoside phosphorylase (1ecpB) have very similar structures, but their CD plots show inverted profiles of positive and negative charge distribution (Figure 3). Inverted charge distributions in structurally similar proteins are physically reasonable: the electrostatic interaction between two charges depends on the product of their signs, so the electrostatic forces remain the same when the signs of all charge clusters are inverted.
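The sign-inversion argument can be checked numerically with a minimal sketch. The charges and 3D positions below are hypothetical, and the pairwise Coulomb energy is computed in arbitrary units with the constant prefactor and dielectric screening omitted as simplifying assumptions; `math.dist` requires Python 3.8 or later.

```python
import itertools
import math


def coulomb_energy(charges, positions):
    """Sum of q_i * q_j / r_ij over all pairs (arbitrary units, prefactor omitted)."""
    energy = 0.0
    for (qi, ri), (qj, rj) in itertools.combinations(zip(charges, positions), 2):
        energy += qi * qj / math.dist(ri, rj)
    return energy


charges = [1, 1, -1, 0, -1]  # hypothetical charge clusters
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0),
             (1.0, 1.0, 1.0), (3.0, 0.0, 1.0)]
inverted = [-q for q in charges]
print(coulomb_energy(charges, positions) == coulomb_energy(inverted, positions))  # True
```

The same flip turns every C_T(k) into −C_T(k) and hence every CD(i, j) into −CD(i, j), which is exactly the situation the <S+> parameter is designed to detect.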
Figure 4. Histograms of <S-> (a) and <S+> (b), which evaluate the similarity and the inversion of CD plot patterns, respectively, for 256 structurally analogous pairs.
The bottom graphs are the histograms of <S-> and <S+> for all 256 structurally analogous pairs, and the top graphs are the histograms of <S-> and <S+> for the pairs that show distinct similarity and distinct inversion of the pattern, respectively.
Figure 4 shows the histograms of <S-> and <S+>. The upper graphs are the histograms for protein pairs with distinctly similar (or inverted) charge density maps, and the lower graphs are the histograms for all 256 pairs. These histograms show that <S-> and <S+> capture the similarity and the inversion of charge distributions, respectively. This suggests that protein fold recognition may be achievable from coarse-grained charge distributions.
Many genes in a genome cannot be annotated by sequence alignment programs, and many of them are likely to encode remote homologues and analogues. However, comparing amino acid sequences alone yields no information about such proteins. Therefore, comparing the similar or inverted charge profiles of protein pairs with low sequence homology and high structural similarity may give new insight into the mechanism of protein folding. Furthermore, this common charge distribution may provide a novel method for the annotation of genes on a genome-wide scale.
References
[1] B. Rost, Folding & Design, 2, S19-S24 (1997).
[2] J. M. Sauder, J. W. Arthur, R. L. Dunbrack, Jr., Proteins, 40, 6-22 (2001).
[3] R. B. Russell, G. J. Barton, J. Mol. Biol., 244, 332-350 (1994).
[4] R. B. Russell, M. A. S. Saqi, R. A. Sayle, P. A. Bates, M. J. E. Sternberg, J. Mol. Biol., 269, 423-439 (1997).
[5] J. Kyte and R. F. Doolittle, J. Mol. Biol., 157, 105-132 (1982).
[6] H. Floeckner, M. Braxenthaler, P. Lackner, M. Jaritz, M. Ortner and M. J. Sippl, Proteins, 23, 376-386 (1995).
[7] M. A. Marti-Renom, V. A. Ilyin, A. Sali, Bioinformatics, 17, 746-747 (2001).