Structural Breakpoint Prediction and Its Application
Based on Dimers in Amino Acid Sequence
Yoshihide Makino
Nobuya Itoh
[email protected]
[email protected]
Department of Biotechnology, Faculty of Engineering, Toyama Prefectural University, 5180
Kurokawa, Imizu-shi, Toyama 939-0362, Japan
Keywords: dimer entropy, structural breakpoint, secondary structure prediction
1 Introduction
The relationship between sequence and structure is one of the major problems in protein science. Accurate
structure prediction is a key technology in the post-genome era. Furthermore, understanding the principles
of protein architecture should yield more powerful knowledge and techniques for advancing biotechnology
and medical science. We report here the development of a basic tool for protein architecture analysis.
Most proteins consist of specific secondary structure segments and the linkers connecting them. Because
the relative orientation of the secondary structure segments defines the fold of each protein, accurate
prediction of the extent of the secondary structures in a protein sequence provides valuable information.
We therefore developed a method for predicting structural breakpoints in an amino acid sequence. If the
breakpoints could be predicted perfectly, the polypeptide could be treated as a simpler sequence of
secondary structure segments and their linkers. We calculated the conformational entropy of each dimer in
the amino acid sequences of a protein structure database and found that the entropy value corresponds to
the frequency of the unstructured conformation. Based on this observation, we developed a procedure for
predicting structural breakpoints, which define the termini of secondary structure segments. We also
tested the application of the predicted breakpoints to secondary structure prediction.
2 Methods and Results
2.1 Entropy Calculation
The 6,486 structures in a non-redundant version of the Protein Data Bank (PDB) were used for the
following analysis. A dimer is a combination of two adjacent amino acids in an amino acid sequence; there
are 400 possible dimers in total [2]. For every dimer in the amino acid sequences of the above structures,
the dihedral angles φ at the (n+1)th position and ψ at the nth position were extracted. The dihedral
angles were plotted separately for each dimer species. The density of points in each 18 x 18 degree cell
of a square lattice on the plot was calculated, and the entropy value was calculated according to the
following equation for Shannon’s entropy
(H) [1]:

$$H = -\sum_{i}\sum_{j} p_{ij} \log p_{ij}$$

where $p_{ij}$ is the density value at the ith and jth index in the plot.
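The entropy calculation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of the natural logarithm, and the normalization of the density to sum to 1 are all assumptions, since the text does not specify them. Only the 18-degree cell size comes from the text.

```python
import math
from collections import Counter

BIN_DEG = 18                 # lattice cell size in degrees (from the text)
N_BINS = 360 // BIN_DEG      # 20 cells along each axis


def bin_index(angle_deg):
    """Map a dihedral angle in degrees to a lattice index in [0, N_BINS)."""
    return int(((angle_deg + 180.0) % 360.0) // BIN_DEG) % N_BINS


def dimer_entropy(angle_pairs):
    """Shannon entropy of (phi[n+1], psi[n]) pairs for one dimer species.

    Assumption: the density p_ij is the fraction of points falling in
    cell (i, j), and the natural logarithm is used.
    """
    counts = Counter((bin_index(phi), bin_index(psi)) for phi, psi in angle_pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no angle pairs given")
    # Empty cells contribute nothing, since p * log(p) tends to 0 as p tends to 0.
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

Under these assumptions, a dimer whose points all fall in a single cell has entropy 0, and a dimer spread evenly over all 400 cells has the maximum entropy, log 400.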
2.2 Entropy and Secondary Structure Distribution
Regions corresponding to the alpha helix and the beta strand were defined on the φ(n+1)-ψ(n) plot, and any
conformation that belonged to neither region was assigned to an ‘other’ region. The density within each
structural region was summed and plotted against entropy for all 400 dimers (Fig. 1). Although the summed
density of the alpha and beta regions was distributed almost uniformly, the summed density of the other
region increased as entropy increased. Thus, selecting dimers with a high entropy value would be expected
to give a relatively high probability of finding an unstructured conformation.
Figure 1: The entropy versus the density summation for each structural region.
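The per-region density summation might look like the sketch below. The region boundaries are not given in the text, so the rectangular boxes in `REGIONS` are purely hypothetical placeholders chosen for illustration, as are the function names.

```python
# HYPOTHETICAL region boxes on the phi(n+1)-psi(n) plot, in degrees:
# (phi_min, phi_max, psi_min, psi_max). The paper does not state its boundaries.
REGIONS = {
    "alpha": (-120.0, -30.0, -90.0, 0.0),
    "beta": (-180.0, -60.0, 90.0, 180.0),
}


def classify(phi, psi):
    """Return 'alpha', 'beta', or 'other' for one (phi[n+1], psi[n]) point."""
    for name, (phi_min, phi_max, psi_min, psi_max) in REGIONS.items():
        if phi_min <= phi < phi_max and psi_min <= psi < psi_max:
            return name
    return "other"


def region_fractions(angle_pairs):
    """Fraction of one dimer's points that falls in each structural region."""
    counts = {"alpha": 0, "beta": 0, "other": 0}
    for phi, psi in angle_pairs:
        counts[classify(phi, psi)] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no angle pairs given")
    return {name: count / total for name, count in counts.items()}
```

Plotting the `"other"` fraction of each dimer against its entropy would give the kind of comparison shown in Fig. 1.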
2.3 Breakpoint Prediction
Based on the above observation, we predicted the breakpoints. The entropy values were plotted along each
amino acid sequence, and the peaks were assigned as the predicted breakpoints. An observed breakpoint was
defined as the dimer immediately following three consecutive dimers with the same secondary structure; the
predictions and observations were then compared (Fig. 2). Two parameters were optimized: the entropy peak
detection threshold and the window size used for averaging the entropy values. The secondary structures
of the above PDB entries were then predicted with the optimized parameters, an averaging window size of 1
and an entropy peak detection threshold of 0.04. The best result, 61.7% concordance, was obtained by
assigning a uniform secondary structure to each segment between predicted breakpoints according to a
reasonable secondary structure prediction index.
Figure 2: The coverage of observed (left) and predicted (right) breakpoints.
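A rough sketch of the breakpoint procedure is given below. Several details are assumptions rather than the authors' definitions: the threshold is read here as the minimum amount by which a peak must exceed both neighbors, the averaging is a centered moving average, an observed breakpoint is read as a dimer whose state differs from three identical preceding states, and coverage is an exact position match. All function names are hypothetical.

```python
def smooth(values, window=1):
    """Centered moving average; window=1 leaves the values unchanged."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def predict_breakpoints(entropies, window=1, threshold=0.04):
    """Indices where the smoothed entropy peaks above both neighbors.

    Assumption: 'threshold' is the minimum excess over the higher neighbor.
    """
    s = smooth(entropies, window)
    return [i for i in range(1, len(s) - 1)
            if s[i] - max(s[i - 1], s[i + 1]) >= threshold]


def observed_breakpoints(states):
    """Indices whose state differs from three identical preceding states.

    'states' holds one secondary structure label per dimer, e.g. 'H', 'E', 'C'.
    """
    return [i for i in range(3, len(states))
            if states[i - 3] == states[i - 2] == states[i - 1] != states[i]]


def coverage(reference, candidates):
    """Fraction of reference breakpoints that also appear in candidates."""
    if not reference:
        return 0.0
    hits = set(candidates)
    return sum(1 for r in reference if r in hits) / len(reference)
```

Calling `coverage(observed, predicted)` and `coverage(predicted, observed)` gives the two directions of comparison corresponding to the two panels of Fig. 2.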
3 Discussion
We showed that the breakpoints of the secondary structure can be reasonably predicted from the entropy
values of amino acid dimers. The entropy value is specific to each dimer: exchanging the first and second
amino acids often changed the value. The high coverage of the observed breakpoints by the predicted
breakpoints under the optimized conditions suggests that the entropy value of dimers can serve as a good
index. The relatively moderate coverage of the predicted breakpoints by the observed breakpoints was
caused mainly by overprediction of breakpoints within alpha helices. The difference between observation
and prediction could be overcome by testing structural consistency at the three-dimensional level. A
procedure that improves the accuracy of breakpoint prediction through such a structural consistency check
is now under development.
References
[1] Jurkowski, W., Brylinski, M., Konieczny, L., Wiśniowski, Z., and Roterman, I., Conformational subspace
in simulation of early-stage protein folding, Proteins: Structure, Function, and Bioinformatics,
55:115-127, 2004.
[2] Sudarsanam, S., Dubose, R.F., March, C.J., and Srinivasan, S., Modeling protein loops using a φi+1, ψi
dimer database, Protein Science, 4:1412-1420, 1995.