Structural Breakpoint Prediction and Its Application Based on Dimers in Amino Acid Sequence

Yoshihide Makino, Nobuya Itoh
Department of Biotechnology, Faculty of Engineering, Toyama Prefectural University, 5180 Kurokawa, Imizu-shi, Toyama 939-0362, Japan

Keywords: dimer entropy, structural breakpoint, secondary structure prediction

1 Introduction

The relationship between sequence and structure is one of the major problems in protein science. Accurate structure prediction is a key technology in the post-genome era, and understanding the principles of protein architecture should yield more powerful knowledge and techniques for biotechnology and medicinal science. Here we report the development of a basic tool for analyzing protein architecture.

Most proteins consist of specific secondary structure segments and the linkers that connect them. Because the relative orientation of the secondary structure segments defines the fold of each protein, accurately predicting the extent of each secondary structure in the sequence provides valuable information. We therefore developed a method for predicting structural breakpoints in an amino acid sequence. If the breakpoints could be predicted perfectly, the polypeptide could be treated as a simpler sequence of secondary structure segments and their linkers.

We calculated the conformational entropy of each amino acid dimer from the sequences and structures in a protein structure database and found that the entropy value corresponds to the frequency of unstructured conformations. Based on this observation, we developed a procedure for predicting structural breakpoints, which define the termini of secondary structure segments. We also tested the use of the predicted breakpoints in secondary structure prediction.
2 Methods and Results

2.1 Entropy Calculation

The 6,486 structures in the non-redundant version of the Protein Data Bank (PDB) were used for the following analysis. A dimer is a combination of two adjacent amino acids in an amino acid sequence; there are 20 × 20 = 400 dimers in total [2]. For every dimer in these structures, the dihedral angles φ of residue n+1 and ψ of residue n were extracted, and the (φ(n+1), ψ(n)) pairs were plotted separately for each dimer species. The plot was divided into a lattice of 18° × 18° squares (20 intervals per 360° axis), the density of points in each square was calculated, and the entropy was calculated using Shannon's entropy (H) [1]:

$$H = -\sum_{i=1}^{20}\sum_{j=1}^{20} p_{ij} \log p_{ij}$$

where p_{ij} is the density value at the ith and jth lattice indices in the plot.

2.2 Entropy and Secondary Structure Distribution

Regions corresponding to the alpha helix and the beta strand were defined on the φ(n+1)-ψ(n) plot, and conformations belonging to neither were assigned to an 'other' region. The sum of the densities within each structural region was calculated and plotted for all 400 dimers (Fig. 1). Whereas the sums for the alpha and beta regions were distributed almost uniformly, the sum for the 'other' region increased as the entropy increased. Thus, dimers with a high entropy value would be expected to adopt unstructured conformations with relatively high probability.

Figure 1: The entropy versus the density summation for each structural region.

2.3 Breakpoint Prediction

Based on this observation, we predicted breakpoints by plotting the entropy values along each amino acid sequence and assigning the peaks as predicted breakpoints. An observed breakpoint was defined as the dimer immediately following three consecutive dimers with the same secondary structure. The predicted and observed breakpoints were then compared (Fig. 2).

Figure 2: The coverage of observed (left) and predicted (right) breakpoints.
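The entropy calculation of 2.1 and the peak-based prediction of 2.3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the input format (per-chain sequence with per-residue φ and ψ lists, None where an angle is undefined), the natural logarithm, the moving-average smoothing, and the peak rule (the smoothed entropy must exceed both neighbours by at least the threshold) are assumptions, and all function names are hypothetical. The observed-breakpoint helper encodes one reading of the definition above, with one secondary structure state per dimer.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

BIN_DEG = 18              # 18 x 18 degree lattice cells, as in Section 2.1
N_BINS = 360 // BIN_DEG   # 20 cells per axis, 400 cells in total


def angle_bin(angle: float) -> int:
    """Map a dihedral angle in degrees to a lattice index 0..N_BINS-1 (periodic)."""
    return int(((angle + 180.0) % 360.0) // BIN_DEG)


def dimer_entropy(pairs: Sequence[Tuple[float, float]]) -> float:
    """Shannon entropy H = -sum_ij p_ij ln p_ij of one dimer's (phi(n+1), psi(n)) points.

    p_ij is the fraction of points in lattice cell (i, j); empty cells contribute 0.
    The natural log is an assumption: the source does not state the base.
    """
    counts = Counter((angle_bin(phi), angle_bin(psi)) for phi, psi in pairs)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical record: (sequence, phi per residue, psi per residue).
Chain = Tuple[str, List[Optional[float]], List[Optional[float]]]


def entropy_table(chains: Iterable[Chain]) -> Dict[str, float]:
    """Build {dimer: H}. Dimers are ordered ("AG" != "GA");
    angles given as None (e.g. at chain ends) are skipped."""
    points: Dict[str, List[Tuple[float, float]]] = {}
    for seq, phi, psi in chains:
        for n in range(len(seq) - 1):
            if phi[n + 1] is None or psi[n] is None:
                continue
            points.setdefault(seq[n:n + 2], []).append((phi[n + 1], psi[n]))
    return {dimer: dimer_entropy(p) for dimer, p in points.items()}


def predict_breakpoints(seq: str, table: Dict[str, float],
                        window: int = 1, threshold: float = 0.04) -> List[int]:
    """Return dimer indices n (residues n, n+1) at peaks of the smoothed entropy profile.

    Smoothing is a centred moving average of odd width `window` (assumed); a peak is
    assumed to be a position whose smoothed value exceeds both neighbours by at least
    `threshold`. Dimers missing from the table (e.g. nonstandard residues) get 0.
    """
    raw = [table.get(seq[n:n + 2], 0.0) for n in range(len(seq) - 1)]
    half = window // 2
    smooth = []
    for n in range(len(raw)):
        seg = raw[max(0, n - half):n + half + 1]
        smooth.append(sum(seg) / len(seg))
    return [n for n in range(1, len(smooth) - 1)
            if smooth[n] - smooth[n - 1] >= threshold
            and smooth[n] - smooth[n + 1] >= threshold]


def observed_breakpoints(states: str, run: int = 3) -> List[int]:
    """One reading of the observed-breakpoint rule, with one state per dimer
    (e.g. 'H', 'E', 'C'): index n qualifies if the `run` preceding dimers share
    a state and dimer n has a different one."""
    return [n for n in range(run, len(states))
            if len(set(states[n - run:n])) == 1 and states[n] != states[n - 1]]
```

For example, with the toy table {"AA": 0.1, "AG": 1.0, "GA": 0.1}, `predict_breakpoints("AAAGAAA", table)` flags only dimer index 2 (the AG dimer), and `observed_breakpoints("HHHHEEEC")` returns [4, 7]. The defaults `window=1` and `threshold=0.04` mirror the optimized values reported below, but how the authors applied that threshold is not stated, so the peak rule here is only one plausible choice.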
Two parameters were optimized: the threshold for entropy peak detection and the window size used to average the entropy values. The secondary structures of the above PDB entries were then predicted with the optimized parameters (averaging window size 1, peak detection threshold 0.04). The best result, 61.7% concordance, was obtained by assigning a uniform secondary structure to each segment between predicted breakpoints using an appropriate secondary structure prediction index.

3 Discussion

We have shown that the breakpoints of secondary structures can be reasonably predicted from the entropy values of amino acid dimers. The entropy value was specific to each ordered dimer: exchanging the first and second amino acids often changed the value. The high coverage of observed breakpoints by predicted breakpoints under the optimized conditions suggests that dimer entropy is a good index for this purpose. The relatively moderate coverage of predicted breakpoints by observed breakpoints was mainly caused by overprediction of breakpoints within alpha helices. This discrepancy between observation and prediction could be overcome by testing structural consistency at the three-dimensional level, and a procedure that improves breakpoint prediction accuracy through such a consistency check is now in development.

References

[1] Jurkowski, W., Brylinski, M., Konieczny, L., Wiśniowski, Z., and Roterman, I., Conformational subspace in simulation of early-stage protein folding, Proteins: Structure, Function, and Bioinformatics, 55:115-127, 2004.
[2] Sudarsanam, S., Dubose, R.F., March, C.J., and Srinivasan, S., Modeling protein loops using a φi+1, ψi dimer database, Protein Science, 4:1412-1420, 1995.