Matching Protein β-Sheet Partners by Feedforward and Recurrent Neural Networks
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB 2000), pp. 25-36
P. Baldi, G. Pollastri, C. Anderson, and S. Brunak
Cho, Dong-Yeon
Introduction

Prediction of the Secondary Structure of Proteins
• Understanding their three-dimensional conformations
• α-helices are built up from one contiguous region of the polypeptide chain.
• β-sheets are built up from a combination of several disjoint regions.

Previous Studies
• The best existing methods for predicting protein secondary structure achieve prediction accuracy in the 75-77% range.
• β-sheet is almost invariably the weakest category in terms of correct percentages.

Prediction of Amino Acid Partners in β-sheets
Data Preparation

Selecting the Data
• 826 protein chains from the PDB select list of June 1998

Assigning -sheets Partners
A2-B2
A3-B3
B2-C2
B3-C3
C2-D2
C3-D3
Statistical Analysis

First Order Statistics
• The frequency of occurrence of each amino acid
    • General amino acid frequencies in the data
    • Amino acid frequencies in β-sheets
• The ratio of the frequencies in β-sheets over the frequencies in the data
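
The first order statistics above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the toy sequence, and the per-residue `in_sheet` flags are all hypothetical.

```python
from collections import Counter

def frequencies(residues):
    """Relative frequency of each amino acid in an iterable of one-letter codes."""
    counts = Counter(residues)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.items()}

def sheet_ratio(sequence, in_sheet):
    """Ratio of each amino acid's frequency in beta-sheets to its overall frequency."""
    overall = frequencies(sequence)
    sheet = frequencies(aa for aa, flag in zip(sequence, in_sheet) if flag)
    return {aa: sheet.get(aa, 0.0) / overall[aa] for aa in overall}

# Toy data (hypothetical, not taken from the paper's data set).
sequence = "AVVAGVIA"
in_sheet = [False, True, True, False, False, True, True, False]
ratios = sheet_ratio(sequence, in_sheet)
```

A ratio above 1 means the residue is over-represented in β-sheets relative to the data as a whole; a ratio below 1 means it is under-represented.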

Second Order Statistics
• The conditional probabilities P(X|Y) of observing amino acid X given that its partner in a β-sheet is Y
• Logo representation
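
One way to estimate P(X|Y) from a list of partner pairs is sketched below. It assumes each pair is unordered, so every pair is counted in both directions; the function name and the toy pairs are hypothetical.

```python
from collections import Counter, defaultdict

def conditional_partner_probs(pairs):
    """Estimate P(X | Y): probability of residue X given that its partner is Y.

    Assumption: pairs are unordered, so (a, b) also counts as (b, a).
    """
    joint = Counter()
    for a, b in pairs:
        joint[(a, b)] += 1
        joint[(b, a)] += 1
    # Total number of observations in which the partner is Y.
    marginal = Counter()
    for (x, y), n in joint.items():
        marginal[y] += n
    probs = defaultdict(dict)
    for (x, y), n in joint.items():
        probs[y][x] = n / marginal[y]
    return dict(probs)

# Toy partner pairs (hypothetical).
pairs = [("V", "I"), ("V", "V"), ("I", "L"), ("V", "I")]
p = conditional_partner_probs(pairs)
# p["I"]["V"] is the estimate of P(V | partner is I).
```

Each inner dictionary `p[Y]` is a distribution over X and sums to 1, which is the quantity a logo representation would display for each partner Y.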

Length Distribution
• Interval distances between paired β-strands, measured in residue positions along the chain
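
A possible way to tabulate this distribution is sketched below. The slides do not give the exact definition of the interval, so the sketch assumes it is the number of residues lying between the end of the first strand and the start of the second; the strand coordinates are hypothetical.

```python
from collections import Counter

def strand_interval(strand_a, strand_b):
    """Residues between two strands given as inclusive (start, end) positions.

    Assumption: the interval is the gap between the end of the earlier strand
    and the start of the later one.
    """
    first, second = sorted([strand_a, strand_b])
    return max(0, second[0] - first[1] - 1)

def interval_histogram(paired_strands):
    """Count how often each interval length occurs among paired strands."""
    return Counter(strand_interval(a, b) for a, b in paired_strands)

# Toy paired strands (hypothetical positions along a chain).
paired = [((3, 7), (12, 16)), ((12, 16), (21, 25)), ((3, 7), (40, 44))]
hist = interval_histogram(paired)
```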
Artificial Neural Network Architecture

Feedforward Neural Network
• Large input windows
    • They tend to dilute the sparse information present in the input that is really relevant for the prediction.
• Two-window approach
    • One can either provide the distance information as a third input to the system, or train a different architecture for each distance type.
• The architecture
    • Two input windows of length W
    • The number D of amino acids separating the two windows is also given as an input unit to the architecture, with scaled activity D/100.
    • The goal is to output a probability reflecting whether the two amino acids located at the center of each window are partners or not.
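
The two-window input described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: one-hot encoding over 20 amino acids, an odd window length, a single tanh hidden layer of hypothetical size, and random untrained weights are all choices made here for the example.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot_window(sequence, center, width):
    """One-hot encode `width` residues centred on `center` (width assumed odd).

    Positions that fall off the chain are encoded as all zeros.
    """
    half = width // 2
    vec = []
    for pos in range(center - half, center + half + 1):
        code = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            idx = AMINO_ACIDS.find(sequence[pos])
            if idx >= 0:
                code[idx] = 1.0
        vec.extend(code)
    return vec

def encode_pair(sequence, i, j, width):
    """Two windows plus one extra unit carrying the scaled distance D/100."""
    d = abs(i - j)
    return (one_hot_window(sequence, i, width)
            + one_hot_window(sequence, j, width)
            + [d / 100.0])

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer (tanh, an assumption) and a sigmoid output in (0, 1)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))

rng = random.Random(0)
WIDTH = 7    # window length W
HIDDEN = 8   # hypothetical hidden layer size
sequence = "MKVLAVIGTTGSGKSTLARELAK"  # hypothetical chain
x = encode_pair(sequence, 3, 15, WIDTH)
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in x] for _ in range(HIDDEN)]
b_hidden = [0.0] * HIDDEN
w_out = [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)]
p_partner = forward(x, w_hidden, b_hidden, w_out, 0.0)
```

With W = 7 the input vector has 2 × 7 × 20 + 1 = 281 units, the last of which holds D/100. The output is only meaningful after training; here it merely shows the shape of the computation.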

Recurrent Neural Network
• Bi-directional recurrent neural network (BRNN)
    • Input layer
    • Forward and backward Markov chain
    • Output layer
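
The three components listed above can be illustrated with a generic BRNN forward pass. This is a simplified sketch, not the architecture from the paper: it produces one output per position, whereas partner prediction needs an output for a pair of positions, and the state size, tanh units, and random weights are assumptions.

```python
import math
import random

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def step(w_in, w_rec, x, state):
    """New chain state from the current input and the neighbouring state."""
    a = matvec(w_in, x)
    r = matvec(w_rec, state)
    return [math.tanh(ai + ri) for ai, ri in zip(a, r)]

def brnn_forward(inputs, params):
    """Forward chain (left to right), backward chain (right to left), output."""
    n, length = params["n_state"], len(inputs)
    fwd, bwd = [None] * length, [None] * length
    state = [0.0] * n
    for t in range(length):                  # forward chain
        state = step(params["w_in_f"], params["w_rec_f"], inputs[t], state)
        fwd[t] = state
    state = [0.0] * n
    for t in reversed(range(length)):        # backward chain
        state = step(params["w_in_b"], params["w_rec_b"], inputs[t], state)
        bwd[t] = state
    outputs = []
    for t in range(length):                  # output layer sees input and both chains
        features = inputs[t] + fwd[t] + bwd[t]
        z = sum(w * f for w, f in zip(params["w_out"], features))
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs

def random_params(n_in, n_state, rng):
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {
        "n_state": n_state,
        "w_in_f": mat(n_state, n_in), "w_rec_f": mat(n_state, n_state),
        "w_in_b": mat(n_state, n_in), "w_rec_b": mat(n_state, n_state),
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_in + 2 * n_state)],
    }

rng = random.Random(0)
params = random_params(n_in=3, n_state=4, rng=rng)
seq_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
seq_b = seq_a[:-1] + [[0.0, 1.0, 0.0]]     # differs only at the last position
out_a = brnn_forward(seq_a, params)
out_b = brnn_forward(seq_b, params)
```

Because the backward chain carries information from right to left, changing only the last input also changes the output at the first position, which a purely forward recurrent network could not do.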
Experiments and Results

Data
• Randomly split the data: 2/3 for training and 1/3 for test
• Extremely unbalanced
    • At each epoch, all the 37008 positive examples are presented with 37008 randomly selected negative examples.
• The total balanced percentage is the average of the two percentages obtained on the positive and negative examples.
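
The balancing scheme and the balanced percentage can be sketched as below. The function names are hypothetical, and sampling the negatives without replacement within an epoch is an assumption; the slides only say they are randomly selected.

```python
import random

def balanced_epoch(positives, negatives, rng):
    """All positives plus an equally sized random sample of negatives, shuffled."""
    sampled = rng.sample(negatives, len(positives))
    batch = [(x, 1) for x in positives] + [(x, 0) for x in sampled]
    rng.shuffle(batch)
    return batch

def balanced_percentage(labels, predictions):
    """Average of percent correct on positives and percent correct on negatives."""
    pos = [p for y, p in zip(labels, predictions) if y == 1]
    neg = [p for y, p in zip(labels, predictions) if y == 0]
    pos_pct = 100.0 * sum(1 for p in pos if p == 1) / len(pos)
    neg_pct = 100.0 * sum(1 for p in neg if p == 0) / len(neg)
    return (pos_pct + neg_pct) / 2.0

# Toy example (hypothetical data).
rng = random.Random(0)
batch = balanced_epoch(["p1", "p2"], ["n1", "n2", "n3", "n4", "n5"], rng)
score = balanced_percentage([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

In the toy example the classifier gets 50% of the positives and 75% of the negatives right, so the balanced percentage is 62.5, while plain accuracy (4 of 6) would be weighted toward the larger negative class.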

Results
• Feedforward neural network
    • The best architecture
    • The predicted second order statistics
    • Five-fold cross validation
• BRNN architecture
    • Three values (7, 9, and 11) are used as the size of the two input windows.
    • Length 7 again yields the best performance.
    • Five-fold cross validation
• Ensemble architecture
    • The ensemble of 3 BRNNs
    • Five-fold cross validation
• Summary of all the five-fold cross validation results
• Profile approach
    • Profiles were used as input to the artificial neural network.
    • The overall performance is comparable, but not any better.
    • Profiles may provide more robust first order statistics, but weaker intrasequence correlations.
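
The five-fold cross validation used throughout these results can be sketched as a simple index splitter. Splitting at the level of protein chains, and the function name itself, are assumptions; the slides do not say how the folds were formed.

```python
import random

def k_fold_indices(n_items, k, rng):
    """Yield (train, test) index lists; each item is in exactly one test fold."""
    indices = list(range(n_items))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 826 chains as in the data set; the fold assignment itself is hypothetical.
splits = list(k_fold_indices(826, 5, random.Random(0)))
```

Each of the five runs trains on four folds and tests on the remaining one, and the reported figure is then an average over the five test folds.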
Discussion

• We have developed a neural network architecture that predicts β-sheet amino acid partners with a balanced performance close to 84% correct prediction.
• It is insufficient by itself to reliably predict strand pairing because of the large number of false positive predictions.
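
A short calculation shows why a balanced performance near 84% can still leave many false positives when negatives vastly outnumber positives. The per-class rates of 84% and the ratio of 100 negatives per positive are hypothetical values chosen for illustration; the slides report only the balanced figure.

```python
def precision(sensitivity, specificity, negatives_per_positive):
    """Fraction of predicted partners that are true partners."""
    true_pos = sensitivity                                   # per positive example
    false_pos = (1.0 - specificity) * negatives_per_positive
    return true_pos / (true_pos + false_pos)

balanced_case = precision(0.84, 0.84, 1)       # equal numbers of both classes
unbalanced_case = precision(0.84, 0.84, 100)   # hypothetical 100:1 imbalance
```

With equal class sizes, 84% of the predicted partners would be correct. At the hypothetical 100:1 ratio, fewer than 5% would be, because 16% of a very large negative class outweighs 84% of a small positive one.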

Some directions for future work
• Profiles on the BRNNs
• Reduce the number of false positive predictions
• Improve the quality of the match
• Use of raw sequence information in addition to profiles
• β-sheet predictor
• Various combinations of the present architectures