Download Prediction of Protein Structure Using Backbone Fragment

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Community fingerprinting wikipedia , lookup

Magnesium transporter wikipedia , lookup

Multi-state modeling of biomolecules wikipedia , lookup

Gene expression wikipedia , lookup

Genetic code wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Expression vector wikipedia , lookup

Biochemistry wikipedia , lookup

G protein–coupled receptor wikipedia , lookup

Interactome wikipedia , lookup

Western blot wikipedia , lookup

Protein wikipedia , lookup

Metalloprotein wikipedia , lookup

Point mutation wikipedia , lookup

Proteolysis wikipedia , lookup

Protein–protein interaction wikipedia , lookup

Ancestral sequence reconstruction wikipedia , lookup

Two-hybrid screening wikipedia , lookup

Transcript
Prediction of Protein Structure Using Backbone Fragment
Library and a Multilayered Learning Algorithm
Pramod P Wangikar
†
Department of Chemical Engineering, Indian Institute of Technology, Bombay, Powai
Mumbai 400076 INDIA; Email: [email protected]
The current approaches for protein structure prediction rely on (i) homology of the entire
protein sequence with a template structure or (ii) ab initio prediction methods. These
methods suffer from the disadvantages of (a) lack of homologous template structure for a
majority of new sequences or (b) untractably large conformational search space for ab
initio predictions. We propose a method that exploits the correlation between
conformation and sequence features in the local region. For this purpose, we first
constructed a library of local conformation classes or backbone fragments. We use
octapeptide as an arbitrary unit of local conformation. Using a “geometric invariant
based approach”1,2, we show that the octapeptide fragment structures can be clustered
into 46 structural classes. The protein 3-D structure can now be described as a sequence
of backbone fragments or structure labels. Note that the average 3-D structure for each
of the 46 structure labels is available and that the 3-D structure of a protein can be
reconstructed from the sequence of structure labels. Analysis of the sequence features
reveals the presence of sequence-structure relationship in local regions, which can be
exploited to predict local conformations of protein based on its amino acid
sequence. We have formulated this problem as a classical text segmentation problem
using Conditional Random Field (CRF). CRF considers a Markov random field (Y)
globally conditioned on another random field (X). In this case, Y is the sequence of
structure labels while X is the amino acid sequence. The accuracy of the CRF predictions
was augmented by using Support Vector Machine (SVM) as an additional layer of
learning. In this layered algorithm, CRF manages the task of segmentation while SVM
provides input for labeling of the segments. The model was trained with 146 high
resolution x-ray crystal structures to obtain over 30% prediction accuracy for 60 unseen
sequences. We believe that the prediction accuracy can be improved further by fine
tuning the model parameters or by using a larger data set. We argue that the results of
this prediction algorithm can be used to complement the efforts of homology modeling as
well as ab initio predictions.
Tendulkar et al (2005) “A geometric invariant-based framework for the analysis of
protein conformational space” Bioinformatics, 21, 3622-3628
Tendulkar et al (2004) “Clustering of Protein Structural Fragments Reveals Modular
Building Block Approach of Nature” J. Mol. Biol, 338, 611-629