Deep architectures for protein contact map prediction
Lei Cai
Introduction and scientific motivation
Proteins are chains of amino acids that perform a variety of functions in living organisms.
There are four levels of protein structure:
Primary structure: the sequence of amino acids (20 types).
Secondary structure: the local structure of a protein (3 types).
Tertiary structure: the 3D structure of a single protein.
Quaternary structure: the structure of several proteins, or a protein complex.
Because the tertiary structure of a single protein, or the quaternary structure of a protein
complex, helps in understanding and predicting its function, several experimental methods
have been developed to determine protein structure.
This paper focuses on a sub-problem of protein structure prediction:
Contact Map Prediction
Problem definition
Protein residue-residue contact prediction is the problem of predicting whether any two residues
in a protein sequence are spatially close to each other in the folded 3D structure.
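For a protein of N amino acids, the contact map is an N×N matrix C whose elements are given by:
C_ij = 1 if residues i and j are in contact (spatially close in the folded structure), and C_ij = 0 otherwise.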
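As an illustration of the definition, a contact map can be computed from known 3D coordinates by thresholding pairwise distances. The sketch below is only an example under stated assumptions: the slides do not say which atoms or which distance cutoff define a contact, so the 8 Å cutoff, the use of one coordinate per residue, the function name `contact_map`, and the toy coordinates are all assumptions made for illustration.

```python
import math

def contact_map(coords, threshold=8.0):
    """Build an N x N 0/1 contact map from one (x, y, z) point per residue.

    The 8.0 Angstrom default is an assumed cutoff, not taken from the slides.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Residues i and j are "in contact" if they are closer than the cutoff.
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = 1
    return cmap

# Toy coordinates for a hypothetical 4-residue chain (made-up values).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(coords)
for row in cmap:
    print(row)
```

The resulting matrix is symmetric, and with this convention every residue is trivially in contact with itself. Prediction methods aim to estimate the off-diagonal entries from the sequence alone.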
Related body of work
There are four main approaches for residue-residue contact prediction:
• Neural networks
• Recursive neural networks
• Support vector machines
• Hidden Markov models
Limitations
The performance of contact predictors is assessed every two years.
However, the best predictors reach only 20% accuracy for long-range contacts, which
means contact prediction is not yet accurate enough to be useful.
Contributions of paper
• This paper introduces new ideas for contact prediction, based primarily on a multistage machine learning approach.
• The authors develop deep architectures that integrate information over
multiple temporal and spatial scales.
• The experimental results show that the proposed method achieves an accuracy close
to 30%, which is a significant improvement.
Outline of proposed method
• First, predict coarse contact maps corresponding to contacts between secondary structure elements.
• Second, use a novel energy-based neural network approach to refine the prediction of the
alignment and orientation of contacting secondary structure elements, and predict residue-residue
contact probabilities for residues in contacting pairs of alpha-helices and beta-strands.
• Finally, employ a deep neural network to predict all the residue-residue contact probabilities by
integrating information both spatially and temporally.
Coarse contact and orientation prediction
In this part, the authors employ two-dimensional bidirectional recurrent neural networks
(2D-BRNN) to predict coarse contact probabilities and orientations between secondary structure
elements.
For each pair S_n and S_m of secondary structure elements, the output of the 2D-BRNN is a
probability vector giving the probabilities of parallel contact, anti-parallel contact, and
no contact.
The input of the 2D-BRNN for the pair S_n, S_m consists of two feature vectors.
Blue, green, and red squares correspond to antiparallel contact, parallel contact, and no-contact
respectively.
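To make the three-class output concrete, the sketch below turns three raw scores into a probability vector over parallel contact, anti-parallel contact, and no contact. The slides do not state how the network normalizes its outputs, so the softmax used here is an assumption, and the score values and the names `CLASSES` and `softmax` are invented for illustration.

```python
import math

# The three outcomes predicted for each pair of secondary structure elements.
CLASSES = ("parallel", "antiparallel", "no_contact")

def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1 (assumed normalization)."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up raw scores for one hypothetical pair of elements.
raw_scores = [1.2, 2.5, 0.3]
probs = dict(zip(CLASSES, softmax(raw_scores)))
predicted = max(probs, key=probs.get)
print(probs, predicted)
```

Taking the most probable class for every pair of elements yields a coarse map of the kind described in the legend above.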
Element alignment prediction
This part uses an energy-based method to assign energies, and then
probabilities, to the alignments between contacting secondary
structure elements, and to derive approximate probabilities of contact.
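One common way to go from energies to probabilities in energy-based models is a Boltzmann (softmax) distribution, in which lower energy means higher probability. The sketch below illustrates that idea only. Whether the paper uses exactly this form is an assumption, and the `temperature` parameter, the function name `alignment_probabilities`, and the energy values are hypothetical.

```python
import math

def alignment_probabilities(energies, temperature=1.0):
    """Map {alignment: energy} to {alignment: probability} with Boltzmann weights.

    Assumed form: p(a) is proportional to exp(-E(a) / temperature).
    """
    e_min = min(energies.values())  # shift energies for numerical stability
    weights = {a: math.exp(-(e - e_min) / temperature) for a, e in energies.items()}
    z = sum(weights.values())  # normalizing constant (partition function)
    return {a: w / z for a, w in weights.items()}

# Hypothetical energies for three candidate alignments, keyed by an alignment index.
energies = {0: 2.0, 1: 0.5, 2: 1.0}
probs = alignment_probabilities(energies)
most_likely = max(probs, key=probs.get)
print(probs, most_likely)
```

Under this assumed form, the alignment with the lowest energy is always the most probable one, and the temperature controls how sharply the distribution concentrates on it.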
Alignments
Alignments between secondary structure elements are described by two components:
• Relative shift
The relative shift is an integer representing how the residues in the first element are
shifted with respect to the second element. For instance, the shift between two strands
of length 5 can have any integer value from 0 to 9.
• Phase
The phase is an integer assigned to pairs of residues, one from each contacting element,
which is meant to capture, in an approximate fashion, the periodic component of the structure.
Strands and helices are treated as periodic structures with periods 2 and 7, respectively.
Energy definition
Consider a pair of contacting elements S_n and S_m. Assume that segment S_n
consists of residues i, i+1, …, i+k_n and that S_m consists of residues j, j+1, …, j+k_m. The
energy of the a-th shift with phase θ of segment S_n versus segment S_m is given by
Residue-residue contact prediction
Experiments