Deep architectures for protein contact map prediction
Lei Cai

Introduction and scientific motivation
Proteins are chains of amino acids and have a variety of functions in living organisms. There are four levels of protein structure:
• Primary structure: a sequence of amino acids (20 types).
• Secondary structure: a local structure of a protein (3 types).
• Tertiary structure: the 3D structure of a single protein.
• Quaternary structure: the structure of several proteins or of a protein complex.

Because the tertiary structure of a single protein, or the quaternary structure of a protein complex, helps in understanding or predicting its function, several experimental methods have been invented to determine it. This paper focuses on a sub-problem of protein structure prediction: contact map prediction.

Problem definition
Protein residue-residue contact prediction is the problem of predicting whether any two residues in a protein sequence are spatially close to each other in the folded 3D structure. For a protein of N amino acids, the contact map is an N×N matrix C whose elements are given by: C_ij = 1 if residues i and j are in contact, and C_ij = 0 otherwise.

Related body of work
There are four main approaches for residue-residue contact prediction:
• Neural networks
• Recursive neural networks
• Support vector machines
• Hidden Markov models

Limitations
The performance of many contact predictors has been assessed every two years. However, the best predictors reach only 20% accuracy for long-range contacts, which means that contact prediction is not yet accurate enough to be useful.

Contributions of paper
• This paper introduces new ideas for contact prediction, primarily using a multistage machine learning approach.
• The authors developed deep architectures that integrate information over multiple temporal and spatial scales.
• The experimental results show that the proposed method achieves an accuracy close to 30%, which is a significant improvement.

Outline of proposed method
• First, predict coarse contact maps corresponding to contacts between secondary structure elements.
• Second, use a novel energy-based neural network approach to refine the prediction of the alignment and orientation of contacting secondary structure elements, and predict residue-residue contact probabilities for residues in contacting pairs of alpha-helices and beta-strands.
• Finally, employ a deep neural network to predict all the residue-residue contact probabilities by integrating information both spatially and temporally.

Coarse contact and orientation prediction
In this part, the authors employ two-dimensional bidirectional recurrent neural networks (2D-BRNN) to predict coarse contact probabilities and orientations between secondary structure elements. For each pair S_n and S_m of secondary structure elements, the output of the 2D-BRNN is a probability vector giving the probabilities of parallel contact, anti-parallel contact, and no contact. The input of the 2D-BRNN for the pair S_n, S_m consists of two feature vectors.

Blue, green, and red squares correspond to anti-parallel contact, parallel contact, and no contact respectively.

Element alignment prediction
This part uses an energy-based method to assign energies, and then probabilities, to the alignments between contacting secondary structure elements, and to derive approximate probabilities of contact.

Alignments
Alignments between secondary structure elements are described by two components:
• Relative shift: The relative shift is an integer representing how the residues in the first element are shifted with respect to the second element. For instance, the shift between two strands of length 5 can have any integer value from 0 to 9.
• Phase: The phase is an integer assigned to pairs of residues, one from each contacting element, which is meant to capture the periodic component in an approximate fashion. Strands and helices are treated as periodic structures with periods 2 and 7 respectively.

Energy definition
Consider a pair of contacting elements S_n and S_m. Assume that segment S_n consists of residues i, i+1, …, i+k_n and that segment S_m consists of residues j, j+1, …, j+k_m. The energy of the a-th shift with phase θ of segment S_n versus segment S_m is given by

Residue-residue contact prediction

Experiments
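
To make the problem definition concrete, the sketch below builds a binary N×N contact map from one 3D coordinate per residue. It is an illustration only, not the authors' pipeline: the 8.0 angstrom threshold, the choice of one representative point per residue, and the function names `contact_map` and `contacts_with_separation` are all assumptions introduced here, since the transcript does not state how "spatially close" is measured. The diagonal is left at 0 as a convention for this sketch.

```python
import math

# Assumption for this sketch: two residues are "in contact" when their
# representative points are closer than 8.0 angstroms. The transcript does
# not state the threshold, so treat this value as a placeholder.
CONTACT_THRESHOLD = 8.0


def contact_map(coords, threshold=CONTACT_THRESHOLD):
    """Build the N x N binary contact map C from one 3D point per residue."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < threshold:
                # The map is symmetric: C[i][j] == C[j][i].
                cmap[i][j] = 1
                cmap[j][i] = 1
    return cmap


def contacts_with_separation(cmap, min_separation):
    """List contacting pairs (i, j), i < j, with j - i >= min_separation."""
    n = len(cmap)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if cmap[i][j] == 1
    ]


# Toy input (hypothetical): five residues on a straight line, 3.8 angstroms apart.
coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]
cmap = contact_map(coords)
for row in cmap:
    print(row)
print(contacts_with_separation(cmap, 2))
```

The second helper filters contacts by sequence separation, which is how long-range contacts are usually singled out. The cutoff that counts as "long-range" is not given in the transcript, so it is left as a parameter.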