Algorithms for Protein Folding and Structure Prediction
Martin Paluszewski, Ph.D. student, DIKU, 12/12-2006

Topics
Introduction to proteins
Protein folding is an optimization problem
Not recent but popular algorithms
Recent algorithms

Introduction
Topics: The Cell, Protein Synthesis, Structure of Proteins, Anfinsen's Results, Folding

The Cell - Building block of life and protein factory

Protein Synthesis
Proteins act as building blocks, tools or machines in the cells.
1. Structural proteins: hair, skin, nails, etc.
2. Enzymatic proteins: catalysts in chemical reactions.
3. Functional proteins: transportation of oxygen in the blood, antibody defense, etc.

Synthesis steps
1. DNA is transcribed into mRNA.
2. mRNA is transported to the ribosomes.
3. mRNA is translated into a chain of amino acids.
4. The chain folds into a protein.
Figure: DNA ...CAGAGATCA... → (transcription) → mRNA ...CAGAGAUCA... → (translation at the ribosomes) → amino acids Q R S.
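Steps 1 and 3 of the example can be sketched in a few lines. This assumes the DNA shown is the coding strand (so the mRNA has the same letters with T replaced by U), and `CODON_TABLE` is a hypothetical excerpt containing only the three codons of the example; the real genetic code has 64 codons.

```python
# Hypothetical excerpt of the genetic code: only the codons used in the example.
CODON_TABLE = {"CAG": "Q", "AGA": "R", "UCA": "S"}


def transcribe(dna: str) -> str:
    """Step 1 (coding strand -> mRNA): replace thymine (T) by uracil (U)."""
    return dna.replace("T", "U")


def translate(mrna: str) -> str:
    """Step 3: read the mRNA one codon (three bases) at a time and map it to an amino acid."""
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))


mrna = transcribe("CAGAGATCA")
print(mrna)             # CAGAGAUCA
print(translate(mrna))  # QRS
```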
The Genetic Code
Figure: the genetic code maps each mRNA codon to an amino acid; in the example above, CAG, AGA and UCA are translated into Q, R and S.

The 20 Amino Acids
Figure: the 20 amino acids. Each has the form H2N-CH(R)-CO2H; they differ in the side chain R.

The Chain of Amino Acids
Figure: three amino acids with side chains R1, R2 and R3.

Peptide Bonds
Figure: neighbouring amino acids are joined by peptide bonds, giving the chain H2N-CH(R1)-CO-NH-CH(R2)-CO-NH-CH(R3)-CO2H.

The Rigid Planes
Figure: the atoms around each peptide bond (Cα, C, O, N, H, Cα) lie in a rigid plane; consecutive planes meet at the Cα atoms, where they can rotate by the angles φ and ψ.
Degree of Freedom
The atoms of each peptide bond are fixed in a plane.
The planes can rotate according to the dihedral angles φ and ψ.
The conformational space consists of all non-clashing conformations.

Example of Conformational Space Size
Given a protein of length N, there are 2(N − 1) backbone dihedral angles (φ and ψ).
Assume 4 discrete values for each φ and ψ.
This gives $C(N) = 4^{2N-2}$ possible configurations.
$C(20) = 4^{38} = 75557863725914323419136$.

Protein Folding
An amino acid chain always folds into a unique 3D structure (Anfinsen).
It should therefore be possible to compute the folded structure of a protein using only the amino acid sequence as input.
Hypothesis: the folded structure is the conformation with minimum free energy. Local or global minimum?

Energy of a Protein
Energy function (model): U = Bond + Angle + Dihedral + Van der Waals + Electrostatic
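The conformational-space count C(N) above is easy to check; `conformation_count` is a hypothetical helper name, and Python integers are arbitrary precision, so the 23-digit value is exact. Clashes are ignored, so this is an upper bound on the number of non-clashing conformations.

```python
def conformation_count(n_residues: int, values_per_angle: int = 4) -> int:
    """C(N) = k^(2(N-1)): every combination of k discrete values for the
    2(N - 1) backbone dihedral angles (phi and psi) is one configuration."""
    return values_per_angle ** (2 * (n_residues - 1))


print(conformation_count(20))  # 75557863725914323419136
```

Even at one conformation per nanosecond, enumerating C(20) would take roughly 2.4 million years.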
Why should we fold proteins?
The 3D structure of a protein is determined using X-ray crystallography and NMR spectroscopy. This is expensive, time-consuming and, for some proteins, impossible.
Can we construct an algorithm that determines the 3D structure of a protein given its amino acid sequence?

Problem variations
Protein Folding Problem: find the trajectory from the unfolded amino acid chain to the folded protein.
Protein Structure Prediction: find the folded protein from the amino acid sequence (e.g. DTSGNALYQVGLAINDYKLA).

Backbone of a protein
Figure: the backbone is the chain of peptide planes connected at the Cα atoms, with the angles φ and ψ at each Cα and the side chains R attached to the Cα atoms.
Structure Levels
Primary structure: the amino acid sequence, e.g. DTSGNALY ... DYKLA.
Secondary structure: local structures such as helix, sheet, turn, coil, etc.
Tertiary structure: the 3D coordinates of all atoms.

Popular Algorithms and Models
Topics: The HP Model, Side-chain Positioning Problem, Protein Threading, Prediction of Secondary Structure, Protein Folding Using Distributed Computing
Exact algorithm in the HP model.
Side-chain positioning problem.
Protein threading.
Secondary structure prediction using neural networks.
Protein folding using distributed computing (Folding@Home).

The HP Model
Amino acids are either hydrophobic (H) or hydrophilic (P).
Observation: hydrophobic amino acids tend to be packed in the protein core.
Consider a binary chain of H's and P's. Embed the chain as a chain of points (amino acids) in a lattice such that the number of neighbouring H's (pairs of H's that are lattice neighbours but not consecutive in the chain) is maximized.

The HP Model - Example
Amino acids: FRDLDRYYFHDINNFRHIEG
HP:          HPPHPPHHHPPHPPHPPHPP
Figure: the chain embedded in a 2D lattice and in a 3D lattice.

Approximation
The HP problem is NP-hard (also in 2D).
Approximation algorithms exist (Hart and Istrail), with ratios 1/4 (2D) and 3/8 (3D).

Exact Method
Observation: optimal solutions usually have a core of optimally packed H's.
Algorithm
Input: a string $s \in \{H, P\}^*$.
Output: lattice positions of s.
1. Generate a set of compact cores of H's.
2. Thread the string onto the cores.
3. If this succeeds, we are done; otherwise, generate sets of almost compact cores and try again.

Exact Method - Interactive demo

Side-chain Positioning Problem (SCP)
The backbone is fixed.
For each amino acid i, there is a set of possible rotamers $\{i_r\}$. A rotamer is a specific rotation of a side chain.
Problem: choose one rotamer for each amino acid such that the following energy is minimized:
$$E = E_0 + \sum_i E(i_r) + \sum_{i<j} E(i_r, j_s)$$

Protein Threading
Naive idea: similar amino acid sequences have similar structures.
Better idea:
We want to predict the structure of sequence A.
Find a set B of homologous proteins with similar sequences and known structures.
Thread A onto each core structural model in B.
The structure of A corresponds to the threading with the lowest cost.

Core Structural Model - Example
The model consists of three cores of fixed lengths 5, 2 and 6, separated by loops whose lengths must lie in the given intervals:
loop [1, 4] - core 5 - loop [3, 6] - core 2 - loop [1, 2] - core 6 - loop [0, 4]
A threading of the sequence AGABZILMKAPFAHETWTNDAB onto the model:
AGA | BZILM | KAP | FA | HE | TWTNDA | B
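The core structural model above can be made concrete: choosing every loop length fixes where each core starts, so the valid threadings are exactly the loop-length choices that respect the intervals and use the whole sequence. The sketch below (hypothetical names `threadings` and `split`) only enumerates threadings; the slides do not define the cost function, so none is included.

```python
from typing import Iterator, List, Tuple

# Core structural model from the example, in the order
# loop, core, loop, core, loop, core, loop.
LOOPS: List[Tuple[int, int]] = [(1, 4), (3, 6), (1, 2), (0, 4)]  # allowed loop lengths
CORES: List[int] = [5, 2, 6]                                      # fixed core lengths


def threadings(n: int) -> Iterator[List[int]]:
    """Yield every valid threading of a length-n sequence as its list of loop lengths."""
    def extend(i: int, used: int, chosen: List[int]) -> Iterator[List[int]]:
        if i == len(LOOPS):
            if used == n:                 # all residues must be used
                yield list(chosen)
            return
        low, high = LOOPS[i]
        core = CORES[i] if i < len(CORES) else 0  # the last loop has no core after it
        for length in range(low, high + 1):
            if used + length + core <= n:
                chosen.append(length)
                yield from extend(i + 1, used + length + core, chosen)
                chosen.pop()

    yield from extend(0, 0, [])


def split(seq: str, loop_lengths: List[int]) -> List[str]:
    """Cut seq into its loop and core pieces for one threading."""
    pieces, pos = [], 0
    for i, length in enumerate(loop_lengths):
        pieces.append(seq[pos:pos + length])
        pos += length
        if i < len(CORES):
            pieces.append(seq[pos:pos + CORES[i]])
            pos += CORES[i]
    return pieces


seq = "AGABZILMKAPFAHETWTNDAB"
all_threadings = list(threadings(len(seq)))
print(len(all_threadings))                   # 23
print(" | ".join(split(seq, [3, 3, 2, 1])))  # AGA | BZILM | KAP | FA | HE | TWTNDA | B
```

Even this tiny model admits 23 threadings; with more cores and wider loop intervals the count grows exponentially, so exhaustive enumeration quickly becomes infeasible.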
Threading
Find the threading with minimum cost.
There are exponentially many threadings.
The problem is NP-hard. However, large instances can be solved using branch and bound.

Prediction of Secondary Structure Using Neural Networks
Input:  AGABZILMKAPFAHETWTNDABGAPFAHET
Output: CCCCHHHHHHHHHCCCSSSSSSSCCCHHHH
Secondary structure categories: H: helix, S: sheet, C: coil.
Figure: a feed-forward neural network. A window of residues (... N V K E T ...) is encoded with 21 input units per residue and passed through a hidden layer to an output layer with one unit per category (H, S, C).

Biology of the Brain
Axons transmit signals.
Dendrites receive signals.
Synapses are junctions between axons and dendrites.
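The network in the figure can be sketched as a sliding-window classifier. Assumptions (not from the slides): the 21st input unit per residue marks positions outside the chain and non-standard letters such as the B and Z in the example; the window has 13 residues; there is one sigmoid hidden layer of 10 units without biases and a softmax output over H, S and C. The weights are random and untrained, so the printed prediction only illustrates the data flow; training is omitted.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
UNITS_PER_RESIDUE = 21                 # 20 amino acids + 1 extra unit (assumed meaning)
CLASSES = "HSC"                        # helix, sheet, coil


def encode_window(seq, centre, half_width):
    """One-hot encode the residues from centre - half_width to centre + half_width."""
    x = []
    for pos in range(centre - half_width, centre + half_width + 1):
        unit = [0.0] * UNITS_PER_RESIDUE
        if 0 <= pos < len(seq) and seq[pos] in AMINO_ACIDS:
            unit[AMINO_ACIDS.index(seq[pos])] = 1.0
        else:
            unit[-1] = 1.0             # outside the chain or non-standard letter
        x.extend(unit)
    return x


def forward(x, w_hidden, w_out):
    """Sigmoid hidden layer, softmax output: probabilities for H, S and C."""
    hidden = [1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(row, x)))) for row in w_hidden]
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(seq, w_hidden, w_out, half_width):
    """Predict one category per residue by taking the most probable output unit."""
    labels = []
    for i in range(len(seq)):
        probs = forward(encode_window(seq, i, half_width), w_hidden, w_out)
        labels.append(CLASSES[probs.index(max(probs))])
    return "".join(labels)


rng = random.Random(0)
HALF_WIDTH = 6                                   # window of 13 residues (hypothetical)
N_INPUT = (2 * HALF_WIDTH + 1) * UNITS_PER_RESIDUE
N_HIDDEN = 10                                    # hypothetical hidden-layer size
w_hidden = [[rng.gauss(0.0, 0.1) for _ in range(N_INPUT)] for _ in range(N_HIDDEN)]
w_out = [[rng.gauss(0.0, 0.1) for _ in range(N_HIDDEN)] for _ in CLASSES]

seq = "AGABZILMKAPFAHETWTNDABGAPFAHET"
print(predict(seq, w_hidden, w_out, HALF_WIDTH))
```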
Results
Best non-neural-network approach: 55 % accuracy.
Multilayer perceptron: 60 % accuracy.
Consensus using different methods: >80 % accuracy.

Protein Folding Using Distributed Computing
Trajectory: unfolded amino acid chain → folded protein.
Simulation
Start with an unfolded amino acid chain.
Iteratively apply Newton's laws of motion.
A home PC can simulate ≈ $10^{-10}$ seconds of motion in an hour (small proteins).
Simulating 1 second of protein motion would therefore take $10^{10}$ hours, i.e. ≈ 1,000,000 years.
Supercomputers: > 1,000 years.
The new paradigm of worldwide distributed computing might break the CPU time barrier.
SETI@HOME: analyzes radio telescope data. CPU time > 2M years.
Folding@HOME: how can a protein folding simulation be massively distributed?

Ensemble Dynamics
Figure: energy as a function of the conformational space (shown in several steps).
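The slides show ensemble dynamics only as a picture. One common way to explain why many independent simulations help, stated here as an assumption rather than taken from the slides: if crossing an energy barrier is a single-exponential process with rate k, the first of M independent simulations crosses after an exponentially distributed time with rate M·k, so the expected wait drops from 1/k to 1/(M·k). The small simulation below illustrates this; `RATE` and the sample count are arbitrary.

```python
import random
import statistics


def first_crossing_time(n_simulations, rate, rng):
    """Waiting time until the first of n independent simulations crosses the barrier.

    Assumption: single-exponential kinetics, i.e. each simulation's crossing
    time is exponentially distributed with the given rate.
    """
    return min(rng.expovariate(rate) for _ in range(n_simulations))


def mean_first_crossing(n_simulations, rate, samples, rng):
    """Average first-crossing time over many repetitions."""
    return statistics.mean(first_crossing_time(n_simulations, rate, rng) for _ in range(samples))


rng = random.Random(42)
RATE = 1.0  # hypothetical: the mean crossing time of one simulation is 1 time unit
for m in (1, 10, 100):
    print(m, round(mean_first_crossing(m, RATE, 5000, rng), 4), "theory:", 1 / (m * RATE))
```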
New Results
Topics: Reconstructing Proteins Using Tabu Search, Reconstructing Proteins Using Branch and Bound

Reconstructing Protein Structure From Solvent Exposure Using Tabu Search
The slide shows the first page of the paper:
Algorithms for Molecular Biology (BioMed Central), Open Access Research
"Reconstructing protein structure from solvent exposure using tabu search"
Martin Paluszewski*1, Thomas Hamelryck2 and Pawel Winter1
1 Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark
2 Bioinformatics Center, Institute of Molecular Biology, University of Copenhagen, Universitetsparken 15 building 10, 2100 Copenhagen, Denmark
* Corresponding author
Published: 27 October 2006. Received: 30 March 2006. Accepted: 27 October 2006.
Algorithms for Molecular Biology 2006, 1:20, doi:10.1186/1748-7188-1-20
This article is available from: http://www.almob.org/content/1/1/20
© 2006 Paluszewski et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract
Background: A new, promising solvent exposure measure, called half-sphere-exposure (HSE), has recently been proposed.
Here, we study the reconstruction of a protein's Cα trace solely from structure-derived HSE information. This problem is of relevance for de novo structure prediction using predicted HSE measure. For comparison, we also consider the well-established contact number (CN) measure. We define energy functions based on the HSE- or CN-vectors and minimize them using two conformational search heuristics: Monte Carlo simulation (MCS) and tabu search (TS). While MCS has been the dominant conformational search heuristic in literature, TS has been applied only a few times. To discretize the conformational space, we use lattice models with various complexity.
Results: The proposed TS heuristic with a novel tabu definition generally performs better than MCS for this problem. Our experiments show that, at least for small proteins (up to 35 amino acids), it is possible to reconstruct the protein backbone solely from the HSE or CN information. In general, the HSE measure leads to better models than the CN measure, as judged by the RMSD and the angle correlation with the native structure. The angle correlation, a measure of structural similarity, evaluates whether equivalent residues in two structures have the same general …

Contact Vector
The contact number (CN) of a Cα atom is the number of other Cα atoms in a sphere centered at that Cα atom.
The CN-vector consists of the contact numbers of all Cα atoms.
The CN-vector can be predicted from the amino acid sequence using neural networks.
Is it possible to reconstruct the protein backbone from the CN-vector?
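A minimal sketch of the contact-number definition above, assuming Cα coordinates in Å. The sphere radius is a parameter because its value is not given on the slide; the 8 Å used below is only illustrative, and `contact_numbers` is a hypothetical name.

```python
import math


def contact_numbers(ca_coords, radius):
    """CN-vector: for each C-alpha atom, the number of other C-alpha atoms within `radius`."""
    return [
        sum(1 for j, other in enumerate(ca_coords) if j != i and math.dist(atom, other) <= radius)
        for i, atom in enumerate(ca_coords)
    ]


# Five C-alpha atoms on a straight line, 3.8 Å apart (a typical consecutive C-alpha distance).
line = [(3.8 * k, 0.0, 0.0) for k in range(5)]
print(contact_numbers(line, radius=8.0))  # [2, 3, 4, 3, 2]
```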
Figure: amino acid sequence (M L S D E D F K A V F G M T R) → neural network → predicted contact numbers (CN-vector) → ? → structure.

Half Sphere Exposure Vector
The sphere is split into an upper and a lower hemisphere.
HSE is therefore a pair of numbers (up CN, down CN).
Recent results show that HSE-vectors can be predicted.
Is it possible to reconstruct the protein backbone from the HSE-vector?
Figure: the sphere around Cα atom B (with chain neighbours A and C) is split by a plane perpendicular to a vector Vb at B; counting the Cα atoms in each half gives the HSE-vector-up and the HSE-vector-down.

Energy
Given a structure S with N amino acids, let $HSE^s$ be the HSE-vector of S, and let $HSE^p$ be the predicted HSE-vector. The energy of S is:
$$E = \sqrt{\frac{\sum_i \left(HSE^p_i - HSE^s_i\right)^2}{N}}$$
We want to find a structure S with zero (or near-zero) energy.
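The energy above can be sketched together with an HSE computation from a Cα trace. Assumptions (not stated on the slides): the splitting plane at atom i is perpendicular to the bisector of the vectors pointing from its two chain neighbours to i, so the two chain ends get no value; the squared difference in E is summed over both the up and down components; and 13 Å is an illustrative radius. `hse_vector` and `hse_energy` are hypothetical names.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(a):
    length = math.sqrt(_dot(a, a))
    return tuple(x / length for x in a)


def hse_vector(ca, radius):
    """(up, down) counts for every interior C-alpha atom; the two chain ends are skipped."""
    result = []
    for i in range(1, len(ca) - 1):
        # Assumed split direction: bisector pointing away from both chain neighbours.
        v = tuple(p + q for p, q in zip(_unit(_sub(ca[i], ca[i - 1])), _unit(_sub(ca[i], ca[i + 1]))))
        up = down = 0
        for j, other in enumerate(ca):
            d = _sub(other, ca[i])
            if j != i and _dot(d, d) <= radius ** 2:
                if _dot(d, v) > 0:
                    up += 1
                else:
                    down += 1
        result.append((up, down))
    return result


def hse_energy(predicted, observed):
    """E = sqrt(sum_i (HSE^p_i - HSE^s_i)^2 / N), summing over the up and down components."""
    n = len(predicted)
    total = sum((pu - su) ** 2 + (pd - sd) ** 2
                for (pu, pd), (su, sd) in zip(predicted, observed))
    return math.sqrt(total / n)


trace = [(0.0, 0.0, 0.0), (2.0, 3.0, 0.0), (4.0, 0.0, 0.0), (6.0, 3.0, 1.0), (8.0, 0.0, 1.0)]
observed = hse_vector(trace, radius=13.0)
predicted = [(1, 2), (0, 4), (1, 3)]  # hypothetical network output
print(observed, hse_energy(predicted, observed))
```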
Lattice Model
Chain of Cα atoms.
The Cα atoms are positioned at lattice points.
Succeeding Cα atoms are positioned at connected lattice points.

Advantages of using lattices
Discrete state space: a well-defined combinatorial optimization problem.
Exact computations: no rounding errors.
Many algorithmic problems can be solved efficiently: collision detection, finding neighbours, local moves.
Simple backbone representation.

Disadvantage
Nature does not represent proteins in lattices.
Lattices: Cubic, Face Centered Cubic (FCC), High Coordination (HC).

Lattices - A Comparison
Cubic: 6 basis vectors, RMSD: 2.69
Face Centered Cubic: 12 basis vectors, RMSD: 1.84
High Coordination: 678 basis vectors, RMSD: 0.34

cRMSD
Let $a_1, \dots, a_N$ be the coordinates of structure A, and let $b_1, \dots, b_N$ be the coordinates of structure B. Then the similarity between A and B is:
$$\mathrm{cRMSD}(A, B) = \sqrt{\frac{\sum_{i=1}^{N} |a_i - b_i|^2}{N}}$$

Tabu Search Strategy
Given an amino acid sequence A, predict the HSE-vector using neural networks.
Find a structure S in a high coordination lattice such that the energy is minimized. This structure will have an HSE-vector similar to the predicted HSE-vector.
Hypothesis: S will be similar to the native structure of the amino acid sequence A.
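The lattice advantages listed above (exact integer arithmetic, cheap collision detection, neighbour lookup, local moves) can be sketched on the FCC lattice, whose 12 basis vectors are the permutations of (±1, ±1, 0). The end-move neighbourhood is a hypothetical, minimal move set chosen for illustration; the slides do not say which local moves are used.

```python
from itertools import permutations

# The 12 FCC basis vectors: every permutation of (±1, ±1, 0).
FCC_DIRECTIONS = sorted({p for a in (1, -1) for b in (1, -1) for p in permutations((a, b, 0))})
FCC_SET = set(FCC_DIRECTIONS)


def add(p, v):
    return tuple(x + y for x, y in zip(p, v))


def is_valid_chain(chain):
    """Succeeding atoms must be FCC neighbours, and no two atoms may share a lattice point."""
    connected = all(tuple(b - a for a, b in zip(p, q)) in FCC_SET for p, q in zip(chain, chain[1:]))
    return connected and len(set(chain)) == len(chain)  # collision detection with a hash set


def end_moves(chain):
    """Hypothetical local moves: put the first (last) atom on any free FCC neighbour
    of the second (second-to-last) atom."""
    occupied = set(chain)
    moves = []
    for v in FCC_DIRECTIONS:
        new_first = add(chain[1], v)
        if new_first not in occupied:
            moves.append([new_first] + chain[1:])
        new_last = add(chain[-2], v)
        if new_last not in occupied:
            moves.append(chain[:-1] + [new_last])
    return moves


chain = [(0, 0, 0), (1, 1, 0), (2, 2, 0)]
print(len(FCC_DIRECTIONS), is_valid_chain(chain), len(end_moves(chain)))  # 12 True 20
```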
Tabu Search
The algorithm and the tabu test, written as runnable Python; cost, compute_neighbours, random_conformation, rmsd and stop are supplied by the caller:
```python
from collections import deque


def is_tabu(s, queue, cost, rmsd, eps):
    """Tabu(s, Q): s is tabu if a stored structure q has lower cost and RMSD(s, q) <= eps."""
    return any(cost(s) > cost(q) and rmsd(s, q) <= eps for q in queue)


def tabu_search(random_conformation, compute_neighbours, cost, rmsd, eps, stop,
                queue_length=None):
    """Return the lowest-cost structure found.

    queue_length optionally bounds the tabu queue Q (no bound is given on the slide).
    """
    best_structure = s = random_conformation()
    best_cost = cost(best_structure)
    queue = deque(maxlen=queue_length)
    while not stop():
        for n in sorted(compute_neighbours(s), key=cost):  # neighbours sorted by cost
            c = cost(n)
            if c < best_cost:                  # a new best structure is always accepted
                best_cost = c
                s = best_structure = n
                break
            if not is_tabu(n, queue, cost, rmsd, eps):  # otherwise the best non-tabu neighbour
                s = n
                break
        queue.append(s)                        # pushback(Q, s)
    return best_structure
```
Figure: illustration of the tabu definition on an energy landscape (conformations a, b, c and the RMSD threshold ε).
Protein: 1EDN, size: 21 amino acids.
Protein: 1SRK, size: 35 amino acids.

Conclusion
Tabu search and other metaheuristics often have difficulty reaching all parts of the conformational space, especially for large proteins.
HSE-based energy functions generate a reasonable energy landscape for small proteins.
However, more information is needed to fold large proteins.

Work in progress: Branch and bound algorithm
Use branch and bound on the whole conformational space to guarantee implicit evaluation of all possible conformations.
Add more predictable information to the model: secondary structure, radius of gyration.

Radius of gyration
Elongated chains have a large Rg; compact chains have a small Rg.
$$R_g = \sqrt{\frac{\sum_i (v_i - v_{cm})^2}{N}}$$
where $v_i$ are the atom positions and $v_{cm}$ is the center of mass.
Predicted Rg: the Rg of globular proteins can be predicted as $R_g = 2.2 N^{0.38}$.

Segments
Segments are constructed from the secondary structure assignment of the amino acid sequence.
Amino acid sequence:            MLSDEDFKAVFGMTRSAFANLPLWKQQNLKKEKGLF
Secondary structure assignment: CCCHHHHHHHCCCCHHHHHCCCHHHHHHHHCCCCCC

Discrete directions
The direction of a segment must be one of the 12 FCC directions:
(1,1,0), (1,0,1), (1,-1,0), (1,0,-1), (-1,1,0), (-1,0,1), (-1,-1,0), (-1,0,-1), (0,1,1), (0,1,-1), (0,-1,1), (0,-1,-1)
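The segment construction above can be sketched by splitting the secondary structure assignment into maximal runs of one state; treating each maximal run as a segment is our assumption, and `segments` is a hypothetical name.

```python
from itertools import groupby


def segments(ss: str):
    """Split a secondary structure string into maximal runs of one state.

    Returns (state, start, end) triples with end exclusive.
    """
    result, start = [], 0
    for state, run in groupby(ss):
        length = len(list(run))
        result.append((state, start, start + length))
        start += length
    return result


seq = "MLSDEDFKAVFGMTRSAFANLPLWKQQNLKKEKGLF"
ss = "CCCHHHHHHHCCCCHHHHHCCCHHHHHHHHCCCCCC"
for state, start, end in segments(ss):
    print(state, seq[start:end])
```

For the example this gives seven segments (coil, helix, coil, helix, coil, helix, coil); the last one is the coil K E K G L F.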
Super structure

Definition: A super structure is a list of segments and their directions. If S is the number of segments, there are

  $N = 4 \times 11^{S-2}$

different super structures.

Segment structures

Given a segment, there are many configurations of its internal Cα atoms. We call these segment structures.

For helices and sheets, we generate U such segment structures by rotating one structure uniformly around the axis defined by the segment.
For coils, we find the U/2 best-matching structures in a fragment library and use 2 rotations for each match.

[Figure: the coil sequence K E K G L F matched against a fragment library]
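One way to generate the U rotated copies of a helix or sheet is Rodrigues' rotation formula. In this sketch the segment axis is assumed to be the line through the first and last Cα atom, and the helper names are hypothetical. The two counting formulas from the definitions are included to show how quickly the search space grows.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rotate_about_axis(p: Vec3, origin: Vec3, k: Vec3, theta: float) -> Vec3:
    """Rotate p by theta around the unit axis k through origin
    (Rodrigues: v cos + (k x v) sin + k (k . v)(1 - cos))."""
    v = [p[i] - origin[i] for i in range(3)]
    c, s = math.cos(theta), math.sin(theta)
    cross = (k[1] * v[2] - k[2] * v[1],
             k[2] * v[0] - k[0] * v[2],
             k[0] * v[1] - k[1] * v[0])
    dot = sum(k[i] * v[i] for i in range(3))
    return tuple(origin[i] + v[i] * c + cross[i] * s + k[i] * dot * (1 - c)
                 for i in range(3))


def segment_structures(coords: Sequence[Vec3], u: int) -> List[List[Vec3]]:
    """U copies of one segment, rotated by 2*pi*t/U (t = 0..U-1) around the
    axis from the first to the last C-alpha (assumed axis definition)."""
    a, b = coords[0], coords[-1]
    axis = [b[i] - a[i] for i in range(3)]
    norm = math.sqrt(sum(x * x for x in axis))
    k = tuple(x / norm for x in axis)
    return [[rotate_about_axis(p, a, k, 2 * math.pi * t / u) for p in coords]
            for t in range(u)]


def n_super_structures(s: int) -> int:
    return 4 * 11 ** (s - 2)                  # N = 4 x 11^(S-2)


def n_complete_structures(s: int, u: int) -> int:
    return n_super_structures(s) * u ** s     # N = 4 x 11^(S-2) x U^S
```

If each of the seven runs in the secondary structure example is a segment, `n_super_structures(7)` is 4 × 11^5 = 644,204.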
Complete structure

Definition: Given a super structure, a corresponding complete structure is the selection of exactly one segment structure for each segment. There are

  $N = 4 \times 11^{S-2} \times U^{S}$

different complete structures.

Branching

A node is branched by either fixing a segment direction or fixing a segment structure. Since every segment needs one direction and one segment structure, the nodes at level 2 × S contain complete structures.
The performance of the algorithm depends strongly on the order in which segment directions and segment structures are fixed. The best performance is observed when segment directions are fixed as early as possible.
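The branching scheme can be illustrated with a generic depth-first branch and bound over 2S decisions, ordered with all segment directions first. This is a hedged sketch, not the author's algorithm: the decision encoding and the `lower_bound` and `cost` callables are hypothetical placeholders, and `lower_bound` must never exceed the cost of any completion of a partial assignment, or optimal nodes may be pruned.

```python
import math
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

Assignment = Dict[Hashable, object]
Decision = Tuple[Hashable, Sequence[object]]   # (slot, allowed values)


def decision_order(n_segments: int, directions: Sequence[object],
                   n_structures: int) -> List[Decision]:
    """Fix every segment direction first, then every segment structure;
    a node at depth 2 * n_segments is a complete structure."""
    return ([(("dir", i), directions) for i in range(n_segments)] +
            [(("struct", i), range(n_structures)) for i in range(n_segments)])


def branch_and_bound(decisions: Sequence[Decision],
                     lower_bound: Callable[[Assignment], float],
                     cost: Callable[[Assignment], float]
                     ) -> Tuple[Optional[Assignment], float]:
    best: Optional[Assignment] = None
    best_cost = math.inf

    def dfs(level: int, partial: Assignment) -> None:
        nonlocal best, best_cost
        if lower_bound(partial) >= best_cost:
            return                       # prune: cannot beat the incumbent
        if level == len(decisions):
            c = cost(partial)
            if c < best_cost:
                best, best_cost = dict(partial), c
            return
        slot, options = decisions[level]
        for value in options:            # branch: fix this slot
            partial[slot] = value
            dfs(level + 1, partial)
            del partial[slot]

    dfs(0, {})
    return best, best_cost
```

Changing only `decision_order` (for example, alternating direction and structure per segment) is enough to compare branching orders on the same bound and cost functions.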
Branching example

[Figure: branch tree for three segments (helix, coil, helix), expanded over steps 1 to 5]
Energy example

  Amino acid sequence         M   L   S   D   E   D   F   K   A   V   F   G   M   T   R
  Desired contact numbers     5   8  10  11  10  11  13  12  13  10   9  10   8   7   7
  Structure contact numbers   7   9  12  10  12  10  13  12  10  12   9  10   7   8   9
  Difference                  2   1   2   1   2   1   0   0   3   2   0   0   1   1   2
  Difference squared          4   1   4   1   4   1   0   0   9   4   0   0   1   1   4

  4 + 1 + 4 + 1 + 4 + 1 + 0 + 0 + 9 + 4 + 0 + 0 + 1 + 1 + 4 = 34
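The arithmetic of the energy example can be reproduced in a few lines. The function names mirror CNdiff and E(Z, Q); splitting the 15 residues into segments of 6, 4 and 5 residues is inferred from the per-segment sums 15, 13 and 6 that appear with the definition, so treat that split as an assumption.

```python
from typing import List, Sequence, Tuple


def cn_diff(desired: Sequence[int], actual: Sequence[int]) -> int:
    """Sum of squared contact-number differences over one segment."""
    return sum((d - a) ** 2 for d, a in zip(desired, actual))


def energy(desired: Sequence[int], actual: Sequence[int],
           segment_lengths: Sequence[int]) -> Tuple[List[int], int]:
    """Per-segment CNdiff values and their sum E."""
    per_segment = []
    start = 0
    for length in segment_lengths:
        end = start + length
        per_segment.append(cn_diff(desired[start:end], actual[start:end]))
        start = end
    return per_segment, sum(per_segment)


desired = [5, 8, 10, 11, 10, 11, 13, 12, 13, 10, 9, 10, 8, 7, 7]
actual = [7, 9, 12, 10, 12, 10, 13, 12, 10, 12, 9, 10, 7, 8, 9]
per_segment, total = energy(desired, actual, [6, 4, 5])  # assumed split
# per_segment == [15, 13, 6], total == 34
```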
Definition

  $CN_{diff}(i) = \sum_{j=0}^{s_i} \left( CN^{d}_{i,j} - CN_{i,j} \right)^2$

  $E(Z, Q) = \sum_{i=0}^{N} CN_{diff}(i)$

Here $CN^{d}_{i,j}$ and $CN_{i,j}$ are the desired and the structure contact number of residue j in segment i. In the energy example, the three segments (residues 1-6, 7-10 and 11-15) give $CN_{diff}$ = 15, 13 and 6, so E = 34.

Lower bound

  $CNL_{diff}(i) = \min_{Q \in \mathcal{Q}_Z} CN_{diff}(i) \le \min_{Q \in \mathcal{Q}_Z^{OPT}} CN_{diff}(i)$

Problem 1: Given a segment i with a fixed segment structure $q_a \in Q_i$, choose exactly one of the allowed segment structures $q_b \in Q_j$ for each segment $1 \le j \le N$, $j \ne i$, such that $CN_{diff}$ of segment i with structure $q_a$ is minimized. Call this value $CNL_{diff}(a, i)$. Let $CNL_{diff}(i)$ be the smallest of these values over all $q_a \in Q_i$.

Bound example

  Segment   Black   Red
  1         (1,0)   (2,1)
  2         (1,1)   −
  3         (0,1)   (0,2)

  R = (2,3)
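The bound example can be checked by exhaustive search. This sketch reads the table as one row of candidate vectors per segment (segment 2 has a single option, like the segment i whose structure is fixed in Problem 1) and assumes that the chosen vectors add up; both readings are assumptions. Exhaustive search tries every combination, so its running time is proportional to the product of the row lengths.

```python
import itertools
import math
from typing import Optional, Sequence, Tuple

Vec = Tuple[int, ...]


def best_selection(rows: Sequence[Sequence[Vec]], r: Vec
                   ) -> Tuple[float, Tuple[Vec, ...], Vec]:
    """Try every way of picking one vector per row; return
    (distance to r, chosen vectors, their sum) for the closest sum."""
    best: Optional[Tuple[float, Tuple[Vec, ...], Vec]] = None
    for choice in itertools.product(*rows):
        total = tuple(sum(coords) for coords in zip(*choice))
        d = math.dist(total, r)
        if best is None or d < best[0]:
            best = (d, choice, total)
    if best is None:
        raise ValueError("every row needs at least one vector")
    return best


rows = [[(1, 0), (2, 1)],   # segment 1: black, red
        [(1, 1)],           # segment 2: black only
        [(0, 1), (0, 2)]]   # segment 3: black, red
d, choice, total = best_selection(rows, (2, 3))
# black, black, red: (1,0) + (1,1) + (0,2) = (2,3), so d == 0.0
```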
Another formulation

Problem 2: Given a matrix M of d-dimensional vectors with Y rows of X entries each, and a d-dimensional result vector R, choose exactly one vector from each row of M. Let S be the sum of the chosen vectors, and let D be the Euclidean distance between S and R. Choose the vectors of M such that D is minimized.

Input: matrix M of 2-dimensional vectors (Y rows, X columns) and result vector R
Output: minimal D as defined in Problem 2

  S1 ← ∅
  for j ← 1 to X do
      S1.insert(M_{1,j})
  end for
  for i ← 2 to Y do
      S2 ← ∅
      for j ← 1 to X do
          for all m ∈ S1 do
              S2.insert(m + M_{i,j})
          end for
      end for
      S1 ← S2
  end for
  return the minimal distance between a vector in S1 and R

Because S1 and S2 are sets, equal partial sums are stored only once.

Example

  M = | (2,4)  (3,2)  (9,1) |
      | (7,8)  (0,1)  (3,4) |
      | (5,5)  (1,2)  (3,3) |

  R = (23,12):             S = (21,14),  D = 2√2
  First coordinates only:  R = 23,  S = 21,  D = 2
  Second coordinates only: R = 12,  S = 12,  D = 0

Results (work in progress)

Protein 2GB1, 56 amino acids.
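The set-based algorithm for Problem 2 translates almost line by line into Python; this is a sketch, with function names chosen for illustration. With integer vectors, the number of distinct sums is bounded by the range of reachable coordinates rather than by X^Y. Ties between equally close sums are broken arbitrarily.

```python
import math
from typing import Sequence, Set, Tuple

Vec = Tuple[int, ...]


def reachable_sums(m: Sequence[Sequence[Vec]]) -> Set[Vec]:
    """All distinct sums obtained by choosing one vector from each row of m."""
    s1: Set[Vec] = set(m[0])
    for row in m[1:]:
        s2: Set[Vec] = set()               # fresh set for this row
        for v in row:
            for partial in s1:
                s2.add(tuple(a + b for a, b in zip(partial, v)))
        s1 = s2
    return s1


def min_distance(m: Sequence[Sequence[Vec]], r: Vec) -> Tuple[float, Vec]:
    """Minimal Euclidean distance D between a reachable sum S and r."""
    best = min(reachable_sums(m), key=lambda s: math.dist(s, r))
    return math.dist(best, r), best


M = [[(2, 4), (3, 2), (9, 1)],
     [(7, 8), (0, 1), (3, 4)],
     [(5, 5), (1, 2), (3, 3)]]
D, S = min_distance(M, (23, 12))   # S == (21, 14), D == 2 * sqrt(2)
```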