Protein Structure Prediction
N. Gautham
Department of Crystallography and Biophysics
University of Madras, Guindy Campus
Chennai 600 025

Lecture Outline
• The problem of protein structure prediction: its statement
• The Levinthal paradox and computational complexity
• Methods of structure prediction
• Ab initio methods
  - Genetic algorithms
  - Potential energy minimisation
  - MOLS

Information Transfer Pathway within the Cell
[Figure: DNA (…ATGCATGCATGCATGCATGC…) → RNA (…CGUACGUACGUACGU…) → decoding mechanism → protein sequence → protein structure → biological function]

Statement of the Problem
• The sequence of amino acids in a protein (e.g. …AVTYRGSED…) determines its three-dimensional structure
• The structure of a protein is essential for its function

Statement of the Problem
• Structures are determined experimentally using X-ray crystallography and NMR
• This is expensive and time-consuming
• Can this be done using computers instead?
• The problem: given the sequence of a protein, can we use available information from physics, chemistry (and databases of previously determined structures, etc.) to calculate its three-dimensional structure?

Levinthal Paradox
• The Levinthal paradox arises when we consider protein folding as a thermodynamic phenomenon (driven by entropy). This means:
  - the native fold of a protein is its minimum-energy state
  - the protein folds by sampling its conformational space to find the conformation with the least energy
• With known algorithms, the time taken to search all possibilities increases exponentially with the size of the protein
• THE PARADOX: in real life, proteins fold in a few milliseconds, yet an exhaustive search would be expected to take centuries, or far longer!
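Levinthal's argument can be made concrete with a back-of-the-envelope count. The sketch below assumes, purely for illustration, that each residue independently adopts 3 backbone conformations and that the chain tries 10^13 conformations per second; these are commonly quoted textbook figures, not numbers from this lecture, and the function name is hypothetical.

```python
# Back-of-the-envelope Levinthal estimate.
# ASSUMPTIONS (illustrative, not from the lecture): each residue independently
# adopts STATES_PER_RESIDUE conformations, and the chain visits one conformation
# every 1 / SAMPLES_PER_SECOND seconds, never repeating itself.
STATES_PER_RESIDUE = 3
SAMPLES_PER_SECOND = 1e13
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_search_time(n_residues: int,
                          states_per_residue: int = STATES_PER_RESIDUE,
                          samples_per_second: float = SAMPLES_PER_SECOND) -> tuple[int, float]:
    """Return (number of conformations, years needed to visit all of them)."""
    conformations = states_per_residue ** n_residues  # exponential in chain length
    seconds = conformations / samples_per_second
    return conformations, seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (10, 50, 100):
        count, years = levinthal_search_time(n)
        print(f"{n:4d} residues: {count:.2e} conformations, about {years:.1e} years")
```

Under these assumptions a 50-residue chain already needs roughly 2 × 10^3 years and a 100-residue chain roughly 10^27 years, which is the exponential growth that makes a blind search impossible on the millisecond timescale of real folding.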
• In the language of computational complexity, protein structure prediction (or protein folding) is a hard, NP-hard, problem

Levinthal Paradox
[Figure: the 'golf course' model of the potential energy landscape]

Levinthal Paradox: The New View of Protein Folding
[Figure: the 'folding funnel' model of the potential energy landscape]

Computational Complexity
• If the computation time of an algorithm increases as a polynomial function of the size n of the problem, it is a 'polynomial time' algorithm, and problems it solves belong to the class P, e.g. $T(n) = c_1 n^2 + c_2 n^5$
• If the computation time increases as an exponential function of n, e.g. $T(n) = c \cdot 2.5^{n}$, the algorithm is not polynomial time. No polynomial-time algorithm is known for NP-hard problems, and the best known exact algorithms for them typically take exponential time. (NP stands for 'nondeterministic polynomial time', not 'non-polynomial': it is the class of problems whose proposed solutions can be checked in polynomial time.)

Computational Complexity
• The Travelling Salesman Problem: an example of an NP-hard problem
[Figure: six cities, City 1 to City 6, joined by a tour]
• Problem: find the shortest closed path that visits every city exactly once and returns to the starting city
• The number of paths to be tried grows faster than exponentially with the number of cities: for n cities there are $(n-1)!/2$ distinct closed tours (60 for the six cities shown)

Computational Complexity
• NP-hard problems in computational biology:
  - Construction of phylogenetic trees
    [Figure: alternative tree topologies relating four sequences A, B, C, D]
  - Multiple sequence alignment
  - Protein structure prediction

Methods of Protein Structure Prediction
• Homology modelling
• Fold recognition
• Ab initio methods
  - Genetic algorithms
  - Potential energy minimisation
  - MOLS

Ab initio Methods: Genetic Algorithms
• a.k.a. 'Evolutionary Computation'
• The method operates on pieces of information (as Nature operates on genes)
• Start with a group of individuals (possibly binary coded) that represent possible solutions to the problem
• Apply mutation, variation and crossover operators to the individuals
• From the resulting population, select individuals with high fitness to populate the next generation
• Iterate until the best individual is obtained

Genetic Algorithms: Application to Protein Structure Prediction
• The initial generation consisted of protein structures with randomly chosen torsion angle values
• The fitness function was a semi-empirical potential energy function, i.e. $E = E_{\mathrm{vdW}} + E_{\mathrm{el}} + E_{\mathrm{tor}} + E_{\mathrm{pseudoentropy}}$ (lower energy means higher fitness)
• The mutate operator randomly changed torsion angle values
• The variate operator made small, random increments or decrements to torsion angle values
• The crossover operator interchanged portions of randomly selected pairs of individuals

Potential Energy Minimisation
• Minimise the potential energy (least squares, conjugate gradient, molecular dynamics, …)
• The problem: where to start? How to avoid local minima?
• Many methods:
  - Build-up method
  - Conformational space annealing
  - Monte Carlo minimisation
  - Diffusion equation and distance scaling
  - Simulated annealing
  - …

Potential Energy Minimisation: Build-up Method
[Figure: Step 1: a short fragment (residues 1 to 5) is minimised; Step 2: the chain is extended (residues 6 to 10) and the longer chain (1 to 10) is minimised]

Mutually Orthogonal Latin Squares
OBJECTIVE: To build a library of the lowest-energy conformations of an oligopeptide

Mutually Orthogonal Latin Squares
METHOD (IN BRIEF): the MOLS cycle
• Parameterise the search space
• Use these parameters to build a set of MOLS (chosen at random) to sample the space globally
• Analyse the samples to obtain a low-energy conformation (this is followed by gradient minimisation)
• Another low-energy conformation required? If yes, repeat the cycle
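The sampling idea behind MOLS can be illustrated with the standard construction of mutually orthogonal Latin squares of prime order p, in which square k has entry (k·i + j) mod p. Used as an experimental design, these squares give p² combinations of torsion-angle values in which every pair of angles takes every pair of values exactly once, instead of the p^N combinations of an exhaustive search over N angles. This is a minimal sketch under stated assumptions (prime p, at most p + 1 angles, hypothetical function names and angle grid); it is not the MOLS program itself, and the energy evaluation and analysis steps of the MOLS cycle are omitted.

```python
def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def latin_squares(p: int) -> list[list[list[int]]]:
    """The p - 1 mutually orthogonal Latin squares L_k[i][j] = (k*i + j) mod p, k = 1..p-1.
    Orthogonality relies on p being prime, so every k != 0 is invertible mod p."""
    if not is_prime(p):
        raise ValueError("this construction needs a prime order p")
    return [[[(k * i + j) % p for j in range(p)] for i in range(p)]
            for k in range(1, p)]


def mols_samples(angle_values: list[float], n_angles: int) -> list[tuple[float, ...]]:
    """Return p*p conformations (tuples of torsion angles) such that every pair of
    angles takes every pair of values from angle_values exactly once."""
    p = len(angle_values)
    if not 2 <= n_angles <= p + 1:
        raise ValueError("need 2 <= n_angles <= p + 1")
    squares = latin_squares(p)[: n_angles - 2]
    samples = []
    for i in range(p):          # row index -> value level of angle 1
        for j in range(p):      # column index -> value level of angle 2
            # remaining angles take their levels from the Latin squares
            levels = [i, j] + [sq[i][j] for sq in squares]
            samples.append(tuple(angle_values[lv] for lv in levels))
    return samples


if __name__ == "__main__":
    # Hypothetical grid: 7 equally spaced torsion values in [-180, 180) degrees.
    grid = [-180 + k * 360 / 7 for k in range(7)]
    confs = mols_samples(grid, n_angles=4)
    print(len(confs), "samples instead of", 7 ** 4)  # 49 samples instead of 2401
```

The slides choose the set of MOLS at random in each cycle; one way to do that here (an assumption) would be to relabel the symbols of each square with a random permutation, which preserves orthogonality.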
Mutually Orthogonal Latin Squares
Results: the 23 best structures for Met-enkephalin
[Figure: the 23 best structures for Met-enkephalin]

GA and MOLS
MOLS:
• The sequence is split into overlapping fragments of five, seven or nine residues
• A conformational search is carried out for all fragments using MOLS, yielding ~1000 structures each
• The resulting structures are clustered
• This gives a library of structures for each fragment
GA:
• Initial population of 50 individual structures for the sequence
• Mutations
• Variations (drawing on structures from the MOLS libraries)
• Crossing over
• Selection of individuals with lower energy

Avian pancreatic polypeptide (36 residues): RMSD 4.0 Å
[Figure: prediction vs. experiment (X-ray crystallography)]

Villin headpiece (36 residues): RMSD 5.2 Å
[Figure: prediction vs. experiment (X-ray crystallography)]

Tryptophan zipper (16 residues): RMSD 2.7 Å
[Figure: prediction vs. experiment (NMR)]

Bovine pancreatic trypsin inhibitor (58 residues, 3 disulphide bridges)
[Figure: prediction vs. experiment (X-ray)]

Protein Structure Prediction: Conclusion
• If the sequence of a protein of unknown structure has greater than 40% identity with one of known structure, the structure prediction problem may be considered solved, especially with the structural genomics initiative
• Ab initio structure prediction, using knowledge only of the sequence and of physics and chemistry, is as yet an unsolved problem
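The 40% threshold in the conclusion refers to sequence identity between the protein of unknown structure and one of known structure. Percent identity is computed from a pairwise alignment, and conventions differ in the denominator; the sketch below assumes the common choice of counting identical residues over aligned (non-gap) columns. The function name, the gap character '-' and the toy sequences are illustrative assumptions, and the alignment itself is taken as given.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: the denominator is the number of columns where neither
    sequence has a gap; other conventions (e.g. length of the shorter
    sequence) also exist and give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)  # True counts as 1
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    target, template = "AVTYRGSED", "AVSYKGSED"  # toy alignment, not real proteins
    pid = percent_identity(target, template)
    print(f"{pid:.1f}% identity; above 40%: {pid > 40}")  # 77.8% identity; above 40%: True
```

Because the denominator convention changes the value, a threshold such as 40% is only meaningful together with the way identity was computed.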