3D Structures of Biological Macromolecules
Part 5: Protein Structure Prediction - II
Jürgen Sühnel
[email protected]
Institute of Molecular Biotechnology, Jena Centre for Bioinformatics, Jena / Germany
Supplementary Material: http://www.imb-jena.de/www_bioc/3D/

Molecular Mechanics (Force Field)
http://cmm.info.nih.gov/modeling/guide_documents/molecular_mechanics_document.html

How Do We Get the Parameters?
Experimental data (examples: geometrical parameters)
Quantum-chemical calculations (examples: charges)

Quantum Chemistry

Geometry Optimization

Optimization Methods

Optimization Methods – Steepest Descent
Selection of an initial point x0
Determination of the direction and step size for calculating the next point

Optimization Methods – Conjugate Gradients Method

Optimization Methods – Newton-Raphson Methods
g - gradient, h - Hessian

Molecular Dynamics

Simulation of Protein Folding – Molecular Dynamics
AMBER, GROMOS, CHARMM, TINKER

Molecular Dynamics Simulation
Protein Capsid Of Filamentous Bacteriophage Ph75 From Thermus Thermophilus
1HGV: extended structure; actual structure; 61% helix at 1.928 ns; 75% helix at 3.428 ns
Images created using VMD (Visual Molecular Dynamics) (Humphrey, W., Dalke, A. and Schulten, K., 1996. VMD - Visual Molecular Dynamics. Journal of Molecular Graphics, 14, pp. 33-38).

Molecular Dynamics Packages
amber.scripps.edu
www.igc.ethz.ch/gromos/
www.charmm.org
dasher.wustl.edu/tinker/

Visualizing and Analyzing Molecular Dynamics Simulations
www.ks.uiuc.edu/Research/vmd/

Folding Surface for Lysozyme
Dobson, Sali, Karplus, Angew. Chem. Int. Ed. 1998, 37, 868.

Protein Folding States
Dobson, Sali, Karplus, Angew. Chem. Int. Ed. 1998, 37, 868.

Monitoring Protein Folding by Experimental Methods
Dobson, Sali, Karplus, Angew. Chem. Int. Ed. 1998, 37, 868.
Plaxco, Dobson, Curr. Opin. Struct. Biol. 1996, 6, 630.

Protein Folding by Molecular Dynamics
Villin headpiece domain (PDB code: 1vii): 36 amino acids, actin binding site highlighted

Radius of Gyration
The radius of gyration $R_g$ is defined as the root-mean-square distance between all atoms in a molecule and their centroid. For a globular protein, $R_g$ can be predicted with reasonable accuracy from the relationship $R_g^{\mathrm{pred}} = 2.2\,N^{0.38}$, where $N$ is the number of amino acids.

Statistical Potentials
$w_{ij}(r) = -kT \ln\left[\rho_{ij}(r) / \rho^{*}_{ij}\right]$
where $w_{ij}(r)$ is the interaction free energy, $\rho_{ij}(r)$ the pair density and $\rho^{*}_{ij}$ the reference pair density at infinite separation ($k$: Boltzmann constant, $T$: temperature).
Statistical potentials can be determined by simply counting interactions of a specific type in a dataset of experimental structures. The distance dependence may or may not be taken into account; if it is not, the interaction free energy is usually called a contact potential, which represents an average over distances shorter than some cutoff distance $r_c$.
Thomas, Dill, J. Mol. Biol. 1996, 257, 457-469

Lattice Folding

Lattice Algorithm
• Red = hydrophobic, Blue = hydrophilic
• If Red is near empty space: E = E + 1
• If Blue is near empty space: E = E - 1
• If Red is near another Red: E = E - 1
• If Blue is near another Blue: E = E + 0
• If Blue is near Red: E = E + 0

Ab Initio Protein Structure Prediction
http://rosettadesign.med.unc.edu/

Ab Initio Protein Structure Prediction - Rosetta
Structure representation: Only the main-chain heavy atoms and the Cβ atom of each side chain are taken into account. Bond lengths and bond angles are held constant and correspond to alanine geometry. The only remaining geometrical variables are the backbone torsion angles.
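The radius-of-gyration definition and the empirical estimate from the Radius of Gyration slide can be checked with a short pure-Python sketch. It weights all atoms equally (following the slide's definition rather than mass-weighting), and it assumes that the relationship 2.2 N^0.38 yields Rg in ångströms; the function names are hypothetical.

```python
import math

def radius_of_gyration(coords):
    """Rg as the root-mean-square distance of all atoms from their centroid.
    Atoms are unweighted, following the slide's definition (no masses)."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(p, centroid) ** 2 for p in coords) / n)

def predicted_rg(n_residues):
    """Empirical estimate for a globular protein: Rg(pred) = 2.2 * N**0.38
    (units assumed to be ångströms)."""
    return 2.2 * n_residues ** 0.38

# Usage: expected Rg of a compact 36-residue domain such as the villin headpiece.
print(f"{predicted_rg(36):.2f}")  # prints 8.59
```

Comparing `radius_of_gyration` of a simulation snapshot against `predicted_rg` for the same chain length gives a quick, rough indication of whether a folding trajectory has reached a compact, globular state.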
Structure generation: Fragment libraries (fragments of 3 and 9 amino acids) are generated from experimental structures, and fragments of proteins of known structure with similar sequences are spliced together. The conformational space defined by these fragments is then searched by a Monte Carlo procedure with an energy function that favors compact structures with paired β-strands and buried hydrophobic amino acids. A total of 1000 independent simulations, starting from different random number seeds, are carried out for each query sequence, and the resulting structures are clustered.
Initial evaluation by the scoring function: low-scoring conformations are identified by simulated annealing with a move set that replaces the torsion angles of a segment of the chain with those of a fragment with a related amino acid sequence.
Further evaluation by

Protein Backbone Torsion Angles and Ramachandran Plot

Bayesian Statistics
Bayesian statistical methods differ from other types of statistics in their use of conditional probabilities.
Bayes' Theorem: P(A|B) = [P(B|A) × P(A)] / P(B)

ROSETTA Results
Simons, Strauss, Baker. J. Mol. Biol. 2001, 306, 1191-1199.

Computational Thermostabilization
Prediction of stabilizing mutations with Rosetta Design
PDB code: 1ox7
Cytosine deaminase (CD) catalyzes the deamination of cytosine (converting cytosine to uracil) and is present only in prokaryotes and fungi, where it is a member of the pyrimidine salvage pathway. The enzyme is of interest both for antimicrobial drug design and for gene therapy applications against tumors.
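The Monte Carlo / simulated-annealing search described for Rosetta can be illustrated with a generic Metropolis annealer whose moves overwrite a window of torsion angles with a fragment. This is only a sketch under stated assumptions: the fragment list, the geometric cooling schedule and `toy_energy` (every angle simply prefers -60°) are hypothetical stand-ins, not Rosetta's fragment libraries, move set or scoring function.

```python
import math
import random

def anneal(energy, start, fragments, steps=5000, t_start=2.0, t_end=0.05, seed=0):
    """Metropolis simulated annealing with fragment-replacement moves.

    start: list of torsion angles (degrees). fragments: equal-length tuples of
    angles that overwrite a random window of the chain (a toy stand-in for a
    fragment library). Returns the lowest-energy state seen and its energy."""
    rng = random.Random(seed)
    frag_len = len(fragments[0])
    current, e_current = list(start), energy(start)
    best, e_best = list(current), e_current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        trial = list(current)
        pos = rng.randrange(len(trial) - frag_len + 1)
        trial[pos:pos + frag_len] = rng.choice(fragments)
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-dE / T).
        if e_trial <= e_current or rng.random() < math.exp((e_current - e_trial) / t):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best

def toy_energy(angles):
    """Hypothetical energy: every torsion angle prefers -60 degrees."""
    return sum(1.0 - math.cos(math.radians(a + 60.0)) for a in angles)

# Hypothetical 3-residue fragments and a fully extended 9-angle start.
fragments = [(-60.0, -60.0, -60.0), (-120.0, 130.0, -60.0), (60.0, 45.0, -90.0)]
best, e_best = anneal(toy_energy, [180.0] * 9, fragments)
print(round(e_best, 3), best)
```

Because uphill moves are accepted with probability exp(-ΔE/T), the search can escape local minima while the temperature is high and becomes essentially greedy as T falls. In the protocol described above, many such runs are started from different random seeds and the resulting structures are clustered.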
Computational Thermostabilization
Superposition of double and triple mutant structures (PDB codes: 1ysb, 1ysd)
Mutations: A23L, I140L, V108I

Comparing Protein Structures
• The RMSD is a measure to quantify structural similarity
• Requires 2 superimposed structures (designated here as “a” and “b”)
• N = number of atoms being compared
$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[(x_{ai}-x_{bi})^2 + (y_{ai}-y_{bi})^2 + (z_{ai}-z_{bi})^2\right]}$

Comparing Protein Structures
http://wishart.biology.ualberta.ca/SuperPose/
http://www.ebi.ac.uk/DaliLite/
http://cl.sdsc.edu/

Comparing Protein Structures – SuperPose Server
Starting from an input PDB file or set of files, SuperPose first extracts the sequences of all chains in the file(s). Each sequence pair is then aligned using a Needleman–Wunsch pairwise alignment algorithm. If the pairwise sequence identity falls below the default threshold (25%), SuperPose determines the secondary structure using VADAR (Volume, Area, Dihedral Angle Reporter) and performs a secondary structure alignment using a modified Needleman–Wunsch algorithm. After the sequence or secondary structure alignment is complete, SuperPose generates a difference distance (DD) matrix between aligned alpha carbon atoms. To build a DD matrix, the distances between all pairs of Cα atoms in one molecule are first calculated, giving an initial distance matrix. A second pairwise distance matrix is calculated for the second molecule and, for equivalent (aligned) Cα atoms, the two matrices are subtracted from one another, yielding the DD matrix. The DD matrix allows the structural similarity or dissimilarity of two structures to be assessed quantitatively; the difference distance method is particularly good at detecting domain or hinge motions in proteins.
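The DD matrix and the RMSD formula can be illustrated together in a small pure-Python sketch. Assumptions: the two coordinate lists are already aligned (atom i of `a` corresponds to atom i of `b`). The superposition uses Horn's closed-form quaternion formulation, in which the minimal sum of squared deviations equals the total centered sum of squares minus twice the largest eigenvalue of a 4×4 symmetric matrix. This is a close relative of, not a reproduction of, the Kearsley/PDBSUP-based code used by SuperPose, and the small Jacobi eigenvalue routine stands in for a linear-algebra library. All function names are hypothetical.

```python
import math

def distance_matrix(coords):
    """All pairwise Euclidean distances (e.g. between aligned C-alpha atoms)."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def difference_distance_matrix(a, b):
    """DD matrix: distance matrix of structure a minus that of structure b.
    Assumes a[i] and b[i] are equivalent (aligned) atoms."""
    da, db = distance_matrix(a), distance_matrix(b)
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(da, db)]

def plain_rmsd(a, b):
    """RMSD formula applied to the coordinates as given (no superposition)."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def _centered(coords):
    """Translate coordinates so that their centroid is at the origin."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in coords]

def _jacobi_eigenvalues(m, max_sweeps=50):
    """Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in m]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-30:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]

def superposed_rmsd(a, b):
    """Minimum RMSD over rigid-body rotations and translations of a onto b,
    from the largest eigenvalue of Horn's 4x4 quaternion matrix."""
    A, B = _centered(a), _centered(b)
    g = sum(x * x for p in A for x in p) + sum(x * x for p in B for x in p)
    # Correlation matrix S[i][j] = sum over atoms of A_i * B_j.
    S = [[sum(pa[i] * pb[j] for pa, pb in zip(A, B)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    K = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam_max = max(_jacobi_eigenvalues(K))
    # Sum of squared deviations after the optimal rotation = g - 2 * lam_max.
    return math.sqrt(max(0.0, (g - 2.0 * lam_max) / len(a)))
```

For a copy of a structure that has only been rotated and translated, the DD matrix is zero and `superposed_rmsd` is close to zero, while `plain_rmsd` on the unsuperposed coordinates can be large; a non-zero block in the DD matrix, by contrast, points to a region that moved relative to the rest, such as a hinge-bending domain.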
SuperPose analyzes the DD matrices and identifies the largest contiguous domain between the two molecules that exhibits a difference of less than 2.0 Å. Using the information derived from the sequence alignment and the DD comparison, the program then decides which regions should be superimposed and which atoms should be counted in calculating the RMSD. This information is then fed into the quaternion superposition algorithm and the RMSD calculation subroutine. The quaternion superposition program is written in C and is based on both Kearsley's method and the PDBSUP Fortran program developed by Rupp and Parkin. Quaternions were developed by W. Hamilton (the mathematician/physicist) in 1843 as a convenient way to parameterize rotations in a simple algebraic fashion. Because algebraic expressions can be evaluated more rapidly than trigonometric expressions on computers, the quaternion approach is exceedingly fast. SuperPose can calculate both pairwise and multiple structure superpositions (using standard hierarchical methods) and can generate a variety of RMSD values for alpha carbons, backbone atoms, heavy atoms and all atoms (average and pairwise). When identical sequences are compared, SuperPose also generates ‘per residue’ RMSD tables and plots that allow users to identify, assess and view individual residue displacements.
http://wishart.biology.ualberta.ca/SuperPose/

Comparing Protein Structures
http://www-structure.llnl.gov/xray/comp/suptext.htm

IMB LINUX Cluster by IBM
1 frontend, 2 storage nodes, 26 compute nodes
Compute nodes: 2 × 2.4 GHz Intel Xeon™ processors, 1 GByte RAM, 40 GByte local IDE hard disk
Frontend: mirrored 73 GByte SCSI disk
Interconnect: Myrinet
Disk array: 10 × 73 GByte Fibre Channel disks
Operating system: Red Hat Linux 7.3
Cluster software: CSM (Cluster Systems Management), GPFS (General Parallel File System)

Cluster vs. Grid Computing
Clusters are made up of dedicated components, and all components in a cluster are exclusively owned and managed as part of the cluster. All resources are known, fixed and usually uniform in configuration; it is a static environment. Grids differ from clusters in that they share resources from and among independent system owners. Grids are configured from computer systems that are individually managed and used both as independent systems and as part of the grid. Thus, individual components are not 'fixed' in the grid, and the overall configuration of the grid changes over time. The result is a dynamic system that continually assesses and optimises its utilisation of resources.

EUROGRID - BioGRID
www.eurogrid.org/wp1.html

Simulation of Protein Folding
(figure: thousand trillion FLOPs)

IBM Blue Gene Project | System-on-a-Chip Approach
~65,000 processors
teraflop – a trillion floating point operations per second