Protein Structure Validation
Or How to Know When Your Model is Finished
Tristan Fiedler
ACA Summer School 2003

Protein Structure Validation
• Types of Model Building Errors
• Why Errors are Made
• How to Detect & Avoid Errors
• Available Tools & References

References
• Intl Tables for Cryst. Vol. F: Crystallography for Biological Macromolecules. Rossmann & Arnold, eds. Ch. 21
• Methods in Enzymology. Vol 277, Part B. Carter & Sweet, eds. 1997. Ch. 10
• Crystallography Made Crystal Clear. Rhodes. 2000. Academic Press
• Principles of Protein X-ray Crystallography. Drenth. 1994. Springer-Verlag
• http://www.usm.maine.edu/~rhodes/CMCC
• http://www.sb.fsu.edu/~chapman/Classes/Crystallography/password/Structure_Assessment_files/v3_document.htm
• http://como.bio.columbia.edu/tong/Public/Replace/tf/contents/contents.html
• http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
• http://xray.bmc.uu.se/gerard/embo2001/modval/03.html
• http://www.bmsc.washington.edu/CrystaLinks/validation.html
• http://kinemage.biochem.duke.edu/validation/valid.html
• http://www.cmbi.kun.nl/whatif/
• http://www.ccp4.ac.uk/html/sfcheck.html
• http://xray.bmc.uu.se/usf/gmrp_man.html
• http://xray.bmc.uu.se/~gerard/gmrp/gmrp.html

Times to Celebrate!
Protein - Crystal - Phases - Model - Paper
Backbone - Fit Sequence - Coarse Model - Optimize - Evaluate

Types of Errors
• Entirely Wrong Fold
• Local Fold Incorrect
  – Individual Subunits
  – Connectivity between 2° Structure
  – ALWAYS consult Experimental Map
• Main Chain thru Side Chain (> 2.7 Å)
• Frameshift
  – Loops
  – 2n : May go undetected
• Side Chain Conformations
  – Widespread initially
  – Alter rotamer not torsion angles
  – Usu. Corrected by Automated refinement

Why Are Modeling Errors Made?
• Phase Errors (MIR, MAD, MR) affect maps
  – Better : Area Detectors, Cryo, Density Modif, Synchrotron Radiation
• Resolution
  – w/o φ errors, trace chain at > 4 Å
• Inexperience
• Pressure to Publish
• Structure Factors Not Deposited
  – Difficult to Assess PDB Model Quality
• Good Enough R-factor …

Crystallographic R factor
• 59% - randomly placed atoms
• 30 to 50% unrefined - essentially correct
• >30% refined - incorrect structure
• 25 to 30% refined - minor errors
• 15 to 20% refined - essentially correct
• 0% - perfect / currently unattainable!
http://www.sb.fsu.edu/~chapman/Classes/Crystallography/password/Structure_Assessment_files/v3_document.htm

Improper R factor Manipulation
• Removal of Diffraction Data
  – Resolution Limits
  – I/σ cutoff
• Reduce Weight of Restraints
• Increase # Parameters in Refinement
  – Remove NCS averaging
  – Inappropriate Temp. Factor Model
• “Overzealous Modelling” - Overfitting
  – Alternate Conformations
  – Excessive Solvent Modelling
    • ~2 Å : 1 H2O per amino acid residue
    • ~1 Å : 1.6 H2O per amino acid residue
    • Isoelectronic Ions: Na+, NH4+

How to Detect & Avoid Errors in Protein Modelling

Avoiding Errors in Modelling (1)
• Obtain highest quality data
• Treat Model as Hypothesis
• Verify Chemical & Biochemical Sense
• Refer often to the Experimental Map
• Use Info from Well-Refined Structures
  – Engh & Huber Bond Lengths & Angles
  – Implemented in “O” - OOPS
    • Rotamer Libraries
    • Main Chain Fragments

Avoiding Errors in Modelling (2)
• After each Refinement cycle, assess:
  – RSR, omit maps, Ramachandran plot
  – Pep-flip (outlier > 2.5 Å)
  – Rotamer sc-fit (outlier > ~ 1.0 Å)
  – Hydrogen bonding (e.g.
H, N, Q, H2O)
  – Unusual B factors, bond lengths & angles
  – Deviations from planarity
  – Large Positional Shifts

Ramachandran Plot
http://xray.bmc.uu.se/gerard/embo2001/modval/03.html
phi (C[i-1]-N[i]-CA[i]-C[i]), psi (N[i]-CA[i]-C[i]-N[i+1])

Chi-1 Torsion Angle Distribution
http://xray.bmc.uu.se/gerard/embo2001/modval/03.html

Chi-1 Chi-2 Torsion Angle Distribution
[Figure: Chi-1 / Chi-2 distribution plot; N=47,000 residues]
http://xray.bmc.uu.se/gerard/embo2001/modval/03.html

R factor Alternatives
• Free R (global)
  – Brünger, 1992. Nature 355: 472
  – Much more reliable; Harder to manipulate
  – Use to monitor model improvement
  – Helps reduce over-fitting of data
  – Rfree - Rcryst < ~ 0.05 (good models)
  – Exclude ~ 5% of data (500 - 2000 reflections)
• Real Space R (local)
  – Jones et al., 1991. Acta Cryst A 47: 110
  – Compares observed & calculated density
  – Subsets : residues, main chain, side chain

Tools for Detecting & Correcting Errors in Model Building

OOPS : Efficient rebuilding of protein structures
• Real-space fit of model to density
  – Maps : 2Fo-Fc, 3Fo-2Fc, SA-omit, NCS averaged
• Main Chain Geometry
  – Pep-flip, phi/psi, peptide bond planarity
• Side Chain Geometry
  – Rotamer fit, Cα chirality
• Temperature factors & occupancies
  – User-determined cutoff values
• Mask Violations
  – For Electron-Density Averaging
• Significant Shifts
  – xyz, B, occ, phi/psi, chi1/chi2

PROCHECK
procheck filename [chain] resolution
• Covalent geometry
• Planarity
• Dihedral angles
• Chirality
• Non-bonded interactions
• Main-chain hydrogen bonds
• Disulphide bonds
• Stereochemical parameters
• Parameter comparisons
• Residue-by-residue analysis
http://www.biochem.ucl.ac.uk/~roman/procheck/procheck_run.html

Procheck Sample Output
• Ramachandran Plot Quality
• Peptide Bond Planarity
• Bad Non-Bonded Interactions
• Cα Tetrahedral Distortion
• Main Chain Hydrogen Bond Energy
• Overall G

Molprobity : Sample Output
http://kinemage.biochem.duke.edu/molprobity/help/tutorial.html

Be Careful & Good Luck!
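
Worked sketch: R factor and Free R
The R-factor slides can be made concrete with a few lines of Python. This is a minimal sketch, not the procedure used by any particular refinement program. It assumes the conventional definition R = Σ| |Fo| - |Fc| | / Σ|Fo|, that the calculated amplitudes are already on the same scale as the observed ones, and that the free set is a random ~5% of reflections. The function names (r_factor, split_free_set, r_work_and_free) are hypothetical and chosen for illustration only.

```python
import random


def r_factor(f_obs, f_calc):
    """Conventional R = sum(| |Fo| - |Fc| |) / sum(|Fo|).

    Assumes f_calc is already scaled to f_obs (no scale factor is fitted here).
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Randomly set aside a fraction of reflection indices as the free set.

    Returns (work_indices, free_indices). In practice the free set is chosen
    once, before refinement, and never used to refine the model.
    """
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, round(n_reflections * free_fraction))
    return sorted(indices[n_free:]), sorted(indices[:n_free])


def r_work_and_free(f_obs, f_calc, free_fraction=0.05, seed=0):
    """Return (R over the working set, R over the free set)."""
    work, free = split_free_set(len(f_obs), free_fraction, seed)
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
    return r_work, r_free
```

With real data, f_obs and f_calc would come from the reflection file and the current model; the check from the Free R slide is then whether r_free - r_work stays below roughly 0.05.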