Download lecture 5

Chapter 4: Protein Structures II
BINF 6101/8101, Spring 2017

Protein Structure Classification
Why bother?
•  Reveals structural and evolutionary relationships
•  Maps the currently known fold space
•  Assists protein structure prediction (BINF6202/8202)
Two popular protein classification databases:
•  SCOP (Structural Classification Of Proteins) / SCOPe (extended)
   http://scop.berkeley.edu
   Latest release: v2.06 (February 2016), 244,326 domains
   Murzin et al. J. Mol. Biol. 247, 536-540, 1995
•  CATH: Class (C), Architecture (A), Topology (T) and Homologous superfamily (H)
   http://www.cathdb.info/
   Latest release: v4.1 (January 1, 2015) | CATH-B 308,999 domains
   Orengo et al. Structure, 5, 1093-1108, 1997

Classifications are Domain-based. Why?
•  Domains are the basic units for protein structure comparison/classification (protein domain databases: SCOP, CATH)
•  Structural domains are evolutionary, functional, and folding units of proteins
•  Very useful in protein structure prediction: many structural similarities are between domains
•  Protein design
Nucleic Acids Res. 1998 Jan 1;26(1):316-9.

Classifications are Domain-based
What is a protein domain?
There is no single precise definition of a protein domain.
General considerations:
•  compact, semi-independent units * (close to spherical shape)
•  interactions between domains are weak (small contact)
•  identifiable hydrophobic core ** (interface is more hydrophilic)
* Wetlaufer DB. PNAS 1973; 70:697-701
** Swindells MB. Protein Science 1995; 4:103-112
[Figure: pyruvate kinase]

Simple Cases: Continuous Domains

Adding to the Complexity: Discontinuous Domains
[Figure labels: N-terminal, C-terminal]
SCOP Classification:
px      SCOP ID   Classification   PDB    Region
33844   d1cg2a1   c.56.5.4         1cg2   A:26-213,A:327-414
39360   d1cg2a2   d.58.19.1        1cg2   A:214-326
About 20% of multidomain proteins are non-contiguous
Redfern OC. et al, PLoS Computational Biology, 2007

Multi-domain Proteins
•  ~50% of proteins are multi-domain (data from 2005)
•  It could be as high as 80% in eukaryotes
Redfern OC.
et al, PLoS Computational Biology, 2007

Hierarchical Structure Classification by SCOP
SCOP: Structural Classification of Proteins
Family: Clear evolutionary relationship
(1) Pairwise residue identities between the proteins are 30% and greater.
(2) Proteins with low sequence similarity but very similar functions and structures; for example, many globins have sequence identities of only 15%.
Superfamily: Probable common evolutionary origin
Proteins that have low sequence identities, but whose structural and functional features suggest that a common evolutionary origin is probable, are placed together in superfamilies. For example, actin, the ATPase domain of the heat shock protein, and hexokinase together form a superfamily.
Fold: Major structural similarity
(1) Proteins have the same major secondary structures in the same arrangement and with the same topological connections.
(2) Proteins placed together in the same fold category may not have a common evolutionary origin: the structural similarities could arise just from the physics and chemistry of proteins favoring certain packing arrangements and chain topologies.
Class: Secondary structure content and organization
Murzin et al. J. Mol. Biol. 247, 536-540, 1995

Hierarchical Structure Classification by SCOP
•  Mainly parallel beta sheets (α/β class)
•  Segregated alpha and beta regions, anti-parallel beta (α+β class)

Automatic Assignment in SCOPe
Fox NK, Brenner SE, Chandonia JM. 2014. SCOPe: Structural Classification of Proteins - extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Research 42:D304-309

Timeline of SCOP(e) Releases

SCOP Classification Statistics

SCOP ID
SCOP Classification:
px      SCOP ID   Classification   PDB    Region
33844   d1cg2a1   c.56.5.4         1cg2   A:26-213,A:327-414
39360   d1cg2a2   d.58.19.1        1cg2   A:214-326
Redfern OC.
et al, PLoS Computational Biology, 2007

All of these different proteins share the TIM-barrel fold, named after triosephosphate isomerase
Classes: all α, all β, α/β, α+β

Common Folds
Immunoglobulin fold
•  all-β protein fold
•  consists of 2 layers
•  ~7 antiparallel β-strands arranged in two β-sheets
TIM barrel fold
•  α/β protein fold
•  named after triosephosphate isomerase
•  eight α-helices and eight parallel β-strands
Rossmann fold
•  α/β protein fold
•  named after Michael Rossmann
•  parallel β-strands connected by α-helices

PDB: Protein Data Bank
http://www.rcsb.org/pdb/home/home.do

Protein Data Bank (PDB)
•  X-ray structures and NMR structures (modeled structures are kept in a separate place)
•  Each PDB entry has one unique 4-character ID, for example 1GUO, 1JUN, 1TAO, 1PHD

Some PDB Statistics

Protein Structure Methods: X-Ray Crystallography
[Figure label: protein purification]
Steps needed
•  Purify the protein
•  Crystallize the protein
•  Collect diffraction data
•  Calculate electron density
•  Fit residues into density
Pros
•  No size limits, well-established
Cons
•  Difficult for membrane proteins
•  Cannot see hydrogen atoms
•  Not every protein can be crystallized!
Images from PDB and “Protein Structure and Function” by Gregory A Petsko and Dagmar Ringe

Analysis of Diffraction Pattern
•  The diffraction pattern is analyzed by mathematical and computational methods (Fourier transform analysis) to produce an electron density map.
•  Note that the objective result of a crystallographic experiment is not really a picture of the atoms, but a map of the distribution of electrons in the molecule, i.e. an electron density map, because the x-rays are scattered by the electron cloud of the atoms.
•  Since the electrons are mostly tightly localized around the nuclei, the electron density map gives us a pretty good picture of the molecule.
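The Fourier step described above can be illustrated with a toy one-dimensional "crystal". This is only a sketch under simplifying assumptions: the grid size, the two Gaussian "atoms", and the helper names (`structure_factors`, `density_map`, `count_peaks`, `atom`) are all hypothetical, and the toy assumes that both amplitudes and phases of every Fourier term are known. It shows that summing all terms recovers the density, while keeping only the low-order terms gives a blurrier map.

```python
import cmath
import math


def structure_factors(density):
    """Forward discrete Fourier transform of a sampled 1D density."""
    n = len(density)
    return [
        sum(rho * cmath.exp(-2j * math.pi * h * x / n) for x, rho in enumerate(density))
        for h in range(n)
    ]


def density_map(factors, max_h=None):
    """Inverse transform; if max_h is given, drop terms with |h| > max_h."""
    n = len(factors)
    rho = []
    for x in range(n):
        total = 0j
        for h, f_h in enumerate(factors):
            freq = min(h, n - h)  # |h| on a periodic grid
            if max_h is None or freq <= max_h:
                total += f_h * cmath.exp(2j * math.pi * h * x / n)
        rho.append((total / n).real)
    return rho


def count_peaks(values, threshold=0.5):
    """Count strict local maxima that rise above a threshold."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i] > threshold and values[i - 1] < values[i] > values[i + 1]
    )


def atom(x, center, width=1.5):
    """Hypothetical Gaussian electron cloud centred on one atom."""
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))


GRID = 64  # illustrative number of sample points in the unit cell
true_density = [atom(x, 20) + atom(x, 26) for x in range(GRID)]
factors = structure_factors(true_density)

full_map = density_map(factors)               # all Fourier terms
blurred_map = density_map(factors, max_h=4)   # low-order terms only

print("peaks, all terms:", count_peaks(full_map))
print("peaks, |h| <= 4 :", count_peaks(blurred_map))
```

Dropping the high-order terms plays the role of collecting data only to low resolution: fewer terms in the sum mean less detail in the calculated map.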
Resolution of X-ray Structures
[Figure panels: 3.0 Å, 2.0 Å, 1.0 Å]
Resolution is a measure of the level of detail present in the diffraction pattern and the level of detail that will be seen when the electron density map is calculated.
Images from PDB and “Protein Structure and Function” by Gregory A Petsko and Dagmar Ringe

Determining the Structure of Myoglobin by X-ray Method

Structure Determination by NMR
Steps needed
•  Purify the protein
•  Dissolve the protein
•  Collect NMR data
•  Assign NMR signals
•  Calculate the structure
Pros
•  No need to crystallize the protein
•  Can see hydrogen atoms
Cons
•  Difficult for insoluble proteins
•  Works best with small proteins, size limit (<50 kDa)
Image from “Protein Structure and Function” by Gregory A Petsko and Dagmar Ringe

Determining the Structure of Myoglobin by NMR
•  Based on magnetic moments of atomic nuclei.
•  NMR measures the interactions of atomic nuclei.
•  A typical NMR structure includes an ensemble of protein structures, all of which are consistent with the observed list of experimental restraints.

Alternate Location Indicator

Intrinsically Disordered Proteins
•  Contain protein segments that lack definable structure
•  Enriched in amino acids that disfavor well-defined structure
   –  Lys, Arg, Glu, and Pro
•  Disordered regions can adapt to many different proteins, facilitating interaction with numerous different partner proteins

Intrinsically Disordered Proteins
REMARK 465 MISSING RESIDUES
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE
REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)
REMARK 465
REMARK 465   M RES C SSSEQI
REMARK 465     ALA A    10
REMARK 465     LEU A    11
REMARK 465     TYR A    12
REMARK 465     ASP A    13
REMARK 465     GLU A    14
REMARK 465     ASN A    15
REMARK 465     GLN A    16
REMARK 465     LYS A    17
REMARK 465     GLY A    34
REMARK 465     SER A    35
REMARK 465     ASP A    36
REMARK 465     THR A    37
REMARK 465     LYS A    38
REMARK 465     VAL A    39
REMARK 465     LEU A    40
REMARK 465     ASN A    97
REMARK 465     LYS A    98
! 1AZ3, ECORV ENDONUCLEASE
! ...........

Intrinsically Disordered Proteins
ATOM    171  N   LEU A  33      26.730 -44.512   3.763  1.00 42.58           N
ATOM    172  CA  LEU A  33      27.036 -44.367   2.333  1.00 42.29           C
ATOM    173  C   LEU A  33      28.374 -43.713   1.986  1.00 39.56           C
ATOM    174  O   LEU A  33      28.664 -42.605   2.433  1.00 39.78           O
ATOM    175  CB  LEU A  33      25.885 -43.635   1.634  1.00 37.47           C
ATOM    176  CG  LEU A  33      24.530 -44.331   1.784  1.00 32.39           C
ATOM    177  CD1 LEU A  33      23.595 -43.397   2.462  1.00 32.67           C
ATOM    178  CD2 LEU A  33      23.966 -44.741   0.434  1.00 32.82           C
ATOM    179  N   SER A  41      22.783 -41.387  -5.851  1.00 20.25           N
ATOM    180  CA  SER A  41      22.712 -41.306  -7.306  1.00 28.02           C
ATOM    181  C   SER A  41      22.385 -42.623  -8.008  1.00 27.89           C
ATOM    182  O   SER A  41      22.445 -42.687  -9.214  1.00 25.11           O
ATOM    183  CB  SER A  41      24.019 -40.757  -7.877  1.00 33.28           C
ATOM    184  OG  SER A  41      24.113 -40.937  -9.299  1.00 39.21           O
ATOM    185  N   THR A  42      22.154 -43.682  -7.238  1.00 33.31           N
ATOM    186  CA  THR A  42      21.792 -45.011  -7.752  1.00 30.10           C
ATOM    187  C   THR A  42      20.607 -45.407  -6.888  1.00 30.51           C
ATOM    188  O   THR A  42      19.810 -46.272  -7.239  1.00 34.44           O
ATOM    189  CB  THR A  42      22.947 -46.040  -7.570  1.00 27.32           C
ATOM    190  OG1 THR A  42      23.998 -45.743  -8.494  1.00 33.52           O
ATOM    191  CG2 THR A  42      22.476 -47.447  -7.809  1.00 21.11           C
ATOM    192  N   ILE A  43      20.520 -44.767  -5.726  1.00 30.77           N
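The jump from residue 33 to residue 41 in the ATOM records above is how a disordered segment shows up in the coordinates. The sketch below scans ATOM lines for such numbering gaps. It is a minimal illustration, not a full PDB parser: `atom_line`, `residue_numbers`, and `find_gaps` are hypothetical helper names, the sample records carry only the fields needed here, insertion codes are ignored, and the chain identifier and residue number are read from the fixed columns of the PDB ATOM record (column 22 and columns 23-26).

```python
def atom_line(serial, name, res_name, chain, res_seq):
    """Build a truncated ATOM record (hypothetical helper, identity columns only)."""
    return f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}"


def residue_numbers(pdb_lines):
    """Collect the residue numbers seen in ATOM records, grouped by chain."""
    seen = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain = line[21]            # column 22: chain identifier
        res_seq = int(line[22:26])  # columns 23-26: residue sequence number
        seen.setdefault(chain, set()).add(res_seq)
    return seen


def find_gaps(pdb_lines):
    """Return (chain, first_missing, last_missing) for each numbering gap."""
    gaps = []
    for chain, numbers in residue_numbers(pdb_lines).items():
        ordered = sorted(numbers)
        for prev, nxt in zip(ordered, ordered[1:]):
            if nxt - prev > 1:
                gaps.append((chain, prev + 1, nxt - 1))
    return gaps


# First atom of each residue from the excerpt above (serial, name, residue, chain, number).
sample = [
    atom_line(171, "N", "LEU", "A", 33),
    atom_line(179, "N", "SER", "A", 41),
    atom_line(185, "N", "THR", "A", 42),
    atom_line(192, "N", "ILE", "A", 43),
]

for chain, start, end in find_gaps(sample):
    print(f"chain {chain}: residues {start}-{end} have no coordinates")
```

A numbering gap is only a hint: it matches the REMARK 465 listing of residues 34 to 40 in this entry, but in other files the numbering can jump for unrelated reasons, so the REMARK 465 records remain the authoritative list.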
Intrinsically Disordered Proteins
REMARK 465 MISSING RESIDUES
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE
REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)
REMARK 465
REMARK 465   M RES C SSSEQI
REMARK 465     ALA A     1
REMARK 465     THR A     2
REMARK 465     SER A     3
REMARK 465     THR A     4
REMARK 465     LYS A     5
REMARK 465     LYS A     6
REMARK 465     GLU A   142
REMARK 465     ASP A   143
REMARK 465     ASN A   144
REMARK 465     ALA A   145
REMARK 465     ASP A   146
REMARK 465     SER A   147
REMARK 465     GLY A   148
REMARK 465     GLN A   149
! N-terminal and C-terminal disordered regions
1EYA

P53 Protein and its Binding Partners

Protein folding and stability

Protein Stability and Folding
•  A protein’s function depends on its 3D structure
•  Proteins are stabilized by many(!) collective weak interactions

Protein Stability
•  Protein stability is a small difference of large numbers.
•  Proteins are stable (ΔG < 0) only over a narrow environmental range.
•  In fact, there are forces pushing the equilibrium between folded and unfolded in both directions.
•  Stabilizing forces: Intraprotein salt bridges, hydrogen bonds, dipole-dipole interactions and VDW interactions (all of which are electrostatic in nature).
•  Destabilizing forces: Primarily electrostatic interactions with solvent and conformational entropy reduction.
•  The hydrophobic effect is not a true force; rather, it is a colligative property.
•  Common denaturants include: detergents, organic solvents, extreme pH, extreme temperature, high ionic strength, etc.

Thermal Denaturation
Loss of structural integrity with accompanying loss of function is called denaturation.
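Thermal denaturation of a small protein is often approximated as a two-state equilibrium between folded and unfolded molecules. The sketch below uses that approximation with a temperature-independent unfolding enthalpy (heat-capacity change assumed to be zero), so that ΔG of unfolding is ΔH_m·(1 − T/Tm), where Tm is the midpoint temperature at which half the molecules are unfolded. The function name and the numerical values (Tm = 330 K, ΔH_m = 100 kcal/mol) are illustrative assumptions, not data from the lecture. Note the sign convention: the stability bullets quote ΔG for folding (negative when stable), whereas the code works with ΔG for unfolding, which is its negative.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_unfolded(temp_k, tm_k, dh_m_kcal):
    """Two-state model with zero heat-capacity change (an assumption).

    dG_unfold(T) = dH_m * (1 - T / Tm); positive below Tm, negative above.
    """
    dg_unfold = dh_m_kcal * (1.0 - temp_k / tm_k)
    k_unfold = math.exp(-dg_unfold / (R_KCAL * temp_k))  # [unfolded] / [folded]
    return k_unfold / (1.0 + k_unfold)


# Illustrative parameters only.
TM_K = 330.0
DH_M_KCAL = 100.0

for temp in (300.0, 320.0, 330.0, 340.0, 350.0):
    print(f"T = {temp:5.1f} K  fraction unfolded = {fraction_unfolded(temp, TM_K, DH_M_KCAL):.4f}")
```

With these assumed numbers the unfolding free energy at 300 K is only about 9 kcal/mol, which is one way to see why stability is described as a small difference of large numbers, and why the folded state persists only over a limited temperature range.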
Tm: melting temperature
[Figure panels: Ribonuclease A (CD method); Apomyoglobin (W content)]

Absorption of UV light by Aromatic Amino Acids
•  The aromatic amino acids absorb light in the UV region
•  Proteins typically have UV absorbance maxima around 275–280 nm
•  Tryptophan and tyrosine are the strongest chromophores
•  Concentration can be determined by UV-visible spectrophotometry using the Beer-Lambert law: A = ε·c·l
   ε: molar absorptivity
   c: concentration
   l: length of the light path

Circular Dichroism (CD) Analysis
•  CD measures the molar absorption difference Δε of left- and right-circularly polarized light: Δε = εL - εR
•  Chromophores in a chiral environment produce characteristic signals
•  CD signals from peptide bonds depend on the chain conformation

Protein Denaturation

Ribonuclease Refolding Experiment
•  Ribonuclease is a small protein that contains 8 cysteines linked via four disulfide bonds
•  Urea in the presence of 2-mercaptoethanol fully denatures ribonuclease
•  When urea and 2-mercaptoethanol are removed, the protein spontaneously refolds, and the correct disulfide bonds are re-formed
•  The sequence alone determines the native conformation
•  Quite a “simple” experiment, but so important that it earned Christian Anfinsen the 1972 Nobel Prize in Chemistry

Ribonuclease Refolding Experiment
[Figure label: 2-Mercaptoethanol]

Ribonuclease Refolding Experiment
“Simulations of CI2 in 8 M urea indicate that urea promotes unfolding by both indirect and direct mechanisms. Direct urea interactions consisted of hydrogen bonding to the polar moieties of the protein, particularly peptide groups, leading to screening of intramolecular hydrogen bonds. Solvation of the hydrophobic core proceeded via the influx of water molecules, then urea. Urea also promoted protein unfolding in an indirect manner by altering water structure and dynamics, as also occurs on the introduction of nonpolar groups to water, thereby diminishing the hydrophobic effect and facilitating the exposure of the hydrophobic core residues.
Overall, urea-induced effects on water indirectly contributed to unfolding by encouraging hydrophobic solvation, whereas direct interactions provided the pathway.”
PNAS April 29, 2003 vol. 100 no. 9, 5142-5147

Protein Folding
•  Proteins fold to the lowest-energy fold on microsecond-to-second time scales. How can they find the right fold so fast?
•  It is practically impossible for protein folding to occur by randomly trying every conformation until the lowest-energy one is found (Levinthal’s paradox, see next slide)
•  The search for the minimum is not random, because the direction toward the native structure is thermodynamically most favorable

Levinthal’s Paradox
Introducing Levinthal’s paradox...
Q: Assume a 300 aa protein, with 3 possible conformational states per residue. How many possible conformations are there?
A: 3^300 conformations (easy question)
Q: Assuming some finite time for each transition (a few ps), how long would it take the protein to fold assuming completely random sampling?
A: Far longer than the age of the universe.
Levinthal, Cyrus (1969). "How to Fold Graciously"

Protein Folding Path

Chaperones in Protein Folding

Protein Misfolding and Human Disease
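The Levinthal estimate from the folding slides can be checked with a few lines of arithmetic. The sketch below uses the slide's numbers (300 residues, 3 states each) and two assumptions that are not in the lecture: exactly 1 ps per conformation (the slide says "a few ps") and an age of the universe of roughly 13.8 billion years. Logarithms are used because the search time is far too large to be a meaningful ordinary number.

```python
import math

RESIDUES = 300            # from the slide
STATES_PER_RESIDUE = 3    # from the slide
SECONDS_PER_STEP = 1e-12  # assumption: 1 ps per conformation sampled
AGE_OF_UNIVERSE_S = 13.8e9 * 365.25 * 24 * 3600  # assumption: ~13.8 billion years

# Python integers are arbitrary precision, so 3**300 is exact.
conformations = STATES_PER_RESIDUE ** RESIDUES

log10_conformations = RESIDUES * math.log10(STATES_PER_RESIDUE)
log10_search_seconds = log10_conformations + math.log10(SECONDS_PER_STEP)
log10_universe_ages = log10_search_seconds - math.log10(AGE_OF_UNIVERSE_S)

print(f"conformations: about 10^{log10_conformations:.0f} ({len(str(conformations))} digits)")
print(f"exhaustive search: about 10^{log10_search_seconds:.0f} seconds")
print(f"that is about 10^{log10_universe_ages:.0f} times the age of the universe")
```

Changing the time per step by a few orders of magnitude barely matters, since the exponent is dominated by the number of conformations; that is the core of the argument that folding cannot be a random search.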
