Worldwide Protein Data Bank
www.wwpdb.org

What the Protein Data Bank teaches us about structural biology
Helen M. Berman
NCMI Workshop, December 13, 2008

1960’s
- Protein crystallography begins to take off
- Emerging interest in protein folding
- Use of computer graphics to represent structure
- Nobel Prize awarded for the first 3D protein structures: myoglobin and hemoglobin

Myoglobin: Kendrew, Bodo, Dintzis, Parrish, Wyckoff, Phillips (1958) Nature 181: 662-666
Hemoglobin: Perutz (1962) Proc. R. Soc. A265: 161-187
Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206: 757
Ribonuclease: Kartha, Bello, Harker (1967) Nature 213: 862-865; Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards (1967) J. Biol. Chem. 242: 3753-3757
[Images: myoglobin, hemoglobin, lysozyme, ribonuclease]

1970’s
- Grass roots community efforts to archive data
- Protein crystallographers discuss how to archive data
- June 1971: Cold Spring Harbor meeting brings groups together (Cold Spring Harbor Symposia on Quantitative Biology, vol. XXXVI, 1972)
- October 1971: PDB is announced in Nature New Biology (7 structures; vol. 233, 1971, page 223)
- 1975: PDB receives first funding from NSF (~32 structures)

Hemoglobin: M.F. Perutz (1962) Proc. R. Soc. A265: 161-187
Carboxypeptidase A: F.A. Quiocho, W.N. Lipscomb (1971) Adv Protein Chem 25: 1-78
Myoglobin: J.C. Kendrew, G. Bodo, H.M. Dintzis, R.G. Parrish, H. Wyckoff, D.C. Phillips (1958) Nature 181: 662-666
Subtilisin: R.A. Alden, J.J. Birktoft, J. Kraut, J.D. Robertus, C.S. Wright (1971) Biochem Biophys Res Commun 45: 337-344
Alpha-chymotrypsin: J.J. Birktoft, D.M. Blow (1972) J Mol Biol 68: 187-240
Pancreatic trypsin inhibitor: R. Huber, D. Kukla, A. Ruhlmann, O. Epp, H. Formanek (1970) Naturwissenschaften 57: 389-392
Rubredoxin: K.D. Watenpaugh, L.C. Sieker, J.R. Herriott, L.H. Jensen (1973) Acta Crystallogr B29: 943-956
Lactate dehydrogenase: J.L. White, M.L. Hackert, M. Buehner, M.J. Adams, G.C. Ford, P.J. Lentz Jr., I.E. Smiley, S.J. Steindel, M.G. Rossmann (1976) J Mol Biol 102: 759-779
Cytochrome b5: F.S. Mathews, P. Argos, M. Levine (1972) Cold Spring Harb Symp Quant Biol 36: 387-395
Papain: J. Drenth, J.N. Jansonius, R. Koekoek, H.M. Swen, B.G. Wolthers (1968) Nature 218: 929-932

Enzymes

Enzyme class       1972-79  1980-89  1990-99  2000-08  Total
Oxidoreductases          5       25      918     2977   3925
Transferases             3       29     1423     5246   6701
Hydrolases              29      123     2797     6846   9795
Lyases                   2        3      451     1337   1793
Isomerases               1        2      280      716    999
Ligases                  0        4      123      652    779
Total                   40      186     5992    17774  23992

In the beginning:
Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206: 757
Ribonuclease: Kartha, Bello, Harker (1967) Nature 213: 862-865; Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards (1967) J. Biol. Chem. 242: 3753-3757
[Chart: proportion of enzyme classes relative to total enzyme structures, in percent, by decade; classes: ligases, lyases, hydrolases, transferases, oxidoreductases, isomerases]

RNA-containing structures (1317)
In the beginning:
tRNA: J.L. Sussman, S.-H. Kim (1976) Biochem Biophys Res Commun 68: 89-96; J.D. Robertus, J.E. Ladner, J.T. Finch, D. Rhodes, R.S. Brown, B.F.C. Clark, & A. Klug (1974) Nature 250: 546-551
[Chart: number of structures by decade (1972-1979, 1980-1989, 1990-1999, 2000-2008); categories: protein/RNA complexes, DNA/RNA hybrid, RNA only, protein/DNA/RNA complexes]

1980’s
- Technology takes off
- Structural biology is able to focus on medical problems
- Community efforts to promote data sharing
- IUCr guidelines requiring data deposition in the PDB are published

DNA-containing structures (2474)
In the beginning:
B-DNA: 1bna, Dickerson & Drew (1981) J. Mol. Biol. 149: 761-786
Z-DNA: 2dcg, Wang, Quigley, Kolpak, Crawford, van Boom, van der Marel, Rich (1979) Nature 282: 680-686
[Chart: number of structures by decade; categories: protein/DNA complexes, DNA only, DNA/RNA hybrid, protein/DNA/RNA complexes]

Protein-nucleic acid complexes (1920)
In the beginning:
Phage 434 repressor-operator: 2or1, Aggarwal, Rodgers, Drottar, Ptashne, & Harrison (1988) Science 242: 899-907
[Chart: number of structures by decade; categories: protein/DNA complexes, protein/RNA complexes, protein/DNA/RNA complexes]

Viruses (280 total)
In the beginning:
Hopper, Harrison, Sauer (1984) Structure of tomato bushy stunt virus. V. Coat protein sequence determination and its structural implications. J. Mol. Biol. 177: 701-713
Silva, Rossmann (1985) The refinement of southern bean mosaic virus in reciprocal space. Acta Crystallogr. B41: 147-157
[Chart: number of structures by decade (1980-1989, 1990-1999, >=2000), with bar labels 20, 121, and 139; categories: helical (25), icosahedral (255)]

Cooperative community action
- Individual letters to editors of journals
- Committees
  - IUCr commission on Biological Macromolecules
  - ACA/USNCCr
  - Richards committee
- Funding agencies
- Articles in journals
[Photos: Marvin Cassman, Fred Richards, Richard Dickerson]

1990’s
- Number of structures increases exponentially
- Complexity of structures increases
- mmCIF dictionary created
- New databases begin to emerge
- User base expands dramatically
- PDB archive moves
[Photo: mmCIF Working Group members]

Electron microscopy structures
In the beginning:
Bacteriorhodopsin: Henderson, Baldwin, Ceska, Zemlin, Beckmann, Downing (1990) J. Mol. Biol. 213: 899-929

Ribosome structures (214)
In the beginning:
Ban, Nissen, Hansen, Moore, & Steitz (2000) Science 289: 905-920
Clemons Jr., May, Wimberly, McCutcheon, Capel, & Ramakrishnan (1999) Nature 400: 833-840
Schluenzen, Tocilj, Zarivach, Harms, Gluehmann, Janell, Bashan, Bartels, Agmon, Franceschi, Yonath (2000) Cell 102: 615-623
Yusupova, Yusupov, Cate, & Noller (2001) Cell 106: 233-241
[Pie charts: ribosome structures by component (ribosome, 30S, 50S) and by source (prokaryotic, eukaryotic); data labels: 1%, 1%, 2%, 41%, 55%]

2000’s
- wwPDB is formed
- Continued growth in structures
- Structural genomics takes off
www.wwpdb.org

Depositions to the PDB by decade
[Chart: number of released entries by year, as of July 2008]

What can we learn from the PDB?

Structure distribution
[Pie charts: structures by molecule type (protein only, protein-DNA complexes, protein-RNA complexes, DNA only, RNA only, RNA-DNA hybrid, ribosome, virus 280, other) and protein-only structures by function (enzyme; cellular processes* 2911; response to stimuli* 500; biological regulation & signal transduction* 4445; immune system process* 819; other). * GO process. Other data labels on the slide: 582, 655, 39, 1093, 218, 755, 1301, 17988, 23466, 46157]

Structure determination methods
[Charts: number of structures by method (X-ray, NMR, EM) and decade (1972-1979, 1980-1989, 1990-1999, 2000-2008), and by experimental technique; the tallest bar is labeled 33797 (X-ray). April 30, 2008]

Resolution distribution
[Charts: resolution distribution of all structures, of protein structures, and of other structures, by year]

Distinct and novel protein sequences
[Chart: percent of distinct/novel structures by decade (1972-1979, 1980-1989, 1990-1999, 2000-2008); series: structures containing distinct protein sequences (<98%), structures containing novel protein sequences (<30%), subset of PSI structures, subset of other SG structures; data labels: 63%, 51%, 39%, 7%, 37%, 32%, 27%, 7%, 14%, 25%, 16%, 4%, 2%, 10%]

Redundancy: protein clusters

Cluster #  Total distinct chains  Protein cluster                          First structure  Deposition date
1          459                    Bacteriophage T4 lysozyme                2LZM             1977-03-28
2          297                    Hen egg-white lysozyme                   2LYZ             1975-02-01
3          196                    Human lysozyme                           1GFE             1984-10-12
4          445                    Mouse immunoglobulin Fc & Fab fragments  1GIG             1993-01-20
5          218                    Human immunoglobulin Fc & Fab fragments  1FC1             1981-05-21
6          330                    HIV-1 protease                           2HVP             1989-04-10
7          302                    Trypsin (serine protease)                5PTP             1977-12-19
8          254                    Thrombin                                 2HGT             1991-06-03
9          229                    Human carbonic anhydrase II              1CA2             1976-05-22
10         185                    Whale myoglobin                          1MBN             1973-04-05
11         182                    Human leukocyte antigen                  1HLA             1987-10-15
12         178                    Human hemoglobin α-subunit               3HHB             1975-04-01
13         176                    Human hemoglobin β-subunit               3HHB             1975-04-01
14         160                    Ribonuclease A                           2RNS             1973-04-01
15         153                    Human cyclin-dependent kinase 2 (CDK2)   1HCK             1996-06-03

Lysozyme: Lessons learned
T4 bacteriophage (459 structures)
- Amino acid replacement studies suggest that the fraction of amino acid residues that define the structure of T4 lysozyme is about 50%
  B.W. Matthews (1996) FASEB J. 10: 35-41
- Insight into folding and catalysis
Hen egg white (297 structures)
- Low sequence identity
- Structural similarity of active site to T4
  B.W. Matthews, S.J. Remington, M.G. Grutter, W.F. Anderson (1981) J. Mol. Biol. 147: 545-558
  Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206: 757
- Insight into evolution and catalysis

Myoglobin and hemoglobin: Lessons learned
Whale myoglobin (185 structures)
- Different ligands: oxygen, carbon monoxide [1]
- Amino acid substitution studies [2]
- Laue studies [3]
- Insight into function and dynamics
Other species myoglobin
- Low sequence identity, same structure [4]
- Insight into evolution
Human hemoglobin (178 structures)
- Insight into function and disease (sickle cell anemia, thalassemia) [5]
Other species hemoglobin
- Low sequence identity, same structure [4]
- Profound insight into evolution

[1] Kuriyan, Wilz, Karplus, Petsko (1986) J. Mol. Biol. 192: 133-154
[2] Quillin, Arduini, Olson, Phillips Jr. (1993) J. Mol. Biol. 234: 140-155; Carver, Brantley Jr., Singleton, Arduini, Quillin, Phillips Jr., Olson (1992) J. Biol. Chem. 267: 14443-14450
[3] Bourgeois, Vallone, Schotte, Arcovito, Miele, Sciara, Wulff, Anfinrud, Brunori (2003) PNAS 100: 8704-8709
[4] Dickerson, Geis (1983) Hemoglobin: structure, function, and pathology
[5] Kidd, Baker, Mathews, Brittain, Baker (2001) Prot. Sci. 10: 1739-1749; Harrington, Adachi, Royer Jr.
Lodish et al.6

TIM barrel proteins: Lessons learned
TIM barrel structures (1727), http://www.cathdb.info
- Share the same fold but represent significant sequence and functional diversity
- Are enzymes or enzyme-related proteins involved in molecular or energy metabolism
- Comparative structure analysis indicates evolutionary relatedness of TIM barrel proteins
Banner, Bloomer, Petsko, Phillips, Wilson (1976) Biochem. Biophys. Res. Commun. 72: 146-155
Nagano, Orengo, Thornton (2002) J. Mol. Biol. 321: 741-765

HIV-related structures (609)
[Chart: number of structures by decade; categories: protease (311), reverse transcriptase (110), Gag protein, integrase, other; remaining data labels: 122, 27, 39]

HIV-1 protease (311)
226 structures with ligands
Inhibitors shown: Amprenavir (GSK), Fosamprenavir (GSK), Lopinavir (Abbott), Atazanavir (BMS), Nelfinavir (Agouron), Darunavir (Tibotec), Tipranavir (BI), Indinavir (Merck), Ritonavir (Abbott), Saquinavir (Roche)
PDB entries listed on the slide, in the groups shown:
- 1T7J, 1HPV
- 2FXE, 2FXD, 2O4K, 2AQU, 2FND
- 2RKG, 2RKF, 2QHC, 2Z54, 2Q5K, 2O4S, 1RV7, 1MUI
- 2QAK, 2PYM, 2Q63, 2PYN, 2Q64, 2R5Q, 1OHR
- 2R5P, 2B7Z, 2AVV, 2AVO, 2AVS, 1SGU, 1SDT, 1SDV, 1SDU, 1K6C, 1C6Y, 2BPX, 1HSG, 1HSH
- 2O4N, 2O4L, 2O4P, 1D4Y, 1D4S
- 2B60, 1RL8, 1SH9, 1N49, 1HXW
- 3D1X, 3D1Y, 3CYX, 2NMW, 2NMZ, 2NNP, 2NMY, 2NNK, 1C6Z, 1FB7
Navia, Fitzgerald, McKeever, Leu, Heimbach, Herber, Sigal, Darke, Springer (1989) Nature 337: 615-620
Wlodawer, Miller, Jaskolski, Sathyanarayana, Baldwin, Weber, Selk, Clawson, Schneider, Kent (1989) Science 245: 616-621

HIV-1 reverse transcriptase (110)
76 structures with ligands
Inhibitors shown: Abacavir (GSK), Nevirapine (BI), Stavudine (BMS), Efavirenz (BMS), Lamivudine (GSK), Zidovudine (GSK), Emtricitabine (Gilead), Tenofovir (Gilead), Zalcitabine (Hoffmann-La Roche), Etravirine (Tibotec), Delavirdine (Pfizer)
PDB entries listed on the slide, in the groups shown:
- 2HND, 2HNY, 1S1U, 1S1X, 1LW0, 1LWE, 1LWC, 1LWF, 1JLB, 1JLF, 1FKP, 1VRT, 3HVT
- 1T05
- 1JKH, 1IKW, 1IKV, 1FKO, 1FK9
- 1S6P
Wang, Smerdon, Jager, Kohlstaedt, Rice, Friedman, Steitz (1994) Proc. Natl. Acad. Sci. USA 91: 7242-7246
[Chart: number of structures by year]

Structural coverage of KEGG pathways
50136 structures
16526 structures associated with KEGG pathway (33%)

KEGG pathway                             Number of structures
Complement and coagulation cascades      506
Small cell lung cancer                   506
Regulation of actin cytoskeleton         449
Non-small cell lung cancer               407
Pyrimidine metabolism                    402
Nitrogen metabolism                      399
Two-component system - General           360
Ribosome                                 333
Base excision repair                     328
Purine metabolism                        310
Antigen processing and presentation      281
Nicotinate and nicotinamide metabolism   252
Insulin signaling pathway                248
Porphyrin and chlorophyll metabolism     248
ABC transporters - General               246
Prostate cancer                          244

Human biological pathways
[Pathway diagrams: complement and coagulation cascades; regulation of actin cytoskeleton; small cell lung cancer; non-small cell lung cancer]
Genes that contain a PDB structure are in red
KEGG (http://www.genome.jp/kegg/)

EM maps and models in the PDB
How EM experiments are archived

EMDataBank (580 entries total)
- Created by EBI in 2002 for archiving EM maps
- US deposition/annotation site added this year
- Maps stored in CCP4/MRC format
- Associated metadata stored in XML format
Examples: nuclear pore complex, 85 Å (EMD-1097); rotavirus VP6 protein, 3.8 Å (EMD-1461)

EM entries in the PDB (230 entries total)
- Atomic coordinate models fitted to EM maps
- Storage format for models and metadata is CIF
- Matrix representations possible
- Some large entries “break” PDB format
Examples: PBCV-1 (1m4x, 1680 matrices); 80S ribosome (1s1h + 1s1i)

PDBj Goals
- Common data model
- Data harvesting tools
- “One-stop shop” for deposition and retrieval
- Tools for visualization, segmentation, and assessment

Acknowledgements
NSF, NIGMS, DOE, NLM, NCI, NCRR, NIBIB, NINDS, NIDDK
Wellcome Trust, EU, CCP4, BBSRC, MRC, EMBL
BIRD-JST, MEXT
NLM

Acknowledgements
NIH GM079429 (Baylor, Rutgers, EBI) 2007-2012
EU Network of Excellence LSHG-CT-2004-50282 (EBI) 2004-2009