Worldwide Protein Data Bank
www.wwpdb.org
What the Protein Data Bank teaches us about structural biology
Helen M. Berman
NCMI Workshop
December 13, 2008
1960’s
• Protein crystallography begins to take off
• Emerging interest in protein folding
• Use of computer graphics to represent structure
• Nobel Prize awarded for the first 3D protein structures: myoglobin and hemoglobin
Myoglobin: Kendrew, Bodo, Dintzis, Parrish, Wyckoff, Phillips (1958) Nature 181: 662-666
Hemoglobin: Perutz (1962) Proc. R. Soc. A265: 161-187
Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206: 757
Ribonuclease: Kartha, Bello, Harker (1967) Nature 213: 862-865; Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards (1967) J. Biol. Chem. 242: 3753-3757
1970’s
• Grass roots community efforts to archive data
• Protein crystallographers discuss how to archive data
• June 1971 Cold Spring Harbor meeting brings groups together (Cold Spring Harbor Symposia on Quantitative Biology, vol. XXXVI, 1972)
• October 1971 PDB is announced in Nature New Biology (7 structures; vol 233, 1971, page 223)
• 1975 PDB receives first funding from NSF (~32 structures)
Hemoglobin
M.F. Perutz (1962) Proc. R. Soc. A265: 161-187
Carboxypeptidase A
F.A. Quiocho, W.N. Lipscomb (1971) Adv Protein Chem 25: 1-78
Myoglobin
J.C. Kendrew, G. Bodo, H.M. Dintzis, R.G. Parrish, H. Wyckoff, D.C. Phillips (1958) Nature 181: 662-666
Subtilisin
R.A. Alden, J.J. Birktoft, J. Kraut, J.D. Robertus, C.S. Wright (1971) Biochem Biophys Res Commun 45: 337-344
Alpha-chymotrypsin
J.J. Birktoft, D.M. Blow (1972) J Mol Biol 68: 187-240
Pancreatic trypsin inhibitor
R. Huber, D. Kukla, A. Ruhlmann, O. Epp, H. Formanek (1970) Naturwissenschaften 57: 389-392
Rubredoxin
K.D. Watenpaugh, L.C. Sieker, J.R. Herriott, L.H. Jensen (1973) Acta Crystallogr B29: 943-956
Lactate dehydrogenase
J.L. White, M.L. Hackert, M. Buehner, M.J. Adams, G.C. Ford, P.J. Lentz Jr., I.E. Smiley, S.J. Steindel, M.G. Rossmann (1976) J Mol Biol 102: 759-779
Cytochrome b5
F.S. Mathews, P. Argos, M. Levine (1972) Cold Spring Harb Symp Quant Biol 36: 387-395
Papain
J. Drenth, J.N. Jansonius, R. Koekoek, H.M. Swen, B.G. Wolthers (1968) Nature 218: 929-932
Enzymes
Enzyme Class      1972-79  1980-89  1990-99  2000-08  Total
Oxidoreductases         5       25      918     2977   3925
Transferases            3       29     1423     5246   6701
Hydrolases             29      123     2797     6846   9795
Lyases                  2        3      451     1337   1793
Isomerases              1        2      280      716    999
Ligases                 0        4      123      652    779
Total                  40      186     5992    17774  23992
In the beginning
Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206: 757
Ribonuclease: Kartha, Bello, Harker (1967) Nature 213: 862-865; Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards (1967) J. Biol. Chem. 242: 3753-3757
[Chart: proportion of enzyme classes relative to total enzyme structures, in percent, by decade; classes shown: ligases, lyases, hydrolases, transferases, oxidoreductases, isomerases]
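The proportions plotted in the chart can be recomputed from the table. The following is a minimal sketch, assuming the per-decade counts in the enzyme table are the chart's input; the names `ENZYME_COUNTS` and `class_percentages` are illustrative and not part of the original slides.

```python
# Counts transcribed from the enzyme table (structures per decade).
DECADES = ["1972-79", "1980-89", "1990-99", "2000-08"]
ENZYME_COUNTS = {
    "Oxidoreductases": [5, 25, 918, 2977],
    "Transferases": [3, 29, 1423, 5246],
    "Hydrolases": [29, 123, 2797, 6846],
    "Lyases": [2, 3, 451, 1337],
    "Isomerases": [1, 2, 280, 716],
    "Ligases": [0, 4, 123, 652],
}


def class_percentages(counts):
    """Return {class: [percent per decade]} relative to each decade's total."""
    n_decades = len(next(iter(counts.values())))
    totals = [sum(row[i] for row in counts.values()) for i in range(n_decades)]
    return {
        name: [100.0 * row[i] / totals[i] for i in range(n_decades)]
        for name, row in counts.items()
    }


if __name__ == "__main__":
    percentages = class_percentages(ENZYME_COUNTS)
    print("Class".ljust(16) + "".join(d.rjust(9) for d in DECADES))
    for name, row in percentages.items():
        print(name.ljust(16) + "".join(f"{p:8.1f}%" for p in row))
```

With these counts, hydrolases account for 72.5% of enzyme structures in 1972-79 (29 of 40) and about 38.5% in 2000-08 (6846 of 17774).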
RNA-containing structures (1317)
In the beginning
tRNA: J.L. Sussman, S.-H. Kim (1976) Biochem Biophys Res Commun. 68: 89-96; J.D. Robertus, J.E. Ladner, J.T. Finch, D. Rhodes, R.S. Brown, B.F.C. Clark, & A. Klug (1974) Nature 250: 546-551.
[Chart: number of structures by decade (1972-1979, 1980-1989, 1990-1999, 2000-2008) for protein/RNA complexes, DNA/RNA hybrid, RNA only, and protein/DNA/RNA complexes]
1980’s
• Technology takes off
• Structural biology is able to focus on medical problems
• Community efforts to promote data sharing
• IUCr guidelines requiring data deposition in the PDB are published
DNA-containing structures (2474)
In the beginning
B-DNA: 1bna Dickerson & Drew (1981) J. Mol. Biol. 149: 761-786
Z-DNA: 2dcg Wang, Quigley, Kolpak, Crawford, van Boom, van der Marel, Rich (1979) Nature 282: 680-686
[Chart: number of structures by decade for protein/DNA complexes, DNA only, DNA/RNA hybrid, and protein/DNA/RNA complexes]
Protein-nucleic acid complexes (1920)
In the beginning
Phage 434 repressor-operator: 2or1 Aggarwal, Rodgers, Drottar, Ptashne, & Harrison (1988) Science 242: 899-907
[Chart: number of structures by decade for protein/DNA complexes, protein/RNA complexes, and protein/DNA/RNA complexes]
Viruses (280 total)
In the beginning
Hopper, Harrison, Sauer (1984) Structure of tomato bushy stunt virus. V. Coat protein sequence determination and its structural implications. J. Mol. Biol. 177: 701-713
Silva, Rossmann (1985) The refinement of southern bean mosaic virus in reciprocal space. Acta Crystallogr. B41: 147-157
Helical (25)
Icosahedral (255)
[Chart: number of structures by decade (1980-1989, 1990-1999, >=2000); bar labels: 139, 121, 20]
Cooperative community action
• Individual letters to editors of journals
• Committees
  – IUCr commission on Biological Macromolecules
  – ACA/USNCCr
  – Richards committee
• Funding agencies
• Articles in journals
Marvin Cassman
Fred Richards
Richard Dickerson
1990’s
• Number of structures increases exponentially
• Complexity of structures increases
• mmCIF dictionary created
• New databases begin to emerge
• User base expands dramatically
• PDB archive moves
mmCIF Working Group Members
Electron Microscopy structures
In the beginning
Bacteriorhodopsin: Henderson, Baldwin, Ceska, Zemlin, Beckmann, Downing (1990) J. Mol. Biol. 213: 899-929.
Ribosome structures (214)
In the beginning
Ban, Nissen, Hansen, Moore, & Steitz (2000) Science 289: 905-920; Clemons Jr., May, Wimberly, McCutcheon, Capel, & Ramakrishnan (1999) Nature 400: 833-840; Schluenzen, Tocilj, Zarivach, Harms, Gluehmann, Janell, Bashan, Bartels, Agmon, Franceschi, Yonath (2000) Cell 102: 615-623; Yusupova, Yusupov, Cate, & Noller (2001) Cell 106: 233-241.
[Pie chart: ribosome structures by category (ribosome, 30S, 50S, prokaryotic, eukaryotic); slice labels: 1%, 1%, 2%, 41%, 55%]
2000’s
• wwPDB is formed
• Continued growth in structures
• Structural genomics takes off
www.wwpdb.org
[Chart: depositions to the PDB by decade; number of released entries by year, as of July 2008]
What can we learn from the PDB?
Structure distribution
[Pie chart: structures by molecule type. Categories: protein only, protein-RNA complexes, RNA only, RNA-DNA hybrid, DNA only, protein-DNA complexes, ribosome, virus, other. Data labels, in extraction order: 582, 655, 39, 1093, 218, 280, 755, 1301, 46157]
[Pie chart: enzyme: 23466; cellular processes*: 2911; response to stimuli*: 500; biological regulation & signal transduction*: 4445; immune system process*: 819; other: 17988]
* GO process
Structure determination methods
[Charts: number of structures by decade (1972-1979, 1980-1989, 1990-1999, 2000-2008) for X-ray, NMR, and EM]
April 30, 2008
[Charts: resolution distribution by year for all structures, protein structures, and other structures]
Distinct and novel protein sequences
[Chart: percent of distinct/novel structures by decade (1972-1979, 1980-1989, 1990-1999, 2000-2008). Series: structures containing distinct protein sequences (<98%); structures containing novel protein sequences (<30%); subset of PSI structures; subset of other SG structures]
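The "distinct" (<98%) and "novel" (<30%) categories rest on a sequence-identity cutoff. The slides do not say how identity was computed or how sequences were grouped, so the following is only a sketch under stated assumptions: sequences are already aligned to equal length, identity is the fraction of matching positions, and grouping is a simple greedy pass. The toy sequences and the names `percent_identity` and `greedy_clusters` are hypothetical.

```python
def percent_identity(a, b):
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    # Gap characters ("-") never count as a match.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def greedy_clusters(sequences, threshold):
    """Greedy clustering: a sequence joins the first cluster whose representative
    it matches at >= threshold percent identity, else it founds a new cluster."""
    clusters = []  # list of (representative_id, [member_ids])
    for seq_id, seq in sequences.items():
        for rep_id, members in clusters:
            if percent_identity(seq, sequences[rep_id]) >= threshold:
                members.append(seq_id)
                break
        else:
            clusters.append((seq_id, [seq_id]))
    return clusters


# Hypothetical toy sequences, not taken from the PDB.
TOY_SEQUENCES = {
    "wild_type": "MKTAYIAKQR",
    "wild_type_copy": "MKTAYIAKQR",
    "point_mutant": "MKTAYIAKQK",
    "homolog": "MKSAYLAKHR",
    "unrelated": "GGSPLLDEVW",
}

if __name__ == "__main__":
    for cutoff in (98, 30):
        groups = greedy_clusters(TOY_SEQUENCES, cutoff)
        print(f"cutoff {cutoff}%: {len(groups)} clusters")
        for rep, members in groups:
            print(f"  {rep}: {members}")
```

At the 98% cutoff only the exact copy merges with its representative; at 30% everything except the unrelated sequence collapses into one cluster. Production analyses would use a real alignment tool rather than position-by-position comparison.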
Redundancy: protein clusters
Cluster #  Total distinct chains in cluster  Protein cluster                          First structure  Deposition Date
1          459                               Bacteriophage T4 lysozyme                2LZM             1977-03-28
2          297                               Hen egg white lysozyme                   2LYZ             1975-02-01
3          196                               Human lysozyme                           1GFE             1984-10-12
4          445                               Mouse immunoglobulin Fc & Fab fragments  1GIG             1993-01-20
5          218                               Human immunoglobulin Fc & Fab fragments  1FC1             1981-05-21
6          330                               HIV-1 protease                           2HVP             1989-04-10
7          302                               Trypsin (serine protease)                5PTP             1977-12-19
8          254                               Thrombin                                 2HGT             1991-06-03
9          229                               Human carbonic anhydrase II              1CA2             1976-05-22
10         185                               Whale myoglobin                          1MBN             1973-04-05
11         182                               Human leukocyte antigen                  1HLA             1987-10-15
12         178                               Human hemoglobin -subunit                3HHB             1975-04-01
13         176                               Human hemoglobin -subunit                3HHB             1975-04-01
14         160                               Ribonuclease A                           2RNS             1973-04-01
15         153                               Human cyclin-dependent kinase 2 (CDK2)   1HCK             1996-06-03
Lysozyme: Lessons learned
T4 bacteriophage (459 structures)
• Amino acid replacement studies suggest that the fraction of amino acid residues that define the structure of T4 lysozyme is about 50%
B.W. Matthews (1996) FASEB J. 10: 35-41.
Insight into folding and catalysis
Hen egg white (297 structures)
• Low sequence identity
• Structural similarity of active site to T4
B.W. Matthews, S.J. Remington, M.G. Grutter, W.F. Anderson (1981) J. Mol. Biol. 147: 545-58.
Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206: 757.
Insight into evolution and catalysis
Myoglobin and hemoglobin: Lessons learned
Whale myoglobin (185 structures)
• Different ligands: oxygen, carbon monoxide [1]
• Amino acid substitution studies [2]
• Laue studies [3]
Insight into function and dynamics
Other species myoglobin
• Low sequence identity, same structure [4]
Insight into evolution
Human hemoglobin (178 structures)
Insight into function and disease (sickle cell anemia, thalassemia) [5]
Other species hemoglobin
• Low sequence identity, same structure [4]
Profound insight into evolution
Figure: Lodish et al. [6]
[1] Kuriyan, Wilz, Karplus, Petsko (1986) J. Mol. Biol. 192: 133–154
[2] Quillin, Arduini, Olson, Phillips Jr. (1993) J. Mol. Biol. 234: 140–155; Carver, Brantley Jr., Singleton, Arduini, Quillin, Phillips Jr., Olson (1992) J. Biol. Chem. 267: 14443–14450
[3] Bourgeois, Vallone, Schotte, Arcovito, Miele, Sciara, Wulff, Anfinrud, Brunori (2003) PNAS 100: 8704-8709
[4] Dickerson, Geis (1983) Hemoglobin: structure, function, and pathology
[5] Kidd, Baker, Mathews, Brittain, Baker (2001) Prot. Sci. 10: 1739-1749; Harrington, Adachi, Royer Jr.
TIM barrel proteins: Lessons learned
TIM barrel structures (1727)
http://www.cathdb.info
• Share the same fold but represent significant sequence and functional diversity
• Are enzymes or enzyme-related proteins involved in molecular or energy metabolism
• Comparative structure analysis indicates evolutionary relatedness of TIM barrel proteins
Banner, Bloomer, Petsko, Phillips, Wilson (1976) Biochem. Biophys. Res. Commun. 72: 146-155
Nagano, Orengo, Thornton (2002) J. Mol. Biol. 321: 741-65.
HIV-related structures (609)
[Chart: number of structures by decade for protease, reverse transcriptase, Gag protein, integrase, and other; data labels: 122, 311, 27, 39, 110]
HIV-1 protease (311)
226 structures with ligands
Amprenavir (GSK)
Fosamprenavir (GSK)
1T7J, 1HPV
Lopinavir (Abbott)
Atazanavir (BMS)
2FXE, 2FXD, 2O4K,
2AQU, 2FND
2RKG, 2RKF,
2QHC, 2Z54,
2Q5K, 2O4S,
1RV7, 1MUI
Nelfinavir (Agouron)
Darunavir (Tibotec)
2QAK, 2PYM,
2Q63, 2PYN,
2Q64, 2R5Q,
1OHR
Tipranavir (BI)
Indinavir (Merck)
2R5P, 2B7Z,
2AVV, 2AVO,
2AVS, 1SGU,
1SDT, 1SDV,
1SDU, 1K6C,
1C6Y, 2BPX,
1HSG, 1HSH
2O4N, 2O4L, 2O4P, 1D4Y,
1D4S
Ritonavir (Abbott)
2B60, 1RL8, 1SH9, 1N49, 1HXW
Saquinavir (Roche)
3D1X, 3D1Y, 3CYX, 2NMW, 2NMZ, 2NNP, 2NMY, 2NNK, 1C6Z, 1FB7
Navia, Fitzgerald, McKeever, Leu, Heimbach, Herber, Sigal, Darke, Springer (1989) Nature 337: 615-620; Wlodawer, Miller, Jaskolski, Sathyanarayana, Baldwin, Weber, Selk, Clawson, Schneider, Kent (1989) Science 245: 616-621
HIV-1 reverse transcriptase (110)
76 structures with ligands
Abacavir (GSK)
Nevirapine (BI)
Stavudine (BMS)
2HND, 2HNY, 1S1U, 1S1X, 1LW0, 1LWE, 1LWC, 1LWF, 1JLB, 1JLF, 1FKP, 1VRT, 3HVT
Efavirenz (BMS)
Lamivudine (GSK)
Wang, Smerdon, Jager, Kohlstaedt, Rice, Friedman, Steitz (1994) Proc. Natl. Acad. Sci. USA 91: 7242-7246
Zidovudine (GSK)
Emtricitabine (Gilead)
Tenofovir (Gilead)
Zalcitabine (Hoffmann-La Roche)
1T05
Etravirine (Tibotec)
Delavirdine (Pfizer)
1JKH, 1IKW, 1IKV, 1FKO, 1FK9
1S6P
[Chart: number of structures by year]
Structural coverage of KEGG pathways
50136 structures
16526 structures associated with KEGG pathway (33%)
KEGG Pathway                                Number of Structures
Complement and coagulation cascades         506
Small cell lung cancer                      506
Regulation of actin cytoskeleton            449
Non-small cell lung cancer                  407
Pyrimidine metabolism                       402
Nitrogen metabolism                         399
Two-component system - General              360
Ribosome                                    333
Base excision repair                        328
Purine metabolism                           310
Antigen processing and presentation         281
Nicotinate and nicotinamide metabolism      252
Insulin signaling pathway                   248
Porphyrin and chlorophyll metabolism        248
ABC transporters - General                  246
Prostate cancer                             244
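The 33% coverage figure follows directly from the two totals on the slide. A minimal sketch of that arithmetic, using a few rows from the table; the names `coverage_percent` and `PATHWAY_COUNTS` are illustrative, and the per-pathway shares assume nothing about whether one structure may be counted under several pathways.

```python
TOTAL_STRUCTURES = 50136   # all structures considered on the slide
KEGG_ASSOCIATED = 16526    # structures associated with a KEGG pathway

# A few rows transcribed from the table; a structure may appear in more than
# one pathway, so these counts need not sum to KEGG_ASSOCIATED.
PATHWAY_COUNTS = {
    "Complement and coagulation cascades": 506,
    "Regulation of actin cytoskeleton": 449,
    "Pyrimidine metabolism": 402,
    "Ribosome": 333,
    "Prostate cancer": 244,
}


def coverage_percent(associated, total):
    """Percent of structures that map to at least one pathway."""
    return 100.0 * associated / total


if __name__ == "__main__":
    print(f"KEGG coverage: {coverage_percent(KEGG_ASSOCIATED, TOTAL_STRUCTURES):.1f}%")
    for name, count in sorted(PATHWAY_COUNTS.items(), key=lambda kv: -kv[1]):
        share = coverage_percent(count, KEGG_ASSOCIATED)
        print(f"{name:40s}{count:5d}  ({share:.1f}% of KEGG-associated)")
```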
Human biological pathways
Complement and coagulation cascades pathway
Regulation of actin cytoskeleton
Small cell lung cancer
Non-small cell lung cancer
Genes that contain a PDB structure are in red
KEGG (http://www.genome.jp/kegg/)
EM maps and Models in the PDB
How EM experiments are archived
580 entries total
Nuclear pore complex, 85 Å (EMD-1097)
EMDataBank
• Created by EBI in 2002 for archiving EM maps
• US deposition/annotation site added this year
• Maps stored in CCP4/MRC format
• Associated metadata stored in XML format
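To make the CCP4/MRC storage format concrete, here is a minimal sketch that reads a few leading header fields. It assumes the publicly documented MRC2014 layout (a 1024-byte header with NX, NY, NZ, and MODE as the first four 32-bit integers, the sampling MX, MY, MZ at byte 28, cell lengths at byte 40, and the tag "MAP " at byte 208) and little-endian byte order. The helper `make_example_header` builds a synthetic header purely for illustration; it is not an EMDataBank entry.

```python
import struct

HEADER_SIZE = 1024  # fixed-size main header in the assumed MRC2014 layout


def parse_mrc_header(header):
    """Read a few leading fields from a little-endian MRC/CCP4 map header."""
    if len(header) < HEADER_SIZE:
        raise ValueError("header must be at least 1024 bytes")
    nx, ny, nz, mode = struct.unpack_from("<4i", header, 0)
    mx, my, mz = struct.unpack_from("<3i", header, 28)          # grid sampling
    cell_a, cell_b, cell_c = struct.unpack_from("<3f", header, 40)  # cell, in Å
    return {
        "nx": nx, "ny": ny, "nz": nz,
        "mode": mode,  # 2 means 32-bit floating-point voxels
        "voxel_size": (cell_a / mx, cell_b / my, cell_c / mz),
        "is_map": bytes(header[208:212]) == b"MAP ",
    }


def make_example_header():
    """Build a synthetic 64x64x64 float map header (illustration only)."""
    header = bytearray(HEADER_SIZE)
    struct.pack_into("<4i", header, 0, 64, 64, 64, 2)
    struct.pack_into("<3i", header, 28, 64, 64, 64)
    struct.pack_into("<3f", header, 40, 128.0, 128.0, 128.0)
    header[208:212] = b"MAP "
    return header


if __name__ == "__main__":
    info = parse_mrc_header(make_example_header())
    print(info)
```

Real map files can be big-endian or carry an extended header after the first 1024 bytes, so a dedicated reader library is the safer choice outside a sketch like this.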
Rotavirus VP6 protein, 3.8 Å (EMD-1461)
230 entries total
PBCV-1 (1m4x, 1680 matrices)
EM entries in the PDB
• Atomic coordinate models fitted to EM maps
• Storage format for models and metadata is CIF
• Matrix representations possible
• Some large entries “break” PDB format
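A matrix representation stores one copy of the coordinates plus a list of transformation operators, and the full assembly is generated on demand. The sketch below illustrates the idea with four hypothetical rotations about the z axis; it is not the operator set of any PDB entry (the slide cites 1680 matrices for PBCV-1), and the function names are illustrative.

```python
import math


def z_rotation(angle_deg):
    """3x3 rotation matrix for a rotation of angle_deg about the z axis."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_operator(rotation, translation, point):
    """Return rotation * point + translation for one 3D point."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def expand_assembly(operators, coords):
    """Generate one transformed copy of coords per (rotation, translation) operator."""
    return [[apply_operator(r, t, p) for p in coords] for r, t in operators]


if __name__ == "__main__":
    # Hypothetical four-fold symmetry: rotations of 0, 90, 180, 270 degrees.
    operators = [(z_rotation(90 * k), (0.0, 0.0, 0.0)) for k in range(4)]
    asymmetric_unit = [(1.0, 0.0, 0.0), (2.0, 0.5, 1.0)]
    copies = expand_assembly(operators, asymmetric_unit)
    print(f"{len(copies)} copies of {len(asymmetric_unit)} atoms")
```

Storing operators rather than every transformed atom keeps large symmetric assemblies compact, which is one way around the size limits the slide mentions.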
80S ribosome (1s1h + 1s1i)
PDBj
Goals
• Common data model
• Data harvesting tools
• “One-stop shop” for deposition and retrieval
• Tools for visualization, segmentation, and assessment
Acknowledgements
NSF, NIGMS, DOE, NLM, NCI,
NCRR, NIBIB, NINDS, NIDDK
Wellcome Trust, EU,
CCP4, BBSRC, MRC, EMBL
BIRD-JST, MEXT
NLM
Acknowledgements
NIH GM079429 (Baylor, Rutgers, EBI) 2007- 2012
EU Network of Excellence LSHG-CT-2004-50282 (EBI) 2004-2009