* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Structural genomics of proteins from conserved biochemical
Signal transduction wikipedia , lookup
Point mutation wikipedia , lookup
Ribosomally synthesized and post-translationally modified peptides wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Gene expression wikipedia , lookup
Expression vector wikipedia , lookup
Magnesium transporter wikipedia , lookup
Metalloprotein wikipedia , lookup
Ancestral sequence reconstruction wikipedia , lookup
Biosynthesis wikipedia , lookup
G protein–coupled receptor wikipedia , lookup
Interactome wikipedia , lookup
Mitogen-activated protein kinase wikipedia , lookup
Biochemistry wikipedia , lookup
Western blot wikipedia , lookup
Amino acid synthesis wikipedia , lookup
Biochemical cascade wikipedia , lookup
Paracrine signalling wikipedia , lookup
Protein–protein interaction wikipedia , lookup
383 Structural genomics of proteins from conserved biochemical pathways and processes Stephen K Burley*† and Jeffrey B Bonanno* During the past year, X-ray crystallographers and solution NMR spectroscopists have made significant progress towards the complete structural characterization of conserved biochemical pathways and processes. Some of these advances were made in the context of nascent structural genomics programs, which promise to accelerate structural studies of biologically and medically important proteins. The results of high-throughput protein production, crystallization, structure determination, homology modeling and functional annotation published by two such programs have provided insight into the evolution and function of enzymes in the isoprenoid biosynthesis and ribulose monophosphate pathways. serve as templates for calculating homology models of closely related amino acid sequences, providing enormous leverage in the amount of three-dimensional information (i.e. atomic coordinates) coming from a single structure determination. Protein structures (both experimental and those obtained via homology modeling) yield insights into biochemical function and, in favorable cases, biological function, enzyme mechanism, protein–ligand interactions and oligomerization state(s). The structures of viral or bacterial proteins, and human disease gene products can also be used for identification and optimization of new pharmaceutical agents (reviewed in [1,2]). Addresses *Howard Hughes Medical Institute, Laboratories of Molecular Biophysics, The Rockefeller University, 1230 York Avenue, New York, New York 10021, USA † Current address: Structural GenomiX Inc, 10505 Roselle Street, San Diego, California 92121, USA; e-mail: [email protected] Abbreviations CMK diphosphocytidyl-2-C-methyl-D-erythritol kinase DOXP 1-deoxy-D-xylulose-5-phosphate GHMP GK–HSK–MK–PMK superfamily GK galactokinase HSK homoserine kinase IDI1/2 type-1/2 isopentenyl diphosphate isomerase iso-GlmS glucosamine-6-phosphate synthase MDD mevalonate-5-phosphate decarboxylase MK mevalonate kinase NIGMS National Institute of General Medical Sciences NIH National Institutes of Health NYSGRC New York Structural Genomics Research Consortium PDB Protein Data Bank PHI 6-phospho-3-hexulose isomerase PMK phosphomevalonate kinase PSI Protein Structure Initiative rmsd root mean square deviation A comprehensive structural database containing experimental structures and homology models for every protein sequence found in nature will accelerate research in all areas of biomedicine. Structural biologists are poised to make decisive contributions to this objective with the development and execution of high-throughput protein structure determination combined with automated homology modeling (also known as comparative protein structure modeling). The relative simplicity of protein structure space, which is composed of only 2000–5000 conserved shapes or domain folds, provides enormous leverage for both X-ray and NMR structures. A recent study of 29 organisms with fully sequenced genomes showed that 30–40% of their proteins belong to families with more than 100 orthologous or paralogous members [3]. The careful selection of targets for experimental structure determination will permit the generation of numerous homology models within these families, providing extensive coverage of protein sequence/structure space [4]. In this review, we examine the utility of determining experimental structures and calculating homology models, with particular emphasis on studies of conserved biochemical pathways and processes recently published by two structural genomics centers. Introduction The primary objective of structural genomics High-throughput genome sequencing has changed biological and biomedical research, transforming the scale, scope and even concept of what we mean by the term ‘discovery’. For some time, structural biologists have used public-domain sequence information to identify biologically interesting and/or medically important experimental structure determination candidates. More recently, X-ray crystallographers and solution NMR spectroscopists have embarked on systematic, large-scale programs of structure determination aimed at exploring all of protein structure space. The benefits of bringing three-dimensional information to bear on the challenges posed by biological and biomedical research are well recognized. Experimental structures The overall goal of nascent international structural genomics initiatives is a fundamental three-dimensional understanding of the protein universe. The National Institutes of Health (NIH) National Institute of General Medical Sciences (NIGMS) Protein Structure Initiative (PSI) [5] has funded nine P50 Center grants and two P01 Program Project grants for structural genomics (Table 1). Additional efforts are also underway in Japan, Western Europe, Canada and South America [6]. Not surprisingly, each effort has adopted slightly different strategies to accomplish the task. Some have targeted entire proteomes of model microorganisms, some have dissected protein fold space into pathways, whereas others have utilized Current Opinion in Structural Biology 2002, 12:383–391 0959-440X/02/$ — see front matter © 2002 Elsevier Science Ltd. All rights reserved. 384 Sequences and topology Table 1 NIH NIGMS structural genomics projects. Consortium Principal investigator* Focus URL † NYSGRC Stephen K Burley Method: X-ray The Rockefeller University Targets: novel structural data; all kingdoms of life with emphasis on medically relevant proteins Structural Genomics of Pathogenic Protozoa Consortium Wim GJ Hol Method: X-ray University of Washington Targets: proteins from pathogens (Leishmania major, Trypanosoma brucei, Trypanosoma cruzi and Plasmodium falciparum) Midwest Center for Structural Genomics Andrzej Joachimiak Argonne National Laboratory UC Berkeley Structural Genomics Center Sung-Hou Kim Lawrence Berkeley National Laboratory Center for Eukaryotic Structural Genomics John L Markley University of Wisconsin, Madison Northeast Structural Genomics Consortium Gaetano Montelione Rutgers University TB Structural Genomics Consortium Thomas Terwilliger Los Alamos National Laboratory † † † † † † † The Southeast Collaboratory Bi-Cheng Wang for Structural Genomics University of Georgia Joint Center for Structural Genomics † Ian Wilson The Scripps Research Institute ‡ Structural Genomics of Timothy Cross Integral Membrane Proteins Florida State University Structure 2 Function Pilot Project at CARB/TIGR ‡ John Moult University of Maryland http://www.nysgrc.org http://depts.washington. edu/sgpp Method: X-ray Targets: novel structural data; all kingdoms of life http://www.mcsg.anl.gov Method: X-ray, NMR Targets: whole organism proteomes (X-ray, Mycoplasma genitalium; NMR, Mycoplasma pneumoniae) http://www.strgen.org Method: X-ray, NMR http://www.uwstructural Targets: novel structural, functional data; eukaryotic proteins genomics.org (model A. thaliana) Method: X-ray, NMR http://www.nesg.org Targets: novel structural data; eukaryotic proteins (e.g. S. cerevisiae, C. elegans, D. melanogaster, human) or practical prokaryotic homologs Method: X-ray, NMR Targets: whole M. tuberculosis proteome with emphasis on functionally important proteins http://www.doembi.ucla.edu/TB Method: X-ray, NMR http://128.192.15.145 Targets: whole organism proteomes (C. elegans, Pyrococcus /secsg furiosus), human proteins Method: X-ray http://www.jcsg.org Targets: novel structural data; proteins from T. maritima and C. elegans Method: NMR Targets: membrane proteins from M. tuberculosis http://magnet.fsu.edu/ ~changlin/MBPweb Method: X-ray, NMR Targets: model organism; Haemophilus influenzae http://s2f.carb.nist.gov *Spatial constraints prohibit listing all †researchers from the more‡ than 65 institutions partaking in these studies; see URLs listed in final column for other participants and institutions. NIGMS PSI P50 Center. NIGMS PSI P01 Program Project. bioinformatic analyses of sequence databases to identify potential sequence/structure families for which no structural information is available. It is generally agreed that a thorough understanding of protein sequence/structure space represents one of the most important goals of the new discipline of structural genomics (see the overview by Burley [7] in a supplement dedicated to the subject; readers are urged to examine the remainder of this supplement). Targets are chosen for high-throughput structural study with the expectation that each experimental X-ray or NMR structure will permit accurate homology modeling of a subset of protein sequence space. With careful target selection, these efforts should lead to the creation of a publicly available database containing structural information for the vast majority of protein sequences found in nature. Lessons from structural genomics: evolution of biochemical pathways and processes Perhaps the earliest ‘structural genomics’ project targeted the glycolytic pathway. Researchers from six institutions adopted a ‘divide and conquer’ philosophy to carry out structural studies of every enzyme in the biochemical pathway from glucose to pyruvate [8]. The results of that pioneering effort (and a host of more recent studies, reviewed in [9]) yielded a detailed view of enzyme mechanism and biochemical function for the entire pathway. A recent review by Teichmann et al. [10] addressed the impact of experimental structures attributed to modern structural genomics projects on our understanding of protein function, evolution and interactions. The authors summarized work on 42 proteins, commenting on structural novelty (i.e. did the work reveal a new fold?), making superfamily assignments and examining the utility of new structures for the prediction of interaction partners. Automated methods to analyze multiple genomes for the assignment of domain structure and functional annotation have been implemented and their results are available via the World Wide Web. For example, the Protein Data Bank (PDB; http://www.rcsb.org; [11]) provides direct entry points from each structure to five databases (CATH [12,13], Structural genomics Burley and Bonanno 385 Figure 1 Pathways for the biosynthesis of isopentenyl diphosphate. Isopentenyl diphosphate, the central intermediate in sterol/isoprenoid biosynthesis, is produced by two independent pathways (mevalonate-dependent and -independent), which have different evolutionary distributions. Enzymes with a representative structure in the PDB are underlined. Acetyl-CoA + Acetoacetyl-CoA D-glyceraldehyde-3-phosphate + pyruvate HMG-CoA synthase –CO 2 DOXP synthase HMG-CoA 1-deoxy-D-xylulose-5-phosphate (DOXP) NADPH –NADP+ HMG-CoA reductase O DOXP reductase HO 2C-methyl-D-erythritol-4-phosphate (MEP) CDP, –PPi O 4-diphosphocytidyl-2-C-methylD-erythritol +ATP, –ADP HO CMK/ychB/ispE O MK OH Mevalonate 5-phosphate O HO ygbB/ispF O2– P O O OH OP +ATP, –ADP PMK 4-diphosphocytidyl-2-C-methylD-erythritol-2-phosphate –CMP Mevalonate +ATP, –ADP ygbP/ispD OH OH Mevalonate 5-diphosphate OPP +ATP, –ADP, –Pi, –CO2 MDD IDI PO2– OH HO 2C-methyl-D-erythritol2,4-cyclodiphosphate GcpE/ispG LytB/ispH OPP Isopentenyl diphosphate OPP Dimethylallyl diphosphate Isoprenoid-based products Current Opinion in Structural Biology CE [14,15], FSSP [16,17], SCOP [18,19] and VAST [20]). These links provide supplementary information relating a given structure to protein sequence/structure families and superfamilies. Profile-based searching strategies (IMPALA [21], PSI-BLAST [22] and hidden Markov models [23–25]) have dramatically increased the radius of convergence for the detection of distant sequence relationships [26–33]. Other publicly available resources that match amino acid patterns or motifs to databases of profiles or structural alignments are also undergoing continuous development (Pfam [34,35], PROSITE [36,37], SMART [38], BLOCKS [39], InterPro [40] and CDD [41]). Finally, Bourne and co-workers [42–44] have developed a database of ‘conserved key amino acid positions’ (CKAAPs), which can be searched to reveal more elusive structural and functional relationships. Research Consortium (NYSGRC; http://www.nysgrc.org; an NIH-NIGMS-funded P50 PSI Structural Genomics Center; Table 1) [45••]. One of the first set of NYSGRC structure determination targets was Saccharomyces cerevisiae mevalonate-5-phosphate decarboxylase (MDD; NYSGRC target P100; PDB code 1FI4), an enzyme from the mevalonate-dependent sterol/isoprenoid biosynthesis pathway that catalyzes the third of three ATP-dependent steps responsible for the conversion of mevalonate to isopentenyl diphosphate (Figure 1). Identified as a protein from a large superfamily with no available structural information, MDD was rapidly cloned, expressed, purified and crystallized, resulting in a high-resolution X-ray structure (Figure 2a). At the time that these studies were completed, MDD had no homologs in the PDB, as judged by the DALI server [46,47]. Structural genomics of sterol/isoprenoid biosynthesis Automated homology modeling with MODPIPE [48] and the structure of yeast MDD yielded high-quality models for the MDD enzyme family (22 sequences), plus a much larger number of less accurate homology models for various GHMP small-molecule kinases [49], including the galactokinases A case study of structural genomics applied to the problem of understanding a medically important biochemical pathway has been published by the New York Structural Genomics 386 Sequences and topology Figure 2 GKs), homoserine kinases (H HSKs), mevalonate kinases (G MKs) and phosphomevalonate kinases (P PMKs), as well (M as for diphosphocytidyl-2-C-methyl-D-erythritol kinases (CMK, an enzyme in the mevalonate-independent 1-deoxy-D-xylulose-5-phosphate or DOXP pathway; Figure 1), other poorly characterized enzymes and some hypothetical proteins. Mapping the sequence similarity among the MDDs to the molecular surface of yeast MDD identified a cleft lined with highly conserved residues in close proximity to a conserved ATP-binding motif and permitted identification of the enzyme active site. (a) Following the MDD structure determination and deposition of the MDD atomic coordinates in the PDB, X-ray structures of Methanococcus jannaschii HSK, its binary complex with ADP (Figure 2b; PDB codes 1FWL and 1FWK; [50•]) and its ternary complexes with ATP analogs and various substrates (PDB codes 1H72, 1H73 and 1H74; [51•]) were reported. As anticipated from earlier homology modeling efforts with MDD, the X-ray structures of MDD and HSK are strikingly similar, despite low amino acid sequence identity (13% identity for 276 structurally equivalent α-carbons with rmsd 3.0 Å). The bound nucleotide and substrates found in the HSK co-crystal structures confirmed the predicted location of the MDD active site. (b) Inspection of the homology modeling results obtained with both structures (MDD and HSK) revealed models encompassing eight discrete enzyme activities and three distinct groups of hypothetical sequences. The models provided direct insights into the evolution of the mevalonate-dependent sterol/isoprenoid biosynthesis pathway. All three enzymes on the pathway from mevalonate to isopentenyl diphosphate (MK, PMK and MDD; Figure 1) share a common fold and may provide an example of a process that was originally termed retrograde evolution by Horowitz [52]. The evolution of the mevalonate-dependent (MVA) and -independent (DOXP) pathways has been the subject of recent comparative genomics analyses [53–55]. The distribution and loci of genes for the two pathways suggest that the MVA pathway initially appeared in an archaebacterium or a primitive eukaryote, and was subsequently acquired by Gram-positive cocci and Borrelia burgdorferi through lateral gene transfer events. (c) Current Opinion in Structural Biology Structural comparison of MDD, HSK and PMK — GHMP kinase superfamily members. Ribbon drawings of (a) MDD , (b) HSK and (c) PMK in the same orientation. P-loops are colored red. Pairwise comparison between structures: MDD/HSK, rmsd 3.0 Å, 13% sequence identity; MDD/PMK, rmsd 3.4 Å, 8% sequence identity; PMK/HSK, rmsd 2.8 Å, 16% sequence identity. Genes encoding MK, PMK and MDD fall within a single operon in multiple archaebacteria [55] and, in several cases, this isoprenoid biosynthesis operon also includes type-2 isopentenyl diphosphate isomerase (IDI2), one of two distinct enzymes that act on the product of MDD catalysis. The gene encoding IDI2 was previously annotated as a carotenoid-biosynthesis-related enzyme, but has recently been characterized as possessing FMN- and NADPHdependent IDI activity [56]. (Another enzyme, LytB, has been implicated in a similar function in an engineered Escherichia coli strain [57].) The structure of type-1 IDI (IDI1; Figure 1) has been reported by the NYSGRC (PDB code 1I9A; [45••]) and by Durbecq et al. [58•] (PDB codes 1HZT and 1HX3). Structural genomics Burley and Bonanno It is evident from these reports that the mevalonateindependent pathway is almost certainly older than its mevalonate-dependent counterpart. Bacteria that possess both sterol/isoprenoid biosynthesis pathways (some Streptomyces, for example) are thought to use the DOXP pathway to produce primary metabolites during exponential growth, while employing nonessential MVA pathway enzymes in the stationary phase to produce secondary factors, including antibiotics [59–62]. Given that the more ancient DOXP pathway contains CMK [63,64], which possesses the GHMP kinase fold, it is also likely that one of the other GHMP kinase superfamily members (although not necessarily CMK) is the evolutionary ancestor of MK, PMK and MDD [49]. Multiple gene duplications followed by mutations to impart new substrate specificity could generate the three genes necessary to convert mevalonate into isopentenyl diphosphate. It appears more than fortuitous that these genes (and, in some cases, the gene encoding IDI2) are under the control of a single bacterial promoter. This chromosomal arrangement may have facilitated lateral gene transfer during evolution following the ‘selfish operon’ model [65,66]. Presumably, loss of the operon structure in more distantly related species reflects recombination events. Observed operon structures may, therefore, be of limited use in studying the evolution of the sterol/isoprenoid biosynthesis pathways [67]. Homology modeling of GHMP kinase superfamily members other than MDDs and HSKs depended on distantly related experimental templates (i.e. those with less than the generally accepted threshold of 30% amino acid identity to the modeled protein sequence). These models are almost certainly less accurate than those calculated with closely related templates (i.e. >30% identity between the template and the modeled sequence; see [68,69] for comprehensive analyses of errors in comparative protein structure modeling). A phylogenetic analysis of all modeled GHMP kinase superfamily members identified 19 30% sequence identity clusters, from which 17 new structure determination targets were selected. To date, three of these targets have been examined by X-ray crystallography, one by the NYSGRC (PMK) [70•] and two (MK, vide infra) by independent laboratories [71•,72•]. The structure of Streptococcus pneumoniae PMK (PDB code 1K47; NYSGRC target T27; Figure 2c) confirmed the predicted fold assignment, identified conserved residues in the active site and allowed the more accurate modeling of enzymes constituting the 30% sequence identity PMK cluster (accurate models were obtained for all but fungal PMKs). M. jannaschii MK (PDB code 1KKH) was also shown to be a member of the GHMP kinase sequence/structure superfamily and in silico docking of ATP and mevalonate (based on the HSK co-crystal structures) into the active site revealed a constellation of putative contacts with conserved residues. The co-crystal structure of rat MK bound to ATP (PDB code 1KVK) was used to 387 understand the structural and functional consequences of mutations causing human hyperimmunoglobulinaemia D (hyper-IgD) and periodic fever syndrome or mevalonic aciduria [73,74]. Although structures of both archaeal and eukaryotic MKs have been elucidated, the results of clustering analyses suggest that eubacterial MKs cannot be modeled at acceptable accuracy levels using the existing X-ray structures as templates. The NYSGRC is proceeding with the elucidation of an additional MK structure from a Gram-positive eubacterium. Once this experimental structure is available, accurate homology modeling of MKs from all three living kingdoms should be possible and reliable structural information will be made publicly available for all extant MK sequences via MODBASE ([48,75]; http://www.nysgrc.org). There is also considerable interest in the mevalonateindependent pathway outside the context of organized structural genomics efforts. Both academic and industrial research laboratories have completed X-ray structure determinations of DOXP pathway enzymes. Stubbs and co-workers [76•] published the structure of DOXP reductoisomerase or DOXP reductase (PDB code 1K5H; Figure 1), revealing a V-shaped monomer that is composed of an N-terminal dinucleotide-binding domain, a linker region and a C-terminal four-helix bundle domain. The linker region is responsible for dimerization and harbors most of the active site residues, which include strictly conserved acidic residues thought to coordinate catalytic divalent metals. Another group has also deposited coordinates for this structure (PDB code 1JVS). Both Noel and co-workers [77•] and Structural GenomiX Inc have determined the structure of 4-diphosphocytidyl-2-Cmethyl-D-erythritol synthetase (ygbP/ispD; PDB codes 1INI, 1INJ and 1I52; Figure 1). Additional crystallographic work at Structural GenomiX Inc, the Max-Planck-Institut für Biochemie, the Salk Institute and the Center for Advanced Research in Biotechnology (CARB) has yielded structures of 2-C-methyl-D-erythritol-2,4-cyclodiphosphate synthase [78•,79•] (ygbB/ispF; PDB codes 1JY8, 1KNK, 1KNJ and 1JN1; also 1IV1–4, deposited by an unidentified research group). The enzymes GcpE and LytB are thought to be involved in the mevalonate-independent pathway [80–84] and have been called ispG and ispH, respectively, as they appear to catalyze steps downstream of ispF in engineered E. coli [57,85]. These X-ray structures and the homology model of CMK/ychB/ispE provide three-dimensional information for many of the mevalonate-independent pathway enzymes depicted in Figure 1, leaving DOXP synthase, ispG and ispH as structure determination targets. A similar situation prevails for the mevalonate-dependent pathway, for which only structures of IDI2 and HMG-CoA synthase are lacking (low accuracy homology models for the latter can be computed using a β-ketoacyl-acp synthase III template; PDB code 1HNJ). X-ray structures of the 388 Sequences and topology Figure 3 (a) Structural comparison of M. jannaschii target MJ1247 and iso-GlmS. Ribbon drawings of (a) tetrameric MJ1247 and (b) the dimeric isomerization domain of iso-GlmS in similar topological orientations. (b) Current Opinion in Structural Biology catalytic domain of HMG-CoA reductase have been available for some time [86,87] and an exhaustive series of co-crystal structures with various lipid-lowering HMG-CoA reductase inhibitors or statins has been published recently by Diesenhofer and co-workers [88–90]. Structural genomics to assign biochemical function Another illustrative example of the utility of structural genomics in establishing functional annotations for enzymatic processes was published by the UC Berkeley Structural Genomics Center [91••] (Table 1). The sequence of M. jannaschii target MJ1247, annotated as a hypothetical protein, was presumptively identified as a distant relative of 6-phospho-3-hexulose isomerase (PHI, an enzyme in the ribulose monophosphate pathway for formaldehyde fixation). The X-ray structure of MJ1247 (PDB code 1JEO), determined by Se-Met multiwavelength anomalous dispersion (or MAD) [92,93], revealed a homotetrameric arrangement of α–β–α sandwich monomers (Figure 3a). Significant structural similarity to two PDB entries was detected by the DALI server [46,47] and, not surprisingly, these related sequences were found to have been previously annotated with small-molecule phosphate isomerase activity: the isomerase domain of E. coli glucosamine-6phosphate synthase (iso-GlmS; PDB code 1MOQ; subdomain 1: 20% identical with rmsd 2.9 Å for 148 equivalent α carbons; subdomain 2: 10% identical with rmsd 3.2 Å for 157 equivalent α carbons; vide infra) and phosphoglucose isomerase (PGI; PDB code 2PGI; 6% identical with rmsd 3.8 Å for 143 equivalent α carbons). Inspection of a sequence alignment of MJ1247 with known or putative PHIs revealed regions of highly conserved residues mapping to features on the surface of the tetramer that coalesce to form four clefts. The structural alignment of tetrameric MJ1247 with dimeric iso-GlmS (the monomer is composed of two isostructural subdomains and the dimer possesses the same overall topology as the MJ1247 tetramer; Figure 3a,b) revealed four structurally conserved residues in each cleft, three serine and one threonine, which contribute to phosphate binding in iso-GlmS. Surface residues that determine the sugar-binding specificity of iso-GlmS are not present in the MJ1247 sequence and appear to be replaced by amino acids that support the binding of ligand(s) specific to MJ1247. Finally, an isomerization assay of PHI by MJ1247 confirmed the functional annotation inferred by X-ray crystallography. Conclusions and perspectives Nascent structural genomics initiatives and combined research efforts in both academic and industrial research laboratories are determining an enormous number of experimental protein structures. When coupled with automated homology modeling, these structures provide a wealth of three-dimensional information that eventually promises to encompass most of the proteins found in nature. Beyond the obvious technical difficulties inherent in large-scale experimental and computational efforts aimed at structural characterization of the universe of protein sequences, there are considerable challenges ahead for the field of bioinformatics. Organizing this vast body of structural information, attributing accurate functional annotations and integrating these data with the results of Structural genomics Burley and Bonanno 389 expression profiling and protein–protein and protein–ligand interaction studies (among others) represent very real bottlenecks in our quest for knowledge in biology. 22. Altschul SF, Madden TL, Schaffer AA, Zhang JZ, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402. References and recommended reading 23. Krogh A, Brown M, Mian IS, Sjolander K, Haussler D: Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol 1994, 235:1501-1531. Papers of particular interest, published within the annual period of review, have been highlighted as: • of special interest •• of outstanding interest 1. Harris T: Genetics, genomics, and drug discovery. Med Res Rev 2000, 20:203-211. 2. Weng Z, DeLisi C: Protein therapeutics: promises and challenges for the 21st century. Trends Biotechnol 2002, 20:29-35. 3. Liu J, Rost B: Comparing function and structure between entire proteomes. Protein Sci 2001, 10:1970-1979. 4. Vitkup D, Melamud E, Moult J, Sander C: Completeness in structural genomics. Nat Struct Biol 2001, 8:559-566. 5. Norvell JC, Machalek AZ: Structural genomics programs at the US National Institute of General Medical Sciences. Nat Struct Biol 2000, 7(suppl):931. 6. Stevens RC, Yokoyama S, Wilson IA: Global efforts in structural genomics. Science 2001, 294:89-92. 7. Burley SK: An overview of structural genomics. Nat Struct Biol 2000, 7(suppl):932-934. 8. Campbell JW, Duee E, Hodgson G, Mercer WD, Stammers DK, Wendell PL, Muirhead H, Watson HC: X-ray diffraction studies on enzymes in the glycolytic pathway. Cold Spring Harb Symp Quant Biol 1972, 36:165-170. 9. Erlandsen H, Abola EE, Stevens RC: Combining structural genomics and enzymology: completing the picture in metabolic pathways and enzyme active sites. Curr Opin Struct Biol 2000, 10:719-730. 10. Teichmann SA, Murzin AG, Chothia C: Determination of protein function, evolution and interactions by structural genomics. Curr Opin Struct Biol 2001, 11:354-363. 11. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28:235-242. 12. Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM: CATH—a hierarchic classification of protein domain structures. Structure 1997, 5:1093-1108. 24. Eddy SR: Hidden Markov models. Curr Opin Struct Biol 1996, 6:361-365. 25. Hughey R, Krogh A: Hidden Markov models for sequence analysis: extension and analysis of the basic method. Comput Appl Biosci 1996, 12:95-107. 26. Buchan DW, Shepherd AJ, Lee D, Pearl FM, Rison SC, Thornton JM, Orengo CA: Structural assignment for whole genes and genomes using the CATH domain structure database. Genome Res 2002, 12:503-514. 27. Orengo CA, Bray JE, Buchan DW, Harrison A, Lee D, Pearl FM, Sillitoe I, Todd AE, Thornton JM: The CATH protein family database: a resource for structural and functional annotation of genomes. Proteomics 2002, 2:11-21. 28. Pearl FM, Lee D, Bray JE, Buchan DW, Shepherd AJ, Orengo CA: The CATH extended protein-family database: providing structural annotations for genome sequences. Protein Sci 2002, 11:233-244. 29. Gough J, Chothia C: SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments. Nucleic Acids Res 2002, 30:268-272. 30. Pandit SB, Gosar D, Abhiman S, Sujatha S, Dixit SS, Mhatre NS, Sowdhamini R, Srinivasan N: SUPFAM—a database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: implications for structural genomics and function annotation in genomes. Nucleic Acids Res 2002, 30:289-293. 31. Karplus K, Sjolander K, Barrett C, Cline M, Haussler D, Hughey R, Holm L, Sander C: Predicting protein structure using hidden Markov models. Proteins 1997, 1(suppl):134-139. 32. Karplus K, Barrett C, Hughey R: Hidden Markov models for detecting remote protein homologies. Bioinformatics 1998, 14:846-856. 33. Gough J, Karplus K, Hughey R, Chothia C: Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J Mol Biol 2001, 313:903-919. 13. Orengo CA, Pearl FMG, Bray JE, Todd AE, Martin AC, Lo Conte L, Thornton JM: The CATH Database provides insights into protein structure/function relationship. Nucleic Acids Res 1999, 27:275-279. 34. Sonnhammer EL, Eddy SR, Durbin R: Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 1997, 28:405-420. 14. Shindyalov IN, Bourne PE: Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng 1998, 11:739-747. 35. Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL, Marshall M, Sonnhammer EL: The Pfam protein families database. Nucleic Acids Res 2002, 30:276-280. 15. Shindyalov IN, Bourne PE: A database and tools for 3-D protein structure comparison and alignment using the Combinatorial Extension (CE) algorithm. Nucleic Acids Res 2001, 29:228-229. 36. Bairoch A: PROSITE: a dictionary of sites and patterns in proteins. Nucleic Acids Res 1991, 19:2241-2245. 16. Holm L, Ouzounis C, Sander C, Tuparev G, Vriend G: A database of protein structure families with common folding motifs. Protein Sci 1992, 1:1691-1698. 17. Holm L, Sander C: Touring protein fold space with Dali/FSSP. Nucleic Acids Res 1998, 26:316-319. 18. Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247:536-540. 19. Lo Conte L, Brenner SE, Hubbard TJ, Chothia C, Murzin AG: SCOP database in 2002: refinements accommodate structural genomics. Nucleic Acids Res 2002, 30:264-267. 37. Falquet L, Pagni M, Bucher P, Hulo N, Sigrist CJ, Hofmann K, Bairoch A: The PROSITE database, its status in 2002. Nucleic Acids Res 2002, 30:235-238. 38. Letunic I, Goodstadt L, Dickens NJ, Doerks T, Schultz J, Mott R, Ciccarelli F, Copley RR, Ponting CP, Bork P: Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res 2002, 30:242-244. 39. Henikoff JG, Greene EA, Pietrokovski S, Henikoff S: Increased coverage of protein families with the blocks database servers. Nucleic Acids Res 2000, 28:228-230. 20. Gibrat JF, Madej T, Bryant SH: Surprising similarities in structure comparison. Curr Opin Struct Biol 1996, 6:377-385. 40. Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD et al.: The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res 2001, 29:37-40. 21. Schaffer AA, Wolf YI, Ponting CP, Koonin EV, Aravind L, Altschul SF: IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics 1999, 15:1000-1011. 41. Marchler-Bauer A, Panchenko AR, Shoemaker BA, Thiessen PA, Geer LY, Bryant SH: CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res 2002, 30:281-283. 390 Sequences and topology 42. Li WW, Reddy BV, Shindyalov IN, Bourne PE: CKAAPs DB: a conserved key amino acid positions database. Nucleic Acids Res 2001, 29:329-331. 59. Dairi T, Hamano Y, Kuzuyama T, Itoh N, Furihata K, Seto H: Eubacterial diterpene cyclase genes essential for production of the isoprenoid antibiotic terpentecin. J Bacteriol 2001, 183:6085-6094. 43. Reddy BV, Li WW, Shindyalov IN, Bourne PE: Conserved key amino acid positions (CKAAPs) derived from the analysis of common substructures in proteins. Proteins 2001, 42:148-163. 60. Shiomi K, Iinuma H, Naganawa H, Isshiki K, Takeuchi T, Umezawa H: Biosynthesis of napyradiomycins. J Antibiot (Tokyo) 1987, 40:1740-1745. 44. Li WW, Reddy BV, Tate JG, Shindyalov IN, Bourne PE: CKAAPs DB: a Conserved Key Amino Acid Positions DataBase. Nucleic Acids Res 2002, 30:409-411. 61. Seto H, Orihara N, Furihata K: Studies on the biosynthesis of terpenoids produced by Actinomycetes. Part 4. Formation of BE-40644 by the mevalonate and non-mevalonate pathways. Tetrahedron Lett 1998, 39:9497-9500. 45. Bonanno JB, Edo C, Eswar N, Pieper U, Romanowski MJ, Ilyin V, •• Gerchman SE, Kycia H, Studier FW, Sali A et al.: Structural genomics of enzymes involved in steroid/isoprenoid biosynthesis. Proc Natl Acad Sci USA 2001, 98:12896-12901. The NYSGRC reports the X-ray structures of MDD and IDI1 from the sterol/isoprenoid biosynthesis pathway. This paper gives a descriptive account of structures resulting from high-throughput methodologies. The implications of automated homology modeling for fold assignment, pathway evolution and target selection for structural genomics are discussed. 46. Holm L, Sander C: Protein structure comparison by alignment of distance matrices. J Mol Biol 1993, 233:123-138. 47. Dietmann S, Park J, Notredame C, Heger A, Lappe M, Holm L: A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3. Nucleic Acids Res 2001, 29:55 57. 48. Sanchez R, Pieper U, Mirkovic N, de Bakker PI, Wittenstein E, Sali A: MODBASE, a database of annotated comparative protein structure models. Nucleic Acids Res 2000, 28:250-253. 49. Bork P, Sander C, Valencia A: Convergent evolution of similar enzymatic function on different protein folds: the hexokinase, ribokinase, and galactokinase families of sugar kinases. Protein Sci 1993, 2:31-40. 50. Zhou T, Daugherty M, Grishin NV, Osterman AL, Zhang H: Structure • and mechanism of homoserine kinase: prototype for the GHMP kinase superfamily. Structure 2000, 8:1247-1257. The X-ray structure of HSK, which is structurally similar to MDD from the MVA pathway, is reported. This first published structure of a bona fide GHMP kinase offers a platform for expanding understanding of a large and diverse superfamily of enzymes. 51. Krishna SS, Zhou T, Daugherty M, Osterman A, Zhang H: Structural • basis for the catalysis and substrate specificity of homoserine kinase. Biochemistry 2001, 40:10810-10818. Further X-ray studies of the HSK mechanism of action. This paper underscores the need for collaborative interaction of structural genomics programs with other academic laboratories to further investigate enzyme mechanism and specificity. 52. Horowitz NH: On the evolution of biochemical syntheses. Proc Natl Acad Sci USA 1945, 31:153-157. 53. Smit A, Mushegian A: Biosynthesis of isoprenoids via mevalonate in Archaea: the lost pathway. Genome Res 2000, 10:1468-1484. 54. Boucher Y, Doolittle WF: The role of lateral gene transfer in the evolution of isoprenoid biosynthesis pathways. Mol Microbiol 2000, 37:703-716. 55. Wilding EI, Brown JR, Bryant AP, Chalker AF, Holmes DJ, Ingraham KA, Iordanescu S, So CY, Rosenberg M, Gwynn MN: Identification, evolution, and essentiality of the mevalonate pathway for isopentenyl diphosphate biosynthesis in Gram-positive cocci. J Bacteriol 2000, 182:4319-4327. 56. Kaneda K, Kuzuyama T, Takagi M, Hayakawa Y, Seto H: An unusual isopentenyl diphosphate isomerase found in the mevalonate pathway gene cluster from Streptomyces sp. strain CL190. Proc Natl Acad Sci USA 2001, 98:932-937. 57. Rohdich F, Hecht S, Gartner K, Adam P, Krieger C, Amslinger S, Arigoni D, Bacher A, Eisenreich W: Studies on the nonmevalonate terpene biosynthetic pathway: metabolic role of IspH (LytB) protein. Proc Natl Acad Sci USA 2002, 99:1158-1163. 58. Durbecq V, Sainz G, Oudjama Y, Clantin B, Bompard-Gilles C, • Tricot C, Caillet J, Stalon V, Droogmans L, Villeret V: Crystal structure of isopentenyl diphosphate:dimethylallyl diphosphate isomerase. EMBO J 2001, 20:1530-1537. The recently determined X-ray structures of apo and metal-chelated IDI1 from the sterol/isoprenoid biosynthesis pathway are reported. 62. Seto H, Watanabe H, Furihata K: Simultaneous operation of the mevalonate and non-mevalonate pathways in the biosynthesis of isopentenyl diphosphate in Streptomyces aeriouvifer. Tetrahedron Lett 1996, 37:7979-7982. 63. Rohdich F, Wungsintaweekul J, Luttgen H, Fischer M, Eisenreich W, Schuhr CA, Fellermeier M, Schramek N, Zenk MH, Bacher A: Biosynthesis of terpenoids: 4-diphosphocytidyl-2-C-methyl-Derythritol kinase from tomato. Proc Natl Acad Sci USA 2000, 97:8251-8256. 64. Eisenreich W, Rohdich F, Bacher A: Deoxyxylulose phosphate pathway to terpenoids. Trends Plant Sci 2001, 6:78-84. 65. Lawrence JG: Selfish operons and speciation by gene transfer. Trends Microbiol 1997, 5:355-359. 66. Lawrence JG, Ochman H: Reconciling the many faces of lateral gene transfer. Trends Microbiol 2002, 10:1-4. 67. Wolf YI, Rogozin IB, Kondrashov AS, Koonin EV: Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context. Genome Res 2001, 11:356-372. 68. Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A: Comparative protein structure modeling of genes and genomes. Annu Rev Biophys Biomol Struct 2000, 29:291-325. 69. Tramontano A, Leplae R, Morea V: Analysis and assessment of comparative modeling predictions in CASP4. Proteins 2001, 45:22-38. 70. Romanowski MR, Bonanno JB, Burley SK: Crystal structure of • Streptococcus pneumoniae phosphomevalonate kinase. Proteins 2002, 47:568-571. The NYSGRC X-ray structure of PMK from the MVA pathway. This structure was pursued as a result of directed target selection based on the structures of MDD and HSK. 71. Yang D, Shipman LW, Roessner CA, Scott AI, Sacchettini JC: • Structure of the Methanococcus jannaschii mevalonate kinase – a member of the GHMP kinase superfamily. J Biol Chem 2002, 277:9462-9467. The X-ray structure of an archaeal MK from the MVA pathway. As another example of a prototypical GHMP kinase, this structure allows biologists to probe the differences in specificity of the superfamily. 72. Fu Z, Wang M, Potter D, Miziorko HM, Kim JJ: The structure of a • binary complex between a mammalian mevalonate kinase and ATP: insights into the reaction mechanism and human inherited disease. J Biol Chem 2002, 27:in press. The X-ray structure of a eukaryotic MK from the MVA pathway. This structure may allow direct insight into mutations in human MK that are implicated in disease. 73. Houten SM, Koster J, Romeijn G-J, Frenkel J, Di Rocco M, Caruso U, Landrieu P, Kelly RI, Kuis W, Poll-The BT et al.: Organization of the mevalonate kinase (MVK) gene and identification of novel mutations causing mevalonic aciduria and hyperimmunoglobulinaemia D and periodic fever syndrome. Eur J Hum Genet 2001, 9:253-259. 74. Cuisset L, Drenth JPH, Simon A, Vincent MF, van der Velde Visser S, van der Meer JWM, Grateau G, Delpech M: Molecular analysis of MVK mutations and enzymatic activity in hyper-IgD and periodic fever syndrome. Eur J Hum Genet 2001, 9:260-266. 75. Sanchez R, Sali A: ModBase: a database of comparative protein structure models. Bioinformatics 1999, 15:1060-1061. 76. Reuter K, Sanderbrand S, Jomaa H, Wiesner J, Steinbrecher I, Beck E, • Hintz M, Klebe G, Stubbs MT: Crystal structure of 1-deoxy-Dxylulose-5-phosphate reductoisomerase, a crucial enzyme in the non-mevalonate pathway of isoprenoid biosynthesis. J Biol Chem 2002, 277:5378-5384. The X-ray structure of DOXP reductase from the DOXP pathway is revealed. Structural genomics Burley and Bonanno 77. • Richard SB, Bowman ME, Kwiatkowski W, Kang I, Chow C, Lillo AM, Cane DE, Noel JP: Structure of 4-diphosphocytidyl-2-Cmethylerythritol synthetase involved in mevalonate-independent isoprenoid biosynthesis. Nat Struct Biol 2001, 8:641-648. The X-ray structure of ygbP/ispD from the DOXP pathway is revealed. 78. Richard SB, Ferrer JL, Bowman ME, Lillo AM, Tetzlaff CN, Cane DE, • Noel JP: structure and mechanism of 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase. An enzyme in the mevalonateindependent isoprenoid biosynthetic pathway. J Biol Chem 2002, 277:8667-8672. The X-ray structure of ygbB/ispF from the DOXP pathway is revealed. 79. Steinbacher S, Kaiser J, Wungsintaweekul J, Hecht S, Eisenreich W, • Gerhardt S, Bacher A, Rohdich F: Structure of 2C-methyl-Derythritol-2,4-cyclodiphosphate synthase involved in mevalonate-independent biosynthesis of isoprenoids. J Mol Biol 2002, 316:79-88. The X-ray structure of ygbB/ispF in the DOXP pathway. 80. Cunningham FX Jr, Lafond TP, Gantt E: Evidence of a role for LytB in the nonmevalonate pathway of isoprenoid biosynthesis. J Bacteriol 2000, 182:5841-5848. 81. Altincicek B, Kollas A, Eberl M, Wiesner J, Sanderbrand S, Hintz M, Beck E, Jomaa H: LytB, a novel gene of the 2-C-methyl-D-erythritol 4-phosphate pathway of isoprenoid biosynthesis in Escherichia coli. FEBS Lett 2001, 499:37-40. 391 85. Hecht S, Eisenreich W, Adam P, Amslinger S, Kis K, Bacher A, Arigoni D, Rohdich F: Studies on the nonmevalonate pathway to terpenes: the role of the GcpE (IspG) protein. Proc Natl Acad Sci USA 2001, 98:14837-14842. 86. Lawrence CM, Rodwell VW, Stauffacher CV: Crystal structure of Pseudomonas mevalonii HMG-CoA reductase at 3.0 angstrom resolution. Science 1995, 268:1758-1762. 87. Tabernero L, Bochar DA, Rodwell VW, Stauffacher CV: Substrate-induced closure of the flap domain in the ternary complex structures provides insights into the mechanism of catalysis by 3-hydroxy-3-methylglutaryl-CoA reductase. Proc Natl Acad Sci USA 1999, 96:7167-7171. 88. Istvan ES, Palnitkar M, Buchanan SK, Deisenhofer J: Crystal structure of the catalytic portion of human HMG-CoA reductase: insights into regulation of activity and catalysis. EMBO J 2000, 19:819-830. 89. Istvan ES, Deisenhofer J: Structural mechanism for statin inhibition of HMG-CoA reductase. Science 2001, 292:1160-1164. 90. Istvan ES, Deisenhofer J: The structure of the catalytic portion of human HMG-CoA reductase. Biochim Biophys Acta 2000, 1529:9-18. 82. Campos N, Rodriguez-Concepcion M, Seemann M, Rohmer M, Boronat A: Identification of gcpE as a novel gene of the 2-C-methyl-D-erythritol 4-phosphate pathway for isoprenoid biosynthesis in Escherichia coli. FEBS Lett 2001, 488:170-173. 91. Martinez-Cruz LA, Dreyer MK, Boisvert DC, Yokota H, •• Martinez-Chantar ML, Kim R, Kim SH: Crystal structure of MJ1247 protein from M. jannaschii at 2.0 Å resolution infers a molecular function of 3-hexulose-6-phosphate isomerase. Structure 2002, 10:195-204. A report of functional annotation via X-ray crystallography from the UC Berkeley Structural Genomics Center. 83. Altincicek B, Kollas AK, Sanderbrand S, Wiesner J, Hintz M, Beck E, Jomaa H: GcpE is involved in the 2-C-methyl-D-erythritol 4-phosphate pathway of isoprenoid biosynthesis in Escherichia coli. J Bacteriol 2001, 183:2411-2416. 92. Hendrickson WA, Horton JR, LeMaster DM: Selenomethionyl proteins produced for analysis by multiwavelength anomalous diffraction (MAD): a vehicle for direct determination of three-dimensional structure. EMBO J 1990, 9:1665-1672. 84. McAteer S, Coulson A, McLennan N, Masters M: The lytB gene of Escherichia coli is essential and specifies a product needed for isoprenoid biosynthesis. J Bacteriol 2001, 183:7403-7407. 93. Hendrickson W: Determination of macromolecular structures from anomalous diffraction of synchrotron radiation. Science 1991, 254:51-58.