Structural genomics of proteins from conserved biochemical
pathways and processes
Stephen K Burley*† and Jeffrey B Bonanno*
During the past year, X-ray crystallographers and solution NMR
spectroscopists have made significant progress towards the
complete structural characterization of conserved biochemical
pathways and processes. Some of these advances were made
in the context of nascent structural genomics programs, which
promise to accelerate structural studies of biologically and
medically important proteins. The results of high-throughput
protein production, crystallization, structure determination,
homology modeling and functional annotation published by two
such programs have provided insight into the evolution and
function of enzymes in the isoprenoid biosynthesis and ribulose
monophosphate pathways.
Addresses
*Howard Hughes Medical Institute, Laboratories of Molecular
Biophysics, The Rockefeller University, 1230 York Avenue, New York,
New York 10021, USA
† Current address: Structural GenomiX Inc, 10505 Roselle Street,
San Diego, California 92121, USA; e-mail: [email protected]
Abbreviations
CMK diphosphocytidyl-2-C-methyl-D-erythritol kinase
DOXP 1-deoxy-D-xylulose-5-phosphate
GHMP GK–HSK–MK–PMK superfamily
GK galactokinase
HSK homoserine kinase
IDI1/2 type-1/2 isopentenyl diphosphate isomerase
iso-GlmS glucosamine-6-phosphate synthase
MDD mevalonate-5-diphosphate decarboxylase
MK mevalonate kinase
NIGMS National Institute of General Medical Sciences
NIH National Institutes of Health
NYSGRC New York Structural Genomics Research Consortium
PDB Protein Data Bank
PHI 6-phospho-3-hexulose isomerase
PMK phosphomevalonate kinase
PSI Protein Structure Initiative
rmsd root mean square deviation
Introduction
High-throughput genome sequencing has changed biological
and biomedical research, transforming the scale, scope and
even concept of what we mean by the term ‘discovery’. For
some time, structural biologists have used public-domain
sequence information to identify biologically interesting
and/or medically important experimental structure
determination candidates. More recently, X-ray crystallographers
and solution NMR spectroscopists have embarked
on systematic, large-scale programs of structure determination
aimed at exploring all of protein structure space.
The benefits of bringing three-dimensional information to
bear on the challenges posed by biological and biomedical
research are well recognized. Experimental structures
serve as templates for calculating homology models of
closely related amino acid sequences, providing enormous
leverage in the amount of three-dimensional information
(i.e. atomic coordinates) coming from a single structure
determination. Protein structures (both experimental and
those obtained via homology modeling) yield insights into
biochemical function and, in favorable cases, biological
function, enzyme mechanism, protein–ligand interactions
and oligomerization state(s). The structures of viral or
bacterial proteins, and human disease gene products can
also be used for identification and optimization of new
pharmaceutical agents (reviewed in [1,2]).
A comprehensive structural database containing experimental
structures and homology models for every protein
sequence found in nature will accelerate research in all
areas of biomedicine. Structural biologists are poised to
make decisive contributions to this objective with the
development and execution of high-throughput protein
structure determination combined with automated
homology modeling (also known as comparative protein
structure modeling). The relative simplicity of protein
structure space, which is composed of only 2000–5000
conserved shapes or domain folds, provides enormous
leverage for both X-ray and NMR structures. A recent
study of 29 organisms with fully sequenced genomes
showed that 30–40% of their proteins belong to families
with more than 100 orthologous or paralogous members
[3]. The careful selection of targets for experimental
structure determination will permit the generation of
numerous homology models within these families, providing
extensive coverage of protein sequence/structure space
[4]. In this review, we examine the utility of determining
experimental structures and calculating homology models,
with particular emphasis on studies of conserved biochemical
pathways and processes recently published by two structural
genomics centers.
The primary objective of structural genomics
The overall goal of nascent international structural
genomics initiatives is a fundamental three-dimensional
understanding of the protein universe. The National
Institutes of Health (NIH) National Institute of General
Medical Sciences (NIGMS) Protein Structure Initiative
(PSI) [5] has funded nine P50 Center grants and two P01
Program Project grants for structural genomics (Table 1).
Additional efforts are also underway in Japan, Western
Europe, Canada and South America [6]. Not surprisingly,
each effort has adopted slightly different strategies to
accomplish the task. Some have targeted entire proteomes
of model microorganisms, some have dissected protein
fold space into pathways, whereas others have utilized
Current Opinion in Structural Biology 2002, 12:383–391
0959-440X/02/$ — see front matter
© 2002 Elsevier Science Ltd. All rights reserved.
Sequences and topology
Table 1. NIH NIGMS structural genomics projects.

Consortium: NYSGRC†
Principal investigator*: Stephen K Burley, The Rockefeller University
Focus: Method: X-ray. Targets: novel structural data; all kingdoms of life with emphasis on medically relevant proteins
URL: http://www.nysgrc.org

Consortium: Structural Genomics of Pathogenic Protozoa Consortium†
Principal investigator*: Wim GJ Hol, University of Washington
Focus: Method: X-ray. Targets: proteins from pathogens (Leishmania major, Trypanosoma brucei, Trypanosoma cruzi and Plasmodium falciparum)
URL: http://depts.washington.edu/sgpp

Consortium: Midwest Center for Structural Genomics†
Principal investigator*: Andrzej Joachimiak, Argonne National Laboratory
Focus: Method: X-ray. Targets: novel structural data; all kingdoms of life
URL: http://www.mcsg.anl.gov

Consortium: UC Berkeley Structural Genomics Center†
Principal investigator*: Sung-Hou Kim, Lawrence Berkeley National Laboratory
Focus: Method: X-ray, NMR. Targets: whole organism proteomes (X-ray, Mycoplasma genitalium; NMR, Mycoplasma pneumoniae)
URL: http://www.strgen.org

Consortium: Center for Eukaryotic Structural Genomics†
Principal investigator*: John L Markley, University of Wisconsin, Madison
Focus: Method: X-ray, NMR. Targets: novel structural, functional data; eukaryotic proteins (model A. thaliana)
URL: http://www.uwstructuralgenomics.org

Consortium: Northeast Structural Genomics Consortium†
Principal investigator*: Gaetano Montelione, Rutgers University
Focus: Method: X-ray, NMR. Targets: novel structural data; eukaryotic proteins (e.g. S. cerevisiae, C. elegans, D. melanogaster, human) or practical prokaryotic homologs
URL: http://www.nesg.org

Consortium: TB Structural Genomics Consortium†
Principal investigator*: Thomas Terwilliger, Los Alamos National Laboratory
Focus: Method: X-ray, NMR. Targets: whole M. tuberculosis proteome with emphasis on functionally important proteins
URL: http://www.doembi.ucla.edu/TB

Consortium: The Southeast Collaboratory for Structural Genomics†
Principal investigator*: Bi-Cheng Wang, University of Georgia
Focus: Method: X-ray, NMR. Targets: whole organism proteomes (C. elegans, Pyrococcus furiosus), human proteins
URL: http://128.192.15.145/secsg

Consortium: Joint Center for Structural Genomics†
Principal investigator*: Ian Wilson, The Scripps Research Institute
Focus: Method: X-ray. Targets: novel structural data; proteins from T. maritima and C. elegans
URL: http://www.jcsg.org

Consortium: Structural Genomics of Integral Membrane Proteins‡
Principal investigator*: Timothy Cross, Florida State University
Focus: Method: NMR. Targets: membrane proteins from M. tuberculosis
URL: http://magnet.fsu.edu/~changlin/MBPweb

Consortium: Structure 2 Function Pilot Project at CARB/TIGR‡
Principal investigator*: John Moult, University of Maryland
Focus: Method: X-ray, NMR. Targets: model organism; Haemophilus influenzae
URL: http://s2f.carb.nist.gov

*Spatial constraints prohibit listing all researchers from the more than 65 institutions partaking in these studies; see URLs for other participants and institutions. †NIGMS PSI P50 Center. ‡NIGMS PSI P01 Program Project.
bioinformatic analyses of sequence databases to identify
potential sequence/structure families for which no structural
information is available.
It is generally agreed that a thorough understanding of
protein sequence/structure space represents one of the most
important goals of the new discipline of structural genomics
(see the overview by Burley [7] in a supplement dedicated to
the subject; readers are urged to examine the remainder of
this supplement). Targets are chosen for high-throughput
structural study with the expectation that each experimental
X-ray or NMR structure will permit accurate homology
modeling of a subset of protein sequence space. With careful
target selection, these efforts should lead to the creation of a
publicly available database containing structural information
for the vast majority of protein sequences found in nature.
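The target selection idea described above can be illustrated with a small sketch. It assumes that, for each candidate target, one already knows the set of sequences that could be modeled accurately from its structure; the review does not say how such sets are computed, and the greedy strategy and every name below (`greedy_targets`, `modelable`) are illustrative assumptions rather than the procedure used by any structural genomics center.

```python
def greedy_targets(modelable, max_targets=None):
    """Pick targets greedily so that each new structure covers the most
    sequences not yet covered by an earlier pick.

    modelable: dict mapping a candidate target name to the set of sequence
               names that could be modeled from its structure (hypothetical
               input; how it is derived is outside this sketch).
    Returns (chosen targets in pick order, sequences left uncovered).
    """
    uncovered = set().union(*modelable.values()) if modelable else set()
    chosen = []
    while uncovered and (max_targets is None or len(chosen) < max_targets):
        # Sort the names so that ties are broken deterministically.
        best = max(sorted(modelable), key=lambda t: len(modelable[t] & uncovered))
        gain = modelable[best] & uncovered
        if not gain:
            break  # no remaining candidate adds coverage
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered
```

Capping `max_targets` shows how much of sequence space remains uncovered for a given experimental budget, which is the trade-off the paragraph above alludes to.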
Lessons from structural genomics: evolution
of biochemical pathways and processes
Perhaps the earliest ‘structural genomics’ project targeted
the glycolytic pathway. Researchers from six institutions
adopted a ‘divide and conquer’ philosophy to carry out
structural studies of every enzyme in the biochemical
pathway from glucose to pyruvate [8]. The results of that
pioneering effort (and a host of more recent studies,
reviewed in [9]) yielded a detailed view of enzyme
mechanism and biochemical function for the entire
pathway. A recent review by Teichmann et al. [10]
addressed the impact of experimental structures attributed
to modern structural genomics projects on our understanding of protein function, evolution and interactions. The
authors summarized work on 42 proteins, commenting on
structural novelty (i.e. did the work reveal a new fold?),
making superfamily assignments and examining the utility
of new structures for the prediction of interaction partners.
Automated methods to analyze multiple genomes for the
assignment of domain structure and functional annotation
have been implemented and their results are available via
the World Wide Web. For example, the Protein Data Bank
(PDB; http://www.rcsb.org; [11]) provides direct entry
points from each structure to five databases (CATH [12,13],
Figure 1. Pathways for the biosynthesis of isopentenyl diphosphate. Isopentenyl diphosphate, the central intermediate in sterol/isoprenoid biosynthesis, is produced by two independent pathways (mevalonate-dependent and -independent), which have different evolutionary distributions. Enzymes with a representative structure in the PDB are underlined.
[Pathway diagram; underlining not preserved. Mevalonate-dependent pathway: acetyl-CoA + acetoacetyl-CoA → HMG-CoA (HMG-CoA synthase) → mevalonate (HMG-CoA reductase) → mevalonate 5-phosphate (MK) → mevalonate 5-diphosphate (PMK) → isopentenyl diphosphate (MDD). Mevalonate-independent pathway: D-glyceraldehyde-3-phosphate + pyruvate → 1-deoxy-D-xylulose-5-phosphate, DOXP (DOXP synthase) → 2C-methyl-D-erythritol-4-phosphate, MEP (DOXP reductase) → 4-diphosphocytidyl-2-C-methyl-D-erythritol (ygbP/ispD) → 4-diphosphocytidyl-2-C-methyl-D-erythritol-2-phosphate (CMK/ychB/ispE) → 2C-methyl-D-erythritol-2,4-cyclodiphosphate (ygbB/ispF) → isopentenyl diphosphate (GcpE/ispG, LytB/ispH). IDI interconverts isopentenyl diphosphate and dimethylallyl diphosphate, which lead to isoprenoid-based products.]
CE [14,15], FSSP [16,17], SCOP [18,19] and VAST [20]).
These links provide supplementary information relating
a given structure to protein sequence/structure families
and superfamilies. Profile-based searching strategies
(IMPALA [21], PSI-BLAST [22] and hidden Markov
models [23–25]) have dramatically increased the radius
of convergence for the detection of distant sequence
relationships [26–33]. Other publicly available resources
that match amino acid patterns or motifs to databases of
profiles or structural alignments are also undergoing
continuous development (Pfam [34,35], PROSITE
[36,37], SMART [38], BLOCKS [39], InterPro [40] and
CDD [41]). Finally, Bourne and co-workers [42–44] have
developed a database of ‘conserved key amino acid positions’
(CKAAPs), which can be searched to reveal more elusive
structural and functional relationships.
Structural genomics of sterol/isoprenoid biosynthesis
A case study of structural genomics applied to the problem of understanding a medically important biochemical pathway has been published by the New York Structural Genomics Research Consortium (NYSGRC; http://www.nysgrc.org; an NIH-NIGMS-funded P50 PSI Structural Genomics Center; Table 1) [45••]. One of the first NYSGRC structure determination targets was Saccharomyces cerevisiae mevalonate-5-diphosphate decarboxylase (MDD; NYSGRC target P100; PDB code 1FI4), an enzyme from the mevalonate-dependent sterol/isoprenoid biosynthesis pathway that catalyzes the third of three ATP-dependent steps responsible for the conversion of mevalonate to isopentenyl diphosphate (Figure 1). Identified as a protein from a large superfamily with no available structural information, MDD was rapidly cloned, expressed, purified and crystallized, resulting in a high-resolution X-ray structure (Figure 2a). At the time that these studies were completed, MDD had no homologs in the PDB, as judged by the DALI server [46,47].
Automated homology modeling with MODPIPE [48] and the structure of yeast MDD yielded high-quality models for the MDD enzyme family (22 sequences), plus a much larger number of less accurate homology models for various GHMP small-molecule kinases [49], including the galactokinases (GKs), homoserine kinases (HSKs), mevalonate kinases (MKs) and phosphomevalonate kinases (PMKs), as well as for diphosphocytidyl-2-C-methyl-D-erythritol kinases (CMK, an enzyme in the mevalonate-independent 1-deoxy-D-xylulose-5-phosphate or DOXP pathway; Figure 1), other poorly characterized enzymes and some hypothetical proteins. Mapping the sequence similarity among the MDDs to the molecular surface of yeast MDD identified a cleft lined with highly conserved residues in close proximity to a conserved ATP-binding motif and permitted identification of the enzyme active site.
Following the MDD structure determination and deposition of the MDD atomic coordinates in the PDB, X-ray structures of Methanococcus jannaschii HSK, its binary complex with ADP (Figure 2b; PDB codes 1FWL and 1FWK; [50•]) and its ternary complexes with ATP analogs and various substrates (PDB codes 1H72, 1H73 and 1H74; [51•]) were reported. As anticipated from earlier homology modeling efforts with MDD, the X-ray structures of MDD and HSK are strikingly similar, despite low amino acid sequence identity (13% identity for 276 structurally equivalent α-carbons with rmsd 3.0 Å). The bound nucleotide and substrates found in the HSK co-crystal structures confirmed the predicted location of the MDD active site.
Inspection of the homology modeling results obtained with both structures (MDD and HSK) revealed models encompassing eight discrete enzyme activities and three distinct groups of hypothetical sequences. The models provided direct insights into the evolution of the mevalonate-dependent sterol/isoprenoid biosynthesis pathway. All three enzymes on the pathway from mevalonate to isopentenyl diphosphate (MK, PMK and MDD; Figure 1) share a common fold and may provide an example of a process that was originally termed retrograde evolution by Horowitz [52]. The evolution of the mevalonate-dependent (MVA) and -independent (DOXP) pathways has been the subject of recent comparative genomics analyses [53–55]. The distribution and loci of genes for the two pathways suggest that the MVA pathway initially appeared in an archaebacterium or a primitive eukaryote, and was subsequently acquired by Gram-positive cocci and Borrelia burgdorferi through lateral gene transfer events.
Figure 2. Structural comparison of MDD, HSK and PMK, members of the GHMP kinase superfamily. Ribbon drawings of (a) MDD, (b) HSK and (c) PMK in the same orientation. P-loops are colored red. Pairwise comparison between structures: MDD/HSK, rmsd 3.0 Å, 13% sequence identity; MDD/PMK, rmsd 3.4 Å, 8% sequence identity; PMK/HSK, rmsd 2.8 Å, 16% sequence identity.
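The two quantities quoted in these pairwise comparisons, rmsd over structurally equivalent α-carbons and percent sequence identity, can be sketched as follows. This is a minimal illustration under stated assumptions: the coordinates are taken to be already superimposed and paired (the superposition step itself is not shown), identity is taken over aligned non-gap positions, and the function names are hypothetical. The review does not describe the software used to obtain its values.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired (x, y, z) coordinates.

    Assumes the two structures are already superimposed and that the i-th
    entries of both lists are structurally equivalent alpha-carbons.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over aligned, non-gap positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)
```

Other normalizations of percent identity exist (for example, dividing by the length of the shorter sequence), so values computed this way are comparable only when the same convention is used throughout.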
Genes encoding MK, PMK and MDD fall within a single
operon in multiple archaebacteria [55] and, in several
cases, this isoprenoid biosynthesis operon also includes
type-2 isopentenyl diphosphate isomerase (IDI2), one of two
distinct enzymes that act on the product of MDD catalysis.
The gene encoding IDI2 was previously annotated as a
carotenoid-biosynthesis-related enzyme, but has recently
been characterized as possessing FMN- and NADPH-dependent IDI activity [56]. (Another enzyme, LytB, has been
implicated in a similar function in an engineered Escherichia
coli strain [57].) The structure of type-1 IDI (IDI1; Figure 1)
has been reported by the NYSGRC (PDB code 1I9A; [45••])
and by Durbecq et al. [58•] (PDB codes 1HZT and 1HX3).
It is evident from these reports that the mevalonate-independent pathway is almost certainly older than its
mevalonate-dependent counterpart. Bacteria that possess
both sterol/isoprenoid biosynthesis pathways (some
Streptomyces, for example) are thought to use the DOXP
pathway to produce primary metabolites during exponential
growth, while employing nonessential MVA pathway
enzymes in the stationary phase to produce secondary
factors, including antibiotics [59–62].
Given that the more ancient DOXP pathway contains
CMK [63,64], which possesses the GHMP kinase fold, it is
also likely that one of the other GHMP kinase superfamily
members (although not necessarily CMK) is the evolutionary
ancestor of MK, PMK and MDD [49]. Multiple gene
duplications followed by mutations to impart new substrate
specificity could generate the three genes necessary to
convert mevalonate into isopentenyl diphosphate. It appears
more than fortuitous that these genes (and, in some cases,
the gene encoding IDI2) are under the control of a single
bacterial promoter. This chromosomal arrangement may
have facilitated lateral gene transfer during evolution
following the ‘selfish operon’ model [65,66]. Presumably,
loss of the operon structure in more distantly related species
reflects recombination events. Observed operon structures
may, therefore, be of limited use in studying the evolution
of the sterol/isoprenoid biosynthesis pathways [67].
Homology modeling of GHMP kinase superfamily members
other than MDDs and HSKs depended on distantly
related experimental templates (i.e. those with less than
the generally accepted threshold of 30% amino acid
identity to the modeled protein sequence). These models
are almost certainly less accurate than those calculated
with closely related templates (i.e. >30% identity between
the template and the modeled sequence; see [68,69] for
comprehensive analyses of errors in comparative protein
structure modeling). A phylogenetic analysis of all modeled
GHMP kinase superfamily members identified 19 clusters at
the 30% sequence identity level, from which 17 new structure
determination targets were selected. To date, three of these
targets have been examined by X-ray crystallography, one
by the NYSGRC (PMK) [70•] and two (MK, vide infra) by
independent laboratories [71•,72•].
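The grouping of superfamily members into 30% sequence identity clusters, followed by selection of clusters that still lack an experimental structure, can be sketched as below. The review reports a phylogenetic analysis and does not give the clustering procedure; single-linkage clustering over a precomputed table of pairwise identities is an assumption made here for illustration, and all function and parameter names are hypothetical.

```python
def cluster_by_identity(names, identity, threshold=30.0):
    """Single-linkage clustering: two sequences end up in the same cluster
    when a chain of pairwise identities >= threshold connects them.

    names:    list of sequence names
    identity: dict mapping (name_a, name_b) to percent identity; pairs that
              are missing are treated as 0% (an assumption of this sketch)
    """
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            pid = identity.get((a, b), identity.get((b, a), 0.0))
            if pid >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    # Largest clusters first; ties are ordered by member names.
    return sorted(clusters.values(), key=lambda c: (-len(c), c))


def clusters_needing_targets(clusters, solved):
    """Clusters in which no member has an experimental structure yet."""
    return [c for c in clusters if not any(m in solved for m in c)]
```

Each cluster returned by `clusters_needing_targets` would call for at least one new experimental structure before its members could be modeled from a template above the threshold.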
The structure of Streptococcus pneumoniae PMK (PDB code
1K47; NYSGRC target T27; Figure 2c) confirmed the
predicted fold assignment, identified conserved residues
in the active site and allowed the more accurate modeling
of enzymes constituting the 30% sequence identity
PMK cluster (accurate models were obtained for all but
fungal PMKs). M. jannaschii MK (PDB code 1KKH) was
also shown to be a member of the GHMP kinase
sequence/structure superfamily and in silico docking of ATP
and mevalonate (based on the HSK co-crystal structures)
into the active site revealed a constellation of putative
contacts with conserved residues. The co-crystal structure
of rat MK bound to ATP (PDB code 1KVK) was used to
understand the structural and functional consequences of
mutations causing human hyperimmunoglobulinaemia D
(hyper-IgD) and periodic fever syndrome or mevalonic
aciduria [73,74].
Although structures of both archaeal and eukaryotic MKs
have been elucidated, the results of clustering analyses
suggest that eubacterial MKs cannot be modeled at
acceptable accuracy levels using the existing X-ray
structures as templates. The NYSGRC is proceeding with
the elucidation of an additional MK structure from a
Gram-positive eubacterium. Once this experimental
structure is available, accurate homology modeling of MKs
from all three living kingdoms should be possible and
reliable structural information will be made publicly
available for all extant MK sequences via MODBASE
([48,75]; http://www.nysgrc.org).
There is also considerable interest in the mevalonate-independent pathway outside the context of organized
structural genomics efforts. Both academic and industrial
research laboratories have completed X-ray structure
determinations of DOXP pathway enzymes. Stubbs and
co-workers [76•] published the structure of DOXP
reductoisomerase or DOXP reductase (PDB code 1K5H;
Figure 1), revealing a V-shaped monomer that is composed
of an N-terminal dinucleotide-binding domain, a linker
region and a C-terminal four-helix bundle domain. The
linker region is responsible for dimerization and harbors
most of the active site residues, which include strictly
conserved acidic residues thought to coordinate catalytic
divalent metals. Another group has also deposited
coordinates for this structure (PDB code 1JVS). Both Noel
and co-workers [77•] and Structural GenomiX Inc have
determined the structure of 4-diphosphocytidyl-2-C-methyl-D-erythritol synthetase (ygbP/ispD; PDB codes
1INI, 1INJ and 1I52; Figure 1). Additional crystallographic
work at Structural GenomiX Inc, the Max-Planck-Institut
für Biochemie, the Salk Institute and the Center for
Advanced Research in Biotechnology (CARB) has yielded
structures of 2-C-methyl-D-erythritol-2,4-cyclodiphosphate
synthase [78•,79•] (ygbB/ispF; PDB codes 1JY8, 1KNK,
1KNJ and 1JN1; also 1IV1–4, deposited by an unidentified
research group). The enzymes GcpE and LytB are thought to
be involved in the mevalonate-independent pathway [80–84]
and have been called ispG and ispH, respectively, as they
appear to catalyze steps downstream of ispF in engineered
E. coli [57,85]. These X-ray structures and the homology
model of CMK/ychB/ispE provide three-dimensional
information for many of the mevalonate-independent
pathway enzymes depicted in Figure 1, leaving DOXP
synthase, ispG and ispH as structure determination targets.
A similar situation prevails for the mevalonate-dependent
pathway, for which only structures of IDI2 and HMG-CoA
synthase are lacking (low accuracy homology models for
the latter can be computed using a β-ketoacyl-acp synthase
III template; PDB code 1HNJ). X-ray structures of the
catalytic domain of HMG-CoA reductase have been available
for some time [86,87] and an exhaustive series of co-crystal
structures with various lipid-lowering HMG-CoA reductase
inhibitors or statins has been published recently by
Deisenhofer and co-workers [88–90].
Figure 3. Structural comparison of M. jannaschii target MJ1247 and iso-GlmS. Ribbon drawings of (a) tetrameric MJ1247 and (b) the dimeric isomerization domain of iso-GlmS in similar topological orientations.
Structural genomics to assign biochemical function
Another illustrative example of the utility of structural
genomics in establishing functional annotations for enzymatic
processes was published by the UC Berkeley Structural
Genomics Center [91••] (Table 1). The sequence of
M. jannaschii target MJ1247, annotated as a hypothetical
protein, was presumptively identified as a distant relative
of 6-phospho-3-hexulose isomerase (PHI, an enzyme in
the ribulose monophosphate pathway for formaldehyde
fixation). The X-ray structure of MJ1247 (PDB code
1JEO), determined by Se-Met multiwavelength anomalous
dispersion (or MAD) [92,93], revealed a homotetrameric
arrangement of α–β–α sandwich monomers (Figure 3a).
Significant structural similarity to two PDB entries was
detected by the DALI server [46,47] and, not surprisingly,
these related sequences were found to have been previously
annotated with small-molecule phosphate isomerase
activity: the isomerase domain of E. coli glucosamine-6-phosphate synthase (iso-GlmS; PDB code 1MOQ;
subdomain 1: 20% identical with rmsd 2.9 Å for 148
equivalent α carbons; subdomain 2: 10% identical with
rmsd 3.2 Å for 157 equivalent α carbons; vide infra) and
phosphoglucose isomerase (PGI; PDB code 2PGI; 6%
identical with rmsd 3.8 Å for 143 equivalent α carbons).
Inspection of a sequence alignment of MJ1247 with known
or putative PHIs revealed regions of highly conserved
residues mapping to features on the surface of the
tetramer that coalesce to form four clefts.
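Identifying highly conserved residues from a sequence alignment, as described above, amounts to scoring each alignment column. The sketch below uses the simplest possible score, the fraction of sequences that carry the most common residue in a column. The scoring scheme, the 0.9 cutoff and the function names are illustrative assumptions; the review does not specify how conservation was measured, and mapping the conserved positions onto a structure is not shown.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Per-column conservation for equal-length aligned sequences.

    The score is the count of the most common non-gap residue divided by the
    total number of sequences, so gaps lower the score of a column.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != gap]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(alignment))
    return scores


def conserved_positions(alignment, cutoff=0.9):
    """Zero-based column indices whose conservation score meets the cutoff."""
    return [i for i, s in enumerate(column_conservation(alignment)) if s >= cutoff]
```

A score of this kind ignores chemical similarity between residues (serine and threonine count as different), so it is a conservative first pass rather than a substitute for a substitution-matrix-based measure.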
The structural alignment of tetrameric MJ1247 with
dimeric iso-GlmS (the monomer is composed of two
isostructural subdomains and the dimer possesses the same
overall topology as the MJ1247 tetramer; Figure 3a,b)
revealed four structurally conserved residues in each
cleft, three serine and one threonine, which contribute to
phosphate binding in iso-GlmS. Surface residues that
determine the sugar-binding specificity of iso-GlmS are
not present in the MJ1247 sequence and appear to be
replaced by amino acids that support the binding of
ligand(s) specific to MJ1247. Finally, an assay of PHI
isomerization activity in MJ1247 confirmed the functional
annotation inferred by X-ray crystallography.
Conclusions and perspectives
Nascent structural genomics initiatives and combined
research efforts in both academic and industrial research
laboratories are determining an enormous number of
experimental protein structures. When coupled with
automated homology modeling, these structures provide a
wealth of three-dimensional information that eventually
promises to encompass most of the proteins found in
nature. Beyond the obvious technical difficulties inherent
in large-scale experimental and computational efforts
aimed at structural characterization of the universe of
protein sequences, there are considerable challenges ahead
for the field of bioinformatics. Organizing this vast body
of structural information, attributing accurate functional
annotations and integrating these data with the results of
expression profiling and protein–protein and protein–ligand
interaction studies (among others) represent very real
bottlenecks in our quest for knowledge in biology.
22. Altschul SF, Madden TL, Schaffer AA, Zhang JZ, Miller W, Lipman DJ:
Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs. Nucleic Acids Res 1997,
25:3389-3402.
References and recommended reading
23. Krogh A, Brown M, Mian IS, Sjolander K, Haussler D: Hidden Markov
models in computational biology. Applications to protein
modeling. J Mol Biol 1994, 235:1501-1531.
Papers of particular interest, published within the annual period of review,
have been highlighted as:
• of special interest
•• of outstanding interest
1.
Harris T: Genetics, genomics, and drug discovery. Med Res Rev
2000, 20:203-211.
2.
Weng Z, DeLisi C: Protein therapeutics: promises and challenges
for the 21st century. Trends Biotechnol 2002, 20:29-35.
3.
Liu J, Rost B: Comparing function and structure between entire
proteomes. Protein Sci 2001, 10:1970-1979.
4.
Vitkup D, Melamud E, Moult J, Sander C: Completeness in structural
genomics. Nat Struct Biol 2001, 8:559-566.
5.
Norvell JC, Machalek AZ: Structural genomics programs at the US
National Institute of General Medical Sciences. Nat Struct Biol
2000, 7(suppl):931.
6.
Stevens RC, Yokoyama S, Wilson IA: Global efforts in structural
genomics. Science 2001, 294:89-92.
7.
Burley SK: An overview of structural genomics. Nat Struct Biol
2000, 7(suppl):932-934.
8.
Campbell JW, Duee E, Hodgson G, Mercer WD, Stammers DK,
Wendell PL, Muirhead H, Watson HC: X-ray diffraction studies on
enzymes in the glycolytic pathway. Cold Spring Harb Symp Quant
Biol 1972, 36:165-170.
9.
Erlandsen H, Abola EE, Stevens RC: Combining structural
genomics and enzymology: completing the picture in metabolic
pathways and enzyme active sites. Curr Opin Struct Biol 2000,
10:719-730.
10. Teichmann SA, Murzin AG, Chothia C: Determination of protein
function, evolution and interactions by structural genomics. Curr
Opin Struct Biol 2001, 11:354-363.
11. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H,
Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids
Res 2000, 28:235-242.
12. Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB,
Thornton JM: CATH—a hierarchic classification of protein domain
structures. Structure 1997, 5:1093-1108.
24. Eddy SR: Hidden Markov models. Curr Opin Struct Biol 1996,
6:361-365.
25. Hughey R, Krogh A: Hidden Markov models for sequence analysis:
extension and analysis of the basic method. Comput Appl Biosci
1996, 12:95-107.
26. Buchan DW, Shepherd AJ, Lee D, Pearl FM, Rison SC, Thornton JM,
Orengo CA: Structural assignment for whole genes and genomes
using the CATH domain structure database. Genome Res 2002,
12:503-514.
27.
Orengo CA, Bray JE, Buchan DW, Harrison A, Lee D, Pearl FM,
Sillitoe I, Todd AE, Thornton JM: The CATH protein family database:
a resource for structural and functional annotation of genomes.
Proteomics 2002, 2:11-21.
28. Pearl FM, Lee D, Bray JE, Buchan DW, Shepherd AJ, Orengo CA:
The CATH extended protein-family database: providing structural
annotations for genome sequences. Protein Sci 2002,
11:233-244.
29. Gough J, Chothia C: SUPERFAMILY: HMMs representing all
proteins of known structure. SCOP sequence searches,
alignments and genome assignments. Nucleic Acids Res 2002,
30:268-272.
30. Pandit SB, Gosar D, Abhiman S, Sujatha S, Dixit SS, Mhatre NS,
Sowdhamini R, Srinivasan N: SUPFAM—a database of potential
protein superfamily relationships derived by comparing
sequence-based and structure-based families: implications for
structural genomics and function annotation in genomes. Nucleic
Acids Res 2002, 30:289-293.
31. Karplus K, Sjolander K, Barrett C, Cline M, Haussler D, Hughey R,
Holm L, Sander C: Predicting protein structure using hidden
Markov models. Proteins 1997, 1(suppl):134-139.
32. Karplus K, Barrett C, Hughey R: Hidden Markov models for
detecting remote protein homologies. Bioinformatics 1998,
14:846-856.
33. Gough J, Karplus K, Hughey R, Chothia C: Assignment of homology
to genome sequences using a library of hidden Markov models
that represent all proteins of known structure. J Mol Biol 2001,
313:903-919.
13. Orengo CA, Pearl FMG, Bray JE, Todd AE, Martin AC, Lo Conte L,
Thornton JM: The CATH Database provides insights into protein
structure/function relationship. Nucleic Acids Res 1999, 27:275-279.
34. Sonnhammer EL, Eddy SR, Durbin R: Pfam: a comprehensive
database of protein domain families based on seed alignments.
Proteins 1997, 28:405-420.
14. Shindyalov IN, Bourne PE: Protein structure alignment by
incremental combinatorial extension (CE) of the optimal path.
Protein Eng 1998, 11:739-747.
35. Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR,
Griffiths-Jones S, Howe KL, Marshall M, Sonnhammer EL: The Pfam
protein families database. Nucleic Acids Res 2002, 30:276-280.
15. Shindyalov IN, Bourne PE: A database and tools for 3-D protein
structure comparison and alignment using the Combinatorial
Extension (CE) algorithm. Nucleic Acids Res 2001, 29:228-229.
36. Bairoch A: PROSITE: a dictionary of sites and patterns in proteins.
Nucleic Acids Res 1991, 19:2241-2245.
16. Holm L, Ouzounis C, Sander C, Tuparev G, Vriend G: A database of
protein structure families with common folding motifs. Protein Sci
1992, 1:1691-1698.
17. Holm L, Sander C: Touring protein fold space with Dali/FSSP.
Nucleic Acids Res 1998, 26:316-319.
18. Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural
classification of proteins database for the investigation of
sequences and structures. J Mol Biol 1995, 247:536-540.
19. Lo Conte L, Brenner SE, Hubbard TJ, Chothia C, Murzin AG: SCOP
database in 2002: refinements accommodate structural
genomics. Nucleic Acids Res 2002, 30:264-267.
37. Falquet L, Pagni M, Bucher P, Hulo N, Sigrist CJ, Hofmann K,
Bairoch A: The PROSITE database, its status in 2002. Nucleic
Acids Res 2002, 30:235-238.
38. Letunic I, Goodstadt L, Dickens NJ, Doerks T, Schultz J, Mott R,
Ciccarelli F, Copley RR, Ponting CP, Bork P: Recent improvements
to the SMART domain-based sequence annotation resource.
Nucleic Acids Res 2002, 30:242-244.
39. Henikoff JG, Greene EA, Pietrokovski S, Henikoff S: Increased
coverage of protein families with the blocks database servers.
Nucleic Acids Res 2000, 28:228-230.
20. Gibrat JF, Madej T, Bryant SH: Surprising similarities in structure
comparison. Curr Opin Struct Biol 1996, 6:377-385.
40. Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M,
Bucher P, Cerutti L, Corpet F, Croning MD et al.: The InterPro
database, an integrated documentation resource for protein
families, domains and functional sites. Nucleic Acids Res 2001,
29:37-40.
21. Schaffer AA, Wolf YI, Ponting CP, Koonin EV, Aravind L, Altschul SF:
IMPALA: matching a protein sequence against a collection of
PSI-BLAST-constructed position-specific score matrices.
Bioinformatics 1999, 15:1000-1011.
41. Marchler-Bauer A, Panchenko AR, Shoemaker BA, Thiessen PA,
Geer LY, Bryant SH: CDD: a database of conserved domain
alignments with links to domain three-dimensional structure.
Nucleic Acids Res 2002, 30:281-283.
42. Li WW, Reddy BV, Shindyalov IN, Bourne PE: CKAAPs DB:
a conserved key amino acid positions database. Nucleic Acids Res
2001, 29:329-331.
59. Dairi T, Hamano Y, Kuzuyama T, Itoh N, Furihata K, Seto H: Eubacterial
diterpene cyclase genes essential for production of the isoprenoid
antibiotic terpentecin. J Bacteriol 2001, 183:6085-6094.
43. Reddy BV, Li WW, Shindyalov IN, Bourne PE: Conserved key amino
acid positions (CKAAPs) derived from the analysis of common
substructures in proteins. Proteins 2001, 42:148-163.
60. Shiomi K, Iinuma H, Naganawa H, Isshiki K, Takeuchi T, Umezawa H:
Biosynthesis of napyradiomycins. J Antibiot (Tokyo) 1987,
40:1740-1745.
44. Li WW, Reddy BV, Tate JG, Shindyalov IN, Bourne PE: CKAAPs DB:
a Conserved Key Amino Acid Positions DataBase. Nucleic Acids
Res 2002, 30:409-411.
61. Seto H, Orihara N, Furihata K: Studies on the biosynthesis of
terpenoids produced by Actinomycetes. Part 4. Formation of
BE-40644 by the mevalonate and non-mevalonate pathways.
Tetrahedron Lett 1998, 39:9497-9500.
45. •• Bonanno JB, Edo C, Eswar N, Pieper U, Romanowski MJ, Ilyin V,
Gerchman SE, Kycia H, Studier FW, Sali A et al.: Structural
genomics of enzymes involved in steroid/isoprenoid
biosynthesis. Proc Natl Acad Sci USA 2001, 98:12896-12901.
The NYSGRC reports the X-ray structures of MDD and IDI1 from the
sterol/isoprenoid biosynthesis pathway. This paper gives a descriptive
account of structures resulting from high-throughput methodologies. The
implications of automated homology modeling for fold assignment, pathway
evolution and target selection for structural genomics are discussed.
46. Holm L, Sander C: Protein structure comparison by alignment of
distance matrices. J Mol Biol 1993, 233:123-138.
47. Dietmann S, Park J, Notredame C, Heger A, Lappe M, Holm L:
A fully automatic evolutionary classification of protein folds:
Dali Domain Dictionary version 3. Nucleic Acids Res 2001,
29:55-57.
48. Sanchez R, Pieper U, Mirkovic N, de Bakker PI, Wittenstein E, Sali A:
MODBASE, a database of annotated comparative protein
structure models. Nucleic Acids Res 2000, 28:250-253.
49. Bork P, Sander C, Valencia A: Convergent evolution of similar
enzymatic function on different protein folds: the hexokinase,
ribokinase, and galactokinase families of sugar kinases. Protein
Sci 1993, 2:31-40.
50. • Zhou T, Daugherty M, Grishin NV, Osterman AL, Zhang H: Structure
and mechanism of homoserine kinase: prototype for the GHMP
kinase superfamily. Structure 2000, 8:1247-1257.
The X-ray structure of HSK, which is structurally similar to MDD from the
MVA pathway, is reported. This first published structure of a bona fide
GHMP kinase offers a platform for expanding understanding of a large and
diverse superfamily of enzymes.
51. • Krishna SS, Zhou T, Daugherty M, Osterman A, Zhang H: Structural
basis for the catalysis and substrate specificity of homoserine
kinase. Biochemistry 2001, 40:10810-10818.
Further X-ray studies of the HSK mechanism of action. This paper
underscores the need for collaborative interaction of structural genomics programs
with other academic laboratories to further investigate enzyme mechanism
and specificity.
52. Horowitz NH: On the evolution of biochemical syntheses. Proc
Natl Acad Sci USA 1945, 31:153-157.
53. Smit A, Mushegian A: Biosynthesis of isoprenoids via
mevalonate in Archaea: the lost pathway. Genome Res 2000,
10:1468-1484.
54. Boucher Y, Doolittle WF: The role of lateral gene transfer in the
evolution of isoprenoid biosynthesis pathways. Mol Microbiol
2000, 37:703-716.
55. Wilding EI, Brown JR, Bryant AP, Chalker AF, Holmes DJ,
Ingraham KA, Iordanescu S, So CY, Rosenberg M, Gwynn MN:
Identification, evolution, and essentiality of the mevalonate
pathway for isopentenyl diphosphate biosynthesis in
Gram-positive cocci. J Bacteriol 2000, 182:4319-4327.
56. Kaneda K, Kuzuyama T, Takagi M, Hayakawa Y, Seto H: An unusual
isopentenyl diphosphate isomerase found in the mevalonate
pathway gene cluster from Streptomyces sp. strain CL190. Proc
Natl Acad Sci USA 2001, 98:932-937.
57. Rohdich F, Hecht S, Gartner K, Adam P, Krieger C, Amslinger S,
Arigoni D, Bacher A, Eisenreich W: Studies on the
nonmevalonate terpene biosynthetic pathway: metabolic
role of IspH (LytB) protein. Proc Natl Acad Sci USA 2002,
99:1158-1163.
58. • Durbecq V, Sainz G, Oudjama Y, Clantin B, Bompard-Gilles C,
Tricot C, Caillet J, Stalon V, Droogmans L, Villeret V: Crystal structure
of isopentenyl diphosphate:dimethylallyl diphosphate isomerase.
EMBO J 2001, 20:1530-1537.
The recently determined X-ray structures of apo and metal-chelated IDI1
from the sterol/isoprenoid biosynthesis pathway are reported.
62. Seto H, Watanabe H, Furihata K: Simultaneous operation of the
mevalonate and non-mevalonate pathways in the biosynthesis of
isopentenyl diphosphate in Streptomyces aeriouvifer. Tetrahedron
Lett 1996, 37:7979-7982.
63. Rohdich F, Wungsintaweekul J, Luttgen H, Fischer M, Eisenreich W,
Schuhr CA, Fellermeier M, Schramek N, Zenk MH, Bacher A:
Biosynthesis of terpenoids: 4-diphosphocytidyl-2-C-methyl-D-erythritol
kinase from tomato. Proc Natl Acad Sci USA 2000,
97:8251-8256.
64. Eisenreich W, Rohdich F, Bacher A: Deoxyxylulose phosphate
pathway to terpenoids. Trends Plant Sci 2001, 6:78-84.
65. Lawrence JG: Selfish operons and speciation by gene transfer.
Trends Microbiol 1997, 5:355-359.
66. Lawrence JG, Ochman H: Reconciling the many faces of lateral
gene transfer. Trends Microbiol 2002, 10:1-4.
67. Wolf YI, Rogozin IB, Kondrashov AS, Koonin EV: Genome alignment,
evolution of prokaryotic genome organization, and prediction of
gene function using genomic context. Genome Res 2001,
11:356-372.
68. Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A:
Comparative protein structure modeling of genes and genomes.
Annu Rev Biophys Biomol Struct 2000, 29:291-325.
69. Tramontano A, Leplae R, Morea V: Analysis and assessment of
comparative modeling predictions in CASP4. Proteins 2001,
45:22-38.
70. • Romanowski MJ, Bonanno JB, Burley SK: Crystal structure of
Streptococcus pneumoniae phosphomevalonate kinase. Proteins
2002, 47:568-571.
The NYSGRC X-ray structure of PMK from the MVA pathway. This structure
was pursued as a result of directed target selection based on the structures
of MDD and HSK.
71. • Yang D, Shipman LW, Roessner CA, Scott AI, Sacchettini JC:
Structure of the Methanococcus jannaschii mevalonate kinase – a
member of the GHMP kinase superfamily. J Biol Chem 2002,
277:9462-9467.
The X-ray structure of an archaeal MK from the MVA pathway. As another
example of a prototypical GHMP kinase, this structure allows biologists to
probe the differences in specificity of the superfamily.
72. • Fu Z, Wang M, Potter D, Miziorko HM, Kim JJ: The structure of a
binary complex between a mammalian mevalonate kinase and
ATP: insights into the reaction mechanism and human inherited
disease. J Biol Chem 2002, 27:in press.
The X-ray structure of a eukaryotic MK from the MVA pathway. This structure
may allow direct insight into mutations in human MK that are implicated
in disease.
73. Houten SM, Koster J, Romeijn G-J, Frenkel J, Di Rocco M, Caruso U,
Landrieu P, Kelly RI, Kuis W, Poll-The BT et al.: Organization of the
mevalonate kinase (MVK) gene and identification of novel
mutations causing mevalonic aciduria and
hyperimmunoglobulinaemia D and periodic fever syndrome. Eur J
Hum Genet 2001, 9:253-259.
74. Cuisset L, Drenth JPH, Simon A, Vincent MF, van der Velde Visser S,
van der Meer JWM, Grateau G, Delpech M: Molecular analysis of
MVK mutations and enzymatic activity in hyper-IgD and periodic
fever syndrome. Eur J Hum Genet 2001, 9:260-266.
75. Sanchez R, Sali A: ModBase: a database of comparative protein
structure models. Bioinformatics 1999, 15:1060-1061.
76. • Reuter K, Sanderbrand S, Jomaa H, Wiesner J, Steinbrecher I, Beck E,
Hintz M, Klebe G, Stubbs MT: Crystal structure of
1-deoxy-D-xylulose-5-phosphate reductoisomerase, a crucial enzyme in the
non-mevalonate pathway of isoprenoid biosynthesis. J Biol Chem
2002, 277:5378-5384.
The X-ray structure of DOXP reductase from the DOXP pathway is revealed.
77. • Richard SB, Bowman ME, Kwiatkowski W, Kang I, Chow C, Lillo AM,
Cane DE, Noel JP: Structure of 4-diphosphocytidyl-2-C-methylerythritol
synthetase involved in mevalonate-independent
isoprenoid biosynthesis. Nat Struct Biol 2001, 8:641-648.
The X-ray structure of ygbP/ispD from the DOXP pathway is revealed.
78. • Richard SB, Ferrer JL, Bowman ME, Lillo AM, Tetzlaff CN, Cane DE,
Noel JP: Structure and mechanism of 2-C-methyl-D-erythritol
2,4-cyclodiphosphate synthase. An enzyme in the mevalonate-independent
isoprenoid biosynthetic pathway. J Biol Chem 2002,
277:8667-8672.
The X-ray structure of ygbB/ispF from the DOXP pathway is revealed.
79. • Steinbacher S, Kaiser J, Wungsintaweekul J, Hecht S, Eisenreich W,
Gerhardt S, Bacher A, Rohdich F: Structure of
2C-methyl-D-erythritol-2,4-cyclodiphosphate synthase involved in
mevalonate-independent biosynthesis of isoprenoids. J Mol Biol
2002, 316:79-88.
The X-ray structure of ygbB/ispF in the DOXP pathway.
80. Cunningham FX Jr, Lafond TP, Gantt E: Evidence of a role for LytB in
the nonmevalonate pathway of isoprenoid biosynthesis.
J Bacteriol 2000, 182:5841-5848.
81. Altincicek B, Kollas A, Eberl M, Wiesner J, Sanderbrand S, Hintz M,
Beck E, Jomaa H: LytB, a novel gene of the 2-C-methyl-D-erythritol
4-phosphate pathway of isoprenoid biosynthesis in
Escherichia coli. FEBS Lett 2001, 499:37-40.
85. Hecht S, Eisenreich W, Adam P, Amslinger S, Kis K, Bacher A,
Arigoni D, Rohdich F: Studies on the nonmevalonate pathway to
terpenes: the role of the GcpE (IspG) protein. Proc Natl Acad Sci
USA 2001, 98:14837-14842.
86. Lawrence CM, Rodwell VW, Stauffacher CV: Crystal structure of
Pseudomonas mevalonii HMG-CoA reductase at 3.0 angstrom
resolution. Science 1995, 268:1758-1762.
87. Tabernero L, Bochar DA, Rodwell VW, Stauffacher CV:
Substrate-induced closure of the flap domain in the ternary
complex structures provides insights into the mechanism of
catalysis by 3-hydroxy-3-methylglutaryl-CoA reductase. Proc Natl
Acad Sci USA 1999, 96:7167-7171.
88. Istvan ES, Palnitkar M, Buchanan SK, Deisenhofer J: Crystal
structure of the catalytic portion of human HMG-CoA reductase:
insights into regulation of activity and catalysis. EMBO J 2000,
19:819-830.
89. Istvan ES, Deisenhofer J: Structural mechanism for statin inhibition
of HMG-CoA reductase. Science 2001, 292:1160-1164.
90. Istvan ES, Deisenhofer J: The structure of the catalytic portion of
human HMG-CoA reductase. Biochim Biophys Acta 2000, 1529:9-18.
82. Campos N, Rodriguez-Concepcion M, Seemann M, Rohmer M,
Boronat A: Identification of gcpE as a novel gene of the
2-C-methyl-D-erythritol 4-phosphate pathway for isoprenoid
biosynthesis in Escherichia coli. FEBS Lett 2001, 488:170-173.
91. •• Martinez-Cruz LA, Dreyer MK, Boisvert DC, Yokota H,
Martinez-Chantar ML, Kim R, Kim SH: Crystal structure of MJ1247
protein from M. jannaschii at 2.0 Å resolution infers a molecular
function of 3-hexulose-6-phosphate isomerase. Structure 2002,
10:195-204.
A report of functional annotation via X-ray crystallography from the UC Berkeley
Structural Genomics Center.
83. Altincicek B, Kollas AK, Sanderbrand S, Wiesner J, Hintz M, Beck E,
Jomaa H: GcpE is involved in the 2-C-methyl-D-erythritol
4-phosphate pathway of isoprenoid biosynthesis in
Escherichia coli. J Bacteriol 2001, 183:2411-2416.
92. Hendrickson WA, Horton JR, LeMaster DM: Selenomethionyl
proteins produced for analysis by multiwavelength anomalous
diffraction (MAD): a vehicle for direct determination of
three-dimensional structure. EMBO J 1990, 9:1665-1672.
84. McAteer S, Coulson A, McLennan N, Masters M: The lytB gene of
Escherichia coli is essential and specifies a product needed for
isoprenoid biosynthesis. J Bacteriol 2001, 183:7403-7407.
93. Hendrickson W: Determination of macromolecular structures from
anomalous diffraction of synchrotron radiation. Science 1991,
254:51-58.