BIOINFORMATICS REVIEW
Vol. 16 no. 10 2000
Pages 851–864
Bioinorganic motifs: towards functional
classification of metalloproteins
Kirill Degtyarenko ∗
EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome
Campus, Hinxton, Cambridge CB10 1SD, UK
Received on December 21, 1999; revised on April 6, 2000; accepted on May 2, 2000
Abstract
The habitat of bioinorganic motifs (BIMs) is at the interface of biological inorganic chemistry and bioinformatics.
A BIM is defined as a common structural feature shared
by functionally related, but not necessarily homologous,
proteins, and consisting of the metal atom(s) and first
coordination shell ligands. BIMs appear to be suitable for
classification of metal centres at any level, from groups
of unrelated proteins with similar function to different
functional states of the same protein, and for description
of possible evolutionary relationships of metalloproteins.
However, they have not attracted wide attention from the
bioinformatics community. Although their presence is
appreciated, they are difficult to predict—therefore the
current ‘high-throughput’ initiatives are likely to miss or
ignore them altogether. The protein sequence databases
do not distinguish between proteins containing different
prosthetic groups (unless they have different sequences)
or between apo- and holoprotein. On the other hand,
the protein structure databases include data on ‘hetero
compounds’ of various origin but these data are often
inconsistent. A number of specialized databases dealing
with BIMs and attempts to classify them are reviewed.
Supplementary information: The additional bibliography
and list of Internet resources on bioinorganic chemistry
are available at http://www.ebi.ac.uk/~kirill/biometal/
Contact: [email protected]
∗ To whom correspondence should be addressed.
© Oxford University Press 2000
Abbreviations
BChl-a, bacteriochlorophyll a
BIM, bioinorganic motif
BOM, bioorganic motif
CCDC, Cambridge Crystallographic Data Centre
CSD, Cambridge Structural Database
D, dimensionality
EC, Enzyme Commission
EPR, electron paramagnetic resonance
FeMoco, iron–molybdenum cofactor
ICSD, Inorganic Crystal Structure Database
MDB, Metalloprotein site Database and Browser
Moco, molybdenum cofactor
MSD, Macromolecular Structure Database
NMR, nuclear magnetic resonance
PDB, Protein Data Bank
ppIX, protoporphyrin IX
Sec, L-selenocysteine
TPQ, 2,4,5-trihydroxyphenylalanine quinone
Introduction
The field of biological inorganic chemistry is multidisciplinary and perhaps lacks well defined boundaries
(Valentine and O’Halloran, 1999), but its main focus
undoubtedly is on the structure and function of metal-containing proteins. Metalloproteins participate in the
most important biochemical processes including respiration, nitrogen fixation and oxygenic photosynthesis.
Metalloenzymes were the first biological catalysts on
Earth. About one-third of all structurally characterized
proteins contain metals, while over 50% of all proteins
are estimated to be metalloproteins. This emphasizes the
crucial role of metal ions in stabilizing protein structure
(Jernigan et al., 1994).
Computational protein structure analysis is one of the
cornerstones of bioinformatics. It seems strange how little
attention the bioinformatics community has paid to metalloproteins and other complex proteins. (To get an idea,
try a PubMed search with the combination ‘bioinformatics’
+ ‘biological inorganic’ or ‘computational’ + ‘bioinorganic’.) It is particularly striking considering the remarkable efforts and progress made in computational inorganic
chemistry in the last few years [see, for example, Davidson
(2000)]. Why did this happen?
Historically, the main focus of bioinformatics has been
on computational analysis of biological macromolecules,
i.e. proteins and nucleic acids. The advent of high-throughput
sequencing methods provides bioinformaticians with
more and more raw sequence data to analyse. Since
proteins are biochemical entities, the lack of specific
biochemical data results in the immense information
gap between protein structure and function. A ‘seamless
transition between bioinformatics and chemoinformatics’
(Hann and Green, 1999) is needed to bridge this gap.
Although the term ‘chemoinformatics’ came from the
field of drug discovery, the methodologies employed
are equally applicable to fundamental chemistry, including bioinorganic chemistry. Most of the challenges
in chemoinformatics also have direct analogy with
bioinformatics (just replace ‘molecule’ with ‘biological
macromolecule’). There are some important differences
too. Many bioinformatics resources (databases, services,
programs) are freely accessible via the Internet. In contrast, almost all chemical resources are not. The lack of
free chemical databases has resulted in the absence of a
standard format for chemical data.
Here, I present my view of what chemical information
should be (or already is) available in terms of bioinorganic
motifs (BIMs). Many features of BIMs were already
summarized in our earlier paper (Degtyarenko et al., 1998)
but the basic definition was lacking. Firstly, I try to give
definitions and discuss the properties of BIMs. I then
review the databases dealing with metalloproteins and
BIMs, which are summarized in Table 1.
Some ideas discussed in this review were presented at
the Second International Nomenclature Workshop (White
et al., 1999).
Some definitions
This section contains definitions of terms directly related
to the concept of the bioinorganic motif. Definitions of
terms given in italics are summarized in the Glossary. The
Glossary of Terms in Bioinorganic Chemistry (de Bolster,
1997) contains definitions for approximately 400 terms of
relevance and I recommend it for further reference.
The same low-molecular-weight compound can play different roles depending on its chemical context in the
macromolecular environment (Lippard and Berg, 1994).
The term cofactor causes confusion since it has been
used instead of either prosthetic group or coenzyme, or
referred to both collectively. (Sometimes even proteins
such as calmodulin are referred to as ‘cofactors’ in
biochemical literature). Therefore, the use of this term
should be generally avoided, apart from certain well
established combinations, e.g. molybdenum cofactor and
iron–molybdenum cofactor. Both the apoprotein and the
prosthetic group are integral parts of a functional complex
protein. In contrast, the coenzyme is just another substrate
of an enzyme. It should be noted that the biological
functions of complex proteins (Table 2) may be other
than catalysis, while the term ‘coenzyme’ should be used
only in conjunction with enzymes. An example of two
distinct roles for the same compound within one protein
complex is provided by the photosynthetic reaction centre
from Rhodobacter sphaeroides: while one molecule of
ubiquinone-10 is tightly bound, another one exchanges
with the quinone pool of the membrane so that the
electrons are transported outside the protein (Deisenhofer
and Michel, 1992).
Both prosthetic groups and coenzymes may (or may
not) contain metal ions. Metal atoms per se can play such
different roles as prosthetic group; substrate, product or
inhibitor of an enzyme; stored or transported atom.
The word ligand has two distinct, and sometimes
directly opposite, meanings:
(i) In coordination chemistry, the atoms or chemical
groups bound to the central atom (usually a metal)
via a dative bond are called ligands. The donors of one
or more electron pairs to the central atom are called
monodentate or polydentate ligands, respectively. In
bioinorganic chemistry, the ligands are often derived
from macromolecules (polypeptides and nucleic
acids) and some of them are polydentate.
(ii) In biochemistry, any low-molecular-weight compound (including metal ions and metal compounds) bound to
the macromolecule may be referred to as a ligand, e.g.
in ‘ligand–receptor interactions’. The linkage, therefore, is not restricted to dative bonds. Interestingly,
the names such as LIGAND (Goto et al., 1998), ReLiBase (Hendlich, 1998) and LIGPLOT (Wallace et
al., 1995) all make use of this biochemical meaning.
I will use the term ligand only in its (i) sense. The
ligands surrounding the central atom are collectively
called the (first) coordination shell. A polypeptide can
be regarded as a polydentate ligand, but it is often
easier to think of the amino acid residues as separate
ligands. In some cases, however, the polydentate nature
of the polypeptide simply cannot be ignored. For example,
in nitrile hydratase (Figure 1a), the active centre iron
is coordinated to the polypeptide backbone as well as side
chains. The coordination geometry is octahedral, with the
iron atom and equatorial ligands that can be superimposed
on the plane of the iron and the four pyrrole nitrogens in
haem (Huang et al., 1997).
There are four classes of functional groups collectively
referred to as polypeptide-derived, or endogenous, ligands
(Holm et al., 1996):
• Side chain groups: amide (Asn, Gln), amino (Lys), carboxyl (Asp, Glu), hydroxyl (Ser, Thr), imidazole (His), phenol (Tyr), selenol (Sec), sulphide (Met) and thiol (Cys)
• Carbonyl and amide of main chain
• Amino group at N-terminus
• Carboxylate group at C-terminus
Ligands not derived from polypeptides are called exogenous (Holm et al., 1996). The exogenous ligands range
from simple inorganic entities (e.g. oxide, hydroxide, sulphide, water and other solvent-derived molecules, or such
physiological ligands as dioxygen or nitric oxide) to polydentate organic compounds, e.g. porphyrins or corrins.
Table 1. Databases relevant to this review
Database | URL | Description | Reference
Sequence
PIR | http://pir.georgetown.edu/pir/ | Protein sequences | Barker et al., 2000
PRINTS | http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/ | Protein sequence motifs (fingerprints) | Attwood et al., 2000
PROSITE | http://www.expasy.ch/prosite/ | Protein sequence motifs (regular expressions and profiles) | Hofmann et al., 1999
RESID | http://pir.georgetown.edu/resid/ | Post-translational modifications in proteins | Garavelli, 2000
SWISS-PROT | http://www.expasy.ch/sprot/ | Protein sequences | Bairoch and Apweiler, 2000
Function
BRENDA | http://www.brenda.uni-koeln.de/ | Physico-chemical properties of enzymes | Schomburg et al., 1999
ENZYME | http://www.expasy.ch/enzyme/ | Enzyme nomenclature | Bairoch, 2000
LIGAND | http://www.genome.ad.jp/dbget/ligand.html | Enzymes, reactions and compounds | Goto et al., 1998
Comprehensive structure
BioMagResBank | http://www.bmrb.wisc.edu/ | NMR data on proteins, peptides and nucleic acids | Seavey et al., 1991
CATH | http://www.biochem.ucl.ac.uk/bsm/cath/ | Protein structure classification | Orengo et al., 1998
CSD | http://cds.dl.ac.uk/cds/llcsd2.html | Crystal structures of organic and metalloorganic compounds | Allen and Hoy, 1998
HIC-Up | http://xray.bmc.uu.se/hicup/ | HET compounds from PDB | Kleywegt and Jones, 1998
ICSD | http://barns.ill.fr/dif/icsd/ ; http://www.fiz-karlsruhe.de/stn/Databases/icsd.html | Crystal structures of inorganic compounds | Bergerhoff, 1998
IMB Jena Image Library | http://www.imb-jena.de/IMAGE.html | Visualization and analysis of macromolecule structures | Reichert et al., 2000
MSD | http://msd.ebi.ac.uk/ | 3D and quaternary structures of biological macromolecules |
PDB | http://www.rcsb.org/pdb/ ; http://pdb-browsers.ebi.ac.uk/ | 3D structures of biological macromolecules | Berman et al., 2000
PDBsum | http://www.biochem.ucl.ac.uk/bsm/pdbsum/ | Summaries and structural analyses of PDB data files | Laskowski et al., 1997
ReLiBase | http://rcsb.rutgers.edu:8081/ | Protein–HET compound interactions from PDB | Hendlich, 1998
SCOP | http://scop.mrc-lmb.cam.ac.uk/scop/ | Protein structure classification | Hubbard et al., 1998
Bioinorganic structure
HAD | http://www.bmm.icnet.uk/had/ | Heavy-atom derivatives of protein crystals | Islam et al., 1998
MDB | http://metallo.scripps.edu/ | Metalloprotein sites derived from 3D structures |
PROCAT | http://www.biochem.ucl.ac.uk/bsm/PROCAT/PROCAT.html | Enzyme active site 3D templates | Wallace et al., 1997
Protein–haem interactions | http://www.biochem.ucl.ac.uk/bsm/proLig/ProtHaem.html | Protein–haem interactions in non-homologous haem proteins from PDB | Karmirantzou, 1998
PROMISE | http://bioinf.leeds.ac.uk/promise/ ; http://metallo.scripps.edu/PROMISE/ | Annotation of naturally occurring BIMs | Degtyarenko et al., 1998
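The side-chain donor groups of endogenous ligands listed in this section (after Holm et al., 1996) lend themselves to a simple lookup table. The following is a minimal sketch only; the names SIDE_CHAIN_GROUPS and side_chain_group are illustrative assumptions and do not come from any of the databases in Table 1.

```python
# Side-chain donor groups of polypeptide-derived (endogenous) ligands,
# transcribed from the list in the text (Holm et al., 1996).
SIDE_CHAIN_GROUPS = {
    "amide": ("Asn", "Gln"),
    "amino": ("Lys",),
    "carboxyl": ("Asp", "Glu"),
    "hydroxyl": ("Ser", "Thr"),
    "imidazole": ("His",),
    "phenol": ("Tyr",),
    "selenol": ("Sec",),
    "sulphide": ("Met",),
    "thiol": ("Cys",),
}

# Inverted index: three-letter residue code -> donor group name.
RESIDUE_TO_GROUP = {
    residue: group
    for group, residues in SIDE_CHAIN_GROUPS.items()
    for residue in residues
}


def side_chain_group(residue):
    """Return the side-chain donor group for a residue, or None if it has none."""
    return RESIDUE_TO_GROUP.get(residue)


print(side_chain_group("His"))  # imidazole
print(side_chain_group("Gly"))  # None
```

Residues absent from the table (Gly here) can still coordinate a metal through the main chain or the termini, which is why those classes are listed separately in the text.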
BIMs and BOMs
Let us suppose that there is only a limited set of
‘basic recurrent structures’ (Karlin, 1993) occurring in
natural metalloproteins; such structures will be called here
bioinorganic motifs (BIMs). Before giving a more formal
definition, let us consider the possible scenarios:
• In the simplest case, when a single metal atom
is bound to a protein (mononuclear centre), the BIM
includes the metal and its first coordination shell
ligands, of which at least three are endogenous ligands
(Figure 1a, b).
• When the metal centre is formed by more than one
metal atom (polynuclear centre), the BIM includes all the
metal atoms and their first coordination shell ligands,
at least one of which is bridging (Figure 1c, d).
• When the protein binds a complex of metal with an
exogenous polydentate organic compound, such as
porphyrin or pterin, the BIM includes the metal atom
and its first coordination shell ligands, of which at least
two belong to the organic compound (Figure 1e, f).
Therefore, the BIM may be defined as
a common structural feature of a class of functionally related, but not necessarily homologous, proteins, that includes the metal atom(s) and first coordination shell ligands (1)
Table 2. Functional classification of metalloproteins and roles of corresponding compounds (metals or metal complexes). The ‘permanently’ bound compounds
involved in electron/proton transfer, substrate activation and gas binding are usually referred to as prosthetic groups. The reactants involved in electron/proton
transfer are usually referred to as coenzymes but can also be considered as ‘transiently’ bound redox centres. One representative PDB entry for each example
(if available) is given
Function of protein | Role of compound | Compound binding mode | Example protein | Compound | PDB
Electron transfer | Electron transfer | Permanent | Cytochrome b5 | haem b | 1CYO
Electron transfer | Electron transfer | Permanent | Adrenodoxin | Fe2S2 | 1AYF
Electron transfer | Electron transfer | Permanent | Plastocyanin | Cu | 1AG6
Light harvesting | Excitation energy transfer | Permanent | Light-harvesting complex LH-II | BChl-a | 1KZU
Catalysis | Substrate activation | Permanent | Nitrile hydratase | Fe | 2AHJ
Catalysis | Substrate activation | Permanent | DMSO reductase | Moco | 1DMR
Catalysis | Substrate activation | Permanent | Nitrogenase MoFe protein | FeMoco | 3MIN
Catalysis | Substrate activation | Permanent | Manganese superoxide dismutase | Mn | 1VEW
Catalysis | Substrate activation | Permanent | Hydroxylamine oxidoreductase | haem P460 | 1FGJ
Catalysis | Electron transfer | Permanent | Hydroxylamine oxidoreductase | haems c | 1FGJ
Catalysis | Reactant | Transient | Manganese peroxidase | Mn | 1MNP
Catalysis | Reactant | Transient | Ferrochelatase | Fe, haem | 1DOZ
Catalysis and regulation | Switch of function | Permanent | Holoenzyme: aconitate hydrolase; apoenzyme: IRE-BP | Fe4S4 | 1FGH
Translocation | To be translocated | Transient | Copper-transporting ATPase | Cu+ | 2AW0
Catalysis or transport | Inhibitor | Transient | Ca2+-ATPase | La3+ | –
Catalysis or transport | Inhibitor | Transient | D-xylose isomerase | Ca2+ | 1XLM
Storage (uptake, binding and release) | Gas coordination | Permanent | Nitrophorin | haem (coordinates NO) | 4NP1
Storage (uptake, binding and release) | Gas coordination | Permanent | Haemocyanin | 2 Cu2+ (coordinates O2) | 1OXY
Storage (uptake, binding and release) | To be transported or stored | Transient | Haemophore HasA | haem | 1B2V
Storage (uptake, binding and release) | To be transported or stored | Transient | Metallothioneins | Cd2+, Hg2+, Pb2+, Tl+ | 4MT2
Storage (uptake, binding and release) | To be transported or stored | Transient | Lactoferrin | Fe | 1B1X
Storage (uptake, binding and release) | To be transported or stored | Transient | Bacterioferritin | Fe (in form of hydrated ferric phosphate) | 1BFR
Various | Structural | Permanent | Lignin peroxidase | Ca2+ | 1B82
Various | Structural | Permanent | Zinc finger | Zn2+ | 1AAY
Various | Structural | Permanent | Endonuclease III | Fe4S4 | 2ABK
For example, the similarity in active-site structure
of P450, chloroperoxidase and nitric oxide synthase
(Figure 1e), originally predicted from spectroscopic data
and later confirmed by crystallography, led to recognition
of these non-homologous enzymes as a distinctive class,
‘haem–thiolate proteins’ (NCIUB, 1991). A further
differentiation may be achieved by either considering the
second coordination shell (e.g. the amino acid residues
bound to metal through the solvent molecules may
be included) or by taking into account the chemistry,
orientation or conformation of the organic compound.
Interestingly, the different prosthetic groups (e.g. haem b
and haem a) may form similar BIMs and vice versa, the
same prosthetic group may form different BIMs (cf. haem
a and haem a3 centres in cytochrome c oxidase).
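The three scenarios behind definition (1) can be written down as a small data structure with a classification rule. This is a sketch under stated assumptions: the class and function names, the PDB-style donor labels and the `denticity` field are illustrative choices of mine, not part of the definition or of any database discussed here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Ligand:
    donor: str               # donor atom label (PDB-style), e.g. "SG" for Cys sulphur
    source: str              # residue or compound supplying the donor atom(s)
    endogenous: bool         # True if polypeptide-derived
    denticity: int = 1       # donor atoms this ligand contributes to the centre
    bridging: bool = False   # True if the ligand links two or more metal atoms


@dataclass(frozen=True)
class BIM:
    metals: Tuple[str, ...]
    ligands: Tuple[Ligand, ...]


def scenario(bim: BIM) -> str:
    """Assign a BIM to one of the three scenarios described in the text."""
    if len(bim.metals) > 1 and any(lig.bridging for lig in bim.ligands):
        return "polynuclear"
    # Assumption: an exogenous ligand with denticity >= 2 stands for a
    # polydentate organic compound such as a porphyrin or pterin.
    if any(not lig.endogenous and lig.denticity >= 2 for lig in bim.ligands):
        return "metal-exogenous compound"
    if len(bim.metals) == 1 and sum(lig.endogenous for lig in bim.ligands) >= 3:
        return "mononuclear"
    return "unclassified"


# Three of the centres shown in Figure 1 (panels a, d and e).
nitrile_hydratase = BIM(("Fe",), (
    Ligand("N", "Cys", True), Ligand("N", "Ser", True),
    Ligand("SG", "Cys", True), Ligand("SG", "Cys", True),
    Ligand("SG", "Cys", True), Ligand("N", "NO", False),
))
iron_sulphur = BIM(("Fe",) * 4, (
    tuple(Ligand("S", "sulphide", False, bridging=True) for _ in range(4))
    + tuple(Ligand("SG", "Cys", True) for _ in range(4))
))
haem_thiolate = BIM(("Fe",), (
    Ligand("N", "ppIX", False, denticity=4), Ligand("SG", "Cys", True),
))

for centre in (nitrile_hydratase, iron_sulphur, haem_thiolate):
    print(scenario(centre))
```

The rule is deliberately coarse: it looks only at the first coordination shell, so the finer distinctions mentioned above (second shell, orientation or conformation of the organic compound) would need extra fields.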
[Figure 1: structural formulae in three groups: mononuclear centres (a, b), polynuclear centres (c, d) and metal–exogenous compound centres (e, f).]
Fig. 1. Examples of bioinorganic motifs. (a) Mononuclear iron centre in photosensitive nitrile hydratase: [Fe(NCys)(NSer)(SγCys)3(NO)].
(b) Mononuclear magnesium centre in Ni–Fe hydrogenase: [Mg(OH2)3(Nε2His)(OεGlu)(OLeu)]. (c) Dinuclear (type III) copper centre in
oxyhaemocyanin: [{Cu(Nε2His)3}2(µ-O2)]. (d) Polynuclear iron–sulphur centre: [Fe4S4(SγCys)4]. (e) Haem iron coordination in haem–thiolate
proteins: [Fe(η4-ppIX)SγCys]. (f) Molybdenum centre in sulphite oxidase: [MoO(OH)(SγCys)(η2-molybdopterin)].
The concept of BIM may be further broadened by
incorporating model molecules mimicking the natural
metalloprotein function. These models may be either
completely synthetic coordination compounds, complexes of peptide ‘maquettes’ and prosthetic group, or
‘redesigned’ natural proteins with novel metal centres
(Karlin, 1993; Lu and Valentine, 1997). Note that the idea
of non-protein derived compounds containing BIM does
not contradict the above definition of BIM (1) given that
the coordination mode of metal in these compounds is,
at least qualitatively, identical to that in natural proteins.
On the other hand, the field of bioinorganic chemistry
is not confined to metalloproteins. Siderophores and
antibiotics such as bleomycin are examples of naturally
occurring non-protein metal-binding biological molecules
(Lippard and Berg, 1994) which have their functional
analogues in the protein world. In turn, the amazingly
diverse bioinorganic centres represent but a fraction of
all possible coordination compounds. For example,
only a few types of Fe–S clusters are found in biological
systems (cf. the variety of abiological Fe–S clusters; Ogino
et al., 1998).
Many complex proteins contain purely organic prosthetic groups, such as flavins, pterins, pheophytins,
quinones or carotenoids. By analogy with bioinorganic
motifs, the bioorganic motif (BOM) can be defined as
a common structural feature of a class of functionally related, but not necessarily homologous, proteins, that includes the organic prosthetic group and polypeptide-derived groups bonded to it (2)
However, there is an intrinsic difficulty in defining
BOM because of the heterogeneity of chemical bonds
(covalent, hydrogen, van der Waals) which may or may
not be involved in interaction between prosthetic group
and polypeptide. The concept of a coordination shell is
not applicable any longer.
Haem proteins are the most extensive group of metalloproteins which often display complex combinations of
BIMs and BOMs (Karmirantzou, 1998). The choice of
BIMs and BOMs by Nature results in a spectacular variety
of active site structures even within the same protein family. Therefore it is difficult to predict from the amino acid
sequence whether a BIM/BOM is conserved in the protein family. The situation becomes even more complex at
domain and/or subunit interfaces. While the protein three-dimensional (3D) structure tends to be better conserved
than the sequence, the quaternary structure may be less
conserved than the 3D structure. This means that the BIM
hosted at the subunit interface in the oligomeric protein
may not exist in a homologous monomeric protein even
if all the residues involved in metal coordination are conserved.
Metalloprotein and BIM evolution
A BIM by its very nature implies comparison between
members of a metalloprotein class. What about the evolutionary aspect of BIMs? Apparently, BIMs were among the
first structural features of proteins to emerge. The necessity
to catalyse reactions involving small inert molecules such
as CO2, CH4, H2 and N2 was a driving force of evolution under primitive conditions (Williams, 1997). Since
none of the common amino acids are able to perform any
useful catalytic redox chemistry (Bugg, 1997), the first oxidoreductases and electron-transfer proteins employed the
available metals, most importantly Mn, Fe, Ni. It is unlikely that the fold of these first protein molecules evolved
prior to its metal binding function as implied by molecular
recognition theory (Blalock, 1999) (although the question
undoubtedly deserves a review on its own). The advent of
dioxygen, the toxic by-product of oxygenic photosynthesis (the process itself involving a unique variety of metal
centres), had a dramatic effect. It changed the availability of metals (in particular increasing availability of Cu
and Zn) and brought into existence new redox enzymes,
involved in detoxification of reactive oxygen species and
oxidative energy production (Williams, 1997).
The diversity of metal sites in proteins (as, indeed, of
almost any feature in biology) is due to both divergent and
convergent evolution. Within a divergent family, the active
site structure is usually, but not always, conserved while
the pairwise sequence identity may be as low as 10%.
Homologous metalloproteins may have different BIMs
(and vice versa). Thus, the sequence homology, although
often resulting in the same fold, does not guarantee the
same active site structure. I suggest the use of sequence-independent BIMs to complement traditional evolutionary
trees based on sequence comparison.
Let us consider two haemoproteins having neither
sequence nor 3D similarity. What is homologous between
them? The answer is: the haem group. The known
porphyrin biosynthesis pathways are essentially the same;
the homologues of the corresponding enzymes are found
in different kingdoms of living organisms. ‘Unusual’
prosthetic groups seem to be restricted to taxa containing
the corresponding metabolic systems, and so on. The
use of a particular metal (or, indeed, particular organic
compounds) by certain organisms may be governed not
only by thermodynamics but also by the availability
of specific transport pathways and enzymes catalysing
formation of complexes.
Some complex proteins are unable to fold correctly
without the corresponding prosthetic groups. In other
cases, the specific proteins route the metal ion or
prosthetic group to the target proteins. For example,
holocytochrome c synthase (EC 4.4.1.17) is required to
attach the haem covalently to the apocytochrome c. It
appears that each copper protein is served by a specific
Cu(I) transporting protein, ‘copper chaperone’ (Harrison
et al., 2000). Thus, the functional structure of a protein is
not always derived from its amino acid sequence alone.
However sophisticated the software tools that might
appear tomorrow, they would not be sufficient to predict
the function if the biochemical context is not taken into
account.
Dimensionality of BIMs
One always has to distinguish between structure and
its representation. For instance, the covalent structure of
a polypeptide is conventionally represented as a one-dimensional (1D) amino acid sequence, although the
individual amino acids, like any organic compounds, may
be represented in two dimensions (2D) using structural
formulae or in 3D using the coordinates. As soon as
bonds other than peptide (e.g. disulphide) are taken into
account, the higher dimensions are required. Except for
glycine, all amino acids that occur in proteins are chiral,
therefore their unambiguous representation should include
stereochemistry. The dimensionality of stereochemical
structural formulae is between 2 and 3 and may be referred
to as 2.5D.
BIMs, like other coordination compounds, may be
represented in different ways (Figure 2). In contrast to
polymers, coordination compounds cannot (sensibly) be
represented in 1D. The formulae and systematic names
are too cumbersome to be useful. On the other hand, 3D
coordinates, if available, provide too much information
to be used for comparison or classification. I am in
favour of the 2.5D representation. At the ligand level, not
only the nature of the residue (e.g. His) but also the
interacting atom is indicated (e.g. Nδ1 or Nε2). Not only
the nature of a metal and its coordination number, but
also its stereochemistry and, to some extent, coordination
geometry should be included. Stereochemistry is more
important because it is easier to define and minor
adjustments of the polypeptide chain do not break it.
Because the coordination polyhedra in metalloproteins
are often distorted, it is difficult to choose between,
say, distorted octahedral and distorted trigonal prismatic
geometries. Exact angles and distances are not important
in 2.5D.
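To make the idea of dimensionality concrete, the sketch below derives two of the lower-dimensional text representations of one BIM (residues only, and residues with their interacting atoms) from a single ligand list. It uses the tyrosine 3-monooxygenase iron centre that serves as the example in Figure 2. The function names and the way water is split into donor "O" and remainder "H2" are my own assumptions, chosen only to reproduce the notation used in this review.

```python
from collections import Counter
from typing import List, NamedTuple


class Ligand(NamedTuple):
    donor: str        # interacting atom, e.g. "Nε2"
    group: str        # residue name, or the rest of the ligand (e.g. "H2" for water)
    endogenous: bool  # True if polypeptide-derived


def formula_0_5d(metal: str, ligands: List[Ligand]) -> str:
    """Metal plus the residues of its endogenous ligands (no donor atoms)."""
    residues = [lig.group for lig in ligands if lig.endogenous]
    parts = [f"({res}){n}" if n > 1 else res
             for res, n in Counter(residues).items()]
    return "[" + metal + "".join(parts) + "]"


def formula_1_5d(metal: str, ligands: List[Ligand]) -> str:
    """Metal plus every first-shell ligand, each labelled with its donor atom."""
    labelled = [lig.donor + lig.group for lig in ligands]
    parts = [f"({lab}){n if n > 1 else ''}"
             for lab, n in Counter(labelled).items()]
    return "[" + metal + "".join(parts) + "]"


TYROSINE_3_MONOOXYGENASE = [
    Ligand("Nε2", "His", True),
    Ligand("Nε2", "His", True),
    Ligand("Oε", "Glu", True),
    Ligand("O", "H2", False),   # water, written OH2
    Ligand("O", "H2", False),
]

print(formula_0_5d("Fe", TYROSINE_3_MONOOXYGENASE))  # [Fe(His)2Glu]
print(formula_1_5d("Fe", TYROSINE_3_MONOOXYGENASE))  # [Fe(Nε2His)2(OεGlu)(OH2)2]
```

Neither string carries stereochemistry, which is exactly the information that separates these representations from the 2.5D one advocated here.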
D | Database | Entry | Representation
0 | ENZYME | 1.14.16.2 | Iron
0.5 | – | – | [Fe(His)2Glu]
1 | PROSITE | PS00367 | P-D-x(2)-H-[DE]-[LI]-[LIVMF]-G-H-[LIVMC]-P
1.5 | – | – | [Fe(Nε2His)2(OεGlu)(OH2)2]
2 | – | – | [structural formula]
2.5 | PROMISE | AAAOH | [stereochemical structural formula]
3 | MDB | 1TOH | [3D structure]
Fig. 2. Different dimensionality (D) representations of a BIM for
a mononuclear iron enzyme tyrosine 3-monooxygenase. Note that
the PROSITE pattern (1D) contains only two of the three protein
ligands.
The 2.5D representation is attractive because it is intuitively understood. The problem, however, is that there are
no publicly available 2.5D databases of biological macromolecules. The information of value for bioinorganic
chemistry, therefore, should somehow be derived from
other resources such as 1D (sequence), 3D (crystal and
solution structure) and dimensionless (enzyme function)
databases (Table 1).
BIMs in sequence databases
One possible explanation of the gap between bioinorganic
chemistry and bioinformatics is that the concept of ‘hetero compounds’ is fundamentally alien to the sequence
databases. A typical sequence database entry includes the
core data (i.e. the sequence itself) and the annotation. The
core data consist of text drawn from a limited set of
characters. This limitation is both advantageous and disadvantageous. The progress of bioinformatics in large-scale
genome sequence analysis is due to the one-dimensional
nature of the core data!
A comprehensive protein database remains to be
created. My idea of the ‘ideal’ protein database entry is
one which contains all the qualitative and quantitative
information available for the particular protein. So-called
protein sequence databases are, in the best case, polypeptide sequence databases. With the majority of entries
originating from nucleic acid sequencing projects, their
core data, at most, represent the explicit translation of
genomic data. Moreover, the major sequence databases
still stick to the 20 amino acid vocabulary and even
the naturally occurring amino acid residues such as
L-selenocysteine (IUPAC-IUBMB, 1999) and N-formyl-L-methionine are not considered ‘standard’. Both the
low-level (disulphide bridges, prosthetic groups, covalent
modifications) and the high-level (domains, membrane
topology, quaternary structure, biological function) information exists only as annotation. Quantitative data
are not presented at all. In SWISS-PROT (Bairoch and
Apweiler, 2000), the reliability of ‘low-level’ annotations
varies depending on whether the property is known from
3D structure, from site-directed mutagenesis studies
or just from sequence comparison (Junker et al., 1999).
Since amino acid residues often have more than one
possible metal-binding mode, the annotations in sequence
databases are not informative enough for a bioinorganic
chemist.
The sequence motif databases, such as PROSITE (Hofmann et al., 1999) and PRINTS (Attwood et al., 2000),
provide information on the conserved amino acid residues
in protein families. As at 1 April 2000, 23% of PROSITE
entries (240 of 1035) and 27% of PRINTS entries (356 of
1310) correspond to metalloprotein or metal-binding protein families although the chosen motifs do not necessarily
include the actual metal-binding residues.
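A PROSITE pattern is a 1D representation that can be searched mechanically. The sketch below translates a simplified subset of the pattern syntax (literal residues, x, [..], {..} and repeat counts) into a Python regular expression and applies it to the PS00367 pattern quoted in Figure 2. The function name is hypothetical, the test peptide is synthetic and was made up purely for illustration, and anchors and other less common syntax elements are deliberately not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    regex = []
    for element in pattern.rstrip(".").split("-"):
        # Split each element into its core and an optional (n) or (n,m) repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":                                   # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # excluded residues
            core = "[^" + core[1:-1] + "]"
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        regex.append(core)
    return "".join(regex)


PS00367 = "P-D-x(2)-H-[DE]-[LI]-[LIVMF]-G-H-[LIVMC]-P"
regex = prosite_to_regex(PS00367)
print(regex)  # PD.{2}H[DE][LI][LIVMF]GH[LIVMC]P

# Synthetic peptide (not a real protein sequence) containing one match.
hit = re.search(regex, "AAPDKKHELLGHVPAA")
print(hit.start(), hit.group())  # 2 PDKKHELLGHVP
```

A match of this kind says only that the conserved residues are present; as noted above, it does not say which of them, if any, actually bind the metal.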
In an attempt to impose the restricted vocabulary and
standard syntax for feature annotation in the PIR Protein
Sequence Database, the RESID Database has been built
(Garavelli, 2000). RESID lists all the post-translational
modifications in proteins, including the covalently attached prosthetic groups. The entries include structural
formulae of the compounds (a step towards 2.5D!). Again,
RESID has limited use for the bioinorganic chemist.
One problem is that no distinction is made between
covalent (as in Cys-haem c) and coordination bonds [as
in Fe(Cys)4], while other prosthetic groups and structural
metal sites, though coordination-bonded to the
polypeptide, are not included [e.g. haem b, Zn(Cys)4].
Enzyme databases
The only comprehensive protein function databases in
the public domain are enzyme databases: ENZYME,
LIGAND and BRENDA. In all three databases, the
entry names correspond to the EC (Enzyme Commission)
numbers according to Enzyme Nomenclature (IUBMB,
1992). One has to bear in mind, however, that each EC
number defines a particular chemical reaction (or family
of reactions) but not the chemical nature of the particular
catalyst. Therefore it comes as no surprise that, apart
from pointers to the few macromolecular databases, both
ENZYME and LIGAND are notably devoid of protein-specific information. In fact, the chemical reactions in
these databases are essentially detached from catalysts,
making it possible (at least in principle) to assign the
same EC number to both the natural enzyme and, say,
the catalytic antibody, if they both catalyse the same
reaction. On the other hand, not all known biological
catalytic activities are assigned EC numbers or classified
as enzymatic at all. One of the most important biochemical
reactions on Earth is photosynthetic oxygen evolution:
2 H2O + light → O2 + 4 H+ + 4 e−
but it is not assigned an EC number and photosystem II is
not considered to be an enzyme.
The ENZYME database (Bairoch, 2000) is primarily
based on Enzyme Nomenclature (IUBMB, 1992). The
Cofactor field of ENZYME actually includes the names
of prosthetic groups. There are a few exceptions, such as
heme–thiolate (already a BIM) and selenium (which
belongs to selenocysteine that contributes to the active site
structure, sometimes also as part of a BIM).
LIGAND is a composite database of the ENZYME
and COMPOUND sections (Goto et al., 1998) and is
at the moment the most comprehensive biochemical
information resource freely available through the Web.
In a COMPOUND entry, the links to the pertinent ENZYME entries are included together with one of four
possible roles of a given compound in the enzymatic
catalysis: R, reactant; C, cofactor; I, inhibitor and E,
enhancer (activator). Many compounds have more than
one function. For instance, manganese (COMPOUND
C00034) functions as a reactant in the manganese peroxidase reaction (EC 1.11.1.13); as a cofactor of manganese
superoxide dismutase (EC 1.15.1.1); as an inhibitor of
lysyl aminopeptidase (EC 3.4.11.1) and as an activator of
peptidase A (EC 3.4.13.18).
The linkage of ENZYME entries back to the COMPOUND database is sometimes problematic since a single
COMPOUND entry may correspond to more than one
chemical compound. For instance, C00034 includes
both Mn(II) and Mn(III); therefore, one cannot use
COMPOUND accession numbers as a sole reference for
reactants in certain oxidation–reduction or isomerisation
reactions.
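The role codes and the C00034 ambiguity described above can be made concrete with a small sketch. The dictionary layout and the function names below are assumptions made for illustration only and do not reproduce the actual LIGAND file format; the EC numbers and roles are those quoted in the text.

```python
# Role codes used in COMPOUND entries, as listed in the text.
ROLES = {"R": "reactant", "C": "cofactor", "I": "inhibitor",
         "E": "enhancer (activator)"}

# Hypothetical in-memory view of the manganese entry; the layout is an
# assumption for illustration, not the real LIGAND schema.
COMPOUND_LINKS = {
    "C00034": {
        "species": ["Mn(II)", "Mn(III)"],  # one accession, two chemical species
        "enzymes": {
            "1.11.1.13": "R",
            "1.15.1.1": "C",
            "3.4.11.1": "I",
            "3.4.13.18": "E",
        },
    }
}


def roles_of(accession):
    """Map each linked EC number to the spelled-out role of the compound."""
    entry = COMPOUND_LINKS[accession]
    return {ec: ROLES[code] for ec, code in entry["enzymes"].items()}


def is_ambiguous(accession):
    """True if one accession number stands for more than one species."""
    return len(COMPOUND_LINKS[accession]["species"]) > 1


for ec, role in roles_of("C00034").items():
    print(ec, role)
print("ambiguous accession:", is_ambiguous("C00034"))
```

A redox reaction that converts Mn(II) to Mn(III) would list C00034 on both sides, which is exactly why the accession number alone cannot identify the reactants.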
BRENDA (Schomburg et al., 1999) is the most comprehensive freely available database on enzyme function. The
database entries are being consistently created and regularly updated by the curators using the original literature
rather than computer resources.
Two fields are of special interest for this review:
Cofactors/prosthetic groups and Metal ions/salts.
As one can expect, the information provided is heterogeneous.
Cofactors/prosthetic groups also include the coenzymes. The difference
between coenzymes and prosthetic groups is clear within
the database context: coenzymes are the substrates and
therefore are also found in the Reaction and Substrate
spectrum fields. Thus, one could assume that
{Cofactors/prosthetic groups} = {Prosthetic groups} ∪ {Coenzymes}
{Coenzymes} = {Cofactors/prosthetic groups} ∩ {Substrates}
However, the metal ions which act as prosthetic groups
are also found in another field, Metal ions/salts,
together with ‘effector’ metals. Metal ions can also be
found in Inhibitor field. All in all, the existing data
structure in BRENDA is better suited for the human reader
than for a computer program.
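The two set relations given above for the Cofactors/prosthetic groups field translate directly into set operations. The following is a minimal sketch; the function name and the example entry are invented for illustration and are not taken from BRENDA, and the sketch assumes that coenzymes and prosthetic groups do not overlap.

```python
def split_cofactor_field(cofactors, substrates):
    """Split a Cofactors/prosthetic groups field into its two parts.

    Coenzymes are the members that also occur among the substrates;
    whatever remains is treated as a prosthetic group (assumption: the
    two subsets are disjoint).
    """
    cofactors = set(cofactors)
    coenzymes = cofactors & set(substrates)
    prosthetic_groups = cofactors - coenzymes
    return prosthetic_groups, coenzymes


# Hypothetical entry, for illustration only.
cofactor_field = {"NAD+", "FAD"}
substrate_field = {"ethanol", "NAD+"}

prosthetic, coenzymes = split_cofactor_field(cofactor_field, substrate_field)
print("prosthetic groups:", sorted(prosthetic))
print("coenzymes:", sorted(coenzymes))
```

The sketch breaks down for exactly the reason given in the text: a metal ion acting as a prosthetic group is stored in a different field, so it never enters the cofactor set in the first place.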
Since all three databases are based on Enzyme Nomenclature, the same entry may contain information on structurally different proteins which catalyse the same or similar reactions. For example, LIGAND lists copper, zinc,
manganese and iron as cofactors of superoxide dismutases
(EC 1.15.1.1) while these enzymes are known to contain
either Cu and Zn, or Fe, or Mn. This information can be
found in a COMMENT field but it requires a human interpreter. To add to the confusion, BRENDA also includes additional reactions catalysed by the same protein that catalyses the ‘main’ reaction. For instance, the
entry for EC 1.1.1.1 (alcohol dehydrogenase) lists peroxidase and esterase activities which should not be formally
classified as EC 1.1.1.1. The polynuclear inorganic prosthetic groups are not treated as separate compounds. There
are several types of Fe–S clusters but all of them are listed
either as iron–sulfur (ENZYME) or iron and sulfur
(LIGAND). None of the enzyme databases includes enzymes with partially assigned EC numbers.
3D databases
‘Structural biology’ is the ’90s name for an older branch
of biophysics dealing with determination of the three-dimensional (3D) structure of macromolecules. The
protein crystallographers were among the first to realize
the need to establish a computer database to store 3D
structures (Meyer, 1997), well before the genomic era.
Of the about 10 000 proteins of known 3D structure at
least half contain metal ions or other non-polypeptide
derived groups bound in their active sites, such groups
often themselves containing metal ions (Table 3). It is
worth remembering that both the first globular proteins
(Perutz and Kendrew, 1962) and the first membrane
protein (Deisenhofer et al., 1988) to be solved by X-ray
crystallography were metalloproteins!
Table 3. Biologically important metals in the Periodic Table. The numbers of structurally characterized metal-binding proteins in MDB (version 1.4) are
indicated by the lower figure. Note that metal–protein complexes found in PDB are not necessarily the native metalloproteins
It should be noted however that the methods used in
macromolecular structure determination (i.e. crystallography and NMR) do not yield high quality structures of
small compounds. Indeed, while there is a growing number of structures determined at 1.2 Å resolution or better
(Longhi et al., 1998), the ‘high-resolution’ in structural
biology usually means that macromolecular structure
is solved at <2 Å resolution. Although this resolution
is more than enough for many biological applications,
it often does not provide sufficient information to the
bioinorganic chemist. The limitations of crystallography
for metalloprotein active centres account for uncertainty
or errors in the definition of ligand set; bond lengths and
angles; stereochemistry; protonation state of ligands; and
oxidation state of transition metals (Holm et al., 1996).
Nevertheless, crystallography is the most informative
physical method for protein structure determination
and 3D databases store the structural data in more or
less standard format, while spectroscopic databases of
metalloproteins simply do not exist.
The Protein Data Bank (PDB) is an archive of 3D
structures (Berman et al., 2000). A number of secondary databases have been derived from the PDB. The
Macromolecular Structure Database (MSD) project aims
to represent biological entities incorporating all levels
of structural organization, from covalent to quaternary
(rather than, say, crystallographic asymmetric units).
Therefore, the protein subset of MSD will be a real, if not
comprehensive, protein database.
A great deal of effort has been invested in the hierarchical classification of proteins. In such classification
schemes as SCOP (Hubbard et al., 1998) and CATH
(Orengo et al., 1998), the overall fold is the feature
conserved along every hierarchical branch; the functional
properties, including details of small compound binding,
may be highly specific for individual proteins. A number
of tools to search for functional sites in 3D structures
have been developed (Orengo et al., 1999), but there is
no comprehensive database of such sites. What should be
used for functional classification of metalloproteins?
Hetero compounds in PDB
In PDB, any chemical entity other than one of the 20
standard amino acid residues in a polypeptide or one of
the standard nucleotides in a nucleic acid is referred to as a
‘hetero compound’ (HET field). Given the great chemical
diversity of small molecules and relatively low resolution
of macromolecular structures, it is not surprising that the
data available for HET compounds ‘are generally in a
sorry state’ (Kleywegt and Jones, 1998).
Two main problems make HET compounds a poor
basis for metalloprotein classification: heterogeneity and
inconsistency. Indeed, HET compounds include:
• Water
• Metal ions
• Other exogenous inorganic compounds (e.g. CN−, Cl−, O2, ·NO, NH4+, HSO3−)
• Exogenous organic compounds (e.g. substrates, products, inhibitors)
• Prosthetic groups
• ‘Non-standard’ amino acids (e.g. Sec)
• Modified amino acids (e.g. TPQ)
• Modifiers (e.g. N-acetyl-D-glucosamine, myristoyl)
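The heterogeneity listed above becomes visible as soon as HETATM records are read from a PDB-format file. The sketch below groups HETATM records by residue and assigns a deliberately coarse class. The fixed column positions follow the PDB file format; the three-way classification, the short metal list and the truncated sample records are assumptions made for illustration only.

```python
from collections import defaultdict

WATER_NAMES = {"HOH", "DOD"}
# Assumed shortlist of metal residue names, for illustration only.
METAL_NAMES = {"CA", "CU", "FE", "MG", "MN", "ZN"}


def het_groups(pdb_lines):
    """Group HETATM atom names by (residue name, chain, residue number)."""
    groups = defaultdict(list)
    for line in pdb_lines:
        if line.startswith("HETATM"):
            atom_name = line[12:16].strip()   # columns 13-16
            res_name = line[17:20].strip()    # columns 18-20
            chain = line[21]                  # column 22
            res_seq = line[22:26].strip()     # columns 23-26
            groups[(res_name, chain, res_seq)].append(atom_name)
    return groups


def coarse_class(res_name, atom_names):
    """Very rough classification; real HET groups need chemical knowledge."""
    if res_name in WATER_NAMES:
        return "water"
    if len(atom_names) == 1 and res_name in METAL_NAMES:
        return "metal ion"
    return "other hetero compound"


# Invented records, truncated after the residue number for brevity.
SAMPLE = [
    "HETATM 1001 FE    FE A 201",
    "HETATM 1002  O   HOH A 301",
    "HETATM 1003 FE   HEM A 154",
    "HETATM 1004  NA  HEM A 154",
]

for (res_name, chain, res_seq), names in het_groups(SAMPLE).items():
    print(res_name, chain, res_seq, coarse_class(res_name, names))
```

Everything that is neither water nor a lone metal ion falls into a single residual class here, which mirrors the point made in the text: the HET field by itself does not distinguish a prosthetic group from an inhibitor or a modified amino acid.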
To illustrate the second problem, let us consider the large
group of diiron–carboxylate proteins, which contain an Fe–O–Fe unit in the active site. This group includes such proteins as ribonucleotide reductase, methane monooxygenase, ferritins, haemerythrin and purple acid phosphatase.
In different PDB entries, the HET compounds are:
• FE (iron, Fe2+ or Fe3+ )
• FEO (µ-oxo-diiron, Fe–O–Fe),
• FEA (monoazido-µ-oxo-diiron, N3 –Fe–O–Fe)
• MN (manganese, Mn2+ )
In the first case, FE cannot be distinguished from any other
iron ion, whether it is a mononuclear iron, iron–sulphur
cluster or haem. In the case of FEA, the azide anion (N3−) represents an ‘external’ ligand as opposed to the intrinsic Fe–O–Fe
group. N3− binds to haemerythrin (and myohaemerythrin) at the site normally occupied by O2 in oxyhaemerythrin (oxymyohaemerythrin). The last case is found in the
structure of manganese-substituted bacterioferritin (PDB
1BFR). It is assumed that the structure of the metal-binding
site of Mn-substituted bacterioferritin is similar to that of
native protein. However, such additional information cannot be deduced from the existing 3D model in the PDB
format and may not always be found in comments. Such
important information as ligand protonation and metal oxidation states is often missing from the PDB entries and
sometimes also from the original articles. Not only are different
HET names used in PDB for the same compound (like
HEM and HEC for haem c), but the same HET names
are also used for different compounds (e.g. HEM may be either
haem b or haem c; HEC is either haem c or hydroxyethyl,
etc.).
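The inconsistency just described is a many-to-many relation between HET names and chemical compounds. A minimal sketch follows, using only the examples quoted above; the dictionary and the function names are assumptions made for illustration and do not correspond to any PDB dictionary file.

```python
# HET name -> compounds it has been used for (examples from the text only).
HET_TO_COMPOUNDS = {
    "FE": {"iron ion"},
    "FEO": {"mu-oxo-diiron"},
    "HEM": {"haem b", "haem c"},
    "HEC": {"haem c", "hydroxyethyl"},
}


def invert(mapping):
    """Build the reverse relation: compound -> set of HET names."""
    reverse = {}
    for het_name, compounds in mapping.items():
        for compound in compounds:
            reverse.setdefault(compound, set()).add(het_name)
    return reverse


def ambiguous_names(mapping):
    """HET names that stand for more than one compound."""
    return sorted(name for name, comps in mapping.items() if len(comps) > 1)


def compounds_with_synonyms(mapping):
    """Compounds that appear under more than one HET name."""
    return sorted(c for c, names in invert(mapping).items() if len(names) > 1)


print("ambiguous HET names:", ambiguous_names(HET_TO_COMPOUNDS))
print("compounds with several names:", compounds_with_synonyms(HET_TO_COMPOUNDS))
```

A classification built on HET names would need both lists to be empty; with the four examples above, neither is.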
PDB/HET derived resources
Since the databases derived from PDB use the HET
compounds as defined in PDB, they inherit most of the
problems discussed above.
HIC-Up (Hetero-compound Information Centre—
Uppsala) is a resource containing co-ordinates, dictionaries for a number of software packages (CNS, X-PLOR,
TNT and O), and other relevant information for the HET
compounds from PDB (Kleywegt and Jones, 1998). ReLiBase (Hendlich, 1998) is a complete data management
system comprising the object-oriented database handling
protein–hetero compound structures derived from PDB,
various query tools and web interface. The 3D structures of HET compounds in ReLiBase are converted to
‘two-dimensional’ (2D) chemical structures. This feature
allows the 2D substructure and 2D similarity search of
the database. However, the 2D structure does not always
represent the correct chemical structure due to protonation
and bond type uncertainty intrinsic to the PDB (which
keeps only geometric data). The protons are often missing
from the PDB entries, and the algorithm that generates
the 2D structure tends to fill the ‘free’ valences by bonds
to implicit hydrogens. ReLiBase is commercialized by
Cambridge Crystallographic Data Centre (CCDC) but
also remains freely accessible via WWW.
PDBsum (Laskowski et al., 1997) gives an at-a-glance
overview of the contents of each PDB entry in terms of
numbers of protein chains, HET compounds (including
metal ions), etc. Among other goodies, PDBsum offers
automatically produced LIGPLOT (Wallace et al., 1995)
maps of compound–protein interactions. ‘Compounds’
here are not only separate HET groups but also HET–HET
complexes (e.g. HEM-OXY). Unfortunately, such maps are
not available yet for single metal ion–protein interactions.
The IMB Jena Image Library (Reichert et al., 2000) also
includes a database of HET compounds and allows for
various searches, including very handy element search via
the Periodic Table of Elements.
Chemical 3D structure databases
The Cambridge Structural Database (CSD) is one of the
largest chemical resources currently available and the
largest crystallography database, containing ∼190 000
entries (Allen and Hoy, 1998). It comprises a comprehensive archive of bibliographic, chemical (2D), molecular
structure (3D) and crystal structure (3D) data for organic,
inorganic and organometallic compounds. Another important 3D chemistry resource is the Inorganic Crystal
Structure Database (ICSD) (Bergerhoff, 1998). It contains
complete structural information for inorganic compounds
abstracted from original journal articles, including compound name, molecular formula, crystal symmetry group,
unit cell parameters, atomic coordinates, and temperature
factors. IsoStar (Cole et al., 1998) incorporates experimental information on non-covalent interactions derived
from the CSD and the PDB as well as molecular orbital
calculations. IsoStar has great potential to address the
protein–small compound (e.g. protein–drug) interactions.
CSD and IsoStar are commercially distributed by CCDC;
ICSD is distributed by FIZ Karlsruhe. In the UK, access
to CSD and ICSD is free of charge to academic users of
Chemical Database Service (Fletcher et al., 1996).
I can name two reasons why a bioinorganic chemist
should give special attention to CSD. First, the resolution
generally achieved in crystallography of small compounds
is significantly better than that of macromolecules. Therefore, the high quality structures of small compounds can
be used to refine or validate the metalloprotein structures.
Second, CSD contains a number of structures of synthetic
compounds mimicking the metalloprotein active centres.
These structures could be considered to be BIMs!
Harding (1999) used CSD to systematically extract geometrical data relevant to metalloproteins, using specific
2D ‘bioinorganic’ queries. The queries included the six
most common metals (Ca, Cu, Fe, Mg, Mn and Zn) and
six classes of ligands (alcohols, carboxylates, imidazoles,
phenolates, thiolates and water). Where appropriate,
the bond type and length restrictions were applied. The
method could be easily extended to use more complex
queries specific for the metal environment in proteins (i.e.
BIMs).
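The query space described above (six metals and six ligand classes) can be enumerated mechanically. The sketch below only builds query descriptions; it does not call any CSD software, the field names are hypothetical, and whether every metal–ligand pair was actually queried by Harding is not stated here.

```python
from itertools import product

METALS = ["Ca", "Cu", "Fe", "Mg", "Mn", "Zn"]
LIGAND_CLASSES = ["alcohol", "carboxylate", "imidazole",
                  "phenolate", "thiolate", "water"]


def build_queries(metals=METALS, ligand_classes=LIGAND_CLASSES):
    """Return one query description per metal/ligand-class pair.

    'max_bond_length' is a hypothetical placeholder for the bond length
    restrictions mentioned in the text; None means no restriction.
    """
    return [
        {"metal": metal, "ligand_class": ligand, "max_bond_length": None}
        for metal, ligand in product(metals, ligand_classes)
    ]


queries = build_queries()
print(len(queries), "metal/ligand-class pairs")
print(queries[0])
```

Extending the approach towards BIM-specific queries would amount to replacing the single ligand class with a full first coordination shell, which makes the query space grow combinatorially.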
Need for biophysical databases
Apart from crystallography and NMR, there is a whole
arsenal of other biophysical methods that can help reveal
the structure of metalloprotein active centres even in
the absence of 3D structures. BioMagResBank (Seavey
et al., 1991) contains chemical shift data derived from
∼1530 proteins and peptides, including those for HET
compounds. Unfortunately, there are no other publicly
available databases containing the spectroscopic information. Similarly, the functional properties, most importantly
the midpoint potentials of redox centres, await the bioinformaticians’ attention. The lack of standard data formats
poses a problem—but one not serious enough to prevent
the rapid colonization of this ‘ecological niche’ in the
very near future.
BIM databases?
The Scripps Research Institute’s Metalloprotein site
Database and Browser (MDB) contains geometrical and
functional information on metal sites derived from 3D
structures that allows the classification and search of
particular combinations of site characteristics. The current
release (MDB 1.4) consists of two databases:
• The ‘raw’ database is created by automatic recognition and extraction of quantitative information on
metal sites from protein subsets of PDB. This is a
comprehensive database, containing information on
about 4100 proteins.
• The ‘edited’ database includes 32 sites from representative Ca, Cu, Fe, Mn and Zn proteins and
contains manually added information, such as function (structural, storage, electron transfer, O2 binding
or catalytic).
The Java-based viewer provides an interactive query tool
to both ‘raw’ and ‘edited’ databases. The ‘raw’ database
could also be searched using either HTML forms or
an SQL interface. Using these tools, complex queries
involving metal identity, coordination geometry, number
of ligands, type and number of protein-derived ligands,
distance cutoff criteria, etc. can be made.
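A query of the kind described above (metal identity, number and type of ligands, distance cutoff) can be sketched in a few lines. This is not the MDB implementation; the Atom class, the function names, the 2.8 Å default cutoff and the idealized coordinates are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    element: str   # e.g. "ZN", "S"
    residue: str   # e.g. "CYS"; empty for a free ion
    xyz: tuple     # Cartesian coordinates in angstroms


def first_shell(metal, atoms, cutoff=2.8):
    """Atoms within `cutoff` angstroms of the metal (assumed threshold)."""
    return [a for a in atoms
            if a is not metal and math.dist(metal.xyz, a.xyz) <= cutoff]


def site_matches(metal, atoms, element, n_ligands, residue=None, cutoff=2.8):
    """True if the metal identity, ligand count and ligand type all agree."""
    shell = first_shell(metal, atoms, cutoff)
    if metal.element != element or len(shell) != n_ligands:
        return False
    return residue is None or all(a.residue == residue for a in shell)


# Idealized tetrahedral Zn(Cys)4 site: four S atoms 2.3 angstroms from Zn,
# plus one distant water oxygen that must not be counted. Invented data.
s = 2.3 / math.sqrt(3)
zn = Atom("ZN", "", (0.0, 0.0, 0.0))
atoms = [
    zn,
    Atom("S", "CYS", (s, s, s)),
    Atom("S", "CYS", (s, -s, -s)),
    Atom("S", "CYS", (-s, s, -s)),
    Atom("S", "CYS", (-s, -s, s)),
    Atom("O", "HOH", (5.0, 0.0, 0.0)),
]

print(site_matches(zn, atoms, "ZN", 4, residue="CYS"))
```

The result depends strongly on the cutoff, which is one reason why automatically extracted sites need manual curation before they can be treated as BIMs.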
MDB allows the interactive visualization of the metal
centre and ligands contributing to the metal atom’s first
coordination shell. It is not possible to visualize organic
compound–protein interactions, such as in haem proteins.
Maria Karmirantzou and Janet Thornton have analysed
321 haemoproteins from the PDB comprising 13 nonhomologous families and have created the specialized
Protein–Haem Interactions database (Karmirantzou,
1998). Conformational analysis of haem included torsion
angles, planarity and accessibility of the haem group.
Analysis of polypeptide–haem interactions included
amino acid propensities, constraints upon the haem conformation and secondary structure at the protein–haem
interface. The results were made available on the Web but
the database has not been updated since 1998.
PROMISE was intended to be a comprehensive information source on naturally occurring BIMs (Degtyarenko
et al., 1998). Its focus is on protein active site structure
and on the relationships between a polypeptide and a prosthetic centre. BIMs were used as a basis for classification of metalloproteins, as both alternative and complementary to those employed in other ‘secondary’ protein
databases. PROMISE presents the relevant sequence, 3D
structural and physico-chemical information in a hierarchically organized collection of HTML documents. Unfortunately PROMISE was discontinued in 1999 due to lack
of funding.
In contrast to other databases reviewed, HAD (HeavyAtom Databank) deals exclusively with non-natural
metal-binding sites in proteins (Islam et al., 1998).
The crystallographic methods of multiple isomorphous
replacement and anomalous scattering use high quality
heavy-atom derivatives of protein crystals. Ironically,
it was the results of such analyses, i.e. models of the
native proteins, that were deposited to the PDB while
the structures of heavy-atom derivatives were discarded.
The ‘heavy atoms’ in HAD are defined as those with an
atomic mass greater than that of rubidium. HAD contains several
file types, including coordinate files for the heavy-atom
positions in PDB-compatible format, crystallization conditions files, compound data files and reference data files.
For this review, the pairs of metalloprotein data files are
of prime interest. One file of the pair contains information
on the metalloprotein derivative with the native metal replaced,
with details of type, quantity, function, coordination
geometry, distances and angles between the substituted
heavy atom and protein ligands. The second file has the
analogous data for the native metalloprotein. Thus, HAD
may be used to analyse the conformational change at the
metal-binding site upon replacement and reveal the most
‘native-like’ heavy atom substituents.
PROCAT is a database of enzyme active site 3D
templates created using the TESS (TEmplate Search
and Superposition) algorithm (Wallace et al., 1997). The
entries are classified according to the Enzyme Nomenclature. The templates include catalytically important and
spatially conserved atoms or amino acid residues. The
templates may be viewed as 3D analogues of PROSITE or
PRINTS patterns which could be used to search the PDB
for similar sites. Since the same EC class can include
non-homologous protein families, the case of entries
containing more than one template should be envisaged.
In reality, PROCAT does not cover even those EC classes
which are well represented in the PDB. As so often
happens, the progress here is limited by ‘people-ware’
(Hann and Green, 1999) and not by an algorithm.
Note however that TESS may be used to search the
PDB for any user defined combination of atoms in space,
i.e. it is not restricted to enzymes, polypeptides and
macromolecules in general. Likewise, the inconsistencies
of HET compounds do not pose a problem as far as
the atomic model of the compound is correct. Therefore,
TESS appears to be an ideal method to build a database of
3D BIMs!
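The idea of searching a structure for a user-defined combination of atoms in space can be illustrated with a much simpler technique than the one TESS uses. The sketch below is a brute-force comparison of inter-atomic distances, not geometric hashing; the function names, the 0.3 Å tolerance and all coordinates are assumptions made for illustration. It is practical only for very small templates, and because it compares distances alone it cannot tell a site from its mirror image.

```python
import itertools
import math


def pairwise_distances(points):
    """Distances between all pairs of points, in a fixed order."""
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]


def find_matches(template, atoms, tol=0.3):
    """Find atom index tuples whose labels and geometry fit the template.

    `template` and `atoms` are lists of (label, (x, y, z)) tuples.
    """
    labels = [label for label, _ in template]
    target = pairwise_distances([xyz for _, xyz in template])
    hits = []
    for combo in itertools.permutations(range(len(atoms)), len(template)):
        if any(atoms[i][0] != label for i, label in zip(combo, labels)):
            continue
        found = pairwise_distances([atoms[i][1] for i in combo])
        if all(abs(d - t) <= tol for d, t in zip(found, target)):
            hits.append(combo)
    return hits


# Hypothetical three-atom template and a translated copy with one decoy.
template = [("ZN", (0.0, 0.0, 0.0)),
            ("SG", (2.3, 0.0, 0.0)),
            ("NE2", (0.0, 2.1, 0.0))]
structure = [("ZN", (10.0, 10.0, 10.0)),
             ("SG", (12.3, 10.0, 10.0)),
             ("NE2", (10.0, 12.1, 10.0)),
             ("SG", (30.0, 30.0, 30.0))]

print(find_matches(template, structure))
```

Because only atom labels and coordinates are used, the search is indifferent to how the atoms are grouped into HET compounds or residues, which is the property that makes a template approach attractive for BIMs.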
Conclusion
Like other motifs in bioinformatics, BIMs reflect the shared features of a class of proteins. Thus BIMs may be used
both for classification and for the search of other functionally related proteins. Although BIMs inhabit the major biological and chemical databases, they are not defined
in any consistent way. Why, in spite of an abundance of
experimental data on metalloprotein structure and function, is there no comprehensive database of BIMs? The
reasons are the intrinsic complexity of the data and the lack of a
data model capable of handling such complexity; insufficient interoperability of existing biological databases; and a lack
of standards (including terminology) for biochemical data
in general. No reliable algorithm exists to predict BIMs
from sequence data, so there are no ‘easy’ ways in which
to populate the database. Compiling such a database is a
complex and ambitious task requiring decades or hundreds
of expert man-years, that could only be achieved by the
close cooperation of international scientific communities.
However, once created, the database could be used to yield
knowledge on metal–polypeptide interactions.
Bioinformatics of today deals primarily with the structure of biological macromolecules. In the next century
it should be extended to encompass biochemical and
biophysical (i.e. functional) data. In an ideal scenario,
development of free biochemical, biophysical and BIM
databases will go hand in hand with standardization
activities, such as creation of controlled vocabularies
of gene function and biological processes (White et al.,
1999).
Acknowledgements
I am indebted to Prof V.Yu.Uvarov, who introduced me to
the fields of bioinorganic chemistry and bioinformatics. I
thank Katalin Nadassy, Gillian Adams and my anonymous
reviewers for their helpful comments and suggestions on
the manuscript.
Glossary
Apoprotein, the polypeptide component of a complex
protein.
Bioinorganic motif, a common structural feature shared
by functionally related proteins, consisting of the metal
atom(s) and first coordination shell ligands.
Bridging ligand, an atom that donates two or more
electron pairs to different central atoms in polynuclear
coordination entity; indicated by the symbol µ.
Coenzyme, a non-polypeptide compound involved in enzymatic reactions as a reactant capable of donating or accepting
chemical groups or electrons.
Coordination geometry, arrangement of the ligands
around the central atom.
Coordination number, the number of σ -bonds between the
central atom and ligands.
Coordination shell (first coordination shell), the collective
name for the ligands surrounding the central atom(s).
Diiron–carboxylate proteins, a group of proteins characterized by binuclear iron centre bridged by carboxylate
group(s) of Asp or Glu and oxide/hydroxide group(s).
Endogenous, polypeptide-derived.
Enzyme, a protein catalyst.
Exogenous, not derived from polypeptide.
Haem, an iron–porphyrin complex. Natural haems (a, b,
c, d, d1 , o) differ by substituents at various porphyrin
positions.
Holoprotein, the functional complex protein.
Homology, common evolutionary ancestry.
Iron–molybdenum cofactor (FeMoco), the prosthetic
group of nitrogenase MoFe protein.
Ligand (in a coordination entity), one of the atoms or
chemical groups bound to the metal atom via a dative
bond.
Midpoint potential (standard redox potential; E0 or Em ),
the redox potential of a system containing one mole each
of the reduced and oxidized form of a compound. In
biological systems, the E0 at pH 7 (E0′ or Em,7) is used
as the reference.
Molybdenum cofactor (Moco), the metal (Mo or W) complex of molybdopterin. Moco functions as the prosthetic
group of a number of oxidoreductases.
Monodentate ligand, the compound that can donate one
electron pair to central atom in coordination entity.
Mononuclear, containing one metal atom within a coordination shell.
Photosystem II (PSII), a multi-subunit transmembrane
protein complex in plants, algae and cyanobacteria that
uses light energy to oxidise water to dioxygen.
Polydentate ligand, the compound that can donate more
than one electron pair to central atom in coordination
entity. The number n of ligating atoms is represented by
the symbol ηn .
Polynuclear, containing more than one metal atom within
a single coordination shell.
Porphyrin, a macrocycle containing four pyrrole rings
linked by single carbon atom bridges. Naturally occurring
porphyrins form tight complexes with metal ions, such as
Fe (haems), Mg (chlorophylls) and Ni (F430).
Prosthetic group, a non-polypeptide compound that
conveys specific biological function to holoprotein. Single
metal ions, inorganic compounds, organic compounds and
metal–organic complexes all may function as prosthetic
groups.
Redox, abbreviation of oxidation–reduction.
Redox potential (E), a measure of the tendency of a redox
system to donate or accept electrons.
Siderophores, small organic molecules involved in the
specific uptake of iron in bacteria.
References
Allen,F.H. and Hoy,V.J. (1998) Cambridge Structural Database. In
von Ragué Schleyer,P. (ed.), Encyclopedia of Computational
Chemistry. John Wiley & Sons, Chichester, pp. 155–167.
Attwood,T.K., Croning,M.D.R., Flower,D.R., Lewis,A.P., Mabey,J.E.,
Scordis,P., Selley,J.N. and Wright,W. (2000) PRINTS-S: the database
formerly known as PRINTS. Nucleic Acids Res., 28, 225–227. URL =
http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/
Bairoch,A. (2000) The ENZYME database in 2000. Nucleic Acids
Res., 28, 304–305. URL = http://www.expasy.ch/enzyme/
Bairoch,A. and Apweiler,R. (2000) The SWISS-PROT protein
sequence database and its supplement TrEMBL in 2000. Nucleic
Acids Res., 28, 45–48. URL = http://www.expasy.ch/sprot/
Barker,W.C., Garavelli,J.S., Huang,H., McGarvey,P.B., Orcutt,B.C.,
Srinivasarao,G.Y., Xiao,C., Yeh,L.S., Ledley,R.S., Janda,J.F.,
Pfeiffer,F., Mewes,H.W., Tsugita,A. and Wu,C. (2000) The Protein Information Resource (PIR). Nucleic Acids Res., 28, 41–
44. URL = http://pir.georgetown.edu/
Bergerhoff,G. (1998) Inorganic three-dimensional structure
databases. In von Ragué Schleyer,P. (ed.), Encyclopedia of
Computational Chemistry. John Wiley & Sons, Chichester, pp.
1325–1337.
Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N.,
Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The protein data bank. Nucleic Acids Res., 28, 235–242. URL =
http://www.rcsb.org/pdb/
Blalock,J.E. (1999) On the evolution of ligands: did peptides
functionally precede metals and small organic molecules? Cell.
Mol. Life Sci., 55, 513–518.
de Bolster,M.W.G. (1997) Glossary of terms used in bioinorganic
chemistry. Pure Appl. Chem., 69, 1251–1303. URL =
http://www.chem.qmw.ac.uk/iupac/bioinorg/
Bugg,T. (1997) An Introduction to Enzyme and Coenzyme Chemistry. Blackwell Science, Oxford.
Cole,J.C., Taylor,R. and Verdonk,M.L. (1998) Directional preferences of intermolecular contacts to hydrophobic groups. Acta
Crystallogr. D, 54, 1183–1193.
Davidson,E.R. (ed) (2000) Computational transition metal chemistry. Chem. Rev., 100, 351–818. URL =
http://pubs.acs.org/cgi-bin/jtocz?chreay/100/2
Degtyarenko,K.N., North,A.C.T., Perkins,D.N. and Findlay,J.B.C.
(1998) PROMISE: a database of information on prosthetic centres and metal ions in protein active sites. Nucleic Acids Res., 26,
376–381. URL = http://bioinf.leeds.ac.uk/promise/
Deisenhofer,J. and Michel,H. (1992) High-resolution crystal structures of bacterial photosynthetic reaction centers. In Ernster,L. (ed.), Molecular Mechanisms in Bioenergetics. Elsevier,
Amsterdam, pp. 103–120.
Deisenhofer,J., Huber,R. and Michel,H. Nobel Prize in Chemistry
(1988) ‘for the determination of the three-dimensional structure
of a photosynthetic reaction centre’. URL =
http://www.nobel.se/laureates/chemistry-1988.html
Fletcher,D.A., McMeeking,R.F. and Parkin,D. (1996) The United
Kingdom chemical database service. J. Chem. Inf. Comput. Sci.,
36, 746–749. URL = http://cds.dl.ac.uk/cds/
Garavelli,J.S. (2000) The RESID Database of protein structure modifications: 2000 update. Nucleic Acids Res., 28, 209–211. URL
= http://pir.georgetown.edu/pirwww/dbinfo/resid.html
Goto,S., Nishioka,T. and Kanehisa,M. (1998) LIGAND: chemical database for enzyme reactions. Bioinformatics, 14, 591–
599. URL = http://www.genome.ad.jp/dbget/ligand.html
Hann,M. and Green,R. (1999) Chemoinformatics—a new name for
an old problem? Curr. Opin. Chem. Biol., 3, 379–383.
Harding,M.M. (1999) The geometry of metal–ligand interactions
relevant to proteins. Acta Crystallogr. D, 55, 1432–1443.
Harrison,M.D., Jones,C.E., Solioz,I. and Dameron,C.T. (2000)
Intracellular copper routing: the role of copper chaperones.
Trends Biochem. Sci., 25, 29–32.
Hendlich,M. (1998) Databases for protein–ligand complexes. Acta
Crystallogr. D, 54, 1178–1182.
Hofmann,K., Bucher,P., Falquet,L. and Bairoch,A. (1999) The
PROSITE database, its status in 1999. Nucleic Acids Res., 27,
215–219. URL = http://www.expasy.ch/prosite/
Holm,R.H., Kennepohl,P. and Solomon,E.I. (1996) Structural and
functional aspects of metal sites in biology. Chem. Rev., 96,
2239–2314.
Huang,W., Jia,J., Cummings,J., Nelson,M., Schneider,G. and
Lindqvist,Y. (1997) Crystal structure of nitrile hydratase reveals
a novel iron centre in a novel fold. Structure, 5, 691–699.
Hubbard,T.J.P., Ailey,B., Brenner,S.E., Murzin,A.G. and Chothia,C.
(1998) SCOP, structural classification of proteins database:
applications to evaluation of the effectiveness of sequence
alignment methods and statistics of protein structural data. Acta
Crystallogr. D, 54, 1147–1154. URL =
http://scop.mrc-lmb.cam.ac.uk/scop/
Islam,S.A., Carvin,D., Sternberg,M.J. and Blundell,T.L. (1998)
HAD, a data bank of heavy-atom binding sites in protein crystals:
a resource for use in multiple isomorphous replacement and
anomalous scattering. Acta Crystallogr. D, 54, 1199–1206. URL
= http://www.bmm.icnet.uk/had/
IUBMB, (1992) Enzyme Nomenclature: Recommendations (1992)
of the Nomenclature Committee of the International Union
of Biochemistry and Molecular Biology. Academic Press, San
Diego.
IUPAC-IUBMB,Joint Commission on Biochemical Nomenclature
(JCBN) and Nomenclature Committee of IUBMB (NC-IUBMB)
Newsletter, (1999) Eur. J. Biochem., 264, 607–609. URL =
http://www.chem.qmw.ac.uk/iubmb/newsletter/1999/item3.html
Jernigan,R., Raghunathan,G. and Bahar,I. (1994) Characterization
of interactions and metal ion binding sites in proteins. Curr. Opin.
Struct. Biol., 4, 256–263.
Junker,V.L., Apweiler,R. and Bairoch,A. (1999) Representation
of functional information in the SWISS-PROT data bank.
Bioinformatics, 15, 1066–1067.
Karlin,K.D. (1993) Metalloenzymes, structural motifs, and inorganic models. Science, 261, 701–708.
Karmirantzou,M. (1998) Computational approaches to protein–
ligand interactions: protein–haem complexes, PhD Thesis, University College London.
Kleywegt,G.J. and Jones,T.A. (1998) Databases in protein crystallography. Acta Crystallogr. D, 54, 1119–1131. URL =
http://xray.bmc.uu.se/hicup/
Laskowski,R.A., Hutchinson,E.G., Michie,A.D., Wallace,A.C.,
Jones,M.L. and Thornton,J.M. (1997) PDBsum: a web-based
database of summaries and analyses of all PDB structures. Trends
Biochem. Sci., 22, 488–490. URL =
http://www.biochem.ucl.ac.uk/bsm/pdbsum/
Lippard,S.J. and Berg,J.M. (1994) Principles of Bioinorganic
Chemistry. University Science Books, Mill Valley.
Longhi,S., Czjzek,M. and Cambillau,C. (1998) Messages from
ultrahigh resolution crystal structures. Curr. Opin. Struct. Biol.,
8, 730–737.
Lu,Y. and Valentine,J.S. (1997) Engineering metal-binding sites in
proteins. Curr. Opin. Struct. Biol., 7, 495–500.
Macromolecular Structure Database. URL = http://msd.ebi.ac.uk/
Metalloprotein site Database and Browser. URL = http://metallo.scripps.edu/
Meyer,E.F. (1997) The first years of the protein data bank. Protein
Sci., 6, 1591–1597.
Nomenclature Committee of the International Union of Biochemistry (NCIUB), (1991) Nomenclature of electron-transfer proteins. Recommendations 1989. Eur. J. Biochem., 200, 599–611.
Ogino,H., Inomata,S. and Tobita,H. (1998) Abiological iron–sulfur
clusters. Chem. Rev., 98, 2093–2122.
Orengo,C.A., Martin,A.M., Hutchinson,G., Jones,S., Jones,D.T.,
Michie,A.D., Swindells,M.B. and Thornton,J.M. (1998) Classifying a protein in the CATH database of domain structures. Acta
Crystallogr. D, 54, 1147–1154. URL =
http://www.biochem.ucl.ac.uk/bsm/cath/
Orengo,C.A., Todd,A.E. and Thornton,J.M. (1999) From protein
structure to function. Curr. Opin. Struct. Biol., 8, 374–382.
Perutz,M.F. and Kendrew,J.C. Nobel Prize in Chemistry (1962) for
their studies of the structures of globular proteins. URL =
http://www.nobel.se/laureates/chemistry-1962.html
Reichert,J., Jabs,A., Slickers,P. and Sühnel,J. (2000) The IMB Jena
Image Library of Biological Macromolecules. Nucleic Acids
Res., 28, 246–249. URL = http://www.imb-jena.de/IMAGE.html
Schomburg,D., Schomburg,I., Chang,A. and Bänsch,C. (1999)
BRENDA: the information system for enzymes and metabolic information. In Proceedings of the German Conference on Bioinformatics 1999. URL = http://www.brenda.uni-koeln.de/
Seavey,B.R., Farr,E.A., Westler,W.M. and Markley,J.L. (1991) A
relational database for sequence-specific protein NMR data. J.
Biomol. NMR, 1, 217–236. URL = http://www.bmrb.wisc.edu/
Valentine,J.S. and O’Halloran,T.V. (1999) Bio-inorganic chemistry:
what is it, and what’s so exciting? Curr. Opin. Chem. Biol., 3,
129–130.
Wallace,A.C., Laskowski,R.A. and Thornton,J.M. (1995) LIGPLOT: a program to generate schematic diagrams of protein–ligand interactions. Protein Eng., 8, 127–134.
Wallace,A.C., Borkakoti,N. and Thornton,J.M. (1997) TESS: a geometric hashing algorithm for deriving 3D coordinate templates
for searching structural databases. Application to enzyme active
sites. Protein Sci., 6, 2308–2323. URL =
http://www.biochem.ucl.ac.uk/bsm/PROCAT/PROCAT.html
White,J.A., Apweiler,R., Blake,J.A., Eppig,J.T., Maltais,L.J. and
Povey,S. (1999) Report of the second international nomenclature workshop. Cambridge, United Kingdom, May 1–2, 1999.
Genomics, 62, 320–323. URL =
http://www.gene.ucl.ac.uk/nomenclature/INW2.html
Williams,R.J.P. (1997) The natural selection of the chemical elements. Cell. Mol. Life Sci., 53, 816–829.