BIOINFORMATICS REVIEW
Vol. 16 no. 10 2000, Pages 851–864

Bioinorganic motifs: towards functional classification of metalloproteins

Kirill Degtyarenko ∗
EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Received on December 21, 1999; revised on April 6, 2000; accepted on May 2, 2000

Abstract
The habitat of bioinorganic motifs (BIMs) is at the interface of biological inorganic chemistry and bioinformatics. A BIM is defined as a common structural feature shared by functionally related, but not necessarily homologous, proteins, and consisting of the metal atom(s) and first coordination shell ligands. BIMs appear to be suitable for classification of metal centres at any level, from groups of unrelated proteins with similar function to different functional states of the same protein, and for description of possible evolutionary relationships of metalloproteins. However, they have not attracted wide attention from the bioinformatics community. Although their presence is appreciated, they are difficult to predict; therefore the current ‘high-throughput’ initiatives are likely to miss or ignore them altogether. The protein sequence databases do not distinguish between proteins containing different prosthetic groups (unless they have different sequences) or between apo- and holoprotein. On the other hand, the protein structure databases include data on ‘hetero compounds’ of various origin but these data are often inconsistent. A number of specialized databases dealing with BIMs and attempts to classify them are reviewed.
Supplementary information: The additional bibliography and list of Internet resources on bioinorganic chemistry are available at http://www.ebi.ac.uk/~kirill/biometal/
Contact: [email protected]
∗ To whom correspondence should be addressed.
© Oxford University Press 2000

Abbreviations
BChl-a, bacteriochlorophyll a
BIM, bioinorganic motif
BOM, bioorganic motif
CCDC, Cambridge Crystallographic Data Centre
CSD, Cambridge Structural Database
D, dimensionality
EC, Enzyme Commission
FeMoco, iron–molybdenum cofactor
ICSD, Inorganic Crystal Structure Database
EPR, electron paramagnetic resonance
MDB, Metalloprotein site Database and Browser
MSD, Macromolecular Structure Database
Moco, molybdenum cofactor
NMR, nuclear magnetic resonance
ppIX, protoporphyrin IX
PDB, Protein Data Bank
Sec, L-selenocysteine
TPQ, 2,4,5-trihydroxyphenylalanine quinone

Introduction
The field of biological inorganic chemistry is multidisciplinary and perhaps lacks well defined boundaries (Valentine and O’Halloran, 1999), but its main focus undoubtedly is on the structure and function of metal-containing proteins. Metalloproteins participate in the most important biochemical processes including respiration, nitrogen fixation and oxygenic photosynthesis. Metalloenzymes were the first biological catalysts on Earth. About one-third of all structurally characterized proteins contain metals, while over 50% of all proteins are estimated to be metalloproteins. This emphasizes the crucial role of metal ions in stabilizing protein structure (Jernigan et al., 1994).

Computational protein structure analysis is one of the cornerstones of bioinformatics. It seems strange how little attention the bioinformatics community has paid to metalloproteins and other complex proteins. (To get an idea, try a PubMed search with the combination ‘bioinformatics’ + ‘biological inorganic’ or ‘computational’ + ‘bioinorganic’.)
It is particularly striking considering the remarkable efforts and progress made in computational inorganic chemistry in the last few years [see, for example, Davidson (2000)].

Why did this happen? Historically, the main focus of bioinformatics has been on computational analysis of biological macromolecules, i.e. proteins and nucleic acids. The advent of high-throughput sequencing methods provides bioinformaticians with more and more raw sequence data to analyse. Since proteins are biochemical entities, the lack of specific biochemical data results in an immense information gap between protein structure and function. A ‘seamless transition between bioinformatics and chemoinformatics’ (Hann and Green, 1999) is needed to bridge this gap. Although the term ‘chemoinformatics’ came from the field of drug discovery, the methodologies employed are equally applicable to fundamental chemistry, including bioinorganic chemistry. Most of the challenges in chemoinformatics also have a direct analogy in bioinformatics (just replace ‘molecule’ with ‘biological macromolecule’).

There are some important differences too. Many bioinformatics resources (databases, services, programs) are freely accessible via the Internet. In contrast, almost all chemical resources are not. The lack of free chemical databases has resulted in the absence of a standard format for chemical data.

Here, I present my view of what chemical information should be (or already is) available in terms of bioinorganic motifs (BIMs). Many features of BIMs were already summarized in our earlier paper (Degtyarenko et al., 1998) but the basic definition was lacking. Firstly, I try to give definitions and discuss the properties of BIMs. I then review the databases dealing with metalloproteins and BIMs, which are summarized in Table 1. Some ideas discussed in this review were presented at the Second International Nomenclature Workshop (White et al., 1999).
Some definitions
This section contains definitions of terms directly related to the concept of bioinorganic motif. Definitions of the terms given in italics are summarized in the Glossary. The Glossary of Terms in Bioinorganic Chemistry (de Bolster, 1997) contains definitions for approximately 400 terms of relevance and I recommend it for further reference.

The same low-molecular compound can play different roles depending on its chemical context in the macromolecular environment (Lippard and Berg, 1994). The term cofactor causes confusion since it has been used instead of either prosthetic group or coenzyme, or referred to both collectively. (Sometimes even proteins such as calmodulin are referred to as ‘cofactors’ in biochemical literature.) Therefore, the use of this term should be generally avoided, apart from certain well established combinations, e.g. molybdenum cofactor and iron–molybdenum cofactor.

Both the apoprotein and the prosthetic group are integral parts of a functional complex protein. In contrast, the coenzyme is just another substrate of an enzyme. It should be noted that the biological functions of complex proteins (Table 2) may be other than catalysis, while the term ‘coenzyme’ should be used only in conjunction with enzymes. An example of two distinct roles for the same compound within one protein complex is provided by the photosynthetic reaction centre from Rhodobacter sphaeroides: while one molecule of ubiquinone-10 is tightly bound, another one exchanges with the quinone pool of the membrane so that the electrons are transported outside the protein (Deisenhofer and Michel, 1992). Both prosthetic groups and coenzymes may (or may not) contain metal ions. Metal atoms per se can play such different roles as prosthetic group; substrate, product or inhibitor of an enzyme; stored or transported atom.
The word ligand has two distinct, and sometimes directly opposite, meanings:
(i) In coordination chemistry, the atoms or chemical groups bound to the central atom (usually a metal) via a dative bond are called ligands. A ligand that donates one electron pair to the central atom is called monodentate; a ligand that donates more than one is called polydentate. In bioinorganic chemistry, the ligands are often derived from macromolecules (polypeptides and nucleic acids) and some of them are polydentate.
(ii) In biochemistry, any low-molecular compound (including metal ions and metal compounds) bound to the macromolecule may be referred to as a ligand, e.g. in ‘ligand–receptor interactions’. The linkage, therefore, is not restricted to dative bonds.
Interestingly, names such as LIGAND (Goto et al., 1998), ReLiBase (Hendlich, 1998) and LIGPLOT (Wallace et al., 1995) all make use of this biochemical meaning. I will use the term ligand only in its (i) sense.

The ligands surrounding the central atom are collectively called the (first) coordination shell. A polypeptide can be regarded as a polydentate ligand, but it is often easier to think of the amino acid residues as separate ligands. In some cases, however, the polydentate nature of the polypeptide simply cannot be ignored. For example, in nitrile hydratase (Figure 1a), the active centre iron is coordinated to the polypeptide backbone as well as to side chains. The coordination geometry is octahedral, with the iron atom and equatorial ligands that can be superimposed on the plane of the iron and the four pyrrole nitrogens in haem (Huang et al., 1997).
There are four classes of functional groups collectively referred to as polypeptide-derived, or endogenous, ligands (Holm et al., 1996):
• Side chain groups: amide (Asn, Gln), amino (Lys), carboxyl (Asp, Glu), hydroxyl (Ser, Thr), imidazole (His), phenol (Tyr), selenol (Sec), sulphide (Met) and thiol (Cys)
• Carbonyl and amide of main chain
• Amino group at N-terminus
• Carboxylate group at C-terminus

Ligands not derived from polypeptides are called exogenous (Holm et al., 1996). The exogenous ligands range from simple inorganic entities (e.g. oxide, hydroxide, sulphide, water and other solvent-derived molecules, or such physiological ligands as dioxygen or nitric oxide) to polydentate organic compounds, e.g. porphyrins or corrins.

Table 1. Databases relevant to this review

Database | URL | Description | Reference

Sequence
PIR | http://pir.georgetown.edu/pir/ | Protein sequences | Barker et al., 2000
PRINTS | http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/ | Protein sequence motifs (fingerprints) | Attwood et al., 2000
PROSITE | http://www.expasy.ch/prosite/ | Protein sequence motifs (regular expressions and profiles) | Hofmann et al., 1999
RESID | http://pir.georgetown.edu/resid/ | Post-translational modifications in proteins | Garavelli, 2000
SWISS-PROT | http://www.expasy.ch/sprot/ | Protein sequences | Bairoch and Apweiler, 2000

Function
BRENDA | http://www.brenda.uni-koeln.de/ | Physico-chemical properties of enzymes | Schomburg et al., 1999
ENZYME | http://www.expasy.ch/enzyme/ | Enzyme nomenclature | Bairoch, 2000
LIGAND | http://www.genome.ad.jp/dbget/ligand.html | Enzymes, reactions and compounds | Goto et al., 1998

Comprehensive structure
BioMagResBank | http://www.bmrb.wisc.edu/ | NMR data on proteins, peptides and nucleic acids | Seavey et al., 1991
CATH | http://www.biochem.ucl.ac.uk/bsm/cath/ | Protein structure classification | Orengo et al., 1998
CSD | http://cds.dl.ac.uk/cds/llcsd2.html | Crystal structures of organic and metalloorganic compounds | Allen and Hoy, 1998
HIC-Up | http://xray.bmc.uu.se/hicup/ | HET compounds from PDB | Kleywegt and Jones, 1998
ICSD | http://barns.ill.fr/dif/icsd/ ; http://www.fiz-karlsruhe.de/stn/Databases/icsd.html | Crystal structures of inorganic compounds | Bergerhoff, 1998
IMB Jena Image Library | http://www.imb-jena.de/IMAGE.html | Visualization and analysis of macromolecule structures | Reichert et al., 2000
MSD | http://msd.ebi.ac.uk/ | 3D and quaternary structures of biological macromolecules |
PDB | http://www.rcsb.org/pdb/ ; http://pdb-browsers.ebi.ac.uk/ | 3D structures of biological macromolecules | Berman et al., 2000
PDBsum | http://www.biochem.ucl.ac.uk/bsm/pdbsum/ | Summaries and structural analyses of PDB data files | Laskowski et al., 1997
ReLiBase | http://rcsb.rutgers.edu:8081/ | Protein–HET compound interactions from PDB | Hendlich, 1998
SCOP | http://scop.mrc-lmb.cam.ac.uk/scop/ | Protein structure classification | Hubbard et al., 1998

Bioinorganic structure
HAD | http://www.bmm.icnet.uk/had/ | Heavy-atom derivatives of protein crystals | Islam et al., 1998
MDB | http://metallo.scripps.edu/ | Metalloprotein sites derived from 3D structures |
PROCAT | http://www.biochem.ucl.ac.uk/bsm/PROCAT/PROCAT.html | Enzyme active site 3D templates | Wallace et al., 1997
Protein–haem interactions | http://www.biochem.ucl.ac.uk/bsm/proLig/ProtHaem.html | Protein–haem interactions in non-homologous haem proteins from PDB | Karmirantzou, 1998
PROMISE | http://bioinf.leeds.ac.uk/promise/ ; http://metallo.scripps.edu/PROMISE/ | Annotation of naturally occurring BIMs | Degtyarenko et al., 1998

BIMs and BOMs
Let us suppose that there is only a limited set of ‘basic recurrent structures’ (Karlin, 1993) occurring in natural metalloproteins; such structures will be called here bioinorganic motifs (BIMs).
Before giving a more formal definition, let us consider the possible scenarios:
• In the simplest case, when a single metal atom is bound to a protein (mononuclear centre), the BIM includes the metal and its first coordination shell ligands, of which at least three are endogenous ligands (Figure 1a, b).
• When the metal centre is formed by more than one metal atom (polynuclear centre), the BIM includes all the metal atoms and their first coordination shell ligands, at least one of which is bridging (Figure 1c, d).
• When the protein binds a complex of metal with an exogenous polydentate organic compound, such as porphyrin or pterin, the BIM includes the metal atom and its first coordination shell ligands, of which at least two belong to the organic compound (Figure 1e, f).

Therefore, the BIM may be defined as

a common structural feature of a class of functionally related, but not necessarily homologous, proteins, that includes the metal atom(s) and first coordination shell ligands    (1)

For example, the similarity in active site structure of P450, chloroperoxidase and nitric oxide synthase (Figure 1e), originally predicted from spectroscopic data and later confirmed by crystallography, led to recognition of these non-homologous enzymes as a distinctive class, ‘haem–thiolate proteins’ (NCIUB, 1991). A further differentiation may be achieved by either considering the second coordination shell (e.g. the amino acid residues bound to metal through the solvent molecules may be included) or by taking into account the chemistry, orientation or conformation of the organic compound. Interestingly, different prosthetic groups (e.g. haem b and haem a) may form similar BIMs and, vice versa, the same prosthetic group may form different BIMs (cf. haem a and haem a3 centres in cytochrome c oxidase).

Table 2. Functional classification of metalloproteins and roles of corresponding compounds (metals or metal complexes). The ‘permanently’ bound compounds involved in electron/proton transfer, substrate activation and gas binding are usually referred to as prosthetic groups. The reactants involved in electron/proton transfer are usually referred to as coenzymes but can also be considered as ‘transiently’ bound redox centres. One representative PDB entry for each example (if available) is given

Function of protein | Role of compound | Example protein | Compound | PDB
Electron transfer | Electron transfer | Cytochrome b5 | haem b | 1CYO
Electron transfer | Electron transfer | Adrenodoxin | Fe2S2 | 1AYF
Electron transfer | Electron transfer | Plastocyanin | Cu | 1AG6
Light harvesting | Excitation energy transfer | Light-harvesting complex LH-II | BChl-a | 1KZU
Catalysis | Substrate activation | Nitrile hydratase | Fe | 2AHJ
Catalysis | Substrate activation | DMSO reductase | Moco | 1DMR
Catalysis | Substrate activation | Nitrogenase MoFe protein | FeMoco | 3MIN
Catalysis | Substrate activation | Manganese superoxide dismutase | Mn | 1VEW
Catalysis | Substrate activation | Hydroxylamine oxidoreductase | haem P460 | 1FGJ
Catalysis | Electron transfer | Hydroxylamine oxidoreductase | haems c | 1FGJ
Catalysis | Reactant | Manganese peroxidase | Mn | 1MNP
Catalysis | Reactant | Ferrochelatase | Fe, haem | 1DOZ
Catalysis and regulation | Switch of function | Holoenzyme: aconitate hydrolase; apoenzyme: IRE-BP | Fe4S4 | 1FGH
Translocation | To be translocated | Copper-transporting ATPase | Cu+ | 2AW0
Catalysis or transport | Inhibitor | Ca2+-ATPase | La3+ | –
Catalysis or transport | Inhibitor | D-xylose isomerase | Ca2+ | 1XLM
Storage (uptake, binding and release) | Gas coordination | Nitrophorin | haem (coordinates NO) | 4NP1
Storage (uptake, binding and release) | Gas coordination | Haemocyanin | 2 Cu2+ (coordinates O2) | 1OXY
Storage (uptake, binding and release) | To be transported or stored | Haemophore HasA | haem | 1B2V
Storage (uptake, binding and release) | To be transported or stored | Metallothioneins | Cd2+, Hg2+, Pb2+, Tl+ | 4MT2
Storage (uptake, binding and release) | To be transported or stored | Lactoferrin | Fe | 1B1X
Storage (uptake, binding and release) | To be transported or stored | Bacterioferritin | Fe (in form of hydrated ferric phosphate) | 1BFR
Various | Structural | Lignin peroxidase | Ca2+ | 1B82
Various | Structural | Zinc finger | Zn2+ | 1AAY
Various | Structural | Endonuclease III | Fe4S4 | 2ABK
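The three scenarios listed above, together with definition (1), can be read as a small decision procedure. The following Python fragment is only a minimal sketch under stated assumptions: the `Ligand` record, its field names and the `classify_centre` function are hypothetical names introduced here for illustration, and the thresholds are simply those quoted in the text (at least three endogenous ligands, at least one bridging ligand, at least two donors from one exogenous organic compound). It is not taken from any of the databases reviewed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ligand:
    """One first-coordination-shell donor (hypothetical record, for illustration)."""
    donor: str                          # donor element, e.g. "S"
    source: str                         # residue or compound supplying the donor
    endogenous: bool                    # True if polypeptide-derived
    bridging: bool = False              # True if the donor bridges two or more metals
    polydentate_organic: bool = False   # True for donors of e.g. a porphyrin or pterin


def classify_centre(metals, ligands):
    """Label a metal centre according to the three scenarios in the text."""
    # Scenario 3: at least two donors belong to one exogenous organic compound.
    per_compound = {}
    for lig in ligands:
        if lig.polydentate_organic and not lig.endogenous:
            per_compound[lig.source] = per_compound.get(lig.source, 0) + 1
    if any(count >= 2 for count in per_compound.values()):
        return "metal-exogenous compound centre"
    # Scenario 2: several metal atoms and at least one bridging ligand.
    if len(metals) > 1 and any(lig.bridging for lig in ligands):
        return "polynuclear centre"
    # Scenario 1: one metal atom with at least three endogenous ligands.
    if len(metals) == 1 and sum(lig.endogenous for lig in ligands) >= 3:
        return "mononuclear centre"
    return "unclassified"


# Figure 1b: magnesium with three waters and His, Glu, Leu donors.
mg_site = classify_centre(
    ["Mg"],
    [Ligand("O", "H2O", False)] * 3
    + [Ligand("N", "His", True), Ligand("O", "Glu", True), Ligand("O", "Leu", True)],
)
# Figure 1d: four iron atoms, four bridging sulphides, four Cys thiolates.
fe4s4 = classify_centre(
    ["Fe"] * 4,
    [Ligand("S", "sulphide", False, bridging=True)] * 4 + [Ligand("S", "Cys", True)] * 4,
)
# Figure 1e: haem iron with four porphyrin nitrogens and one Cys thiolate.
haem_thiolate = classify_centre(
    ["Fe"],
    [Ligand("N", "ppIX", False, polydentate_organic=True)] * 4 + [Ligand("S", "Cys", True)],
)
print(mg_site, fe4s4, haem_thiolate, sep="; ")
```

The order of the checks is a design choice of this sketch: the exogenous-compound case is tested first so that a haem centre with a single protein ligand is not rejected for having fewer than three endogenous ligands.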
Fig. 1. Examples of bioinorganic motifs: mononuclear centres (a, b), polynuclear centres (c, d) and metal–exogenous compound centres (e, f). (a) Mononuclear iron centre in photosensitive nitrile hydratase: [Fe(NCys)(NSer)(SγCys)3(NO)]. (b) Mononuclear magnesium centre in Ni–Fe hydrogenase: [Mg(OH2)3(Nε2His)(OεGlu)(OLeu)]. (c) Dinuclear (type III) copper centre in oxyhaemocyanin: [{Cu(Nε2His)3}2(µ-O2)]. (d) Polynuclear iron–sulphur centre: [Fe4S4(SγCys)4]. (e) Haem iron coordination in haem–thiolate proteins: [Fe(η4-ppIX)SγCys]. (f) Molybdenum centre in sulphite oxidase: [MoO(OH)(SγCys)(η2-molybdopterin)].

The concept of BIM may be further broadened by incorporating model molecules mimicking the natural metalloprotein function. These models may be either completely synthetic coordination compounds, complexes of peptide ‘maquettes’ and prosthetic group, or ‘redesigned’ natural proteins with novel metal centres (Karlin, 1993; Lu and Valentine, 1997). Note that the idea of non-protein derived compounds containing BIM does not contradict the above definition of BIM (1) given that the coordination mode of metal in these compounds is, at least qualitatively, identical to that in natural proteins.

On the other hand, the field of bioinorganic chemistry is not confined to metalloproteins. Siderophores and antibiotics such as bleomycin are examples of naturally occurring non-protein metal-binding biological molecules (Lippard and Berg, 1994) which have their functional analogues in the protein world. In its turn, the amazingly diverse bioinorganic centres represent but a fraction of the universal coordination compounds.
For example, only a few types of Fe–S clusters are found in biological systems (cf. the variety of abiological Fe–S clusters; Ogino et al., 1998).

Many complex proteins contain purely organic prosthetic groups, such as flavins, pterins, pheophytins, quinones or carotenoids. By analogy with bioinorganic motifs, the bioorganic motif (BOM) can be defined as

a common structural feature of a class of functionally related, but not necessarily homologous, proteins, that includes the organic prosthetic group and polypeptide-derived groups bonded to it    (2)

However, there is an intrinsic difficulty in defining a BOM because of the heterogeneity of chemical bonds (covalent, hydrogen, van der Waals) which may or may not be involved in the interaction between prosthetic group and polypeptide. The concept of a coordination shell is no longer applicable.

Haem proteins are the most extensive group of metalloproteins, and they often display complex combinations of BIMs and BOMs (Karmirantzou, 1998). The choice of BIMs and BOMs by Nature results in a spectacular variety of active site structures even within the same protein family. Therefore it is difficult to predict from the amino acid sequence whether a BIM/BOM is conserved in the protein family. The situation becomes even more complex at domain and/or subunit interfaces. While the protein three-dimensional (3D) structure tends to be better conserved than the sequence, the quaternary structure may be less conserved than the 3D structure. This means that the BIM hosted at the subunit interface in the oligomeric protein may not exist in a homologous monomeric protein even if all the residues involved in metal coordination are conserved.

Metalloprotein and BIM evolution
A BIM by its very nature implies comparison between members of a metalloprotein class. What about the evolutionary aspect of BIMs? Apparently BIMs were among the first structural features of proteins to emerge.
The necessity to catalyse reactions involving small inert molecules such as CO2, CH4, H2 and N2 was a driving force of evolution under primitive conditions (Williams, 1997). Since none of the common amino acids are able to perform any useful catalytic redox chemistry (Bugg, 1997), the first oxidoreductases and electron-transfer proteins employed the available metals, most importantly Mn, Fe and Ni. It is unlikely that the fold of these first protein molecules evolved prior to their metal-binding function, as implied by molecular recognition theory (Blalock, 1999) (although the question undoubtedly deserves a review of its own). The advent of dioxygen, the toxic by-product of oxygenic photosynthesis (the process itself involving a unique variety of metal centres), had a dramatic effect. It changed the availability of metals (in particular increasing the availability of Cu and Zn) and brought into existence new redox enzymes, involved in detoxification of reactive oxygen species and oxidative energy production (Williams, 1997).

The diversity of metal sites in proteins (as, indeed, of almost any feature in biology) is due to both divergent and convergent evolution. Within a divergent family, the active site structure is usually, but not always, conserved while the pairwise sequence identity may be as low as 10%. Homologous metalloproteins may have different BIMs (and vice versa). Thus, sequence homology, although often resulting in the same fold, does not guarantee the same active site structure. I suggest the use of sequence-independent BIMs to complement traditional evolutionary trees based on sequence comparison.

Let us consider two haemoproteins having neither sequence nor 3D similarity. What is homologous between them? The answer is: the haem group. The known porphyrin biosynthesis pathways are essentially the same; the homologues of the corresponding enzymes are found in different kingdoms of living organisms.
‘Unusual’ prosthetic groups seem to be restricted to taxa containing the corresponding metabolic systems, and so on. The use of a particular metal (or, indeed, particular organic compounds) by certain organisms may be governed not only by thermodynamics but also by the availability of specific transport pathways and enzymes catalysing the formation of complexes.

Some complex proteins are unable to fold correctly without the corresponding prosthetic groups. In other cases, specific proteins route the metal ion or prosthetic group to the target proteins. For example, holocytochrome c synthase (EC 4.4.1.17) is required to attach the haem covalently to the apocytochrome c. It appears that each copper protein is served by a specific Cu(I) transporting protein, a ‘copper chaperone’ (Harrison et al., 2000). Thus, the functional structure of a protein is not always derived from its amino acid sequence alone. However sophisticated the software tools that might appear tomorrow, they would not be sufficient to predict the function if the biochemical context is not taken into account.

Dimensionality of BIMs
One always has to distinguish between structure and its representation. For instance, the covalent structure of a polypeptide is conventionally represented as a one-dimensional (1D) amino acid sequence, although the individual amino acids, like any organic compounds, may be represented in two dimensions (2D) using structural formulae or in 3D using the coordinates. As soon as bonds other than peptide bonds (e.g. disulphide) are taken into account, higher dimensions are required. Except for glycine, all amino acids that occur in proteins are chiral, therefore their unambiguous representation should include stereochemistry. The dimensionality of stereochemical structural formulae is between 2 and 3 and may be referred to as 2.5D.

BIMs, like other coordination compounds, may be represented in different ways (Figure 2).
In contrast to polymers, coordination compounds cannot (sensibly) be represented in 1D. The formulae and systematic names are too cumbersome to be useful. On the other hand, 3D coordinates, if available, provide too much information to be used for comparison or classification. I am in favour of 2.5D representation. At the ligand level, not only the nature of the residue (e.g. His) but also the interacting atom is indicated (e.g. Nδ1 or Nε2). Not only the nature of a metal and its coordination number, but also its stereochemistry and, to some extent, coordination geometry should be included. Stereochemistry is more important because it is easier to define and minor adjustments of the polypeptide chain do not break it. Because the coordination polyhedra in metalloproteins are often distorted, it is difficult to choose between, say, distorted octahedral and distorted trigonal biprismatic geometries. Exact angles and distances are not important in 2.5D.

D | Database | Entry | Representation
0 | ENZYME | 1.14.16.2 | Iron
0.5 | – | – | [Fe(His)2Glu]
1 | PROSITE | PS00367 | P-D-x(2)-H-[DE]-[LI]-[LIVMF]-G-H-[LIVMC]-P
1.5 | – | – | [Fe(Nε2His)2(OεGlu)(OH2)2]
2 | – | – | structural formula
2.5 | PROMISE | AAAOH | stereochemical structural formula
3 | MDB | 1TOH | 3D coordinates

Fig. 2. Different dimensionality (D) representations of a BIM for a mononuclear iron enzyme, tyrosine 3-monooxygenase. Note that the PROSITE pattern (1D) contains only two of the three protein ligands.

The 2.5D representation is attractive because it is intuitively understood. The problem, however, is that there are no publicly available 2.5D databases of biological macromolecules. The information of value for bioinorganic chemistry, therefore, should somehow be derived from other resources such as 1D (sequence), 3D (crystal and solution structure) and dimensionless (enzyme function) databases (Table 1).
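To illustrate how little of the metal site the 1D representation carries, the PROSITE pattern quoted in Figure 2 can be translated into an ordinary regular expression and run against a sequence. The Python fragment below is a minimal sketch: `prosite_to_regex` is a hypothetical helper written for this example, it handles only a subset of the PROSITE pattern syntax (literal residues, `x`, `[...]`, `{...}` and repeat counts), and the test sequence is made up rather than taken from any database entry.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a subset of PROSITE pattern syntax into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the residue specification from an optional repeat such as (2) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.group(1), match.group(2), match.group(3)
        if core == "x":                      # any residue
            token = "."
        elif core.startswith("{"):           # {AB}: any residue except A or B
            token = "[^" + core[1:-1] + "]"
        else:                                # literal residue or [ABC] alternatives
            token = core
        if low and high:
            token += "{%s,%s}" % (low, high)
        elif low:
            token += "{%s}" % low
        parts.append(token)
    return "".join(parts)


# Pattern as given in Figure 2 (PROSITE entry PS00367).
PS00367 = "P-D-x(2)-H-[DE]-[LI]-[LIVMF]-G-H-[LIVMC]-P"
regex = prosite_to_regex(PS00367)

# Made-up sequence used only to exercise the pattern.
toy_sequence = "AAPDKRHDLLGHVPAA"
hit = re.search(regex, toy_sequence)
print(regex)
print(hit.group(0) if hit else "no match")
```

A match reports where the motif occurs but nothing about which atoms bind the metal, which is the point made in the caption of Figure 2.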
BIMs in sequence databases
One possible explanation of the gap between bioinorganic chemistry and bioinformatics is that the concept of ‘hetero compounds’ is fundamentally alien to the sequence databases. A typical sequence database entry includes the core data (i.e. the sequence itself) and the annotation. The core data consist of text with a limited number of characters. This limitation is both advantageous and disadvantageous. The progress of bioinformatics in large-scale genome sequence analysis is due to the one-dimensional nature of the core data!

A comprehensive protein database remains to be created. My idea of the ‘ideal’ protein database entry is one which contains all the qualitative and quantitative information available for the particular protein. So-called protein sequence databases are, in the best case, polypeptide sequence databases. With the majority of entries originating from nucleic acid sequencing projects, their core data, at most, represent the explicit translation of genomic data. Moreover, the major sequence databases still stick to the 20 amino acid vocabulary and even naturally occurring amino acid residues such as L-selenocysteine (IUPAC-IUBMB, 1999) and N-formyl-L-methionine are not considered ‘standard’. Both the low-level (disulphide bridges, prosthetic groups, covalent modifications) and the high-level (domains, membrane topology, quaternary structure, biological function) information exists only as annotation. Quantitative data are not presented at all.

In SWISS-PROT (Bairoch and Apweiler, 2000), the reliability of ‘low-level’ annotations varies depending on whether the property is known from 3D structure, from site-directed mutagenesis studies or just from sequence comparison (Junker et al., 1999). Since amino acid residues often have more than one possible metal-binding mode, the annotations in sequence databases are not informative enough for a bioinorganic chemist.
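The ambiguity of residue-level annotation can be made concrete by counting the donor atoms that each side chain offers. The sketch below is an illustration only: the atom labels follow the usual PDB atom-naming convention, which is an assumption of this example and not something the sequence databases provide, and `donor_combinations` is a hypothetical helper. Main-chain donors and bidentate binding are deliberately ignored, so the count is a lower bound.

```python
from math import prod

# Candidate side-chain donor atoms for the endogenous ligand classes listed earlier
# (PDB-style atom names; an assumption of this sketch).
SIDE_CHAIN_DONORS = {
    "Asn": ("OD1", "ND2"),
    "Gln": ("OE1", "NE2"),
    "Lys": ("NZ",),
    "Asp": ("OD1", "OD2"),
    "Glu": ("OE1", "OE2"),
    "Ser": ("OG",),
    "Thr": ("OG1",),
    "His": ("ND1", "NE2"),
    "Tyr": ("OH",),
    "Sec": ("SE",),
    "Met": ("SD",),
    "Cys": ("SG",),
}


def donor_combinations(residues):
    """Number of distinct monodentate donor-atom assignments for a residue list."""
    return prod(len(SIDE_CHAIN_DONORS[residue]) for residue in residues)


# Residue-level description of the iron site of Figure 2: two His and one Glu.
annotation = ["His", "His", "Glu"]
n_assignments = donor_combinations(annotation)
print(n_assignments)
```

For the two-His, one-Glu site the residue list alone is compatible with eight different donor-atom assignments, whereas a 1.5D or 2.5D description names a single one.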
The sequence motif databases, such as PROSITE (Hofmann et al., 1999) and PRINTS (Attwood et al., 2000), provide information on the conserved amino acid residues in protein families. As of 1 April 2000, 23% of PROSITE entries (240 of 1035) and 27% of PRINTS entries (356 of 1310) correspond to metalloprotein or metal-binding protein families, although the chosen motifs do not necessarily include the actual metal-binding residues.

In an attempt to impose a restricted vocabulary and standard syntax for feature annotation in the PIR Protein Sequence Database, the RESID Database has been built (Garavelli, 2000). RESID lists all the post-translational modifications in proteins, including the covalently attached prosthetic groups. The entries include structural formulae of the compounds (a step towards 2.5D!). Again, RESID is of limited use for the bioinorganic chemist. One problem is that no distinction is made between covalent (as in Cys-haem c) and coordination bonds [as in Fe(Cys)4], while other prosthetic groups and structural metal sites, though being coordination-bonded to the polypeptide, are not included [e.g. haem b, Zn(Cys)4].

Enzyme databases
The only comprehensive protein function databases in the public domain are enzyme databases: ENZYME, LIGAND and BRENDA. In all three databases, the entry names correspond to the EC (Enzyme Commission) numbers according to Enzyme Nomenclature (IUBMB, 1992). One has to bear in mind, however, that each EC number defines a particular chemical reaction (or family of reactions) but not the chemical nature of the particular catalyst. Therefore it comes as no surprise that, apart from pointers to the few macromolecular databases, both ENZYME and LIGAND are notably devoid of protein-specific information.
In fact, the chemical reactions in these databases are essentially detached from catalysts, making it possible (at least in principle) to assign the same EC number to both the natural enzyme and, say, the catalytic antibody, if they both catalyse the same reaction. On the other hand, not all known biological catalytic activities are assigned EC numbers or classified as enzymatic at all. One of the most important biochemical reactions on Earth is photosynthetic oxygen evolution:

2 H2O + light → O2 + 4 H+ + 4 e−

but it is not assigned an EC number and photosystem II is not considered to be an enzyme.

The ENZYME database (Bairoch, 2000) is primarily based on Enzyme Nomenclature (IUBMB, 1992). The Cofactor field of ENZYME actually includes the names of prosthetic groups. There are a few exceptions, such as heme–thiolate (already a BIM) and selenium (which belongs to selenocysteine that contributes to the active site structure, sometimes also as a part of a BIM).

LIGAND is a composite database of the ENZYME and COMPOUND sections (Goto et al., 1998) and is at the moment the most comprehensive biochemical information resource freely available through the Web. In a COMPOUND entry, the links to the pertinent ENZYME entries are included together with one of four possible roles of a given compound in the enzymatic catalysis: R, reactant; C, cofactor; I, inhibitor and E, enhancer (activator). Many compounds have more than one function. For instance, manganese (COMPOUND C00034) functions as a reactant in the manganese peroxidase reaction (EC 1.11.1.13); as a cofactor of manganese superoxide dismutase (EC 1.15.1.1); as an inhibitor of lysyl aminopeptidase (EC 3.4.11.1) and as an activator of peptidase A (EC 3.4.13.18). The linkage of ENZYME entries back to the COMPOUND database is sometimes problematic since a single COMPOUND entry may correspond to more than one chemical compound.
COMPOUND entry C00034, for instance, includes both Mn(II) and Mn(III); therefore, one cannot use COMPOUND accession numbers as a sole reference for reactants in certain oxidation–reduction or isomerisation reactions.

BRENDA (Schomburg et al., 1999) is the most comprehensive freely available database on enzyme function. The database entries are consistently created and regularly updated by the curators using the original literature rather than computer resources. Two fields are of special interest for this review: Cofactors/prosthetic groups and Metal ions/salts. As one can expect, the information provided is heterogeneous. Cofactors/prosthetic groups also include the coenzymes. The difference between coenzymes and prosthetic groups is clear within the database context: coenzymes are the substrates and therefore are also found in the Reaction and Substrate spectrum fields. Thus, one could assume that

{Cofactors/prosthetic groups} = {Prosthetic groups} ∪ {Coenzymes}
{Coenzymes} = {Cofactors/prosthetic groups} ∩ {Substrates}

However, the metal ions which act as prosthetic groups are also found in another field, Metal ions/salts, together with ‘effector’ metals. Metal ions can also be found in the Inhibitor field. All in all, the existing data structure in BRENDA is better suited for the human reader than for a computer program.

Since all three databases are based on Enzyme Nomenclature, the same entry may contain information on structurally different proteins which catalyse the same or similar reaction. For example, LIGAND lists copper, zinc, manganese and iron as cofactors of superoxide dismutases (EC 1.15.1.1) while these enzymes are known to contain either Cu and Zn or Fe or Mn. This information can be found in a COMMENT field but it will require a human interpreter. To add even more confusion, BRENDA also includes the additional reactions catalysed by the same protein that catalyses the ‘main’ reaction.
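The two set relations assumed above for the BRENDA fields translate directly into set operations. The sketch below is illustrative only: the field contents are made up rather than copied from any BRENDA entry, the function name `split_cofactor_field` is hypothetical, and recovering the prosthetic groups by set difference additionally assumes that no compound is both a coenzyme and a prosthetic group.

```python
def split_cofactor_field(cofactor_field, substrates):
    """Split a 'Cofactors/prosthetic groups' field using the relations in the text.

    Coenzymes are the members of the field that also appear among the substrates;
    the remainder is taken to be the prosthetic groups (assumes the two are disjoint).
    """
    coenzymes = cofactor_field & substrates
    prosthetic_groups = cofactor_field - coenzymes
    return prosthetic_groups, coenzymes

# Made-up field contents for a single entry (not taken from BRENDA).
cofactor_field = {"NAD+", "FAD"}
substrates = {"NAD+", "ethanol", "acetaldehyde"}
metal_ions_field = {"Zn2+"}  # metal prosthetic groups are stored in a separate field

prosthetic_groups, coenzymes = split_cofactor_field(cofactor_field, substrates)
print("prosthetic groups:", sorted(prosthetic_groups))
print("coenzymes:", sorted(coenzymes))
# A metal acting as a prosthetic group is missed, because it is not in cofactor_field.
print("metals not covered by the split:", sorted(metal_ions_field - cofactor_field))
```

The last line shows why the relations break down in practice: a metal prosthetic group recorded only under Metal ions/salts never enters the split, and nothing in that field separates it from an ‘effector’ metal.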
The BRENDA entry for EC 1.1.1.1 (alcohol dehydrogenase), for instance, lists peroxidase and esterase activities which should not be formally classified as EC 1.1.1.1. The polynuclear inorganic prosthetic groups are not treated as separate compounds. There are several types of Fe–S clusters but all of them are listed either as iron–sulfur (ENZYME) or iron and sulfur (LIGAND). None of the enzyme databases includes enzymes with partially assigned EC numbers.

3D databases

‘Structural biology’ is the ’90s name for an older branch of biophysics dealing with determination of the three-dimensional (3D) structure of macromolecules. The protein crystallographers were among the first to realize the need to establish a computer database to store 3D structures (Meyer, 1997), well before the genomic era. Of the approximately 10 000 proteins of known 3D structure, at least half contain metal ions or other non-polypeptide derived groups bound in their active sites, such groups often themselves containing metal ions (Table 3). It is worth remembering that both the first globular proteins (Perutz and Kendrew, 1962) and the first membrane protein (Deisenhofer et al., 1988) to be solved by X-ray crystallography were metalloproteins!

Table 3. Biologically important metals in the Periodic Table. The numbers of structurally characterized metal-binding proteins in MDB (version 1.4) are indicated by the lower figure. Note that metal–protein complexes found in PDB are not necessarily the native metalloproteins.

It should be noted, however, that the methods used in macromolecular structure determination (i.e. crystallography and NMR) do not yield high quality structures of small compounds. Indeed, while there is a growing number of structures determined at 1.2 Å resolution or better (Longhi et al., 1998), ‘high-resolution’ in structural biology usually means that the macromolecular structure is solved at <2 Å resolution.
Although this resolution is more than enough for many biological applications, it often does not provide sufficient information to the bioinorganic chemist. The limitations of crystallography for metalloprotein active centres account for uncertainty or errors in the definition of ligand set; bond lengths and angles; stereochemistry; protonation state of ligands; and oxidation state of transition metals (Holm et al., 1996). Nevertheless, crystallography is the most informative physical method for protein structure determination and 3D databases store the structural data in a more or less standard format, while spectroscopic databases of metalloproteins simply do not exist.

The Protein Data Bank (PDB) is an archive of 3D structures (Berman et al., 2000). A number of secondary databases have been derived from the PDB. The Macromolecular Structure Database (MSD) project aims to represent biological entities incorporating all levels of structural organization, from covalent to quaternary (rather than, say, crystallographic asymmetric units). Therefore, the protein subset of MSD will be a real, if not comprehensive, protein database.

A great deal of effort has been invested in the hierarchical classification of proteins. In such classification schemes as SCOP (Hubbard et al., 1998) and CATH (Orengo et al., 1998), the overall fold is the feature conserved along every hierarchical branch; the functional properties, including details of small compound binding, may be highly specific for individual proteins. A number of tools to search for functional sites in 3D structures have been developed (Orengo et al., 1999), but there is no comprehensive database of such sites. What should be used for functional classification of metalloproteins?

Hetero compounds in PDB

In PDB, any chemical entity other than one of the 20 standard amino acid residues in a polypeptide or one of the standard nucleotides in a nucleic acid is referred to as a ‘hetero compound’ (HET field).
Given the great chemical diversity of small molecules and relatively low resolution of macromolecular structures, it is not surprising that the data available for HET compounds ‘are generally in a sorry state’ (Kleywegt and Jones, 1998). Two main problems make HET compounds a poor basis for metalloprotein classification: heterogeneity and inconsistency. Indeed, HET compounds include:

• Water
• Metal ions
• Other exogenous inorganic compounds (e.g. CN−, Cl−, O2, ·NO, NH4+, HSO3−)
• Exogenous organic compounds (e.g. substrates, products, inhibitors)
• Prosthetic groups
• ‘Non-standard’ amino acids (e.g. Sec)
• Modified amino acids (e.g. TPQ)
• Modifiers (e.g. N-acetyl-D-glucosamine, myristoyl)

To illustrate the second problem, let us consider the large group of diiron–carboxylate proteins, which contain an Fe–O–Fe unit in the active site. This group includes such proteins as ribonucleotide reductase, methane monooxygenase, ferritins, haemerythrin and purple acid phosphatase. In different PDB entries, the HET compounds are:

• FE (iron, Fe2+ or Fe3+)
• FEO (µ-oxo-diiron, Fe–O–Fe)
• FEA (monoazido-µ-oxo-diiron, N3–Fe–O–Fe)
• MN (manganese, Mn2+)

In the first case, FE cannot be distinguished from any other iron ion, whether it is a mononuclear iron, iron–sulphur cluster or haem. In the case of FEA, the azide anion (N3−) represents an ‘external’ ligand as opposed to the intrinsic Fe–O–Fe group. N3− binds to haemerythrin (and myohaemerythrin) at the site normally occupied by O2 in oxyhaemerythrin (oxymyohaemerythrin). The last case is found in the structure of manganese-substituted bacterioferritin (PDB 1BFR). It is assumed that the structure of the metal-binding site of Mn-substituted bacterioferritin is similar to that of the native protein. However, such additional information cannot be deduced from the existing 3D model in the PDB format and may not always be found in comments.
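The inconsistency described above is easy to reproduce by tallying HET names that contain a given element. The sketch below assumes the standard fixed-column layout of PDB HETATM records (residue name in columns 18–20, chain in 22, residue number in 23–26, element symbol in 77–78). The records are generated by a hypothetical helper, `hetatm`, with made-up atom names and coordinates; they are not taken from any PDB entry.

```python
from collections import defaultdict

def hetatm(serial, name, res, chain, seq, x, y, z, element):
    """Format one HETATM record in the PDB fixed-column layout (made-up values)."""
    return (f"HETATM{serial:5d} {name:<4} {res:>3} {chain}{seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{20.0:6.2f}          {element:>2}")

def het_groups(pdb_text):
    """Map (chain, residue number, HET name) to the element symbols of its atoms."""
    groups = defaultdict(list)
    for line in pdb_text.splitlines():
        if line.startswith("HETATM"):
            key = (line[21], int(line[22:26]), line[17:20].strip())
            groups[key].append(line[76:78].strip().upper())
    return dict(groups)

def het_names_with(element, groups):
    """Count HET groups, per HET name, that contain at least one atom of `element`."""
    counts = defaultdict(int)
    for (_chain, _seq, het_name), elements in groups.items():
        if element in elements:
            counts[het_name] += 1
    return dict(counts)

# Hypothetical records: two irons deposited as separate FE ions, one FEO group, one water.
SAMPLE = "\n".join([
    hetatm(1, "FE", "FE", "A", 201, 10.0, 10.0, 10.0, "FE"),
    hetatm(2, "FE", "FE", "A", 202, 13.2, 10.0, 10.0, "FE"),
    hetatm(3, "FE1", "FEO", "B", 301, 20.0, 5.0, 5.0, "FE"),
    hetatm(4, "FE2", "FEO", "B", 301, 23.1, 5.0, 5.0, "FE"),
    hetatm(5, "O", "FEO", "B", 301, 21.5, 6.0, 5.0, "O"),
    hetatm(6, "O", "HOH", "A", 401, 30.0, 30.0, 30.0, "O"),
])

groups = het_groups(SAMPLE)
print(het_names_with("FE", groups))  # {'FE': 2, 'FEO': 1}
```

In this toy input the two FE records could equally be a dinuclear site or two unrelated mononuclear ions; the HET name alone cannot tell, so any classification has to go back to the coordinates.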
Such important information as ligand protonation and metal oxidation states is often missing from the PDB entries and sometimes also from the original articles. Not only are different HET names used in PDB for the same compound (like HEM and HEC for haem c), but the same HET names are also used for different compounds (e.g. HEM may be either haem b or haem c; HEC is either haem c or hydroxyethyl, etc.).

PDB/HET derived resources

Since the databases derived from PDB use the HET compounds as defined in PDB, they inherit most of the problems discussed above. HIC-Up (Hetero-compound Information Centre - Uppsala) is a resource containing co-ordinates, dictionaries for a number of software packages (CNS, X-PLOR, TNT and O), and other relevant information for the HET compounds from PDB (Kleywegt and Jones, 1998).

ReLiBase (Hendlich, 1998) is a complete data management system comprising the object-oriented database handling protein–hetero compound structures derived from PDB, various query tools and a web interface. The 3D structures of HET compounds in ReLiBase are converted to ‘two-dimensional’ (2D) chemical structures. This feature allows 2D substructure and 2D similarity searches of the database. However, the 2D structure does not always represent the correct chemical structure due to protonation and bond type uncertainty intrinsic to the PDB (which keeps only geometric data). The protons are often missing from the PDB entries, and the algorithm that generates the 2D structure tends to fill the ‘free’ valences by bonds to implicit hydrogens. ReLiBase is commercialized by the Cambridge Crystallographic Data Centre (CCDC) but also remains freely accessible via WWW.

PDBsum (Laskowski et al., 1997) gives an at-a-glance overview of the contents of each PDB entry in terms of numbers of protein chains, HET compounds (including metal ions), etc. Among other goodies, PDBsum offers automatically produced LIGPLOT (Wallace et al., 1995) maps of compound–protein interactions.
‘Compounds’ here are not only separate HET groups but also HET–HET complexes (e.g. HEM-OXY). Unfortunately, such maps are not available yet for single metal ion–protein interactions.

The IMB Jena Image Library (Reichert et al., 2000) also includes a database of HET compounds and allows for various searches, including a very handy element search via the Periodic Table of Elements.

Chemical 3D structure databases

The Cambridge Structural Database (CSD) is one of the largest chemical resources currently available and the largest crystallography database, containing ∼190 000 entries (Allen and Hoy, 1998). It comprises a comprehensive archive of bibliographic, chemical (2D), molecular structure (3D) and crystal structure (3D) data for organic, inorganic and organometallic compounds. Another important 3D chemistry resource is the Inorganic Crystal Structure Database (ICSD) (Bergerhoff, 1998). It contains complete structural information for inorganic compounds abstracted from original journal articles, including compound name, molecular formula, crystal symmetry group, unit cell parameters, atomic coordinates, and temperature factors. IsoStar (Cole et al., 1998) incorporates experimental information on non-covalent interactions derived from the CSD and the PDB as well as molecular orbital calculations. IsoStar has great potential to address the protein–small compound (e.g. protein–drug) interactions. CSD and IsoStar are commercially distributed by CCDC; ICSD is distributed by FIZ Karlsruhe. In the UK, access to CSD and ICSD is free of charge to academic users of the Chemical Database Service (Fletcher et al., 1996).

I can name two reasons why a bioinorganic chemist should give special attention to CSD. First, the resolution generally achieved in crystallography of small compounds is significantly better than that of macromolecules. Therefore, the high quality structures of small compounds can be used to refine or validate the metalloprotein structures.
Second, CSD contains a number of structures of synthetic compounds mimicking the metalloprotein active centres. These structures could be considered to be BIMs!

Harding (1999) used CSD to systematically extract geometrical data relevant to metalloproteins, using specific 2D ‘bioinorganic’ queries. The queries included the six most common metals (Ca, Cu, Fe, Mg, Mn and Zn) and six classes of ligands (alcohols, carboxylates, imidazoles, phenolates, thiolates and water). Where appropriate, the bond type and length restrictions were applied. The method could be easily extended to use more complex queries specific for the metal environment in proteins (i.e. BIMs).

Need for biophysical databases

Apart from crystallography and NMR, there is a whole arsenal of other biophysical methods that can help reveal the structure of metalloprotein active centres even in the absence of 3D structures. BioMagResBank (Seavey et al., 1991) contains chemical shift data derived from ∼1530 proteins and peptides, including those for HET compounds. Unfortunately, there are no other publicly available databases containing the spectroscopic information. Similarly, the functional properties, most importantly the midpoint potentials of redox centres, await the bioinformaticians’ attention. The lack of standard data formats poses a problem, but one not serious enough to prevent the rapid colonization of this ‘ecological niche’ in the very near future.

BIM databases?

The Scripps Research Institute’s Metalloprotein site Database and Browser (MDB) contains geometrical and functional information on metal sites derived from 3D structures that allows the classification and search of particular combinations of site characteristics. The current release (MDB 1.4) consists of two databases:

• The ‘raw’ database is created by automatic recognition and extraction of quantitative information on metal sites from protein subsets of PDB.
This is a comprehensive database, containing information on about 4100 proteins.
• The ‘edited’ database includes 32 sites from representative Ca, Cu, Fe, Mn and Zn proteins and contains manually added information, such as function (structural, storage, electron transfer, O2 binding or catalytic).

The Java-based viewer provides an interactive query tool to both ‘raw’ and ‘edited’ databases. The ‘raw’ database can also be searched using either HTML forms or an SQL interface. Using these tools, complex queries involving metal identity, coordination geometry, number of ligands, type and number of protein-derived ligands, distance cutoff criteria, etc. can be made. MDB allows the interactive visualization of the metal centre and ligands contributing to the metal atom’s first coordination shell. It is not possible to visualize organic compound–protein interactions, such as in haem proteins.

Maria Karmirantzou and Janet Thornton have analysed 321 haemoproteins from the PDB comprising 13 non-homologous families and have created the specialized Protein–Haem Interactions database (Karmirantzou, 1998). Conformational analysis of haem included torsion angles, planarity and accessibility of the haem group. Analysis of polypeptide–haem interactions included amino acid propensities, constraints upon the haem conformation and secondary structure at the protein–haem interface. The results were made available on the Web but the database has not been updated since 1998.

PROMISE was intended to be a comprehensive information source on naturally occurring BIMs (Degtyarenko et al., 1998). Its focus is on protein active site structure and on the relationships between a polypeptide and a prosthetic centre. BIMs were used as a basis for classification of metalloproteins, as both alternative and complementary to those employed in other ‘secondary’ protein databases.
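Both the distance-cutoff queries mentioned above for MDB and any BIM-based classification start from the same step: collecting the atoms in the metal's first coordination shell. The sketch below is not the MDB or PROMISE implementation. It is a minimal illustration in which the 2.8 Å cutoff, the atom tuple layout, the function names and the idealised tetrahedral coordinates are all assumptions, and the output label follows the Zn(Cys)4 notation used earlier in this review.

```python
import math
from collections import Counter

def first_shell(metal_xyz, atoms, cutoff=2.8):
    """Return atoms within `cutoff` Å of the metal.

    `atoms` is a list of (residue_name, atom_name, element, (x, y, z)) tuples;
    the default cutoff is an arbitrary illustrative value.
    """
    return [atom for atom in atoms if math.dist(metal_xyz, atom[3]) <= cutoff]

def ligand_signature(metal, shell):
    """Build a compact label such as 'Zn(Cys)4' from the shell atoms.

    Counts ligating atoms per residue name, so a residue that contributes two
    atoms (e.g. a bidentate carboxylate) is counted twice.
    """
    counts = Counter(residue for residue, _name, _element, _xyz in shell)
    parts = "".join(f"({res}){n if n > 1 else ''}" for res, n in sorted(counts.items()))
    return f"{metal}{parts}"

# Idealised, made-up site: four Cys sulfurs placed tetrahedrally 2.3 Å from the
# metal at the origin, plus one distant water oxygen that should be excluded.
a = 2.3 / math.sqrt(3)
atoms = [
    ("Cys", "SG", "S", (a, a, a)),
    ("Cys", "SG", "S", (a, -a, -a)),
    ("Cys", "SG", "S", (-a, a, -a)),
    ("Cys", "SG", "S", (-a, -a, a)),
    ("HOH", "O", "O", (5.0, 0.0, 0.0)),
]

shell = first_shell((0.0, 0.0, 0.0), atoms)
print(ligand_signature("Zn", shell))  # Zn(Cys)4
```

A label of this kind captures only ligand identity and count; coordination geometry, bridging ligands and exogenous groups would need additional fields in any real data model.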
PROMISE presents the relevant sequence, 3D structural and physico-chemical information in a hierarchically organized collection of HTML documents. Unfortunately, PROMISE was discontinued in 1999 due to lack of funding.

In contrast to other databases reviewed, HAD (Heavy-Atom Databank) deals exclusively with non-natural metal-binding sites in proteins (Islam et al., 1998). The crystallographic methods of multiple isomorphous replacement and anomalous scattering use high quality heavy-atom derivatives of protein crystals. Ironically, it was the results of such analyses, i.e. models of the native proteins, that were deposited in the PDB while the structures of heavy-atom derivatives were discarded. The ‘heavy atoms’ in HAD are defined as those with an atomic mass greater than that of rubidium. HAD contains several file types, including coordinate files for the heavy-atom positions in PDB-compatible format, crystallization conditions files, compound data files and reference data files. For this review, the pairs of metalloprotein data files are of prime interest. One file of the pair contains information on the metalloprotein derivative with the native metal replaced, with details of type, quantity, function, coordination geometry, distances and angles between the substituted heavy atom and protein ligands. The second file has the analogous data for the native metalloprotein. Thus, HAD may be used to analyse the conformational change at the metal-binding site upon replacement and reveal the most ‘native-like’ heavy atom substituents.

PROCAT is a database of enzyme active site 3D templates created using the TESS (TEmplate Search and Superposition) algorithm (Wallace et al., 1997). The entries are classified according to the Enzyme Nomenclature. The templates include catalytically important and spatially conserved atoms or amino acid residues.
The templates may be viewed as 3D analogues of PROSITE or PRINTS patterns which could be used to search the PDB for similar sites. Since the same EC class can include non-homologous protein families, the case of entries containing more than one template should be envisaged. In reality, PROCAT does not cover even those EC classes which are well represented in the PDB. As so often happens, the progress here is limited by ‘people-ware’ (Hann and Green, 1999) and not by an algorithm. Note however that TESS may be used to search the PDB for any user-defined combination of atoms in space, i.e. it is not restricted to enzymes, polypeptides and macromolecules in general. Likewise, the inconsistencies of HET compounds do not pose a problem as long as the atomic model of the compound is correct. Therefore, TESS appears to be an ideal method to build a database of 3D BIMs!

Conclusion

Like other motifs in bioinformatics, BIMs reflect the similar features of a class of proteins. Thus BIMs may be used both for classification and for the search of other functionally related proteins. Although BIMs inhabit the major biological and chemical databases, they are not defined in any consistent way. Why, in spite of an abundance of experimental data on metalloprotein structure and function, is there no comprehensive database of BIMs? The reasons are the intrinsic complexity of data and lack of a data model capable of handling such complexity; insufficient interoperability of existing biological databases; and lack of standards (including terminology) for biochemical data in general. No reliable algorithm exists to predict BIMs from sequence data, so there are no ‘easy’ ways in which to populate the database. Compiling such a database is a complex and ambitious task requiring tens or hundreds of expert man-years, which could only be achieved by the close cooperation of international scientific communities.
However, once created, the database could be used to yield knowledge on metal–polypeptide interactions.

Bioinformatics of today deals primarily with the structure of biological macromolecules. In the next century it should be extended to encompass biochemical and biophysical (i.e. functional) data. In an ideal scenario, development of free biochemical, biophysical and BIM databases will go hand in hand with standardization activities, such as creation of controlled vocabularies of gene function and biological processes (White et al., 1999).

Acknowledgements

I am indebted to Prof V.Yu.Uvarov, who introduced me to the fields of bioinorganic chemistry and bioinformatics. I thank Katalin Nadassy, Gillian Adams and my anonymous reviewers for their helpful comments and suggestions on the manuscript.

Glossary

Apoprotein, the polypeptide component of a complex protein.
Bioinorganic motif, a common structural feature shared by functionally related proteins, consisting of the metal atom(s) and first coordination shell ligands.
Bridging ligand, an atom that donates two or more electron pairs to different central atoms in a polynuclear coordination entity; indicated by the symbol µ.
Coenzyme, a non-polypeptide compound involved in enzymatic reactions as a reactant capable of donating or accepting chemical groups or electrons.
Coordination geometry, arrangement of the ligands around the central atom.
Coordination number, the number of σ-bonds between the central atom and ligands.
Coordination shell (first coordination shell), the collective name for the ligands surrounding the central atom(s).
Diiron–carboxylate proteins, a group of proteins characterized by a binuclear iron centre bridged by carboxylate group(s) of Asp or Glu and oxide/hydroxide group(s).
Endogenous, polypeptide-derived.
Enzyme, a protein catalyst.
Exogenous, not derived from polypeptide.
Haem, an iron–porphyrin complex. Natural haems (a, b, c, d, d1, o) differ by substituents at various porphyrin positions.
Holoprotein, the functional complex protein.
Homology, common evolutionary ancestry.
Iron–molybdenum cofactor (FeMoco), the prosthetic group of nitrogenase MoFe protein.
Ligand (in a coordination entity), one of the atoms or chemical groups bound to the metal atom via a dative bond.
Midpoint potential (standard redox potential; E0 or Em), the redox potential of a system containing one mole each of the reduced and oxidized form of a compound. In biological systems, the E0 at pH 7 (E0′ or Em,7) is used as the reference.
Molybdenum cofactor (Moco), the metal (Mo or W) complex of molybdopterin. Moco functions as the prosthetic group of a number of oxidoreductases.
Monodentate ligand, the compound that can donate one electron pair to the central atom in a coordination entity.
Mononuclear, containing one metal atom within a coordination shell.
Photosystem II (PSII), a multi-subunit transmembrane protein complex in plants, algae and cyanobacteria that uses light energy to oxidise water to dioxygen.
Polydentate ligand, the compound that can donate more than one electron pair to the central atom in a coordination entity. The number n of ligating atoms is represented by the symbol ηⁿ.
Polynuclear, containing more than one metal atom within a single coordination shell.
Porphyrin, a macrocycle containing four pyrrole rings linked by single carbon atom bridges. Naturally occurring porphyrins form tight complexes with metal ions, such as Fe (haems), Mg (chlorophylls) and Ni (F430).
Prosthetic group, a non-polypeptide compound that conveys specific biological function to the holoprotein. Single metal ions, inorganic compounds, organic compounds and metal–organic complexes all may function as prosthetic groups.
Redox, abbreviation of oxidation–reduction.
Redox potential (E), a measure of the tendency of a redox system to donate or accept electrons.
Siderophores, small organic molecules involved in the specific uptake of iron in bacteria.

References

Allen,F.H. and Hoy,V.J. (1998) Cambridge Structural Database. In von Ragué Schleyer,P. (ed.), Encyclopedia of Computational Chemistry. John Wiley & Sons, Chichester, pp. 155–167.
Attwood,T.K., Croning,M.D.R., Flower,D.R., Lewis,A.P., Mabey,J.E., Scordis,P., Selley,J.N. and Wright,W. (2000) PRINTS-S: the database formerly known as PRINTS. Nucleic Acids Res., 28, 225–227. URL = http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/
Bairoch,A. (2000) The ENZYME database in 2000. Nucleic Acids Res., 28, 304–305. URL = http://www.expasy.ch/enzyme/
Bairoch,A. and Apweiler,R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45–48. URL = http://www.expasy.ch/sprot/
Barker,W.C., Garavelli,J.S., Huang,H., McGarvey,P.B., Orcutt,B.C., Srinivasarao,G.Y., Xiao,C., Yeh,L.S., Ledley,R.S., Janda,J.F., Pfeiffer,F., Mewes,H.W., Tsugita,A. and Wu,C. (2000) The Protein Information Resource (PIR). Nucleic Acids Res., 28, 41–44. URL = http://pir.georgetown.edu/
Bergerhoff,G. (1998) Inorganic three-dimensional structure databases. In von Ragué Schleyer,P. (ed.), Encyclopedia of Computational Chemistry. John Wiley & Sons, Chichester, pp. 1325–1337.
Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The protein data bank. Nucleic Acids Res., 28, 235–242. URL = http://www.rcsb.org/pdb/
Blalock,J.E. (1999) On the evolution of ligands: did peptides functionally precede metals and small organic molecules? Cell. Mol. Life Sci., 55, 513–518.
de Bolster,M.W.G. (1997) Glossary of terms used in bioinorganic chemistry. Pure Appl. Chem., 69, 1251–1303. URL = http://www.chem.qmw.ac.uk/iupac/bioinorg/
Bugg,T. (1997) An Introduction to Enzyme and Coenzyme Chemistry. Blackwell Science, Oxford.
Cole,J.C., Taylor,R. and Verdonk,M.L. (1998) Directional preferences of intermolecular contacts to hydrophobic groups. Acta Crystallogr. D, 54, 1183–1193.
Davidson,E.R. (ed) (2000) Computational transition metal chemistry. Chem. Rev., 100, 351–818. URL = http://pubs.acs.org/cgi-bin/jtocz?chreay/100/2
Degtyarenko,K.N., North,A.C.T., Perkins,D.N. and Findlay,J.B.C. (1998) PROMISE: a database of information on prosthetic centres and metal ions in protein active sites. Nucleic Acids Res., 26, 376–381. URL = http://bioinf.leeds.ac.uk/promise/
Deisenhofer,J. and Michel,H. (1992) High-resolution crystal structures of bacterial photosynthetic reaction centers. In Ernster,L. (ed.), Molecular Mechanisms in Bioenergetics. Elsevier, Amsterdam, pp. 103–120.
Deisenhofer,J., Huber,R. and Michel,H. Nobel Prize in Chemistry (1988) ‘for the determination of the three-dimensional structure of a photosynthetic reaction centre’. URL = http://www.nobel.se/laureates/chemistry-1988.html
Fletcher,D.A., McMeeking,R.F. and Parkin,D. (1996) The United Kingdom chemical database service. J. Chem. Inf. Comput. Sci., 36, 746–749. URL = http://cds.dl.ac.uk/cds/
Garavelli,J.S. (2000) The RESID Database of protein structure modifications: 2000 update. Nucleic Acids Res., 28, 209–211. URL = http://pir.georgetown.edu/pirwww/dbinfo/resid.html
Goto,S., Nishioka,T. and Kanehisa,M. (1998) LIGAND: chemical database for enzyme reactions. Bioinformatics, 14, 591–599. URL = http://www.genome.ad.jp/dbget/ligand.html
Hann,M. and Green,R. (1999) Chemoinformatics—a new name for an old problem? Curr. Opin. Chem. Biol., 3, 379–383.
Harding,M.M. (1999) The geometry of metal–ligand interactions relevant to proteins. Acta Crystallogr. D, 55, 1432–1443.
Harrison,M.D., Jones,C.E., Solioz,I. and Dameron,C.T. (2000) Intracellular copper routing: the role of copper chaperones. Trends Biochem. Sci., 25, 29–32.
Hendlich,M. (1998) Databases for protein–ligand complexes. Acta Crystallogr. D, 54, 1178–1182.
Hofmann,K., Bucher,P., Falquet,L. and Bairoch,A. (1999) The PROSITE database, its status in 1999. Nucleic Acids Res., 27, 215–219. URL = http://www.expasy.ch/prosite/
Holm,R.H., Kennepohl,P. and Solomon,E.I. (1996) Structural and functional aspects of metal sites in biology. Chem. Rev., 96, 2239–2314.
Huang,W., Jia,J., Cummings,J., Nelson,M., Schneider,G. and Lindqvist,Y. (1997) Crystal structure of nitrile hydratase reveals a novel iron centre in a novel fold. Structure, 5, 691–699.
Hubbard,T.J.P., Ailey,B., Brenner,S.E., Murzin,A.G. and Chothia,C. (1998) SCOP, structural classification of proteins database: applications to evaluation of the effectiveness of sequence alignment methods and statistics of protein structural data. Acta Crystallogr. D, 54, 1147–1154. URL = http://scop.mrc-lmb.cam.ac.uk/scop/
Islam,S.A., Carvin,D., Sternberg,M.J. and Blundell,T.L. (1998) HAD, a data bank of heavy-atom binding sites in protein crystals: a resource for use in multiple isomorphous replacement and anomalous scattering. Acta Crystallogr. D, 54, 1199–1206. URL = http://www.bmm.icnet.uk/had/
IUBMB, (1992) Enzyme Nomenclature: Recommendations (1992) of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology. Academic Press, San Diego.
IUPAC-IUBMB, Joint Commission on Biochemical Nomenclature (JCBN) and Nomenclature Committee of IUBMB (NC-IUBMB) Newsletter, (1999) Eur. J. Biochem., 264, 607–609. URL = http://www.chem.qmw.ac.uk/iubmb/newsletter/1999/item3.html
Jernigan,R., Raghunathan,G. and Bahar,I. (1994) Characterization of interactions and metal ion binding sites in proteins. Curr. Opin. Struct. Biol., 4, 256–263.
Junker,V.L., Apweiler,R. and Bairoch,A. (1999) Representation of functional information in the SWISS-PROT data bank. Bioinformatics, 15, 1066–1067.
Karlin,K.D. (1993) Metalloenzymes, structural motifs, and inorganic models. Science, 261, 701–708.
Karmirantzou,M. (1998) Computational approaches to protein–ligand interactions: protein–haem complexes, PhD Thesis, University College London.
Kleywegt,G.J. and Jones,T.A. (1998) Databases in protein crystallography. Acta Crystallogr. D, 54, 1119–1131. URL = http://xray.bmc.uu.se/hicup/
Laskowski,R.A., Hutchinson,E.G., Michie,A.D., Wallace,A.C., Jones,M.L. and Thornton,J.M. (1997) PDBsum: a web-based database of summaries and analyses of all PDB structures. Trends Biochem. Sci., 22, 488–490. URL = http://www.biochem.ucl.ac.uk/bsm/pdbsum/
Lippard,S.J. and Berg,J.M. (1994) Principles of Bioinorganic Chemistry. University Science Books, Mill Valley.
Longhi,S., Czjzek,M. and Cambillau,C. (1998) Messages from ultrahigh resolution crystal structures. Curr. Opin. Struct. Biol., 8, 730–737.
Lu,Y. and Valentine,J.S. (1997) Engineering metal-binding sites in proteins. Curr. Opin. Struct. Biol., 7, 495–500.
Macromolecular Structure Database. URL = http://msd.ebi.ac.uk/
Metalloprotein site Database and Browser. URL = http://metallo.scripps.edu/
Meyer,E.F. (1997) The first years of the protein data bank. Protein Sci., 6, 1591–1597.
Nomenclature Committee of the International Union of Biochemistry (NCIUB), (1991) Nomenclature of electron-transfer proteins. Recommendations 1989. Eur. J. Biochem., 200, 599–611.
Ogino,H., Inomata,S. and Tobita,H. (1998) Abiological iron–sulfur clusters. Chem. Rev., 98, 2093–2122.
Orengo,C.A., Martin,A.M., Hutchinson,G., Jones,S., Jones,D.T., Michie,A.D., Swindells,M.B. and Thornton,J.M. (1998) Classifying a protein in the CATH database of domain structures. Acta Crystallogr. D, 54, 1147–1154. URL = http://www.biochem.ucl.ac.uk/bsm/cath/
Orengo,C.A., Todd,A.E. and Thornton,J.M. (1999) From protein structure to function. Curr. Opin. Struct. Biol., 8, 374–382.
Perutz,M.F. and Kendrew,J.C. Nobel Prize in Chemistry (1962) for their studies of the structures of globular proteins. URL = http://www.nobel.se/laureates/chemistry-1962.html
Reichert,J., Jabs,A., Slickers,P. and Sühnel,J. (2000) The IMB Jena Image Library of Biological Macromolecules. Nucleic Acids Res., 28, 246–249. URL = http://www.imb-jena.de/IMAGE.html
Schomburg,D., Schomburg,I., Chang,A. and Bänsch,C. (1999) BRENDA: the information system for enzymes and metabolic information. In Proceedings of the German Conference on Bioinformatics 1999. URL = http://www.brenda.uni-koeln.de/
Seavey,B.R., Farr,E.A., Westler,W.M. and Markley,J.L. (1991) A relational database for sequence-specific protein NMR data. J. Biomol. NMR, 1, 217–236. URL = http://www.bmrb.wisc.edu/
Valentine,J.S. and O’Halloran,T.V. (1999) Bio-inorganic chemistry: what is it, and what’s so exciting? Curr. Opin. Chem. Biol., 3, 129–130.
Wallace,A.C., Laskowski,R.A. and Thornton,J.M. (1995) LIGPLOT: a program to generate schematic diagrams of protein-ligand interactions. Protein Eng., 8, 127–134.
Wallace,A.C., Borkakoti,N. and Thornton,J.M. (1997) TESS: a geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites. Protein Sci., 6, 2308–2323. URL = http://www.biochem.ucl.ac.uk/bsm/PROCAT/PROCAT.html
White,J.A., Apweiler,R., Blake,J.A., Eppig,J.T., Maltais,L.J. and Povey,S. (1999) Report of the second international nomenclature workshop. Cambridge, United Kingdom, May 1–2, 1999. Genomics, 62, 320–323. URL = http://www.gene.ucl.ac.uk/nomenclature/INW2.html
Williams,R.J.P. (1997) The natural selection of the chemical elements. Cell. Mol. Life Sci., 53, 816–829.