Download Analysis and comparison of metabolic pathway databases

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Concurrency control wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Functional Database Model wikipedia , lookup

Database wikipedia , lookup

Relational model wikipedia , lookup

Clusterpoint wikipedia , lookup

Object-relational impedance mismatch wikipedia , lookup

Healthcare Cost and Utilization Project wikipedia , lookup

Database model wikipedia , lookup

Transcript
Ulrike Wittig
works as a research associate
in the Scienti®c Database and
Visualisation group of the
European Media Laboratory
GmbH (EML) Heidelberg,
Germany. Her research
interests include modelling and
visualisation of biochemical
pathways, information
extraction from biological data
sources and data integration
from biological databases.
Ann De Beuckelaer
is a member of the Scienti®c
Database and Visualisation
Group of the European Media
Laboratory GmbH (EML) in
Heidelberg looking at the
electronic representation of
biochemical pathways. Her
interest is the development of
data models allowing the
complete biochemical system
to be represented for
organisms of any kind.
Keywords: database,
pathway, metabolism,
enzyme, genome,
bioinformatics
Analysis and comparison of
metabolic pathway databases
Ulrike Wittig and Ann De Beuckelaer
Date received: 10th January 2001
Abstract
Enormous amounts of data result from genome sequencing projects and new experimental
methods. Within this tremendous amount of genomic data 30±40 per cent of the genes being
identi®ed in an organism remain unknown in terms of their biological function. As a
consequence of this lack of information the overall schema of all the biological functions
occurring in a speci®c organism cannot be properly represented. To understand the functional
properties of the genomic data more experimental data must be collected. A pathway database
is an effort to handle the current knowledge of biochemical pathways and in addition can be
used for interpretation of sequence data. Some of the existing pathway databases can be
interpreted as detailed functional annotations of genomes because they are tightly integrated
with genomic information. However, experimental data are often lacking in these databases.
This paper summarises a list of pathway databases and some of their corresponding biological
databases, and also focuses on information about the content and the structure of these
databases, the organisation of the data and the reliability of stored information from a
biological point of view. Moreover, information about the representation of the pathway data
and tools to work with the data are given.
Advantages and disadvantages of the analysed databases are pointed out, and an overview to
biological scientists on how to use these pathway databases is given.
INTRODUCTION
Ulrike Wittig,
European Media Laboratory,
Villa Bosch,
Schloss-Wolfsbrunnenweg 33,
69118 Heidelberg,
Germany
Tel: 49 6221 533117
Fax: 49 6221 533298
E-mail:
[email protected]
126
Initially, biological databases were created
to store information gained during a
particular project. As a consequence, all
the data are stored in large and widely
distributed databases, with heterogeneous
data formats and access mechanisms. In
most of the review articles published on
biological databases1±3 a critical analysis of
the data is missing. However, to extract
and use data from heterogeneous data
sources it is essential for a user to have
information about the origin of the data,
updates, redundancies and reliability. This
paper primarily focuses on pathway
databases and critically analyses their
structure, the content of information
being stored and especially the reliability
of the data. Pathway databases are closely
related to enzyme databases and this
means that some of the enzyme databases
were also checked according to their
biological content. The fast-growing
amount of data, creation of new databases
and changes in existing databases on the
Internet means that the analysis is
inevitably incomplete.
PATHWAY DATABASES
Databases developed in the last decades
can be classi®ed into different categories,
including genome databases, protein
databases, enzyme databases, pathway
databases, literature databases and some
very specialised databases. This
classi®cation of databases is often based on
their biological content. Although the
content of the databases is mostly
restricted to speci®c biochemical
compounds or functions, a lot of overlap
occurs. For example protein sequence
information is listed in genome databases
and functional data of corresponding
enzymes can be found in protein and
pathway databases.
While analysing databases containing
information about biochemical reactions a
distinction must be made between
& HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 126±142. MAY 2001
Analysis and comparison of metabolic pathway databases
what is there?
enzyme databases and the more complex
pathway databases. Enzyme databases
mainly contain information about
enzymes and their properties. On the
other hand pathway databases re¯ect
information about reactions and pathways
in general, data related to organismspeci®c information about genes, their
related gene products, protein functions,
expression data, data about enzymatic
activities, kinetic data, etc. In most cases
the above-mentioned enzyme databases
are closely connected with pathway
databases (Figure 1).
Pathway databases can also be
subclassi®ed into databases containing
metabolic and/or signalling pathways.
This review will mainly concentrate on
metabolic pathway databases because it is
dif®cult to compare the two types of
content. Currently, the most commonly
used pathway databases include WIT,
KEGG, EcoCyc, ExPASy±Biochemical
Pathways, PathDB and UM-BBD
(Table 1). These pathway databases were
selected and analysed particularly with
regard to their biological/biochemical
contents, data structures and data
representation.
One major advantage of pathway
databases over other biological databases is
the possibility of providing several types
of information in the context of the
graphical representation of pathways. For
example pathway databases are able to
represent the high complexity of all the
biochemical reactions within a single cell
or a complete organism. There are
different ways to show a graphical view of
a pathway. Graphical representations of
the pathways are drawn manually and are
non-dynamic (KEGG, WIT) or are
produced automatically on demand and
are user-dependent (PathDB, EcoCyc).
For a user-dependent interaction, ¯exible
representations will be more and more
important because they enable different
levels of information to be represented.
To avoid information overload, but on
the other hand to be able to differentiate
between general overviews and detailed
representations of reactions or pathways
containing structural formulas of chemical
compounds, users should be able to zoom
into their speci®c level of interest. A
challenge of pathways representation is to
combine a ¯exible drawing of pathways,
with the content behind all the elements
within that given pathway, without
loosing clarity. Therefore the following
pathway databases were checked for their
graphical representation, including issues
about the availability of automatic
drawings and the possibility of userdependent changes and interactions.
Furthermore, tools available to work with
the data were analysed.
WIT (WHAT IS THERE?)
Content of the database
Figure 1: The connectivity of the main database types
The WIT system connects data about
genes and genomes, enzymes, reactions
and pathways.4,5 The Enzyme and
Metabolic Pathways (EMP) database is
currently embedded within WIT. The
EMP data are derived from about 17,000
literature sources and are organised in a
number of tables. Organism-speci®c
information was extracted manually from
the literature, including protocols of
experiments (puri®cation steps, storage
conditions, etc.) and reaction mechanisms
& HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 126±142. MAY 2001
127
Wittig and De Beuckelaer
Table 1: Pathway databases and relevant databases for data retrieval in pathway databases
Classi®cation
Content
Example databases
Pathway databases
Pathway databases
Metabolic and/or signalling
pathways, reaction participants,
relations between genes,
enzymes and reactions
KEGG
http://www.genome.ad.jp/kegg/kegg.html
WIT
http://wit.mcs.anl.gov/WIT2/
EcoCyc/MetaCyc
http://ecocyc.pangeasystems.com/ecocyc/
ExPASy±Biochemical Pathways http://expasy.proteome.org.au/cgi-bin/search-biochemindex
CSNDB
http://geo.nihs.go.jp/csndb/
PathDB
http://www.ncgr.org/software/pathdb/
UM-BBD
http://www.labmed.umn.edu/umbbd/index.html
SPAD
http://www.grt.kyushu-u.ac.jp/spad
Relevant databases often used for data retrieval in pathway databases
Genome databases Nucleotide sequences,
GenBank
functional annotations
EMBL Database
DataBank of Japan
TIGR
MGDB
FlyBase
Protein and protein- Protein sequences and their
SWISS-PROT/TrEMBL
related databases characteristics, functional
PROSITE
annotations
SCOP
PDB
Pedant
Transfac
PIR
Enzyme databases
Enzymatic properties of
ExPASy±ENZYME
proteins
BRENDA
EMP
Chemical databases Chemical properties of
Beilstein
compounds
ChemFinder
Literature database Abstracts of scienti®c literature Medline/PubMed
(de®ned as a network of elementary
reactions).
Access to the data
WIT offers a general functional overview
as one possible starting point represented
as a classi®cation table of pathways.
Unfortunately this classi®cation is not
consistent with other pathway databases.
For instance WIT has many different
types of glycolysis dependent on the initial
substrate and/or ®nal product. To ®nd a
speci®c pathway some detailed
information about the pathway of interest
is necessary. Since pathway names contain
a mixture of abbreviations and systematic
names6 they are often very long and
incomprehensible.
Querying the EMP database for
enzymes by name or EC number can be
improved by de®ning compartments, cell
types, tissues or organisms. Pathways can
128
http://www.ncbi.nlm.nih.gov/
http://www.ebi.ac.uk/embl/index.html
http://www.ddbj.nig.ac.jp/
http://www.tigr.org/tdb/
http://mbgd.genome.ad.jp/
http://¯y.ebi.ac.uk:7081
http://www.ebi.ac.uk/swissprot/index.html
http://www.expasy.ch/prosite/
http://scop.mrc-lmb.cam.ac.uk/scop/
http://www.rcsb.org/pdb/
http://pedant.mips.biochem.mpg.de/
http://transfac.gbf-braunschweig.de/TRANSFAC/index.html
http://www-nbrf.georgetown.edu/pirwww
http://www.expasy.ch/enzyme/
http://www.brenda.uni-koeln.de/
http://wit.mcs.anl.gov/EMP
http://www.beilstein.com/
http://www.chem®nder.com
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?dbˆPubMed
be searched by selecting compartments for
initial substrate, intermediate and ®nal
products, etc. Furthermore, organisms,
cell types or tissues can also be selected in
the general query system. Since the query
results contain internal accession numbers
for enzymes and abbreviated names for
pathways they are dif®cult to interpret.
Data representation
The `View Annotation' window gives
names, EC number and functional
description of an enzyme. Graphical
representations contain comments and
information on the compartments where
the reaction occurs and links to
information about the enzyme. The
descriptive page of enzymes is linked to
KEGG, EMP and Medline. However, the
enzyme information derived from KEGG
contains some slight differences from the
original KEGG project, eg links to KEGG
& HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 126±142. MAY 2001
Analysis and comparison of metabolic pathway databases
Kyoto Encyclopedia of
Genes and Genomes
maps, 3D-structure information from
PDB (Protein Data Bank) and links to the
BRENDA enzyme database are removed,
but accession numbers to the PIR
(Protein Information Resource) database
are given. The link to the EMP database
allows the retrieval of literature-related
and organism-speci®c enzyme
information.7
The `Diagram Data' window contains
detailed information on the overall
balance of the reaction, every compound
involved in the reaction and the location
or compartment of the cell where the
reaction occurs. These data are used for
the graphical representation of the
pathways that are represented in the
`Diagram Picture' view. The `Assertions
Table' contains information about
existing enzymes of a given pathway in
complete sequenced organisms.
One major disadvantage of the WIT
system is that there is no graphical
representation of all pathways in a general
overview.
Reliability/inconsistency
WIT uses a very comprehensive schema
for classifying biochemical pathways.
Unfortunately this classi®cation system is
not comparable with classi®cations used
in other databases (Figure 2). Going into
the detailed list of pathways the user is
confronted with very long pathway names
that contain abbreviations of start and/or
®nal compound of the pathway. For
example the catabolism of glucose (from
glucose to pyruvate) is named
`GLCPYR.CAT' but names such as
`GLCPYRADPATPNAD.CAT' also
occur, where it is very dif®cult to
understand the meaning.
Additional tools
Homology searches for genes or enzymes
and a pathway search option are
implemented. The result of these queries
contains only enzymes and the number of
matches for each pathway name but no
graphical representation.
KEGG
Content of the database
KEGG (Kyoto Encyclopedia of Genes
and Genomes) contains all known
metabolic pathways and a limited number
of regulatory pathways and transport
mechanisms.8,9 The KEGG system
consists of three main databases which are
tightly connected: LIGAND with
information about compounds, enzymes
and reactions stored in ¯at ®les,10,11
PATHWAY, which contains the
graphical representations of the pathways
and lists of enzymes and reactions within
the pathways, and GENES, which
contains organism-related genome and
gene information and lists of genes within
an organism and pathway. Furthermore
KEGG provides many links to other
databases that are integrated within the
DBGET integrated database retrieval
system.12
KEGG's gene and genome data are
based on genome databases containing all
publicly available nucleotide sequences
and their functional assignments.
Biochemical data of pathways, reactions,
etc., for instance compound and enzyme
names, reactions or comments, are based
on IUPAC (International Union of Pure
and Applied Chemistry) and IUBMB
(International Union of Biochemistry and
Molecular Biology) recommendations.13
Chemical structures of compounds are
stored in two different formats: a GIF
image format and 2D coordinates stored
in an MDL-MOL ®le (®le format for
molecular structures from MDL
Information Systems Inc.), which can be
used to launch a proper ISIS/Draw
application.
Access to the data
KEGG's main menu offers several
different ways to enter the pathway
systems: one can select an organism from
a list of complete or partially sequenced
organisms, or select a pathway from the
classi®cation of pathways (subdivided into
metabolic and regulatory pathways), or
search for a speci®c compound or enzyme
& HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 126±142. MAY 2001
129
Wittig and De Beuckelaer
in the LIGAND database or for a gene in
the GENE database. The query interfaces
do not allow complex queries; for
example, a query for all known enzymes
that use a de®ned substrate.
The general overview of all pathways
using a colour-code according to the
classi®cation of pathways enables the
initial entry of the program. This allows
access to speci®c pathway classes, eg
carbohydrate metabolism. A window
opens which represents links to all speci®c
pathways within carbohydrate
metabolism.
Data representation
Pathways in KEGG are classi®ed
according to the chemical structures of
their main compounds, eg carbohydrates,
lipids, amino acids. All speci®c pathway
maps and overviews are manually drawn
pictures where pathway maps consist of
links to speci®c information about
compounds, enzymes and genes. Pathway
maps contain all known reactions
catalysed by proteins/enzymes derived
from gene products. Non-enzymatic
reactions are not always included in that
system. However, in some cases nonenzymatic reactions are shown within a
pathway map but do not have links to
detailed information of this speci®c
reaction. Reactions within the pathway
maps do not represent side compounds,
eg ATP (adenosine triphosphate) or
NADH (reduced nicotinamide adenine
dinucleotide). Pathway maps are also
linked to some other related pathways
connected by their contributing
compounds. This allows the user to get an
overview about connections to other
pathways, eg one main ®nal product of
glycolysis ± pyruvate ± is used in the
different pathways of amino acid
metabolism. Unfortunately not all
possible connections are given.
As well as the reference pathway the
user can select pathway information of a
speci®c organism. Within the reference
pathway all the enzymes that are known
to be expressed in the speci®c organism
(found by homology search) are
130
highlighted in green. Furthermore other
selections for speci®c information are also
available,10 eg highlighting of all enzymes
with identi®ed 3D structure, enzymes
with sequences in the SWISS-PROT,
GenBank or PIR databases, and enzymes
that have a link to genetic diseases in the
OMIM (Online Mendelian Inheritance in
Man) database.
Genome maps are represented as
graphics that give information about gene
positions and their relationship with the
pathways. Gene catalogues contain
hierarchical texts that include all known
genes for each organism, listed according
to the pathway classi®cation. Information
about orthologous genes are stored in
tables containing the information of a
conserved, functional unit in a pathway, a
comparative list of genes for the
functional unit in different organisms, and
the positional information of genes
clustered in each genome.
Molecule catalogues are represented by
hierarchical texts containing
macromolecules (including proteins and
RNAs) and small chemical compounds.
These data are based on classi®cations of
enzymes and chemical compounds.
Enzymes are classi®ed based on IUPAC/
IUBMB recommendations and
additionally according to PIR
superfamilies, SCOP (Structural
Classi®cation of Proteins) 3D-folds or
PROSITE (database of protein families
and domains) motifs. The pathway query
result shows all pathways containing a
given enzyme or compound but offers no
graphical representation. Organismspeci®c information is also not available
within such results. Searching for a
pathway by selecting a speci®c organism
does not provide information about the
enzymes available but only links to the
gene information related to that organism.
A comparison of pathways of two or
more organisms cannot be implemented.
Enzyme pages contain the following
information: names, EC classi®cation
according to IUBMB, reactions and their
participants (substrates, products,
cofactors, inhibitors, etc.), pathways,
& HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 126±142. MAY 2001
Analysis and comparison of metabolic pathway databases
related genes in organisms and links to
diseases (OMIM), motifs (PROSITE) and
3D structures (PDB).
Compounds are represented with
names, chemical and structural formula,
pathways in which the compound occurs
and enzymes that catalyse a reaction
containing the compound as participant.
Reliability/inconsistency
KEGG does not contain all possible
reactions catalysed by an enzyme. When
comparing reaction equations with the
lists of substrates and products,
inconsistencies sometimes arise, eg the
substrate list of pyruvate kinase (EC
2.7.1.40) contains more compounds than
those listed in the reaction equations.
Often reactions catalysed by an enzyme,
eg alcohol dehydrogenase (EC 1.1.1.1),
with a wide substrate speci®city are given
as general reaction equations, which
means loss of detailed information
concerning the correlating products of
each substrate. Moreover, enzymes are
represented with their substrates and
products, but reaction equations and
pathway information are missing, eg
peptidases.
Within compounds some errors and
inconsistencies arise. For example
sometimes one compound ID is given for
two different compounds (eg C00023 for
both Fe2‡ and Fe3‡ ) or different
compound IDs are given for the same
compound (C02038 and C02156 for
glycyl-peptide or C04230 and C01174 for
1-acyl-glycero-3-phosphocholine among
others).
Additional tools
EcoCyc/MetaCyc
KEGG offers some tools for querying
within different maps and also methods to
compute with KEGG data. Tools for
searching enzymes (EC numbers ± names
not possible), compounds (compound ID
± names not possible) or genes (gene
name or accession number) are available.
Within genome maps queries for gene
positions or homologous gene clusters are
available as well as comparisons of two
genome maps for exhaustive search of
homologous gene clusters. Tools for
colouring and thereby highlighting the
genes in the pathway or genome map
make the system user-friendly and enable
a quick overview on the result of the
query.
One unique feature of the KEGG
system is the creation of a pathway
between two given compounds.10 As a
starting point, two compound IDs, the
parameter level and hierarchy of
relaxation have to be de®ned.
Unfortunately there is no help function or
documentation about the parameters.
Based on binary relations two algorithms
(Dijkstra/Floyd) are used to ®nd the
shortest way between two compounds. In
such cases where a reaction has two or
more substrates the implementation is not
able to distinguish between different
substrates and their correlating products
or compartments and transport
mechanisms. Pathways are created based
on one main substrate without paying
attention to side substrates or products,
compound/enzyme locations and
cofactors. Results of the pathway creation
contain only the compound ID and EC
number. For users who are not familiar
with the compound IDs used in KEGG it
is very time-consuming to understand the
resulting pathway since there is no
graphical representation of the search
results. This tool does not apply
organism-speci®c searches and creates
pathways containing all possible reactions
regardless of whether the speci®c enzyme
has been identi®ed in the speci®c
organism or not. Highlighting these
identi®ed enzymes and proven pathways
among them would be very helpful.
Automatic reconstruction of organismspeci®c metabolic pathways can be
accomplished by matching the genes
corresponding to the enzymes in the gene
catalogues with the enzymes on the
reference pathway diagrams.
ECOCYC/METACYC
Content of the database
EcoCyc is a metabolic-pathway database
that describes the genome and the
& HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 126±142. MAY 2001
131
Wittig and De Beuckelaer
biochemistry of Escherichia coli based on
information from the EcoGene database,14
SWISS-PROT and the scienti®c
literature.15,16 The database consists of all
sequences and functional annotations of
E. coli genes. Pathways of E. coli and its
reactions and enzymes are annotated with
references to the literature. In addition
MetaCyc describes pathways, reactions
and enzymes of a variety of organisms,
with a microbial focus. But MetaCyc does
not contain organism-speci®c genome or
protein information such as genomic
maps or sequences. MetaCyc uses the
same database schema and visualisation
software as EcoCyc.17
Access to the data
The EcoCyc query interface provides
search options for E. coli genes, proteins,
reactions, compounds and pathways by
names, sub-strings, classi®cation
hierarchy, EC number or chemical
structure. In contrast to EcoCyc the
MetaCyc query interface only offers
searching for pathways, reactions and
compounds. Proteins or genes can be
selected for querying but no matches will
be found by searching, for example, EC
numbers. The EcoCyc user is also able to
start the system from an E. coli metabolic
overview in addition to the query page.
MetaCyc offers no general pathway
overview. The EcoCyc overview is a
bird's-eye view of the complete E. coli
biochemical pathways.3
The glycolysis and TCA (tricarboxylic
acid) cycles in the middle separate
predominately catabolic pathways on the
right from pathways of anabolism and
intermediary metabolism on the left.
Signalling pathways run along the bottom
of the ®gure. Individual reactions at the
right of the ®gure represent reactions not
currently assigned to any pathway.
However, these single reactions without a
classi®cation are not very helpful in the
overview. The picture uses different icons
(circles, squares, triangles, etc.) for
different compound classes. If a mouse
pointer is assigned to a circle in the
diagram, the name of the corresponding
132
metabolite and the name of the related
pathway are displayed at the bottom of
the www browser window. This
operation will work only for JavaScriptenabled browsers. If such a system is not
available, the user needs biochemical
knowledge to recognise which graph can
represent the pathway. In that case the
metabolic overview will not help to get
detailed information. By clicking on a
compound the pathway containing the
selected compound will be displayed.
Unfortunately there is no way to get
compound information directly from the
overview window.
The metabolic overview is linked
directly to representations of speci®c
pathways or disconnected single reactions
but it is not possible to detect where the
pathway is located within the complex
overview. As in many other databases,
compartmentalisation and consequentially
transport mechanisms are missing.
Data representation
Pathways and reactions are drawn
automatically using graph-drawing
algorithms, and size may be adjusted. The
graphical representation of pathways
offers different levels of information. The
lowest level contains only the major
reactions of a given pathway. Information
about enzymes and genes, etc. is not
available. In the more detailed level all
enzymes with their corresponding EC
numbers and corresponding genes are
given. Some pathways show chemical
structures of the reactants on that level,
but this feature is not consistent. Some
pathways also include green arrows that
indicate regulatory mechanisms (dotted
lines) or connected pathways (solid lines).
Not all available connections between
pathways and reactions shown in the
metabolic overview picture are shown
within the pathway representation (eg
TCA cycle).
Within the graphical representation of
pathways, inhibitors and activators of
reactions are represented, but detailed
compound information is not accessible.
Names of the compounds are available in
& HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 126±142. MAY 2001
Analysis and comparison of metabolic pathway databases
PathDB
the bottom line of the JavaScript-enabled
www browser window.
Enzyme information consists of names,
enzyme classi®cation, catalysed reactions
and pathways in which the enzyme
occurs. More data about the enzyme are
given by connecting to the ExPASyENZYME database. Within EcoCyc a
graphical representation of the genereaction schema shows the relation
between E. coli genes coding for enzymes
or subunits. It also includes reactions that
can be catalysed by more than one
enzyme. A linkage of the enzyme to the
related gene or vice versa is implemented.
The E. coli genes are characterised by
their names, sequences, classi®cation,
positions within the genome and
information about the corresponding
enzymes. Links to the E. coli Stock
Center Database (CGSC)18 and automatic
queries in GenBank are installed.
Compound properties are represented
with names, chemical formulas, molecular
weights, CAS (Chemical Abstracts Service)
number and structural formulas, but these
formulas are not based on uniform
drawings. In addition a list of reactions in
which the compound occurs is given.
Reliability/inconsistency
The classi®cation of pathways refers to the
classi®cation of E. coli gene products,19,20
but is not used consistently. The
biosynthesis of amino acid families
includes, for example, the following
instances: `ile‡val biosynthesis',
`leu‡val‡ala‡ile biosynthesis' and
`leu‡val‡ile biosynthesis'. In such a case
a differentiation between these pathways
does not make sense.
Compounds are based on a de®ned
classi®cation, but this system of
compound classi®cation has some
discrepancies. For example it has the type
`unclassi®ed compounds' but within this
category compounds with known
classi®cation (eg phospholipids, steroids)
also occur. There is a category `animal
compounds' which includes hormones,
etc., but these compounds do not
participate in reactions of E. coli and are
therefore not included within speci®c
pathways. On the other hand, the
differences between animal compounds,
unclassi®ed and non-metabolic
compounds are unclear.
Additional tools
Unfortunately the query options are not
complex enough to search for reactions or
pathways by de®ning substrates, products,
etc.
One important feature of the EcoCyc
system is the possibility of importing
E. coli expression data and colour-coded
highlighting of reactions in the general
pathway overview according to the
relative or absolute amount of the gene
expressions.
PATHDB
Content of the database
PathDB is a general metabolic pathway
project of NCGR (National Center for
Genome Resources) representing
compounds, metabolic proteins (enzymes
or transport proteins), metabolic steps
(reactions) and pathways.21 Information
about locations within cells or organisms
are also given if available. PathDB also
includes descriptions of the kinetic,
thermodynamic and physicochemical
properties of pathway components based
on the scienti®c literature. The
integration of these data into PathDB
offers a good basis for the connection
with simulation tools.
Access to the data
A query interface allows the user to ®nd
pathways, enzymes, reactions or
compounds in the database by searching
given attributes (all possible roles of
compounds, kinetic types, modi®ers,
taxonomy, citations, etc.). A Java
application can be downloaded to run
complex queries. This interface makes it
possible to navigate between the relations
in various entities.
Data representation
The graphical representation of pathways
is based on graph-drawing algorithms,
& HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 126±142. MAY 2001
133
Wittig and De Beuckelaer
ExPASy-biochemical
pathways
including, hierarchical and circular views
using prede®ned positions for the
compounds. Zooming and scaling tools
are installed and the user can select speci®c
information. For instance enzymes can be
displayed either by their names, their EC
numbers or both. In addition the user can
choose whether primary and/or secondary
substrates and products are visible.
Depending on the amount of selected
information the representation can be very
complex and confusing because of
overlapping terms and arrows.
Compounds are characterised by their
names, chemical and structural formulas,
molecular masses and other
physicochemical data. The information
about chemical structures is stored as a
SMILES (Daylight Chemical Information
Systems, Inc.) string that can be used to
generate the 2D structure. Metabolic
proteins are characterised by their names,
modi®ers and physicochemical properties
and links are given to databases containing
sequence or structure information.
A reaction is de®ned by its substrates,
products and the stoichiometric equation.
They are displayed as metabolic steps that
also include transport mechanisms.
Pathways are built by sequences of
metabolic steps including their direction,
catalysts, modi®ers and location.
Modi®ers can also be subdivided into
activators and inhibitors.
Reliability/inconsistency
Unfortunately cross-references within the
pathway representation to other
connected pathways are missing. Besides
common and alternative names of a
compound, PathDB also contains a
speci®c compound label used for the
graphical representation. However, most
of the compound labels are not
commonly used abbreviations of
compound names.
Additional tools
One feature of PathDB is a tool for a
user-dependent pathway generation
between two compounds, so the database
is not restricted to prede®ned pathways.
134
EXPASY±BIOCHEMICAL
PATHWAYS
Content of the database
ExPASy (Expert Protein Analysis System)
of the Swiss Institute of Bioinformatics
consists of two databases related to
biochemical pathway information ±
ENZYME and Biochemical Pathways.22
ExPASy±Biochemical Pathways is the
electronic version of the Boehringer
Mannheim Biochemical Pathways
Chart.23 An electronic index allows the
localisation of any metabolite or enzyme
on the chart.
Access to the data
Querying for compounds, enzymes or
pathways by names or substrings results in
a list of related positions (squares) on the
graphical map. If a reaction is not
complete within that resulting square the
reaction is cut and the corresponding
information is lost. One of the four
adjacent squares can be selected to expand
the view.
Most enzyme names on the chart act as
links to the ExPASy±ENZYME database.
Data representation
Unfortunately, ExPASY does not
implement a general overview of all
pathways on the complete map.
ENZYME, like Biochemical Pathways,
does not include organism-speci®c
pathway information except for the usage
of three colours to distinguish pathways
related to higher plants (green), animals
(blue) or microorganisms (red). Black
arrows indicate general biochemical
pathways. The information about the
colour-coded representation of reactions
and compounds and other remarks related
to the pathway maps (eg explanations of
abbreviations) is only available on the
printed versions of the Boehringer
Mannheim wall chart.23,24 Since there is
no documentation within the electronic
version the user has to have one of the
printed versions for data interpretation in
the electronic version.
Enzymes on the electronic map are
& HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 126±142. MAY 2001
Analysis and comparison of metabolic pathway databases
connected to the enzyme properties
stored in ExPASy±ENZYME containing
names, reaction properties and links to
other enzyme or pathway databases.
Compounds on the map are not linked to
any source of chemical database.
Reliability/inconsistency
Within the graphical pathway
representation it is dif®cult to distinguish
the pathways according to their
classi®cation. Since only two squares of
the chart can be selected at the same time
it is not possible to view complete
pathways, eg glycolysis. On the electronic
as well as on the printed version the
pathway representation includes
redundancies because some enzymes or
compounds appear more than once on
the complete map. Links between
enzymes or compounds are not included,
which implies no connectivity between
reactions or pathways comprising the
same enzymes or compounds. This fact
disables an overview of the complexity
and connectivity of pathways.
Additional tools
The ExPASy system does not offer any
tools to navigate on the pathway map or
to search for reactions or pathways by
de®ning substrates, etc.
University of Minnesota
Biocatalysis/
Biodegradation
Database
UM-BBD
Content of the database
UM-BBD (University of Minnesota
Biocatalysis/Biodegradation Database)
contains information on reactions related
to biodegradation pathways of xenobiotic
compounds in microorganisms.25,26 It
provides information on pathways, single
reactions, chemical compounds including
intermediate states of the reactions,
organisms, enzymes and genes. The
system is connected to other resources
such as GenBank, ChemFinder, Medline,
etc.
Access to the data
A search function allows queries for
speci®c compounds or enzymes by
de®ning names or substrings, chemical
formula, CAS number or EC number
respectively. The search page also has
links to lists of pathways, compounds,
enzymes, reactions and organisms that can
be selected.
UM-BBD gives no general pathway
overview. Most of the pathways and
reactions are disconnected.
Data representation
When appropriate citations in the
literature exist UM-BBD generates new
pathway, reaction and compound pages
based on the literature source. The
representation of a pathway contains the
names of compounds and reactions and
gives links to the more detailed
information of the compounds on their
corresponding pages. Starting compounds
are used to de®ne the name of a pathway.
Organisms are listed if they are known to
initiate this pathway or carry out later
steps. If both genus and species names are
known, the organism name links to a
search of a microbial database. The
pathway also contains possible cofactors.
If compounds occur that are parts of an
intermediary, metabolism pathway links
to the KEGG database are given. The
compound that links to KEGG is shown
as a red circle in the KEGG pathway map,
whereas KEGG does not link to the UMBBD on its pathway pages. A graphical
representation of a reaction contains the
reacting atoms and bonds highlighted in a
bold red font. If literature information
about the reaction mechanism is available
a link to a graphic showing the reaction
mechanism appears. There are two
different types of pathway representations:
a text format representation and a static
drawing, which is more restrictive.
Enzymes of a reaction are linked to
GenBank, when one or more hits have
been found. Furthermore enzyme pages
in the KEGG-LIGAND, in the ExPASyENZYME and/or in the BRENDA
enzyme database are connected when a
four-digit EC number exists. If an
incomplete EC number occurs, ExPASy±
ENZYME gives a list of all possible
enzymes within that enzyme subclass. The
& HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 126±142. MAY 2001
135
Wittig and De Beuckelaer
enzyme pages contain a list of reactions
that are catalysed by the enzyme and links
to each of the static UM-BBD pathway
maps in which one or more reactions
catalysed by this enzyme appear.
The compound information includes
names and synonyms, chemical formula,
molecular mass, a SMILES format, CAS
number and links to ChemFinder. The
compound page also contains a list of all
reactions in the database which use or
produce this compound and links to the
UM-BBD pathway maps in which the
compound appears.
Reliability/inconsistency
One important feature is the plausibility
of a reaction or pathway. Pathways with
less evidence or unknown intermediates
are indicated in several ways and
mentioned in the pathway introduction.
If compounds are plausible, but not
certain, or are intermediates, they are put
in brackets. Multiple arrows indicate
unknown intermediates. Within the
introduction of a pathway or reaction
page the signi®cance of the pathway is
given. The names and af®liations of the
people who contributed the pathway to
the UM-BBD are important. This feature
is unique for this database system and
allows the user to contact the person in
question directly.
Additional tools
A dynamically generated pathway map
starting from a de®ned reaction can be
generated and is shown only in text
format representation. The created
pathway maps include all possible
reactions from all organisms found in the
UM-BBD and link to each of the static
pathway maps in which the reactions
appear.
Comparison of
databases
COMPARISON OF THE
DATABASES
A comparison of the analysed databases is
shown in Table 2. The simple molecule
`Ethanol' was selected as a criterion to
compare the database entries.
The result of this comparison points
136
out the following problems. WIT, KEGG
and ExPASy offer no speci®c queries for
compounds which, for example, results in
lists of enzymes or genes containing the
substring `ethanol'. Except for PathDB all
selected queries use the keyword as
substring and result in a list of entries.
From these lists `ethanol' can be selected
directly, except for WIT and ExPASy,
where the compound pages can only be
accessed by navigation, for example
through the enzyme pages. Unfortunately
in the case of `ethanol' ExPASy has no
implemented link to a corresponding
compound page. In general compound
information is available but not
consistently for all compounds.
The databases have different
preferences for contents within their
compound pages. Some of the databases
represent structural formulas or SMILES
strings. The databases also contain
different lists of names, which are written
with either capital or lower case letters.
Also within the same databases the types
of letters are used inconsistently. Only
KEGG and EcoCyc refer to pathways
containing this compound but the
pathways are assigned to different types of
classi®cations. The representation of
reactions containing the selected
compound is very inconsistent between
the databases. WIT has no reference to
reactions, KEGG refers to numbers of
enzymes that use the compound as
reactant (R), cofactor (C), inhibitor (I),
etc. and other databases refer to reaction
equations. One example of inconsistent
reactions within a database is shown in
EcoCyc. It contains reactions that use
`ethanol' as substrate and the resulting
product is `an aldehyde'. Ethanol is a
speci®c compound with molecular mass,
structural formula, etc. An aldehyde
represents a class of compounds that
contain the same functional group. Using
ethanol as the substrate only one possible
aldehyde (acetaldehyde) can be generated.
Note that, regarding the references on
the compound pages, UM-BBD gives
information about the author and date of
the entry. One piece of missing
& HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 126±142. MAY 2001
& HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 126±142. MAY 2001
137
C2 H6 O
‡
ÿ
64-17-5
Chemical formula
Structural formula
SMILES string
CAS number
ÿ
‡
CCO
ÿ
Ethanol, ethyl
alcohol, methylcarbinol
±
±
±
±
±
C2 H6 O
‡
ÿ
64-17-5
Absolute alcohol, alcohol,
etOH, eth, ethyl alcohol,
grain alcohol
No access to compound
Select `ethanol' from query
results
page (only position on the
pathway map), access to
compound pages through
enzyme pages where the
compound is used in the
reaction
Not available for ethanol !
C2 H6 O
ÿ
OCC
64-17-5
Alcohol, ethanol, ethyl
alcohol, ethyl hydroxide,
grain alcohol,
methylcarbinol, potato
alcohol, spirit, synasol,
wine spirit, cologne spirit
Select `ethanol' from query
results
1 compound
23 compounds
Select `ethanol' from query
results
Query compound by name
PathDB
Query compound by name
EcoCyc
General keywords search
(no speci®cation)
7 enzymes,
8 compounds
ExPASy
Query the LIGAND
database
23 enzymes,
51 compounds
KEGG
Compound page info:
Names
Ethanol, ethyl
alcohol, methylcarbinol
Search for compound `ethanol'
Query ®eld and
Search all data entries
speci®cation
(no compound speci®c query)
Number of hits
35 genes, 46 pathways, 23
enzymes, 21 topics in functional
overview
Access to compound No query for compounds
page
available, access to compound
pages through navigation of a
pathway (eg ethanol oxidation)
and diagram picture
WIT
continued overleaf
Anhydrol, alcohol,
methylcarbinol, denatured
alcohol, ethyl hydrate, ethyl
hydroxide, ethanol, algrain,
cologne spirit, fermentation
alcohol, grain alcohol, jaysol,
jaysol s, molasses alcohol,
potato alcohol, spirit, spirits of
wine, tecsol, alcohol
dehydrated, ethanol 200 proof,
cologne spirits (alcohol), sd
alcohol 23-hydrogen, Synasol
C2 H6 O
‡
CCO
64-17-5
Select `ethanol' from query
results
Search for compound whose
name contains the keyword
19 compounds
UM-BBD
Table 2: Comparison of the pathway databases (WIT, KEGG, ExPASy, EcoCyc, PathDB, UM-BBD) in regard to the search options and query results. `Ethanol' is used
as keyword to access compound information
138
& HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 126±142. MAY 2001
ÿ
ÿ
ÿ
NSC number: 85228
DaylightTM
Molecular mass
Pathway
Enzyme/reaction
Reference/link
WIT
Table 2: (Continued)
±
ÿ
Glycolysis/gluconeogenesis
1.11.1.6 (R)
2.3.1.152 (R)
3.1.1.67 (R)
3.5.1.75 (R)
KEGG
±
±
±
±
ExPASy
46.069
Fermentation
Acetaldehyde ‡ NADH ˆ
ethanol ‡ NAD
NAD ‡ ethanol ˆ
NADH ‡ an aldehyde or
ketone
NADP ‡ ethanol ˆ
NADPH ‡ an aldehyde
NADP ‡ ethanol ˆ
NADPH ‡ an aldehyde
H2O ‡ urethan ˆ
NH3 ‡ CO2 ‡ ethanol
H2O ‡ a long chain-acyl
ethyl ester ˆ ethanol
‡ a fatty acid
±
EcoCyc
±
46.08
ÿ
Ethanol ˆ acetaldehyde
Ethanol ˆ aldehyde
Urethane ˆ ethanol
‡ NH3
PathDB
ChemFinder
Natl. Toxicology Program
Page Author ‡ Date
46.07
ÿ
Ethyl acetate ˆ
acetate ‡ ethanol
Ethanol ˆ acetaldehyde
UM-BBD
Analysis and comparison of metabolic pathway databases
information on these compound pages in
most of the databases is a reference to the
classi®cation of compounds used, similar
to the classi®cation of enzymes on most
enzyme pages.
errors and
inconsistencies
ERRORS AND
INCONSISTENCIES
As can be seen within the abovementioned examples there are always
problems, errors and inconsistencies
within one single database and between
different databases.
Until now it has not been possible to
automate the detection of errors within
databases. For example errors such as
wrong structural formulas of compounds
or different compound names of the same
compound which also cause two different
compound entries in the database can
only be detected manually by checking
each single entry.
Typically manual databases will always
contain typing errors. In most cases such
errors do not change the meaning of the
content. But certainly there are some
cases where a typing error changes the
property of a protein and enzyme. The
enzyme L-ribulose-5-phosphate 4epimerase (EC 5.1.3.4) catalyses the
reaction between L-ribulose-5-phosphate
and D-xylulose-5-phosphate, according to
IUBMB recommendations.13 Databases
such as BRENDA, KEGG, PathDB and
WIT/EMP use this equation with the
above-mentioned substrate and product.
In contrast, the protein database SWISSPROT and ExPASy±ENZYME associate
the same enzyme to a reaction between Lribulose-5-phosphate and D-xylose-5phosphate. Obviously the product name
D-xy(lu)lose-5-phosphate contains a
typing error that changes the catalytic
property of the catalysing enzyme. Since
several other databases extract their data
from this source, the wrong relation
between the compound and the enzyme,
including all the detailed information on
the reaction, is transferred and spread
throughout other database systems.
Copying or linking data causes the
transfer of errors from one database to
another. Such errors are relatively easy to
detect for an expert user. In contrast
typing errors within gene or protein
sequences are not usually so obvious.
Other problems that occur within
databases are errors from the original
source or experimental problems. These
errors include sequencing errors, errors
from homology searches, structure
prediction, etc. and incorrect, incomplete
or wrong transfers from literature
information into the database. The last
case occurs by incorrect or incomplete
interpretation of experimental results or
by drawing wrong conclusions from the
literature sources that are incorporated
manually. The risk here is the decrease in
reliability in the statements and contents
of such systems. One way to avoid such
problems is to add the source of data and
the reference, which gives the database
user the chance to see the original results
from a homology search, etc. The
comparisons of predicted data between
databases are very error-prone because of
the use of different tools for prediction.
This especially affects data about 3D
structures, motifs, molecular mass or
isoelectric points of protein sequences.
Both database user and database
developer must remember that
information is always contextdependent.27 Data cannot be generalised
because of the dependence on
environmental conditions. An important
example is the recent expression data. In
this case the environmental conditions
and methods used are essential for the
interpretation of the data and are often
missing in the database. In the worst case
these data will never be trusted by expert
users and the database therefore virtually
useless.
Inconsistencies can be found by
comparing database entries from different
databases. Typically problems of
inconsistencies arise from names and
abbreviations used in different databases.
There is no consistent nomenclature for
genes and proteins unlike the enzyme
nomenclature recommended by the
IUBMB.13 Databases refer to different
& HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 126±142. MAY 2001
139
Wittig and De Beuckelaer
names, synonyms and abbreviations,
depending on the source of the data
including different types of spelling.
In addition to the names no consistent
nomenclature for functional descriptions
of genes and proteins exists. This
inconsistency is also transferred to
inconsistent classi®cations of pathways
(Figure 2). One example for the
classi®cation of functions came up with
the classi®cation of E. coli gene products
established by Monica Riley.19,20 A
de®ned nomenclature of functions similar
to the enzyme classi®cation is certainly a
great advantage.28 That should also imply
a standard de®nition of keywords within
comment lines ± understandable and
usable for different types of users. In this
context a biochemical classi®cation of
chemical compounds has to be installed.
There are different ways to classify
chemicals: either by the chemical
properties recommended by the IUPAC
or by functional properties. The last
option is being employed by most of the
databases described above and causes a
large degree of variability. For systems
representing biochemical pathways a
¯exible classi®cation is needed which
ideally combines chemical and functional
characteristics of the compound but still
has to be consistent throughout the
system.
Figure 2: Comparison of the classi®cation of Glycolysis (Embden±
Meyerhoff±Parnas pathway ± EMP) used in KEGG, EcoCyc and WIT
140
A demonstrative representation of the
connectivity of the main database types is
given in Figure 1. Genome and chemical
databases can be seen as the basis for all
other databases. Data from literature
databases enter other types of database.
Pathway databases are located above other
databases containing parts of data from the
databases below. Consequently pathway
databases might also contain the errors
made in the other databases. This
indicates the need for pathway database
developers to be especially aware of errors
and inconsistencies.
Expert knowledge is needed to avoid
or to detect errors and inconsistencies
within databases: a time-consuming and
expensive aspect.
SUMMARY
In general all analysed databases are far
from being complete. In terms of userfriendly access and the amount of data
incorporated in the database the KEGG
system clearly deserves the highest praise.
The more or less consistent nomenclature
of the compounds based on international
agreements such as the IUPAC and
IUBMB, as well as the easy-to-understand
classi®cation of compounds and reactions,
makes this system an easy tool for ®rst
access to pathway databases. However, for
users familiar with the biochemical
pathways of E. coli and with a JavaScriptenabled www browser, the EcoCyc
system offers the most information on that
speci®c organism since it also incorporates
direct links to the literature and
experimental data being used to set up the
system. For users interested in the
creation of new biochemical pathways
between different compounds the KEGG,
PathDB and the UM-BBD systems are
the most useful ones. Concerning
graphical representation of biochemical
pathways, those users already familiar with
the Boehringer Mannheim chart will
easily ®nd their way through the
ExPASy±Biochemical Pathways, whereas
absolute beginners will de®nitely prefer
the KEGG graphics.
One disadvantage of all these pathway
& HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 126±142. MAY 2001
Analysis and comparison of metabolic pathway databases
systems is the absence of help functions or
windows describing, for example, the
concepts and/or restrictions that one has
to de®ne when querying the data. In most
of the databases meta-information, for
example about the source or origin of the
data (eg experiments, homology search,
literature), is missing.29 The user is not
provided with information as to whether
the data are experimentally proven or not.
Since most of the data are predicted by
homology search, information about
algorithms and their parameters being
used for the homology search or the
degree of homology would be very useful
for data interpretation.
As regards problems of errors, sequence
databases were already analysed some
years ago.29 Recommendations were
given on how to solve problems of errors
and missing meta-information. These
suggestions made for sequence databases
should now be applied to pathway
databases.
A comparison of data of pathway
databases is complicated because of
different classi®cations of compounds,
genes, proteins, pathways and gene/
protein functions. Only the classi®cation
of enzymes recommended by the IUBMB
is used as a standard. One suggestion
would be to have more standards for
nomenclature. In database development a
standardisation of data structures and ®le
formats would make further applications
more powerful.
In conclusion, pathway database users
should be aware of many possible errors
within the databases and the large number
of inconsistencies between the databases.
The inconsistencies would cause problems
for example during the extraction of data
from different data sources. Developers of
pathway databases should focus more on
curation of the data. One suggestion
would also be to try to standardise their
database entries, for example, in the case
of classi®cation.
(Project BioRegio 0312212). The authors thank
Andreas Kohlbecker and Isabel Rojas for
discussions and reviewing the manuscript.
Acknowledgements
This work was supported by the Klaus Tschira
Foundation gGmbH (KTF) and the BMBF
14. EcoGene database (2000), URL: http://
bmb.med.miami.edu/EcoGene/index.html
References
1.
Frishman, D., Heumann, K., Lesk, A. and
Mewes, H. (1998), `Comprehensive,
comprehensible, distributed and intelligent
databases: Current status', Bioinformatics, Vol.
14 (7), pp. 551±561.
2.
Karp, P. D. (1998), `Metabolic databases',
Trends Biochem. Sci., Vol. 23 (3), pp. 114±116.
3.
Karp, P. D., Krummenacker, M., Paley, S. and
Wagg, J. (1999), `Integrated pathway/genome
databases and their role in drug discovery',
Trends Biotechnol., Vol. 17 (7), pp. 275±281.
4.
Overbeek, R., Larsen, N., Smith, W. et al.
(1997), `Representation of function: The next
step', Gene, Vol. 191, pp. GC1±GC9.
5.
Overbeek, R., Larsen, N., Pusch, G. D. et al.
(2000), `WIT: Integrated system for highthroughput genome sequence analysis and
metabolic reconstruction', Nucleic Acids Res.,
Vol. 28 (1), pp. 123±125.
6.
Selkov, E. Jr, Grechkin, Y., Mikhailova, N.
and Selkov, E. (1998), `MPW: The Metabolic
Pathways Database', Nucleic Acids Res., Vol. 26
(1), pp. 43±45.
7.
Selkov, E., Basmanova, S., Gaasterland, T.
et al. (1996), `The metabolic pathway
collection from EMP: The enzymes and
metabolic pathways database', Nucleic Acids
Res., Vol. 24 (1), pp. 26±28.
8.
Kanehisa, M. and Goto, S. (2000), `KEGG:
Kyoto Encyclopedia of Genes and Genomes',
Nucleic Acids Res., Vol. 28 (1), pp. 27±30.
9.
Ogata, H., Goto, S., Sato, K. et al. (1999),
`KEGG: Kyoto Encyclopedia of Genes and
Genomes', Nucleic Acids Res., Vol. 27 (1),
pp. 29±34.
10. Goto, S., Nishioka, T. and Kanehisa, M.
(1998), `LIGAND: Chemical database for
enzyme reactions', Bioinformatics, Vol. 14 (7),
pp. 591±599.
11. Goto, S., Nishioka, T. and Kanehisa, M.
(2000), `LIGAND: Chemical database for
enzyme reactions', Nucleic Acids Res., Vol. 28
(1), pp. 380±382.
12. DBGET/LinkDB ± Integrated database
retrieval system (2000), URL:
http://www.genome.ad.jp/dbget/
13. IUBMB ± International Union of
Biochemistry and Molecular Biology (1992),
`Enzyme Nomenclature: Recommendations of
the Nomenclature Committee of the
IUBMB', Academic Press, New York.
15. Karp, P. D., Riley, M., Paley, S. M. et al.
& HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 126±142. MAY 2001
141
Wittig and De Beuckelaer
(1997), `EcoCyc: Encyclopedia of Escherichia
coli genes and metabolism', Nucleic Acids Res.,
Vol. 25 (1), pp. 43±50.
16. Karp, P. D. and Riley, M. (1999), `EcoCyc:
The resource and the lessons learned' in
Letovsky, S. Ed., `Bioinformatics Databases
and Systems', Kluwer Academic, Boston, MA,
pp. 47±62.
17. Karp, P. D., Riley, M., Saier, M. et al. (2000),
`EcoCyc and MetaCyc databases', Nucleic Acids
Res., Vol. 28 (1), pp. 56±59.
24. Michal, G. (1999), `Biochemical Pathways',
Spektrum Akad. Verlag, Heidelberg.
25. Ellis, L. B. M., Hershberger, C. D. and
Wackett, L. P. (2000), `The University of
Minnesota Biocatalysis/Biodegradation
Database: Microorganisms, genomics and
prediction', Nucleic Acids Res., Vol. 28 (1), pp.
377± 379.
18. CGSC ± E. coli Stock Center Database
(2000), URL: http://cgsc.biology.yale.edu/
cgsc.html
26. Ellis, L. B. M., Speedie, S. M. and McLeish,
R. (1998), `Representing metabolic pathway
information: an object-oriented approach',
Bioinformatics, Vol. 14 (9), pp. 803±806.
19. GenProtEC ± E. coli genome and proteome
database (2000), URL:
http://genprotec.mbl.edu:80/cat1.htf
27. Bork, P. (2000), `Powers and pitfalls in
sequence analysis: The 70% hurdle', Genome
Res., Vol. 10, pp. 398±400.
20. Riley, M. (1993), `Functions of the gene
products of Escherichia coli', Microbiol. Rev.,
Vol. 57 (4), pp 862±952.
28. Karp, P. D. (2000), `An ontology for
biological function based on molecular
interactions', Bioinformatics, Vol. 16 (3),
pp. 269± 285.
21. PathDB (2000), URL: http://www.ncgr.org/
software/pathdb/
22. ExPASy ± Biochemical Pathways (2000),
URL: http://expasy.ch/cgi-bin/searchbiochem-index
142
23. Michal, G. (1993), `Biochemical Pathways',
Boehringer Mannheim GmbH ± Biochemica.
29. Karp, P. D. (1998), `What we do not know
about sequence analysis and sequence
databases', Bioinformatics, Vol. 14 (9),
pp. 753±754.
& HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 2. NO 2. 126±142. MAY 2001