Bernard Jacq
is a CNRS researcher and
project leader in the LGPD
developmental biology
laboratory of Marseilles. He is a
molecular biologist and
bioinformatician and his
present area of research is the
study of structure, function,
evolution and bioinformatics of
developmental regulatory
networks.
Protein function from the
perspective of molecular
interactions and genetic
networks
Bernard Jacq
Date received (in revised form): 5th December 2000
Abstract
Keywords: protein function,
genetic networks, molecular
interactions, interaction maps,
regulomics, functional
classifications
Protein function is a complex notion, which is now receiving renewed attention from a
bioinformatics and genomics perspective. After a general discussion of the principles of
experimental methods employed to decipher gene/protein function, the contributions made by
new, high-throughput methods in terms of function discovery are discussed. Recent work on
functional ontologies and the necessity to describe function within the context of hierarchical
levels of complexity are presented. The concepts of molecular interactions and genetic
networks are then discussed, leading to a useful new framework with which to describe
protein function using new tools such as 2D interaction maps. Finally, it is proposed that
interaction data could be used to develop new methods for the functional classification of
proteins. An example of functional comparisons on a real data set of yeast chromosomal
proteins is presented.
INTRODUCTION
Bernard Jacq,
Laboratoire de Génétique et
Physiologie du Développement,
IBDM, Parc Scientifique de
Luminy, Case 907,
13288 Marseille Cedex 9,
France
Tel: +(33) 04 91 26 96 00
Fax: +(33) 04 91 82 06 82
The term ‘gene function’ (or ‘protein
function’) is certainly one of the most
widely used in biology. Unfortunately, it is
also probably the term that most severely
lacks a commonly accepted definition.
Let us take an example of the diversity
of what biologists term the function of a
macromolecule in a living organism: if a
protein crystallographer on one hand and
a geneticist on the other describe the
function of a given protein (X), it is
highly likely that there will be no overlap
between their two descriptions. In the
first case, it might be said that the relative
orientation and distance between three
specific amino acid residues in the
structure are crucial for the enzymatic
function of protein X as a peptidyl
hydrolase. In the second case, it might be
said that the lack of function of protein X
(as found in a null mutant of the X gene)
will lead to a specific developmental
defect in the first hours of embryonic
development. Both statements are indeed
clearly related to the question of the
function of protein X: they could both be
scientifically true at the same time, but
they reflect two different levels of
function description. One is a
biochemical view of the function of
protein X at the molecular level, whereas
the other describes the function of the
same protein at the level of an entire
organism. Therefore, the two descriptions
are completely different, but each of them
correctly describes a part of the ‘complete’
function of protein X.
Several critical issues are associated with
the re-examination of gene (protein)
function from a genomics and
bioinformatics view. Some of these issues
are discussed here; four points in
particular.
· The first one is related to experimental
methods used to study gene function.
Classical methods used to study the
function of a gene or a protein, which
are based on the functional
perturbation principle, are briefly
discussed. How genomics is now
providing new complementary ways to
decipher gene/protein function is
mentioned.
© HENRY STEWART PUBLICATIONS 1467-5463. BRIEFINGS IN BIOINFORMATICS. VOL 2. NO 1. 38–50. MARCH 2001
· A second issue is the lack of a common
definition of gene function, as already
illustrated above. This issue is
important both from a biological point
of view and for practical reasons
such as the functional annotation of
genomes. Some of the
recent work performed in the ontology
field is presented, where generic and
standardised ways of describing gene
function are being developed.
Biological means by which gene
function could be described will also be
discussed.
· A third issue relates to gene function in
the context of genetic networks. After
30 years of experimental reductionism,
where genes have been largely
examined individually, functional
genomics approaches (micro-arrays,
two-hybrid screens) are showing that
genes behave as groups, which can
share a common type of regulation or
whose products share common direct
protein interactors. Examples of
networks of genes are emerging and we
will discuss the biological importance of
describing gene function within a
conceptual frame of a genetic network.
· Finally, the possibility that interaction
data could be used to develop methods
for functional comparisons of proteins
will be presented on a yeast protein
interaction data set. Such methods
would be an extremely useful
complement to the present structural
(mainly sequence-based) comparison
methods.
EXPERIMENTAL METHODS
TO DECIPHER GENE/
PROTEIN FUNCTION
A detailed description of concepts and
methods that have been developed to
study gene and then protein and RNA
function is clearly beyond the scope of the
present study. Rather, some general
principles that underlie the experimental
approach of function analysis and some of
the lessons learnt from many years of gene
function studies are discussed, since these
have practical consequences when
describing gene function in terms of a text
or a database.
Some general principles behind
functional studies
A first general principle of functional
studies (which goes far beyond biology) is
the functional perturbation principle: the
best way to approach the normal function of
an unknown system is probably to study what
happens when this system is made to function
abnormally. Genetics has made abundant use of this
principle through the examination of
phenotypes at various structural/
functional levels (molecule, cell, tissue,
organism, physiology and population). At
the basis of genetics is the study of many
variants (alleles) for one gene. Establishing
the relationships that exist between the
structure of different variants (specific
genotypes) and the resulting observable
characters (specific phenotypes) at
different structural levels provides a series
of experimental observations that are
invaluable in deciphering gene function.
Many other biological sciences such as
biochemistry, cell biology or physiology
also make wide use of the functional
perturbation principle. Therefore, the
greater part of our present knowledge on
protein function has been obtained from
the observation of situations in which
proteins were themselves abnormal
(mutated) or put in an abnormal context
in vivo or in vitro. This means of studying
the function of a protein product has
some practical consequences.
The first one lies in the names given to
many genes, which reflect a
dysfunction rather than the normal function: for
instance, the Drosophila gene responsible
for the normal colour of the eye has been
named ‘white’, although the normal eye
colour is red. This name was adopted after
a particular mutant phenotype that
abolishes eye colour (such that the eye
then looks white instead of red). Other
Drosophila gene names, such as wingless,
Distalless and Multiple sex combs, are
clear examples of abnormal rather than
normal gene function, and this could be
misleading to non-specialists. In fact,
studying the function(s) of a gene or a
protein without introducing any
perturbation at any level is extremely
difficult, because a function, in contrast to
a structure, is not an object and can be
studied only through its effect upon other
objects. In other words, we do not study
the function directly, but the effect it
produces on observable biological
structures.
A second practical consequence is that
a function cannot be completely
understood in a single pass of experiments.
Defining a function is a progressive
process that ideally requires techniques
from different biological sciences, and the
more experiments performed, the more
we learn about the function. These
considerations demonstrate why
knowledge of the function of a given
protein is largely dispersed and is
published in different papers and journals
by different authors, precluding a unified
view of protein function. Even at the
database level, where some information
synthesis work could be done, all
functional information on a given protein
is rarely found in a single database.
A second general principle, which has
become apparent from results
accumulated over many years, is that the
function of a protein is generally
pleiotropic in at least two respects. First,
within a given organisational level
(molecular, cellular), a protein often has
more than one function. At the molecular
level, for instance, several
DNA-binding proteins
are also RNA-binding proteins, such as
Xenopus TFIIIA,1 Drosophila bicoid2 and
modulo3. Second, it
is rare that the function of a protein can
be described at one structural level only,
with no other observable function(s)
at any other level: the Drosophila
bicoid protein is an RNA-binding
protein and a transcription factor at the
molecular level, and it is an essential
determinant of the formation of anterior
structures (head, thorax) at the level of
the organism. As will be discussed
below, it is therefore important to
examine functions of proteins within a
structural level framework and in any
case, it is always necessary to specify at
which level the function is being
examined. In conclusion, it is probably
more accurate to speak about ‘the
functions’ rather than ‘the function’ of a
gene or a protein.
Contribution of classical and
high-throughput methods to
functional studies
Classical methods of studying gene/
protein function have several interesting
characteristics: a large spectrum of
methods is available, and many different
methods exist for each structural level of
integration; many methods produce
quantifiable results allowing (to a certain
extent) a comparison of different proteins
using the same functional test; the
combination of results from different
methods applied to the same protein
produces a rich source of information, and
this explains how we have now
accumulated detailed knowledge of the
functions of several hundred proteins. On
the other hand, there are some drawbacks
associated with these methods: no generic
functional test is available that could be
applied to all uncharacterised proteins;
classical methods have not always been
performed using standardised protocols,
so that direct comparisons of functional
results obtained in different laboratories
are not generally straightforward; finally,
the search for a function always gives an
incomplete answer because the
possibilities are never investigated
exhaustively (for instance, the search for
regulators of a given gene is often
practically restricted to the most obvious
candidates).
The arrival of genomics among the
methods available for deciphering
gene function now offers
new possibilities that complement
classical methods. For instance, if one is
looking at genes under the control of a
given regulatory protein in one species,
it is theoretically possible, if a complete
gene micro-array is available, to
compare RNAs extracted from a wild-type
specimen to RNAs from a loss-of-function
and/or over-expression mutant
of the regulatory gene, and subsequently
to score all genes whose transcription
status has changed. Of course, it remains
to be determined whether the observed
regulations are direct or indirect, but
obtaining complete lists of regulated
targets for many different genes is now
potentially possible. Two of the most
interesting aspects of such genomic
approaches are that no hypothesis has to
be made first (this marks a switch from
hypothesis-driven to results-driven
experiments) and that a large (or even
complete) view of one precise aspect of
the function of a gene is attainable. Also,
since a high-throughput experiment can
be considered as a massively parallel one,
individual results can be satisfactorily
compared, experimental conditions
being the same for all obtained data
points.
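As a hedged illustration of the scoring step described above, the following Python sketch compares wild-type and mutant expression signals and keeps the genes whose signal changed by at least two-fold. The function name `changed_genes`, the two-fold cut-off and all signal values are assumptions made for the example and are not taken from any real experiment.

```python
import math


def changed_genes(wild_type, mutant, min_abs_log2_fc=1.0):
    """Return {gene: log2 fold change} for genes whose signal changed.

    wild_type, mutant: dicts mapping gene name -> positive expression signal.
    min_abs_log2_fc: hypothetical cut-off (1.0 corresponds to a two-fold change).
    """
    scored = {}
    # Only genes measured in both samples can be compared.
    for gene in wild_type.keys() & mutant.keys():
        log2_fc = math.log2(mutant[gene] / wild_type[gene])
        if abs(log2_fc) >= min_abs_log2_fc:
            scored[gene] = log2_fc
    return scored


# Invented signal values for four hypothetical genes.
wild_type = {"geneA": 100.0, "geneB": 50.0, "geneC": 80.0, "geneD": 10.0}
mutant = {"geneA": 25.0, "geneB": 55.0, "geneC": 320.0, "geneD": 10.0}

targets = changed_genes(wild_type, mutant)
for gene in sorted(targets):
    print(gene, round(targets[gene], 2))
```

Such a list says nothing about whether the regulation is direct or indirect; it only records which genes changed status between the two samples.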
The main problem associated with
functional genomics approaches is that, at
the moment, only a few functional
experiment types have been scaled up to a
high-throughput status: RNA expression
quantification (micro-arrays and DNA
chips), protein–protein interactions
(two-hybrid) and in situ RNA
hybridisation to a lesser extent are the best
known examples. It is probable that in the
near future, other types of functional
experiments (protein or antibody micro-arrays, for instance) will be developed and
applied on a genome-wide basis, allowing
new knowledge to be obtained rapidly. In
the third section of the paper, we will
return to the use of genomic methods and
discuss function in the context of large
genetic networks.
FUNCTIONAL
DESCRIPTIONS FOR
MACROMOLECULES
Functional ontologies
Karp4 discusses the concept of biological
function diversity and introduces the
concepts of local function and integrated
function that he applied essentially to
prokaryotic organisms in the EcoCyc
database.5 In the example provided in the
introduction, the biochemical function
would be an instance of a local function,
whereas the genetic function would be an
instance of an integrated function. As far
as eukaryotic organisms are concerned,
the GO Consortium6 has produced a
structured controlled vocabulary that aims
to describe the roles of gene products in
any organism. To this end, it has
produced three independent ontologies
(the molecular function, biological
process and cellular component
ontologies), based on biological
knowledge accumulated for the yeast
Saccharomyces cerevisiae, the fly Drosophila
melanogaster and the mouse Mus musculus.
These attempts represent useful steps
towards encoding functional data in
databases. It has to be noted that both in
the prokaryotic EcoCyc database (for
evident reasons) and also in the GO
database (developed for a generic
eukaryotic cell), functions occurring at
the upper levels of tissue, organ, entire
organism or population of organisms are
not represented. This is an important
limitation that will make any functional
description of phenotypes encountered in
developmental defects or physiological
abnormalities for multicellular organisms
very difficult. For instance, making
functional links between sequence
databases and the OMIM database (a
catalogue of human genes and genetic
disorders7) will probably prove to be
difficult in many instances with the
present ontologies. It would therefore be
interesting to extend views developed in
EcoCyc or GO ontologies in order to get
a more complete description of functions
of macromolecules in all types of
organisms. In this respect, a first question
is: what are the different levels of
biological integration at which the
function of a protein could be tentatively
defined?
Examining function in the
context of a hierarchy of
structural levels
One possible way of addressing this
question is to start at the molecular level
and zoom out in a stepwise fashion in
order to examine the different nested
levels of biological integration from a
structural point of view. In so doing, one
can find at least six natural levels of
increasing biological complexity (Table 1,
left column). These six structural levels
can be defined by specific components,
structures, processes and concepts that
do not apply at other levels. Furthermore,
each of these levels often represents one
traditional biological discipline such as
biochemistry, cellular biology,
developmental genetics, physiology and
anatomy. Not all six levels are necessary
for a functional description of every
protein, but when many different proteins
are described, all levels will be used. Also,
they are not necessary for every organism:
when studying prokaryotic organisms, the
tissue–organ level is irrelevant and the cell
and organism levels represent in fact the
same level. Prokaryotes can therefore be
studied at four different levels only and six
levels are necessary to describe the variety
of structures observed in eukaryotic
organisms (with the exception of
monocellular organisms such as yeasts or
protists).

Table 1: Structural and functional levels in biological organisation.
The six structural and corresponding functional levels of biological
organisation proposed to serve as a framework for functional
descriptions are listed in increasing order of organisational complexity

Structural levels | Functional levels
Molecules | Molecular complexes, interaction networks
Subcellular structures | Cell trafficking
Cells | Cell migrations, intercellular communications
Tissues, organs | Physiological regulations
Organisms | Behaviour
Populations | Interspecies relationships, ecological equilibria

Recognising the importance of
function description at each of these six
structural levels would have several
advantages:
· These levels represent different
biological realities which are quite
natural to people working with
eukaryotes and taking them into
account will obviate difficulties
encountered with artificial
classifications.
· As already stated, these six levels
correspond to different biological
sciences that have developed a specific
vocabulary, concepts and experimental
methods. Ignoring some of these levels
or trying to fuse some of them into a
single level is likely to produce
inconsistencies or even errors which
could be detrimental to a complete
description of gene function.
· We are totally ignorant of the
biological laws allowing one to infer
knowledge at one structural level from
what is known in another one. For
instance, trying to infer the cellular type
of a eukaryotic cell (muscular, nervous
or endodermal) from its detailed
proteomic content is presently not
possible (but seems an attainable goal in
the future). Another classical example is
the attempt to predict a phenotype from
a molecular defect in a gene (filling in
the genotype–phenotype gap), which
is also nearly impossible at present. We
believe that understanding the
biological level transition laws will first
require a precise description of each
level in structural and functional terms
and that the more levels described, the
better.
We therefore advocate that analyses at
several structural levels are necessary to
fully describe all known subtleties in
eukaryotic function, and that ontologies
encompassing these levels have to be
created. As a working hypothesis, we
propose six structural levels, namely the
molecular (biochemical), subcellular,
cellular, tissue/organ, organism and
population levels (Table 1). In the second
part of the next section, we describe how
function may be described with respect to
the different structural levels.
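A minimal sketch of how the six proposed levels might frame a functional description is given below in Python. The enumeration follows Table 1 and the bicoid annotations are those given earlier in the text; the names `Level`, `annotations` and `describe` are hypothetical choices for this illustration and do not correspond to any existing ontology or database schema.

```python
from enum import IntEnum


class Level(IntEnum):
    """The six structural levels, in increasing order of complexity."""
    MOLECULE = 1
    SUBCELLULAR = 2
    CELL = 3
    TISSUE_ORGAN = 4
    ORGANISM = 5
    POPULATION = 6


# Functions are always recorded together with the level at which they
# were observed; not every level needs an entry for every protein.
annotations = {
    "bicoid": {
        Level.MOLECULE: ["RNA-binding protein", "transcription factor"],
        Level.ORGANISM: ["determinant of anterior structures (head, thorax)"],
    },
}


def describe(protein):
    """List the known functions of a protein, from the lowest level upwards."""
    lines = []
    for level in sorted(annotations[protein]):
        for function in annotations[protein][level]:
            lines.append(f"{level.name.lower()}: {function}")
    return lines


for line in describe("bicoid"):
    print(line)
```

Keeping the level as an explicit key makes it impossible to record a function without stating the level at which it was examined.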
FUNCTIONAL
DESCRIPTIONS OF
PROTEINS
Are all proteins capable of
establishing the same number
of different interactions?
During the course of its biological life,
from its synthesis until its degradation, any
protein interacts with other partners in
order to perform its function(s). To date,
we have no precise idea of the number of
existing functional interactions either at
the level of individual proteins or that of
an entire proteome.
Different proteins may have different
numbers of interactors as a consequence
of different intrinsic characteristics such
as: (i) their size; (ii) their half-life; (iii) the
specificity of their binding sites; (iv) their
structure (number and nature of
domains); (v) their subcellular locations
and tissue/organ distribution; (vi) the type
of organism in which they are found.
Indeed, at the experimental level,
many examples illustrate how widely the
number of interactions varies between
individual proteins, including cases of proteins with
very few different interacting partners, as
well as proteins that can be engaged in
several hundreds of interactions. In
Drosophila, for instance, the transcription
factors Ultrabithorax and engrailed have
been shown to have around 100 binding
sites on polytene chromosomes.8,9 At the
other end of the scale, it seems that many
bacterial regulators have a high specificity
and interact with one gene only.10 In the
case of protein–protein interactions,
browsing databases such as YPD11
reveals examples of proteins with more
than 20 identified partners and others
with only 1 or 2.
However, a major difficulty is to assess
whether this variation is real or partly reflects our
incomplete knowledge at present, some
proteins having been studied in far more
detail than others. At the proteome level,
it seems clear that the number of
interactions in a cell largely exceeds the
number of different proteins. Present
minimal estimations of the number of
different protein–protein interactions in
yeast, based on two-hybrid screens, are in
the range of 36,000 for approximately
6,300 proteins.12 Estimations of the
number of different transcriptional
regulators per metazoan gene are in the
7–8 range,13 which, in Drosophila, would
lead to around 110,000 protein–DNA
transcriptional interactions for around
14,000 genes. If additional factors that
may increase protein diversity (alternative
splicing, post-translational modifications)
are taken into account, the number of interactions could
easily reach the millions in a
representative metazoan such as
Drosophila. More realistic evaluations of
the size of the ‘interaction universe’ (the
interactome or regulome) must await a
precise determination of the number of
partners for a set of representative proteins
in different functional classes.
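The orders of magnitude quoted above can be checked with a few lines of arithmetic. The Python sketch below uses only the figures given in the text; the variable names are illustrative, and the yeast figure is computed as a simple interactions-to-proteins ratio rather than as a number of partners per protein.

```python
# Figures quoted in the text.
yeast_proteins = 6_300
yeast_interactions = 36_000      # minimal two-hybrid based estimate

fly_genes = 14_000
regulators_per_gene = (7, 8)     # estimated range per metazoan gene

# Interactions clearly outnumber proteins in yeast.
yeast_ratio = yeast_interactions / yeast_proteins

# Protein-DNA transcriptional interactions in Drosophila.
fly_low = fly_genes * regulators_per_gene[0]
fly_high = fly_genes * regulators_per_gene[1]

print(f"yeast: about {yeast_ratio:.1f} interactions per protein")
print(f"fly: between {fly_low:,} and {fly_high:,} protein-DNA interactions")
```

The upper bound of the range (112,000) is consistent with the figure of around 110,000 given in the text.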
Molecular interactions and
genetic networks
We previously discussed the advantages of
describing protein function in the context
of a hierarchy of structural levels. How
may this be achieved? We propose that a
functional level is associated with each of the
six structural levels described previously
(Table 1, right column). In the context of
this paper, we will present only the
functional level that could be associated with
the structural molecular level: that of
interacting molecules (including
macromolecular complexes) and genetic
networks.
When interactions are considered as a
whole (a network), the complexity of the
system is directly related to the number of
speci®c interactions. Molecular
interactions, i.e. direct physical interactions
involving DNA, RNA and proteins, play
an essential role in all known biological
processes. Three major types of
interactions account for the great majority
of known biological macromolecular
interactions: these are the protein–DNA,
protein–RNA and protein–protein
interactions. Several such direct
interactions then form complex genetic
networks that are capable of responding to
both external stimuli and stresses, as well as
to internal changes occurring within
components of the network. One might
imagine genetic networks as a form of a
molecular nervous system: they have a
functional role at the level of a cell similar
to that of a nervous system at the level of
an organism. The ability to describe
interactions and networks formally, and to
query and manipulate them, is now
widely recognised as essential to the study
of gene regulation and function. As an
important step towards the construction of
a unified and physiological view of living
organisms, several groups are developing
mathematical models for the simulation of
network behaviour14–17 (see Smolen
et al.18 and Von Dassow et al.19 for recent
reviews). Recently, some of these
theoretical developments have started to
receive practical confirmation: small
networks have been engineered and their
predicted behaviour experimentally
tested in prokaryotes.20–22
Functional interaction maps of
the genome
A first step towards the study of genetic
networks is to compile lists of
experimentally determined functional links
between specific proteins,
genes and RNAs. Classical and high-throughput
methods have produced a
large amount of data on interactions and a
small part of the data is already present in
specialised databases such as DIP,23
KEGG,24 FlyNets,25,26 GeNet27 and
YPD.11 However, the great majority of
interaction data can presently be found in
published literature only. Developing
powerful tools to extract specific scientific
information from texts (exploring the
‘textome’) will be strategic to help
database development and annotation,
and this is now an active bioinformatics
research domain.28,29
Amassing lists of interactions is only
the first step in the establishment of
gene functional interaction maps.
Interactions are extremely dynamic in
nature and some important parameters
are: (i) the duration of the interaction;
(ii) the developmental stage at which the
interaction occurs (in metazoans); (iii)
the cell/tissue localisation of the
interaction; (iv) the post-translational
status of proteins. Interaction maps will
thus be far from static, and for instance,
a map drawn for muscle cells in the
developing embryo is likely to be
somewhat different from one derived
from adult pancreas.
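One possible way to keep these dynamic parameters attached to each interaction is sketched below in Python: every record carries its context, and a map is derived for a given stage and tissue. The class `Interaction`, its field names, the function `interaction_map` and all records are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interaction:
    """One experimentally determined interaction and its context."""
    partner_a: str
    partner_b: str
    kind: str                              # e.g. "protein-DNA"
    stage: Optional[str] = None            # developmental stage
    tissue: Optional[str] = None           # cell/tissue localisation
    duration_min: Optional[float] = None   # duration, if measured
    protein_status: Optional[str] = None   # e.g. "phosphorylated"


def interaction_map(records, stage=None, tissue=None):
    """Return the set of (partner_a, partner_b) cells valid in a context.

    A filter left as None is not applied.
    """
    cells = set()
    for record in records:
        if stage is not None and record.stage != stage:
            continue
        if tissue is not None and record.tissue != tissue:
            continue
        cells.add((record.partner_a, record.partner_b))
    return cells


# Invented records for two hypothetical proteins and two hypothetical genes.
records = [
    Interaction("protA", "geneX", "protein-DNA", stage="embryo", tissue="muscle"),
    Interaction("protA", "geneY", "protein-DNA", stage="adult", tissue="pancreas"),
    Interaction("protB", "geneX", "protein-DNA", stage="embryo", tissue="muscle",
                protein_status="phosphorylated"),
]

embryo_muscle = interaction_map(records, stage="embryo", tissue="muscle")
adult_pancreas = interaction_map(records, stage="adult", tissue="pancreas")
print(sorted(embryo_muscle))
print(sorted(adult_pancreas))
```

The same list of records thus yields different maps for different contexts, which is the sense in which interaction maps are not static.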
Graphical representations of protein–protein
or protein–DNA interactions
already exist (see for instance the KEGG
or the GeNet database, Table 2). Among
different technical possibilities to
represent interactions graphically, one of
the most intuitive ones is the 2D (or
matrix) interaction map. Figure 1A shows a
theoretical example, in which an
interaction is specified by filling in the
cell at the intersection of the two
partners. Using appropriate image analysis
or clustering software, one could extract
meaningful and simple patterns of
regulation from such an otherwise complex
picture of relationships between proteins
and genes. Empty cells correspond to an
absence of interaction; empty vertical
lines correspond to a total absence of
protein–DNA interaction for the
corresponding protein (for instance, an integral
membrane protein, or a protein which
does not regulate any gene of the table);
empty horizontal lines
correspond to genes which are not
regulated by any protein listed in the
table. More precisely, patterns
corresponding to: (i) three genes with
common regulators (panel B), (ii) five
proteins regulating common sets of genes
(panel C) and (iii) autoregulated genes (panel D)
could be revealed.

Table 2: Protein interaction resources on the Web

Name | Interaction type | URL
DIP (Database of Interacting Proteins) | Protein–protein | http://dip.doe-mbi.ucla.edu
FlyNets (Gene interactions in the fly) | Protein–protein, protein–DNA, protein–RNA | http://gifts.univ-mrs.fr/FlyNets
GeNet (Gene Networks database) | Protein–DNA, protein–protein | http://www.csa.ru:85/Inst/gorb_dep/inbios/genet/genet.htm
Transfac (Transcription factor database) | Protein–DNA | http://transfac.gbf-braunschweig.de/TRANSFAC/index.html
FlyBase (a database of the Drosophila genome) | Genetic interactions | http://fly.ebi.ac.uk:7081
KEGG (the Kyoto Encyclopedia of Genes and Genomes) | Regulatory pathways | http://www.genome.ad.jp/kegg
STKE (Signal Transduction Knowledge Environment) | Regulatory pathways | http://www.stke.org
SWISS-PROT (The SWISS-PROT database) | Protein–protein, protein–DNA, protein–RNA | http://www.expasy.ch/sprot/sprot-top.html
YPD, PombePD and WormPD (Proteome databases) | Protein–protein, protein–DNA | http://www.proteome.com/databases/index.html

Some selected WWW resources describing protein interactions with DNA, RNA or proteins are listed. The first four databases are essentially devoted to
interactions, whereas the rest contain data on interactions as well as many other types of data

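The kind of pattern extraction suggested for a 2D protein–DNA interaction map can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the map is stored as a dictionary from each protein to the set of genes it regulates, proteins are named in lower case and their corresponding genes in upper case (as in the Figure 1 caption), and the interactions themselves are invented. Simple set operations stand in for the image analysis or clustering software mentioned in the text.

```python
from collections import defaultdict

# Hypothetical filled cells: protein -> genes it regulates.
regulates = {
    "a": {"A", "B", "C", "D"},
    "b": {"B", "C", "D"},
    "c": {"E"},
    "d": set(),          # a protein with no protein-DNA interaction
}


def genes_with_common_regulators(regulates):
    """Group genes that are regulated by exactly the same set of proteins."""
    regulators_of = defaultdict(set)
    for protein, genes in regulates.items():
        for gene in genes:
            regulators_of[gene].add(protein)
    groups = defaultdict(set)
    for gene, regulators in regulators_of.items():
        groups[frozenset(regulators)].add(gene)
    # Keep only groups containing more than one gene.
    return {regs: genes for regs, genes in groups.items() if len(genes) > 1}


def autoregulated(regulates):
    """Proteins that regulate their own gene (lower case -> upper case)."""
    return {p for p, genes in regulates.items() if p.upper() in genes}


common = genes_with_common_regulators(regulates)
for regulators, genes in common.items():
    print(sorted(regulators), "regulate", sorted(genes))
print("autoregulated:", sorted(autoregulated(regulates)))
```

On real data, exact set identity would probably be too strict a criterion, and a clustering method tolerant of missing interactions would be preferable.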
Present data sets for direct protein–DNA
interactions are not yet large
enough to test the idea on a real example.
When sufficient data are available, such
interaction maps could then be drawn at a
genomic scale. In Drosophila for instance,
for which more than 500 transcription
factor genes have been identified at the
genome sequence level (data compiled on
the BDGP site),30 a transcriptional
regulation 2D interaction map would
have approximately 500 proteins
× 14,000 genes = 7,000,000 cells (out of
which only a small subset will be active in
any given cellular type).
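Since only a small subset of the 7,000,000 cells is expected to be filled, such a map lends itself to a sparse representation that stores the filled cells only. The Python sketch below illustrates this under assumptions: the dimensions are those quoted in the text, whereas the three interactions and the identifiers `tf1`, `tf2`, `gene1` and `gene2` are invented.

```python
n_proteins = 500
n_genes = 14_000
total_cells = n_proteins * n_genes   # size of the full 2D map

# Sparse storage: only filled cells are kept, as (protein, gene) pairs.
filled = {("tf1", "gene1"), ("tf1", "gene2"), ("tf2", "gene2")}


def interacts(protein, gene):
    """True if the cell for this protein and gene is filled."""
    return (protein, gene) in filled


fill_fraction = len(filled) / total_cells
print(f"{total_cells:,} cells, {len(filled)} filled")
print(interacts("tf1", "gene2"), interacts("tf2", "gene1"))
```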
In contrast, public web data for direct
protein–protein interactions have now
attained the minimal size allowing real test
experiments to be made (at least in the
case of the yeast Saccharomyces cerevisiae).
An example of analysing real protein–protein
interaction patterns is described in
the last section.
COUNTING MOLECULAR
INTERACTIONS: A TOOL
FOR FUNCTIONAL
COMPARISONS BETWEEN
PROTEINS
As soon as the first protein sequences
became available, biologists tried to
compare them and progressively
introduced useful measures for this
purpose (identity and similarity
percentages, Z-scores and BLAST
scores). At the secondary and tertiary
structure levels, also, methods were
devised to compare protein structures.
Very often, sequence and structural
comparisons are used to infer functional
relationships between proteins. Although
inferring functional predictions from
structural comparisons can lead to useful
and testable hypotheses, it remains a risky
exercise, which could lead to wrong
conclusions.31
Defining new ways of comparing
proteins from a functional point of view
would therefore be a very desirable goal.
Figure 1: The concept of molecular interaction 2D maps applied to protein–DNA
interactions. Panel A shows a theoretical example of a protein–DNA interaction map. Proteins
are listed on horizontal lines (a, b, c, ...), and corresponding genes on vertical lines (A, B, C,
...). Panels B, C and D show in black three examples of patterns extracted from the rest of all
interactions (in light grey)
Considering proteins as individual
members of an immense network, in
which each protein has a finite number
of interactions with several other specific
molecular partners, could represent a
new and (as far as we know) as yet
unexplored means by which to compare
proteins at a functional level. Basically,
the idea is not to compare proteins
themselves but instead to compare the
lists of their partners: the more
interacting partners two proteins have in
common, the more likely these proteins are
to be functionally related. Let us,
for instance, consider three proteins A,
B and C, each of them establishing 30
specific interactions (experimentally
determined) with other protein partners.
If A and C, B and C, and A and B have
respectively 25, 13 and 2 common
interactors, it seems intuitively
reasonable to conclude that A and C are
highly functionally related, that B and C
share at least some functions and that A
and B are probably not functionally
related (or only marginally so).
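The partner-list comparison described above amounts to counting the size of a set intersection for every pair of proteins. The following Python sketch illustrates the idea on a deliberately small toy data set; the partner names (p1, p2, ...) and the counts are hypothetical and are not the figures quoted in the text.

```python
from itertools import combinations


def shared_partners(partners):
    """Return {(p, q): number of common interactors} for every protein pair."""
    return {
        (p, q): len(partners[p] & partners[q])
        for p, q in combinations(sorted(partners), 2)
    }


# Hypothetical partner lists, invented for illustration only.
partners = {
    "A": {"p1", "p2", "p3", "p4", "p5", "p6"},
    "B": {"p5", "p6", "p7", "p8", "p9", "p10"},
    "C": {"p1", "p2", "p3", "p4", "p5", "p11"},
}

counts = shared_partners(partners)

# Rank pairs from most to fewest common interactors.
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
for (p, q), n_common in ranked:
    print(f"{p}-{q}: {n_common} common interactors")
```

In this toy case the pair A and C ranks first, which is the kind of ordering the comparison is meant to expose; where to place a threshold between "related" and "unrelated" is left open here, as it is in the text.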
The feasibility of this idea has been
tested on a set of real protein–protein
interactions from S. cerevisiae. Fourteen
chromosomal proteins for which
interaction data were available were
extracted from the YPD database, as well
as their specific interactors. A 2D
interaction map was used to represent the
results, which appear graphically as PIPs
(Protein Interaction Patterns) in Figure 2.
The 14 proteins were grouped into four
clusters, based on their PIPs, the fourth
one (D) containing proteins that do not
appear to be functionally related according
to an interaction criterion. The three other
clusters, in contrast, contain proteins
with different levels of functional identity.
These 14 proteins (named reference
proteins) define 14 vertical columns and
all their protein partners (named
interactors) define 63 horizontal rows in
which proteins are arranged in
alphabetical order. Intersecting cells are
filled in when a specific protein–protein
interaction exists between a reference protein and
an interactor. All protein names are YPD
names. Each chromosomal reference protein
establishes between 4 and 21 interactions,
10 proteins having 9 to 19 interactors.
Out of 63 horizontal rows, 42 contain 2
to 7 filled cells, meaning that a majority of
interactors are shared by 2 to 7 members
of the reference set of 14 proteins. For
the sake of clarity, results have been
clustered according to vertical interaction
patterns, thus grouping together reference
proteins with similar sets of interactors.
Four main clusters are visible,
corresponding respectively to Hhf1,
Hhf2, Hht1 and Hht2 proteins (cluster
A), Med2, Med4 and Pgd1 (cluster B),
Sir3, Sir4, Tup1 and Ssn6 (cluster C) and
Mig1, Sir2 and Alpha2 (cluster D).
Cluster A groups 4 reference proteins
with a comparable number of interactors
(12 to 14) and out of a maximum of 14
interactors, 9 are completely shared by the
4 reference proteins (64 per cent
functional identity). Cluster A can then be
split up into 2 sub-clusters, A1 and A2,
with 13 and 11 common interactors
respectively. The 3 proteins of cluster B
have 13 common interactors out of a
maximum of 20 partners. Finally, within
cluster C, Sir3 and Sir4 exhibit 10
common interactors out of a maximum of
15 partners, whereas Ssn6 does not share
any common interactor with them, but
Tup1 exhibits 4 common interactions
with both Sir3 and Sir4 and 5 with Ssn6.
Cluster D contains 3 proteins which do
not seem to be functionally related using
our representation, since out of 15
interaction cells for them, only two are
present on one single horizontal line.
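The percentages quoted for the clusters appear to be the number of interactors common to all members of a group divided by the largest partner count in that group (for cluster A, 9/14, or about 64 per cent). The sketch below builds a small 2D map and computes such a score under that assumed definition; the reference and interactor names are placeholders, not YPD entries, and the data are invented so as to reproduce only the 9-out-of-14 arithmetic.

```python
def interaction_map(partners):
    """Build a 2D map: rows are interactors in alphabetical order,
    columns are reference proteins, cells are True when filled."""
    columns = list(partners)
    rows = sorted(set().union(*partners.values()))
    grid = [[row in partners[col] for col in columns] for row in rows]
    return rows, columns, grid


def functional_identity(partners, group):
    """Percentage of interactors shared by every member of `group`,
    relative to the largest partner list in the group (assumed definition)."""
    sets = [partners[name] for name in group]
    common = set.intersection(*sets)
    return 100.0 * len(common) / max(len(s) for s in sets)


# Hypothetical data: four reference proteins sharing 9 interactors,
# each with 12 to 14 partners in total.
shared = {f"s{i}" for i in range(9)}
partners = {
    "ref1": shared | {"u1", "u2", "u3", "u4", "u5"},  # 14 partners
    "ref2": shared | {"u6", "u7", "u8", "u9"},        # 13 partners
    "ref3": shared | {"u10", "u11", "u12"},           # 12 partners
    "ref4": shared | {"u13", "u14", "u15"},           # 12 partners
}

rows, columns, grid = interaction_map(partners)
score = functional_identity(partners, ["ref1", "ref2", "ref3", "ref4"])
print(f"{score:.0f} per cent functional identity")
```

Rows in which every cell is filled correspond to the completely shared interactors; other normalisations (for example, dividing by the union of all partners) would give different percentages and are equally plausible choices.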
It is interesting to try to correlate the
above observations with biological
knowledge on the members of the
functional interaction clusters. Cluster A
is composed of histone sequences only
and subclusters A1 and A2 (two genes
each) represent one single protein each
(Hhf1 and Hhf2 code for histone H4 and
Hht1 and Hht2 for histone H3). Identical
proteins of clusters A1 and A2 appear
almost functionally similar (92.8 and
92.3 per cent functional identity
respectively). Interestingly, unrelated
sequences exhibit functional similarities:
histone H3 and H4 do not display any
sequence similarity but appear
functionally related through a PIP analysis
(9/14 common interactors or 64 per cent
functional identity). Analysis of cluster B
(Med2, Med4 and Pgd1) leads to the same
conclusion and reveals clear functional
resemblances in the absence of sequence
similarities. However, some interaction
differences could potentially indicate
functional subclasses within the same
generic function. Finally, the four
proteins in cluster C exhibit a fourth
interesting type of interaction pattern:
Sir3 and Sir4 appear functionally related
(10 common interactors out of a
maximum of 15 partners) whereas Ssn6
does not seem to have any direct
functional relationship with them.
However, when Tup1 is introduced into
the comparison, it exhibits four common
interactions with Sir3 and Sir4 and five
common interactions with Ssn6. Tup1
therefore appears equally related to two
groups that are not otherwise functionally
related. Again, it has to be noted that no
proteins of cluster C show any sequence
similarity.
Although these results are still
preliminary and have to be extended to
other yeast proteins and to other organisms
as well, the method appears promising for
revealing functional resemblances that would
not be detected by sequence comparison
programs. It is also generic since it could
be applied to any type of protein as soon as
experimental interaction data are available.
Moreover, it could also potentially be used
for functional evolutionary comparisons
between different organisms when lists of
orthologous proteins are available. It is
presently limited by the amount of
interaction data available but could
become increasingly useful as data from
systematic interaction screens, such as
wide-scale two-hybrid screens,32 are
obtained.
CONCLUSION
Figure 2: Protein–protein interaction 2D map of a set of yeast
chromosomal proteins. 14 chromosomal proteins with at least 4
identified protein interaction partners were selected from the YPD
database (see Table 2 for the URL)
Although it seems quite intuitive, the
concept of the function of a gene or a
protein is not so simple and
straightforward, and this point has been
discussed using several examples. In
biology, the function of an object (a
molecule, a cell or an organ) is always
associated with structural aspects, and genes/
proteins are no exception to this rule.
It has been concluded herein that a single
structural level is not sufficient to describe
the various aspects of gene/protein
function and it is proposed that six
structural levels (from the molecule to the
population of organisms) have to be taken
into account for a full functional
description. Furthermore, it is advocated
that a specific functional level has to be
associated with each structural level, and the
molecular network level that could be
associated with the molecular level was
taken as an example.
Whatever the prokaryotic or
eukaryotic organism under consideration,
until now biologists have essentially
adopted a gene-by-gene approach.
Although many different experimental
results were often obtained on the same
organism, the same tissue and under the
same experimental conditions, they were
quite difficult to integrate, as one might
imagine would occur if different people
worked on isolated pieces of a giant
jigsaw puzzle. We advocate that studying
protein function within the new
conceptual frame of a network of
interacting molecules could help to put
the pieces together and to obtain an
integrated view of the biological
phenomena under scrutiny.
In order to study the structure and
function of regulatory networks, new tools
have to be developed and 2D interaction
maps are one of these tools. Moreover, it
has been proposed that interaction maps
could also be used to quantify functional
relationships between proteins, allowing
new types of classifications to be made,
which could nicely complement
comparisons made on a structural basis.
Genome projects have shown that
there are no striking differences in the
number of genes between organisms with
very different organisational complexities:
Drosophila has only two to three times
more genes than the unicellular yeast and
seems to have fewer genes than the
nematode, although its anatomy and
behaviour are far more complex. Clearly,
the absolute number of genes does not
seem to be an essential determinant of
biological complexity. Rather, it could be
that the number of interactions between
genes and the structure of the regulatory
network that they establish plays a more
important role. Studying functional
relationships at the genome level
(regulomics) is a new frontier in the post-genome era, which will probably allow
important progress in the understanding
of protein function throughout evolution.
Acknowledgements
It is a pleasure to thank Laurent Fasano for
discussions, Laurence Röder and Denis Thieffry for
discussions and constructive comments on the
manuscript and Kim Dale for English corrections.
A preliminary version of this work has been
presented at the HUGO workshop on gene
function databases (Cambridge, May 1999; D.
Davidson, organiser). My work is supported by the
CNRS genome programme.
References
1. Romaniuk, P. J. (1985), ‘Characterization of
the RNA binding properties of transcription
factor IIIA of Xenopus laevis oocytes’, Nucleic
Acids Res., Vol. 25, pp. 5369–5387.
2. Rivera-Pomar, R., Niessing, D., Schmidt-Ott,
U. et al. (1996), ‘RNA binding and
translational suppression by bicoid’, Nature,
Vol. 379, pp. 746–749.
3. Perrin, L., Romby, P., Laurenti, P. et al.
(1989), ‘The Drosophila modifier of variegation
modulo gene product binds specific RNA
sequences at the nucleolus and interacts with
DNA and chromatin in a phosphorylation-dependent
manner’, J. Biol. Chem., Vol. 274,
pp. 6315–6323.
4. Karp, P. D. (2000), ‘An ontology for biological
function based on molecular interactions’,
Bioinformatics, Vol. 16, pp. 269–285.
5. Karp, P. D., Riley, M., Paley, S. M. et al.
(1999), ‘EcoCyc: Encyclopedia of Escherichia
coli genes and metabolism’, Nucleic Acids Res.,
Vol. 27, pp. 55–58.
6. Ashburner, M., Ball, C. A., Blake, J. A. et al.
(2000), ‘Gene ontology: Tool for the
unification of biology’, Nat. Genet., Vol. 25,
pp. 25–29.
7. Hamosh, A., Scott, A. F., Amberger, J.
et al. (2000), ‘Online Mendelian Inheritance in
Man (OMIM)’, Human Mutat., Vol. 15, pp.
57–61.
8. Botas, J. and Auwers, L. (1996), ‘Chromosomal
binding sites of Ultrabithorax homeotic
proteins’, Mech. Dev., Vol. 56, pp. 129–138.
9. Saenz-Robles, M. T., Maschat, F., Tabata, T.
et al. (1995), ‘Selection and characterization of
sequences with high affinity for the engrailed
protein of Drosophila’, Mech. Dev., Vol. 53, pp.
185–195.
10. Thieffry, D., Huerta, A. M., Perez-Rueda, E.
and Collado-Vides, J. (1998), ‘From specific
gene regulation to genomic networks: A global
analysis of transcriptional regulation in
Escherichia coli’, Bioessays, Vol. 20, pp.
433–440.
11. Costanzo, M. C., Hogan, J. D., Cusick, M. E.
et al. (2000), ‘The yeast proteome database
(YPD) and Caenorhabditis elegans proteome
database (WormPD): Comprehensive
resources for the organization and comparison
of model organism protein information’,
Nucleic Acids Res., Vol. 28, pp. 73–76.
12. Legrain, P. and Selig, L. (2000), ‘Genome-wide
protein interaction maps using two-hybrid
systems’, FEBS Lett., Vol. 480, pp.
32–36.
13. Arnone, M. I. and Davidson, E. H. (1997),
‘The hardwiring of development: organization
and function of genomic regulatory systems’,
Development, Vol. 124, pp. 1851–1864.
14. Thomas, R. (1973), ‘Boolean formalization of
genetic control circuits’, J. Theor. Biol., Vol.
42, pp. 563–585.
15. Thomas, R., Thieffry, D. and Kaufman, M.
(1995), ‘Dynamical behaviour of biological
regulatory networks – I. Biological role of
feedback loops and practical use of the concept
of the loop-characteristic state’, Bull. Math.
Biol., Vol. 57, pp. 247–276.
16. Hlavacek, W. S. and Savageau, M. A. (1996),
‘Rules for coupled expression of regulator and
effector genes in inducible circuits’, J. Mol.
Biol., Vol. 255, pp. 121–139.
17. Sharp, D. H. and Reinitz, J. (1998),
‘Prediction of mutant expression patterns using
gene circuits’, Biosystems, Vol. 47, pp. 79–90.
18. Smolen, P., Baxter, D. A. and Byrne, J. H.
(2000), ‘Mathematical modeling of gene
networks’, Neuron, Vol. 26, pp. 567–580.
19. Von Dassow, G., Meir, E., Munro, E. M. and
Odell, G. M. (2000), ‘The segment polarity
network is a robust developmental module’,
Nature, Vol. 406, pp. 188–192.
20. Elowitz, M. B. and Leibler, S. (2000), ‘A
synthetic oscillatory network of transcriptional
regulators’, Nature, Vol. 403, pp. 335–338.
21. Gardner, T. S., Cantor, C. R. and Collins, J. J.
(2000), ‘Construction of a genetic toggle
switch in Escherichia coli’, Nature, Vol. 403, pp.
339–342.
22. Becskei, A. and Serrano, L. (2000),
‘Engineering stability in gene networks by
autoregulation’, Nature, Vol. 405, pp.
590–593.
23. Eisenberg, D., Rice, D. W. and Xenarios, I.
(1998), unpublished;
URL: http://dip.doe-mbi.ucla.edu/
24. Ogata, H., Goto, S., Sato, K. et al. (1999),
‘KEGG: Kyoto Encyclopedia of Genes and
Genomes’, Nucleic Acids Res., Vol. 27, pp.
29–34.
25. Mohr, E., Horn, F., Janody, F. et al. (1998),
‘FlyNets and GIF-DB, two internet databases
for molecular interactions in Drosophila
melanogaster’, Nucleic Acids Res., Vol. 26, pp.
89–93.
26. Sanchez, C., Lachaize, C., Janody, F. et al.
(1999), ‘Grasping at molecular interactions and
genetic networks in Drosophila melanogaster
using FlyNets, an Internet database’, Nucleic
Acids Res., Vol. 27, pp. 89–94.
27. Serov, V. N., Spirov, A. V. and Samsonova,
M. G. (1998), ‘Graphical interface to the
genetic network database GeNet’,
Bioinformatics, Vol. 14, pp. 546–547.
28. Blaschke, C., Andrade, M. A., Ouzounis, C.
and Valencia, A. (1999), ‘Automatic
extraction of biological information from
scientific text: protein–protein interactions’, in
‘Proceedings of the 7th International
Conference on Intelligent Systems for
Molecular Biology’, AAAI Press, Menlo Park,
CA, pp. 60–67.
29. Craven, M. and Kumlien, J. (1999),
‘Constructing biological knowledge bases by
extracting information from text sources’, in
‘Proceedings of the 7th International
Conference on Intelligent Systems for
Molecular Biology’, AAAI Press, Menlo Park,
CA, pp. 77–86.
30. http://www.fruitfly.org/annot/menus/transcription_factor.html
31. Devos, D. and Valencia, A. (2000), ‘Practical
limits of function prediction’, Proteins, Vol. 41,
pp. 98–107.
32. Fromont-Racine, M., Rain, J. C. and
Legrain, P. (1997), ‘Toward a functional
analysis of the yeast genome through
exhaustive two-hybrid screens’, Nat. Genet.,
Vol. 16, pp. 277–282.