Protein function from the perspective of molecular interactions and genetic networks

Bernard Jacq

Date received (in revised form): 5th December 2000

© HENRY STEWART PUBLICATIONS 1467-5463. BRIEFINGS IN BIOINFORMATICS. VOL 2. NO 1. 38–50. MARCH 2001

Bernard Jacq is a CNRS researcher and project leader in the LGPD developmental biology laboratory of Marseilles. He is a molecular biologist and bioinformatician and his present area of research is the study of structure, function, evolution and bioinformatics of developmental regulatory networks.

Bernard Jacq, Laboratoire de Génétique et Physiologie du Développement, IBDM, Parc Scientifique de Luminy, Case 907, 13288 Marseille Cedex 9, France
Tel: (33) 04 91 26 96 00
Fax: (33) 04 91 82 06 82

Keywords: protein function, genetic networks, molecular interactions, interaction maps, regulomics, functional classifications

Abstract

Protein function is a complex notion, which is now receiving renewed attention from a bioinformatics and genomics perspective. After a general discussion of the principles of experimental methods employed to decipher gene/protein function, the contributions made by new, high-throughput methods in terms of function discovery are discussed. Recent work on functional ontologies and the necessity to describe function within the context of hierarchical levels of complexity are presented. The concepts of molecular interactions and genetic networks are then discussed, leading to a useful new framework with which to describe protein function using new tools such as 2D interaction maps. Finally, it is proposed that interaction data could be used to develop new methods for the functional classification of proteins. An example of functional comparisons on a real data set of yeast chromosomal proteins is presented.

INTRODUCTION

The term 'gene function' (or 'protein function') is certainly one of the most widely used in biology. It is nevertheless, and unfortunately so, the one for which there is probably the most severe lack of a commonly accepted definition.

Let us take an example of the diversity of what biologists term the function of a macromolecule in a living organism: if a protein crystallographer on one hand and a geneticist on the other describe the function of a given protein (X), it is highly likely that there will be no overlap between their two descriptions. In the first case, it might be said that the relative orientation and distance between three specific amino acid residues in the structure are crucial for the enzymatic function of protein X as a peptidyl hydrolase. In the second case, it might be said that the lack of function of protein X (as found in a null mutant of the X gene) will lead to a specific developmental defect in the first hours of embryonic development. Both statements are indeed clearly related to the question of the function of protein X: they could both be scientifically true at the same time, but they uncover two different levels of function description. One is a biochemical view of the function of protein X at the molecular level, whereas the other describes the function of the same protein at the level of an entire organism. Therefore, the two descriptions are completely different, but each of them correctly describes a part of the 'complete' function of protein X.

Several critical issues are associated with the re-examination of gene (protein) function from a genomics and bioinformatics view. Some of these issues are discussed here; four points in particular.

· The first one is related to experimental methods used to study gene function. Classical methods used to study the function of a gene or a protein, which are based on the functional perturbation principle, are briefly discussed. We also mention how genomics is now providing new, complementary ways to decipher gene/protein function.
· A second issue is the lack of a common definition of gene function, as already illustrated above. This issue is important, both from a biological point of view and also for practical reasons such as the functional annotation of genomes for instance. Some of the recent work performed in the ontology field is presented, where generic and standardised ways for describing gene function are being developed. Biological means by which gene function could be described will also be discussed.

· A third issue relates to gene function in the context of genetic networks. After 30 years of experimental reductionism, where genes have been largely examined individually, functional genomics approaches (micro-arrays, two-hybrid screens) are showing that genes behave as groups, which can share a common type of regulation or whose products share common direct protein interactors. Examples of networks of genes are emerging and we will discuss the biological importance of describing gene function within a conceptual frame of a genetic network.

· Finally, the possibility that interaction data could be used to develop methods for functional comparisons of proteins will be presented on a yeast protein interaction data set. Such methods would be an extremely useful complement to the present structural (mainly sequence-based) comparison methods.

EXPERIMENTAL METHODS TO DECIPHER GENE/PROTEIN FUNCTION

A detailed description of concepts and methods that have been developed to study gene and then protein and RNA function is clearly beyond the scope of the present study. Rather, some general principles that underlie the experimental approach of function analysis and some of the lessons learnt from many years of gene function studies are discussed, since these have practical consequences when describing gene function in terms of a text or a database.
Some general principles behind functional studies

A first general principle of functional studies (which goes far beyond biology) is the functional perturbation principle: in order to approach the normal function of an unknown system, the best way is probably to study what happens when this system is subjected to abnormal function. Genetics has made an abundant use of this principle through the examination of phenotypes at various structural/functional levels (molecule, cell, tissue, organism, physiology and population). At the basis of genetics is the study of many variants (alleles) for one gene. Establishing the relationships that exist between the structure of different variants (specific genotypes) and the resulting observable characters (specific phenotypes) at different structural levels provides a series of experimental observations that are invaluable in deciphering gene function. Many other biological sciences such as biochemistry, cell biology or physiology also make a wide use of the functional perturbation principle. Therefore, the greater part of our present knowledge on protein function has been obtained from the observation of situations in which proteins were themselves abnormal (mutated) or put in an abnormal context in vivo or in vitro.

This means of studying the function of a protein product has some practical consequences. The first one lies in the names given to many genes, which reflect more a dysfunction than the normal function: for instance, the Drosophila gene responsible for the normal colour of the eye has been named 'white', although the normal eye colour is red. This name was adopted after a particular mutant phenotype that abolishes eye colour (such that the eye then looks white instead of red).
Other Drosophila gene names, such as wingless, Distalless and Multiple sex combs, are clear examples of abnormal rather than normal gene function, and this could be misleading to non-specialists. In fact, studying the function(s) of a gene or a protein without introducing any perturbation at any level is extremely difficult, because a function, in contrast to a structure, is not an object and can be studied only through its effect upon other objects. In other words, we do not study the function directly, but the effect it produces on observable biological structures. A second practical consequence is that a function cannot be completely understood in a single pass of experiments. Defining a function is a progressive process that ideally requires techniques from different biological sciences, and the more experiments performed, the more we learn about the function. These considerations demonstrate why knowledge of the function of a given protein is largely dispersed and is published in different papers and journals by different authors, precluding a unified view of protein function. Even at the database level, where some information synthesis work could be done, all functional information on a given protein is rarely found in a single database.
A second general principle, which has become apparent from results accumulated over many years, is that the function of a protein is generally pleiotropic in at least two respects. First, within a given organisational level (molecular, cellular), a protein often has more than one function: at the molecular level for instance, we have several examples of DNA-binding proteins which are also RNA-binding proteins, such as the Xenopus TFIIIA [1], Drosophila bicoid [2] and modulo [3] proteins. Second, it is rare that the function of a protein can be described at one structural level and that no other observable function(s) are found at any other level: the Drosophila bicoid protein is an RNA-binding protein and a transcription factor at the molecular level, and it is an essential determinant of the formation of anterior structures (head, thorax) at the level of the organism. As will be discussed below, it is therefore important to examine functions of proteins within a structural level framework and in any case, it is always necessary to specify at which level the function is being examined. In conclusion, it is probably more accurate to speak about 'the functions' rather than 'the' function of a gene or a protein.

Contribution of classical and high-throughput methods to functional studies

Classical methods of studying gene/protein function have several interesting characteristics: a large spectrum of methods is available, and many different methods exist for each structural level of integration; many methods produce quantifiable results allowing (to a certain extent) a comparison of different proteins using the same functional test; the combination of results from different methods applied to the same protein produces a rich source of information and this explains how we now have accumulated detailed knowledge of the functions of several hundred proteins.
On the other hand, there are some drawbacks associated with these methods: no generic functional test is available that could be applied to all uncharacterised proteins; classical methods have not always been performed using standardised protocols, so that direct comparisons of functional results obtained in different laboratories are not generally straightforward; finally, the search for a function always gives an incomplete answer because not all possibilities are ever investigated in totality (for instance, the search for regulators of a given gene is often practically restricted to the most obvious candidates).

The arrival of genomics in the context of available methods as a means to decipher gene function now offers new possibilities that complement classical methods. For instance, if one is looking at genes under the control of a given regulatory protein in one species, it is theoretically possible, if a complete gene micro-array is available, to compare RNAs extracted from a wild-type specimen to RNAs from a loss-of-function and/or over-expression mutant of the regulatory gene, and subsequently to score all genes whose transcription status has changed. Of course, it remains to be determined whether the observed regulations are direct or indirect, but obtaining complete lists of regulated targets for many different genes is now potentially possible. Two of the most interesting aspects of such genomic approaches are that no hypothesis has to be made first (this marks a switch from hypothesis-driven to results-driven experiments) and that a large (or even complete) view of one precise aspect of the function of a gene is attainable.
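The wild-type versus mutant comparison described above can be sketched in a few lines. This is a minimal illustration only, assuming that expression values are already normalised intensities held in dictionaries; the function name changed_genes, the pseudocount, the two-fold threshold and the gene names are all hypothetical and are not taken from the paper.

```python
import math


def changed_genes(wild_type, mutant, min_abs_log2_fold=1.0, pseudocount=1.0):
    """Return {gene: log2 fold change} for genes whose expression changed.

    wild_type, mutant: dicts mapping gene name -> normalised intensity.
    A gene is scored as changed when |log2(mutant / wild type)| reaches the
    threshold (1.0 corresponds to a two-fold change, an assumed cut-off).
    """
    changed = {}
    # Only genes measured in both samples can be compared.
    for gene in wild_type.keys() & mutant.keys():
        ratio = (mutant[gene] + pseudocount) / (wild_type[gene] + pseudocount)
        log2_fold = math.log2(ratio)
        if abs(log2_fold) >= min_abs_log2_fold:
            changed[gene] = log2_fold
    return changed


# Hypothetical intensities for four genes in a wild-type specimen and in a
# loss-of-function mutant of a regulatory gene.
wild_type = {"geneA": 100.0, "geneB": 50.0, "geneC": 200.0, "geneD": 10.0}
loss_of_function = {"geneA": 20.0, "geneB": 55.0, "geneC": 900.0, "geneD": 11.0}

targets = changed_genes(wild_type, loss_of_function)
for gene in sorted(targets):
    direction = "down" if targets[gene] < 0 else "up"
    print(gene, direction, round(targets[gene], 2))
```

As the text notes, such a list says nothing about whether a regulation is direct or indirect; it only records which genes changed status under the chosen threshold.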
Also, since a high-throughput experiment can be considered as a massively parallel one, individual results can be satisfactorily compared, experimental conditions being the same for all obtained data points. The main problem associated with functional genomics approaches is that, at the moment, only a few functional experiment types have been scaled up to a high-throughput status: RNA expression quantification (micro-arrays and DNA chips), protein–protein interactions (double-hybrid) and in situ RNA hybridisation to a lesser extent are the best known examples. It is probable that in the near future, other types of functional experiments (protein or antibody micro-arrays for instance) will be developed and applied on a genome-wide basis, allowing new knowledge to be obtained rapidly. In the third section of the paper, we will return to the use of genomic methods and discuss function in the context of large genetic networks.

FUNCTIONAL DESCRIPTIONS FOR MACROMOLECULES

Functional ontologies

Karp [4] discusses the concept of biological function diversity and introduces the concepts of local function and integrated function that he applied essentially to prokaryotic organisms in the EcoCyc database [5]. In the example provided in the introduction, the biochemical function would be an instance of a local function, whereas the genetic function would be an instance of an integrated function. As far as eukaryotic organisms are concerned, the GO Consortium [6] has produced a structured controlled vocabulary that aims to describe the roles of gene products in any organism. To this end, it has produced three independent ontologies (the molecular function, biological process and cellular component ontologies), based on biological knowledge accumulated for the yeast Saccharomyces cerevisiae, the fly Drosophila melanogaster and the mouse Mus musculus. These attempts represent useful steps towards encoding functional data in databases.
It has to be noted that both in the prokaryotic EcoCyc database (for evident reasons) and also in the GO database (developed for a generic eukaryotic cell), functions occurring at the upper levels of tissue, organ, entire organism or population of organisms are not represented. This is an important limitation that will make any functional description of phenotypes encountered in developmental defects or physiological abnormalities for multicellular organisms very difficult. For instance, making functional links between sequence databases and the OMIM database (a catalogue of human genes and genetic disorders [7]) will probably prove to be difficult in many instances with the present ontologies. It would therefore be interesting to extend views developed in EcoCyc or GO ontologies in order to get a more complete description of functions of macromolecules in all types of organisms. In this respect, a first question is: what are the different levels of biological integration at which the function of a protein could be tentatively defined?

Examining function in the context of a hierarchy of structural levels

One possible way of addressing this question is to start at the molecular level and zoom out in a stepwise fashion in order to examine the different nested levels of biological integration from a structural point of view. In so doing, one can find at least six natural levels of increasing biological complexity (Table 1, left column). These six structural levels can be defined by specific components, structures, processes and concepts that are not valid in other levels. Furthermore, each of these levels often represents one traditional biological discipline such as biochemistry, cellular biology, developmental genetics, physiology and anatomy.
Not all six levels are necessary for a functional description of every protein, but when many different proteins are described, all levels will be used. Also, they are not necessary for every organism: when studying prokaryotic organisms, the tissue–organ level is useless and the cell and organism levels represent in fact the same level. Prokaryotes can therefore be studied at four different levels only and six levels are necessary to describe the variety of structures observed in eukaryotic organisms (with the exception of monocellular organisms such as yeasts or protists).

Table 1: Structural and functional levels in biological organisation. The six structural and corresponding functional levels of biological organisation proposed to serve as a framework for functional descriptions are listed in increasing order of organisational complexity

Structural levels | Functional levels
Molecules | Molecular complexes, interaction networks
Subcellular structures | Cell trafficking
Cells | Cell migrations, intercellular communications
Tissues, organs | Physiological regulations
Organisms | Behaviour
Populations | Interspecies relationships, ecological equilibria

Recognising the importance of function description at each of these six structural levels would have several advantages:

· These levels represent different biological realities which are quite natural to people working with eukaryotes and taking them into account will obviate difficulties encountered with artificial classifications.

· As already stated, these six levels correspond to different biological sciences that have developed a specific vocabulary, concepts and experimental methods. Ignoring some of these levels or trying to fuse some of them into a single level is likely to produce inconsistencies or even errors which could be detrimental to a complete description of gene function.
· We are totally ignorant of the biological laws allowing one to infer knowledge at one structural level from what is known in another one. For instance, trying to infer the cellular type of a eukaryotic cell (muscular, nervous or endodermal) from its detailed proteomic content is presently not possible (but seems an attainable goal in the future). Another classical example is to attempt to predict a phenotype from a molecular defect in a gene (filling in the genotype–phenotype gap), which is also nearly impossible now. We believe that understanding the biological level transition laws will first require a precise description of each level in structural and functional terms and that the more levels described, the better.

We therefore advocate that analyses at several structural levels are necessary to fully describe all known subtleties in eukaryotic function, and that ontologies encompassing these levels have to be created. As a working hypothesis, we propose six structural levels, namely the molecular (biochemical), subcellular, cellular, tissue/organ, organism and population levels (Table 1). In the second part of the next section, we describe how function may be described with respect to the different structural levels.

FUNCTIONAL DESCRIPTIONS OF PROTEINS

Are all proteins capable of establishing the same number of different interactions?

During the course of its biological life, from its synthesis until its degradation, any protein interacts with other partners in order to perform its function(s). To date, we have no precise idea of the number of existing functional interactions either at the level of individual proteins or that of an entire proteome.
Different proteins may have different numbers of interactors as a consequence of different intrinsic characteristics such as: (i) their size; (ii) their half-life; (iii) the specificity of their binding sites; (iv) their structure (number and nature of domains); (v) their subcellular locations and tissue/organ distribution; (vi) the type of organism in which they are found. Indeed, at the experimental level, we have many examples of the variation range of interaction for individual proteins, including cases of proteins with very few different interacting partners, as well as proteins that can be engaged in several hundreds of interactions. In Drosophila, for instance, the transcription factors Ultrabithorax and engrailed have been shown to have around 100 binding sites on polytene chromosomes [8,9]. At the other end of the scale, it seems that many bacterial regulators have a high specificity and interact with one gene only [10]. In the case of protein–protein interactions, browsing databases such as YPD [11] reveals proteins with more than 20 identified partners and others with only 1 or 2. However, a major difficulty is to assess whether this variation is real or partly reflects our incomplete knowledge at present, some proteins having been studied in far more detail than others. At the proteome level, it seems clear that the number of interactions in a cell largely exceeds the number of different proteins. Present minimal estimations of the number of different protein–protein interactions in yeast, based on two-hybrid screens, are in the range of 36,000 for approximately 6,300 proteins [12]. Estimations of the number of different transcriptional regulators per metazoan gene are in the 7–8 range [13], which, in Drosophila, would lead to around 110,000 protein–DNA transcriptional interactions for around 14,000 genes.
If additional factors that may increase protein diversity (alternative splicing, post-translational modifications) are taken into account, interactions could easily attain the millions range in a representative metazoan such as Drosophila. More realistic evaluations of the size of the 'interaction universe' (the interactome or regulome) must await a precise determination of the number of partners for a set of representative proteins in different functional classes.

Molecular interactions and genetic networks

We previously discussed the advantages of describing protein function in the context of a hierarchy of structural levels. How may this be achieved? We propose that a functional level is associated with each of the six structural levels described previously (Table 1, right column). In the context of this paper, we will present only the functional level that could be associated with the structural molecular level: that of interacting molecules (including macromolecular complexes) and genetic networks. When interactions are considered as a whole (a network), the complexity of the system is directly related to the number of specific interactions. Molecular interactions, ie direct physical interactions involving DNA, RNA and proteins, play an essential role in all known biological processes. Three major types of interactions account for the great majority of known biological macromolecular interactions: these are the protein–DNA, protein–RNA and protein–protein interactions. Several such direct interactions then form complex genetic networks that are capable of responding to both external stimuli and stresses, as well as to internal changes occurring within components of the network.
One might imagine genetic networks as a form of a molecular nervous system: they have a functional role at the level of a cell similar to that of a nervous system at the level of an organism. Being able to describe interactions and networks formally, and to query and manipulate them, is now largely recognised as essential to the study of gene regulation and function. As an important step towards the construction of a unified and physiological view of living organisms, several groups are developing mathematical models for the simulation of network behaviour [14–17] (see Smolen et al. [18] and Von Dassow et al. [19] for recent reviews). Recently, some of these theoretical developments have started to receive practical confirmation when small networks have been engineered and their predictable behaviour experimentally tested in prokaryotes [20–22].

Functional interaction maps of the genome

A first step towards the study of genetic networks is to obtain lists of experimentally determined functional links between specific proteins, genes and RNAs. Classical and high-throughput methods have produced a large amount of data on interactions and a small part of the data is already present in specialised databases such as DIP [23], KEGG [24], FlyNets [25,26], GeNet [27] and YPD [11]. However, the great majority of interaction data can presently be found in published literature only. Developing powerful tools to extract specific scientific information from texts (exploring the 'textome') will be strategic to help database development and annotation, and this is now an active bioinformatics research domain [28,29]. Amassing lists of interactions is only the first step in the establishment of gene functional interaction maps.
Interactions are extremely dynamic in nature and some important parameters are: (i) the duration of the interaction; (ii) the developmental stage at which the interaction occurs (in metazoans); (iii) the cell/tissue localisation of the interaction; (iv) the post-transcriptional status of proteins. Interaction maps will thus be far from static, and for instance, a map drawn for muscle cells in the developing embryo is likely to be somewhat different from one derived from adult pancreas.

Graphical representations of protein–protein or protein–DNA interactions already exist (see for instance the KEGG or the GeNet database, Table 2). Among different technical possibilities to represent interactions graphically, one of the most intuitive ones is the 2D (or matrix) interaction map. In a theoretical example (Figure 1A), an interaction is specified by filling in the intersecting cell corresponding to the two partners. Empty cells correspond to an absence of interaction; empty vertical lines correspond to a total absence of protein–DNA interaction for the corresponding protein (case of an integral membrane protein, or a protein which does not regulate any gene of the table for instance); empty horizontal lanes correspond to genes which are not regulated by any protein listed in the table. It is suggested that, by using appropriate image analysis or clustering software, one could extract some meaningful and simple patterns of regulation from an otherwise complex picture of relationships between proteins and genes. More precisely, patterns corresponding to: (i) three genes with common regulators (panel B), (ii) five proteins regulating common sets of genes (panel C) and (iii) autoregulated genes (panel D) could be revealed.

Table 2: Protein interaction resources on the Web

Name | Interaction type | URL
DIP (Database of Interacting Proteins) | Protein–protein | http://dip.doe-mbi.ucla.edu
FlyNets (Gene interactions in the fly) | Protein–protein, protein–DNA, protein–RNA | http://gifts.univ-mrs.fr/FlyNets
GeNet (Gene Networks database) | Protein–DNA, protein–protein | http://www.csa.ru:85/Inst/gorb_dep/inbios/genet/genet.htm
Transfac (Transcription factor database) | Protein–DNA | http://transfac.gbf-braunschweig.de/TRANSFAC/index.html
FlyBase (a database of the Drosophila genome) | Genetic interactions | http://fly.ebi.ac.uk:7081
KEGG (the Kyoto Encyclopedia of Genes and Genomes) | Regulatory pathways | http://www.genome.ad.jp/kegg
STKE (Signal Transduction Knowledge Environment) | Regulatory pathways | http://www.stke.org
SWISS-PROT (The SWISS-PROT database) | Protein–protein, protein–DNA, protein–RNA | http://www.expasy.ch/sprot/sprot-top.html
YPD, PombePD and WormPD (Proteome databases) | Protein–protein, protein–DNA | http://www.proteome.com/databases/index.html

Some selected WWW resources describing protein interactions with DNA, RNA or proteins are listed. The first four databases are essentially devoted to interactions, whereas the rest contain data on interactions as well as many other types of data

Present data sets for direct protein–DNA interactions are not yet large enough to test the idea on a real example. When sufficient data are available, such interaction maps could then be drawn at a genomic scale. In Drosophila for instance, for which more than 500 transcription factor genes have been identified at the genome sequence level (data compiled on the BDGP site) [30], a transcriptional regulation 2D interaction map would have approximately 500 proteins × 14,000 genes = 7,000,000 cells (out of which only a small subset will be active in any given cellular type).
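The 2D interaction map and two of the patterns mentioned above (genes with common regulators, autoregulated genes) can be illustrated with a short sketch. It is an assumption-laden toy, not the author's software: the function names and the protein–gene pairs are hypothetical, and it borrows the Figure 1 naming convention in which protein 'a' is taken to be the product of gene 'A'.

```python
from collections import defaultdict


def build_map(pairs):
    """Turn (protein, gene) interaction pairs into {protein: set of genes}."""
    protein_to_genes = defaultdict(set)
    for protein, gene in pairs:
        protein_to_genes[protein].add(gene)
    return dict(protein_to_genes)


def genes_sharing_regulators(protein_to_genes):
    """Group genes that are bound by exactly the same set of proteins."""
    gene_to_proteins = defaultdict(set)
    for protein, genes in protein_to_genes.items():
        for gene in genes:
            gene_to_proteins[gene].add(protein)
    groups = defaultdict(list)
    for gene, proteins in gene_to_proteins.items():
        groups[frozenset(proteins)].append(gene)
    # Keep only regulator sets shared by at least two genes.
    return {regs: sorted(genes) for regs, genes in groups.items() if len(genes) > 1}


def autoregulated_genes(protein_to_genes):
    """Genes bound by their own product (assumes protein 'x' is encoded by gene 'X')."""
    return sorted(p.upper() for p, genes in protein_to_genes.items() if p.upper() in genes)


# Hypothetical filled cells of a small protein-DNA interaction map.
pairs = [
    ("a", "E"), ("a", "F"), ("a", "G"),
    ("b", "E"), ("b", "F"), ("b", "G"),
    ("c", "C"),
    ("d", "F"),
]
interaction_map = build_map(pairs)

print(genes_sharing_regulators(interaction_map))  # genes E and G share regulators a and b
print(autoregulated_genes(interaction_map))       # ['C']

# Size of the genome-scale map quoted in the text for Drosophila.
print(500 * 14_000)  # 7000000
```

Storing only the filled cells, as here, is one reasonable design choice given that most of the 7,000,000 cells of a genome-scale map would be empty.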
In contrast, public web data for direct protein–protein interactions have now attained the minimal size allowing real test experiments to be made (at least in the case of the yeast Saccharomyces cerevisiae). An example of analysing real protein–protein interaction patterns is described in the last section.

COUNTING MOLECULAR INTERACTIONS: A TOOL FOR FUNCTIONAL COMPARISONS BETWEEN PROTEINS

As soon as the first protein sequences became available, biologists tried to compare them and progressively introduced useful measures for this purpose (identity and similarity percentages, Z-scores and BLAST scores). At the secondary and tertiary structure levels, also, methods were devised to compare protein structures. Very often, sequence and structural comparisons are used to infer functional relationships between proteins. Although inferring functional predictions from structural comparisons can lead to useful and testable hypotheses, it remains a risky exercise, which could lead to wrong conclusions [31]. Defining new ways of comparing proteins from a functional point of view would therefore be a very desirable goal.

Figure 1: The concept of molecular interaction 2D maps applied to protein–DNA interactions. Panel A shows a theoretical example of a protein–DNA interaction map. Proteins are listed on horizontal lines (a, b, c, . . .), and corresponding genes on vertical lines (A, B, C, . . .). Panels B, C and D show in black three examples of patterns extracted from the rest of all interactions (in light grey)

Considering proteins as individual members of an immense network, in which each protein has a finite number of interactions with several other specific molecular partners, could represent a new and (as far as we know) an as yet unexplored means by which to compare proteins at a functional level.
Basically, the idea is not to compare proteins themselves but instead to compare the lists of their partners: the more interacting partners two proteins have in common, the more likely these proteins are to be functionally related. Let us for instance consider three proteins A, B, C, each of them establishing 30 specific interactions (experimentally determined) with other protein partners. If A and C, B and C, and A and B have respectively 25, 13 and 2 common interactors, it seems intuitively reasonable to conclude that A and C are highly functionally related, that B and C share at least some functions and that A and B are probably not functionally related (or only marginally so).

The feasibility of this idea has been tested on a set of real protein–protein interactions from S. cerevisiae. Fourteen chromosomal proteins for which interaction data were available were extracted from the YPD database, as well as their specific interactors. A 2D interaction map was used to represent the results that appear graphically as PIPs (Protein Interaction Patterns) in Figure 2. The 14 proteins were grouped into four clusters, based on their PIPs, the fourth one (D) containing proteins that appear as non-functionally related, using an interaction criterion. The three other clusters, on the contrary, contain proteins with different levels of functional identity. These 14 proteins (named reference proteins) define 14 vertical columns and all their protein partners (named interactors) define 63 horizontal rows in which proteins are arranged in alphabetical order. Intersecting cells are filled in when a specific protein–protein interaction exists between a reference and an interactor. All protein names are YPD names.
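The partner-counting idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the proteins A, B and C and their partner lists p1 to p15 are invented toy data (much smaller than the 30-partner example in the text), and the function name is hypothetical.

```python
from itertools import combinations


def shared_partner_counts(partners):
    """Map each pair of reference proteins to its number of common interactors.

    `partners` maps a reference protein name to the set of its interactors.
    """
    return {
        (p, q): len(partners[p] & partners[q])
        for p, q in combinations(sorted(partners), 2)
    }


# Hypothetical toy data: three reference proteins with 8 partners each.
partners = {
    "A": {"p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"},
    "B": {"p6", "p9", "p10", "p11", "p12", "p13", "p14", "p15"},
    "C": {"p1", "p2", "p3", "p4", "p5", "p6", "p9", "p10"},
}

counts = shared_partner_counts(partners)

# Rank pairs from most to least shared partners.
ranking = sorted(counts, key=counts.get, reverse=True)
```

In this toy set A and C share six partners, B and C three, and A and B one, which reproduces the qualitative ordering of the example: A and C would be judged closely related, B and C partly related, and A and B hardly related at all.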
Each chromosomal reference protein establishes between 4 and 21 interactions, 10 proteins having 9 to 19 interactors. Out of 63 horizontal rows, 42 contain 2 to 7 filled cells, meaning that a majority of interactors are shared by 2 to 7 members from the reference set of 14 proteins. For the sake of clarity, results have been clustered according to vertical interaction patterns, thus grouping together reference proteins with similar sets of interactors. Four main clusters are visible, corresponding respectively to the Hhf1, Hhf2, Hht1 and Hht2 proteins (cluster A), Med2, Med4 and Pgd1 (cluster B), Sir3, Sir4, Tup1 and Ssn6 (cluster C) and Mig1, Sir2 and Alpha2 (cluster D).

Cluster A groups 4 reference proteins with a comparable number of interactors (12 to 14) and, out of a maximum of 14 interactors, 9 are shared by all 4 reference proteins (64 per cent functional identity). Cluster A can then be split up into 2 sub-clusters, A1 and A2, with 13 and 11 common interactors respectively. The 3 proteins of cluster B have 13 common interactors out of a maximum of 20 partners. Finally, within cluster C, Sir3 and Sir4 exhibit 10 common interactors out of a maximum of 15 partners, whereas Ssn6 does not share any common interactor with them, but Tup1 exhibits 4 common interactions with both Sir3 and Sir4 and 5 with Ssn6. Cluster D contains 3 proteins which do not seem to be functionally related using our representation, since out of 15 interaction cells for them, only two are present on one single horizontal line.

It is interesting to try to correlate the above observations with biological knowledge on the members of the functional interaction clusters. Cluster A is composed of histone sequences only and subclusters A1 and A2 (two genes each) represent one single protein each (Hhf1 and Hhf2 code for histone H4 and Hht1 and Hht2 for histone H3).
Identical proteins of clusters A1 and A2 appear almost functionally identical (92.8 and 92.3 per cent functional identity respectively). Interestingly, unrelated sequences exhibit functional similarities: histones H3 and H4 do not display any sequence similarity but appear functionally related through a PIP analysis (9/14 common interactors or 64 per cent functional identity). Analysis of cluster B (Med2, Med4 and Pgd1) leads to the same conclusion and reveals clear functional resemblances in the absence of sequence similarities. However, some interaction differences could potentially indicate functional subclasses within the same generic function. Finally, the four proteins in cluster C exhibit a fourth interesting type of interaction pattern: Sir3 and Sir4 appear functionally related (10 common interactors out of a maximum of 15 partners) whereas Ssn6 does not seem to have any direct functional relationship with them. However, when Tup1 is introduced in the comparison, it exhibits four common interactions with Sir3 and Sir4 and five common interactions with Ssn6. Tup1 therefore appears equally related to two groups that are not otherwise functionally related. Again, it has to be noted that no proteins of cluster C show any sequence similarity.

Although these results are still preliminary and have to be extended to other yeast proteins and to other organisms as well, the method appears promising to reveal functional resemblances that would not be detected by sequence comparison programs. It is also generic since it could be applied to any type of protein as soon as experimental interaction data are available. Moreover, it could also potentially be used for functional evolutionary comparisons between different organisms when lists of orthologous proteins are available.
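The 'functional identity' percentages quoted in this section (for instance 9/14, or 64 per cent) are consistent with dividing the number of interactors shared by all proteins of a group by the largest number of partners found in that group. The sketch below implements that reading; the formula is inferred from the quoted figures rather than stated explicitly in the text, and the interactor names (i1 to i9, x1 to x5, y1 to y4) are invented so that the set sizes mimic cluster A.

```python
def functional_identity(partner_sets):
    """Percentage of partners shared by every protein in a group.

    Assumed definition (inferred, not given in the text):
    100 * |partners common to all proteins| / |largest partner set|.
    """
    sets = [set(s) for s in partner_sets]
    largest = max(len(s) for s in sets)
    if largest == 0:
        raise ValueError("at least one protein must have a partner")
    shared = set.intersection(*sets)
    return 100.0 * len(shared) / largest


# Hypothetical group of four proteins with 12 to 14 partners each,
# of which 9 (the "core") are common to all four.
core = {f"i{n}" for n in range(1, 10)}
group = [
    core | {"x1", "x2", "x3", "x4", "x5"},  # 14 partners
    core | {"x1", "x2", "x3", "x4"},        # 13 partners
    core | {"y1", "y2", "y3"},              # 12 partners
    core | {"y1", "y2", "y3", "y4"},        # 13 partners
]

identity = functional_identity(group)
```

Here `identity` is 9/14 expressed as a percentage, which rounds to the 64 per cent reported for cluster A. Other normalisations (for example dividing by the union of all partners) would give different values, so the choice of denominator should be stated whenever such scores are compared.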
It is presently limited by the amount of interaction data available but could become increasingly useful as data from systematic interaction screens, such as wide-scale two-hybrid screens [32], are obtained.

Figure 2: Protein–protein interaction 2D map of a set of yeast chromosomal proteins. 14 chromosomal proteins with at least 4 identified protein interaction partners were selected from the YPD database (see Table 2 for the URL)

CONCLUSION

Although it seems quite intuitive, the concept of the function of a gene or a protein is not so simple and straightforward, and this point has been discussed using several examples. In biology, the function of an object (a molecule, a cell or an organ) is always associated with structural aspects, and genes/proteins are no exception to this rule. It has been concluded herein that a single structural level is not sufficient to describe the various aspects of gene/protein function and it is proposed that six structural levels (from the molecule to the population of organisms) have to be taken into account for a full functional description. Furthermore, it is advocated that a specific functional level has to be associated with each structural level, and the molecular network level that could be associated with the molecular level was taken as an example.

Whatever the prokaryotic or eukaryotic organism under consideration, until now biologists have essentially adopted a gene-by-gene approach. Although many different experimental results were often obtained on the same organism, the same tissue and under the same experimental conditions, they were quite difficult to integrate, as one might imagine would occur if different people worked on isolated pieces of a giant jigsaw puzzle. We advocate that studying protein function within the new conceptual frame of a network of interacting molecules could help to put the pieces together and to obtain an integrated view of the biological phenomena under scrutiny. In order to study the structure and function of regulatory networks, new tools have to be developed and 2D interaction maps are one of these tools. Moreover, it has been proposed that interaction maps could also be used to quantify functional relationships between proteins, allowing new types of classifications to be made, which could usefully complement comparisons made on a structural basis.

Genome projects have shown that there are no striking differences in the number of genes between organisms with very different organisational complexities: Drosophila has only two to three times more genes than the unicellular yeast and seems to have fewer genes than the nematode, although its anatomy and behaviour are far more complex. Clearly, the absolute number of genes does not seem to be an essential determinant of biological complexity. Rather, it could be that the number of interactions between genes and the structure of the regulatory network that they establish play a more important role. Studying functional relationships at the genome level (regulomics) is a new frontier in the postgenome era, which will probably allow important progress in the understanding of protein function throughout evolution.

Acknowledgements

It is a pleasure to thank Laurent Fasano for discussions, Laurence Röder and Denis Thieffry for discussions and constructive comments on the manuscript and Kim Dale for English corrections. A preliminary version of this work was presented at the HUGO workshop on gene function databases (Cambridge, May 1999; D. Davidson, organiser). My work is supported by the CNRS genome programme.

References

1. Romaniuk, P. J.
(1985), 'Characterization of the RNA binding properties of transcription factor IIIA of Xenopus laevis oocytes', Nucleic Acids Res., Vol. 25, pp. 5369–5387.
2. Rivera-Pomar, R., Niessing, D., Schmidt-Ott, U. et al. (1996), 'RNA binding and translational suppression by bicoid', Nature, Vol. 379, pp. 746–749.
3. Perrin, L., Romby, P., Laurenti, P. et al. (1989), 'The Drosophila modifier of variegation modulo gene product binds specific RNA sequences at the nucleolus and interacts with DNA and chromatin in a phosphorylation-dependent manner', J. Biol. Chem., Vol. 274, pp. 6315–6323.
4. Karp, P. D. (2000), 'An ontology for biological function based on molecular interactions', Bioinformatics, Vol. 16, pp. 269–285.
5. Karp, P. D., Riley, M., Paley, S. M. et al. (1999), 'EcoCyc: Encyclopedia of Escherichia coli genes and metabolism', Nucleic Acids Res., Vol. 27, pp. 55–58.
6. Ashburner, M., Ball, C. A., Blake, J. A. et al. (2000), 'Gene ontology: Tool for the unification of biology', Nat. Genet., Vol. 25, pp. 25–29.
7. Hamosh, A., Scott, A. F., Amberger, J. et al. (2000), 'Online Mendelian Inheritance in Man (OMIM)', Human Mutat., Vol. 15, pp. 57–61.
8. Botas, J. and Auwers, L. (1996), 'Chromosomal binding sites of Ultrabithorax homeotic proteins', Mech. Dev., Vol. 56, pp. 129–138.
9. Saenz-Robles, M. T., Maschat, F., Tabata, T. et al. (1995), 'Selection and characterization of sequences with high affinity for the engrailed protein of Drosophila', Mech. Dev., Vol. 53, pp. 185–195.
10. Thieffry, D., Huerta, A. M., Perez-Rueda, E. and Collado-Vides, J. (1998), 'From specific gene regulation to genomic networks: A global analysis of transcriptional regulation in Escherichia coli', Bioessays, Vol. 20, pp. 433–440.
11. Costanzo, M. C., Hogan, J. D., Cusick, M. E. et al. (2000), 'The yeast proteome database (YPD) and Caenorhabditis elegans proteome database (WormPD): Comprehensive resources for the organization and comparison of model organism protein information', Nucleic Acids Res., Vol. 28, pp. 73–76.
12. Legrain, P. and Selig, L. (2000), 'Genome-wide protein interaction maps using two-hybrid systems', FEBS Lett., Vol. 480, pp. 32–36.
13. Arnone, M. I. and Davidson, E. H. (1997), 'The hardwiring of development: organization and function of genomic regulatory systems', Development, Vol. 124, pp. 1851–1864.
14. Thomas, R. (1973), 'Boolean formalization of genetic control circuits', J. Theor. Biol., Vol. 42, pp. 563–585.
15. Thomas, R., Thieffry, D. and Kaufman, M. (1995), 'Dynamical behaviour of biological regulatory networks – I. Biological role of feedback loops and practical use of the concept of the loop-characteristic state', Bull. Math. Biol., Vol. 57, pp. 247–276.
16. Hlavacek, W. S. and Savageau, M. A. (1996), 'Rules for coupled expression of regulator and effector genes in inducible circuits', J. Mol. Biol., Vol. 255, pp. 121–139.
17. Sharp, D. H. and Reinitz, J. (1998), 'Prediction of mutant expression patterns using gene circuits', Biosystems, Vol. 47, pp. 79–90.
18. Smolen, P., Baxter, D. A. and Byrne, J. H. (2000), 'Mathematical modeling of gene networks', Neuron, Vol. 26, pp. 567–580.
19. Von Dassow, G., Meir, E., Munro, E. M. and Odell, G. M. (2000), 'The segment polarity network is a robust developmental module', Nature, Vol. 406, pp. 188–192.
20. Elowitz, M. B. and Leibler, S. (2000), 'A synthetic oscillatory network of transcriptional regulators', Nature, Vol. 403, pp. 335–338.
21. Gardner, T. S., Cantor, C. R. and Collins, J. J. (2000), 'Construction of a genetic toggle switch in Escherichia coli', Nature, Vol. 403, pp. 339–342.
22. Becskei, A. and Serrano, L. (2000), 'Engineering stability in gene networks by autoregulation', Nature, Vol. 405, pp. 590–593.
23. Eisenberg, D., Rice, D. W. and Xenarios, I. (1998), unpublished; URL: http://dip.doe-mbi.ucla.edu/
24. Ogata, H., Goto, S., Sato, K. et al. (1999), 'KEGG: Kyoto Encyclopedia of Genes and Genomes', Nucleic Acids Res., Vol. 27, pp. 29–34.
25. Mohr, E., Horn, F., Janody, F. et al. (1998), 'FlyNets and GIF-DB, two internet databases for molecular interactions in Drosophila melanogaster', Nucleic Acids Res., Vol. 26, pp. 89–93.
26. Sanchez, C., Lachaize, C., Janody, F. et al. (1999), 'Grasping at molecular interactions and genetic networks in Drosophila melanogaster using FlyNets, an Internet database', Nucleic Acids Res., Vol. 27, pp. 89–94.
27. Serov, V. N., Spirov, A. V. and Samsonova, M. G. (1998), 'Graphical interface to the genetic network database GeNet', Bioinformatics, Vol. 14, pp. 546–547.
28. Blaschke, C., Andrade, M. A., Ouzounis, C. and Valencia, A. (1999), 'Automatic extraction of biological information from scientific text: protein–protein interactions', in 'Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology', AAAI Press, Menlo Park, CA, pp. 60–67.
29. Craven, M. and Kumlien, J. (1999), 'Constructing biological knowledge bases by extracting information from text sources', in 'Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology', AAAI Press, Menlo Park, CA, pp. 77–86.
30. http://www.fruitfly.org/annot/menus/transcription_factor.html
31. Devos, D. and Valencia, A. (2000), 'Practical limits of function prediction', Proteins, Vol. 41, pp. 98–107.
32. Fromont-Racine, M., Rain, J. C. and Legrain, P. (1997), 'Toward a functional analysis of the yeast genome through exhaustive two-hybrid screens', Nat. Genet., Vol. 16, pp. 277–282.