Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
SuperTarget: http:// insilico.charite.de/supertarget/ Matador: http://matador.embl.de/ STITCH: http://stitch.embl.de/ Chemicals in context: from SuperTarget and Matador to STITCH Michael Kuhn Peer Bork lab, EMBL Heidelberg [email protected] 1 Content: 7300 interactions in SuperTarget; subset: 4900 interactions in Matador Drug-Target Databases Published online 16 October 2007 Nucleic Acids Research, 2008, Vol. 36, Database issue D919–D922 doi:10.1093/nar/gkm862 SuperTarget and Matador: resources for exploring drug-target relationships Stefan Günther1, Michael Kuhn2, Mathias Dunkel1, Monica Campillos2, Christian Senger1, Evangelia Petsalaki2, Jessica Ahmed1, Eduardo Garcia Urdiales2, Andreas Gewiess3, Lars Juhl Jensen2, Reinhard Schneider2, Roman Skoblo3, Robert B. Russell2, Philip E. Bourne4, Peer Bork2,5 and Robert Preissner1,* 1 Structural Bioinformatics Group, Institute of Molecular Biology and Bioinformatics, Charité—University Medicine Berlin, Arnimallee 22, 14195 Berlin, 2EMBL—Biocomputing, Meyerhofstraße 1, 69117 Heidelberg, 3Institute for Laboratory Medicine, Windscheidstr, 18, 10627 Berlin, Germany, 4Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9500 Gilman Drive, La Jolla CA 92093, USA and 5Max-Delbrück-Center for MolecularMedicine (MDC), 13092 Berlin-Buch, Germany Received August 15, 2007; Revised September 26, 2007; Accepted September 27, 2007 ABSTRACT INTRODUCTION The molecular basis of drug action is often not well understood. This is partly because the very abundant and diverse information generated in the past decades on drugs is hidden in millions of medical articles or textbooks. Therefore, we developed a one-stop data warehouse, SuperTarget that integrates drug-related information about medical indication areas, adverse drug effects, drug metabolization, pathways and Gene Ontology terms of the target proteins. An easy-to-use query interface enables the user to pose complex queries, for example to find drugs that target a certain pathway, interacting drugs that are metabolized by the same cytochrome P450 or drugs that target the same protein but are metabolized by different enzymes. Furthermore, we provide tools for 2D drug screening and sequence comparison of the targets. The database contains more than 2500 target proteins, which are annotated with about 7300 relations to 1500 drugs; the vast majority of entries have pointers to the respective literature source. A subset of these drugs has been annotated with additional binding information and indirect interactions and is available as a separate resource called Matador. SuperTarget and Matador are available at http://insilico.charite.de/supertarget and http://matador.embl.de Within the past two decades our knowledge about drugs, their mechanisms of action and target proteins has increased rapidly. Nevertheless, knowledge on their molecular effects is far from complete. For some drugs even the primary targets are still unknown, for example, Diloxanide, Niclosamide and Ambroxol are administered successfully although their effect on human metabolism is still not clarified at a molecular level (1). Even if the medical effect has been explained by a certain molecular interaction, most drugs interact with several additional targets, which may either strengthen the therapeutic effect or cause unwanted adverse drug effects (2). Moreover, our knowledge on drugs and their targets is highly fragmented, most of it residing in millions of medical articles and textbooks, which precludes systematic studies. Several databases exist that collect binding data on small molecules, in particular drugs and proteins. The largest such resource is DrugBank (3), which contains 2600 drug-target relations for 900 FDA-approved drugs and additional annotations for 3200 experimental drugs. Another notable database is the Therapeutic Target Database (TTD) (4), which holds target information on about 1000 small molecule drugs. Unfortunately, DrugBank only provides references on the target, although generally not on the interactions, which makes it difficult to obtain information on the experimental context under which an interaction was observed. Moreover, the drugs in the TTD are not cross-linked 2 Manual Curation *To whom correspondence should be addressed. Tel: +49 30 8445 1649; Fax: +49 30 8445 1551; Email: [email protected] ! 2007 The Author(s) This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/ by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. • look for abstracts in PubMed/MEDLINE that mention genes and drugs • create candidate list • annotate candidate list 3 Direct Interactions 4 Prodrugs interact indirectly via a metabolite Some drugs bind to receptors, which affect other proteins (e.g. by phosphorylation or changes in gene expression) Indirect Interactions 5 Interactions with Proteins 6 Incomplete information: “Drug X interacts with dopamine receptors” — but which one? Interactions with multimers (e.g. NMDA receptors) Interactions with Protein Families 7 Demo of SuperTarget: Searching for a drug, showing drug information; putting drug on clipboard and using it to find cellular processes; put ontology term on clipboard and use it as query term find related drugs. 8 Chemicals in Context D684–D688 Nucleic Acids Research, 2008, Vol. 36, Database issue doi:10.1093/nar/gkm795 Published online 15 December 2007 STITCH: interaction networks of chemicals and proteins Michael Kuhn1, Christian von Mering2, Monica Campillos1, Lars Juhl Jensen1,* and Peer Bork1,3 1 European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany, 2University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland and 3Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany Received August 14, 2007; Revised September 14, 2007; Accepted September 17, 2007 ABSTRACT The knowledge about interactions between proteins and small molecules is essential for the understanding of molecular and cellular functions. However, information on such interactions is widely dispersed across numerous databases and the literature. To facilitate access to this data, STITCH (‘search tool for interactions of chemicals’) integrates information about interactions from metabolic pathways, crystal structures, binding experiments and drug–target relationships. Inferred information from phenotypic effects, text mining and chemical structure similarity is used to predict relations between chemicals. STITCH further allows exploring the network of chemical relations, also in the context of associated binding proteins. Each proposed interaction can be traced back to the original data sources. Our database contains interaction information for over 68 000 different chemicals, including 2200 drugs, and connects them to 1.5 million genes across 373 genomes and basis for the integration of knowledge about chemicals themselves, their biological interactions and their phenotypic effects. Thus, many problems in Chemical Biology are now becoming approachable by the academic research community. Valuable information about the biological activity of chemicals is provided by large-scale experiments. Phenotypic effects of chemicals were first made available on a large scale by the US National Cancer Institute (NCI), which conducts anti-cancer drug screens on 60 human tumour cell lines (NCI60) (4). The patterns of growth inhibition in the different cell lines by small molecules can not only be used to judge the efficacy of individual compounds, but also to relate compounds by their mechanism of action (5,6). Other unexpected relationships between compounds can be found using the PubChem BioAssay resource, where NCI60 data and many other assays are aggregated. As of July 2007, it contains 587 highly diverse assays, ranging from studies of single molecules to high-throughput screens with over 100 000 tested substances. Recently, the Connectivity Map project (7) set out to catalogue the perturbations in gene expression upon chemical treatment. As the Connectivity 9 Left: whole tree of life Right: only interactions in human Content • 373 genomes • 68,000 chemicals • 11,800 human genes • 38,000 chemicals • 2100 drugs 10 Downloaded from www.genome.org on January 30, 2008 - Published by Cold Spring Harbor Laboratory Press Chemical databases are aggregated in PubChem. Different interaction databases can be combined for research and exploration of chemical context. Yao and Rzhetsky within the network, although the drug targets in the GeneWays network tend to have slightly higher betweenness values than average (P-value = 0.1943; Fig. 2C). The increased average betweenness of drug targets is most obvious in the HPRD1 and HPRD 2 networks (Pvalues = 0.0004 and 0.004, respectively), suggesting that successful drug targets tend to bridge two or more clusters of relatively closely interacting molecules. The clustering coefficients of drug targets are similar to those of the rest of the network nodes in all five data sets (see Table 2; Fig. 2D). We next asked if proteins that are successful drug targets are less polymorphic (considering only human, intraspecies variation) than human genes on average. To answer this question, we used a Figure 1. Distribution of the number of human gene targets per successful drug. The plot is superlarge set (16,462 genes) of known huimposed on a family classification of drug targets. man single-nucleotide polymorphisms (SNPs) available at dbSNP (Sherry et al. 2001). To reduce any effects of SNP sampling bias (some genes The connectivity of a node within a graph is simply the total enjoy more attention on the part of the scientific community number of incoming and outgoing arcs (direct molecular interthan others), instead of studying the absolute number of reactions, in our case). As has been previously established, the conported SNPs for each gene, we used the ratio (Cratio) of nonsynnectivity distributions for real molecular networks are so-called onymous to synonymous SNPs (with an expected value of 1 for heavy-tail distributions resembling Zipf’s (Pareto’s or power-law) a perfectly neutral mode of SNP accumulation). The assumption distribution (Fig. 2A; Barabasi and Bonabeau 2003). The successunderlying this analysis is that sampling bias for a gene affects ful drug targets occupy a rather narrow niche within this distrisynonymous and nonsynonymous SNPs equally. bution: their connectivity is significantly higher than that of an Our analysis indicates (Fig. 2E,F) that Cratio for successful average node within the network (in GeneWays it is ∼9.1, Pdrug targets is significantly smaller than that for an average huvalue = 0.0064 [Fig. 2A,B,F]; in HPRD1 and HPRD2, it is 10.9 and man gene (P-value = 0.0007). This result suggests that successful 11.5, P-values = 0 and 0.0001, respectively; the same comparison drug targets tend to be less nonsynonymously polymorphic at performed using the smaller Y2H and BIND networks revealed no the human population level than are human genes on average. significant difference [see Table 2]). However, the average conFurthermore, Cratio is significantly negatively correlated with nectivity of drug targets is relatively small compared to the maxigene connectivity (Spearman rank correlation coefficient mum connectivity observed in the network (9.1 vs. a maximum !0.4841, P-value = 0.0000), consistent with the observation that of 346 in GeneWays). The most highly connected high-revenue more highly conserved proteins tend to have higher connectividrug targets in the GeneWays network (ABL1, androgen receptor ties (Fraser et al. 2002). Another line of evidence shows that [AR], BCHE, EGFR, INSR, NR3C1, TNF, and VEGFA; see Fig. 2G) highly expressed genes tend to evolve more slowly than those are targeted by drugs intended to provide relief for the most whose expression is low (Drummond et al. 2005). Furthermore, life-threatening phenotypes, such as cancer and autoimmune some experimental techniques, such as yeast two-hybrid prodisorders. The successful drugs targeting these highly connected tein–protein interaction screening, may detect interactions of genes and proteins are associated with terrible side effects (think highly expressed proteins more readily (Bloom and Adami 2003). of chemotherapy patients) that are tolerable only in life-or-death Hence, relationships between gene expression level, sequence situations. conservation, and connectivity may involve data biases and The betweenness of a network node is defined as the number should be interpreted with caution. of times this node appears in the shortest path between two other We interpret the results of our SNP analysis as follows: a network nodes, summed over all node pairs in the network and drug designed to target a protein that is polymorphic among divided by the total number of node pairs (e.g., Noh 2003). The clustering coefficient of a network node is the ratio of the actual number of direct connections between the immediate neighbors Table 1. Comparison of different human molecular interaction of the node to the maximum possible number of such direct arcs data sets between its neighbors (e.g., Holme and Kim 2002). The clustering No. of No. of No. of drug coefficient is zero if a node’s neighbors do not interact directly genes/proteins interactions targets covered (e.g., a professor who interacts with many graduate students, but whose students avoid talking to one another). The highest clusY2H 2936 5722 49 BIND 2886 4964 157 tering coefficient is attained in a complete graph where every GeneWays 4458 14,124 197 node is connected to every other node. The betweenness values HPRD1 7764 28,149 304 of the drug targets in the GeneWays, BIND, and Y2H networks HPRD2 9462 37,107 318 are not significantly different from those of the rest of genes 2 Genome Research www.genome.org 11 Links to Protein World http://string.embl.de 12 Demo of STITCH 13 Confidence view 14 Evidence view: Different line colors for different interaction sources. (Magenta: experiments, cyan: databases, yellow: textmining) 15 Actions view: Lines depict the nature of the interaction. (Blue: binding/“targeting”, red: inhibition, green: activation). 16 All interactions can be traced back to the original evidence. 17 Thank you for your attention! • SuperTarget: http://insilico.charite.de/supertarget/ • Matador: http://matador.embl.de/ • STITCH: http://stitch.embl.de/ 18