Drug-Target Databases Manual Curation

SuperTarget: http://insilico.charite.de/supertarget/
Matador: http://matador.embl.de/
STITCH: http://stitch.embl.de/
Chemicals in context: from SuperTarget and Matador to STITCH
Michael Kuhn
Peer Bork lab, EMBL Heidelberg
[email protected]
1
Content: 7300 interactions in SuperTarget; subset: 4900 interactions in Matador
Drug-Target Databases
Published online 16 October 2007
Nucleic Acids Research, 2008, Vol. 36, Database issue D919–D922
doi:10.1093/nar/gkm862
SuperTarget and Matador: resources for exploring drug-target relationships
Stefan Günther¹, Michael Kuhn², Mathias Dunkel¹, Monica Campillos², Christian Senger¹, Evangelia Petsalaki², Jessica Ahmed¹, Eduardo Garcia Urdiales², Andreas Gewiess³, Lars Juhl Jensen², Reinhard Schneider², Roman Skoblo³, Robert B. Russell², Philip E. Bourne⁴, Peer Bork²,⁵ and Robert Preissner¹,*
¹Structural Bioinformatics Group, Institute of Molecular Biology and Bioinformatics, Charité—University Medicine Berlin, Arnimallee 22, 14195 Berlin, ²EMBL—Biocomputing, Meyerhofstraße 1, 69117 Heidelberg, ³Institute for Laboratory Medicine, Windscheidstr. 18, 10627 Berlin, Germany, ⁴Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9500 Gilman Drive, La Jolla CA 92093, USA and ⁵Max-Delbrück-Center for Molecular Medicine (MDC), 13092 Berlin-Buch, Germany
Received August 15, 2007; Revised September 26, 2007; Accepted September 27, 2007
ABSTRACT
The molecular basis of drug action is often not well understood. This is partly because the abundant and diverse information on drugs generated over the past decades is hidden in millions of medical articles and textbooks. Therefore, we developed a one-stop data warehouse, SuperTarget, that integrates drug-related information about medical indication areas, adverse drug effects, drug metabolization, pathways and Gene Ontology terms of the target proteins. An easy-to-use query interface enables the user to pose complex queries, for example, to find drugs that target a certain pathway, interacting drugs that are metabolized by the same cytochrome P450, or drugs that target the same protein but are metabolized by different enzymes. Furthermore, we provide tools for 2D drug screening and sequence comparison of the targets. The database contains more than 2500 target proteins, which are annotated with about 7300 relations to 1500 drugs; the vast majority of entries have pointers to the respective literature source. A subset of these drugs has been annotated with additional binding information and indirect interactions and is available as a separate resource called Matador. SuperTarget and Matador are available at http://insilico.charite.de/supertarget and http://matador.embl.de
INTRODUCTION
Within the past two decades, our knowledge about drugs, their mechanisms of action and their target proteins has increased rapidly. Nevertheless, knowledge of their molecular effects is far from complete. For some drugs even the primary targets are still unknown; for example, Diloxanide, Niclosamide and Ambroxol are administered successfully although their effect on human metabolism has still not been clarified at a molecular level (1). Even if the medical effect has been explained by a certain molecular interaction, most drugs interact with several additional targets, which may either strengthen the therapeutic effect or cause unwanted adverse drug effects (2). Moreover, our knowledge of drugs and their targets is highly fragmented, most of it residing in millions of medical articles and textbooks, which precludes systematic studies.
Several databases exist that collect binding data between small molecules (in particular drugs) and proteins. The largest such resource is DrugBank (3), which contains 2600 drug-target relations for 900 FDA-approved drugs and additional annotations for 3200 experimental drugs. Another notable database is the Therapeutic Target Database (TTD) (4), which holds target information on about 1000 small-molecule drugs. Unfortunately, DrugBank generally provides references for the targets but not for the individual interactions. This makes it difficult to obtain information on the experimental context under which an interaction was observed. Moreover, the drugs in the TTD are not cross-linked
*To whom correspondence should be addressed. Tel: +49 30 8445 1649; Fax: +49 30 8445 1551; Email: [email protected]
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
2
Manual Curation
• look for abstracts in PubMed/MEDLINE that mention genes and drugs
• create candidate list
• annotate candidate list
3
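The three curation steps above can be sketched as a dictionary-based co-occurrence search followed by manual review. This is a minimal illustration, not the authors' pipeline. The synonym dictionaries, the example abstracts (invented sentences, not real PubMed records) and the verdict labels are all hypothetical. Whole-word, case-insensitive matching is an assumed simplification of real named-entity recognition.

```python
import re
from collections import defaultdict

# Hypothetical synonym dictionaries (canonical name -> surface forms).
# A real curation pipeline would load much larger, curated lists.
DRUGS = {
    "aspirin": ["aspirin", "acetylsalicylic acid"],
    "imatinib": ["imatinib", "Gleevec"],
}
GENES = {
    "PTGS1": ["PTGS1", "COX-1", "cyclooxygenase-1"],
    "ABL1": ["ABL1", "BCR-ABL"],
}


def find_mentions(text, dictionary):
    """Return canonical names with at least one synonym in text (whole word, case-insensitive)."""
    found = set()
    for name, synonyms in dictionary.items():
        for syn in synonyms:
            if re.search(r"\b" + re.escape(syn) + r"\b", text, re.IGNORECASE):
                found.add(name)
                break
    return found


def candidate_list(abstracts):
    """Collect drug-gene pairs co-mentioned in an abstract, most-supported pairs first."""
    pairs = defaultdict(list)
    for abstract_id, text in abstracts.items():
        for drug in find_mentions(text, DRUGS):
            for gene in find_mentions(text, GENES):
                pairs[(drug, gene)].append(abstract_id)
    return sorted(pairs.items(), key=lambda item: (-len(item[1]), item[0]))


def annotate(candidates, verdicts):
    """Attach a curator verdict to each candidate; pairs without one stay 'unreviewed'."""
    return [(pair, ids, verdicts.get(pair, "unreviewed")) for pair, ids in candidates]


# Invented example texts keyed by placeholder IDs.
abstracts = {
    "abs1": "Aspirin irreversibly acetylates COX-1 in platelets.",
    "abs2": "Imatinib inhibits the BCR-ABL tyrosine kinase.",
    "abs3": "Low-dose aspirin and cyclooxygenase-1 activity.",
}
candidates = candidate_list(abstracts)
reviewed = annotate(candidates, {("aspirin", "PTGS1"): "direct"})
```

Sorting by the number of supporting abstracts is one plausible way to let curators see the best-supported pairs first. Co-occurrence alone says nothing about whether an interaction is real, which is why the annotation step stays manual.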
Direct Interactions
4
Indirect Interactions
Prodrugs interact indirectly via a metabolite
Some drugs bind to receptors, which affect other proteins (e.g. by phosphorylation or changes in gene expression)
5
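One way to keep direct and indirect interactions apart is to store the mode of action and the mediating molecule with every drug-protein link. The record type below is a hypothetical sketch, not Matador's actual schema. The codeine/morphine example reflects the well-known prodrug relationship (codeine is converted to morphine, which acts on the mu-opioid receptor OPRM1). "drug X", "protein Y" and "receptor Z" are placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    DIRECT = "direct"                  # the drug binds the protein itself
    VIA_METABOLITE = "via metabolite"  # a prodrug acts through one of its metabolites
    VIA_SIGNALLING = "via signalling"  # e.g. a receptor alters phosphorylation or expression


@dataclass(frozen=True)
class Interaction:
    drug: str
    protein: str
    mode: Mode
    mediator: Optional[str] = None  # metabolite or receptor that carries an indirect effect

    def __post_init__(self):
        # An indirect link is only informative if we record what it passes through.
        if self.mode is Mode.DIRECT and self.mediator is not None:
            raise ValueError("direct interactions have no mediator")
        if self.mode is not Mode.DIRECT and self.mediator is None:
            raise ValueError("indirect interactions need a mediator")


def direct_only(interactions):
    """Keep only direct drug-protein contacts."""
    return [i for i in interactions if i.mode is Mode.DIRECT]


records = [
    Interaction("morphine", "OPRM1", Mode.DIRECT),
    # Codeine is a prodrug; its opioid effect is largely mediated by morphine.
    Interaction("codeine", "OPRM1", Mode.VIA_METABOLITE, mediator="morphine"),
    # Hypothetical placeholder for a receptor-mediated, downstream effect.
    Interaction("drug X", "protein Y", Mode.VIA_SIGNALLING, mediator="receptor Z"),
]
```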
Interactions with Proteins
6
Interactions with Protein Families
Incomplete information: “Drug X interacts with dopamine receptors”, but which one?
Interactions with multimers (e.g. NMDA receptors)
7
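Both cases above, family-level statements and multimeric targets, can be handled by expanding a group name into gene-level links while recording how each link was derived. This is a hypothetical sketch: the lookup tables and the provenance labels are assumptions made for illustration, not how SuperTarget or Matador actually encode them. The gene symbols are the standard HGNC names of the dopamine receptors and NMDA receptor subunits.

```python
# Hypothetical lookup tables mapping group names to HGNC gene symbols.
FAMILIES = {"dopamine receptors": ["DRD1", "DRD2", "DRD3", "DRD4", "DRD5"]}
COMPLEXES = {
    "NMDA receptor": ["GRIN1", "GRIN2A", "GRIN2B", "GRIN2C", "GRIN2D", "GRIN3A", "GRIN3B"],
}


def resolve_target(drug, target):
    """Turn one curated statement into gene-level links, recording how each was derived."""
    if target in FAMILIES:
        # Only the family is named, so each member is a possible but unconfirmed target.
        return [(drug, gene, "family member, unresolved") for gene in FAMILIES[target]]
    if target in COMPLEXES:
        # The drug acts on the assembled multimer; each subunit is linked via the complex.
        return [(drug, gene, "subunit of " + target) for gene in COMPLEXES[target]]
    return [(drug, target, "specific protein")]
```

Keeping the provenance label lets a downstream query decide whether ambiguous family-level links should be counted at all.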
Demo of SuperTarget: searching for a drug and showing the drug information; putting the drug on the clipboard and using it to find cellular processes; putting an ontology term on the clipboard and using it as a query term to find related drugs.
8
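The clipboard workflow in the demo amounts to joins between drug-target and target-annotation tables. The same holds for the abstract's example query "drugs that target the same protein but are metabolized by different enzymes". The sketch below uses small in-memory tables. All drugs, proteins and terms in it are hypothetical placeholders (the CYP names are real enzymes, but their assignment to these drugs is invented). It does not reproduce SuperTarget's real query engine.

```python
from itertools import combinations

# Hypothetical toy tables (not SuperTarget data): drug -> target proteins,
# protein -> ontology terms, drug -> metabolizing enzymes.
DRUG_TARGETS = {"drug A": {"P1", "P2"}, "drug B": {"P2"}, "drug C": {"P3"}}
PROTEIN_TERMS = {"P1": {"term X"}, "P2": {"term Y"}, "P3": {"term X"}}
METABOLIZED_BY = {"drug A": {"CYP3A4"}, "drug B": {"CYP2D6"}, "drug C": {"CYP3A4"}}


def terms_for_drug(drug):
    """Drug on the clipboard -> ontology terms annotated to its targets."""
    targets = DRUG_TARGETS.get(drug, set())
    return set().union(*(PROTEIN_TERMS.get(p, set()) for p in targets))


def drugs_for_term(term):
    """Ontology term on the clipboard -> drugs hitting at least one protein with that term."""
    return {
        drug
        for drug, targets in DRUG_TARGETS.items()
        if any(term in PROTEIN_TERMS.get(p, set()) for p in targets)
    }


def same_target_different_enzymes():
    """Drug pairs sharing a target but with no metabolizing enzyme in common."""
    hits = []
    for a, b in combinations(sorted(DRUG_TARGETS), 2):
        shared = DRUG_TARGETS[a] & DRUG_TARGETS[b]
        enzymes_overlap = METABOLIZED_BY.get(a, set()) & METABOLIZED_BY.get(b, set())
        if shared and not enzymes_overlap:
            hits.append((a, b, sorted(shared)))
    return hits
```

With these tables, `drugs_for_term("term X")` returns drug A and drug C. The third query reports only the pair (drug A, drug B), because the two drugs share P2 but are metabolized by different enzymes.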
Chemicals in Context
D684–D688 Nucleic Acids Research, 2008, Vol. 36, Database issue
doi:10.1093/nar/gkm795
Published online 15 December 2007
STITCH: interaction networks of chemicals and proteins
Michael Kuhn¹, Christian von Mering², Monica Campillos¹, Lars Juhl Jensen¹,* and Peer Bork¹,³
¹European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany, ²University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland and ³Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany
Received August 14, 2007; Revised September 14, 2007; Accepted September 17, 2007
ABSTRACT
Knowledge about interactions between proteins and small molecules is essential for understanding molecular and cellular functions. However, information on such interactions is widely dispersed across numerous databases and the literature. To facilitate access to this data, STITCH (‘search tool for interactions of chemicals’) integrates information about interactions from metabolic pathways, crystal structures, binding experiments and drug–target relationships. Inferred information from phenotypic effects, text mining and chemical structure similarity is used to predict relations between chemicals. STITCH also allows users to explore the network of chemical relations, including in the context of associated binding proteins. Each proposed interaction can be traced back to the original data sources. Our database contains interaction information for over 68 000 different chemicals, including 2200 drugs, and connects them to 1.5 million genes across 373 genomes and […]
[…] basis for the integration of knowledge about chemicals themselves, their biological interactions and their phenotypic effects. Thus, many problems in Chemical Biology are now becoming approachable by the academic research community.
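The STITCH abstract mentions chemical structure similarity, and the SuperTarget abstract mentions 2D drug screening. A common way to compare 2D structures is the Tanimoto coefficient on binary fingerprints. The sketch below represents each fingerprint as the set of its "on" bit positions. The fingerprints and the 0.5 cut-off are hypothetical, and the code does not claim to reproduce how either resource computes or thresholds similarity.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of 'on' bit positions."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similar_chemicals(query, library, threshold=0.5):
    """Return (name, score) for library entries at or above threshold, best first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(
        [(name, score) for name, score in hits if score >= threshold],
        key=lambda hit: (-hit[1], hit[0]),
    )


# Toy fingerprints; real ones would come from a cheminformatics toolkit.
library = {
    "compound 1": {1, 2, 3, 4},
    "compound 2": {1, 2, 3, 5},
    "compound 3": {7, 8},
}
hits = similar_chemicals({1, 2, 3, 4}, library)
```

Here compound 2 shares three of five distinct bits with the query, giving a score of 0.6, while compound 3 shares none and is filtered out.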
Valuable information about the biological activity of chemicals is provided by large-scale experiments. Phenotypic effects of chemicals were first made available on a large scale by the US National Cancer Institute (NCI), which conducts anti-cancer drug screens on 60 human tumour cell lines (NCI60) (4). The patterns of growth inhibition that small molecules produce across the different cell lines can be used not only to judge the efficacy of individual compounds, but also to relate compounds by their mechanism of action (5,6). Other unexpected relationships between compounds can be found using the PubChem BioAssay resource, where NCI60 data and many other assays are aggregated. As of July 2007, it contained 587 highly diverse assays, ranging from studies of single molecules to high-throughput screens with over 100 000 tested substances. Recently, the Connectivity Map project (7) set out to catalogue the perturbations in gene expression upon chemical treatment. As the Connectivity
9
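Relating compounds by their NCI60 growth-inhibition patterns can be illustrated by correlating their per-cell-line profiles. Compounds whose profiles rise and fall together are candidates for a shared mechanism of action. Pearson correlation is an assumed, simple choice here. The profiles are invented and cover six cell lines rather than the 60 of the real screen.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_by_profile(seed, profiles):
    """Rank other compounds by how closely their profile tracks the seed compound."""
    scores = [(name, pearson(profiles[seed], p)) for name, p in profiles.items() if name != seed]
    return sorted(scores, key=lambda item: -item[1])


# Hypothetical growth-inhibition values for six cell lines.
profiles = {
    "compound A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "compound B": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    "compound C": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "compound D": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
}
ranking = rank_by_profile("compound A", profiles)
```

Compound B, whose profile is a scaled copy of A's, ranks first. Compound C, with the mirror-image profile, ranks last.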
Content
Left: whole tree of life
• 373 genomes
• 68,000 chemicals
Right: only interactions in human
• 11,800 human genes
• 38,000 chemicals
• 2100 drugs
Downloaded from www.genome.org on January 30, 2008 - Published by Cold Spring Harbor Laboratory Press
Chemical databases are aggregated in PubChem.
Different interaction databases can be combined to explore the chemical context of a compound.
Yao and Rzhetsky
Figure 1. Distribution of the number of human gene targets per successful drug. The plot is superimposed on a family classification of drug targets.
The connectivity of a node within a graph is simply the total number of incoming and outgoing arcs (direct molecular interactions, in our case). As has been previously established, the connectivity distributions for real molecular networks are so-called heavy-tail distributions resembling Zipf’s (Pareto’s or power-law) distribution (Fig. 2A; Barabasi and Bonabeau 2003). The successful drug targets occupy a rather narrow niche within this distribution: their connectivity is significantly higher than that of an average node within the network (in GeneWays it is ∼9.1, P-value = 0.0064 [Fig. 2A,B,F]; in HPRD1 and HPRD2, it is 10.9 and 11.5, P-values = 0 and 0.0001, respectively; the same comparison performed using the smaller Y2H and BIND networks revealed no significant difference [see Table 2]). However, the average connectivity of drug targets is relatively small compared to the maximum connectivity observed in the network (9.1 vs. a maximum of 346 in GeneWays). The most highly connected high-revenue drug targets in the GeneWays network (ABL1, androgen receptor [AR], BCHE, EGFR, INSR, NR3C1, TNF, and VEGFA; see Fig. 2G) are targeted by drugs intended to provide relief for the most life-threatening phenotypes, such as cancer and autoimmune disorders. The successful drugs targeting these highly connected genes and proteins are associated with terrible side effects (think of chemotherapy patients) that are tolerable only in life-or-death situations.
The betweenness of a network node is defined as the number of times this node appears in the shortest path between two other network nodes, summed over all node pairs in the network and divided by the total number of node pairs (e.g., Noh 2003). The clustering coefficient of a network node is the ratio of the actual number of direct connections between the immediate neighbors of the node to the maximum possible number of such direct arcs between its neighbors (e.g., Holme and Kim 2002). The clustering coefficient is zero if a node’s neighbors do not interact directly (e.g., a professor who interacts with many graduate students, but whose students avoid talking to one another). The highest clustering coefficient is attained in a complete graph where every node is connected to every other node. The betweenness values of the drug targets in the GeneWays, BIND, and Y2H networks are not significantly different from those of the rest of genes within the network, although the drug targets in the GeneWays network tend to have slightly higher betweenness values than average (P-value = 0.1943; Fig. 2C). The increased average betweenness of drug targets is most obvious in the HPRD1 and HPRD2 networks (P-values = 0.0004 and 0.004, respectively), suggesting that successful drug targets tend to bridge two or more clusters of relatively closely interacting molecules. The clustering coefficients of drug targets are similar to those of the rest of the network nodes in all five data sets (see Table 2; Fig. 2D).
Table 1. Comparison of different human molecular interaction data sets
Data set   No. of genes/proteins   No. of interactions   No. of drug targets covered
Y2H        2936                    5722                  49
BIND       2886                    4964                  157
GeneWays   4458                    14,124                197
HPRD1      7764                    28,149                304
HPRD2      9462                    37,107                318
We next asked if proteins that are successful drug targets are less polymorphic (considering only human, intraspecies variation) than human genes on average. To answer this question, we used a large set (16,462 genes) of known human single-nucleotide polymorphisms (SNPs) available at dbSNP (Sherry et al. 2001). To reduce any effects of SNP sampling bias (some genes enjoy more attention on the part of the scientific community than others), instead of studying the absolute number of reported SNPs for each gene, we used the ratio (Cratio) of nonsynonymous to synonymous SNPs (with an expected value of 1 for a perfectly neutral mode of SNP accumulation). The assumption underlying this analysis is that sampling bias for a gene affects synonymous and nonsynonymous SNPs equally.
Our analysis indicates (Fig. 2E,F) that Cratio for successful drug targets is significantly smaller than that for an average human gene (P-value = 0.0007). This result suggests that successful drug targets tend to be less nonsynonymously polymorphic at the human population level than are human genes on average. Furthermore, Cratio is significantly negatively correlated with gene connectivity (Spearman rank correlation coefficient −0.4841, P-value = 0.0000), consistent with the observation that more highly conserved proteins tend to have higher connectivities (Fraser et al. 2002). Another line of evidence shows that highly expressed genes tend to evolve more slowly than those whose expression is low (Drummond et al. 2005). Furthermore, some experimental techniques, such as yeast two-hybrid protein–protein interaction screening, may detect interactions of highly expressed proteins more readily (Bloom and Adami 2003). Hence, relationships between gene expression level, sequence conservation, and connectivity may involve data biases and should be interpreted with caution.
We interpret the results of our SNP analysis as follows: a drug designed to target a protein that is polymorphic among
2 Genome Research www.genome.org
11
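The Yao and Rzhetsky excerpt defines three node-level statistics: connectivity, betweenness and the clustering coefficient. The sketch below computes them on small toy graphs and rests on several assumptions. Interactions are treated as unweighted and undirected, so connectivity is the degree. When several equally short paths join a pair, each path is counted fractionally, as in the usual definition of betweenness. The sum is divided by n(n-1)/2 pairs, which is an assumed reading, since the excerpt does not say whether pairs that include the node itself are counted.

```python
from collections import deque
from itertools import combinations


def make_graph(edges):
    """Undirected adjacency map built from (u, v) interaction pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def connectivity(graph, node):
    """Number of interactions touching the node (its degree in an undirected graph)."""
    return len(graph[node])


def clustering_coefficient(graph, node):
    """Links among the node's neighbours divided by the maximum possible k(k-1)/2."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return links / (k * (k - 1) / 2)


def _bfs(graph, source):
    """Shortest-path distances and numbers of shortest paths from source."""
    dist, paths = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w], paths[w] = dist[u] + 1, 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                paths[w] += paths[u]
    return dist, paths


def betweenness(graph, node):
    """Share of shortest paths through node, summed over pairs and divided by n(n-1)/2."""
    n = len(graph)
    if n < 2:
        return 0.0
    bfs = {s: _bfs(graph, s) for s in graph}
    dist_v, paths_v = bfs[node]
    total = 0.0
    for s, t in combinations(graph, 2):
        if node in (s, t):
            continue
        dist_s, paths_s = bfs[s]
        if t in dist_s and node in dist_s and dist_s[node] + dist_v[t] == dist_s[t]:
            # Fraction of the s-t shortest paths that pass through node.
            total += paths_s[node] * paths_v[t] / paths_s[t]
    return total / (n * (n - 1) / 2)


# The "professor and students" example from the text, plus a complete triangle.
star = make_graph([("prof", "s1"), ("prof", "s2"), ("prof", "s3")])
triangle = make_graph([("x", "y"), ("y", "z"), ("x", "z")])
```

In the star, the professor has clustering coefficient 0, as the excerpt describes, and lies on every student-to-student path. Every node of the complete triangle has clustering coefficient 1.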
Links to Protein World
http://string.embl.de
12
Demo of STITCH
13
Confidence view
14
Evidence view: line colors indicate the source of each interaction (magenta: experiments; cyan: databases; yellow: text mining).
15
Actions view: lines depict the nature of each interaction (blue: binding/“targeting”; red: inhibition; green: activation).
16
All interactions can be traced back to the original evidence.
17
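Traceability can be kept by grouping evidence per chemical-protein pair without discarding the individual records. A confidence value can then be derived from those records. The sketch below uses a simplified noisy-OR over the best score in each evidence channel. This is an assumption for illustration: it treats channels as independent and is not STITCH's actual scoring scheme. The scores and the "source-N" identifiers are hypothetical, and the channel names follow the evidence view described above.

```python
from collections import defaultdict


def merge_evidence(records):
    """Group records by (chemical, protein), keeping every record so each link stays traceable."""
    merged = defaultdict(list)
    for chemical, protein, channel, source, score in records:
        merged[(chemical, protein)].append({"channel": channel, "source": source, "score": score})
    return dict(merged)


def combined_score(evidence):
    """Simplified noisy-OR: 1 - prod(1 - s) over the best score in each evidence channel."""
    best = {}
    for item in evidence:
        best[item["channel"]] = max(best.get(item["channel"], 0.0), item["score"])
    remaining = 1.0
    for score in best.values():
        remaining *= 1.0 - score  # chance that every channel is wrong, if independent
    return 1.0 - remaining


# Hypothetical evidence records: (chemical, protein, channel, source, score).
records = [
    ("chemical 1", "protein A", "experiments", "source-1", 0.5),
    ("chemical 1", "protein A", "textmining", "source-2", 0.5),
    ("chemical 2", "protein B", "databases", "source-3", 0.9),
    ("chemical 2", "protein B", "textmining", "source-4", 0.4),
    ("chemical 2", "protein B", "textmining", "source-5", 0.2),
]
merged = merge_evidence(records)
```

Taking only the best score per channel keeps many redundant text-mining hits from inflating the combined value. Meanwhile, all five source records remain available for tracing each link back to its evidence.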
Thank you for your attention!
• SuperTarget: http://insilico.charite.de/supertarget/
• Matador: http://matador.embl.de/
• STITCH: http://stitch.embl.de/
18