An Integrated Database and Knowledge-Base of Interaction between Human Proteins and Commonly Used Drugs

Hiroyasu Shimada1 [email protected]
Masashi Nemoto1 [email protected]
Ken Horiuchi1 [email protected]
Atsuko Yamaguchi1 [email protected]
Motoi Tobita2 [email protected]
Kenji Araki3 [email protected]
Tetsuo Nishikawa1 [email protected]

1 Reverse Proteomics Research Institute, 2-6-7 Kazusa-Kamatari, Kisarazu, Chiba 292-0818, Japan
2 Hitachi, Advanced Research Laboratory, 1-280 Higashi-Koigakubo, Kokubunji, Tokyo 185-8601, Japan
3 Mochida Pharmaceutical, 722 Uenohara, Jimba, Gotemba, Shizuoka 412-8524, Japan

Keywords: protein-drug interaction, database, human full length cDNA, knowledge-base
1 Introduction
Finding new drug target proteins is indispensable for designing novel drugs. However, few target proteins
of commonly used drugs have been identified so far. It is therefore important to clarify the target proteins of
commonly used drugs, both to optimize those drugs and to design novel ones. The Reverse Proteomics
Research Institute (REPRORI) started a research project [1] in 2002 to build a platform for drug design. The
project aims to find new target proteins among 6,000 proteins obtained from the human full-length cDNA
clones generated in the NEDO FL project, using 800 known commonly used drugs as probes. In this paper,
we report a database system [2] that integrates annotation data on proteins, drugs, and interactions, and a
knowledge-base system that collects knowledge extracted from the database. One of the unique
characteristics of the database is that it collects comprehensive interaction data as matrices. From the
analysis of the database, various findings are expected to be derived, such as new drug target proteins,
different target proteins corresponding to different pharmacological effects or adverse effects, and new
pharmacophores. We therefore clustered the interaction matrices and analyzed the clusters to search for
properties of the drugs or proteins that are common to the members of each cluster. We also performed
modeling and docking analyses for the measured interactions. We developed a knowledge-base system that
collects the cluster analysis results and structural analysis results for interaction pairs. New target proteins
and the related information in the knowledge-base system will give us a basis for designing novel drugs.
2 Method and Results
Figure 1: Construction procedure of integrated database and knowledge-base
2.1 Construction Procedure of Integrated Database and Knowledge-Base
The five-step construction procedure is shown in Figure 1. In the information collection step, we collect
information on commonly used drugs, FL cDNA sequences and amino acid sequences, known interaction
data between drugs and proteins, and measured interaction data. In the pre-processing step, physical
properties of the drugs are calculated and the drugs are clustered based on those properties, the measured
interaction data are normalized, and the cDNA sequences are functionally annotated and clustered based on
genome mapping. In the database construction step, the pre-processed data are gathered into a PostgreSQL
database, and graphical user interfaces are built to visualize information on drugs, proteins, and
interactions. In the data-mining step, interaction cluster analysis and structural analysis are performed. In
the knowledge-base construction step, the results of the clustering analysis and structural analysis are gathered.
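To make the database construction step concrete, the following is a minimal sketch of how drugs, proteins, and interactions might be laid out relationally. It is not the authors' schema: the table and column names, the example rows, and the use of Python's built-in sqlite3 module as a stand-in for PostgreSQL are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema; the paper does not publish its table layout.
SCHEMA = """
CREATE TABLE drug (
    drug_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    cas_no  TEXT
);
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    clone_name TEXT NOT NULL
);
CREATE TABLE interaction (
    drug_id    INTEGER NOT NULL REFERENCES drug(drug_id),
    protein_id INTEGER NOT NULL REFERENCES protein(protein_id),
    source     TEXT NOT NULL CHECK (source IN ('known', 'measured')),
    method     TEXT,
    score      REAL
);
"""


def build_database(conn):
    """Create the tables and load a few made-up example rows."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO drug VALUES (?, ?, ?)",
        [(1, "drug_A", None), (2, "drug_B", None)],
    )
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?)",
        [(10, "clone_X"), (11, "clone_Y")],
    )
    conn.executemany(
        "INSERT INTO interaction VALUES (?, ?, ?, ?, ?)",
        [
            (1, 10, "measured", "SPR", 0.9),
            (1, 10, "known", "literature", None),
            (2, 11, "measured", "SEC", 0.4),
        ],
    )
    conn.commit()


def pairs_with_both_sources(conn):
    """Return (drug, protein) pairs supported by known AND measured data."""
    return conn.execute(
        """
        SELECT d.name, p.clone_name
        FROM interaction i
        JOIN drug d ON d.drug_id = i.drug_id
        JOIN protein p ON p.protein_id = i.protein_id
        GROUP BY d.name, p.clone_name
        HAVING COUNT(DISTINCT i.source) = 2
        ORDER BY d.name, p.clone_name
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_database(conn)
print(pairs_with_both_sources(conn))
```

Keeping known and measured interactions in one table with a source column is only one possible design; it makes cross-source queries such as the one above a single GROUP BY.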
2.2 Integrated Database System
The database contains annotations for more than 1000 commonly used drugs and for about 20000 proteins
that are encoded by the human full-length cDNA sequences. Annotations for drugs include CAS No.,
synonyms, structures, physical properties, pharmacological effects, and adverse effects. Annotations for
proteins include functional annotations, such as similarity and motif search results, and genome mapping.
The database also contains experimental interaction data obtained from several screening methods, such as
size exclusion chromatography and surface plasmon resonance, and known interaction data collected from
public databases such as ChemBank and from the literature. By using the database, we can refer to
information from journals, information from public databases, and experimental data at the same time, and
observe the connections among them. Examining these data together can reveal interactions that are related
to pharmacological effects or adverse effects.
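As an illustration of storing interaction data as matrices and of cross-referencing known and measured data, the sketch below pivots a list of interaction records into a binary drug-by-protein matrix and lists measured pairs that have no known counterpart. The record format and all drug and protein names are hypothetical, and treating every measured record as a binary hit is a simplifying assumption.

```python
# Hypothetical records: (drug, protein, source). Names are illustrative only.
records = [
    ("drug_A", "prot_1", "measured"),
    ("drug_A", "prot_1", "known"),
    ("drug_A", "prot_2", "measured"),
    ("drug_B", "prot_2", "known"),
]


def build_matrix(records, source):
    """Pivot records from one source into a binary drug x protein matrix."""
    drugs = sorted({d for d, _, _ in records})
    proteins = sorted({p for _, p, _ in records})
    hits = {(d, p) for d, p, s in records if s == source}
    matrix = [[1 if (d, p) in hits else 0 for p in proteins] for d in drugs]
    return drugs, proteins, matrix


def novel_candidates(records):
    """Measured pairs with no known record: candidate new targets."""
    measured = {(d, p) for d, p, s in records if s == "measured"}
    known = {(d, p) for d, p, s in records if s == "known"}
    return sorted(measured - known)


drugs, proteins, measured_matrix = build_matrix(records, "measured")
print(drugs, proteins, measured_matrix)
print(novel_candidates(records))
```

Rows and columns are sorted so that matrices built from different sources share the same axes and can be compared cell by cell.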
2.3 Knowledge-Base System
The construction procedure and interfaces of the knowledge-base system are shown in Figure 2. First, the
interaction matrix is clustered by using a clustering method that we developed, which uses the concept of
biclique cover and maximizes the average cluster size. To check the clustering results, we used BirdsAnts
[3], a viewer that we developed. For the obtained clusters, analyses such as pharmacophore analysis and
protein domain analysis were performed. If an interesting pair of a drug and a protein is found as a result of
the clustering analysis, and if the structure of that protein can be inferred, homology modeling and docking
calculation are performed. The analyzed data are accumulated in the knowledge-base system. In the
interaction cluster annotation window, the features extracted from the analysis of the clusters, such as
pharmacophores, are shown. The cluster is visualized together with several properties of the drugs and
proteins by BirdsAnts. In the structural annotation window, modeling and docking data are visualized by
using jV version 3 [4].
Figure 2: Construction of knowledge-base. a) Construction procedure, b) Interface of the system
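The paper does not describe the clustering algorithm beyond naming the biclique cover concept, so the following is only a generic greedy sketch of a biclique cover on a binary interaction matrix, not the authors' method. A biclique here is a set of drugs (rows) and a set of proteins (columns) such that every drug in the set interacts with every protein in the set. The function name and the example matrix are assumptions, and the heuristic makes no attempt to maximize the average cluster size.

```python
def biclique_cover(matrix):
    """Greedily cover all 1-entries of a binary matrix with bicliques.

    Each biclique is returned as (row_indices, col_indices), and every
    cell in rows x cols of a biclique is 1 in the input matrix.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    uncovered = {
        (r, c) for r in range(n_rows) for c in range(n_cols) if matrix[r][c]
    }
    cover = []
    while uncovered:
        # Deterministic seed: the smallest uncovered (row, col) pair.
        _, c0 = min(uncovered)
        # All rows that interact with the seed column.
        rows = [r for r in range(n_rows) if matrix[r][c0]]
        # All columns shared by every one of those rows.
        cols = [c for c in range(n_cols) if all(matrix[r][c] for r in rows)]
        cover.append((rows, cols))
        uncovered -= {(r, c) for r in rows for c in cols}
    return cover


def average_size(cover):
    """Mean number of matrix cells per biclique."""
    return sum(len(rows) * len(cols) for rows, cols in cover) / len(cover)


# Hypothetical 3 drugs x 3 proteins interaction matrix.
matrix = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
]
cover = biclique_cover(matrix)
for rows, cols in cover:
    print("drugs", rows, "proteins", cols)
print("average cluster size:", average_size(cover))
```

Each iteration covers at least its seed cell, so the loop terminates, and the bicliques may overlap. A method that optimizes the average cluster size, as described above, would need an explicit objective on top of this.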
3 Acknowledgments
This work was supported by a grant from NEDO Project of the Ministry of Economy, Trade and Industry of
Japan.
References
[1] Suwa, Y., Exploring novel drug targets through the comprehensive analysis of human protein-drug
interactions, HUGO's 10th Human Genome Meeting Kyoto, Japan 18-21 April 2005.
[2] Nemoto, M., Araki, K., Horiuchi, K., Tobita, M., and Nishikawa, T., An integrated database of interaction
between human proteins and commonly used drugs, Genome Inform., 14:599-600, 2003.
[3] Tobita, M., Horiuchi, K., Araki, K., Nemoto, M., and Nishikawa, T., BirdsAnts: bringing informative
rules from a database system, aimed at novel targets search, Genome Inform., 14:286-289, 2003.
[4] http://pdbj.protein.osaka-u.ac.jp/PDBjViewer/index.html