Research and applications
Facilitating adverse drug event detection in
pharmacovigilance databases using molecular
structure similarity: application to rhabdomyolysis
Santiago Vilar,1 Rave Harpaz,1 Herbert S Chase,1 Stefano Costanzi,2 Raul Rabadan,1
Carol Friedman1
Additional materials are published online only. To view these files please visit the journal online (www.jamia.org/content/18/Suppl_1.toc).
1 Department of Biomedical Informatics, Columbia University Medical Center, New York, New York, USA
2 Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, USA
Correspondence to Dr Carol Friedman, Department of Biomedical Informatics, Columbia University Medical Center, 622 West 168th Street VC5, New York, NY 10032, USA; [email protected]
Received 7 June 2011
Accepted 22 August 2011
Published Online First
21 September 2011
ABSTRACT
Background Adverse drug events (ADE) cause
considerable harm to patients, and consequently their
detection is critical for patient safety. The US Food and
Drug Administration maintains an adverse event
reporting system (AERS) to facilitate the detection of
ADE in drugs. Various data mining approaches have been
developed that use AERS to detect signals identifying
associations between drugs and ADE. The signals must
then be monitored further by domain experts, which is
a time-consuming task.
Objective To develop a new methodology that
combines existing data mining algorithms with chemical
information by analysis of molecular fingerprints to
enhance initial ADE signals generated from AERS, and to
provide a decision support mechanism to facilitate the
identification of novel adverse events.
Results The method achieved a significant improvement
in precision in identifying known ADE, and a more than
twofold signal enhancement when applied to the ADE
rhabdomyolysis. The simplicity of the method assists in
highlighting the etiology of the ADE by identifying
structurally similar drugs. A set of drugs with strong
evidence from both AERS and molecular fingerprint-based modeling is constructed for further analysis.
Conclusion The results demonstrate that the proposed
methodology could be used as a pharmacovigilance
decision support tool to facilitate ADE detection.
The US Food and Drug Administration’s (FDA)
adverse event reporting system (AERS)1 is currently
used in the USA as the main source for drug safety
surveillance. The FDA receives reports of suspected
adverse drug events (ADE),2 which are then entered
into the AERS database and used to discover new
ADE. Despite its relative success, AERS has some
limitations: the system captures only approximately 10% of events,3 the number of patients
taking any drug is uncertain, and therefore the
number of patients actually at risk is unknown. In
addition, there are data quality issues preventing its
effective use.4 The limitations imply that many
serious ADE go undetected. It is now widely
recognized that only through the integration of
several information sources can a more effective
surveillance system be achieved.5 6
J Am Med Inform Assoc 2011;18:i73–i80. doi:10.1136/amiajnl-2011-000417
Medicinal chemistry has exploited the idea that molecules with similar chemical structures tend to have similar biological properties.7–9 Molecular fingerprints can be used to evaluate molecular similarity and to identify molecules structurally similar to those with a certain property.10 From the ADE perspective, computer-aided
methods based on molecular structures are useful for pharmaceutical companies working in drug design to predict toxicological effects, helping them select new candidates that are free of likely undesirable effects.11 However, computer-aided
structural studies intended to predict preclinical
ADE have important limitations due to the
complexity of modeling poly-pharmacology
systems with multiple mechanisms of action.11
Nevertheless, the chemical perspective on ADE discovery provided by structure-activity relationship (SAR) models could be very useful in combination with other types of sources, such as AERS, for detecting postmarketing ADE. The potential of SAR models to detect postmarketing ADE was studied by the FDA (informatics and computational safety analysis),12 which used AERS as a basis to construct SAR models. We hypothesize
that computer-aided structural studies, and in
particular molecular fingerprint-based modeling
(MFBM), can be used to analyze novel ADE for
which evidential support is not yet conclusive, to
enrich subsets of suspected drugs generated by
other sources likely to be interesting for further
study, and to explain the molecular mechanisms
responsible for ADE.
In this article, we present a new ADE discovery
paradigm that uses chemical information via
MFBM to enhance initial ADE signals generated
from another source, such as AERS, and provide
a decision support mechanism to facilitate the
identification of novel ADE. The basic idea is to assess the likelihood that a candidate drug, first flagged by another source, causes an ADE, based on its structural similarity to a set of drugs known to cause that event. Our
study is applied to the identification of drugs
reported in AERS that were found to be associated
with the serious ADE rhabdomyolysis, an important ADE characterized by skeletal muscle breakdown and the release of intracellular constituents
into the bloodstream. Different studies have been
published analyzing adverse events from the point
of view of AERS analysis or computer-aided structural studies.11 13 14 However, to the best of our
knowledge, no publication has combined the two
methods. In this paper we describe the combination of AERS analysis and MFBM, and show that it reduces the number of false positives and identifies adverse drug events better than either method by itself.
BACKGROUND
Molecular fingerprint-based modeling
The study of the relationships between molecular structure and
biological activity has become increasingly important in medicinal chemistry.15 The basic method of developing quantitative
molecular similarity models based on molecular fingerprints
consists of three steps.
In the first step the structures of a set of molecules must be
entered into the computer and prepared for analysis. It is very
important to collect a representative dataset. Once the collection
is completed, a set of cleaning rules should be applied to ensure
that the molecular structures are in a form suitable for the next
modeling step (see the Methods section for more details).16
The second step is the calculation of the molecular fingerprints associated with the molecules.17 Although there are
different types of fingerprints, the essential idea is to represent
molecules through a bit vector that codifies the existence of
particular structural features or functional groups. As an
example, using BIT_MACCS fingerprints, some of the substructures encountered in the molecule CH3-CH2-O-C(O)-NH2 are: bit 23, NC(O)O group; bit 84, NH2 (amine group); bit 114, CH3-CH2-A (ethyl group); and bit 123, OCO (ester group).16 18
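To make the bit-vector idea concrete, the sketch below packs a set of "on" structural keys into a single Python integer, loosely mirroring a bit-packed fingerprint. It assumes a 166-key scheme numbered from 1 and uses only the four keys quoted above for ethyl carbamate; a full MACCS fingerprint of that molecule would set more keys, and deriving keys from a structure requires a cheminformatics package such as MOE, which is not shown. The helper names are hypothetical.

```python
N_KEYS = 166  # assumed fingerprint length (166 structural keys, numbered 1..166)


def pack_bits(on_bits: set[int], n_keys: int = N_KEYS) -> int:
    """Pack 1-based key numbers into an int whose bit (k - 1) is set for key k."""
    fp = 0
    for k in on_bits:
        if not 1 <= k <= n_keys:
            raise ValueError(f"key {k} outside 1..{n_keys}")
        fp |= 1 << (k - 1)
    return fp


def unpack_bits(fp: int, n_keys: int = N_KEYS) -> list[int]:
    """Return the sorted 1-based key numbers that are set in a packed fingerprint."""
    return [k for k in range(1, n_keys + 1) if fp >> (k - 1) & 1]


# Only the keys quoted in the text for CH3-CH2-O-C(O)-NH2 (ethyl carbamate)
ethyl_carbamate = pack_bits({23, 84, 114, 123})
print(unpack_bits(ethyl_carbamate))     # [23, 84, 114, 123]
print(bin(ethyl_carbamate).count("1"))  # 4 features present
```

Packing into one integer keeps each fingerprint compact and lets set bits be counted directly, which is all a bit-count similarity measure needs.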
The final step in the process is the calculation of the structural
similarity of the collected molecules on the basis of their molecular fingerprints. There are different measurements to compare
the similarity between molecular fingerprints. The Tanimoto
coefficient (TC), which is one of the most widely applied, can
span values between 0 and 1, where 0 means ‘maximum
dissimilarity’ and 1 means ‘maximum similarity’.17 The TC
between two molecular fingerprints A and B is defined as:
$$TC = \frac{N_{AB}}{N_A + N_B - N_{AB}}$$
where N_A is the number of structural features present in fingerprint A (ie, the number of bits set to 1), N_B is the number of features present in fingerprint B and N_AB is the number of features present in both fingerprints A and B.
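The formula translates directly into Python if each fingerprint is treated as the set of its "on" bits, so that N_A and N_B are set sizes and N_AB is the size of their intersection. This is a minimal sketch: the second fingerprint is hypothetical, and returning 0.0 for two empty fingerprints is our own convention, not one stated in the text.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient TC = N_AB / (N_A + N_B - N_AB) on sets of 'on' bits."""
    n_a, n_b = len(fp_a), len(fp_b)
    n_ab = len(fp_a & fp_b)         # features present in both fingerprints
    denominator = n_a + n_b - n_ab  # size of the union of the two sets
    if denominator == 0:
        return 0.0  # convention assumed here for two empty fingerprints
    return n_ab / denominator


a = {23, 84, 114, 123}  # keys quoted in the text for ethyl carbamate
b = {23, 84, 123, 160}  # hypothetical second fingerprint
print(tanimoto(a, b))   # 3 / (4 + 4 - 3) = 0.6
print(tanimoto(a, a))   # 1.0 for identical fingerprints
```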
DrugBank database
The DrugBank database combines detailed chemical, biological
and pharmacological drug information with protein target data.
The database contains approved drugs as well as biotech,
nutraceuticals and experimental drugs.19 Importantly for our
study, it includes a field that specifies the structure (SMILES string)
of each molecule.
Pharmacovigilance databases and signal detection methods
There are different spontaneous reporting system databases that
have been created to provide postmarketing drug safety information, such as AERS,1 the European Medicines Agency20 or the
WHO international database.21 Pharmacoepidemiology databases, such as the general practice research databases in the
UK,22 or new initiatives such as the observational medical
outcomes partnership23 or the sentinel project5 also provide
important information for detecting new ADE and improving
drug safety. Researchers have also started to analyze clinical data
in electronic health records as new sources of data to detect
adverse events.24 In addition, multiple approaches have been
developed to detect adverse event signals in these databases,
most based on disproportionality measures, such as the relative
reporting ratio (RRR) or reporting odds ratio (ROR).24 A more
complex method is the gamma Poisson shrinker (GPS)
algorithm, which is used in this study.14 25
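The two disproportionality measures named above can be computed from a 2x2 table of report counts for one drug-event pair. The sketch below follows the standard definitions: RRR is the observed count divided by the count expected if drug and event were reported independently, and ROR is the odds of the event among reports with the drug divided by the odds among all other reports. The counts are toy values, not AERS data, and the cell names a to d are our own labels.

```python
from dataclasses import dataclass


@dataclass
class Contingency:
    """2x2 counts of spontaneous reports for one drug-event pair."""
    a: int  # reports with the drug and the event
    b: int  # reports with the drug, without the event
    c: int  # reports with the event, without the drug
    d: int  # reports with neither

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d


def relative_reporting_ratio(t: Contingency) -> float:
    """Observed count over the count expected under independence."""
    expected = (t.a + t.b) * (t.a + t.c) / t.n
    return t.a / expected


def reporting_odds_ratio(t: Contingency) -> float:
    """Odds of the event with the drug over the odds of the event without it."""
    return (t.a * t.d) / (t.b * t.c)


table = Contingency(a=10, b=90, c=40, d=860)  # toy counts
print(relative_reporting_ratio(table))        # 10 / (100 * 50 / 1000) = 2.0
print(round(reporting_odds_ratio(table), 3))  # 8600 / 3600 = 2.389
```

With small counts such as a = 1 or 2, both ratios can swing widely, which is the instability the GPS shrinkage is designed to temper.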
GPS algorithm
The GPS algorithm was developed by DuMouchel,25 and is the
signal generation algorithm endorsed and used by the FDA to
generate ADE signals in AERS.26 GPS is based on a Bayesian
modeling framework that accounts for the uncertainty in the RRR disproportionality measure (the ratio of the observed prevalence of a drug-event pair to its expected prevalence under the assumption of independence) that arises with small counts. It does so by ‘shrinking’ the RRR towards the baseline case of no association (a value of one) by an amount that increases with the variability of the RRR statistic. This shrinkage reduces spurious associations when there are not enough data to support them. GPS quantifies ADE signals with a measure called the empirical Bayes geometric mean (EBGM), the geometric mean of the posterior distribution of the RRR, which serves as a shrunk estimate of it.14
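A minimal sketch of the GPS posterior for a single drug-event pair, following DuMouchel's two-component gamma-mixture prior: the observed count n is Poisson with mean λE, where E is the count expected under independence, and λ has a prior that mixes Gamma(α1, β1) and Gamma(α2, β2) with weight P. The hyperparameter values below are illustrative placeholders, not fitted values. In GPS they are estimated from all drug-event pairs by maximizing the marginal likelihood, and the stratification by age, gender and year described in the Methods is omitted here. EBGM is computed as the posterior geometric mean, exp(E[log λ | n]).

```python
import math


def digamma(x: float) -> float:
    """Digamma function for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    tail = f * (1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))
    return result + math.log(x) - 0.5 / x - tail


def log_marginal(n: int, e: float, alpha: float, beta: float) -> float:
    """log P(N = n) when N ~ Poisson(lam * e) and lam ~ Gamma(shape=alpha, rate=beta)."""
    return (math.lgamma(alpha + n) - math.lgamma(alpha) - math.lgamma(n + 1)
            + alpha * math.log(beta / (beta + e)) + n * math.log(e / (beta + e)))


def ebgm(n: int, e: float, alpha1: float, beta1: float,
         alpha2: float, beta2: float, p: float) -> float:
    """Posterior geometric mean exp(E[log lam | n]) under a two-gamma mixture prior."""
    log_w1 = math.log(p) + log_marginal(n, e, alpha1, beta1)
    log_w2 = math.log(1.0 - p) + log_marginal(n, e, alpha2, beta2)
    top = max(log_w1, log_w2)  # normalize in log space for numerical stability
    w1, w2 = math.exp(log_w1 - top), math.exp(log_w2 - top)
    q = w1 / (w1 + w2)         # posterior weight of the first component
    # Each component's posterior is Gamma(alpha + n, beta + e), for which
    # E[log lam] = digamma(alpha + n) - log(beta + e)
    m1 = digamma(alpha1 + n) - math.log(beta1 + e)
    m2 = digamma(alpha2 + n) - math.log(beta2 + e)
    return math.exp(q * m1 + (1.0 - q) * m2)


# Illustrative placeholders only, not fitted GPS hyperparameters
PRIOR = {"alpha1": 0.2, "beta1": 0.1, "alpha2": 2.0, "beta2": 4.0, "p": 0.1}
print(ebgm(3, 0.5, **PRIOR))     # raw ratio 3 / 0.5 = 6, pulled well below 6
print(ebgm(300, 50.0, **PRIOR))  # same raw ratio with more data, stays close to 6
```

Both pairs have the same raw RRR of 6, but only the pair supported by many reports keeps an EBGM near 6. This is the sense in which shrinkage suppresses associations that rest on too few reports.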
METHODS
Overview
An overview of the proposed method is depicted in figure 1.
Initially, a training dataset composed of 196 drugs reported to
cause rhabdomyolysis was compiled using several reliable
resources. This dataset was subjected to a series of preparation
steps and then used to calculate molecular fingerprints, which
are binary vectors that codify the presence/non-presence of
different structural features in the molecule.16 Two different sets
of candidate drugs (1162 approved DrugBank drugs and 59 drugs
selected from AERS by the GPS algorithm) were subjected to the
same process as the training dataset. The TC was then used to
compare the similarity of the fingerprints between the candidate
drugs and the training dataset drugs. A TC cut-off value of 0.80
or more signifies that the molecules are highly similar.
Molecular fingerprint-based rhabdomyolysis model
As mentioned above, a training dataset of 196 molecules known
to be associated with the ADE rhabdomyolysis was gathered
from the literature (hereafter ‘rhabdomyolysis training dataset’)
and used to construct the MFBM. The list of the compounds
along with the corresponding references is given in supplementary
table S1, available online only. The molecules were collected
from different review articles, individual case reports and reliable
websites, such as http://drugs.com. The molecular operating
environment (MOE) software was used to represent the structures of all the molecules in this study,16 which were subsequently preprocessed with the MOE wash module using the following cleaning rules.
More specifically, group I metals in simple salts were disconnected from each molecule so that only the active ingredient
was retained (ie, the largest molecular fragment). Also, when
using the module wash, the protonation state was considered
neutral (the least charge-bearing form of the molecule) and
explicit hydrogens were added.
Finally, for all the molecules included in the study,
BIT_MACCS (MACCS structural keys bit packed) fingerprints
were calculated using the MOE software.16 18
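The largest-fragment rule of the wash step can be approximated at the string level, which shows what happens to a salt. This is a crude stand-in, not MOE's wash module: it splits a SMILES string on '.', counts heavy-atom tokens with a regular expression, and keeps the biggest component. It does not neutralize charges, add explicit hydrogens, or handle ring-closure bonds written across a dot, and the sodium salt examples are our own.

```python
import re

# Heavy-atom tokens in SMILES: bracket atoms, the two-letter organic-subset
# halogens, then one-letter aliphatic and aromatic organic-subset atoms
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def heavy_atom_count(fragment: str) -> int:
    """Approximate number of heavy atoms in one SMILES fragment ([H] would also count)."""
    return len(ATOM_TOKEN.findall(fragment))


def largest_fragment(smiles: str) -> str:
    """Keep the dot-separated SMILES component with the most heavy atoms."""
    return max(smiles.split("."), key=heavy_atom_count)


print(largest_fragment("CC(=O)[O-].[Na+]"))                  # CC(=O)[O-]
print(largest_fragment("[Na+].[Na+].O=C([O-])CCC(=O)[O-]"))  # O=C([O-])CCC(=O)[O-]
```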
Evaluation
Three evaluation studies were performed. The objective of the
studies was to measure performance of the MFBM by itself, of the
AERS signal detection GPS method by itself and of the combined
method. The precision of each method was calculated as the ratio of true positives to all cases predicted positive (true positives plus false positives). In binary classification, precision measures exactness: the proportion of true positives among all the cases the model selects as positive.
Figure 1 An overview of the adverse drug event detection process for rhabdomyolysis. SAR, structure-activity relationship.
Other metrics provided in the study are sensitivity and specificity.
Sensitivity measures the proportion of actual positives that are
correctly classified by the model and specificity measures the
proportion of actual negative cases correctly identified.
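The three metrics can be computed from confusion-matrix counts. The sketch below is plain Python. As a check, it uses the counts reported in Table 1 for the combined method at TC 0.80 (TP 9, FP 11, FN 4, TN 35).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def precision(self) -> float:
        """Share of predicted positives that are actual positives: TP / (TP + FP)."""
        return self.tp / (self.tp + self.fp)

    def sensitivity(self) -> float:
        """Share of actual positives the model recovers: TP / (TP + FN)."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Share of actual negatives the model rejects: TN / (TN + FP)."""
        return self.tn / (self.tn + self.fp)


combined = Confusion(tp=9, fp=11, fn=4, tn=35)  # Table 1, AERS + MFBM at TC 0.80
print(round(combined.precision(), 2))    # 0.45
print(round(combined.sensitivity(), 2))  # 0.69
print(round(combined.specificity(), 2))  # 0.76
```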
Signal generation using the MFBM
The first study measured the performance of the MFBM for
detecting drugs that may cause rhabdomyolysis by using the
1162 approved drugs in the DrugBank database. The DrugBank
database version 2.5 was downloaded19 and molecular fingerprints for all the drugs in DrugBank were computed using the
MOE software, in the same fashion described in the previous
section. Proteins were not taken into account in this study. As
described in the previous section, molecular fingerprints between
drugs in DrugBank and drugs in the rhabdomyolysis training
dataset were compared using the TC. A matrix file containing
the similarity between drugs based on TC was calculated using
the fingerprint cluster module and the sim_matrix2txt.svl script
in MOE.16 The similarity score of each drug evaluated by the model is defined as the maximum pairwise TC obtained for that drug against the drugs in the rhabdomyolysis training dataset. The drugs with a high TC value (TC ≥0.85) were analyzed in detail, and the literature was searched for rhabdomyolysis-related reports on those drugs. For comparison, a subset of the same size as the set of highly similar drugs was selected at random from DrugBank, and its literature results were compared with those of the highly similar set.
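The scoring rule above (a drug's score is its maximum pairwise TC against the training dataset) and the same-size random control can be sketched as follows. Fingerprints are modelled as sets of "on" keys; the drug names, fingerprints and random seed are hypothetical, and this is not a reproduction of the MOE fingerprint cluster module or the sim_matrix2txt.svl script.

```python
import random


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """TC as |A & B| / |A | B|, which equals N_AB / (N_A + N_B - N_AB)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_score(candidate: frozenset[int], training: list[frozenset[int]]) -> float:
    """Maximum pairwise TC of a candidate against every training-set fingerprint."""
    return max(tanimoto(candidate, fp) for fp in training)


def flag_similar(candidates: dict[str, frozenset[int]],
                 training: list[frozenset[int]], cutoff: float = 0.85) -> list[str]:
    """Names of candidates whose similarity score is at or above the cutoff."""
    return sorted(name for name, fp in candidates.items()
                  if similarity_score(fp, training) >= cutoff)


def random_control(candidates: dict[str, frozenset[int]], k: int, seed: int = 0) -> list[str]:
    """A same-size random subset of candidate names, used as a baseline."""
    return random.Random(seed).sample(sorted(candidates), k)


training = [frozenset({1, 2, 3, 4, 5, 6}), frozenset({10, 11, 12})]
candidates = {"drug_x": frozenset({1, 2, 3, 4, 5, 6, 7}), "drug_y": frozenset({10, 20, 30})}
flagged = flag_similar(candidates, training)
print(flagged)                                     # ['drug_x'] (score 6/7, about 0.857)
print(random_control(candidates, k=len(flagged)))  # one randomly chosen name
```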
Signal generation using GPS algorithm
The second evaluation study measured the performance of
signal detection using the GPS algorithm by itself as applied to
the AERS database. To generate initial rhabdomyolysis ADE
signals, 3 years of AERS reports were processed, corresponding
to reports from the years 2007 to 2009 and amounting to
1 310 334 individual reports. Demographic information, necessary for removing duplicate reports, was extracted from AERS
and linked to each report. Duplicate reports were identified and
removed using several heuristics described in a related earlier
study that involved searching for reports with an exact match of
drugs, adverse events and demographic data.13 Of the remaining reports, only those submitted by healthcare providers were retained. In addition, the study was limited to drugs reported as a ‘primary’ or ‘secondary’ suspect. To reduce drug naming redundancy and strengthen the signals, drug names were automatically mapped to UMLS codes using the natural language processing system MedLEE,27 and the corresponding generic names were then obtained using RxNorm.28 As an example, the brand-name drug Avandia was mapped to the UMLS code C0875967 by MedLEE, and then to the generic rosiglitazone (UMLS code C0289313) using RxNorm.
A more detailed mapping process has previously been
described.13 29 The overall preprocessing step resulted in
a reduced set of 431 430 individual reports, including 14 737
unique drugs and 12 017 unique events.
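The preprocessing steps above can be outlined as a small pipeline, applied in the order described: deduplicate, keep healthcare-provider reports, then normalize drug names. This is a simplified sketch. Duplicates are detected only by an exact match on drugs, events, age and sex (the study used several heuristics), the Report fields and reporter coding are hypothetical, and a one-entry dictionary stands in for the MedLEE and RxNorm mapping, which is not reproduced.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Report:
    drugs: frozenset[str]   # drugs already restricted to primary/secondary suspects
    events: frozenset[str]  # reported event terms
    age: str                # demographic fields used for duplicate detection
    sex: str
    reporter: str           # hypothetical coding, e.g. "healthcare_provider"


# Hypothetical one-entry stand-in for the MedLEE -> UMLS -> RxNorm mapping
BRAND_TO_GENERIC = {"avandia": "rosiglitazone"}


def to_generic(report: Report) -> Report:
    """Replace each drug name by its generic name when a mapping is known."""
    names = frozenset(BRAND_TO_GENERIC.get(d.lower(), d.lower()) for d in report.drugs)
    return replace(report, drugs=names)


def preprocess(reports: list[Report]) -> list[Report]:
    # 1. Drop exact duplicates (same drugs, events and demographics)
    seen, unique = set(), []
    for r in reports:
        key = (r.drugs, r.events, r.age, r.sex)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # 2. Keep only reports submitted by healthcare providers
    from_providers = [r for r in unique if r.reporter == "healthcare_provider"]
    # 3. Normalize drug names to generics
    return [to_generic(r) for r in from_providers]


reports = [
    Report(frozenset({"Avandia"}), frozenset({"Rhabdomyolysis"}), "64", "F", "healthcare_provider"),
    Report(frozenset({"Avandia"}), frozenset({"Rhabdomyolysis"}), "64", "F", "healthcare_provider"),
    Report(frozenset({"Tramadol"}), frozenset({"Nausea"}), "40", "M", "consumer"),
]
clean = preprocess(reports)
print(len(clean), sorted(clean[0].drugs))  # 1 ['rosiglitazone']
```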
After preprocessing the AERS reports, ADE signals were
generated using the GPS algorithm, which was implemented
exactly as specified in the original paper of DuMouchel,25
including stratification by age, gender and year, and using the
recommended seeding parameters. Finally, rhabdomyolysis ADE signals were identified by selecting all signals with the MedDRA30 term ‘rhabdomyolysis’ as the event, an EBGM of 2 or greater (a suggested threshold26) and a count of three or greater (ie, at least three reports associated with the signal). The GPS algorithm takes
into account corrections for multiple hypothesis testing by
shrinking the RRR to the baseline case of no association.
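The final selection rule (event term ‘rhabdomyolysis’, EBGM of 2 or greater, at least three reports) is a simple filter over the GPS output. In this sketch the Signal record, its field names and the drug names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    drug: str
    event: str   # MedDRA term of the drug-event pair
    ebgm: float  # GPS empirical Bayes geometric mean
    count: int   # number of reports supporting the pair


def rhabdomyolysis_candidates(signals, min_ebgm=2.0, min_count=3):
    """Drugs with a 'rhabdomyolysis' signal meeting the EBGM and report-count thresholds."""
    return sorted({
        s.drug for s in signals
        if s.event.lower() == "rhabdomyolysis"
        and s.ebgm >= min_ebgm
        and s.count >= min_count
    })


signals = [
    Signal("drug_a", "Rhabdomyolysis", 4.4, 5),   # passes every criterion
    Signal("drug_b", "Rhabdomyolysis", 2.5, 2),   # too few reports
    Signal("drug_c", "Rhabdomyolysis", 1.6, 12),  # EBGM below threshold
    Signal("drug_d", "Nausea", 3.0, 40),          # different event
]
print(rhabdomyolysis_candidates(signals))  # ['drug_a']
```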
The evaluation was based on measuring the performance of
the GPS algorithm applied to AERS to identify known ADE
according to case reports in the literature and Micromedex.31
Because some of the apparent false positives could be true positives that have not yet been reported, the precision may actually be higher (the receiver operating characteristic (ROC) curve is reported in supplementary figure S1, available online only).
Study combining AERS GPS method with MFBM
The drug-ADE signals generated using the GPS method in the
second study were considered as candidate drugs, and were
structurally compared with the drugs in the training dataset of
the MFBM. Molecular fingerprints and TC calculations were
computed for the AERS candidate drugs using the same methods
as described in previous sections.
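A sketch of the combined filter, under the same set-of-keys assumption as before. AERS candidates that already appear in the training dataset are skipped. Each remaining candidate is scored by its maximum TC against the training drugs, and candidates at or above 0.80 are kept together with their most similar training drug, matching the layout of Table 2. All names and fingerprints are hypothetical.

```python
def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """TC as |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def combined_filter(aers_candidates: dict[str, frozenset[int]],
                    training: dict[str, frozenset[int]],
                    cutoff: float = 0.80) -> list[tuple[str, float, str]]:
    """(candidate, max TC, most similar training drug) for candidates at or above the cutoff."""
    hits = []
    for name, fp in aers_candidates.items():
        if name in training:  # already a training drug: excluded from the evaluation
            continue
        nearest, score = max(((t, tanimoto(fp, t_fp)) for t, t_fp in training.items()),
                             key=lambda pair: pair[1])
        if score >= cutoff:
            hits.append((name, round(score, 2), nearest))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


training = {"train_1": frozenset(range(1, 11)), "train_2": frozenset({50, 51, 52})}
aers = {
    "cand_a": frozenset(range(1, 10)),   # 9 of train_1's 10 keys: TC 0.9
    "cand_b": frozenset({50, 60}),       # TC 0.25 against train_2
    "train_2": frozenset({50, 51, 52}),  # skipped: already in the training set
}
print(combined_filter(aers, training))  # [('cand_a', 0.9, 'train_1')]
```

Reporting the nearest training drug alongside the score is what lets a reviewer trace each flagged candidate back to a known cause of the ADE.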
RESULTS
Signal generation using MFBM only
All 1162 approved drugs from DrugBank were analyzed, and 127 of them were highly similar to drugs in the rhabdomyolysis training dataset (ie, TC ≥0.85). The TC similarity measures for
this subset of drugs in the DrugBank database are provided in
supplementary tables S2 and S3, available online only. When the
literature was searched, articles indicating that these drugs could
be the cause of rhabdomyolysis were found for 24 of the drugs
(see table 1 and supplementary table S3, available online only).
However, it is possible that some of the remaining 103 drugs
could actually cause rhabdomyolysis, although currently there
are no published reports concerning them. The results were
compared with a randomly selected subset of the same size, 127 different drugs from DrugBank. The literature was also searched for reports linking these drugs to rhabdomyolysis, and only six of the 127 drugs were reported to cause it (see supplementary table S4, available online only).
The estimated sensitivity of the model in the prediction of
known ADE is 0.44 and the specificity is 0.91. The precision of
the model is 0.19 (true positive/(true positive+false positive)).
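The text does not state how the sensitivity and specificity above were estimated. One reading that reproduces them, offered here only as a hypothesis, extrapolates the positive rate in the random control (6 of 127) to all 1162 approved drugs to estimate how many rhabdomyolysis-causing drugs DrugBank contains. Note that the footnote to Table 1 excludes training-set drugs, so the authors' actual denominator may differ.

```python
def estimated_rates(n_total: int, n_flagged: int, tp_flagged: int,
                    n_random: int, tp_random: int) -> tuple[float, float, float]:
    """Sensitivity, specificity and precision, estimating the number of true
    positives in the whole drug set from the positive rate of a random sample."""
    est_pos = n_total * tp_random / n_random  # estimated positives among all drugs
    est_neg = n_total - est_pos
    fp_flagged = n_flagged - tp_flagged
    sensitivity = tp_flagged / est_pos
    specificity = (est_neg - fp_flagged) / est_neg
    precision = tp_flagged / n_flagged
    return sensitivity, specificity, precision


# 1162 approved drugs; 127 flagged (24 with reports); random control: 6 of 127
sens, spec, prec = estimated_rates(1162, 127, 24, 127, 6)
print(round(sens, 2), round(spec, 2), round(prec, 2))  # 0.44 0.91 0.19
```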
Table 1 Comparison of the different ADE evaluations using MFBM, AERS and AERS plus MFBM
Evaluation                                                          Predicted positives   TP   FP    TP/FP   Precision TP/(TP+FP)
1 Signal generation using MFBM only*                                127                   24   103   0.23    0.19
2 Signal generation using GPS only*                                 59                    13   46    0.28    0.22
3 Signal generation using AERS GPS algorithm and MFBM (TC ≥0.80)    20                    9    11    0.82    0.45
Evaluation                                                          Predicted negatives   FN   TN    FN/TN   FN/(FN+TN)
3 Signal generation using AERS GPS algorithm and MFBM (TC <0.80)    39                    4    35    0.11    0.10
*Without taking into account the drugs included in the rhabdomyolysis training dataset.
ADE, adverse drug event; AERS, adverse event reporting system; FN, false negatives; FP, false positives; GPS, gamma Poisson shrinker; MFBM, molecular fingerprint-based modeling; TC, Tanimoto coefficient; TN, true negatives; TP, true positives.
Signal generation using GPS only
An initial set of 115 drugs associated with rhabdomyolysis was
identified using the GPS algorithm on the AERS reports. Out of
these 115 drugs, 56 were described in the literature to cause
rhabdomyolysis and were also part of the set of 196 drugs used
to construct our MFBM. Among the remaining 59 drugs,
a bibliographic search based on Micromedex31 and case reports
revealed that 13 of the drugs can cause rhabdomyolysis as an
ADE (see table 2). No information implicating the other 46 drugs as a cause of rhabdomyolysis was available (see supplementary table S5, available online only). The overall precision of AERS in this study, considering all 115 drugs, is 0.60 (true positive/(true positive+false positive)). If we take into account only the 59 drugs that were not included in the rhabdomyolysis training dataset, the precision is 0.22 (see table 1). Although AERS performed well at detecting drugs implicated in rhabdomyolysis, ADE detection can still be improved by combining AERS with MFBM, as we explain in the next section.
Signal generation using AERS GPS algorithm and MFBM
Out of the 115 drugs selected in AERS, 56 were already included in the rhabdomyolysis training dataset and thus were not used for the evaluation. Of the remaining 59 drugs, 13 were confirmed to cause rhabdomyolysis as an ADE; no reports implicating the other 46 drugs in rhabdomyolysis were found, so these were considered negative cases. Based on the TC, the MFBM highlighted nine of the 13 confirmed drugs as structurally similar (TC ≥0.80) to a drug in the rhabdomyolysis training dataset, and therefore as potentially rhabdomyolysis-related. The p value for the probability that the method identified these nine drugs by chance was 0.004, ie, highly unlikely. In addition, a random method would have selected only approximately four confirmed drugs, whereas the MFBM selected nine, thus achieving a greater than twofold AERS signal enrichment. The results showed that the application of the MFBM substantially reduces the number of false positives identified using AERS alone (see table 1). The ratio of true positives to false positives for the set of drugs with a TC of 0.80 or greater selected by the combined model is higher than the ratio provided by either method used individually (see table 1).
Table 2 Drugs with rhabdomyolysis causal reports found within the AERS test set
AERS drug candidate   Reference     Tanimoto coefficient   Similar drug in the rhabdomyolysis training dataset
Ofloxacin             32–34         1.00                   Levofloxacin
Tramadol              35 36         0.93                   Venlafaxine
Desflurane            31 32 37      0.89                   Isoflurane
Omeprazole            31 32 38–40   0.87                   Pantoprazole
Esomeprazole          31 41         0.87                   Pantoprazole
Telbivudine           31 32 42      0.85                   Zidovudine
Gatifloxacin          43            0.85                   Levofloxacin
Tiapride              44            0.83                   Amisulpride
Gabapentin            45 46         0.82                   Aminocaproic acid
Donepezil             47            0.68                   Heroin
Etravirine            31 48         0.66                   Trimethoprim
Metoclopramide        49            0.65                   Sunitinib
Amantadine            32 50         0.48                   Phentermine
The Tanimoto coefficient can span values between 0 and 1, where 0 means ‘maximum dissimilarity’ and 1 means ‘maximum similarity’.
AERS, adverse event reporting system.
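The chance-level figures reported for the combined method can be checked with a hypergeometric model: 20 of the 59 test drugs pass the TC ≥0.80 filter, 13 of the 59 are confirmed positives, and 9 confirmed positives fall among the 20. The text does not name the test used. A one-sided hypergeometric test (equivalently, a one-sided Fisher exact test) is our assumption, and it gives a p value of about 0.004, consistent with the reported value.

```python
from math import comb


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    top = min(successes, draws)
    favorable = sum(comb(successes, i) * comb(population - successes, draws - i)
                    for i in range(k, top + 1))
    return favorable / comb(population, draws)


population, confirmed, selected, hits = 59, 13, 20, 9
expected = selected * confirmed / population  # confirmed drugs expected by chance
p_value = hypergeom_upper_tail(hits, population, confirmed, selected)
print(round(expected, 1))         # 4.4
print(round(hits / expected, 2))  # 2.04, the enrichment over random selection
print(round(p_value, 4))          # 0.0039
```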
Figure 2 shows the resulting ROC curve for the classification of the 59 AERS test signals as true or false ADE using the MFBM. The area under the ROC curve is 0.7 (1 indicates a perfect classifier and 0.5 a random classifier). Using a cut-off value of 0.80 for the TC, the sensitivity obtained was 0.69 at a specificity of 0.76, with an F-measure of 0.72. True positives and false positives are shown in table 1. A comparison of the similarity distributions of the set of candidate rhabdomyolysis ADE signals and of a random set of drugs related to other ADE showed a statistically significant difference (Kolmogorov–Smirnov test p value of 0.006), indicating that MFBM detects different structural behavior in the two sets of drugs (see figure 3). Table 2 shows the TC for the 13 confirmed drugs as well as the most similar drug in the rhabdomyolysis training dataset.
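Two of these figures can be recomputed from numbers in the document. The sensitivity at the 0.80 cut-off follows from the TCs of the 13 confirmed drugs in Table 2. The reported F-measure of 0.72 matches the harmonic mean of sensitivity (9/13) and specificity (35/46); the conventional F1, which combines precision and sensitivity, would instead be 18/33 ≈ 0.55 for the Table 1 counts. The AUC of 0.7 cannot be reproduced here because the TCs of the 46 negative drugs are only in the online supplement, so the AUC helper, a rank-based (Mann-Whitney) estimate, is shown on toy scores.

```python
# Tanimoto coefficients of the 13 confirmed AERS drugs, from Table 2
CONFIRMED_TC = [1.00, 0.93, 0.89, 0.87, 0.87, 0.85, 0.85, 0.83, 0.82, 0.68, 0.66, 0.65, 0.48]


def sensitivity_at(cutoff: float, positive_scores: list[float]) -> float:
    """Fraction of positives whose score reaches the cutoff."""
    return sum(s >= cutoff for s in positive_scores) / len(positive_scores)


def roc_auc(positive_scores: list[float], negative_scores: list[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count one half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positive_scores) * len(negative_scores))


def harmonic_mean(x: float, y: float) -> float:
    return 2 * x * y / (x + y)


print(round(sensitivity_at(0.80, CONFIRMED_TC), 2))  # 0.69 (9 of 13)
print(roc_auc([0.9, 0.8], [0.1, 0.85]))              # toy scores: 0.75
print(round(harmonic_mean(9 / 13, 35 / 46), 2))      # 0.72
```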
We also performed a qualitative analysis of the predictions
when using the combined method, and found interesting
examples.
Rationalization of AERS predictions through MFBM
Examples of different pharmacological classes
Several examples show that the combination of the AERS GPS
analysis and MFBM can detect drugs that belong to different
pharmacological classes but are still structurally related. An
example of a drug found by our methodology is tramadol, which
is an analgesic (pain reliever) used in treating moderate to severe
pain. Based on AERS reports between 2007 and 2009, tramadol
(brand name Ixprim) was determined to be a candidate for the
ADE rhabdomyolysis with an EBGM measure of association25 of
4.4. According to our MFBM, tramadol was found to be similar to
the drug venlafaxine with a TC of 0.93 (see tables 2 and 3).
Venlafaxine is an antidepressant confirmed to cause rhabdomyolysis as a rare ADE.31 32 Several case reports in the literature35 36 support these findings, indicating that tramadol may induce rhabdomyolysis as a rare ADE through several mechanisms, including prolonged immobilization (due to central nervous system depression) and neuromuscular excitability (as in serotonin syndrome). Micromedex31 lists an increased creatine kinase level as an adverse event associated with tramadol, which could indicate a potential for rhabdomyolysis.
Another interesting drug selected by the model that can induce rhabdomyolysis is gabapentin, a GABA analogue used to relieve pain, especially neuropathic pain, and to treat depressive disorders. It is structurally similar to aminocaproic acid (one of the 196 drugs in our rhabdomyolysis training dataset), an inhibitor of fibrinolysis and analog of the amino acid lysine used to treat excessive bleeding (see tables 2 and 3).
Another interesting case for further analysis is fusidic acid,
described in the literature as an agent that can interact with
statins and produce rhabdomyolysis.51 52 No rhabdomyolysis
causal reports were found in which fusidic acid was administered to the patients as the only drug. However, its structure, with a tetracyclic ring system, is similar to that of carbenoxolone,
a known drug able to produce hypokalemia and rhabdomyolysis
(see table 3 and supplementary table S5, available online only).
Lower TC values can also be considered. As the TC decreases, the chance of finding drugs from different classes increases, which is more interesting than finding similar drugs in the same class, but the dissimilarity, and hence the risk of incorrect predictions, is also higher. With a TC greater than 0.75, cases such as misoprostol (similar to carbenoxolone) and ranolazine (similar to aripiprazole) can be found. Although some reports were found in the literature, causality cannot be established for the adverse event (see supplementary information, available online only).
Examples of same pharmacological classes
Many but not all of the predictions combining AERS GPS
analysis and MFBM correspond to drugs belonging to the same
pharmacological class, which can still be very useful for
researchers without a strong background in pharmacology.
Tiapride, a selective dopamine-2 receptor antagonist used as an
antipsychotic and also as a treatment for movement disorders
related to dopamine hyperactivity, is a suspected drug according
to FDA reports. The MFBM shows that it is similar to
amisulpride (see tables 2 and 3). Both drugs are implicated in rhabdomyolysis due to neuroleptic malignant syndrome.31 44 53 The system can thus suggest a possible mechanism of action by which the drug causes the adverse event.
The model can also detect the different enantiomers (stereoisomers that are mirror images of each other, with the same two-dimensional formula but different three-dimensional arrangements of their atoms) that compose a racemic mixture of a chiral drug (one containing a stereogenic center, such as an asymmetric carbon atom). This is the case for the proton pump inhibitors omeprazole and esomeprazole (see table 2), the racemic mixture and the S-enantiomer, respectively; both are captured by the model as being similar to pantoprazole (one of the 196 drugs in our rhabdomyolysis training dataset) and are described in the literature as drugs able to induce rhabdomyolysis.31 A possible explanation for the adverse effect is the inhibition of hydrogen potassium ATPase, which is responsible for intracellular pH equilibrium, leading to acidic intracellular conditions that degrade the cells.41
Figure 2 Receiver operating characteristic curve evaluating the test set of adverse event reporting system adverse drug event candidates with molecular fingerprint-based modeling.
Figure 3 Similarity distributions of the candidate rhabdomyolysis adverse drug event signals and random signals related to other adverse drug events (similarity is calculated against the rhabdomyolysis training dataset). FDA, Food and Drug Administration.
Another example is telbivudine, which is a nucleoside analog
used to treat hepatitis B infection. Its structure is similar to
zidovudine, the first antiretroviral drug approved for HIV
therapy (see tables 2 and 3). There are different possible mechanisms by which these drugs could cause muscle injuries, such as
mitochondrial DNA depletion, mitochondrial dysfunction,
reduced levels of L-carnitine, or apoptosis.54 Other drugs, such as the benzodiazepines oxazepam, lormetazepam and alprazolam, have a strong signal according to both AERS and MFBM, but no reports were found in the literature implicating them as a cause of the adverse event (see supplementary table S5, available online only). It is nevertheless possible that some of these drugs could actually cause rhabdomyolysis, although there are not yet any publications concerning them. Fosamprenavir, which is similar to darunavir (a drug in our rhabdomyolysis training dataset), could be another interesting case for further analysis.
DISCUSSION
The main goal of the study was to evaluate an approach that
combines ADE evidence from two different sources: the FDA’s
spontaneous reporting system (AERS) and MFBM, and to
demonstrate that the two sources combined together improve
the precision of ADE detection. In the first study, the performance of the MFBM is assessed in an independent dataset
(DrugBank). In the second study the GPS algorithm is applied to
AERS. We showed that the GPS algorithm is an excellent
method to generate sets of drugs highly associated with rhabdomyolysis. However, the sets of drugs selected by the GPS score include many false positives, because confounding drugs that are not causally related to the ADE can still score well; these sets can therefore be improved by filtering out some of the confounders. The
application of the fingerprint model rationalizes the AERS
signals selected with the GPS score and provides a smaller set of
candidates with better enrichment factors. Although the
i78
fingerprint model performs similarly in an independent dataset,
the fingerprint model applied to the GPS candidates offers
a better true positive/false positive ratio due to the higher
concentration of rhabdomyolysis drugs provided by the GPS
algorithm. The combined method cannot increase the number of
true positives but considerably reduces the number of false
positives detected by AERS analysis. When we ignored drugs
already in our rhabdomyolysis training dataset, we found
a twofold precision improvement. The simplicity of the model
assists in highlighting the etiology of the ADE by identifying
structurally similar drugs, for which information is available and
can be used to help understand possible causes, such as the
mechanism of action. Although we used the AERS data in this
study to test the method for the ADE rhabdomyolysis, the
method could also apply to the use of electronic health records
as an initial source of ADE detection as well as to ADEs other
than rhabdomyolysis.
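The two-stage filtering described in this paragraph can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The drug names, labels, EBGM and TC values are hypothetical toy data. The thresholds `EBGM_MIN` and `TC_MIN` are arbitrary placeholders, not the cutoffs used in the study.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    drug: str
    ebgm: float       # GPS signal score (hypothetical value)
    max_tc: float     # highest Tanimoto coefficient to any training-set drug (hypothetical)
    causes_ade: bool  # reference label, e.g. from a literature review (hypothetical)


# Hypothetical toy data, not values from the study.
CANDIDATES = [
    Candidate("drug_a", 12.0, 0.91, True),
    Candidate("drug_b", 6.5, 0.88, True),
    Candidate("drug_c", 5.0, 0.42, False),
    Candidate("drug_d", 4.1, 0.35, False),
    Candidate("drug_e", 3.0, 0.86, False),
    Candidate("drug_f", 2.8, 0.30, True),
    Candidate("drug_g", 1.2, 0.95, False),
]

EBGM_MIN = 2.0  # placeholder GPS threshold (assumption, not the study's cutoff)
TC_MIN = 0.8    # placeholder similarity threshold (assumption, not the study's cutoff)


def counts(selected: List[Candidate]) -> Tuple[int, int]:
    """Return (true positives, false positives) among the selected candidates."""
    tp = sum(1 for c in selected if c.causes_ade)
    return tp, len(selected) - tp


def precision(selected: List[Candidate]) -> float:
    """Fraction of selected candidates that are true positives (0.0 if none selected)."""
    tp, fp = counts(selected)
    return tp / (tp + fp) if selected else 0.0


# Stage 1: AERS signals retained by the GPS score alone.
gps_only = [c for c in CANDIDATES if c.ebgm >= EBGM_MIN]
# Stage 2: keep only GPS signals that are also structurally similar to a known causal drug.
combined = [c for c in gps_only if c.max_tc >= TC_MIN]

print("GPS only:  ", counts(gps_only), round(precision(gps_only), 2))
print("GPS + MFBM:", counts(combined), round(precision(combined), 2))
```

With these toy numbers the second stage removes two false positives and also drops drug_f, a true positive. Precision rises from 0.5 to about 0.67. Because the second stage only removes drugs, the true-positive count can stay the same or fall but never rise.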
This system is not designed to replace the existing pharmacovigilance methods used to evaluate the importance of signals (ie, assessment of a signal's potential relevance or biological plausibility). Its purpose is to enhance those methods by providing additional information for decision making. Because the model is simple, the researcher can identify, for a given ADE, the drug in the training dataset that is most similar to each AERS drug candidate. This is useful when examining the reports and the available information to decide how important the signal is.
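Here is a minimal sketch of that lookup. It assumes each molecule is represented as the set of "on" bit positions in a structural-key fingerprint. It also assumes a candidate's score is its maximum Tanimoto coefficient (TC) against the training set, which is consistent with Table 3 pairing each candidate with its most similar training drug and a TC. The bit sets below are invented for illustration and are not real MACCS keys. In practice fingerprints would come from a cheminformatics toolkit. The study cites MOE. RDKit's `MACCSkeys.GenMACCSKeys` is an open-source option, though its keys may not match MOE's BIT_MACCS bit for bit. The function names are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Fingerprint = FrozenSet[int]  # positions of the bits set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient c / (a + b - c) for binary fingerprints."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def most_similar_training_drug(
    candidate: Fingerprint, training: Dict[str, Fingerprint]
) -> Tuple[str, float]:
    """Return the training-set drug with the highest TC to the candidate, and that TC."""
    best = max(training, key=lambda name: tanimoto(candidate, training[name]))
    return best, tanimoto(candidate, training[best])


# Invented fingerprints for illustration only.
training_set = {
    "training_drug_1": frozenset({1, 4, 9, 16, 25, 36}),
    "training_drug_2": frozenset({2, 3, 5, 7, 11}),
}
candidate_fp = frozenset({1, 4, 9, 16, 25, 49})

name, tc = most_similar_training_drug(candidate_fp, training_set)
print(name, round(tc, 2))
```

In this nearest-neighbour formulation, adding a newly confirmed causal drug to the training set only requires a new dictionary entry. No refitting step is involved.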
When the MFBM was applied to a large set of drugs in the DrugBank database19 (see supplementary tables S2-S4, available online only), the results were similar to those previously reported.11 On their own, fingerprint-based models have limited predictive power, because human clinical adverse events are complex to model. The performance of our fingerprint-based model was probably affected by the high molecular variability of the drugs involved. It was also probably affected by the large number of potential targets and biological mechanisms associated with rhabdomyolysis. Nevertheless, the model is useful for generating sets of drugs with good enrichment factors.
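As a worked illustration of the enrichment factor, the sketch below uses one common definition from virtual screening:

EF = (hits selected / drugs selected) / (total hits / total drugs)

In other words, EF is the precision of the selected set divided by the base rate of positives in the whole set. This is an assumption: the exact formula used in the study is given in its Methods, which are not reproduced here. The counts below are hypothetical.

```python
def enrichment_factor(hits_selected: int, n_selected: int, hits_total: int, n_total: int) -> float:
    """Precision of the selected subset divided by the base rate of positives."""
    if n_selected == 0 or hits_total == 0:
        raise ValueError("need a non-empty selection and at least one positive")
    return (hits_selected / n_selected) / (hits_total / n_total)


# Hypothetical example: 1,000 drugs, of which 50 cause the ADE (5% base rate).
# A model selects 40 drugs, 12 of which are true positives (30% precision).
ef = enrichment_factor(hits_selected=12, n_selected=40, hits_total=50, n_total=1000)
print(round(ef, 2))
```

Because EF divides precision by the base rate, precision equals EF times the base rate. A model with a similar EF therefore yields higher absolute precision when its input set is already enriched in positives. This is consistent with the better true positive/false positive ratio reported above for the GPS candidates.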
The performance of the model depends heavily on constructing a suitable training dataset for a given ADE, one that represents the different structural classes of drugs strongly associated with the adverse event. In addition, training datasets can be updated continuously to improve performance whenever a new compound is identified as causing the event. BIT_MACCS fingerprints have been shown to represent molecular structure successfully,55 56 but alternative structural representations of molecules could also be explored. More sophisticated models based on methods such as neural networks or support vector machines could also be developed, but they would make the process more complex.
When evaluating performance, we counted drugs not yet known to cause rhabdomyolysis as false positives. Some of these drugs may in fact cause rhabdomyolysis through associations that have not yet been discovered. The true false positive rate of the method may therefore be lower than the rate we measured.
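The direction of this bias can be stated explicitly. This assumes the standard definition of the false positive rate, since the study's own evaluation formulas are not reproduced in this section. Suppose that k of the FP drugs counted in the evaluation actually cause rhabdomyolysis. Reclassifying them as true positives gives

$$\mathrm{FPR}_{\mathrm{obs}} = \frac{FP}{FP + TN}, \qquad \mathrm{FPR}_{\mathrm{true}} = \frac{FP - k}{(FP - k) + TN}, \qquad 0 \le k \le FP.$$

The function $x/(x + TN)$ is nondecreasing in $x$ for $TN \ge 0$ (assuming nonzero denominators), so $\mathrm{FPR}_{\mathrm{true}} \le \mathrm{FPR}_{\mathrm{obs}}$. By the same reclassification, precision can only rise, from $TP/(TP + FP)$ to $(TP + k)/(TP + FP)$.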
CONCLUSIONS
This study demonstrates the importance and usefulness of the proposed method, which combines molecular structural information with AERS-based data mining methods to improve the initial signals generated from AERS. The proposed method efficiently prioritized the ADE candidates selected from AERS so that they could be monitored further to evaluate the importance of the generated signals. We also constructed a set of drugs with strong evidence from both AERS and MFBM for further analysis. The results we obtained confirm that the molecular fingerprint-based structural similarity model could be used as a powerful pharmacovigilance decision support tool to facilitate early ADE detection.
J Am Med Inform Assoc 2011;18:i73-i80. doi:10.1136/amiajnl-2011-000417
Table 3 Examples of AERS+MFBM candidates and the most similar molecules in the rhabdomyolysis training dataset, with their TC and EBGM (GPS quantifies ADE signals using a measure called EBGM)
AERS+MFBM drug candidate | Most similar drug in the rhabdomyolysis training dataset | TC | EBGM
Tramadol | Venlafaxine | 0.93 | 4.40
Telbivudine | Zidovudine | 0.85 | 14.25
Tiapride | Amisulpride | 0.83 | 7.63
Gabapentin | Aminocaproic acid | 0.82 | 3.37
Fusidic acid | Carbenoxolone | 0.95 | 40.48
Fosamprenavir | Darunavir | 0.91 | 2.71
Primidone | Phenobarbital | 0.82 | 8.21
The first four candidates (tramadol, telbivudine, tiapride and gabapentin) have published reports of causing rhabdomyolysis. No reports were found identifying fusidic acid, fosamprenavir or primidone as a direct cause of rhabdomyolysis.
ADE, adverse drug event; AERS, adverse event reporting system; EBGM, empirical Bayes geometric mean; GPS, gamma Poisson shrinker; MFBM, molecular fingerprint-based modeling; TC, Tanimoto coefficient.
Acknowledgments The authors would like to thank Krystl Haerian, MD, for her
help with the rhabdomyolysis dataset construction.
Funding This work was supported by grants R01 LM010016 (CF), R01
LM010016-0S1 (CF), R01 LM010016-0S2 (CF), R01 LM008635 (CF) and
1R01LM010140-01 (RR), from the National Library of Medicine and by the Intramural
Research Program of the National Institute of Diabetes and Digestive and Kidney
Diseases (NIDDK) of the National Institutes of Health.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.
REFERENCES
1. FDA US Food and Drug Administration. Adverse Event Reporting System. http://www.fda.gov/cder/aers/default.htm (accessed Aug 2010, posted 2009).
2. FDA US Food and Drug Administration. MedWatch: The FDA Safety Information and Adverse Event Reporting Program. http://www.fda.gov/Safety/MedWatch/default.htm (accessed Aug 2010, posted 2010).
3. Wadman M. News feature: strong medicine. Nat Med 2005;11:465-6.
4. Stephenson WP, Hauben M. Data mining for signals in spontaneous reporting databases: proceed with caution. Pharmacoepidemiol Drug Saf 2007;16:359-65.
5. Platt R, Wilson M, Chan KA, et al. The new Sentinel Network - improving the evidence of medical-product safety. N Engl J Med 2009;361:645-7.
6. Reese JA, Li XN, Hauben M, et al. Identifying drugs that cause acute thrombocytopenia: an analysis using 3 distinct methods. Blood 2010;116:2127-33.
7. Vilar S, Santana L, Uriarte E. Probabilistic neural network model for the in silico evaluation of anti-HIV activity and mechanism of action. J Med Chem 2006;49:1118-24.
8. Martin YC, Kofron JL, Traphagen LM. Do structurally similar molecules have similar biological activity? J Med Chem 2002;45:4350-8.
9. Winkler DA. The role of quantitative structure-activity relationships (QSAR) in biomolecular discovery. Brief Bioinform 2002;3:73-86.
10. Costanzi S, Vilar S, Micozzi D, et al. Delineation of the molecular mechanisms of nucleoside recognition by cytidine deaminase through virtual screening. ChemMedChem 2011;6:1452-8.
11. Valerio LG Jr. In silico toxicology for the pharmaceutical sciences. Toxicol Appl Pharmacol 2009;241:356-70.
12. FDA US Food and Drug Administration. Informatics and Computational Safety Analysis Staff (ICSAS). http://www.fda.gov/AboutFDA/CentersOffices/CDER/ucm092125.htm (accessed Oct 2010, posted 2009).
13. Harpaz R, Perez H, Chase HS, et al. Biclustering of adverse drug events in the FDA's spontaneous reporting system. Clin Pharmacol Ther 2011;89:243-50.
14. Hauben M, Madigan D, Gerrits CM, et al. The role of data mining in pharmacovigilance. Expert Opin Drug Saf 2005;4:929-48.
15. Vilar S, Cozza G, Moro S. Medicinal chemistry and the molecular operating environment (MOE): application of QSAR and molecular docking to drug discovery. Curr Top Med Chem 2008;8:1555-72.
16. MOE, version 2009.10. Chemical Computing Group, Inc. http://www.chemcomp.com (accessed Oct 2010, posted 2010).
17. Daylight Chemical Information Systems, Inc. http://www.daylight.com/dayhtml/doc/theory/theory.finger.html (accessed Oct 2010, posted 2008).
18. MACCS structural keys. MDL Information Systems Inc. http://www.mdli.com/ (accessed Jun 2008, posted 2007).
19. DrugBank Open Data Drug and Drug Target Database. DrugBank database, version 2.5. http://www.drugbank.ca/ (accessed Aug 2010, posted 2010).
20. European Medicines Agency. http://www.ema.europa.eu/ (accessed Sep 2010, posted 2010).
21. World Health Organization. http://www.who.int/en/ (accessed Sep 2010, posted 2010).
22. The General Practice Research Database. http://www.gprd.com/home/ (accessed Jan 2011, posted 2011).
23. Observational Medical Outcomes Partnership (OMOP). http://omop.fnih.org/ (accessed Jan 2011, posted 2011).
24. Wang XY, Hripcsak G, Markatou M, et al. Active computerized pharmacovigilance using natural language processing, statistics, and electronic health records: a feasibility study. J Am Med Inform Assoc 2009;16:328-37.
25. DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. Am Stat 1999;53:177-90.
26. Szarfman A, Machado SG, O'Neill RT. Use of screening algorithms and computer systems to efficiently signal higher-than-expected combinations of drugs and events in the US FDA's spontaneous reports database. Drug Saf 2002;25:381-92.
27. Friedman C, Shagina L, Lussier Y, et al. Automated encoding of clinical documents based on natural language processing. J Am Med Inform Assoc 2004;11:392-402.
28. US National Library of Medicine. Unified Medical Language System (UMLS). http://www.nlm.nih.gov/research/umls/rxnorm/ (accessed Sep 2010, posted 2010).
29. Harpaz R, Haerian K, Chase HS, et al. Statistical mining of potential drug interaction adverse effects in FDA's spontaneous reporting system. AMIA Annu Symp Proc 2010:281-5.
30. The MSSO - Maintenance and Support Services Organization. Welcome to MedDRA and the MSSO. MedDRA - the Medical Dictionary for Regulatory Activities. https://meddramsso.com/ (accessed Sep 2010, posted 2010).
31. Micromedex® Healthcare Series [Internet database]. Updated periodically. Greenwood Village, CO: Thomson Reuters (Healthcare) Inc., 2010.
32. Drug Information Online. Drugs.com. http://www.drugs.com (accessed Aug 2010, posted 2010).
33. Hsiao SH, Chang CM, Tsao CJ, et al. Acute rhabdomyolysis associated with ofloxacin/levofloxacin therapy. Ann Pharmacother 2005;39:146-9.
34. Baril L, Maisonobe T, Jasson-Molinier M, et al. Acute rhabdomyolysis during treatment with ofloxacin - a case report. Clin Infect Dis 1999;29:1598-9.
35. Garrett PM. Tramadol overdose and serotonin syndrome manifesting as acute right heart dysfunction. Anaesth Intensive Care 2004;32:575-7.
36. Yousef Khan F, Yousef H, Errayes M. Tramadol toxicity-induced rhabdomyolysis. J Emerg Trauma Shock 2010;3:421-2.
37. Rosenberg H, Sambuughin N, Dirksen R. Malignant hyperthermia susceptibility. Seattle (WA): University of Washington, Seattle: GeneReviews [Internet], 1993-2003.
38. Nozaki M, Suzuki T, Hirano M. Rhabdomyolysis associated with omeprazole. J Gastroenterol 2004;39:86.
39. Bebarta VS, King JA, McDonough M. Proton pump inhibitor-induced rhabdomyolysis and hyponatremic delirium. Am J Emerg Med 2008;26:519.e1-2.
40. Tuccori M, Giovannoni S, Giustini SE, et al. Acute severe myopathy following a single infusion of omeprazole. Ann Pharmacother 2006;40:352-3.
41. Troger U, Reiche I, Jepsen MS, et al. Esomeprazole-induced rhabdomyolysis in a patient with heart failure. Intensive Care Med 2010;36:1278-9.
42. Telbivudine: rhabdomyolysis and neuropathy. Prescrire Int 2010;19:171.
43. George P, Das J, Pawar B, et al. Gatifloxacin-induced rhabdomyolysis. J Postgrad Med 2008;54:233-4.
44. Tamion F, Petit J, Massari P, et al. [Malignant neuroleptic syndrome during tiapride treatment] (in French). J Toxicol Clin Exp 1990;10:461-7.
45. Bilgir O, Calan M, Bilgir F, et al. Gabapentin-induced rhabdomyolysis in a patient with diabetic neuropathy. Intern Med 2009;48:1085-7.
46. Tuccori M, Lombardo G, Lapi F, et al. Gabapentin-induced severe myopathy. Ann Pharmacother 2007;41:1301-5.
47. Yanagisawa K, Nagai S, Kimura Y, et al. A case of rhabdomyolysis by donepezil hydrochloride in an elder type 2 diabetes mellitus. Acta Medica Nosocomi Sapporo 2005;65:21-5.
48. Tommasi C, Tempestilli M, Fezza R, et al. A rare case of severe myopathy associated with etravirine use. AIDS 2010;24:1088-90.
49. Ebadi M, Pfeiffer RF, Murrin LC. Pathogenesis and treatment of neuroleptic malignant syndrome. Gen Pharmacol 1990;21:367-86.
50. Yang CC, Deng JF. Anticholinergic syndrome with severe rhabdomyolysis - an unusual feature of amantadine toxicity. Intensive Care Med 1997;23:355-6.
51. Magee CN, Medani SA, Leavey SF, et al. Severe rhabdomyolysis as a consequence of the interaction of fusidic acid and atorvastatin. Am J Kidney Dis 2010;56:E11-15.
52. Burtenshaw AJ, Sellors G, Downing R. Presumed interaction of fusidic acid with simvastatin. Anaesthesia 2008;63:656-8.
53. Harter C, Obier C, Druschky KF, et al. [Malignant neuroleptic syndrome associated with amisulpride] (in German). Nervenarzt 2008;79:86-9.
54. Scruggs ER, Naylor AJD. Mechanisms of zidovudine-induced mitochondrial toxicity and myopathy. Pharmacology 2008;82:83-8.
55. Durant JL, Leland BA, Henry DR, et al. Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci 2002;42:1273-80.
56. Ewing T, Baber JC, Feher M. Novel 2D fingerprints for ligand-based virtual screening. J Chem Inf Model 2006;46:2423-31.