Statistical methods for knowledge discovery in adverse drug reaction surveillance
G. Niklas Norén
Stockholm University
© G. Niklas Norén, Stockholm 2007
Cover photography by G. Niklas Norén
ISBN 91-7155-411-4 pp. 1–41
Typeset by LaTeX
Printed in Sweden by Universitetsservice AB, Stockholm 2007
Distributor: Department of Mathematics, Stockholm University
Abstract
Collections of individual case safety reports are the main resource for early
discovery of unknown adverse reactions to drugs once they have been introduced to the general public. The data sets involved are complex and based on
voluntary submission of reports, but contain pieces of very important information. The aim of this thesis is to propose computationally feasible statistical
methods for large-scale knowledge discovery in these data sets. The main contributions are a duplicate detection method that can reliably identify pairs of
unexpectedly similar reports and a new measure for highlighting suspected
drug–drug interaction.
Specifically, we extend the hit-miss model for database record matching
with a hit-miss mixture model for scoring numerical record fields and a new
method to compensate for strong record field correlations. The extended
hit-miss model is implemented for the WHO database and demonstrated to
be useful in real world duplicate detection, despite the noisy and incomplete
information on individual case safety reports. The Information Component
measure of disproportionality has been in routine use since 1998 to screen the
WHO database for excessive adverse drug reaction reporting rates. Here, it is
further refined. We introduce improved credibility intervals for rare events,
post-stratification adjustment for suspected confounders and an extension
to higher order associations that allows for simple but robust screening for
potential risk factors. A new approach to identifying reporting patterns
indicative of drug–drug interaction is also proposed. Finally, we describe how
imprecision estimates specific to each prediction of a Bayes classifier may be
obtained with the Bayesian bootstrap. Such case-based imprecision estimates
allow for better prediction when different types of errors have different
associated loss, with a possible application in combining quantitative and
clinical filters to highlight drug–ADR pairs for clinical review.
List of Papers
This thesis is based on the following original publications, which are referred
to in the text by their Roman numerals.
I    Norén, G. N., Orre, R., Bate, A., Edwards, I. R. (2007). Duplicate detection in adverse drug reaction surveillance. Data Mining and Knowledge Discovery. Published on-line.
II   Norén, G. N., Bate, A., Orre, R., Edwards, I. R. (2006). Extending the methods used to screen the WHO drug safety database towards analysis of complex associations and improved accuracy for rare events. Statistics in Medicine, 25(21):3740–3757.
III  Hopstadius, J., Norén, G. N., Bate, A., Edwards, I. R. (2007). Adjustment for potential confounders in adverse drug reaction surveillance. Submitted for publication.
IV   Norén, G. N., Sundberg, R., Bate, A., Edwards, I. R. (2007). A statistical methodology for drug–drug interaction surveillance. Submitted for publication.
V    Norén, G. N., Orre, R. (2005). Case based imprecision estimates for Bayes classifiers with the Bayesian bootstrap. Machine Learning, 58(1):79–94.
Reprints of I, II and V were made with kind permission from the publishers.
Contents
Part I: Thesis summary
1 Introduction
  1.1 Aim
2 Adverse drug reaction surveillance
  2.1 Individual case safety reports
  2.2 The WHO database
  2.3 Adverse drug reaction signal detection
3 Knowledge discovery in adverse drug reaction surveillance
  3.1 Context
  3.2 Process
  3.3 Disproportionality
  3.4 Shrinkage
  3.5 Pattern discovery and detection
  3.6 Facilitating interpretation
  3.7 Future directions
4 Overview of the papers
  4.1 Paper I
  4.2 Paper II
  4.3 Paper III
  4.4 Paper IV
  4.5 Paper V
Acknowledgements
Bibliography
Part I: Thesis summary
1. Introduction
It is in the nature of pharmaceutical development that the full safety profile
of a new medicinal product will not be known at the time it is introduced to
the general public. Because randomised clinical trials are limited in both the
types and numbers of patients exposed, continued safety monitoring of drugs
is in the interest of patients, regulatory authorities and pharmaceutical companies (Finney 1966, Evans 2000). Individual case safety reports are submitted
by health professionals based on suspected adverse drug reaction (ADR) incidents (Edwards and Aronson 2000) observed in real world clinical practice.
They remain one of the best resources for early post-marketing discovery of
potential public health or patient safety issues. They are rich sources of information, but anecdotal in nature. The reliance on voluntary submission, the
variation in quality of information and the large number of new reports submitted to national and international organisations every year provide a range
of interesting statistical challenges.
1.1 Aim
The overall aim of this thesis is to propose improved statistical methods for
knowledge discovery in collections of individual case safety reports. I proposes a new method for automated duplicate detection based on the hit-miss
model introduced for statistical record linkage (matching records across data
sets) by Copas and Hilton (1990). An extended hit-miss model that handles
numerical record fields and compensates for correlations between record fields
is implemented for the WHO database and demonstrated to be useful in real
world duplicate detection. II proposes improved credibility intervals, a post-stratification approach to adjustment for confounding variables and an extension to higher order associations for the Information Component (IC) measure
of disproportionality used to screen the WHO database for excessive ADR relative reporting rates. III demonstrates that the post-stratification adjustment
of the observed-to-expected ratio for suspected confounders adopted for the
IC in II may lead to spurious underestimation in the presence of any very
small strata in a stratified data set. A comparison to a literature reference indicates that while routine adjustment for some potential confounders in first
pass screening of collections of individual case safety reports does improve
performance, the magnitude of this improvement is modest compared to the
improvement from a triage (prioritisation) criterion requiring reports from at
least two countries before a drug–ADR pair is highlighted for clinical review.
This suggests that confounding may have less impact on the analysis of individual case safety reports than previously believed. IV introduces a new measure of drug–drug interaction for collections of individual case safety reports.
Unlike methods proposed previously for this purpose, it defines interaction
as departure from a baseline model with independent attributable risk. V introduces a Bayesian bootstrap method for estimating the uncertainty in Bayes
classification associated with each individual prediction. We demonstrate how
this information can be used to improve performance when different types of
errors have different associated loss, with a possible application in selecting
drug–ADR pairs for detailed clinical review.
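The idea behind the case-based imprecision estimates in V can be illustrated with a small sketch. It assumes a naive Bayes classifier over binary attributes, Laplace-style pseudo-counts (`alpha`) and the hypothetical function names below; it is not the implementation in V. Each Bayesian bootstrap replicate reweights the training cases with Dirichlet(1, ..., 1) weights, refits the classifier and records the posterior probability for the case at hand, so the spread of the replicates is specific to that one prediction.

```python
import math
import random


def dirichlet_weights(n, rng):
    """Bayesian bootstrap weights: a Dirichlet(1, ..., 1) draw, rescaled to sum to n."""
    g = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(g)
    return [n * gi / total for gi in g]


def naive_bayes_posterior(X, y, w, x_new, alpha=1.0):
    """P(class 1 | x_new) for a naive Bayes classifier over binary attributes,
    fitted with case weights w and pseudo-count alpha (an assumed smoothing choice)."""
    log_joint = []
    for c in (0, 1):
        w_c = sum(wi for wi, yi in zip(w, y) if yi == c)
        lp = math.log((w_c + alpha) / (sum(w) + 2 * alpha))  # class prior
        for j, xj in enumerate(x_new):
            w_cj = sum(wi for wi, xi, yi in zip(w, X, y) if yi == c and xi[j] == 1)
            p1 = (w_cj + alpha) / (w_c + 2 * alpha)  # P(attribute j = 1 | class c)
            lp += math.log(p1 if xj == 1 else 1.0 - p1)
        log_joint.append(lp)
    m = max(log_joint)  # subtract the maximum before exponentiating, for numerical stability
    e0, e1 = (math.exp(v - m) for v in log_joint)
    return e1 / (e0 + e1)


def case_based_interval(X, y, x_new, n_boot=1000, level=0.90, seed=0):
    """Mean and percentile interval of the posterior for one case over Bayesian bootstrap replicates."""
    rng = random.Random(seed)
    draws = sorted(
        naive_bayes_posterior(X, y, dirichlet_weights(len(y), rng), x_new)
        for _ in range(n_boot)
    )
    lo = draws[int((1 - level) / 2 * n_boot)]
    hi = draws[int((1 + level) / 2 * n_boot) - 1]
    return sum(draws) / n_boot, lo, hi


# Hypothetical training data: two binary attributes per case, binary class label.
X = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1]]
y = [1, 1, 1, 0, 0, 0]
mean, lo, hi = case_based_interval(X, y, x_new=[1, 0], n_boot=500)
print(f"P(class 1) = {mean:.2f}, 90% interval [{lo:.2f}, {hi:.2f}]")
```

A case whose interval straddles a decision threshold could, for instance, be set aside for manual review rather than classified automatically; that routing rule is only an illustration of how case-specific imprecision might be used when errors carry different losses.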
2. Adverse drug reaction surveillance
The analysis of individual case safety reports is the cornerstone of early post-marketing ADR detection (Rawlins 1988). Whereas large, formal drug safety
studies are useful to test specific hypotheses related to drug safety, they are not
suitable for continuous monitoring aimed at detecting previously unsuspected ADRs as early as possible. In the context of this PhD thesis, ADR
surveillance refers exclusively to drug safety monitoring based on individual
case safety reports. It thus excludes other post-marketing efforts such as the
intensive monitoring programs of New Zealand and the United Kingdom, as
well as safety monitoring based on health registries and hospital-based safety
monitoring. For comprehensive overviews of post-marketing ADR surveillance, see Lindquist (2003) and Bate (2003).
2.1 Individual case safety reports
Individual case safety reports communicate genuine clinical concerns from
observant health professionals (Edwards 1999). As they are based on actual
patients in real world clinical practice, their collection and analysis increase the chance of discovering ADRs that are due to drug–drug interaction, that affect patients with certain medical predispositions or that occur in patient subgroups that tend to be excluded from pre-marketing clinical trials, such as children or pregnant women. In addition, the large numbers of patients exposed and the unlimited follow-up time available considerably increase the chance of detecting ADRs that are rare or that occur only after extended periods of use.
An example of an authentic individual case safety report is provided in Figure 2.1. Much of the information on these reports is originally provided as free text, some of which is later encoded as structured information upon database entry. This is usually done by trained personnel at pharmaceutical companies or at national authorities. The encoding of observed ADR incidents in terms of standardised terminology is a critical part of the preprocessing. One potential pitfall is the risk of misinterpretation when the ADR encoding is performed by someone who has never actually met the patient. Variation in coding across regions and time periods may lead to systematic differences that can affect subsequent data analysis. A general problem is that several ADR terms are often applicable to a given incident. Thus, exploratory analysis focusing on single ADR terms may fail to include all relevant reports, a phenomenon that has been referred to as ‘signal fragmentation’ (Purcell 2003). In the follow-up of specific issues, this can be remedied by specifying groups of relevant ADR terms for the issue of interest, but it is not obvious how such strategies can be easily automated for routine exploratory analysis.

Figure 2.1: Sample individual case safety report. Reprinted with kind permission of the Adverse Drug Reactions Unit at the Therapeutic Goods Administration of Australia
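Signal fragmentation can be illustrated by counting the same reports in two ways: per individual ADR term, and per group of related terms, where a report is counted once if it lists any term in the group. The reports, terms and term group below are hypothetical; in practice the group would be specified for the issue of interest, as described above.

```python
from collections import Counter


def single_term_counts(reports):
    """Number of reports listing each individual ADR term."""
    counts = Counter()
    for terms in reports:
        counts.update(set(terms))  # count each term at most once per report
    return counts


def term_group_count(reports, group):
    """Number of reports listing at least one term from `group` (each report counted once)."""
    group = set(group)
    return sum(1 for terms in reports if group & set(terms))


# Hypothetical reports for one drug; each entry is the list of coded ADR terms.
reports = [
    ["hepatitis"],
    ["jaundice"],
    ["hepatic failure", "jaundice"],
    ["rash"],
    ["hepatitis", "jaundice"],
]
liver_terms = ["hepatitis", "jaundice", "hepatic failure"]  # hypothetical term group

print(single_term_counts(reports))
print(term_group_count(reports, liver_terms))
```

No single liver-related term reaches more than three reports, whereas the group covers four; summing the single-term counts instead would double-count reports that list several terms from the group.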
Individual case safety reports refer to suspected ADR incidents, and some adverse events observed in association with drug prescription will in reality be
coincidental, or due to concomitant medication or natural progression of the underlying disease. At the same time, not all ADR incidents that actually occur
are identified as such and eventually reported to the national drug safety centres. The degree of under-reporting is unknown but can be expected to vary
with the severity of the suspected ADR, across geographical regions and time
periods. There may also be variation in the propensity to report suspected
ADRs during the life-span of a drug and in response to any attention to suspected drug safety issues in the public or scientific media. The categories of
health professionals who are allowed to submit reports also differ over time
and between regions. Some countries allow only medical doctors to submit
reports, whereas others accept reports from nurses and pharmacists
as well. In addition, some countries encourage direct consumer reporting. Unsurprisingly, the propensity to report suspected ADRs of different types varies
considerably between different categories of reporters (Savage 1985).
An important characteristic of individual case safety report submission is that
separate reports sometimes have a common origin and therefore cannot be
considered as independent pieces of information (Finney 1973). This may
distort automated knowledge discovery and mislead clinical review. The most
obvious cause of non-independent reports is report duplication, where a single suspected ADR incident results in several reports. This phenomenon is
discussed at some length in I. More subtle examples include groups of reports
provided by the same health professional, such as those from the Norwegian
dentist discovered in I, reports from the same clinical study (sometimes mislabelled as spontaneous reports) or separate reports for the same patient at different points in time. If single individuals are responsible for encoding large
numbers of reports, this may also induce superficial similarity between reports. A potential example of this is the group of over 600 very similar reports
originally collected by a single law firm, discovered in IV. Violated independence assumptions differ from other data quality issues in that they do not
relate to the quality of single reports but to the quality of collections of reports. Even when a pair of reports has been confirmed as duplicates,
it is not obvious how to proceed: should the duplicates be flagged, or
should one of them perhaps be removed from the data set (if so, which one)?
Drugs (Anatomical Therapeutic Class)                 Reports   ADRs (System Organ Class)                        Reports
Selective serotonin reuptake inhibitors (SSRI)       193,939   Body as a whole - general disorders            1,218,425
Antiinfl. prep. nonsteroids for topical use (NSAID)  180,770   Skin and appendages disorders                  1,070,189
Platelet aggregation inhibitors excl. heparin        179,226   Gastro-intestinal system disorders               902,238
ACE inhibitors, plain                                171,706   Central & peripheral nervous system disorders    853,883
Benzodiazepine derivatives                           157,898   Psychiatric disorders                            677,227

Table 2.1: The most commonly reported groups of drugs and ADRs in the WHO database (note that each report may list more than one drug and more than one ADR).
2.2 The WHO database
The Uppsala Monitoring Centre maintains and analyses the world’s largest
collection of individual case safety reports. As of December 2006, the WHO
database contained over 3.8 million reports, with a current yearly growth of
over 200,000 reports (see Figure 2.2). The database is held on behalf of the
countries participating in the WHO Programme for International Drug Monitoring, whose number has continued to grow from the founding 10 countries
in 1968 to over 80 member countries at the end of 2006. The international
coverage allows rare but important public health or patient safety issues to be
detected earlier after drug launch than if based on isolated analysis of national
data sets (Olsson 1998). Variation between countries in the range of available drug substances, populations at risk, reporting culture and regulation may
influence relative reporting rates and make knowledge discovery more complicated in international data sets. At the same time, this diversity is an invaluable asset in detecting public health or patient safety issues related to, for
example, ethnic or dietary ADR risk factors. Thus, even though most reports in
the WHO database come from the USA and other industrialised nations, the
worldwide coverage of the WHO programme is perhaps its greatest strength.
As is clear from Figure 2.2, the vast majority of reports in the WHO database
are so-called spontaneous reports that refer to observations in regular clinical
practice. However, a small minority are from intensive monitoring programs
or clinical studies. Such atypical reports should in principle be labelled as such, but occasional mislabellings do occur. Thus, they cannot reliably be excluded from the analysis.

Figure 2.2: Characteristics of the WHO database. Panels: a. Database growth (number of reports by year); b. Biggest contributors (United States, United Kingdom, Germany, Canada, France, Australia, other countries); c. Types of reports (spontaneous reports and other); d. Number of drugs per report; e. Number of ADRs per report; f. Patient age distribution (number of reports by patient age in years).
Table 2.1 indicates what groups of drugs and ADRs have been reported most
often during the entire life span of the WHO database. From Figure 2.2, it
is clear that most reports list only one suspected drug and between one and
four ADRs, but there are reports that deviate from this general pattern, and
list very large numbers of drugs and ADRs. The most striking aspect of the
empirical age distribution in Figure 2.2 is perhaps the large number of reports
for children less than two years of age. A large proportion of these relate to
suspected adverse reactions to vaccines. Another interesting phenomenon is
the digit preference for 0 and 5 in the recorded patient ages.
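Digit preference of this kind can be quantified with a simple index: the share of recorded ages ending in 0 or 5, divided by the share expected if last digits were uniformly distributed (2 out of 10). The ages below are hypothetical, and the uniform baseline is a crude assumption (it ignores, for example, the many reports on very young children), so the index is only a rough screening device.

```python
def terminal_digit_share(ages, digits=(0, 5)):
    """Share of recorded (integer) ages whose last digit is in `digits`; None means missing."""
    known = [a for a in ages if a is not None]
    return sum(1 for a in known if a % 10 in digits) / len(known)


def heaping_index(ages, digits=(0, 5)):
    """Observed share divided by the share expected under uniform last digits.
    A value near 1 suggests no preference; values well above 1 suggest rounding."""
    return terminal_digit_share(ages, digits) / (len(digits) / 10)


ages = [30, 35, 40, 42, 50, 61, 70, 45, 33, 60, None]  # hypothetical patient ages
print(terminal_digit_share(ages))  # 0.7
print(heaping_index(ages))
```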
2.3 Adverse drug reaction signal detection
The detection of early warnings related to potential public health or patient
safety issues is the main aim of collecting and analysing individual case safety
reports. In the context of ADR surveillance, the WHO defines a signal as:
"Reported information on a possible causal relationship between an adverse
event and a drug, the relationship being unknown or incompletely documented
previously. Usually more than a single report is required to generate a signal, depending upon the seriousness of the event and quality of the information." (Edwards and Biriell 1994)
As is clear from the definition, single reports in isolation rarely motivate the
communication of an early warning of a potential ADR, but there are exceptional examples where single reports of very high quality do (Meyboom et al.
1997). Particularly valuable pieces of information in this respect are those that
indicate the effect on the ADR of withdrawing the suspected medication (so-called dechallenge intervention), and the effect of re-exposing the patient to
the suspected treatment, after a successful dechallenge (so-called rechallenge
intervention) (Edwards et al. 1990). Moreover, Aronson and Hauben (2006)
argue that there are certain types of ADRs for which single, well documented
incidents may motivate early warning, much in the spirit of the triage algorithms proposed by Ståhl et al. (2004).
Early warning of a potential ADR is possible even in the absence of any individually very strong reports, if there is a large enough number of reports on
the drug–ADR pair of interest (Edwards et al. 1990). This is true in particular when alternative systematic explanations to excessive reporting rates, such
as reporting biases or strong confounding, can be dismissed and the relative reporting rate remains excessive even after suspected duplicates have been removed.

Figure 2.3: Signal detection process
The aim of ADR signal detection is to generate, strengthen and refine hypotheses related to suspected drug toxicity. Hypothesis testing is not possible
on account of the inherently non-systematic nature of data collection and the
lack of proper comparison groups. In-depth clinical evaluation and scrutiny
of reports remain at the core of the ADR signal detection process. However, the WHO database receives tens of thousands of reports every month
and this massive inflow of reports requires efficient computational methods
to help clinical experts focus on the groups of reports most likely to represent important public health or patient safety issues (Meyboom et al. 2002).
As indicated in Figure 2.3, the signal detection process in routine use on the
WHO database consists of a combination of automated knowledge discovery methods (Bate et al. 1998), triage (prioritisation) algorithms and clinical
review (Ståhl et al. 2004). The knowledge discovery methods highlight drug–
ADR pairs with unexpectedly large numbers of reports relative to the average
reporting rates in the database. Triage algorithms use a combination of quantitative and qualitative information to focus attention on the most urgent issues
for follow-up (Ståhl et al. 2004). Reports related to drug–ADR pairs singled
out by the triage algorithm are forwarded to a panel of international experts for
clinical review. In the context of the clinical review, pattern discovery methods
may often be useful to profile larger groups of reports and suggest alternative
explanations to observed excessive reporting rates. Hypotheses of suspected
ADRs first highlighted in automated knowledge discovery that remain after
clinical review are routinely communicated to the drug safety community,
and some have been published in the mainstream medical literature (Coulter
et al. 2001, Sanz et al. 2005). However, the risk of distortion from undiscovered data quality problems and the difficulty of obtaining complete, detailed
information on reported ADR incidents mean that signals of suspected ADRs
often remain tentative, even after clinical review.
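The combination of a quantitative screen with a triage criterion can be sketched as follows. The screen uses a crude observed-to-expected ratio, with the expected count taken from the marginal totals under independence; this is not the Information Component, which adds shrinkage and credibility intervals. The record layout, thresholds and function name are hypothetical. The triage step requires reports from at least two countries, the criterion whose effect is examined in III.

```python
from collections import Counter, defaultdict


def screen_and_triage(reports, min_ratio=2.0, min_reports=3, min_countries=2):
    """reports: list of (country, drugs, adrs) tuples (hypothetical layout).
    Returns drug-ADR pairs with a crude observed-to-expected ratio of at least
    `min_ratio`, at least `min_reports` reports and at least `min_countries` countries."""
    n = len(reports)
    n_drug, n_adr, n_pair = Counter(), Counter(), Counter()
    countries = defaultdict(set)
    for country, drugs, adrs in reports:
        drugs, adrs = set(drugs), set(adrs)
        n_drug.update(drugs)
        n_adr.update(adrs)
        for d in drugs:
            for a in adrs:
                n_pair[(d, a)] += 1
                countries[(d, a)].add(country)
    highlighted = []
    for (d, a), observed in n_pair.items():
        # Expected count if drug and ADR were reported independently of each other
        expected = n_drug[d] * n_adr[a] / n
        if (observed >= min_reports
                and observed / expected >= min_ratio             # quantitative screen
                and len(countries[(d, a)]) >= min_countries):    # triage criterion
            highlighted.append((d, a, observed, round(expected, 2)))
    return sorted(highlighted)


# Hypothetical reports: A-rash comes from two countries, D-dizziness from one.
reports = (
    [("SE", ["A"], ["rash"])] * 2 + [("NZ", ["A"], ["rash"])]
    + [("US", ["B"], ["nausea"]), ("US", ["B"], ["headache"]),
       ("US", ["C"], ["headache"]), ("US", ["C"], ["nausea"])]
    + [("US", ["D"], ["dizziness"])] * 3
)
print(screen_and_triage(reports))  # [('A', 'rash', 3, 0.9)]
```

Both A-rash and D-dizziness pass the quantitative screen with the same ratio, but only A-rash survives the two-country requirement.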
3. Knowledge discovery in adverse drug reaction surveillance
Vast improvements in data storage capacity over the last decades have spurred
ever-increasing ambitions to analyse large, complex data sets not originally
collected for the purpose of statistical analysis. Such investigations require
data analysis methodology that scales well with increasing amounts of data
and that focuses on discovery and exploration rather than on inference. This
area of research and application, on the border between mathematical statistics
and computer science, is referred to as knowledge discovery or data mining.
Fayyad et al. (1996) describe data mining as one step in a more general knowledge discovery process. Mannila (1996) and Hand (1998) emphasise the similarity between data mining and exploratory statistical analysis, the latter characterising the difference as one primarily related to data set size and properties: in data mining, contamination, nonstationarity and biases are standard.
On account of the complex data sets involved, interpretability is often a main
consideration, which may favour simplicity at the expense of prediction accuracy (Glymour et al. 1997). An important dividing line is the choice between
model based inference and algorithmic approaches (Breiman 2001). Whereas
much of the research on knowledge discovery has been driven by computer
science, key contributions from the statistical community include the clarification of inferential processes underlying algorithmic methods, insight into
the bias–variance trade-off in determining model complexity, methods for
quantifying uncertainty, and an emphasis on the impact on interpretation
of potential distortions such as confounding or selection biases (Elder and
Pregibon 1996, Glymour et al. 1997, Efron 2001).
In contrast with the more rigid framework for hypothesis testing, knowledge
discovery is usually an interactive and iterative process of increasingly refined
hypothesis generation. In my view, it should combine an unintimidated attitude towards the analysis of problematic and complex data sets with a proper
understanding and clear statement of the limitations in nature and strength of
the conclusions that can be drawn.
3.1 Context
Collections of individual case safety reports clearly contain important pieces
of rich and very useful information (Finney 1973, Edwards 1997), but they
constitute an inherently non-random sample. The presence of reporting biases and violated independence assumptions discussed in Section 2 render
summary statistics potentially deceptive. In particular, the presence of non-independent reports can lead to overly optimistic precision estimates and invalidate
standard tests for association (Finney 1971). As a consequence, the place
for statistical methodology in the analysis of collections of individual case
safety reports is somewhat out of the ordinary. Its main focus is on providing
a framework for effective hypothesis generation and refinement, rather than
on hypothesis testing (Bate 2003). Methods for reliably identifying elevated
ADR reporting rates in collections of individual case safety reports are already
part of routine drug safety signal detection (Ståhl et al. 2004). In the future,
methods for highlighting suspected drug–drug interaction, groups of non-independent reports or reporting patterns involving larger sets of drugs and
ADRs should allow for even more sophisticated use of this valuable source of
information.
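Why non-independent reports lead to overly optimistic precision can be seen with the textbook binomial standard error. The numbers are hypothetical: if every report were submitted twice, the estimated proportion would be unchanged, but a naive standard error that treats all reports as independent would shrink by a factor of 1/sqrt(2), making any interval built from it too narrow.

```python
import math


def binomial_se(p_hat, n):
    """Naive standard error of a proportion, assuming n independent reports."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical: 10 of 100 independent reports mention a given ADR.
se_independent = binomial_se(10 / 100, 100)
# The same reports, each submitted twice: the proportion is still 0.1 (20/200),
# but the naive calculation now treats 200 reports as independent.
se_duplicated = binomial_se(20 / 200, 200)
print(se_independent, se_duplicated, se_duplicated / se_independent)  # ratio about 0.707
```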
The emphasis on hypothesis generation and refinement applies throughout this
thesis: the aim of the record matching algorithm in I is to highlight likely
duplicates for manual review and the aim of II, III and IV is to determine
the most effective approach to highlighting apparently excessive ADR reporting rates for further follow-up. Likewise, the purpose of using the methods in
V to prioritise drug–ADR pairs for further follow-up, as discussed in
Section 4.5, would be effective hypothesis generation.
In knowledge discovery, large numbers of possible associations and patterns
are considered simultaneously. Familywise error rates, which reflect the probability that at least one highlighted association is a false positive, are usually
less relevant in this context, because all open-ended investigations are bound
to produce some false positives. Performance is better evaluated in terms of
measures that indicate the proportion of false positives that can be anticipated
in a specific study, such as false discovery rates. In our work, we have used two
related measures of performance from the literature on Information Retrieval:
precision (the number of true positives over the sum of true and false positives) and recall (the number of true positives over the sum of true positives
and false negatives). Precision–recall graphs that indicate how the precision
and recall vary by the threshold for clinical review are used in both I and III.
They provide an informative overview of performance, independent of the selected threshold.
Figure 3.1: Exploratory analysis of collections of individual case safety reports
3.2 Process
Fayyad et al. (1996) define knowledge discovery as:
"The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data."
The knowledge discovery process is not limited to actual data analysis but
includes: data collection, cleaning and preparation; reduction and projection;
data analysis and interpretation, and finally dissemination, incorporation into
existing structures and action based on discovered knowledge. It thus entails
the entire ADR signal detection process outlined in Section 2.3, from the collection of reports and their pooling in an international database, through data
preparation and transformation including conversion from free text to structured information, data cleaning and duplicate detection, via disproportionality analysis and triage algorithms to clinical review, and finally communication to national centres, pharmaceutical companies and the general public.
The statistical methodology developed in the context of this thesis is applied
at two different stages of the knowledge discovery process for ADR surveillance, as indicated in Figure 2.3. On the one hand, disproportionality analysis is
a core component in screening for excessive ADR reporting rates in first pass
analysis of the database. On the other hand, pattern discovery methods are
useful in assisting clinical review and highlighting interesting aspects of specific groups of reports in more detailed investigations. Figure 3.1 proposes a
general framework for such exploratory analysis. For the purpose of illustration, assume that the data subset of interest consists of all reports involving
a particular drug D. At the outset of the exploratory analysis, simple descriptive information, such as the total number of reports listing D and the countries and time periods from which they have been submitted, may be
very useful. Together with lists of the most commonly co-reported drugs and
ADRs, as well as empirical distributions for patient age and gender, this provides a descriptive overview of the reporting of D which can serve as a useful
reference for subsequent discoveries.
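The descriptive overview of reports on a drug D can be sketched as a single summary function. The record layout (dictionaries with `country`, `year`, `drugs`, `adrs`, `age` and `sex` keys) and the function name are hypothetical, chosen only to mirror the quantities listed above.

```python
import statistics
from collections import Counter


def describe_drug(reports, drug, top=3):
    """Descriptive overview of all reports listing `drug` (hypothetical record layout)."""
    subset = [r for r in reports if drug in r["drugs"]]
    ages = [r["age"] for r in subset if r["age"] is not None]
    sexes = [r["sex"] for r in subset if r["sex"] is not None]
    return {
        "n_reports": len(subset),
        "countries": Counter(r["country"] for r in subset),
        "years": (min(r["year"] for r in subset), max(r["year"] for r in subset)),
        "co_reported_drugs": Counter(
            d for r in subset for d in r["drugs"] if d != drug).most_common(top),
        "adrs": Counter(a for r in subset for a in r["adrs"]).most_common(top),
        "median_age": statistics.median(ages) if ages else None,
        "share_female": sexes.count("F") / len(sexes) if sexes else None,
    }


reports = [  # hypothetical reports
    {"country": "SE", "year": 2001, "drugs": ["D", "X"], "adrs": ["rash"], "age": 30, "sex": "F"},
    {"country": "US", "year": 2003, "drugs": ["D"], "adrs": ["rash", "fever"], "age": 50, "sex": "M"},
    {"country": "US", "year": 2004, "drugs": ["D", "X"], "adrs": ["fever"], "age": None, "sex": "F"},
    {"country": "US", "year": 2004, "drugs": ["Y"], "adrs": ["nausea"], "age": 70, "sex": "M"},
]
print(describe_drug(reports, "D"))
```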
Experienced data analysts may react directly to descriptive information that
contradicts their subject matter knowledge. For example, a domain expert familiar with the WHO database may react to the observation that a suspiciously
large proportion of the reports in a subgroup of interest have been submitted
from a country with a low overall reporting rate. The middle box in Figure 3.1
is an attempt to formalise such comparative data analysis. Contrasts between
the group of reports of interest and a comparison group (e.g. the database as
a whole or all reports involving a drug in the same class of drugs as D) provide insight into what properties of the data subset differentiate it from the
comparison group. For example, it may turn out that the relative reporting rate
of a rare ADR for D by far exceeds that in the database as a whole. Such
discrepancies may well be more enlightening than information on what the
most commonly reported ADR is in absolute terms. Disproportionality analysis is discussed further in Section 3.3.
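The contrast between a group of reports and a comparison group can be sketched by dividing the relative reporting rate of each ADR in the group by its rate in the comparison group. The data are hypothetical, and the pseudo-count that keeps the ratio finite for ADRs absent from the comparison group is a crude device, not the shrinkage used by the IC. The example shows how the most commonly reported ADR and the most disproportionately reported ADR can differ.

```python
from collections import Counter


def adr_contrast(subset, comparison, pseudo=0.5):
    """For each ADR in `subset`: (report count, relative reporting rate in subset divided
    by that in `comparison`). Both arguments are lists of reports, each a list of ADR terms.
    The pseudo-count is a hypothetical choice that avoids division by zero."""
    cnt_s = Counter(a for adrs in subset for a in set(adrs))
    cnt_c = Counter(a for adrs in comparison for a in set(adrs))
    n_s, n_c = len(subset), len(comparison)
    return {a: (k, (k / n_s) / ((cnt_c[a] + pseudo) / n_c)) for a, k in cnt_s.items()}


# Hypothetical reports listing drug D, and a hypothetical comparison group.
subset = [["nausea"]] * 6 + [["agranulocytosis"]] * 2 + [["rash"]] * 2
comparison = [["nausea"]] * 30 + [["agranulocytosis"]] + [["rash"]] * 20 + [["headache"]] * 49

contrast = adr_contrast(subset, comparison)
most_common = max(contrast, key=lambda a: contrast[a][0])
most_disproportionate = max(contrast, key=lambda a: contrast[a][1])
print(most_common, most_disproportionate)  # nausea agranulocytosis
```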
Both descriptive and comparative studies may be misleading when the group
of interest contains distinct subgroups. For example, if D is prescribed on one
hand to young males and on the other hand to elderly females, the summary information that the average patient age on reports listing D is 43 years and that
the overall proportion of females is 52% conveys a very incomplete overview.
Clustering algorithms allow for automated partitioning of data, with the aim of
detecting latent structure, and may allow for much more relevant subsequent
descriptive or comparative data analysis, as indicated by Figure 3.1.
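The effect of automated partitioning can be illustrated with the simplest possible case: Lloyd's k-means with two clusters on patient age alone. The ages and sexes are hypothetical, chosen to mimic the young-male/elderly-female example; partitioning real reports would use more attributes and a method suited to the data, so this is a sketch rather than any specific clustering algorithm from the thesis.

```python
def two_means_1d(values, max_iter=50):
    """Lloyd's k-means with k = 2 on one variable, initialised at the extremes."""
    centres = [float(min(values)), float(max(values))]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assign each value to the nearest centre
        labels = [0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1 for v in values]
        # Move each centre to the mean of its members (keep it if the cluster is empty)
        new_centres = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centres.append(sum(members) / len(members) if members else centres[k])
        if new_centres == centres:
            break
        centres = new_centres
    return centres, labels


ages = [18, 20, 22, 25, 24, 62, 65, 68, 70, 66]  # hypothetical reports on drug D
sexes = ["M", "M", "M", "M", "M", "F", "F", "F", "F", "F"]
print(sum(ages) / len(ages), sexes.count("F") / len(sexes))  # 44.0 0.5
centres, labels = two_means_1d(ages)
for k in (0, 1):
    group = [s for s, lab in zip(sexes, labels) if lab == k]
    print(f"cluster {k}: mean age {centres[k]:.1f}, share female {group.count('F') / len(group):.2f}")
```

The overall summary (mean age 44, half female) describes nobody in the data, while the two clusters recover a young, all-male group and an elderly, all-female group that can then be described or compared separately.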
In addition to the iteration of automated partitioning, description and comparison described above, there are other methods for pattern discovery in collections of individual case safety reports. Record matching methods such as that
adapted for duplicate detection in I can be used to detect groups of unexpectedly similar reports. Modified Hopfield networks and clustering algorithms
such as those evaluated in Orre et al. (2005) may allow groups of often recurring ADRs (syndromes) to be identified. Similarly, interaction detection
methods such as those in II and IV can be used to highlight suspected ADR
risk factors.
It is rarely possible to specify, at the outset of a large exploratory study, a fully
automated, all-purpose approach to exploratory data analysis appropriate for
all possible questions and patterns of potential interest. In addition, knowledge discovery often produces results that relate not to the primary study
objective, but to fundamental properties of the data or of the data collection
process. Thus, data cleaning and analysis are in practice intertwined, so that
the correction of a data quality problem highlighted in initial data analysis
allows for more refined subsequent data analysis. For example, in screening
the WHO database for reporting patterns indicative of suspected drug interaction in IV, some larger groups of non-independent reports were highlighted.
Their removal may allow for more accurate subsequent studies of drug–drug
interaction in the WHO database.
3.3 Disproportionality
The frequency or relative frequency of a certain event (or set of events) in
a database is sometimes of direct interest. However, in many knowledge discovery applications the discrepancy between the observed (relative) frequency
and its expected value under some baseline model is of greater interest. Consider purchasing patterns in supermarket sales data. Even if milk is the product most commonly purchased together with the product of interest, this is true of most products, so it may be more enlightening to point out that, for instance, grapefruit juice is purchased four times as often together with the product of interest as it is overall in the database.
Such contrasts provide the basis of disproportionality analysis, which focuses
on identifying events whose relative frequency in a given subgroup deviates
substantially from the relative frequency of the same event in a given comparison group.
Most modern methods for screening collections of individual case safety
reports for excessive ADR reporting rates are based on disproportionality
relative to the rest of the database. This is true of the Information
Component (IC) (Bate et al. 1998), the Empirical Bayes Geometric Mean
(EBGM) (DuMouchel 1999), the Proportional Reporting Ratio (PRR) (Evans
et al. 2001) and the Reporting Odds Ratio (ROR) (Egberts et al. 2002).
All these measures compare the number of reports on a certain drug–ADR
pair to an expected number of reports conditional on the overall reporting
rates for the drug and the ADR in the database. The original idea of making
comparisons with the database itself as reference goes back to the early days
of ADR surveillance (Patwary 1969, Finney 1974). Besides circumventing the lack of reliable external estimates for the international usage of different drugs, disproportionality analysis has the advantage that marginal reporting biases that affect only the drug or only the ADR cancel out (at least approximately) in
a measure of disproportionality. Thus, even though the reporting rates are
likely to be higher for serious than for harmless ADRs, this does not have a
considerable impact on the measures of disproportionality, as long as the
reporting bias affects all drugs to an equal extent. The main drawback of
disproportionality measures is that they rely on comparison to the reporting
of other drug–ADR pairs. Thus, if a particular drug–ADR pair is massively
reported, it will inflate the overall reporting rates for both the drug and
the ADR, sometimes to the extent that excessive reporting rates for the
same drug with another ADR or for another drug with the same ADR are
masked (Evans 2004, Hauben et al. 2005).
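The masking effect can be illustrated with a few lines of arithmetic. The sketch below computes the observed-to-expected ratio (the observed count divided by the expected count n_x n_y / n, defined formally in (3.1)) from four counts, and shows how a burst of reports on one drug with ADR y lowers the ratio for a second drug with the same ADR, although nothing about that second drug has changed. The function name and all counts are hypothetical.

```python
def observed_to_expected(n_xy: int, n_x: int, n_y: int, n: int) -> float:
    """Observed-to-expected ratio from counts.

    n_xy: reports listing both drug x and ADR y (a in the 2 by 2 table)
    n_x:  reports listing drug x (a + b)
    n_y:  reports listing ADR y (a + c)
    n:    all reports in the database (a + b + c + d)
    """
    expected = n_x * n_y / n  # expected count given the table marginals
    return n_xy / expected


# Hypothetical database before any burst of reports.
n_total = 100_000
n_y = 1_000      # reports listing ADR y
n_x2 = 500       # reports listing drug x2
n_x2_y = 10      # reports listing both x2 and y

oe_before = observed_to_expected(n_x2_y, n_x2, n_y, n_total)

# Hypothetical burst: 9,000 new reports list another drug x1 together with y.
# They raise the marginal count for y and the total, but leave the x2 counts unchanged.
burst = 9_000
oe_after = observed_to_expected(n_x2_y, n_x2, n_y + burst, n_total + burst)

print(f"OE(x2, y) before burst: {oe_before:.3f}")
print(f"OE(x2, y) after burst:  {oe_after:.3f}")
```

With these hypothetical counts the ratio for x2 drops from 2.0 to about 0.22 once the burst is included.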
Assume the following contingency table based on the cross-classification of
reports according to whether they involve a drug x and an ADR y:
         y    not y
x        a    b
not x    c    d
The basis for pairwise disproportionality analysis in the WHO database is an observed-to-expected ratio OE contrasting the relative reporting rate of y given x to the overall relative reporting rate of y in the database. With the notation used in the above contingency table, the observed number of reports on y given x is a, and the expected number of reports conditional on the table marginals is the product of the marginal relative reporting rate of y and the total number of reports on x: $\frac{a+c}{a+b+c+d} \cdot (a+b)$. The observed-to-expected ratio is:

$$\mathrm{OE} = \frac{a/(a+b)}{(a+c)/(a+b+c+d)} \qquad (3.1)$$
The same measure of disproportionality has been used also in the context of
association rule analysis (Agrawal et al. 1996), where it is referred to as the lift
or the interest of an association rule involving x and y (Silverstein et al. 1998,
Hastie et al. 2001). The similarity between the observed-to-expected ratio and
other measures of disproportionality proposed for the analysis of individual
case safety reports is clear. The Proportional Reporting Ratio (PRR) based on
the above contingency table is (Evans et al. 2001):
$$\mathrm{PRR} = \frac{a/(a+b)}{c/(c+d)} \qquad (3.2)$$
and the corresponding Reporting Odds Ratio (ROR) is (Egberts et al. 2002):

$$\mathrm{ROR} = \frac{a/b}{c/d} \qquad (3.3)$$
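A minimal Python sketch of (3.1), (3.2) and (3.3) on a single 2 by 2 table follows; the class and method names are hypothetical and the counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """2 by 2 contingency table for drug x and ADR y."""

    a: int  # reports listing both x and y
    b: int  # reports listing x but not y
    c: int  # reports listing y but not x
    d: int  # reports listing neither x nor y

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def oe(self) -> float:
        # (3.1): rate of y given x relative to the overall rate of y
        return (self.a / (self.a + self.b)) / ((self.a + self.c) / self.n)

    def prr(self) -> float:
        # (3.2): rate of y given x relative to the rate of y given not x
        return (self.a / (self.a + self.b)) / (self.c / (self.c + self.d))

    def ror(self) -> float:
        # (3.3): odds of y given x relative to the odds of y given not x
        return (self.a / self.b) / (self.c / self.d)


# Hypothetical, strongly unbalanced table.
table = TwoByTwo(a=20, b=980, c=2_000, d=997_000)
print(f"OE={table.oe():.3f}  PRR={table.prr():.3f}  ROR={table.ror():.3f}")
```

For this table the three measures are close (OE ≈ 9.90, PRR = 9.99, ROR ≈ 10.17). If c = 0, `prr()` and `ror()` raise ZeroDivisionError, whereas `oe()` stays defined as long as a + b > 0 and a + c > 0, in line with the robustness to zero counts discussed in this section.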
The IC measure of disproportionality used in routine knowledge discovery for the WHO database is essentially a conservative version of $\log_2 \mathrm{OE}$ that tends to 0 for rare drug–ADR pairs. The moderation in magnitude is referred to as shrinkage (for details see Section 3.4 below). The availability of thoroughly evaluated shrinkage measures is the main advantage of the OE ratio over the PRR and the ROR. Other strengths are the link to Bayes classifiers described in Norén (2005) and the somewhat better robustness to zero counts in the contingency table compared with the PRR and ROR (van Puijenbroek et al. 2002). The main limitation is that the observed-to-expected ratio provides a less distinct contrast between the group of interest and the reference group, since the group of interest is included in the reference. Another limitation is that the observed-to-expected ratio for a given pair of events by definition cannot exceed the inverse of the marginal relative reporting rate of either event. For example, if one of the events has an overall relative reporting rate of 0.5, then observed-to-expected ratios involving this event can at most reach 2 (if the relative reporting rate of this event conditional on the other event is 1.00). In practice this limits the usefulness of the observed-to-expected ratio as a measure of disproportionality to events that are reasonably rare.
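The upper bound on the observed-to-expected ratio follows directly from (3.1). Writing N = a + b + c + d,

$$\mathrm{OE} = \frac{a/(a+b)}{(a+c)/N} = \frac{aN}{(a+b)(a+c)} \le \min\left(\frac{N}{a+b},\ \frac{N}{a+c}\right),$$

since a ≤ a + b and a ≤ a + c. The two terms in the minimum are the inverses of the marginal relative reporting rates of x and of y, so a marginal relative reporting rate of 0.5 for either event caps OE at 2.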
While disproportionality analysis is usually carried out at an early stage of the
exploratory analysis of collections of individual case safety reports, there are
sometimes requests to compute a measure of disproportionality for a drug–
ADR pair highlighted for review based on clinical judgement or on account of
one or a few very strong reports. If the drug–ADR pair turns out to be disproportionally reported, this may indeed lend added support. However, observed
disproportionality must always be interpreted with caution. The possibility of
alternative explanations such as report duplication, violated independence assumptions, publication biases or confounding must always be analysed and
clearly stated.
3.4 Shrinkage
Shrinkage is an attempt to regularise and reduce the volatility of a measure or parameter estimate of interest, by trading an increase in bias for a decrease in variance. In large and sparse data sets such as national or international collections of individual case safety reports, raw measures of disproportionality sometimes yield very large values based on extremely low numbers of reports, but disproportionality based on just 1 or 2 reports is rarely of practical interest. The problem is that for rare drugs and ADRs, the expected number of reports may be very close to 0, relative to which even a single observed report may constitute a substantial deviation. Very low expected numbers of reports occur in the analysis of collections of individual case safety reports because the 2 by 2 contingency table of Section 3.3 is usually very unbalanced. Even for the most common drugs and ADRs in the WHO database, the number of reports that involve neither the drug nor the ADR, d, is around 3,000,000, whereas b and c are generally of the order of 100 or 1,000 and a is even smaller (a ≤ 10 for 80% of the drug–ADR pairs in the database). In order to reduce the vulnerability to spurious associations, two shrinkage measures of disproportionality have been proposed for the analysis of collections of individual case safety reports: the IC (Bate et al. 1998) and the EBGM (DuMouchel 1999). These measures of disproportionality are versions of the (logarithm of the) observed-to-expected ratio in (3.1), moderated towards a baseline value in the absence of large amounts of data. For the IC, the baseline value is 0, which corresponds to an observed-to-expected ratio of 1. Such shrinkage provides a robust measure of disproportionality, moderated towards less extreme values for rare drugs and ADRs. At the same time, as data accumulate, the IC tends to $\log_2 \mathrm{OE}$, as desired.

Figure 3.2: The simplified IC shrinkage measure plotted against the standard IC shrinkage measure for 10,000 randomly selected drug–ADR pairs in the WHO database.
The IC shrinkage measure is defined in II as a Bayesian maximum à posteriori
estimate of a parameter related to the logarithm of the observed-to-expected
ratio in (3.1). It is well approximated by the following simplified shrinkage
measure based on observed and expected counts $O_{xy}$ and $E_{xy}$:

$$\mathrm{IC} \approx \log_2 \frac{O_{xy} + 1/2}{E_{xy} + 1/2} \qquad (3.4)$$
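A short Python sketch of (3.4) shows the intended behaviour. It implements only the simplified measure, not the full IC of II, and the function names and counts are hypothetical. Three pairs with the same raw ratio O/E = 20 receive very different IC values depending on how many reports support them, and the IC approaches log2 20 ≈ 4.32 only when the counts are large.

```python
import math


def simplified_ic(observed: float, expected: float) -> float:
    """Simplified IC shrinkage measure (3.4): log2((O + 1/2) / (E + 1/2))."""
    return math.log2((observed + 0.5) / (expected + 0.5))


def raw_log2_oe(observed: float, expected: float) -> float:
    """Unshrunk log2 observed-to-expected ratio (undefined when O = 0)."""
    return math.log2(observed / expected)


# Hypothetical pairs that all have the same raw ratio O/E = 20,
# but are supported by increasing amounts of data.
pairs = [(1, 0.05), (10, 0.5), (1_000, 50)]
for observed, expected in pairs:
    print(
        f"O={observed:>5}  E={expected:>5}  "
        f"log2 OE={raw_log2_oe(observed, expected):.2f}  "
        f"IC={simplified_ic(observed, expected):.2f}"
    )
```

For a pair that has never been co-reported (O = 0), `raw_log2_oe` fails with a math domain error, while `simplified_ic` returns a small negative value whenever E > 0.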
A comparison between the IC shrinkage measure in II and that in (3.4) for
10,000 randomly selected drug–ADR pairs in the WHO database is presented
in Figure 3.2. Clearly, the difference between the two shrinkage measures is
negligible. The main advantages of the simplified IC shrinkage measure are
that it is easier to compute and that it provides a general recipe for shrinkage that can be applied to any measure expressed in terms of an observed-to-expected ratio, such as the Ω measure of drug–drug interaction in IV. This shrinkage can also be implemented for the PRR and ROR, after re-expression in terms of observed-to-expected ratios with $O_{xy} = a$ and $E_{xy} = (a+b)c/(c+d)$ for the PRR, and with $O_{xy} = a$ and $E_{xy} = bc/d$ for the ROR.
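The re-expression of the PRR and ROR as observed-to-expected ratios can be checked with a small Python sketch; the shrinkage step is the same additive 1/2 as in (3.4), and the function names and counts are hypothetical.

```python
import math


def shrunk_log2(observed: float, expected: float) -> float:
    """Shrinkage recipe of (3.4), applied to any observed/expected pair."""
    return math.log2((observed + 0.5) / (expected + 0.5))


def prr_as_observed_expected(a: int, b: int, c: int, d: int) -> tuple:
    # PRR = a / E with E = (a + b) c / (c + d)
    return a, (a + b) * c / (c + d)


def ror_as_observed_expected(a: int, b: int, c: int, d: int) -> tuple:
    # ROR = a / E with E = b c / d
    return a, b * c / d


# Hypothetical sparse table: 3 reports on the drug-ADR pair.
a, b, c, d = 3, 97, 50, 99_850
for name, (o, e) in [
    ("PRR", prr_as_observed_expected(a, b, c, d)),
    ("ROR", ror_as_observed_expected(a, b, c, d)),
]:
    print(f"{name}: raw log2 = {math.log2(o / e):.2f}, shrunk = {shrunk_log2(o, e):.2f}")
```

Because the expected count for the PRR is 0 when c = 0, the raw PRR is undefined in that case, whereas the shrunk value stays finite.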
Empirical Bayes estimation provides an alternative framework for shrinkage,
where the prior distribution for a group of parameters is estimated based on
the empirical distribution of maximum likelihood estimates for the group. The
main advantage of empirical Bayes estimators is that they borrow strength
from similar observations to improve the overall accuracy. However, with respect to each parameter, its estimate will only improve under the assumption
that it is indeed related to the other parameters. Unlike the IC prior distribution, an empirical Bayes prior for the observed-to-expected ratio will not
necessarily be centred at 1, and thus may inflate individual disproportionality measures rather than shrink them towards less extreme values. A practical
issue is that for drug–ADR pairs that have never been co-reported, the maximum likelihood estimate of the observed-to-expected ratio is 0. In practice,
these drug–ADR pairs appear to be ignored in the estimation of the empirical
prior distribution in DuMouchel (1999), and the potential bias due to this is
unclear. Berry and Berry (2004) propose a hierarchical empirical Bayes estimator for the observed-to-expected measure of disproportionality, where each
measure of disproportionality is shrunk towards the group mean for a smaller
group of more closely related ADRs. This should allow for more sophisticated empirical Bayes shrinkage, but the identification of appropriate groups
of related ADR terms remains a challenging research problem in its own right.
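The point about where the prior is centred can be made concrete with a deliberately simplified model. It is neither the EBGM of DuMouchel (1999), whose prior is a mixture of two gamma distributions, nor the model behind the IC in II. Assume O ~ Poisson(λE) and a single gamma prior λ ~ Gamma(α, β) with shape α and rate β; the posterior mean of λ is then (O + α)/(E + β). With α = β = 1/2 (prior mean 1) this coincides algebraically with the ratio inside (3.4). With a hypothetical prior of mean α/β = 3, standing in for one estimated from the database, a pair that has never been co-reported receives an estimate above 1.

```python
def posterior_mean_ratio(observed: float, expected: float, alpha: float, beta: float) -> float:
    """Posterior mean of the true observed-to-expected ratio lambda.

    Assumes O ~ Poisson(lambda * E) and a single gamma prior
    lambda ~ Gamma(shape=alpha, rate=beta), so lambda | O ~ Gamma(alpha + O, beta + E).
    """
    return (observed + alpha) / (expected + beta)


# Hypothetical drug-ADR pair that has never been co-reported.
observed, expected = 0, 0.1

# Prior centred at 1 (alpha = beta = 1/2).
centred = posterior_mean_ratio(observed, expected, alpha=0.5, beta=0.5)

# Hypothetical prior with mean alpha / beta = 3, standing in for an empirical prior.
shifted = posterior_mean_ratio(observed, expected, alpha=3.0, beta=1.0)

print(f"prior mean 1: {centred:.2f}")
print(f"prior mean 3: {shifted:.2f}")
```

With these numbers the prior centred at 1 gives about 0.83, while the hypothetical shifted prior gives about 2.73 for the same never co-reported pair.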
3.5 Pattern discovery and detection
Pattern recognition is the attempt to partition a group of data points into
classes, based on a given set of explanatory variables (Webb 2002). Distinction
is made between supervised and unsupervised pattern recognition: in supervised pattern recognition (or discrimination) a classifier is constructed based
on training data consisting of labelled data points with the aim of accurately
categorising unseen data points; in unsupervised classification (or clustering),
the aim is to identify a natural partitioning of the available data set, without
labelled training data available, or even a specification of what the classes of
interest may be.
For our purposes, the distinction between patterns and models in the context
of pattern discovery and detection is more relevant. Hand and Bolton (2004)
characterise patterns as related to local features of a data set involving only
subsets of the data points and/or subsets of the variables. Whereas a global
model provides a high level description of the most important general features
of a data set, a pattern may highlight one or a few outlying observations or
a strong correlation between two variables. Hand and Bolton (2004) propose
the following general definition:
"A pattern is a local structure that generates data with an anomalously high
density compared with that expected under the (global) baseline model."
The focus on deviation from a global baseline model applies broadly to the
methods described in this PhD thesis. The very aim of disproportionality analysis is to identify groups of events that are co-reported more often than would
be expected, based on a baseline independence model. Similarly, in duplicate
detection and other record matching applications, the aim is to identify pairs
(or small subsets) of unexpectedly similar reports whose similarity deviates
from a global baseline model assuming all reports have been submitted independently.
With the exception of the work on Bayes classifiers in V (which relates primarily to supervised pattern recognition by the above definition), this thesis
focuses on unsupervised pattern discovery. The aim is to discover structure in
data, without strict à priori specification of what the structure of interest is.
At the same time, completely open-ended hypothesis generation is not possible as the type of potential patterns is determined by the choice of pattern
discovery method, as well as implicitly by a range of other choices such as
the variables considered in a given study (Hand 1994, p. 319). Thus, while
disproportionality analysis may highlight a variety of patterns related to anything from a suspected drug–ADR association to an elevated reporting rate of
a certain drug in one particular country, the type of patterns in such studies
is restricted to unexpectedly high (or low) relative reporting rates. Similarly,
record matching may highlight a variety of non-independent reports, but all
highlighted patterns will refer to unexpected report similarity.
3.6 Facilitating interpretation
Interpretation is one of the final steps in the knowledge discovery
process (Fayyad et al. 1996), and a key component of the ADR signal
detection process. Transparency is of particular importance in the analysis
of non-systematically collected data such as individual case safety reports,
where the use of overly complex statistical methodology may give a false
sense of security and distract domain experts from limitations with the
data (Hauben et al. 2005). Breiman (1985) refers to the application of
advanced statistical methodology to hide inadequacies with the data as
‘edifice building’; in the ADR signal detection process, the use of overly
complex statistical methods may divert clinical experts from careful
consideration of alternative explanations to apparently excessive ADR
relative reporting rates.
Since the primary aim of applying knowledge discovery methods to collections of individual case safety reports is to guide and support domain experts
in their manual review, better transparency is a strong argument in favour of
choosing a simple method over a more complicated one. Indeed, better transparency is perhaps the strongest argument for choosing the simplified IC shrinkage measure over the more complicated one discussed in Section 3.4. Statistical sophistication does not necessarily rule out transparency, however. The
hit-miss model record matching algorithm in I is based on a rather intricate
probabilistic model, but its basis for highlighting a given record pair as suspected duplicates is immediately clear from an overview such as that presented
in Figure 4.2 of Section 4.
While sophisticated statistical methods are sometimes required to make the
most of the available data, knowledge discovery results should always be presented as transparently as possible. For example, while shrinkage measures of
disproportionality have proved a very powerful basis for filtering individual
case safety reports for interesting reporting patterns, they may confuse domain experts with little interest in the statistical methodology. Moreover, it
is difficult to evaluate the impact of data quality issues such as suspected duplication or reporting biases on shrinkage measures of disproportionality. Observed and expected counts provide a more transparent explanation for why
certain drug–ADR pairs have been highlighted for manual review. In the presence of suspected data quality issues, simple arithmetic will indicate to what
extent an excessive reporting rate may be due to a group of suspected duplicates, for instance. At the same time, domain experts often do want a sense
of whether an observed disproportionality is likely to be due to chance or not.
Credibility intervals around measures of disproportionality give some such indication, although the potential for violated independence assumptions means
that precision can be overestimated.
Adjustment for potential confounders may complicate interpretation of
shrinkage measures of disproportionality. However, as commented on in
III, adjusted observed-to-expected ratios sometimes correspond closely to stratum-specific ones, and translating adjusted observed-to-expected ratios to stratum-specific ones may simplify interpretation. For example, in the case of hypertension and zimeldine in Appendix E.4 of Hopstadius
(2006), the IC increases from -0.33 to +1.65 when adjusted for time of
reporting and country of origin. A closer investigation of the detailed data
available in Appendix F.2 of the same thesis indicates that the discrepancy
is due to hypertension being more than twice as common on US reports
(1.7%) as on reports from other countries (0.7%), whereas zimeldine was
never used in the USA. Additionally, the overall relative reporting rate of hypertension has increased in recent years, whereas zimeldine was primarily used in the early 1980s. Thus, the crude IC, which contrasts the observed relative reporting rate of hypertension given zimeldine to the overall relative reporting rate of hypertension in the entire database, underestimates the disproportionality. Arguably, the best information to present to clinical
experts in this case would be the observed number of reports on hypertension
for zimeldine, and the expected number of such reports based on the relative
reporting rate of hypertension in the countries and time period in which it
was available. Guiding domain experts towards appropriate interpretation is clearly as important a challenge as method development in knowledge discovery research.
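The stratified expected count suggested above can be sketched in Python. The counts are hypothetical, chosen only to reproduce the two hypertension rates quoted in the text (1.7% on US reports, 0.7% elsewhere) and a drug never used in the USA; stratification by time period is omitted, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum:
    n_reports: int  # all reports in the stratum
    n_adr: int      # reports listing the ADR of interest
    n_drug: int     # reports listing the drug of interest


def crude_expected(strata: list) -> float:
    """Expected count from database-wide marginals, ignoring strata."""
    n = sum(s.n_reports for s in strata)
    n_adr = sum(s.n_adr for s in strata)
    n_drug = sum(s.n_drug for s in strata)
    return n_drug * n_adr / n


def stratified_expected(strata: list) -> float:
    """Sum over strata of (drug reports) x (ADR rate within that stratum)."""
    return sum(s.n_drug * s.n_adr / s.n_reports for s in strata)


# Hypothetical counts: 680/40,000 = 1.7% and 420/60,000 = 0.7%.
strata = [
    Stratum(n_reports=40_000, n_adr=680, n_drug=0),      # USA
    Stratum(n_reports=60_000, n_adr=420, n_drug=1_000),  # other countries
]
observed = 9  # hypothetical number of reports on the drug-ADR pair

print(f"crude expected:      {crude_expected(strata):.1f}")
print(f"stratified expected: {stratified_expected(strata):.1f}")
```

Here the crude expected count is 11.0 and the stratified one 7.0, so a hypothetical 9 observed reports would appear under-reported against the whole database but over-reported against the countries where the drug was actually used, mirroring the sign change of the IC in the zimeldine example.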
3.7 Future directions
The new methodology proposed in this thesis provides a strong basis for future improvement and further research on knowledge discovery methods for
collections of individual case safety reports. The method for drug–drug interaction detection goes beyond simple drug–ADR disproportional reporting
rates, and could potentially be used also to screen for other types of ADR
risk factors, such as those related to patient gender or age. In general, we must
aim to make better use of the rich information available on individual case
safety reports. Virtually all knowledge discovery methods of today (including
those described in this thesis) are based on raw numbers of reports (Hauben
et al. 2005). They account neither for the amount or quality of information on each report nor for suspected duplication. This is in stark contrast with clinical review, in which both the quality of single reports and the quality of sets of reports as a group are carefully scrutinised (Meyboom et al. 1997). Indeed,
given that the overall aim of applying knowledge discovery methods to collections of individual case safety reports is to assist and direct clinical review,
an important challenge for the future is to achieve better alignment between
automated knowledge discovery and clinical review. A first step may be to
develop new and improved quality criteria for individual case safety reports
similar to those discussed by Edwards et al. (1990). Based on such quality
criteria, the number of distinct, high-quality reports referring to a particular drug–ADR pair can be identified and could potentially provide a useful triage criterion. The possibility of highlighting single high-quality reports is interesting in its own right.
The extended hit-miss model record matching algorithm has proved very useful for duplicate detection in the WHO database. Its importance is likely to
increase even further in the future, as new categories of health care professionals, and even patients, are invited to submit reports. In addition, the hit-miss model record matching algorithm sometimes highlights non-independent
reports other than pure suspected duplicates. Non-independent reports distort
data analysis, and their identification is important both for effective first pass
screening and for clinical review where the consideration of a group of related
reports as independent pieces of information is potentially deceptive. For this
purpose, an adapted hit-miss model record matching algorithm should ideally
be developed explicitly for the purpose of detecting non-independent reports
other than pure duplicates. A main challenge is how to incorporate, in subsequent data analysis, the information that some reports are suspected to be
related. One might conceive of an extended disproportionality analysis where
reports were weighted according to whether they are part of a suspected cluster or not. Given the tedious process of having suspected duplicates confirmed
and removed from collections of individual case safety reports, the same approach could perhaps be used also to account for suspected duplication, in first
pass screening. In a similar spirit, reports could perhaps also be weighted by
their quality of information.
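One way to sketch the weighting idea, purely as an assumption since the text does not fix a scheme, is to give each report the weight 1/k, where k is the size of the suspected cluster it belongs to, so that a cluster of related reports contributes one unit to the observed count. The function and labels below are hypothetical; the corresponding treatment of expected counts is not addressed, and quality weights could in principle be multiplied in.

```python
from collections import Counter


def weighted_observed(report_ids, cluster_of):
    """Observed count with each report weighted 1 / (size of its suspected cluster).

    cluster_of maps a report id to a suspected-cluster label (hypothetical output
    of a record matching step); reports without a label count as singletons
    (weight 1). Cluster sizes are counted among the reports passed in.
    """
    labels = [cluster_of.get(r, "singleton:" + r) for r in report_ids]
    sizes = Counter(labels)
    return sum(1 / sizes[label] for label in labels)


# Hypothetical: ten reports on one drug-ADR pair, four of them in one suspected cluster.
reports = [f"r{i}" for i in range(10)]
clusters = {"r6": "c1", "r7": "c1", "r8": "c1", "r9": "c1"}
print(weighted_observed(reports, clusters))
```

In this example, ten reports containing one suspected cluster of four contribute an observed count of 7.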
Another important challenge for the future is to further advance the methods
for exploring patterns involving large groups of drugs and ADRs in collections of individual case safety reports. In Orre et al. (2005), we use a Hopfield
type network and a mixture model based probabilistic clustering algorithm
to identify suspected ADR syndromes in the WHO database. The main challenge is that while each syndrome may consist of a large group of ADRs,
each report tends to include only a small subset of these, so training data is
both noisy and incomplete. Pattern discovery in high-dimensional binary data
has been studied in other application areas such as market basket analysis and
document retrieval (Bingham et al. 2002), and this research provides a good
starting point for further development in our area. An interesting generalisation of the mixture model based clustering algorithm, for high-dimensional
binary data, is the subspace clustering method proposed by Patrikainen and
Mannila (2004), which models only the most characteristic attributes for each
class. For the discovery of reporting patterns based on smaller groups of reports, the hit-miss model based record matching algorithm may potentially
prove useful. Its advantage is that it does not attempt to build a global model,
but searches for groups of unexpectedly similar reports, based on pairwise
comparison.
The importance of individual case safety reports for early post-marketing discovery of previously undetected drug toxicity is clear. At the same time, these
data sets are not optimal for all types of ADR-related knowledge discovery.
Specifically, each report constitutes a snapshot in time, and any information
on the patient’s previous medical history is limited, at best. Therefore, it is
difficult to evaluate the potential impact of channelling effects, where those
patients that do not respond favourably to one medical treatment are systematically switched to a specific other treatment. Similarly, individual case safety
reports usually do not provide enough information to determine whether there
are differences in the severity of the underlying disease between patients prescribed different drugs. Yet another limitation with individual case safety reports is that adverse events without clear temporal association with the prescription of the drug are difficult to identify as suspected ADRs, in particular if
the background incidence of the adverse event is high (Meyboom et al. 1997).
As a consequence, longitudinal patient records listing patients’ entire medical
histories are a very interesting complementary source of information. Combined, individual case safety reports and longitudinal patient records may allow for more comprehensive ADR-related knowledge discovery. While the
methodology proposed in the context of this thesis has been developed specifically for the exploratory analysis of collections of individual case safety reports, some of it may be relevant also for the analysis of longitudinal patient
records. Specifically, the method for interaction detection introduced in IV
can be adapted to longitudinal patient records, and the proposed framework for exploratory analysis outlined in Figure 3.1 should, with some modifications, apply also to longitudinal patient records.
4. Overview of the papers
This thesis is based on five original contributions. The order in which they
are presented corresponds roughly to their natural order of application in the
knowledge discovery process for ADR surveillance. I focuses on improving
data quality through identifying suspected duplicate reports. II, III and IV
propose improvements to, and evaluate different aspects of, disproportionality analysis for individual case safety reports. Finally, V proposes a bootstrap
method to estimate the uncertainty in each prediction of a Bayes classifier.
Historically, II and V are based on related work on Bayesian bootstrap analysis in 2003. An earlier version of II was presented at the 25th annual conference of the International Society for Clinical Biostatistics in Leiden, the
Netherlands, 2004. The duplicate detection algorithm in I was developed during 2004 and 2005, and a shorter version of this paper was presented at the
Eleventh International Conference on Knowledge Discovery and Data Mining
in Chicago, 2005. The evaluation of the adjusted observed-to-expected ratio in
III was performed during 2005 and 2006, and the statistical methodology for
drug–drug interaction detection in IV was developed during 2006. The aim of
this section is to provide a conceptual overview of the five papers.
4.1 Paper I
Good data quality is a prerequisite for effective data analysis (Kim et al. 2003,
De Veaux and Hand 2005). One important data quality problem in collections
of individual case safety reports is that of report duplication. Duplicate reports
are unlinked reports related to the same ADR incident, perhaps provided by
different health professionals or by the same health professional to different
drug safety centres. Their presence is a problem in the analysis of individual case safety reports because the total number of reports on a particular
drug–ADR pair is both the basis for automated knowledge discovery and an
important piece of information in clinical review of potential drug safety signals. When a single suspected ADR incident yields several reports, this may
distort the analysis. Some studies indicate that duplicates may account for as
large a proportion as 5% of all reports. More importantly, suspected report
duplication appears not to be evenly spread in the data set: whereas most reports have no suspected duplicates, a small minority have several. Requirements and regulations selectively stimulate reporting of previously unknown and serious ADRs, and may also increase the risk of duplicate reports related to such incidents (R. H. B. Meyboom, personal communication). The identification of suspected duplicates is thus an important step towards improved data quality and, ultimately, more effective automated knowledge discovery as well as better informed clinical review.

Figure 4.1: The hit-miss model. (Diagram: a true value T gives rise to the observed value on the first report, X, and on the second report, Y; each observation is a hit (T) with probability 1 − a − b, a miss (?) with probability a, or a blank (−) with probability b.)
The identification of suspected duplicates in collections of individual case
safety reports is a difficult challenge. Duplicate reports will often either have
been submitted by different individuals or processed in different reporting systems, and as such can be superficially very dissimilar. Different ADR terms
may have been used to encode the same incident, patient information may
be erroneous or incomplete and the listed drugs may differ between reports
related to the same incident. Therefore, simple rule-based methods are usually insufficient to reliably detect suspected duplicates. The duplicate detection method proposed in I is based on the hit-miss model for statistical record
linkage introduced by Copas and Hilton (1990). The hit-miss model provides
a probability model for how discrepancies between related database records
occur. It allows for flexible and robust record matching in the presence of a
large variety of errors. Under the hit-miss model, each observed value X on
a database record (for example a listed patient gender on a report) is based
on a true but unobserved value T = t (in this case the true gender of the patient). Observed values on related records are assumed to have been generated
in independent, identically distributed random processes, resulting in i) a miss
(with respect to the true value) with probability a, ii) a blank with probability
b, or iii) a hit with probability 1 − a − b (see Figure 4.1). For a miss, X is a random value independent of T but following the same distribution; for a blank, the value of X is missing; and for a hit, X = t. Hits and misses are unobservable events of an assumed data generating process. In screening for suspected duplicates, we make comparisons between distinct reports based on whether they have matching or mismatching information.

Figure 4.2: Hit-miss model based scoring of a sample record pair. (The two reports match on the date 2002-02-07, the country Norway, the drugs sertraline and mirtazapine, and the ADR ventricular tachycardia; gender is missing on the first report and Female on the second; the ages differ slightly (62 and 60 years); zopiclone is listed on the second report only. After a compensation of −1.4 for the correlation between sertraline, mirtazapine and tachycardia, the total match score is +38.2.)
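A minimal simulation of the hit-miss data generating process described above, for a single categorical field, is sketched below. The field distribution and the parameter values a and b are hypothetical, and the function names are illustrative rather than taken from I.

```python
import random


def observe(true_value, a, b, values, probs, rng):
    """Generate one observed field value under the hit-miss model.

    miss  (probability a):         an independent draw from the field distribution
    blank (probability b):         None, i.e. a missing value
    hit   (probability 1 - a - b): the true value itself
    """
    u = rng.random()
    if u < a:
        return rng.choices(values, weights=probs)[0]
    if u < a + b:
        return None
    return true_value


def related_pair(values, probs, a, b, rng):
    """Two reports on the same incident: one shared true value, two independent observations."""
    t = rng.choices(values, weights=probs)[0]
    return observe(t, a, b, values, probs, rng), observe(t, a, b, values, probs, rng)


def unrelated_pair(values, probs, a, b, rng):
    """Two reports on different incidents: independent true values."""
    t1 = rng.choices(values, weights=probs)[0]
    t2 = rng.choices(values, weights=probs)[0]
    return observe(t1, a, b, values, probs, rng), observe(t2, a, b, values, probs, rng)


# Hypothetical field distribution and hit-miss parameters.
genders, freqs = ["female", "male"], [0.6, 0.4]
rng = random.Random(2007)
print(related_pair(genders, freqs, a=0.05, b=0.10, rng=rng))
print(unrelated_pair(genders, freqs, a=0.05, b=0.10, rng=rng))
```

Setting a = b = 0 makes related reports always agree, while b = 1 blanks every value; realistic values of a and b sit in between and are estimated from training data.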
Duplicate detection in the WHO database is based on patient gender, patient
age, outcome, country of origin, date of onset, as well as all listed drugs and
ADRs. For each record field, a match weight is calculated based on the likelihood ratio for the observed matching event under the assumption that the
two records under study i) relate to the same underlying ADR incident or
ii) are unrelated. The total match score is obtained by adding together the
match weights for all record fields, as illustrated in Figure 4.2. It can be shown
that, under the hit-miss model, matches always receive positive weights, mismatches receive negative weights and missing information on either report results in a match weight of 0. Moreover, matches on rare events receive higher
match weights than matches on common events. This is an appealing property
since chance matches between unrelated reports are more likely on common events. The penalty for a mismatch is constant for a given record field
but varies between record fields depending on how many mismatches were
observed in each record field in the available training data (consisting of confirmed pairs of duplicate reports). Thus, in screening for suspected duplicates,
mismatches in error prone record fields are penalised less than mismatches in
record fields that are usually reliable.
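The properties listed above can be checked numerically. The sketch below derives the two likelihoods directly from the hit-miss model as stated in this section (each related record is independently a hit, a miss drawn from the value distribution β, or a blank) and takes the base-2 logarithm of their ratio. It is a derivation for illustration only, not necessarily the exact formulas of I or of Copas and Hilton (1990); the logarithm base, function names and value frequencies are assumptions.

```python
import math


def match_weight(beta_x, a, b):
    """log2 likelihood ratio for two reports agreeing on value x.

    beta_x is the relative frequency of x in the field; a and b are the field's
    miss and blank probabilities, and h = 1 - a - b is the hit probability.
    """
    h = 1.0 - a - b
    # Related reports: both hit, or at least one miss that happens to equal x.
    p_related = beta_x * (h * h + a * (2.0 * h + a) * beta_x)
    # Unrelated reports: two independent non-blank values that both equal x.
    p_unrelated = ((h + a) * beta_x) ** 2
    return math.log2(p_related / p_unrelated)


def mismatch_weight(a, b):
    """log2 likelihood ratio for disagreeing values (requires a > 0).

    The value frequencies cancel, so the penalty depends only on a and b.
    """
    h = 1.0 - a - b
    return math.log2(a * (2.0 * h + a) / (h + a) ** 2)


def field_weight(x1, x2, freqs, a, b):
    """Match weight for one record field; None marks missing information."""
    if x1 is None or x2 is None:
        return 0.0  # a blank on either report carries no evidence either way
    if x1 == x2:
        return match_weight(freqs[x1], a, b)
    return mismatch_weight(a, b)


# Hypothetical field with one rare and two more common values.
freqs = {"rare_drug": 0.001, "common_drug": 0.05, "other_drug": 0.02}
print(field_weight("rare_drug", "rare_drug", freqs, a=0.05, b=0.1))
print(field_weight("common_drug", "common_drug", freqs, a=0.05, b=0.1))
print(field_weight("rare_drug", "other_drug", freqs, a=0.05, b=0.1))
print(field_weight(None, "rare_drug", freqs, a=0.05, b=0.1))
```

Algebraically, the match likelihood ratio simplifies to 1 + (h/(h + a))²(1/β_x − 1), which exceeds 1 whenever β_x < 1 and grows as β_x shrinks, while the mismatch ratio equals 1 − (h/(h + a))², which is below 1 and moves towards 1 as the miss probability a grows with b fixed. Error-prone fields are therefore penalised less.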
In I, we propose two methodological improvements to the standard hit-miss
model: a hit-miss mixture model for numerical record fields and an adjustment of the overall match score for violated independence assumptions between matching record fields. The hit-miss mixture model extends the hit-miss
model by including the possibility of imperfect matches in numerical record
fields, which deviate less from the true value than complete misses do. Such deviations follow a narrow distribution centred at the true value. The compensation for violated independence assumptions is based on an IC disproportionality measure for the overall co-occurrence of two matching events in the
database. It reduces the total match score for groups of matched events that
occur together more often in the database than would be expected under the
assumption of independence. The greatest strengths of the extended hit-miss
model are that it provides transparent and intuitive match weights and that its
parametrisation allows for robust fitting even in the absence of large numbers
of confirmed duplicates.
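The idea behind the compensation can be illustrated with a deliberately simplified rule. Under the hit-miss model as described above, the match weight for a rare value behaves roughly like −log2 of the value's frequency plus a constant, so adding separate weights for matched events that tend to be reported together overstates the evidence by roughly the log2 observed-to-expected ratio of the group. The sketch subtracts the positive part of a shrunk log2 observed-to-expected ratio for each matched group. The shrinkage constant 0.5, the clipping at zero, the function names and all counts are assumptions for illustration; this is not the procedure in I.

```python
import math


def group_ic(n_group, marginal_counts, n_reports, shrink=0.5):
    """Shrunk log2 observed-to-expected ratio for a group of co-reported events.

    n_group: reports listing every event in the group
    marginal_counts: reports listing each event, one count per event
    The expected count assumes the events are reported independently.
    """
    expected = float(n_reports)
    for n_i in marginal_counts:
        expected *= n_i / n_reports
    return math.log2((n_group + shrink) / (expected + shrink))


def compensated_score(field_weights, matched_groups, n_reports):
    """Total match score minus the positive part of each matched group's IC.

    matched_groups: one (n_group, marginal_counts) pair per group of matched events
    """
    total = sum(field_weights)
    for n_group, marginal_counts in matched_groups:
        total -= max(0.0, group_ic(n_group, marginal_counts, n_reports))
    return total


# Hypothetical database of one million reports in which two drugs and one ADR
# are co-reported three times, against about one report expected by chance.
weights = [6.0, 8.0, 8.0]
group = (3, [20_000, 10_000, 5_000])
print(compensated_score(weights, [group], n_reports=1_000_000))
```

Groups reported together no more often than expected leave the total unchanged, so only positively associated matches are discounted.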
Because suspected duplicates can be reliably confirmed or refuted, the performance of a proposed duplicate detection method can be easily evaluated.
In I, we demonstrate that the extended hit-miss model is able to identify, with
high accuracy (94.7% in our test data set), the most likely duplicate for a
given database record. We also show that it effectively discriminates pairs of
true duplicates from random matches. In a batch of 1559 Norwegian reports
that included 19 confirmed duplicates, the extended hit-miss model identified
12 of the 19 already known duplicates (corresponding to a 63% recall) while
additionally highlighting two pairs and one set of three reports as suspected
duplicates that were not originally labelled as such (corresponding to a nominal 71% precision). Out of the additional suspected duplicates, one pair was
later confirmed by the Norwegian national centre as true duplicates;
the other pair remains suspected but unconfirmed; and the
set of three suspected duplicates turned out to be separate reports on the same
drug–ADR pair submitted by the same dentist, but for three distinct patients.
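The quoted recall and nominal precision can be reproduced from the counts above. This sketch assumes that the set of three reports is counted as three suspected pairs and that every additional highlight is treated as a false positive before follow-up; that reading reproduces the nominal 71% but is not spelled out in the text.

```python
def recall_precision(tp, fn, fp):
    """Recall = TP / (TP + FN); precision = TP / (TP + FP)."""
    return tp / (tp + fn), tp / (tp + fp)


known = 19           # confirmed duplicate pairs in the batch
found = 12           # known pairs highlighted by the method
extra_pairs = 2 + 3  # two extra pairs, plus the set of three reports counted as 3 pairs

recall, precision = recall_precision(found, known - found, extra_pairs)
print(f"recall = {recall:.1%}, nominal precision = {precision:.1%}")
```

This gives 12/19 ≈ 63% and 12/17 ≈ 71%. Counting the pair later confirmed by the Norwegian national centre as a true positive would raise precision to 13/17 ≈ 76%.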
4.2 Paper II
The IC measure of disproportionality discussed in Sections 3.3 and 3.4 is the
basis for routine screening of the WHO database to highlight excessive ADR
reporting rates. In its original implementation (Bate et al. 1998), the IC only
allowed for the identification of pairwise disproportionality (typically between
one drug and one ADR). It relied on large sample approximations to compute
credibility intervals and did not accommodate adjustment for suspected confounders. In response to these issues, II proposes credibility intervals that are accurate
also for small samples, adopts a post-stratification approach to adjust for suspected confounders and introduces a simple extension to higher orders for the
IC measure of disproportionality. The overall aim of these improvements is to
allow more sophisticated and reliable screening for disproportional reporting
rates in the WHO database.
The credibility intervals for the IC proposed in Bate et al. (1998) were based
on a normal approximation to the posterior IC distribution. In II, we demonstrate by precise Monte Carlo simulation that this is often not accurate enough.
As an alternative, we propose an approximate formula for computing credibility intervals of the posterior IC distribution that is accurate also for rare events.
It may seem counter-intuitive that the use of small sample methods should be
necessary in the analysis of a data set with nearly 4 million records. However,
as the focus turns to specific drug–ADR pairs, the number of relevant reports
decreases very rapidly. Among the roughly 720,000 drug–ADR pairs ever co-reported in the WHO database, more than 320,000 are co-reported only once,
and an additional 106,000 only twice. More than 80% are co-reported fewer
than 10 times. Given the variety of challenges involved in analysing
these data sets, the importance of very accurate credibility intervals is perhaps
limited, but one practically useful advantage of the refined credibility intervals
proposed in II over those in Bate et al. (1998) is that they allow a drug–ADR pair
to be highlighted when the first three reports on a new drug all refer to the same
ADR. This may allow for very early warning of some suspected ADRs.
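To illustrate why small-sample accuracy matters in the three-report example, the sketch below computes the lower 2.5% credibility limit of the IC (IC025) by Monte Carlo and compares it with a normal approximation. The posterior used here, μ ~ Gamma(n + 1/2, E + 1/2) with IC = log2 μ, is a simplified shrinkage posterior assumed for illustration; it is neither the model of Bate et al. (1998) nor the approximate formula proposed in II, and the counts are hypothetical.

```python
import math
import random


def ic_interval_mc(n_obs, expected, n_samples=20000, seed=2, prior=0.5):
    """2.5% and 97.5% posterior quantiles of IC = log2(mu) by Monte Carlo.

    Assumes mu ~ Gamma(shape=n_obs + prior, rate=expected + prior).
    random.gammavariate takes a scale parameter, hence 1 / rate.
    """
    rng = random.Random(seed)
    shape, rate = n_obs + prior, expected + prior
    draws = sorted(math.log2(rng.gammavariate(shape, 1.0 / rate))
                   for _ in range(n_samples))
    return draws[int(0.025 * n_samples)], draws[int(0.975 * n_samples)]


def ic_interval_normal(n_obs, expected, prior=0.5):
    """Delta-method normal approximation to the same posterior, for comparison."""
    shape, rate = n_obs + prior, expected + prior
    centre = math.log2(shape / rate)
    sd = 1.0 / (math.sqrt(shape) * math.log(2))
    return centre - 1.96 * sd, centre + 1.96 * sd


# Hypothetical new drug: its first 3 reports all list an ADR that makes up 1% of
# all reports, so about 3 * 0.01 = 0.03 such reports would be expected by chance.
print(ic_interval_mc(3, 0.03))
print(ic_interval_normal(3, 0.03))
```

Under these assumptions the Monte Carlo IC025 is positive, at about 0.7, while the normal approximation puts the lower limit at about 1.2, so the two can differ substantially when only a handful of reports are available.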
There may be a need to eliminate the impact of suspected confounders in
disproportionality analysis. In II, we adopt a post-stratification approach,
originally proposed by DuMouchel (1999), to adjusting the observed-to-expected
ratio for potential confounders. The adjusted observed-to-expected ratio is
an average of stratum-specific observed-to-expected ratios weighted by the
stratum-specific expected numbers of reports:
$$
\mathrm{OE}_{\mathrm{adj}}
  = \frac{\sum_z \dfrac{O^z_{xy}}{E^z_{xy}} \cdot E^z_{xy}}{\sum_z E^z_{xy}}
  = \frac{O_{xy}}{\sum_z E^z_{xy}}
  \tag{4.1}
$$
The relative merits of the adjusted observed-to-expected ratio, and the general
impact of confounding on disproportionality analysis in the WHO database
are further discussed in III.
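Equation (4.1) can be checked on a toy example. The sketch assumes the usual expected count E^z_xy = n^z_x n^z_y / n^z within each stratum z (an assumption here, since the definition is not repeated in this section) and uses hypothetical counts chosen so that the stratifying covariate alone creates a crude disproportionality.

```python
def observed_to_expected(strata):
    """Crude and post-stratified observed-to-expected ratios for a drug-ADR pair.

    strata: one (n_xy, n_x, n_y, n) tuple per stratum z, where n_xy counts reports
    listing both x and y, n_x and n_y those listing each, and n all reports.
    """
    observed = sum(s[0] for s in strata)
    expected = [n_x * n_y / n for _, n_x, n_y, n in strata]

    # Crude: pool the strata first, then compute a single expected count.
    n_x_all = sum(s[1] for s in strata)
    n_y_all = sum(s[2] for s in strata)
    n_all = sum(s[3] for s in strata)
    crude = observed / (n_x_all * n_y_all / n_all)

    # First form of (4.1): stratum ratios weighted by stratum expected counts.
    # Strata with zero expected count also have zero observed count and are skipped.
    weighted = sum((s[0] / e) * e for s, e in zip(strata, expected) if e > 0) / sum(expected)
    # Second form of (4.1): total observed over summed expected counts.
    adjusted = observed / sum(expected)
    return crude, weighted, adjusted


# Hypothetical strata in which x and y are independent within each stratum,
# but both are more common in the first stratum than in the second.
strata = [(25, 50, 50, 100), (1, 10, 10, 100)]
print(observed_to_expected(strata))
```

The crude ratio is 26/18 ≈ 1.44, whereas both forms of the adjusted ratio equal 1: post-stratification removes the disproportionality that is driven purely by the stratifying covariate.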
The extension of the IC to higher order associations in II is an important step
towards being able to screen for disproportional reporting indicative of effect
modification (for example variation across age groups in the risk of a certain
ADR due to a particular drug). The higher order IC is simple to estimate and
robust to overfitting based on limited amounts of data. Disregarding shrinkage,
the third order IC between events x, y and z proposed in II is:
$$
\mathrm{IC}_{xyz} = \mathrm{IC}_{xy \mid z} - \mathrm{IC}_{xy} \tag{4.2}
$$
where
$$
\mathrm{IC}_{xy \mid z} = \log_2 \frac{P(y \mid x, z)}{P(y \mid z)} \tag{4.3}
$$
Thus, a positive third order IC value indicates that the presence of event z increases the disproportionality between x and y (and vice versa – the measure
is symmetrical in x, y and z). We further show that the third order IC can be
expressed as the logarithm of an observed-to-expected ratio for the three-way relative reporting
rate, where the expected relative reporting rate is based on a product of factors relating to main effects and pairwise interactions. The advantage of (4.2)
relative to the third order IC proposed in Orre et al. (2000) is that (4.2) accounts for pairwise associations in the expected relative reporting rate. The
discussion of how to best screen for interaction in ADR surveillance is further
extended in IV.
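The symmetry claim and the observed-to-expected form can be verified numerically. The sketch computes the unshrunk third order IC from hypothetical report counts, once via (4.2) and (4.3) and once as log2 of P(xyz) over the expected value P(xy)P(xz)P(yz)/(P(x)P(y)P(z)), which follows from (4.2) and (4.3) by expanding the conditional probabilities. It assumes the pairwise IC takes the form IC_xy = log2 P(y | x)/P(y); the counts and helper names are illustrative.

```python
import math
from itertools import permutations


def c(counts, *events):
    """Number of reports listing all the given events (all reports if none given)."""
    return counts[frozenset(events)]


def ic3_conditional(counts, x, y, z):
    """IC_xyz = IC_xy|z - IC_xy from (4.2) and (4.3), without shrinkage."""
    ic_xy_given_z = math.log2((c(counts, x, y, z) / c(counts, x, z)) /
                              (c(counts, y, z) / c(counts, z)))
    ic_xy = math.log2((c(counts, x, y) / c(counts, x)) /
                      (c(counts, y) / c(counts)))
    return ic_xy_given_z - ic_xy


def ic3_observed_to_expected(counts, x, y, z):
    """The same quantity as log2 of observed over expected three-way probability."""
    n = c(counts)
    p = lambda *ev: c(counts, *ev) / n  # relative reporting frequency
    expected = p(x, y) * p(x, z) * p(y, z) / (p(x) * p(y) * p(z))
    return math.log2(p(x, y, z) / expected)


# Hypothetical report counts for events x, y and z among 1000 reports.
counts = {
    frozenset(): 1000,
    frozenset({"x"}): 100, frozenset({"y"}): 80, frozenset({"z"}): 200,
    frozenset({"x", "y"}): 20, frozenset({"x", "z"}): 30, frozenset({"y", "z"}): 25,
    frozenset({"x", "y", "z"}): 10,
}
print([round(ic3_conditional(counts, *order), 4) for order in permutations("xyz")])
print(round(ic3_observed_to_expected(counts, "x", "y", "z"), 4))
```

All six orderings of (x, y, z) give the same value, log2(16/15) for these counts, which matches the observed-to-expected form.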
4.3 Paper III
Confounders are covariates that distort the quantitative relationship under
study. A textbook example of confounding is that the crude association
between coffee drinking and coronary heart disease in observational studies
may be due to heavy coffee drinkers also having a greater propensity to
smoke (Hennekens et al. 1976). While in experimental studies, randomisation
in principle eliminates the impact of all potential confounders, observational
studies are non-randomised by design and require each suspected confounder
to be individually identified and adjusted for in the analysis (or accounted for
in the study design). The potential impact of unaccounted-for confounding
variables is a constant concern in the interpretation of observational data. It
has been argued that routine adjustment for potential confounders is crucial
also in first pass screening for excessive ADR reporting rates in collections of
individual case safety reports to avoid highlighting disproportional reporting
rates driven by other covariates.
In III, we study the relative merits of the post-stratification approach to routine adjustment of the observed-to-expected ratio adopted for the IC in II. We
focus on the WHO database, but use both simulated stratification and stratification based on true covariates (in particular patient age, patient gender,
country of origin and time of reporting). The two main results are that the
adjusted observed-to-expected ratio is sensitive to over-stratification and that
routine adjustment for common potential confounders has less impact on signal detection performance than initially believed. With a careful selection of
suspected confounders and a coarser categorisation of these covariates,
routine adjustment does improve performance, as measured against a literature reference, in our investigation. However, this performance improvement is modest
compared to that due to imposing a triage criterion that requires reports from
more than one country to highlight a drug–ADR pair for clinical review (see
Figure 4.3). These results support the claim by Bate et al. (2003) that confounding may be a less important source of bias in first pass screening of collections of
individual case safety reports for excessive ADR reporting rates than generally
assumed.
Figure 4.3: Precision–recall graphs relative to a literature reference for the crude IC025
and the IC025 simultaneously adjusted for country of origin and reporting time interval,
with and without a triage criterion requiring reports from more than one country. The
graphs plot precision (number of true positives over number of true positives and false
positives) vs recall (number of true positives over number of true positives and false
negatives) at varying thresholds on IC025. [Plot not reproduced. Curves: crude without triage, adjusted without triage, crude with triage, adjusted with triage; recall from 0 to 1 on the horizontal axis, precision from 0 to 0.7 on the vertical axis.]
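The precision and recall definitions in the caption can be written out directly. The sketch below traces a precision–recall curve by sweeping a threshold over a score such as IC025, and models the triage criterion simply as a per-pair flag that excludes a pair from highlighting. The function name, its inputs and the toy data are hypothetical.

```python
def precision_recall_points(scores, in_reference, passes_triage=None):
    """Precision and recall when highlighting pairs with score >= t, for each observed t.

    scores: e.g. IC025 per drug-ADR pair
    in_reference: True if the pair is listed in the reference standard
    passes_triage: optional flags; pairs failing triage are never highlighted
    """
    if passes_triage is None:
        passes_triage = [True] * len(scores)
    total_true = sum(in_reference)  # assumed to be at least one
    points = []
    for t in sorted(set(scores), reverse=True):
        flagged = [ref for s, ref, ok in zip(scores, in_reference, passes_triage)
                   if ok and s >= t]
        if not flagged:
            continue  # precision is undefined when nothing is highlighted
        tp = sum(flagged)
        points.append((t, tp / len(flagged), tp / total_true))
    return points


# Toy data: four drug-ADR pairs; the last one has reports from a single country.
scores = [2.0, 1.5, 1.0, 0.5]
in_reference = [True, False, True, False]
triage = [True, True, True, False]
print(precision_recall_points(scores, in_reference))
print(precision_recall_points(scores, in_reference, triage))
```

In the toy data the triage flag removes a low-scoring false positive, so precision at the lowest threshold rises from 1/2 to 2/3 without any loss of recall.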
4.4 Paper IV
Interaction between drug substances may lead to excessive risk of certain
ADRs when two drugs are taken at the same time. If previously unknown
high risk drug combinations can be identified, they can potentially be avoided
in the future, and drugs that would have otherwise been withdrawn can remain
on the market with warnings concerning co-medication. Thus, the identification of suspected drug–drug interaction is important both from the individual patient safety perspective and from the general public health perspective.
In addition to the higher order IC measure of disproportionality in II, two
regression-based approaches to screening individual case safety reports for
suspected drug–drug interaction have also been proposed (van Puijenbroek
et al. 1999, DuMouchel and Pregibon 2001), but no publicly available results
indicate that any of the proposed methods have been successfully applied to
prospective screening for suspected drug–drug interaction.
An important contribution of IV is the observation that the limited success of
earlier proposed methods for drug–drug interaction detection may be due to
their use of baseline models where, in the absence of interaction, different risk
factors essentially multiply. There are arguments from both public health and
individual patient safety perspectives to consider absolute differences in risk
rather than relative ones (Rothman et al. 1980): from a public health perspective, we are interested in whether the absolute number of ADR incidents of
a certain type in a given population depends on to what extent two different
drugs are co-prescribed; from the individual decision-making point of view,
we want to know whether the increase in absolute risk of a certain ADR due
to the prescription of one drug is modified by the co-prescription of another
drug. Based on these arguments, we propose in IV the Ω measure of suspected
drug–drug interaction. Ω is a shrinkage measure of three-way disproportional
reporting, based on the logarithm of an observed-to-expected ratio for the relative reporting rate of an ADR A under co-prescription of drugs D1 and D2.
In our model, the background risk of A and the risks of A attributable to D1
and D2, respectively, are independent. For small attributable risks, this leads
to an approximately additive model for risk differences in the population. The
main technical contribution is an approach to estimate the expected relative
reporting rate of A given D1 and D2 co-prescribed.
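One way to turn the independence assumption above into an expected reporting rate is to treat the background, D1 and D2 as independent causes of A, so that A is absent under co-prescription only if none of the three causes acts: 1 − f11 = (1 − f00) · [(1 − f10)/(1 − f00)] · [(1 − f01)/(1 − f00)] = (1 − f10)(1 − f01)/(1 − f00), where fij is the relative reporting rate of A with D1 present (i = 1) or absent (i = 0) and D2 present (j = 1) or absent (j = 0). For small rates this gives f11 ≈ f10 + f01 − f00, the approximately additive model mentioned above. This reading, the shrinkage constant 1/2 in Ω = log2((O11 + 1/2)/(E11 + 1/2)) with E11 = f11 · n11, the rule that neither drug may lower the rate below the background and all counts are assumptions for this sketch, not necessarily the estimator proposed in IV.

```python
import math


def expected_rate(f00, f10, f01):
    """Expected rate of A under D1 and D2 if background, D1 and D2 act as independent causes.

    Neither drug is allowed to lower the rate below the background rate f00.
    """
    f10, f01 = max(f10, f00), max(f01, f00)
    return 1.0 - (1.0 - f10) * (1.0 - f01) / (1.0 - f00)


def omega(counts, shrink=0.5):
    """Shrunk log2 observed-to-expected ratio for A among reports listing both drugs.

    counts[(i, j)] = (reports listing A, all reports) with D1 present if i == 1
    and D2 present if j == 1.
    """
    rate = {cell: a / n for cell, (a, n) in counts.items()}
    observed, n11 = counts[(1, 1)]
    e11 = expected_rate(rate[(0, 0)], rate[(1, 0)], rate[(0, 1)]) * n11
    return math.log2((observed + shrink) / (e11 + shrink))


# Hypothetical counts with background, D1-only and D2-only rates of 0.1%, 25% and 4%.
base = {(0, 0): (10, 10_000), (1, 0): (250, 1_000), (0, 1): (40, 1_000)}
print(omega({**base, (1, 1): (28, 100)}))  # close to the independence expectation
print(omega({**base, (1, 1): (80, 100)}))  # well above it
```

With these counts about 28 of the 100 co-prescription reports are expected to list A, so 28 observed gives Ω close to 0, while 80 observed gives Ω of about 1.5.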
In studies of the WHO database, we show that Ω highlights examples of established drug–drug interaction, with excessive relative reporting rates that go
undetected with logistic regression. For example, unlike logistic regression, Ω
indicates suspected interaction between gemfibrozil and cerivastatin with respect to the risk of rhabdomyolysis. This is a well-established
drug–drug interaction, and co-prescription with gemfibrozil was contraindicated for cerivastatin from the time it was introduced to the general public.
There are over 1,000 reports in the WHO database on rhabdomyolysis for concomitant use of cerivastatin and gemfibrozil, and the relative reporting rate of
rhabdomyolysis given cerivastatin together with gemfibrozil is over 75%. This
is to be compared with relative reporting rates of 0.1% in the absence of both
cerivastatin and gemfibrozil, 4% for gemfibrozil in the absence of cerivastatin
and 25% for cerivastatin in the absence of gemfibrozil. Clearly, a method for
drug–drug interaction surveillance must highlight this as indicative of suspected
interaction in order to be practically useful for first pass screening purposes.
Ω fulfils this requirement and allows for computationally efficient first pass
screening for suspected drug–drug interaction in collections of individual case
safety reports.
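The contrast can be made concrete with the rates quoted above. The sketch compares the expected rate of rhabdomyolysis under cerivastatin together with gemfibrozil for two baselines: independent causes, 1 − f11 = (1 − f10)(1 − f01)/(1 − f00), which is our reading of the independence assumption behind Ω, and multiplicative odds, the baseline implied by a logistic regression model without an interaction term. Plugging the quoted rates straight into each baseline, and using 75% as the observed rate, is a simplification; fitted models would also smooth the other cells.

```python
import math


def independent_causes(f00, f10, f01):
    """Expected rate when background, drug 1 and drug 2 act as independent causes."""
    return 1.0 - (1.0 - f10) * (1.0 - f01) / (1.0 - f00)


def multiplicative_odds(f00, f10, f01):
    """Expected rate when the drugs multiply the odds (no-interaction logistic model)."""
    def odds(f):
        return f / (1.0 - f)
    o11 = odds(f10) * odds(f01) / odds(f00)
    return o11 / (1.0 + o11)


f00 = 0.001      # neither cerivastatin nor gemfibrozil
f_ceri = 0.25    # cerivastatin without gemfibrozil
f_gem = 0.04     # gemfibrozil without cerivastatin
observed = 0.75  # cerivastatin with gemfibrozil (quoted as over 75%)

for name, baseline in [("independent causes", independent_causes),
                       ("multiplicative odds", multiplicative_odds)]:
    expected = baseline(f00, f_ceri, f_gem)
    print(f"{name}: expected {expected:.3f}, "
          f"log2(observed/expected) = {math.log2(observed / expected):+.2f}")
```

Under independent causes the expected rate is about 28%, so the observed rate stands out with a log2 ratio of about +1.4; under multiplicative odds the expected rate is about 93%, above what was observed, so no excess is indicated.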
4.5 Paper V
The aim of V is to demonstrate the usefulness of case-based imprecision estimates for Bayes classifier predictions. Unlike the overall expected prediction
error, case-based imprecision estimates indicate the certainty with which each
individual data point is predicted. Clearly, this will vary depending on the degree of similarity between the data point of interest and those in training data.
Bayes classifiers are generative classifiers that predict class membership indirectly, based on estimated distributions of the explanatory variables given
a specific class. This is in contrast with discriminative classifiers, such as logistic regression, whose parameters are optimised directly with respect to the
prediction performance on a given set of training data. As noted by Ng and
Jordan (2002), generative classifiers may reach their (higher) asymptotic error
more rapidly than discriminative classifiers such as logistic regression, and
thus be preferable in the absence of large amounts of training data. Despite its
often violated assumption of mutual independence between explanatory variables given class membership, the naive Bayes classifier has proved to compare well with more sophisticated classification methods in many real world
applications (Domingos and Pazzani 1997, Hand and Yu 2001). However, the
exact values of the estimated class probabilities are not trustworthy, as the
naive Bayes classifier tends to be overconfident in its predictions (Hand and Yu 2001). As an alternative, we propose that the certainty with
which each data point is classified be estimated based on Bayesian bootstrap
resampling of the original training data. The Bayesian bootstrap produces a
large number of slightly modified training data sets by repeatedly assigning
Di(1, 1, ..., 1)-distributed random weights to the observations in the original
training data. Based on each Bayesian bootstrap replicate of the original training data, a Bayes classifier is trained and used to predict the data point(s) of
interest. Instead of the predicted probability of class membership based on the
original Bayes classifier, we propose that the proportion of Bayesian bootstrap
replicates for which the predicted probability of class membership exceeds 0.5
be used as an estimate of the certainty with which a given data point is predicted. We provide results in V indicating that this reduces the expected
loss when some misclassifications are more costly than others.
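A minimal sketch of the procedure, assuming binary explanatory variables, a naive Bayes classifier with Laplace pseudo-counts of 1 and toy training data; these are choices made here for illustration rather than details taken from V, and the function names are our own. The Di(1, 1, ..., 1) weights are generated by normalising independent Exp(1) draws, a standard way of sampling from that Dirichlet distribution.

```python
import math
import random


def fit_weighted_naive_bayes(X, y, w, alpha=1.0):
    """Naive Bayes for binary features, with one weight per training observation.

    alpha is a Laplace pseudo-count; weights are on the scale of counts,
    i.e. they sum to the number of observations.
    """
    model = {}
    for c in sorted(set(y)):
        w_c = sum(wi for wi, yi in zip(w, y) if yi == c)
        prior = w_c / sum(w)
        probs = []
        for k in range(len(X[0])):
            w_ck = sum(wi for xi, wi, yi in zip(X, w, y) if yi == c and xi[k] == 1)
            probs.append((w_ck + alpha) / (w_c + 2.0 * alpha))
        model[c] = (prior, probs)
    return model


def prob_class(model, x, target):
    """Posterior probability of class `target` for a binary feature vector x."""
    log_post = {}
    for c, (prior, probs) in model.items():
        lp = math.log(prior) if prior > 0 else -math.inf
        for xk, p in zip(x, probs):
            lp += math.log(p if xk == 1 else 1.0 - p)
        log_post[c] = lp
    top = max(log_post.values())
    total = sum(math.exp(v - top) for v in log_post.values())
    return math.exp(log_post[target] - top) / total


def bootstrap_certainty(X, y, x_new, target, n_rep=500, seed=0):
    """Share of Bayesian bootstrap replicates predicting P(target | x_new) > 0.5."""
    rng = random.Random(seed)
    n = len(X)
    hits = 0
    for _ in range(n_rep):
        # Di(1, ..., 1) weights via normalised Exp(1) draws, rescaled to sum to n.
        e = [rng.expovariate(1.0) for _ in range(n)]
        s = sum(e)
        w = [n * ei / s for ei in e]
        if prob_class(fit_weighted_naive_bayes(X, y, w), x_new, target) > 0.5:
            hits += 1
    return hits / n_rep


# Toy training data: two binary features, classes 1 and 0.
X = [[1, 1], [1, 0], [1, 1], [1, 1], [0, 0], [0, 1], [0, 0], [0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
print(bootstrap_certainty(X, y, [1, 1], target=1))
print(bootstrap_certainty(X, y, [1, 0], target=1))
```

A data point that resembles the training examples of one class closely on every feature is assigned to that class in nearly all replicates, whereas a point with conflicting features is assigned less consistently, which is the kind of case-based certainty the text describes.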
A comment made in Norén (2005), which is worth repeating, is that in V, the
marginal class probabilities P(y_j) were estimated essentially as the proportion
of instances from each class in the available training data. This is appropriate
when training data is a representative sample from the population to which
the classifier is to be applied. If, on the other hand, the composition of training
data does not necessarily represent that of future observations, then P(y_j) must
either be estimated based on external data relevant to the population of interest
or be based on prior assumptions. The Bayesian bootstrap can be modified to
accommodate this, by replacing the numbers of data points from each class in
training data {n_y1, n_y2, ...} (see Table 1 in V) by the corresponding numbers in
a data set representative of future samples (or by appropriate pseudo-counts).
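The modification rests on the aggregation property of the Dirichlet distribution: summing Di(1, 1, ..., 1) observation weights within each class gives class totals distributed as Di(n_y1, n_y2, ...). Assuming, as we read the text, that P(y_j) in each replicate is the class's share of the bootstrap weights, substituting other counts amounts to drawing P(y_j) from a Dirichlet distribution with those counts instead. The sketch checks the aggregation property by simulation and then draws class probabilities from substituted counts; the counts 3 and 7 and the representative counts 20 and 80 are hypothetical.

```python
import random
import statistics


def dirichlet(alphas, rng):
    """Draw from Di(alphas) by normalising independent Gamma(alpha, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [gi / total for gi in g]


rng = random.Random(3)
n_draws = 5000

# Training data with 3 observations in class 1 and 7 in class 0: the total
# Di(1, ..., 1) weight on class 1 should behave like the first component of Di(3, 7).
aggregated = [sum(dirichlet([1.0] * 10, rng)[:3]) for _ in range(n_draws)]
direct = [dirichlet([3.0, 7.0], rng)[0] for _ in range(n_draws)]

# Hypothetical counts from a data set representative of future cases.
representative = [dirichlet([20.0, 80.0], rng)[0] for _ in range(n_draws)]

for name, draws in [("aggregated", aggregated), ("direct", direct),
                    ("representative", representative)]:
    print(name, round(statistics.mean(draws), 3), round(statistics.variance(draws), 4))
```

The aggregated and direct draws should agree in mean (0.3) and variance (3 · 7/(10² · 11) ≈ 0.019), while the substituted counts shift the class probability towards 0.2 and, being larger, also narrow its spread.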
V is the only paper in this thesis whose methods were not derived explicitly for the
purpose of improved ADR surveillance. However, the methods for improved
Bayes classification under asymmetrical loss have a potential application in
the development of more data driven triage algorithms for ADR surveillance.
As implemented today, the triage algorithms are based exclusively on clinical expertise (Ståhl et al. 2004). A Bayes classifier framework may allow for
a more data-driven approach where clinical judgement of the value of previously highlighted drug–ADR pairs is used as training data for the construction
of a Bayes classifier. Useful explanatory variables for such an implementation
might include the total number of reports listing a given drug–ADR combination, as well as their quality, diversity and geographical spread; the number
of positive de- or rechallenge interventions, etc. Given that in ADR signal detection, missed problems are more costly than falsely highlighted ones,
the loss functions involved will be asymmetrical, and the Bayesian bootstrap
method proposed in V should allow for improved performance.
Acknowledgements
I would like to express my gratitude to all those who have provided support
and encouragement during the work that has led to this PhD thesis.
• Professor Rolf Sundberg for inspiration and advice, and for providing an
excellent example of how to combine a profound knowledge of mathematical statistics with a genuine interest in solving real world problems.
• Professor Ralph Edwards for helping me to broaden my views in pharmacovigilance, for identifying a wide variety of challenging research problems and for providing an ambitious overall vision for the work of the
Uppsala Monitoring Centre.
• Andrew Bate for encouraging me to enrol as a PhD student, for day-to-day
support and advice and for a very rewarding and productive collaboration.
• Marie Lindquist, Ron Meyboom and Sten Olsson for sharing their knowledge of ADR signal detection and of the WHO programme.
• All colleagues at the Uppsala Monitoring Centre for providing an excellent
work environment, in particular the members of the R&D team, past and
present: Erik Swahn, Jonathan Edwards, Malin Ståhl, Sven Purbe, Johan
Hopstadius, Kristina Star, Johanna Strandell and Ola Caster.
• Roland Orre for expert computational support and advice.
• All members of the Division of Mathematical Statistics at Stockholm University for providing a stimulating research environment and for welcoming me to the group.
Finally, I would like to thank all my family and friends from Järbo, New Bethlehem, Göteborg, Uppsala and elsewhere. A special thank you to my parents
Hasse and Christina Norén for your everlasting support and encouragement,
and to Minna, my love, for making it all worthwhile.
Uppsala, March 2007,
Niklas Norén
Bibliography
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H. and Verkamo, A. I.: 1996, Fast
discovery of association rules, Advances in knowledge discovery and data
mining, American Association for Artificial Intelligence, pp. 307–328.
Aronson, J. K. and Hauben, M.: 2006, Anecdotes that provide definitive evidence,
British Medical Journal 333(7581), 1267–1269.
Bate, A.: 2003, The Use of a Bayesian Confidence Propagation Neural Network
in Pharmacovigilance, PhD thesis, Umeå University.
Bate, A., Edwards, I. R., Lindquist, M. and Orre, R.: 2003, Violation of homogeneity:
the author’s reply, Drug Safety 26, 364–366.
Bate, A., Lindquist, M., Edwards, I. R., Olsson, S., Orre, R., Lansner, A. and De Freitas, R. M.: 1998, A Bayesian neural network method for adverse drug reaction
signal generation, European Journal of Clinical Pharmacology 54, 315–
321.
Berry, S. M. and Berry, D. A.: 2004, Accounting for multiplicities in assessing drug
safety: a three-level hierarchical mixture model, Biometrics 60(2), 418–426.
Bingham, E., Mannila, H. and Seppänen, J. K.: 2002, Topics in 0–1 data, KDD’02:
Proceedings of the eighth ACM SIGKDD international conference on
Knowledge discovery and data mining, ACM Press, New York, NY, USA,
pp. 450–455.
Breiman, L.: 1985, Nail finders, edifices and Oz, Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. I, Wadsworth, Belmont, CA, USA, pp. 201–214.
Breiman, L.: 2001, Statistical modeling: the two cultures (with comments and a rejoinder by the author), Statistical science 16(3), 199–231.
Copas, J. and Hilton, F.: 1990, Record linkage: statistical models for matching computer records, Journal of the Royal Statistical Society: Series A
153(3), 287–320.
Coulter, D. M., Bate, A., Meyboom, R. H., Lindquist, M. and Edwards, I. R.: 2001,
Antipsychotic drugs and heart muscle disorder in international pharmacovigilance: data mining study, British Medical Journal 322(7296), 1207–1209.
De Veaux, R. D. and Hand, D. J.: 2005, How to lie with bad data, Statistical Science
20(3), 231–238.
Domingos, P. and Pazzani, M.: 1997, On the optimality of the simple Bayesian classifier under zero-one loss, Machine Learning 29, 103–130.
DuMouchel, W.: 1999, Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system, American Statistician
53, 177–202.
DuMouchel, W. and Pregibon, D.: 2001, Empirical Bayes screening for multi-item
associations, KDD ’01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 67–76.
Edwards, I. R.: 1997, Adverse drug reactions: finding the needle in the haystack,
British Medical Journal 315(7107), 500.
Edwards, I. R.: 1999, Spontaneous reporting – of what? Clinical concerns about drugs,
British Journal of Clinical Pharmacology 48(2), 138–141.
Edwards, I. R. and Aronson, J. K.: 2000, Adverse drug reactions: definitions, diagnosis and management, Lancet 356(9237), 1255–1259.
Edwards, I. R. and Biriell, C.: 1994, Harmonisation in pharmacovigilance, Drug
Safety 10(2), 93–102.
Edwards, I. R., Wiholm, B.-E., Lindquist, M. and Napke, E.: 1990, Quality criteria for
early signals of possible adverse drug reactions, Lancet 336(8708), 156–158.
Efron, B.: 2001, [Statistical modeling: the two cultures]: Comment, Statistical science 16(3), 218–219.
Egberts, A. C., Meyboom, R. H. and van Puijenbroek, E. P.: 2002, Use of measures
of disproportionality in pharmacovigilance: three Dutch examples, Drug Safety
25(6), 453–458.
Elder, J. F. and Pregibon, D.: 1996, A statistical perspective on knowledge discovery
in databases, Advances in knowledge discovery and data mining, American
Association for Artificial Intelligence, Menlo Park, CA, USA, pp. 83–113.
Evans, S. J. W.: 2000, Pharmacovigilance: a science or fielding emergencies?, Statistics in Medicine 19(23), 3199–3209.
Evans, S. J. W.: 2004, Statistics: analysis and presentation of safety data, in J. Talbott
and P. Waller (eds), Stephens’ detection of new adverse drug reactions, John
Wiley & Sons, Chichester, England, pp. 301–328.
Evans, S. J. W., Waller, P. C. and Davis, S.: 2001, Use of proportional reporting ratios
(PRRs) for signal generation from spontaneous adverse drug reaction reports,
Pharmacoepidemiology and Drug Safety 10(6), 483–486.
Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P.: 1996, The KDD process for extracting useful knowledge from volumes of data, Communications of the ACM
39(11), 27–34.
Finney, D. J.: 1966, Monitoring adverse reactions to drugs – its logic and its weaknesses, Proceedings of the European Society for the study of Drug Toxicity
7, 198–207.
Finney, D. J.: 1971, Statistical logic in the monitoring of reactions to therapeutic
drugs, Methods of Information in Medicine 10(4), 237–245.
Finney, D. J.: 1973, The detection of causation of adverse events, Proceedings of the
39th session of the International Statistical Institute, pp. 387–393.
Finney, D. J.: 1974, Systematic signalling of adverse reactions to drugs, Methods of
Information in Medicine 13(1), 1–10.
Glymour, C., Madigan, D., Pregibon, D. and Smyth, P.: 1997, Statistical themes and
lessons for data mining, Data Min. Knowl. Discov. 1(1), 11–28.
Hand, D. J.: 1994, Deconstructing statistical questions, Journal of the Royal Statistical Society. Series A (Statistics in Society) 157(3), 317–356.
Hand, D. J.: 1998, Data mining: Statistics and more?, The American Statistician
52, 112–118.
Hand, D. J. and Bolton, R.: 2004, Pattern discovery and detection: A unified statistical
methodology, Journal of Applied Statistics 31(8), 885–924.
Hand, D. J. and Yu, K.: 2001, Idiot’s Bayes—not so stupid after all?, International
Statistical Review 69(3), 385–398.
Hastie, T., Tibshirani, R. and Friedman, J.: 2001, The elements of statistical learning: data mining, inference and prediction, Springer-Verlag, New York, NY,
USA.
Hauben, M., Madigan, D., Gerrits, C. M., Walsh, L. and van Puijenbroek, E. P.: 2005,
The role of data mining in pharmacovigilance, Expert Opinion on Drug Safety
4(5), 929–948.
Hennekens, C., Drolette, M., Jesse, M., Davies, J. and Hutchison, G.: 1976, Coffee
drinking and death due to coronary heart disease, New England Journal of
Medicine 294(12), 633–636.
Hopstadius, J.: 2006, Methods to control for confounding variables in screening
for associations in the WHO drug safety database, Master’s thesis, Uppsala
University.
Kim, W. Y., Choi, B.-J., Hong, E. K., Kim, S.-K. and Lee, D.: 2003, A taxonomy of
dirty data, Data Mining and Knowledge Discovery 7(1), 81–99.
Lindquist, M.: 2003, Seeing and Observing in International Pharmacovigilance
– Achievements and Prospects in Worldwide Drug Safety, PhD thesis,
Katholieke Universiteit Nijmegen.
Mannila, H.: 1996, Data mining: machine learning, statistics, and databases,
Proceedings of the 8th International Conference on Scientific and Statistical
Database Management (SSDBM ’96), pp. 2–9.
Meyboom, R. H. B., Egberts, A. C. G., Edwards, I. R., Hekster, Y. A., de Koning, F.
H. P. and Gribnau, F. W. J.: 1997, Principles of signal detection in pharmacovigilance, Drug Safety 16(6), 355–365.
Meyboom, R. H. B., Lindquist, M., Egberts, A. C. G. and Edwards, I. R.: 2002, Signal
selection and follow-up in pharmacovigilance, Drug Safety 25(6), 459–465.
Ng, A. Y. and Jordan, M. I.: 2002, On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes, in T. G. Dietterich, S. Becker and
Z. Ghahramani (eds), Advances in Neural Information Processing Systems
14, MIT Press, Cambridge, MA.
Norén, G. N.: 2005, Statistical methods for large scale exploratory analysis of postmarketing drug safety data. Licentiate thesis, Stockholm University.
Olsson, S.: 1998, The role of the WHO programme on international drug monitoring
in coordinating worldwide drug safety efforts, Drug Safety 19(1), 1–10.
Orre, R., Bate, A., Norén, G. N., Swahn, E., Arnborg, S. and Edwards, I. R.: 2005, A
Bayesian recurrent neural network for unsupervised pattern recognition in large
incomplete data sets, International Journal of Neural Systems 15(3), 207–
222.
Orre, R., Lansner, A., Bate, A. and Lindquist, M.: 2000, Bayesian neural networks
with confidence estimations applied to data mining, Computational Statistics
& Data Analysis 34, 473–493.
Patrikainen, A. and Mannila, H.: 2004, Subspace clustering of high dimensional binary data – a probabilistic approach, Proc. Fourth SIAM Int’l Conf. Data
Mining, Workshop Clustering High Dimensional Data and Its Applications, pp. 57–65.
Patwary, K. M.: 1969, Report on statistical aspects of the pilot research project for
international drug monitoring, Technical report, Report prepared for the World
Health Organization, Geneva.
Purcell, P. M.: 2003, Data mining in pharmacovigilance, International Journal of
Pharmaceutical Medicine 17(2), 63–64.
Rawlins, M. D.: 1988, Spontaneous reporting of adverse drug reactions. II: Uses,
British Journal of Clinical Pharmacology 26(1), 7–11.
Rothman, K. J., Greenland, S. and Walker, A. M.: 1980, Concepts of interaction,
American Journal of Epidemiology 112(4), 467–470.
Sanz, E. J., De-las-Cuevas, C., Kiuru, A., Bate, A. and Edwards, I. R.: 2005, Selective serotonin reuptake inhibitors in pregnant women and neonatal withdrawal
syndrome: a database analysis, The Lancet 365, 482–487.
Savage, R. L.: 1985, Adverse drug reaction monitoring, Master’s thesis, University
of Newcastle upon Tyne.
Silverstein, C., Brin, S. and Motwani, R.: 1998, Beyond market baskets: generalizing
association rules to dependence rules, Data mining and Knowledge Discovery 2, 39–68.
Ståhl, M., Lindquist, M., Edwards, I. R. and Brown, E. G.: 2004, Introducing triage
logic as a new strategy for the detection of signals in the WHO drug monitoring
database, Pharmacoepidemiology and Drug Safety 13(6), 355–363.
van Puijenbroek, E. P., Bate, A., Leufkens, H. G. M., Lindquist, M., Orre, R. and
Egberts, A. C. G.: 2002, A comparison of measures of disproportionality for
signal detection in spontaneous reporting systems for adverse drug reactions,
Pharmacoepidemiology and Drug Safety 11(1), 3–10.
van Puijenbroek, E. P., Egberts, A. C., Meyboom, R. H. B. and Leufkens, H. G. M.:
1999, Signalling possible drug-drug interactions in a spontaneous reporting system: delay of withdrawal bleeding during concomitant use of oral contraceptives
and itraconazole, British Journal of Clinical Pharmacology 47, 689–693.
Webb, A.: 2002, Statistical pattern recognition, 2 edn, John Wiley & Sons, Chichester, England.