Data Mining as Method to
Streamline the Drug Discovery Process
Markus Duerring, Predict AG
[email protected]
http://www.predict.ch
Page 1
Predict AG – Architect for
Business Analytical Systems
Predict AG
• Founded 1998
• Independent corporation
• Grown to 30 people
• Customers in Finance, Telco, Retail, Pharma
Staff
Technology
Services
Know-How
combination of
and its application in business processes
Projects that fit your information needs
Experience
Knowledge Management
Award-winning Projects
Data Mining
Certified SAS Quality Partner
Business
Data Analysis
IT
Decision Support Systems
Systems Integration
Page 2
The Starting Point:
The TheraStrat SafeBase Project
Correlation of patterns of genetic polymorphisms and gene expression with
drug-induced adverse effects and drug structure.
Structures
Parent
Intermediates
Metabolites
Adducts
Targets
Mimics
Pathways
Structures
Similarities
Type
Clinical Endpoint
Type
Frequency
Severity
Which drug
Which population
Expression
Genes
Receptors
Promoters
Transcription factors
Responsive elements
Frequencies
patterns
Function
3D-Structure
Adduct target
Autoantigen
Allelic Variants
SNPs
Splice Variants
Amplifications
Functions
3D-Structure
Ethnic differences
Selectivity
Sensitivity
Page 3
The Situation: Recent Market Withdrawals
or Suspended Development
Name | Substance | Company | Therapeutic Area | Reaction
Posicor | Mibefradil | Roche | Cardiovascular | Underestimated effect of drug/drug interactions
Trovan | Trovafloxacin | Pfizer | Antibiotic | Unexpected severe liver toxicity with deaths; call for ban of product
Zagam | Sparfloxacin | RPR | Antibiotic | Severe phototoxicity and cardiotoxicity; limited to use in pneumonia in the EU
Tempium | Lazabemide | Roche | Alzheimer | Severe liver toxicity in Phase III; development aborted
Rezulin | Troglitazone | Warner-Lambert | Diabetes II | Severe liver toxicity with deaths; withdrawn from the market
Page 4
Is it Possible to Avoid
Adverse Drug Effects?
Conventional analysis is not applicable because
1. Amount of data is too large
2. Incidence rate is very low
• Increase number of patients to test (the "brute-force" method)
• Use past experience to build consolidated learning sets that allow functional compound profiling of NCEs (the datamining method)
Page 5
Avoiding Adverse Effects (ADE):
The „Brute-Force“ Method
- Sample size n = 5,000 phase III patients
- Incidence rate p = 1/10,000
- How likely are we to encounter an ADE in our safety sample?
Probability of at least one occurrence of an ADE in the sample (Poisson with λ = n*p = 0.5): 1 - P(X=0) = 1 - exp(-1/2) = 1 - 0.607 = 0.393
- Estimator λ of the Poisson distribution: λ = 0.5 * 2.366 = 1.183
- Upper 0.9 confidence limit for λ = 0.5 * 7.779 = 3.8895
For n = 10,000, the estimated p: 1/8453
Required sample size for the upper 0.9 confidence limit of p to be < 1/10,000: 38,896
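The arithmetic on this slide can be checked numerically. The following is a minimal sketch, assuming a Poisson model with mean n*p and, for the confidence limit, exactly one observed ADE (an assumption suggested by the 4-degree-of-freedom chi-square value 7.779, not stated on the slide). All function names are illustrative. The limit is found by bisection on the Poisson CDF rather than from chi-square tables, so the result may differ from the slide's rounded figures in the last digits.

```python
import math


def prob_at_least_one(n, p):
    """P(X >= 1) for X ~ Poisson(n * p)."""
    return 1.0 - math.exp(-n * p)


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


def poisson_upper_limit(observed, confidence):
    """One-sided upper confidence limit for a Poisson mean.

    Solves P(X <= observed | lam) = 1 - confidence by bisection, which is
    equivalent to 0.5 * the chi-square quantile with 2 * (observed + 1) d.f.
    """
    alpha = 1.0 - confidence
    lo, hi = 0.0, 100.0  # bracket assumed wide enough for small counts
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if poisson_cdf(observed, mid) > alpha:
            lo = mid  # CDF still too large: the limit lies higher
        else:
            hi = mid
    return (lo + hi) / 2.0


def required_sample_size(upper_limit, target_rate):
    """Smallest n for which upper_limit / n falls below target_rate."""
    return math.floor(upper_limit / target_rate) + 1


# Assumed scenario: 5,000 patients, true rate 1/10,000, one ADE observed.
p_detect = prob_at_least_one(5000, 1 / 10000)
upper = poisson_upper_limit(observed=1, confidence=0.9)
n_required = required_sample_size(upper, 1 / 10000)
print(round(p_detect, 3), round(upper, 4), n_required)
```

Under these assumptions the sketch gives a detection probability of roughly 0.39 and a required sample size close to 39,000 patients, in line with the slide.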
Page 6
Avoiding Adverse Effects:
The Datamining Method
Deductive Reporting | Inductive Reporting
OLAP | Data Mining
Transaction systems | Analytical systems
Retrospective data analysis | Scoring compounds
Known dimensions | Search for new dimensions
I.e. reporting all biological and chemical results of a given compound | I.e. prediction of the potential risk for a new compound
Page 7
Where Does Datamining Make
Sense in the R&D Process?
[Figure: Drug Discovery Process (Research, Development). Labels: Number of Compounds, Information per Compound, SAR Report]
Page 8
Is Datamining Worth the Investment?
Part One: Identify risky compounds earlier
Assuming your scoring model predicts a substance with a potential ADE
Development Step | Cost (million $) | No. substances (Ordinary) | No. substances (Data Mining) | Cost Reduction
Lead compound selection | 10 | 300,000 | 299,999 | 0
Discovery testing | 20 | 300 | 299 | 0
Stability and formulation | 20 | 20 | 19 | 1
Safety testing | 60 | 10 | 9 | 6
Scale-up, Process setup | 80 | 6 | 5 | 14
Phase I trials | 10 | 5 | 4 | 2
Phase II trials | 20 | 2 | 1 | 10
Phase III + IV trials | 80 | 1 | 0 | 80
Total | 300 | | | 113
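The cost-reduction column appears consistent with dividing each step's cost by the number of substances in the ordinary pipeline, i.e. the per-substance cost saved when one risky compound is removed before that step. That reading is an assumption, not something the slide states. A minimal sketch under it, with illustrative names:

```python
# (step, cost in million $, number of substances in the ordinary pipeline),
# copied from the table above.
STEPS = [
    ("Lead compound selection", 10, 300_000),
    ("Discovery testing", 20, 300),
    ("Stability and formulation", 20, 20),
    ("Safety testing", 60, 10),
    ("Scale-up, Process setup", 80, 6),
    ("Phase I trials", 10, 5),
    ("Phase II trials", 20, 2),
    ("Phase III + IV trials", 80, 1),
]


def saving_per_step(steps):
    """Per-substance cost of each step (assumed to be the saving when one
    substance is dropped before reaching that step)."""
    return {name: cost / n_substances for name, cost, n_substances in steps}


total_cost = sum(cost for _name, cost, _n in STEPS)
savings = saving_per_step(STEPS)
total_saving = sum(savings.values())

for name, value in savings.items():
    print(f"{name}: {value:.2f}")
print(f"Total cost: {total_cost}, total saving: {total_saving:.1f}")
```

Without rounding, the total saving comes to about 112.4 million $; the slide's 113 follows once the per-step values are rounded as shown in the table.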
Page 9
Is Datamining Worth the Investment?
Part Two: Rescue a Promising NCE
Example: projected future market for type II diabetes drugs: $6 billion by the year 2004
(estimates and projections by Lehman Brothers, July 1999)
Drug | Companies | Market Projection 2001 | Market Projection 2004
Troglitazone (Rezulin) | Warner-Lambert | withdrawn | withdrawn
Rosiglitazone (Avandia) | SmithKline Beecham | $1.8 billion | $1.85 billion
Pioglitazone (Actos) | Takeda/Lilly | $1 billion | $2.24 billion
Page 10
System Design:
Predict Solution Box for Drug Discovery
Genomics View
Compound View
Request Agent
Analysis
DataMining
Modelling
Simulation
Interface to operational systems
Knowledge Base
Loading Agent
DNA microarray Data
ISIS, Abase
Proteomics
Swissprot
OMIM
Metabolic pathways (KEGG)
Domains, Motifs (Pfam)
Page 11
Linking Your Own Data with External Data:
The Affymetrix Data Quality Challenge
• Experiment sample and experiment controls are not on the same chip.
• Standards across individual chips are rarely used.
• Housekeeping genes are often not stable enough to be used as a means of normalisation.
• Total fluorescence per chip is used for scaling, assuming that the difference between control and sample is small.
• Could further normalisation parameters (Positive ratio, Pos/Neg ratio, Log Avg ratio, etc.) make normalisation more reliable?
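Scaling by total fluorescence per chip can be illustrated as follows. This is a minimal sketch, not the Affymetrix algorithm: the function name, the choice of the mean chip total as the common target, and the intensity values are all assumptions made for illustration.

```python
def scale_to_common_total(chips, target_total=None):
    """Rescale each chip so that its summed intensity equals target_total.

    chips maps a chip name to a list of probe intensities. If no target is
    given, the mean of the chip totals is used (an illustrative choice).
    """
    totals = {name: sum(values) for name, values in chips.items()}
    if target_total is None:
        target_total = sum(totals.values()) / len(totals)
    return {
        name: [v * target_total / totals[name] for v in values]
        for name, values in chips.items()
    }


# Toy intensities, invented for this example.
chips = {
    "control": [100.0, 200.0, 700.0],
    "sample": [150.0, 250.0, 1100.0],
}
scaled = scale_to_common_total(chips)
for name, values in scaled.items():
    print(name, [round(v, 1) for v in values], round(sum(values), 1))
```

After scaling, both chips have the same total, so any remaining per-gene difference is read as a change in expression. This is only valid when most genes do not change between control and sample, which is the caveat raised above.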
Page 12
Validating the System:
Using DNA Gyrase as Model Target
• DNA Gyrase is an essential prokaryotic type II DNA topoisomerase
• Involved in many biological processes, e.g.
– DNA replication
– Gene expression
– Recombination
Page 13
Experiment Setup
• Novobiocin: 0, 12.5 and 125 µg/ml
• Ciprofloxacin: 0, 30 and 300 µg/ml
• Time points: 10, 30 and 60 min
• RNA isolation
• DNA microarray (Affymetrix)
• Fluorescent signals detected by scanning confocal microscopy
Page 14
DNA Microarray Results
• ~ 2000 H. influenzae genes
• Proteins analyzed in parallel with 2-D gels (proteomics)
Page 15
DNA Gyrase Data Analysis
• Comparison of experiments (antibiotic exposure) versus controls (no antibiotics) (Affymetrix algorithm)
• SAS macros to
– Import 37 files into SAS (4870 rows and 36 columns each)
– Reformat and recalculate variables if necessary
– Merge result files with files containing gene, protein or metabolic pathway descriptions
– Filter data
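The merge and filter steps can be sketched outside SAS as well. The example below is a hypothetical Python illustration, not the SAS macros used in the project: the column names (`gene_id`, `fold_change`, `description`), the threshold of 2.0 and the toy rows are all assumptions.

```python
def merge_with_annotation(results, annotation, key="gene_id"):
    """Left-join result rows with annotation rows on a shared key."""
    by_key = {row[key]: row for row in annotation}
    merged = []
    for row in results:
        extra = by_key.get(row[key], {})  # genes without annotation are kept
        merged.append({**extra, **row})
    return merged


def filter_by_fold_change(rows, min_abs_fold=2.0, column="fold_change"):
    """Keep rows whose absolute fold change reaches the threshold."""
    return [row for row in rows if abs(row[column]) >= min_abs_fold]


# Toy data, invented for this example.
results = [
    {"gene_id": "g1", "fold_change": 3.4},
    {"gene_id": "g2", "fold_change": -1.2},
    {"gene_id": "g3", "fold_change": -2.8},
]
annotation = [
    {"gene_id": "g1", "description": "hypothetical protein A"},
    {"gene_id": "g3", "description": "hypothetical protein C"},
]

merged = merge_with_annotation(results, annotation)
kept = filter_by_fold_change(merged)
for row in kept:
    print(row["gene_id"], row["fold_change"], row.get("description", ""))
```

Joining on a gene identifier before filtering keeps the descriptions attached to every surviving row, so the filtered table can be passed directly to clustering.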
Page 16
Clustering and Visualisation
• Table analyzed in SAS Enterprise Miner
- Clustering by average linkage method or K-means algorithm
- Analysis variables = Fold Changes
• Visualization by TreeView (Eisen)
green = downregulated
red = upregulated
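To illustrate what K-means clustering on fold changes does, here is a minimal sketch in plain Python. It is not the SAS Enterprise Miner implementation; the gene names, fold-change values and the choice of initial centroids are invented for the example.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Plain K-means: assign each point to its nearest centroid, then
    recompute each centroid as the mean of its members, until stable."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy fold changes at three time points (10, 30 and 60 min), invented.
fold_changes = {
    "gene_a": [2.1, 3.0, 3.8],
    "gene_b": [1.9, 2.7, 4.1],
    "gene_c": [-2.2, -3.1, -2.9],
    "gene_d": [-1.8, -2.6, -3.3],
}
genes = list(fold_changes)
points = [fold_changes[g] for g in genes]

# Initial centroids taken from the data for reproducibility.
labels, centroids = kmeans(points, [points[0], points[2]])
for gene, label in zip(genes, labels):
    print(gene, "cluster", label)
```

In this toy data the upregulated genes end up in one cluster and the downregulated genes in the other, matching the red/green split used in the visualization.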
Page 17
Linking External Information:
Results from Ciprofloxacin Exposure
Page 18
Conclusions
• Scoring of NCEs can be a valuable method to streamline the overall R&D process
• To generate value across the overall R&D process, the correlation of genetic, biological and chemical data must be taken into consideration
• The key success factor in using datamining to streamline the drug discovery process is the consolidation, normalization and quality of your data
Page 19
Acknowledgements
TheraSTrat AG
• Prof. Dr. Joseph Gut, CSO
Predict AG
• Dr. Patrick Schünemann, CEO
• Dr. Hans Gmünder, Datamining Consultant
• Thomas Nawrath, Datamining Consultant
Page 20