Download Data Mining in Bioinformatics Day 9: Graph Mining in

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Drug discovery wikipedia , lookup

Transcript
Data Mining in Bioinformatics
Day 9: Graph Mining in Chemoinformatics
Chloé-Agathe Azencott & Karsten Borgwardt
February 10 to February 21, 2014
Machine Learning & Computational Biology Research Group
Max Planck Institutes Tübingen and
Eberhard Karls Universität Tübingen
Karsten Borgwardt: Data Mining in Bioinformatics, Page 1
Drug discovery
Modern therapeutic research
From serendipity to rationalized drug design
Ancient Greeks treat
infections with mould
NH2
NH
S
HO
CH3
O
N
CH3
O
O
HO
Biapenem in PBP-1A
Karsten Borgwardt: Data Mining in Bioinformatics, Page 2
Drug discovery process
1. Find a
target
Protein that we want
to inhibit so as to interfer
with a biological process
2. Identify
hits
Compounds likely to
bind to the target
3.Hit-to-lead:
characterize
hits
4. Lead
optimization
and synthesis
Can they be drugs?
(ADME-Tox)
- bioactivity
- pharmacokinetics
- synthetic pathway
5. Assay
- in vitro
- in vivo
- clinical
Karsten Borgwardt: Data Mining in Bioinformatics, Page 3
Drug discovery process
90 months
52 months
1. Find a
target
2. Identify
hits
3.Hit-to-lead:
characterize
hits
4. Lead
optimization
and synthesis
5. Assay
Karsten Borgwardt: Data Mining in Bioinformatics, Page 4
Drug discovery process
90 months
52 months
1. Find a
target
2. Identify
hits
3.Hit-to-lead:
characterize
hits
4. Lead
optimization
and synthesis
5. Assay
$500,000,000
to
$2,000,000,000
Karsten Borgwardt: Data Mining in Bioinformatics, Page 5
Chemoinformatics
How can computer science help?
→ Chemoinformatics!
“...the mixing of information resources to transform data into information, and information into knowledge, for the intended purpose of making better decisions faster in the arena of drug lead identification and
optimisation.” – F. K. Brown
“... the application of informatics methods to solve chemical problems.”
– J. Gasteiger and T. Engel
Karsten Borgwardt: Data Mining in Bioinformatics, Page 6
Chemoinformatics
Chemoinformatics
1. Find a
target
2. Identify
hits
3.Hit-to-lead:
characterize
hits
4. Lead
optimization
and synthesis
5. Assay
Karsten Borgwardt: Data Mining in Bioinformatics, Page 7
Chemoinformatics
The chemical space
1060 possible small organic molecules
1022 stars in the observable universe
(Slide courtesy of Matthew A. Kayala)
Karsten Borgwardt: Data Mining in Bioinformatics, Page 8
Drug discovery process
1. Find a
target
2. Identify
hits
3.Hit-to-lead:
characterize
hits
4. Lead
optimization
and synthesis
5. Assay
QSAR
QSPR
QSAR: Qualitative Structure-Activity Relationship
i.e. classification
QSPR: Quantititive Structure-Property Relationship
i.e. regression
Karsten Borgwardt: Data Mining in Bioinformatics, Page 9
Representing chemicals in silico
Expert knowledge molecular descriptors
→ hard, potentially incomplete
Molecules are...
NH2
NH
S
HO
CH3
O
N
CH3
O
O
HO
Karsten Borgwardt: Data Mining in Bioinformatics, Page 10
Representing chemicals in silico
Similar Property Principle
Molecules having similar structures should exhibit similar
activities.
→ Structure-based representations
Compare molecules by comparing substructures
Karsten Borgwardt: Data Mining in Bioinformatics, Page 11
Molecular graph
O
O
d
C
O
d
C
C
O
O
C
C
C
C
N
C
C
C
C
C
d
C
C
N
C
S
N
C
Undirected labeled graph
Karsten Borgwardt: Data Mining in Bioinformatics, Page 12
Fingerprints
Define feature vectors that record the presence/absence
(or number of occurrences) of particular patterns in a given
molecular graph
where
φ(A) = (φs(A))s substructure
1 if s occurs in A
φs(A) =
0 otherwise
Extension of traditional chemical fingerprints
Karsten Borgwardt: Data Mining in Bioinformatics, Page 13
Fingerprints
Learning from fingerprints
Classical machine learning and data mining techniques
can be applied to these vectorial feature representations.
Any distance / kernel can be used
Classification
Feature selection
Clustering
Karsten Borgwardt: Data Mining in Bioinformatics, Page 14
Fingerprints
Fingerprints compression
Systematic enumeration → long, sparse vectors
e.g. 50, 000 random compounds from ChemDB
→ 300, 000 paths of length up to 8
→ 300 non-zeros on average
“Naive” Compression
List the positions of the 1s
219 = 524, 288
average encoding: 300 × 19 = 5, 700 bits
Karsten Borgwardt: Data Mining in Bioinformatics, Page 15
Fingerprints
Fingerprints compression
Modulo Compression (lossy)
Karsten Borgwardt: Data Mining in Bioinformatics, Page 16
Frequent patterns fingerprints
MOLFEA
[Helma et al., 2004]
P = positive (mutagenic) compounds
N = negative compounds
features: fragments (= patterns) f such that
both freq(f, P) ≥ t and freq(f, N) ≥ t
Limited to frequent linear patterns
ML algorithm: SVM with linear or quadratic kernel
Karsten Borgwardt: Data Mining in Bioinformatics, Page 17
Frequent patterns fingerprints
MOLFEA [Helma et al., 2004]
CPDB – Carcinogenic Potency DataBase
684 compounds classified in 341 mutagens and 343 nonmutagens according to Ames test on Salmonella
Mutagenicity prediction [Hema04]
Linear kernel
Quadratic kernel
100
Cross-validated sensitivity
90
80
70
60
50
1%
3%
5%
Frequency threshold
10%
Karsten Borgwardt: Data Mining in Bioinformatics, Page 18
Spectrum kernels
φ(A) = (φs(A))s∈S
Kspectrum(A, A0) = k(φ(A), φ(A0))
|(S)|
|(S)|
k ∈ RR ×R can be
Dot product (linear kernel)
RBF kernel
Tanimoto kernel: k(A, B) =
MinMax kernel:
A∩B
A∪B
PN
min(Ai ,Bi )
PNi=1
i=1 max(Ai ,Bi )
Karsten Borgwardt: Data Mining in Bioinformatics, Page 19
Spectrum kernels
Tanimoto and MinMax
Both Tanimoto and Minmax are kernels.
Proof for Tanimoto: J.C. Gower A general coefficient
of similarity and some of its properties. Biometrics
1971.
Proof for MinMax:
hφ(x), φ(y)i
MinMax(x, y) =
hφ(x), φ(x)i + hφ(y), φ(y)i − hφ(x), φ(y)i
with φ(x) of length: # patterns × max count
φ(x)i = 1 iff. the pattern indexed by bi/qc appears more
than i mod q times in x
Karsten Borgwardt: Data Mining in Bioinformatics, Page 20
All patterns fingerprints
Paths fingerprints
Labeled sub-paths (walks)
O
O
d
C
O
d
CsCsCdO
C
O
O
C
C
C
C
N
C
C
NsCsCsS
C
C
C
d
C
C
C
N
C
S
N
C
Some sub-paths of length 3
Karsten Borgwardt: Data Mining in Bioinformatics, Page 21
All patterns fingerprints
Circular fingerprints
Labeled sub-trees - Extended-Connectivity (or Circular)
features
O
O
d
C
O
d
C
O
O
C
C
C
C
N
C
C
C
S
C
C
C
d
C
C
C
N
N
C{sC{sN|sC}|sN{sC}|sS{sC}}
C
Example of a circular substructure of depth 2
Karsten Borgwardt: Data Mining in Bioinformatics, Page 22
All patterns fingerprints
2D spectrum kernels
[Azencott et al., 2007]
Systematically extract paths / circular fingerprints,
for various maximal depths
SVM with Tanimoto / Minmax
Karsten Borgwardt: Data Mining in Bioinformatics, Page 23
All patterns fingerprints
2D spectrum kernels
[Azencott et al., 2007]
Mutagenicity (Mutag): 188 compounds
Benzodiazepine receptor affinity (BZR): 181+125 compounds
Cyclooxygenase-2 ihibitors (COX2): 178 + 125 compounds
Estrogen receptor affinity (ER): 166 + 180 compounds
Data
Mutag
BZR
COX2
ER
SVM Previous best
90.4% 85.2% (gBoost)
79.8%
76.4%
70.1%
73.6%
82.1%
79.8%
Karsten Borgwardt: Data Mining in Bioinformatics, Page 24
Weisfeiler-Lehman kernel
[Shervashidze et al., 2011]
Goal: scalability
Compute a sequence that captures topological and label
information of graphs in a runtime linear in the number of
edges
→ sub-tree kernel
Karsten Borgwardt: Data Mining in Bioinformatics, Page 25
Weisfeiler-Lehman kernel
[Shervashidze et al., 2011]
Karsten Borgwardt: Data Mining in Bioinformatics, Page 26
Convolution kernels
a.k.a. decomposition kernels
(x1, . . . , xD ) is a tuple of parts of x, with xd ∈ X for each
part d = 1, . . . , D
kd ∈ RXd×Xd : a Mercer kernel
0
Kdecomposition(x, x ) =
X
X
k1(x1, x01)k2(x2, x02) . . . kD (xD , x0D )
x1 x2 ...xD =x x0 x0 x0 =x0
1 2 D
Spectrum kernels are a particular case of convolution
kernels
Karsten Borgwardt: Data Mining in Bioinformatics, Page 27
Convolution kernels
Weighted Decomposition Kernel
[Menchetti et al., 2005]
Match atoms and weigh them according to a kernel between subgraphs that include these atoms
P
P
KW DK (x, x0) = (a,σ∈Dr (x)) (a0,σ0∈Dr (x0)) δ(a, a0)Kc(σ, σ 0)
r>0∈N
Dr (x): decompositions of the molecular graph of x in an atom a
and a subpath σ of x including a and of depth at most r
Karsten Borgwardt: Data Mining in Bioinformatics, Page 28
Convolution kernels
Weighted Decomposition Kernel
[Menchetti et al., 2005]
Kc: contextual kernel, here: histogram intersection kernel
P
0
Kc(σ, σ 0) = l∈L min(f σ (l), f σ (l))
L: possible labels for edges and vertices
f σ (l): frequency of label l subgraph σ.
Karsten Borgwardt: Data Mining in Bioinformatics, Page 29
Introducing spatial information
3D Histograms
[Azencott et al., 2007]
Groups of k atoms
Associated size:
Pairwise distances
(k = 2)
diameter of the smallest
sphere that contains all
k atoms
Karsten Borgwardt: Data Mining in Bioinformatics, Page 30
Introducing spatial information
3D Histograms [Azencott et al., 2007]
One histogram per class of k -tuple (e.g. C-C-C, C-C-O)
O
O
2.6
5.6
O
7.9
2.7
9.2
C
O
C
6.3
C
6.6
C
6.7
3.2
3.7
2.4
C
4.6
O
C
C
N
C
C
2.2
C
N
C
5.7
C
C
C
9.5
N
S
Frequency of N-O
4
C
C
3
2
1
0
0
1
2
7
3
5
6
4
N-O distance (A)
8
9
10
Karsten Borgwardt: Data Mining in Bioinformatics, Page 31
Introducing spatial information
3D Histograms: performance
[Azencott et al., 2007]
2D kernel Hist3D kernel
Data
Mutag
90.4%
88.8%
BZR (loo) 82.0%
79.4%
ER (loo)
87.0%
86.1%
COX2
76.9%
78.6%
Karsten Borgwardt: Data Mining in Bioinformatics, Page 32
Introducing spatial information
3D Decomposition Kernels
[Ceroni et al., 2007]
P
P
Remember: KW DK (x, x0 ) = (a,σ∈Dr (x)) (a0 ,σ0 ∈Dr (x0 )) δ(a, a0 )Kc (σ, σ 0 )
P
P
K3DDK (x, x0 ) = σ∈Sr (x) σ0 ∈Sr (x0 ) Ks (σ, σ 0 )
Sr (x): subgraphs of x composed of r distinct vertices
Qr(r−1)/2
0
Ks (σ, σ 0 ) = i=1
δ(ei , e0i )e−γ(li −li )
li = length of edge ei in x
(e1 , e2 , . . . , er(r−1)/2 lexicographically ordered; γ ∈ R
Karsten Borgwardt: Data Mining in Bioinformatics, Page 33
Introducing spatial information
3DDK: Performance
[Ceroni et al., 2007]
2D kernel Hist3D kernel
Data
Mutag
90.4%
88.8%
BZR (loo) 82.0%
79.4%
ER (loo)
87.0%
86.1%
COX2
76.9%
78.6%
3DDK Circ3DDK
86.7%
83.5%
78.4%
81.4%
82.3%
82.1%
75.6%
75.2%
Karsten Borgwardt: Data Mining in Bioinformatics, Page 34
Introducing spatial information
The pharmacophore kernel
[Mahé et al., 2006]
pharmacophore p ∈ P(x): p = [(x1 , l1 ), (x2 , l2 ), (x3 , l3 )]
xi 3D coordinates of atom i of x; li = label of atom i
P
P
K(x, x0 ) = p∈P(x) p0 ∈P(x0 ) KP (p, p0 )
KP (p, p0 ) = Kdist (d1 , d01 )Kdist (d2 , d02 )Kdist (d3 , d03 )Kf eat (l1 , l10 )Kf eat (l2 , l20 )Kf eat (l3 , l30 )
kd−d0 k2
0
Kdist : RBF Gaussian Kdist (d, d ) = exp
2σ 2
Kf eat : Dirac
Karsten Borgwardt: Data Mining in Bioinformatics, Page 35
Introducing spatial information
3D LAP kernel
[Hinselmann et al., 2010]
M : pairwise intramolecular matrix of inter-atomic
geometric distances
Karsten Borgwardt: Data Mining in Bioinformatics, Page 36
Introducing spatial information
Conclusion
How relevant is 3D information?
How good is 3D information?
Karsten Borgwardt: Data Mining in Bioinformatics, Page 37
Drug discovery process
1. Find a
target
3.Hit-to-lead:
characterize
hits
2. Identify
hits
4. Lead
optimization
and synthesis
5. Assay
Docking
Virtual
High-Throughput
Screening
Karsten Borgwardt: Data Mining in Bioinformatics, Page 38
High-throughput screening
Assay a large library of potential
drugs against their target
Very costly
→ docking
→ virtual high-throughput
screening (vHTS)
Karsten Borgwardt: Data Mining in Bioinformatics, Page 39
Measuring performance
Imbalanced data
Typically, most compounds are inactive ⇒ many more negative
than positive examples
E.g. DHFR data set:
99, 995 chemicals screened for activity against dihydrofolate
reductase; < 0.2% active compounds
Accuracy is not appropriate:
predicting all compounds negative ⇒ accuracy = 99.8%
sensitivity= # True Positives
# Positives
# True Negatives
specificity=
# Negatives
For many methods, the output is continuous
⇒ accuracy, sensitivity and specificity depend on a threshold θ
Karsten Borgwardt: Data Mining in Bioinformatics, Page 40
Measuring performance
Receiver-Operator Characteristic Curves
For all possible values of θ, report sensitivity and 1 − specif icity
AUROC (Area under the ROC Curve) is a numerical measure of
performance
AUROC(random) = 0.5 and AUROC(optimal) = 1
3/4
2/4
x 0.9
1/4
x 0.81 x 0.73 x 0.52 x 0.2
0
True Positive Rate
1
x 0.17 x 0.12 x 0.09
label
+
+
+
+
-
x 0.95 x 0.94
x Inf
0
random
perfect
real
1/6
1/3
1/2
2/3
5/6
prediction
0.95
0.94
0.90
0.81
0.73
0.52
0.20
0.17
0.12
0.09
1
False Positive Rate
Karsten Borgwardt: Data Mining in Bioinformatics, Page 41
Measuring performance
[Azencott et al., 2007]
0.6
0.4
0.2
RANDOM
IRV
SVM
MAXSIM
0.0
AUC
0.71
0.59
0.59
0.54
TPR
method
IRV
SVM
kNN
MAX-SIM
0.8
1.0
Inhibition of DHFR: ROC Curves
0.0
0.2
0.4
0.6
0.8
1.0
FPR
Karsten Borgwardt: Data Mining in Bioinformatics, Page 42
Measuring performance
Precision-recall curves
Precision = # True Positives
# Predicted Positives
Recall = sensitivity
x 0.81
3/5
x 0.9
x 0.52
x 0.2
1/5
2/5
x 0.94
x 0.73
x 0.17
x 0.12
x 0.09
perfect
real
0
Precision
4/5
1
x 0.95
0
1/4
2/4
3/4
1
Recall
Karsten Borgwardt: Data Mining in Bioinformatics, Page 43
Other applications
Other applications of graph mining in chemoinformatics
Database indexing and search
Prediction of 3D structures of small compounds
and proteins
Reaction Prediction
Karsten Borgwardt: Data Mining in Bioinformatics, Page 44
References and further reading
[Azencott et al., 2007] Azencott, C.-A., Ksikes, A., Swamidass, S. J., Chen, J. H., Ralaivola, L. and Baldi, P. (2007). One-to fourdimensional kernels for virtual screening and the prediction of physical, chemical, and biological properties. Journal of chemical
information and modeling 47, 965–974. 23, 24, 35, 36, 37, 47
[Baldi et al., 2007] Baldi, P., Benz, R. W., Hirschberg, D. S. and Swamidass, S. J. (2007). Lossless compression of chemical fingerprints
using integer entropy codes improves storage and retrieval. Journal of chemical information and modeling 47, 2098–2109.
[Ceroni et al., 2007] Ceroni, A., Costa, F. and Frasconi, P. (2007). Classification of small molecules by two-and three-dimensional
decomposition kernels. Bioinformatics 23, 2038–2045. 38, 39
[Helma et al., 2004] Helma, C., Cramer, T., Kramer, S. and De Raedt, L. (2004). Data mining and machine learning techniques for
the identification of mutagenicity inducing substructures and structure activity relationships of noncongeneric compounds. Journal of
chemical information and computer sciences 44, 1402–1411. 17, 18
[Hinselmann et al., 2010] Hinselmann, G., Fechner, N., Jahn, A., Eckert, M. and Zell, A. (2010). Graph kernels for chemical compounds
using topological and three-dimensional local atom pair environments. Neurocomputing 74, 219–229. 41
[Mahé et al., 2006] Mahé, P., Ralaivola, L., Stoven, V. and Vert, J.-P. (2006). The pharmacophore kernel for virtual screening with
support vector machines. Journal of chemical information and modeling 46, 2003–2014. 40
[Menchetti et al., 2005] Menchetti, S., Costa, F. and Frasconi, P. (2005). Weighted Decomposition Kernels. In Proceedings of the 22nd
International Conference on Machine Learning pp. 585–592, ACM, Bonn, Germany. 33, 34
[Saigo et al., 2009] Saigo, H., Nowozin, S., Kadowaki, T., Kudo, T. and Tsuda, K. (2009). gBoost: a mathematical programming
approach to graph classification and regression. Machine Learning 75, 69–89. 26, 27, 28, 29
[Shervashidze et al., 2011] Shervashidze, N., Schweitzer, P., van Leeuwen, E. J., Mehlhorn, K. and Borgwardt, K. M. (2011). WeisfeilerLehman graph kernels. Journal of Machine Learning Research 12, 2539–2561. 30, 31
Karsten Borgwardt: Data Mining in Bioinformatics, Page 45