Download Slides of Lecture 2

Predictive modeling of interactions between small chemical molecules and proteins
Anna Cichonska
1)  Department of Computer Science, Aalto University
2)  Institute for Molecular Medicine Finland FIMM, University of Helsinki
[email protected]
29 February 2016

Drug molecule
A small chemical substance used in the treatment, cure, prevention or diagnosis of a disease, or to otherwise enhance physical or mental well-being.

Examples
•  Aspirin (acetylsalicylic acid): treatment of pain, fever and inflammation.
•  Lovastatin: lowers cholesterol level.
•  Everolimus: treatment of cancer, including cancer of the kidneys, pancreas, breasts, and brain. Used with other medicines to keep the body from rejecting a transplanted kidney or liver.
T-61.6080
29 February 2016

Targets of drugs
•  Drug-like chemical compounds act mainly by modulating cellular targets, including proteins, metabolites or even nucleic acids (DNA and RNA).

PROTEINS
•  Large biomolecules.
•  Among the most abundant molecules in living organisms.
•  Perform a variety of crucial functions, such as:
   –  transporting molecules (e.g. hemoglobin transports oxygen),
   –  catalyzing reactions (enzymes),
   –  replicating DNA,
   –  identifying and neutralizing foreign bacteria and viruses,
   and many more.

Targets of drugs - PROTEINS
•  Proteins consist of one or more long chains of amino acid residues.
•  There are 20 standard proteinogenic amino acids.
•  Amino acid sequences of proteins are encoded in the genes.
•  The sequence of the amino acid chain causes the polypeptide to fold into a shape that is biologically active.
[Figure: amino acid, protein sequence, protein structure]

Drug-target interaction (DTI)
•  One of the most common mechanisms of action (MoA) of drugs.
•  Drug-target interaction = molecular-level interaction, e.g. hydrogen bonds keep a compound tightly bound to a protein, or hydrophobic protein atoms enclose hydrophobic compound atoms.

Drug-target interaction
Statin-[HMG-CoA reductase]: inhibition of the HMG-CoA reductase enzyme by a statin.
[Figure: HMG-CoA reductase]

Drug-target interaction
Aspirin-cyclooxygenase: aspirin suppresses the production of prostaglandins and thromboxanes through its irreversible inactivation of the cyclooxygenase enzyme.
[Figure: ACTIVE cyclooxygenase enzyme and INACTIVE cyclooxygenase enzyme]

Why is drug discovery difficult?
CHEMICAL SPACE ($10^{20}$ – $10^{24}$ drug-like compounds)
•  Only certain molecules have features consistent with good pharmacological properties (Lipinski's rule of five).

TARGET (PROTEIN) SPACE ($10^{4}$ – $10^{5}$ proteins)
•  Only certain targets have binding sites capable of efficient binding of drug-like ligands.

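The slide only names Lipinski's rule of five. As a rough illustration, the sketch below counts violations of its four commonly quoted criteria (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The function names, the "at most one violation" threshold, and the approximate aspirin descriptor values are assumptions for illustration and do not come from the lecture.

```python
def lipinski_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Count how many of the four rule-of-five criteria are violated."""
    checks = [
        mol_weight > 500,   # molecular weight above 500 Da
        log_p > 5,          # octanol-water partition coefficient (logP) above 5
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


def is_drug_like(mol_weight, log_p, h_donors, h_acceptors, max_violations=1):
    """Hypothetical helper: accept compounds with at most `max_violations`."""
    violations = lipinski_violations(mol_weight, log_p, h_donors, h_acceptors)
    return violations <= max_violations


# Approximate descriptor values for aspirin (illustrative only).
aspirin = dict(mol_weight=180.16, log_p=1.2, h_donors=1, h_acceptors=4)
print(lipinski_violations(**aspirin), is_drug_like(**aspirin))
```

In practice the descriptors would be computed by a cheminformatics toolkit; here they are passed in directly so the example stays self-contained.
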
Accessible pharmacological space

Drug development – some facts
•  More than half of candidate drugs that look promising in the research lab will ultimately fail.
•  More than a quarter of drugs that reach the clinical trial stage will be rejected as ineffective or unsafe (causing side effects).
•  The number of new drugs approved by the US Food and Drug Administration (FDA) per billion dollars (inflation-adjusted) spent on R&D has halved roughly every 9 years since 1950.
Scannell JW et al. (2012) "Diagnosing the decline in pharmaceutical R&D efficiency". Nature Reviews Drug Discovery 11.3.

Motivation for computational methods in drug discovery
•  Target-centered drug discovery
   Until recently, drug discovery focused on designing selective chemical agents, each binding with maximal affinity to an individual molecular target involved in a particular disease, a concept often referred to as the "one drug – one target – one disease" paradigm.
•  System-driven drug discovery, network pharmacology
   Developing rational, effective and safe therapies requires a network perspective of the cellular system and a proteome-wide profile of drug-target interactions, including both on- and off-target effects.
•  Experimental compound-target interaction mapping is time-consuming and expensive.
•  Moreover, it is simply infeasible to determine all possible compound-target interactions in the laboratory ($10^{20}$ – $10^{24}$ drug-like compounds!).

In silico drug screening
•  Docking simulations
   –  Find the preferred orientation of one molecule relative to a second one when the two are bound to each other in a stable complex.
   –  Very accurate but slow.
   –  Require 3D molecular structures and a vast amount of computational resources.
   –  DOCK: first docking program, by Kuntz et al. (1982).
   –  Current algorithms:
      –  fragment-based methods: FlexX, DOCK (since version 4.0);
      –  Monte Carlo/simulated annealing: QXP (Flo), AutoDock, Affinity & LigandFit (Accelrys);
      –  genetic algorithms: GOLD, AutoDock (since version 3.0);
      –  systematic search: FRED (OpenEye), Glide (Schrödinger).

In silico drug screening
•  Machine learning
   –  Less accurate but orders of magnitude faster than docking simulations; thus preferable for early-stage in silico drug screening.
   –  Compound (ligand)-based prediction: models trained using compound information only.
   –  Target-based prediction: models trained using target information only.
   –  The above approaches allow discovering new targets for given drugs or new drugs targeting given targets.
   –  System-based prediction: models trained using both compound and target information. Assumption: similar compounds are likely to interact with similar targets.

Components of the system-based drug-target interaction prediction methods
•  Classification: interaction/no interaction.
•  Regression: quantitative binding affinity.

Kernels – recap from last week
•  Assumption: similar drugs are likely to interact with similar targets.
•  Similarities between molecules can be encoded using kernel functions $K(x, z)$.
•  If $\phi(x)$ is a feature vector describing object $x$, the linear kernel can be calculated as:
   $$K(x, z) = \langle \phi(x), \phi(z) \rangle = \sum_j \phi_j(x)\,\phi_j(z).$$
•  Kernel functions $K : X \times X \to \mathbb{R}$ are used to compute the dot product between data points in a high-dimensional feature space $H$ without explicitly calculating the mapping from the input space to $H$, i.e. $\phi : X \to H$.
•  A linear model in the implicit feature space corresponds to a non-linear model in the original space.
•  In addition to extracting nonlinear features implicitly, kernels are very useful for calculating the similarity between structured objects, e.g. images, chemical molecules or proteins.

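To make "without explicitly calculating the mapping" concrete, the sketch below compares two ways of computing the same quantity. It uses a degree-2 polynomial kernel, which is not discussed on the slide and is chosen here only as a small worked example; the function names and the toy vectors are assumptions.

```python
from itertools import product


def linear_kernel(x, z):
    """K(x, z) = sum_j x_j * z_j (dot product of two feature vectors)."""
    return sum(xj * zj for xj, zj in zip(x, z))


def quadratic_features(x):
    """Explicit feature map phi(x): all ordered pairwise products x_i * x_j."""
    return [xi * xj for xi, xj in product(x, repeat=2)]


def quadratic_kernel(x, z):
    """K(x, z) = <x, z>^2, evaluated without ever building phi(x)."""
    return linear_kernel(x, z) ** 2


x = [1.0, 2.0, 3.0]
z = [0.5, -1.0, 2.0]

# Dot product in the 9-dimensional feature space, computed explicitly...
explicit = linear_kernel(quadratic_features(x), quadratic_features(z))
# ...and the same value obtained from the 3-dimensional inputs only.
implicit = quadratic_kernel(x, z)
print(explicit, implicit)
```

The explicit route needs a feature vector whose length grows quadratically with the input dimension, while the kernel route works on the original inputs.
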
Drug-target interaction prediction - OVERVIEW
[Diagram: the KERNELS $K_D$ (drugs × drugs) and $K_T$ (proteins × proteins), together with the LABELS $y$ (known interactions of drug-protein pairs), are passed to a classification or regression algorithm, which outputs the predicted DTIs.]

Known interactions – bioactivity databases
•  ChEMBL (https://www.ebi.ac.uk/chembl/)
   –  Searchable and downloadable.
   –  Data manually extracted from the literature.
   –  Target Report Card, Compound Report Card.
   –  ~1.5 million compounds in ChEMBL19.
•  PubChem (https://pubchem.ncbi.nlm.nih.gov/)
   –  Data generators deposit their data.
   –  Incorporates data from other databases, e.g. ChEMBL.
   –  Contains data on ~40 million unique structures, 700 000 assays.
•  DrugBank (http://www.drugbank.ca/)
   –  7 669 drug entries.
   –  Does not contain strictly bioactivity data.
   –  Pharmacokinetics.
•  PDSP (http://pdsp.med.unc.edu/)
   –  Psychoactive compounds.

Drug-target interaction prediction - OVERVIEW
[Diagram repeated: KERNELS $K_D$ and $K_T$ plus LABELS $y$ are passed to a classification or regression algorithm, which outputs the predicted DTIs.]

Chemical space
Molecular fingerprint
•  A way of encoding the structure of a molecule.
•  The most common type of fingerprint is a series of binary digits (bits) that represent the presence or absence of particular substructures in the molecule.
[Figure: example fingerprint]

Chemical space
2D fingerprints
•  Dictionary-based fingerprints: pre-defined fragments, each of which maps to a single bit. Examples: MACCS keys, BCI.
•  Hashed fingerprints: fragments are generated algorithmically without the need for a dictionary, e.g. all paths up to seven non-hydrogen atoms. Examples: Daylight, UNITY fingerprints.
•  Circular substructures: each atom is represented together with its environment (neighbouring atoms as extended spheres). Examples: ECFP2, ECFP4, FCFP2.

Chemical space
3D fingerprints
•  Presence or absence of geometric features, e.g. pairs/triplets of atoms at a given distance, valence/torsion angles.

Chemical space
Fingerprint-based Tanimoto kernel
$$K(fp_1, fp_2) = \frac{N_{fp_1, fp_2}}{N_{fp_1} + N_{fp_2} - N_{fp_1, fp_2}}$$
where
   $fp_i$ is the fingerprint of molecule $i$,
   $N_{fp_i}$ is the number of 1-bits in the fingerprint $fp_i$,
   $N_{fp_1, fp_2}$ is the number of 1-bits shared by both fingerprints.
–  Computed based on the size of the common substructures of the molecules represented by the fingerprints.

3D shape-based comparison
–  Atoms are represented as Gaussian functions.
–  Molecules are aligned in 3D.
–  The similarity score is based on the common volume.

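The Tanimoto formula above translates almost directly into code. The sketch below assumes fingerprints are given as equal-length lists of 0/1 integers; the two 8-bit fingerprints are made up for illustration, and returning 1.0 for two all-zero fingerprints is a convention chosen here, not something stated in the lecture.

```python
def tanimoto_kernel(fp1, fp2):
    """Tanimoto similarity between two binary fingerprints (lists of 0/1)."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    n1 = sum(fp1)                                  # 1-bits in fp1
    n2 = sum(fp2)                                  # 1-bits in fp2
    n12 = sum(a & b for a, b in zip(fp1, fp2))     # 1-bits set in both
    denom = n1 + n2 - n12                          # 1-bits set in either
    # Assumed convention: two empty fingerprints count as identical.
    return n12 / denom if denom else 1.0


# Hypothetical 8-bit fingerprints of two molecules.
fp_a = [1, 1, 0, 1, 0, 0, 1, 0]
fp_b = [1, 0, 0, 1, 0, 1, 1, 0]
print(tanimoto_kernel(fp_a, fp_b))  # 3 shared bits out of 5 set in either
```

Evaluating the function on all pairs of drugs gives the drug kernel matrix $K_D$ used later in the lecture.
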
Chemical space
Graph kernel
•  Graph kernels measure the similarity between graphs.
•  A chemical molecule can be represented as a labeled or unlabeled graph, where a node corresponds to an atom and an edge indicates a bond between two atoms.
•  Graph kernels can be roughly categorized into three main groups:
   1)  graph kernels based on walks and paths,
   2)  graph kernels based on limited-size subgraphs,
   3)  graph kernels based on subtree patterns.
•  Examples: random walk kernel, shortest-path kernel, Weisfeiler-Lehman subtree kernel.

Chemical space
Example: random walk
•  A graph $G$ is a set of nodes (vertices) $V$ and edges $E$.
•  The adjacency matrix $A$ of $G$ is defined as
   $$[A]_{ij} = \begin{cases} 1 & \text{if } (v_i, v_j) \in E, \\ 0 & \text{otherwise.} \end{cases}$$
•  Walk: a sequence of nodes in which consecutive nodes are connected by an edge. A walk can travel over any edge and any node any number of times.
   [Figure: graph with nodes $v_1, \dots, v_7$ and the example walk $w = (v_1, v_2, v_6, v_7, v_2, v_6, v_5)$]
•  The number of walks of length $k$ can be computed by raising the adjacency matrix $A$ to the power $k$: $[A^k]_{ij} = m$ means that $m$ walks of length $k$ exist between nodes $v_i$ and $v_j$.

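The walk-counting property of $A^k$ can be checked on a tiny graph. The sketch below uses a triangle graph, which is a made-up example and not the seven-node graph from the slide figure; the helper names are assumptions.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def count_walks(adj, k):
    """Return A^k; entry [i][j] is the number of length-k walks from v_i to v_j."""
    n = len(adj)
    result = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    for _ in range(k):
        result = mat_mul(result, adj)
    return result


# Hypothetical triangle graph: every node is connected to the other two.
triangle = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]

walks2 = count_walks(triangle, 2)
walks3 = count_walks(triangle, 3)
print(walks2[0][0], walks2[0][1])  # walks of length 2: v1->v1 and v1->v2
print(walks3[0][0], walks3[0][1])  # walks of length 3: v1->v1 and v1->v2
```

For instance, the two length-2 walks from $v_1$ back to itself go through $v_2$ and through $v_3$, which matches the diagonal entry of $A^2$.
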
Chemical space
Example: random walk graph kernel
•  The random walk kernel counts all pairs of matching walks in a pair of graphs.
•  TRICK: common walks of length $k$ can be calculated from the adjacency matrix of the product graph $G_\times$ of the two input graphs $G_1$ and $G_2$.
•  $G_\times$ is a graph over pairs of vertices from $G_1$ and $G_2$. Two vertices in $G_\times$ are neighbors if and only if the corresponding vertices in $G_1$ and $G_2$ are both neighbors.
•  Random walk kernel (counts all pairs of matching walks of any length):
   $$K_\times(G_1, G_2) = \sum_{i,j=1}^{|V_\times|} \Big[ \sum_{n=0}^{\infty} \lambda^n A_\times^n \Big]_{ij} = \sum_{i,j=1}^{|V_\times|} \big[ (I - \lambda A_\times)^{-1} \big]_{ij}$$
   [Figure: graphs $G_1$, $G_2$ and their product graph $G_\times$]
Vishwanathan S et al. (2010) "Graph kernels". The Journal of Machine Learning Research 11.

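A minimal NumPy sketch of the closed form above, under two assumptions that the slide does not spell out: the graphs are unlabeled and undirected, so the adjacency matrix of the product graph can be taken as the Kronecker product of the two adjacency matrices, and $\lambda$ is small enough that the geometric series converges (here checked as $\lambda$ times the largest absolute eigenvalue of $A_\times$ being below 1). The function name and the two toy graphs are hypothetical.

```python
import numpy as np


def random_walk_kernel(A1, A2, lam):
    """Sum of all entries of (I - lam * A_x)^{-1} for the product graph."""
    # Unlabeled graphs: adjacency of the direct product graph.
    A_x = np.kron(A1, A2)
    # The series sum_n lam^n A_x^n converges only if lam * rho(A_x) < 1.
    rho = np.max(np.abs(np.linalg.eigvalsh(A_x)))  # A_x is symmetric here
    if lam * rho >= 1:
        raise ValueError("lam too large: geometric series does not converge")
    n = A_x.shape[0]
    return np.linalg.inv(np.eye(n) - lam * A_x).sum()


# Hypothetical toy graphs: a triangle and a path on three nodes.
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)

k_closed = random_walk_kernel(triangle, path, lam=0.1)

# Cross-check against the truncated series sum_{n=0}^{60} lam^n A_x^n.
A_x = np.kron(triangle, path)
k_series = sum((0.1 ** n) * np.linalg.matrix_power(A_x, n).sum()
               for n in range(61))
print(k_closed, k_series)
```

Forming $A_\times$ explicitly costs memory quadratic in $|V_1| \cdot |V_2|$, so this direct version is only practical for small graphs.
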
Chemical space
Problem with using chemical structures
•  Sometimes structurally similar molecules can have different properties.
[Figure: pairs of molecules with 2D Tanimoto similarity of 99% and 95%]

Chemical space
Additional information
•  Side effects;
•  Anatomical Therapeutic Chemical (ATC) Classification System;
•  Gene expression responses to drugs.

Example: Glibenclamide (A10B B01)
   A          Alimentary tract and metabolism (main anatomical group)
   A10        Drugs used in diabetes (main therapeutic group)
   A10B       Oral blood-glucose-lowering drugs (pharmacological subgroup)
   A10B B     Sulfonamides, urea derivatives (chemical/therapeutic subgroup)
   A10B B01   Glibenclamide (subgroup for chemical substance)

PROTEIN SPACE
[Diagram: the protein kernel $K_T$ (proteins × proteins)]

Protein space
Amino acid sequence alignment
•  Protein sequence alignment is a way of arranging amino acid sequences to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.
•  Smith-Waterman (SW) algorithm
   –  Performs local sequence alignment; uses dynamic programming to compare segments of all possible lengths.
   –  To find the optimal alignment, a scoring system is used that combines a scoring matrix (e.g. BLOSUM, PAM) with a set of specified gap penalties.
   –  The algorithm assigns a score to each residue comparison between two sequences.
   –  Normalized similarity between two proteins $p_1$ and $p_2$:
      $$s(p_1, p_2) = \frac{SW(p_1, p_2)}{\sqrt{SW(p_1, p_1)}\,\sqrt{SW(p_2, p_2)}}.$$
T-61.5120 - Computational Genomics

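The normalized similarity can be sketched with a deliberately simplified Smith-Waterman scorer. Instead of a BLOSUM or PAM matrix and affine gap penalties, the code below assumes a fixed match score, mismatch score and linear gap penalty; these values, the function names and the two short sequences are illustrative assumptions only. Both sequences are assumed to be non-empty.

```python
from math import sqrt


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b (simplified scoring)."""
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0,                   # start a new local alignment
                        diag,                # align a[i-1] with b[j-1]
                        prev[j] + gap,       # gap in b
                        curr[j - 1] + gap)   # gap in a
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def normalized_sw(p1, p2):
    """s(p1, p2) = SW(p1, p2) / (sqrt(SW(p1, p1)) * sqrt(SW(p2, p2)))."""
    self1 = smith_waterman(p1, p1)
    self2 = smith_waterman(p2, p2)
    return smith_waterman(p1, p2) / (sqrt(self1) * sqrt(self2))


# Hypothetical short amino acid sequences.
print(normalized_sw("HEAGAWGHEE", "PAWHEAE"))
print(normalized_sw("HEAGAWGHEE", "HEAGAWGHEE"))
```

The normalization makes the self-similarity of every protein equal to 1, so scores of long and short proteins become comparable when they are used as entries of $K_T$.
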
Protein space
Amino acid sequence alignment
•  Some proteins might have very low amino acid sequence identity but similar 3D structures.

Protein space
Additional information
•  3D protein structures (Protein Data Bank PDB, http://wwpdb.org/; computational prediction algorithms);
•  Binding sites, domains;
•  Protein surface;
•  Protein-protein interaction network;
•  Gene Ontology classifications (http://geneontology.org/). Three domains:
   1)  biological processes,
   2)  cellular components,
   3)  molecular functions.

Drug-target interaction prediction - OVERVIEW
[Diagram repeated: KERNELS $K_D$ and $K_T$ plus LABELS $y$ (known interactions of drug-protein pairs) are passed to a REGRESSION algorithm, which outputs the predicted DTIs.]

Supervised learning
Inferring a function from LABELED training data.
•  Classification is the prediction of a class label, given attributes.
•  Regression is the prediction of a real number, given attributes.
INGREDIENTS
–  $X$: a space of inputs.
–  $Y$: a space of outputs.
–  $G$: a set of models mapping input to output, $G = \{g : X \to Y\}$.
–  Training data $\{(x_i, y_i)\}_{i=1}^{N}$, $x_i \in X$, $y_i \in Y$, sampled from an underlying unknown distribution $(x, y) \sim D$.
–  $\ell$: a loss function measuring the discrepancy between the model's predicted outputs and the true outputs.
GOAL: to find a model $g$ that minimizes the expected loss $\ell(g(x), y)$ on future instances; typically, the squared loss $\ell(g(x), y) = (g(x) - y)^2$.

Regression
[Figure: two fitted curves $g_1(x)$ and $g_2(x)$; x-axis: $x$ (e.g. molecular weight of chemical compound); y-axis: $y$ (e.g. interaction with cyclooxygenase)]

Regression
[Figure: fitted curves $g_1(x)$ and $g_2(x)$, $y$ against $x$]
•  $g_1(x)$: more complex model, overfitting.
•  $g_2(x)$: generalizes better to new instances.

Regularization
•  Regularized learning considers optimising functions of the form:
   $$\min_{g \in G} \sum_{i=1}^{N} \ell(g(x_i), y_i) + \lambda\,\Omega(g).$$
   –  First term: training error (loss); typically, the squared loss $\ell(g(x_i), y_i) = (g(x_i) - y_i)^2$.
   –  Second term: regularizer $\Omega(g)$ that controls the complexity of the model $g$.
–  Complex model $g$ $\Rightarrow$ high value of $\Omega(g)$.
–  The regularization parameter $\lambda \ge 0$ controls the balance between training error and model complexity.

Linear models and dual representation
•  A linear model has the form:
   $$g(x) = \langle w, \phi(x) \rangle = w^T \phi(x).$$
•  The weight vector $w$ can be written as a linear combination of the training points ($\alpha$: dual variable):
   $$w = \sum_{i=1}^{N} \alpha_i \phi(x_i).$$
•  Now we can represent the model's prediction in terms of inner products of training points $\Rightarrow$ we can use kernels:
   $$g(x) = \langle w, \phi(x) \rangle = \sum_{i=1}^{N} \alpha_i \langle \phi(x_i), \phi(x) \rangle = \sum_{i=1}^{N} \alpha_i K(x_i, x).$$

Regularized least-squares (RLS) regression for DTI prediction
–  A set of $n_d$ drugs: $D = \{d_1, \dots, d_{n_d}\}$;
–  A set of $n_t$ targets: $T = \{t_1, \dots, t_{n_t}\}$;
–  A set of $N$ training inputs (drug-target pairs):
   $X = \{(d_1, t_1), \dots, (d_1, t_{n_t}), (d_2, t_1), \dots, (d_2, t_{n_t}), \dots, (d_{n_d}, t_{n_t})\}$;
–  $N = n_d \times n_t$;
–  $y_i$: a real value indicating the binding affinity of the $i$th drug-target pair $x_i$;
–  $\lambda$: regularization parameter;
–  $\|g\|_K$: norm of the prediction function $g$ in the Hilbert space associated with the kernel $K$.
Objective function:
   $$\min_{g} \sum_{i=1}^{N} (g(x_i) - y_i)^2 + \lambda \|g\|_K^2$$
with
   $$g(x) = \sum_{i=1}^{N} \alpha_i K(x_i, x), \qquad \|g\|_K^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j K(x_i, x_j).$$

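Substituting the dual form of $g$ into the objective gives $\|K\alpha - y\|^2 + \lambda\,\alpha^T K \alpha$, whose gradient vanishes at $\alpha = (K + \lambda I)^{-1} y$. The slide does not show this closed form, so the sketch below should be read as the standard textbook solution rather than the lecture's own code; the function names and the random toy data are assumptions.

```python
import numpy as np


def rls_fit(K, y, lam):
    """Dual coefficients alpha = (K + lam * I)^{-1} y for the RLS objective."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)


def rls_predict(K_new, alpha):
    """g(x) = sum_i alpha_i K(x_i, x); row r of K_new holds K(x_i, x_r)."""
    return K_new @ alpha


# Toy data: a linear kernel on random feature vectors (hypothetical).
rng = np.random.default_rng(0)
Phi_train = rng.normal(size=(6, 3))     # features of 6 training inputs
y_train = rng.normal(size=6)            # their real-valued labels
Phi_test = rng.normal(size=(2, 3))      # features of 2 new inputs

alpha = rls_fit(Phi_train @ Phi_train.T, y_train, lam=0.5)
predictions = rls_predict(Phi_test @ Phi_train.T, alpha)
print(predictions)
```

With a linear kernel these predictions coincide with ordinary ridge regression on the explicit features, which is a convenient sanity check for the dual solution.
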
Regularized least-squares (RLS) regression for DTI prediction
$$g(x) = \sum_{i=1}^{N} \alpha_i K(x_i, x)$$
PAIRWISE KERNEL: Kronecker product of the two BASE KERNELS:
$$K = K_D \otimes K_T.$$
[Diagram: $K_D$ (drugs × drugs) $\otimes$ $K_T$ (proteins × proteins) = $K$ (drug-protein pairs × drug-protein pairs)]

Kronecker product
•  Defined for any two matrices $B$ and $C$ of arbitrary size.
•  The resulting matrix contains all possible products of entries of $B$ and $C$.
[Figure: example]

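Since the slide's worked example did not survive extraction, here is a small stand-in using NumPy's `np.kron`. The matrices `B` and `C` are arbitrary illustrative values, and `pair_index` is a hypothetical helper that maps a (drug, target) pair to its row in the pairwise kernel, assuming the pair ordering $(d_1, t_1), \dots, (d_1, t_{n_t}), (d_2, t_1), \dots$ used in the lecture.

```python
import numpy as np

B = np.array([[1, 2],
              [3, 4]])
C = np.array([[0, 1],
              [1, 0]])

# Each entry of B is replaced by that entry times the whole matrix C.
K = np.kron(B, C)
print(K)


def pair_index(i, j, n_t):
    """Row/column of pair (drug i, target j), both 0-based, n_t targets."""
    return i * n_t + j


# Entry for pairs (d_i, t_j) and (d_k, t_l) equals B[i, k] * C[j, l].
i, j, k, l = 1, 0, 0, 1
entry = K[pair_index(i, j, 2), pair_index(k, l, 2)]
print(entry, B[i, k] * C[j, l])
```

Read with $B = K_D$ and $C = K_T$, this says that two drug-protein pairs are similar when their drugs are similar and their proteins are similar.
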
Regularized least-squares (RLS) regression for DTI prediction
[Diagram repeated: $K = K_D \otimes K_T$, $g(x) = \sum_{i=1}^{N} \alpha_i K(x_i, x)$]
•  The size of $K$ ($N \times N$ with $N = n_d \times n_t$) makes model training computationally infeasible in typical applications.

KronRLS
•  Special case of ordinary RLS in which the training data are assumed to have a special structure: each datum consists of two distinct parts (e.g. drug and protein).
•  Takes advantage of algebraic properties of the Kronecker product to speed up model training.
•  KronRLS is well-suited to the large pairwise space of the DTI prediction problem.
Pahikkala et al. 2012

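One algebraic property that can be exploited is that the eigendecomposition of $K_D \otimes K_T$ follows from the eigendecompositions of the two small base kernels, so $(K + \lambda I)^{-1} y$ can be computed without ever forming $K$. The sketch below illustrates this idea under the assumption that labels are available for all $n_d \times n_t$ pairs and are arranged as a matrix `Y` with drugs in rows and targets in columns; it is not claimed to reproduce the exact algorithm of Pahikkala et al., and all names are hypothetical.

```python
import numpy as np


def kron_rls_fit(K_D, K_T, Y, lam):
    """Solve (K_D kron K_T + lam * I) vec(A) = vec(Y) via two small eigendecompositions."""
    evals_d, Q_d = np.linalg.eigh(K_D)    # K_D = Q_d diag(evals_d) Q_d^T
    evals_t, Q_t = np.linalg.eigh(K_T)    # K_T = Q_t diag(evals_t) Q_t^T
    Y_rot = Q_d.T @ Y @ Q_t               # labels in the joint eigenbasis
    # Eigenvalues of the Kronecker product are all products evals_d[i] * evals_t[j].
    C = Y_rot / (np.outer(evals_d, evals_t) + lam)
    return Q_d @ C @ Q_t.T                # dual coefficients, shape (n_d, n_t)


def kron_rls_predict_train(K_D, K_T, A):
    """Predictions for all training pairs: equals (K_D kron K_T) vec(A), reshaped."""
    return K_D @ A @ K_T.T


# Toy base kernels built from random features (hypothetical data).
rng = np.random.default_rng(0)
F_d, F_t = rng.normal(size=(4, 5)), rng.normal(size=(3, 5))
K_D, K_T = F_d @ F_d.T, F_t @ F_t.T
Y = rng.normal(size=(4, 3))

A = kron_rls_fit(K_D, K_T, Y, lam=0.1)

# Cross-check against the naive solve on the full 12 x 12 pairwise kernel.
K_full = np.kron(K_D, K_T)
alpha_naive = np.linalg.solve(K_full + 0.1 * np.eye(12), Y.reshape(-1))
print(np.allclose(A.reshape(-1), alpha_naive))
```

The naive solve scales cubically in $n_d \cdot n_t$, whereas the decomposed version only needs eigendecompositions of an $n_d \times n_d$ and an $n_t \times n_t$ matrix plus a few matrix products.
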
Multiple Kernel Learning (MKL)
[Diagram: several drug KERNELS $K_D^{(l)}$ (drugs × drugs) and several protein KERNELS $K_T^{(m)}$ (proteins × proteins), together with the LABELS $y$ (known interactions of drug-protein pairs), are passed to a classification or regression algorithm, which outputs the predicted DTIs.]

Multiple Kernel Learning (MKL)
•  Classical kernel-based algorithms rely on a single kernel: the view resulting in the highest predictive performance is considered the best one.
•  Risk of losing some important information by dropping all the other views.
•  Ideally, one would like to learn the importance of each kernel matrix in a given task, and then use a weighted combination of them:
   $$K_\mu = \sum_{k=1}^{L} \mu_k K_k.$$
•  There exist algorithms for determining the weights $\mu_k$, e.g. ALIGN, ALIGNF.

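The weighted combination itself is a one-liner; choosing the weights is the hard part. The sketch below combines kernels with weights proportional to each kernel's alignment with the label matrix $y y^T$. This is a simplified heuristic in the spirit of alignment-based methods: it uses uncentered kernels and normalizes the weights to sum to one, and it should not be read as an implementation of ALIGN or ALIGNF. All function names and the toy kernels are assumptions.

```python
import numpy as np


def combine_kernels(kernels, weights):
    """K_mu = sum_k mu_k * K_k."""
    return sum(mu * K for mu, K in zip(weights, kernels))


def alignment(K1, K2):
    """Normalized Frobenius inner product between two kernel matrices."""
    return np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2))


def alignment_weights(kernels, y):
    """Weights proportional to the alignment of each kernel with y y^T."""
    target = np.outer(y, y)
    scores = np.array([alignment(K, target) for K in kernels])
    return scores / scores.sum()


# Toy example: one kernel that matches the labels perfectly, one that ignores them.
y = np.array([1.0, -1.0, 1.0, -1.0])
K_good = np.outer(y, y)      # alignment 1 with the labels
K_flat = np.eye(4)           # alignment 1 / sqrt(4) = 0.5

mu = alignment_weights([K_good, K_flat], y)
K_mu = combine_kernels([K_good, K_flat], mu)
print(mu)
```

The more informative kernel receives the larger weight, while the less informative view still contributes instead of being dropped.
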