Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Predictive modeling of interactions between small chemical molecules and proteins ! Anna Cichonska! 1) Department of Computer Science, Aalto University! 2) Institute for Molecular Medicine Finland FIMM, University of Helsinki ! [email protected] 29 February 2016! Drug molecule! – a small chemical substance used in the treatment, cure, prevention or diagnosis of a disease, or to otherwise enhance physical or mental well-being.! ! Examples! !• Aspirin (acetylsalicylic acid) Treatment of pain, fever and inflammation.! • Lovastatin Lowers cholesterol level.! ! ! ! • Everolimus Treatment of cancer, including cancer of the kidneys, pancreas, breasts, and brain. Used with other medicines to keep the body from rejecting a transplanted kidney or liver.! T-61.6080 ! 29 February 2016! 2! Targets of drugs! ! Drug-like chemical compounds execute their actions mainly by modulating cellular targets, including proteins, metabolites or even nucleic acids (DNA and RNA). ! PROTEINS! ! Large biomolecules.! ! One of the most abundant molecules in living organisms.! ! Perform variety of crucial functions, such as:! " transporting molecules (e.g. hemoglobin transports oxygen), ! " catalyzing reactions (enzymes), ! " replicating DNA, ! " identifying and neutralizing foreign bacteria and viruses,! & many more.! T-61.6080 ! 29 February 2016! 3! Targets of drugs - PROTEINS! ! Proteins consist of one or more long chains of amino acid residues.! ! There are 20 proteinogenic amino acids.! ! Amino acid sequences of proteins are encoded in the genes.! Amino acid! Protein sequence! ! The sequence of the amino acid chain causes the polypeptide to fold into a shape that is biologically active.! Protein structure! T-61.6080 ! 29 February 2016! 4! Drug-target interaction! (DTI) ! One of the most common drug’s mechanisms of action (MoA)! Drug-target interaction = molecular-level interaction, e.g. ! hydrogen bonds keep a compound tightly bound to a protein,... hydrophobic protein atoms enclose hydrophobic compound atoms. T-61.6080 ! 29 February 2016! 5! Drug-target interaction ! Statin-[HMG-CoA reductase] Inhibition of HMG-CoA reductase enzyme with statin! HMG-CoA reductase T-61.6080 ! 29 February 2016! 6! Drug-target interaction ! Aspirin suppresses the production of prostaglandins and thromboxanes due to its irreversible inactivation of the cyclooxygenase.! Aspirin-cyclooxygenase Cyclooxygenase OH! O! H! ACTIVE cyclooxygenase enzyme ! INACTIVE cyclooxygenase enzyme ! T-61.6080 ! 29 February 2016! 7! Why is drug discovery difficult? ! CHEMICAL SPACE ! TARGET (PROTEIN) SPACE ! Only certain molecules have features consistent with good pharmacological properties (Lipinski's rule of five).! ! Only certain targets have binding sites capable of efficient binding of drug-like ligands.! ! 104 – 105 proteins! 1020 – 1024 ! T-61.6080 ! 29 February 2016! 8! Accessible pharmacological space ! T-61.6080 ! 29 February 2016! 9! Drug development – some facts! ! More than half of candidate drugs that look promising in the research lab will ultimately fail.! ! More than a quarter of drugs that reach the clinical trial stage will be rejected as ineffective or unsafe (causing side effects).! ! The amount of new drugs approved by US Food and Drug Administration (FDA) per billion dollars (inflation-adjusted) spent on R&D has halved every 9 years since 1950.! Scannell JW et al. (2012) “Diagnosing the decline in pharmaceutical R&D efficiency”. Nature Reviews Drug Discovery 11.3.! T-61.6080 ! 29 February 2016! 10! Motivation for computational methods in drug discovery! ! Target-centered drug discovery Recently, drug discovery was focused on designing selective chemical agents, each binding with maximal affinity to an individual molecular target involved in a particular disease, the concept often referred to as one drug – one target – one disease paradigm. ! System-driven drug discovery, network pharmacology! Developing rational, effective and safe therapies requires a network perspective of the cellular system and a proteome-wide profile of drug-target interactions, ! including both on- and off-target effects. ! Experimental compound-target interaction mapping is time consuming and expensive.! ! Moreover, it is simply infeasible to determine all the possible compound-target interactions in the laboratory (1020 – 1024 drug-like compounds!).! T-61.6080 ! 29 February 2016! 11! In silico drug screening! ! Docking simulations! " Finding preferred orientation of one molecule to a second one when bound to each other to form a stable complex.! " Very accurate but slow.! " Require the usage of 3D molecular structures and vast amount of computational resources.! " DOCK: first docking program by Kuntz et al. (1982).! " Current algorithms:! – fragment-based methods: FlexX, DOCK (since version 4.0);! – Monte Carlo/Simulated annealing: QXP(Flo), Autodock, Affinity & LigandFit (Accelrys);! – Genetic algorithms: GOLD, AutoDock (since version 3.0);! – Systematic search: FRED (OpenEye), Glide (Schrödinger). ! T-61.6080 ! 29 February 2016! 12! In silico drug screening! ! Machine Learning! " Less accurate but orders of magnitude faster than docking simulations; thus, more preferable for early-stage in silico drugs screening.! " Compound (ligand)-based prediction Models trained using compound information only.! " Target-based prediction Models trained using target information only.! " The above approaches allow discovering new targets for given drugs or new drugs targeting given targets.! " System-based prediction Models trained using both compound and target information. Assumption: similar compounds are likely to interact with similar targets.! T-61.6080 ! 29 February 2016! 13! Components of the drug-target interaction system-based prediction methods! ! Classification interaction/no interaction ! ! Regression quantitative binding affinity! T-61.6080 ! 29 February 2016! 14! Kernels – recap from last week! ! Assumption: similar drugs are likely to interact with similar targets.! ! Similarities between molecules can be encoded using kernel functions K(x,z).! ! If φ (x) is a feature vector describing object x, the linear kernel can be calculated as: K(x, z) = φ (x), φ (z) = φ j (x)φ j (z). ! ∑ j ! Kernel functions !K : X! × X !→ ℜ! are used to compute the dot product between data points in a high dimensional feature space H without explicitly calculating the mapping from the input space to H, i.e. φ : X → H. ! ! A linear model in the implicit feature space corresponds to a non-linear model in the original space.! ! In addition to kernels being able to extract nonlinear features implicitly, they are very useful for calculating the similarity between structured objects, e.g. images, chemical molecules or proteins.! T-61.6080 ! 29 February 2016! 15! Drug-target interaction prediction - OVERVIEW! LABELS! (known interactions)! KERNELS! Drugs! Drugs! KD Predicted DTIs! Proteins! Proteins! Drug-protein pairs! y KT Classification! or regression! algorithm! T-61.6080 ! 29 February 2016! 16! Known interactions – bioactivity databases! h#ps://www.ebi.ac.uk/chembl/ " " " " h#ps://pubchem.ncbi.nlm.nih.gov/ " Data generators deposit their data.! " Incorporates data from other databases. e.g. ChEMBL.! " Contains data on ~40 mln unique structures, 700 000 assays.! Searchable and downloadable. ! Data manually extracted from the literature.! Target Report Card, Compound Report Card. ! ~1.5 mln compounds in ChEMBL19.! " 7 669 drug entries.! " Does not contain strictly bioactivity data.! " Pharmacokinetics.! h#p://www.drugbank.ca/ " Psychoactive compounds.! h#p://pdsp.med.unc.edu/ T-61.6080 ! 29 February 2016! 17! Drug-target interaction prediction - OVERVIEW! LABELS! (known interactions)! KERNELS! Drugs! Drugs! KD Predicted DTIs! Proteins! Proteins! Drug-protein pairs! y KT Classification! or regression! algorithm! T-61.6080 ! 29 February 2016! 18! Molecular fingerprint! Chemical space! ! A way of encoding the structure of a molecule.! ! The most common type of fingerprint is a series of binary digits (bits) that represent the presence or absence of particular substructures in the molecule.! Example! T-61.6080 ! 29 February 2016! 19! 2D fingerprints! Chemical space! Dictionary-Based Fingerprints Pre-defined fragments, each of which maps to a single bit. Examples: MACCS Keys, BCI.! Hashed Fingerprints Fragments are generated algorithmically without the need for a dictionary, e.g. all paths up to seven non-hydrogen atoms. Examples: Daylight, UNITY fingerprints. ! Circular Substructures Each atom is represented together with its environment (neighbouring atoms as extended spheres). Examples: ECFP2, ECFP4, FCFP2.! T-61.6080 ! 29 February 2016! 20! 3D fingerprints! Chemical space! Presence or absence of geometric features, ! e.g. pairs/triplets of atoms at given distance, valence/torsion angles.! ! T-61.6080 ! 29 February 2016! 21! Fingerprint-based Tanimoto kernel! K( fp1, fp2 ) = Chemical space! N fp1, fp2 N fp1 + N fp2 − N fp1, fp2 fpi − fingerprint of the molecule i, N fpi − number of 1-bits in the fingerprint fpi , N fp1, fp2 − number of 1-bits in both fingerprints. " Computed based on the size of common substructures of the molecules represented by the fingerprints. ! 3D shape-based comparison! " Atoms are represented as Gaussian functions.! " Molecules are aligned in 3D.! " Similarity score is based on the common volume. ! T-61.6080 ! 29 February 2016! 22! Chemical space! Graph kernel! ! Graph kernels allow to measure the similarity between graphs. ! ! ! Chemical molecule can be represented as a labeled or unlabeled graph, where a node corresponds to an atom, and an edge indicates a bond between two ! atoms.! ! ! Graph kernels can be roughly categorized into three main groups:! 1) graph kernels based on walks and paths,! 2) graph kernels based on limited-size subgraphs,! 3) graph kernels based on subtree patterns.! ! Examples: random walk kernel, shortest-path kernel, Weisfeiler-Lehman subtree kernel. ! ! T-61.6080 ! 29 February 2016! 23! Chemical space! Example: random walk! ! A graph G is a set of nodes (vertices) V and edges E. ! "1 if (vi , v j ) ∈ E ! The adjacency matrix A of G is defined as [A]ij = # .! $0 otherwise ! Walk – a sequence of nodes, in which consecutive nodes are connected by an edge. A walk can travel over any edge and any node any number of times.! v 2! v 3! v 1! v 6! v 7! v 5! w = (v1, v2, v6, v7, v2, v6, v5)! v 4! ! Walks of length k can be computed by taking the adjacency matrix A to the power of k. ! Ak(vi,vj) = m # m walks of length k exist between nodes vi and vj.! T-61.6080 ! 29 February 2016! 24! Chemical space! Example: random walk graph kernel! ! Random walk kernel computes the number of all pairs of matching walks in a pair of graphs. ! ! TRICK: common walks of length k can be calculated from the adjacency matrix of the product graph G× of two input graphs G1 and G2.! ! G× is a graph over pairs of vertices from G1 and G2. Two vertices in G× are neighbors if and only if the corresponding vertices in G1 and G2 are both neighbors.! ! Random walk kernel! |V× | ∞ G2! G1! G×! |V× | K× (G1, G2 ) = ∑ [∑ λ n A×n ]ij = ∑ [(I − λ A× )−1 ] counts all pairs of matching walks of any length ! ! i, j=1 n=0 i, j=1 Vishwanathan S et al. (2010) "Graph kernels”. The Journal of Machine Learning Research 11.! T-61.6080 ! 29 February 2016! 25! Problem with using chemical structures! Chemical space! ! Sometimes structurally similar molecules can have different properties.! 2D Tanimoto similarity! 99%! 95%! T-61.6080 ! 29 February 2016! 26! Chemical space! Additional information! ! Side effects;! Example: Glibenclamide (A10B B01) ! A ! Anatomical Therapeutic Chemical (ATC) Classification System;! ! Gene expression responses to drugs.! ! ! !Alimentary tract and metabolism !(main anatomical group)! A10 ! ! !Drugs used in diabetes !(main therapeutic group)! A10B ! !Oral blood-glucose-lowering drugs !(pharmacological subgroup)! A10B B ! !Sulfonamides, urea derivatives !(chemical/therapeutic subgroup)! A10B B01 !Glibenclamide (subgroup ! !for chemical substance)! T-61.6080 ! 29 February 2016! 27! Proteins! PROTEIN SPACE! Proteins! KT T-61.6080 ! 29 February 2016! 28! Amino acid sequence alignment! Protein space! ! Protein sequence alignment is a way of arranging the amino acid sequences to identify regions of similarity that may be a consequence of functional, structural, ! or evolutionary relationships between the sequences.! ! Smith-Waterman (SW) algorithm! " Performs local sequence alignment; uses dynamic programming to compare segments of all possible lengths.! " To find the optimal alignment, a scoring system including a set of specified gap penalties is used (different scoring matrices, e.g. BLOSUM, PAM).! " The algorithm assigns a score to each residue comparison between two sequences.! " Normalized similarity between two proteins p1 and p2:! s( p1, p2 ) = SW ( p1, p2 ) . SW ( p1, p1 ) SW ( p2 , p2 ) T-61.5120 - Computational Genomics P! T-61.6080 ! 29 February 2016! 29! Amino acid sequence alignment! Protein space! ! Some proteins might have very low amino acid sequence identity but similar 3D structures. ! T-61.6080 ! 29 February 2016! 30! Protein space! Additional information! ! 3D protein structures (Protein Data Bank PDB, computational prediction algorithms);! ! Binding sites, domains;! ! Protein surface; ! ! Protein-protein interaction network; ! ! Gene Ontology classifications (http://geneontology.org/).! http://wwpdb.org/! Three domains:! 1) biological processes, ! 2) cellular components,! 3) molecular functions.! T-61.6080 ! 29 February 2016! 31! Drug-target interaction prediction - OVERVIEW! LABELS! (known interactions)! KERNELS! Drugs! Drugs! KD Predicted DTIs! Proteins! Proteins! Drug-protein pairs! y KT REGRESSION! algorithm! T-61.6080 ! 29 February 2016! 32! Supervised learning! inferring a function from LABELED training data! ! Classification is the prediction of a class label, given attributes. ! ! Regression is the prediction of a real number, given attributes. ! INGREDIENTS ! " X: a space of inputs.! " Y: a space of outputs.! " G: a set of models mapping input to output G = {g: X Y}.! " Training data {(xi,yi)}, i=1,...,N, xi ∈X, yi ∈Y sampled from an underlying unknown distribution (x,y)~D.! " l: !a loss function measuring the discrepancy between the model’s !predicted outputs and true outputs.! GOAL: !to find a model g that minimizes the expected loss l(g(x),y) !on future instances; typically, squared loss l(g(x),y) = (g(x) – y)2. T-61.6080 ! 29 February 2016! 33! (e.g. interaction with cyclooxygenase)! Regression! g1(x) y g2(x) x (e.g. molecular weight of chemical compound) ! T-61.6080 ! 29 February 2016! 34! Regression! g1(x) y g1(x): more complex model, overfitting.! g2(x): generalizes better to new instances.! g2(x) x ! T-61.6080 ! 29 February 2016! 35! Regularization! ! Regularized learning considers optimising the functions of the form:! N min ∑ ℓ(g(xi ), yi ) + λΩ(g). g∈G i=1 Training error (loss), ! typically, squared loss: l(g(xi),yi) = (g(xi) – yi)2. ! Regularizer that controls the complexity of the model g.! " Complex model g # high value of Ω(g).! " Regularization parameter λ ≥ 0 controls the balance between training error and model complexity. ! T-61.6080 ! 29 February 2016! 36! Linear models and dual representation! ! A model in the form of linear function has a form:! g(x) = w, φ (x) = wT φ (x). ! Weight vector w can be written as a linear combination of the training points: ! N w = ∑α iφ (xi ). α: dual variable ! i=1 ! Now, we can represent model’s prediction in terms of inner products of training points # we can use kernels: ! N N g(x) = w, φ (x) = ∑α i φ (xi ), φ (x) = ∑α i K(xi , x). i=1 i=1 T-61.6080 ! 29 February 2016! 37! Regularized least-squares (RLS) regression! for DTI prediction! " A set of nd drugs:! " A set of nt targets: D = {d1,..., dnd }; !! " A set of N training inputs (drug-target pairs): !! T = {t1,..., tnt }; X = {(d1, t1 ),..., (d1, tnt ), (d2 , t1 ),..., (d2 , tnt ),..., (dnd , tnt )}; " N = nd × nt;! " yi: a real value indicating binding affinity of ith drug-target pair xi;! " λ: regularization parameter;! " ||g||K: norm of the prediction function g on the Hilbert space associated to the kernel K. ! N objective function! N min ∑(g(xi ) − yi )2 + λ g i=1 2 K g(x) = ∑α i K(xi , x) i=1 2 N N g 2 = ∑∑α iα j K(xi , x j ) i=1 j=1 T-61.6080 ! 29 February 2016! 38! Regularized least-squares (RLS) regression! N for DTI prediction! BASE KERNELS! PAIRWISE KERNEL! ! K = K D ⊗ KT . KD Drug-protein pairs! Proteins! KT Drug-protein pairs! K =! Proteins! i=1 Kronecker product of two base kernels: ! Drugs! Drugs! g(x) = ∑α i K(xi , x) T-61.6080 ! 29 February 2016! 39! Kronecker product! ! Defined for any two matrices B and C of arbitrary size.! ! Resulting matrix contains all possible products of entries of B and C.! Example! T-61.6080 ! 29 February 2016! 40! Regularized least-squares (RLS) regression! N for DTI prediction! BASE KERNELS! PAIRWISE KERNEL! ! K = K D ⊗ KT . KD Drug-protein pairs! Proteins! KT Drug-protein pairs! K =! Proteins! i=1 Kronecker product of two base kernels: ! Drugs! Drugs! g(x) = ∑α i K(xi , x) The size of K makes the model training computationally unfeasible in typical applications.! ! T-61.6080 ! 29 February 2016! 41! KronRLS! ! Special case of ordinary RLS in which one assumes the training data to have a special structure – each datum consisting of two distinct parts (e.g. drug ! !and protein). ! ! Takes advantage of some algebraic properties of the Kronecker product to speed up model training.! ! KronRLS is well-suited for the large pairwise space of the DTI prediction problem.! Pahikkala et al. 2012! T-61.6080 ! 29 February 2016! 42! Multiple Kernel Learning MKL! LABELS! (known interactions)! Drugs! Drugs! K(l)D Predicted DTIs! Proteins! Proteins! Drug-protein pairs! y KERNELS! K(m)T Classification! or regression! algorithm! T-61.6080 ! 29 February 2016! 43! Multiple Kernel Learning MKL! ! Classical kernel-based algorithms rely on a single kernel – the view resulting in the highest predictive performance is considered the best one.! ! Risk of loosing some important information by dropping all the other views.! ! Ideally, one would like to learn the importance of each kernel matrix in a given task, and then use a weighted combination of them:! L ! Kµ = ∑ µk K k . k=1 ! There exist algorithms for determining the weights μk, e.g. ALIGN, ALIGNF.! T-61.6080 ! 29 February 2016! 44!