Targeting Drug-like Properties in Chemical Libraries
David Winkler, Frank Burden, Mitchell Polley
Centre for Complexity in Drug Design, CSIRO Molecular Science, and Chemistry Department, Monash University

VICS Complexity in Drug Design Group
- Prof. Frank Burden: Scimetrics Ltd, consultant to CSIRO
- Dr. Mitchell Polley: CSS postdoctoral fellow
- Darryl Jones: CSS PhD top-up student, Flinders University (Physics)
- Prof. Dave Winkler: CSIRO Molecular Science and Monash University/VICS

Overview of Project
- Aims to develop a method for evolving a chemical library of heterogeneous agents (molecules) using "drug-like" fitness functions
- Chemical space is vast (more than 10^80 possibilities)
- The method must explore drug-like chemical space and identify islands of activity and novelty
- Applications in the discovery of novel bioactive agents such as drugs and crop-care products
- The methodology is applicable to the design of new materials and nanomachines using different fitness functions

Overview of Project: Steps
- Devise sparse, informative mathematical representations of molecules
- Devise sparse methods of selecting these representations for models
- Use agent-based methods (Bayesian neural nets) to map representations to properties, and use the models as fitness functions
- Develop methods for evolving chemicals using mutation operators so that as much of chemical space as possible can be traversed
- Evolve chemical libraries using drug-like fitness functions

Highlights: Representations
- Novel charge fingerprint descriptor devised and tested
- Theory of eigenvalue descriptors cracked
- Momentum-space descriptor work started
- Novel selectivity index developed

Sparse Descriptors
- Many thousands of descriptors have been devised (e.g. CoMFA fields, DRAGON)
- Many are highly correlated with other descriptors and so contain the same information
- Some (e.g. molecular weight) are information-poor
- Models using sparse descriptors can be more predictive
- We work on the premise that it is possible to devise sparse, information-rich descriptors from which suitable subsets can be drawn for a wide variety of modelling problems

Charge fingerprints
These are widely applicable, easily computed descriptors, calculated by binning the charges on atoms in different environments.
[Figure: charge fingerprint histograms; x-axis: atomic charge bins from <= -1.0 to >= +1.0 in steps of 0.1; y-axis: 0.0 to 0.8; series: GM C, EEM TOP 6 C, EEM TOP 1 C, EEM TOP C, EEM SEP 6 C, EEM SEP 1 C, EEM SEP C]

EEM-based property descriptors
- Density functional theory (DFT) holds that knowledge of the electron density allows many other properties to be computed
- The electronegativity equalization method (EEM; Mortier, Bultinck and others) is a rapid, approximate DFT method
- All work to date has concentrated on charges or a few other "observables"
- Its main strength will probably lie in calculating other molecular properties, once the method is generalized and parameterized for more atom types

Generalized eigenvalue matrices
[Figure: acetate anion, H3C-C(=O)O-, with its atoms numbered 1 to 4]
In the Burden-type matrix, atomic properties a_ii lie on the diagonal, and the off-diagonal elements are 1 for bonded atom pairs and 0 otherwise. In the generalized matrix, the off-diagonal elements become distance-weighted terms b_ij / r_ij^m:

$$
\mathbf{M} = \begin{pmatrix}
a_{11} & b_{12}/r_{12}^{m} & b_{13}/r_{13}^{m} & \cdots & b_{1n}/r_{1n}^{m} \\
b_{21}/r_{21}^{m} & a_{22} & b_{23}/r_{23}^{m} & \cdots & b_{2n}/r_{2n}^{m} \\
b_{31}/r_{31}^{m} & b_{32}/r_{32}^{m} & a_{33} & \cdots & b_{3n}/r_{3n}^{m} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{n1}/r_{n1}^{m} & b_{n2}/r_{n2}^{m} & b_{n3}/r_{n3}^{m} & \cdots & a_{nn}
\end{pmatrix}
$$

Why do eigenvalue descriptors work?
The eigenvalue matrix has atomic terms on the diagonal and inverse-distance terms off the diagonal. The EEM matrix has the same pattern:

$$
\text{Eigenvalue matrix:}\;
\begin{pmatrix}
a_{11} & 1/r_{12}^{m} & \cdots & 1/r_{1n}^{m} \\
1/r_{21}^{m} & a_{22} & \cdots & 1/r_{2n}^{m} \\
\vdots & \vdots & \ddots & \vdots \\
1/r_{n1}^{m} & 1/r_{n2}^{m} & \cdots & a_{nn}
\end{pmatrix}
\qquad
\text{EEM matrix:}\;
\begin{pmatrix}
2\eta_{1} & 1/r_{12} & \cdots & 1/r_{1n} & 1 \\
1/r_{21} & 2\eta_{2} & \cdots & 1/r_{2n} & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
1/r_{n1} & 1/r_{n2} & \cdots & 2\eta_{n} & 1 \\
1 & 1 & \cdots & 1 & 0
\end{pmatrix}
$$

EEM charges are obtained by solving a linear system built on the EEM matrix, so they depend on its inverse. For a symmetric matrix A with orthogonal eigenvector matrix T and diagonal eigenvalue matrix Λ:

$$
\mathbf{A} = \mathbf{T}\boldsymbol{\Lambda}\mathbf{T}^{\mathsf{T}}, \qquad
\mathbf{A}\mathbf{T} = \mathbf{T}\boldsymbol{\Lambda}, \qquad
\mathbf{A}^{-1} = \mathbf{T}\boldsymbol{\Lambda}^{-1}\mathbf{T}^{\mathsf{T}}
\quad \text{since } \mathbf{T}^{\mathsf{T}} = \mathbf{T}^{-1} \text{ for an orthogonal transformation}
$$

In other words, the inverse of A is determined by its eigenvalues and eigenvectors. The eigenvalues of a matrix with this structure therefore carry charge-related information.

Momentum space descriptors
- The part of the electron density distribution most relevant to biological activity lies near the k-space origin.
- The corresponding r-space density distribution is associated with the outermost valence regions of the molecule.
- k-space descriptions of the electron density are more compact and simpler.

Optimum Selectivity Index S0

Highlights: Sparse feature selection
- Automatic Relevance Determination (ARD) method refined
- Sparse Bayesian feature detection theory mastered
- Linear sparse feature detection using an EM algorithm and a Jeffreys prior
- Nonlinear Bayesian feature detection achieved, but needs more work
- Novel variable selection when the number of descriptors is much larger than the number of molecules in the data set

Sparse Bayesian variable selection
[Figure: sparse Bayesian variable selection; x-axis: descriptor]

Highlights: Optimum nonlinear modelling
- Bayesian regularized neural networks working well
- Linear sparse feature detection and modelling
- Nonlinear Bayesian feature detection and modelling using radial basis function regression
- Use of sparse Bayesian methods in neural networks under study

Highlights: Models built
- Blood-brain barrier partitioning
- Drug intestinal absorption
- Acute toxicity
- Phase II metabolism: substrates and inhibitors (Flinders Medical School collaboration), modelled with SVM
- Several drug target models, e.g. farnesyl transferase
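The sketch below illustrates the linear sparse feature detection listed above, which uses an EM algorithm with a Jeffreys prior. It is a minimal version that assumes Figueiredo's adaptive-sparseness formulation. Each regression weight w_i gets a zero-mean Gaussian prior whose variance τ_i carries a Jeffreys hyperprior, p(τ_i) ∝ 1/τ_i. The E-step replaces 1/τ_i by 1/w_i², and the M-step solves a re-weighted ridge problem. Several details are hypothetical illustrative choices and not the authors' implementation: the function name `sparse_em_jeffreys`, its parameters, the least-squares starting point and the pruning threshold.

```python
import numpy as np


def sparse_em_jeffreys(X, y, n_iter=200, tol=1e-8, prune=1e-4):
    """Sparse linear regression by EM under a Jeffreys hyperprior.

    Illustrative sketch (hypothetical helper), not the authors' code.
    Returns the weight vector and the indices of weights above `prune`.
    """
    n, p = X.shape
    # Start from the least-squares solution (minimum-norm if p > n).
    w = np.linalg.pinv(X) @ y
    # Noise variance estimate, floored so the ridge term stays positive.
    sigma2 = max(np.mean((y - X @ w) ** 2), 1e-6)
    for _ in range(n_iter):
        # E-step: expected prior precisions are 1/w_i**2. Writing the M-step
        # as U (sigma2 I + U X'X U)^-1 U X'y with U = diag(|w|) avoids
        # dividing by weights that are shrinking towards zero.
        U = np.diag(np.abs(w))
        A = sigma2 * np.eye(p) + U @ X.T @ X @ U
        w_new = U @ np.linalg.solve(A, U @ X.T @ y)
        # M-step for the noise variance: mean squared residual.
        sigma2 = max(np.mean((y - X @ w_new) ** 2), 1e-6)
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    selected = np.flatnonzero(np.abs(w) > prune)
    return w, selected
```

Irrelevant weights shrink faster than geometrically towards zero, and once a weight reaches zero it stays there. EM only finds a local optimum, though, so a few noise descriptors can survive at small values. In practice, the selected set depends on the pruning threshold.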
Blood-brain barrier model
- Topological descriptors, 3 hidden nodes
- Training set 85 compounds, test set 21 compounds

Intestinal absorption QSAR model
- Property-based descriptors, 5 hidden nodes (optimum model)

Acute toxicity model
- Burden index/binned charge descriptors, 8 hidden nodes
- Training set 450 compounds, external test set 53 compounds

Using SVM and EEM descriptors to model phase II metabolism

UGT isoform | Chemicals in dataset | Substrates in dataset (%) | Test set correct, all chemicals (%) | Test set correct, substrates (%) | Test set correct, non-substrates (%)
1A1 | 174 | 39 | 85 | 81 | 88
1A3 | 156 | 76 | 89 | 94 | 67
1A4 | 156 | 55 | 83 | 78 | 94
1A6 | 161 | 41 | 67 | 72 | 64
1A7 | 65 | 40 | 79 | 57 | 92
1A8 | 104 | 78 | 77 | 95 | 40
1A9 | 176 | 65 | 80 | 86 | 67
1A10 | 147 | 50 | 80 | 86 | 74
2B4 | 131 | 31 | 83 | 75 | 87
2B7 | 196 | 65 | 64 | 73 | 36
2B15 | 125 | 42 | 67 | 60 | 71
2B17 | 53 | 45 | 80 | 70 | 100

COX 1 and 2 QSAR and selectivity
- Built QSAR models for cyclooxygenase 1 and 2 inhibition and for the selectivity index S0, using a large data set from Tom Stockfisch at Accelrys (454 compounds obtained from http://www.accelrys.com/references/datasets/)
- Used atomistic (A), Burden eigenvalue (B) and charge fingerprint (C) descriptors together with a Bayesian regularized neural net to build the models
- Compared MLR with a Bayesian neural net with 3 nodes in the hidden layer

[Figure: selectivity of cyclooxygenase 1 and 2 inhibitors]

Selectivity index S0 QSAR
Model | R² | Q²
MLR | 0.77 | 0.69
BRANN (3 nodes) | 0.92 | 0.74
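The slides report R² and Q² for the MLR and BRANN selectivity models but do not define Q². A common QSAR convention, assumed here, is that Q² is the leave-one-out cross-validated R²: Q² = 1 - PRESS / Σ(y_i - ȳ)². For MLR, the leave-one-out residuals need no refitting, because each one equals e_i / (1 - h_ii), where h_ii is the i-th diagonal element of the hat matrix. The sketch uses NumPy and a hypothetical helper `mlr_r2_q2` to compute both statistics for an ordinary least-squares fit with an intercept.

```python
import numpy as np


def mlr_r2_q2(X, y):
    """Fit MLR by least squares and return (R2, leave-one-out Q2).

    Hypothetical helper; assumes Q2 means leave-one-out cross-validated R2.
    """
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / ss_tot
    # Hat matrix H = Xd (Xd'Xd)^-1 Xd'; its diagonal gives each point's leverage.
    H = Xd @ np.linalg.pinv(Xd)
    # Leave-one-out residuals without refitting: e_i / (1 - h_ii).
    loo_resid = resid / (1.0 - np.diag(H))
    press = np.sum(loo_resid ** 2)
    q2 = 1.0 - press / ss_tot
    return r2, q2
```

Since 0 ≤ h_ii < 1, every leave-one-out residual is at least as large in magnitude as the corresponding fitted residual. For MLR, Q² therefore never exceeds R².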