MultiY Recursive Partitioning – Method and Applications
Robert Brown, Shashidhar Rao, Tom Stockfisch, Accelrys Inc
David Roush, Litai Zhang, FMC
UK-QSAR meeting – June 2002

Outline
• Introduction
• PUMP-RP Methodology
• Selectivity Study – COX2 inhibitors
• HTS study – FMC
• Summary

Introduction
• High-throughput chemistry and biology are creating a wealth of data that can lead to knowledge to expedite the drug-discovery process
• Requirement for high-throughput methods to model HTS data for in silico screening
  – HTS data is characterized by a huge number of observations, low hit rates, and lots of noise
  – Need high-speed methods for prediction
  – Recursive Partitioning (CART, FIRM), Linear Discriminant Analysis, Neural Nets, Binary QSAR, etc.
• Would like to understand trends and selectivity across assays
  – mine the HTS data matrix

Standard CART RP
• Input
  – Multiple descriptors (X) - continuous or categorical
  – Single screening result (Y) - categorical (e.g. yes/no)
• Decision tree aims to separate different types of observation into different leaves of the tree
• Two-step procedure: (1) overgrow, then (2) prune
  1. Decrease impurity during the growth phase
     • choose the split with the greatest drop in impurity
     • stepwise procedure with no look-ahead examines only a small fraction of possible trees
     • over-grows the tree
  2. Decrease $R_\alpha$ during pruning
     • $R_\alpha = R_0 + \alpha N_{\mathrm{terminal}}$ (α is the complexity penalty per terminal node)
     • stepwise procedure finds the optimum $R_\alpha$ over all possible subtrees of the overgrown tree

Understanding Selectivity?
[Figure: two separate decision trees, one for Target 1 and one for Target 2, with leaves labeled I or A.]
• Hard or impossible to compare trees to see what produces selectivity
• Requires enough data to determine two separate trees

PUMP-RP
• One tree combines both responses
  – Easy to see what makes a molecule selective
  – Easy to see what the targets have in common
  – Twice the activity data available to determine the generic portion
• Use for specificity (e.g., Y1, Y2 different targets)
• Use for multi-physical models (e.g., Y1 = activity, Y2 = toxicity)
[Figure: schematic PUMP-RP tree with three bands of splits, labeled Yk-generic splits, activity-type splits, and Yk-specific splits; leaves are labeled I1, I2, A1, A2.]
Partially Unified Multiple Property Recursive Partitioning: A New Method for Predicting and Understanding Drug Selectivity, Thomas Stockfisch, in preparation for J. Chem. Inf. Comput. Sci.

New algorithm
• Obtain a balance between a single general tree and a series of unrelated specific trees
• Procedure
  – 1. Map data to a single Y variable
  – 2. Grow a pure specific tree - k node at level 1
  – 3. Regrow a k-branch - save the k split and replace it with a non-k split
  – 4. Recursively repeat step 3, moving the k nodes "down" until arriving at the maximally generic tree
  – 5. Prune the generic tree - replace some generic branches with specifics
  – 6. Find the optimal tree to balance specificity and generality
[Figure: a multi-Y data table (columns X1, X2, Y1, Y2) mapped to a single-Y table with an added K column, shown alongside trees labeled "separate Y1 model", "separate Y2 model", and "Yk=1, K-split wins".]

Selectivity Study: COX-2 selectivity
• Cyclooxygenase (COX) is a key enzyme in prostaglandin biosynthesis via the pathway of arachidonic acid breakdown.
• Two isoforms, COX-1 (constitutive) and COX-2 (triggered by inflammatory insults), are known and characterized.
• COX-2 inhibitors are anti-inflammatory agents with minimal GI side effects.
  – Celebrex and Vioxx
• Inhibition of COX-1 can lead to gastric damage, hemorrhage or ulceration
  – NSAIDs, e.g. ibuprofen, aspirin, etc.
Partially Unified Multiple Property Recursive Partitioning (PUMP-RP) Analyses of Cyclooxygenase (COX) Inhibitors, Shashidhar N. Rao & Thomas P. Stockfisch, in preparation for J. Chem. Inf. Comput. Sci.

Study Input
• 454 diaryl heterocycle cyclooxygenase (COX) inhibitors with phenyl sulfones and phenyl sulfonamides from published literature.
  – Inhibitory activities (IC50) against COX-1 and COX-2 isoforms of the enzyme.
  – Divided into 2 classes for each target:
    • COX-1 - IC50 > 5 µM (Class 0), IC50 <= 5 µM (Class 1)
    • COX-2 - IC50 > 0.5 µM (Class 0), IC50 <= 0.5 µM (Class 1)
  – Divided into
    • Test set (TE) of 50 compounds: 17 COX-2 selective
    • Training set (TR) of 404 compounds: 181 COX-2 selective
• External validation sets
  – 25 Merck cyclooxygenase inhibitors
    • represent a different class of chemistry than that covered by the training and test sets
  – 8 NSAIDs (aspirin, ketoprofen, naproxen, desmethylnaproxen, ibuprofen, indomethacin, phenytoin and diclofenac)
    • all active and non-selective

Example Tree
[Figure: example PUMP-RP tree for the COX data, annotated "generic split", "Yk = 1 split", "Specific split", and "COX-2 selective". Split labels in the figure: HB Donor <= 1, Jurs-FNSA-3 <= -0.2, AlogP98 <= 3.1, ISIS_key59 (TRUE/FALSE), JY <= 2.083. Leaves carry a class label and a count, e.g. I1 (125), A2 (30).]

Why not just calculate two trees?
[Figure: two separate trees, labeled COX-2 Inhibition and COX-1 Inhibition. Split labels in the figure: FH2O <= -30.1, AlogP98 <= 2.6, Apol <= 14051.8, JX <= 2.01, ISIS Key #75, Dipole Mom. <= 5.87, JX <= 1.79, ISIS Key #94, Shdw-XZ fract <= 0.7, ISIS Key #66, Shdw-nu <= 2, AlogP98 <= 3.1.]

Prediction of Selectivity
• Percentage of actives correctly predicted by RP trees compared to experiment

        Both COX-1 & COX-2   COX-1 Only   COX-2 Only
  TR    42% to 67%           71% to 85%   64% to 80%
  TE    52% to 68%           64% to 84%   66% to 90%

• Enrichment in COX-2 selectives
  – 1.56 to 1.86 in the training set (TR)
  – 1.60 to 2.29 in the test set (TE)
  – Remember: 44% of TR is COX-2 selective, so the best possible enrichment in TR would be ~2.2

False positive and negative selectivity rates

          Training set (TR)   Test set (TE)
  SRfp    16.9% to 26.8%      25% to 45.5%
  SRfn    21.1% to 27.9%      27.1% to 40.2%

External Validation Sets
• 25 Merck compounds
  – 21 actives including 13 COX-2 selective, 4 inactive
  – 21 correctly predicted COX-2 active, 8 correctly predicted COX-1 active
  – 8 correctly predicted COX-2 selective
  – Correctly predict that none are COX-1 selective
• 8 NSAIDs: aspirin, ketoprofen, naproxen, desmethylnaproxen, ibuprofen, indomethacin, phenytoin and diclofenac.
  – All predicted to be non-selective
  – Five of them (ketoprofen, naproxen, ibuprofen, indomethacin and diclofenac) are predicted to be active
  – Three, including aspirin, are predicted inactive
    • Aspirin is a weak inhibitor of both COX-1 and COX-2 (IC50 ~ 150-300 nM)

Assay Enrichment Study
• Library of 66000 FMC compounds screened in two functional assays (I and II) returning two classes of activity (0 and 1)
  – Assay I has two follow-up assays [I(1); I(2); I(3)]
    • 60, 33, 24 actives respectively
  – Assay II has one follow-up assay [II(1); II(2)]
    • 109, 12 actives respectively
  – X(1) is a primary assay, whilst (2) and (3) are related to specific mechanisms
• Goal
  – Combine data from multiple assays for endpoint X to
    • Explain factors causing activity
    • Use maximum data to get the best predictive model

Computational Protocol
• The 66000 compounds were divided in half for training and test sets with even distributions of actives/inactives for both assays
• Six sets of descriptors
  – Bcuts (8)
  – Cerius2 Fast descriptors (199)
  – Jurs descriptors (30)
  – ISIS keys (166)
  – 3D Atom pairs (825)
  – CCG-2D (145)
Mining Large Databases Using Multiple Y Recursive Partitioning, David Roush, Litai Zhang, Thomas Stockfisch and Shashidhar Rao, in preparation for J. Chem. Inf. Comput. Sci.
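
The half-and-half division described in the Computational Protocol can be sketched as a stratified split. This is a minimal illustration only: the helper name `stratified_half_split`, the use of an (Assay I class, Assay II class) tuple as the stratification key, and the toy counts are all assumptions, not details of the protocol actually used.

```python
import random
from collections import defaultdict


def stratified_half_split(labels, seed=0):
    """Split row indices in half so every label group is divided evenly.

    `labels` holds one hashable key per compound, e.g. a tuple
    (assay_I_class, assay_II_class). Hypothetical helper for illustration.
    """
    # Collect the row indices belonging to each label combination.
    groups = defaultdict(list)
    for idx, key in enumerate(labels):
        groups[key].append(idx)

    rng = random.Random(seed)
    train, test = [], []
    for key in sorted(groups):
        members = groups[key]
        rng.shuffle(members)
        # Odd-sized groups give their extra row to the training half.
        half = (len(members) + 1) // 2
        train.extend(members[:half])
        test.extend(members[half:])
    return sorted(train), sorted(test)


# Toy data (made up): 90 inactive in both assays, 6 active only in
# Assay I, 4 active only in Assay II.
toy_labels = [(0, 0)] * 90 + [(1, 0)] * 6 + [(0, 1)] * 4
toy_train, toy_test = stratified_half_split(toy_labels, seed=42)
```

Stratifying on the pair of assay classes, rather than on each assay separately, keeps the rare actives of both assays evenly represented in the two halves.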
Single Y vs Multi Y – Cerius2 Descriptors (Test Set Results)

               Actual Hit   False Negative (%)    False Positive (%)    Enrichment Factor
               Rate (%)     Single Y   Multi Y    Single Y   Multi Y    Single Y   Multi Y
  Assay I (1)  0.17         30         38         99.1       95.3       5x         26x
  Assay I (2)  0.09         31         37         99.2       94.4       8x         55x
  Assay I (3)  0.07         46         42         99.5       99.5       8x         6x

Single Y vs Multi Y – ISIS Keys (Test Set Results)

               Actual Hit   False Negative (%)    False Positive (%)    Enrichment Factor
               Rate (%)     Single Y   Multi Y    Single Y   Multi Y    Single Y   Multi Y
  Assay I (1)  0.17         41         55         97         97         16x        16x
  Assay I (2)  0.09         35         55         98         97         22x        34x
  Assay I (3)  0.07         59         59         99.7       99.1       4x         15x

Single Y vs Multiple Y
• Multi Y produces better enrichments with better false positive rates
• Single Y produces better false negative rates
• => More information has produced a more selective screen
• Logistically, only one experiment to run
• Multi Y allows the factors/descriptors important to all assays to be identified

[Figures: PUMP-RP trees for Assay I(1), Assay I(2), Assay I(3), and all Assay I assays combined.]

Summary
• The PUMP-RP procedure creates a tree with target-generic splits near the root and target-specific splits near the leaves, separated by splits on the activity type.
  – the generic splits benefit from being determined by a larger amount of data than if separate models were made
  – easy to interpret which splits determine specificity and which show commonality of target
• Prediction and understanding of COX-2 selective molecules
• Large-scale experiments with FMC show the use of multiple assay data to enhance understanding of activity
• Commercially released in Cerius2 4.6

Forthcoming Publications
• Methodology
  – Partially Unified Multiple Property Recursive Partitioning: A New Method for Predicting and Understanding Drug Selectivity, Thomas Stockfisch, in preparation for J. Chem. Inf. Comput. Sci.
• COX Selectivity Study
  – Partially Unified Multiple Property Recursive Partitioning (PUMP-RP) Analyses of Cyclooxygenase (COX) Inhibitors, Shashidhar N. Rao & Thomas P. Stockfisch, in preparation for J. Chem. Inf. Comput. Sci.
• FMC HTS Study
  – Mining Large Databases Using Multiple Y Recursive Partitioning, David Roush, Litai Zhang, Thomas Stockfisch and Shashidhar Rao, in preparation for J. Chem. Inf. Comput. Sci.
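
The enrichment figures quoted for the COX-2 and FMC studies can be checked with simple arithmetic. The sketch below assumes that enrichment means the hit rate among predicted actives divided by the hit rate of the whole set, and that the false positive percentage in the FMC results is the share of predicted actives that are actually inactive. Neither definition is spelled out in the slides, although both agree with the quoted numbers. All function names are illustrative.

```python
def enrichment(true_pos, false_pos, n_actives, n_total):
    """Hit rate among predicted actives divided by the base hit rate.

    Assumed definition; the slides do not define enrichment explicitly.
    """
    predicted = true_pos + false_pos
    if predicted == 0 or n_actives == 0:
        raise ValueError("need at least one predicted and one true active")
    precision = true_pos / predicted
    base_rate = n_actives / n_total
    return precision / base_rate


def max_enrichment(n_actives, n_total):
    """Upper bound: every predicted active is a true active."""
    return n_total / n_actives


def enrichment_from_rates(false_pos_pct, hit_rate_pct):
    """Enrichment from a false positive percentage and a base hit rate.

    Assumes false_pos_pct is the percentage of predicted actives that
    are inactive (an interpretation, not stated in the slides).
    """
    return (100.0 - false_pos_pct) / hit_rate_pct


# Training set of the COX study: 181 of 404 compounds are COX-2 selective.
best_tr = max_enrichment(181, 404)  # about 2.2, matching the slide

# Assay I(1), single Y, Cerius2 descriptors: 99.1% false positives at a
# 0.17% hit rate gives roughly 5, in line with the reported 5x.
assay_i1_single = enrichment_from_rates(99.1, 0.17)
```

Under these assumptions the bound of about 2.2 for the training set follows directly from its 44% share of COX-2 selective compounds, which is why enrichments of 1.56 to 1.86 are closer to the ceiling than they first appear.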