VS Explorer – Analyzing large scale docking experiments
ChemAxon 2005 User Group Meeting
Marc Zimmermann, Martin Hofmann

Selection of Potential Drugs
• 28 million compounds currently known
• Drug company biologists screen up to 1 million compounds against a target using ultra-high throughput technology
• Chemists select 50-100 compounds for follow-up
• Chemists work on these compounds, developing new, more potent compounds
• Pharmacologists test compounds for pharmacokinetic and toxicological profiles
• 1-2 compounds are selected as potential drugs

High Volume Screening Analysis – the Methods
[Diagram: Screening (HTS; vHTS by similarity, docking; active / inactive), Assembling, Filtering, Clustering, Modeling]
Virtual Screening – computational or in silico analog of biological screening
o Score, rank, and/or filter a set of structures using one or more computational procedures
o Helps to decide: which compounds to screen, which libraries to synthesize, which compounds to purchase from an external source

High Volume Screening Analysis – the Tools at SCAI
[Diagram: the stages Screening, Assembling, Filtering, Clustering, Modeling, annotated with the tools VS Explorer, FTrees, FlexX, GRID Layer, ProMiner, TopNet, HTSview, DB Annotator]

Computational Aspects of Drug Discovery: Virtual Screening
• Enable scientists to quickly and easily find compounds binding to a particular target protein
o growth in the number of targets
o growth in 3D structure determination (PDB database)
o growth in computing power
o growth in the prediction quality of protein-compound interactions
• Experimental screening is very expensive: not an option for academic groups or small companies
• Aim: Active molecules / Tested molecules

Grids for neglected diseases and diseases of the developing world
Diagram labels: In silico drug discovery process (EGEE, Swissgrid, …); SCAI Fraunhofer; Clermont-Ferrand; Support
to local centres in plagued areas (genomics research, clinical trials and vector control); Swiss Biogrid consortium; Local research centres in plagued areas
The grid impact:
• Computing and storage resources for genomics research and in silico drug discovery
• Cross-organizational collaboration space to progress research work
• Federation of patient databases for clinical trials and epidemiology in developing countries

Structure-Based Virtual Screening: Protein-Ligand Docking
[Diagram: Target protein, Ligand database, Molecular docking, Ligand docked into protein's active site]
o Aims to predict 3D structures when a molecule "docks" to a protein
   - Need a way to explore the space of possible protein-ligand geometries (poses)
   - Need to score or rank the poses
o Problem: many degrees of freedom (rotation, conformation, solvent effects)

Grid VS Results Browser
• Quick overview of very large log files
• Sorting and merging of files
• Storage and retrieval in databases
• Similarity searches and property predictions
• Interface to R statistics box
• Prototype is under construction

Screenshot 1, SDF excerpt:
M  END
> <Object Id>
MAC-0000100
> <Batch Ref>
03
> <Supplier Object Id>
6743501
> <ENZ_KINETIC_RES_ACT.RES_ACT>

Screenshot 2, CSV excerpt:
"Smiles";"Data"
"c1(N2CCC(CC2)C(OCC)=O)sc3c(ccc(Cl)c3)n1";MAC-0000001;02;101.66;104.66
"C(=O)(Nc(cc1)ccc1Cl)N(CCCN2c(c(Cl)cc3C(F)(F)F)nc3)CC2";MAC-0000002;02;101.14;105.89
"n1(CC(CNCCNc2nccc(n2)C(F)(F)F)O)c3c(cc1)cccc3";MAC-0000003;02;101.64;97.32
"[N+](=O)([O-])c(ccc1N(CCCN2C(=S)Nc3ccc(cc3Cl)Cl)CC2)cn1";MAC-0000004;02;100.09;101.14
"[N+](=O)([O-])c(ccc1N(CCCN2C(=S)Nc3ccc(cc3Br)F)CC2)cn1";MAC-0000005;02;108.98;97.02
"C(F)(F)(F)c1ccnc(NCCNC(=O)c2ccco2)n1";MAC-0000006;02;110.19;106.15
"C(F)(F)(F)c1ccnc(NCCNC(c2ccccc2)=O)n1";MAC-0000007;02;107.42;98.46
"C(NCc1ccco1)(=S)Nc(cccn2)c2";MAC-0000008;02;103.86;97.98
"C(F)(F)(F)c1ccnc(NCCNC(=S)Nc(cccn2)c2)n1";MAC-0000009;02;107.77;98.6
"C(=O)(c1cccs1)N(CCCN2CC(O)COc(ccc3C(C)=O)cc3)CC2";MAC-0000010;02;107.41;104.92
"C(F)(F)(F)c1ccnc(NCC=C)n1";MAC-0000011;02;105.78;106.84
"N1(CCNc2ncccc2C(F)(F)F)C(=O)CC3(CCCC3)C1=O";MAC-0000012;02;105.26;103.38
"N1(CCCNc(c(Cl)cc2C(F)(F)F)nc2)C(=O)CC3(CCCC3)C1=O";MAC-0000013;02;102;106.84

Screenshot 3, database query result:
concat('ZINC', lpad(p.sub_id_fk,8,'0')) | target | ligand | conformations | score | time (s)
ZINC00000057 | 1cet | ZINC00000057 | 172 | -7.45 | 3.25
ZINC00000061 | 1cet | ZINC00000061 | 203 | -18.37 | 3.84
ZINC00000066 | 1cet | ZINC00000066 | 241 | -25.58 | 39.92
ZINC00000122 | 1cet | ZINC00000122 | 399 | -14.14 | 7.41
ZINC00000197 | 1cet | ZINC00000197 | 272 | -8.60 | 2.44
ZINC00000290 | 1cet | ZINC00000290 | 259 | -15.00 | 20.40
ZINC00000349 | 1cet | ZINC00000349 | 82 | -10.81 | 22.20
ZINC00000453 | 1cet | ZINC00000453 | 256 | -14.61 | 3.76
ZINC00000484 | 1cet | ZINC00000484 | 447 | -18.33 | 35.53
ZINC00000607 | 1cet | ZINC00000607 | 418 | -15.77 | 7.43

Rapid prototyping using ChemAxon Libraries
• 100% Pure Java (JRE)
o Swing
o JTable
• Using ChemAxon (MarvinBeans) for the chemistry
• OJDBC for the database connection to Oracle
[Architecture diagram: GUI (Swing), Table Module, Chem Module, DB connect, File I/O]

Molecule Rendering
From spreadsheets to molecular spreadsheets
o Overloading the cellRenderer with Marvin
o Switching between SMILES and structure display (on / off)

File Import / Export
• Implemented as a thread
• Comma Separated Files
o CSV Parser
o Preview Window
o Tag missing values
• SDF Molecular Files
o SDF property names as row keys
o Import coordinates
o Based on MolImporter
[Screenshot: Preview]

Smart Indexing for large Collections
[Diagram: Index, FilePointer]
• Large index storing file pointers or database keys
• Java TableModel only stores the full information for a limited number of elements (cache)
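The smart indexing idea on this slide can be illustrated with a short Java sketch. It is not VS Explorer's actual code: the class name IndexedRowStore, the one-row-per-line flat file, and the least-recently-used eviction policy are all assumptions made for illustration. The sketch keeps one file pointer per row in memory and holds the full text of only a limited number of rows.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: index of file pointers plus a bounded cache of rows. */
class IndexedRowStore implements AutoCloseable {
    private final RandomAccessFile file;
    private final List<Long> offsets = new ArrayList<>(); // file pointer of each row
    private final Map<Integer, String> cache;              // full rows, limited in number
    private int diskReads = 0;                             // counts cache misses

    IndexedRowStore(Path path, final int cacheSize) throws IOException {
        file = new RandomAccessFile(path.toFile(), "r");
        // Access-ordered map: the least recently used row is evicted first.
        cache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > cacheSize;
            }
        };
        // Single pass over the file to record where each line starts.
        long pos = 0;
        while (pos < file.length()) {
            offsets.add(pos);
            file.readLine();
            pos = file.getFilePointer();
        }
    }

    int rowCount() { return offsets.size(); }

    int diskReads() { return diskReads; }

    /** Returns row i, reading it from the file only when it is not cached. */
    String row(int i) throws IOException {
        String cached = cache.get(i);
        if (cached != null) return cached;
        file.seek(offsets.get(i));
        String line = file.readLine();
        diskReads++;
        cache.put(i, line);
        return line;
    }

    @Override
    public void close() throws IOException { file.close(); }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("docking", ".csv");
        Files.write(tmp, Arrays.asList(
            "ZINC00000057;172;-7.45",
            "ZINC00000061;203;-18.37",
            "ZINC00000066;241;-25.58"));
        try (IndexedRowStore store = new IndexedRowStore(tmp, 2)) {
            System.out.println(store.rowCount() + " rows indexed");
            System.out.println(store.row(2)); // read from disk
            System.out.println(store.row(2)); // served from the cache
            System.out.println("disk reads: " + store.diskReads());
        } finally {
            Files.delete(tmp);
        }
    }
}
```

In a Swing application, a TableModel's getValueAt(row, column) could delegate to row(i) and split the line into cells; that wiring is left out here. RandomAccessFile.readLine is unbuffered and assumes single-byte characters, so a real implementation for very large files would likely use buffered reads.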
Interactive Focus on Data
[Diagram: Index, FilePointer]
• Large index storing file pointers or database keys
• Java TableModel only stores the full information for a limited number of elements
• Event handler for scrolling triggers a reload from external memory (e.g. a cursor for an RDB)
• Update of the TableModel

Column Sorting
[Diagram: Index, sort(List), Object, FilePointer]
• Event handler starts a sorting thread
• Resorting of the index for flat files
• New database query: + ORDER BY columnLabel
• Coming next:
o Implementation of efficient online sorting algorithms in order to reduce file access
o Merging of two tables

DB Annotator: Semantics for databases
Semantic annotation of relational data
o Linking databases and ontologies
o Using the VS Explorer as a plugin
[Screenshot labels: VS Explorer, Ontology browser]

DHFR Assay for E. coli:
• Folate -> DHF -> THF -> synthesis of thymidine
• Important for cell growth
• DHFR inhibitor: Trimethoprim
[Structure images: DHF, Trimethoprim]
Zolli-Juran M, Cechetto JD, Hartlen R, Daigle DM, Brown ED. High throughput screening identifies novel inhibitors of Escherichia coli dihydrofolate reductase that are competitive with dihydrofolate. Bioorg Med Chem Lett. 2003 Aug 4; 13(15):2493-6.
http://hts.mcmaster.ca/HTSDataMiningCompetition.htm

Docking with FlexX [1]
• PDB structure 1RA2
• Cocrystallized DHFR and NADP
• FlexX places water particles
Zimmermann M, Tresch A, Maass A, Hofmann M. Drilling into a HTS data set of E. coli. Poster, 15th Symposium on QSAR 2004.
[1] Rarey M, Kramer B, Lengauer T and Klebe G, J Mol Biol 1996, 261(3):470-89.
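Returning to the Column Sorting slide: the two strategies it names (resorting the index for flat files, and issuing a new query with ORDER BY for databases) can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: the class IndexSorter, its method names, the in-memory list standing in for the flat file, and the table name "results" are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of the two sorting strategies named on the slide. */
class IndexSorter {

    /**
     * Flat-file case: returns the row numbers ordered by the numeric value
     * in the given column. Only the index is reordered; rows stay in place.
     */
    static List<Integer> sortedIndex(List<String> rows, int column, String sep) {
        final String splitter = Pattern.quote(sep); // treat the separator literally
        List<Integer> index = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) index.add(i);
        index.sort(Comparator.comparingDouble(
            (Integer i) -> Double.parseDouble(rows.get(i).split(splitter)[column])));
        return index;
    }

    /**
     * Database case: appends ORDER BY to a base query. The label is checked
     * against a simple identifier pattern because it is concatenated into SQL.
     */
    static String orderedQuery(String baseQuery, String columnLabel) {
        if (!columnLabel.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("not a plain column name: " + columnLabel);
        }
        return baseQuery + " ORDER BY " + columnLabel;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "ZINC00000057;172;-7.45",
            "ZINC00000061;203;-18.37",
            "ZINC00000066;241;-25.58");
        // Sort by the score column (index 2), lowest score first.
        System.out.println(sortedIndex(rows, 2, ";")); // prints [2, 1, 0]
        System.out.println(orderedQuery("SELECT * FROM results", "score"));
    }
}
```

In a Swing GUI this work would normally run off the event dispatch thread, for example in a javax.swing.SwingWorker, which matches the slide's "sorting thread". Parsing the column from every row during comparison is simple but repeats file access; the "efficient online sorting" item on the slide presumably targets exactly that cost.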
In silico Screening Workflow:
[Workflow diagram: Training Set, Test Set, Docking, Fragment Analysis, QSAR, 2D Similarity Analysis, HTS, MD Simulation, Classification (active / inactive), Activity Region, Candidates]

1CET – Lactate Dehydrogenase of Plasmodium falciparum
Malaria Target:
o Chloroquine binds in the cofactor binding site of Plasmodium falciparum lactate dehydrogenase
o PDB structure: 1CET
o Ligand: Chloro-Quinolin
o Test Ligands: Ambinter data set from ZINC

1CET vs. 50 000 Compounds on 200 Nodes: Global Statistics
• Done: 100%
• Rescheduled: 46
• Running on nodes: 2296 h (96 days)
o Autodock.pl: 2288 h
o Total transfer: 8 h
• Grid time: 205.5 h
o Scheduled: 179 h
o Ready: 78 min
o Waiting: 78 min
o Submitted: 24 h
• Submission script: 36 h
• Time gain of 64 (instead of 200)
• Ideal: 11.5 h

Planning Next Steps
• 2M compounds vs. 1 protein target
o Input: 13 GB
o Output: 2 TB (dlg), 0.5 TB (pdb)
o 12 CPU-years
o Ideal: 3 days with 1350 CPUs
o Reality: a grid of clusters with users, queues, errors…
• Challenges for our application?
o Obtaining 100% of the results
o Minimal processing time
o Grid resource consumption (storage, CPU)
o User interface for the application
o …
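The figures on the Global Statistics and Planning Next Steps slides can be cross-checked with a few lines of arithmetic. The slides do not say how the "time gain of 64" was computed; the sketch below assumes it is the total node time (2296 h) divided by the wall time of the submission script (36 h), and that "Ideal" is the node time spread evenly over 200 nodes. The class and method names are hypothetical.

```java
/** Hypothetical sketch: re-deriving the grid statistics quoted on the slides. */
class GridStats {

    /** Speedup: total CPU time divided by elapsed wall time. */
    static double speedup(double cpuHours, double wallHours) {
        return cpuHours / wallHours;
    }

    /** Wall time if the work were spread perfectly over all nodes. */
    static double idealWallHours(double cpuHours, int nodes) {
        return cpuHours / nodes;
    }

    /** Fraction of the theoretical maximum speedup that was reached. */
    static double efficiency(double speedup, int nodes) {
        return speedup / nodes;
    }

    /** Days needed for a workload given in CPU-years on a number of CPUs. */
    static double idealDays(double cpuYears, int cpus) {
        return cpuYears * 365.0 / cpus;
    }

    public static void main(String[] args) {
        double cpuHours = 2296.0; // "Running on nodes" (slide value)
        double wallHours = 36.0;  // "Submission script" (slide value, assumed wall time)
        int nodes = 200;          // from the slide title

        double gain = speedup(cpuHours, wallHours);
        System.out.printf("time gain: %.1f%n", gain);
        System.out.printf("ideal wall time: %.2f h%n", idealWallHours(cpuHours, nodes));
        System.out.printf("efficiency: %.0f%%%n", 100 * efficiency(gain, nodes));

        // Next steps: 12 CPU-years on 1350 CPUs.
        System.out.printf("ideal duration: %.2f days%n", idealDays(12, 1350));
    }
}
```

Under these assumptions the numbers agree with the slides: 2296 / 36 is about 63.8 (the quoted gain of 64), 2296 / 200 is 11.48 h (the quoted ideal of 11.5 h), and 12 CPU-years on 1350 CPUs is about 3.24 days (the quoted 3 days). The sub-items also add up: 2288 h + 8 h = 2296 h, and 179 h + 24 h + 2 × 78 min is about 205.6 h, close to the quoted grid time.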