Interactive Datamining of Large-Scale Screening Datasets

Frank Oellien, Wolf D. Ihlenfeldt
Computer-Chemie-Centrum, University Erlangen-Nuremberg
Klaus Engel, Thomas Ertl
Visualization and Interactive Systems Group, University Stuttgart

C3 © Oellien, Ihlenfeldt, Engel, Ertl MMWS 2002

Overview
Multi-variate and multi-dimensional datasets
• Motivation
• Information Visualization Techniques
• Examples (ChemCodes Inc., NCI)
• Demo

Chemical data
[Bar chart "Current datasets": vertical axis from 0 to 18,000,000; databases shown: Merck Katalog, Synopsys PG, ACX, NCI DTP, ChemInform, Spresi, Beilstein, CAS]

Multi-Variate and Multi-Dimensional Numeric Datasets Today
Change in chemical synthesis technology
• new technologies (HTS, combinatorial synthesis)
• experiments generate terabytes of data per year
• development of data mining and visualization tools could not keep pace
• most critical bottleneck in R&D today!
Tools for interactive mining and information visualization are needed

Tools for Interactive Visualization of Multi-Variate and Multi-Dimensional Data
Standard applications
• bar charts, 2D and pseudo-3D scatter plots, molecular spreadsheets
• limited to small subsets
• platform-dependent
Our goal: applications that
• are simple to use
• allow straightforward interpretation of results
• provide generalized access to tabular numeric data
• are platform-independent

3D Tools for Interactive Information Visualization
Information visualization applications that use the 3D capabilities of modern clients
• Glyph-based InfVis approaches
• Volume-based InfVis approaches

Glyph-based InfVis Tools
• 3 orthogonal axes
• color
• shape
• size
• transparency
• surface effects
• animation
• up to ~100 Glyphs

Java/Java3D InfVis Applet
[Screenshot labels: Java3D Canvas; Tool Panel (filters, selection tools, details); Control Panel]

Java/Java3D InfVis Applet
3D Render Panel
[Screenshot labels: 3D Glyphs; 3D Barchart]

Java/Java3D InfVis Applet
3D Tool Panel
[Screenshot labels: Dynamic Filter Tools; Selection Tools; Detail Tools]

Java/Java3D InfVis Applet
3D Control Panel

Advantages of Volume-based InfVis Tools
Databases with millions of data points
– Glyph-based InfVis approaches
  • produce millions of geometric primitives
  • interactive visualization not possible
– Volume-based InfVis approaches
  • can handle large numbers of data points
  • interactive visualization using low-cost graphics hardware is possible

ChemCodes Reaction Database
• 100 most important FGs (functional groups), ~75% of chemistry
• 100 standard reactions
• Limits of standard reactions
• Functional Group Compatibility
• Generating Rules
Goal: Analysis of the reaction space

ChemCodes - Reaction Optimization I
• Goal: reaction optimization, > 95% yield
• 7 dimensions: reagent, solvent, time, temperature, stoichiometry, reagent order, FG compatibility

ChemCodes - Reaction Optimization II

ChemCodes - Reaction Planning
Functional Group Compatibility Check
[Chemical structure diagram]

Example 2: NCI Anti-tumor / Anti-viral Database
• Initiated in April 1990 (modified 1994)
• ~250,000 compounds
• ~30,000 with anti-tumor screening data
Enhanced NCI Database Browser
• > 30 different molecular properties
• up to 23 3D conformers per compound

Lead Compound Discovery II

Acknowledgment
• Prof. Johann Gasteiger, Computer-Chemie-Centrum, University of Erlangen-Nuremberg
• Prof. Thomas Ertl, Dipl. Inf. Klaus Engel, Visualization and Interactive Systems, University of Stuttgart
• Dr. Patrick Kiser, Dr. Gary Eichenbaum, ChemCodes Inc.
• Marc Nicklaus, Laboratory of Medicinal Chemistry, NCI, NIH
• Deutsche Forschungsgemeinschaft