Interactive Datamining of Large-Scale
Screening Datasets
Frank Oellien, Wolf D. Ihlenfeldt
Computer-Chemie-Centrum
University of Erlangen-Nuremberg
Klaus Engel, Thomas Ertl
Visualization and Interactive Systems Group
University of Stuttgart
C3 © Oellien, Ihlenfeldt, Engel, Ertl
MMWS 2002
Overview
Multi-variate and multi-dimensional datasets
• Motivation
• Information Visualization Techniques
• Examples (ChemCodes Inc., NCI)
• Demo
Chemical data
[Bar chart: sizes of current datasets (Merck Katalog, Synopsys PG, ACX, NCI DTP, ChemInform, Spresi, Beilstein, CAS); axis from 0 to 18,000,000]
Multi-Variate and Multi-Dimensional
Numeric Datasets Today
Change in chemical synthesis technology
• new technologies (HTS, combinatorial synthesis)
→ experiments generate terabytes of data per year
• development of data mining and visualization tools could not keep pace
• most critical bottleneck in R&D today!
→ tools for interactive mining and information visualization are needed
Tools for Interactive Visualization of
Multi-Variate and Multi-Dimensional Data
Standard applications
• bar charts, 2D and pseudo-3D scatter plots, molecular spreadsheets
• limited to small subsets
• platform-dependent
Our goal: applications that
• are simple to use
• allow straightforward interpretation of results
• provide generalized access to tabular numeric data
• are platform-independent
3D Tools for Interactive
Information Visualization
Information visualization applications that use the
3D capabilities of modern clients
• Glyph-based InfVis approaches
• Volume-based InfVis approaches
Glyph-based InfVis Tools
• 3 orthogonal axes
• color
• shape
• size
• transparency
• surface effects
• animation
• up to ~100 Glyphs
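The channel list above can be read as a mapping from data columns to glyph attributes. The following is a minimal sketch of that idea, not the applet's actual code: the `Glyph` class, the `SHAPES` list, the `make_glyphs` function, and the size range are all hypothetical names and values chosen for illustration, and color, surface effects, and animation are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One data record encoded through several visual channels."""
    x: float
    y: float
    z: float
    size: float
    alpha: float
    shape: str


# Hypothetical shape vocabulary; a real tool would define its own.
SHAPES = ["sphere", "cube", "cone"]


def normalize(value, lo, hi):
    """Scale value into [0, 1]; a constant column maps to 0.5."""
    if hi == lo:
        return 0.5
    return (value - lo) / (hi - lo)


def make_glyphs(rows, mapping):
    """Map each row (a dict of numeric variables) to one glyph.

    mapping assigns a column name to each visual channel, for example
    {"x": "time", "y": "temperature", ...}.
    """
    # Pre-compute the value range of every column that is used.
    ranges = {}
    for col in set(mapping.values()):
        values = [row[col] for row in rows]
        ranges[col] = (min(values), max(values))

    def channel(row, name):
        col = mapping[name]
        return normalize(row[col], *ranges[col])

    glyphs = []
    for row in rows:
        # Quantize the normalized value into one of the available shapes.
        shape_index = min(int(channel(row, "shape") * len(SHAPES)),
                          len(SHAPES) - 1)
        glyphs.append(Glyph(
            x=channel(row, "x"),
            y=channel(row, "y"),
            z=channel(row, "z"),
            size=0.2 + 0.8 * channel(row, "size"),  # keep small glyphs visible
            alpha=channel(row, "alpha"),
            shape=SHAPES[shape_index],
        ))
    return glyphs
```

Because every record becomes its own geometric object, this kind of encoding scales with the number of rows, which is consistent with the limit on the number of glyphs noted above.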
Java/Java3D InfVis Applet
[Screenshot labels: Java3D canvas; tool panel (filters, selection tools, details); control panel]
Java/Java3D InfVis Applet
3D Render Panel
3D Glyphs
3D Barchart
Java/Java3D InfVis Applet
3D Tool Panel
Dynamic Filter Tools
Selection Tools
Detail Tools
Java/Java3D InfVis Applet
3D Control Panel
Advantages of Volume-based InfVis Tools
Databases with millions of data points
– Glyph-based InfVis approaches
• produce millions of geometric primitives
• interactive visualization not possible
– Volume-based InfVis approaches
• can handle a large number of data points
• interactive visualization using low-cost graphics hardware is possible
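One way to see why a volume representation scales better is to accumulate the data points into a fixed-resolution density grid. The sketch below is an assumed illustration only: the slides do not describe how the volume is built, and the `voxelize` function and its parameters are hypothetical.

```python
def voxelize(points, resolution):
    """Accumulate 3D points with coordinates in [0, 1] into a density grid.

    Returns a dict mapping (i, j, k) voxel indices to point counts. The
    number of entries is bounded by resolution ** 3, no matter how many
    data points are supplied.
    """
    grid = {}
    for x, y, z in points:
        # Clamp so that a coordinate of exactly 1.0 falls in the last voxel.
        index = tuple(min(int(c * resolution), resolution - 1)
                      for c in (x, y, z))
        grid[index] = grid.get(index, 0) + 1
    return grid
```

Under this assumption the rendering cost depends on the grid resolution rather than on the number of records, whereas a glyph-based view needs at least one geometric primitive per record.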
ChemCodes Reaction Database
• 100 most important functional groups (FGs) cover ~75% of chemistry
• 100 standard reactions
• Limits of standard reactions
• Functional Group Compatibility
• Generating Rules
Goal: Analysis of the reaction space
ChemCodes - Reaction Optimization I
• Goal: reaction optimization, > 95% yield
• 7 dimensions: reagent, solvent, time, temperature, stoichiometry, reagent order, FG compatibility
ChemCodes - Reaction Optimization II
ChemCodes - Reaction Planning
Functional Group Compatibility Check
[Figure: chemical structure diagram]
Example 2: NCI Anti-tumor / Anti-viral Database
• Initiated in April 1990 (modified 1994)
• ~250,000 compounds
• ~30,000 with anti-tumor screening data
Enhanced NCI Database Browser
• > 30 different molecular properties
• up to 23 3D conformers per compound
Lead Compound Discovery II
Lead Compound Discovery II
Acknowledgment
• Prof. Johann Gasteiger
Computer-Chemie-Centrum
University of Erlangen-Nuremberg
• Prof. Thomas Ertl, Dipl. Inf. Klaus Engel
Visualization and Interactive Systems Group
University of Stuttgart
• Dr. Patrick Kiser, Dr. Gary Eichenbaum
ChemCodes Inc.
• Marc Nicklaus
Laboratory of Medicinal Chemistry
NCI, NIH
• Deutsche Forschungsgemeinschaft