Interactive Datamining of Large-Scale Screening Datasets
Frank Oellien, Wolf D. Ihlenfeldt
Computer-Chemie-Centrum
University of Erlangen-Nuremberg
Klaus Engel, Thomas Ertl
Visualization and Interactive Systems Group
University of Stuttgart
C3 © Oellien, Ihlenfeldt, Engel, Ertl
MMWS 2002
Overview
Multi-variate and multi-dimensional datasets
• Motivation
• Information Visualization Techniques
• Examples (ChemCodes Inc., NCI)
• Demo
Chemical data
(figure: bar chart "Current datasets" comparing the number of entries in Merck Katalog, Synopsys PG, ACX, NCI DTP, ChemInform, Spresi, Beilstein and CAS; axis from 0 to 18,000,000)
Multi-Variate and Multi-Dimensional Numeric Datasets Today
Change in chemical synthesis technology
• new technologies (HTS, combinatorial synthesis)
→ experiments generate terabytes of data per year
• development of data mining and visualization tools could not keep pace
• most critical bottleneck in R&D today!
→ tools for interactive mining and information visualization are needed
Tools for Interactive Visualization of Multi-Variate and Multi-Dimensional Data
Standard applications
• bar charts, 2D and pseudo-3D scatter plots, molecular spreadsheets
• limited to small subsets
• platform-dependent
Our goal: applications that
• are simple to use
• allow straightforward interpretation of results
• provide generalized access to tabular numeric data
• are platform-independent
3D Tools for Interactive Information Visualization
Information visualization (InfVis) applications that use the 3D capabilities of modern clients
• Glyph-based InfVis approaches
• Volume-based InfVis approaches
Glyph-based InfVis Tools
• 3 orthogonal axes
• color
• shape
• size
• transparency
• surface effects
• animation
• up to ~100 Glyphs
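The slides list the visual channels of a glyph but not how data columns are assigned to them. A minimal sketch, assuming each channel is driven by exactly one numeric column rescaled to [0, 1] with min-max normalization; the names `Glyph`, `normalize` and `build_glyphs` and the property columns in the example are hypothetical and not taken from the applet:

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Glyph:
    # All attributes are normalized to [0, 1]; a renderer would map them
    # to positions, a colour ramp, a scale factor and an alpha value.
    x: float
    y: float
    z: float
    color: float
    size: float
    transparency: float


def normalize(values: Sequence[float]) -> List[float]:
    """Min-max scale a column to [0, 1]; a constant column maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def build_glyphs(table: Dict[str, Sequence[float]],
                 mapping: Dict[str, str]) -> List[Glyph]:
    """Create one glyph per row; `mapping` maps each Glyph field to a column name."""
    scaled = {attr: normalize(table[col]) for attr, col in mapping.items()}
    n_rows = len(next(iter(scaled.values())))
    return [Glyph(**{attr: scaled[attr][i] for attr in scaled})
            for i in range(n_rows)]


# Hypothetical molecular-property columns for three compounds.
table = {
    "logP": [1.0, 3.0, 2.0],
    "mw": [180.0, 300.0, 240.0],
    "tpsa": [20.0, 60.0, 40.0],
    "activity": [1.0, 9.0, 5.0],
    "rot_bonds": [2.0, 6.0, 4.0],
    "hbd": [0.0, 2.0, 1.0],
}
mapping = {"x": "logP", "y": "mw", "z": "tpsa", "color": "activity",
           "size": "rot_bonds", "transparency": "hbd"}
glyphs = build_glyphs(table, mapping)
print(glyphs[2])
```

A missing or extra key in `mapping` makes the `Glyph(...)` call raise a `TypeError`, which is deliberate in this sketch: every channel has to be driven by exactly one column.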
Java/Java3D InfVis Applet
(figure: applet layout with Java3D canvas, tool panel for filters, selection tools and details, and control panel)
Java/Java3D InfVis Applet
3D Render Panel
(figure: 3D glyphs and 3D bar chart)
Java/Java3D InfVis Applet
3D Tool Panel
Dynamic Filter Tools
Selection Tools
Detail Tools
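Dynamic filter tools of this kind typically hide every data point whose value leaves a slider-controlled range. A minimal sketch of such a range filter, assuming one closed interval per column and combining all active filters with a logical AND; `visible_mask` and the column names are hypothetical:

```python
from typing import Dict, List, Sequence, Tuple

Range = Tuple[float, float]


def visible_mask(table: Dict[str, Sequence[float]],
                 filters: Dict[str, Range]) -> List[bool]:
    """Return one flag per row: True if the row lies inside every active range."""
    n_rows = len(next(iter(table.values())))
    mask = [True] * n_rows
    for column, (lo, hi) in filters.items():
        values = table[column]
        for i in range(n_rows):
            # `not` binds looser than the chained comparison.
            if mask[i] and not lo <= values[i] <= hi:
                mask[i] = False
    return mask


table = {"mw": [180.0, 300.0, 240.0], "logP": [1.0, 3.0, 2.0]}
print(visible_mask(table, {"mw": (200.0, 320.0)}))
print(visible_mask(table, {"mw": (200.0, 320.0), "logP": (0.0, 2.5)}))
```

Moving a slider changes only one entry in `filters`; in this sketch a renderer would then simply show or hide existing glyphs according to the new mask instead of rebuilding them.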
Java/Java3D InfVis Applet
3D Control Panel
Advantages of Volume-based InfVis Tools
Databases with millions of data points
– Glyph-based InfVis approaches
• produce millions of geometric primitives
• interactive visualization is not possible
– Volume-based InfVis approaches
• can handle a large number of data points
• interactive visualization using low-cost graphics hardware is possible
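The slides do not say how the data points are turned into a volume. One common approach, shown here only as an assumption, is to bin three chosen columns into a regular 3D grid of counts (a density volume) that a volume renderer can then display, for example as a 3D texture; `voxel_index` and `bin_points` are hypothetical names:

```python
from typing import List, Optional, Sequence, Tuple

Grid = List[List[List[int]]]


def voxel_index(value: float, lo: float, hi: float, res: int) -> Optional[int]:
    """Map a value in [lo, hi] to a cell index in 0..res-1; None if out of range."""
    if not lo <= value <= hi:
        return None
    # The upper bound itself is clamped into the last cell.
    return min(int((value - lo) / (hi - lo) * res), res - 1)


def bin_points(points: Sequence[Tuple[float, float, float]], res: int,
               bounds: Sequence[Tuple[float, float]]) -> Grid:
    """Count how many points fall into each cell of a res x res x res grid."""
    grid = [[[0] * res for _ in range(res)] for _ in range(res)]
    for point in points:
        idx = [voxel_index(v, lo, hi, res) for v, (lo, hi) in zip(point, bounds)]
        if None in idx:
            continue  # skip points outside the chosen bounds
        i, j, k = idx
        grid[i][j][k] += 1
    return grid


points = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.9, 0.9),
          (1.0, 1.0, 1.0), (2.0, 0.5, 0.5)]
grid = bin_points(points, res=2, bounds=[(0.0, 1.0)] * 3)
print(grid[0][0][0], grid[1][1][1])
```

The grid always holds res³ cells however many points are binned, so the amount of data handed to the renderer depends on the chosen resolution rather than on the number of compounds.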
ChemCodes Reaction Database
• 100 most important functional groups (FGs) cover ~75% of chemistry
• 100 standard reactions
• Limits of standard reactions
• Functional Group Compatibility
• Generating Rules
Goal: Analysis of the reaction space
ChemCodes - Reaction Optimization I
• Goal: reaction optimization, > 95% yield
• 7 dimensions: reagent, solvent, time, temperature, stoichiometry, reagent order, FG compatibility
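Each experiment can be viewed as one point in the seven-dimensional condition space of this slide, with the yield as the quantity to optimize. A minimal sketch, assuming experiments are stored as flat records; the `Experiment` fields, units, reagent names and values are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Experiment:
    reagent: str
    solvent: str
    time_h: float                   # reaction time in hours
    temperature_c: float            # temperature in degrees Celsius
    stoichiometry: float            # equivalents of reagent
    reagent_order: Tuple[str, ...]  # order in which components are added
    fg_compatible: bool             # outcome of a functional group check
    yield_pct: float


def high_yield(experiments: List[Experiment],
               threshold: float = 95.0) -> List[Experiment]:
    """Keep experiments whose yield exceeds the threshold, best first."""
    hits = [e for e in experiments if e.yield_pct > threshold]
    return sorted(hits, key=lambda e: e.yield_pct, reverse=True)


runs = [
    Experiment("reagent_A", "THF", 2.0, 25.0, 1.1, ("substrate", "reagent_A"), True, 97.0),
    Experiment("reagent_A", "DMF", 2.0, 60.0, 1.1, ("reagent_A", "substrate"), True, 88.0),
    Experiment("reagent_B", "THF", 4.0, 0.0, 1.5, ("substrate", "reagent_B"), True, 99.0),
]
for run in high_yield(runs):
    print(run.reagent, run.solvent, run.yield_pct)
```

In an interactive tool the same selection would be made by filtering on the yield axis and inspecting the remaining points along the other six dimensions.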
ChemCodes - Reaction Optimization II
ChemCodes - Reaction Planning
Functional Group Compatibility Check
(figure: structure diagram)
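A functional group compatibility check can be reduced to a lookup: for a planned reaction, report which functional groups of the substrate are known to interfere. A minimal sketch, assuming a hand-maintained table; the single entry below is a hypothetical illustration (Grignard reagents react with acidic O-H and N-H protons) and not the ChemCodes rule base:

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical, illustrative table: reaction -> functional groups that interfere.
INCOMPATIBLE: Dict[str, FrozenSet[str]] = {
    "grignard_addition": frozenset({"alcohol", "primary_amine", "carboxylic_acid"}),
}


def check_compatibility(reaction: str, groups: Set[str]) -> List[str]:
    """Return the substrate's functional groups that conflict with `reaction`.

    Raises KeyError for reactions missing from the table, so that an unknown
    reaction is never silently reported as compatible.
    """
    return sorted(groups & INCOMPATIBLE[reaction])


print(check_compatibility("grignard_addition", {"ether", "alcohol", "primary_amine"}))
```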
Example 2: NCI Anti-tumor / Anti-viral Database
• Initiated in April 1990 (modified 1994)
• ~250,000 compounds
• ~30,000 with anti-tumor screening data
Enhanced NCI Database Browser
• > 30 different molecular properties
• up to 23 3D conformers per compound
Lead Compound Discovery II
Acknowledgment
• Prof. Johann Gasteiger
Computer-Chemie-Centrum
University of Erlangen-Nuremberg
• Prof. Thomas Ertl, Dipl.-Inf. Klaus Engel
Visualization and Interactive Systems Group
University of Stuttgart
• Dr. Patrick Kiser, Dr. Gary Eichenbaum
ChemCodes Inc.
• Marc Nicklaus
Laboratory of Medicinal Chemistry
NCI, NIH
• Deutsche Forschungsgemeinschaft