Radial Distribution Functions - TERENA Networking Conference 2000

Data Mining in Chemistry
Markus C. Hemmer
Computer-Chemie-Centrum, Universität Erlangen-Nürnberg
D-91054 Erlangen, Germany
C3
© Gasteiger et al.
[vermeer]slides/IR/DataMining.ppt
What is Data Mining?
Data Mining is an analytical process designed to explore large amounts of data in search of consistent patterns and systematic relationships.
"...a non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" (Srikant, Agrawal, 1996)
Amount of Information in Chemistry
[Charts: yearly number of documents in Chemical Abstracts (thousands, 1920 to 2000) and number of registered substances (millions, 1970 to 2000)]
The Chemical Language
[Structure diagram of Dichlophenthion]
Name: Dichlophenthion
Chemical name: Phosphorothioic acid O-2,4-dichlorophenyl O,O-diethyl ester
Molecular formula: C10H13Cl2O3PS
SMILES: ClC(C(=C1)OP(=S)(OCC)OCC)=CC(=C1)Cl
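The SMILES line encodes the same molecule as the name and the formula. To illustrate how a program reads this linear notation, here is a minimal sketch with a hypothetical function `molecular_formula` that is not part of any CCC software. It handles only the unbracketed, non-aromatic "organic subset" of SMILES used in this example (no charges, isotopes, stereochemistry or `%nn` ring labels), adds implicit hydrogens from the default valences, and writes the formula in Hill order. For ring closures it makes a simplifying assumption: a bond symbol written at either end of the ring bond is applied.

```python
import re
from collections import Counter

# Default valences for SMILES "organic subset" atoms (written without brackets).
VALENCES = {"B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
            "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,)}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#()]|\d")

def molecular_formula(smiles):
    """Hill-order formula for a SMILES string in the unbracketed, non-aromatic
    organic subset (hypothetical helper; a sketch, not a full SMILES parser)."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported SMILES syntax in " + smiles)
    atoms, bonds = [], []        # element symbol and explicit bond-order sum per atom
    branches, rings = [], {}     # branch stack; open ring digit -> (atom, bond order)
    prev, pending = None, 1
    for tok in tokens:
        if tok in BOND_ORDER:
            pending = BOND_ORDER[tok]
        elif tok == "(":
            branches.append(prev)
        elif tok == ")":
            prev = branches.pop()
        elif tok.isdigit():
            if tok in rings:     # close the ring: bond back to the opening atom
                other, order = rings.pop(tok)
                order = max(order, pending)  # simplification: use either bond symbol
                bonds[other] += order
                bonds[prev] += order
            else:                # open a ring at the current atom
                rings[tok] = (prev, pending)
            pending = 1
        else:                    # an atom: bond it to the previous atom, if any
            atoms.append(tok)
            bonds.append(0)
            if prev is not None:
                bonds[prev] += pending
                bonds[-1] += pending
            prev, pending = len(atoms) - 1, 1
    if rings:
        raise ValueError("unclosed ring bond in " + smiles)
    counts = Counter(atoms)
    # Implicit H: fill each atom up to its lowest default valence >= its bond sum.
    hydrogens = sum(next((v for v in VALENCES[a] if v >= b), b) - b
                    for a, b in zip(atoms, bonds))
    if hydrogens:
        counts["H"] = hydrogens
    # Hill order: C, then H, then the rest alphabetically (all alphabetical without C).
    order = [e for e in ("C", "H") if e in counts] if "C" in counts else []
    order += sorted(e for e in counts if e not in order)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)
```

Calling `molecular_formula("ClC(C(=C1)OP(=S)(OCC)OCC)=CC(=C1)Cl")` reproduces the formula C10H13Cl2O3PS given on the slide: the phosphorus reaches its valence of 5 through the P=S bond and three P-O bonds, so only the three ring CH groups and the two ethyl groups carry hydrogens.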
Search for Cancerostatic Drugs
protein/substrate complex
similar substrates
Representation of Properties
biological activity
chemical reactivity
[Structure diagram]
Non-linear Projection onto a Torus
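Assuming the projection on this slide is a Kohonen self-organizing map (the software list includes KMAP, a Kohonen network generator), a minimal sketch of such a map with wrap-around, toroidal neighborhoods looks like this. The names `train_torus_som`, `best_unit` and `torus_distance`, the Gaussian neighborhood, and the linear decay schedules are illustrative choices, not KMAP's actual implementation.

```python
import math
import random

def torus_distance(a, b, rows, cols):
    """Euclidean distance between grid cells a=(r, c) and b on a torus (edges wrap)."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return math.hypot(min(dr, rows - dr), min(dc, cols - dc))

def best_unit(weights, x):
    """Grid cell whose weight vector is closest to x (squared Euclidean distance)."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))

def train_torus_som(data, rows=6, cols=6, epochs=40, lr0=0.5, seed=0):
    """Train a Kohonen map on a rows x cols toroidal grid (hypothetical helper)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):        # shuffled copy of the data
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood shrinks over time
            winner = best_unit(weights, x)
            for node, w in weights.items():
                d = torus_distance(node, winner, rows, cols)
                h = lr * math.exp(-d * d / (2 * sigma * sigma))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += h * (x[i] - w[i])        # pull the neuron toward x
            step += 1
    return weights
```

With a hypothetical list of molecular descriptor vectors `descriptor_vectors`, `som = train_torus_som(descriptor_vectors)` followed by `best_unit(som, v)` gives the map cell for each vector `v`. Because the grid wraps around, cells on opposite edges are neighbors, so the map has no border where the neighborhood is truncated.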
Comparison of Steroid Surfaces
Allopregnane-3,20-dione
Pregnane-3,20-dione
Descriptor of a Polycyclic System
Visualization of Multidimensional Data
Research and Projects at the CCC
Evaluation of Reactions
Synthesis Design
Biochemical Pathways
TeleSpec
SOL
Drug Design
VS-C
QSAR/QSPR
ChemVis
Structure/Spectrum Correlation
Dissertation online
Software Development at the CCC
CORINA: 3D structure generator
CACTVS: chemical information system
PETRA: atomic property calculator
EROS: reaction prediction expert system
ARC: descriptor generator
CORA: reaction classification system
KMAP: Kohonen network generator
WODCA: synthesis design expert system
Data Mining Dienst – Chemie
(Data Mining Service – Chemistry)
Substructure Search
Similarity Search
Pattern Recognition
Property Search
Diversity Search
Pattern Analysis
Information Sources
[Slide graphic: an equation with the terms (1/X) ∂²X/∂x², (1/Y) ∂²Y/∂y² and (1/Z) ∂²Z/∂z²]
Analysis
Calculation
Databases
Simulation
The Concept of Data Mining Service - Chemistry
Descriptor Software
Searching a Substructure
substructure search
Acknowledgements
Team Coordination
Prof. Dr. Johann Gasteiger
Chemical Information
Dr. Thomas Engel
Databases & Visualization
Dr. Wolf-Dietrich Ihlenfeldt
Frank Oellien
Expert Systems
Achim Herwig
Genetic Algorithms
Dr. Sandra Handschuh
Neural Networks
Dr. Andreas Teckentrup
Dr. Lothar Terfloth
Spectroscopy
Dr. Paul Selzer
Thomas Kostka
Structures & Properties
Thomas Kleinöder
Christof Schwab
Structure Coding
Dr. Joao Aires de Sousa
Dr. Valentin Steinhauer
Synthesis Planning
Dr. Matthias Pförtner
Markus Sitzmann
Contact Information
Email:
[email protected]
[email protected]
WWW: http://www2.chemie.uni-erlangen.de