Data Mining in Chemistry
Markus C. Hemmer
Computer-Chemie-Centrum, Universität Erlangen-Nürnberg
D-91054 Erlangen, Germany
C3 © Gasteiger et al. [vermeer]slides/IR/DataMining.ppt

What is Data Mining?
Data Mining is an analytical process designed to explore large amounts of data in search of consistent patterns and systematic relationships.
"...a non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" (Srikant, Agrawal, 1996)

Amount of Information in Chemistry
[Figure: yearly number of documents in Chemical Abstracts (thousands) and number of registered substances (millions), plotted by year]

The Chemical Language
[Figure: structural formula]
Common name: Dichlofenthion
Systematic name: Phosphorothioic acid O-2,4-dichlorophenyl O,O-diethyl ester
Molecular formula: C10H13Cl2O3PS
SMILES: ClC(C(=C1)OP(=S)(OCC)OCC)=CC(=C1)Cl

Search for Cancerostatic Drugs
[Figure: protein/substrate complex; similar substrates]

Representation of Properties
[Figure: biological activity; chemical reactivity]

Non-linear Projection onto a Torus

Comparison of Steroid Surfaces
[Figure: molecular surfaces of 3,20-allopregnanedione and 3,20-pregnanedione]

Descriptor of a Polycyclic System

Visualization of Multidimensional Data

Research and Projects at the CCC
Evaluation of Reactions
Synthesis Design
Biochemical Pathways
TeleSpec
SOL
Drug Design
VS-C
QSAR/QSPR
ChemVis
Structure/Spectrum Correlation
Dissertation online

Software Development at the CCC
CORINA: 3D structure generator
CACTVS: chemical information system
PETRA: atomic property calculator
EROS: reaction prediction expert system
ARC: descriptor generator
CORA: reaction classification system
KMAP: Kohonen network generator
WODCA: synthesis design expert system

Data Mining Dienst – Chemie (Data Mining Service – Chemistry)
Substructure Search
Similarity Search
Pattern Recognition
Property Search
Diversity Search
Pattern Analysis

Information Sources
[Figure: Analysis, Calculation, Databases, Simulation]

The Concept of Data Mining Service – Chemistry

Descriptor Software

Searching a Substructure
[Figure: substructure search]

Acknowledgements
Team Coordination: Prof. Dr. Johann Gasteiger
Chemical Information: Dr. Thomas Engel
Databases & Visualization: Dr. Wolf-Dietrich Ihlenfeldt, Frank Oellien
Expert Systems: Achim Herwig
Genetic Algorithms: Dr. Sandra Handschuh
Neural Networks: Dr. Andreas Teckentrup, Dr. Lothar Terfloth
Spectroscopy: Dr. Paul Selzer, Thomas Kostka
Structures & Properties: Thomas Kleinöder, Christof Schwab
Structure Coding: Dr. Joao Aires de Sousa, Dr. Valentin Steinhauer
Synthesis Planning: Dr. Matthias Pförtner, Markus Sitzmann

Contact Information
Email: [email protected], [email protected]
WWW: http://www2.chemie.uni-erlangen.de
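The Chemical Language slide shows one compound, dichlofenthion, as a structure diagram, a name, a molecular formula, and a SMILES string. To illustrate that the SMILES line notation already carries the formula, here is a minimal sketch that derives a Hill-order formula from SMILES. It is a simplified parser written for this example, assuming Kekulé SMILES in the organic subset (no bracket atoms, charges, lowercase aromatic atoms, or two-digit ring labels) and filling implicit hydrogens from the default valences. `molecular_formula` is a hypothetical helper, not a function of CACTVS or any other CCC package.

```python
from collections import Counter

# Allowed valences of the SMILES "organic subset"; implicit hydrogens fill
# each atom up to the lowest allowed valence that is >= its explicit bonds.
DEFAULT_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}
BOND_ORDERS = {"-": 1, "=": 2, "#": 3}


def molecular_formula(smiles: str) -> str:
    """Hill-order formula for a Kekulé, organic-subset SMILES (hypothetical helper)."""
    symbols = []   # element symbol of each heavy atom
    used = []      # sum of explicit bond orders per atom
    branches = []  # atoms to return to when a branch ')' closes
    rings = {}     # open ring label -> (atom index, bond order or None)
    prev, bond = None, None
    i = 0
    while i < len(smiles):
        ch, two = smiles[i], smiles[i:i + 2]
        if two in ("Cl", "Br") or ch in "BCNOPSFI":
            sym = two if two in ("Cl", "Br") else ch
            symbols.append(sym)
            used.append(0)
            atom = len(symbols) - 1
            if prev is not None:  # bond to the previous atom, single by default
                order = bond or 1
                used[prev] += order
                used[atom] += order
            prev, bond = atom, None
            i += len(sym)
            continue
        if ch in BOND_ORDERS:
            bond = BOND_ORDERS[ch]
        elif ch == "(":
            branches.append(prev)
        elif ch == ")":
            prev = branches.pop()
        elif ch.isdigit():
            if ch in rings:  # second occurrence of the label closes the ring bond
                other, other_bond = rings.pop(ch)
                order = bond or other_bond or 1
                used[prev] += order
                used[other] += order
            else:            # first occurrence opens it
                rings[ch] = (prev, bond)
            bond = None
        else:
            raise ValueError(f"unsupported SMILES character {ch!r}")
        i += 1
    if branches or rings:
        raise ValueError("unbalanced branch or unclosed ring bond")

    counts = Counter(symbols)
    for sym, v in zip(symbols, used):
        fits = [val for val in DEFAULT_VALENCES[sym] if val >= v]
        counts["H"] += fits[0] - v if fits else 0
    # Hill order: C, then H, then the rest alphabetically (all alphabetical without C).
    if "C" in counts:
        order = ["C", "H"] + sorted(k for k in counts if k not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "")
                   for el in order if counts[el] > 0)


print(molecular_formula("ClC(C(=C1)OP(=S)(OCC)OCC)=CC(=C1)Cl"))  # C10H13Cl2O3PS
```

Applied to the dichlofenthion SMILES from the slide, it prints C10H13Cl2O3PS, matching the formula given there. A real cheminformatics toolkit would additionally handle bracket atoms, charges, aromatic notation, and stereo descriptors.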
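The Data Mining Service slide lists substructure search and similarity search without saying how they are computed. A common approach, assumed here rather than taken from the slides, encodes each structure as a binary fingerprint and ranks database entries by the Tanimoto coefficient T(A, B) = |A ∩ B| / |A ∪ B|. For substructure-key or path-based fingerprints, the same bits also give a quick substructure pre-screen: a molecule that contains the query substructure must have every query bit set. The 8-bit fingerprints below are made-up illustration data, and the names `tanimoto`, `similarity_search`, and `substructure_screen` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical 8-bit fingerprints; a real system would derive these from
# structural keys or hashed paths of each molecule.
DATABASE: Dict[str, int] = {
    "mol_a": 0b10110110,
    "mol_b": 0b10110010,
    "mol_c": 0b01001001,
}


def popcount(x: int) -> int:
    """Number of set bits (portable alternative to int.bit_count())."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient |A & B| / |A | B|; defined as 0.0 for two empty prints."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def similarity_search(query: int, db: Dict[str, int],
                      threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (name, score) pairs with score >= threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in db.items()]
    hits = [(name, s) for name, s in scored if s >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def substructure_screen(query: int, db: Dict[str, int]) -> List[str]:
    """Keep candidates whose fingerprint contains every bit of the query.

    This is only a pre-filter: survivors still need an atom-by-atom
    substructure match to confirm the hit.
    """
    return [name for name, fp in db.items() if (fp & query) == query]


if __name__ == "__main__":
    query = 0b10110100
    print(similarity_search(query, DATABASE))    # [('mol_a', 0.8), ('mol_b', 0.6)]
    print(substructure_screen(query, DATABASE))  # ['mol_a']
```

The screen is deliberately one-sided: a failed bit test rules a molecule out cheaply, while a passed test only means the expensive graph match is worth running.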