Computational Chemistry Robots
ACS Sep 2005
J. A. Townsend, P. Murray-Rust, S. M. Tyrrell, Y. Zhang
[email protected]

Questions addressed
• Can high-throughput computation provide a reliable “experimental” resource for molecular properties?
• Can protocols be automated?
• Can we believe the results?

Aspects of complete automation
• Humans must validate protocols rather than individual data
• Low rates of error must be addressed
• Users should know the rates of error and degree of conformance

Approaches to conformance
• Explore limits of job behaviour (times, convergence, etc.)
• Analyse reproducibility
• Vary and analyse effects of parameters and algorithms
• Compare output with other “measurements” of same quantity

The overall view
[Diagram labels: molecules, computation, dissemination, check results]

Components of System
• Workflow for management of jobs (Taverna)
• Natural Language Processing based parsing of outputs (JUMBOMarker)
• Pairwise comparison of data sets (R)
• Analysis of mean and variance
• Detection and analysis of outliers

Computing the NCI database
• MOPAC PM5 (a)
(a) MOPAC PM5: collaboration with J.J.P. Stewart
[Flowchart labels: Unsuitable Data, Program Crashes, Pathological Behaviour, Inform Developer, Protocol, System Crashes, Statistics, Science Errors, Log Files, Parse, Analysis, Other Science, Disseminate Results]

Taverna
• Workflow programs allow a series of small tasks to be linked together to develop more complex tasks
• Open Source
• myGRID, eScience
• European Bioinformatics Institute
• University of Manchester

An Example Taverna Workflow
[Figure: screenshot of a Taverna workflow]

Computational Chemistry Log Files
Parsing Log Files to CML:
• Coordinates
• Calculation Type
• Molecular Formula
• Point Group
• Total Energy
• Dipole

CompChem Output Parsers
[Diagram labels, three per row in reading order, under the heading "CML File"]
• Input/jobControl, General, CMLCore
• Coordinates, Coordinates, CMLCore
• Energy Levels, Energy Level, CMLComp
• Vibrations, Vibration, CMLSpect

Dissemination of results
[Diagram labels: LOG FILE; JUMBOMarker (NLP-based log file parser); CML FILE; HUMAN DISPLAY; WWMM* Server and DSpace; Outside world]
* WWMM: World Wide Molecular Matrix

InChI: IUPAC International Chemical Identifier
• A non-proprietary unique identifier for the representation of chemical structures.
• A normal, canonicalised and serialised form of a chemical connection table.
InChI FAQ: http://wwmm.ch.cam.ac.uk/inchifaq/

Proteus molecules*
[Diagram labels: Input, JUNK, Calculation, Cured by MOPAC]
* Proteus was a shape-changing ocean deity

How do we know our results are valid?
[Diagram labels: Computational Method 1, Computational Method 2, Experiment]

J.J.P. Stewart’s example
[Plot: difference (calculated ΔHf minus experimental ΔHf, kcal mol⁻¹; axis from -50 to 50) against compound (axis from 0 to 1200)]

GAMESS
[Diagram labels: MOPAC results; GAMESS (a), 6-31G* B3LYP; Log Files]
(a) Project with Kim Baldridge and Wibke Sudholt
[The protocol flowchart from "Computing the NCI database" is shown again on this slide.]

Repeat runs, different methods
• Multiple runs give same final structure from same input
• Changing memory allocation doesn’t make a difference

Pathological behaviour - Early detection
[Diagram labels in reading order: divinyl ether; 100 min; trans-crotonaldehyde; 6-31G*, B3LYP; 200 min; Z matrix; 15 min; 6-31G*, B3LYP; 10080 min]

Times to run jobs
[Plot: time / s (axis from 0 to 120,000) against (n basis functions)^4 (axis from 0 to 1×10^9)]

Analysis of different computational methods
• Mean - Overall difference
• Normality - Distribution of values
• Outliers - Unusual molecules?
• Variance - Spread of the data, depends on both distributions (standard deviation)

Probability Plot (Normal QQ plot)
[Plot annotations]
• Mean of distribution (Approx - 0.03 Å)
• Range over which sample distribution is approximately normal
• S.D. 0.020 Å
• Outliers

All bonds* Δr (MOPAC – GAMESS) / Å
• Good agreement
• S.D. 0.005 Å
• Nearly normal
• Outliers
* Excludes bonds to hydrogen

Bad molecules and data usually cause outliers
[Figure: chemical structure drawings; atom labels in the extraction include H, N, O, P and Na]

Mean Δr (M - G) / Å and Standard Error of the Mean / Å
[Table; the cell layout was lost in extraction. Labels: C, N, O and C, N, O, F, S, Cl. Values in extraction order: -0.006, 0.020, -0.010, -0.014, -0.040, -0.037; 0.000, 0.000, 0.000, 0.001, 0.001, 0.001; 0.006, -0.037, -0.055; 0.001, 0.001, 0.009; -0.087, -0.070; 0.004, 0.014]
All values given to 3 significant figures

Δr CC bonds (M - G) / Å
• Good agreement
• S.D. 0.013 Å
• Nearly normal
• Outliers
• JUNK

Selection of molecules with C-C Δr (M - G) > 0.05 Å
[Figure: chemical structure drawings; substituent labels in the extraction include CF3 and CHF2]

Non aromatic C-C bonds adjacent to CFn
[Plot with fitted line Y = 0.0277 X – 0.0061]

Δr NN bonds (M - G) / Å
• S.D. 0.022 Å
• Good agreement
• Nearly normal
• Kink

Density plot of Δr NN bonds (M - G) / Å
[Plot with two regions labelled LEFT and RIGHT]

Most common fragments found in Left set but not Right set
[Figure: fragment drawings; atom labels in reading order: N(ar), S(sp2), N(sp3), C(sp2), N, C(sp3), N(ar) or C(sp3), N(ar), S(sp2), N(ar), C(sp2)]

Comparison of theory and experiment
[Diagram labels: CIF* (several files), CIF 2 CML, GAMESS, Log Files]
* CIF: Crystallographic Information File

Reading Acta Crystallographica Section E
[Figure]

All bonds* Δr (Cryst. – GAMESS) / Å
• Single molecules, no disorder
• Mean Δr - 0.011 Å
• S.D. 0.014 Å
• Nearly normal
• Outliers
* Excludes bonds to hydrogen

Δr CC bonds (C – G) / Å
• Mean Δr - 0.01 Å
• S.D. 0.009 Å
• Nearly normal

Δr CO bonds (C – G) / Å
• S.D. 0.011 Å
• Good agreement
• Nearly normal
• Outliers ?

Chemistry can cause outliers
• Δr = +0.08 Å
• H movement

Conclusions
• Protocols can be automated
• Machines can highlight unusual behaviour, geometries and distribution of results for humans to consider
• Computational programs can provide high quality “experimental” molecular properties

Thanks
J.J.P. Stewart, Kim Baldridge, Wibke Sudholt, Simon Tyrrell, Yong Zhang, Peter Murray-Rust, Unilever

Questions
Homepage: http://wwmm.ch.cam.ac.uk
InChI FAQ: http://wwmm.ch.cam.ac.uk/inchifaq
R: http://www.r-project.org
Taverna: http://taverna.sourceforge.net/
MOPAC 2002: http://www.cachesoftware.com/mopac/
GAMESS: http://www.msg.ameslab.gov/GAMESS/GAMESS.html
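
Illustrative sketch of the pairwise comparison
The slides compare bond lengths from two sources (for example MOPAC and GAMESS) by looking at the mean, the standard deviation, the standard error of the mean, a normal QQ plot and the outliers of Δr. The talk states that this analysis was done in R; the Python below is only a minimal sketch of the same idea, not the authors' code. All function names, the bond identifiers and the bond lengths are hypothetical, and the outlier rule (a modified z-score based on the median absolute deviation, with a cutoff of 3.5) is an assumption that does not come from the talk.

```python
import math
import statistics
from typing import Dict, List, Tuple


def bond_length_differences(a: Dict[str, float], b: Dict[str, float]) -> Dict[str, float]:
    """Return delta_r = r_a - r_b for every bond present in both data sets."""
    shared = sorted(set(a) & set(b))
    return {bond: a[bond] - b[bond] for bond in shared}


def summarise(values: List[float]) -> Dict[str, float]:
    """Mean, sample standard deviation and standard error of the mean."""
    sd = statistics.stdev(values)
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "sd": sd,
        "sem": sd / math.sqrt(len(values)),
    }


def flag_outliers(deltas: Dict[str, float], cutoff: float = 3.5) -> List[str]:
    """Flag bonds whose modified z-score exceeds the cutoff (assumed rule)."""
    values = list(deltas.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to scale by, so nothing can be flagged
    return [bond for bond, v in deltas.items()
            if 0.6745 * abs(v - med) / mad > cutoff]


def normal_qq_points(values: List[float]) -> List[Tuple[float, float]]:
    """Pair each sorted value with its theoretical standard normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    normal = statistics.NormalDist()
    return [(normal.inv_cdf((i + 0.5) / n), v) for i, v in enumerate(ordered)]


# Hypothetical bond lengths in angstroms, invented purely for illustration.
method_m = {"mol1:C1-C2": 1.524, "mol1:C2-O1": 1.410, "mol2:C1-C2": 1.391,
            "mol2:C1-N1": 1.458, "mol3:C1-C2": 1.610, "mol3:C2-C3": 1.520,
            "mol4:N1-N2": 1.345, "mol5:C1-C2": 1.500}
method_g = {"mol1:C1-C2": 1.530, "mol1:C2-O1": 1.420, "mol2:C1-C2": 1.395,
            "mol2:C1-N1": 1.470, "mol3:C1-C2": 1.530, "mol3:C2-C3": 1.528,
            "mol4:N1-N2": 1.350}

deltas = bond_length_differences(method_m, method_g)
summary = summarise(list(deltas.values()))
outliers = flag_outliers(deltas)
qq = normal_qq_points(list(deltas.values()))

print(summary)
print("bonds flagged for human inspection:", outliers)
```

Using the median and the median absolute deviation rather than the mean and standard deviation keeps one very bad molecule from inflating the scale it is measured against. This matches the talk's stance that machines should highlight unusual cases for humans to consider rather than discard them.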