Computational Chemistry Robots
ACS Sep 2005
Computational Chemistry
Robots
J. A. Townsend, P. Murray-Rust,
S. M. Tyrrell, Y. Zhang
• Can high-throughput computation provide a reliable “experimental” resource for molecular properties?
• Can protocols be automated?
• Can we believe the results?
Aspects of complete automation
• Humans must validate protocols rather than
individual data
• Low rates of error must be addressed
• Users should know the rates of error and degree
of conformance
Approaches to conformance
• Explore limits of job behaviour (times,
convergence, etc.)
• Analyse reproducibility
• Vary and analyse effects of parameters and
algorithms
• Compare output with other “measurements” of
same quantity
The overall view
molecules
computation
dissemination
Check
results
Components of System
• Workflow for management of jobs (Taverna)
• Natural Language Processing based parsing of
outputs (JUMBOMarker)
• Pairwise comparison of data sets (R)
• Analysis of mean and variance
• Detection and analysis of outliers
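The last three components can be sketched in a few lines. The slides name R for this step; the Python below is only a stand-in, and the function name, the cutoff of 3.5 and the median-based outlier rule (a modified z-score) are assumptions rather than the authors' procedure.

```python
from statistics import mean, median, variance

def compare_pairwise(values_a, values_b, cutoff=3.5):
    """Compare the same property computed by two methods, molecule by molecule."""
    if len(values_a) != len(values_b):
        raise ValueError("the two data sets must be paired")
    diffs = [a - b for a, b in zip(values_a, values_b)]
    centre = median(diffs)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(d - centre) for d in diffs)
    outliers = []
    if mad > 0:
        outliers = [i for i, d in enumerate(diffs)
                    if 0.6745 * abs(d - centre) / mad > cutoff]
    return {
        "mean": mean(diffs),
        "variance": variance(diffs),  # sample variance (n - 1 denominator)
        "outliers": outliers,         # indices of the flagged pairs
    }

# Invented bond lengths in Å; the last pair disagrees badly.
method_1 = [1.01, 0.99, 1.00, 1.02, 0.98, 1.01, 1.00, 1.30]
method_2 = [1.00] * 8
summary = compare_pairwise(method_1, method_2)
```

The flagged indices point back to the molecules a human should inspect, which matches the idea of validating protocols rather than individual data.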
Computing the NCI database
MOPAC PM5 (collaboration with J.J.P. Stewart)
[Flowchart labels: Protocol; Unsuitable Data; Program Crashes; Pathological Behaviour; System Crashes; Inform Developer; Statistics; Science Errors; Log Files; Parse; Analysis; Other Science; Disseminate Results]
Taverna
• Workflow programs allow a series of small tasks to be linked together to develop more complex tasks
• Open Source
• myGRID, eScience
• European Bioinformatics Institute
• University of Manchester
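The idea of linking small tasks into a larger one can be illustrated without Taverna itself. The sketch below is plain Python, not Taverna's own workflow format or API, and every function name in it is hypothetical.

```python
from functools import reduce

def make_workflow(*steps):
    """Chain small tasks so that the output of each feeds the next."""
    def run(data):
        return reduce(lambda value, step: step(value), steps, data)
    return run

# Hypothetical small tasks; no real program is started here.
def build_input(name):
    return {"molecule": name, "input": f"{name}.dat"}

def run_job(job):
    return {**job, "log": f"{job['molecule']}.log"}

def parse_log(job):
    return {**job, "parsed": True}

pipeline = make_workflow(build_input, run_job, parse_log)
result = pipeline("ethanol")
```

Because each task only sees the previous task's output, a step can be replaced (for example a different parser) without touching the rest of the chain.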
An Example Taverna Workflow
Computational Chemistry Log Files
Parsing Log Files to CML
[Figure labels: Coordinates; Calculation Type; Molecular Formula; Point Group; Total Energy; Dipole]
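A minimal sketch of the log-to-CML step follows. It is not JUMBOMarker: the log layout and its values are invented for illustration, the field names are assumptions, and the XML is only loosely modelled on CML (molecule, atomArray, atom, property, scalar) without being validated against the schema.

```python
import re
import xml.etree.ElementTree as ET

# Invented log layout and values; real program output looks different.
SAMPLE_LOG = """\
CALCULATION TYPE: geometry optimisation
MOLECULAR FORMULA: H2O
POINT GROUP: C2v
TOTAL ENERGY: -76.0107 HARTREE
DIPOLE: 2.095 DEBYE
COORDINATES
O 0.000 0.000 0.117
H 0.000 0.757 -0.471
H 0.000 -0.757 -0.471
END
"""

FIELD_PATTERNS = {
    "calculationType": r"^CALCULATION TYPE:\s*(.+)$",
    "molecularFormula": r"^MOLECULAR FORMULA:\s*(.+)$",
    "pointGroup": r"^POINT GROUP:\s*(.+)$",
    "totalEnergy": r"^TOTAL ENERGY:\s*(.+)$",
    "dipole": r"^DIPOLE:\s*(.+)$",
}

def log_to_cml(text):
    """Extract scalar fields and coordinates into a CML-like element tree."""
    molecule = ET.Element("molecule")
    properties = ET.SubElement(molecule, "propertyList")
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            prop = ET.SubElement(properties, "property", title=name)
            ET.SubElement(prop, "scalar").text = match.group(1).strip()
    atoms = ET.SubElement(molecule, "atomArray")
    block = re.search(r"^COORDINATES\n(.*?)^END$", text,
                      flags=re.MULTILINE | re.DOTALL)
    if block:
        lines = [ln for ln in block.group(1).splitlines() if ln.strip()]
        for i, line in enumerate(lines, start=1):
            element, x, y, z = line.split()
            ET.SubElement(atoms, "atom", id=f"a{i}", elementType=element,
                          x3=x, y3=y, z3=z)
    return molecule

cml = log_to_cml(SAMPLE_LOG)
cml_text = ET.tostring(cml, encoding="unicode")
```

Fields that are missing from a log are simply skipped, so a crashed job produces a partial document rather than an exception.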
CompChem Output     Parsers         CML File
Input/jobControl    General         CMLCore
Coordinates         Coordinates     CMLCore
Energy Levels       Energy Level    CMLComp
Vibrations          Vibration       CMLSpect
Dissemination of results
[Diagram labels: LOG FILE; JUMBOMarker (NLP-based log file parser); CML FILE; HUMAN DISPLAY; WWMM* Server and DSpace; Outside world]
* World Wide Molecular Matrix
InChI: IUPAC International Chemical Identifier
A non-proprietary unique identifier for the representation of chemical structures.
A normalised, canonicalised and serialised form of a chemical connection table.
InChI FAQ: http://wwmm.ch.cam.ac.uk/inchifaq/
Proteus molecules*
[Figure labels: JUNK; Cured by MOPAC; Calculation]
* Proteus was a shape-changing ocean deity
Proteus molecules
[Figure labels: Input; JUNK; Calculation]
How do we know our results are valid?
Computational Method 1
Computational Method 2
Experiment
J.J.P. Stewart’s example
[Plot: Difference (Calculated ΔHf – Expt ΔHf) / kcal mol⁻¹, axis from -50 to 50, against Compound, axis from 0 to 1200]
GAMESS
[Diagram labels: MOPAC results; GAMESS (6-31G*, B3LYP), a project with Kim Baldridge and Wibke Sudholt; Log Files]
Repeat runs, different methods
Multiple runs from the same input give the same final structure
Changing the memory allocation makes no difference
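A reproducibility check of this kind could look like the sketch below. It assumes that every run lists the atoms in the same order and in the same orientation (no alignment is attempted); the function names and the tolerance are illustrative choices, not taken from the talk.

```python
from math import dist

def same_structure(coords_a, coords_b, tol=1e-3):
    """True if every atom lies within tol (same length unit as the input)
    of its counterpart in the other run."""
    if len(coords_a) != len(coords_b):
        return False
    return all(dist(p, q) <= tol for p, q in zip(coords_a, coords_b))

def is_reproducible(runs, tol=1e-3):
    """Compare every repeat run against the first one."""
    reference = runs[0]
    return all(same_structure(reference, other, tol) for other in runs[1:])

# Invented final geometries (Å) from repeat runs on one input.
run_1 = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.9600)]
run_2 = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.9601)]
run_3 = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.1000)]
```

Here `is_reproducible([run_1, run_2])` holds, while adding `run_3` breaks it and would mark the input for inspection.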
Pathological behaviour - Early detection
[Figure labels: divinyl ether; trans-crotonaldehyde; Z matrix; 6-31G*, B3LYP (shown twice); timings 100 min, 200 min, 15 min, 10080 min]
Times to run jobs
[Plot: time / s (axis from 0 to 120,000) against (n basis functions)^4 (axis from 0 to 1.E+09)]
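If run time grows roughly with the fourth power of the number of basis functions, as the axes of the timing plot suggest, jobs that fall far off that trend can be flagged automatically. The sketch below is one possible rule, not the authors' method: the function name, the factor of 5 and the job data are all assumptions.

```python
from statistics import median

def flag_slow_jobs(jobs, factor=5.0):
    """jobs: list of (job_id, n_basis_functions, seconds).

    Returns the ids of jobs whose time per n**4 exceeds `factor` times
    the median ratio of the whole batch.
    """
    ratios = [seconds / n ** 4 for _, n, seconds in jobs]
    typical = median(ratios)  # the median is not dragged up by a few slow jobs
    return [job_id for (job_id, _, _), ratio in zip(jobs, ratios)
            if ratio > factor * typical]

# Invented timings; job4 runs far longer than its size would predict.
jobs = [
    ("job1", 50, 600.0),
    ("job2", 100, 10500.0),
    ("job3", 150, 49000.0),
    ("job4", 60, 604800.0),
]
slow = flag_slow_jobs(jobs)
```

The same ratio can be evaluated on the elapsed time of a job that is still running, which is what would make the detection early.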
Analysis of different computational methods
• Mean - Overall difference
• Normality - Distribution of values
• Outliers - Unusual molecules?
• Variance (standard deviation) - Spread of the data, depends on both distributions
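The remark that the spread depends on both distributions can be made explicit. The notation below is introduced here for illustration and is not on the slides: x_i^{(1)} and x_i^{(2)} are the values of one quantity for molecule i from the two methods, and N is the number of molecules.

```latex
\begin{aligned}
d_i &= x_i^{(1)} - x_i^{(2)}, \\
\bar{d} &= \frac{1}{N} \sum_{i=1}^{N} d_i, \\
s^2 &= \frac{1}{N-1} \sum_{i=1}^{N} \left( d_i - \bar{d} \right)^2, \\
\operatorname{Var}(d) &= \operatorname{Var}\!\left(x^{(1)}\right)
  + \operatorname{Var}\!\left(x^{(2)}\right)
  - 2 \operatorname{Cov}\!\left(x^{(1)}, x^{(2)}\right).
\end{aligned}
```

The mean of the differences measures the systematic offset between the methods, while the last line shows that errors in either method widen the spread unless the two are strongly correlated.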
Probability Plot (Normal QQ plot)
[Plot annotations: Mean of distribution (approx. -0.03 Å); S.D. 0.020 Å; Range over which sample distribution is approximately normal; Outliers]
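The points of a normal QQ plot can be computed as sketched below. The plotting positions (i - 0.5) / n are one common convention and an assumption here; statistical packages may use slightly different ones.

```python
from statistics import NormalDist

def normal_qq_points(sample):
    """Pair each ordered sample value with the standard normal quantile
    expected at the same rank."""
    ordered = sorted(sample)
    n = len(ordered)
    standard_normal = NormalDist()
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n)
                   for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Invented differences in Å.
points = normal_qq_points([0.3, -0.1, 0.0, 0.1, -0.3])
```

For a normally distributed sample the points fall on a straight line whose intercept estimates the mean and whose slope estimates the standard deviation; points that leave the line at either end are the outlier candidates.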
All bonds* Δr (MOPAC – GAMESS) / Å
[Plot annotations: Good agreement; S.D. 0.005 Å; Nearly normal; Outliers]
* Excludes bonds to hydrogen
Bad molecules and data usually cause outliers
[Figure: structure drawings; recoverable atom labels: H, N, O, P, Na]
Mean Δr (M - G) / Å and Standard Error of the Mean / Å
[Table by element pair; element labels: C, N, O and C, N, O, F, S, Cl. Values as extracted, in order: -0.006, 0.020, -0.010, -0.014, -0.040, -0.037, 0.000, 0.000, 0.000, 0.001, 0.001, 0.001, 0.006, -0.037, -0.055, 0.001, 0.001, 0.009, -0.087, -0.070, 0.004, 0.014]
All values given to 3 decimal places
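A per-bond-type summary like this one, a mean difference with its standard error for each pair of elements, could be produced as sketched below. The record layout, function name and data are assumptions for illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

def mean_and_sem_by_bond_type(records):
    """records: iterable of (element_1, element_2, delta_r).

    Returns {(element, element): (mean, standard error of the mean)}.
    The standard error is None when a group has a single value.
    """
    groups = defaultdict(list)
    for first, second, delta in records:
        # Sort the pair so that C-N and N-C land in the same group.
        groups[tuple(sorted((first, second)))].append(delta)
    summary = {}
    for pair, values in groups.items():
        sem = stdev(values) / sqrt(len(values)) if len(values) > 1 else None
        summary[pair] = (mean(values), sem)
    return summary

# Invented bond-length differences in Å.
records = [("C", "N", 0.02), ("N", "C", 0.04), ("C", "C", -0.01)]
table = mean_and_sem_by_bond_type(records)
```

A mean that is several standard errors away from zero indicates a systematic difference between the two methods for that bond type rather than random scatter.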
Δr CC bonds (M - G) / Å
[Plot annotations: Good agreement; S.D. 0.013 Å; Nearly normal; Outliers; JUNK]
Selection of molecules with C–C Δr (M - G) > 0.05 Å
[Figure: structure drawings; recoverable group labels: OH, HO, CF3, CHF2, F, H2N, N, O, H]
Non-aromatic C–C bonds adjacent to CFn
[Plot with fitted line: Y = 0.0277 X – 0.0061]
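A straight line of this form is typically obtained by ordinary least squares, sketched below. What X and Y stand for on the slide is not recoverable from the transcript, so the data here are invented and only demonstrate the fit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented points lying exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```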
Δr NN bonds (M - G) / Å
[Plot annotations: S.D. 0.022 Å; Good agreement; Nearly normal; Kink]
Density plot of Δr NN bonds (M - G) / Å
[Plot annotations: LEFT; RIGHT]
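A density plot of this kind is usually a kernel density estimate. The sketch below uses a Gaussian kernel with a hand-picked bandwidth; both choices, the function name and the data are assumptions, not taken from the talk.

```python
from math import exp, pi, sqrt

def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density at x by summing
    one Gaussian bump per observation."""
    norm = 1.0 / (len(sample) * bandwidth * sqrt(2.0 * pi))
    def density(x):
        return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in sample)
    return density

# Invented differences in Å that form two separate groups.
density = gaussian_kde([-0.040, -0.038, 0.000, 0.002], bandwidth=0.005)
```

Two clear maxima in the estimate, as in this invented sample, suggest that the data mix two populations that are worth separating and comparing.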
Most common fragments found in Left set but not Right set
[Figure: fragment drawings; recoverable atom-type labels: N(ar), N(sp3), N, S(sp2), C(sp2), C(sp3)]
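Once each molecule has been reduced to a list of fragment labels, finding the fragments that occur in one set but never in the other is a counting exercise. The fragment strings below are hypothetical, and how fragments were actually generated is not described in the transcript.

```python
from collections import Counter

def fragments_only_in_left(left, right, top=5):
    """left, right: one list of fragment labels per molecule.

    Counts, per molecule, the fragments seen in the left set that never
    occur in the right set, and returns the most common ones.
    """
    seen_in_right = {frag for frags in right for frag in frags}
    counts = Counter(frag
                     for frags in left
                     for frag in set(frags)  # count each molecule once
                     if frag not in seen_in_right)
    return counts.most_common(top)

# Hypothetical fragment labels.
left_set = [["N(ar)-N(ar)", "C(sp3)-N"], ["N(ar)-N(ar)"], ["C(sp2)-N"]]
right_set = [["C(sp2)-N"], ["C(sp3)-O"]]
distinctive = fragments_only_in_left(left_set, right_set)
```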
Comparison of theory and experiment
[Diagram labels: CIF* files; CIF 2 CML; GAMESS; Log Files]
* CIF: Crystallographic Information File
Reading Acta Crystallographica Section E
All bonds* Δr (Cryst. – GAMESS) / Å
Single molecules, no disorder
[Plot annotations: Mean Δr -0.011 Å; S.D. 0.014 Å; Nearly normal; Outliers]
* Excludes bonds to hydrogen
Δr CC bonds (C – G) / Å
[Plot annotations: Mean Δr -0.01 Å; S.D. 0.009 Å; Nearly normal]
Δr CO bonds (C – G) / Å
[Plot annotations: S.D. 0.011 Å; Good agreement; Nearly normal; Outliers?]
Chemistry can cause outliers
Δr = +0.08 Å
H movement
Conclusions
• Protocols can be automated
• Machines can highlight unusual behaviour,
geometries and distribution of results for
humans to consider
• Computational programs can provide high-quality “experimental” molecular properties
Thanks
J.J.P. Stewart
Kim Baldridge
Wibke Sudholt
Simon Tyrrell
Yong Zhang
Peter Murray-Rust
Unilever
Questions
Homepage: http://wwmm.ch.cam.ac.uk
InChI FAQ: http://wwmm.ch.cam.ac.uk/inchifaq
R: http://www.r-project.org
Taverna: http://taverna.sourceforge.net/
MOPAC 2002: http://www.cachesoftware.com/mopac/
GAMESS: http://www.msg.ameslab.gov/GAMESS/GAMESS.html