Data Mining Approaches in Atomistic Modeling
H. Aourag
URMER, University of Tlemcen
AMASS – 7/25/03

Outline
• Introduction
• Ex 1: Intergranular Embrittlement of Fe
• Ex 2: Catalytic Activity - Hydrogenation
• Ex 3: Stainless Steel CrxNiyFe(1-x-y)
• Ex 4: Conductivity T7 7xxx Al Alloys
• Ex 5: Boiling Points
• Ex 6: Crystal Structure Prediction – open questions…

Predicting Properties with Atomistic Modeling
• Atomistic modeling: atom positions, electronic structure, energies
• Atomic scale descriptors: band gap, elastic constants, segregation energies, activation barriers
• Macroscopic properties: elastic properties, conductivity, toxicity
• Atomic scale descriptors → ? → Macroscopic properties, by one of three routes:
  – Direct calculation: band gap, elastic constants
  – Physical laws, constitutive relations
  – Data mining
• Further target properties shown: embrittlement, transport, weldability, toxicity

Power of Data Mining
• Use known data to establish R:
  Calculated Atomistic Properties Database → R → Measured Macroscopic Properties Database
• Use R to predict new data:
  Calculated Atomistic Properties Database → R → Predicted Macroscopic Properties Database
• Does not require complete and accurate multiscale theories
• New physics in relationships R
• Quick, cheap screening for desired properties, errors, etc. – can be qualitative

Key Issues
Atomic scale descriptors → Data Mining → Macroscopic Properties
– Descriptors accessible to modeling
– Descriptors optimally chosen
  • Use known relationships/physics
  • Optimize from large set of possibilities
– Descriptors→Property relationship is robust
  • Sensible choice of methods
  • Tested with cross validation, test sets
– Data
  • Large enough
  • Clean enough

Ex 1: Intergranular Embrittlement of Fe
• Property: Fe embrittlement
• Descriptors→Property relationship: Embrittlement ~ [Grain boundary segregation E - Free surface segregation E] = (E_GB – E_FS) (Rice ’89)
• Descriptors: (E_GB – E_FS) (calculated ab initio)
• Data: Embrittling potency for B, C, P, S.
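The two-step workflow from the Power of Data Mining and Key Issues slides (establish R from known data, test it with cross validation, then use R on new data) can be sketched as follows. This is a minimal illustration, not the method of any study cited in the talk: it assumes a single descriptor, a straight-line R, and synthetic data generated inside the script. All names (`fit_line`, `loo_cv_rms_error`, `descriptor`, `prop`) are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (closed form). This plays the role of R."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def rms(errors):
    """Root-mean-square of a list of residuals."""
    return (sum(e * e for e in errors) / len(errors)) ** 0.5


def fit_rms_error(xs, ys):
    """RMS error of R on the same data it was fitted to."""
    a, b = fit_line(xs, ys)
    return rms([a * x + b - y for x, y in zip(xs, ys)])


def loo_cv_rms_error(xs, ys):
    """Leave-one-out cross validation: refit without sample i, then predict sample i."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        errors.append(a * xs[i] + b - ys[i])
    return rms(errors)


# Synthetic "known data" (NOT from the talk): one calculated descriptor and a
# noisy "measured" property that depends on it linearly.
rng = random.Random(0)
descriptor = [0.1 * i for i in range(20)]
prop = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in descriptor]

a, b = fit_line(descriptor, prop)             # step 1: establish R
fit_err = fit_rms_error(descriptor, prop)     # error on the fitted data
cv_err = loo_cv_rms_error(descriptor, prop)   # error on held-out samples
predicted_new = a * 2.5 + b                   # step 2: use R on a new descriptor value

print(f"R: property = {a:.3f} * descriptor + {b:.3f}")
print(f"RMS fit error = {fit_err:.3f}, leave-one-out CV error = {cv_err:.3f}")
print(f"Predicted property at descriptor = 2.5: {predicted_new:.3f}")
```

For a least-squares fit the leave-one-out error is never smaller than the fit error, so a large gap between the two is a simple warning sign that R is not robust.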
Ex 1: Intergranular Embrittlement of Fe
(Wu, et al., Phys. Rev. B, ’96)
Also correctly predicts effect of Mn and Mo on P embrittlement! (Zhong, et al., Phys. Rev. B, ’97; Geng, et al., Solid State Comm., ’01)

Ex 2: Catalytic Activity - Hydrogenation
• Property: Reaction rates (hydrogenation of ethene, benzene on 3d transition metal M)
• Descriptors→Property relationship: Adapted Brønsted-Evans-Polanyi Free E + Langmuir-Hinshelwood rate equations
  Rate = R[E_MC, 12 fitting “constants” independent of M]
• Descriptors:
  – E_MC = M-C bond strength in bulk NaCl structure (calculated ab initio)
  – 12 fitting “constants” (fit to experimental data for each reaction)
• Data: 10-20 reaction rates for each of ethene and benzene

Ex 2: Catalytic Activity - Hydrogenation
Figure panels (Toulhoat, et al. ’02):
• Ethene: C2H4 + H2 → C2H6, x-axis E_MC; cross-validation in black
• Benzene: C6H6 + 3H2 → C6H12, x-axis E_MC; cross-validation with alloys

Ex 3: Stainless Steel CrxNiyFe(1-x-y)
• Property: High hardness and ductility
• Descriptors→Property relationship:
  – Hardness ~ shear modulus = G
  – Ductility ~ bulk modulus/shear modulus = B/G
• Descriptors: B, G (from ab initio)
• Data: Not clearly defined

Hardness vs. Shear Modulus
Figure: Vickers hardness [GPa] against shear modulus [GPa] (Teter, MRS Bulletin, ’98)

Ex 3: Stainless Steel CrxNiyFe(1-x-y)
Figure: maps of shear modulus G and bulk modulus B over Cr (at%) and Ni (at%), scale from High to Low, with the regions of high G (hard) and high B/G (ductile) marked (Vitos, et al., Nature Materials, ’02)
• Optimal at ~Cr18Ni24Fe58 (multiple patents)
• Predict improved mechanical properties for Ir, Os doping

Ex 4: Conductivity T7 7xxx Al Alloys
• Property: Electrical conductivity σ
• Descriptors→Property relationship:
  – Linear: σ = V*d (requires only fitting)
  – Neurofuzzy: σ = NF(d) (requires only fitting)
  – Physical: σ = P(d) (requires thermodynamic models of relevant phases, Rayleigh–Maxwell equation for resistivity with dispersed particles, Starink-Zahra equation for precipitation, 1D diffusion equation, Matthiessen’s rule for resistivity with dissolved elements)
• Descriptors: Concentrations, ageing time; d = (xZn, xMg, xCu, xZr, xFe, xSi, t)

Ex 4: Conductivity T7 7xxx Al Alloys
σ measured for 36 concentration/ageing time samples (Starink, et al., ’00)

R-Model      Fitting Params   RMS Error (%)   Cross Validation (%)
Linear       7                4.75            5.25
Neurofuzzy   5                1.35            1.525
Physical     6                0.97            1.05

Ex 5: Boiling Points (Quantitative Structure-Property Relationships: QSPR)
• Property: Boiling point TB
• Descriptors→Property relationship: Neural network (10:18:1, sigmoid, backpropagation)
• Descriptors: Electrostatic and structural properties (calculated with semiempirical VAMP – AM1)
• Data: TB for 6629 molecules containing elements H, B, C, N, O, F, Al, Si, P, S, Cl, Zn, Ge, Br, Sn, I, Hg

Data Mining Descriptors→Property Relationships
Many general approaches
• Graphical
• Linear regressions (normal least squares, principal component regression, partial least squares, …)
• Neural networks (perceptrons, feed-forward, radial-basis, …)
• Clustering (k-means, nearest-neighbor, …)
Many choices in each approach. In neural networks:
• Number of neurons/layers – 3:4:1
• Transfer functions: step, sigmoid, tansig, etc.
• Training method: backpropagation algorithms
Thousands of possible approaches!
• Many yield similar results
• Appropriate for different situations
• Problem dependent - much art!!

Descriptors
Charged partial surface area descriptors, Accelrys QSAR module (http://www.accelrys.com/cerius2/descriptor.html#list)
1. Partial positive surface area (sum of the surface area of positive atoms)
2. Partial negative surface area (sum of the surface area of negative atoms)
3. Total charge weighted positive surface area (descriptor 1 multiplied by the total positive charge)
4. Total charge weighted negative surface area (descriptor 2 multiplied by the total negative charge)
5. Atomic charge weighted positive surface area (sum of sasa*charge for all positive atoms)
6. Atomic charge weighted negative surface area (sum of sasa*charge for all negative atoms)
7. Difference in charged surface areas (descriptor 1 - descriptor 2)
8. Difference in total charge weighted surface areas (descriptor 3 - descriptor 4)
9. Difference in atomic charge weighted surface areas (descriptor 5 - descriptor 6)
10-15. Fractional charged partial surface areas (6 descriptors divided by total surface area)
16-21. Surface weighted charged partial surface areas (6 descriptors multiplied by total surface area)
22. Relative positive charge (charge of most positive atom divided by total positive charge)
23. Relative negative charge (charge of most negative atom divided by total negative charge)
24. Relative positive charge surface area (surface area of most positive atom divided by descriptor 22)
25. Relative negative charge surface area (surface area of most negative atom divided by descriptor 23)
26. Total hydrophobic surface area (sum of surface areas of atoms with |charge| < 0.2)
27. Total polar surface area (sum of surface areas of atoms with |charge| > 0.2)
28. Relative hydrophobic surface area (descriptor 26 divided by total surface area)
29. Relative polar surface area (descriptor 27 divided by total surface area)
30. Total solvent-accessible surface area

Descriptors
• Many broad categories: composition, topological, electronic, physical-chemical properties, …
• Thousands of possible descriptors
  – Use physical knowledge to choose relevant ones (e.g., QSAR principle)
  – Use numerical methods to choose important descriptors

Ex 5: Boiling Point Descriptors
(Chalk, et al., J. Chem. Inf. Comput. Sci., ’01)

Ex 5: Atomistic Modeling Methods
Use VAMP – AM1 and PM3 Hamiltonians
– Semi-empirical, molecular orbital based
– Quantum mechanical, but matrix elements are fit to experimental data
– Can calculate optimized geometries, electronic structure (charge properties)
– Fairly accurate (known failings) and fast

Ex 5: Boiling Points
• Training set (6000): 17 (max -119)
• Test set (629): 19 (max -94)
(Chalk, et al., J. Chem. Inf. Comput. Sci., ’01)
Large errors often due to
• Incorrect experimental measurements of TB (low pressure)
• Incorrect experimental structures (tautomer misidentification)
• Failure of atomistic modeling method (approximation errors)

Ex 6: Crystal Structure Prediction
• Property: Stable crystal structure
• Descriptors→Property relationship: Neighbor clustering algorithm (Euclidean metric)
• Descriptors: Chemical scale (empirically assigned value for each element) (Pettifor, J. Phys. C, ’86)
• Data: All intermetallic binary alloys (thousands)

Structure Maps
Figure: structure maps showing the CsCl and NaCl structure types (Rodgers, CRYSTMET, ’03)

Ex 6: Crystal Structure Prediction
• Powerful: structure maps can give 90-95% predictive accuracy
• Many descriptors: ~50 have been tried based on size, atomic number, cohesive energy, electrochemistry, valence electrons
• Can’t be extended: accurate maps require ~40% of the possible systems to be known (~80% binaries known, ~0.1% quaternaries)
• Can atomistic modeling help?
  – Fill in data for multicomponent systems
  – Provide optimal descriptors
(Villars, Intermetallic Compounds, ’94)

Conclusions
• Atomistic modeling and data mining can provide valuable predictive ability when physical theories are incomplete
• Key issues are data quality, descriptors, and descriptor→properties relationship
• Dangers of overfitting and tuning

Bible Code
Are these words closer than by chance? Can the Bible predict future events?
• Some say yes (Witztum, et al., Stat. Sci., ’94)
• Some say no (McKay, et al., Stat. Sci., ’99)
• Many articles
• >60 books on Bible Codes on Amazon
• 1 major motion picture (Omega Code)

Be careful with your statistics!
The First and Greatest Example of Atomic Level Data Mining

END
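Supplementary sketch for Ex 6: a structure map assigns an unknown binary compound the crystal structure of its nearest known neighbors in descriptor space, using a Euclidean metric. The code below is an assumption-laden illustration, not the algorithm of the cited works: it swaps in a plain k-nearest-neighbor vote, and every coordinate in `known_compounds` is a made-up placeholder rather than a real chemical-scale value. The names `knn_predict` and `known_compounds` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(known, query, k=3):
    """Predict a structure label by majority vote among the k nearest known points.

    known: list of ((scale_A, scale_B), structure_label) pairs
    query: (scale_A, scale_B) for the compound to classify
    """
    # Rank known compounds by Euclidean distance to the query point.
    ranked = sorted(known, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Placeholder structure map: the coordinates are invented for illustration only
# and do NOT correspond to real elements or real chemical-scale values.
known_compounds = [
    ((0.20, 0.80), "CsCl"),
    ((0.25, 0.75), "CsCl"),
    ((0.30, 0.85), "CsCl"),
    ((0.70, 0.20), "NaCl"),
    ((0.75, 0.30), "NaCl"),
    ((0.80, 0.25), "NaCl"),
]

prediction = knn_predict(known_compounds, (0.28, 0.78))
print("Predicted structure:", prediction)
```

A vote of this kind only works where the map is densely populated, which is consistent with the point above that accurate maps need a large fraction of the possible systems to be known.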