Definition and overview of chemometrics

Paul Geladi
Head of Research NIRCE
Chairperson NIR Nord
Unit of Biomass Technology and Chemistry
Swedish University of Agricultural Sciences
Umeå
Technobothnia
Vasa
paul.geladi @ btk.slu.se
paul.geladi @ syh.fi

Project geography

Chemometrics
Mathematics, Statistics, Computer Science in Chemistry

Similar fields
• Biometrics ±1900
• Psychometrics ±1930
• Econometrics ±1950
• Technometrics ±1960

Chemometrics
• Design of Experiments (DOE)
• Exploratory Data Analysis
• Classification
• Regression and Calibration

Design of Experiments
• Most important where possible
• Uses:
• ANOVA
• F-test
• t-test
• Plots
• Response Surfaces

Design of Experiments
$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_K x_K + b_{11} x_1^2 + b_{22} x_2^2 + \dots + b_{KK} x_K^2 + b_{12} x_1 x_2 + \dots + e$
Factors $x_1, x_2, \dots, x_K$ changed systematically
Response $y$ measured and modeled

Exploratory Data Analysis
• Design not possible
• Sampling situations
• Find structure
• Find groupings
• Find outliers

Classification
• Check for groupings = UNSUPERVISED
• Existing groupings = SUPERVISED
• Visualize groupings
• Classify
• Test

Regression / Calibration
• Two types of variables X / y
• Relationship linear / nonlinear
• Model
• Diagnostics
• Residual
[Figure: y plotted against x]

Multivariate Data Analysis

Multivariate Data Analysis
• Sampled data and design with too many responses:
• Mining
• Hospitals
• Agriculture
• Food industry
• More

Nomenclature
• Samples are objects
• What is measured on the object is a variable
[Figure: a single number (34.92), a spectrum indexed 1 to K, samples indexed 1 to I]

Vectors
[Figure: the numbers 12, 3.6, 11.1, 5.9, 34, 0.5, 1.4, 17 written as a column]
A vector is a collection of numbers. It is always a column vector.
[Figure: the same numbers written as a row]
The transpose of a vector is a row vector. Symbols for transpose are ' and T: a' or $a^T$.
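
The column vector and its transpose described above can be sketched in a few lines of plain Python. This is only an illustration, not part of the original slides: the numbers are the ones shown on the Vectors slide, while the names `values`, `a`, `a_t`, `transpose` and `shape` are hypothetical, and storing a vector as a list of rows is an assumption made to keep the example free of external libraries.

```python
# Numbers taken from the Vectors slide; all names here are hypothetical.
values = [12, 3.6, 11.1, 5.9, 34, 0.5, 1.4, 17]

# Column vector a: one number per row, stored as a K x 1 list of rows.
a = [[v] for v in values]


def transpose(matrix):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(column) for column in zip(*matrix)]


def shape(matrix):
    """Return (number of rows, number of columns)."""
    return (len(matrix), len(matrix[0]))


# Row vector a' (also written a^T): a single row with K entries.
a_t = transpose(a)

print("shape of a :", shape(a))
print("shape of a':", shape(a_t))
print("transposing twice gives a back:", transpose(a_t) == a)
```

Transposing twice returns the original column vector, which is why the same `transpose` helper serves for both directions.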
[Figure: Particle size, 1 sample]

[Figure: Small particles, 35 samples]

The Data Matrix
A data matrix is a vector of vectors (I rows, K columns)

[Figure: Size histograms, all samples; axis: particle area]

[Figure: axes labeled "Times in batch reaction" and "NIR wavelengths"]

Geometry of multivariate space

Problem
I and K can be large
Correlation
Univariate statistics does not apply

3 variables: blood oxygen, iron, hemoglobin
I patients
[Figures: axes Hb, Fe, O2]

Properties of multivariate space
Rotation: vectors unchanged / distance unchanged
Translation: vectors changed / distance unchanged
Rescaling / change of units: everything changes

Consequences
• We can move the coordinate system around
• The relative distances between objects do not change
• We can rotate the coordinate system
• Scale changes are important
• Move coordinate system to center of data
• Scale properly

Vectors (physics)
$x = [x_1, x_2, x_3]$
$\|x\| = (x_1^2 + x_2^2 + x_3^2)^{1/2}$

Geometry
[Figure: right triangle with sides a, b and hypotenuse c]
$c^2 = a^2 + b^2$

Vectors (K dimensions)
$x = [x_1, x_2, \dots, x_K]$
$\|x\| = (x_1^2 + x_2^2 + \dots + x_K^2)^{1/2}$

Problem
We cannot see in more than 3 dimensions
Paper, computer screen: 2-2.5 dimensions
[Figures: axes Hb, Fe, O2]

Projection
2D plane (screen, paper)
Many projections possible
Find a good one
Find a few good ones
What is good?
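
The properties of multivariate space listed above (rotation, translation, rescaling) and the vector length formula can be checked numerically. The sketch below is an illustration under stated assumptions, not material from the slides: the two points, the rotation angle, the shift and the scale factors are made up, all function names are hypothetical, and "vectors unchanged" under rotation is read here as "vector lengths unchanged". Two dimensions are used only to keep the rotation short.

```python
import math


def norm(x):
    """Euclidean length ||x|| = (x_1^2 + ... + x_K^2)^(1/2)."""
    return math.sqrt(sum(v * v for v in x))


def distance(p, q):
    """Euclidean distance between two objects p and q."""
    return norm([a - b for a, b in zip(p, q)])


def rotate_2d(p, angle):
    """Rotate a 2-D point about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * p[0] - s * p[1], s * p[0] + c * p[1]]


def translate(p, shift):
    """Move a point by adding a shift to every coordinate."""
    return [a + b for a, b in zip(p, shift)]


def rescale(p, factors):
    """Change units by multiplying each variable by its own factor."""
    return [a * f for a, f in zip(p, factors)]


# Two hypothetical objects measured on two variables.
p, q = [3.0, 4.0], [6.0, 8.0]
d0 = distance(p, q)

angle = math.pi / 6          # assumed rotation angle
shift = [10.0, -2.0]         # assumed translation
factors = [1000.0, 1.0]      # assumed unit change on the first variable only

d_rot = distance(rotate_2d(p, angle), rotate_2d(q, angle))
d_shift = distance(translate(p, shift), translate(q, shift))
d_scale = distance(rescale(p, factors), rescale(q, factors))

print("rotation keeps length:    ", math.isclose(norm(rotate_2d(p, angle)), norm(p)))
print("rotation keeps distance:  ", math.isclose(d_rot, d0))
print("translation keeps length: ", math.isclose(norm(translate(p, shift)), norm(p)))
print("translation keeps distance:", math.isclose(d_shift, d0))
print("rescaling keeps distance: ", math.isclose(d_scale, d0))
```

Rescaling only one variable changes the distance between the two objects, which is one way to see why the slides stress scaling properly before looking for a projection.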