Definition and overview of chemometrics
Paul Geladi
Head of Research NIRCE
Chairperson NIR Nord
Unit of Biomass Technology and Chemistry
Swedish University of Agricultural Sciences
Umeå
Technobothnia
Vasa
paul.geladi @ btk.slu.se paul.geladi @ syh.fi
Project geography
Chemometrics: Mathematics, Statistics and Computer Science in Chemistry
Similar fields
• Biometrics ±1900
• Psychometrics ±1930
• Econometrics ±1950
• Technometrics ±1960
Chemometrics
• Design of Experiments (DOE)
• Exploratory Data Analysis
• Classification
• Regression and Calibration
Design of Experiments
• Most important where possible
• Uses:
• ANOVA
• F-test
• t-test
• Plots
• Response Surfaces
Design of Experiments
$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_K x_K + b_{11} x_1^2 + b_{22} x_2^2 + \dots + b_{KK} x_K^2 + b_{12} x_1 x_2 + \dots + e$
Factors $x_1, x_2, \dots, x_K$ changed systematically
Response $y$ measured and modeled
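As a rough illustration of the quadratic model above, the sketch below fits a full second-order response surface for K = 2 factors by ordinary least squares. The three-level factorial design, the coefficient values used to generate y, and the helper names (`design_row`, `solve`, `fit_response_surface`) are assumptions made for this example and do not come from the slides. Only the standard library is used, and in practice a numerical library would normally do the solving.

```python
from itertools import product

def design_row(x1, x2):
    """Model terms for K = 2: 1, x1, x2, x1^2, x2^2, x1*x2."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def fit_response_surface(points, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    X = [design_row(x1, x2) for x1, x2 in points]
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

# Hypothetical 3-level full factorial design: both factors changed systematically.
points = list(product([-1.0, 0.0, 1.0], repeat=2))

# Assumed coefficients b0, b1, b2, b11, b22, b12; the response is noise-free (e = 0).
true_b = [5.0, 2.0, -1.0, 0.5, 0.25, 1.5]
y = [sum(b * t for b, t in zip(true_b, design_row(x1, x2))) for x1, x2 in points]

b_hat = fit_response_surface(points, y)
print([round(b, 6) for b in b_hat])
```

Because the simulated response has no error term, the fitted coefficients should reproduce the assumed ones up to rounding. With real measurements the residual e would be nonzero and would feed the ANOVA and F-test diagnostics.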
Exploratory Data Analysis
• Design not possible
• Sampling situations
• Find structure
• Find groupings
• Find outliers
Classification
• Check for groupings = UNSUPERVISED
• Existing groupings = SUPERVISED
• Visualize groupings
• Classify
• Test
Regression / Calibration
• Two types of variables X / y
• Relationship linear / nonlinear
• Model
• Diagnostics
• Residual
[Figure: plot of y against x]
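The model and residual steps listed above can be sketched for the simplest case, one x variable and a straight line. The data values and the function name `fit_line` are hypothetical. The slides do not say which regression method is meant, so ordinary least squares is used here purely as an assumption.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical calibration data: x is the known quantity, y the measured response.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)

# Residual = measured y minus modeled y; inspecting these is a basic diagnostic.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1)
print(residuals)
```

A curved pattern in the residuals would suggest that the relationship is nonlinear and that the straight-line model is inadequate.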
Multivariate Data Analysis
• Sampled data and design with too many responses:
• Mining
• Hospitals
• Agriculture
• Food industry
• More
Nomenclature
• Samples are objects
• What is measured on the object is a variable
[Figure: data table with Samples along the side and K variables along the top; labels shown: "34.92", "Spectrum"]
Vectors
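A vector is a collection of numbers. It is always a column vector (here with I elements):
$\mathbf{a} = \begin{bmatrix} 12 \\ 3.6 \\ 11.1 \\ 5.9 \\ 34 \\ 0.5 \\ 1.4 \\ 17 \end{bmatrix}$
The transpose of a vector is a row vector:
$\mathbf{a}^T = \begin{bmatrix} 12 & 3.6 & 11.1 & 5.9 & 34 & 0.5 & 1.4 & 17 \end{bmatrix}$
Symbols for transpose are ’ and T: $\mathbf{a}'$ or $\mathbf{a}^T$.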
[Figure: Particle size, 1 sample]
[Figure: Small particles, 35 samples]
The Data Matrix
A data matrix is a vector of vectors: I objects (rows) by K variables (columns).
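A minimal sketch of the "vector of vectors" idea, assuming the common convention that each row is one sample (I rows) and each column is one variable (K columns). The first two sample vectors reuse the numbers from the vector example, the third is invented, and all variable names are illustrative.

```python
# Hypothetical measurements: I = 3 samples (objects), K = 4 variables each.
sample_1 = [12.0, 3.6, 11.1, 5.9]
sample_2 = [34.0, 0.5, 1.4, 17.0]
sample_3 = [8.2, 2.2, 9.7, 4.1]

# The data matrix stacks the sample vectors as rows: size I x K.
X = [sample_1, sample_2, sample_3]
I, K = len(X), len(X[0])

# Transposing gives one vector per variable, collected across all samples (K x I).
columns = [list(col) for col in zip(*X)]

print(I, K)
print(columns[1])   # the second variable for every sample
```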
[Figure: Size histograms, all samples; x-axis: particle area]
[Figure: Times in batch reaction; x-axis: NIR wavelengths]
Geometry of multivariate space
Problem
I and K can be large
Correlation
Univariate statistics does not apply
3 variables: blood oxygen (O2), iron (Fe), hemoglobin (Hb)
I patients
[Figure: series of 3D plots with axes Hb, Fe and O2]
Properties of multivariate space
Rotation: vectors unchanged / distance unchanged
Translation: vectors changed / distance unchanged
Rescaling / change units: all changes
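The distance statements above can be checked numerically. The sketch below uses two hypothetical objects in 2 dimensions for brevity; the same holds in K dimensions. The function names and the specific angle, shift and scale factors are assumptions chosen for the illustration.

```python
import math

def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def rotate(p, angle):
    """Rotate a 2D point about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def translate(p, shift):
    """Add the same shift to every coordinate pair."""
    return tuple(a + t for a, t in zip(p, shift))

def rescale(p, factors):
    """Multiply each variable by its own factor, e.g. a change of units."""
    return tuple(a * f for a, f in zip(p, factors))

p, q = (1.0, 2.0), (4.0, 6.0)   # two hypothetical objects

d0 = distance(p, q)
d_rot = distance(rotate(p, 0.7), rotate(q, 0.7))
d_tr = distance(translate(p, (10.0, -3.0)), translate(q, (10.0, -3.0)))
d_sc = distance(rescale(p, (1000.0, 1.0)), rescale(q, (1000.0, 1.0)))

print(d0, d_rot, d_tr)   # equal up to rounding
print(d_sc)              # very different: the first variable now dominates
```

Changing the first variable's units by a factor of 1000 (say, g to mg) makes it dominate the distance, which is why scaling needs care.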
Consequences
• We can move the coordinate system around
• The relative distances between objects do not change
• We can rotate the coordinate system
• Scale changes are important
• Move coordinate system to center of data
• Scale properly
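The last two consequences, centering and scaling, can be sketched as follows. The slides do not say which scaling is "proper", so dividing each variable by its sample standard deviation (often called autoscaling) is shown only as one common choice. The data and the function name `autoscale` are hypothetical.

```python
import statistics

def autoscale(X):
    """Mean-center each column, then divide it by its sample standard deviation."""
    columns = list(zip(*X))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]

# Hypothetical data matrix: 4 objects, 2 variables in very different units.
X = [[120.0, 0.10],
     [140.0, 0.30],
     [160.0, 0.20],
     [180.0, 0.40]]

Z = autoscale(X)
for row in Z:
    print(row)
```

After this step every column has mean 0 and standard deviation 1, so no variable dominates merely because of its units.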
Vectors (physics)
$x = [x_1, x_2, x_3]$
$\| x \| = (x_1^2 + x_2^2 + x_3^2)^{1/2}$
Geometry
[Figure: right triangle with sides a and b and hypotenuse c]
$c^2 = a^2 + b^2$
Vectors (K dimensions)
$x = [x_1, x_2, \dots, x_K]$
$\| x \| = (x_1^2 + x_2^2 + \dots + x_K^2)^{1/2}$
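The length formula is the same for any number of dimensions, as this short sketch shows. The function name `norm` and the example vectors are chosen for illustration.

```python
import math

def norm(x):
    """Euclidean length: square root of the sum of squared elements."""
    return math.sqrt(sum(v * v for v in x))

print(norm([3.0, 4.0]))        # K = 2, the Pythagoras case: 5.0
print(norm([1.0, 2.0, 2.0]))   # K = 3: 3.0
print(norm([1.0] * 16))        # K = 16: 4.0
```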
Problem
We cannot see in more than 3 dimensions
Paper, computer screen: 2-2.5 dimensions
[Figure: 3D plots with axes Hb, Fe and O2]
Projection
2D plane (screen, paper)
Many projections possible
Find a good one
Find a few good ones
What is good?
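To make the idea of "many projections" concrete, the sketch below projects the same 3-variable data onto two different 2D planes. The data, the chosen directions and the helper names are hypothetical. The slide leaves "What is good?" open, so the `spread` measure (how much of the data's variation remains visible in the plane) is an assumption offered as one possible criterion, not the slides' answer.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    n = math.sqrt(dot(u, u))
    return [a / n for a in u]

def orthonormal_pair(u, v):
    """Gram-Schmidt: two orthonormal directions spanning the plane of u and v."""
    e1 = normalize(u)
    w = [b - dot(v, e1) * a for a, b in zip(e1, v)]
    return e1, normalize(w)

def project(X, e1, e2):
    """2D coordinates of each row of X in the plane spanned by e1 and e2."""
    return [(dot(row, e1), dot(row, e2)) for row in X]

def spread(points):
    """Sum of squared 2D coordinates (assumed criterion for a 'good' view)."""
    return sum(a * a + b * b for a, b in points)

# Hypothetical mean-centered data: 4 objects, 3 variables (e.g. Hb, Fe, O2).
X = [[1.0, 1.0, 0.1],
     [-1.0, -1.0, -0.1],
     [2.0, 2.1, 0.0],
     [-2.0, -2.1, 0.0]]

view_a = project(X, *orthonormal_pair([1.0, 1.0, 0.0], [0.0, 0.0, 1.0]))
view_b = project(X, *orthonormal_pair([1.0, -1.0, 0.0], [0.0, 0.0, 1.0]))

print(spread(view_a))   # large: the objects are well separated in this plane
print(spread(view_b))   # small: the objects nearly collapse onto one point
```

Both planes are valid projections of the same data, but the first keeps the objects apart while the second hides almost all of the structure.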