Chimiometrie 2009
Proposed model for Challenge2009
Patrícia Valderrama
[email protected]
[email protected]
1° step) Model development
Model                                                RMSEC    R²
Mean center (20 LV)                                  2.7019   0.8325
Mean center + first derivative (20 LV)               1.2991   0.9613
Mean center + baseline (20 LV)                       2.8022   0.8198
Mean center + smoothing (20 LV)                      2.7779   0.8229
Mean center + smoothing + first derivative (20 LV)   1.7005   0.9336
LV = latent variables
Variable selection: Genetic Algorithm (GA) and iPLS
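Note: the RMSEP cannot be estimated, because it requires reference values for X_TST.

$$\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{I} \left(y_{\mathrm{ref},i} - y_{\mathrm{est},i}\right)^{2}}{I}}$$

$y_{\mathrm{ref},i}$ is the reference value of sample i
$y_{\mathrm{est},i}$ is the value estimated by the model for sample i
I is the number of samples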
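A minimal sketch of the RMSEP formula, for the case where reference values for the test set become available. The function name `rmsep` and the use of NumPy are assumptions made for illustration, not part of the original work.

```python
import numpy as np

def rmsep(y_ref, y_est):
    """Root mean square error of prediction over I samples."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_est = np.asarray(y_est, dtype=float)
    # Mean of the squared differences, i.e. the sum divided by I
    return float(np.sqrt(np.mean((y_ref - y_est) ** 2)))
```

The same function applied to calibration samples gives an error with I in the denominator, which differs from the RMSEC used later in the slides (that one divides by n - A - 1).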
2° step) Models with Variable Selection
Model                        RMSEC    R²
GA, mean center (20 LV)      1.2940   0.9616
iPLS, 10 intervals (15 LV)   3.7139   0.6674
iPLS, 5 intervals (14 LV)    2.0182   0.9008
iPLS, 3 intervals (19 LV)    1.6438   0.9374
iPLS, 2 intervals (16 LV)    1.2526   0.9625   (best model)
3° step) Outlier detection for the Best Model
Model                                                 RMSEC    R²
iPLS, 2 intervals (16 LV)                             1.2526   0.9625
iPLS, 2 intervals (16 LV), after outlier detection    0.8320   0.9834   (best model, optimized)
Outlier detection in the calibration matrix was based on:
• Extreme leverages (zero outliers)
• Unmodeled residuals in spectra (zero outliers)
• Unmodeled residuals in dependent variables (7 outliers)
Total outliers in calibration = 7
Outlier detection in the validation matrix was based on:
• Extreme leverages (129 outliers)
• Unmodeled residuals in spectra (106 outliers)
Total outliers in validation = 153
3° step) Outlier detection in the calibration and validation matrices
Based on:
Extreme leverages: the leverage represents how far a sample is from the center of the data.

$$h_i = \hat{t}_{A,i}^{T} \left(\hat{T}_A^{T} \hat{T}_A\right)^{-1} \hat{t}_{A,i}$$

where $\hat{T}_A$ represents the scores of all calibration samples, $\hat{t}_{A,i}$ is the score vector of a particular sample, A is the number of latent variables, and n is the number of samples.
According to ASTM E1655-00, samples with a leverage $h_i$ higher than the limit value should be removed from the calibration set:

$$h_i > 3\,\frac{A + 1}{n}$$
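The leverage test can be sketched as follows. This is an illustration under stated assumptions: the score matrix is taken as an n × A NumPy array, the function names are hypothetical, and validation samples are assumed to be passed as scores already projected onto the calibration model.

```python
import numpy as np

def leverages(T_cal, T_new=None):
    """h_i = t_i^T (T^T T)^-1 t_i for each row t_i of T_new (default: T_cal)."""
    T_cal = np.asarray(T_cal, dtype=float)
    T_new = T_cal if T_new is None else np.asarray(T_new, dtype=float)
    G_inv = np.linalg.inv(T_cal.T @ T_cal)
    # Quadratic form evaluated row by row
    return np.einsum("ij,jk,ik->i", T_new, G_inv, T_new)

def leverage_limit(n_samples, n_lv):
    """Limit value 3 (A + 1) / n."""
    return 3.0 * (n_lv + 1) / n_samples

def leverage_outliers(T_cal, T_new=None):
    """Boolean mask of samples whose leverage exceeds the limit."""
    n, n_lv = np.asarray(T_cal).shape
    return leverages(T_cal, T_new) > leverage_limit(n, n_lv)
```

For a small score matrix such as `[[1, 0], [0, 1], [-1, 0], [0, -1]]`, every calibration sample has leverage 0.5 and the limit is 2.25, so none is flagged.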
Unmodeled residuals in spectra: outliers are identified by comparing the total standard deviation of the spectral residuals, $s(\hat{e})$, with the residual standard deviation of a particular sample, $s(\hat{e}_i)$:

$$s(\hat{e}) = \sqrt{\frac{1}{nJ - J - A\,\max(n, J)} \sum_{i=1}^{n} \sum_{j=1}^{J} \left(x_{i,j} - \hat{x}_{i,j}\right)^{2}}$$

$$s(\hat{e}_i) = \sqrt{\frac{n}{nJ - J - A\,\max(n, J)} \sum_{j=1}^{J} \left(x_{i,j} - \hat{x}_{i,j}\right)^{2}}$$

n = number of samples
J = number of variables
A = number of latent variables
$x_{i,j}$ = absorbance value of sample i at wavelength j
$\hat{x}_{i,j}$ = value estimated with A latent variables
If a sample presents $s(\hat{e}_i) > 2\,s(\hat{e})$, it should be removed from the calibration set.
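A sketch of the spectral residual test, assuming the measured spectra `X` and the spectra reconstructed with A latent variables `X_hat` are available as n × J NumPy arrays. The function name and the `factor` parameter are illustrative choices.

```python
import numpy as np

def spectral_residual_outliers(X, X_hat, n_lv, factor=2.0):
    """Flag samples with s(e_i) > factor * s(e)."""
    X = np.asarray(X, dtype=float)
    X_hat = np.asarray(X_hat, dtype=float)
    n, J = X.shape
    dof = n * J - J - n_lv * max(n, J)
    if dof <= 0:
        raise ValueError("non-positive degrees of freedom")
    sq = (X - X_hat) ** 2
    s_total = np.sqrt(sq.sum() / dof)              # s(e), all samples
    s_sample = np.sqrt(n * sq.sum(axis=1) / dof)   # s(e_i), one per sample
    return s_sample > factor * s_total, s_sample, s_total
```

With this scaling, the mean of the squared per-sample values equals the squared total value, so the factor of 2 compares each sample against the average residual level.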
3° step) Outlier detection in the calibration matrix
Based on:
Unmodeled residuals in dependent variables: outliers are identified by comparing the root mean square error of calibration (RMSEC) with the absolute error of each sample.

$$\mathrm{RMSEC} = \sqrt{\frac{1}{n - A - 1} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^{2}}$$

If the difference between a sample's reference value ($y_i$) and its estimate ($\hat{y}_i$) is larger than 2 times the RMSEC, the sample is identified as an outlier:

$$\left|y_i - \hat{y}_i\right| > 2\,\mathrm{RMSEC}$$

n = number of samples
J = number of variables
A = number of latent variables
$y_i$ = reference value for sample i
$\hat{y}_i$ = estimated value for sample i
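The residual test on the dependent variable can be sketched as below. The function names are hypothetical and NumPy is assumed; the RMSEC uses the n - A - 1 denominator given in the slide.

```python
import numpy as np

def rmsec(y, y_hat, n_lv):
    """RMSEC with n - A - 1 degrees of freedom."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.sum((y - y_hat) ** 2) / (y.size - n_lv - 1)))

def y_residual_outliers(y, y_hat, n_lv, factor=2.0):
    """Flag samples whose absolute error exceeds factor * RMSEC."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return np.abs(y - y_hat) > factor * rmsec(y, y_hat, n_lv)
```

Because removing samples changes the RMSEC, the model is normally rebuilt after removal, as in the before and after comparison reported for the best model.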
4° step) Figures of Merit for the Optimized Best Model
• Accuracy
• Fit
• Precision (impossible to estimate, because it requires replicates of the validation samples)
• Sensitivity
• Analytical Sensitivity
• Selectivity
• Linearity
• Limit of Detection (LOD)
• Limit of Quantification (LOQ)
• Signal-to-noise ratio
Accuracy: this parameter reports the closeness of agreement between the reference value and the value found by the calibration model. In chemometrics, it is generally expressed as the root mean square error of calibration (RMSEC) or prediction (RMSEP). However, RMSEP is a global parameter that incorporates both systematic and random errors. Hence, an F-test on the RMSEC/RMSEP of two methods is not appropriate for comparing accuracy. A better indicator is the regression of found versus nominal concentration values and the estimation of the slope and intercept of that regression, including the elliptical joint confidence region.
The ellipses contain the ideal point (1, 0), for slope and intercept respectively, showing that the reference calibration values and the PLS results do not present a significant difference at the 99% confidence level.
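As an illustration of the joint confidence region test, the sketch below fits found versus nominal values by ordinary least squares and checks whether the ideal point (slope 1, intercept 0) lies inside the ellipse. It assumes the standard OLS joint region, (β - b)ᵀ XᵀX (β - b) ≤ 2 s² F(2, n - 2), which may differ from the exact procedure used by the author. The function names are hypothetical, and the F quantile uses the closed form that exists when the numerator has 2 degrees of freedom.

```python
import numpy as np

def f_quantile_2(p, dof2):
    """Quantile of the F distribution with (2, dof2) degrees of freedom.

    Closed form, valid only for 2 numerator degrees of freedom.
    """
    return 0.5 * dof2 * ((1.0 - p) ** (-2.0 / dof2) - 1.0)

def ideal_point_in_ejcr(nominal, found, confidence=0.99):
    """Return (inside, [intercept, slope]) for the point (intercept 0, slope 1)."""
    x = np.asarray(nominal, dtype=float)
    y = np.asarray(found, dtype=float)
    n = x.size
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # [intercept, slope]
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)                   # residual variance
    d = np.array([0.0, 1.0]) - beta                # ideal point minus estimate
    stat = d @ (X.T @ X) @ d
    limit = 2.0 * s2 * f_quantile_2(confidence, n - 2)
    return bool(stat <= limit), beta
```

A result of `True` corresponds to the situation described above: no significant difference between reference and estimated values at the chosen confidence level.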
Fit:
Net analyte signal versus reference values: pseudo-univariate representation of the multivariate calibration model.
Sensitivity: this parameter is the fraction of the analytical signal due to the increase in the concentration of a particular analyte at unit concentration.

$$\widehat{\mathrm{SEN}} = \frac{1}{\lVert b \rVert}$$

$$\widehat{\mathrm{SEN}} = \lVert \hat{s}_k^{\,nas} \rVert, \qquad \hat{s}_k^{\,nas} = \frac{\hat{x}_{A,k,i}^{\,nas}}{y_i}$$

$\widehat{\mathrm{SEN}} = 2.3932 \times 10^{-5}$
Analytical sensitivity: the inverse of this parameter reports the minimum concentration difference between two samples that can be determined by the model, considering spectral noise as the largest source of error.

$$\gamma = \frac{\widehat{\mathrm{SEN}}}{\delta x} = 0.5737$$

The minimum concentration difference between two samples that can be determined by the model is $\gamma^{-1} = 1.7431$.
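A short sketch of the two sensitivity figures, assuming the regression vector `b` is a NumPy array and the instrumental noise estimate `delta_x` is a scalar. Both names, and the function names, are illustrative.

```python
import numpy as np

def sensitivity(b):
    """SEN = 1 / ||b||, with b the regression coefficient vector."""
    return 1.0 / float(np.linalg.norm(b))

def analytical_sensitivity(sen, delta_x):
    """gamma = SEN / delta_x, with delta_x the instrumental noise estimate."""
    return sen / delta_x

# Consistency check with the reported values (gamma = 0.5737)
gamma_reported = 0.5737
min_conc_difference = 1.0 / gamma_reported
```

The last line gives approximately 1.7431, in agreement with the reported inverse of the analytical sensitivity.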
Selectivity: fraction of the signal used in the quantification.

$$\mathrm{SEL}_i = \frac{n\hat{a}s_i}{\lVert x_i \rVert} = 0.21$$

Linearity: in multivariate calibration, a linear model should present errors with random behavior.
Limit of Detection: following IUPAC recommendations, the LOD can be defined as the minimum detectable value of the net signal (or concentration).

$$\mathrm{LD} = 3.3\,\delta x\,\lVert b \rVert = 3.3\,\delta x\,\frac{1}{\widehat{\mathrm{SEN}}} = 5.7518$$

Limit of Quantification: the ability to quantify is generally expressed in terms of the signal or analyte concentration value that will produce estimates having a specified relative standard deviation, usually 10%.

$$\mathrm{LQ} = 10\,\delta x\,\lVert b \rVert = 10\,\delta x\,\frac{1}{\widehat{\mathrm{SEN}}} = 17.4296$$
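The two limits can be checked numerically from the analytical sensitivity, since δx / SÊN equals 1/γ. The sketch below assumes the commonly used factor 3.3 for the detection limit, which is the factor that reproduces the reported 5.7518 from γ = 0.5737. The function name is hypothetical.

```python
def limits_from_gamma(gamma, k_lod=3.3, k_loq=10.0):
    """Return (LOD, LOQ) as k * delta_x / SEN = k / gamma."""
    inv_gamma = 1.0 / gamma
    return k_lod * inv_gamma, k_loq * inv_gamma

# Reported analytical sensitivity: gamma = 0.5737
lod, loq = limits_from_gamma(0.5737)
```

The results agree with the reported 5.7518 and 17.4296 to within the rounding of γ.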
Signal-to-noise ratio: how much the net analyte signal exceeds the instrumental noise.

$$(S/N)_i = \frac{n\hat{a}s_i}{\delta x}$$

Max = 26.1264
Min = 9.5815
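Selectivity and signal-to-noise ratio are both simple ratios of the scalar net analyte signal of a sample. A minimal sketch, assuming the scalar NAS value `nas`, the sample spectrum `x`, and the noise estimate `delta_x` are already available (all three names are illustrative):

```python
import numpy as np

def selectivity(nas, x):
    """SEL_i = nas_i / ||x_i||: fraction of the signal used for quantification."""
    return float(nas) / float(np.linalg.norm(x))

def signal_to_noise(nas, delta_x):
    """(S/N)_i = nas_i / delta_x."""
    return float(nas) / float(delta_x)
```

Applied to every sample, the second function yields a range of values, of which the slides report the maximum and the minimum.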