Digital Media Lab
Data Mining Applied
To Fault Detection
Shinho Jeong
Jaewon Shim
Hyunsoo Lee
{cinooco, poohut, darth7}@icu.ac.kr
Introduction
Aims of work
Neural Network Implementation of the Non-linear PCA model using
Principal Curve algorithm to increase both rapidity & accuracy of
fault detection.
Data mining?
Extracting useful information from raw data
using statistical methods and/or AI techniques.
Characteristics
Maximum use of data available.
Rigorous theoretical knowledge not required.
Efficient for a system with deviation between the actual process and
the first-principles-based model.
Application
Process monitoring
Fault detection/diagnosis/isolation
Process estimation
Soft sensor
Fault Detection?
Fault introduction
Issues
Major concerns
Rapidity
Ability to detect fault situation at an earlier stage of fault
introduction.
Accuracy
Ability to distinguish fault situation from possible process
variations.
Trade-off problem
Solve through
Frequent acquisition of process data.
Derivation of efficient process model through data
analysis using Data mining methodologies.
Inherent Problems
① Multi-colinearity problem
Due to high correlation among variables.
Likely to cause redundancy problem.
Derivation of new uncorrelated feature variables required.
② Dimensionality problem
Due to more variables than observations.
Likely to cause over-fitting problem in model-building phase.
Dimensional reduction required.
③ Non-linearity problem
Due to non-linear relation among variables.
Pre-determination of degree of non-linearity required.
Application of non-linear model required.
④ Process dynamics problem
Due to change of operating conditions with time.
Likely to cause change of correlation structure among variables.
Statistical Approach
Statistical data analysis
Uni-variate SPC
Conventional Shewhart, CUSUM, EWMA, etc.
Limitations
Monitors each process variable separately.
Ignores how variables co-vary, which is usually the greater concern.
Inefficient for multi-variate systems.
Need for multi-variate data analysis
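To make the uni-variate charts above concrete, here is a minimal sketch of an EWMA chart for a single variable. The function name `ewma_chart`, the smoothing weight `lam`, the limit width `L`, and the simulated mean shift are all illustrative assumptions, not values from the slides.

```python
import numpy as np

def ewma_chart(x, mu0, sigma0, lam=0.2, L=3.0):
    """Return the EWMA statistic and its lower/upper control limits."""
    x = np.asarray(x, dtype=float)
    z = np.empty_like(x)
    prev = mu0                       # the EWMA starts at the target mean
    for k, xk in enumerate(x):
        prev = lam * xk + (1.0 - lam) * prev
        z[k] = prev
    k = np.arange(1, len(x) + 1)
    # Time-varying limits: L * sigma_z, with sigma_z from the EWMA variance.
    half_width = L * sigma0 * np.sqrt(
        lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * k))
    )
    return z, mu0 - half_width, mu0 + half_width

rng = np.random.default_rng(0)
normal_part = rng.normal(0.0, 1.0, 60)
fault_part = rng.normal(2.0, 1.0, 60)    # hypothetical mean shift (fault)
x = np.concatenate([normal_part, fault_part])

z, lcl, ucl = ewma_chart(x, mu0=0.0, sigma0=1.0)
alarms = np.flatnonzero((z < lcl) | (z > ucl))
print("samples outside the limits:", alarms.size)
```

One such chart is needed per variable, and none of them sees the correlation between variables, which is the limitation noted above.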
Multi-variate SPC
PCA
Most popular multi-variate data analysis method.
Basis for regression models (PLS, PCR, etc.).
Linear PCA(1)
Features
Creation of new feature variables (Principal components)
through linear combination of original variables.
Fewer => solve 'Dimensionality problem'
Orthogonal => solve 'Multi-colinearity problem'
Perform Noise reduction additionally.
Basis for PCR, PLS.
Limitation
Linear model => inefficient for nonlinear process.
Linear PCA(2)
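Theory
Let $x = [x_1 \; x_2 \; x_3 \; \cdots \; x_m]$ ~ original var's
$\mathrm{Cov}(x) = \Sigma, \quad \Sigma p_i = \lambda_i p_i \quad (i = 1, 2, 3, \ldots, m)$
$t_i = x p_i \;\Rightarrow\; t = x P \;\Rightarrow\; x = t P'$ ($P$ ~ orthonormal matrix)
$x = \{t_1 p_1' + t_2 p_2' + t_3 p_3' + \cdots + t_l p_l'\} + \{t_{l+1} p_{l+1}' + \cdots + t_m p_m'\} = t_l P_l' + e_l = \hat{x} + e_l$
$G(x) = x P_l = t_l$ ~ encoding mapping
$F(t_l) = t_l P_l' = \hat{x}$ ~ decoding mapping
$\hat{x} = f(x, P_l) = t_l P_l' = (x P_l) P_l' = F(G(x))$
[Figure: $x$ → Encoding mapping → $t_l$ → Decoding mapping → $\hat{x}$]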
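The encoding and decoding mappings of linear PCA can be sketched in a few lines of NumPy. This is an illustration only: the helper names (`fit_pca`, `encode`, `decode`), the mean-centering step, and the synthetic data are assumptions that do not come from the slides.

```python
import numpy as np

def fit_pca(X, l):
    """Return the column mean and the first l loading vectors P_l (m x l)."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:l]        # keep the l largest
    return mean, eigvecs[:, order]

def encode(X, mean, P_l):
    """G(x) = x P_l, applied to mean-centered data (centering is an assumption)."""
    return (X - mean) @ P_l

def decode(T, mean, P_l):
    """F(t_l) = t_l P_l', shifted back by the mean."""
    return T @ P_l.T + mean

# Synthetic data: 5 correlated variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

mean, P_l = fit_pca(X, l=2)
T = encode(X, mean, P_l)          # scores t_l
X_hat = decode(T, mean, P_l)      # reconstruction F(G(x))
E = X - X_hat                     # residual e_l
print("mean squared residual:", np.mean(E ** 2))
```

The score columns in `T` are uncorrelated and fewer than the original variables, which is how the multi-colinearity and dimensionality problems are addressed.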
Linear PCA(3)
ERM inductive principle
$R_{emp}(P_l) = \frac{1}{n} \sum_{i=1}^{n} \| x_i - \hat{x}_i \|^2$
where $\hat{x}_i = F(G(x_i)) = (x_i P_l) P_l'$
Limitation
$G(x_i)$, $F(t_l)$ ~ linear functions
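As a rough numerical check of the empirical risk, the sketch below evaluates the reconstruction error for every number of retained components. The function name `empirical_risk` and the synthetic, mean-centered data are assumptions made for illustration.

```python
import numpy as np

def empirical_risk(X, P_l):
    """(1/n) * sum_i ||x_i - (x_i P_l) P_l'||^2 for mean-centered X."""
    X_hat = (X @ P_l) @ P_l.T
    return np.mean(np.sum((X - X_hat) ** 2, axis=1))

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 6)) @ rng.normal(size=(6, 6))
X = X - X.mean(axis=0)

# Eigen-decomposition of the covariance (normalized by n to match the 1/n above).
eigvals, eigvecs = np.linalg.eigh(X.T @ X / len(X))
P = eigvecs[:, ::-1]              # loading vectors, largest eigenvalue first
lam = eigvals[::-1]

risks = [empirical_risk(X, P[:, :l]) for l in range(1, 7)]
for l, r in enumerate(risks, start=1):
    print(f"l={l}: R_emp={r:.4f}")
```

With this normalization the risk for a given l equals the sum of the discarded eigenvalues, so it falls to zero when all components are kept.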
Alternatives
Extension of linear functions to non-linear ones
using…
Neural networks.
Statistical method.
Kramer’s Approach
[Figure: auto-associative network. x → Input layer → Mapping layer → Bottleneck layer → Demapping layer → Output layer → reconstructed x]
Limitations
Difficult to train the networks with 3 hidden layers.
Difficult to determine the optimal # of hidden nodes.
Difficult to interpret the meaning of the bottle-neck layer.
Non-linear PCA(1)
Principal curve(Hastie et al. 1989)
Statistical, Non-linear generalization of the first
linear Principal component.
Self-consistency principle
$\hat{x} = F(G(x)) = E\left( x \mid z = \arg\min_z \| F(z) - x \|^2 \right)$
① Projection step (Encoding)
$z = G(x) = \arg\min_z \| F(z) - x \|^2$
② Conditional averaging (Decoding)
$\hat{x} = F(z) = E(x \mid z)$
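The two alternating steps can be sketched as follows. This is a simplified illustration, not the original algorithm as published: the curve is discretized on a grid, the index z is rescaled to [0, 1] instead of using arc length, and conditional averaging is approximated with a Gaussian kernel smoother. All function names, the bandwidth, and the test data are assumptions.

```python
import numpy as np

def kernel_smooth(z, X, grid, bandwidth):
    """Conditional averaging: estimate E(x | z) at each grid value."""
    w = np.exp(-0.5 * ((grid[:, None] - z[None, :]) / bandwidth) ** 2)
    return (w @ X) / w.sum(axis=1, keepdims=True)

def project(X, curve):
    """Projection step: index of the closest curve point for every sample."""
    d2 = ((X[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def principal_curve(X, n_iter=5, n_grid=200, bandwidth=0.1):
    mean = X.mean(axis=0)
    Xc = X - mean
    # Initialize z with the first linear principal component score.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    z = Xc @ vt[0]
    grid = np.linspace(0.0, 1.0, n_grid)
    for _ in range(n_iter):
        z = (z - z.min()) / (z.max() - z.min())      # rescale index to [0, 1]
        curve = kernel_smooth(z, Xc, grid, bandwidth)  # decoding F(z)
        idx = project(Xc, curve)                       # encoding G(x)
        z = grid[idx]
    return z, curve[idx] + mean

# Noisy parabola: one underlying non-linear degree of freedom.
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, 300)
X = np.column_stack([t, t ** 2]) + 0.05 * rng.normal(size=(300, 2))

z, X_hat = principal_curve(X)
mse_curve = np.mean(np.sum((X - X_hat) ** 2, axis=1))

# Baseline: reconstruction from the first linear principal component.
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
X_lin = (Xc @ vt[:1].T) @ vt[:1] + X.mean(axis=0)
mse_linear = np.mean(np.sum((X - X_lin) ** 2, axis=1))
print("curve:", mse_curve, "linear:", mse_linear)
```

On curved data like this, a single curve should leave a smaller residual than a single straight line, which is the motivation for the non-linear generalization.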
Non-linear PCA(2)
Limitations
Finiteness of data.
Unknown density distribution.
No a priori information about data.
Additional consideration
② Conditional averaging => Locally weighted regression, Kernel regression
Span : fraction of data considered to be in the neighborhood.
~ smoothness of fit
~ generalization capacity
Increasing flexibility (Span decreasing)
[Figure: curves for σ = 0.5, 1, 2, 4; horizontal axis from -5 to 5, vertical axis from 0 to 1]
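The effect of the neighborhood size on flexibility can be illustrated with a small kernel regression. Here the Gaussian width `sigma` plays the role of the span; the function name `kernel_regression`, the sine test signal, and the noise level are assumptions for illustration.

```python
import numpy as np

def kernel_regression(z_train, x_train, z_query, sigma):
    """Nadaraya-Watson estimate of E(x | z) with a Gaussian kernel of width sigma."""
    w = np.exp(-0.5 * ((z_query[:, None] - z_train[None, :]) / sigma) ** 2)
    return (w @ x_train) / w.sum(axis=1)

rng = np.random.default_rng(0)
z = np.sort(rng.uniform(-5.0, 5.0, 200))
x = np.sin(z) + 0.2 * rng.normal(size=z.size)

train_mse = {}
for sigma in (0.5, 1.0, 2.0, 4.0):
    fit = kernel_regression(z, x, z, sigma)
    train_mse[sigma] = np.mean((x - fit) ** 2)
    print(f"sigma={sigma}: training MSE = {train_mse[sigma]:.3f}")
```

A narrow kernel follows the training data closely (more flexible, less smooth), while a wide kernel averages over most of the data (smoother, less flexible). The training error alone does not say which width generalizes best.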
Proposed Approach(1)
[Figure: LPCA vs. NLPCA]
Proposed Approach(1)
Creation of Non-linear principal scores
$x = F_1(z_1) + e_1$ where $F_1(z_1) = C_1$
$e_{i-1} = F_i(z_i) + e_i$ where $i = 1, 2, 3, \ldots, l$ and $e_0 = x$
$x = F_1(z_1) + F_2(z_2) + \cdots + F_l(z_l) + e_l = \hat{x} + e_l$
$z = [z_1, z_2, \ldots, z_l]$ ~ non-linear principal score
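The sequential extraction of scores from successive residuals can be sketched as a deflation loop. To keep the example short, the principal-curve fit is replaced by a hypothetical stand-in (`fit_curve_stand_in`): the first linear PC score serves as z and a quadratic regression of each variable on z serves as F(z). That substitution, the function names, and the data are assumptions, not the method of the slides.

```python
import numpy as np

def fit_curve_stand_in(E, degree=2):
    """Stand-in for a principal-curve fit on the residual E (assumption)."""
    _, _, vt = np.linalg.svd(E, full_matrices=False)
    z = E @ vt[0]                                  # score along the main direction
    coeffs = np.polyfit(z, E, degree)              # one polynomial per variable
    F_z = np.vander(z, degree + 1) @ coeffs        # fitted curve values F(z)
    return z, F_z

def nonlinear_scores(X, l):
    """Extract l scores one at a time, each from the previous residual."""
    E = X - X.mean(axis=0)                         # e_0 = x (mean-centered)
    scores, curves = [], []
    for _ in range(l):
        z_i, F_i = fit_curve_stand_in(E)
        E = E - F_i                                # e_i = e_{i-1} - F_i(z_i)
        scores.append(z_i)
        curves.append(F_i)
    return np.column_stack(scores), curves, E

rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, 120)
X = np.column_stack([t, t ** 2, 0.5 * t]) + 0.05 * rng.normal(size=(120, 3))

Z, F_list, E_l = nonlinear_scores(X, l=2)
X_hat = sum(F_list)                                # F_1(z_1) + F_2(z_2)
print("residual norm:", np.linalg.norm(E_l))
```

By construction the centered data equals the sum of the fitted curves plus the final residual, mirroring the decomposition into the reconstruction and e_l.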
Proposed Approach(2)
Implementation of Auto-associative N.N.
Construction of 2 MLP N.N.'s from $(x, z)$ & $(z, \hat{x})$
[Figure: x → Input layer → 1st MLP's hidden layer → z (NLPC score) → 2nd MLP's hidden layer → reconstructed x]
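A minimal sketch of the two-network construction is given below: one MLP is trained to map x to z and a second to map z back to x, and the two are chained to form the auto-associative model. The tiny gradient-descent trainer `train_mlp`, its hyperparameters, and the use of a known generating parameter `z_true` in place of principal-curve scores are all assumptions for illustration.

```python
import numpy as np

def train_mlp(inputs, targets, n_hidden=8, lr=0.05, epochs=2000, seed=0):
    """Fit a one-hidden-layer tanh MLP by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n_in, n_out = inputs.shape[1], targets.shape[1]
    W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
    b2 = np.zeros(n_out)
    n = len(inputs)
    for _ in range(epochs):
        H = np.tanh(inputs @ W1 + b1)              # hidden layer
        err = H @ W2 + b2 - targets                # output error
        # Backpropagation of the squared error.
        gW2 = H.T @ err / n
        gb2 = err.mean(axis=0)
        dH = (err @ W2.T) * (1.0 - H ** 2)
        gW1 = inputs.T @ dH / n
        gb1 = dH.mean(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return lambda A: np.tanh(A @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
z_true = rng.uniform(-1.0, 1.0, (120, 1))          # stand-in for NLPC scores
X = np.hstack([z_true, z_true ** 2]) + 0.05 * rng.normal(size=(120, 2))

encoder = train_mlp(X, z_true)                     # 1st MLP: x -> z
decoder = train_mlp(z_true, X, seed=1)             # 2nd MLP: z -> reconstructed x
X_hat = decoder(encoder(X))

recon_mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
total_var = np.mean(np.sum((X - X.mean(axis=0)) ** 2, axis=1))
print("reconstruction MSE:", recon_mse, "total variance:", total_var)
```

Training the two halves separately against known scores means each network has only one hidden layer, which sidesteps the difficulty of training a single network with three hidden layers.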
Case Study
Objective
Fault detection during operating mode change using
6 variables
Data acquisition & Model building
NOC data : 120 observations => NLPCA model building
Fault data : another 120 observations
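The build-then-monitor workflow can be sketched on synthetic data with the same sizes (120 NOC and 120 fault observations, 6 variables). Everything else is an assumption: the simulated process, the injected sensor bias, the use of a linear PCA reconstruction as a stand-in for the NLPCA model, and the squared prediction error with a 99th-percentile limit as the monitoring statistic.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical process: 6 measured variables driven by 2 latent factors.
mixing = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])

def simulate(n):
    return rng.normal(size=(n, 2)) @ mixing + 0.1 * rng.normal(size=(n, 6))

X_noc = simulate(120)                 # normal operating condition data
X_fault = simulate(120)
onset = 40
X_fault[onset:, 0] += 1.0             # hypothetical sensor bias from sample 40 on

# Model building on NOC data only (linear PCA stand-in, 2 components).
mean = X_noc.mean(axis=0)
_, _, vt = np.linalg.svd(X_noc - mean, full_matrices=False)
P_l = vt[:2].T

def spe(X):
    """Squared prediction error of each sample against the model."""
    Xc = X - mean
    residual = Xc - (Xc @ P_l) @ P_l.T
    return np.sum(residual ** 2, axis=1)

limit = np.percentile(spe(X_noc), 99)  # empirical control limit (assumption)
alarm = spe(X_fault) > limit

print("alarm rate before onset:", alarm[:onset].mean())
print("alarm rate after onset: ", alarm[onset:].mean())
```

Rapidity corresponds to how soon after the onset the statistic crosses the limit, and accuracy to how rarely it crosses the limit before the onset.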
[Figure: process flow diagram showing the reactor, condenser, compressor, vapor/liquid separator and stripper, with purge and product streams, analyzers, and flow (FI), temperature (TI), pressure (PI) and level (LI) indicators]
Model Building
Principal curve fitting
Auto-associative N.N. using 2 MLP's
[Figure panels: 5 iterations, 30 iterations, 50 iterations; 1st MLP N.N., 2nd MLP N.N.]
Monitoring Result
Fault introduction
The NLPCA model is more efficient than the LPCA model.
Conclusion
Result
Fault Detection performance was enhanced in terms
of both speed and accuracy when applied to a test
case.
Future work
Integration of ‘Fault Diagnosis’ and ‘Fault Isolation’
methods to perform complete process monitoring on
a single platform.