Intelligent Outlier Detection for HVAC System Fault Detection
Ying Guo1*, Davood Dehestani2, Jiaming Li1, Josh Wall3, Sam West3, Steven Su1
1 CSIRO ICT Centre, Sydney, Australia
2 UTS, Sydney, Australia
3 CSIRO Division of Energy Technology, Newcastle, Australia
* Corresponding email: [email protected]
Keywords: HVAC systems, detection, energy performance, machine learning.
1 Introduction
This paper proposes methods for detecting outliers in order to improve real-time fault detection in HVAC systems. Accurate fault detection can significantly improve energy efficiency, reduce maintenance costs, and improve human comfort. However, measured data often contain outliers that mislead the training of fault detection models for HVAC systems.
We propose a novel intelligent outlier detection approach using an online soft-margin support vector machine (SVM), which introduces slack variables to measure the degree of misclassification of the training samples. Nonlinear penalty functions are then used to reduce the effect of outliers on the classifier. In addition, we apply an online incremental SVM to cope with large datasets in real time. Based on this online training procedure, we implement a semi-unsupervised fault detection method that can detect new, unknown faults (outliers) in the training datasets in real time.
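For reference, the standard soft-margin SVM primal problem with slack variables ξ_i, labels y_i ∈ {−1, +1}, feature map φ and M training samples can be written as below. The paper does not state which nonlinear penalty function it uses, so g here is a placeholder (an assumption of this sketch); choosing g(ξ) = ξ recovers the standard hinge-loss SVM.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{M} g(\xi_i)
\quad\text{s.t.}\quad y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,
\qquad \xi_i \ge 0,\qquad i = 1,\dots,M
```

For any nondecreasing g, the optimum takes ξ_i = max(0, 1 − y_i f(x_i)) with f(x) = w^⊤φ(x) + b. A value 0 < ξ_i ≤ 1 means the sample lies inside the margin but on the correct side, while ξ_i > 1 means it is misclassified, which is why samples with large slack are natural outlier candidates.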
2 Incremental-decremental Algorithm of SVM
To identify outliers within the training datasets, we need methods that can operate in real time and require less training data. The Support Vector Machine (SVM) has been studied extensively in the data mining and machine learning communities over the last two decades (Vapnik 1995). An SVM model is equivalent to a two-layer perceptron neural network. With a kernel function, the SVM is an alternative training method for multi-layer perceptron classifiers in which the network weights are found by solving a quadratic programming (QP) problem under linear constraints, rather than a non-convex, unconstrained minimization problem. Liang and Du (2007) studied fault detection for HVAC systems using a standard (offline) SVM. Training an SVM requires solving a QP; however, standard numerical techniques for QP are infeasible for very large datasets, which is the situation in fault detection and isolation for HVAC systems (Thongkam et al. 2008). With an online SVM, large-scale classification problems can be solved in a real-time configuration under limited hardware and software resources. In this paper, an incremental (online) SVM is applied to detect outliers in the training datasets.
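The kernel function mentioned above is what lets an SVM work in a nonlinear feature space without ever forming the mapping. A minimal Python check, using the degree-2 homogeneous polynomial kernel K(x, z) = (x·z)² on 2-D inputs as an illustrative choice (the paper does not say which kernel it uses): the kernel value computed in input space equals the inner product of the explicitly mapped vectors φ(x) = (x₁², √2·x₁x₂, x₂²).

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D input (x1, x2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """Homogeneous polynomial kernel K(x, z) = (x . z)^2, computed in input space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, z = (1.0, 2.0), (3.0, 1.0)
explicit = dot(phi(x), phi(z))  # inner product after forming the mapping
implicit = poly2_kernel(x, z)   # same value, mapping never formed
print(round(explicit, 9), implicit)  # both print as 25.0
```

The saving grows with the degree and input dimension: the explicit map for higher-degree kernels has many more coordinates, while the kernel stays a single dot product followed by a power.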
The main advantages of SVM include the kernel trick (there is no need to know the nonlinear mapping function explicitly), the globally optimal solution (training is a convex quadratic program), and the generalization capability obtained by maximizing the margin.
As noted above, standard numerical techniques for QP become infeasible for very large datasets. Training an SVM incrementally on new data by discarding all previous data except the support vectors gives only approximate results. An exact online alternative formulates the solution for M+1 training data in terms of the solution for M data and one new data point. Cauwenberghs and Poggio (2001) treat incremental learning as an exact online method that constructs the solution recursively, one point at a time. The key is to retain the Kuhn-Tucker (KT) conditions on all previous data while adiabatically adding a new data point to the solution. Leave-one-out is a standard procedure for predicting the generalization power of a trained classifier, from both a theoretical and an empirical perspective (Vapnik 1995).
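The Kuhn-Tucker conditions that the incremental method preserves sort every training point into one of three sets, based on g_i = y_i f(x_i) − 1 and its coefficient α_i ∈ [0, C]: margin support vectors (0 < α_i < C, g_i = 0), error vectors (α_i = C, g_i ≤ 0) and reserve vectors (α_i = 0, g_i ≥ 0). The sketch below trains a small linear soft-margin SVM on toy data containing one mislabelled point and checks this partition. It is not the Cauwenberghs and Poggio update itself: to stay short it trains in batch by dual coordinate descent (a different, standard solver), uses a linear kernel, and absorbs the bias by appending a constant feature, which also regularizes the bias. The data and function names are illustrative assumptions.

```python
def train_linear_svm(X, y, C=1.0, passes=200):
    """Linear soft-margin SVM (hinge loss) trained by dual coordinate descent.

    The bias is absorbed by appending a constant 1.0 to every sample, so it is
    regularized along with w (a simplification assumed for this sketch).
    Returns the augmented weight vector and the dual coefficients alpha.
    """
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    alpha = [0.0] * len(Xa)
    for _ in range(passes):
        for i, (xi, yi) in enumerate(zip(Xa, y)):
            q_ii = sum(v * v for v in xi)
            g = yi * sum(wj * xj for wj, xj in zip(w, xi)) - 1.0  # g_i = y_i f(x_i) - 1
            new = min(max(alpha[i] - g / q_ii, 0.0), C)         # exact 1-D minimizer, clipped to [0, C]
            delta = new - alpha[i]
            if delta != 0.0:
                w = [wj + delta * yi * xj for wj, xj in zip(w, xi)]
                alpha[i] = new
    return w, alpha


def decision(w, x):
    """f(x) for the augmented linear model."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def kkt_sets(alpha, C, tol=1e-6):
    """Partition sample indices into margin (S), error (E) and reserve (R) sets."""
    sets = {"margin": [], "error": [], "reserve": []}
    for i, a in enumerate(alpha):
        if a >= C - tol:
            sets["error"].append(i)    # alpha_i = C, so g_i <= 0
        elif a <= tol:
            sets["reserve"].append(i)  # alpha_i = 0, so g_i >= 0
        else:
            sets["margin"].append(i)   # 0 < alpha_i < C, so g_i = 0
    return sets


# Toy data: two clusters plus one sample (index 8) labelled against its cluster.
X = [(2, 2), (3, 3), (2, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3), (-3, -2), (2.5, 2.5)]
y = [1, 1, 1, 1, -1, -1, -1, -1, -1]
C = 1.0
w, alpha = train_linear_svm(X, y, C)
slack = [max(0.0, 1.0 - yi * decision(w, xi)) for xi, yi in zip(X, y)]
sets = kkt_sets(alpha, C)
print(sets, [round(s, 2) for s in slack])
```

The mislabelled point ends up in the error set with α = C and slack above 1, which is the kind of sample an outlier detector would flag; in the incremental method, adding or removing a point moves samples between these three sets while keeping the conditions satisfied.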
3 Implementation of Online SVM for Outlier Detection
Figure 1 shows the proposed fault detection scheme using incremental-decremental support vector machine classification. The system can detect unknown faults present in the training datasets by monitoring key HVAC variables during system operation.
In this algorithm, faults present in the data are detected (as unknown new faults) by comparing the outputs of the healthy model with those of the real system. If a detected fault is similar to a known fault, the algorithm categorizes it as an existing fault. Otherwise, the data are sent to the online SVM trainer to be trained as a new fault. Finally, the online SVM isolates the new fault as a known fault. The incremental procedure is reversible, and decremental unlearning of each training sample produces an exact leave-one-out estimate of faults using all HVAC data collected during operation.
Figure 1: Schematic of semi-unsupervised outlier detection with online SVM.
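The routing logic described above (comparison with the healthy model, matching against known faults, and hand-off of anything new to the online SVM trainer) can be sketched as follows. For brevity the known-fault check is a nearest-signature match in residual space, swapped in for the paper's SVM classifier; the thresholds, the two-variable layout and the fault names are hypothetical.

```python
import math


def route_sample(measured, predicted, known_faults, residual_tol=0.5, match_tol=1.0):
    """Hypothetical routing step for one sample of monitored HVAC variables.

    measured / predicted: the same variables from the real system and from the
    healthy model. known_faults: fault name -> stored residual signature.
    Returns ("healthy", None), ("known", fault_name) or ("new", residual).
    """
    residual = tuple(m - p for m, p in zip(measured, predicted))
    if math.hypot(*residual) <= residual_tol:
        return "healthy", None  # real system agrees with the healthy model
    # Nearest stored signature by Euclidean distance (stand-in for the SVM classifier).
    best_name, best_dist = None, math.inf
    for name, signature in known_faults.items():
        dist = math.dist(residual, signature)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist <= match_tol:
        return "known", best_name
    return "new", residual  # would be handed to the online SVM trainer


known = {"fault_A": (2.0, 0.0), "fault_B": (0.0, -1.5)}
print(route_sample((20.1, 5.0), (20.0, 5.1), known))  # ('healthy', None)
print(route_sample((22.2, 5.0), (20.0, 5.1), known))  # ('known', 'fault_A')
print(route_sample((24.0, 8.0), (20.0, 5.0), known))  # ('new', (4.0, 3.0))
```

In the full scheme, a residual returned as "new" would be used to train the online SVM, after which the fault would be recognized as known on later samples.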
As mentioned earlier, the proposed algorithm can detect unknown faults in the training datasets through a semi-unsupervised learning process. To test its semi-unsupervised performance, an unknown sudden fault was imposed on the system in one test. The detection results are shown in Figure 2. The margin clearly drops from a high level to a low level when an incipient outlier is detected. For the unknown fault (outlier) this change is dramatic, because the unknown fault is of the abrupt type. To optimize the training process efficiently, samples from each normal and faulty condition should be used. A group containing the largest number of faulty (outlier) training samples is selected and used for training. Figure 2 and the experimental results show that the designed SVM classifier identifies the unknown HVAC fault (outlier) accurately and that unknown faults are also detected efficiently.
Figure 2: Sudden unknown faults (outliers) in the training data, detected through the change in the margin.
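Figure 2 reports detection through a drop in the margin. A minimal sketch of one way to raise an alarm on such a drop, assuming the monitored quantity is a per-sample margin value and using a hypothetical rule: compare each value with the mean of the previous `window` values and flag it when it falls more than `drop` below that mean. The window, threshold and margin sequence are made-up illustrations, not the paper's data.

```python
from collections import deque


def detect_margin_drop(margins, window=5, drop=1.0):
    """Return indices whose margin falls more than `drop` below the mean
    of the preceding `window` margins (hypothetical abrupt-change rule)."""
    recent = deque(maxlen=window)
    alarms = []
    for t, m in enumerate(margins):
        if len(recent) == window and sum(recent) / window - m > drop:
            alarms.append(t)
        recent.append(m)
    return alarms


# Synthetic sequence: steady high margin, then an abrupt fault at index 7.
margins = [2.1, 2.0, 2.2, 1.9, 2.0, 2.1, 2.0, -0.5, -0.6, -0.4]
print(detect_margin_drop(margins))  # -> [7, 8, 9]
```

An abrupt fault like the one in the test gives an immediate, large drop that a short window catches; an incipient, slowly developing fault would need a longer window or a smaller threshold, at the cost of more false alarms.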
4 Discussion
The main advantage of this algorithm is that it uses only a subset of useful data (including healthy data, old faults, and new faults) instead of the whole dataset. This reduces the computation cost dramatically, so new unknown faults (outliers) can be detected in real time. Furthermore, this online approach trains the fault detection module more efficiently by discarding unnecessary data and using only a series of data with high priority for classification. The approach has been tested on real data from several commercial HVAC systems, where it successfully isolated outliers and detected HVAC system faults in unhealthy datasets. The experimental results were consistently positive.
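One way to realize the idea of keeping only high-priority data is sketched below, assuming priority is measured by each sample's margin y·f(x). In an SVM, samples lying well outside the margin (y·f(x) > 1) have zero coefficients and do not shape the current boundary, so they are the first candidates to discard. The category labels, the per-category budget k and the function name are hypothetical; the paper does not specify its selection rule.

```python
import heapq


def select_priority_samples(samples, k=2):
    """Keep, per category, the k samples with the smallest margin y*f(x).

    samples: iterable of (category, margin, sample_id). Samples with a large
    positive margin lie far on the correct side of the boundary and are
    dropped first. The rule and the budget k are assumptions of this sketch.
    """
    by_category = {}
    for category, margin, sample_id in samples:
        by_category.setdefault(category, []).append((margin, sample_id))
    return {
        category: [sid for _, sid in heapq.nsmallest(k, items)]
        for category, items in by_category.items()
    }


buffer = [
    ("healthy", 3.0, "h1"), ("healthy", 0.8, "h2"), ("healthy", 1.1, "h3"), ("healthy", 2.5, "h4"),
    ("old_fault", 0.2, "o1"), ("old_fault", 4.0, "o2"),
    ("new_fault", -1.0, "n1"),
]
print(select_priority_samples(buffer))
# {'healthy': ['h2', 'h3'], 'old_fault': ['o1', 'o2'], 'new_fault': ['n1']}
```

Keeping a budget per category ensures healthy data, old faults and new faults all stay represented. As Section 2 notes, discarding data in this way makes later incremental updates approximate rather than exact, so the saving in computation is a trade-off.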
5 References
Cauwenberghs G. and Poggio T. 2001. Incremental and decremental support vector machine learning. Advances in Neural Information Processing Systems 13, pp. 409-415.
Liang J. and Du R. 2007. Model-based fault detection and diagnosis of HVAC systems using support vector machine method. International Journal of Refrigeration 30(6), pp. 1104-1114.
Thongkam J., Xu G., Zhang Y., and Huang F. 2008. Support vector machine for outlier detection in breast cancer survivability prediction. APWeb 2008 Workshops, LNCS 4977, pp. 99-109.
Vapnik V. 1995. The Nature of Statistical Learning Theory. Springer-Verlag.