Self-organizing maps for virtual sensors, fault detection and fault isolation in diesel engines

Master's thesis (Examensarbete) in Automatic Control (Reglerteknik) at Linköping Institute of Technology (Tekniska Högskolan i Linköping)
by Conny Bergkvist and Stefan Wikner
Reg nr: LiTH-ISY-EX--05/3634--SE

Supervisors: Johan Sjöberg (LiTH), Urban Walter (Volvo), Fredrik Wattwil (Volvo)
Examiners: Svante Gunnarsson (LiTH), Jonas Sjöberg (Chalmers)

Linköping, 18th February 2005
Division, Department: Institutionen för systemteknik, 581 83 Linköping
Date: 2005-02-11
Language: English
Report category: Examensarbete (Master's thesis)
ISRN: LITH-ISY-EX--05/3634--SE
URL for electronic version: http://www.ep.liu.se/exjobb/isy/2005/3634/
Abstract
This master's thesis report discusses the use of self-organizing maps in a diesel engine management system. Self-organizing maps are one type of artificial neural network that is good at visualizing data and solving classification problems. The system studied is the Vindax® development system from Axeon Ltd. By rewriting the problem formulation, function estimation and conditioning problems can be solved in addition to classification problems.
In this report a feasibility study of the Vindax® development system is performed, and for implementation the inlet air system is diagnosed and the engine torque is estimated. The results indicate that self-organizing maps can be used in future diagnosis functions as well as virtual sensors when physical models are hard to accomplish.
Keywords: self-organizing maps, neural network, virtual sensor, diesel engine, fault detection, fault isolation, automotive, development system
Preface
This thesis was written by two students, one from Chalmers University of Technology and one from Linköping University. For practical purposes this report exists in two versions: one at Chalmers and one at Linköping. Their contents are identical; only the framework differs slightly, to meet each university's rules.
Acknowledgments
Throughout the thesis work a number of people have been helpful and engaged in our work. We would like to thank our instructors Fredrik Wattwil and Urban Walter at Engine Diagnostics, Volvo Powertrain, for their support and engagement. In addition we thank our instructors and examiners: Svante Gunnarsson and Johan Sjöberg at Linköping University and Jonas Sjöberg at Chalmers University of Technology. The support team from Axeon Ltd - Chris Kirkham, Helge Nareid, Iain MacLeod and Richard Georgi - the project coordinator, Carl-Gustaf Theen, and the other employees at the Engine Diagnostics group have also been of great help to the success of this thesis.
Notation

Symbols
x, X    Boldface letters are used for vectors and matrices.

Abbreviations
ABSE    ABSolute Error
ANN     Artificial Neural Network
DC      Dynamic Cycle
EMS     Engine Management System
ETC     European Transient Cycle
PCI     Peripheral Component Interconnect
RMSE    Root-Mean-Square Error
SOM     Self-Organizing Map
VDS     Vindax® Development System
VP      Vindax® Processor
Contents

1 Introduction
  1.1 Background
  1.2 Problem description
  1.3 Purpose
  1.4 Goal
  1.5 Delimitations
  1.6 Method
  1.7 Thesis outline

2 Background theory
  2.1 Self-organizing maps
  2.2 Diesel engines
    2.2.1 The basics
    2.2.2 Four-stroke engine
    2.2.3 Turbo charger
    2.2.4 What happens when the gas pedal is pushed?

3 The Vindax® Development System
  3.1 Introduction
    3.1.1 Labeling step
    3.1.2 Classification step
  3.2 Conditioning of input signals
  3.3 Classification of cluster data
    3.3.1 Introduction
    3.3.2 Results and discussion
  3.4 Function estimation
    3.4.1 Introduction
    3.4.2 Results and discussion
  3.5 Handling dynamic systems
    3.5.1 Introduction
    3.5.2 Results and discussion
  3.6 Comparison with other estimation and classification methods
    3.6.1 Cluster classification through minimum distance estimation
    3.6.2 Function estimation through polynomial regression

4 Data collection and pre-processing
  4.1 Data collection
    4.1.1 Amount of data
    4.1.2 Sampling frequency
    4.1.3 Quality of data
    4.1.4 Measurement method
  4.2 Pre-processing

5 Conditioning - Fault detection
  5.1 Introduction
  5.2 Method
  5.3 Results and discussion

6 Classification - Fault detection and isolation
  6.1 Introduction
  6.2 Method
    6.2.1 Leakage isolation
    6.2.2 Reduced intercooler efficiency isolation
  6.3 Results and discussion

7 Function estimation - Virtual sensor
  7.1 Introduction
  7.2 Method
  7.3 Results and discussion

8 Conclusions and future work
  8.1 Conclusions
    8.1.1 Conditioning
    8.1.2 Classification
    8.1.3 Function estimation
  8.2 Method criticism
  8.3 Future Work

Bibliography

A RMSE versus ABSE

B Measured variables

C Fault detection results
  C.1 90 % activation frequency resolving
  C.2 80 % activation frequency resolving
  C.3 70 % activation frequency resolving
  C.4 60 % activation frequency resolving
  C.5 Most frequent resolving
Chapter 1
Introduction
This chapter gives the background, the purpose, the goal, the delimitations, the methods used, and the outline of the thesis.
1.1 Background
The demand for new methods and technologies in the Engine Management System,
EMS, is increasing. Laws that govern the permissible level of pollutants in the
exhaust of diesel engines have to be followed at the same time as the drivers demand
high performance and low fuel consumption, putting the EMS under pressure.
Applications in the EMS are based on a number of inputs measured by sensors and estimated by models. Limitations of these sensors and models create situations where input signals of sufficient quality cannot be obtained. Adding new sensors is expensive and creates an additional need for diagnostics. Together, this forces engineers to look at new methods to attain these values or to find ways around the problem.
Identifying models using Artificial Neural Networks, ANNs, can be one solution. A lot of research has been put into this area. The increasing number of applications with implemented ANNs indicates that the technology could be of use in an EMS. There are development systems available to produce hardware and software applications using ANNs to be deployed in the EMS. The Vindax® Development System, VDS, by Axeon Limited¹ is an example of this. The system may be capable of solving problems within the EMS.
The thesis work has been performed at Volvo Powertrain Corporation. The diagnosis group wants to know if the VDS can be used to help in the area of diagnosing diesel engines.

¹ www.axeon.com
1.2 Problem description
Three different kinds of problems are investigated in this thesis:

Conditioning - Fault detection  Conditioning a system is one way of determining if the system is fault free. Assume that a system has states x ∈ Φ, where x denotes the system states and Φ denotes a set, if it is fault free. Determining whether x ∈ Φ is a conditioning problem.

Classification - Fault isolation  There can be different kinds of faults that need to be isolated. Assume that a system with states x is fault free if x ∈ Φ0. Assume also that k different classes of errors can occur and that each error causes the states to belong to one of the sets Φ1, Φ2, ..., Φk. Determining which set x belongs to is a classification problem.

Function estimation - Virtual sensor  A system generates output according to y = f(u) ∈ ℝ. This output can be measured by sensors or obtained by determining the function f. In complex systems, e.g. an engine, it may be hard to determine f exactly. Finding an estimate of f is a function estimation problem.
1.3 Purpose

The aim of this master's thesis is to investigate a neural network application based on Self-Organizing Maps, SOMs. The main purpose is to investigate how well such a neural network works compared to traditional software applications used for control of real-time functions in EMSs. The actual environment to be investigated is the VDS. The thesis should:

• Evaluate the training process of the system regarding:
  - Time requirements
  - Size and type of source measurement data needed
• Evaluate the strengths and weaknesses of the system
• Evaluate the performance and capabilities of the VDS to solve the problems described in section 1.2.
1.4 Goal

To fulfill the purpose, the thesis goal is divided into three parts:

• Estimate the output torque with better accuracy than the value estimated by the EMS
• Detect air leakages in the inlet air system as well as a reduced intercooler efficiency with accuracy sufficient to make a diagnosis
• Isolate the same errors with accuracy sufficient to make a diagnosis

All experiments are performed on a Volvo engine using the VDS.
1.5
Delimitations
R
The evaluation of the VDS will be based upon a PCI2 -card version of the Vindax
hardware on a Windows PC.
For the fault detection and isolation problems data from one single engine individual are used. However, two different types of test running cycles are used. For
function estimation data from one single type of test cycle are used, but the test
cycle is run on two different types of engines. For all problems, the data used for
verification are of the same kind of data as for training.
1.6 Method

To fulfill the purpose and reach the goal of this thesis, a feasibility study of the VDS is performed. The study reveals basic properties of the system and serves as a solid base for the continued work. With the feasibility study as a base, the actual development of the applications takes place.

² Peripheral Component Interconnect, PCI, is a standard for connecting expansion cards to a PC.
1.7 Thesis outline

Chapter 2  Introduction to the theory behind the Vindax® Processor, VP, used in the VDS: self-organizing maps. The diesel engine is also described to give a sense of the environment where the algorithm is going to be deployed.

Chapter 3  A feasibility study of the VDS and its three main application areas: conditioning, classification and function estimation.

Chapter 4  Here the collection and pre-processing of data are discussed. Important issues such as the representativeness of data are handled, both in general and in specific terms.

Chapter 5-7  Each of these chapters describes the development of an application. These are very specific chapters and present results that can be expected to be achieved with the VDS.

Chapter 8  Ends the report with conclusions and recommendations for future work.
Chapter 2
Background theory
This chapter gives a short introduction to self-organizing maps and diesel engines for readers with little or no knowledge of these areas.
2.1 Self-organizing maps
Professor Kohonen developed the SOM algorithm in its present form in 1982 and presents it in his book [1]. The algorithm maps high-dimensional data onto a low-dimensional array. It can be seen as a conversion of statistical relationships into geometrical ones, suitable for visualization of high-dimensional data. As this is a kind of compression of information, preserving topological and/or metric relationships of the primary data elements, it can also be seen as a kind of abstraction. These two aspects make the algorithm useful for process analysis, machine perception, control tasks and more.¹
A SOM is actually an ANN categorized as a competitive learning network. These
kinds of ANNs contain neurons that, for each input, compete over determining the
output. Each time the SOM is fed with an input signal, a competition takes place.
One neuron is chosen as the winner and decides the output of the network.
The algorithm starts with initiation of the SOM. Assume that the input variables, ξi, are fed to the SOM as a vector u = [ξ1, ξ2, ..., ξn]ᵀ ∈ ℝⁿ. This defines the input space as a vector space of dimension n. Each neuron is then associated with a reference vector, mi ∈ ℝⁿ, that gives the neuron an orientation in the input space. The reference vectors are finally, often randomly, given an initial value.
Then the algorithm carries out a regression process to represent the available
inputs. For each sample of the input signal, u(t), the SOM-algorithm selects a
winning neuron. To find the winner, the distance between each neuron and the
input sample is calculated. According to equation (2.1)², the winner mc(t) is the neuron closest to the input sample:

||u(t) − mc(t)|| ≤ ||u(t) − mi(t)|| ∀ i    (2.1)

Here t is a discrete time coordinate of the input samples.

When a winner is selected, the reference vectors of the network are updated. The winning neuron and all neurons that belong to its neighborhood are updated according to equation (2.2):

mi(t + 1) = mi(t) + hc,i(t)(u(t) − mi(t))    (2.2)

The function hc,i(t) is a 'neighborhood function' decreasing with the distance between mc and mi. The neighborhood function is traditionally implemented as a Gaussian (bell-shaped) function. For convergence reasons, hc,i(t) → 0 as t → ∞. This regression distributes the neurons to approximate the probability density function of the input signal.

¹ It is formally described, by the inventor himself, as ([1], p. 106) "a nonlinear, ordered, smooth mapping of high-dimensional input data manifolds onto the elements of a regular low-dimensional array."
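To make these two steps concrete, the following is a minimal Python sketch of the training loop, not the VDS implementation. It assumes a one-dimensional neuron array and exponentially shrinking Gaussian neighborhood and learning rate schedules; the example below instead uses a 4 by 4 array, but the structure of the algorithm is the same.

    import numpy as np

    def train_som(inputs, n_neurons=16, n_steps=100, seed=0):
        """Toy SOM training loop following eqs. (2.1)-(2.2); 1-D neuron array."""
        rng = np.random.default_rng(seed)
        m = rng.normal(size=(n_neurons, inputs.shape[1]))  # random initiation
        idx = np.arange(n_neurons)
        for t in range(n_steps):
            u = inputs[t % len(inputs)]
            d = np.abs(m - u).sum(axis=1)        # 1-norm distances, as in the VDS
            c = int(np.argmin(d))                # eq. (2.1): the winning neuron
            # Assumed schedules: Gaussian neighborhood and learning rate that
            # both shrink with time, so that h_{c,i}(t) -> 0 as t grows.
            sigma = max(n_neurons * np.exp(-3.0 * t / n_steps), 0.5)
            h = np.exp(-((idx - c) ** 2) / (2 * sigma ** 2))
            alpha = 0.5 * np.exp(-3.0 * t / n_steps)
            m += alpha * h[:, None] * (u - m)    # eq. (2.2): pull winner and neighbors
        return m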
To give a more qualitative description of the SOM algorithm, an example is considered. The input signal, generated by taking points on the arc of a half circle³, is shown in figure 2.1. This is a signal of dimension two that the SOM is going to map. The SOM has in this case a size of 16 neurons, i.e. a 4 by 4 array. To illustrate the learning process, the array is imagined as a net, with knots representing neurons and elastic strings keeping a relation to each knot's neighbors. For initiation, the SOM is distributed over the input space according to figure 2.1. The reference vectors are randomized in the input space (using a Gaussian distribution).

During training, a regression process adjusts the SOM to the input signal. For each input, the closest neuron is chosen as the winner. When a winner is selected, it and its neighbors are adjusted towards the input. This can be seen as taking the winning neuron and pulling it towards the input; the elastic strings in the net cause the neighbors to follow the winner. Referring to equation (2.2), the neighborhood function hc,i(t) determines the elasticity of the net.

The regression continues over a finite number of steps to complete the training process. In figure 2.1, network shapes after 10 and 100 steps of training are illustrated. As can be seen, a nonlinear, ordered and smooth mapping of the input signal is formed after 100 steps. The network has formed an approximation of the input signal.⁴

² In equation (2.1), the norm used is usually the Euclidean (2-norm), but other norms can be used as well. The VDS uses the 1-norm.
³ Although there is probably no physical system giving this input, it suits the purpose of illustrating the SOM algorithm.
⁴ Additional training has no significant effect on the structure of the neurons.
Figure 2.1. Illustration of the SOM algorithm. Each step involves presenting one input sample to the SOM. The network has started approximating the distribution of the input signal already after 10 steps. After 100 steps the network has formed an approximation of the input signal.
2.2 Diesel engines
It is not crucial to know how diesel engines work to read this report, but some knowledge might increase the understanding of why certain signals are used and how they affect the engine.
This thesis only discusses the four-stroke diesel engine, because that is the type used in Volvo's trucks. To learn more about engines, both diesel and Otto (e.g. petrol), see e.g. [2], [3].
2.2.1 The basics
A diesel engine, or compression-ignition engine as it is sometimes called, converts air and fuel through combustion into torque and emissions.
Figure 2.2. A side view of one cylinder in a diesel engine.
Figure 2.2 shows the essential components in the engine. The air is brought
into the engine through a filter to the intake manifold. There the air is guided
into the different cylinders. In the cylinder the air is compressed and when the
fuel is injected the mixture self ignites. The combustion results in a pressure that
generates a force on the piston. This in turn is transformed into torque on the
crankshaft that makes it rotate. Another result of the combustion is exhaust gases
that go through the exhaust manifold, the muffler and out into the air.
In this thesis the measured intake manifold pressure and temperature are called
boost pressure and boost temperature respectively.
2.2.2 Four-stroke engine
The engine operates in four strokes (see figure 2.3):
Figure 2.3. The four strokes in an internal combustion engine.
1. Intake stroke. (The inlet valve is open and the exhaust valve is closed) The
air in the intake manifold is sucked into the cylinder while the piston moves
downwards. In engines equipped with a turbo charger, see section 2.2.3, air
will be pressed into the cylinder.
2. Compression stroke. (Both valves are closed) When the piston moves upwards the air is compressed. The compression causes the temperature to rise.
Fuel is injected when the piston is close to the top position and the high
temperature causes the air-fuel mixture to ignite.
3. Expansion stroke. (Both valves are closed) The combustion increases the
pressure in the cylinder and pushes the piston downwards, which in turn
rotates the crankshaft. This is the stroke where work is generated.
4. Exhaust stroke. (The exhaust valve is open and the inlet valve is closed)
When the piston moves upwards again it pushes the exhaust gases out into
the exhaust manifold.
After the exhaust stroke it all starts over with the intake stroke again. As a result
of this, each cylinder produces work during one stroke and consumes work (friction,
heat, etc.) during three strokes.
2.2.3 Turbo charger
The amount of work, and in the end the speed, generated by the engine depends on the amount of fuel injected, but the relative mass of fuel and oxygen is also important. For the fuel to be able to burn there must be enough oxygen. From the intake manifold pressure the available amount of oxygen is calculated and, in turn, together with the wanted work, the amount of fuel to inject is calculated. So, a manifold pressure that is not high enough might lead to lower performance.
Figure 2.4. A schematic picture of a diesel engine.
To get more air, and with that more oxygen, into the cylinders, almost all large
diesel engines of today use a turbo charger. This will increase the performance of
the engine as more fuel can be injected. The turbo charger uses the heat energy
in the exhaust gases to rotate a turbine. The turbine is connected to a compressor
(see figure 2.4) that pushes air from the outside into the cylinders.
One effect of the turbo charger is that the compressor increases the temperature of the air, and warmer air contains less oxygen per volume. An intercooler can be used to increase the density of oxygen and thereby increase the performance of the engine. After the intercooler the air has (almost) the same temperature as before going through the compressor, but at a much higher pressure. From the ideal gas law, d = m/V = p/RT, we get that the density has increased. This way more air mass can be brought into the same cylinder volume, using only energy that would otherwise be thrown away.
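As a worked example of this ideal gas law argument, the sketch below compares the charge-air density with and without an intercooler. All numbers are assumed round values for illustration, not measurements from the thesis.

    # All numbers below are assumed, for illustration only.
    R = 287.0          # specific gas constant for air [J/(kg K)]
    p_boost = 2.5e5    # charge pressure after the compressor [Pa]
    T_hot = 420.0      # temperature after the compressor [K]
    T_cooled = 300.0   # temperature after the intercooler [K]

    d_hot = p_boost / (R * T_hot)        # d = p/(R T), from the ideal gas law
    d_cooled = p_boost / (R * T_cooled)  # same pressure, lower temperature
    print(f"density without intercooler: {d_hot:.2f} kg/m^3")    # ~2.07
    print(f"density with intercooler:    {d_cooled:.2f} kg/m^3")  # ~2.90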
2.2.4 What happens when the gas pedal is pushed?
When the driver pushes the pedal, the engine produces more torque. Simplified, the process looks like:
1. The driver pushes the pedal.
2. More fuel is injected into the cylinders.
3. More fuel yields a larger explosion, which leads to more heat and in turn more pressure in the cylinder (ideal gas law).
4. The increased pressure pushes the piston harder, which leads to higher torque. But it does not stop here; after a little while the turbo kicks in.
5. The increased amount of exhaust gases together with the increased temperature accelerate the turbine.
6. The increased speed of the turbine makes the compressor push more air into the intake manifold and the pressure rises.
7. This leads to more air in the cylinders.
8. With more air in the cylinders, more fuel can be used in the combustion which, in turn, leads to an even larger explosion and even more torque.
The final four steps are repeated until a steady state is achieved. This is a rather slow process; it takes a second or two before steady state is reached.
Chapter 3
The Vindax® Development System
In this chapter the VDS by Axeon Limited is introduced with a feasibility study. To
be able to visualize and easily understand the way the VDS works, low dimensional
inputs are used to create illustrative examples. To conclude the chapter, the VDS is compared with other systems trying to solve the same problems.
In this chapter the variables y and u denote the output signal and the input
signal respectively.
3.1 Introduction
The VDS deploys a SOM algorithm in hardware through a software development system. The hardware used for this thesis consists of a PCI version of the VP. It consists of 256 RISC processors, each representing a neuron in a SOM. The architecture allows the computations to be done in parallel, which decreases the working time substantially. The dimension of the input vectors, i.e. the memory depth, is currently limited to 16.
Configuration of the processor is quite flexible. It can be divided into up to four separate networks that can work in parallel, individually or in other types of hierarchies. There is also a possibility to swap the memory (during operation), enabling more than one full-size network to run in the same processor.
The software is simple to use and provides good possibilities to handle and visualize data. It also includes some data pre-processing tools as well as output functions. In the software system, the VP is used with network sizes of 256 neurons and smaller. To test larger networks, a software emulation mode is available, where the maximum size is 1024 neurons and the memory depth is 1024. The emulation mode does not use the VP and therefore increases the calculation times considerably.
The VDS is able to solve classification problems, described in section 1.2, as shown in section 3.3. Rewriting the conditioning and function estimation problems enables a solution to these as well; this is described in sections 3.2 and 3.4.
Development of applications using the VDS involves four steps:
1. Data collection and pre-processing
2. Training
3. Labeling
4. Classification
Data collection and pre-processing are described in chapter 4. During the training step the network is organized using the SOM algorithm presented in section 2.1.
The labeling and classification steps are described below.
3.1.1 Labeling step
The training has formed an approximation of the input signal distribution, and this approximation should now be associated with output signals, i.e. be labeled.
The labeling step uses the same input data as the training step, together with measured output data. It requires the correct output value to be known for each input. The neuron responding to a particular input sample should have an output value matching the measured output value. Presenting the data to the network makes it possible to assign values to each neuron.
As there are more input samples than neurons, each neuron will have many output value candidates. If these candidates are not identical, which is most often the case, some kind of strategy is needed for choosing which one to use as the label. Either it is chosen manually for each neuron, or one of the automatic methods of the VDS is used. There are different kinds of methods suitable for different kinds of problems; they are discussed in sections 3.2-3.4 in their contexts.
3.1.2 Classification step
If correctly labeled, the network can be used for mapping new inputs using the
classification step. This is done in a very straightforward manner and can be seen
as function evaluation. For each new input a winning neuron is selected as described
in section 2.1. The label of the winner is the output of the network.
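The following Python sketch illustrates the labeling and classification steps for a trained map m (the reference vectors from section 2.1). The mean-value resolution shown here is the strategy used for function estimation in section 3.4; the helper names are our own, not VDS functions.

    import numpy as np

    def label_som(m, inputs, outputs):
        """Labeling step: per neuron, resolve the label candidates (here: mean)."""
        outputs = np.asarray(outputs)
        # Winner index for every training sample (1-norm, eq. (2.1)).
        wins = np.array([np.abs(m - u).sum(axis=1).argmin() for u in inputs])
        labels = np.full(len(m), np.nan)
        for i in range(len(m)):
            candidates = outputs[wins == i]     # measured outputs this neuron won
            if candidates.size:
                labels[i] = candidates.mean()   # mean-value strategy (section 3.4)
        return labels

    def classify(m, labels, u):
        """Classification step: the label of the winning neuron is the output."""
        return labels[np.abs(m - u).sum(axis=1).argmin()]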
3.2 Conditioning of input signals
To solve a conditioning problem, according to section 1.2, the distance measurement has to be involved. As discussed in section 2.1, the distance from the input sample to each neuron is calculated when a winner is selected. It is this distance that can be used for conditioning of signals.
The idea is to save the maximum distance between the neurons and the input signals that occurs when the network is presented with input signals from a fault-free system. An error has probably occurred if the maximum distance is exceeded when new inputs are presented to the network.
To do this, the network is first trained on data from a system with no faults. After that, the same data are fed to the network again and the distance between the input signal and the winning neuron is saved for each sample. The maximum distance, for each neuron, is used as the label.
During the classification step, the label value of the winning neuron is subtracted from the distance between this neuron and the current sample, giving the difference between the distance of the current sample and the maximum distance that occurred during training. Previously unseen data have been presented to the VP when the difference is larger than zero. Assuming that the network has been trained on representative (see section 4.1) data, this implies that an error has probably occurred.
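A minimal sketch of this conditioning scheme, continuing the notation from the previous sketches (m is the trained map, 1-norm distances as in the VDS; function names are our own):

    import numpy as np

    def label_max_distance(m, fault_free_inputs):
        """Per neuron, store the largest winner distance seen on fault-free data."""
        labels = np.zeros(len(m))
        for u in fault_free_inputs:
            d = np.abs(m - u).sum(axis=1)   # 1-norm distances to all neurons
            c = d.argmin()                  # winning neuron
            labels[c] = max(labels[c], d[c])
        return labels

    def condition(m, labels, u):
        """Positive result: u lies farther from the map than any fault-free
        training sample did, so an error has probably occurred."""
        d = np.abs(m - u).sum(axis=1)
        c = d.argmin()
        return d[c] - labels[c]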
Applying this to the real problem is as simple as applying it to a constructed one; see chapter 5 for a real-world example that illustrates the use of this technique.
3.3 Classification of cluster data

To explore the capabilities of VPs to solve classification problems (see section 1.2), an example with clusters in three dimensions is made.
3.3.1 Introduction
Three Gaussian clusters are created with mean values at three different points. They are labeled blue, red and black. Different variances are used to create two sets of input data, shown in figures 3.1-3.2. Printed in black and white it is hard to see which cluster the data points belong to; the purpose of the figures is, however, to show how the clusters are distributed.
Figure 3.1. First cluster setup with every tenth sample of the data plotted from different
angles.
The task is to take an input signal, composed of the three coordinates of each point, and classify it as blue, red or black. To do this, the VP is first trained (using all 256 neurons). This creates an approximation of the input signal distribution that, hopefully, contains three regions. This can be visualized by looking at the distance between neurons using the visualization functionality in the VDS. This is very useful when the input signal has more than three dimensions, as it is then not possible to visualize the actual data itself. In this way it is possible to get an idea of how the neurons are distributed over the input space also when working in higher dimensions. Then the SOM is labeled to be able to give some output; this is the part where the settings for the classification are made.
Figure 3.2. Second cluster setup with every tenth sample of the data plotted from
different angles.
When labeling the SOM, there will be conflicts when neurons are associated with more than one input label. For example, if a neuron is associated with inputs labeled red 60 times and black 40 times, the neuron will be named multi-classified after the labeling process in the VDS. This situation has to be resolved and there are different methods for this.
In the VDS, there are six different multi-classification resolving methods:
1. Most frequent
2. Activation frequency threshold
3. Activation count threshold
4. Activation frequency range threshold
5. Activation count range threshold
6. Manually
The first two are most appropriate for the situations occurring in this thesis, which is why the other methods are not handled here.¹

¹ The third method is similar to the second one but harder to use, as the number of activations depends on the amount of data. There is no reason to use a range (methods 4-5) instead of a threshold, as the neurons with a high number of activations definitely should be labeled. Finally, the manual method is very time-consuming and does not really provide an advantage compared to the first two methods.
Using the first method makes the VDS assign each neuron the label that occurred most frequently. In the example above, this would mean that the neuron is classified as red.
With the second option it is possible to affect the accuracy of the classification. There will be a trade-off between the number of unclassified inputs and the accuracy of the ones classified. If the activation frequency threshold is set to a high value, say 90 percent, the classification will be accurate but many inputs may be classified as unknown. In the example above, this would mean that the neuron is not classified; the threshold would have to be lowered to 60 percent to classify the neuron as red.
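The logic of these two resolving methods can be sketched in a few lines of Python. The function below is illustrative, not the VDS implementation, and reproduces the red/black example from the text.

    def resolve_label(counts, threshold=None):
        """counts: activations per label for one neuron, e.g. {'red': 60, 'black': 40}.
        threshold=None gives 'most frequent' resolution; otherwise the top label
        must reach the given activation frequency, else the neuron stays unlabeled."""
        total = sum(counts.values())
        label, hits = max(counts.items(), key=lambda kv: kv[1])
        if threshold is None or hits / total >= threshold:
            return label
        return None

    print(resolve_label({"red": 60, "black": 40}))                 # red
    print(resolve_label({"red": 60, "black": 40}, threshold=0.9))  # None (unclassified)
    print(resolve_label({"red": 60, "black": 40}, threshold=0.6))  # red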
Figure 3.3. Possible label regions for input classification when using data from figure 3.2
as input.
This is visualized in figure 3.3. In between the clusters there are inputs that
belong to one cluster but are situated closer to another cluster center. For example,
there are many inputs in figure 3.2 that belong to the red cluster but are closer
to the black cluster center (and vice versa). Depending on how high the frequency
threshold is set for the labeling, these inputs will be classified as black or not
classified at all.
With a high threshold there will be many neurons without a label. This causes the unclassified region to be big and the accuracy of those classified to be high. The drawback of this is that many inputs will not be classified at all; e.g. trying to distinguish between an engine with no faults and an engine with an air leakage will return many answers saying nothing.
3.3.2 Results and discussion
The results from using the VDS to perform this task are summarized in tables 3.1-3.4.
                    Classified   Classified   Classified   Not
                    as blue      as red       as black     classified
True value blue     95.74%       0.00%        0.06%        4.20%
True value red      0.00%        95.72%       0.06%        4.16%
True value black    0.00%        0.00%        98.41%       1.59%

Table 3.1. Classification on first cluster (figure 3.1) data with no multi-classification resolution.

                    Classified   Classified   Classified
                    as blue      as red       as black
True value blue     99.75%       0.00%        0.26%
True value red      0.13%        99.74%       0.13%
True value black    0.07%        0.50%        99.42%

Table 3.2. Classification on first cluster (figure 3.1) data using most frequent as multi-classification resolution.

                    Classified   Classified   Classified   Not
                    as blue      as red       as black     classified
True value blue     77.72%       0.13%        0.19%        21.96%
True value red      0.07%        84.13%       0.13%        15.68%
True value black    0.28%        0.28%        70.25%       29.19%

Table 3.3. Classification on second cluster (figure 3.2) data with no multi-classification resolution.

                    Classified   Classified   Classified
                    as blue      as red       as black
True value blue     96.76%       1.10%        2.14%
True value red      0.59%        96.80%       2.61%
True value black    2.46%        1.12%        96.42%

Table 3.4. Classification on second cluster (figure 3.2) data using most frequent as multi-classification resolution.
Using no multi-classification resolution, i.e. a frequency threshold of 100 percent, gives high accuracy but many unclassified samples. Using most frequent resolution instead gives a high classification rate but with less accuracy. Which one to use depends on the intended application.
These two are extremes: no multi-classification resolution leaves many neurons without a label, whereas the most frequent resolution method labels all neurons, except the ones with equally many activations from two or more labels. The frequency threshold can be lowered to get a result somewhere in between, i.e. slightly lower accuracy but also fewer unclassified signals.
3.4 Function estimation
This section investigates the capabilities of the VDS to estimate functions (see section 1.2).
3.4.1 Introduction
First a simple linear system is created:

y1[t] = 600 · u1[t] + 750 · u2[t],    (3.1)

with uniformly distributed random inputs u1, u2 ∈ [0, 2]. These are used to train the VP.
This is a very simple problem: parameter estimation. It is questionable whether a neural network should be used to solve it, as simpler methods will probably produce better results. It is, however, suitable for illustrating the characteristics of the VDS.
The labeling of the VP is done by, for each neuron, choosing the mean value of all label candidates. This is the most common method to select labels when estimating functions. It is appropriate as there will be a very large number of labels, i.e. measured outputs, and choosing the mean value gives a good approximation.
Three network sizes are used to estimate the system:
- 64 neurons
- 256 neurons, the physical limit of the VP
- 1024 neurons, the limit of the software emulation mode
Increasing the number of neurons also increases the number of output values (one output value per neuron). Used on the same problem, a larger network should give a higher output resolution and hence reduce the errors.
As a second step, noise is introduced to the system. The network is trained and labeled using noisy signals. The classification can then be compared with the true output values to see whether the network performs worse or whether the method is robust.
The noise, band-limited white noise, is added to the input and output signals. Different amplitudes are used, as the input signals are much weaker than the output. The amplitudes are chosen according to table 3.5.
Signal   Noise amplitude
u1       5 · 10^-4
u2       5 · 10^-4
y        100

Table 3.5. Noise amplitudes.
Three non-linear systems are compared to the linear system. The first two are:

y2[t] = 1200 · atan(u1[t]) + |18 · u2[t]|²    (3.2)
y3[t] = 1200 · atan(u1[t]) + 100 · e^(2·√(u2[t]))    (3.3)

The third system is identical to system (3.2) except for a backlash, with a dead-band width of 200 and an initial output of 1000², that is applied to the output signal. All systems have (different) uniformly distributed random inputs, u1, u2 ∈ [0, 2].
3.4.2 Results and discussion
The results are summarized in table 3.6 with Root-Mean-Square Errors, RMSEs, and ABSolute Errors, ABSEs, for the estimations. See appendix A for details of RMSE and ABSE.
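Appendix A is not reproduced in this transcript; assuming the standard definitions, the two measures can be sketched as:

    import numpy as np

    def rmse(y_true, y_est):
        return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_est)) ** 2)))

    def abse(y_true, y_est):
        # Absolute errors per sample; the tables report their mean and max.
        return np.abs(np.asarray(y_true) - np.asarray(y_est))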
System                          RMSE    mean ABSE   max ABSE
System (3.1)
  64 neurons                    70.9    59.2        187.8
  256 neurons                   35.3    29.6        93.2
  1024 neurons                  18.5    15.6        51.5
  256 neurons (noisy signal)    254.0   199.2       1069
System (3.2)
  without backlash              39.8    31.2        164.5
  with backlash                 83.4    68.4        283.1
System (3.3)                    41.2    33.3        148.4

Table 3.6. Statistics after classification with the VDS on the four systems examined in this section. Systems (3.2) and (3.3) are all with 256-neuron networks and without noise.
With a SOM of 256 neurons, the result shown in figure 3.4 is achieved. The estimated output values in the figure are gathered at a number of levels, i.e. the output from the VP is discrete. It is hard to see in the figure, but there are actually 256 different levels, corresponding to the number of neurons in the SOM.
Looking at how the SOM is distributed over the input space, the output levels are denser towards the middle. When trained, the SOM approximates the probability distribution of the input signal. In this example two uniformly distributed signals are used as input. Together these inputs have a higher probability for values in the middle range of the space; extreme values have less probability, hence the look of figure 3.4.³
² See the Matlab®/Simulink® help for details.
³ A classical example in probability theory is throwing dice. Each throw is uniformly distributed among the results. Taking two throws in a row, the probability of getting an outcome of 6 is higher than the probability of getting, for example, 2. That is very similar to the problem approached here.
Figure 3.4. Correlation plot with estimated values compared to the true values of system (3.1) using a network of 256 neurons. The straight line shows the ideal case where
the network has infinitely many neurons and is optimally trained and labeled.
Figure 3.5. Correlation plot with estimated values compared to the true values using a
network of 1024 neurons applied on system (3.1). The straight line shows the ideal case
where the network has infinitely many neurons and is optimally trained and labeled.
Comparing figure 3.4 with figure 3.5, the horizontal lines are closer together and also shorter when the SOM is larger. This indicates smaller errors, and the values in table 3.6 confirm this.
The horizontal lines in figure 3.4 are approximately twice the length of the lines in figure 3.5. The RMSE and the mean ABSE are also twice as big with a 256-neuron SOM, see table 3.6.
The results when using only 64 neurons show that these values are, approximately, twice as large as for 256 neurons. Altogether, the results indicate that, for this problem, the error size is reduced with the square root of the network size. This is probably problem dependent and more tests are needed to reveal the real relationship.
Figure 3.6. Correlation plot with estimated values compared to the true values using a
SOM of 256 neurons with noisy signals applied on system (3.1). The straight line shows
the ideal case where the SOM has infinitely many neurons and is optimally trained and
labeled.
As noise is introduced to the problem, the SOM performs a lot worse. Figure 3.6 shows the correlation plot; the estimation error is much bigger now. Compared to the noise-free case, the error made by the SOM is about six times higher. The horizontal lines are distributed the same way as in figure 3.4, but they are longer. This indicates that more inputs are misclassified than earlier, increasing the error. This agrees with the values in table 3.6.
Looking into non-linearities, the results from systems (3.2) and (3.3) do not differ much from the linear case. This can also be seen when comparing the topmost graph in figure 3.7 with figure 3.4.
Figure 3.7. Correlation plot between true and estimated values for system (3.2). Without
backlash (top figure) and with backlash (bottom figure). The bottom figure definitely
shows worse results than the top figure. The straight line shows the ideal case where the
network has infinitely many neurons and is optimally trained and labeled.
There is no big difference between a linear and a non-linear system as long as the system is static. The reason for this is that the SOM performs a mapping.
The addition of the backlash to system (3.2) decreases the ability of the VDS to model the system, as can be seen in table 3.6 and when comparing the two plots in figure 3.7. A SOM cannot handle a backlash: the backlash output depends on the direction from which the backlash area is entered. The SOM system is static, so it only looks at current signal values, and therefore it does not have the information required to handle dynamic systems, i.e. systems that depend on previous values. See section 3.5 for more about dynamic systems and SOMs.
3.5 Handling dynamic systems
In this section, a method to estimate dynamic systems with SOMs is described. Although a function estimation problem is used as the example, the method applies to conditioning and classification problems as well.
In [4], the estimation of a dynamic system is done automatically by a modification of the SOM algorithm. In short, the output signal is used as an extra input signal during the training step to create a short-term memory mechanism. The technique is called Vector-Quantized Temporal Associative Memory. This solution is not possible in the VDS without reworking the core system.
Another, manual, way of handling dynamic data is adopted instead. Lagged input and output signals are used as additional input signals to incorporate the information needed to handle dynamics.
3.5.1 Introduction
The SOM can be seen as a function estimator, estimating the function f(·) in

y(t) = f(u(t)),

i.e. the output at time t only depends on input signals at time t. By adding historical input and output values to the input vector, the method can estimate functions of the form

y(t) = f(u(t), u(t − 1), ..., u(t − k), y(t − 1), ..., y(t − n)).

In this way the dynamic function becomes, in theory, static when enough historical values are used. Since the SOM method has no problem with non-linearities, this means that, in theory, given enough historical signal values, all causal, identifiable systems can be estimated with this method. The only limitation is the discrete output: there can only be as many output values as there are neurons.
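A sketch of how such lagged input vectors can be assembled (the helper name and layout are our own; in practice the VDS memory depth of 16 bounds how many lags fit):

    import numpy as np

    def lagged_regressors(u, y, k=1, n=0):
        """Row t is [u(t), u(t-1), ..., u(t-k), y(t-1), ..., y(t-n)]."""
        start = max(k, n)
        rows = []
        for t in range(start, len(u)):
            row = [u[t - j] for j in range(k + 1)]
            row += [y[t - j] for j in range(1, n + 1)]
            rows.append(row)
        return np.asarray(rows)

    # For the system y[t] = u[t - 1] in eq. (3.5), training on
    # lagged_regressors(u, y, k=1) exposes u[t - 1] to the SOM.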
This theory is illustrated using the discrete system

y[t] = u[t],    (3.4)

using both Gaussian and uniformly distributed random signals as input u. The VP is trained on the signal u[t] and labeled with y[t]. This example also shows the differences between using different kinds of distributions.
When this is verified, a time shift is introduced to the system:

y[t] = u[t − 1].    (3.5)

Exactly as before, the processor is trained on the signal u[t] and labeled with y[t]. It should be impossible to approximate the system because of the random input data: the VP is not able to guess the value of y[t] knowing only u[t] and not u[t − 1], as there is no correlation between them.
Then the VP is trained on the signal u[t − 1] and labeled with y[t]. This problem is identical to the first one, just a change of variable, and there should be no problem estimating the system with the same accuracy as the system in equation (3.4).
A situation where the exact dynamics of the system are known will not likely
appear in practice. Therefore a test is performed to see what happens if the VP
is over-fed with information. The processor is trained with both u[t] and historical
values of u[t] as input.
3.5.2 Results and discussion
Table 3.7 summarizes these results.
System             Input signals         max ABSE   mean ABSE   RMSE
y[t] = u[t]        u[t]                  2.6        0.4         0.5
                   u[t] Gaussian         38.1       0.4         0.8
y[t] = u[t − 1]    u[t]                  134.7      64.0        73.9
                   u[t − 1]              2.0        0.4         0.5
                   u[t], u[t − 1]        12.7       4.0         4.8
                   u[t], ..., u[t − 3]   89.3       15.6        19.6

Table 3.7. Results of function estimation using the VDS on simple time-lagging examples.
Starting with the system in equation (3.4), the results show that the VP has
no problem doing this estimation using a uniformly distributed random signal as
input. The correlation plot in figure 3.8 visualizes how well the system is estimated
and the values in table 3.7 are very good.
Figure 3.8. Correlation plot of correct vs estimated values using a uniformly distributed
input signal.
Comparing figure 3.9 with figure 3.8 shows the difference between using a Gaussian and a uniformly distributed signal as input. The estimation at extreme values is worse with a Gaussian signal. The reason is that the SOM algorithm approximates the distribution of the training data. Using a uniformly distributed input signal spreads the neurons evenly over the input space, while using Gaussian input places more neurons in the center of the input space and fewer neurons at the perimeters. Fewer neurons in an area give a larger error when discretizing the input signal, and most often result in a larger error in the output signal, as can be seen in figure 3.9.
Figure 3.9. Correlation plot of correct vs estimated values using a Gaussian input signal.
Trying to estimate the system in equation (3.5), the result again confirms the expectations. The correlation between the correct and the estimated values of y is ∼ 0.0043; the system could not be estimated. As proposed, this is solved by lagging the input signal. Now the accuracy is almost identical to the first estimation; the small difference is due to the randomness of the input signals.
The VP is clearly confused when the dynamics are not exactly matched, as can be seen when comparing figure 3.10 with figure 3.8. Here both u[t] and u[t − 1] are used as inputs. The result shows that it is possible to estimate system (3.5), although not as well as when using only u[t − 1] as input. This is because the SOM method uses unsupervised learning: it does not know, during training, that the extra dimension in the input space is useless. This results in a lower resolution. Using more historical signals as input gives even worse results. This shows the importance of having good knowledge of time shifts and other dynamics in the system.
Figure 3.10. Correlation plot of correct vs estimated values when estimating y[t] =
u[t − 1] with the VP. The processor is given u[t] and u[t − 1] as inputs.
3.6 Comparison with other estimation and classification methods
In this section a comparison is made between the VDS and some other simple
methods to get an idea of the performance of the VDS. Classification and function
estimation are the problems chosen as suitable for a comparison.
3.6.1 Cluster classification through minimum distance estimation
The classification of cluster data with the VP can be compared with a simple algorithm implemented in Matlab®. The idea is to estimate the cluster centres by taking an average over the input training data. These averaged centres then act as reference points to which a distance is calculated when an input data point is to be classified. The algorithm is simply⁴ (a sketch of these steps follows the list):
1. Estimate each cluster centre by taking the mean value of all input data that belong to each cluster
2. Calculate the distance from each new input data point to all the cluster centres
3. Classify the input as belonging to the closest cluster centre
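A sketch of the algorithm in Python rather than Matlab (the helper names are our own):

    import numpy as np

    def fit_centres(X, labels):
        """Step 1: each cluster centre is the mean of its training points."""
        labels = np.asarray(labels)
        return {c: X[labels == c].mean(axis=0) for c in np.unique(labels)}

    def classify_point(centres, x):
        """Steps 2-3: distance to every centre; pick the closest."""
        return min(centres, key=lambda c: np.linalg.norm(x - centres[c]))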
                    Classified   Classified   Classified
                    as blue      as red       as black
True value blue     97.10%       0.86%        2.04%
True value red      0.50%        98.18%       1.32%
True value black    1.84%        1.34%        96.82%

Table 3.8. Classification results using the algorithm described.
This algorithm is tested on the second set of cluster data (figure 3.2). The results are presented in table 3.8 and are compared with table 3.2, where most frequent multi-classification resolution is used. The performance is slightly better in the VDS approach.
This shows that the VDS has sufficient capabilities to classify data. The example was a simple one, which is why the comparison algorithm was quite easy to design. If, however, noisy signals are used or the clusters are not Gaussian, it may be much harder to find a suitable algorithm, mainly because the cluster centres are difficult to estimate. The VDS, however, is used the same way regardless, which gives it a strong advantage in simplicity.
Figure 3.11. Correlation plot with estimated values compared to true values using least
squares estimation.
3.6.2 Function estimation through polynomial regression
The estimation of the system in equation (3.1) with the VDS in section 3.4 is put into perspective with a least squares parameter estimation in Matlab®.
With no noise added to inputs and outputs, the estimation problem is deterministic. Therefore only the case with noise is interesting for a comparison. With knowledge of the system, the estimation problem is formulated as:

y = Θ u    (3.6)

where Θ = [θ1 θ2] are the parameters to be estimated and u = [u1 u2]ᵀ the input signals. This is an overdetermined equation system that is solved with least squares minimization.
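Sketched with numpy in place of Matlab, and with plain white noise standing in for the band-limited noise used in the thesis (the regenerated data and noise handling are assumptions, so the numbers will not reproduce table 3.9 exactly):

    import numpy as np

    rng = np.random.default_rng(0)
    u = rng.uniform(0, 2, size=(10000, 2))     # regenerated inputs u1, u2 (assumed)
    y = u @ np.array([600.0, 750.0])           # system (3.1)
    # Noise amplitudes from table 3.5, used here as plain white-noise levels.
    u_noisy = u + rng.normal(scale=5e-4, size=u.shape)
    y_noisy = y + rng.normal(scale=100.0, size=y.shape)

    theta, *_ = np.linalg.lstsq(u_noisy, y_noisy, rcond=None)
    print(theta)   # approximately [600, 750]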
Using exactly the same data set as in the noise example in section 3.4, but using Matlab®'s polynomial regression, the result shown in figure 3.11 is achieved. Comparing this to figure 3.6, the polynomial regression looks worse, and table 3.9 confirms this. The VDS performs better than the polynomial regression.
Method                  Max ABSE   Average ABSE   RMSE
VDS                     1078       189.3          254.0
Polynomial regression   1371       237.1          297.3

Table 3.9. Errors when using polynomial regression compared to the results from the VDS.
⁴ The first step of this algorithm corresponds to the training and labeling steps in the VDS, and steps 2-3 to the classification step.
Chapter 4
Data collection and pre-processing

Collecting and pre-processing data are two subjects that, due to their complexity, could each be the subject of a thesis of their own. Therefore, this report does not treat them in depth; only relevant issues are discussed. It is, however, important to stress that data collection and pre-processing are two key factors for success in developing applications with SOMs.
4.1 Data collection

The entirely data-driven nature of SOMs, and therefore of the VDS, makes data collection extremely important. The training of the SOM is affected by the way data are collected. The
• amount of data
• sampling frequency
• quality of data
• measurement method
are issues that have to be considered.
4.1.1 Amount of data

A rule of thumb is that during the training sequence the VDS should be presented with at least 500 times as many data points as the size of the SOM¹. This is to ensure that the network approximates the probability density function of the input data. For a standard-size VP with 256 neurons, this means 128000 samples.
When not enough data are available, training data can be presented to the SOM in a cyclic way until the required number of data points has been processed². However, doing this may cause information to be lost, e.g. input data with gaps are not representative.
In addition, enough data for both training and validation have to be available.

¹ There is no risk of over-training (over-fitting); the SOM method has no such problems.
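A sketch of how such a training stream could be produced, combining the 500-times rule of thumb with cyclic, randomized presentation (see footnote 2; the helper is our own, not a VDS function):

    import numpy as np

    def training_stream(data, som_size=256, factor=500, seed=0):
        """Yield at least factor * som_size samples, cycling through the data in
        randomized order when the set is smaller than that."""
        rng = np.random.default_rng(seed)
        needed = factor * som_size             # 128000 for a 256-neuron VP
        count = 0
        while count < needed:
            for i in rng.permutation(len(data)):
                yield data[i]
                count += 1
                if count >= needed:
                    return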
4.1.2 Sampling frequency

The sampling frequency has to be considered when dealing with dynamics. As was shown in section 3.5, dynamics can be handled by lagging input and/or output signals. How many and which old values should be used depends on the dynamics of the system combined with the sampling frequency. With a high sampling frequency, many old values are needed to cover the dynamics of the system. On the other hand, a low sampling frequency might cause dynamic information to be lost.
In this thesis a sampling frequency of 10 Hz is used. This is the sampling frequency available in the test cell equipment used.
4.1.3 Quality of data

The quality of data is the most important factor for getting good results. There are many aspects of this, but the two most important are that data have to be representative and correctly distributed. This is, again, due to the fact that the SOM algorithm is data driven and cannot extrapolate or interpolate. The VDS will most likely generate unreliable output if variables are not representative and correctly distributed.
Representative data contain values from the entire operating range. This means that the variables have varied from their maximum to their minimum values during the measurement. All possible combinations of variables should also be available to ensure representativeness.
Data from variables that are not representative should either be omitted or used for scaling/normalization. E.g. the ambient air pressure variable is hard to measure over its full range; therefore it is better to use it to scale other pressure variables that depend upon it.
The distributions of the input variables depend on the test cycle used when
collecting data. Therefore the test cycle has to produce data that are suitable for
the intended application. There are two different categories that are used: uniform
and real life distributions.
Uniform distribution The distribution should be uniform when estimating functions and doing condition monitoring. In both these cases it is important that the application has equal performance over the entire input space, which is why input data should be uniformly distributed.
Real life distribution A classification problem does not require equal performance over the entire input space. It is sufficient, and often advantageous, to make the classification in areas where the input signal resides frequently in a real situation.
Although these are the distributions wanted for all input signals, often not all of them can be controlled individually, e.g. the boost temperature depends on the torque and the engine speed and cannot be controlled manually. This is something that needs to be considered from case to case when constructing the cycle used for gathering data.
4.1.4 Measurement method
Data are collected in an engine test cell where an engine is run in a test bench. In the bench an electromagnetic braking device is connected to the engine. This device brakes the engine to the desired speed, so that different operating conditions can be obtained. Many variables can be measured, but the ones in appendix B are the ones measured for this thesis.
As there are deviations between engine individuals, only one engine is used. In the laboratory, the engine is controlled with test cycles/schemes to simulate different driving styles and routes. It is important that the cycle generates appropriate data that suit the purpose, as discussed in section 4.1.3. For this thesis, two different test cycles are chosen: the Dynamic Cycle, DC, and the European Transient Cycle, ETC. A DC provides the possibility to control variables using different regulators (here the engine speed and the torque are used), whereas the ETC is a standardized cycle.
Aiming for data from the entire dynamical range of the engine, a randomized DC is created. The cycle starts with a torque of 600 Nm and an engine speed of 1350 RPM and then requests new torques and engine speeds twice per second. These values are randomly set under the constraint of preventing the engine from overloading3. In addition, the speed is not allowed to vary by more than 100 RPM/second and the torque by more than 1000 Nm/second. In this way, the cycle will take the engine through a large number of operating conditions.
Two different change rates are used in this thesis. The first sets new desired values two times per second and runs for 2000 seconds. With a sampling frequency of 10 Hz, 20000 samples are collected per cycle. The other sets new desired values five times per 10 seconds and runs for 20000 seconds. The same sampling frequency is used, hence 200000 samples are collected.
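The following Python sketch shows one way such a randomized cycle could be generated. The slew-rate figures are the ones above; the operating-range limits and the starting values are illustrative assumptions, not the actual engine envelope.

    import numpy as np

    def random_dc(duration_s, rate_hz=2.0, speed0=1350.0, torque0=600.0,
                  speed_lim=(600.0, 2000.0), torque_lim=(0.0, 2000.0),
                  dspeed=100.0, dtorque=1000.0, rng=None):
        """Randomized DC set points with slew rates limited to
        dspeed [RPM/s] and dtorque [Nm/s]."""
        rng = np.random.default_rng() if rng is None else rng
        n = int(duration_s * rate_hz)
        speeds, torques = [speed0], [torque0]
        for _ in range(n - 1):
            s = rng.uniform(*speed_lim)
            t = rng.uniform(*torque_lim)
            # clip each step so the rate constraints hold
            speeds.append(np.clip(s, speeds[-1] - dspeed / rate_hz,
                                     speeds[-1] + dspeed / rate_hz))
            torques.append(np.clip(t, torques[-1] - dtorque / rate_hz,
                                      torques[-1] + dtorque / rate_hz))
        return np.array(speeds), np.array(torques)

    speed_sp, torque_sp = random_dc(duration_s=2000)   # the faster cycle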
The ETC is a standardized cycle that contains three phases:
1. city driving - many idle points, red lights, and not very high speed
2. country road driving - a small road with a lot of curves
3. high-way driving - rather continuous high speed driving

3 An overload situation can occur if the engine is running at a very high speed. If the demanded speed drops from a high level to a much lower level in a short time, for example going from 2500 RPM to 1000 RPM in one second, the braking of the engine will cause the torque to peak.
The data from this cycle are not as uniformly distributed as the DC but contain other types of information about the dynamical behavior of the engine. Sequences such as taking the engine from idle to full load are captured in the ETC but not in the DC.
The ETC takes approximately 1800 seconds. Again 10 Hz is used as sampling frequency, so approximately 18000 samples are collected.
Measurements are performed in three different cases, listed in table 4.1. They are performed on different occasions (different days), which is why the quality of the collected data has to be examined closely. Both the DC and the ETC are used, thus six sets of measurements are collected.
Case  Description
1.    Fault free engine
2.    Engine with an 11 mm leakage on the air inlet pipe (after the intercooler, see figure 2.2)
3.    Engine with intercooler efficiency reduced to 80%

Table 4.1. Description of the three different measurement cases.
4.2 Pre-processing
The pre-processing of data can be divided into specific and general pre-processing.
The specific pre-processing deals with how to generate proper files for use as input to the VDS. This can differ a lot depending on how the data files are structured. In addition to this, data may have to be filtered, time shifted, differentiated, etc. This is also, more appropriately, called feature extraction and is more a part of the problem solving method.4 When the input files are in order, data are ready to be used as input to the VDS but not to the VP.
The general pre-processing involves file segmenting, statistics generation and
channel splitting where different toolboxes in the VDS can be used as well as other
software. Also, the general pre-processing includes transforming data to a form the
VP handles. The input interface of the VP uses 8-bit unsigned integers. Due to
this the data, usually provided as floats, are scaled to the range 0-255 and then
converted to unsigned integers. It is convenient to use the VDS for these tasks.
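The VDS performs this scaling internally; the sketch below only illustrates the transformation, assuming the global minimum and maximum of the signal are used as scaling limits.

    import numpy as np

    def to_uint8(x, lo=None, hi=None):
        """Scale a float signal to 0-255 and convert to unsigned
        8-bit integers, as required by the VP input interface."""
        lo = np.min(x) if lo is None else lo
        hi = np.max(x) if hi is None else hi
        scaled = (x - lo) / (hi - lo) * 255.0
        return np.clip(np.round(scaled), 0, 255).astype(np.uint8)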
One variable that is handled in the same way for all applications (chapters 5-7) is the ambient air pressure. The mean values and standard deviations in table 4.2 reveal that this variable is not representative. The pressure values collected stem from the ambient air pressure outdoors, which varies from day to day5. Measurements from all different altitudes and weather conditions are not available, and therefore this variable is used to normalize the turbo boost pressure, which is the only other pressure signal used.

4 The feature extraction is handled in chapters 5-7, where the development of each application is discussed.

Case                        1       2       3
Mean value [kPa]            97.3    98.1    99.7
Standard deviation [kPa]    0.0603  0.0448  0.0604

Table 4.2. Ambient air pressure mean values and standard deviations for the three cases, see table 4.1, using the DC.
5 All cases are measured on different days. This will cause the VDS to produce very impressive results if, for example, a distinction is to be made between case one and two: the VDS will only use the ambient pressure when deciding the winner. Such results are not applicable to a real situation where the ambient air pressure varies more.
Chapter 5
Conditioning - Fault detection
In this chapter a fault detection problem is treated. This is done using the conditioning method in the VDS. The goal is to test if Vindax® can be used to detect some faults in a diesel engine.
5.1 Introduction
The functionality used for fault detection is described in section 3.2. This method
is based on the distance measurement between neurons and input data that can be
extracted from the VDS. When a combination of the engine states produces a larger
distance between the winning neuron and the input signal than occurred during
training, an error is detected, or, at least, an unknown situation has occurred.
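As a rough illustration of the principle, the following NumPy sketch flags inputs that lie further from their winning neuron than anything seen during training. How the VDS computes and thresholds this distance internally is not shown here; in particular, using a single global training maximum is our assumption.

    import numpy as np

    def bmu_distance(codebook, x):
        """Distance from sample x to its winning (closest) neuron."""
        return np.min(np.linalg.norm(codebook - x, axis=1))

    def conditioning_differences(codebook, train_data, test_data):
        """Difference between each test sample's BMU distance and the
        largest BMU distance seen during training; values above zero
        indicate an unknown input combination, i.e. a probable fault."""
        d_max = max(bmu_distance(codebook, x) for x in train_data)
        return np.array([bmu_distance(codebook, x) - d_max for x in test_data])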
5.2 Method
Data from a DC and an ETC, collected according to chapter 4, are used. The
system is trained and labeled with 75% of the data from the engine with no faults.
The verification data consist of the remaining 25% of the fault free data plus data
from an engine with an air leakage and an engine with reduced InterCooler, IC,
efficiency.
The following variables are used as input:
• moving average over 10 values of boost pressure
• moving average over 100 values of boost temperature
• moving average over 100 values of fuel value
• moving average over 100 values of engine speed
The smoothing with moving averages is done to handle the dynamics; see section 6.2 for a more thorough discussion. It would also be possible to delay signals to handle the dynamics. However, since different faults show up with different delays and no obvious delays can be detected, that approach is not used.
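A minimal sketch of how these four smoothed inputs could be assembled is given below. The window lengths are the ones listed above, while the trailing-average implementation and the alignment handling are our own choices.

    import numpy as np

    def moving_average(x, n):
        """Trailing moving average over the last n samples."""
        return np.convolve(x, np.ones(n) / n, mode="valid")

    def detection_inputs(boost_p, boost_t, fuel, speed):
        """Stack the four smoothed inputs listed above, aligned on the
        longest (100-sample) window."""
        m = len(boost_p) - 99
        return np.column_stack([
            moving_average(boost_p, 10)[-m:],
            moving_average(boost_t, 100)[-m:],
            moving_average(fuel, 100)[-m:],
            moving_average(speed, 100)[-m:],
        ])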
5.3 Results and discussion
The results from the conditioning are summarized in table 5.1 and visualized in
figure 5.1.
                  %       mean
Healthy engine    0.9     4.3
Air leakage       11.8    9.1
IC reduced        35.7    33.9
Table 5.1. Results during fault detection with a fault free engine, an engine with an air leakage and an engine with reduced IC efficiency. The second column shows the percentage of all samples where the difference is above zero. The last column shows the mean value of the difference values when the difference is above zero. Values above zero indicate an unknown input combination, i.e. a probable fault.
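The two statistics in the table can be computed from the per-sample difference values as follows; a small helper assuming `diff` holds the differences produced by the conditioning step above.

    import numpy as np

    def summarize(diff):
        """Percentage of samples with a difference above zero, and the
        mean difference over those samples (cf. table 5.1)."""
        positive = diff[diff > 0]
        pct = 100.0 * positive.size / diff.size
        mean_pos = positive.mean() if positive.size else 0.0
        return pct, mean_pos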
This shows that faults can be detected. As explained in section 3.2, difference values above zero indicate a probable fault in the system. There are clearly many more samples with a difference above zero when there is a fault in the system. The differences are also, in general, larger in the faulty cases than in the fault free case.
In the results presented it is clear that the fault detection performs better when
the error is a reduced IC efficiency than when it is an air leakage. This can be
explained by comparing the chosen input variables to variables chosen in chapter 6.
They are similar to the variables chosen for isolation of reduced IC efficiency faults,
which explains the better performance.
It would be the other way around if the chosen variables were more similar to the ones for isolation of an air leakage. Therefore a change of variables in this direction would improve the fault detection when there is an air leakage. This is however not a solid approach, as the idea of conditioning is to detect new, i.e. unseen, types of faults. Input variables should not be adapted to a specific problem but should be quite general to ensure that unseen faults are detected.
Figure 5.1. Results during fault detection with (from top to bottom) a healthy engine, an engine with air system leakage and an engine with reduced IC efficiency. Values above zero indicate an unknown input combination, i.e. a probable fault.
Chapter 6
Classification - Fault detection and isolation
This chapter describes fault detection and isolation with the VDS. The purpose is to demonstrate how to develop such an application and to give an indication of the results that can be expected when using Vindax® for this task. The goal is to test whether or not Vindax® is capable of producing sufficient information to make a diagnosis of a diesel engine inlet air system.
6.1 Introduction
As was described in section 4.1.4, data from three different cases are available. In
engine diagnostics it is desirable to isolate the faults in these cases, i.e. distinguish
a fault free engine from an engine with a leakage in the inlet air system or a reduced
intercooler efficiency. This is what the VDS is going to be used for in this chapter.
Two applications are developed to fulfill the purpose of this chapter: detection of a leakage and detection of reduced intercooler efficiency. DC and ETC data from a fault free engine and an engine with implemented faults are combined and used as input. For training and labeling 75 % of the data are used, and the remaining part is used for verification.
6.2 Method
To develop these applications the network is trained to recognize fault situations. The engine state will reveal these situations through the combination of its state variables, e.g. load, speed, oil and water temperature. Not all variables contain information about the fault, and hence the selection of proper variables is vital.
6.2.1 Leakage isolation
When there is an air leakage, the compressor is not able to build up the intake manifold pressure, i.e. the boost pressure signal, as fast or to the same level as for the fault free engine. Therefore, the boost pressure signal and its derivative are suitable as inputs. Additional state variables are the amount of fuel injected, i.e. the fuel value, and its derivative. This variable is selected because the amount of fuel injected is correlated with the boost pressure; see section 2.2.4 for details.
The following signals are used as inputs and pre-processed as follows:
• boost pressure
Smoothed by taking a 10 sample average and then normalized with the ambient air pressure.
• boost pressure derivatives
The differences between the current boost pressure and the values at t − 4 and t − 8 are used as inputs to incorporate the slope. These two were chosen simply by examining the slope of the boost pressure and trying to incorporate as much useful information about the derivative as possible.
• fuel value
Smoothed by taking a 10 sample average delayed by 7 samples to account for dynamics, see section 3.5. The time constant, 7 samples, is estimated by maximizing the correlation between the fuel value and the boost pressure signals.
• fuel value derivatives
The differences between the current fuel value and the values at t − 4 and t − 8 are used as inputs to incorporate the slope. These two were chosen for the same reasons as the boost pressure derivatives.
The smoothing reduces the effect of noise and dynamic behavior. This makes a total of six input signals.
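The sketch below illustrates this feature construction. The window lengths, lags and normalization are the ones listed above, while `best_lag` shows one way the 7-sample delay could have been estimated by maximizing the correlation; the wrap-around handling is our own simplification.

    import numpy as np

    def moving_average(x, n):
        """Smoothing; mode='same' keeps the array length."""
        return np.convolve(x, np.ones(n) / n, mode="same")

    def best_lag(x, y, max_lag=20):
        """Delay of x relative to y that maximizes their correlation;
        with x = boost pressure and y = fuel value this should come
        out close to 7 samples."""
        corrs = [np.corrcoef(x[lag:], y[:len(y) - lag])[0, 1]
                 for lag in range(max_lag + 1)]
        return int(np.argmax(corrs))

    def leakage_features(boost_p, ambient_p, fuel, lag=7):
        """Six inputs: smoothed/normalized boost pressure, its slopes
        at t-4 and t-8, the delayed smoothed fuel value, and its
        slopes at t-4 and t-8."""
        bp = moving_average(boost_p, 10) / ambient_p
        fv = np.roll(moving_average(fuel, 10), lag)   # delay by 7 samples
        def slopes(s):
            return s - np.roll(s, 4), s - np.roll(s, 8)
        bp4, bp8 = slopes(bp)
        fv4, fv8 = slopes(fv)
        X = np.column_stack([bp, bp4, bp8, fv, fv4, fv8])
        return X[8 + lag:]   # drop samples contaminated by the wrap-around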
Another important variable is the engine speed; it is, however, not suitable as an input. Experiments using engine speed as an input variable give worse results1. The information that resides in the engine speed signal is instead incorporated in another way. As the boost pressure depends on this variable, but not as strongly as on the fuel value, the input space is divided into three areas:
1. Low speed : 0 rpm → 1250 rpm
2. Middle speed : 1150 rpm → 1650 rpm
3. High speed : 1550 rpm → ∞ rpm
This is done to reduce the effect that the engine speed has on the boost pressure.
1 Why this is the case is not obvious. Perhaps the VP is confused as more inputs are introduced, but it could also be that the dynamics are not properly handled, i.e. the signal lagging is not correct. Experiments could be done to investigate this, but the solution presented here gives satisfactory results.
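Since the three ranges overlap by 100 rpm, a border sample belongs to two ranges. The thesis does not spell out how the overlap is resolved; the sketch below assumes a sample is simply routed to every network whose range contains its engine speed.

    # Overlapping engine speed bands from the list above
    RANGES = {
        "low":  (0.0, 1250.0),
        "mid":  (1150.0, 1650.0),
        "high": (1550.0, float("inf")),
    }

    def route(speed):
        """Return the network(s) responsible for a given engine speed;
        samples in the 100 rpm overlaps belong to two networks."""
        return [name for name, (lo, hi) in RANGES.items() if lo <= speed < hi]

    print(route(1200.0))   # ['low', 'mid']
    print(route(1400.0))   # ['mid']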
In each range, the fuel value will become more dominant as the engine speed does
not differ as much.
The ranges are chosen by looking at the distribution of the engine speed signal. The test cycles used, DC and ETC, generate more measurements of the engine speed in the mid range. This is due to the fact that the regulator used to control the engine speed is not optimal, hence the distribution will be slightly 'normalized'. In addition, the ETC is not designed to generate a uniform distribution of the engine speed.
A separate network is used for each area. Their size is 256 neurons, except for the middle range where a 1024 neuron network is tested as well. This is done because the measurements contain more data in this area; there are approximately four times as many samples here compared to the other ranges. Increasing the network size in the other ranges would probably not improve the results much, because the size is already sufficient.
6.2.2 Reduced intercooler efficiency isolation
Reduced intercooler efficiency will result in a different behavior of the intake manifold temperature, i.e. the boost temperature signal. When the efficiency is reduced, the air flowing through the intercooler will not be cooled as efficiently as in a fault free engine.
Both the engine speed and the fuel value, i.e. the amount of fuel injected, affect the boost temperature. In section 2.2.4 it is described how an increase in fuel value causes the temperature to rise, which puts more pressure on the intercooler to cool the air. Also, an increase in engine speed means that more air flows through the intercooler. These two should therefore be the states that the boost temperature depends upon.
The relationship, in time, between the fuel value and engine speed states and the boost temperature cannot be concluded from figure 6.1. It is difficult to estimate a time constant and handle the dynamic behavior by delaying input signals. Instead of delaying signals, a very long average is used. This means that the signals are smoothed and the time delay will not be as significant. This also resembles how the temperature actually changes in the engine, and is hence a way to handle the dynamics. Therefore, these three signals are used as inputs and pre-processed as follows:
• boost temperature
Smoothed by taking a 100 sample average.
• fuel value
Smoothed by taking a 100 sample average.
• engine speed
Smoothed by taking a 100 sample average.
Figure 6.1. Boost temperature from an engine without faults during a DC. The boost temperature is presented in both graphs and compared with, top: the fuel value, and bottom: the engine speed.

Figure 6.2 shows the signals after smoothing. The correlation between the three states is much higher now. The actual changes in fuel value can be recognized in the boost temperature curve. It is harder to see the same relationship between engine speed and boost temperature, but the improvement in results indicates a higher correlation. There is a risk that information is lost when the signals are heavily smoothed, but experiments show good results.
As with leakage detection the problem is divided into three intervals to increase
the resolution. This time, the intervals are in the boost pressure signal:
1. Low pressure: 0 kPa → 75 kPa
2. Middle pressure : 65 kPa → 155 kPa
3. High pressure : 145 kPa → ∞ kPa
6.3 Results and discussion
Tables 6.1-6.2 show the results from the classification. This is achieved by using an activation frequency threshold of 80% for labeling; see section 3.3. This gives a high response frequency while still keeping the share of incorrect classifications at a low level. Appendix C shows results when using 90%, 80%, 70% and 60% as activation frequency thresholds, as well as the most frequent method.

Figure 6.2. Averaged boost temperature from an engine without faults during a DC. The boost temperature is presented in both graphs and compared with, top: the averaged fuel value, and bottom: the averaged engine speed.

The results presented in tables 6.1-6.2 were found to be the best. Different frequency thresholds produce different kinds of information. For the purpose here, passing classification results to a diagnosis algorithm, a high ratio between correct and incorrect classifications has to be achieved. Also, a sufficiently high response frequency, i.e. few non-classifiable signals, should be achieved to be able to make a diagnosis.
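Our reading of the activation frequency labeling in section 3.3 is sketched below: a neuron receives a class label only if that class accounts for at least the threshold share of the samples that activated the neuron during labeling. The VDS internals may differ.

    import numpy as np

    def label_neurons(winners, classes, n_neurons, threshold=0.8):
        """winners : winning-neuron index for each labeling sample
        classes : true class of each labeling sample
        A neuron is labeled with a class only if that class accounts
        for at least `threshold` of its activations; otherwise the
        neuron stays unlabeled and its samples become 'not classified'."""
        labels = {}
        for neuron in range(n_neurons):
            hits = classes[winners == neuron]
            if hits.size == 0:
                continue
            vals, counts = np.unique(hits, return_counts=True)
            if counts.max() / hits.size >= threshold:
                labels[neuron] = vals[counts.argmax()]
        return labels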
true mode \ classified as     NF        FAL       Not cl.
Low rpm
  NF                          13.80%     0.94%    85.27%
  FAL                          1.41%    10.01%    88.58%
Mid rpm (256)
  NF                          38.92%     3.18%    57.89%
  FAL                          2.09%    29.98%    67.93%
Mid rpm (1024)
  NF                          46.76%     4.65%    48.59%
  FAL                          2.13%    48.94%    48.92%
High rpm
  NF                          68.95%     4.93%    26.12%
  FAL                          1.86%    65.72%    32.42%
Over all (256)
  NF                          37.44%     2.89%    59.67%
  FAL                          1.86%    30.86%    67.27%
Over all (256/1024/256)
  NF                          41.81%     3.70%    54.48%
  FAL                          1.89%    41.20%    56.91%

Table 6.1. Classification results with an activation frequency of 80% at labeling. NF = No Fault, FAL = Air Leakage.
true mode \ classified as     NF        FRIE      Not cl.
Low BP
  NF                          48.85%     3.89%    47.25%
  FRIE                         2.61%    51.63%    45.76%
Mid BP
  NF                          66.58%     2.41%    31.00%
  FRIE                         3.01%    70.99%    26.00%
High BP
  NF                          69.60%     2.48%    27.93%
  FRIE                         4.66%    71.87%    23.48%
Over all
  NF                          59.91%     3.03%    37.07%
  FRIE                         3.11%    62.69%    34.21%

Table 6.2. Classification results with an activation frequency of 80% at labeling. NF = No Fault, FRIE = Reduced Intercooler Efficiency.
Chapter 7
Function estimation - Virtual sensor
In this chapter a function estimation problem is solved. The VDS is used as a virtual sensor to estimate the engine torque. The goal is to see how well Vindax® performs as a virtual sensor on a test problem.
7.1 Introduction
When discussing the torque estimation problem, only the DC data are used. This is because the quality of the acquired ETC data is not as high as for the DC data, i.e. the number of samples is too low.1
Two different DC runs have been used. First, the same cycle as in the conditioning and classification problems is used. Then, a DC that is slower and longer is used. The second cycle is run on a different engine. For training and labeling 75 % of the data are used and the rest are used for validation. Only data from engines with no faults are used.
7.2 Method
As discussed in section 2.2, the torque value is connected to the amount of fuel
injected into the cylinders, but also the amount of oxygen pushed into the cylinder
is of interest. This leads to the use of the following variables:
• Fuel value
• Engine speed
• Boost pressure
• Boost temperature
where the last three are connected to the amount of oxygen.

1 See section 4.1.4 for a description of the two test cycles.
It is important to realize what using the fuel value as one of the input signals leads to. This signal is an input signal to the fuel system in the engine, specifying the amount of fuel that should be injected. If there is a fault in the system, e.g. a clogged injector, this is not the amount that will actually be injected. This might lead to a fault intolerant torque estimation.
7.3 Results and discussion
The quality of the estimation is measured by comparing the estimated torque value with the torque value used for labeling, i.e. the torque value measured in the test bench. This result is put in relation to the torque estimation made by the EMS.
The best result on the first data set is achieved using fuel value, engine speed and boost pressure as input signals. The RMSE is then 146.5 Nm, compared to 165.5 Nm for the estimation made by the EMS. If boost temperature is also used, which might be desirable to compensate for faults in the system, the RMSE increases to 158.5 Nm.
On the second set of data the corresponding RMSE value is 90.1 Nm (using all four variables), compared to 66.9 Nm for the estimation by the EMS. Using 1024 neurons lowers the RMSE value to 74.6 Nm. This set of data has slower changes and not as many transients as the first set of data.
Two things can be seen that make the results worse. The first is a resolution problem: it is not possible to get a much better result using only 256/1024 neurons. In addition, the I/O interface of the VP limits the resolution, as it uses 8-bit integers for input/output, which causes quantization errors. The second problem is unaccountable peaks in the torque signal, several hundred Nm in 0.1 seconds, from the test bench, as can be seen in figure 7.1.
These peaks cannot be explained2 and are of no interest, as they are too short to affect the truck. Therefore, they can be removed by a filter that eliminates peaks larger than 100 Nm. This filter is applied to the signal before using it for labeling and verification. Then the RMSE (without boost temperature) decreases to 67.0 Nm with 256 neurons and 55.4 Nm with 1024 neurons. The RMSE for the EMS estimation also decreases, down to 55.5 Nm.
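The thesis does not specify the exact filter; a minimal despiking sketch consistent with the description, eliminating single-sample peaks larger than 100 Nm, could look as follows.

    import numpy as np

    def remove_peaks(torque, limit=100.0):
        """Replace single-sample spikes that deviate more than `limit`
        Nm from the neighbouring samples with their average."""
        out = torque.astype(float).copy()
        for i in range(1, len(out) - 1):
            neighbours = 0.5 * (out[i - 1] + out[i + 1])
            if abs(out[i] - neighbours) > limit:
                out[i] = neighbours
        return out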
For the faster set of data, the estimation made by the VP is clearly better than the EMS estimation. For the slower set of data it is a more even game, but here a resolution issue in Vindax® can be seen. The increase in neurons from 256 to 1024 gives a clear increase in the accuracy of the estimation. This suggests that more neurons probably would improve the results even more.
For the second, slower, set of data, more variables should probably also be used. This set of data is collected from an engine equipped with an exhaust gas recirculation system that takes some of the exhaust gases and leads them into the inlet air system. This is a way to reduce the emissions, but it also results in less oxygen in the air that is pushed into the cylinders. The amount of oxygen is one of the key factors used in the estimation, so using more variables, incorporating the amount of oxygen lost, might improve the results for the slower set of data.

2 They are probably a result of the test bench configuration and are not present when the engine is installed in a truck.

Figure 7.1. Unaccountable peaks in the torque signal from the test bench.
Chapter 8
Conclusions and future work
In this chapter, conclusions from the work and suggestions for future work in the area are presented.
8.1 Conclusions
The probability density functions of the input signals determine the structure of the SOM. This is all according to SOM theory and a confirmation that the VDS follows it. Depending on the application area, the distributions of the input signals have to be analyzed before using them for training, i.e. the training data should be suitable for the purpose of the network.
A desired distribution of data can however be extremely hard to achieve. Often
measurements are done by controlling a limited number of variables. It is possible
to control the distribution of these but it is hard, or even impossible, to control the
distribution of the rest of the measured variables.
An advantage, and disadvantage, of the SOM approach is that prior knowledge most often cannot be invoked to improve the results. System identification usually starts with assuming the order of the system and then estimating parameters. With prior knowledge about the system it is sometimes very straightforward to estimate the parameters, but with a more complex system this becomes harder to do. Therefore this is both an advantage and a disadvantage.
Some types of prior knowledge might be incorporated by smart choices of input signals, e.g. using derivatives instead of absolute values, combining signals into new signals, etc.
Noise redundancy properties are hard to conclude. The numbers in table 3.6 showed an estimation error six times as high when the noise was introduced. Compared with the polynomial regression, the performance was slightly better. More studies should be performed to investigate the connection between the noise and the estimation error. It is however clear that the signals should not be too noisy if a good result is to be obtained.
There is not much difference between working with linear and non-linear systems. This is the nature of the SOM network: it does a mapping of the input data independently of what the system is like. Sending the same input data to a linear and a non-linear system gives the same network; in the end the system layout only affects the labeling. This is a big benefit of the SOM based methods.
The SOM method is a mapping for purely static relationships. It has no ability to handle dynamic data. But by feeding it with old input and/or output signals as input, the problem goes from being dynamic to being static. In this way SOMs (including the VDS) can be used to solve dynamic problems.
When dealing with dynamics, a problem is to find out which input (and/or output) signals need to be lagged, and by how much. The optimum is when all dynamics of the system are included, but nothing more. Using signals that are irrelevant makes the estimation worse, as the VP tries to adapt to information that is not relevant to the problem.
As was mentioned in section 3.1, the thesis evaluated a PCI version of the VP. The processor is accessed through the VDS. This creates a very fast system, and the time required to train and label a network is negligible compared to the pre-processing. Using the software emulation mode is, however, very slow and usually takes more than an hour1.

1 This depends on the performance of the PC that is used.
The VDS has a strong advantage in visualization. When the VP is trained, it is easy to visualize with different kinds of graphs how the training turned out. Then, for example, adjustments in data pre-processing can be made to get better performance.
The conclusions so far are general and applicable to all usage areas. The following sections handle each area more specifically.
8.1.1 Conditioning
The conditioning functionality of the VDS is questionable for detection of faults. The reason is that the VDS requires very precise handling of dynamics. A possible fault that is known beforehand can probably be detected by designing a VDS application for it, but whether or not unknown errors will be recognized is impossible to say. Probably, such a new error shows up in a specific state, and this information has to be extracted in a proper way for the error to be detected.
There will be a problem if unseen data are presented to the VP, as they will be classified as faulty. This means that all possible combinations have to be measured and used to train the VP if the conditioning is going to be useful.
8.1.2 Classification
When working with classification problems, the VDS has a strong advantage in simplicity. The method to design an application is basically the same, no matter the dimensionality or other aspects that complicate the problem. This also makes it possible to get results very fast, and it is also very easy to change classification boundaries to try out a good combination.
There is a high potential in this area. As the demands on representativeness of data are lower when making a diagnosis, i.e. a diagnosis does not have to be made during every operating condition, the training data do not need to incorporate all engine properties.
8.1.3 Function estimation
The accuracy of the function estimation depends on how many neurons are used in the SOM and on the number of misclassified signals. In the example with noisy signals, the number of misclassified signals was high. This caused the error to grow, although the neuron labels had approximately the same distance, i.e. the network had the same resolution.
As was discussed in section 7.3, there is a resolution problem in some cases. In general this applies to function estimation, but it could be an issue for the other areas as well. The I/O interface uses 8-bit integers, i.e. 256 levels, which means that there will be quantization errors in both the input signals and the output signal. It is a disadvantage that the output is discrete and limited to as many values as the number of neurons. This might be compensated for with hierarchies in the VDS, but in the end it is a limitation.
8.2 Method criticism
The same kind of data have been used for training as for verification. It can be argued that this approach does not give a true image of the VDS capabilities. Classifying other kinds of data, e.g. more stationary data, may not give as good results, because these are unseen data that the system is not trained to classify.
A temporary solution would be to incorporate part of the new data into the training set; the system would then be able to classify it. Then again, a new type of data could be introduced, and so on. The solution to this problem is to use training data that capture sufficient properties of the system to be able to classify signals and make a diagnosis.
8.3 Future work
Future work would probably involve developing a new and better test cycle with both faster and slower parts. More variation in the ambient air pressure and in all the temperatures is also needed. New hierarchies of networks could increase the resolution, where the separation can be based on different intervals in fuel value, engine speed or other variables. Additionally, to improve the torque estimation results it would probably be necessary to incorporate flywheel information.
Some of the input signals used for torque estimation might be sensitive to faults in the system. This should be tested, and some new input signals might be needed to compensate for this. Also for this reason, looking into flywheel information is preferable.
The data used in this thesis have been collected in an engine laboratory on one engine individual. The collection of data should also be examined when it comes to deploying a VP-based application in a truck. Some adaptations will probably be needed for this, which is something that should be investigated. Whether a network trained on one engine individual can be used on another also needs looking into. For the method to be useful in an engine, this must be solved without, or with just minor, modifications to the network.
In the fault isolation, such an important variable as the engine speed made the isolation worse when it was included as an input variable. The approach with separation into different intervals more or less solved this problem. It would however be interesting to investigate why this behavior was encountered.
The dynamic behavior has been handled by delaying input signals or using long averages. When a signal is delayed, it is important that the estimated time constant is correct, see section 3.5. These time constants change depending on the driving conditions, e.g. engine speed, ambient air temperature, coolant temperature, and so on.
Further investigation should be done to handle the varying time constants; these are some suggestions:
• Sampling based on crank-shaft angle instead of time. This would eliminate the effect of engine speed on the time constants.
• Use separate networks for different temperature areas. In each area a separate time constant could be estimated to give a higher accuracy.
The long-average solution to dynamics used a single averaging length. Experiments with different lengths would reveal whether the accuracy can be improved further. Also, a combination where the boost temperature signal is not averaged while fuel value and engine speed are would be an interesting test.
The torque estimation did not include any delay of signals, because the torque is a very fast 'output signal'. Delaying the inputs one or two samples may improve the results even more and could be investigated.
Appendix A
RMSE versus ABSE
Perhaps the easiest way of measuring an error is to look at the absolute error in
every sample and take the mean value or the maximum value. In this thesis this
measure is called the mean/max ABSolute Error, ABSE.
$$\text{mean ABSE} = \frac{1}{N} \sum_{i \in [1,N]} |\hat{x}_i - x_i|$$
$$\text{max ABSE} = \max_{i \in [1,N]} |\hat{x}_i - x_i|$$
Here $\hat{x}$ is the measured value, $x$ is the real/theoretic value and $N$ is the number of samples.
One of the most common error measurements is the Root Mean Square Error, RMSE, which is a quadratic version of the mean ABSE. The name comes from the formula by which the value is calculated:
$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i \in [1,N]} (\hat{x}_i - x_i)^2}$$
The RMSE gives a value that is more sensitive to large errors than the mean ABSE: when the errors are squared, large errors get more emphasis.
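The three measures transcribe directly into code; a minimal NumPy version:

    import numpy as np

    def mean_abse(x_hat, x):
        return np.mean(np.abs(x_hat - x))

    def max_abse(x_hat, x):
        return np.max(np.abs(x_hat - x))

    def rmse(x_hat, x):
        return np.sqrt(np.mean((x_hat - x) ** 2))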
Appendix B
Measured variables
The following sensor variables are available:
• Time
• Ambient air temperature
• Ambient air pressure
• Oil temperature
• Coolant temperature
• Boost temperature
• Boost pressure
• Engine speed
Boost temperature and pressure are measured in the intake manifold.
In addition to this, calculated values, e.g. torque and amount of fuel injected,
are available from the EMS.
Appendix C
Fault detection results
In this appendix, results from the fault isolation problem in chapter 6 are listed. Results from using 90%, 80%, 70% and 60% as activation frequency thresholds, as well as using the most frequent method, when trying to isolate air leakage and reduced intercooler efficiency faults are presented. For more details see section 6.2.
C.1 90 % activation frequency resolving
true mode \ classified as     NF        FAL       Not cl.
Low rpm
  NF                           6.96%     0.28%    92.76%
  FAL                          0.08%     6.03%    93.89%
Mid rpm (256)
  NF                          26.43%     1.35%    72.21%
  FAL                          0.57%    12.47%    86.96%
Mid rpm (1024)
  NF                          39.35%     1.33%    59.32%
  FAL                          0.69%    31.46%    67.85%
High rpm
  NF                          64.28%     2.00%    33.73%
  FAL                          0.93%    53.32%    45.75%
Over all (256)
  NF                          27.83%     1.18%    70.99%
  FAL                          0.50%    18.00%    81.50%
Over all (1024)
  NF                          35.03%     1.17%    63.80%
  FAL                          0.57%    28.36%    71.08%

Table C.1. Classification results with an activation frequency of 90% at labeling. NF = No Fault, FAL = Air Leakage.
true mode \ classified as     NF        FRIE      Not cl.
Low BP
  NF                          36.10%     0.63%    63.27%
  FRIE                         0.53%    34.23%    65.24%
Mid BP
  NF                          55.07%     0.71%    44.22%
  FRIE                         0.91%    58.85%    40.24%
High BP
  NF                          53.44%     0.56%    46.00%
  FRIE                         1.51%    63.48%    35.01%
Over all
  NF                          47.04%     0.65%    52.31%
  FRIE                         0.84%    48.87%    50.28%

Table C.2. Classification results with an activation frequency of 90% at labeling. NF = No Fault, FRIE = Reduced Intercooler Efficiency.
C.2 80 % activation frequency resolving
true mode \ classified as     NF        FAL       Not cl.
Low rpm
  NF                          13.80%     0.94%    85.27%
  FAL                          1.41%    10.01%    88.58%
Mid rpm (256)
  NF                          38.92%     3.18%    57.89%
  FAL                          2.09%    29.98%    67.93%
Mid rpm (1024)
  NF                          46.76%     4.65%    48.59%
  FAL                          2.13%    48.94%    48.92%
High rpm
  NF                          68.95%     4.93%    26.12%
  FAL                          1.86%    65.72%    32.42%
Over all (256)
  NF                          37.44%     2.89%    59.67%
  FAL                          1.86%    30.86%    67.27%
Over all (1024)
  NF                          41.81%     3.70%    54.48%
  FAL                          1.89%    41.20%    56.91%

Table C.3. Classification results with an activation frequency of 80% at labeling. NF = No Fault, FAL = Air Leakage.
true mode \ classified as     NF        FRIE      Not cl.
Low BP
  NF                          48.85%     3.89%    47.25%
  FRIE                         2.61%    51.63%    45.76%
Mid BP
  NF                          66.58%     2.41%    31.00%
  FRIE                         3.01%    70.99%    26.00%
High BP
  NF                          69.60%     2.48%    27.93%
  FRIE                         4.66%    71.87%    23.48%
Over all
  NF                          59.91%     3.03%    37.07%
  FRIE                         3.11%    62.69%    34.21%

Table C.4. Classification results with an activation frequency of 80% at labeling. NF = No Fault, FRIE = Reduced Intercooler Efficiency.
C.3 70 % activation frequency resolving
true mode \ classified as     NF        FAL       Not cl.
Low rpm
  NF                          19.86%     3.95%    76.19%
  FAL                          3.98%    19.10%    76.92%
Mid rpm (256)
  NF                          45.02%     8.60%    46.39%
  FAL                          4.19%    45.89%    49.92%
Mid rpm (1024)
  NF                          57.70%     8.73%    33.57%
  FAL                          5.59%    59.75%    34.66%
High rpm
  NF                          75.44%     7.98%    16.58%
  FAL                          4.71%    77.31%    17.98%
Over all (256)
  NF                          43.60%     7.24%    49.16%
  FAL                          4.23%    44.12%    51.65%
Over all (1024)
  NF                          50.67%     7.32%    42.01%
  FAL                          4.99%    51.68%    43.33%

Table C.5. Classification results with an activation frequency of 70% at labeling. NF = No Fault, FAL = Air Leakage.
true mode \ classified as     NF        FRIE      Not cl.
Low BP
  NF                          57.28%    10.05%    32.67%
  FRIE                         4.86%    67.50%    27.63%
Mid BP
  NF                          73.99%     4.47%    21.55%
  FRIE                         5.27%    76.04%    18.69%
High BP
  NF                          77.59%     4.33%    18.08%
  FRIE                         6.49%    76.52%    16.98%
Over all
  NF                          67.83%     6.72%    25.45%
  FRIE                         5.30%    72.39%    22.31%

Table C.6. Classification results with an activation frequency of 70% at labeling. NF = No Fault, FRIE = Reduced Intercooler Efficiency.
C.4 60 % activation frequency resolving
true mode \ classified as     NF        FAL       Not cl.
Low rpm
  NF                          32.68%    14.12%    53.20%
  FAL                         12.67%    31.52%    55.81%
Mid rpm (256)
  NF                          50.41%    20.17%    29.42%
  FAL                         10.91%    61.95%    27.15%
Mid rpm (1024)
  NF                          67.51%    14.61%    17.88%
  FAL                         11.07%    72.00%    16.93%
High rpm
  NF                          80.99%    11.03%     7.98%
  FAL                          8.49%    89.26%     8.25%
Over all (256)
  NF                          51.05%    16.69%    32.25%
  FAL                         10.96%    57.37%    31.67%
Over all (1024)
  NF                          60.54%    13.86%    25.60%
  FAL                         11.05%    62.85%    26.10%

Table C.7. Classification results with an activation frequency of 60% at labeling. NF = No Fault, FAL = Air Leakage.
true mode \ classified as     NF        FRIE      Not cl.
Low BP
  NF                          69.15%    14.53%    16.32%
  FRIE                        10.66%    75.83%    13.52%
Mid BP
  NF                          79.77%     7.57%    12.66%
  FRIE                         4.56%    43.23%    52.21%
High BP
  NF                          84.77%     7.31%     7.93%
  FRIE                        12.07%    78.88%     9.05%
Over all
  NF                          76.35%    10.36%    13.29%
  FRIE                         7.42%    57.99%    34.59%

Table C.8. Classification results with an activation frequency of 60% at labeling. NF = No Fault, FRIE = Reduced Intercooler Efficiency.
C.5 Most frequent resolving
true mode \ classified as     NF        FAL       Not cl.
Low rpm
  NF                          58.85%    39.72%     1.42%
  FAL                         34.74%    63.13%     2.13%
Mid rpm (256)
  NF                          67.90%    31.95%     0.16%
  FAL                         19.86%    79.89%     0.24%
Mid rpm (1024)
  NF                          76.16%    22.84%     1.00%
  FAL                         17.99%    81.28%     0.73%
High rpm
  NF                          84.73%    14.21%     1.06%
  FAL                         11.78%    87.48%     0.74%
Over all (256)
  NF                          68.42%    30.93%     0.65%
  FAL                         22.52%    76.63%     0.85%
Over all (1024)
  NF                          73.03%    25.85%     1.12%
  FAL                         21.50%    77.37%     1.12%

Table C.9. Classification results with most frequent as multi-classification resolving. NF = No Fault, FAL = Air Leakage.
true mode \ classified as     NF        FRIE      Not cl.
Low BP
  NF                          79.87%    19.99%     0.14%
  FRIE                        19.21%    80.59%     0.20%
Mid BP
  NF                          87.01%    12.86%     0.14%
  FRIE                        13.41%    86.39%     0.19%
High BP
  NF                          88.48%    11.21%     0.31%
  FRIE                        13.70%    85.70%     0.59%
Over all
  NF                          84.37%    15.47%     0.17%
  FRIE                        15.99%    83.75%     0.26%

Table C.10. Classification results with most frequent as multi-classification resolving. NF = No Fault, FRIE = Reduced Intercooler Efficiency.
In English
The publishers will keep this document online on the Internet - or its possible
replacement - for a considerable time from the date of publication barring
exceptional circumstances.
The online availability of the document implies a permanent permission for
anyone to read, to download, to print out single copies for your own use and to
use it unchanged for any non-commercial research and educational purpose.
Subsequent transfers of copyright cannot revoke this permission. All other uses
of the document are conditional on the consent of the copyright owner. The
publisher has taken technical and administrative measures to assure authenticity,
security and accessibility.
According to intellectual property law the author has the right to be
mentioned when his/her work is accessed as described above and to be protected
against infringement.
For additional information about the Linköping University Electronic Press
and its procedures for publication and for assurance of document integrity,
please refer to its WWW home page: http://www.ep.liu.se/
© Conny Bergkvist and Stefan Wikner