Transactions on Information and Communications Technologies vol 29, © 2003 WIT Press, www.witpress.com, ISSN 1743-3517
Data mining applied for analysis of fault
sequences in electronic circuits
A. C. G. Oliveira¹, J. P. L. Bressan¹, L. E. Zárate¹ & N. J. Vieira²
¹ Department of Computer Science, Pontifícia Universidade Católica de
Minas Gerais, Brazil
² Department of Computer Science, Universidade Federal de Minas
Gerais, Brazil
Abstract
In electronics industries, more precisely in the UPS (Uninterruptible Power
Supply) industry, there is a challenge to discover an optimized testing sequence
for electronic cards. If a fault occurs in a test, the procedure is to repair the
electronic card immediately; however the test may depend on previous tests that
should be repeated. The situation could be time consuming if a fault occurs at the
end of a testing sequence. In some cases, the whole testing sequence might have
to be repeated. The optimized testing sequence is obtained by discovering test
patterns and bringing them to the beginning of the sequence. The purpose of this
paper is to develop an optimized testing sequence method reducing fault
detection time by using a structure capable of finding test patterns that often
occur in a database. This paper uses the KDD (Knowledge Discovery in
Databases) methodology for data preparation and application of data mining
techniques. Data mining techniques are used to discover test patterns to optimize
future testing sequences. The project focuses on data mining applications.
1 Introduction
The last two decades were marked by fast growth in the amount of
electronically stored information. The low cost of storage has contributed to
this growth; however, a large amount of data does not by itself mean knowledge,
because no one is able to read and assimilate all the information. Data mining,
on the other hand, is the automated process of capturing and analyzing large
amounts of data to discover important hidden relations. Data mining combines
artificial intelligence technologies, statistics, data modeling techniques and
database technology [1, 2].
The KDD (Knowledge Discovery in Databases) process, shown in figure 1,
consists of discovering knowledge in databases, including data preparation,
knowledge acquisition and data reporting [3]. The process basically comprises
the following stages [1, 3, 4]:
Figure 1: Steps for Knowledge Discovery in Databases (KDD).
Data selection: after defining the problem, a set of related data is selected
and collected in order to solve it.
Cleaning: data from real-world sources are often erroneous, incomplete, and
inconsistent. For this reason, a treatment strategy must be formulated.
It includes basic operations such as noise removal and treatment of missing data.
Enhancement and enrichment: enhances the data with additional sources of
information to increase the likelihood of success.
Coding or transforming: determines the most relevant features and derives
those that are useful. It includes operations on the database to transform or
simplify data in order to prepare it for data mining algorithms.
Data Mining: applies algorithms to the transformed data in order to generate
the expected results.
Data reporting: visualization through written or graphical tools is used to
present the mined knowledge to users.
In electronics industries such as the UPS industry, a series of tests in search
of faults is performed on the electronic cards to assure the quality of the
final product. If a fault occurs, the procedure is to repair it immediately.
Consequently, previous tests may have to be repeated. This can be time
consuming if the fault occurs at the end of the testing sequence. In some
cases, all the previous tests might have to be repeated. By discovering these
test patterns, it is possible to rearrange the sequence in order to create an
optimized testing sequence.
The purpose of this paper is to develop an analysis method for testing
sequences that reduces fault detection time. The cost of the process is
expected to be reduced and productivity increased.
This paper uses the KDD methodology [5, 6] described above to find test
patterns that optimize the electronic cards' testing sequence. The database
stores 100,000 records of electronic cards used in a UPS model.
However, the database only records whether or not a fault occurred in a test.
For this reason, an enhancement process is needed; it is applied using meta
rules that add to the database the testing sequences that occurred after the
failed test.
A cleaning process is also applied to the selected database, eliminating
inconsistent testing sequences.
To enhance the database, meta rules are created based on technicians'
experience. They are conditions and restrictions of the testing sequence that
will also be applied to the suggested optimized sequence.
The project focuses on the data mining application. A method is proposed to
discover test patterns that are used to optimize the testing sequence. The idea
is to find test patterns and their occurrence frequency in the database,
allowing the "suggestion" of an optimized testing sequence.
This paper is divided into the following sections: in Section 2, a description
of the database is presented; in Section 3, the KDD methodology is applied; in
Section 4, results are shown. In the last section, conclusions are presented.
2 Description of the considered databases
The definition of the problem space to be investigated mentions the discovery of
patterns of testing faults. The database studied contains the history for all the
testing sequences of the electronic cards. It has 18 attributes and a total of
100,000 records. A typical record of the original database, with categorical
names, is expressed in eqn (1).
Record = {MD, UD, SN, FD, ND, TD, TT, RD, VD, RT, AT, TS, DE, HR}    (1)
where: MD = model id; UD = unit id; SN = testing sequence number; FD =
fault id; ND = node id; TD = test id; TT = test status; RD = test item id; VD =
values read (3 attributes: read, maximum and minimum); RT = test item status;
AT = test arrangement; TS = responsible; DE = date (2 attributes: beginning,
ending date); HR = hour (2 attributes: beginning, ending hour).
The selection of the attributes relevant to the problem is a difficult task.
These attributes represent facts or judgments. The larger the number of
attributes that represent judgments, the greater the imprecision of the
discovered knowledge [7]. Therefore, each attribute should be analyzed alone
and then together with the other attributes. For example, the attribute (SN)
corresponds to the arrangement of the testing sequence, in other words, how
many testing sequences there were and which one happened first. Attributes
that represent judgments should be enhanced and enriched during the data
preparation process of KDD.
3 Application of the KDD methodology
KDD methodologies vary from author to author [4, 5] but they all deal with
the same stages necessary for accomplishing the process. In general, the
stages of KDD involve: Selection, Pre-processing, Data Mining, Interpretation
and Visualization. Each stage involves other steps that depend specifically
on the problem. Fig. 2 shows the structure of the stages applied in this paper.
[Figure 2 labels: Historical DB; Selection; Attributes; Circuit A, Circuit B, Circuit C; ... of Attributes; Circuit C DB; Validating Sequences; Selecting Sequences; Real testing Sequence DB]
Figure 2: Applied KDD methodology.
3.1 Data preparation and meta rules definition
During data preparation, the focus is on having a better understanding of what
the data represents, how and why it is transformed and how to enhance it using
meta rules. The main purpose of data preparation is to manipulate and transform
data so that the information it contains can be revealed [5].
In this paper, data preparation is composed of selection (data selection,
ordering data, choosing relevant attributes), enhancement of attributes, and
cleaning of the database (removing incomplete sequences, validating sequences,
selecting relevant sequences).
3.1.1 Selection
Data selection: the database used has stored 100,000 records, containing the
history of all the testing sequences done with the UPS for a specific electronic
circuit, fig. 3. Each record has information about only one test of the testing
sequence.
[Figure 3 labels: Model; Circuit; T1, T2, ..., T15; Circuit A; Circuit B; "of 10 years"]
Figure 3: Selection of the data sets.
The sequences Sq_i have 15 tests, eqn (2), and are represented by the set C of
eqn (3):
Sq_i = {T1, T2, ..., T15}    (2)
C = {Sq_1, Sq_2, ..., Sq_n}    (3)
where: n is the number of testing sequences; and each Ti takes the value 0
(test failed) or 1 (test ok). Appendix 1 shows the description of each test.
After data set selection, the database was reduced to 58,215 records
representing a total of 1,961 stored testing sequences.
Ordering data: the records are not stored in the order in which the real
testing sequence was executed. They were rearranged chronologically to assure
the sequences' consistency.
Choosing relevant attributes: attributes were removed through SQL queries,
based on an analysis of the database: attributes that do not contribute to the
definition of the problem space were excluded. Examples are the technicians'
names and the numerical values read from a test. In total, 14 attributes were
excluded.
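The grouping and chronological ordering described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the tuple layout (unit id, timestamp, test id, status) and the function name build_sequences are assumptions made for this example only.

```python
from collections import defaultdict


def build_sequences(records):
    """Group per-test records by unit and order each group by time.

    `records` is an iterable of (unit_id, timestamp, test_id, status)
    tuples. This layout is hypothetical, chosen only for illustration.
    Returns {unit_id: [(test_id, status), ...]} in execution order.
    """
    by_unit = defaultdict(list)
    for unit_id, timestamp, test_id, status in records:
        by_unit[unit_id].append((timestamp, test_id, status))

    sequences = {}
    for unit_id, rows in by_unit.items():
        rows.sort(key=lambda row: row[0])  # chronological order
        sequences[unit_id] = [(test_id, status) for _, test_id, status in rows]
    return sequences
```

Only the four retained fields are carried through, mirroring the removal of the attributes that do not contribute to the problem space.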
3.1.2 Enhancement and enrichment
In eqn (2), each test carries only boolean information: failed or not. For this
reason, the data set had to be enhanced with meta rules: the tests that were
repeated when a fault occurred were introduced. The meta rules control the
execution of the testing sequences and are defined based on technicians'
experience. For example, in a testing sequence, the first test to be done is
the visual inspection of the equipment (T1, see Appendix 1), which checks for
physical problems like broken or dented components. The corresponding meta
rule (MR) for this process is to start all the testing sequences by inspecting
the electronic card visually.
Another meta rule application is in the dependence of the tests. If a test fails
and it depends on results from previous tests, the process will restart from the
dependent tests. To explain the creation of the meta rules, consider the following
testing sequence read from the database:
If the fault occurs in T4 (T4 = 0), the meta rule is:
MR1) IF T4 = 0 THEN Start (T3)
The real testing sequence would be:
If the pattern [T3, T4] repeats frequently, the new testing sequence would be:
Sq optimized: {T1, T3, T4, T2, T5, ..., T15}
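To make the enhancement step concrete, the sketch below applies a restart meta rule to a planned test order and a set of tests that fail on their first attempt. It is a hypothetical Python illustration: the dictionary representation of the meta rules, the function name apply_meta_rules, and the assumption that a repaired test passes on its second attempt are not taken from the paper.

```python
def apply_meta_rules(planned, fails_once, restart_from):
    """Return the executed test order given restart meta rules.

    planned      -- test numbers in planned order, e.g. [1, 2, ..., 15]
    fails_once   -- set of tests that fail on their first attempt
                    (assumed to pass after the card is repaired)
    restart_from -- meta rules as {failed_test: test_to_restart_from},
                    e.g. {4: 3} for "IF T4 = 0 THEN Start (T3)"
    """
    executed = []
    pending = set(fails_once)
    i = 0
    while i < len(planned):
        test = planned[i]
        executed.append(test)
        if test in pending:
            pending.discard(test)  # repaired, so it passes next time
            # Go back to the test named by the rule; without a rule,
            # simply repeat the failed test.
            i = planned.index(restart_from.get(test, test))
        else:
            i += 1
    return executed
```

With MR1, apply_meta_rules(list(range(1, 16)), {4}, {4: 3}) gives the executed order 1, 2, 3, 4, 3, 4, 5, ..., 15.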
Figure 4: Data cleaning process.
3.1.3 Cleaning
Removing incomplete sequences: the data cleaning process eliminates all the
sequences that have missing tests, to guarantee the integrity of the testing
sequence, fig. 5. This measure avoids acquiring inconsistent knowledge and
information from the history table in the discovery of fault patterns, fig. 4.
A total of 383 sequences were excluded, leaving 1,578.
The eqn (4) describes this procedure:
(4)
for each unit do
    read tests from file
    if the sequence has at least one occurrence of each of the 15 tests
    then
        sequence is ok
        record sequence to file
    else
        remove sequence from file
Figure 5: Applied algorithm for removing incomplete sequences.
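The check in Fig. 5 can be sketched in Python as below. The representation of a sequence as a list of test numbers keyed by unit id, and the function names, are assumptions made for illustration; file handling is omitted.

```python
def is_complete(sequence, n_tests=15):
    """True if every test 1..n_tests occurs at least once in the sequence."""
    return set(range(1, n_tests + 1)) <= set(sequence)


def remove_incomplete(sequences, n_tests=15):
    """Keep only the units whose sequence contains all n_tests tests.

    `sequences` maps a unit id to its list of executed test numbers.
    """
    return {unit: seq for unit, seq in sequences.items()
            if is_complete(seq, n_tests)}
```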
Validating sequences: this stage eliminates testing sequences that were not
stored correctly, fig. 6. For example, a technician detected a failed test but
did not repeat the whole testing sequence, or did repeat it but it was not
recorded. This is a real problem that industry faces. In some cases, testing
sequences were recorded with discontinuous tests (5). A total of 8 sequences
were excluded, leaving 1,570.
for each unit do
    read tests from file
    if each next test is greater than the previous test by one or
       the next test is equal to the previous test or
       the next test is less than the previous test
    then
        sequence is ok
        record sequence to file
    else
        remove sequence from file
Figure 6: Applied algorithm for validating sequences.
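The three conditions of Fig. 6 reduce to a single test: a step is acceptable unless it jumps forward by more than one test. A minimal Python sketch, assuming a sequence is a list of integer test numbers (the function name is hypothetical):

```python
def is_valid(sequence):
    """True if no test is skipped.

    Each step must advance by exactly one test, repeat the same test,
    or go back to an earlier test; a forward jump of two or more tests
    marks the sequence as discontinuous.
    """
    return all(nxt <= prev + 1 for prev, nxt in zip(sequence, sequence[1:]))
```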
Selecting relevant sequences: this stage eliminates all testing sequences that
did not present any fault, fig. 7. A total of 1,387 sequences were excluded,
leaving 183.
for each unit do
    read tests from file
    count how many tests in the sequence
    if the total of tests in the sequence equal to 15
    then
        no faults occurred
        remove sequence from file
    else
        record sequence to file
Figure 7: Applied algorithm for selecting relevant sequences.
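Because incomplete sequences have already been removed, a sequence of exactly 15 executed tests cannot contain a repetition, which is what Fig. 7 relies on. A minimal Python sketch under that assumption (function names and the dictionary layout are hypothetical):

```python
def had_fault(sequence, n_tests=15):
    """True if the sequence length differs from n_tests.

    Assumes incomplete sequences were removed beforehand, so a longer
    sequence means at least one test was repeated after a fault.
    """
    return len(sequence) != n_tests


def select_faulty(sequences, n_tests=15):
    """Keep only the units whose sequence shows at least one fault."""
    return {unit: seq for unit, seq in sequences.items()
            if had_fault(seq, n_tests)}
```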
3.2 Data mining
The database stores a large amount of data, so no one is capable of
assimilating it and discovering the test patterns that frequently occur during
the testing process. The data mining process consists of knowledge acquisition,
and in this work it is used to discover those test patterns in order to
optimize the testing sequence. Each testing sequence implicitly stores
information about how the fault occurs. Based on this idea, it is better to
analyze each testing sequence to discover which patterns frequently occur than
to analyze all combinations of fault test patterns.
In order to reduce processing time, the data mining technique consists of
detecting which test fails and constructing a history table with the fault
patterns that occur in the testing sequences. The history table stores patterns
represented by the fault test followed by the test that should be repeated. The
occurrence table stores the frequency of the respective patterns in the history
table. Further details about these tables are given in section 4. The algorithm
used in this process is shown in fig. 8.
In order to explain the process, consider the following testing sequence
composed of 15 tests read from the database:
where a fault occurs in T3 (T3 = 0).
Using the predefined meta rule:
MR2) IF T3 = 0 THEN Start (T2)
The real testing sequence is:
for each testing sequence do
    read tests from file
    if the next test is less than the previous test
    then
        pattern ← (previous test, next test)
        if the pattern does not exist in the history table
        then
            record new pattern to history table
            record new pattern frequency to occurrence table
        else
            increment pattern frequency in the occurrence table
Figure 8: Applied algorithm for data mining.
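The pattern counting of Fig. 8 can be sketched in Python with a single counter whose keys play the role of the history table and whose values play the role of the occurrence table. This is an illustration under stated assumptions, not the authors' code: Fig. 8 states only the "less than" condition, while the history table also lists patterns such as [1,1], so the hypothetical flag include_repeats (on by default) additionally treats an immediately repeated test as a restart.

```python
from collections import Counter


def mine_fault_patterns(sequences, include_repeats=True):
    """Count (fault test, restart test) patterns over all sequences.

    sequences       -- iterable of lists of executed test numbers
    include_repeats -- assumption: also count a test that is repeated
                       immediately (next == previous) as a fault pattern
    """
    occurrences = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            is_restart = nxt < prev or (include_repeats and nxt == prev)
            if is_restart:
                occurrences[(prev, nxt)] += 1
    return occurrences
```

Calling occurrences.most_common() then lists the patterns from most to least frequent.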
In this case the pattern [T2, T3] is the most frequent. The meta rule (MR2)
eliminates the possibility of the testing sequence starting at T3 because of
its dependence on previous tests. According to this information, the suggested
new testing sequence is:
Sq optimized: {T2, T3, T1, T4, ..., T15}
The final optimization should reduce fault detection time and the cost of the
process, increasing productivity.
4 Results
After data preparation and data mining, a history table, as shown in table 1,
is built from the testing sequences that contain fault patterns. The history
table has a corresponding table called the occurrence table, as shown in
table 2, which stores the frequency of each fault pattern from the history
table.
Table 1: History table.
Fault pattern
[2,1]  [14,1]  [1,1]  [15,1]  [8,?]  [10,10]  [5,1]  [13,13]  [11,11]  [13,1]  [2,2]
[3,2]  [10,1]  [11,1]  [12,12]  [7,7]  [3,1]  [4,4]  [9,9]  [6,6]  [3,3]  [5,5]
Table 2: Occurrence table (only the entries that can be matched to the text are legible).
Fault pattern    Frequency
[2,1]            10
[1,1]            59
[11,11]          36
[2,2]            35
The fault pattern [2,1] in table 1 means that a fault occurred in test number 2
and the testing sequence restarted from test number 1. This fault pattern
occurred 10 times in the database, as shown in table 2. This frequency is a
counter, incremented each time the pattern [2,1] is read in the data mining
process, fig. 8. The same process is carried out for each fault pattern in
table 1.
Analyzing the fault patterns and their occurrences, table 1 and table 2, 199
fault patterns were detected: 29.65% were represented by the pattern [1,1];
18.1% by the pattern [11,11]; 17.59% by the pattern [2,2]; and 34.66% by others.
According to this information, eqn (6) represents the new suggested testing
sequence.
Sq optimized: {T1, T11, T2, T3, T4, T5, ..., T15}    (6)
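The step from pattern frequencies to the reordered sequence can be illustrated with the short Python sketch below. The counts 59, 36 and 35 are inferred from the reported percentages of 199 patterns; the function names and the cut-off parameter top_k are assumptions for this example, and meta rules (such as always starting with T1) are not enforced by the sketch.

```python
def pattern_shares(counts):
    """Return each pattern's share of all occurrences, in percent."""
    total = sum(counts.values())
    return {pattern: 100.0 * n / total for pattern, n in counts.items()}


def reorder(planned, fault_counts, top_k):
    """Move the top_k most frequently failing tests to the front.

    planned      -- original test order, e.g. [1, 2, ..., 15]
    fault_counts -- {test_number: number of fault occurrences}
    top_k        -- hypothetical cut-off for how many tests to move
    The remaining tests keep their original relative order.
    """
    ranked = sorted(fault_counts, key=fault_counts.get, reverse=True)[:top_k]
    return ranked + [test for test in planned if test not in ranked]


# Counts inferred from the percentages in the text (199 patterns in total).
counts = {(1, 1): 59, (11, 11): 36, (2, 2): 35, "others": 69}
shares = pattern_shares(counts)
optimized = reorder(list(range(1, 16)), {1: 59, 11: 36, 2: 35}, top_k=3)
```

Here optimized starts with 1, 11, 2 and continues with 3, 4, 5, ..., 15, which agrees with eqn (6).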
5 Conclusion
In this paper, the KDD methodology was applied to knowledge discovery in a
database of electronic circuit tests. In the electronics industry, to assure
the quality of the final product, a series of tests is performed on electronic
cards in search of faults. If a fault occurs, the procedure is to repair it
immediately. This can be time consuming if the fault occurs at the end of the
testing sequence, since the whole sequence might have to be repeated.
The problem of time reduction was solved using the KDD methodology for data
preparation and data mining. During data preparation, 3 stages were
considered. Selection (data selection, ordering data and choosing relevant
attributes) used SQL queries and produced a new database with 1,961 testing
sequences. Enhancement of attributes was applied using meta rules based on
technicians' experience to avoid inconsistent data. Cleaning (removing
incomplete sequences, validating sequences and selecting relevant sequences)
was performed by applying specific algorithms. After the cleaning process, the
new database stored 183 testing sequences.
The data mining process consisted of discovering fault patterns using the
history and occurrence tables. Applying frequency analysis to the occurrence
table revealed the most frequent patterns, allowing the rearrangement and
optimization of the testing sequence.
The KDD methodology used in this paper proved to be an efficient way to
retrieve useful information from databases. It assisted throughout the whole
process of obtaining an optimized testing sequence. The most frequent fault
patterns were identified and moved to the beginning of the sequence, decreasing
the fault detection time. The new testing sequence is expected to lower the
cost of production and increase productivity.
Appendix 1
Testing sequence (Test, Description)
T5
T6
Reset
Storage command reset
Reset Start
T10
By-pass
Communication via Ethernet
T15
Card removal
References
[1] Adriaans P. and Zantinge D., Data Mining, Addison Wesley Longman Inc,
California, 1996.
[2] Fayyad U. and Stolorz P., Information Visualization in Data Mining and
Knowledge Discovery, Morgan Kaufmann Inc, USA, 1999.
[3] Fayyad U. and Stolorz P., Data mining and KDD: Promise and challenges,
Future Generation Computer Systems, vol. 13, pp. 99-115, 1997.
[4] Fayyad U. M., Djorgovski S.G. and Weir N., Advances in Knowledge
Discovery and Data Mining, AAAI/MIT Press, 1996.
[5] Pyle D., Data preparation for data mining, Morgan Kaufmann, USA, 1999.
[6] Han J.W. and Kamber M., Data Mining: Concepts and Techniques, Morgan
Kaufmann, California, 2001.
[7] Jones, M.D., 14 Powerful Techniques for Problem Solving, Times Books
Random House Inc. 1998.