Transactions on Information and Communications Technologies vol 29, © 2003 WIT Press, www.witpress.com, ISSN 1743-3517

Data mining applied for analysis of fault sequences in electronic circuits

A. C. G. Oliveira¹, J. P. L. Bressan¹, L. E. Zárate¹ & N. J. Vieira²
¹ Department of Computer Science, Pontifícia Universidade Católica de Minas Gerais, Brazil
² Department of Computer Science, Universidade Federal de Minas Gerais, Brazil

Abstract

In the electronics industry, and more precisely in the UPS (Uninterruptible Power Supply) industry, it is a challenge to discover an optimized testing sequence for electronic cards. If a fault occurs in a test, the procedure is to repair the electronic card immediately; however, the test may depend on previous tests, which then have to be repeated. This can be time consuming if a fault occurs at the end of a testing sequence. In some cases, the whole testing sequence might have to be repeated. The optimized testing sequence is obtained by discovering test patterns and bringing them to the beginning of the sequence. The purpose of this paper is to develop a method for optimizing the testing sequence that reduces fault detection time by using a structure capable of finding test patterns that occur often in a database. The paper uses the KDD (Knowledge Discovery in Databases) methodology for data preparation and for the application of data mining techniques. Data mining techniques are used to discover test patterns in order to optimize future testing sequences. The project focuses on data mining applications.

1 Introduction

The last two decades were marked by fast growth in the amount of electronically stored information. The low cost of storage has contributed to this growth. However, a large amount of data does not by itself mean knowledge, because no one is able to read and assimilate all the information. Data mining, on the other hand, is the automated process of capturing and analyzing large amounts of data to discover important hidden relations.
Data mining is composed of artificial intelligence technologies, statistics, data modeling techniques and database technology [1, 2]. The KDD (Knowledge Discovery in Databases) process, shown in figure 1, covers discovery in databases, including data preparation, knowledge acquisition and data reporting [3]. This process basically presupposes the following stages [1, 3, 4]:

Data selection: after defining the problem, a set of related data is selected and collected in order to solve it.
Cleaning: data from real-world sources are often erroneous, incomplete and inconsistent. For this reason, strategies for treating the data must be formulated. Cleaning includes basic operations such as noise removal and treatment of missing data.
Enhancement and enrichment: enhances the data with additional sources of information to increase the likelihood of success.
Coding or transforming: determines the most relevant features and derives those that are useful. It includes operations on the database that transform or simplify the data in order to prepare it for data mining algorithms.
Data mining: applies algorithms to the transformed data in order to generate the expected results.
Data reporting: visualization through written or graphical tools is used to present the mined knowledge to users.

Figure 1: Steps for Knowledge Discovery in Databases (KDD).

In electronics industries such as the UPS industry, a series of tests in search of faults is run on the electronic cards to assure the quality of the final product. If a fault occurs, the procedure is to repair it immediately. Consequently, previous tests may have to be repeated.
This can be time consuming if the fault occurs at the end of the testing sequence. In some cases, all the previous tests might have to be repeated. By discovering the test patterns that recur when faults occur, it is possible to rearrange the sequence in order to create an optimized testing sequence.

The purpose of this paper is to develop an analysis method for testing sequences that reduces fault detection time. The cost of the process is expected to be reduced and productivity increased.

This paper uses the KDD methodology [5, 6] described above to find test patterns that optimize the testing sequence of the electronic cards. The database stores 100,000 records of electronic cards used in a UPS model. However, the database only stores whether a fault occurred in a test or not. For this reason, an enhancement process is needed; it is applied using meta rules that add to the database the testing sequences that occurred after the faulty test. A cleaning process should also be applied to the selected database, eliminating inconsistent testing sequences.

To enhance the database, meta rules are created from the technicians' experience. They are conditions and restrictions on the testing sequence, and they are also applied to the suggested optimized sequence.

The project focuses on the application of data mining. A method is proposed to discover the test patterns that are used to optimize the testing sequence. The idea is to find the test patterns and their frequency of occurrence in the database, allowing the "suggestion" of an optimized testing sequence.

This paper is divided into the following sections. Section 2 presents a description of the database. Section 3 applies the KDD methodology. Section 4 shows the results. The last section presents the conclusions.

2 Description of the considered databases

The problem space to be investigated concerns the discovery of patterns of testing faults.
The database studied contains the history of all the testing sequences of the electronic cards. It has 18 attributes and a total of 100,000 records. A typical record of the original database, with categorical names, is expressed in eqn (1).

Record = {MD, UD, SN, FD, ND, TD, TT, RD, VD, RT, AT, TS, DE, HR}   (1)

where: MD = model id; UD = unit id; SN = testing sequence number; FD = fault id; ND = node id; TD = test id; TT = test status; RD = test item id; VD = values read (3 attributes: read, maximum and minimum); RT = test item status; AT = test arrangement; TS = responsible; DE = date (2 attributes: beginning and ending date); HR = hour (2 attributes: beginning and ending hour).

Selecting the attributes that are relevant to the problem is a difficult task. Attributes represent either facts or judgments. The more attributes that represent judgments, the greater the imprecision of the discovered knowledge [7]. Therefore, each attribute should be analyzed alone and then together with the other attributes. For example, the attribute SN corresponds to the arrangement of the testing sequence, in other words, how many testing sequences there were and which one happened first. Attributes that represent judgments should be enhanced and enriched during the data preparation process of KDD.

3 Application of the KDD methodology

KDD methodologies vary from author to author [4, 5], but they all deal with the same stages necessary to accomplish the process. In general terms, the stages of KDD are: Selection, Pre-processing, Data Mining, Interpretation and Visualization. Each stage involves further steps that depend specifically on the problem. Fig. 2 shows the structure of the stages applied in this paper.
Figure 2: Applied KDD methodology.

3.1 Data preparation and meta rules definition

During data preparation, the focus is on gaining a better understanding of what the data represents, how and why it is transformed, and how to enhance it using meta rules. The main purpose of data preparation is to manipulate and transform data so that the information it contains can be exposed [5].

In this paper, data preparation is composed of selection (data selection, ordering data, choosing relevant attributes), enhancement of attributes, and cleaning of the database (removing incomplete sequences, validating sequences, selecting relevant sequences).

3.1.1 Selection

Data selection: the database used stores 100,000 records containing the history of all the testing sequences done with the UPS for a specific electronic circuit, fig. 3. Each record holds information about only one test of the testing sequence.

Figure 3: Selection of the data sets.

The Sq_i sequences have 15 tests, eqn (2), and are represented by the set C of eqn (3):

Sq_i = {T1, T2, ..., T15}   (2)

C = {Sq_1, Sq_2, ..., Sq_n}   (3)

where: n is the number of testing sequences; and Ti takes the value 0 (fault in the test) or 1 (test ok). Appendix 1 shows the description of each test. After data set selection, the database was reduced to 58,215 records representing a total of 1,961 stored testing sequences.

Ordering data: the records are not stored in the order in which the real testing sequence was executed. They were rearranged chronologically to assure the consistency of the sequences.
Choosing relevant attributes: attributes were removed through SQL queries, using criteria based on an analysis of the database. The attributes removed are those that do not contribute to the definition of the problem space, for example the technicians' names and the numerical values read from a test. In total, 14 attributes were excluded.

3.1.2 Enhancement and enrichment

In eqn (2), each test carries only boolean information: failed or not. For this reason, the data set had to be enhanced with meta rules, which introduce the tests that were repeated when a fault occurred. The meta rules control the execution of the testing sequences and are defined from the technicians' experience. For example, the first test in a testing sequence is the visual inspection of the equipment (T1, see Appendix 1), which checks for physical problems such as broken or dented components. The corresponding meta rule (MR) is to start every testing sequence by inspecting the electronic card visually.

Meta rules also express the dependence between tests. If a test fails and it depends on results from previous tests, the process restarts from the tests it depends on.

To explain the creation of the meta rules, consider a testing sequence read from the database in which the fault occurs in T4 (T4 = 0). The meta rule is:

MR1) IF T4 = 0 THEN Start (T3)

In the real testing sequence, testing therefore resumes from T3 after the fault in T4. If the pattern [T3, T4] repeats frequently, the new testing sequence would be:

Sq optimized: {T1, T3, T4, T2, T5, ..., T15}

Figure 4: Data cleaning process.

3.1.3 Cleaning

Removing incomplete sequences: the data cleaning process eliminates all the sequences that have missing tests, to guarantee the integrity of the testing sequence, fig. 5.
This measure avoids acquiring inconsistent knowledge and information from the history table during the discovery of fault patterns, fig. 4. A total of 383 sequences were excluded, leaving 1,578. Eqn (4) describes this procedure; the algorithm applied is shown in fig. 5.

for each unit do
    read tests from file
    if the sequence has at least one occurrence of each of the 15 tests then
        sequence is ok
        record sequence to file
    else
        remove sequence from file

Figure 5: Applied algorithm for removing incomplete sequences.

Validating sequences: this stage eliminates testing sequences that were not stored correctly, fig. 6. For example, a technician may have detected a faulty test but not repeated the whole testing sequence, or may have repeated it without the repetition being recorded. This is a real problem that industry faces. In some cases, testing sequences were recorded with discontinuous tests, eqn (5). A total of 8 sequences were excluded, leaving 1,570.

for each unit do
    read tests from file
    if each next test is greater than the previous test by one unit
       or the next test is equal to the previous test
       or the next test is less than the previous test then
        sequence is ok
        record sequence to file
    else
        remove sequence from file

Figure 6: Applied algorithm for validating sequences.

Selecting relevant sequences: this stage eliminates all testing sequences that did not present any fault, fig. 7. A total of 1,387 sequences were excluded, leaving 183.

for each unit do
    read tests from file
    count how many tests are in the sequence
    if the total number of tests in the sequence is equal to 15 then
        no faults occurred
        remove sequence from file
    else
        record sequence to file

Figure 7: Applied algorithm for selecting relevant sequences.
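The three cleaning filters of figures 5 to 7 can be illustrated with a minimal Python sketch. It assumes each unit's sequence is already available as a chronologically ordered list of test numbers (1 to 15); the function names and the sample units are hypothetical and are not taken from the original implementation, which reads and writes files.

```python
from typing import Dict, List

NUM_TESTS = 15  # each complete sequence covers tests T1..T15


def is_complete(tests: List[int]) -> bool:
    """Figure 5: every one of the 15 tests occurs at least once."""
    return set(range(1, NUM_TESTS + 1)).issubset(tests)


def is_valid(tests: List[int]) -> bool:
    """Figure 6: each step advances by one test, repeats a test, or goes back."""
    return all(nxt <= prev + 1 for prev, nxt in zip(tests, tests[1:]))


def has_fault(tests: List[int]) -> bool:
    """Figure 7: a run of exactly 15 tests had no repetition, hence no fault."""
    return len(tests) != NUM_TESTS


def clean(units: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Apply the three filters in the order used in section 3.1.3."""
    return {
        unit: tests
        for unit, tests in units.items()
        if is_complete(tests) and is_valid(tests) and has_fault(tests)
    }


# Hypothetical sample data, for illustration only.
units = {
    "U1": list(range(1, 16)),                    # fault-free: dropped by has_fault
    "U2": [1, 2, 3, 2, 3] + list(range(4, 16)),  # fault in T3, restart at T2: kept
    "U3": list(range(1, 15)),                    # T15 missing: dropped by is_complete
    "U4": [1, 2, 4, 3] + list(range(4, 16)),     # jumps from T2 to T4: dropped by is_valid
}
print(sorted(clean(units)))  # ['U2']
```

The three conditions of figure 6 (next test greater by one, equal, or smaller) collapse into the single comparison `nxt <= prev + 1` for integer test numbers, so only a forward jump over a test marks a sequence as discontinuous.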
3.2 Data mining

The database stores so much data that no one is capable of assimilating it and discovering the test patterns that occur frequently during the testing process. The data mining process consists of knowledge acquisition, and in this work it is used to discover those test patterns in order to optimize the testing sequence.

Each testing sequence implicitly stores pattern information about how the fault occurs. Based on this idea, it is better to analyze each testing sequence and discover which patterns occur frequently than to analyze the combinations of fault test patterns.

In order to reduce processing time, the data mining technique consists of detecting which test fails and constructing a history table with the fault patterns that occur in the testing sequence. The history table stores patterns represented by the faulty test followed by the test that should be repeated. The occurrence table stores the frequency of the respective patterns in the history table. Further details about these tables are given in section 4. The algorithm used in this process is shown in fig. 8.

for each testing sequence do
    read tests from file
    if the next test is less than the previous test then
        pattern ← previous test, next test
        if the pattern does not exist in the history table then
            record new pattern to history table
            record new pattern frequency to occurrence table
        else
            increment pattern frequency in the occurrence table

Figure 8: Applied algorithm for data mining.

To explain the process, consider a testing sequence of 15 tests read from the database in which a fault occurs in T3 (T3 = 0). Using the predefined meta rule

MR2) IF T3 = 0 THEN Start (T2)

the real testing sequence resumes from T2 after the fault in T3. In this case the pattern [T2, T3] is the most frequent.
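As an illustration of the enhancement by meta rules and of the algorithm in figure 8, the following Python sketch expands a fault record into the order in which the tests were run and then counts the fault patterns. It rests on several assumptions that are not in the paper: a failed test passes after repair, a meta rule is a simple mapping from a failed test to the test to resume from, and the names `executed_sequence` and `mine_patterns` are hypothetical. Figure 8 states the condition as "less than"; the sketch uses "less than or equal" on the assumption that immediate repeats of the same test, such as the pattern [1,1] reported in section 4, are also recorded. The keys of the counter play the role of the history table and its values the role of the occurrence table.

```python
from collections import Counter
from typing import Dict, List


def executed_sequence(failed: List[int], restart: Dict[int, int],
                      num_tests: int = 15) -> List[int]:
    """Expand a fault record into the order in which tests were run.

    `failed` lists tests that fail once and pass after repair (an assumption
    of this sketch); `restart` maps a failed test to the test to resume from.
    """
    pending = set(failed)
    run, test = [], 1
    while test <= num_tests:
        run.append(test)
        if test in pending:
            pending.discard(test)           # repaired, so it passes next time
            test = restart.get(test, test)  # no rule: repeat the same test
        else:
            test += 1
    return run


def mine_patterns(sequences: List[List[int]]) -> Counter:
    """Count (fault test, restart test) pairs where the sequence did not advance."""
    occurrence: Counter = Counter()
    for tests in sequences:
        for prev, nxt in zip(tests, tests[1:]):
            if nxt <= prev:                 # assumption: "<=" rather than "<"
                occurrence[(prev, nxt)] += 1
    return occurrence


mr2 = {3: 2}  # MR2) IF T3 = 0 THEN Start (T2)
run = executed_sequence(failed=[3], restart=mr2)
print(run[:6])                              # [1, 2, 3, 2, 3, 4]

# Hypothetical sample: two units failing T3 and one unit failing T1.
occurrence = mine_patterns([run, run, executed_sequence([1], {})])
print(occurrence.most_common())             # [((3, 2), 2), ((1, 1), 1)]
```

Here the pair (3, 2) is written in the history-table convention of section 4 (fault test first, restart test second).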
The meta rule (MR2) rules out starting the testing sequence at T3, because T3 depends on previous tests. Accordingly, the suggested new testing sequence is:

Sq optimized: {T2, T3, T1, T4, ..., T15}

The final optimization should reduce fault detection time and the cost of the process, increasing productivity.

4 Results

After data preparation and data mining, a history table, shown in table 1, is built from the testing sequences containing fault patterns. The history table has a corresponding table called the occurrence table, shown in table 2, which stores the frequency of each fault pattern from the history table.

Table 1: History table. Fault patterns (fault test, restart test), among them: 2,1; 14,1; 1,1; 15,1; 10,10; 5,1; 13,13; 11,11; 13,1; 2,2; 3,2; 10,1; 12,12; 7,7; 3,1; 4,4; 9,9; 6,6; 3,3.

Table 2: Occurrence table. Fault pattern frequency.

The fault pattern [2,1] in table 1 means that a fault occurred in test number 2 and the testing sequence restarted from test number 1. This fault pattern occurred 10 times in the database, as shown in table 2. The frequency is a counter that is incremented each time the pattern [2,1] is read in the data mining process, fig. 8. The same is done for each fault pattern in table 1.

Analyzing the fault patterns and their occurrence in table 1 and table 2, 199 fault patterns were detected: 29.65% are represented by the pattern [1,1]; 18.1% by the pattern [11,11]; 17.59% by the pattern [2,2]; and 34.66% by the others. According to this information, eqn (6) represents the new suggested testing sequence.

Sq optimized: {T1, T11, T2, T3, T4, T5, ..., T15}   (6)

5 Conclusion

In this paper, the KDD methodology was applied to knowledge discovery in a database of electronic circuit tests.
In the electronics industry, a series of tests in search of faults is run on electronic cards to assure the quality of the final product. If a fault occurs, the procedure is to repair it immediately. This can be time consuming if a fault occurs at the end of the testing sequence, since the whole sequence might have to be repeated. The problem of time reduction was solved using the KDD methodology for data preparation and the application of data mining.

During data preparation, three stages were considered. Selection (data selection, ordering data and choosing relevant attributes) used SQL queries and produced a new database of 1,961 testing sequences. Enhancement of attributes applied meta rules based on the technicians' experience to avoid inconsistent data. Cleaning (removing incomplete sequences, validating sequences and selecting relevant sequences) applied specific algorithms, after which the database held 183 testing sequences.

The data mining process consisted of discovering fault patterns using the concept of history and occurrence tables. Applying frequency analysis to the occurrence table revealed the patterns, allowing the rearrangement and optimization of the testing sequence.

The KDD methodology used in this paper proved to be an efficient way to retrieve useful information from databases. It assisted throughout the whole process of obtaining an optimized testing sequence. The most frequent fault patterns were identified and moved to the beginning of the sequence, decreasing the fault detection time. The new testing sequence is expected to lower the cost of production and increase productivity.

Appendix 1

Testing sequence (test and description). Tests listed include T5, T6, T10 (By-pass) and T15 (Card removal); descriptions listed include Reset, Storage command reset, Start and Communication via Ethernet.

References

[1] Adriaans P. and Zantinge D., Data Mining, Addison Wesley Longman Inc, California, 1996.
[2] Fayyad U. and Stolorz P., Information Visualization in Data Mining and Knowledge Discovery, Morgan Kaufmann Inc, USA, 1999.
[3] Fayyad U. and Stolorz P., Data mining and KDD: Promise and challenges, Future Generation Computer Systems, vol. 13, pp. 99-115, 1997.
[4] Fayyad U. M., Djorgovski S. G. and Weir N., Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1996.
[5] Pyle D., Data preparation for data mining, Morgan Kaufmann, USA, 1999.
[6] Han J. W. and Kamber M., Data Mining: Concepts and Techniques, Morgan Kaufmann, California, 2001.
[7] Jones M. D., 14 Powerful Techniques for Problem Solving, Times Books, Random House Inc, 1998.