SETIT 2005
3rd International Conference: Sciences of Electronic, Technologies of Information and Telecommunications
March 27-31, 2005 – TUNISIA

Data mining and models for a human-adapted system: a multi-methodological approach

Vellemans P.*, Billaudel P.* and Riera B.**
* IFTS – CReSTIC, 7 Boulevard Jean Delautre, 08000 Charleville-Mézières
[email protected], [email protected]
** Laboratoire CReSTIC (formerly LAM), Moulin de la Housse, BP 1039, 51687 Reims Cedex
[email protected]

Abstract: In the last few years, the information contained in databases has increased considerably. To deal with this volume of information, a new approach known as data mining has developed. However, the number of methods used in data mining applications has also grown noticeably, which can make an understanding of the field difficult. Another problem is the sheer size of the databases. Once data has been captured and stored, certain questions naturally arise: Will these data help the business gain an advantage? How can we use historical data to build models of the underlying processes that generated them? How can we avoid getting lost in the data? How can we exploit the "hidden" information lying dormant in them? This paper presents the motivation for, and the current state of, our research on the development of a cognitive decision-support system for production follow-up. All the information sources are combined in order to facilitate decision-making, and are adapted to the cognitive characteristics of the human operators. The work presented relates to the capitalization and usability of knowledge. We illustrate our approach with an industrial example: the manufacture of cross-piece supports for cars.

Key words: Human-machine cooperation, Data handling systems, Industrial production systems, Knowledge-based systems, Models.

1 Introduction

Today, industrial companies are subject to various demands which oblige them to implement a process of continuous improvement.
Many requirements must be reconciled, such as raw materials, reliability, safety, energy saving and environmental protection. If we add international competition, we understand that quality management and control have become a determining element of a company's development (Rezg & al., 1995). To achieve these objectives, production systems must be able:
• to adapt to a change of production (multi-product manufacture) or to a hazard (introduction of a rush order, etc.),
• to respond quickly and economically.
The large size of Dynamic Industrial Systems has two important consequences. On the one hand, automation cannot be total: the supervision loop integrates automatic systems (algorithms) as well as human operators. On the other hand, describing such systems requires making different models coexist (behavioural, structural, functional, dysfunctional...) at various levels of abstraction (structural and functional decompositions). Figure 1 summarizes this multi-point-of-view analysis for supervision. Moreover, progress in data acquisition and storage technology has led to a fast-growing, tremendous amount of data stored in databases and data warehouses.

Figure 1. Multi-point-of-view analysis for supervision: system analysis (targets, technical specifications) feeds functional models (GTST, MFM), structural models (topography, information theory) and behavioural models (bond graphs, causal graphs), which in turn feed the specification of algorithms (FDI...) and of the human-machine interface (Riera, 2001)

The industrial context in which we place ourselves is as follows: the installation produces more than 20% of faulty pieces. The experts cannot identify the failure causes for the installation as a whole. At present, for the same manufacturing ranges, the operators produce a random number of right pieces.
Every relevant piece of information concerning the system (mould tip-up time, tool temperatures...) as well as the product (sand cooking temperature, aluminium temperature) is stored in a database. To date, it is impossible to trace, across the various workshops, which components were used to manufacture a given crosspiece (figure 3). To establish this link between the databases, we will use the concept of traceability, as well as fuzzy logic and possibility theory (Dubois & al., 1998). Although valuable information may be hiding behind the data, the sheer volume makes it difficult, if not impossible, for human beings to extract it without powerful tools. To relieve this data-rich but information-poor plight, a new discipline named data mining emerged during the late 1980s, devoted to extracting knowledge from huge volumes of data (Zhou).

2 Research context

We work on a foundry chain producing aluminium crosspiece supports for cars (figure 2). Producing a crosspiece support requires three workshops:
• the mould core manufacturing workshop,
• the aluminium production workshop,
• the aluminium casting workshop.

Figure 2. The foundry chain producing aluminium crosspiece supports for cars (a fourth workshop handles the finishing operations)

Figure 3. Seeking the link between the data

3 Data mining, methods and models: our contribution

3.1 Data mining

Data mining joins the (now irreversible) trend of knowledge management. Data mining will never replace expertise, but it constitutes a great tool for formalizing and improving expertise. It often makes it possible to pass from tacit knowledge (I can do it) to explicit knowledge (I can say how I do it). Consequently, it becomes possible to communicate and grow this knowledge within the company. It is only one element of the process of transforming data into knowledge (it makes it easier to describe models or rules starting from the observation of the data).
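Returning to the traceability link between the workshops' databases, possibility theory can score how plausible it is that two records concern the same piece. Below is a minimal Python sketch under our own assumptions: the batch names, the triangular possibility distribution on timestamps, and the acceptance threshold are hypothetical illustrations, not taken from the installation.

```python
def triangular_possibility(x, core, spread):
    """Possibility degree that value x matches a value around `core`
    (triangular distribution: 1 at the core, 0 beyond `spread`)."""
    return max(0.0, 1.0 - abs(x - core) / spread)

def link_records(core_batches, casting_batches, spread=30.0, threshold=0.5):
    """Link each core batch to the most plausible casting batch by
    timestamp proximity; keep links whose possibility reaches `threshold`.
    Records are (identifier, timestamp) pairs -- hypothetical names."""
    links = []
    for core_id, t_core in core_batches:
        best_id, t_best = max(
            casting_batches,
            key=lambda rec: triangular_possibility(rec[1], t_core, spread))
        poss = triangular_possibility(t_best, t_core, spread)
        if poss >= threshold:
            links.append((core_id, best_id, poss))
    return links

# Hypothetical batches: (identifier, timestamp in minutes).
cores = [("core-1", 100.0), ("core-2", 200.0)]
castings = [("cast-A", 110.0), ("cast-B", 205.0)]
links = link_records(cores, castings)
```

The possibility degree attached to each link can then be carried along as a confidence measure, instead of forcing a hard yes/no match between the databases.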
The techniques of DM can provide knowledge about the product. Figure 4a represents the decomposition of the cycle transforming data into knowledge. DM needs a certain quantity of data to extract representative knowledge; this is why these techniques are better suited to frequently encountered problems or repetitive tasks, for which training data are available.

Figure 4a. Knowledge-management process (Lefébure & al., 2001)

Figure 4b. Knowledge-management process with a multi-methodological DM tool

For repetitive tasks it is thus possible, with DM methods, to compare the current evolution of a task with an old equivalent situation, so as to anticipate the result and the next stage which should occur. The tools and methods suggested in the literature have not fully satisfied us as regards the control of knowledge extraction, so we propose a new methodology for our diagnosis tools; this methodology is strongly based on tools taken from the literature. Many works are devoted to comparing methods on simulated or real data. The sound lesson one can learn from them is that there is no best method: the intrinsic properties and required hypotheses of each fit the problem at hand more or less well. The difficulty is that the data are not equivalent (continuous, event-driven, conceptual, hybrid), and to date no method can claim to handle all these types of data. For this reason our work moves towards a "multi-methodological" approach.
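The comparison of a current task with old equivalent situations can be sketched as a nearest-neighbour retrieval over historical records. All names and values below are hypothetical illustrations, not data from the installation.

```python
import math

def nearest_situations(history, current, k=2):
    """Return the k past situations whose parameter vectors are closest
    (Euclidean distance) to the current one; their recorded outcomes
    suggest the result to anticipate."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return sorted(history, key=lambda rec: dist(rec[0], current))[:k]

# Hypothetical records: (process parameter vector, observed outcome).
history = [
    ((700.0, 55.0), "right piece"),
    ((650.0, 80.0), "faulty piece"),
    ((705.0, 57.0), "right piece"),
]
current = (702.0, 56.0)
neighbours = nearest_situations(history, current)
```

If the retrieved neighbours all ended as right pieces, the operator can expect the current task to do the same; disagreement among the neighbours signals an uncertain situation.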
The idea is to make the various DM methods coexist and thus use the advantages of some to get around the limits of the others: "United we stand, divided we fall". Data mining requires the implementation, explicit or not, of traditional statistical methods (principal components, discriminant analysis, K nearest neighbours, segmentation, linear regression), less traditional ones (classification and regression trees) or artificial intelligence techniques (Bayesian networks, pattern recognition). The techniques listed above pursue similar goals and can appear as competitors, or rather as complementary (Besse & al.). Schematically, four non-exclusive objectives are targeted:
• Exploration, for a first approach of the data: checking the data by searching for inconsistencies and for atypical, missing or erroneous values, and transforming them prior to other treatments.
• Classification (clustering), to discover a typology or a segmentation of the observations.
• Modelling by a set of variables, to explain a quantitative or qualitative target variable; this is then a regression or a discrimination (classification).
• Pattern recognition without training: detecting an original configuration (pattern) that sets some data apart.
We have adopted the Lefébure & Venturi point of view and added our vision of the multi-methodological DM tool (figure 4b).

3.2 Methods

It is important that the predictors have an easily readable form and, if possible, one already known outside the field. There is a trade-off between the clearness of a model and its predictive capacity: the simpler the form a model takes, the easier it is to understand, but the less it can account for fine or varied (non-linear) dependencies. For any given problem, the nature of the data affects the choice of models and algorithms. There is no "best" model or algorithm.
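Since no single method dominates, a multi-methodological tool chains several of the objectives listed above. Here is a minimal pure-Python sketch, with our own hypothetical data and plausibility bound, of the exploration step followed by a clustering step:

```python
def explore(records):
    """Exploration step: separate clean records from those with missing
    or atypical values (the 1000.0 bound is an assumed plausibility limit)."""
    clean, flagged = [], []
    for rec in records:
        if any(v is None for v in rec) or any(abs(v) > 1000.0 for v in rec):
            flagged.append(rec)
        else:
            clean.append(rec)
    return clean, flagged

def two_means(points, iters=10):
    """Clustering step: a tiny one-dimensional 2-means that discovers a
    typology of the observations (assumes both groups stay non-empty)."""
    c1, c2 = min(points), max(points)
    for _ in range(iters):
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return c1, c2

clean, flagged = explore([(700.0,), (None,), (9999.0,), (650.0,)])
centres = two_means([1.0, 1.2, 0.9, 10.0, 10.5])
```

Running exploration first keeps the missing and atypical records from distorting the cluster centres, which is exactly the preliminary transformation advocated above.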
We therefore need a variety of tools and technologies in order to find the most suitable model. To this end, we listed in Table 1 the main data-mining methods; to our knowledge, nobody had taken the time to index and compare them.

Table 1. An extract of the advantages and drawbacks of each data-mining method

Methods based on decision trees (Jambu, 1999)
  Advantages: easy comprehension and interpretability (each path leads to a leaf); they handle non-numerical data very well.
  Drawbacks: strong relations between data are not represented; the clearness of the trees can become misleading; trees lack predictive smoothness.

Methods based on rules
  Drawbacks: a very great number of rules, difficult to interpret for a voluminous database; the rules can yield conflicting forecasts.

Neural networks (Rumelhart & al., 1994)
  Advantages: they generalize to examples they have not "seen", rather than merely repeating the past; they are robust, with good generalization capacity; good for problems on which little a priori information is available; automated training, strong predictive capacity, ability to accept co-linearity.
  Drawbacks: incapacity to explain the relations found (causes and effects); their flexibility is such that they will find many false models when the signal-to-noise ratio is low; fitting a value too closely can result in modelling particular, non-relevant cases.

K nearest neighbours
  Advantages: good at discovering the zones of groups.
  Drawbacks: require a great quantity of memory; can be extremely sensitive to similar recordings.

Genetic algorithms (Two Crows, 1999)
  Advantages: good for forecasting problems involving non-linear data; provide solutions to problems that have no resolution method, or whose exact solution is too difficult to find in a reasonable time; they require no knowledge of how to solve the problem (only the ability to evaluate the quality of a solution).
  Drawbacks: same as for neural networks; the method of resolution remains unknown.
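To make the decision-tree entry of the table concrete, here is a minimal depth-1 tree (a stump) searching for the single threshold that best separates right pieces from faulty ones. The feature names and values are hypothetical, not measurements from the foundry.

```python
def best_stump(samples):
    """Depth-1 decision tree: find the (feature, threshold) split that
    misclassifies the fewest samples. Labels are 1 (right) or 0 (faulty)."""
    best = None
    for feature in range(len(samples[0][0])):
        for values, _ in samples:
            t = values[feature]
            left = [label for v, label in samples if v[feature] <= t]
            right = [label for v, label in samples if v[feature] > t]
            errors = (min(left.count(0), left.count(1))
                      + min(right.count(0), right.count(1)))
            if best is None or errors < best[2]:
                best = (feature, t, errors)
    return best

# Hypothetical data: (aluminium temperature, sand cooking temperature) -> quality.
samples = [
    ((700.0, 55.0), 1),
    ((705.0, 57.0), 1),
    ((640.0, 80.0), 0),
    ((655.0, 82.0), 0),
]
split = best_stump(samples)
```

Each split of this kind reads as a rule an operator can check directly, which illustrates the interpretability advantage claimed in Table 1; deeper trees repeat the same search on each resulting subset.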
Note that, contrary to decision trees, rules are not necessarily independent.

3.3 Models

The aim of our research is the development of a cognitive decision-support system for production follow-up. To implement such a system, it is necessary to know the process. That seems obvious; however, considerable projects begin otherwise. One eventually realizes that some information is known only at the level of a complete line, whereas several intermediate operations are executed, with a possible detection of rejects at each of them (Allot). For this reason, we modelled our installation, as well as the product passing through it. The formalism employed is the Petri net. The evolutions of the Petri net represent either the expected (normal) operation of the system or failure situations. Each evolution is composed of a set of events and of time constraints between these events. The time distribution of the model represented by an evolution induces the distribution of the event occurrences (Ghallab, 1998). These models are included in the knowledge-management process, so we have added the process and product models to our vision of the multi-methodological DM tool (figure 4c).

Figure 4c. Product and process models included in the knowledge-management process

Conclusion - progress report

In this paper we were interested in the development of a cognitive decision-support system for production follow-up (figure 5). With this vision of the multi-methodological tool, we can start to answer the opening questions and, at the same time, put in place a traceability worthy of the name. As perspectives, we propose the following axes:
• the cooperation between the human operator, the process, the product and the data must be widely studied.
In fact, the nature of the supervised system and its characteristics must be taken into account when deciding on a cooperation,
• to establish a traceability of improvement and to seek the cause of faulty operation,
• to make our analysis module coexist with the modelling of the installation, all combined in a tool for diagnosis and industrial supervision.

Figure 5. Product and process models, towards knowledge capitalization

Concerning the progress achieved: we have recorded in a database the values of the different parameters and the quality of the resulting pieces, and we have applied Principal Component Analysis (PCA) and Multiple Correspondence Analysis (MCA) to test whether our hypotheses were exact. These analyses did not give any result, owing to the mass of parameters to be treated and to their lack of relevance. We therefore turned to the data-mining methods that will be integrated into our assistance tool. Moreover, the crosspiece-support system raises a traceability problem between the entities "Cores", "Aluminium" and "Process"; we endeavour to integrate this traceability, which is essential for such products. Finally, we have modelled the installation from the process and product points of view, to which we add a modelling of the "good" and "faulty" operations of the installation.

References

Allot, P. La réalisation de votre MES de A à Z. Ordinal Technologies.
Besse, P., Le Gall, C., Raimbault, N. and Sarpy, S. Data mining et statistique. Communication.
Dubois, D. and Prade, H. (1998). Possibility theory: qualitative and quantitative aspects. Handbook of defeasible reasoning and uncertainty management systems (Vol. 1, pp. 169-226). Kluwer, Netherlands.
Ghallab, M. (1998). Chronicles as practical representation for dealing with time, events and actions. AIIA Conference, Padova, Italy.
Jambu, M. (1999). Introduction au data mining – analyse intelligente des données. Eyrolles, Paris, France.
Lefébure, R. and Venturi, G. (2001). Data mining – Gestion de la relation client, personnalisation de sites web. Eyrolles, Paris, France.
Rezg, N. and Niel, E. (1995). Monitoring system for discrete event systems using failure-tolerance techniques. INRIA/IEEE Symposium on Emerging Technologies and Factory Automation, Paris, France.
Rumelhart, D.E., Widrow, B. and Lehr, M.A. (1994). The basic ideas in neural networks. Communications of the ACM, 37.
Riera, B. (2001). Contribution à la conception d'outils de supervision adaptés à l'homme. HDR, Valenciennes, France.
Two Crows Corporation (1999). Introduction to data mining and knowledge discovery (third edition).
Zhou, Z.H. Three perspectives of data mining. National Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China.