A Prototype of Quality Data Warehouse in the Steel Industry

Maria Murri – Centro Sviluppo Materiali S.p.A.
Camillo De Vecchis – CM Pansid S.p.A.

Abstract

Final product quality is one of the fundamental aspects of the steel industry. It has therefore become increasingly important to prove product quality, to look for the causes of quality defects and to discover the relationships between product properties and production process parameters. Further, in order to verify product quality in terms of 'customer satisfaction', it is important to investigate the relationships between product properties and sold/refused products. To address these tasks it is necessary, on the one hand, to handle factory information and data in a comprehensive form and, on the other hand, to have powerful tools to analyse and interpret them. This work describes a Quality Data Warehouse (QDW) prototype, developed using SAS/Warehouse Administrator V8, realised for a steel plant producing tinplate. Process/product data and quality information are structured into the QDW in order to satisfy the requirements of several end users (process analyst, quality analyst, data/modelling analyst, etc.).

Introduction

This paper describes a multi-partner research project carried out by Centro Sviluppo Materiali (CSM) and Aceralia. CSM, formerly the corporate research centre of the Italian public steel industry, is now a limited company whose majority shareholders are private steel producers and material users. Aceralia, formerly CSI, is an integrated steelworks that employs 9000 people in Asturias, Bilbao and Valencia. One of the main interests of steel research is the improvement of product quality and the reduction of production costs. To reach these aims many different approaches are pursued: the development of models for different metallurgical processes, the optimisation of process control and automation systems, integrated quality assurance and others.
A correct approach depends above all on proper management of plant and process data, so that useful information can be obtained from them. Careful data management is all the more necessary given the complexity of data acquisition and the number of data sources in steel plants. Data storage in industrial plants with multi-stage processes is typically very heterogeneous, because the systems were implemented at different times, on different hardware and with different database concepts. This leads to problems in data transfer as well as in data processing, because of the different ways of coding and pre-processing. On the one hand, installed process computers store plant and process data for short time periods at high sampling rates; on the other hand, syntheses of these data are collected in higher-level databases, which normally consist of commercial software tools based on a Relational Database (RDB) structure. Data referring to product quality or customer satisfaction are rarely in a form suitable for combination with process information. The Data Warehouse approach is a valid tool for solving these problems and for obtaining proper information from data.

The case study

CSM has developed a prototype Quality Data Warehouse applied to the process/plant data of the coating line 'Tinplate 2' for tinplate production at Aceralia. The tinplate production process can be summarised as the transformation of a coil of properly prepared steel (incoming material) into a tin-coated coil/sheet (final product).
The incoming material is processed along the following sections of the coating line:
• An entry section that includes two uncoilers, a shear, a welder, a drag bridle and a strip accumulator
• A pre-plating section that comprises alkaline electrolytic cleaning tanks, electrolytic pickling tanks and rinse tanks
• A plating tank section
• A drag-out recovery unit
• A melting tower and quench-tank section
• A chemical-treatment section
• An emulsion and/or electrostatic oiling unit
• A delivery section that consists of the drive bridle, coilers, shear, pinhole detector, thickness cages, run-out tables, classifiers and pilers.

Each section provides, for each coil, a large quantity of process parameters that have been integrated into the data model.

The relevant data

The relevant data groups are defined considering both the process/product data and the quality information; therefore the defects detected on the incoming material and on the final product have to be included. The main groups of data for each coil are:
• process parameters (of the tinplate line)
• incoming material defects (before the tinplate line)
• final product defects (after the tinplate line)
• properties of the incoming material
• properties of the final product

Besides these data, for a more complete analysis of product quality it is necessary to consider some information about 'customer satisfaction' in terms of sold and refused (for quality reasons) products. This information makes it possible to build a prototype that integrates the different functional areas of a factory into a global process vision.

The user requirements

In order to specify in detail the data structure of the Data Warehouse, it is necessary to define the final users of this prototype. There are three typical final users:
1. The process analyst
2. The data-modelling analyst
3. The quality analyst

The informational needs of each of these users may be defined in terms of the data of interest and the front-end tools that have to be developed for the exploitation of the Data Warehouse, as shown in the following table.

USER                     INTERESTED DATA                                                            FRONT-END TOOLS
Process analyst          Process data and final product defects                                     Data analysis tools
Data-modelling analyst   Defects (before and after the tinplate line)                               Data mining tools
Quality analyst          Process data, defects, properties of the final product, refused products   Multidimensional data analysis tools

The end-user requirements provide the basis for the data structure of the data warehouse repository, whose main data entities (subjects) are:
• Tinplate process: all data related to the process, products and defects;
• Sold: data about sold products and clients;
• Refused: data related to the products refused for quality reasons.

Each subject is detailed in fact tables and related dimensions using a star schema representation, in which some dimensions (e.g. Coil, Client) are shared between more than one subject. The data model design will be described in depth in the next paragraph. The process analyst and the data-modelling analyst access the 'Tinplate process' subject directly. The quality analyst needs data stored in all three subjects, so he needs a data mart in order to have an integrated view of his data.

Data Warehouse design

The first step of the modelling phase is the data warehouse design, which requires a detailed analysis of the process and the actual data.
As already said, the data warehouse is composed of three main subjects:
• Tinplate process: this subject contains all the information related to the tinplate production process, such as global process parameters, input and output defects, product classification, etc.
• Sold: this subject contains all the relevant information about the selling process, such as customer information, ordering process and shipment data
• Refused: this subject contains information about the refusal process, which is activated when the customer returns the bought products by contesting their quality. This process may have different outcomes, depending on the reasons of the customer and of the producer (Aceralia in this case). All the information related to this quality aspect is structured into this subject.

Figure 1 shows the first level of the data warehouse logical model: the data warehouse is decomposed into the three subjects, each structured in a star schema representation.

Figure 1 - Quality data warehouse logical model

The subject 'Tinplate process' has a particular representation, called a 'constellation', which results from the union of two star schemas (one for the 'Defect fact' and one for the 'Tinplate process fact') with two shared dimensions: 'Coil' and 'Production Time'. The implementation of this subject requires six tables. The first star is constituted by the 'Defect fact table' (containing all the analysis data related to the defects detected on the final products) and three dimension tables:
• Defects: the description of all the possible input/output defects
• Coil: the description of the elementary production object, that is, the coil
• Production Time: the production time hierarchy.
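As an illustration, a constellation of this kind can be sketched as two fact tables sharing dimension tables. The snippet below is a minimal, hypothetical rendering using SQLite; all table and column names are invented for illustration (the actual prototype stores its structures as SAS datasets, not SQL tables):

```python
# Minimal sketch of a constellation schema: two fact tables
# ('defect_fact' and 'process_fact') sharing the 'coil' and
# 'production_time' dimensions. All names and values are illustrative.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- shared dimensions of the constellation
    CREATE TABLE coil            (coil_id INTEGER PRIMARY KEY, coil_type TEXT);
    CREATE TABLE production_time (time_id INTEGER PRIMARY KEY, year INT, month INT, day INT);
    -- dimensions private to each star
    CREATE TABLE defects         (defect_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE coil_class      (class_id  INTEGER PRIMARY KEY, label TEXT);
    -- the two fact tables
    CREATE TABLE defect_fact  (coil_id INT, time_id INT, defect_id INT, severity REAL);
    CREATE TABLE process_fact (coil_id INT, time_id INT, class_id INT, bath_temp REAL);
""")
con.executemany("INSERT INTO coil VALUES (?, ?)", [(1, "input"), (2, "output")])
con.execute("INSERT INTO production_time VALUES (10, 2001, 5, 14)")
con.execute("INSERT INTO defects VALUES (100, 'pinhole')")
con.execute("INSERT INTO defect_fact VALUES (2, 10, 100, 0.7)")
con.execute("INSERT INTO process_fact VALUES (2, 10, NULL, 245.0)")

# A star join on the defect star; because 'coil' and 'production_time'
# are shared, the same query pattern reaches the process star as well.
row = con.execute("""
    SELECT c.coil_type, t.year, d.description, f.severity
    FROM defect_fact f
    JOIN coil c            ON c.coil_id   = f.coil_id
    JOIN production_time t ON t.time_id   = f.time_id
    JOIN defects d         ON d.defect_id = f.defect_id
""").fetchone()
print(row)  # ('output', 2001, 'pinhole', 0.7)
```

The shared dimensions are what make this a constellation rather than two independent stars: a selection on 'Coil' or 'Production Time' constrains both fact tables consistently.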
The 'Coil' dimension is not as simple as one might imagine, because of the nature of the tinplate production process. At the beginning of the process, each input coil (incoming material) is welded to the tail of the preceding coil, forming a strip. At the end of the process this strip is again cut into coils. Since the strip may be cut at a point different from the welding point, an input coil can be shared by more than one output coil and, vice versa, an output coil can be composed of more than one input coil. The process data in the Aceralia databases cannot indicate whether a defect detected on an input coil is inherited by all the output coils in which the input coil is present or only by some of them. In order to describe the defect process it is therefore necessary to maintain the input/output coil pairs, and the 'Coil' dimension is designed to provide this kind of information.

The second star of the 'Tinplate process' subject is composed of the 'Tinplate process fact table', containing all the analysis data, that is, all the physical parameters of the multi-stage process, and three dimension tables: 'Coil', 'Production Time' and 'Coil Classification'.

The 'Sold' subject is represented by a star schema with the 'Sold fact table' and three dimensions:
• 'Sold date': the sold date hierarchy (year, month, day, etc.)
• 'Client': information about a client, divided into personal information and geographical information
• 'Part Of Coil': each coil can be sheared into sheets, so the minimal selling object is a part of a coil, and all the sheets refer to the same coil identifier.

The 'Refused' subject is represented by a star schema with the 'Refused fact table', containing all the relevant analysis data about the refusal process, and four dimension tables:
• 'Part Of Coil'
• 'Client'
• 'Refuse date': the refuse date hierarchy (year, month, day, etc.)
• 'Dispute Result': the result of the process (the contestation is accepted or rejected by Aceralia)

The process analyst and the data-modelling analyst need to work on the 'Tinplate process' data, and it is necessary to present them with a unified data environment. For this purpose a 'Tinplate process' data mart is needed, in which the two stars are joined and the data are moderately aggregated. The quality analyst needs to investigate all the data warehouse data, so he needs a general data mart containing an aggregated form of all the information in the data warehouse. Figure 2 shows the complete data warehouse model.

Figure 2 - Quality Data Warehouse Schema

The data warehouse has been realised using SAS/Warehouse Administrator® facilities, and the data structures are defined as SAS datasets.

Loading process

The loading process is the part of the data warehousing process that extracts data from the Aceralia databases and loads it into the data warehouse data structures. In Figure 3 the loading process for the 'Sold' subject is shown.

Figure 3 - The loading process schema

During the loading process several operations are performed:
• data mapping: database variables are mapped into the data warehouse data structures, integrating different data
• validation: some controls are performed to verify data congruence. In this prototype data discard management is not performed.
• loading: data are loaded into the data warehouse data structures.

Dimension tables are loaded first, then fact tables, and finally the data mart structures are populated. The loading code has been realised using both SAS/Warehouse Administrator® facilities and ad hoc code. SAS/Warehouse Administrator® has also been used to collect and maintain metadata. The metadata can be browsed using SAS/Warehouse Administrator®, as shown in Figure 4, where the loading process for the 'Tinplate process' fact table is reported.

Figure 4 - An example of loading process

Front-end

The front-end is the part of the data warehouse directly accessed by the final users. It is, for this reason, the most important part of the data warehouse itself. The purpose of the front-end is to give analysis and investigation instruments to the end users, in order to give the 'right thing' to the 'right user'. Three kinds of end users have been introduced previously:
1. The process analyst
2. The data-modelling analyst
3. The quality analyst

Each of these users has different informational needs and, therefore, each must have different investigation instruments. The data-modelling analyst has to design models that aim at product quality; he needs tools to build models, such as data mining tools. As this work follows a prototypal approach, the front-end for this kind of user is scheduled for the second part of the project. At present, the front-end parts realised are those for the process analyst and the quality analyst. They are ad hoc applications developed using both SAS/EIS® and SAS/AF® software.

The process analyst

The process analyst needs to monitor the efficiency of the process constantly and 'on the fly', in order to avoid bad product quality due to process malfunctions. He needs front-end tools, such as data analysis instruments, that ease process monitoring and the detection of the causes of bad product quality. The instruments for this user are reporting instruments, control panels and data navigation tools that focus on the part of the process where significant facts happen. In particular, it is important to focus on defects and to filter the analysis variables that refer to a specific defect and to the production section where the defect was generated.
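As a sketch of this defect-focused filtering (field names, defect types and values are invented for illustration; the prototype implements it in SAS/EIS® and SAS/AF® applications, not in Python):

```python
# Hypothetical sketch of the process analyst's filter: keep only the
# analysis variables that refer to one defect type and to the production
# section where it was generated. All names and values are illustrative.

def filter_by_defect(rows, defect, section):
    """Keep only fact rows for the given defect and production section."""
    return [r for r in rows if r["defect"] == defect and r["section"] == section]

tinplate_mart = [
    {"coil": "C001", "defect": "pinhole", "section": "plating", "bath_temp": 41.2},
    {"coil": "C002", "defect": "scratch", "section": "entry",   "tension": 3.1},
    {"coil": "C003", "defect": "pinhole", "section": "plating", "bath_temp": 44.8},
]

focus = filter_by_defect(tinplate_mart, defect="pinhole", section="plating")
# Summarise a process variable monitored in the selected section:
mean_temp = sum(r["bath_temp"] for r in focus) / len(focus)
print(len(focus), round(mean_temp, 1))  # 2 43.0
```

The same narrowing step (defect, then section, then summary of the relevant process variables) is what the reporting panels and navigation tools expose interactively.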
Figure 5 - Example of analysis navigational tool

The process analyst needs the process data stored in the 'Tinplate process' data mart.

Quality Analyst

The quality analyst is interested in product quality under all its possible aspects. The analysis can focus on the production process, but also on the life cycle of the final product after it leaves the production line and, in general, on client satisfaction. This last aspect includes information related to the detection of defects outside the production line, such as defects detected during manufacturing performed by the client, defects due to storage, and so on. Customer satisfaction also depends on non-technical parameters, such as price for quality and shipment/production time. For these reasons the quality analyst has to investigate all the data in the data warehouse, looking for relationships between the data and the technical/customer quality results. The front-end instruments for this kind of user are data analysis tools that ease data exploitation, so that the user can cross as many variables as he likes, perform drill-down and roll-up operations on the data, design charts, and so on. These instruments must also integrate statistical analysis tools that help users discriminate actual correlations from unrelated variables; that means having a multidimensional data analysis tool integrated with the powerful statistical instruments of the SAS System.

Figure 6 - Example of multidimensional graphical tool

The analysed data are stored both in the quality data mart and in the quality data warehouse.

Conclusions

This work, still in progress, has produced a useful instrument to collect different processing data and to analyse these data in an easy and rich manner.
It represents the indispensable basis for further work on data warehouse instruments, such as data mining analysis, in order to look for the causes of quality defects and to discover the relationships between product properties and production process parameters.

Acknowledgement

This work was sponsored by the European Coal and Steel Community (ECSC), whose financial support is gratefully acknowledged.