A prototype of Quality Data Warehouse in
Steel Industry
Maria Murri – Centro Sviluppo Materiali S.p.A.
Camillo De Vecchis – CM Pansid S.p.A.
Abstract
Final product quality is one of the fundamental aspects of the steel industry.
For this reason it has become more and more important to prove product quality, to look for the causes of quality defects and to discover the relationships between product properties and production process parameters. Further, in order to verify product quality in terms of 'customer satisfaction', it is important to investigate the relationships between product properties and sold/refused products.
To address these tasks it is necessary, on the one hand, to handle factory information and data in a comprehensive form and, on the other hand, to have powerful tools to analyse and interpret them.
The work describes a Quality Data Warehouse (QDW) prototype, developed using SAS/Warehouse Administrator V8 for a tinplate production plant. Process/product data and quality information are structured in the QDW in order to satisfy the requirements of several end users (process analyst, quality analyst, data/modelling analyst, etc.).
Introduction
This paper describes a multi-partner research
project carried out by Centro Sviluppo Materiali
(CSM) and Aceralia.
CSM, formerly the corporate research centre of
the Italian public steel industry, is now a limited
company, where the majority shareholders are
private steel producers and material users.
Aceralia, formerly CSI, is an integrated steelworks that employs 9000 people in Asturias, Bilbao and Valencia.
One of the main interests of steel research is the
improvement of product quality and the reduction of
production costs. To reach these aims many different approaches are pursued: the development of models for different metallurgical processes, the optimisation of process control and automation systems, integrated quality assurance and others. Extremely important for a correct approach is good management of plant and process data in order to obtain proper information.
Good data management is all the more necessary because of the complexity of data acquisition and the number of data sources in steel plants. Data storage in industrial plants with multi-stage processes is typically very heterogeneous because the systems were implemented at different times, on different hardware and with different database concepts. This leads to problems in data transfer as well as in data processing because of different ways of coding and pre-processing.
On one hand, installed process computers store plant and process data for short time periods at high sampling rates; on the other hand, syntheses of these data are collected in higher-level databases, which normally consist of commercial software tools based on a Relational Data Base (RDB) structure.
Data referring to product quality or customer satisfaction are rarely in a form suitable for combination with process information.
The Data Warehouse approach is a valid tool
for solving these problems and for obtaining
proper information from data.
The case study
CSM has developed a prototype of a Quality Data Warehouse applied to the process/plant data of the 'Tinplate 2' coating line for tinplate production at Aceralia.
The tinplate production process can be synthesised as the transformation of a coil of properly prepared steel (incoming material) into a tin-coated coil/sheet (final product).
The incoming material is processed along the
following sections of the coating line:
• An entry section that includes two uncoilers, a
shear, a welder, a drag bridle and a strip
accumulator
• A pre-plating section that comprises alkaline
electrolytic cleaning tanks, electrolytic
pickling tanks, rinse tanks
• A plating tank section
• A drag-out recovery unit
• A melting tower and quench-tank section
• A chemical-treatment section
• An emulsion and/or electrostatic oiling unit
• A delivery section that consists of the drive bridle, coilers, shear, pinhole detector, thickness gauges, run-out tables, classifiers and pilers.
Each section provides, for each coil, a huge
quantity of process parameters that have been
integrated into the data model.
The relevant data
The relevant data groups are defined considering
both the process/product data and the quality
information, therefore the defects detected on the
incoming material and on the final product have to
be included.
The main groups of data for each coil are:
• process parameters (of tinplate line)
• incoming material defects (before the tinplate
line)
• final product defects (after the tinplate line)
• properties of the incoming material
• properties of the final product
Besides these data, for a more complete analysis of product quality, it is necessary to consider some information about 'customer satisfaction' in terms of sold and refused (for quality reasons) products. This information allows building a prototype that is able to integrate the different functional areas of a factory into a global process vision.
The user requirements
In order to specify in detail the data structure in
the Data Warehouse, it is necessary to define the
final users for this prototype. There are almost three
typical final users:
1. The process analyst
2. The data-modelling analyst
3. The quality analyst
The informational needs of each of these users
may be defined in terms of interested data and frontend tools that have to be developed for the Data
Warehouse exploitation (as shown in the following
table).
USER                    INTERESTED DATA                            FRONT-END TOOLS
Process analyst         Process data and final product defects     Data analysis tools
Data modelling analyst  Defects (before and after tinplate line)   Data mining tools
Quality analyst         Process data, defects, properties of       Multidimensional data
                        the final product, refused products        analysis tools
The end user requirements provide the basis
for the data structure in the data warehouse
repository where the main data entities (subjects)
are:
• Tinplate process: all data related to the
process, products and defects;
• Sold: data about sold products and clients;
• Refused: data related to the refused products
for quality reasons.
Each subject is detailed in fact tables and related dimensions using a star schema representation, in which some dimensions (e.g. coil identifier, client) are shared between more than one subject. The data model design is described in depth in the next section.
The process analyst and the data modelling analyst access the 'Tinplate process' subject directly. The quality analyst needs data stored in all three subjects, so he needs a data mart in order to have an integrated view of his data.
Data Warehouse design
The first step of the modelling phase is the data
warehouse design that requires a detailed analysis
of the process and the actual data. As already said,
the data warehouse is composed of three main
subjects:
• Tinplate process: this subject contains all the information related to the tinplate production process, like global process parameters, input and output defects, product classification, etc.
• Sold: this subject contains all the relevant information about the selling process, like customer information, ordering process and shipment data
• Refused: this subject contains information about the refusal process that is activated when the customer returns the bought products by contesting their quality. This process may have a different result depending on the positions of the customer and the producer (Aceralia in this case). All the information related to this quality aspect is structured into this subject.
In Figure 1 the first level of the data warehouse
logical model is shown: the data warehouse is
decomposed into the three subjects that are
structured in a star schema representation.
[Figure 1 shows the data warehouse decomposed into the three subjects TINPLATE PROCESS, SOLD and REFUSED, with fact tables (Defects Fact, Tinplate Process, Sold, Refused) and dimensions (Defects, Production Time, Coil Classification, Product Type, Dispute Result, Coil, Part Of Coil, Sold Date, Refuse Date, Client, Country).]
Figure 1 - Quality data warehouse logical model
The subject 'Tinplate process' has a particular representation, called a 'constellation', which is the result of the union of two star schemas (one for the 'Defect fact' and one for the 'Tinplate process fact') with two shared dimensions: 'Coil' and 'Production Time'. The implementation of this subject requires six tables: the first star is constituted by the 'Defect fact table' (containing all the analysis data related to the defects detected on the final products) and three dimension tables:
• Defects: the description of all the possible input/output defects
• Coil: the description of the elementary production object, that is the coil
• Production Time: the production time hierarchy.
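As an illustration, the following PROC SQL sketch shows how an analysis query might join the 'Defect fact table' to its three dimensions. All library, dataset and column names (qdw, defect_fact, defects, coil, prod_time and their keys) are hypothetical and do not reflect the actual warehouse schema.

/* Minimal star-schema query: defect counts per month, assuming
   a qdw libref already assigned to the warehouse tables */
proc sql;
   create table work.defects_by_month as
   select t.year,
          t.month,
          d.defect_desc,
          count(*) as n_defects
   from qdw.defect_fact f
        join qdw.defects   d on f.defect_id = d.defect_id
        join qdw.coil      c on f.coil_id   = c.coil_id
        join qdw.prod_time t on f.time_id   = t.time_id
   group by t.year, t.month, d.defect_desc;
quit;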
The 'Coil' dimension is not as straightforward as one might imagine, because of the nature of the tinplate production process. At the beginning of the process, each input coil (incoming material) is welded to the tail of the preceding coil, becoming a strip. At the end of the process this strip is cut into coils again. It is possible that the strip is cut at a different point from the weld point, so an input coil can be shared by more than one output coil and, vice versa, an output coil can be composed of more than one input coil. The data in the Aceralia databases cannot say whether a defect detected on an input coil is inherited by all the output coils in which the input coil is present or only by some of them. It is therefore necessary, in order to describe the defects process, to maintain the input/output coil couples. The 'Coil' dimension is designed to provide this kind of information.
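A minimal sketch of how such input/output coil couples might be stored and queried follows; the dataset and column names (coil_dim, in_coil_id, out_coil_id) and the sample rows are purely hypothetical.

/* One row per (input coil, output coil) couple created by
   welding and re-cutting the strip */
data work.coil_dim;
   length in_coil_id out_coil_id $ 8;
   input in_coil_id $ out_coil_id $;
   datalines;
IN001 OUT001
IN001 OUT002
IN002 OUT002
IN002 OUT003
;
run;

/* All output coils that may inherit a defect detected on
   input coil IN001 */
proc sql;
   select distinct out_coil_id
   from work.coil_dim
   where in_coil_id = 'IN001';
quit;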
The second star of the 'Tinplate process' subject is composed of the 'Tinplate process fact table', containing all the analysis data, that is, all the physical parameters of the multi-stage process, and three dimension tables: 'Coil', 'Production Time' and 'Coil Classification'.
The 'Sold' subject is represented by a star schema with the 'Sold fact table' and three dimensions:
• 'Sold date': the sold date hierarchy (year, month, day, etc.)
• 'Client': information about a client, divided into personal information and geographical information
• 'Part Of Coil': each coil can be sheared into sheets, so the minimal selling object is a part of a coil and all the sheets refer to the same coil identifier.
The 'Refused' subject is represented by a star schema with the 'Refused fact table', containing all the relevant analysis data about the refusal process, and four dimension tables:
• 'Part Of Coil'
• 'Client'
• 'Refuse date': the refuse date hierarchy (year, month, day, etc.)
• 'Dispute Result': the result of the process (the contestation is accepted or rejected by Aceralia)
The process analyst and the data modelling analyst work on the 'Tinplate process' data, and it is necessary to present them with a single data environment. For this purpose a 'Tinplate process' data mart is needed, in which the two stars are joined and the data are moderately aggregated.
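The following PROC SQL sketch illustrates how such a data mart might be built, joining the two stars on their shared dimensions with a moderate aggregation (one row per coil and defect); again, all dataset and column names are hypothetical.

/* Hypothetical 'Tinplate process' data mart: process parameters
   averaged per coil, crossed with the detected defects */
proc sql;
   create table qdw.tinplate_dm as
   select p.coil_id,
          d.defect_desc,
          avg(p.line_speed)  as avg_line_speed,
          avg(p.tin_coating) as avg_tin_coating,
          count(f.defect_id) as n_defects
   from qdw.process_fact p
        left join qdw.defect_fact f
               on p.coil_id = f.coil_id
        left join qdw.defects d
               on f.defect_id = d.defect_id
   group by p.coil_id, d.defect_desc;
quit;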
The quality analyst needs to investigate all the data in the warehouse, so he needs a general data mart that contains an aggregated form of all the information in the data warehouse.
Figure 2 shows the complete data warehouse
model.
Figure 2 - Quality Data Warehouse Schema
The data warehouse has been realised using
SAS/Warehouse Administrator® facilities and the
data structures are defined using SAS datasets.
Loading process
The loading process is the part of the data
warehousing process that extracts data from
Aceralia databases and loads it into the data
warehouse data structures. In Figure 3 the loading
process for the ‘Sold’ subject is shown.
Figure 3 - The loading process schema
During the loading process some operations are performed:
• data mapping: database variables are mapped into the data warehouse data structures, integrating the different sources
• validation: some checks are performed to verify data congruence; in this prototype data discard management is not performed
• loading: data are loaded into the data warehouse data structures
Dimension tables are loaded first, then fact tables, and finally the data mart structures are populated.
The loading code has been realised using both SAS/Warehouse Administrator® facilities and ad hoc code. SAS/Warehouse Administrator® has also been used to collect and maintain metadata. The metadata can be browsed using SAS/Warehouse Administrator®, as shown in Figure 4, where the loading process for the 'Tinplate process' fact table is reported.
Figure 4 - An example of loading process
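As a sketch of the mapping/validation/loading steps listed above, the fragment below stages the 'Sold' data; the source library, dataset and column names are hypothetical and, in line with the prototype, no discard management is performed (invalid rows are simply dropped).

/* mapping: source variables are renamed and combined into the
   warehouse structure (src and qdw librefs assumed assigned) */
data work.sold_stage;
   set src.sales(rename=(cod_cli=client_id dt_sale=sold_date));
   length part_of_coil_id $ 12;
   part_of_coil_id = catx('-', coil_id, sheet_no);
run;

/* validation: rows failing the congruence checks are dropped */
data work.sold_valid;
   set work.sold_stage;
   if missing(client_id) or missing(sold_date) then delete;
run;

/* loading: dimension tables first, then the fact table */
proc append base=qdw.client    data=work.client_stage;
run;
proc append base=qdw.sold_fact data=work.sold_valid;
run;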
Front-end
The front-end is the data warehouse part
directly accessed by the final users. It is, for this
reason, the most important part of the data
warehouse itself.
The purpose of the front-end is to give analysis
and investigation instruments to the end users, in
order to give the ‘right thing’ to the ‘right user’.
Previously three kinds of end users have been
introduced:
1. The process analyst
2. The data-modelling analyst
3. The quality analyst
Each of these users has different informational needs and, therefore, each user must have different investigation instruments. The data-modelling analyst has to design models that aim at product quality. He needs tools to build models, like data mining tools. As this work follows a prototypal approach, the front-end for this kind of user is scheduled for the second part of the project. The front-end parts realised so far are those for the process analyst and the quality analyst. They are ad hoc applications developed using both SAS/EIS® and SAS/AF® software.
The process analyst
The process analyst needs to monitor the efficiency of the process constantly and 'on the fly' in order to avoid bad product quality due to process malfunctions. He needs front-end tools such as data analysis instruments that make process monitoring and the detection of the causes of bad product quality easy. The instruments for this user are: reporting instruments, control panels and data navigation tools that focus on the part of the process where significant events happen. In particular, it is important to focus on defects and to filter the analysis variables referring to a specific defect and to the production section where the defect has been generated.
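For instance, a minimal sketch of such a filtered view, reusing the hypothetical data mart sketched earlier (the defect code 'pinhole' is likewise illustrative):

proc means data=qdw.tinplate_dm n mean std min max;
   where defect_desc = 'pinhole';  /* hypothetical defect code */
   var avg_line_speed avg_tin_coating;
run;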
Figure 5 - Example of analysis navigational tool
Figure 6 - Example of multidimensional graphical tool
The process analyst needs the process data that are stored in the 'Tinplate process' data mart.
Quality Analyst
The quality analyst is interested in product quality under all its possible aspects. The analysis can focus on the production process, but also on the life cycle of the final product after it leaves the production line and, in general, on client satisfaction. This last aspect includes information related to the detection of defects outside the production line, like defects detected during the manufacturing performed by the client, defects due to storage and so on. Customer satisfaction also depends on non-technical parameters, like price relative to quality and shipment/production time. For these reasons the quality analyst has to investigate all the data of the data warehouse, looking for relationships between the data and the technical/customer quality results.
The front-end instruments for this kind of user are data analysis tools that make data exploitation easy, so the user can cross as many variables as he likes, perform drill-down and roll-up operations on the data, design charts and so on. These instruments must also integrate statistical analysis tools to help the user discriminate actual correlations from unrelated variables; this means having a multidimensional data analysis tool integrated with the powerful statistical instruments of the SAS System. The analysed data are stored both in the quality data mart and in the quality data warehouse.
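As an illustration of crossing variables and then checking whether an apparent relationship is an actual correlation, the sketch below combines a multidimensional summary with a correlation step; the quality data mart name and its columns are hypothetical.

/* Refused quantities crossed by client country and defect */
proc tabulate data=qdw.quality_dm;
   class country defect_desc;
   var refused_qty;
   table country*defect_desc, refused_qty*(sum mean);
run;

/* Correlation between process parameters and refusals */
proc corr data=qdw.quality_dm;
   var avg_line_speed avg_tin_coating;
   with refused_qty;
run;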
Conclusions
This work, still in progress, has produced a sound instrument to collect heterogeneous process data and to analyse them in an easy and rich manner. It represents the indispensable basis for the further work that will be done on data warehouse instruments, such as data mining analysis, in order to look for the causes of quality defects and to discover the relationships between product properties and production process parameters.
Acknowledgement
This work is sponsored by the European Coal and Steel Community (ECSC), whose financial support is gratefully acknowledged.