SETIT 2005
3rd International Conference: Sciences of Electronic,
Technologies of Information and Telecommunications
March 27-31, 2005 – TUNISIA
Data mining and models for human adapted system:
a multi-methodological approach
Vellemans P. *, Billaudel P. * and Riera B.**
*
IFTS – CReSTIC
7 Boulevard Jean Delautre
08000 Charleville Mézières
[email protected]
[email protected]
**
Laboratoire CReSTIC (formerly LAM)
Moulin de la Housse
BP 1039
51687 Reims Cedex
[email protected]
Abstract: In the last few years, the amount of information held in databases has increased considerably. To deal with this
volume of information, a new approach known as data mining has developed. However, the number of methods used in
data mining applications has also increased noticeably, which can make the field difficult to understand. Another
problem is the sheer size of the databases. Once data have been captured and stored, certain questions naturally arise:
Will these data help a business gain an advantage? How can we use historical data to build models of the underlying
processes that generated them? How can we avoid getting lost in the data? How can we use the "hidden" information that
lies dormant in them? This paper presents the motivation for, and the current state of, our research on the development
of a cognitive decision-support system for production monitoring. All the information sources are combined in order to
facilitate decision-making and are adapted to the cognitive characteristics of the human operators. The work presented
relates to the field of knowledge capitalization and knowledge usability. We illustrate our approach with an industrial
example: the manufacture of crosspiece supports for cars.
Key words: Human-machine cooperation, Data handling systems, Industrial production systems, Knowledge-based
systems, Models.
1 Introduction
Nowadays, manufacturers face a variety of demands
that oblige them to adopt a continuous-improvement
approach. Many requirements must be reconciled, such
as raw materials, reliability, safety, energy saving and
environmental protection. With international
competition added, it is clear that quality management
and control have become a determining factor in a
company's development (Rezg & al., 1995). To meet
these objectives, production systems must be able:
• to adapt to a change in production (multi-product
manufacture) or to a contingency (introduction of
a rush order, etc.),
• to respond quickly and economically.
The large size of Dynamic Industrial Systems has two
important consequences. On the one hand, automation
cannot be total: the supervision loop includes human
operators as well as automatic systems (algorithms).
On the other hand, describing such systems requires
different models (behavioural, structural, functional,
dysfunctional...) to coexist at various levels of
abstraction (structural and functional decompositions).
Figure 1 summarizes this multi-viewpoint analysis for
supervision.
Moreover, progress in data acquisition and storage
technology has led to a tremendous, fast-growing
amount of data stored in databases and data warehouses.
[Figure 1 labels: targets, technical specifications; system analysis; models: functional analysis (GTST, MFM), structural analysis (topography, information theory), behavioral analysis (bond-graph, causal graphs); specification of algorithms (FDI…) and of the human-machine interface]
Figure 1. Multi-viewpoint analysis for supervision
(Riera, 2001)
The industrial context of our work is as follows: more
than 20% of the pieces produced by the installation are
faulty. The experts have not managed to identify the
causes of failure across the installation as a whole. At
present, for the same manufacturing routings, the
number of good pieces the operators produce varies
unpredictably.
Every relevant item of information about the system
(mould tilting time, temperature of the production
tools…) and about the product (sand curing
temperature, aluminium temperature) is stored in a
database.
To date, it is impossible to trace, across the various
workshops, which components were used to
manufacture a given crosspiece (figure 3).
To establish this link between the databases, we will
use the concept of traceability, together with fuzzy
logic and possibility theory (Dubois & al., 1998).
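As an illustration of how possibility theory could support such a link, the following Python sketch grades, between 0 and 1, how possible it is that a given batch of cores was used for a given casting, based only on the delay between the two recorded times. The trapezoidal distribution, its bounds (in minutes), the record fields and the function names are all assumptions made for this example; none of them comes from the installation described here.

```python
from dataclasses import dataclass

def trapezoid(x, a, b, c, d):
    """Possibility degree of x for a trapezoidal distribution with a < b <= c < d."""
    if x <= a or x >= d:
        return 0.0          # delay outside the support: link impossible
    if b <= x <= c:
        return 1.0          # delay inside the core: link fully possible
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (d - x) / (d - c)       # falling edge

@dataclass
class CoreBatch:            # hypothetical record from the core workshop
    batch_id: str
    made_at: float          # minutes since the start of the shift (assumed unit)

@dataclass
class Casting:              # hypothetical record from the casting workshop
    piece_id: str
    cast_at: float

def link_possibility(batch, casting, bounds=(30.0, 60.0, 180.0, 240.0)):
    """Possibility that `batch` fed `casting`; the bounds are illustrative only."""
    return trapezoid(casting.cast_at - batch.made_at, *bounds)

def candidate_batches(casting, batches):
    """Batches with a non-zero possibility, most possible first."""
    scored = [(link_possibility(b, casting), b.batch_id) for b in batches]
    return sorted([s for s in scored if s[0] > 0.0], reverse=True)

batches = [CoreBatch("C1", 0.0), CoreBatch("C2", 100.0), CoreBatch("C3", 200.0)]
piece = Casting("P42", 245.0)
ranking = candidate_batches(piece, batches)
print(ranking)
```

A graded ranking of this kind keeps several candidate links open instead of forcing a single, possibly wrong, match between the databases.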
Although valuable information may be hiding
behind the data, the vast data volume makes it
difficult, if not impossible, for human beings to extract
it without powerful tools. To relieve this data-rich but
information-poor plight, a new discipline named data
mining emerged during the late 1980s, devoted to
extracting knowledge from huge volumes of data
(Zouh).
2 Research context
We work on a foundry line producing aluminium
crosspiece supports for cars (figure 2). To produce a
crosspiece support, three workshops are essential:
• the mould core manufacturing workshop,
• the aluminium production workshop,
• the aluminium casting workshop.
Figure 2. The foundry line producing aluminium
crosspiece supports for cars (1)
(1) The fourth workshop represents the finishing
workshops.
Figure 3. Seeking the link between the data
3 Data mining, methods and models: our
contribution
3.1 Data mining
Data mining is part of the (now irreversible) trend
towards knowledge management. Data mining will
never replace expertise, but it is a valuable tool for
formalizing and improving it. It often makes it possible
to move from tacit knowledge ("I can do it") to explicit
knowledge ("I can say how I do it"). That knowledge
can then be communicated and extended within the
company. Data mining is only one element of the
process that transforms data into knowledge: it makes
it easier to describe models or rules starting from the
observation of data.
DM techniques can provide knowledge about the
product. Figure 4a shows the cycle by which data are
transformed into knowledge, broken down into steps.
DM needs a certain quantity of data to extract
representative knowledge; this is why these techniques
are better suited to frequently encountered problems or
repetitive tasks, for which training data can be
obtained.
Figure 4a. Knowledge management approach (Lefébure
& al., 2001)
Figure 4b. Knowledge management approach with a
multi-methodological DM tool
Thus, for repetitive tasks, DM methods make it
possible to compare the current progress of a task with
an equivalent past situation, in order to anticipate the
result and the next stage that should occur.
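One simple way to picture this comparison is a nearest-neighbour search over past runs. The sketch below assumes that each task is recorded as a list of numeric measurements taken at successive stages, together with its final outcome; the history values and the function names are invented for illustration and are not taken from the installation.

```python
import math

def prefix_distance(current, past):
    """Euclidean distance between the current partial run and the
    same-length beginning of a past run."""
    n = len(current)
    return math.sqrt(sum((c - p) ** 2 for c, p in zip(current, past[:n])))

def most_similar_run(current, history):
    """Return (distance, measurements, outcome) of the closest past run
    that is at least as long as the current one."""
    candidates = [
        (prefix_distance(current, measurements), measurements, outcome)
        for measurements, outcome in history
        if len(measurements) >= len(current)
    ]
    return min(candidates, key=lambda t: t[0])

# Hypothetical history: one measurement per stage, plus the observed outcome.
history = [
    ([700.0, 695.0, 690.0, 688.0], "good"),
    ([700.0, 680.0, 660.0, 640.0], "faulty"),
    ([710.0, 700.0, 694.0, 690.0], "good"),
]
current = [702.0, 682.0]            # task in progress, two stages observed

dist, past, outcome = most_similar_run(current, history)
next_stage_estimate = past[len(current)]   # value the closest past run took next
print(outcome, next_stage_estimate)
```

Here the run in progress is closest to the second historical run, so its outcome and next measurement serve as the anticipation. With an empty history the search has nothing to return and `min` raises a `ValueError`.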
The tools and methods suggested in the literature have
not fully satisfied us as regards control over the
knowledge extraction. We therefore propose a new
methodology for our diagnosis tools; this methodology
relies heavily on tools taken from the literature. Many
studies are devoted to comparing methods on simulated
or real data. The common-sense lesson to be learnt
from these DM methods is that there is no best method:
their intrinsic properties and the hypotheses they
require fit the problem at hand more or less well.
Moreover, the data are not all of the same kind
(continuous, event-driven, conceptual, hybrid), and to
date no method can claim to handle all these data
types. For this reason our work moves towards a
“multi-methodological” approach. The aim is to make
the various DM methods coexist and thus use the
advantages of some to get round the limits of others:
“United we stand, divided we fall”
Data mining requires the use, explicit or not, of
traditional statistical methods (principal components
analysis, discriminant analysis, K nearest neighbours,
segmentation, linear regression), of less traditional
ones (classification and regression trees) or of
artificial intelligence methods (Bayesian networks,
pattern recognition). The techniques briefly listed
above pursue similar goals and can appear as
competitors or, rather, as complementary (Besse & al.).
Schematically, the research targets four non-exclusive
objectives:
• Exploration, for a first approach to the data:
checking the data by searching for
inconsistencies and for atypical, missing or
erroneous values, and transforming them
prior to other processing.
• Classification (clustering), to discover a
typology or a segmentation of the
observations.
• Modelling by a set of variables, to explain a
quantitative or qualitative target variable.
This is then a regression or a discrimination
(or classification).
• Pattern recognition without training. The aim
is to detect an original configuration
(pattern) that stands out from the data.
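The exploration objective in the list above can be sketched in a few lines of Python. The example assumes that records are dictionaries, that a missing value is stored as `None`, and that "atypical" means a modified z-score (based on the median absolute deviation) above 3.5; the field name `alu_temp`, the sample values and that threshold are illustrative choices, not taken from the paper.

```python
from statistics import median

def explore(records, field, threshold=3.5):
    """Report the indices of missing and atypical values of one numeric field."""
    missing = [i for i, r in enumerate(records) if r.get(field) is None]
    present = [(i, r[field]) for i, r in enumerate(records) if r.get(field) is not None]
    values = [v for _, v in present]
    atypical = []
    if values:
        med = median(values)
        mad = median(abs(v - med) for v in values)   # median absolute deviation
        if mad > 0:
            # Modified z-score; 0.6745 rescales the MAD to a standard deviation
            # under a normal distribution (assumed here only as a convention).
            atypical = [i for i, v in present
                        if 0.6745 * abs(v - med) / mad > threshold]
    return {"missing": missing, "atypical": atypical}

# Hypothetical aluminium temperatures; one missing, one implausible.
records = [
    {"alu_temp": 700.0},
    {"alu_temp": 705.0},
    {"alu_temp": None},
    {"alu_temp": 698.0},
    {"alu_temp": 70.0},
    {"alu_temp": 702.0},
]
report = explore(records, "alu_temp")
print(report)
```

A median-based rule is used rather than the mean and standard deviation because a single erroneous record would otherwise distort the very statistics used to detect it.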
We have therefore taken the point of view of Lefébure
& Venturi and added our vision of the
multi-methodological DM tool (figure 4b).
3.2 Methods
It is important that these predictors have an easily
readable form and, if possible, one already known
outside the field. There is a trade-off between the
clarity of a model and its predictive capacity. The
simpler the form of a model, the easier it is to
understand, but the less it can account for fine or
highly varied (non-linear) dependencies.
For any given problem, the nature of the data affects
the choice of models and algorithms. There is no
“best” model or algorithm. Consequently, a variety of
tools and technologies is needed in order to find the
best possible model. To that end, we listed the data
mining methods as a whole; table 1 gives an extract.
To our knowledge, nobody has yet taken the time to
index and compare these methods.
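The idea that no method is best in general, and that several must be tried on the data at hand, can be sketched as follows. This minimal Python example scores two deliberately simple classifiers (1 nearest neighbour and a nearest-class-mean rule) by leave-one-out evaluation and keeps the better one. The toy data set, the single feature and all names are assumptions made for illustration; the methods are stand-ins, not the tools used in this work.

```python
def one_nn(train, x):
    """Label of the closest training point (1 nearest neighbour)."""
    return min(train, key=lambda item: abs(item[0] - x))[1]

def nearest_mean(train, x):
    """Label of the class whose mean value is closest to x."""
    means = {}
    for label in sorted({y for _, y in train}):
        vals = [v for v, y in train if y == label]
        means[label] = sum(vals) / len(vals)
    return min(means, key=lambda label: abs(means[label] - x))

def leave_one_out_accuracy(method, data):
    """Fraction of points predicted correctly when each is held out in turn."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += method(train, x) == y
    return hits / len(data)

def best_method(methods, data):
    """Score every method on the same data and return the best name and all scores."""
    scores = {name: leave_one_out_accuracy(fn, data) for name, fn in methods.items()}
    return max(scores, key=scores.get), scores

# Hypothetical data: (temperature, quality); good pieces lie in a middle band.
data = [
    (640.0, "faulty"), (650.0, "faulty"),
    (690.0, "good"), (700.0, "good"), (710.0, "good"),
    (750.0, "faulty"), (770.0, "faulty"),
]
best, scores = best_method({"one_nn": one_nn, "nearest_mean": nearest_mean}, data)
print(best, scores)
```

On this toy set a single class mean cannot separate a middle band from both of its sides, so the nearest-neighbour rule wins. With other data the ranking could reverse, which is the reason for keeping several methods available.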
Table 1. An extract of the advantages and drawbacks
of each data mining method

Methods based on decision trees (Jambu, 1999)
ADVANTAGES: Easy to understand and interpret (each path leads to a leaf). They handle non-numerical data very well.
DRAWBACKS: Strong relations in the data are not represented. The clarity of the trees can be misleading. The trees lack predictive smoothness.

Methods based on rules
ADVANTAGES: Contrary to the decision trees, the rules are not necessarily independent.
DRAWBACKS: Very large number of rules, difficult to interpret, for a voluminous database. The rules can give conflicting forecasts.

Neural network (Rumelhart & al., 1994)
ADVANTAGES: Learns from the examples it "sees" rather than simply repeating the past. They are robust. Capacity for generalization. Good for problems about which little information is known a priori. Automated training, strong predictive capacity, tolerance of collinearity.
DRAWBACKS: Inability to explain the relations found (causes and effects). Their flexibility is such that they will find many false models when the signal-to-noise ratio is low. Fitting the training values too closely can result in modelling particular, non-relevant cases.

K nearest neighbors
ADVANTAGES: Good at discovering the regions occupied by groups.
DRAWBACKS: Require a large amount of memory. Can be extremely sensitive to similar records.

Genetic algorithms (Two Crows, 1999)
ADVANTAGES: Good for forecasting problems involving non-linear data. Make it possible to obtain solutions to problems that have no resolution method or whose exact solution is difficult to find in a reasonable time. They do not require any knowledge of how to solve the problem (only the ability to evaluate the quality of a solution).
DRAWBACKS: Identical to those of neural networks. The method of resolution is not known.
3.3 Models
The aim of our research is the development of a
cognitive decision-support system for production
monitoring. To implement such a system, it is
necessary to know the process. That seems obvious;
however, a considerable number of projects begin
differently. Eventually, one realizes that some
information is known only at the level of a complete
line, whereas several intermediate operations are
executed, with rejects possibly detected at each of
them (Allot). For this reason, we modelled our
installation as well as the intended product. The
formalism employed is the Petri net. The evolutions of
the Petri net represent either the expected (normal)
operation of the system or failure situations.
Each evolution is composed of a set of events and of
time constraints between these events. The distribution
of the time model represented by an evolution induces
the distribution of the event occurrences (Ghallab,
1998). These models are included in the knowledge
management approach.
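To make the formalism concrete, here is a minimal Python sketch of a Petri net with ordinary arcs (weight 1), followed by a check of one time constraint between two dated events. The places, transitions, dates and delay bounds are hypothetical and only loosely inspired by the three workshops; this is not the model built for the installation.

```python
class PetriNet:
    """Minimal Petri net; assumes arc weight 1 and distinct input places."""

    def __init__(self, marking, transitions):
        self.marking = dict(marking)      # place -> number of tokens
        self.transitions = transitions    # name -> (input places, output places)

    def enabled(self, name):
        """A transition is enabled when every input place holds a token."""
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= 1 for p in inputs)

    def fire(self, name):
        """Consume one token per input place, produce one per output place."""
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1

def respects_delay(events, first, second, min_gap, max_gap):
    """Check a time constraint between two dated events (same time unit)."""
    gap = events[second] - events[first]
    return min_gap <= gap <= max_gap

# Hypothetical net: a core and molten aluminium are needed to cast a piece,
# which is then either accepted (normal evolution) or rejected (failure).
net = PetriNet(
    marking={"core_ready": 1, "aluminium_ready": 1},
    transitions={
        "cast": (["core_ready", "aluminium_ready"], ["piece_cast"]),
        "accept": (["piece_cast"], ["good_piece"]),
        "reject": (["piece_cast"], ["faulty_piece"]),
    },
)
net.fire("cast")
can_accept, can_reject = net.enabled("accept"), net.enabled("reject")
net.fire("accept")

events = {"core_ready": 0.0, "cast": 95.0}     # illustrative dates, in minutes
on_time = respects_delay(events, "core_ready", "cast", 30.0, 240.0)
print(net.marking, on_time)
```

After casting, both "accept" and "reject" are enabled, which is how a single net can carry the normal evolution and a failure evolution side by side. Firing one of them consumes the token and disables the other.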
We have therefore added the process and product
models to our vision of the multi-methodological DM
tool (figure 4c).
Figure 4c. Product and process models included in
the knowledge management approach
Conclusion - progress report
This paper has been concerned with the development
of a cognitive decision-support system for production
monitoring (figure 5).
Figure 5. Product and process models, for knowledge
capitalization purposes
We have recorded the values of the different
parameters and the quality of the resulting pieces in a
database; we have applied Principal Components
Analysis (PCA) and Multiple Correspondence Analysis
(MCA) to test whether our hypotheses hold. However,
these analyses did not give any result, owing to the
mass of parameters to be processed and to the
irrelevance of many of them. We have therefore turned
to data mining methods, which will be integrated into
our assistance tool.
Moreover, the crosspiece support production system
has a traceability problem between the entities
"Cores", "Aluminium" and "Process". We are working
to integrate this traceability, which is essential for
such products.
Finally, we have modelled the installation from the
process and product points of view; to that we add a
model of the “good” and “faulty” operations of the
installation. With this vision of the
multi-methodological tool, we can start to answer the
questions raised at the beginning and, at the same
time, to put in place a traceability worthy of the name.
As perspectives, we propose the following lines of
work:
• the co-operation between the human operator,
the process, the product and the data must be
studied extensively. Indeed, the nature of the
supervised system and its characteristics must
be taken into account when deciding on a
form of co-operation,
• to establish traceability for improvement and
to seek the causes of faulty operation,
• to make our analysis module coexist with the
model of the installation, all combined in a
tool for diagnosis and industrial supervision.
References
Allot, P. La réalisation de votre MES de A à Z.
Ordinal Technologies.
Besse, P., Le Gall, C., Raimbault, N. and Sarpy, S.
Data mining et statistique. Communication.
Dubois, D. and Prade, H. (1998). Possibility theory:
qualitative and quantitative aspects. Handbook of
defeasible reasoning and uncertainty management
systems (Vol. 1, pp. 169-226). Kluwer, Netherlands.
Ghallab, M. (1998). Chronicles as practical
representation for dealing with time, events and
actions. AIIA Conference, Padoue, Italy.
Jambu, M. (1999). Introduction au data mining –
analyse intelligente des données. Eyrolles, Paris,
France.
Lefébure, R. and Venturi, G. (2001). Data mining –
Gestion de la relation client, personnalisation de sites
web. Eyrolles, Paris, France.
Rezg, N. and Niel, E. (1995). Monitoring system
for discrete event system using failure-tolerance
techniques. INRIA/IEEE Conference on Emergent
technologies and the manufacture systems automation,
Paris, France.
Riera, B. (2001). Contribution à la conception
d’outils de supervision adaptés à l’homme. HDR,
Valenciennes, France.
Rumelhart, D.E., Widrow, B. and Lehr, M.A.
(1994). The basic ideas in neural networks.
Communications of the ACM, 37.
Two Crows Corporation. (1999). Introduction to
data mining and knowledge discovery (third edition).
Zouh, Z.H. Three perspectives of Data Mining.
National Laboratory for Novel Software Technology,
Nanjing University, Nanjing 210093, China.