VISUALIZATION METHODS OF DATA MINING MODELS IN
PROCESSING TELEMETRY INFORMATION
A.O. Deripaska1, V.V. Geppener1
1 Science and Engineering Centre of the St. Petersburg State Electrotechnical University, 197376, Russia, St. Petersburg, 5 Professora Popova St., (812) 346-08-67, [email protected]
This report considers the specific features of working with telemetry measurements, the possibility of applying Data Mining to the processing and analysis of initial data, and methods for visualizing Data Mining models.
Introduction
Literally, "telemetry" [1] means measuring at a distance. Modern telemetry covers a wide range of problems concerning the acquisition, transformation, transfer and processing of measurement data, which is used for monitoring objects, determining their condition, and studying physical processes in places where the immediate presence of an observer is difficult or impossible.
According to the State Standard [2], «telemetry is a field of science and technology studying the development of automated systems that provide the acquisition, transformation and transfer of measurement information and information on events through a communication channel, as well as its reception, transformation and registration, in order to monitor the condition and functioning of various technical and biological objects and to study natural phenomena».
Telemetry assets are a powerful tool for studying the world. Monitoring various phenomena, processes and objects, and determining the conditions of their functioning, became possible with telemetry as a basic tool.
Nowadays telemetry equipment is widely applied in meteorology and geophysics, in the natural gas, nuclear and chemical industries, in medicine, and in other fields of the national economy.
Telemetry plays a leading role in the testing of military hardware and armaments, and also in the control of automated technical systems in the course of accomplishing their target goals. The complexity of technical objects leads to the need to monitor a large number of parameters describing the condition and behavior both of separate units and of the object as a whole. Moreover, as engineering develops, ever stricter requirements are imposed on the accuracy and reliability of the data and on the speed of obtaining it. These are the central problems of modern telemetry.
The difficulty of telemetry data is that it is handled at a distance. This implies transferring a large amount of data, which must be analyzed on the spot. Special software is necessary for the registration, handling and monitoring of telemetry data; the main requirements for it are high operational reliability and efficient online handling and monitoring. Thanks to the development of computers, more powerful methods of information processing, such as Data Mining, have become possible. This field has developed rapidly in recent years.
Data Mining
The term Data Mining is often rendered as data extraction, information extraction, intelligent data analysis, a means of searching for regularities, knowledge extraction, pattern analysis, "extracting grains of knowledge from mountains of data", knowledge excavation in databases, data dredging, or data "washing". "Knowledge Discovery in Databases" (KDD) can be considered a synonym of Data Mining [3].
The classical definition of this term was given in 1996 by one of the founders of this field, Piatetsky-Shapiro: Data Mining is the search for and detection by a "machine" (algorithms, artificial intelligence tools) of hidden knowledge in raw data, knowledge that was previously unknown, is nontrivial, practically useful, and accessible to human interpretation [4].
From a great volume of initial data, Data Mining should reveal knowledge possessing the following properties:
 knowledge should be new, previously unknown;
 knowledge should be nontrivial;
 knowledge should be practically useful;
 knowledge should be accessible to human understanding.
In Data Mining there are models that serve to represent the obtained knowledge. The kinds of models depend on the methods of their creation.
When telemetry information is processed with intelligent analysis methods, the problem of visualizing the results becomes especially acute because of the great volume of initial data and the complexity of the Data Mining models obtained during processing. Computer graphics must present data of complex structure in an evident and clear form that allows the operator to make decisions in real time.
Visualization of Data Mining models
Let's consider the basic models that allow a complex analysis of data: the clusterization model, the classifier construction model, the associative rule construction model, and also the analysis and preliminary processing of the initial data. For each of these models, adequate methods of result visualization have been chosen and implemented.
The preliminary analysis of the data. This stage is very important. If necessary, a preliminary transformation of the data can be made by applying various filters, for example, digitization (discretization) of the data. It is necessary for some algorithms, such as associative rule construction, which cannot work with quantitative data. The accuracy of the final results depends on the correct analysis of the initial data and on the choice of an adequate processing model.
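The digitization filter can be sketched as equal-width binning, which maps a quantitative value to one of several qualitative interval labels; the `Discretizer` helper below is illustrative, not part of the described software package.

```java
public class Discretizer {
    // Equal-width binning: maps a quantitative value in [min, max]
    // to one of `bins` interval labels (0 .. bins-1), as needed by
    // algorithms that accept only qualitative attributes.
    static int bin(double value, double min, double max, int bins) {
        if (value <= min) return 0;
        if (value >= max) return bins - 1;
        return (int) ((value - min) / (max - min) * bins);
    }
}
```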
Depending on the type of the initial data, line graphs or histograms are built, since both quantitative and qualitative attributes may be processed. For quantitative attributes, building a sliding median and a sliding average with a chosen window width is supported. The display of dependences between various attributes is also supported; such dependences are shown using line and scatter plots. Special attention was paid to displaying changes of data parameters over time. When polynomial or exponential models of the initial attributes are used, a line graph representation is applied.
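The sliding statistics mentioned above can be sketched as follows; the `SlidingStats` class and its method names are illustrative, not taken from the described package. The median variant is robust to isolated outliers, which makes it useful for noisy telemetry samples.

```java
import java.util.Arrays;

public class SlidingStats {
    // Sliding average over a window of the given (odd) width;
    // at the edges the available shorter window is used.
    static double[] slidingAverage(double[] x, int window) {
        int half = window / 2;
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            int lo = Math.max(0, i - half), hi = Math.min(x.length - 1, i + half);
            double sum = 0;
            for (int j = lo; j <= hi; j++) sum += x[j];
            out[i] = sum / (hi - lo + 1);
        }
        return out;
    }

    // Sliding median: sorts each window and takes its middle element,
    // suppressing isolated spikes that would distort a sliding average.
    static double[] slidingMedian(double[] x, int window) {
        int half = window / 2;
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            int lo = Math.max(0, i - half), hi = Math.min(x.length - 1, i + half);
            double[] w = Arrays.copyOfRange(x, lo, hi + 1);
            Arrays.sort(w);
            int m = w.length / 2;
            out[i] = (w.length % 2 == 1) ? w[m] : (w[m - 1] + w[m]) / 2.0;
        }
        return out;
    }
}
```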
By means of computer graphics, the digitization process becomes simple and fast, and preliminary results can be seen at any moment.
Such methods make it possible to represent the initial data, and the dependences between various data sets, in an evident form.
Classification methods. An analysis often requires determining to which of the known classes the investigated objects belong, i.e. classifying them. In Data Mining, the classification problem is considered as the problem of determining the value of some parameter of the analyzed object on the basis of the values of its other parameters.
Results of classification methods are represented in a structural form and in the form of graphs. Classifiers are constructed on the basis of decision tree construction algorithms (hereafter simply "trees"). The structural representation allows a tree to be analyzed with varying degrees of detail. The graph form of the tree gives the user an opportunity for easy and convenient search and viewing of the data, with the possibility of switching from a detailed analysis to operations on particular nodes of the tree. In this form, tree nodes are represented as ellipses, and leaves as rectangles. This way of visualization is well known to be the most evident of the alternatives, which explains its use in the given software.
Operations of aligning the tree to the center of the window and resizing the tree to the window width have been implemented.
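The graph form of a tree, with internal nodes drawn as ellipses and leaves as rectangles, can be sketched by emitting Graphviz DOT markup; the `TreeDot` class and its node structure below are a hypothetical illustration, not the package's actual data model.

```java
public class TreeDot {
    // A decision-tree node: internal nodes carry a split test,
    // leaves carry a class label.
    static class Node {
        final String label;
        final Node left, right;            // both null for leaves
        Node(String label, Node left, Node right) {
            this.label = label; this.left = left; this.right = right;
        }
        boolean isLeaf() { return left == null && right == null; }
    }

    // Render the tree as Graphviz DOT: internal nodes as ellipses,
    // leaves as rectangles (DOT shape "box").
    static String toDot(Node root) {
        StringBuilder sb = new StringBuilder("digraph tree {\n");
        emit(root, sb, new int[]{0});
        return sb.append("}\n").toString();
    }

    private static int emit(Node n, StringBuilder sb, int[] counter) {
        int id = counter[0]++;
        String shape = n.isLeaf() ? "box" : "ellipse";
        sb.append(String.format("  n%d [label=\"%s\", shape=%s];%n", id, n.label, shape));
        if (!n.isLeaf()) {
            sb.append(String.format("  n%d -> n%d;%n", id, emit(n.left, sb, counter)));
            sb.append(String.format("  n%d -> n%d;%n", id, emit(n.right, sb, counter)));
        }
        return id;
    }
}
```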
Associative rule construction methods. The search for associative rules is one of the most popular applications of Data Mining. Its essence consists in finding frequently occurring sets of objects within a large collection of such sets. The association problem is a special case of the classification problem.
Associative rule construction methods can be applied only to qualitative attributes. Their results are conveniently displayed in three-dimensional space: the OX and OY axes carry the values of the variables in the left and right parts of a rule, respectively, and the OZ axis carries the reliability of the rule. The support of a rule is displayed by means of a color scale.
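The two quantities plotted for each rule, its reliability (confidence, on the OZ axis) and its support (encoded by color), can be computed as sketched below; the `RuleStats` class and the transaction representation are illustrative assumptions.

```java
import java.util.List;
import java.util.Set;

public class RuleStats {
    // Support of the rule X -> Y: the fraction of transactions
    // containing both X and Y.
    static double support(List<Set<String>> txs, Set<String> x, Set<String> y) {
        long hits = txs.stream()
                       .filter(t -> t.containsAll(x) && t.containsAll(y))
                       .count();
        return (double) hits / txs.size();
    }

    // Reliability (confidence) of X -> Y: among transactions
    // containing X, the fraction that also contain Y.
    static double confidence(List<Set<String>> txs, Set<String> x, Set<String> y) {
        long xHits = txs.stream().filter(t -> t.containsAll(x)).count();
        long xyHits = txs.stream()
                         .filter(t -> t.containsAll(x) && t.containsAll(y))
                         .count();
        return xHits == 0 ? 0 : (double) xyHits / xHits;
    }
}
```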
Clusterization methods. The clusterization problem consists in dividing the investigated set of objects into groups of "similar" objects, called clusters. The word cluster [4] translates as a clot, a bunch, a group. Related concepts used in the literature are a class and a condensation. Clusterization can be applied in any branch where the study of experimental or statistical data is necessary.
Four different ways of visualizing cluster-analysis results have been implemented, which considerably facilitates the expert's work when analyzing the revealed groups of similar objects. The distribution of the initial data into clusters can be analyzed using scatter plots: the values of the chosen attributes are plotted on the axes, and the membership of objects in different clusters is shown by different colors.
Effective ways of reducing the initial attribute space are the principal component and multidimensional scaling algorithms. They allow the distribution of the initial data into clusters to be presented in the space of the two main components, which promotes a visual estimation of the grouping of the investigated data and of the found clusters.
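The projection onto the two main (principal) components can be sketched with a covariance matrix and power iteration; the `Pca2D` class below is an illustrative simplified stand-in, under assumed names, for whatever implementation the package actually uses.

```java
import java.util.Arrays;

public class Pca2D {
    // Project n x d data onto its first two principal components
    // using power iteration on the sample covariance matrix.
    static double[][] project(double[][] x) {
        int n = x.length, d = x[0].length;
        double[] mean = new double[d];
        for (double[] row : x)
            for (int j = 0; j < d; j++) mean[j] += row[j] / n;
        double[][] c = new double[n][d];          // centred data
        for (int i = 0; i < n; i++)
            for (int j = 0; j < d; j++) c[i][j] = x[i][j] - mean[j];
        double[][] cov = new double[d][d];        // sample covariance
        for (int j = 0; j < d; j++)
            for (int k = 0; k < d; k++) {
                for (int i = 0; i < n; i++) cov[j][k] += c[i][j] * c[i][k];
                cov[j][k] /= (n - 1);
            }
        double[][] comp = new double[2][];
        for (int m = 0; m < 2; m++) {             // top two eigenvectors
            comp[m] = powerIteration(cov);
            deflate(cov, comp[m]);
        }
        double[][] proj = new double[n][2];       // coordinates in PC space
        for (int i = 0; i < n; i++)
            for (int m = 0; m < 2; m++)
                for (int j = 0; j < d; j++) proj[i][m] += c[i][j] * comp[m][j];
        return proj;
    }

    // Repeatedly apply the matrix and renormalise: converges to the
    // eigenvector of the largest eigenvalue.
    static double[] powerIteration(double[][] a) {
        int d = a.length;
        double[] v = new double[d];
        Arrays.fill(v, 1.0 / Math.sqrt(d));
        for (int iter = 0; iter < 200; iter++) {
            double[] w = new double[d];
            for (int j = 0; j < d; j++)
                for (int k = 0; k < d; k++) w[j] += a[j][k] * v[k];
            double norm = 0;
            for (double t : w) norm += t * t;
            norm = Math.sqrt(norm);
            if (norm == 0) break;
            for (int j = 0; j < d; j++) v[j] = w[j] / norm;
        }
        return v;
    }

    // Subtract the found component so the next power iteration
    // converges to the next eigenvector.
    static void deflate(double[][] a, double[] v) {
        int d = a.length;
        double[] av = new double[d];
        for (int j = 0; j < d; j++)
            for (int k = 0; k < d; k++) av[j] += a[j][k] * v[k];
        double lambda = 0;                        // Rayleigh quotient
        for (int j = 0; j < d; j++) lambda += v[j] * av[j];
        for (int j = 0; j < d; j++)
            for (int k = 0; k < d; k++) a[j][k] -= lambda * v[j] * v[k];
    }
}
```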
Histograms are applied to display the probability distribution of the obtained results within clusters.
Thus, we are once again convinced of the convenience and necessity of visualizing the processes of working with data.
Implementation of the visualization methods
The methods of visualization listed above have been implemented in the telemetry information processing software package.
Before implementing the visualization, a search for graphic libraries was carried out. The following libraries were considered: JSci, JFC, and JFreeChart.
The first to expose its shortcomings was the JSci library: serious limitations, for instance a color palette restricted to 8 colors, made it unsuitable for visualizing large data volumes.
The next graphic library taken into consideration was the standard Java library JFC (Java Foundation Classes). It is simple to use and universal, but in image quality it yields to the JFreeChart library.
After consideration of the graphic libraries providing 2D resources, the JFreeChart library [5] was chosen as the most capable. Its advantages include high image quality and the ability to scale images and save them to PNG files. The main advantage of this library is the wide variety of chart types, which allows the set of data visualization methods to be expanded. The library is not strictly tied to any system type, which allows it to be used under various operating systems. This feature makes the software product cross-platform, a great advantage nowadays.
Choosing among the 3D libraries is not that easy, the main problem being the huge variety of available libraries. After an analysis of their merits and demerits, OpenGL was chosen over DirectX because of the cross-platform capabilities of its libraries. As a result of a search for OpenGL-compatible libraries, two main libraries were selected: LWJGL and JOGL.
LWJGL [6] is a serious library whose main profile is game creation. For constructing 3D graphs, the JOGL (Java OpenGL [7]) library is a better fit: it is much easier to use, more economical in resources, and still provides all the necessary means for 3D development. The library is cross-platform, which allows visualization projects to be developed on many modern operating systems.
This direction is promising and actively developing. Already today, many methods are more evident when shown in three-dimensional space; for example, associative rule construction methods are displayed more evidently this way.
Conclusion
The wide introduction of computer graphics for visualizing the data of various processing and forecasting projects makes it possible to present the results of the methods and algorithms used to solve a problem quickly, evidently and at minimal cost. Graphs become not only a way of visualizing results, but also a full-fledged tool for working with them.
References
1. Nazarov A.V., Kozyrev G.I., Shitov I.V., Obruchenkov V.P., Drevin A.V. Modern telemetry in theory and practice, 2007, pp. 22-24. (in Russian)
2. GOST 19619-74. Radiotelemetry equipment. Terms and definitions. (in Russian)
3. Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy. Advances in Knowledge Discovery and Data Mining (Chapter 1), AAAI/MIT Press, 1996.
4. Barsegjan A.A., Kuprijanov M.S., Stepanenko V.V., Kholod I.I. Methods and models of data analysis: OLAP and Data Mining, 2004, p. 67. (in Russian)
5. http://www.jfree.org/jfreechart/
6. http://www.lwjgl.org/
7. https://jogl.dev.java.net/