VISUALIZATION METHODS FOR DATA MINING MODELS IN TELEMETRY DATA PROCESSING
A.O. Deripaska1, V.V. Geppener1
1Science and Engineering Centre of the St. Petersburg State Electrotechnical University, 197376, Russia, St. Petersburg, Professor Popov street 5, (812) 346-08-67, [email protected]

This report considers the features of working with telemetry measurements, the applicability of Data Mining to the processing and analysis of raw data, and methods for visualizing Data Mining models.

Introduction
Literally, "telemetry" [1] means measuring at a distance. Modern telemetry covers a wide range of problems concerning the acquisition, transformation, transmission and processing of measurement data, which is used for monitoring objects, determining their condition, and studying physical processes in places where the immediate presence of an observer is difficult or impossible. According to the State Standard [2], "telemetry is a field of science and technology studying the development of automated equipment complexes that provide the acquisition, transformation, transmission over a communication channel, reception, transformation and registration of measurement information and event information, in order to monitor the condition and functioning of technical and biological systems of various objects and to study natural phenomena". Telemetry is a powerful tool for studying the world: monitoring various phenomena, processes and objects, and determining the conditions of their functioning, became possible with telemetry as the basic tool. Today telemetry equipment is widely applied in meteorology and geophysics, in the natural gas, nuclear and chemical industries, in medicine and in other fields of the national economy. Telemetry plays a leading role in the testing of military hardware and armament, and in the control of automated technical systems in the course of reaching their target goals.
The complexity of technical objects leads to the need to monitor a large number of parameters describing the condition and behavior both of individual units and of the object as a whole. As engineering develops, ever stricter requirements are imposed on the accuracy and reliability of the data and on the speed of obtaining it; these are the central problems of modern telemetry. A further difficulty of telemetry data is that it is handled remotely: large volumes of data must be transferred and then analyzed on the spot. Special software is required for registering, processing and monitoring telemetry data; the main requirements on this software are high operational reliability, efficient processing, and on-line monitoring. The growth of computing power has made more powerful information-processing methods, such as Data Mining, practical; this field has developed rapidly in recent years.

Data Mining
The term Data Mining is variously rendered as data production, information extraction, intellectual data analysis, a means of searching for regularities, knowledge extraction, pattern analysis, "extracting grains of knowledge from mountains of data", knowledge excavation in databases, or data "washing". Knowledge Discovery in Databases (KDD) can be considered a synonym of Data Mining [3]. A classical definition of the term was given in 1996 by one of the founders of the field, Piatetsky-Shapiro: Data Mining is the machine-driven (by algorithms and artificial-intelligence means) discovery in raw data of hidden knowledge that was previously unknown, nontrivial, practically useful, and interpretable by a person [4].
From a great volume of initial data, Data Mining should reveal knowledge with the following properties: the knowledge should be new, previously unknown; nontrivial; practically useful; and accessible to human understanding. In Data Mining, the obtained knowledge is represented by models, whose kind depends on the method used to create them. When telemetry data is processed with intellectual-analysis methods, the problem of visualizing the results becomes especially acute because of the large volume of initial data and the complexity of the Data Mining models obtained during processing. The computer must present data of complex structure graphically, in a clear and evident form that allows the operator to make decisions in real time.

Visualization of Data Mining models
Let us consider the basic models that allow a complex analysis of the data: the clustering model, the classifier-construction model, and the associative-rule-construction model, together with the analysis and preliminary processing of the initial data. For each of the models, adequate methods of visualizing the results have been chosen and implemented.

Preliminary analysis of the data. This stage is very important. If necessary, the data can be preliminarily transformed with various filters, for example discretization, which is required by some algorithms, such as associative-rule construction, that cannot work with quantitative data. The accuracy of the final results depends on a correct analysis of the initial data and on the choice of an adequate processing model. Depending on the type of the initial data, line graphs or histograms are built, since both quantitative and qualitative attributes may be processed. For quantitative attributes, building a sliding median and a sliding average with a set window is supported.
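The sliding statistics mentioned above can be sketched as follows. This is a minimal illustration, not code from the telemetry package described in the paper; the class and method names are ours.

```java
import java.util.Arrays;

// Sliding-window mean and median over a quantitative attribute. For each
// position i, the statistic covers the last `window` samples ending at i;
// positions with an incomplete window are left as NaN.
public class SlidingStats {

    // Sliding average with a running sum, O(n) overall.
    public static double[] slidingMean(double[] data, int window) {
        double[] out = new double[data.length];
        Arrays.fill(out, Double.NaN);
        double sum = 0.0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
            if (i >= window) sum -= data[i - window];
            if (i >= window - 1) out[i] = sum / window;
        }
        return out;
    }

    // Sliding median by sorting a copy of each window (fine for small windows).
    public static double[] slidingMedian(double[] data, int window) {
        double[] out = new double[data.length];
        Arrays.fill(out, Double.NaN);
        for (int i = window - 1; i < data.length; i++) {
            double[] w = Arrays.copyOfRange(data, i - window + 1, i + 1);
            Arrays.sort(w);
            out[i] = (window % 2 == 1)
                    ? w[window / 2]
                    : (w[window / 2 - 1] + w[window / 2]) / 2.0;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] signal = {1, 2, 9, 4, 5};   // 9 is an outlier spike
        System.out.println(Arrays.toString(slidingMean(signal, 3)));
        System.out.println(Arrays.toString(slidingMedian(signal, 3)));
    }
}
```

Note the practical difference that motivates offering both: the sliding median suppresses the outlier spike (it yields 2.0 at the spike position), while the sliding average is pulled up by it (4.0 at the same position).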
The display of dependences between various attributes is also supported; such dependences are shown with line and dot graphs. Special attention was paid to displaying changes of data parameters in time. When polynomial or exponential models of the initial attributes are used, a line-graph representation is applied. Computer graphics make the discretization process simple and fast, and preliminary results can be viewed at any moment. Such methods allow the initial data and the dependences between various data sets to be represented clearly.

Classification methods. The analysis often requires determining which of the known classes the investigated objects belong to, i.e. classifying them. In Data Mining, the classification problem is treated as determining the value of some parameter of the analyzed object from the values of its other parameters. The results of classification methods are represented structurally and in the form of graphs. Classifiers are built with decision-tree construction algorithms (below, simply "trees"). The structural representation allows a tree to be analyzed at various levels of detail. The graph form of the tree gives the user easy and convenient search and viewing of data, and the possibility of moving from a detailed analysis to operations on specific tree nodes. In this form, tree nodes are drawn as ellipses and leaves as rectangles. This is commonly regarded as the most evident way of drawing a tree, which explains its use in the program. Operations of centering a tree in the window and scaling a tree to the window width are implemented.

Associative-rule construction methods. The search for associative rules is one of the most popular applications of Data Mining. Its essence is finding frequently co-occurring sets of objects within a large collection of such sets.
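The two measures that an associative rule is judged by, and that the visualization below plots, are support and confidence (reliability). A minimal sketch of how they are computed over a set of transactions, with illustrative names and data of our own:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Support and confidence of associative rules over qualitative attributes.
public class RuleMetrics {

    // Support of an item set: the fraction of transactions containing all of it.
    public static double support(List<Set<String>> transactions, Set<String> items) {
        long hits = transactions.stream().filter(t -> t.containsAll(items)).count();
        return (double) hits / transactions.size();
    }

    // Confidence of the rule left -> right: support(left and right) / support(left),
    // i.e. how often the right part holds when the left part does.
    public static double confidence(List<Set<String>> transactions,
                                    Set<String> left, Set<String> right) {
        Set<String> both = new HashSet<>(left);
        both.addAll(right);
        return support(transactions, both) / support(transactions, left);
    }

    public static void main(String[] args) {
        List<Set<String>> tx = List.of(
                Set.of("A", "B"), Set.of("A", "B", "C"),
                Set.of("A", "C"), Set.of("B", "C"));
        // Rule A -> B: A and B co-occur in 2 of 4 transactions (support 0.5),
        // and B accompanies A in 2 of the 3 transactions containing A.
        System.out.println(support(tx, Set.of("A", "B")));            // 0.5
        System.out.println(confidence(tx, Set.of("A"), Set.of("B"))); // 0.666...
    }
}
```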
The association problem is a special case of the classification problem. Associative-rule construction methods apply only to qualitative attributes. Associative rules are conveniently displayed in three-dimensional space: the OX and OY axes hold the values of the variables in the left and right parts of a rule respectively, the OZ axis holds the confidence (reliability) of the rule, and the support of the rule is shown with a color scale.

Clustering methods. The clustering problem consists in dividing the investigated set of objects into groups of "similar" objects, called clusters. The word cluster [4] translates as a clot, a bunch, a group; related terms used in the literature are class and condensation. Clustering can be applied in any field where experimental or statistical data must be studied. Four different ways of visualizing cluster-analysis results have been implemented, which considerably eases the expert's work when analyzing the revealed groups of similar objects. The distribution of the initial data into clusters can be analyzed with dot graphs: the chosen attribute values are placed on the graph axes, and the membership of objects in different clusters is shown with different colors. Effective ways of reducing the initial attribute space are the principal-component and multidimensional-scaling algorithms; they allow the distribution of the initial data into clusters to be shown in the space of the two main components, which aids a visual assessment of the grouping of the investigated data and of the clusters found. Histograms are used to display the probability distribution of the obtained results within clusters. Thus, we were once again convinced of the convenience and necessity of visualizing the processes of working with data.
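The colored dot graph described above needs one label per object, mapping each object to the cluster whose color it is drawn with. A minimal sketch of that assignment step (nearest centroid by Euclidean distance), with illustrative centers and points of our own:

```java
import java.util.Arrays;

// Assigns each object to the nearest cluster center; the resulting label
// indexes the color used for that point on the dot graph.
public class ClusterColoring {

    // Index of the centroid closest (squared Euclidean distance) to point p.
    public static int nearest(double[][] centers, double[] p) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int k = 0; k < centers.length; k++) {
            double d = 0;
            for (int j = 0; j < p.length; j++) {
                double diff = centers[k][j] - p[j];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = k; }
        }
        return best;
    }

    // One label per object; the plot maps label -> color.
    public static int[] labels(double[][] centers, double[][] points) {
        int[] out = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            out[i] = nearest(centers, points[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] centers = {{0, 0}, {10, 10}};
        double[][] points = {{1, 1}, {9, 8}, {0, 2}, {11, 10}};
        System.out.println(Arrays.toString(labels(centers, points))); // [0, 1, 0, 1]
    }
}
```

The same two-column point array is what a principal-component or multidimensional-scaling projection would feed to the plot when the original attribute space has more than two dimensions.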
Implementation of the visualization methods
The visualization methods described above have been implemented in the telemetry-data processing software package. Before implementation, a survey of graphic libraries was carried out; the JSci, JFC and JFreeChart libraries were considered. The JSci library was the first to expose its shortcomings: serious limitations, for instance a color palette of only 8 colors, made it unsuitable for visualizing large data volumes. The next library considered, the standard Java library JFC (Java Foundation Classes), is simple to use and universal, but yields to JFreeChart in image quality. After consideration of the 2D graphic libraries, the JFreeChart library [5] was chosen as the most capable. Its advantages include high image quality and the ability to scale images and save them to PNG files; its main advantage is a wide variety of chart types, which allows the set of data-visualization methods to be expanded. The library is not tied to a particular system type and can therefore be used on various operating systems; this makes the software product cross-platform, a great advantage nowadays. Choosing a 3D library is not as easy, the main problem being the huge variety of available libraries. After an analysis of merits and demerits, OpenGL was chosen; the reason for preferring OpenGL over DirectX is its cross-platform capability. The search for OpenGL-compatible libraries yielded two main candidates: LWJGL and JOGL. LWJGL [6] is a serious library whose profile is game creation. For building 3D graphs, the JOGL (Java OpenGL) [7] library suits better: it is much easier to use, more economical in resources, and still provides all the necessary means for 3D development.
The library is cross-platform, which allows visualization projects to be developed on many modern operating systems. This direction is promising and actively developing: already today many methods are clearer when shown in three-dimensional space; for example, associative-rule construction methods are displayed more vividly there.

Conclusion
The wide introduction of computer graphics for data visualization in various processing and forecasting projects allows the solutions obtained with various methods and algorithms to be presented quickly, clearly and at the least cost. Graphs become not only a way of visualizing results, but also a full-fledged tool for working with them.

References
1. Nazarov A.V., Kozyrev G.I., Shitov I.V., Obruchenkov V.P., Drevin A.V. Modern telemetry in theory and practice, 2007, pp. 22-24 (in Russian).
2. GOST 19619-74. Radiotelemetry equipment. Terms and definitions (in Russian).
3. Fayyad U., Piatetsky-Shapiro G., Smyth P., Uthurusamy R. Advances in Knowledge Discovery and Data Mining, Chapter 1, AAAI/MIT Press, 1996.
4. Barsegjan A.A., Kuprijanov M.S., Stepanenko V.V., Kholod I.I. Methods and models of data analysis: OLAP and Data Mining, 2004, p. 67 (in Russian).
5. http://www.jfree.org/jfreechart/
6. http://www.lwjgl.org/
7. https://jogl.dev.java.net/