New Cybercrime Taxonomy of Visualization of Data Mining Process

M. Babič*, B. Jerman-Blažič*
* Laboratory for Open Systems and Networks, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
[email protected]

Abstract - Data mining is the process of identifying new patterns and insights in data and of discovering knowledge from it. It lies at the intersection of several research areas, including machine learning, statistics, pattern recognition, databases, and visualization. With the maturity of databases and constant improvements in computational speed, data mining algorithms that were once too expensive to execute are now within reach. Data visualization is a general term for any effort to help people understand the significance of data by placing it in a visual context. Patterns that might go undetected in text-based data can be exposed and recognized more easily with data visualization software. Exploring and analyzing vast volumes of data is becoming increasingly difficult, and information visualization and visual data mining can help to deal with this flood of information. A large number of information visualization techniques have been developed over the last decade to support the exploration of large data sets. In this paper, we propose a classification of information visualization and visual data mining techniques. Fractals and graph theory are widely used in many areas. We present a new method for estimating the fractal dimension of a network and a new taxonomy of the visualization of the data mining process, applied to cybercrime activity.

Keywords: visual data mining, network, modeling, fractal geometry, taxonomy, cybercrime

1. Introduction

Progress in digital data acquisition and storage technology has resulted in the growth of huge databases. This has occurred in all areas of human endeavor, from the mundane to the more exotic.
Little wonder, then, that interest has grown in the possibility of tapping these data and extracting from them information that might be of value to the owner of the database. The discipline concerned with this task has become known as data mining. The fundamental algorithms of data mining and analysis form the basis of the emerging field of data science, which includes automated methods for analyzing patterns and models in all kinds of data, with applications ranging from scientific discovery to business intelligence and analytics. Data mining is the process of discovering insightful, interesting, and novel patterns, as well as descriptive, understandable, and predictive models, from large-scale data.

Data mining is an interdisciplinary exercise. Statistics, database technology, machine learning, pattern recognition, artificial intelligence, and visualization all play a role. And just as it is difficult to define sharp boundaries between these disciplines, so it is difficult to define sharp boundaries between each of them and data mining. At the boundaries, one person's data mining is another's statistics, database, or machine learning problem.

MIPRO 2016/DC VIS

Statistical techniques alone may not be sufficient to address some of the more challenging issues in data mining, especially those arising from massive data sets. Nonetheless, statistics plays a very important role in data mining: it is a necessary component of any data mining enterprise. In this section we discuss some of the interplay between traditional statistics and data mining. With large data sets (and particularly with very large data sets) we may simply not know even straightforward facts about the data. In summary, while data mining overlaps considerably with the standard exploratory data analysis techniques of statistics, it also runs into new problems, many of which are consequences of the size and the non-traditional nature of the data sets involved.
Pattern recognition aims to make the process of learning and detecting patterns explicit, so that it can be partially or entirely implemented on computers. Automatic (machine) recognition, description, and classification (the grouping of patterns into pattern classes) have become important problems in a variety of engineering and scientific disciplines such as biology, psychology, medicine, marketing, computer vision, artificial intelligence, and remote sensing. In almost any area of science in which observations are studied but the underlying mathematical or statistical models are not available, pattern recognition can be used to support human concept acquisition or decision making. Given a group of objects, there are two ways to build a classification or recognition system (Watanabe 1985): supervised, i.e., with a teacher, or unsupervised, i.e., without the help of a teacher. Interest in pattern recognition has been renewed recently by emerging applications that are not only challenging but also computationally more demanding, such as data mining, document classification, the organization and retrieval of multimedia databases, and applications for preventing cybercrime such as biometric authentication (e.g., face recognition and fingerprint matching).

Data visualization is a relatively new and promising field in computer science. It uses computer graphics to reveal the patterns, trends, and relationships in datasets. Humans have a long history of basic data visualization, and it remains an active topic today. The history of visualization was shaped to some extent by the available technology and by the pressing needs of the time. Early forms include primitive paintings on clay, maps on walls, photographs, and tables of numbers (with the concepts of rows and columns); these are all kinds of data visualization, although they were not called by that name at the time.
Visualization is the graphical presentation of information, with the goal of providing the viewer with a qualitative understanding of the information content. It is also the process of transforming objects, concepts, and numbers into a form that is visible to the human eye. By “information” we may mean data, processes, relations, or concepts.

In computer and network science, network theory [5] is the study of graphs as a representation of either symmetric relations or, more generally, asymmetric relations between discrete objects. Network theory is a part of graph theory. It has applications in many disciplines, including statistical physics, particle physics, computer science, electrical engineering, biology, economics, operations research, climatology and sociology. Applications of network theory include logistical networks, the World Wide Web, the Internet, gene regulatory networks, metabolic networks, social networks and epistemological networks.

Fig. 1: Data mining presented as an interdisciplinary discipline encompassing a blend of statistics, artificial intelligence, and management science and information systems for pattern recognition, mathematical modeling, and database activities

In this article we present a new cybercrime incident taxonomy of the visualization of the data mining process.

2. Experimental method

We use data mining as an interdisciplinary discipline encompassing a blend of statistics, artificial intelligence, and management science and information systems for pattern recognition, mathematical modeling, and database activities. We then use network theory to present a taxonomy of cyber incidents. Fig. 2 presents a cybercrime incident.

Fig. 2: A cybercrime incident, which includes attackers, attack(s) and an objective

A dendrogram is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering.
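
To make the idea concrete, the following minimal Python sketch builds the merge history that a dendrogram draws. It is an illustration only and not the program used in this paper: the function name `agglomerate`, the single-linkage rule, and the one-dimensional toy data are all assumptions chosen for brevity.

```python
from itertools import combinations


def agglomerate(points):
    """Single-linkage agglomerative clustering on 1-D points (illustrative).

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of point indices.
    """
    # Start with each object in its own cluster.
    clusters = [(i,) for i in range(len(points))]
    merges = []

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(abs(points[i] - points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Pick the two most similar (closest) clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        # Fuse them into one new cluster; fused objects stay together.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical toy data, not taken from the paper.
toy_points = [1.0, 1.5, 5.0, 5.25, 9.0]
history = agglomerate(toy_points)
for cluster_a, cluster_b, distance in history:
    print(cluster_a, cluster_b, distance)
```

Each tuple in `history` corresponds to one fusion in the dendrogram, and the recorded distance is the position at which the two branches join.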
Dendrograms are often used in computational biology to illustrate the clustering of genes or samples. Agglomerative hierarchical clustering algorithms build a cluster hierarchy that is commonly displayed as a tree diagram called a dendrogram. They begin with each object in a separate cluster. At each step, the two clusters that are most similar are joined into a single new cluster. Once fused, objects are never separated. The horizontal axis of the dendrogram represents the distance or dissimilarity between clusters. The vertical axis represents the objects and clusters. The dendrogram is fairly simple to interpret. Remember that our main interest is in similarity and clustering. Each joining (fusion) of two clusters is represented on the graph by the splitting of a horizontal line into two horizontal lines. The horizontal position of the split, shown by the short vertical bar, gives the distance (dissimilarity) between the two clusters. In a dendrogram with three clusters, for example, the clusters appear as three branches that occur at about the same horizontal distance. Fig. 3 shows a dendrogram presented as a network.

Fig. 3: Dendrogram presented as network

For the network we calculated one topological property, density. “Network density” describes the portion of the potential connections in a network that are actual connections. A “potential connection” is a connection that could potentially exist between two “nodes”, regardless of whether or not it actually does. The density D of a network with N nodes and E edges is defined as the ratio of the number of edges to the number of possible edges, N(N - 1)/2, giving D = 2E / (N(N - 1)). For example, a network with 4 nodes and 3 edges has D = (2 × 3) / (4 × 3) = 0.5.

3. Results and discussion

Hierarchical clustering is one method for finding community structures in a network. Fig. 4 presents the computer and network cyber incident taxonomy. An incident includes attackers, tools, vulnerability, action, target, unauthorized result and objective. Fig. 5 presents the same taxonomy as a network. For this network we calculated the topological property density; the density of the network is 0.03570248.

In this paper, we presented a systemic taxonomy of cyber incidents and its analysis as a network. Our taxonomy extends existing taxonomies of cybercrime. A dendrogram of a cyber incident was constructed using graph theory. We described the cyber incident by a topological property of the network, its density. Density represents the complexity of the network.

4. Conclusion

This paper has contributed to the conceptual and empirical understanding of global cyber wars and crimes. Different motivations of hackers, source characteristics and target country characteristics lead to different likelihoods of attacks on different organizations. Timely reporting of cyber attacks to the authorities is thus likely to strengthen the rule of law and help combat cyber threats in the long run. Cooperation and collaboration among national governments, computer crime authorities and businesses are critical to combating cyber attacks. If national governments work with one another as well as with business communities to modify institutions by defining appropriate policies for the security of the digital world, the result will be lower transaction costs. We describe the computer and network cyber incident taxonomy by the topological property density.

Fig. 4: Computer and network cyber incident taxonomy

Fig. 5: Computer and network cyber incident taxonomy presented as network

5. Literature

[1] Shukla, D., Verma, Kapil, Dubey, Jayant and Gangele, Sharad (2012c): Cyber Crime Based Curve Fitting Analysis in Internet Traffic Sharing in Computer Network, International Journal of Computer Application (IJCA), Vol. 46(22), pp. 41-51.
[2] Gangele, Sharad, Verma, Kapil and Shukla, D. (2014): Bounded Area Estimation of Internet Traffic Share Curve, International Journal of Computer Science and Business Informatics (IJCSBI), Vol. 10, No. 1, pp. 54-67.
[3] Shukla, D., Verma, Kapil and Gangele, Sharad (2012a): Iso-Failure in Web Browsing using Markov Chain Model and Curve Fitting Analysis, International Journal of Modern Engineering Research (IJMER), Vol. 02, Issue 02, pp. 512-517.
[4] Shukla, D., Verma, Kapil and Gangele, Sharad (2012b): Least Square Curve Fitting Applications under Rest State Environment in Internet Traffic Sharing in Computer Network, International Journal of Computer Science and Telecommunications (IJCST), Vol. 03, Issue 05, pp. 43-51.
[5] Swamy, M.N.S. and Thulasiraman, K.: Graphs, Networks, and Algorithms. Wiley (1981).
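
As a supplementary illustration of the density measure used in Sections 2 and 3, the following minimal Python sketch computes the density of a small undirected network. It reads the definition as D = 2E / (N(N - 1)), with N nodes and E edges, which is our interpretation of the formula in Section 2. The function name `network_density` and the toy edge list are hypothetical and are not the data behind Fig. 5.

```python
def network_density(num_nodes, edges):
    """Density of a simple undirected network: D = 2E / (N (N - 1))."""
    if num_nodes < 2:
        raise ValueError("density needs at least two nodes")
    # Count each undirected edge once; ignore self-loops and duplicates.
    unique_edges = {frozenset(e) for e in edges if e[0] != e[1]}
    return 2 * len(unique_edges) / (num_nodes * (num_nodes - 1))


# Hypothetical toy taxonomy drawn as a network: one root node linked to
# the three parts of an incident named in Fig. 2.
toy_edges = [
    ("incident", "attackers"),
    ("incident", "attack"),
    ("incident", "objective"),
]
toy_nodes = {node for edge in toy_edges for node in edge}
print(network_density(len(toy_nodes), toy_edges))  # 2*3 / (4*3) = 0.5
```

Any tree with N nodes has N - 1 edges, so its density under this definition is 2/N. Taxonomies drawn as tree-like networks therefore become sparser as they grow.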