New Cybercrime Taxonomy of Visualization of Data Mining Process
M. Babič*, B. Jerman-Blažič*
* Laboratory for Open Systems and Networks, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
Abstract - Data mining is the process of discovering new patterns, insights, and knowledge in data. It lies at the intersection of multiple research areas, including machine learning, statistics, pattern recognition, databases, and visualization. With the maturity of databases and constant improvements in computational speed, data mining algorithms that were once too expensive to execute are now within reach. Data visualization is a general term that describes any effort to help people understand the significance of data by placing it in a visual context. Patterns that might go undetected in text-based data can be exposed and recognized more easily with data visualization software. As data volumes grow, exploring and analyzing them becomes increasingly difficult; information visualization and visual data mining can help to deal with this flood of information. A large number of information visualization techniques have been developed over the last decade to support the exploration of large data sets. In this paper, we propose a classification of information visualization and visual data mining techniques. Fractals and graph theory are widely used in many areas. We present a new method for estimating the fractal dimension of a network and a new taxonomy of the visualization of the data mining process, applied to cybercrime activity.
Keywords: visual data mining, network, modeling, fractal geometry, taxonomy, cybercrime
1. Introduction
Progress in digital data acquisition and storage
technology has resulted in the growth of huge databases.
This has occurred in all areas of human endeavor, from
the mundane to the more exotic. Little wonder, then, that
interest has grown in the possibility of tapping these data,
of extracting from them information that might be of
value to the owner of the database. The discipline
concerned with this task has become known as data
mining. The fundamental algorithms in data mining and
analysis form the basis for the emerging field of data
science, which includes automated methods to analyze
patterns and models for all kinds of data, with
applications ranging from scientific discovery to business
intelligence and analytics. Data mining is the process of discovering insightful, interesting, and novel patterns, as well as descriptive, understandable, and predictive models, from large-scale data. Data mining is an interdisciplinary exercise: statistics, database technology, machine learning, pattern recognition, artificial intelligence, and visualization all play a role.
MIPRO 2016/DC VIS
Just as it is difficult to define sharp boundaries between these disciplines, so it is difficult to define sharp boundaries between each of them and data mining. At the boundaries, one person's data mining is another's statistics, database, or machine learning problem.
Statistical techniques alone may not be sufficient to address some of the more challenging issues in data mining, especially those arising from massive data sets. Nonetheless, statistics plays a very important role in data mining: it is a necessary component of any data mining enterprise. Two aspects of the interplay between traditional statistics and data mining stand out. First, with large data sets (and particularly with very large data sets) we may simply not know even straightforward facts about the data. Second, while data mining overlaps considerably with the standard exploratory data analysis techniques of statistics, it also runs into new problems, many of which are consequences of the size and the non-traditional nature of the data sets involved.
Pattern recognition aims to make the process of learning and detecting patterns explicit, so that it can be partially or entirely implemented on computers. Automatic (machine) recognition, description, and classification (grouping of patterns into pattern classes) have become important problems in a variety of engineering and scientific disciplines such as biology, psychology, medicine, marketing, computer vision, artificial intelligence, and remote sensing. In almost any area of science in which observations are studied but the underlying mathematical or statistical models are not available, pattern recognition can be used to support human concept acquisition or decision making. Given a group of objects, there are two ways to build a classification or recognition system (Watanabe 1985): supervised, i.e., with a teacher, or unsupervised, without the help of a teacher. Interest in pattern recognition has been renewed recently due to emerging applications which are not only challenging but also computationally more demanding, such as data mining, document classification, organization and retrieval of multimedia databases, and applications for preventing cybercrime such as biometric authentication (e.g., face recognition and fingerprint matching).
Data visualization is a relatively new and promising field in computer science. It uses computer graphics to reveal patterns, trends, and relationships in data sets. Humans have a long history with basic data visualization, and data visualization is still a hot topic today. The history of visualization was shaped to some extent by the available technology and by the pressing needs of the time. Primitive paintings on clay, maps on walls, photographs, and tables of numbers arranged in rows and columns are all kinds of data visualization, although they were not called by that name at the time. Visualization is the graphical presentation of information, with the goal of providing the viewer with a qualitative understanding of the information content. It is also the process of transforming objects, concepts, and numbers into a form that is visible to the human eye. When we say “information”, we may refer to data, processes, relations, or concepts.
In computer and network science, network theory [5] is
the study of graphs as a representation of either
symmetric relations or, more generally, of asymmetric
relations between discrete objects. Network theory is a
part of graph theory. It has applications in many
disciplines including statistical physics, particle physics,
computer science, electrical engineering, biology,
economics, operations research, climatology and
sociology. Applications of network theory include
logistical networks, the World Wide Web, Internet, gene
regulatory networks, metabolic networks, social networks
and epistemological networks.
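The distinction between symmetric and asymmetric relations corresponds to undirected and directed graphs. The minimal Python sketch below illustrates this; the function name `build_adjacency` and the node labels are hypothetical and chosen only for illustration. A symmetric relation is stored in both directions, an asymmetric one in a single direction:

```python
from collections import defaultdict


def build_adjacency(edges, directed=False):
    """Build a node -> neighbours map from (u, v) pairs (hypothetical helper).

    directed=False models a symmetric relation: every pair is stored both ways.
    directed=True models an asymmetric relation: only u -> v is stored.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        # Make sure v appears as a node even if it has no outgoing edges.
        adj.setdefault(v, set())
        if not directed:
            adj[v].add(u)
    return dict(adj)
```

For the edge list `[("A", "B"), ("B", "C")]`, the undirected call gives `{"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}`, while `directed=True` gives `{"A": {"B"}, "B": {"C"}, "C": set()}`.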
Fig. 1: Data mining presented as an interdisciplinary discipline encompassing a blend of statistical, artificial intelligence, and management science & information systems disciplines for pattern recognition, mathematical modeling, and database activities
In this article we present a new cybercrime incident taxonomy of the visualization of the data mining process.
2. Experimental method
We use data mining as an interdisciplinary discipline encompassing a blend of statistical, artificial intelligence, and management science & information systems disciplines for pattern recognition, mathematical modeling, and database activities. We then use network theory to present a taxonomy of cyber incidents. Fig. 2 shows the components of a cybercrime incident.
Fig. 2: A cybercrime incident includes attackers, attack(s), and an objective
A dendrogram is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering. Dendrograms are often used in computational biology to illustrate the clustering of genes or samples. Agglomerative hierarchical clustering algorithms build a cluster hierarchy that is commonly displayed as a dendrogram. They begin with each object in a separate cluster. At each step, the two most similar clusters are joined into a single new cluster. Once fused, objects are never separated. The horizontal axis of the dendrogram represents the distance or dissimilarity between clusters; the vertical axis represents the objects and clusters. The dendrogram is fairly simple to interpret; recall that our main interest is in similarity and clustering. Each joining (fusion) of two clusters is represented on the graph by the splitting of a horizontal line into two horizontal lines. The horizontal position of the split, shown by a short vertical bar, gives the distance (dissimilarity) between the two clusters. In a dendrogram with three well-separated clusters, for example, the clusters appear as three branches that split off at about the same horizontal distance. Fig. 3 shows a dendrogram presented as a network.
Fig. 3: Dendrogram presented as network
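The agglomerative procedure described above, and the step of presenting its dendrogram as a network, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the paper does not state its linkage rule, so single linkage (cluster distance = distance between the closest members) is assumed, and the names `agglomerate`, `cut` and `dendrogram_edges` are hypothetical. Leaves are numbered 0..n-1 and the k-th fusion creates node n + k, the same numbering SciPy's linkage matrix uses.

```python
from itertools import combinations


def agglomerate(points, dist):
    """Naive agglomerative clustering with single linkage (assumed rule).

    Leaves are numbered 0..n-1; the k-th fusion creates cluster n + k.
    Returns one (left_id, right_id, height) tuple per fusion, in order.
    """
    n = len(points)
    members = {i: {i} for i in range(n)}  # active cluster id -> object indices

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist(points[i], points[j])
                   for i in members[a] for j in members[b])

    merges = []
    while len(members) > 1:
        # Join the two most similar active clusters (ties: lowest ids first).
        a, b = min(combinations(sorted(members), 2), key=lambda p: linkage(*p))
        height = linkage(a, b)
        # Once fused, the two clusters are never separated again.
        members[n + len(merges)] = members.pop(a) | members.pop(b)
        merges.append((a, b, height))
    return merges


def cut(n, merges, k):
    """Replay only the first n - k fusions, leaving k clusters."""
    members = {i: {i} for i in range(n)}
    for step, (a, b, _) in enumerate(merges[: n - k]):
        members[n + step] = members.pop(a) | members.pop(b)
    return sorted(sorted(c) for c in members.values())


def dendrogram_edges(n, merges):
    """Present the dendrogram as a network: each fusion node links to its two parts."""
    edges = []
    for step, (a, b, _) in enumerate(merges):
        edges += [(n + step, a), (n + step, b)]
    return edges
```

For the five points `[1.0, 2.0, 9.0, 10.0, 25.0]` with absolute difference as the distance, `agglomerate` returns `[(0, 1, 1.0), (2, 3, 1.0), (5, 6, 7.0), (4, 7, 15.0)]`, `cut(5, merges, 3)` returns the three branches `[[0, 1], [2, 3], [4]]`, and `dendrogram_edges` yields 8 edges on 9 nodes, i.e. a tree. The exhaustive pair search is slow on large inputs; `scipy.cluster.hierarchy.linkage` is the usual choice there.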
For the network we calculated one topological property, its density. “Network density” describes the portion of the potential connections in a network that are actual connections. A “potential connection” is a connection that could exist between two “nodes”, regardless of whether or not it actually does. The density D of an undirected network with N nodes and E edges is the ratio of the number of edges to the number of possible edges, N(N-1)/2, giving
$$D = \frac{2E}{N(N-1)}.$$
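The density formula can be checked with a short Python sketch. The helper name `density` is hypothetical; treating edges as unordered pairs and ignoring duplicates and self-loops is an assumption about how the network is built, so that E counts distinct actual connections. NetworkX's `density` function computes the same quantity for undirected graphs.

```python
def density(nodes, edges):
    """D = 2E / (N(N-1)) for a simple undirected network (hypothetical helper).

    Edges are treated as unordered pairs; duplicates and self-loops are
    ignored so that E counts distinct actual connections.
    """
    n = len(set(nodes))
    if n < 2:
        raise ValueError("density is undefined for fewer than two nodes")
    actual = {frozenset((u, v)) for u, v in edges if u != v}
    return 2 * len(actual) / (n * (n - 1))
```

For example, a path A-B-C-D-E uses 4 of its 10 potential connections, so `density("ABCDE", [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")])` gives 0.4, while a network in which every pair of nodes is connected has density 1.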
3. Results and discussion
Hierarchical clustering is one method for finding community structures in a network. Fig. 4 presents the computer and network cyber incident taxonomy. An incident includes attackers, tools, vulnerabilities, actions, targets, unauthorized results, and objectives.
Fig. 5 presents the computer and network cyber incident taxonomy as a network. For this network we calculated the topological property density; the density of the network is 0.03570248. In this paper, we presented a systemic taxonomy of cyber incidents and its analysis as a network. Our taxonomy of cyber incidents extends existing taxonomies of cybercrime. A dendrogram of cyber incidents was presented using methods of graph theory. We described cyber incidents with the topological property of network density, which we use as a measure of the complexity of the network.
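To illustrate how a taxonomy becomes a network whose density can be computed, here is a minimal Python sketch. It uses only the top-level categories named in the text (attackers, tools, vulnerability, action, target, unauthorized result, objective); the deeper levels of Fig. 4 are not recoverable here, so this toy tree does not reproduce the reported density. The nested-dictionary encoding and the helper names are hypothetical, and category labels are assumed to be unique. Because any tree with N nodes has exactly N - 1 edges, its density reduces to 2(N-1)/(N(N-1)) = 2/N, which shrinks as the taxonomy grows.

```python
def tree_edges(tree, parent=None):
    """Flatten a nested-dict taxonomy into (parent, child) edges."""
    edges = []
    for label, subtree in tree.items():
        if parent is not None:
            edges.append((parent, label))
        edges.extend(tree_edges(subtree, label))
    return edges


def density(n_nodes, n_edges):
    """Density of a simple undirected network: 2E / (N(N-1))."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Hypothetical one-level taxonomy built from the categories named in the text.
CATEGORIES = ["attackers", "tools", "vulnerability", "action",
              "target", "unauthorized result", "objective"]
taxonomy = {"incident": {category: {} for category in CATEGORIES}}

edges = tree_edges(taxonomy)
nodes = {label for edge in edges for label in edge}
d = density(len(nodes), len(edges))
# A tree always has N - 1 edges, so its density equals 2 / N.
assert len(edges) == len(nodes) - 1
print(len(nodes), len(edges), d)  # 8 7 0.25
```

Adding sub-categories under each top-level entry keeps the structure a tree, so its density would continue to follow 2/N for the larger N.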
4. Conclusion
This paper has contributed to the conceptual and empirical understanding of global cyber wars and crimes. Different motivations of hackers, source characteristics, and target country characteristics lead to different likelihoods of attacks on different organizations. Timely reporting of cyber attacks to authorities is thus likely to strengthen the rule of law and help combat cyber threats in the long run. Cooperation and collaboration among national governments, computer crime authorities, and businesses are critical to combating cyber attacks. If national governments work with one another, as well as with business communities, to modify institutions by defining appropriate policies for the security of the digital world, transaction costs will be lower. We used the topological property of density to describe the computer and network cyber incident taxonomy.
Fig. 4: Computer and network cyber incident taxonomy
Fig. 5: Computer and network cyber incident taxonomy presented as a network
5. Literature
[1] Shukla, D., Verma, K., Dubey, J. and Gangele, S. (2012c): Cyber Crime Based Curve Fitting Analysis in Internet Traffic Sharing in Computer Network, International Journal of Computer Application (IJCA), Vol. 46, No. 22, pp. 41-51.
[2] Gangele, S., Verma, K. and Shukla, D. (2014): Bounded Area Estimation of Internet Traffic Share Curve, International Journal of Computer Science and Business Informatics (IJCSBI), Vol. 10, No. 1, pp. 54-67.
[3] Shukla, D., Verma, K. and Gangele, S. (2012a): Iso-Failure in Web Browsing using Markov Chain Model and Curve Fitting Analysis, International Journal of Modern Engineering Research (IJMER), Vol. 02, No. 02, pp. 512-517.
[4] Shukla, D., Verma, K. and Gangele, S. (2012b): Least Square Curve Fitting Applications under Rest State Environment in Internet Traffic Sharing in Computer Network, International Journal of Computer Science and Telecommunications (IJCST), Vol. 03, No. 05, pp. 43-51.
[5] Swamy, M.N.S. and Thulasiraman, K. (1981): Graphs, Networks, and Algorithms, Wiley.