International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 –
6367(Print), ISSN 0976 – 6375(Online) Volume 3, Issue 1, January- June (2012), © IAEME
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)
ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 3, Issue 1, January- June (2012), pp. 141-146
© IAEME: www.iaeme.com/ijcet.html
Journal Impact Factor (2011): 1.0425 (Calculated by GISI)
www.jifactor.com
A LITERATURE REVIEW ON THE DATA MINING AND
INFORMATION SECURITY
Mr. M. Karthikeyan
Research Scholar
Department of Computer Science and Applications
Karpagam University, Coimbatore
Mr. M. Suriya Kumar
Department of Management Studies
Karpagam College of Engineering
Dr. S. Karthikeyan
Professor and Head
Department of Computer Science and Applications
Karpagam University, Coimbatore
ABSTRACT
This paper deals with data mining and security-related issues. Nowadays storing
and procuring data is much easier, given a handful of technocrats to warehouse the data,
and cryptographic techniques are available for mining the data in order to find
interesting patterns and combinations. In defense or military settings, mining is much
harder because security must be maintained. The priority changes with the vulnerability
to security attacks, and the probability of the attacks also varies. The security
mechanisms and issues vary with the type of data mining.
Keywords: Patterns, Database, Data Mining, Security.
INTRODUCTION
Definitions of the data mining process differ. Han et al. (2001) describe it as
extracting or mining knowledge from large amounts of data. Berry and Linoff (2000)
state that data mining is a process of analysis and exploration, by automatic or
semi-automatic means, to discover meaningful patterns or rules.
From a business perspective, Chen et al. (1996) note that various data mining
techniques are used to better understand user behavior. According to Sato (2000),
whatever the definition, data mining is the statistical analysis of data.
LITERATURE REVIEW
Data mining comprises a large family of algorithms, and its scope is still
expanding: as researchers work to improve the efficiency and accuracy of existing
algorithms, new approaches keep appearing. Most research in the data mining area
focuses on improving the efficiency and accuracy of a single business application;
fewer efforts are devoted to discussing an application's ease of use.
Moore et al. (2001) state that the more complex the application is, the larger the
gap between the application and its users. Data mining applications are examined to
draw out their concepts and characteristics, after which a selection model is proposed
that matches business requirements to data mining categories, connecting complex data
mining concepts with business problems and assisting users in choosing the best data
mining solution. In knowledge discovery in databases, interesting knowledge,
regularities, or high-level information can be extracted from the relevant sets of data
in databases and be investigated.
DATA MINING PROCESS WITH SECURITY
Anand et al. (1997) described the data mining process as the following steps:
1. Human Resource Identification.
2. Problem Specification.
3. Data Prospecting.
4. Domain Knowledge Elicitation.
5. Methodology Identification.
6. Data Pre-Processing.
7. Pattern Discovery.
8. Knowledge Post-Processing.
Fayyad et al. proposed the following steps: retrieving the data from the large database;
selecting the relevant subset to work with; deciding on the appropriate sampling
system; cleaning the data and maintaining the old records; dealing with missing fields;
applying transformations and improving the dimensionality projection; and fitting
models to the preprocessed data.
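The sequence of steps above can be illustrated with a minimal Python sketch. It is not the procedure of Fayyad et al.; it only mirrors the order of the steps (select a subset, sample, clean, fill missing values, project, fit a model). The record fields (`region`, `spend`, `visits`), the mean-based filling of missing values, and the straight-line model are all assumptions introduced for illustration.

```python
import random
import statistics


def select_subset(records, region):
    """Select the relevant subset to work with (here: one hypothetical region)."""
    return [r for r in records if r["region"] == region]


def sample(records, k, seed=0):
    """Draw a reproducible random sample of at most k records."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))


def clean(records):
    """Drop records whose predictor field is not numeric."""
    return [r for r in records if isinstance(r["spend"], (int, float))]


def fill_missing(records, field):
    """Replace missing values of a field with the mean of the present ones."""
    present = [r[field] for r in records if r[field] is not None]
    mean = statistics.mean(present)
    return [{**r, field: mean if r[field] is None else r[field]} for r in records]


def project(records, fields):
    """Keep only the named fields, reducing each record to a tuple."""
    return [tuple(r[f] for f in fields) for r in records]


def fit_line(points):
    """Fit y = a*x + b by ordinary least squares and return (a, b)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx


def run_pipeline(records, region, k):
    """Run the steps in order on an in-memory list of dict records."""
    subset = select_subset(records, region)
    sampled = sample(subset, k)
    cleaned = clean(sampled)
    filled = fill_missing(cleaned, "visits")
    points = project(filled, ["spend", "visits"])
    return fit_line(points)
```

Each step is a separate function so that any one of them (for example the sampling scheme or the model) can be replaced without touching the others.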
DATA MINING CATEGORISING
Fayyad et al. (1996) categorise data mining into classification, regression,
clustering, summarisation, dependency modeling, link analysis, and sequence analysis.
Han et al. (1996) categorise it into association, generalisation, classification,
clustering, similarity search, and path traversal patterns.
Berry (1997) proposed the categories classification, estimation, prediction,
affinity grouping, clustering, and description. Gardner (2000), in a case study of
Motorola's semiconductor wafer manufacturing problem, gives the example of Motorola's
production analysis tool called “CorDex”, which creates a 2-dimensional topology, called
a “Cluster Map”, that best maintains the original 133-dimensional data inter-relationships.
Cabena et al. (1997) define data mining as the process of extracting previously
unknown, valid and actionable information from large databases and then using the
information to make crucial business decisions. In essence, data mining is distinguished
by the fact that it is aimed at the discovery of information, without a previously
formulated hypothesis. Mitchell (1999) states that the field of data mining addresses
the question of how best to use historical data to discover general regularities and
improve the process of making decisions.
DATA MINING AND SECURITY (Bhavani Thuraisingham)
Data mining is the process of posing a series of appropriate queries to extract information
from large quantities of data in the database. Data mining techniques can be applied to
handle problems in database security. On the other hand, data mining techniques can also
be employed to cause security problems. This position paper reviews both aspects. Data
mining techniques include those based on rough sets, inductive logic programming,
machine learning, and neural networks, among others. Essentially one arrives at some
hypothesis, which is the information extracted, from examples and patterns observed.
These patterns are observed from posing a series of queries; each query may depend on
the response obtained to the previous queries posed.
Data mining techniques have applications in intrusion detection and auditing databases.
In the case of auditing, the data to be mined is the large quantity of audit data. One may
apply data mining tools to detect abnormal patterns. For example, suppose an employee
makes an excessive number of trips to a particular country and this fact is known by
posing some queries. The next query to pose is whether the employee has associations
with certain people from that country. If the answer is positive, then the employee's
behavior is flagged. While the previous example shows how data mining tools can be
used to detect abnormal behavior, the next example shows how data mining tools can be
applied to cause security problems. Consider a user who has the ability to apply data
mining tools. This user can pose various queries and infer sensitive hypotheses. That is,
the inference problem occurs via data mining. There are various ways to handle this
problem. One approach is as follows. Given a database and a particular data mining tool,
apply the tool to see if sensitive information can be deduced from the unclassified
information legitimately obtained. If so, then there is an inference problem. There are
some issues with this approach. One is that we are applying only one tool. In reality, the
user may have several tools available to him. Furthermore, it is impossible to cover all
ways that the inference problem could occur. Another approach is to build an inference
controller which acts during run-time. As the user applies data mining tools, the inference
controller will analyze the queries posed by the user and the answers, and determine the
types of responses that should be released to the user for each query. The issues involved
in building such an inference controller have to be determined. In summary, data mining
is an area that is growing rapidly. Not only are there several prototypes, commercial
products are also appearing. One needs to take advantage of these tools to handle certain
problems in security. On the other hand, these tools can also cause security
problems. Therefore, appropriate measures have to be taken to detect/prevent such
problems.
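The run-time inference controller described above can be sketched in Python as follows. The text does not specify how such a controller decides what to release, so the rule used here is an assumption: the controller is given hypothetical sets of attributes that are harmless individually but sensitive in combination, keeps a per-user history of what has already been released, and denies any query that would complete such a combination. The class and attribute names are illustrative only.

```python
class InferenceController:
    """Tracks what each user has learned and blocks sensitive combinations.

    Assumption: sensitivity is modelled as sets of attribute names that must
    never all be known to the same user. The source does not prescribe this.
    """

    def __init__(self, sensitive_combinations):
        self.sensitive = [frozenset(c) for c in sensitive_combinations]
        self.released = {}  # user -> set of attributes already released

    def check(self, user, requested):
        """Return "release" or "deny" for a query asking for `requested`."""
        seen = self.released.setdefault(user, set())
        would_know = seen | set(requested)
        for combo in self.sensitive:
            if combo <= would_know:
                # Answering would let the user infer sensitive information.
                return "deny"
        # Safe to answer: record the attributes as released to this user.
        seen.update(requested)
        return "release"
```

For example, with the single sensitive combination `{"employee", "destination", "contacts"}`, a user who has already received `employee` and `destination` is denied a later query for `contacts`, while a different user with no history may still receive it. As the text notes, a controller of this kind cannot cover every way in which inference could occur.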
SECURITY MECHANISMS AND ISSUES
Brachman et al. (1996) differentiated two types of data mining methods:
verification, and discovery, in which the system finds new patterns. Discovery
includes prediction and forecasting. Chris Clifton (2001) proposed that the
discovery of new and interesting patterns in data sets is known as data mining, and
that security should be incorporated along with it.
Jiaxi et al. (2003) present two different cyber security assessment methods. The
first method is a probabilistic assessment. In this method the probability of occurrences
along with probability of a resulting accident are used to calculate a vulnerability index of
the cyber systems. The second method is an integrated approach. Cyber security risks are
first categorized into five different categories based on severity. Then probabilities of a
risk belonging to a category are assigned. Using this information and a formula, the
degree of cyber security risk can be obtained.
The risk is evaluated by priority, and the priority is assigned based on
the attack and the intrusion into the system.
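The two assessment methods described above can be illustrated with a short Python sketch. The text does not give the actual formulas, so both are assumptions made for illustration: the vulnerability index is taken as the product of the probability of an occurrence and the probability of a resulting accident, and the degree of risk is taken as the probability-weighted average of five hypothetical severity weights (1 to 5).

```python
import math

# Hypothetical severity weights for the five risk categories (1 = least severe).
SEVERITY_WEIGHTS = (1, 2, 3, 4, 5)


def vulnerability_index(p_occurrence, p_accident):
    """Probabilistic assessment (assumed form): P(occurrence) * P(accident)."""
    for p in (p_occurrence, p_accident):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_occurrence * p_accident


def risk_degree(category_probabilities, weights=SEVERITY_WEIGHTS):
    """Integrated approach (assumed form): expected severity over categories."""
    if len(category_probabilities) != len(weights):
        raise ValueError("one probability per severity category is required")
    if not math.isclose(sum(category_probabilities), 1.0):
        raise ValueError("category probabilities must sum to 1")
    return sum(w * p for w, p in zip(weights, category_probabilities))
```

Under these assumptions, an event with a 0.2 probability of occurrence and a 0.5 probability of a resulting accident has a vulnerability index of 0.1, and risks can then be ranked by either score to set priorities.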
Aleksandra Garvick et al. (2003) state that, because security policies and
mechanisms are not perfect, more and more organizations are vulnerable to threats and
attacks on the data warehouse. This is in turn reflected in data mining, where
security issues arise. These can be rectified through the strategy and the
mechanism. Gerhard Puub et al. (2007) address curbing malware and other
threats, rather than preventing them, and attacks on databases while
mining the data sets.
REFERENCES
1. Database Security IX: Status and Prospects. Edited by D. L. Spooner, S. A.
Demurjian and J. E. Dobson. ISBN 0 41272920 2, 1996, pp. 391-399.
2. Pawlak, Z. (1990). Rough Sets: Theoretical Aspects of Reasoning about Data,
Kluwer Academic Publishers, 1992.
3. Lin, T. Y. (1994), “Anomaly Detection -- A Soft Computing Approach”,
Proceedings of the ACM SIGSAC New Security Paradigm Workshop, Aug 3-5,
1994, pp. 44-53. This paper reappeared in the Proceedings of 1994 National Computer
Security Center Conference under the title “Fuzzy Patterns in Data”.
4. Lin, T. Y. (1993), “Rough Patterns in Data-Rough Sets and Intrusion Detection
Systems”, Journal of Foundation of Computer Science and Decision Support,
Vol. 18, No. 3-4, 1993, pp. 225-241. The extended version of “Patterns in Data-Rough
Sets and Foundation of Intrusion Detection Systems” presented at the First
Invitational Workshop on Rough Sets, Poznan-Kiekrz, September 2-4, 1992.