International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online), Volume 3, Issue 1, January-June (2012), pp. 141-146, © IAEME: www.iaeme.com/ijcet.html
Journal Impact Factor (2011): 1.0425 (Calculated by GISI), www.jifactor.com

A LITERATURE REVIEW ON THE DATA MINING AND INFORMATION SECURITY

Mr. M. Karthikeyan, Research Scholar, Department of Computer Science and Applications, Karpagam University, Coimbatore
Mr. M. Suriya Kumar, Department of Management Studies, Karpagam College of Engineering
Dr. S. Karthikeyan, Professor and Head, Department of Computer Science and Applications, Karpagam University, Coimbatore

ABSTRACT
This paper deals with data mining and its related security issues. Nowadays, storing and procuring data is much easier: a handful of technologies exist to warehouse the data, and cryptographic techniques are available for mining it to find interesting patterns and combinations. In defense and military settings, mining is much harder because security must be maintained. The priority of a security concern changes with the vulnerability to attacks, and the probability of attack varies as well. Security mechanisms and issues vary with the type of data mining.

Keywords: Patterns, Database, Data Mining, Security.
INTRODUCTION
Definitions of data mining differ. Han et al. (2001) describe it as extracting, or "mining", knowledge from large amounts of data. Berry and Linoff (2000) state that data mining is a process of analysis and exploration, by automatic or semi-automatic means, to discover meaningful patterns or rules. Chen et al. (1996), taking a business perspective, note that various data mining techniques are used to better understand user behavior. Sato (2000) holds that, whatever the definition, data mining is the statistical analysis of data.

LITERATURE REVIEW
Data mining is a large family of algorithms, and its scope is still expanding: researchers keep working to improve the efficiency and accuracy of existing algorithms, and new approaches appear over time. Most research in the area focuses on improving the efficiency and accuracy of a single business application; fewer efforts address how easy such applications are to use. Moore et al. (2001) state that the more complex the application, the larger the gap between the application and its users. Work in this direction first draws out the concepts and characteristics of data mining applications and then proposes a selection model that matches business requirements to data mining categories, connecting complex data mining concepts with business problems and helping users choose the best data mining solution. In knowledge discovery in databases, interesting knowledge, regularities, or high-level information can be extracted from the relevant sets of data in databases and then investigated.

DATA MINING PROCESS WITH SECURITY
Anand et al. (1997) described the data mining process as the following steps:
1. Human resource identification.
2. Problem specification.
3. Data prospecting.
4. Domain knowledge elicitation.
5. Methodology identification.
6. Data pre-processing.
7. Pattern discovery.
8. Knowledge post-processing.

Fayyad et al. proposed the following steps: retrieving the data from a large database; selecting the relevant subset to work with; deciding on an appropriate sampling scheme; cleaning the data and maintaining the old records; dealing with missing fields; applying transformations and dimensionality reduction or projection; and fitting models to the preprocessed data.

DATA MINING CATEGORISATION
Fayyad et al. (1996) categorise data mining into classification, regression, clustering, summarisation, dependency modeling, link analysis, and sequence analysis. Han et al. (1996) categorise it into association, generalisation, classification, clustering, similarity search, and path traversal patterns. Berry (1997) proposed the categories classification, estimation, prediction, affinity grouping, clustering, and description. Gardner (2000), in a case study of Motorola's semiconductor wafer manufacturing problem, gives the example of Motorola's production analysis tool "CorDex", which creates a two-dimensional topology, called a "cluster map", that best maintains the inter-relationships of the original 133-dimensional data.

Cabena et al. (1997) define data mining as the process of extracting previously unknown, valid, and actionable information from large databases and then using that information to make crucial business decisions. In essence, data mining is distinguished by its aim of discovering information without a previously formulated hypothesis. Mitchell (1999) states that the field of data mining addresses the question of how best to use historical data to discover general regularities and improve the process of making decisions.
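The process steps listed by Anand et al. and Fayyad et al. above can be illustrated with a small pipeline. The sketch below is a hedged, minimal example in Python using only the standard library, not the authors' method: the toy records and field names are hypothetical, and mean imputation (cleaning), min-max scaling (transformation) and k-means clustering (model fitting, one of the categories listed above) are assumptions chosen for illustration. The helper names `select`, `sample`, `fill_missing`, `min_max`, `kmeans` and `run_pipeline` are likewise hypothetical.

```python
import math
import random
import statistics

# Illustrative sketch only: every name and modelling choice here is an assumption.
# Hypothetical toy records standing in for rows retrieved from a large database;
# "notes" is an irrelevant attribute that the selection step drops.
RAW = [
    {"id": 1, "age": 25, "income": 30000, "notes": "a"},
    {"id": 2, "age": 27, "income": None, "notes": "b"},
    {"id": 3, "age": 26, "income": 32000, "notes": "c"},
    {"id": 4, "age": 60, "income": 90000, "notes": "a"},
    {"id": 5, "age": 62, "income": 95000, "notes": "b"},
    {"id": 6, "age": None, "income": 88000, "notes": "c"},
]
FEATURES = ["age", "income"]


def select(records, fields):
    """Selection: keep the record id plus the attributes relevant to the task."""
    return [{"id": r["id"], **{f: r.get(f) for f in fields}} for r in records]


def sample(records, fraction, seed=0):
    """Sampling: a seeded simple random sample (one of many possible schemes)."""
    k = max(1, round(fraction * len(records)))
    return random.Random(seed).sample(records, k)


def fill_missing(records, fields):
    """Cleaning: replace each missing value with the mean of the observed values."""
    means = {f: statistics.mean(r[f] for r in records if r[f] is not None) for f in fields}
    return [{**r, **{f: means[f] for f in fields if r[f] is None}} for r in records]


def min_max(records, fields):
    """Transformation: scale every attribute to [0, 1] and return plain vectors."""
    lo = {f: min(r[f] for r in records) for f in fields}
    hi = {f: max(r[f] for r in records) for f in fields}
    return [
        [(r[f] - lo[f]) / (hi[f] - lo[f]) if hi[f] > lo[f] else 0.0 for f in fields]
        for r in records
    ]


def kmeans(points, k, iters=20):
    """Model fitting: plain k-means with deterministic farthest-first seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Move each centroid to the mean of its members (empty clusters keep theirs).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [statistics.mean(col) for col in zip(*members)]
    return labels


def run_pipeline(raw, fields, k=2, fraction=1.0):
    """Chain the steps and map each record id to its cluster label."""
    rows = fill_missing(sample(select(raw, fields), fraction), fields)
    labels = kmeans(min_max(rows, fields), k)
    return {r["id"]: label for r, label in zip(rows, labels)}


print(run_pipeline(RAW, FEATURES))
```

With this toy data, records 1, 2 and 3 fall into one cluster and records 4, 5 and 6 into the other; which group receives label 0 depends on the seeded sample order. Mean imputation and min-max scaling were picked for brevity; other cleaning and transformation choices would fit the same steps.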
DATA MINING AND SECURITY
According to Bhavani Thuraisingham, data mining is the process of posing a series of appropriate queries to extract information from large quantities of data in a database. Data mining techniques can be applied to handle problems in database security; on the other hand, they can also be employed to cause security problems. Thuraisingham's position paper reviews both aspects. Data mining techniques include those based on rough sets, inductive logic programming, machine learning, and neural networks, among others. Essentially, one arrives at a hypothesis, which is the extracted information, from the examples and patterns observed. These patterns are observed by posing a series of queries, where each query may depend on the responses to the previous ones.

Data mining techniques have applications in intrusion detection and database auditing. In auditing, the data to be mined is the large quantity of audit data, and data mining tools can be applied to detect abnormal patterns. For example, suppose that posing some queries reveals that an employee makes an excessive number of trips to a particular country. The next query to pose is whether the employee has associations with certain people from that country. If the answer is positive, the employee's behavior is flagged.

While this example shows how data mining tools can be used to detect abnormal behavior, the next shows how they can be applied to cause security problems. Consider a user who is able to apply data mining tools. This user can pose various queries and infer sensitive hypotheses; that is, the inference problem occurs via data mining. There are various ways to handle this problem. One approach is as follows.
Given a database and a particular data mining tool, apply the tool to see whether sensitive information can be deduced from legitimately obtained unclassified information. If so, there is an inference problem. This approach has some issues. One is that only one tool is applied, whereas in reality the user may have several tools available. Furthermore, it is impossible to cover all the ways in which the inference problem could occur. Another approach is to build an inference controller that acts at run time. As the user applies data mining tools, the inference controller analyzes the queries posed and their answers, and determines which responses should be released to the user for each query. The issues involved in building such an inference controller remain to be determined.

In summary, data mining is a rapidly growing area. Not only are there several prototypes, but commercial products are also appearing. These tools should be used to handle certain problems in security; on the other hand, they can also cause security problems, so appropriate measures must be taken to detect and prevent such problems.

SECURITY MECHANISMS AND ISSUES
Brachman et al. (1996) distinguished two types of data mining methods: verification, and discovery, in which the system finds new patterns. Discovery includes prediction, that is, forecasting the future. Clifton (2001) proposed that, while data mining is the discovery of new and interesting patterns in data sets, security should be incorporated along with it. Jiaxi et al. (2003) present two different cyber security assessment methods. The first is a probabilistic assessment.
In this method, the probability of occurrence, together with the probability of a resulting accident, is used to calculate a vulnerability index for cyber systems. The second method is an integrated approach: cyber security risks are first categorized into five categories based on severity, probabilities of a risk belonging to each category are then assigned, and a formula combines this information to obtain the degree of cyber security risk. The risk is evaluated by priority, and priority is assigned according to the attack on, and intrusion into, the system. Aleksandra Garvick et al. (2003) state that because security policies and mechanisms are not perfect, more and more organizations are vulnerable to threats and attacks on data warehouses. This is in turn reflected in data mining, where security issues arise, and it can be addressed through an appropriate strategy and mechanism. Gerhard Puub et al. (2007) address curbing malware and other threats, rather than only preventing them, in the context of attacks on databases carried out through mining data sets.

REFERENCES
1. Spooner, D. L., Demurjian, S. A., and Dobson, J. E. (Eds.) (1996). Database Security IX: Status and Prospects, pp. 391-399. ISBN 0 41272920 2.
2. Pawlak, Z. (1990). Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, 1992.
3. Lin, T. Y. (1994). "Anomaly Detection: A Soft Computing Approach". Proceedings of the ACM SIGSAC New Security Paradigm Workshop, Aug 3-5, 1994, pp. 44-53. This paper reappeared in the Proceedings of the 1994 National Computer Security Center Conference under the title "Fuzzy Patterns in Data".
4. Lin, T. Y. (1993). "Rough Patterns in Data-Rough Sets and Intrusion Detection Systems". Journal of Foundation of Computer Science and Decision Support, Vol. 18, No. 3-4, 1993, pp. 225-241. The extended version, "Patterns in Data-Rough Sets and Foundation of Intrusion Detection Systems", was presented at the First Invitational Workshop on Rough Sets, Poznan-Kiekrz, September 2-4, 1992.