Knowledge Extraction using Data Mining Techniques
Prof. R. A. Gangurde, Prof. M. R. Sonar
Department of MCA, K K Wagh Institute of Engineering Education and Research, Nashik, Maharashtra, India

Abstract: Data mining is a logical process that finds useful patterns in large amounts of data. It is the process of extracting previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions. Data mining is a computer-assisted process that digs through and analyzes enormous sets of data and then extracts knowledge from them. The various data mining techniques are used to extract useful knowledge from a database or data warehouse that is growing continuously. This extraction of knowledge is useful in research as well as in organizations. In this paper the authors review the literature on data mining techniques such as classification, clustering, association rules and prediction. Through data mining we can identify the trends or patterns in the data and thus propose a corresponding, optimum plan for the enterprise. As a part of data mining research, this paper surveys the data mining techniques used in knowledge extraction.

Keywords- Knowledge discovery, Classification, Clustering, Association Rule, Prediction.

1. INTRODUCTION
In the modern era, people deal every day with vast amounts of data in different formats and make decisions by analyzing these data. Data mining is the process of extracting useful information and patterns from huge volumes of data. It is also called the knowledge discovery process, knowledge mining from data, knowledge extraction or data/pattern analysis. In the information age, knowledge is becoming a fundamental organizational resource that provides an advantage, giving rise to knowledge management (KM) initiatives. Many organizations collect and store huge amounts of data. However, they are unable to discover the valuable information hidden in the data, that is, to transform these data into valuable and useful knowledge. Managing knowledge resources can be a challenge.

2. KNOWLEDGE EXTRACTION PROCESS
Data mining is a process of sorting through a large pool of data and picking out meaningful and useful information. Knowledge discovery is a process that extracts implicit, potentially useful or previously unknown information from the data. The knowledge discovery process is described as follows:
1. Data coming from a variety of sources is integrated into a single data store called a data warehouse or data mart. The data stored there is called the target data.
2. The target data is pre-processed and transformed into a standard format.
3. The data mining algorithms process the data and output patterns or rules.
4. The patterns and rules are interpreted to yield new or useful knowledge or information.
As we can see, data mining is the heart of the knowledge discovery process. Using data mining we can find useful patterns in large volumes of data and interpret them to obtain useful knowledge and information.

3. DATA MINING TECHNIQUES
Recently, various data mining techniques have been developed and used in projects for knowledge discovery from databases, including classification, clustering, association, prediction and sequential patterns.

3.1. Classification
Classification is a classic data mining technique based on machine learning. It is the process of finding common properties among a set of objects in a database and classifying them into different classes according to a classification model. The objective of classification is to first analyze the training data and develop an accurate description or model for each class using the features available in the data. Such class descriptions are then used to classify future test data in the database or to develop a better description for each class in the database. Basically, classification is used to assign each item in a set of data to one of a predefined set of classes or groups.

For example, teachers classify students’ grades as A, B, C, D, or F. Classification methods make use of mathematical techniques such as decision trees, linear programming, neural networks and statistics. In classification, we build software that can learn how to classify data items into groups. For example, we can apply classification in an application that, “given all past records of employees who left the company, predicts which current employees are likely to leave in the future.” In this case, we divide the employee records into two groups, “leave” and “stay”, and then ask our data mining software to classify the employees into each group.

In recent years, many advanced classification techniques have been developed, including the following:
1. Regression
2. Distance
3. Decision Trees
4. Fuzzy-sets
5. Neural Networks
6. Support Vector Machine

3.2. Clustering
Clustering is the process of grouping physical or abstract objects into classes of similar objects. The term “Unsupervised Classification” is also often used. The term comes from the differentiation between clustering and classification. The difference is that, although both methods produce a set of groups with similar properties, in clustering the output classes (and often their number) are not known in advance. Clustering analysis helps to construct a meaningful partitioning of a large set of objects based on a divide-and-conquer methodology, which decomposes a large-scale system into smaller components to simplify design and implementation. Clustering is the process of arranging items into groups whose elements are similar in some manner. A cluster is therefore a collection of items that are similar to one another and dissimilar to the objects belonging to other clusters. Dissimilarities and similarities are evaluated based on the attribute values describing the objects.
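As a minimal sketch of how dissimilarity can be evaluated from attribute values, the Python snippet below computes Euclidean distances between objects described by two numeric attributes. The sample points, the two groups and the helper name `euclidean` are assumptions made purely for illustration, and Euclidean distance is only one possible dissimilarity measure.

```python
import math

def euclidean(a, b):
    """Dissimilarity between two objects given as equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical objects, each described by two numeric attributes.
group_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
group_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

# Largest dissimilarity between two objects of the same group.
max_within = max(
    euclidean(p, q)
    for group in (group_a, group_b)
    for p in group
    for q in group
)

# Smallest dissimilarity between objects of different groups.
min_between = min(euclidean(p, q) for p in group_a for q in group_b)

print(f"largest within-group distance:   {max_within:.2f}")
print(f"smallest between-group distance: {min_between:.2f}")
```

In this toy data every object is much closer to the members of its own group than to any member of the other group, which is the property a clustering method tries to recover when the groups are not given.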
Clustering deals with finding structure in a collection of unlabeled data. A common approach, used by many clustering techniques, is to find a cluster centre that characterizes each cluster.

Various data clustering methods:
1) Partitioning method – It divides the data into a number of groups, and each group contains at least one object. E.g. K-Means clustering.
2) Hierarchical method – It creates a hierarchical decomposition of the data using either a bottom-up (agglomerative) or a top-down (divisive) approach. E.g. hierarchical clustering.
3) Density-based method – A cluster continues to grow as long as the density of objects exceeds some threshold. E.g. DBSCAN clustering.
4) Grid-based method – It forms a grid structure over the object space. The main advantage of this approach is its fast processing time. E.g. STING clustering.

3.3. Association Rule
Association rule mining is a widely used data mining method for finding hidden or required patterns in large volumes of data. It finds relationships among data attributes in a huge set of items in a database. Association rule mining has identified a large number of interesting and relevant associations across itemsets. A typical example of association rule mining is market basket analysis. Association rule mining helps to explore the relationships among different products in transaction databases and to discover buyer behaviour, such as how the purchase of one commodity affects the purchase of other goods. The results can be applied to shelf layout, storage arrangements, and classification of users according to buying patterns. Association rules have a considerable number of practical applications, including market basket analysis, recommendation systems, classification, XML mining and the share market. In the classical model, rules are measured by support, so that every item in the dataset is treated equally.
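To make the market basket example concrete, the sketch below computes two standard measures for a candidate rule over a handful of transactions: support (the fraction of transactions that contain all items of the rule) and confidence (the fraction of transactions containing the antecedent that also contain the consequent). The transactions, the item names and the helper names `support` and `confidence` are hypothetical and chosen only for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of the whole rule divided by support of its antecedent."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0

# Hypothetical market baskets (one set of items per transaction).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(f"support(bread, milk)      = {rule_support:.2f}")
print(f"confidence(bread -> milk) = {rule_confidence:.2f}")
```

Here bread and milk appear together in three of the five baskets, and three of the four baskets containing bread also contain milk. Whether such a rule counts as interesting depends on minimum thresholds chosen by the analyst.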
Association rule mining is built on the support-confidence framework, which reduces the problem to the discovery of frequent itemsets. Rule support and confidence are two measures of interestingness. Association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold. The association rule mining procedure can be completed in four steps:
1. Prepare the data and select the required data
2. Produce itemsets that determine the rule constraints for knowledge
3. Mine k frequent itemsets using the new database
4. Produce the association rules that set up the knowledge base

The types of association rules are:
1. Multilevel association rule
2. Multidimensional association rule
3. Quantitative association rule

3.4. Prediction
Regression techniques can be used for prediction. Regression is used to predict the value of a dependent (response) variable from one or more independent (predictor) variables, where the variables are numeric. There are various forms of regression:
1. Linear Regression
2. Multiple Regression
3. Weighted Regression
4. Polynomial Regression
5. Non-Parametric Regression
6. Robust Regression

4. CONCLUSION
Data mining has a wide field of application in almost every industry where data is generated in large quantities. That is why data mining is considered one of the most important frontiers in database and information systems. It is also one of the most promising interdisciplinary developments in information technology. Data mining techniques such as classification, clustering, association rules and prediction help in finding patterns that inform decisions about future business trends and growth. In this paper we presented an overview of certain data mining techniques.