Survey
International Journal of Latest Trends in Engineering and Technology (IJLTET)

Study and Analysis of Data Mining Concepts

M. Parvathi
Head, Department of Computer Applications
Senthamarai College of Arts and Science, Madurai, Tamil Nadu, India

Dr. S. Thabasu Kannan
Principal
Pannai College of Engineering and Information Technology, Madurai, Tamil Nadu, India

Abstract- Data mining is a process that finds useful patterns in large amounts of data. It predicts future trends and behaviors, allowing businesses to make decisions. This paper discusses a few data mining techniques and algorithms, and some of the organizations that have adopted data mining technology to improve their businesses and found excellent results. It also discusses the architecture of data mining systems and the tasks and major issues of data mining.

Keywords - Data mining techniques, Data mining algorithms, Tasks and Issues.

I. INTRODUCTION
The major reason that data mining has attracted a great deal of attention in the information industry in recent years is the wide availability of huge amounts of data and the imminent need to turn such data into useful information and knowledge. The information and knowledge gained can be used for applications ranging from business management, production control, and market analysis to engineering design and science exploration. Data mining can be viewed as a result of the natural evolution of information technology. An evolutionary path has been witnessed in the database industry in the development of the following functionalities: data collection and database creation, data management (including data storage and retrieval, and database transaction processing), and data analysis and understanding (involving data warehousing and data mining).
For instance, the early development of data collection and database creation mechanisms served as a prerequisite for the later development of effective mechanisms for data storage and retrieval, and for query and transaction processing. With numerous database systems offering query and transaction processing as common practice, data analysis and understanding has naturally become the next target.

II. EVOLUTION OF DATABASE
Database technology since the mid-1980s has been characterized by the popular adoption of relational technology and an upsurge of research and development activities on new and powerful database systems. These systems employ advanced data models such as extended relational, object-oriented, object-relational, and deductive models. Application-oriented database systems, including spatial, temporal, multimedia, active, and scientific databases, knowledge bases, and office information bases, have flourished. Issues related to the distribution, diversification, and sharing of data have been studied extensively. Heterogeneous database systems and Internet-based global information systems such as the World Wide Web (WWW) have also emerged and play a vital role in the information industry.

Vol. 5 Issue 1 January 2015 280 ISSN: 2278-621X

Figure-1 Evolution of Database
Data Collection and Database Creation (1960s and earlier)
Database Management Systems (1970s to early 1980s)
· Hierarchical and network database systems
· Relational database systems
· Data modeling tools
· Query languages
· On-line transaction processing (OLTP)
Advanced Database Systems (mid-1980s to present)
· Advanced data models
· Advanced applications
Advanced Data Analysis (late 1980s to present)
· Data warehouse and OLAP
· Data mining and knowledge discovery
· Data mining applications
Web-based Databases (1990s to present)
· XML-based database systems
· Integration with information retrieval
· Data and information integration
New Generation of Data Integration and Information Systems (present and future)

III. EVOLUTION AND FOUNDATIONS OF DATA MINING
Data mining is an application for business and is supported by three technologies:
· Massive data collection
· Multiprocessor computers
· Data mining algorithms

Steps in the evolution of data mining:
· Data collection (1960s): computers, tapes and disks
· Data access (1980s): RDBMS, SQL, ODBC
· Data warehousing and decision support (1990s): online analytic processing (OLAP), multidimensional databases, data warehouses
· Data mining: advanced algorithms, multiprocessor computers, massive databases

IV. DATA MINING DEFINITION
Data mining is the process of extracting, or mining, useful information and patterns from huge amounts of data. It is also called the knowledge discovery process, knowledge mining from data, knowledge extraction, or data/pattern analysis. The mined information may be any relation between the data items in the data. The results of the mining process should be valid, actionable and previously unknown.
Figure-2 Data Mining Process
· Problem definition
· Data gathering and preparation: data access, data sampling, data transformation
· Model building and evaluation: create model, test model, evaluate and interpret model
· Knowledge deployment: model apply, custom reports, external applications

Data mining is a logical process that is used to search through large amounts of data in order to find useful data. The goal of this technique is to find patterns that were previously unknown. Once these patterns are found, they can be used to make decisions for the development of a business. Three steps are involved:
· Exploration
· Pattern identification
· Deployment

Exploration: In the first step, data exploration, the data is cleaned and transformed into another form, and the important variables and the nature of the data are determined based on the problem.
Pattern Identification: Once the data has been explored, refined and defined for the specific variables, the second step is pattern identification: identify and choose the patterns that make the best prediction.
Deployment: The patterns are deployed to achieve the desired outcome.

V. ARCHITECTURE OF DATA MINING
Figure-3 Architecture of Data Mining (layers from top to bottom)
· User Interface
· Pattern Evaluation
· Data Mining Engine (both the engine and pattern evaluation consult the Knowledge Base)
· Data Warehouse and Database Server
· Data cleaning, integration and selection
· Database, Data Warehouse, World Wide Web, Other Repositories

a. Database, Data Warehouse or other Information Repository
This is one or a set of databases, data warehouses and other information repositories. Data cleaning and integration techniques may be applied to the data.
b. Database and Data Warehouse Server
It is responsible for fetching the relevant data based on the user's data mining request.
c. Knowledge base
It is used to guide the search or to evaluate the interestingness of the resulting patterns. It uses concept hierarchies to organize attributes.
d. Data Mining Engine
It is essential to the data mining system. It consists of functional modules for tasks such as characterization, classification and clustering.
e. Pattern Evaluation
It applies interestingness measures and interacts with the data mining modules to focus the search on interesting patterns. It is necessary to confine the search to only the interesting patterns.
f. GUI
It communicates between the user and the data mining system. The user may interact with the system by specifying data mining queries. It allows the user to browse the database and data structures, evaluate mined patterns, and visualize the patterns in different forms.

VI. DATA MINING TASKS
Data mining provides the link between transaction systems and analytical systems. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Data mining tasks fall into two groups of methods. Predictive methods use some variables to predict unknown values of other variables. Descriptive methods identify patterns or relationships in the data.

Figure-4 Tasks of Data Mining
· Predictive: Classification, Regression, Time Series Analysis, Prediction
· Descriptive: Clustering, Summarization, Association Rules, Sequence Discovery

a. Classification
It maps the data into predefined groups or classes. The classes are determined before examining the data. Each stored data item is then placed in one of the predefined groups.
b. Clustering
In this method the groups are not predefined; they are defined by the data. It determines the similarity among the data on predefined attributes, and similar data are grouped into clusters. The grouping is based on logical relationships.
c. Association rules
It identifies data items that are associated with each other. It is often used in retail sales to identify items that are frequently purchased together.
d. Sequence pattern discovery
It is used to determine sequential patterns in data.
It is based on a time sequence of actions. It is similar to association in that related data are found, but the relationship is based on time.
e. Regression
It assumes that the target data fit some known type of function, and it determines the function of this type that best fits the data.
f. Time series analysis
The value of an attribute is examined as it varies over time. The values are obtained over a limited time period.
g. Prediction
It predicts future data states based on past and current data; that is, it predicts a future state rather than the current state. Applications include flood prediction, speech recognition and pattern recognition.
h. Summarization
It maps the data into subsets with associated simple descriptions. It is also known as characterization or generalization. It derives representative information about the database and characterizes its contents.

VII. DATA MINING ISSUES
Figure-5 Issues of Data Mining (data mining at the center, drawing on statistics, database technology, visualization, information science and other disciplines)

a. Human interaction
Interfaces may be needed with both domain and technical experts. Technical experts formulate the queries and help interpret the results. Users identify the data and the desired results.
b. Overfitting
When a model is generated from a given database, it must also fit future database states. Overfitting occurs when the model does not fit future states. It may arise when the model is created from a small database, and it may arise even though the data have not changed.
c. Outliers
Some data entries do not fit the derived model. A model that is built to include these outliers may not behave well for data that are not outliers.
d. Interpretation of results
Experts are needed to interpret the results, which may be meaningless to average database users.
e. Visualization of results
Visualization is helpful for viewing the output of data mining algorithms.
f. Large data sets
Algorithms designed for small data sets may run into problems with the large data sets associated with data mining.
g. High dimensionality
Not all attributes are needed to solve a given problem. Unneeded attributes may increase the complexity and decrease the efficiency of an algorithm.
h. Multimedia data
Most previous data mining algorithms were designed for traditional data types. Some newer algorithms target multimedia data.
i. Missing data
Missing data may be replaced with estimates. Missing data can lead to invalid results.
j. Irrelevant data
Some attributes in the database may be irrelevant to the data mining task being developed and need not be used.
k. Noisy data
Some data is invalid or incorrect. It should be corrected before the data mining application is run.
l. Changing data
Databases are not static, but data mining algorithms assume that the database is static. Algorithms have to be run again whenever the database changes.
m. Integration
Integrating data mining functions into traditional DBMS systems is a desirable goal.
n. Application
Determine how the information obtained from the data mining function will be used.

VIII. HOW DATA MINING WORKS
Data mining is a process also called knowledge discovery from databases. It involves statistics, machine learning, artificial intelligence, information retrieval and pattern recognition.

Figure-6 Working Principles of Data Mining (diagram labels: learning; collecting relevant data; model building; understanding of business; problem identification; business strategy and evaluation; action)

a. Modeling
Modeling builds a model on data from an existing situation where the answer is known and then applies the model to other situations where the answer is not known. People have been doing this for a long time. Data storage and communication are no longer a problem. Lots of information about a variety of situations where the answer is known is loaded.
Data mining software filters the characteristics of the data that go into the model. Once the model is built, it can be used in similar situations where the answer is not known.
b. Discovery
Discovery finds something new. Data mining tools sweep through databases and identify previously hidden patterns. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together.
c. Prediction
Prediction looks for the reason behind an outcome by finding a pattern that is associated with a very specific event or attribute.
d. Overfitting
This data mining term was first used in the statistical community.

IX. CONCLUSION
Data mining involves extracting useful rules or interesting patterns from historical data. There are many data mining tasks, each of which has many techniques. The no-free-lunch theorem applies: no single technique is suitable for all kinds of data in all domains. Hybrid techniques have sometimes been observed to perform better than pure ones. Data mining is a “decision support” process in which we search for patterns of information in data, using techniques such as classification, clustering, prediction, association and sequential patterns. Commercial, educational and scientific applications are increasingly dependent on these methodologies. Decision trees are a reliable and effective decision-making technique that provides high classification accuracy with a simple representation of the collected knowledge. They help experts to validate and classify the results and outcomes of tests and to analyze new symptoms of diseases based on data. Thus, data mining can play an important role in medicine, health care and disease prediction.
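
As a minimal sketch of the pattern discovery described in Section VIII under Discovery (finding products in retail sales data that are often purchased together), the following Python example counts how often pairs of items occur in the same transaction and reports the pairs as simple rules with their support and confidence. The transactions, item names, thresholds and the function name pair_rules are hypothetical and are not taken from the paper. The sketch is restricted to item pairs and is not a full association rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions; the item names are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        item_counts.update(basket)
        # sorted() gives every pair one canonical order, so the same two
        # items are always counted under the same key.
        pair_counts.update(combinations(sorted(basket), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        # Each pair can produce a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            # Fraction of transactions with lhs that also contain rhs.
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Highest confidence first; ties are broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


rules = pair_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Here support measures how common a pair is across all transactions, and confidence measures how often the second item appears given that the first one does. Raising either threshold yields fewer but stronger rules.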