Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
THE IMPLEMENTATION OF ROUGH SET METHOD FOR ANALYZING INCOMPLETE NUMERICAL DATA Hetty Rohayani. AH Abstract Rough set theory has been applied successfully in many fields. However, the classical rough set model can only deal with complete and symbolic data sets. Some researchers have proposed extended rough set models to handle incomplete data while others proposed extensions of classical rough set models to deal with numerical data. In this research, an extended rough set model, tolerance fuzzy rough set model to deal with this type of data characterized with numerical attributes and missing values, that is, incomplete numerical data. Discernibility matrices and discernibility functions for incomplete numerical information systems and incomplete numerical decision systems are defined to compute reducts or relative reducts. Meanwhile, the relationship between the proposed tolerance fuzzy rough set model and the tolerance rough set model is also examined. It is shown that the tolerance fuzzy rough set model is an extension of the tolerance rough set model. Finally, uncertainty measurement is also investigated. It is suggested that the proposed tolerance fuzzy rough set model provide an optional approach to incomplete numerical data. Keywords: Fuzzy rough set, incomplete numerical data, data mining INTRODUCTION 1.1 Background The rough set theory, developed by Pawlak (1982), has emerged as a major mathematical method to manage uncertainties from inexact, noisy and incomplete information. It has been one of the focal research areas in artificial intelligence since its advent. A rough set provides a representation of a given set using lower and upper approximations when the available information is not sufficient for determining the exact value of the set. The main objective of rough set analysis is to synthesise the approximation of concepts from the acquired data. It has been proven that rough set methods have enormous potential in dealing with uncertainties in decision making. Rough sets have been applied in various fields, including expert systems, machine learning, image processing, pattern recognition, knowledge discovery and control systems. 1 The theory rough set successfully implemented in various sector, but rough set model classical can only associated with the data complete and set of data in symbolic form Jianhua (2013). Research by adopting the theory rough set conducted in attribute numerical and the value of an attribute lost from Jerzy (2011). In this research discussed how distinguish 2 (two) interprets of the value of an attribute lost and the condition of the value of an attribute who ignored (lost values “do not care” conditions) with use the mining LERS in the data mining, with the approach probability and approach rough set, the results of the analysis that with the approach probability no better than approach rough set. By using testing wilcoxon the result of this research produce that there is no significant difference. Jerzy (2011) have done research on the theory rough set to handles data containing attribute numerical and the data having been missing values simultaneously. In this research, a theory of rough set that was expanded by using tolerance fuzzy to handle types of data on with see attribute the data and values lost or data that is not complete. At first, the theory rough set used for numerical data that complete Jerzy (2007) with all value the attribute of being specified. Lately the theory rough extended to set numerical data that is not complete with a value of an attribute that lost Jerzy (2003). The theory set introduced rough Pawlak (1982) in the field of mathematics that is useful to handle information that is not clear and uncertain and has successfully implemented in many areas Pawlak (1991). In theory, rough set of data described is imperfect data using data estimates set down and on, while function used is a function real and function of an imaginary in theory. In addition, the theory rough set usually will start used in analysis started from raw data, where of the theory in subjective hanging from those opinions experts. In a table decision that uses settled with a method of rough set usually containing attribute, in real life, data sets often incomplete, namely that some the value of an attribute missing. In research, this has researchers raised problems related to the decision of data that is not complete seen from the specific data lost or the value of an attribute unknown. 2 1.2. Research Question The problem to be analyzed as follows. 1. How to find the characteristic of numerical data incomplete? 2. How to find the condition and decision from an attribute for the complete data? 1.3. Research Objectives As for the objective of this research is: 1. To find the characteristic of numerical data incomplete. 2. To find the conditions and decisions of numerical data incomplete. 1.4. Literature Review Data mining is a process of getting knowledge or patterns from a data set (Pawlak, 1991). Data mining will solve the problem by analyzing the existing data in the database. Data mining, often also called Knowledge-Discovery in Databases (KDD) is an activity that includes the collection, use historical data to find patterns of regularities, patterns of relationships in large data sets (Santoso, 2007). The output data mining can be used to improve decision making in the future. Many data mining functions that can be used. In certain cases the data mining functions can be combined to address the problems encountered (Jamie, 2009). Here is a function of data mining in general (Ian, 2011): 1. Classification The function of the Classification is to classify the target class in the selected category. 2. Clustering The function of clustering is to find a grouping of attributes into segmentations based on similarity. 3. Association 3 The function of the association is to find the relationship between attributes or item set, based on the number of items that appear and existing association rule. 4. Regression The function of the regression is almost similar to the classification. The function of the regression is aimed to explore the prediction of an existing pattern. 5. Forecasting The function of forecasting is to forecast the future based on the trend that has occurred in the past. 6. Sequence Analysis The function of the sequence analysis is tok for patterns a sequence of the chain of events. 7. Deviation Analysis The function of the deviation analysis is to look for the rare occurrence that is very different from the normal state (abnormal occurrences). Rough Set theory is a mathematical technique developed by Pawlack (1991). This technique is used to address the issue of uncertainty, missing of data, incompleted data, data inconsistency, imprecision and vagueness in the same practice Artificial Intelligence (AI). Rough Set is an efficient technique for Knowledge Discovery in Databases (KDD) and data mining processes. In general, rough set theory has been used in many applications such as medicine, pharmacology, business, banking, engineering design, image processing and decision analysis. The rough set is an efficient technique for the KDD process and data mining. In rough set data can be represented in two forms, namely: incomplete numerical information system and incomplete numerical decision system. METHODOLOGY The research methods for this study were divided into three phases as shown in Figure 1. 4 Fase I : Start SL1 : Literature Review about research rough set Research Problem SL2 : Theories & concept in rough set of numerical data Fase II : SL3 : Theories Systematic Literatur Review Analysis of the concept & theory Model Development Fase III : Evaluating & Testing Validated Models Conclusion/full report End Figure 1. Research Framework 5 & concept rough set of innumerical data. EXPECTED RESULT In this research, a fuzzy rough set approaches is proposed to incomplete numerical information systems which contain both missing values and numerical attributes simultaneously. The proposed model can be viewed as an extension of the tolerance relation based rough set model. It is will be proved that the proposed discernibility matrices and discernibility functions can be used to compute the reduce in an incomplete numerical information system or relative reducts in an incomplete numerical decision system. It is will be also proved that the proposed model is an extension of tolerance rough set model. Uncertainty measurement is also investigated. 6 I. RESEARCH SCHEDULE Tabel 1. Milestone and dates of research activity Project Activity 2016 2017 11 12 1 2 3 4 5 6 7 8 9 10 1.Literature review about research rough set in numerical data Completion of literature review (April 2017) 2. Analysis of the concept and theory incomplete numerical data Completion of concept and theory (October 2017) 3. Model development rough set incomplete numerical data Completion of model development (August 2018) 4. Evaluating, testing, validation and full report Completion of Project Result (February 2019) Journal publication about research topic 0 11 12 2018 1 2 3 4 5 6 7 8 9 10 11 12 2019 1 2 REFERENCES IAN H, Witten, f, E. 2011. Data Mining : Practical Machine Learning Tools and Techniques (3.ed). United States of America: Morgan Kaufmann. JAMIE, Maclenanan, Tang ZHAOHUI, and Crivat BONGDAN. 2009. Data Mining with Microsoft SQL Server. Wiley. JERZY W, Grzymala - Buse. 2003. Rough set strategies to data with missing attribute values. Workshop Notes, Foundations and New Directions of Data Mining. In: the 3-rd International Conference on Data Mining. Melbourne, FL, USA: IEEE, pp.56-63. JERZY W, Grzymala - Buse. 2004. Characteristic relations for incomplete data: A generalization of the indiscernibility relation. In: Proceedings of the RSCTC’2004, Fourth International Conference on Rough Sets and Current Trends in Computing. Uppsala, Sweden: Springer-Verlag, pp.244-253. JERZY W, Grzymala - Buse. 2004. Data with missing attribute values: Generalization of idiscernibility relation and rule induction. Transactions on Rough Sets. IEEE. vol. 1, pp.78–95. JERZY W, Grzymala - Buse. 2007. Mining Numerical Data—A Rough Set Approach. In: International Conference, RSEISP 2007. Warsaw, Poland: Springer Berlin, p-ISBN9: 78-3-540-73450-5,e-ISBN 978-3-540-73451-2, pp.12-21. JERZY W, Grzymala-Buse and Zdzislaw S, Hippe. Nov. 2011. Mining Data with Numerical Attributes and Missing Attribute Values-A Rough Set Approach. IEEE International Conference on Granular Computing., pp.214-219. JERZY W, Grzymala and Y,Wang A. 1997. Modified algorithms LEM1 and LEM2 for rule induction from data with missing attribute values. In: Proc. of 0 the Fifth International Workshop on Rough Sets and Soft Computing (RSSC’97) at the Third Joint Conference on Information Sciences (JCIS’97). Research Triangle Park, NC, pp.69-72. JERZY W, Grzymala, BUSE, and M.HU. 2000. A comparison of several approaches to missing attribute values in data mining. In: Proceedings of the Second International Conference on Rough Sets and Current Trends in Computing RSCTC’2000. Banff, Canada, pp.340-347. JIANHUA, Dai. 2013. Rough set approach to numerical data. Journal Information Sciences. Vol 241(ISSN:0020-0255), pp.43-57. M, Kryszkiewicz. 1995. Rough set approach to incomplete information systems. In: Proceedings of the Second Annual Joint Conference on Information Sciences. Wrightsville Beach, NC, pp.194-197. M, Kryszkiewicz. 1998. Information Sciences. In: Rough set approach to incomplete information systems, pp.39-49. SANTOSO, B. 2007. Data Mining Teknik Pemanfaatan Data untuk Keperluan Bisnis. Yogyakarta: Graha Ilmu. Z, Pawlak. 1982. Rough Sets. International Journal of Computer and Information Sciences 11., pp.341-356. Z, Pawlak. 1991. Rough Sets. In: Theoretical Aspects of Reasioning about Data, Dordrecht: Kluwer Academic Publisher. 1