Download references - WordPress.com

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
THE IMPLEMENTATION OF ROUGH SET METHOD FOR
ANALYZING INCOMPLETE NUMERICAL DATA
Hetty Rohayani. AH
Abstract
Rough set theory has been applied successfully in many fields. However, the
classical rough set model can only deal with complete and symbolic data sets.
Some researchers have proposed extended rough set models to handle incomplete
data while others proposed extensions of classical rough set models to deal with
numerical data. In this research, an extended rough set model, tolerance fuzzy
rough set model to deal with this type of data characterized with numerical
attributes and missing values, that is, incomplete numerical data. Discernibility
matrices and discernibility functions for incomplete numerical information
systems and incomplete numerical decision systems are defined to compute
reducts or relative reducts. Meanwhile, the relationship between the proposed
tolerance fuzzy rough set model and the tolerance rough set model is also
examined. It is shown that the tolerance fuzzy rough set model is an extension of
the tolerance rough set model. Finally, uncertainty measurement is also
investigated. It is suggested that the proposed tolerance fuzzy rough set model
provide an optional approach to incomplete numerical data.
Keywords: Fuzzy rough set, incomplete numerical data, data mining
INTRODUCTION
1.1 Background
The rough set theory, developed by Pawlak (1982), has emerged as a major
mathematical method to manage uncertainties from inexact, noisy and incomplete
information. It has been one of the focal research areas in artificial intelligence
since its advent. A rough set provides a representation of a given set using lower
and upper approximations when the available information is not sufficient for
determining the exact value of the set. The main objective of rough set analysis is
to synthesise the approximation of concepts from the acquired data. It has been
proven that rough set methods have enormous potential in dealing with
uncertainties in decision making. Rough sets have been applied in various fields,
including expert systems, machine learning, image processing, pattern recognition,
knowledge discovery and control systems.
1
The theory rough set successfully implemented in various sector, but
rough set model classical can only associated with the data complete and set of
data in symbolic form Jianhua (2013). Research by adopting the theory rough set
conducted in attribute numerical and the value of an attribute lost from Jerzy
(2011). In this research discussed how distinguish 2 (two) interprets of the value
of an attribute lost and the condition of the value of an attribute who ignored (lost
values “do not care” conditions) with use the mining LERS in the data mining,
with the approach probability and approach rough set, the results of the analysis
that with the approach probability no better than approach rough set. By using
testing wilcoxon the result of this research produce that there is no significant
difference.
Jerzy (2011) have done research on the theory rough set to handles data
containing attribute numerical and the data having been missing values
simultaneously. In this research, a theory of rough set that was expanded by using
tolerance fuzzy to handle types of data on with see attribute the data and values
lost or data that is not complete.
At first, the theory rough set used for numerical data that complete Jerzy
(2007) with all value the attribute of being specified. Lately the theory rough
extended to set numerical data that is not complete with a value of an attribute that
lost Jerzy (2003). The theory set introduced rough Pawlak (1982) in the field of
mathematics that is useful to handle information that is not clear and uncertain
and has successfully implemented in many areas Pawlak (1991).
In theory, rough set of data described is imperfect data using data estimates
set down and on, while function used is a function real and function of an
imaginary in theory. In addition, the theory rough set usually will start used in
analysis started from raw data, where of the theory in subjective hanging from
those opinions experts. In a table decision that uses settled with a method of rough
set usually containing attribute, in real life, data sets often incomplete, namely that
some the value of an attribute missing. In research, this has researchers raised
problems related to the decision of data that is not complete seen from the specific
data lost or the value of an attribute unknown.
2
1.2. Research Question
The problem to be analyzed as follows.
1. How to find the characteristic of numerical data incomplete?
2. How to find the condition and decision from an attribute for the complete
data?
1.3. Research Objectives
As for the objective of this research is:
1. To find the characteristic of numerical data incomplete.
2. To find the conditions and decisions of numerical data incomplete.
1.4. Literature Review
Data mining is a process of getting knowledge or patterns from a data
set (Pawlak, 1991). Data mining will solve the problem by analyzing the existing
data in the database. Data mining, often also called Knowledge-Discovery in
Databases (KDD) is an activity that includes the collection, use historical data to
find patterns of regularities, patterns of relationships in large data sets (Santoso,
2007). The output data mining can be used to improve decision making in the
future.
Many data mining functions that can be used. In certain cases the data
mining functions can be combined to address the problems encountered (Jamie,
2009). Here is a function of data mining in general (Ian, 2011):
1. Classification
The function of the Classification is to classify the target class in the
selected category.
2. Clustering
The function of clustering is to find a grouping of attributes into
segmentations based on similarity.
3. Association
3
The function of the association is to find the relationship between attributes
or item set, based on the number of items that appear and existing
association rule.
4. Regression
The function of the regression is almost similar to the classification. The
function of the regression is aimed to explore the prediction of an existing
pattern.
5. Forecasting
The function of forecasting is to forecast the future based on the trend that
has occurred in the past.
6. Sequence Analysis
The function of the sequence analysis is tok for patterns a sequence of the
chain of events.
7. Deviation Analysis
The function of the deviation analysis is to look for the rare occurrence
that is very different from the normal state (abnormal occurrences).
Rough Set theory is a mathematical technique developed by Pawlack
(1991). This technique is used to address the issue of uncertainty, missing of data,
incompleted data, data inconsistency, imprecision and vagueness in the same
practice Artificial Intelligence (AI). Rough Set is an efficient technique for
Knowledge Discovery in Databases (KDD) and data mining processes. In general,
rough set theory has been used in many applications such as medicine,
pharmacology, business, banking, engineering design, image processing and
decision analysis. The rough set is an efficient technique for the KDD process and
data mining. In rough set data can be represented in two forms, namely:
incomplete numerical information system and incomplete numerical decision
system.
METHODOLOGY
The research methods for this study were divided into three phases as
shown in Figure 1.
4
Fase I :
Start
SL1 : Literature Review about
research rough set
Research
Problem
SL2 : Theories
& concept in
rough set of
numerical data
Fase II :
SL3 : Theories
Systematic
Literatur
Review
Analysis of the concept &
theory
Model Development
Fase III :
Evaluating & Testing
Validated Models
Conclusion/full report
End
Figure 1. Research Framework
5
& concept
rough set of
innumerical
data.
EXPECTED RESULT
In this research, a fuzzy rough set approaches is proposed to
incomplete numerical information systems which contain both missing values
and numerical attributes simultaneously. The proposed model can be viewed
as an extension of the tolerance relation based rough set model. It is will be
proved that the proposed discernibility matrices and discernibility functions
can be used to compute the reduce in an incomplete numerical information
system or relative reducts in an incomplete numerical decision system. It is
will be also proved that the proposed model is an extension of tolerance rough
set model. Uncertainty measurement is also investigated.
6
I. RESEARCH SCHEDULE
Tabel 1. Milestone and dates of research activity
Project Activity
2016
2017
11 12 1 2 3 4 5 6 7 8 9 10
1.Literature review about research rough set in
numerical data
Completion of literature review (April 2017)
2. Analysis of the concept and theory incomplete
numerical data
Completion of concept and theory (October 2017)
3. Model development rough set incomplete numerical
data
Completion of model development (August 2018)
4. Evaluating, testing, validation and full report
Completion of Project Result (February 2019)
Journal publication about research topic
0
11
12
2018
1 2 3 4 5 6 7 8 9
10
11
12
2019
1 2
REFERENCES
IAN H, Witten, f, E. 2011. Data Mining : Practical Machine Learning Tools and
Techniques (3.ed). United States of America: Morgan Kaufmann.
JAMIE, Maclenanan, Tang ZHAOHUI, and Crivat BONGDAN. 2009. Data Mining
with Microsoft SQL Server. Wiley.
JERZY W, Grzymala - Buse. 2003. Rough set strategies to data with missing
attribute values. Workshop Notes, Foundations and New Directions of Data
Mining. In: the 3-rd International Conference on Data Mining. Melbourne, FL,
USA: IEEE, pp.56-63.
JERZY W, Grzymala - Buse. 2004. Characteristic relations for incomplete data:
A generalization of the indiscernibility relation. In: Proceedings of the
RSCTC’2004, Fourth International Conference on Rough Sets and Current
Trends in Computing. Uppsala, Sweden: Springer-Verlag, pp.244-253.
JERZY W, Grzymala - Buse. 2004. Data with missing attribute values:
Generalization of idiscernibility relation and rule induction. Transactions on
Rough Sets. IEEE. vol. 1, pp.78–95.
JERZY W, Grzymala - Buse. 2007. Mining Numerical Data—A Rough Set
Approach. In: International Conference, RSEISP 2007. Warsaw, Poland:
Springer Berlin, p-ISBN9: 78-3-540-73450-5,e-ISBN 978-3-540-73451-2,
pp.12-21.
JERZY W, Grzymala-Buse and Zdzislaw S, Hippe. Nov. 2011. Mining Data with
Numerical Attributes and Missing Attribute Values-A Rough Set Approach.
IEEE International Conference on Granular Computing., pp.214-219.
JERZY W, Grzymala and Y,Wang A. 1997. Modified algorithms LEM1 and
LEM2 for rule induction from data with missing attribute values. In: Proc. of
0
the Fifth International Workshop on Rough Sets and Soft Computing (RSSC’97)
at the Third Joint Conference on Information Sciences (JCIS’97). Research
Triangle Park, NC, pp.69-72.
JERZY W, Grzymala, BUSE, and M.HU. 2000. A comparison of several
approaches to missing attribute values in data mining. In: Proceedings of the
Second International Conference on Rough Sets and Current Trends in
Computing RSCTC’2000. Banff, Canada, pp.340-347.
JIANHUA, Dai. 2013. Rough set approach to numerical data. Journal
Information Sciences. Vol 241(ISSN:0020-0255), pp.43-57.
M, Kryszkiewicz. 1995. Rough set approach to incomplete information
systems. In: Proceedings of the Second Annual Joint Conference on Information
Sciences. Wrightsville Beach, NC, pp.194-197.
M, Kryszkiewicz. 1998. Information Sciences. In: Rough set approach to
incomplete information systems, pp.39-49.
SANTOSO, B. 2007. Data Mining Teknik Pemanfaatan Data untuk Keperluan
Bisnis. Yogyakarta: Graha Ilmu.
Z, Pawlak. 1982. Rough Sets. International Journal of Computer and
Information Sciences 11., pp.341-356.
Z, Pawlak. 1991. Rough Sets. In: Theoretical Aspects of Reasioning about Data,
Dordrecht: Kluwer Academic Publisher.
1