Constructing a Fuzzy Decision Tree by Integrating Fuzzy
Sets and Entropy
TIEN-CHIN WANG (王天津) 1
HSIEN-DA LEE (李賢達) 1,2
1 Department of Information Management, I-Shou University
[email protected]
2 Fortune Institute of Technology, Kaohsiung, Taiwan
[email protected]
Abstract: - Decision tree induction is one of the common approaches for extracting knowledge from sets of
feature-based examples. In the real world, much data occurs in a fuzzy and uncertain form, and decision trees
must be able to deal with such fuzzy data. This paper presents a tree construction procedure to build a fuzzy
decision tree from a collection of fuzzy data by integrating fuzzy set theory and entropy. It proposes a fuzzy
decision tree induction method for fuzzy data whose numeric attributes can be represented by fuzzy numbers,
interval values, or crisp values, whose nominal attributes are represented by crisp nominal values, and whose
class labels carry confidence factors. An experimental result is also presented to show the applicability of the
proposed method.
Key-Words: Fuzzy Decision Tree, Fuzzy Sets, Entropy, Information Gain, Classification, Data Mining
1 Introduction
Decision trees have been widely and successfully
used in machine learning. More recently, fuzzy
representations have been combined with decision
trees. Many methods have been proposed to construct
decision trees from collections of data. Due to
observation error, uncertainty, and so on, much of the
data collected in the real world is obtained in fuzzy form.
Fuzzy decision trees treat features as fuzzy variables
and also yield simple decision trees. Moreover, the
use of fuzzy sets is expected to deal with uncertainty
due to noise and imprecision. Research on fuzzy
decision tree induction for fuzzy data has not yet been
performed sufficiently. This paper is concerned with a
fuzzy decision tree induction method for such fuzzy data.
It proposes a tree-building procedure to construct a fuzzy
decision tree from a collection of fuzzy data.
Decision trees and decision rules are data-mining
methodologies applied in many real-world
applications as a powerful solution to the classification
problem [1]. Classification is a process of learning a
function that maps a data item into one of several
predefined classes. Every classification method based on
inductive-learning algorithms is given as input a set
of samples that consist of vectors of attribute values
and a corresponding class. For example, a simple
classification might group students into three groups
based on their scores: (1) those whose scores are above 90,
(2) those whose scores are between 70 and 90, and
(3) those whose scores are below 70.
1.1 Fuzzy set theory
Fuzzy set theory was first proposed by Zadeh to
represent and manipulate data and information that
possess non-statistical uncertainty. Fuzzy set theory is
primarily concerned with quantifying and reasoning
using natural language, in which words can have
ambiguous meanings. It can be thought of as an
extension of traditional crisp sets, in which each
element must either be in or not in a set. Fuzzy sets
are defined on a non-fuzzy universe of discourse,
which is an ordinary set. A fuzzy set F of a universe
of discourse U is characterized by a membership
function µ_F(x) which assigns to every element
x ∈ U a membership degree µ_F(x) ∈ [0, 1]. An
element x ∈ U is said to be in a fuzzy set F if and
only if µ_F(x) > 0 and to be a full member if and
only if µ_F(x) = 1 [5]. Membership functions can
either be chosen by the user arbitrarily, based on the
user's experience, or they can be designed by using
optimization procedures [6][7]. Typically, a fuzzy
subset A can be represented as
A = {µ_A(x_1)/x_1, µ_A(x_2)/x_2, ..., µ_A(x_n)/x_n}
where the separating symbol / is used to associate the
membership value with its coordinate on the horizontal
axis. For example, in Fig. 1, let F = "integers close to 10";
then one choice for µ_F(x) is expressed as
F = 0.0/8 + 0.5/9 + 1/10 + 0.5/11 + 0.0/12
Fig. 1. Triangular membership function expression for a number close to 10
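The triangular shape in Fig. 1 can be written directly as a function. The following Python sketch is only an illustration of this idea: it evaluates a triangular membership function, with breakpoints 8, 10, and 12 assumed from the figure, at the integers near 10 and reproduces the discrete fuzzy set F given above.

    def triangular(x, a, b, c):
        """Triangular membership with feet at a and c and peak at b."""
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)

    # Evaluate at the integers near 10 (breakpoints assumed from Fig. 1).
    F = {x: triangular(x, 8, 10, 12) for x in range(8, 13)}
    # {8: 0.0, 9: 0.5, 10: 1.0, 11: 0.5, 12: 0.0}, matching F above.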
1.2 Fuzzy Decision Trees
A decision tree [4][8] is a formalism for
expressing mappings from attribute values to classes.
It consists of tests or attribute nodes linked to two
or more subtrees, and leaves or decision nodes labeled
with a class which indicates the decision. The main
advantage of the decision-tree approach is that it
visualizes the solution: it is easy to follow any path through the
tree. Relationships discovered by a decision tree can
be expressed as a set of rules, which can then be used
in developing an expert system. A decision tree
model employs a recursive divide–and-conquer
strategy to divide the data set into partitions so that all
of the records in a partition have the same class
label [9]. In classical decision trees, nodes make a
data item follow down only one branch, since the data
satisfies that branch's condition, and the data finally
arrives at only one leaf node. In tree-structured representations, a set of
data is represented by a node, and the entire data set is
represented as a root node. When a split is made,
several child nodes, which correspond to partitioned
data subsets, are formed. If a node is not to be split
any further, it is called a leaf; otherwise, it is an
internal node. Decision trees classify data by sorting
them down the tree from the root to the leaf nodes.
Typical decision tree induction algorithms include ID3
and CART [10][11]. Decision trees were popularized by
Quinlan with the ID3 algorithm. Systems based on ID3
work well in symbolic domains, and a large variety of
extensions to the basic ID3 algorithm have been developed
by different researchers. ID3 is designed to deal with
symbolic domain data, and each data item finally arrives
at only one leaf node. The algorithm is applied recursively
to each child node until all samples at a node belong to
one class. Fuzzy decision trees allow data to follow down
multiple branches of a node simultaneously, with different
satisfaction degrees ranging over [0, 1] [12]. CART is
designed to deal with continuous numeric domain data.
A number of variants of these algorithms have been
developed; the fuzzy decision tree is one of them.
Fuzzy decision trees attempt to combine elements
of symbolic and sub-symbolic approaches. Fuzzy
sets and fuzzy logic allow modeling language-related
uncertainties, while providing a symbolic framework
for knowledge comprehensibility. Fuzzy decision
trees differ from traditional crisp decision trees in
three respects [10]: (1) They use splitting criteria
based on fuzzy restrictions. (2) Their inference
procedures are different. (3) The fuzzy sets
representing the data have to be defined.
Fuzzy decision tree induction has two major
components: a procedure for fuzzy decision tree
building and an inference procedure for decision
making [13]. To apply an ID3-like procedure to fuzzy
decision tree construction, the following components
must be developed: a method for partitioning the
attribute value space, a method for selecting the
branching attribute, a branching test method to decide
to which degree data follow down the branches of a node,
and a leaf labeling method to determine the classes for
which the leaf nodes stand.
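As one possible illustration of these components, the sketch below shows a hypothetical Python structure for a fuzzy decision tree node (our own assumption, not a structure defined in this paper): an internal node stores the branching attribute and one subtree per fuzzy term, a leaf stores a class label with a confidence factor, and each node records the membership degree with which every example reaches it.

    from dataclasses import dataclass, field
    from typing import Dict, Optional

    @dataclass
    class FuzzyNode:
        # Attribute tested at this node (None for a leaf).
        attribute: Optional[str] = None
        # One child per fuzzy term, e.g. {"Low": ..., "Middle": ..., "High": ...}.
        children: Dict[str, "FuzzyNode"] = field(default_factory=dict)
        # For a leaf: class label and the confidence attached to it.
        label: Optional[str] = None
        confidence: float = 0.0
        # Example index -> membership degree in [0, 1] with which it reaches this node.
        memberships: Dict[int, float] = field(default_factory=dict)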
1.3 Entropy Heuristics
Attribute selection in the ID3 and C4.5 algorithms is
based on minimizing an information entropy measure
applied to the examples at a node [1]. The entropy
measure is used to calculate the information gain,
which reflects the quality of an attribute as the
branching attribute. The attribute-selection part of
ID3 is based on the assumption that the complexity of
the decision tree is strongly related to the amount of
information conveyed by the value of the given
attribute. An information-based heuristic selects the
attribute providing the highest information gain.
data set with some discrete-valued condition
attributes and one discrete-valued decision attributes
can be presented in the form of knowledge
representation
system
,
J = (U , C ∪ D )
U = {u1 , u 2 ...., u s } is the set of data samples,
C = {c1 , c 2 ...., c n } is the set of condition attributes
where
and D = {d } is the one-elemental set with the
decision attribute or class label attribute. Suppose
this class label attribute has m distinct values
d i (for i=l, ..,m), let si
d
be the number of samples of U in class i .The
defining m distinct classes ,
expected information or entropy need to classify a
given sample is given by
m
I ( s1 ,...s m ) = −∑ pi log 2 pi
In this section, an example is given to illustrate
the proposed fuzzy decision tree algorithm. This
sample is intended to show fuzzy decision tree
algorithm can be used to evaluate student admission
for graduate school. The data set includes 10
applicants, as shown in Table 1
(1)
i =1
Table 1.The data set of students
Where p i is the probability that an arbitrary
sample belongs to class si and is estimated by
summation those samples’ entropy (m is the number
of all samples). Let attribute ci have v distinct value
{A1 , A2 ...., Av } , attribute ci can be used to partition
U into v subsets {S1 , S 2 ...., S v } where S i (j=1,..,v)
contains those samples in U that have value A j of ci .
Let s ij be the number of samples of class d i in a
subset S j , the entropy of attribute ci is given by
v
E (c i ) = ∑
s1 j + ...s mj
I ( s1 j ,...s mj )
s
s1 j + .... + s mj
(2)
j =1
The term
acts as the weight of the
s
jth subset and is the number of samples in the subset
divided by the total number of samples. The smaller
the entropy value, the greater the purity of the subset
partitions. Thus the attribute that leads to the largest
information gain, is selected as the branching
attribute. For a given subset S j ,the information gain
is expressed as
Student
no.
1
2
3
4
5
6
7
8
9
10
GPA
ETS
3.2
2.8
2.7
3.6
2.1
2.6
2.8
2.3
3.6
3.5
75
52
69
86
63
91
63
77
68
90
WE
Fair
Excellent
Fair
Excellent
Fair
Fair
Excellent
Fair
Fair
Fair
Ref.
Yes
N/A
Yes
Yes
Yes
N/A
Yes
Yes
Yes
N/A
(3)
i =1
Where
pij =
s ij
Sj
(
S j is the number of
samples in the subset S j ) and is the probability that
a sample in S j belongs to class d i . So information
gain of attribute ci is given by
Gain(ci ) = I ( s1 j ,...s mj ) − E (ci )
(4)
We compute the information gain of each
condition attribute, the attribute with the highest
information gain is the most informative and the most
discriminating attribute of the given set.
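Formulas (1)-(4) translate directly into code. The following Python sketch is only illustrative (the paper gives no implementation): it computes the entropy of a class distribution and the information gain of a single nominal condition attribute from parallel lists of attribute values and class labels.

    import math
    from collections import Counter

    def entropy(class_counts):
        """Formulas (1)/(3): I(s_1, ..., s_m) = -sum p_i log2 p_i."""
        total = sum(class_counts)
        return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)

    def information_gain(values, classes):
        """Formula (4): Gain(c_i) = I(s_1, ..., s_m) - E(c_i) for one condition attribute."""
        total = len(classes)
        base = entropy(Counter(classes).values())          # I(s_1, ..., s_m)
        expected = 0.0
        for value in set(values):                          # one subset S_j per attribute value A_j
            subset = [cls for v, cls in zip(values, classes) if v == value]
            weight = len(subset) / total                   # (s_1j + ... + s_mj) / s
            expected += weight * entropy(Counter(subset).values())   # formula (2)
        return base - expected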
2 Experiment
In this section, an example is given to illustrate the
proposed fuzzy decision tree algorithm. The example is
intended to show that the fuzzy decision tree algorithm
can be used to evaluate student admission to graduate
school. The data set includes 10 applicants, as shown
in Table 1.

Table 1. The data set of students

Student no.   GPA   ETS   WE          Ref.   Admission
1             3.2   75    Fair        Yes    Yes
2             2.8   52    Excellent   N/A    No
3             2.7   69    Fair        Yes    No
4             3.6   86    Excellent   Yes    Yes
5             2.1   63    Fair        Yes    No
6             2.6   91    Fair        N/A    Yes
7             2.8   63    Excellent   Yes    No
8             2.3   77    Fair        Yes    No
9             3.6   68    Fair        Yes    Yes
10            3.5   90    Fair        N/A    Yes

Each case consists of four condition attributes:
grade point average (denoted GPA), entrance test
score (denoted ETS), working experience (denoted
WE), and reference (denoted Ref).
In this example, triangular membership functions
are used to represent fuzzy sets because of their
simplicity, easy comprehension, and computational
efficiency. Membership functions are usually
predefined by experienced experts; they can also be
derived through automatic adjustment [14].
As shown in Fig. 2 and Fig. 3, the GPA and ETS
attributes each have three fuzzy regions: Low, Middle,
and High. Thus, three fuzzy membership values are
produced for each score according to the predefined
membership functions.
Fig. 2. The membership function for
examinees’ GPAs
Fig. 3. The membership function for
examinees’ scores
3 Problem Solution
For the experimental data in Table 1, the
decision-tree construction algorithm proceeds as
described in the following subsections.
3.1 Calculate Information Gain
STEP 1. To represent a continuous fuzzy set, we
need to express it as a function and then map the
elements of the set to their degrees of membership [3].
Transform the quantitative values of each examinee's
scores into fuzzy sets. Take the entrance test score
(ETS) as an example: the score "85" can be converted
into the fuzzy set (0.0/Low + 0.0/Middle + 0.5/High)
using the predefined membership functions in Fig. 3.
The transformation procedure is repeated for the
other scores. The result is shown in Table 2.
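To make STEP 1 concrete, the sketch below fuzzifies an ETS score into the regions Low, Middle, and High with triangular membership functions. The breakpoints used here are assumed for illustration only, since the paper defines the membership functions only graphically in Fig. 3, so the resulting degrees do not necessarily match those quoted in the text.

    def triangular(x, a, b, c):
        # Triangular membership with feet at a and c and peak at b.
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    # Hypothetical breakpoints for the three ETS regions (not taken from Fig. 3).
    def fuzzify_ets(score):
        return {
            "Low":    triangular(score, 30, 50, 70),
            "Middle": triangular(score, 50, 70, 90),
            "High":   triangular(score, 70, 90, 110),
        }

    print(fuzzify_ets(85))  # {'Low': 0.0, 'Middle': 0.25, 'High': 0.75} with these assumed breakpoints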
STEP 2. Form a knowledge representation system
J = (U, C ∪ D), U = {1, ..., 10}, C = {GPA, ETS, WE, Ref.},
D = {Admission}. The class label attribute Admission has
two distinct values {yes, no}, so there are two distinct
classes (m = 2). Let class d_1 represent yes and class d_2
represent no. There are 5 samples of class yes and 5
samples of class no, so by formula (1)

I(s_1, s_2) = -(5/10) log_2(5/10) - (5/10) log_2(5/10) = 1

STEP 3. Compute the entropy of each attribute.
Attribute GPA has three distinct values
{High, Middle, Low}, so U can be partitioned into three
subsets {S_1, S_2, S_3}.

For GPA = "High": s_11 = 3, s_21 = 0, and by formula (3)
I(s_11, s_21) = -(3/3) log_2(3/3) - 0 = 0

For GPA = "Middle": s_12 = 2, s_22 = 3, and by formula (3)
I(s_12, s_22) = -(2/5) log_2(2/5) - (3/5) log_2(3/5) = 0.971

For GPA = "Low": s_13 = 0, s_23 = 2, and by formula (3)
I(s_13, s_23) = 0 - (2/2) log_2(2/2) = 0

By formula (2),
E(GPA) = (3/10) I(s_11, s_21) + (5/10) I(s_12, s_22) + (2/10) I(s_13, s_23) = 0.485

and by formula (4),
Gain(GPA) = I(s_1, s_2) - E(GPA) = 0.514

STEP 4. In the same way as in STEP 3, compute
Gain(ETS) = 0.6, Gain(WE) = 0.3389, and Gain(Ref) = 0.05.
Since ETS has the highest information gain among the
four attributes, ETS is selected as the attribute on
which to split the tree.
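As an illustrative check of STEPs 2 to 4, the information_gain sketch from Section 1.3 can be applied to the fuzzified values of Table 2:

    # Assumes the entropy/information_gain sketch from Section 1.3 is available.
    gpa       = ["Middle", "Middle", "Middle", "High", "Low", "Middle", "Middle", "Low", "High", "High"]
    ets       = ["Middle", "Low", "Middle", "High", "Low", "High", "Low", "Middle", "Middle", "High"]
    admission = ["Yes", "No", "No", "Yes", "No", "Yes", "No", "No", "Yes", "Yes"]

    print(round(information_gain(gpa, admission), 3))  # about 0.515; the paper quotes 0.514
    print(round(information_gain(ets, admission), 3))  # 0.6, the highest gain, so ETS is chosen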
Table 2. The data set of students in fuzzy form

no.   GPA      ETS      WE          Ref.   Admission
1     Middle   Middle   Fair        Yes    Yes
2     Middle   Low      Excellent   N/A    No
3     Middle   Middle   Fair        Yes    No
4     High     High     Excellent   Yes    Yes
5     Low      Low      Fair        Yes    No
6     Middle   High     Fair        N/A    Yes
7     Middle   Low      Excellent   Yes    No
8     Low      Middle   Fair        Yes    No
9     High     Middle   Fair        Yes    Yes
10    High     High     Fair        N/A    Yes

3.2 Constructing a Decision Tree
We use the selected condition attribute, ETS, to
form the decision tree. We obtain the following
equivalence classes:
high: {4, 6, 10}   middle: {1, 3, 8, 9}   low: {2, 5, 7}
The subset for middle, {1, 3, 8, 9}, needs to be
split further. Following the algorithm described above,
the attribute GPA has the highest information gain and
is used to split this node. The complete decision tree
is shown in Fig. 4.
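The equivalence classes can be formed with a simple grouping of student numbers by ETS value, as in the following illustrative sketch:

    from collections import defaultdict

    students = list(range(1, 11))
    ets = ["Middle", "Low", "Middle", "High", "Low", "High", "Low", "Middle", "Middle", "High"]

    groups = defaultdict(list)
    for no, value in zip(students, ets):
        groups[value].append(no)

    print(dict(groups))  # {'Middle': [1, 3, 8, 9], 'Low': [2, 5, 7], 'High': [4, 6, 10]}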
Fig. 4. Decision tree based on information gain. The root node (samples 1-10) tests ETS: the high branch leads to leaf {4, 6, 10} labeled yes, the low branch to leaf {2, 5, 7} labeled no, and the middle branch to node {1, 3, 8, 9}, which tests GPA; there, the high branch leads to leaf {9} labeled yes, the low branch to leaf {8} labeled no, and the middle branch leaves {1, 3} undecided.
3.3 Extract classification rules
Data classification is an important data mining
task [2] that tries to identify common characteristics
in a set of N objects contained in a database and to
categorize them into different groups. We extract
classification IF-THEN rules from the equivalence
classes. For the equivalence class {4, 6, 10}, all
samples have identical attribute values:
ETS=high, Admission=yes
So we use the condition attribute value (ETS=high) as
the rule antecedent and the class label attribute value
(Admission=yes) as the rule consequent, obtaining the
following classification rule:
IF ETS="high" THEN Admission="yes"
The other classification rules can be extracted in the
same manner. We obtain the following rules:
1. IF ETS=”high” THEN Admission=”yes”
2. IF ETS=”low” THEN Admission=”no”
3. IF ETS=”middle” AND GPA=”high” THEN
Admission=”yes”
4. IF ETS=”middle” AND GPA=”low” THEN
Admission=”no”
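These rules can be applied directly as a small rule-based classifier. The sketch below is only an illustration: it encodes the four extracted rules and leaves a case undecided when none of them applies, mirroring the unresolved leaf {1, 3} in Fig. 4.

    def classify(ets, gpa):
        # Rules 1-4 extracted above; returns None where the tree leaves the case undecided.
        if ets == "high":
            return "yes"
        if ets == "low":
            return "no"
        if ets == "middle" and gpa == "high":
            return "yes"
        if ets == "middle" and gpa == "low":
            return "no"
        return None

    print(classify("middle", "high"))    # yes
    print(classify("middle", "middle"))  # None (undecided, like students 1 and 3)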
4 Conclusion
This paper is concerned with fuzzy sets and
decision trees. We present a fuzzy decision tree model
based on fuzzy set theory and information theory. It
proposes a fuzzy decision tree induction method for
fuzzy data whose numeric attributes can be represented
by fuzzy numbers, interval values, or crisp values,
whose nominal attributes are represented by crisp
nominal values, and whose class labels carry confidence
factors. An example is used to demonstrate its validity.
First, we applied fuzzy set theory to transform
real-world data into fuzzy linguistic forms. Second, we
used information theory to construct a decision tree.
Finding the best split point and performing the split
are the main tasks in a decision tree induction method.
Through the integration of fuzzy set theory and
information theory, classification tasks originally
thought too difficult or complex become possible. The
method provides an alternative for evaluating the best
possible candidates.
References:
[1] M. Kantardzic, Data Mining: Concepts, Models,
Methods, and Algorithms, Wiley, 2003.
[2] U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth,
"From Data Mining to Knowledge Discovery," in
Advances in Knowledge Discovery and Data Mining,
AAAI/MIT Press, 1996.
[3] M. Negnevitsky, Artificial Intelligence,
Addison-Wesley, 2002.
[4] S. J. Russell and P. Norvig, Artificial
Intelligence: A Modern Approach, Prentice-Hall,
Englewood Cliffs, NJ, 1995.
[5] H. J. Zimmermann, Fuzzy Set Theory and Its
Applications, Kluwer Academic Publishers, 1991.
[6] J.-S. R. Jang, "Self-Learning Fuzzy Controllers
Based on Temporal Back-Propagation," IEEE Trans.
on Neural Networks, Vol. 3, September 1992,
pp. 714-723.
[7] S. Horikawa, T. Furuhashi, and Y. Uchikawa, "On
Fuzzy Modeling Using Fuzzy Neural Networks with
the Back-Propagation Algorithm," IEEE Trans. on
Neural Networks, Vol. 3, September 1992,
pp. 801-806.
[8] J. R. Quinlan, C4.5: Programs for Machine
Learning, Morgan Kaufmann, San Mateo, CA, 1993.
[9] S.-T. Tsai and C.-T. Yang, "Decision Tree
Construction for Data Mining on Grid Computing,"
Proc. IEEE International Conference on
e-Technology, e-Commerce and e-Service, 2004.
[10] C. Z. Janikow, "Fuzzy Decision Trees: Issues and
Methods," IEEE Trans. on Systems, Man, and
Cybernetics - Part B, Vol. 28, No. 1, February 1998,
pp. 1-14.
[11] J. Jang, "Structure Determination in Fuzzy
Modeling: A Fuzzy CART Approach," Proc. IEEE
Conf. on Fuzzy Systems, 1994, pp. 480-485.
[12] R. L. P. Chang and T. Pavlidis, "Fuzzy Decision
Tree Algorithms," IEEE Trans. on Systems, Man, and
Cybernetics, Vol. 7, No. 1, 1977, pp. 28-35.
[13] K.-M. Lee, K.-M. Lee, J.-H. Lee, and H. Lee-Kwang,
"A Fuzzy Decision Tree Induction Method for Fuzzy
Data," Proc. IEEE International Fuzzy Systems
Conference, Vol. 1, August 1999, pp. 16-21.
[14] T.-P. Hong, C.-H. Chen, Y.-L. Wu, and Y.-C. Lee,
"Using Divide-and-Conquer GA Strategy in Fuzzy
Data Mining," Proc. Ninth IEEE Symposium on
Computers and Communications, 2004.