Constructing a Fuzzy Decision Tree by Integrating Fuzzy
Sets and Entropy
TIEN-CHIN WANG (王天津) 1
HSIEN-DA LEE (李賢達) 1,2
1 Department of Information Management, I-Shou University
[email protected]
2 Fortune Institute of Technology, Kaohsiung, Taiwan
[email protected]
Abstract: - Decision tree induction is one of the common approaches for extracting knowledge from sets of
feature-based examples. In the real world, much data occurs in a fuzzy and uncertain form, and decision trees
must be able to deal with such fuzzy data. This paper presents a tree construction procedure to build a fuzzy
decision tree from a collection of fuzzy data by integrating fuzzy set theory and entropy. It proposes a fuzzy
decision tree induction method for fuzzy data whose numeric attributes can be represented by fuzzy numbers,
interval values, or crisp values, whose nominal attributes are represented by crisp nominal values, and whose
class labels carry confidence factors. An experimental result is also presented to show the applicability of the
proposed method.
Key-Words: Fuzzy Decision Tree, Fuzzy Sets, Entropy, Information Gain, Classification, Data Mining
1 Introduction
Decision trees have been widely and successfully
used in machine learning. More recently, fuzzy
representations have been combined with decision
trees. Many methods have been proposed to construct
decision trees from collections of data. Due to
observation error, uncertainty, and so on, much of the
data collected in the real world is obtained in fuzzy form.
Fuzzy decision trees treat features as fuzzy variables
and also yield simple decision trees. Moreover, the
use of fuzzy sets is expected to deal with uncertainty
due to noise and imprecision. Research on fuzzy
decision tree induction for fuzzy data has not yet been
performed sufficiently. This paper is concerned with a
fuzzy decision tree induction method for such fuzzy data.
It proposes a tree-building procedure to construct a fuzzy
decision tree from a collection of fuzzy data.
Decision trees and decision rules are data-mining
methodologies applied in many real-world
applications as a powerful solution to the classification
problem [1]. Classification is a process of learning a
function that maps a data item into one of several
predefined classes. Every classification method based on
inductive-learning algorithms is given as input a set
of samples that consist of vectors of attribute values
and a corresponding class. For example, a simple
classification might group students into three groups
based on their scores: (1) those whose scores are above 90,
(2) those whose scores are between 70 and 90, and
(3) those whose scores are below 70.
1.1 Fuzzy set theory
Fuzzy set theory was first proposed by Zadeh to
represent and manipulate data and information that
possess non-statistical uncertainty. Fuzzy set theory is
primarily concerned with quantifying and reasoning
using natural language, in which words can have
ambiguous meanings. It can be thought of as an
extension of traditional crisp sets, in which each
element must either be in or not in a set. Fuzzy sets
are defined on a non-fuzzy universe of discourse,
which is an ordinary set. A fuzzy set F of a universe
of discourse U is characterized by a membership
function µ_F(x) which assigns to every element
x ∈ U a membership degree µ_F(x) ∈ [0, 1]. An
element x ∈ U is said to be in a fuzzy set F if and
only if µ_F(x) > 0 and to be a full member if and
only if µ_F(x) = 1 [5]. Membership functions can
either be chosen by the user arbitrarily, based on the
user's experience, or they can be designed by using
optimization procedures [6][7]. Typically, a fuzzy
subset A can be represented as
A = {µ_A(x_1)/x_1, µ_A(x_2)/x_2, ..., µ_A(x_n)/x_n}
where the separating symbol / is used to associate the
membership value with its coordinate on the horizontal
axis. For example, in Fig. 1, let F = "integers close to 10";
then one choice for µ_F(x) is expressed as
F = 0.0/8 + 0.5/9 + 1/10 + 0.5/11 + 0.0/12
Fig. 1. Triangular membership function expression for a number close to 10
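The triangular shape in Fig. 1 can be written directly as a function. The following Python sketch is only an illustration of this idea: it evaluates a triangular membership function, with breakpoints 8, 10, and 12 assumed from the figure, at the integers near 10 and reproduces the discrete fuzzy set F given above.

    def triangular(x, a, b, c):
        """Triangular membership with feet at a and c and peak at b."""
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)

    # Evaluate at the integers near 10 (breakpoints assumed from Fig. 1).
    F = {x: triangular(x, 8, 10, 12) for x in range(8, 13)}
    # {8: 0.0, 9: 0.5, 10: 1.0, 11: 0.5, 12: 0.0}, matching F above.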
1.2 Fuzzy Decision Trees
A decision tree [4][8] is a formalism for
expressing mappings from attribute values to classes.
It consists of tests or attribute nodes linked to two
or more subtrees, and leaves or decision nodes labeled
with a class which indicates the decision. The main
advantage of the decision-tree approach is that it
visualizes the solution: it is easy to follow any path through the
tree. Relationships discovered by a decision tree can
be expressed as a set of rules, which can then be used
in developing an expert system. A decision tree
model employs a recursive divide–and-conquer
strategy to divide the data set into partitions so that all
of the records in a partition have the same class
label [9]. In classical decision trees, nodes make a
data item follow down only one branch, since the data
satisfies that branch's condition, and the data finally
arrives at only one leaf node. In tree-structured representations, a set of
data is represented by a node, and the entire data set is
represented as a root node. When a split is made,
several child nodes, which correspond to partitioned
data subsets, are formed. If a node is not to be split
any further, it is called a leaf; otherwise, it is an
internal node. Decision trees classify data by sorting
them down the tree from the root to the leaf nodes.
Typical decision tree induction algorithms include ID3
and CART [10][11]. Decision trees were popularized by
Quinlan with the ID3 algorithm. Systems based on ID3
work well in symbolic domains, and a large variety of
extensions to the basic ID3 algorithm have been developed
by different researchers. ID3 is designed to deal with
symbolic domain data, and each data item finally arrives
at only one leaf node. The algorithm is applied recursively
to each child node until all samples at a node belong to
one class. Fuzzy decision trees allow data to follow down
multiple branches of a node simultaneously, with different
satisfaction degrees ranging over [0, 1] [12]. CART is
designed to deal with continuous numeric domain data.
A number of variants of these algorithms have been
developed; the fuzzy decision tree is one of them.
Fuzzy decision trees attempt to combine elements
of symbolic and sub-symbolic approaches. Fuzzy
sets and fuzzy logic allow modeling language-related
uncertainties, while providing a symbolic framework
for knowledge comprehensibility. Fuzzy decision
trees differ from traditional crisp decision trees in
three respects [10]: (1) They use splitting criteria
based on fuzzy restrictions. (2) Their inference
procedures are different. (3) The fuzzy sets
representing the data have to be defined.
Fuzzy decision tree induction has two major
components: a procedure for fuzzy decision tree
building and an inference procedure for decision
making [13]. To apply an ID3-like procedure to fuzzy
decision tree construction, the following components
must be developed: a method for partitioning the
attribute value space, a method for selecting the
branching attribute, a branching test method to decide
to which degree data follow down the branches of a node,
and a leaf labeling method to determine the classes for
which the leaf nodes stand.
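As one possible illustration of these components, the sketch below shows a hypothetical Python structure for a fuzzy decision tree node (our own assumption, not a structure defined in this paper): an internal node stores the branching attribute and one subtree per fuzzy term, a leaf stores a class label with a confidence factor, and each node records the membership degree with which every example reaches it.

    from dataclasses import dataclass, field
    from typing import Dict, Optional

    @dataclass
    class FuzzyNode:
        # Attribute tested at this node (None for a leaf).
        attribute: Optional[str] = None
        # One child per fuzzy term, e.g. {"Low": ..., "Middle": ..., "High": ...}.
        children: Dict[str, "FuzzyNode"] = field(default_factory=dict)
        # For a leaf: class label and the confidence attached to it.
        label: Optional[str] = None
        confidence: float = 0.0
        # Example index -> membership degree in [0, 1] with which it reaches this node.
        memberships: Dict[int, float] = field(default_factory=dict)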
1.3 Entropy Heuristics
Attribute selection in the ID3 and C4.5 algorithms is
based on minimizing an information entropy measure
applied to the examples at a node [1]. The entropy
measure is used to calculate the information gain,
which reflects the quality of an attribute as the
branching attribute. The attribute-selection part of
ID3 is based on the assumption that the complexity of
the decision tree is strongly related to the amount of
information conveyed by the value of the given
attribute. An information-based heuristic selects the
attribute providing the highest information gain.
data set with some discrete-valued condition
attributes and one discrete-valued decision attributes
can be presented in the form of knowledge
representation
system
,
J = (U , C ∪ D )
U = {u1 , u 2 ...., u s } is the set of data samples,
C = {c1 , c 2 ...., c n } is the set of condition attributes
where
and D = {d } is the one-elemental set with the
decision attribute or class label attribute. Suppose
this class label attribute has m distinct values
d i (for i=l, ..,m), let si
d
be the number of samples of U in class i .The
defining m distinct classes ,
expected information or entropy need to classify a
given sample is given by
m
I ( s1 ,...s m ) = −∑ pi log 2 pi
In this section, an example is given to illustrate
the proposed fuzzy decision tree algorithm. This
sample is intended to show fuzzy decision tree
algorithm can be used to evaluate student admission
for graduate school. The data set includes 10
applicants, as shown in Table 1
(1)
i =1
Table 1.The data set of students
Where p i is the probability that an arbitrary
sample belongs to class si and is estimated by
summation those samples’ entropy (m is the number
of all samples). Let attribute ci have v distinct value
{A1 , A2 ...., Av } , attribute ci can be used to partition
U into v subsets {S1 , S 2 ...., S v } where S i (j=1,..,v)
contains those samples in U that have value A j of ci .
Let s ij be the number of samples of class d i in a
subset S j , the entropy of attribute ci is given by
v
E (c i ) = ∑
s1 j + ...s mj
I ( s1 j ,...s mj )
s
s1 j + .... + s mj
(2)
j =1
The term
acts as the weight of the
s
jth subset and is the number of samples in the subset
divided by the total number of samples. The smaller
the entropy value, the greater the purity of the subset
partitions. Thus the attribute that leads to the largest
information gain, is selected as the branching
attribute. For a given subset S j ,the information gain
is expressed as
Student
no.
1
2
3
4
5
6
7
8
9
10
GPA
ETS
3.2
2.8
2.7
3.6
2.1
2.6
2.8
2.3
3.6
3.5
75
52
69
86
63
91
63
77
68
90
WE
Fair
Excellent
Fair
Excellent
Fair
Fair
Excellent
Fair
Fair
Fair
Ref.
Yes
N/A
Yes
Yes
Yes
N/A
Yes
Yes
Yes
N/A
(3)
i =1
Where
pij =
s ij
Sj
(
S j is the number of
samples in the subset S j ) and is the probability that
a sample in S j belongs to class d i . So information
gain of attribute ci is given by
Gain(ci ) = I ( s1 j ,...s mj ) − E (ci )
(4)
We compute the information gain of each
condition attribute, the attribute with the highest
information gain is the most informative and the most
discriminating attribute of the given set.
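Formulas (1)-(4) translate directly into code. The following Python sketch is only illustrative (the paper gives no implementation): it computes the entropy of a class distribution and the information gain of a single nominal condition attribute from parallel lists of attribute values and class labels.

    import math
    from collections import Counter

    def entropy(class_counts):
        """Formulas (1)/(3): I(s_1, ..., s_m) = -sum p_i log2 p_i."""
        total = sum(class_counts)
        return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)

    def information_gain(values, classes):
        """Formula (4): Gain(c_i) = I(s_1, ..., s_m) - E(c_i) for one condition attribute."""
        total = len(classes)
        base = entropy(Counter(classes).values())          # I(s_1, ..., s_m)
        expected = 0.0
        for value in set(values):                          # one subset S_j per attribute value A_j
            subset = [cls for v, cls in zip(values, classes) if v == value]
            weight = len(subset) / total                   # (s_1j + ... + s_mj) / s
            expected += weight * entropy(Counter(subset).values())   # formula (2)
        return base - expected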
2 Experiment
In this section, an example is given to illustrate the
proposed fuzzy decision tree algorithm. The example is
intended to show that the fuzzy decision tree algorithm
can be used to evaluate student admission to graduate
school. The data set includes 10 applicants, as shown
in Table 1.

Table 1. The data set of students

Student no.   GPA   ETS   WE          Ref.   Admission
1             3.2   75    Fair        Yes    Yes
2             2.8   52    Excellent   N/A    No
3             2.7   69    Fair        Yes    No
4             3.6   86    Excellent   Yes    Yes
5             2.1   63    Fair        Yes    No
6             2.6   91    Fair        N/A    Yes
7             2.8   63    Excellent   Yes    No
8             2.3   77    Fair        Yes    No
9             3.6   68    Fair        Yes    Yes
10            3.5   90    Fair        N/A    Yes

Each case consists of four condition attributes:
grade point average (denoted GPA), entrance test
score (denoted ETS), working experience (denoted
WE), and reference (denoted Ref).
In this example, triangular membership functions
are used to represent fuzzy sets because of their
simplicity, easy comprehension, and computational
efficiency. Membership functions are usually
predefined by experienced experts; they can also be
derived through automatic adjustment [14].
As shown in Fig. 2 and Fig. 3, the GPA and ETS
attributes each have three fuzzy regions: Low, Middle,
and High. Thus, three fuzzy membership values are
produced for each score according to the predefined
membership functions.
Fig. 2. The membership function for
examinees’ GPAs
Fig. 3. The membership function for
examinees’ scores
3 Problem Solution
For the experimental data in Table 1, the
decision-tree construction algorithm proceeds as
described in the following subsections.
3.1 Calculate Information Gain
STEP 1. To represent a continuous fuzzy set, we
need to express it as a function and then map the
elements of the set to their degrees of membership [3].
Transform the quantitative values of each examinee's
scores into fuzzy sets. Take the entrance test score
(ETS) as an example: the score "85" can be converted
into the fuzzy set (0.0/Low + 0.0/Middle + 0.5/High)
using the predefined membership functions in Fig. 3.
The transformation procedure is repeated for the
other scores. The result is shown in Table 2.
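To make STEP 1 concrete, the sketch below fuzzifies an ETS score into the regions Low, Middle, and High with triangular membership functions. The breakpoints used here are assumed for illustration only, since the paper defines the membership functions only graphically in Fig. 3, so the resulting degrees do not necessarily match those quoted in the text.

    def triangular(x, a, b, c):
        # Triangular membership with feet at a and c and peak at b.
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    # Hypothetical breakpoints for the three ETS regions (not taken from Fig. 3).
    def fuzzify_ets(score):
        return {
            "Low":    triangular(score, 30, 50, 70),
            "Middle": triangular(score, 50, 70, 90),
            "High":   triangular(score, 70, 90, 110),
        }

    print(fuzzify_ets(85))  # {'Low': 0.0, 'Middle': 0.25, 'High': 0.75} with these assumed breakpoints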
STEP 2. Form a knowledge representation system
J = (U, C ∪ D), U = {1, ..., 10}, C = {GPA, ETS, WE, Ref.},
D = {Admission}. The class label attribute Admission has
two distinct values {yes, no}, so there are two distinct
classes (m = 2). Let class d_1 represent yes and class d_2
represent no. There are 5 samples of class yes and 5
samples of class no, so by formula (1)

I(s_1, s_2) = -(5/10) log_2(5/10) - (5/10) log_2(5/10) = 1

STEP 3. Compute the entropy of each attribute.
Attribute GPA has three distinct values
{High, Middle, Low}, so U can be partitioned into three
subsets {S_1, S_2, S_3}.

For GPA = "High": s_11 = 3, s_21 = 0, and by formula (3)
I(s_11, s_21) = -(3/3) log_2(3/3) - 0 = 0

For GPA = "Middle": s_12 = 2, s_22 = 3, and by formula (3)
I(s_12, s_22) = -(2/5) log_2(2/5) - (3/5) log_2(3/5) = 0.971

For GPA = "Low": s_13 = 0, s_23 = 2, and by formula (3)
I(s_13, s_23) = 0 - (2/2) log_2(2/2) = 0

By formula (2),
E(GPA) = (3/10) I(s_11, s_21) + (5/10) I(s_12, s_22) + (2/10) I(s_13, s_23) = 0.485

and by formula (4),
Gain(GPA) = I(s_1, s_2) - E(GPA) = 0.514

STEP 4. In the same way as in STEP 3, compute
Gain(ETS) = 0.6, Gain(WE) = 0.3389, and Gain(Ref) = 0.05.
Since ETS has the highest information gain among the
four attributes, ETS is selected as the attribute on
which to split the tree.
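As an illustrative check of STEPs 2 to 4, the information_gain sketch from Section 1.3 can be applied to the fuzzified values of Table 2:

    # Assumes the entropy/information_gain sketch from Section 1.3 is available.
    gpa       = ["Middle", "Middle", "Middle", "High", "Low", "Middle", "Middle", "Low", "High", "High"]
    ets       = ["Middle", "Low", "Middle", "High", "Low", "High", "Low", "Middle", "Middle", "High"]
    admission = ["Yes", "No", "No", "Yes", "No", "Yes", "No", "No", "Yes", "Yes"]

    print(round(information_gain(gpa, admission), 3))  # about 0.515; the paper quotes 0.514
    print(round(information_gain(ets, admission), 3))  # 0.6, the highest gain, so ETS is chosen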
Table 2. The data set of students in fuzzy form

no.   GPA      ETS      WE          Ref.   Admission
1     Middle   Middle   Fair        Yes    Yes
2     Middle   Low      Excellent   N/A    No
3     Middle   Middle   Fair        Yes    No
4     High     High     Excellent   Yes    Yes
5     Low      Low      Fair        Yes    No
6     Middle   High     Fair        N/A    Yes
7     Middle   Low      Excellent   Yes    No
8     Low      Middle   Fair        Yes    No
9     High     Middle   Fair        Yes    Yes
10    High     High     Fair        N/A    Yes

3.2 Constructing a Decision Tree
We use the selected condition attribute, ETS, to
form the decision tree. We obtain the following
equivalence classes:
high: {4, 6, 10}   middle: {1, 3, 8, 9}   low: {2, 5, 7}
The subset for middle, {1, 3, 8, 9}, needs to be
split further. Following the algorithm described above,
the attribute GPA has the highest information gain and
is used to split this node. The complete decision tree
is shown in Fig. 4.
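The equivalence classes can be formed with a simple grouping of student numbers by ETS value, as in the following illustrative sketch:

    from collections import defaultdict

    students = list(range(1, 11))
    ets = ["Middle", "Low", "Middle", "High", "Low", "High", "Low", "Middle", "Middle", "High"]

    groups = defaultdict(list)
    for no, value in zip(students, ets):
        groups[value].append(no)

    print(dict(groups))  # {'Middle': [1, 3, 8, 9], 'Low': [2, 5, 7], 'High': [4, 6, 10]}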
Fig. 4. Decision tree based on information gain. The root node (samples 1-10) tests ETS: the high branch leads to leaf {4, 6, 10} labeled yes, the low branch to leaf {2, 5, 7} labeled no, and the middle branch to node {1, 3, 8, 9}, which tests GPA; there, the high branch leads to leaf {9} labeled yes, the low branch to leaf {8} labeled no, and the middle branch leaves {1, 3} undecided.
3.3 Extract classification rules
Data classification is an important data mining
task [2] that tries to identify common characteristics
in a set of N objects contained in a database and to
categorize them into different groups. We extract
classification IF-THEN rules from the equivalence
classes. For the equivalence class {4, 6, 10}, all
samples have identical attribute values:
ETS=high, Admission=yes
So we use the condition attribute value (ETS=high) as
the rule antecedent and the class label attribute value
(Admission=yes) as the rule consequent, obtaining the
following classification rule:
IF ETS="high" THEN Admission="yes"
The other classification rules can be extracted in the
same manner. We obtain the following rules:
1. IF ETS=”high” THEN Admission=”yes”
2. IF ETS=”low” THEN Admission=”no”
3. IF ETS=”middle” AND GPA=”high” THEN
Admission=”yes”
4. IF ETS=”middle” AND GPA=”low” THEN
Admission=”no”
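These rules can be applied directly as a small rule-based classifier. The sketch below is only an illustration: it encodes the four extracted rules and leaves a case undecided when none of them applies, mirroring the unresolved leaf {1, 3} in Fig. 4.

    def classify(ets, gpa):
        # Rules 1-4 extracted above; returns None where the tree leaves the case undecided.
        if ets == "high":
            return "yes"
        if ets == "low":
            return "no"
        if ets == "middle" and gpa == "high":
            return "yes"
        if ets == "middle" and gpa == "low":
            return "no"
        return None

    print(classify("middle", "high"))    # yes
    print(classify("middle", "middle"))  # None (undecided, like students 1 and 3)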
4 Conclusion
This paper is concerned with fuzzy sets and
decision trees. We present a fuzzy decision tree model
based on fuzzy set theory and information theory. It
proposes a fuzzy decision tree induction method for
fuzzy data whose numeric attributes can be represented
by fuzzy numbers, interval values, or crisp values,
whose nominal attributes are represented by crisp
nominal values, and whose class labels carry confidence
factors. An example is used to demonstrate its validity.
First, we applied fuzzy set theory to transform
real-world data into fuzzy linguistic forms. Second, we
used information theory to construct a decision tree.
Finding the best split point and performing the split
are the main tasks in a decision tree induction method.
Through the integration of fuzzy set theory and
information theory, classification tasks originally
thought too difficult or complex become possible. The
method provides an alternative for evaluating the best
possible candidates.
References:
[1] M. Kantardzic, Data Mining: Concepts, Models,
Methods, and Algorithms, Wiley, 2003.
[2] U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth,
"From Data Mining to Knowledge Discovery," in
Advances in Knowledge Discovery and Data Mining,
AAAI/MIT Press, 1996.
[3] M. Negnevitsky, Artificial Intelligence,
Addison-Wesley, 2002.
[4] S. J. Russell and P. Norvig, Artificial
Intelligence: A Modern Approach, Prentice-Hall,
Englewood Cliffs, NJ, 1995.
[5] H. J. Zimmermann, Fuzzy Set Theory and Its
Applications, Kluwer Academic Publishers, 1991.
[6] J.-S. R. Jang, "Self-Learning Fuzzy Controllers
Based on Temporal Back-Propagation," IEEE Trans.
on Neural Networks, Vol. 3, September 1992,
pp. 714-723.
[7] S. Horikawa, T. Furuhashi, and Y. Uchikawa, "On
Fuzzy Modeling Using Fuzzy Neural Networks with
the Back-Propagation Algorithm," IEEE Trans. on
Neural Networks, Vol. 3, September 1992,
pp. 801-806.
[8] J. R. Quinlan, C4.5: Programs for Machine
Learning, Morgan Kaufmann, San Mateo, CA, 1993.
[9] S.-T. Tsai and C.-T. Yang, "Decision Tree
Construction for Data Mining on Grid Computing,"
Proc. IEEE International Conference on
e-Technology, e-Commerce and e-Service, 2004.
[10] C. Z. Janikow, "Fuzzy Decision Trees: Issues and
Methods," IEEE Trans. on Systems, Man, and
Cybernetics - Part B, Vol. 28, No. 1, February 1998,
pp. 1-14.
[11] J. Jang, "Structure Determination in Fuzzy
Modeling: A Fuzzy CART Approach," Proc. IEEE
Conf. on Fuzzy Systems, 1994, pp. 480-485.
[12] R. L. P. Chang and T. Pavlidis, "Fuzzy Decision
Tree Algorithms," IEEE Trans. on Systems, Man, and
Cybernetics, Vol. 7, No. 1, 1977, pp. 28-35.
[13] K.-M. Lee, K.-M. Lee, J.-H. Lee, and H. Lee-Kwang,
"A Fuzzy Decision Tree Induction Method for Fuzzy
Data," Proc. IEEE International Fuzzy Systems
Conference, Vol. 1, August 1999, pp. 16-21.
[14] T.-P. Hong, C.-H. Chen, Y.-L. Wu, and Y.-C. Lee,
"Using Divide-and-Conquer GA Strategy in Fuzzy
Data Mining," Proc. Ninth IEEE Symposium on
Computers and Communications, 2004.