International Journal of Fuzzy Systems, Vol. 14, No. 3, September 2012

Improving Classifications of Medical Data Based on Fuzzy ART2 Decision Trees

Yo-Ping Huang, Shin-Liang Lai, Frode Eika Sandnes, and Shen-Ing Liu

Yo-Ping Huang (corresponding author) is with the Department of Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan 10608. E-mail: [email protected]. Shin-Liang Lai is with the Department of Computer Science and Engineering, Tatung University, Taipei, Taiwan 10451. E-mail: [email protected]. Frode Eika Sandnes is with the Faculty of Technology, Art and Design, Oslo and Akershus University College of Applied Sciences, 0130 Oslo, Norway. E-mail: [email protected]. Shen-Ing Liu is with the Department of Psychiatry, Mackay Memorial Hospital, Taipei, Taiwan 10449. E-mail: [email protected]. Manuscript received 26 March, 2012; revised 9 June, 2012; accepted 13 June, 2012.

Abstract

Analyzing existing medical databases provides valuable references for classifying other patients' symptoms. This study presents a strategy for discovering fuzzy decision trees from medical databases, in particular the Haberman's Survival database and the Blood Transfusion Service Center database. The Haberman's Survival database helps doctors treat and diagnose groups of patients who show similar past medical symptoms, and the Blood Transfusion Service Center database advises individuals on when to donate blood. The proposed data mining procedure involves neural-network-based clustering using Adaptive Resonance Theory 2 (ART2) and the extraction of fuzzy decision trees for each homogeneous cluster of data records using fuzzy set theory. A further objective of this paper is to examine the effect of the number of membership functions on building decision trees. Experiments confirm that the number of erroneously clustered patterns is significantly reduced compared with methods that do not preprocess the data using ART2.

Keywords: Data classification, fuzzy ART2 algorithm, fuzzy decision tree, medical data.

1. Introduction

Data mining, or knowledge discovery from data (KDD), is the process of extracting desirable knowledge, or interesting patterns, from existing databases for specific purposes. Many types of knowledge and technology have been proposed, and the extraction of decision trees from transaction data is one of the most commonly studied forms of data mining.

Decision tree based classification is a supervised learning method that constructs a decision tree from a set of samples [1, 2]. A decision tree is a tree structure where each leaf node is assigned a class label. The root node of the tree contains all training samples that are to be divided into classes. All nodes except the leaves are called decision nodes, since they specify a decision to be performed at that node based on a single feature. Each decision node has a number of child nodes equal to the number of values that the given feature assumes.

All decision tree algorithms are based on Hunt's concept learning algorithm [3, 4]. The concept learning algorithm mimics how humans learn simple concepts, that is, how the key distinguishing features between two categories, represented by positive and negative (training) examples, are found. Hunt's algorithm is based on a divide-and-conquer strategy where the task is to divide the set S, consisting of n samples belonging to c classes, into disjoint subsets so as to partition the data into subsets containing samples from one class only. Decision trees have many advantages, such as rapid training (tree construction) and high classification accuracy.

Artificial neural networks (ANNs) are commonly used for solving pattern classification problems and for finding decision trees [5]. In the medical domain, neural networks and other strategies have been used for diagnostic decision support, such as cancer diagnosis [6] and fall detection [31].
Other fault detection models based on the abductive network model, as well as combinations of fuzzy logic and neural networks, have been proposed [7-11], along with a hybrid model combining Adaptive Resonance Theory (ART) and fuzzy c-means clustering for medical classification and diagnosis with missing features [12].

This study proposes a method using fuzzy ART2 to improve data classification accuracy when discovering fuzzy decision trees. The input patterns are first fuzzified. These fuzzified patterns are then clustered by the ART2 neural network into groups whose patterns have similar properties. Finally, these groups are organized into decision trees. Because the data patterns are clustered in advance, the erroneous classification rate is greatly reduced and the efficiency of finding the decision tree rules is increased. ART2 is computationally effective and allows the user to easily control the degree of pattern similarity within each cluster [30].

The proposed strategy has been applied to, but is not limited to, two different medical databases. The first database contains information about patients who have undergone surgery for breast cancer. Patients with similar symptoms are used to discover group decision tree rules that assist doctors in treatment and diagnosis. The second database contains data about blood donors; the purpose is to find decision tree rules that can be used to advise people about when to donate blood.

2. Related Work

A. Fuzzy Neural Networks

ANN learning algorithms are either supervised or unsupervised. Supervised learning involves training instances, or examples, with labeled outputs. Unsupervised learning involves an unknown goal with no pre-determined categorizations. ANN technology helps summarize, organize, and classify data. Requiring few assumptions, ANNs can also identify patterns among input data with high predictive accuracy [13].
Both ANNs and fuzzy models have been successfully applied in many areas [9-11, 14-16], and approaches for combining the two have become an active area of research. A fuzzy neural network (FNN) system uses an ANN learning algorithm to produce its parameters and then adapts these parameters for optimization.

B. Fuzzy ID3 Decision Trees

A decision tree is a simple structure, built through supervised learning, where nonterminal nodes represent tests on one or more attributes and terminal nodes reflect decision outcomes. Decision trees have been successfully applied to classification problems, and several decision tree construction algorithms have been proposed [17-19, 34-36]. Decision tree induction methods such as ID3, C4.5 and CART [20-22] generate a tree structure by recursively partitioning the attribute space until the whole decision space is completely partitioned into a set of non-overlapping subspaces. Moreover, through pruning, decision trees can be further simplified to classify data without sacrificing accuracy [34].

Fig. 1 shows an example of a decision tree obtained by applying the C4.5 algorithm [2] to the Roiger and Geats data [23]. The decision tree generalizes the data. The tree can then be mapped to a set of production rules by writing one rule for each path of the tree:
- If a patient has swollen glands, the diagnosis is strep throat.
- If a patient does not have swollen glands and does have a fever, the diagnosis is a cold.
- If a patient does not have swollen glands and does not have a fever, the diagnosis is an allergy.

The decision tree describes how to accurately diagnose a patient by considering only whether the patient has swollen glands and a fever. The attributes sore throat, congestion, and headache play no role in determining this diagnosis.
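The divide-and-conquer construction and the path-to-rule mapping described above can be sketched in a few lines. This toy version splits on a fixed attribute order rather than by information gain, and the four records are illustrative stand-ins consistent with the rules above, not the actual dataset from [23]:

```python
from collections import Counter

# Toy records consistent with the swollen-glands/fever rules above
# (hypothetical data, not the table from Roiger and Geats).
SAMPLES = [
    {"swollen_glands": "yes", "fever": "yes", "diagnosis": "strep throat"},
    {"swollen_glands": "yes", "fever": "no",  "diagnosis": "strep throat"},
    {"swollen_glands": "no",  "fever": "yes", "diagnosis": "cold"},
    {"swollen_glands": "no",  "fever": "no",  "diagnosis": "allergy"},
]

def build_tree(samples, attributes):
    """Hunt-style divide and conquer: stop when the node is pure (one
    class only), otherwise split on the next attribute in the list."""
    classes = [s["diagnosis"] for s in samples]
    if len(set(classes)) == 1 or not attributes:
        return Counter(classes).most_common(1)[0][0]  # leaf: majority class
    attr, rest = attributes[0], attributes[1:]
    return {attr: {v: build_tree([s for s in samples if s[attr] == v], rest)
                   for v in sorted({s[attr] for s in samples})}}

def extract_rules(tree, conditions=()):
    """One production rule per root-to-leaf path."""
    if not isinstance(tree, dict):
        return [(conditions, tree)]
    (attr, branches), = tree.items()
    rules = []
    for value, subtree in branches.items():
        rules += extract_rules(subtree, conditions + ((attr, value),))
    return rules

tree = build_tree(SAMPLES, ["swollen_glands", "fever"])
for conds, label in extract_rules(tree):
    print(" and ".join(f"{a}={v}" for a, v in conds), "->", label)
```

The swollen-glands branch is pure, so it becomes a leaf immediately; only the other branch tests fever, reproducing the three rules listed above.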
ID3 is the most widely cited decision tree construction algorithm, and several variations and enhancements have been proposed [24-26]. There is also a related branch of approaches that are not based on ID3 [27-29].

Figure 1. Decision tree (based on data in [23]).

The strategy proposed herein is coined fuzzy ART2. It is based on a fuzzy dataset where each item is associated with a membership grade. Fuzzy decision trees are generated using fuzzy sets defined by a user. A fuzzy decision tree consists of nodes for testing attributes, edges for branching, and leaves with class names and probabilities.

3. Methodology

The procedures used for the implementation of the proposed system are shown in Fig. 2. There are five principal modules in the system: data collection, data transformation, data clustering, decision tree construction and decision tree simplification.
- Data collection module: collects the data analyzed in the study; it is effectively a database.
- Data transformation module: preprocesses the raw data for the underlying analysis.
- Data clustering module: clusters the data into smaller groups according to the fuzzy ART2 model; this reduces the complexity of constructing the decision trees.
- Decision tree construction module: uses the data in each cluster to create decision trees that can be further transformed into semantically fuzzy rules.
- Decision tree simplification module: further prunes the trees.

Figure 2. Block diagram of the proposed method.

A. Preprocessing

The ART2 input patterns are fuzzified, and the clustered results are used to find the association rules among the data. Assume that each input pattern contains n attributes (X1, X2, …, Xn) and that a membership function μA of a set A maps each attribute value to a real number in the interval [0, 1], μA(x): X → [0, 1].
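As a concrete sketch of such a mapping, the snippet below fuzzifies one attribute with triangular membership functions. The breakpoints are made up for illustration; the paper determines its membership functions empirically, so these linguistic terms and ranges are assumptions:

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b over the support [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical linguistic terms for a patient-age attribute;
# the (a, b, c) breakpoints are illustrative only.
AGE_TERMS = {
    "young":       (0, 30, 50),
    "middle_aged": (30, 50, 70),
    "senior":      (50, 70, 90),
}

def fuzzify(value, terms):
    """Map a crisp attribute value to its vector of membership degrees,
    one degree per linguistic term, as in the ART2 input encoding."""
    return {name: round(tri(value, *abc), 2) for name, abc in terms.items()}

print(fuzzify(40, AGE_TERMS))  # {'young': 0.5, 'middle_aged': 0.5, 'senior': 0.0}
```

Concatenating such degree vectors over all n attributes yields the fuzzified pattern that is fed into ART2.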
For convenience, the fuzzy membership function is denoted μf, where the subscript f indicates the corresponding fuzzy subset. The set of fuzzy membership functions for all attributes is denoted by S. The membership function labels for each attribute are normally determined by the distribution of the attribute values. Each ART2 input pattern is represented by a new pattern of membership degrees (μ11, μ12, …, μ1S1, …, μn1, μn2, …, μnSn), where n is the number of attributes in the dataset and (μi1, μi2, …, μiSi) are the membership degrees of the ith attribute over its fuzzy subset Si. To better facilitate the discovery of association rules, linguistic terms are used instead of membership functions.

B. Fuzzy ART2 Neural Network

The following paragraphs describe the proposed fuzzy neural network.

1) Network Structure: After the fuzzy inputs have been extracted, the fuzzy ART2 clusters the patterns. The fuzzy ART2 can be described as follows:
- Input layer: consists of short term memories (STM).
- Weight layer: consists of bottom-up weights bij and top-down weights tij, called long term memories (LTM).
- Output layer: expresses the clustering results for the given data.

2) Learning Procedures: The fuzzy ART2 learning procedure consists of the following four steps:
Step 1: The fuzzy vector is input into the input layer.
Step 2: The distances between the bottom-up weights and the fuzzy inputs are calculated and the shortest distance is identified.
Step 3: If the shortest distance fails a vigilance test (a threshold defined to test the similarity between a new pattern and a cluster centroid), a new node is created with its weights equal to the fuzzy inputs. If a cluster wins the vigilance test, the centroid of the cluster is adjusted to adopt the new input.
Step 4: The process is repeated until all the data are clustered.
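A minimal sketch of this clustering loop is shown below. It uses plain Euclidean nearest-centroid matching and a distance threshold in place of ART2's STM/LTM layers and resonance dynamics, so it only mirrors the control flow of steps 1-4, not the actual network:

```python
import math

def art_cluster(patterns, vigilance):
    """Simplified, illustrative sketch of the ART2-style procedure:
    assign each fuzzified pattern to the nearest cluster centroid; if
    the nearest centroid fails the vigilance (distance) test, create a
    new cluster, otherwise adapt the winning centroid ("resonance")."""
    centroids, counts, labels = [], [], []
    for p in patterns:
        dists = [math.dist(p, c) for c in centroids]
        best = min(range(len(dists)), key=dists.__getitem__) if dists else None
        if best is None or dists[best] > vigilance:
            # vigilance test fails: create a new cluster for this pattern
            centroids.append(list(p))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # "resonance": move the winning centroid toward the new input
            counts[best] += 1
            centroids[best] = [ci + (pi - ci) / counts[best]
                               for ci, pi in zip(centroids[best], p)]
            labels.append(best)
    return labels, centroids

labels, cents = art_cluster([(0.9, 0.1), (1.0, 0.0), (0.1, 0.9), (0.0, 1.0)], 0.5)
print(labels)  # [0, 0, 1, 1]: two well-separated groups -> two clusters
```

A larger vigilance threshold here merges more patterns into each cluster, which is how the user controls the degree of within-cluster similarity.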
If all winners fail the vigilance test, a new cluster is created and the corresponding weights are added. If the state is "resonance", the current fuzzy input is assigned to the winning cluster by modifying the corresponding weights.

The fuzzy ART2 is performed before discovering the decision tree rules in order to divide the data from a potentially large database into smaller, more manageable subsets. Moreover, since the patterns in a cluster have similar characteristics, constructing a decision tree for a subset takes less time than applying the process to the full dataset.

C. Finding Decision Tree Rules

After all patterns of the original data have been grouped into clusters, data classification algorithms are used to find the decision tree rules for each cluster. The fuzzy decision tree [32] is an extension of the traditional decision tree and an effective method for extracting knowledge in uncertain classification problems. Fuzzy set theory is applied to the dataset, and tree growing and pruning are combined to determine the tree structure. ID3 selects the test attribute based on the information gain, which is computed from the probabilities of ordinary data. The algorithm proposed herein is similar to ID3, but its gain is computed from the membership degrees.

Assume that we have a dataset D, where each datum has l numerical attributes A1, A2, …, Al, one class set C = {C1, C2, …, Cn}, and a fuzzy subset DFij for each value of attribute Ai (assuming attribute Ai has m different values). Let DCk be the fuzzy subset of D whose class is Ck, and let |D| be the sum of the membership values in the dataset D. The fuzzy decision tree is then generated as follows:

(a) Generate the root node T.
(b) If a node t with a fuzzy dataset D satisfies any of the following conditions, it is a leaf node and is assigned the majority class in D:
(1) The proportion of the data belonging to a class Ck is greater than or equal to a threshold θr, that is,
  |DCk| / |D| ≥ θr.  (1)
(2) The sum of the membership values in the fuzzy dataset is less than a threshold θn, that is,
  |D| ≤ θn.  (2)
(3) There are no attributes left for further classification.
(c) If none of the above conditions is satisfied, the node is not a leaf, and a test node is generated as follows:
(1) For each Ai (i = 1, 2, …, l), calculate the information gain G(Ai, D), defined below, and select the maximizing test attribute Amax.
(2) Divide D into fuzzy subsets D1, D2, …, Dm according to Amax, and generate new nodes t1, t2, …, tm for the fuzzy subsets D1, D2, …, Dm.
(3) Replace D by Dj (j = 1, 2, …, m) and repeat recursively from (b).

The information gain G(Ai, D) for attribute Ai with a fuzzy dataset D is defined by G(Ai, D) = I(D) − E(Ai, D), where

  I(D) = − Σ_{k=1..n} pk log2 pk,  (3)
  E(Ai, D) = Σ_{j=1..m} pij · I(DFij),  (4)
  pk = |DCk| / |D|,  (5)
  pij = |DFij| / Σ_{h=1..m} |DFih|,  (6)

and |DFij| is the sum of the membership values in the fuzzy subset DFij for the jth value of attribute Ai. Eq. (1) indicates that if all data belong to a single class, there is no need to further split the decision node. Eq. (2) checks whether a cluster Ck contains only a few data. In our experiments, we set θr = 1 and θn = 0.

The following sleeping-quality measurement example illustrates one cycle of the algorithm with the data in Table 1, a fuzzy dataset D in which MV is the membership value of each datum. There are two distinct classes with values G (Good) and B (Bad). A (root) node T is created for the samples in D. To find the splitting criterion for these samples, the information gains of the four test attributes are computed from the fuzzy values and the results compared. First, the expected information needed to classify a sample in D is obtained from Eq. (3):

  I(D) = − (9/20) log2 (9/20) − (11/20) log2 (11/20) = 0.99 bits.
Next, the expected information requirement for each attribute is found. For each linguistic term of each attribute, the total fuzzy values of that term in class G and in class B are obtained by applying Eq. (4). For the attribute "clothes", Table 2 shows the total fuzzy values of the linguistic terms in each class. The label "much" (2-5) in the left-hand column of Table 2 means that two samples with clothes = "much" belong to class G while five belong to class B. The expected information needed to classify a sample in D if the examples are partitioned according to "clothes" is:

  E(Clothes, D) = (5/13.7) I(DF11) + (4.3/13.7) I(DF12) + (4.4/13.7) I(DF13) = 0.54 bits.

Therefore, the information gained by branching on the attribute "clothes" is:

  G(clothes, D) = I(D) − E(clothes, D) = 0.99 − 0.54 = 0.45 bits.

Table 1. The example dataset.
Clothes MV Temp MV Humidity MV Noise MV Quality much 1.0 high 0.6 prodigious 0.2 no 0.9 B much 0.5 high 0.9 prodigious 0.7 0.3 B much 0.7 high 0.3 prodigious 0.5 noisy medium 0.6 B 0.4 high 0.5 prodigious 0.8 no 0.7 G medium 0.2 G normal normal enough enough enough enough much much enough enough 0.8 0.3 0.6 0.7 1.0 0.9 0.2 0.6 0.3 much 0.7 much 1.0 normal normal normal enough normal 1.0 0.8 high moderate moderate 0.2 prodigious 1.0 0.7 prodigious 0.9 no 0.8 B 1.0 B 1.0 prodigious 0.6 medium high 1.0 comfortable 0.8 no 0.3 G high 0.3 comfortable 0.5 noisy 0.2 B 0.6 prodigious 0.9 no 0.9 B medium 1.0 B no 0.3 B 0.4 B 0.7 G moderate moderate moderate moderate moderate moderate moderate moderate 0.7 prodigious 0.4 0.4 comfortable 0.2 medium medium 0.9 comfortable 0.7 0.3 comfortable 0.6 1.0 comfortable 1.0 noisy 0.5 G 0.8 prodigious 0.2 noisy 0.9 G 0.2 G 0.9 prodigious 1.0 medium 0.3 high 0.5 comfortable 0.9 no 0.5 G 0.9 moderate 0.3 prodigious 0.3 noisy 0.7 B 0.8 medium 0.3 G 1.0 high 1.0 comfortable

Table 2. Distributions of the samples for attribute "clothes".
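The figures above can be reproduced directly from Eqs. (3)-(6), using the per-term membership totals reported for "clothes" in Table 2:

```python
from math import log2

def info(counts):
    """Fuzzy entropy I(D): counts are sums of membership values per class."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# (sum over class G, sum over class B) of the membership values for the
# three linguistic terms of "clothes" (much / normal / enough), Table 2.
clothes = [(1.7, 3.3), (4.3, 0.0), (0.7, 3.7)]

i_d = info([9, 11])                       # class totals G = 9, B = 11, |D| = 20
weight = sum(g + b for g, b in clothes)   # 13.7
e_clothes = sum((g + b) / weight * info([g, b]) for g, b in clothes)
gain = i_d - e_clothes

print(round(i_d, 2), round(e_clothes, 2), round(gain, 2))  # 0.99 0.54 0.45
```

The `c > 0` guard handles the pure "normal" term (4.3-0), whose entropy is zero, without taking log2 of zero.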
  clothes = "much"   (2-5): p1 = 1.7, n1 = 3.3, I(DF11) = 0.92
  clothes = "normal" (6-0): p2 = 4.3, n2 = 0,   I(DF12) = 0
  clothes = "enough" (1-6): p3 = 0.7, n3 = 3.7, I(DF13) = 0.63

Similarly, the information gained by branching on the three remaining attributes is calculated in the same way as for "clothes". The gains for all four attributes are listed in decreasing order in Table 3.

Table 3. Information gained by branching on four attributes.
  Attribute     Gain
  clothes       0.45
  humidity      0.08
  noise         0.06
  temperature   0.02

The attribute "clothes" has the highest information gain and becomes the splitting attribute at the root node T of the fuzzy decision tree. The root node "clothes" has three linguistic terms, "much", "normal" and "enough", so three branches are grown from this node. Proceeding in the same way for the other attributes yields the final fuzzy decision tree in Fig. 3.

Figure 3. Final decision tree using the fuzzy ID3 algorithm.

4. Experimental Results and Analyses

Benchmark data were used to illustrate the effectiveness of the proposed model. The datasets are available from the UCI machine learning repository [33]. The first experiment involves the Haberman's Survival problem. The Haberman's Survival dataset consists of 306 samples, each comprising four attributes abbreviated Attr1, Attr2, Attr3 and Attr4. Table 4 lists the linguistic terms used for these four attributes. The survival status attribute value is explicitly given in the medical data rather than self-defined; its value, one or two, indicates whether or not the patient survived five or more years.

Table 4. Abbreviations of the patient parameters.
  Age of patient:                1 Ly Little_Young, 2 Yo Young, 3 Ma Middle_Aged, 4 Se Senior, 5 Ls Little_Senior
  Patient's year of operation:   1 Ol Old, 2 Av Average, 3 Re Recent
  Number of positive auxiliary nodes detected: 1 Vs Very_Small, 2 Sm Small, 3 Me Medium, 4 La Large, 5 Vl Very_Large
  Survival status:               1 Hi High, 2 Lo Low

The second experiment addresses the Blood Transfusion problem. There are 748 samples in this dataset, each containing five attributes. The classifying attribute is a binary variable representing whether the donor donated blood at a certain time. The linguistic terms for the attributes are listed in Table 5.

Table 5. Abbreviations of the donor parameters.
  Recency (months since last donation):     1 Vs Very_Small, 2 Sm Small, 3 Me Medium, 4 Bi Big, 5 Vb Very_Big
  Frequency (total number of donations):    1 Fe Few, 2 Me Medium, 3 Ma Many
  Monetary (total blood donated in c.c.):   1 Tl Too_Little, 2 Li Little, 3 Av Average, 4 Mu Much, 5 Tm Too_Much
  Time (months since first donation):       1 Vr Very_Recent, 2 Re Recent, 3 Mo Moderate, 4 Lg Long, 5 Vl Very_Long
  Whether he/she donated blood in March 2007: 1 Hi High, 2 Lo Low

A. Haberman's Survival Problem

The vigilance value in the ART2 model dominates the clustering results, which in turn determine the number of patterns assigned to each cluster. Depending on the database size and the properties of the data, it is hard to decide a suitable number of clusters before clustering. In the first experiment, mid-size numbers of clusters, from 3 to 7, were tested for the medical data. Small clusters were checked to see whether they consist of outliers. For example, one pattern (83, 58, 2, 2) is numerically distant from the remaining data, so only 305 patterns are used in the simulations to avoid the overhead of processing outliers. Experiment 1 started by fuzzifying the input patterns using 3, 3, 3 and 2 membership function labels for the four attributes (case HS-1, with the vigilance value set to 3.6).
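The rule labels used in the experiment figures (e.g. YoAvSmHi) concatenate two-letter codes from these tables. A small decoder makes them readable; the helper itself is illustrative and not part of the paper's method:

```python
# Two-letter codes from Table 4 (Haberman's Survival attributes plus class).
TERMS = {
    "Ly": "Little_Young", "Yo": "Young", "Ma": "Middle_Aged",
    "Se": "Senior", "Ls": "Little_Senior",
    "Ol": "Old", "Av": "Average", "Re": "Recent",
    "Vs": "Very_Small", "Sm": "Small", "Me": "Medium",
    "La": "Large", "Vl": "Very_Large",
    "Hi": "High", "Lo": "Low",
}

def expand(rule):
    """Expand a compact rule label (two characters per linguistic term)
    into the full linguistic terms, in order."""
    codes = [rule[i:i + 2] for i in range(0, len(rule), 2)]
    return [TERMS[c] for c in codes]

print(expand("YoAvSmHi"))  # ['Young', 'Average', 'Small', 'High']
```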
The membership functions of the four attributes are plotted in Fig. 4. The overlapping range and shape of the membership functions may affect the decision rules. Although these parameters can be further optimized, this is not the focus of this study; here, all membership functions are determined empirically.

Figure 4. Linguistic terms for the four attributes for case HS-1.

After applying the fuzzy ART2, the data are grouped into 3 clusters containing 131, 115 and 59 patterns, respectively. The fuzzy ID3 algorithm is then applied to find the decision tree rules for each cluster. Fig. 5 shows the results after applying fuzzy ID3 to each of the three clusters in case HS-1. The decision tree rules are found by traversing the tree from the root to each leaf: each path of branches from the root to a leaf is converted into a rule whose conditional part consists of the attributes along the branches with the highest classification truth level. The end nodes contain the classified attributes (Lo or Hi). For example, the right leaf in Fig. 5(a) yields the result YoAvSmHi, meaning "if the age of the patient is young (Yo) and the patient's year of operation is average (Av) and the number of positive auxiliary nodes detected is small (Sm), then the patient survived 5 years or longer (Hi)".

Figure 5. The decision trees for case HS-1: (a) Cluster 1, (b) Cluster 2 and (c) Cluster 3.

The resulting fuzzy decision trees are compared with the results obtained by applying fuzzy ID3 alone to the original dataset (without fuzzy ART2) to mine all 305 patterns. The whole decision tree is shown in Fig. 6. To compare the whole tree with the smaller trees of the clusters, all patterns in each cluster are used to check for erroneously clustered patterns. For example, picking pattern 53 (MaAvMeHi) in cluster 2 to verify its correctness in the whole tree and in the tree of cluster 2, we found that:
- The entire tree contains the leaf MeMaAvLo (indicated with an arrow in Fig. 6), meaning the patient did not survive five or more years. The classification result (Lo) contradicts the evidence given in pattern 53.
- The tree for cluster 2 contains the leaf MaAvHi (indicated with an arrow in Fig. 5(b)). The classification result (Hi) is consistent with the evidence.

Figure 6. The decision tree for the original dataset in case HS-1.

Another objective of this experiment was to examine the effect of the number of membership functions on building decision trees. The numbers of membership functions for the patient age attribute and the number of positive auxiliary nodes attribute were changed to 5, as shown in Fig. 7 (case HS-2, with the vigilance value set to 3.4).

Figure 7. Linguistic terms of attribute 1 and attribute 3 for case HS-2.

In this case, the dataset is clustered into 5 clusters (more than in case HS-1), containing 78, 66, 57, 85 and 19 patterns, respectively. In both cases, full advantage of fuzzy ART2 is taken in finding the decision tree rules: the dataset is split into smaller parts to reduce the computational effort. Table 6 lists the results of the first experiment, including the data ratio and the reduction percentage of erroneous classifications in the clusters for the two cases.
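The aggregate error reductions reported for the two cases (47.1% for HS-1 and 58.1% for HS-2) follow from the per-cluster error counts in Table 6 and can be checked in a few lines:

```python
def total_reduction(full_tree_errors, cluster_tree_errors):
    """Overall percentage reduction in erroneously classified patterns
    when per-cluster trees replace the single full tree."""
    full, small = sum(full_tree_errors), sum(cluster_tree_errors)
    return round(100 * (full - small) / full, 1)

# Per-cluster error counts from Table 6 (full tree vs. cluster trees).
hs1 = total_reduction([5, 19, 10], [1, 16, 1])                # case HS-1
hs2 = total_reduction([5, 27, 22, 7, 13], [2, 18, 7, 4, 0])   # case HS-2
print(hs1, hs2)  # 47.1 58.1
```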
Table 6. Results of experiment 1.
  Case   Cluster  No. of    Data ratio  Erroneous classifications
                  patterns  (%)         Full tree  From cluster  Reduction (%)
  HS-1   1        131       43.0        5          1             80
  HS-1   2        115       37.7        19         16            15.8
  HS-1   3        59        19.3        10         1             90
  HS-2   1        78        25.6        5          2             60
  HS-2   2        66        21.6        27         18            33.3
  HS-2   3        57        18.7        22         7             68.2
  HS-2   4        85        27.9        7          4             42.9
  HS-2   5        19        6.2         13         0             100

Table 6 shows that the full tree always contains more erroneously clustered patterns than the small trees. For example, in case HS-1, cluster 1 has five erroneously clustered patterns in the full tree while the small tree has only one. The error classification rates are reduced by 47.1% and 58.1% in cases HS-1 and HS-2, respectively. The trees in case HS-2 are more complicated than in case HS-1 because increasing the number of linguistic terms for an attribute also increases the number of branches for that attribute. Cluster 5 has no erroneously clustered pattern (the best case).

The decision trees and the corresponding classification rules are further simplified [34]. The trimmed trees from Fig. 5 are shown in Fig. 8. It is easier to search for decision rules in these reduced trees. For example, Fig. 8(a) has only three leaves, and the rules formed are SeHi, MaHi and YoHi. Thus, any pattern that belongs to cluster 1 is classified as Hi.

Figure 8. The simplified trees from clusters 1, 2 and 3 in case HS-1.

B. Blood Transfusion Service Center Database

The proposed method was also applied to the Blood Transfusion Service Center database. First, seven outliers were discarded, leaving 741 patterns. Two cases were considered, with the vigilance value set to 3.9. The inputs of the first case, BT-1, were fuzzified using 3, 3, 3, 3 and 2 membership functions for the five attributes. For the second case, BT-2, the recency (months since last donation), monetary (total blood donated in c.c.) and time (months since first donation) attributes were each fuzzified using five membership functions. The simplified trees from clusters 1, 2, 3 and 4 and the full tree in case BT-1 are shown in Fig. 9.

Figure 9. The simplified trees from clusters 1, 2, 3 and 4 and the full tree in case BT-1.

Table 7 presents the grouping and the distributions of patterns achieved with the fuzzy ART2 for the two cases. The datasets of the two cases are grouped into 4 and 5 clusters, respectively. The decision tree rules help make decisions about whether someone should donate blood during a certain period. To demonstrate the efficiency of the fuzzy ART2 algorithm, the erroneously clustered patterns were also extracted and are listed in Table 7. The results show that the whole tree produces more erroneously clustered patterns than the small trees. The error classification rates are reduced by 34.5% and 92.9% in cases BT-1 and BT-2, respectively. Finally, the resulting trees are simplified as described previously.

Table 7. Results for experiment 2.
  Case   Cluster  No. of    Data ratio  Erroneous classifications
                  patterns  (%)         Full tree  From cluster  Reduction (%)
  BT-1   1        104       14.0        28         19            32.1
  BT-1   2        252       34.0        0          0             0
  BT-1   3        228       30.8        1          0             100
  BT-1   4        157       21.2        0          0             0
  BT-2   1        91        12.3        22         0             100
  BT-2   2        259       35.0        28         0             100
  BT-2   3        297       40.1        1          0             100
  BT-2   4        44        5.9         5          4             20
  BT-2   5        50        6.7         0          0             0

Although there are only 4 and 5 attributes in the Haberman's Survival database and the Blood Transfusion Service Center database, respectively, this does not limit the proposed method: it can readily be applied to construct decision trees from databases with more attributes.

5. Conclusions and Future Work

A novel approach is proposed for finding decision tree rules by combining fuzzy ART2 and fuzzy ID3. First, a fuzzy model and the ART2 neural network are used to cluster the fuzzified dataset into groups with similar properties.
Next, decision tree rules are mined using the fuzzy ID3. Two medical datasets have been explored to validate the effectiveness of the proposed approach, namely the Haberman’s Survival database for grouping patients who have undergone surgery for breast cancer into groups with similar properties, and the Blood Transfusion Service Center database for clustering blood donors into groups. The experiments confirm that the erroneous classification rates are significantly reduced by processing simplified decision trees. Next, the experimental results show that the discovered decision tree rules from individual cluster are more efficient than those based on all of the data. The number of erroneously clustered patterns in the decision tree of individual cluster is smaller than for the entire tree. Applications of the proposed method are not limited to medical databases. As long as the accuracy and efficiency of data classification are concerned, the presented work is applicable to the multi-attribute databases. Although the ID3 algorithm is very popular in the field of machine learning it is associated with several drawbacks. For example, it tends to select attributes that have more attribute values as the expanding attribute in 451 the process of building decision tree, but the selected attribute is often not the one that contributes most to the classification. Future work should thus focus on improving the ID3 algorithm and the classification accuracy. Acknowledgments This work was supported in part by National Science Council, Taiwan under Grant NSC100-2221-E-027-110and in part by Joint project between National Taipei University of Technology and Mackay Memorial Hospital under Grants NTUT-MMH-99-03 and NTUT-MMH100-09. References [1] J. R. Quinlan, “Introduction of decision trees,” Machine Learning, vol. 1, pp. 81-106, 1986. [2] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993. [3] G.-G. Li, Y. Li, F.-C. 
Li, and C.-X. Jin. “Decision tree inductive learning algorithm based on removing noise gradually,” in Proc. of Int. Conf. on Machine Learning and Cybernetics, Hong Kong, China, vol. 5, pp. 2723-2728, Aug. 2007. [4] X.-D. Song and X.-L. Cheng “Decision tree algorithm based on sampling,” in Proc. of IFIP Int. Conf. on Network and Parallel Computing Workshops, Dalian, China, pp. 689-694, Sep. 2007. [5] R. Li and Z. Wang, “Mining classification rules using rough sets and neural networks,” European Journal of Operational Research, vol. 157, no. 2, pp. 439-448, Sep. 2004. [6] M. Ringnér and C. Peterson, “Microarray-based cancer diagnosis with artificial neural networks,” BioTechniques, vol. 34, pp. 30-34, Mar. 2003. [7] D. S. S. Lee, B. J. Lithgow, and R. E. Morrison, “New fault diagnosis of circuit breakers,” IEEE Trans. on Power Delivery, vol. 18, no. 2, pp. 454-459, Apr. 2003. [8] J. M. Benitez, J. L. Castro, and I. Requena, “Are artificial neural networks black boxes,” IEEE Trans. on Neural Networks, vol. 8, no. 5, pp. 1156-1164, Sep. 1997. [9] Y.-P. Huang, T.-W. Chang, L.-J. Kao, and F. E. Sandnes, “Using fuzzy SOM strategy for satellite image retrieval and information mining,” Journal of Systemics, Cybernetics and Informatics, vol. 6, no. 1, pp. 1-6, Nov. 2008. [10] S. A. Mingoti and J. O. Lima, “Comparing SOM neural network with fuzzy c-means, k-means and traditional hierarchical clustering algorithms,” Eu- 452 [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] International Journal of Fuzzy Systems, Vol. 14, No. 3, September 2012 ropean Journal of Operational Research, vol. 16, no. 3, pp. 1742-1759, Nov. 2006. R. J. Kuo and Y. T. Su, “Integration of ART2 neural network and fuzzy sets theory for market segmentation,” Int. Journal of Operations Research, vol. 1, no. 1, pp. 67-68, Feb. 2004. E. Kolman and M. Margaliot, “Are artificial neural networks white boxes,” IEEE Trans. on Neural Networks, vol. 16, no. 4, pp. 844-852, July 2005. B. S. Ahn, S. S. 
Cho, and C. Y. Kim, "The integrated methodology of rough set theory and artificial neural network for business failure prediction," Expert Systems with Applications, pp. 65-74, Feb. 2000.
[14] J. Zhao and Z. Zhang, "Fuzzy rough neural network and its application to feature selection," Int. Journal of Fuzzy Systems, vol. 13, no. 4, pp. 270-275, Dec. 2011.
[15] T.-P. Hong, C.-T. Chiu, S.-L. Wang, and K.-Y. Lin, "Mining a complete set of fuzzy multiple-level rules," Int. Journal of Fuzzy Systems, vol. 13, no. 2, pp. 111-119, June 2011.
[16] I.-J. Ding, "Fuzzy rule-based system for decision making support of hybrid SVM-GMM acoustic event detection," Int. Journal of Fuzzy Systems, vol. 14, no. 1, pp. 118-130, Mar. 2012.
[17] W. Du and Z. Zhan, "Building decision tree classifier on private data," in Proc. of Workshop on Privacy, Security, and Data Mining at the IEEE Int. Conf. on Data Mining, Maebashi, Japan, Dec. 2002.
[18] A. Abdelhalim and I. Traore, "A new method for learning decision trees from rules," in Proc. of Int. Conf. on Machine Learning and Applications, Miami Beach, FL, USA, pp. 693-698, Dec. 2009.
[19] Q.-W. Meng, H. Qiang, L. Ning, X.-R. Du, and L.-N. Su, "Crisp decision tree induction based on fuzzy decision tree algorithm," in Proc. of 1st Int. Conf. on Information Science and Engineering, Nanjing, China, pp. 4811-4814, Dec. 2009.
[20] S. Ruggieri, "Efficient C4.5 [classification algorithm]," IEEE Trans. on Knowledge and Data Engineering, vol. 14, pp. 438-444, Mar./Apr. 2002.
[21] Y.-P. Huang and Vu Thi Thanh Hoa, "General criteria on building decision trees for data classification," in Proc. of the 2nd Int. Conf. on Interaction Sciences: Information Technology, Culture and Human, Seoul, Korea, vol. 403, pp. 649-654, Nov. 2009.
[22] G. Vagliasindi, P. Arena, and A. Murari, "CART data analysis to attain interpretability in a fuzzy logic classifier," in Proc. of the Int. Joint Conf. on Neural Networks, Atlanta, Georgia, USA, pp. 1925-1932, June 2009.
[23] R. J. Roiger and M. W.
Geatz, Data Mining: A Tutorial-Based Primer, Addison-Wesley Publishing Company, Inc., Boston, MA, USA, 2003.
[24] M. Huang, W. Niu, and X. Liang, "An improved decision tree classification algorithm based on ID3 and the application in score analysis," in Proc. of Conf. on Control and Decision, Guilin, China, pp. 1876-1879, June 2009.
[25] S. Wu, L.-Y. Wu, Y. Long, and X.-D. Gao, "Improved classification algorithm by Minsup and Minconf based on ID3," in Proc. of the Int. Conf. on Management Science and Engineering, Lille, France, pp. 135-139, Oct. 2006.
[26] J. Chen, D.-L. Luo, and F.-X. Mu, "An improved ID3 decision tree algorithm," in Proc. of 4th Int. Conf. on Computer Science and Education, Nanning, China, pp. 127-130, July 2009.
[27] Y.-P. Huang, L.-J. Kao, and F. E. Sandnes, "Efficient mining of salinity and temperature association rules from ARGO data," Expert Systems with Applications, vol. 35, no. 1-2, pp. 59-68, Aug. 2008.
[28] L. V. S. Lakshmanan, C. K.-S. Leung, and R. T. Ng, "The segment support map: scalable mining of frequent itemsets," ACM SIGKDD Explorations Newsletter, vol. 2, no. 2, pp. 21-27, Dec. 2000.
[29] J. S. Park, M.-S. Chen, and P. S. Yu, "Using a hash-based method with transaction trimming for mining association rules," IEEE Trans. on Knowledge and Data Engineering, vol. 9, no. 5, pp. 813-825, Oct. 1997.
[30] J. A. Freeman and D. M. Skapura, Neural Networks: Algorithms, Applications, and Programming Techniques, Addison-Wesley Publishing Company, Inc., Boston, MA, USA, 1992.
[31] C. N. Doukas and I. Maglogiannis, "Emergency fall incidents detection in assisted living environments utilizing motion, sound, and visual perceptual components," IEEE Trans. on Information Technology in Biomedicine, vol. 15, no. 2, pp. 277-289, Mar. 2011.
[32] M. Umano, H. Okamoto, I. Hatono, H. Tamura, F. Kawachi, S. Umedzu, and J. Kinoshita, "Fuzzy decision tree by fuzzy ID3 algorithm and its application to diagnosis systems," in Proc. of the 3rd IEEE Conf.
on Fuzzy Systems, Orlando, FL, USA, pp. 2113-2118, June 1994.
[33] UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/
[34] J. R. Quinlan, "Simplifying decision trees," Int. Journal of Man-Machine Studies, vol. 27, pp. 221-234, 1987.
[35] R. Jin and G. Agrawal, "Efficient decision tree construction on streaming data," in Proc. of the Ninth ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Washington, D.C., USA, pp. 571-576, Aug. 2003.
[36] Z.-H. Huang, T. D. Gedeon, and M. Nikravesh, "Pattern trees induction: A new machine learning method," IEEE Trans. on Fuzzy Systems, vol. 16, pp. 958-970, Aug. 2008.

Yo-Ping Huang received his Ph.D. in electrical engineering from Texas Tech University, Lubbock, TX, USA. He is currently a Professor in the Department of Electrical Engineering at National Taipei University of Technology (NTUT), Taiwan. He also serves as CEO of the Joint Commission of Technological and Vocational College Admission Committee in Taiwan. He was the secretary general at NTUT, Chairman of the IEEE CIS Taipei Chapter, and Vice Chairman of the IEEE SMC Taipei Chapter. Before joining NTUT, he was a professor and Dean of the College of Electrical Engineering and Computer Science, Tatung University, Taipei. His research interests include medical knowledge mining, intelligent control systems, and handheld device application systems design. Prof. Huang is a senior member of the IEEE and a fellow of the IET.

Shin-Liang Lai received B.S. and M.S. degrees in mathematics from the National Taipei University of Education, Taipei, Taiwan, in 1998 and 2002, respectively. He is currently a researcher and Ph.D. candidate at the Department of Computer Science and Engineering, Tatung University, Taipei, Taiwan. His research interests include content-based music information retrieval, data mining, and fuzzy inference systems.

Frode Eika Sandnes received a B.Sc.
in computer science from the University of Newcastle upon Tyne, U.K., and a Ph.D. in computer science from the University of Reading, U.K. Sandnes is a Professor in the Institute of Information Technology, Faculty of Technology, Art and Design at Oslo and Akershus University College of Applied Sciences, Norway. He currently serves as Pro-Rector of Research and Internationalisation at Oslo and Akershus University College of Applied Sciences. His research interests include error correction, intelligent systems, human-computer interaction, and universal design.

Shen-Ing Liu received her M.D. degree from China Medical University, Taiwan, and her Ph.D. degree in psychiatric epidemiology from the Institute of Psychiatry, King's College London, University of London, U.K. She is currently an Associate Professor in the Department of Medicine at Mackay Medical College, Taiwan, and also serves as an attending psychiatrist at Mackay Memorial Hospital, Taipei, Taiwan. Her research interests include epidemiology, clinical psychiatry, and interventional studies.