Recent Advances in Intelligent Information Systems
ISBN 978-83-60434-59-8, pages 429–442
A Closest Fit Approach to Missing Attribute Values

Jerzy W. Grzymala-Busse (1, 2)

(1) Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA
(2) Institute of Computer Science, Polish Academy of Sciences, 01–237 Warsaw, Poland
Abstract
Recently, results on a comparison of seven successful methods of handling missing
attribute values were reported. This paper describes experimental results on the
three most successful of these seven methods. Two of them, based on the Closest
Fit idea (searching the remaining data set for the closest fitting case and replacing
a missing attribute value by the corresponding known value from that case), were
enhanced by 12 strategies (two options for replacing missing attribute values, three
interpretations of missing attribute values, and two types of rules: certain and
possible). Our results show that for a given data set the best method of handling
missing attribute values should be selected individually, by testing the two main
methods: the Local Closest Fit method with four options (both types of missing
attribute value replacement and two interpretations of missing attribute values),
and Most Common Value for symbolic attributes and Average Value for numerical
attributes, both restricted to a concept. All of these methods are local, i.e.,
restricted to a concept, which indicates once again that local approaches are
better than global ones.
1 Introduction
It is well known that mining incomplete data, i.e., data in which some attribute
values are missing, is much more challenging than mining complete data. Research
on selecting the best methods of handling missing attribute values is important
for data mining since many real-life data sets are incomplete.
In general, methods to handle missing attribute values may be categorized as
sequential and parallel. In sequential methods some technique is used to handle
missing attribute values, then the main process of acquiring knowledge is conducted. In parallel methods missing attribute values are taken into account during
knowledge acquisition, i.e., both processes are performed in parallel.
Sequential methods include techniques based on deleting cases with missing
attribute values, replacing a missing attribute value by the most common value of
that attribute, assigning all possible values to the missing attribute value, replacing
a missing attribute value by the mean for numerical attributes, assigning to a
missing attribute value the corresponding value taken from the closest fit case, or
replacing a missing attribute value by a new value, computed from a new data set,
in which the original attribute is a decision.
Parallel methods to handle missing attribute values include the MLEM2 (Modified Learning from Examples Module, version 2) rule induction algorithm, in which
rules are induced from the original data set, with missing attribute values considered to be lost values, attribute-concept values, or "do not care" conditions, see,
e.g., Grzymala-Busse (2002). MLEM2 is an option of the LERS (Learning from
Examples based on Rough Sets) data mining system. The C4.5, see Quinlan
(1993), and CART, see Breiman et al. (1984), approaches to missing attribute
values are other examples of methods from the same group.
There are three important interpretations of missing attribute values. The
first is a lost value interpretation, Grzymala-Busse and Wang (1997); Stefanowski
and Tsoukias (1999), where we consider a missing attribute value as potentially
important but unavailable because it was mistakenly erased or mistakenly not
recorded. We cannot replace a lost value by any known (specified) attribute value.
In the second interpretation, called a "do not care" condition, Grzymala-Busse
(1991); Kryszkiewicz (1999), it is assumed that a missing attribute value was
irrelevant during data collection. Typically, a "do not care" condition can be
replaced by any possible value from the attribute domain.
The third possibility is an attribute-concept value, where a missing attribute
value may be replaced by any possible value from the attribute domain restricted
to the concept to which the case belongs. The concept (or class) is the set of
all cases classified the same way. For example, if we want to use the attribute-concept
interpretation of a missing attribute value for the attribute Temperature
for a patient who is sick with flu, and other patients sick with flu have Temperature
values of either high or very high, then the typical values are high and
very high. The attribute-concept approach to missing attribute values was introduced in Grzymala-Busse (2004).
This paper is a continuation of Grzymala-Busse (2009), which presented experimental results on seven successful methods of handling missing attribute values, each with two additional options of using certain and possible rules. Among these seven methods, only three produced consistently good results in terms of an error rate evaluated by ten-fold cross validation. Two of these good methods were the Global and Local Closest Fit methods; the third was Most Common Value for symbolic attributes and Average Value for numerical attributes, both restricted to a concept.

The main objective of this paper was to conduct experiments using both Closest Fit methods, each with 12 options, to identify the best approach, and then to compare the best results with the third best method identified in Grzymala-Busse (2009), i.e., Most Common Value for symbolic attributes and Average Value for numerical attributes, both restricted to a concept. Though in six cases out of eight the Local Closest Fit methods outperformed the Most Common Value for symbolic attributes and Average Value for numerical attributes, both restricted to a concept, their performance does not differ significantly (at the 5% significance level using the two-tailed Wilcoxon test).
2 Conventional Methods to Handle Missing Attribute Values
We will describe some traditional methods to handle missing attribute values, all
sequential and considered by many authors as very successful.
2.1 Most Common Value for Symbolic Attributes, Average Value for
Numerical Attributes
There are two versions of this method. In the first version, any missing value of
a symbolic attribute is replaced by the most common value of that attribute, i.e.,
by the most probable known (specified) attribute value, and any missing value of
a numerical attribute is replaced by the average of the known values of that attribute.

In the second version of this method, the most common value of the attribute
restricted to the concept is used instead of the most common value among all cases.
Such a concept is the concept that contains the case with the missing attribute
value. A missing value of a numerical attribute is replaced by the average
of all known values of the attribute restricted to the concept.
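As an illustration, here is a minimal sketch of the concept-restricted version of this method. The data representation (a list of dictionaries with a "decision" key and "?" as the marker for a missing value) and the function name are assumptions made for this sketch, not part of the original description.

from collections import Counter
from statistics import mean

MISSING = "?"  # assumed marker for a missing attribute value

def impute_most_common_or_average(cases, attribute, numerical=False):
    """Replace missing values of `attribute` by the most common known value
    (symbolic attributes) or the average of known values (numerical attributes),
    restricted to the concept of each case."""
    for case in cases:
        if case[attribute] != MISSING:
            continue
        # restrict the search to cases from the same concept (same decision value)
        known = [c[attribute] for c in cases
                 if c["decision"] == case["decision"] and c[attribute] != MISSING]
        if not known:
            continue  # nothing to impute from within this concept
        if numerical:
            case[attribute] = mean(known)
        else:
            case[attribute] = Counter(known).most_common(1)[0][0]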
2.2 Closest Fit
Again, there are two versions of this method. The first version, called the Global
Closest Fit method, Grzymala-Busse et al. (2002), is based on replacing a missing
attribute value by the known value in another case that resembles as much as
possible the case with the missing attribute value. In searching for the closest fit
case we compare two vectors of attribute values: one vector corresponds to the
case with a missing attribute value, the other vector is a candidate for the closest
fit. The search is conducted over all cases, hence the name Global Closest Fit. For
each case a distance is computed, and the case for which the distance is the smallest
is the closest fitting case, used to determine the missing attribute value. Let
x and y be two cases. The value of x for the attribute a_i will be denoted by a_i(x),
where i = 1, 2, ..., n. The distance between cases x and y is computed as follows:
$$\operatorname{distance}(x, y) = \sum_{i=1}^{n} \operatorname{distance}(a_i(x), a_i(y)),$$

where

$$\operatorname{distance}(a_i(x), a_i(y)) =
\begin{cases}
0 & \text{if both } a_i(x) \text{ and } a_i(y) \text{ are known and } a_i(x) = a_i(y), \\
\dfrac{|a_i(x) - a_i(y)|}{r} & \text{if } a_i(x) \text{ and } a_i(y) \text{ are numbers and } a_i(x) \neq a_i(y), \\
1 & \text{otherwise,}
\end{cases}$$
and r is the difference between the maximum and the minimum of the known values of the
numerical attribute with a missing value. If there is a tie between two cases with the
same distance, a heuristic is necessary, for example, selecting the first case.
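The distance defined above translates directly into code. The following is an illustrative sketch only; treating all three missing-value markers uniformly as unknown and passing the range r explicitly are assumptions of the sketch.

MISSING_MARKERS = {"?", "*", "-"}  # assumed markers for the three kinds of missing values

def attribute_distance(vx, vy, value_range=None):
    """Distance between two values of a single attribute, as defined above."""
    if vx in MISSING_MARKERS or vy in MISSING_MARKERS:
        return 1.0                         # a missing value is never a match
    if vx == vy:
        return 0.0                         # both known and equal
    if isinstance(vx, (int, float)) and isinstance(vy, (int, float)) and value_range:
        return abs(vx - vy) / value_range  # value_range plays the role of r = max - min
    return 1.0                             # different symbolic values (or unknown range)

def case_distance(x, y, attributes, ranges):
    """Sum of attribute distances between cases x and y (dictionaries)."""
    return sum(attribute_distance(x[a], y[a], ranges.get(a)) for a in attributes)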
Table 1: An incomplete decision table

Case | Temperature | Headache | Cough | Flu (decision)
1    | high        | ?        | yes   | yes
2    | ?           | yes      | *     | yes
3    | high        | −        | yes   | yes
4    | −           | no       | *     | no
5    | *           | yes      | no    | no
6    | normal      | no       | ?     | no
7    | normal      | −        | no    | no

In this method, there are two approaches to the replacement of missing attribute
values by the known value of the closest case. In the first approach, called static,
there is an attempt to replace all missing attribute values after the entire data set
has been scanned. In the second approach, called dynamic, any missing attribute
value is replaced by the existing value immediately after the closest case is found;
this value may then be used as a known value in subsequent searches for the closest case.
In general, the Global Closest Fit method may produce data sets in which some
missing attribute values are not replaced by known values, even though the entire
data set is completely scanned for the closest cases. Additional iterations of the
method may reduce the number of missing attribute values, but may still not
replace all of them with known values.
The second closest fit method, called the Local Closest Fit (or Concept Closest
Fit), is similar to the Global Closest Fit method. The original data set, containing
missing attribute values, is split into smaller data sets, each corresponding to a
concept from the original data set, and then the Global Closest Fit is applied to
each smaller data set. This approach may be used with either the static or the
dynamic way of replacing missing attribute values.
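Building on the distance sketch above (MISSING_MARKERS and case_distance are reused from it), the following hypothetical helper illustrates the difference between static and dynamic replacement and shows that the Local Closest Fit simply applies the same procedure separately within each concept; all names are illustrative, and ties are broken by taking the first closest case, as mentioned earlier.

def closest_fit_impute(cases, attributes, ranges, dynamic=False):
    """Global Closest Fit: fill missing values from the closest fitting case.

    With dynamic=True a replacement is written back immediately, so later
    searches may already use it as a known value; with dynamic=False (static),
    all replacements are applied only after the whole data set has been scanned.
    """
    pending = []
    for x in cases:
        missing_attrs = [a for a in attributes if x[a] in MISSING_MARKERS]
        if not missing_attrs:
            continue
        candidates = [y for y in cases if y is not x]
        if not candidates:
            continue
        closest = min(candidates, key=lambda y: case_distance(x, y, attributes, ranges))
        for a in missing_attrs:
            if closest[a] not in MISSING_MARKERS:
                if dynamic:
                    x[a] = closest[a]            # visible to subsequent searches
                else:
                    pending.append((x, a, closest[a]))
    for x, a, value in pending:                  # static: apply at the end
        x[a] = value

def local_closest_fit_impute(cases, attributes, ranges, dynamic=False):
    """Local (Concept) Closest Fit: run the same procedure within each concept."""
    for d in {c["decision"] for c in cases}:
        subset = [c for c in cases if c["decision"] == d]
        closest_fit_impute(subset, attributes, ranges, dynamic=dynamic)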
3 Blocks of attribute-value pairs
In this section, we recall some basic ideas of rough set theory. We will
assume that lost values are denoted by "?", "do not care" conditions by "*",
and attribute-concept values by "−". We assume that the input data sets are
presented in the form of a decision table. An example of a decision table is
shown in Table 1.
Rows of the decision table represent cases, while columns are labeled by variables. The set of all cases will be denoted by U . In Table 1, U = {1, 2, ..., 7}.
Independent variables are called attributes and a dependent variable is called a
decision and is denoted by d. The set of all attributes will be denoted by A. In
Table 1, A = {Temperature, Headache, Cough}.
For the rest of the paper we will assume that all decision values are known,
i.e., they are not missing. Additionally, we will assume that for each case at least
one attribute value is known.
An important tool to analyze decision tables is a block of an attribute-value
pair. Let (a, v) be an attribute-value pair. For complete decision tables, i.e.,
decision tables in which every attribute value is known, a block of (a, v), denoted
by [(a, v)], is the set of all cases x for which a(x) = v. For incomplete decision
tables the definition of a block of an attribute-value pair is modified.
• If for an attribute a there exists a case x such that a(x) = ?, i.e., the corresponding value is lost, then the case x should not be included in any blocks
[(a, v)] for all values v of attribute a,
• If for an attribute a there exists a case x such that the corresponding value is
a "do not care" condition, i.e., a(x) = ∗, then the case x should be included in
blocks [(a, v)] for all known values v of attribute a,
• If for an attribute a there exists a case x such that the corresponding value is an
attribute-concept value, i.e., a(x) = −, then the corresponding case x should
be included in blocks [(a, v)] for all known values v ∈ V (x, a) of attribute a,
where

V(x, a) = {a(y) | a(y) is known, y ∈ U, d(y) = d(x)}.

For Table 1, V(3, Headache) = {yes}, V(4, Temperature) = {normal}, and
V(7, Headache) = {no, yes}, so the blocks of attribute-value pairs are:
[(Temperature, normal)] = {4, 5, 6, 7},
[(Temperature, high)] = {1, 3, 5},
[(Headache, no)] = {4, 6, 7},
[(Headache, yes)] = {2, 3, 5, 7},
[(Cough, no)] = {2, 4, 5, 7},
[(Cough, yes)] = {1, 2, 3, 4}.
For a case x ∈ U, the characteristic set K_B(x) is defined as the intersection of
the sets K(x, a) for all a ∈ B, where the set K(x, a) is defined in the following
way:
• If a(x) is specified, then K(x, a) is the block [(a, a(x))] of attribute a and its
value a(x),
• If a(x) = ? or a(x) = ∗, then K(x, a) = U,
• If a(x) = −, then K(x, a) is equal to the union of all blocks of attribute-value
pairs (a, v), where v ∈ V(x, a), if V(x, a) is nonempty. If V(x, a) is empty,
K(x, a) = U.
Table 2: Data sets used for experiments

Data set           | Number of cases | Number of attributes | Number of concepts | Type of attributes
Bankruptcy         | 66              | 5                    | 2                  | numerical
Breast cancer      | 277             | 9                    | 2                  | symbolic
Hepatitis          | 155             | 19                   | 2                  | discretized
House              | 434             | 16                   | 2                  | symbolic
Image segmentation | 210             | 19                   | 7                  | discretized
Iris               | 150             | 4                    | 3                  | numerical
Lymphography       | 148             | 18                   | 4                  | symbolic
Wine               | 178             | 12                   | 3                  | discretized

For Table 1 and B = A,
K_A(1) = {1, 3, 5} ∩ U ∩ {1, 2, 3, 4} = {1, 3},
K_A(2) = U ∩ {2, 3, 5, 7} ∩ U = {2, 3, 5, 7},
K_A(3) = {1, 3, 5} ∩ {2, 3, 5, 7} ∩ {1, 2, 3, 4} = {3},
K_A(4) = {4, 5, 6, 7} ∩ {4, 6, 7} ∩ U = {4, 6, 7},
K_A(5) = U ∩ {2, 3, 5, 7} ∩ {2, 4, 5, 7} = {2, 5, 7},
K_A(6) = {4, 5, 6, 7} ∩ {4, 6, 7} ∩ U = {4, 6, 7},
K_A(7) = {4, 5, 6, 7} ∩ ({2, 3, 5, 7} ∪ {4, 6, 7}) ∩ {2, 4, 5, 7} = {4, 5, 7}.
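The blocks and characteristic sets listed above can be recomputed mechanically. The sketch below encodes Table 1 and follows the three rules for "?", "*", and "−"; the encoding and all function names are assumptions made for illustration.

ATTRS = ["Temperature", "Headache", "Cough"]
TABLE = {  # Table 1; "?" = lost, "*" = do not care, "-" = attribute-concept
    1: {"Temperature": "high",   "Headache": "?",   "Cough": "yes", "Flu": "yes"},
    2: {"Temperature": "?",      "Headache": "yes", "Cough": "*",   "Flu": "yes"},
    3: {"Temperature": "high",   "Headache": "-",   "Cough": "yes", "Flu": "yes"},
    4: {"Temperature": "-",      "Headache": "no",  "Cough": "*",   "Flu": "no"},
    5: {"Temperature": "*",      "Headache": "yes", "Cough": "no",  "Flu": "no"},
    6: {"Temperature": "normal", "Headache": "no",  "Cough": "?",   "Flu": "no"},
    7: {"Temperature": "normal", "Headache": "-",   "Cough": "no",  "Flu": "no"},
}
SPECIAL = {"?", "*", "-"}

def V(x, a):
    """Known values of attribute a among cases with the same decision as case x."""
    return {TABLE[y][a] for y in TABLE
            if TABLE[y]["Flu"] == TABLE[x]["Flu"] and TABLE[y][a] not in SPECIAL}

def block(a, v):
    """Block [(a, v)] of the attribute-value pair (a, v) for an incomplete table."""
    return {x for x, row in TABLE.items()
            if row[a] == v or row[a] == "*" or (row[a] == "-" and v in V(x, a))}

def K(x, a):
    value = TABLE[x][a]
    if value in {"?", "*"}:
        return set(TABLE)                                    # K(x, a) = U
    if value == "-":
        values = V(x, a)
        return set().union(*(block(a, v) for v in values)) if values else set(TABLE)
    return block(a, value)

def characteristic_set(x):
    result = set(TABLE)
    for a in ATTRS:
        result &= K(x, a)
    return result

print({x: sorted(characteristic_set(x)) for x in TABLE})    # reproduces K_A(1), ..., K_A(7)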
The characteristic set K_B(x) may be interpreted as the set of cases that are
indistinguishable from x using all attributes from B and a given interpretation
of missing attribute values. Thus, K_A(x) is the set of all cases that cannot be
distinguished from x using all attributes. The characteristic relation R(B) is a
relation on U defined for x, y ∈ U as follows:

(x, y) ∈ R(B) if and only if y ∈ K_B(x).
The characteristic relation R(B) is reflexive but, in general, need not be
symmetric or transitive. Also, the characteristic relation R(B) is known if we
know the characteristic sets K_B(x) for all x ∈ U. In our example, R(A) = {(1, 1),
(1, 3), (2, 2), (2, 3), (2, 5), (2, 7), (3, 3), (4, 4), (4, 6), (4, 7), (5, 2), (5, 5), (5, 7),
(6, 4), (6, 6), (6, 7), (7, 4), (7, 5), (7, 7)}.
For a complete decision table, the characteristic relation R(B) is reduced to
the indiscernibility relation, see Pawlak (1991). Recently, characteristic relations
have been investigated in a number of papers, see, e.g., Li et al. (2007); Qi et al.
(2008); Song et al. (2008); Yang et al. (2007).
4 Lower and upper approximations
For incomplete decision tables, lower and upper approximations may be defined in
a few different ways. In this paper we use concept lower and upper approximations
for incomplete decision tables, following Grzymala-Busse (2003). Again, let X be
a concept, let B be a subset of the set A of all attributes, and let R(B) be the
characteristic relation of the incomplete decision table with characteristic sets
K_B(x), where x ∈ U. A concept B-lower approximation of the concept X,
introduced in Grzymala-Busse (2003), is defined as follows:

$$\underline{B}X = \bigcup \{K_B(x) \mid x \in X,\ K_B(x) \subseteq X\}.$$
Table 3: Number of missing attribute values in the input data set and in the data sets output by the Global and Local Closest Fit methods (static and dynamic replacement)

Data set           | Input data set | Global, static | Global, dynamic | Local, static | Local, dynamic
Bankruptcy         | 113            | 5              | 11              | 5             | 8
Breast cancer      | 869            | 117            | 175             | 149           | 184
Hepatitis          | 1,035          | 74             | 45              | 110           | 63
House              | 376            | 2              | 4               | 2             | 4
Image segmentation | 1,394          | 151            | 126             | 137           | 107
Iris               | 210            | 29             | 54              | 44            | 56
Lymphography       | 931            | 91             | 84              | 79            | 71
Wine               | 806            | 119            | 93              | 107           | 104
A concept B-upper approximation of the concept X is defined as follows:

$$\overline{B}X = \bigcup \{K_B(x) \mid x \in X,\ K_B(x) \cap X \neq \emptyset\} = \bigcup \{K_B(x) \mid x \in X\}.$$

The concept upper approximation was defined in Lin (1992) and Slowinski and
Vanderpooten (2000) as well. For the decision table presented in Table 1, the
concept A-lower and A-upper approximations are

$$\underline{A}\{1, 2, 3\} = \{1, 3\}, \qquad \underline{A}\{4, 5, 6, 7\} = \{4, 5, 6, 7\},$$
$$\overline{A}\{1, 2, 3\} = \{1, 2, 3, 5, 7\}, \qquad \overline{A}\{4, 5, 6, 7\} = \{2, 4, 5, 6, 7\}.$$
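Continuing the previous sketch, the two definitions translate directly into set operations; the snippet below reproduces the four approximations listed above and is an illustrative sketch, not the LERS implementation.

def lower_approximation(concept, characteristic_sets):
    """Union of K_B(x) over x in the concept such that K_B(x) is contained in the concept."""
    blocks = [k for x, k in characteristic_sets.items() if x in concept and k <= concept]
    return set().union(*blocks) if blocks else set()

def upper_approximation(concept, characteristic_sets):
    """Union of K_B(x) over all x in the concept."""
    return set().union(*(characteristic_sets[x] for x in concept))

K_A = {x: characteristic_set(x) for x in TABLE}   # characteristic_set and TABLE from the previous sketch
flu_yes, flu_no = {1, 2, 3}, {4, 5, 6, 7}
print(lower_approximation(flu_yes, K_A), upper_approximation(flu_yes, K_A))  # {1, 3} and {1, 2, 3, 5, 7}
print(lower_approximation(flu_no, K_A), upper_approximation(flu_no, K_A))    # {4, 5, 6, 7} and {2, 4, 5, 6, 7}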
5 MLEM2
Rules induced from the lower approximation of the concept certainly describe the
concept, hence such rules are called certain, Grzymala-Busse (1988). On the other
hand, rules induced from the upper approximation of the concept describe the
concept possibly, so these rules are called possible, Grzymala-Busse (1988).
In our experiments, we used the MLEM2 rule induction algorithm. LEM2
explores the search space of attribute-value pairs, for details see, e.g., Chan and
Grzymala-Busse (1991); Grzymala-Busse (1992).
MLEM2, a modified version of LEM2, induces rule sets directly from data
with numerical attributes and missing attribute values, see, e.g., Grzymala-Busse (2002).

Table 4: Error rates, Bankruptcy Data

Method            | Global, static | Global, dynamic | Local, static | Local, dynamic
?, certain rules  | 13.64%         | 7.58%           | 4.55%         | 3.03%
?, possible rules | 13.64%         | 7.58%           | 4.55%         | 3.03%
*, certain rules  | 13.64%         | 7.58%           | 4.55%         | 4.55%
*, possible rules | 13.64%         | 7.58%           | 4.55%         | 4.55%
−, certain rules  | 13.64%         | 7.58%           | 4.55%         | 4.55%
−, possible rules | 13.64%         | 7.58%           | 4.55%         | 4.55%

For Table 1, certain rules induced by MLEM2, using concept approximations, are:
2, 2, 2
(Temperature, high) & (Cough, yes) -> (Flu, yes)
1, 4, 4
(Temperature, normal) -> (Flu, no)
Possible rules, induced in the same way, are:
1, 2, 4
(Headache, yes) -> (Flu, yes)
1, 2, 3
(Temperature, high) -> (Flu, yes)
1, 4, 4
(Temperature, normal) -> (Flu, no)
1, 3, 4
(Cough, no) -> (Flu, no)
The above rules are in the LERS format: every rule is equipped with three
numbers, the total number of attribute-value pairs on the left-hand side of the
rule, the total number of cases correctly classified by the rule during training, and
the total number of training cases matching the left-hand side of the rule, see,
e.g., Grzymala-Busse (2002).
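To make the format concrete, the hypothetical snippet below encodes the two certain rules above together with their LERS triples and matches a case against them; the matching scheme is a simplification for illustration, not the full LERS classification strategy.

# (conditions, cases correctly classified, cases matching the rule), left-hand side, decision
CERTAIN_RULES = [
    ((2, 2, 2), {"Temperature": "high", "Cough": "yes"}, ("Flu", "yes")),
    ((1, 4, 4), {"Temperature": "normal"},               ("Flu", "no")),
]

def classify(case, rules):
    """Return the decisions of all rules whose left-hand side matches the case."""
    return [decision for _, conditions, decision in rules
            if all(case.get(a) == v for a, v in conditions.items())]

print(classify({"Temperature": "high", "Headache": "no", "Cough": "yes"}, CERTAIN_RULES))
# [('Flu', 'yes')]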
6 Experiments
We used eight data sets for our experiments, see Tables 2–3. All of these data sets,
except bankruptcy, are well-known data sets accessible at the University of California at
Irvine Data Depository, http://archive.ics.uci.edu/ml/. The data set bankruptcy
was collected by E. Altman and M. Heine at the New York University, School of
Business, in 1968. Note that only one of the data sets, house, was used for
experiments in its original form.
Table 5: Error rates, Breast Cancer Data (Slovenia)

Method            | Global, static | Global, dynamic | Local, static | Local, dynamic
?, certain rules  | 29.24%         | 31.41%          | 23.10%        | 24.91%
?, possible rules | 31.41%         | 29.24%          | 25.27%        | 25.63%
*, certain rules  | 29.24%         | 29.96%          | 22.38%        | 25.99%
*, possible rules | 31.05%         | 29.96%          | 27.08%        | 28.16%
−, certain rules  | 28.52%         | 30.32%          | 23.47%        | 27.44%
−, possible rules | 31.41%         | 29.60%          | 25.99%        | 28.16%
Table 6: Error rates, Hepatitis Data

Method            | Global, static | Global, dynamic | Local, static | Local, dynamic
?, certain rules  | 20.00%         | 28.39%          | 8.39%         | 10.32%
?, possible rules | 21.94%         | 25.81%          | 8.39%         | 12.26%
*, certain rules  | 23.23%         | 25.16%          | 14.19%        | 10.32%
*, possible rules | 23.23%         | 27.10%          | 11.61%        | 9.03%
−, certain rules  | 23.23%         | 25.16%          | 15.48%        | 9.68%
−, possible rules | 22.58%         | 27.10%          | 12.90%        | 10.97%
In the remaining seven data sets, some attribute values, about 35%, were randomly
removed; more exactly, the original attribute values were replaced by symbols of
missing attribute values. Three data sets, hepatitis, image segmentation, and wine,
were discretized using a discretization method based on agglomerative cluster
analysis, Chmielewski and Grzymala-Busse (1996).
In our previous experiments, Grzymala-Busse (2009), we tested seven methods
of handling missing attribute values:
1. Most common value for symbolic attributes and average value for numerical
attributes,
2. Most common value for symbolic attributes and average value for numerical
attributes, both restricted to a concept,
3. Global closest fit with static replacement of missing attribute values; if this
method outputs a data set with some missing attribute values, Method 5 is used,
4. Concept closest fit with static replacement of missing attribute values; if this
method outputs a data set with some missing attribute values, Method 5 is used,
5. Missing attribute values interpreted as lost values,
6. Missing attribute values interpreted as "do not care" conditions,
7. Missing attribute values interpreted as attribute-concept values.
Table 7: Error rates, House of Representatives Data (USA, 1984)

Method            | Global, static | Global, dynamic | Local, static | Local, dynamic
?, certain rules  | 3.92%          | 4.61%           | 4.38%         | 3.69%
?, possible rules | 4.15%          | 5.07%           | 4.61%         | 3.69%
*, certain rules  | 3.92%          | 4.61%           | 4.38%         | 3.69%
*, possible rules | 4.38%          | 5.07%           | 4.61%         | 3.69%
−, certain rules  | 3.92%          | 4.61%           | 4.38%         | 3.69%
−, possible rules | 4.38%          | 5.07%           | 4.61%         | 3.69%
Table 8: Error rates, Image Segmentation Data

Method            | Global, static | Global, dynamic | Local, static | Local, dynamic
?, certain rules  | 46.19%         | 55.24%          | 13.81%        | 13.81%
?, possible rules | 41.90%         | 45.24%          | 12.38%        | 12.38%
*, certain rules  | 60.48%         | 52.86%          | 16.67%        | 12.86%
*, possible rules | 40.95%         | 41.90%          | 11.90%        | 13.33%
−, certain rules  | 59.52%         | 55.71%          | 15.71%        | 15.71%
−, possible rules | 41.43%         | 41.43%          | 12.86%        | 14.76%
For all seven methods we induced both certain and possible rules. For every
data set the best of our seven methods was identified using the smallest error rate
(estimated by ten-fold cross validation) as an indicator of quality. Only three
methods out of seven were winners in this competition. Eight times the winner
was Method 4, six times the winner was Method 2, and four times the winner was
Method 3. The choice between certain and possible rules was not important.
For Methods 3 and 4, i.e., Global Closest Fit and Local Closest Fit, there exist
additional strategies, based on a choice between static and dynamic replacement of
missing attribute values and on three additional choices of interpreting the missing
attribute values remaining after the Closest Fit methods have been applied: as lost
values, "do not care" conditions, or attribute-concept values. Additionally, there are
two possible ways of rule induction: certain and possible rules. These additional
strategies were not tested in Grzymala-Busse (2009). In the experiments described
in this paper, we investigated these additional strategies.
Table 9: Error rates, Iris Plant Data

Method            | Global, static | Global, dynamic | Local, static | Local, dynamic
?, certain rules  | 10.00%         | 16.00%          | 2.67%         | 2.00%
?, possible rules | 10.67%         | 16.67%          | 2.67%         | 2.00%
*, certain rules  | 11.33%         | 14.00%          | 1.33%         | 2.67%
*, possible rules | 16.00%         | 14.00%          | 4.67%         | 4.67%
−, certain rules  | 13.33%         | 14.00%          | 6.67%         | 7.33%
−, possible rules | 16.67%         | 17.33%          | 6.67%         | 7.33%
Table 10: Error rates, Lymphography Data

Method            | Global, static | Global, dynamic | Local, static | Local, dynamic
?, certain rules  | 24.32%         | 27.70%          | 9.46%         | 9.46%
?, possible rules | 25.00%         | 22.97%          | 9.46%         | 9.46%
*, certain rules  | 25.00%         | 25.68%          | 7.43%         | 7.43%
*, possible rules | 25.68%         | 24.32%          | 7.43%         | 7.43%
−, certain rules  | 25.00%         | 26.35%          | 10.81%        | 10.81%
−, possible rules | 25.00%         | 25.68%          | 10.81%        | 10.81%
Thus, for every data set, 24 strategies of handling missing attribute values
were tested. Results are presented in Tables 4–11. As is clear from Tables 4–11,
Local Closest Fit always outperformed Global Closest Fit. Local Closest Fit is
designed to work on a single concept at a time, and this is the main reason for its
success. Additionally, one of the interpretations of missing attribute values, namely
the attribute-concept value, was outperformed by the remaining two interpretations,
lost values and "do not care" conditions, in all experiments. Static replacement of
missing attribute values was better for five out of seven data sets (with one tie);
however, the Wilcoxon matched-pairs signed-rank test indicates no significant
difference (at the 5% significance level using a two-tailed test). On the other hand, the
difference may be dramatic, as the wine data set shows: the error rate was only 2.25%
for static replacement and 15.17% for dynamic replacement. Similarly, the "do not
care" condition interpretation of missing attribute values was better for
five out of seven data sets (with one tie); again, the same test shows no significant
difference. Finally, as in our previous experiments, Grzymala-Busse (2009), we
may conclude that the choice between certain and possible rules is irrelevant:
certain rules were better for two data sets, possible rules for one, with five ties.
Table 11: Error rates, Wine Recognition Data

Method            | Global, static | Global, dynamic | Local, static | Local, dynamic
?, certain rules  | 13.48%         | 7.87%           | 5.62%         | 29.78%
?, possible rules | 13.48%         | 5.06%           | 5.62%         | 12.92%
*, certain rules  | 14.61%         | 7.30%           | 2.25%         | 29.21%
*, possible rules | 12.92%         | 5.06%           | 2.25%         | 15.17%
−, certain rules  | 16.85%         | 6.18%           | 3.37%         | 29.78%
−, possible rules | 13.48%         | 5.62%           | 3.37%         | 13.48%
The next task was to compare the best results accomplished using the Local
Closest Fit and Method 2, i.e., Most Common Value for symbolic attributes and
Average Value for numerical attributes, both restricted to a concept. Results are
presented in Table 12. For these two methods of handling missing attribute values,
even though Local Closest Fit outperformed Method 2 for six out of eight data
sets, the Wilcoxon matched-pairs signed-rank test shows that the performance on
our eight data sets does not differ significantly at the 5% significance level using a
two-tailed test.
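For reference, this significance test can be reproduced with a standard implementation of the Wilcoxon matched-pairs signed-rank test; the sketch below uses the best error rates from Table 12 and assumes that scipy is available.

from scipy.stats import wilcoxon

# Best error rates (%) from Table 12: Method 2 versus Local Closest Fit
method_2          = [7.58, 23.47, 12.90, 4.15, 9.52, 4.00, 6.08, 3.37]
local_closest_fit = [3.03, 22.38, 8.39, 3.69, 11.90, 1.33, 7.43, 2.25]

statistic, p_value = wilcoxon(method_2, local_closest_fit, alternative="two-sided")
print(statistic, p_value)  # p > 0.05: the difference is not significant at the 5% level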
7 Conclusions
A very successful method of handling missing attribute values (Method 2) and two
conventional methods based on the Closest Fit idea, enhanced by the rough set
methodology with 12 strategies each, were used in our experiments. It is clear that
Local Closest Fit outperformed Global Closest Fit and that the attribute-concept
value was the worst interpretation of missing attribute values.

Obviously, more experiments are needed, but for the time being it is clear that the
best methodology should be selected as either Local Closest Fit, equipped with
four options (both static and dynamic replacement of missing attribute values and
the lost value and "do not care" condition interpretations of missing attribute
values), or Most Common Value for symbolic attributes and Average Value for
numerical attributes, both restricted to a concept. The choice between certain
and possible rules seems to be unimportant.
References
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone (1984), Classification
and Regression Trees, Wadsworth & Brooks, Monterey, CA.
C. C. Chan and J. W. Grzymala-Busse (1991), On the attribute redundancy and the
learning programs ID3, PRISM, and LEM2, Technical report, Department of Computer Science, University of Kansas.
Table 12: Error rates, the best results

Data set           | Concept most common value and concept average value | Local Closest Fit
Bankruptcy         | 7.58%   | 3.03%
Breast cancer      | 23.47%  | 22.38%
Hepatitis          | 12.90%  | 8.39%
House              | 4.15%   | 3.69%
Image segmentation | 9.52%   | 11.90%
Iris               | 4.00%   | 1.33%
Lymphography       | 6.08%   | 7.43%
Wine               | 3.37%   | 2.25%
M. R. Chmielewski and J. W. Grzymala-Busse (1996), Global discretization of continuous attributes as preprocessing for machine learning, International Journal of Approximate Reasoning, 15(4):319–331.
J. W. Grzymala-Busse (1988), Knowledge acquisition under uncertainty—A rough set
approach, Journal of Intelligent & Robotic Systems, 1:3–16.
J. W. Grzymala-Busse (1991), On the unknown attribute values in learning from examples, in Proceedings of the ISMIS-91, 6th International Symposium on Methodologies
for Intelligent Systems, pp. 368–377.
J. W. Grzymala-Busse (1992), LERS—A system for learning from examples based on
rough sets, in R. Slowinski, editor, Intelligent Decision Support. Handbook of Applications and Advances of the Rough Set Theory, pp. 3–18, Kluwer Academic Publishers,
Dordrecht, Boston, London.
J. W. Grzymala-Busse (2002), MLEM2: A new algorithm for rule induction from
imperfect data, in Proceedings of the 9th International Conference on Information
Processing and Management of Uncertainty in Knowledge-Based Systems, pp. 243–
250.
J. W. Grzymala-Busse (2003), Rough set strategies to data with missing attribute
values, in Workshop Notes, Foundations and New Directions of Data Mining, in conjunction with the 3-rd International Conference on Data Mining, pp. 56–63.
J. W. Grzymala-Busse (2004), Three approaches to missing attribute values—A rough
set perspective, in Proceedings of the Workshop on Foundation of Data Mining, in
conjunction with the Fourth IEEE International Conference on Data Mining, pp. 55–
62.
J. W. Grzymala-Busse (2009), A comparison of traditional and rough set approaches
to missing attribute values in data mining, accepted for Data Mining 2009, the 10-th
International Conference on Data Mining, Detection, Protection and Security, Royal
Mare Village, Crete.
J. W. Grzymala-Busse, W. J. Grzymala-Busse, and L. K. Goodwin (2002), A comparison of three closest fit approaches to missing attribute values in preterm birth
data, International Journal of Intelligent Systems, 17(2):125–134.
J. W. Grzymala-Busse and A. Y. Wang (1997), Modified algorithms LEM1 and LEM2
for rule induction from data with missing attribute values, in Proceedings of the Fifth
International Workshop on Rough Sets and Soft Computing (RSSC’97) at the Third
Joint Conference on Information Sciences (JCIS’97), pp. 69–72.
M. Kryszkiewicz (1999), Rules in incomplete information systems, Information Sciences, 113(3-4):271–292.
T. Li, D. Ruan, W. Geert, J. Song, and Y. Xu (2007), A rough sets based characteristic
relation approach for dynamic attribute generalization in data mining, Knowledge-Based Systems, 20(5):485–494.
T. Y. Lin (1992), Topological and fuzzy rough sets, in R. Slowinski, editor, Intelligent
Decision Support. Handbook of Applications and Advances of the Rough Sets Theory,
pp. 287–304, Kluwer Academic Publishers, Dordrecht, Boston, London.
Z. Pawlak (1991), Rough Sets. Theoretical Aspects of Reasoning about Data, Kluwer
Academic Publishers, Dordrecht, Boston, London.
Y. S. Qi, L. Wei, H. J. Sun, Y. Q. Song, and Q. S. Sun (2008), Characteristic relations in
generalized incomplete information systems, in International Workshop on Knowledge
Discovery and Data Mining, pp. 519–523.
J. R. Quinlan (1993), C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, CA.
R. Slowinski and D. Vanderpooten (2000), A generalized definition of rough approximations based on similarity, IEEE Transactions on Knowledge and Data Engineering,
12:331–336.
J. Song, T. Li, and D. Ruan (2008), A new decision tree construction using the cloud
transform and rough sets, in Proceedings of the Rough Sets and Knowledge Technology
Conference, pp. 524–531.
J. Stefanowski and A. Tsoukias (1999), On the extension of rough sets under incomplete information, in Proceedings of the RSFDGrC’1999, 7th International Workshop
on New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, pp.
73–81.
X. B. Yang, J. Y. Yang, C. Wu, and D. J. Yu (2007), Further investigation of characteristic relation in incomplete information systems, Systems Engineering - Theory &
Practice, 27(6):155–160.