RULE INDUCTION USING PROBABILISTIC APPROXIMATIONS AND DATA WITH MISSING ATTRIBUTE VALUES

Patrick G. Clark
Department of Electrical Engineering and Computer Science
University of Kansas, Lawrence, KS 66045, USA
email: [email protected]

Jerzy W. Grzymala-Busse
Department of Electrical Engineering and Computer Science
University of Kansas, Lawrence, KS 66045, USA
and
Institute of Computer Science, Polish Academy of Sciences, 01-237 Warsaw, Poland
email: [email protected]

ABSTRACT
This paper presents results of experiments on rule induction from incomplete data (data with missing attribute values) using probabilistic approximations. Such approximations, studied broadly for many years, are fundamental concepts of variable precision rough set theory and similar models for dealing with inconsistent data sets. Our main objective was to study how useful probabilistic approximations other than the ordinary lower and upper approximations are. Our results are rather pessimistic: for eight data sets with two types of missing attribute values, in only one case out of 16 were some of these probabilistic approximations better than the ordinary approximations. On the other hand, in another case, some probabilistic approximations were worse than the ordinary approximations. Additionally, we studied how many different probabilistic approximations may exist for a given concept of a data set.

KEY WORDS
Data mining, rule induction, rough set theory, probabilistic approximations, parameterized approximations, incomplete data.

1 Introduction

The idea of lower and upper approximations is one of the most fundamental concepts of rough set theory. A probabilistic (or parameterized) approximation, associated with a probability (parameter) α, is a generalization of the ordinary lower and upper approximations. If the probability α is quite small, the probabilistic approximation is reduced to an upper approximation; if it is equal to one, the probabilistic approximation becomes a lower approximation [7]. Probabilistic approximations have been explored for many years in areas such as variable precision rough sets, Bayesian rough sets, decision-theoretic rough sets, etc. The idea was introduced in [18] and then studied in many papers [11, 14, 15, 16, 19, 20, 21, 22]. Many authors discussed theoretical properties of probabilistic approximations. Only recently were probabilistic approximations, for completely specified and inconsistent data sets, experimentally validated in [1].

Incomplete data sets are inconsistent in the sense that it is necessary to compute approximations of all concepts for rule induction. For incomplete data sets, probabilistic approximations were generalized and re-defined in [7]. However, this paper, for the first time, presents results of experiments on probabilistic approximations applied to such data.

The main objective of this paper is to study how useful probabilistic approximations are for data with missing attribute values. Ordinary lower and upper approximations have proved their usefulness in many applications, and it is a well-known fact that rough-set approaches to missing attribute values, based on ordinary lower and upper approximations, are of the same quality as the best traditional methods [8]. This paper compares probabilistic approximations with ordinary ones.

We will distinguish two kinds of missing attribute values: lost values and "do not care" conditions. If an attribute value was originally given but is no longer accessible (e.g., it was erased or forgotten), we will call it lost.
If a data set contains lost values, we will try to induce rules from the existing, specified data. Another interpretation of a missing attribute value is based on a refusal to answer a question; e.g., some people may refuse to disclose their salary range. Such a value will be called a "do not care" condition. For the analysis of data sets with "do not care" conditions we will replace such missing attribute values by all possible attribute values.

For incomplete data sets there exist many definitions of approximations. Following [7], we will use so-called concept approximations, generalized to concept probabilistic approximations in [7]. For a given concept of a data set, the number of distinct probabilistic approximations is quite limited. In this paper we report the number of distinct probabilities associated with all characteristic sets for all concepts of eight real-life data sets, with two interpretations of missing attribute values: lost values and "do not care" conditions. Since characteristic sets are the building blocks of probabilistic approximations, such numbers are upper limits on the number of corresponding probabilistic approximations.

2 Complete Data Sets

We assume that the input data sets are presented in the form of a decision table. An example of a decision table is shown in Table 1. Rows of the decision table represent cases, while columns are labeled by variables. The set of all cases will be denoted by U. In Table 1, U = {1, 2, 3, 4, 5, 6, 7}. Independent variables are called attributes and the dependent variable is called a decision, denoted by d. The set of all attributes will be denoted by A. In Table 1, A = {Temperature, Headache, Cough}. The value of an attribute a for a case x will be denoted by a(x).

Table 1. A complete data set

            Attributes                        Decision
Case   Temperature   Headache   Cough   Flu
1      high          yes        no      yes
2      very-high     yes        yes     yes
3      high          no         no      yes
4      high          yes        yes     yes
5      normal        yes        yes     no
6      normal        no         no      no
7      high          no         no      no

One of the most important ideas of rough set theory [13] is the indiscernibility relation. Let B be a nonempty subset of A. The indiscernibility relation R(B) is a relation on U defined for x, y ∈ U as follows: (x, y) ∈ R(B) if and only if a(x) = a(y) for all a ∈ B. The indiscernibility relation R(B) is an equivalence relation. Equivalence classes of R(B) are called elementary sets of B and are denoted by [x]_B. For Table 1, all elementary sets of A are {1}, {2}, {3, 7}, {4}, {5} and {6}. A subset of U is called B-definable if it is a union of elementary sets of B.

The set X of all cases defined by the same value of the decision d is called a concept. For example, the concept associated with the value yes of the decision Flu is the set {1, 2, 3, 4}. This concept is not A-definable. The largest B-definable set contained in X is called the B-lower approximation of X, denoted by $\underline{appr}_B(X)$, and is defined as ∪{[x]_B | [x]_B ⊆ X}, while the smallest B-definable set containing X, denoted by $\overline{appr}_B(X)$, is called the B-upper approximation of X and is defined as ∪{[x]_B | [x]_B ∩ X ≠ ∅}.

For a variable a and its value v, (a, v) is called a variable-value pair. A block of (a, v), denoted by [(a, v)], is the set {x ∈ U | a(x) = v} [4].
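The definitions above translate directly into set operations. The following minimal Python sketch (the dictionary encoding of Table 1 and the helper names are illustrative, not part of the paper) computes elementary sets and B-lower and B-upper approximations; running it reproduces the A-approximations of the concept [(Flu, yes)] quoted in the next paragraph.

```python
# A sketch of elementary sets and lower/upper approximations for Table 1.
# The dictionary below simply re-encodes Table 1; names are illustrative.
TABLE1 = {
    1: {"Temperature": "high",      "Headache": "yes", "Cough": "no"},
    2: {"Temperature": "very-high", "Headache": "yes", "Cough": "yes"},
    3: {"Temperature": "high",      "Headache": "no",  "Cough": "no"},
    4: {"Temperature": "high",      "Headache": "yes", "Cough": "yes"},
    5: {"Temperature": "normal",    "Headache": "yes", "Cough": "yes"},
    6: {"Temperature": "normal",    "Headache": "no",  "Cough": "no"},
    7: {"Temperature": "high",      "Headache": "no",  "Cough": "no"},
}

def elementary_sets(table, attributes):
    """Group cases that are indiscernible with respect to the given attributes."""
    classes = {}
    for case, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes.setdefault(key, set()).add(case)
    return list(classes.values())

def lower_approximation(classes, concept):
    """Union of elementary sets entirely contained in the concept."""
    return set().union(*[c for c in classes if c <= concept] or [set()])

def upper_approximation(classes, concept):
    """Union of elementary sets that intersect the concept."""
    return set().union(*[c for c in classes if c & concept] or [set()])

A = ["Temperature", "Headache", "Cough"]
classes = elementary_sets(TABLE1, A)
flu_yes = {1, 2, 3, 4}
print(lower_approximation(classes, flu_yes))   # {1, 2, 4}
print(upper_approximation(classes, flu_yes))   # {1, 2, 3, 4, 7}
```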
For Table 1, there are two concepts: the blocks [(Flu, yes)] = {1, 2, 3, 4} and [(Flu, no)] = {5, 6, 7}. The A-lower approximation of the concept [(Flu, yes)] is {1, 2, 4} and its A-upper approximation is {1, 2, 3, 4, 7}.

3 Incomplete Data Sets

In this paper we distinguish between two interpretations of missing attribute values: lost values, denoted by "?", and "do not care" conditions, denoted by "*". We assume that lost values were erased or are unreadable and that for data mining we use only the remaining, specified values [10, 17]. "Do not care" conditions are interpreted as uncommitted (unpledged, neutral) [3, 12]. Such missing attribute values will be replaced by all possible attribute values. An example of an incomplete data set is shown in Table 2.

Table 2. An incomplete data set

            Attributes                        Decision
Case   Temperature   Headache   Cough   Flu
1      high          yes        ?       yes
2      very-high     *          yes     yes
3      *             no         no      yes
4      high          yes        yes     yes
5      normal        ?          yes     no
6      ?             no         no      no
7      high          no         *       no

For incomplete decision tables, the definition of a block of an attribute-value pair is modified in the following way.

• If for an attribute a there exists a case x such that a(x) = ?, i.e., the corresponding value is lost, then the case x should not be included in any block [(a, v)] for any value v of attribute a.

• If for an attribute a there exists a case x such that the corresponding value is a "do not care" condition, i.e., a(x) = *, then the case x should be included in the blocks [(a, v)] for all specified values v of attribute a.

For the data set from Table 2, the blocks of attribute-value pairs are:

[(Temperature, high)] = {1, 3, 4, 7},
[(Temperature, very-high)] = {2, 3},
[(Temperature, normal)] = {3, 5},
[(Headache, yes)] = {1, 2, 4},
[(Headache, no)] = {2, 3, 6, 7},
[(Cough, no)] = {3, 6, 7},
[(Cough, yes)] = {2, 4, 5, 7}.

For a case x ∈ U and B ⊆ A, the characteristic set K_B(x) is defined as the intersection of the sets K(x, a) for all a ∈ B, where the set K(x, a) is defined in the following way:

• If a(x) is specified, then K(x, a) is the block [(a, a(x))] of attribute a and its value a(x).

• If a(x) = ? or a(x) = *, then K(x, a) = U, where U is the set of all cases.

For Table 2 and B = A, the characteristic sets are K_A(1) = {1, 4}, K_A(2) = {2}, K_A(3) = {3, 6, 7}, K_A(4) = {4}, K_A(5) = {5}, K_A(6) = {3, 6, 7} and K_A(7) = {3, 7}.

Note that for incomplete data there are a few possible ways to define approximations [6, 9]; we use concept approximations [7]. The B-concept lower approximation of the concept X, denoted by $\underline{B}X$, is defined as

  ∪{K_B(x) | x ∈ X, K_B(x) ⊆ X}.

The B-concept upper approximation of the concept X, denoted by $\overline{B}X$, is defined as

  ∪{K_B(x) | x ∈ X, K_B(x) ∩ X ≠ ∅} = ∪{K_B(x) | x ∈ X}.

For Table 2, the A-concept lower approximations of the two concepts {1, 2, 3, 4} and {5, 6, 7} are {1, 2, 4} and {5}, respectively, while the corresponding A-concept upper approximations are {1, 2, 3, 4, 6, 7} and {3, 5, 6, 7}.
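A minimal sketch of these computations, with an illustrative encoding of Table 2 ("?" and "*" stored as literal strings), is given below; it reproduces the blocks, characteristic sets, and A-concept approximations listed above.

```python
# A sketch of blocks, characteristic sets and concept approximations (Table 2).
TABLE2 = {
    1: {"Temperature": "high",      "Headache": "yes", "Cough": "?"},
    2: {"Temperature": "very-high", "Headache": "*",   "Cough": "yes"},
    3: {"Temperature": "*",         "Headache": "no",  "Cough": "no"},
    4: {"Temperature": "high",      "Headache": "yes", "Cough": "yes"},
    5: {"Temperature": "normal",    "Headache": "?",   "Cough": "yes"},
    6: {"Temperature": "?",         "Headache": "no",  "Cough": "no"},
    7: {"Temperature": "high",      "Headache": "no",  "Cough": "*"},
}

def block(table, attribute, value):
    """[(a, v)]: cases with a(x) = v; '*' joins every block, '?' joins none."""
    return {x for x, row in table.items()
            if row[attribute] == value or row[attribute] == "*"}

def characteristic_set(table, x, attributes):
    """K_B(x): intersection of K(x, a) over a in B."""
    k = set(table)                           # K(x, a) = U for '?' and '*'
    for a in attributes:
        v = table[x][a]
        if v not in ("?", "*"):              # specified value: intersect with its block
            k &= block(table, a, v)
    return k

def concept_lower(table, concept, attributes):
    """Union of K(x), x in the concept, with K(x) contained in the concept."""
    ks = [characteristic_set(table, x, attributes) for x in concept]
    return set().union(*[k for k in ks if k <= concept] or [set()])

def concept_upper(table, concept, attributes):
    """Union of K(x) over all x in the concept."""
    return set().union(*[characteristic_set(table, x, attributes) for x in concept])

A = ["Temperature", "Headache", "Cough"]
print(concept_lower(TABLE2, {1, 2, 3, 4}, A))   # {1, 2, 4}
print(concept_upper(TABLE2, {1, 2, 3, 4}, A))   # {1, 2, 3, 4, 6, 7}
```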
4 Probabilistic Approximations

In this paper we explore all probabilistic approximations that can be defined for a given concept X. For completely specified data sets, a probabilistic approximation is defined as follows:

  appr_α(X) = ∪{[x] | x ∈ U, P(X | [x]) ≥ α},

where [x] stands for [x]_A and α is a parameter, 0 < α ≤ 1. We excluded the case α = 0, since then appr_α(X) = U for any nonempty X. Since we consider all possible values of α, our definition of appr_α(X) covers both lower and upper probabilistic approximations. For a discussion of how this definition is related to variable precision asymmetric rough sets, see [1, 7]. Note that if α = 1, the probabilistic approximation becomes the standard lower approximation, and if α is small and close to 0 (in our experiments it was 0.001), the same definition describes the standard upper approximation.

For incomplete data sets, a B-concept probabilistic approximation is defined by the following formula [7]:

  ∪{K_B(x) | x ∈ X, Pr(X | K_B(x)) ≥ α}.

Since we will discuss only A-concept probabilistic approximations, we will call them, for simplicity, probabilistic approximations. For Table 2 and the concept X = [(Flu, yes)] = {1, 2, 3, 4}, the conditional probabilities P(X | K(x)), where K(x) = K_A(x) for x ∈ U, are presented in Table 3.

Table 3. Conditional probabilities

K(x)         P({1, 2, 3, 4} | K(x))
{1, 4}       1.0
{2}          1.0
{4}          1.0
{3, 7}       0.5
{3, 6, 7}    0.333
{5}          0

Thus, for the concept {1, 2, 3, 4} we may define only two distinct probabilistic approximations:

  appr_0.333({1, 2, 3, 4}) = {1, 2, 3, 4, 6, 7},
  appr_0.5({1, 2, 3, 4}) = {1, 2, 4}.

Note that there are only two distinct probabilistic approximations for the concept [(Flu, no)] as well.

5 Rule Induction with LERS

The LERS (Learning from Examples based on Rough Sets) data mining system [4, 5] starts by computing lower and upper approximations for every concept and then induces rules using the MLEM2 (Modified Learning from Examples Module version 2) rule induction algorithm. Rules induced from lower and upper approximations are called certain and possible, respectively [2]. MLEM2 explores the search space of attribute-value pairs. Its input data set is a lower or upper approximation of a concept. In general, MLEM2 computes a local covering and then converts it into a rule set [5].

In order to induce probabilistic rules we have to modify the input data sets. For every probabilistic approximation of a concept X = [(d, w)], the cases belonging to the approximation remain unchanged (every entry is the same as in the original data set). For all remaining cases, the decision value is set to a special value, e.g., SPECIAL. Then a possible rule set [4] is induced using the MLEM2 rule induction algorithm. From the induced rule set, only rules with (d, w) on the right-hand side survive; all remaining rules (for other values of d and for the value SPECIAL) are deleted. The final rule set is the union of all rule sets computed in this way, separately for all values of d.
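The construction just described (keep the cases in appr_α(X) unchanged, relabel everything else as SPECIAL) can be sketched on top of the helpers from the previous listing (characteristic_set, TABLE2 and A are reused from there; function names are illustrative, and the MLEM2 rule induction step itself is not reproduced):

```python
# A sketch of the probabilistic approximation and of the SPECIAL relabeling
# used to prepare the input for rule induction (MLEM2 itself is omitted).

def probabilistic_approximation(table, concept, attributes, alpha):
    """appr_alpha(X): union of K_A(x), x in X, with Pr(X | K_A(x)) >= alpha."""
    result = set()
    for x in concept:
        k = characteristic_set(table, x, attributes)   # from the previous sketch
        if len(k & concept) / len(k) >= alpha:
            result |= k
    return result

def relabel_for_rule_induction(table, decisions, decision_name, approximation):
    """Copy the data set; cases outside the approximation get decision SPECIAL."""
    modified = {}
    for x, row in table.items():
        new_row = dict(row)
        new_row[decision_name] = decisions[x] if x in approximation else "SPECIAL"
        modified[x] = new_row
    return modified

FLU = {1: "yes", 2: "yes", 3: "yes", 4: "yes", 5: "no", 6: "no", 7: "no"}
appr = probabilistic_approximation(TABLE2, {1, 2, 3, 4}, A, alpha=0.5)
print(appr)                                            # {1, 2, 4}
table4 = relabel_for_rule_induction(TABLE2, FLU, "Flu", appr)
# table4 now matches Table 4: cases 3, 5, 6 and 7 have the decision SPECIAL.
```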
For example, if we want to induce probabilistic rules with α = 0.5 and X = [(Flu, yes)] = {1, 2, 3, 4} for the data set presented in Table 2, we should construct the decision table presented as Table 4.

Table 4. A decision table

            Attributes                        Decision
Case   Temperature   Headache   Cough   Flu
1      high          yes        ?       yes
2      very-high     *          yes     yes
3      *             no         no      SPECIAL
4      high          yes        yes     yes
5      normal        ?          yes     SPECIAL
6      ?             no         no      SPECIAL
7      high          no         *       SPECIAL

From Table 4, the MLEM2 rule induction algorithm induced the following possible rule with (Flu, yes) on the right-hand side:

  1, 3, 3
  (Headache, yes) → (Flu, yes).

Rules for the remaining two concepts must be computed separately, from different tables. Rules are presented in the LERS format: every rule is associated with three numbers, the total number of attribute-value pairs on the left-hand side of the rule, the total number of cases correctly classified by the rule during training, and the total number of training cases matching the left-hand side of the rule, i.e., the rule domain size.

6 Experiments

For our experiments we used eight real-life data sets that are available from the University of California at Irvine Machine Learning Repository. These data sets were enhanced by replacing 35% of the existing attribute values by missing attribute values, separately by lost values and separately by "do not care" conditions, see Table 5. Thus, for any data set from Table 5, two data sets were used for experiments, with missing attribute values interpreted as lost values and as "do not care" conditions, respectively.

Table 5. Data sets used for experiments

                       Number of
Data set               cases   attributes   concepts
Bankruptcy             66      5            2
Breast cancer          277     9            2
Echocardiogram         74      7            2
Image segmentation     210     19           7
Hepatitis              155     19           2
Iris                   150     4            3
Lymphography           148     18           4
Wine recognition       178     13           3

The main objective of our research was to test whether probabilistic approximations, different from lower and upper approximations, are truly better than lower and upper approximations. To accomplish this objective, we conducted experiments of a single ten-fold cross validation, increasing the parameter α with increments of 0.1 from 0 to 1.0. For a given data set, in all of these eleven experiments we used identical ten pairs of larger (90%) and smaller (10%) data sets, see Figures 1–4.

[Figure 1. Error rate for bankruptcy and breast cancer data sets (error rate vs. parameter alpha, curves for "*" and "?" variants)]
[Figure 2. Error rate for echocardiogram and hepatitis data sets (error rate vs. parameter alpha, curves for "*" and "?" variants)]
[Figure 3. Error rate for image segmentation and iris data sets (error rate vs. parameter alpha, curves for "*" and "?" variants)]
[Figure 4. Error rate for lymphography and wine recognition data sets (error rate vs. parameter alpha, curves for "*" and "?" variants)]

If, during such a sequence of eleven experiments, the error rate was smaller than the minimum of the error rates for lower and upper approximations, or larger than the maximum of the error rates for lower and upper approximations, we selected more precise values of the parameter α to make sure that we were reaching an extreme. For a value suspected to be an extreme, we conducted an additional 30 experiments of ten-fold cross validation. We compared the averages and standard deviations using the standard statistical test for the difference between two averages (a two-tailed test with a 5% significance level).

For example, for the bankruptcy data set affected by lost values, denoted by "?", the error rate was constant, so there was no need for the additional 30 experiments. But for the same data set affected by "do not care" conditions, denoted by "*", it is clear that we should look more closely at the parameter α around the value 0.8. Results are presented in Table 6.

Table 6. Bankruptcy, "do not care" conditions

α        Error rate   Standard deviation
0.001    28.43        5.580
0.75     41.26        3.933
0.8675   46.82        4.103
1.0      42.08        5.748

Using the standard statistical test for the difference between two averages (two-tailed, with a significance level of 5%) we may conclude that the lower approximation (α = 1.0) is worse than the upper approximation (α = 0.001). For the rest of the paper, whenever we quote this statistical test, it is always the two-tailed test with a significance level of 5%. The same test indicates that the probabilistic approximation associated with α = 0.8675 is worse than the lower approximation (α = 1.0) as well as the upper approximation (α = 0.001). The results of the remaining runs of 30 experiments of ten-fold cross validation are presented in Tables 7–14.
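A minimal sketch of the two-average comparison used throughout this section, applied to the bankruptcy figures from Table 6, is given below. It assumes 30 cross-validation runs per value of α, treats the reported standard deviations as sample standard deviations, and uses a normal approximation for the test statistic; the paper does not state which exact variant of the test was used.

```python
# A sketch of the two-tailed test for the difference between two averages,
# applied to the bankruptcy, "do not care" results from Table 6.
# Assumptions: 30 runs per alpha, sample standard deviations, normal
# approximation; the exact test variant used in the paper is not specified.
from math import sqrt

def difference_test(mean1, std1, mean2, std2, n=30, critical=1.96):
    """Return (significant, z): whether the averages differ at the 5% level."""
    z = (mean1 - mean2) / sqrt(std1 ** 2 / n + std2 ** 2 / n)
    return abs(z) > critical, z

upper = (28.43, 5.580)   # alpha = 0.001 (upper approximation)
lower = (42.08, 5.748)   # alpha = 1.0   (lower approximation)
peak  = (46.82, 4.103)   # alpha = 0.8675 (suspected extreme)

print(difference_test(*upper, *lower))   # significant: upper better than lower
print(difference_test(*peak, *lower))    # significant: alpha = 0.8675 worse than lower
```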
Table 7. Breast cancer, lost values

α        Error rate   Standard deviation
0.001    30.07        0.900
0.4      28.52        1.102
0.9      28.05        1.318
1.0      28.28        1.135

Table 8. Breast cancer, "do not care" conditions

α        Error rate   Standard deviation
0.001    28.93        0.605
0.2      28.78        0.501
0.45     29.38        0.835
0.5      29.38        0.555
1.0      29.03        0.877

Table 9. Hepatitis, "do not care" conditions

α        Error rate   Standard deviation
0.001    19.66        1.242
0.2      19.61        0.952
0.4      20.08        1.230
0.7      20.95        1.158
1.0      20.56        0.893

Table 10. Image segmentation, lost values

α        Error rate   Standard deviation
0.001    45.81        2.621
0.45     45.71        3.048
1.0      45.90        2.704

Table 11. Iris, lost values

α        Error rate   Standard deviation
0.001    13.73        1.915
0.75     13.87        1.520
1.0      13.51        1.485

Table 12. Lymphography, "do not care" conditions

α        Error rate   Standard deviation
0.001    28.24        2.568
0.8      29.71        3.374
1.0      30.83        2.803

Table 13. Wine recognition, lost values

α        Error rate   Standard deviation
0.001    18.19        2.031
0.6      17.70        1.661
0.8      17.04        1.485
1.0      17.26        1.877

Table 14. Wine recognition, "do not care" conditions

α        Error rate   Standard deviation
0.001    26.24        3.095
0.4      22.47        2.945
0.8      23.49        3.257
1.0      37.68        4.162

The wine recognition data set affected by "do not care" conditions represents the only case where probabilistic approximations (e.g., for α = 0.4 and α = 0.8) are better than both the lower and upper approximations. For the remaining 14 cases (combinations of a data set and a type of missing attribute values), the probabilistic approximations are not worse than the worse of the lower and upper approximations, and not better than the better of the two.

Our secondary objective was to test how many different probabilistic approximations exist for a given concept of a real-life data set. The results are listed in Tables 15–22; for every concept, each table gives the upper limit for the number of distinct probabilistic approximations, separately for lost values ("?") and "do not care" conditions ("*").

Table 15. Bankruptcy

Concept        ?      *
bankruptcy     1      18
survival       1      18

Table 16. Breast cancer

Concept                 ?      *
recurrence-events       15     128
no-recurrence-events    15     128

Table 17. Echocardiogram

Concept    ?      *
one        1      13
zero       1      13

Table 18. Hepatitis

Concept    ?      *
yes        1      47
no         1      47

Table 19. Image segmentation

Concept      ?      *
sky          1      47
cement       3      76
window       2      58
brickface    2      66
foliage      2      75
path         3      63
grass        1      56

Table 20. Iris

Concept            ?      *
iris-setosa        3      67
iris-virginica     6      62
iris-versicolor    7      65

Table 21. Lymphography

Concept    ?      *
one        1      3
two        2      26
three      2      26
four       1      1

Table 22. Wine recognition

Concept    ?      *
one        4      96
two        3      69
three      6      106
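One plausible reading of these upper limits, consistent with the toy example in Section 4, is the number of distinct conditional probabilities Pr(X | K_A(x)) over the cases x of a concept X, since the probabilistic approximations of X can change only at those values of α. A minimal sketch of that count, reusing the helpers and encoding from the earlier listings (this counting rule is our interpretation, not stated explicitly by the paper), might look as follows:

```python
# A sketch of counting how many distinct probabilistic approximations a
# concept can have: approximations of X change only at distinct values of
# Pr(X | K_A(x)) for x in X, so this count is an upper limit.
def distinct_probability_count(table, concept, attributes):
    probabilities = set()
    for x in concept:
        k = characteristic_set(table, x, attributes)   # from the earlier sketch
        probabilities.add(len(k & concept) / len(k))
    return len(probabilities)

# For Table 2 and the concept [(Flu, yes)] = {1, 2, 3, 4} this yields 2,
# matching the two distinct approximations found in Section 4.
print(distinct_probability_count(TABLE2, {1, 2, 3, 4}, A))   # 2
```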
7 Conclusions

Our main objective was to test whether probabilistic approximations, different from lower and upper approximations, are truly better than lower and upper approximations. As follows from our experiments, for only one out of 16 possibilities (the wine recognition data set combined with missing attribute values interpreted as "do not care" conditions) are some of the proper probabilistic approximations better than the ordinary lower and upper approximations. On the other hand, for another possibility (the bankruptcy data set combined with missing attribute values interpreted as "do not care" conditions) some of the proper probabilistic approximations are worse than the ordinary lower and upper approximations.

Additionally, we may conclude that upper approximations are better than lower approximations for the following six data sets, all associated with "do not care" conditions: bankruptcy, echocardiogram, hepatitis, iris, lymphography, and wine recognition. For only one combination (breast cancer and lost values) the lower approximation was better than the upper approximation. For the remaining nine combinations the difference is not statistically significant. Furthermore, for five data sets (bankruptcy, image segmentation, iris, lymphography and wine recognition) associated with lost values, the error rate is smaller than the error rate for the same data sets associated with "do not care" conditions. In the case of the lymphography data set, the conclusion that lost values are better than "do not care" conditions was a result of the Wilcoxon matched-pairs signed-ranks test (again, a two-tailed test at the 5% level of significance).

We also conducted experiments to test, for a given concept, how many different probabilistic approximations there exist for a combination of the data set and the type of missing attribute values. Note that the upper limits for the number of distinct probabilistic approximations are always smaller for lost values than for "do not care" conditions. This fact explains why the error rate for data sets with lost values is constant or shows only small deviations in Figures 1–4. If the only conditional probabilities associated with the characteristic sets are 0 and 1, then all probabilistic approximations are equal to both the lower and the upper approximation.

In the future we plan to study probabilistic approximations using a series of incomplete data sets with an incrementally increased number of missing attribute values.

References

[1] P. G. Clark and J. W. Grzymala-Busse. Experiments on probabilistic approximations. In Proceedings of the 2011 IEEE International Conference on Granular Computing, pages 144–149, 2011.
[2] J. W. Grzymala-Busse. Knowledge acquisition under uncertainty—A rough set approach. Journal of Intelligent & Robotic Systems, 1:3–16, 1988.
[3] J. W. Grzymala-Busse. On the unknown attribute values in learning from examples. In Proceedings of the ISMIS-91, 6th International Symposium on Methodologies for Intelligent Systems, pages 368–377, 1991.
[4] J. W. Grzymala-Busse. LERS—a system for learning from examples based on rough sets. In R. Slowinski, editor, Intelligent Decision Support. Handbook of Applications and Advances of the Rough Set Theory, pages 3–18. Kluwer Academic Publishers, Dordrecht, Boston, London, 1992.
[5] J. W. Grzymala-Busse. MLEM2: A new algorithm for rule induction from imperfect data. In Proceedings of the 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, pages 243–250, 2002.
[6] J. W. Grzymala-Busse. Three approaches to missing attribute values—a rough set perspective. In Proceedings of the Workshop on Foundation of Data Mining, in conjunction with the Fourth IEEE International Conference on Data Mining, pages 55–62, 2004.
[7] J. W. Grzymala-Busse. Generalized parameterized approximations. In Proceedings of the RSKT 2011, the 6th International Conference on Rough Sets and Knowledge Technology, pages 136–145, 2011.
[8] J. W. Grzymala-Busse and M. Hu. A comparison of several approaches to missing attribute values in data mining. In Proceedings of the Second International Conference on Rough Sets and Current Trends in Computing, pages 340–347, 2000.
[9] J. W. Grzymala-Busse and W. Rzasa. A local version of the MLEM2 algorithm for rule induction. Fundamenta Informaticae, 100:99–116, 2010.
[10] J. W. Grzymala-Busse and A. Y. Wang. Modified algorithms LEM1 and LEM2 for rule induction from data with missing attribute values. In Proceedings of the Fifth International Workshop on Rough Sets and Soft Computing (RSSC'97) at the Third Joint Conference on Information Sciences (JCIS'97), pages 69–72, 1997.
[11] J. W. Grzymala-Busse and W. Ziarko. Data mining based on rough sets. In J. Wang, editor, Data Mining: Opportunities and Challenges, pages 142–173. Idea Group Publ., Hershey, PA, 2003.
[12] M. Kryszkiewicz. Rules in incomplete information systems. Information Sciences, 113(3-4):271–292, 1999.
[13] Z. Pawlak. Rough sets. International Journal of Computer and Information Sciences, 11:341–356, 1982.
[14] Z. Pawlak and A. Skowron. Rough sets: Some extensions. Information Sciences, 177:28–40, 2007.
[15] Z. Pawlak, S. K. M. Wong, and W. Ziarko. Rough sets: probabilistic versus deterministic approach. International Journal of Man-Machine Studies, 29:81–95, 1988.
[16] D. Ślȩzak and W. Ziarko. The investigation of the Bayesian rough set model. International Journal of Approximate Reasoning, 40:81–91, 2005.
[17] J. Stefanowski and A. Tsoukias. Incomplete information tables and rough classification. Computational Intelligence, 17(3):545–566, 2001.
[18] S. K. M. Wong and W. Ziarko. INFER—an adaptive decision support system based on the probabilistic approximate classification. In Proceedings of the 6th International Workshop on Expert Systems and their Applications, pages 713–726, 1986.
[19] Y. Y. Yao. Probabilistic rough set approximations. International Journal of Approximate Reasoning, 49:255–271, 2008.
[20] Y. Y. Yao and S. K. M. Wong. A decision theoretic framework for approximate concepts. International Journal of Man-Machine Studies, 37:793–809, 1992.
[21] W. Ziarko. Variable precision rough set model. Journal of Computer and System Sciences, 46(1):39–59, 1993.
[22] W. Ziarko. Probabilistic approach to rough sets. International Journal of Approximate Reasoning, 49:272–284, 2008.