Mining Incomplete Data with Attribute-Concept
Values and “Do Not Care” Conditions
Patrick G. Clark1 and Jerzy W. Grzymala-Busse1,2
1 Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA
2 Institute of Computer Science, Polish Academy of Sciences, 01–237 Warsaw, Poland
[email protected], [email protected]
Abstract. In this paper we present novel experimental results on comparing two interpretations of missing attribute values: attribute-concept values and "do not care" conditions. Experiments were conducted on 176 data sets, preprocessed using three kinds of probabilistic approximations (lower, middle and upper) and the MLEM2 rule induction system. Performance was evaluated using the error rate computed by ten-fold cross validation. At the 5% statistical significance level, attribute-concept values performed better in four cases and "do not care" conditions in two cases. At the 10% statistical significance level, attribute-concept values performed better in five cases and "do not care" conditions in three cases. In the remaining cases the differences were not statistically significant.
1 Introduction
Lower and upper approximations are the basic ideas of rough set theory. A probabilistic approximation, defined using a probability α, is an extension of the standard lower and upper approximations. If α is equal to 1, the probabilistic approximation reduces to the lower approximation; if α is slightly greater than 0, the probabilistic approximation becomes the upper approximation. Probabilistic approximations have been investigated in Bayesian rough sets, decision-theoretic rough sets, variable precision rough sets, etc., see, e.g., [12, 15–17, 20, 22–25].
Hybrid intelligent systems are an area of research that seeks to combine
many single classifier approaches to pattern recognition such that the resulting
collective performance improves on the performance of any single part [21]. The
idea that no single computational view solves all problems was presented in [19].
In our work we combine two areas of intelligent systems, rule learner classification
systems and uncertainty management in the form of rough sets [21].
Until recently, research on probabilistic approximations focused on theoretical properties of such approximations. Additionally, it was restricted to complete
data sets (with no missing attribute values). For incomplete data sets standard
approximations were extended to probabilistic approximations in [11]. The first
papers reporting experimental results on probabilistic approximations were [1,
4].
In this paper we study two interpretations of missing attribute values: attribute-concept values and "do not care" conditions. Our research is a continuation of [5], in which three interpretations of missing attribute values (lost values, attribute-concept values and "do not care" conditions) were discussed; however, the data sets used for experiments in [5] were very restricted: only eight data sets were considered, all with 35% of missing attribute values. In this paper we consider a spectrum of data sets with various percentages of missing attribute values, starting from 0 (complete data sets) and ending with saturated incomplete data sets, with 5% as the increment of missing attribute values; for details see Section 4.
Our main objective was to check which interpretation of missing attribute values, attribute-concept values or "do not care" conditions, is better in terms of the error rate. Our secondary objective was to compare three types of approximations: lower, middle and upper, where middle approximations are probabilistic approximations associated with the parameter α = 0.5 [2, 3]. In this paper we study the usefulness of all three types of probabilistic approximations applied to rule induction from incomplete data.
There exist many definitions of approximations for data sets with missing attribute values [9]; we use one of the most successful options (from the viewpoint of rule induction), called concept approximations [9]. Concept approximations were generalized to concept probabilistic approximations in [11].
Our experiments on rule induction on 176 data sets (with two types of missing attribute values) and with three probabilistic approximations (lower, middle and upper) show that the error rate, evaluated by ten-fold cross validation, depends on the choice of the data set. Our main conclusion is that for a specific data set both choices, the interpretation of missing attribute values and the approximation type, should be taken into account in order to find the best combination for data mining.
2 Incomplete Data
We assume that the input data sets are presented in the form of a decision table.
An example of a decision table is shown in Table 1. Rows of the decision table
represent cases, while columns are labeled by variables. The set of all cases will
be denoted by U . In Table 1, U = {1, 2, 3, 4, 5, 6, 7, 8}. Independent variables
are called attributes and a dependent variable is called a decision and is denoted
by d. The set of all attributes will be denoted by A. In Table 1, A = {Education,
Skills, Experience}. The value for a case x and an attribute a will be denoted by
a(x).
In this paper we distinguish between two interpretations of missing attribute values: attribute-concept values and "do not care" conditions. Attribute-concept values, denoted by "−", mean that the original attribute value is unknown; however, because we know the concept to which a case belongs, we know all possible attribute values. For example, if we know that a patient is sick with flu and typical temperature values for such patients are high or very high, then we will use these values for rule induction; for details see [10]. "Do not care" conditions, denoted by "*", mean that the original attribute values are irrelevant, so we may replace them by any attribute value; for details see [6, 13, 18]. Table 1 presents an incomplete data set affected by both attribute-concept values and "do not care" conditions.
Table 1. A decision table

         |            Attributes                | Decision
    Case | Education    Skills    Experience    | Productivity
    -----+--------------------------------------+-------------
    1    | higher       high      −             | high
    2    | *            high      low           | high
    3    | secondary    −         high          | high
    4    | higher       *         high          | high
    5    | elementary   high      low           | low
    6    | secondary    −         high          | low
    7    | −            low       high          | low
    8    | elementary   *         −             | low
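To keep the examples in this section concrete, the sketch below (a Python illustration added for this transcript, not part of the original paper) stores Table 1 in a small dictionary; the string "-" stands for an attribute-concept value and "*" for a "do not care" condition. The later sketches in this section reuse these definitions.

```python
# Table 1 as a Python literal; "-" = attribute-concept value, "*" = "do not care".
ATTRIBUTES = ["Education", "Skills", "Experience"]

# case -> (Education, Skills, Experience, Productivity)
DATA = {
    1: ("higher",     "high", "-",    "high"),
    2: ("*",          "high", "low",  "high"),
    3: ("secondary",  "-",    "high", "high"),
    4: ("higher",     "*",    "high", "high"),
    5: ("elementary", "high", "low",  "low"),
    6: ("secondary",  "-",    "high", "low"),
    7: ("-",          "low",  "high", "low"),
    8: ("elementary", "*",    "-",    "low"),
}
U = set(DATA)                       # the set of all cases

def a_val(x, a):
    """Value a(x) of attribute a for case x."""
    return DATA[x][ATTRIBUTES.index(a)]

def decision(x):
    """Value d(x) of the decision Productivity for case x."""
    return DATA[x][-1]
```

A concept is then simply the set of cases sharing a decision value, e.g. {x for x in U if decision(x) == "high"}.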
One of the most important ideas of rough set theory [14] is an indiscernibility relation, defined for complete data sets. Let B be a nonempty subset of A. The indiscernibility relation R(B) is a relation on U defined for x, y ∈ U as follows:

(x, y) ∈ R(B) if and only if ∀ a ∈ B (a(x) = a(y)).

The indiscernibility relation R(B) is an equivalence relation. Equivalence classes of R(B) are called elementary sets of B and are denoted by [x]_B. A subset of U is called B-definable if it is a union of elementary sets of B.
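As a quick illustration of elementary sets, the minimal sketch below groups the cases of a small hypothetical complete table (Table 1 itself is incomplete, so made-up values are used here) by the values of all attributes in B; each group is one equivalence class [x]_B.

```python
from collections import defaultdict

# hypothetical complete data: case -> (a1, a2)
complete = {1: ("low", "yes"), 2: ("low", "yes"), 3: ("high", "yes"), 4: ("high", "no")}

def elementary_sets(table, B=(0, 1)):
    """Equivalence classes of the indiscernibility relation R(B)."""
    classes = defaultdict(set)
    for case, row in table.items():
        key = tuple(row[i] for i in B)   # cases agreeing on all attributes in B
        classes[key].add(case)
    return list(classes.values())

print(elementary_sets(complete))         # [{1, 2}, {3}, {4}]
```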
The set X of all cases defined by the same value of the decision d is called a concept. For example, the concept associated with the value high of the decision Productivity is the set {1, 2, 3, 4}. The largest B-definable set contained in X is called the B-lower approximation of X, denoted by \underline{appr}_B(X), and defined as follows:

\underline{appr}_B(X) = ∪{[x]_B | [x]_B ⊆ X},

while the smallest B-definable set containing X, denoted by \overline{appr}_B(X), is called the B-upper approximation of X, and is defined as follows:

\overline{appr}_B(X) = ∪{[x]_B | [x]_B ∩ X ≠ ∅}.
For a variable a and its value v, (a, v) is called a variable-value pair. A block
of (a, v), denoted by [(a, v)], is the set {x ∈ U | a(x) = v} [7].
For incomplete decision tables the definition of a block of an attribute-value
pair is modified in the following way.
– If for an attribute a there exists a case x such that a(x) = −, then the
corresponding case x should be included in blocks [(a, v)] for all specified
values v ∈ V (x, a) of attribute a, where
V (x, a) = {a(y) | a(y) is specified , y ∈ U, d(y) = d(x)},
– If for an attribute a there exists a case x such that a(x) = ∗, then the case x
should be included in blocks [(a, v)] for all specified values v of the attribute
a.
For the data set from Table 1, V(1, Experience) = {low, high}, V(3, Skills) = {high}, V(6, Skills) = {low, high}, V(7, Education) = {elementary, secondary}, and V(8, Experience) = {low, high}.
For the data set from Table 1 the blocks of attribute-value pairs are:
[(Education, elementary)] = {2, 5, 7, 8},
[(Education, secondary)] = {2, 3, 6, 7},
[(Education, higher)] = {1, 2, 4},
[(Skills, low)] = {4, 6, 7, 8},
[(Skills, high)] = {1, 2, 3, 4, 5, 6, 8},
[(Experience, low)] = {1, 2, 5, 8},
[(Experience, high)] = {1, 3, 4, 6, 7, 8}.
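The modified definition of blocks can be expressed directly in code. The sketch below (again only an illustration, not the authors' implementation) reuses DATA, U, ATTRIBUTES, a_val and decision from the sketch after Table 1 and reproduces the blocks listed above.

```python
def V(x, a):
    """Specified values of attribute a among cases from the same concept as x."""
    return {a_val(y, a) for y in U
            if decision(y) == decision(x) and a_val(y, a) not in ("-", "*")}

def block(a, v):
    """The block [(a, v)] under the modified definition for incomplete tables."""
    result = set()
    for x in U:
        val = a_val(x, a)
        if val == v:                           # specified value equal to v
            result.add(x)
        elif val == "*":                       # "do not care": matches any value
            result.add(x)
        elif val == "-" and v in V(x, a):      # attribute-concept value
            result.add(x)
    return result

print(sorted(block("Education", "higher")))    # [1, 2, 4]
print(sorted(block("Skills", "high")))         # [1, 2, 3, 4, 5, 6, 8]
```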
For a case x ∈ U and B ⊆ A, the characteristic set KB (x) is defined as the
intersection of the sets K(x, a), for all a ∈ B, where the set K(x, a) is defined in
the following way:
– If a(x) is specified, then K(x, a) is the block [(a, a(x))] of attribute a and its
value a(x),
– If a(x) = −, then the corresponding set K(x, a) is equal to the union of
all blocks of attribute-value pairs (a, v), where v ∈ V (x, a) if V (x, a) is
nonempty. If V (x, a) is empty, K(x, a) = U ,
– If a(x) = ∗ then the set K(x, a) = U , where U is the set of all cases.
For Table 1 and B = A,

K_A(1) = {1, 2, 4},
K_A(2) = {1, 2, 5, 8},
K_A(3) = {3, 6},
K_A(4) = {1, 4},
K_A(5) = {2, 5, 8},
K_A(6) = {3, 6, 7},
K_A(7) = {6, 7, 8},
K_A(8) = {2, 5, 7, 8}.
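The characteristic sets can be computed from the blocks. The sketch below continues the previous ones (it assumes DATA, U, ATTRIBUTES, a_val, V and block are already defined) and reproduces the sets K_A(x) listed above.

```python
def K(x, a):
    """The set K(x, a) for a single attribute a."""
    val = a_val(x, a)
    if val == "*":
        return set(U)
    if val == "-":
        vals = V(x, a)
        return set().union(*(block(a, v) for v in vals)) if vals else set(U)
    return block(a, val)

def characteristic_set(x, B=ATTRIBUTES):
    """K_B(x): the intersection of K(x, a) over all a in B."""
    result = set(U)
    for a in B:
        result &= K(x, a)
    return result

for x in sorted(U):
    print(x, sorted(characteristic_set(x)))    # e.g. 1 [1, 2, 4]
```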
Note that for incomplete data there are several possible ways to define approximations [9]; we used concept approximations [11], since our previous experiments indicated that such approximations are the most efficient [11]. A B-concept lower approximation of the concept X is defined as follows:

\underline{B}X = ∪{K_B(x) | x ∈ X, K_B(x) ⊆ X},

while a B-concept upper approximation of the concept X is defined by:

\overline{B}X = ∪{K_B(x) | x ∈ X, K_B(x) ∩ X ≠ ∅} = ∪{K_B(x) | x ∈ X}.

For Table 1, the A-concept lower and A-concept upper approximations of the concept {5, 6, 7, 8} are:

\underline{A}{5, 6, 7, 8} = {6, 7, 8},
\overline{A}{5, 6, 7, 8} = {2, 3, 5, 6, 7, 8}.
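Continuing the same sketch, the A-concept lower and upper approximations of {5, 6, 7, 8} can be computed as unions of the characteristic sets obtained above (decision and characteristic_set come from the earlier sketches).

```python
X = {x for x in U if decision(x) == "low"}   # [(Productivity, low)] = {5, 6, 7, 8}

lower = set().union(*(characteristic_set(x) for x in X
                      if characteristic_set(x) <= X))
upper = set().union(*(characteristic_set(x) for x in X))

print(sorted(lower))    # [6, 7, 8]
print(sorted(upper))    # [2, 3, 5, 6, 7, 8]
```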
3 Probabilistic Approximations
For completely specified data sets a probabilistic approximation is defined as follows:

appr_α(X) = ∪{[x] | x ∈ U, Pr(X | [x]) ≥ α},

where α is a parameter, 0 < α ≤ 1, see [11, 12, 16, 20, 22, 24]. Additionally, for simplicity, the elementary sets [x]_A are denoted by [x]. For a discussion of how this definition is related to the variable precision asymmetric rough sets see [1, 11].
Note that if α = 1, the probabilistic approximation becomes the standard lower approximation, and if α is small, close to 0 (in our experiments it was 0.001), the same definition describes the standard upper approximation.
For incomplete data sets, a B-concept probabilistic approximation is defined by the following formula [11]:

appr_α^B(X) = ∪{K_B(x) | x ∈ X, Pr(X | K_B(x)) ≥ α}.
For simplicity, we will denote K_A(x) by K(x), and the A-concept probabilistic approximation will be called a probabilistic approximation.
For Table 1 and the concept X = [(Productivity, low)] = {5, 6, 7, 8}, there exist three distinct probabilistic approximations:

appr_1.0({5, 6, 7, 8}) = {6, 7, 8},
appr_0.75({5, 6, 7, 8}) = {2, 5, 6, 7, 8},
and
appr_0.001({5, 6, 7, 8}) = {2, 3, 5, 6, 7, 8}.

The special probabilistic approximation with the parameter α = 0.5 will be called a middle approximation.
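The same machinery gives the probabilistic approximations. The sketch below (still reusing X and characteristic_set from the earlier sketches) reproduces the three distinct approximations of {5, 6, 7, 8} and also evaluates the middle approximation (α = 0.5).

```python
def probabilistic_approximation(X, alpha):
    """Union of characteristic sets K(x), x in X, with Pr(X | K(x)) >= alpha."""
    result = set()
    for x in X:
        K_x = characteristic_set(x)
        if len(K_x & X) / len(K_x) >= alpha:   # conditional probability Pr(X | K(x))
            result |= K_x
    return result

for alpha in (1.0, 0.75, 0.5, 0.001):
    print(alpha, sorted(probabilistic_approximation(X, alpha)))
# 1.0   -> [6, 7, 8]
# 0.75  -> [2, 5, 6, 7, 8]
# 0.5   -> [2, 3, 5, 6, 7, 8]
# 0.001 -> [2, 3, 5, 6, 7, 8]
```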
[Figures 1–8, omitted in this transcript, plot the error rate (%) against the percentage of missing attribute values, with one curve for each of the six combinations of approximation (lower, middle, upper) and interpretation of missing attribute values ("−" and "*").]

Fig. 1. Error rate for the bankruptcy data set
Fig. 2. Error rate for the breast cancer data set
Fig. 3. Error rate for the echocardiogram data set
Fig. 4. Error rate for the hepatitis data set
Fig. 5. Error rate for the image segmentation data set
Fig. 6. Error rate for the iris data set
Fig. 7. Error rate for the lymphography data set
Fig. 8. Error rate for the wine recognition data set
4 Experiments
For our experiments we used eight types of data sets from the University of California at Irvine Machine Learning Repository.
Our main objective was to check which interpretation of missing attribute values, attribute-concept values or "do not care" conditions, is better in terms of the error rate. Our secondary objective was to test which of the three probabilistic approximations, lower, middle or upper, provides the best results, again in terms of the error rate.
In our experiments two parameters were used: the percentage of missing attribute values and the probability α used in the definition of probabilistic approximations. Both parameters have many possible values.
For practical reasons we restricted both parameters: for the percentage of missing attribute values we considered 0, 5%, 10% and so on. We randomly and incrementally replaced existing attribute values by symbols of missing attribute values, first using "−" and then replacing all symbols of attribute-concept values by "*". Replacement proceeded in increments of 5% until replacing another 5% of existing attribute values would have produced, in the incomplete data set, a row consisting entirely of missing attribute values (i.e., a case x such that a(x) was a symbol of a missing attribute value for all a ∈ A). If so, we retracted the last replacement and tried another random replacement. If this attempt was unsuccessful, we tried yet another random replacement. If this third attempt was unsuccessful, we ended the process of creating incomplete data sets.
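A hedged sketch of this replacement procedure follows; it is our reading of the description above, not the authors' code, and it assumes the data are held in a NumPy array of strings with one row per case and one column per attribute.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_missing(table, step=0.05, symbol="-", max_tries=3):
    """Return tables with step, 2*step, ... of the values replaced by `symbol`,
    stopping when a row consisting entirely of missing values cannot be avoided."""
    table = table.copy()
    per_step = int(round(step * table.size))
    versions = []
    while True:
        placed = False
        for _ in range(max_tries):                           # three random attempts
            candidate = table.copy()
            specified = np.argwhere(candidate != symbol)     # still-existing values
            if per_step > len(specified):
                return versions
            chosen = specified[rng.choice(len(specified), size=per_step, replace=False)]
            for r, c in chosen:
                candidate[r, c] = symbol
            if not (candidate == symbol).all(axis=1).any():  # no fully missing row
                table, placed = candidate, True
                break
        if not placed:                                        # retract and stop
            return versions
        versions.append(table.copy())

def to_do_not_care(table):
    """The second family of data sets: replace attribute-concept values by '*'."""
    return np.where(table == "-", "*", table)
```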
For example, for the bankruptcy data set the maximum percentage of missing attribute values is 35%, since in all three random tries of replacing yet another 5% of attribute values, the resulting data sets with 40% of missing attribute values had a row labeled by some x ∈ U with a(x) being a symbol of a missing attribute value for all a ∈ A.
For the bankruptcy data set, replacing existing attribute values by symbols of attribute-concept values resulted in seven new data sets (with 5%, 10%, ..., 35% of attribute-concept values). Then we created another seven data sets by replacing all symbols of attribute-concept values by symbols of "do not care" conditions. Thus for the bankruptcy data, 15 data sets (the original complete data set and the 14 incomplete ones) were used in experiments. Since we used eight types of data sets, the total number of all data sets used in experiments was 176.
We restricted our attention to three probabilistic approximations: lower (α = 1), upper (α = 0.001) and the most typical probabilistic approximation, with α = 0.5, called the middle approximation. Therefore the total number of ten-fold cross validation experiments was 176 × 3 = 528.
Results of our experiments are presented in Figures 1–8. First, for all eight types of data sets we computed the error rate associated with the two interpretations of missing attribute values: attribute-concept values and "do not care" conditions. Then we evaluated the statistical significance of the results using the Wilcoxon matched-pairs signed rank test, with the 5% level of significance for a two-tailed test, separately for lower, middle and upper approximations. For the echocardiogram data set, for middle and upper approximations, the "do not care" condition interpretation of missing attribute values was better than the attribute-concept value interpretation.
On the other hand, for the image segmentation data set, for all three types of approximations, and for the wine recognition data set with lower approximations, the attribute-concept value interpretation of missing attribute values was better than the "do not care" condition interpretation. Thus, for two combinations of data set and type of approximation, the "do not care" condition interpretation was better than the attribute-concept value interpretation, while for four other combinations the attribute-concept value interpretation was better than the "do not care" condition interpretation. For the remaining 18 combinations of data set and type of approximation, the difference in performance between the two interpretations was not statistically significant (5% significance level).
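For reference, a minimal sketch of such a comparison, assuming SciPy is available and that the paired error rates for the two interpretations are stored in two arrays (the numbers below are placeholders, not results from the paper):

```python
from scipy.stats import wilcoxon

# hypothetical error rates (%) for one data set type and one approximation,
# paired over the same percentages of missing attribute values
errors_attribute_concept = [22.1, 25.4, 30.0, 31.2, 33.5, 35.1, 36.8]
errors_do_not_care       = [23.0, 26.1, 29.5, 33.0, 34.2, 36.4, 38.0]

stat, p = wilcoxon(errors_attribute_concept, errors_do_not_care)  # two-sided by default
print(f"Wilcoxon statistic = {stat}, p-value = {p:.3f}")
# reject the null hypothesis of equal performance when p < 0.05 (or p < 0.10)
```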
When we changed the level of significance in the Wilcoxon test to 10%, we observed additionally that for the echocardiogram data set, for lower approximations, the "do not care" condition interpretation of missing attribute values was better than the attribute-concept value interpretation. However, for the iris data set and lower approximations, the attribute-concept value interpretation was better than the "do not care" condition interpretation.
Thus, for three combinations of data set and type of approximation, the "do not care" condition interpretation was better than the attribute-concept value interpretation, while for five other combinations the attribute-concept value interpretation was better than the "do not care" condition interpretation. For the remaining 16 combinations of data set and type of approximation, the difference in performance between the two interpretations was not statistically significant (10% significance level).
Then we compared the three kinds of approximations, separately for attribute-concept values and for "do not care" conditions, using the Friedman rank sums test, again with a 5% significance level. The total number of tests was 16 (eight types of data sets with two interpretations of missing attribute values). For the image segmentation data set with attribute-concept values and with "do not care" conditions, for the iris data set with "do not care" conditions, and for the wine recognition data set with "do not care" conditions there was strong evidence to reject the null hypothesis that all three approximations are equivalent.
Using the test for ordered alternatives based on the Friedman rank sums test we conclude that, at the 5% significance level, the middle and upper approximations are better than the lower approximations for both image segmentation data sets, while for the remaining two data sets, iris and wine recognition, both with "do not care" conditions, upper approximations are better than lower approximations. The difference in performance between the middle and upper approximations is not statistically significant. For the remaining 10 combinations of data set and type of missing attribute value, the difference in performance between approximations was not significant.
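Similarly, a minimal sketch of the Friedman rank sums test, assuming SciPy and placeholder error rates for the three approximations measured on the same sequence of data sets:

```python
from scipy.stats import friedmanchisquare

# hypothetical error rates (%) of the three approximations on the same data sets
lower_err  = [30.1, 35.2, 40.8, 45.0, 52.3]
middle_err = [28.4, 33.0, 38.1, 43.2, 49.9]
upper_err  = [27.9, 32.5, 37.6, 42.8, 49.1]

stat, p = friedmanchisquare(lower_err, middle_err, upper_err)
print(f"Friedman statistic = {stat:.3f}, p-value = {p:.3f}")
# a small p-value is evidence against the hypothesis that the three approximations
# perform equivalently; a post-hoc test for ordered alternatives would follow, as in the text
```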
Again, when we changed the level of significance in the Friedman test to 10%, we observed additionally that for the bankruptcy and lymphography data sets, both with "do not care" conditions, upper approximations are better than lower approximations, while for the iris and wine recognition data sets, both with "do not care" conditions, middle approximations are better than lower approximations. Thus, upper approximations were better than lower approximations for six combinations of data set and type of missing attribute value, and middle approximations were better than lower approximations for two other such combinations. For the remaining six combinations of data set and type of missing attribute value, the difference in performance between approximations was not significant.
In summary, for the majority of the experiments performed, the results did not show a statistically significant difference in performance between the interpretations of missing attribute values or between the approximation types. However, there is strong evidence that for certain data sets varying these choices yields better results.
For rule induction we used the MLEM2 (Modified Learning from Examples
Module version 2) rule induction algorithm, a component of the LERS (Learning
from Examples based on Rough Sets) data mining system [7, 8].
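Since MLEM2 and LERS are not publicly bundled with this transcript, the following sketch only illustrates how a ten-fold cross validation error rate can be computed around any rule learner; induce_rules and classify are hypothetical stand-ins, and each case is assumed to be a dict with a "decision" key.

```python
import random

def ten_fold_error_rate(cases, induce_rules, classify, folds=10, seed=0):
    """Percentage of test cases misclassified over `folds` folds."""
    cases = list(cases)
    random.Random(seed).shuffle(cases)
    wrong = 0
    for i in range(folds):
        testing = cases[i::folds]                        # every folds-th case
        training = [c for c in cases if c not in testing]
        rules = induce_rules(training)                   # hypothetical rule induction
        wrong += sum(1 for c in testing if classify(rules, c) != c["decision"])
    return 100.0 * wrong / len(cases)
```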
5 Conclusions
Our primary objective was to compare the quality of rule sets induced from incomplete data sets with attribute-concept values and “do not care” conditions using three types of probabilistic approximations: lower, middle and upper. There
is some evidence that attribute-concept values are better than “do not care”
conditions in terms of the error rate measured by ten-fold cross validation.
This work is a continuation of the experiments in [5]. The primary focus of [5] was to identify the best interpretation of missing attribute values, with a secondary objective of testing the usefulness of concept probabilistic approximations in mining incomplete data. In this work we expanded our investigation to 176 data sets, while in [5] only 24 data sets were considered. However, because of the experiment size, we restricted the number of interpretations from three to two. In addition, while the primary objective was an investigation of two interpretations of missing attribute values, this work also compared three approximations in an effort to identify the most effective among lower, middle and upper.
Our experiments on rule induction on 176 data sets (with two types of missing attribute values) and with three probabilistic approximations (lower, middle and upper) show that the error rate, evaluated by ten-fold cross validation, depends on the choice of the data set. For the majority of the data sets experimented with, the differences between the methods were not significant. However, as discussed in Section 4, there are combinations of data set, interpretation and approximation for which the choice does matter. Our main conclusion is that for a specific data set both choices, the interpretation of missing attribute values and the approximation type, should be taken into account in order to find the best combination used for data mining.
Data sets with a large percentage of "do not care" conditions may cause the error rate for lower approximations to increase up to 100% due to large characteristic sets and, consequently, empty lower approximations and empty rule sets.
Our secondary objective was to compare the three approximations: lower, middle and upper. In six combinations out of 16, lower approximations were worse than middle or upper approximations (5% significance level); at the 10% significance level, lower approximations were worse than middle or upper approximations in ten combinations out of 16. Hence lower approximations should be avoided for mining incomplete data with attribute-concept values or "do not care" conditions.
References
1. Clark, P.G., Grzymala-Busse, J.W.: Experiments on probabilistic approximations.
In: Proceedings of the 2011 IEEE International Conference on Granular Computing. (2011) 144–149
2. Clark, P.G., Grzymala-Busse, J.W.: Experiments on rule induction from incomplete data using three probabilistic approximations. In: Proceedings of the 2012
IEEE International Conference on Granular Computing. (2012) 90–95
3. Clark, P.G., Grzymala-Busse, J.W.: Experiments using three probabilistic approximations for rule induction from incomplete data sets. In: Proceedings of the
MCCSIS 2012, IADIS European Conference on Data Mining ECDM 2012. (2012)
72–78
4. Clark, P.G., Grzymala-Busse, J.W.: Rule induction using probabilistic approximations and data with missing attribute values. In: Proceedings of the 15-th IASTED
International Conference on Artificial Intelligence and Soft Computing ASC 2012.
(2012) 235–242
5. Clark, P.G., Grzymala-Busse, J.W.: An experimental comparison of three interpretations of missing attribute values using probabilistic approximations. In: Proceedings of the 14-th International Conference on Rough Sets, Fuzzy Sets, Data
Mining and Granular Computing. (2013) 77–86
6. Grzymala-Busse, J.W.: On the unknown attribute values in learning from examples. In: Proceedings of the ISMIS-91, 6th International Symposium on Methodologies for Intelligent Systems. (1991) 368–377
7. Grzymala-Busse, J.W.: LERS—a system for learning from examples based on
rough sets. In Slowinski, R., ed.: Intelligent Decision Support. Handbook of Applications and Advances of the Rough Set Theory. Kluwer Academic Publishers,
Dordrecht, Boston, London (1992) 3–18
8. Grzymala-Busse, J.W.: MLEM2: A new algorithm for rule induction from imperfect data. In: Proceedings of the 9th International Conference on Information
Processing and Management of Uncertainty in Knowledge-Based Systems. (2002)
243–250
9. Grzymala-Busse, J.W.: Rough set strategies to data with missing attribute values. In: Workshop Notes, Foundations and New Directions of Data Mining, in
conjunction with the 3-rd International Conference on Data Mining. (2003) 56–63
10. Grzymala-Busse, J.W.: Three approaches to missing attribute values—a rough
set perspective. In: Proceedings of the Workshop on Foundation of Data Mining,
in conjunction with the Fourth IEEE International Conference on Data Mining.
(2004) 55–62
11. Grzymala-Busse, J.W.: Generalized parameterized approximations. In: Proceedings of the RSKT 2011, the 6-th International Conference on Rough Sets and
Knowledge Technology. (2011) 136–145
12. Grzymala-Busse, J.W., Ziarko, W.: Data mining based on rough sets. In Wang,
J., ed.: Data Mining: Opportunities and Challenges. Idea Group Publ., Hershey,
PA (2003) 142–173
13. Kryszkiewicz, M.: Rough set approach to incomplete information systems. In: Proceedings of the Second Annual Joint Conference on Information Sciences. (1995)
194–197
14. Pawlak, Z.: Rough sets. International Journal of Computer and Information Sciences 11 (1982) 341–356
15. Pawlak, Z., Skowron, A.: Rough sets: Some extensions. Information Sciences 177
(2007) 28–40
16. Pawlak, Z., Wong, S.K.M., Ziarko, W.: Rough sets: probabilistic versus deterministic approach. International Journal of Man-Machine Studies 29 (1988) 81–95
17. Ślęzak, D., Ziarko, W.: The investigation of the Bayesian rough set model. International Journal of Approximate Reasoning 40 (2005) 81–91
18. Stefanowski, J., Tsoukias, A.: On the extension of rough sets under incomplete
information. In: Proceedings of the RSFDGrC’1999, 7th International Workshop
on New Directions in Rough Sets, Data Mining, and Granular-Soft Computing.
(1999) 73–81
19. Wolpert, D.H.: The supervised learning no-free-lunch theorems. In: Soft Computing and Industry. Springer (2002) 25–42
20. Wong, S.K.M., Ziarko, W.: INFER—an adaptive decision support system based
on the probabilistic approximate classification. In: Proceedings of the 6-th International Workshop on Expert Systems and their Applications. (1986) 713–726
21. Woźniak, M., Graña, M., Corchado, E.: A survey of multiple classifier systems as hybrid
systems. Information Fusion 16 (2014) 3–17
22. Yao, Y.Y.: Probabilistic rough set approximations. International Journal of Approximate Reasoning 49 (2008) 255–271
23. Yao, Y.Y., Wong, S.K.M.: A decision theoretic framework for approximate concepts. International Journal of Man-Machine Studies 37 (1992) 793–809
24. Ziarko, W.: Variable precision rough set model. Journal of Computer and System
Sciences 46(1) (1993) 39–59
25. Ziarko, W.: Probabilistic approach to rough sets. International Journal of Approximate Reasoning 49 (2008) 272–284