RULE INDUCTION USING PROBABILISTIC APPROXIMATIONS AND DATA
WITH MISSING ATTRIBUTE VALUES
Patrick G. Clark
Department of Electrical Engineering and
Computer Science
University of Kansas
Lawrence, KS 66045, USA
email: [email protected]
ABSTRACT
This paper presents results of experiments on rule induction
from incomplete data (data with missing attribute values)
using probabilistic approximations. Such approximations,
broadly studied for many years, are fundamental concepts
of variable precision rough set theory and similar models to
deal with inconsistent data sets.
Our main objective was to study how useful probabilistic approximations different from the ordinary lower and upper approximations are. Our results are rather pessimistic: for eight data sets with two types of missing attribute values, in only one case out of 16 were some of these probabilistic approximations better than the ordinary approximations. On the other hand, in another case, some probabilistic approximations were worse than the ordinary approximations. Additionally, we studied how many different probabilistic approximations may exist for a given concept of a data set.
KEY WORDS
Data mining, rule induction, rough set theory, probabilistic approximations, parameterized approximations, incomplete data.
1 Introduction
The idea of lower and upper approximations is one of the most fundamental concepts of rough set theory. A probabilistic (or parameterized) approximation, associated with a probability (parameter) α, is a generalization of the ordinary lower and upper approximations. If the probability α is quite small, the probabilistic approximation is reduced to an upper approximation; if it is equal to one, the probabilistic approximation becomes a lower approximation [7]. Probabilistic approximations have been explored for many years in areas such as variable precision rough sets, Bayesian rough sets, decision-theoretic rough sets, etc. The idea was introduced in [18] and then studied in many papers [11, 14, 15, 16, 19, 20, 21, 22].
Jerzy W. Grzymala-Busse
Department of Electrical Engineering and
Computer Science
University of Kansas
Lawrence, KS 66045, USA
and
Institute of Computer Science
Polish Academy of Sciences
01-237 Warsaw, Poland
email: [email protected]
Many authors have discussed theoretical properties of probabilistic approximations. Only recently were probabilistic approximations, for completely specified and inconsistent data sets, experimentally validated in [1]. Incomplete data sets are inconsistent in the sense that it is necessary to compute approximations of all concepts for rule induction. For incomplete data sets probabilistic approximations were generalized and re-defined in [7]. However, this paper, for the first time, presents results of experiments on probabilistic approximations applied to such data.
The main objective of this paper is to study how useful probabilistic approximations are for data with missing attribute values. Ordinary lower and upper approximations have proved their usefulness in many applications, and it is a well-known fact that rough-set approaches to missing attribute values, based on ordinary lower and upper approximations, are of the same quality as the best traditional methods [8]. This paper compares probabilistic approximations with ordinary ones.
We will distinguish two kinds of missing attribute values: lost values and "do not care" conditions. If an attribute value was originally given but is no longer accessible (e.g., it was erased or forgotten), we will call it lost. If a data set contains lost values, we will try to induce rules from the existing data. Another interpretation of a missing attribute value is based on a refusal to answer a question; e.g., some people may refuse to tell their salary range. Such a value will be called a "do not care" condition. For analysis of data sets with "do not care" conditions we will replace such a missing attribute value by all possible attribute values.
For incomplete data sets there exist many definitions of approximations. Following [7], we will use so-called concept approximations, generalized to concept probabilistic approximations in [7].
For a given concept of a data set the number of distinct probabilistic approximations is quite limited. In this
paper we report the number of distinct probabilities associated with all characteristic sets for all concepts of eight
real-life data sets with two interpretations of missing attribute values, lost values and "do not care" conditions. Since characteristic sets are building blocks of probabilistic approximations, such numbers are upper limits for the number of corresponding probabilistic approximations.

Table 1. A complete data set

  Case   Temperature   Headache   Cough   Flu
  1      high          yes        no      yes
  2      very-high     yes        yes     yes
  3      high          no         no      yes
  4      high          yes        yes     yes
  5      normal        yes        yes     no
  6      normal        no         no      no
  7      high          no         no      no

Table 2. An incomplete data set

  Case   Temperature   Headache   Cough   Flu
  1      high          yes        ?       yes
  2      very-high     *          yes     yes
  3      *             no         no      yes
  4      high          yes        yes     yes
  5      normal        ?          yes     no
  6      ?             no         no      no
  7      high          no         *       no
Table 3. Conditional probabilities

  K(x)                     {1, 4}   {2}   {4}   {3, 7}   {3, 6, 7}   {5}
  P({1, 2, 3, 4} | K(x))     1.0    1.0   1.0    0.5       0.333      0

2 Complete Data Sets
We assume that the input data sets are presented in the
form of a decision table. An example of a decision table
is shown in Table 1. Rows of the decision table represent
cases, while columns are labeled by variables. The set of
all cases will be denoted by U . In Table 1, U = {1, 2, 3, 4,
5, 6, 7}. Independent variables are called attributes and a
dependent variable is called a decision and is denoted by d.
The set of all attributes will be denoted by A. In Table 1, A
= {Temperature, Headache, Cough}. The value for a case
x and an attribute a will be denoted by a(x).
One of the most important ideas of rough set theory
[13] is an indiscernibility relation. Let B be a nonempty
subset of A. The indiscernibility relation R(B) is a relation
on U defined for x, y ∈ U as follows:
(x, y) ∈ R(B) if and only if ∀a ∈ B (a(x) = a(y)).
The indiscernibility relation R(B) is an equivalence
relation. Equivalence classes of R(B) are called elementary sets of B and are denoted by [x]_B. For Table 1, all
elementary sets of A are {1}, {2}, {3, 7}, {4}, {5} and
{6}. A subset of U is called A-definable if it is a union of
elementary sets.
The set X of all cases defined by the same value of
the decision d is called a concept. For example, a concept
associated with the value yes of the decision Flu is the set
{1, 2, 3, 4}. This concept is not A-definable. The largest B-definable set contained in X is called the B-lower approximation of X, denoted by appr_B(X), and defined as follows:

∪{[x]_B | [x]_B ⊆ X},

while the smallest B-definable set containing X, denoted by appr̄_B(X), is called the B-upper approximation of X, and is defined as follows:

∪{[x]_B | [x]_B ∩ X ≠ ∅}.
For a variable a and its value v, (a, v) is called a variable-value pair. A block of (a, v), denoted by [(a, v)], is the set {x ∈ U | a(x) = v} [4]. For Table 1, there are two concepts: the blocks [(Flu, yes)] = {1, 2, 3, 4} and [(Flu, no)] = {5, 6, 7}. A-approximations of the concept {1, 2, 3, 4} are:

appr_A([(Flu, yes)]) = {1, 2, 4},
appr̄_A([(Flu, yes)]) = {1, 2, 3, 4, 7}.
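These approximations can be computed mechanically from the definitions above. The following sketch is a minimal illustration only (the dictionary encoding of Table 1 and the function names are not part of the original formulation); it derives the elementary sets of A and the A-lower and A-upper approximations of the concept [(Flu, yes)].

```python
# Table 1 (complete data set): case -> (Temperature, Headache, Cough); Flu is the decision.
CASES = {
    1: ("high", "yes", "no"),     2: ("very-high", "yes", "yes"),
    3: ("high", "no", "no"),      4: ("high", "yes", "yes"),
    5: ("normal", "yes", "yes"),  6: ("normal", "no", "no"),
    7: ("high", "no", "no"),
}
FLU = {1: "yes", 2: "yes", 3: "yes", 4: "yes", 5: "no", 6: "no", 7: "no"}

def elementary_sets(cases):
    """Equivalence classes of the indiscernibility relation R(A)."""
    classes = {}
    for x, values in cases.items():
        classes.setdefault(values, set()).add(x)
    return list(classes.values())

def approximations(concept, classes):
    """Ordinary lower and upper approximations of a concept."""
    lower, upper = set(), set()
    for c in classes:
        if c <= concept:       # elementary set entirely inside the concept
            lower |= c
        if c & concept:        # elementary set intersecting the concept
            upper |= c
    return lower, upper

concept = {x for x, d in FLU.items() if d == "yes"}      # {1, 2, 3, 4}
classes = elementary_sets(CASES)                         # {1}, {2}, {3, 7}, {4}, {5}, {6}
print(approximations(concept, classes))                  # ({1, 2, 4}, {1, 2, 3, 4, 7})
```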
3 Incomplete Data Sets
In this paper we distinguish between two interpretations of
missing attribute values: lost values, denoted by ”?”, and
”do not care” conditions, denoted by ”*”. We assume that
lost values were erased or are unreadable and that for data
mining we use only remaining, specified values [10, 17].
"Do not care" conditions are interpreted as uncommitted (unpledged, neutral) [3, 12]. Such missing attribute values
will be replaced by all possible attribute values.
An example of an incomplete data set is shown in Table 2. For incomplete decision tables the definition of a
block of an attribute-value pair is modified in the following
way.
• If for an attribute a there exists a case x such that a(x) = ?, i.e., the corresponding value is lost, then the case x should not be included in any blocks [(a, v)] for all values v of attribute a,

• If for an attribute a there exists a case x such that the corresponding value is a "do not care" condition, i.e., a(x) = ∗, then the case x should be included in blocks [(a, v)] for all specified values v of attribute a.

Table 4. A decision table

  Case   Temperature   Headache   Cough   Flu
  1      high          yes        ?       yes
  2      very-high     *          yes     yes
  3      *             no         no      SPECIAL
  4      high          yes        yes     yes
  5      normal        ?          yes     SPECIAL
  6      ?             no         no      SPECIAL
  7      high          no         *       SPECIAL

Table 5. Data sets used for experiments

                              Number of
  Data set              cases   attributes   concepts
  Bankruptcy              66        5            2
  Breast cancer          277        9            2
  Echocardiogram          74        7            2
  Image segmentation     210       19            7
  Hepatitis              155       19            2
  Iris                   150        4            3
  Lymphography           148       18            4
  Wine recognition       178       13            3
For the data set from Table 2 the blocks of attribute-value pairs are:
[(Temperature, high)] = {1, 3, 4, 7},
[(Temperature, very-high)] = {2, 3},
[(Temperature, normal)] = {3, 5},
[(Headache, yes)] = {1, 2, 4},
[(Headache, no)] = {2, 3, 6, 7},
[(Cough, no)] = {3, 6, 7},
[(Cough, yes)] = {2, 4, 5, 7}.
For a case x ∈ U and B ⊆ A, the characteristic set K_B(x) is defined as the intersection of the sets K(x, a), for all a ∈ B, where the set K(x, a) is defined in the following way:

• If a(x) is specified, then K(x, a) is the block [(a, a(x))] of attribute a and its value a(x),

• If a(x) = ? or a(x) = ∗, then the set K(x, a) = U, where U is the set of all cases.
For Table 2 and B = A,

K_A(1) = {1, 4},
K_A(2) = {2},
K_A(3) = {3, 6, 7},
K_A(4) = {4},
K_A(5) = {5},
K_A(6) = {3, 6, 7},
K_A(7) = {3, 7}.
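The blocks and characteristic sets above follow directly from these two rules. The sketch below is again only an illustration, assuming lost values are encoded as the string "?" and "do not care" conditions as "*", mirroring Table 2.

```python
# Table 2 (incomplete data set): "?" is a lost value, "*" is a "do not care" condition.
ATTRS = ["Temperature", "Headache", "Cough"]
CASES = {
    1: ("high", "yes", "?"),       2: ("very-high", "*", "yes"),
    3: ("*", "no", "no"),          4: ("high", "yes", "yes"),
    5: ("normal", "?", "yes"),     6: ("?", "no", "no"),
    7: ("high", "no", "*"),
}
U = set(CASES)

def blocks(cases, attrs):
    """Blocks [(a, v)]: a lost value joins no block, '*' joins every block of that attribute."""
    result = {}
    for i, a in enumerate(attrs):
        for v in {c[i] for c in cases.values()} - {"?", "*"}:
            result[(a, v)] = {x for x, c in cases.items() if c[i] == v or c[i] == "*"}
    return result

def characteristic_set(x, cases, attrs, blks, universe):
    """K_A(x): intersection of the sets K(x, a) over all attributes a."""
    k = set(universe)
    for i, a in enumerate(attrs):
        v = cases[x][i]
        k &= universe if v in ("?", "*") else blks[(a, v)]
    return k

blks = blocks(CASES, ATTRS)
print(blks[("Temperature", "high")])                                   # {1, 3, 4, 7}
print({x: characteristic_set(x, CASES, ATTRS, blks, U) for x in sorted(U)})
# {1: {1, 4}, 2: {2}, 3: {3, 6, 7}, 4: {4}, 5: {5}, 6: {3, 6, 7}, 7: {3, 7}}
```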
Note that for incomplete data there are a few possible ways to define approximations [6, 9]; we use concept approximations [7]. The B-concept lower approximation of the concept X is defined as follows:

BX = ∪{K_B(x) | x ∈ X, K_B(x) ⊆ X}.

The B-concept upper approximation of the concept X is defined as follows:

B̄X = ∪{K_B(x) | x ∈ X, K_B(x) ∩ X ≠ ∅} = ∪{K_B(x) | x ∈ X}.

For Table 2, the A-concept lower and A-concept upper approximations of the two concepts {1, 2, 3, 4} and {5, 6, 7} are:

A{1, 2, 3, 4} = {1, 2, 4},
A{5, 6, 7} = {5},
Ā{1, 2, 3, 4} = {1, 2, 3, 4, 6, 7},
Ā{5, 6, 7} = {3, 5, 6, 7}.

Figure 1. Error rate for bankruptcy and breast cancer data sets
Figure 2. Error rate for echocardiogram and hepatitis data sets

Figure 3. Error rate for image segmentation and iris data sets

Figure 4. Error rate for lymphography and wine recognition data sets

4 Probabilistic Approximations

In this paper we are exploring all probabilistic approximations that can be defined for a given concept X. For completely specified data sets a probabilistic approximation is defined as follows:

appr_α(X) = ∪{[x] | x ∈ U, P(X | [x]) ≥ α},

where [x] is [x]_A and α is a parameter, 0 < α ≤ 1. We excluded the case of α = 0 since then appr_α(X) = U for any nonempty X. Since we consider all possible values of α, our definition of appr_α(X) covers both lower and upper probabilistic approximations. For a discussion of how this definition is related to variable precision asymmetric rough sets see [1, 7].

Note that if α = 1, the probabilistic approximation becomes the standard lower approximation, and if α is small, close to 0 (in our experiments it was 0.001), the same definition describes the standard upper approximation.

For incomplete data sets, a B-concept probabilistic approximation is defined by the following formula [7]:

∪{K_B(x) | x ∈ X, Pr(X | K_B(x)) ≥ α}.

Since we will discuss only A-concept probabilistic approximations, we will call them, for simplicity, probabilistic approximations.

For Table 2 and the concept X = [(Flu, yes)] = {1, 2, 3, 4}, the conditional probabilities P(X | K(x)) for all characteristic sets K(x) = K_A(x), x ∈ U, are presented in Table 3.

Thus, for the concept {1, 2, 3, 4} we may define only two distinct probabilistic approximations:

appr_0.333({1, 2, 3, 4}) = {1, 2, 3, 4, 6, 7}

and

appr_0.5({1, 2, 3, 4}) = {1, 2, 4}.

Note that there are only two distinct probabilistic approximations for the concept [(Flu, no)] as well.
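A minimal sketch of these definitions (for illustration only, not the rule induction system used in the experiments; it reuses the characteristic sets of Table 2 listed in Section 3) reproduces the probabilities of Table 3 and the two distinct approximations of {1, 2, 3, 4}.

```python
# Characteristic sets K_A(x) for Table 2, as listed in Section 3.
K = {1: {1, 4}, 2: {2}, 3: {3, 6, 7}, 4: {4}, 5: {5}, 6: {3, 6, 7}, 7: {3, 7}}
X = {1, 2, 3, 4}                                  # the concept [(Flu, yes)]

def conditional_probability(X, k):
    """P(X | K(x)) = |X ∩ K(x)| / |K(x)|."""
    return len(X & k) / len(k)

def probabilistic_approximation(X, K, alpha):
    """Union of the characteristic sets K(x), x in X, with P(X | K(x)) >= alpha."""
    result = set()
    for x in X:
        if conditional_probability(X, K[x]) >= alpha:
            result |= K[x]
    return result

print({x: round(conditional_probability(X, K[x]), 3) for x in K})
# {1: 1.0, 2: 1.0, 3: 0.333, 4: 1.0, 5: 0.0, 6: 0.333, 7: 0.5}   (cf. Table 3)
print(probabilistic_approximation(X, K, 0.333))   # {1, 2, 3, 4, 6, 7}
print(probabilistic_approximation(X, K, 0.5))     # {1, 2, 4}
```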
5 Rule Induction with LERS
The LERS (Learning from Examples based on Rough Sets)
data mining system [4, 5] starts by computing lower and upper approximations for every concept and then induces
rules using the MLEM2 (Modified Learning from Examples Module version 2) rule induction algorithm. Rules
induced from lower and upper approximations are called
certain and possible, respectively [2].
Table 6. Bankruptcy, "do not care" conditions

  α        Error rate   Standard deviation
  0.001      28.43           5.580
  0.75       41.26           3.933
  0.8675     46.82           4.103
  1.0        42.08           5.748

MLEM2 explores the search space of attribute-value pairs. Its input data set is a lower or upper approximation of a concept. In general, MLEM2 computes a local covering
and then converts it into a rule set [5].
In order to induce probabilistic rules we have to modify input data sets. For every probabilistic approximation
of the concept X = [(d, w)], the corresponding region will
be unchanged (every entry will be the same as in the original data set). For all remaining cases, the decision value
will be set to a special value, e.g., SPECIAL. Then we will
induce a possible rule set [4] using the MLEM2 rule induction algorithm. From the induced rule set, only rules with (d, w) on the right-hand side will survive; all remaining rules (for other values of d and for the value SPECIAL) should be deleted. The final rule set is a union of all rule sets computed this way separately for all values of d.
For example, if we want to induce probabilistic rules with α = 0.5 and X = [(Flu, yes)] = {1, 2, 3, 4} for the data set presented in Table 2, we should construct the decision table presented as Table 4.
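The decision-table transformation itself is simple; the sketch below illustrates it for a single probabilistic approximation (this is not the LERS/MLEM2 implementation, only a reconstruction of the preprocessing step described above), reproducing the decision column of Table 4.

```python
# Decisions of Table 2 and the probabilistic approximation appr_0.5([(Flu, yes)]) = {1, 2, 4}.
FLU = {1: "yes", 2: "yes", 3: "yes", 4: "yes", 5: "no", 6: "no", 7: "no"}
APPROXIMATION = {1, 2, 4}

def modify_decisions(decisions, approximation, special="SPECIAL"):
    """Cases inside the approximation keep their original decision; all others become SPECIAL."""
    return {x: (d if x in approximation else special) for x, d in decisions.items()}

print(modify_decisions(FLU, APPROXIMATION))
# {1: 'yes', 2: 'yes', 3: 'SPECIAL', 4: 'yes', 5: 'SPECIAL', 6: 'SPECIAL', 7: 'SPECIAL'}
```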
From Table 4, the MLEM2 rule induction algorithm induced the following possible rule with (Flu, yes) on the right-hand side:

1, 3, 3
(Headache, yes) → (Flu, yes).

Rules for the remaining two concepts must be computed separately, from different tables. Rules are presented in the LERS format; every rule is associated with three numbers: the total number of attribute-value pairs on the left-hand side of the rule, the total number of cases correctly classified by the rule during training, and the total number of training cases matching the left-hand side of the rule, i.e., the rule domain size.
6 Experiments

For our experiments we used eight real-life data sets that are available from the University of California at Irvine Machine Learning Repository. These data sets were enhanced by replacing 35% of existing attribute values with missing attribute values, separately by lost values and separately by "do not care" conditions, see Table 5. Thus, for any data set from Table 5, two data sets were used for experiments, with missing attribute values interpreted as lost values and as "do not care" conditions, respectively.
Table 7. Breast cancer, lost values

  α       Error rate   Standard deviation
  0.001     30.07           0.900
  0.4       28.52           1.102
  0.9       28.05           1.318
  1.0       28.28           1.135
The main objective of our research was to test whether probabilistic approximations, different from lower and upper approximations, are truly better than lower and upper approximations. To accomplish this objective, we conducted experiments of a single ten-fold cross validation, increasing the parameter α with increments equal to 0.1, from 0 to 1.0. For a given data set, in all of these eleven experiments we used identical ten pairs of larger (90%) and smaller (10%) data sets, see Figures 1–4. If during such a sequence of eleven experiments the error rate was smaller than the minimum of the error rates for lower and upper approximations, or larger than the maximum of the error rates for lower and upper approximations, we selected more precise values of the parameter α to make sure that we were reaching an extreme. For a value suspected to be an extreme, we conducted an additional 30 experiments of ten-fold cross validation. We compared the averages and standard deviations using the standard statistical test for the difference between two averages (a two-tailed test with a 5% significance level).
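For reference, such a comparison of two averages can be sketched as follows. We read "the standard statistical test for the difference between two averages" as the usual large-sample two-sample statistic; the sample size of 30 per value of α is an assumption based on the procedure described above.

```python
import math

def difference_test(mean1, std1, n1, mean2, std2, n2, critical=1.96):
    """Two-tailed test for the difference between two averages at the 5% level,
    using the large-sample statistic Z = (m1 - m2) / sqrt(s1^2/n1 + s2^2/n2)."""
    z = (mean1 - mean2) / math.sqrt(std1 ** 2 / n1 + std2 ** 2 / n2)
    return z, abs(z) > critical

# Bankruptcy, "do not care" conditions (Table 6): alpha = 1.0 versus alpha = 0.001,
# assuming 30 ten-fold cross validation experiments per value of alpha.
print(difference_test(42.08, 5.748, 30, 28.43, 5.580, 30))
```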
For example, for the bankruptcy data set affected by lost values, denoted by "?", the error rate was constant, so there was no need for an additional 30 experiments. But for the same data set affected by "do not care" conditions, denoted by "*", it is clear that we should look more closely at the parameter α around the value of 0.8. Results are presented in Table 6. Using the standard statistical test for the difference between two averages (two-tailed, with a 5% significance level) we may conclude that the lower approximation (α = 1.0) is worse than the upper approximation (α = 0.001). For the rest of the paper, whenever we quote this statistical test, it is always the two-tailed test with a 5% significance level. The same test indicates that the probabilistic approximation associated with α = 0.8675 is worse than the lower approximation (α = 1.0) as well as the upper approximation (α = 0.001). Results of all remaining 30 experiments of ten-fold cross validation are presented in Tables 7–14.
The wine recognition data set, affected by "do not care" conditions, represents the only case where probabilistic approximations (e.g., for α = 0.4 and α = 0.8) are better than both the lower and upper approximations. For the remaining 14 cases (combinations of a data set and a type of missing attribute values) the probabilistic approximations are not worse than the worse of the two (lower and upper) approximations and not better than the better of the two.
Table 8. Breast cancer, "do not care" conditions

  α       Error rate   Standard deviation
  0.001     28.93           0.605
  0.2       28.78           0.501
  0.45      29.38           0.835
  0.5       29.38           0.555
  1.0       29.03           0.877

Table 9. Hepatitis, "do not care" conditions

  α       Error rate   Standard deviation
  0.001     19.66           1.242
  0.2       19.61           0.952
  0.4       20.08           1.230
  0.7       20.95           1.158
  1.0       20.56           0.893

Table 10. Image segmentation, lost values

  α       Error rate   Standard deviation
  0.001     45.81           2.621
  0.45      45.71           3.048
  1.0       45.90           2.704

Table 11. Iris, lost values

  α       Error rate   Standard deviation
  0.001     13.73           1.915
  0.75      13.87           1.520
  1.0       13.51           1.485

Table 12. Lymphography, "do not care" conditions

  α       Error rate   Standard deviation
  0.001     28.24           2.568
  0.8       29.71           3.374
  1.0       30.83           2.803

Table 13. Wine recognition, lost values

  α       Error rate   Standard deviation
  0.001     18.19           2.031
  0.6       17.70           1.661
  0.8       17.04           1.485
  1.0       17.26           1.877

Table 14. Wine recognition, "do not care" conditions

  α       Error rate   Standard deviation
  0.001     26.24           3.095
  0.4       22.47           2.945
  0.8       23.49           3.257
  1.0       37.68           4.162
Our secondary objective was to test how many different probabilistic approximations exist for a given concept of a real-life data set. Results are listed in Tables 15–22.
7 Conclusions
Our main objective was to test whether probabilistic approximations, different from lower and upper approximations, are truly better than lower and upper approximations. As follows from our experiments, for only one out of 16 possibilities (the wine recognition data set combined with missing attribute values interpreted as "do not care" conditions) are some proper probabilistic approximations better than the ordinary lower and upper approximations. On the other hand, for another possibility (the bankruptcy data set combined with missing attribute values interpreted as "do not care" conditions) some proper probabilistic approximations are worse than the ordinary lower and upper approximations.

Additionally, we may conclude that upper approximations are better than lower approximations for the following six data sets, all associated with "do not care" conditions: bankruptcy, echocardiogram, hepatitis, iris, lymphography, and wine recognition. For only one combination (breast cancer and lost values) was the lower approximation better than the upper approximation. For the remaining nine combinations the difference is not statistically significant.
Table 15. Bankruptcy

                  Upper limit for the number of
                  distinct probabilistic approximations
  Concept               ?        *
  bankruptcy            1       18
  survival              1       18

Table 16. Breast cancer

                           Upper limit for the number of
                           distinct probabilistic approximations
  Concept                       ?        *
  recurrence-events            15      128
  no-recurrence-events         15      128

Table 17. Echocardiogram

                  Upper limit for the number of
                  distinct probabilistic approximations
  Concept               ?        *
  one                   1       13
  zero                  1       13

Table 18. Hepatitis

                  Upper limit for the number of
                  distinct probabilistic approximations
  Concept               ?        *
  yes                   1       47
  no                    1       47

Table 19. Image segmentation

                  Upper limit for the number of
                  distinct probabilistic approximations
  Concept               ?        *
  sky                   1       47
  cement                3       76
  window                2       58
  brickface             2       66
  foliage               2       75
  path                  3       63
  grass                 1       56

Table 20. Iris

                     Upper limit for the number of
                     distinct probabilistic approximations
  Concept                  ?        *
  iris-setosa              3       67
  iris-virginica           6       62
  iris-versicolor          7       65

Table 21. Lymphography

                  Upper limit for the number of
                  distinct probabilistic approximations
  Concept               ?        *
  one                   1        3
  two                   2       26
  three                 2       26
  four                  1        1
Furthermore, for five data sets (bankruptcy, image segmentation, iris, lymphography, and wine recognition) associated with lost values the error rate is smaller than the error rate for the same data sets associated with "do not care" conditions. In the case of the lymphography data set the conclusion that lost values are better than "do not care" conditions was a result of the Wilcoxon matched-pairs signed-ranks test (again, a two-tailed test with a 5% significance level).
We also conducted experiments to test, for a given concept, how many different probabilistic approximations there exist for a combination of the data set and the type of missing attribute values.
Table 22. Wine recognition

                  Upper limit for the number of
                  distinct probabilistic approximations
  Concept               ?        *
  one                   4       96
  two                   3       69
  three                 6      106
Note that the upper limits for the number of distinct probabilistic approximations are always smaller for lost values than for "do not care" conditions. This fact explains why the error rate for data sets with lost values is constant or shows only small deviations in Figures 1–4. If the only conditional probabilities associated with characteristic sets are 0s and 1s, then the only probabilistic approximations are the ordinary lower and upper approximations.
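As an illustration of this counting (a sketch only, reusing the Table 2 characteristic sets; it is not the code used for Tables 15–22), the number of distinct probabilistic approximations of a concept can be obtained by trying one threshold per distinct conditional probability.

```python
# Characteristic sets for Table 2 and the concept [(Flu, yes)], as in the earlier sketches.
K = {1: {1, 4}, 2: {2}, 3: {3, 6, 7}, 4: {4}, 5: {5}, 6: {3, 6, 7}, 7: {3, 7}}
X = {1, 2, 3, 4}

def approximation(X, K, alpha):
    """Concept probabilistic approximation of X for a given alpha."""
    result = set()
    for x in X:
        if len(X & K[x]) / len(K[x]) >= alpha:
            result |= K[x]
    return result

# One candidate threshold per distinct conditional probability P(X | K(x)), x in X;
# the number of distinct resulting sets is the number of distinct approximations.
thresholds = {len(X & K[x]) / len(K[x]) for x in X}
distinct = {frozenset(approximation(X, K, a)) for a in thresholds}
print(len(distinct))    # 2, matching the two approximations found in Section 4
```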
In the future we plan to study probabilistic approximations using a series of incomplete data sets with an incrementally increased number of missing attribute values.
References
[1] P. G. Clark and J. W. Grzymala-Busse. Experiments
on probabilistic approximations. In Proceedings of
the 2011 IEEE International Conference on Granular
Computing, pages 144–149, 2011.
[2] J. W. Grzymala-Busse. Knowledge acquisition under
uncertainty—A rough set approach. Journal of Intelligent & Robotic Systems, 1:3–16, 1988.
[3] J. W. Grzymala-Busse. On the unknown attribute values in learning from examples. In Proceedings of the
ISMIS-91, 6th International Symposium on Methodologies for Intelligent Systems, pages 368–377, 1991.
[4] J. W. Grzymala-Busse. LERS—a system for learning
from examples based on rough sets. In R. Slowinski, editor, Intelligent Decision Support. Handbook of
Applications and Advances of the Rough Set Theory,
pages 3–18. Kluwer Academic Publishers, Dordrecht,
Boston, London, 1992.
[5] J. W. Grzymala-Busse. MLEM2: A new algorithm
for rule induction from imperfect data. In Proceedings of the 9th International Conference on Information Processing and Management of Uncertainty in
Knowledge-Based Systems, pages 243–250, 2002.
[6] J. W. Grzymala-Busse. Three approaches to missing
attribute values—a rough set perspective. In Proceedings of the Workshop on Foundation of Data Mining,
in conjunction with the Fourth IEEE International
Conference on Data Mining, pages 55–62, 2004.
[7] J. W. Grzymala-Busse. Generalized parameterized
approximations. In Proceedings of the RSKT 2011,
the 6-th International Conference on Rough Sets and
Knowledge Technology, pages 136–145, 2011.
[8] J. W. Grzymala-Busse and M. Hu. A comparison
of several approaches to missing attribute values in
data mining. In Proceedings of the Second International Conference on Rough Sets and Current Trends
in Computing, pages 340–347, 2000.
[9] J. W. Grzymala-Busse and W. Rzasa. A local version
of the MLEM2 algorithm for rule induction. Fundamenta Informaticae, 100:99–116, 2010.
[10] J. W. Grzymala-Busse and A. Y. Wang. Modified algorithms LEM1 and LEM2 for rule induction from
data with missing attribute values. In Proceedings of
the Fifth International Workshop on Rough Sets and
Soft Computing (RSSC’97) at the Third Joint Conference on Information Sciences (JCIS’97), pages 69–
72, 1997.
[11] J. W. Grzymala-Busse and W. Ziarko. Data mining
based on rough sets. In J. Wang, editor, Data Mining:
Opportunities and Challenges, pages 142–173. Idea
Group Publ., Hershey, PA, 2003.
[12] M. Kryszkiewicz. Rules in incomplete information
systems. Information Sciences, 113(3-4):271–292,
1999.
[13] Z. Pawlak. Rough sets. International Journal of Computer and Information Sciences, 11:341–356, 1982.
[14] Z. Pawlak and A. Skowron. Rough sets: Some extensions. Information Sciences, 177:28–40, 2007.
[15] Z. Pawlak, S. K. M. Wong, and W. Ziarko. Rough
sets: probabilistic versus deterministic approach. International Journal of Man-Machine Studies, 29:81–
95, 1988.
[16] D. Ślȩzak and W. Ziarko. The investigation of the Bayesian rough set model. International Journal of Approximate Reasoning, 40:81–91, 2005.
[17] J. Stefanowski and A. Tsoukias. Incomplete information tables and rough classification. Computational
Intelligence, 17(3):545–566, 2001.
[18] S. K. M. Wong and W. Ziarko. INFER—an adaptive
decision support system based on the probabilistic approximate classification. In Proceedings of the 6-th
International Workshop on Expert Systems and their
Applications, pages 713–726, 1986.
[19] Y. Y. Yao. Probabilistic rough set approximations. International Journal of Approximate Reasoning, 49:255–271, 2008.
[20] Y. Y. Yao and S. K. M. Wong. A decision theoretic
framework for approximate concepts. International
Journal of Man-Machine Studies, 37:793–809, 1992.
[21] W. Ziarko. Variable precision rough set model. Journal of Computer and System Sciences, 46(1):39–59,
1993.
[22] W. Ziarko. Probabilistic approach to rough sets. International Journal of Approximate Reasoning, 49:272–
284, 2008.