Mining Frequent Patterns from Correlated Incomplete Databases
Badran Raddaoui (1) and Ahmed Samet (2)
(1) LIAS - ENSMA EA 6315, University of Poitiers, France
(2) University of Tunis El Manar, LIPAH Laboratory, Faculty of Sciences of Tunis, Tunisia
[email protected], [email protected]
Keywords: Imperfection, Evidential Database, Correlated Incomplete Database, Frequent Itemset Mining.
Abstract: Modern real-world applications are forced to deal with inconsistent, unreliable and imprecise information. In this setting, considerable research effort has been devoted to handling the intrinsic imprecision of the data. Indeed, several frameworks have been introduced to deal with imperfection, such as probabilistic, fuzzy, possibilistic and evidential databases. In this paper, we present an alternative framework, called correlated incomplete database, to deal with information suffering from imprecision. In addition, the correlated incomplete database is studied from a data mining point of view. Since frequent itemset mining is one of the most fundamental problems in data mining, we propose an algorithm to extract frequent patterns from correlated incomplete databases. Our experiments demonstrate the effectiveness and scalability of our framework.
1 INTRODUCTION
Uncertain information is commonplace in real-world
data management domains. In recent years, uncertain data management has seen a revival in interest
because of a number of challenges in terms of collecting, modeling, representing, querying, indexing and mining the data. The study of uncertainty and incompleteness in databases has long been a growing interest of the database community (Fuhr and Rölleke,
1997; Halpern, 1990; Imielinski and Jr., 1984). Recently, this interest has been rekindled by an increasing demand for managing large amounts of heterogeneous data, often incomplete and uncertain, emerging
from data cleaning, scientific data management, information extraction, sensor data management, economic decision making, moving object management,
market surveillance, etc. In particular, the incorporation of imprecise information is nowadays more and
more recognized as being indispensable for industrial
practice.
Handling databases that suffer from imperfection has been an attractive scientific discipline since the 1990s (Bell et al., 1996). The nature of real-world data has led the database community to develop new frameworks that handle imperfect information. Three types of imperfection have been considered, including imprecision, uncertainty and inconsistency. Both imprecision and uncertainty are largely studied in the literature (Hewawasam et al., 2007;
Lee, 1992a). More precisely, in (Dalvi and Suciu,
2007), the authors focus on query evaluation in traditional probabilistic databases; and ULDB (Benjelloun et al., 2006) supports uncertain data and data lineage in Trio (Widom, 2005). Recently, MayBMS uses
the vertical World-Set representation of uncertain data
(Olteanu et al., 2008). Note that the standard semantics adopted in most work is the possible worlds semantics (Zimányi, 1997).
In addition, several uncertain frameworks such as
probabilities, fuzzy set theory and more recently evidence theory are commonly used to model imprecise
data (Samet et al., 2014b). Indeed, information under imprecision is modeled by sets, intervals, and fuzzy values (Chen and Weng, 2008). Furthermore, the lack of information is considered as a type of imprecision which refers to incompleteness. Incompleteness is a ubiquitous problem in practical data management. Indeed, in real-world applications, we may encounter such data in store databases. To illustrate, let
us consider a buyer who purchased two products p1
and p2 from a market. For instance, we may know
just the information that the buyer bought the product
p1 more than p2 (p1 > p2 ). Even the absence of information could be seen as some kind of incompleteness
in incomplete databases. For example, the information whether the buyer has bought the product p1 may not be available.
The theoretical foundations for representing
and querying incomplete information were laid by
Imielinski and Lipski (Imielinski and Jr., 1984). To
answer queries in the presence of incompleteness,
Levy (Levy, 1996) suggested to look for certain answers: those that do not depend on the interpretation
of unknown data, without requiring the completeness
of other parts of the database. Later, in (Razniewski
and Nutt, 2011) the authors developed techniques to
conclude the completeness of query answers from information about the completeness of parts of an incomplete database.
Recently, a new line of research has been established, which became known under the name Data
Mining. The problem of mining frequent itemsets is
well-known and essential in data mining, knowledge
discovery and data analysis. It has applications in various fields and becomes fundamental for data analysis as datasets and datastores that are becoming very
large.
In this paper, we propose an alternative framework
for handling incompleteness and correlation between
attributes within incomplete databases. The new
framework, called correlated incomplete database,
will be tackled from the data mining perspective. Specifically, the presence of an element in such a database is represented relative to another element, without any prior indication about their quantity values. Indeed, information is represented using some kind of correlations/dependencies between elements of the given database. Further, the absence of information is considered in this new database model, i.e., the value of an attribute is not known for sure. In addition, we
provide a new database modelling and a mining procedure to handle such a database. For this, the correlated incomplete database is transformed into an evidential database using Ben Yaghlane's axioms (Yaghlane et al., 2006). Next, we propose an algorithm to
mine answers over the obtained evidential database.
In addition, our empirical evaluation is conducted on
a synthetic correlated incomplete database obtained
from student answers to a questionnaire.
The rest of this paper proceeds as follows. In Section 2, we recall basic notions of evidence theory.
Then, we present the concept of evidential database.
In Section 3, we introduce a new kind of incomplete
database called correlated incomplete database whose imprecision is due to, inter alia, the lack of information. Section 4 shows the method to transform a
correlated incomplete database into an evidential one
using Ben Yaghlane axioms. Then, we propose in
Section 5 a new itemset mining algorithm in the context
of correlated incomplete databases. Section 6 deals
with the implementation and empirical evaluation. Finally, we summarize the paper and we sketch issues
of future work.
2 FORMAL SETTING AND NOTATION
In this section, we briefly review evidence theory, also
known as belief functions theory or Dempster-Shafer
theory, and we extend it to introduce the basic concepts of evidential databases (Lee, 1992b).
2.1 EVIDENCE THEORY
The evidence theory (Dempster, 1967) has become increasingly popular. It is a simple and flexible framework for dealing with imperfect information. It generalizes the probabilistic framework by its capacity to model total and partial ignorance. Moreover, it is a powerful tool for combining data. Several interpretations of evidence theory have been studied, such as (Dempster, 1967; Gärdenfors, 1983; Smets and Kennes, 1994). One of the most used is the Transferable Belief Model (TBM) proposed by Smets (Smets and Kennes, 1994) to represent quantified beliefs. The TBM is a non-probabilistic interpretation of evidence theory relying on two distinct levels: (i) a credal level where beliefs are entertained and quantified by belief functions; (ii) a pignistic level where beliefs can be used to make decisions and are quantified by probability functions. The evidence theory is based on several fundamental notions such as the Basic Belief Assignment (BBA). A BBA m is a mapping from elements of the power set 2^Θ onto [0, 1]:
m : 2^Θ → [0, 1]

where Θ is the frame of discernment. It is the set of possible answers for a treated problem and is composed of N exhaustive and exclusive hypotheses: Θ = {H1, H2, ..., HN}.
A BBA m must satisfy the following constraints:

∑_{X⊆Θ} m(X) = 1,   m(∅) ≥ 0                                   (1)
Each subset X of 2^Θ fulfilling m(X) > 0 is called a focal element. Constraining m(∅) = 0 gives the normalized form of a BBA and corresponds to a closed-world assumption (Smets, 1988), while allowing m(∅) ≥ 0 corresponds to an open-world assumption (Smets and Kennes, 1994).
In the spirit of BBA, other functions are commonly introduced from 2^Θ to [0, 1]: the first one, called the belief function, is interpreted as the degree of justified support assigned to a proposition A by the available evidence. Formally, it is defined as:

Bel(A) = ∑_{∅≠B⊆A} m(B)                                        (2)
On the other hand, the plausibility function, denoted Pl(·), is defined as follows:

Pl(A) = ∑_{B∩A≠∅} m(B)                                         (3)

The plausibility expresses the maximum potential support that could be given to a hypothesis, if further evidence becomes available.
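To make these two functions concrete, the short Python sketch below (our own illustration, not code from the paper; the function names bel and pl and the dict-of-frozensets encoding are assumptions) computes Bel and Pl for the BBA m11 of attribute A in transaction T1 of Table 1 below.

    # BBA over the frame Θ_A = {A1, A2}: m(A1) = 0.7, m(Θ_A) = 0.3 (Table 1, T1).
    bba = {frozenset({"A1"}): 0.7, frozenset({"A1", "A2"}): 0.3}

    def bel(bba, hypothesis):
        """Belief: total mass of non-empty focal elements included in the hypothesis (Eq. 2)."""
        h = frozenset(hypothesis)
        return sum(mass for focal, mass in bba.items() if focal and focal <= h)

    def pl(bba, hypothesis):
        """Plausibility: total mass of focal elements intersecting the hypothesis (Eq. 3)."""
        h = frozenset(hypothesis)
        return sum(mass for focal, mass in bba.items() if focal & h)

    print(bel(bba, {"A1"}), pl(bba, {"A1"}))  # 0.7 and 1.0, so Bel(A1) <= Pl(A1)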
2.2 EVIDENTIAL DATABASE
An evidential database stores data that could be perfect or imperfect. Data imperfection in such a database is expressed via evidence theory. Formally, an evidential database, denoted by EDB, is composed of d lines and n columns, where each column i (1 ≤ i ≤ n) has a domain Θi of discrete values. The cell at line j and column i contains a normalized BBA defined as follows:
mij : 2^Θi → [0, 1]  with  ∑_{A⊆Θi} mij(A) = 1  and  mij(∅) = 0          (4)
Such a representation makes the evidential database one of the most general formalisms, able to capture any other kind of database (Samet et al., 2014b).
Transaction | Attribute A                   | Attribute B
T1          | m11(A1) = 0.7, m11(ΘA) = 0.3  | m21(B1) = 0.4, m21(B2) = 0.2, m21(ΘB) = 0.4
T2          | m12(A2) = 0.3, m12(ΘA) = 0.7  | m22(B1) = 1
Table 1: Evidential transaction database EDB
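As an illustration only (this encoding is ours, not a format defined in the paper), the evidential database of Table 1 can be stored as a nested mapping from transactions and attributes to BBAs, and the normalization constraints of Equation 4 can be checked directly:

    THETA_A = frozenset({"A1", "A2"})
    THETA_B = frozenset({"B1", "B2"})

    # Evidential database EDB of Table 1: transaction -> attribute -> BBA.
    EDB = {
        "T1": {"A": {frozenset({"A1"}): 0.7, THETA_A: 0.3},
               "B": {frozenset({"B1"}): 0.4, frozenset({"B2"}): 0.2, THETA_B: 0.4}},
        "T2": {"A": {frozenset({"A2"}): 0.3, THETA_A: 0.7},
               "B": {frozenset({"B1"}): 1.0}},
    }

    # Each cell must hold a normalized BBA: masses sum to 1 and m(∅) = 0 (Equation 4).
    for tid, row in EDB.items():
        for attr, bba in row.items():
            assert abs(sum(bba.values()) - 1.0) < 1e-9
            assert all(focal for focal in bba)  # no empty focal element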
In an evidential database, as shown in Table 1, an evidential item corresponds to a focal element. Thus, an evidential itemset corresponds to a conjunction of focal elements having different domains. Two different evidential itemsets can be related via the inclusion or intersection operator. Indeed, the inclusion operator (Samet et al., 2013) for evidential itemsets is defined as follows. Let X and Y be two evidential itemsets, then

X ⊆ Y ⇐⇒ ∀xi ∈ X, xi ⊆ yj

where xi and yj are, respectively, the ith and the jth elements of X and Y. For the same evidential itemsets X and Y, the intersection operator (Samet et al., 2013) is defined as follows:

X ∩ Y = Z ⇐⇒ ∀zk ∈ Z, zk ⊆ xi and zk ⊆ yj

Example 1. From Table 1, A1 is an item and {ΘA B1} is an itemset such that A1 ⊂ {ΘA B1} and A1 ∩ {ΘA B1} = A1.
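One possible reading of these operators, given here purely as an illustration (the dictionary encoding and the function names are our assumptions), represents an evidential itemset as a mapping from attribute domains to focal elements; inclusion then holds when every component of X is included in the corresponding component of Y, and the intersection is taken component-wise on the shared attributes. This reproduces Example 1:

    THETA_A = frozenset({"A1", "A2"})
    THETA_B = frozenset({"B1", "B2"})

    # An evidential itemset: attribute -> focal element (a subset of that attribute's frame).
    X = {"A": frozenset({"A1"})}                 # the item A1
    Y = {"A": THETA_A, "B": frozenset({"B1"})}   # the itemset {Θ_A B1}

    def included(x, y):
        """X ⊆ Y: every component of X is included in the matching component of Y."""
        return all(attr in y and focal <= y[attr] for attr, focal in x.items())

    def intersection(x, y):
        """X ∩ Y: component-wise intersection over the attributes shared by X and Y."""
        return {attr: x[attr] & y[attr] for attr in x.keys() & y.keys()}

    print(included(X, Y))           # True:  A1 ⊂ {Θ_A B1}
    print(intersection(X, Y) == X)  # True:  A1 ∩ {Θ_A B1} = A1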
3 CORRELATED INCOMPLETE DATABASE

In this section, we present a new kind of imperfect database called correlated incomplete database. A correlated incomplete database is an imprecise database in which the imprecision refers to the lack of information.

Definition 1. A correlated incomplete database is a triple CIDB = (O, I, R̃) where:
• O is the set of objects (e.g., transactions),
• I is the set of items,
• R̃ describes the existence relation of an item to a transaction.

Definition 2. Let CIDB = (O, I, R̃) be a correlated incomplete database. For two items p1, p2 ∈ I, we define the following operators:
• p1 ≻ p2 means that the existence of p1 depends on the existence of p2 [Dependency]
• p1 ∼ p2 means that p1 is quasi-equal to p2 w.r.t. quantity [Quasi-equality]
• − denotes the absence of information about an item [Deficiency]
Intuitively, the operator ≻ expresses that p1 and p2 are highly correlated with each other. The operator ∼ expresses the quasi-equality between two items without any information about their initial quantity. Finally, the operator − says that the value of an item is not known for sure, i.e., we know nothing about the item.
Property 1. Let CIDB = (O, I, R̃) be a correlated incomplete database and T ∈ O. Then, CIDB is symmetric if for all items p1, p2 in T, we have:

R̃(p1, T) = p1 rel p2 ⇒ R̃(p2, T) = p1 rel p2                   (5)
Example 2. Let us consider the correlated incomplete database depicted in Table 2. Then,
• In T1, p1 ≻ p3 means that the client has bought more p1 than p3 without any further indication about the quantity of products.
• In T2, p2 ∼ p3 means that the client has bought p2 and p3 in similar quantities.
• R̃(p2, T1) = − signifies that we know nothing about p2 in the transaction T1.
Transaction | p1      | p2      | p3
T1          | p1 ≻ p3 | −       | p1 ≻ p3
T2          | −       | p2 ∼ p3 | p2 ∼ p3
Table 2: Example of a correlated incomplete database
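For later reference, the correlated incomplete database of Table 2 can be encoded directly; the tuple-based representation below is only one possible encoding (our own, not prescribed by the paper), and the final loop checks the symmetry of Property 1:

    # Each cell holds either a relation ('>' for ≻, '~' for ∼) between two items, or None for '−'.
    CIDB = {
        "T1": {"p1": (">", "p1", "p3"), "p2": None,              "p3": (">", "p1", "p3")},
        "T2": {"p1": None,              "p2": ("~", "p2", "p3"), "p3": ("~", "p2", "p3")},
    }

    # Symmetry (Property 1): both items involved in a relation carry the same entry.
    for row in CIDB.values():
        for rel in row.values():
            if rel is not None:
                _, a, b = rel
                assert row[a] == row[b] == rel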
4 FROM CORRELATED INCOMPLETE DATABASE TO EVIDENTIAL ONE

4.1 BEN YAGHLANE AXIOMS TRANSFORMATION
The problem of eliciting expert opinions qualitatively and generating basic belief assignments has been addressed by many researchers (Ennaceur et al., 2014). In this subsection, we provide an overview of the approach of Ben Yaghlane et al. (Yaghlane et al., 2006). The authors proposed a method for generating optimized belief functions from qualitative preferences. The aim of this method is to convert preference relations into constraints of an optimization problem whose resolution, according to some uncertainty measure (UM), allows the generation of the least informative or the most uncertain belief functions, defined as follows:
a ≻ b ⇒ Bel(a) − Bel(b) ≥ ε                                    (6)

a ∼ b ⇒ |Bel(a) − Bel(b)| ≤ ε                                  (7)
where ε is considered to be the smallest gap that
the expert may discern between the degrees of belief
in two propositions a and b. Note that ε is a constant specified by the expert before beginning the optimization process. Ben Yaghlane et al. developed a
method that requires that propositions be represented
in terms of focal elements, and they assume that Θ
should always be considered as a potential focal element. Then, a mono-objective technique was used to
solve such a constrained optimization problem:
max_m UM(m)
s.t.  Bel(a) − Bel(b) ≥ ε
      Bel(a) − Bel(b) ≤ ε
      Bel(a) − Bel(b) ≥ −ε
      ∑_{a∈F(m)} m(a) = 1,  m(a) ≥ 0 ∀a ⊆ Θ,  m(∅) = 0          (8)

4.2 OUR APPROACH

Mining frequent itemsets directly from correlated incomplete databases (see Table 2) is a difficult task. The database contains information only about the existence of an item relative to another, rather than its frequency. In addition, the item's quantity in each record is not required. This constraint prevents the straightforward use of any usual mining method. In this subsection, we introduce a new method for transforming a correlated incomplete database in order to obtain a classical and treatable database. Note that the correlated incomplete database is transformed into an evidential one from a data mining perspective. The transformation process is made thanks to Ben Yaghlane's axioms. Recall that the axioms of Equations 6 and 7 were firstly introduced to express expert preferences. However, they can be used to express numerical superiority between items. So, given a correlated incomplete database CIDB and two items p1 and p2 such that p1 ≻ p2, we need to interpret this proposition from an evidential point of view. Two BBAs can be constructed. The first BBA refers to the p1 column in the database and answers the question "Does the client buy the product p1?". Its frame of discernment Θ1 is constituted by two elements yes and no, such that Θ1 = {y, n}. The second BBA answers the same question but relative to the item p2. More generally, we have:

a ≻ b ⇒ Bel(ya) − Bel(yb) ≥ ε
This assertion is reasonable since we do not
have any information about the item’s frequency,
but only the existence of items and the dependency
between them in the database. The result is a BBA as
described in the following example:




m(ya) = v              m(yb) = v − ε
m(na) = 0              m(nb) = 0
m(Θa) = 1 − v          m(Θb) = 1 − v + ε
where v is a real value such that 0 < v < 1. Fixing v to a low value leads to a less informative BBA. The ε value is chosen by the expert depending on the gap between the beliefs of the items. Despite the fact that Equation 8 was initially introduced as a property for a constructed BBA, it can be extended to assimilate two BBAs. Note that when a ∼ b, two identical BBAs can be constructed as follows:


m(y) = v,  m(n) = 0,  m(Θ) = 1 − v                             (9)

Finally, when the value of an item is not known for sure, a vacuous BBA is constructed as follows:

m(Θ) = 1                                                       (10)
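A small sketch of how these BBAs can be produced in code (our own illustration; build_bba and the role labels are hypothetical names, not the paper's API). With v = 0.2 and ε = 0.05 the values match the example database shown later in Table 3:

    YES, THETA = frozenset({"y"}), frozenset({"y", "n"})  # {y} and Θ = {y, n}

    def build_bba(role, v=0.2, eps=0.05):
        """Return a BBA over {y, n} for one item (zero masses omitted).

        role: 'superior' -> the dominant item a in a ≻ b
              'inferior' -> the dominated item b in a ≻ b
              'quasi'    -> either item in a ∼ b (Equation 9)
              'unknown'  -> no information, vacuous BBA (Equation 10)
        """
        y_mass = {"superior": v, "inferior": v - eps, "quasi": v, "unknown": 0.0}[role]
        return {THETA: 1.0} if y_mass == 0.0 else {YES: y_mass, THETA: 1.0 - y_mass}

    # Superior item: m(y) = 0.20, m(Θ) = 0.80; inferior item: m(y) = 0.15, m(Θ) = 0.85.
    print(build_bba("superior"), build_bba("inferior"), build_bba("unknown"))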
Using the previous transformation steps, the obtained evidential database has the same size as the
original correlated incomplete database. Now, retrieving the value of the support from that database can be
done with the precise function (Samet et al., 2014a).
More precisely, given an item xi , the precise value is
computed as follows:
Pr(xi) = ∑_{x⊆Θi} (|xi ∩ x| / |x|) × m(x),   ∀ xi ∈ 2^Θi       (11)

Thus, the support of an itemset X is found in the obtained evidential database EDB as follows:

Sup^Pr_Tj(X) = ∏_{Xi∈Θi, i∈[1...n]} Pr(Xi)                     (12)

Sup_EDB(X) = (1/d) ∑_{j=1}^{d} Sup^Pr_Tj(X)                    (13)
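The following Python sketch (illustrative only; the data layout and function names are ours) implements Equations 11-13 and recovers the support of p2 in the evidential database obtained from Table 2, which is shown later as Table 3:

    YES, THETA = frozenset({"y"}), frozenset({"y", "n"})

    def precise(bba, target):
        """Pr(x_i) = Σ_{x⊆Θi} |x_i ∩ x| / |x| · m(x)   (Equation 11)."""
        return sum(len(target & focal) / len(focal) * mass for focal, mass in bba.items())

    def support_in_transaction(transaction, itemset):
        """Sup^Pr_Tj(X): product of the precise values of the itemset components (Equation 12)."""
        prod = 1.0
        for attr, target in itemset.items():
            prod *= precise(transaction[attr], target)
        return prod

    def support(edb, itemset):
        """Sup_EDB(X): average of the transaction-wise supports over the d transactions (Equation 13)."""
        return sum(support_in_transaction(t, itemset) for t in edb) / len(edb)

    def two_valued(y_mass):  # helper: BBA over {y, n} with m(n) = 0
        return {THETA: 1.0} if y_mass == 0.0 else {YES: y_mass, THETA: 1.0 - y_mass}

    # Evidential database obtained from Table 2 with v = 0.2 and eps = 0.05 (i.e., Table 3).
    EDB = [
        {"p1": two_valued(0.20), "p2": two_valued(0.0),  "p3": two_valued(0.15)},  # T1
        {"p1": two_valued(0.0),  "p2": two_valued(0.20), "p3": two_valued(0.20)},  # T2
    ]
    print(support(EDB, {"p2": YES}))  # 0.55, matching the worked example in Section 4.2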
Algorithm 1: Correlated Incomplete Database Transformation Algorithm
Require: ε, CIDB
Ensure: EDB

function generate_BBA(CIDB, ε, T)
    for all i in Columns(CIDB) do
        if CIDB(T, i) = ≻ then
            BBA ← construct_BBA(ε)          // Bel(a) − Bel(b) ≥ ε
        end if
        if CIDB(T, i) = ∼ then
            BBA ← construct_BBA(ε)          // |Bel(a) − Bel(b)| ≤ ε
        end if
    end for
    return BBA
end function

for all T in Size(CIDB) do
    BBA ← generate_BBA(CIDB, ε, T)
    EDB(T, 1) ← BBA
end for
Table 3 illustrates the transformation process for the correlated incomplete database given previously in Table 2. The computed BBAs are defined over the frames of discernment of the items, with ε = 0.05 and v = 0.2. It should be noted that even though there is a lack of information about the item p2 in the transaction T1 (see Table 2), p2 possesses a reasonable support. Indeed, the lack of information does not signify the non-existence of the item. In our case, p2 has a support equal to:

Sup(p2) = Sup_EDB(y2) = [ (1/2) m21(Θ2) + m22(y2) + (1/2) m22(Θ2) ] / 2 = 0.55

Transaction | p1                                              | p2                                              | p3
T1          | m11(y1) = 0.20, m11(n1) = 0.00, m11(Θ1) = 0.80  | m21(Θ2) = 1.00                                  | m31(y3) = 0.15, m31(n3) = 0.00, m31(Θ3) = 0.85
T2          | m12(Θ1) = 1.00                                  | m22(y2) = 0.20, m22(n2) = 0.00, m22(Θ2) = 0.80  | m32(y3) = 0.20, m32(n3) = 0.00, m32(Θ3) = 0.80
Table 3: The evidential database obtained from Table 2
In the following, we detail the procedure for transforming a correlated incomplete database into an evidential one. Algorithm 1 provides the evidential transformation through Ben Yaghlane's axioms. Given a correlated incomplete database CIDB, the function generate_BBA(CIDB, ε, T) constructs a BBA for the transaction T in CIDB for a fixed ε. The computed BBA satisfies all constraints in the columns of the considered transaction. The computed BBA is then inserted into the evidential database EDB. This process is repeated for all transactions of the database CIDB.
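Under the same encoding as the Table 2 sketch above, a compact Python rendering of Algorithm 1 could look as follows (a sketch with our own assumed helper names, not the authors' implementation):

    YES, THETA = frozenset({"y"}), frozenset({"y", "n"})

    def build_bba(role, v=0.2, eps=0.05):
        """'superior'/'inferior' for the two sides of ≻, 'quasi' for ∼, 'unknown' for −."""
        y_mass = {"superior": v, "inferior": v - eps, "quasi": v, "unknown": 0.0}[role]
        return {THETA: 1.0} if y_mass == 0.0 else {YES: y_mass, THETA: 1.0 - y_mass}

    def generate_bba(row, item, v=0.2, eps=0.05):
        """Construct the BBA of one item in one transaction from its relation entry."""
        rel = row[item]
        if rel is None:                                   # '−': no information
            return build_bba("unknown", v, eps)
        op, a, b = rel
        if op == "~":                                     # quasi-equality
            return build_bba("quasi", v, eps)
        return build_bba("superior" if item == a else "inferior", v, eps)   # '≻'

    def transform(cidb, v=0.2, eps=0.05):
        """Algorithm 1: map every cell of the correlated incomplete database to a BBA."""
        return {t: {item: generate_bba(row, item, v, eps) for item in row}
                for t, row in cidb.items()}

    # Applied to the Table 2 encoding given earlier, transform(CIDB) reproduces the BBAs of Table 3.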
5 FREQUENT ITEMSET MINING
In this section, correlated incomplete databases are studied from a data mining point of view. Since frequent itemset mining is one of the most fundamental problems in data mining, we present in the following an algorithm to extract frequent patterns from correlated incomplete databases. This can be done by using the evidential database obtained by Algorithm 1. Algorithm 2 describes the process of mining frequent patterns from an evidential database. The proposed algorithm, called EDMA, is a level-wise approach for determining all frequent patterns from the evidential database EDB.
6 EMPIRICAL EVALUATION
In this section, we discuss some experimental results for mining frequent itemsets from correlated incomplete databases. Even if this kind of incomplete database has not been discussed yet in the literature, we can envisage several sorts of applications, since uncertain events are naturally highly correlated with each other. For instance, one can consider the case of a Market Basket Analysis (MBA) problem where we lack information about the quantity of each bought item. The deficiency of knowledge and the quantity-wise comparison among items are ubiquitous in this kind of data. In the following, we propose to study a database constructed from a questionnaire given to the students of the University of Littoral Cote d'Opale. The questionnaire is about the grades obtained last year in 12 subjects. Since most students have forgotten their grades, their answers
should be described using a correlated incomplete database. In particular, the students mark the dependency of one grade relative to another, or nothing in the case they do not remember. About 312 response forms were collected, 6 of which were rejected (a few students did not understand the task and their responses were omitted). The obtained database is depicted in Table 4.

Table 4: A sample of the correlated incomplete database provided by students' responses. Each student (S1, S2, S3, ...) reports, over the 12 subjects, relations such as Programming ≻ DB, DB ≻ Network, DB ∼ Programming, Programming ≻ OS, OS ∼ Network, or − when a grade is not remembered.
The correlated incomplete database constructed from students' answers is transformed into an evidential database following the procedure described in Algorithm 1. Since each proposed data mining algorithm must stand out performance-wise and quality-wise, the provided EDMA algorithm associated with the transformation module must extract the maximal frequent patterns in a reasonable time. Therefore, the original correlated incomplete database is extended by data duplication. We compared the data mining task using the precise-based support and the belief-based one (Hewawasam et al., 2007).
Figure 1 represents the evolution of frequent patterns relative to the increase of the database size.

[Figure 1: Extraction time (s) relative to the database size, for the precise-based and belief-based support.]

Figure 2 represents the number of frequent patterns extracted from the obtained evidential database relative to the fixed value of minsup.

[Figure 2: Number of extracted frequent patterns relative to minsup, for the precise-based and belief-based support.]
We can see that the number of frequent patterns is 22967 for a minsup fixed to 0.5, retrieved with the precise-based support. This important number of patterns may be explained by the size of the database, which has 12 attributes, each containing two elements within its frame of discernment. As a result, the treated database is similar to a 36-column database. The itemsets composed only of the yi (i ∈ [1, 12]) items are the most interesting, since they reflect the subjects where students got the best grades.
Thus, retrieving frequent patterns of the initial correlated incomplete database is possible by reducing frequent evidential patterns to those containing only the yi items. In addition, compared to the belief-based support, we discover more hidden patterns with the precise-based support. Figure 3 shows the number of valid association rules retrieved from the evidential database. Each pattern of size k gives 2^k − 2 different association rules. Therefore, the precise-based confidence measure provides the highest number of valid association rules compared to the belief-based one. Those rules show the correlation between grades. In fact, we may have a rule of the form: if a student got a good grade in DB, then he should have a good grade in Programming.

[Figure 3: Number of extracted valid association rules relative to minconf, for the precise-based and belief-based confidence.]

In addition, Figure 1 highlights the scalability of the precise-based and the belief-based support. The minsup is fixed to 0.5 and the database is extended by duplicating data. The curve shows that the precise-based support is more expensive. This can be explained by two reasons. The first one is that the precise-based support generates more frequent candidates; the more candidates the algorithm handles, the more supports it computes. In addition, the precise-based support relies on set-intersection computation, and therefore more time is consumed than with the belief-based one.
7 CONCLUSION

In this paper, we have proposed a new type of imprecise database called correlated incomplete database. More precisely, the membership of an item in a given transaction is expressed relative to another item, making use of dependency and similarity operators. In addition, the deficiency of information is supported while maintaining consistency with the definition of incompleteness in the literature. Then, we have shown how a given correlated incomplete database can be transformed into an evidential one using Ben Yaghlane's axioms. We have also presented an algorithm for mining frequent patterns from an evidential database with the use of the precise support. Furthermore, the effectiveness of our approach is analyzed by experiments on a synthetic dataset, and the experiments confirm that our algorithm and strategy perform well. Our future work will investigate how to further study and implement our approach using real-world data. In particular, it will be interesting to consider openly available knowledge bases such as DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. Another interesting direction is how our mining method can be extended for mining sequential patterns, max patterns, and partial periodicity.

REFERENCES

Bell, D. A., Guan, J., and Lee, S. K. (1996). Generalized union and project operations for pooling uncertain and imprecise information. Data & Knowledge Engineering, 18(2):89–117.

Benjelloun, O., Sarma, A. D., Halevy, A. Y., and Widom, J. (2006). ULDBs: Databases with uncertainty and lineage. In Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, pages 953–964.

Chen, Y. and Weng, C. (2008). Mining association rules from imprecise ordinal data. Fuzzy Sets and Systems, 159(4):460–474.

Dalvi, N. N. and Suciu, D. (2007). Efficient query evaluation on probabilistic databases. VLDB J., 16(4):523–544.
Dempster, A. (1967). Upper and lower probabilities induced by a multivalued mapping. Annals of Mathematical Statistics, 38:325–339.
Ennaceur, A., Elouedi, Z., and Lefevre, E. (2014). Multicriteria decision making method with belief preference relations. International Journal of Uncertainty,
Fuzziness and Knowledge-Based Systems, 22(4):573–
590.
Fuhr, N. and Rölleke, T. (1997). A probabilistic relational
algebra for the integration of information retrieval and
database systems. ACM Trans. Inf. Syst., 15(1):32–66.
Gärdenfors, P. (1983). Probabilistic reasoning and evidentiary value. In Evidentiary Value: Philosophical, Judicial, and Psychological Aspects of a Theory: Essays
Dedicated to Sören Halldén on His Sixtieth Birthday.
C.W.K. Gleerups.
Halpern, J. Y. (1990). An analysis of first-order logics of
probability. Artif. Intell., 46(3):311–350.
Hewawasam, K. R., Premaratne, K., and Shyu, M.-L.
(2007). Rule mining and classification in a situation
assessment application: A belief-theoretic approach
for handling data imperfections. Trans. Sys. Man Cyber. Part B, 37(6):1446–1459.
Imielinski, T. and Jr., W. L. (1984). Incomplete information
in relational databases. J. ACM, 31(4):761–791.
Lee, S. (1992a). An extended relational database model for
uncertain and imprecise information. In Proceedings
of the 18th International Conference on Very Large
Data Bases, Vancouver, British Columbia, Canada,
pages 211–220.
Lee, S. (1992b). Imprecise and uncertain information in
databases: an evidential approach. In Proceedings of
Eighth International Conference on Data Engineering, Tempe, AZ, pages 614–621.
Levy, A. Y. (1996). Obtaining complete answers from incomplete databases. In Proceedings of the 22nd International Conference on Very Large Data Bases, Mumbai
(Bombay), India, pages 402–412.
Olteanu, D., Koch, C., and Antova, L. (2008). World-set decompositions: Expressiveness and efficient algorithms. Theor. Comput. Sci., 403(2-3):265–284.
Razniewski, S. and Nutt, W. (2011). Completeness of queries over incomplete databases. In Proceedings of the 37th International Conference on Very Large Data Bases, Seattle, WA, 4(11):749–760.
Samet, A., Lefevre, E., and Ben Yahia, S. (2013). Mining frequent itemsets in evidential database. In
Proceedings of the fifth International Conference on
Knowledge and Systems Engineering, Hanoi, Vietnam, pages 377–388.
Samet, A., Lefevre, E., and Ben Yahia, S. (2014a). Classification with evidential associative rules. In Proceedings of 15th International Conference on Information Processing and Management of Uncertainty
in Knowledge-Based Systems, Montpellier, France,
pages 25–35.
Samet, A., Lefevre, E., and Ben Yahia, S. (2014b). Evidential database: a new generalization of databases? In
Proceedings of 3rd International Conference on Belief
Functions, Belief 2014, Oxford, UK, pages 105–114.
Smets, P. (1988). Belief functions. In Non-Standard Logics for Automated Reasoning, P. Smets, A. Mamdani, D. Dubois, and H. Prade, Eds. London, U.K.: Academic Press, pages 253–286.
Smets, P. and Kennes, R. (1994). The Transferable Belief
Model. Artificial Intelligence, 66(2):191–234.
Widom, J. (2005). Trio: A system for integrated management of data, accuracy, and lineage. In Proceedings of
the Second Biennial Conference on Innovative Data
Systems Research (CIDR ’05), Pacific Grove, California, pages 262–276.
Yaghlane, A. B., Denoeux, T., and Mellouli, K. (2006).
Constructing belief functions from qualitative expert
opinions. In Proceedings of the Second International
Conference on Information and Communication Technologies, ICTTA, 1:1363–1368.
Zimányi, E. (1997). Query evaluation in probabilistic relational databases. Theor. Comput. Sci., 171(1-2):179–
219.
Algorithm 2: Evidential Data Mining Apriori (EDMA) algorithm
Require: EDB, minsup, PT, Size_EDB
Ensure: EIFF

function Frequent_itemset(candidate, minsup, PT, Size_EDB)
    frequent ← ∅
    for all x in candidate do
        if Support_estimation(PT, x, Size_EDB) ≥ minsup then
            frequent ← frequent ∪ {x}
        end if
    end for
    return frequent
end function

EIFF ← ∅
size ← 1
candidate ← candidate_apriori_gen(EDB, size)
while candidate ≠ ∅ do
    freq ← Frequent_itemset(candidate, minsup, PT, Size_EDB)
    size ← size + 1
    EIFF ← EIFF ∪ freq
    candidate ← candidate_apriori_gen(EDB, size, freq)
end while
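For illustration only, a minimal level-wise miner in the spirit of EDMA is sketched below in Python; the candidate generation is a simplified stand-in for candidate_apriori_gen (no Apriori pruning), Support_estimation is replaced by a direct computation of the precise support, and itemsets are restricted to the y components:

    from itertools import combinations

    YES, THETA = frozenset({"y"}), frozenset({"y", "n"})

    def precise(bba, target):
        """Pr(x_i) from Equation 11."""
        return sum(len(target & focal) / len(focal) * mass for focal, mass in bba.items())

    def support(edb, itemset):
        """Equations 12-13: per-transaction product of precise values, averaged over the database."""
        total = 0.0
        for row in edb:
            prod = 1.0
            for attr, target in itemset:
                prod *= precise(row[attr], target)
            total += prod
        return total / len(edb)

    def edma(edb, minsup):
        """Level-wise search; candidates of size k are combinations of frequent single items."""
        singletons = sorted({(attr, YES) for row in edb for attr in row}, key=lambda it: it[0])
        frequent_items = [it for it in singletons if support(edb, (it,)) >= minsup]
        frequent = [(it,) for it in frequent_items]
        size = 2
        while True:
            kept = [c for c in combinations(frequent_items, size) if support(edb, c) >= minsup]
            if not kept:
                return frequent
            frequent.extend(kept)
            size += 1

Applied to a transformed database such as the one of Table 3 (as a list of transaction dictionaries), edma(edb, 0.5) returns the frequent evidential itemsets over the yi items.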