Mining Frequent Patterns from Correlated Incomplete Databases

Badran Raddaoui¹ and Ahmed Samet²
¹ LIAS - ENSMA EA 6315, University of Poitiers, France
² University of Tunis El Manar, LIPAH Laboratory, Faculty of Sciences of Tunis, Tunisia
[email protected], [email protected]

Keywords: Imperfection, Evidential Database, Correlated Incomplete Database, Frequent Itemset Mining.

Abstract: Modern real-world applications are forced to deal with inconsistent, unreliable and imprecise information. Considerable research effort has therefore been devoted to handling the intrinsic imprecision of data, and several frameworks have been introduced to deal with imperfection, such as probabilistic, fuzzy, possibilistic and evidential databases. In this paper, we present an alternative framework, called the correlated incomplete database, to deal with information suffering from imprecision. Correlated incomplete databases are then studied from a data mining point of view: since frequent itemset mining is one of the most fundamental problems in data mining, we propose an algorithm to extract frequent patterns from correlated incomplete databases. Our experiments demonstrate the effectiveness and scalability of our framework.

1 INTRODUCTION

Uncertain information is commonplace in real-world data management domains. In recent years, uncertain data management has seen a revival of interest because of a number of challenges in collecting, modeling, representing, querying, indexing and mining the data. The study of uncertainty and incompleteness in databases has long been a growing interest of the database community (Fuhr and Rölleke, 1997; Halpern, 1990; Imielinski and Jr., 1984).
Recently, this interest has been rekindled by an increasing demand for managing large amounts of heterogeneous data, often incomplete and uncertain, emerging from data cleaning, scientific data management, information extraction, sensor data management, economic decision making, moving object management, market surveillance, etc. In particular, the incorporation of imprecise information is nowadays increasingly recognized as indispensable for industrial practice. Handling databases that suffer from imperfection has been an attractive scientific discipline since the nineties (Bell et al., 1996). The nature of real-world data has led the database community to develop new frameworks that handle imperfect information. Three types of imperfection have been identified: imprecision, uncertainty and inconsistency. Both imprecision and uncertainty are widely studied in the literature (Hewawasam et al., 2007; Lee, 1992a). More precisely, in (Dalvi and Suciu, 2007) the authors focus on query evaluation in traditional probabilistic databases; ULDB (Benjelloun et al., 2006) supports uncertain data and data lineage in Trio (Widom, 2005); and, more recently, MayBMS uses the vertical World-Set representation of uncertain data (Olteanu et al., 2008). Note that the standard semantics adopted in most of this work is the possible worlds semantics (Zimányi, 1997). In addition, several uncertainty frameworks, such as probability theory, fuzzy set theory and, more recently, evidence theory, are commonly used to model imprecise data (Samet et al., 2014b). Indeed, imprecise information is modeled by sets, intervals and fuzzy values (Chen and Weng, 2008). Furthermore, the lack of information is considered a type of imprecision which refers to incompleteness. Incompleteness is a ubiquitous problem in practical data management; in real-world applications, we may encounter such data in store databases.
To illustrate, consider a buyer who purchased two products p1 and p2 from a market. We may only know that the buyer bought more of product p1 than of p2 (p1 > p2). Even the absence of information can be seen as a kind of incompleteness in incomplete databases: for example, the information on whether the buyer bought product p1 at all may not be available. The theoretical foundations for representing and querying incomplete information were laid by Imielinski and Lipski (Imielinski and Jr., 1984). To answer queries in the presence of incompleteness, Levy (Levy, 1996) suggested looking for certain answers: those that do not depend on the interpretation of unknown data, without requiring the completeness of other parts of the database. Later, in (Razniewski and Nutt, 2011), the authors developed techniques to conclude the completeness of query answers from information about the completeness of parts of an incomplete database. More recently, a new line of research has been established, which became known under the name of data mining. The problem of mining frequent itemsets is well-known and essential in data mining, knowledge discovery and data analysis. It has applications in various fields and becomes fundamental for data analysis as datasets and datastores grow very large. In this paper, we propose an alternative framework for handling incompleteness and correlation between attributes within incomplete databases. The new framework, called the correlated incomplete database, is tackled from the data mining perspective. Specifically, the presence of an element in such a database is represented relative to another element, without any prior indication of its quantity; that is, information is represented using correlations/dependencies between elements of the given database. Further, the absence of information is supported in this new database model, i.e., the value of an attribute may not be known for sure.
In addition, we provide a new database model and a mining procedure to handle such databases. To this end, the correlated incomplete database is transformed into an evidential database using Ben Yaghlane's axioms (Yaghlane et al., 2006). Next, we propose an algorithm to mine answers over the obtained evidential database. Our empirical evaluation is conducted on a synthetic correlated incomplete database obtained from student answers to a questionnaire.

The rest of this paper proceeds as follows. In Section 2, we recall basic notions of evidence theory and present the concept of evidential database. In Section 3, we introduce a new kind of incomplete database, called the correlated incomplete database, whose impreciseness is due to, inter alia, the lack of information. Section 4 presents the method to transform a correlated incomplete database into an evidential one using Ben Yaghlane's axioms. In Section 5, we propose a new itemset mining algorithm in the context of correlated incomplete databases. Section 6 deals with the implementation and empirical evaluation. Finally, we summarize the paper and sketch issues for future work.

2 FORMAL SETTING AND NOTATION

In this section, we briefly review evidence theory, also known as belief functions theory or Dempster-Shafer theory, and we extend it to introduce the basic concepts of evidential databases (Lee, 1992b).

2.1 EVIDENCE THEORY

Evidence theory (Dempster, 1967) has become more and more popular. It is a simple and flexible framework for dealing with imperfect information. It generalizes the probabilistic framework through its capacity to model total and partial ignorance, and it is a powerful tool for combining data. Different interpretations of evidence theory have been studied (Dempster, 1967; Gärdenfors, 1983; Smets and Kennes, 1994). One of the most used is the Transferable Belief Model (TBM), proposed by Smets (Smets and Kennes, 1994) to represent quantified beliefs.
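These notions, namely the mass function m and the belief and plausibility degrees derived from it (formalized below), can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper; the frame and the mass values are invented:

```python
def bel(m, a):
    """Belief of a: total mass of the non-empty focal elements included in a."""
    return sum(v for b, v in m.items() if b and b <= a)

def pl(m, a):
    """Plausibility of a: total mass of the focal elements intersecting a."""
    return sum(v for b, v in m.items() if b & a)

# A BBA on the frame Theta = {H1, H2, H3}; the masses sum to 1
theta = frozenset({"H1", "H2", "H3"})
m = {frozenset({"H1"}): 0.5,
     frozenset({"H1", "H2"}): 0.3,
     theta: 0.2}

a = frozenset({"H1", "H2"})
print(round(bel(m, a), 3))  # 0.8: masses of {H1} and {H1, H2}
print(round(pl(m, a), 3))   # 1.0: every focal element intersects {H1, H2}
```

Bel(A) ≤ Pl(A) always holds, and the gap between the two quantifies the remaining ignorance about A.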
The TBM is a non-probabilistic interpretation of evidence theory relying on two distinct levels: (i) a credal level, where beliefs are entertained and quantified by belief functions; and (ii) a pignistic level, where beliefs can be used to make decisions and are quantified by probability functions.

Evidence theory is based on several fundamental notions, such as the Basic Belief Assignment (BBA). A BBA m is a mapping from elements of the power set 2^Θ onto [0, 1]:

    m : 2^Θ → [0, 1]

where Θ is the frame of discernment, i.e., the set of possible answers for the treated problem, composed of N exhaustive and exclusive hypotheses: Θ = {H1, H2, ..., HN}. A BBA m satisfies the following constraints:

    Σ_{X⊆Θ} m(X) = 1,    m(∅) ≥ 0    (1)

Each subset X of 2^Θ fulfilling m(X) > 0 is called a focal element. Constraining m(∅) = 0 gives the normalized form of a BBA and corresponds to a closed-world assumption (Smets, 1988), while allowing m(∅) ≥ 0 corresponds to an open-world assumption (Smets and Kennes, 1994). In the spirit of the BBA, other functions from 2^Θ to [0, 1] are commonly introduced. The first one, called the belief function, is interpreted as the degree of justified support assigned to a proposition A by the available evidence. Formally, it is defined as:

    Bel(A) = Σ_{∅≠B⊆A} m(B)    (2)

On the other hand, the plausibility function, denoted Pl(·), is defined as follows:

    Pl(A) = Σ_{B∩A≠∅} m(B)    (3)

The plausibility expresses the maximum potential support that could be given to a hypothesis if further evidence becomes available.

2.2 EVIDENTIAL DATABASE

An evidential database stores data that may be perfect or imperfect; the imperfection of the data is expressed via evidence theory. Formally, an evidential database, denoted EDB, consists of n columns and d lines, where each column i (1 ≤ i ≤ n) has a domain Θi of discrete values. The cell of line j and column i contains a normalized BBA as follows:

    m_ij : 2^{Θi} → [0, 1] with m_ij(∅) = 0 and Σ_{A⊆Θi} m_ij(A) = 1    (4)

Such a representation makes the evidential database one of the largest formalisms, able to capture any other kind of database (Samet et al., 2014b).

    Transaction | Attribute A                    | Attribute B
    T1          | m11(A1) = 0.7, m11(ΘA) = 0.3   | m21(B1) = 0.4, m21(B2) = 0.2, m21(ΘB) = 0.4
    T2          | m12(A2) = 0.3, m12(ΘA) = 0.7   | m22(B1) = 1

Table 1: Evidential transaction database EDB

In an evidential database, as shown in Table 1, an evidential item corresponds to a focal element. Thus, an evidential itemset corresponds to a conjunction of focal elements having different domains. Two evidential itemsets can be related via the inclusion or the intersection operator. The inclusion operator (Samet et al., 2013) for evidential itemsets is defined as follows: let X and Y be two evidential itemsets, then

    X ⊆ Y ⟺ ∀xi ∈ X, xi ⊆ yj

where xi and yj are, respectively, the ith and jth elements of X and Y. For the same evidential itemsets X and Y, the intersection operator (Samet et al., 2013) is defined as follows:

    X ∩ Y = Z ⟺ ∀zk ∈ Z, zk ⊆ xi and zk ⊆ yj

Example 1. From Table 1, A1 is an item and {ΘA B1} is an itemset such that A1 ⊂ {ΘA B1} and A1 ∩ {ΘA B1} = A1.

3 CORRELATED INCOMPLETE DATABASE

In this section, we present a new kind of imperfect database called the correlated incomplete database: an imprecise database whose imprecision refers to the lack of information.

Definition 1. A correlated incomplete database is a triple CIDB = (O, I, R̃) where:
• O is the set of objects (e.g., transactions),
• I is the set of items,
• R̃ describes the existence relation of an item to a transaction.

Definition 2. Let CIDB = (O, I, R̃) be a correlated incomplete database. For two items p1, p2 ∈ I, we define the following operators:
• p1 ≻ p2 means that the existence of p1 depends on the existence of p2 [Dependency]
• p1 ∼ p2 means that p1 is quasi-equal to p2 w.r.t. quantity [Quasi-equality]
• − denotes the absence of information about an item [Deficiency]

Intuitively, the operator ≻ expresses that p1 and p2 are highly correlated with each other. The operator ∼ expresses the quasi-equality between two items without any information about their initial quantities. Finally, the operator − says that the value of an item is not known for sure, i.e., we know nothing about the item.

Property 1. Let CIDB = (O, I, R̃) be a correlated incomplete database and T ∈ O. Then, CIDB is symmetric if for all items p1, p2 in T, we have:

    R̃(p1, T) = p1 rel p2 ⇒ R̃(p2, T) = p1 rel p2    (5)

Example 2. Consider the correlated incomplete database depicted in Table 2. Then:
• In T1, p1 ≻ p3 means that the client has bought more p1 than p3, without any further indication about the quantities of the products.
• In T2, p2 ∼ p3 means that the client has bought p2 and p3 in similar quantities.
• R̃(p2, T1) = − signifies that we know nothing about p2 in the transaction T1.

    Transaction | p1      | p2      | p3
    T1          | p1 ≻ p3 | −       | p1 ≻ p3
    T2          | −       | p2 ∼ p3 | p2 ∼ p3

Table 2: Example of a correlated incomplete database

4 FROM CORRELATED INCOMPLETE DATABASE TO EVIDENTIAL ONE

4.1 BEN YAGHLANE AXIOMS TRANSFORMATION

The problem of eliciting qualitative expert opinions and generating basic belief assignments has been addressed by many researchers (Ennaceur et al., 2014). In this subsection, we provide an overview of the approach of Ben Yaghlane et al. (Yaghlane et al., 2006), who proposed a method for generating optimized belief functions from qualitative preferences.
The aim of this method is to convert preference relations into constraints of an optimization problem whose resolution, according to some uncertainty measure (UM), allows the generation of the least informative, or most uncertain, belief functions, defined as follows:

    a ≻ b ⇒ Bel(a) − Bel(b) ≥ ε    (6)
    a ∼ b ⇒ |Bel(a) − Bel(b)| ≤ ε    (7)

where ε is the smallest gap that the expert can discern between the degrees of belief in two propositions a and b. Note that ε is a constant specified by the expert before beginning the optimization process. Ben Yaghlane et al. developed a method that requires propositions to be represented in terms of focal elements, and they assume that Θ should always be considered a potential focal element. A mono-objective technique is then used to solve the constrained optimization problem:

    max_m UM(m)
    s.t. Bel(a) − Bel(b) ≥ ε         (for a ≻ b)
         Bel(a) − Bel(b) ≤ ε         (for a ∼ b)
         Bel(a) − Bel(b) ≥ −ε
         Σ_{a∈F(m)} m(a) = 1, m(a) ≥ 0 ∀a ⊆ Θ, m(∅) = 0    (8)

4.2 OUR APPROACH

Mining frequent itemsets directly from correlated incomplete databases (see Table 2) is a difficult task. The database contains information only about the existence of an item relative to another, rather than its frequency; moreover, the item's quantity in each record is not recorded. This prevents the straightforward use of the usual mining methods. In this subsection, we introduce a method that transforms a correlated incomplete database into a classical, treatable database: the correlated incomplete database is transformed into an evidential one for data mining purposes. The transformation is made thanks to Ben Yaghlane's axioms. Recall that the axioms of Equations 6 and 7 were first introduced to express an expert's preferences; however, they can also express numerical superiority between items. So, given a correlated incomplete database CIDB and two items p1 and p2 such that p1 ≻ p2, we interpret this proposition from an evidential point of view. Two BBAs can be constructed. The first BBA refers to the p1 column in the database and answers the question "Does the client buy the product p1?". Its frame of discernment Θ1 consists of two elements, yes and no, such that Θ1 = {y, n}. The second BBA answers the same question relative to the item p2. More generally, we have:

    p1 ≻ p2 ⇒ Bel(y_{p1}) − Bel(y_{p2}) ≥ ε

This assertion is reasonable since we have no information about the items' frequencies, but only the existence of items and the dependencies between them. The result is a pair of BBAs, as described in the following example:

    m(ya) = v          m(yb) = v − ε
    m(na) = 0          m(nb) = 0
    m(Θa) = 1 − v      m(Θb) = 1 − v + ε

where v is a real value such that 0 < v < 1. Fixing v to a low value leads to a less informative BBA. The ε value is chosen by the expert depending on the gap between the beliefs of the items. Although Equation 7 was initially introduced as a property for a constructed BBA, it can be extended to assimilate two BBAs: when a ∼ b, two identical BBAs can be constructed as follows:

    m(y) = v
    m(n) = 0    (9)
    m(Θ) = 1 − v

Finally, when the value of an item is not known for sure, a vacuous BBA is constructed:

    m(Θ) = 1    (10)

Using the previous transformation steps, the obtained evidential database has the same size as the original correlated incomplete database. The support can then be retrieved from that database with the precise function (Samet et al., 2014a). More precisely, given an item xi, the precise value is computed as follows:

    Pr(xi) = Σ_{x⊆Θi} (|xi ∩ x| / |x|) × m(x),    ∀xi ∈ 2^{Θi}    (11)

Thus, the support of an itemset X in a transaction Tj of the obtained evidential database EDB is:

    Sup^{Pr}_{Tj}(X) = Π_{Xi∈Θi, i∈[1...n]} Pr(Xi)    (12)

and the support of X in EDB is:

    Sup_{EDB}(X) = (1/d) Σ_{j=1}^{d} Sup^{Pr}_{Tj}(X)    (13)

Table 3 illustrates the transformation of the correlated incomplete database given previously in Table 2. The computed BBAs are defined over the frames of discernment of the items, with ε = 0.05 and v = 0.2. It should be noted that, even though information about the item p2 is lacking in the transaction T1 (see Table 2), p2 possesses a reasonable support; indeed, the lack of information does not signify the non-existence of the item. In our case, p2 has a support equal to:

    Sup(p2) = Sup_{EDB}(y2) = ((1/2) m21(Θ2) + m22(y2) + (1/2) m22(Θ2)) / 2 = 0.55

In the following, we detail the procedure for transforming a correlated incomplete database into an evidential one. Algorithm 1 performs the evidential transformation through Ben Yaghlane's axioms. Given a correlated incomplete database CIDB, the function generate_BBA(CIDB, ε, T) constructs a BBA for the transaction T in CIDB for a fixed ε. The computed BBA satisfies all the constraints in the columns of the considered transaction and is then inserted into the evidential database EDB. This process is repeated for all transactions of CIDB.

Algorithm 1 Correlated Incomplete Database Transformation Algorithm
Require: ε, CIDB
Ensure: EDB
 1: function GENERATE_BBA(CIDB, ε, T)
 2:   for all i in Columns(CIDB) do
 3:     if CIDB(T, i) = ≻ then
 4:       BBA ← construct_BBA(ε)    \\ Bel(a) − Bel(b) ≥ ε
 5:     end if
 6:     if CIDB(T, i) = ∼ then
 7:       BBA ← construct_BBA(ε)    \\ |Bel(a) − Bel(b)| ≤ ε
 8:     end if
 9:   end for
10:   return BBA
11: end function
12: for all T in Size(CIDB) do
13:   BBA ← generate_BBA(CIDB, ε, T)
14:   EDB(T) ← BBA
15: end for

5 FREQUENT ITEMSET MINING

In this section, correlated incomplete databases are studied from a data mining point of view. Since frequent itemset mining is one of the most fundamental problems in data mining, we present in the following an algorithm to extract frequent patterns from correlated incomplete databases.
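Before mining, each transaction has to be converted into BBAs. The transformation of Table 2 and the precise-support computation can be sketched in Python. This is our own illustrative sketch, not the authors' implementation: the helper names are invented, and v = 0.2, ε = 0.05 follow the worked example:

```python
V, EPS = 0.2, 0.05  # v and epsilon, the values used in the worked example

# BBAs over Theta_i = {y, n}; the key "T" stands for the whole frame Theta_i.
def bba_dominant():   # the stronger item in p1 > p2
    return {"y": V, "n": 0.0, "T": 1.0 - V}

def bba_dominated():  # the weaker item, so that Bel(y1) - Bel(y2) >= eps
    return {"y": V - EPS, "n": 0.0, "T": 1.0 - V + EPS}

def bba_similar():    # p1 ~ p2: two identical BBAs
    return {"y": V, "n": 0.0, "T": 1.0 - V}

def bba_vacuous():    # "-": total ignorance
    return {"y": 0.0, "n": 0.0, "T": 1.0}

def precise(bba):
    """Precise value of the singleton y: Pr(y) = m(y) + m(Theta)/2."""
    return bba["y"] + bba["T"] / 2.0

# Evidential database for Table 2 (T1: p1 > p3, p2 unknown; T2: p2 ~ p3, p1 unknown)
edb = [
    {"p1": bba_dominant(), "p2": bba_vacuous(), "p3": bba_dominated()},  # T1
    {"p1": bba_vacuous(), "p2": bba_similar(), "p3": bba_similar()},     # T2
]

def support(item):
    """Database support of a single item: average of the precise values."""
    return sum(precise(t[item]) for t in edb) / len(edb)

print(round(support("p2"), 2))  # 0.55, as in the worked example
```

Running the sketch reproduces Sup(p2) = 0.55 despite the missing value in T1: the vacuous BBA still contributes m(Θ)/2 to the precise value, since a lack of information does not mean the item is absent.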
This can be done by using the evidential database obtained by Algorithm 1. Algorithm 2 describes the process of mining frequent patterns from an evidential database. The proposed algorithm, called EDMA, is a level-wise approach that determines all frequent patterns of the evidential database EDB.

    Transaction | p1                               | p2                               | p3
    T1          | m11(y1) = 0.20, m11(n1) = 0.00,  | m21(Θ2) = 1.00                   | m31(y3) = 0.15, m31(n3) = 0.00,
                | m11(Θ1) = 0.80                   |                                  | m31(Θ3) = 0.85
    T2          | m12(Θ1) = 1.00                   | m22(y2) = 0.20, m22(n2) = 0.00,  | m32(y3) = 0.20, m32(n3) = 0.00,
                |                                  | m22(Θ2) = 0.80                   | m32(Θ3) = 0.80

Table 3: The evidential database obtained from Table 2

6 EMPIRICAL EVALUATION

In this section, we discuss experimental results for mining frequent itemsets from correlated incomplete databases. Even though this kind of incomplete database has not yet been discussed in the literature, several applications can be envisaged, since uncertain events are naturally highly correlated with each other. One may consider, for instance, a Market Basket Analysis (MBA) problem where information about the quantity of each bought item is lacking: deficiency of knowledge and quantity-wise comparisons among items are ubiquitous in this kind of data. In the following, we study a database constructed from a questionnaire given to the students of the University of Littoral Côte d'Opale. The questionnaire concerns the grades obtained last year in 12 subjects. Since most of the students had forgotten their grades, their answers should be described using a correlated incomplete database: the students mark the dependency of a grade relative to another one, or nothing in the case they do not remember. About 312 response forms were collected; 6 of them were rejected because a few students did not understand the task and their responses were omitted. A sample of the obtained database is depicted in Table 4. The correlated incomplete database constructed from the students' answers is transformed into an evidential database following the procedure described in Algorithm 1. Since every proposed data mining algorithm must stand out performance-wise and quality-wise, the EDMA algorithm associated with the transformation module must extract the maximal frequent patterns in a reasonable time; therefore, the original correlated incomplete database is extended by data duplication. We compared the data mining task using the precise-based support and the belief-based one (Hewawasam et al., 2007).

[Figure 1: Extraction time relative to the database size]
[Figure 2: Number of extracted frequent patterns relative to minsup]
[Figure 3: Number of extracted valid association rules relative to minconf]

Figure 1 represents the evolution of the extraction time as the size of the database increases. Figure 2 represents the number of frequent patterns extracted from the obtained evidential database for different values of minsup. We can see that the number of frequent patterns retrieved with the precise-based support is 22967 for a minsup fixed to 0.5. This large number of patterns may be explained by the size of the database: it has 12 attributes, each containing two elements within its frame of discernment, so the treated database is similar to a 36-column database. The itemsets composed only of the yi (i ∈ [1, 12]) items are the most interesting, since they reflect the subjects in which students got the best grades.
Thus, retrieving frequent patterns of the initial correlated incomplete database is possible by reducing the frequent evidential patterns to those containing only the yi items. In addition, compared to the belief-based support, we discover more hidden patterns with the precise-based support. Figure 3 shows the number of valid association rules retrieved from the evidential database. Each pattern of size k gives 2^k − 2 different association rules. The precise-based confidence measure provides the highest number of valid association rules compared to the belief-based one. Those rules show the correlations between grades; in fact, we may have a rule of the form "if a student got a good grade in DB, then he should have a good grade in Programming". In addition, Figure 1 highlights the scalability of the precise-based and the belief-based support. The minsup is fixed to 0.5 and the database is extended by duplicating data. The curve shows that the precise-based support is more expensive. This can be explained by two reasons. First, the precise-based support generates more frequent candidates, and the more candidates the algorithm handles, the more supports it computes. Second, the precise-based support relies on set-intersection computations and therefore consumes more time than the belief-based one.

Table 4: A sample of the correlated incomplete database built from the students' responses. Each cell relates two subjects in the format of Table 2 (e.g., Programming ≻ DB, DB ∼ Programming, OS ∼ Network), with − when the student does not remember.

7 CONCLUSION

In this paper, we have proposed a new type of imprecise database called the correlated incomplete database. More precisely, the membership of an item in a given transaction is expressed relative to another item, making use of dependency and similarity operators. In addition, the deficiency of information is supported, maintaining consistency with the definition of incompleteness in the literature. We have then shown how a given correlated incomplete database can be transformed into an evidential one using Ben Yaghlane's axioms. We have also presented an algorithm for mining frequent patterns from the evidential database using the precise support. Furthermore, the effectiveness of our approach is analyzed by experiments on a synthetic dataset, and the experiments confirm that our algorithm and strategy perform well. Our future work will investigate how to further study and implement our approach using real-world data. In particular, it will be interesting to consider openly available knowledge bases such as DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. Another interesting direction is extending our mining method to sequential patterns, max patterns, and partial periodicity.

REFERENCES

Bell, D. A., Guan, J., and Lee, S. K. (1996). Generalized union and project operations for pooling uncertain and imprecise information. Data & Knowledge Engineering, 18(2):89–117.

Benjelloun, O., Sarma, A. D., Halevy, A. Y., and Widom, J. (2006). ULDBs: Databases with uncertainty and lineage. In Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, pages 953–964.

Chen, Y. and Weng, C. (2008). Mining association rules from imprecise ordinal data. Fuzzy Sets and Systems, 159(4):460–474.

Dalvi, N. N. and Suciu, D. (2007). Efficient query evaluation on probabilistic databases. VLDB Journal, 16(4):523–544.

Dempster, A. (1967). Upper and lower probabilities induced by a multivalued mapping. Annals of Mathematical Statistics, 38.

Ennaceur, A., Elouedi, Z., and Lefevre, E. (2014). Multi-criteria decision making method with belief preference relations. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 22(4):573–590.

Fuhr, N. and Rölleke, T. (1997).
A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Transactions on Information Systems, 15(1):32–66.

Gärdenfors, P. (1983). Probabilistic reasoning and evidentiary value. In Evidentiary Value: Philosophical, Judicial, and Psychological Aspects of a Theory: Essays Dedicated to Sören Halldén on His Sixtieth Birthday. C.W.K. Gleerups.

Halpern, J. Y. (1990). An analysis of first-order logics of probability. Artificial Intelligence, 46(3):311–350.

Hewawasam, K. R., Premaratne, K., and Shyu, M.-L. (2007). Rule mining and classification in a situation assessment application: A belief-theoretic approach for handling data imperfections. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 37(6):1446–1459.

Imielinski, T. and Lipski, W., Jr. (1984). Incomplete information in relational databases. Journal of the ACM, 31(4):761–791.

Lee, S. (1992a). An extended relational database model for uncertain and imprecise information. In Proceedings of the 18th International Conference on Very Large Data Bases, Vancouver, British Columbia, Canada, pages 211–220.

Lee, S. (1992b). Imprecise and uncertain information in databases: an evidential approach. In Proceedings of the Eighth International Conference on Data Engineering, Tempe, AZ, pages 614–621.

Levy, A. Y. (1996). Obtaining complete answers from incomplete databases. In Proceedings of the 22nd International Conference on Very Large Data Bases, Mumbai (Bombay), India, pages 402–412.

Olteanu, D., Koch, C., and Antova, L. (2008). World-set decompositions: Expressiveness and efficient algorithms. Theoretical Computer Science, 403(2-3):265–284.

Razniewski, S. and Nutt, W. (2011). Completeness of queries over incomplete databases. Proceedings of the VLDB Endowment, 4(11):749–760.

Samet, A., Lefevre, E., and Ben Yahia, S. (2013). Mining frequent itemsets in evidential database. In Proceedings of the Fifth International Conference on Knowledge and Systems Engineering, Hanoi, Vietnam, pages 377–388.

Samet, A., Lefevre, E., and Ben Yahia, S. (2014a). Classification with evidential associative rules. In Proceedings of the 15th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Montpellier, France, pages 25–35.

Samet, A., Lefevre, E., and Ben Yahia, S. (2014b). Evidential database: a new generalization of databases? In Proceedings of the 3rd International Conference on Belief Functions (BELIEF 2014), Oxford, UK, pages 105–114.

Smets, P. (1988). Belief functions. In Smets, P., Mamdani, A., Dubois, D., and Prade, H., editors, Non-Standard Logics for Automated Reasoning, pages 253–286. Academic Press, London, UK.

Smets, P. and Kennes, R. (1994). The Transferable Belief Model. Artificial Intelligence, 66(2):191–234.

Widom, J. (2005). Trio: A system for integrated management of data, accuracy, and lineage. In Proceedings of the Second Biennial Conference on Innovative Data Systems Research (CIDR '05), Pacific Grove, California, pages 262–276.

Yaghlane, A. B., Denoeux, T., and Mellouli, K. (2006). Constructing belief functions from qualitative expert opinions. In Proceedings of the Second International Conference on Information and Communication Technologies (ICTTA), volume 1, pages 1363–1368.

Zimányi, E. (1997). Query evaluation in probabilistic relational databases. Theoretical Computer Science, 171(1-2):179–219.
Algorithm 2 Evidential Data Mining Apriori (EDMA) algorithm
Require: EDB, minsup, PT, Size_EDB
Ensure: EIFF
 1: function FREQUENT_ITEMSET(candidate, minsup, PT, Size_EDB)
 2:   frequent ← ∅
 3:   for all x in candidate do
 4:     if Support_estimation(PT, x, Size_EDB) ≥ minsup then
 5:       frequent ← frequent ∪ {x}
 6:     end if
 7:   end for
 8:   return frequent
 9: end function
10: EIFF ← ∅
11: size ← 1
12: candidate ← candidate_apriori_gen(EDB, size)
13: while candidate ≠ ∅ do
14:   freq ← Frequent_itemset(candidate, minsup, PT, Size_EDB)
15:   size ← size + 1
16:   EIFF ← EIFF ∪ freq
17:   candidate ← candidate_apriori_gen(EDB, size, freq)
18: end while
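To make the level-wise search concrete, here is a minimal Apriori-style sketch of EDMA in Python (our own illustrative code, not the authors' implementation), run on a toy evidential database shaped like the transformed Table 2. Itemsets are sets of "yes" items, scored with the precise-based support:

```python
from itertools import combinations

def itemset_support(edb, itemset):
    """Product of the items' precise values per transaction, averaged over the rows."""
    total = 0.0
    for t in edb:
        prod = 1.0
        for item in itemset:
            bba = t[item]                      # BBA over {y, n}; "T" is the frame
            prod *= bba["y"] + bba["T"] / 2.0  # precise value of the singleton y
        total += prod
    return total / len(edb)

def edma(edb, minsup):
    """Level-wise search: keep frequent k-itemsets, join them into (k+1)-candidates."""
    frequent = {}
    level = [frozenset([i]) for i in sorted(edb[0])]
    while level:
        kept = {c: s for c in level
                if (s := itemset_support(edb, c)) >= minsup}
        frequent.update(kept)
        level = {a | b for a, b in combinations(kept, 2)
                 if len(a | b) == len(a) + 1}
    return frequent

# Toy evidential database (the transformation of Table 2 with v = 0.2, eps = 0.05)
edb = [
    {"p1": {"y": 0.20, "T": 0.80}, "p2": {"y": 0.00, "T": 1.00},
     "p3": {"y": 0.15, "T": 0.85}},                                   # T1
    {"p1": {"y": 0.00, "T": 1.00}, "p2": {"y": 0.20, "T": 0.80},
     "p3": {"y": 0.20, "T": 0.80}},                                   # T2
]
for pattern, sup in sorted(edma(edb, 0.3).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pattern), round(sup, 4))
```

With minsup = 0.3, all singletons and pairs survive while {p1, p2, p3} is pruned; as in Algorithm 2, candidates of size k + 1 are generated only from the frequent k-itemsets.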