Predictive model based on the evidence theory for assessing Critical Micelle Concentration property

Ahmed Samet¹ (✉), Théophile Gaudin², Huiling Lu²,³, Anne Wadouachi³, Gwladys Pourceau³, Elisabeth Van Hecke², Isabelle Pezron², Karim El Kirat¹, and Tien-Tuan Dao¹

¹ Sorbonne University, Université de technologie de Compiègne, CNRS, UMR 7338 Biomechanics and Bioengineering
² Sorbonne University, Université de technologie de Compiègne, EA 4297 Transformations Intégrées de la Matière Renouvelable
³ Université de Picardie Jules Verne, CNRS, FRE 3517 Laboratoire de Glycochimie, des Antimicrobiens et des Agroressources

Abstract. In this paper, we introduce an uncertain data mining driven model for knowledge discovery in chemical databases. We aim at discovering relationships between molecule characteristics and properties using uncertain data mining tools. Specifically, we intend to predict the Critical Micelle Concentration (CMC) property from a molecule's characteristics. To do so, we develop a likelihood-based belief function modelling approach to construct an evidential database. Then, a mining process is developed to discover valid association rules. The prediction is performed using an association rule fusion technique. Experiments were conducted using a real-world chemical database. Performance analysis showed a better prediction outcome for our proposed approach in comparison with several literature-based methods.

Keywords: Evidential data mining, Chemical database, Association rule, Associative classifier.

1 Introduction

Data mining is generally regarded as a discipline within the field of Knowledge Discovery, or Knowledge Discovery in Databases (KDD). It is usually defined as the process of identifying valid, novel, potentially useful, and ultimately understandable patterns from large collections of data. Then, causal rules are derived from those patterns.
Frequent patterns and valid rules can be used to test hypotheses (verification goals) or to autonomously find entirely new patterns (discovery goals) [1]. Discovery goals can be predictive, requiring predictions to be made using the data in the database [2].

On the other hand, there has been an explosion in the availability of publicly accessible chemical information, including chemical structures of small molecules, structure-derived properties and associated biological activities in a variety of assays [3,4]. These data sources provide a significant opportunity to develop and apply computational tools to extract and understand the underlying structure-activity relationships. These techniques remain sensitive to the presence of imperfect data [5]. In recent years, uncertain data mining tools have emerged [6,7,8] that help to uncover hidden pertinent information in the presence of uncertainty and imprecision. However, to the best of our knowledge, uncertain data mining tools have not yet been used to discover pertinent knowledge or to make predictions in chemical databases.

In this work, we are interested in evidential data mining in chemical databases. Evidential data mining provides a generalizing framework for the probabilistic and binary data mining disciplines [9]. Recently, mining over evidential databases has flourished thanks to several contributions and the introduction of new support and confidence measures [10,11]. In addition, it has been applied to several fields such as healthcare [13] and cheminformatics [12]. From a methodological point of view, we intend to apply an uncertain data mining-driven approach to analyze a chemical database. The chemical database contains records of amphiphilic molecules (see footnote 4). We aim at predicting the physicochemical properties of a new molecule from its structural characteristics. The imprecision within the data is modelled using evidence theory.
Methodologically, we transform the chemical database into an evidential database with a likelihood-based approach. Once the imprecision has been modelled, valid association rules are selected and used for the prediction of the Critical Micelle Concentration (CMC) property.

This paper is organized as follows: in Section 2, the state-of-the-art works on evidential data mining are briefly recalled. In Section 3, we introduce our uncertain data mining driven approach and present a new likelihood-based model for handling imprecision. The performance of the proposed approach is studied on a real-world chemical database in Section 4. Finally, we conclude and sketch directions for future work.

Footnote 4: An amphiphilic molecule is a chemical compound possessing both hydrophilic (water-loving, polar) and lipophilic (fat-loving) properties.

2 Preliminaries

2.1 Evidential database

An evidential database stores both uncertain and imprecise data [14] by means of evidence theory. An evidential database, denoted by EDB, has n columns and d rows, where each column i (1 ≤ i ≤ n) has a domain $\Theta_i$ of discrete values. Each cell of a row j and a column i contains a normalized Basic Belief Assignment (BBA) $m_{ij} : 2^{\Theta_i} \rightarrow [0, 1]$ such that:

$$ \begin{cases} m_{ij}(\emptyset) = 0 \\ \sum_{A \subseteq \Theta_i} m_{ij}(A) = 1. \end{cases} \tag{1} $$

An item corresponds to a focal element (see footnote 5). Two different itemsets (a.k.a. patterns) can be related via either the inclusion or the intersection operator. The inclusion operator for evidential itemsets [11] is defined as follows, where X and Y are two evidential itemsets:

$$ X \subseteq Y \iff \forall x_i \in X,\ x_i \subseteq y_j \tag{2} $$

where $x_i$ and $y_j$ are respectively the i-th and the j-th elements of X and Y. For the same evidential itemsets X and Y, the intersection operator is defined as follows:

$$ X \cap Y = Z \iff \forall z_k \in Z,\ z_k \subseteq x_i \text{ and } z_k \subseteq y_j. \tag{3} $$

An evidential association rule R is a causal relationship between two itemsets that can be written in the form $R : X \rightarrow Y$ such that $X \cap Y = \emptyset$.

Example 1. We aim at developing a predictive model for a chemical database. The evidential database records information about several molecules. Table 1 shows an example of an evidential database.

Table 1: Evidential database EDB

Molecule | Head Family (HF)? | Carbon Number (Nc)? | CMC?
M1 | m11(Glucose) = 1.0 | m21(7) = 0.9, m21(7 ∪ 8) = 0.1 | m31(12) = 0.8, m31(12 ∪ 50) = 0.2
M2 | m12(Glucosamine) = 1.0 | m22(8) = 0.8, m22(8 ∪ 10) = 0.2 | m32(12) = 0.7, m32(0.2 ∪ 12) = 0.3

The first transaction means that the molecule M1 belongs to the Glucose head family, within the frame of discernment $\Theta_{HF}$ = {Glucose, Glucosamine}. The third attribute reflects the Critical Micelle Concentration (CMC), discretized in the frame of discernment $\Theta_{CMC}$ = {0.2, 12, 50}. M1 has a CMC close to 12 millimolar (mM), with some doubt as to whether it belongs instead to the class of CMC around 50 mM. In Table 1, {HF = Glucose} is an item and {HF = Glucose} × {CMC = 12 ∪ 50} is an itemset such that {HF = Glucose} ⊂ {HF = Glucose} × {CMC = 12 ∪ 50} and {HF = Glucosamine} ∩ {HF = Glucosamine} × {CMC = 0.2 ∪ 12} = {HF = Glucosamine}. {Nc = 8} → {CMC = 12} is an association rule.

In the following subsection, we recall the definitions of the belief-based and precise-based support and confidence measures that estimate the pertinence of patterns and association rules.

Footnote 5: Each element A of $2^{\Theta}$ fulfilling m(A) > 0 is called a focal element.

2.2 Support and confidence measures

As is the case for probabilistic data mining [8], the support within the evidential context is based on expectation. Two families of support measures have been proposed. The first support measure was proposed by [11] and is called the belief-based support measure. It is considered as the lower bound for the support. It is written as follows:

$$ Sup^{Bel}_{T_j}(X) = \prod_{i \in [1 \ldots n]} Sup^{Bel}_{T_j}(x_i) = \prod_{i \in [1 \ldots n]} Bel(x_i). \tag{4} $$

Thus, the belief-based support in the entire database is computed as follows:

$$ Sup^{Bel}_{EDB}(X) = \frac{1}{d} \sum_{j=1}^{d} Sup^{Bel}_{T_j}(X). \tag{5} $$

Since the belief-based support is a lower estimation of the support, the actual support of an itemset can in some cases be higher. Another measure, introduced by Samet et al. [15], provides a medium estimation. The precise measure Pr is defined by:

$$ Pr(x_i) = \sum_{x \subseteq \Theta_i} \frac{|x_i \cap x|}{|x|} \times m_{ij}(x) \quad \forall x_i \in 2^{\Theta_i}. \tag{6} $$

The evidential support of an itemset $X = \prod_{i \in [1 \ldots n]} x_i$ in the transaction $T_j$ (i.e., $Pr_{T_j}$) is then computed as follows:

$$ Pr_{T_j}(X) = \prod_{x_i \in \Theta_i,\ i \in [1 \ldots n]} Pr(x_i). \tag{7} $$

Thus, the support $Sup_{EDB}$ of the itemset X becomes:

$$ Sup_{EDB}(X) = \frac{1}{d} \sum_{j=1}^{d} Pr_{T_j}(X). \tag{8} $$

A new metric for confidence computation, based on the precise-based support measure, is introduced in [10]. For an association rule $R : R_a \rightarrow R_c$, the confidence is computed as follows:

$$ Conf(R) = \frac{\sum_{j=1}^{d} Sup_{T_j}(R_a) \times Sup_{T_j}(R_c)}{\sum_{j=1}^{d} Sup_{T_j}(R_a)}. \tag{9} $$

Example 2. We consider the same problem described in Example 1. The precise support of the itemset {HF = Glucose} × {Nc = 7} in the evidential database is equal to $\frac{1 \times 0.95 + 0}{2} = 0.475$. It is higher than the one computed with the belief-based support, which is equal to $\frac{1 \times 0.9 + 0}{2} = 0.45$. Finally, the association rule {Nc = 8} → {CMC = 12} has $\frac{0.05 \times 0.9 + 0.9 \times 0.85}{0.05 + 0.9} = 0.85$ as confidence in the evidential database.

3 Uncertain data mining approach for CMC prediction

In the following, we introduce our uncertain data mining approach for the prediction of an amphiphilic molecule's CMC. The proposed approach, shown in Figure 1, consists of three stages. Imprecision within the raw database is handled while the evidential database is constructed, using a likelihood modelling approach. Then, frequent patterns and valid association rules are retrieved with the EDMA mining algorithm [10].
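
The precise support and confidence on which EDMA relies (Section 2.2) can be illustrated with a short Python sketch that recomputes the figures of Example 2 from Table 1. The dictionary encoding of the BBAs and the function names (`bel`, `pr`, `support`, `confidence`) are illustrative assumptions, not part of EDMA itself.

```python
from math import prod

# Table 1: one dict per molecule; each attribute holds a BBA encoded as
# {focal element (frozenset): mass}. This encoding is an illustrative choice.
EDB = [
    {"HF": {frozenset({"Glucose"}): 1.0},
     "Nc": {frozenset({7}): 0.9, frozenset({7, 8}): 0.1},
     "CMC": {frozenset({12}): 0.8, frozenset({12, 50}): 0.2}},
    {"HF": {frozenset({"Glucosamine"}): 1.0},
     "Nc": {frozenset({8}): 0.8, frozenset({8, 10}): 0.2},
     "CMC": {frozenset({12}): 0.7, frozenset({0.2, 12}): 0.3}},
]

def bel(bba, item):
    """Belief: total mass of the focal elements included in `item`."""
    return sum(mass for focal, mass in bba.items() if focal <= item)

def pr(bba, item):
    """Precise measure of Equation (6)."""
    return sum(len(item & focal) / len(focal) * mass
               for focal, mass in bba.items())

def transaction_support(row, itemset, measure):
    """Equations (4) and (7): product of the item measures within one row."""
    return prod(measure(row[attr], item) for attr, item in itemset.items())

def support(edb, itemset, measure):
    """Equations (5) and (8): average of the transaction supports."""
    return sum(transaction_support(r, itemset, measure) for r in edb) / len(edb)

def confidence(edb, premise, conclusion):
    """Equation (9), using the precise measure as transaction support."""
    num = sum(transaction_support(r, premise, pr)
              * transaction_support(r, conclusion, pr) for r in edb)
    den = sum(transaction_support(r, premise, pr) for r in edb)
    return num / den

itemset = {"HF": frozenset({"Glucose"}), "Nc": frozenset({7})}
precise_sup = support(EDB, itemset, pr)
belief_sup = support(EDB, itemset, bel)
rule_conf = confidence(EDB, {"Nc": frozenset({8})}, {"CMC": frozenset({12})})
print(round(precise_sup, 3), round(belief_sup, 3), round(rule_conf, 2))
# 0.475 0.45 0.85
```

Passing the measure (`pr` or `bel`) as a parameter keeps the two support families of Section 2.2 side by side, which makes the gap between the lower (belief-based) and medium (precise) estimations easy to inspect.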
The selected valid association rules are then used to compute the CMC of an amphiphilic molecule.

[Fig. 1: Evidential data mining based model for the prediction of CMC property. Stages: pre-processing of the DB (likelihood database modelling), mining process (EDMA), associative classification (EvAC).]

3.1 Likelihood modelling for input data

Let us assume the class-conditional probability densities $f(x|\omega_i)$ to be known. Having observed x, the likelihood function is a function from $\Theta$ to $[0, +\infty)$ defined as $L(\omega_i|x) = f(x|\omega_i)$, for all $i \in \{1, \ldots, I\}$. Shafer [16] proposed to derive from L a belief function on $\Theta$ defined by its plausibility function. Starting from axiomatic requirements, Appriou [17] proposed another method based on the construction of I belief functions $m_i(\cdot)$. The idea consists in taking each class into account separately and evaluating the degree of belief given to each of them. In this case, the focal elements of each BBA $m_i$ are the singleton $\{\omega_i\}$, its complement $\overline{\omega_i}$, and $\Theta$. This model, hereafter referred to as the Separable Likelihood-based (SLB) method, has the following expression:

$$ \begin{cases} m_i(\{\omega_i\}) = 0 \\ m_i(\overline{\omega_i}) = \alpha_i \cdot \{1 - R \cdot L(\omega_i|x)\} \\ m_i(\Theta) = 1 - \alpha_i \cdot \{1 - R \cdot L(\omega_i|x)\} \end{cases} \tag{10} $$

where $\alpha_i$ is a coefficient that can be used to model external information such as reliability, and R is a normalizing constant that can take any value in the range $(0, (\max_i L(\omega_i|x))^{-1}]$. A second model satisfying the axiomatic requirements [17] can be written as follows:

$$ \begin{cases} m_i(\{\omega_i\}) = \dfrac{\alpha_i \cdot R \cdot L(\omega_i|x)}{1 + R \cdot L(\omega_i|x)} \\ m_i(\overline{\omega_i}) = \dfrac{\alpha_i}{1 + R \cdot L(\omega_i|x)} \\ m_i(\Theta) = 1 - \alpha_i \end{cases} \tag{11} $$

Both BBA models could be used to account for imprecision within the data. However, we retain the second BBA model since it is more informative.

[Fig. 2: Frequency of appearance histogram of amphiphilic molecules' CMC. x-axis: CMC (mM), with ticks at 0.002, 0.4, 4.5, 14.40 and 50; y-axis: frequency.]

Now, we intend to model each molecule x with a BBA that expresses its membership to the CMC classes.
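
As a minimal sketch of the second model (Equation (11)), the following Python function builds one BBA per class from a set of likelihood values. The function name `appriou_bbas`, the single shared reliability coefficient `alpha`, and the likelihood values themselves are hypothetical and chosen only for illustration; by default R is set to the upper bound of its admissible range.

```python
def appriou_bbas(likelihoods, alpha=0.9, R=None):
    """Build one BBA per class from Equation (11).

    likelihoods: {class label: L(omega_i | x)}, with at least two classes.
    alpha: reliability coefficient, shared by all classes here (an assumption).
    R: normalizing constant in (0, 1 / max L]; defaults to the upper bound.
    """
    theta = frozenset(likelihoods)
    peak = max(likelihoods.values())
    if peak <= 0:
        raise ValueError("at least one likelihood must be positive")
    if R is None:
        R = 1.0 / peak
    if not 0 < R <= 1.0 / peak:
        raise ValueError("R must lie in (0, 1 / max L]")
    bbas = {}
    for omega, lik in likelihoods.items():
        denom = 1.0 + R * lik
        bbas[omega] = {
            frozenset({omega}): alpha * R * lik / denom,  # m_i({omega_i})
            theta - {omega}: alpha / denom,               # m_i(complement)
            theta: 1.0 - alpha,                           # m_i(Theta)
        }
    return bbas

# Hypothetical likelihoods of one molecule for the five CMC classes (mM).
likelihoods = {0.002: 0.01, 0.4: 0.05, 4.5: 0.30, 14.40: 1.20, 50: 0.20}
bbas = appriou_bbas(likelihoods)
for omega, bba in bbas.items():
    print(omega, {tuple(sorted(k)): round(v, 3) for k, v in bba.items()})
```

With R at its upper bound, the most likely class satisfies R·L = 1, so its singleton receives a mass of alpha/2; a smaller R shifts mass from the singletons towards their complements.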
To do so, we distinguish between two types of data (i.e., properties and descriptors of a molecule) in the database: categorical and numeric data. Categorical data, such as the head family in Table 1, are represented by a certain BBA. A BBA is called certain when it has a single focal element, which is a singleton. It represents perfect knowledge and absolute certainty. Numeric data are transformed into BBAs using the likelihood model. From the frequency histogram in Figure 2, we construct probability density functions (pdfs), one on each CMC peak. Each computed pdf corresponds to a class-conditional probability density. Five class-conditional pdfs are thus constructed from Figure 2, for the following CMC classes: CMC around 0.002, 0.4, 4.5, 14.40 and 50 mM. Once the class-conditional probability functions are constructed, we compute the BBA that expresses the membership of x to each CMC class following the model detailed in Equation (10). The computed BBAs are then combined to obtain the final BBA that expresses the membership of x to all CMC classes:

$$ m = \oplus_{i \in I}\, m_i \tag{12} $$

where $\oplus$ is Dempster's rule of combination between two BBAs, which is defined as:

$$ (m_1 \oplus m_2)(A) = m_{\oplus}(A) = \frac{1}{1 - m(\emptyset)} \sum_{B \cap C = A} m_1(B) \times m_2(C) \quad \forall A \subseteq \Theta,\ A \neq \emptyset \tag{13} $$

where $m(\emptyset)$, which represents the conflict mass between $m_1$ and $m_2$, is defined as:

$$ m(\emptyset) = \sum_{B \cap C = \emptyset} m_1(B) \times m_2(C). \tag{14} $$

Once the evidential database is constructed, we mine frequent patterns and valid association rules. The methodology for mining patterns and association rules over the evidential amphiphilic molecules database is detailed in the following subsection.

3.2 Data mining predictive-based model for amphiphile molecules

In the following, we detail the evidential data mining process to predict the CMC property of a molecule based on its structural characteristics.
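
Both the database construction (Equation (12)) and the rule fusion used in this subsection rely on Dempster's rule of combination. A minimal Python sketch of Equations (13) and (14) is given here, assuming BBAs are encoded as dictionaries from `frozenset` focal elements to masses; the two input BBAs are hypothetical values over the frame {0.2, 12, 50}.

```python
from functools import reduce
from itertools import product

def dempster(m1, m2):
    """Combine two BBAs with Dempster's rule (Equations (13) and (14))."""
    combined, conflict = {}, 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        a = b & c
        if a:
            combined[a] = combined.get(a, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c  # mass sent to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the combination is undefined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

def combine_all(bbas):
    """Equation (12): fold Dempster's rule over a sequence of BBAs."""
    return reduce(dempster, bbas)

theta = frozenset({0.2, 12, 50})
m1 = {frozenset({12}): 0.6, theta: 0.4}                  # hypothetical source 1
m2 = {frozenset({50}): 0.3, frozenset({12, 50}): 0.7}    # hypothetical source 2
fused = combine_all([m1, m2])
for focal, mass in fused.items():
    print(sorted(focal), round(mass, 4))
```

In this example the conflict mass is 0.6 × 0.3 = 0.18, so every surviving product is divided by 0.82. Because the rule is commutative and associative, the order in which `combine_all` folds the BBAs does not change the result.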
Once the evidential database is constructed with the procedure described in subsection 3.1, we mine frequent patterns and valid association rules. A pattern is called frequent (resp. an association rule is called valid) if its computed support (resp. confidence) is higher than or equal to a fixed threshold minsup (resp. minconf). In our model, we use the EDMA algorithm [10] to retrieve frequent patterns and valid association rules from the evidential database. EDMA generates patterns having a precise support higher than minsup in a level-wise manner. The valid association rules are then deduced from the frequent patterns. Each pattern of size k gives $2^k - 2$ different association rules.

The retrieved association rules help to predict the CMC value of a molecule. To do so, from the set of all valid association rules $\mathcal{R}$, we retain only those that have a CMC item within the conclusion part, such that:

$$ \mathcal{R}_I = \{R : R_a \rightarrow R_c \in \mathcal{R} \mid \exists y \in \Theta_{CMC},\ y \in R_c\}. \tag{15} $$

The set $\mathcal{R}_I$ represents the set of all prediction rules. They are the input of the EvAC algorithm (see Algorithm 1). EvAC is an associative classifier algorithm that fuses interesting association rules for prediction purposes. EvAC classifies the data with fusion techniques: the FILTRATE_LARGE_PREMISE(.) function (line 1) filters the rules and retains only those with the largest premise that intersects the instance X under classification. Indeed, the rules with the largest premise, $R_{large}$, are more precise than those with shorter premises. Once found, they are considered as independent sources and are combined (lines 2 to 4). The fusion is performed on association rules modelled as BBAs, using Dempster's rule of combination (see Equation (13)). The function argmax in line 5 retains the hypothesis that maximizes the pignistic probability, which is computed as follows:

$$ BetP(H_n) = \sum_{A \subseteq \Theta} \frac{|H_n \cap A|}{|A|} \times \frac{m(A)}{1 - m(\emptyset)} \quad \forall H_n \in \Theta. \tag{16} $$

Algorithm 1 Evidential Associative Classification (EvAC) algorithm
Require: R, X, Θ_C
Ensure: Class
 1: R_large ← FILTRATE_LARGE_PREMISE(R, X, Θ_C)
 2: for all r ∈ R_large do
 3:     m ← { m({r.conclusion}) = conf(r) ; m(Θ_C) = 1 − conf(r) }
 4:     m_⊕ ← m_⊕ ⊕ m
 5: Class ← argmax_{H_k ∈ Θ_C} BetP(H_k)
 6: function FILTRATE_LARGE_PREMISE(R, X, Θ_C)
 7:     max ← 0
 8:     for all r ∈ R do
 9:         if r.conclusion ∈ Θ_C & X ∩ r.premise ≠ ∅ then
10:             if size(r.premise) > max then
11:                 R_large ← {r}
12:                 max ← size(r.premise)
13:             else
14:                 if size(r.premise) = max then
15:                     R_large ← R_large ∪ {r}
16:     return R_large

Example 3. Let us assume a new molecule Mx, depicted in Table 2, whose CMC we intend to predict. Table 3 is a numerical example of the fusion of evidential rules using EvAC. The extracted classification association rules are modelled as BBAs. The decision with the pignistic probability gives the {CMC = 12} class, as expected. The result is interpreted as follows: the molecule Mx belongs to the {CMC = 12} class and its CMC is most likely centred around 12 mM.

Table 2: The evidential transaction X under classification

Molecule | Head family? | Carbon Number? | CMC?
Mx | m11(Glucose) = 1.0 | m21(7) = 0.8, m21(7 ∪ 8) = 0.2 | ?

4 Empirical results

Knowledge extraction from a chemical database is of great interest for the identification of useful molecules for a specific purpose.
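
As a companion to Algorithm 1 and Example 3 of Section 3.2, the following Python sketch fuses the four rules of Example 3 for the molecule Mx and takes the pignistic decision of Equation (16). It is a sketch under stated assumptions rather than the authors' implementation: the dictionary encoding of rules and BBAs is illustrative, and the test X ∩ r.premise ≠ ∅ is read as "each premise item overlaps at least one focal element of the instance for the same attribute".

```python
THETA_C = frozenset({0.2, 12, 50})  # CMC frame of discernment of Example 1

def dempster(m1, m2):
    """Dempster's rule of combination, Equations (13) and (14)."""
    out, conflict = {}, 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            a = b & c
            if a:
                out[a] = out.get(a, 0.0) + mass_b * mass_c
            else:
                conflict += mass_b * mass_c
    return {a: v / (1.0 - conflict) for a, v in out.items()}

def betp(m, frame):
    """Pignistic probability (Equation (16)) of a normalized BBA."""
    return {h: sum(v / len(a) for a, v in m.items() if h in a) for h in frame}

def intersects(instance, premise):
    """Assumed reading of the intersection test in line 9 of Algorithm 1."""
    return all(any(item & focal for focal in instance[attr])
               for attr, item in premise.items())

def filtrate_large_premise(rules, instance, frame):
    """Lines 6 to 16: keep the matching rules with the largest premise."""
    large, best = [], 0
    for r in rules:
        if r["conclusion"] <= frame and intersects(instance, r["premise"]):
            size = len(r["premise"])
            if size > best:
                large, best = [r], size
            elif size == best:
                large.append(r)
    return large

def evac(rules, instance, frame):
    """Lines 1 to 5: model each rule as a BBA, fuse, decide with BetP."""
    fused = {frame: 1.0}  # vacuous BBA, neutral element of Dempster's rule
    for r in filtrate_large_premise(rules, instance, frame):
        m = {r["conclusion"]: r["conf"]}
        m[frame] = m.get(frame, 0.0) + 1.0 - r["conf"]
        m = {a: v for a, v in m.items() if v > 0}  # drop zero masses
        fused = dempster(fused, m)
    scores = betp(fused, frame)
    return max(scores, key=scores.get), scores

G = frozenset({"Glucose"})
rules = [  # R1 to R4 of Example 3
    {"premise": {"HF": G, "CN": frozenset({8})}, "conclusion": frozenset({12}), "conf": 0.9},
    {"premise": {"HF": G, "CN": frozenset({7})}, "conclusion": frozenset({12}), "conf": 0.9},
    {"premise": {"HF": G, "CN": frozenset({7, 8})}, "conclusion": frozenset({50}), "conf": 0.1},
    {"premise": {"HF": G, "CN": frozenset({7, 8})}, "conclusion": frozenset({12, 50}), "conf": 1.0},
]
mx = {"HF": {G: 1.0}, "CN": {frozenset({7}): 0.8, frozenset({7, 8}): 0.2}}
label, scores = evac(rules, mx, THETA_C)
print(label, {k: round(v, 4) for k, v in scores.items()})
```

Under these assumptions the sketch selects the class 12, with BetP({12}) close to 0.994 and BetP({50}) close to 0.006, which agrees with the 0.993 and 0.007 reported for Example 3 up to rounding.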
Table 3: Numerical example of rules' fusion

Rule | Confidence | $m^{\Theta_C}_{R_l}$
R1: {HF = Glucose}, {CN = 8} → {CMC = 12} | 0.9 | m_R1({12}) = 0.9, m_R1(Θ_C) = 0.1
R2: {HF = Glucose}, {CN = 7} → {CMC = 12} | 0.9 | m_R2({12}) = 0.9, m_R2(Θ_C) = 0.1
R3: {HF = Glucose}, {CN = 7 ∪ 8} → {CMC = 50} | 0.1 | m_R3({50}) = 0.1, m_R3(Θ_C) = 0.9
R4: {HF = Glucose}, {CN = 7 ∪ 8} → {CMC = 12 ∪ 50} | 1 | m_R4(12 ∪ 50) = 1
BetP after fusion: BetP({12}) = 0.993, BetP({50}) = 0.007

In this real case study, we aimed at predicting the relationships between structural characteristics and the physico-chemical properties of the amphiphile molecules. In particular, we focused on the prediction of the Critical Micelle Concentration (CMC) of each molecule by using its structural properties. The database was established from the domain literature using a systematic review process. Each retrieved paper was scanned and reviewed by two domain experts. Relevant information on structural characteristics and related physico-chemical properties was extracted and stored in a raw database for further processing. A transformation process, as described in subsection 3.1, was performed to establish an evidential database from the raw data. The database after transformation and processing contains 199 amphiphile molecules (i.e., rows) described by 24 attributes (structural characteristics and related physico-chemical properties) (i.e., columns). The amphiphile molecule evidential database contains over 109 items (i.e., focal elements) after transformation.

Table 4: Classification Accuracy

Method | Accuracy (%)
EvAC Like-Pr | 65.83
EvAC ECM-Pr | 63.83
EvAC Like-Bel | 49.20
EvAC ECM-Bel | 49.20
CMAR [18] | 58.29
N. Net | 38.66
KNN | 43.21
SVM | 34.84

Figure 3 shows the number of extracted frequent patterns for two measures: the belief and the precise support measures.
Those measures were evaluated for both likelihood-based and ECM-based imprecision modelling [19] for evidential database construction. The results show that the precise-based support combined with Evidential C-Means (ECM) imprecision modelling provides the highest number of frequent patterns, with a peak of 87423, compared with a peak of 50415 for the likelihood-based modelling. It is important to highlight that the belief-based support measure always provides a lower number of frequent patterns than the precise one, for both likelihood and ECM database construction. This confirms that the belief-based support is a pessimistic measure. Figure 4 highlights the number of association rules that will be used for classification. It is important to emphasize that the number of association rules depends on the number of retrieved frequent patterns. The runtime of the extraction algorithm with the precise support is slightly higher than with the belief-based one, as shown in Figure 5. The belief measure provides a better runtime thanks to the mathematical simplicity of computing the belief function.

[Fig. 3: Number of retrieved frequent patterns from the database. x-axis: minsup; y-axis: # frequent patterns (·10^4); series: ECM-Precise, Likelihood-Precise, ECM-Belief, Likelihood-Belief.]

[Fig. 4: Number of retrieved valid association rules from the database. x-axis: minsup; y-axis: # valid rules (·10^6); series: ECM-Precise, Likelihood-Precise, ECM-Belief, Likelihood-Belief.]

[Fig. 5: Runtime relatively to minsup value. x-axis: minsup; y-axis: runtime (s), logarithmic scale; series: precise-based support, belief-based support.]

Fig. 6: Recall, Precision and F-measure for EvAC on the CMC classification

Θ_CMC | Recall | Precision | F-measure
{0.004} | 0.84 | 0.72 | 0.77
{0.4} | 0.76 | 0.72 | 0.74
{4.5} | 0.69 | 0.69 | 0.69
{14.40} | 0.52 | 0.81 | 0.63
{50} | 0.63 | 0.58 | 0.60

To evaluate the accuracy of our algorithm, we perform a cross-validation classification. The accuracy of the EvAC algorithm is given in Table 4.
The classification with a likelihood imprecision modelling approach is at least as accurate as that obtained with ECM. Technically, reducing the number of association rules, when done correctly, helps to improve the results. Indeed, the classification process, as shown in Algorithm 1, relies on rule merging using Dempster's rule of combination. A large number of association rules, combined with the behaviour of Dempster's rule of combination, can mislead the fusion process. In addition, the classification accuracy depends on the quality of the discretization. The use of the likelihood approach for evidential database construction provides a better handling of the uncertainty with BBAs. By contrast, the results of k-NN, Neural Networks and SVM in Table 4 were obtained with the Weka software after a PKIDiscretization step. The comparison shows that our proposed framework achieves a higher classification accuracy (65.83% against 58.29% for CMAR, the best of the compared methods).

In Fig. 6, we scrutinize the performance of the classification process under a likelihood database construction and the precise support, with the Recall, Precision and F-measure relative to each CMC class. We report the F1 score, which is the harmonic mean of precision and recall. Specifically, the F1 score is:

$$ F_1 = 2 \times \frac{Prec \times Rec}{Prec + Rec}, \qquad Prec = \frac{tp}{tp + fp}, \qquad Rec = \frac{tp}{tp + fn} $$

with tp, fp and fn denoting true positives, false positives, and false negatives. Several recall values are low, such as that of CMC = {14.16}, compared with the other classes. These results can be explained by the proximity of the cluster centroids found by ECM. In fact, CMC = {14.16} and CMC = {16.09} could be merged into one representative class for a better detection.

Conclusion

In this paper, we introduced a new uncertain data mining driven approach for the physico-chemical property prediction of amphiphilic molecules.
A new imprecision modelling approach based on likelihood is provided. The likelihood modelling approach is used to construct the evidential database. As illustrated in the experimental section, the proposed approach achieved a promising performance on a real-world chemical database. In future work, we intend to compare our results with other uncertain data mining approaches, such as those based on probabilistic and fuzzy databases. Furthermore, the performance of the mining algorithm could be improved by adding specific heuristics to reduce the number of focal elements during the evidential database construction process.

Acknowledgement

This work was performed, in partnership with the SAS PIVERT, within the frame of the French Institute for the Energy Transition (Institut pour la Transition Energétique (ITE)) P.I.V.E.R.T. (www.institut-pivert.com), selected as an Investment for the Future ("Investissements d'Avenir"). This work was supported, as part of the Investments for the Future, by the French Government under the reference ANR-001-01.

References

1. Seeja, K., Zareapoor, M.: Fraudminer: A novel credit card fraud detection model based on frequent itemset mining. The Scientific World Journal 2014 (2014)
2. Chen, Z., Chen, G.: Building an associative classifier based on fuzzy association rules. International Journal of Computational Intelligence Systems 1(3) (2008) 262–273
3. Dehaspe, L., Toivonen, H., King, R.D.: Finding frequent substructures in chemical compounds. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), New York City, New York, USA (1998) 30–36
4. King, R.D., Srinivasan, A., Dehaspe, L.: Warmr: a data mining tool for chemical data. Journal of Computer-Aided Molecular Design 15(2) (2001) 173–181
5. Sarfraz Iqbal, M., Golsteijn, L., Öberg, T., Sahlin, U., Papa, E., Kovarich, S., Huijbregts, M.A.: Understanding quantitative structure–property relationships uncertainty in environmental fate modeling. Environmental Toxicology and Chemistry 32(5) (2013) 1069–1076
6. Weng, C.H., Chen, Y.L.: Mining fuzzy association rules from uncertain data. Knowledge and Information Systems 23(2) (2010) 129–152
7. Leung, C.S., MacKinnon, R., Tanbeer, S.: Fast algorithms for frequent itemset mining from uncertain data. In Proceedings of IEEE International Conference on Data Mining (ICDM), Shenzhen, China (Dec 2014) 893–898
8. Tong, Y., Chen, L., Cheng, Y., Yu, P.S.: Mining frequent itemsets over uncertain databases. In Proceedings of the VLDB Endowment 5(11) (2012) 1650–1661
9. Samet, A., Lefevre, E., Ben Yahia, S.: Evidential database: a new generalization of databases? In Proceedings of 3rd International Conference on Belief Functions, Belief 2014, Oxford, UK (2014) 105–114
10. Samet, A., Lefevre, E., Ben Yahia, S.: Classification with evidential associative rules. In Proceedings of 15th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Montpellier, France (2014) 25–35
11. Hewawasam, K.R., Premaratne, K., Shyu, M.L.: Rule mining and classification in a situation assessment application: A belief-theoretic approach for handling data imperfections. Trans. Sys. Man Cyber. Part B 37(6) (2007) 1446–1459
12. Samet, A., Dao, T.T.: Mining over a reliable evidential database: Application on amphiphilic chemical database. To appear in Proceedings of 14th International Conference on Machine Learning and Applications, IEEE ICMLA'15, Miami, Florida (2015)
13. Nouaouri, I., Samet, A., Allaoui, H.: Evidential data mining for length of stay (LOS) prediction problem. In Proceedings of 11th IEEE International Conference on Automation Science and Engineering, CASE 2015, Gothenburg, Sweden (2015) 1415–1420
14. Lee, S.: Imprecise and uncertain information in databases: an evidential approach. In Proceedings of Eighth International Conference on Data Engineering, Tempe, AZ (1992) 614–621
15. Samet, A., Lefevre, E., Ben Yahia, S.: Mining frequent itemsets in evidential database. In Proceedings of the Fifth International Conference on Knowledge and Systems Engineering, Hanoi, Vietnam (2013) 377–388
16. Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press (1976)
17. Appriou, A.: Multisensor signal processing in the framework of the theory of evidence. Application of Mathematical Signal Processing Techniques to Mission Systems (1999) 5–1
18. Li, W., Han, J., Pei, J.: CMAR: Accurate and efficient classification based on multiple class-association rules. In Proceedings of IEEE International Conference on Data Mining (ICDM01), San Jose, CA, IEEE Computer Society (2001) 369–376
19. Samet, A., Lefèvre, E., Ben Yahia, S.: Evidential data mining: precise support and confidence. Journal of Intelligent Information Systems (2016) 1–29