Predictive model based on the evidence theory for assessing Critical Micelle Concentration property
Ahmed Samet1 (corresponding author), Théophile Gaudin2, Huiling Lu2,3, Anne Wadouachi3, Gwladys Pourceau3, Elisabeth Van Hecke2, Isabelle Pezron2, Karim El Kirat1, and Tien-Tuan Dao1
1 Sorbonne University, Université de technologie de Compiègne, CNRS, UMR 7338 Biomechanics and Bioengineering
2 Sorbonne University, Université de technologie de Compiègne, EA 4297 Transformations Intégrées de la Matière Renouvelable
3 Université de Picardie Jules Verne, CNRS, FRE 3517 Laboratoire de Glycochimie, des Antimicrobiens et des Agroressources
Abstract. In this paper, we introduce an uncertain data mining driven model for knowledge discovery in chemical databases. We aim at discovering relationships between molecule characteristics and properties using uncertain data mining tools. Specifically, we intend to predict the Critical Micelle Concentration (CMC) property from a molecule's characteristics. To do so, we develop a likelihood-based belief function modelling approach to construct an evidential database. Then, a mining process is developed to discover valid association rules. The prediction is performed using an association rule fusion technique. Experiments were conducted on a real-world chemical database. Performance analysis showed a better prediction outcome for our proposed approach in comparison with several methods from the literature.
Keywords: Evidential data mining, Chemical database, Association
rule, Associative classifier.
1 Introduction
Data mining is generally regarded as a discipline within the field of Knowledge Discovery, or Knowledge Discovery in Databases (KDD). It is usually defined as the process of identifying valid, novel, potentially useful, and ultimately understandable patterns from large collections of data. Then, causal rules are derived from those patterns. Frequent patterns and valid rules can be used to test hypotheses (verification goals) or to autonomously find entirely new patterns (discovery goals) [1]. Discovery goals could be predictive (requiring predictions
to be made using the data in the database) [2]. On the other hand, there has been an explosion in the availability of publicly accessible chemical information, including chemical structures of small molecules, structure-derived properties and associated biological activities in a variety of assays [3,4]. These data sources provide a significant opportunity to develop and apply computational tools to extract and understand the underlying structure-activity relationships. These techniques remain sensitive to the presence of imperfect data [5]. In recent years, we have witnessed the emergence of uncertain data mining tools [6,7,8] that help uncover hidden pertinent information in the presence of uncertainty and imprecision. However, to the best of our knowledge, uncertain data mining tools have not yet been used to discover pertinent knowledge or to make predictions in chemical databases.
In this work, we are interested in evidential data mining in chemical databases. The evidential framework generalizes the probabilistic and binary data mining disciplines [9]. Recently, mining over evidential databases has benefited from several contributions, including the introduction of new support and confidence measures [10,11]. In addition, it has been applied in several fields such as healthcare [13] and cheminformatics [12].
From a methodological point of view, we intend to apply an uncertain data mining-driven approach to analyze a chemical database. The chemical database contains records of amphiphilic molecules (see footnote 4). We aim at predicting the physico-chemical properties of a new molecule from its structural characteristics. The imprecision within the data is modelled using evidence theory: we transform the chemical database into an evidential database with a likelihood-based approach. Once the imprecision has been modelled, valid association rules are selected and used for the prediction of the Critical Micelle Concentration (CMC) property.
This paper is organized as follows: in Section 2, the state-of-the-art works on evidential data mining are briefly recalled. In Section 3, we introduce our uncertain data mining driven approach, including a new likelihood-based model for handling imprecision. In Section 4, the performance of the proposed approach is studied on a real-world chemical database. Finally, we conclude and sketch potential directions for future work.
2 Preliminaries
2.1 Evidential database
An evidential database stores uncertain and imprecise data [14] via the evidence theory. An evidential database, denoted by EDB, has n columns and d rows, where each column i (1 ≤ i ≤ n) has a domain $\Theta_i$ of discrete values. Each cell of a row j and a column i contains a normalized Basic Belief Assignment (BBA) $m_{ij} : 2^{\Theta_i} \to [0, 1]$ such that:

\[
\begin{cases}
m_{ij}(\emptyset) = 0 \\
\sum_{A \subseteq \Theta_i} m_{ij}(A) = 1.
\end{cases}
\tag{1}
\]

Footnote 4: An amphiphilic molecule is a chemical compound possessing both hydrophilic (water-loving, polar) and lipophilic (fat-loving) properties.
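The normalization conditions of Equation (1) can be checked mechanically. The following is a minimal sketch, assuming a BBA is stored as a Python dictionary mapping focal elements (frozensets) to masses; the function name `is_normalized_bba` and this representation are illustrative choices, not part of the original approach. The example BBA is the CMC cell of molecule M1 in Table 1.

```python
from math import isclose

def is_normalized_bba(bba, frame):
    """Return True if `bba` satisfies Equation (1) on the given frame."""
    frame = frozenset(frame)
    # Every focal element must be a subset of the frame of discernment.
    if any(not focal <= frame for focal in bba):
        return False
    # Masses lie in [0, 1] and the empty set carries no mass.
    if any(not 0.0 <= mass <= 1.0 for mass in bba.values()):
        return False
    if bba.get(frozenset(), 0.0) != 0.0:
        return False
    # The masses of all subsets sum to one.
    return isclose(sum(bba.values()), 1.0)

theta_cmc = {0.2, 12, 50}
m31 = {frozenset({12}): 0.8, frozenset({12, 50}): 0.2}  # CMC cell of M1
print(is_normalized_bba(m31, theta_cmc))
```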
An item corresponds to a focal element (see footnote 5). Two different itemsets (a.k.a. patterns) can be related via either the inclusion or the intersection operator. The inclusion operator for evidential itemsets [11] is defined as follows, where X and Y are two evidential itemsets:

\[
X \subseteq Y \iff \forall x_i \in X,\; x_i \subseteq y_j
\tag{2}
\]

where $x_i$ and $y_j$ are respectively the i-th and the j-th elements of X and Y. For the same evidential itemsets X and Y, the intersection operator is defined as follows:

\[
X \cap Y = Z \iff \forall z_k \in Z,\; z_k \subseteq x_i \text{ and } z_k \subseteq y_j.
\tag{3}
\]

An evidential association rule R is a causal relationship between two itemsets that can be written in the form $R : X \to Y$ such that $X \cap Y = \emptyset$.
Example 1. We aim at developing a predictive model for a chemical database. The evidential database records information about several molecules. Table 1 shows an example of an evidential database.

Table 1: Evidential database EDB

| Molecule | Head Family (HF)? | Carbon Number (Nc)? | CMC? |
|---|---|---|---|
| M1 | m11(Glucose) = 1.0 | m21(7) = 0.9; m21(7 ∪ 8) = 0.1 | m31(12) = 0.8; m31(12 ∪ 50) = 0.2 |
| M2 | m12(Glucosamine) = 1.0 | m22(8) = 0.8; m22(8 ∪ 10) = 0.2 | m32(12) = 0.7; m32(0.2 ∪ 12) = 0.3 |
The first transaction means that the molecule M1 belongs to the Glucose head family, within the frame of discernment $\Theta_{HF}$ = {Glucose, Glucosamine}. The CMC attribute holds the Critical Micelle Concentration, discretized in the frame of discernment $\Theta_{CMC}$ = {0.2, 12, 50}. M1 has a CMC close to 12 millimolar (mM), with some doubt as to whether it belongs instead to the class around 50 mM. In Table 1, {HF = Glucose} is an item and {HF = Glucose} × {CMC = 12 ∪ 50} is an itemset such that {HF = Glucose} ⊂ {HF = Glucose} × {CMC = 12 ∪ 50} and {HF = Glucosamine} ∩ ({HF = Glucosamine} × {CMC = 0.2 ∪ 12}) = {HF = Glucosamine}. {Nc = 8} → {CMC = 12} is an association rule.
In the following subsection, we recall the definitions of the belief-based and precise-based support and confidence measures, which estimate the pertinence of patterns and association rules.
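The inclusion and intersection operators of Equations (2) and (3) can be illustrated on the itemsets of Example 1. This is only a sketch under one reading of those definitions: an itemset is assumed to be a dictionary from attribute name to a set of values, items are compared attribute by attribute, and the helper names `includes` and `intersect` are hypothetical.

```python
def includes(x, y):
    """X ⊆ Y: each item of X is a subset of the item of Y on the same attribute."""
    return all(attr in y and x[attr] <= y[attr] for attr in x)

def intersect(x, y):
    """X ∩ Y: attribute-wise intersection, keeping only non-empty items."""
    shared = {attr: x[attr] & y[attr] for attr in x.keys() & y.keys()}
    return {attr: item for attr, item in shared.items() if item}

glucose = {"HF": frozenset({"Glucose"})}
glucose_cmc = {"HF": frozenset({"Glucose"}), "CMC": frozenset({12, 50})}
glucosamine = {"HF": frozenset({"Glucosamine"})}
glucosamine_cmc = {"HF": frozenset({"Glucosamine"}), "CMC": frozenset({0.2, 12})}

print(includes(glucose, glucose_cmc))           # inclusion stated in Example 1
print(intersect(glucosamine, glucosamine_cmc))  # intersection stated in Example 1
```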
Footnote 5: Each subset A of Θ (i.e., each element of $2^{\Theta}$) fulfilling m(A) > 0 is called a focal element.
2.2 Support and confidence measures
As is the case for probabilistic data mining [8], the support within the evidential context is based on expectation. Two families of support measures were proposed. The first support measure was proposed by Hewawasam et al. [11] and is called the belief-based support measure. It is considered as the lower bound for the support. It is written as follows:

\[
Sup^{Bel}_{T_j}(X) = \prod_{i \in [1 \ldots n]} Sup^{Bel}_{T_j}(x_i) = \prod_{i \in [1 \ldots n]} Bel(x_i).
\tag{4}
\]

Thus, the belief-based support in the entire database is computed as follows:

\[
Sup^{Bel}_{EDB}(X) = \frac{1}{d} \sum_{j=1}^{d} Sup^{Bel}_{T_j}(X).
\tag{5}
\]
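Equations (4) and (5) can be sketched on the database of Table 1. The code below assumes the standard definition of the belief function, Bel(A) as the sum of the masses of the non-empty subsets of A, which the text does not restate; the dictionary encoding of the database and the function names are illustrative assumptions.

```python
def fs(*values):
    """Shorthand for building a focal element."""
    return frozenset(values)

# Table 1, one dictionary per transaction: attribute -> BBA.
EDB = [
    {"HF": {fs("Glucose"): 1.0},
     "Nc": {fs(7): 0.9, fs(7, 8): 0.1},
     "CMC": {fs(12): 0.8, fs(12, 50): 0.2}},
    {"HF": {fs("Glucosamine"): 1.0},
     "Nc": {fs(8): 0.8, fs(8, 10): 0.2},
     "CMC": {fs(12): 0.7, fs(0.2, 12): 0.3}},
]

def bel(bba, item):
    """Belief of `item`: total mass of its non-empty subsets (assumed definition)."""
    return sum(mass for focal, mass in bba.items() if focal and focal <= item)

def belief_support(edb, itemset):
    """Equations (4)-(5): mean over transactions of the product of beliefs."""
    total = 0.0
    for row in edb:
        support = 1.0
        for attr, item in itemset.items():
            support *= bel(row[attr], item)
        total += support
    return total / len(edb)

print(belief_support(EDB, {"HF": fs("Glucose"), "Nc": fs(7)}))
```

For the itemset {HF = Glucose} × {Nc = 7}, the sketch evaluates (1 × 0.9 + 0) / 2, i.e. 0.45 up to floating-point error.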
Since the belief-based support is a lower estimate of the support, an itemset may in some cases have a higher support value than the one this measure returns. Another measure, introduced by Samet et al. [15], provides an intermediate estimate. The precise measure Pr is defined by:

\[
Pr(x_i) = \sum_{x \subseteq \Theta_i} \frac{|x_i \cap x|}{|x|} \times m_{ij}(x) \qquad \forall x_i \in 2^{\Theta_i}.
\tag{6}
\]

The evidential support of an itemset $X = \prod_{i \in [1 \ldots n]} x_i$ in the transaction $T_j$ (i.e., $Pr_{T_j}$) is then computed as follows:

\[
Pr_{T_j}(X) = \prod_{x_i \in \Theta_i,\, i \in [1 \ldots n]} Pr(x_i).
\tag{7}
\]

Thus, the support $Sup_{EDB}$ of the itemset X becomes:

\[
Sup_{EDB}(X) = \frac{1}{d} \sum_{j=1}^{d} Pr_{T_j}(X).
\tag{8}
\]
A new metric for confidence computing based on the precise-based support measure is introduced in [10]. For an association rule $R : R_a \to R_c$, the confidence is computed as follows:

\[
Conf(R) = \frac{\sum_{j=1}^{d} Sup_{T_j}(R_a) \times Sup_{T_j}(R_c)}{\sum_{j=1}^{d} Sup_{T_j}(R_a)}.
\tag{9}
\]
Example 2. We consider the same problem described in Example 1. The precise support of the itemset {HF = Glucose} × {Nc = 7} in the evidential database is equal to (1 × 0.95 + 0) / 2 = 0.475. It is higher than the one computed with the belief-based support, which is equal to (1 × 0.9 + 0) / 2 = 0.45. Finally, the association rule {Nc = 8} → {CMC = 12} has a confidence of (0.05 × 0.9 + 0.9 × 0.85) / (0.05 + 0.9) = 0.85 in the evidential database.
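The figures of Example 2 can be reproduced with a short sketch of Equations (6) to (9). The dictionary encoding of Table 1 and the function names below are assumptions made for illustration; they are not taken from the EDMA implementation.

```python
def fs(*values):
    """Shorthand for building a focal element."""
    return frozenset(values)

# Table 1, one dictionary per transaction: attribute -> BBA.
EDB = [
    {"HF": {fs("Glucose"): 1.0},
     "Nc": {fs(7): 0.9, fs(7, 8): 0.1},
     "CMC": {fs(12): 0.8, fs(12, 50): 0.2}},
    {"HF": {fs("Glucosamine"): 1.0},
     "Nc": {fs(8): 0.8, fs(8, 10): 0.2},
     "CMC": {fs(12): 0.7, fs(0.2, 12): 0.3}},
]

def pr(bba, item):
    """Equation (6): each focal mass is weighted by its overlap with `item`."""
    return sum(len(item & focal) / len(focal) * mass for focal, mass in bba.items())

def pr_transaction(row, itemset):
    """Equation (7): product of the precise measures of the items."""
    result = 1.0
    for attr, item in itemset.items():
        result *= pr(row[attr], item)
    return result

def precise_support(edb, itemset):
    """Equation (8): mean of the transaction supports."""
    return sum(pr_transaction(row, itemset) for row in edb) / len(edb)

def confidence(edb, premise, conclusion):
    """Equation (9): precise-based confidence of premise -> conclusion."""
    num = sum(pr_transaction(r, premise) * pr_transaction(r, conclusion) for r in edb)
    den = sum(pr_transaction(r, premise) for r in edb)
    return num / den if den else 0.0

print(precise_support(EDB, {"HF": fs("Glucose"), "Nc": fs(7)}))  # about 0.475
print(confidence(EDB, {"Nc": fs(8)}, {"CMC": fs(12)}))           # about 0.853
```

The confidence evaluates to 0.81 / 0.95, which rounds to the 0.85 given in Example 2.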
3 Uncertain data mining approach for CMC prediction
In the following, we introduce our uncertain data mining approach for predicting the CMC of amphiphilic molecules. The approach, shown in Figure 1, consists of three stages. First, the imprecision within the raw database is processed: an evidential database is constructed using a likelihood modelling approach. Then, frequent patterns and valid association rules are retrieved with the EDMA mining algorithm [10]. Finally, the selected valid association rules are used to predict the CMC of an amphiphilic molecule.
[Figure 1: three-stage pipeline. Pre-processing: likelihood modelling of the database (DB); mining process: EDMA; associative classification: EvAC.]
Fig. 1: Evidential data mining based model for the prediction of CMC property
3.1 Likelihood modelling for input data
Let us assume the class-conditional probability densities $f(x|\omega_i)$ to be known. Having observed x, the likelihood function is a function from Θ to [0, +∞) defined as $L(\omega_i|x) = f(x|\omega_i)$, for all i ∈ {1, . . . , I}. Shafer [16] proposed to derive from L a belief function on Θ defined by its plausibility function. Starting from axiomatic requirements, Appriou [17] proposed another method based on the construction of I belief functions $m_i(.)$. The idea consists in taking each class into account separately and evaluating the degree of belief given to each of them. In this case, the focal elements of each BBA $m_i$ are the singleton $\{\omega_i\}$, its complement $\overline{\omega_i}$, and Θ. This model, hereafter referred to as the Separable Likelihood-based (SLB) method, has the following expression:

\[
\begin{cases}
m_i(\{\omega_i\}) = 0 \\
m_i(\overline{\omega_i}) = \alpha_i \cdot \{1 - R \cdot L(\omega_i|x)\} \\
m_i(\Theta) = 1 - \alpha_i \cdot \{1 - R \cdot L(\omega_i|x)\}
\end{cases}
\tag{10}
\]
where $\alpha_i$ is a coefficient that can be used to model external information such as reliability, and R is a normalizing constant that can take any value in the range $(0, (\max_i L(\omega_i|x))^{-1}]$. A second model satisfying the axiomatic requirements [17] can be written as follows:

\[
\begin{cases}
m_i(\{\omega_i\}) = \dfrac{\alpha_i \cdot R \cdot L(\omega_i|x)}{1 + R \cdot L(\omega_i|x)} \\
m_i(\overline{\omega_i}) = \dfrac{\alpha_i}{1 + R \cdot L(\omega_i|x)} \\
m_i(\Theta) = 1 - \alpha_i
\end{cases}
\tag{11}
\]

Both BBA models could be used to account for imprecision within the data. However, we retain the second BBA model since it is more informative.
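The two models of Equations (10) and (11) can be compared numerically. The sketch below takes the likelihood values as given inputs; the three likelihoods, the reliability 0.9 and the function names are hypothetical and serve only to illustrate the formulas. R is set to the upper end of its admissible range, the inverse of the largest likelihood.

```python
def bba_model_1(likelihood, r, alpha=1.0):
    """Equation (10): mass on the complement of the class and on the frame."""
    doubt = alpha * (1.0 - r * likelihood)
    return {"class": 0.0, "complement": doubt, "frame": 1.0 - doubt}

def bba_model_2(likelihood, r, alpha=1.0):
    """Equation (11): mass on the class, on its complement and on the frame."""
    denom = 1.0 + r * likelihood
    return {
        "class": alpha * r * likelihood / denom,
        "complement": alpha / denom,
        "frame": 1.0 - alpha,
    }

# Hypothetical class-conditional likelihoods L(omega_i | x) for three classes.
likelihoods = [0.25, 2.0, 0.5]
r = 1.0 / max(likelihoods)  # upper end of the admissible range for R
alpha = 0.9                 # hypothetical reliability coefficient

for i, lik in enumerate(likelihoods, start=1):
    print(i, bba_model_1(lik, r, alpha), bba_model_2(lik, r, alpha))
```

In this sketch, only the second model assigns mass to the singleton class itself, which is one way to read the remark that it is more informative.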
[Figure 2: histogram of frequency (axis from 0 to 15) against CMC (mM), with axis marks at 0.002, 0.4, 4.5, 14.40 and 50.]
Fig. 2: Frequency of appearance histogram of amphiphilic molecule's CMC
Now, we intend to model each molecule x with a BBA that expresses its membership to the CMC classes. To do so, we distinguish between two types of data (i.e., properties and descriptors of molecules) in the database: categorical and numerical data. Categorical data, such as the head family in Table 1, are represented by certain BBAs. A BBA is called certain when it has a single focal element that is a singleton; it represents perfect knowledge and absolute certainty. Numerical data are transformed into BBAs using the likelihood model. From the frequency histogram in Figure 2, we construct a probability density function (pdf) on each CMC peak. Each computed pdf corresponds to a class-conditional probability density. Five class-conditional pdfs are thus constructed from Figure 2, one for each of the following CMC classes: CMC around 0.002, 0.4, 4.5, 14.40 and 50 mM. Once the class-conditional pdfs are constructed, we compute the BBA that expresses the membership of x to each CMC class following the retained model, detailed in Equation (11). The computed BBAs are then combined to obtain the final BBA that expresses the membership of x to all CMC classes:

\[
m = \oplus_{i \in I}\, m_i
\tag{12}
\]
where ⊕ is Dempster's rule of combination between two BBAs, which is defined as:

\[
(m_1 \oplus m_2)(A) = m_{\oplus}(A) = \frac{1}{1 - m(\emptyset)} \sum_{B \cap C = A} m_1(B) \times m_2(C) \qquad \forall A \subseteq \Theta,\; A \neq \emptyset
\tag{13}
\]

where $m(\emptyset)$, the conflict mass between $m_1$ and $m_2$, is defined as:

\[
m(\emptyset) = \sum_{B \cap C = \emptyset} m_1(B) \times m_2(C).
\tag{14}
\]
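Equations (13) and (14) translate directly into a few lines of code. The following is a minimal sketch, assuming BBAs are dictionaries from frozenset focal elements to masses; the function name `dempster` and the two example BBAs are illustrative.

```python
from itertools import product

def dempster(m1, m2):
    """Combine two BBAs with Dempster's rule (Equations (13) and (14))."""
    combined = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c  # mass sent to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    # Normalize by 1 - m(empty set) so that the result is a normalized BBA.
    return {a: mass / (1.0 - conflict) for a, mass in combined.items()}

theta = frozenset({0.2, 12, 50})
m1 = {frozenset({12}): 0.9, theta: 0.1}
m2 = {frozenset({50}): 0.1, theta: 0.9}
print(dempster(m1, m2))
```

Here the conflict mass is 0.9 × 0.1 = 0.09, so every combined mass is divided by 0.91.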
Once the evidential database is constructed, we mine frequent patterns and valid association rules. The methodology for mining patterns and association rules over the evidential database of amphiphilic molecules is detailed in the following subsection.
3.2 Data mining predictive-based model for amphiphile molecules
In the following, we detail the evidential data mining process to predict the CMC property of a molecule based on its structural characteristics. Once the evidential database is constructed with the procedure described in subsection 3.1, we mine frequent patterns and valid association rules. A pattern is called frequent (resp. an association rule is called valid) if its computed support (resp. confidence) is higher than or equal to a fixed threshold minsup (resp. minconf). In our model, we use the EDMA algorithm [10] to retrieve frequent patterns and valid association rules from the evidential database. EDMA generates the patterns having a precise support higher than minsup in a level-wise manner. The valid association rules are then deduced from the frequent patterns. Each pattern of size k gives $2^k - 2$ different association rules. The retrieved association rules help to predict the CMC value of a molecule. To do so, from the set of all valid association rules $\mathcal{R}$, we retain only those that have a CMC item within the conclusion part, such that:

\[
\mathcal{R}_I = \{R : R_a \to R_c \in \mathcal{R} \mid \exists y \in \Theta_{CMC},\; y \in R_c\}.
\tag{15}
\]
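The rule count and the filter of Equation (15) can be illustrated on a single pattern. The sketch below is not EDMA: it only enumerates the candidate rules of one pattern, without support or confidence thresholds, and the item encoding as (attribute, value) pairs and both function names are assumptions.

```python
from itertools import combinations

def rules_from_pattern(pattern):
    """Enumerate the 2^k - 2 rules premise -> conclusion of a size-k pattern."""
    items = list(pattern)
    rules = []
    for size in range(1, len(items)):  # non-empty, proper premises only
        for premise in combinations(items, size):
            conclusion = tuple(item for item in items if item not in premise)
            rules.append((premise, conclusion))
    return rules

def prediction_rules(rules, target="CMC"):
    """Equation (15): keep the rules whose conclusion holds a CMC item."""
    return [(p, c) for p, c in rules if any(attr == target for attr, _ in c)]

pattern = [("HF", "Glucose"), ("Nc", 8), ("CMC", 12)]
rules = rules_from_pattern(pattern)
print(len(rules))                    # 2**3 - 2 = 6
print(len(prediction_rules(rules)))  # 3 rules conclude on the CMC
```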
The set $\mathcal{R}_I$ represents the set of all prediction rules. They are the input of the EvAC algorithm (see Algorithm 1). EvAC is an associative classifier that fuses interesting association rules for prediction purposes. The FILTRATE_LARGE_PREMISE(.) function (line 1) filters the rules and retains only those with the largest premise that have a non-empty intersection with the instance X under classification. Indeed, the rules with the largest premise, $\mathcal{R}_{large}$, are more precise than those with a shorter premise. Once found, they are considered as independent sources and are combined (lines 2 to 4). The fusion is performed on association rules modelled as BBAs with Dempster's rule of combination (see Equation (13)). The argmax function in line 5 retains the hypothesis that maximizes the pignistic probability, which is computed as follows:

\[
BetP(H_n) = \sum_{A \subseteq \Theta} \frac{|H_n \cap A|}{|A|} \times \frac{m(A)}{1 - m(\emptyset)} \qquad \forall H_n \in \Theta.
\tag{16}
\]

Algorithm 1 Evidential Associative Classification (EvAC) algorithm
Require: R, X, ΘC
Ensure: Class
 1: Rlarge ← FILTRATE_LARGE_PREMISE(R, X, ΘC)
 2: for all r ∈ Rlarge do
 3:     m ← { m({r.conclusion}) = conf(r) ; m(ΘC) = 1 − conf(r) }
 4:     m⊕ ← m⊕ ⊕ m
 5: Class ← argmax over Hk ∈ ΘC of BetP(Hk)
 6: function FILTRATE_LARGE_PREMISE(R, X, ΘC)
 7:     max ← 0
 8:     for all r ∈ R do
 9:         if r.conclusion ∈ ΘC and X ∩ r.premise ≠ ∅ then
10:             if size(r.premise) > max then
11:                 Rlarge ← {r}
12:                 max ← size(r.premise)
13:             else
14:                 if size(r.premise) = max then
15:                     Rlarge ← Rlarge ∪ {r}
16:     return Rlarge
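Algorithm 1 can be sketched in a few dozen lines. The code below is an illustrative reading, not the authors' implementation: rules are assumed to be (premise, conclusion, confidence) triples, the instance is reduced to the set of values each attribute may take, a rule is considered applicable when every premise attribute overlaps the instance, and the fused BBA starts from the vacuous BBA, the neutral element of Dempster's rule. The data are the rules and confidences of Table 3 with $\Theta_C$ = {0.2, 12, 50}.

```python
from itertools import product

def fs(*values):
    return frozenset(values)

def dempster(m1, m2):
    """Dempster's rule of combination, Equation (13)."""
    combined, conflict = {}, 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        if b & c:
            combined[b & c] = combined.get(b & c, 0.0) + mb * mc
        else:
            conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("total conflict")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

def betp(m, frame):
    """Pignistic probability, Equation (16); m is normalized, so m(empty) = 0."""
    return {h: sum(v / len(a) for a, v in m.items() if h in a) for h in frame}

def filtrate_large_premise(rules, x, frame):
    """Keep the applicable rules that have the largest premise."""
    best, size = [], 0
    for rule in rules:
        premise, conclusion, _ = rule
        overlaps = all(a in x and premise[a] & x[a] for a in premise)
        if not (conclusion <= frame and overlaps):
            continue
        if len(premise) > size:
            best, size = [rule], len(premise)
        elif len(premise) == size:
            best.append(rule)
    return best

def evac(rules, x, frame):
    frame = frozenset(frame)
    fused = {frame: 1.0}  # vacuous BBA
    for _, conclusion, conf in filtrate_large_premise(rules, x, frame):
        m = {conclusion: conf}
        if conf < 1.0:
            m[frame] = m.get(frame, 0.0) + 1.0 - conf
        fused = dempster(fused, m)
    bet = betp(fused, frame)
    return max(bet, key=bet.get), bet

x = {"HF": fs("Glucose"), "CN": fs(7, 8)}  # values supported by molecule Mx
rules = [
    ({"HF": fs("Glucose"), "CN": fs(8)}, fs(12), 0.9),
    ({"HF": fs("Glucose"), "CN": fs(7)}, fs(12), 0.9),
    ({"HF": fs("Glucose"), "CN": fs(7, 8)}, fs(50), 0.1),
    ({"HF": fs("Glucose"), "CN": fs(7, 8)}, fs(12, 50), 1.0),
]
predicted, bet = evac(rules, x, {0.2, 12, 50})
print(predicted, bet)
```

Under these assumptions the predicted class is 12, with a pignistic probability of about 0.994 for {12} and about 0.006 for {50}, in line with the 0.993 and 0.007 reported in Table 3 up to rounding.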
Example 3. Let us assume a new molecule Mx, depicted in Table 2, whose CMC we intend to predict. Table 3 gives a numerical example of the fusion of evidential rules using EvAC. The extracted classification association rules are modelled as BBAs. The decision with the pignistic probability gives the {CMC = 12} class, as expected. The result is interpreted as follows: the molecule Mx belongs to the {CMC = 12} class, and its CMC is most likely centred around 12 mM.

Table 2: The evidential transaction X under classification

| Molecule | Head family? | Carbon Number? | CMC? |
|---|---|---|---|
| Mx | m11(Glucose) = 1.0 | m21(7) = 0.8; m21(7 ∪ 8) = 0.2 | ? |

Table 3: Numerical example of rule's fusion

| Rule | Confidence | BBA of the rule on ΘC |
|---|---|---|
| R1 : {HF = Glucose}, {CN = 8} → {CMC = 12} | 0.9 | m({12}) = 0.9; m(ΘC) = 0.1 |
| R2 : {HF = Glucose}, {CN = 7} → {CMC = 12} | 0.9 | m({12}) = 0.9; m(ΘC) = 0.1 |
| R3 : {HF = Glucose}, {CN = 7 ∪ 8} → {CMC = 50} | 0.1 | m({50}) = 0.1; m(ΘC) = 0.9 |
| R4 : {HF = Glucose}, {CN = 7 ∪ 8} → {CMC = 12 ∪ 50} | 1 | m(12 ∪ 50) = 1 |

Pignistic probabilities after fusion: BetP({12}) = 0.993 and BetP({50}) = 0.007.

4 Empirical results
Knowledge extraction from a chemical database is of great interest for identifying molecules that are useful for a specific purpose. In this real case study, we aimed at predicting the relationships between structural characteristics and the physico-chemical properties of amphiphile molecules. In particular, we focused on the prediction of the Critical Micelle Concentration (CMC) of each molecule from its structural properties. The database was established from the domain literature using a systematic review process. Each retrieved paper was scanned and reviewed by two domain experts. Relevant information on structural characteristics and related physico-chemical properties was extracted and stored in a raw database for further processing. A transformation process, as described in subsection 3.1, was performed to establish an evidential database from the raw data. After transformation and processing, the database contains 199 amphiphile molecules (i.e., rows) described by 24 attributes (i.e., columns: structural characteristics and related physico-chemical properties). The amphiphile molecule evidential database contains over 109 items (i.e., focal elements) after transformation.
Table 4: Classification Accuracy

| Method | Accuracy (%) |
|---|---|
| EvAC Like-Pr | 65.83 |
| EvAC ECM-Pr | 63.83 |
| EvAC Like-Bel | 49.20 |
| EvAC ECM-Bel | 49.20 |
| CMAR [18] | 58.29 |
| N. Net | 38.66 |
| KNN | 43.21 |
| SVM | 34.84 |
Figure 3 shows the number of extracted frequent patterns for two measures: the belief and the precise support measures. These measures were evaluated for both likelihood-based and ECM-based imprecision modelling [19] for evidential database construction. The results show that the precise-based support associated with Evidential C-Means (ECM) imprecision modelling provides the highest number of frequent patterns, with a peak of 87423 compared with a peak of 50415 for the likelihood-based modelling. It is important to highlight that the belief-based support measure always provides a lower number of frequent patterns than the precise one, for both likelihood and ECM database construction. This confirms that the belief-based support is a pessimistic measure.
Figure 4 highlights the number of association rules that will be used for classification. It is important to emphasize that the number of association rules depends on the number of retrieved frequent patterns. The runtime of the extraction algorithm with the precise support is slightly higher than with the belief-based one, as shown in Figure 5. The belief measure provides a better runtime thanks to the mathematical simplicity of computing the belief function. To evaluate the accuracy of our algorithm, we perform a cross-validation classification. The accuracy of the EvAC algorithm is given in Table 4. The classification with the likelihood imprecision modelling approach is as efficient as the one obtained with ECM. Technically, reducing the number of association rules, when done correctly, helps to improve the results. Indeed, the classification process, as shown in Algorithm 1, relies on merging rules with Dempster's rule of combination. A large number of association rules, combined with the behaviour of Dempster's rule of combination, can mislead the fusion process into errors. In addition, the classification accuracy depends on the quality of the discretization. The use of the likelihood approach for evidential database construction provides a better handling of the uncertainty with BBAs. In contrast, the results of k-NN, Neural Networks and SVM in Table 4 were obtained with the Weka software after a PKIDiscretization step. The comparison shows that our proposed framework achieves a higher classification accuracy.

[Figure 3: number of frequent patterns (×10^4) against minsup for ECM-Precise, Likelihood-Precise, ECM-Belief and Likelihood-Belief.]
Fig. 3: Number of retrieved frequent patterns from the database.

[Figure 4: number of valid rules (×10^6) against minsup for ECM-Precise, Likelihood-Precise, ECM-Belief and Likelihood-Belief.]
Fig. 4: Number of retrieved valid association rules from the database.

[Figure 5: runtime (s) against minsup for the precise-based and belief-based supports.]
Fig. 5: Runtime relatively to minsup value

| ΘCMC | {0.004} | {0.4} | {4.5} | {14.40} | {50} |
|---|---|---|---|---|---|
| Recall | 0.84 | 0.76 | 0.69 | 0.52 | 0.63 |
| Precision | 0.72 | 0.72 | 0.69 | 0.81 | 0.58 |
| F-measure | 0.77 | 0.74 | 0.69 | 0.63 | 0.60 |

Fig. 6: Recall, Precision and F-measure for EvAC on the CMC classification
In Fig. 6, we scrutinize the performance of the classification process under a likelihood database construction and the precise support, with the Recall, Precision and F-measure relative to each CMC class. We report the F1 score, which is the harmonic mean of precision and recall. Specifically, the F1 score is:

\[
F_1 = \frac{2 \times Prec \times Rec}{Prec + Rec}, \qquad Prec = \frac{tp}{tp + fp}, \qquad Rec = \frac{tp}{tp + fn}
\]

with tp, fp, fn denoting true positives, false positives, and false negatives. Several recall values are low compared with the other classes, such as the one for CMC={14.16}. This result can be explained by the proximity of the cluster centroids found by ECM. In fact, CMC={14.16} and CMC={16.09} could be merged into one representative class for a better detection.
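As a quick consistency check of the F1 definition, the sketch below recomputes the F-measure from the recall and precision values reported in Fig. 6. The function name `f1_score` is an illustrative choice; the numbers are copied from the figure.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Per-class values reported in Fig. 6: (recall, precision, reported F-measure).
reported = {
    "0.004": (0.84, 0.72, 0.77),
    "0.4": (0.76, 0.72, 0.74),
    "4.5": (0.69, 0.69, 0.69),
    "14.40": (0.52, 0.81, 0.63),
    "50": (0.63, 0.58, 0.60),
}

for cmc_class, (rec, prec, f_reported) in reported.items():
    print(cmc_class, round(f1_score(prec, rec), 3), f_reported)
```

The recomputed values agree with the reported F-measures to within 0.01; the small gaps are expected because the published recall and precision are themselves rounded.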
Conclusion
In this paper, we introduced a new uncertain data mining driven approach for the physico-chemical property prediction of amphiphilic molecules. A new imprecision modelling approach based on likelihood is provided and used to construct an evidential database. As illustrated in the experiment section, the proposed approach provided an interesting performance on a real-world chemical database. In future work, we will compare our results with those of other uncertain data mining approaches, such as those based on probabilistic and fuzzy databases. Furthermore, the performance of the mining algorithm could be improved by adding specific heuristics to reduce the number of focal elements during the evidential database construction process.
Acknowledgement
This work was performed, in partnership with the SAS PIVERT, within the frame of the French Institute for the Energy Transition (Institut pour la Transition Energétique, ITE) P.I.V.E.R.T. (www.institut-pivert.com), selected as an Investment for the Future ("Investissements d'Avenir"). This work was supported, as part of the Investments for the Future, by the French Government under the reference ANR-001-01.
References
1. Seeja, K., Zareapoor, M.: Fraudminer: A novel credit card fraud detection model
based on frequent itemset mining. The Scientific World Journal 2014 (2014)
2. Chen, Z., Chen, G.: Building an associative classifier based on fuzzy association
rules. International Journal of Computational Intelligence Systems 1(3) (2008)
262–273
3. Dehaspe, L., Toivonen, H., King, R.D.: Finding frequent substructures in chemical
compounds. In Proceeding of the Fourth International Conference on Knowledge
Discovery and Data Mining (KDD-98), New York City, New York, USA (1998)
30–36
4. King, R.D., Srinivasan, A., Dehaspe, L.: Warmr: a data mining tool for chemical
data. Journal of Computer-Aided Molecular Design 15(2) (2001) 173–181
5. Sarfraz Iqbal, M., Golsteijn, L., Öberg, T., Sahlin, U., Papa, E., Kovarich, S., Huijbregts, M.A.: Understanding quantitative structure–property relationships uncertainty in environmental fate modeling. Environmental Toxicology and Chemistry
32(5) (2013) 1069–1076
6. Weng, C.H., Chen, Y.L.: Mining fuzzy association rules from uncertain data.
Knowledge and Information Systems 23(2) (2010) 129–152
7. Leung, C.S., MacKinnon, R., Tanbeer, S.: Fast algorithms for frequent itemset
mining from uncertain data. In Proceeding of IEEE International Conference on
Data Mining (ICDM), Shenzhen, China (Dec 2014) 893–898
8. Tong, Y., Chen, L., Cheng, Y., Yu, P.S.: Mining frequent itemsets over uncertain
databases. In Proceedings of the VLDB Endowment 5(11) (2012) 1650–1661
9. Samet, A., Lefevre, E., Ben Yahia, S.: Evidential database: a new generalization
of databases? In Proceedings of 3rd International Conference on Belief Functions,
Belief 2014, Oxford, UK (2014) 105–114
10. Samet, A., Lefevre, E., Ben Yahia, S.: Classification with evidential associative
rules. In Proceedings of 15th International Conference on Information Processing
and Management of Uncertainty in Knowledge-Based Systems, Montpellier, France
(2014) 25–35
11. Hewawasam, K.R., Premaratne, K., Shyu, M.L.: Rule mining and classification in
a situation assessment application: A belief-theoretic approach for handling data
imperfections. Trans. Sys. Man Cyber. Part B 37(6) (2007) 1446–1459
12. Samet, A., Dao, T.T.: Mining over a reliable evidential database: Application on
amphiphilic chemical database. To appear in proceeding of 14th International Conference on Machine Learning and Applications, IEEE ICMLA’15, Miami, Florida
(2015)
13. Nouaouri, I., Samet, A., Allaoui, H.: Evidential data mining for length of stay
(LOS) prediction problem. In proceeding of 11th IEEE International Conference
on Automation Science and Engineering, CASE 2015, Gothenburg, Sweden, 2015
(2015) 1415–1420
14. Lee, S.: Imprecise and uncertain information in databases: an evidential approach.
In Proceedings of Eighth International Conference on Data Engineering, Tempe,
AZ (1992) 614–621
15. Samet, A., Lefevre, E., Ben Yahia, S.: Mining frequent itemsets in evidential
database. In Proceedings of the fifth International Conference on Knowledge and
Systems Engineering, Hanoi, Vietnam (2013) 377–388
16. Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press (1976)
17. Appriou, A.: Multisensor signal processing in the framework of the theory of
evidence. Application of Mathematical Signal Processing Techniques to Mission
Systems (1999) 5–1
18. Li, W., Han, J., Pei, J.: CMAR: Accurate and efficient classification based on
multiple class-association rules. in Proceedings of IEEE International Conference
on Data Mining (ICDM01), San Jose, CA, IEEE Computer Society (2001) 369–376
19. Samet, A., Lefèvre, E., Ben Yahia, S.: Evidential data mining: precise support and
confidence. Journal of Intelligent Information Systems (2016) 1–29