Interestingness Measures
Quality in KDD
Levels of Quality
Quality of discovered knowledge: f(D, M, U)

Data Quality (D)
  Noise, accuracy, missing values, bad values, … [Berti-Equille 2004]
Model Quality (M)
  Accuracy, generalization, relevance, …
User-based Quality (U)
  Relevance for decision making

User-based Quality
2 categories:
  Objective (D, M): computed from the data only
  Subjective (U): hypotheses on the user's goals and domain knowledge; hard to formalize (novelty)
Objective Measures
Examples of Quality Criteria
7 criteria of interest (interestingness) [Hussain 2000]:
  Objective:
    Generality (ex: support)
    Validity (ex: confidence)
    Reliability (ex: high generality and validity)
  Subjective:
    Common sense: reliable but already known
    Actionability: utility for decision making
    Novelty: previously unknown
    Surprise (unexpectedness): contradiction?
Quality and Association Rules
AR Quality
Association Rules
Association rules [Agrawal et al. 1993]:
  Market-basket analysis
  Unsupervised learning
  Algorithms + 2 measures (support and confidence)
Problems:
  Enormous number of rules (rough rules)
  Little semantics attached to the support and confidence measures
  Need to help the user select the best rules

AR Quality
Association Rules
Solutions:
  Redundancy reduction
  Structuring (classes, closed rules)
  Improved quality measures
  Interactive decision aid (rule mining)
AR Quality
Association Rules
Input: data
  p Boolean attributes (V0, V1, …, Vp) (columns)
  n transactions (rows)
Output: association rules
  Implicative tendencies: X → Y
  X and Y itemsets, ex: V0 ∧ V4 ∧ V8 → V1
  Negative examples
2 measures:
  Support: supp(X → Y) = freq(X ∪ Y)
  Confidence: conf(X → Y) = P(Y|X) = freq(X ∪ Y) / freq(X)
  Algorithm properties (monotonicity)
Ex: diapers → beer (supp = 20%, conf = 90%)
(NB: maximum number of rules is 3^p)
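As a concrete illustration (my own sketch, not part of the original slides), support and confidence can be computed directly from a Boolean transaction matrix; the column indices and synthetic data below are assumptions for the example.

```python
import numpy as np

def support_confidence(data, X, Y):
    """Support and confidence of the rule X -> Y.
    data: (n, p) Boolean transaction matrix; X, Y: disjoint lists of column indices."""
    n = data.shape[0]
    covers_x = data[:, X].all(axis=1)              # transactions containing all items of X
    covers_xy = covers_x & data[:, Y].all(axis=1)  # transactions containing X and Y
    supp = covers_xy.sum() / n                     # supp(X -> Y) = freq(X u Y)
    conf = covers_xy.sum() / covers_x.sum()        # conf(X -> Y) = freq(X u Y) / freq(X)
    return supp, conf

# Example: rule V0 ^ V4 ^ V8 -> V1 on synthetic data
rng = np.random.default_rng(0)
data = rng.random((1000, 10)) < 0.5
print(support_confidence(data, X=[0, 4, 8], Y=[1]))
```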

AR Quality
Limits of Support
Support: supp(X → Y) = freq(X ∪ Y)
  Generality of the rule
  Minimum support threshold (ex: 10%)
  Reduces the complexity
  Loses nuggets (support pruning)
Nugget:
  Specific rule (low support)
  Valid rule (high confidence)
  High potential for novelty/surprise

AR Quality
Limits of Confidence
[Guillaume et al. 1998], [Lallich et al. 2004]
Confidence: conf(X → Y) = P(Y|X) = freq(X ∪ Y) / freq(X)
  Validity / logical aspect of the rule (inclusion)
  Minimum confidence threshold (ex: 90%)
  Reduces the number of extracted rules
  Interestingness ≠ validity
  No detection of independence
Independence:
  X and Y are independent: P(Y|X) = P(Y)
  If P(Y) is high => nonsense rule with high support
  Ex: diapers → beer (supp = 20%, conf = 90%) is meaningless if supp(beer) = 90%

AR Quality
Limits of the Support-Confidence Pair
In practice:
  High support threshold (10%)
  High confidence threshold (90%)
  => valid and general rules
  => common sense but no novelty
Efficient measures, but insufficient to capture quality
Subjective Measures
AR Quality : Subjective Measures
Criteria
User-oriented measures (U)
Quality = interestingness:
  Unexpectedness [Silberschatz 1996]
    Unknown or contradictory rule
  Actionability (usefulness) [Piatetsky-Shapiro 1994]
    Usefulness for decision making, gain
  Anticipation [Roddick 2001]
    Prediction on the temporal dimension

AR Quality : Subjective Measures
Criteria
Unexpectedness and actionability:
  Unexpected + useful = high interestingness
  Expected + non-useful = ?
  Expected + useful = reinforcement
  Unexpected + non-useful = ?
AR Quality : Subjective Measures
Principle
Algorithm principle:
1. Extract the decision maker's knowledge
2. Formalize the knowledge K (expected and actionable rules)
3. Run the KDD process to obtain K'
4. Compare K and K'
5. Select (with subjective measures) the rules Δ(K, K') of K' which:
   differ the most from K (unexpectedness),
   or are the most similar to K (actionability)

AR Quality : Subjective Measures
Rule Templates
[Klemettinen et al. 1994]
User knowledge (K): syntactic constraints
  Patterns/forms of rules: A1, A2, …, Ak → Ak+1
  Ai: constraint on attribute Vi (interval of values)
  K = K1 + K2
    K1: interesting patterns (to select)
    K2: uninteresting patterns (to reject)
Goal: select the interesting rules inside K'
Boolean criterion (see the sketch below):
  Rules X → Y of K' satisfying the K1 patterns but not the K2 ones, plus constraints (thresholds) on support, confidence and rule size (|X ∪ Y|)
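A minimal sketch (not from the slides) of this boolean criterion: keep the rules of K' that match at least one K1 template, match no K2 template, and satisfy the thresholds. The rule and template representations are illustrative assumptions.

```python
def matches(rule, template):
    """rule: (antecedent_items, consequent_item); template: (allowed_antecedent_items, allowed_consequents)."""
    antecedent, consequent = rule
    allowed_ant, allowed_cons = template
    return set(antecedent) <= set(allowed_ant) and consequent in allowed_cons

def select_rules(mined, K1, K2, min_supp=0.10, min_conf=0.90, max_size=4):
    """mined: list of (rule, support, confidence) triples extracted by the KDD step (K')."""
    kept = []
    for rule, supp, conf in mined:
        antecedent, consequent = rule
        size = len(antecedent) + 1                           # |X u Y|
        if (supp >= min_supp and conf >= min_conf and size <= max_size
                and any(matches(rule, t) for t in K1)        # matches an interesting pattern
                and not any(matches(rule, t) for t in K2)):  # matches no rejected pattern
            kept.append(rule)
    return kept
```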
AR Quality : Subjective Measures
Interestingness
[Silberschatz & Tuzhilin 1995]
User knowledge (K): beliefs
  A set K of beliefs (Bayesian rules)
  A belief (rule) α ∈ K weighted by a degree p(α)
  K = K1 + K2
    K1: hard beliefs (p(α) constant)
    K2: soft beliefs (p(α) can vary)
Goal: make the soft beliefs K2 vary as a function of the part of K' that satisfies K1
Interest criterion for a rule R = X → Y of K':
  Change in the weights p(α)

AR Quality : Subjective Measures
Logical Contradiction
[Padmanabhan and Tuzhilin 1998]
User knowledge (K):
  A set K of rules
Goal: select the unexpected rules in K'
Unexpectedness criterion, for A → B of K' and X → Y of K:
  A → B is unexpected if:
    B and Y are contradictory (p(B and Y) = 0)
    (A and X) is frequent (p(A and X) high)
    (A and X) → B holds, hence (A and X) → ¬Y also holds (an exception!)
AR Quality : Subjective Measures
Attribute Costs
[Freitas 1999]
User knowledge (K): costs
  Cost of each attribute/item Ai: Cost(Ai)
Goal: select the least costly rules in K'
Cost of a rule:
  Rule A1, A2, …, Ak → B
  Low mean cost:
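The formula itself is not in the transcript; a plausible reading of "low mean cost" (my reconstruction, not necessarily Freitas' exact definition) is the average cost of the antecedent attributes:

```latex
\mathrm{Cost}(A_1 \wedge \dots \wedge A_k \rightarrow B) \;=\; \frac{1}{k}\sum_{i=1}^{k}\mathrm{Cost}(A_i)
```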
AR Quality : Subjective Measures
Other Subjective Measures
  Projected Savings (the KEFIR system's interestingness) [Matheus & Piatetsky-Shapiro 1994]
  Fuzzy Matching Interestingness Measure [Liu et al. 1996]
  General Impressions [Liu et al. 1997]
  Logical Contradiction [Padmanabhan & Tuzhilin 1997]
  Misclassification Costs [Freitas 1999]
  Vague Feelings (Fuzzy General Impressions) [Liu et al. 2000]
  Anticipation [Roddick and Rice 2001]
  Interestingness [Shekar & Natarajan 2001]
AR Quality : Subjective Measures
Classification
# | Interestingness Measure | Year | Application | Foundation | Scope | Subjective Aspects | User's Knowledge Representation
1 | Matheus and Piatetsky-Shapiro's Projected Savings | 1994 | Summaries | Utilitarian | Single rule | Unexpectedness | Pattern deviation
2 | Klemettinen et al.'s Rule Templates | 1994 | Association rules | Syntactic | Single rule | Unexpectedness & actionability | Rule templates
3 | Silberschatz and Tuzhilin's Interestingness | 1995 | Format independent | Probabilistic | Rule set | Unexpectedness | Hard & soft beliefs
4 | Liu et al.'s Fuzzy Matching Interestingness Measure | 1996 | Classification rules | Syntactic distance | Single rule | Unexpectedness | Fuzzy rules
5 | Liu et al.'s General Impressions | 1997 | Classification rules | Syntactic | Single rule | Unexpectedness | GI, RPK
6 | Padmanabhan and Tuzhilin's Logical Contradiction | 1997 | Association rules | Logical, statistical | Single rule | Unexpectedness | Beliefs X → Y
7 | Freitas' Attribute Costs | 1999 | Association rules | Utilitarian | Single rule | Actionability | Cost values
8 | Freitas' Misclassification Costs | 1999 | Association rules | Utilitarian | Single rule | Actionability | Cost values
9 | Liu et al.'s Vague Feelings (Fuzzy General Impressions) | 2000 | Generalized association rules | Syntactic | Single rule | Unexpectedness | GI, RPK, PK
10 | Roddick and Rice's Anticipation | 2001 | Format independent | Probabilistic | Single rule | Temporal dimension | Probability graph
11 | Shekar and Natarajan's Interestingness | 2002 | Association rules | Distance | Single rule | Unexpectedness | Fuzzy-graph-based taxonomy
AR Quality : Subjective Measures
Conclusion



  Algorithm + measures to compare K and K'
  Focus on interesting rules
  Knowledge is domain-specific
    Acquisition of K?
    Hard task to represent the knowledge and goals of the decision maker
  Many improvements still to make

Objective Measures
Principles and Classification
AR Quality : Objective Measures
Principle
Statistics computed on the data D (transactions) for each rule R = X → Y
Interestingness measure = i(R, D, H)
Degree of satisfaction of the hypothesis H in D, independently of U
AR Quality : Objective Measures
Contingency
Rule X → Y, with X and Y disjoint itemsets
  Inclusion of E(X) in E(Y)
5 observable parameters on E:
  n = |E|: number of transactions
  nx = |E(X)|: cardinality of the premise (left-hand side)
  ny = |E(Y)|: cardinality of the conclusion (right-hand side)
  nxy = |E(X and Y)|: number of positive examples
  nx¬y = |E(X and ¬Y)|: number of negative examples (counter-examples)
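A small sketch (my own, not from the slides) computing these five parameters from a Boolean transaction matrix E; the column-index representation of X and Y is an assumption.

```python
import numpy as np

def contingency(data, X, Y):
    """Return (n, nx, ny, nxy, nx_not_y) for the rule X -> Y over a Boolean matrix."""
    covers_x = data[:, X].all(axis=1)             # E(X)
    covers_y = data[:, Y].all(axis=1)             # E(Y)
    n = data.shape[0]                             # |E|
    nx = int(covers_x.sum())                      # |E(X)|
    ny = int(covers_y.sum())                      # |E(Y)|
    nxy = int((covers_x & covers_y).sum())        # positive examples
    nx_not_y = int((covers_x & ~covers_y).sum())  # negative examples (nx = nxy + nx_not_y)
    return n, nx, ny, nxy, nx_not_y
```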
AR Quality : Objective Measures
Independence
p(X) estimated by nx/n (frequency); similarly p(Y) ≈ ny/n and p(X and Y) ≈ nxy/n
Hypothesis of independence of X and Y:
  p(X and Y) = p(X) × p(Y), i.e. nxy/n = (nx/n) × (ny/n)
Inclusion ≠ dependence
AR Quality : Objective Measures
Equiprobability (Equilibrium)
Rule X → Y
  Equilibrium: same number of negative examples (e−) and positive examples (e+), hence when nxy = nx¬y = nx/2, i.e. P(Y|X) = 0.5
2 situations:
  nxy > nx¬y (or P(Y|X) > 0.5): more e+: rule X → Y
  nxy < nx¬y (or P(Y|X) < 0.5): more e−: rule X → ¬Y
Contra-positive ¬X → ¬Y
AR Quality : Objective Measures
Interestingness Measure Definition
i(X → Y) = f(n, nx, ny, nxy)
General principles:
  Semantics and readability for the user
  Value increasing with quality
  Sensitivity to equiprobability (inclusion)
  Statistical likelihood (confidence in the measure itself)
  Noise resistance, stability over time
  Surprisingness, nuggets?
AR Quality : Objective Measures
Properties in the Literature
Properties of i(X → Y) = f(n, nx, ny, nxy)
  [Piatetsky-Shapiro 1991] (strong rules):
    (P1) = 0 if X and Y are independent
    (P2) increases with the number of examples nxy
    (P3) decreases with the premise nx (or the conclusion ny) (?)
  [Major & Mangano 1993]:
    (P4) increases with nxy when the confidence (nxy/nx) is constant
  [Freitas 1999]:
    (P5) asymmetry (i(X → Y) ≠ i(Y → X))
    Small disjuncts (nuggets)
  [Tan et al. 2002], [Hilderman & Hamilton 2001] and [Gras et al. 2004]
AR Quality : Objective Measures
Selected Properties

  Inclusion and equiprobability
  Independence
  Noise resistance: security intervals for independence and equiprobability
  Sensitivity
  Comparability, global threshold, inclusion
  Non-linearity
  0 value, security interval
  Bounded maximum value
  0 value, security interval
  n (nuggets), dilation (likelihood)
  Frequency p(X) → cardinality nx
  Reinforcement by similar rules (contra-positive, negative rule, …)
[Smyth & Goodman 1991] [Kodratoff 2001] [Gras et al. 2001] [Gras et al. 2004]
AR Quality : Objective Measures
What Could Be a Good Measure?

  Negative examples nx¬y + independence + equiprobability => constraints on the other dimensions => Imax
AR Quality : Objective Measures
Consequences On Other Dimensions
  Conclusion ny
    Decrease with ny (ny → n: Ind ↓)
  Size of the data n
    Increase with dilation (Ind ↑)
    Increase with n (Ind ↑)
AR Quality : Objective Measures
List
AR Quality : Objective Measures
Classification
Classification along three criteria:
  Object of the index
    Concept measured by the index
  Range of the index
    Entity concerned by the measurement
  Nature of the index
    Statistical or descriptive character of the index
AR Quality : Objective Measures
Classification
The Object:
  Certain indices take a fixed value at independence: P(a ∩ b) = P(a) × P(b)
    They evaluate a deviation from independence
  Certain indices take a fixed value at equilibrium: P(a ∩ b) = P(a)/2
    They evaluate a deviation from equilibrium
  Others take no fixed value at independence or at equilibrium
    Statistical indices
AR Quality : Objective Measures
Classification
The Range:
  Certain indices evaluate more than a single rule:
    They relate simultaneously to a rule and its contra-positive: I(a → b) = I(¬b → ¬a)
      Indices of quasi-implication
    They relate simultaneously to a rule and its converse: I(a → b) = I(b → a)
      Indices of quasi-conjunction
    They relate simultaneously to all three: I(a → b) = I(b → a) = I(¬b → ¬a)
      Indices of quasi-equivalence
AR Quality : Objective Measures
Classification
The Nature:
  If it varies: statistical index
  If not: descriptive index
AR Quality : Objective Measures
Classification
AR Quality : Objective Measures
List Of Quality Measures

Monodimensional (e+, e−)
  Support [Agrawal et al. 1996]
  Ralambondrainy [Ralambondrainy, 1991]
Bidimensional – Inclusion
  Descriptive-Confirm [Kodratoff, 1999]
  Sebag and Schoenauer [Sebag & Schoenauer, 1991]
  Example and counter-example rate (*)
Bidimensional – Inclusion – Conditional probability
  Confidence [Agrawal et al. 1996]
  Wang index [Wang et al., 1988]
  Laplace (*)
Bidimensional – Analogous rules
  Descriptive Confirmed-Confidence [Kodratoff, 1999] (*)
AR Quality : Objective Measures
List Of Quality Measures

Tridimensional – Analogous rules
  Causal Support [Kodratoff, 1999]
  Causal Confidence [Kodratoff, 1999] (*)
  Causal Confirmed-Confidence [Kodratoff, 1999]
  Least contradiction [Azé & Kodratoff 2004] (*)
Tridimensional – Linear – Independence
  Pavillon index [Pavillon, 1991]
  Rule Interest [Piatetsky-Shapiro, 1991] (*)
  Pearl index [Pearl, 1988], [Acid et al., 1991], [Gammerman & Luo, 1991]
  Correlation [Pearson 1896] (*)
  Loevinger index [Loevinger, 1947] (*)
  Certainty factor [Tan & Kumar 2000]
  Rate of connection [Bernard & Charron 1996]
  Interest factor [Brin et al., 1997]
  Top spin (*)
  Cosine [Tan & Kumar 2000] (*)
  Kappa [Tan & Kumar 2000]
AR Quality : Objective Measures
List Of Quality Measures

Tridimensional – Non-linear – Independence
  Chi-squared distance
  Logarithmic lift [Church & Hanks, 1990] (*)
  Predictive association [Tan & Kumar 2000] (Goodman & Kruskal)
  Conviction [Brin et al., 1997b]
  Odds ratio [Tan & Kumar 2000]
  Yule's Q [Tan & Kumar 2000]
  Yule's Y [Tan & Kumar 2000]
  Jaccard [Tan & Kumar 2000]
  Klosgen [Tan & Kumar 2000]
  Interestingness [Gray & Orlowska, 1998]
  Mutual information ratio (uncertainty) [Tan et al., 2002]
  J-measure [Smyth & Goodman 1991] [Goodman & Kruskal 1959] (*)
  Gini [Tan et al., 2002]
  General measure of rule interestingness [Jaroszewicz & Simovici, 2001] (*)
AR Quality : Objective Measures
List Of Quality Measures

Quadridimensional – Linear – Independence
  Lerman similarity index [Lerman, 1981]
  Implication index [Gras, 1996]
Quadridimensional – Likelihood of dependence
  Probability of error of the Chi2 (*)
  Intensity of implication [Gras, 1996] (*)
Quadridimensional – Inclusion – Dependence – Analogous rules
  Entropic intensity of implication [Gras, 1996] (*)
  TIC [Blanchard et al., 2004] (*)
Others
  Surprisingness (*) [Freitas, 1998]
  + exception rules [Duval et al. 2004]
  + rule distance, similarity [Dong & Li 1998]
AR Quality : Objective Measures
Objective Measures
Simulations and Properties
Quality of Rules : Objective Measures
Monodimensional Measures (e+, e−)
Support [Agrawal et al. 1996]
  Definition:
  Semantics: degree of general information
  Sensitivity: 1 parameter
  Measures frequency
  Linear
  Insensitive to independence
  Disequilibrium?
  Symmetrical
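In the notation of the contingency slide, the standard definition is:

```latex
\mathrm{supp}(X \rightarrow Y) \;=\; \frac{n_{XY}}{n} \;=\; P(X \wedge Y)
```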






Quality of Rules : Objective Measures
Monodimensional Measures (e+, e−)
Ralambondrainy Measure [Ralambondrainy 1991]
  Definition:
  Semantics: scarcity of the e−
  Sensitivity: 1 parameter
  Measures frequency
  Linear
  Insensitive to independence
  Disequilibrium?
  Increasing






Quality of Rules : Objective Measures
Bidimensional Measures – Inclusion
Descriptive-Confirm [Kodratoff 1999]
  Definition:
  Semantics: variation e+ − e− (improved support)
  Sensitivity: 2 parameters
  Measures frequency
  Linear
  Insensitive to independence
  0 at disequilibrium
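A common form of this measure, consistent with the "variation e+ − e−" reading (stated here as a reconstruction), is:

```latex
\mathrm{DescConfirm}(X \rightarrow Y) \;=\; P(X \wedge Y) - P(X \wedge \neg Y) \;=\; \frac{n_{XY} - n_{X\neg Y}}{n}
```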





Quality of Rules : Objective Measures
Bidimensional Measures – Inclusion
Sebag and Schoenauer [Sebag & Schoenauer, 1991]
  Definition:
  Semantics: ratio e+/e−
  Sensitivity: 2 parameters
  Measures frequency
  Non-linear (very selective)
  Insensitive to independence
  1 at disequilibrium
  Maximum value not bounded
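The standard definition is the ratio of examples to counter-examples:

```latex
\mathrm{Sebag}(X \rightarrow Y) \;=\; \frac{n_{XY}}{n_{X\neg Y}} \;=\; \frac{P(X \wedge Y)}{P(X \wedge \neg Y)}
```

It equals 1 when e+ = e− and is unbounded, matching the properties listed above.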






Quality of Rules : Objective Measures
Bidimensional Measures – Inclusion
Example and Counter-Example Rate (*)
  Definition:
  Semantics: ratio e+/e−
  Sensitivity: 2 parameters
  Measures frequency
  Non-linear (tolerance)
  Insensitive to independence
  0 at disequilibrium
  Maximum value bounded
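The standard definition is:

```latex
\mathrm{ECR}(X \rightarrow Y) \;=\; 1 - \frac{n_{X\neg Y}}{n_{XY}} \;=\; \frac{n_{XY} - n_{X\neg Y}}{n_{XY}}
```

It is 0 when e+ = e− and is bounded above by 1, matching the properties listed above.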






Quality of Rules : Objective Measures
Bidimensional Measures – Inclusion
Confidence [Agrawal et al. 1996]
  Definition:
  Semantics: inclusion, validity
  Sensitivity: 2 parameters
  Measures frequency
  Linear
  Insensitive to independence
  0.5 at disequilibrium
  Maximum value bounded
  Variations:
    [Ganascia, 1991]: Charade
    Or Descriptive Confirmed-Confidence [Kodratoff, 1999]

Quality of Rules : Objective Measures
Bidimensional Measures – Inclusion
Wang [Wang et al. 1988]
  Definition:
  Semantics: improved support (confidence threshold integrated)
  Sensitivity: 2 parameters
  Measures frequency
  Linear
  Insensitive to independence
  Disequilibrium?





Quality of Rules : Objective Measures
Bidimensional Measures – Inclusion
Laplace [Clark & Robin 1991], [Tan & Kumar 2000]
  Definition:
  Semantics: estimates the confidence (decreases with lowering support)
  Sensitivity: 2 parameters
  Does not measure frequency when the counts are small
  Linear
  Insensitive to independence
  Maximum value bounded
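The standard Laplace estimate of the confidence is:

```latex
\mathrm{Laplace}(X \rightarrow Y) \;=\; \frac{n_{XY} + 1}{n_{X} + 2}
```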





Quality of Rules : Objective Measures
Bidimensional Measures – Similar Rules
Descriptive Confirmed-Confidence [Kodratoff 1999]
  Definition:
  Semantics: confidence confirmed by its negative rule (X → ¬Y)
  Sensitivity: 2 parameters
  Measures frequency
  Linear
  Insensitive to independence
  0 at disequilibrium
  Maximum value bounded
  Reinforcement by the negative rule
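The usual form (stated as a reconstruction) combines the confidence of the rule with that of its negative rule:

```latex
\mathrm{ConfConf}(X \rightarrow Y) \;=\; P(Y \mid X) - P(\neg Y \mid X) \;=\; 2\,\mathrm{conf}(X \rightarrow Y) - 1
```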







Quality of Rules : Objective Measures
Tridimensional Measures – Similar Rules
Causal Support [Kodratoff 1999]
  Definition:
  Semantics: support improved by the use of the contra-positive
  Sensitivity: 3 parameters
  Measures frequency
  Linear
  Insensitive to independence
  Disequilibrium?
  Reinforcement by the contra-positive rule

Quality of Rules : Objective Measures
Tridimensional Measures – Similar Rules
Causal Confidence [Kodratoff 1999]
  Definition:
  Semantics: confidence reinforced by the contra-positive
  Sensitivity: 3 parameters
  Measures frequency
  Linear
  Insensitive to independence
  Disequilibrium?
  Maximum value bounded
  Reinforcement by the contra-positive rule
  Evolution: Causal Confirmed-Confidence: contra-positive + negative rule

Quality of Rules : Objective Measures
Tridimensional Measures – Similar Rules
Least Contradiction [Azé & Kodratoff 2004]
  Definition:
  Semantics: least contradiction
  Sensitivity: 3 parameters
  Measures frequency
  Linear
  0 at disequilibrium
  Supports inclusive measurement
  Reinforcement by the negative rule
  Coupled with an algorithm







Quality of Rules : Objective Measures
Tridimensional Measures – Independence
Centered Confidence (Pavillon Index) [Pavillon 1991]
  Definition:
  Semantics: deviation from independence, correction for the size of the conclusion
  Sensitivity: 3 parameters
  Measures frequency
  Linear
  0 at independence
  Disequilibrium?
  Called Added Value in [Tan et al. 2002]
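The standard definition (the "added value") is:

```latex
\mathrm{CenteredConf}(X \rightarrow Y) \;=\; P(Y \mid X) - P(Y)
```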
Quality of Rules : Objective Measures
Tridimensional Measures – Independence
Rule Interest [Piatetsky-Shapiro 1991]
  Definition:
  Semantics: deviation from independence (strong rules)
  Sensitivity: 3 parameters
  Measures frequency
  Linear
  0 at independence
  Disequilibrium?
  Alternative symmetric measure:
    Pearl [Pearl, 1988], [Acid et al., 1991], [Gammerman & Luo, 1991]
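Piatetsky-Shapiro's rule interest is standardly written:

```latex
\mathrm{RI}(X \rightarrow Y) \;=\; n_{XY} - \frac{n_X\, n_Y}{n} \;=\; n\left(P(X \wedge Y) - P(X)\,P(Y)\right)
```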
Quality of Rules : Objective Measures
Tridimensional Measures – Independence
Coefficient of Correlation [Pearson 1896]
  Definition:
  Semantics: correlation
  Sensitivity: 3 parameters
  Measures frequency
  Linear
  0 at independence
  Disequilibrium?

Quality of Rules : Objective Measures
Tridimensional Measures – Independence
Loevinger (*) [Loevinger 1947]
Certainty Factor [Tan & Kumar 2000]
  Definition:
  Semantics: implicative dependence
  Sensitivity: 3 parameters
  Measures frequency
  Linear
  0 at independence
  Maximum value bounded (inclusion)
  Disequilibrium?
  Equivalent measure: Certainty Factor [Tan & Kumar 2000]
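Loevinger's index is standardly written:

```latex
\mathrm{Loevinger}(X \rightarrow Y) \;=\; \frac{P(Y \mid X) - P(Y)}{1 - P(Y)} \;=\; 1 - \frac{P(X \wedge \neg Y)}{P(X)\,P(\neg Y)}
```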







Quality of Rules : Objective Measures
Tridimensional Measures – Independence
Rate of Connection [Bernard & Charron 1996]
  Definition:
  Semantics: dependence
  Sensitivity: 3 parameters
  Measures frequency
  Linear
  0 at independence
  Inclusion?
  Disequilibrium?
  Variations:
    Measure of interest (interest factor) [Brin et al., 1997], equivalent to Lift
    Alternative: Logarithmic measure of lift [Church & Hanks, 1990]

Quality of Rules : Objective Measures
Tridimensional Measures – Independence
Measure of Interest (Interest Factor) [Brin et al. 1997], Lift (*)
Logarithmic Measure of Lift (*) [Church & Hanks 1990]
Cosine (*) [Tan & Kumar 2000]
  Definitions:
    Measure of Interest (Interest Factor) = Lift
    Logarithmic Measure of Lift
    Cosine
  Semantics: dependence
  Sensitivity: 3 parameters
  Measures frequency
  Linear
  Inclusion?
  Disequilibrium?
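The standard definitions of these three measures are:

```latex
\mathrm{lift}(X \rightarrow Y) = \frac{P(X \wedge Y)}{P(X)\,P(Y)}, \qquad
\mathrm{loglift}(X \rightarrow Y) = \log_2 \frac{P(X \wedge Y)}{P(X)\,P(Y)}, \qquad
\mathrm{cosine}(X, Y) = \frac{P(X \wedge Y)}{\sqrt{P(X)\,P(Y)}}
```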
Quality of Rules : Objective Measures
Tridimensional Measures – Independence
Kappa [Tan & Kumar 2000]
  Definition:
  Semantics:
  Sensitivity: 3 parameters
  Measures frequency
  Linear
  0 at independence
  Disequilibrium?
  Maximum value
  Strengthened by the contra-positive

Quality of Rules : Objective Measures
Tridimensional Measures – Independence
Predictive Association (*) [Tan & Kumar 2000] (Goodman & Kruskal)
  Definition:
  Semantics: X is a good predictor of Y
  Sensitivity: 3 parameters
  Measures frequency
  Piecewise linear
  0 at independence?
  Maximum value?
  Disequilibrium?
Quality of Rules : Objective Measures
Tridimensional Measures – Independence
Conviction [Brin et al. 1997b]
  Definition:
  Semantics: conviction
  Sensitivity: 3 parameters
  Measures frequency
  Non-linear (very selective)
  1 at independence
  Maximum value not bounded
  Disequilibrium?
  (shape similar to Sebag and Schoenauer [Sebag & Schoenauer 1991], except for independence)
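Conviction is standardly defined as:

```latex
\mathrm{conviction}(X \rightarrow Y) \;=\; \frac{P(X)\,P(\neg Y)}{P(X \wedge \neg Y)}
```

It equals 1 at independence and is unbounded, matching the properties listed above.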
Quality of Rules : Objective Measures
Tridimensional Measures – Independence
Odds Ratio, Yule's Q, Yule's Y [Tan & Kumar 2000]
  Definitions: (close to conviction)
    Odds Ratio
    Yule's Q
    Yule's Y
  Semantics: correlation
  Sensitivity: 3 parameters
  Measures frequency
  Non-linear (resistance to noise?)
  1 or 0 at independence
  Bounded maximum value (1 or not)
  Disequilibrium?
  Strengthened by the similar rules

Quality of Rules : Objective Measures
Tridimensional Measures – Independence
Jaccard, Klosgen [Tan & Kumar 2000]
  Definitions:
    Jaccard
    Klosgen
  Semantics: correlation
  Sensitivity: 3 parameters
  Measures frequency
  Non-linear
  0 at independence
  Bounded maximum value (0 or 1)
  Disequilibrium?
  Strengthened by similar rules

Quality of Rules : Objective Measures
Tridimensional Measures – Independence
Interestingness Weighting Dependency [Gray & Orlowska 1998]
  Definition:
  Semantics: interest?
  Sensitivity: 3 parameters
  Measures frequency
  Non-linear
  0 at independence
  Inclusion?
  Disequilibrium?
Quality of Rules : Objective Measures
Tridimensional Measures – Independence
Mutual Information (Uncertainty) [Tan et al. 2002]
  Definition:
  Semantics: information gain provided by X for Y
  Sensitivity: 3 parameters
  Measures frequency
  Non-linear, entropic
  0 at independence
  Inclusion? Disequilibrium?
  Strongly symmetric
  Low values

Quality of Rules : Objective Measures
Tridimensional Measures – Independence
J-Measure (*) [Smyth & Goodman 1991] [Goodman & Kruskal 1959]
  Definition:
  Semantics: cross entropy (mutual-information based)
  Sensitivity: 3 parameters
  Measures frequency
  Non-linear, entropic
  0 at independence + concave
  Inclusion? Disequilibrium?
  Symmetric
  Low values
  Strengthened by the negative rule (X → ¬Y)








Quality of Rules : Objective Measures
Tridimensional Measures – Independence
Gini Index
  Definition:
  Semantics: quadratic entropy
  Sensitivity: 3 parameters
  Measures frequency
  Non-linear, entropic
  0 at independence + concave
  Inclusion? Disequilibrium?
  Very symmetric
  Low values

Quality of Rules : Objective Measures
Tridimensional Measures – Independence
General Measure of Rule Interestingness (*) [Jaroszewicz & Simovici 2001]
  Definition: (a continuum of measures between the Gini index and the Chi2)
  Semantics: ?
  Sensitivity: 3 parameters
  Measures frequency
  Non-linear (Gini → Chi2 distance)
  0 at independence
  Inclusion? Disequilibrium?
  Not symmetric → symmetric
  Notation:
    Δα: family of difference measures parameterized by a real factor α (Gini → Chi2 distance)
    ΔX (resp. ΔY): distribution of the vector X (resp. Y)
    ΔXY: joint distribution of the vectors X and Y
    ΔX × ΔY: joint distribution of X and Y under the independence hypothesis
    θ: a priori distribution vector of Y
Quality of Rules : Objective Measures
Quadridimensional Measures – Independence
Lerman Similarity [Lerman 1981]
  Definition:
  Semantics: centered and normalized number of examples
  Sensitivity: 4 parameters
  Measures statistics (counts)
  Linear
  0 at independence
  Inclusion?
  Disequilibrium?

Quality of Rules : Objective Measures
Quadridimensional Measures – Independence
Variation: Implication Index [Gras 1996]
  Definition:
  Semantics: normalized number of counter-examples
  Sensitivity: 4 parameters
  Measures statistics (counts)
  Linear
  0 at independence
  Inclusion?
  Disequilibrium?
Quality of Rules : Objective Measures
Quadridimensional Measures – Independence
Lerman Similarity [Lerman 1981] (probabilistic version)
  Definition: (probabilistic modeling, Chi2 law)
  Semantics: probability of a dependence between X and Y
  Sensitivity: 4 parameters
  Measures a probability, not a frequency
  Non-linear + e− tolerance
  0 at independence
  Maximum value bounded
  Inclusion? Disequilibrium?
  Strongly symmetric => couple with the measure of interest [Brin et al., 1997]
  Alternative: likelihood ratio [Ritschard et al., 1998]

Quality of Rules : Objective Measures
Quadridimensional Measures – Independence
Intensity of Implication (*) [Gras 1996] (Statistical Implicative Analysis)
  Definition: (probabilistic modeling of the number of counter-examples)
  Semantics: statistical surprise at the scarcity of counter-examples
  Sensitivity: 4 parameters
  Measures a probability, not a frequency
  Non-linear + e− tolerance
  0.5 at independence + likelihood
  Maximum value bounded
  Inclusion? Disequilibrium?
  Logical rules: can be 0
  Inspired by the likelihood of the link [Lerman et al. 1981]
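One common presentation of the intensity of implication, assuming a Poisson model of the counter-examples (the exact variant used in the deck may differ), is:

```latex
\varphi(X \rightarrow Y) \;=\; 1 - P\!\left[ N_{X\neg Y} \le n_{X\neg Y} \right],
\qquad N_{X\neg Y} \sim \mathrm{Poisson}\!\left(\frac{n_X\, n_{\neg Y}}{n}\right)
```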
AR Quality : Objective Measures
Intensity of Implication and Statistical Implicative Analysis
Extensions
  Modeling:
    Binary variables => numerical, ordinal, interval, fuzzy variables [Bernadet 2000, Guillaume 2002, ...]
    Bulky data: entropic intensity of implication [Gras et al. 2001]
    Sequences: prediction rules [Blanchard et al. 2002]
  Structuring:
    Implicative hierarchy (cohesion) [Gras et al. 2001]
    Typicality, reduction of variables (implicative inertia) [Gras et al. 2002]
  Applications:
    CHIC (http://www.ardm.asso.fr/CHIC.html)
    SIPINA (University of Lyon 2)
    FELIX (PerformanSE SA)
AR Quality : Objective Measures
Quadridimensional Rules
Entropic Intensity of Implication (*) [Gras et al. 2001] (Statistical Implicative Analysis)
  Definition:
    Inclusion rate
    Information
    Asymmetric entropy: the entropy H'(Y|X) decreases with p(Y|X) (and increases with p(¬Y|X))
  Semantics: statistical surprise + inclusion (removal of the disequilibrium effect)
  Sensitivity: 4 parameters
  Measures frequency, non-probabilistic
  Non-linear + e− tolerance (selectivity adjusted with α, ex: α = 2)
  Maximum 0.5 at independence
  0 at disequilibrium
  Strengthened by the contra-positive
  Maximum value bounded (1)

Quality of Rules : Objective Measures
Quadridimensional Measures – Independence
TIC (*) [Blanchard et al. 2004] (Statistical Implicative Analysis)
  Definition:
    Information rate
    Asymmetric entropy: the entropy Ê(X) with p(X)
  Semantics: statistical surprise + inclusion (removal of the disequilibrium effect)
  Sensitivity: 4 parameters
  Measures frequency
  Non-linear, entropic
  0 at independence
  0 at disequilibrium
  Strengthened by the contra-positive
  Maximum value bounded (1)

Quality of Rules : Objective Measures
Other Measures
Surprisingness (*) [Freitas 1998]
  Definition:
    Information gain provided by the attribute Xi
    Conditional entropy
  Rule: X1 ∧ X2 ∧ … ∧ Xp → Y
  Semantics: surprise; information gain provided by the premise
  Measures frequency
  Non-linear: entropic
  Can be used to assess the individual contribution of each attribute
Comparative Theory
Comparison by Simulation
  Intensity of implication vs. Confidence, J-Measure, Coverage Rate
  Intensity of implication vs. Confidence, PS, Intensity of Implication
  TIC vs. Confidence, TIM, J-Measure, Gini Index
  TIC vs. Confidence, J-Measure, Coverage Rate
  (simulation plots)
Quality of Rules : Objective Measures
Synthesis & Comparative Studies
  [Bayardo and Agrawal, 1999]: influence of support
  [Hilderman and Hamilton, 2001]: interestingness of summaries
    10 criteria
  [Lenca et al., 2004]: interest of association rules
    21 measures, symmetry, 8 principles, study of correlations, influence of the support
  [Gras et al. 2004]: interest of association rules
    9 symmetric measures, study of the relationships observed between 2 measures, influence of support
  [Tan et al., 2002]: interest of association rules
    16 measures, 5 principles of independence, correlation study
  [Azé and Kodratoff, 2001]: resistance to noise in the data
  [Tan & Kumar 2000]: interest of association rules
    9 measures, functions monotone / antitone in the support, optimization
    20 measures, 8 criteria for multi-criteria decision support
  [Lallich & Teytaud 2004]: interest of association rules
    15 measures, 10 principles, learning and use of the VC-dimension
Study of Comparative Experiments
Project AR-QAT: Quality Measures Analysis Tool
Experimental Results
  Input data sets
  30 objective measures
Experimental Results – Positive Correlations
Experimental Results – Stable Strong Positive Correlations
  Average correlation
ARVAL
  A workbench for computing quality measures, for the scientific community
  http://www.univ-nantes.fr/arval
Conclusion
Conclusion and Outlook
  Quality = multidimensional concept:
    Subjective (decision maker)
      Interest = varies with the knowledge of the decision maker
      PB1: extracting the knowledge / objectives of the decision maker
    Objective (data and rules)
      Interest = hypotheses on the data: inclusion, independence, disequilibrium, nuggets, robustness, …
      Antagonism independence / disequilibrium
  Many indices (~ 50!) =>
    PB2: practice restricted to support / confidence => a workbench for computing the indices
    PB3: comparative studies (properties, simulations) and experimental studies (behaviour on data): a platform?
    PB4: combining the indices, choosing the right index => decision support
    PB5: new indices?
    PB6: what is a good index? (ingredients of quality)
Ax: Quality Assessment of Knowledge
Perspective (PB1)
Combining Subjective and Objective Aspects of Quality
  Search for knowledge
  Anthropocentric approach
  Adaptive extraction
  FELIX [Lehn et al. 1999]
  AR-VIS [Blanchard et al. 2003]

Ax: Quality Assessment of Knowledge
Perspective (PB 2, 3, 4, 5)
Platform for experimentation and decision support
  Calculation: ARVAL? (www.polytech.univ-nantes.fr/arval)
  Analysis: AR-QAT? [Popovici 2003]
  Decision support: HERBS? [Lenca et al. 2003] (wwwiasc.enst-bretagne.fr/ecd-ind/HERBS)
Bibliography

















[Agrawal et al., 1993] R. Agrawal, T. Imielinsky et A. Swami. Mining associations rules between sets of items in large databases. Proc. of
ACM SIGMOD'93, 1993, p. 207-216
[Azé & Kodratoff, 2001] J. Azé et Y. Kodratoff. Evaluation de la résistance au bruit de quelques mesures d'extraction de règles
d'association. Extraction des connaissances et apprentissage 1(4), 2001, p. 143-154
[Azé & Kodratoff, 2001] J. Azé et Y. Kodratoff. Extraction de « pépites » de connaissances dans les données : une nouvelle approche et
une étude de sensibilité au bruit. Rapport d’activité du groupe gafoQualité de l’AS GafoDonnées. A paraître dans [Briand et al. 2004].
[Bayardo & Agrawal, 1999] R.J. Bayardo et R. Agrawal. Mining the most interesting rules. Proc. of the 5th Int. Conf. on Knowledge
Discovery and Data Mining, 1999, p.145-154.
[Bernadet 2000] M. Bernardet. Basis of a fuzzy knowledge discovery system. Proc. of Principles of Data Mining and Knowledge
Discovery, LNAI 1510, pages 24-33. Springer, 2000.
[Bernard et Charron 1996] J.-M. Bernard et C. Charron. L’analyse implicative bayésienne, une méthode pour l’étude des dépendances
orientées. I. Données binaires, Revue Mathématique Informatique et Sciences Humaines (MISH), vol. 134, 1996, p. 5-38.
[Berti-Equille 2004] L. Berti-équille. Etat de l'art sur la qualité des données : un premier pas vers la qualité des connaissances. Rapport
d’activité du groupe gafoQualité de l’AS GafoDonnées. A paraître dans [Briand et al. 2004].
[Blanchard et al. 2001] J. Blanchard, F. Guillet, et H. Briand. L'intensité d'implication entropique pour la recherche de règles de
prédiction intéressantes dans les séquences de pannes d'ascenseurs. Extraction des Connaissances et Apprentissage (ECA), Hermès
Science Publication, 1(4):77-88, 2002.
[Blanchard et al. 2003] J. Blanchard, F. Guillet, F. Rantière, H. Briand. Vers une Représentation Graphique en Réalité Virtuelle pour la
Fouille Interactive de Règles d’Association. Extraction des Connaissances et Apprentissage (ECA), vol. 17, n°1-2-3, 105-118, 2003.
Hermès Science Publication. ISSN 0992-499X, ISBN 2-7462-0631-5
[Blanchard et al. 2003a] J. Blanchard, F. Guillet, H. Briand. Une visualisation orientée qualité pour la fouille anthropocentrée de règles
d’association. In Cognito - Cahiers Romans de Sciences Cognitives. A paraître. ISSN 1267-8015
[Blanchard et al. 2003b] J. Blanchard, F. Guillet, H. Briand. A User-driven and Quality oriented Visualiation for Mining Association Rules.
In Proc. Of the Third IEEE International Conference on Data Mining, ICDM’2003, Melbourne, Florida, USA, November 19 - 22, 2003.
[Blanchard et al., 2004] J. Blanchard, F. Guillet, R. Gras, H. Briand. Mesurer la qualité des règles et de leurs contraposées avec le taux
informationnel TIC. EGC2004, RNTI, Cépaduès. 2004 A paraître.
[Blanchard et al., 2004a] J. Blanchard, F. Guillet, R. Gras, H. Briand. Mesure de la qualité des règles d'association par l'intensité
d'implication entropique. Rapport d’activité du groupe gafoQualité de l’AS GafoDonnées. A paraître dans [Briand et al. 2004].
[Breiman & al. 1984] L.Breiman, J. Friedman, R. Olshen and C.Stone. Classification and Regression Trees. Chapman & Hall,1984.
[Briand et al. 2004] H. Briand, M. Sebag, G. Gras et F. Guillet (eds). Mesures de Qualité pour la fouille de données. Revue des
Nouvelles Technologies de l’Information, RNTI, Cépaduès, 2004. A paraître.
[Brin et al., 1997] S. Brin, R. Motwani and C. Silverstein. Beyond Market Baskets: Generalizing Association Rules to Correlations. In
Proceedings of SIGMOD’97, pages 265-276, AZ, USA, 1997.
[Brin et al., 1997b] S. Brin, R. Motwani, J. Ullman et S. Tsur. Dynamic itemset counting and implication rules for market basket data.
Proc. of the Int. Conf. on Management of Data, ACM Press, 1997, p. 255-264.
Bibliography



















[Church & Hanks, 1990] K. W. Church et P. Hanks. Word association norms, mutual information and lexicography. Computational
Linguistics, 16(1), 22-29, 1990.
[Clark & Robin 1991] Peter Clark and Robin Boswell: Rule Induction with CN2: Some Recent Improvements. In Proceeding of the
European Working Session on Learning EWSL-91, 1991.
[Dong & Li, 1998] G. Dong and J. Li. Interestingness of Discovered Association Rules in terms of Neighborhood-Based Unexpectedness.
In X. Wu, R. Kotagiri and K. Korb, editors, Proc. of 2nd Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD `98),
Melbourne, Australia, April 1998.
[Duval et al. 2004] B. Duval, A. Salleb, C. Vrain. Méthodes et mesures d’intérêt pour l’extraction de règles d’exception. Rapport d’activité
du groupe gafoQualité de l’AS GafoDonnées. A paraître dans [Briand et al. 2004].
[Fleury 1996] L. Fleury. Découverte de connaissances pour la gestion des ressources humaines. Thèse de doctorat, Université de
Nantes, 1996.
[Frawley & Piatetsky-Shapiro 1992] Frawley W. Piatetsky-Shapiro G. and Matheus C., « Knowledge discovery in databases: an
overview », AI Magazine, 14(3), 1992, pages 57-70
[Freitas, 1998] A. A. Freitas. On Objective Measures of Rule Suprisingness. In J. Zytkow and M. Quafafou, editors, Proceedings of the
Second European Conference on the Principles of Data Mining and Knowledge Discovery (PKDD `98), pages 1-9, Nantes, France,
September 1998.
[Freitas, 1999] A. Freitas. On rule interestingness measures. Knowledge-Based Systems Journal 12(5-6), 1999, p. 309-315.
[Gago & Bento, 1998 ] P. Gago and C. Bento. A Metric for Selection of the Most Promising Rules. PKDD’98, 1998.
[Gray & Orlowska, 1998] B. Gray and M. E. Orlowska. Ccaiia: Clustering Categorical Attributes into Interesting Association Rules. In X.
Wu, R. Kotagiri and K. Korb, editors, Proc. of 2nd Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD `98), pages 132
43, Melbourne, Australia, April 1998.
[Goodman & Kruskal 1959] L. A. Goodman and W. H. Kruskal. Measures of Association for Cross Classification, II: Further discussion
and references. Journal of the American Statistical Association, ??? 1959.
[Gras et al. 1995] R. Gras, H. Briand and P. Peter. Structuration sets with implication intensity. Proc. of the Int. Conf. On Ordinal and
Symbolic Data Analysis - OSDA 95. Springer, 1995.
[Gras, 1996] R. Gras et coll.. L'implication statistique - Nouvelle méthode exploratoire de données. La pensée sauvage éditions, 1996.
[Gras et al. 2001] R. Gras, P. Kuntz, et H. Briand. Les fondements de l'analyse statistique implicative et quelques prolongements pour la
fouille de données. Mathématiques et Sciences Humaines : Numéro spécial Analyse statistique implicative, 1(154-155) :9-29, 2001.
[Gras et al. 2001b] R. Gras, P. Kuntz, R. Couturier, et F. Guillet. Une version entropique de l'intensité d'implication pour les corpus
volumineux. Extraction des Connaissances et Apprentissage (ECA), Hermès Science Publication, 1(1-2) :69-80, 2001.
[Gras et al. 2002] R. Gras, F. Guillet, et J. Philippe. Réduction des colonnes d'un tableau de données par quasi-équivalence entre
variables. Extraction des Connaissances et Apprentissage (ECA), Hermès Science Publication, 1(4) :197-202, 2002.
[Gras et al. 2004] R. Gras, R. Couturier, J. Blanchard, H. Briand, P. Kuntz, P. Peter. Quelques critères pour une mesure de la qualité des
règles d’association. Rapport d’activité du groupe gafoQualité de l’AS GafoDonnées. A paraître dans [Briand et al. 2004].
[Guillaume et al. 1998] S. Guillaume, F. Guillet, J. Philippé. Improving the discovery of associations Rules with Intensity of implication.
Proc. of 2nd European Symposium Principles of data Mining and Knowledge Discovery, LNAI 1510, p 318-327. Springer 1998.
[Guillaume 2002] S. Guillaume. Discovery of Ordinal Association Rules. M.-S. Cheng, P. S. Yu, B. Liu (Eds.), Proc. of the 6th Pacific-Asia
Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2002, LNCS 2336, pages 322-327 Springer 2002.
Bibliography



















[Guillet et al. 1999] F. Guillet, P. Kuntz, et R. Lehn. A genetic algorithm for visualizing networks of association rules. Proc. the 12th Int.
Conf. On Industrial and Engineering Appl. of AI and Expert Systems, LNCS 1611, pages 145-154. Springer 1999
[Guillet 2000] F. Guillet. Mesures de qualité de règles d’association. Cours DEA-ECD. Ecole polytechnique de l’université de Nantes.
2000.
[Hilderman & Hamilton, 1998] R. J. Hilderman and H. J. Hamilton. Knowledge Discovery and Interestingness Measures: A Survey.
(KDD `98), ??? New-York 1998.
[Hilderman et Hamilton, 2001] R. Hilderman et H. Hamilton. Knowledge discovery and measures of interest. Kluwer Academic
publishers, 2001.
[Hussain et al. 2001] F. Hussain, H. Liu, E. Suzuki and H. Lu. Exception Rule Mining with a Relative Interestingness Measure. ???
[Jaroszewicz & Simovici, 2001] S. Jaroszewicz et D.A. Simovici. A general measure of rule interestingness. Proc. of the 7th Int. Conf.
on Knowledge Discovery and Data Mining, L.N.C.S. 2168, Springer, 2001, p. 253-265
[Klemettinen et al. 1994] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen and A. I. Verkamo. Finding Interesting Rules from
Large Sets of Discovered Association Rules. In N. R. Adam, B. K. Bhargava and Y. Yesha, editors, Proc. of the Third International Conf.
on Information and Knowledge Management``, pages 401-407, Gaitersburg, Maryland, 1994.
[Kodratoff, 1999] Y. Kodratoff. Comparing Machine Learning and Knowledge Discovery in Databases:An Application to Knowledge
Discovery in Texts. Lecture Notes on AI (LNAI)-Tutorial series. 2000.
[Kuntz et al. 2000] P.Kuntz, F.Guillet, R.Lehn and H.Briand. A User-Driven Process for Mining Association Rules. In D. Zighed, J.
Komorowski and J.M. Zytkow (Eds.), Principles of Data Mining and Knowledge Discovery (PKDD2000), Lecture Notes in Computer
Science, vol. 1910, pages 483-489, 2000. Springer.
[Kodratoff, 2001] Y. Kodratoff. Comparing machine learning and knowledge discovery in databases: an application to knowledge
discovery in texts. Machine Learning and Its Applications, Paliouras G., Karkaletsis V., Spyropoulos C.D. (eds.), L.N.C.S. 2049, Springer,
2001, p. 1-21.
[Kuntz et al. 2001] P. Kuntz, F. Guillet, R. Lehn and H. Briand. A user-driven process for mining association rules. Proc. of Principles of
Data Mining and Knowledge Discovery, LNAI 1510, pages 483-489. Springer, 2000.
[Kuntz et al. 2001b] P. Kuntz, F. Guillet, R. Lehn, et H. Briand. Vers un processus d'extraction de règles d'association centré sur
l'utilisateur. In Cognito, Revue francophone internationale en sciences cognitives, 1(20) :13-26, 2001.
[Lallich et al. 2004] S. Lallich et O. Teytaud . Évaluation et validation de l’intérêt des règles d’association. Rapport d’activité du groupe
gafoQualité de l’AS GafoDonnées. A paraître dans [Briand et al. 2004].
[Lehn et al. 1999] R.Lehn, F.Guillet, P.Kuntz, H.Briand and J. Philippé. Felix : An interactive rule mining interface in a kdd process. In P.
Lenca (editor), Proc. of the 10th Mini-Euro Conference, Human Centered Processes, HCP’99, pages 169-174, Brest, France, September
22-24, 1999.
[Lenca et al. 2004] P. Lenca, P. Meyer, B. Vaillant, P. Picouet, S. Lallich. Evaluation et analyse multi-critères des mesures de qualité des
règles d’association. Rapport d’activité du groupe gafoQualité de l’AS GafoDonnées. A paraître dans [Briand et al. 2004].
[Lerman et al. 1981] I. C. Lerman, R. Gras et H. Rostam. Elaboration et évaluation d’un indice d’implication pour les données binaires.
Revue Mathématiques et Sciences Humaines, 75, p. 5-35, 1981.
[Lerman, 1981] I. C. Lerman. Classification et analyse ordinale des données. Paris, Dunod 1981.
[Lerman, 1993] I. C. Lerman. Likelihood linkage analysis classification method, Biochimie 75, p. 379-397, 1993.
[Lerman & Azé 2004] I. C. Lerman et J. Azé. Indice probabiliste discriminant de vraisemblance du lien pour des données volumineuses.
Rapport d’activité du groupe gafoQualité de l’AS GafoDonnées. A paraître dans [Briand et al. 2004].
Bibliography















[Liu et al., 1999] B. Liu, W. Hsu, L. Mun et H. Lee. Finding interesting patterns using user expectations. IEEE Transactions on Knowledge
and Data Engineering 11, 1999, p. 817-832.
[Loevinger, 1947] J. Loevinger. A systemic approach to the construction and evaluation of tests of ability. Psychological monographs,
61(4), 1947.
[Mannila & Pavlov, 1999] H. Mannila and D. Pavlov. Prediction with Local Patterns using Cross-Entropy. Technical Report, Information
and Computer Science, University of California, Irvine, 1999.
[Matheus & Piatetsky-Shapiro, 1996] C. J. Matheus and G. Piatetsky-Shapiro. Selecting and Reporting what is Interesting: The KEFIR
Application to Healthcare data. In U. M. Fayyad, G. Piatetsky-Shapiro, P.Smyth and R. Uthurusamy (eds), Advances in Knowledge
Discovery and Data Mining, p. 401-419, 1996. AAAI Press/MIT Press. [Meo 2000] R. Meo. Theory of dependence values, ACM
Transactions on Database Systems 5(3), p. 380-406, 2000.
[Padmanabhan et Tuzhilin, 1998] B. Padmanabhan et A. Tuzhilin. A belief-driven method for discovering unexpected patterns. Proc. Of
the 4th Int. Conf. on Knowledge Discovery and Data Mining, 1998, p. 94-100.
[Pearson, 1896] K. Pearson. Mathematical contributions to the theory of evolution. III. regression, heredity and panmixia. Philosophical
Transactions of the Royal Society, vol. A, 1896.
[Piatetsky-Shapiro, 1991] G. Piatetsky-Shapiro. Discovery, analysis, and presentation of strong rules. Knowledge Discovery in
Databases. Piatetsky-Shapiro G., Frawley W.J. (eds.), AAAI/MIT Press, 1991, p. 229-248
[Popovici, 2003] E. Popovici. Un atelier pour l'évaluation des indices de qualité. Mémoire de D.E.A. E.C.D., IRIN/Université
Lyon2/RACAI Bucarest, Juin 2003
[Ritschard & al., 1998] G. Ritschard, D. A. Zighed and N. Nicoloyannis. Maximiser l`association par agrégation dans un tableau croisé. In
J. Zytkow and M. Quafafou, editors, Proc. of the Second European Conf. on the Principles of Data Mining and Knowledge Discovery
(PKDD `98), Nantes, France, September 1998.
[Sebag et Schoenauer, 1988] M. Sebag et M. Schoenauer. Generation of rules with certainty and confidence factors from incomplete
and incoherent learning bases. Proc. of the European Knowledge Acquisition Workshop (EKAW'88), Boose J., Gaines B., Linster M.
(eds.), Gesellschaft für Mathematik und Datenverarbeitung mbH, 1988, p. 28.1-28.20.
[Shannon & Weaver, 1949] C.E. Shannon et W. Weaver. The mathematical theory of communication. University of Illinois Press, 1949.
[Silbershatz &Tuzhilin,1995] Avi Silberschatz and Alexander Tuzhilin. On Subjective Measures of Interestingness in Knowledge
Discovery, (KD. & DM. `95) ??? , 1995.
[Smyth & Goodman, 1991] P. Smyth et R.M. Goodman. Rule induction using information theory. Knowledge Discovery in Databases,
Piatetsky- Shapiro G., Frawley W.J. (eds.), AAAI/MIT Press, 1991, p. 159-176
[Tan & Kumar 2000] P. Tan, V. Kumar. Interestingness Measures for Association Patterns : A Perspective. Workshop tutorial (KDD
2000).
[Tan et al., 2002] P. Tan, V. Kumar et J. Srivastava. Selecting the right interestingness measure for association patterns. Proc. of the 8th
Int. Conf. on Knowledge Discovery and Data Mining, 2002, p. 32-41.