Associative Classification of
Imbalanced Datasets
Sanjay Chawla
School of IT
University of Sydney
1
Overview
• Data Mining Tasks
• Associative Classifiers
• Downside of Support and Confidence
• Mining Rules from Imbalanced Data Sets
– Fisher’s Exact Test
– Class Correlation Ratio (CCR)
– Searching and Pruning Strategies
– Experiments
2
Data Mining
• Data Mining research has settled into an
equilibrium involving four tasks
[Diagram: the four data mining tasks, spanning the DB and ML communities: Pattern Mining (Association Rules), Classification, Clustering, and Anomaly or Outlier Detection. An Associative Classifier combines Pattern Mining and Classification.]
3
Association Rules (Agrawal, Imielinski and Swami, SIGMOD '93)
– An implication expression of the form X → Y, where X and Y are itemsets
– Example: {Milk, Diaper} → {Beer}
• Rule Evaluation Metrics
– Support (s): the fraction of transactions that contain both X and Y
– Confidence (c): measures how often items in Y appear in transactions that contain X

| TID | Items |
| --- | --- |
| 1 | Bread, Milk |
| 2 | Bread, Diaper, Beer, Eggs |
| 3 | Milk, Diaper, Beer, Coke |
| 4 | Bread, Milk, Diaper, Beer |
| 5 | Bread, Milk, Diaper, Coke |

Example: for {Milk, Diaper} → Beer,

$$s = \frac{\sigma(\{\text{Milk, Diaper, Beer}\})}{|T|} = \frac{2}{5} = 0.4$$

$$c = \frac{\sigma(\{\text{Milk, Diaper, Beer}\})}{\sigma(\{\text{Milk, Diaper}\})} = \frac{2}{3} \approx 0.67$$

From "Introduction to Data Mining", Tan, Steinbach and Kumar
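The numbers above are easy to verify mechanically. Below is a minimal Python sketch (an illustration, not code from the slides) that computes the support and confidence of {Milk, Diaper} → {Beer} over the five example transactions.

```python
# Minimal sketch: support and confidence of {Milk, Diaper} -> {Beer}
# over the five example transactions from the slide.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    # sigma(itemset): number of transactions containing every item
    return sum(itemset <= t for t in transactions)

X, Y = {"Milk", "Diaper"}, {"Beer"}
s = sigma(X | Y) / len(transactions)  # 2/5 = 0.4
c = sigma(X | Y) / sigma(X)           # 2/3 ~= 0.67
```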
Mining Association Rules
• Two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup
2. Rule Generation
– Generate high confidence rules from each
frequent itemset, where each rule is a binary
partitioning of a frequent itemset
• Frequent itemset generation is computationally expensive
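A compact Python sketch of step 1 in the level-wise (Apriori) style; this is a generic illustration under the anti-monotonicity of support, not code from the slides. With the five transactions above and minsup = 0.6 it returns the four frequent single items plus {Bread, Milk}, {Bread, Diaper}, {Milk, Diaper} and {Diaper, Beer}.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Level-wise frequent itemset mining: a (k+1)-itemset can be frequent
    only if all of its k-subsets are frequent (support is anti-monotone)."""
    n = len(transactions)

    def support(s):
        return sum(s <= t for t in transactions) / n

    items = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in items if support(s) >= minsup}
    freq, k = {}, 1
    while level:
        freq.update((s, support(s)) for s in level)
        # join frequent k-itemsets into (k+1)-candidates, prune by subsets
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        level = {c for c in candidates
                 if all(frozenset(sub) in freq for sub in combinations(c, k))
                 and support(c) >= minsup}
        k += 1
    return freq
```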
6
Overview
• Data Mining Tasks
• Associative Classifiers
• Downside of Support and Confidence
• Mining Rules from Imbalanced Data Sets
– Fisher’s Exact Test
– Class Correlation Ratio (CCR)
– Searching and Pruning Strategies
– Experiments
7
Associative Classifiers
• Most associative classifiers are built from rules discovered using the support-confidence criterion.
• The classifier itself is a collection of rules
ranked using their support or confidence.
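A hypothetical sketch of how such a ranked rule collection could be applied. The rule representation, scores and first-match scheme here are illustrative assumptions (in the spirit of classifiers like CBA), not the slides' exact method.

```python
# Rules are (antecedent, predicted_class, score) triples; score could be
# support or confidence. The highest-ranked rule whose antecedent is
# contained in the instance fires; otherwise a default class is returned.
def classify(instance, rules, default):
    for antecedent, label, _score in sorted(rules, key=lambda r: r[2], reverse=True):
        if antecedent <= instance:
            return label
    return default

rules = [
    (frozenset({"Beer", "Diaper"}), "M", 1.00),  # illustrative scores
    (frozenset({"Bread", "Milk"}), "F", 0.67),
]
classify({"Bread", "Milk", "Diaper", "Beer"}, rules, "F")  # -> "M"
```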
8
Associative Classifiers (2)
| TID | Items | Gender |
| --- | --- | --- |
| 1 | Bread, Milk | F |
| 2 | Bread, Diaper, Beer, Eggs | M |
| 3 | Milk, Diaper, Beer, Coke | M |
| 4 | Bread, Milk, Diaper, Beer | M |
| 5 | Bread, Milk, Diaper, Coke | F |

In a classification task we want to predict the class label (Gender) using the attributes.

A good (albeit stereotypical) rule is {Beer, Diaper} → Male, whose support is 60% and confidence is 100%.
9
Overview
• Data Mining Tasks
• Associative Classifiers
• Downside of Support and Confidence
• Mining Rules from Imbalanced Data Sets
– Fisher’s Exact Test
– Class Correlation Ratio (CCR)
– Searching and Pruning Strategies
– Experiments
10
Imbalanced Data Set
• In some application domains, data sets are imbalanced:
– The proportion of samples from one class is
much smaller than the other class/classes.
– And the smaller class is the class of interest.
• Support and confidence are biased toward
the majority class, and do not perform well
in such cases.
11
Downsides of Support
• Support is biased towards the majority
class
– Eg: classes = {yes, no}, sup({yes})=90%
– minSup > 10% wipes out any rule predicting
“no”
– Suppose X → no has confidence 1 and
support 3%. Rule discarded if minSup > 3%
even though it perfectly predicts 30% of the
instances in the minority class!
12
Downside of Confidence (1)

Conf(A → C) = 20/25 = 0.8
Support(A ∪ C) = 20/100 = 0.2

|   | C | ¬C | Σ |
| --- | --- | --- | --- |
| A | 20 | 5 | 25 |
| ¬A | 70 | 5 | 75 |
| Σ | 90 | 10 | 100 |

Correlation between A and C:

$$corr(A, C) = \frac{P(A, C)}{P(A)\,P(C)} = \frac{0.20}{0.25 \times 0.90} \approx 0.89 < 1$$

Thus, when the data set is imbalanced, a high-support, high-confidence rule does not necessarily imply that the antecedent and the consequent are positively correlated.
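A quick sanity check of the arithmetic above (a sketch; the table orientation here follows this slide, with rows A/¬A and columns C/¬C):

```python
# Contingency table from the slide: rows A / not-A, columns C / not-C.
a, b = 20, 5    # A:     C = 20, not-C = 5
c, d = 70, 5    # not-A: C = 70, not-C = 5
n = a + b + c + d                     # 100
conf = a / (a + b)                    # 20/25 = 0.8
corr = (a * n) / ((a + b) * (a + c))  # 20*100 / (25*90) ~= 0.89 < 1
```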
13
Downside of Confidence (2)
• Reasonable to expect that for “good rules”
the antecedent and consequent are not
independent!
• Suppose
– P(Class = Yes) = 0.9
– P(Class = Yes | X) = 0.9
• Then X → Yes has confidence 0.9, yet X and the class are independent: the high confidence merely reflects the class prior.
14
Downsides of Confidence (3)
Another useful observation:
• For a rule predicting the minority class, higher confidence (support) implies higher correlation, and lower correlation implies lower confidence; neither implication holds for the majority class.
• Hence confidence (and support) are biased toward the majority class.
15
Overview
• Data Mining Tasks
• Associative Classifiers
• Downside of Support and Confidence
• Mining Rules from Imbalanced Data Sets
– Fisher’s Exact Test
– Class Correlation Ratio (CCR)
– Searching and Pruning Strategies
– Experiments
16
Contingency Table
• A 2 × 2 contingency table for X → y.
• We will use the notation [a, b; c, d] to represent this table.

|   | X | ¬X | Σ rows |
| --- | --- | --- | --- |
| y | a | b | a + b |
| ¬y | c | d | c + d |
| Σ cols | a + c | b + d | n = a + b + c + d |
17
Fisher Exact Test
• Given a table [a, b; c, d], the Fisher Exact Test computes the probability (p-value) of observing a table at least as positively associated as the given one, under the hypothesis that {X, ¬X} and {y, ¬y} are independent.
• The margin sums (∑rows, ∑cols) are fixed.
18
Fisher Exact Test (2)
• The p-value is given by:

$$p([a, b; c, d]) = \sum_{i=0}^{\min(b, c)} \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,(a+i)!\,(b-i)!\,(c-i)!\,(d+i)!}$$

• We only use rules whose p-value is below the desired significance level (e.g. 0.01).
• Rules that pass this test are statistically significant in the positively associated direction (e.g. X → y).
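The sum above is straightforward to implement exactly with integer arithmetic. A minimal sketch (not the paper's code), which should agree with scipy.stats.fisher_exact(table, alternative="greater"):

```python
from fractions import Fraction
from math import factorial

def fisher_p(a, b, c, d):
    """One-sided Fisher Exact Test p-value for the table [a, b; c, d]:
    the hypergeometric probability of this table and of every table with
    the same margins that is more positively associated (larger a)."""
    n = a + b + c + d
    num = (factorial(a + b) * factorial(c + d)
           * factorial(a + c) * factorial(b + d))
    p = Fraction(0)
    for i in range(min(b, c) + 1):
        p += Fraction(num, factorial(n) * factorial(a + i) * factorial(b - i)
                           * factorial(c - i) * factorial(d + i))
    return float(p)
```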
19
Overview
• Data Mining Tasks
• Associative Classifiers
• Downside of Support and Confidence
• Mining Rules from Imbalanced Data Sets
– Fisher’s Exact Test
– Class Correlation Ratio (CCR)
– Searching and Pruning Strategies
– Experiments
20
Class Correlation Ratio
• In class correlation, we are interested in rules X → y where X is more positively correlated with y than it is with ¬y.
• The correlation is defined by:

$$corr(X \rightarrow y) = \frac{sup(X \cup y) \cdot |T|}{sup(X) \cdot sup(y)} = \frac{a\,n}{(a+c)(a+b)}$$

where |T| is the number of transactions n.
21
Class Correlation Ratio (2)
• We then use corr() to measure how
correlated X is with y compared to ¬y.
• X and y are positively correlated if
corr(X→y)>1, and negatively correlated if
corr(X→y)<1.
22
Class Correlation Ratio (3)
• Based on the correlation corr(), we define the Class Correlation Ratio (CCR):

$$CCR(X \rightarrow y) = \frac{corr(X \rightarrow y)}{corr(X \rightarrow \neg y)} = \frac{a(c+d)}{c(a+b)}$$

• The CCR measures how much more positively the antecedent is correlated with the class it predicts (e.g. y), relative to the alternative class (e.g. ¬y).
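In code (a sketch using the [a, b; c, d] notation of the earlier contingency-table slide, where a = sup(X ∪ y), a + b = sup(y), a + c = sup(X)):

```python
def corr(a, b, c, d):
    # corr(X -> y) = sup(X u y) * |T| / (sup(X) * sup(y)) = a*n / ((a+c)(a+b))
    n = a + b + c + d
    return a * n / ((a + c) * (a + b))

def ccr(a, b, c, d):
    # CCR(X -> y) = corr(X -> y) / corr(X -> not-y) = a(c+d) / (c(a+b));
    # treat c == 0 (X never occurs with not-y) as infinitely favourable
    return a * (c + d) / (c * (a + b)) if c else float("inf")
```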
23
Class Correlation Ratio (4)
$$CCR(X \rightarrow y) = \frac{corr(X \rightarrow y)}{corr(X \rightarrow \neg y)}$$

• We only use rules whose CCR exceeds a desired threshold, so that no rule is used that is more positively associated with a class it does not predict.
24
The two measurements
• We perform the following tests to determine whether a potentially interesting rule is indeed interesting:
– Check the significance of the rule X → y by performing Fisher's Exact Test.
– Check whether CCR(X → y) > 1.
• Those rules that pass the above two tests
are candidates for the classification task.
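Putting the two tests together (a sketch reusing fisher_p and ccr from the earlier snippets; 0.01 is the example significance level mentioned above):

```python
def is_interesting(a, b, c, d, alpha=0.01):
    """Candidate filter for X -> y with contingency table [a, b; c, d]:
    keep the rule only if it is statistically significant in the positive
    direction and more positively correlated with y than with not-y."""
    return fisher_p(a, b, c, d) <= alpha and ccr(a, b, c, d) > 1
```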
25
Overview
• Data Mining Tasks
• Associative Classifiers
• Downside of Support and Confidence
• Mining Rules from Imbalanced Data Sets
– Fisher’s Exact Test
– Class Correlation Ratio (CCR)
– Searching and Pruning Strategies
– Experiments
26
Search and Pruning Strategies
• To avoid examining the whole set of possible rules, we use search strategies that make the notion of being potentially interesting anti-monotonic:
X → y is considered potentially interesting only if every generalization in {X′ → y | X′ ⊂ X} has been found to be potentially interesting.
27
Search and Pruning Strategies (2)
• The contingency table [a, b; c, d] used to test the significance of the rule X → y against one of its generalizations X − {z} → y, under the Aggressive search strategy:

|   | {t : X ⊆ t} | {t : X − {z} ⊆ t, z ∉ t} | Σ rows |
| --- | --- | --- | --- |
| {t : y ∈ t} | a = sup(X ∪ y) | b = sup((X − {z}) ∪ y) − sup(X ∪ y) | a + b = sup((X − {z}) ∪ y) |
| {t : y ∉ t} | c = sup(X ∪ ¬y) | d = sup((X − {z}) ∪ ¬y) − sup(X ∪ ¬y) | c + d = sup((X − {z}) ∪ ¬y) |
| Σ cols | a + c = sup(X) | b + d = sup(X − {z}) − sup(X) | a + b + c + d = sup(X − {z}) |
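A sketch of how this table can be assembled from support counts. The representation of the data as (itemset, label) pairs and the helper name are illustrative assumptions:

```python
def build_table(X, z, data, y):
    """Contingency table [a, b; c, d] comparing X against its
    generalization X - {z}, as defined above. data is a list of
    (itemset, label) pairs; y is the class the rule predicts."""
    Xz = X - {z}
    a  = sum(X  <= t and lbl == y for t, lbl in data)  # sup(X u y)
    ab = sum(Xz <= t and lbl == y for t, lbl in data)  # sup((X - {z}) u y)
    ac = sum(X  <= t              for t, lbl in data)  # sup(X)
    nn = sum(Xz <= t              for t, lbl in data)  # sup(X - {z})
    b, c = ab - a, ac - a
    return a, b, c, nn - ab - c                        # d = nn - ab - c
```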
28
Example
• Suppose we have already determined that the rules (A = a1) → 1 and (A = a2) → 1 are significant.
• Now we want to test whether X = (A = a1) ∧ (A = a2) → 1 is significant.
• We carry out a FET and compute the CCR comparing X with X − {A = a1} (i.e. z = {a2}), and X with X − {A = a2} (i.e. z = {a1}).
• If the minimum of the p-values is below the significance level, and the CCR is greater than 1, we keep the rule X → 1; otherwise we discard it.
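As a sketch, the whole check can be written with the helpers from the previous snippets (build_table, fisher_p, ccr). Whether the CCR condition must hold against every generalization or only the best one is ambiguous in the slide text; here every generalization is required to pass:

```python
def keep_rule(X, data, y, alpha=0.01):
    # One table per generalization X - {z}; the rule survives if the
    # smallest p-value is significant and each CCR exceeds 1.
    tables = [build_table(X, z, data, y) for z in X]
    return (min(fisher_p(*t) for t in tables) <= alpha
            and all(ccr(*t) > 1 for t in tables))
```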
29
Ranking Rules
• Strength Score (SS):
– To determine how interesting a rule is, we need a ranking (ordering) of the rules; this ordering is defined by the Strength Score.
30
Overview
• Data Mining Tasks
• Associative Classifiers
• Downside of Support and Confidence
• Mining Rules from Imbalanced Data Sets
– Fisher’s Exact Test
– Class Correlation Ratio (CCR)
– Searching and Pruning Strategies
– Experiments
31
Experiments (Balanced Data)
• The preceding approach is represented by
“SPARCCC”.
• The experiments on Balanced Data Sets show
that the average accuracy of SPARCCC
compares favourably to CBA and C4.5.
– [Table: prediction accuracy on the balanced data sets.]
32
Experiments (Imbalanced Data)
• True Positive Rate (Recall/Sensitivity) is a better
performance measure for imbalanced data sets.
• SPARCCC outperforms other rule-based techniques such as CBA and CCCS.
– [Table: true positive rate of the minority class on imbalanced versions of the data sets.]
33
References
• Florian Verhein and Sanjay Chawla. "Using Significant, Positively Associated and Relatively Class Correlated Rules for Associative Classification of Imbalanced Datasets." In Proceedings of the 2007 IEEE International Conference on Data Mining (ICDM), Omaha, NE, USA, October 28-31, 2007.
34