Associative Classification of Imbalanced Datasets
Sanjay Chawla
School of IT, University of Sydney

Overview
• Data Mining Tasks
• Associative Classifiers
• Downside of Support and Confidence
• Mining Rules from Imbalanced Data Sets
  – Fisher's Exact Test
  – Class Correlation Ratio (CCR)
  – Searching and Pruning Strategies
  – Experiments

Data Mining
• Data Mining research has settled into an equilibrium involving four tasks: Pattern Mining (Association Rules), Classification, Clustering, and Anomaly (Outlier) Detection. Pattern mining grew out of the database (DB) community and classification out of machine learning (ML); Associative Classifiers sit at their intersection.

Association Rules (Agrawal, Imielinski and Swami, SIGMOD 1993)
• An implication expression of the form X → Y, where X and Y are itemsets
  – Example: {Milk, Diaper} → {Beer}
• Rule evaluation metrics:
  – Support (s): the fraction of transactions that contain both X and Y
  – Confidence (c): how often items in Y appear in transactions that contain X

  TID  Items
  1    Bread, Milk
  2    Bread, Diaper, Beer, Eggs
  3    Milk, Diaper, Beer, Coke
  4    Bread, Milk, Diaper, Beer
  5    Bread, Milk, Diaper, Coke

• Example: {Milk, Diaper} → {Beer}
  – s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
  – c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67

(From "Introduction to Data Mining", Tan, Steinbach and Kumar)

Mining Association Rules
• Two-step approach:
  1. Frequent Itemset Generation: generate all itemsets whose support ≥ minsup
  2. Rule Generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
• Frequent itemset generation is computationally expensive

Associative Classifiers
• Most associative classifiers are based on rules discovered using the support–confidence criterion.
• The classifier itself is a collection of rules ranked by their support or confidence.
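The support and confidence computations above can be sketched in a few lines of Python, using the five example transactions from the slides (the function and variable names are illustrative, not from the slides):

```python
# The five example transactions from the slides.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(X ∪ Y) / support(X) for the rule X -> Y."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

X, Y = {"Milk", "Diaper"}, {"Beer"}
print(support(X | Y, transactions))    # 0.4
print(confidence(X, Y, transactions))  # 2/3 ≈ 0.67
```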
Associative Classifiers (2)

  TID  Items                       Gender
  1    Bread, Milk                 F
  2    Bread, Diaper, Beer, Eggs   M
  3    Milk, Diaper, Beer, Coke    M
  4    Bread, Milk, Diaper, Beer   M
  5    Bread, Milk, Diaper, Coke   F

• In a classification task we want to predict the class label (Gender) using the other attributes.
• A good (albeit stereotypical) rule is {Beer, Diaper} → Male, whose support is 60% and confidence is 100%.

Imbalanced Data Sets
• In some application domains, data sets are imbalanced:
  – The proportion of samples from one class is much smaller than that of the other class/classes.
  – The smaller class is the class of interest.
• Support and confidence are biased toward the majority class, and do not perform well in such cases.

Downsides of Support
• Support is biased towards the majority class.
  – E.g.: classes = {yes, no}, sup({yes}) = 90%
  – minSup > 10% wipes out any rule predicting "no"
  – Suppose X → no has confidence 1 and support 3%. The rule is discarded if minSup > 3%, even though it perfectly predicts 30% of the instances in the minority class!

Downside of Confidence (1)

        C    ¬C    Σ
  A     20    5    25
  ¬A    70    5    75
  Σ     90   10   100

• Conf(A → C) = 20/25 = 0.8
• Support(A ∪ C) = 20/100 = 0.2
• Correlation between A and C:
  corr(A → C) = P(A, C) / (P(A) · P(C)) = 0.20 / (0.25 × 0.90) ≈ 0.89 < 1
• Thus, when the data set is imbalanced, a high-support, high-confidence rule does not necessarily imply that the antecedent and the consequent are positively correlated.

Downside of Confidence (2)
• It is reasonable to expect that for "good rules" the antecedent and consequent are not independent!
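To make the contrast concrete, the confidence, support, and correlation values from the [20, 5; 70, 5] contingency table above can be reproduced directly (a minimal sketch; the variable names are mine):

```python
# Counts from the "Downside of Confidence (1)" table:
# n_A = transactions with A, n_C = transactions with C,
# n_AC = transactions with both.
n = 100
n_A, n_C, n_AC = 25, 90, 20

conf = n_AC / n_A                              # Conf(A -> C) = 0.8
supp = n_AC / n                                # Support(A ∪ C) = 0.2
corr = (n_AC / n) / ((n_A / n) * (n_C / n))    # P(A,C)/(P(A)P(C)) ≈ 0.89 < 1
print(conf, supp, round(corr, 2))
```

Despite confidence of 0.8, the correlation is below 1: A and C are (slightly) negatively correlated.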
• Suppose
  – P(Class = Yes) = 0.9
  – P(Class = Yes | X) = 0.9
  Then the rule X → Yes has confidence 0.9, yet X is independent of the class and carries no predictive information.

Downsides of Confidence (3)
• Another useful observation: higher confidence (support) for a rule in the minority class implies higher correlation, and lower correlation in the minority class implies lower confidence; neither implication holds for the majority class.
• Confidence (support) therefore tends to bias the majority class.

Contingency Table
• A 2 × 2 contingency table for the rule X → y. We will use the notation [a, b; c, d] to represent this table.

        X      ¬X     Σ
  y     a      b      a+b
  ¬y    c      d      c+d
  Σ     a+c    b+d    n = a+b+c+d

Fisher Exact Test
• Given a table [a, b; c, d], the Fisher Exact Test finds the probability (p-value) of obtaining the given table under the hypothesis that {X, ¬X} and {y, ¬y} are independent.
• The margin sums (Σ rows, Σ cols) are fixed.

Fisher Exact Test (2)
• The p-value is given by:

  p([a, b; c, d]) = Σ_{i=0}^{min(b,c)} [(a+b)! (c+d)! (a+c)! (b+d)!] / [n! (a+i)! (b−i)! (c−i)! (d+i)!]

• We will only use rules whose p-value is below the desired significance level (e.g. 0.01).
• Rules that pass this test are statistically significant in the positively associated direction (e.g. X → y).

Class Correlation Ratio
• In class correlation, we are interested in rules X → y where X is more positively correlated with y than it is with ¬y.
• The correlation is defined by:

  corr(X → y) = (sup(X ∪ y) · |T|) / (sup(X) · sup(y)) = a·n / ((a+c)(a+b))

  where |T| is the number of transactions n.
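The p-value formula above translates directly into code. Below is a minimal, unoptimized sketch using exact integer factorials (the function name is mine; real implementations would use log-factorials or a library routine):

```python
from math import factorial

def fisher_p(a, b, c, d):
    """One-sided Fisher exact test p-value for the table [a, b; c, d],
    summing over all tables at least as positively associated
    (margins held fixed), as in the slide formula."""
    n = a + b + c + d
    num = (factorial(a + b) * factorial(c + d)
           * factorial(a + c) * factorial(b + d))
    return sum(num / (factorial(n) * factorial(a + i) * factorial(b - i)
                      * factorial(c - i) * factorial(d + i))
               for i in range(min(b, c) + 1))

# For the table [3, 1; 1, 3] the one-sided p-value is 17/70 ≈ 0.243,
# far above a 0.01 significance level, so such a rule would be rejected.
print(fisher_p(3, 1, 1, 3))
```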
Class Correlation Ratio (2)
• We then use corr() to measure how correlated X is with y compared to ¬y.
• X and y are positively correlated if corr(X → y) > 1, and negatively correlated if corr(X → y) < 1.

Class Correlation Ratio (3)
• Based on corr(), we define the Class Correlation Ratio (CCR):

  CCR(X → y) = corr(X → y) / corr(X → ¬y) = a(c+d) / (c(a+b))

• The CCR measures how much more positively the antecedent is correlated with the class it predicts (e.g. y), relative to the alternative class (e.g. ¬y).

Class Correlation Ratio (4)
• We only use rules with CCR above a desired threshold, so that no rule is used that is more positively associated with the class it does not predict.

The Two Measurements
• We perform the following tests to determine whether a potentially interesting rule is indeed interesting:
  – Check the significance of the rule X → y by performing Fisher's Exact Test.
  – Check whether CCR(X → y) > 1.
• Rules that pass both tests are candidates for the classification task.

Search and Pruning Strategies
• To avoid examining the whole set of possible rules, we use search strategies that ensure the notion of being potentially interesting is anti-monotonic: X → y is considered potentially interesting only if every generalization {X′ → y | X′ ⊂ X} has been found to be potentially interesting.

Search and Pruning Strategies (2)
• Under the Aggressive search strategy, the significance of a rule X → y is tested against each of its generalizations X − {z} → y, using the contingency table [a, b; c, d] shown next.
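Under the [a, b; c, d] convention above (a = sup(X ∧ y), b = sup(¬X ∧ y), c = sup(X ∧ ¬y), d = sup(¬X ∧ ¬y)), the CCR can be sketched as follows. Applied to the earlier "Downside of Confidence" table, it confirms that the high-confidence rule would be rejected (the function name is mine):

```python
def ccr(a, b, c, d):
    """Class Correlation Ratio corr(X -> y) / corr(X -> ¬y) for the
    contingency table [a, b; c, d]."""
    n = a + b + c + d
    corr_pos = a * n / ((a + c) * (a + b))   # corr(X -> y)
    corr_neg = c * n / ((a + c) * (c + d))   # corr(X -> ¬y)
    return corr_pos / corr_neg               # = a(c+d) / (c(a+b))

# Earlier table for A -> C: a = 20, b = 70, c = 5, d = 5.
# corr(A -> C) ≈ 0.89 while corr(A -> ¬C) = 2.0, so CCR = 4/9 < 1:
# the antecedent is more positively associated with the class it does
# NOT predict, and the two-measurement filter discards the rule.
print(ccr(20, 70, 5, 5))
```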
• The table splits the transactions containing X − {z} into those that also contain z (i.e. contain all of X) and those that do not:

             t ⊇ X                 t ⊇ X − {z}, z ∉ t                       Σ
  y ∈ t      a = sup(X ∪ y)        b = sup((X − {z}) ∪ y) − sup(X ∪ y)      sup((X − {z}) ∪ y)
  y ∉ t      c = sup(X ∪ ¬y)       d = sup((X − {z}) ∪ ¬y) − sup(X ∪ ¬y)    sup((X − {z}) ∪ ¬y)
  Σ          sup(X)                sup(X − {z}) − sup(X)                    n = sup(X − {z})

Example
• Suppose we have already determined that the rules (A = a1) → 1 and (A = a2) → 1 are significant.
• Now we want to test whether X = (A = a1) ∧ (A = a2) → 1 is significant.
• We carry out the FET and compute the CCR for X against X − {A = a2} (i.e. z = (A = a2)) and for X against X − {A = a1} (i.e. z = (A = a1)).
• If the minimum of the p-values is below the significance level and the CCRs are greater than 1, we keep the rule X → 1; otherwise we discard it.

Ranking Rules
• Strength Score (SS): to determine how interesting a rule is, we need a ranking (ordering) of the rules; the ordering is defined by the Strength Score.

Experiments (Balanced Data)
• The preceding approach is denoted "SPARCCC".
• Experiments on balanced data sets show that the average accuracy of SPARCCC compares favourably to CBA and C4.5.
  – The table below reports prediction accuracy on the balanced data sets.

Experiments (Imbalanced Data)
• True Positive Rate (recall/sensitivity) is a better performance measure for imbalanced data sets.
• SPARCCC outperforms other rule-based techniques such as CBA and CCCS.
  – The table below reports the True Positive Rate of the minority class on imbalanced versions of the data sets.

References
• Florian Verhein, Sanjay Chawla. "Using Significant, Positively Associated and Relatively Class Correlated Rules for Associative Classification of Imbalanced Datasets." The 2007 IEEE International Conference on Data Mining (ICDM 2007), Omaha, NE, USA, October 28–31, 2007.