Associative Classification of Imbalanced Datasets
Sanjay Chawla
School of IT, University of Sydney

Overview
• Data Mining Tasks
• Associative Classifiers
• Downside of Support and Confidence
• Mining Rules from Imbalanced Data Sets
  – Fisher's Exact Test
  – Class Correlation Ratio (CCR)
  – Searching and Pruning Strategies
  – Experiments

Data Mining
• Data Mining research has settled into an equilibrium involving four tasks: Pattern Mining (Association Rules), Classification, Clustering, and Anomaly (Outlier) Detection. Pattern mining grew out of the database (DB) community and classification out of machine learning (ML); Associative Classifiers sit at their intersection.

Association Rules (Agrawal, Imielinski and Swami, SIGMOD 1993)
• An implication expression of the form X → Y, where X and Y are itemsets
  – Example: {Milk, Diaper} → {Beer}
• Rule evaluation metrics:
  – Support (s): the fraction of transactions that contain both X and Y
  – Confidence (c): how often items in Y appear in transactions that contain X

  TID  Items
  1    Bread, Milk
  2    Bread, Diaper, Beer, Eggs
  3    Milk, Diaper, Beer, Coke
  4    Bread, Milk, Diaper, Beer
  5    Bread, Milk, Diaper, Coke

• Example: {Milk, Diaper} → {Beer}
  – s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
  – c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67

(From "Introduction to Data Mining", Tan, Steinbach and Kumar)

Mining Association Rules
• Two-step approach:
  1. Frequent Itemset Generation: generate all itemsets whose support ≥ minsup
  2. Rule Generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
• Frequent itemset generation is computationally expensive

Associative Classifiers
• Most associative classifiers are based on rules discovered using the support–confidence criterion.
• The classifier itself is a collection of rules ranked by their support or confidence.
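The support and confidence computations above can be sketched in a few lines of Python, using the five example transactions from the slides (the function and variable names are illustrative, not from the slides):

```python
# The five example transactions from the slides.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(X ∪ Y) / support(X) for the rule X -> Y."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

X, Y = {"Milk", "Diaper"}, {"Beer"}
print(support(X | Y, transactions))    # 0.4
print(confidence(X, Y, transactions))  # 2/3 ≈ 0.67
```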
Associative Classifiers (2)

  TID  Items                       Gender
  1    Bread, Milk                 F
  2    Bread, Diaper, Beer, Eggs   M
  3    Milk, Diaper, Beer, Coke    M
  4    Bread, Milk, Diaper, Beer   M
  5    Bread, Milk, Diaper, Coke   F

• In a classification task we want to predict the class label (Gender) using the other attributes.
• A good (albeit stereotypical) rule is {Beer, Diaper} → Male, whose support is 60% and confidence is 100%.

Imbalanced Data Sets
• In some application domains, data sets are imbalanced:
  – The proportion of samples from one class is much smaller than that of the other class/classes.
  – The smaller class is the class of interest.
• Support and confidence are biased toward the majority class, and do not perform well in such cases.

Downsides of Support
• Support is biased towards the majority class.
  – E.g.: classes = {yes, no}, sup({yes}) = 90%
  – minSup > 10% wipes out any rule predicting "no"
  – Suppose X → no has confidence 1 and support 3%. The rule is discarded if minSup > 3%, even though it perfectly predicts 30% of the instances in the minority class!

Downside of Confidence (1)

        C    ¬C    Σ
  A     20    5    25
  ¬A    70    5    75
  Σ     90   10   100

• Conf(A → C) = 20/25 = 0.8
• Support(A ∪ C) = 20/100 = 0.2
• Correlation between A and C:
  corr(A → C) = P(A, C) / (P(A) · P(C)) = 0.20 / (0.25 × 0.90) ≈ 0.89 < 1
• Thus, when the data set is imbalanced, a high-support, high-confidence rule does not necessarily imply that the antecedent and the consequent are positively correlated.

Downside of Confidence (2)
• It is reasonable to expect that for "good rules" the antecedent and consequent are not independent!
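To make the contrast concrete, the confidence, support, and correlation values from the [20, 5; 70, 5] contingency table above can be reproduced directly (a minimal sketch; the variable names are mine):

```python
# Counts from the "Downside of Confidence (1)" table:
# n_A = transactions with A, n_C = transactions with C,
# n_AC = transactions with both.
n = 100
n_A, n_C, n_AC = 25, 90, 20

conf = n_AC / n_A                              # Conf(A -> C) = 0.8
supp = n_AC / n                                # Support(A ∪ C) = 0.2
corr = (n_AC / n) / ((n_A / n) * (n_C / n))    # P(A,C)/(P(A)P(C)) ≈ 0.89 < 1
print(conf, supp, round(corr, 2))
```

Despite confidence of 0.8, the correlation is below 1: A and C are (slightly) negatively correlated.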
• Suppose
  – P(Class = Yes) = 0.9
  – P(Class = Yes | X) = 0.9
  Then the rule X → Yes has confidence 0.9, yet X is independent of the class and carries no predictive information.

Downsides of Confidence (3)
• Another useful observation: higher confidence (support) for a rule in the minority class implies higher correlation, and lower correlation in the minority class implies lower confidence; neither implication holds for the majority class.
• Confidence (support) therefore tends to bias the majority class.

Contingency Table
• A 2 × 2 contingency table for the rule X → y. We will use the notation [a, b; c, d] to represent this table.

        X      ¬X     Σ
  y     a      b      a+b
  ¬y    c      d      c+d
  Σ     a+c    b+d    n = a+b+c+d

Fisher Exact Test
• Given a table [a, b; c, d], the Fisher Exact Test finds the probability (p-value) of obtaining the given table under the hypothesis that {X, ¬X} and {y, ¬y} are independent.
• The margin sums (Σ rows, Σ cols) are fixed.

Fisher Exact Test (2)
• The p-value is given by:

  p([a, b; c, d]) = Σ_{i=0}^{min(b,c)} [(a+b)! (c+d)! (a+c)! (b+d)!] / [n! (a+i)! (b−i)! (c−i)! (d+i)!]

• We will only use rules whose p-value is below the desired significance level (e.g. 0.01).
• Rules that pass this test are statistically significant in the positively associated direction (e.g. X → y).

Class Correlation Ratio
• In class correlation, we are interested in rules X → y where X is more positively correlated with y than it is with ¬y.
• The correlation is defined by:

  corr(X → y) = (sup(X ∪ y) · |T|) / (sup(X) · sup(y)) = a·n / ((a+c)(a+b))

  where |T| is the number of transactions n.
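The p-value formula above translates directly into code. Below is a minimal, unoptimized sketch using exact integer factorials (the function name is mine; real implementations would use log-factorials or a library routine):

```python
from math import factorial

def fisher_p(a, b, c, d):
    """One-sided Fisher exact test p-value for the table [a, b; c, d],
    summing over all tables at least as positively associated
    (margins held fixed), as in the slide formula."""
    n = a + b + c + d
    num = (factorial(a + b) * factorial(c + d)
           * factorial(a + c) * factorial(b + d))
    return sum(num / (factorial(n) * factorial(a + i) * factorial(b - i)
                      * factorial(c - i) * factorial(d + i))
               for i in range(min(b, c) + 1))

# For the table [3, 1; 1, 3] the one-sided p-value is 17/70 ≈ 0.243,
# far above a 0.01 significance level, so such a rule would be rejected.
print(fisher_p(3, 1, 1, 3))
```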
Class Correlation Ratio (2)
• We then use corr() to measure how correlated X is with y compared to ¬y.
• X and y are positively correlated if corr(X → y) > 1, and negatively correlated if corr(X → y) < 1.

Class Correlation Ratio (3)
• Based on corr(), we define the Class Correlation Ratio (CCR):

  CCR(X → y) = corr(X → y) / corr(X → ¬y) = a(c+d) / (c(a+b))

• The CCR measures how much more positively the antecedent is correlated with the class it predicts (e.g. y), relative to the alternative class (e.g. ¬y).

Class Correlation Ratio (4)
• We only use rules with CCR above a desired threshold, so that no rule is used that is more positively associated with the class it does not predict.

The Two Measurements
• We perform the following tests to determine whether a potentially interesting rule is indeed interesting:
  – Check the significance of the rule X → y by performing Fisher's Exact Test.
  – Check whether CCR(X → y) > 1.
• Rules that pass both tests are candidates for the classification task.

Search and Pruning Strategies
• To avoid examining the whole set of possible rules, we use search strategies that ensure the notion of being potentially interesting is anti-monotonic: X → y is considered potentially interesting only if every generalization {X′ → y | X′ ⊂ X} has been found to be potentially interesting.

Search and Pruning Strategies (2)
• Under the Aggressive search strategy, the significance of a rule X → y is tested against each of its generalizations X − {z} → y, using the contingency table [a, b; c, d] shown next.
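Under the [a, b; c, d] convention above (a = sup(X ∧ y), b = sup(¬X ∧ y), c = sup(X ∧ ¬y), d = sup(¬X ∧ ¬y)), the CCR can be sketched as follows. Applied to the earlier "Downside of Confidence" table, it confirms that the high-confidence rule would be rejected (the function name is mine):

```python
def ccr(a, b, c, d):
    """Class Correlation Ratio corr(X -> y) / corr(X -> ¬y) for the
    contingency table [a, b; c, d]."""
    n = a + b + c + d
    corr_pos = a * n / ((a + c) * (a + b))   # corr(X -> y)
    corr_neg = c * n / ((a + c) * (c + d))   # corr(X -> ¬y)
    return corr_pos / corr_neg               # = a(c+d) / (c(a+b))

# Earlier table for A -> C: a = 20, b = 70, c = 5, d = 5.
# corr(A -> C) ≈ 0.89 while corr(A -> ¬C) = 2.0, so CCR = 4/9 < 1:
# the antecedent is more positively associated with the class it does
# NOT predict, and the two-measurement filter discards the rule.
print(ccr(20, 70, 5, 5))
```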
• The table splits the transactions containing X − {z} into those that also contain z (i.e. contain all of X) and those that do not:

             t ⊇ X                 t ⊇ X − {z}, z ∉ t                       Σ
  y ∈ t      a = sup(X ∪ y)        b = sup((X − {z}) ∪ y) − sup(X ∪ y)      sup((X − {z}) ∪ y)
  y ∉ t      c = sup(X ∪ ¬y)       d = sup((X − {z}) ∪ ¬y) − sup(X ∪ ¬y)    sup((X − {z}) ∪ ¬y)
  Σ          sup(X)                sup(X − {z}) − sup(X)                    n = sup(X − {z})

Example
• Suppose we have already determined that the rules (A = a1) → 1 and (A = a2) → 1 are significant.
• Now we want to test whether X = (A = a1) ∧ (A = a2) → 1 is significant.
• We carry out the FET and compute the CCR for X against X − {A = a2} (i.e. z = (A = a2)) and for X against X − {A = a1} (i.e. z = (A = a1)).
• If the minimum of the p-values is below the significance level and the CCRs are greater than 1, we keep the rule X → 1; otherwise we discard it.

Ranking Rules
• Strength Score (SS): to determine how interesting a rule is, we need a ranking (ordering) of the rules; the ordering is defined by the Strength Score.

Experiments (Balanced Data)
• The preceding approach is denoted "SPARCCC".
• Experiments on balanced data sets show that the average accuracy of SPARCCC compares favourably to CBA and C4.5.
  – The table below reports prediction accuracy on the balanced data sets.

Experiments (Imbalanced Data)
• True Positive Rate (recall/sensitivity) is a better performance measure for imbalanced data sets.
• SPARCCC outperforms other rule-based techniques such as CBA and CCCS.
  – The table below reports the True Positive Rate of the minority class on imbalanced versions of the data sets.

References
• Florian Verhein, Sanjay Chawla. "Using Significant, Positively Associated and Relatively Class Correlated Rules for Associative Classification of Imbalanced Datasets." The 2007 IEEE International Conference on Data Mining (ICDM 2007), Omaha, NE, USA, October 28–31, 2007.