Classification - SVM
CS 685: Special Topics in Data Mining
Jinze Liu
The University of Kentucky
CS685: Special Topics in Data Mining, UKY
Copyright © 2001, 2003, Andrew W. Moore

Linear Classifiers
Each point x is labeled +1 or -1. A linear classifier has the form f(x, w, b) = sign(w . x - b). How would you classify this data? Many different separating lines work on the training data: any of these would be fine... but which is best?

Classifier Margin
Define the margin of a linear classifier as the width by which the boundary could be increased before hitting a data point.

Maximum Margin
The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM, called a linear SVM (LSVM). Support vectors are the data points that the margin pushes up against.
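The decision rule f(x, w, b) = sign(w . x - b) can be sketched in a few lines of plain Python; the weights and bias below are hand-picked for illustration, not learned:

```python
# Minimal sketch of the linear classifier f(x, w, b) = sign(w . x - b).
# The weight vector w and bias b here are illustrative, not trained.
def predict(x, w, b):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - b
    return 1 if score > 0 else -1

w, b = [1.0, 1.0], 0.5
print(predict([2.0, 2.0], w, b))    # a point on the +1 side -> 1
print(predict([-1.0, -1.0], w, b))  # a point on the -1 side -> -1
```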
Why Maximum Margin?
1. Intuitively this feels safest.
2. If we've made a small error in the location of the boundary (it's been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
3. LOOCV is easy, since the model is immune to removal of any non-support-vector data points.
4. There's some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.
5. Empirically it works very, very well.

Estimate the Margin
The distance from a point x to the line w . x + b = 0 is
    d(x) = |x . w + b| / sqrt(sum_{i=1..d} w_i^2)
The margin is the distance from the boundary to the closest point:
    margin = min_{x in D} d(x) = min_{x in D} |x . w + b| / sqrt(sum_{i=1..d} w_i^2)

Maximize Margin
We want the (w, b) that maximizes the margin:
    argmax_{w,b} margin(w, b, D)
  = argmax_{w,b} min_{xi in D} d(xi)
  = argmax_{w,b} min_{xi in D} |xi . w + b| / sqrt(sum_{i=1..d} w_i^2)
subject to: for all xi in D, yi (xi . w + b) >= 0
This is a min-max (game) problem.
Strategy: fix the scale so that min_{xi in D} |xi . w + b| = 1; then maximizing the margin is equivalent to
    argmin_{w,b} sum_{i=1..d} w_i^2
subject to: for all xi in D, yi (xi . w + b) >= 1

Learning via Quadratic Programming
• QP is a well-studied class of optimization algorithms for optimizing a quadratic function of some real-valued variables subject to linear constraints.

Suppose we're in 1 dimension
What would SVMs do with this data (points on a line around x = 0)? Not a big surprise: the positive "plane" and negative "plane" are just the points bounding the margin.

Harder 1-dimensional dataset
A dataset that is not linearly separable wipes the smirk off SVM's face. What can be done about this? Remember how permitting nonlinear basis functions made linear regression so much nicer? Let's permit them here too:
    z_k = (x_k, x_k^2)

Common SVM basis functions
• z_k = (polynomial terms of x_k of degree 1 to q)
• z_k = (radial basis functions of x_k): z_k[j] = phi_j(x_k) = KernelFn(|x_k - c_j| / KW)
• z_k = (sigmoid functions of x_k)
This is sensible. Is that the end of the story? No... there's one more trick!

SVM Related Links
• http://svm.dcs.rhbnc.ac.uk/
• http://www.kernel-machines.org/
• C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Knowledge Discovery and Data Mining, 2(2), 1998.
• SVMlight - software (in C): http://ais.gmd.de/~thorsten/svm_light
• BOOK: An Introduction to Support Vector Machines. N. Cristianini and J. Shawe-Taylor. Cambridge University Press.

Classification - CBA
CS 685: Special Topics in Data Mining
Spring 2008
Jinze Liu
The University of Kentucky

Association Rules
Transaction id | Items bought
100 | f, a, c, d, g, i, m, p
200 | a, b, c, f, l, m, o
300 | b, f, h, j, o
400 | b, c, k, s, p
500 | a, f, c, e, l, p, m, n
(Figure: Venn diagram of customers buying beer, buying diapers, or both.)
An itemset is X = {x1, ..., xk}. Find all rules X -> Y with minimum support and confidence: support s is the probability that a transaction contains X ∪ Y; confidence c is the conditional probability that a transaction containing X also contains Y. Let supmin = 50% and confmin = 50%. Association rules: A -> C (60%, 100%), C -> A (60%, 75%).

Classification based on Association
• Classification rule mining versus association rule mining
• Aim: a small set of rules as a classifier, versus all rules satisfying minsup and minconf
• Syntax: X -> y, versus X -> Y

Why & How to Integrate
• Both classification rule mining and association rule mining are indispensable in practical applications.
• The integration is done by focusing on a special subset of association rules whose right-hand side is restricted to the classification class attribute.
  – CARs: class association rules

CBA: Three Steps
• Discretize continuous attributes, if any
• Generate all class association rules (CARs)
• Build a classifier based on the generated CARs.
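The support and confidence figures on the Association Rules slide can be recomputed directly from the transaction table; this small Python sketch transcribes the table into sets and checks the two rules:

```python
# Transcription of the slide's transaction table (TIDs 100-500).
transactions = [
    {"f", "a", "c", "d", "g", "i", "m", "p"},
    {"a", "b", "c", "f", "l", "m", "o"},
    {"b", "f", "h", "j", "o"},
    {"b", "c", "k", "s", "p"},
    {"a", "f", "c", "e", "l", "p", "m", "n"},
]

def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Conditional probability that a transaction with `lhs` also has `rhs`."""
    return support(lhs | rhs) / support(lhs)

# a -> c: support 60%, confidence 100%; c -> a: support 60%, confidence 75%
print(support({"a", "c"}), confidence({"a"}, {"c"}))   # -> 0.6 1.0
print(round(confidence({"c"}, {"a"}), 2))              # -> 0.75
```

These numbers match the slide's A -> C (60%, 100%) and C -> A (60%, 75%).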
Our Objectives
• Generate the complete set of CARs that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf) constraints.
• Build a classifier from the CARs.

Rule Generator: Basic Concepts
• Ruleitem <condset, y>: condset is a set of items, y is a class label. Each ruleitem represents a rule condset -> y.
• condsupCount: the number of cases in D that contain condset.
• rulesupCount: the number of cases in D that contain condset and are labeled with class y.
• support = (rulesupCount / |D|) * 100%
• confidence = (rulesupCount / condsupCount) * 100%

RG: Basic Concepts (Cont.)
• Frequent ruleitem
  – A ruleitem is frequent if its support is above minsup.
• Accurate rule
  – A rule is accurate if its confidence is above minconf.
• Possible rule
  – Among all ruleitems that share the same condset, the ruleitem with the highest confidence is the possible rule of that set of ruleitems.
• The set of class association rules (CARs) consists of all the possible rules (PRs) that are both frequent and accurate.
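The frequent/accurate/possible-rule filtering can be sketched in a few lines of Python; the ruleitem counts, thresholds, and data layout below are invented for illustration, not taken from the slides:

```python
# Hypothetical setup: thresholds and counts chosen only to illustrate the
# frequent / accurate / "possible rule" filtering of ruleitems.
MINSUP = 0.2
MINCONF = 0.6
D_SIZE = 10

# (condset, class) -> (condsupCount, rulesupCount); toy numbers.
ruleitems = {
    (frozenset({("A", 1)}), 1): (5, 4),
    (frozenset({("A", 1)}), 2): (5, 1),   # same condset, different class
    (frozenset({("B", 1)}), 1): (2, 1),
}

def support(counts):
    return counts[1] / D_SIZE           # rulesupCount / |D|

def confidence(counts):
    return counts[1] / counts[0]        # rulesupCount / condsupCount

# Possible rule: for each condset, keep only the highest-confidence ruleitem.
best = {}
for (condset, label), counts in ruleitems.items():
    if condset not in best or confidence(counts) > confidence(ruleitems[best[condset]]):
        best[condset] = (condset, label)

# CARs = possible rules that are both frequent and accurate.
cars = [r for r in best.values()
        if support(ruleitems[r]) >= MINSUP and confidence(ruleitems[r]) >= MINCONF]
print(cars)  # only the (A,1) -> class 1 ruleitem survives
```

Here ({(A,1)}, class 1) has support 0.4 and confidence 0.8, so it is frequent and accurate; the (B,1) ruleitem fails the support threshold.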
RG: An Example
• A ruleitem <{(A,1),(B,1)}, (class,1)>. Assume that
  – the support count of the condset (condsupCount) is 3,
  – the support count of the ruleitem (rulesupCount) is 2, and
  – |D| = 10.
• Then for (A,1),(B,1) -> (class,1):
  – support = (rulesupCount / |D|) * 100% = 20%
  – confidence = (rulesupCount / condsupCount) * 100% = 66.7%

RG: The Algorithm
1  F1 = {large 1-ruleitems};
2  CAR1 = genRules(F1);
3  prCAR1 = pruneRules(CAR1);       // count the item and class occurrences to determine the frequent 1-ruleitems, then prune
4  for (k = 2; Fk-1 != Ø; k++) do
5      Ck = candidateGen(Fk-1);     // generate the candidate ruleitems Ck from the frequent ruleitems Fk-1
6      for each data case d in D do // scan the database
7          Cd = ruleSubset(Ck, d);  // find all ruleitems in Ck whose condsets are supported by d
8          for each candidate c in Cd do
9              c.condsupCount++;
10             if d.class = c.class then c.rulesupCount++;   // update the support counts of the candidates in Ck
11         end
12     end
13     Fk = {c in Ck | c.rulesupCount >= minsup};   // the new frequent ruleitems form Fk
14     CARk = genRules(Fk);         // keep the ruleitems that are both accurate and frequent
15     prCARk = pruneRules(CARk);
16 end
17 CARs = ∪k CARk;
18 prCARs = ∪k prCARk;
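The level-wise generation loop above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: function names are mine, the pruneRules step is omitted, minsup/minconf are fractions rather than percentages, and candidate generation is a plain pairwise join without subset pruning:

```python
from itertools import combinations

def gen_cars(data, minsup, minconf):
    """Generate class association rules from (item-set, class) cases."""
    n = len(data)
    classes = {c for _, c in data}

    def counts(condset):
        # condsupCount and per-class rulesupCounts for one condset
        cond = sum(condset <= items for items, _ in data)
        rule = {c: sum(condset <= items and cls == c for items, cls in data)
                for c in classes}
        return cond, rule

    cars = []
    # k = 1: all single-item condsets; k > 1: join frequent (k-1)-condsets.
    candidates = {frozenset({i}) for items, _ in data for i in items}
    while candidates:
        frequent = []
        for condset in candidates:
            cond, rule = counts(condset)
            best_cls = max(rule, key=rule.get)        # the "possible rule"
            if rule[best_cls] / n >= minsup:          # frequent ruleitem
                frequent.append(condset)
                if cond and rule[best_cls] / cond >= minconf:   # accurate rule
                    cars.append((condset, best_cls))
        # simplified candidateGen: pairwise joins of frequent condsets
        candidates = {a | b for a, b in combinations(frequent, 2)
                      if len(a | b) == len(a) + 1}
    return cars

data = [({"a", "b"}, 1), ({"a"}, 1), ({"a", "b"}, 0), ({"b"}, 0)]
# Finds {a} -> 1 and {b} -> 0 on this toy data (order may vary).
print(gen_cars(data, minsup=0.25, minconf=0.6))
```

On the toy data, {a} predicts class 1 with confidence 2/3 and {b} predicts class 0 with confidence 2/3; the 2-item condset {a, b} is frequent but fails the confidence threshold, so the loop terminates after k = 2.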