Classification - SVM
CS 685: Special Topics in Data Mining
Jinze Liu, The University of Kentucky
(SVM slides based on material copyright © 2001, 2003, Andrew W. Moore)

Linear Classifiers
f(x, w, b) = sign(w · x - b)
How would you classify this data? Many different separating lines put every point on the correct side: any of these would be fine... but which is best?

Classifier Margin
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.

Maximum Margin
The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (called an LSVM: Linear SVM). Support vectors are those datapoints that the margin pushes up against.

Why Maximum Margin?
1. Intuitively this feels safest.
2. If we've made a small error in the location of the boundary (it's been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
3. LOOCV is easy since the model is immune to removal of any non-support-vector datapoints.
4. There's some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.
5. Empirically it works very, very well.

Specifying a Line and Margin
The classifier boundary sits between a plus-plane and a minus-plane. How do we represent this mathematically, in m input dimensions?
• Plus-plane = { x : w · x + b = +1 }
• Minus-plane = { x : w · x + b = -1 }
Classify as +1 if w · x + b >= 1, as -1 if w · x + b <= -1; the universe explodes if -1 < w · x + b < 1.
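As a quick illustration (not from the slides; the weight vector, bias, and points below are made up, and the w · x + b convention of the plus-/minus-plane definitions is used), the decision rule is just a sign test:

    import numpy as np

    def linear_classify(x, w, b):
        # Return +1 or -1 depending on which side of the boundary x falls.
        return 1 if np.dot(w, x) + b >= 0 else -1

    w = np.array([2.0, -1.0])   # assumed weight vector
    b = -0.5                    # assumed bias
    print(linear_classify(np.array([1.0, 0.0]), w, b))   # +1: w . x + b = 1.5, beyond the plus-plane
    print(linear_classify(np.array([-1.0, 0.0]), w, b))  # -1: w . x + b = -2.5, beyond the minus-plane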
Computing the Margin Width
M = margin width. How do we compute M in terms of w and b?
• Plus-plane = { x : w · x + b = +1 }
• Minus-plane = { x : w · x + b = -1 }
Claim: the vector w is perpendicular to the plus-plane. Why? Let u and v be two vectors on the plus-plane; then w · (u - v) = (1 - b) - (1 - b) = 0, so w is orthogonal to every direction lying within the plane. And so of course the vector w is also perpendicular to the minus-plane.
• Let x- be any point on the minus-plane (any location in R^m, not necessarily a datapoint).
• Let x+ be the closest plus-plane point to x-.
Claim: x+ = x- + λw for some value of λ. Why? The line from x- to x+ is perpendicular to the planes, so to get from x- to x+ we travel some distance in direction w.
What we know:
• w · x+ + b = +1
• w · x- + b = -1
• x+ = x- + λw
• |x+ - x-| = M
It's now easy to get M in terms of w and b:
  w · (x- + λw) + b = 1
  => w · x- + b + λ (w · w) = 1
  => -1 + λ (w · w) = 1
  => λ = 2 / (w · w)
  M = |x+ - x-| = |λw| = λ |w| = λ sqrt(w · w) = 2 sqrt(w · w) / (w · w) = 2 / sqrt(w · w)

Learning the Maximum Margin Classifier
M = 2 / sqrt(w · w). Given a guess of w and b we can:
• Compute whether all data points are in the correct half-planes
• Compute the width of the margin
So now we just need to write a program to search the space of w's and b's to find the widest margin that matches all the datapoints. How? Gradient descent? Simulated annealing? Matrix inversion? EM? Newton's method?
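A small numerical check of the margin-width formula M = 2 / sqrt(w · w) (illustrative only; the w and b values are assumed):

    import numpy as np

    w = np.array([3.0, 4.0])    # assumed weight vector, sqrt(w . w) = 5
    b = 1.0                     # assumed bias

    # Start anywhere and project onto the minus-plane w . x + b = -1 by moving along w.
    x_minus = np.zeros_like(w)
    x_minus -= ((np.dot(w, x_minus) + b + 1) / np.dot(w, w)) * w

    lam = 2.0 / np.dot(w, w)    # the lambda value from the derivation above
    x_plus = x_minus + lam * w  # the closest point on the plus-plane

    print(np.dot(w, x_minus) + b)            # -1.0  (x_minus lies on the minus-plane)
    print(np.dot(w, x_plus) + b)             # +1.0  (x_plus lies on the plus-plane)
    print(np.linalg.norm(x_plus - x_minus))  # 0.4 == 2 / sqrt(w . w)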
Learning via Quadratic Programming
QP is a well-studied class of optimization algorithms for maximizing a quadratic function of some real-valued variables subject to linear constraints.

Quadratic Programming
Find arg max_u  c + d^T u + (u^T R u) / 2   (quadratic criterion)
Subject to n linear inequality constraints:
  a_11 u_1 + a_12 u_2 + ... + a_1m u_m <= b_1
  a_21 u_1 + a_22 u_2 + ... + a_2m u_m <= b_2
  :
  a_n1 u_1 + a_n2 u_2 + ... + a_nm u_m <= b_n
And subject to e additional linear equality constraints:
  a_(n+1)1 u_1 + a_(n+1)2 u_2 + ... + a_(n+1)m u_m = b_(n+1)
  a_(n+2)1 u_1 + a_(n+2)2 u_2 + ... + a_(n+2)m u_m = b_(n+2)
  :
  a_(n+e)1 u_1 + a_(n+e)2 u_2 + ... + a_(n+e)m u_m = b_(n+e)

Learning the Maximum Margin Classifier as a QP
M = 2 / sqrt(w · w). Given a guess of w, b we can compute whether all data points are in the correct half-planes and compute the margin width. Assume R datapoints, each (x_k, y_k) where y_k = ±1.
• What should our quadratic optimization criterion be? Minimize w · w.
• How many constraints will we have? R.
• What should they be?
  w · x_k + b >= 1 if y_k = 1
  w · x_k + b <= -1 if y_k = -1

Uh-oh!
When the data are not linearly separable, this is going to be a problem. What should we do?
Idea 1: find the minimum w · w while also minimizing the number of training set errors. Problemette: two things to minimize makes for an ill-defined optimization.
Idea 1.1: minimize w · w + C (#train errors), where C is a tradeoff parameter. There's a serious practical problem that's about to make us reject this approach: it can't be expressed as a Quadratic Programming problem, so solving it may be too slow. (Also, it doesn't distinguish between disastrous errors and near misses.)
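Stepping back to the separable-case QP above for a moment: the two one-sided constraints can be folded into the single form y_k (w · x_k + b) >= 1 and the whole problem handed to an off-the-shelf QP package. The sketch below is illustrative only; it assumes the cvxopt package and made-up toy data, and is not part of the original slides.

    import numpy as np
    from cvxopt import matrix, solvers

    X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])  # assumed toy data
    y = np.array([1.0, 1.0, -1.0, -1.0])
    R, m = X.shape

    # Variables u = [w_1, ..., w_m, b]; minimize (1/2) u^T P u == (1/2) w . w.
    # (The tiny entry for b keeps P strictly positive definite for the solver.)
    P = matrix(np.diag([1.0] * m + [1e-8]))
    q = matrix(np.zeros((m + 1, 1)))
    # The R constraints y_k (w . x_k + b) >= 1, rewritten as G u <= h.
    G = matrix(-y[:, None] * np.hstack([X, np.ones((R, 1))]))
    h = matrix(-np.ones((R, 1)))

    solvers.options['show_progress'] = False
    u = np.array(solvers.qp(P, q, G, h)['x']).ravel()
    w, b = u[:m], u[m]
    print(w, b, 2.0 / np.sqrt(np.dot(w, w)))   # learned w, b, and the margin width M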
Uh-oh! (continued)
Idea 2.0: minimize w · w + C (distance of error points to their correct place).

Learning Maximum Margin with Noise
M = 2 / sqrt(w · w). Given a guess of w, b we can compute the sum of distances of points to their correct zones and compute the margin width. Assume R datapoints, each (x_k, y_k) where y_k = ±1. (m = # input dimensions, R = # records.)
• What should our quadratic optimization criterion be? Minimize
  (1/2) w · w + C Σ_{k=1..R} ε_k
• How many constraints will we have? What should they be?
  w · x_k + b >= 1 - ε_k if y_k = 1
  w · x_k + b <= -1 + ε_k if y_k = -1
Our original (noiseless data) QP had m + 1 variables: w_1, w_2, ..., w_m, and b. Our new (noisy data) QP has m + 1 + R variables: w_1, w_2, ..., w_m, b, ε_1, ..., ε_R.
There's a bug in this QP. Can you spot it? Nothing stops the slacks from going negative, so we actually need 2R constraints:
  w · x_k + b >= 1 - ε_k if y_k = 1
  w · x_k + b <= -1 + ε_k if y_k = -1
  ε_k >= 0 for all k

An Equivalent QP
Maximize  Σ_{k=1..R} α_k - (1/2) Σ_{k=1..R} Σ_{l=1..R} α_k α_l Q_kl,   where Q_kl = y_k y_l (x_k · x_l)
Subject to these constraints:  0 <= α_k <= C for all k,   and   Σ_{k=1..R} α_k y_k = 0
Then define:
  w = Σ_{k=1..R} α_k y_k x_k
  b = y_K (1 - ε_K) - x_K · w,   where K = arg max_k α_k
Then classify with: f(x, w, b) = sign(w · x - b)
Datapoints with α_k > 0 will be the support vectors, so the sum defining w only needs to be over the support vectors.
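This dual is what standard SVM libraries solve internally. As an illustrative sketch (assuming scikit-learn and made-up toy data, not part of the slides), C bounds the multipliers and only the support vectors end up with α_k > 0:

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.0],        # assumed toy data
                  [-2.0, -2.0], [-3.0, -1.0], [-1.0, 0.5]])
    y = np.array([1, 1, 1, -1, -1, -1])

    clf = SVC(kernel='linear', C=1.0).fit(X, y)   # C is the slack tradeoff parameter

    print(clf.support_vectors_)   # the datapoints with alpha_k > 0
    print(clf.dual_coef_)         # the signed multipliers alpha_k * y_k, each bounded by C
    w = clf.coef_[0]              # w = sum over support vectors of alpha_k y_k x_k
    b = clf.intercept_[0]
    print(np.sign(X @ w + b))     # classifies this separable toy set correctly (sklearn uses w . x + b)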
Why did I tell you about this equivalent QP?
• It's a formulation that QP packages can optimize more quickly.
• Because of further jaw-dropping developments you're about to learn.

Estimate the Margin
What is the distance expression for a point x to the line w·x + b = 0?
  d(x) = |x · w + b| / ||w||_2 = |x · w + b| / sqrt(Σ_{i=1..d} w_i^2)

What is the expression for the margin?
  margin = min_{x ∈ D} d(x) = min_{x ∈ D} |x · w + b| / sqrt(Σ_{i=1..d} w_i^2)

Maximize Margin
  argmax_{w,b} margin(w, b, D) = argmax_{w,b} min_{x_i ∈ D} d(x_i) = argmax_{w,b} min_{x_i ∈ D} |x_i · w + b| / sqrt(Σ_{i=1..d} w_i^2)
  subject to ∀ x_i ∈ D : y_i (x_i · w + b) >= 0
• This is a min-max (game-style) problem.
Strategy: fix the scale of (w, b) so that the closest point satisfies |x_i · w + b| = 1. Maximizing the margin then becomes
  argmin_{w,b} Σ_{i=1..d} w_i^2   subject to ∀ x_i ∈ D : y_i (x_i · w + b) >= 1
which is exactly the quadratic program from before, solvable by any QP package.

Suppose We're in 1 Dimension
What would SVMs do with this data (positive points clustered on one side of x = 0, negative points on the other)? Not a big surprise: the positive "plane" and negative "plane" are just two thresholds on x.

Harder 1-Dimensional Dataset
That's wiped the smirk off SVM's face: one class sits around x = 0 with the other class on both sides of it, so no single threshold works. What can be done about this? Remember how permitting non-linear basis functions made linear regression so much nicer? Let's permit them here too:
  z_k = (x_k, x_k^2)
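A minimal sketch of this basis-function trick (assuming scikit-learn; the 1-dimensional data below are made up to mimic the "harder" dataset): mapping each point to z = (x, x^2) makes the two classes linearly separable in the new space.

    import numpy as np
    from sklearn.svm import SVC

    x = np.array([-4.0, -3.0, -0.5, 0.0, 0.5, 3.0, 4.0])   # assumed 1-d inputs
    y = np.array([-1, -1, 1, 1, 1, -1, -1])                # inner points +1, outer points -1

    Z = np.column_stack([x, x ** 2])     # z_k = (x_k, x_k^2)
    clf = SVC(kernel='linear').fit(Z, y)
    print(clf.predict(Z))                # all seven training points now classified correctly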
Common SVM Basis Functions
• z_k = (polynomial terms of x_k of degree 1 to q)
• z_k = (radial basis functions of x_k):  z_k[j] = φ_j(x_k) = KernelFn(|x_k - c_j| / KW)
• z_k = (sigmoid functions of x_k)

SVM Performance
• Anecdotally they work very, very well indeed.
• Example: they are currently the best-known classifier on a well-studied handwritten-character recognition benchmark.
• There has been a lot of excitement and religious fervor about SVMs.
• Despite this, some practitioners (including your lecturer) are a little skeptical.

Doing Multi-Class Classification
• SVMs can only handle two-class outputs (i.e. a categorical output variable with arity 2).
• What can be done?
• Answer: with output arity N, learn N SVMs:
  - SVM 1 learns "Output == 1" vs "Output != 1"
  - SVM 2 learns "Output == 2" vs "Output != 2"
  - :
  - SVM N learns "Output == N" vs "Output != N"
• Then to predict the output for a new input, just predict with each SVM and find out which one puts the prediction the furthest into the positive region.

SVM Related Links
• http://svm.dcs.rhbnc.ac.uk/
• http://www.kernel-machines.org/
• C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Knowledge Discovery and Data Mining, 2(2), 1998.
• SVMlight, software (in C): http://ais.gmd.de/~thorsten/svm_light
• Book: An Introduction to Support Vector Machines. N. Cristianini and J. Shawe-Taylor, Cambridge University Press.

Classification - CBA
CS 685: Special Topics in Data Mining, Spring 2008
Jinze Liu, The University of Kentucky

Association Rules
Transaction id | Items bought
100 | f, a, c, d, g, i, m, p
200 | a, b, c, f, l, m, o
300 | b, f, h, j, o
400 | b, c, k, s, p
500 | a, f, c, e, l, p, m, n
Itemset X = {x1, ..., xk}. Find all the rules X → Y with minimum support and confidence:
• support, s, is the probability that a transaction contains X ∪ Y
• confidence, c, is the conditional probability that a transaction having X also contains Y
Let sup_min = 50%, conf_min = 50%. Association rules: a → c (60%, 100%), c → a (60%, 75%).

Classification Based on Association
• Classification rule mining versus association rule mining
• Aim: a small set of rules as classifier, versus all rules satisfying minsup and minconf
• Syntax: X → y, versus X → Y

Why & How to Integrate
• Both classification rule mining and association rule mining are indispensable to practical applications.
• The integration is done by focusing on a special subset of association rules whose right-hand side is restricted to the classification class attribute.
  - CARs: class association rules

CBA: Three Steps
• Discretize continuous attributes, if any
• Generate all class association rules (CARs)
• Build a classifier based on the generated CARs

Our Objectives
• To generate the complete set of CARs that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf) constraints.
• To build a classifier from the CARs.
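As a concrete illustration of the support and confidence measures that CARs are built on, here is a tiny Python sketch (illustrative only) that recomputes the numbers quoted in the Association Rules slide above, a → c (60%, 100%) and c → a (60%, 75%):

    transactions = {
        100: {'f', 'a', 'c', 'd', 'g', 'i', 'm', 'p'},
        200: {'a', 'b', 'c', 'f', 'l', 'm', 'o'},
        300: {'b', 'f', 'h', 'j', 'o'},
        400: {'b', 'c', 'k', 's', 'p'},
        500: {'a', 'f', 'c', 'e', 'l', 'p', 'm', 'n'},
    }

    def support_confidence(lhs, rhs, db):
        # support = fraction of transactions containing lhs and rhs;
        # confidence = fraction of lhs-transactions that also contain rhs.
        n_lhs = sum(1 for items in db.values() if lhs <= items)
        n_both = sum(1 for items in db.values() if (lhs | rhs) <= items)
        return n_both / len(db), n_both / n_lhs

    print(support_confidence({'a'}, {'c'}, transactions))   # (0.6, 1.0)  -> a => c (60%, 100%)
    print(support_confidence({'c'}, {'a'}, transactions))   # (0.6, 0.75) -> c => a (60%, 75%)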
Rule Generator: Basic Concepts
• Ruleitem <condset, y>: condset is a set of items, y is a class label. Each ruleitem represents a rule: condset → y.
• condsupCount: the number of cases in D that contain condset.
• rulesupCount: the number of cases in D that contain the condset and are labeled with class y.
• Support = (rulesupCount / |D|) * 100%
• Confidence = (rulesupCount / condsupCount) * 100%

RG: Basic Concepts (cont.)
• Frequent ruleitems: a ruleitem is frequent if its support is above minsup.
• Accurate rule: a rule is accurate if its confidence is above minconf.
• Possible rule: for all ruleitems that have the same condset, the ruleitem with the highest confidence is the possible rule of this set of ruleitems.
• The set of class association rules (CARs) consists of all the possible rules (PRs) that are both frequent and accurate.

RG: An Example
• A ruleitem: <{(A,1),(B,1)}, (class,1)>
  - Assume the support count of the condset (condsupCount) is 3, the support count of this ruleitem (rulesupCount) is 2, and |D| = 10.
  - Then for (A,1),(B,1) → (class,1):
    support = 20%  (rulesupCount / |D| * 100%)
    confidence = 66.7%  (rulesupCount / condsupCount * 100%)

RG: The Algorithm
1  F1 = {large 1-ruleitems};
2  CAR1 = genRules(F1);
3  prCAR1 = pruneRules(CAR1);            // count the item and class occurrences to determine the frequent 1-ruleitems and prune them
4  for (k = 2; Fk-1 ≠ Ø; k++) do
5    Ck = candidateGen(Fk-1);            // generate the candidate ruleitems Ck using the frequent ruleitems Fk-1
6    for each data case d ∈ D do         // scan the database
7      Cd = ruleSubset(Ck, d);           // find all the ruleitems in Ck whose condsets are supported by d
8      for each candidate c ∈ Cd do
9        c.condsupCount++;
10       if d.class = c.class then c.rulesupCount++;   // update the support counts of the candidates in Ck
11     end
12   end
13   Fk = {c ∈ Ck | c.rulesupCount ≥ minsup};          // select the new frequent ruleitems to form Fk
14   CARk = genRules(Fk);                              // select the ruleitems that are both accurate and frequent
15   prCARk = pruneRules(CARk);
16 end
17 CARs = ∪k CARk;
18 prCARs = ∪k prCARk;
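A simplified, illustrative Python sketch of the RG loop above, under a few stated assumptions: ruleitems are (condset, class) pairs, candidateGen is a plain Apriori-style join, the pruneRules step is omitted, and the data cases and thresholds are made up.

    from itertools import combinations

    def rule_generator(D, minsup, minconf):
        # D is a list of (items, cls) cases; returns the frequent + accurate rules (CARs).
        n = len(D)
        classes = {cls for _, cls in D}
        cars = []
        F = list({frozenset([i]) for case, _ in D for i in case})   # candidate 1-condsets
        while F:
            # One database scan: condsupCount and rulesupCount for every candidate.
            cond, rule = {}, {}
            for case, cls in D:
                for condset in F:
                    if condset <= case:
                        cond[condset] = cond.get(condset, 0) + 1
                        rule[(condset, cls)] = rule.get((condset, cls), 0) + 1
            frequent = []
            for condset in F:
                # Possible rule = the highest-confidence class for this condset.
                best = max(classes, key=lambda c: rule.get((condset, c), 0))
                sup = rule.get((condset, best), 0) / n
                conf = rule.get((condset, best), 0) / max(cond.get(condset, 0), 1)
                if sup >= minsup:
                    frequent.append(condset)
                    if conf >= minconf:
                        cars.append((set(condset), best, sup, conf))
            # Apriori-style join: build (k+1)-condsets from frequent k-condsets.
            F = list({a | b for a, b in combinations(frequent, 2) if len(a | b) == len(a) + 1})
        return cars

    D = [({'A1', 'B1'}, 1), ({'A1', 'B1'}, 1), ({'A1', 'B2'}, 0),
         ({'A2', 'B1'}, 1), ({'A2', 'B2'}, 0)]
    print(rule_generator(D, minsup=0.4, minconf=0.7))
    # yields rules such as ({'B1'}, 1, 0.6, 1.0), ({'B2'}, 0, 0.4, 1.0), ({'A1','B1'}, 1, 0.4, 1.0)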