Proximal Support Vector Machine Classifiers
KDD 2001, San Francisco, August 26-29, 2001
Glenn Fung & Olvi Mangasarian
Data Mining Institute, University of Wisconsin - Madison

Support Vector Machines: Maximizing the Margin between Bounding Planes

The two bounding planes

    x'w = \gamma + 1   (bounding the A+ points)
    x'w = \gamma - 1   (bounding the A- points)

are separated by the margin 2 / ||w||_2.

Proximal Support Vector Machines: Fitting the Data Using Two Parallel Bounding Planes

The same two parallel planes x'w = \gamma + 1 and x'w = \gamma - 1 are used, but each plane is now fitted proximally to the points of its own class rather than bounding them.

Standard Support Vector Machine Formulation

Solve the quadratic program, for some \nu > 0:

    min_{w, \gamma, y}  (\nu/2) ||y||_2^2 + (1/2) ||(w, \gamma)||_2^2
    s.t.  D(Aw - e\gamma) + y >= e                                   (QP)

where D_{ii} = +/-1 denotes A+ or A- membership. The margin is maximized by minimizing (1/2) ||(w, \gamma)||_2^2.

PSVM Formulation

Starting from the QP SVM formulation, with the inequality replaced by an equality:

    min_{w, \gamma}  (\nu/2) ||y||_2^2 + (1/2) ||(w, \gamma)||_2^2
    s.t.  D(Aw - e\gamma) + y = e

Solving for y in terms of w and \gamma gives the unconstrained problem:

    min_{w, \gamma}  (\nu/2) ||e - D(Aw - e\gamma)||_2^2 + (1/2) ||(w, \gamma)||_2^2

This simple but critical modification changes the nature of the optimization problem tremendously.

Advantages of the New Formulation

- The objective function remains strongly convex.
- An explicit exact solution can be written in terms of the problem data.
- The PSVM classifier is obtained by solving a single system of linear equations in the usually small-dimensional input space.
- Exact leave-one-out correctness can be obtained in terms of the problem data.

Linear PSVM

We want to solve:

    min_{w, \gamma}  (\nu/2) ||e - D(Aw - e\gamma)||_2^2 + (1/2) ||(w, \gamma)||_2^2

Setting the gradient equal to zero gives a nonsingular system of linear equations.
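The linear PSVM training step described above reduces to a single (n+1) x (n+1) linear solve. The following NumPy sketch illustrates it; function names, the helper `classify`, and default parameter values are my own and not from the slides:

```python
import numpy as np

def linear_psvm(A, d, nu=1.0):
    """Train a linear proximal SVM.

    A  : (m, n) matrix of data points, one per row.
    d  : (m,) vector of +/-1 labels (the diagonal of D, so D e = d).
    nu : the positive weighting parameter nu from the formulation.
    Returns (w, gamma) for the classifier sign(x'w - gamma).
    """
    m, n = A.shape
    H = np.hstack([A, -np.ones((m, 1))])   # H = [A  -e]
    v = H.T @ d                            # v = H' D e  (D e is just the label vector d)
    # Solve the (n+1) x (n+1) system  (I/nu + H'H) [w; gamma] = v
    z = np.linalg.solve(np.eye(n + 1) / nu + H.T @ H, v)
    return z[:n], z[n]

def classify(w, gamma, X):
    """Apply the linear PSVM classifier sign(x'w - gamma) row-wise to X."""
    return np.sign(X @ w - gamma)
```

On two well-separated point clusters (e.g. labels +1 around (2, 2) and -1 around (-2, -2)), the returned classifier reproduces the training labels exactly; the only cost beyond forming H'H is one small dense solve.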
Linear PSVM Solution

The solution of the system gives the desired PSVM classifier:

    [w; \gamma] = (I/\nu + H'H)^{-1} H'De,   where H = [A  -e]

The linear system to solve depends on H'H, which is of size (n+1) x (n+1); n is usually much smaller than m.

Linear Proximal SVM Algorithm

Input: A, D.
1. Define H = [A  -e].
2. Calculate v = H'De.
3. Solve (I/\nu + H'H) [w; \gamma] = v.
4. Classifier: sign(w'x - \gamma).

Nonlinear PSVM Formulation

Linear PSVM (linear separating surface x'w = \gamma):

    min_{w, \gamma}  (\nu/2) ||y||_2^2 + (1/2) ||(w, \gamma)||_2^2       (QP)
    s.t.  D(Aw - e\gamma) + y = e

By QP "duality", w = A'Du. Maximizing the margin in the "dual space" gives:

    min_{u, \gamma}  (\nu/2) ||e - D(AA'Du - e\gamma)||_2^2 + (1/2) ||(u, \gamma)||_2^2

Replacing AA' by a nonlinear kernel K(A, A'):

    min_{u, \gamma}  (\nu/2) ||e - D(K(A, A')Du - e\gamma)||_2^2 + (1/2) ||(u, \gamma)||_2^2

The Nonlinear Classifier

The nonlinear classifier is K(x', A')Du = \gamma, where K: R^{m x n} x R^{n x m} -> R^{m x m} is a nonlinear kernel, e.g. the Gaussian (radial basis) kernel:

    K(A, A')_{ij} = exp(-\mu ||A_i - A_j||_2^2),   i, j = 1, ..., m

The ij-entry of K(A, A') represents the "similarity" of data points A_i and A_j.

Nonlinear PSVM

Similarly to the linear case, setting the gradient equal to zero, we obtain:

    [u; \gamma] = (I/\nu + H'H)^{-1} H'De,   with H defined slightly differently: H = [K(A, A')  -e]

Here the linear system to solve is of size (m+1) x (m+1). However, reduced kernel techniques (RSVM) can be used to reduce the dimensionality.

Nonlinear Proximal SVM Algorithm

Input: A, D.
1. Define K = K(A, A') and H = [K  -e].
2. Calculate v = H'De.
3. Solve (I/\nu + H'H) [u; \gamma] = v.
4. Classifier: sign(K(x', A')u - \gamma), with the substitution u = Du absorbed into u.
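The nonlinear algorithm has the same shape as the linear one, with H = [K(A, A') -e] and an (m+1) x (m+1) solve. A minimal NumPy sketch, assuming a Gaussian kernel and the slides' substitution u = Du (function names and defaults are my own illustration):

```python
import numpy as np

def gaussian_kernel(A, B, mu=1.0):
    """Gaussian kernel: K_ij = exp(-mu * ||A_i - B_j||_2^2)."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-mu * sq)

def nonlinear_psvm(A, d, nu=1.0, mu=1.0):
    """Train a nonlinear PSVM with a Gaussian kernel.

    Solves (I/nu + H'H) [u; gamma] = H'De with H = [K(A, A')  -e];
    the returned u already absorbs D (the substitution u = Du).
    """
    m = A.shape[0]
    K = gaussian_kernel(A, A, mu)
    H = np.hstack([K, -np.ones((m, 1))])   # H = [K  -e]
    z = np.linalg.solve(np.eye(m + 1) / nu + H.T @ H, H.T @ d)
    return z[:m], z[m]                     # u, gamma

def nonlinear_classify(u, gamma, A_train, X, mu=1.0):
    """Classifier sign(K(x', A')u - gamma), evaluated row-wise over X."""
    return np.sign(gaussian_kernel(X, A_train, mu) @ u - gamma)
```

Unlike the linear version, this fits data that no hyperplane separates, e.g. the four XOR points (1,1), (-1,-1) labeled +1 and (1,-1), (-1,1) labeled -1; the price is the (m+1)-sized system, which motivates the reduced kernel (RSVM) remark above.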