Support Vector Machines: Hype or Hallelujah?
Kristin Bennett
Mathematical Sciences Department, Rensselaer Polytechnic Institute
http://www.rpi.edu/~bennek
M2000, October 2-4, 2000

Outline
- Support Vector Machines for Classification
  - Linear Discrimination
  - Nonlinear Discrimination
- Extensions
- Application in Drug Design
- Hallelujah
- Hype

Support Vector Machines (SVM)
A methodology for inference based on Vapnik's Statistical Learning Theory.
Key ideas:
- "Maximize Margins"
- "Do the Dual"
- "Construct Kernels"

Best Linear Separator?
(Figures: a sequence of slides showing the same two-class data set with several candidate separating planes, asking which is best.)

Find Closest Points in Convex Hulls
(Figure: the convex hulls of the two classes, with closest points c and d.)

Plane Bisects Closest Points
(Figure: the separating plane $x \cdot w = b$ that bisects the segment joining the closest points, with normal $w = c - d$.)

Find Closest Points Using a Quadratic Program
$$\min_{\alpha}\ \tfrac{1}{2}\,\|c-d\|^{2}\quad\text{s.t.}\quad c=\sum_{i\in\text{Class }1}\alpha_i x_i,\quad d=\sum_{i\in\text{Class }-1}\alpha_i x_i,$$
$$\sum_{i\in\text{Class }1}\alpha_i=1,\quad\sum_{i\in\text{Class }-1}\alpha_i=1,\quad\alpha_i\ge 0,\ \ i=1,\dots,\ell.$$
Many existing and new solvers.

Best Linear Separator: Supporting Plane Method
Maximize the distance between two parallel supporting planes
$$x\cdot w=b+1\qquad\text{and}\qquad x\cdot w=b-1.$$
Distance = "margin" = $2/\|w\|$.

Maximize the Margin Using a Quadratic Program
$$\min_{w,b}\ \tfrac{1}{2}\|w\|^{2}\quad\text{s.t.}\quad x_i\cdot w\ge b+1\ \ (i\in\text{Class }1),\qquad x_i\cdot w\le b-1\ \ (i\in\text{Class }-1).$$

Dual of the Closest Points Method Is the Supporting Plane Method
With labels $y_i=+1$ for $i\in\text{Class }1$ and $y_i=-1$ for $i\in\text{Class }-1$, the closest-points program
$$\min_{\alpha}\ \tfrac{1}{2}\Big\|\sum_{i=1}^{\ell}y_i\alpha_i x_i\Big\|^{2}\quad\text{s.t.}\quad\sum_{i\in\text{Class }1}\alpha_i=1,\ \sum_{i\in\text{Class }-1}\alpha_i=1,\ \alpha_i\ge 0$$
is dual to the supporting plane program
$$\min_{w,b}\ \tfrac{1}{2}\|w\|^{2}\quad\text{s.t.}\quad y_i(x_i\cdot w-b)-1\ge 0,\ \ i=1,\dots,\ell.$$
The solution depends only on the support vectors: $w=\sum_{i=1}^{\ell}y_i\alpha_i x_i$.

Statistical Learning Theory
- Misclassification error and function complexity together bound the generalization error.
- Maximizing margins minimizes complexity and thus "eliminates" overfitting.
- The solution depends only on the support vectors, not on the number of attributes.

Margins and Complexity
(Figures: a skinny margin is more flexible and thus more complex; a fat margin is less complex.)

Linearly Inseparable Case
(Figure: the convex hulls intersect, so the same argument won't work.)

Reduced Convex Hulls Don't Intersect
Shrink each hull by adding an upper bound $D$ on the multipliers:
$$d=\sum_{i\in\text{Class }1}\alpha_i x_i,\qquad\sum_{i\in\text{Class }1}\alpha_i=1,\qquad 0\le\alpha_i\le D,$$
e.g., $D=\tfrac{1}{2}$.

Find the Closest Points, Then Bisect
$$\min_{\alpha}\ \tfrac{1}{2}\Big\|\sum_{i\in\text{Class }1}\alpha_i x_i-\sum_{i\in\text{Class }-1}\alpha_i x_i\Big\|^{2}\quad\text{s.t.}\quad\sum_{i\in\text{Class }1}\alpha_i=1,\ \sum_{i\in\text{Class }-1}\alpha_i=1,\ 0\le\alpha_i\le D.$$
No change except for $D$, and $D$ determines the number of support vectors.

Linearly Inseparable Case: Supporting Plane Method
Just add a non-negative error vector $z$:
$$\min_{w,b,z}\ \tfrac{1}{2}\|w\|^{2}+C\sum_{i=1}^{\ell}z_i\quad\text{s.t.}\quad y_i(x_i\cdot w-b)+z_i\ge 1,\ z_i\ge 0,\ \ i=1,\dots,\ell.$$

Dual of the Closest Points Method Is the Supporting Plane Method (Inseparable Case)
$$\min_{\alpha}\ \tfrac{1}{2}\Big\|\sum_{i=1}^{\ell}y_i\alpha_i x_i\Big\|^{2}\quad\text{s.t.}\quad\sum_{i\in\text{Class }1}\alpha_i=1,\ \sum_{i\in\text{Class }-1}\alpha_i=1,\ 0\le\alpha_i\le D$$
is dual to
$$\min_{w,b,z}\ \tfrac{1}{2}\|w\|^{2}+C\sum_{i=1}^{\ell}z_i\quad\text{s.t.}\quad y_i(x_i\cdot w-b)+z_i\ge 1,\ z_i\ge 0,\ \ i=1,\dots,\ell.$$
Again the solution depends only on the support vectors: $w=\sum_{i=1}^{\ell}y_i\alpha_i x_i$.

Nonlinear Classification
For $x=(a,b)$, the linear discriminant uses $x\cdot w=w_1 a+w_2 b$.
Mapping with $\Phi(x)=(a,\ b,\ ab,\ a^{2},\ b^{2})$ gives $\Phi(x)\cdot w=w_1 a+w_2 b+w_3 ab+w_4 a^{2}+w_5 b^{2}$.

Nonlinear Classification: Map to a Higher-Dimensional Space
IDEA: Map each point to a higher-dimensional feature space and construct a linear discriminant in that space.
Define $\Phi:x\mapsto\Phi(x)$, $\mathbb{R}^{n}\to\mathbb{R}^{n'}$, $n'\gg n$. The dual SVM becomes:
$$\min_{\alpha}\ \tfrac{1}{2}\sum_{i=1}^{\ell}\sum_{j=1}^{\ell}y_i y_j\alpha_i\alpha_j\,\Phi(x_i)\cdot\Phi(x_j)-\sum_{i=1}^{\ell}\alpha_i\quad\text{s.t.}\quad\sum_{i=1}^{\ell}y_i\alpha_i=0,\ \alpha_i\ge 0,\ \ i=1,\dots,\ell.$$
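The feature-map idea above can be checked numerically. Below is a minimal sketch (my illustration, not from the talk; NumPy and the random test vectors are assumptions) using a slightly reweighted quadratic map: for it, the feature-space inner product equals the degree-2 polynomial kernel $(u\cdot v+1)^{2}$ introduced on the next slide, so the dual QP never needs $\Phi$ explicitly.

```python
import numpy as np

def phi(x):
    # Weighted quadratic feature map for 2-D input x = (a, b).
    # Unlike the slide's unweighted map (a, b, ab, a^2, b^2), the
    # constant 1 and the sqrt(2) factors make the kernel identity exact.
    a, b = x
    return np.array([1.0, np.sqrt(2) * a, np.sqrt(2) * b,
                     np.sqrt(2) * a * b, a ** 2, b ** 2])

rng = np.random.default_rng(0)
u, v = rng.normal(size=2), rng.normal(size=2)

explicit = phi(u) @ phi(v)     # inner product after mapping to R^6
implicit = (u @ v + 1) ** 2    # kernel evaluation, no mapping needed
assert np.isclose(explicit, implicit)

# The dual QP touches the data only through these inner products, so the
# l-by-l Gram matrix K[i, j] = K(x_i, x_j) is all a solver ever needs.
```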
Generalized Inner Products via Hilbert-Schmidt Kernels (Courant and Hilbert, 1953)
$$\Phi(u)\cdot\Phi(v)=K(u,v)\ \text{ for certain }\Phi\text{ and }K,$$
e.g.:
- Degree-$d$ polynomial: $K(u,v)=(u\cdot v+1)^{d}$
- Radial basis function machine: $K(u,v)=\exp\!\big(-\|u-v\|^{2}/(2\sigma^{2})\big)$
- Two-layer neural network: $K(u,v)=\operatorname{sigmoid}\big(\eta\,(u\cdot v)+c\big)$

Final Classification via Kernels
The dual SVM becomes:
$$\min_{\alpha}\ \tfrac{1}{2}\sum_{i=1}^{\ell}\sum_{j=1}^{\ell}y_i y_j\alpha_i\alpha_j K(x_i,x_j)-\sum_{i=1}^{\ell}\alpha_i\quad\text{s.t.}\quad\sum_{i=1}^{\ell}y_i\alpha_i=0,\ \alpha_i\ge 0,\ \ i=1,\dots,\ell.$$

Final SVM Algorithm
- Solve the dual SVM QP.
- Recover the primal variable $b$.
- Classify a new $x$:
$$f(x)=\operatorname{sign}\Big(\sum_{i:\,\alpha_i>0}\alpha_i y_i K(x,x_i)-b\Big).$$
The solution depends only on the support vectors.

Support Vector Machines (SVM)
Key formulation ideas:
- "Maximize Margins"
- "Do the Dual"
- "Construct Kernels"
Plus generalization error bounds and practical algorithms.

SVM Extensions
- Regression
- Variable Selection
- Boosting
- Density Estimation
- Unsupervised Learning
  - Novelty/Outlier Detection
  - Feature Detection
  - Clustering

Example in Drug Design
- Goal: predict the bio-reactivity of molecules in order to decrease drug development time.
- Target: predict the logarithm of the inhibition concentration for site "A" on the cholecystokinin (CCK) molecule.
- Constructs a quantitative structure-activity relationship (QSAR) model.

SVM Regression: $\varepsilon$-Insensitive Loss Function
(Figure: the regression estimate $x\cdot w-b$ with a tube of half-width $\varepsilon$ around it; only points outside the tube, where $|(x\cdot w-b)-y|>\varepsilon$, incur loss.)

SVM Minimizes Underestimate + Overestimate
$$\min_{w,b,z,z^{*}}\ \|w\|_{1}+C\sum_{i=1}^{\ell}\big(z_i+z_i^{*}\big)$$
$$\text{s.t.}\quad\sum_{j=1}^{\ell}w_j K(x_i,x_j)-b-y_i\le z_i,\qquad y_i-\sum_{j=1}^{\ell}w_j K(x_i,x_j)+b\le z_i^{*},\qquad z_i,z_i^{*}\ge 0,\ \ i=1,\dots,\ell.$$

LCCKA Problem
- Training data: 66 molecules.
- 323 original attributes, which are wavelet coefficients of TAE descriptors.
- A 39-attribute subset selected by a linear 1-norm SVM (with no kernels).
- For details, see the DDASSL project link off of http://www.rpi.edu/~bennek.
- Test-set results reported.

LCCKA Prediction
(Figure: scatter plot of predicted versus true values for the LCCKA test set estimates; $Q^{2}=0.25$.)

Many Other Applications
- Speech Recognition
- Database Marketing
- Quark Flavors in High-Energy Physics
- Dynamic Object Recognition
- Knock Detection in Engines
- Protein Sequence Problem
- Text Categorization
- Breast Cancer Diagnosis
See: http://www.clopinet.com/isabelle/Projects/SVM/applist.html

Hallelujah!
- Generalization theory and practice meet.
- A general methodology for many types of problems.
- Same program + new kernel = new method.
- No problems with local minima.
- Few model parameters; selects capacity.
- Robust optimization methods.
- Successful applications.
BUT…

Hype?
- Will SVMs beat my best hand-tuned method Z for problem X?
- Do SVMs scale to massive datasets?
- How to choose $C$ and the kernel?
- What is the effect of attribute scaling?
- How to handle categorical variables?
- How to incorporate domain knowledge?
- How to interpret the results?

Support Vector Machine Resources
- http://www.support-vector.net/
- http://www.kernel-machines.org/
- Links off my web page: http://www.rpi.edu/~bennek
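To close the loop on "Same program + new kernel = new method", here is a minimal sketch using scikit-learn's SVC (a modern library, not part of the original talk); the toy dataset, $C$, and kernel parameters are arbitrary choices for illustration.

```python
from sklearn import svm
from sklearn.datasets import make_moons

# Hypothetical two-class toy data, just to exercise the pipeline.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# One program, three kernels: linear, (u.v + 1)^2, and RBF.
for params in ({"kernel": "linear"},
               {"kernel": "poly", "degree": 2, "gamma": 1.0, "coef0": 1.0},
               {"kernel": "rbf", "gamma": 0.5}):
    clf = svm.SVC(C=1.0, **params).fit(X, y)
    # The fitted model keeps only the support vectors; its decision rule
    # is sign(sum_i alpha_i y_i K(x, x_i) + b), as on the "Final SVM
    # Algorithm" slide (scikit-learn's sign convention for b differs).
    print(params, "support vectors:", int(clf.n_support_.sum()),
          "train accuracy:", clf.score(X, y))
```

Swapping the kernel changes the method (linear, polynomial, or RBF machine) while the optimization program stays the same.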