Incremental Reduced Support Vector Machines
Yuh-Jye Lee, Hung-Yi Lo and Su-Yun Huang
National Taiwan University of Science and Technology and Institute of Statistical Science, Academia Sinica
2003 International Conference on Informatics, Cybernetics, and Systems, ISU, Kaohsiung, Dec. 14, 2003

Outline
- Support Vector Machines for classification problems: linear and nonlinear SVMs
- Difficulties with nonlinear SVMs for large problems: storage and computational complexity
- Reduced Support Vector Machines
- Incremental Reduced Support Vector Machines
- Numerical results
- Conclusions

Support Vector Machines (SVMs): Powerful Tools for Data Mining
- SVMs have become one of the most promising learning algorithms for classification and regression
- SVMs have a sound theoretical foundation, based on statistical learning theory
- SVMs have an optimally defined separating surface
- SVMs can be trained very efficiently and achieve high accuracy
- SVMs can be extended from the linear to the nonlinear case by using kernel functions

Support Vector Machines for Classification
(Figure: maximizing the margin between the bounding planes of the two classes A+ and A-.)

Support Vector Machine Formulation
Solve the quadratic program (QP), in its standard soft-margin form, for some ν > 0:

  min_{w, γ, y}  ν eᵀy + (1/2) wᵀw
  s.t.  D(Aw − eγ) + y ≥ e,  y ≥ 0,

where D is the diagonal matrix whose ±1 entries denote A+ or A− membership. SSVM, the Smooth Support Vector Machine, is an efficient SVM algorithm proposed by Yuh-Jye Lee.

Nonlinear Support Vector Machine
- Extend to nonlinear cases by using kernel functions
- The value of the kernel function represents an inner product in the feature space
- Map data from the input space to a higher dimensional feature space where the data can be separated linearly
- Nonlinear SVM formulation (Aw is replaced by K(A, Aᵀ)u):

  min_{u, γ, y}  ν eᵀy + (1/2) uᵀu
  s.t.  D(K(A, Aᵀ)u − eγ) + y ≥ e,  y ≥ 0

Difficulties with Nonlinear SVM for Large Problems
- The nonlinear kernel matrix K(A, Aᵀ) is fully dense
- Long CPU time to compute its m × m entries
- Runs out of memory while storing the full kernel matrix
- The computational complexity of nonlinear SSVM depends on the number of training points m
- The separating surface depends on almost the entire dataset
- Need to store the entire dataset after solving the problem

Reduced Support Vector Machines (RSVM): Overcoming Computational and Storage Difficulties by Using a Rectangular Kernel
- Choose a small random sample Ā of A; the small random sample should be representative of the entire dataset
- Typically Ā contains 1% to 10% of the rows of A
- Replace the full kernel K(A, Aᵀ) by the rectangular kernel K(A, Āᵀ), with the correspondingly reduced variables, in the nonlinear SSVM
- Only the m × m̄ entries of the rectangular kernel need to be computed and stored (a code sketch of this rectangular kernel appears after the observations below)
- The computational complexity reduces accordingly, since it now depends on the reduced set size m̄
- The nonlinear separator only depends on the reduced set Ā

The Reduced Set Plays the Most Important Role in RSVM
It is natural to raise two questions:
- Is there a way to choose the reduced set, other than random selection, so that RSVM will have better performance?
- Is there a mechanism that determines the size of the reduced set automatically or dynamically?
The Incremental Reduced Support Vector Machine (IRSVM) is proposed to answer these questions.

Our Observations (I)
- The nonlinear separating surface is a linear combination of a set of kernel functions
- If the kernel functions are very similar, the hypothesis space spanned by these kernel functions will be very limited

Our Observations (II)
- Start with a very small reduced set, then add a new data point only when its kernel function is dissimilar to the current set of functions
- These points contribute the most extra information
- How do we measure this dissimilarity?
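As referenced in the RSVM section above, the following is a minimal sketch of computing the reduced (rectangular) kernel with NumPy. The Gaussian kernel choice, the value of gamma, the 5% sampling rate, and the function name gaussian_kernel are illustrative assumptions, not details taken from the talk.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=0.1):
    """Gaussian (RBF) kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 10))       # full training data: m = 2000 points
m_bar = A.shape[0] // 20                  # reduced set size: about 5% of the rows
A_bar = A[rng.choice(A.shape[0], size=m_bar, replace=False)]

# The full kernel K(A, A') would need m x m = 4,000,000 entries; the rectangular
# kernel K(A, A_bar') needs only m x m_bar = 200,000 entries to compute and store.
K_reduced = gaussian_kernel(A, A_bar)
print(K_reduced.shape)                    # (2000, 100)
```

The dissimilarity measurement described next works with the columns of this same rectangular kernel matrix.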
Dissimilarity Measurement: Solving Least Squares Problems
- The information criterion: a new point x is added only when the distance from its kernel vector K(A, xᵀ) to the column space of the current reduced kernel matrix K(A, Āᵀ) is greater than a threshold
- This distance can be determined by solving a least squares problem

  min_β ‖K(A, Āᵀ)β − K(A, xᵀ)‖²

- The least squares problem has a unique solution β* (when the reduced kernel matrix has full column rank), and the distance is the residual norm ‖K(A, Āᵀ)β* − K(A, xᵀ)‖

IRSVM Algorithm Pseudo-code (Sequential Version)
1. Randomly choose two data points from the training data as the initial reduced set
2. Compute the reduced kernel matrix
3. For each data point not in the reduced set:
4.   Compute its kernel vector
5.   Compute the distance from the kernel vector
6.   to the column space of the current reduced kernel matrix
7.   If its distance exceeds a certain threshold,
8.   add this point to the reduced set and form the new reduced kernel matrix
9. Until several successive failures happen in line 7
10. Solve the QP problem of the nonlinear SVM with the obtained reduced kernel
11. A new data point is classified by the resulting separating surface
(A runnable sketch of the selection loop in lines 1-9 appears at the end of this section.)

Speeding up IRSVM
- Note that we have to solve the least squares problem many times, and each solve against an m × m̄ reduced kernel matrix costs on the order of m·m̄² operations
- The dominant cost comes from factoring the current reduced kernel matrix, and this factorization can be shared across many candidate points rather than repeated for each one
- Taking advantage of this fact, we propose a batch version of IRSVM that examines a batch of points at once

IRSVM Algorithm Pseudo-code (Batch Version)
1. Randomly choose two data points from the training data as the initial reduced set
2. Compute the reduced kernel matrix
3. For each batch of data points not in the reduced set:
4.   Compute their kernel vectors
5.   Compute the corresponding distances from these kernel vectors
6.   to the column space of the current reduced kernel matrix
7.   For those points whose distance exceeds a certain threshold,
8.   add those points to the reduced set and form the new reduced kernel matrix
9. Until no data points in a batch were added in lines 7-8
10. Solve the QP problem of the nonlinear SVM with the obtained reduced kernel
11. A new data point is classified by the resulting separating surface

Numerical Results: IRSVM on Four Public Data Sets

Conclusions
- IRSVM is an advanced algorithm built on RSVM
- It starts with an extremely small reduced set and sequentially expands it to include informative data points
- It determines the size of the reduced set automatically and dynamically, rather than requiring it to be pre-specified
- The reduced set generated by IRSVM is more representative
- All advantages of RSVM for dealing with large-scale nonlinear classification problems are retained
- Experimental tests show that IRSVM uses a smaller reduced set without sacrificing classification accuracy
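To make the selection loop concrete, here is a minimal sketch of the sequential reduced-set selection (lines 1-9 of the pseudo-code above) in NumPy. The threshold value, the failure-count stopping rule, and the function name select_reduced_set are illustrative assumptions; the final QP/SSVM solve in line 10 is not included.

```python
import numpy as np

def select_reduced_set(A, kernel, threshold=1e-2, max_failures=20, seed=0):
    """Sequential IRSVM-style reduced-set selection (the final QP solve is omitted).

    A candidate kernel column joins the reduced set only when its least squares
    distance to the column space of the current reduced kernel matrix exceeds
    `threshold`.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(A.shape[0])
    reduced_idx = list(order[:2])              # lines 1-2: two random initial points
    K_reduced = kernel(A, A[reduced_idx])      # m x 2 reduced kernel matrix

    failures = 0
    for i in order[2:]:                        # line 3: scan the remaining points
        k_vec = kernel(A, A[[i]])              # line 4: candidate kernel column, m x 1
        # Lines 5-6: least squares distance to the column space of K_reduced.
        beta = np.linalg.lstsq(K_reduced, k_vec, rcond=None)[0]
        dist = np.linalg.norm(K_reduced @ beta - k_vec)
        if dist > threshold:                   # lines 7-8: dissimilar enough, keep it
            reduced_idx.append(i)
            K_reduced = np.hstack([K_reduced, k_vec])
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:       # line 9: several successive failures
                break
    return reduced_idx, K_reduced

# Example usage with the gaussian_kernel helper sketched earlier:
#   idx, K_r = select_reduced_set(A, gaussian_kernel, threshold=1e-2)
```

The returned indices and rectangular kernel would then be passed to the RSVM training step (line 10). The batch version would evaluate the least squares distances for a whole block of candidate kernel columns against a single factorization of K_reduced before updating the reduced set.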