Support Vector Machine & Classification using Weka
(Brain, Computation, and Neural Learning)
Byoung-Hee Kim, Biointelligence Lab., CSE, Seoul National University
(C) 2009-2011, SNU Biointelligence Lab, http://bi.snu.ac.kr/

Agenda
- Support Vector Machine*
  - History and basic concept of SVM
  - How SVM works: optimization and the kernel trick
  - Properties of SVM
- SVM classification in Weka
  - Review of Weka & datasets for Project #2
  - Using SVM in Weka
* Many slides are from "Martin Law, A Simple Introduction to Support Vector Machines, Lecture Slides for CSE 802, Michigan State Univ."

History of Support Vector Machine
- SVM was first introduced in 1992
- SVM became popular because of its success in handwritten digit recognition
- SVM is now regarded as an important example of "kernel methods", one of the key areas in machine learning

Popularity
- SVM is regarded as a first choice for classification problems
- Many commercial analysis tools contain SVM: IBM SPSS Statistics, SAS Enterprise Miner, Oracle Data Mining (ODM)

Optimization
- Sequential minimal optimization (SMO) is one of the most popular algorithms for large-margin classification by SVM
- The examples closest to the hyperplane are the support vectors: the xi with non-zero multipliers αi

Hard Margin vs. Soft Margin
- Hard margin formulation vs. soft margin formulation
- The parameter C is the upper bound of the αi
- C can be viewed as a way to control overfitting

Extension to Non-linear Decision Boundary
- How can the linear SVM be generalized to a nonlinear decision boundary?
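As a concrete warm-up, here is a minimal Python sketch of the XOR example from these slides. It uses a +1/-1 encoding of the inputs (an assumption of this sketch; the slides do not fix an encoding): in the input space the four XOR points are not linearly separable, but after adding the product feature x1*x2 a single linear rule classifies them all.

```python
# XOR with +1/-1 inputs: the class is positive exactly when the signs differ.
points = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
labels = [-1, +1, +1, -1]

def phi(x1, x2):
    """Transform input space -> feature space: (x1, x2) -> (x1, x2, x1*x2)."""
    return (x1, x2, x1 * x2)

# In the feature space the hyperplane w = (0, 0, -1), b = 0 separates the
# classes; its decision value is simply -x1*x2.
w = (0.0, 0.0, -1.0)
predictions = [1 if sum(wi * fi for wi, fi in zip(w, phi(x1, x2))) > 0 else -1
               for (x1, x2) in points]
print(predictions == labels)  # True: linearly separable after the transform
```

The linear rule in the 3-D feature space corresponds to the nonlinear boundary x1*x2 = 0 in the original 2-D input space, which is exactly the point of the transformation.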
- Key idea: transform xi to a higher-dimensional space to "make life easier"
  - Input space: the space where the points xi are located
  - Feature space: the space of the φ(xi) after transformation
- Why transform?
  - A linear operation in the feature space is equivalent to a nonlinear operation in the input space
  - Classification can become easier with a proper transformation; in the XOR problem, for example, adding the new feature x1x2 makes the problem linearly separable

Kernel Functions

Properties of SVM
- Flexibility in choosing a similarity function
- Sparseness of the solution when dealing with large data sets: only the support vectors are used to specify the separating hyperplane
- Ability to handle large feature spaces: complexity does not depend on the dimensionality of the feature space
- Overfitting can be controlled by the soft-margin approach
- Nice mathematical property: a simple convex optimization problem that is guaranteed to converge to a single global solution
- Feature selection

Introduction to Weka
- Weka: data mining software in Java
- Weka is a collection of machine learning algorithms for data mining & machine learning tasks
- What you can do with Weka: data pre-processing, feature selection, classification, regression, clustering, association rules, and visualization
- Weka is open-source software issued under the GNU General Public License
- How to get it?
  http://www.cs.waikato.ac.nz/ml/weka/ or just type 'Weka' in Google

Dataset #1: Pima Indians Diabetes
- Description
  - The Pima Indians have the highest prevalence of diabetes in the world
  - We will build classification models that diagnose whether a patient shows signs of diabetes
  - http://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes
- Configuration of the data set
  - 768 instances, 8 attributes
  - Attributes: age, number of times pregnant, results of medical tests/analyses; all numeric (integer or real-valued)
  - A discretized version of the set will also be provided
  - Class value = 1 (positive example), interpreted as "tested positive for diabetes": 268 instances
  - Class value = 0 (negative example): 500 instances

Dataset #2: Handwritten Digits (MNIST)
- Description
  - The MNIST database of handwritten digits contains digits written by office workers and students
  - We will build a recognition model based on classifiers, using a reduced subset of MNIST
  - http://yann.lecun.com/exdb/mnist/
- Configuration of the data set
  - Attributes: gray-level pixel values of a 28x28 image; 784 attributes (all integers in 0-255)
  - Full MNIST set: 60,000 training examples, 10,000 test examples
  - For our practice, a reduced set with 800 examples is used
  - Class value: 0-9, representing the digits 0 to 9

Practice
- Basic
  - Comparing the performance of algorithms: MultilayerPerceptron vs. J48 vs.
SVM
  - Checking the trained model (structure & parameters)
  - Tuning parameters to get better models
  - Understanding 'Test options' & 'Classifier output' in Weka
- Advanced
  - Building committee machines using 'meta' algorithms for classification
  - Preprocessing / data manipulation: applying a 'Filter'
  - Batch experiments with the 'Experimenter'
  - Designing & running a batch process with 'KnowledgeFlow'

Mini-project
- Classification problem with Weka
- Data sets: 3 different data sets; you should include at least one set from the UCI ML repository (http://archive.ics.uci.edu/ml/) and the MNIST set
- Classification methods
  - MLP: iterations, learning rate, momentum, number of hidden nodes
  - SVM: will be addressed in the next class (following slides)
  - J48: default options only

Support Vector Machine in Weka
- Load a file that contains the training data by clicking the 'Open file' button ('ARFF' and 'CSV' formats are readable)
- Click the 'Classify' tab
- Click the 'Choose' button and select 'weka - functions - SMO'
- Click 'SMO' and set the parameters for SMO
- Set the parameters for testing, then click 'Start' for learning

How SMO Works in Weka
- Implements John Platt's sequential minimal optimization (SMO) algorithm for training a support vector classifier
- Multi-class problems are solved using pairwise classification
- To obtain proper probability estimates, use the option that fits logistic regression models to the outputs of the support vector machine
- References
  - J. Platt: Fast Training of Support Vector Machines using Sequential Minimal Optimization. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, 1998.
  - S.S. Keerthi, S.K. Shevade, C. Bhattacharyya, K.R.K. Murthy (2001). Improvements to Platt's SMO Algorithm for SVM Classifier Design. Neural Computation, 13(3):637-649.
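The pairwise (one-vs-one) strategy mentioned above can be sketched in a few lines of Python. This is an illustrative sketch, not Weka's implementation: for k classes it builds one binary decision per unordered pair of classes (k(k-1)/2 in total) and lets each pairwise winner cast a vote; the `decide` function below is a hypothetical stand-in for the trained binary SVMs.

```python
from itertools import combinations
from collections import Counter

def pairwise_vote(classes, decide):
    """One-vs-one prediction: one binary decision per unordered pair of
    classes; each pairwise winner gets a vote and the most-voted class wins."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[decide(a, b)] += 1
    return votes.most_common(1)[0][0]

digits = list(range(10))                        # the 10 MNIST digit classes
n_pairs = len(list(combinations(digits, 2)))
print(n_pairs)                                  # 45 = 10*9/2 binary classifiers

# Hypothetical stand-in for the 45 trained binary SVMs: each pairwise decision
# prefers the class whose label is closer to 7, so the committee predicts 7.
decide = lambda a, b: a if abs(a - 7) < abs(b - 7) else b
print(pairwise_vote(digits, decide))            # 7
```

For the 10-class MNIST task this means 45 binary SVMs are trained behind the scenes, which is why multi-class SMO runs take noticeably longer than binary ones.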
Parameters of SMO in Weka
- buildLogisticModels: whether to fit logistic models to the outputs (for proper probability estimates)
- numFolds: the number of folds for the cross-validation used to generate training data for the logistic models
- randomSeed: random number seed for the cross-validation
- c: the complexity parameter C, the upper bound of the αi
- filterType: determines how/if the data will be transformed
- kernel: the kernel to use

Parameter Setting Guide
- Suggested: buildLogisticModels: True; numFolds: 3 or 5; randomSeed: any value
- Your own choice: C (complexity parameter, upper bound of the αi), filterType, the kernel and its parameters, Debug (if on, you can see intermediate results)
- Do not change: epsilon, checksTurnedOff, toleranceParameter

Parameter Setting Guide - Kernel
- PolyKernel
  - Try various exponents; floating-point values are allowed
  - An exponent of 1.0 gives a linear kernel
- RBFKernel
  - Try various gammas
  - The gamma value corresponds to the inverse of the variance (the width of the kernel)

Test Options and Classifier Output
- Set the data set used for evaluation
- There are various metrics for evaluation

Committee Machine in Weka
- Using committee machines / ensemble learning in Weka
  - Boosting: AdaBoostM1
  - Voting committee: Vote
  - Bagging

Data Manipulation with Filter in Weka
- Attribute: selection, discretization
- Instance: re-sampling, selecting specified folds

Using Experimenter in Weka
- A tool for 'batch' experiments
- Click 'New'
- Select the 'Run' tab and click 'Start'
- If it has finished successfully, click the 'Analyse' tab and see the summary
- Set the experiment type / iteration control and the datasets / algorithms (before running)

KnowledgeFlow for Analysis Process Design
- Comparable to the 'Process Flow Diagram' of SAS® Enterprise Miner
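To make the kernel parameter guide above concrete, here is a small Python sketch of the two kernels discussed for SMO. It assumes the standard forms: a polynomial kernel ⟨x,z⟩^p (or (⟨x,z⟩+1)^p when lower-order terms are included, as Weka's PolyKernel optionally does) and an RBF kernel exp(-γ‖x-z‖²), where a larger gamma means a narrower kernel, matching the "inverse of the variance" remark.

```python
import math

def poly_kernel(x, z, p=2.0, use_lower_order=False):
    """Polynomial kernel: <x,z>^p, or (<x,z>+1)^p with lower-order terms."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + 1.0) ** p if use_lower_order else dot ** p

def rbf_kernel(x, z, gamma=0.01):
    """RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = [1.0, 2.0], [2.0, 1.0]
print(poly_kernel(x, z, p=1.0))   # 4.0: exponent 1.0 reduces to the linear kernel
print(poly_kernel(x, z, p=2.0))   # 16.0
print(rbf_kernel(x, x))           # 1.0: a point compared with itself
print(rbf_kernel(x, z, gamma=0.01) > rbf_kernel(x, z, gamma=1.0))  # True
```

The last line shows why gamma controls the kernel width: with a larger gamma the similarity between distinct points decays faster, so the decision boundary can bend more tightly around individual training examples (and overfit if gamma is too large).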