Advanced Statistical Data Mining and Optimization Methods for Machine Learning (Summer 2009)

Instructor
Dr. Myong K. (MK) Jeong, Assistant Professor, Department of Industrial & Systems Engineering and RUTCOR (Rutgers Center for Operations Research), Rutgers University
Contact: [email protected]; http://www.rci.rutgers.edu/~mjeong

Class information
Location: TBD (to be determined)
Class times: Mon and Wed, 1:30 pm to 3:30 pm
Tentative duration: July 6 to August 19 (7 weeks)
(Class times can be rearranged at the first class.)

Course Description
This course focuses on statistical data mining and machine learning. It also introduces data mining formulations based on mathematical programming (e.g., Mangasarian's work).

Textbooks
1. No textbook is required. Class materials for each topic will be provided by the instructor.
2. Recommended books:
   The Elements of Statistical Learning by T. Hastie et al., Springer
   Pattern Recognition and Machine Learning by Christopher M. Bishop, Springer
   Bayesian Data Analysis by A. Gelman et al., Chapman & Hall

Topics to Be Covered (the topics and the depth of their coverage may change depending on the audience)

O Classification
  Support vector machines (SVMs), convex optimization for machine learning
  Bayesian version of SVMs: relevance vector machine (RVM)
  Variants of SVMs: introduction to Mangasarian's work (e.g., robust SVMs, sparse SVMs, L0-norm SVM, …)

O Regression (Prediction or Calibration)
  Loss functions and regularization (ridge regression, Lasso, and others)
  SVMs for regression, Bayesian SVMs for regression
  Function approximation
  Robust regression
  Optimization-based modeling for regression

O Feature Selection and Extraction
  (1) Criteria
      Metrics and distance functions depending on data type (numeric, text, …)
      Distance between probability distributions (e.g., divergence)
      Dynamic time warping
      Separability measures
  (2) Optimization methods for feature selection
      Sequential forward/backward selection, recursive feature elimination (RFE)
      Forward/backward floating search, branch and bound method, genetic algorithms
  (3) Other topics
      Bayesian variable selection
      SVM-based feature selection
      Huge-scale feature selection

O Transformation of Variables
  Principal component analysis (PCA)
  Variants of PCA: sparse PCA, probabilistic PCA, kernel PCA
  Preprocessing of spectral data: orthogonal signal correction (OSC)
  Wavelet transform

O Decision Trees
  Classification trees
  Regression trees

O Ensemble Methods
  Bagging
  Boosting (AdaBoost)

O Logical Analysis of Data (LAD) for Classification (E. Boros et al., IEEE Transactions on Knowledge and Data Engineering, Vol. 12, No. 2, March/April 2000)
  (1) Introduction to LAD and optimization models for LAD
      Introduction to Boolean functions
      Binarization of numerical variables and identification of minimal support sets through the set covering problem
      Pattern generation and selection; construction of a discriminant
  (2) Comparison with CART and SVMs for classification

O Other Topics
  Reinforcement learning
  Clustering
  Bayesian decision theory
  Spatial data mining

Grading Policy
Homework (50%) and a class project with an oral presentation (50%)