* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Mathematical Programming in Data Mining
Survey
Document related concepts
Computational electromagnetics wikipedia , lookup
Corecursion wikipedia , lookup
Coding theory wikipedia , lookup
Genetic algorithm wikipedia , lookup
Computational complexity theory wikipedia , lookup
Data assimilation wikipedia , lookup
Selection algorithm wikipedia , lookup
Mathematical optimization wikipedia , lookup
Mathematical physics wikipedia , lookup
Mathematical economics wikipedia , lookup
Pattern recognition wikipedia , lookup
Theoretical computer science wikipedia , lookup
Transcript
Mathematical Programming in Data Mining Author: O. L. Mangasarian Advisor: Dr. Hsu Graduate: Yan-Cheng Lin Abstract Describe mathematical programming to feature selection, clustering and robust representation Outline Motivation Objective Problems Feature Selection Clustering Robust Representation Conclusion Motivation Mathematical programming has been applied to a great variety of theoretical Problems can be formulated and effectively solved as mathematical programs Objective Describe three mathematicalprogramming-based developments relevant to data mining Problems Feature Selection Clustering Robust Representation Problem - Feature Selection Discriminating between two finite point sets in n-dimensional feature space and utilizes as few of the feature as possible Formulated as mathematical program with a parametric objective function and linear constraints Problem - Clustering Assigning m points in the ndimensional real space Rn to k clusters Formulated as determining k centers in Rn, the sum of distances of each point to the nearest center is minimized Problem - Robust Representation Modeling a system of relations in a manner that preserves the validity of the representation when the data on which the model is based changes Use a sufficiently small error זּis purposely tolerated Feature Selection Use the simplest model to describe the essence of a phenomenon Binary classification problem: – discriminating between two given point sets A and B in the n-dimensional real space Rn by using as few of the ndimensions of the space as possible Binary classification W P Feature Selection the following are some defined: A B Successive Linearization Algorithm w vector is result Experimentation 32-feature Wisconsin Prognostic Breast Cancer(WPBC) N=32, m = 28, k = 118, r = 0.05, 4 features, increasing tenfold cross-validation correctness by 35.4% Clustering Determining k cluster centers, the sum of the 1-norm distances of each point in a given database to nearest cluster center is minimized Minimizing product of two linear functions on a set defined by linear inequalities K-Median Algorithm Need to solve Experimentation used as a KDD tool to mine WPBC to discover medical knowledge key observation is curves are well separated Experimentation Robust Representation model remains valid under a class of data perturbation Use זּ-tolerance zone wherein errors are disregarded Better generalization results than conventional zero-tolerance Robust Representation A is a m*n matrix, a is a m*1 vector x is a vector be “learned” find minimize of Ax - a Robust Representation = A x a = A x זּ-tolerate a Conclusion Mathematical programming codes are reliable and robust codes Problems solved demonstrate mathematical programming as versatile and effective tool for solving important problems in data mining and knowledge discovery in databases Opinion Mathematical describe can explain about complex problems and convince others, but …you must be understand it first