Multivariate classification trees based on minimum features discrete support vector machines

Carlo Vercellis
Dipartimento di Ingegneria Gestionale - Politecnico di Milano
P.za Leonardo da Vinci 32, 20133 Milano
Tel. (+39)0223992784 - Fax (+39)0223992772
[email protected]

Abstract

Decision trees are widely recognized as one of the most effective techniques for classification in the data mining context, particularly in business-oriented applications such as those arising in customer relationship management. We propose an algorithm for generating decision trees in which multivariate splitting rules are based on the new concept of discrete support vector machines. By this we denote a discrete version of SVMs in which the error is expressed as the count of misclassified instances, in place of the misclassification distance considered by traditional SVMs, and an additional term is included in order to reduce the complexity of the rule generated. The resulting mixed integer programming problem, formulated at each node of the decision tree, is then efficiently solved via a sequential LP-based heuristic. We then devise a procedure for generating decision trees in which a multivariate splitting rule is derived at each node from the approximate solution of the proposed discrete SVM. Alternative approximation algorithms based upon truncated branch and bound and tabu search methods are also discussed. Computational tests performed on several well-known benchmark datasets indicate that our algorithm consistently outperforms other classification approaches in terms of accuracy, and is therefore capable of good generalization on future unseen data. Further testing on marketing datasets of realistic size has proven the applicability of our algorithm to massive classification tasks.

References

C. Orsenigo, C. Vercellis, “Discrete support vector decision trees via tabu-search”, Journal of Computational Statistics and Data Analysis, under review.
C. Orsenigo, C. Vercellis, “Multivariate classification trees based on minimum features discrete support vector machines”, IMA Journal of Management Mathematics, under review.
C. Orsenigo, C. Vercellis, “Rules induction through discrete support vector decision trees”, Data Mining and Knowledge Discovery. Approaches Based on Rule Induction Techniques, E. Triantaphyllou et al. eds., Kluwer, Dordrecht, 2003, in press.
D. La Torre, C. Vercellis, “C^{1,1} approximations of generalized support vector machines”, Journal of Concrete and Applicable Mathematics 1 (2003), 125-134.
D. La Torre, C. Vercellis, “On cardinality constrained optimization problems and applications”, Journal of Optimization Theory and Applications, submitted.
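
Illustrative note. The abstract contrasts two error measures: the misclassification distance used by traditional SVMs and the count of misclassified instances used by the discrete version. The sketch below evaluates both for one fixed hyperplane. It is not the paper's formulation or algorithm. The data, the coefficients w and b, and all function names are hypothetical. "Misclassified" is assumed to mean lying on the wrong side of the hyperplane, and rule complexity is assumed to be measured by the number of nonzero coefficients, as the title's "minimum features" suggests.

```python
# Illustrative only: compares two error measures for a fixed hyperplane
# w . x - b = 0. All values and names here are made up for the example.

def margins(points, labels, w, b):
    """Return y_i * (w . x_i - b) for every instance."""
    return [y * (sum(wj * xj for wj, xj in zip(w, x)) - b)
            for x, y in zip(points, labels)]

def hinge_error(points, labels, w, b):
    """Distance-style error of a traditional soft-margin SVM:
    the sum of max(0, 1 - margin) over all instances."""
    return sum(max(0.0, 1.0 - m) for m in margins(points, labels, w, b))

def misclassification_count(points, labels, w, b):
    """Discrete error: number of instances on the wrong side
    of the hyperplane (assumed definition)."""
    return sum(1 for m in margins(points, labels, w, b) if m <= 0)

def active_features(w, tol=1e-9):
    """Number of nonzero coefficients, an assumed proxy for rule complexity."""
    return sum(1 for wj in w if abs(wj) > tol)

# Four well-separated instances plus one distant outlier of class +1.
points = [(2.0, 1.0), (1.5, -0.5), (-1.0, 0.5), (-2.0, -1.0), (-3.0, 0.0)]
labels = [1, 1, -1, -1, 1]
w, b = (1.0, 0.0), 0.0  # rule uses only the first feature

print("hinge error:", hinge_error(points, labels, w, b))
print("misclassified:", misclassification_count(points, labels, w, b))
print("features used:", active_features(w))
```

In this toy case the single outlier contributes its whole distance to the hinge error but adds only one unit to the discrete count, which is the distinction the abstract draws between the two objectives.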