Multivariate classification trees based on minimum
features discrete support vector machines
Carlo Vercellis
Dipartimento di Ingegneria Gestionale - Politecnico di Milano
P.za Leonardo da Vinci 32, 20133 Milano
Tel. (+39)0223992784 - Fax (+39)0223992772
[email protected]
Abstract
Decision trees have been widely recognized as one of the most effective techniques for classification in the
data mining context, particularly for business-oriented applications such as those arising in
customer relationship management. We propose an algorithm for generating decision trees in
which multivariate splitting rules are based on the new concept of discrete support vector machines. By this
we denote a discrete version of SVMs in which the error is properly expressed as the count of misclassified
instances, in place of the misclassification distance considered by traditional SVMs, and an additional term is
included in order to reduce the complexity of the generated rule. The resulting mixed integer programming
problem formulated at each node of the decision tree is then efficiently solved via a sequential LP-based
heuristic. We then devise a procedure for generating decision trees in which a multivariate splitting rule is
derived at each node from the approximate solution of the proposed discrete SVM. Alternative
approximation algorithms based upon truncated branch and bound and tabu search methods are also
discussed. Computational tests performed on several well-known benchmark datasets indicate that our
algorithm consistently outperforms other classification approaches in terms of accuracy, and is therefore
capable of good generalization on future unseen data. Further testing on marketing datasets of realistic size
has demonstrated the applicability of our algorithm to massive classification tasks.
References
C. Orsenigo, C. Vercellis, “Discrete support vector decision trees via tabu-search”, Journal of Computational Statistics
and Data Analysis, under review.
C. Orsenigo, C. Vercellis, “Multivariate classification trees based on minimum features discrete support vector
machines”, IMA Journal of Management Mathematics, under review.
C. Orsenigo, C. Vercellis, “Rules induction through discrete support vector decision trees”, Data Mining and
Knowledge Discovery. Approaches Based on Rule Induction Techniques, E. Triantaphyllou et al. eds., Kluwer,
Dordrecht, 2003, in press.
D. La Torre, C. Vercellis, “C^{1,1} approximations of generalized support vector machines”, Journal of Concrete and
Applicable Mathematics 1 (2003), 125-134.
D. La Torre, C. Vercellis, “On cardinality constrained optimization problems and applications”, Journal of Optimization
Theory and Applications, submitted.