Boosting
Ashok Veeraraghavan

Boosting Methods
• Combine many weak classifiers to produce a committee.
• Resembles bagging and other committee-based methods.
• Many weak classifiers are combined to produce a powerful committee.

General Idea of Boosting
• Weak classifiers (error rate only slightly better than random guessing).
• Sequentially apply weak classifiers to modified versions of the data.
• The predictions of these classifiers are combined to produce a powerful classifier.
• Remember "Bagging".

Preliminaries
• Two-class classification.
• Output: $Y \in \{-1, +1\}$.
• Predictor variables: $X$.
• Weak classifier: $G(X)$.
• Training error: $\overline{\mathrm{err}}_{\mathrm{train}} = \frac{1}{N} \sum_{i=1}^{N} I\big(y_i \neq G(x_i)\big)$.

Schematic of AdaBoost
• Training sample → $G_1(x)$; weighted sample → $G_2(x)$; weighted sample → $G_3(x)$; …; weighted sample → $G_M(x)$.
• Final classifier: $G(x) = \mathrm{sign}\Big(\sum_{m=1}^{M} \alpha_m G_m(x)\Big)$.

AdaBoost
• Initialize the observation weights $w_i = 1/N$.
• For $m = 1$ to $M$:
  – Fit a classifier $G_m(x)$ to the training data using weights $w_i$.
  – Compute $\mathrm{err}_m = \sum_i w_i \, I\big(y_i \neq G_m(x_i)\big) \big/ \sum_i w_i$.
  – Compute $\alpha_m = \log\big(\frac{1 - \mathrm{err}_m}{\mathrm{err}_m}\big)$.
  – Update $w_i \leftarrow w_i \cdot \exp\big[\alpha_m \, I\big(y_i \neq G_m(x_i)\big)\big]$.
• Output $G(x) = \mathrm{sign}\Big(\sum_{m=1}^{M} \alpha_m G_m(x)\Big)$ (see the code sketch after the Applications slide).

Additive Model
• Additive model: $f(x) = \sum_{m=1}^{M} \beta_m \, b(x; \gamma_m)$; e.g., neural networks, wavelets, MARS, etc.
• Optimization problem: $\min_{\{\beta_m, \gamma_m\}_1^M} \sum_{i=1}^{N} L\Big(y_i, \sum_{m=1}^{M} \beta_m \, b(x_i; \gamma_m)\Big)$.
• Computationally intensive.
• Resort to "Forward Stagewise Additive Modeling".

Exponential Loss and AdaBoost
• AdaBoost is equivalent to forward stagewise additive modeling.
• Loss function: $L(y, f(x)) = \exp\big(-y \, f(x)\big)$.
• Basis functions: the individual weak classifiers $G_m(x)$.
• The exponential loss makes AdaBoost computationally efficient.

Loss Functions and Performance
• Misclassification
• Exponential
• Binomial deviance
• Squared error
• The choice of loss function depends on the task at hand (classification vs. regression) and the distribution of the data (see the loss comparison after the Applications slide).

Off-the-Shelf Procedures for Data Mining
• Off-the-shelf: minimal preprocessing of the data.
• Requirements: a qualitative understanding of the relationship between predictors and output; filtering out irrelevant features; tackling outliers.
• Decision trees.

Trees
• Trees partition the input space into disjoint regions $R_j$, $j = 1, 2, \ldots, J$.
• $f(x \in R_j) = \rho_j$.
• Formally, $T(x; \Theta) = \sum_{j=1}^{J} \rho_j \, I(x \in R_j)$.
• $\hat{\Theta} = \arg\min_{\Theta} \sum_{j=1}^{J} \sum_{x_i \in R_j} L(y_i, \rho_j)$.
• Computationally intensive search.

Trees Contd.
• Divide the minimization into two parts and iterate:
  – Finding $\rho_j$ given $R_j$.
  – Finding $R_j$: greedy recursive partitioning.
• Advantages: qualitative understanding of the output; internal feature selection.

Boosting Trees
• $G(x) = \mathrm{sign}\Big(\sum_{m=1}^{M} \alpha_m G_m(x)\Big)$, with $G_m(x) = T(x; \Theta_m)$.
• Boosting is done by forward stagewise additive modeling or by AdaBoost.
• Squared-error loss → regression tree.

Optimal Size of Trees
• Usually oversized trees are grown and then pruned to a predetermined size.
• Instead, grow all trees to the same size.
• The optimal size of each tree is estimated beforehand from the data and the target function.
• ANOVA (analysis of variance) expansion: $\eta(X) = \sum_j \eta_j(X_j) + \sum_{jk} \eta_{jk}(X_j, X_k) + \sum_{jkl} \eta_{jkl}(X_j, X_k, X_l) + \cdots$

Regularization
• How many boosting iterations?
• If M is too large, generalization is poor, i.e., overfitting.
• Usually M is selected by validation.

Shrinkage
• Scale the contribution of each tree by a factor $0 < \epsilon < 1$: $f_m(x) = f_{m-1}(x) + \epsilon \sum_{j=1}^{J} \gamma_{jm} \, I(x \in R_{jm})$.
• Trade-off between ε and M.
• Small ε → large M.
• Computation is proportional to M! (See the boosted-regression-tree sketch after the Applications slide.)

Applications
• Non-intrusive power monitoring systems.
• Tumor classification with gene expression data.
• Text classification.
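The following is a minimal numpy sketch of the AdaBoost loop from the slide above, using decision stumps as the weak classifiers $G_m(x)$. The stump learner, the function names, and the toy data are illustrative assumptions and not part of the original slides.

import numpy as np

def fit_stump(X, y, w):
    """Weighted decision stump: pick the best (feature, threshold, polarity)."""
    n, d = X.shape
    best, best_err = None, np.inf
    for j in range(d):
        for t in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(X[:, j] <= t, polarity, -polarity)
                err = np.sum(w * (pred != y))
                if err < best_err:
                    best_err, best = err, (j, t, polarity)
    return best, best_err

def stump_predict(stump, X):
    j, t, polarity = stump
    return np.where(X[:, j] <= t, polarity, -polarity)

def adaboost(X, y, M=20):
    """Discrete AdaBoost as on the slide; labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                      # initialize weights w_i = 1/N
    stumps, alphas = [], []
    for m in range(M):
        stump, werr = fit_stump(X, y, w)
        err = np.clip(werr / w.sum(), 1e-10, 1 - 1e-10)   # err_m = sum w_i I(y_i != G_m) / sum w_i
        alpha = np.log((1 - err) / err)                    # alpha_m = log((1 - err_m) / err_m)
        miss = stump_predict(stump, X) != y
        w = w * np.exp(alpha * miss)                       # upweight misclassified points
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def adaboost_predict(stumps, alphas, X):
    """G(x) = sign( sum_m alpha_m G_m(x) )."""
    agg = sum(a * stump_predict(s, X) for s, a in zip(stumps, alphas))
    return np.sign(agg)

# Toy usage: two Gaussian blobs with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])
stumps, alphas = adaboost(X, y, M=20)
print("training accuracy:", np.mean(adaboost_predict(stumps, alphas, X) == y))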
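As a small illustration of the "Loss Functions and Performance" slide, the snippet below evaluates the four listed losses as functions of the margin $y \, f(x)$ for a two-class problem. The binomial-deviance parameterization $\log(1 + e^{-2 \, y f})$ is one common convention, assumed here rather than taken from the slides.

import numpy as np

# Margin y * f(x); all four losses from the slide written as functions of the margin.
margin = np.linspace(-2, 2, 9)
misclassification = (margin < 0).astype(float)
exponential       = np.exp(-margin)
binomial_deviance = np.log(1 + np.exp(-2 * margin))   # one common parameterization
squared_error     = (1 - margin) ** 2

for m, a, b, c, d in zip(margin, misclassification, exponential, binomial_deviance, squared_error):
    print(f"margin {m:+.1f}: misclass {a:.0f}, exponential {b:.2f}, deviance {c:.2f}, squared {d:.2f}")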
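The sketch below illustrates the "Boosting Trees" and "Shrinkage" slides under squared-error loss, using scikit-learn regression trees as the base learners $T(x; \Theta_m)$. Fitting each tree to the current residuals, and the parameter choices (M, eps, max_depth), are illustrative assumptions rather than the slides' exact procedure.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosted_regression_trees(X, y, M=300, eps=0.05, max_depth=2):
    """Forward stagewise boosting of small regression trees with shrinkage:
    f_m(x) = f_{m-1}(x) + eps * T(x; Theta_m), with each tree fit to the residuals."""
    f = np.zeros(len(y))                       # f_0(x) = 0
    trees = []
    for m in range(M):
        residual = y - f                       # residuals under squared-error loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        f = f + eps * tree.predict(X)          # shrunken update
        trees.append(tree)
    return trees

def boosted_predict(trees, X, eps=0.05):
    """Sum the shrunken tree predictions."""
    return eps * sum(tree.predict(X) for tree in trees)

# Toy usage: noisy sine curve; a small eps requires a larger M (the trade-off on the slide).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, 200)).reshape(-1, 1)
y = np.sin(X.ravel()) + rng.normal(0, 0.2, 200)
trees = boosted_regression_trees(X, y, M=300, eps=0.05)
print("training MSE:", np.mean((boosted_predict(trees, X, eps=0.05) - y) ** 2))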
References
• www.boosting.org
• T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag.
• R. Meir and G. Rätsch. An introduction to boosting and leveraging. In S. Mendelson and A. Smola, editors, Advanced Lectures on Machine Learning, LNCS, pages 119-184. Springer, 2003.
• R. E. Schapire. A brief introduction to boosting. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999.
• Y. Freund and R. E. Schapire. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, September 1999. Appearing in Japanese, translation by Naoki Abe.