Data Mining: Concepts and Techniques — Chapter 7 — Classification: Ensemble Learning (May 22, 2017)

Drawbacks of a Single Classifier
• If we train several classifiers and then choose the best one, which one is the best?
– More than one classifier may meet the selection criterion (e.g., the same training accuracy), especially when:
• there is insufficient training data;
• the learning algorithm easily converges to different local optima.
• Potentially valuable information may be lost by discarding the results of the less successful classifiers
– e.g., a discarded classifier may correctly classify some samples that the selected one misclassifies
• If the output of the selected classifier is wrong, the final decision must be wrong
• The trained classifier may not be complex enough to handle the problem

Ensemble Methods
• Combining a number of trained classifiers can lead to better performance than any single one
– Errors of one classifier can be compensated by correct classifications from others
– Different classifiers have different knowledge regarding the problem
• Ensembles also decompose a complex problem into subproblems whose solutions are simpler to understand, implement, manage, and update

General Idea
• Step 1: Create multiple data sets D1, D2, …, Dt-1, Dt from the original training data D
• Step 2: Build multiple classifiers C1, C2, …, Ct-1, Ct
• Step 3: Combine the classifiers into C*

Factors Affecting Ensemble Accuracy
• Three factors affect the accuracy of an ensemble:
– Accuracy of the individual classifiers: how good is each classifier on its own?
– Fusion method: how are the classifiers combined?
– Diversity among the classifiers: how different are the decisions of the various classifiers?

Accuracy of an Individual Classifier
• The performance of an individual classifier is affected by:
– the training data set (samples and features)
– the learning model (type of classifier)
– the model's parameters (e.g.
the number of neurons in a neural network)

Examples of Ensemble Methods
• How to generate an ensemble of classifiers?
– Bootstrap aggregating (bagging)
– Boosting (e.g., AdaBoost)

Bagging: Bootstrap Aggregation
• Real-world analogy: a diagnosis based on the majority vote of multiple doctors
• Training
– Given a set D of d tuples, at each iteration i a training set Di of d tuples is sampled with replacement from D (i.e., a bootstrap sample)
– A classifier model Mi is learned from each training set Di
• Classification: to classify an unknown sample X
– Each classifier Mi returns its class prediction
– The bagged classifier M* counts the votes and assigns to X the class with the most votes
• Prediction: bagging can also be applied to the prediction of continuous values by taking the average of the individual predictions for a given test tuple
• Accuracy
– Often significantly better than a single classifier derived from D
– For noisy data: not considerably worse, and more robust
– Proven improved accuracy in prediction

Bagging: Sampling with Replacement
• Each round draws d = 10 tuple IDs, with replacement, from the original data:

Original Data:      1  2  3  4  5  6  7  8  9  10
Bagging (Round 1):  7  8  10 8  2  5  10 10 5  9
Bagging (Round 2):  1  4  9  1  2  3  2  7  3  2
Bagging (Round 3):  1  8  5  10 5  5  9  6  3  7

• Build a classifier on each bootstrap sample

Boosting
• Example: consult several doctors and combine their diagnoses with weights, where each doctor's weight is based on the accuracy of their previous diagnoses
• How does boosting work?
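The bagging procedure described above (bootstrap sampling, per-sample training, majority vote) can be sketched in Python. This is a minimal illustration, not code from the chapter: the one-nearest-neighbour base learner, the toy 1-D data set, and all function names are our own assumptions.

```python
import random
from collections import Counter

def nearest_neighbor_classifier(train):
    """Base learner (illustrative choice): predict the label of the closest training point."""
    def predict(x):
        return min(train, key=lambda t: abs(t[0] - x))[1]
    return predict

def bagging_train(data, num_models, seed=0):
    """Learn one classifier M_i per bootstrap sample D_i (same size as D, drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(num_models):
        boot = [rng.choice(data) for _ in range(len(data))]  # sampling with replacement
        models.append(nearest_neighbor_classifier(boot))
    return models

def bagging_predict(models, x):
    """The bagged classifier M*: majority vote over the individual predictions."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Toy data: class "a" clusters near 0, class "b" near 10.
data = [(0, "a"), (1, "a"), (2, "a"), (8, "b"), (9, "b"), (10, "b")]
models = bagging_train(data, num_models=11)
print(bagging_predict(models, 1.5))
print(bagging_predict(models, 9.5))
```

Using an odd number of models avoids ties in the two-class majority vote; in practice the base learner would be a decision tree or similar, not this toy 1-NN.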
– Weights are assigned to each training tuple
– A series of k classifiers is learned iteratively
– After classifier Mi is learned, the weights are updated so that the subsequent classifier, Mi+1, pays more attention to the training tuples that were misclassified by Mi
– The final classifier M* combines the votes of the individual classifiers, where the weight of each classifier's vote is a function of its accuracy
• The boosting algorithm can be extended to the prediction of continuous values
• Compared with bagging: boosting tends to achieve greater accuracy, but it also risks overfitting the model to misclassified data

Boosting: Adjusting the Weights
• Records that are wrongly classified have their weights increased
• Records that are classified correctly have their weights decreased

Original Data:       1  2  3  4  5  6  7  8  9  10
Boosting (Round 1):  7  3  2  8  7  9  4  10 6  3
Boosting (Round 2):  5  4  9  4  2  5  1  7  4  2
Boosting (Round 3):  4  4  8  10 4  5  4  6  3  4

• Example 4 is hard to classify
• Its weight is increased, so it is more likely to be chosen again in subsequent rounds

AdaBoost (Freund and Schapire, 1997)
• Given a set of d class-labeled tuples (X1, y1), …, (Xd, yd)
• Initially, all tuple weights are set to the same value, 1/d
• Generate k classifiers in k rounds. At round i:
– Tuples from D are sampled with replacement to form a training set Di of the same size
– Each tuple's chance of being selected is based on its weight
– A classification model Mi is derived from Di
– Its error rate is calculated using Di as a test set
– If a tuple is misclassified, its weight is increased; otherwise, it is decreased
• Error rate: err(Xj) is the misclassification indicator of tuple Xj (1 if misclassified, 0 otherwise). The error rate of classifier Mi is the sum of the weights of the misclassified tuples:

    error(Mi) = Σj wj × err(Xj)

• The weight of classifier Mi's vote is

    log((1 − error(Mi)) / error(Mi))
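The per-round AdaBoost bookkeeping above (error rate, vote weight, tuple reweighting) can be sketched as follows. This is a minimal sketch of the weight arithmetic only, with illustrative function names; the reweighting rule used (multiply correctly classified tuples' weights by error/(1 − error), then renormalize) is the standard AdaBoost formulation, which decreases correct tuples' relative weights and thereby increases misclassified ones.

```python
import math

def classifier_error(weights, misclassified):
    """error(M_i) = sum of w_j over the misclassified tuples (err(X_j) = 1 for those)."""
    return sum(w for w, bad in zip(weights, misclassified) if bad)

def vote_weight(error):
    """Weight of classifier M_i's vote: log((1 - error(M_i)) / error(M_i))."""
    return math.log((1 - error) / error)

def update_weights(weights, misclassified):
    """Multiply correctly classified tuples' weights by error/(1 - error), then renormalize
    so the weights sum to 1. Misclassified tuples gain relative weight."""
    error = classifier_error(weights, misclassified)
    factor = error / (1 - error)
    new = [w if bad else w * factor for w, bad in zip(weights, misclassified)]
    total = sum(new)
    return [w / total for w in new]

# d = 4 tuples, initial weights all 1/d; suppose M_1 misclassifies only tuple 2.
weights = [0.25, 0.25, 0.25, 0.25]
misclassified = [False, True, False, False]
error = classifier_error(weights, misclassified)  # 0.25
alpha = vote_weight(error)                        # log(0.75 / 0.25) = log 3
weights = update_weights(weights, misclassified)  # misclassified tuple's weight rises to 0.5
```

Note how a more accurate classifier (smaller error) gets a larger vote weight, and how the misclassified tuple becomes more likely to be sampled in the next round.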