Download Data Mining: Concepts and Techniques

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

Mixture model wikipedia, lookup

Nonlinear dimensionality reduction wikipedia, lookup

Naive Bayes classifier wikipedia, lookup

K-nearest neighbors algorithm wikipedia, lookup

Transcript
Data Mining:
Concepts and Techniques
— Chapter 7 —
Classification Ensemble
Learning
May 22, 2017
1
Data Mining: Concepts and Techniques
1
Drawback of Single Classifier
• If we do some classifiers to choose the best,
Which one is the best?
– Maybe more than one classifiers meet the criteria (e.g.
same training accuracy), especially in the following
situations:
• Without sufficient training data
• The learning algorithm leads to different local optima
easily
2
Drawback of Single Classifier- Cont.
• Potentially valuable information may be lost by
discarding the results of less-successful
classifiers
– E.g., the discarded classifiers may correctly classify some
samples
• The final decision must be wrong if the output
of selected classifier is wrong
• The trained classifier may not be complex
enough to handle the problem
3
Ensemble Methods
• Combining a number of trained classifiers lead
to a better performance than any single one
– Errors can be complemented by other correct
classifications
– Different classifiers have different knowledge
regarding the problem
• To decompose a complex problem into subproblems for which the solutions obtained are
simpler to understand, implement, manage
and update
4
General Idea
D
Step 1:
Create Multiple
Data Sets
Step 2:
Build Multiple
Classifiers
Step 3:
Combine
Classifiers
D1
D2
C1
C2
....
Original
Training data
Dt-1
Dt
Ct -1
Ct
C*
5
Affecting Factors in ensemble
methods
• Three factors affecting the accuracy:
– Accuracy of individual classifiers
• How good are the individual classifiers?
– Fusion Methods
• How to combine classifiers?
– Diversity among classifiers
• What are the differences between decisions of various
classifiers?
6
Accuracy of individual classifier
• The performance of an individual classifier is
affected by
– Training Dataset (sample and feature)
– Learning Model (types of classifier)
– Model’s Parameters (e.g. the number of neurons in
NN)
7
Examples of Ensemble Methods
• How to generate an ensemble of classifiers?
– Bootstrap aggregating(Bagging) ‫خودراه انداز متراکم‬
– Boosting (AdaBoost) ‫افزایشی‬
8
Bagging: Boostrap Aggregation
• Example in real world: Diagnosis based on multiple doctors’ majority vote
• Training
– Given a set D of d tuples, at each iteration i, a training set Di of d tuples
is sampled with replacement from D (i.e., boostrap)
– A classifier model Mi is learned for each training set Di
• Classification: classify an unknown sample X
– Each classifier Mi returns its class prediction
– The bagged classifier M* counts the votes and assigns the class with the
most votes to X
• Prediction: can be applied to the prediction of continuous values by taking
the average value of each prediction for a given test tuple
• Accuracy
– Often significant better than a single classifier derived from D
– For noise data: not considerably worse, more robust
– Proved improved accuracy in prediction
9
Bagging
• Sampling with replacement
Original Data
Bagging (Round 1)
Bagging (Round 2)
Bagging (Round 3)
1
7
1
1
2
8
4
8
3
10
9
5
4
8
1
10
5
2
2
5
6
5
3
5
7
10
2
9
8
10
7
6
9
5
3
3
10
9
2
7
• Build classifier on each bootstrap sample
10
Boosting
•
Example: Consult several doctors, based on a combination of weighted
diagnoses—weight assigned based on the previous diagnosis accuracy
•
How boosting works?
–
Weights are assigned to each training tuple
–
A series of k classifiers is iteratively learned
–
After a classifier Mi is learned, the weights are updated to allow the
subsequent classifier, Mi+1, to pay more attention to the training tuples
that were misclassified by Mi
–
The final M* combines the votes of each individual classifier, where the
weight of each classifier's vote is a function of its accuracy
•
The boosting algorithm can be extended for the prediction of continuous
values
•
Comparing with bagging: boosting tends to achieve greater accuracy, but it
also risks overfitting the model to misclassified data
11
Boosting
• Records that are wrongly classified will have
their weights increased
• Records that are classified correctly will have
their weights decreased
Original Data
Boosting (Round 1)
Boosting (Round 2)
Boosting (Round 3)
1
7
5
4
2
3
4
4
3
2
9
8
4
8
4
10
5
7
2
4
6
9
5
5
7
4
1
4
8
10
7
6
9
6
4
3
10
3
2
4
• Example 4 is hard to classify
• Its weight is increased, therefore it is more
likely to be chosen again in subsequent rounds
12
Adaboost (Freund and Schapire, 1997)
•
•
•
•
Given a set of d class-labeled tuples, (X1, y1), …, (Xd, yd)
Initially, all the weights of tuples are set the same (1/d)
Generate k classifiers in k rounds. At round i,
– Tuples from D are sampled (with replacement) to form a training set
Di of the same size
– Each tuple’s chance of being selected is based on its weight
– A classification model Mi is derived from Di
– Its error rate is calculated using Di as a test set
– If a tuple is misclssified, its weight is increased, o.w. it is decreased
Error rate: err(Xj) is the misclassification error of tuple Xj. Classifier Mi
error rate is the sum of the weights of the misclassified tuples:
d
•
error ( M i )   w j  err ( X j )
j
The weight of classifier Mi’s
vote is
log
1  error ( M i )
error ( M i )