Ensemble Methods
ISQS 6347, Data & Text Mining
Ensemble Methods
• Construct a set of classifiers from the training data
• Predict the class label of previously unseen records by aggregating the predictions made by multiple classifiers
General Idea
[Diagram: the original training data D is used to create multiple data sets D1, D2, …, Dt (Step 1); a classifier Ci is built on each Di (Step 2); the classifiers C1, …, Ct are combined into a single ensemble classifier C* (Step 3).]
Why does it work?
• Suppose there are 25 base classifiers
• Each classifier has an error rate ε = 0.35
• Assume the classifiers are independent
• Probability that the ensemble classifier makes a wrong prediction, i.e., that a majority (13 or more) of the 25 base classifiers are wrong:

$$P(\text{ensemble wrong}) = \sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1-\varepsilon)^{25-i} \approx 0.06$$
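This tail sum can be checked directly; a minimal verification sketch in Python (the 0.06 figure comes from the slide, the code only reproduces it):

```python
from math import comb

n, eps = 25, 0.35
# The ensemble errs only when a majority (13 or more) of the
# 25 independent base classifiers err on the same record.
p_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i)
              for i in range(13, n + 1))
print(f"P(ensemble wrong) = {p_wrong:.3f}")  # ~0.060
```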
Combined Ensemble Models
[Diagram: the training data is drawn into Samples 1–3; a single modeling method is fit to each sample, producing Models 1–3; their predictions are averaged into an ensemble model, which is then applied to the score data.]
Combined Ensemble Models
[Diagram: two different modeling methods, A and B, are each fit to the same training data, producing Models A and B; their predictions are averaged into an ensemble model, which is then applied to the score data.]
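A minimal sketch of this second kind of averaging, assuming scikit-learn and a synthetic data set (the two specific model choices are illustrative, not prescribed by the slides):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Toy stand-ins for the training data and the score data.
X, y = make_classification(n_samples=600, random_state=0)
X_train, y_train = X[:500], y[:500]
X_score = X[500:]  # records whose class labels we want to predict

# Two different modeling methods fit on the same training data.
model_a = LogisticRegression(max_iter=1000).fit(X_train, y_train)
model_b = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)

# Ensemble model (average): mean of the predicted class probabilities,
# then pick the class with the highest averaged probability.
avg_proba = (model_a.predict_proba(X_score) + model_b.predict_proba(X_score)) / 2
y_pred = avg_proba.argmax(axis=1)
```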
Examples of Ensemble Methods
How to generate an ensemble of classifiers?
• Bagging
• Boosting
Bagging
• Sampling with replacement:
Original Data:      1   2   3   4   5   6   7   8   9   10
Bagging (Round 1):  7   8   10  8   2   5   10  10  5   9
Bagging (Round 2):  1   4   9   1   2   3   2   7   3   2
Bagging (Round 3):  1   8   5   10  5   5   9   6   3   7
• Build a classifier on each bootstrap sample
• Each record has probability 1 – (1 – 1/n)^n of being selected into a given bootstrap sample
• The probability that an observation is not selected is (1 – 1/n)^n. When n is large enough, (1 – 1/n)^n → 1/e, so the probability of being selected is 1 – 1/e ≈ 0.632
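A minimal bagging sketch along these lines, assuming a scikit-learn decision tree as the base classifier and integer class labels (both assumptions; the slides do not fix a base learner):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X_train, y_train, X_test, n_rounds=3, seed=0):
    """Fit one classifier per bootstrap sample, then majority-vote."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = []
    for _ in range(n_rounds):
        # Sampling with replacement: each record appears in a given
        # round with probability about 1 - 1/e (~0.632).
        idx = rng.integers(0, n, size=n)
        clf = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
        votes.append(clf.predict(X_test))
    votes = np.stack(votes)  # shape: (n_rounds, n_test)
    # Majority vote across rounds (assumes integer class labels).
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```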
Boosting
• An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records
• Initially, all N records are assigned equal weights
• Unlike in bagging, the weights may change at the end of each boosting round
Boosting
• Records that are wrongly classified will have their weights increased
• Records that are classified correctly will have their weights decreased
Original Data:       1   2   3   4   5   6   7   8   9   10
Boosting (Round 1):  7   3   2   8   7   9   4   10  6   3
Boosting (Round 2):  5   4   9   4   2   5   1   7   4   2
Boosting (Round 3):  4   4   8   10  4   5   4   6   3   4
• Example 4 is hard to classify
• Its weight is increased, therefore it is more likely to be chosen again in subsequent rounds
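A sketch of one concrete reweighting scheme, the AdaBoost update, assuming labels coded as −1/+1 and a decision stump as the base classifier (the slides describe the general reweighting idea without fixing these details):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=3):
    """AdaBoost-style boosting loop; assumes y is coded as -1/+1."""
    n = len(X)
    w = np.full(n, 1.0 / n)  # initially, all N records get equal weight
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()  # weighted error of this round's classifier
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        # Misclassified records (y * pred == -1) get weight * e^{+alpha};
        # correctly classified ones (y * pred == +1) get weight * e^{-alpha}.
        w = w * np.exp(-alpha * y * pred)
        w /= w.sum()  # renormalize so the weights form a distribution
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas
```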