Ensemble methods with Data Streams
Jungbeom Lee
CS240B
Outline
• Intro
• Ensemble in machine learning
• Online ensemble algorithms
• Future work

Intro
• Previous class: data stream classifiers
• Ensemble methods
• Online algorithm

Classifiers
• The batch classification problem:
– Given a finite training set D = {(x, y)}, where y ∈ {y1, y2, …, yk} and |D| = n, find a function y = f(x) that can predict the y value for an unseen instance x
• The data stream classification problem:
– Given an infinite sequence of pairs of the form (x, y), where y ∈ {y1, y2, …, yk}, find a function y = f(x) that can predict the y value for an unseen instance x
• Example applications:
– Fraud detection in credit card transactions
– Topic classification in a news aggregation site, e.g. Google News
– Translator for foreign languages
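The difference between the batch and the stream setting can be illustrated with a test-then-train loop: each arriving pair (x, y) is first used to test the current model and only then to update it. The sketch below is a minimal illustration, not a method from the slides; `MajorityClassLearner`, `learn_one`, and `prequential_accuracy` are hypothetical names, and the four-item stream is made-up toy data.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy incremental classifier (hypothetical): predicts the most frequent label so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Nothing has been learned before the first labelled instance arrives.
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn_one(self, x, y):
        # Constant time per sample; memory depends only on the number of classes k.
        self.counts[y] += 1


def prequential_accuracy(stream, model):
    """Test-then-train: predict each instance first, then learn from its label."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)
        model.learn_one(x, y)
        total += 1
    return correct / total if total else 0.0


# Toy stream of (x, y) pairs; the model never sees the whole data set at once.
stream = [((0.1,), "ok"), ((0.2,), "ok"), ((0.9,), "fraud"), ((0.3,), "ok")]
acc = prequential_accuracy(iter(stream), MajorityClassLearner())
```

Any learner exposing the same two methods could be dropped into the loop, including the ensembles discussed later in the deck.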
Motivations
• Online mining is different from static mining
• Data volume
◦ impossible to mine the entire data at one time
◦ can only afford constant memory per data sample
• Changing data characteristics
◦ previously learned models can become invalid
• Cost of learning
◦ model updates can be costly
◦ can only afford constant time per data sample
Ensemble
• A set of classifiers whose individual decisions are combined in some way to classify new examples
• An ensemble of classifiers can be more accurate than any of its individual members
• One key to a successful ensemble is to use individual classifiers with error rates below 0.5
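Why the 0.5 threshold matters can be checked with a short calculation. The sketch below assumes a two-class problem, an odd number of classifiers, and errors that are independent across classifiers (a strong assumption that real ensembles only approximate). Under those assumptions the number of simultaneous mistakes is binomial, and a majority vote fails only when more than half of the members are wrong. The function name `majority_vote_error` is hypothetical.

```python
from math import comb


def majority_vote_error(n_classifiers, p_error):
    """P(more than half of n independent classifiers are wrong on the same example)."""
    first_majority = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k) * p_error**k * (1 - p_error) ** (n_classifiers - k)
        for k in range(first_majority, n_classifiers + 1)
    )


# 21 members, each wrong 30% of the time: the vote is wrong far less often than 0.3.
weak_but_useful = majority_vote_error(21, 0.3)
# 21 members, each wrong 60% of the time: voting makes things worse than 0.6.
worse_than_chance = majority_vote_error(21, 0.6)
```

With individual error below 0.5 the vote improves on its members; above 0.5 the same vote amplifies their mistakes.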

Reasons
Ensemble methods
• Manipulating the training examples
◦ Bagging
◦ AdaBoost
• Injecting randomness
◦ C4.5 decision tree algorithm
Bagging algorithm
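The slide body for batch bagging did not survive extraction, so the following is only a minimal sketch of the standard procedure: draw bootstrap samples (n draws with replacement) from the training set, train one base model per sample, and combine the models by majority vote. The base learner `NearestCentroid`, the helper names, and the toy data are all assumptions chosen to keep the example self-contained.

```python
import random
from collections import Counter


class NearestCentroid:
    """Hypothetical base learner: assign x to the class with the closest mean."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                acc[j] += v
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict(self, x):
        def dist2(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=dist2)


def bagging_fit(X, y, n_models=11, seed=0):
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: n indices drawn uniformly with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(NearestCentroid().fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    # Unweighted majority vote over the base models.
    votes = Counter(m.predict(x) for m in models)
    return votes.most_common(1)[0][0]


X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (1.0, 0.9), (0.8, 1.1), (1.2, 1.0)]
y = ["a", "a", "a", "b", "b", "b"]
models = bagging_fit(X, y)
label = bagging_predict(models, (0.05, 0.1))
```

Resampling needs the whole training set in memory, which is what the stream setting rules out and what the online variant has to work around.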
Online bagging algorithm
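One well-known way to bag on a stream, due to Oza and Russell, replaces bootstrap resampling with a Poisson draw: each arriving example is shown to each base model k times, with k ~ Poisson(1), which approximates how often an example would appear in a bootstrap sample of a large data set. Whether the slide presented exactly this variant cannot be recovered from the transcript, so treat the code as a sketch under that assumption. `OnlineCentroid`, `OnlineBagging`, and `poisson1` are hypothetical names, and the stream is toy data.

```python
import math
import random
from collections import Counter


def poisson1(rng):
    """Draw k ~ Poisson(1) using Knuth's multiplication method."""
    threshold, k, p = math.exp(-1.0), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


class OnlineCentroid:
    """Hypothetical incremental base learner: running mean per class."""

    def __init__(self):
        self.n = Counter()
        self.mean = {}

    def learn_one(self, x, y):
        self.n[y] += 1
        m = self.mean.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            m[j] += (v - m[j]) / self.n[y]  # incremental mean update

    def predict(self, x):
        return min(self.mean,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.mean[c])))


class OnlineBagging:
    def __init__(self, n_models=11, seed=0):
        self.rng = random.Random(seed)
        self.models = [OnlineCentroid() for _ in range(n_models)]

    def learn_one(self, x, y):
        for model in self.models:
            # Each model sees the example k ~ Poisson(1) times (possibly zero).
            for _ in range(poisson1(self.rng)):
                model.learn_one(x, y)

    def predict(self, x):
        # Majority vote over the models that have seen at least one example.
        votes = Counter(m.predict(x) for m in self.models if m.mean)
        return votes.most_common(1)[0][0] if votes else None


bag = OnlineBagging()
for i in range(200):
    offset = 0.01 * (i % 5)
    if i % 2 == 0:
        bag.learn_one((offset, 0.0), "a")
    else:
        bag.learn_one((1.0 + offset, 1.0), "b")
prediction = bag.predict((0.0, 0.1))
```

Each example is processed once and then discarded, so time and memory per sample stay constant for a fixed number of base models.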
Online weighted bagging algorithm
AdaBoost algorithm
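The AdaBoost slides were figures that did not survive extraction. As a stand-in, here is a minimal sketch of the standard binary AdaBoost procedure with labels in {-1, +1}: keep a weight per training example, fit a weak learner to the weighted data, give it a vote alpha = 0.5 · ln((1 - err) / err), then raise the weights of the examples it got wrong. Using decision stumps as the weak learner, the helper names, and the six-point data set are assumptions made for illustration.

```python
import math


def stump_predict(stump, x):
    """A stump (feature, threshold, sign) predicts sign if x[feature] <= threshold, else -sign."""
    j, thr, sign = stump
    return sign if x[j] <= thr else -sign


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((j, thr, sign), xi) != yi)
                if best is None or err < best[0]:
                    best = (err, (j, thr, sign))
    return best


def adaboost_fit(X, y, n_rounds=3):
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    ensemble = []      # list of (alpha, stump)
    for _ in range(n_rounds):
        err, stump = fit_stump(X, y, w)
        if err >= 0.5:
            break  # weak learner is no better than chance
        err = max(err, 1e-10)  # guard against log/division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified examples gain weight, correct ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# No single stump separates this 1-D "interval" pattern, but three weighted stumps do.
X = [(1,), (2,), (3,), (4,), (5,), (6,)]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost_fit(X, y, n_rounds=3)
predictions = [adaboost_predict(ensemble, x) for x in X]
```

The break at err >= 0.5 is the same condition as the "error rates below 0.5" requirement stated earlier for useful ensemble members.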
Adaptive boosting algorithm
Experimental Results
Type of Data
Future work
• Better online algorithm for bagging
• Dealing with multiple data types

References
• http://web.engr.oregonstate.edu/~tgd/publications/mcs-ensembles.pdf
• http://pages.bangor.ac.uk/~mas00a/papers/lkSUEMA2008.pdf
• http://web.cs.ucla.edu/~zaniolo/papers/NBCAJMW77MW0J8CP.pdf
• https://ti.arc.nasa.gov/m/pubarchive/archive/0962.pdf
• https://engineering.purdue.edu/~givan/papers/bp.pdf
• http://hanj.cs.illinois.edu/pdf/kdd03_emsemble.pdf