Regression
Albert Bifet
May 2012
COMP423A/COMP523A Data Stream Mining
Outline
1. Introduction
2. Stream Algorithmics
3. Concept drift
4. Evaluation
5. Classification
6. Ensemble Methods
7. Regression
8. Clustering
9. Frequent Pattern Mining
10. Distributed Streaming
Data Streams
Big Data & Real Time
Regression
Definition
Given a numeric class attribute, a regression algorithm builds a model

    y = f(x)

that predicts a numeric value with high accuracy for every unlabelled instance.
Examples
- Stock-market price prediction
- Airplane delays
Evaluation
1. Error estimation: Hold-out or Prequential
2. Evaluation performance measures: MSE or MAE
3. Statistical significance validation: Nemenyi test
Evaluation Framework
2. Performance Measures
Regression mean measures:
- Mean square error:

    MSE = \sum_{i=1}^{N} (f(x_i) - y_i)^2 / N

- Root mean square error:

    RMSE = \sqrt{MSE} = \sqrt{\sum_{i=1}^{N} (f(x_i) - y_i)^2 / N}
Forgetting mechanism for estimating measures
Sliding window of size w with the most recent observations
2. Performance Measures
Regression relative measures:
- Relative square error:

    RSE = \sum_i (f(x_i) - y_i)^2 / \sum_i (\bar{y}_i - y_i)^2

- Root relative square error:

    RRSE = \sqrt{RSE} = \sqrt{\sum_i (f(x_i) - y_i)^2 / \sum_i (\bar{y}_i - y_i)^2}
Forgetting mechanism for estimating measures
Sliding window of size w with the most recent observations
2. Performance Measures
Regression absolute measures:
- Mean absolute error:

    MAE = \sum_i |f(x_i) - y_i| / N

- Relative absolute error:

    RAE = \sum_i |f(x_i) - y_i| / \sum_i |\bar{y}_i - y_i|
Forgetting mechanism for estimating measures
Sliding window of size w with the most recent observations
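To make the forgetting mechanism concrete, here is a minimal sketch of windowed error estimation in Python (not from the slides; the class name, default window size, and method names are illustrative assumptions):

    from collections import deque
    import math

    class WindowedRegressionEvaluator:
        """Estimates MSE, RMSE and MAE over a sliding window holding
        the w most recent (prediction, target) pairs (illustrative sketch)."""

        def __init__(self, w=1000):
            self.window = deque(maxlen=w)  # oldest pairs are forgotten automatically

        def add(self, prediction, target):
            self.window.append((prediction, target))

        def mse(self):
            return sum((p - y) ** 2 for p, y in self.window) / len(self.window)

        def rmse(self):
            return math.sqrt(self.mse())

        def mae(self):
            return sum(abs(p - y) for p, y in self.window) / len(self.window)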
Linear Methods for Regression
Linear Least Squares fitting
- Linear regression model:

    f(x) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j = X\beta

- Minimize the residual sum of squares:

    RSS(\beta) = \sum_{i=1}^{N} (y_i - f(x_i))^2 = (y - X\beta)^T (y - X\beta)

- Solution:

    \hat{\beta} = (X^T X)^{-1} X^T y
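The closed-form solution translates directly to code. A short numpy sketch (the synthetic data and variable names are illustrative, not from the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    N, p = 200, 3
    X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, p))])  # intercept column for beta_0
    beta_true = np.array([1.0, 2.0, -0.5, 0.3])
    y = X @ beta_true + rng.normal(scale=0.1, size=N)

    # beta_hat = (X'X)^{-1} X'y, computed with a linear solve rather than an
    # explicit matrix inverse for numerical stability
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    predictions = X @ beta_hat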
Perceptron
[Diagram: a perceptron combining Attribute 1, ..., Attribute 5 with weights w1, ..., w5.]
- Data stream: \langle \vec{x}_i, y_i \rangle
- Classical perceptron: h_{\vec{w}}(\vec{x}_i) = \vec{w}^T \vec{x}_i, output h_{\vec{w}}(\vec{x}_i)
- Minimize the mean-square error:

    J(\vec{w}) = \frac{1}{2} \sum_i (y_i - h_{\vec{w}}(\vec{x}_i))^2
Perceptron
- Minimize the mean-square error:

    J(\vec{w}) = \frac{1}{2} \sum_i (y_i - h_{\vec{w}}(\vec{x}_i))^2

- Stochastic Gradient Descent: \vec{w} = \vec{w} - \eta \nabla J
- Gradient of the error function:

    \nabla J = -\sum_i (y_i - h_{\vec{w}}(\vec{x}_i)) \vec{x}_i

- Weight update rule:

    \vec{w} = \vec{w} + \eta \sum_i (y_i - h_{\vec{w}}(\vec{x}_i)) \vec{x}_i
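A minimal sketch of this perceptron regressor trained one instance at a time with stochastic gradient descent (Python; the class and parameter names are my own, not MOA's API):

    import numpy as np

    class PerceptronRegressor:
        """Linear model h_w(x) = w . x updated online by SGD (illustrative sketch)."""

        def __init__(self, n_features, eta=0.01):
            self.w = np.zeros(n_features)   # weight vector w
            self.eta = eta                  # learning rate

        def predict(self, x):
            return self.w @ x               # h_w(x_i) = w^T x_i

        def learn_one(self, x, y):
            error = y - self.predict(x)     # (y_i - h_w(x_i))
            self.w += self.eta * error * x  # w <- w + eta (y_i - h_w(x_i)) x_i

Each arriving pair (x_i, y_i) costs one predict-then-update step, which is what makes the method suitable for data streams.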
Fast Incremental Model Tree with Drift Detection
FIMT-DD
FIMT-DD differs from the Hoeffding Tree (HT) in:
1. Splitting criterion
2. Numeric attribute handling using BINTREE
3. Linear models at the leaves
4. Concept drift handling: the Page-Hinckley test
5. Alternate-tree adaptation strategy
Splitting Criterion
Standard Deviation Reduction Measure
- Classification:

    Information Gain = Entropy(before split) - Entropy(after split)

    Entropy = -\sum_{i=1}^{c} p_i \log p_i

    Gini Index = \sum_{i=1}^{c} p_i (1 - p_i) = 1 - \sum_{i=1}^{c} p_i^2

- Regression:

    Gain = SD(before split) - SD(after split)

    Standard Deviation (SD) = \sqrt{\sum_i (\bar{y} - y_i)^2 / N}
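A small sketch of the standard deviation reduction for a candidate binary split (Python; the slide does not spell out how the two children are combined, so the size-weighted average below is an assumption):

    import math

    def sd(values):
        """Standard deviation: sqrt(sum((mean - y_i)^2) / N)."""
        mean = sum(values) / len(values)
        return math.sqrt(sum((mean - y) ** 2 for y in values) / len(values))

    def sd_reduction(y_all, y_left, y_right):
        """SD(before split) minus size-weighted SD(after split) -- assumed weighting."""
        n = len(y_all)
        after = (len(y_left) * sd(y_left) + len(y_right) * sd(y_right)) / n
        return sd(y_all) - after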
Numeric Handling Methods
Exhaustive Binary Tree (BINTREE; Gama et al., 2003)
- Closest implementation of a batch method
- Incrementally updates a binary tree as data is observed
- Issues: high memory cost, high cost of split search, sensitivity to data order
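A rough sketch of the incremental binary tree idea: every observed attribute value becomes a node key, and target statistics are accumulated so each key stays available as a candidate split point (Python; the node layout is my assumption, not Gama et al.'s exact structure):

    class BinTreeNode:
        """One node per distinct attribute value; stores target stats for
        instances with value <= key (illustrative sketch)."""

        def __init__(self, key):
            self.key = key
            self.n, self.sum_y, self.sum_y2 = 0, 0.0, 0.0
            self.left = self.right = None

        def insert(self, value, y):
            if value <= self.key:
                self.n += 1                    # update stats for the "<= key" side
                self.sum_y += y
                self.sum_y2 += y * y
                if value < self.key:
                    if self.left is None:
                        self.left = BinTreeNode(value)
                    else:
                        self.left.insert(value, y)
            else:
                if self.right is None:
                    self.right = BinTreeNode(value)
                else:
                    self.right.insert(value, y)

One node per distinct value is exactly why memory cost and split-search cost are listed as issues, and why the resulting tree shape depends on data order.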
Page-Hinckley Test
- The CUSUM test:

    g_0 = 0,   g_t = max(0, g_{t-1} + \epsilon_t - \upsilon)
    if g_t > h then alarm and g_t = 0

- The Page-Hinckley test:

    g_0 = 0,   g_t = g_{t-1} + (\epsilon_t - \upsilon)
    G_t = min(g_t)
    if g_t - G_t > h then alarm and g_t = 0
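A compact sketch of the Page-Hinckley detector defined above (Python; the default values of upsilon and h are illustrative):

    class PageHinckley:
        """Page-Hinckley change detector: g_t accumulates (e_t - upsilon);
        an alarm fires when g_t rises more than h above its running minimum."""

        def __init__(self, upsilon=0.005, h=50.0):
            self.upsilon = upsilon  # tolerated magnitude of change
            self.h = h              # alarm threshold
            self.g = 0.0            # g_t
            self.g_min = 0.0        # G_t = min over past g_t

        def update(self, e):
            self.g += e - self.upsilon
            self.g_min = min(self.g_min, self.g)
            if self.g - self.g_min > self.h:
                self.g = self.g_min = 0.0  # reset after the alarm
                return True                # change detected
            return False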
Lazy Methods
k Nearest Neighbours (kNN):
1. Mean value of the k nearest neighbours:

    \hat{f}(x_q) = \sum_{i=1}^{k} f(x_i) / k

2. Depends on the distance function
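A brief sketch of kNN regression (Python; Euclidean distance is one possible choice for the distance function mentioned in point 2):

    import math

    def knn_predict(query, data, k=3):
        """Return the mean target of the k training pairs (x, y) whose x is
        closest to `query` under Euclidean distance (illustrative sketch)."""
        neighbours = sorted(data, key=lambda xy: math.dist(xy[0], query))[:k]
        return sum(y for _, y in neighbours) / k

    # Example: data = [((1.0, 2.0), 3.5), ((0.5, 1.0), 2.0), ...]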