Summer Course: Data Mining
Regression Analysis
Presenter: Georgi Nalbantov
August 2009
Structure
• Regression analysis: definition and examples
• Classical Linear Regression
• LASSO and Ridge Regression (linear and nonlinear)
• Nonparametric (local) regression estimation: kNN for regression, Decision trees, Smoothers
• Support Vector Regression (linear and nonlinear)
• Variable/feature selection (AIC, BIC, R^2-adjusted)
Feature Selection, Dimensionality Reduction, and Clustering in the KDD Process
U. M. Fayyad, G. Piatetsky-Shapiro and P. Smyth (1995)
Common Data Mining tasks
[Figure: example scatter plots illustrating the Clustering, Classification, and Regression tasks, plotted against axes X1 and X2]
• k-th Nearest Neighbour
• Linear Discriminant Analysis, QDA
• Classical Linear Regression
• Parzen Window
• Logistic Regression (Logit)
• Ridge Regression
• Unfolding, Conjoint Analysis, Cat-PCA
• Decision Trees, LSSVM, NN, VS
• NN, CART
Linear regression analysis: examples
The Regression task
• Given data on m explanatory variables and 1 explained variable, where the explained variable can take real values in $\mathbb{R}^1$, find a function that gives the "best" fit:
  Given: $(x_1, y_1), \ldots, (x_m, y_m)$
  Find: $f : X \subseteq \mathbb{R}^n \to \mathbb{R}^1$
• "Best function" = the expected error on unseen data $(x_{m+1}, y_{m+1}), \ldots, (x_{m+k}, y_{m+k})$ is minimal
Classical Linear Regression (OLS)
• Explanatory and response variables are numeric
• The relationship between the mean of the response variable and the level of the explanatory variable is assumed to be approximately linear (a straight line)
• Model: $Y = \beta_0 + \beta_1 x + \varepsilon$, with $\varepsilon \sim N(0, \sigma)$
  • $\beta_1 > 0$: positive association
  • $\beta_1 < 0$: negative association
  • $\beta_1 = 0$: no association
Classical Linear Regression (OLS)
• $\beta_0$: mean response when $x = 0$ (y-intercept)
• $\beta_1$: change in mean response when $x$ increases by 1 unit (slope)
• $\beta_0, \beta_1$ are unknown population parameters (like $\mu$)
• $\beta_0 + \beta_1 x$: mean response when the explanatory variable takes on the value $x$
• Fitted line: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
• Task: minimize the sum of squared errors:
  $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left( y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \right)^2$
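As a quick illustration (not part of the original slides), a minimal NumPy sketch of the closed-form least-squares fit on a small hypothetical data set:

    import numpy as np

    # Hypothetical data (for illustration only)
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([2.1, 2.9, 3.6, 4.4, 5.2, 5.8])

    # Closed-form OLS estimates that minimize SSE
    x_bar, y_bar = x.mean(), y.mean()
    Sxx = np.sum((x - x_bar) ** 2)
    Sxy = np.sum((x - x_bar) * (y - y_bar))
    beta1_hat = Sxy / Sxx                  # slope estimate
    beta0_hat = y_bar - beta1_hat * x_bar  # intercept estimate

    y_hat = beta0_hat + beta1_hat * x
    SSE = np.sum((y - y_hat) ** 2)
    print(beta0_hat, beta1_hat, SSE)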
Classical Linear Regression (OLS)
• Parameter: slope in the population model ($\beta_1$)
• Estimator: least squares estimate $\hat{\beta}_1$
• Estimated standard error: $\hat{\sigma}_{\hat{\beta}_1} = s / \sqrt{S_{xx}}$, where
  $s^2 = \dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2} = \dfrac{SSE}{n - 2}$, $\quad S_{xx} = \sum (x - \bar{x})^2$, $\quad \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
• Methods of making inference regarding the population:
  • Hypothesis tests (2-sided or 1-sided)
  • Confidence intervals
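A minimal sketch (assuming the same hypothetical data as above) of the slope inference: the standard error, a two-sided t-test of beta1 = 0, and a 95% confidence interval. scipy.stats supplies the t distribution:

    import numpy as np
    from scipy import stats

    # Hypothetical data (same toy example as before)
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([2.1, 2.9, 3.6, 4.4, 5.2, 5.8])
    n = len(x)

    x_bar, y_bar = x.mean(), y.mean()
    Sxx = np.sum((x - x_bar) ** 2)
    beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / Sxx
    beta0_hat = y_bar - beta1_hat * x_bar
    y_hat = beta0_hat + beta1_hat * x

    s2 = np.sum((y - y_hat) ** 2) / (n - 2)   # s^2 = SSE / (n - 2)
    se_beta1 = np.sqrt(s2 / Sxx)              # estimated standard error of the slope

    t_stat = beta1_hat / se_beta1             # H0: beta1 = 0 (2-sided test)
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
    t_crit = stats.t.ppf(0.975, df=n - 2)
    ci = (beta1_hat - t_crit * se_beta1, beta1_hat + t_crit * se_beta1)
    print(t_stat, p_value, ci)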
Classical Linear Regression (OLS)
• Coefficient of determination ($r^2$): proportion of the variation in y "explained" by the regression on x.
  $r^2 = \dfrac{S_{yy} - SSE}{S_{yy}}$, $\quad 0 \le r^2 \le 1$,
  where $S_{yy} = \sum (y - \bar{y})^2$ and $SSE = \sum (y - \hat{y})^2$
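For completeness, a tiny sketch of computing r^2 from S_yy and SSE; the observed and fitted values below are hypothetical:

    import numpy as np

    # Hypothetical observed values and fitted values from some regression
    y     = np.array([2.1, 2.9, 3.6, 4.4, 5.2, 5.8])
    y_hat = np.array([2.2, 2.9, 3.7, 4.3, 5.1, 5.9])

    Syy = np.sum((y - y.mean()) ** 2)   # total variation in y
    SSE = np.sum((y - y_hat) ** 2)      # variation left unexplained
    r2 = (Syy - SSE) / Syy
    print(r2)                           # close to 1 for this toy example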
Classical Linear Regression (OLS): Multiple regression
• Numeric response variable (y)
• p numeric predictor variables
• Model: $Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon$
• Partial regression coefficients: $\beta_i$ = effect (on the mean response) of increasing the i-th predictor variable by 1 unit, holding all other predictors constant
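A minimal sketch of fitting such a multiple regression with NumPy; the design matrix X and response y are hypothetical:

    import numpy as np

    # Hypothetical data: n = 6 observations, p = 2 predictors
    X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
                  [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
    y = np.array([3.0, 3.5, 6.0, 6.5, 9.0, 9.5])

    # Add an intercept column and solve the least-squares problem
    X1 = np.column_stack([np.ones(len(y)), X])
    beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
    print(beta_hat)   # [beta0_hat, beta1_hat, beta2_hat]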
Classical Linear Regression (OLS): Ordinary Least Squares estimation
• Population model for the mean response:
  $E(Y \mid x_1, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$
• Least squares fitted (predicted) equation, minimizing SSE:
  $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p$, $\quad SSE = \sum \left( Y - \hat{Y} \right)^2$
Classical Linear Regression (OLS): Ordinary Least Squares estimation
• Model: $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p$
• OLS estimation: $\min \; SSE = \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2$
• LASSO estimation: $\min \; \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$
• Ridge regression estimation: $\min \; \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$
  (here $\lambda \ge 0$ is a tuning parameter controlling the amount of shrinkage; $\lambda = 0$ recovers OLS)
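As an illustrative sketch (not from the slides), the three estimators side by side in scikit-learn; the data and the penalty values (alpha) are hypothetical:

    import numpy as np
    from sklearn.linear_model import LinearRegression, Lasso, Ridge

    # Hypothetical data: 100 observations, 5 predictors, only 2 truly relevant
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

    ols = LinearRegression().fit(X, y)
    lasso = Lasso(alpha=0.1).fit(X, y)    # L1 penalty: some coefficients shrink to exactly 0
    ridge = Ridge(alpha=1.0).fit(X, y)    # L2 penalty: coefficients shrink but stay nonzero

    print("OLS:  ", ols.coef_)
    print("LASSO:", lasso.coef_)
    print("Ridge:", ridge.coef_)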
LASSO and Ridge estimation of model coefficients
[Figure: estimated coefficients plotted against sum(|beta|) for LASSO and for Ridge regression]
Nonparametric (local) regression estimation:
k-NN, Decision trees, smoothers
How to Choose k or h?
• When k or h is small, single instances matter; bias is small, variance is large (undersmoothing): high complexity
• As k or h increases, we average over more instances and variance decreases but bias increases (oversmoothing): low complexity
• Cross-validation is used to fine-tune k or h.
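A minimal sketch of choosing k by cross-validation for k-NN regression with scikit-learn; the data and the candidate grid of k values are hypothetical:

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.model_selection import GridSearchCV

    # Hypothetical 1-D regression data
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(200, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

    # Small k: low bias, high variance; large k: high bias, low variance
    grid = GridSearchCV(KNeighborsRegressor(),
                        param_grid={"n_neighbors": [1, 3, 5, 10, 20, 50]},
                        cv=5, scoring="neg_mean_squared_error")
    grid.fit(X, y)
    print(grid.best_params_)   # k with the lowest cross-validated MSE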
Linear Support Vector Regression
[Figure: three linear fits of Expenditures vs. Age with epsilon-tubes of different widths (small, middle-sized, and biggest area); the points on the edge of the tube are the "support vectors"]
• "Lazy case" (underfitting)
• "Suspiciously smart case" (overfitting)
• "Compromise case", SVR (good generalisation)
The thinner the "tube", the more complex the model.
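A sketch of a linear epsilon-insensitive SVR fit with scikit-learn; the Age/Expenditures data are hypothetical, and epsilon controls the width of the tube:

    import numpy as np
    from sklearn.svm import SVR

    # Hypothetical Age (years) vs. Expenditures data
    age = np.array([[18], [25], [31], [38], [45], [52], [60], [67]], dtype=float)
    expenditures = np.array([1.2, 1.7, 2.1, 2.6, 3.0, 3.3, 3.8, 4.1])

    # Linear SVR: errors inside the epsilon-tube are ignored; C trades off
    # flatness of the fitted function against tube violations
    model = SVR(kernel="linear", C=1.0, epsilon=0.2)
    model.fit(age, expenditures)

    print(model.coef_, model.intercept_)
    print("support vectors:", model.support_)   # indices of points on or outside the tube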
Nonlinear Support Vector Regression
• Map the data into a higher-dimensional space:
[Figure: nonlinear relation between Expenditures and Age, fitted by mapping the data into a higher-dimensional space]
Nonlinear Support Vector Regression: Technicalities
• The SVR function (see the standard form sketched below):
• To find the unknown parameters of the SVR function, solve the SVR optimisation problem, subject to its constraints.
• How to choose C, epsilon, and the kernel? E.g., the RBF kernel.
• Find C, epsilon, and the kernel parameter from a cross-validation procedure.
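The slide's formulas did not survive extraction; for reference, the standard epsilon-insensitive SVR formulation (following Smola and Schoelkopf, 2003, cited in the references) is:

$f(x) = \langle w, \phi(x) \rangle + b$

$\min_{w, b, \xi, \xi^*} \; \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} (\xi_i + \xi_i^*)$

subject to $\; y_i - \langle w, \phi(x_i) \rangle - b \le \varepsilon + \xi_i, \quad \langle w, \phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0$,

with, for example, the RBF kernel $k(x, x') = \exp(-\gamma \|x - x'\|^2)$.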
SVR Technicalities: Model Selection
• Do 5-fold cross-validation to find C and gamma for several fixed values of epsilon.
[Figure: cross-validated MSE (CVMSE) as a function of C and gamma, shown as a surface and as a contour plot for epsilon = 0.15; CVMSE values range from about 0.0588 to 0.064]
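A sketch of that 5-fold cross-validation step with scikit-learn; the data and the C/gamma grids are hypothetical, and epsilon is fixed at 0.15 as on the slide:

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.model_selection import GridSearchCV

    # Hypothetical training data
    rng = np.random.default_rng(0)
    X = rng.normal(size=(150, 4))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=150)

    # 5-fold CV over C and gamma, with epsilon fixed at 0.15
    grid = GridSearchCV(SVR(kernel="rbf", epsilon=0.15),
                        param_grid={"C": [0.1, 1, 5, 10, 15],
                                    "gamma": [0.005, 0.01, 0.02, 0.05]},
                        cv=5, scoring="neg_mean_squared_error")
    grid.fit(X, y)
    print(grid.best_params_, -grid.best_score_)   # best (C, gamma) and its CV MSE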
SVR Study: Model Training, Selection and Prediction
[Figure: true returns (red) and raw predictions (blue); CVMSE as a function of IR*, HR*, CR*]
SVR: Individual Effects
[Figure: estimated individual effects on the SP500 of the credit spread, the 3-month Treasury bill, the VIX, and VIX futures]
SVR Technicalities: SVR vs. OLS
• Performance on the test set (Holiday Data, epsilon = 0.15): SVR, MSE = 0.04
• Performance on the test set (Holiday Data, OLS solution): OLS, MSE = 0.23
[Figure: Expenditures vs. observation number on the test set, with SVR and OLS predictions]
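A sketch of how such a comparison can be run (not the original Holiday Data, which are not included in the transcript): fit SVR and OLS on a training set and compare MSE on a held-out test set.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.svm import SVR
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    # Hypothetical expenditure data with a nonlinear age effect
    rng = np.random.default_rng(0)
    age = rng.uniform(18, 70, size=(300, 1))
    expenditures = 2.0 + 1.5 * np.sin(age[:, 0] / 10.0) + rng.normal(scale=0.2, size=300)

    X_train, X_test, y_train, y_test = train_test_split(age, expenditures,
                                                        test_size=40, random_state=0)

    svr = SVR(kernel="rbf", C=10, epsilon=0.15).fit(X_train, y_train)
    ols = LinearRegression().fit(X_train, y_train)

    print("SVR test MSE:", mean_squared_error(y_test, svr.predict(X_test)))
    print("OLS test MSE:", mean_squared_error(y_test, ols.predict(X_test)))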
Technical Note: Number of Training Errors vs. Model Complexity
[Figure: minimum number of training errors and number of test errors plotted against model complexity, for functions ordered in increasing complexity; the "best trade-off" point is marked]
MATLAB video here…
Variable selection for regression
• Akaike Information Criterion (AIC). Final prediction error:
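The formula on the original slide did not survive extraction. For a linear model with p fitted parameters, n observations, and Gaussian errors, a commonly used form (up to an additive constant) is:

$\mathrm{AIC} = n \ln\!\left(\dfrac{SSE}{n}\right) + 2p$

or, in general, $\mathrm{AIC} = -2 \ln \hat{L} + 2p$, where $\hat{L}$ is the maximised likelihood.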
Variable selection for regression
• Bayesian Information Criterion (BIC), also known as the Schwarz criterion. Final prediction error:
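The formula is likewise missing from the transcript; the standard form in the same setting is:

$\mathrm{BIC} = n \ln\!\left(\dfrac{SSE}{n}\right) + p \ln n$

or, in general, $\mathrm{BIC} = -2 \ln \hat{L} + p \ln n$.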
BIC tends to choose simpler models than AIC.
Variable selection for regression
• R^2-adjusted:
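The slide's formula is not preserved; the standard definition for a model with p predictors and n observations is:

$R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\dfrac{n - 1}{n - p - 1}$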
Conclusion / Summary / References
• Classical Linear Regression
  (any introductory statistical/econometric book)
• LASSO and Ridge Regression (linear and nonlinear)
  http://www-stat.stanford.edu/~tibs/lasso.html , Bishop, 2006
• Nonparametric (local) regression estimation: kNN for regression, Decision trees, Smoothers
  Alpaydin, 2004; Hastie et al., 2001
• Support Vector Regression (linear and nonlinear)
  Smola and Schoelkopf, 2003
• Variable/feature selection (AIC, BIC, R^2-adjusted)
  Hastie et al., 2001; (any statistical/econometric book)