Summer Course: Data Mining
Regression Analysis
Presenter: Georgi Nalbantov
August 2009

Structure
- Regression analysis: definition and examples
- Classical Linear Regression
- LASSO and Ridge Regression (linear and nonlinear)
- Nonparametric (local) regression estimation: kNN for regression, Decision trees, Smoothers
- Support Vector Regression (linear and nonlinear)
- Variable/feature selection (AIC, BIC, R^2-adjusted)

Feature Selection, Dimensionality Reduction, and Clustering in the KDD Process
- U. M. Fayyad, G. Piatetsky-Shapiro and P. Smyth (1995)
- [Figure: the KDD process]

Common Data Mining tasks
- Clustering, Classification and Regression [figure: illustrative scatter plots in the (X1, X2) plane]
- Example methods: k-th Nearest Neighbour, Parzen Window, Linear Discriminant Analysis, QDA, Logistic Regression (Logit), Decision Trees, SVM, NN, CART, Classical Linear Regression, Ridge Regression, LS-SVM, Unfolding, Conjoint Analysis, Cat-PCA

Linear regression analysis: examples
- [Example figures]

The Regression task
- Given data on $n$ explanatory variables and 1 explained variable, where the explained variable can take real values in $\mathbb{R}$, find a function that gives the "best" fit.
- Given: $(x_1, y_1), \ldots, (x_m, y_m)$, with $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$
- Find: $f : \mathbb{R}^n \to \mathbb{R}$
- "Best" function: one for which the expected error on unseen data $(x_{m+1}, y_{m+1}), \ldots, (x_{m+k}, y_{m+k})$ is minimal

Classical Linear Regression (OLS)
- Explanatory and response variables are numeric
- The relationship between the mean of the response variable and the level of the explanatory variable is assumed to be approximately linear (a straight line)
- Model: $Y = \beta_0 + \beta_1 x + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$
- $\beta_1 > 0$: positive association; $\beta_1 < 0$: negative association; $\beta_1 = 0$: no association

Classical Linear Regression (OLS)
- $\beta_0$: mean response when $x = 0$ (y-intercept)
- $\beta_1$: change in mean response when $x$ increases by 1 unit (slope)
- $\beta_0 + \beta_1 x$: mean response when the explanatory variable takes on the value $x$
- $\beta_0$ and $\beta_1$ are unknown population parameters (like $\mu$)
- Fitted line: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
- Task: minimize the sum of squared errors $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$

Classical Linear Regression (OLS)
- Parameter: slope in the population model ($\beta_1$)
- Estimator: least squares estimate $\hat{\beta}_1$
- Estimated standard error: $\hat{\sigma}_{\hat{\beta}_1} = s / \sqrt{S_{xx}}$, where $s^2 = \frac{SSE}{n-2} = \frac{\sum_i (y_i - \hat{y}_i)^2}{n-2}$ and $S_{xx} = \sum_i (x_i - \bar{x})^2$
- Methods of making inference regarding the population: hypothesis tests (2-sided or 1-sided) and confidence intervals
- [Example data table and figures]

Classical Linear Regression (OLS)
- Coefficient of determination ($r^2$): the proportion of variation in $y$ "explained" by the regression on $x$:
  $r^2 = \frac{S_{yy} - SSE}{S_{yy}}$, where $S_{yy} = \sum (y - \bar{y})^2$, $SSE = \sum (y - \hat{y})^2$, and $0 \le r^2 \le 1$ (illustrated in the sketch below).
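A minimal sketch of the simple OLS computations above, assuming only NumPy and a made-up (age, expenditures) data set; the numbers are purely illustrative and are not from the slides:

```python
import numpy as np

# Hypothetical data: x = age, y = expenditures (illustrative only)
x = np.array([23., 31., 38., 45., 52., 60., 67.])
y = np.array([1.9, 2.4, 2.7, 3.1, 3.2, 3.6, 3.9])

n = len(x)
x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum((x - x_bar) ** 2)
S_xy = np.sum((x - x_bar) * (y - y_bar))

beta1 = S_xy / S_xx            # least squares slope estimate
beta0 = y_bar - beta1 * x_bar  # least squares intercept estimate

y_hat = beta0 + beta1 * x
SSE = np.sum((y - y_hat) ** 2)
s = np.sqrt(SSE / (n - 2))     # residual standard error
se_beta1 = s / np.sqrt(S_xx)   # estimated standard error of the slope

S_yy = np.sum((y - y_bar) ** 2)
r2 = (S_yy - SSE) / S_yy       # coefficient of determination

print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}, "
      f"se(beta1) = {se_beta1:.3f}, r^2 = {r2:.3f}")
```

The slope estimate and its standard error are exactly the ingredients of the hypothesis tests and confidence intervals for $\beta_1$ mentioned above.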
Classical Linear Regression (OLS): Multiple regression
- Numeric response variable ($y$) and $p$ numeric predictor variables
- Model: $Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon$
- Partial regression coefficients: $\beta_i$ is the effect (on the mean response) of increasing the $i$-th predictor variable by 1 unit, holding all other predictors constant

Classical Linear Regression (OLS): Ordinary Least Squares estimation
- Population model for the mean response: $E(Y \mid x_1, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$
- Least squares fitted (predicted) equation, minimizing $SSE = \sum (Y - \hat{Y})^2$: $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p$

Classical Linear Regression (OLS): Ordinary Least Squares estimation
- Model: $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p$
- OLS estimation: $\min \; SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
- LASSO estimation: $\min \; \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$
- Ridge regression estimation: $\min \; \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$

LASSO and Ridge estimation of model coefficients
- [Figure: coefficient paths plotted against sum(|beta|)]

Nonparametric (local) regression estimation: k-NN, Decision trees, smoothers
- [Example figures]

Nonparametric (local) regression estimation: k-NN, Decision trees, smoothers
How to choose k or h?
- When k or h is small, single instances matter; bias is small, variance is large (undersmoothing): high complexity
- As k or h increases, we average over more instances; variance decreases but bias increases (oversmoothing): low complexity
- Cross-validation is used to fine-tune k or h

Linear Support Vector Regression
- [Figure: expenditures vs. age fitted with "tubes" of three different widths (small, middle-sized and biggest area); the data points that determine the fit are marked as "support vectors"]
- "Lazy case" (underfitting), "suspiciously smart case" (overfitting), "compromise case" = SVR (good generalisation)
- The thinner the "tube", the more complex the model

Nonlinear Support Vector Regression
- Map the data into a higher-dimensional space
- [Figure: nonlinear fit of expenditures vs. age]

Nonlinear Support Vector Regression: Technicalities
- The SVR function: [formula]
- To find the unknown parameters of the SVR function, solve: [optimization problem], subject to: [constraints]
- How to choose $\varepsilon$, $C$ and $\gamma$? RBF kernel: $K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$
- Find $\varepsilon$, $C$ and $\gamma$ from a cross-validation procedure

SVR Technicalities: Model Selection
- Do 5-fold cross-validation to find $C$ and $\gamma$ for several fixed values of $\varepsilon$ (see the sketch below).
- [Figure: contour plot of the cross-validated MSE over C and gamma, for epsilon = 0.15]

SVR Study: Model Training, Selection and Prediction
- [Figures: true returns (red) and raw predictions (blue); CV MSE at (IR*, HR*, CR*)]

SVR: Individual Effects
- [Figures: effect of the 3-month treasury bill on SP500; effect of credit spread on SP500; effect of VIX on SP500; effect of VIX futures on SP500]

SVR Technicalities: SVR vs. OLS
- Performance on the test set (Holiday Data, epsilon = 0.15):
- SVR: MSE = 0.04
- OLS: MSE = 0.23
- [Figures: predicted expenditures per test-set observation for the SVR and OLS solutions]

Technical Note: Number of Training Errors vs. Model Complexity
- [Figure: training and test errors as a function of model complexity, with candidate functions ordered in increasing complexity and the best trade-off marked]
- MATLAB video here…

Variable selection for regression
- Akaike Information Criterion (AIC). Final prediction error: in its standard form, $AIC = -2 \log \hat{L} + 2d$, where $\hat{L}$ is the maximized likelihood and $d$ is the number of fitted parameters.

Variable selection for regression
- Bayesian Information Criterion (BIC), also known as the Schwarz criterion. Final prediction error: in its standard form, $BIC = -2 \log \hat{L} + d \log n$.
- BIC tends to choose simpler models than AIC.

Variable selection for regression
- $R^2$-adjusted: $R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, where $p$ is the number of predictors (see the sketch after the references).

Conclusion / Summary / References
- Classical Linear Regression: any introductory statistical/econometric book
- LASSO and Ridge Regression (linear and nonlinear): http://www-stat.stanford.edu/~tibs/lasso.html ; Bishop, 2006
- Nonparametric (local) regression estimation (kNN for regression, Decision trees, Smoothers): Alpaydin, 2004; Hastie et al., 2001
- Support Vector Regression (linear and nonlinear): Smola and Schoelkopf, 2003
- Variable/feature selection (AIC, BIC, R^2-adjusted): Hastie et al., 2001; any statistical/econometric book
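To close the variable-selection discussion, here is a small sketch (not from the slides) that computes AIC, BIC and adjusted $R^2$ for an OLS fit, using the standard Gaussian-error forms of the criteria:

```python
import numpy as np

def selection_criteria(y, y_hat, p):
    """AIC, BIC and adjusted R^2 for an OLS fit with p predictors,
    using the standard Gaussian-error expressions (up to additive constants)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)
    syy = np.sum((y - y.mean()) ** 2)
    d = p + 1                                   # parameters incl. intercept
    aic = n * np.log(sse / n) + 2 * d           # Akaike Information Criterion
    bic = n * np.log(sse / n) + d * np.log(n)   # Bayesian (Schwarz) criterion
    r2 = 1 - sse / syy
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return aic, bic, r2_adj
```

Candidate predictor subsets are compared on these numbers (lower AIC/BIC, higher adjusted $R^2$); the heavier $\log n$ penalty in BIC is what makes it prefer simpler models than AIC.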