CSE 5331/7331 Fall 2007
Regression
Margaret H. Dunham
Department of Computer Science and Engineering
Southern Methodist University
Some slides extracted from Data Mining, Introductory and Advanced Topics, Prentice Hall, 2002.
CSE 5331/7331 F'07 © Prentice Hall

Table of Contents
Linear Regression
Nonlinear Regression
Logistic Regression
Metrics

Remember High School?
$y = mx + b$
You need two points to determine a straight line.
You need two points to find the values of m and b.
THIS IS REGRESSION

Regression
Predict future values based on past values.
Linear regression assumes that a linear relationship exists:
$y = c_0 + c_1 x_1 + \dots + c_n x_n$
Find the coefficient values that best fit the data.

Linear Regression
[Figure]

Linear Regression
Assume the data fit a predefined function.
Determine the best values for the regression coefficients $c_0, c_1, \dots, c_n$.
Assume an error term $e$: $y = c_0 + c_1 x_1 + \dots + c_n x_n + e$
Estimate the error using the mean squared error (MSE) over the training set.

Linear Regression Poor Fit
Why use least squares (minimizing the sum of squared errors)? http://curvefit.com/sum_of_squares.htm
A linear model doesn't always work well.

Nonlinear Regression
The data do not fit a straight line well.
Fit the data to a curve instead.
Many possible functions.
Not as easy or straightforward as linear regression.
How nonlinear regression works: http://curvefit.com/how_nonlin_works.htm

Logistic Regression
Generalized linear model
Predicts a discrete outcome:
– Binomial (binary) logistic regression
– Multinomial logistic regression
One dependent variable
Logistic Regression by Gerard E. Dallal: http://www.tufts.edu/~gdallal/logistic.htm

Logistic Regression (cont'd)
Log odds function: $\log\left(\frac{p}{1-p}\right) = c_0 + c_1 x_1$
$p$ is the probability that the outcome is 1.
Odds: the probability that the event occurs divided by the probability that it does not occur, $\frac{p}{1-p}$.
The log odds function is strictly increasing as $p$ increases.

Why Log Odds?
The shape of the curve is desirable: the resulting S-shaped (logistic) curve keeps predicted probabilities between 0 and 1.
Direct relationship to probability: $p = \frac{1}{1 + \exp(-(c_0 + c_1 x_1))}$
Range: $-\infty$ to $+\infty$

P-value
The probability, assuming the null hypothesis is true, of observing a value at least as extreme as the one actually observed.
http://en.wikipedia.org/wiki/P-value
http://sportsci.org/resource/stats/pvalues.html

Correlation
Examines the degree to which the values of two variables behave similarly.
Correlation coefficient r:
• 1 = perfect (positive) correlation
• −1 = perfect but opposite (negative) correlation
• 0 = no (linear) correlation

Covariance
The degree to which two variables vary in the same manner.
Correlation is normalized and covariance is not: $r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$
http://www.ds.unifi.it/VL/VL_EN/expect/expect3.html

Residual Error
The difference between the desired output and the predicted output.
In practice, the error is often measured as the sum of the squared residuals.
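The following is a minimal Python sketch tying the metrics slides back to linear and logistic regression. It assumes a single input variable (simple linear regression, $y = c_0 + c_1 x$) and plain lists of numbers. It fits $c_0$ and $c_1$ by least squares, computes the residual errors and their sum of squares and MSE, and derives the correlation coefficient r by normalizing the covariance. It also includes the log odds function and its inverse (the logistic function). All function names are hypothetical and illustrative, not taken from the slides.

```python
import math


def fit_simple_linear(xs, ys):
    """Least-squares estimates (c0, c1) for y = c0 + c1*x (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form minimizer of the sum of squared errors.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1


def residuals(xs, ys, c0, c1):
    """Residual error: desired output minus predicted output, per point."""
    return [y - (c0 + c1 * x) for x, y in zip(xs, ys)]


def sum_of_squares(res):
    """Sum of squared residuals (the quantity least squares minimizes)."""
    return sum(e * e for e in res)


def mean_squared_error(res):
    """MSE over the training set: sum of squares divided by the point count."""
    return sum_of_squares(res) / len(res)


def covariance(xs, ys):
    """Population covariance (divides by n); not normalized."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Correlation r: covariance normalized by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def log_odds(p):
    """log(p / (1 - p)) for 0 < p < 1; strictly increasing, range (-inf, +inf)."""
    return math.log(p / (1 - p))


def logistic(z):
    """Inverse of log_odds: maps any real z back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))


if __name__ == "__main__":
    xs = [1, 2, 3, 4]
    ys = [3, 5, 7, 9]  # these points lie exactly on y = 1 + 2x
    c0, c1 = fit_simple_linear(xs, ys)
    res = residuals(xs, ys, c0, c1)
    print(c0, c1)                     # 1.0 2.0
    print(mean_squared_error(res))    # 0.0
    print(correlation(xs, ys))        # 1.0
    print(correlation(xs, ys[::-1]))  # -1.0
    print(log_odds(0.5))              # 0.0
```

The divide-by-n convention in `covariance` cancels out in `correlation`, so r is the same whether n or n − 1 is used. For real work, Python 3.10+ provides `statistics.linear_regression`, `statistics.covariance` (which uses n − 1) and `statistics.correlation` in the standard library.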