Regression Analysis
• GOAL: Determine a function that "best" maps a set of inputs to corresponding outputs.
• Regression is one form of data mining.
• Regression both describes the data (finds human-interpretable patterns that describe the data) and, through interpolation and extrapolation, predicts unknown or future values of variables.
• Regression analysis relies upon the concept of minimizing the sum of the squares of the errors.

Regression Analysis: The Problem
• Given: a set of inputs x, corresponding outputs y, and a desired parameterized function type (e.g. linear, quadratic, exponential, logarithmic, Gaussian, etc.), determine the curve of best fit by finding the parameter values that minimize the sum of the squares of the errors in the data.

Linear Regression
• We will illustrate the concept beginning with the simplest regression problem type: fitting a line to a set of single-variable inputs x and corresponding single-variable outputs y.

Linear Regression: Deriving the Equations
• Given the data $(x_i, y_i)$, $i = 1, \dots, n$, we wish to determine the values of the parameters $\hat{m}$ and $\hat{b}$ for which the line $\hat{y} = \hat{m}x + \hat{b}$ minimizes the sum

$$\sum_{i=1}^{n} (\hat{y}_i - y_i)^2$$

• Substituting, we wish to minimize the function

$$f(\hat{m}, \hat{b}) = \sum_{i=1}^{n} (\hat{m}x_i + \hat{b} - y_i)^2$$

• This occurs where

$$\frac{\partial f(\hat{m}, \hat{b})}{\partial \hat{m}} = 0 \quad \text{and} \quad \frac{\partial f(\hat{m}, \hat{b})}{\partial \hat{b}} = 0$$

• That is, where

$$\sum_{i=1}^{n} 2x_i(\hat{m}x_i + \hat{b} - y_i) = 0 \quad \text{and} \quad \sum_{i=1}^{n} 2(\hat{m}x_i + \hat{b} - y_i) = 0$$

• Or, dividing by 2 and separating the sums,

$$\hat{m}\sum_{i=1}^{n} x_i^2 + \hat{b}\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i y_i = 0$$

$$\hat{m}\sum_{i=1}^{n} x_i + n\hat{b} - \sum_{i=1}^{n} y_i = 0$$

• Solving, we have

$$\hat{b} = \frac{1}{n}\left[\sum_{i=1}^{n} y_i - \hat{m}\sum_{i=1}^{n} x_i\right]$$

• where

$$\hat{m} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$

• Pearson Correlation Coefficient for Linear Regression:

$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\,\sqrt{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}}, \qquad -1 \le r \le 1$$

Linear Regression: Analysis
• r values "close" to +1 indicate a strong positive correlation
• r values "close" to -1 indicate a strong negative correlation
• r values "close" to 0 indicate little linear correlation
• The "coefficient of determination", $r^2$, represents the proportion of the variation in y that can be explained by the linear relationship between x and y provided by the regression equation.

Linear Regression: Excel
• Demonstration

Regression Extensions
• Multi-regression: Extension of regression to multiple inputs
• Non-linear regression: Replacing the linear relationship between inputs x and outputs y with a non-linear relationship $\hat{y} = f(x; a_1, \dots, a_m)$, where the $a_i$ are the parameters of f.
• Similar equation derivations: For each $1 \le j \le m$, we have

$$\frac{\partial}{\partial a_j}\left(\sum_{i=1}^{n} (\hat{y}_i - y_i)^2\right) = 0$$

• Certain non-linear relations can be solved via linear regression upon logarithms of the variables
• Power laws: $y = Cx^p$
• Taking logarithms of both sides yields $\ln y = \ln C + \ln(x^p) = \ln C + p \ln x$, which is linear in $\ln x$ and $\ln y$
• Logarithmic laws: $y = m \log x + b$, which is linear in $\log x$ and y
• Exponential laws: $y = Ce^{rx} + b$
• Subtracting b (which must be known) and taking logarithms of both sides, $\log(y - b) = \log C + rx$, which is linear in $\log(y - b)$ and x.
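
The power-law reduction can be sketched in a few lines of Python by reusing the closed-form sums for the slope, intercept, and r. This is a minimal illustration, not the Excel demonstration from the slides: the function names `fit_line` and `fit_power_law` and the sample data are hypothetical, chosen only for this example, and the power-law fit assumes every x and y is strictly positive so that the logarithms exist.

```python
import math

def fit_line(xs, ys):
    """Least-squares line y = m*x + b plus Pearson r, from the closed-form sums."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))

    denom_x = n * sxx - sx * sx   # n*sum(x^2) - (sum x)^2
    denom_y = n * syy - sy * sy   # n*sum(y^2) - (sum y)^2
    if denom_x == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    m = (n * sxy - sx * sy) / denom_x
    b = (sy - m * sx) / n
    # r is undefined when y is constant; returning NaN is a choice made for this sketch.
    if denom_y > 0:
        r = (n * sxy - sx * sy) / math.sqrt(denom_x * denom_y)
    else:
        r = float("nan")
    return m, b, r

def fit_power_law(xs, ys):
    """Fit y = C * x**p by fitting a line to (ln x, ln y). Assumes x, y > 0."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    p, ln_c, r = fit_line(log_x, log_y)   # slope is p, intercept is ln C
    return math.exp(ln_c), p, r

# Made-up data lying exactly on y = 2x + 1.
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))   # (2.0, 1.0, 1.0)

# Made-up data lying on y = 3 * x**2; the fit should recover C near 3 and p near 2.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 2 for x in xs]
print(fit_power_law(xs, ys))
```

Note that the r returned by `fit_power_law` measures the linear correlation between ln x and ln y, not between x and y, and that minimizing squared errors in log space is not the same as minimizing them in the original units.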