Regression Analysis
• GOAL: Determine a function that “best” maps a set of inputs to corresponding outputs.
• Regression is one form of data mining.
• Regression both describes the data (finds human-interpretable patterns that describe the data) and, through interpolation and extrapolation, predicts unknown or future values of variables.
• Regression analysis relies upon the concept of minimizing the sum of the squares of the errors.
Regression Analysis: The problem
• Given a set of inputs x, corresponding outputs y, and a desired parameterized function type (e.g., linear, quadratic, exponential, logarithmic, Gaussian), determine the curve of best fit by finding the parameter values that minimize the sum of the squares of the errors in the data.
Linear Regression
• We will illustrate the concept beginning with the simplest regression problem type: fitting a line to a set of single-variable inputs x and corresponding single-variable outputs y.
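Linear Regression: Deriving the Equations
• Given the data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we wish to determine the values of the parameters $\hat{m}$ and $\hat{b}$ for which the line $\hat{y} = \hat{m}x + \hat{b}$ minimizes the sum

$$\sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2$$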
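A minimal sketch of the objective in plain Python. The function name `sum_squared_errors` and the sample points are illustrative assumptions, not part of the slides; the function simply evaluates the sum above for a candidate $\hat{m}$ and $\hat{b}$.

```python
def sum_squared_errors(m, b, xs, ys):
    """Return the sum over i of (m*x_i + b - y_i)**2 for the line y_hat = m*x + b."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(sum_squared_errors(2.0, 0.0, xs, ys))  # 0.0: the line y = 2x passes through every point
print(sum_squared_errors(1.0, 0.0, xs, ys))  # 30.0: errors -1, -2, -3, -4 squared and summed
```

Least-squares regression searches for the $(\hat{m}, \hat{b})$ that makes this number as small as possible.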
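• Substituting, we wish to minimize the function

$$f(\hat{m}, \hat{b}) = \sum_{i=1}^{n} \left(\hat{m}x_i + \hat{b} - y_i\right)^2$$

• This occurs where

$$\frac{\partial f(\hat{m}, \hat{b})}{\partial \hat{m}} = 0 \quad \text{and} \quad \frac{\partial f(\hat{m}, \hat{b})}{\partial \hat{b}} = 0$$

• That is, where

$$\sum_{i=1}^{n} 2x_i\left(\hat{m}x_i + \hat{b} - y_i\right) = 0 \quad \text{and} \quad \sum_{i=1}^{n} 2\left(\hat{m}x_i + \hat{b} - y_i\right) = 0$$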
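To sanity-check the two partial derivatives, the sketch below (hypothetical helper names, illustrative data) compares the analytic expressions with central finite differences of $f(\hat{m}, \hat{b})$. Because f is quadratic in both parameters, the two should agree up to floating-point rounding.

```python
def sse(m, b, xs, ys):
    # f(m, b) = sum of (m*x_i + b - y_i)**2
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys))


def sse_gradient(m, b, xs, ys):
    # Analytic partials: df/dm = sum 2*x_i*(m*x_i + b - y_i), df/db = sum 2*(m*x_i + b - y_i)
    residuals = [m * x + b - y for x, y in zip(xs, ys)]
    d_m = sum(2 * x * r for x, r in zip(xs, residuals))
    d_b = sum(2 * r for r in residuals)
    return d_m, d_b


def numeric_gradient(m, b, xs, ys, h=1e-6):
    # Central finite differences, used only as an independent check of the formulas.
    d_m = (sse(m + h, b, xs, ys) - sse(m - h, b, xs, ys)) / (2 * h)
    d_b = (sse(m, b + h, xs, ys) - sse(m, b - h, xs, ys)) / (2 * h)
    return d_m, d_b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 5.0]
print(sse_gradient(0.5, 0.25, xs, ys))      # (-27.0, -14.0)
print(numeric_gradient(0.5, 0.25, xs, ys))  # approximately the same pair
```

Both partials are negative at this point, so increasing $\hat{m}$ or $\hat{b}$ would reduce the error; the minimum is where both vanish.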
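• Or, after dividing each equation by 2 and distributing the sums,

$$\hat{m}\sum_{i=1}^{n} x_i^2 + \hat{b}\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i y_i = 0$$

$$\hat{m}\sum_{i=1}^{n} x_i + n\hat{b} - \sum_{i=1}^{n} y_i = 0$$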
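Since both equations are linear in $\hat{m}$ and $\hat{b}$, they can be collected into a single 2×2 linear system. This is only a restatement of the two equations above in matrix form:

$$\begin{bmatrix} \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & n \end{bmatrix} \begin{bmatrix} \hat{m} \\ \hat{b} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} y_i \end{bmatrix}$$

The system has a unique solution exactly when its determinant, $n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2$, is nonzero, which holds whenever the $x_i$ are not all equal.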
• Solving, we have

$$\hat{b} = \frac{1}{n}\left[\sum_{i=1}^{n} y_i - \hat{m}\sum_{i=1}^{n} x_i\right]$$

• where

$$\hat{m} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
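A direct transcription of the closed-form formulas into plain Python, as a sketch. The name `fit_line` and the sample data are assumptions for illustration; $\hat{m}$ is computed first and then substituted into the formula for $\hat{b}$, mirroring the order of the derivation.

```python
def fit_line(xs, ys):
    """Least-squares line y_hat = m_hat*x + b_hat using the closed-form formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m_hat = (n * sum_xy - sum_x * sum_y) / denom
    b_hat = (sum_y - m_hat * sum_x) / n
    return m_hat, b_hat


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
m_hat, b_hat = fit_line(xs, ys)
print(f"y_hat = {m_hat:.3f} x + {b_hat:.3f}")  # prints: y_hat = 1.960 x + 0.220
```

The raw-sum form matches the slide, but for very large or tightly clustered x values it can lose precision to cancellation; subtracting the means first is a common, algebraically equivalent alternative.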
• Pearson Correlation Coefficient for Linear Regression:

$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\,\sqrt{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}}, \qquad -1 \le r \le 1$$
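A plain-Python sketch of the correlation formula above (the name `pearson_r` and the test data are illustrative). The two radicands are multiplied under a single square root, which gives the same value as the product of two roots; with integer inputs the perfectly linear examples then come out exactly as ±1.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient using the raw-sum formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    # Product of both radicands under one square root.
    radicand = (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
    if radicand <= 0:
        raise ValueError("r is undefined when all x values or all y values are identical")
    return numerator / math.sqrt(radicand)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfectly increasing line)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfectly decreasing line)
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
```

Note that the numerator is the same expression as the numerator of $\hat{m}$, so r and $\hat{m}$ always have the same sign.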
Linear Regression: Analysis
• r values “close” to +1 indicate a strong positive correlation.
• r values “close” to -1 indicate a strong negative correlation.
• r values “close” to 0 indicate little linear correlation.
• The “coefficient of determination”, $r^2$, represents the proportion of the variation in y that is explained by the linear relationship between x and y given by the regression equation.
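A sketch that makes the “proportion of the variation” reading concrete. It assumes the usual definitions $SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ (total variation) and $SS_{res} = \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$ (unexplained variation), for which $r^2 = 1 - SS_{res}/SS_{tot}$ holds for a least-squares line with an intercept. The function names and data are hypothetical; the code computes both sides so they can be compared.

```python
import math


def centered_sums(xs, ys):
    """Return the means and the centered sums S_xx, S_yy, S_xy."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_x, mean_y, s_xx, s_yy, s_xy


def r_squared_from_residuals(xs, ys):
    """1 - SS_res / SS_tot for the least-squares line through the data."""
    mean_x, mean_y, s_xx, s_yy, s_xy = centered_sums(xs, ys)
    m_hat = s_xy / s_xx              # algebraically equal to the closed-form slope
    b_hat = mean_y - m_hat * mean_x  # algebraically equal to the closed-form intercept
    ss_res = sum((m_hat * x + b_hat - y) ** 2 for x, y in zip(xs, ys))  # unexplained
    ss_tot = s_yy                                                        # total
    return 1 - ss_res / ss_tot


def pearson_r(xs, ys):
    _, _, s_xx, s_yy, s_xy = centered_sums(xs, ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
print(r_squared_from_residuals(xs, ys), pearson_r(xs, ys) ** 2)  # the two values agree
```

Here the line explains nearly all of the variation in y, so both values are just under 1.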
Linear Regression: Excel
• Demonstration
Regression Extensions
• Multi-regression: Extension of regression to multiple inputs.
• Non-linear regression: Replacing the linear relationship between inputs x and outputs y with a non-linear relationship $\hat{y} = f(x; a_1, a_2, \ldots, a_m)$, where the $a_i$ are the parameters of f.
• Similar equation derivations: For each $1 \le j \le m$, we have

$$\frac{\partial}{\partial a_j}\left(\sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2\right) = 0$$
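The slides do not spell out the multi-regression equations. As a hedged sketch, assume a model that is still linear in its parameters, $\hat{y} = a_0 + a_1 x_1 + \cdots + a_k x_k$. Setting each partial derivative to zero then gives the linear system $(X^T X)\,a = X^T y$, where each row of the design matrix X is $(1, x_1, \ldots, x_k)$ for one data point. The code below (plain Python, hypothetical function names, data generated from an exact plane) builds and solves that system with Gaussian elimination.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b for square a by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented copy [a | b] so the caller's lists are not modified.
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system (inputs may be collinear)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every row below the pivot row.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_multiple(rows, ys):
    """Least-squares coefficients [a0, a1, ..., ak] for y_hat = a0 + a1*x1 + ... + ak*xk."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 carries the intercept a0
    p = len(design[0])
    # Normal equations: (X^T X) a = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]  # exact plane y = 1 + 2*x1 - 3*x2
print(fit_multiple(rows, ys))  # approximately [1.0, 2.0, -3.0]
```

For a genuinely non-linear f, the equations $\partial/\partial a_j = 0$ are generally non-linear in the $a_j$ and are usually solved iteratively (for example with SciPy's `scipy.optimize.curve_fit`) rather than in closed form. In practice, library solvers such as NumPy's `numpy.linalg.lstsq` are also preferred over forming $X^T X$ explicitly, which can be numerically ill-conditioned.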
• Certain non-linear relations can be solved via linear regression upon logarithms of the variables:
• Power laws: $y = Cx^p$. Taking logarithms of both sides yields

$$\ln y = \ln C + \ln\left(x^p\right) = \ln C + p\ln x,$$

which is linear in $\ln x$ and $\ln y$.
• Logarithmic laws: $y = m\log x + b$, which is linear in $\log x$ and $y$.
• Exponential laws: $y = Ce^{rx} + b$. Subtracting b (assumed known) and taking natural logarithms of both sides gives

$$\ln(y - b) = \ln C + rx,$$

which is linear in $\ln(y - b)$ and $x$.
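A sketch of the log-transform trick in plain Python, assuming positive data (and, for the exponential law, a known offset b with $y - b > 0$). The helper names `ols_line`, `fit_power_law`, and `fit_exponential` are hypothetical; each transforms the variables, runs an ordinary straight-line fit, and maps the fitted intercept back through `math.exp`.

```python
import math


def ols_line(us, vs):
    """Ordinary least-squares slope and intercept for v_hat = slope*u + intercept."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    s_uv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    s_uu = sum((u - mean_u) ** 2 for u in us)
    slope = s_uv / s_uu
    return slope, mean_v - slope * mean_u


def fit_power_law(xs, ys):
    """Fit y = C * x**p by regressing ln y on ln x (requires x > 0 and y > 0)."""
    p, ln_c = ols_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(ln_c), p


def fit_exponential(xs, ys, b):
    """Fit y = C * exp(r*x) + b for a known offset b by regressing ln(y - b) on x."""
    r, ln_c = ols_line(xs, [math.log(y - b) for y in ys])
    return math.exp(ln_c), r


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(fit_power_law(xs, [3.0 * x ** 1.5 for x in xs]))  # close to (3.0, 1.5)
print(fit_exponential(xs, [2.0 * math.exp(0.4 * x) + 1.0 for x in xs], b=1.0))  # close to (2.0, 0.4)
```

The logarithmic law needs no transformation of y: regressing y on $\log x$ (for example `ols_line([math.log(x) for x in xs], ys)`) gives m and b directly. Note that these fits minimize squared errors of the transformed values, such as $\ln y$, not of y itself, so on noisy data they can differ from a direct non-linear least-squares fit.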