LEAST SQUARES REGRESSION
Notation:

Assumption: (Simple Linear Model, Version 1)
1. E(Y|x) = β₀ + β₁x   (linear mean function)

[Picture]

Goal: To estimate β₀ and β₁ (and later σ²) from data.

Data: (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ)
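As a concrete illustration of this set-up (not part of the original notes), here is a minimal Python sketch that simulates data from the model E(Y|x) = β₀ + β₁x. The parameter values, the sample size, and the normal errors are arbitrary choices for the demo; the Version 1 model above only specifies the mean function.

```python
import numpy as np

# Illustrative simulation of data whose mean function is E(Y|x) = beta0 + beta1*x.
# Normal errors with standard deviation sigma are an extra assumption for the demo;
# all numeric values below are made up.
rng = np.random.default_rng(0)
beta0, beta1, sigma, n = 1.0, 2.0, 0.5, 20

x = rng.uniform(0, 10, size=n)                    # predictor values x_1, ..., x_n
y = beta0 + beta1 * x + rng.normal(0, sigma, n)   # responses y_i scattered about the mean line

print(list(zip(x.round(2), y.round(2))))          # the data pairs (x_i, y_i)
```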
• The estimates of β₀ and β₁ will be denoted by β̂₀ and β̂₁, respectively. They are called the ordinary least squares (OLS) estimates of β₀ and β₁.
• Ê(Y|x) = β̂₀ + β̂₁x = ŷ
• The line y = β̂₀ + β̂₁x is called the ordinary least squares (OLS) line.
• ŷᵢ = β̂₀ + β̂₁xᵢ   (ith fitted value, or ith fit)
• êᵢ = yᵢ - ŷᵢ   (ith residual)
Set-up:
• Consider lines y = h₀ + h₁x.
• dᵢ = yᵢ - (h₀ + h₁xᵢ)
• β̂₀ and β̂₁ will be the values of h₀ and h₁ that minimize Σ dᵢ² (see the sketch below).
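A small numerical sketch of this defining property, assuming a made-up five-point dataset (illustrative only, not from the notes): it minimizes Σ dᵢ² over candidate lines (h₀, h₁) with a general-purpose optimizer and compares the result to numpy.polyfit's closed-form least squares fit.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data (made up for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Sum of squared deviations from a candidate line y = h0 + h1*x.
def sum_sq_dev(h):
    h0, h1 = h
    d = y - (h0 + h1 * x)
    return np.sum(d ** 2)

# Minimize over (h0, h1) numerically, starting from an arbitrary point.
res = minimize(sum_sq_dev, x0=[0.0, 0.0])

# Compare with the closed-form OLS fit (numpy returns slope first).
b1_hat, b0_hat = np.polyfit(x, y, 1)
print(res.x)            # numerical minimizer (h0, h1)
print(b0_hat, b1_hat)   # OLS estimates -- should agree closely
```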
More Notation:
• RSS(h₀, h₁) = Σ dᵢ²   (for Residual Sum of Squares)
• RSS = RSS(β̂₀, β̂₁) = Σ êᵢ²  -- "the" Residual Sum of Squares (i.e., the minimal residual sum of squares)

• We want to minimize the function RSS(h₀, h₁) = Σ dᵢ² = Σ [yᵢ - (h₀ + h₁xᵢ)]².
• RSS(h₀, h₁) ≥ 0
• Visually, there is no maximum. [See Demos]
• Therefore, if there is a critical point, the minimum occurs there.

To find critical points:

∂RSS/∂h₀ (h₀, h₁) = Σ 2[yᵢ - (h₀ + h₁xᵢ)](-1)
∂RSS/∂h₁ (h₀, h₁) = Σ 2[yᵢ - (h₀ + h₁xᵢ)](-xᵢ)

Solving for β̂₀ and β̂₁:

So β̂₀, β̂₁ must satisfy the normal equations

(i)  ∂RSS/∂h₀ (β̂₀, β̂₁) = Σ (-2)[yᵢ - (β̂₀ + β̂₁xᵢ)] = 0
(ii) ∂RSS/∂h₁ (β̂₀, β̂₁) = Σ (-2)[yᵢ - (β̂₀ + β̂₁xᵢ)]xᵢ = 0

Cancelling the -2's and recalling that êᵢ = yᵢ - ŷᵢ:

(i)'  Σ êᵢ = 0
(ii)' Σ êᵢxᵢ = 0

In words:
(i)'  the residuals sum to zero;
(ii)' the residuals, weighted by the xᵢ's, also sum to zero.
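To see the normal equations (i)' and (ii)' in action, here is a short check on a made-up dataset (illustrative only): both residual sums should vanish up to floating-point error.

```python
import numpy as np

# Toy data (made up for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1_hat, b0_hat = np.polyfit(x, y, 1)   # OLS slope and intercept
e_hat = y - (b0_hat + b1_hat * x)      # residuals e_i = y_i - yhat_i

# The normal equations (i)' and (ii)': both sums are zero up to rounding error.
print(np.sum(e_hat))        # ~ 0
print(np.sum(e_hat * x))    # ~ 0
```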
To solve the normal equations:

(i) ⇒ Σ yᵢ - Σ β̂₀ - β̂₁ Σ xᵢ = 0
    ⇒ nȳ - nβ̂₀ - β̂₁(nx̄) = 0
    ⇒ ȳ - β̂₀ - β̂₁x̄ = 0

Consequences:
• ȳ = β̂₀ + β̂₁x̄, which says: the point (x̄, ȳ) lies on the least squares line.
• Can solve for β̂₀ once we find β̂₁: β̂₀ = ȳ - β̂₁x̄

Visually: [Picture]

Note that (i)' implies that the sample mean of the êᵢ's is zero.

Note analogies to the bivariate normal mean line:
• (μ_X, μ_Y) lies on the (population) mean line (equation 4.14)
• α_Y|X = E(Y) - β_Y|X E(X)   (Problem 4.7)

(ii) ⇒ (substituting β̂₀ = ȳ - β̂₁x̄)
    Σ [yᵢ - (ȳ - β̂₁x̄ + β̂₁xᵢ)]xᵢ = 0
    ⇒ Σ [(yᵢ - ȳ) - β̂₁(xᵢ - x̄)]xᵢ = 0
    ⇒ Σ xᵢ(yᵢ - ȳ) - β̂₁ Σ xᵢ(xᵢ - x̄) = 0
    ⇒ β̂₁ = Σ xᵢ(yᵢ - ȳ) / Σ xᵢ(xᵢ - x̄)

Notation:
SXX = Σ xᵢ(xᵢ - x̄)
SYY = Σ yᵢ(yᵢ - ȳ)
SXY = Σ xᵢ(yᵢ - ȳ)

So for short:

β̂₁ = SXY / SXX
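A quick sketch, again on made-up data (illustrative only), computing β̂₁ = SXY/SXX and β̂₀ = ȳ - β̂₁x̄ directly from these definitional sums and comparing with numpy.polyfit.

```python
import numpy as np

# Toy data (made up for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
xbar, ybar = x.mean(), y.mean()

SXX = np.sum(x * (x - xbar))   # SXX = sum of x_i (x_i - xbar)
SXY = np.sum(x * (y - ybar))   # SXY = sum of x_i (y_i - ybar)

b1_hat = SXY / SXX             # slope from the normal equations
b0_hat = ybar - b1_hat * xbar  # intercept: (xbar, ybar) lies on the OLS line

print(b0_hat, b1_hat)
print(np.polyfit(x, y, 1))     # numpy agrees: [slope, intercept]
```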
Useful identities:
• (1) SXX = Σ (xᵢ - x̄)²
• (2) SXY = Σ (xᵢ - x̄)(yᵢ - ȳ)
• (3) SXY = Σ (xᵢ - x̄)yᵢ
• (4) SYY = Σ (yᵢ - ȳ)²

Proof of (1):
Σ (xᵢ - x̄)² = Σ [xᵢ(xᵢ - x̄) - x̄(xᵢ - x̄)]
            = Σ xᵢ(xᵢ - x̄) - x̄ Σ (xᵢ - x̄),
and
Σ (xᵢ - x̄) = Σ xᵢ - nx̄ = nx̄ - nx̄ = 0,
so Σ (xᵢ - x̄)² = Σ xᵢ(xᵢ - x̄) = SXX.
(Try the others yourself!)

Summarize:

β̂₁ = SXY / SXX
β̂₀ = ȳ - β̂₁x̄ = ȳ - (SXY/SXX)x̄

Connection with the Sample Correlation Coefficient

Recall: The sample correlation coefficient is
r = r(x, y) = ρ̂(x, y) = côv(x, y) / [sd(x)·sd(y)]
(Note: everything here is calculated from the sample.)

Note that:
côv(x, y) = 1/(n-1) Σ (xᵢ - x̄)(yᵢ - ȳ) = 1/(n-1) SXY
[sd(x)]² = 1/(n-1) Σ (xᵢ - x̄)² = 1/(n-1) SXX
[sd(y)]² = 1/(n-1) SYY   (similarly)
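These identities and the côv/sd relations are easy to check numerically; the following sketch (made-up data, illustrative only) compares the two forms of SXX and SXY and the 1/(n-1) scalings against numpy's sample covariance and variance.

```python
import numpy as np

# Toy data (made up for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)
xbar, ybar = x.mean(), y.mean()

# The two equivalent forms of SXX and SXY from the "useful identities".
print(np.sum(x * (x - xbar)), np.sum((x - xbar) ** 2))           # SXX, both forms
print(np.sum(x * (y - ybar)), np.sum((x - xbar) * (y - ybar)))   # SXY, both forms

# Sample covariance and variance are the same sums divided by n - 1.
print(np.cov(x, y)[0, 1], np.sum((x - xbar) * (y - ybar)) / (n - 1))
print(np.var(x, ddof=1), np.sum((x - xbar) ** 2) / (n - 1))
```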
Therefore:

r² = [côv(x, y)]² / ([sd(x)]²[sd(y)]²)
   = [1/(n-1) SXY]² / {[1/(n-1) SXX][1/(n-1) SYY]}
   = (SXY)² / [(SXX)(SYY)]

Also,
r·sd(y)/sd(x) = [côv(x, y) / (sd(x)·sd(y))]·[sd(y)/sd(x)]
             = côv(x, y) / [sd(x)]²
             = [1/(n-1) SXY] / [1/(n-1) SXX]
             = SXY / SXX
             = β̂₁

So:
β̂₁ = r·(s_y/s_x)
β̂₀ = ȳ - β̂₁x̄, i.e., the point (x̄, ȳ) is on the least squares line.

Recall and note the analogy: For a bivariate normal distribution,
E(Y|X = x) = α_Y|X + β_Y|X x   (equation 4.13),
where β_Y|X = ρ·(σ_y/σ_x).

[Picture]

More on r: Recall:

Fits:      ŷᵢ = β̂₀ + β̂₁xᵢ
Residuals: êᵢ = yᵢ - ŷᵢ = yᵢ - (β̂₀ + β̂₁xᵢ)

For short:
RSS = RSS(β̂₀, β̂₁) = Σ êᵢ²  -- "the" Residual Sum of Squares (i.e., the minimal residual sum of squares)
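A short check of the two relations derived above, β̂₁ = r·(s_y/s_x) and r² = (SXY)²/[(SXX)(SYY)], on made-up data (illustrative only).

```python
import numpy as np

# Toy data (made up for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
xbar, ybar = x.mean(), y.mean()

SXX = np.sum((x - xbar) ** 2)
SYY = np.sum((y - ybar) ** 2)
SXY = np.sum((x - xbar) * (y - ybar))

r = np.corrcoef(x, y)[0, 1]                   # sample correlation coefficient
sx, sy = np.std(x, ddof=1), np.std(y, ddof=1)

print(SXY / SXX, r * sy / sx)                 # the two expressions for beta1-hat agree
print(r ** 2, SXY ** 2 / (SXX * SYY))         # r^2 = (SXY)^2 / [(SXX)(SYY)]
```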
Calculate:

RSS = Σ êᵢ² = Σ [yᵢ - (β̂₀ + β̂₁xᵢ)]²
    = Σ [yᵢ - (ȳ - β̂₁x̄) - β̂₁xᵢ]²
    = Σ [(yᵢ - ȳ) - β̂₁(xᵢ - x̄)]²
    = Σ [(yᵢ - ȳ)² - 2β̂₁(xᵢ - x̄)(yᵢ - ȳ) + β̂₁²(xᵢ - x̄)²]
    = Σ (yᵢ - ȳ)² - 2β̂₁ Σ (xᵢ - x̄)(yᵢ - ȳ) + β̂₁² Σ (xᵢ - x̄)²
    = SYY - 2(SXY/SXX)·SXY + (SXY/SXX)²·SXX
    = SYY - (SXY)²/SXX
    = SYY·[1 - (SXY)²/((SXX)(SYY))]
    = SYY·(1 - r²)

Thus
1 - r² = RSS/SYY,
so
r² = 1 - RSS/SYY = (SYY - RSS)/SYY.

Interpretation:

SYY = Σ (yᵢ - ȳ)² is a measure of the total variability of the yᵢ's from ȳ.

RSS = Σ êᵢ² is a measure of the variability in y remaining after conditioning on x (i.e., after regressing on x).

So SYY - RSS is a measure of the amount of variability of y accounted for by conditioning (i.e., regressing) on x.

Thus r² = (SYY - RSS)/SYY is the proportion of the total variability in y accounted for by regressing on x.

[Picture]
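A numerical check of this identity on made-up data (illustrative only): r², 1 - RSS/SYY, and (SYY - RSS)/SYY should all agree.

```python
import numpy as np

# Toy data (made up for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1_hat, b0_hat = np.polyfit(x, y, 1)
e_hat = y - (b0_hat + b1_hat * x)

RSS = np.sum(e_hat ** 2)
SYY = np.sum((y - y.mean()) ** 2)
r = np.corrcoef(x, y)[0, 1]

print(r ** 2, 1 - RSS / SYY, (SYY - RSS) / SYY)   # all three agree
```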
Note: One can show (details left to the interested student) that SYY - RSS = Σ (ŷᵢ - ȳ)² and that the sample mean of the ŷᵢ's is ȳ, so that in fact r² = v̂ar(ŷᵢ)/v̂ar(yᵢ), the proportion of the sample variance of y accounted for by regression on x.
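For the interested student, these claims can also be checked numerically; the sketch below (made-up data, illustrative only) verifies that SYY - RSS = Σ (ŷᵢ - ȳ)², that the fits average to ȳ, and that r² equals the ratio of sample variances.

```python
import numpy as np

# Toy data (made up for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1_hat, b0_hat = np.polyfit(x, y, 1)
y_hat = b0_hat + b1_hat * x

RSS = np.sum((y - y_hat) ** 2)
SYY = np.sum((y - y.mean()) ** 2)
r = np.corrcoef(x, y)[0, 1]

print(y_hat.mean(), y.mean())                              # the fits have mean ybar
print(SYY - RSS, np.sum((y_hat - y.mean()) ** 2))          # SYY - RSS = sum of (yhat_i - ybar)^2
print(r ** 2, np.var(y_hat, ddof=1) / np.var(y, ddof=1))   # r^2 = var(yhat) / var(y)
```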