1 LEAST SQUARES REGRESSION

Assumption (Simple Linear Model, Version 1):

1. $E(Y|x) = \beta_0 + \beta_1 x$ (linear mean function)

[Picture]

Goal: To estimate $\beta_0$ and $\beta_1$ (and later $\sigma^2$) from data.

Data: $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$.

Notation:

• The estimates of $\beta_0$ and $\beta_1$ will be denoted by $\hat\beta_0$ and $\hat\beta_1$, respectively. They are called the ordinary least squares (OLS) estimates of $\beta_0$ and $\beta_1$.
• $\hat E(Y|x) = \hat\beta_0 + \hat\beta_1 x = \hat y$
• The line $y = \hat\beta_0 + \hat\beta_1 x$ is called the ordinary least squares (OLS) line.
• $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$ ($i$th fitted value, or $i$th fit)
• $\hat e_i = y_i - \hat y_i$ ($i$th residual)

Set-up:

• Consider lines $y = h_0 + h_1 x$.
• $d_i = y_i - (h_0 + h_1 x_i)$
• $\hat\beta_0$ and $\hat\beta_1$ will be the values of $h_0$ and $h_1$ that minimize $\sum d_i^2$.

More Notation:

• $RSS(h_0, h_1) = \sum d_i^2$ (for Residual Sum of Squares).
• $RSS = RSS(\hat\beta_0, \hat\beta_1) = \sum \hat e_i^2$ -- "the" Residual Sum of Squares (i.e., the minimal residual sum of squares).

Solving for $\hat\beta_0$ and $\hat\beta_1$:

• We want to minimize the function $RSS(h_0, h_1) = \sum d_i^2 = \sum [y_i - (h_0 + h_1 x_i)]^2$.
• $RSS(h_0, h_1) \ge 0$.
• Visually, there is no maximum. [See Demos]
• Therefore, if there is a critical point, the minimum occurs there.

To find critical points:

$\dfrac{\partial RSS}{\partial h_0}(h_0, h_1) = \sum 2[y_i - (h_0 + h_1 x_i)](-1)$

$\dfrac{\partial RSS}{\partial h_1}(h_0, h_1) = \sum 2[y_i - (h_0 + h_1 x_i)](-x_i)$

So $\hat\beta_0$, $\hat\beta_1$ must satisfy the normal equations:

(i) $\dfrac{\partial RSS}{\partial h_0}(\hat\beta_0, \hat\beta_1) = \sum (-2)[y_i - (\hat\beta_0 + \hat\beta_1 x_i)] = 0$

(ii) $\dfrac{\partial RSS}{\partial h_1}(\hat\beta_0, \hat\beta_1) = \sum (-2)[y_i - (\hat\beta_0 + \hat\beta_1 x_i)]\,x_i = 0$

Cancelling the $-2$'s and recalling that $\hat e_i = y_i - \hat y_i$:

(i)' $\sum \hat e_i = 0$
(ii)' $\sum \hat e_i x_i = 0$

In words: (i)' the residuals sum to zero; (ii)' the residuals, weighted by the $x_i$'s, also sum to zero.

Note that (i)' implies $\bar{\hat e} = 0$ (the sample mean of the $\hat e_i$'s is zero).

To solve the normal equations:

(i) $\Rightarrow \sum y_i - \sum \hat\beta_0 - \hat\beta_1 \sum x_i = 0$
$\Rightarrow n\bar y - n\hat\beta_0 - \hat\beta_1 (n\bar x) = 0$
$\Rightarrow \bar y - \hat\beta_0 - \hat\beta_1 \bar x = 0$

Consequences:

• $\bar y = \hat\beta_0 + \hat\beta_1 \bar x$, which says: the point $(\bar x, \bar y)$ is on the least squares line.

Note the analogies to the bivariate normal mean line (equation 4.14):
• $(\mu_X, \mu_Y)$ lies on the (population) mean line (Problem 4.7)
• $\alpha_{Y|x} = E(Y) - \beta_{Y|X} E(X)$

• Can solve for $\hat\beta_0$ once we find $\hat\beta_1$: $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$.

(ii) $\Rightarrow$ (substituting $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$)

$\sum [y_i - (\bar y - \hat\beta_1 \bar x + \hat\beta_1 x_i)]\,x_i = 0$
$\Rightarrow \sum [(y_i - \bar y) - \hat\beta_1 (x_i - \bar x)]\,x_i = 0$
$\Rightarrow \sum x_i (y_i - \bar y) - \hat\beta_1 \sum x_i (x_i - \bar x) = 0$
$\Rightarrow \hat\beta_1 = \dfrac{\sum x_i (y_i - \bar y)}{\sum x_i (x_i - \bar x)}$

Notation:

$SXX = \sum x_i (x_i - \bar x)$
$SYY = \sum y_i (y_i - \bar y)$
$SXY = \sum x_i (y_i - \bar y)$

So for short: $\hat\beta_1 = \dfrac{SXY}{SXX}$

Useful identities:

• $SXX = \sum (x_i - \bar x)^2$
• $SXY = \sum (x_i - \bar x)(y_i - \bar y)$
• $SXY = \sum (x_i - \bar x)\, y_i$
• $SYY = \sum (y_i - \bar y)^2$

Proof of (1):

$\sum (x_i - \bar x)^2 = \sum [x_i (x_i - \bar x) - \bar x (x_i - \bar x)] = \sum x_i (x_i - \bar x) - \bar x \sum (x_i - \bar x)$,

and note that $\sum (x_i - \bar x) = \sum x_i - n\bar x = n\bar x - n\bar x = 0$.

(Try the others yourself!)

Summarize:

$\hat\beta_1 = \dfrac{SXY}{SXX}$

$\hat\beta_0 = \bar y - \hat\beta_1 \bar x = \bar y - \dfrac{SXY}{SXX}\,\bar x$
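The summary above can be checked numerically. Here is a minimal Python/NumPy sketch (illustrative, not part of the original notes): the function name fit_ols and the sample data are made up for the example. It computes $\hat\beta_0$ and $\hat\beta_1$ from SXX and SXY and verifies the normal equations (i)' and (ii)'.

```python
import numpy as np

def fit_ols(x, y):
    """OLS estimates for the simple linear model E(Y|x) = b0 + b1*x,
    computed from the summary quantities SXX and SXY defined above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)          # SXX
    sxy = np.sum((x - xbar) * (y - ybar))  # SXY
    b1_hat = sxy / sxx                     # beta1_hat = SXY / SXX
    b0_hat = ybar - b1_hat * xbar          # beta0_hat = ybar - beta1_hat * xbar
    return b0_hat, b1_hat

# Illustrative data (hypothetical values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
b0, b1 = fit_ols(x, y)
e = y - (b0 + b1 * x)                      # residuals e_i = y_i - yhat_i
print(b0, b1)
print(np.isclose(e.sum(), 0.0))            # (i)'  sum of residuals is 0
print(np.isclose((e * x).sum(), 0.0))      # (ii)' sum of x_i * e_i is 0
```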
Connection with Sample Correlation Coefficient

Recall: The sample correlation coefficient is

$r = r(x, y) = \dfrac{\widehat{\mathrm{cov}}(x, y)}{\mathrm{sd}(x)\,\mathrm{sd}(y)}$

(Note: everything here is calculated from the sample.) Also,

$\widehat{\mathrm{cov}}(x, y) = \dfrac{1}{n-1}\sum (x_i - \bar x)(y_i - \bar y) = \dfrac{1}{n-1}\,SXY$

$[\mathrm{sd}(x)]^2 = \dfrac{1}{n-1}\sum (x_i - \bar x)^2 = \dfrac{1}{n-1}\,SXX$

$[\mathrm{sd}(y)]^2 = \dfrac{1}{n-1}\,SYY$ (similarly)

Therefore:

$r^2 = \dfrac{[\widehat{\mathrm{cov}}(x, y)]^2}{[\mathrm{sd}(x)]^2 [\mathrm{sd}(y)]^2} = \dfrac{\left(\frac{1}{n-1}\right)^2 (SXY)^2}{\left(\frac{1}{n-1}SXX\right)\left(\frac{1}{n-1}SYY\right)} = \dfrac{(SXY)^2}{(SXX)(SYY)}$

More on r:

Also,

$r\,\dfrac{\mathrm{sd}(y)}{\mathrm{sd}(x)} = \dfrac{\widehat{\mathrm{cov}}(x, y)}{\mathrm{sd}(x)\,\mathrm{sd}(y)}\cdot\dfrac{\mathrm{sd}(y)}{\mathrm{sd}(x)} = \dfrac{\widehat{\mathrm{cov}}(x, y)}{[\mathrm{sd}(x)]^2} = \dfrac{\frac{1}{n-1}SXY}{\frac{1}{n-1}SXX} = \dfrac{SXY}{SXX} = \hat\beta_1$

For short: $\hat\beta_1 = r\,\dfrac{s_y}{s_x}$

Recall and note the analogy: for a bivariate normal distribution,

$E(Y|X = x) = \alpha_{Y|x} + \beta_{Y|X}\,x$ (equation 4.13), where $\beta_{Y|X} = \rho\,\dfrac{\sigma_y}{\sigma_x}$.

[Picture]

Fits, Residuals, and RSS (recall):

• Fits: $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$
• Residuals: $\hat e_i = y_i - \hat y_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i)$
• $RSS = RSS(\hat\beta_0, \hat\beta_1) = \sum \hat e_i^2$ -- "the" Residual Sum of Squares (i.e., the minimal residual sum of squares)

Calculate:

$RSS = \sum \hat e_i^2 = \sum [y_i - (\hat\beta_0 + \hat\beta_1 x_i)]^2$
$= \sum [y_i - (\bar y - \hat\beta_1 \bar x) - \hat\beta_1 x_i]^2$
$= \sum [(y_i - \bar y) - \hat\beta_1 (x_i - \bar x)]^2$
$= \sum [(y_i - \bar y)^2 - 2\hat\beta_1 (x_i - \bar x)(y_i - \bar y) + \hat\beta_1^2 (x_i - \bar x)^2]$
$= \sum (y_i - \bar y)^2 - 2\hat\beta_1 \sum (x_i - \bar x)(y_i - \bar y) + \hat\beta_1^2 \sum (x_i - \bar x)^2$
$= SYY - 2\dfrac{SXY}{SXX}\,SXY + \left(\dfrac{SXY}{SXX}\right)^2 SXX$
$= SYY - \dfrac{(SXY)^2}{SXX}$
$= SYY\left(1 - \dfrac{(SXY)^2}{(SXX)(SYY)}\right)$
$= SYY(1 - r^2)$

Thus $1 - r^2 = \dfrac{RSS}{SYY}$, so $r^2 = 1 - \dfrac{RSS}{SYY} = \dfrac{SYY - RSS}{SYY}$.

Interpretation:

• $SYY = \sum (y_i - \bar y)^2$ is a measure of the total variability of the $y_i$'s from $\bar y$.
• $RSS = \sum \hat e_i^2$ is a measure of the variability in $y$ remaining after conditioning on $x$ (i.e., after regressing on $x$).
• So $SYY - RSS$ is a measure of the amount of variability of $y$ accounted for by conditioning on (i.e., regressing on) $x$.
• Thus $r^2 = \dfrac{SYY - RSS}{SYY}$ is the proportion of the total variability in $y$ accounted for by regressing on $x$.

[Picture]

Note: One can show (details left to the interested student) that $SYY - RSS = \sum (\hat y_i - \bar y)^2$ and that the mean of the $\hat y_i$'s is $\bar y$, so that in fact $r^2 = \dfrac{\widehat{\mathrm{var}}(\hat y_i)}{\widehat{\mathrm{var}}(y_i)}$, the proportion of the sample variance of $y$ accounted for by regression on $x$.
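As a numerical check of the identities above, here is a short Python/NumPy sketch (again illustrative; the data values are hypothetical). It verifies $RSS = SYY(1 - r^2)$, $r^2 = 1 - RSS/SYY$, $r^2 = \widehat{\mathrm{var}}(\hat y)/\widehat{\mathrm{var}}(y)$, and $SYY - RSS = \sum (\hat y_i - \bar y)^2$.

```python
import numpy as np

# Illustrative data (hypothetical values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

xbar, ybar = x.mean(), y.mean()
sxx = np.sum((x - xbar) ** 2)            # SXX
sxy = np.sum((x - xbar) * (y - ybar))    # SXY
syy = np.sum((y - ybar) ** 2)            # SYY

b1 = sxy / sxx                           # beta1_hat
b0 = ybar - b1 * xbar                    # beta0_hat
yhat = b0 + b1 * x                       # fitted values
rss = np.sum((y - yhat) ** 2)            # residual sum of squares

r = sxy / np.sqrt(sxx * syy)             # sample correlation coefficient
print(np.isclose(rss, syy * (1 - r ** 2)))                 # RSS = SYY (1 - r^2)
print(np.isclose(r ** 2, 1 - rss / syy))                   # r^2 = 1 - RSS/SYY
print(np.isclose(r ** 2, np.var(yhat) / np.var(y)))        # r^2 = var(yhat)/var(y)
print(np.isclose(syy - rss, np.sum((yhat - ybar) ** 2)))   # SYY - RSS = sum((yhat_i - ybar)^2)
```

(The ratio $\widehat{\mathrm{var}}(\hat y)/\widehat{\mathrm{var}}(y)$ does not depend on whether the variances use $n$ or $n-1$ in the denominator, so NumPy's default is fine here.)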