1.1 Purpose of Regression Analysis
1. Estimate a relationship among economic variables, such as y = f(x).
2. Forecast or predict the value of one variable, y, based on the value of another variable, x.

1.2 Weekly Food Expenditures
y = dollars spent each week on food items.
x = consumer's weekly income.
The relationship between x and the expected value of y, given x, might be linear:
   E(y|x) = β₁ + β₂x

1.3 [Figure: the probability distribution f(y|x = 480) of food expenditures given income x = $480, and the pair of distributions f(y|x = 480) and f(y|x = 800) given incomes x = $480 and x = $800.]

1.4 and 1.5 Average Expenditure
[Figure: E(y|x) = β₁ + β₂x plotted against income x; the slope is β₂ = ΔE(y|x)/Δx and the intercept is β₁.]
The economic model: a linear relationship between average expenditure on food and income.

1.6 Homoskedastic Case
[Figure: the probability density function for yₜ at two levels of household income, x₁ = 480 and x₂ = 800; both densities have the same spread around E(y|x) = β₁ + β₂x.]

1.7 Heteroskedastic Case
[Figure: the variance of yₜ increases as household income, xₜ, increases.]

1.8 The Error Term
y is a random variable composed of two parts:
I. Systematic component: this is the mean of y.
   E(y|x) = β₁ + β₂x
II. Random component:
   ε = y − E(y|x) = y − β₁ − β₂x
   This is called the random error.
Together E(y|x) and ε form the model:
   y = β₁ + β₂x + ε

1.9 to 1.11 [Figure: scatter of observations (x₁, y₁), …, (x₄, y₄) around the true regression line E(y) = β₁ + β₂x; the vertical deviations ε₁, …, ε₄ are the random errors. The relationship among y, ε and the true regression line.]

1.12 Why must the stochastic error term be present in a regression equation?
1. Many minor influences on y are omitted from the equation (for instance, because data are unavailable).
2. It is virtually impossible to avoid some sort of measurement error in at least one of the equation's variables.
3. The underlying theoretical equation might have a different functional form (or shape) than the one chosen for the regression.
For example, the underlying equation might be nonlinear in the variables while a linear regression was chosen.
4. All attempts to generalize human behavior must contain at least some amount of unpredictable or purely random variation.

1.13 The Assumptions of Simple Linear Regression Models
1. The value of y, for each value of x, is y = β₁ + β₂x + ε.
2. The average value of the random error ε is E(ε) = 0.
3. The variance of the random error ε is var(ε) = σ² = var(y).
4. The covariance between any pair of errors is cov(εᵢ, εⱼ) = cov(yᵢ, yⱼ) = 0 for i ≠ j.
5. x must take at least two different values, so that x is not a constant.
6. (optional) ε is normally distributed with mean 0 and variance σ²: ε ~ N(0, σ²).

1.14 Population and Sample Regression
Population regression values: yₜ = β₁ + β₂xₜ + εₜ
Population regression line: E(yₜ|xₜ) = β₁ + β₂xₜ
Sample regression values: yₜ = b₁ + b₂xₜ + ε̂ₜ
Sample regression line: ŷₜ = b₁ + b₂xₜ

1.15 [Figure: scatter of observations around the fitted line ŷ = b₁ + b₂x; the vertical deviations ε̂₁, …, ε̂₄ are the least squares residuals. The relationship among y, ε̂ and the fitted regression line.]

1.16 Least Squares
   yₜ = β₁ + β₂xₜ + εₜ  =>  εₜ = yₜ − β₁ − β₂xₜ
Minimize the sum of squared deviations (sums run over t = 1, …, T):
   S(β₁, β₂) = Σ (yₜ − β₁ − β₂xₜ)² = Σ εₜ²

1.17 Minimize S(β₁, β₂) with respect to β₁ and β₂:
   ∂S/∂β₁ = −2 Σ (yₜ − β₁ − β₂xₜ)
   ∂S/∂β₂ = −2 Σ xₜ(yₜ − β₁ − β₂xₜ)
Set each of these two derivatives equal to zero and solve these two equations for the two unknowns, β₁ and β₂.

1.18 For each coefficient, ∂S/∂βᵢ < 0 to the left of the minimizing value bᵢ, ∂S/∂βᵢ = 0 at bᵢ, and ∂S/∂βᵢ > 0 to the right. To minimize S(·), set the two derivatives equal to zero:
   ∂S/∂β₁ = 0 and ∂S/∂β₂ = 0

1.19
   ∂S/∂β₁ = −2 Σ (yₜ − b₁ − b₂xₜ) = 0
   ∂S/∂β₂ = −2 Σ xₜ(yₜ − b₁ − b₂xₜ) = 0
When these two derivatives are set to zero, β₁ and β₂ become b₁ and b₂ because they no longer represent just any values of β₁ and β₂ but the special values that correspond to the minimum of S(·).
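As a quick numerical sketch of the minimization above, the following checks that the values solving the first-order conditions really minimize S(β₁, β₂). The x and y data are hypothetical, invented for illustration:

```python
# Check that the first-order conditions locate the minimum of
# S(b1, b2) = sum_t (y_t - b1 - b2*x_t)^2 on a small hypothetical data set.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T = len(x)

def S(b1, b2):
    return sum((yt - b1 - b2 * xt) ** 2 for xt, yt in zip(x, y))

# Closed-form least squares solution (derived from the zero-derivative conditions).
b2 = (T * sum(xt * yt for xt, yt in zip(x, y)) - sum(x) * sum(y)) \
     / (T * sum(xt ** 2 for xt in x) - sum(x) ** 2)
b1 = sum(y) / T - b2 * sum(x) / T
print("b1 =", b1, "b2 =", b2, "S(b1, b2) =", S(b1, b2))

# S increases when we move away from (b1, b2) in any coordinate direction.
for db1, db2 in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert S(b1 + db1, b2 + db2) > S(b1, b2)
```

For these data the minimizing values are b₁ = 2.2 and b₂ = 0.6, and every perturbation of either coefficient raises the sum of squares.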
1.20 From the first-order conditions:
   −2 Σ (yₜ − b₁ − b₂xₜ) = 0
   −2 Σ xₜ(yₜ − b₁ − b₂xₜ) = 0
Expanding:
   Σyₜ − Tb₁ − b₂Σxₜ = 0
   Σxₜyₜ − b₁Σxₜ − b₂Σxₜ² = 0

1.21 The Normal Equations
   Tb₁ + b₂Σxₜ = Σyₜ
   b₁Σxₜ + b₂Σxₜ² = Σxₜyₜ
Solving for b₁ and b₂, using the definitions of x̄ and ȳ:
   b₂ = (TΣxₜyₜ − ΣxₜΣyₜ) / (TΣxₜ² − (Σxₜ)²) = (Σxₜyₜ − Tx̄ȳ) / (Σxₜ² − Tx̄²)
   b₁ = ȳ − b₂x̄

1.22 Interpretation of Coefficients b₁ and b₂
b₂ is an estimate of the mean change in y in response to a one-unit change in x.
b₁ is an estimate of the mean of y when x = 0. One must be very careful in interpreting the estimated intercept, since we usually do not have any data points near x = 0.
Note that regression analysis cannot be interpreted as a procedure for establishing a cause-and-effect relationship between variables.

1.23 Simple Linear Regression Model
   yₜ = β₁ + β₂xₜ + εₜ
yₜ = demand for cars; xₜ = price.
For a given level of xₜ, the expected level of demand for cars is E(yₜ|xₜ) = β₁ + β₂xₜ.

1.24 Assumptions of the Simple Linear Regression Model
1. yₜ = β₁ + β₂xₜ + εₜ
2. E(εₜ) = 0  <=>  E(yₜ|xₜ) = β₁ + β₂xₜ
3. var(εₜ) = σ² = var(yₜ)
4. cov(εᵢ, εⱼ) = cov(yᵢ, yⱼ) = 0
5. xₜ is not constant (no perfect collinearity)
6. εₜ ~ N(0, σ²)  <=>  yₜ ~ N(β₁ + β₂xₜ, σ²)

1.25 The population parameters β₁ and β₂ are unknown population constants. The formulas that produce the sample estimates b₁ and b₂ are called the estimators of β₁ and β₂. When b₁ and b₂ are used to represent the formulas rather than specific values, they are called estimators of β₁ and β₂; they are random variables because they differ from sample to sample.

1.26 Estimators Are Random Variables (estimates are not)
If the least squares estimators b₁ and b₂ are random variables, then what are their means, variances, covariances and probability distributions?
Compare the properties of alternative estimators to the properties of the least squares estimators.
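The normal equations form a 2×2 linear system, so they can be solved directly (here by Cramer's rule) and compared with the closed-form expressions for b₁ and b₂. The data are hypothetical, used only to exercise the algebra:

```python
# Solve the two normal equations   T*b1 + Sx*b2 = Sy
#                                 Sx*b1 + Sxx*b2 = Sxy
# directly, then confirm the closed-form b1, b2 give the same answer.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T = len(x)
Sx, Sy = sum(x), sum(y)
Sxx = sum(xt * xt for xt in x)
Sxy = sum(xt * yt for xt, yt in zip(x, y))

det = T * Sxx - Sx * Sx                    # determinant of the system
b1 = (Sy * Sxx - Sx * Sxy) / det           # Cramer's rule
b2 = (T * Sxy - Sx * Sy) / det
print("b1 =", b1, "b2 =", b2)

# Closed-form expressions from the slides give the same answer:
xbar, ybar = Sx / T, Sy / T
b2_alt = (Sxy - T * xbar * ybar) / (Sxx - T * xbar * xbar)
b1_alt = ybar - b2_alt * xbar
assert abs(b1 - b1_alt) < 1e-9 and abs(b2 - b2_alt) < 1e-9
```

Both routes give b₁ = 2.2 and b₂ = 0.6 for these data, which is the point: the closed-form estimator is just the solved normal-equation system.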
1.27 The Expected Values of b₁ and b₂
The least squares formulas (estimators) in the simple regression case:
   b₂ = (TΣxₜyₜ − ΣxₜΣyₜ) / (TΣxₜ² − (Σxₜ)²)
   b₁ = ȳ − b₂x̄, where ȳ = Σyₜ/T and x̄ = Σxₜ/T

1.28 Substituting yₜ = β₁ + β₂xₜ + εₜ gives:
   b₂ = β₂ + (TΣxₜεₜ − ΣxₜΣεₜ) / (TΣxₜ² − (Σxₜ)²)
The mean of b₂ is:
   E(b₂) = β₂ + (TΣxₜE(εₜ) − ΣxₜΣE(εₜ)) / (TΣxₜ² − (Σxₜ)²)
Since E(εₜ) = 0, E(b₂) = β₂.

1.29 An Unbiased Estimator
The result E(b₂) = β₂ means that the distribution of b₂ is centered at β₂. Since the distribution of b₂ is centered at β₂, we say that b₂ is an unbiased estimator of β₂.

1.30 Wrong Model Specification
The unbiasedness result on the previous slide assumes that we are using the correct model. If the model is of the wrong form or is missing important variables, then E(εₜ) ≠ 0 and hence E(b₂) ≠ β₂.

1.31 Unbiased Estimator of the Intercept
In a similar manner, the estimator b₁ of the intercept or constant term can be shown to be an unbiased estimator of β₁ when the model is correctly specified:
   E(b₁) = β₁

1.32 Equivalent expressions for b₂:
   b₂ = Σ(xₜ − x̄)(yₜ − ȳ) / Σ(xₜ − x̄)²
Expanding, and multiplying top and bottom by T:
   b₂ = (TΣxₜyₜ − ΣxₜΣyₜ) / (TΣxₜ² − (Σxₜ)²) = (Σxₜyₜ − Tx̄ȳ) / (Σxₜ² − Tx̄²)

1.33 Variance of b₂
Given that both yₜ and εₜ have variance σ², the variance of the estimator b₂ is:
   var(b₂) = σ² / Σ(xₜ − x̄)²
b₂ is a function of the yₜ values, but var(b₂) does not involve yₜ directly.

1.34 Variance of b₁
Given b₁ = ȳ − b₂x̄, the variance of the estimator b₁ is:
   var(b₁) = σ² Σxₜ² / (T Σ(xₜ − x̄)²)

1.35 Covariance of b₁ and b₂
   cov(b₁, b₂) = −σ² x̄ / Σ(xₜ − x̄)²

1.36 What factors determine the variances and covariance of b₁ and b₂?
1. The larger σ² is, the greater the uncertainty about b₁, b₂ and their relationship.
2. The more spread out the xₜ values are, the more confidence we have in b₁, b₂, etc.
3. The larger the sample size T, the smaller the variances and covariances.
4. The variance of b₁ is large when the (squared) xₜ values are far from zero (in either direction).
5. Changing the slope, b₂, has no effect on the intercept, b₁, when the sample mean of x is zero. But if the sample mean is positive, the covariance between b₁ and b₂ will be negative, and vice versa.

1.37 The Gauss-Markov Theorem
Under the first five assumptions of the simple linear regression model, the ordinary least squares estimators b₁ and b₂ have the smallest variance of all linear and unbiased estimators of β₁ and β₂. This means that b₁ and b₂ are the Best Linear Unbiased Estimators (BLUE) of β₁ and β₂.

1.38 Implications of Gauss-Markov
1. b₁ and b₂ are best within the class of linear and unbiased estimators.
2. "Best" means smallest variance within the class of linear/unbiased estimators.
3. All of the first five assumptions must hold for Gauss-Markov to apply.
4. Gauss-Markov does not require assumption six: normality.
5. Gauss-Markov is not based on the least squares principle but on the estimation rules of b₁ and b₂.

1.39 Gauss-Markov Implications (continued)
6. If we are not satisfied with restricting our estimation to the class of linear and unbiased estimators, we should ignore the Gauss-Markov Theorem and use some nonlinear and/or biased estimator instead. (Note: a biased or nonlinear estimator could have smaller variance than those satisfying Gauss-Markov.)
7. Gauss-Markov applies to the b₁ and b₂ estimators and not to particular sample values (estimates) of b₁ and b₂.

1.40 yₜ and εₜ Normally Distributed
The least squares estimators of β₂ and β₁ can be expressed as linear combinations of yₜ:
   b₂ = Σwₜyₜ, where wₜ = (xₜ − x̄) / Σ(xₜ − x̄)²
   b₁ = ȳ − b₂x̄
This means that b₁ and b₂ are normally distributed, since linear combinations of normals are normal.

1.41 Normally Distributed under the Central Limit Theorem
If the first five Gauss-Markov assumptions hold, and the sample size T is sufficiently large, then the least squares estimators b₁ and b₂ have a distribution that approximates the normal distribution, with greater accuracy the larger the value of the sample size T.
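The unbiasedness, variance formula, and BLUE property above can be sketched by Monte Carlo. The design points and the values β₁ = 2.2, β₂ = 0.6, σ = 1 are assumptions chosen for the illustration; as a competing linear unbiased estimator we use the "endpoint" slope (y_T − y₁)/(x_T − x₁), which Gauss-Markov says must have at least the OLS variance:

```python
import random

# Monte Carlo: OLS b2 is unbiased with var(b2) = sigma^2 / sum((x-xbar)^2),
# and beats another linear unbiased estimator (the endpoint slope).
random.seed(1)
x = [1, 2, 3, 4, 5]
T = len(x)
beta1, beta2, sigma = 2.2, 0.6, 1.0        # assumed "true" values
xbar = sum(x) / T
sxx = sum((xt - xbar) ** 2 for xt in x)    # = 10 for this design

ols, endpoint = [], []
for _ in range(20000):
    y = [beta1 + beta2 * xt + random.gauss(0.0, sigma) for xt in x]
    ybar = sum(y) / T
    ols.append(sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y)) / sxx)
    endpoint.append((y[-1] - y[0]) / (x[-1] - x[0]))

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((u - m) ** 2 for u in v) / len(v)

print("mean(OLS b2)     =", round(mean(ols), 3))       # theory: beta2 = 0.6
print("var(OLS b2)      =", round(var(ols), 4))        # theory: 1/10 = 0.100
print("var(endpoint b2) =", round(var(endpoint), 4))   # theory: 2/16 = 0.125
```

Both estimators center on β₂, but the OLS variance matches σ²/Σ(xₜ − x̄)² and is smaller than the endpoint estimator's variance, as the theorem predicts.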
1.42 Probability Distribution of the Least Squares Estimators
If either of the above two conditions is satisfied, then the distributions of b₁ and b₂ are:
   b₁ ~ N( β₁, σ² Σxₜ² / (T Σ(xₜ − x̄)²) )
   b₂ ~ N( β₂, σ² / Σ(xₜ − x̄)² )

1.43 Consistency
We would like our estimators, b₁ and b₂, to collapse onto the true population values, β₁ and β₂, as the sample size T goes to infinity. One way to achieve this consistency property is for the variances of b₁ and b₂ to go to zero as T goes to infinity. Since the formulas for the variances of the least squares estimators b₁ and b₂ show that their variances do, in fact, go to zero, b₁ and b₂ are consistent estimators of β₁ and β₂.

1.44 Estimating the Variance of the Error Term, σ²
   ε̂ₜ = yₜ − b₁ − b₂xₜ
   σ̂² = Σ ε̂ₜ² / (T − 2)
σ̂² is an unbiased estimator of σ².

1.45 The Least Squares Predictor, ŷ₀
Given a value of the explanatory variable, x₀, we would like to predict a value of the dependent variable, y₀. The least squares predictor is:
   ŷ₀ = b₁ + b₂x₀

1.46 Probability Distribution of the Least Squares Estimators (recap)
   b₁ ~ N( β₁, σ² Σxₜ² / (T Σ(xₜ − x̄)²) )
   b₂ ~ N( β₂, σ² / Σ(xₜ − x̄)² )

1.47 Since b₂ ~ N( β₂, σ² / Σ(xₜ − x̄)² ), create a standardized normal random variable, Z, by subtracting the mean of b₂ and dividing by its standard deviation:
   Z = (b₂ − β₂) / sqrt(var(b₂)) ~ N(0, 1)

1.48 Error Variance Estimation
Unbiased estimator of the error variance:
   σ̂² = Σ ε̂ₜ² / (T − 2)
Transform to a chi-square distribution:
   (T − 2) σ̂² / σ² ~ χ²(T − 2)

1.49 Chi-Square Degrees of Freedom
Since the errors εₜ = yₜ − β₁ − β₂xₜ are not observable, we estimate them with the sample residuals ε̂ₜ = yₜ − b₁ − b₂xₜ. Unlike the errors, the sample residuals are not independent, since they use up two degrees of freedom by using b₁ and b₂ to estimate β₁ and β₂. We get only T − 2 degrees of freedom instead of T.

1.50 Student-t Distribution
   t = Z / sqrt(V/m) ~ t(m)
where Z ~ N(0, 1) and V ~ χ²(m), provided Z and V are independent.
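The estimator σ̂² = Σε̂ₜ²/(T − 2) from slides 1.44 and 1.48, and the standard errors it implies, can be computed on the same small hypothetical data set used earlier:

```python
# sigma-hat^2 = sum(residual^2)/(T-2) and the estimated standard errors
# of b1 and b2, on a small hypothetical data set.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T = len(x)
xbar, ybar = sum(x) / T, sum(y) / T
sxx = sum((xt - xbar) ** 2 for xt in x)

b2 = sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y)) / sxx
b1 = ybar - b2 * xbar
resid = [yt - b1 - b2 * xt for xt, yt in zip(x, y)]

sigma2_hat = sum(e * e for e in resid) / (T - 2)   # unbiased: divide by T-2, not T
var_b2 = sigma2_hat / sxx
var_b1 = sigma2_hat * sum(xt ** 2 for xt in x) / (T * sxx)
print(f"sigma2_hat = {sigma2_hat:.3f}")
print(f"se(b1) = {var_b1 ** 0.5:.4f}, se(b2) = {var_b2 ** 0.5:.4f}")
```

Note the divisor T − 2 rather than T: the residuals satisfy two linear restrictions (from estimating β₁ and β₂), which is exactly the degrees-of-freedom argument on slide 1.49.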
1.51
   t = Z / sqrt(V/(T − 2)) ~ t(T − 2)
where Z = (b₂ − β₂)/sqrt(var(b₂)) and var(b₂) = σ² / Σ(xᵢ − x̄)².

1.52 Substituting V = (T − 2) σ̂² / σ²:
   t = Z / sqrt(V/(T − 2)) = [ (b₂ − β₂)/sqrt(var(b₂)) ] / sqrt(σ̂²/σ²)

1.53 With var(b₂) = σ²/Σ(xᵢ − x̄)², notice the cancellations of σ² and (T − 2):
   t = [ (b₂ − β₂)/sqrt(σ²/Σ(xᵢ − x̄)²) ] / sqrt(σ̂²/σ²) = (b₂ − β₂) / sqrt(σ̂²/Σ(xᵢ − x̄)²)

1.54 Therefore:
   t = (b₂ − β₂) / sqrt(σ̂²/Σ(xᵢ − x̄)²) = (b₂ − β₂) / sqrt(var̂(b₂)) = (b₂ − β₂) / se(b₂)

1.55 Student-t Statistic
   t = (b₂ − β₂) / se(b₂) ~ t(T − 2)
t has a Student-t distribution with T − 2 degrees of freedom.

1.56 The Least Squares Predictor, ŷ₀
Given a value of the explanatory variable, x₀, we would like to predict a value of the dependent variable, y₀. The least squares predictor is ŷ₀ = b₁ + b₂x₀.
Prediction error: f = ŷ₀ − y₀ = (b₁ − β₁) + (b₂ − β₂)x₀ − ε₀

1.57 Prediction error: f = ŷ₀ − y₀ = (b₁ − β₁) + (b₂ − β₂)x₀ − ε₀
   E[f] = E[ŷ₀ − y₀] = 0
   var(f) = σ² [ 1 + 1/T + (x₀ − x̄)² / Σ(xₜ − x̄)² ]
   f ~ N(0, var(f))

1.58 Prediction Intervals
A (1 − α)×100% prediction interval for y₀ is:
   ŷ₀ ± t(T−2),α/2 · se(f)
where f = ŷ₀ − y₀, se(f) = sqrt(var̂(f)), and
   var̂(f) = σ̂² [ 1 + 1/T + (x₀ − x̄)² / Σ(xₜ − x̄)² ]

1.59 The Least Squares Estimator of the Mean Response, μ̂₀, when x = x₀
   μ̂₀ = b₁ + b₂x₀
Estimation error: μ̂₀ − E[y₀] = (b₁ − β₁) + (b₂ − β₂)x₀
   var(μ̂₀) = σ² [ 1/T + (x₀ − x̄)² / Σ(xₜ − x̄)² ]

1.60 and 1.61 [Figure: densities of the mean response estimator μ̂₀ and the predicted response ŷ₀ at x₀; the predictor has the larger spread.]

1.62 Explaining Variation in yₜ
Predicting yₜ without any explanatory variables:
   yₜ = β₁ + εₜ
Minimizing Σ(yₜ − b₁)² gives the first-order condition
   Σ(yₜ − b₁) = 0  =>  Σyₜ − Tb₁ = 0  =>  b₁ = ȳ

1.63 and 1.64 [Figure: scatter of the observations (x₁, y₁), …, (x₄, y₄).]

1.65 Explaining Variation in yₜ
   yₜ = b₁ + b₂xₜ + ε̂ₜ
Explained variation: ŷₜ = b₁ + b₂xₜ
Unexplained variation: ε̂ₜ = yₜ − ŷₜ = yₜ − b₁ − b₂xₜ

1.66 [Figure: observations, fitted line ŷ = b₁ + b₂x, and residuals ε̂₁, …, ε̂₄.]
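The t statistic of slide 1.55 and the prediction interval of slide 1.58 can be evaluated on the same small hypothetical data set. The prediction point x₀ = 6 is an arbitrary choice, and t(3, 0.025) = 3.182 is the standard tabulated critical value for T − 2 = 3 degrees of freedom:

```python
# t statistic for H0: beta2 = 0, and a 95% prediction interval at x0.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T = len(x)
xbar, ybar = sum(x) / T, sum(y) / T
sxx = sum((xt - xbar) ** 2 for xt in x)

b2 = sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y)) / sxx
b1 = ybar - b2 * xbar
sigma2_hat = sum((yt - b1 - b2 * xt) ** 2 for xt, yt in zip(x, y)) / (T - 2)

se_b2 = (sigma2_hat / sxx) ** 0.5
t_stat = (b2 - 0.0) / se_b2                  # H0: beta2 = 0
print(f"t = {t_stat:.3f} on {T - 2} df")

x0 = 6                                       # hypothetical new x value
y0_hat = b1 + b2 * x0
var_f = sigma2_hat * (1 + 1 / T + (x0 - xbar) ** 2 / sxx)
half = 3.182 * var_f ** 0.5                  # t(3) critical value, alpha = 0.05
print(f"95% prediction interval at x0={x0}: "
      f"({y0_hat - half:.3f}, {y0_hat + half:.3f})")
```

The interval is wide: with T = 5 observations there are only 3 degrees of freedom, so both σ̂² and the critical value are large, and x₀ = 6 lies outside the observed x range, inflating the (x₀ − x̄)² term.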
1.67 Explaining Variation in yₜ
   yₜ = ŷₜ + ε̂ₜ
Using ȳ as the baseline:
   yₜ − ȳ = (ŷₜ − ȳ) + ε̂ₜ
   Σ(yₜ − ȳ)² = Σ(ŷₜ − ȳ)² + Σ ε̂ₜ²   (the cross-product term drops out)
   SST = SSR + SSE

1.68 [Figure: the relationship among SST, SSR, and SSE. At a point y₄, the total deviation y − ȳ (SST) splits into the explained part (b₁ + b₂x) − ȳ (SSR) and the residual y − (b₁ + b₂x) (SSE).]

1.69 Total Variation in yₜ
SST = total sum of squares. SST measures the variation of yₜ around ȳ:
   SST = Σ(yₜ − ȳ)² = Σyₜ² − Tȳ²

1.70 Explained Variation in yₜ
SSR = regression sum of squares. Fitted yₜ values: ŷₜ = b₁ + b₂xₜ. SSR measures the variation of ŷₜ around ȳ:
   SSR = Σ(ŷₜ − ȳ)² = b₂² Σ(xₜ − x̄)²

1.71 Unexplained Variation in yₜ
SSE = error sum of squares. ε̂ₜ = yₜ − ŷₜ = yₜ − b₁ − b₂xₜ. SSE measures the variation of yₜ around ŷₜ:
   SSE = Σ(yₜ − ŷₜ)² = Σ ε̂ₜ²

1.72 Analysis of Variance Table
Table 6.1 Analysis of Variance Table
Source of Variation   DF    Sum of Squares   Mean Square
Explained             1     SSR              MSR = SSR/1
Unexplained           T−2   SSE              MSE = SSE/(T−2) [= σ̂²]
Total                 T−1   SST

1.73 Coefficient of Determination
What proportion of the variation in yₜ is explained?
   R² = SSR/SST,  0 ≤ R² ≤ 1

1.74 Coefficient of Determination
SST = SSR + SSE. Dividing by SST:
   1 = SSR/SST + SSE/SST
   R² = SSR/SST = 1 − SSE/SST

1.75 Coefficient of Determination
R² is only a descriptive measure. R² does not measure the quality of the regression model. Focusing solely on maximizing R² is not a good idea.

1.76 In simple linear regression models, there are two ways to test H₀: β₂ = 0 vs. Hₐ: β₂ ≠ 0:
1. Under H₀, t = b₂ / se(b₂) ~ t(T−2).
2. Under H₀, F = MSR / MSE ~ F(1, T−2).
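The SST = SSR + SSE decomposition and R² of slides 1.67 to 1.74 can be verified on the same small hypothetical data set:

```python
# Verify SST = SSR + SSE and R^2 = SSR/SST on a small hypothetical data set.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T = len(x)
xbar, ybar = sum(x) / T, sum(y) / T
sxx = sum((xt - xbar) ** 2 for xt in x)

b2 = sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y)) / sxx
b1 = ybar - b2 * xbar
yhat = [b1 + b2 * xt for xt in x]

SST = sum((yt - ybar) ** 2 for yt in y)
SSR = sum((yh - ybar) ** 2 for yh in yhat)            # equals b2^2 * sxx
SSE = sum((yt - yh) ** 2 for yt, yh in zip(y, yhat))
R2 = SSR / SST
print(f"SST={SST:.3f} SSR={SSR:.3f} SSE={SSE:.3f} R2={R2:.3f}")
assert abs(SST - (SSR + SSE)) < 1e-9                   # cross term vanishes
assert abs(SSR - b2 ** 2 * sxx) < 1e-9                 # slide 1.70 identity
```

The cross-product term vanishes because the residuals are orthogonal to both the constant and xₜ, which is exactly what the two normal equations impose.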
Note that:
1. It can be shown that the square of the t(T−2) statistic equals the F(1, T−2) statistic: t² = F.
2. F = MSR / MSE = R² / [(1 − R²)/(T − 2)].

1.77 Regression Computer Output
Typical computer output of regression estimates:
Table 6.2 Computer-Generated Least Squares Results
(1)         (2)        (3)       (4)          (5)
            Parameter  Standard  T for H0:
Variable    Estimate   Error     Parameter=0  Prob>|T|
INTERCEPT   40.7676    22.1387   1.841        0.0734
X           0.1283     0.0305    4.201        0.0002

1.78 Regression Computer Output
b₁ = 40.7676, b₂ = 0.1283
se(b₁) = sqrt(var̂(b₁)) = sqrt(490.12) = 22.1387
se(b₂) = sqrt(var̂(b₂)) = sqrt(0.0009326) = 0.0305
t = b₁ / se(b₁) = 40.7676 / 22.1387 = 1.84
t = b₂ / se(b₂) = 0.1283 / 0.0305 = 4.20

1.79 Regression Computer Output
Sources of variation in the dependent variable:
Table 6.3 Analysis of Variance Table
Source       DF   Sum of Squares   Mean Square
Explained    1    25221.2229       25221.2229
Unexplained  38   54311.3314       1429.2455
Total        39   79532.5544
R-square: 0.3171

1.80 Regression Computer Output
SST = Σ(yₜ − ȳ)² = 79532
SSR = Σ(ŷₜ − ȳ)² = 25221
SSE = Σ ε̂ₜ² = 54311
σ̂² = SSE/(T − 2) = 1429.2455
R² = SSR/SST = 1 − SSE/SST = 0.317

1.81 Reporting Regression Results
R² = 0.317. This R² value may seem low, but it is typical in studies involving cross-sectional data analyzed at the individual or micro level. A considerably higher R² value would be expected in studies involving time-series data analyzed at an aggregate or macro level.
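The identities t² = F and F = R²/[(1 − R²)/(T − 2)] can be checked directly with the numbers from Tables 6.2 and 6.3 (the sums of squares and the slope t value of 4.201; agreement with t² is approximate only because the printed values are rounded):

```python
# Check the t^2 = F identity using the Table 6.2 / 6.3 numbers.
SSR, SSE, SST = 25221.2229, 54311.3314, 79532.5544
T = 40                        # Total DF = T - 1 = 39 in Table 6.3

MSE = SSE / (T - 2)           # = sigma2_hat = 1429.2455 in Table 6.3
R2 = SSR / SST
F = (SSR / 1) / MSE
F_from_R2 = R2 / ((1 - R2) / (T - 2))
print(f"MSE = {MSE:.4f}, R2 = {R2:.4f}")
print(f"F = {F:.3f}, R2/[(1-R2)/(T-2)] = {F_from_R2:.3f}, t^2 = {4.201**2:.3f}")
```

Algebraically F = SSR/(SSE/(T − 2)) and R²/[(1 − R²)/(T − 2)] are both equal to (T − 2)·SSR/SSE, so the two F computations agree exactly, and both match the squared t statistic up to rounding.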