Chapter 13
Multiple Regression

Multiple Regression Model

Multiple regression enables us to determine the simultaneous effect of several independent variables on a dependent variable using the least squares principle:

    Y = f(X_1, X_2, \ldots, X_K)

Multiple Regression Objectives

Multiple regression provides two important results:

1. A linear equation that predicts the dependent variable, Y, as a function of K independent variables x_{ji}, j = 1, \ldots, K:

    \hat{y}_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_K x_{Ki}

2. The marginal change in the dependent variable, Y, that is related to a change in the independent variables, measured by the partial coefficients b_j. In multiple regression these partial coefficients depend on which other variables are included in the model. The coefficient b_j indicates the change in Y given a unit change in x_j while controlling for the simultaneous effect of the other independent variables.

In some problems both results are equally important, although usually one will predominate.

Multiple Regression Model (Example 11.1)

    Year   Revenue   Number of Offices   Profit Margin
      1      3.92         7298               0.75
      2      3.61         6855               0.71
      3      3.32         6636               0.66
      4      3.07         6506               0.61
      5      3.06         6450               0.70
      6      3.11         6402               0.72
      7      3.21         6368               0.77
      8      3.26         6340               0.74
      9      3.42         6349               0.90
     10      3.42         6352               0.82
     11      3.45         6361               0.75
     12      3.58         6369               0.77
     13      3.66         6546               0.78
     14      3.78         6672               0.84
     15      3.82         6890               0.79
     16      3.97         7115               0.70
     17      4.07         7327               0.68
     18      4.25         7546               0.72
     19      4.41         7931               0.55
     20      4.49         8097               0.63
     21      4.70         8468               0.56
     22      4.58         8717               0.41
     23      4.69         8991               0.51
     24      4.71         9179               0.47
     25      4.78         9318               0.32

Population Multiple Regression Model

The population multiple regression model defines the relationship between a dependent (or endogenous) variable, Y, and a set of independent (or exogenous) variables, x_j, j = 1, \ldots, K. The x_{ji}'s are assumed to be fixed numbers and Y is a random variable, defined for each observation i, where i = 1, \ldots, n and n is the number of observations. The model is defined as

    Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} + \varepsilon_i

where the \beta_j's are constant coefficients and the \varepsilon_i's are random variables with mean 0 and variance \sigma^2.

Standard Multiple Regression Assumptions

The population multiple regression model is

    Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} + \varepsilon_i

and we assume that n sets of observations are available. The following standard assumptions are made for the model.

1. The x's are fixed numbers, or they are realizations of random variables X_{ji} that are independent of the error terms \varepsilon_i. In the latter case, inference is carried out conditionally on the observed values of the x_{ji}'s.

2. The error terms are random variables with mean 0 and the same variance \sigma^2. The latter property is called homoscedasticity, or uniform variance:

    E[\varepsilon_i] = 0  and  E[\varepsilon_i^2] = \sigma^2  for i = 1, \ldots, n

3. The random error terms \varepsilon_i are not correlated with one another, so that

    E[\varepsilon_i \varepsilon_j] = 0  for all i \neq j

4. It is not possible to find a set of numbers c_0, c_1, \ldots, c_K, not all zero, such that

    c_0 + c_1 x_{1i} + c_2 x_{2i} + \cdots + c_K x_{Ki} = 0

for every observation. This is the property of no linear relation among the X_j's.

Least Squares Estimation and the Sample Multiple Regression

We begin with a sample of n observations, (x_{1i}, x_{2i}, \ldots, x_{Ki}, y_i), i = 1, \ldots, n, measured for a process whose population multiple regression model is

    Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} + \varepsilon_i

The least squares procedure obtains estimates b_0, b_1, \ldots, b_K of the coefficients \beta_0, \beta_1, \ldots, \beta_K as the values for which the sum of squared deviations

    SSE = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_{1i} - b_2 x_{2i} - \cdots - b_K x_{Ki})^2

is a minimum. The resulting equation,

    \hat{y}_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_K x_{Ki}

is the sample multiple regression of Y on X_1, X_2, \ldots, X_K.
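The slides contain no code, but the least squares fit is straightforward to reproduce. Below is a minimal sketch in Python (assuming NumPy is available; the variable names are our own) that builds the design matrix from the Example 11.1 data above and obtains the coefficient estimates that minimize SSE:

```python
import numpy as np

# Example 11.1 data: annual revenue, number of offices, and profit margin
revenue = np.array([3.92, 3.61, 3.32, 3.07, 3.06, 3.11, 3.21, 3.26, 3.42, 3.42,
                    3.45, 3.58, 3.66, 3.78, 3.82, 3.97, 4.07, 4.25, 4.41, 4.49,
                    4.70, 4.58, 4.69, 4.71, 4.78])
offices = np.array([7298, 6855, 6636, 6506, 6450, 6402, 6368, 6340, 6349, 6352,
                    6361, 6369, 6546, 6672, 6890, 7115, 7327, 7546, 7931, 8097,
                    8468, 8717, 8991, 9179, 9318])
margin = np.array([0.75, 0.71, 0.66, 0.61, 0.70, 0.72, 0.77, 0.74, 0.90, 0.82,
                   0.75, 0.77, 0.78, 0.84, 0.79, 0.70, 0.68, 0.72, 0.55, 0.63,
                   0.56, 0.41, 0.51, 0.47, 0.32])

# Design matrix with a leading column of ones for the intercept b_0
X = np.column_stack([np.ones_like(revenue), revenue, offices])

# Least squares: choose b to minimize SSE = sum((y - X b)^2)
b, *_ = np.linalg.lstsq(X, margin, rcond=None)
print(b)  # approximately [1.5645, 0.2372, -0.000249]
```

The printed coefficients match the regression output reported on the next slide.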
Multiple Regression Analysis for Profit Margin (Example 11.1)

The estimated regression equation is

    \hat{y} (Profit Margin) = 1.56 + 0.237 x_1 (Revenue) - 0.000249 x_2 (Number of Offices)

Regression Statistics
    Multiple R           0.930212915
    R Square             0.865296068
    Adjusted R Square    0.853050256
    Standard Error       0.053302217
    Observations         25

ANOVA
                  df    SS           MS            F             Significance F
    Regression     2    0.40151122   0.20075561    70.66057082   2.64962E-10
    Residual      22    0.06250478   0.002841126
    Total         24    0.464016

                             Coefficients    Standard Error   t Stat         P-value
    Intercept (b_0)          1.564496771     0.079395981      19.70498685    1.81733E-15
    Revenue (b_1)            0.237197475     0.055559366      4.269261695    0.000312567
    Number of Offices (b_2)  -0.000249079    3.20485E-05      -7.771949195   9.50879E-08

Sum of Squares Decomposition and the Coefficient of Determination

Given the multiple regression model fitted by least squares,

    y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_K x_{Ki} + e_i = \hat{y}_i + e_i

where the b_j's are the least squares estimates of the coefficients of the population regression model and the e_i's are the residuals from the estimated regression model, the model variability can be partitioned into the components

    SST = SSR + SSE

Total sum of squares:

    SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

Error sum of squares:

    SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2

Regression sum of squares:

    SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2

This decomposition can be interpreted as

    total sample variability = explained variability + unexplained variability

The coefficient of determination, R^2, of the fitted regression is defined as the proportion of the total sample variability explained by the regression,

    R^2 = SSR / SST = 1 - SSE / SST

and it follows that 0 \le R^2 \le 1.

Estimation of Error Variance

Given the population regression model

    Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} + \varepsilon_i

and the standard regression assumptions, let \sigma^2 denote the common variance of the error terms \varepsilon_i. An unbiased estimate of that variance is

    s_e^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - K - 1} = \frac{SSE}{n - K - 1}

The square root, s_e, is called the standard error of the estimate. In the Example 11.1 output above, SSR and SSE appear in the "SS" column of the ANOVA table, R^2 = SSR/SST = 0.865 is reported as "R Square," and s_e = 0.0533 is reported as "Standard Error."

Adjusted Coefficient of Determination

The adjusted coefficient of determination, \bar{R}^2, is defined as

    \bar{R}^2 = 1 - \frac{SSE / (n - K - 1)}{SST / (n - 1)}

We use this measure to correct for the fact that even nonrelevant independent variables will produce some small reduction in the error sum of squares. The adjusted R^2 therefore provides a better comparison between multiple regression models with different numbers of independent variables.
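To make the decomposition concrete, the following sketch continues the previous code block (X, margin, and b are assumed to still be in scope) and recomputes the quantities reported in the Example 11.1 output, including the F statistic from the ANOVA table:

```python
import numpy as np

# (X, margin, b as defined in the previous block)
y_hat = X @ b                     # fitted values
e = margin - y_hat                # residuals
n, K = len(margin), 2             # n = 25 observations, K = 2 regressors

SST = np.sum((margin - margin.mean()) ** 2)  # total sum of squares
SSE = np.sum(e ** 2)                         # error (residual) sum of squares
SSR = SST - SSE                              # regression sum of squares

R2 = SSR / SST                                       # ~0.8653 ("R Square")
R2_adj = 1 - (SSE / (n - K - 1)) / (SST / (n - 1))   # ~0.8531
s_e = np.sqrt(SSE / (n - K - 1))                     # ~0.0533 ("Standard Error")
F = (SSR / K) / s_e**2                               # ANOVA F statistic, ~70.66
print(R2, R2_adj, s_e, F)
```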
Coefficient of Multiple Correlation

The coefficient of multiple correlation is the correlation between the predicted value and the observed value of the dependent variable,

    R = Corr(\hat{Y}, Y) = \sqrt{R^2}

and is equal to the square root of the coefficient of determination. We use R as another measure of the strength of the linear relationship between the dependent variable and the independent variables; it is comparable to the correlation between Y and X in simple regression.

Basis for Inference About the Population Regression Parameters

Let the population regression model be

    Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} + \varepsilon_i

let b_0, b_1, \ldots, b_K be the least squares estimates of the population parameters, and let s_{b_0}, s_{b_1}, \ldots, s_{b_K} be the estimated standard deviations of the least squares estimators. Then, if the standard regression assumptions hold and the error terms \varepsilon_i are normally distributed, the random variables corresponding to

    t_{b_j} = \frac{b_j - \beta_j}{s_{b_j}},  j = 1, 2, \ldots, K

are distributed as Student's t with (n - K - 1) degrees of freedom.

Confidence Intervals for Partial Regression Coefficients

If the regression errors \varepsilon_i are normally distributed and the standard regression assumptions hold, the 100(1 - \alpha)% confidence intervals for the partial regression coefficients \beta_j are given by

    b_j - t_{(n-K-1, \alpha/2)} s_{b_j} < \beta_j < b_j + t_{(n-K-1, \alpha/2)} s_{b_j}

where t_{(n-K-1, \alpha/2)} is the number for which

    P(t_{(n-K-1)} > t_{(n-K-1, \alpha/2)}) = \alpha/2

and the random variable t_{(n-K-1)} follows a Student's t distribution with (n - K - 1) degrees of freedom. In the Example 11.1 output above, the "t Stat" column reports t_{b_1} = 4.27 for Revenue and t_{b_2} = -7.77 for Number of Offices, each with n - K - 1 = 22 degrees of freedom.

Tests of Hypotheses for the Partial Regression Coefficients

If the regression errors \varepsilon_i are normally distributed and the standard least squares assumptions hold, the following tests have significance level \alpha:

1. To test either null hypothesis

    H_0: \beta_j = \beta_j^*  or  H_0: \beta_j \le \beta_j^*

against the alternative

    H_1: \beta_j > \beta_j^*

the decision rule is

    Reject H_0 if \frac{b_j - \beta_j^*}{s_{b_j}} > t_{(n-K-1, \alpha)}

2. To test either null hypothesis

    H_0: \beta_j = \beta_j^*  or  H_0: \beta_j \ge \beta_j^*

against the alternative

    H_1: \beta_j < \beta_j^*

the decision rule is

    Reject H_0 if \frac{b_j - \beta_j^*}{s_{b_j}} < -t_{(n-K-1, \alpha)}

3. To test the null hypothesis

    H_0: \beta_j = \beta_j^*

against the two-sided alternative

    H_1: \beta_j \neq \beta_j^*

the decision rule is

    Reject H_0 if \frac{b_j - \beta_j^*}{s_{b_j}} > t_{(n-K-1, \alpha/2)}  or  \frac{b_j - \beta_j^*}{s_{b_j}} < -t_{(n-K-1, \alpha/2)}
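As an illustration, the sketch below (again reusing X, b, s_e, n, and K from the earlier blocks, and assuming SciPy is available for the Student's t quantiles) computes the estimated standard deviations s_{b_j}, 95% confidence intervals, and two-sided t tests of H_0: \beta_j = 0, reproducing the "t Stat" and "P-value" columns of the Example 11.1 output:

```python
import numpy as np
from scipy import stats

# (X, b, s_e, n, K as defined in the previous blocks)
# Estimated standard deviations of the least squares estimators:
# square roots of the diagonal of s_e^2 * (X'X)^(-1)
cov_b = s_e**2 * np.linalg.inv(X.T @ X)
s_b = np.sqrt(np.diag(cov_b))

# 95% confidence intervals: b_j +/- t_(n-K-1, 0.025) * s_b_j
t_crit = stats.t.ppf(0.975, df=n - K - 1)
lower, upper = b - t_crit * s_b, b + t_crit * s_b

# Two-sided t tests of H0: beta_j = 0
t_stat = b / s_b
p_val = 2 * stats.t.sf(np.abs(t_stat), df=n - K - 1)

for name, t, p in zip(["Intercept", "Revenue", "Offices"], t_stat, p_val):
    print(name, round(t, 2), p)   # t stats ~19.70, 4.27, -7.77
```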
Test on All the Parameters of a Regression Model

Consider the multiple regression model

    Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} + \varepsilon_i

To test the null hypothesis

    H_0: \beta_1 = \beta_2 = \cdots = \beta_K = 0

against the alternative hypothesis

    H_1: at least one \beta_j \neq 0

at significance level \alpha, we can use the decision rule

    Reject H_0 if F = \frac{SSR / K}{s_e^2} > F_{(K, n-K-1, \alpha)}

where F_{(K, n-K-1, \alpha)} is the critical value of F from Table 7 in the appendix, for which

    P(F_{(K, n-K-1)} > F_{(K, n-K-1, \alpha)}) = \alpha

The computed statistic follows an F distribution with numerator degrees of freedom K and denominator degrees of freedom (n - K - 1).

Test on a Subset of the Regression Parameters

Consider the multiple regression model

    Y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_K x_{Ki} + \alpha_1 z_{1i} + \cdots + \alpha_r z_{ri} + \varepsilon_i

To test the null hypothesis

    H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_r = 0

that a subset of r regression parameters are simultaneously equal to 0, against the alternative hypothesis

    H_1: at least one \alpha_j \neq 0  (j = 1, \ldots, r)

we compare the error sum of squares for the complete model with the error sum of squares for the restricted model. First run a regression for the complete model, which includes all the independent variables, and obtain SSE. Next run a restricted regression that excludes the z variables, whose coefficients are the \alpha's (the number of variables excluded is r), and obtain the restricted error sum of squares, SSE(r). Then compute the F statistic and apply the decision rule for significance level \alpha:

    Reject H_0 if F = \frac{(SSE(r) - SSE) / r}{s_e^2} > F_{(r, n-K-r-1, \alpha)}

Predictions from the Multiple Regression Model

Given that the population regression model

    Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} + \varepsilon_i  (i = 1, 2, \ldots, n)

holds and that the standard regression assumptions are valid, let b_0, b_1, \ldots, b_K be the least squares estimates of the model coefficients based on the data points (x_{1i}, x_{2i}, \ldots, x_{Ki}, y_i), i = 1, 2, \ldots, n. Then, given a new data point (x_{1,n+1}, x_{2,n+1}, \ldots, x_{K,n+1}), the best linear unbiased forecast of Y_{n+1} is

    \hat{y}_{n+1} = b_0 + b_1 x_{1,n+1} + b_2 x_{2,n+1} + \cdots + b_K x_{K,n+1}

It is very risky to obtain forecasts based on x values outside the range of the data used to estimate the model coefficients, because we have no data evidence to support the linear model at those points.

Quadratic Model Transformations

The quadratic function

    Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \varepsilon

can be transformed into a linear multiple regression model by defining the new variables

    z_1 = x_1,  z_2 = x_1^2

and then specifying the model as

    Y_i = \beta_0 + \beta_1 z_{1i} + \beta_2 z_{2i} + \varepsilon_i

which is linear in the transformed variables. Transformed quadratic variables can be combined with other variables in a multiple regression model; thus we could fit a multiple quadratic regression using transformed variables.

Exponential Model Transformations

Coefficients for exponential models of the form

    Y = \beta_0 X_1^{\beta_1} X_2^{\beta_2} \varepsilon

can be estimated by first taking the logarithm of both sides to obtain an equation that is linear in the logarithms of the variables:

    \log(Y) = \log(\beta_0) + \beta_1 \log(X_1) + \beta_2 \log(X_2) + \log(\varepsilon)

Using this form we can regress the logarithm of Y on the logarithms of the two X variables and obtain estimates of the coefficients \beta_1 and \beta_2 directly from the regression analysis. Note that this estimation procedure requires that the random errors be multiplicative in the original exponential model. Thus the error term, \varepsilon, is expressed as a percentage increase or decrease instead of the addition or subtraction of a random error, as in linear regression models.
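A brief illustration of the exponential transformation on simulated data (the coefficient values are hypothetical, chosen by us for illustration; NumPy assumed): taking logarithms turns the multiplicative model into an ordinary linear regression whose fitted slopes recover \beta_1 and \beta_2.

```python
import numpy as np

# Simulate Y = beta0 * X1^beta1 * X2^beta2 * eps with multiplicative error
rng = np.random.default_rng(1)
n = 200
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)
eps = np.exp(rng.normal(0, 0.1, n))     # multiplicative (log-normal) error
y = 2.5 * x1**0.7 * x2**-0.3 * eps

# Regress log(y) on log(x1) and log(x2); the slopes estimate beta1 and
# beta2, and the intercept estimates log(beta0)
X_log = np.column_stack([np.ones(n), np.log(x1), np.log(x2)])
b_log = np.linalg.lstsq(X_log, np.log(y), rcond=None)[0]
print(np.exp(b_log[0]), b_log[1], b_log[2])   # roughly 2.5, 0.7, -0.3
```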
Dummy Variable Regression Analysis

The relationship between Y and X_1,

    Y = \beta_0 + \beta_1 X_1

can shift in response to a changed condition. The shift effect can be estimated by using a dummy variable, which takes the value 0 (condition not present) or 1 (condition present). All of the observations from one set of data have dummy variable x_2 = 1, and the observations from the other set have x_2 = 0. In these cases the relationship between Y and X_1 is specified by the regression model

    \hat{y} = b_0 + b_2 x_2 + b_1 x_1

The functions for the two sets of points are

    \hat{y} = b_0 + b_1 x_1  when x_2 = 0
    \hat{y} = (b_0 + b_2) + b_1 x_1  when x_2 = 1

In the first function the constant is b_0, while in the second the constant is b_0 + b_2. Dummy variables are also called indicator variables.

Dummy Variable Regression for Differences in Slope

To determine whether there are significant differences in slope between two discrete conditions, we expand the regression model to the more complex form

    \hat{y} = b_0 + b_2 x_2 + (b_1 + b_3 x_2) x_1

Now the slope coefficient of x_1 contains two components, b_1 and b_3 x_2. When x_2 equals 0, the slope estimate is the usual b_1; when x_2 equals 1, the slope is the algebraic sum b_1 + b_3. To estimate the model we multiply the variables to create a new transformed variable, so the model actually used for estimation is

    \hat{y} = b_0 + b_2 x_2 + b_1 x_1 + b_3 x_1 x_2

The resulting regression model is linear with three variables. The new variable x_1 x_2 is often called an interaction variable. Note that when the dummy variable x_2 = 0 this variable has the value 0, but when x_2 = 1 it has the value x_1. The coefficient b_3 is an estimate of the difference in the coefficient of x_1 when x_2 = 1 compared with x_2 = 0. Thus the t statistic for b_3 can be used to test the hypothesis

    H_0: \beta_3 = 0 \mid \beta_1 \neq 0, \beta_2 \neq 0
    H_1: \beta_3 \neq 0 \mid \beta_1 \neq 0, \beta_2 \neq 0

If we reject the null hypothesis, we conclude that there is a difference in the slope coefficient for the two subgroups. In many cases we are interested in both the difference in the constant and the difference in the slope, and will test both of the hypotheses presented in this section. A self-contained sketch illustrating this interaction model follows the Key Words list below.

Key Words

Adjusted Coefficient of Determination
Basis for Inference About the Population Regression Parameters
Coefficient of Multiple Correlation
Coefficient of Multiple Determination
Confidence Intervals for Partial Regression Coefficients
Dummy Variable Regression Analysis
Dummy Variable Regression for Differences in Slope
Estimation of Error Variance
Least Squares Estimation and the Sample Multiple Regression
Prediction from Multiple Regression Models
Quadratic Model Transformations
Regression Objectives
Standard Error of the Estimate
Standard Multiple Regression Assumptions
Sum of Squares Decomposition and the Coefficient of Determination
Test on a Subset of the Regression Parameters
Test on All the Parameters of a Regression Model
Tests of Hypotheses for the Partial Regression Coefficients
The Population Multiple Regression Model
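As promised above, here is a self-contained sketch of the dummy variable and interaction model on simulated data (the coefficient values are hypothetical, chosen only for illustration). The interaction variable x_1 x_2 enters the design matrix like any other column, and the fitted coefficients recover a different slope and intercept for each condition:

```python
import numpy as np

# Simulate y = b0 + b1*x1 + b2*x2 + b3*(x1*x2) + error
rng = np.random.default_rng(0)
n = 100
x1 = rng.uniform(0, 10, n)                        # continuous predictor
x2 = (rng.uniform(size=n) > 0.5).astype(float)    # dummy: condition present?
y = 2.0 + 0.5 * x1 + 1.5 * x2 + 0.8 * x1 * x2 + rng.normal(0, 1, n)

# The interaction variable x1*x2 is just another design-matrix column
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Slope is b1 when x2 = 0 and b1 + b3 when x2 = 1;
# intercept is b0 when x2 = 0 and b0 + b2 when x2 = 1
print("slope (x2=0):", b[1], "  slope (x2=1):", b[1] + b[3])
```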