Transcript
Here we will discuss the details of the multiple linear regression model.
1
The model for the population mean response looks similar to that for simple linear
regression.
On the left we have the expected value of the outcome Y given the set of X’s (the
bold notation is used to represent the list of X’s).
On the right we have the intercept, beta_0, followed by a beta times X term for each
predictor.
The model for the i-th individual MUST use the “i” subscript notation for the variables Y, X_1,
X_2, up to X_p, and adds the individual error term, epsilon_i.
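Written out, following the description above, the two forms of the model are

E(Y | \mathbf{X}) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p

Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \epsilon_i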
2
Our assumptions about the error term are the same, that they are
• Normally distributed with a mean of zero and constant variance (denoted by sigma-squared)
• Statistically independent of each other
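In symbols, these assumptions are commonly written as

\epsilon_i \stackrel{iid}{\sim} N(0, \sigma^2)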
We will come back and discuss model validation later.
3
Our model adds new terms for each new X variable in our model. In our model
• We still have an intercept (beta_0) which represents the population mean
outcome when ALL predictors are equal to zero.
• We have a partial slope for each X-variable which here we denote beta_1
through beta_p.
• I often use notation that is meaningful such as beta_age, beta_female, etc. so as
to make it easy to remember what everything represents.
• These slopes are now ADJUSTED for the effects of the other predictors, and in our
interpretations of individual predictors we must specify that the value is adjusted
for all other predictors, in other words, that all other predictors are held constant
(see the expression written out below this list).
• When we interpret the predictor for a multi-level categorical variable – we are
adjusting for all OTHER predictors, which will not include the other levels of that
categorical variable – multi-level categorical variables add an extra layer of
complexity that we will discuss soon. For now, we just want to understand the
basic model.
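One way to write out the adjusted (partial) interpretation referenced above, for a continuous predictor X_j, is

\beta_j = E(Y | X_j = x + 1, \text{all other predictors fixed}) - E(Y | X_j = x, \text{all other predictors fixed})

that is, beta_j is the change in the population mean of Y for a one-unit increase in X_j, holding all other
predictors in the model constant.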
4
We will let SAS provide our parameter estimates of the beta values, the parameters
in our model.
We can calculate predicted values and residuals in the same way – the model
simply becomes more complex.
The equation for the sums of squares is the same, but our degrees of freedom for
the model increase to p, the number of X-variables in our model.
The degrees of freedom for the total sum of squares are still n - 1, and by subtraction
we obtain the residual sum of squares degrees of freedom, n - p - 1.
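In symbols, the degrees of freedom partition the same way the sums of squares do:

SS_{Total} = SS_{Model} + SS_{Error}, \qquad (n - 1) = p + (n - p - 1)

with predicted values \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_p x_{ip}
and residuals e_i = y_i - \hat{y}_i, just as before.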
5
The estimate of the standard deviation of the errors around our regression model is
still the square root of the mean square error or Root MSE.
The coefficient of determination, R-squared, is also the same, as is its
interpretation. The model has simply become more complex.
We define the square root of R-squared to be the multiple correlation coefficient, r.
This is equivalent to the Pearson correlation coefficient between the original Y and
the predicted Y from our current model. This correlation will always be positive, and
the multiple correlation coefficient can be considered a measure of the strength of
the model, in the same way r was a measure of the strength of the linear
relationship in the case of simple linear regression.
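In symbols, consistent with the definitions above:

\text{Root MSE} = \sqrt{MSE} = \sqrt{\frac{SS_{Error}}{n - p - 1}}, \qquad
R^2 = \frac{SS_{Model}}{SS_{Total}}, \qquad
r = \sqrt{R^2} = \text{corr}(Y, \hat{Y})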
6
If we add parameters (betas) to our model by adding new X-variables, we will
always increase the value of R-squared.
Adjusted R-squared contains a penalty for additional parameters and is a better
measure than R-squared of the improvement in model fit by adding new X-variables.
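A standard form of the adjustment is

R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}

so the penalty grows as p, the number of X-variables, grows.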
We will discuss this more when we cover model selection.
7
This is the most difficult material in this section to fully understand – but regardless
of complete understanding – the resulting points of this discussion are important.
• The equation for the variance of a particular parameter estimate, beta_j-hat, is
given here (and written out below).
• The first fraction has a numerator of the estimated variance of the error term
(MSE) and a denominator of (n - 1) times the sample variance of X_j.
• The second fraction is 1 over (1 - r_j-squared).
• r_j is the multiple correlation coefficient when we run a regression model with X_j as the
outcome and all other predictors as the explanatory variables. This is a measure of how
correlated X_j is with the other predictors.
• This second fraction is called the variance inflation factor.
• If X_j is highly correlated with the other predictors, then r_j-squared is close to 1, the
denominator of the fraction is close to zero, and thus the overall fraction of 1 over
(1 - r_j-squared) is large. The effect is potentially a large increase in the variance of our
parameter estimate for that predictor, X_j. This would increase the
width of confidence intervals and make it more difficult to find significant effects.
• When X’s are highly correlated, we call this collinearity – or, worse, multicollinearity (many correlated variables).
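Putting the pieces described above together, the equation is

\text{Var}(\hat{\beta}_j) = \frac{MSE}{(n - 1)\, s_{X_j}^2} \times \frac{1}{1 - r_j^2}, \qquad
VIF_j = \frac{1}{1 - r_j^2}

where s_{X_j}^2 is the sample variance of X_j and r_j^2 is the R-squared from regressing X_j on the
other predictors.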
8
• If X_j is NOT correlated with the other predictors, then the variance inflation factor is close to 1
and the addition of this predictor does not increase the variance of the parameter
estimates unnecessarily.
• SAS can easily give us the value of the variance inflation factor (VIF) in multiple
regression – in GLM we get what is called the tolerance, which is 1/VIF (a sketch of the syntax appears below this list).
• In this equation, there is a trade-off between the variance of the error term and the
variance inflation factor.
• Including additional predictors will increase the variance inflation factor
• Including good predictors will decrease the variance of the error term
• Thus, the variance of the parameter estimate beta_j-hat may not always increase by
including more predictors. It can even decrease if some “good” predictors are included.
• As for simple linear regression, the sample variance of X_j in the denominator implies that if we can
measure a larger range of X_j values for our model, we can more precisely estimate the
associated parameter.
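As a rough sketch of how these quantities can be requested in SAS (the dataset name mydata and the
variable names y, x1, x2, and x3 are placeholders, not taken from this transcript; the stb and clb
options request the standardized estimates and confidence intervals discussed on later slides):

/* mydata, y, x1-x3 are placeholder names for illustration only */
proc reg data=mydata;
  /* vif = variance inflation factors, stb = standardized estimates, clb = confidence limits for betas */
  model y = x1 x2 x3 / vif stb clb;
run;
quit;

proc glm data=mydata;
  /* solution prints the parameter estimates; tolerance prints the tolerances (1/VIF) */
  model y = x1 x2 x3 / solution tolerance;
run;
quit;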
9
We still have the same equation for the test statistic for the t-test for each beta –
however the degrees of freedom have changed to n – p – 1, the error degrees of
freedom.
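Written out, the test statistic and the corresponding confidence interval for an individual predictor X_j are

t = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)} \text{ with } n - p - 1 \text{ degrees of freedom}, \qquad
\hat{\beta}_j \pm t_{1 - \alpha/2,\, n - p - 1} \times SE(\hat{\beta}_j)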
• We also have the confidence intervals, again with the degrees of freedom being
the only change.
• You should be able to calculate the test statistic from SAS output and should be
able to obtain these results via SAS.
• If we reject the null hypothesis for a given X, we then claim that that particular X
predicts Y, holding all other factors in the model constant.
• Multi-level predictors require a different way of thinking about this, but we will
discuss that later. Just be aware at this point that it will be different.
• The confidence intervals are also adjusted for the other predictors.
• We can say:
• Adjusted for the other predictors in the model
• Holding all other predictors in the model constant (or fixed).
10
The F-test provides the overall significance of the model – does the model explain a
significant amount of the variation in Y?
This is still the ratio of the mean square for the model and the mean square error.
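In symbols, using the degrees of freedom from earlier:

F = \frac{MS_{Model}}{MS_{Error}} = \frac{SS_{Model} / p}{SS_{Error} / (n - p - 1)}, \qquad
H_0\!: \beta_1 = \beta_2 = \cdots = \beta_p = 0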
11
Finally, we can construct standardized coefficients by multiplying each estimate by the standard
deviation of its predictor and dividing by the standard deviation of the outcome.
These will be unit-less versions of the estimates.
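In symbols, with s denoting a sample standard deviation:

\hat{\beta}_j^{\,std} = \hat{\beta}_j \times \frac{s_{X_j}}{s_Y}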
These are the coefficients we would have obtained if our variables were all
transformed to have a standard deviation of 1 prior to running the regression model.
These are useful for comparing the relative strength of the association between
continuous predictors in our model.
For categorical predictors, the raw estimates are usually more useful.
12
We have introduced all of the basic concepts needed to begin looking at some
examples of multiple linear regression.
Our next task will be to discuss interpretations and other issues for binary and multi-level categorical predictors.
Later we will discuss confounding and interaction and then model validation and
selection.
It is a long journey!!
13