Transcript
Here we will discuss the details of the multiple linear regression model.
1
The model for the population mean response looks similar to that for simple linear
regression.
On the left we have the expected value of the outcome Y given the set of X’s (the
bold notation is used to represent the list of X’s).
On the right we have the intercept, beta_0, followed by a beta times X term for each
predictor.
The model for the i-th individual MUST use the “i” subscript notation for the variables Y, X_1,
X_2, up to X_p, and adds the individual error term, epsilon_i.
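Written out, following the description above, the two forms of the model are

E(Y | \mathbf{X}) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p

Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \epsilon_i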
2
Our assumptions about the error term are the same, that they are
• Normally distributed with a mean of zero and constant variance (denoted by sigma-squared)
• Statistically independent of each other
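In symbols, these assumptions are commonly written as

\epsilon_i \stackrel{iid}{\sim} N(0, \sigma^2)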
We will come back and discuss model validation later.
3
Our model adds new terms for each new X variable in our model. In our model
• We still have an intercept (beta_0) which represents the population mean
outcome when ALL predictors are equal to zero.
• We have a partial slope for each X-variable which here we denote beta_1
through beta_p.
• I often use notation that is meaningful such as beta_age, beta_female, etc. so as
to make it easy to remember what everything represents.
• These slopes are now ADJUSTED for the effects of the other predictors, and in our
interpretations of individual predictors we must specify that the value is adjusted
for all other predictors, in other words, that all other predictors are held constant
(see the expression written out below this list).
• When we interpret the predictor for a multi-level categorical variable – we are
adjusting for all OTHER predictors, which will not include the other levels of that
categorical variable – multi-level categorical variables add an extra layer of
complexity that we will discuss soon. For now, we just want to understand the
basic model.
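One way to write out the adjusted (partial) interpretation referenced above, for a continuous predictor X_j, is

\beta_j = E(Y | X_j = x + 1, \text{all other predictors fixed}) - E(Y | X_j = x, \text{all other predictors fixed})

that is, beta_j is the change in the population mean of Y for a one-unit increase in X_j, holding all other
predictors in the model constant.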
4
We will let SAS provide our parameter estimates of the beta values, the parameters
in our model.
We can calculate predicted values and residuals in the same way – the model
simply becomes more complex.
The equation for the sums of squares is the same, but our degrees of freedom for
the model increase to p, the number of X-variables in our model.
The degrees of freedom for the total sum of squares are still n - 1, and by subtraction
we obtain the residual sum of squares degrees of freedom, n - p - 1.
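In symbols, the degrees of freedom partition the same way the sums of squares do:

SS_{Total} = SS_{Model} + SS_{Error}, \qquad (n - 1) = p + (n - p - 1)

with predicted values \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_p x_{ip}
and residuals e_i = y_i - \hat{y}_i, just as before.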
5
The estimate of the standard deviation of the errors around our regression model is
still the square root of the mean square error or Root MSE.
The coefficient of determination, R-squared, is also the same, as is its
interpretation. The model has simply become more complex.
We define the square root of R-squared to be the multiple correlation coefficient, r.
This is equivalent to the Pearson correlation coefficient between the original Y and
the predicted Y from our current model. This correlation will always be positive, and
the multiple correlation coefficient can be considered a measure of the strength of
the model, in the same way r was a measure of the strength of the linear
relationship in the case of simple linear regression.
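In symbols, consistent with the definitions above:

\text{Root MSE} = \sqrt{MSE} = \sqrt{\frac{SS_{Error}}{n - p - 1}}, \qquad
R^2 = \frac{SS_{Model}}{SS_{Total}}, \qquad
r = \sqrt{R^2} = \text{corr}(Y, \hat{Y})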
6
If we add parameters (betas) to our model by adding new X-variables, we will
always increase the value of R-squared.
Adjusted R-squared contains a penalty for additional parameters and is a better
measure than R-squared of the improvement in model fit by adding new X-variables.
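A standard form of the adjustment is

R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}

so the penalty grows as p, the number of X-variables, grows.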
We will discuss this more when we cover model selection.
7
This is the most difficult material in this section to fully understand – but regardless
of complete understanding – the resulting points of this discussion are important.
• The equation for the variance of a particular parameter estimate, beta_j-hat, is
given here (and written out below).
• The first fraction has a numerator of the estimated variance of the error term
(MSE) and a denominator of (n - 1) times the sample variance of X_j.
• The second fraction is 1 over (1 - r_j-squared).
• r_j is the multiple correlation coefficient when we run a regression model with X_j as the
outcome and all other predictors as the explanatory variables. This is a measure of how
correlated X_j is with the other predictors.
• This second fraction is called the variance inflation factor.
• If X_j is highly correlated with the other predictors, then r_j-squared is close to 1, the
denominator of the fraction is close to zero, and thus the overall fraction of 1 over
(1 - r_j-squared) is large. The effect is potentially a large increase in the variance of our
parameter estimate for that predictor, X_j. This would increase the
width of confidence intervals and make it more difficult to find significant effects.
• When X’s are highly correlated, we call this collinearity – or, worse, multicollinearity (many correlated variables).
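Putting the pieces described above together, the equation is

\text{Var}(\hat{\beta}_j) = \frac{MSE}{(n - 1)\, s_{X_j}^2} \times \frac{1}{1 - r_j^2}, \qquad
VIF_j = \frac{1}{1 - r_j^2}

where s_{X_j}^2 is the sample variance of X_j and r_j^2 is the R-squared from regressing X_j on the
other predictors.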
8
• If X_j is NOT correlated with the other predictors, then the variance inflation factor is close to 1
and the addition of this predictor does not increase the variance of the parameter
estimates unnecessarily.
• SAS can easily give us the value of the variance inflation factor (VIF) in multiple
regression – in GLM we get what is called the tolerance, which is 1/VIF (a sketch of the syntax appears below this list).
• In this equation, there is a trade-off between the variance of the error term and the
variance inflation factor.
• Including additional predictors will increase the variance inflation factor
• Including good predictors will decrease the variance of the error term
• Thus, the variance of the parameter estimate beta_j-hat may not always increase by
including more predictors. It can even decrease if some “good” predictors are included.
• As for simple linear regression, the sample variance of X_j in the denominator implies that if we can
measure a larger range of X_j values for our model, we can more precisely estimate the
associated parameter.
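As a rough sketch of how these quantities can be requested in SAS (the dataset name mydata and the
variable names y, x1, x2, and x3 are placeholders, not taken from this transcript; the stb and clb
options request the standardized estimates and confidence intervals discussed on later slides):

/* mydata, y, x1-x3 are placeholder names for illustration only */
proc reg data=mydata;
  /* vif = variance inflation factors, stb = standardized estimates, clb = confidence limits for betas */
  model y = x1 x2 x3 / vif stb clb;
run;
quit;

proc glm data=mydata;
  /* solution prints the parameter estimates; tolerance prints the tolerances (1/VIF) */
  model y = x1 x2 x3 / solution tolerance;
run;
quit;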
9
We still have the same equation for the test statistic for the t-test for each beta –
however the degrees of freedom have changed to n – p – 1, the error degrees of
freedom.
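Written out, the test statistic and the corresponding confidence interval for an individual predictor X_j are

t = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)} \text{ with } n - p - 1 \text{ degrees of freedom}, \qquad
\hat{\beta}_j \pm t_{1 - \alpha/2,\, n - p - 1} \times SE(\hat{\beta}_j)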
• We also have the confidence intervals, again with the degrees of freedom being
the only change.
• You should be able to calculate the test statistic from SAS output and should be
able to obtain these results via SAS.
• If we reject the null hypothesis for a given X, we then claim that that particular X
predicts Y, holding all other factors in the model constant.
• Multi-level predictors require a different way of thinking about this, but we will
discuss that later. Just be aware at this point that it will be different.
• The confidence intervals are also adjusted for the other predictors.
• We can say:
• Adjusted for the other predictors in the model
• Holding all other predictors in the model constant (or fixed).
10
The F-test provides the overall significance of the model – does the model explain a
significant amount of the variation in Y?
This is still the ratio of the mean square for the model and the mean square error.
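In symbols, using the degrees of freedom from earlier:

F = \frac{MS_{Model}}{MS_{Error}} = \frac{SS_{Model} / p}{SS_{Error} / (n - p - 1)}, \qquad
H_0\!: \beta_1 = \beta_2 = \cdots = \beta_p = 0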
11
Finally, we can construct standardized coefficients by multiplying each estimate by the standard
deviation of its predictor and dividing by the standard deviation of the outcome.
These will be unit-less versions of the estimates.
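In symbols, with s denoting a sample standard deviation:

\hat{\beta}_j^{\,std} = \hat{\beta}_j \times \frac{s_{X_j}}{s_Y}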
These are the coefficients we would have obtained if our variables were all
transformed to have a standard deviation of 1 prior to running the regression model.
These are useful for comparing the relative strength of the association between
continuous predictors in our model.
For categorical predictors, the raw estimates are usually more useful.
12
We have introduced all of the basic concepts needed to begin looking at some
examples of multiple linear regression.
Our next task will be to discuss interpretations and other issues for binary and multi-level categorical predictors.
Later we will discuss confounding and interaction and then model validation and
selection.
It is a long journey!!
13