A Brief Introduction to Multiple Correlation/Regression
as a Simplification of the Multivariate General Linear Model
In its most general form, the GLM (General Linear Model) relates a set of p predictor variables
(X1 through Xp) to a set of q criterion variables (Y1 through Yq). We shall now briefly survey three special cases of the GLM: the univariate mean, bivariate correlation/regression, and multiple correlation/regression.
The Univariate Mean: A One Parameter (a) Model
If there is only one Y and no X, then the GLM simplifies to the computation of a mean. We
apply the least squares criterion to reduce the squared deviations between Y and predicted Y to the
smallest value possible for a linear model. The prediction equation is $\hat{Y} = \bar{Y}$. Error in prediction is estimated by $s = \sqrt{\sum (Y - \bar{Y})^2 / (n - 1)}$.
Bivariate Regression: A Two Parameter (a and b) Model
If there is only one X and only one Y, then the GLM simplifies to the simple bivariate linear
correlation/regression with which you are familiar. We apply the least squares criterion to reduce
the squared deviations between Y and predicted Y to the smallest value possible for a linear model.
That is, we find a and b such that, for $\hat{Y} = a + bX$, the quantity $\sum (Y - \hat{Y})^2$ is minimal. The GLM is reduced to $Y = a + bX + e = \hat{Y} + e$, where e is the "error" term, the deviation of Y from predicted Y. The coefficient "a" is the Y-intercept, the value of Y when X = 0 (the intercept was the mean of Y in the one-parameter model above), and "b" is the slope, the average amount of change in Y per unit change in X. Error in prediction is estimated by $s_{est\,Y} = \sqrt{\sum (Y - \hat{Y})^2 / (n - 1)}$.
Although the model is linear, that is, specifies a straight line relationship between X and Y, it
may be modified to test nonlinear models. For example, if you think that the function relating Y to X is
quadratic, you employ the model $Y = a + b_1 X + b_2 X^2 + e$.
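Here is a minimal Python sketch, using simulated (made-up) data, that fits both the two-parameter linear model and the quadratic model by least squares; note that the quadratic model is still linear in its parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=50)   # simulated data for illustration

# Two-parameter linear model, Y-hat = a + bX, fit by least squares.
b, a = np.polyfit(x, y, deg=1)                       # polyfit returns highest power first
y_hat = a + b * x
s_est = np.sqrt(np.sum((y - y_hat) ** 2) / (len(y) - 1))   # error in prediction, as defined above

# Quadratic model, Y-hat = a + b1*X + b2*X^2, still linear in its parameters.
b2, b1, a_quad = np.polyfit(x, y, deg=2)

print(f"a = {a:.2f}, b = {b:.2f}, s_est_Y = {s_est:.2f}")
print(f"quadratic: a = {a_quad:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```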
It is often more convenient to work with variables that have all been standardized to some
common mean and some common SD (standard deviation) such as 0, 1 (Z-scores). If scores are so
standardized, the intercept, "a," drops out (becomes zero) and the standardized slope, the number of
standard deviations that predicted Y changes for each change of one SD in X, is commonly referred to as β (beta). In a bivariate regression, β is the Pearson r. If r = 1, then each change in X of one SD is associated with a one SD change in predicted Y.
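You can verify that claim with a short sketch (again with simulated data): after converting both variables to Z-scores, the fitted intercept is essentially zero and the fitted slope equals the Pearson r.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=50)   # simulated data again

# Standardize both variables to mean 0, SD 1 (Z-scores).
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

beta, intercept = np.polyfit(zx, zy, deg=1)
r = np.corrcoef(x, y)[0, 1]

# The intercept is (numerically) zero and the standardized slope equals Pearson r.
print(f"intercept = {intercept:.6f}, beta = {beta:.3f}, r = {r:.3f}")
```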
The variables X and Y may be both continuous (Pearson r), one continuous and one dichotomous (point biserial r), or both dichotomous (φ).
Multiple Correlation/Regression
In multiple correlation/regression, one has two or more predictor variables but only one
criterion variable. The basic model is $\hat{Y} = a + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p$ or, employing standardized scores, $\hat{Z}_Y = \beta_1 Z_1 + \beta_2 Z_2 + \cdots + \beta_p Z_p$. Again, we wish to find regression coefficients that produce a
predicted Y that is minimally deviant from observed Y, by the least squares criterion. We are
creating a linear combination of the X variables, $a + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p$, that is maximally
correlated with Y. That is, we are creating a superordinate predictor variable that is a linear
combination of the individual predictor variables, with the weighting coefficients ($b_1$ through $b_p$) chosen
such that the Pearson r between the criterion variable and the linear combination is maximal. The
value of this r between Y and the best linear combination of X’s is called R, the multiple correlation
coefficient. Note that the GLM is not only linear, but additive. That is, we assume that the weighted
effect of X1 combines additively with the weighted effect of X2 to determine their joint effect,
a  b1 X 1  b2 X 2 , on predicted Y.
As a simple example of multiple regression, consider using high school GPA and SAT scores
to predict college GPA. R would give us an indication of the strength of the association between
college GPA and the best linear combination of high school GPA and SAT scores. We could
additionally look at the β weights (also called standardized partial regression coefficients) to
determine the relative contribution of each predictor variable towards predicting Y. These coefficients
are called partial coefficients to emphasize that they reflect the contribution of a single X in predicting
Y in the context of the other predictor variables in the model. That is, they tell us how much predicted Y changes per unit change in Xi when we partial out (remove, hold constant) the effects of all the other predictor variables. The weight applied to Xi can change dramatically if we change the context (add
one or more additional X or delete one or more of the X variables currently in the model). An X which
is highly correlated with Y could have a low weight simply because it is redundant with another X in
the model.
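The following sketch, again with simulated data (the GPA and SAT values are fabricated for illustration only), shows one way to obtain R and the β weights for a two-predictor model:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
hs_gpa = rng.normal(3.0, 0.5, n)                               # simulated high school GPA
sat = 900 + 150 * hs_gpa + rng.normal(0, 80, n)                # simulated SAT, correlated with GPA
col_gpa = 0.4 * hs_gpa + 0.001 * sat + rng.normal(0, 0.3, n)   # simulated college GPA

# Simultaneous multiple regression: Y-hat = a + b1*X1 + b2*X2, fit by least squares.
X = np.column_stack([np.ones(n), hs_gpa, sat])
coefs = np.linalg.lstsq(X, col_gpa, rcond=None)[0]
a, b1, b2 = coefs

y_hat = X @ coefs
R = np.corrcoef(col_gpa, y_hat)[0, 1]          # multiple correlation coefficient

# Standardized partial (beta) weights: each b scaled by SD(X)/SD(Y).
beta1 = b1 * hs_gpa.std(ddof=1) / col_gpa.std(ddof=1)
beta2 = b2 * sat.std(ddof=1) / col_gpa.std(ddof=1)

print(f"R = {R:.3f}, R² = {R**2:.3f}, β(HS GPA) = {beta1:.3f}, β(SAT) = {beta2:.3f}")
```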
Rather than throwing in all of the independent variables at once (a simultaneous multiple regression), we may enter them sequentially. With an a priori sequential analysis (also called a hierarchical analysis), we would enter the predictor variables in some a priori order. For example,
for predicting college GPA, we might first enter high school GPA, a predictor we consider “high priority” because it is cheap (all applicants can provide it at low cost). We would compute r² and interpret it as the proportion of variance in Y that is “explained” by high school GPA. Our next step
might be to add SAT-V and SAT-Q to the model and compute the multiple regression for
Yˆ  a  b1 X1  b2 X 2  b3 X 3 . We entered SAT scores with a lower priority because they are more
expensive to obtain: not all high school students have them, and they cost money. We enter them together because you get both for one price. This is called setwise entry. We now
compare the R² (squared multiple correlation coefficient) with the r² previously obtained to see how much additional variance in Y is explained by adding X2 and X3 to the X1 already in the model. If the increase in R² seems large enough to justify the additional expense involved in obtaining the X2 and X3 information, we retain X2 and X3 in the model. We might then add a yet lower priority predictor, such as X4, the result of an on-campus interview (costly), and see how much further the R² is increased, etc.
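Here is a sketch of such a sequential (hierarchical) analysis with simulated data; the particular R² values are meaningless, but the logic of comparing the step 2 R² with the step 1 r² is the same as described above:

```python
import numpy as np

def r_squared(predictors, y):
    """Fit Y-hat = a + b1*X1 + ... by least squares and return R^2."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coefs = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = X @ coefs
    return np.corrcoef(y, y_hat)[0, 1] ** 2

rng = np.random.default_rng(3)
n = 200
hs_gpa = rng.normal(3.0, 0.5, n)
sat_v = 500 + 60 * hs_gpa + rng.normal(0, 70, n)
sat_q = 480 + 70 * hs_gpa + rng.normal(0, 70, n)
col_gpa = 0.5 * hs_gpa + 0.001 * sat_v + 0.001 * sat_q + rng.normal(0, 0.3, n)

# Step 1: enter the high-priority (cheap) predictor alone.
r2_step1 = r_squared([hs_gpa], col_gpa)

# Step 2: setwise entry of SAT-V and SAT-Q; examine the increase in R².
r2_step2 = r_squared([hs_gpa, sat_v, sat_q], col_gpa)

print(f"step 1 r² = {r2_step1:.3f}, step 2 R² = {r2_step2:.3f}, "
      f"increment = {r2_step2 - r2_step1:.3f}")
```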
In other cases we might first enter nuisance variables (covariates) for which we wish to
achieve “statistical control” and then enter our predictor variable(s) of primary interest later. For
example, we might be interested in the association between the amount of paternal care a youngster
has received (Xp) and how healthy the youngster is (Y). Some of the correlation between Xp and Y might be due
to the fact that youngsters from “good” families get lots of maternal (Xm) care and lots of paternal
care, but it is the maternal care that causes the youngsters' good health. That is, Xp is correlated with
Y mostly because it is correlated with Xm which is in turn causing Y. If we want to find the effect of Xp
on Y we could first enter Xm and compute r² and then enter Xp and see how much R² increases. By
first entering the covariate, we have statistically removed (part of) its effect on Y and obtained a
clearer picture of the effect of Xp on Y (after removing the confounded nuisance variable’s effect).
This is, however, very risky business, because this adjustment may actually remove part (or all) of
the actual causal effect of Xp on Y. For example, it may be that good fathers give their youngsters
lots of care, causing them to be healthy, and that mothers simply passively respond, spending more
time with (paternally caused) healthy youngsters than with unhealthy youngsters. By first removing
the noncausal “effect” of Xm on Y we, with our maternal bias, would have eliminated part of the truly
causal effect of Xp on Y. Clearly our a priori biases can affect the results of such sequential analyses.
Stepwise multiple regression analysis employs one of several available statistical
algorithms to order the entry (and/or deletion) of predictors from the model being constructed. I opine
that stepwise analysis is one of the most misunderstood and abused statistical procedures employed
by psychologists. Many psychologists mistakenly believe that such an analysis will tell you which
predictors are importantly related to Y and which are not. That is a very dangerous delusion.
Imagine that among your predictors are two, let us just call them A and B, each of which is well
correlated with the criterion variable, Y. If A and B are redundant (explain essentially the same
portion of the variance in Y), then one, but not both, of A and B will be retained in the final model
constructed by the stepwise technique. Whether it is A or B that is retained will be due to sampling
error. In some samples A will, by chance, be just a little better correlated with Y than is B, while in
other samples B will be, by chance, just a little better correlated with Y than is A. With your sample,
whether it is A or B that is retained in the model does not tell you which of A and B is more
importantly related to Y. I strongly recommend that persons not use stepwise techniques until they have received advanced instruction in their use and interpretation. See this warning.
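The following simulation sketch illustrates the point about redundant predictors: A and B are constructed to be nearly redundant, and which of them shows the stronger correlation with Y (and so would be entered first by a forward stepwise algorithm) flips from sample to sample, essentially by chance.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_samples = 100, 1000
picked_A = 0

for _ in range(n_samples):
    common = rng.normal(size=n)
    A = common + rng.normal(scale=0.3, size=n)   # A and B are largely redundant
    B = common + rng.normal(scale=0.3, size=n)
    Y = common + rng.normal(scale=1.0, size=n)   # Y depends on what A and B share

    r_A = abs(np.corrcoef(A, Y)[0, 1])
    r_B = abs(np.corrcoef(B, Y)[0, 1])
    picked_A += r_A > r_B                        # the "stronger" predictor would enter first

print(f"A would be entered first in {picked_A / n_samples:.0%} of samples")   # roughly 50%
```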
Assumptions
There are no assumptions involved in computing point estimates of R, a, $b_i$, or $s_{est\,Y}$, but as soon as you use t or F to put a confidence interval on one of these estimates, or to test a hypothesis about one of them, there are assumptions. Exactly what the assumptions are
depends on whether you have adopted a correlation model or a regression model, which depends on
whether you treat the X variable(s) as fixed (regression) or random (correlation). Review this
distinction between regression and correlation in the document Bivariate Linear Correlation and then
work through my lesson on Producing and Interpreting Residuals Plots in SAS.
Return to Wuensch’s Stats Lessons Page
Copyright 2012, Karl L. Wuensch - All rights reserved.