Multiple Regression Analysis
y = β0 + β1x1 + β2x2 + ... + βkxk + u
1. Estimation
Introduction
• Main drawback of simple regression is that with 1 RHS variable it is unlikely that u is uncorrelated with x
• Multiple regression allows us to control for those “other” factors
• The more variables we have, the more of y we will be able to explain (better predictions)
• e.g. Earn = β0 + β1Educ + β2Exper + u
• Allows us to measure the effect of education on earnings holding experience fixed
Parallels with Simple Regression
• β0 is still the intercept
• β1 to βk are all called slope parameters
• u is still the error term (or disturbance)
• Still need to make a zero conditional mean assumption, so now assume that
  - E(u|x1, x2, …, xk) = 0
  - other factors affecting y are not related on average to x1, x2, …, xk
• Still minimizing the sum of squared residuals
Estimating Multiple Regression
• The fitted equation can be written:
  ŷ = β̂0 + β̂1x1 + β̂2x2 + ... + β̂kxk
• And we want to minimize the sum of squared residuals:
  Σ (yi - β̂0 - β̂1xi1 - β̂2xi2 - ... - β̂kxik)²
• We will now have k+1 first order conditions:
  Σ (yi - β̂0 - β̂1xi1 - β̂2xi2 - ... - β̂kxik) = 0
  Σ xi1(yi - β̂0 - β̂1xi1 - β̂2xi2 - ... - β̂kxik) = 0
  Σ xi2(yi - β̂0 - β̂1xi1 - β̂2xi2 - ... - β̂kxik) = 0
  etc., through
  Σ xik(yi - β̂0 - β̂1xi1 - β̂2xi2 - ... - β̂kxik) = 0
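The sketch below is not from the slides; it is a minimal Python illustration, on made-up simulated data, of solving these k+1 first order conditions, which is equivalent to solving the normal equations X'X β̂ = X'y.

```python
import numpy as np

# Simulated data (hypothetical numbers, just for illustration): y = 1 + 2*x1 - 0.5*x2 + u
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)      # x1 and x2 are correlated, as in the examples above
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + u

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(n), x1, x2])

# The k+1 first order conditions are the normal equations X'X b = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print("beta_hat:", beta_hat)            # roughly [1, 2, -0.5]

# Check the FOCs: the residuals are orthogonal to every column of X
residuals = y - X @ beta_hat
print("X'residuals:", X.T @ residuals)  # all entries near zero
```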
Interpreting the Coefficients
ŷ = β̂0 + β̂1x1 + β̂2x2 + ... + β̂kxk
• We can obtain the predicted change in y given changes in the x variables:
  Δŷ = β̂1Δx1 + β̂2Δx2 + ... + β̂kΔxk
• The intercept drops out because it's a constant, so its change is zero. If we hold x2, …, xk fixed, then
  Δŷ = β̂1Δx1   or   Δŷ/Δx1 = β̂1
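A tiny sketch (an assumed example, not from the slides) of using estimated coefficients to predict the change in y for given changes in the x's:

```python
# Suppose a fitted earnings equation with hypothetical coefficients:
# earn-hat = 5000 + 1200*educ + 300*exper
beta_hat = {"educ": 1200.0, "exper": 300.0}

# Predicted change in earnings for 2 more years of education, holding experience fixed
delta_x = {"educ": 2.0, "exper": 0.0}
delta_y_hat = sum(beta_hat[k] * delta_x[k] for k in beta_hat)
print(delta_y_hat)   # 2400.0 = beta_hat_educ * delta_educ
```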
Interpreting the Coefficients
• Note that this is the same interpretation we gave β̂1 in the univariate regression model: it tells us how much a 1 unit change in x changes y.
  - In this more general case, it tells us the effect of x1 on y, holding the other x's constant.
  - We can call this a ceteris paribus effect.
• Of course β̂2 tells us the ceteris paribus effect of a 1 unit change in x2, and so forth.
iClickers
Question: What day of the week is today?
A) Monday
B) Tuesday
C) Wednesday
D) Thursday
E) Friday
Question: Is next week Spring Break?
A) Yes
B) No
iClickers
Press any letter on your iClicker for showing
up on a rainy Friday after Midterm 1 and
before Spring Break.
Ceteris Paribus interpretation
• Imagine we rerun our education/earnings regression, but include lots of controls on the RHS (x2, x3, …, xk)
  - If we control for everything that affects earnings and is correlated with education, then we can truly claim that “holding everything else constant, β̂1 gives an estimate of the effect of an extra year of education on earnings.”
  - Because if everything else is controlled for, the zero conditional mean assumption holds.
Ceteris Paribus interpretation
• This is a very powerful feature of multiple regression
  - Suppose only 2 things determine earnings: education and experience.
  - In a laboratory setting, you'd want to pick a bunch of people with the same experience (perhaps average experience) and then vary education across them, and look for differences in earnings that resulted.
  - This is obviously impractical/unethical
Ceteris Paribus interpretation
! If you couldn’t conduct an experiment, you could go out
and select only people with average experience and then
ask them how much education they have and how much
they earn
 
 
Then run univariate regression of earnings on education
This is inconvenient to have to pick your sample so carefuly
! Multiple regression allows for you to control (almost as if
you’re in a lab--more on that qualifier later in the course)
for differences in individuals along dimensions other than
their level of education.
 
Even though we haven’t picked a sample holding experience
constant, we can interpret the coefficient on education as if we
held experience constant in our sampling.
10
A “Partialling Out” Interpretation
• Suppose that x2 affects y and that x1 and x2 are correlated. This means the zero conditional mean assumption is violated if we simply estimate a univariate regression of y on x1. (Convince yourself of this.)
  - The usual way to deal with this is to “control for x2” by including x2 on the right-hand side (multiple regression). In practice, this is how we'll usually deal with the problem.
  - In principle, there's another way we could do this. If we could come up with a measure of x1 that is purged of its relationship with x2, then we could estimate a univariate regression of y on this “purged” version of x1. x2 would still be contained in the error term, but it would no longer be correlated with x1, so the zero conditional mean assumption would not be violated.
A “Partialling Out” Interpretation
• Consider the case where k = 2, so if we do multiple regression we estimate
  yi = β0 + β1xi1 + β2xi2 + ui
• Recall the form of the simple-regression slope estimator from the univariate case:
  Σ (xi1 - x̄1)yi / Σ (xi1 - x̄1)²
• Another way to express the multiple-regression estimate of β1 is to estimate, by univariate regression,
  yi = β0 + β1r̂i1 + εi
  where the r̂i1 are the residuals from the estimated regression of x1 on x2:
  xi1 = δ0 + δ2xi2 + ri,   x̂i1 = δ̂0 + δ̂2xi2,   r̂i1 = xi1 - x̂i1
• Applying the simple-regression slope formula with r̂i1 in place of xi1, the estimate of the coefficient β1 is:
  β̂1 = Σ (r̂i1 - r̄1)yi / Σ (r̂i1 - r̄1)²,   where r̄1 is the sample mean of the r̂i1
“Partialling Out” continued
• This implies that we estimate the same effect of x1 either by
  1. regressing y on x1 and x2 (multiple regression), or by
  2. regressing y on the residuals from a regression of x1 on x2 (univariate regression)
• These residuals are what is left of the variation in x1 after x2 has been partialled out. They contain the variation in x1 that is independent of x2.
• Therefore, only the part of x1 that is uncorrelated with x2 is being related to y.
• By regressing y on the residuals of the regression of x1 on x2, we are estimating the effect of x1 on y after the effect of x2 on x1 has been “netted out” or “partialled out.”
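As a numerical check (a sketch on made-up simulated data, not part of the slides), the two routes give the same slope on x1:

```python
import numpy as np

# Hypothetical simulated data to illustrate the "partialling out" result
rng = np.random.default_rng(1)
n = 1000
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)          # x1 and x2 correlated
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)

def ols(X, y):
    """Return OLS coefficients for design matrix X (includes intercept column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# 1. Multiple regression of y on x1 and x2
b_multiple = ols(np.column_stack([ones, x1, x2]), y)

# 2. Partial out x2 from x1, then regress y on the residuals
d = ols(np.column_stack([ones, x2]), x1)          # regress x1 on x2
r1 = x1 - np.column_stack([ones, x2]) @ d         # residuals r-hat_i1
b_partial = ols(np.column_stack([ones, r1]), y)   # univariate regression of y on the residuals

print(b_multiple[1], b_partial[1])   # the two slope estimates on x1 coincide
```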
Simple vs Multiple Reg Estimate
• Compare the simple regression ỹ = β̃0 + β̃1x1 with the multiple regression ŷ = β̂0 + β̂1x1 + β̂2x2
• Generally, β̃1 ≠ β̂1 unless:
  1. β̂2 = 0 (i.e. no partial effect of x2 on y)
     - the first order conditions will be the same in either case if this is true
  or
  2. x1 and x2 are uncorrelated in the sample.
     - if this is the case, then regressing x1 on x2 results in no partialling out (the residuals give the complete variation in x1). In the general case (with k regressors), we need x1 uncorrelated with all other x's.
Goodness-of-Fit
• As in the univariate case, we can think of each observation being made up of an explained part and an unexplained part: yi = ŷi + ûi
• We can then define the following:
  Σ (yi - ȳ)²  is the total sum of squares (SST)
  Σ (ŷi - ȳ)²  is the explained sum of squares (SSE)
  Σ ûi²        is the residual sum of squares (SSR)
• Then, SST = SSE + SSR
Goodness-of-Fit (continued)
• We can use this to think about how well our sample regression line fits our sample data
• The fraction of the total sum of squares (SST) that is explained by the model is called the R-squared of the regression
• R² = SSE/SST = 1 - SSR/SST
• Can also think of R-squared as the squared correlation coefficient between the actual and fitted values of y:
  R² = [ Σ (yi - ȳ)(ŷi - ȳ) ]² / [ Σ (yi - ȳ)² · Σ (ŷi - ȳ)² ]
  (the fitted values ŷi have the same sample mean ȳ as the yi)
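A short sketch (assumed simulated data, not from the slides) verifying SST = SSE + SSR and the equivalent ways of computing R²:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + 1.5 * x1 - 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta_hat
u_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y_hat - y.mean()) ** 2)
SSR = np.sum(u_hat ** 2)

print(np.isclose(SST, SSE + SSR))                 # SST = SSE + SSR
print(SSE / SST, 1 - SSR / SST)                   # two equivalent ways to get R-squared
print(np.corrcoef(y, y_hat)[0, 1] ** 2)           # squared correlation of y and y-hat
```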
More about R-squared
• R² can never decrease when another independent variable is added to a regression, and usually will increase
• When you add another variable to the RHS, SSE will either not change (if that variable explains none of the variation in y) or it will increase (if that variable explains some of the variation in y)
• SST will not change
• So while it's tempting to interpret an increase in the R-squared after adding a variable to the RHS as an indication that the new variable is a useful addition to the model, one shouldn't read too much into it.
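A quick sketch (simulated, hypothetical data) showing that adding a regressor, even a pure-noise one, does not lower R²:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
noise = rng.normal(size=n)               # a variable that has nothing to do with y

def r_squared(y_, X_):
    b = np.linalg.lstsq(X_, y_, rcond=None)[0]
    e = y_ - X_ @ b
    return 1 - e @ e / np.sum((y_ - y_.mean()) ** 2)

ones = np.ones(n)
r2_short = r_squared(y, np.column_stack([ones, x1]))
r2_long = r_squared(y, np.column_stack([ones, x1, noise]))
print(r2_short, r2_long)                 # r2_long >= r2_short, even though 'noise' is irrelevant
```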
Assumptions for Unbiasedness
• The assumptions needed to get unbiased estimates in simple regression can be restated:
  1. Population model is linear in parameters:
     y = β0 + β1x1 + β2x2 + … + βkxk + u
  2. We have a random sample of size n from the population model, {(xi1, xi2, …, xik, yi): i = 1, 2, …, n}
     - i.e. no sample selection bias
     Then the sample model is
     yi = β0 + β1xi1 + β2xi2 + … + βkxik + ui
Assumptions for Unbiasedness
3. None of the x’s is constant, and there are no exact
linear relationships among the x’s
*We say “perfect collinearity” exists when one of
the x’s is an exact linear combination of other x’s
*It’s OK for x’s to be correlated with each other,
but perfect correlation makes estimation
impossible; high correlation can be problematic.
4. E(u|x1, x2,… xk) = 0
*all of the explanatory variables are “exogenous”
-We call an x variable “endogenous” if it is
correlated with the error term.
-Need to respecify model if at least one x
19
variable is endogenous.
Examples of Perfect Collinearity
1. x1 = 2(x2) - one of these is redundant
   * x2 perfectly predicts x1, so we should throw out either x1 or x2.
   * If an x is constant (e.g. x = c), then it is a perfect linear function of the intercept term: c = aβ0 for some constant a.
2. The linear combinations can be more complicated
   * Suppose you had data on people's age and their education, but not on their experience. If you wanted to measure experience, you could assume they started work right out of school and calculate experience as Exp = Age - (Educ + 5), where the 5 accounts for the 5 years before we start school as kids.
Examples of Perfect Collinearity
   * But then these three variables would be perfectly collinear (each a linear combination of the others)
   * In this case, only 2 of those variables could be included on the RHS.
3. Including income and income² does not violate this assumption (not an exact linear function)
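A small sketch (hypothetical simulated data, not from the slides) showing how exact collinearity makes the design matrix rank deficient, while income and income² are fine:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
age = rng.uniform(18, 65, size=n)
educ = rng.uniform(8, 20, size=n)
exper = age - (educ + 5)                 # constructed exactly as Exp = Age - (Educ + 5)

# Intercept, age, educ, exper: exper is an exact linear combination of the other columns
X = np.column_stack([np.ones(n), age, educ, exper])
print(np.linalg.matrix_rank(X))          # 3, not 4 -> perfect collinearity, X'X is singular

# Dropping one of the three collinear variables restores full column rank
print(np.linalg.matrix_rank(X[:, :3]))   # 3, full column rank

# Income and income² are not exactly linearly related, so they are fine
income = rng.lognormal(size=n)
Z = np.column_stack([np.ones(n), income, income ** 2])
print(np.linalg.matrix_rank(Z))          # 3
```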
Unbiasedness
• If assumptions 1-4 hold, it can be shown that:
  E(β̂j) = βj,  j = 0, 1, ..., k
  i.e. that the estimated parameters are unbiased
• What happens if we include variables in our specification that don't belong (over-specify)?
  - Called “overspecification” or “inclusion of irrelevant regressors.”
  - Not a problem for unbiasedness.
• What if we exclude a variable from our specification that does belong?
  - This violates assumption 4. In general, OLS will be biased; this is called the “omitted variables problem.”
Omitted Variable Bias
• This is one of the greatest problems in regression analysis
  - If you omit a variable that affects y and is correlated with x, then the coefficient that you estimate on x will be partly picking up the effect of the omitted variable, and hence will misrepresent the true effect of x on y.
  - This is why we need the zero conditional mean assumption to hold; and to make sure it holds, we need to include all relevant x variables.
iClickers
• Question: Suppose the true model of y is given by
  y = β0 + β1x1 + β2x2 + u
  but you don't have data on x2, so you estimate
  ỹ = β̃0 + β̃1x1 + v instead.
  Assume that x1 and x2 are correlated, but not perfectly. This is a problem because:
  A) x1 and x2 are perfectly collinear.
  B) u and v are different from each other.
  C) v contains x2.
  D) None of the above.
Omitted Variable Bias
• Suppose the true model is given by
  y = β0 + β1x1 + β2x2 + u
• But we estimate ỹ = β̃0 + β̃1x1; then
  β̃1 = Σ (xi1 - x̄1)yi / Σ (xi1 - x̄1)²
• Recall the true model, so that
  yi = β0 + β1xi1 + β2xi2 + ui
Omitted Variable Bias (cont)
• So the numerator of β̃1 becomes
  Σ (xi1 - x̄1)(β0 + β1xi1 + β2xi2 + ui)
  = β0 Σ (xi1 - x̄1) + β1 Σ (xi1 - x̄1)xi1 + β2 Σ (xi1 - x̄1)xi2 + Σ (xi1 - x̄1)ui
• Recall that Σ (xi1 - x̄1) = 0 and Σ (xi1 - x̄1)xi1 = Σ (xi1 - x̄1)²
• So the numerator becomes
  β1 Σ (xi1 - x̄1)² + β2 Σ (xi1 - x̄1)xi2 + Σ (xi1 - x̄1)ui
Omitted Variable Bias (cont)
• Returning to our formula for β̃1:
  β̃1 = Σ (xi1 - x̄1)yi / Σ (xi1 - x̄1)²
     = [ β1 Σ (xi1 - x̄1)² + β2 Σ (xi1 - x̄1)xi2 + Σ (xi1 - x̄1)ui ] / Σ (xi1 - x̄1)²
     = β1 + β2 Σ (xi1 - x̄1)xi2 / Σ (xi1 - x̄1)² + Σ (xi1 - x̄1)ui / Σ (xi1 - x̄1)²
Omitted Variable Bias (cont)
• So,
  β̃1 = β1 + β2 Σ (xi1 - x̄1)xi2 / Σ (xi1 - x̄1)² + Σ (xi1 - x̄1)ui / Σ (xi1 - x̄1)²
• Taking expectations (conditional on the x's) and recalling that E(ui) = 0, we get
  E(β̃1) = β1 + β2 Σ (xi1 - x̄1)xi2 / Σ (xi1 - x̄1)²
• In other words, our estimate is (generally) biased. :(
Omitted Variable Bias (cont)
• How might we interpret the bias?
• Consider the regression of x2 on x1:
  x̃2 = δ̃0 + δ̃1x1,   where δ̃1 = Σ (xi1 - x̄1)xi2 / Σ (xi1 - x̄1)²
• So E(β̃1) = β1 + β2δ̃1, and the omitted variable bias, β2δ̃1, equals zero if:
  1. β2 = 0 (e.g. if x2 doesn't belong in the model)
  2. δ̃1 = 0 (e.g. if x1 and x2 don't covary)
Summary of Direction of Bias
• The direction of the bias, β2·δ̃1, depends on:
  1. The sign of δ̃1
     - positive if x1, x2 positively correlated
     - negative if x1, x2 negatively correlated
  2. The sign of β2
• Summary table:

                 Corr(x1, x2) > 0    Corr(x1, x2) < 0
  β2 > 0         Positive bias       Negative bias
  β2 < 0         Negative bias       Positive bias
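A Monte Carlo sketch (all numbers hypothetical, not from the slides) checking the bias formula E(β̃1) = β1 + β2·δ̃1 for a case with positive correlation and β2 < 0, which the table predicts gives negative bias:

```python
import numpy as np

# Check E(beta1_tilde) = beta1 + beta2*delta1 by simulation (hypothetical parameter values)
rng = np.random.default_rng(4)
beta1, beta2 = 1.0, -2.0                      # x2 has a negative effect on y
n, reps = 200, 2000

def slope(x, y):
    """Simple-regression slope of y on x (with an intercept)."""
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

estimates = []
for _ in range(reps):
    x2 = rng.normal(size=n)
    x1 = 0.5 * x2 + rng.normal(size=n)        # x1 and x2 positively correlated
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    estimates.append(slope(x1, y))            # short regression that omits x2

delta1 = 0.5 / (0.5 ** 2 + 1.0)               # population slope of x2 on x1 for this design (= 0.4)
print(np.mean(estimates))                     # roughly 0.2, well below beta1 = 1
print(beta1 + beta2 * delta1)                 # 0.2: negative bias, as the summary table predicts
```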
iClickers
Question: Suppose the true model of y is given by y = β0 + β1x1 + β2x2 + u, but you estimate ỹ = β̃0 + β̃1x1 + v instead. Suppose that x1 and x2 are negatively correlated, and that x2 has a negative effect on y. What will be the sign of the bias of β̃1?
A) Positive.
B) Negative.
C) The bias will be zero.
D) None of the above.
iClickers
Suppose that 2 things, education and ability,
determine earnings. Assume that both matter
positively for earnings, and that ability and
education positively covary.
Question: If you estimate a model of earnings that
excludes ability, the coefficient estimate on
education will be
A) positively biased.
B) negatively biased.
C) unbiased.
D) none of the above.
Omitted Variable Bias Summary
• The size of the bias is also important
• Typically we don't know the magnitude of β2 or δ̃1, but we can usually make an educated guess as to whether these are positive or negative
• What about the more general case (k+1 variables)?
  - Technically, we can only sign the bias in the more general case if all the included x's are uncorrelated with each other
OVB Example 1
• Suppose that the true model of the wage is
  wage = β0 + β1educ + β2exper + u
  but you estimate wage = β0 + β1educ + v. Which way will the estimated coefficient on educ be biased?
  1) δ̃1 has the same sign as corr(educ, exper) < 0
  2) β2 > 0 (because people with more experience tend to be more productive)
• Therefore bias = β2·δ̃1 < 0; the estimate is negatively biased
• So, if you omit experience when it should be included, the estimated coefficient on educ will be negatively biased (too small on average)
OVB Example 2
• Suppose that the true model of the wage is
  wage = β0 + β1educ + β2ability + u
  but you estimate wage = β0 + β1educ + v. Which way will the estimated coefficient on educ be biased?
  * δ̃1 has the same sign as corr(educ, ability) > 0
  * β2 > 0 (because people with more ability tend to be more productive)
  * bias = β2·δ̃1 > 0; the estimate is positively biased
• In other words, if you omit ability when it should be included, the estimated coefficient on educ will be positively biased (too big on average)
The More General Case
• For example, suppose that the true model is:
  y = β0 + β1x1 + β2x2 + β3x3 + u
  and we estimate:
  ỹ = β̃0 + β̃1x1 + β̃2x2
• Suppose x1 and x3 are correlated. In general, both β̃1 and β̃2 will be biased. β̃2 will only be unbiased if x1 and x2 are uncorrelated.
Variance of the OLS Estimators
• We now know (if assumptions 1-4 hold) that the sampling distribution of our estimate is centered around the true parameter
• Want to think about how spread out this distribution is
• Much easier to think about this variance under an additional assumption, so
• Assumption 5: Assume Var(u|x1, x2, …, xk) = σ² (homoskedasticity)
• This implies that Var(y|x1, x2, …, xk) = σ²
Variance of OLS (cont)
• The 4 assumptions for unbiasedness, plus this homoskedasticity assumption, are known as the Gauss-Markov assumptions
• These can be used to show:
  Var(β̂j) = σ² / [ Σ (xij - x̄j)² (1 - Rj²) ]
• Rj² refers to the R-squared obtained from a regression of xj on all other independent variables (including an intercept)
Important to understand what Rj² is
• It's not the R-squared for the regression of y on the x's (the usual R-squared we think of)
• It's an R-squared, but for a different regression
  - Suppose we're calculating the variance of the estimator of the coefficient on x1; then R1² is the R-squared from a regression of x1 on x2, x3, ..., xk
  - If we're calculating the variance of the estimator of the coefficient on x2, then R2² is the R-squared from a regression of x2 on x1, x3, ..., xk
  - etc.
• Whichever coefficient you're calculating the variance for, you just regress that variable on all the other RHS variables (this gives a measurement of how closely that variable is linearly related to the other RHS variables)
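The sketch below (hypothetical simulated data; σ² treated as known only for illustration) computes R1² this way and plugs it into the variance formula:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.7 * x2 + 0.3 * x3 + rng.normal(scale=0.5, size=n)   # x1 fairly correlated with x2, x3

def r_squared(y, X):
    """R-squared from regressing y on X (X should include an intercept column)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

ones = np.ones(n)
R2_1 = r_squared(x1, np.column_stack([ones, x2, x3]))       # regress x1 on the other RHS vars
SST_1 = np.sum((x1 - x1.mean()) ** 2)

sigma2 = 1.0                                                # assume a known error variance for illustration
var_b1 = sigma2 / (SST_1 * (1 - R2_1))
print(R2_1, var_b1)
```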
Components of OLS Variances
Thus, how precisely we can estimate the coefficients (their variance) depends on:
1. The error variance (σ²):
   - A larger σ² implies a higher variance of the slope coefficient estimators
   - The more “noise” (unexplained variation) in the equation, the harder it is to get precise estimates.
   - Just like the univariate case
2. The total variation in xj:
   - A larger SSTj gives us more variation in xj from which to discern an impact on y; this lowers the variance
   - Just like the univariate case
Components of OLS Variances
3. The linear relationships among the independent variables (Rj²). Note that we didn't need this term in the univariate case:
   * A larger Rj² implies a larger variance for the estimator
   * The more correlated the RHS variables, the higher the variance of the estimators (the more imprecise the estimates)
• The problem of highly correlated x variables is called “Multi-Collinearity”
  - Not clear what the solution is. Could drop some of the correlated x variables, but then you may end up with omitted variables bias.
  - Can increase your sample size, which will boost the SST, and maybe let you shrink the variance of your estimators that way.
iClickers
• Recall the expression for the variance of the slope OLS estimators in multivariate regression:
  Var(β̂j) = σ² / [ Σ (xij - x̄j)² (1 - Rj²) ]
• Question: Suppose you estimate
  yi = β0 + β1x1i + β2x2i + β3x3i + ui
  and want an expression for the variance of β̂3. You would obtain the needed R3² by estimating which of the following equations?
  A) yi = β0 + β1x1i + β2x2i + β3x3i + ui
  B) yi = β0 + β3x3i + vi
  C) x3i = δ0 + δ1x1i + δ2x2i + wi
  D) None of the above.
Example of Multi-Collinearity (high Rj²)
• Suppose that you want to try to explain differences in mortality rates across different states
• Might use variation in medical expenditures across the states (e.g. expenditures on hospitals, medical equipment, doctors, nurses, etc.)
• Problem?
  - Expenditures on hospitals will tend to be highly correlated with expenditures on doctors, etc.
  - So estimates of the impact of hospital spending on mortality rates will tend to be imprecise.
Variances in misspecified models
• Consider again the misspecified model (x2 omitted):
  ỹ = β̃0 + β̃1x1
• We showed that the estimate of β1 is biased if β2·δ̃1 does not equal zero, and that including x2 in this model does not bias the estimate even if β2 = 0; this suggests that adding x variables can only reduce bias
• So, Q: why not always throw in every possible RHS variable, just to be safe? A: Multicollinearity
• We know that in the misspecified model
  Var(β̃1) = σ² / SST1
  since R1² = 0 when there are no other variables on the RHS; adding variables will raise R1²
Misspecified Models (cont)
• Thus, Var(β̃1) < Var(β̂1) (holding σ² constant)
• Unless x1 and x2 are uncorrelated (Rj² = 0); then the variances are the same (holding σ² constant)
• Assuming that x1 and x2 are correlated:
  1) If β2 = 0: β̃1 and β̂1 will both be unbiased, and Var(β̃1) < Var(β̂1)
  2) If β2 is not equal to 0: then β̃1 will be biased, β̂1 will be unbiased, and the variance of the misspecified estimator could be greater or less than the variance of the correctly specified estimator, depending on what happens to σ²
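A simulation sketch of case 1) above (β2 = 0, x1 and x2 correlated; all numbers made up, not from the slides): both estimators are unbiased, but the short regression gives a less variable estimate of β1.

```python
import numpy as np

rng = np.random.default_rng(8)
beta1, beta2 = 1.0, 0.0                       # x2 truly has no effect on y
n, reps = 100, 3000
short, long_ = [], []

for _ in range(reps):
    x2 = rng.normal(size=n)
    x1 = 0.8 * x2 + rng.normal(size=n)        # x1 and x2 correlated
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    ones = np.ones(n)
    b_short = np.linalg.lstsq(np.column_stack([ones, x1]), y, rcond=None)[0]
    b_long = np.linalg.lstsq(np.column_stack([ones, x1, x2]), y, rcond=None)[0]
    short.append(b_short[1])
    long_.append(b_long[1])

print(np.mean(short), np.mean(long_))         # both close to beta1 = 1 (unbiased)
print(np.var(short), np.var(long_))           # Var(beta1_tilde) < Var(beta1_hat)
```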
Misspecified Models (cont)
• So, if x2 doesn't affect y (i.e. β2 = 0), it's best to leave x2 out, because we'll get a more precise estimate of the effect of x1 on y.
• Note that if x2 is uncorrelated with x1 but β2 ≠ 0, then including x2 will actually lower the variance of the estimated coefficient on x1, because its inclusion lowers σ².
• Generally, we need to include variables whose omission would cause bias
  - But we often don't want to include other variables if their correlation with other RHS variables will reduce the precision of the estimates.
Estimating the Error Variance
! We don’t know what the error variance, σ2, is,
because we don’t observe the errors, ui
! What we observe are the residuals, ûi
! We can use the residuals to form an estimate of the
error variance as we did for simple regression
!ˆ 2 =
(
2
û
" i
)
( n # k # 1)
= SSR df
! df=# of observations - # of estimated parameters
! df=n-(k+1)
47
Error Variance Estimate (cont)
• Thus, an estimate of the standard error of the jth coefficient is:
  se(β̂j) = σ̂ / [ Σ (xij - x̄j)² (1 - Rj²) ]^(1/2)
         = σ̂ / [ SSTj (1 - Rj²) ]^(1/2)
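A sketch (hypothetical simulated data, not from the slides) computing σ̂² and the standard error of each slope coefficient from this formula:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 250
x1 = rng.normal(size=n)
x2 = 0.4 * x1 + rng.normal(size=n)
y = 1.0 + 0.8 * x1 - 1.2 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])     # n x (k+1) design matrix, k = 2
k = X.shape[1] - 1
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ beta_hat

sigma2_hat = np.sum(u_hat ** 2) / (n - k - 1)          # SSR / df

def r_squared(y_, X_):
    b = np.linalg.lstsq(X_, y_, rcond=None)[0]
    e = y_ - X_ @ b
    return 1 - e @ e / np.sum((y_ - y_.mean()) ** 2)

# se(beta_hat_j) = sigma_hat / sqrt(SST_j * (1 - R_j^2)) for each slope j
for j, name in [(1, "x1"), (2, "x2")]:
    xj = X[:, j]
    others = np.delete(X, j, axis=1)                    # intercept plus the other regressor(s)
    SST_j = np.sum((xj - xj.mean()) ** 2)
    R2_j = r_squared(xj, others)
    se_j = np.sqrt(sigma2_hat / (SST_j * (1 - R2_j)))
    print(name, se_j)
```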
The Gauss-Markov Theorem
• Allows us to say something about the efficiency of OLS
  - why we should use OLS instead of some other estimator
• Given our 5 Gauss-Markov assumptions, it can be shown that the OLS estimates are “BLUE”
• Best:
  - Smallest variance among the class of linear unbiased estimators:
    Var(β̂1) ≤ Var(β̈1)
    where β̈1 denotes any other linear unbiased estimator of β1.
The Gauss-Markov Theorem
• Linear:
  - Can be expressed as a linear function of y:
    β̂j = Σ wij yi   (sum over i = 1, …, n)
    where wij is some function of the sample x's.
• Unbiased:
  - E(β̂j) = βj, for all j
• Estimator:
  - provides an estimate of the underlying parameters
* When the assumptions hold, use OLS
* When the assumptions fail, OLS is not BLUE: a better estimator may exist