Economics 475: Econometrics
Homework #4: Answers
This homework is due Thursday, May 5th.
1.
A large number of regressions investigate why some counties experience higher murder
rates. These regressions typically estimate equations similar to:
(1)
$M_i = \beta_0 + \beta_1 P_i + \beta_2 U_i + e_{1i}$
where M is the number of murders per 100,000 residents, P is the number of policemen per
100,000 residents, U is the unemployment rate, i indexes counties, and $e_{1i}$ has mean zero and
variance $\sigma_1^2$.
a. What signs do you expect $\beta_1$ and $\beta_2$ to take?
I would expect counties with more police to have lower murder rates ($\beta_1 < 0$) and counties with higher
unemployment rates to have higher murder rates ($\beta_2 > 0$).
b. One way to handle the problems encountered in part b is to think of murders being
determined simultaneously with police presence. Consider the simultaneous system of
equations:
(2)
$M_i = \beta_0 + \beta_1 P_i + \beta_2 U_i + e_{1i}$
(3)
$P_i = \alpha_0 + \alpha_1 M_i + \alpha_2 Inc_i + e_{2i}$
where $Inc_i$ is the county's level of per capita income.
What are the reduced form equations for M and P?
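$$M_i = \frac{\beta_0 + \beta_1\alpha_0}{1 - \beta_1\alpha_1} + \frac{\beta_1\alpha_2}{1 - \beta_1\alpha_1} Inc_i + \frac{\beta_2}{1 - \beta_1\alpha_1} U_i + \frac{\beta_1}{1 - \beta_1\alpha_1} e_{2i} + \frac{1}{1 - \beta_1\alpha_1} e_{1i}$$
$$P_i = \frac{\alpha_0 + \alpha_1\beta_0}{1 - \beta_1\alpha_1} + \frac{\alpha_2}{1 - \beta_1\alpha_1} Inc_i + \frac{\alpha_1\beta_2}{1 - \beta_1\alpha_1} U_i + \frac{\alpha_1}{1 - \beta_1\alpha_1} e_{1i} + \frac{1}{1 - \beta_1\alpha_1} e_{2i}$$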
c. If equations (2) and (3) describe the murder rate, what is the covariance between e1 and P?
What is the covariance between e1 and U? Given these covariances, what will happen to an OLS
estimate of (2)? Specifically, what will 𝛽̂1 and 𝛽̂2 be relative to their true values?
A high M (caused by a high e1) leads counties to hire more police, so P and e1 are positively correlated.
Specifically, if e1 and e2 are uncorrelated, the covariance is
$E[e_1(P - \bar{P})] = \frac{\alpha_1}{1 - \beta_1\alpha_1}\sigma_{e_1}^2$.
The covariance between e1 and U is zero.
Estimating the regression in (1) by OLS would thus lead to biased coefficients: the estimate of $\beta_1$ would be
biased in a positive direction, and the estimate of $\beta_2$ would be biased in a direction that depends upon U's
correlation with P and M.
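The positive bias described above can be checked with a small simulation. The following is only a sketch: every parameter value is made up for illustration (none comes from the homework), Inc, U, e1 and e2 are assumed to be independent standard normal draws, and the helper name `ols_two_regressors` is hypothetical.

```python
import random

def ols_two_regressors(y, x1, x2):
    """Return the OLS slopes (b1, b2) of y on x1 and x2, with an intercept."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

random.seed(4)

# Assumed (illustrative) structural parameters, not estimates from the homework.
beta0, beta1, beta2 = 10.0, -0.5, 1.0    # murder equation
alpha0, alpha1, alpha2 = 5.0, 0.5, 1.0   # police equation
d = 1 - beta1 * alpha1

M, P, U = [], [], []
for _ in range(5000):
    inc, u = random.gauss(0, 1), random.gauss(0, 1)
    e1, e2 = random.gauss(0, 1), random.gauss(0, 1)
    # Reduced form for M, then the structural equation for P.
    m = (beta0 + beta1 * alpha0 + beta1 * alpha2 * inc + beta2 * u + beta1 * e2 + e1) / d
    p = alpha0 + alpha1 * m + alpha2 * inc + e2
    M.append(m)
    P.append(p)
    U.append(u)

b1_hat, b2_hat = ols_two_regressors(M, P, U)
print("true beta1:", beta1, "OLS estimate:", round(b1_hat, 3))
```

Under these assumed values the OLS slope on P lands well above the true value of -0.5, because P contains a positive multiple of e1.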
d. Are structural equations (2) and (3) over, exactly, or underidentified?
In this case, there are two exogenous variables, U and Inc. Equation (2) has two slope coefficients. Since it has
as many slope coefficients as there are exogenous variables, equation (2) is exactly identified. Likewise,
equation (3) is exactly identified.
e. When I solve for the reduced form equations for M and P, I get:
(4)
$M_i = \pi_0 + \pi_1 Inc_i + \pi_2 U_i + w_i$
(5)
$P_i = \pi_3 + \pi_4 Inc_i + \pi_5 U_i + v_i$
where the $\pi$'s are functions of the $\alpha$'s and $\beta$'s and the w's and v's are functions of the random
error terms and the $\alpha$'s and $\beta$'s. After using OLS to estimate equations (4) and (5), I find:
$\hat{\pi}_0 = .01$, $\hat{\pi}_1 = -5$, $\hat{\pi}_2 = 12$, $\hat{\pi}_3 = 8$, $\hat{\pi}_4 = 7$, $\hat{\pi}_5 = 1$
What are your ILS estimates of $\beta_0$, $\beta_1$, $\beta_2$, $\alpha_0$, $\alpha_1$, $\alpha_2$?
Using these six estimates and the six reduced form coefficient expressions from part b, I can isolate each $\alpha$
and $\beta$. I find:
(2)
$M_i = 5.72429 - .714286 P_i + 12.7143 U_i + e_{1i}$
(3)
$P_i = 7.99917 + .083333 M_i + 7.41667 Inc_i + e_{2i}$
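The ILS arithmetic can be reproduced in a few lines. This is a sketch rather than the author's own calculation: the function name and the `pi0` to `pi5` argument names are mine, and the reduced-form slope on Inc in the M equation is entered as -5, the sign implied by the reported negative coefficient on P.

```python
def ils_from_reduced_form(pi0, pi1, pi2, pi3, pi4, pi5):
    """Recover structural parameters of the exactly identified system
        M = beta0 + beta1*P + beta2*U + e1
        P = alpha0 + alpha1*M + alpha2*Inc + e2
    from reduced-form coefficients
        M = pi0 + pi1*Inc + pi2*U + w
        P = pi3 + pi4*Inc + pi5*U + v
    """
    beta1 = pi1 / pi4            # ratio of the Inc coefficients
    alpha1 = pi5 / pi2           # ratio of the U coefficients
    d = 1 - beta1 * alpha1       # common denominator of the reduced form
    beta2 = pi2 * d
    alpha2 = pi4 * d
    beta0 = pi0 - beta1 * pi3    # intercepts: pi0 = beta0 + beta1*pi3
    alpha0 = pi3 - alpha1 * pi0  # intercepts: pi3 = alpha0 + alpha1*pi0
    return {"beta0": beta0, "beta1": beta1, "beta2": beta2,
            "alpha0": alpha0, "alpha1": alpha1, "alpha2": alpha2}

estimates = ils_from_reduced_form(0.01, -5, 12, 8, 7, 1)
for name, value in estimates.items():
    print(name, round(value, 6))
```

Because each equation is exactly identified, the six reduced-form coefficients map to exactly one set of six structural coefficients.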
2.
Consider the following 4 structural equations representing the macroeconomy:
(1)
$C_t = B_0 + B_1 Y_t + B_2 i_t + e_{1t}$
(2)
$Y_t = C_t + I_t + G_t + NX_t$
(3)
$i_t = B_3 + B_4 M_t + B_5 Y_t + e_{2t}$
(4)
$M_t = B_6 + B_7 i_{t-1} + B_8 Y_{t-1} + e_{3t}$
where Y = RGDP
C = Real Consumption expenditures
I = Real Investment
G = Real Government Expenditures
NX = Real Net Exports
i = nominal interest rate on 1 year T-bill
M = M2
Hints: In econometrics terminology, equations 1, 3, and 4 are stochastic—that is they contain
error terms and coefficients that require estimation. Equation 2 is non-stochastic, it is an identity
and thus needs no estimation. Equations which do not suffer from simultaneous equations bias
may be estimated with OLS, thus if you determine that one of the above stochastic equations
does not suffer from simultaneous bias, then it may be estimated without any correction BUT if
it is not truly simultaneous, then it is also not part of the system of equations and hence should
not be included when determining over-, exact-, or under-identification.
The data for the above variables are on my class website.
a. Using OLS and not correcting for simultaneous equations, estimate equation 1. What do you
find for American MPC? Given your understanding of simultaneity bias, in which direction do
you believe your estimate of the MPC is biased? Be sure to explain your reasoning.
I find:
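. reg pce gdp i

      Source |       SS       df       MS              Number of obs =     205
-------------+------------------------------           F(  2,   202) =       .
       Model |  1.9353e+09     2   967641536           Prob > F      =  0.0000
    Residual |  1020175.07   202  5050.37161           R-squared     =  0.9995
-------------+------------------------------           Adj R-squared =  0.9995
       Total |  1.9363e+09   204  9491682.58           Root MSE      =  71.066

------------------------------------------------------------------------------
         pce |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .6977574   .0012789   545.58   0.000     .6952356    .7002792
           i |  -20.58601   1.823057   -11.29   0.000    -24.18067   -16.99134
       _cons |   5.180924   15.55614     0.33   0.739    -25.49232    35.85416
------------------------------------------------------------------------------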
The estimated MPC is about .698. Since an increase in e1 raises C and higher levels of C increase Y, there should be
a positive correlation between e1 and Y. This positive correlation should cause an upward bias in the estimated MPC.
I use “should” here because there is a second endogeneity problem, between e1 and i. Higher e1 raises C, which
raises Y, and if B5 is positive the higher Y raises i. Thus i is also correlated with e1, so the estimate of B2 is
biased as well, which is why the direction of the bias in the MPC is not certain.
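The direction of the first bias can be made concrete in a simplified version of the system. The sketch below assumes that the interest rate term is dropped from equation (1) and that exogenous spending is lumped into one variable $X_t = I_t + G_t + NX_t$ that is uncorrelated with $e_{1t}$. These simplifications, and the symbols $\sigma_e^2$ and $\sigma_X^2$ for the variances of $e_{1t}$ and $X_t$, are assumptions made for illustration and are not part of the homework.

$$Y_t = \frac{B_0 + X_t + e_{1t}}{1 - B_1}, \qquad Cov(Y_t, e_{1t}) = \frac{\sigma_e^2}{1 - B_1}, \qquad Var(Y_t) = \frac{\sigma_X^2 + \sigma_e^2}{(1 - B_1)^2}$$

$$plim\ \hat{B}_1 = B_1 + \frac{Cov(Y_t, e_{1t})}{Var(Y_t)} = B_1 + (1 - B_1)\frac{\sigma_e^2}{\sigma_X^2 + \sigma_e^2}$$

With $0 < B_1 < 1$ the second term is positive, so in this simplified case OLS overstates the MPC.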
b. Justify or criticize each stochastic equation. Do they represent accurately your understanding
of macroeconomic theory?
Equation (1) is easy to criticize from a microeconomics standpoint; it is unlikely that any individual bases
consumption decisions on something akin to current income. Instead, individuals likely make consumption
decisions based upon the discounted present value of all future expected earnings. From a macroeconomics
standpoint, it is clear that more than interest rates and disposable income impact consumption—wealth, previous
values of interest rates, etc. are also likely to be independent variables in equation (1). Thus, equation (1) likely
suffers from omitted variable bias. Equation (3) captures the relationship between the quantity of money available
and interest rates, but it is also likely that the quantity of money is a function of interest rates (think of a
money supply/money demand diagram, in which the quantity of money and the interest rate are both endogenous
variables). Of course this is what equation (4) proposes, except that it uses past values of interest rates to
determine contemporaneous money, which ignores the Federal Reserve’s ability to instantaneously change the
money supply.
c. In the above equations, which variables are endogenous and which are exogenous (or
predetermined)?
Endogenous: C, i, and Y
Exogenous: I, G, NX, and M2.
Equation (4) is not part of the system of equations—all independent variables on the right hand side of equation (4)
are exogenous or predetermined and hence equation (4) may be estimated with OLS.
d. Construct the reduced form regressions equivalent to the above structural regressions. For
each structural equation explain if it is over-, under-, or exactly-identified?
The reduced form equations I find are:
$$C_t = \frac{B_0 + B_2 B_3}{1 - B_1 - B_2 B_5} + \frac{B_1 + B_2 B_5}{1 - B_1 - B_2 B_5}(I_t + G_t + NX_t) + \frac{B_2 B_4}{1 - B_1 - B_2 B_5} M_t + \frac{B_2 e_{2t} + e_{1t}}{1 - B_1 - B_2 B_5}$$
$$Y_t = \frac{B_0 + B_2 B_3}{1 - B_1 - B_2 B_5} + \frac{1}{1 - B_1 - B_2 B_5}(I_t + G_t + NX_t) + \frac{B_2 B_4}{1 - B_1 - B_2 B_5} M_t + \frac{B_2 e_{2t} + e_{1t}}{1 - B_1 - B_2 B_5}$$
$$i_t = B_3 + B_4 M_t + B_5\left[\frac{B_0 + B_2 B_3}{1 - B_1 - B_2 B_5} + \frac{1}{1 - B_1 - B_2 B_5}(I_t + G_t + NX_t) + \frac{B_2 B_4}{1 - B_1 - B_2 B_5} M_t + \frac{B_2 e_{2t} + e_{1t}}{1 - B_1 - B_2 B_5}\right] + e_{2t}$$
Since there are four exogenous variables, the only equations that would be exactly identified would be those with
four slope coefficients (there are none of these). Equations 1 and 3 each have two slope coefficients, so each of
these equations is over-identified.
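The same classification can be cross-checked with the standard order condition, which compares the number of exogenous variables excluded from an equation with the number of endogenous variables on its right-hand side. This is a sketch using a different counting rule from the one in the answer above. The order condition is only a necessary condition, the function name is hypothetical, and I, G, NX and M are treated as exogenous, as in part c.

```python
def order_condition(n_excluded_exogenous, n_rhs_endogenous):
    """Classify an equation by the order condition (necessary, not sufficient)."""
    if n_excluded_exogenous > n_rhs_endogenous:
        return "over-identified"
    if n_excluded_exogenous == n_rhs_endogenous:
        return "exactly identified"
    return "under-identified"

# Problem 2, equation (1): Y and i on the right-hand side; I, G, NX, M excluded.
consumption = order_condition(n_excluded_exogenous=4, n_rhs_endogenous=2)

# Problem 2, equation (3): Y on the right-hand side; I, G, NX excluded (M is included).
interest = order_condition(n_excluded_exogenous=3, n_rhs_endogenous=1)

# Problem 1, murder equation: P on the right-hand side; Inc excluded.
murder = order_condition(n_excluded_exogenous=1, n_rhs_endogenous=1)

print(consumption, interest, murder, sep="\n")
```

Both counting rules give the same answers for the equations in Problems 1 and 2.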
e. Estimate the reduced form equation for RGDP using OLS. Do you find coefficients that you
don’t expect?
I find:
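. reg gdp inv g nx M2

      Source |       SS       df       MS              Number of obs =     208
-------------+------------------------------           F(  4,   203) =38774.81
       Model |  4.1649e+09     4  1.0412e+09           Prob > F      =  0.0000
    Residual |  5451203.43   203  26853.2189           R-squared     =  0.9987
-------------+------------------------------           Adj R-squared =  0.9987
       Total |  4.1704e+09   207  20146689.6           Root MSE      =  163.87

------------------------------------------------------------------------------
         gdp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         inv |   2.249857   .0571774    39.35   0.000     2.137119    2.362595
           g |   2.837605   .1760108    16.12   0.000     2.490561    3.184649
          nx |   .7705119    .151579     5.08   0.000     .4716406    1.069383
          M2 |   .2688493   .0810355     3.32   0.001     .1090701    .4286285
       _cons |  -201.1313   27.46067    -7.32   0.000     -255.276   -146.9866
------------------------------------------------------------------------------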
All coefficients are positive which is what I would expect. More of each of these things would theoretically increase
output.
f. Using your estimates from part e, perform a 2SLS regression on equation #1. Compare these
estimates with those you made in part a. Have you eliminated all endogeneity in the
consumption function? If not, what should you do?
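. predict gdphat, xb
. reg pce gdphat i

      Source |       SS       df       MS              Number of obs =     205
-------------+------------------------------           F(  2,   202) =58934.03
       Model |  1.9330e+09     2   966495262           Prob > F      =  0.0000
    Residual |  3312721.52   202  16399.6115           R-squared     =  0.9983
-------------+------------------------------           Adj R-squared =  0.9983
       Total |  1.9363e+09   204  9491682.58           Root MSE      =  128.06

------------------------------------------------------------------------------
         pce |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gdphat |   .6904732   .0022823   302.53   0.000      .685973    .6949735
           i |  -46.57597   3.247347   -14.34   0.000    -52.97901   -40.17292
       _cons |   194.1473   27.58908     7.04   0.000     139.7478    248.5468
------------------------------------------------------------------------------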
The estimate of the MPC is not substantially different from what was estimated using OLS.
By theory, this hasn’t eliminated the endogeneity in the consumption function for two primary reasons. First, as
explained in part a, i is likely endogenous as well. Secondly, the instruments used (inv, g, nx, M2) are certainly not
exogenous. In reality, all of these variables are likely endogenous with real GDP.
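The two-step logic used in part f (predict the endogenous regressor from the instruments, then regress on the prediction) can be illustrated on simulated data where the instrument is exogenous by construction. This is a sketch under strong assumptions: a simplified model with no interest rate, a single exogenous spending variable, and made-up parameter values. It does not use the class data, and `simple_ols` is a hypothetical helper.

```python
import random

def simple_ols(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

random.seed(475)

# Assumed model: C = b0 + b1*Y + e and Y = C + X, with X exogenous.
b0, b1, n = 50.0, 0.7, 5000
spending = [random.gauss(100, 10) for _ in range(n)]   # X, the instrument
shock = [random.gauss(0, 10) for _ in range(n)]        # e, the consumption shock
income = [(b0 + x + e) / (1 - b1) for x, e in zip(spending, shock)]
consumption = [y - x for y, x in zip(income, spending)]

# Plain OLS of C on Y: biased because Y depends on the shock e.
_, mpc_ols = simple_ols(income, consumption)

# Stage 1: regress Y on the instrument and form fitted values.
a0, a1 = simple_ols(spending, income)
income_hat = [a0 + a1 * x for x in spending]

# Stage 2: regress C on the fitted values of Y.
_, mpc_2sls = simple_ols(income_hat, consumption)

print("true MPC:", b1)
print("OLS estimate:", round(mpc_ols, 3))
print("2SLS estimate:", round(mpc_2sls, 3))
```

In this setup the second-stage slope is close to the true MPC while plain OLS overstates it. Note that the standard errors from a manually run second stage are generally not the correct 2SLS standard errors, because the residuals are computed with the fitted rather than the actual regressor.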
3.
Suppose you want to test whether girls who attend a girls’ high school do better in math
than girls who attend coed schools. You have a random sample of senior high school girls and
measure the variable score, an outcome of a mathematics standardized test. Let girlhs be a
dummy variable indicating whether a student attends a girls’ high school. Consider the
regression Scorei = B0 + B1Girlhsi + εi.
a. Suppose that parental support and motivation are unmeasured factors in ε. How does this fact
impact estimates of B1?
The omission of parental support (p) from this regression likely biases the estimate of B1. If parents with high
levels of support are more likely to send their children to girls’ schools and if parents with high support raise
students who score well on an exam, then the estimate of B1 will be biased upwards. In other words, if
Cov(p, Girlhs) > 0 and Cov(p, Score) > 0, then the OLS estimate of B1 will be higher than its true value.
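The sign argument can be written out with the usual omitted variable bias formula. As a sketch, suppose the true model is $Score_i = B_0 + B_1 Girlhs_i + \gamma p_i + u_i$, where $u_i$ is uncorrelated with both regressors. The coefficient $\gamma$ and the error $u_i$ are notation introduced here for illustration and do not appear in the problem.

$$plim\ \hat{B}_1 = B_1 + \gamma\,\frac{Cov(p_i, Girlhs_i)}{Var(Girlhs_i)}$$

If parental support raises scores ($\gamma > 0$) and is positively correlated with attending a girls’ school, the second term is positive and OLS overstates $B_1$.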
b. Consider the variable Numgirl where Numgirl is the number of girls’ high schools within a 20
mile radius of the observation’s home. Under what conditions could Numgirl be used as a valid
IV for Girlhs.
Numgirl is an appropriate instrument in this case if Numgirl is correlated with Girlhs and if Numgirl is not
correlated with ε (in this case, since ε is a function of parental support, what we are really saying is that
Numgirl must be uncorrelated with parental support). It seems likely that Numgirl and Girlhs are correlated (the
more local girls’ schools, the more likely a girl attends one). It is more questionable whether Numgirl is
uncorrelated with the error term. For instance, parents interested in their children’s education may purposefully
move to an area with many schooling options.
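Both conditions show up directly in the simple IV estimator. The expression below is the standard single-instrument result, written in this problem's variable names as a sketch:

$$\hat{B}_1^{IV} = \frac{\widehat{Cov}(Numgirl_i, Score_i)}{\widehat{Cov}(Numgirl_i, Girlhs_i)}, \qquad plim\ \hat{B}_1^{IV} = B_1 + \frac{Cov(Numgirl_i, \varepsilon_i)}{Cov(Numgirl_i, Girlhs_i)}$$

The denominator must be nonzero (Numgirl is correlated with Girlhs) and the numerator of the last term must be zero (Numgirl is uncorrelated with ε) for the estimator to be consistent.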
4.
Test for autocorrelation in the data you will use for your final project.
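One common first check is the Durbin-Watson statistic computed from a regression's residuals. The assignment does not say which test to use, so this is only a sketch of one option. The function name and the toy residual series are made up for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared first differences of the
    residuals divided by the sum of squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Toy residual series (not real data).
alternating = [1.0, -1.0, 1.0, -1.0]   # sign flips every period
persistent = [1.0, 1.0, 1.0, 1.0]      # no change from period to period

print(durbin_watson(alternating))  # 3.0
print(durbin_watson(persistent))   # 0.0
```

Values near 2 are consistent with no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. A formal conclusion requires comparing the statistic with the tabulated lower and upper critical bounds.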