Economics 475: Econometrics
Homework #4: Answers

This homework is due Thursday, May 5th.

1. A large number of regressions investigate why some counties experience higher murder rates. These regressions typically estimate equations similar to:

(1) $M_i = \beta_0 + \beta_1 P_i + \beta_2 U_i + e_{1i}$

where M is the number of murders per 100,000 residents, P is the number of policemen per 100,000 residents, U is the unemployment rate, i indexes counties, and $e_{1i}$ has mean zero and variance $\sigma_1^2$.

a. What signs do you expect $\beta_1$ and $\beta_2$ to take?

I would expect counties with more police to have lower murder rates ($\beta_1 < 0$) and counties with higher unemployment rates to have higher murder rates ($\beta_2 > 0$).

b. One way to handle the problems encountered in part b is to think of murders being determined simultaneously with police presence. Consider the simultaneous system of equations:

(2) $M_i = \beta_0 + \beta_1 P_i + \beta_2 U_i + e_{1i}$
(3) $P_i = \alpha_0 + \alpha_1 M_i + \alpha_2 Inc_i + e_{2i}$

where $Inc_i$ is the county's level of per capita income. What are the reduced form equations for M and P?

$$M_i = \frac{\beta_0 + \beta_1\alpha_0}{1 - \beta_1\alpha_1} + \frac{\beta_1\alpha_2}{1 - \beta_1\alpha_1} Inc_i + \frac{\beta_2}{1 - \beta_1\alpha_1} U_i + \frac{\beta_1}{1 - \beta_1\alpha_1} e_{2i} + \frac{1}{1 - \beta_1\alpha_1} e_{1i}$$

$$P_i = \frac{\alpha_0 + \alpha_1\beta_0}{1 - \beta_1\alpha_1} + \frac{\alpha_2}{1 - \beta_1\alpha_1} Inc_i + \frac{\alpha_1\beta_2}{1 - \beta_1\alpha_1} U_i + \frac{\alpha_1}{1 - \beta_1\alpha_1} e_{1i} + \frac{1}{1 - \beta_1\alpha_1} e_{2i}$$

c. If equations (2) and (3) describe the murder rate, what is the covariance between $e_1$ and P? What is the covariance between $e_1$ and U? Given these covariances, what will happen to an OLS estimate of (2)? Specifically, what will $\hat\beta_1$ and $\hat\beta_2$ be relative to their true values?

A high M (caused by a high $e_1$) would lead counties to hire more police; thus P and $e_1$ are positively correlated. Specifically, from the reduced form for P, the covariance is

$$E[e_1(P - \bar{P})] = \frac{\alpha_1}{1 - \beta_1\alpha_1}\sigma_1^2 .$$

The covariance between $e_1$ and U is zero. Estimating equation (2) by OLS would thus lead to biased coefficients: the estimate of $\beta_1$ would be biased upward, and the estimate of $\beta_2$ would be biased in a direction that depends upon U's correlation with P and M.

d.
Are structural equations (2) and (3) over-, exactly, or under-identified?

In this case, there are two exogenous variables, U and Inc. Equation (2) has two slope coefficients. Since it has as many slope coefficients as there are exogenous variables, equation (2) is exactly identified. Likewise, equation (3) is exactly identified.

e. When I solve for the reduced form equations for M and P, I get:

(4) $M_i = \pi_0 + \pi_1 Inc_i + \pi_2 U_i + w_i$
(5) $P_i = \pi_3 + \pi_4 Inc_i + \pi_5 U_i + v_i$

where the $\pi$'s are functions of the $\beta$'s and $\alpha$'s, and the w's and v's are functions of the random error terms and the $\beta$'s and $\alpha$'s. After using OLS to estimate equations (4) and (5), I find:

$\hat\pi_0 = .01$, $\hat\pi_1 = -5$, $\hat\pi_2 = 12$, $\hat\pi_3 = 8$, $\hat\pi_4 = 7$, $\hat\pi_5 = 1$

What are your ILS estimates of $\beta_0$, $\beta_1$, $\beta_2$, $\alpha_0$, $\alpha_1$, $\alpha_2$?

Using these six estimates and the six reduced-form coefficient expressions from part b, I can isolate each $\beta$ and $\alpha$: $\beta_1 = \pi_1/\pi_4$, $\alpha_1 = \pi_5/\pi_2$, $\beta_2 = \pi_2(1 - \beta_1\alpha_1)$, $\alpha_2 = \pi_4(1 - \beta_1\alpha_1)$, $\beta_0 = \pi_0 - \beta_1\pi_3$, and $\alpha_0 = \pi_3 - \alpha_1\pi_0$. I find:

(2) $M_i = 5.72429 - .714286 P_i + 12.7143 U_i + e_{1i}$
(3) $P_i = 7.99917 + .083333 M_i + 7.41667 Inc_i + e_{2i}$

2. Consider the following 4 structural equations representing the macroeconomy:

(1) $C_t = B_0 + B_1 Y_t + B_2 i_t + e_{1t}$
(2) $Y_t = C_t + I_t + G_t + NX_t$
(3) $i_t = B_3 + B_4 M_t + B_5 Y_t + e_{2t}$
(4) $M_t = B_6 + B_7 i_{t-1} + B_8 Y_{t-1} + e_{3t}$

where
Y = RGDP
C = Real Consumption expenditures
I = Real Investment
G = Real Government Expenditures
NX = Real Net Exports
i = nominal interest rate on 1 year T-bill
M = M2

Hints: In econometrics terminology, equations 1, 3, and 4 are stochastic; that is, they contain error terms and coefficients that require estimation. Equation 2 is non-stochastic: it is an identity and thus needs no estimation. Equations which do not suffer from simultaneous equations bias may be estimated with OLS. Thus, if you determine that one of the above stochastic equations does not suffer from simultaneity bias, it may be estimated without any correction. BUT if it is not truly simultaneous, then it is also not part of the system of equations and hence should not be included when determining over-, exact-, or under-identification.
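The hint about simultaneous equations bias can be illustrated with a small simulation. The sketch below does not use the class data: it assumes a stripped-down version of equations (1) and (2) with no interest-rate term, made-up parameter values (B0 = 100, B1 = 0.6), and hypothetical helper names (`cov`, `simulate`). It compares the OLS slope of C on Y with a simple instrumental-variables slope that uses exogenous spending X = I + G + NX as the instrument.

```python
import random


def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def simulate(n=20000, b0=100.0, b1=0.6, seed=0):
    """Simulate C = b0 + b1*Y + e and Y = C + X (all values are assumptions)."""
    rng = random.Random(seed)
    x = [rng.gauss(500.0, 50.0) for _ in range(n)]  # exogenous spending I + G + NX
    e = [rng.gauss(0.0, 50.0) for _ in range(n)]    # consumption shock e1
    # Reduced form for Y, obtained by substituting C into the identity.
    y = [(b0 + xi + ei) / (1.0 - b1) for xi, ei in zip(x, e)]
    c = [yi - xi for yi, xi in zip(y, x)]           # identity: C = Y - X
    return x, y, c


x, y, c = simulate()

# OLS slope of C on Y: contaminated because Y depends on e1.
ols_slope = cov(c, y) / cov(y, y)

# IV slope using X as the instrument for Y.
iv_slope = cov(c, x) / cov(y, x)

print(f"true MPC = 0.6, OLS = {ols_slope:.3f}, IV = {iv_slope:.3f}")
```

Under these assumed parameters the OLS slope should settle well above the true 0.6, because a positive consumption shock raises both C and Y, while the IV ratio should land close to 0.6.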
The data for the above variables are on my class website.

a. Using OLS and not correcting for simultaneous equations, estimate equation 1. What do you find for American MPC? Given your understanding of simultaneity bias, in which direction do you believe your estimate of the MPC is biased? Be sure to explain your reasoning.

I find:

. reg pce gdp i

      Source |       SS       df       MS              Number of obs =     205
-------------+------------------------------           F(  2,   202) =       .
       Model |  1.9353e+09     2   967641536           Prob > F      =  0.0000
    Residual |  1020175.07   202  5050.37161           R-squared     =  0.9995
-------------+------------------------------           Adj R-squared =  0.9995
       Total |  1.9363e+09   204  9491682.58           Root MSE      =  71.066

------------------------------------------------------------------------------
         pce |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .6977574   .0012789   545.58   0.000     .6952356    .7002792
           i |  -20.58601   1.823057   -11.29   0.000    -24.18067   -16.99134
       _cons |   5.180924   15.55614     0.33   0.739    -25.49232    35.85416
------------------------------------------------------------------------------

The estimated MPC is .698. Since an increase in e1 raises C and higher levels of C increase Y, there should be a positive correlation between e1 and Y. This positive correlation should cause an upward bias in the estimated MPC. I use "should" here because there is a second endogeneity problem, between e1 and i. A higher e1 raises C, which raises Y and, if B5 is non-zero, changes i as well. The resulting correlation between e1 and i biases the estimate of B2 too, and that bias can spill over into the estimated MPC.

b. Justify or criticize each stochastic equation. Do they represent accurately your understanding of macroeconomic theory?

Equation (1) is easy to criticize from a microeconomics standpoint; it is unlikely that any individual bases consumption decisions on something akin to current income. Instead, individuals likely make consumption decisions based upon the discounted present value of all future expected earnings. From a macroeconomics standpoint, it is clear that more than interest rates and disposable income affect consumption: wealth, previous values of interest rates, etc. are also likely to be independent variables in equation (1). Thus, equation (1) likely suffers from omitted variable bias.
Equation (3) captures the relationship between the quantity of money available and interest rates, but it is also likely that the quantity of money is a function of interest rates (think of a money supply/money demand diagram: the quantity of money and the interest rate are the endogenous variables in that model). Of course, this is what equation (4) proposes, except that it uses past values of interest rates to determine contemporaneous money, which ignores the Federal Reserve's ability to change the money supply instantaneously.

c. In the above equations, which variables are endogenous and which are exogenous (or predetermined)?

Endogenous: C, i, and Y
Exogenous: I, G, NX, and M2

Equation (4) is not part of the system of equations: all independent variables on the right hand side of equation (4) are exogenous or predetermined, and hence equation (4) may be estimated with OLS.

d. Construct the reduced form regressions equivalent to the above structural regressions. For each structural equation, explain whether it is over-, under-, or exactly-identified.

The reduced form equations I find are:

$$C_t = \frac{B_0 + B_2 B_3}{1 - B_1 - B_2 B_5} + \frac{B_1 + B_2 B_5}{1 - B_1 - B_2 B_5}(I_t + G_t + NX_t) + \frac{B_2 B_4}{1 - B_1 - B_2 B_5} M_t + \frac{B_2 e_{2t} + e_{1t}}{1 - B_1 - B_2 B_5}$$

$$Y_t = \frac{B_0 + B_2 B_3}{1 - B_1 - B_2 B_5} + \frac{1}{1 - B_1 - B_2 B_5}(I_t + G_t + NX_t) + \frac{B_2 B_4}{1 - B_1 - B_2 B_5} M_t + \frac{B_2 e_{2t} + e_{1t}}{1 - B_1 - B_2 B_5}$$

$$i_t = B_3 + B_4 M_t + B_5\left[\frac{B_0 + B_2 B_3}{1 - B_1 - B_2 B_5} + \frac{1}{1 - B_1 - B_2 B_5}(I_t + G_t + NX_t) + \frac{B_2 B_4}{1 - B_1 - B_2 B_5} M_t + \frac{B_2 e_{2t} + e_{1t}}{1 - B_1 - B_2 B_5}\right] + e_{2t}$$

Since there are four exogenous variables, the only equations that would be exactly identified would be those with four slope coefficients (there are none of these). Equations 1 and 3 both have two slope coefficients, so each of these equations is over-identified.

e. Estimate the reduced form equation for RGDP using OLS. Do you find coefficients that you don't expect?

I find:

. reg gdp inv g nx M2

      Source |       SS       df       MS              Number of obs =     208
-------------+------------------------------           F(  4,   203) =38774.81
       Model |  4.1649e+09     4  1.0412e+09           Prob > F      =  0.0000
    Residual |  5451203.43   203  26853.2189           R-squared     =  0.9987
-------------+------------------------------           Adj R-squared =  0.9987
       Total |  4.1704e+09   207  20146689.6           Root MSE      =  163.87

------------------------------------------------------------------------------
         gdp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         inv |   2.249857   .0571774    39.35   0.000     2.137119    2.362595
           g |   2.837605   .1760108    16.12   0.000     2.490561    3.184649
          nx |   .7705119    .151579     5.08   0.000     .4716406    1.069383
          M2 |   .2688493   .0810355     3.32   0.001     .1090701    .4286285
       _cons |  -201.1313   27.46067    -7.32   0.000     -255.276   -146.9866
------------------------------------------------------------------------------

All slope coefficients are positive, which is what I would expect. More of each of these things would theoretically increase output.

f. Using your estimates from part e, perform a 2SLS regression on equation #1. Compare these estimates with those you made in part a. Have you eliminated all endogeneity in the consumption function? If not, what should you do?

. predict gdphat, xb
. reg pce gdphat i

      Source |       SS       df       MS              Number of obs =     205
-------------+------------------------------           F(  2,   202) =58934.03
       Model |  1.9330e+09     2   966495262           Prob > F      =  0.0000
    Residual |  3312721.52   202  16399.6115           R-squared     =  0.9983
-------------+------------------------------           Adj R-squared =  0.9983
       Total |  1.9363e+09   204  9491682.58           Root MSE      =  128.06

------------------------------------------------------------------------------
         pce |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gdphat |   .6904732   .0022823   302.53   0.000      .685973    .6949735
           i |  -46.57597   3.247347   -14.34   0.000    -52.97901   -40.17292
       _cons |   194.1473   27.58908     7.04   0.000     139.7478    248.5468
------------------------------------------------------------------------------

The estimate of the MPC (.690) is not substantially different from what was estimated using OLS (.698). In theory, this has not eliminated the endogeneity in the consumption function, for two primary reasons. First, as explained in part a, i is likely endogenous as well. Second, the instruments used (inv, g, nx, M2) are certainly not exogenous. In reality, all of these variables are likely endogenous with real GDP.

3. Suppose you want to test whether girls who attend a girls' high school do better in math than girls who attend coed schools. You have a random sample of senior high school girls and measure the variable score, an outcome of a mathematics standardized test.
Let girlhs be a dummy variable indicating whether a student attends a girls' high school. Consider the regression

$Score_i = B_0 + B_1 Girlhs_i + \varepsilon_i$.

a. Suppose that parental support and motivation are unmeasured factors in ε. How does this fact impact estimates of B1?

The omission of parental support (p) from this regression likely biases the estimate of B1. If parents with high levels of support are more likely to send their children to girls' schools, and if parents with high support raise students who score well on an exam, then the estimate of B1 will be biased upwards. In other words, if Cov(p, Girlhs) > 0 and Cov(p, Score) > 0, then the OLS estimate of B1 will be higher than its true value.

b. Consider the variable Numgirl, where Numgirl is the number of girls' high schools within a 20 mile radius of the observation's home. Under what conditions could Numgirl be used as a valid IV for Girlhs?

Numgirl is an appropriate instrument in this case if Numgirl is correlated with Girlhs and if Numgirl is not correlated with ε (since ε is a function of parental support, what we are really saying is that Numgirl must be uncorrelated with parental support). It seems likely that Numgirl and Girlhs are correlated (the more local girls' schools, the more likely a girl attends one). It is more questionable whether Numgirl is uncorrelated with the error term. For instance, parents interested in their children may purposefully move to an area with many schooling options.

4. Test for autocorrelation in the data you will use for your final project.
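
One common first check for question 4 is the Durbin-Watson statistic computed from a regression's residuals. The sketch below is only illustrative: `durbin_watson` and `ar1_series` are hypothetical helpers, and the residual series are simulated rather than taken from any final-project data. Values near 2 suggest no first-order autocorrelation, while values well below 2 suggest positive autocorrelation.

```python
import random


def durbin_watson(residuals):
    """Sum of squared first differences divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def ar1_series(n, rho, seed):
    """Simulated AR(1) 'residuals': e_t = rho * e_{t-1} + u_t (assumed process)."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        e.append(rho * e[-1] + rng.gauss(0.0, 1.0))
    return e


# rho = 0 gives serially uncorrelated noise; rho = 0.8 gives strong persistence.
dw_white = durbin_watson(ar1_series(2000, 0.0, seed=1))
dw_persistent = durbin_watson(ar1_series(2000, 0.8, seed=1))

print(f"DW, no autocorrelation:     {dw_white:.2f}")
print(f"DW, positive autocorrelation: {dw_persistent:.2f}")
```

With real data, the residuals would come from the estimated regression, ordered by time, and the statistic would be compared against tabulated Durbin-Watson bounds for the sample size and number of regressors.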