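10.2 Suppose you have T = 2 years of data on the same group of N working individuals. Consider the following model of wage determination:

$$\log(wage_{it}) = \theta_1 + \theta_2 d2_t + z_{it}\gamma + \delta_1 female_i + \delta_2 d2_t \cdot female_i + c_i + u_{it}$$

The unobserved effect $c_i$ is allowed to be correlated with $z_{it}$ and $female_i$. The variable $d2_t$ is a time period indicator, where $d2_t = 1$ if t = 2 and $d2_t = 0$ if t = 1. In what follows, assume that $E(u_{it} \mid female_i, z_{i1}, z_{i2}, c_i) = 0$ for t = 1, 2.

(a) Without further assumptions, what parameters in the log wage equation can be consistently estimated?

Given the assumptions, we can estimate the parameters using Fixed Effects (FE), since we allow $c_i$ to be correlated with $z_{it}$ and $female_i$. First, we eliminate the unobserved effect by differencing across the two time periods. The resulting equation is still a standard linear model, except that it is stated in terms of the differences of all the variables in the original equation: $\Delta\log(wage_i) = \theta_2 + \Delta z_i\gamma + \delta_2 female_i + \Delta u_i$. Only $\theta_2$, $\gamma$ and $\delta_2$ appear in this equation; $\theta_1$ and $\delta_1$ drop out together with $c_i$ and cannot be estimated. FE.1 (strict exogeneity conditional on $c_i$) is satisfied as given, i.e. $E(u_{it} \mid female_i, z_{i1}, z_{i2}, c_i) = 0$. Without FE.2, the rank condition on the differenced regressors, we cannot conclude whether the FE estimator is consistent. If the rank condition holds, then $\theta_2$, $\gamma$ and $\delta_2$ can be consistently estimated.

(b) Interpret the coefficients $\theta_2$ and $\delta_2$.

$\theta_2$ is the growth in wage for males (because $female_i = 0$), ceteris paribus. $\delta_2$ is the difference in wage growth between females and males.

(c) Write the log wage equation explicitly for the two time periods.

For t = 1,

$$\log(wage_{i1}) = \theta_1 + z_{i1}\gamma + \delta_1 female_i + c_i + u_{i1}$$

For t = 2,

$$\log(wage_{i2}) = \theta_1 + \theta_2 + z_{i2}\gamma + \delta_1 female_i + \delta_2 female_i + c_i + u_{i2}$$

Subtracting the first equation from the second gives the differenced equation:

$$\Delta\log(wage_i) = \theta_2 + \Delta z_i\gamma + \delta_2 female_i + \Delta u_i$$

where $\Delta\log(wage_i) = \log(wage_{i2}) - \log(wage_{i1})$, and similarly for $\Delta z_i$ and $\Delta u_i$.

10.4

a) Explain why including $d2_t$ is important in these contexts. In particular, what problems might be caused by leaving it out?

It is important to include $d2_t$ because it captures any economic or other aggregate shocks that occurred at t = 2, when the program was initiated. In other words, $d2_t$ represents time-specific factors that affect the dependent variable for everyone. Leaving out $d2_t$ would attribute these aggregate changes to the program, so the estimated program effect would be biased upward or downward.

b) Why is it important to include $c_i$ in the equation?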
It is important to include $c_i$ in the equation because program participation may be correlated with time-constant unobserved characteristics, and first differencing allows $c_i$ and the regressors to be correlated. If $c_i$ is not included in the equation (that is, if it is left in the error term), an omitted variable problem arises.

c) Using the first differencing method, show that $\hat\theta_2 = \Delta\bar{y}_{control}$ and $\hat\delta_1 = \Delta\bar{y}_{treat} - \Delta\bar{y}_{control}$, where $\Delta\bar{y}_{control}$ is the average change in y over the two periods for the group with $prog_{i2} = 0$, and $\Delta\bar{y}_{treat}$ is the average change in y over the two periods for the group with $prog_{i2} = 1$. This formula shows that $\hat\delta_1$, the difference-in-differences estimator, arises out of an unobserved effects model.

Writing out the equations for the two time periods, and taking $d2_2 = 1$ and $d2_1 = 0$, we have

$$y_{i2} = \theta_1 + \theta_2 + \delta_1 prog_{i2} + c_i + u_{i2}$$
$$y_{i1} = \theta_1 + \delta_1 prog_{i1} + c_i + u_{i1}$$

Taking the difference,

$$\Delta y_i = \theta_2 + \delta_1 \Delta prog_i + \Delta u_i = \theta_2 + \delta_1 prog_{i2} + \Delta u_i$$

where $\Delta prog_i = prog_{i2}$ because nobody participates in the first period, so $prog_{i2} = 1$ if i is in the treatment group and $prog_{i2} = 0$ if i is in the control group.

Estimating this equation by OLS yields the fitted values

$$\widehat{\Delta y}_i = \hat\theta_2 + \hat\delta_1 prog_{i2}$$

Because the only regressor is a dummy variable, the fitted value for each group equals that group's sample average of $\Delta y_i$. For all observations in the control group, $prog_{i2} = 0$, so $\Delta\bar{y}_{control} = \hat\theta_2$. For all observations in the treatment group, $prog_{i2} = 1$, so $\Delta\bar{y}_{treat} = \hat\theta_2 + \hat\delta_1$. Subtracting the control group equation from the treatment group equation gives $\Delta\bar{y}_{treat} - \Delta\bar{y}_{control} = \hat\delta_1$.

d) Write down the extension of the model for T time periods.

With more than two time periods, the more general form of the model includes all time dummies except the one for the base year. This way, the first-differenced equations for all periods have an intercept. The structural equation becomes

$$y_{it} = \theta_1 + d_t\theta + \delta_1 prog_{it} + c_i + u_{it}$$

where $d_t = (d2_t, d3_t, \ldots, dT_t)$ is a 1 x (T-1) vector of time dummies and $\theta = (\theta_2, \theta_3, \ldots, \theta_T)'$ is a (T-1) x 1 vector of parameters.
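The result in part c) can be checked numerically. The sketch below is only an illustration on simulated data: the sample size, the parameter values and the way $c_i$ is generated are assumptions, not part of the problem. It regresses $\Delta y_i$ on a constant and the program dummy and compares the coefficients with the group averages of $\Delta y_i$.

```python
import random

random.seed(0)
n = 400
theta1, theta2, delta1 = 2.0, 0.5, 1.5   # hypothetical true values

treat, dy = [], []
for i in range(n):
    prog2 = i % 2                                  # alternate control / treatment
    c = 1.0 + 0.8 * prog2 + random.gauss(0, 1)     # c_i correlated with prog_i2
    y1 = theta1 + c + random.gauss(0, 1)           # prog_i1 = 0 for everyone
    y2 = theta1 + theta2 + delta1 * prog2 + c + random.gauss(0, 1)
    treat.append(prog2)
    dy.append(y2 - y1)                             # differencing removes c_i

def mean(v):
    return sum(v) / len(v)

# OLS of dy on a constant and the program dummy (simple-regression formulas)
p_bar, dy_bar = mean(treat), mean(dy)
sxy = sum((p - p_bar) * (d - dy_bar) for p, d in zip(treat, dy))
sxx = sum((p - p_bar) ** 2 for p in treat)
delta1_hat = sxy / sxx
theta2_hat = dy_bar - delta1_hat * p_bar

# Group averages of the change in y
dy_control = mean([d for p, d in zip(treat, dy) if p == 0])
dy_treat = mean([d for p, d in zip(treat, dy) if p == 1])

print("theta2_hat:", theta2_hat, " control mean change:", dy_control)
print("delta1_hat:", delta1_hat, " DID:", dy_treat - dy_control)
```

The two pairs of numbers agree up to floating-point error, whatever the simulated values are, because OLS on a constant and a single dummy reproduces the group means.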
For periods t and t-1, we have

$$y_{it} = \theta_1 + \theta_t + \delta_1 prog_{it} + c_i + u_{it}$$
$$y_{i,t-1} = \theta_1 + \theta_{t-1} + \delta_1 prog_{i,t-1} + c_i + u_{i,t-1}$$

First differencing all periods gives us

$$\Delta y_{it} = \gamma_t + \delta_1 \Delta prog_{it} + \Delta u_{it}$$

where $\gamma_t = \theta_t - \theta_{t-1}$ serves as the intercept of the equation for time period t.

e) Which approach do you prefer, the DID estimator in part d) or the pooled OLS estimator using a pooled DID approach?

If $c_i$ is correlated with the regressors, omitting $c_i$ causes inconsistency. Since pooled OLS leaves $c_i$ in the error term, the pooled OLS estimates suffer from omitted variable bias. Furthermore, if the idiosyncratic errors are serially correlated, the usual pooled OLS standard errors are not valid. Thus, the first-differenced DID estimator is preferred: differencing removes the time-constant variable $c_i$, and it performs well when the idiosyncratic errors are strongly serially correlated.

10.8

(a) Use pooled OLS

. use http://fmwww.bc.edu/ec-p/data/wooldridge2k/NORWAY
. reg lcrime d78 clrprc1 clrprc2

      Source |       SS       df       MS              Number of obs =     106
-------------+------------------------------           F(  3,   102) =   30.27
       Model |  18.7948264     3  6.26494214           Prob > F      =  0.0000
    Residual |  21.1114968   102  .206975459           R-squared     =  0.4710
-------------+------------------------------           Adj R-squared =  0.4554
       Total |  39.9063233   105  .380060222           Root MSE      =  .45495

------------------------------------------------------------------------------
      lcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d78 |  -.0547246   .0944947    -0.58   0.564    -.2421544    .1327051
     clrprc1 |  -.0184955   .0053035    -3.49   0.001    -.0290149    -.007976
     clrprc2 |  -.0173881   .0054376    -3.20   0.002    -.0281735   -.0066026
       _cons |    4.18122   .1878879    22.25   0.000     3.808545    4.553894
------------------------------------------------------------------------------

. reg lcrime d78 clrprc1 clrprc2, robust

Linear regression                                      Number of obs =     106
                                                       F(  3,   102) =   24.01
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4710
                                                       Root MSE      =  .45495

------------------------------------------------------------------------------
             |               Robust
      lcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d78 |  -.0547246   .0883541    -0.62   0.537    -.2299747    .1205254
     clrprc1 |  -.0184955   .0047622    -3.88   0.000    -.0279413   -.0090497
     clrprc2 |  -.0173881   .0045592    -3.81   0.000    -.0264311    -.008345
       _cons |    4.18122   .1934741    21.61   0.000     3.797465    4.564975
------------------------------------------------------------------------------

The time dummy is not significantly different from zero, but the lagged clear-up percentages are significantly different from zero and have negative coefficients. Specifically, a 10 percentage point increase in the clear-up percentage one year ago leads to an estimated 18.5% drop in the crime rate this year. A 10 percentage point increase in the clear-up percentage two years ago leads to an estimated drop of 17.39% in the crime rate this year. This implies that the first and second lags of the clear-up percentage deter present crime. The two variables (clrprc1 and clrprc2) are statistically significant at the 5% significance level, and the magnitudes are also economically meaningful.

IS THERE SERIAL CORRELATION?

. predict uHat
(option xb assumed; fitted values)
. reg uHat L.uHat

      Source |       SS       df       MS              Number of obs =      53
-------------+------------------------------           F(  1,    51) =   40.19
       Model |  3.17344694     1  3.17344694           Prob > F      =  0.0000
    Residual |  4.02695499    51  .078959902           R-squared     =  0.4407
-------------+------------------------------           Adj R-squared =  0.4298
       Total |  7.20040193    52  .138469268           Root MSE      =    .281

------------------------------------------------------------------------------
        uHat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        uHat |
         L1. |   .5592138   .0882095     6.34   0.000     .3821257    .7363018
       _cons |   1.365023   .2296778     5.94   0.000     .9039254     1.82612
------------------------------------------------------------------------------

Taken at face value, the coefficient on the lag (0.559, t = 6.34) leads us to reject the null hypothesis that there is no serial correlation. Note, however, that predict without the residuals option stores fitted values, as the message "(option xb assumed; fitted values)" indicates, so uHat here holds fitted values rather than residuals; a test based on residuals would use predict uHat, residuals. Serial correlation in the errors of a panel data model can be a result of an omitted time-constant factor. If that factor is correlated with the regressors, we cannot just use POLS, because it will lead to inconsistent estimates.

(b) Estimate the equation by fixed effects and compare with POLS.

. xtreg lcrime d78 clrprc1 clrprc2, fe

Fixed-effects (within) regression               Number of obs      =       106
Group variable: district                        Number of groups   =        53

R-sq:  within  = 0.4209                         Obs per group: min =         2
       between = 0.4798                                        avg =       2.0
       overall = 0.4234                                        max =         2

                                                F(3,50)            =     12.12
corr(u_i, Xb)  = 0.3645                         Prob > F           =    0.0000

------------------------------------------------------------------------------
      lcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d78 |   .0856556   .0637825     1.34   0.185    -.0424553    .2137665
     clrprc1 |  -.0040475   .0047199    -0.86   0.395    -.0135276    .0054326
     clrprc2 |  -.0131966   .0051946    -2.54   0.014    -.0236302   -.0027629
       _cons |   3.350995   .2324736    14.41   0.000     2.884058    3.817932
-------------+----------------------------------------------------------------
     sigma_u |  .47140473
     sigma_e |   .2436645
         rho |  .78915666   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(52, 50) =     5.88              Prob > F = 0.0000

. xtreg lcrime d78 clrprc1 clrprc2, fe robust

Fixed-effects (within) regression               Number of obs      =       106
Group variable: district                        Number of groups   =        53

R-sq:  within  = 0.4209                         Obs per group: min =         2
       between = 0.4798                                        avg =       2.0
       overall = 0.4234                                        max =         2

                                                F(3,50)            =      8.84
corr(u_i, Xb)  = 0.3645                         Prob > F           =    0.0001

                              (Std. Err. adjusted for clustering on district)
------------------------------------------------------------------------------
             |               Robust
      lcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d78 |   .0856556   .0554876     1.54   0.129    -.0257945    .1971057
     clrprc1 |  -.0040475   .0042659    -0.95   0.347    -.0126158    .0045207
     clrprc2 |  -.0131966   .0047286    -2.79   0.007    -.0226942    -.003699
       _cons |   3.350995   .2622724    12.78   0.000     2.824205    3.877785
-------------+----------------------------------------------------------------
     sigma_u |  .47140473
     sigma_e |   .2436645
         rho |  .78915666   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Using fixed effects, clrprc1 becomes statistically insignificant (p = 0.395, or 0.347 with robust standard errors), while clrprc2 remains significant at the 5% level (p = 0.014, or 0.007 with robust standard errors). The signs are still the same, but the effects of both lags decrease in magnitude under Fixed Effects: the coefficient on clrprc1 falls from -.0185 to -.0040 and the coefficient on clrprc2 from -.0174 to -.0132.
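The within transformation that fixed effects estimation relies on can be sketched on simulated data. This is a minimal illustration, not a reproduction of the NORWAY results: the data-generating process, the single regressor and all parameter values are assumptions. With T = 2 and no time dummy, it shows that the within (FE) slope coincides with the first-difference slope, while pooled OLS is pulled away from the true value when $c_i$ is correlated with the regressor.

```python
import random

random.seed(1)
n, beta = 1000, -0.5          # hypothetical number of units and true slope

x, y = [], []                 # x[i] = [x_i1, x_i2], y[i] = [y_i1, y_i2]
for _ in range(n):
    c = random.gauss(0, 1)                              # unobserved effect c_i
    xi = [c + random.gauss(0, 1) for _ in range(2)]     # regressor correlated with c_i
    yi = [beta * v + c + random.gauss(0, 1) for v in xi]
    x.append(xi)
    y.append(yi)

def slope(xs, ys):
    """OLS slope through the origin."""
    return sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)

def demean(v):
    m = sum(v) / len(v)
    return [e - m for e in v]

# Pooled OLS with an intercept: demean over the whole sample.
flat_x = [v for xi in x for v in xi]
flat_y = [v for yi in y for v in yi]
b_pols = slope(demean(flat_x), demean(flat_y))

# Fixed effects (within): demean within each unit, then pool.
within_x = [v for xi in x for v in demean(xi)]
within_y = [v for yi in y for v in demean(yi)]
b_fe = slope(within_x, within_y)

# First differences (no intercept, because this model has no time dummy).
b_fd = slope([xi[1] - xi[0] for xi in x], [yi[1] - yi[0] for yi in y])

print(f"pooled OLS: {b_pols:.3f}  FE: {b_fe:.3f}  FD: {b_fd:.3f}")
```

In this design the FE and FD slopes are numerically identical and close to the assumed true slope, whereas the pooled OLS slope absorbs part of the effect of $c_i$.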
Even if we have taken out the time-constant variable using Fixed Effects estimation, we still need to test for serial correlation of the idiosyncratic errors, to make sure that inference based on our FE estimates is valid.

. predict uHat
(option xb assumed; fitted values)
. reg uHat L.uHat

      Source |       SS       df       MS              Number of obs =      53
-------------+------------------------------           F(  1,    51) =   39.91
       Model |  .759584583     1  .759584583           Prob > F      =  0.0000
    Residual |  .970555983    51  .019030509           R-squared     =  0.4390
-------------+------------------------------           Adj R-squared =  0.4280
       Total |  1.73014057    52  .033271934           Root MSE      =  .13795

------------------------------------------------------------------------------
        uHat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        uHat |
         L1. |   .5644182   .0893384     6.32   0.000     .3850639    .7437725
       _cons |   1.351664   .2300904     5.87   0.000     .8897386     1.81359
------------------------------------------------------------------------------

As the message "(option xb assumed; fitted values)" indicates, uHat again holds fitted values rather than residuals, so this regression should be read with caution; after xtreg, fe the idiosyncratic residuals are obtained with predict uHat, e.

(c)

. test clrprc1=clrprc2

 ( 1)  clrprc1 - clrprc2 = 0

       F(  1,    50) =    1.82
            Prob > F =    0.1828

We fail to reject the null hypothesis, $H_0: \beta_1 = \beta_2$. This implies that the effects of the 1st and 2nd lags of the clear-up percentage on the crime rate are not statistically different. Having failed to reject the null hypothesis, a more parsimonious model would be:

$$\ddot{lcrime}_{it} = \theta_1 + \beta_1(\ddot{clrprc1}_{it} + \ddot{clrprc2}_{it}) + \ddot{u}_{it}$$

where:

$$\ddot{lcrime}_{it} = lcrime_{it} - \overline{lcrime}_i$$
$$\ddot{clrprc1}_{it} = clrprc1_{it} - \overline{clrprc1}_i$$
$$\ddot{clrprc2}_{it} = clrprc2_{it} - \overline{clrprc2}_i$$
$$\ddot{u}_{it} = u_{it} - \bar{u}_i$$

. egen avlcrime = mean(lcrime), by(district)
. gen lcrimeDem = lcrime - avlcrime
. egen avcl1 = mean(clrprc1), by(district)
. egen avcl2 = mean(clrprc2), by(district)
. gen cl1Dem = clrprc1 - avcl1
. gen cl2Dem = clrprc2 - avcl2
. gen sumCL = cl1Dem + cl2Dem
. reg lcrimeDem sumCL

      Source |       SS       df       MS              Number of obs =     106
-------------+------------------------------           F(  1,   104) =   63.29
       Model |  1.93960511     1  1.93960511           Prob > F      =  0.0000
    Residual |  3.18702681   104  .030644489           R-squared     =  0.3783
-------------+------------------------------           Adj R-squared =  0.3724
       Total |  5.12663192   105  .048825066           Root MSE      =  .17506

------------------------------------------------------------------------------
   lcrimeDem |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       sumCL |   -.010951   .0013765    -7.96   0.000    -.0136807   -.0082214
       _cons |  -2.81e-09   .0170029    -0.00   1.000    -.0337174    .0337174
------------------------------------------------------------------------------

Comparing the R-squared of this new model (0.3783) with the within R-squared of the original FE model (0.4209), we see that the R-squared of the original FE model is higher. This can be explained by the fact that the original FE model uses more explanatory variables: it includes d78 and enters the two lags separately.

10.13

To see whether minimizing the weighted sum of squared residuals can be used as a procedure for estimating $\beta$, we need to show that the resulting estimator of $\beta$ is consistent and asymptotically efficient. Let the minimization problem be:

$$\min \sum_{i=1}^{N}\sum_{t=1}^{T} (y_{it} - a_1 d1_i - a_2 d2_i - \ldots - a_N dN_i - x_{it}b)^2 / h_{it}$$

where $dn_i = 1$ if $i = n$ and 0 otherwise. Note that this is similar to the minimization problem of weighted least squares. It reduces to:

$$\min \sum_{i=1}^{N}\sum_{t=1}^{T} (y_{it} - a_i - x_{it}b)^2 / h_{it} \qquad (a)$$

Then, the first order condition with respect to $a_i$ is

$$\sum_{t=1}^{T} (y_{it} - a_i - x_{it}b) / h_{it} = 0 .$$

Solving for $a_i$, we get:

$$\hat{a}_i = \left(\sum_{t=1}^{T}\frac{1}{h_{it}}\right)^{-1}\left(\sum_{t=1}^{T}\frac{y_{it}}{h_{it}} - \sum_{t=1}^{T}\frac{x_{it}}{h_{it}}\,b\right)$$

Let $w_i = 1 \big/ \sum_{t=1}^{T}(1/h_{it})$, $\bar{y}_{iw} = w_i\sum_{t=1}^{T} y_{it}/h_{it}$ and $\bar{x}_{iw} = w_i\sum_{t=1}^{T} x_{it}/h_{it}$. Then $\hat{a}_i = \bar{y}_{iw} - \bar{x}_{iw}b$. Plugging this into (a), we have:

$$\min_b \sum_{i=1}^{N}\sum_{t=1}^{T} \left[(y_{it} - \bar{y}_{iw}) - (x_{it} - \bar{x}_{iw})b\right]^2 / h_{it} .$$
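Let $\tilde{y}_{it} = (y_{it} - \bar{y}_{iw})/\sqrt{h_{it}}$ and $\tilde{x}_{it} = (x_{it} - \bar{x}_{iw})/\sqrt{h_{it}}$. Then, we have:

$$\min_b \sum_{i=1}^{N}\sum_{t=1}^{T} (\tilde{y}_{it} - \tilde{x}_{it}b)^2 .$$

Solving this minimization problem gives the pooled OLS estimator of $\beta$ on the transformed data:

$$\hat\beta = \left(\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{x}_{it}'\tilde{x}_{it}\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{x}_{it}'\tilde{y}_{it}\right) \qquad (b)$$

We check the properties of $\hat\beta$.

(i) Consistency

Let $\bar{u}_{iw} = w_i\sum_{t=1}^{T} u_{it}/h_{it}$, so that $\bar{y}_{iw} = \bar{x}_{iw}\beta + c_i + \bar{u}_{iw}$. Subtracting this from $y_{it} = x_{it}\beta + c_i + u_{it}$ for all t and dividing by $\sqrt{h_{it}}$ gives:

$$\tilde{y}_{it} = \tilde{x}_{it}\beta + \tilde{u}_{it}, \qquad \tilde{u}_{it} = (u_{it} - \bar{u}_{iw})/\sqrt{h_{it}}$$

Plugging into (b) and dividing by N we have:

$$\hat\beta = \beta + \left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{x}_{it}'\tilde{x}_{it}\right)^{-1}\left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{x}_{it}'\tilde{u}_{it}\right)$$

Since $\sum_{t=1}^{T}\tilde{x}_{it}/\sqrt{h_{it}} = 0$ by the definition of $\bar{x}_{iw}$, we have $\sum_{t=1}^{T}\tilde{x}_{it}'\tilde{u}_{it} = \sum_{t=1}^{T}\tilde{x}_{it}'u_{it}/\sqrt{h_{it}}$, and therefore

$$\hat\beta = \beta + \left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{x}_{it}'\tilde{x}_{it}\right)^{-1}\left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{x}_{it}'u_{it}/\sqrt{h_{it}}\right) \qquad (c)$$

The assumption $E(u_{it} \mid x_i, h_i, c_i) = 0$ implies $E(\tilde{x}_{it}'u_{it}/\sqrt{h_{it}}) = 0$, because $\tilde{x}_{it}$ and $h_{it}$ are functions of $(x_i, h_i)$. Hence, $\text{plim}(\hat\beta) = \beta$: $\hat\beta$ is consistent.

(ii) Asymptotic Efficiency

Let $A = \sum_{t=1}^{T} E(\tilde{x}_{it}'\tilde{x}_{it})$ and $B = Var\left(\sum_{t=1}^{T}\tilde{x}_{it}'u_{it}/\sqrt{h_{it}}\right)$. The asymptotic variance is

$$Avar\,\sqrt{N}(\hat\beta - \beta) = A^{-1}BA^{-1}$$

Assume that

$$Cov(u_{it}, u_{is} \mid x_i, h_i, c_i) = 0 \text{ for } t \neq s, \qquad Var(u_{it} \mid x_i, h_i, c_i) = \sigma_u^2 h_{it}$$

Then we can show $B = \sigma_u^2 A$, hence $Avar\,\sqrt{N}(\hat\beta - \beta) = \sigma_u^2 A^{-1}$.

By the law of iterated expectations,

$$\sum_{t=1}^{T} E(\tilde{u}_{it}^2) = \sigma_u^2\left\{T - E\left[w_i\sum_{t=1}^{T}(1/h_{it})\right]\right\} = \sigma_u^2(T - 1)$$

so $\sigma_u^2$ can be estimated from the sum of squared residuals of the transformed regression divided by N(T - 1) - K, where K is the number of elements of $\beta$. Hence

$$\widehat{Avar}(\hat\beta) = \hat\sigma_u^2\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{x}_{it}'\tilde{x}_{it}\right)^{-1}$$

Thus, it is possible to use the method of weighted least squares, or more specifically "fixed effects weighted least squares", to estimate the parameters of a fixed effects model. The resulting estimators are consistent and asymptotically efficient under the stated assumptions.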
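The algebra above can be checked numerically. The following sketch uses simulated data with a single regressor; the dimensions, the distribution of $h_{it}$ and the parameter values are all assumptions made for illustration. It computes $\hat\beta$ from the weighted-demeaned data and then verifies that $\hat\beta$, together with $\hat{a}_i = \bar{y}_{iw} - \bar{x}_{iw}\hat\beta$, satisfies the first-order conditions of the original dummy-variable weighted least squares problem.

```python
import random

random.seed(2)
n, T, beta = 300, 3, 1.0      # hypothetical dimensions and true slope

panel = []                    # one list of (y_it, x_it, h_it) tuples per unit
for _ in range(n):
    c = random.gauss(0, 1)
    rows = []
    for _ in range(T):
        h = random.uniform(0.5, 2.0)             # known variance weight h_it
        xv = c + random.gauss(0, 1)              # regressor correlated with c_i
        u = (h ** 0.5) * random.gauss(0, 1)      # Var(u_it | h_it) = h_it
        rows.append((beta * xv + c + u, xv, h))
    panel.append(rows)

sxy = sxx = 0.0
wmeans = []
for rows in panel:
    w = 1.0 / sum(1.0 / h for _, _, h in rows)           # w_i
    yw = w * sum(yv / h for yv, _, h in rows)            # weighted mean of y
    xw = w * sum(xv / h for _, xv, h in rows)            # weighted mean of x
    wmeans.append((yw, xw))
    for yv, xv, h in rows:
        xt = (xv - xw) / h ** 0.5                        # x-tilde
        yt = (yv - yw) / h ** 0.5                        # y-tilde
        sxy += xt * yt
        sxx += xt * xt
b_hat = sxy / sxx                                        # pooled OLS on transformed data

# First-order conditions of the original dummy-variable problem.
foc_a = []       # one condition per intercept a_i
foc_b = 0.0      # condition for the slope b
for rows, (yw, xw) in zip(panel, wmeans):
    a_hat = yw - xw * b_hat
    resid = [(yv - a_hat - xv * b_hat, xv, h) for yv, xv, h in rows]
    foc_a.append(sum(e / h for e, _, h in resid))
    foc_b += sum(xv * e / h for e, xv, h in resid)

max_foc_a = max(abs(v) for v in foc_a)
print(f"b_hat = {b_hat:.3f}, max |FOC a_i| = {max_foc_a:.1e}, FOC b = {foc_b:.1e}")
```

Both sets of first-order conditions are zero up to rounding error, which confirms that pooled OLS on the transformed variables solves the original problem.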
10.14

We have the unobserved effects model:

$$y_{it} = \alpha + x_{it}\beta + z_i\gamma + h_i + u_{it}, \qquad t = 1, \ldots, T$$
$$E(u_{it} \mid x_i, z_i, h_i) = 0, \qquad E(h_i \mid x_i, z_i) = 0$$

Let $\sigma_h^2 = Var(h_i)$ and $\sigma_u^2 = Var(u_{it})$. If we estimate $\beta$ with fixed effects, we are estimating the equation:

$$y_{it} = x_{it}\beta + c_i + u_{it}$$

where $c_i = \alpha + z_i\gamma + h_i$.

(a)

$$Var(c_i) = \sigma_c^2 = Var(\alpha + z_i\gamma + h_i) = \gamma' Var(z_i)\gamma + 2\,Cov(z_i\gamma, h_i) + Var(h_i)$$

We note that $E(h_i \mid x_i, z_i) = 0$. As such, $E(z_i' h_i) = 0$ and the covariance term is zero. Therefore

$$\sigma_c^2 = \gamma' Var(z_i)\gamma + \sigma_h^2 \geq \sigma_h^2$$

We see that $\sigma_c^2 = \sigma_h^2$ if $\gamma' Var(z_i)\gamma = 0$. However, if $\gamma' Var(z_i)\gamma > 0$, then $\sigma_c^2$ is strictly larger than $\sigma_h^2$.

(b)

Unlike with random effects, we cannot include time-constant variables (even if they are observed) when estimating the model by fixed effects, since time-constant variables are eliminated from the time-demeaned equation. With fixed effects, the time-constant variables, both observed and unobserved, are lumped into $c_i$. Hence, as the result above shows, the variance of the unobserved effect in the fixed effects estimation has two components: the variance of the time-constant factors that could be observed, $\gamma' Var(z_i)\gamma$, and the variance of the "true unobserved" effect, $\sigma_h^2$.

With the inclusion of RE.1b, it is possible to include time-constant variables when estimating the model by random effects. Since we can control for the time-constant variables along with the time-varying variables, the variability of the unobserved effect is only due to the unobserved time-constant variable $h_i$.

The result makes intuitive sense, since random effects estimation lets us control for more variables, in the sense that we can include time-constant variables in the estimation. As such, we can expect that fixed effects will lead to a larger estimated variance of the unobserved effect than random effects.
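The variance decomposition in part (a) can be illustrated with a small simulation. Everything in the sketch below is an assumption chosen for illustration: two time-constant variables, their correlation, the values of $\gamma$ and the standard deviation of $h_i$. It compares the sample variance of $c_i = \alpha + z_i\gamma + h_i$ with $\gamma' Var(z_i)\gamma + \sigma_h^2$.

```python
import random

random.seed(3)
n = 20000
alpha, g1, g2, sigma_h = 1.0, 0.5, -1.0, 0.7   # hypothetical parameter values

z1, z2, h, c = [], [], [], []
for _ in range(n):
    a = random.gauss(0, 1)
    b = 0.5 * a + random.gauss(0, 1)    # z_i2 is correlated with z_i1
    e = random.gauss(0, sigma_h)        # h_i is drawn independently of z_i
    z1.append(a)
    z2.append(b)
    h.append(e)
    c.append(alpha + g1 * a + g2 * b + e)

def cov(u, v):
    """Sample covariance (equals the sample variance when u is v)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

var_c = cov(c, c)
var_h = cov(h, h)
# gamma' Var(z) gamma written out for two variables
quad = g1 * g1 * cov(z1, z1) + 2 * g1 * g2 * cov(z1, z2) + g2 * g2 * cov(z2, z2)

print(f"Var(c) = {var_c:.3f}")
print(f"gamma'Var(z)gamma + Var(h) = {quad + var_h:.3f}")
print(f"Var(h) = {var_h:.3f}")
```

In the simulated sample the two sides differ only by the sampling error of the covariance between $z_i\gamma$ and $h_i$, and the variance of $c_i$ exceeds the variance of $h_i$, as the derivation predicts.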