10.2
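Suppose you have T = 2 years of data on the same group of N working individuals.
Consider the following model of wage determination:
$$\log(wage_{it}) = \theta_1 + \theta_2 d2_t + z_{it}\gamma + \delta_1 female_i + \delta_2 d2_t \cdot female_i + c_i + u_{it}$$
The unobserved effect $c_i$ is allowed to be correlated with $z_{it}$ and $female_i$. The variable $d2_t$ is a time period indicator, where $d2_t = 1$ if t = 2 and $d2_t = 0$ if t = 1. In what follows, assume that:
$$E(u_{it} \mid female_i, z_{i1}, z_{i2}, c_i) = 0, \quad t = 1, 2.$$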
(a) Without further assumptions, what parameters in the log wage equation can be
consistently estimated?
Given the assumptions, we can estimate the parameters using Fixed Effects (FE), since we
allow $c_i$ to be correlated with $z_{it}$ and $female_i$. First, we eliminate the unobserved
effect by differencing across the two time periods. The resulting equation is still a
standard linear model, except that it is stated in terms of the differences of all the
variables included in the original equation:
$$\Delta\log(wage_i) = \theta_2 + \Delta z_i\gamma + \delta_2 female_i + \Delta u_i.$$
FE.1 is satisfied as given, i.e. $E(u_{it} \mid female_i, z_{i1}, z_{i2}, c_i) = 0$. Without FE.2,
the rank condition on the differenced regressors $(1, \Delta z_i, female_i)$, we cannot conclude
whether the FE estimator is consistent. But if the rank condition holds, then we can
consistently estimate $\theta_2$, $\gamma$ and $\delta_2$. The parameters $\theta_1$ and $\delta_1$ drop out
when we difference, so they cannot be distinguished from $c_i$.
(b) Interpret the coefficients $\theta_2$ and $\delta_2$.
$\theta_2$ is the growth in wage for males, because $female_i = 0$ for them, ceteris paribus.
$\delta_2$ is the difference in the growth of wage between females and males.
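(c) Write the log wage equation explicitly for the two time periods.
For t = 1,
$$\log(wage_{i1}) = \theta_1 + z_{i1}\gamma + \delta_1 female_i + c_i + u_{i1}$$
For t = 2,
$$\log(wage_{i2}) = \theta_1 + \theta_2 + z_{i2}\gamma + \delta_1 female_i + \delta_2 female_i + c_i + u_{i2}$$
The differenced equation is derived by subtracting the t = 1 equation from the t = 2 equation:
$$\Delta\log(wage_i) = \theta_2 + \Delta z_i\gamma + \delta_2 female_i + \Delta u_i$$
Where $\Delta\log(wage_i) = \log(wage_{i2}) - \log(wage_{i1})$, $\Delta z_i = z_{i2} - z_{i1}$ and $\Delta u_i = u_{i2} - u_{i1}$.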
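The differencing step in 10.2 can be checked numerically. The following is only a sketch under stated assumptions: the parameter values, the scalar $z_{it}$ and the way $c_i$ is generated are hypothetical and chosen purely for illustration. It confirms that the difference of the two level equations equals $\theta_2 + \Delta z_i\gamma + \delta_2 female_i + \Delta u_i$ for every individual, even though $c_i$ is correlated with both regressors.

```python
import random

random.seed(0)

# Hypothetical parameter values, used only for illustration.
theta1, theta2, gamma, delta1, delta2 = 1.0, 0.05, 0.3, -0.2, -0.02


def log_wage(t, z, female, c, u):
    """Level equation for period t (1 or 2) with a scalar z_it."""
    d2 = 1 if t == 2 else 0
    return (theta1 + theta2 * d2 + z * gamma
            + delta1 * female + delta2 * d2 * female + c + u)


max_gap = 0.0
for _ in range(1000):
    female = random.randint(0, 1)
    c = random.gauss(0, 1) + 0.5 * female          # c_i correlated with female_i
    z1, z2 = random.gauss(c, 1), random.gauss(c, 1)  # z_it correlated with c_i
    u1, u2 = random.gauss(0, 0.1), random.gauss(0, 0.1)

    # Left side: difference of the two level equations.
    lhs = log_wage(2, z2, female, c, u2) - log_wage(1, z1, female, c, u1)
    # Right side: the differenced equation, which contains neither c_i,
    # theta1 nor delta1.
    rhs = theta2 + (z2 - z1) * gamma + delta2 * female + (u2 - u1)
    max_gap = max(max_gap, abs(lhs - rhs))
```

Here `max_gap` is zero up to floating-point rounding, which is the numerical counterpart of $c_i$, $\theta_1$ and $\delta_1$ dropping out of the differenced equation.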
10.4
a) Explain why including $d2_t$ is important in these contexts. In particular, what
problems might be caused by leaving it out?
It is important to include $d2_t$ because it captures any economic or other aggregate
shocks that occur at t = 2, when the program is initiated. In other words, $d2_t$
represents time-specific factors that affect the dependent variable for everyone.
Because $prog_{it}$ can equal one only in the second period, it is correlated with $d2_t$.
Leaving out $d2_t$ would therefore attribute these aggregate time effects to the program,
and the estimated program effect would be biased upward or downward.
b) Why is it important to include $c_i$ in the equation?
It is important to include $c_i$ in the equation because it captures time-constant
unobserved differences across units, and first differencing allows $c_i$ and the
regressors to be correlated. If $c_i$ is left out of the equation and is correlated with
$prog_{it}$, then an omitted variable problem arises.
c) Using the first differencing method, show that $\hat\theta_2 = \Delta\bar y_{control}$ and $\hat\delta_1 = \Delta\bar y_{treat} - \Delta\bar y_{control}$,
where $\Delta\bar y_{control}$ is the average change in y over the two periods for the group with
$prog_{i2} = 0$, and $\Delta\bar y_{treat}$ is the average change in y over the two periods for the group
with $prog_{i2} = 1$. This formula shows that $\hat\delta_1$, the difference-in-differences estimator,
arises out of an unobserved effects model.
Writing out the equations for the two time periods, and taking $d2_2 = 1$ and $d2_1 = 0$,
we have
$$y_{i2} = \theta_1 + \theta_2 + \delta_1 prog_{i2} + c_i + u_{i2}$$
$$y_{i1} = \theta_1 + \delta_1 prog_{i1} + c_i + u_{i1}$$
Taking the difference,
$$\Delta y_i = \theta_2 + \delta_1\Delta prog_i + \Delta u_i = \theta_2 + \delta_1 prog_i + \Delta u_i$$
where
$$\Delta prog_i = prog_i = \begin{cases} 1 & \text{if } i \text{ is in the treatment group} \\ 0 & \text{if } i \text{ is in the control group.} \end{cases}$$
Estimating this equation by OLS yields the fitted values
$$\widehat{\Delta y_i} = \hat\theta_2 + \hat\delta_1 prog_i.$$
Because the only regressors are a constant and a dummy variable, the fitted value for
each group equals that group's sample average of $\Delta y_i$. For all observations in the
control group, $prog_i = 0$, thus
$$\Delta\bar y_{control} = \hat\theta_2.$$
On the other hand, for all observations in the treatment group, $prog_i = 1$, therefore
$$\Delta\bar y_{treat} = \hat\theta_2 + \hat\delta_1.$$
Subtracting the control group equation from the treatment group equation gives us
$$\Delta\bar y_{treat} - \Delta\bar y_{control} = \hat\delta_1.$$
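The result in part c) can be verified on simulated data. This is a minimal sketch, assuming hypothetical values for $\theta_2$ and $\delta_1$ and an arbitrary treatment assignment rule. It runs OLS of the change in y on a constant and the program dummy, then compares the coefficients with the group averages of the change.

```python
import random
from statistics import mean

random.seed(1)

# Hypothetical true values, used only to generate illustrative data.
theta2_true, delta1_true = 0.4, 1.5

prog, dy = [], []
for i in range(200):
    p = 1 if i % 3 == 0 else 0        # arbitrary treatment assignment
    du = random.gauss(0, 1)           # change in the idiosyncratic error
    prog.append(p)
    dy.append(theta2_true + delta1_true * p + du)

# OLS of dy on a constant and prog (simple regression formulas).
pbar, ybar = mean(prog), mean(dy)
sxy = sum((p - pbar) * (y - ybar) for p, y in zip(prog, dy))
sxx = sum((p - pbar) ** 2 for p in prog)
delta1_hat = sxy / sxx
theta2_hat = ybar - delta1_hat * pbar

# Average change in y within each group.
dy_control = mean([y for p, y in zip(prog, dy) if p == 0])
dy_treat = mean([y for p, y in zip(prog, dy) if p == 1])
```

Up to rounding, `theta2_hat` equals `dy_control` and `delta1_hat` equals `dy_treat - dy_control`, whatever seed or sample size is used.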
d) Write down the extension of the model for T time periods.
With more than two time periods, a more general form of the model includes
all time dummies except the one for the base year. This way, the first-differenced
equation for every period has its own intercept. The structural equation becomes
$$y_{it} = \theta_1 + d_t\theta + \delta_1 prog_{it} + c_i + u_{it}$$
where $d_t = (d2_t, d3_t, \ldots, dT_t)$ is a 1 x (T-1) vector of time dummies and
$\theta = (\theta_2, \theta_3, \ldots, \theta_T)'$ is a (T-1) x 1 vector of parameters.
For periods t and t-1 (with t > 2), we have
$$y_{it} = \theta_1 + \theta_t + \delta_1 prog_{it} + c_i + u_{it}$$
$$y_{i,t-1} = \theta_1 + \theta_{t-1} + \delta_1 prog_{i,t-1} + c_i + u_{i,t-1}$$
First differencing all periods gives us
$$\Delta y_{it} = \gamma_t + \delta_1\Delta prog_{it} + \Delta u_{it}$$
where $\gamma_t = \theta_t - \theta_{t-1}$ serves as the intercept of the equation for time period t
(for t = 2, $\gamma_2 = \theta_2$).
e) Which approach do you prefer, the DID estimator in part d) or the pooled OLS
estimator using a pooled DID approach?
Assuming that $c_i$ is correlated with the regressors, omitting $c_i$ causes
inconsistency. Since pooled OLS leaves $c_i$ in the error term, the pooled OLS
estimates suffer from omitted variable bias. Furthermore, the presence of $c_i$ in the
composite error makes that error serially correlated, so the usual pooled OLS
standard errors are invalid as well. Thus, the first-difference DID estimator is
preferred: differencing removes the time-constant variable $c_i$, and DID remains a
good estimator when there is serial correlation in the idiosyncratic errors.
10.8
(a) Use pooled OLS
. use http://fmwww.bc.edu/ec-p/data/wooldridge2k/NORWAY
. reg lcrime d78 clrprc1 clrprc2

      Source |       SS       df       MS              Number of obs =     106
-------------+------------------------------           F(  3,   102) =   30.27
       Model |  18.7948264     3  6.26494214           Prob > F      =  0.0000
    Residual |  21.1114968   102  .206975459           R-squared     =  0.4710
-------------+------------------------------           Adj R-squared =  0.4554
       Total |  39.9063233   105  .380060222           Root MSE      =  .45495

------------------------------------------------------------------------------
      lcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d78 |  -.0547246   .0944947    -0.58   0.564    -.2421544    .1327051
     clrprc1 |  -.0184955   .0053035    -3.49   0.001    -.0290149    -.007976
     clrprc2 |  -.0173881   .0054376    -3.20   0.002    -.0281735   -.0066026
       _cons |    4.18122   .1878879    22.25   0.000     3.808545    4.553894
------------------------------------------------------------------------------

. reg lcrime d78 clrprc1 clrprc2, robust

Linear regression                                      Number of obs =     106
                                                       F(  3,   102) =   24.01
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4710
                                                       Root MSE      =  .45495

------------------------------------------------------------------------------
             |               Robust
      lcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d78 |  -.0547246   .0883541    -0.62   0.537    -.2299747    .1205254
     clrprc1 |  -.0184955   .0047622    -3.88   0.000    -.0279413   -.0090497
     clrprc2 |  -.0173881   .0045592    -3.81   0.000    -.0264311    -.008345
       _cons |    4.18122   .1934741    21.61   0.000     3.797465    4.564975
------------------------------------------------------------------------------
The time dummy is not significantly different from zero, but the lagged clear-up
percentages are significantly different from zero and have negative coefficients.
Specifically, a 10 percentage point increase in the clear-up percentage one year ago leads
to an estimated 18.5% drop in the crime rate this year. A 10 percentage point increase in
the clear-up percentage two years ago leads to an estimated drop of 17.4% in the crime
rate this year. This implies that the first and second lags of the clear-up percentage
deter present crime. The two variables (clrprc1 and clrprc2) are statistically significant
at the 5% significance level; the magnitudes are also economically meaningful.
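The percentage effects quoted for the pooled OLS estimates use the usual log approximation, 100 times the coefficient times the change in the regressor. As a rough sketch, the snippet below compares that approximation with the exact percentage change implied by a log-level model, 100(exp(b * change) - 1). The coefficients are copied from the pooled OLS output; the dictionary and variable names are only illustrative.

```python
import math

# Pooled OLS coefficients on the lagged clear-up percentages (from the output).
coef = {"clrprc1": -0.0184955, "clrprc2": -0.0173881}
change = 10  # a 10 percentage point increase in the clear-up percentage

# Approximate percentage change in the crime rate: 100 * b * change.
approx = {name: 100 * b * change for name, b in coef.items()}

# Exact percentage change implied by a log-level model.
exact = {name: 100 * (math.exp(b * change) - 1) for name, b in coef.items()}

for name in coef:
    print(f"{name}: approx {approx[name]:.2f}%, exact {exact[name]:.2f}%")
```

For changes this large the exact figures are somewhat smaller in absolute value than the approximate ones, so the approximate percentages are best read as upper bounds on the size of the estimated drop.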
IS THERE SERIAL CORRELATION?
. predict uHat
(option xb assumed; fitted values)
. reg uHat L.uHat

      Source |       SS       df       MS              Number of obs =      53
-------------+------------------------------           F(  1,    51) =   40.19
       Model |  3.17344694     1  3.17344694           Prob > F      =  0.0000
    Residual |  4.02695499    51  .078959902           R-squared     =  0.4407
-------------+------------------------------           Adj R-squared =  0.4298
       Total |  7.20040193    52  .138469268           Root MSE      =    .281

------------------------------------------------------------------------------
        uHat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        uHat |
         L1. |   .5592138   .0882095     6.34   0.000     .3821257    .7363018
       _cons |   1.365023   .2296778     5.94   0.000     .9039254     1.82612
------------------------------------------------------------------------------
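We reject the null hypothesis that there is no serial correlation. Note, however, that
predict without the residuals option stores fitted values, as the message "(option xb
assumed; fitted values)" indicates, so a proper test should use residuals obtained from
predict uHat, residuals. Serial correlation in the errors of a panel data model can be a
result of an omitted time-constant factor. If that factor is correlated with the
regressors, POLS is inconsistent, so we cannot rely on POLS alone.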
(b) Estimate the equation by fixed effects and compare with POLS.
. xtreg lcrime d78 clrprc1 clrprc2, fe

Fixed-effects (within) regression               Number of obs      =       106
Group variable: district                        Number of groups   =        53

R-sq:  within  = 0.4209                         Obs per group: min =         2
       between = 0.4798                                        avg =       2.0
       overall = 0.4234                                        max =         2

                                                F(3,50)            =     12.12
corr(u_i, Xb)  = 0.3645                         Prob > F           =    0.0000

------------------------------------------------------------------------------
      lcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d78 |   .0856556   .0637825     1.34   0.185    -.0424553    .2137665
     clrprc1 |  -.0040475   .0047199    -0.86   0.395    -.0135276    .0054326
     clrprc2 |  -.0131966   .0051946    -2.54   0.014    -.0236302   -.0027629
       _cons |   3.350995   .2324736    14.41   0.000     2.884058    3.817932
-------------+----------------------------------------------------------------
     sigma_u |  .47140473
     sigma_e |   .2436645
         rho |  .78915666   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(52, 50) =     5.88              Prob > F = 0.0000

. xtreg lcrime d78 clrprc1 clrprc2, fe robust

Fixed-effects (within) regression               Number of obs      =       106
Group variable: district                        Number of groups   =        53

R-sq:  within  = 0.4209                         Obs per group: min =         2
       between = 0.4798                                        avg =       2.0
       overall = 0.4234                                        max =         2

                                                F(3,50)            =      8.84
corr(u_i, Xb)  = 0.3645                         Prob > F           =    0.0001

                              (Std. Err. adjusted for clustering on district)
------------------------------------------------------------------------------
             |               Robust
      lcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d78 |   .0856556   .0554876     1.54   0.129    -.0257945    .1971057
     clrprc1 |  -.0040475   .0042659    -0.95   0.347    -.0126158    .0045207
     clrprc2 |  -.0131966   .0047286    -2.79   0.007    -.0226942    -.003699
       _cons |   3.350995   .2622724    12.78   0.000     2.824205    3.877785
-------------+----------------------------------------------------------------
     sigma_u |  .47140473
     sigma_e |   .2436645
         rho |  .78915666   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Using fixed effects, clrprc1 becomes statistically insignificant (p-value of 0.395, or
0.347 with robust standard errors), while clrprc2 remains significant at the 5% level
(and at the 1% level with robust standard errors). The signs are still the same, but both
lags decrease in magnitude under Fixed Effects, the first lag much more than the second.
Even after removing the time-constant unobserved effect using Fixed Effects estimation,
we still need to test for serial correlation of the idiosyncratic errors, to make sure
that inference based on our FE estimates is valid.
. predict uHat
(option xb assumed; fitted values)
. reg uHat L.uHat

      Source |       SS       df       MS              Number of obs =      53
-------------+------------------------------           F(  1,    51) =   39.91
       Model |  .759584583     1  .759584583           Prob > F      =  0.0000
    Residual |  .970555983    51  .019030509           R-squared     =  0.4390
-------------+------------------------------           Adj R-squared =  0.4280
       Total |  1.73014057    52  .033271934           Root MSE      =  .13795

------------------------------------------------------------------------------
        uHat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        uHat |
         L1. |   .5644182   .0893384     6.32   0.000     .3850639    .7437725
       _cons |   1.351664   .2300904     5.87   0.000     .8897386     1.81359
------------------------------------------------------------------------------
(c)
. test clrprc1=clrprc2

 ( 1)  clrprc1 - clrprc2 = 0

       F(  1,    50) =    1.82
            Prob > F =    0.1828

We fail to reject the null hypothesis, $H_0: \beta_1 = \beta_2$.
This implies that the effects of the first and second lags of the clear-up percentage on
the crime rate are not statistically different from each other.
Having failed to reject the null hypothesis, a more parsimonious model would be:
$$\ddot{lcrime}_{it} = \theta_1 + \beta_1(\ddot{clrprc1}_{it} + \ddot{clrprc2}_{it}) + \ddot{u}_{it}$$
Where:
$$\ddot{lcrime}_{it} = lcrime_{it} - \overline{lcrime}_i$$
$$\ddot{clrprc1}_{it} = clrprc1_{it} - \overline{clrprc1}_i$$
$$\ddot{clrprc2}_{it} = clrprc2_{it} - \overline{clrprc2}_i$$
$$\ddot{u}_{it} = u_{it} - \bar{u}_i$$
. egen avlcrime = mean(lcrime), by(district)
. gen lcrimeDem = lcrime - avlcrime
. egen avcl1 = mean(clrprc1), by(district)
. egen avcl2 = mean(clrprc2), by(district)
. gen cl1Dem = clrprc1 - avcl1
. gen cl2Dem = clrprc2 - avcl2
. gen sumCL = cl1Dem + cl2Dem
. reg lcrimeDem sumCL

      Source |       SS       df       MS              Number of obs =     106
-------------+------------------------------           F(  1,   104) =   63.29
       Model |  1.93960511     1  1.93960511           Prob > F      =  0.0000
    Residual |  3.18702681   104  .030644489           R-squared     =  0.3783
-------------+------------------------------           Adj R-squared =  0.3724
       Total |  5.12663192   105  .048825066           Root MSE      =  .17506

------------------------------------------------------------------------------
   lcrimeDem |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       sumCL |   -.010951   .0013765    -7.96   0.000    -.0136807   -.0082214
       _cons |  -2.81e-09   .0170029    -0.00   1.000    -.0337174    .0337174
------------------------------------------------------------------------------
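The time-demeaning done with egen and gen can be mirrored in a few lines of Python. This is only a sketch on a small made-up panel (the numbers are hypothetical, not the NORWAY data), and `demean_by_group` is an illustrative helper, not a library function. It shows why the intercept in a regression on demeaned variables is zero up to rounding, as in the _cons estimate of -2.81e-09 above.

```python
from collections import defaultdict
from statistics import mean

# Toy panel: (district, year, lcrime, clrprc1, clrprc2); hypothetical values.
rows = [
    (1, 72, 4.1, 20.0, 25.0),
    (1, 78, 3.9, 30.0, 22.0),
    (2, 72, 3.2, 45.0, 40.0),
    (2, 78, 3.4, 41.0, 44.0),
    (3, 72, 3.8, 33.0, 30.0),
    (3, 78, 3.5, 39.0, 36.0),
]


def demean_by_group(rows, col):
    """Subtract the district-level mean of column `col` from each observation."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[0]].append(r[col])
    means = {g: mean(v) for g, v in groups.items()}
    return [r[col] - means[r[0]] for r in rows]


lcrime_dem = demean_by_group(rows, 2)
sum_cl = [a + b for a, b in zip(demean_by_group(rows, 3),
                                demean_by_group(rows, 4))]

# OLS of the demeaned outcome on the sum of the demeaned lags.
xbar, ybar = mean(sum_cl), mean(lcrime_dem)
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(sum_cl, lcrime_dem))
         / sum((x - xbar) ** 2 for x in sum_cl))
intercept = ybar - slope * xbar
```

Because every demeaned variable averages to zero within each district, both sample means are zero and the fitted intercept vanishes.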
Comparing the R-squared of this new model (0.3783, adjusted 0.3724) with the within
R-squared of the original FE model (0.4209), we see that the R-squared of the original FE
model is higher. This can be explained by the fact that the original FE model uses more
explanatory variables: it includes d78 and lets the two lags have separate coefficients.
10.13
To see whether minimizing the weighted sum of squared residuals can be used as a
procedure for estimating $\beta$, we need to show that the resulting estimator of $\beta$ is
consistent and asymptotically efficient.
Let the minimization problem be:
$$\min \sum_{i=1}^{N}\sum_{t=1}^{T} (y_{it} - a_1 d1_i - a_2 d2_i - \cdots - a_N dN_i - x_{it}b)^2 / h_{it}, \quad \text{where } dn_i = 1 \text{ if } i = n \text{ and 0 otherwise.}$$
Note that this is similar to the minimization problem of weighted least squares.
This reduces to:
$$\min \sum_{i=1}^{N}\sum_{t=1}^{T} (y_{it} - a_i - x_{it}b)^2 / h_{it}.$$ (a)
Then, the first order condition with respect to $a_i$ is
$$\sum_{t=1}^{T} (y_{it} - a_i - x_{it}b) / h_{it} = 0.$$
Solving for $a_i$, we get:
$$\hat a_i = \left(\sum_{t=1}^{T}\frac{1}{h_{it}}\right)^{-1}\left(\sum_{t=1}^{T}\frac{y_{it}}{h_{it}} - \sum_{t=1}^{T}\frac{x_{it}}{h_{it}}\,b\right).$$
Let
$$w_i = 1\Big/\sum_{t=1}^{T}\frac{1}{h_{it}}, \quad \bar y_i^w = w_i\sum_{t=1}^{T}\frac{y_{it}}{h_{it}} \quad \text{and} \quad \bar x_i^w = w_i\sum_{t=1}^{T}\frac{x_{it}}{h_{it}}.$$
Then $\hat a_i = \bar y_i^w - \bar x_i^w b$.
Plugging this into (a), we have:
$$\min_b \sum_{i=1}^{N}\sum_{t=1}^{T} \left[(y_{it} - \bar y_i^w) - (x_{it} - \bar x_i^w)b\right]^2 / h_{it}.$$
Let $\tilde y_{it} = (y_{it} - \bar y_i^w)/\sqrt{h_{it}}$ and $\tilde x_{it} = (x_{it} - \bar x_i^w)/\sqrt{h_{it}}$.
Then, we have:
$$\min_b \sum_{i=1}^{N}\sum_{t=1}^{T} (\tilde y_{it} - \tilde x_{it}b)^2.$$
Solving this minimization problem gives the pooled OLS estimator of $\beta$ from the
regression of $\tilde y_{it}$ on $\tilde x_{it}$:
$$\hat\beta = \left(\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}'\tilde x_{it}\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}'\tilde y_{it}\right)$$
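The equivalence between the original weighted problem and pooled OLS on the transformed data can be checked numerically. The sketch below assumes a single scalar regressor and simulated data with hypothetical parameter values. It computes the estimate from the weighted-demeaned variables, recovers the unit intercepts from the formula for $\hat a_i$, and then evaluates the first order conditions of the original problem, which should both be zero up to rounding.

```python
import random

random.seed(2)

N, T, beta = 50, 4, 2.0   # hypothetical design and true slope

# Simulate y_it = beta * x_it + c_i + u_it with Var(u_it) proportional to h_it.
data = []
for _ in range(N):
    c = random.gauss(0, 1)
    unit = []
    for _ in range(T):
        h = random.uniform(0.5, 2.0)
        x = random.gauss(c, 1)            # x_it correlated with c_i
        u = random.gauss(0, h ** 0.5)
        unit.append((beta * x + c + u, x, h))
    data.append(unit)

# Pooled OLS on the weighted-demeaned, rescaled variables.
num = den = 0.0
weighted_means = []
for unit in data:
    w = 1.0 / sum(1.0 / h for _, _, h in unit)
    y_w = w * sum(y / h for y, _, h in unit)
    x_w = w * sum(x / h for _, x, h in unit)
    weighted_means.append((y_w, x_w))
    for y, x, h in unit:
        x_t = (x - x_w) / h ** 0.5
        y_t = (y - y_w) / h ** 0.5
        num += x_t * y_t
        den += x_t * x_t
b_hat = num / den
a_hat = [y_w - x_w * b_hat for y_w, x_w in weighted_means]

# First order conditions of the original weighted problem.
foc_a = max(abs(sum((y - a - x * b_hat) / h for y, x, h in unit))
            for unit, a in zip(data, a_hat))
foc_b = abs(sum(x * (y - a - x * b_hat) / h
                for unit, a in zip(data, a_hat) for y, x, h in unit))
```

Both `foc_a` and `foc_b` vanish up to floating-point error, so the pair (`a_hat`, `b_hat`) solves the original weighted least squares problem with unit dummies.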
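We check the properties of $\hat\beta$.
(i) Consistency
Let $\bar u_i^w = w_i\sum_{t=1}^{T} u_{it}/h_{it}$, so that $\bar y_i^w = \bar x_i^w\beta + c_i + \bar u_i^w$.
Subtracting this from $y_{it} = x_{it}\beta + c_i + u_{it}$ for all t and dividing by $\sqrt{h_{it}}$ gives:
$$\tilde y_{it} = \tilde x_{it}\beta + \tilde u_{it}, \quad \text{where } \tilde u_{it} = (u_{it} - \bar u_i^w)/\sqrt{h_{it}}.$$
Plugging this into the expression for $\hat\beta$ and dividing the sums by N, we have:
$$\hat\beta = \beta + \left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}'\tilde x_{it}\right)^{-1}\left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}'\tilde u_{it}\right)$$ (b)
Since $\sum_{t=1}^{T}\tilde x_{it}/\sqrt{h_{it}} = 0$ by the definition of $\bar x_i^w$, the term in $\bar u_i^w$ drops out and
$$\sum_{t=1}^{T}\tilde x_{it}'\tilde u_{it} = \sum_{t=1}^{T}\tilde x_{it}'u_{it}/\sqrt{h_{it}},$$
so that
$$\hat\beta = \beta + \left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}'\tilde x_{it}\right)^{-1}\left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}'u_{it}/\sqrt{h_{it}}\right)$$ (c)
The assumption $E(u_{it} \mid x_i, h_i, c_i) = 0$ implies $E(\tilde x_{it}'u_{it}/\sqrt{h_{it}}) = 0$, because $\tilde x_{it}$ and $h_{it}$ are functions of $(x_i, h_i)$.
Hence, $\text{plim}(\hat\beta) = \beta$, and $\hat\beta$ is consistent.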
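(ii) Asymptotic Efficiency
Let $A = \sum_{t=1}^{T} E(\tilde x_{it}'\tilde x_{it})$ and $B = Var\left(\sum_{t=1}^{T}\tilde x_{it}'u_{it}/\sqrt{h_{it}}\right)$.
The asymptotic variance is
$$Avar\,\sqrt{N}(\hat\beta - \beta) = A^{-1}BA^{-1}.$$
Assuming
$$Cov(u_{it}, u_{is} \mid x_i, h_i, c_i) = 0 \text{ for } t \neq s$$
$$Var(u_{it} \mid x_i, h_i, c_i) = \sigma_u^2 h_{it}$$
Then we can show $B = \sigma_u^2 A$, hence
$$Avar\,\sqrt{N}(\hat\beta - \beta) = \sigma_u^2 A^{-1}.$$
By the law of iterated expectations,
$$\sum_{t=1}^{T} E(\tilde u_{it}^2) = \sigma_u^2\left\{T - E\left[w_i\sum_{t=1}^{T}(1/h_{it})\right]\right\} = \sigma_u^2(T - 1),$$
since $w_i\sum_{t=1}^{T}(1/h_{it}) = 1$. This shows that $\sigma_u^2$ can be estimated from the residuals of the transformed regression with a degrees-of-freedom adjustment based on T - 1 periods per unit.
Hence the estimated asymptotic variance is
$$\widehat{Avar}(\hat\beta) = \hat\sigma_u^2\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}'\tilde x_{it}\right)^{-1}.$$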
Thus, it is possible to use the method of weighted least squares, or more specifically
"fixed effects weighted least squares", to estimate the parameters of a fixed effects
model. Under the variance assumptions stated above, the resulting estimator is consistent
and asymptotically efficient.
10.14
We have the unobserved effects model:
$$y_{it} = \alpha + x_{it}\beta + z_i\gamma + h_i + u_{it}$$
$$E(u_{it} \mid x_i, z_i, h_i) = 0, \quad E(h_i \mid x_i, z_i) = 0, \quad t = 1, \ldots, T$$
Let $\sigma_h^2 = Var(h_i)$ and $\sigma_u^2 = Var(u_{it})$.
If we estimate $\beta$ with fixed effects, we are estimating the equation:
$$y_{it} = x_{it}\beta + c_i + u_{it}$$
Where: $c_i = \alpha + z_i\gamma + h_i$
(a)
$$Var(c_i) = \sigma_c^2 = Var(\alpha + z_i\gamma + h_i) = \gamma' Var(z_i)\gamma + 2\,Cov(z_i\gamma, h_i) + Var(h_i)$$
We note that $E(h_i \mid x_i, z_i) = 0$. As such, $E(z_i'h_i) = 0$, so the covariance term is zero and
$$\sigma_c^2 = \gamma' Var(z_i)\gamma + \sigma_h^2 \geq \sigma_h^2.$$
We see that $\sigma_c^2 = \sigma_h^2$ if $\gamma' Var(z_i)\gamma = 0$. However, if $\gamma' Var(z_i)\gamma > 0$, then $\sigma_c^2$ is
strictly larger than $\sigma_h^2$.
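The variance decomposition in part (a) can be illustrated with a small simulation. This is a sketch under assumed values: a scalar $z_i$, hypothetical parameters, and $h_i$ drawn independently of $z_i$, which is stronger than the zero conditional mean assumption but sufficient for the illustration.

```python
import random
from statistics import pvariance

random.seed(3)

# Hypothetical parameter values; z_i is a scalar here.
alpha, gamma, sigma_h = 1.0, 0.8, 0.5
n = 20000

z = [random.gauss(0, 1.5) for _ in range(n)]
h = [random.gauss(0, sigma_h) for _ in range(n)]   # drawn independently of z_i
c = [alpha + gamma * zi + hi for zi, hi in zip(z, h)]

var_c = pvariance(c)                       # sample analogue of sigma_c^2
var_h = pvariance(h)                       # sample analogue of sigma_h^2
implied = gamma ** 2 * pvariance(z) + var_h  # gamma' Var(z) gamma + sigma_h^2
```

In the simulated sample `var_c` exceeds `var_h` and is close to `implied`, with the small gap coming from the sample covariance between $z_i$ and $h_i$.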
(b)
Unlike with random effects, we cannot include time-constant variables (even if they are
observed) when estimating the model by fixed effects, since time-constant variables are
eliminated from the transformed, time-demeaned equation. With fixed effects the
time-constant variables, both observed and unobserved, are lumped into $c_i$.
Hence, as we can see from the results above, the variance of the unobserved effect in the
fixed effects estimation has two components: the variance of the time-constant
factors that could be observed, and the variance of the "true" unobserved effect.
With the inclusion of RE.1b, it is possible to include time-constant variables in estimating
the model using random effects. Since we can control for the time-constant variables
along with the time-varying variables, the variability of the unobserved effect is only due
to the unobserved time-constant variable. The result does make intuitive sense, since
random effects estimation lets us control for more variables, in the sense
that we can include time-constant variables in the estimation. As such, we can expect that
fixed effects will lead to a larger estimated variance of the unobserved effect than if we
estimate the model by random effects.