Answers Chapter 11
11.1, 11.2, 11.7, 11.10 (parts a and b only).
11.1
See class notes from April 19th; we did all of these transformations in class.
Specification: var(e_t) = σ²√x_t
Transformation: divide the model by x_t^(1/4); the independent variables become 1/x_t^(1/4) and x_t/x_t^(1/4).
Why? We divide by the standard deviation, which is the square root of the variance: x_t^(1/4). We can ignore the σ term in all of the models since it doesn't vary over observations.

Specification: var(e_t) = σ²x_t
Transformation: divide the model by x_t^(1/2); the independent variables become 1/x_t^(1/2) and x_t/x_t^(1/2).
Why? This is just like the one we did in class. The standard deviation is σ√x_t.

Specification: var(e_t) = σ²x_t²
Transformation: divide the model by x_t; the independent variables become 1/x_t plus an intercept.
Why? Here the standard deviation is σx_t, so we divide by x_t.

Specification: var(e_t) = σ²ln(x_t)
Transformation: divide the model by (ln x_t)^(1/2); the independent variables become 1/(ln x_t)^(1/2) and x_t/(ln x_t)^(1/2).
Why? Here the standard deviation is σ√ln(x_t).
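Each row of this table applies the same recipe: divide every term of y_t = β₁ + β₂x_t + e_t by the error standard deviation, then run least squares with no intercept on the transformed variables. Below is a minimal Python sketch of case (a), where var(e_t) = σ²√x_t; the data are made up and noise-free (the seed, sample size, and coefficient values are illustrative only), so the transformed regression recovers the coefficients exactly.

```python
import numpy as np

# Made-up, noise-free data: y_t = b1 + b2*x_t exactly, so the transformed
# no-intercept regression must return b1 and b2.
rng = np.random.default_rng(0)
x = rng.uniform(10.0, 100.0, size=40)
b1, b2 = 40.0, 0.13
y = b1 + b2 * x

# Case (a): var(e_t) = sigma^2 * sqrt(x_t), so the standard deviation is
# proportional to x_t^(1/4); divide every term of the model by x_t^(1/4).
w = x ** 0.25
y_star = y / w       # transformed dependent variable
x1_star = 1.0 / w    # transformed "intercept" regressor
x2_star = x / w      # transformed slope regressor

# OLS with NO intercept on the transformed variables is GLS on the original model.
X = np.column_stack([x1_star, x2_star])
beta_hat, *_ = np.linalg.lstsq(X, y_star, rcond=None)
print(beta_hat)  # close to [40.0, 0.13]
```

The same recipe with w = sqrt(x), w = x, or w = sqrt(log(x)) produces the other three rows of the table.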
Here is the SAS code and output from estimating the food expenditure model for each of these
transformations. Make sure to look through the results and pick out the estimates of β₁ and β₂.
options ls=78;
options formdlim='*';
data food;
infile 'c:\documents and settings\doylejm\my documents\classes\ec385\ue\datafiles\table3-1.dat' firstobs=2;
input y x;
proc reg ;
white: model y = x / acov;
run;
data foodgls;
set food;
y_a = y/(x**(1/4));
x1_a = 1/(x**(1/4));
x2_a = x/(x**(1/4));
y_b = y/sqrt(x);
x1_b = 1/sqrt(x);
x2_b = x/sqrt(x);
y_c = y/x;
x_c = 1/x;
lnx = log(x);
y_d = y/sqrt(lnx);
x1_d = 1/sqrt(lnx);
x2_d = x/sqrt(lnx);
proc reg;
Part_A: model y_a = x1_a x2_a / noint;
Part_B: model y_b = x1_b x2_b / noint;
Part_C: model y_c = x_c;
Part_D: model y_d = x1_d x2_d / noint;
run;
Model: white
Dependent Variable: y

                       Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1            25221         25221     17.65   0.0002
Error              38            54311    1429.24556
Corrected Total    39            79533

Root MSE          37.80536    R-Square   0.3171
Dependent Mean   130.31300    Adj R-Sq   0.2991
Coeff Var         29.01120

                       Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             40.76756         22.13865      1.84     0.0734
x            1              0.12829          0.03054      4.20     0.0002

******************************************************************************
The SAS System

The REG Procedure
Model: white
Dependent Variable: y

              Consistent Covariance of Estimates
Variable         Intercept              x
Intercept     561.88951794   -0.886643258
x             -0.886643258   0.0014569032
Below are 4 sets of regression estimates, for parts a) - d). The estimates highlighted in yellow
are estimates of β₁ (the intercept in the original model), and those highlighted in blue are
estimates of β₂ (the slope in the original model).
Model: Part_A
Dependent Variable: y_a
NOTE: No intercept in model. R-Square is redefined.

                       Analysis of Variance
Source               DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                 2            26028         13014    258.20   <.0001
Error                38       1915.30947      50.40288
Uncorrected Total    40            27943

Root MSE          7.09950    R-Square   0.9315
Dependent Mean   25.30918    Adj R-Sq   0.9279
Coeff Var        28.05108

                       Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
x1_a         1             36.75257         20.05232      1.83     0.0747
x2_a         1              0.13391          0.02879      4.65     <.0001
******************************************************************************
Model: Part_B
Dependent Variable: y_b
NOTE: No intercept in model. R-Square is redefined.

                       Analysis of Variance
Source               DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                 2        978.84644     489.42322    270.71   <.0001
Error                38         68.70197       1.80795
Uncorrected Total    40       1047.54841

Root MSE          1.34460    R-Square   0.9344
Dependent Mean    4.93981    Adj R-Sq   0.9310
Coeff Var        27.21965

                       Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
x1_b         1             31.92438         17.98608      1.77     0.0839
x2_b         1              0.14096          0.02700      5.22     <.0001
******************************************************************************
Model: Part_C
Dependent Variable: y_c

                       Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1          0.00562       0.00562      2.30   0.1377
Error              38          0.09285       0.00244
Corrected Total    39          0.09846

Root MSE          0.04943    R-Square   0.0571
Dependent Mean    0.19116    Adj R-Sq   0.0322
Coeff Var        25.85840

                       Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              0.15769          0.02342      6.73     <.0001
x_c          1             21.28584         14.03797      1.52     0.1377
******************************************************************************
Model: Part_D
Dependent Variable: y_d
NOTE: No intercept in model. R-Square is redefined.

                       Analysis of Variance
Source               DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                 2           106685         53343    249.90   <.0001
Error                38       8111.23735     213.45361
Uncorrected Total    40           114797

Root MSE         14.61005    R-Square   0.9293
Dependent Mean   50.89489    Adj R-Sq   0.9256
Coeff Var         28.7063

                       Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
x1_d         1             39.55015         21.46901      1.84     0.0733
x2_d         1              0.12996          0.02997      4.34     0.0001
11.2
Code and SAS output appear below.
(a)
Countries with high per capita income can decide whether to spend larger amounts on education than
their poorer neighbours, or to spend more of their larger income on other things. They are likely
to have more discretion with respect to where public monies are spent. On the other hand,
countries with low per capita income may regard a particular level of education spending as
essential, meaning that they have less scope for deviating from a mean function. These
differences can be captured by a model with heteroskedasticity. Remember that
heteroskedasticity is more common in cross-section data.
(b) The least squares estimated function is
yt   01246
.
 0.07317 xt
(0.0485) (0.00518)
R2  0.862
This function and the corresponding residuals appear in Figure 11.1. The absolute magnitude of
the errors does tend to increase as x increases suggesting the existence of heteroskedasticity.
[Figure: scatter of y_t (from −0.2 to 1.6) against x_t (from 0 to 20) with the fitted line y = −0.1246 + 0.0732x]
Figure 11.1 Estimated Function for Education Expenditure
(c) Since it is suspected that, if heteroskedasticity exists, the variance is related to x_t, we begin by
ordering the observations according to the magnitude of x_t. Then, splitting the sample into two
equal subsets of 17 observations each, and applying least squares to each subset, we obtain
σ̂₁² = 0.0081608 and σ̂₂² = 0.029127, leading to a Goldfeld-Quandt statistic of

    GQ = 0.029127 / 0.008161 = 3.569

The critical value from an F-distribution with (15, 15) degrees of freedom and a 5% significance
level is Fc = 2.40. Since 3.569 > 2.40, we reject a null hypothesis of homoskedasticity and
conclude that the error variance is directly related to per capita income x_t.
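The GQ arithmetic here can be verified directly from the two subsample variance estimates reported above; a quick Python check:

```python
# Arithmetic check of the Goldfeld-Quandt statistic in part (c), using the
# subsample error-variance estimates reported in the text.
sigma2_high = 0.029127   # variance estimate from the high-income subsample
sigma2_low = 0.0081608   # variance estimate from the low-income subsample
Fc = 2.40                # 5% critical value, F(15, 15)

GQ = sigma2_high / sigma2_low
print(round(GQ, 3))      # 3.569
print(GQ > Fc)           # True: reject homoskedasticity
```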
(e) Generalized least squares estimation under the assumption var(e_t) = σ²x_t yields

    ŷ_t = −0.0929 + 0.06932 x_t
          (0.0289)  (0.00441)
(Note: I have expressed these results in the model's original form although it was estimated with
no intercept and two independent variables: the reciprocal of the square root of x and x over the
square root of x.) The estimated response of per capita education expenditure to per capita
income has declined slightly relative to the least squares estimate. The associated 95%
confidence interval is (0.0603, 0.0783). This interval is narrower than both of those computed from
least squares estimates. The comparison with the White-calculated interval suggests that
generalized least squares is more efficient; a comparison with the conventional least squares
interval is not really valid, because the standard errors used to compute that interval are not
valid. See below for the case where var(e_t) = σ²x_t². The differences in how that case is carried
out and in how the results are interpreted are important.
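As a quick check of the interval reported here, the 95% bounds follow from estimate ± t_c × standard error, with t_c = 2.03693 (the t(0.975, 32) value computed in the SAS critical-value step):

```python
# Check of the 95% confidence interval for the GLS slope in part (e).
b2 = 0.06932     # GLS slope estimate
se = 0.00441     # its standard error
tc = 2.03693     # t critical value with 32 degrees of freedom

lo, hi = b2 - tc * se, b2 + tc * se
print(round(lo, 4), round(hi, 4))  # 0.0603 0.0783
```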
Part B
Least squares results

The REG Procedure
Model: MODEL1
Dependent Variable: y

                       Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1          3.68386       3.68386    199.59   <.0001
Error              32          0.59063       0.01846
Corrected Total    33          4.27449

Root MSE          0.13586    R-Square   0.8618
Dependent Mean    0.47674    Adj R-Sq   0.8575
Coeff Var        28.49753

                       Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             -0.12457          0.04852     -2.57     0.0151
x            1              0.07317          0.00518     14.13     <.0001
******************************************************************************
The REG Procedure
Model: MODEL1
Dependent Variable: y

This is part (d). The White standard error for b2 is the square root of 0.0000363146 ≈ 0.006.
This is larger than the 0.00518 value reported by least squares.

              Consistent Covariance of Estimates
Variable          Intercept              x
Intercept      0.0015372135   -0.000211654
x              -0.000211654   0.0000363146
******************************************************************************
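A quick check of the White standard error arithmetic just above:

```python
import math

# White standard error for b2 = square root of the (x, x) entry of the
# heteroskedasticity-consistent covariance matrix reported above.
var_b2_white = 0.0000363146
se_white = math.sqrt(var_b2_white)
print(round(se_white, 3))  # 0.006
print(se_white > 0.00518)  # True: larger than the conventional OLS standard error
```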
This regression gets you the numerator for the GQ statistic:

The REG Procedure
Model: MODEL1
Dependent Variable: y

                       Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1          0.42220       0.42220     14.50   0.0017
Error              15          0.43690       0.02913
Corrected Total    16          0.85910

Root MSE          0.17067    R-Square   0.4914
Dependent Mean    0.78115    Adj R-Sq   0.4575
Coeff Var        21.84803

                       Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             -0.14087          0.24569     -0.57     0.5749
x            1              0.07516          0.01974      3.81     0.0017
This regression gets you the denominator for the GQ statistic:

The REG Procedure
Model: MODEL1
Dependent Variable: y

                       Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1          0.14225       0.14225     17.43   0.0008
Error              15          0.12241       0.00816
Corrected Total    16          0.26466

Root MSE          0.09034    R-Square   0.5375
Dependent Mean    0.17232    Adj R-Sq   0.5066
Coeff Var        52.42382

                       Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             -0.03807          0.05495     -0.69     0.4990
x            1              0.05047          0.01209      4.17     0.0008
******************************************************************************
These are the critical values:

Obs        fc        tc
  1   2.40345   2.03693
This regression corrects for heteroskedasticity of the form var(e_t) = σ²x_t:
The REG Procedure
Model: MODEL1
Dependent Variable: ystar
NOTE: No intercept in model. R-Square is redefined.

                       Analysis of Variance
Source               DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                 2          0.96083       0.48041    242.45   <.0001
Error                32          0.06341       0.00198
Uncorrected Total    34          1.02423

Root MSE          0.04451    R-Square   0.9381
Dependent Mean    0.15116    Adj R-Sq   0.9342
Coeff Var        29.44875

                       Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
x1star       1             -0.09292          0.02890     -3.21     0.0030
x2star       1              0.06932          0.00441     15.71     <.0001
We predict that if GDP per capita increases by $1.00, public expenditure on education per capita
will increase by $0.069.
******************************************************************************
This regression corrects for heteroskedasticity of the form var(e_t) = σ²x_t²:
The REG Procedure
Model: MODEL1
Dependent Variable: ystar

                       Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1          0.00349       0.00349     12.69   0.0012
Error              32          0.00880    0.00027504
Corrected Total    33          0.01229

Root MSE          0.01658    R-Square   0.2840
Dependent Mean    0.05153    Adj R-Sq   0.2616
Coeff Var        32.18259

                       Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              0.06443          0.00460     13.99     <.0001
xstar        1             -0.06739          0.01892     -3.56     0.0012
We predict that if GDP per capita increases by $1.00, public expenditure on education per capita
will increase by $0.064, because the intercept in this transformed model is actually the slope
coefficient of the original model.
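Why the intercept becomes the slope: dividing y_t = β₁ + β₂x_t + e_t through by x_t gives y_t/x_t = β₂ + β₁(1/x_t) + e_t/x_t. A small Python sketch with made-up, noise-free data (the coefficient values are illustrative only, so the role swap is easy to verify):

```python
import numpy as np

# Made-up exact data: y_t = beta1 + beta2*x_t with no error term.
rng = np.random.default_rng(2)
x = rng.uniform(1.0, 20.0, size=30)
beta1, beta2 = -0.1, 0.065
y = beta1 + beta2 * x

# Transform for var(e_t) = sigma^2 * x_t^2: divide everything by x_t.
y_star = y / x
x_star = 1.0 / x

# Regression of y/x on an intercept and 1/x: the INTERCEPT estimates the
# original slope beta2, and the coefficient on 1/x estimates beta1.
X = np.column_stack([np.ones_like(x_star), x_star])
coef, *_ = np.linalg.lstsq(X, y_star, rcond=None)
print(np.round(coef, 4))  # intercept ~ 0.065 (= beta2), slope on 1/x ~ -0.1 (= beta1)
```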
Here is the code that generated the results:
data pubexp;
infile 'A:pubexp.dat' firstobs=2;
input ee gdp pop;
y = ee/pop;
x = gdp/pop;
proc sort;
by descending x;
proc reg;
model y = x / acov;
output out=pubout p=yhat r=ehat;
symbol1 i=join c=green v=circle;
proc gplot;
plot y*x = '*' yhat*x = 1 /overlay;
plot ehat*x;
* create data set;
data one;
set pubexp;
if _n_ <= 17;
proc reg;
model y = x;
data two;
set pubexp;
if _n_ >= 18;
proc reg;
model y = x;
* critical values for tests;
data;
fc = finv(.95,15,15);
tc = tinv(.975,32);
proc print;
* PART E: GLS via weighted least squares;
data two;
set pubexp;
ystar = y/sqrt(x);
x1star = 1/sqrt(x);
x2star = x/sqrt(x);
** this code corrects for hetero assuming var(et) = sigmasq*xt;
proc reg ;
model ystar = x1star x2star / noint;
run;
data three;
set pubexp;
ystar = y/x;
xstar = 1/x;
** this code corrects for hetero assuming var(et) = sigmasq*xtsquare;
proc reg ;
model ystar = xstar;
run;
11.7
(a) The least squares estimates of equation (11.7.5) are

    ŷ_t = 2.243 + 0.164 x_t + 1.145 n_t        R² = 0.45
         (2.669)  (0.035)     (0.414)
These results suggest that an increase in income of $1000 will increase food expenditure by
$164; an additional person in the household will increase food expenditure by $1,145. Both the
estimated slope coefficients are significantly different from zero.
(b) See Figures 11.2 and 11.3. Overall, the residuals tend to increase in absolute value as x increases
and as n increases. Thus, the plots suggest the existence of heteroskedasticity that is dependent
on both xt and nt.
[Figure: residuals (RESID, −15 to 10) plotted against income X (20 to 100)]
Figure 11.2 Residuals Plotted Against Income

[Figure: residuals (RESID, −15 to 10) plotted against number of persons N (0 to 8)]
Figure 11.3 Residuals Plotted Against Number of Persons
(c)
(i) To perform the first Goldfeld-Quandt test we order the observations according to
decreasing values of x_t. Then, we find the least squares regression of
y_t = β₁ + β₂x_t + β₃n_t + e_t for both the first and second halves of the observations, to obtain
estimates σ̂₁² and σ̂₂², respectively. We find that σ̂₁² = 31.129 and σ̂₂² = 5.8819. Although
we are not hypothesizing constant error variances within each subsample, to perform the
Goldfeld-Quandt test we proceed as if H₀ and H₁ are given by H₀: σ₁² = σ₂² and
H₁: σ₁² > σ₂². The test statistic value is:

    GQ = σ̂₁²/σ̂₂² = 31.129/5.8819 = 5.2923

The 5% critical value for (16, 16) degrees of freedom is Fc = 2.33. Thus, because GQ =
5.2923 > Fc = 2.33, we reject H₀ and conclude that heteroskedasticity exists, and is
dependent on x_t.
(ii) When we order the observations with respect to n_t, there is not a unique ordering because
n_t takes on repeated integer values. There are 8 observations where n_t = 3. One of these
values must be included in the first 19 observations, and the other 7 in the last 19 observations.
There are 8 ways of doing this. The results from SAS, EViews and SHAZAM are as follows.
    GQ = σ̂₁²/σ̂₂² = 28.233/9.799 = 2.88    (SAS)
These values are greater than 2.33, and so we reject a null hypothesis of homoskedasticity
and conclude that the error variances are dependent on n_t. These test outcomes are consistent
with the evidence provided by the residual plots in part (b).
(d) The alternative variance estimators yield the following standard errors:

                      Standard Errors
    Coefficient     White     Least Squares
    β₂             0.0287            0.0354
    β₃             0.4360            0.4140
The results from White's variance estimator suggest the usual least squares results would
underestimate the reliability of estimation for β₂ and overestimate the reliability of estimation
for β₃.
(e) To find generalized least squares estimates when σ_t² = σ²h_t = σ² exp(0.055x_t + 0.12n_t), we
begin by calculating h_t for each observation. Then we apply least squares to the transformed
model:

    y_t/√h_t = β₁(1/√h_t) + β₂(x_t/√h_t) + β₃(n_t/√h_t) + e_t/√h_t
The resulting estimates, with those from least squares, and the White standard errors are in the
table below. The two estimates for β₂ are similar, but the GLS estimate for the response of food
expenditure to an additional household member is noticeably higher. The standard errors
suggest that β₁ and β₃ have been more precisely estimated by GLS, but not β₂. However, we do
need to keep in mind that standard errors are square roots of estimated variances. It is possible
for an improvement in precision to take place even when it is not reflected by the standard
errors.
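The weighting step can be sketched in Python. The data below are made up and noise-free (sample size, seed, and coefficient values are illustrative only), so the transformed regression recovers the coefficients exactly; only the form of h_t and the division by √h_t follow the text.

```python
import numpy as np

# Made-up data: exact linear relationship, so GLS recovers b exactly.
rng = np.random.default_rng(3)
n_obs = 38
x = rng.uniform(20.0, 100.0, size=n_obs)           # income
hh = rng.integers(1, 7, size=n_obs).astype(float)  # household size
b1, b2, b3 = 2.2, 0.16, 1.1
y = b1 + b2 * x + b3 * hh

# h_t from the estimated skedastic function, then weight by 1/sqrt(h_t).
h = np.exp(0.055 * x + 0.12 * hh)
w = np.sqrt(h)

# Transformed regression: y/w on 1/w, x/w, hh/w with no intercept.
X = np.column_stack([1.0 / w, x / w, hh / w])
beta_hat, *_ = np.linalg.lstsq(X, y / w, rcond=None)
print(np.round(beta_hat, 3))  # close to [2.2, 0.16, 1.1]
```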
    Variable       GLS      LS (White)
    constant     1.682           2.243
                (1.760)         (2.270)
    x_t          0.160           0.165
                (0.032)         (0.029)
    n_t          1.364           1.145
                (0.285)         (0.436)
11.10 (a) The graphs for plotting the residuals against income and age show that the absolute values of the
residuals increase as income increases but they appear to be constant as age increases. This
indicates that the error variance depends on income.
(b) Since the residual plot shows that the error variance may increase when income increases, and
this is a reasonable outcome since greater income implies greater flexibility in travel, we set the
null and alternative hypotheses as H₀: σ₁² = σ₂² against H₁: σ₁² > σ₂². The test statistic is

    GQ = σ̂₁²/σ̂₂² = [(2.9471×10⁷)/(100−4)] / [(1.0479×10⁷)/(100−4)] = 2.8124

The 5% critical value for (96, 96) degrees of freedom is Fc = 1.35. Thus, we reject H₀ and
conclude that the error variance depends on income.
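A quick check of this ratio (the common degrees-of-freedom divisors cancel):

```python
# Arithmetic check of the Goldfeld-Quandt statistic in 11.10(b), using the
# sums of squared errors reported in the text; each subsample has 100
# observations and 4 estimated parameters, so both divisors are 96.
sse_high = 2.9471e7   # sum of squared errors, high-income subsample
sse_low = 1.0479e7    # sum of squared errors, low-income subsample

GQ = (sse_high / 96) / (sse_low / 96)
print(round(GQ, 4))   # 2.8124
```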