Answers: Chapters 9, 11 and 12, Suggested Problems

9.2

(a) Considering each of the coefficients in turn, we have the following interpretations.

Intercept: At the beginning of the time period over which observations were taken, on a day which is not a Friday, a Saturday or a holiday, and which has neither a full moon nor a new moon, the average number of emergency room cases was 94.

T: The average number of emergency room cases has been increasing by 0.0338 per day.

HOLIDAY: The average number of emergency room cases goes up by 13.9 on holidays.

FRI and SAT: The average number of emergency room cases goes up by 6.9 and 10.6 on Fridays and Saturdays, respectively.

FULLMOON: The average number of emergency room cases goes up by 2.45 on days when there is a full moon. However, a null hypothesis stating that a full moon has no influence on the number of emergency room cases would not be rejected (p = 0.5382).

NEWMOON: The average number of emergency room cases goes up by 6.4 on days when there is a new moon. However, a null hypothesis stating that a new moon has no influence on the number of emergency room cases would not be rejected (p = 0.1338).

(b)

The REG Procedure
Model: MODEL1
Dependent Variable: calls

Analysis of Variance
                           Sum of         Mean
Source            DF      Squares       Square    F Value    Pr > F
Model              6   5693.37691    948.89615       7.77    <.0001
Error            222        27109    122.11182
Corrected Total  228        32802

Root MSE           11.05042    R-Square    0.1736
Dependent Mean    100.56769    Adj R-Sq    0.1512
Coeff Var          10.98804

Parameter Estimates
                 Parameter    Standard
Variable   DF     Estimate       Error    t Value    Pr > |t|
Intercept   1     93.69583     1.55916      60.09      <.0001
t           1      0.03380     0.01105       3.06      0.0025
hol         1     13.86293     6.44517       2.15      0.0326
fri         1      6.90978     2.11132       3.27      0.0012
sat         1     10.58940     2.11843       5.00      <.0001
full        1      2.45445     3.98092       0.62      0.5382
new         1      6.40595     4.25689       1.50      0.1338

(c) The null and alternative hypotheses are

$H_0: \beta_6 = \beta_7 = 0$
$H_1: \beta_6$ or $\beta_7$ is nonzero.
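These hypotheses are tested with the restricted F statistic $F = [(SSE_R - SSE_U)/J]/[SSE_U/(T-K)]$, here with J = 2 restrictions, T = 229 observations and K = 7 coefficients. The following is a minimal Python sketch of that arithmetic (the answers themselves use SAS; the function names are hypothetical). Because the numerator has exactly 2 degrees of freedom, it can also use the closed-form tail probability $P(F_{(2,\nu)} > f) = (1 + 2f/\nu)^{-\nu/2}$; this shortcut does not hold for other numerator degrees of freedom.

```python
def restricted_f(sse_r, sse_u, j, t, k):
    """F statistic for J linear restrictions: [(SSE_R - SSE_U)/J] / [SSE_U/(T-K)]."""
    return ((sse_r - sse_u) / j) / (sse_u / (t - k))


def f2_pvalue(f, df2):
    """Upper-tail p-value of F(2, df2); this closed form is valid only for 2 numerator df."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


def f2_critical(alpha, df2):
    """Critical value c with P(F(2, df2) > c) = alpha, from inverting the closed form."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


T, K, J = 229, 7, 2
f = restricted_f(sse_r=27424, sse_u=27109, j=J, t=T, k=K)
p = f2_pvalue(f, T - K)
fc = f2_critical(0.05, T - K)
print(f"F = {f:.2f}, p = {p:.3f}, 5% critical value = {fc:.2f}")
```

With the rounded SSE values this prints F = 1.29 and p = 0.277, in line with the SAS test output; the closed form puts the 5% critical value near 3.04, slightly below the approximate table value of 3.07 used in the answer.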
The test statistic is

$$F = \frac{(SSE_R - SSE_U)/2}{SSE_U/(229 - 7)}$$

where $SSE_R = 27424$ is the sum of squared errors from the estimated equation with FULLMOON and NEWMOON omitted and $SSE_U = 27109$ is the sum of squared errors from the estimated equation with these variables included. The calculated value of the F statistic is 1.29, with a corresponding p-value of 0.277. This p-value came from the SAS output (see below). Alternatively, you can compare the statistic with the F critical value of approximately 3.07 at the 5% level of significance. Thus, we do not reject the null hypothesis that new and full moons have no impact on the number of emergency room cases.

Here is the restricted regression:

The REG Procedure
Model: MODEL2
Dependent Variable: calls

Analysis of Variance
                           Sum of         Mean
Source            DF      Squares       Square    F Value    Pr > F
Model              4   5378.00978   1344.50245      10.98    <.0001
Error            224        27424    122.42942
Corrected Total  228        32802

Root MSE           11.06478    R-Square    0.1640
Dependent Mean    100.56769    Adj R-Sq    0.1490
Coeff Var          11.00232

Parameter Estimates
                 Parameter    Standard
Variable   DF     Estimate       Error    t Value    Pr > |t|
Intercept   1     94.02146     1.54585      60.82      <.0001
t           1      0.03383     0.01107       3.06      0.0025
hol         1     13.61679     6.45107       2.11      0.0359
fri         1      6.84914     2.11367       3.24      0.0014
sat         1     10.34207     2.11533       4.89      <.0001

With J = 2 restrictions, T = 229 observations and K = 7 coefficients in the unrestricted model:

$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(T-K)} = \frac{(27424 - 27109)/2}{27109/(229 - 7)} = \frac{157.5}{122.11} = 1.29$$

Note: the following SAS code will do the restricted F-test automatically:

proc reg;
  model calls = t hol fri sat full new;
  test full=0, new=0;
run;

Here is the output; see the F statistic at the bottom.
The REG Procedure
Model: MODEL1
Test 1 Results for Dependent Variable calls

                             Mean
Source           DF        Square    F Value    Pr > F
Numerator         2     157.68356       1.29    0.2770
Denominator     222     122.11182

11.1 See the class handout from April 27th; we did all of these transformations in class. In each case we divide the model by the standard deviation of the error, which is the square root of the variance. We can ignore the $\sigma$ term in all of the models since it doesn't vary over observations.

- $\mathrm{var}(e_t) = \sigma^2 x_t^{1/2}$: the standard deviation is $\sigma x_t^{1/4}$, so divide the model by $x_t^{1/4}$. The independent variables become $1/x_t^{1/4}$ and $x_t/x_t^{1/4}$.
- $\mathrm{var}(e_t) = \sigma^2 x_t$: this is just like the one we did in class. The standard deviation is $\sigma x_t^{1/2}$, so divide the model by $x_t^{1/2}$. The independent variables become $1/x_t^{1/2}$ and $x_t/x_t^{1/2}$.
- $\mathrm{var}(e_t) = \sigma^2 x_t^2$: the standard deviation is $\sigma x_t$, so divide the model by $x_t$. The independent variables become $1/x_t$ plus an intercept (since $x_t/x_t = 1$).
- $\mathrm{var}(e_t) = \sigma^2 \ln(x_t)$: the standard deviation is $\sigma (\ln x_t)^{1/2}$, so divide the model by $(\ln x_t)^{1/2}$. The independent variables become $1/(\ln x_t)^{1/2}$ and $x_t/(\ln x_t)^{1/2}$.

11.2 Code and SAS output appear below.

(a) Countries with high per capita income can decide whether to spend larger amounts on education than their poorer neighbours, or to spend more of their larger income on other things. They are likely to have more discretion with respect to where public monies are spent.
On the other hand, countries with low per capita income may regard a particular level of education spending as essential, meaning that they have less scope for deviating from a mean function. These differences can be captured by a model with heteroskedasticity. Remember that heteroskedasticity is more common in cross-section data.

(b) The least squares estimated function is

$$\hat{y}_t = -0.1246 + 0.07317\,x_t \qquad R^2 = 0.862$$

with standard errors (0.0485) and (0.00518) for the intercept and slope, respectively.

This function and the corresponding residuals appear in Figure 11.1. The absolute magnitude of the errors does tend to increase as x increases, suggesting the existence of heteroskedasticity.

[Figure 11.1 Estimated Function for Education Expenditure: $y_t$ plotted against $x_t$ with the fitted line y = -0.1246 + 0.0732x]

(c) Since it is suspected that, if heteroskedasticity exists, the variance is related to $x_t$, we begin by ordering the observations according to the magnitude of $x_t$. Then, splitting the sample into two equal subsets of 17 observations each and applying least squares to each subset, we obtain $\hat\sigma_1^2 = 0.0081608$ and $\hat\sigma_2^2 = 0.029127$, leading to a Goldfeld-Quandt statistic of

$$GQ = \frac{\hat\sigma_2^2}{\hat\sigma_1^2} = \frac{0.029127}{0.008161} = 3.569$$

The critical value from an F-distribution with (15, 15) degrees of freedom and a 5% significance level is $F_c = 2.40$. Since 3.569 > 2.40, we reject the null hypothesis of homoskedasticity and conclude that the error variance is directly related to per capita income $x_t$.

(e) Generalized least squares estimation under the assumption $\mathrm{var}(e_t) = \sigma^2 x_t$ yields

$$\hat{y}_t = -0.0929 + 0.06932\,x_t$$

with standard errors (0.0289) and (0.00441). (Note: I have expressed these results in the model's original form, although it was estimated with no intercept and two independent variables: the reciprocal of the square root of x, and x over the square root of x.) The estimated response of per capita education expenditure to per capita income has declined slightly relative to the least squares estimate. The associated 95% confidence interval is (0.0603, 0.0783).
This interval is narrower than both of the intervals computed from the least squares estimates. The comparison with the White-based interval suggests that generalized least squares is more efficient; a comparison with the conventional least squares interval is not really valid, because the standard errors used to compute that interval are not valid under heteroskedasticity.

See below for the case where $\mathrm{var}(e_t) = \sigma^2 x_t^2$. The differences in how this model is estimated and how its results are interpreted are important.

Part B: least squares results

The REG Procedure
Model: MODEL1
Dependent Variable: y

Analysis of Variance
                          Sum of        Mean
Source            DF     Squares      Square    F Value    Pr > F
Model              1     3.68386     3.68386     199.59    <.0001
Error             32     0.59063     0.01846
Corrected Total   33     4.27449

Root MSE          0.13586    R-Square    0.8618
Dependent Mean    0.47674    Adj R-Sq    0.8575
Coeff Var        28.49753

Parameter Estimates
                 Parameter    Standard
Variable   DF     Estimate       Error    t Value    Pr > |t|
Intercept   1     -0.12457     0.04852      -2.57      0.0151
x           1      0.07317     0.00518      14.13      <.0001

This is part D. The White standard error for $b_2$ is the square root of 0.0000363146 (the x, x element of the consistent covariance matrix that SAS prints with the ACOV option), which is 0.0060. This is larger than the 0.00518 reported by least squares.
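Part D's White standard error and the interval comparison in the discussion can be reproduced with a short Python sketch. It assumes the HC0 form of White's estimator for a regression with one explanatory variable, $\widehat{\mathrm{var}}(b_2) = \sum (x_t - \bar{x})^2 \hat{e}_t^2 / [\sum (x_t - \bar{x})^2]^2$; the function names are hypothetical, and the interval inputs are the estimates, standard errors and $t_c = 2.03693$ printed in the SAS output.

```python
import math


def white_se_slope(x, y):
    """White (HC0) standard error of b2 in y = b1 + b2*x + e.

    var(b2) = sum((x_t - xbar)^2 * ehat_t^2) / (sum((x_t - xbar)^2))^2
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    dev = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dev)
    b2 = sum(d * (yi - ybar) for d, yi in zip(dev, y)) / sxx
    b1 = ybar - b2 * xbar
    ehat = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]   # least squares residuals
    return math.sqrt(sum(d * d * e * e for d, e in zip(dev, ehat)) / sxx ** 2)


def interval(estimate, se, tc=2.03693):
    """95% interval estimate +/- tc*se; tc = t(0.975, 32) as printed by SAS."""
    return estimate - tc * se, estimate + tc * se


intervals = {
    "LS": interval(0.07317, 0.00518),                      # conventional least squares
    "White": interval(0.07317, math.sqrt(0.0000363146)),  # White standard error (part D)
    "GLS": interval(0.06932, 0.00441),                     # GLS, var(e_t) = sigma^2 * x_t
}
for name, (lo, hi) in intervals.items():
    print(f"{name:6s} ({lo:.4f}, {hi:.4f})  width {hi - lo:.4f}")
```

The three intervals have widths of roughly 0.021 (least squares), 0.025 (White) and 0.018 (GLS), which is the ordering the discussion describes.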
Consistent Covariance of Estimates

Variable          Intercept               x
Intercept      0.0015372135    -0.000211654
x              -0.000211654    0.0000363146

This regression gets you the numerator for the GQ statistic:

The REG Procedure
Model: MODEL1
Dependent Variable: y

Analysis of Variance
                          Sum of        Mean
Source            DF     Squares      Square    F Value    Pr > F
Model              1     0.42220     0.42220      14.50    0.0017
Error             15     0.43690     0.02913
Corrected Total   16     0.85910

Root MSE          0.17067    R-Square    0.4914
Dependent Mean    0.78115    Adj R-Sq    0.4575
Coeff Var        21.84803

Parameter Estimates
                 Parameter    Standard
Variable   DF     Estimate       Error    t Value    Pr > |t|
Intercept   1     -0.14087     0.24569      -0.57      0.5749
x           1      0.07516     0.01974       3.81      0.0017

This regression gets you the denominator for the GQ statistic:

The REG Procedure
Model: MODEL1
Dependent Variable: y

Analysis of Variance
                          Sum of        Mean
Source            DF     Squares      Square    F Value    Pr > F
Model              1     0.14225     0.14225      17.43    0.0008
Error             15     0.12241     0.00816
Corrected Total   16     0.26466

Root MSE          0.09034    R-Square    0.5375
Dependent Mean    0.17232    Adj R-Sq    0.5066
Coeff Var        52.42382

Parameter Estimates
                 Parameter    Standard
Variable   DF     Estimate       Error    t Value    Pr > |t|
Intercept   1     -0.03807     0.05495      -0.69      0.4990
x           1      0.05047     0.01209       4.17      0.0008

These are the critical values:

The SAS System
Obs         fc         tc
  1    2.40345    2.03693

This regression corrects for heteroskedasticity of the form $\mathrm{var}(e_t) = \sigma^2 x_t$:

The REG Procedure
Model: MODEL1
Dependent Variable: ystar
NOTE: No intercept in model. R-Square is redefined.
Analysis of Variance
                            Sum of        Mean
Source              DF     Squares      Square    F Value    Pr > F
Model                2     0.96083     0.48041     242.45    <.0001
Error               32     0.06341     0.00198
Uncorrected Total   34     1.02423

Root MSE          0.04451    R-Square    0.9381
Dependent Mean    0.15116    Adj R-Sq    0.9342
Coeff Var        29.44875

Parameter Estimates
                 Parameter    Standard
Variable   DF     Estimate       Error    t Value    Pr > |t|
x1star      1     -0.09292     0.02890      -3.21      0.0030
x2star      1      0.06932     0.00441      15.71      <.0001

We predict that if GDP per capita increases by 1 dollar, public expenditure on education per capita will increase by 0.069 dollars.

This regression corrects for heteroskedasticity of the form $\mathrm{var}(e_t) = \sigma^2 x_t^2$:

The REG Procedure
Model: MODEL1
Dependent Variable: ystar

Analysis of Variance
                          Sum of        Mean
Source            DF     Squares      Square    F Value    Pr > F
Model              1     0.00349     0.00349      12.69    0.0012
Error             32     0.00880  0.00027504
Corrected Total   33     0.01229

Root MSE          0.01658    R-Square    0.2840
Dependent Mean    0.05153    Adj R-Sq    0.2616
Coeff Var        32.18259

Parameter Estimates
                 Parameter    Standard
Variable   DF     Estimate       Error    t Value    Pr > |t|
Intercept   1      0.06443     0.00460      13.99      <.0001
xstar       1     -0.06739     0.01892      -3.56      0.0012

We predict that if GDP per capita increases by 1 dollar, public expenditure on education per capita will increase by 0.064 dollars, because the intercept in this transformed model is actually the slope coefficient of the original model.
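The 11.1 transformations and the two GLS regressions above share one recipe: divide every term of $y_t = \beta_1 + \beta_2 x_t + e_t$ by $\sqrt{h(x_t)}$ when $\mathrm{var}(e_t) = \sigma^2 h(x_t)$, then fit the transformed model with no intercept. A minimal pure-Python sketch of that recipe follows; the helper names are hypothetical and the code assumes $h(x_t) > 0$ for every observation.

```python
import math


def ols_no_intercept_2(y, z1, z2):
    """Least squares of y on regressors z1 and z2 with no intercept (2x2 normal equations)."""
    a11 = sum(u * u for u in z1)
    a12 = sum(u * v for u, v in zip(z1, z2))
    a22 = sum(v * v for v in z2)
    c1 = sum(u * w for u, w in zip(z1, y))
    c2 = sum(v * w for v, w in zip(z2, y))
    det = a11 * a22 - a12 * a12
    return (a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det


def gls_simple(y, x, h):
    """GLS for y_t = b1 + b2*x_t + e_t with var(e_t) = sigma^2 * h(x_t).

    Every term is divided by sqrt(h(x_t)), the error standard deviation up to
    sigma, and the transformed model is fitted with no intercept. Returns
    (b1, b2) in the original model's form.
    """
    w = [1.0 / math.sqrt(h(xi)) for xi in x]
    ystar = [yi * wi for yi, wi in zip(y, w)]
    x1star = w                                   # transformed constant term
    x2star = [xi * wi for xi, wi in zip(x, w)]   # transformed x
    return ols_no_intercept_2(ystar, x1star, x2star)
```

Passing `h=lambda v: v` mirrors the `noint` SAS run with x1star and x2star. With `h=lambda v: v * v` the transformed x column is identically 1, so the coefficient returned as b2 is what SAS reports as the intercept of `model ystar = xstar;`, which is why that intercept (0.06443) is read as the slope of the original model.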
Here is the code that generated the results:

data pubexp;
  infile 'A:pubexp.dat' firstobs=2;
  input ee gdp pop;
  y = ee/pop;   /* per capita education expenditure */
  x = gdp/pop;  /* per capita GDP */
run;
proc sort data=pubexp;
  by descending x;
run;

/* Part B: least squares; ACOV adds the consistent covariance matrix used in part D */
proc reg data=pubexp;
  model y = x / acov;
  output out=pubout p=yhat r=ehat;
run;

/* Figure 11.1 and the residual plot */
symbol1 i=join c=green v=circle;
proc gplot data=pubout;
  plot y*x='*' yhat*x=1 / overlay;
  plot ehat*x;
run;
quit;

/* Part C: Goldfeld-Quandt subsamples (data are sorted by descending x) */
data one;
  set pubexp;
  if _n_ <= 17;
run;
proc reg data=one;
  model y = x;
run;
data two;
  set pubexp;
  if _n_ >= 18;
run;
proc reg data=two;
  model y = x;
run;

/* critical values for the tests */
data crit;
  fc = finv(.95, 15, 15);
  tc = tinv(.975, 32);
run;
proc print data=crit;
run;

/* Part E: GLS via weighted least squares, assuming var(e_t) = sigmasq*x_t */
data gls1;
  set pubexp;
  ystar  = y/sqrt(x);
  x1star = 1/sqrt(x);
  x2star = x/sqrt(x);
run;
proc reg data=gls1;
  model ystar = x1star x2star / noint;
run;

/* GLS assuming var(e_t) = sigmasq*x_t**2 */
data gls2;
  set pubexp;
  ystar = y/x;
  xstar = 1/x;
run;
proc reg data=gls2;
  model ystar = xstar;
run;

11.7

(a) The least squares estimates of equation (11.7.5) are

$$\hat{y}_t = 2.243 + 0.164\,x_t + 1.145\,n_t \qquad R^2 = 0.45$$

with standard errors (2.669), (0.035) and (0.414). These results suggest that an increase in income of 1000 dollars will increase food expenditure by 164 dollars, and an additional person in the household will increase food expenditure by 1,145 dollars. Both estimated slope coefficients are significantly different from zero.

(b) See Figures 11.2 and 11.3. Overall, the residuals tend to increase in absolute value as x increases and as n increases. Thus, the plots suggest the existence of heteroskedasticity that depends on both $x_t$ and $n_t$.

[Figure 11.2 Residuals Plotted Against Income: RESID against X]

[Figure 11.3 Residuals Plotted Against Number of Persons: RESID against N]

(c) (i) To perform the first Goldfeld-Quandt test we order the observations according to decreasing values of $x_t$.
Then we estimate $y_t = \beta_1 + \beta_2 x_t + \beta_3 n_t + e_t$ by least squares for both the first and second halves of the observations, to obtain estimates $\hat\sigma_1^2$ and $\hat\sigma_2^2$, respectively. We find that $\hat\sigma_1^2 = 31.129$ and $\hat\sigma_2^2 = 5.8819$. Although we are not hypothesizing constant error variances within each subsample, to perform the Goldfeld-Quandt test we proceed as if $H_0$ and $H_1$ are given by $H_0: \sigma_1^2 = \sigma_2^2$ and $H_1: \sigma_1^2 > \sigma_2^2$. The test statistic value is

$$GQ = \frac{\hat\sigma_1^2}{\hat\sigma_2^2} = \frac{31.129}{5.8819} = 5.2923$$

The 5% critical value for (16, 16) degrees of freedom is $F_c = 2.33$. Thus, because $GQ = 5.2923 > F_c = 2.33$, we reject $H_0$ and conclude that heteroskedasticity exists and depends on $x_t$.

(ii) When we order the observations with respect to $n_t$, there is not a unique ordering because $n_t$ takes on repeated integer values. There are 8 observations where $n_t = 3$. One of these values must be included in the first 19 observations, and the other 7 in the last 19 observations. There are 8 ways of doing this. The result from SAS is

$$GQ = \frac{\hat\sigma_1^2}{\hat\sigma_2^2} = \frac{28.233}{9.799} = 2.88$$

This value is greater than 2.33, so we reject the null hypothesis of homoskedasticity and conclude that the error variances depend on $n_t$. This test outcome is consistent with the evidence provided by the residual plots in part (b).

(d) The alternative variance estimators yield the following standard errors:

Coefficient    White     Least Squares
$\beta_2$      0.0287    0.0354
$\beta_3$      0.4360    0.4140

The results from White's variance estimator suggest that the usual least squares results would underestimate the reliability of estimation for $\beta_2$ and overestimate the reliability of estimation for $\beta_3$.

(e) To find generalized least squares estimates when

$$\sigma_t^2 = \sigma^2 h_t = \sigma^2 \exp(0.055\,x_t + 0.12\,n_t)$$

we begin by calculating $h_t$ for each observation. Then we apply least squares to the transformed model

$$\frac{y_t}{\sqrt{h_t}} = \beta_1 \frac{1}{\sqrt{h_t}} + \beta_2 \frac{x_t}{\sqrt{h_t}} + \beta_3 \frac{n_t}{\sqrt{h_t}} + \frac{e_t}{\sqrt{h_t}}$$

The resulting estimates, together with those from least squares and the White standard errors, are in the table below.
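A minimal Python sketch of this GLS procedure, assuming the variance parameters 0.055 and 0.12 given above and a design with a constant, $x_t$ and $n_t$. It weights each observation by $1/\sqrt{h_t}$ and solves the normal equations of the transformed model with a small Gaussian elimination routine; the function names are hypothetical, and the food expenditure data themselves are not included here.

```python
import math


def solve(a, b):
    """Solve the small linear system a @ beta = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, size):
            f = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * size
    for r in range(size - 1, -1, -1):                 # back substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, size))
        beta[r] = (m[r][size] - s) / m[r][r]
    return beta


def gls_exp_variance(y, x, n, a_x=0.055, a_n=0.12):
    """GLS when sigma_t^2 = sigma^2 * exp(a_x*x_t + a_n*n_t).

    Each observation (1, x_t, n_t, y_t) is divided by sqrt(h_t), then ordinary
    least squares is applied to the transformed data. Returns [b1, b2, b3].
    """
    rows, ys = [], []
    for xt, nt, yt in zip(x, n, y):
        w = 1.0 / math.sqrt(math.exp(a_x * xt + a_n * nt))
        rows.append([w, xt * w, nt * w])
        ys.append(yt * w)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, ys)) for i in range(3)]
    return solve(xtx, xty)
```

Any weighted least squares routine that accepts observation weights $1/h_t$ would give the same estimates; the explicit transformation is shown only because it matches the equation above.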
The two estimates for $\beta_2$ are similar, but the GLS estimate for the response of food expenditure to an additional household member is noticeably higher. The standard errors suggest that $\beta_1$ and $\beta_3$ have been more precisely estimated by GLS, but not $\beta_2$. However, we do need to keep in mind that standard errors are square roots of estimated variances. It is possible for an improvement in precision to take place even when it is not reflected by the standard errors.

Variable     GLS              LS (White)
constant     1.682 (1.760)    2.243 (2.270)
$x_t$        0.160 (0.032)    0.165 (0.029)
$n_t$        1.364 (0.285)    1.145 (0.436)

11.10

(a) The graphs plotting the residuals against income and against age show that the absolute values of the residuals increase as income increases, but they appear to be constant as age increases. This indicates that the error variance depends on income.

(b) Since the residual plot shows that the error variance may increase when income increases, and this is a reasonable outcome since greater income implies greater flexibility in travel, we set the null and alternative hypotheses as $H_0: \sigma_1^2 = \sigma_2^2$ against $H_1: \sigma_1^2 > \sigma_2^2$. The test statistic is

$$GQ = \frac{\hat\sigma_1^2}{\hat\sigma_2^2} = \frac{(2.9471 \times 10^7)/(100 - 4)}{(1.0479 \times 10^7)/(100 - 4)} = 2.8124$$

The 5% critical value for (96, 96) degrees of freedom is $F_c = 1.35$. Thus, we reject $H_0$ and conclude that the error variance depends on income.
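The Goldfeld-Quandt statistics in 11.2(c), 11.7(c) and 11.10(b) are all ratios of two subsample error-variance estimates, each an SSE divided by its degrees of freedom. The Python sketch below computes that ratio from SSEs and, for a model with a single explanatory variable only, directly from data by sorting on x and fitting each half. The function names are hypothetical, and the critical value still has to come from an F table or from SAS's finv.

```python
def ols_simple_sse(x, y):
    """SSE from the least squares fit of y = b1 + b2*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


def gq_from_sse(sse_high, df_high, sse_low, df_low):
    """Goldfeld-Quandt statistic: ratio of the two subsample error-variance estimates."""
    return (sse_high / df_high) / (sse_low / df_low)


def goldfeld_quandt(x, y):
    """Sort by x, split into two equal halves and fit y = b1 + b2*x to each half.

    Returns sigma2_hat(high-x half) / sigma2_hat(low-x half); each half has
    half - 2 degrees of freedom. With an odd sample size the middle
    observation is dropped.
    """
    pairs = sorted(zip(x, y))
    half = len(pairs) // 2
    low, high = pairs[:half], pairs[-half:]
    sse_low = ols_simple_sse(*zip(*low))
    sse_high = ols_simple_sse(*zip(*high))
    return gq_from_sse(sse_high, half - 2, sse_low, half - 2)


# 11.10(b): both halves have 100 - 4 degrees of freedom, so they cancel.
print(round(gq_from_sse(2.9471e7, 100 - 4, 1.0479e7, 100 - 4), 4))
```

For 11.10(b) this prints 2.8124, the value in the answer, because the two (100 - 4) degrees-of-freedom terms cancel and only the ratio of the SSEs remains.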