A SAS® Macro for the Box-Cox Transformation: Estimation and Testing

Charles Hallahan, Economic Research Service, USDA

"An important difference between theoretical statistics and applied statistics is that in the former a probability model is taken as a given starting point, whereas in applied statistics a model is often selected with the aid of the data" (see [10]). After selecting the variables to be included in a regression model, the functional form needs to be determined. Linear model specification may involve a transformation of the variables in the model so that the errors are independent, normally distributed and homoskedastic. The Box-Cox family of power transformations encompasses log transforms, square roots, reciprocals, and no transformation. A SAS macro, BOXCOX, has been written to estimate the power parameter λ, perform a modified Anderson-Darling test for normality of the residuals, graph the residuals via a frequency polygon, perform Ramsey's RESET specification test, and calculate elasticities for the OLS, log-linear and transformed models.

1. Introduction

Among the assumptions usually made for the classical linear regression model are linearity, constant variance, and normality of the residuals. Tukey's suggestion of a family of power transformations to achieve these goals was later modified by Box and Cox [2]. Given a variable z, define $z^{(\lambda)}$ as follows:

$$ z^{(\lambda)} = \begin{cases} (z^{\lambda} - 1)/\lambda, & \lambda \neq 0,\\ \log(z), & \lambda = 0. \end{cases} $$

For the regression model with dependent variable y and explanatory variables $x_1, \dots, x_k$ (where $x_1 = 1$), a value of λ is sought so that

$$ y_i^{(\lambda)} = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i, \quad i = 1, \dots, n, \qquad (1) $$
$$ \operatorname{var}(\varepsilon_i) = \sigma^2, \qquad (2) $$
$$ \varepsilon \sim N(0, \sigma^2 I). \qquad (3) $$

2. Estimation Method

The BOXCOX macro uses the notation and estimation technique described in Draper and Smith [6], pp. 225-235. Write the transformed model as

$$ w = X\beta + \varepsilon, \qquad w_i = y_i^{(\lambda)}, \quad i = 1, \dots, n, $$

where w is n × 1 and X is n × k. Here, we are assuming that the x's are not transformed. If λ is such that ε is $N(0, \sigma^2 I)$, then the likelihood function can be maximized to determine the maximum likelihood estimators (MLEs) for β, σ², and λ.
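A minimal Python sketch of the transformation, and of the grid-search maximization of the likelihood over λ that the macro uses (following Draper and Smith [6]), may make the estimation method concrete. It is illustrative only and is not the macro's SAS/IML code: the function names (`boxcox`, `ols_rss`, `lmax`, `grid_search`), the small normal-equations solver and the synthetic data are all assumptions. For each λ on the grid it regresses $y^{(\lambda)}$ on X and evaluates the concentrated log likelihood $L_{\max}(\lambda) = -\tfrac{n}{2}\log\hat\sigma^2(\lambda) + (\lambda - 1)\sum_i \log y_i$ with $\hat\sigma^2(\lambda) = \mathrm{RSS}(\lambda)/n$. Using the mean square error RSS/(n - k) instead shifts every value by the same constant, so the maximizing λ does not change. The default range and step mirror the macro's _L, _U and _INCR defaults.

```python
import math


def boxcox(z, lam):
    """Box-Cox transform z^(lam) of a positive value z.

    (z**lam - 1) / lam for lam != 0, and log(z) for lam == 0,
    which is the limit of the first branch as lam -> 0.
    """
    if z <= 0:
        raise ValueError("the Box-Cox transform needs z > 0")
    return math.log(z) if lam == 0 else (z ** lam - 1.0) / lam


def ols_rss(X, w):
    """Residual sum of squares from the least squares regression of w on X.

    X is a list of rows (each row starts with the constant 1). The normal
    equations (X'X) b = X'w are solved by Gauss-Jordan elimination with
    partial pivoting, which is adequate for a handful of regressors.
    """
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * wi for row, wi in zip(X, w))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((wi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, wi in zip(X, w))


def lmax(lam, X, y):
    """Concentrated log likelihood -(n/2) log sigma2_hat + (lam - 1) sum(log y)."""
    n = len(y)
    w = [boxcox(yi, lam) for yi in y]
    sigma2_hat = ols_rss(X, w) / n
    return -0.5 * n * math.log(sigma2_hat) + (lam - 1.0) * sum(math.log(yi) for yi in y)


def grid_search(X, y, lo=-2.0, hi=2.0, incr=0.1):
    """Evaluate lmax at lo, lo + incr, ..., hi; return (lambda_hat, Lmax)."""
    steps = int(round((hi - lo) / incr))
    grid = [round(lo + s * incr, 10) for s in range(steps + 1)]
    best_value, best_lam = max((lmax(lam, X, y), lam) for lam in grid)
    return best_lam, best_value


if __name__ == "__main__":
    print(boxcox(9.0, 1.0))   # 8.0  (no transformation, only a shift: z - 1)
    print(boxcox(9.0, 0.5))   # 4.0  (rescaled square root: 2*(sqrt(z) - 1))
    print(boxcox(4.0, -1.0))  # 0.75 (rescaled reciprocal: 1 - 1/z)
    # Synthetic data (not from the paper): log(y) is linear in x apart from a
    # small deterministic disturbance, so lambda_hat should be at or near 0.
    xs = [float(i) for i in range(1, 21)]
    y = [math.exp(1.0 + 0.3 * x + 0.05 * math.sin(3.0 * x)) for x in xs]
    X = [[1.0, x] for x in xs]
    lam_hat, best = grid_search(X, y)
    print(f"lambda_hat = {lam_hat}, Lmax = {best:.3f}")
```

Building the grid from integer steps keeps λ = 0 exactly on the grid, so the log branch is used there rather than a numerically awkward tiny λ.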
The log likelihood is:

$$ L(\beta, \sigma^2, \lambda \mid y, X) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(w - X\beta)'(w - X\beta) + (\lambda - 1)\sum_{i=1}^{n}\log(y_i). $$

Note that we must have y > 0 in order for $y^{(\lambda)}$ to be generally defined for values of λ different from 1. A more general case is to allow each $x_j$ to be similarly transformed with its own $\lambda_j$, j = 2, ..., k. The intermediate situation of allowing the x's to be transformed by the same λ as that used for y is an option in the BOXCOX macro described below. The special case of transforming both y and the x's by λ = 0 is the so-called log-linear specification, of interest to economists.

As described in [6], a straightforward way to maximize L is to let λ vary over some range, say -2 to 2. Conditional on a value for λ in this range, estimate β and σ² by least squares and form

$$ L_{\max}(\lambda) = -\tfrac{1}{2}\, n \log \hat\sigma^2(\lambda) + \log J(\lambda, y), $$

where $\hat\sigma^2(\lambda)$ is the usual mean square error and $J(\lambda, y)$ is the Jacobian of the transformation from y to w, i.e. $J(\lambda, y) = \prod_{i=1}^{n} y_i^{\lambda - 1}$. Incrementing λ over the interval, the value of λ, say $\hat\lambda$, that produces the maximum value of $L_{\max}(\lambda)$ is the MLE of λ. In [18], Spitzer develops a modified Newton method to perform the maximum likelihood estimation. The grid search method described above is used in the macro because of its simplicity. For a discussion of the Box-Cox transformation in an econometric context, see Fomby et al [8], pp. 423-431.

Standard errors for the $\beta_j$'s and σ² can be calculated either conditionally on $\hat\lambda$ or unconditionally, where λ is considered to be estimated along with the $\beta_j$'s and σ². Conditional standard error estimates are what would be obtained if λ were first determined, the data transformed, and the transformed model estimated by OLS in a statistical package with $\hat\lambda$ treated as known. The BOXCOX macro calculates unconditional standard errors by using the Berndt-Hall-Hall-Hausman (BHHH) estimator of the information matrix. Letting $\theta = (\beta', \sigma^2, \lambda)'$, the information matrix is defined as:

$$ I_\theta = -E\left[\frac{\partial^2 L}{\partial\theta\,\partial\theta'}\right]. $$

Then $\left[-\partial^2 L/\partial\theta\,\partial\theta'\right]^{-1}$, evaluated at $\theta = \hat\theta$, is a consistent estimator of the variance-covariance matrix of the MLE of θ. The BHHH method estimates $I_\theta$ with $\sum_{i=1}^{n} (\partial L_i/\partial\theta)(\partial L_i/\partial\theta)'$, where $L_i$ is the contribution of observation i to the log likelihood. See Fomby et al [8], p. 429, for the necessary derivatives.

The question of whether inference on the β's should be unconditional or conditioned on the estimated λ is debated at length in a series of papers by Bickel and Doksum [1], Box and Cox [3], and Hinkley and Runger [10]. By calculating both conditional and unconditional standard errors for the β's, users of the macro can make their own choice.

Economists are interested in elasticities. The elasticity of y with respect to an explanatory variable $x_j$ is the percentage change in y associated with a one percent change in $x_j$, i.e., the derivative of log(y) with respect to log($x_j$). When only y has been transformed, the functional relationship between y and $x_j$ is $(y^\lambda - 1)/\lambda = x'\beta + \varepsilon$, so that $\partial y/\partial x_j = \beta_j y^{1-\lambda}$. The chain rule then leads to the elasticity of y with respect to $x_j$ being

$$ \eta_j = \frac{\partial \log y}{\partial \log x_j} = \frac{\partial y}{\partial x_j}\cdot\frac{x_j}{y} = \beta_j\, x_j\, y^{-\lambda}. $$

Elasticities are usually measured at the mean values of y and $x_j$. For the log-linear model, where λ = 0 and the x's are also transformed, the elasticity is constant at $\beta_j$.

3. Residual Checking

The hypothesis that the transformed model has (approximately) normal errors is checked by the BOXCOX macro in two ways. Graphically, a frequency polygon of the residuals, which is a smoothed histogram (see [15]), is plotted. A frequency polygon is constructed by joining the mid-bin values of a histogram with straight lines. Scott [15] shows that the frequency polygon dominates the histogram in terms of integrated mean square error. The optimal bin width is found to be $h = 2.15\, s\, n^{-1/5}$, where s is an estimate of the standard deviation from a sample of size n.

The Anderson-Darling test for normality is also performed. It belongs to the class of tests known as EDF tests, since they are based on the empirical distribution function. When the mean and standard deviation are estimated, a modified statistic $A^*$ for finite samples is preferred:

$$ A^* = A^2\left(1 + \frac{0.75}{n} + \frac{2.25}{n^2}\right), $$

where $A^2$ is the Anderson-Darling statistic. This test has been further modified in the case of Box-Cox transformed models to account for the fact that λ is also estimated; see Linnet [12]. The macro BOXCOX uses this final statistic when testing normality of the residuals from the transformed model. Finally, significance tests for skewness and kurtosis of the regression residuals are performed using the results of Kiefer and Salmon [11]. It is hoped that these tests will convey a consistent message.

4. Specification Testing

We've seen that the linear and log-linear models can be considered as special cases of the Box-Cox transformed family of models. In [9], Godfrey et al examine various tests of the linear and log-linear models and conclude that "...RESET appears to be most useful in combining relatively good power with simplicity of computation, and is therefore to be recommended". Ramsey's RESET(p) test amounts to augmenting the original regressors with powers of the predicted values from the null model (either the linear or the log-linear model) and testing the joint significance of the added variables. The "p" refers to the number of powers used. The macro BOXCOX currently includes only RESET(2).

The RESET test does not require a specific alternative. In an earlier paper, Davidson and MacKinnon [5] derived Lagrange Multiplier tests for the linear and log-linear models against the more general Box-Cox model. However, Godfrey et al included the Davidson-MacKinnon tests in their Monte Carlo analysis and concluded that "...they can be very unreliable when disturbances are not normally distributed". Consequently, these tests were not included in BOXCOX.
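The RESET construction can be sketched in Python as follows. The sketch is illustrative only: the helper names and the synthetic data are assumptions, and the added regressors are taken to be $\hat y^2, \dots, \hat y^{p+1}$, which is one reading of "the number of powers used"; the exact powers behind the macro's RESET(2) are not listed in the text. Only the F statistic is returned, because its p-value needs the F(p, n - k - p) distribution, which the Python standard library does not provide.

```python
import math


def ols(X, y):
    """Least squares fit of y on the columns of X (each row starts with 1).

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination with
    partial pivoting and returns (fitted values, residual sum of squares).
    """
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return fitted, rss


def reset_f(X, y, num_powers=2):
    """F statistic for Ramsey's RESET: augment X with powers 2, ..., num_powers + 1
    of the fitted values and test the added columns jointly."""
    n, k = len(X), len(X[0])
    fitted, rss_restricted = ols(X, y)
    # Standardizing the fitted values before taking powers leaves the augmented
    # column space (hence the F statistic) unchanged, because the constant and
    # yhat already lie in the span of X; it only improves numerical conditioning.
    m = sum(fitted) / n
    s = math.sqrt(sum((f - m) ** 2 for f in fitted) / n)
    u = [(f - m) / s for f in fitted]
    X_aug = [row + [ui ** p for p in range(2, num_powers + 2)]
             for row, ui in zip(X, u)]
    _, rss_unrestricted = ols(X_aug, y)
    q = num_powers
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / (n - k - q))


if __name__ == "__main__":
    # Synthetic data (not from the paper): y_quad is quadratic in x, so a
    # linear model is misspecified; y_lin is linear apart from a small wiggle.
    xs = [float(i) for i in range(1, 31)]
    X = [[1.0, x] for x in xs]
    y_quad = [1.0 + 2.0 * x + 0.5 * x * x + 0.1 * math.sin(5.0 * x) for x in xs]
    y_lin = [1.0 + 2.0 * x + 0.1 * math.sin(5.0 * x) for x in xs]
    print(reset_f(X, y_quad), reset_f(X, y_lin))
```

With the quadratic data the linear model is clearly misspecified and F is very large; with the linear data it is small. For a log-linear null model the same idea applies after taking logs.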
The Anderson-Darling test used for residual checking is the EDF test recommended by D'Agostino and Stephens [4], page 372. The statistic is calculated by first sorting the sample values in increasing order and standardizing them to produce $x_{(i)}$, i = 1, ..., n. Let $z_i = F(x_{(i)})$, where F is the standard normal cdf. Then define

$$ A^2 = -n - \frac{1}{n}\sum_{i=1}^{n} (2i - 1)\left[\log z_i + \log(1 - z_{n+1-i})\right]. $$

5. Other Issues

Other issues related to the Box-Cox transformation are: (i) its usefulness in forecasting, (ii) the effects of autocorrelation and heteroskedasticity on the estimation of λ, and (iii) the effects of truncation.

(i) The question of how useful the Box-Cox transformation is for producing forecasts is addressed by Nelson and Granger in [13]. Using twenty-one actual economic time series, they concluded that "...using the Box-Cox transform does not consistently produce superior forecasts. ... A main problem found was that no value of λ appeared to produce normally distributed data and so the maximum likelihood procedure was inappropriate." Nelson and Granger fit univariate ARIMA models to generate their forecasts. In [17], Spitzer performs various Monte Carlo experiments to study the small sample properties of parameter estimates in Box-Cox transformed models. This approach differs from that of Nelson and Granger by using simulated data and regression models to generate forecasts. Spitzer's conclusions also differ: "The use of the Box-Cox model for forecasting purposes is most promising. The relative errors of forecast are small and the forecasts are unbiased and of remarkably small variance. The good performance of the model for forecasting seems to hold true even when the parameters are poorly estimated."

(ii) Seaks and Layson [16] combine the estimation of λ with the assumption of autocorrelated errors or the presence of heteroskedasticity. The likelihood functions for these so-called extended models and algorithms for obtaining maximum likelihood estimators are presented. Ignoring either autocorrelation or nonconstant variance can lead to biased estimates. For example, in a case where λ = 1, yet the error variance $V(\varepsilon_i)$ is related to $E(y_i)$, the estimate of λ will be biased toward λ = 0. This follows from the property of the log transform to stabilize the error variance in the presence of heteroskedasticity.

(iii) It was noted above that we need y > 0 in order to apply a Box-Cox transformation. If $y^{(\lambda)} = (y^\lambda - 1)/\lambda = x'\beta + \varepsilon$, then

$$ y > 0 \iff \begin{cases} \varepsilon > -1/\lambda - x'\beta, & \lambda > 0,\\ \varepsilon < -1/\lambda - x'\beta, & \lambda < 0. \end{cases} $$

Thus, ε is necessarily a truncated normal; see Poirier [14]. However, Draper and Cox [7] show that as long as the distribution of ε is reasonably symmetric and not too badly truncated, the Box-Cox procedure leads to approximately consistent estimates of λ. In [17], Spitzer's simulations showed that truncation neither had an adverse effect on parameter estimation nor led to significant bias.

6. The BOXCOX Macro

The macro has two required parameters: _LHS = name of the dependent variable, and _RHS = names of the independent variables. The other arguments all have defaults:

    _DATA    = _LAST_   : Input data set
    _TITLE   = (blank)  : Output label
    _L       = -2       : λ lower bound
    _U       = 2        : λ upper bound
    _INCR    = .1       : λ increment
    _TX      = no       : Transform x's?
    _LOGTOL  = 0.1      : Default to log
    _OLS     = yes      : Estimate OLS
    _LOGLIN  = yes      : Estimate log-linear model
    _ELAST   = yes      : Calculate elasticities
    _1STOBS  = 1        : 1st obs. to use
    _OBS     = 0        : Last obs. to use, 0 = all obs.
    _WS      = 100      : IML workspace size
    _GRAPH   = yes      : Frequency polygon
    _DEVICE  = ps2ega   : Graphics device
    _RESET   = 2        : Ramsey's RESET(2)
    _DW      = yes      : Durbin-Watson test

The macros ADSTAT, for the Anderson-Darling statistic, and FREQPOLY, for the frequency polygon, have been written as separate macros and are called by BOXCOX. This was done to keep the size of BOXCOX from becoming unwieldy, and it also allows ADSTAT and FREQPOLY to be used independently of BOXCOX. A sample use of the macro would be:

    %include 'adstat.mac';
    %include 'freqpoly.mac';
    %include 'boxcox.mac';
    %boxcox(_DATA=data1, _L=-3, _U=3, _INCR=.05, _LHS=y, _RHS=x1)

References

[1] Bickel, P. and K. Doksum, 'An Analysis of Transformations Revisited', JASA, June 1981, Vol. 76, No. 374, 296-311.
[2] Box, G.E.P. and D.R. Cox, 'An Analysis of Transformations', JRSS, Ser. B, 1964, Vol. 26, 211-243.
[3] Box, G.E.P. and D.R. Cox, 'An Analysis of Transformations Revisited, Rebutted', JASA, March 1982, Vol. 77, No. 377, 209-210.
[4] D'Agostino, R. and M. Stephens, 'Goodness-of-Fit Techniques', Marcel Dekker, 1986.
[5] Davidson, R. and J. MacKinnon, 'Testing Linear and Loglinear Regressions Against Box-Cox Alternatives', Canadian J. of Economics, Aug. 1985, Vol. 18, 499-517.
[6] Draper, N.R. and H. Smith, 'Applied Regression Analysis, Second Edition', Wiley, 1981.
[7] Draper, N.R. and D.R. Cox, 'On Distributions and Their Transformation to Normality', JRSS, Ser. B, 1969, Vol. 31, 472-476.
[8] Fomby, T., R. Hill and S. Johnson, 'Advanced Econometric Methods', Springer-Verlag, 1984.
[9] Godfrey, L.G., M. McAleer and C.R. McKenzie, 'Variable Addition and Lagrange Multiplier Tests for Linear and Logarithmic Regression Models', The Review of Economics and Statistics, August 1988, Vol. 70, No. 3, 492-503.
[10] Hinkley, D. and G. Runger, 'The Analysis of Transformed Data', JASA, June 1984, Vol. 79, No. 386, 302-320.
[11] Kiefer, N. and M. Salmon, 'Testing Normality in Econometric Models', Economics Letters, 1983, Vol. 11, 123-127.
[12] Linnet, K., 'Testing Normality of Transformed Data', Applied Statistics, 1988, Vol. 37, No. 2, 180-186.
[13] Nelson, H. and C.W.J. Granger, 'Experience Using the Box-Cox Transformation When Forecasting Economic Time Series', J. of Econometrics, 1979, Vol. 10, 57-69.
[14] Poirier, D., 'The Use of the Box-Cox Transformation in Limited Dependent Variable Models', JASA, June 1978, Vol. 73, No. 362, 284-287.
[15] Scott, D., 'Frequency Polygons: Theory and Application', JASA, June 1985, Vol. 80, No. 390, 348-354.
[16] Seaks, T. and S. Layson, 'Box-Cox Estimation with Standard Econometric Problems', The Review of Economics and Statistics, Feb. 1983, No. 1, 160-164.
[17] Spitzer, J., 'A Monte Carlo Investigation of the Box-Cox Transformation in Small Samples', JASA, Sept. 1978, Vol. 73, No. 363, 488-495.
[18] Spitzer, J., 'A Fast and Efficient Algorithm for the Estimation of Parameters in Models with the Box-and-Cox Transformation', JASA, December 1982, Vol. 77, No. 380, 760-766.
[19] Spitzer, J., 'A Primer on Box-Cox Estimation', The Review of Economics and Statistics, 1982, Vol. 64, 307-313.

The SAS macros BOXCOX, ADSTAT, and FREQPOLY are too long to be included in this paper. Copies of the macros can be obtained from me at:

    ERS/USDA, Room 240
    1301 New York Ave, NW
    Washington, DC 20005

SAS and SAS/IML are registered trademarks of SAS Institute, Cary, NC, USA.

Sample Output from Macro BOXCOX

The sample output that follows uses the example from Draper and Smith [6], pp. 225-235. The SAS statements submitted are:

    %include '.\sugi15\boxcox.mac';
    %include '.\sugi15\freqpoly.mac';
    %include '.\sugi15\adstat.mac';
    libname in '.\sugi15';
    options nosymbolgen nomprint;
    %boxcox(_data=in.boxcox, _lhs=visc, _rhs=oil filler, _l=-.2, _u=.1, _incr=.01,
            _title=Using Data from Draper & Smith to Test the Macro,
            _loglin=no, _device=file);

    ---Macro to Find BOX-COX Power Transformation---
    ---cf: Draper-Smith (1981) See p.225-235 for Notation---
    Using Data from Draper & Smith to Test the Macro
    Data set used = in.boxcox
    Lower limit for lambda = -.2
    Upper limit for lambda = .1
    Increment for lambda = .01
    Dependent variable = visc
    Independent Variables = oil filler
    Using all of the observations
    X Variables not transformed
    OLS model is estimated
    Loglinear model is not estimated
    Elasticities are computed
    Durbin-Watson computed
    Do frequency polygon graphs
    Graphics output device is ps2ega
    # of values of lambda to use = 30

    ----------Linear OLS Model----------
    Number of Observations = 23
    Mean of dependent variable visc = 52.348

    parameter      beta     s.e.        t   elasticity
    constant     28.184    6.332    4.451        0.000
    OIL          -1.717    0.264   -6.502       -0.470
    FILLER        1.559    0.145   10.735        0.932

    error variance for Linear OLS model: 191.03009
    R-square for Linear OLS model is 0.879
    Mean square error for Linear OLS model = 13.821364
    Durbin-Watson statistic = 1.078

    --Ramsey RESET(2) Test for Misspecification of OLS Model--
    F = 291.294
    Significance = 0.000

    ---------Transformed Model----------
    maximum value of LMAX function is -14.78198
    corresponding value of lambda is -0.05
    95% c.i. for lambda (using chi-sqr approx) is from -0.13 to 0.02
    === using log transformation since lambda close to 0 ===

    ---Macro to find Box-Cox Transformation---
    ---cf: Draper-Smith (1981) See p. 225-235 for Notation---
    Plot of LMAX Fcn See DRAPER-SMITH, Fig.5.2, p.230
    Horizontal Line is at .5*chisqr(1,.95)
    Use to Select 95% Confidence Interval for Lambda
    By Dropping Vertical Line from Intersection of Curve & Line
    Using Data from Draper & Smith to Test the Macro
    Plot of LMAX*LAMBDA. Symbol used is '*'.
    Plot of TEST05*LAMBDA. Symbol used is '-'.
    [Plot: LMAX (vertical axis, -20 to -14) against LAMBDA (horizontal axis, -0.20 to 0.05); not reproduced]

    Estimation Results for Transformed Model
    Number of observations = 23
    Mean of transformed dependent variable visc = 3.7268039
    Mean of untransformed dependent variable visc = 52.34
    Elasticities computed at means of untransformed data

    Results using (inconsistent) OLS formula for std errors of parameters
    parameter        beta        s.e.            t
    constant     3.212206    0.023295   137.892126
    OIL         -0.031518    0.000971   -32.453198
    FILLER       0.030884    0.000534    57.818343

    Results using (consistent) BHHH method for estimating information matrix
    parameter    beta     s.e.        t   elasticity   Mean of X
    constant     3.21   0.0276   116.33         0.00        1.00
    OIL         -0.03   0.0010   -29.86        -0.55       14.34
    FILLER       0.03   0.0006    48.41         1.17       31.30

    standard error for lambda = 0.056456
    approximate 95% c.i. for lambda is from -0.162 to 0.0629
    error var. (sigma-squared) for transformed model: 0.00258
    std error of regression for transformed model: 0.0508
    std error for estimate of sigma-squared = 0.001239
    R-square for transformed model is 0.995
    Durbin-Watson statistic = 1.666

    ----SCOTT'S FREQUENCY POLYGON----
    cf: Frequency Polygons: Theory and Application by D. Scott, JASA, June 1985, 348-354
    Data used is: OLS Model Residuals
    Output device is ps2ega
    bin width = 1.1484
    xmin = -1.183
    xmax = 2.6784
    Number of bins = 4

    Anderson-Darling Normality Test for Residuals
    cf: Testing Normality of Transformed Data by K. Linnet, Applied Statistics, 1988, 37, No.2, 180-186
    Data series used: OLS Residuals
    Data not transformed by Box-Cox lambda
    Statistics for tested series
    Length: 23
    Mean: 14E-15
    Std dev: 13.178
    Skewness:  1.134   chi-square = 3.758   sig. = 0.0526
    Kurtosis:  0.855   chi-square = 0.153   sig. = 0.6958
    Joint Test:        chi-square = 3.911   sig. = 0.1415
    The value for the unadjusted Anderson-Darling statistic is 0.8886
    p-value for test is between 0.025 and 0.01

Figure 1: OLS Residuals (frequency polygon; not reproduced)

    ----SCOTT'S FREQUENCY POLYGON----
    Data used is: Box Cox Model Residuals
    Output device is ps2ega
    bin width = 1.1484
    xmin = -1.707
    xmax = 1.7152
    Number of bins = 3

    Anderson-Darling Normality Test for Residuals
    Data series used: Box Cox Residuals
    Data has been transformed by Box-Cox lambda
    Statistics for tested series
    Length: 23
    Mean: -7E-16
    Std dev: 0.0485
    Skewness:  0.043   chi-square = 0.005   sig. = 0.9418
    Kurtosis: -0.904   chi-square = 0.743   sig. = 0.3886
    Joint Test:        chi-square = 0.749   sig. = 0.6877
    The value for the adjusted Anderson-Darling statistic is 0.41
    p-value for test is greater than .20

Figure 2: Box-Cox Residuals (frequency polygon; not reproduced)
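Two of the printed quantities can be checked by hand. The short Python sketch below evaluates the elasticity formula $\eta_j = \beta_j x_j y^{-\lambda}$ at the means, and Scott's bin width $2.15\, s\, n^{-1/5}$. The function names are illustrative and not taken from the macros. With the rounded OLS estimates (λ = 1) it gives elasticities of about -0.470 for OIL and 0.932 for FILLER. With s = 1 and n = 23 it gives the bin width 1.1484 shown in both frequency-polygon listings; the listings appear to work with standardized residuals, since their xmin and xmax are on a unit scale.

```python
def elasticity_at_means(beta_j, xbar_j, ybar, lam):
    """Elasticity of y with respect to x_j, at the means, when only y is
    Box-Cox transformed: beta_j * xbar_j * ybar**(-lam).
    lam = 1 gives the linear OLS model, beta_j * xbar_j / ybar."""
    return beta_j * xbar_j * ybar ** (-lam)


def scott_bin_width(s, n):
    """Scott's (1985) frequency-polygon bin width 2.15 * s * n**(-1/5)."""
    return 2.15 * s * n ** (-1.0 / 5.0)


if __name__ == "__main__":
    # Linear OLS model in the sample output: mean visc = 52.348,
    # mean OIL = 14.34, mean FILLER = 31.30.
    print(round(elasticity_at_means(-1.717, 14.34, 52.348, 1.0), 3))  # -0.47
    print(round(elasticity_at_means(1.559, 31.30, 52.348, 1.0), 3))   # 0.932
    # Standardized residuals (s = 1) and n = 23 observations.
    print(round(scott_bin_width(1.0, 23), 4))  # 1.1484
```

Small differences from the listing in the last digit would come from the rounding of the printed coefficients and means.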