A SAS® Macro for the Box-Cox Transformation: Estimation and Testing
Charles Hallahan, Economic Research Service, USDA
"An important difference between theoretical statistics and applied statistics is that in the former a probability model is taken as a given starting point, whereas in applied statistics a model is often selected with the aid of the data" (see [10]).
After selecting the variables to be included in a regression model, the functional form needs to be determined. Linear model specification may involve a transformation of the variables in the model so that the errors are independent, normally distributed and homoskedastic. The Box-Cox family of power transformations encompasses log transforms, square roots, reciprocals, and no transformation. A SAS macro, BOXCOX, has been written to estimate the power parameter λ, perform a modified Anderson-Darling test for normality of the residuals, graph the residuals via a frequency polygon, perform Ramsey's RESET specification test, and calculate elasticities for the OLS, log-linear and transformed models.
1. Introduction.
Among the assumptions usually made for the classical linear regression model are linearity, constant variance, and normality of the residuals. Tukey's suggestion of a family of power transformations to achieve these goals was later modified by Box and Cox [2]. Given a variable z, define z^(λ) as follows:

$$
z^{(\lambda)} =
\begin{cases}
\dfrac{z^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0 \\[6pt]
\log(z) & \text{if } \lambda = 0
\end{cases}
$$
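The definition above can be tried out directly. The following is a minimal Python sketch, not code taken from the BOXCOX macro; the function name `box_cox` is hypothetical. It computes (z^λ − 1)/λ as `expm1(λ log z)/λ`, so values of λ very close to 0 blend smoothly into the log case.

```python
import math


def box_cox(z: float, lam: float) -> float:
    """Box-Cox power transform z^(lambda) of a positive value z.

    Illustrative sketch; box_cox is a hypothetical name, not part of BOXCOX.
    """
    if z <= 0:
        raise ValueError("the Box-Cox transform needs z > 0")
    if lam == 0.0:
        return math.log(z)
    # (z**lam - 1)/lam written as expm1(lam*log z)/lam, which stays accurate
    # when lam is tiny and the result approaches log(z)
    return math.expm1(lam * math.log(z)) / lam


if __name__ == "__main__":
    for lam in (1.0, 0.5, 0.0, -1.0):
        print(lam, box_cox(4.0, lam))
```

λ = 1 gives z − 1, λ = 0.5 gives 2(√z − 1), and λ = −1 gives 1 − 1/z, so up to location and scale the family covers the no-transformation, square-root, log and reciprocal cases.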
For the regression model with dependent variable y and explanatory variables x_1, ..., x_k (where x_1 = 1), a value of λ is sought so that

$$ y_i^{(\lambda)} = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i, \qquad i = 1, \ldots, n \qquad (1) $$

$$ \operatorname{var}(\varepsilon_i) = \sigma^2 \qquad (2) $$

$$ \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 I) \qquad (3) $$

Note that we must have y > 0 in order for y^(λ) to be generally defined for values of λ different from 1. A more general case is to allow each x_j to be similarly transformed with its own λ_j, j = 2, ..., k. The intermediate situation of allowing the x's to be transformed by the same λ as that used for y is an option in the BOXCOX macro described below. The special case of transforming both y and the x's by λ = 0 is the so-called log-linear specification, of interest to economists.

2. Estimation Method.

The BOXCOX macro uses the notation and estimation technique described in Draper and Smith [6], pp. 225-235. Write the transformed model as

$$ \mathbf{w} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad w_i = y_i^{(\lambda)}, \quad i = 1, \ldots, n, $$

where X is n by k. Here, we assume that the x's are not transformed. If λ is such that ε is N(0, σ²I), then the likelihood function can be maximized to determine the maximum likelihood estimators (MLEs) for β, σ², and λ. The log likelihood is:

$$ L(\boldsymbol{\beta}, \sigma^2, \lambda \mid \mathbf{y}, X) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(\mathbf{w} - X\boldsymbol{\beta})'(\mathbf{w} - X\boldsymbol{\beta}) + (\lambda - 1)\sum_{i=1}^{n}\log(y_i) $$

As described in [6], a straightforward way to maximize L is to let λ vary over some range, say -2 to 2. Conditional on a value of λ in this range, estimate β and σ² by least squares and form

$$ L_{\max}(\lambda) = -\frac{n}{2}\log\hat{\sigma}^2(\lambda) + \log J(\lambda, \mathbf{y}) $$

where σ̂²(λ) is the usual mean square error and J(λ, y) is the Jacobian of the transformation from y to w, i.e. J(λ, y) is the product of the terms J(λ, y_i) = y_i^(λ-1), so that log J(λ, y) = (λ - 1) Σ log(y_i). Whether the residual sum of squares is divided by n or by n - k only shifts L_max(λ) by a constant, so the maximizing λ is unaffected. Incrementing λ over the interval, the value of λ, say λ̂, which produces the maximum value of L_max(λ) is the MLE of λ. In [18], Spitzer develops a modified Newton method to perform the maximum likelihood estimation. The grid search method described above is used in the macro because of its simplicity. For a discussion of the Box-Cox transformation in an econometric context, see Fomby et al. [8], pp. 423-431.
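The grid search can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not the macro's SAS/IML code: all function names are hypothetical, least squares is done with plain normal equations, σ̂²(λ) is taken as RSS/n, and the 95% interval keeps every grid value whose L_max lies within ½χ²₁(0.95) ≈ 1.92 of the maximum, the rule that the macro's L_max plot describes.

```python
import math

CHI2_1_95 = 3.841458820694124  # 95th percentile of chi-square with 1 df


def box_cox(z, lam):
    """Box-Cox transform of z > 0 (log when lam == 0)."""
    return math.log(z) if lam == 0 else math.expm1(lam * math.log(z)) / lam


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def rss(X, w):
    """Residual sum of squares from the least squares fit of w on the columns of X."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xtw = [sum(row[i] * wi for row, wi in zip(X, w)) for i in range(k)]
    beta = solve(xtx, xtw)
    return sum((wi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, wi in zip(X, w))


def lmax(y, X, lam):
    """Profile log likelihood: -n/2 log sigma2_hat(lam) + (lam - 1) * sum(log y)."""
    n = len(y)
    w = [box_cox(v, lam) for v in y]
    sigma2 = rss(X, w) / n  # dividing by n - k instead would only add a constant
    return -0.5 * n * math.log(sigma2) + (lam - 1.0) * sum(math.log(v) for v in y)


def grid_search(y, X, lo=-2.0, hi=2.0, incr=0.1):
    """Return (lambda_hat, L_max(lambda_hat), (ci_low, ci_high)) over a grid.

    The interval is the smallest and largest grid value within the chi-square
    cutoff, which assumes L_max is unimodal over the grid.
    """
    steps = int(round((hi - lo) / incr))
    grid = [round(lo + i * incr, 10) for i in range(steps + 1)]
    values = [lmax(y, X, lam) for lam in grid]
    best = max(range(len(grid)), key=values.__getitem__)
    cutoff = values[best] - 0.5 * CHI2_1_95
    inside = [lam for lam, v in zip(grid, values) if v >= cutoff]
    return grid[best], values[best], (min(inside), max(inside))


if __name__ == "__main__":
    xs = list(range(20))
    X = [[1.0, float(x)] for x in xs]
    # y is log-linear in x, so lambda_hat should land at or next to 0
    y = [math.exp(1.0 + 0.2 * x + 0.01 * math.sin(i)) for i, x in enumerate(xs)]
    print(grid_search(y, X, -2.0, 2.0, 0.05))
```

The grid bounds and increment play the role of the macro's _L, _U and _INCR arguments; a finer increment only sharpens λ̂ and the interval end points, at the cost of one least squares fit per grid value.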
Standard errors for the β's and σ² can be calculated either conditionally on λ or unconditionally, where λ is considered to be estimated along with the β's and σ². Conditional standard error estimates are what would be obtained if λ were first determined, the data transformed, and the transformed model estimated by OLS in a statistical package with λ treated as known. The BOXCOX macro calculates unconditional standard errors by using the Berndt-Hall-Hall-Hausman (BHHH) estimator of the information matrix.
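Letting θ = (β', σ², λ)', the information matrix is defined as:

$$ I_{\theta} = -E\left[\frac{\partial^2 L}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}'}\right] $$

Then [-∂²L/∂θ∂θ']⁻¹ evaluated at θ = θ̂ is a consistent estimator of the variance-covariance matrix of the MLE of θ. The BHHH method estimates I_θ with

$$ \hat{I}_{\theta} = \sum_{i=1}^{n} \frac{\partial L_i}{\partial\boldsymbol{\theta}}\,\frac{\partial L_i}{\partial\boldsymbol{\theta}'} $$

where L_i is the contribution of observation i to the log likelihood. See Fomby et al. [8], p. 429 for the necessary derivatives.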
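A compact Python sketch of the BHHH calculation for the model of Section 2 follows. The per-observation scores are obtained by differentiating the log likelihood given above; they are an assumption of this sketch rather than a transcription of the derivatives in Fomby et al. [8] or of the macro's code, and all names are hypothetical. The matrix is evaluated at whatever estimates are passed in (in practice, the MLEs), and the diagonal of the returned inverse gives squared unconditional standard errors for β, σ² and λ.

```python
import math


def box_cox(z, lam):
    """Box-Cox transform of z > 0 (log when lam == 0)."""
    return math.log(z) if lam == 0 else math.expm1(lam * math.log(z)) / lam


def dbox_cox_dlam(z, lam):
    """Derivative of the Box-Cox transform with respect to lambda."""
    lz = math.log(z)
    if lam == 0:
        return 0.5 * lz * lz  # limit of the general formula as lambda -> 0
    return z ** lam * lz / lam - (z ** lam - 1.0) / lam ** 2


def invert(a):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def bhhh_covariance(y, X, beta, sigma2, lam):
    """Inverse of the outer-product-of-scores (BHHH) estimate of the
    information matrix for theta = (beta', sigma^2, lambda)'."""
    p = len(beta) + 2
    info = [[0.0] * p for _ in range(p)]
    for yi, xi in zip(y, X):
        e = box_cox(yi, lam) - sum(b * x for b, x in zip(beta, xi))
        g = [x * e / sigma2 for x in xi]                               # dL_i/dbeta
        g.append(-0.5 / sigma2 + e * e / (2.0 * sigma2 ** 2))          # dL_i/dsigma^2
        g.append(-e / sigma2 * dbox_cox_dlam(yi, lam) + math.log(yi))  # dL_i/dlambda
        for r in range(p):
            for c in range(p):
                info[r][c] += g[r] * g[c]
    return invert(info)


if __name__ == "__main__":
    xs = list(range(20))
    X = [[1.0, float(x)] for x in xs]
    y = [math.exp(1.0 + 0.2 * x + 0.1 * math.sin(i)) for i, x in enumerate(xs)]
    # Illustrative parameter values (not MLEs) with sigma^2 from their residuals
    resid = [math.log(v) - 1.0 - 0.2 * x for v, x in zip(y, xs)]
    s2 = sum(r * r for r in resid) / len(resid)
    cov = bhhh_covariance(y, X, [1.0, 0.2], s2, 0.0)
    print([math.sqrt(cov[i][i]) for i in range(len(cov))])
```

Because only first derivatives are needed, the outer-product form avoids coding the second derivatives that the Hessian-based estimator would require.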
The question of whether inference on the β's should be unconditional or conditioned on the estimated λ is debated at length in a series of papers by Bickel and Doksum [1], Box and Cox [3], and Hinkley and Runger [10]. By calculating both conditional and unconditional standard errors for the β's, users of the macro can make their own choice.
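Economists are interested in elasticities. The elasticity of y with respect to an explanatory variable x_i is the percentage change in y associated with a one percent change in x_i, i.e., the derivative of log(y) with respect to log(x_i). When only y has been transformed, the functional relationship between y and x_i is

$$ \frac{y^{\lambda} - 1}{\lambda} = \sum_{j=1}^{k} \beta_j x_j + \varepsilon $$

Differentiating both sides gives y^(λ-1) ∂y/∂x_i = β_i, so the chain rule leads to the elasticity of y with respect to x_i being

$$ \eta_i = \frac{\partial \log y}{\partial \log x_i} = \frac{x_i}{y}\,\frac{\partial y}{\partial x_i} = \frac{\beta_i x_i}{y^{\lambda}} $$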
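The elasticity formula can be checked numerically. In the Python sketch below (hypothetical function names), the first function is the formula just derived. The second covers the case where the x's share y's λ (the macro's _TX option); it follows from the same chain-rule argument, y^(λ-1) dy = β_i x_i^(λ-1) dx_i, and is offered as an assumption rather than a formula quoted from the paper. Elasticities are usually evaluated at the means of y and x_i.

```python
def elasticity_y_transformed(beta_i, x_i, y, lam):
    """Elasticity of y w.r.t. x_i when only y is Box-Cox transformed.

    From y**(lam - 1) * dy/dx_i = beta_i it follows that
    eta_i = beta_i * x_i / y**lam.
    """
    return beta_i * x_i / y ** lam


def elasticity_both_transformed(beta_i, x_i, y, lam):
    """Elasticity when x_i and y are transformed with the same lambda.

    From y**(lam - 1) dy = beta_i * x_i**(lam - 1) dx_i it follows that
    eta_i = beta_i * (x_i / y)**lam, which equals beta_i when lam = 0.
    """
    return beta_i * (x_i / y) ** lam


if __name__ == "__main__":
    # OLS corresponds to lam = 1; estimates and means from the sample output
    print(round(elasticity_y_transformed(-1.717, 14.34, 52.348, 1.0), 3))  # OIL
    print(round(elasticity_y_transformed(1.559, 31.30, 52.348, 1.0), 3))   # FILLER
```

With λ = 1 (the OLS model) and the estimates and means printed in the sample output (β = -1.717 and 1.559, means of 14.34 and 31.30 for the x's, mean of 52.348 for y), the first function gives -0.470 and 0.932, the elasticities reported for OIL and FILLER. With λ = 0 and transformed x's, the second function returns β_i, the constant log-linear elasticity.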
When the mean and standard deviation are estimated, a modified statistic A* for finite samples is preferred:

$$ A^{*} = A^{2}\left(1 + \frac{0.75}{n} + \frac{2.25}{n^{2}}\right) $$

This test has been further modified in the case of Box-Cox transformed models to account for the fact that λ is also estimated, see Linnet [12]. The macro BOXCOX uses this final statistic when testing normality of the residuals from the transformed model. Finally, significance tests for skewness and kurtosis of the regression residuals are performed using the results of Kiefer and Salmon [11]. It is hoped that these tests will convey a consistent message.
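The residual statistics of this section can be sketched in Python as follows. This is not the ADSTAT macro: the function names are hypothetical, A² uses the standard EDF computing formula with the mean and standard deviation estimated from the data, and the skewness and kurtosis chi-squares are the usual n·g₁²/6 and n·g₂²/24 score-test forms, with g₁ the sample skewness and g₂ the sample excess kurtosis. Treating these as exactly the Kiefer-Salmon statistics the macro prints is an assumption, since small-sample conventions for the moments may differ, and Linnet's further adjustment for an estimated λ is not attempted.

```python
import math
from statistics import NormalDist, fmean, stdev


def anderson_darling(sample):
    """Unadjusted Anderson-Darling A^2 for normality, mean and sd estimated."""
    n = len(sample)
    m, s = fmean(sample), stdev(sample)
    z = [NormalDist().cdf((x - m) / s) for x in sorted(sample)]
    # z[i - 1] is z_i and z[n - i] is z_(n+1-i) in 1-based notation
    total = sum((2 * i - 1) * (math.log(z[i - 1]) + math.log(1.0 - z[n - i]))
                for i in range(1, n + 1))
    return -n - total / n


def modified_ad(a2, n):
    """Finite-sample modification A* = A^2 (1 + 0.75/n + 2.25/n^2)."""
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")


def skew_kurt_tests(resid):
    """Skewness, kurtosis and joint chi-square statistics with p-values."""
    n = len(resid)
    m = fmean(resid)
    m2, m3, m4 = (sum((r - m) ** p for r in resid) / n for p in (2, 3, 4))
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    s_stat = n * skew ** 2 / 6.0
    k_stat = n * excess_kurt ** 2 / 24.0
    return {
        "skewness": (s_stat, chi2_sf(s_stat, 1)),
        "kurtosis": (k_stat, chi2_sf(k_stat, 1)),
        "joint": (s_stat + k_stat, chi2_sf(s_stat + k_stat, 2)),
    }
```

As a check on the printed output, modified_ad(0.8886, 23) is about 0.921 for the OLS residuals, and chi2_sf reproduces the significance levels shown there for the listed chi-square values (for example 0.0526 for 3.758 with 1 df and 0.1415 for 3.911 with 2 df).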
4. Specification Testing.
We've seen that the linear and log-linear models can be considered as special cases of the Box-Cox transformed family of models. In [9], Godfrey et al. examine various tests of the linear and log-linear models and conclude that "... RESET appears to be most useful in combining relatively good power with simplicity of computation, and is therefore to be recommended".

Ramsey's RESET(p) test amounts to augmenting the original regressors with powers of the predicted values from the null model (either the linear or the log-linear model) and testing the joint significance of the added variables. The "p" refers to the number of powers used. The macro BOXCOX currently includes only RESET(2).
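A sketch of the RESET F test in Python follows (hypothetical names, plain normal-equation least squares). Which powers of the fitted values BOXCOX adds for RESET(2) is not spelled out above, so the powers are a parameter here; the default (2,) adds only the squared fitted values, which is an assumption. The statistic has (q, n - k - q) degrees of freedom, where q is the number of added columns.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """Least squares coefficients, fitted values and residual sum of squares."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return beta, fitted, sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def reset_f(y, X, powers=(2,)):
    """RESET F statistic: add powers of the null model's fitted values to X
    and test their joint significance; df = (q, n - k - q)."""
    n, k, q = len(y), len(X[0]), len(powers)
    _, fitted, rss_null = ols(X, y)
    # Dividing the fitted values by a constant before raising them to a power
    # leaves the augmented column space, and hence F, unchanged, but keeps
    # the normal equations better conditioned.
    scale = max(abs(v) for v in fitted) or 1.0
    X_aug = [list(row) + [(v / scale) ** p for p in powers] for row, v in zip(X, fitted)]
    _, _, rss_aug = ols(X_aug, y)
    return ((rss_null - rss_aug) / q) / (rss_aug / (n - k - q))


if __name__ == "__main__":
    xs = [float(x) for x in range(1, 31)]
    X = [[1.0, x] for x in xs]
    wiggle = [0.25 * ((7 * i) % 5 - 2) for i in range(30)]
    y_curved = [2.0 + 0.5 * x * x + e for x, e in zip(xs, wiggle)]
    print(reset_f(y_curved, X))  # large F: the linear model misses the curvature
```

For the log-linear null model the same function would be applied to the logged variables; the test itself does not care which specification produced the fitted values.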
The RESET test does not require a specific alternative. In an earlier paper, Davidson and MacKinnon [5] derived Lagrange Multiplier tests for the linear and log-linear models against the more general Box-Cox model. However, Godfrey et al. included the Davidson-MacKinnon tests in their Monte Carlo analysis and concluded that "... they can be very unreliable when disturbances are not normally distributed". Consequently, these tests were not included in BOXCOX.

Elasticities are usually measured at the mean values of y and x_i. For the log-linear model, where λ = 0 and the x's are also transformed, the elasticity is constant at β_i.

3. Residual Checking.
The hypothesis that the transformed model has (approximately) normal errors is checked by the BOXCOX macro in two ways. Graphically, a frequency polygon of the residuals, which is a smoothed histogram (see [15]), is plotted. A frequency polygon is constructed by joining with straight lines the points at the bin midpoints of a histogram, at the heights of the bins. Scott [15] shows that the frequency polygon dominates the histogram in terms of integrated mean square error. The optimal bin width is found to be h = 2.15 s n^(-1/5), where s is an estimate of the standard deviation from a sample of size n.
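The bin-width rule and the polygon construction can be sketched in Python as follows. The binning details (bins anchored at the smallest observation, ⌈range/h⌉ bins, and zero-height end vertices half a bin outside the data) are assumptions of this sketch rather than a description of the FREQPOLY macro, and the names are hypothetical. With standardized residuals (s = 1) and n = 23 the rule gives h ≈ 1.1484, and the bin-count rule gives 4 and 3 bins for the two residual ranges printed in the sample output.

```python
import math


def scott_bin_width(s, n):
    """Scott's normal-reference bin width for a frequency polygon: 2.15 * s * n**(-1/5)."""
    return 2.15 * s * n ** (-0.2)


def n_bins(lo, hi, h):
    """Number of bins of width h needed to cover [lo, hi]."""
    return max(1, math.ceil((hi - lo) / h))


def frequency_polygon(data):
    """Return (h, vertices) of a density-scaled frequency polygon for data."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    h = scott_bin_width(s, n)
    if h <= 0:
        raise ValueError("data must not be constant")
    lo, hi = min(data), max(data)
    k = n_bins(lo, hi, h)
    counts = [0] * k
    for x in data:
        counts[min(int((x - lo) / h), k - 1)] += 1
    # Join the bin midpoints (height = count / (n h)); add a zero-height vertex
    # half a bin outside each end so the polygon encloses unit area.
    verts = [(lo - 0.5 * h, 0.0)]
    verts += [(lo + (j + 0.5) * h, c / (n * h)) for j, c in enumerate(counts)]
    verts.append((lo + (k + 0.5) * h, 0.0))
    return h, verts
```

Scaling the heights by n·h makes the polygon a density estimate, so it can be overlaid on a normal curve with the same mean and standard deviation when judging the shape of the residuals.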
The Anderson-Darling test for normality is also performed. The Anderson-Darling test belongs to the class of tests known as EDF tests, since they are based on the empirical distribution function. It is the EDF test recommended by D'Agostino and Stephens [4], page 372. The statistic is calculated by first ordering the sample values in increasing order and standardizing to produce x_(i), i = 1, ..., n. Let z_i = F(x_(i)), where F is the standard normal cdf. Then define

$$ A^{2} = -n - \frac{1}{n}\sum_{i=1}^{n}(2i - 1)\left[\log z_i + \log\left(1 - z_{n+1-i}\right)\right] $$

5. Other Issues.

Other issues related to the Box-Cox transformation are:
(i) its usefulness in forecasting,
(ii) the effects of autocorrelation and heteroskedasticity on the estimation of λ,
(iii) the effects of truncation.
(i) The question of how useful the Box-Cox transformation is for producing forecasts is addressed by Nelson and Granger in [13]. Using twenty-one actual economic time series, they concluded that "... using the Box-Cox transform does not consistently produce superior forecasts. ... A main problem found was that no value of λ appeared to produce normally distributed data and so the maximum likelihood procedure was inappropriate." Nelson and Granger fit univariate ARIMA models to generate their forecasts. In [17], Spitzer performs various Monte Carlo experiments to study the small sample properties of parameter estimates in Box-Cox transformed models. This approach differs from that of Nelson and Granger by using simulated data and regression models to generate forecasts. Spitzer's conclusions also differ: "The use of the Box-Cox model for forecasting purposes is most promising. The relative errors of forecast are small and the forecasts are unbiased and of remarkably small variance. The good performance of the model for forecasting seems to hold true even when the parameters are poorly estimated."
(ii) Seaks and Layson in [16] combine the estimation of λ with the assumption of autocorrelated errors or the presence of heteroskedasticity. The likelihood functions for these so-called extended models and algorithms for obtaining maximum likelihood estimators are presented. Ignoring either autocorrelation or nonconstant variance can lead to biased estimates. For example, in a case where λ = 1, yet the error variance V(ε_i) is related to E(y_i), the estimate of λ will be biased toward λ = 0. This follows from the property of the log transform to stabilize the error variance in the presence of heteroskedasticity.

(iii) It was noted above that we need y > 0 in order to apply a Box-Cox transformation. If

$$ y^{(\lambda)} = \frac{y^{\lambda} - 1}{\lambda} = \mathbf{x}'\boldsymbol{\beta} + \varepsilon, $$

then y^λ = 1 + λ(x'β + ε), so y > 0 requires

$$ \varepsilon > -\frac{1}{\lambda} - \mathbf{x}'\boldsymbol{\beta} \ \text{ if } \lambda > 0 \qquad \text{and} \qquad \varepsilon < -\frac{1}{\lambda} - \mathbf{x}'\boldsymbol{\beta} \ \text{ if } \lambda < 0. $$

Thus, ε is necessarily a truncated normal, see Poirier [14]. However, Draper and Cox [7] show that as long as the distribution of ε is reasonably symmetric and not too badly truncated, the Box-Cox procedure leads to approximately consistent estimates of λ. In [17], Spitzer's simulations showed that truncation did not have an adverse effect on parameter estimation nor lead to significant bias.

6. The BOXCOX Macro.

The macro has two required parameters, _LHS = name of the dependent variable, and _RHS = names of the independent variables. The other arguments all have defaults:

DATA = _LAST_      : Input data set
_TITLE =           : Output label
_L = -2            : λ lower bound
_U = 2             : λ upper bound
_INCR = .1         : λ increment
_TX = no           : Transform x's?
_LOGTOL = 0.1      : Default to log
_OLS = yes         : Estimate OLS
_LOGLIN = yes      : Estimate log-linear model
_ELAST = yes       : Calculate elasticities
_1STOBS = 1        : 1st obs. to use
_OBS = 0           : Last obs. to use, 0 = all obs.
_WS = 100          : IML workspace size
_GRAPH = yes       : Frequency polygon
_DEVICE = ps2ega   : Graphics device
_RESET = 2         : Ramsey's RESET(2)
_DW = yes          : Durbin-Watson test

The macros ADSTAT, for the Anderson-Darling statistic, and FREQPOLY, for the frequency polygon, have been written as separate macros and are called by BOXCOX. This was done to keep the size of BOXCOX from becoming unwieldy, and it also allows ADSTAT and FREQPOLY to be used independently of BOXCOX.

A sample use of the macro would be:

%include 'adstat.mac';
%include 'freqpoly.mac';
%include 'boxcox.mac';
%boxcox(DATA=data1, _L=-3, _U=3, _INCR=.05,
        _LHS=y, _RHS=x1)

References

[1] Bickel, P. and K. Doksum, 'An Analysis of Transformations Revisited', JASA, June 1981, Vol. 76, No. 374, 296-311.
[2] Box, G.E.P. and D.R. Cox, 'An Analysis of Transformations', JRSS, Ser. B, 1964, Vol. 26, 211-243.
[3] Box, G.E.P. and D.R. Cox, 'An Analysis of Transformations Revisited, Rebutted', JASA, March 1982, Vol. 77, No. 377, 209-210.
[4] D'Agostino, R. and M. Stephens, 'Goodness-of-Fit Techniques', Marcel Dekker, 1986.
[5] Davidson, R. and J. MacKinnon, 'Testing Linear and Loglinear Regressions Against Box-Cox Alternatives', Canadian J. of Economics, Aug. 1985, Vol. 18, 499-517.
[6] Draper, N.R. and H. Smith, 'Applied Regression Analysis, Second Edition', Wiley, 1981.
[7] Draper, N.R. and D.R. Cox, 'On Distributions and Their Transformation to Normality', JRSS, Ser. B, 1969, Vol. 31, 472-476.
[8] Fomby, T., R. Hill and S. Johnson, 'Advanced Econometric Methods', Springer-Verlag, 1984.
[9] Godfrey, L.G., M. McAleer and C.R. McKenzie, 'Variable Addition and Lagrange Multiplier Tests for Linear and Logarithmic Regression Models', The Review of Economics and Statistics, Aug. 1988, Vol. 70, No. 3, 492-503.
[10] Hinkley, D. and G. Runger, 'The Analysis of Transformed Data', JASA, June 1984, Vol. 79, No. 386, 302-320.
[11] Kiefer, N. and M. Salmon, 'Testing Normality in Econometric Models', Economics Letters, 1983, Vol. 11, 123-127.
[12] Linnet, K., 'Testing Normality of Transformed Data', Applied Statistics, 1988, Vol. 37, No. 2, 180-186.
[13] Nelson, H. and C.W.J. Granger, 'Experience Using the Box-Cox Transformation When Forecasting Economic Time Series', J. Econometrics, 1979, Vol. 10, 57-69.
[14] Poirier, D., 'The Use of the Box-Cox Transformation in Limited Dependent Variable Models', JASA, June 1978, Vol. 73, No. 362, 284-287.
[15] Scott, D., 'Frequency Polygons: Theory and Applications', JASA, June 1985, Vol. 80, No. 390, 348-354.
[16] Seaks, T. and S. Layson, 'Box-Cox Estimation with Standard Econometric Problems', The Review of Economics and Statistics, Feb. 1983, No. 1, 160-164.
[17] Spitzer, J., 'A Monte Carlo Investigation of the Box-Cox Transformation in Small Samples', JASA, Sept. 1978, Vol. 73, No. 363, 488-495.
[18] Spitzer, J., 'A Fast and Efficient Algorithm for the Estimation of Parameters in Models with the Box-and-Cox Transformation', JASA, Dec. 1982, Vol. 77, No. 380, 760-766.
[19] Spitzer, J., 'A Primer on Box-Cox Estimation', The Review of Economics and Statistics, 1982, Vol. 64, 307-313.
The sample output that follows uses the example from Draper and Smith, pp. 225-235. The SAS statements submitted are:

%include '.\sugi15\boxcox.mac';
%include '.\sugi15\freqpoly.mac';
%include '.\sugi15\adstat.mac';
libname in '.\sugi15';
options nosymbolgen nomprint;
%boxcox(data = in.boxcox, _lhs=visc, _rhs=oil filler,
        _l=-.2, _u=.1, _incr=.01,
        _title=Using Data from Draper & Smith to Test the Macro,
        _loglin=no, _device=file);
Sample Output from Macro BOXCOX

---Macro to Find BOX-COX Power Transformation---
cf: Draper-Smith (1981) See p.225-235 for Notation
Using Data from Draper & Smith to Test the Macro
Data set used = in.boxcox
Lower limit for lambda = -.2
Upper limit for lambda = .1
Increment for lambda = .01
Dependent variable = visc
Independent Variables = oil filler
Using all of the observations
X Variables not transformed
OLS model is estimated
Loglinear model is not estimated
Elasticities are computed
Durbin-Watson computed
Do frequency polygon graphs
Graphics output device is ps2ega
# of values of lambda to use = 30
The SAS macros BOXCOX, ADSTAT, and FREQPOLY are too long to be included in this paper. Copies of the macros can be obtained from me at:

ERS/USDA, Room 240
1301 New York Ave, NW
Washington, DC 20005

SAS and SAS/IML are registered trademarks of SAS Institute, Cary, NC, USA.

----------Linear OLS Model----------
Number of Observations = 23
Mean of dependent variable visc = 52.348

parameter      beta     s.e.        t   elasticity
constant     28.184    6.332    4.451        0.000
OIL          -1.717    0.264   -6.502       -0.470
FILLER        1.559    0.145   10.735        0.932

error variance for Linear OLS model: 191.03009
R-square for Linear OLS model is 0.879
Mean square error for Linear OLS model = 13.821364
Durbin-Watson statistic = 1.078

--Ramsey RESET(2) Test for Misspecification of OLS Model--
F = 291.294
Significance = 0.000
---------Transformed Model----------
maximum value of LMAX function is -14.78198
corresponding value of lambda is -0.05
95% c.i. for lambda (using chi-sqr approx) is from -0.13 to 0.02
=== using log transformation since lambda close to 0 ===

[Plot: Plot of LMAX Fcn, see Draper-Smith, Fig. 5.2, p. 230. LMAX*LAMBDA (symbol '*') and TEST05*LAMBDA (symbol '-'); LMAX axis from -20 to -14, LAMBDA axis from -0.20 to 0.05. Horizontal line is at .5*chisqr(1,.95); use it to select the 95% confidence interval for lambda by dropping vertical lines from the intersections of curve and line.]

Estimation Results for Transformed Model
Number of observations = 23
Mean of transformed dependent variable visc = 3.7268039
Mean of untransformed dependent variable visc = 52.34
Elasticities computed at means of untransformed data

Results using (inconsistent) OLS formula for std errors of parameters

parameter          beta        s.e.            t
constant       3.212206    0.023295   137.892126
OIL           -0.031518    0.000971   -32.453198
FILLER         0.030884    0.000534    57.818343

Results using (consistent) BHHH method for estimating information matrix

parameter    beta     s.e.        t   elasticity   Mean of X
constant     3.21   0.0276   116.33         0.00        1.00
OIL         -0.03   0.0010   -29.86        -0.55       14.34
FILLER       0.03   0.0006    48.41         1.17       31.30

standard error for lambda = 0.056456
approximate 95% c.i. for lambda is from -0.162 to 0.0629
error var. (sigma-squared) for transformed model: 0.00258
std error of regression for transformed model: 0.0508
std error for estimate of sigma-squared = 0.001239
R-square for transformed model is 0.995
Durbin-Watson statistic = 1.666
----SCOTT'S FREQUENCY POLYGON----
cf: Frequency Polygons: Theory and Application by D. Scott, JASA, June 1985, 348-354
Data used is: OLS Model Residuals
Output device is ps2ega
bin width = 1.1484
xmin = -1.183   xmax = 2.6784
Number of bins = 4

[Figure 1 OLS Residuals: frequency polygon plot]

Anderson-Darling Normality Test for Residuals
cf: Testing Normality of Transformed Data by K. Linnet, Applied Statistics, 1988, 37, No. 2, 180-186
Data series used: OLS Residuals
Data not transformed by Box-Cox lambda
Statistics for tested series
Length: 23
Mean: 14E-15
Std dev: 13.178
Skewness: 1.134     chi-square = 3.758   sig. = 0.0526
Kurtosis: 0.855     chi-square = 0.153   sig. = 0.6958
Joint Test:         chi-square = 3.911   sig. = 0.1415
The value for the unadjusted Anderson-Darling statistic is 0.8886
p-value for test is between 0.025 and 0.01

-----SCOTT'S FREQUENCY POLYGON----
Data used is: Box Cox Model Residuals
Output device is ps2ega
bin width = 1.1484
xmin = -1.707   xmax = 1.7152
Number of bins = 3

[Figure 2 Box-Cox Residuals: frequency polygon plot]

Anderson-Darling Normality Test for Residuals
Data series used: Box Cox Residuals
Data has been transformed by Box-Cox lambda
Statistics for tested series
Length: 23
Mean: -7E-16
Std dev: 0.0485
Skewness: 0.043     chi-square = 0.005   sig. = 0.9418
Kurtosis: -0.904    chi-square = 0.743   sig. = 0.3886
Joint Test:         chi-square = 0.749   sig. = 0.6877
The value for the adjusted Anderson-Darling statistic is 0.41
p-value for test is greater than .20