Detecting Structural Change Using SAS®/ETS Procedures
Archie J. Calise, Queensborough College of the City University of New York
Joseph Earley, Loyola Marymount University, Los Angeles
All great natures delight in stability
... Emerson
ABSTRACT
This paper illustrates how the SAS System may be used to test for structural change in a time series. PROC
AUTOREG with the CHOW option is used to perform a Chow test for structural stability on airline passenger travel
time series data before and after the September 11, 2001 terrorist incident. The paper illustrates the ease of use of
this procedure in investigating the structural change of a time series. Also illustrated in the paper is the Time Series
Forecasting System (TSFS), which is a modular option under Solutions > Analysis > Time Series Forecasting System. The
TSFS may be used to fit numerous pre-selected and other models based on a user-selected criterion such as R-square or mean square forecast error.
INTRODUCTION
The analysis of time series is an important statistical methodology with principal contributions developed by academic
disciplines such as biometrics, economics and sociology. Recent developments in ARIMA modeling, spectral analysis
and X12 methodology have added immensely to the usefulness of the methodology. This paper illustrates how a time
series may be examined for structural stability using an application of the F-test called the Chow test. Figure 1 is a
time series of monthly total passenger traffic volume from the Los Angeles International Airport from January, 1993
to April, 2006. The 105th observation, the monthly passenger traffic for September, 2001, shows a marked numerical
drop from 6,624,720 to 3,593,455. This of course may be attributed to the terrorist event of September 11 of that
year. This paper uses the SAS/ETS procedure PROC AUTOREG with the CHOW option to test whether or not there
was a change in the structural stability of this time series. The following question is addressed: Was the monthly
drop at the 105th observation a one-time phenomenon, or did the terrorism event permanently affect the evolution of
the time series?
Figure 1. LAX monthly passenger traffic plotted against time (vertical axis: LAX; horizontal axis: time).
REGRESSION ANALYSIS
Regression analysis is the study of the relationship between a dependent variable, Y, and one or more independent
variables, X's. The SAS System contains numerous procedures which may be used to estimate regression equations.
A linear regression equation may be expressed as:
$$Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + \mu_i$$
where
$Y_i$ is the dependent variable,
$X_i$ are the independent variables,
$\beta_i$ are the regression coefficients, and
$\mu_i$ is the error term or residual.
Regression analysis allows the researcher to determine the influence which each respective independent variable
has on the dependent variable, ceteris paribus. In addition, a correctly specified regression model allows for the use
of numerous statistical tests and summary statistics, such as R-square, which indicate how well the model fits the
data. For most of these tests to be statistically correct, there are a number of implicit assumptions imposed on the
model which must be satisfied (Gujarati, 2003). These model assumptions should be tested for validity. Pending
results from these tests, there are a variety of econometric methods which may be used to deal with assumption
violations. For this study, the regressor used is the index of time. A simple linear regression model was estimated for
LAX passenger traffic data for several time windows: pre-911, post-911 and entire data set from January, 1993
through April, 2006.
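The paper reports these three fits from JMP. A roughly equivalent SAS sketch is given here for readers who prefer to stay in one tool. It assumes the data set LAX_data and the variables traffic and time used in the PROC AUTOREG step of this paper, with time running from 1 (January, 1993) to 160 (April, 2006), so that observation 105 is September, 2001.

```sas
/* Sketch only: LAX_data, traffic and time are taken from the PROC AUTOREG */
/* step in this paper; time is assumed to be the index 1, 2, ..., 160.     */

/* Pre-911 window: January 1993 through August 2001 (observations 1-104) */
proc reg data=LAX_data;
   where time <= 104;
   model traffic = time;
run;
quit;

/* Post-911 window: September 2001 through April 2006 (observations 105-160) */
proc reg data=LAX_data;
   where time >= 105;
   model traffic = time;
run;
quit;

/* Entire data set: January 1993 through April 2006 */
proc reg data=LAX_data;
   model traffic = time;
run;
quit;
```

Each step prints an analysis of variance table and parameter estimates that can be compared with the JMP summaries reported in this paper.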
CHOW TEST FOR STRUCTURAL STABILITY [1]
In order to test for the stability of a relationship between a dependent variable and the explanatory regressor, in our
example, time, SAS includes an option of the PROC AUTOREG MODEL statement called CHOW - which allows the
researcher to select the potential breakpoint of the relationship which we desire to test. If there is no structural
change, we would expect that the estimated residuals from a regression using the entire data would not differ from
the combined residuals from two regressions using each subset of the data. A large difference between the sets of
residuals would indicate that there has been a break in the data, i.e., a structural change has occurred. From a
statistical perspective, the null hypothesis for the Chow test is that the subset regression coefficient vectors, $\beta_1$ and
$\beta_2$ (intercept and slope in this example), are equal, and thus the subsets can be viewed as one data set. The alternative is that the intervention has changed the
nature of the relationship.
$$H_0: \beta_1 = \beta_2$$
conditioned on the equality of the sample error variances,
where the two subset regressions are:
$$y_1 = X_1 \beta_1 + u_1 \quad \text{with } n_1 \text{ observations}$$
$$y_2 = X_2 \beta_2 + u_2 \quad \text{with } n_2 \text{ observations}$$
$$\text{Chow statistic} = \frac{(u'u - u_1'u_1 - u_2'u_2)/k}{(u_1'u_1 + u_2'u_2)/(n_1 + n_2 - 2k)}$$
where $u$ is the residual vector for the entire data set regression, $u_1$ and $u_2$ are the residual vectors for the subset regressions, and $k$ is the number of estimated regression parameters.
Chow showed that the above statistic follows an F distribution with $k$ degrees of
freedom in the numerator and $(n_1 + n_2 - 2k)$ degrees of freedom in the denominator. Figure 2 illustrates the trend
lines for the two subset regressions. Casual observation indicates that a downward shift has occurred, but whether or
not the slopes have remained the same is not readily apparent.
Figure 2. LAXmod plotted against time, with trend lines for the two subset regressions (vertical axis: LAXmod; horizontal axis: time).
[1] Equations for this section follow the nomenclature of SAS/ETS Chapter 10, The AUTOREG Procedure, p. 441.
Following are the regression equations and summary statistics for the two subset regressions using JMP® visualization software from the SAS Institute.
1st time period: January, 1993 through August, 2001
LAX = 3909.6486 + 18.831451 time

Summary of Fit
RSquare                       0.618506
RSquare Adj                   0.614765
Root Mean Square Error        448.3274
Mean of Response              4898.3
Observations (or Sum Wgts)    104

Analysis of Variance
Source      DF    Sum of Squares    Mean Square    F Ratio     Prob > F
Model        1          33238864       33238864    165.3696    <.0001
Error      102          20501738      200997.44
C. Total   103          53740602

Parameter Estimates
Term         Estimate     Std Error    t Ratio    Prob>|t|
Intercept    3909.6486    88.56214     44.15      <.0001
time         18.831451    1.464387     12.86      <.0001
2nd time period: September, 2001 through April, 2006
LAX = 2771.0301 + 15.159067 time

Summary of Fit
RSquare                       0.189842
RSquare Adj                   0.174839
Root Mean Square Error        515.4497
Mean of Response              4779.606
Observations (or Sum Wgts)    56

Analysis of Variance
Source      DF    Sum of Squares    Mean Square    F Ratio    Prob > F
Model        1           3361935        3361935    12.6537    0.0008
Error       54          14347175         265688
C. Total    55          17709110

Parameter Estimates
Term         Estimate     Std Error    t Ratio    Prob>|t|
Intercept    2771.0301    568.8366     4.87       <.0001
time         15.159067    4.261516     3.56       0.0008
Figure 3 illustrates the regression line for the entire time series.
Figure 3. LAXmod plotted against time, with the regression line for the entire time series (vertical axis: LAXmod; horizontal axis: time).
Complete time period: January, 1993 through April, 2006
LAX = 4469.6784 + 4.8084317 time

Summary of Fit
RSquare                       0.109664
RSquare Adj                   0.104029
Root Mean Square Error        636.798
Mean of Response              4856.757
Observations (or Sum Wgts)    160

Analysis of Variance
Source      DF    Sum of Squares    Mean Square    F Ratio    Prob > F
Model        1           7891665        7891665    19.4610    <.0001
Error      158          64070855         405512
C. Total   159          71962519

Parameter Estimates
Term         Estimate     Std Error    t Ratio    Prob>|t|
Intercept    4469.6784    101.1604     44.18      <.0001
time         4.8084317    1.089986     4.41       <.0001
Following is the SAS 9.1 code, ODS and regular output for performing the Chow test at observation 105.
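ods html;                /* send the procedure output to the HTML destination */
ods graphics on;         /* request ODS statistical graphics */
proc autoreg data = LAX_data;
   /* regress traffic on the time index; Chow test with the break at observation 105 */
   model traffic = time / chow = (105);
run;
ods graphics off;
ods html close;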
The SAS System
The AUTOREG Procedure

Dependent Variable    traffic

Ordinary Least Squares Estimates
SSE                 64070854.5    DFE                      158
MSE                     405512    Root MSE           636.79804
SBC                 2528.26289    AIC               2522.11254
Regress R-Square        0.1097    Total R-Square        0.1097
Durbin-Watson           0.7284

Structural Change Test
Test    Break Point    Num DF    Den DF    F Value    Pr > F
Chow            105         2       156      65.41    <.0001

Variable     DF    Estimate    Standard Error    t Value    Approx Pr > |t|
Intercept     1        4470          101.1604      44.18    <.0001
time          1      4.8084            1.0900       4.41    <.0001
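As a check on the reported F value, the Chow statistic can be worked by hand from figures already given in this paper: the error sums of squares from the JMP output for the two subset regressions (20501738 and 14347175) and for the complete time period (64070855), with $k = 2$ (intercept and slope), $n_1 = 104$ and $n_2 = 56$. This is a worked illustration only, not additional output:

$$\text{Chow statistic} = \frac{(64070855 - 20501738 - 14347175)/2}{(20501738 + 14347175)/(104 + 56 - 2 \cdot 2)} = \frac{29221942/2}{34848913/156} \approx \frac{14610971}{223390.5} \approx 65.41$$

The result agrees with the F Value of 65.41 on 2 and 156 degrees of freedom reported by PROC AUTOREG.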
Following is the JMP regression output for Traffic = f(time, post911). The coefficient on the dummy variable, post911, indicates that,
ceteris paribus, the 911 attack reduced the volume of traffic by 1585.544 thousand passengers per month.
Summary of Fit Using 911 Dummy variable
RSquare                       0.513364
RSquare Adj                   0.507164
Root Mean Square Error        472.2866
Mean of Response              4856.757
Observations (or Sum Wgts)    160

Analysis of Variance
Source      DF    Sum of Squares    Mean Square    F Ratio    Prob > F
Model        2          36942939       18471469    82.8114    <.0001
Error      157          35019581      223054.65
C. Total   159          71962519

Parameter Estimates
Term         Estimate     Std Error    t Ratio    Prob>|t|
Intercept    3935.6792    88.42163     44.51      <.0001
time         18.335631    1.434733     12.78      <.0001
post911      -1585.544    138.9317     -11.41     <.0001
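Read as a pair of trend lines, these estimates imply a common slope before and after the break, with a lower intercept afterwards. The lines below assume that post911 is coded 0 before September, 2001 and 1 from that month onward, a coding the paper does not spell out:

$$\text{post911} = 0: \quad \widehat{\text{Traffic}} = 3935.6792 + 18.335631\,\text{time}$$
$$\text{post911} = 1: \quad \widehat{\text{Traffic}} = (3935.6792 - 1585.544) + 18.335631\,\text{time} = 2350.1352 + 18.335631\,\text{time}$$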
Following is the JMP regression output for Traffic = f(time, post911, post911interaction). The coefficient on the post911interaction
variable indicates that the slope of the regression of traffic on time decreased in the post-911 subset of the data, yet the p-value of 0.3834 indicates that this reduction is not statistically significant.
Summary of Fit Using 911 Dummy variable with interaction
RSquare                       0.515735
RSquare Adj                   0.506422
Root Mean Square Error        472.642
Mean of Response              4856.757
Observations (or Sum Wgts)    160

Parameter Estimates
Term                  Estimate     Std Error    t Ratio    Prob>|t|
Intercept             3909.6486    93.36523     41.87      <.0001
time                  18.831451    1.543807     12.20      <.0001
post911               -1138.619    529.8854     -2.15      0.0332
post911interaction    -3.672384    4.201509     -0.87      0.3834
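The paper does not show how the two dummy variables were built. The sketch below assumes post911 equals 1 from observation 105 onward and post911interaction equals post911 multiplied by time. That coding is consistent with the estimates above, since 3909.6486 - 1138.619 and 18.831451 - 3.672384 reproduce the intercept and slope of the second subset regression (2771.03 and 15.159067). The data set name LAX_dummies and the statement labels are hypothetical; LAX_data, traffic and time are taken from the PROC AUTOREG step in this paper.

```sas
/* Sketch only: the coding of the two dummy variables is an assumption */
/* inferred from the reported estimates, not taken from the paper.     */
data LAX_dummies;
   set LAX_data;                          /* traffic and time (1..160) */
   post911 = (time >= 105);               /* 1 from September 2001 onward, else 0 */
   post911interaction = post911 * time;   /* allows the slope to change after the break */
run;

proc reg data=LAX_dummies;
   /* intercept-shift and slope-shift model */
   ShiftSlope: model traffic = time post911 post911interaction;
   /* joint test that neither the intercept nor the slope changed */
   ChowCheck: test post911 = 0, post911interaction = 0;
run;
quit;
```

Under ordinary least squares, the joint F test of the two dummy terms is algebraically the same as the Chow test at the chosen break point, so the TEST statement offers an independent check on the CHOW= result from PROC AUTOREG.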
In addition to exploring the time series with regression analysis, ARIMA methods were used for estimation and
forecasting. The SAS System comes with an extremely useful tool called the Time Series Forecasting System (TSFS). The
TSFS may be used to fit numerous pre-selected and other models based on a user-selected criterion such as R-square
or mean square forecast error. The following results are for the ARIMA model which was selected by the TSFS as the
best model, using the root-mean-square error as the criterion. Figure 4 illustrates the forecasting plot derived by
using the ARIMA model selected by the TSFS.
Figure 4. Forecasting plot from the ARIMA model selected by the TSFS (vertical axis: Predicted Value; horizontal axis: Row).
Following is the ARIMA model output selected by the Time Series Forecasting System.
Model Summary
DF                                     142
Sum of Squared Errors                  6715589.76
Variance Estimate                      47292.8856
Standard Deviation                     217.469275
Akaike's 'A' Information Criterion     1587.23791
Schwarz's Bayesian Criterion           1602.19008
RSquare                                0.88348054
RSquare Adj                            0.8801983
-2LogLikelihood                        1587.57794
Stable                                 Yes
Invertible                             Yes

Parameter Estimates
Term      Factor    Lag    Estimate      Std Error    t Ratio    Prob>|t|
AR1,1     1         1      -0.423738     0.2075185    -2.04      0.0430
AR1,2     1         2      0.39151725    0.197103     1.99       0.0489
MA1,1     1         1      -0.2021262    0.173526     -1.16      0.2460
MA1,2     1         2      0.66064239    0.1634116    4.04       <.0001
MA2,12    2         12     0.74626543    0.06976      10.70      <.0001
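For readers who want to fit a model of this shape in code rather than through the TSFS menus, a PROC ARIMA sketch follows. The autoregressive lags (1 and 2) and the moving-average factors (lags 1 and 2, and lag 12) are read from the parameter table above. The differencing orders and the omission of a constant are assumptions: the paper does not state them, although 142 residual degrees of freedom would be consistent with 160 observations less 13 lost to one regular and one seasonal (lag 12) difference and 5 estimated parameters. The output data set name LAX_forecast and the 12-month horizon are hypothetical.

```sas
/* Sketch only: differencing (1,12), NOINT and METHOD=ML are assumptions; */
/* the paper reports the selected model but not these settings.           */
proc arima data=LAX_data;
   /* one regular and one seasonal (lag 12) difference of the series */
   identify var=traffic(1,12) nlag=24;
   /* AR lags 1-2; MA factor with lags 1-2 times a seasonal MA factor at lag 12 */
   estimate p=2 q=(1 2)(12) noint method=ml;
   /* forecast 12 months ahead and save the results */
   forecast lead=12 out=LAX_forecast;
run;
quit;
```

Estimates from this step need not match the table digit for digit, since the estimation method and sign conventions may differ from those behind the reported output.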
CONCLUSION
The Chow test performed on the LAX airline passenger traffic data indicates that there was a structural break at the
105th observation, September, 2001. Since the dummy interaction variable, post911interaction, is not statistically
significant (p-value = 0.3834), we may also conclude that this intervention was abrupt with a permanent duration.
Follow-up decomposition studies also indicate that the seasonal pattern of the time series has remained the same.
Thus we find that the various procedures available in SAS/ETS allow the researcher to perform graphically pleasing,
sophisticated, state-of-the-art time series analysis with a short learning curve.
PROC AUTOREG SYNTAX:
PROC AUTOREG options;
BY variables;
MODEL dependent = regressors / options;
HETERO variables / options;
RESTRICT equation, ...,equation;
TEST equation, ...,equation/ option;
OUTPUT OUT = SAS data set options;
COPYRIGHT INFORMATION
SAS and JMP are registered trademarks of the SAS Institute, Inc. in the USA and other countries. ® indicates USA
registration. Other brand or product names are registered trademarks or trademarks of their respective companies.
REFERENCES
Chow, Gregory, "Tests of Equality Between Sets of Coefficients in Two Linear Regressions", Econometrica, vol.
28, no. 3, 1960, pp. 591-605.
Hansen, Bruce E., "The New Econometrics of Structural Change: Dating Breaks in U.S. Labor Productivity", The
Journal of Economic Perspectives, Volume 15, Number 4, Fall 2001 pp. 117-128.
Gujarati, Damodar (2003), Basic Econometrics, Fourth Edition, New York: McGraw-Hill Irwin, Inc.
Los Angeles World Airports website: http://www.lawa.org/lax/volTraffic.cfm.
SAS Institute Inc. (1991 and 1993), SAS/ETS Software: Applications Guides 1 and 2, Version 6, First Edition, Cary,
N.C.: SAS Institute Inc.
CONTACT INFORMATION
Joseph Earley
Loyola Marymount University
Los Angeles, California 90045-2659
Work Phone: 310-338-1887
Fax: 310-338-1950
E-mail: [email protected]