Detecting Structural Change Using SAS®/ETS Procedures

Archie J. Calise, Queensborough College of the City University of New York
Joseph Earley, Loyola Marymount University, Los Angeles

All great natures delight in stability ... Emerson

ABSTRACT

This paper illustrates how the SAS System may be used to test for structural change in a time series. PROC AUTOREG with the CHOW option is used to perform a Chow test for structural stability on airline passenger travel time series data before and after the September 11, 2001 terrorist incident. The paper illustrates the ease of use of this procedure in investigating the structural change of a time series. Also illustrated in the paper is the Time Series Forecasting System (TSFS), which is available from the menu path Solutions > Analysis > Time Series Forecasting System. The TSFS may be used to fit numerous pre-selected and other models based on a user-selected criterion such as R-square or mean square forecast error.

INTRODUCTION

The analysis of time series is an important statistical methodology with principal contributions developed by academic disciplines such as biometrics, economics and sociology. Recent developments in ARIMA modeling, spectral analysis and X12 methodology have added immensely to the usefulness of the methodology. This paper illustrates how a time series may be examined for structural stability using an application of the F-test called the Chow test.

Figure 1 is a time series of monthly total passenger traffic volume from the Los Angeles International Airport from January 1993 to April 2006. The 105th observation, the monthly passenger traffic for September 2001, shows a marked numerical drop from 6,624,720 to 3,593,455. This of course may be attributed to the terrorist event of September 11 of that year. This paper uses the SAS/ETS procedure PROC AUTOREG with the CHOW option to test whether or not there was a change in the structural stability of this time series.
The following question is addressed: Was the monthly drop at the 105th observation a one-time phenomenon, or did the terrorism event permanently affect the evolution of the time series?

[Figure 1: LAX monthly passenger traffic (vertical axis, LAX) plotted against the time index (horizontal axis, time)]

REGRESSION ANALYSIS

Regression analysis is the study of the relationship between a dependent variable, Y, and one or more independent variables, the X's. The SAS System contains numerous procedures which may be used to estimate regression equations. A linear regression equation may be expressed as:

$$ Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + \mu_i $$

where
   Yi   is the dependent variable
   Xi   are the independent variables
   βi   are the regression coefficients
   μi   is the error term or residual

Regression analysis allows the researcher to determine the influence which each respective independent variable has on the dependent variable, ceteris paribus. In addition, a correctly specified regression model allows for the use of numerous statistical tests and summary statistics, such as R-square, which indicate how well the model fits the data. For most of these tests to be statistically correct, there are a number of implicit assumptions imposed on the model which must be satisfied (Gujarati, 2003). These model assumptions should be tested for validity. Depending on the results of these tests, there are a variety of econometric methods which may be used to deal with assumption violations.

For this study, the regressor used is the index of time. A simple linear regression model was estimated for LAX passenger traffic data for several time windows: pre-911, post-911 and the entire data set from January 1993 through April 2006.
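As an illustration of the simple trend regression described above, the following is a minimal Python sketch (not the SAS or JMP code used in this paper) that fits Y = β0 + β1·time by ordinary least squares, with the time index running 1, 2, ..., n. The function name `fit_time_trend` and the sample values are hypothetical and are not the LAX data.

```python
def fit_time_trend(y):
    """Fit y_t = b0 + b1 * t by ordinary least squares, with t = 1..n.

    Returns (b0, b1, sse, r_square). Assumes y is not constant.
    """
    n = len(y)
    t = range(1, n + 1)                      # index of time used as the regressor
    t_mean = sum(t) / n
    y_mean = sum(y) / n

    # Closed-form OLS estimates for a single regressor
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b1 = sxy / sxx
    b0 = y_mean - b1 * t_mean

    # Residual sum of squares and R-square
    sse = sum((yi - (b0 + b1 * ti)) ** 2 for ti, yi in zip(t, y))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    r_square = 1.0 - sse / sst
    return b0, b1, sse, r_square


# Hypothetical monthly values (illustration only, not the LAX series)
traffic = [4000.0, 4030.0, 4010.0, 4060.0, 4075.0, 4100.0]
b0, b1, sse, r_square = fit_time_trend(traffic)
print(f"intercept={b0:.3f} slope={b1:.3f} SSE={sse:.3f} R-square={r_square:.4f}")
```

Applying the same function to each time window separately would give the subset intercepts, slopes and error sums of squares that a structural stability test compares.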
CHOW TEST FOR STRUCTURAL STABILITY [1]

To test the stability of the relationship between a dependent variable and the explanatory regressor (in our example, time), the MODEL statement of PROC AUTOREG includes an option called CHOW, which allows the researcher to specify the potential breakpoint to be tested. If there is no structural change, we would expect the residual sum of squares from a regression using the entire data set not to differ materially from the combined residual sums of squares from two regressions, one on each subset of the data. A large difference would indicate that there has been a break in the data, i.e. a structural change has occurred.

From a statistical perspective, the null hypothesis for the Chow test is that the subset regression coefficient vectors, β1 and β2 (here each holds an intercept and a slope), are equal, and thus the subsets can be viewed as one data set. The alternative is that the intervention has changed the nature of the relationship.

$$ H_0: \beta_1 = \beta_2 $$

conditioned on the equality of the sample error variances, where the two subset regressions are:

$$ y_1 = X_1 \beta_1 + u_1 \quad \text{with } n_1 \text{ observations} $$
$$ y_2 = X_2 \beta_2 + u_2 \quad \text{with } n_2 \text{ observations} $$

The Chow statistic is:

$$ F = \frac{(u'u - u_1'u_1 - u_2'u_2)/k}{(u_1'u_1 + u_2'u_2)/(n_1 + n_2 - 2k)} $$

where u is the residual vector for the entire data set regression, u1 and u2 are the residual vectors for the subset regressions, and k is the number of estimated coefficients in each subset regression, including the intercept (k = 2 here). Chow showed that the above statistic follows an F distribution with k degrees of freedom in the numerator and (n1 + n2 - 2k) degrees of freedom in the denominator.

Figure 2 illustrates the trend lines for the two subset regressions. Casual observation indicates that a downward shift has occurred, but whether or not the slopes have remained the same is not readily apparent.
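The Chow statistic can be checked by hand from the error sums of squares reported in the JMP regression output in this paper (complete period: 64070855; first period: 20501738 with 104 observations; second period: 14347175 with 56 observations). The following Python sketch is only an arithmetic cross-check, not part of the SAS workflow; the function name `chow_f` is hypothetical.

```python
def chow_f(sse_pooled, sse_1, sse_2, n1, n2, k):
    """Chow F statistic from residual sums of squares.

    sse_pooled : u'u, from the regression on the entire data set
    sse_1      : u1'u1, from the first subset regression
    sse_2      : u2'u2, from the second subset regression
    k          : number of estimated coefficients per regression
    """
    sse_split = sse_1 + sse_2
    num_df = k
    den_df = n1 + n2 - 2 * k
    f = ((sse_pooled - sse_split) / num_df) / (sse_split / den_df)
    return f, num_df, den_df


# Error sums of squares taken from the regression output reported in this paper
f, num_df, den_df = chow_f(
    sse_pooled=64070855,   # January 1993 through April 2006
    sse_1=20501738,        # January 1993 through August 2001
    sse_2=14347175,        # September 2001 through April 2006
    n1=104,
    n2=56,
    k=2,                   # intercept and slope
)
print(f"F = {f:.2f} with ({num_df}, {den_df}) degrees of freedom")
# prints: F = 65.41 with (2, 156) degrees of freedom
```

The result agrees with the F value of 65.41 on 2 and 156 degrees of freedom that PROC AUTOREG reports for the break point at observation 105.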
[Figure 2: trend lines for the two subset regressions; LAXmod (vertical axis) plotted against time (horizontal axis)]

[1] Equations for this section follow the nomenclature of SAS/ETS Chapter 10, The AUTOREG Procedure, p. 441.

Following are the regression equations and summary statistics for the two subset regressions using JMP® visualization software from the SAS Institute.

First time period: January 1993 through August 2001

LAX = 3909.6486 + 18.831451 time

Summary of Fit
   RSquare                       0.618506
   RSquare Adj                   0.614765
   Root Mean Square Error        448.3274
   Mean of Response              4898.3
   Observations (or Sum Wgts)    104

Analysis of Variance
   Source      DF    Sum of Squares    Mean Square    F Ratio     Prob > F
   Model         1         33238864       33238864    165.3696    <.0001
   Error       102         20501738      200997.44
   C. Total    103         53740602

Parameter Estimates
   Term         Estimate     Std Error    t Ratio    Prob>|t|
   Intercept    3909.6486    88.56214     44.15      <.0001
   time         18.831451    1.464387     12.86      <.0001

Second time period: September 2001 through April 2006

LAX = 2771.0301 + 15.159067 time

Summary of Fit
   RSquare                       0.189842
   RSquare Adj                   0.174839
   Root Mean Square Error        515.4497
   Mean of Response              4779.606
   Observations (or Sum Wgts)    56

Analysis of Variance
   Source      DF    Sum of Squares    Mean Square    F Ratio     Prob > F
   Model         1          3361935        3361935    12.6537     0.0008
   Error        54         14347175         265688
   C. Total     55         17709110

Parameter Estimates
   Term         Estimate     Std Error    t Ratio    Prob>|t|
   Intercept    2771.0301    568.8366     4.87       <.0001
   time         15.159067    4.261516     3.56       0.0008

Figure 3 illustrates the regression line for the entire time series.

[Figure 3: regression line for the entire time series; LAXmod (vertical axis) plotted against time (horizontal axis)]

Complete time period: January 1993 through April 2006

LAX = 4469.6784 + 4.8084317 time

Summary of Fit
   RSquare                       0.109664
   RSquare Adj                   0.104029
   Root Mean Square Error        636.798
   Mean of Response              4856.757
   Observations (or Sum Wgts)    160

Analysis of Variance
   Source      DF    Sum of Squares    Mean Square    F Ratio     Prob > F
   Model         1          7891665        7891665    19.4610     <.0001
   Error       158         64070855         405512
   C. Total    159         71962519

Parameter Estimates
   Term         Estimate     Std Error    t Ratio    Prob>|t|
   Intercept    4469.6784    101.1604     44.18      <.0001
   time         4.8084317    1.089986     4.41       <.0001

Following are the SAS 9.1 code and the resulting output for performing the Chow test at observation 105. The ODS output and the regular listing output report the same values.

    /* Route output to HTML and enable ODS graphics */
    ods html;
    ods graphics on;

    /* Regress traffic on the time index; test for a break at observation 105 */
    proc autoreg data = LAX_data;
       model traffic = time / chow = (105);
    run;

    ods graphics off;
    ods html close;

The SAS System
The AUTOREG Procedure
Dependent Variable: traffic

Ordinary Least Squares Estimates
   SSE                64070854.5    DFE                     158
   MSE                    405512    Root MSE          636.79804
   SBC                2528.26289    AIC              2522.11254
   Regress R-Square       0.1097    Total R-Square       0.1097
   Durbin-Watson          0.7284

Structural Change Test
   Test    Break Point    Num DF    Den DF    F Value    Pr > F
   Chow    105                 2       156      65.41    <.0001

Parameter Estimates
   Variable     DF    Estimate    Standard Error    t Value    Approx Pr > |t|
   Intercept     1        4470          101.1604      44.18    <.0001
   time          1      4.8084            1.0900       4.41    <.0001

Following is the JMP regression output for Traffic = f(time, post911). The coefficient on the dummy variable, post911, indicates that, ceteris paribus, the 911 attack reduced the volume of traffic by 1585.544 thousand passengers per month.

Summary of Fit Using 911 Dummy Variable
   RSquare                       0.513364
   RSquare Adj                   0.507164
   Root Mean Square Error        472.2866
   Mean of Response              4856.757
   Observations (or Sum Wgts)    160

Analysis of Variance
   Source      DF    Sum of Squares    Mean Square    F Ratio     Prob > F
   Model         2         36942939       18471469    82.8114     <.0001
   Error       157         35019581      223054.65
   C. Total    159         71962519

Parameter Estimates
   Term         Estimate     Std Error    t Ratio    Prob>|t|
   Intercept    3935.6792    88.42163     44.51      <.0001
   time         18.335631    1.434733     12.78      <.0001
   post911      -1585.544    138.9317     -11.41     <.0001

Following is the JMP regression output for Traffic = f(time, post911, post911interaction). The coefficient on the post911interaction variable indicates that the slope of the regression of traffic on time has decreased in the post911 subset of the data, yet the p-value of 0.3834 indicates that this reduction is not statistically significant.

Summary of Fit Using 911 Dummy Variable with Interaction
   RSquare                       0.515735
   RSquare Adj                   0.506422
   Root Mean Square Error        472.642
   Mean of Response              4856.757
   Observations (or Sum Wgts)    160

Parameter Estimates
   Term                  Estimate     Std Error    t Ratio    Prob>|t|
   Intercept             3909.6486    93.36523     41.87      <.0001
   time                  18.831451    1.543807     12.20      <.0001
   post911               -1138.619    529.8854     -2.15      0.0332
   post911interaction    -3.672384    4.201509     -0.87      0.3834

In addition to exploring the time series with regression analysis, ARIMA methods were used for estimation and forecasting. The SAS System comes with an extremely useful tool called the Time Series Forecasting System (TSFS). The TSFS may be used to fit numerous pre-selected and other models based on a user-selected criterion such as R-square or mean square forecast error. The following results are for the ARIMA model which was selected by the TSFS as the best model, using the root-mean-square error as the criterion. Figure 4 illustrates the forecasting plot derived by using the ARIMA model selected by the TSFS.

[Figure 4: forecasting plot for the selected ARIMA model; Predicted Value (vertical axis) plotted against Row (horizontal axis)]

Following is the ARIMA model output selected by the Time Series Forecasting System.
Model Summary
   DF                                     142
   Sum of Squared Errors                  6715589.76
   Variance Estimate                      47292.8856
   Standard Deviation                     217.469275
   Akaike's 'A' Information Criterion     1587.23791
   Schwarz's Bayesian Criterion           1602.19008
   RSquare                                0.88348054
   RSquare Adj                            0.8801983
   -2LogLikelihood                        1587.57794
   Stable                                 Yes
   Invertible                             Yes

Parameter Estimates
   Term      Factor    Lag    Estimate      Std Error    t Ratio    Prob>|t|
   AR1,1     1         1      -0.423738     0.2075185    -2.04      0.0430
   AR1,2     1         2      0.39151725    0.197103     1.99       0.0489
   MA1,1     1         1      -0.2021262    0.173526     -1.16      0.2460
   MA1,2     1         2      0.66064239    0.1634116    4.04       <.0001
   MA2,12    2         12     0.74626543    0.06976      10.70      <.0001

CONCLUSION

The Chow test performed on the LAX airline passenger traffic data indicates that there was a structural break at the 105th observation, September 2001. Since the dummy interaction variable, post911interaction, is not statistically significant (p-value = 0.3834), we may also conclude that this intervention was abrupt with a permanent duration. Follow-up decomposition studies also indicate that the seasonal pattern of the time series has remained the same. Thus we find that the various procedures available in SAS/ETS allow the researcher to perform graphically pleasing, sophisticated, state-of-the-art time series analysis with a short learning curve.

PROC AUTOREG SYNTAX:

    PROC AUTOREG options;
       BY variables;
       MODEL dependent = regressors / options;
       HETERO variables / options;
       RESTRICT equation, ..., equation;
       TEST equation, ..., equation / option;
       OUTPUT OUT = SAS data set options;

COPYRIGHT INFORMATION

SAS and JMP are registered trademarks of the SAS Institute, Inc. in the USA and other countries. ® indicates USA registration. Other brand or product names are registered trademarks or trademarks of their respective companies.

REFERENCES

Chow, Gregory, "Tests of Equality Between Sets of Coefficients in Two Linear Regressions", Econometrica, vol. 28, no. 3, 1960, pp. 591-605.

Hansen, Bruce E., "The New Econometrics of Structural Change: Dating Breaks in U.S. Labor Productivity", The Journal of Economic Perspectives, vol. 15, no. 4, Fall 2001, pp. 117-128.

Gujarati, Damodar (2003), Basic Econometrics, Fourth Edition, New York: McGraw-Hill Irwin, Inc.

Los Angeles World Airports website: http://www.lawa.org/lax/volTraffic.cfm.

SAS Institute Inc. (1991 and 1993), SAS/ETS Software: Applications Guides 1 and 2, Version 6, First Edition, Cary, N.C.: SAS Institute Inc.

CONTACT INFORMATION

Joseph Earley
Loyola Marymount University
Los Angeles, California 90045-2659
Work Phone: 310-338-1887
Fax: 310-338-1950
E-mail: [email protected]