Regression Applying on Forecast of Time Series and Its Solution on Web Pages: Take the Application on Economy Prediction in Fujian as an Example

DING Yuechao
Computer Engineering College, Jimei University, Xiamen 361021, China

SUN Yang
Computer Engineering College, Jimei University, Xiamen 361021, China

Abstract: Regression analysis is the statistical method that determines the dependency among two or more variables. In general, the data analyzed with regression are not time series. This paper summarizes the forecasting performance achieved for the Fujian economy since shifting regression (2005) and trend regression (2008) were put forward and applied to time series prediction. It also introduces a web-based implementation of forecast-type regression; the software has been put on the Internet for free use. Provided the user supplies the data to be analyzed and follows the operating steps, the future value of any time series variable can be predicted. Several economic indicators of Fujian for the following two years are forecast, all of which remain to be verified by the actual economic results.

Keywords: forecast; time series; regression; web page

I. INTRODUCTION

Forecasting means predicting what will happen in the future on the basis of accurate statistical data, combining scientific methods with a view of history, actual conditions and objective laws. The data generated by economic activity are time series data, and the future values of a time series can be predicted. Time series data change with time and are therefore a function of time. Time series analysis aims to discover the pattern by which an indicator changes over time. In general, time series smoothing methods are applied to forecast a single indicator.
These methods include the moving average, exponential smoothing, differenced exponential smoothing, adaptive filtering, the linear model, the polynomial model, the exponential curve model, the modified exponential curve model, the growth curve model and seasonal variation [1] [2].

However, a time series is the combined result of multiple factors. With simple regression, neither the linear nor the curvilinear extrapolation of a single factor can accurately predict the new value. With multivariate regression [3], the value of the dependent variable can be computed only when the independent variables of the same period are known. In general, the values of the independent variables and of the dependent variable become available at the same time, so there is no need to predict a dependent variable whose value is already known. Moreover, regression analysis does not take the order of the samples into account; in other words, ordinary regression has no notion of sequence. Ordinary multivariate regression therefore has to be adapted before it can be used for prediction or trend analysis.

In 2005, we put forward a new method called “Multi-factor Forward Shifting Regression” and published the paper “A New Method: Forward Shifting Regression and Its Application in Prediction of Per Capita GDP in Fujian” [4], in which the macro-economic indicators of Fujian were predicted. In another paper, “Some Statistic Characteristics of Economy in Fujian Province and its Developing Trend” [5], ten economic indicators of Fujian for the following two years were forecast and some statistical characteristics were analyzed. The latter was reprinted in “Statistics and actuary”, a periodical in the Renmin University reprint series that is regarded as authoritative in China. When the economic data for the following two years came out, the forecasts proved satisfactory.
In 2006 in Fujian, the Gross output Value of Agriculture, Forestry, Animal husbandry and Fishery (GVAFAF) was 1535 (1450) 100M yuan, where the predicted value is given outside the brackets and the actual value inside, here and below; the Added Value of Industry (AVI) was 3471 (3312) 100M yuan; the total Retail Sales of Consumer Goods (RSCG) was 2597 (2704) 100M yuan; the Freight Ton-Kilometers (FTK) was 1927 (1904) 100M t-km; the Passenger-Kilometers (PK) was 540 (525) 100M p-km; the per Capita Annual Income of Rural Households (CAIRH) was 4869 (4835) yuan; the Per Capita GDP (PCGDP) was 21432 (21385) yuan. The forecasts made two years ahead were also satisfactory. In 2007 in Fujian, the GVAFAF was 1713 (1692) 100M yuan; the AVI was 3986 (4018) 100M yuan; the RSCG was 2993 (3188) 100M yuan; the FTK was 2240 (2084) 100M t-km; the PK was 621 (588) 100M p-km; the CAIRH was 5504 (5467) yuan; the PCGDP was 25346 (25908) yuan.

In 2008, the shifting regression method was improved and a new method, “Multi-factor Trend Regression”, was put forward. In the new method, the forecasts obtained from the last several periods are weighted to predict the next period. In recent years, both “Forward Shifting Regression” and “Multi-factor Trend Regression” have given satisfactory results when applied to about a dozen province-level regions in China.

II. FORECAST METHOD APPLYING REGRESSION ANALYSIS

A. Multivariate linear regression

When there are multiple independent variables, linear regression is called multivariate linear regression.
Suppose $x_1, x_2, \dots, x_p$ are measurable or controllable variables and the variable $y$ has a linear relation with them. After $n$ experiments, $n$ groups of data are collected: $(y_i, x_{i1}, x_{i2}, \dots, x_{ip})$, $i = 1, 2, \dots, n$. The multiple regression model can be written as [3]

\[
\begin{aligned}
y_1 &= b_0 + b_1 x_{11} + b_2 x_{12} + \dots + b_p x_{1p} + \varepsilon_1 \\
y_2 &= b_0 + b_1 x_{21} + b_2 x_{22} + \dots + b_p x_{2p} + \varepsilon_2 \\
&\;\;\vdots \\
y_n &= b_0 + b_1 x_{n1} + b_2 x_{n2} + \dots + b_p x_{np} + \varepsilon_n
\end{aligned}
\]

where $b_0, b_1, b_2, \dots, b_p$ are $p+1$ undetermined parameters, the regression coefficients, and $\varepsilon_i$ denotes the influence of random factors on $y_i$ during the tests, which is usually ignored. Solving the system of linear equations gives the values of $b_0, b_1, b_2, \dots, b_p$, and the $p$-variable regression equation is

\[
y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p
\]

The goal of establishing a regression equation is to use it for forecasting and control. In reality, the relationship between the random variable $y$ and $x_1, x_2, \dots, x_p$ cannot be determined in advance. The linear regression model is only an assumption made before the regression equation is solved, so it needs to be tested statistically.

Regression analysis does not consider the order of time. The value of the dependent variable is obtained merely from the linear expression of the influencing factors. Furthermore, the values of some dependent variables, such as economic indicators, can be computed only if the values of the other variables in the same period are known. This is therefore not forecasting in the strict sense: a true forecast produces the values before the actual events occur.

B. Shifting regression

Based on the view that the values of the indicators at one time are the foundation of the next value of the indicator to be predicted, the dependent variable at a later time is treated as a function of all the independent variables at the previous time, and the regression model becomes [4]

\[
\begin{aligned}
y_2 &= b_0 + b_1 x_{11} + b_2 x_{12} + \dots + b_p x_{1p} + \varepsilon_1 \\
y_3 &= b_0 + b_1 x_{21} + b_2 x_{22} + \dots + b_p x_{2p} + \varepsilon_2 \\
&\;\;\vdots \\
y_n &= b_0 + b_1 x_{m1} + b_2 x_{m2} + \dots + b_p x_{mp} + \varepsilon_m
\end{aligned}
\]

where $m = n - 1$. The other parameters are the same as in the regression model above. This model has one equation fewer than the previous one, that is, one fewer sample is used in the fit, which has little effect on the solution when a large number of samples are available. The independent-variable values of the last sample (period $n$) are not used in fitting; instead, they are substituted into the fitted regression equation to predict the dependent variable of the coming period, $y_{n+1}$. This forecast method is called “Multi-factor Forward Shifting Regression”.

By analogy with shifting 1 period forward, the dependent variable can also be shifted 2 or 3 periods forward. If the equation is built with a 2-period shift, the last-but-one sample (from period $n-1$) is substituted into it to predict the value $y_{n+1}$ in period $n+1$. If the equation is built with a 3-period shift, the last-but-two sample (from period $n-2$) is substituted into it to predict $y_{n+1}$. Because shifting by 1, 2 and 3 periods reflects the developing trend of the multiple time series, this method is called “Multi-factor Trend Regression”. It produces three predicted values of $y_{n+1}$, whose average can serve as the final predicted value. A weighted average can also be used: the predictions of the models shifted 1, 2 and 3 periods forward may be given the weights 0.5, 0.3 and 0.2 respectively.
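The forward-shifting and trend-regression steps described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the authors' JavaEE software: the function names (`solve_linear`, `fit_ols`, `shifted_forecast`, `trend_forecast`) and the toy data are hypothetical, every independent variable is kept in the model (no variable selection is performed), and the coefficients are obtained from the normal equations in plain Python so that the sketch has no dependencies.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Return [b0, b1, ..., bp] fitted by least squares (normal equations)."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 carries the intercept b0
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


def shifted_forecast(x: Sequence[Sequence[float]], y: Sequence[float], shift: int) -> float:
    """Regress y of period t + shift on x of period t, then forecast y of period n + 1.

    The sample substituted into the fitted equation is the one from
    period n - shift + 1, as in the text (shift 1 uses the last sample).
    """
    coef = fit_ols(x[:-shift], y[shift:])
    sample = x[len(x) - shift]
    return coef[0] + sum(b * v for b, v in zip(coef[1:], sample))


def trend_forecast(x, y, weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted average of the forecasts obtained with shifts 1, 2, 3, ..."""
    return sum(w * shifted_forecast(x, y, k) for k, w in enumerate(weights, start=1))


# Hypothetical toy data: 8 periods, 2 independent variables, y_t = 10 + 3 t.
x = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5], [7, 8], [8, 7]]
y = [13, 16, 19, 22, 25, 28, 31, 34]

for shift in (1, 2, 3):
    print("shift", shift, round(shifted_forecast(x, y, shift), 6))
print("trend", round(trend_forecast(x, y), 6))
```

On this toy series every shifted model fits exactly, so the three forecasts and their weighted combination all come out at about 37, the next value of the line. Real economic series will not fit exactly, and the three forecasts will generally differ, which is where the weighting matters.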
In predicting the next value of the dependent variable, the sample from the nearest period has the greatest impact, which is why the 1-period shift receives the largest weight.

III. THE REALIZATION OF SHIFTING REGRESSION ON WEB

Over the past several years, the new methods we put forward were implemented as desktop programs. Because scholars kept requesting the software, we built two web applications [6] with JavaEE technology and put them online, one of which is “shifting regression” [7]. Open http://210.34.136.253:8088/forecast/ and follow the instructions to process your data and obtain the forecast result. The flow chart is shown in Fig. 1.

Figure 1. Flow chart of shifting regression on web

While the program runs on the web, the interface for selecting the dependent variable and the independent variables is as shown in Fig. 2, which also displays the data. Any variables and samples (records) may be selected. Three methods are offered: ordinary stepwise regression, forward shifting regression and trend regression. Ordinary regression only yields the functional relationship between the dependent and independent variables and cannot predict a time series. Both of the other two methods can predict the value of the dependent variable in the new period.

The data of Fujian province from 1983 to 2009 [8] [9] are processed. The dependent variable $y$ is the per capita GDP (PCGDP). Eleven independent variables $x_1, x_2, \dots, x_{11}$ are selected: GVAFAF, AVI, RSCG, CPI, FTK, PK, FHS (Freight Handled at Seaports), TGR (Total Government Revenue), TIFA (Total Investment in Fixed Assets), CAIRH and NGR (Natural Growth Rate). All the data from 1983 to 2009 are selected.

Figure 2. The interface for selecting the dependent variable, independent variables and samples for the Fujian data

According to the number of samples and variables and the estimated number of variables to be selected into the model, the F-statistic thresholds for the significance tests of entering and removing a variable are determined when running the regression.
Here, $F_1$ and $F_2$ are both set to 4.5. The regression coefficients for shifting the dependent variable 1 period forward were then obtained and are shown in Table I.

TABLE I. COEFFICIENTS OF REGRESSION EQUATION

Coefficient  b0       b1  b2     b3  b4  b5    b6    b7  b8     b9  b10  b11
Value        2700.61  0   2.519  0   0   3.23  7.52  0   5.396  0   0    -199.49

The regression equation obtained is

\[
y = 2700.61 + 2.519\,x_2 + 3.23\,x_5 + 7.52\,x_6 + 5.396\,x_8 - 199.49\,x_{11}
\]

That is:

PCGDP of next year = 2700.61 + 2.519 × AVI of previous year + 3.23 × FTK of previous year + 7.52 × PK of previous year + 5.396 × TGR of previous year - 199.49 × NGR of previous year

The software substituted the 2009 sample data into the equation, and the PCGDP of the next year (2010) turned out to be 35495 yuan. The predicted values of PCGDP from 1984 to 2009 are shown in Table II. The errors in the first decade are comparatively high, reaching about 50% in the first two years. From 1994 to 2009 the errors are small: none exceeds 3.41% and most are about 1%. The results are quite good.

TABLE II. COMPARISON OF ESTIMATED VALUES AND REAL VALUES

Year   Real value (yuan)   Predicted value (yuan)   Error (%)
1984     591                 342                    42.19
1985     737                 348                    52.78
1986     809                 965                    19.27
1987     999                1065                     6.57
1988    1349                1181                    12.43
1989    1589                1629                     2.53
1990    1763                1820                     3.21
1991    2041                2051                     0.49
1992    2557                3072                    20.13
1993    3556                3956                    11.24
1994    5193                5194                     0.02
1995    6526                6688                     2.48
1996    7646                7557                     1.17
1997    8775                8532                     2.77
1998    9603                9276                     3.41
1999   10323               10251                     0.70
2000   11194               11127                     0.59
2001   11892               11865                     0.23
2002   12938               13126                     1.46
2003   14333               14295                     0.26
2004   16469               16558                     0.54
2005   18646               18861                     1.15
2006   21385               21606                     1.03
2007   25908               25361                     2.11
2008   30123               29682                     1.46
2009   33051               33589                     1.63

IV. THE FORECAST OF MACROECONOMIC INDICATORS IN THE FOLLOWING TWO YEARS

There are twelve indicators in the data of Fujian province. In fact, any of them can serve as the dependent variable, with the others serving as independent variables, to forecast its value in 2010. Doing this for every variable gives estimates of all the indicators. This paper uses “Forward Shifting Regression” to forecast the values in 2010. NGR does not vary much from year to year and is therefore not discussed here. Substituting the predicted 2010 values directly into the equation gives predictions for 2011. Alternatively, the predicted 2010 values can be added to the data table and the equation rebuilt to forecast the values in 2011. The values of the economic indicators of Fujian in 2010 and 2011 are predicted and shown in Table III. The values for 2011 are predicted with “Multi-factor Trend Regression”.

TABLE III. THE FORECAST OF ECONOMIC INDICATORS OF FUJIAN PROVINCE IN 2010 AND 2011

Indicator   Unit          2010    2011
GVAFAF      100M yuan     2150    2228
AVI         100M yuan     5423    5730
RSCG        100M yuan     4831    5279
CPI         %               97      99
FTK         100M t-km     2743    2829
PK          100M p-km      697     681
FHS         10000 tons   28681   31810
TGR         100M yuan     2073    2317
TIFA        100M yuan     7539    8942
CAIRH       yuan          6963    7464
PCGDP       yuan         35495   39322

V. CONCLUSIONS

“Multi-factor Forward Shifting Regression” and “Multi-factor Trend Regression” have been implemented as web applications and put on the Internet. The forecasts of economic indicators in Fujian and other provinces over the past years show that the methods are effective. Here, the economic indicators for the next two years are predicted, to be verified by the actual economic results. The methods improve on previous time series forecasting methods, which only extrapolate a single series without taking multiple factors (variables) into consideration. They also overcome the weakness of forecasting by general regression analysis, which relies on independent variables from the same period. An economic indicator is a function of interrelated economic factors. The indicators (independent variables) in one period may affect the indicator (dependent variable) to be predicted in the next period, which is the basis of the new methods. The new methods offer a new approach to economic forecasting.

REFERENCES

[1] QIAN Zhongwei, LI Shengda. Quantitative Method of Economic Forecast. Chongqing Press: Chongqing, 1994, pp. 135-192.
[2] Bruce L. Bowerman, Richard T. O’Connell. Forecasting and Time Series (3rd Ed.). Brooks/Cole and China Machine Press, 2003, pp. 291-427.
[3] James M. Lattin, J. Douglas Carroll, Paul E. Green. Analyzing Multivariate Data. Brooks/Cole and China Machine Press, 2003, pp. 38-80.
[4] SHEN Jun, DING Yuechao, “A New Method of Forward Regression Analysis and Its Application in Prediction of Per Capita GDP in Fujian Province”, Journal of Jimei University (Nature Science), 2006, vol. 11, no. 4, pp. 375-380.
[5] SHEN Jun, DING Yuechao, “Some Statistic Characteristics of Economy Development in Fujian and its Developing Trend”, Journal of Jimei University (Philosophy and Social Science), 2006, vol. 9, no. 4, pp. 38-43.
[6] DING Yuechao, “Announcement on Sharing Two Softwares”, EB/OL, http://210.34.136.253:8488/contribute.htm, 2010-04-18.
[7] DING Yuechao, “Forecast-type Regression (Shifting Regression) software”, http://210.34.136.253:8088/forecast/, 2010-04-18.
[8] Bureau of Statistics of Fujian, Fujian Statistics Yearbook 2009. China Statistics Press, Beijing, 2009.
[9] Bureau of Statistics of Fujian, “Statistical Bulletin 2009 of Fujian National Economy and Society Development”, EB/OL, http://www.fujian.gov.cn/zwgk/tjxx/tjgb/201003/t20100306_197570.htm, 2009-02-25.