A Short-term Forecasting Method for Regional Logistics Demand Based on Support Vector Regression*

Xiao Jianhua 1,2, Tang Jun 1
(1. School of Management, Wuyi University, Jiangmen, Guangdong 529020, China; 2. School of Economic Management, Beihang University, Beijing 100083, China)

Abstract  A short-term forecasting method for regional logistics demand based on support vector regression (SVR) is proposed to address the nonlinearity and fluctuation that complicate regional logistics demand forecasting. Because the method rests on statistical learning theory and kernel mapping, it handles nonlinear data well and therefore generalizes well. Finally, an experiment is conducted to validate the rationality and validity of the proposed method.

Keywords  Support Vector Regression (SVR); Regional Logistics; Forecasting; Kernel Method

1 Introduction

Regional logistics demand forecasting is an important step in regional logistics system planning and in the rational allocation of logistics resources. It also provides decision support for the logistics development policies drawn up by local governments and for the construction of logistics infrastructure. Compared with national macro-level logistics demand, regional logistics demand has its own characteristics. First, the relationship between logistics demand quantity and its influencing indexes is strongly nonlinear. Second, regional logistics demand quantity fluctuates more. Regional logistics demand forecasting is therefore a complex research problem, and Support Vector Regression (SVR) is a promising method for short-term forecasting of regional logistics demand. In this paper, the possible influencing factors are analyzed first, and then a short-term forecasting model for regional logistics demand based on SVR is established.
Applying the new model to Shanghai City indicates that SVR-based short-term forecasting of regional logistics demand is feasible in theory and acceptable in forecasting precision.

* This project was supported by the Guangdong Natural Science Foundation (06029822) and China Postdoctoral Science Foundation (2005038042).

2 Support Vector Regression Model

SVR is one of the main forms of the kernel method and has excellent nonlinear regression capacity [1]. In essence, the kernel method realizes a nonlinear transform from data space to feature space, and different kernel functions realize different nonlinear transforms. Suppose $x_i$ and $x_j$ are two samples in data space. The basis of the kernel method is to replace the dot product of the two vectors as follows:

$$(x_i \cdot x_j) \rightarrow K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) \tag{1}$$

In general, the nonlinear transform function $\phi(\cdot)$ from data space to feature space is complex, but its corresponding kernel function $K(\cdot)$ is relatively simple. Dot products appear in many data processing problems, so the kernel method is widely applicable. Its advantage is that a nonlinear problem that is difficult or ineffective to handle in data space can be handled easily or effectively in feature space. Its attraction is that only the relatively simple kernel function has to be evaluated, even though the mapping from data space to feature space is very complex. In theory, any function that satisfies the Mercer condition can serve as the kernel function in formula (1), for example the polynomial kernel, the RBF kernel and the sigmoid kernel.

For the sample set $(X, y) = \{(X_1, y_1), (X_2, y_2), \cdots, (X_l, y_l)\}$, $(X_i, y_i) \in R^n \times R$, linear multiple regression in essence solves for the weight coefficient $w$ in

$$y = Xw + \sigma \tag{2}$$

so that $\sigma$ is minimal under some measure.
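Returning briefly to formula (1), the identity can be checked numerically. The sketch below uses a homogeneous polynomial kernel of degree 2 on two-dimensional inputs, for which the feature map can be written out explicitly. The function names `poly_kernel` and `phi` and the two sample vectors are illustrative assumptions and do not come from the paper.

```python
from math import sqrt

def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2

def phi(x):
    """Explicit feature map for the kernel above (2-D inputs only)."""
    x1, x2 = x
    return [x1 * x1, sqrt(2.0) * x1 * x2, x2 * x2]

# Illustrative vectors (hypothetical, not taken from the paper's data).
x = [1.0, 2.0]
z = [3.0, 4.0]

k_direct = poly_kernel(x, z)      # kernel evaluated in data space
k_feature = dot(phi(x), phi(z))   # dot product evaluated in feature space
print(k_direct, k_feature)
```

The two printed values agree up to floating-point rounding, which is the content of formula (1): the kernel is computed in data space without ever forming $\phi$.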
Taking the needs of the regression function into account and using the $\varepsilon$-insensitive loss function, the linear regression problem can be transformed into the following optimization problem:

$$\min \Phi(w, \xi^*, \xi) = \frac{1}{2}(w \cdot w) + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \tag{3}$$

In formula (3), $l$ is the number of samples and $C$ is a fixed constant that balances the regression error against the characteristics of the regression function. Formula (3) can be transformed into the following optimization problem [2]:

$$\max W(\alpha, \alpha^*) = \sum_{i=1}^{l} \left[ \alpha_i^* (y_i - \varepsilon) - \alpha_i (y_i + \varepsilon) \right] - \frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j)(x_i \cdot x_j) \tag{4}$$

$$\text{s.t.} \quad 0 \le \alpha_i \le C, \quad 0 \le \alpha_i^* \le C, \quad i = 1, 2, \cdots, l; \qquad \sum_{i=1}^{l} (\alpha_i^* - \alpha_i) = 0 \tag{5}$$

For nonlinear regression, a kernel function can be introduced, and the optimization problem (4) becomes

$$\max W(\alpha, \alpha^*) = \sum_{i=1}^{l} \left[ \alpha_i^* (y_i - \varepsilon) - \alpha_i (y_i + \varepsilon) \right] - \frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j) K(x_i, x_j) \tag{6}$$

The constraints of formula (6) are the same as formula (5). The parameters $\alpha_i$ and $\alpha_i^*$ are obtained by solving formula (6) under the constraints (5). By the KKT conditions, only a few of the $\alpha_i$ and $\alpha_i^*$ are nonzero. The samples $(X_i, y_i)$ corresponding to these $\alpha_i$ and $\alpha_i^*$ are called support vectors. Finally, the regression function is

$$f(x) = \sum_{SVs} (\alpha_i^* - \alpha_i) K(x_i, x) + b \tag{7}$$

where

$$b = -\frac{1}{2} \sum_{SVs} (\alpha_i^* - \alpha_i) \left[ K(x_r, x_i) + K(x_s, x_i) \right] \tag{8}$$

and $x_r$, $x_s$ are any two support vectors, one taken from each of the two classes of support vectors.

Formula (7) is the SVR model. The discussion above shows that it has two advantages. First, it is based on kernel mapping, which ensures its ability to handle nonlinearity. Second, it is based on statistical learning theory, so it takes both the fitting error and the characteristics of the regression function into account, which supports its good generalization ability.
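To make the objective in formula (3) concrete, the following minimal sketch evaluates the slack variables and the objective value for a given linear model. It is an illustration under stated assumptions, not the authors' implementation: the function names, the one-dimensional data and the candidate model are hypothetical, a bias term `b` is included as in formula (7), and $\xi_i$ is taken to measure deviations above the $\varepsilon$-tube and $\xi_i^*$ deviations below it. The values $\varepsilon = 0.01$ and $C = 20$ are the ones the paper reports in Section 3.4.

```python
def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def slack_variables(w, b, X, y, eps):
    """Return (xi, xi_star): deviations above and below the eps-tube."""
    xi, xi_star = [], []
    for x_i, y_i in zip(X, y):
        residual = y_i - (dot(w, x_i) + b)
        xi.append(max(0.0, residual - eps))        # target lies above the tube
        xi_star.append(max(0.0, -residual - eps))  # target lies below the tube
    return xi, xi_star

def primal_objective(w, b, X, y, eps, C):
    """Value of formula (3): 0.5 * (w . w) + C * sum(xi_i + xi_i*)."""
    xi, xi_star = slack_variables(w, b, X, y, eps)
    return 0.5 * dot(w, w) + C * (sum(xi) + sum(xi_star))

# Hypothetical one-dimensional data and a candidate model f(x) = 1.0 * x.
X = [[1.0], [2.0], [3.0]]
y = [1.0, 2.5, 2.0]
w, b = [1.0], 0.0

xi, xi_star = slack_variables(w, b, X, y, eps=0.01)
objective = primal_objective(w, b, X, y, eps=0.01, C=20.0)
print(xi, xi_star, objective)
```

Points whose residual lies inside the tube contribute nothing to the sum, which is why only a subset of the samples ends up as support vectors.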
3 SVR Model Applied in Short-term Forecasting of Regional Logistics Demand

3.1 Analysis of economic factors influencing regional logistics demand

Many economic factors influence regional logistics demand, but the main ones are the scale of the regional economy, the industry structure and the spatial distribution of the economy [3].

First, the overall level and scale of regional economic development are the ultimate determinants of regional logistics demand. The higher the aggregate level of the regional economy, the higher the demand for raw materials, semi-finished goods and finished goods. The faster and more actively the regional economy grows, the faster regional logistics demand grows. If the regional economy stagnates or declines, regional logistics demand weakens or falls. The logistics industry is therefore closely tied to the development of the national economy.

Second, industry structure influences regional logistics demand. Different industry structures markedly affect the functions, level and quantity of logistics demand. Logistics demand generated by agriculture, forestry and animal husbandry is extensive in character: the quantity is large but the added value is low. Logistics demand from industry is moving toward refinement and high added value. The third industry, such as the commodity circulation industry, directly promotes the development of the modern logistics industry.

Finally, the uneven spatial distribution of the regional economy and unbalanced regional development directly generate logistics demand between regions. In China especially, the distribution of energy sources and some natural resources is highly unbalanced. Most natural resources lie in the northeast, northwest, southwest and north, where productivity is relatively low, whereas the central and coastal areas, where population is large and productivity is high, are short of resources.
Therefore, an industrial layout has formed in which basic industries such as raw material extraction and rough processing are located far from the processing industries. As a result, logistics spans very large distances in time and space.

3.2 Data acquisition and preprocessing

Table 1 Statistical Data of Logistics Demand Scale and Economic Factors in Shanghai City (1978-2004)

Year     X1       X2       X3       X4       X5     X6      y
1978  11.00   211.05    50.76    54.10    30.26    442  19645
1979  11.39   221.21    53.83    68.28    38.78    512  19613
1980  10.10   236.10    65.69    80.43    45.06    553  20037
1981  10.58   244.34    69.84    88.73    41.50    585  20150
1982  13.31   249.32    74.44    89.80    38.93    576  21153
1983  13.52   255.32    82.97   100.68    41.40    615  21594
1984  17.26   275.37    98.22   123.72    44.00    726  23121
1985  19.53   325.63   121.59   173.39    51.74   1030  24243
1986  19.69   336.02   135.12   196.84    52.04   1190  26671
1987  21.60   364.38   159.48   225.25    59.96   1298  27241
1988  27.36   433.05   187.89   295.83    72.45   1680  27832
1989  29.63   466.18   200.73   331.38    78.48   1845  27666
1990  32.60   482.68   241.17   333.86    74.31   2009  26777
1991  33.36   551.34   309.07   382.06    80.44   2421  27558
1992  34.16   677.39   402.77   464.82    97.57   2842  29580
1993  38.21   900.33   573.07   624.30   127.32   4162  30293
1994  48.59  1143.24   780.09   770.74   158.67   5343  28585
1995  61.68  1409.85   991.04   970.04   190.25   6712  27571
1996  71.58  1582.50  1248.12  1161.30   222.63   7742  45821
1997  75.80  1754.39  1530.02  1325.21   247.64   8699  45938
1998  78.50  1847.20  1762.50  1471.03   313.44   9202  46230
1999  80.00  1953.98  2000.98  1590.38   386.04  10328  48398
2000  83.20  2163.68  2304.27  1722.27   547.10  11546  52206
2001  85.50  2355.53  2509.81  1861.30   608.98  12562  54049
2002  88.24  2564.69  2755.83  2035.21   726.64  14295  56652
2003  90.27  2977.61  2976.30  2220.41  1098.68  14867  61073
2004  96.70  3788.20  3565.30  2454.60  1568.00  16683  62986

To validate the forecasting model, we forecast logistics demand in Shanghai City. Because statistical data are not easy to acquire, we select only freightage quantity as the index of logistics demand scale.
Following the analysis of economic influencing factors above, and taking operability as the guiding principle, we select the following economic indexes: production value of the first industry, production value of the second industry, production value of the third industry, total amount of regional retail, total amount of regional foreign trade and average resident consumption level. Among these, the production values of the three industries reflect not only the total size of the regional economy but also the effect of the economic structure of Shanghai City. Commercial circulation is also an important part of regional logistics demand, so total amount of regional retail and average resident consumption level are taken into account. Moreover, given the status of Shanghai as a national foreign trade port, export-oriented port logistics accounts for a large share of total logistics demand, so total amount of regional foreign trade is taken into account as well. The index data for freightage quantity forecasting in Shanghai City from 1978 to 2004 are collected in Table 1 [4,5].

The meanings of X1 to X6 in Table 1 are as follows: production value of the first industry (100 million RMB), production value of the second industry (100 million RMB), production value of the third industry (100 million RMB), total amount of regional retail (100 million RMB), total amount of regional foreign trade (100 million dollars) and average resident consumption level (RMB). y is freightage quantity (10,000 tons).

3.3 Correlation analysis and index selection

Table 1 shows that the freightage quantity data from 1994 to 1996 are outliers. To obtain better forecasts, we first process these data with exponential smoothing and then carry out a correlation analysis between the period-over-period ratios of these factors and of freightage quantity. The results are given in Table 2.
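The lagged correlation analysis described above can be sketched as follows. This is a minimal illustration under two assumptions: the "periodic ratio" is read as the ratio of each year's value to the previous year's, and the exponential smoothing step is omitted because the paper does not state its parameters. The function names are hypothetical. To stay clear of the 1994 to 1996 outliers, only the 1978 to 1989 values of X3 and y from Table 1 are used, so the printed coefficients are not expected to reproduce Table 2.

```python
from math import sqrt

def period_ratios(series):
    """Ratio of each value to the previous one: r_t = x_t / x_(t-1)."""
    return [curr / prev for prev, curr in zip(series, series[1:])]

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)

def lagged_correlation(factor, target, k):
    """Correlate the factor ratio of year t-k with the target ratio of year t (k >= 1)."""
    factor_ratios = period_ratios(factor)
    target_ratios = period_ratios(target)
    # Dropping the last k factor ratios and the first k target ratios
    # pairs year t-k of the factor with year t of the target.
    return pearson(factor_ratios[:-k], target_ratios[k:])

# X3 (third industry) and y (freightage quantity), 1978-1989, copied from Table 1.
x3 = [50.76, 53.83, 65.69, 69.84, 74.44, 82.97,
      98.22, 121.59, 135.12, 159.48, 187.89, 200.73]
y = [19645, 19613, 20037, 20150, 21153, 21594,
     23121, 24243, 26671, 27241, 27832, 27666]

for k in range(1, 4):
    print(k, round(lagged_correlation(x3, y, k), 4))
```

Running the same function over every index and every lag k = 1 to 5 would yield a table with the layout of Table 2.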
Table 2 Correlation Coefficients between Freight Amount (Year t) and Economic Factors (Year t-k)

Economic factor (data of year t-k)          k=1      k=2      k=3      k=4      k=5
The First Industry                        0.2810   0.0269  -0.2376  -0.3042  -0.3721
The Second Industry                       0.4199   0.4462   0.4702   0.0880  -0.3749
The Third Industry                        0.5581   0.4844   0.5419   0.3213  -0.3020
Total Amount of Regional Retail           0.3044   0.1193   0.2521   0.0748  -0.5503
Total Amount of Regional Foreign Trade    0.1524   0.1567   0.2485  -0.2872  -0.6243
Average Resident Consumption Level        0.4563   0.2178   0.3646   0.2207  -0.5587
Freightage Quantity                       0.4098   0.1646   0.1766  -0.3918  -0.3219

Several conclusions follow from Table 2. The third industry contributes most to the steady increase of freightage quantity, followed by the second industry, and their effect appears within three years. The contributions of the first industry and of total amount of regional retail are concentrated in the preceding year, and the contribution of total amount of regional foreign trade is small. For resident consumption level, the correlation between freightage quantity of year t and resident consumption level of year t-1 is the largest. Finally, there is a definite correlation between freightage quantity of year t and freightage quantity of year t-1. Taking a correlation coefficient greater than 0.4 as the threshold, we choose four indexes as inputs of the regional logistics demand forecasting model: the second industry (t-1), the third industry (t-1), average resident consumption level (t-1) and freightage quantity (t-1).

3.4 Establishing the SVR model and analyzing the forecasting effect

We choose the RBF function

$$K(X_1, X_2) = \exp\left( -\frac{\| X_1 - X_2 \|^2}{\sigma^2} \right) \tag{9}$$

as the kernel function in formula (6), with $\sigma = 15$ in practice. To demonstrate the validity of the method, we first divide the data into a training sample and a testing sample.
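The kernel of formula (9) is straightforward to evaluate. The sketch below computes it with $\sigma = 15$ and builds the kernel matrix that enters the dual problem (6). The function names and the three input vectors are hypothetical and were chosen only so that the distances are easy to read. Note that the denominator is $\sigma^2$, as in formula (9), rather than the $2\sigma^2$ used in some other RBF conventions.

```python
from math import exp

def rbf_kernel(x1, x2, sigma=15.0):
    """Formula (9): K(x1, x2) = exp(-||x1 - x2||^2 / sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return exp(-sq_dist / sigma ** 2)

def gram_matrix(samples, sigma=15.0):
    """Kernel matrix with entries K[i][j] = K(x_i, x_j)."""
    return [[rbf_kernel(xi, xj, sigma) for xj in samples] for xi in samples]

# Hypothetical input vectors (not taken from Table 1).
samples = [[0.0, 0.0], [15.0, 0.0], [0.0, 30.0]]
K = gram_matrix(samples)
for row in K:
    print([round(v, 4) for v in row])
```

Because the kernel decays with the squared distance measured in units of $\sigma$, inputs on very different scales would typically be rescaled before training. The paper does not state whether or how this was done.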
The index data from 1979 to 1998 are taken as the training inputs and the freightage quantities from 1980 to 1999 as the training outputs to establish the SVR model. The index data from 1999 to 2003 are then taken as the testing inputs and the freightage quantities from 2000 to 2004 as the testing outputs to test the forecasting ability of the SVR model. In the SVR model, the insensitivity coefficient $\varepsilon$ is 0.01 and $C$ in formula (3) is 20. The algorithm is implemented in MATLAB 7.0, and the experimental results are shown in Table 3 and Figure 1. In Figure 1, the solid line shows the actual freightage quantity in Shanghai City from 1978 to 2004, and the asterisks show the SVR forecasts for the last five years. The forecasts track the actual values closely: the absolute forecasting error stays below 3.7% in all five test years. Applying the method to Shanghai City for 2005, the forecast growth rate is 5.28% and the forecast freightage quantity is 66309 (10,000 tons).

Table 3 Comparison of Forecast and Actual Freightage Quantity (10,000 tons)

Year   SVR Forecast   Actual   Forecasting Error (%)
2000          50322    52206                 -3.6093
2001          54455    54049                  0.7507
2002          56013    56652                 -1.1273
2003          58925    61073                 -3.5175
2004          63317    62986                  0.5250

Figure 1 Comparison of SVR Forecasting Value and Actual Value

4 Conclusion

Applying SVR to short-term forecasting of regional logistics demand is a new line of research, and the method shows clear advantages. However, this research has only just begun, and many problems remain to be solved. These include problems of SVR itself as well as the problem of how to relate logistics demand to economic development. We hope that these problems will be solved step by step as the research proceeds.

References

[1] Smola A J, Schölkopf B. A tutorial on support vector regression [R]. NeuroCOLT TR NC-TR-98-030, Royal Holloway College, University of London, UK, 1998
[2] Gunn S. Support Vector Machines for Classification and Regression [R]. Technical Report, University of Southampton, 1998
[3] Hou Rui, Zhang Bixi. Regional Logistics Demand Forecasting Method and Application. System Engineering Theory and Practice, 2005, 25(12): 43-47
[4] State Department Development Research Center. Data Handbook of Chinese Region Development (1978-1989). China Finance Economy Press, 1992
[5] State Statistical Bureau. China Statistical Summary (2005). China Statistics Press, 2005