A Short-term Forecasting Method for Regional Logistics Demand
Based on Support Vector Regression*
Xiao Jianhua 1,2, Tang Jun 1
(1. School of Management, Wuyi University, Jiangmen, Guangdong 529020, China;
2. School of Economic Management, Beihang University, Beijing 100083, China)
Abstract A short-term forecasting method for regional logistics demand based on support vector regression (SVR) is proposed to address the nonlinearity and fluctuation that complicate regional logistics demand forecasting. Because the method builds on statistical learning theory and kernel mapping, it handles nonlinear data well and therefore generalizes well. Finally, an experiment is conducted to validate the soundness and validity of the proposed method.
Keywords Support Vector Regression (SVR); Regional Logistics; Forecasting; Kernel Method
1 Introduction
Regional logistics demand forecasting is an important step in regional logistics system planning and in the rational allocation of logistics resources. It also provides necessary decision support for local governments when they formulate logistics industry development policies and plan logistics infrastructure construction. Compared with national macro logistics demand, regional logistics demand has its own characteristics. First, the relationship between logistics demand and its various influencing indexes is strongly nonlinear. Second, regional logistics demand fluctuates more. Regional logistics demand forecasting is therefore a complex research problem, and Support Vector Regression (SVR) is likely to provide an effective method for short-term forecasting of regional logistics demand.
In this paper, the possible influencing factors are analyzed first, and then a short-term forecasting model for regional logistics demand based on SVR is established. Applying the new model to Shanghai City indicates that using SVR for short-term forecasting of regional logistics demand is feasible in theory and acceptable in forecasting precision.
2 Support Vector Regression Model
SVR is a main form of the kernel method and has excellent nonlinear regression capacity [1]. In essence, the kernel method performs a nonlinear transform from the data space to a feature space, and different kernel functions realize different nonlinear transforms. Suppose $x_i$ and $x_j$ are two samples in the data space. The basis of the kernel method is to replace the dot product of the two vectors by a kernel evaluation:
$$(x_i \cdot x_j) \rightarrow K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) \tag{1}$$
In general, the nonlinear transform function $\phi(\cdot)$ from the data space to the feature space is complex, but its corresponding kernel function $K(\cdot, \cdot)$ is relatively simple.
Dot products appear in many data processing problems, so the kernel method has broad application prospects. Its advantage is that a nonlinear problem that is difficult or ineffective to process in the data space can be processed easily and effectively in the feature space. Its attractive characteristic is that it needs only relatively simple kernel function evaluations, even though the mapping from the data space to the feature space is very complex.
In formula (1), any function that satisfies the Mercer condition can in theory serve as a kernel function, for example the polynomial kernel function, the RBF kernel function and the Sigmoid kernel function.
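The kernel substitution in formula (1) can be illustrated with a minimal Python sketch. The function names, default parameters and sample vectors are illustrative assumptions, not taken from the paper. For the homogeneous polynomial kernel of degree 2 in two dimensions, an explicit feature map is known, so the kernel value computed in the data space can be compared with the dot product computed in the feature space.

```python
import math

def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(u, v, degree=2, coef0=0.0):
    """Polynomial kernel (u . v + coef0) ** degree."""
    return (dot(u, v) + coef0) ** degree

def rbf_kernel(u, v, sigma=1.0):
    """RBF kernel exp(-||u - v||^2 / sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / sigma ** 2)

def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    """Sigmoid kernel tanh(gamma * (u . v) + coef0)."""
    return math.tanh(gamma * dot(u, v) + coef0)

def phi_quadratic(x):
    """Explicit feature map of the kernel (u . v) ** 2 for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

# Two made-up samples in the data space.
xi, xj = (1.0, 2.0), (3.0, -1.0)
k_value = polynomial_kernel(xi, xj, degree=2)          # evaluated in the data space
phi_value = dot(phi_quadratic(xi), phi_quadratic(xj))  # evaluated in the feature space
print(k_value, phi_value)  # the two values agree up to rounding
```

The kernel evaluation needs one dot product and one power, whereas the explicit map already needs three features for two inputs and grows quickly with the degree and the input dimension.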
* This project was supported by the Guangdong Natural Science Foundation (06029822) and the China Postdoctoral Science Foundation (2005038042).
For the sample set $(X, y) = \{(X_1, y_1), (X_2, y_2), \cdots, (X_l, y_l)\}$, $(X_i, y_i) \in R^n \times R$, linear multiple regression in essence solves for the weight coefficient $w$ in the equation
$$y = Xw + \sigma \tag{2}$$
so that the residual $\sigma$ is minimal under some measure. To meet the needs of the regression function, under the $\varepsilon$-insensitive loss function the linear regression problem can be transformed into the following optimization problem:
$$\min \Phi(w, \xi^*, \xi) = \frac{1}{2}(w \cdot w) + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \tag{3}$$
In formula (3), $l$ is the number of samples and $C$ is a fixed constant that sets the trade-off between the regression error and the function characteristic (the flatness of the regression function). The above formula can be transformed into the following optimization problem [2]:
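$$\max W(\alpha, \alpha^*) = \sum_{i=1}^{l} \left[ \alpha_i (y_i - \varepsilon) - \alpha_i^* (y_i + \varepsilon) \right] - \frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)(x_i \cdot x_j) \tag{4}$$
$$\text{s.t.} \quad \begin{cases} 0 \le \alpha_i \le C, & i = 1, 2, \cdots, l \\ 0 \le \alpha_i^* \le C, & i = 1, 2, \cdots, l \\ \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0 \end{cases} \tag{5}$$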
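The primal objective of formula (3) can be made concrete with a small Python sketch. It assumes a linear model f(x) = w . x + b (the bias b is an assumption here, since formula (2) has no bias term) and the usual ε-SVR constraints, under which the slack variables measure how far a sample lies outside the ε-tube. The function names and the three sample points are hypothetical; C = 20 follows the value reported later in the text, while ε = 0.25 is chosen only to keep the arithmetic easy to follow.

```python
def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def slacks(w, b, x, y, eps):
    """Smallest slack pair (xi, xi_star) for one sample under the eps-tube."""
    residual = y - (dot(w, x) + b)
    xi = max(0.0, residual - eps)        # sample lies above the tube
    xi_star = max(0.0, -residual - eps)  # sample lies below the tube
    return xi, xi_star

def primal_objective(w, b, samples, C, eps):
    """Value of formula (3): 0.5 * (w . w) + C * sum(xi_i + xi_i_star)."""
    total_slack = 0.0
    for x, y in samples:
        xi, xi_star = slacks(w, b, x, y, eps)
        total_slack += xi + xi_star
    return 0.5 * dot(w, w) + C * total_slack

# Made-up one-dimensional samples (x, y).
samples = [((1.0,), 1.0), ((2.0,), 2.5), ((3.0,), 2.0)]
value = primal_objective(w=(1.0,), b=0.0, samples=samples, C=20.0, eps=0.25)
print(value)
```

Samples inside the tube contribute nothing, so only the second and third samples add to the slack term; a larger C penalizes these deviations more heavily relative to the flatness term.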
For nonlinear regression, a kernel function can be introduced, and the optimization problem of formula (4) becomes
$$\max W(\alpha, \alpha^*) = \sum_{i=1}^{l} \left[ \alpha_i (y_i - \varepsilon) - \alpha_i^* (y_i + \varepsilon) \right] - \frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j) \tag{6}$$
The constraints of formula (6) are the same as those in formula (5).
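As a rough illustration, the dual objective of formula (6) and the constraints of formula (5) can be evaluated for a given pair of multiplier vectors. The sketch below does not solve the optimization problem; it only computes W and checks feasibility. The function names, the tolerance, the two sample points and the kernel width σ = 1 are assumptions made for the example.

```python
import math

def rbf_kernel(u, v, sigma):
    """RBF kernel exp(-||u - v||^2 / sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / sigma ** 2)

def dual_objective(alpha, alpha_star, xs, ys, eps, kernel):
    """Value of W(alpha, alpha*) in formula (6)."""
    n = len(xs)
    linear = sum(alpha[i] * (ys[i] - eps) - alpha_star[i] * (ys[i] + eps)
                 for i in range(n))
    beta = [alpha[i] - alpha_star[i] for i in range(n)]
    quadratic = sum(beta[i] * beta[j] * kernel(xs[i], xs[j])
                    for i in range(n) for j in range(n))
    return linear - 0.5 * quadratic

def is_feasible(alpha, alpha_star, C, tol=1e-9):
    """Check the box constraints and the equality constraint of formula (5)."""
    in_box = all(-tol <= a <= C + tol for a in list(alpha) + list(alpha_star))
    balanced = abs(sum(a - s for a, s in zip(alpha, alpha_star))) <= tol
    return in_box and balanced

# Made-up data and multipliers, for illustration only.
xs, ys = [(0.0,), (1.0,)], [0.0, 1.0]
alpha, alpha_star = [0.0, 0.5], [0.5, 0.0]
kernel = lambda u, v: rbf_kernel(u, v, sigma=1.0)

print(is_feasible(alpha, alpha_star, C=20.0))
w_value = dual_objective(alpha, alpha_star, xs, ys, eps=0.01, kernel=kernel)
print(w_value)
```

A quadratic programming solver would search over all feasible multiplier pairs for the maximum of this function; passing the kernel as an argument mirrors the step from formula (4) to formula (6), where only the dot product is replaced.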
Subject to the constraints in formula (5), the parameters $\alpha_i$ and $\alpha_i^*$ are obtained by solving formula (6). By the KKT conditions, only a few of the $\alpha_i$ and $\alpha_i^*$ are nonzero. The samples $(X_i, y_i)$ corresponding to these nonzero $\alpha_i$ and $\alpha_i^*$ are called support vectors.
Finally, the regression function is
$$f(x) = \sum_{SVs} (\alpha_i - \alpha_i^*) K(x_i, x) + b \tag{7}$$
where
$$b = -\frac{1}{2} \sum_{SVs} (\alpha_i - \alpha_i^*) \left[ K(x_r, x_i) + K(x_s, x_i) \right] \tag{8}$$
and $x_r$, $x_s$ are any two support vectors, one from each of the two classes of support vectors.
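Once the multipliers are known, evaluating the regression function of formula (7) is a short loop over the support vectors. The following Python sketch assumes that the coefficients (α_i − α_i*) and the bias b have already been obtained elsewhere; here they are made-up numbers, and the function names and the kernel width are likewise illustrative.

```python
import math

def rbf_kernel(u, v, sigma):
    """RBF kernel exp(-||u - v||^2 / sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / sigma ** 2)

def svr_predict(x, support_vectors, coeffs, b, kernel):
    """Formula (7): f(x) = sum over SVs of (alpha_i - alpha_i*) K(x_i, x) + b."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coeffs)) + b

# Hypothetical support vectors, coefficients (alpha_i - alpha_i*) and bias.
support_vectors = [(0.0,), (1.0,)]
coeffs = [-0.5, 0.5]
bias = 0.5
kernel = lambda u, v: rbf_kernel(u, v, sigma=1.0)

f0 = svr_predict((0.0,), support_vectors, coeffs, bias, kernel)
f1 = svr_predict((1.0,), support_vectors, coeffs, bias, kernel)
print(f0, f1)
```

Only samples with nonzero coefficients enter the sum, which is why the sparsity implied by the KKT conditions keeps prediction cheap.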
Formula (7) is the SVR model. The discussion above shows that it has two advantages. First, it is based on kernel mapping, which gives it its nonlinear processing ability. Second, it is based on statistical learning theory, so it takes both the fitting error and the function characteristic of the regression model into account, which supports its good generalization ability.
3 SVR Model Applied in Short-term Forecasting of Regional Logistics Demand
3.1 Analysis of economic influencing factors to regional logistics demand
Many economic factors influence regional logistics demand, but the main ones are the scale of the regional economy, the industry structure and the spatial distribution of the economy [3].
First, the overall level and scale of regional economic development ultimately determine regional logistics demand. The higher the overall level of the regional economy, the higher the demand for raw materials, semi-finished goods and finished goods. The faster and more actively the regional economy grows, the faster regional logistics demand grows. If the regional economy stagnates or declines, regional logistics demand will stagnate or fall as well. The logistics industry is thus closely tied to the development of the national economy.
Second, the industry structure influences regional logistics demand. Different industry structures markedly affect the function, level and quantity of logistics demand. The logistics demand generated by agriculture, forestry and animal husbandry is extensive: its quantity is large but its added value is low. The logistics demand of the second industry is moving toward refinement and high added value. The third industry, for example the commodity circulation industry, directly promotes the development of the modern logistics industry.
Finally, the uneven spatial distribution of the regional economy and unbalanced regional economic development directly generate logistics demand between regions. In China in particular, the distribution of energy sources and some natural resources is highly unbalanced. Most natural resources are located in the northeast, northwest, southwest and north, where productivity is relatively low, whereas the central and coastal areas, where population is large and productivity is high, lack resources. As a result, basic industries such as raw material extraction and rough processing are geographically far from the processing industries, and logistics must span large distances in time and space.
3.2 Data acquirement and pretreatment
Table 1 Statistical Data of Logistics Demand Scale and Economic Factors in Shanghai City (1978~2004)
Year      X1        X2        X3        X4        X5      X6       y
1978   11.00    211.05     50.76     54.10     30.26     442   19645
1979   11.39    221.21     53.83     68.28     38.78     512   19613
1980   10.10    236.10     65.69     80.43     45.06     553   20037
1981   10.58    244.34     69.84     88.73     41.50     585   20150
1982   13.31    249.32     74.44     89.80     38.93     576   21153
1983   13.52    255.32     82.97    100.68     41.40     615   21594
1984   17.26    275.37     98.22    123.72     44.00     726   23121
1985   19.53    325.63    121.59    173.39     51.74    1030   24243
1986   19.69    336.02    135.12    196.84     52.04    1190   26671
1987   21.60    364.38    159.48    225.25     59.96    1298   27241
1988   27.36    433.05    187.89    295.83     72.45    1680   27832
1989   29.63    466.18    200.73    331.38     78.48    1845   27666
1990   32.60    482.68    241.17    333.86     74.31    2009   26777
1991   33.36    551.34    309.07    382.06     80.44    2421   27558
1992   34.16    677.39    402.77    464.82     97.57    2842   29580
1993   38.21    900.33    573.07    624.30    127.32    4162   30293
1994   48.59   1143.24    780.09    770.74    158.67    5343   28585
1995   61.68   1409.85    991.04    970.04    190.25    6712   27571
1996   71.58   1582.50   1248.12   1161.30    222.63    7742   45821
1997   75.80   1754.39   1530.02   1325.21    247.64    8699   45938
1998   78.50   1847.20   1762.50   1471.03    313.44    9202   46230
1999   80.00   1953.98   2000.98   1590.38    386.04   10328   48398
2000   83.20   2163.68   2304.27   1722.27    547.10   11546   52206
2001   85.50   2355.53   2509.81   1861.30    608.98   12562   54049
2002   88.24   2564.69   2755.83   2035.21    726.64   14295   56652
2003   90.27   2977.61   2976.30   2220.41   1098.68   14867   61073
2004   96.70   3788.20   3565.30   2454.60   1568.00   16683   62986
To validate the forecasting model, we forecast logistics demand in Shanghai City. Because statistical data are not easy to acquire, we select only freightage quantity (freight volume) as the index of logistics demand scale. Based on the economic influencing factors analyzed above, and taking operability as the guiding principle, we select the following economic indexes: production value of the first industry, production value of the second industry, production value of the third industry, total amount of regional retail, total amount of regional foreign trade and average resident consumption level. The production values of the three industries reflect both the total size of the regional economy and the effect of the economic structure of Shanghai City. Commodity circulation is also an important part of regional logistics demand, so total amount of regional retail and average resident consumption level are taken into account. Moreover, given the status of Shanghai as a nationwide foreign trade port, export-oriented port logistics accounts for the majority of its total logistics demand, so total amount of regional foreign trade is taken into account as well.
Finally, the index data used for forecasting freightage quantity in Shanghai City from 1978 to 2004 are collected in table 1 [4,5]. X1 to X6 in table 1 denote, in order: production value of the first industry (100 million RMB), production value of the second industry (100 million RMB), production value of the third industry (100 million RMB), total amount of regional retail (100 million RMB), total amount of regional foreign trade (100 million dollars) and average resident consumption level (RMB). y denotes freightage quantity (ten thousand tons).
3.3 Correlation analysis and index selection
Table 1 shows that the freightage quantity data from 1994 to 1996 are outliers. To obtain a better forecasting effect, we first process these data with exponential smoothing and then analyze the correlation between the periodic (year-over-year) ratios of these factors and the freightage quantity. The results are given in table 2.
Table 2 Correlation Coefficient between Freight Amount (t) and Economic Factors (t-k)
Economic Factors Data of Year t-k           Correlation Coefficient with Freight Amount of Year t
                                                k=1       k=2       k=3       k=4       k=5
The First Industry                           0.2810    0.0269   -0.2376   -0.3042   -0.3721
The Second Industry                          0.4199    0.4462    0.4702    0.0880   -0.3749
The Third Industry                           0.5581    0.4844    0.5419    0.3213   -0.3020
Total Amount of Regional Retail              0.3044    0.1193    0.2521    0.0748   -0.5503
Total Amount of Regional Foreign Trade       0.1524    0.1567    0.2485   -0.2872   -0.6243
Average Resident Consumption Level           0.4563    0.2178    0.3646    0.2207   -0.5587
Freightage Quantity                          0.4098    0.1646    0.1766   -0.3918   -0.3219
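Lagged correlations of the kind reported in table 2 can be computed along the following lines. This is only a sketch: the paper does not state its smoothing constant or the exact treatment of the outlier years, so the smoothing function and its `alpha` argument are assumptions, and the result printed here is not expected to reproduce any entry of table 2. The two short series are the 1978 to 1985 values of X3 and y from table 1.

```python
import math

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_0 = x_0, s_t = alpha*x_t + (1-alpha)*s_(t-1)."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed

def period_ratios(series):
    """Ratio of each value to the previous one (the result is one item shorter)."""
    return [cur / prev for prev, cur in zip(series, series[1:])]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

def lagged_correlation(factor, target, k):
    """Correlate the factor of year t-k with the target of year t (k >= 1)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return pearson(factor[:-k], target[k:])

# X3 (third industry) and y (freightage quantity) for 1978-1985, from table 1.
third_industry = [50.76, 53.83, 65.69, 69.84, 74.44, 82.97, 98.22, 121.59]
freight = [19645, 19613, 20037, 20150, 21153, 21594, 23121, 24243]

r = lagged_correlation(period_ratios(third_industry), period_ratios(freight), k=1)
print(round(r, 4))
```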
Table 2 leads to the following conclusions. The third industry contributes most to the steady increase of freightage quantity, followed by the second industry, and both take effect within three years. The contributions of the first industry and of total amount of regional retail are concentrated in the previous year, and the contribution of total amount of regional foreign trade is small. For resident consumption level, the correlation between the freightage quantity of year t and the resident consumption level of year t-1 is the largest. Finally, there is a definite correlation between the freightage quantity of year t and that of year t-1.
Taking a correlation coefficient greater than 0.4 at k=1 as the threshold, we finally choose four indexes as inputs of the regional logistics demand forecasting model: the second industry (t-1), the third industry (t-1), average resident consumption level (t-1) and freightage quantity (t-1).
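The selection rule can be checked directly against the k=1 column of table 2. In the Python sketch below, the dictionary holds the k=1 coefficients from table 2, while the dictionary name and the helper `select_indexes` are hypothetical.

```python
# k=1 correlation coefficients copied from table 2.
LAG1_CORRELATION = {
    "first industry": 0.2810,
    "second industry": 0.4199,
    "third industry": 0.5581,
    "total amount of regional retail": 0.3044,
    "total amount of regional foreign trade": 0.1524,
    "average resident consumption level": 0.4563,
    "freightage quantity": 0.4098,
}

def select_indexes(correlations, threshold):
    """Keep the indexes whose correlation coefficient exceeds the threshold."""
    return [name for name, r in correlations.items() if r > threshold]

selected = select_indexes(LAG1_CORRELATION, threshold=0.4)
print(selected)
```

The four names returned are the four model inputs listed above.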
3.4 Establishing SVR model and analyzing forecasting effect
We choose the RBF function
$$K(X_1, X_2) = \exp\left( -\frac{\| X_1 - X_2 \|^2}{\sigma^2} \right) \tag{9}$$
as the kernel function in formula (6), with $\sigma = 15$ set in practice.
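Solving formula (6) requires the kernel value for every pair of training samples, which is usually collected in a kernel (Gram) matrix. A minimal Python sketch of formula (9) with σ = 15 follows; the three sample vectors are made up, and the function names are illustrative.

```python
import math

SIGMA = 15.0  # kernel width reported in the text

def rbf_kernel(x1, x2, sigma=SIGMA):
    """Formula (9): exp(-||x1 - x2||^2 / sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / sigma ** 2)

def gram_matrix(samples, sigma=SIGMA):
    """Matrix of kernel values K(x_i, x_j) for all pairs of samples."""
    return [[rbf_kernel(a, b, sigma) for b in samples] for a in samples]

# Made-up two-dimensional samples, for illustration only.
samples = [(0.0, 0.0), (9.0, 12.0), (15.0, 0.0)]
K = gram_matrix(samples)
for row in K:
    print([round(value, 4) for value in row])
```

The matrix is symmetric with ones on the diagonal, and the entries decay as the distance between samples grows relative to σ.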
To demonstrate the validity of the method, we first divide the data into a training sample and a testing sample. The index data from 1979 to 1998 serve as training inputs and the freightage quantity from 1980 to 1999 as training outputs to establish the SVR model. The index data from 1999 to 2003 then serve as testing inputs and the freightage quantity from 2000 to 2004 as testing outputs to test the forecasting ability of the SVR model. In the SVR model, the insensitive coefficient $\varepsilon$ equals 0.01 and $C$ in formula (3) equals 20. The algorithm is implemented in MATLAB 7.0, and the experimental results are shown in table 3 and figure 1.
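The one-year lag between inputs and outputs in this split can be sketched as follows. For brevity, the example uses only the freightage quantity of year t-1 as the input, whereas the model in the text uses four indexes; it also uses the raw y values of table 1 rather than smoothed ones. The helper name `make_lagged_samples` is hypothetical.

```python
def make_lagged_samples(features_by_year, target_by_year, input_years):
    """Pair the indexes of year t-1 with the target of year t."""
    inputs, outputs = [], []
    for year in input_years:
        inputs.append(features_by_year[year])
        outputs.append(target_by_year[year + 1])
    return inputs, outputs

# Freightage quantity y from table 1, 1978-2004 (ten thousand tons).
FREIGHT = [
    19645, 19613, 20037, 20150, 21153, 21594, 23121, 24243, 26671,
    27241, 27832, 27666, 26777, 27558, 29580, 30293, 28585, 27571,
    45821, 45938, 46230, 48398, 52206, 54049, 56652, 61073, 62986,
]
target_by_year = dict(zip(range(1978, 2005), FREIGHT))

# Simplification: a single input feature, y of the previous year.
features_by_year = {year: (value,) for year, value in target_by_year.items()}

# Inputs 1979-1998 predict outputs 1980-1999 (training sample).
train_x, train_y = make_lagged_samples(features_by_year, target_by_year, range(1979, 1999))
# Inputs 1999-2003 predict outputs 2000-2004 (testing sample).
test_x, test_y = make_lagged_samples(features_by_year, target_by_year, range(1999, 2004))

print(len(train_x), len(test_x))
print(test_y)
```

Splitting by year rather than at random keeps every testing output later than every training output, which matches the forecasting setting.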
In figure 1, the solid line shows the actual freightage quantity in Shanghai City from 1978 to 2004, and the asterisks show the SVR forecasts for the last five years. The forecasts track the actual values closely: every forecasting error in table 3 is below 3.7% in absolute value.
Applying the method to forecast the freightage quantity of Shanghai City in 2005 gives a forecast growth rate of 5.28% and a forecast freightage quantity of 66309 ten thousand tons.
Table 3 Comparison of Forecasting and Actual Freightage Quantity (ten thousand tons)
Year    SVR Forecasting Freightage Quantity    Actual Freightage Quantity    Forecasting Error (%)
2000                  50322                              52206                     -3.6093
2001                  54455                              54049                      0.7507
2002                  56013                              56652                     -1.1273
2003                  58925                              61073                     -3.5175
2004                  63317                              62986                      0.5250
Figure 1 Comparison of SVR Forecasting Value and Actual Value
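The error column of table 3 and the 2005 growth rate can be recomputed from the published figures. The sketch below assumes the error is defined as (forecast − actual) / actual, which matches the signs in table 3; because the forecasts in the table are rounded to integers, the recomputed errors agree with the published ones only to within about 0.01 percentage points. The mean absolute percentage error is an extra summary not reported in the paper.

```python
# Values copied from table 3 (ten thousand tons).
forecast = {2000: 50322, 2001: 54455, 2002: 56013, 2003: 58925, 2004: 63317}
actual = {2000: 52206, 2001: 54049, 2002: 56652, 2003: 61073, 2004: 62986}

def relative_error_percent(predicted, observed):
    """Signed forecasting error in percent of the observed value."""
    return (predicted - observed) / observed * 100.0

errors = {year: relative_error_percent(forecast[year], actual[year])
          for year in forecast}
mape = sum(abs(e) for e in errors.values()) / len(errors)

# Growth rate implied by the 2005 forecast of 66309 over the 2004 actual value.
growth_2005 = (66309 - actual[2004]) / actual[2004] * 100.0

for year in sorted(errors):
    print(year, round(errors[year], 2))
print("mean absolute error (%):", round(mape, 2))
print("2005 growth (%):", round(growth_2005, 2))
```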
4 Conclusion
Applying SVR to short-term forecasting of regional logistics demand is a new line of research, and it shows clear advantages in handling nonlinear data and in forecasting precision. However, this research has only just begun, and many problems remain to be solved. These include problems inherent to SVR itself as well as the question of how to relate logistics demand to economic development. We hope these problems will be solved step by step as the research proceeds.
References
[1] Smola A J, Schölkopf B. A Tutorial on Support Vector Regression [R]. NeuroCOLT Technical Report NC-TR-98-030, Royal Holloway College, University of London, UK, 1998
[2] Gunn S. Support Vector Machines for Classification and Regression [R]. Technical Report, University of Southampton, 1998
[3] Hou Rui, Zhang Bixi. Regional Logistics Demand Forecasting Method and Application. System Engineering Theory and Practice, 2005, 25(12): 43~47
[4] State Department Development Research Center. Data Handbook of Chinese Region Development (1978~1989). China Finance Economy Press, 1992
[5] State Statistical Bureau. China Statistical Summary (2005). China Statistics Press, 2005