Modeling Shanghai Composite Index: An Application of ARIMA Model
1. Introduction
A stock index can serve as an indicator for investment decisions. As quantitative
trading develops, stock indices are often used as trading triggers. In addition, many
ETFs are index ETFs, so modelling indices is useful when planning investments. This
article aims to construct an ARIMA model to fit the Shanghai Composite Index (SSE
Index).
2. Method
Data collection
Data on the Shanghai Composite Index were collected from Yahoo Finance
(https://au.finance.yahoo.com/). Irrational behaviour by market participants makes
short-term prices oscillate, so we use the 30-day moving average to model the mid-term
trend of the SSE Index. We collected the 30-day moving average on the last business day
of each month from December 1994 to November 2014 (240 values) for modelling and held
out the latest 10 values for comparison with the predictions. As is common practice, we
take the logarithm of the data to stabilise its variance.
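The preprocessing described above can be sketched as follows. This is a minimal illustration, not the original R workflow: the function names are hypothetical, and it assumes that "30-day" means the last 30 trading observations and that the input holds one closing value per trading day in date order.

```python
import math
from datetime import date

def trailing_mean(values, window):
    """Trailing moving average; positions without a full window are None."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out

def month_end_log_ma(dates, closes, window=30):
    """Log of the moving average on the last available trading day of each month."""
    ma = trailing_mean(closes, window)
    last_in_month = {}
    for d, m in zip(dates, ma):
        if m is not None:
            # Later dates in the same month overwrite earlier ones.
            last_in_month[(d.year, d.month)] = (d, math.log(m))
    return [last_in_month[key] for key in sorted(last_in_month)]

# Tiny made-up example (values are NOT SSE data), with a 2-day window.
example_dates = [date(2014, 1, 30), date(2014, 1, 31), date(2014, 2, 3)]
example_closes = [10.0, 20.0, 30.0]
monthly = month_end_log_ma(example_dates, example_closes, window=2)
```

Each returned pair holds the month-end date and the log of the moving average on that date, which is the kind of series the model is fitted to.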
Time series analysis
In this section we adopt the ARIMA model. In general, a univariate ARIMA(p,d,q)
model can be written as:
$$(1 - \alpha_1 B - \alpha_2 B^2 - \cdots - \alpha_p B^p)(1 - B)^d X_t = (1 + \beta_1 B + \cdots + \beta_q B^q) e_t,$$
where B is the backshift operator (B X_t = X_{t-1}), e_t is a white-noise error term,
and p, d and q are the autoregressive order, the number of differences, and the
moving-average order, respectively.
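As a worked instance of the general form, setting p = 1, d = 1 and q = 2 (the orders of one of the candidate models considered later) and expanding the backshift operators gives an explicit recursion. The expansion below follows directly from the general equation and adds no further assumptions:

```latex
\begin{align*}
(1 - \alpha_1 B)(1 - B) X_t &= (1 + \beta_1 B + \beta_2 B^2) e_t \\
X_t - (1 + \alpha_1) X_{t-1} + \alpha_1 X_{t-2} &= e_t + \beta_1 e_{t-1} + \beta_2 e_{t-2} \\
X_t &= (1 + \alpha_1) X_{t-1} - \alpha_1 X_{t-2} + e_t + \beta_1 e_{t-1} + \beta_2 e_{t-2}
\end{align*}
```

So the current value depends on the two previous values of the series and on the current and two previous shocks.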
To analyse the SSE Index from 12/1994 to 11/2014, we follow the Box-Jenkins
approach. First, we plot the ACF and PACF to obtain initial guesses for these
parameters. Second, we compare the candidate models using the Akaike Information
Criterion (AIC); the model with the smallest AIC is preferred. Third, we use the
Ljung-Box test as a diagnostic check of whether the residual series behaves like white
noise, and a normal Q-Q plot to examine the normality assumption. Finally, we use the
selected model for prediction; comparing the observed and predicted values lets us
evaluate how well the model fits.
This study was performed in R (version 3.1.2 for Windows).
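The quantities behind the first and third steps can be sketched in a few lines. The study itself used R, so this Python sketch is an illustration only, with hypothetical function names. It computes the first difference, the sample autocorrelations, and the Ljung-Box statistic Q = n(n+2) Σ r_k² / (n−k). In practice Q is compared with a chi-square distribution, and when the test is applied to the residuals of a fitted ARMA(p,q) model the degrees of freedom are usually taken as h − p − q.

```python
def difference(x):
    """First difference: x[t] - x[t-1]."""
    return [b - a for a, b in zip(x, x[1:])]

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]

def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic over lags 1 .. max_lag."""
    n = len(x)
    r = sample_acf(x, max_lag)
    return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))

# Made-up log-index values for illustration only (NOT SSE data).
log_index = [7.00, 7.02, 7.05, 7.03, 7.08, 7.10, 7.07, 7.12]
diffed = difference(log_index)
acf_lag1_to_3 = sample_acf(diffed, 3)
q_stat = ljung_box_q(diffed, 3)
```

A large Q relative to the chi-square critical value indicates remaining autocorrelation, meaning the series or residuals are not white noise.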
3. Results Analysis
Figure 1 plots the index values. A trend is visible in the time series, and two spikes
appear around 2007-2008 and 2014-2015. The ACF and PACF are plotted to identify the
parameters. Figure 2 and Figure 3 show that the ACF decays exponentially and the PACF
cuts off at lag 1, indicating an AR(1) process. According to Figure 4, the trend is
removed by first differencing.
Figure 1. Temporal distribution of the SSE Index, 12/1991 to 9/2015
Figure 2. ACF of the logarithmic SSE Index
Figure 3. PACF of the logarithmic SSE Index
Figure 4. First-order trend difference of the SSE Index, 12/1991 to 9/2015
Figure 5. ACF of the first-differenced log SSE Index
Figure 6. PACF of the first-differenced log SSE Index
The ACF and PACF of the log SSE Index after first differencing are presented in
Figure 5 and Figure 6, respectively. Only three lags spike markedly outside the
confidence band in the PACF plot, of which lag 1 is the most significant, suggesting
p = 1. The ACF drops to zero quickly, although some spikes remain at larger lags, so
q = 2 or q = 4 can be tested. After first differencing the series is approximately
stationary, so we set d = 1. This gives initial guesses for the three parameters.
Table 1. AIC values of ARIMA models

Number   Model          AIC
1        ARIMA(1,1,0)   -719.30
2        ARIMA(1,1,2)   -717.39
3        ARIMA(1,1,4)   -716.06
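The AIC comparison can be reproduced from the tabulated values. The sketch below only ranks the three reported AIC values and does not refit any model. The AIC is defined as 2k − 2 ln L, with k the number of estimated parameters and L the maximised likelihood, so smaller is better. Variable names are illustrative.

```python
# AIC values copied from the table of candidate models.
aic = {
    "ARIMA(1,1,0)": -719.30,
    "ARIMA(1,1,2)": -717.39,
    "ARIMA(1,1,4)": -716.06,
}

# The preferred model is the one with the smallest AIC.
best = min(aic, key=aic.get)

# Differences from the best model show how close the candidates are.
delta = {model: round(value - aic[best], 2) for model, value in aic.items()}

for model in sorted(aic, key=aic.get):
    print(f"{model}: AIC = {aic[model]:.2f}, delta = {delta[model]:.2f}")
```

The gap between the first two models is under 2 AIC units. A common rule of thumb treats such a gap as weak evidence, which is one reason to carry both models forward to diagnostic checking rather than choosing on AIC alone.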
Figure 7. Diagnostic checking for Model 1
Figure 8. Diagnostic checking for Model 2
Based on the AIC values in Table 1, Model 1 and Model 2 are preferred to Model 3, so
further diagnostic checking is applied to the first two models using the Ljung-Box
test. The null hypothesis of the Ljung-Box test is that the residuals follow a white
noise process. As Figure 7 and Figure 8 show, Model 2 performs better than Model 1. The
p-values for Model 2 are all well above the significance threshold, so the white-noise
hypothesis is not rejected at any lag, whereas for Model 1 it is rejected beyond lag 5.
We therefore discard Model 1. In Figure 9, most points in the Q-Q plot lie on a
straight line, supporting the normality assumption for the residuals of Model 2.
Figure 9. Normal probability plot for Model 2
Table 2. Comparison between predicted and observed values under the ARIMA(1,1,2) process (1)

         Dec. 1998 to Sep. 1999                        Dec. 2002 to Sep. 2003
Month    Observed    Predicted (95% CI)                Observed    Predicted (95% CI)
Dec      7.095055    7.154412 (7.035785, 7.273039)     7.245124    7.284562 (7.171960, 7.397164)
Jan      7.049383    7.142213 (6.926138, 7.358288)     7.260623    7.281729 (7.082684, 7.480774)
Feb      7.01416     7.143546 (6.808125, 7.478967)     7.309795    7.281807 (7.015182, 7.548432)
Mar      7.035014    7.143400 (6.726103, 7.560698)     7.30579     7.281804 (6.961758, 7.601851)
Apr      7.057584    7.143416 (6.657393, 7.629439)     7.333755    7.281805 (6.916054, 7.647555)
May      7.039483    7.143414 (6.597292, 7.689537)     7.337524    7.281805 (6.875458, 7.688151)
Jun      7.245844    7.143414 (6.543175, 7.743653)     7.342675    7.281805 (6.838564, 7.725045)
Jul      7.369233    7.143414 (6.493550, 7.793278)     7.315819    7.281805 (6.804514, 7.759095)
Aug      7.384211    7.143414 (6.447455, 7.839374)     7.289262    7.281805 (6.772737, 7.790872)
Sep      7.39111     7.143414 (6.404228, 7.882601)     7.252953    7.281805 (6.742830, 7.820779)
Table 3. Comparison between predicted and observed values under the ARIMA(1,1,2) process (2)

         Dec. 2007 to Sep. 2008                        Dec. 2014 to Sep. 2015
Month    Observed    Predicted (95% CI)                Observed       Predicted (95% CI)
Dec      8.528304    8.561415 (8.454205, 8.668624)     7.97052954*    7.823826 (7.719913, 7.927738)
Jan      8.53475     8.547505 (8.348281, 8.746729)     8.08100925*    7.833945 (7.643790, 8.024101)
Feb      8.442054    8.547322 (8.271875, 8.822769)     8.08148411     7.840389 (7.579161, 8.101617)
Mar      8.308642    8.547319 (8.212411, 8.882228)     8.13630781     7.844491 (7.520438, 8.168544)
Apr      8.15126*    8.547319 (8.162017, 8.932621)     8.30282742*    7.847103 (7.466402, 8.227805)
May      8.157918    8.547319 (8.117491, 8.977147)     8.39903611*    7.848767 (7.416411, 8.281122)
Jun      8.055153*   8.547319 (8.077164, 9.017474)     8.46735638*    7.849826 (7.369953, 8.329698)
Jul      7.938999*   8.547319 (8.040032, 9.054606)     8.29890293     7.850500 (7.326579, 8.374421)
Aug      7.864707*   8.547319 (8.005439, 9.089199)     8.20938641     7.850929 (7.285891, 8.415967)
Sep      7.715171*   8.547319 (7.972925, 9.121713)     8.094331537    7.851203 (7.247548, 8.454857)

* Highlighted: observed value lies outside the 95% interval.
We use the ARIMA(1,1,2) model to predict 10 months ahead in four distinct periods
and construct a 95% confidence interval for each prediction.
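The tabulated point predictions settle to a constant after a few months. A minimal sketch of why, assuming the standard forecast recursion for an ARIMA(1,1,2) model: beyond two steps ahead the moving-average terms drop out, so each forecast is (1 + alpha) times the previous forecast minus alpha times the one before it. The AR coefficient below is hypothetical, chosen only for illustration, because the fitted coefficients are not reported here. The two starting values are the Dec 1998 and Jan 1999 predictions from Table 2.

```python
def extend_forecasts(f1, f2, alpha, steps):
    """Extend two point forecasts with X(h) = (1 + alpha) * X(h-1) - alpha * X(h-2)."""
    path = [f1, f2]
    for _ in range(steps):
        path.append((1 + alpha) * path[-1] - alpha * path[-2])
    return path

# First two point forecasts, copied from Table 2 (log scale).
f1, f2 = 7.154412, 7.142213

# Hypothetical AR coefficient: an assumption, NOT a value reported in the source.
alpha = -0.11

path = extend_forecasts(f1, f2, alpha, steps=8)

# Successive changes shrink by a factor alpha each step, so for |alpha| < 1 the
# forecasts converge to f2 + (f2 - f1) * alpha / (1 - alpha).
limit = f2 + (f2 - f1) * alpha / (1 - alpha)

for h, value in enumerate(path, start=1):
    print(f"h = {h:2d}: {value:.6f}")
print(f"limit: {limit:.6f}")
```

Because the point forecast flattens while the interval keeps widening, long-horizon predictions from such a model carry little information about turning points.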
The predictions in Table 2 are more accurate than those in Table 3. All observed
values in Table 2 fall within the 95% intervals, and the predictions match the
observations well, which supports the fit of the model. By contrast, in Table 3 half of
the observed values fall outside the intervals, so those predictions are unreliable.
The highlighted observed values are the ones that fall outside the confidence
intervals.
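The interval check for Table 3 can be reproduced directly from the tabulated numbers. The sketch below copies the observed values and interval bounds from the table and lists the months whose observed value lies outside the 95% interval. The helper name is illustrative.

```python
months = ["Dec", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep"]

# (observed, lower bound, upper bound) on the log scale, copied from Table 3.
dec2007_sep2008 = [
    (8.528304, 8.454205, 8.668624),
    (8.53475, 8.348281, 8.746729),
    (8.442054, 8.271875, 8.822769),
    (8.308642, 8.212411, 8.882228),
    (8.15126, 8.162017, 8.932621),
    (8.157918, 8.117491, 8.977147),
    (8.055153, 8.077164, 9.017474),
    (7.938999, 8.040032, 9.054606),
    (7.864707, 8.005439, 9.089199),
    (7.715171, 7.972925, 9.121713),
]
dec2014_sep2015 = [
    (7.97052954, 7.719913, 7.927738),
    (8.08100925, 7.643790, 8.024101),
    (8.08148411, 7.579161, 8.101617),
    (8.13630781, 7.520438, 8.168544),
    (8.30282742, 7.466402, 8.227805),
    (8.39903611, 7.416411, 8.281122),
    (8.46735638, 7.369953, 8.329698),
    (8.29890293, 7.326579, 8.374421),
    (8.20938641, 7.285891, 8.415967),
    (8.094331537, 7.247548, 8.454857),
]

def outside_interval(rows, labels):
    """Return the labels whose observed value falls outside [lower, upper]."""
    return [m for m, (obs, lo, hi) in zip(labels, rows) if not lo <= obs <= hi]

outside_2007 = outside_interval(dec2007_sep2008, months)
outside_2014 = outside_interval(dec2014_sep2015, months)
print(outside_2007)
print(outside_2014)
```

In both periods five of the ten observed values lie outside their intervals: below the lower bound during the 2008 decline and above the upper bound during the 2014-2015 rise.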
This outcome is reasonable given what was happening in the market. China's stock
market was in a bull run at the end of 2007. More and more people joined the capital
game, and the market was full of speculation. The SSE Index reached its highest point
during this period. As the global financial crisis approached, stock prices dropped
drastically and unexpectedly fast, and the bubble burst. History repeated itself at the
end of 2014, when share prices went on a roller-coaster ride again.
4. Discussion
In conclusion, the ARIMA(1,1,2) model can be applied well when the market behaves
rationally. Although it has less explanatory power in periods of extreme volatility,
its usefulness in application cannot be denied.