Modeling the Shanghai Composite Index: An Application of the ARIMA Model

1. Introduction

A stock index can serve as an indicator for investment decisions. With the development of quantitative trading, stock indices are often used as trading triggers, and many ETFs are index ETFs. Modelling indices is therefore useful when making investment arrangements. This article aims to construct an ARIMA model that fits the Shanghai Composite Index (SSE Index).

2. Method

Data collection

Data for the Shanghai Composite Index were collected from Yahoo Finance (https://au.finance.yahoo.com/). Because the irrational behaviour of market participants makes short-term prices oscillate strongly, we model the mid-term trend of the SSE Index using its 30-day moving average. We collected the 30-day moving average on the last business day of each month from December 1994 to November 2014 (240 values) for modelling, and reserved the latest 10 values for comparison with the predictions. As is common practice, we take the logarithm of the data to stabilize its variance.

Time series analysis

We adopt the ARIMA model. In general, a univariate ARIMA(p,d,q) model can be written as:

$$(1 - \alpha_1 B - \alpha_2 B^2 - \cdots - \alpha_p B^p)(1 - B)^d X_t = (1 + \beta_1 B + \cdots + \beta_q B^q)\, e_t$$

where p, d and q are the order of the autoregressive terms, the number of differences, and the order of the moving-average terms, respectively. B is the backshift operator ($B X_t = X_{t-1}$) and $e_t$ is white noise.

To analyse the SSE Index from 12/1994 to 11/2014, we follow the Box-Jenkins approach. First, we plot the ACF and PACF to obtain initial guesses for the three parameters. Second, we compare the candidate models using the Akaike Information Criterion (AIC). The model with the smallest AIC is preferred. Third, we check the residuals. We use the Ljung-Box test as a goodness-of-fit test of whether the residuals behave as white noise, and a normal Q-Q plot to check their normality. In the last step, we use the final model to make predictions.
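The first identification step can be illustrated with a small sketch. It computes the sample ACF and PACF of a first-differenced series in plain Python; the study itself used R. It rests on several assumptions. The toy series is simulated and is not the SSE data. The helper names are hypothetical. The ACF uses the usual estimator that divides by n, and the PACF comes from the Durbin-Levinson recursion. The band ±1.96/√n is only the approximate 95% interval for white noise.

```python
import math
import random


def difference(xs, d=1):
    """Apply (1 - B)^d: repeat first differencing d times."""
    for _ in range(d):
        xs = [b - a for a, b in zip(xs, xs[1:])]
    return xs


def sample_acf(xs, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of the demeaned series."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    c0 = sum(v * v for v in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(1, max_lag + 1)]


def sample_pacf(xs, max_lag):
    """Partial autocorrelations phi_kk via the Durbin-Levinson recursion."""
    r = sample_acf(xs, max_lag)  # r[k - 1] holds the lag-k autocorrelation
    pacf, phi = [], []           # phi holds phi_{k-1,1..k-1} from the previous step
    for k in range(1, max_lag + 1):
        num = r[k - 1] - sum(phi[j] * r[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Hypothetical toy data (NOT the SSE series): a log-price-like level whose
# first differences follow an AR(1) with coefficient 0.5.
random.seed(1)
level, change = [0.0], 0.0
for _ in range(2000):
    change = 0.5 * change + random.gauss(0.0, 0.01)
    level.append(level[-1] + change)

w = difference(level, d=1)        # the (1 - B) X_t series
acf = sample_acf(w, 10)
pacf = sample_pacf(w, 10)
band = 1.96 / math.sqrt(len(w))   # approximate 95% white-noise band
spikes = [k + 1 for k, v in enumerate(pacf) if abs(v) > band]
print("ACF :", [round(v, 2) for v in acf])
print("PACF:", [round(v, 2) for v in pacf])
print("PACF lags outside the band:", spikes)
```

On this toy series, the PACF stands out only at lag 1 while the ACF decays geometrically. That is the pattern that points to p = 1 after one difference.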
Comparing the predicted values with the observations lets us evaluate how well the model fits. This study was performed in R (version 3.1.2 for Windows).

3. Results and Analysis

Figure.1 plots the index. The series shows a clear trend, with two spikes around 2007-2008 and 2014-2015. The ACF and PACF are plotted to identify the parameters. Figure.2 and Figure.3 show that the ACF decays exponentially and the PACF cuts off at lag 1, indicating an AR(1) process. Figure.4 shows that differencing removes the trend.

Figure.1. Temporal distribution of SSE Index, 12/1991 to 9/2015
Figure.2. ACF of logarithmic SSE Index
Figure.3. PACF of logarithmic SSE Index
Figure.4. First-order trend difference of SSE Index, 12/1991 to 9/2015
Figure.5. ACF of first differenced log SSE Index
Figure.6. PACF of first differenced log SSE Index

Figure.5 and Figure.6 present, respectively, the ACF and PACF of the log SSE Index after first differencing. Only three bars stand out markedly beyond the confidence interval in the PACF plot. Lag 1 is the most significant of these, suggesting p = 1. The ACF decreases to zero quickly, although some spikes remain at larger lags, so q = 2 or q = 4 could be tested. After the first difference the series is approximately stationary, so d = 1. This gives initial guesses for all three parameters.

Table.1 AIC values of ARIMA models

No.   Model           AIC
1     ARIMA(1,1,0)    -719.30
2     ARIMA(1,1,2)    -717.39
3     ARIMA(1,1,4)    -716.06

Figure.7. Diagnostic checking for Model 1
Figure.8. Diagnostic checking for Model 2

According to Table.1, Model 1 and Model 2 have lower AIC values than Model 3, so further diagnostic checking is applied to the first two models. We use the Ljung-Box test, whose null hypothesis is that the residuals follow a white noise process. Figure.7 and Figure.8 show that Model 2 performs better than Model 1.
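The diagnostic step can be sketched with the Ljung-Box statistic Q = n(n+2) Σ_{k=1}^{h} r_k² / (n − k). Q is compared against a chi-square distribution with h − fitdf degrees of freedom. This is plain Python rather than the author's R code, and the function names and simulated residuals are hypothetical. Setting fitdf = p + q (3 for ARIMA(1,1,2)) is a common adjustment when testing ARMA residuals, and R's Box.test exposes it through its fitdf argument. The default here, fitdf = 0, applies no adjustment.

```python
import math
import random


def sample_acf(xs, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of the demeaned series."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    c0 = sum(v * v for v in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(1, max_lag + 1)]


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square with integer df >= 1 (closed-form recursion)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Base cases: df = 2 gives exp(-x/2); df = 1 gives erfc(sqrt(x/2)).
    k, q = (2, math.exp(-half)) if df % 2 == 0 else (1, math.erfc(math.sqrt(half)))
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def ljung_box(resid, h, fitdf=0):
    """Ljung-Box Q over lags 1..h and its chi-square p-value on h - fitdf df."""
    n = len(resid)
    r = sample_acf(resid, h)
    q = n * (n + 2) * sum(r[k - 1] ** 2 / (n - k) for k in range(1, h + 1))
    return q, chi2_sf(q, h - fitdf)


# Hypothetical residual series: one from an adequate model (white noise),
# one with leftover AR(1) structure that the model failed to capture.
random.seed(7)
white = [random.gauss(0.0, 1.0) for _ in range(500)]
ar = [0.0]
for e in white[1:]:
    ar.append(0.5 * ar[-1] + e)

for h in (5, 10):
    print(h, "white p=%.3f" % ljung_box(white, h)[1],
          "ar p=%.2e" % ljung_box(ar, h)[1])
```

Large p-values across lags, as reported for Model 2, mean the white-noise hypothesis is not rejected. P-values below the threshold, as for Model 1 beyond lag 5, signal autocorrelation left in the residuals.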
For Model 2, the Ljung-Box p-values stay well above the significance threshold at every lag, so the hypothesis that its residuals are white noise is not rejected. For Model 1, the p-values fall below the threshold after lag 5, indicating remaining autocorrelation. Therefore, we discard Model 1. In Figure.9, most points in the normal Q-Q plot lie close to a straight line, supporting the normality assumption for the residuals of Model 2.

Figure.9. Normal probability plot for Model 2

Table.2 Comparison between predicted and observed values under the ARIMA(1,1,2) process (1)

        Dec.1998 – Sep.1999                          Dec.2002 – Sep.2003
Month   Observed     Predicted (95% CI)               Observed      Predicted (95% CI)
Dec     7.095055     7.154412 (7.035785, 7.273039)    7.245124      7.284562 (7.171960, 7.397164)
Jan     7.049383     7.142213 (6.926138, 7.358288)    7.260623      7.281729 (7.082684, 7.480774)
Feb     7.01416      7.143546 (6.808125, 7.478967)    7.309795      7.281807 (7.015182, 7.548432)
Mar     7.035014     7.143400 (6.726103, 7.560698)    7.30579       7.281804 (6.961758, 7.601851)
Apr     7.057584     7.143416 (6.657393, 7.629439)    7.333755      7.281805 (6.916054, 7.647555)
May     7.039483     7.143414 (6.597292, 7.689537)    7.337524      7.281805 (6.875458, 7.688151)
Jun     7.245844     7.143414 (6.543175, 7.743653)    7.342675      7.281805 (6.838564, 7.725045)
Jul     7.369233     7.143414 (6.493550, 7.793278)    7.315819      7.281805 (6.804514, 7.759095)
Aug     7.384211     7.143414 (6.447455, 7.839374)    7.289262      7.281805 (6.772737, 7.790872)
Sep     7.39111      7.143414 (6.404228, 7.882601)    7.252953      7.281805 (6.742830, 7.820779)

Table.3 Comparison between predicted and observed values under the ARIMA(1,1,2) process (2)

        Dec.2007 – Sep.2008                          Dec.2014 – Sep.2015
Month   Observed     Predicted (95% CI)               Observed      Predicted (95% CI)
Dec     8.528304     8.561415 (8.454205, 8.668624)    7.97052954*   7.823826 (7.719913, 7.927738)
Jan     8.53475      8.547505 (8.348281, 8.746729)    8.08100925*   7.833945 (7.643790, 8.024101)
Feb     8.442054     8.547322 (8.271875, 8.822769)    8.08148411    7.840389 (7.579161, 8.101617)
Mar     8.308642     8.547319 (8.212411, 8.882228)    8.13630781    7.844491 (7.520438, 8.168544)
Apr     8.15126*     8.547319 (8.162017, 8.932621)    8.30282742*   7.847103 (7.466402, 8.227805)
May     8.157918     8.547319 (8.117491, 8.977147)    8.39903611*   7.848767 (7.416411, 8.281122)
Jun     8.055153*    8.547319 (8.077164, 9.017474)    8.46735638*   7.849826 (7.369953, 8.329698)
Jul     7.938999*    8.547319 (8.040032, 9.054606)    8.29890293    7.850500 (7.326579, 8.374421)
Aug     7.864707*    8.547319 (8.005439, 9.089199)    8.20938641    7.850929 (7.285891, 8.415967)
Sep     7.715171*    8.547319 (7.972925, 9.121713)    8.094331537   7.851203 (7.247548, 8.454857)

All values are on the log scale. In Table.3, * marks observed values that fall outside the 95% confidence interval.

We use the ARIMA(1,1,2) model to predict 10 months ahead in four distinct periods and construct 95% confidence intervals for each prediction. The predictions in Table.2 are more accurate than those in Table.3. All the observed values in Table.2 fall within the confidence intervals, and the predictions match the observations well, which supports the fit of the model. By contrast, half of the observed values in Table.3 (10 of 20, marked *) fall outside the confidence intervals, so the corresponding predictions are unreliable. This outcome is understandable given the market conditions at the time. China's stock market was in a bull market at the end of 2007. More and more people took part in the capital game, the market was full of speculation, and the SSE Index reached its highest point of the period. As the global financial crisis approached, stock prices dropped at an unexpected speed and the bubble burst. History repeated itself at the end of 2014, when share prices went through another roller coaster.

4. Discussion

In conclusion, the ARIMA(1,1,2) model fits the SSE Index well when the market is rational. It has less explanatory power in periods of extreme volatility, but it remains useful in practice.
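The coverage statements for Table.2 and Table.3 can be checked mechanically. The sketch below is in plain Python. The (observed, lower, upper) triples are transcribed from the two tables, and the helper name is hypothetical. It lists the months in which the observed value falls outside the 95% confidence interval.

```python
# (observed, lower, upper) triples on the log scale, transcribed from
# Table.2 and Table.3; months run Dec..Sep in each period.
PERIODS = {
    "Dec.1998-Sep.1999": [
        (7.095055, 7.035785, 7.273039), (7.049383, 6.926138, 7.358288),
        (7.01416, 6.808125, 7.478967), (7.035014, 6.726103, 7.560698),
        (7.057584, 6.657393, 7.629439), (7.039483, 6.597292, 7.689537),
        (7.245844, 6.543175, 7.743653), (7.369233, 6.493550, 7.793278),
        (7.384211, 6.447455, 7.839374), (7.39111, 6.404228, 7.882601),
    ],
    "Dec.2002-Sep.2003": [
        (7.245124, 7.171960, 7.397164), (7.260623, 7.082684, 7.480774),
        (7.309795, 7.015182, 7.548432), (7.30579, 6.961758, 7.601851),
        (7.333755, 6.916054, 7.647555), (7.337524, 6.875458, 7.688151),
        (7.342675, 6.838564, 7.725045), (7.315819, 6.804514, 7.759095),
        (7.289262, 6.772737, 7.790872), (7.252953, 6.742830, 7.820779),
    ],
    "Dec.2007-Sep.2008": [
        (8.528304, 8.454205, 8.668624), (8.53475, 8.348281, 8.746729),
        (8.442054, 8.271875, 8.822769), (8.308642, 8.212411, 8.882228),
        (8.15126, 8.162017, 8.932621), (8.157918, 8.117491, 8.977147),
        (8.055153, 8.077164, 9.017474), (7.938999, 8.040032, 9.054606),
        (7.864707, 8.005439, 9.089199), (7.715171, 7.972925, 9.121713),
    ],
    "Dec.2014-Sep.2015": [
        (7.97052954, 7.719913, 7.927738), (8.08100925, 7.643790, 8.024101),
        (8.08148411, 7.579161, 8.101617), (8.13630781, 7.520438, 8.168544),
        (8.30282742, 7.466402, 8.227805), (8.39903611, 7.416411, 8.281122),
        (8.46735638, 7.369953, 8.329698), (8.29890293, 7.326579, 8.374421),
        (8.20938641, 7.285891, 8.415967), (8.094331537, 7.247548, 8.454857),
    ],
}
MONTHS = ["Dec", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep"]


def misses(rows):
    """Months whose observed value lies outside [lower, upper]."""
    return [m for m, (obs, lo, hi) in zip(MONTHS, rows) if not lo <= obs <= hi]


results = {period: misses(rows) for period, rows in PERIODS.items()}
for period, missed in results.items():
    print(f"{period}: {len(missed)}/10 outside the 95% CI {missed}")
```

It reports no misses in either Table.2 period and five misses in each Table.3 period. This matches the statement that half of the Table.3 observations fall outside their intervals. In 2008 the observations drop below the intervals, while in 2014-2015 they rise above them.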