2013 UNDERGRADUATE AWARDS

Testing the Efficient Market Hypothesis Using Data Analytics

Demo User
5/20/2013

Predicting the movements of the stock market is an area of interest to a huge number of people, not least because of the profit opportunities it would present. It would also allow governments to refine their policies to be more effective and allow for safer hedging by firms and companies. In this paper I test whether techniques used in data mining can predict the market with greater success than the models that have historically been used. The aim is to lay groundwork on which future research into predicting the market using advanced techniques can be based. The weak form of the efficient market hypothesis states that current returns cannot be predicted using past returns. I first review the literature on the topic and examine the most recent methods which have been applied to testing this hypothesis. I then carry out a test using an autoregressive moving average model, one of the models traditionally used in testing the efficient market hypothesis, before applying the data mining techniques of classification trees and random forests to the data. The results from these tests are compared against the traditionally used methods and their potential for predicting the market is highlighted. The paper concludes with recommendations for extensions to these tests.

Table of Contents
Introduction
Motivation
Literature Review
Empirical Approach
    Standard Models
    More Powerful Techniques
    Tests of the EMH Using the Advanced Techniques
Description of the Data Set
Empirical Results
    Data Mining Methods
Summary of Results
Extensions
Conclusion
Bibliography

Introduction

The idea of efficient markets makes intuitive sense to us all, in that it must be difficult to get rich, and yet it is one of the most hotly contested propositions in all of the social sciences. Although the efficient market hypothesis (EMH) is very simple to state, it has proven extremely resistant to empirical proof or refutation.
There have been several decades of research and thousands of published studies, and yet there is as yet no consensus about whether markets are in fact efficient. Most financial models have been built around the EMH, including the famous Black-Scholes-Merton model, and the theory is often applied in other areas of economics such as macroeconomic growth models. The aim of this paper is to test the EMH using a traditional ARMA model and more recently developed statistical models. The use of different methods enables me to test whether the market can be predicted not just by linear relationships but also by interactions between past returns. A brief description of the EMH is provided below, along with a description of its levels. In this paper I intend to test the weak form efficiency of the EMH.

Efficient Markets Hypothesis

The efficient market hypothesis can be defined as follows: "Security prices accurately reflect available information, and respond rapidly to new information as soon as it becomes available." (Myers, 1996). This implies that it is near impossible to make returns that consistently outperform the market, and where it happens it is due more to luck than to any particular inference or analysis on the part of the investor.

Forms of Efficiency

Three forms of the efficient market hypothesis have been proposed:

Weak Form Efficiency – This form argues that investors should not be able to make excess returns by observing only historical asset prices. Prices should respond only to new information or new economic events. Technical analysts, or chartists, believe this to be untrue and seek to use the trends and patterns of past prices to predict future prices.

Semi-Strong Form Efficiency – This form argues that all publicly available information, such as company earnings and market conditions, should be reflected in asset prices, making it impossible to earn excess returns without having insider knowledge.
It would be futile for investors to search for bargain opportunities through analysis of published data if this form holds. Fundamentalists believe this to be untrue and trade securities based on the fundamental value of a company.

Strong Form Efficiency – This form argues that it should be impossible to earn excess returns from both public and private information, thus ruling out insider trading. This form is unlikely to hold, and as such there are laws against insider trading. There is some systematic evidence that insiders can earn abnormal returns even when they trade legally (Seyhun, 1998).

These three levels are not independent of one another: for the market to be efficient in the semi-strong sense it must be efficient in the weak form, and it must be efficient in both to be strong form efficient.

Motivation

Of late there has been a loss of faith in the EMH, as many studies have come out arguing against it. In this paper I intend to test the market using not only the standard models which have been applied to the theory in the past, but also some of the more powerful statistical techniques that have been developed more recently. I believe it is possible there is a reverse file-drawer bias at work in studies of market efficiency: when a model that can genuinely predict the market is created, it is unlikely to be published, because it would be far more profitable for the researcher to sell the model to an investment bank than to publish the paper. Even though there are many arguments against the efficient market hypothesis, there still exists no well-publicised way to make a risk-free profit from trading on the market. In this paper I intend to test the weak form of the efficient market hypothesis using an ARMA model. The data I will use are weekly price data from the NYSE Composite, an index covering all of the common stock listed on the New York Stock Exchange; over 2000 stocks are covered by the index.
The basis for these tests stems from the original tests of the EMH based upon random walk theory. The original tests failed to reject the EMH, and although modern economic opinion holds that the market is not efficient, and although I am making use of modern data, it will be difficult for these tests to reject the EMH. While the ARMA methodology is a very useful forecasting technique with many applications, it struggles to model returns data. ARMA models are unable to take account of interactions in the data, so a model that does take account of such interactions could provide stronger results. Should the ARMA model fail to provide any new conclusions, I propose to test the EMH using some of the more powerful statistical techniques that have been developed in recent years. As computing power has grown there has been an explosion in the number and power of statistical techniques. Advances over the last two decades in the fields of data analysis and data mining have led to the development of many techniques which could allow more accurate prediction of movements in the stock markets. Indeed, some of the techniques which will be applied are undoubtedly already in use by the major investment banks. The techniques I intend to apply are classification trees and random forests. Classification trees have been shown to be among the less powerful of these methods, yet they are the least complicated; they allow us to see clearly the rules generated, which may provide insight into any patterns emerging in the data. Random forests are an example of an ensemble method, used regularly in the financial sector for prediction. These models, while using only past returns as input data, may provide better results than the standard models; such techniques consistently outperform methods such as linear regression in both regression and classification tasks.
The prediction power of these methods will be tested, with the conclusion that, subject to certain conditions, if they perform significantly better than chance we can reject the null hypothesis of market efficiency.

Literature Review

The earliest known statement of the efficient market hypothesis comes from 1889: "When shares become publicly known in an open market, the value which they acquire there may be regarded as the judgment of the best intelligence concerning them." (Gibson, 1889). The origins of the modern-day EMH stem from the work of Eugene F. Fama and Paul A. Samuelson in the 1960s. In 1965 Fama published his thesis, which argued that share prices follow random walks (Fama, 1965), while Samuelson published a proof for a version of the EMH (Samuelson, 1965). In 1970 Fama defined an efficient market as one in which security prices always fully reflect the available information (Fama, 1970). The implications of this statement are huge, and in the 1960s and 1970s the EMH was a great success as academics developed more powerful theoretical reasons why the theory should hold and a large amount of supporting empirical evidence emerged. The field of academic finance was created on the basis of the EMH and its applications (Shleifer, 2000). In 1978, at the height of the popularity of the EMH, Jensen wrote that 'there is no other proposition in economics which has more solid empirical evidence supporting it than the Efficient Markets Hypothesis' (Jensen, 1978). The basic theoretical case for the EMH relies on three key arguments (Shleifer, 2000). First, investors are assumed to be rational and as such will value securities rationally. Second, any investors who are not rational trade randomly, and so their trades cancel out. Third, arbitrageurs will step in to take advantage of any situation where irrational investors do act in a similar fashion.
If these assumptions hold then the effects of irrational and uninformed agents should show up in the data as mere white noise, and asset prices would follow random walks. Tests of weak-form efficiency have their origins in random walk theory. This theory suggests that sequences of share price movements over time are consistent with a series of cumulative random numbers rather than forming patterns. The random walk model assumes that successive returns are independent and that returns are identically distributed over time (Elton, Gruber, Brown, & Goetzmann, 2011). An early test of this type was carried out by Working, who found that the price changes of wheat follow a random walk (Working, 1934). In his 1953 paper Kendall found that when price series are observed at relatively close intervals, the random change from one period to the next is large enough in magnitude to swamp any systematic effect which may be present (Kendall, 1953). He also found empirical evidence and theoretical support for the belief that aggregate index numbers behave more systematically than their components. Roberts too found that stock prices follow a pattern similar to a random walk, yet theorized that departures from this model would one day be discovered by economists (Roberts, 1959). Alexander found that in speculative markets price changes appear to follow a random walk, but that a move, once initiated, tends to persist (Alexander, 1961). Thus by applying certain filter techniques abnormal profits could be made; however, the profits disappeared once transaction costs were taken into account. Similar findings were made by Fama and Blume in 1966 (Fama & Blume, 1966). That certain techniques could provide abnormal profits raises the possibility that markets are not efficient, even though the techniques would have to be streamlined in order to retain profitability when transaction costs are included.
After the 1970s the EMH was challenged on both theoretical and empirical grounds. De Bondt and Thaler advanced the theory that stock prices overreact, to explain their finding that previously low-performing stocks earn higher returns over the next few years than stocks that have previously performed well (De Bondt & Thaler, 1985). Jegadeesh and Titman found that movements in individual stock prices over a period of six to twelve months tend to predict future movements in the same direction (Jegadeesh & Titman, 1993). Fama and French found that returns were predictable across a three to five year period, although this effect seemed to disappear over time (Fama & French, 1988). It was also shown by many authors that it is difficult to maintain the case that investors are fully rational. Kahneman and Riepe, in their 1998 paper, summarize the ways in which many individuals deviate from the standard decision-making model (Kahneman & Riepe, 1998). Evidence that markets are in fact inefficient is summarized and laid out clearly in the book by Andrei Shleifer (Shleifer, 2000). Shleifer presents a multitude of evidence to support this claim, including the reasons why it is still extremely difficult to profit from the inefficiencies and why arbitrageurs are unable to take advantage of the obvious mispricings. This evidence that the market is inefficient has led to the creation of the field of behavioral finance. Perhaps the most obvious source of potential profit arising out of behavioral finance is the existence of positive feedback loops, where noise traders in price bubbles react to past price changes as opposed to particular news (Shleifer, 2000). Examples of this include the rising prices of internet stocks in 1998. Black found that investors' willingness to bear risk rises with their wealth, which could lead to positive feedback trading (Black, 1988).
Shiller, in his book Irrational Exuberance, outlines how feedback bubbles could occur and provides examples of such bubbles in relation to the housing market (Shiller, 2005). Shleifer provides a model showing how such feedback loops could arise in the market (Shleifer, 2000). What is needed, it appears, is a trading strategy that can profit from such bubbles. Evidence that the financial market is predictable is provided in papers such as that by Lo et al., who used technical analysis and computational algorithms to predict the market (Lo, Mamaysky, & Wang, 2000). The existence of trends and patterns in the market was demonstrated by Lo and MacKinlay (Lo & MacKinlay, 2001). Gencay found that technical trading rules, such as a moving average model, are more effective at predicting returns and exchange rates than a model that assumes they follow a random walk (Gencay, 1997, 1998). Over the last few years there have been several promising results from applying nonlinear classification and machine learning techniques to market prediction. Using support vector machines, Wang and Zhu demonstrated that the market is predictable to some degree (Wang & Zhu, 2010); however, once trading costs are taken into account their model is no longer profitable. Cao et al. find that neural networks are significantly more accurate at predicting stock market returns than either the capital asset pricing model or Fama and French's three-factor model (Cao, Leggio, & Schniederjans, 2005). They believe that there are investors out there who are using neural networks to successfully exploit inefficiencies in the market, but who are better served by keeping the information to themselves rather than publishing their findings. Huang et al.
show that support vector machines can forecast market returns more accurately than other classification methods, but suggest that in order to maximize forecast accuracy several methods should be combined in an ensemble or blended model (Huang, Nakamori, & Wang, 2005).

Empirical Approach

In this paper I will be testing the weak form efficiency of the EMH: the possibility that returns can be predicted from past returns. I intend to run several types of model in order to test the hypothesis that returns cannot be predicted from past returns. This can be stated as:

E(Rt | Rt-1, Rt-2, …) = E(Rt)

where Rt is the return at time t. Following Cooper (1982), the returns of an asset price can be calculated as:

Rt = loge(Pt / Pt-1)

where Pt is the price at time t. The price data will be converted into returns data in this way.

Standard Models

The model which I test is made up of two parts. The first is an autoregressive (AR) model; its functional form is as follows:

Rt = α + φ1(Rt-1) + … + φn(Rt-n) + ϵt

where α is the expected return, the φi are the coefficients on previous returns, ϵt represents the error term and n denotes the number of lagged terms to include; this model is denoted AR(n). The second part of the model is a moving average (MA) model, which expresses the current return as a function of the mean value and previous error terms. It has the following functional form:

Rt = μ + ϵt − ψ1(ϵt-1) − … − ψn(ϵt-n)

In this model μ denotes the mean value, ϵt denotes the error at time t, the ψi are the coefficients on the previous error terms and n is the order of the moving average model, denoted MA(n). The model I will test is an autoregressive moving average model.
This model combines the elements of both a moving average and an autoregressive model, and its functional form is as follows:

Rt = α + φ1(Rt-1) + … + φp(Rt-p) + μ − ψ1(ϵt-1) − … − ψq(ϵt-q) + ϵt

The notation is the same as in the previous models, with the order of the autoregressive part now p and the order of the moving average part q; the model is denoted ARMA(p,q). Testing the EMH requires testing:

H0: φ1 = … = φp = ψ1 = … = ψq = 0

against the alternative H1 that any of the parameters is statistically significantly different from 0. If the data are not stationary then a different model will have to be used, one which involves differencing the time series in order to remove a trend. This differencing is integrated into the ARMA model, creating the autoregressive integrated moving average (ARIMA) model. Its functional form is:

(1 − φ1B − φ2B^2 − … − φpB^p)(1 − B)^d Rt = c + (1 − ψ1B − ψ2B^2 − … − ψqB^q)ϵt

where B represents the backshift operator, defined such that B(Rt) = Rt-1 and B^2(Rt) = Rt-2. Here the terms α and μ have been combined into the term c, and d represents the order of differencing in the model. The model is denoted ARIMA(p,d,q). The null and alternative hypotheses are the same as for the ARMA model.

When it comes to determining the significance of results, the explanatory power of the models will be important. For this we will make use of the R-squared measure, which gives the proportion of the variation in the data that is explained by the model. ARMA and ARIMA models are estimated by maximum likelihood, however, so a pseudo R-squared value will have to be constructed. This will be done using the method laid out in the Stata online manual (Cox, 2003).

More Powerful Techniques

In order to make use of classification trees and random forests, I will create a data set with the dependent variable being the return in the current period and the independent variables being the lagged values from the last 104 weeks, approximately two years.
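As a concrete illustration of the returns transformation and the lagged data set just described, the following sketch builds log returns and a 104-lag design matrix. This is an assumption for illustration only (the paper's calculations were carried out in Excel, Stata and R, not Python), and the function names are hypothetical.

```python
# A minimal sketch (not the paper's code) of the two steps described above:
# converting prices to log returns, Rt = log(Pt / Pt-1), and pairing each
# return with its previous 104 weekly returns as predictors.
import numpy as np

def log_returns(prices):
    """Continuously compounded returns; loses the first observation."""
    prices = np.asarray(prices, dtype=float)
    return np.log(prices[1:] / prices[:-1])

def lagged_matrix(returns, n_lags=104):
    """Pair each return R_t with its predecessors (R_t-1, ..., R_t-n_lags)."""
    returns = np.asarray(returns, dtype=float)
    y = returns[n_lags:]                            # targets: current returns
    X = np.column_stack([returns[n_lags - k: len(returns) - k]
                         for k in range(1, n_lags + 1)])
    return X, y
```

With 2,458 returns and 104 lags, this construction yields 2,354 usable rows, since the first 104 weeks have incomplete histories.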
I will outline the techniques I propose to use below.

Classification and Regression Trees

Classification and regression trees of the CART methodology, which will be used here, were introduced by Breiman, Friedman, Olshen and Stone in 1984 (Breiman, Friedman, Olshen, & Stone, 1984). One of the strengths of trees is that they are far better at capturing interactions between variables than linear regression is. The algorithm used is as follows:

1. At the root node, measure the impurity of the node using a criterion such as the information or Gini criterion.
2. Choose the variable and split value that lead to the largest fall in the impurity measure when the data are split on them.
3. Split the data into two nodes using this variable and value.
4. Repeat steps 2 and 3 for each node until no further gains in accuracy are seen.
5. Prune the tree back using the complexity parameter value.
6. Classify each terminal node as whatever class makes up the highest proportion of the cases at that node.

Cases are then run down the tree and classified according to which terminal node they end up in.

Random Forests

Random forests are an ensemble method created by Leo Breiman in 2001 (Breiman, 2001). They build on the decision tree methodology: a large number of trees are constructed and their results combined in order to make predictions. The distinctive feature of a random forest is that two kinds of randomness are built in. Instead of building each tree on the whole data set, a sample of the data set is taken with replacement (the bootstrap method) to provide the data on which each tree is built. In addition, at each node only a randomly selected subset of the variables is available to split the data. This additional randomness has been shown to increase the accuracy of the predictions and allows the models to be combined in an ensemble without overfitting the data.
The pseudocode for the algorithm, as laid out by Montillo, is as follows. Let Ntrees be the number of trees to build and mtry the number of variables available to split the data at any node. For each of the Ntrees iterations:

1. Select a new bootstrap sample from the training set.
2. Grow an un-pruned tree on this bootstrap sample.
3. At each internal node, randomly select mtry predictors and determine the best split using only those predictors.
4. Do not perform cost-complexity pruning; save the tree as is, alongside those built so far.

The overall prediction is output as the average response (regression) or the majority vote (classification) across all individually trained trees (Albert A. Montillo, 2009).

Tests of the EMH Using the Advanced Techniques

In order to test the EMH using these advanced techniques, the data will be randomly split into training and test sets using a 75/25 split. The models will then be constructed on the training set and applied to the test set. The dependent variable will be converted to a binary variable to allow for the following tests: the data will be categorised into returns above the median value, H, and returns below the median value, L. The median is used as the cut-off rather than the mean because the mean lies below the median; splitting at the mean would place more than half of the returns in the high class, so a model that simply assigned every return to the high class would automatically achieve an accuracy of greater than 50%. Splitting at the median balances the two classes. The classification tree will be built on the training data using the previously outlined algorithm and then applied to the test set data.
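The split-train-apply procedure above can be sketched as follows. This is a hedged illustration using scikit-learn rather than the software actually used for the paper's results, and the synthetic returns series is a placeholder for the NYSE Composite data; none of the numbers it produces are the paper's.

```python
# Illustrative sketch (an assumption, not the paper's code): binary H/L
# target from the median, 104-lag design matrix, 75/25 split, then a
# classification tree and a random forest as described in the text.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
returns = rng.normal(0.001, 0.02, size=600)      # placeholder weekly returns

n_lags = 104
y = (returns[n_lags:] > np.median(returns[n_lags:])).astype(int)   # H = 1, L = 0
X = np.column_stack([returns[n_lags - k: len(returns) - k]
                     for k in range(1, n_lags + 1)])               # lagged predictors

# Random 75/25 train/test split, as in the text.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

tree = DecisionTreeClassifier(criterion="gini", random_state=1).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                                random_state=1).fit(X_tr, y_tr)

tree_acc = tree.score(X_te, y_te)        # share of test weeks classified correctly
forest_acc = forest.score(X_te, y_te)
```

On pure noise such as this placeholder series, both accuracies should hover around 0.5, which is exactly the null hypothesis of the tests that follow.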
The hypotheses are as follows:

H0: The accuracy of the model is equal to 0.5
H1: The accuracy of the model is not equal to 0.5

Or equivalently:

H0: The model has no predictive power / is no better than picking the results by chance
H1: The model has predictive power / is better than picking the results by chance

Thus if the model is able to predict above-median returns at a rate that is statistically significantly different from chance, we can reject the null hypothesis and by extension the weak form of the EMH, which states that past returns should provide no information about current returns. The random forest follows exactly the same procedure, with the model constructed on the training data set and applied to the test data set; the hypotheses and subsequent conclusions remain unchanged.

The statistical significance of these models will be tested using the receiver operating characteristic (ROC) curve, which graphs the relationship between sensitivity and specificity for the model. The sensitivity of a returns test is the proportion of returns whose outcome is high that are correctly identified by the test; the specificity is the proportion of returns whose outcome is low that are correctly identified. A test with high sensitivity may have low specificity and vice versa, so a trade-off between the two usually has to be made. The ROC curve is used because simple classification accuracy has been shown to be a poor metric for performance (Provost & Fawcett, 1997). To test the significance of the ROC curve, confidence intervals will be constructed using the bootstrap method, which involves taking a random sample with replacement, the same size as the test set, from the test set and running it through the model. The area under the curve (AUC) gives a measure of the predictive power of the model and is explained clearly by Bradley (1997).
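The bootstrap confidence interval for the AUC described above can be sketched as follows. This is a minimal illustration under stated assumptions: scikit-learn's roc_auc_score stands in for whatever routine was actually used, and the labels and scores are synthetic.

```python
# Percentile bootstrap interval for the AUC: resample the test set with
# replacement, recompute the AUC each time, and take the middle 95%.
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        if y_true[idx].min() == y_true[idx].max():   # AUC needs both classes
            continue
        aucs.append(roc_auc_score(y_true[idx], scores[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Scores unrelated to the labels: the interval should contain 0.5.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 500)
lo, hi = bootstrap_auc_ci(y, rng.random(500))
```

An interval that lies entirely above 0.5 would be the evidence, under this procedure, for rejecting the null hypothesis of no predictive power.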
The AUC of a classifier is equivalent to the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance, which is in turn equivalent to the Wilcoxon test of ranks (Hanley & McNeil, 1982). An ideal model would have an AUC of 1, or 100%. If the null hypothesis holds, we would expect the AUC of the models to equal 0.5, or 50%, which is equivalent to picking the results by chance (Bewick, Cheek, & Ball, 2004).

Description of the Data Set

The data set contains the price of the NYSE Composite over time, taken from the Yahoo! Finance website (Yahoo! Finance, 2013). The data stretch from 31/12/1965 to 11/02/2013 and there are 2459 observations, with no missing data. As the data were in price form they were converted into returns, with the loss of one observation leaving 2458 observations. Table 1 summarises some key characteristics of the data set.

Mean              0.00115119286102522
Standard Error    0.000451086760493625
Median            0.003091893
Minimum          -0.217345349
Maximum           0.130591436

Table 1 – Data Summary Statistics

The fact that the mean is far lower than the median indicates that losses, when they occur, are more extreme than gains above the mean value. This can also be seen in the minimum and maximum values, where the minimum lies much further from the mean than the maximum does. The series was tested for a unit root using the Dickey-Fuller test and was found to be stationary. The null hypothesis here is that there is a unit root; a p-value of 0.000 combined with a test statistic of -49.620 against the 1% critical value allows us to reject this hypothesis.
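The mechanics behind the Dickey-Fuller statistic can be sketched directly: regress the differenced series on its lagged level and take the t-ratio on the lag coefficient. This is an illustration only; the paper's figures come from Stata's dfuller command, and inference uses MacKinnon's critical values rather than the usual t distribution.

```python
# Sketch of the no-trend Dickey-Fuller regression with a constant:
#   diff(r)_t = alpha + gamma * r_{t-1} + e_t
# The test statistic is the t-ratio on gamma.
import numpy as np

def dickey_fuller_stat(series):
    r = np.asarray(series, dtype=float)
    dy = np.diff(r)
    X = np.column_stack([np.ones(len(dy)), r[:-1]])   # constant + lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - X.shape[1])       # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)                  # OLS covariance matrix
    return beta[1] / np.sqrt(cov[1, 1])

# White noise has no unit root, so the statistic falls far below -3.430.
rng = np.random.default_rng(0)
stat = dickey_fuller_stat(rng.normal(0.0, 0.02, 2000))
```

A statistic more negative than the relevant MacKinnon critical value rejects the unit-root null, which is the conclusion reached for the returns series above.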
Dickey-Fuller test for unit root                          Number of obs = 2353

        Test statistic    1% critical value   5% critical value   10% critical value
Z(t)    -49.620           -3.430              -2.860              -2.570

MacKinnon approximate p-value for Z(t) = 0.0000

Table 2 – Dickey-Fuller Test for Unit Root

The kernel density is displayed in Plot 1.

Plot 1 – Kernel Density of Returns (Epanechnikov kernel, bandwidth 0.0035, with a normal density overlaid)

From this plot we can see that the returns are very close to a normal distribution, yet have a higher peak at the mean value. A plot of the observations can be seen in Plot 2 below. With so many observations it would be difficult to spot small trends even if they existed, and no major trends or runs are apparent from the plot. The data are clearly very noisy, and the lack of any linear trend suggests that we should not have to difference the data, i.e. the data are stationary in mean. Alongside this plot, the autocorrelation function (ACF) plot and the partial autocorrelation function (PACF) plot are also displayed. The autocorrelation function measures the similarity of observations as a function of the time separation between them; the partial autocorrelation function is an extension of the autocorrelation function in which the dependence on the intermediate lags is removed. It can be clearly seen from these plots that several lags are significant. Significant spikes in the ACF, those that break through the confidence band, may indicate a moving average process at those lags, while significant spikes in the PACF may indicate an autoregressive process. The significant lags on the ACF are 6, 15, 27, 60, 83 and 88; the significant lags on the PACF are 6, 15, 60, 83, 89, 91 and 99. Particular attention will be paid to these lags when constructing the models.
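The autocorrelation function behind these plots can be computed directly, as the following sketch shows; it is an illustrative stand-in for the Stata output actually used, with a synthetic white-noise series in place of the returns.

```python
# Sample ACF: the correlation of the series with itself at each lag,
# compared against the usual +/- 1.96/sqrt(n) significance band.
import numpy as np

def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = x @ x
    return np.array([1.0] + [(x[:-k] @ x[k:]) / denom
                             for k in range(1, max_lag + 1)])

rng = np.random.default_rng(2)
white = rng.normal(size=2458)           # same length as the returns series
rho = acf(white, max_lag=10)
band = 1.96 / np.sqrt(len(white))       # lags outside the band count as significant
```

For genuinely white noise almost every lag stays inside the band; the returns series, by contrast, shows significant spikes at the lags listed above.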
It is difficult at this stage to determine whether these lags represent an autoregressive or a moving average process.

Plot 2 – Time Series Plot of the Data with ACF and PACF Plots

Empirical Results

In this section I will present the empirical results that emerged from examining the data. The interpretation and implications of the results will be presented in the next section. The results in this section were computed using the statistical software Microsoft Excel, Stata and R.

ARMA(7,7)

In order to determine the best model to use and avoid overfitting, a test of the significance of various lag lengths was carried out. The lags are assessed against three criteria, namely Akaike's information criterion (AIC), Hannan and Quinn's information criterion (HQIC) and Schwarz's Bayesian information criterion (SBIC). The results are provided below, with * marking the lag length selected by each criterion. From the table we can see that both the HQIC and the SBIC select lag 0, while the AIC selects lag 7. As the AIC is better at handling monthly and weekly data, this is the criterion used (Ivanov & Kilian, 2001).

Selection-order criteria
Sample: 1960w12 - 2005w15                              Number of obs = 2344

lag   LL        LR        df   p       FPE        AIC         HQIC        SBIC
0     5568.89                          .000506   -4.75076    -4.74986*   -4.7483*
1     5569.56   1.3363    1    0.248   .000506   -4.75047    -4.74868    -4.74556
2     5570.57   2.0236    1    0.155   .000506   -4.75048    -4.7478     -4.74311
3     5570.65   .16142    1    0.688   .000507   -4.7497     -4.74612    -4.73987
4     5571.35   1.4078    1    0.235   .000507   -4.74945    -4.74497    -4.73716
5     5571.43   .1662     1    0.684   .000507   -4.74866    -4.74329    -4.73392
6     5577.42   11.977*   1    0.001   .000505   -4.75292    -4.74666    -4.73572
7     5578.9    2.9468    1    0.086   .000505*  -4.75333*   -4.74617    -4.73367
8     5579.12   .45216    1    0.501   .000505   -4.75266    -4.74461    -4.73055
9     5579.32   .39764    1    0.528   .000506   -4.75198    -4.74303    -4.72741
10    5579.58   .51108    1    0.475   .000506   -4.75135    -4.7415     -4.72432

Table 3 – Selection Order Criteria

The model was run and tested positive for heteroscedasticity.
The model was then tested for autoregressive conditional heteroscedasticity (ARCH), and this test also came back positive. As ARCH is present, we account for it by including ARCH terms in the regression. The significant variables in this model are lags 2, 5, 6 and 7 for the autoregressive process, lags 2, 5 and 7 for the moving average process, and lags 1, 2, 3, 4, 6 and 7 for the ARCH process. As such we are able to reject the null hypothesis of no significant variables. The significant variables are summarised in the table below.

 Variable     Coefficient
 Rt-2          0.6237697***   (0.0586303)
 Rt-5         -0.4644677***   (0.0832076)
 Rt-6          0.1173529*     (0.0634434)
 Rt-7          0.8371188***   (0.0550205)
 ϵt-2         -0.6121567***   (0.0593238)
 ϵt-5          0.4464211***   (0.0806975)
 ϵt-7         -0.854614***    (0.0541573)
 ARCHt-1       0.2134317***   (0.0218642)
 ARCHt-2       0.086734***    (0.027355)
 ARCHt-3       0.1471664***   (0.0280176)
 ARCHt-4       0.1003543***   (0.0219859)
 ARCHt-6       0.0746435***   (0.0211112)
 ARCHt-7       0.0337553*     (0.0175689)

Table 4 – ARMA Output; standard errors are reported in parentheses and p-values are denoted as follows: ***p<0.01, **p<0.05, *p<0.1

Again, the statistically significant coefficients indicate that past values of returns have some predictive power when it comes to forecasting future returns. The summary statistics for this model are provided in the table below.

 Statistic          Value
 Log likelihood     5863.085
 Wald Chi-Square    22374.53***
 Pseudo R-squared   0.01403622
 Durbin-Watson      2.0000374

Table 5 – ARMA Summary Statistics

The Wald Chi-Square statistic is highly significant with a p-value of 0.000. This statistic tests the hypothesis that at least one of the predictors' regression coefficients is not equal to zero; thus we can conclude that the variables in the model are jointly significant. The pseudo R-squared value for this model is 0.01403622, which indicates that the model explains roughly 1.4% of the variation in the data. The Durbin-Watson statistic for this model is 2.
This fails to reject the null hypothesis of no autocorrelation, and so it can be inferred that no serial correlation remains in the residuals.

Data Mining Methods

Here the classification tree and random forest methods will be tested on the data set.

Classification Tree

The classification tree was built using the Gini criterion, as this operates more effectively in noisy domains (Berry & Linoff, 2004). The classification tree is displayed below. The nodes that contain predominantly high returns are denoted H, while those with predominantly low returns are denoted L.

Plot 3 – Classification Tree

What is interesting about this tree is that the variables it splits upon are not recent. In fact, the first variable split upon is the lagged value from 103 weeks ago, almost two years. The variable closest to the present is t32, the lagged value from 32 weeks ago, around seven months. This indicates that the patterns which allow predictability take place long before the pattern is realised. In addition, none of these variables appeared as significant on the ACF or PACF plots. This showcases how decision trees make use of interactions as opposed to linear effects.

The misclassification table produced by the classification tree with a cut-off value of 0.5 is displayed below. Returns are classified according to whether they are above (high) or below (low) the median return.

 Classification Tree      Predicted
                          High    Low
 Actual   High             175    102
          Low              178    133

Table 6 – Classification Tree Misclassification Table

Thus this classification tree classifies high returns with an accuracy of 63% and low returns with an accuracy of 42%. Overall the accuracy of the tree is 52%, so it would seem the tree has a small amount of predictive power. In order to determine whether this predictive power is statistically significant, further tests are required. Below is the receiver operating characteristic (ROC) curve for this classification tree.
The 95% confidence interval was constructed using the bootstrap method with 2,000 repetitions.

Plot 4 – Classification Tree ROC Curve

As can be seen, the confidence interval for the AUC spans 49.8% to 59% and thus the AUC is not statistically significantly different from 50% at the 95% confidence level. As such we fail to reject the null hypothesis that the model has no predictive power, i.e. that it is no better than chance.

Random Forest

The random forest contains 2,000 trees, with a minimum final node size of one; twenty variables were tried at each split. The misclassification table using a cut-off value of 0.499 for the random forest is displayed below.

 Random Forest            Predicted
                          High    Low
 Actual   High             171    106
          Low              180    131

Table 7 – Random Forest Misclassification Table

Thus this model predicts high returns with an accuracy of 61.7% and low returns with an accuracy of 42%. The overall accuracy of the model is 51.36%; again, this is only slightly better than chance. The ROC curve of the random forest is displayed below. The 95% confidence interval was calculated using the same bootstrap technique as was used for the classification tree.

Plot 5 – Random Forest ROC Curve

From the ROC curve we can see that the estimated AUC is 54.8%, with 95% confidence bounds of 50.1% and 59.5%. As this confidence interval lies above 50%, we can conclude with 95% confidence that this model has predictive power. Thus we are able to reject the null hypothesis of no predictive power.

Summary of Results

In the ARMA model there were statistically significant variables, which let us reject the null hypothesis. Thus past returns were found to have an effect on the current market price. This is a violation of the weak form of the EMH. However, the model has very little actual predictive power, and so the gains from exploiting this inefficiency may be non-existent once transaction costs are taken into account.
The ARMA(7,7) model had a pseudo R-squared value of 0.01403622, which implies that it explains less than 1.5% of the variation in returns. Thus any trading strategy based upon this model would carry an extremely high level of risk due to its poor explanatory power. The classification tree failed to reject the null hypothesis that the model had no predictive power; as such, for this test the result is that markets are efficient. This result was expected, as single classification trees generally have very low predictive power. In relation to the random forest, the null hypothesis was rejected and we can accept that the model has predictive power. However, even though the confidence interval was above 50%, it was only marginally so, by 0.1% in fact, which is very close to a model based on chance. (There are no R-squared measures for tree-based methods.) That the model was only slightly better than chance leaves open the possibility that any potential profits arising from a trading strategy based on such a model could be wiped out by transaction costs.

It is difficult to know what conclusions to draw from these results. Elton et al. (2011) suggest three possible explanations of what could be the true state of affairs:

1. With so many researchers examining the same data set, as is certainly the case for the NYSE Composite data used here, patterns will be found, and these patterns are simply random.
2. The patterns could be caused by the market structure and order flow.
3. The markets are inefficient.

The results found here are consistent with the majority of the literature: because of transaction costs, it is more than likely that the return differences are not large enough to allow development of a trading model to exploit the patterns in the returns.
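The AUC comparisons underlying these conclusions rest on the bootstrap confidence intervals shown in Plots 4 and 5. A minimal sketch of that procedure follows (pure Python; the percentile bootstrap is used here, and the tiny label/score set in the test is hypothetical, not the paper's data):

```python
import random

def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC,
    resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        ss = [scores[i] for i in idx]
        if 0 < sum(ys) < n:  # the resample must contain both classes
            stats.append(auc(ys, ss))
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi
```

If the lower bound of the interval stays above 0.5, the model is judged better than chance at the 95% level; this is exactly the decision rule applied to the classification tree (not significant) and the random forest (marginally significant) above.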
Extensions

In order to determine whether it would be possible to profit from the market inefficiencies described above, data on transaction costs would need to be available and a trading strategy would have to be developed based on the model. Only if this strategy is able to make returns above the market average after allowing for transaction costs could we consider it proof of a violation of the EMH. Timmermann and Granger note that, as transaction costs vary over time, a real-time series of transaction costs is needed to fully test the EMH, and this makes disproving the theory far more difficult (Timmermann & Granger, 2004).

As seen in the examination of the classification tree, different variables have linear and interaction effects on returns. As ARMA models are good at capturing linear effects and tree-based methods are good at capturing interactions, a model that combines the two could provide a substantially better prediction than either can alone. I believe that future tests of the EMH will involve the use of advanced statistical models similar to the ones tested here. In particular, the field of ensemble models, which allows several thousand models of different types to be combined for more accurate results, holds substantial promise. By making use of model selection algorithms such as the one outlined by Caruana, a far greater level of accuracy can be achieved than is possible using single models alone (Caruana, 2004). These ensemble methods can combine different types of models, such as random forests, neural nets and support vector machines, to create a more accurate prediction. The downside to these models is that they require significant computing power to build and would need to be constantly updated, making them unusable for those without access to such resources.
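The idea behind Caruana-style ensemble selection can be sketched simply: hold out a validation set, then greedily add (with replacement) whichever model's inclusion most improves the averaged ensemble's validation score. The sketch below is a toy illustration with hypothetical model predictions, not a full implementation of the published algorithm:

```python
def ensemble_select(preds, y_true, rounds=10):
    """Greedy forward selection in the spirit of Caruana et al. (2004).

    preds:  dict mapping model name -> list of predicted probabilities
    y_true: list of 0/1 outcomes on the validation set
    Returns the list of chosen model names (models may repeat)."""
    def accuracy(avg):
        return sum((a >= 0.5) == bool(t) for a, t in zip(avg, y_true)) / len(y_true)

    chosen = []
    for _ in range(rounds):
        best_score, best_name = -1.0, None
        for name in preds:
            trial = chosen + [name]
            # Ensemble prediction = simple average over the selected models.
            avg = [sum(preds[m][i] for m in trial) / len(trial)
                   for i in range(len(y_true))]
            score = accuracy(avg)
            if score > best_score:
                best_score, best_name = score, name
        chosen.append(best_name)
    return chosen
```

Selecting with replacement lets strong models acquire larger weights in the average, which is one reason the full method scales to libraries of thousands of heterogeneous models.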
If the feedback loop amplification theory, outlined by Shiller in his book Irrational Exuberance (Shiller, 2005) and by Shleifer in his book Inefficient Markets (Shleifer, 2000), holds, it may be possible to find more significant patterns using smaller time periods. While previously information took time to travel from person to person, the advent of the smartphone age means that many people are constantly updated on developments around the world. As such, the feedback loops described by Shiller would have a far greater chance of being picked up by a random forest if the data was made up of smaller time periods such as seconds, minutes or hours. The weekly periods used in this study would be far too coarse to pick up such patterns, as, I suspect, would even daily data. The emphasis that random forests put on interactions between variables makes them very well suited to detecting the sort of feedback amplification described by Shiller.

Conclusion

In this paper I have examined the literary background of the EMH, its testing, its development into the field of behavioural finance, and the application of modern statistical techniques to testing it. The empirical approach was then laid out, including brief descriptions of the statistical techniques with which some readers may be unfamiliar. An exploration of the dataset was then completed, followed by the empirical results. The findings here are similar to those of many papers on the EMH: some inefficiency can be found, yet it is relatively small and there would be little to no profit opportunity in exploiting such inefficiencies once transaction costs are taken into account, to say nothing of the risks one would be taking in following a model that explains only 1.5% of the variation in the market. A bright spot in the findings here was the statistically significant predictive power of the random forest model.
This model performed statistically significantly better than chance, and models of this sort hold particular promise for future tests of the EMH. In order to accurately test the weak-form efficiency of the market it is necessary to have real-time data on transaction costs, and that data is currently not available; a key component of future studies will be to include such data. My recommendations for extensions also include making greater use of combination models and ensemble methods, and using data with different time periods. Much work has been done in relation to the efficient market hypothesis and much work remains to be done. As the field of behavioural finance develops, there will be more opportunities for economists to structure their models to identify inefficiency in the market. I believe that continued advances in statistical techniques and computing power will provide economists with the tools to find more breakdowns in the EMH at all strength levels. Whether those findings will be published or sold to an investment bank is another matter entirely.

Bibliography

Berry, M. J. A., & Linoff, G. S. (2004). Data Mining Techniques for Marketing, Sales and Customer Relationship Management. Indianapolis: Wiley Publishing Inc.
Montillo, A. A. (2009, February 4). Guest lecture: Statistical Foundations of Data Analysis. Temple University.
Alexander, S. (1961). Price movements in speculative markets: trends or random walks. Industrial Management Review, May, 7-26.
Timmermann, A., & Granger, C. W. J. (2004). Efficient market hypothesis and forecasting. International Journal of Forecasting, 20, 15-27.
Bewick, V., Cheek, L., & Ball, J. (2004). Statistics review 13: receiver operating characteristic curves. Critical Care, 8, 508-512.
Black, F. (1988). An equilibrium model of the crash. NBER Macroeconomics Annual, 269-276.
Bradley, A. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms.
Pattern Recognition, 30(7), 1145-1159.
Breiman, L. (2001). Random Forests. Berkeley: University of California, Statistics Department.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software.
Cao, Q., Leggio, K. B., & Schniederjans, M. J. (2005). A comparison between Fama and French's model and artificial neural networks in predicting the Chinese stock market. Computers & Operations Research, 32, 2499-2512.
Caruana, R. (2004). Ensemble selection from libraries of models. International Conference on Machine Learning. Cornell.
Cooper, J. C. (1982). World stock markets: some random walk tests. Applied Economics, 14, 515-531.
Cox, N. J. (2003, September). Do-it-yourself R-squared. Retrieved from http://www.stata.com/support/faqs/statistics/r-squared/
De Bondt, W., & Thaler, R. (1985). Does the stock market overreact? Journal of Finance, 40, 793-805.
Elton, E. J., Gruber, M. J., Brown, S. J., & Goetzmann, W. N. (2011). Modern Portfolio Theory and Investment Analysis (8th ed.). Wiley.
Fama, E. (1965). The behavior of stock-market prices. Journal of Business.
Fama, E. (1970). Efficient capital markets: A review of theory and empirical work. Journal of Finance, 25, 383-417.
Fama, E., & Blume, M. (1966). Filter rules and stock market trading. Journal of Business, Security Prices: A Supplement, January.
Fama, E., & French, K. (1988). Permanent and temporary components of stock prices. Journal of Political Economy, 96, 246-273.
Gencay, R. (1997). Linear, non-linear, and essential exchange rate prediction with simple technical trading rules. Journal of International Economics, 47, 91-107.
Gencay, R. (1998). The predictability of security returns with simple technical trading rules. Journal of Empirical Finance, 5, 347-359.
Gibson, G. (1889). The Stock Exchanges of London, Paris and New York. New York: G. P. Putnam's Sons.
Hanley, J., & McNeil, B. (1982).
The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143, 29-36.
Huang, W., Nakamori, Y., & Wang, S.-Y. (2005). Forecasting stock market movement direction with support vector machine. Computers & Operations Research, 32, 2513-2522.
Ivanov, V., & Kilian, L. (2001). A Practitioner's Guide to Lag-Order Selection for Vector Autoregressions. London: Centre for Economic Policy Research, CEPR Discussion Paper No. 2685.
Jegadeesh, N., & Titman, S. (1993). Returns to buying winners and selling losers: implications for stock market efficiency. Journal of Finance, 48, 65-91.
Jensen, M. (1978). Some anomalous evidence regarding market efficiency. Journal of Financial Economics, 6, 95-101.
Kahneman, D., & Riepe, M. W. (1998). Aspects of investor psychology. Journal of Portfolio Management, 24(4), 52-65.
Kendall, M. G. (1953). The analysis of economic time series, part 1: prices. Journal of the Royal Statistical Society, 116(1), 11-34.
Lo, A. W., Mamaysky, H., & Wang, J. (2000). Foundations of technical analysis: computational algorithms, statistical inference, and empirical implementation. The Journal of Finance, 55(4), 1705-1765.
Lo, A., & MacKinlay, C. (2001). A Non-Random Walk Down Wall Street. Princeton: Princeton University Press.
Brealey, R. A., & Myers, S. C. (1996). Principles of Corporate Finance.
Provost, F., & Fawcett, T. (1997). Analysis and visualization of classifier performance: comparison under imprecise class and cost distributions. Third International Conference on Knowledge Discovery and Data Mining (pp. 43-48). Menlo Park, CA: AAAI Press.
Roberts, H. (1959). Stock market "patterns" and financial analysis: methodological suggestions. Journal of Finance, March.
Samuelson, P. (1965). Proof that properly anticipated prices fluctuate randomly. Industrial Management Review.
Seyhun, H. N. (1998). Investment Intelligence from Insider Trading. Cambridge: MIT Press.
Shiller, R. J. (2005). Irrational Exuberance.
Princeton, NJ: Princeton University Press.
Shleifer, A. (2000). Inefficient Markets. Oxford: Oxford University Press.
Wang, L., & Zhu, J. (2010). Financial market forecasting using a two-step kernel learning method for the support vector regression. Annals of Operations Research, 174, 103-120.
Working, H. (1934). A random difference series for use in the analysis of time series. Journal of the American Statistical Association, March.
Yahoo! Finance. (2013, February 11). NYSE Composite Index. Retrieved February 11, 2013, from http://finance.yahoo.com/q/hp?s=%5ENYA&a=00&b=12&c=1950&d=01&e=13&f=2013&g=w