2013 UNDERGRADUATE AWARDS
Testing the Efficient
Market Hypothesis Using
Data Analytics
Demo User
5/20/2013
Predicting the movements of the stock market is an area of interest to a huge number of people not
least due to the profit opportunities that it would present. It would also allow governments to refine
their policies to be more effective and allow for safer hedging by firms and companies. In this paper I
test techniques used in data mining to see if they can predict the market with greater success than
the models that have historically been used. The aim here is to lay groundwork on which future
research into predicting the market using advanced techniques can be based. The weak form of the
efficient market hypothesis states that current returns cannot be predicted using past returns. I first
review the literature on the topic and examine the most recent methods which have been applied to
testing this hypothesis. I subsequently carry out a test using an autoregressive moving average
model, one of the models traditionally used in testing the efficient market hypothesis. I then
apply two data mining techniques, classification trees and random forests, to the data. The results
from these tests are then compared against traditionally used methods and their potential for
predicting the market is highlighted. This paper concludes with recommendations for extensions to
these tests.
Table of Contents
Introduction
Motivation
Literature Review
Empirical Approach
    Standard Models
    More Powerful Techniques
    Tests of the EMH using the advanced techniques
Description of the Data Set
Empirical Results
    Data Mining Methods
Summary of Results
Extensions
Conclusion
Bibliography
Introduction
The idea of efficient markets makes intuitive sense to us all, in that it must be difficult to get rich, and yet it is one of the most hotly contested propositions in all of the social sciences. Even though the efficient market hypothesis (EMH) is very simple to state, it has proven extremely resistant to empirical proof or refutation. Despite several decades of research and thousands of published studies, there is as yet no consensus about whether markets are in fact efficient. Most financial models have been built around the EMH, including the famous Black-Scholes-Merton model, and the theory is often applied in other areas of economics such as macroeconomic growth models.
The aim of this paper is to test the EMH using a traditional ARMA model and more recently
developed statistical models. The use of different methods enables me to test if the market can be
predicted not just by linear relationships but also by interactions between past returns.
A brief description of the EMH is provided below along with a description of the levels of the EMH. In
this paper I intend to test the weak form efficiency of the EMH.
Efficient Markets Hypothesis
The efficient market hypothesis can be defined as “Security prices accurately reflect available
information, and respond rapidly to new information as soon as it becomes available.” (Myers,
1996). This implies that it is near impossible to make returns that consistently outperform the
market, and if it happens then it is due more to luck than to any particular inference or analysis on the part of the investor.
Forms of Efficiency
Three forms of the efficient market hypothesis have been proposed and they are as follows:

- Weak Form Efficiency – This form argues that investors should not be able to make excess returns by observing only historical asset prices. Prices should respond only to new information or new economic events. Technical analysts or chartists believe this to be untrue and seek to use the trends and patterns of past prices to predict future prices.
- Semi-Strong Form Efficiency – This form argues that all publicly available information, such as company earnings and market conditions, should be reflected in asset prices, making it impossible to earn excess returns without having insider knowledge. It would be futile for investors to search for bargain opportunities from an analysis of published data if this form holds. Fundamentalists believe this to be untrue and trade securities based on the fundamental value of a company.
- Strong Form Efficiency – This form argues that it should be impossible to earn excess returns from both public and private information, thus ruling out insider trading. This form is unlikely to hold and as such there are laws against insider trading. There is some systematic evidence that insiders can earn abnormal returns even when they trade legally (Seyhun, 1998).
These three levels are not independent of one another: for the market to be efficient in the semi-strong sense it must also be efficient in the weak sense, and it must be efficient in both senses to be strong-form efficient.
Motivation
Of late there has been a loss of faith in the EMH as many studies have come out arguing against it. In
this paper I intend to test the market using not only the standard models which have been applied to
the theory in the past but also to make use of some of the more powerful statistical techniques that have been developed more recently. I feel it is possible that a reverse file-drawer bias is at work in relation to studies of market efficiency. This means that when a model which can predict the market
is created it would not be published as it would be far more profitable for the researcher to sell the
model to an investment bank rather than publish the paper. Even though there are many arguments
against the efficient market hypothesis there still exists no well-publicised way to make a risk free
profit from trading on the market.
In this paper I intend to test the weak form of the efficient market hypothesis using an ARMA model.
The data I will use is weekly price data from the NYSE Composite. This index covers all the common
stock that is listed on the New York Stock Exchange; over 2000 stocks are covered on the index. The
basis for these tests stems from the original tests of the EMH based upon random walk theory. The
original tests failed to reject the EMH, and although modern economic opinion leans towards markets not being fully efficient, and I am making use of modern data, it will still be difficult for these tests to reject the EMH.
While the ARMA methodology is a very useful technique in forecasting and has many applications it
struggles to model returns data. ARMA models are unable to take account of interactions in the data
and so a model that does take account of such interactions could provide stronger results.
Should the ARMA model fail to provide any new conclusions I propose to test the EMH using some of
the more powerful statistical techniques that have been developed in recent years. As computer
power has grown there has been an explosion in the number and power of statistical techniques.
Advances over the last two decades in the fields of data analysis and data mining have led to the
development of many techniques which could allow more accurate prediction of movements in the
stock markets. Indeed some of the techniques which will be applied are undoubtedly already in use
by the major investment banks. The techniques that I intend to apply are classification trees and
random forests.
Classification trees have been shown to be one of the least powerful of the methods and yet they
are the least complicated. Thus they will allow us to see clearly the rules generated which may
provide an insight into any patterns emerging in the data. Random forests are an example of an
ensemble method which is used regularly in the financial sector for prediction. These models, while using only past returns as input data, may provide better results than the standard models. Such techniques consistently outperform methods such as linear regression in both regression and classification tasks. The predictive power of these methods will be tested with
the conclusion that, subject to certain conditions, if they perform significantly better than chance we
can reject the null hypothesis of market efficiency.
Literature Review
The earliest known statement about the efficient market hypothesis comes from 1889.
“When shares become publicly known in an open market, the value which they acquire there may be
regarded as the judgment of the best intelligence concerning them.” (Gibson, 1889)
The origins of the modern day EMH stem from the work of Eugene F. Fama and Paul A. Samuelson in
the 1960’s. In 1965 Fama published his thesis which argued that share prices followed random walks
(Fama E. , The behavior of stock market prices, 1965) while Samuelson published a proof for a
version of the EMH (Samuelson, 1965).
In 1970 Fama defined an efficient market as one in which security prices always fully reflect the
available information (Fama E. , Efficient capital markets: A review of theory and empirical work,
1970). The implications of this statement are far-reaching, and in the 1960s and 1970s the EMH was a huge success as academics developed more powerful theoretical reasons why the theory should hold and
a large amount of supporting empirical findings emerged. The field of academic finance was created
on the basis of the EMH and its applications (Shleifer, 2000). In 1978 at the height of the popularity
of the EMH Jensen wrote ‘there is no other proposition in economics which has more solid empirical
evidence supporting it than the Efficient Markets Hypothesis’ (Jensen, 1978).
The basic theoretical case for the EMH relies on three key arguments (Shleifer, 2000). Firstly
investors are assumed to be rational and as such will value securities rationally. Secondly any
investors that are not rational will trade randomly and so their trades will cancel out. Third,
arbitrageurs will step in to take advantage of any situation where irrational investors do act in a
similar fashion. If these assumptions hold then effects from irrational and uninformed agents should
show up in the data as merely white noise and so asset prices would follow random walks.
Tests of the weak-form efficiency have their origins in random walk theory. This theory suggests that
the sequences of share price movements over time are consistent with a series of cumulative
random numbers rather than forming patterns. The random walk model assumes that successive
returns are independent and that returns are identically distributed over time (Elton, Gruber,
J.Brown, & Goetzmann, 2011). An early test of this type was carried out by Working who found that
the price changes of wheat follow a random walk (Working, 1934).
In his 1953 paper Kendall found that when price series are observed at relatively close intervals the
random change from one period to the next is large enough in magnitude to swamp any systematic
effect which may be present (Kendall, 1953). He also finds that there is empirical evidence and
theoretical support for the belief that aggregate index numbers behave more systematically than
their components. Roberts too finds that stock prices follow a pattern similar to a random walk and
yet theorizes that departures from this model will one day be discovered by economists (Roberts,
1959).
Alexander found that in speculative markets price changes appear to follow a random walk but a
move once initiated tends to persist (Alexander, 1961). Thus by applying certain filter techniques
abnormal profits could be made; however, the profits disappeared once transaction costs were taken into account. Similar findings were made by Fama and Blume in 1966 (Fama & Blume, Filter rules and
stock market trading, 1966). That certain techniques could provide abnormal profits leads to the
possibility that markets are not efficient even though the techniques would have to be streamlined
in order to retain profitability when transaction costs are included.
After the 1970’s the EMH was challenged on both theoretical and empirical grounds. De Bondt and
Thaler advanced the theory that stock prices overreact to explain their findings that low performing
stocks earn higher returns over the next few years than stocks that have previously had high returns
(De Bondt & Thaler, 1985). Jegadeesh and Titman find that movements in individual stock prices
over the period of six to twelve months tend to predict future movements in the same direction
(Jegadeesh & Titman, 1993). Fama and French found that returns were predictable across a three to
five year period however this effect seemed to disappear over time (Fama & French, 1988).
It was also shown by many authors that it is difficult to maintain the case that investors are fully
rational. Kahneman and Riepe in their 1998 paper summarize the deviations from the standard
decision making model by many individuals (Kahneman & Riepe, 1998). Evidence that the markets
are in fact inefficient is summarized and laid out clearly in the book by Andrei Shleifer (Shleifer,
2000). Shleifer presents a multitude of evidence to support this claim including the reasons why it is
still extremely difficult to profit from the inefficiencies and why arbitrageurs are unable to take
advantage of the obvious mispricings. The revelation that the market is inefficient has led to the
creation of the field of behavioral finance.
Perhaps the most obvious source of potential profit arising out of the field of behavioral finance is
the existence of positive feedback loops. Here noise traders in price bubbles react to past price
changes, as opposed to particular news (Shleifer, 2000). Examples of this include the rising prices of
internet stocks in 1998. Black found that investors' willingness to bear risk rises with their wealth,
which could lead to positive feedback trading (Black, 1988). Shiller in his book Irrational Exuberance
outlines how feedback bubbles could occur and provides examples of where such bubbles have
occurred in relation to the housing market (Shiller, 2005). Shleifer provides a model showing how
such feedback loops could occur in the market (Shleifer, 2000). It appears that what is necessary
here is to create a trading strategy that can profit from such bubbles.
Evidence that the financial market is predictable to some degree is provided in papers such as that by Lo et al., who used technical analysis and computational algorithms to predict the market (Lo, Mamaysky, & Wang, 2000). The existence of trends and patterns in the market was demonstrated by Lo and MacKinlay (Lo & MacKinlay, 2001). Gencay found that technical trading rules such as using a moving average model are more effective at predicting returns and exchange rates than a model that assumes they follow a random walk (Gencay, 1998; Gencay, 1997).
Over the last few years there have been several promising results produced by using nonlinear
classification and machine learning techniques for market prediction. Using support vector machines
Wang and Zhu demonstrated that the market is predictable to some degree (Wang & Zhu, 2010).
However, once trading costs are taken into account Wang and Zhu’s model is no longer profitable.
Cao et al. find that using neural networks to predict the market is significantly more accurate than
using either the capital asset pricing model or Fama and French’s three factor model in predicting
stock market returns (Cao, Leggio, & Schniederjans, 2005). They believe that there are investors out
there who are using neural networks to successfully exploit inefficiencies in the market but who are
better served by keeping the information to themselves rather than publishing their findings. Huang
et al. show that support vector machines can be used to forecast market returns more accurately
than other classification methods but suggest that in order to maximize the accuracy of forecasts
several methods should be combined in an ensemble or blended model (Huang, Nakamori, & Wang,
2005).
Empirical Approach
In this paper I will be testing the weak form efficiency of the EMH, the possibility that returns can be
predicted from past returns. I intend on running several types of models in order to test the
hypothesis that returns cannot be predicted from past returns. This can be stated as:
E(R_t | R_{t-1}, R_{t-2}, ...) = E(R_t)
where R_t is the return at time t. The returns of an asset price can be calculated following Cooper (1982). This gives the following formula:
R_t = log_e(P_t / P_{t-1})
where P_t is the price at time t. The price data will be converted into returns data in this way.
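As an illustration only, this conversion can be sketched in R as follows; the vector name `prices` and the sample values are hypothetical and stand in for the full weekly price series.

```r
# Convert a vector of weekly closing prices into log returns,
# R_t = log_e(P_t / P_{t-1}); one observation is lost in the process.
prices  <- c(528.69, 527.21, 530.33, 529.12, 531.80)  # hypothetical illustrative values
returns <- diff(log(prices))
returns
```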
Standard Models
The model which I test will be made up of two parts. The first of these will be an autoregressive (AR)
model; the functional form of the model is as follows:
R_t = α + φ_1 R_{t-1} + ... + φ_n R_{t-n} + ϵ_t
where α is the expected return, the φ_i are the coefficients on previous returns, ϵ_t represents the error term and n denotes the number of lagged terms to include; this model is denoted AR(n).
The second part of the model will be made up of a moving average model. This model predicts the
current return as a function of the mean value and previous returns. The model will have the
following functional form:
R_t = μ + ϵ_t − ψ_1 ϵ_{t-1} − ... − ψ_n ϵ_{t-n}
In this model μ denotes the mean value, ϵ_t denotes the error at time t, the ψ_i are the coefficients on the previous error terms and n is the order of the moving average model. This model is denoted MA(n).
The model I will test is an autoregressive moving average model. This model combines the elements
of both a moving average and autoregressive model and its functional form is as follows:
R_t = α + φ_1 R_{t-1} + ... + φ_p R_{t-p} + μ − ψ_1 ϵ_{t-1} − ... − ψ_q ϵ_{t-q} + ϵ_t
The notation is the same as in the previous models, with the order of the autoregressive part now p and the order of the moving average part q. The model is denoted ARMA(p,q).
Testing the EMH requires testing H0: φ_1 = ... = φ_p = ψ_1 = ... = ψ_q = 0 against the alternative H1 that at least one of the parameters is statistically significantly different from 0.
If the data are not stationary then a different model will have to be tried which involves differencing
the times series, this is done in order to remove a trend. This differencing is integrated into the
ARMA model creating autoregressive integrated moving average model, ARIMA. The functional form
of this model is as follows:
(1 − φ_1 B − φ_2 B^2 − ... − φ_p B^p)(1 − B)^d R_t = c + (1 − ψ_1 B − ψ_2 B^2 − ... − ψ_q B^q) ϵ_t
where B represents the backshift operator, defined such that B(R_t) = R_{t-1} and B^2(R_t) = R_{t-2}. Here the terms α and μ have been combined into the term c, and d represents the order of differencing in the model. The model is denoted ARIMA(p,d,q). The null and alternative hypotheses are the same as for the ARMA model.
When it comes to determining the significance of results the explanatory power of the models will
be important. For this we will make use of the R-squared measure. R-squared measures the amount
of variation in the data that is explained by the model. ARMA and ARIMA models are estimated using maximum likelihood, however, and so we will have to construct a pseudo R-squared value. This
will be done using the method laid out in the Stata online manual (Cox, 2003).
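The estimation in this paper was carried out in Stata. Purely as a sketch of the same idea, the R snippet below fits an ARMA(7,7) by maximum likelihood with arima() and computes a pseudo R-squared as the squared correlation between observed and fitted values, one of the do-it-yourself measures described in the Stata FAQ cited above; `returns` is the hypothetical weekly returns vector from the earlier sketch.

```r
# Fit an ARMA(7,7) to the returns by maximum likelihood.
# An ARMA(p,q) is an ARIMA(p,0,q), i.e. no differencing.
fit <- arima(returns, order = c(7, 0, 7))

# Pseudo R-squared: squared correlation between observed and fitted values.
fitted_vals <- returns - residuals(fit)
pseudo_r2   <- cor(returns, fitted_vals)^2
print(fit)
pseudo_r2
```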
More Powerful Techniques
In order to make use of classification trees and random forests I will create a data set with the
dependent variable being the returns in the current period and the independent variables being the
lagged values from the last 104 weeks, approximately two years. I will outline the techniques I
propose to use below.
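A minimal R sketch of this construction is given below, assuming `returns` is the full vector of weekly log returns; the column names t1 to t104 (t104 being the return from 104 weeks ago) are a hypothetical naming convention chosen to mirror the lag labels used later.

```r
# Build the supervised data set: current return as the response and the
# previous 104 weekly returns as predictors. embed() lays each row out as
# (R_t, R_{t-1}, ..., R_{t-104}), dropping rows without a full history.
n_lags <- 104
lagged <- as.data.frame(embed(returns, n_lags + 1))
colnames(lagged) <- c("R_t", paste0("t", 1:n_lags))  # t1 = one week ago, ..., t104 = 104 weeks ago
```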
Classification and Regression trees
Classification and regression trees of the CART methodology, which will be used here, were introduced by Breiman, Friedman, Olshen and Stone in 1984 (Breiman, Friedman, Olshen, & Stone, 1984). One of the strengths of trees is that they are far better than linear regression at picking up non-linear relationships and interactions. The algorithm used is as follows:
1. At the base node measure the impurity of the node using a criterion such as the Information
or Gini criterion.
2. Choose the variable and value that leads to the largest fall in the impurity measure when the
data is split using it.
3. Split the data into two nodes using this variable and value.
4. Repeat steps 2 & 3 for each node until no further gains in accuracy are seen.
5. Prune the tree back using the complexity parameter value.
6. The terminal nodes are then classified as whatever class makes up the highest proportion of
the cases at the node.
Cases are then run down the tree and classified according to which terminal node they end up in.
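As a sketch of this procedure (not the author's exact code), the rpart package implements CART-style trees in R. The example below assumes the hypothetical `lagged` data frame from the earlier sketch and a binary class label defined relative to the median return, as described in the testing section further on.

```r
library(rpart)

# Binary outcome: "H" for returns above the median, "L" otherwise.
lagged$class <- factor(ifelse(lagged$R_t > median(lagged$R_t), "H", "L"))
tree_data    <- lagged[, c("class", paste0("t", 1:104))]

# Steps 1-4: grow the tree, splitting on the variable/value that most
# reduces Gini impurity at each node.
tree <- rpart(class ~ ., data = tree_data, method = "class",
              parms = list(split = "gini"),
              control = rpart.control(cp = 0.001))

# Step 5: prune back using the complexity parameter with the lowest
# cross-validated error.
best_cp <- tree$cptable[which.min(tree$cptable[, "xerror"]), "CP"]
pruned  <- prune(tree, cp = best_cp)

# Step 6 and after: run cases down the tree to classify them.
pred_class <- predict(pruned, newdata = tree_data, type = "class")
```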
Random Forests
Random forests are an ensemble method created by Leo Breiman in 2001 (Breiman, Random
Forests, 2001). Random forests make use of the decision tree methodology. Random forests involve
the construction of a large number of trees and then combining the results in order to make
predictions. The unique feature of a random forest is that there are two kinds of randomness built in. Instead of building each tree on the whole data set, a sample of the data set is taken with replacement (the bootstrap method) to provide the data on which the tree is built. In addition, at each node
only a subsection of the variables is available to be used to split the data. These variables are
selected randomly. This additional randomness has been shown to increase the accuracy of the
predictions and allows the models to be combined in an ensemble without overfitting the data. The pseudocode for the algorithm, as laid out by Montillo, is:
Let Ntrees be the number of trees to build and mtry the number of variables available to split the data at any node. For each of the Ntrees iterations:
1. Select a new bootstrap sample from the training set.
2. Grow an un-pruned tree on this bootstrap sample.
3. At each internal node, randomly select mtry predictors and determine the best split using only these predictors.
4. Do not perform cost-complexity pruning; save the tree as is, alongside those built thus far.
The overall prediction is output as the average response (regression) or majority vote (classification) from all individually trained trees (Montillo, 2009).
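A sketch with the randomForest package follows, using the hypothetical `lagged` data frame and `class` label from the previous sketch; the settings shown (2,000 trees, 20 variables tried at each split, terminal node size of one) mirror the forest described in the results section, but this is illustrative code rather than the author's own.

```r
library(randomForest)

set.seed(1)  # bootstrap sampling and variable selection are random
rf <- randomForest(x = lagged[, paste0("t", 1:104)],  # the 104 lagged returns
                   y = lagged$class,
                   ntree    = 2000,  # Ntrees: number of bootstrap trees grown
                   mtry     = 20,    # predictors randomly tried at each split
                   nodesize = 1)     # grow trees out fully; no pruning is performed
rf  # printing shows the out-of-bag confusion matrix and error rate
```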
Tests of the EMH using the advanced techniques
In order to test the EMH using these advanced techniques the data will be randomly split into
training and test sets using a 75/25 split. The models will then be constructed on the training set and
applied to the test set. The dependent variable will be converted to a binary variable to allow for the
following tests. The data will be categorized into two classes: returns above the median value, H, and returns below the median value, L. The median is used because, had the mean been chosen as the cut-off, a trivial model would automatically achieve an accuracy greater than 50%: since the median value is higher than the mean, assigning all returns to the high class would be correct more than half the time.
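A sketch of this setup in R, continuing with the hypothetical objects from the earlier sketches; the split proportion and median-based labels follow the description above, while the random seed is arbitrary.

```r
set.seed(42)
n        <- nrow(lagged)
train_id <- sample(seq_len(n), size = floor(0.75 * n))  # 75% for training
train    <- lagged[train_id, ]
test     <- lagged[-train_id, ]

# Fit the tree on the training data and score the held-out test data.
library(rpart)
tree_fit  <- rpart(class ~ ., data = train[, c("class", paste0("t", 1:104))],
                   method = "class", parms = list(split = "gini"))
test_prob <- predict(tree_fit, newdata = test, type = "prob")[, "H"]  # P(high return)
```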
The classification tree will be built on the training data using the previously outlined algorithm. The
tree will then be applied to the test set data. The hypotheses are as follows:
H0: The accuracy of the model is equal to 0.5
H1: The accuracy of the model is not equal to 0.5
Or
H0: The model has no predictive power/ is no better than picking the results by chance
H1: The model has predictive power/is better than picking the results by chance
Thus if the model is able to predict above-median returns at a rate that is statistically significantly better than chance, we can reject the null hypothesis and by extension the weak form of the EMH,
which states that past returns should provide no information in relation to current returns.
The random forest will follow the exact same procedure with the model constructed on the training
data set and applied to the test data set. The hypotheses and subsequent conclusions remain
unchanged for the random forest.
The statistical significance of these models will be tested using the receiver operating characteristic
(ROC) curve. This graphs the relationship between sensitivity and specificity for the model. The
sensitivity of a returns test is the proportion of returns for which the outcome is high that are correctly identified by the test. The specificity is the proportion of returns for which the outcome is
low that are correctly identified by the test. A test with a high sensitivity may have low specificity
and vice versa. Usually a tradeoff between the two will have to be made. The ROC curve is used
because simple classification accuracy has been shown to be a poor metric for performance (Provost
& Fawcett, 1997). In order to test the significance of the ROC curve confidence intervals will be
constructed. These will be constructed using the bootstrap method which involves taking a random
sample with replacement the same size as the test set from the test set and running it through the
model.
The area under the curve (AUC) gives a measure of the predictive power of the model and is explained
clearly by Bradley in his 1997 paper (1997). The AUC of a classifier is equivalent to the probability
that the classifier will rank a randomly chosen positive instance higher than a randomly chosen
negative instance which is equivalent to the Wilcoxon test of ranks (Hanley & McNeil, 1982). An ideal
model would have an AUC of 1 or 100%. If the null hypothesis holds we would expect the AUC of the
models to be equal to 0.5 or 50% which would be equivalent to just picking the results by chance
(Bewick, Cheek, & Ball, 2004).
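As an illustration, the pROC package provides the ROC curve, the AUC and bootstrap confidence intervals described here; `test$class` and `test_prob` are the hypothetical test labels and predicted probabilities from the previous sketch.

```r
library(pROC)

# ROC curve for the predicted probability of a high return.
roc_obj <- roc(response = test$class, predictor = test_prob,
               levels = c("L", "H"), direction = "<")
plot(roc_obj)
auc(roc_obj)

# 95% bootstrap confidence interval for the AUC (2,000 resamples).
ci.auc(roc_obj, conf.level = 0.95, method = "bootstrap", boot.n = 2000)
```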
Description of the Data Set
The data set contains information on the price of the NYSE Composite over a period of time taken
from the Yahoo! Finance website (Yahoo! Finance, 2013). The data set stretches from 31/12/1965 to
11/02/2013 and there are 2459 observations. There is no missing data in the data set. As the data
were in price form they were converted into returns with the loss of one observation leaving 2458
observations. Table 1 summarises some of the key characteristics of the data set.
Statistic         Value
Mean              0.00115119286102522
Standard Error    0.000451086760493625
Median            0.003091893
Minimum           -0.217345349
Maximum           0.130591436

Table 1 - Data Summary Statistics
The fact that the mean is far lower than the median indicates that losses, when they do occur, tend to be more extreme than gains. This can also be seen in the minimum and maximum values, where the minimum is much further away from the mean than the maximum is.
The series was tested for a unit root using the Dickey-Fuller test and was found to be stationary. The
null hypothesis here is that there is a unit root; a test statistic of -49.620, far below the 1% critical value of -3.430, together with a p-value of 0.000, allows us to reject this hypothesis.
Dickey-Fuller test for unit root                         Number of obs = 2353

                Test          Interpolated Dickey-Fuller critical values
                Statistic     1%           5%           10%
Z(t)            -49.620       -3.430       -2.860       -2.570

MacKinnon approximate p-value for Z(t) = 0.0000
Table 2 – Dickey-Fuller Test for Unit Root
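The test above was produced in Stata. For illustration, an augmented Dickey-Fuller test can be run in R with the tseries package on the hypothetical `returns` vector; note that adf.test() includes lagged difference terms and a trend by default, so its output will not exactly reproduce the table above.

```r
library(tseries)

# H0: the returns series contains a unit root (is non-stationary).
adf.test(returns)
```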
The kernel density is displayed on plot 1. From this plot we can see that returns are very similar to a
normal distribution yet have a higher peak at the mean value.
[Plot 1 – Kernel density of returns (Epanechnikov kernel, bandwidth = 0.0035), with a normal density overlaid for comparison]
A plot of the observations can be seen in plot 2 on the following page. With so many observations it
would be difficult to spot small trends even if they existed and yet no major trends or runs are
apparent from the plot. The data is clearly very noisy and the lack of any linear trend suggests that
we should not have to difference the data i.e. the data is stationary in mean.
Along with the plot the autocorrelation function (ACF) plot and the partial autocorrelation function
(PACF) plot are also displayed. It can be clearly seen from these plots that the lags at several of the
variables are significant. The autocorrelation function measures the similarity of observations as a
function of the time separation between them. The partial autocorrelation function is an extension
of the autocorrelation function where the dependence on the intermediate elements is removed.
The statistically significant lags on the ACF plot, those that break through the confidence bounds, may indicate the existence of a moving average process at those lags, while the significant PACF spikes indicate lags where an autoregressive term may exist. The significant lags on the ACF are 6, 15, 27, 60, 83 and 88. The
significant lags for the PACF are 6, 15, 60, 83, 89, 91 and 99. Particular attention will be paid to
these lags when constructing the models. It is difficult at this stage to determine whether these lags
represent an autoregressive or moving average process.
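For reference, the diagnostics described above (a kernel density with a normal overlay, plus ACF and PACF plots out to roughly two years of weekly lags) can be sketched in base R as follows, again using the hypothetical `returns` vector.

```r
# Kernel density of returns with a fitted normal density for comparison.
plot(density(returns, kernel = "epanechnikov"),
     main = "Kernel density of returns")
curve(dnorm(x, mean = mean(returns), sd = sd(returns)), add = TRUE, lty = 2)

# Autocorrelation and partial autocorrelation out to 104 weekly lags.
acf(returns, lag.max = 104)
pacf(returns, lag.max = 104)
```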
Plot 2 – Time Series Plot of the Data with ACF and PACF Plots
Empirical Results
In this section I will present the empirical results that emerged from examining the data. The
interpretation and implications of the results will be presented in the next section. The results in this
section were computed using the statistical software Microsoft Excel, Stata and R.
ARMA(7,7)
In order to determine the best model to use and avoid overfitting, a test of the significance of various lags was carried out. The lag orders are compared over three criteria, namely Akaike's information criterion (AIC), Hannan and Quinn's information criterion (HQIC) and Schwarz's Bayesian information criterion (SBIC). The results are provided below, with * indicating the lag order selected by each criterion. From the table we can see that both the HQIC and SBIC select a lag order of 0 while the AIC selects a lag order of 7. As the AIC is better at handling monthly and weekly data we will use this criterion
(Ivanov & Kilian, 2001).
Selection-order criteria
Sample: 1960w12 - 2005w15                                Number of obs = 2344

lag   LL        LR        df   p       FPE        AIC         HQIC        SBIC
0     5568.89                          .000506    -4.75076    -4.74986*   -4.7483*
1     5569.56   1.3363    1    0.248   .000506    -4.75047    -4.74868    -4.74556
2     5570.57   2.0236    1    0.155   .000506    -4.75048    -4.7478     -4.74311
3     5570.65   .16142    1    0.688   .000507    -4.7497     -4.74612    -4.73987
4     5571.35   1.4078    1    0.235   .000507    -4.74945    -4.74497    -4.73716
5     5571.43   .1662     1    0.684   .000507    -4.74866    -4.74329    -4.73392
6     5577.42   11.977*   1    0.001   .000505    -4.75292    -4.74666    -4.73572
7     5578.9    2.9468    1    0.086   .000505*   -4.75333*   -4.74617    -4.73367
8     5579.12   .45216    1    0.501   .000505    -4.75266    -4.74461    -4.73055
9     5579.32   .39764    1    0.528   .000506    -4.75198    -4.74303    -4.72741
10    5579.58   .51108    1    0.475   .000506    -4.75135    -4.7415     -4.72432

Table 3 – Selection Order Criteria
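The table above comes from Stata's lag-order selection routine. As a rough R analogue (an illustration, not a reproduction of the table), one can compare an information criterion across candidate AR orders fitted by maximum likelihood:

```r
# Compare AIC across AR(p) fits for p = 0, ..., 10.
aics <- sapply(0:10, function(p) AIC(arima(returns, order = c(p, 0, 0))))
names(aics) <- paste0("AR(", 0:10, ")")
aics
(0:10)[which.min(aics)]  # lag order preferred by the AIC
```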
The model was run and tested positive for heteroscedasticity. The model was then tested for autoregressive conditional heteroscedasticity (ARCH), which also came back positive. As ARCH is present, the model is estimated with ARCH terms included in the regression.
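The ARCH-augmented estimation in this paper was carried out in Stata. As an illustrative R analogue (an assumption about tooling, not the author's code), the fGarch package can estimate an ARMA(7,7) mean equation with ARCH(7) errors, an ARCH model being a GARCH model of order (q, 0):

```r
library(fGarch)

# ARMA(7,7) mean equation with ARCH(7) errors (GARCH order 0); 'returns'
# is the hypothetical weekly returns vector from the earlier sketches.
arch_fit <- garchFit(formula = ~ arma(7, 7) + garch(7, 0),
                     data = returns, trace = FALSE)
summary(arch_fit)
```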
The significant variables in this model are the lags 2, 5, 6 and 7 for the autoregressive process, lags 2,
5 and 7 for the moving average and lags 1, 2, 3, 4, 6 and 7 for the ARCH process. As such we are able
to reject the null hypothesis of no significant variables. The significant variables are summarised in
the table below.
Variable       Coefficient (std. error)
R_{t-2}        0.6237697*** (0.0586303)
R_{t-5}        -0.4644677*** (0.0832076)
R_{t-6}        0.1173529* (0.0634434)
R_{t-7}        0.8371188*** (0.0550205)
ϵ_{t-2}        -0.6121567*** (0.0593238)
ϵ_{t-5}        0.4464211*** (0.0806975)
ϵ_{t-7}        -0.854614*** (0.0541573)
ARCH_{t-1}     0.2134317*** (0.0218642)
ARCH_{t-2}     0.086734*** (0.027355)
ARCH_{t-3}     0.1471664*** (0.0280176)
ARCH_{t-4}     0.1003543*** (0.0219859)
ARCH_{t-6}     0.0746435*** (0.0211112)
ARCH_{t-7}     0.0337553* (0.0175689)

Table 4 – ARMA Output. Standard errors are reported in parentheses and p-values are denoted as follows: ***p<0.01, **p<0.05, *p<0.1
Again the statistically significant values of the coefficients here indicate that the past values of
returns have some predictive power when it comes to forecasting future returns. The summary
statistics for this model are provided in the table below.
Statistic           Value
Log likelihood      5863.085
Wald Chi-Square     22374.53***
Pseudo R-squared    0.01403622
Durbin-Watson       2.0000374

Table 5 – ARMA Summary Statistics
The Wald Chi-Square statistic is highly significant, with a p-value of 0.000. This statistic tests the null hypothesis that all of the predictors' regression coefficients are equal to zero, so we can conclude that at least some of the variables in the model are significant.
The pseudo R-squared value for this model is 0.01403622 which indicates that this model explains
1.4% of the variation in the data.
The Durbin-Watson statistic for this model is 2. This fails to reject the null hypothesis of no autocorrelation, and so it can be inferred that there is no serial correlation present in the residuals.
Data Mining Methods
Here the classification tree and random forest methods will be tested on the data set.
Classification Tree
The classification tree was built using the Gini criterion as this operates more effectively in noisy
domains (Berry & Linoff, 2004). The classification tree is displayed below. The nodes that contain
predominantly high returns are denoted H while those with predominately low returns are denoted
L.
Plot 3 – Classification Tree
What is interesting about this tree is that the variables it uses to split upon are not recent variables.
In fact the first variable split upon is the lagged value from 103 weeks ago, almost two years. The
variable used that is closest to the present is t32, the lagged value from 32 weeks ago, roughly seven months. This indicates that the patterns which allow predictability take place long
before the pattern is realised. In addition none of the variables here appeared as significant on the
ACF or PACF plots. This showcases how decision trees make use of interactions as opposed to linear
effects.
The misclassification table produced by the classification tree with a cut-off value of 0.5 is displayed below. The returns are classified as high or low according to whether they are above or below the median return value.
Classification Tree    Predicted High    Predicted Low
Actual High            175               102
Actual Low             178               133

Table 6 – Classification Tree Misclassification Table
Thus this classification tree classifies high returns with an accuracy of 63% and low returns with an
accuracy of 42%. Overall the accuracy of the tree is 52%. Thus it would seem the tree has a small
amount of predictive power. In order to determine if the predictive power of the tree is statistically
significant further tests are required. Below is the receiver operating characteristic (ROC) curve for
this classification tree. The 95% confidence interval was constructed using the bootstrap method
with 2,000 repetitions.
Plot 4 – Classification Tree ROC curve
As can be seen, the confidence interval for the AUC spans 49.8% to 59% and thus the AUC is not statistically significantly different from 50% at the 95% confidence level. As such we fail to reject the null hypothesis that the model has no predictive power, i.e. that it is no better than chance.
Random Forest
The random forest contains 2,000 trees, with a minimum terminal node size of one, and twenty variables were tried at each split. The misclassification table using a cut-off value of 0.499 for the random
forest is displayed below.
Random Forest    Predicted High    Predicted Low
Actual High      171               106
Actual Low       180               131

Table 7 – Random Forest Misclassification Table
Thus this model predicts high returns with an accuracy of 61.7% and low returns with an accuracy of
42%. The overall accuracy of the model is 51.36%; again this is only slightly better than chance. The
ROC curve of the random forest is displayed below. The 95% confidence intervals were calculated
using the same bootstrap technique as was used for the classification tree.
Plot 5 – Random Forest ROC curve
From the ROC curve we can see that the estimated AUC is 54.8% with the 95% confidence bounds of
50.1% and 59.5%. As this confidence interval lies entirely above 50%, we can conclude with 95% confidence that this model has predictive power. Thus we are able to reject the null hypothesis of no
predictive power.
Summary of Results
In the ARMA model there were statistically significant variables which let us reject the null
hypothesis. Thus past returns were found to have an effect on the current market price. This is a
violation of the weak form of the EMH. However the model has very little actual predictive power
and so the gains from exploiting this inefficiency may be non-existent when transaction costs are
taken into account. The ARMA (7,7) model had a pseudo R-squared value of 0.01403622 which
implies that it still explains less than 1.5% of the variation in returns. Thus any trading strategy based
upon this model would have an extremely high level of risk associated with it due to its poor
explanatory power.
The classification tree failed to reject the null hypothesis that the model has no predictive power. As such, for this test the result is that markets are efficient. This result was expected, as single classification trees generally have very low predictive power.
In relation to the random forest, the null hypothesis was rejected and we can accept that the model has some predictive power. However, even though the confidence interval was above 50% it
was only marginally so, by 0.1% in fact, which is very close to a model based on chance. There are no
R-squared measures for tree based methods. That the model was only slightly better than chance
leaves open the possibility that any potential profits arising from the implementation of a trading
strategy based on such a model could be wiped out by transaction costs.
It is difficult to know what conclusions to draw from the results here. Elton et al. (2011) suggest three
possible explanations of what could be the true state of affairs:
1. With so many researchers examining the same data set, as is definitely the case for the NYSE
Composite data used here, patterns will be found and these patterns are simply random.
2. The patterns could be caused by the market structure and order flow.
3. The markets are inefficient.
The results found here are consistent with the majority of the literature in that because of
transaction costs it is more than likely that the return differences are not large enough to allow
development of a trading model to exploit the patterns in the returns.
Extensions
In order to determine whether it would be possible to profit from the market inefficiencies described above, data on transaction costs would need to be available and a trading strategy would have to be developed based on the model. Only if this model is able to make returns above the market average, allowing for transaction costs, could we consider it proof of a violation of the EMH. Timmermann and Granger argue that as transaction costs vary over time a real-time set of transaction costs is needed to fully test the EMH, and this makes disproving the theory far more difficult
(Timmermann & Granger, 2004).
As seen in the examination of the classification tree different variables have linear and interaction
effects on returns. As ARMA models are good at capturing linear effects and tree based methods are
good at capturing interactions a model that combines these two methods could provide a
substantially better prediction than either of the two models can alone.
I believe that future tests of the EMH will involve the use of similar advanced statistical models to
the ones tested here. In particular the field of ensemble models which allow the combination of
several thousand models of different types to allow for more accurate results holds substantial
promise. By making use of model selection algorithms such as the one outlined by Caruana, a far greater level of accuracy can be achieved than is possible using single models alone (Caruana, 2004). These ensemble methods can combine different types of models, such as random forests, neural nets and support vector machines, to create a more accurate prediction. The downside is that such models require significant computing power to build and would need to be constantly updated, making them unusable for those without access to that computing power.
If the feedback loop amplification theory, outlined by Shiller in his book Irrational Exuberance
(Shiller, 2005) and by Shleifer in his book Inefficient Markets (Shleifer, 2000), holds it may be
possible to find more significant patterns using smaller periods. While previously information took
time to travel from person to person the advent of the smartphone age has led to many being
constantly updated on developments around the world. As such the feedback loops described by
Shiller would have a far greater chance of being picked up by a random forest if the data was made
up of smaller time periods such as seconds, minutes or hours. The weekly periods used in this study would be far too coarse to pick up such patterns, as, I suspect, would even daily data. The
emphasis that random forests put on interactions between variables would make them very well
suited to detect the sort of feedback amplification described by Shiller.
Conclusion
In this paper I have examined the background literature on the EMH, the testing of it, its development
into the field of behavioural finance and the application of modern statistical techniques to the
testing of it. The empirical approach was then laid out which included brief descriptions of the
statistical techniques which some readers may be unfamiliar with. An exploration of the dataset was
then completed followed by the empirical results.
The findings here are similar to those found in many papers regarding the EMH: some inefficiency can be found, yet it is relatively small and there would be little to no profit opportunity in exploiting such inefficiencies once transaction costs are taken into account. This says nothing of the risks one would be taking in following a model that explains only 1.5% of the variation in the market.
A bright spot in the findings here was the statistically significant predictive power of the random
forest model. This model performed statistically significantly better than chance and models of this
sort hold particular promise when it comes to future tests of the EMH.
In order to accurately test the weak-form efficiency of the market it is necessary to have real time
data on transaction costs and that data is currently not available. A key component of future studies
will be to include such data. My recommendations for extensions also include making greater use of
combination models and ensemble methods and using data with different time periods.
There has been much work done in relation to the efficient market hypothesis and much work
remains to be done. As the field of behavioural finance develops there will be more opportunities for
economists to structure their models to identify inefficiency in the market. I believe that the
continued advances in statistical techniques and computer power will provide economists with the
tools to find more breakdowns in the EMH at all strength levels. Whether those findings will be
published or sold to an investment bank is another matter entirely.
Bibliography
Berry, M. J. A., & Linoff, G. S. (2004). Data Mining Techniques for Marketing, Sales and Customer
Relationship Management. Indianapolis: Wiley Publishing Inc.
Montillo, A. A. (2009, February 4). Guest lecture: Statistical foundations of data analysis. Temple
University.
Alexander, S. (1961). Price movements in speculative markets: trends or random walks. Industrial
management review, May, pp. 7-26.
Timmermann, A., & Granger, C. W. J. (2004). Efficient market hypothesis and forecasting. International
Journal of Forecasting, 20, 15-27.
Bewick, V., Cheek, L., & Ball, J. (2004). Statistics review 13: receiver operating characteristic curves.
Crit Care, 8: 508–512.
Black, F. (1988). An equilibrium model of the crash. NBER Macroeconomics Annual, 269-76.
Bradley, A. (1997). The use of the area under the ROC curve in the evaluation of machine learning
algorithms. Pattern Recognition , 30 (7) 1145–1159.
Breiman, L. (2001). Random Forests. University of California, Berkeley: Statistics Department.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees.
Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software.
Cao, Q., Leggio, K. B., & Schniederjans, M. J. (2005). A comparison between Fama and French’s
model and artificial neural networks in predicting the Chinese stock market. Computers &
Operations Research, 32: 2499 – 2512.
Caruana, R. (2004). Ensemble Selection from Libraries of Models. International Conference on
Machine Learning. Cornell.
Cooper, J. C. (1982). World stock markets: some random walk tests. Applied Economics, 14, 515-531.
Cox, N. J. (2003, September). Do it yourself R-squared. Retrieved from www.stata.com:
http://www.stata.com/support/faqs/statistics/r-squared/
De Bondt, W., & Thaler, R. (1985). Does the Stock Market overreact? Journal of Finance, 40:793-805.
Elton, E. J., Gruber, M. J., J.Brown, S., & Goetzmann, W. N. (2011). Modern Portfolio Theory and
Investment Analysis 8th edition. Wiley.
Fama, E. (1965). The behavior of stock market prices. Journal of Business.
Fama, E. (1970). Efficient capital markets: A review of theory and empirical work. Journal of Finance,
25: 383-417.
Fama, E., & Blume, M. (1966). Filter rules and stock market trading. Journal of Business, Security
Prices: A Supplement, January.
Fama, E., & French, K. (1988). Permanent and temporary components of stock prices. Journal of
Political Economy, 96:246-73.
Gencay, R. (1997). Linear, non-linear, and essential exchange rate prediction with simple technical
trading rules. Journal of International Economics, 47:91–107.
Gencay, R. (1998). The predictability of security returns with simple technical trading rules. Journal
of Empirical Finance, 5:347–59.
Gibson, G. (1889). The Stock Exchanges of London, Paris and New York. New York: G. P. Putnam's
Sons.
Hanley, J., & McNeil, B. (1982). The meaning and use of the area under a receiver operating
characteristic (ROC) curve. Radiology, 143: 29-36.
Huang, W., Nakamori, Y., & Wang, S.-Y. (2005). Forecasting stock market movement direction with
support vector machine. Computers & Operations Research, 32: 2513 – 2522.
Ivanov, V., & Kilian, L. (2001). A Practitioner's Guide to Lag-Order Selection for Vector
Autoregressions. London, Centre for Economic Policy: CEPR Discussion Paper no. 2685.
Jegadeesh, N., & Titman, S. (1993). Returns to buying winners and selling losers: implications for
stock market efficiency. Journal of Finance, 48: 65-91.
Jensen, M. (1978). Some anomalous evidence regarding market efficiency. Journal of Financial
Economics, 6: 95-101.
Kahneman, D., & Riepe, M. W. (Summer 1998). Aspects of investor psychology. Journal of Portfolio
Management, Vol. 24 Issue 4, p52-65.
Kendall, R. (1953). The analysis of economic time series, part 1: prices. Journal of the Royal Statistical
Society, Vol 116, No.1, pp 11-34.
Lo, A. W., Mamaysky, H., & Wang, J. (2000). Foundations of Technical Analysis: Computational
Algorithms, Statistical Inference, and Empirical Implementation. The Journal of Finance,
55(4), 1705-1765.
Lo, A., & MacKinlay, C. (2001). A non-random walk down wall street. Princeton: Princeton University
Press.
Myers, R. B. (1996). Principles of Corporate Finance.
Provost, F., & Fawcett, T. (1997). Analysis and visualization of classifier performance: Comparison
under imprecise class and cost distributions. Third Internat. Conf. on Knowledge Discovery
and Data Mining (pp. 43–48). Menlo Park, CA: AAAI Press.
Roberts, H. (1959). Stock market "patterns" and financial analysis: methodological suggestions. Journal
of Finance, March.
Samuelson, P. (1965). Proof that properly anticipated prices fluctuate randomly. Industrial Management Review.
Seyhun, H. N. (1998). Investment Intelligence From Insider Trading. Cambridge: MIT Press.
Shiller, R. J. (2005). Irrational Exuberance. New Jersey: Princeton University Press.
Shleifer, A. (2000). Inefficient Markets. Oxford: Oxford University Press.
Wang, L., & Zhu, J. (2010). Financial market forecasting using a two-step kernel learning method for
the support vector regression. Ann Oper Res, 174: 103–120.
Working, H. (1934). A random difference series for use in the analysis of time series. Journal of the
American Statistical Association, March.
Yahoo! Finance. (2013, 02 11). Yahoo! Finance : NYSE Composite Index. Retrieved 02 11, 2013, from
Yahoo! Finance:
http://finance.yahoo.com/q/hp?s=%5ENYA&a=00&b=12&c=1950&d=01&e=13&f=2013&g=
w