MACHINE LEARNING IN
ALGORITHMIC TRADING SYSTEMS
OPPORTUNITIES AND PITFALLS
WHAT I’M GOING TO TALK ABOUT
• Extremely broad topic – will keep it high level
• Why and how you might use ML
• Common pitfalls – not ‘classic’ data science
• Some example applications and algorithms that I like
• I hope I whet your appetite for ML!
A LITTLE ABOUT ME
• Director of Algorithmic Trading at Honour Plus Capital
• Wholesale fund, diversified investment approach – fixed income, equities,
foreign exchange, property
• Mechanical engineer by training
• Besides algorithmic trading, I enjoy learning, travelling, rowing and, these days,
hanging out with my young family.
CODING LANGUAGES AND SOFTWARE
• I particularly like R and C, but all of what I present can be implemented in
Python, MATLAB etc
• For trading simulations, I like the Zorro platform: powerful, flexible, simple C syntax, R interface, designed specifically for back-testing accuracy
MACHINE LEARNING – WHAT’S THE FUSS?
WHAT IS IT AND WHY SHOULD YOU CARE?
• Algorithms that allow computers to find insights hidden in data
• As simple as linear regression, as complex as a deep neural network with
thousands of interconnected nodes
• ‘Mainstream’ by 2018 – big data, fast computers
• Rapidly evolving
• Find new sources of alpha. Maintain your market edge.
THE FASCINATION FACTOR
MACHINE LEARNING – PROS AND CONS
• Powerful, flexible, insightful
• Find complex, non-linear relationships that humans can’t observe directly
• Humans are good at seeing relationships between 2, 3, possibly 4 variables.
• Higher dimensionality requires abstract thinking that we are not really built
for.
• Examples
2D RELATIONSHIPS
X     Y
8     7.48
9     10.07
10    10.19
11    11.98
12    10.38
13    12.02
14    15.05
15    13.67
16    15.25
17    16.83
18    17.07
19    19.76
20    19.74
21    21.75
22    19.03
23    23.20
3D RELATIONSHIPS
X        Y        Z
-0.895   -0.895   1.342
-0.789   -0.789   1.274
-0.684   -0.684   1.212
-0.579   -0.579   1.155
-0.474   -0.474   1.107
-0.368   -0.368   1.066
-0.263   -0.263   1.034
-0.158   -0.158   1.012
-0.053   -0.053   1.001
 0.053    0.053   1.001
 0.158    0.158   1.012
 0.263    0.263   1.034
HIGH ORDER DIMENSIONALITY
• Can you visualise the relationships between 4, 5, 6 variables?
• How about 100? 1000?
MACHINE LEARNING – PROS AND CONS
• Powerful – easy to overfit
• Requires effort to understand and set up
• Lots of moving parts = lots can go wrong, particularly when you begin
processing data and executing trades in real time
TWO FLAVOURS – SUPERVISED AND
UNSUPERVISED LEARNING
• What is the difference?
• When would you use one or the other?
SUPERVISED LEARNING
• Predict some value or class from
the features
• For example, predict to which
group something belongs based
on the values of x1 and x2
• Examples: linear regression, logistic
regression, artificial neural networks,
support vector machines, decision trees
Image credit: Andrew Ng
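A minimal R sketch of the supervised idea above, on synthetic data (the features x1, x2 and the labels are made up purely for illustration):

    # Supervised learning sketch: predict a binary class from features x1, x2.
    set.seed(42)
    n  <- 200
    x1 <- rnorm(n); x2 <- rnorm(n)
    cl <- as.integer(x1 + x2 + rnorm(n, sd = 0.5) > 0)        # synthetic labels
    df <- data.frame(x1, x2, cl)
    fit  <- glm(cl ~ x1 + x2, data = df, family = binomial)   # logistic regression
    pred <- as.integer(predict(fit, type = "response") > 0.5)
    mean(pred == df$cl)                                       # in-sample accuracy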
UNSUPERVISED LEARNING
• Detect natural structure within the
data
• For example, divide the data into
its natural segments based on x1
and x2.
• Examples: k-means clustering, self-organizing maps
Image credit: Andrew Ng
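And the unsupervised counterpart: a k-means sketch on synthetic data containing two obvious segments (again, made up for illustration):

    # Unsupervised learning sketch: let k-means find the natural groupings.
    set.seed(42)
    x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),   # segment 1
               matrix(rnorm(100, mean = 3), ncol = 2))   # segment 2
    km <- kmeans(x, centers = 2, nstart = 25)
    table(km$cluster)   # sizes of the discovered segments
    km$centers          # their centres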
EXAMPLE APPLICATIONS
• ‘Blind’ data mining
• ‘Intelligent’ data mining
• Model-based
• These approaches attempt to predict price movement based on some known data
• Strategy insights – train a model using the returns of a trading system as the
dependent variable and any factor you think affects it as an independent variable.
Essentially the above process in reverse.
• Segregating data into ‘natural’ groupings
‘BLIND’ DATA MINING
• Not recommended
• Mine a data set for combinations of variables that have predictive utility
• Very easy to get lucky
• Requires additional work to statistically account for data-mining bias. See White
(2000). His approach was popularised by Aronson (2006).
• White’s approach lends itself to computerised data mining as it can be automated as
part of the workflow
• Compare with manually combining technical analysis rules
‘INTELLIGENT’ DATA MINING
• Engineer and select variables that are likely to have some relationship with
the target variable
• Tap into your domain knowledge and creativity
• Feature engineering takes time and effort
• Some algorithms that can assist – RFE, Boruta
• Also susceptible to data-mining bias
‘INTELLIGENT’ DATA MINING
• The Boruta feature selection algorithm measures the relative importance of your
variables by comparing them with random ‘shadow’ variables obtained by shuffling
the values of the actual variables.
• Easy to implement, intuitive, elegant
• Blue: random variables
• Green: variables that ranked
higher than the best random
variable
• Red: variables that ranked
lower than the best random
variable
Source: robotwealth.com
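A hedged sketch of how this looks with the Boruta R package; my_data and target are placeholders for your own feature set and target variable:

    library(Boruta)
    set.seed(42)
    res <- Boruta(target ~ ., data = my_data, maxRuns = 100)  # my_data is hypothetical
    print(res)                    # confirmed / tentative / rejected variables
    getSelectedAttributes(res)    # variables that beat the best shadow (random) feature
    plot(res)                     # an importance plot like the one on this slide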
Sharpe ratios of models constructed with various combinations of variables
selected by Boruta, trained with various algorithms.
Source: robotwealth.com
MODEL-BASED APPROACH
• Construct a mathematical representation of some market phenomenon
• Consider the ARMA model and its limitations (a sketch follows below)
• Artificial neural networks – “universal function approximators”
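For reference, an ARMA(p, q) model expresses the current return as a linear function of past returns and past shocks:

$r_t = c + \sum_{i=1}^{p} \phi_i r_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$

Its main limitation is exactly that linearity. A minimal sketch of fitting one in base R, where returns is a placeholder vector of (log) returns:

    fit <- arima(returns, order = c(1, 0, 1))  # ARMA(1,1): AR order 1, no differencing, MA order 1
    fit                                        # estimated coefficients
    predict(fit, n.ahead = 1)$pred             # one-step-ahead forecast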
MODEL-BASED APPROACH
• Some creative examples that leverage the power of deep neural networks
and the availability of high quality satellite imagery
• Oil tanks with floating lids for insight into oil supply and demand dynamics
• Car park usage as a predictor of retail sales
SEGREGATION
• Split data into natural groupings to gain insight into how it is structured
• Example: candlestick patterns that actually have a quantitative rationale
Source: robotwealth.com
PITFALLS
• “Classic” data science is difficult to apply to the markets. The Coursera ML courses
and the Kaggle data science competitions are a great place to start, but they nearly
always deal with data that is fundamentally different to ours.
• Specifically, our data is not IID. It contains complex autocorrelations and is
non-stationary. This makes conventional cross-validation techniques difficult to apply – more
on this shortly.
• Tendency to overfit
• Data mining and the tendency to be fooled by randomness
TENDENCY TO OVERFIT
• Deal with the easiest problem first
• Reduce the number of features – in practice, avoid the temptation to use more
than three or four
• Regularization is your friend
• So is cross-validation – but needs to be done right
REGULARIZATION
• A mathematical technique for reducing the complexity of a model
• For example, reduce the impact of terms in a linear regression:
$y = a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4$
• Regularization might set some coefficients to zero or to very low values
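As a concrete example, lasso regularization (one common scheme; the slide does not specify which) fits the coefficients by minimising a penalised loss:

$\min_{a} \sum_{i} \Big( y_i - \sum_{j} a_j x_{ij} \Big)^2 + \lambda \sum_{j} |a_j|$

The larger the penalty weight $\lambda$, the simpler the model; the L1 penalty can drive coefficients exactly to zero, while the ridge (L2) variant shrinks them towards zero.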
REGULARIZATION
• Getting the regularization parameter right → models that are neither under-fit nor over-fit
• Need a way of measuring and comparing your out-of-sample performance –
cross validation and a final validation test set
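A hedged sketch with the glmnet package, which does exactly this: fit a regularized regression and use cross-validation to pick $\lambda$. Here x (a numeric feature matrix) and y (the response) are placeholders:

    library(glmnet)
    set.seed(42)
    cvfit <- cv.glmnet(x, y, alpha = 1)   # alpha = 1: lasso; alpha = 0: ridge
    plot(cvfit)                           # CV error as a function of lambda
    coef(cvfit, s = "lambda.min")         # note the coefficients shrunk to zero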
CROSS VALIDATION
What?
• A method of measuring the predictive performance of a statistical model
• Several varieties – k-fold, leave-one-out, bootstrap
Why?
• High R-squared != good model … especially if the model is overfit
• Test the model on a set of data not used in model training
• One in-sample plus one OOS set not usually good enough
K-FOLD CROSS VALIDATION
• The original sample is randomly partitioned into k subsamples
• Train the model on k-1 subsamples
• Test it on the kth subsample (the one that was left out) and record the
performance
• Select a different subsample to leave out and repeat the process until all k
subsamples have been used as a test set
• Repeated k-fold CV: repeat the process with a different random partitioning
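A hedged sketch of (repeated) k-fold CV using the caret package mentioned later in this talk; df (a data frame with target y) and the model choice are placeholders:

    library(caret)
    set.seed(42)
    ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
    fit  <- train(y ~ ., data = df, method = "glmnet", trControl = ctrl)
    fit$results   # resampled performance for each candidate tuning parameter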
DON’T TREAT CV AS A PANACEA
• CV performance can be an estimate of out-of-sample performance, but in
practice it will be a biased estimate, particularly if we repeat it for different
variations of our model
• Consider tuning a model parameter (e.g. a regularization parameter) and
selecting the model with the best CV performance … that’s selection bias
• Unfortunately k-fold CV is not a great approach for financial data
THE PROBLEM OF NON-I.I.D. DATA
• What is independent and identically distributed (IID) data?
• Independent: one event gives no information about another event (the probability
of event 2 does not depend on the outcome of event 1)
• Identically distributed: the distribution generating the data does not change
• If x and y are drawn from the same distribution, they are identically distributed
EXAMPLE: COIN TOSS
The tosses are IID because:
1. The occurrence of one result tells us nothing about the probability of another
result. That is, the process has no memory.
2. The distribution from which the tosses are drawn never changes. The
probability of each result never changes.
Note that even a biased coin toss is IID (events don’t need to be equiprobable)
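A one-line check of the biased-coin point in R: the tosses are not equiprobable, yet they remain independent and identically distributed:

    set.seed(42)
    tosses <- rbinom(1000, size = 1, prob = 0.7)   # biased, but still IID
    mean(tosses[1:500]); mean(tosses[501:1000])    # same distribution throughout
    acf(tosses, lag.max = 5, plot = FALSE)         # autocorrelations ~ 0: no memory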
FINANCIAL DATA
• Highly non-stationary – the statistical moments vary with time and with the length of their
calculation windows
• Certain events may or may not affect the probability of other events
• Not a total disaster – it simply means that the assumptions of many common statistical
tests are violated
• It also means that the randomisation process used in regular CV is not appropriate
• Don’t use data from the future to build a model on the past!
TIME SERIES CROSS VALIDATION
• Use a rolling window
• Train the model on the window, test it on the adjacent data
• Shift the window into the future and repeat (the start of the window can be
anchored or moving)
• In R, easy to implement with the caret package
• In Zorro, easy to implement with the walk-forward framework
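A hedged sketch of the caret version, via its "timeslice" resampling method; df (ordered by time, with target y), the window lengths and the model are placeholders:

    library(caret)
    ctrl <- trainControl(method        = "timeslice",
                         initialWindow = 500,    # length of each training window
                         horizon       = 50,     # length of each test window
                         fixedWindow   = TRUE)   # TRUE = rolling start, FALSE = anchored start
    fit <- train(y ~ ., data = df, method = "glmnet", trControl = ctrl)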
TIME SERIES CROSS VALIDATION
In practice, the lengths of the optimization and out-of-sample periods may be
important factors in a model’s success, but a robust trading model will be
largely insensitive to these.
TIME SERIES CV IS ALSO NOT A PANACEA
• Even if your model passes a TS CV test, there is no guarantee that it won’t just
stop working if and when the underlying relationship being modelled changes.
• Particularly true for data-mining systems.
• At least with model-based systems, you have insight into when they are likely to
break down.
DIAGNOSTICS - LEARNING PLOTS
(GET YOUR MODEL UNDER CONTROL EFFICIENTLY)
High variance – over-fit model
High bias – under-fit model
Image credit: Andrew Ng
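A minimal base-R sketch of producing such a learning plot on synthetic data: training and validation error versus training-set size (everything here is made up for illustration):

    set.seed(42)
    n <- 500
    x <- rnorm(n); y <- sin(x) + rnorm(n, sd = 0.3)
    val   <- (n / 2 + 1):n                     # hold out the second half for validation
    sizes <- seq(20, n / 2, by = 20)
    errs  <- t(sapply(sizes, function(m) {
      fit <- lm(y ~ poly(x, 5), data = data.frame(x = x[1:m], y = y[1:m]))
      c(train = sqrt(mean((y[1:m] - fitted(fit))^2)),
        valid = sqrt(mean((y[val] - predict(fit, data.frame(x = x[val])))^2)))
    }))
    matplot(sizes, errs, type = "l", lty = 1, xlab = "training-set size", ylab = "RMSE")
    legend("topright", c("train", "validation"), col = 1:2, lty = 1)
    # A persistent gap between the curves indicates high variance (over-fitting);
    # two high, converged curves indicate high bias (under-fitting).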
ALL THE RAGE: DEEP NEURAL NETWORKS
• What is it? Broadly, multi-layer neural networks with potentially thousands of nodes
• Capable of making high-level abstractions in line with what we call ‘intuition’
• Computer vision (driverless cars), handwriting recognition, AlphaGo
• Made possible through advances in computational efficiency, both hardware and
software
• Stacked autoencoders – simplify feature selection?
USEFUL RESOURCES
• Max Kuhn, his caret package and his book Applied Predictive Modeling
• CRAN machine learning task view
• Winning the Kaggle Algorithmic Trading Competition – an interesting read
• David Aronson’s Statistically Sound Machine Learning
• www.robotwealth.com (of course) – lots of source code examples, R and Lite-C
• Hyndsight – the blog of Rob Hyndman, Monash University statistician: http://robjhyndman.com/