MACHINE LEARNING IN ALGORITHMIC TRADING SYSTEMS – OPPORTUNITIES AND PITFALLS

WHAT I'M GOING TO TALK ABOUT
• Extremely broad topic – will keep it high level
• Why and how you might use ML
• Common pitfalls – not 'classic' data science
• Some example applications and algorithms that I like
• I hope I whet your appetite for ML!

A LITTLE ABOUT ME
• Director of Algorithmic Trading at Honour Plus Capital
• Wholesale fund, diversified investment approach – fixed income, equities, foreign exchange, property
• Mechanical engineer by training
• Other than algorithmic trading, I also like learning, travelling, rowing and, these days, hanging out with my young family

CODING LANGUAGES AND SOFTWARE
• I particularly like R and C, but all of what I present can be implemented in Python, MATLAB etc.
• For trading simulations, I like the Zorro platform: powerful, flexible, simple C syntax, R interface, designed specifically for back-testing accuracy

MACHINE LEARNING – WHAT'S THE FUSS? WHAT IS IT AND WHY SHOULD YOU CARE?
• Algorithms that allow computers to find insights hidden in data
• As simple as linear regression, as complex as a deep neural network with thousands of interconnected nodes
• 'Mainstream' by 2018 – big data, fast computers
• Rapidly evolving
• Find new sources of alpha. Maintain your market edge.

THE FASCINATION FACTOR

MACHINE LEARNING – PROS AND CONS
• Powerful, flexible, insightful
• Find complex, non-linear relationships that humans can't observe directly
• Humans are good at seeing relationships between 2, 3, possibly 4 variables
• Higher dimensionality requires abstract thinking that we are not really built for
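The point about low-dimensional relationships can be made concrete: a one-line straight-line fit recovers a noisy 2D relationship of the kind tabulated next. This is a minimal illustration on synthetic data – the seed, noise level and variable names are my own, not from the slides:

```python
import numpy as np

# Synthetic 2D data of the same shape as the example table:
# y is roughly linear in x, plus noise.
rng = np.random.default_rng(0)
x = np.arange(8, 24, dtype=float)
y = x + rng.normal(0.0, 1.0, size=x.size)

# A straight-line fit recovers the relationship a human spots instantly.
slope, intercept = np.polyfit(x, y, 1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

With three or more variables the same exercise already demands projections or abstractions rather than a single picture – which is where ML starts to earn its keep.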
EXAMPLES: 2D RELATIONSHIPS

  X      Y
  8    7.48
  9   10.07
 10   10.19
 11   11.98
 12   10.38
 13   12.02
 14   15.05
 15   13.67
 16   15.25
 17   16.83
 18   17.07
 19   19.76
 20   19.74
 21   21.75
 22   19.03
 23   23.20

3D RELATIONSHIPS

     X       Y       Z
-0.895  -0.895   1.342
-0.789  -0.789   1.274
-0.684  -0.684   1.212
-0.579  -0.579   1.155
-0.474  -0.474   1.107
-0.368  -0.368   1.066
-0.263  -0.263   1.034
-0.158  -0.158   1.012
-0.053  -0.053   1.001
 0.053   0.053   1.001
 0.158   0.158   1.012
 0.263   0.263   1.034

HIGH ORDER DIMENSIONALITY
• Can you visualise the relationships between 4, 5, 6 variables?
• How about 100? 1000?

MACHINE LEARNING – PROS AND CONS
• Powerful – easy to overfit
• Requires effort to understand and set up
• Lots of moving parts = lots can go wrong, particularly when you begin processing data and executing trades in real time

TWO FLAVOURS – SUPERVISED AND UNSUPERVISED LEARNING
• What is the difference?
• When would you use one or the other?

SUPERVISED LEARNING
• Predict some value or class from the features
• For example, predict to which group something belongs based on the values of x1 and x2
• Examples: linear regression, logistic regression, artificial neural networks, support vector machines, decision trees
Image credit: Andrew Ng

UNSUPERVISED LEARNING
• Detect natural structure within the data
• For example, divide the data into its natural segments based on x1 and x2
• Examples: k-means clustering, self-organizing maps
Image credit: Andrew Ng

EXAMPLE APPLICATIONS
• 'Blind' data mining
• 'Intelligent' data mining
• Model-based
• These approaches attempt to predict price movement based on some known data
• Strategy insights – train a model using the returns of a trading system as the dependent variable and any factor you think affects it as an independent variable. Essentially the above process in reverse.
• Segregating data into 'natural' groupings

'BLIND' DATA MINING
• Not recommended
• Mine a data set for combinations of variables that have predictive utility
• Very easy to get lucky
• Requires additional work to statistically account for data-mining bias. See White (2000); his approach was popularised by Aronson (2006).
• White's approach lends itself to computerised data mining as it can be automated as part of the workflow
• Compare with manually combining technical analysis rules

'INTELLIGENT' DATA MINING
• Engineer and select variables that are likely to have some relationship with the target variable
• Tap into your domain knowledge and creativity
• Feature engineering takes time and effort
• Some algorithms that can assist – RFE, Boruta
• Also susceptible to data-mining bias

'INTELLIGENT' DATA MINING
• The Boruta feature selection algorithm measures the relative importance of your variables by comparing them with random variables obtained by randomised subsampling of the actual variables
• Easy to implement, intuitive, elegant
• Blue: random variables
• Green: variables that ranked higher than the best random variable
• Red: variables that ranked lower than the best random variable
Source: robotwealth.com

Sharpe ratios of models constructed with various combinations of Boruta-selected variables, trained with various algorithms.
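The shadow-feature idea behind Boruta can be sketched in a few lines. To be clear, this is not the real Boruta algorithm – the R package uses random-forest importance and repeated statistical tests – here, absolute correlation with the target stands in for importance, and the data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Three candidate features: two informative, one pure noise.
X = rng.normal(size=(n, 3))
y = 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Shadow features: row-shuffled copies of the real ones, so any
# importance they appear to have is attributable to chance alone.
shadows = np.column_stack([rng.permutation(col) for col in X.T])

def importance(cols, target):
    # Absolute correlation with the target as a stand-in importance measure.
    return np.array([abs(np.corrcoef(c, target)[0, 1]) for c in cols.T])

real_imp = importance(X, y)
threshold = importance(shadows, y).max()

# Keep only the features that beat the best shadow.
selected = [i for i in range(X.shape[1]) if real_imp[i] > threshold]
print("selected feature indices:", selected)
```

The elegance is in the benchmark: instead of an arbitrary importance cutoff, each variable must out-rank the best result that pure chance achieved on the same data.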
Source: robotwealth.com

MODEL-BASED APPROACH
• Construct a mathematical representation of some market phenomenon
• Consider the ARMA model and its limitations
• Artificial Neural Networks – "universal function approximators"

MODEL-BASED APPROACH
• Some creative examples that leverage the power of deep neural networks and the availability of high-quality satellite imagery
• Oil tanks with floating lids for insight into oil supply and demand dynamics
• Car park usage as a predictor of retail sales

SEGREGATION
• Split data into natural groupings to gain insight into how it is structured
• Example: candlestick patterns that actually have a quantitative rationale
Source: robotwealth.com

PITFALLS
• "Classic" data science is difficult to apply to the markets. The Coursera ML courses and the Kaggle data science competitions are a great place to start, but they nearly always deal with data that is fundamentally different to ours.
• Specifically, our data is not IID. It contains complex autocorrelations and is non-stationary. This makes conventional cross-validation techniques difficult to apply – more on this shortly.
• Tendency to overfit
• Data mining and the tendency to be fooled by randomness

TENDENCY TO OVERFIT
• Deal with the easiest problem first
• Reduce the number of features – in practice, avoid the temptation to use more than three or four
• Regularization is your friend
• So is cross-validation – but it needs to be done right

REGULARIZATION
• A mathematical technique for reducing the complexity of a model
• For example, reduce the impact of terms in a linear regression: y = a1·x1 + a2·x2 + a3·x3 + a4·x4
• Regularization might set some coefficients to zero or to very low values

REGULARIZATION
• Getting the regularization parameter right yields models that are neither under-fit nor over-fit
• Need a way of measuring and comparing your out-of-sample performance – cross-validation and a final validation test set

CROSS VALIDATION
What?
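As a concrete aside before the cross-validation material: the coefficient-shrinking behaviour just described can be sketched with closed-form ridge (L2) regression. The four-term linear model mirrors the slide's y = a1·x1 + … + a4·x4; the data, seed and penalty value are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 4))
# Only the first two inputs actually drive y; x3 and x4 are noise.
y = X @ np.array([2.0, -1.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=n)

def ridge(X, y, lam):
    # Closed-form ridge solution: w = (X'X + lam*I)^-1 X'y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

w_ols = ridge(X, y, 0.0)    # lam = 0 is ordinary least squares
w_reg = ridge(X, y, 100.0)  # a heavy penalty, for contrast

# The penalty pulls the coefficient vector toward zero.
print(np.round(w_ols, 2))
print(np.round(w_reg, 2))
```

Choosing lam is exactly the tuning problem the next slides address: too small and you are back to the over-fit unpenalised model, too large and you under-fit, and you need out-of-sample measurement to tell the difference.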
• A method of measuring the predictive performance of a statistical model
• Several varieties – k-fold, leave-one-out, bootstrap

Why?
• High R-squared != good model … especially if the model is overfit
• Test the model on a set of data not used in model training
• One in-sample plus one OOS set is not usually good enough

K-FOLD CROSS VALIDATION
• The original sample is randomly partitioned into k subsamples
• Train the model on k-1 subsamples
• Test it on the kth subsample – the one that was left out – and record performance
• Select a different subsample to leave out and repeat the process until all k subsamples have been used as a test set
• Repeated k-fold CV: repeat the process with a different random partitioning

DON'T TREAT CV AS A PANACEA
• CV performance can be an estimate of out-of-sample performance, but in practice it will be a biased estimate, particularly if we repeat it for different variations of our model
• Consider tuning a model parameter (e.g. a regularization parameter) and selecting the model with the best CV performance … selection bias
• Unfortunately k-fold CV is not a great approach for financial data

THE PROBLEM OF NON-I.I.D. DATA
• What is Independent and Identically Distributed data?
• Independent: one event gives no information about another event (P2 is not related to P1)
• Identically distributed: the distribution is constant
• If x and y are taken from the same distribution, they are identically distributed

EXAMPLE: COIN TOSS
The tosses are IID because:
1. The occurrence of one result tells us nothing about the probability of another result. That is, the process has no memory.
2. The distribution from which the tosses are drawn never changes. The probability of each result never changes.
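Both properties can be checked numerically. A minimal simulation – the bias p = 0.7 is chosen arbitrarily – showing that the distribution is constant and the process memoryless:

```python
import numpy as np

rng = np.random.default_rng(11)
p = 0.7  # a deliberately biased coin
tosses = rng.random(100_000) < p

# Identically distributed: the overall head frequency matches p.
print(round(tosses.mean(), 3))

# Independent (no memory): the head frequency immediately
# after a head is also just p.
after_head = tosses[1:][tosses[:-1]]
print(round(after_head.mean(), 3))
```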
Note that even a biased coin toss is IID (events don't need to be equiprobable).

FINANCIAL DATA
• Highly non-stationary – the statistical moments vary with time and with the length of their calculation windows
• Certain events may or may not affect the probability of other events
• Not a total disaster – it simply means that the assumptions of many common statistical tests are violated
• It also means that the randomisation process used in regular CV is not appropriate
• Don't use data from the future to build a model on the past!

TIME SERIES CROSS VALIDATION
• Use a rolling window
• Train the model on the window, test it on the adjacent data
• Shift the window into the future and repeat (the start of the window can be anchored or moving)
• In R, easy to implement with the caret package
• In Zorro, easy to implement with the walk-forward framework

TIME SERIES CROSS VALIDATION
In practice, the length of the optimization and out-of-sample periods may be important factors in a model's success. But a robust trading model will be largely insensitive to these.

TIME SERIES CV IS ALSO NOT A PANACEA
• Even if your model passes a TS CV test, there is no guarantee that it won't just stop working if and when the underlying relationship being modelled changes.
• Particularly true for data-mining systems.
• At least with model-based systems, you have insight into when the model is likely to break down.

DIAGNOSTICS – LEARNING PLOTS (GET YOUR MODEL UNDER CONTROL EFFICIENTLY)
• High variance – overfit model
• High bias – underfit model
Image credit: Andrew Ng

ALL THE RAGE: DEEP NEURAL NETWORKS
• What are they? Broadly, multi-layer neural networks with potentially thousands of nodes
• Capable of making high-level abstractions in line with what we call 'intuition'
• Computer vision (driverless cars), handwriting recognition, AlphaGo
• Made possible through advances in computational efficiency, both hardware and software
• Stacked Auto-Encoders – simplify feature selection?
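The rolling-window procedure from the time series cross validation slides can be sketched on a synthetic autocorrelated series. This is an illustration only – the AR(1) process, the window lengths and the one-parameter model fitted inside each window are my assumptions, not a trading model from the talk:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500
# Synthetic autocorrelated series: AR(1), so temporal order matters.
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + rng.normal(scale=0.5)

train_len, test_len = 100, 25
scores = []
start = 0
while start + train_len + test_len <= T:
    train = x[start : start + train_len]
    test = x[start + train_len : start + train_len + test_len]
    # Fit the AR(1) coefficient on the window only - never on future data.
    phi = np.dot(train[:-1], train[1:]) / np.dot(train[:-1], train[:-1])
    # One-step-ahead predictions on the adjacent out-of-sample block.
    prev = np.concatenate(([train[-1]], test[:-1]))
    mse = np.mean((test - phi * prev) ** 2)
    scores.append(mse)
    start += test_len  # shift the rolling window into the future

print("windows:", len(scores), "mean OOS MSE:", round(float(np.mean(scores)), 3))
```

Anchoring the window start at 0 instead of shifting it gives the "anchored" variant mentioned on the slide; everything else stays the same.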
USEFUL RESOURCES
• Max Kuhn, his caret package and his book Applied Predictive Modeling
• The CRAN machine learning task view
• Winning the Kaggle Algorithmic Trading Competition – an interesting read
• David Aronson's Statistically Sound Machine Learning
• www.robotwealth.com (of course) – lots of source code examples, R and Lite-C
• Hyndsight – the blog of Melbourne-based statistician Rob Hyndman: http://robjhyndman.com/