Cross Validation (CV) is not a novel topic, but from my experience as both a data scientist and front desk practitioner, it is a statistical tool that is often underappreciated and misused. I believe that countless bad trading ideas could have been discarded had they been handled with due statistical care. So this blog is both a cautionary tale and a quick reference to modern CV methods in finance.
We will cover the following topics:
- Backtesting through Cross Validation
- Combinatorial Purged Cross Validation
- Example of Combinatorial Purged Cross Validation in Python
Cross validation in machine learning has been covered previously, so I’d like to focus here on cross validation as a backtesting tool. I will also mention some of the pitfalls of cross validation (CV) in finance, especially those concerning data leaking and data peeking.
To recap, CV is a resampling tool aimed at assigning measures of accuracy (e.g. bias, variance, confidence intervals, prediction error) to sample estimates (e.g. model performance, Sharpe ratios). CV is an old idea in statistics whose time seems to have come again with the advent of modern computers and machine learning.
In finance, CV becomes especially important in handling low signal-to-noise ratios, and in mitigating spurious results caused by the overfitting of strategies. I should also say that CV is not restricted to ML trading models, and can be applied to rules-based strategies created through data mining.
I will be using both terms (data leaking and data peeking) interchangeably in the context of CV. They refer to situations where information leaks across folds, deliberately or not. This matters most when your model or strategy contains indicators that require historical data with long lookbacks.
Let’s use a concrete example for illustration. Assume that:
- Your model or strategy depends on an indicator such as realised volatility with a lookback of 63 days (or 3 business months)
- Your folds are ordered chronologically, in 1 year blocks
- Your current test or out-of-sample fold is year 4
Schematically, the folds would look like this: |–1–||–2–||–3–|(|–4–|)|–5–|
At the boundary between years 4 and 5, volatility computed in the early days of year 5 would require information that’s only available in the OOS fold (year 4). (There’s no issue between folds 3 and 4, since the information required in fold 4 comes from the past and is already available.)
One common approach to dealing with this unintended data leak is to use what De Prado calls purging and embargoing. Both involve removing some of the data points near the boundaries of affected folds, but there’s a subtle difference between the two processes.
There’s more to it than I’m mentioning here since, in De Prado’s terminology, each labelled data point has two times attached to it: a trade time and an event time. The specific type of data removal I’m suggesting in the example above is embargoing. (It is applied before purging.) In our specific example, I’d remove the first 63 or so days from fold 5.
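As a rough illustration of the embargo in this example, here is a minimal sketch. It assumes 252 trading days per one-year fold and uses plain integer positions instead of dates; the function name `embargoed_train_test` is my own and does not come from any library.

```python
import numpy as np

DAYS_PER_YEAR = 252   # assumption: trading days in each one-year fold
LOOKBACK = 63         # lookback of the realised-volatility indicator
N_FOLDS = 5

# Tag every observation with its fold number (1..5), in chronological order.
fold_id = np.repeat(np.arange(1, N_FOLDS + 1), DAYS_PER_YEAR)

def embargoed_train_test(fold_id, test_fold, embargo):
    """Return (train_idx, test_idx), dropping from the training set the
    first `embargo` observations that follow the test fold."""
    idx = np.arange(len(fold_id))
    test_idx = idx[fold_id == test_fold]
    train_mask = fold_id != test_fold
    # Embargo: these observations use indicator values built on test data.
    test_end = test_idx[-1]
    train_mask[test_end + 1 : test_end + 1 + embargo] = False
    return idx[train_mask], test_idx

train_idx, test_idx = embargoed_train_test(fold_id, test_fold=4, embargo=LOOKBACK)
```

With year 4 as the test fold, the training set keeps years 1 to 3 in full and year 5 minus its first 63 observations. If the test fold is the last one, the slice is empty and nothing is embargoed.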
Moral: Be aware of the temporal dependencies of your features.
Before we discuss purging, we need to talk about event times in finance. In essence, any labelled data point in a financial time series has a trade time and an event time. The event time usually indicates when in the future the mark-to-market value of an asset reaches a certain level, such as a stop loss or a take profit price. In practice, this means that labels become path-dependent, and care needs to be taken so that when computing labels we don’t peek into the out-of-sample fold.
For a concrete example, say we are trying to build an ML model to predict whether IBM prices will move up or down in the next 5 business days by at least 50 basis points (bps) based on various data sources. The size of these movements is estimated based on recent levels of realised volatility for IBM shares. A common labelling scheme would be: +1 if the share price moves up by more than 50 bps, 0 if the share price moves by less than 50 bps in absolute value, and -1 if the share price moves down by more than 50 bps.
Next, let’s assume that our typical trading horizon is 1 week. You would enter a position today, and liquidate it one week later. In practice, however, most people would have a stop loss or take profit level for a trade so that they can exit earlier if either of those levels is reached. The point is that to mark-to-market your trade, you’d need to observe the price path during the next 5 days, or for the next 5 ticks (you could exit before).
In the labelling process, we have to take care to remove training data points whose event times overlap with the trade times in the test fold. This process is called purging.
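The labelling scheme above can be sketched in a few lines. This is a simplified version under my own assumptions: the 50 bps threshold is held fixed rather than estimated from realised volatility, and `fixed_horizon_labels` is a hypothetical helper name.

```python
import numpy as np

def fixed_horizon_labels(prices, horizon=5, threshold=0.005):
    """Label each day +1, 0 or -1 from its `horizon`-day forward return.

    threshold=0.005 is 50 bps. The last `horizon` days get no label
    because their forward return is not yet observable.
    """
    prices = np.asarray(prices, dtype=float)
    fwd_ret = prices[horizon:] / prices[:-horizon] - 1.0
    labels = np.zeros(len(fwd_ret), dtype=int)
    labels[fwd_ret > threshold] = 1
    labels[fwd_ret < -threshold] = -1
    return labels

# Toy prices, not real IBM data.
prices = [100.0, 100.0, 100.0, 100.0, 100.0, 101.0, 100.2, 99.0]
labels = fixed_horizon_labels(prices)   # array([ 1,  0, -1])
```

Note that the label for day t depends on the price at day t + 5, which is exactly why labels near a fold boundary can peek into the neighbouring fold.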
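To make the idea of an event time concrete, here is a minimal sketch for a long position. The 1% take profit and stop loss levels, the function name `event_time`, and the choice to label a trade 0 when it simply runs to the end of the horizon are all assumptions for illustration.

```python
import numpy as np

def event_time(prices, entry, horizon=5, take_profit=0.01, stop_loss=0.01):
    """Return (exit_index, label) for a long position opened at `entry`.

    The trade exits at the first bar where the return since entry hits
    the take profit (+1) or the stop loss (-1). Otherwise it exits at
    the end of the horizon with label 0.
    """
    prices = np.asarray(prices, dtype=float)
    last = min(entry + horizon, len(prices) - 1)
    for t in range(entry + 1, last + 1):
        ret = prices[t] / prices[entry] - 1.0
        if ret >= take_profit:
            return t, 1
        if ret <= -stop_loss:
            return t, -1
    return last, 0

# Toy path: the take profit is hit on the second bar after entry.
exit_idx, label = event_time([100.0, 100.5, 101.5, 99.0, 98.0, 100.0], entry=0)
```

Here the trade time is the entry index and the event time is `exit_idx`, so each labelled observation spans the interval between the two.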
In practice, one first embargoes the data set, and then purges it. One can implement embargoing by increasing the event time of the test fold preceding a training fold. (See De Prado for more details.)
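A minimal sketch of purging with an optional embargo might look as follows. It assumes integer time indices, treats the embargo as an extension of the end of the test window (my simplified reading of the approach described above), and uses a hypothetical function name, `purged_train_indices`.

```python
import numpy as np

def purged_train_indices(trade_t, event_t, test_start, test_end, embargo=0):
    """Positions of the observations that may be kept for training.

    An observation is dropped when its label interval [trade_t, event_t]
    overlaps the blocked window [test_start, test_end + embargo]. Test
    observations fall inside that window, so they are dropped as well.
    """
    trade_t = np.asarray(trade_t)
    event_t = np.asarray(event_t)
    blocked_end = test_end + embargo
    overlaps = (trade_t <= blocked_end) & (event_t >= test_start)
    return np.flatnonzero(~overlaps)

# Toy data: 20 daily observations, each label resolved 5 days after the trade.
trade_t = np.arange(20)
event_t = trade_t + 5

keep = purged_train_indices(trade_t, event_t, test_start=10, test_end=14)
keep_emb = purged_train_indices(trade_t, event_t, 10, 14, embargo=2)
```

With the test fold covering days 10 to 14, purging alone keeps days 0 to 4 and 15 to 19: days 5 to 9 are removed because their labels resolve inside the test fold. Adding a 2-day embargo also removes days 15 and 16.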
Moral: Be aware of the price paths between trades and events (e.g. end of the trading horizon, stop loss, take profit, etc.).
For more in-depth information on this topic of label building in finance, I recommend the Financial Data Science & Feature Engineering introductory video from Quantra.
Visit QuantInsti Blog for details on Backtesting through Cross Validation: https://blog.quantinsti.com/cross-validation-embargo-purging-combinatorial/
Disclosure: Interactive Brokers
Information posted on IBKR Traders’ Insight that is provided by third-parties and not by Interactive Brokers does NOT constitute a recommendation by Interactive Brokers that you should contract for the services of that third party. Third-party participants who contribute to IBKR Traders’ Insight are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.
This material is from QuantInsti and is being posted with permission from QuantInsti. The views expressed in this material are solely those of the author and/or QuantInsti and IBKR is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.