# Practical Statistics for Algo Traders

Contributor:
Robot Wealth

We take a look back at this classic piece from Kris Longmore and re-evaluate the importance of practical statistics for algo traders.

How do you feel when you see the word “statistics”? Maybe you sense that it’s something you should be really good at, but aren’t. Maybe the word gives you a sense of dread, since you’ve started exploring its murky depths, but thrown your hands up in despair and given up – perhaps more than once. If you read lots of intelligent-sounding quant blogs, you might even feel like your lack of statistical sophistication is what’s standing between you and algo trading success.

Well, you’re not alone. The reality is that classical statistics is difficult, time-consuming and downright confusing. Fundamentally, we use statistics to answer a question – but when we use classical methods to answer it, half the time we forget what question we were seeking an answer to in the first place.

But guess what? There’s another way to get our questions answered without resorting to classical statistics. And it’s one that will generally appeal to the practical, hands-on problem solvers that tend to be attracted to algo trading in the long run.

Specifically, algo traders can leverage their programming skills to get answers to tough statistical questions – without resorting to classical statistics. In the words of Jake VanderPlas, whose awesome PyCon 2016 talk inspired some of the ideas in this post, “if you can write a for loop, you can do statistics.”

In this post and the ones that follow, I want to show you some examples of how simulation and resampling methods lend themselves to intuitive computational solutions to problems that are quite complex when posed in the domain of classical statistics. Let’s get started.

## Starting Simple: Beating a Game of Chance

The example that we’ll start with is relatively simple and more for illustrative purposes than something that you’ll use a lot in a trading context. But it sets the scene for what follows and provides a useful place to start getting a sense for the intuition behind the methods I’ll show you later.

You’ve probably heard the story of Ed Thorp and Claude Shannon. The former is a mathematics professor and hedge fund manager; the latter was a mathematician and engineer referred to as “the father of information theory”, and whose discoveries underpin the digital age in which we live today (he’s kind of a big deal).

When they weren’t busy changing the world, these guys would indulge in another great hobby: beating casinos at games of chance. Thorp is known for developing a system of card counting to win at Blackjack. But the story I find even more astonishing is that together, Thorp and Shannon developed the first wearable computer, whose sole purpose was to beat the game of roulette. According to a 2013 article describing the affair,

> Roughly the size of a pack of cigarettes, the computer itself had 12 transistors that allowed its wearer to time the revolutions of the ball on a roulette wheel and determine where it would end up. Wires led down from the computer to switches in the toes of each shoe, which let the wearer covertly start timing the ball as it passed a reference mark. Another set of wires led up to an earpiece that provided audible output in the form of musical cues – eight different tones represented octants on the roulette wheel. When everything was in sync, the last tone heard indicated where the person at the table should place their bet. Some of the parts, Thorp says, were cobbled together from the types of transmitters and receivers used for model airplanes.

So what’s all this got to do with hacking statistics? Well, nothing really, except that it provides context for an interesting example. Say we were a pit boss in a big casino, and we’d been watching a roulette player sitting at the table for hours, amassing an unusually large pile of chips. A review of the casino’s closed circuit television revealed that the player had played 150 games of roulette and won 7 of those. What are the chances that the player’s run of good luck is an indication of cheating?

To answer that question, we first need to understand the probabilities of the game of roulette. There are 37 numbers on the roulette wheel (0 to 36), so the probability of choosing the correct number on any given spin is 1 in 37. For a correct guess, the house pays out \$36 for every \$1 wagered. So the payout is slightly less than the \$37 that fair odds would imply, which of course ensures that the house wins in the long run.
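To make the house edge concrete, here is a minimal R sketch of the expected profit on a \$1 single-number bet. It assumes the \$36 payout includes the returned stake (that is, a net win of \$35); the variable names are illustrative only.

```r
# Expected profit on a $1 single-number bet.
# Assumption: the $36 payout includes the returned $1 stake.
n_pockets    <- 37            # numbers 0 to 36
p_win        <- 1 / n_pockets # chance of picking the right number
stake        <- 1
total_return <- 36

# Win: gain (total_return - stake). Lose: forfeit the stake.
expected_profit <- p_win * (total_return - stake) - (1 - p_win) * stake

# House edge as a fraction of the stake (positive means the house gains).
house_edge <- -expected_profit / stake

print(expected_profit)
print(house_edge)
```

Under that assumption the player loses 1/37 of the stake per spin on average, or roughly 2.7 cents per dollar wagered.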

To use classical statistics to work out the probability that our player was cheating, we would first need to recognise that our player’s run of good luck could be modeled with the binomial probability distribution:
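In its standard form, with $n$ games played, $k$ wins and a single-spin win probability $p$ (symbol names chosen here for illustration), the binomial probability of exactly $k$ wins is:

$$
P(X = k) = \binom{n}{k}\, p^{k} (1 - p)^{n - k},
\qquad
\binom{n}{k} = \frac{n!}{k!\,(n - k)!}
$$

For the roulette player, $n = 150$, $k = 7$ and $p = 1/37$.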

Here are some R functions for implementing these equations:
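A minimal sketch of what such functions might look like, covering the binomial coefficient and the binomial probability of exactly k wins. The names `binom_coeff` and `binom_prob` are hypothetical and not necessarily those used in the original listing.

```r
# Hypothetical helper: number of ways to pick k wins out of n games.
binom_coeff <- function(n, k) {
  choose(n, k)  # base R; equals n! / (k! * (n - k)!)
}

# Hypothetical helper: probability of exactly k wins in n games,
# each won independently with probability p.
binom_prob <- function(n, k, p) {
  binom_coeff(n, k) * p^k * (1 - p)^(n - k)
}
```

Using `choose` rather than dividing factorials directly avoids working with very large intermediate values such as 150!.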

And here’s how to calculate the probability of winning 7 out of 150 games of roulette:
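One way to do that is with base R's built-in binomial density function `dbinom`. This is a sketch and not necessarily the call used in the original listing; the variable names are illustrative.

```r
# Probability of winning exactly 7 of 150 single-number bets.
n_games <- 150
n_wins  <- 7
p_win   <- 1 / 37  # one winning pocket out of 37

prob_exactly_7 <- dbinom(n_wins, size = n_games, prob = p_win)
print(round(prob_exactly_7, 3))
```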

This returns a value of 0.062, which means there is about a 6% chance of winning exactly 7 out of 150 games of roulette.

But wait, we’re not done yet! We’ve actually found the probability of winning exactly 7 out of 150 games, but we really want to know the probability of winning at least 7 out of 150 games. So we actually need to sum up the probabilities associated with winning 7, 8, 9, 10, and so on, up to 150 games. This number is the p-value: the probability of seeing a result at least as extreme as the one observed if the null hypothesis were true. The null hypothesis is the idea we are trying to disprove – in our case, that the player isn’t cheating.
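That sum can be sketched in base R in two equivalent ways: adding up the individual probabilities with `dbinom`, or asking `pbinom` for the upper tail directly. The variable names are illustrative, and this is only one possible way to do the calculation.

```r
# Probability of winning at least 7 of 150 single-number bets,
# assuming an honest player (the null hypothesis).
n_games <- 150
n_wins  <- 7
p_win   <- 1 / 37

# Sum the probabilities of exactly 7, 8, ..., 150 wins.
p_value_sum <- sum(dbinom(n_wins:n_games, size = n_games, prob = p_win))

# Equivalent upper-tail form: P(X > 6) is the same as P(X >= 7).
p_value_tail <- pbinom(n_wins - 1, size = n_games, prob = p_win,
                       lower.tail = FALSE)

print(c(p_value_sum, p_value_tail))
```

Note the `n_wins - 1` in the `pbinom` call: with `lower.tail = FALSE` the function returns the probability of strictly more than the given number of wins, so passing 6 gives the probability of at least 7.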

Confused? You’re not alone. Classical statistics is full of these double negatives and it’s one of the reasons that it’s so easy to forget what question we were even trying to answer in the first place.

In the next post Kris will show us a function for calculating the p-value for our roulette player of possibly dubious integrity (or commendable ingenuity, depending on your point of view).