Reinforcement Learning in Trading

Articles From: QuantInsti
Website: QuantInsti

Initially, we used machine learning and AI to simulate how humans think, only a thousand times faster! The human brain is complex but limited in capacity. This simulation was the early driving force of AI research. But we have reached a point today where humans are amazed at how AI “thinks”.

A quote sums it up perfectly: “AlphaZero, a reinforcement learning algorithm developed by Google’s DeepMind, taught us that we were playing chess wrong!”

While most chess players know that the ultimate objective of chess is to win, they still try to keep most of their pieces on the board. But AlphaZero understood that it didn’t need all of its pieces as long as it could checkmate the opponent’s king. Thus, its moves were often perceived as quite risky, but ultimately they paid off handsomely.

AlphaZero understood that to fulfil the long-term objective of checkmate, it would have to suffer losses during the game. We call this delayed gratification. What’s impressive is that before AlphaZero, few people thought of playing in this manner. Ever since, experts in a variety of disciplines have been working on ways to adapt reinforcement learning to their research. This exciting achievement of AlphaZero sparked our interest in exploring the use of reinforcement learning for trading.

This article is structured as follows. The focus is on describing the applications of reinforcement learning in trading and on the problems RL can solve that might be impossible to tackle with a traditional machine learning approach. You won’t find a full trading implementation here, but plenty of examples to inspire you to explore the reinforcement learning framework for trading.

  • What is reinforcement learning?
  • How to apply reinforcement learning in trading?
  • Components of reinforcement learning
  • Q Table and Q Learning
  • Key Challenges

What is reinforcement learning?

Reinforcement learning might sound exotic and advanced, but the underlying concept is quite simple. In fact, everyone has known about it since childhood!

As a kid, you were given a reward for excelling in sports or studies, and you were reprimanded or scolded for doing something mischievous like breaking a vase. This was a way to change your behaviour. If you knew you would get a bicycle or a PlayStation for coming first, you would practice a lot to come first. And since you knew that breaking a vase meant trouble, you would be careful around it. This is called reinforcement learning.

The reward served as positive reinforcement, while the punishment served as negative reinforcement. In this manner, your elders shaped your learning. In a similar way, an RL algorithm can learn to trade in financial markets on its own by looking at the rewards or punishments it receives for its actions.

Like a human, our agents learn for themselves to achieve successful strategies that lead to the greatest long-term rewards. This paradigm of learning by trial-and-error, solely from rewards or punishments, is known as reinforcement learning (RL).
– Google DeepMind
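
To make that loop concrete, here is a minimal sketch of the agent–environment cycle in Python: the environment hands the agent a state, the agent picks an action, and the environment replies with a reward. The toy price series, the position convention (+1 long, 0 flat, -1 short) and the random agent are all assumptions made for illustration, not part of the original article.

```python
import random

# Minimal sketch of the agent-environment loop described above (illustrative only).
class ToyEnvironment:
    """Hands the agent a state, accepts an action, and returns a reward."""
    def __init__(self, prices):
        self.prices = prices
        self.t = 0

    def reset(self):
        self.t = 0
        return self.prices[self.t]

    def step(self, action):
        # action: +1 = long, 0 = flat, -1 = short (assumed convention)
        price_change = self.prices[self.t + 1] - self.prices[self.t]
        reward = action * price_change      # profit/loss acts as reward/punishment
        self.t += 1
        done = self.t >= len(self.prices) - 1
        return self.prices[self.t], reward, done

env = ToyEnvironment(prices=[100, 101, 103, 102, 105])   # assumed toy prices
state, total_reward, done = env.reset(), 0.0, False
while not done:
    action = random.choice([-1, 0, 1])     # a random agent, just to show the loop
    state, reward, done = env.step(action)
    total_reward += reward
print("Cumulative reward:", total_reward)
```

A real agent would, of course, replace the random choice with a policy learned from those rewards.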

How to apply reinforcement learning in trading?

In the realm of trading, the problem can be stated in multiple ways, such as maximising profit, reducing drawdowns, or allocating a portfolio. The RL algorithm will learn the strategy that maximises long-term rewards.
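
As a rough illustration, here is one way a couple of those objectives could be encoded as per-step reward functions. The exact formulas and the drawdown penalty weight are assumptions made for this sketch, not prescriptions from the article.

```python
# Two illustrative per-step reward definitions (assumed for this sketch):
def pnl_reward(position, price_change):
    # reward = profit or loss of the current position
    return position * price_change

def drawdown_penalised_reward(position, price_change, drawdown, penalty=0.1):
    # same PnL reward, minus a penalty proportional to the current drawdown;
    # 'penalty' is an assumed tuning parameter
    return position * price_change - penalty * drawdown
```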

For example, the share price of Amazon was almost flat from late 2018 to the start of 2020. Most of us would think a mean-reverting strategy would work well here.

[Figure: Amazon (AMZN) share price, 2017–2020]

But from early 2020 onwards, the price picked up and started trending. Thus, deploying a mean-reverting strategy from the start of 2020 would have resulted in a loss. Having seen mean-reverting conditions in the prior year, most traders would have exited the market when it started to trend.

But if you had gone long and held the stock, it would have paid off in the long run; in this case, you would be forgoing your present reward for future long-term gains. This behaviour is similar to the concept of delayed gratification discussed at the beginning of the article.

An RL model can pick up price patterns from 2017 and 2018 and, keeping the bigger picture in mind, continue to hold the stock for outsized profits later on.
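
In RL terms, this “delayed gratification” is usually captured by maximising the discounted sum of future rewards rather than the immediate reward. The toy numbers below are assumptions, used only to show that a policy which tolerates small interim losses can still score higher once future rewards are discounted and summed.

```python
# Illustrative only: delayed gratification as a discounted return.
def discounted_return(rewards, gamma=0.99):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

hold_rewards = [-1, -1, -1, +10]   # assumed: hold through small losses for a large later gain
exit_rewards = [+1, 0, 0, 0]       # assumed: take the small immediate gain and exit

print(discounted_return(hold_rewards))   # ~6.7: holding wins despite early losses
print(discounted_return(exit_rewards))   # 1.0
```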

Stay tuned for the next installment, in which Ishan will discuss how reinforcement learning differs from traditional machine learning algorithms.

Visit QuantInsti to download practical code: https://blog.quantinsti.com/reinforcement-learning-trading/.

Disclosure: Interactive Brokers

Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from QuantInsti and is being posted with its permission. The views expressed in this material are solely those of the author and/or QuantInsti and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.