See Part I for an overview of reinforcement learning.
Components of reinforcement learning
With the bigger picture of what the RL algorithm tries to solve in mind, let us look at the building blocks, or components, of the reinforcement learning model.
The actions follow from the problem the RL algorithm is solving. If the RL algorithm is solving a trading problem, the actions would be Buy, Sell and Hold. If the problem is portfolio management, the actions would be the capital allocations to each of the asset classes. How does the RL model decide which action to take?
There are two methods, or policies, that help the RL model choose its actions. Initially, when the RL agent knows nothing about the game, it can pick actions at random and learn from the outcomes. This is called an exploration policy. Later, the RL agent can use its past experience to map each state to the action that maximises the long-term reward. This is called an exploitation policy.
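One common way to combine the two policies is an epsilon-greedy rule: with probability epsilon the agent explores, otherwise it exploits. The sketch below is only an illustration under stated assumptions. The function name `choose_action`, the `epsilon` parameter and the list of estimated long-term rewards `q_values` are hypothetical and do not come from the original article.

```python
import random

ACTIONS = ["Buy", "Sell", "Hold"]


def choose_action(q_values, epsilon, rng=random):
    """Return the index of the chosen action.

    q_values: assumed estimates of the long-term reward of each action
              in the current state (one number per action).
    epsilon:  probability of exploring, between 0 and 1.
    """
    if rng.random() < epsilon:
        # Exploration: pick any action at random.
        return rng.randrange(len(q_values))
    # Exploitation: pick the action with the highest estimated reward.
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Hypothetical estimates for Buy, Sell and Hold in some state.
q_values = [0.2, -0.1, 0.05]

# With epsilon = 0 the agent always exploits and picks the best estimate.
greedy_action = ACTIONS[choose_action(q_values, epsilon=0.0)]

# With epsilon = 1 the agent always explores and picks at random.
random_action = ACTIONS[choose_action(q_values, epsilon=1.0)]
```

A typical design choice is to start with a large epsilon and shrink it as the agent gathers experience, so that the agent moves gradually from exploration to exploitation.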
The RL model needs meaningful information to take actions. This meaningful information is the state. For example, suppose you have to decide whether or not to buy Apple stock. What information would be useful to you? You might say that you need some technical indicators, historical price data, sentiment data and fundamental data. All this information collected together becomes the state. It is up to the designer to decide what data should make up the state.
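As a rough illustration of how such information could be collected into a single state, the sketch below builds a small feature vector from recent prices and a sentiment score. The function name `build_state`, the choice of features (recent returns and the distance of the last price from its moving average) and the `sentiment` input are all assumptions made for this example, not the design used in the original article.

```python
def build_state(prices, sentiment, window=3):
    """Collect a few illustrative features into one state vector.

    prices:    list of historical prices, oldest first.
    sentiment: an assumed sentiment score, e.g. between -1 and 1.
    window:    number of recent periods to include.
    """
    if len(prices) < window + 1:
        raise ValueError("need at least window + 1 prices")

    # Historical price data, expressed as the last `window` simple returns.
    start = len(prices) - window
    returns = [prices[i] / prices[i - 1] - 1 for i in range(start, len(prices))]

    # A simple technical indicator: distance of the last price from its
    # moving average over the same window.
    moving_average = sum(prices[-window:]) / window
    price_to_sma = prices[-1] / moving_average - 1

    return returns + [price_to_sma, sentiment]


state = build_state([100, 101, 102, 103], sentiment=0.4)
```

Here `state` holds five numbers: three returns, one indicator value and the sentiment score. Fundamental data could be appended in the same way.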
For proper analysis and execution, however, the data should be weakly predictive and weakly stationary. Weakly predictive is simple enough to understand: the data should carry at least some information about what happens next. But what does weakly stationary mean? It means that the data should have a constant mean and variance over time (formally, the covariance between two observations should also depend only on the lag between them). Why is this important? The short answer is that machine learning algorithms work well on stationary data. How, then, does the RL model learn to map states to actions?
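To make the idea of a constant mean and variance concrete, the sketch below compares the two halves of a series. It uses a made-up price series that grows by 1% per period; the helper names `half_stats` and `simple_returns` are hypothetical. This is an informal check for illustration only, not a formal stationarity test.

```python
import statistics


def half_stats(series):
    """Return (mean, variance) for the first and second half of a series."""
    middle = len(series) // 2
    first, second = series[:middle], series[middle:]
    return (
        (statistics.mean(first), statistics.pvariance(first)),
        (statistics.mean(second), statistics.pvariance(second)),
    )


def simple_returns(prices):
    """Convert prices to period-over-period returns."""
    return [prices[i] / prices[i - 1] - 1 for i in range(1, len(prices))]


# Made-up prices that grow by 1% every period.
prices = [100 * 1.01 ** i for i in range(21)]

# The mean of the raw prices drifts upward, so prices are not stationary.
(price_mean_1, _), (price_mean_2, _) = half_stats(prices)

# The returns have the same mean and variance in both halves.
(ret_mean_1, ret_var_1), (ret_mean_2, ret_var_2) = half_stats(simple_returns(prices))
```

This is one reason why a state is often built from returns or other transformed series rather than from raw prices.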
A reward can be thought of as the end objective that you want to achieve with your RL system. For example, if the end objective is to create a profitable trading system, then your reward becomes profit. If the objective is the best risk-adjusted return, then your reward becomes the Sharpe ratio.
Defining a reward function is critical to the performance of an RL model. The following metrics can be used for defining the reward.
- Profit per tick
- Sharpe Ratio
- Profit per trade
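The three metrics listed above could be computed along the following lines. This is a minimal sketch under stated assumptions: the function names, the encoding of the position as +1 (long), -1 (short) or 0 (flat), the zero risk-free rate and the 252-period annualisation factor are illustrative choices, not taken from the original article.

```python
import math
import statistics


def profit_per_tick(position, price_now, price_next):
    """Profit earned over one tick while holding `position` units."""
    return position * (price_next - price_now)


def profit_per_trade(position, entry_price, exit_price):
    """Profit realised when a trade is closed."""
    return position * (exit_price - entry_price)


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualised Sharpe ratio of a list of per-period returns."""
    excess = [r - risk_free for r in returns]
    deviation = statistics.stdev(excess)
    if deviation == 0:
        # No variation in returns: the ratio is undefined, so return 0
        # here as a simple convention for this sketch.
        return 0.0
    return statistics.mean(excess) / deviation * math.sqrt(periods_per_year)


tick_reward = profit_per_tick(position=1, price_now=100.0, price_next=101.5)
trade_reward = profit_per_trade(position=-1, entry_price=100.0, exit_price=97.0)
risk_adjusted_reward = sharpe_ratio([0.01, -0.01, 0.02])
```

Profit per tick gives the agent feedback at every step, while profit per trade only rewards it when a position is closed. The Sharpe ratio needs a window of returns before it can be computed.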
The environment is the world that allows the RL agent to observe the state. When the RL agent applies an action, the environment acts on that action, calculates the reward and transitions to the next state. For example, the environment can be thought of as a chess game or as trading Apple stock.
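The loop described above (observe the state, apply an action, receive a reward, move to the next state) can be sketched as a small class. Everything here is an assumption made for illustration: the class name `TradingEnv`, the `reset` and `step` methods, the use of the current price as the whole state, and the per-tick profit as the reward.

```python
class TradingEnv:
    """A toy environment that replays a fixed list of prices."""

    # Assumed encoding: Buy = long one unit, Sell = short one unit.
    POSITIONS = {"Buy": 1, "Sell": -1, "Hold": 0}

    def __init__(self, prices):
        if len(prices) < 2:
            raise ValueError("need at least two prices")
        self.prices = list(prices)
        self.t = 0

    def _state(self):
        # For simplicity the state is just the current price.
        return (self.prices[self.t],)

    def reset(self):
        """Start a new episode and return the first state."""
        self.t = 0
        return self._state()

    def step(self, action):
        """Apply an action; return (next_state, reward, done)."""
        position = self.POSITIONS[action]
        # Reward: profit over the next tick for the chosen position.
        reward = position * (self.prices[self.t + 1] - self.prices[self.t])
        self.t += 1
        done = self.t == len(self.prices) - 1
        return self._state(), reward, done


env = TradingEnv([100, 102, 101])
first_state = env.reset()
state_1, reward_1, done_1 = env.step("Buy")   # price rises from 100 to 102
state_2, reward_2, done_2 = env.step("Sell")  # price falls from 102 to 101
```

In this toy run both actions earn a positive reward, and the episode ends once the last price is reached.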
Stay tuned for the next installment in which Ishan will demonstrate the RL model.
Visit QuantInsti to download practical code: https://blog.quantinsti.com/reinforcement-learning-trading/.
Disclosure: Interactive Brokers
Information posted on IBKR Traders’ Insight that is provided by third-parties and not by Interactive Brokers does NOT constitute a recommendation by Interactive Brokers that you should contract for the services of that third party. Third-party participants who contribute to IBKR Traders’ Insight are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.
This material is from QuantInsti and is being posted with permission from QuantInsti. The views expressed in this material are solely those of the author and/or QuantInsti and IBKR is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.