# Pairs Trading Basics: Correlation, Cointegration And Strategy – Part I

Contributor:
QuantInsti

Pairs trading is one of the most popular strategies for finding trading opportunities between two stocks that are cointegrated.

How do stocks become cointegrated, and how can you take advantage of that relationship with a pairs trading strategy? This blog covers:

• What is the logic behind pairs trading?
• Essential terms used in pairs trading
• Correlation
• Cointegration
• Z-score
• Augmented Dickey Fuller Test
• Select stocks for pairs trading
• Entry points
• Defining exit points
• Pairs trading strategy using Excel and Python

In a pairs trading strategy, a pair of stocks is usually traded in a market-neutral way: whether the market is trending upwards or downwards, the two open positions, one in each stock, hedge each other. The key challenges in pairs trading are to:

• Select a pair which will give you good statistical arbitrage opportunities over time
• Select the entry/exit points

Pairs trading was first introduced in the mid-1980s by a group of technical analyst researchers who were employed by Morgan Stanley. The strategy uses statistical and technical analysis to seek out potential market-neutral profits.

## What is the logic behind pairs trading?

For a pairs trading strategy to work, the prices of the two stocks or financial instruments need to move together and stay close to each other over time. On certain occasions, however, the price of one instrument may deviate from the other for a short period.

In this short period, the trader can take the opportunity to go long on the instrument that has become relatively cheap while shorting the one that has become relatively expensive. The positions are based on the current market prices of both stocks and the standard deviation of the gap between them.

## Essential terms used in pairs trading

Some of the essential terms used in a pairs trading strategy are:

### Correlation

Correlation is quantified by the correlation coefficient ρ, which ranges from -1 to +1. The correlation coefficient indicates the degree of correlation between the two variables.

The value of +1 means there exists a perfect positive correlation between the two variables, -1 means there is a perfect negative correlation and 0 means there is no correlation.

A perfect positive correlation means that whenever one variable moves upward or downward, the other variable moves in the same direction by a proportional amount.

A perfect negative correlation means that whenever one variable moves upward, the other variable moves downward (i.e. in the opposite direction) by a proportional amount.

The correlation coefficient for the two variables is given by:

Correlation(X, Y) = ρ = cov(X, Y) / (SD(X) × SD(Y))

where,
cov(X, Y) = the covariance between X and Y
SD(X) and SD(Y) = the standard deviations of the respective variables
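The formula above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The function name `pearson_correlation` and the price lists are hypothetical and made up for the example; they are not market data.

```python
import math

def pearson_correlation(x, y):
    """Return cov(X, Y) / (SD(X) * SD(Y)) for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population covariance and standard deviations; the 1/n factors cancel in the ratio
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    return cov_xy / (sd_x * sd_y)

# Hypothetical daily closing prices, made up for illustration
prices_a = [100.0, 101.5, 103.0, 102.0, 104.5, 106.0]
prices_b = [50.0, 50.5, 51.8, 51.0, 52.4, 53.1]

rho = pearson_correlation(prices_a, prices_b)
print(f"correlation = {rho:.3f}")
```

Note that the function divides by zero if either series is constant, since its standard deviation is then 0; a production version would need to handle that case.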

If the correlation is high, say 0.8, traders may choose that pair for pairs trading. This high number represents a strong relationship between the two stocks. So if A goes up, the chances of B going up are also quite high.

Based on this assumption, a market-neutral strategy is set up in which A is bought and B is sold; which stock to buy and which to sell is decided from their individual price patterns.

Looking at correlation alone might give you spurious results. For instance, if your pairs trading strategy is based on the spread between the prices of the two stocks, it is possible that both prices keep on increasing without the spread ever mean-reverting.

Spread = log(a) – n log(b)

where ‘a’ and ‘b’ = prices of stocks A and B respectively

For each share of stock A bought, you have sold n shares of stock B.

Now suppose both ‘a’ and ‘b’ increase in such a way that the value of the spread decreases. This results in a loss, since stock A, which you hold long, is rising at a lower rate than stock B, which you are short.

Thus, one should be careful about relying on correlation alone when selecting pairs of stocks for a pairs trading strategy.
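To make the warning concrete, here is a small sketch of two price series that are highly correlated but whose spread never mean-reverts. It assumes the spread is defined as log(a) – n log(b) with a hedge ratio of n = 1. The growth rates, the starting price, and the helper `correlation` are all hypothetical choices for illustration.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var_x = sum((xi - mx) ** 2 for xi in x)
    var_y = sum((yi - my) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical prices: A grows 1% per period, B grows 2% per period
a = [100.0 * 1.01 ** t for t in range(50)]
b = [100.0 * 1.02 ** t for t in range(50)]

n = 1.0  # assumed hedge ratio, for illustration only
spread = [math.log(ai) - n * math.log(bi) for ai, bi in zip(a, b)]

rho = correlation(a, b)
# True when every spread value is lower than the one before it
always_falling = all(later < earlier for earlier, later in zip(spread, spread[1:]))

print(f"correlation = {rho:.3f}, spread always falling = {always_falling}")
```

Both prices rise together, so the correlation is close to +1, yet the spread drifts steadily downward and a long-A, short-B position loses money in every period.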

### Cointegration

The most common test for pairs trading is the cointegration test. Cointegration is a statistical property of two or more time series which indicates whether a linear combination of the variables is stationary.

Let us unpack the statement above. The two time series, in this case, are the logs of the prices of stocks A and B. A linear combination of these variables can be the linear equation defining the spread:

As you know,

Spread = log(a) – n log(b)

where ‘a’ and ‘b’ are the prices of stocks A and B respectively.

For each share of stock A bought, you have sold n shares of stock B.

If A and B are cointegrated, the spread defined above is stationary. A stationary process has very valuable features which are required to model pairs trading strategies.

For instance, if the spread is stationary, its mean and variance remain constant over time.

So if we start with a value of ‘n’, which is called the hedge ratio, chosen so that spread = 0, stationarity implies that the expected value of the spread will remain 0. Any deviation from this expected value is a statistical abnormality, and hence a potential case for pairs trading!
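One common way to pick the hedge ratio is to regress log(a) on log(b) with ordinary least squares and use the slope as ‘n’. The text above does not prescribe this method, so treat the sketch below as an assumption. It also adds an intercept term, which the spread equation in the text does not have, so that the fitted spread is centred on 0. The prices are synthetic, and the helper `ols_slope_intercept` is a hypothetical name. The sketch does not test the spread for stationarity.

```python
import math

def ols_slope_intercept(x, y):
    """Least-squares fit y ≈ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic prices: B trends upward; log(a) tracks 1.5 * log(b) plus a small bounded wobble
b = [50.0 + 0.2 * t for t in range(200)]
a = [math.exp(1.5 * math.log(bi) + 0.02 * math.sin(t / 5.0)) for t, bi in enumerate(b)]

log_a = [math.log(p) for p in a]
log_b = [math.log(p) for p in b]

# Slope of the regression of log(a) on log(b) is the estimated hedge ratio
hedge_ratio, intercept = ols_slope_intercept(log_b, log_a)

# Spread with the fitted intercept removed, so that it fluctuates around 0
spread = [la - hedge_ratio * lb - intercept for la, lb in zip(log_a, log_b)]
mean_spread = sum(spread) / len(spread)

print(f"hedge ratio = {hedge_ratio:.3f}, mean spread = {mean_spread:.2e}")
```

Because the synthetic series were built with a true ratio of 1.5 and a bounded wobble, the estimated hedge ratio lands close to 1.5 and the spread oscillates around 0 instead of drifting.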

### Z-score

The z-score rescales raw data points so that, if the raw data are normally distributed, the resulting values follow a normal distribution with a mean of 0 and a standard deviation of 1. Such a distribution, N(0, 1), is very useful for creating threshold levels.

For instance, in pairs trading, we have a distribution of the spread between the prices of stocks A and B. We can convert these raw spread values into z-scores using the formula given below.

This new distribution will have a mean of 0 and a standard deviation of 1. It is easy to create threshold levels for this distribution such as 1.5 sigma, 2 sigma, 2.5 sigma, and so on.

The formula for z-score is as follows:

z = (x – mean) / standard deviation

where,
x = a raw data point
z = the z-score

The mean and standard deviation can be rolling statistics computed over a window of ‘t’ days, minutes, or other time intervals.
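As a rough sketch of the rolling version, the snippet below computes a z-score for each spread value against the mean and standard deviation of a trailing window, then flags the points that cross a threshold. The function name `rolling_zscores`, the window of 5, the 1.5 sigma threshold, and the spread values are all hypothetical choices for illustration.

```python
import statistics

def rolling_zscores(values, window):
    """Z-score of each point against the mean and SD of its trailing window (current point included)."""
    scores = []
    for i in range(len(values)):
        if i + 1 < window:
            scores.append(None)  # not enough history yet
            continue
        chunk = values[i + 1 - window : i + 1]
        mean = statistics.fmean(chunk)
        sd = statistics.pstdev(chunk)  # population standard deviation of the window
        scores.append((values[i] - mean) / sd if sd > 0 else 0.0)
    return scores

# Hypothetical spread values, made up for illustration
spread = [0.00, 0.01, -0.01, 0.02, 0.00, -0.02, 0.01, 0.00, 0.08, 0.01]
z = rolling_zscores(spread, window=5)

threshold = 1.5  # example threshold level, in standard deviations
flagged = [i for i, s in enumerate(z) if s is not None and abs(s) > threshold]

print(flagged)
```

One design point to keep in mind: when the window includes the current point and the population standard deviation is used, the absolute z-score can never exceed the square root of (window – 1). A window of 5 therefore caps it at 2, so short windows need lower thresholds or a window that excludes the current point.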

Stay tuned for the next installment, in which Chainika Thakar will discuss the Augmented Dickey Fuller Test.