
# K-Means Clustering Algorithm For Pair Selection In Python – Part IV

Contributor: QuantInsti

In the previous installment of this series, Lamarcus demonstrated how to build a heatmap.

Earlier we used Matplotlib’s scatter plot method. Now we’ll introduce Seaborn’s scatter plot method. Note that Seaborn is built on top of Matplotlib, so Matplotlib’s functionality can be applied to Seaborn plots.

#Creating a scatter plot using Seaborn
plt.figure(figsize=(15,10))
#Pass the two series as keyword arguments; recent Seaborn versions require x= and y=
sns.jointplot(x=newDF['WMT'], y=newDF['TGT'])
plt.legend(loc=0)
plt.show()

One feature that I like about using Seaborn’s scatter plot is that it provides the Correlation Coefficient and P-Value. From looking at this pearsonr value, we can see that WMT and TGT were not positively correlated over the period. Now that we have a better understanding of our two stocks, let’s check to see if a tradable relationship exists.
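The same two statistics can also be computed directly, which is useful because newer Seaborn releases may not print them on the joint plot. The following is a minimal sketch, assuming SciPy is installed. The two price series are synthetic stand-ins for newDF['WMT'] and newDF['TGT'], not the data used in this series, so the printed values say nothing about the real stocks.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Synthetic stand-in for the article's newDF (hypothetical prices, not real WMT/TGT data)
rng = np.random.default_rng(42)
wmt = 70 + np.cumsum(rng.normal(0, 1, 250))
tgt = 60 + np.cumsum(rng.normal(0, 1, 250))
newDF = pd.DataFrame({'WMT': wmt, 'TGT': tgt})

# pearsonr returns the correlation coefficient and the two-sided p-value
r, p_value = pearsonr(newDF['WMT'], newDF['TGT'])
print(f'Pearson r: {r:.3f}, p-value: {p_value:.4f}')
```

With the real newDF from earlier in the series, only the last two statements would be needed.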

We’ll use the Augmented Dickey-Fuller (ADF) test to determine if our stocks can be traded within a statistical arbitrage strategy.

Recall that we imported the adfuller test from the statsmodels.tsa.api package earlier.

To perform the ADF test, we must first create the spread of our stocks. We add this to our existing newDF dataframe.

We have now performed the ADF test on our spread and need to determine whether or not our stocks are cointegrated. Let’s write some logic to determine the results of our test.

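#Logic that states if our test statistic is less than
#a specific critical value, then the pair is cointegrated at that
#level, else the pair is not cointegrated
#Assumes adf holds the tuple returned by adfuller:
#adf[0] is the test statistic, adf[4] the dict of critical values
if adf[0] < adf[4]['1%']:
    print('Spread is Cointegrated at 1% Significance Level')
elif adf[0] < adf[4]['5%']:
    print('Spread is Cointegrated at 5% Significance Level')
elif adf[0] < adf[4]['10%']:
    print('Spread is Cointegrated at 10% Significance Level')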
else: 