# K-Means Clustering Algorithm For Pair Selection In Python – Part VII

See the prior installments in this series: Part I, Part II, Part III, Part IV, Part V, and Part VI.

To view the complete printout of the ADF2 test, we can call adf2.

(-1.9620694402101162,
0.30344784824995258,
1,
502,
{'1%': -3.4434437319767452,
'10%': -2.5698456884811351,
'5%': -2.8673146875484368},
1305.4559226426163)
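To make the printout easier to read, here is a minimal sketch that unpacks the tuple shown above and compares the test statistic to each critical value. It assumes the result came from statsmodels' `adfuller` with its default settings, which returns the test statistic, p-value, number of lags used, number of observations, critical values, and the best information criterion, in that order. The variable names are illustrative only.

```python
# The tuple printed above, copied verbatim.
# Assumption: it was produced by statsmodels' adfuller with default settings.
adf2 = (-1.9620694402101162,
        0.30344784824995258,
        1,
        502,
        {'1%': -3.4434437319767452,
         '10%': -2.5698456884811351,
         '5%': -2.8673146875484368},
        1305.4559226426163)

# Give each element a descriptive (illustrative) name
test_stat, p_value, used_lag, n_obs, critical_values, ic_best = adf2

# The unit-root null hypothesis is rejected at a given level only when the
# test statistic is more negative than that level's critical value.
rejections = {level: test_stat < crit for level, crit in critical_values.items()}

print(f"test statistic: {test_stat:.4f}, p-value: {p_value:.4f}")
for level in ('1%', '5%', '10%'):
    print(f"reject unit root at {level}: {rejections[level]}")
```

With these numbers the statistic is above all three critical values and the p-value is about 0.30, so the test does not reject the unit-root null at any of the listed levels.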

Let’s take a breather here and review what we have learned so far. In this section, we began our journey toward understanding the efficacy of K-Means for pair selection and Statistical Arbitrage by attempting to develop a Statistical Arbitrage strategy in a world with no K-Means.

We learned that in a Statistical Arbitrage trading world without K-Means, we are left to our own devices for solving the historic problem of pair selection. We also learned that even when two stocks are related on a fundamental level, this does not necessarily imply that they will provide a tradable relationship.

## Understanding K-Means

Before we start implementing the K-means clustering algorithm for statistical arbitrage, let’s take a look at how K-Means works.
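As a rough orientation, the sketch below shows the standard textbook procedure behind K-Means (often called Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop moving. The function name `kmeans_sketch` and the toy array are made up for illustration; sklearn's implementation differs in details such as its default k-means++ initialization.

```python
import numpy as np

def kmeans_sketch(points, k, n_iter=100, seed=0):
    """Illustrative Lloyd's algorithm; not sklearn's implementation."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: Euclidean distance from every point to every centroid
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy groups (hypothetical data)
toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
labels, centroids = kmeans_sketch(toy, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 depends on the random initialization, so only the grouping itself is meaningful, not the label values.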

We will begin by importing our usual data analysis and manipulation libraries. The scikit-learn library (sklearn) offers built-in datasets that you can play with to get familiar with various algorithms; they are available through the sklearn.datasets module.

To gain an understanding of how K-Means works, we’re going to create our own toy data and visualize the clusters. Then we will use sklearn’s K-Means algorithm to assess its ability to identify the clusters that we created. Let’s get started!

#importing necessary libraries
#data analysis and manipulation libraries
import numpy as np
import pandas as pd
#visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
#machine learning libraries
#the below line is for making fake data for illustration purposes
from sklearn.datasets import make_blobs

Stay tuned for the next installment in this series, in which Lamarcus will create the data to begin the analysis.

