Machine Learning Classification Algorithms – Part III

See Part I and Part II to get insight on Supervised Learning.

Types of Classification

Based on the number and level of classes present in the dataset, there are three types of classification.

Binary Classification

This type of classification has only two categories. Usually, they are boolean values – 1 or 0, True or False, High or Low. Typical examples are cancer detection, where the labels are positive or negative for cancer, and email spam detection, where the labels are spam or not spam.

Let us take an example. We are using a breast cancer detection dataset that can be downloaded from here.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the breast cancer dataset and preview the first five rows
data = pd.read_csv("data.csv")
data.head()


Output of data.head(), shown transposed (one line per column, one value per row index 0 to 4):

                         0         1         2         3         4
id                       842302    842517    84300903  84348301  84358402
diagnosis                M         M         M         M         M
radius_mean              17.99     20.57     19.69     11.42     20.29
texture_mean             10.38     17.77     21.25     20.38     14.34
perimeter_mean           122.80    132.90    130.00    77.58     135.10
area_mean                1001.0    1326.0    1203.0    386.1     1297.0
smoothness_mean          0.11840   0.08474   0.10960   0.14250   0.10030
compactness_mean         0.27760   0.07864   0.15990   0.28390   0.13280
concavity_mean           0.3001    0.0869    0.1974    0.2414    0.1980
concave points_mean      0.14710   0.07017   0.12790   0.10520   0.10430
symmetry_mean            0.2419    0.1812    0.2069    0.2597    0.1809
fractal_dimension_mean   0.07871   0.05667   0.05999   0.09744   0.05883
radius_se                1.0950    0.5435    0.7456    0.4956    0.7572
texture_se               0.9053    0.7339    0.7869    1.1560    0.7813
perimeter_se             8.589     3.398     4.585     3.445     5.438
area_se                  153.40    74.08     94.03     27.23     94.44
smoothness_se            0.006399  0.005225  0.006150  0.009110  0.011490
compactness_se           0.04904   0.01308   0.04006   0.07458   0.02461
concavity_se             0.05373   0.01860   0.03832   0.05661   0.05688
concave points_se        0.01587   0.01340   0.02058   0.01867   0.01885
symmetry_se              0.03003   0.01389   0.02250   0.05963   0.01756
fractal_dimension_se     0.006193  0.003532  0.004571  0.009208  0.005115
radius_worst             25.38     24.99     23.57     14.91     22.54
texture_worst            17.33     23.41     25.53     26.50     16.67
perimeter_worst          184.60    158.80    152.50    98.87     152.20
area_worst               2019.0    1956.0    1709.0    567.7     1575.0
smoothness_worst         0.1622    0.1238    0.1444    0.2098    0.1374
compactness_worst        0.6656    0.1866    0.4245    0.8663    0.2050
concavity_worst          0.7119    0.2416    0.4504    0.6869    0.4000
concave points_worst     0.2654    0.1860    0.2430    0.2575    0.1625
symmetry_worst           0.4601    0.2750    0.3613    0.6638    0.2364
fractal_dimension_worst  0.11890   0.08902   0.08758   0.17300   0.07678

# Plot two features against each other, coloring each point by its diagnosis
sns.scatterplot(x="radius_mean", y="texture_mean", hue="diagnosis", data=data)

Fig. 2. Scatter Plot – Texture Mean vs. Radius Mean

Here you can see the two ‘classes’ – ‘M’ stands for malignant and ‘B’ stands for benign. For these two features, the classes are reasonably well separated and can be told apart by eye. However, this will not be true for all pairs of features.
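Since binary labels are usually expressed as 1 or 0, it can be convenient to encode the ‘M’ and ‘B’ values numerically before modeling. The snippet below is a minimal sketch of one way to do that with pandas. The five labels in `sample` are made up for illustration and are not rows from the dataset, and the column name `target` is an assumption.

```python
import pandas as pd

# Hypothetical mini-sample of the `diagnosis` column (illustrative, not real rows).
sample = pd.DataFrame({"diagnosis": ["M", "B", "B", "M", "B"]})

# Encode malignant as 1 (positive class) and benign as 0 (negative class).
sample["target"] = sample["diagnosis"].map({"M": 1, "B": 0})

# Count how many samples fall into each class.
class_counts = sample["target"].value_counts().to_dict()

print(sample)
print(class_counts)
```

On the full dataset, the same `map` call could be applied to `data["diagnosis"]`, and the class counts give a quick check of how balanced the two classes are.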

Models that can be used for such a classification are:

  • Logistic Regression
  • Support Vector Classifiers

You can also use Decision Trees, Random Forests and other algorithms, but Logistic Regression and Support Vector Classifiers are binary classifiers in their basic form. They handle more than two classes only through extensions such as one-vs-rest.
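To make the two models listed above concrete, here is a minimal sketch that fits both with scikit-learn. It rests on a few assumptions: it uses scikit-learn's bundled Wisconsin breast cancer data (which appears to match the CSV used above) so that it runs without data.csv; in that bundled copy the target is encoded as 0 = malignant and 1 = benign; and the 75/25 split, the feature scaling, and the hyperparameters are illustrative choices rather than anything prescribed in this article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: the bundled dataset stands in for data.csv.
# In this copy the target is 0 = malignant, 1 = benign.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the rows for testing, keeping the class ratio the same.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Both models are sensitive to feature scale, so standardize the inputs first.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Support Vector Classifier": make_pipeline(
        StandardScaler(), SVC(kernel="rbf")
    ),
}

# Fit each model and record its accuracy on the held-out rows.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Accuracy is only a first check. For a problem like cancer detection, where missing a malignant case is costly, metrics such as recall or a confusion matrix are usually worth inspecting as well.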

Stay tuned for the next installment in this series to learn about Multi-class Classification.

Visit QuantInsti for additional insight on this topic: https://blog.quantinsti.com/machine-learning-classification/

Disclaimer: All investments and trading in the stock market involve risk. Any decisions to place trades in the financial markets, including trading in stocks or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.

Disclosure: Interactive Brokers

Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from QuantInsti and is being posted with its permission. The views expressed in this material are solely those of the author and/or QuantInsti and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.