This website uses cookies to collect usage information in order to offer a better browsing experience. By browsing this site or by clicking on the "ACCEPT COOKIES" button you accept our Cookie Policy.

Machine Learning Classification Algorithms – Part III

See Part I and Part II to get insight on Supervised Learning.

Types of Classification

Based on the number and level of classes present in the dataset, there are three types of classification.

Binary Classification

This type of classification has only two categories. Usually, they are boolean values – 1 or 0, True or False, High or Low. Some examples where such a classification could be used is in cancer detection or email spam detection where the labels would be positive or negative for cancer and spam or not spam for spam detection.

Let us take an example. We are using a breast cancer detection dataset that can be downloaded from here.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data= pd.read_csv(“data.csv”)
data.head()

read_data_for_classification.py hosted with ❤ by GitHub

iddiagnosisradius_meantexture_meanperimeter_meanarea_meansmoothness_meancompactness_meanconcavity_meanconcave points_meansymmetry_meanfractal_dimension_meanradius_setexture_seperimeter_searea_sesmoothness_secompactness_seconcavity_seconcave points_sesymmetry_sefractal_dimension_seradius_worsttexture_worstperimeter_worstarea_worstsmoothness_worstcompactness_worstconcavity_worstconcave points_worstsymmetry_worstfractal_dimension_worst
0842302M17.9910.38122.801001.00.118400.277600.30010.147100.24190.078711.09500.90538.589153.400.0063990.049040.053730.015870.030030.00619325.3817.33184.602019.00.16220.66560.71190.26540.46010.11890
1842517M20.5717.77132.901326.00.084740.078640.08690.070170.18120.056670.54350.73393.39874.080.0052250.013080.018600.013400.013890.00353224.9923.41158.801956.00.12380.18660.24160.18600.27500.08902
284300903M19.6921.25130.001203.00.109600.159900.19740.127900.20690.059990.74560.78694.58594.030.0061500.040060.038320.020580.022500.00457123.5725.53152.501709.00.14440.42450.45040.24300.36130.08758
384348301M11.4220.3877.58386.10.142500.283900.24140.105200.25970.097440.49561.15603.44527.230.0091100.074580.056610.018670.059630.00920814.9126.5098.87567.70.20980.86630.68690.25750.66380.17300
484358402M20.2914.34135.101297.00.100300.132800.19800.104300.18090.058830.75720.78135.43894.440.0114900.024610.056880.018850.017560.00511522.5416.67152.201575.00.13740.20500.40000.16250.23640.07678

sns.scatterplot(x=”radius_mean”,y=”texture_mean”,hue=”diagnosis”,data=data)

Fig. 2. Scatter Plot – Texture Mean vs. Radius Mean

Here you can see the two ‘classes’ – ‘M’ stands for malignant and ‘B’ stands for benign. As you can see, the classes are well divided and are easily differentiable to the naked eye for these two features. However, this will not be true for all pairs of features.

Models that can be used for such a classification are:

  • Logistic Regression
  • Support Vector Classifiers

You can also use Decision Trees, Random Forests and other algorithms but Logistic Regression and Support Vector Classification are used exclusively for binary classification.

Stay tuned for the next installment in this series to learn about Multi-class Classification.

Visit QuantInsti for additional insight on this topic: https://blog.quantinsti.com/machine-learning-classification/

Disclaimer: All investments and trading in the stock market involve risk. Any decisions to place trades in the financial markets, including trading in stocks or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.

Disclosure: Interactive Brokers

Information posted on IBKR Traders’ Insight that is provided by third-parties and not by Interactive Brokers does NOT constitute a recommendation by Interactive Brokers that you should contract for the services of that third party. Third-party participants who contribute to IBKR Traders’ Insight are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from QuantInsti and is being posted with permission from QuantInsti. The views expressed in this material are solely those of the author and/or QuantInsti and IBKR is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.

In accordance with EU regulation: The statements in this document shall not be considered as an objective or independent explanation of the matters. Please note that this document (a) has not been prepared in accordance with legal requirements designed to promote the independence of investment research, and (b) is not subject to any prohibition on dealing ahead of the dissemination or publication of investment research.

trading top