Data Manipulation and Visualization Techniques in Julia – Part III

Contributor:
QuantInsti

Learn how to create new dataframes in Part I and how to perform basic mathematical operations in Part II.

Grouping data

Let’s look at ways to group data, which comes in handy while summarising data.

Built-in datasets in Julia

The RDatasets.jl package gives Julia access to many of the datasets that ship with R and its packages, which are handy for testing purposes.

Here’s how you can find the list of available datasets. The package includes 763 of them.
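The original code cell for this step is not preserved here, so the following is a minimal sketch of one way to list the datasets. It assumes RDatasets.jl and DataFrames.jl are installed; the variable name `available` is only illustrative, and the count you see may differ from 763 depending on the package version.

```julia
using RDatasets, DataFrames   # assumes both packages are installed

# RDatasets.datasets() returns a DataFrame with one row per available dataset
available = RDatasets.datasets()

# Number of datasets shipped with the installed version of the package
println(size(available, 1), " datasets available")

# Peek at the first few entries (package, dataset name, title, dimensions)
first(available, 5)
```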

We’ll work with one of these built-in datasets, “iris”, in this section. It records 4 features (sepal length, sepal width, petal length and petal width) measured on flowers from 3 iris species. More details about this dataset can be found here.
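As a sketch of how the dataset might be loaded (the original code cell is not preserved), `dataset` from RDatasets.jl takes the name of the R package and the name of the dataset. The variable name `iris` is an assumption carried through the later examples.

```julia
using RDatasets, DataFrames   # assumes both packages are installed

# "datasets" is the R package the data comes from, "iris" is the dataset name
iris = dataset("datasets", "iris")

# Dimensions as (rows, columns)
println(size(iris))

# Show the first 30 rows
first(iris, 30)
```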

The following snapshot shows the variables in the iris dataset.

Figure: Iris Dataset. Source: https://ai.plainenglish.io/iris-flower-classification-step-by-step-tutorial-c8728300dc9e

SepalLength  SepalWidth  PetalLength  PetalWidth  Species
Float64      Float64     Float64      Float64     Cat…
5.1          3.5         1.4          0.2         setosa
4.9          3.0         1.4          0.2         setosa
4.7          3.2         1.3          0.2         setosa
4.6          3.1         1.5          0.2         setosa
5.0          3.6         1.4          0.2         setosa
5.4          3.9         1.7          0.4         setosa
4.6          3.4         1.4          0.3         setosa
5.0          3.4         1.5          0.2         setosa
4.4          2.9         1.4          0.2         setosa
4.9          3.1         1.5          0.1         setosa
5.4          3.7         1.5          0.2         setosa
4.8          3.4         1.6          0.2         setosa
4.8          3.0         1.4          0.1         setosa
4.3          3.0         1.1          0.1         setosa
5.8          4.0         1.2          0.2         setosa
5.7          4.4         1.5          0.4         setosa
5.4          3.9         1.3          0.4         setosa
5.1          3.5         1.4          0.3         setosa
5.7          3.8         1.7          0.3         setosa
5.1          3.8         1.5          0.3         setosa
5.4          3.4         1.7          0.2         setosa
5.1          3.7         1.5          0.4         setosa
4.6          3.6         1.0          0.2         setosa
5.1          3.3         1.7          0.5         setosa
4.8          3.4         1.9          0.2         setosa
5.0          3.0         1.6          0.2         setosa
5.0          3.4         1.6          0.4         setosa
5.2          3.5         1.5          0.2         setosa
5.2          3.4         1.4          0.2         setosa
4.7          3.2         1.6          0.2         setosa

Here’s the summary of this dataset.

variable     mean     min     median  max        nmissing  eltype
Symbol       Union…   Any     Union…  Any        Int64     DataType
SepalLength  5.84333  4.3     5.8     7.9        0         Float64
SepalWidth   3.05733  2.0     3.0     4.4        0         Float64
PetalLength  3.758    1.0     4.35    6.9        0         Float64
PetalWidth   1.19933  0.1     1.3     2.5        0         Float64
Species               setosa          virginica  0         CategoricalValue{String, UInt8}
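A summary with these columns matches what `describe` from DataFrames.jl returns by default, so the table above can plausibly be reproduced as follows. This is a sketch rather than the article's original cell; `summary_df` is an illustrative name.

```julia
using RDatasets, DataFrames   # assumes both packages are installed

iris = dataset("datasets", "iris")

# describe() reports, per column: mean, min, median, max,
# number of missing values and element type
summary_df = describe(iris)
```

For the non-numeric Species column the mean and median are left empty, while min and max are the first and last category levels.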

Let’s look at some of the questions you might want to answer using the iris dataset.

We can perform arithmetic operations by grouping data based on various columns. Here’s how we can get the answer to the following question –

What’s the mean value of the sepal length of each species?

Species     mm
Category    Float64
setosa      5.006
versicolor  5.936
virginica   6.588
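One way to answer this question is the split-apply-combine pattern in DataFrames.jl: `groupby` splits the rows by species and `combine` applies `mean` to each group. The sketch below is an assumption about how the result could be produced, not the article's original call; the output column name `mm` is chosen only to match the header shown in the table above.

```julia
using RDatasets, DataFrames, Statistics   # Statistics provides mean()

iris = dataset("datasets", "iris")

# Split the rows into one group per species
grouped = groupby(iris, :Species)

# Apply mean() to SepalLength within each group and
# store the result in a column named mm (illustrative name)
species_means = combine(grouped, :SepalLength => mean => :mm)
```

The `source column => function => target column` form keeps the input, the operation and the output name together in a single expression.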

Another package that helps make these operations more intuitive is Pipe.jl. It lets you write a chain of operations in the order in which they are performed, instead of nesting function calls so that they read inside out.

Species     mm
Category    Float64
setosa      5.006
versicolor  5.936
virginica   6.588

Species     nrow
Category    Int64
setosa      50
versicolor  50
virginica   50
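The two results above (mean sepal length and row count per species) can be sketched with the `@pipe` macro from Pipe.jl, where `_` stands for the output of the previous step. This assumes Pipe.jl is installed; the variable names and the `mm` column name are illustrative, and the original code is not preserved here.

```julia
using RDatasets, DataFrames, Statistics, Pipe   # assumes all packages are installed

iris = dataset("datasets", "iris")

# Mean sepal length per species, written in the order the steps run
means = @pipe iris |>
    groupby(_, :Species) |>
    combine(_, :SepalLength => mean => :mm)

# Number of rows per species; nrow creates a column named nrow
counts = @pipe iris |>
    groupby(_, :Species) |>
    combine(_, nrow)
```

Without the pipe, the second query would be written as `combine(groupby(iris, :Species), nrow)`, which has to be read from the innermost call outwards.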

Stay tuned for the next installment, in which Anshul Tayal will demonstrate how to deal with missing data.

Visit QuantInsti to read the full article: https://blog.quantinsti.com/data-manipulation-visualization-using-julia/.

Disclosure: Interactive Brokers

Information posted on IBKR Traders’ Insight that is provided by third-parties and not by Interactive Brokers does NOT constitute a recommendation by Interactive Brokers that you should contract for the services of that third party. Third-party participants who contribute to IBKR Traders’ Insight are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from QuantInsti and is being posted with permission from QuantInsti. The views expressed in this material are solely those of the author and/or QuantInsti and IBKR is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.

In accordance with EU regulation: The statements in this document shall not be considered as an objective or independent explanation of the matters. Please note that this document (a) has not been prepared in accordance with legal requirements designed to promote the independence of investment research, and (b) is not subject to any prohibition on dealing ahead of the dissemination or publication of investment research.

Any trading symbols displayed are for illustrative purposes only and are not intended to portray recommendations.