Learn how to create new dataframes in Part I and how to perform basic mathematical operations in Part II, and see Part III for instructions on using the RDatasets.jl package.
Dealing with missing data
Julia has a “missing” object that represents unavailable data. You can use the skipmissing() function to perform operations that ignore the missing values.
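A minimal sketch of this, assuming DataFrames.jl is installed. The variable name df is an assumption; the column values mirror the dataframe displayed in this section.

```julia
using DataFrames

# Dataframe with one missing entry in each column (the name `df` is illustrative)
df = DataFrame(a = [1, missing, 3, 7],
               b = ["Apple", "Orange", missing, "Grapes"])

# Aggregating a column that contains missing values propagates missing
total_with_missing = sum(df.a)

# skipmissing() wraps the column in an iterator that skips the missing entries
total = sum(skipmissing(df.a))

# Display the dataframe
df
```

Here total_with_missing is missing, while total is the sum of the three available values.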
Output:
a | b |
---|---|
Int64? | String? |
1 | Apple |
missing | Orange |
3 | missing |
7 | Grapes |
You can use the dropmissing() function to remove the rows that contain missing values.
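As a sketch, again assuming DataFrames.jl and the same illustrative dataframe (the names df, clean and only_a are assumptions):

```julia
using DataFrames

df = DataFrame(a = [1, missing, 3, 7],
               b = ["Apple", "Orange", missing, "Grapes"])

# dropmissing() returns a new dataframe without any row that has a missing value
clean = dropmissing(df)

# Passing a column restricts the check to that column only
only_a = dropmissing(df, :a)

# Display the cleaned dataframe
clean
```

By default the cleaned columns no longer allow missing values, which is why the element types change from Int64? and String? to Int64 and String.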
a | b |
---|---|
Int64 | String |
1 | Apple |
7 | Grapes |
More details for dealing with missing values can be found here.
Importing and exporting data as CSV and Excel files
Reading data is the first step in any analysis. Most of the data we come across is in either CSV or Excel format, so we’ll focus on these two. We will use CSV.jl for CSV files and XLSX.jl for Excel files.
Reading and writing CSV files
We’ll read a CSV file (infy.csv) as a dataframe. It contains historical stock price data for Infosys, downloaded from Yahoo Finance, for the period 21-Dec-2020 to 22-Dec-2021.
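A sketch of reading and summarising the file, assuming CSV.jl and DataFrames.jl are installed. Because infy.csv itself is not included here, the example first writes the first three rows of the data to a temporary file as a stand-in; the names path and df are assumptions.

```julia
using CSV, DataFrames

# Stand-in for infy.csv: the first three rows of the data, written to a temporary file
path = joinpath(mktempdir(), "infy.csv")
write(path, """
Date,Open,High,Low,Close,Adj Close,Volume
2020-12-22,16.39,16.74,16.36,16.58,16.2664,6714400
2020-12-23,16.9,16.93,16.57,16.59,16.2762,5913500
2020-12-24,16.68,16.69,16.52,16.6,16.286,1320600
""")

# CSV.read parses the file and materialises it into the given sink type
df = CSV.read(path, DataFrame)

# describe() reports summary statistics and the number of missing values per column
describe(df)
```

With the real file, only the CSV.read and describe calls would be needed.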
Here’s a summary for this data.
variable | mean | min | median | max | nmissing | eltype |
---|---|---|---|---|---|---|
Symbol | Union… | Any | Union… | Any | Int64 | DataType |
Date | | 2020-12-22 | | 2021-12-21 | 0 | Date |
Open | 20.5674 | 16.39 | 20.63 | 24.05 | 0 | Float64 |
High | 20.7164 | 16.69 | 20.775 | 24.5 | 0 | Float64 |
Low | 20.4097 | 16.36 | 20.51 | 23.94 | 0 | Float64 |
Close | 20.5685 | 16.58 | 20.725 | 24.22 | 0 | Float64 |
Adj Close | 20.3422 | 16.2664 | 20.5451 | 24.22 | 0 | Float64 |
Volume | 7.09982e6 | 1320600 | 6.43815e6 | 22911800 | 0 | Int64 |
Here, we calculate the daily range (High minus Low) and add it as a new column:
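One way to compute such a column is broadcasted subtraction. The sketch below assumes DataFrames.jl and uses only the High and Low values of the first two rows, so it runs on its own; the name df is an assumption.

```julia
using DataFrames

# Two rows from the data, restricted to the columns needed here
df = DataFrame(High = [16.74, 16.93], Low = [16.36, 16.57])

# Broadcasted subtraction (.-) works element-wise; assigning to df.range adds a column
df.range = df.High .- df.Low

df
```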
Date | Open | High | Low | Close | Adj Close | Volume | range |
---|---|---|---|---|---|---|---|
Date | Float64 | Float64 | Float64 | Float64 | Float64 | Int64 | Float64 |
2020-12-22 | 16.39 | 16.74 | 16.36 | 16.58 | 16.2664 | 6714400 | 0.379999 |
2020-12-23 | 16.9 | 16.93 | 16.57 | 16.59 | 16.2762 | 5913500 | 0.36 |
2020-12-24 | 16.68 | 16.69 | 16.52 | 16.6 | 16.286 | 1320600 | 0.170001 |
2020-12-28 | 16.73 | 16.84 | 16.72 | 16.77 | 16.4528 | 4239300 | 0.120001 |
2020-12-29 | 16.9 | 16.9 | 16.67 | 16.76 | 16.443 | 8473700 | 0.23 |
2020-12-30 | 16.87 | 17.0 | 16.83 | 16.93 | 16.6098 | 3877200 | 0.17 |
2020-12-31 | 17.01 | 17.03 | 16.89 | 16.95 | 16.6294 | 3693700 | 0.140002 |
2021-01-04 | 17.39 | 17.43 | 17.06 | 17.25 | 16.9237 | 12597600 | 0.370001 |
2021-01-05 | 17.32 | 17.67 | 17.32 | 17.65 | 17.3162 | 8109900 | 0.35 |
2021-01-06 | 17.4 | 17.79 | 17.34 | 17.73 | 17.3946 | 9136300 | 0.450001 |
2021-01-07 | 17.36 | 17.55 | 17.26 | 17.55 | 17.2181 | 10272000 | 0.289999 |
2021-01-08 | 18.07 | 18.61 | 18.02 | 18.59 | 18.2384 | 17802400 | 0.590001 |
2021-01-11 | 18.68 | 18.86 | 18.55 | 18.76 | 18.4052 | 12220600 | 0.310002 |
2021-01-12 | 18.92 | 18.94 | 18.54 | 18.6 | 18.2482 | 10629100 | 0.4 |
2021-01-13 | 19.03 | 19.07 | 18.4 | 18.43 | 18.0814 | 18409900 | 0.67 |
2021-01-14 | 18.57 | 18.65 | 18.14 | 18.22 | 17.8754 | 13286100 | 0.510001 |
2021-01-15 | 18.19 | 18.38 | 18.11 | 18.17 | 17.8263 | 7443000 | 0.269998 |
2021-01-19 | 18.08 | 18.18 | 17.95 | 18.12 | 17.7773 | 7179600 | 0.229999 |
2021-01-20 | 18.37 | 18.47 | 18.29 | 18.4 | 18.052 | 5408500 | 0.179998 |
2021-01-21 | 18.39 | 18.4 | 18.15 | 18.2 | 17.8558 | 7963400 | 0.25 |
2021-01-22 | 18.23 | 18.27 | 18.06 | 18.18 | 17.8361 | 5663500 | 0.210001 |
2021-01-25 | 18.15 | 18.22 | 17.84 | 17.92 | 17.5811 | 6012600 | 0.379999 |
2021-01-26 | 17.92 | 17.92 | 17.75 | 17.85 | 17.5124 | 5472600 | 0.17 |
2021-01-27 | 17.65 | 17.89 | 17.44 | 17.47 | 17.1396 | 11388300 | 0.449998 |
2021-01-28 | 17.46 | 17.75 | 17.41 | 17.64 | 17.3064 | 7877600 | 0.34 |
2021-01-29 | 17.16 | 17.23 | 16.88 | 16.88 | 16.5607 | 9671400 | 0.350001 |
2021-02-01 | 17.19 | 17.42 | 17.05 | 17.38 | 17.0513 | 5829200 | 0.370001 |
2021-02-02 | 17.45 | 17.51 | 17.34 | 17.44 | 17.1101 | 4119800 | 0.17 |
2021-02-03 | 17.6 | 17.75 | 17.49 | 17.65 | 17.3162 | 4677800 | 0.26 |
2021-02-04 | 17.54 | 17.64 | 17.36 | 17.59 | 17.2573 | 4439600 | 0.279998 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
This updated dataframe can be saved using the CSV.write() function.
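A sketch of the save step, assuming CSV.jl and DataFrames.jl. The output file name infy_updated.csv is hypothetical, and a two-row dataframe stands in for the full data.

```julia
using CSV, DataFrames

df = DataFrame(High = [16.74, 16.93], Low = [16.36, 16.57])
df.range = df.High .- df.Low

# Hypothetical output path in a temporary directory
out = joinpath(mktempdir(), "infy_updated.csv")

# CSV.write takes the destination first and the table second
CSV.write(out, df)

# Reading the file back confirms that the new column was stored
saved = CSV.read(out, DataFrame)
```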
Reading and writing Excel files
We’ll use the XLSX.jl package in Julia to read and write Excel files.
Here’s how it can be done:
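A sketch of the read step, assuming XLSX.jl version 0.8 or later (where readtable returns a Tables.jl-compatible object; older versions required splatting the result into DataFrame). The file name, the sheet name "Sheet1" and the sample values are assumptions; the first lines only create a small workbook so the example runs on its own.

```julia
using XLSX, DataFrames

# Setup only: create a small workbook so there is something to read
path = joinpath(mktempdir(), "infy.xlsx")
sample = DataFrame(Open = [16.39, 16.9], Close = [16.58, 16.59])
XLSX.writetable(path, sample; sheetname = "Sheet1")

# readtable needs the file name and the sheet to read from
df = DataFrame(XLSX.readtable(path, "Sheet1"))
```

The element types of the resulting columns depend on the XLSX.jl version and on the infer_eltypes keyword.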
Date | Open | High | Low | Close | Adj Close | Volume |
---|---|---|---|---|---|---|
Any | Any | Any | Any | Any | Any | Any |
2020-12-22 | 16.39 | 16.74 | 16.36 | 16.58 | 16.2664 | 6714400 |
2020-12-23 | 16.9 | 16.93 | 16.57 | 16.59 | 16.2762 | 5913500 |
2020-12-24 | 16.68 | 16.69 | 16.52 | 16.6 | 16.286 | 1320600 |
2020-12-28 | 16.73 | 16.84 | 16.72 | 16.77 | 16.4528 | 4239300 |
2020-12-29 | 16.9 | 16.9 | 16.67 | 16.76 | 16.443 | 8473700 |
2020-12-30 | 16.87 | 17.0 | 16.83 | 16.93 | 16.6098 | 3877200 |
2020-12-31 | 17.01 | 17.03 | 16.89 | 16.95 | 16.6294 | 3693700 |
2021-01-04 | 17.39 | 17.43 | 17.06 | 17.25 | 16.9237 | 12597600 |
2021-01-05 | 17.32 | 17.67 | 17.32 | 17.65 | 17.3162 | 8109900 |
2021-01-06 | 17.4 | 17.79 | 17.34 | 17.73 | 17.3946 | 9136300 |
2021-01-07 | 17.36 | 17.55 | 17.26 | 17.55 | 17.2181 | 10272000 |
2021-01-08 | 18.07 | 18.61 | 18.02 | 18.59 | 18.2384 | 17802400 |
2021-01-11 | 18.68 | 18.86 | 18.55 | 18.76 | 18.4052 | 12220600 |
2021-01-12 | 18.92 | 18.94 | 18.54 | 18.6 | 18.2482 | 10629100 |
2021-01-13 | 19.03 | 19.07 | 18.4 | 18.43 | 18.0814 | 18409900 |
2021-01-14 | 18.57 | 18.65 | 18.14 | 18.22 | 17.8754 | 13286100 |
2021-01-15 | 18.19 | 18.38 | 18.11 | 18.17 | 17.8263 | 7443000 |
2021-01-19 | 18.08 | 18.18 | 17.95 | 18.12 | 17.7773 | 7179600 |
2021-01-20 | 18.37 | 18.47 | 18.29 | 18.4 | 18.052 | 5408500 |
2021-01-21 | 18.39 | 18.4 | 18.15 | 18.2 | 17.8558 | 7963400 |
2021-01-22 | 18.23 | 18.27 | 18.06 | 18.18 | 17.8361 | 5663500 |
2021-01-25 | 18.15 | 18.22 | 17.84 | 17.92 | 17.5811 | 6012600 |
2021-01-26 | 17.92 | 17.92 | 17.75 | 17.85 | 17.5124 | 5472600 |
2021-01-27 | 17.65 | 17.89 | 17.44 | 17.47 | 17.1396 | 11388300 |
2021-01-28 | 17.46 | 17.75 | 17.41 | 17.64 | 17.3064 | 7877600 |
2021-01-29 | 17.16 | 17.23 | 16.88 | 16.88 | 16.5607 | 9671400 |
2021-02-01 | 17.19 | 17.42 | 17.05 | 17.38 | 17.0513 | 5829200 |
2021-02-02 | 17.45 | 17.51 | 17.34 | 17.44 | 17.1101 | 4119800 |
2021-02-03 | 17.6 | 17.75 | 17.49 | 17.65 | 17.3162 | 4677800 |
2021-02-04 | 17.54 | 17.64 | 17.36 | 17.59 | 17.2573 | 4439600 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
We can write an Excel file using the XLSX.writetable() function.
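A sketch of the write step, assuming XLSX.jl version 0.8 or later, which accepts a dataframe directly. The output file name and the sheet name "prices" are hypothetical.

```julia
using XLSX, DataFrames

df = DataFrame(Open = [16.39, 16.9], Close = [16.58, 16.59])
out = joinpath(mktempdir(), "infy_out.xlsx")   # hypothetical file name

# overwrite = true replaces an existing file; sheetname sets the worksheet name
XLSX.writetable(out, df; overwrite = true, sheetname = "prices")

# Listing the sheet names confirms that the workbook was created
sheets = XLSX.sheetnames(XLSX.readxlsx(out))
```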
Julia has built-in open(), read(), write() and close() functions for working with text files. More details can be found here.
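A short sketch using only Base Julia. The file name notes.txt and its contents are illustrative.

```julia
path = joinpath(mktempdir(), "notes.txt")   # hypothetical file

# The do-block form of open() closes the file automatically, even if an error occurs
open(path, "w") do io
    write(io, "Date,Close\n")
    write(io, "2020-12-22,16.58\n")
end

# read(path, String) returns the whole file as a single string
contents = read(path, String)

# eachline() iterates over the file lazily, which suits large files
lines = collect(eachline(path))

# An explicit open()/close() pair also works, but the file must be closed manually
io = open(path, "a")
write(io, "2020-12-23,16.59\n")
close(io)
```

The do-block form is usually the safer choice because the file handle cannot be left open by mistake.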
Data can also be written in the .jld format, an HDF5-based format for Julia data provided by the JLD.jl package.
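A sketch of a round trip, assuming the JLD.jl package is installed. The file name, the variable name closes and the stored values are illustrative.

```julia
using JLD

path = joinpath(mktempdir(), "prices.jld")   # hypothetical file
closes = [16.58, 16.59, 16.6]

# save() takes the file name followed by name/value pairs
save(path, "closes", closes)

# load() with a name returns that variable; without a name it returns a Dict of everything stored
restored = load(path, "closes")
```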
Details for the following packages can be found here –
Stay tuned for the next installment, in which Anshul Tayal will present how to create scripts for data visualization.
Visit QuantInsti to read the full article: https://blog.quantinsti.com/data-manipulation-visualization-using-julia/.
This material is from QuantInsti and is being posted with permission from QuantInsti.