Towards Better Keras Modeling – Part III

In today’s post, The Alpha Scientist walks us through how to set up Talos and Numerox.

Looking inside of one of these dataframes, you’ll note a few things:

  1. The last several columns are named ‘bernie’, ‘elizabeth’, etc. These are code names for the different outcome variables the data can be used to predict. The key point is that each of these outcomes is independent of the others (this is not a multi-class problem, but several separate binary classification problems).
  2. The features are labeled only as x1 to x50, not with names that provide any hint of what they represent. This is by design – part of Numerai’s approach to abstracting real financial market data into 100% data science, 0% market knowledge.
  3. The features have already been scaled and standardized. Allegedly, the data has also been carefully cleansed.

Next, we’ll define a simple function to split each dataframe into features (X), labels (y), and two other series, regions and eras, which we will discard (they are not available on the out-of-sample data, so we definitely don’t want to train models using them!).
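As a rough sketch of what such a split function might look like, here is a version that runs on a tiny synthetic dataframe. The column names `era` and `region`, the `x` prefix test, and the helper name `split_frame` are assumptions for illustration; the real data may be laid out differently.

```python
import numpy as np
import pandas as pd

# Assumed layout (illustration only): 'era', 'region', features 'x1'..'x50',
# then one column per target. Only two of the target names are listed here.
TARGETS = ["bernie", "elizabeth"]


def split_frame(df, target_cols=TARGETS):
    """Split a dataframe into features X, labels y, and the era/region series."""
    feature_cols = [c for c in df.columns if c.startswith("x")]
    X = df[feature_cols]
    y = df[list(target_cols)]
    return X, y, df["era"], df["region"]


# Tiny synthetic frame standing in for the real data.
rng = np.random.default_rng(0)
n_rows = 8
demo = pd.DataFrame(
    rng.random((n_rows, 50)), columns=[f"x{i}" for i in range(1, 51)]
)
demo.insert(0, "region", "train")
demo.insert(0, "era", ["era1"] * 4 + ["era2"] * 4)
for name in TARGETS:
    demo[name] = rng.integers(0, 2, size=n_rows)

# Keep X and y; eras and regions are returned but not used for training.
X, y, eras, regions = split_frame(demo)
y_elizabeth = y["elizabeth"]  # a single target column as the label
print(X.shape, y.shape)
```

Returning the era and region series separately, rather than dropping them inside the function, keeps them out of the feature matrix while leaving them available for inspection.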

We’ll choose one of these outcome variables as our target label. Here, I’ll choose elizabeth.

Now that we have data, I’ll train a very basic logistic regression model to serve as a baseline:
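A minimal sketch of such a baseline, using scikit-learn on synthetic data with a deliberately weak signal. The data generator, the signal strength, and the row counts are all made up for illustration and say nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss

rng = np.random.default_rng(42)


def make_synthetic(n_rows, n_features=50, signal=0.05):
    """Synthetic stand-in: 50 features, only the first carries a faint signal."""
    X = rng.normal(size=(n_rows, n_features))
    prob = 1.0 / (1.0 + np.exp(-signal * X[:, 0]))
    y = rng.binomial(1, prob)
    return X, y


X_train, y_train = make_synthetic(5000)
X_val, y_val = make_synthetic(2000)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Log-loss and class accuracy on both sets.
results = {}
for name, X_part, y_part in [
    ("train", X_train, y_train),
    ("validation", X_val, y_val),
]:
    proba = model.predict_proba(X_part)[:, 1]  # probability of class 1
    results[name] = {
        "log_loss": log_loss(y_part, proba),
        "accuracy": accuracy_score(y_part, (proba > 0.5).astype(int)),
    }
    print(name, results[name])
```

With a signal this faint, training metrics will typically look slightly better than validation metrics, and both will sit close to what guessing would give.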

I’ve calculated the log-loss values and class accuracies for training and validation sets, as well as for a naive strategy of always guessing 50%. You’ll note that the model has learned something but not all that much.
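For reference, the naive benchmark is easy to work out by hand: predicting 50% for every row of a binary target gives a log-loss of ln 2 (about 0.693), whatever the labels are. The short check below uses a made-up label list and a hand-written `binary_log_loss` helper, so it needs nothing beyond the standard library.

```python
import math


def binary_log_loss(y_true, p_pred):
    """Mean negative log-likelihood for binary labels and predicted P(y=1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [0, 1, 1, 0, 1]  # arbitrary example labels
naive_loss = binary_log_loss(labels, [0.5] * len(labels))

# Every row contributes -log(0.5), so the mean equals ln 2.
print(naive_loss, math.log(2))
```

Any model with a log-loss below that value has learned something, even if the margin is small.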

This is the unfortunate reality of using machine learning to predict likely outcomes in the financial markets. Edge exists, but it is very, very slight. There is also an upside to this reality.

Achieving a model that truly offers 51% (or better yet, 52%, 53%, or 54%) out-of-sample accuracy can potentially be extraordinarily profitable, given highly liquid markets and careful attention to transaction costs. And the reward for moving from a 51/49 advantage to 52/48 is a doubling in potential profit.
_(Actually more than double, when considering transaction costs.)_
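The arithmetic behind that claim can be sketched under simplifying assumptions of my own: every bet is the same size, pays even money (win one unit or lose one unit), and carries a fixed cost per bet. The cost of 0.005 units is an arbitrary illustrative number, not a real estimate.

```python
def expected_profit_per_bet(accuracy, cost=0.0):
    """Expected profit per unit staked on an even-money bet, net of a fixed cost."""
    return accuracy * 1.0 - (1.0 - accuracy) * 1.0 - cost


def profit_ratio(cost):
    """How much more a 52% model earns per bet than a 51% model."""
    return expected_profit_per_bet(0.52, cost) / expected_profit_per_bet(0.51, cost)


for cost in (0.0, 0.005):
    low = expected_profit_per_bet(0.51, cost)
    high = expected_profit_per_bet(0.52, cost)
    print(f"cost={cost}: 51% -> {low:.3f}, 52% -> {high:.3f}, "
          f"ratio {profit_ratio(cost):.2f}")
```

Without costs the edge goes from 0.02 to 0.04 units per bet, exactly double. Once a fixed cost is subtracted from both, it eats a larger share of the smaller edge, so the ratio rises above two.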

If we wanted to improve upon this result with a “representation” model, there would be any number of tactics to employ. However, since the purpose of this post is to explore hyperparameter optimization of keras models, I won’t bother.

Enter, Talos…

At this point, you may want to take a minute or two to read the talos docs and example notebooks.

TL;DR: a Talos workflow essentially involves (1) creating a dict of parameter values to evaluate, (2) defining your Keras model within a model-building function, as you may already do, but with a few small modifications in format, and (3) running a “Scan” method.
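Those three steps might look roughly like the sketch below. The parameter names and values, the network architecture, the function names `build_model` and `run_scan`, and the experiment name are all placeholders of mine. The exact `Scan` arguments also differ between Talos releases (recent versions take `experiment_name`), so check the docs for the version you have installed.

```python
# (1) Dictionary of hyperparameter values to evaluate (illustrative values).
params = {
    "first_neuron": [16, 32, 64],
    "dropout": [0.0, 0.25],
    "lr": [0.001, 0.01],
    "batch_size": [256],
    "epochs": [5],
}


# (2) Model-building function: takes the data splits plus one parameter
#     combination, and returns the fit history and the model.
def build_model(x_train, y_train, x_val, y_val, params):
    from tensorflow import keras  # imported lazily; assumes TensorFlow 2.x

    model = keras.Sequential([
        keras.Input(shape=(x_train.shape[1],)),
        keras.layers.Dense(params["first_neuron"], activation="relu"),
        keras.layers.Dropout(params["dropout"]),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=params["lr"]),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    history = model.fit(
        x_train, y_train,
        validation_data=(x_val, y_val),
        batch_size=params["batch_size"],
        epochs=params["epochs"],
        verbose=0,
    )
    return history, model  # history first, then the model


# (3) Run the scan over every combination in `params`.
def run_scan(x, y):
    import talos  # imported lazily so the pieces above can be inspected alone

    return talos.Scan(
        x=x, y=y, params=params, model=build_model,
        experiment_name="numerai_elizabeth",  # placeholder name
    )
```

Calling `run_scan(X.values, y_elizabeth.values)` with NumPy arrays would then train one model per combination, twelve in this grid. The main format change from an ordinary Keras script is that every tunable value is read from the `params` dict rather than hard-coded.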

In the next installment, the Alpha Scientist will continue with the demonstration of Talos.

I’m Chad, aka The Alpha Scientist. I’ve created The Alpha Scientist blog to explore the intersection of my two professional passions: locating “alpha” in market inefficiencies and applying data science methods. If you’ve found this post useful, please follow @data2alpha on Twitter and forward to a friend or colleague who may also find this topic interesting.

Disclosure: Interactive Brokers

Information posted on IBKR Traders’ Insight that is provided by third-parties and not by Interactive Brokers does NOT constitute a recommendation by Interactive Brokers that you should contract for the services of that third party. Third-party participants who contribute to IBKR Traders’ Insight are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from The Alpha Scientist and is being posted with permission from The Alpha Scientist. The views expressed in this material are solely those of the author and/or The Alpha Scientist and IBKR is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.

In accordance with EU regulation: The statements in this document shall not be considered as an objective or independent explanation of the matters. Please note that this document (a) has not been prepared in accordance with legal requirements designed to promote the independence of investment research, and (b) is not subject to any prohibition on dealing ahead of the dissemination or publication of investment research.

Any trading symbols displayed are for illustrative purposes only and are not intended to portray recommendations.