Machine Learning: The Recovery of Missing Firm Characteristics

The post “Machine Learning: The Recovery of Missing Firm Characteristics” first appeared on Alpha Architect Blog.

Excerpt

Recovering Missing Firm Characteristics with Attention-Based Machine Learning

  • Heiner Beckmeyer and Timo Wiedemann, University of Muenster (Germany)
  • A version of this paper can be found here

What are the research questions?

Firm characteristics are often missing, which forces both researchers and practitioners to devise workarounds. Previous approaches either drop observations with missing entries or simply impute the cross-sectional mean of a given characteristic. As both procedures come with serious drawbacks (see below), there is a need for more advanced methods. The authors set up an attention-based machine learning model, motivated by recent advances in natural language processing, to answer the following questions:

  1. How do firm characteristics relate to the cross-section of other – observed – characteristics and their historical evolution?
  2. How well does the proposed machine learning approach fare against competing approaches?
  3. How important is it to explicitly model nonlinear and interaction effects? How important is it to incorporate the temporal dynamics of the characteristics?
  4. On which information does the model rely when uncovering the latent structure governing firm characteristics?
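
To make the two conventional workarounds concrete, here is a minimal Python sketch of listwise deletion and cross-sectional mean imputation. It is not taken from the paper: the firm labels, characteristic names, and values in `panel` are hypothetical, and the function names are illustrative only.

```python
from statistics import mean

# Hypothetical one-month cross-section: firm -> characteristics (None = missing).
panel = {
    "A": {"book_to_market": 0.8, "momentum": 0.10},
    "B": {"book_to_market": None, "momentum": -0.05},
    "C": {"book_to_market": 0.4, "momentum": None},
    "D": {"book_to_market": 1.2, "momentum": 0.25},
}


def drop_incomplete(cross_section):
    """Listwise deletion: keep only firms with every characteristic observed."""
    return {
        firm: chars
        for firm, chars in cross_section.items()
        if all(value is not None for value in chars.values())
    }


def impute_cross_sectional_mean(cross_section):
    """Fill each gap with the mean of that characteristic across firms.

    Assumes every characteristic is observed for at least one firm.
    """
    names = {name for chars in cross_section.values() for name in chars}
    means = {
        name: mean(
            chars[name]
            for chars in cross_section.values()
            if chars.get(name) is not None
        )
        for name in names
    }
    return {
        firm: {
            name: (value if value is not None else means[name])
            for name, value in chars.items()
        }
        for firm, chars in cross_section.items()
    }


dropped = drop_incomplete(panel)              # only firms A and D survive
imputed = impute_cross_sectional_mean(panel)  # every gap gets the same average
```

In this toy example, deletion discards half of the firms, while mean imputation assigns every firm with a gap the same value regardless of what else is known about it.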

What are the Academic Insights?

The authors show that:

  1. The proposed model is highly accurate in extracting the latent structure underlying the evolution of observable firm characteristics. The approach outperforms competing methods by a wide margin. When using the model to reconstruct available firm characteristics in a controlled environment, the authors report an expected error of around 4 percentiles from the true value, which is more than twice as accurate as the next-best method.
  2. Incorporating information about the temporal evolution of the characteristics is essential to boost the model’s ability to reconstruct characteristics. While some characteristics exhibit a high degree of autocorrelation, others predominantly depend on cross-characteristic information. Incorporating both types of information is therefore decisive. In a simulation study, the authors show that the model is flexible enough to simultaneously uncover a wide range of processes governing the evolution of characteristics.
  3. Model sanity checks on the distribution of the reconstructed (i.e., previously missing) characteristics attest to internal validity, with results well in line with expectations. Information is more often missing for smaller firms and for those that would be considered of low quality.
  4. Revisiting the literature on risk factors in financial research shows that many risk premia are likely much smaller than previously thought. Adding to the recent debate on replicability in financial research, the authors highlight, in turn, that most risk premia remain significant. The completed dataset poses an additional out-of-sample hurdle for existing and new risk premia to pass.
  5. Recovered percentiles of firm characteristics have been made publicly available for future research here.
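
The "controlled environment" in point 1 can be pictured as a masking exercise: hide characteristics that are actually observed, reconstruct them, and measure the gap to the true value in percentiles. The Python sketch below illustrates that evaluation loop only. It is not the authors' model or code: the helper names, the simulated data, the 20% masking share, and the naive "always guess the 50th percentile" imputer are all assumptions made for illustration.

```python
import random


def percentile_ranks(values):
    """Map raw values to cross-sectional percentile ranks in [0, 100]."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = 100.0 * position / (len(values) - 1)
    return ranks


def masked_mae(true_pct, impute, mask_share=0.2, seed=0):
    """Hide a random share of observed percentiles, impute them,
    and return the mean absolute error on the hidden entries."""
    rng = random.Random(seed)
    n_masked = max(1, int(mask_share * len(true_pct)))
    masked = set(rng.sample(range(len(true_pct)), n_masked))
    visible = [None if i in masked else p for i, p in enumerate(true_pct)]
    guesses = impute(visible)
    return sum(abs(guesses[i] - true_pct[i]) for i in masked) / n_masked


def median_imputer(visible):
    """Naive baseline: fill every hidden entry with the 50th percentile."""
    return [50.0 if p is None else p for p in visible]


# Simulated characteristic for 101 hypothetical firms.
rng = random.Random(42)
raw = [rng.gauss(0.0, 1.0) for _ in range(101)]
pct = percentile_ranks(raw)

baseline_error = masked_mae(pct, median_imputer)
```

Any imputation method can be passed in place of `median_imputer`, which makes it possible to compare methods on the same masked entries; a lower value means the reconstruction lands closer to the true percentile.
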

Disclosure: Alpha Architect

The views and opinions expressed herein are those of the author and do not necessarily reflect the views of Alpha Architect, its affiliates or its employees. Our full disclosures are available here. Definitions of common statistics used in our analysis are available here (towards the bottom).

This site provides NO information on our value ETFs or our momentum ETFs. Please refer to this site.

Disclosure: Interactive Brokers

Information posted on IBKR Traders’ Insight that is provided by third-parties and not by Interactive Brokers does NOT constitute a recommendation by Interactive Brokers that you should contract for the services of that third party. Third-party participants who contribute to IBKR Traders’ Insight are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from Alpha Architect and is being posted with permission from Alpha Architect. The views expressed in this material are solely those of the author and/or Alpha Architect and IBKR is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.

In accordance with EU regulation: The statements in this document shall not be considered as an objective or independent explanation of the matters. Please note that this document (a) has not been prepared in accordance with legal requirements designed to promote the independence of investment research, and (b) is not subject to any prohibition on dealing ahead of the dissemination or publication of investment research.

Any trading symbols displayed are for illustrative purposes only and are not intended to portray recommendations.