
Can A Computer Read Employee Emails and Detect Fraud?

Alpha Architect

The post “Can A Computer Read Employee Emails and Detect Fraud?” first appeared on Alpha Architect Blog.

Zero-Revelation RegTech: Detecting Risk through Linguistic Analysis of Corporate Emails and News

  • S.R. Das, S. Kim, B. Kothari
  • Journal of Financial Data Science, Spring 2019
  • A version of this paper can be found here

What are the Research Questions?

Last week I took you on a tour of using the data hidden in the language of the news. In this post, we take the analysis of language to corporate emails. Unlike news, corporate emails are non-public information, so the data is used to develop regulatory technology (RegTech) rather than to hunt for alpha. This paper applies natural language processing (NLP), a popular data science technique in finance, to develop an early-warning system for detecting corporate fraud and/or failure. Specifically, the authors attempt to answer the following research questions:

  • Does the sentiment conveyed by employee communications (i.e., emails) contain value-relevant information?
  • Is this information conveyed in a timely manner (i.e., does email sentiment lead subsequent stock returns)?
  • Do other structural characteristics of internal employee emails (e.g., email length, email volume, or email-network characteristics) also contain value-relevant information?
  • Which tends to contain more value-relevant information, the actual verbal content, or structural characteristics of employee emails?
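To make these questions concrete, here is a minimal Python sketch of how net sentiment and simple structural features (email volume and average length) might be computed from a batch of emails. The word lists, the function names, and the definition of net sentiment as positive minus negative word counts divided by total words are all illustrative assumptions; the summary does not specify the dictionary or formula the authors actually used.

```python
import re

# Hypothetical toy word lists. A real study would use an established
# sentiment dictionary, which this summary does not name.
POSITIVE = {"good", "great", "strong", "gain", "success"}
NEGATIVE = {"bad", "loss", "weak", "risk", "problem"}


def net_sentiment(text: str) -> float:
    """Return (positive words - negative words) / total words (assumed definition)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def email_features(emails: list[str]) -> dict[str, float]:
    """Aggregate one period's emails into volume, average length, and average net sentiment."""
    n = len(emails)
    if n == 0:
        return {"volume": 0, "avg_length": 0.0, "avg_net_sentiment": 0.0}
    return {
        "volume": n,
        "avg_length": sum(len(e) for e in emails) / n,  # length in characters
        "avg_net_sentiment": sum(net_sentiment(e) for e in emails) / n,
    }


# Made-up example emails, not drawn from the Enron corpus.
print(email_features(["Great quarter, strong gain", "We have a problem with risk"]))
```

Computing such features per week or month gives a time series that can then be compared with subsequent stock returns.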

What are the Academic Insights?

By analyzing a unique dataset of 113,000 emails from 144 Enron employees and 1,300 news articles that appeared on PR Newswire from January 2000 to December 2001, the authors find:

  1. YES, the authors observe trending patterns in the sentiment contained in emails. Specifically, positive sentiment in both emails and news articles declines into 2001 as Enron's problems started to manifest.
  2. YES, the net sentiment of email content is a meaningful predictor of subsequent stock returns. Specifically, a one standard deviation decrease in the net sentiment gleaned from emails is associated with a 4.5% decline in stock returns (coefficient estimate = 2.347, t-statistic = 3.27).
  3. YES, the authors find that when email length is added as an independent variable to the regression, it takes over in explaining the relation with future stock returns. In fact, every 20-character decline in email length is associated with a 1.17% decline in future stock returns. Additionally, the authors find that structural characteristics such as email length contain the most value-relevant information.
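The findings above rest on a lead-lag regression: a signal measured in one period (net sentiment or email length) is used to explain the stock return in the next period. The sketch below shows that idea with a single-variable ordinary least squares fit. The numbers are synthetic and chosen only for illustration, and the helper names and the use of a population standard deviation are assumptions, not the authors' specification.

```python
from statistics import mean, pstdev


def lead_lag_pairs(signal, returns, lag=1):
    """Pair the signal at time t with the return at time t + lag (lag >= 1)."""
    return signal[:-lag], returns[lag:]


def ols_slope_intercept(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def one_sd_effect(x, y):
    """Change in y associated with a one-standard-deviation change in x."""
    slope, _ = ols_slope_intercept(x, y)
    return slope * pstdev(x)


# Synthetic illustrative series, NOT data from the paper.
sentiment = [0.04, 0.02, 0.06, 0.00, 0.02]
returns = [0.00, 0.01, 0.00, 0.02, -0.01]

x, y = lead_lag_pairs(sentiment, returns)
slope, intercept = ols_slope_intercept(x, y)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, "
      f"one-sd effect={one_sd_effect(x, y):.4f}")
```

Multiplying the slope by the standard deviation of the signal is how a regression coefficient translates into a statement of the form "a one standard deviation decrease in sentiment is associated with an X% decline in returns."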

Why does it matter?

The importance of RegTech has grown rapidly since the financial crisis; financial institutions have paid more than $160 billion in fines. Also, about 10%-15% of the staff in financial institutions is dedicated to compliance (Arnold, 2016), so a RegTech solution could reduce costs. This paper develops a RegTech expert-system solution that parses corporate email content to detect shifts in critical characteristics in a timely, efficient, and noninvasive manner. Clearly, it is hard to draw sweeping conclusions from one dataset on a company that the researchers knew had failed. That, however, shouldn't stop us from taking a deeper look at textual RegTech analysis of corporate management emails as a means to detect risk in a timelier fashion. It may also be used by regulators in their audit process, because they can requisition such analyses from firms without intrusively reading emails. In the words of the authors:

“Early detection and prevention is better than a cure”

The Most Important Chart from the Paper:


In this paper, we demonstrate how an applied linguistics platform may be used to parse corporate email content and news to assess factors predicting escalating risk or the gradual shifting of other critical characteristics within the firm before they are eventually manifested in observable data and financial outcomes. We find that email content and news articles meaningfully predict increased risk and potential malaise. We also find that other structural characteristics, such as the average email length, are strong predictors of risk and subsequent performance. We present implementations of three spatial analyses of internal corporate communication, i.e., email networks, vocabulary trends, and topic analysis. Overall, we propose a RegTech solution by which to systematically and effectively detect escalating risk or potential malaise without the need to manually read individual employee emails.
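As one illustration of the email-network analysis mentioned above, the following Python sketch builds a directed sender-to-recipient graph and counts each person's distinct contacts. It is a minimal example under stated assumptions: the input format, the function name, and the choice of degree as the network measure are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def degree_counts(messages):
    """Count distinct outgoing and incoming contacts per person.

    `messages` is an iterable of (sender, [recipients]) pairs; this
    input shape is an assumption made for the example.
    """
    out_contacts = defaultdict(set)
    in_contacts = defaultdict(set)
    for sender, recipients in messages:
        for recipient in recipients:
            if recipient == sender:
                continue  # ignore self-addressed copies
            out_contacts[sender].add(recipient)
            in_contacts[recipient].add(sender)
    out_degree = {person: len(c) for person, c in out_contacts.items()}
    in_degree = {person: len(c) for person, c in in_contacts.items()}
    return out_degree, in_degree


# Made-up addresses for illustration only.
sample = [
    ("alice", ["bob", "carol"]),
    ("bob", ["alice"]),
    ("alice", ["bob"]),
]
print(degree_counts(sample))
```

Tracking how such degree counts change over time is one way a network summary can be produced without anyone reading the message bodies.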

