Grokking Linear Regression Analysis in Finance

Contributor:
QuantInsti

Linear regression, simple linear regression, ordinary least squares, multiple linear, OLS, multivariate, …

You’ve probably come across these names when you encountered regression. If that isn’t enough, you even have stranger ones like lasso, ridge, quantile, mixed linear, etc.

My series of articles is meant for those who have some exposure to regression in that you’ve used it or seen it used. So you probably have a fuzzy idea of what it is but have not spent time looking at it closely. There are many write-ups and other material online on regression (including the QI blog) which emphasize different aspects of the subject.

We have a post that shows you how to use regression analysis to create a trend following trading strategy. We also have one that touches upon using the scikit-learn library to build and regularize linear regression models. There are posts that show how to use it on forex data, gold prices and stock prices, framing it as a machine learning problem.

My emphasis here is on building some level of intuition with a brief exposure to background theory. I will then go on to present examples to demonstrate the techniques we can use and what inferences we can draw from them.

I have intentionally steered clear of any derivations, because they’ve already been tackled well elsewhere (check the references section). There’s enough going on here for you to feel the heat a little bit.

This is the first article on the subject where we will explore the following topics.

• Some high school math
• What are models?
• Why linear?
• Where does regression fit in?
• Nomenclature
• Types of linear regression
    • Simple linear regression
    • Multiple linear regression
    • Linear regression of a non-linear relationship
• Model parameters and model estimates
• So what’s OLS?
• What’s next?
• References

Some high school math

Most of us have seen the equation of a straight line in high school.

y = mx + c

where

• x and y are the X- and Y-coordinates of any point on the line respectively,
• m is the slope of the line,
• c is the y-intercept (i.e. the point where the line cuts the Y-axis)

The relationships among x, y, m and c are deterministic, i.e. if we know the value of any three, we can precisely calculate the value of the unknown fourth variable.

All linear models in econometrics (a fancy name for statistics applied to economics and finance) start from here with two crucial differences from what we studied in high school.

1. The unknowns now are always m and c
2. When we calculate our unknowns, it’s only our ‘best’ guess at what their values are. In fact, we don’t calculate; we estimate the unknowns.
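
To make the second point concrete, here is a minimal sketch in Python. It assumes NumPy is available, and the "true" slope and intercept, the noise level and the sample size are all made-up values for illustration. It uses a least-squares line fit (`np.polyfit`) as one common way of estimating m and c; this is an illustrative choice, not necessarily the method used later in this series.

```python
import numpy as np

# Hypothetical "true" values, chosen only for this illustration.
true_m, true_c = 2.0, 1.0

# Simulate noisy observations scattered around the line y = m*x + c.
rng = np.random.default_rng(seed=42)
x = np.linspace(0.0, 10.0, 200)
noise = rng.normal(loc=0.0, scale=1.0, size=x.size)
y = true_m * x + true_c + noise

# A degree-1 polynomial fit is a least-squares straight-line fit.
# np.polyfit returns the coefficients from the highest power down: [slope, intercept].
m_hat, c_hat = np.polyfit(x, y, deg=1)

print(f"estimated slope:     {m_hat:.3f} (true value {true_m})")
print(f"estimated intercept: {c_hat:.3f} (true value {true_c})")
```

The estimates land near the values used to simulate the data but not exactly on them, and a different random seed gives slightly different numbers. That is the sense in which we estimate rather than calculate.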

Before moving on to the meat of the subject, I’d like to unpack the term linear models.

What are models?

Generally speaking, models are educated guesses about the working of a phenomenon. They reduce or simplify reality. They do so to help us understand the world better. If we didn’t work with a reduced form of the subject under investigation, we might as well work with reality itself. But that’s not feasible or even helpful.

In the material world, a model is a simplified version of the object that we study. This version is created such that we capture its main features. The model of a human eye reconstructs it to include its main parts and their relationships with each other.

Similarly, the model of the moon (based on who is studying it) would focus on features relevant to that field of study (such as the topography of its surface, or its chemical composition or the gravitational forces it is subject to etc.).

However, in economics and finance (and other social sciences), our models are slightly peculiar. Here too, a model performs a similar function. But instead of dissecting an actual object, we are investigating social or economic phenomena.

Like what happens to the price of a stock when inflation is high or when there’s a drop in GDP growth (or a combination of both). We only have raw observed data to go by. But that in itself doesn’t tell us much. So we try to find a suitable and faithful approximation of our data to help make sense of it.

We embody this approximation in a mathematical expression with variables (or more precisely, parameters) that have to be estimated from our data set. These types of models are data-driven (or statistical) in nature.

In both cases, we wilfully delude ourselves with stories to help us interpret what we see.

In finance, we have no idea how the phenomenon is wired. But our models are useful mathematical abstractions, and for the most part, they work satisfactorily. As the statistician George Box said, “All models are wrong, but some are useful”. Otherwise, we wouldn’t be using them. 🙂

These finance models, stripped to their bones, can be seen as

data = model + error

or

data = signal + noise
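
As a rough illustration of this decomposition, the sketch below simulates some observations, fits a straight line to them, and splits each observation into a fitted part and a leftover part. The data, the variable names and the use of NumPy's least-squares line fit are all assumptions made for the example.

```python
import numpy as np

# Made-up observations: a straight-line signal plus random noise.
rng = np.random.default_rng(seed=0)
x = np.linspace(0.0, 5.0, 50)
data = 0.5 * x + 3.0 + rng.normal(loc=0.0, scale=0.3, size=x.size)

# Fit a straight line by least squares (an illustrative choice of model).
slope, intercept = np.polyfit(x, data, deg=1)

model = slope * x + intercept  # the part of the data the line accounts for
error = data - model           # whatever is left over (the residuals)

# By construction, the two parts add back up to the original data.
print(np.allclose(data, model + error))
print(f"spread of data:  {data.std():.3f}")
print(f"spread of error: {error.std():.3f}")
```

If the model captures the structure in the data well, the spread of the error term is much smaller than the spread of the data itself.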

It is useful to think of the modeling exercise as a means to unearth the structure of the hidden data-generating process (which is the process that causes the data to appear the way it does). Here, the model (if specified and estimated suitably) would be our best proxy to reveal this process.

I also find it helpful to think of working with data as a quest to extract the signal from the noise.

Why linear?

Because the most used statistical or mathematical models we encounter are either linear or transformed to a quasi-linear form. I speak of general ones like simple or multiple linear regression, logistic regression, etc. or even finance-specific ones like the CAPM, the Fama-French or the Carhart factor models.

Where does regression fit in?

Regression analysis is the fundamental method used in fitting models to our data set, and linear regression is its most commonly used form.

Here, the basic idea is to measure the linear relationship between variables whose behavior (with each other) we are interested in.

Both correlation and regression can help here. However, correlation summarizes the relationship in a single number, which tells us how strongly two variables move together but little else. Regression, on the other hand, gives us a mathematical expression that is richer and easier to interpret. So we prefer to work with it.
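
The contrast can be sketched on simulated data. The snippet below assumes NumPy, and the data-generating numbers are invented for the example. It computes the correlation coefficient and a fitted line for the same two variables, and also checks the standard identity that, for a one-variable least-squares fit, the slope equals the correlation scaled by the ratio of the two standard deviations.

```python
import numpy as np

# Simulated data, invented for illustration: y depends linearly on x, plus noise.
rng = np.random.default_rng(seed=1)
x = rng.normal(loc=0.0, scale=1.0, size=500)
y = 3.0 * x + rng.normal(loc=0.0, scale=2.0, size=500)

# Correlation: one number between -1 and 1.
r = np.corrcoef(x, y)[0, 1]

# Regression: an expression y = intercept + slope * x that can be used for prediction.
slope, intercept = np.polyfit(x, y, deg=1)

# For a one-variable least-squares fit, slope = r * std(y) / std(x).
slope_from_r = r * y.std() / x.std()

print(f"correlation r:          {r:.3f}")
print(f"fitted line:            y = {intercept:.3f} + {slope:.3f} * x")
print(f"slope recovered from r: {slope_from_r:.3f}")
```

The correlation says how tightly the points cluster around a line. The regression additionally says how much y changes, in its own units, for a one-unit change in x.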

Linear regression assumes that the variable of our interest (the dependent variable) can be modeled as a linear function of the independent variable(s) (or explanatory variable(s)).
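
One common way to write this assumption down is shown below. The symbols are my own choice for illustration and are not taken from this series: y_i is the i-th observation of the dependent variable, x_{i1}, ..., x_{ik} are the corresponding values of the k explanatory variables, the betas are the unknown parameters, and epsilon_i is the error term.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \epsilon_i
```

With a single explanatory variable (k = 1), this reduces to the high school line y = mx + c plus an error term, with \beta_1 playing the role of m and \beta_0 the role of c.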

Francis Galton coined the name in the nineteenth century when he compared the heights of parents and their children. He observed that very tall parents tended to have children who were shorter than themselves, and very short parents tended to have children who were taller than themselves. In other words, the children’s heights tended to move back toward the population average. He referred to the phenomenon as ‘regressing to the mean’.

The objective of regression analysis is to:

• either measure the strength of relationships (between the response variable and one or more explanatory variables), or
• forecast into the future

Stay tuned for the next installment in which Vivek will discuss the Nomenclature.

Visit QuantInsti for additional insight on this topic: https://blog.quantinsti.com/linear-regression/.

Disclaimer: All investments and trading in the stock market involve risk. Any decisions to place trades in the financial markets, including trading in stock or options or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.

This material is from QuantInsti and is being posted with permission from QuantInsti.