Exclusive Lasso and Group Lasso Using R Code

This post shows how to use R packages to estimate an exclusive lasso and a group lasso. Both variants take a given grouping of the variables as input, but they differ in how this grouping constraint works when variables are selected.

Lasso, Group Lasso, and Exclusive Lasso

While LASSO (least absolute shrinkage and selection operator) has many variants and extensions, our focus is on two of them: Group Lasso and Exclusive Lasso. Before diving into the specifics, let's review the similarities and differences between these two variants using the following figure.

In the figure above, 15 variables are divided into four groups. Lasso selects important features irrespective of the grouping. Here, lasso happens to select no variables from Groups 2 and 4, but this is just an estimation result, not something the method enforces. Group lasso selects either all or none of the variables in a given group, whereas exclusive lasso selects at least one variable in each group.

From the perspective of competition, group lasso makes groups compete with one another, whereas exclusive lasso makes variables within the same group compete with each other.

Having seen the main characteristics of the two models in the figure above, let's turn to their mathematical expressions.

Equations

These models can be written in several forms; the following equations for lasso, group lasso, and exclusive lasso follow Qiu et al. (2021).
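
Written in the usual penalized least-squares form (the 1/2 scaling of the loss and the squared exclusive-lasso penalty are assumptions here and may differ in detail from the notation of Qiu et al. (2021)), the three objectives are:

```latex
\begin{aligned}
\text{Lasso:}\quad & \min_{\beta}\ \tfrac{1}{2}\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \\
\text{Group lasso:}\quad & \min_{\beta}\ \tfrac{1}{2}\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_{2,1},
  \qquad \lVert \beta \rVert_{2,1} = \sum_{g=1}^{G} \lVert \beta_g \rVert_2 \\
\text{Exclusive lasso:}\quad & \min_{\beta}\ \tfrac{1}{2}\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_{1,2}^2,
  \qquad \lVert \beta \rVert_{1,2} = \Big( \sum_{g=1}^{G} \lVert \beta_g \rVert_1^2 \Big)^{1/2}
\end{aligned}
```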

where the coefficients in β are divided into G groups and β_g denotes the coefficient vector of the g-th group.

In the group lasso, the l2,1-norm combines intra-group non-sparsity through the l2-norm with inter-group sparsity through the l1-norm. Therefore, the variables in each group are either selected or discarded together. Refer to Yuan and Lin (2006) for more information on the group lasso.
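
The all-or-nothing behavior can be seen in the proximal operator of the l2,1 penalty, which shrinks each group's coefficient vector as a whole (group soft-thresholding). Below is a minimal R sketch of that operator. It is not necessarily how gglasso solves the problem, and the function name `prox_group_lasso` is hypothetical.

```r
# Proximal operator of lambda * sum_g ||v_g||_2 (group soft-thresholding).
# v: numeric vector, groups: group label of each element, lambda >= 0.
prox_group_lasso <- function(v, groups, lambda) {
  out <- numeric(length(v))
  for (g in unique(groups)) {
    idx <- which(groups == g)
    norm_g <- sqrt(sum(v[idx]^2))
    # A group whose l2-norm does not exceed lambda is set to zero entirely;
    # otherwise every element of the group is scaled by the same factor.
    if (norm_g > lambda) {
      out[idx] <- (1 - lambda / norm_g) * v[idx]
    }
  }
  out
}

v      <- c(0.3, 0.4, 2.0, 0.1)
groups <- c(1, 1, 2, 2)
# Group 1 (l2-norm 0.5) is zeroed out; both elements of group 2 stay nonzero
prox_group_lasso(v, groups, lambda = 1)
```

Note that the small element 0.1 in group 2 survives only because its group as a whole is strong, which is exactly the intra-group non-sparsity described above.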

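In the exclusive lasso, the l1,2-norm combines intra-group sparsity through the l1-norm with inter-group non-sparsity through the l2-norm. As a result, exclusive lasso selects at least one variable from each group: when the penalty takes its usual squared form, the squared l1-norm of a group has zero gradient at β_g = 0, so it cannot hold a whole group at zero unless the gradient of the loss with respect to that group is exactly zero. Refer to Zhou et al. (2010) for more information on the exclusive lasso.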
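
This can be checked numerically with the proximal operator of the squared l1 penalty for a single group, i.e. the minimizer of (1/2)||x - v||² + λ||x||_1². Setting the subgradient to zero gives x_i = sign(v_i)(|v_i| - τ)_+ with a common threshold τ = 2λ||x||_1, which can be found after sorting |v|. The sketch below assumes this squared form; the function name `prox_exclusive_group` is hypothetical, and this is not necessarily how the ExclusiveLasso package solves the problem. It shows that the largest entry survives no matter how large λ is.

```r
# Proximal operator of lambda * ||x||_1^2 for one group (squared-penalty assumption):
#   argmin_x 0.5 * ||x - v||_2^2 + lambda * (sum(abs(x)))^2
# Optimality gives x_i = sign(v_i) * max(|v_i| - tau, 0), tau = 2 * lambda * ||x||_1.
prox_exclusive_group <- function(v, lambda) {
  u <- sort(abs(v), decreasing = TRUE)
  if (u[1] == 0) return(numeric(length(v)))
  k    <- seq_along(u)
  csum <- cumsum(u)
  # Threshold implied if exactly the k largest entries are nonzero
  tau_k <- 2 * lambda * csum / (1 + 2 * lambda * k)
  # Active-set size: the largest k whose k-th entry exceeds its threshold.
  # k = 1 always qualifies, since u[1] > 2 * lambda * u[1] / (1 + 2 * lambda).
  K   <- max(which(u > tau_k))
  tau <- tau_k[K]
  sign(v) * pmax(abs(v) - tau, 0)
}

v <- c(3, -1, 0.5)
prox_exclusive_group(v, lambda = 0.1)  # two entries stay nonzero
prox_exclusive_group(v, lambda = 100)  # only the largest entry (3) stays nonzero
```

Contrast this with group soft-thresholding: a large λ there wipes out the whole group, while here it only shrinks the group down to its strongest member.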

R code

The following R code estimates lasso, group lasso, and exclusive lasso on an artificial data set with a given group index. The required R packages are glmnet (lasso), gglasso (group lasso), and ExclusiveLasso (exclusive lasso).

#========================================================#
# Quantitative ALM, Financial Econometrics & Derivatives 
# ML/DL using R, Python, Tensorflow by Sang-Heon Lee 
#
# https://kiandlee.blogspot.com
#--------------------------------------------------------#
# Group Lasso and Exclusive Lasso
#========================================================#
 
library(glmnet)
library(gglasso)
library(ExclusiveLasso)
 
graphics.off()  # clear all graphs
rm(list = ls()) # remove all objects from your workspace
 
set.seed(1234)
 
#--------------------------------------------
# X and y variable
#--------------------------------------------
 
N = 500 # number of observations
p = 20  # number of variables
 
# random generated X
X = matrix(rnorm(N*p), ncol=p)
 
# standardization : mean = 0, std=1
X = scale(X)
 
# artificial coefficients
beta = c(0.15,-0.33,0.25,-0.25,0.05,0,0,0,0.5,0.2,
        -0.25, 0.12,-0.125,0,0,0,0,0,0,0)
 
# y variable (standardizing y is optional and left commented out)
y = X%*%beta + rnorm(N, sd=0.5)
#y = scale(y)
 
# group index for X variables
v.group <- c(1,1,1,1,1,2,2,2,2,2,
             3,3,3,3,3,4,4,4,4,4)
 
#--------------------------------------------
# Model with a given lambda
#--------------------------------------------
 
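# Note: the packages scale their penalties differently (e.g. gglasso weights
# each group by sqrt(group size) by default), so equal lambda values are not
# directly comparable across the three models.

# lasso
la <- glmnet(X, y, lambda = 0.1,
             family = "gaussian", alpha = 1,
             intercept = FALSE)
# group lasso
gr <- gglasso(X, y, lambda = 0.2,
              group = v.group, loss = "ls",
              intercept = FALSE)
# exclusive lasso
ex <- exclusive_lasso(X, y, lambda = 0.2,
                      groups = v.group, family = "gaussian",
                      intercept = FALSE)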
# Results
df.comp <- data.frame(
    group = v.group, beta = beta,
    Lasso     = la$beta[,1],
    Group     = gr$beta[,1],
    Exclusive = ex$coef[,1]
)
df.comp
 
#------------------------------------------------
# Run cross-validation & select lambda
#------------------------------------------------
# lambda.min : the λ with the minimal mean CV error (MSE)
# lambda.1se : the largest λ at which the MSE is 
#   within one standard error of the minimal MSE.
# dev.new() opens a new graphics device in a platform-independent way
 
# lasso
la_cv <- cv.glmnet(x = X, y = y, family = "gaussian",
                   alpha = 1, intercept = FALSE, nfolds = 5)
dev.new(); plot(la_cv)
paste(la_cv$lambda.min, la_cv$lambda.1se)
 
# group lasso
gr_cv <- cv.gglasso(x = X, y = y, group = v.group,
                    loss = "ls", pred.loss = "L2",
                    intercept = FALSE, nfolds = 5)
dev.new(); plot(gr_cv)
paste(gr_cv$lambda.min, gr_cv$lambda.1se)
 
# exclusive lasso
ex_cv <- cv.exclusive_lasso(
            X, y, groups = v.group,
            intercept = FALSE, nfolds = 5)
dev.new(); plot(ex_cv)
paste(ex_cv$lambda.min, ex_cv$lambda.1se)
 
 
#--------------------------------------------
# Model with selected lambda
#--------------------------------------------
 
# lasso
la <- glmnet(X, y, lambda = la_cv$lambda.1se,
             family = "gaussian", alpha = 1,
             intercept = FALSE)
# group lasso (0.1 is added to lambda.1se here, so this is not exactly
# the cross-validated value)
gr <- gglasso(X, y, lambda = gr_cv$lambda.1se + 0.1,
              group = v.group, loss = "ls",
              intercept = FALSE)
# exclusive lasso
ex <- exclusive_lasso(X, y, lambda = ex_cv$lambda.1se,
                      groups = v.group, family = "gaussian",
                      intercept = FALSE)
# Results
df.comp.lambda.1se <- data.frame(
    group = v.group, beta = beta,
    Lasso     = la$beta[,1],
    Group     = gr$beta[,1],
    Exclusive = ex$coef[,1]
)
df.comp.lambda.1se
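
The lambda.min and lambda.1se rules described in the code comments can be written out directly from a cross-validation curve. The sketch below mirrors those definitions; the helper name `select_lambda` is hypothetical, and the vectors `lambda`, `cvm` (mean CV error), and `cvsd` (its standard error) are made-up illustrative values, not results from this post.

```r
# Choose lambda.min and lambda.1se from a cross-validation curve.
# lambda: candidate penalties, cvm: mean CV error, cvsd: standard error of cvm.
select_lambda <- function(lambda, cvm, cvsd) {
  i_min      <- which.min(cvm)
  lambda_min <- lambda[i_min]
  # Largest lambda whose mean CV error is within one SE of the minimum
  threshold  <- cvm[i_min] + cvsd[i_min]
  lambda_1se <- max(lambda[cvm <= threshold])
  c(lambda.min = lambda_min, lambda.1se = lambda_1se)
}

# Illustrative values only
lambda <- c(1.0, 0.5, 0.2, 0.1, 0.05)
cvm    <- c(0.60, 0.40, 0.30, 0.28, 0.29)
cvsd   <- c(0.05, 0.04, 0.03, 0.03, 0.03)
select_lambda(lambda, cvm, cvsd)  # lambda.min = 0.1, lambda.1se = 0.2
```

Because lambda.1se is never smaller than lambda.min, it gives a sparser, more heavily penalized model whose CV error is statistically close to the best one.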

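The first output of the R code above is the table of coefficients of all three models, each estimated with a fixed initial λ. Each model's selection pattern is easy to see: lasso zeroes out individual coefficients regardless of grouping; group lasso keeps every variable in Groups 1 and 2 (including small nonzero values for the true zeros V6 to V8) and drops Groups 3 and 4 entirely; and exclusive lasso keeps at least one variable in every group, even in Group 4, whose true coefficients are all zero. I added horizontal dashed lines to separate the groups for exposition only.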

> df.comp

    group   beta       Lasso        Group    Exclusive
V1      1  0.150  0.01931728  0.016938769  0.013555753
V2      1 -0.330 -0.18832916 -0.047695924 -0.184065967
V3      1  0.250  0.17261562  0.042254702  0.169516525
V4      1 -0.250 -0.16322025 -0.043994211 -0.153730137
V5      1  0.050  0.00000000  0.009673207  0.000000000
--------------------------------------------------------
V6      2  0.000  0.00000000  0.001067915  0.000000000
V7      2  0.000  0.00000000  0.001355834  0.000000000
V8      2  0.000  0.00000000  0.014211932  0.000000000
V9      2  0.500  0.38757370  0.101900169  0.385382905
V10     2  0.200  0.11146785  0.044591933  0.110731304
--------------------------------------------------------
V11     3 -0.250 -0.15010738  0.000000000 -0.186626541
V12     3  0.120  0.00000000  0.000000000  0.003117881
V13     3 -0.125 -0.08305582  0.000000000 -0.120458426
V14     3  0.000  0.00000000  0.000000000  0.000000000
V15     3  0.000  0.00000000  0.000000000  0.000000000
--------------------------------------------------------
V16     4  0.000  0.00000000  0.000000000  0.000000000
V17     4  0.000  0.00000000  0.000000000  0.000000000
V18     4  0.000  0.00000000  0.000000000  0.010918904
V19     4  0.000  0.00000000  0.000000000  0.015330520
V20     4  0.000  0.00000000  0.000000000  0.013591628
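
Rather than reading these patterns off the table by eye, a small helper can classify each group as fully selected, partially selected, or dropped. A sketch follows; `selection_pattern` is a hypothetical helper, not part of any of the three packages, and the coefficients are transcribed (rounded to four decimals) from the Group and Exclusive columns of the table above.

```r
# Classify each group as "all", "some", or "none" selected (nonzero coefficients).
selection_pattern <- function(beta_hat, groups) {
  sapply(split(beta_hat != 0, groups), function(sel) {
    if (all(sel)) "all" else if (any(sel)) "some" else "none"
  })
}

# Group index used in the post: 4 groups of 5 variables
v.group <- rep(1:4, each = 5)

# Coefficients transcribed from the first table, rounded to 4 decimals
group_coef <- c(0.0169, -0.0477, 0.0423, -0.0440, 0.0097,
                0.0011,  0.0014, 0.0142,  0.1019, 0.0446,
                rep(0, 10))
excl_coef  <- c(0.0136, -0.1841, 0.1695, -0.1537, 0,
                0, 0, 0, 0.3854, 0.1107,
                -0.1866, 0.0031, -0.1205, 0, 0,
                0, 0, 0.0109, 0.0153, 0.0136)

selection_pattern(group_coef, v.group)  # groups 1 to 4: "all" "all" "none" "none"
selection_pattern(excl_coef, v.group)   # "some" in every group
```

With the objects created by the script, the same check would be `selection_pattern(df.comp$Group, v.group)`.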

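The second output is the coefficient table for all models, each estimated with the lambda.1se value selected by cross-validation (for group lasso, 0.1 is added to lambda.1se in the code). With these penalties, the coefficients are generally closer to their true values, group lasso now also keeps all of Group 3, and exclusive lasso still keeps variables in Group 4. Dashed lines again separate the groups.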

> df.comp.lambda.1se

    group   beta       Lasso         Group   Exclusive
V1      1  0.150  0.07776181  4.779605e-02  0.08297141
V2      1 -0.330 -0.24670209 -1.257863e-01 -0.25235238
V3      1  0.250  0.22825029  1.130749e-01  0.23282822
V4      1 -0.250 -0.21384666 -1.168170e-01 -0.21582154
V5      1  0.050  0.03733144  2.717197e-02  0.04139150
-------------------------------------------------------
V6      2  0.000  0.00000000  2.184575e-03  0.00000000
V7      2  0.000  0.00000000  3.353260e-03  0.00000000
V8      2  0.000  0.01031027  2.950791e-02  0.02043597
V9      2  0.500  0.43538564  2.200230e-01  0.44419164
V10     2  0.200  0.16649806  9.620757e-02  0.17844727
-------------------------------------------------------
V11     3 -0.250 -0.20316169 -1.886308e-02 -0.22419384
V12     3  0.120  0.03113405  4.430739e-03  0.05392923
V13     3 -0.125 -0.13474237 -1.468172e-02 -0.15506193
V14     3  0.000  0.00000000 -3.646683e-05  0.00000000
V15     3  0.000  0.00000000 -1.311539e-03  0.00000000
-------------------------------------------------------
V16     4  0.000  0.00000000  0.000000e+00  0.00000000
V17     4  0.000  0.00000000  0.000000e+00 -0.01087451
V18     4  0.000  0.00000000  0.000000e+00  0.00000000
V19     4  0.000  0.00000000  0.000000e+00  0.01946948
V20     4  0.000  0.00000000  0.000000e+00  0.01318269

An Interesting Property of Exclusive Lasso

As stated earlier, the exclusive lasso selects at least one variable from each group. Let's check whether this claim holds with the next R code by setting λ to a much larger value (100), large enough that lasso and group lasso select no variables at all.

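# lasso
la <- glmnet(X, y, lambda = 100,
             family = "gaussian", alpha = 1,
             intercept = FALSE)
# group lasso
gr <- gglasso(X, y, lambda = 100,
              group = v.group, loss = "ls",
              intercept = FALSE)
# exclusive lasso
ex <- exclusive_lasso(X, y, lambda = 100,
                      groups = v.group, family = "gaussian",
                      intercept = FALSE)
# Results
df.comp.higher.lambda <- data.frame(
    group = v.group, beta = beta,
    Lasso     = la$beta[,1],
    Group     = gr$beta[,1],
    Exclusive = ex$coef[,1]
)
df.comp.higher.lambda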

The following result is consistent with this explanation. With the large λ, lasso and group lasso discard all variables, whereas exclusive lasso still keeps exactly one variable from each group (V2, V9, V11, and V18), with heavily shrunk coefficients. Note that V18 is kept even though its true coefficient is zero. Dashed lines again separate the groups.

> df.comp.higher.lambda

    group   beta Lasso Group     Exclusive
V1      1  0.150     0     0  0.0000000000
V2      1 -0.330     0     0 -0.0031930017
V3      1  0.250     0     0  0.0000000000
V4      1 -0.250     0     0  0.0000000000
V5      1  0.050     0     0  0.0000000000
-------------------------------------------
V6      2  0.000     0     0  0.0000000000
V7      2  0.000     0     0  0.0000000000
V8      2  0.000     0     0  0.0000000000
V9      2  0.500     0     0  0.0051059151
V10     2  0.200     0     0  0.0000000000
-------------------------------------------
V11     3 -0.250     0     0 -0.0026019669
V12     3  0.120     0     0  0.0000000000
V13     3 -0.125     0     0  0.0000000000
V14     3  0.000     0     0  0.0000000000
V15     3  0.000     0     0  0.0000000000
-------------------------------------------
V16     4  0.000     0     0  0.0000000000
V17     4  0.000     0     0  0.0000000000
V18     4  0.000     0     0  0.0006854416
V19     4  0.000     0     0  0.0000000000
V20     4  0.000     0     0  0.0000000000
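
The one-per-group result can also be checked programmatically. A minimal sketch that counts the nonzero coefficients in each group and lists which variable survives, using the nonzero values of the Exclusive column transcribed from the table above:

```r
# Exclusive-lasso coefficients at lambda = 100, transcribed from the table above
excl_high <- setNames(numeric(20), paste0("V", 1:20))
excl_high[c("V2", "V9", "V11", "V18")] <- c(-0.0031930017, 0.0051059151,
                                            -0.0026019669, 0.0006854416)
v.group <- rep(1:4, each = 5)

# Number of selected (nonzero) variables in each group
n_selected <- tapply(excl_high != 0, v.group, sum)
n_selected  # one selected variable in each of the four groups

# Name of the selected variable in each group
kept <- split(names(excl_high)[excl_high != 0], v.group[excl_high != 0])
kept
```

With the script's objects, the count is simply `tapply(ex$coef[, 1] != 0, v.group, sum)`.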

This is interesting and may be useful when we want to select one security from each sector to form a diversified portfolio across many investment sectors. Of course, further analysis is needed to select an arbitrary, predetermined number of securities from each sector.

Concluding Remarks

This post showed how to estimate group lasso and exclusive lasso models with R code. In particular, I think the exclusive lasso delivers an interesting result, which I will investigate further in follow-up research such as sector-based asset allocation (sectoral diversification).

References

Yuan, M. and Y. Lin (2006), Model Selection and Estimation in Regression with Grouped Variables, Journal of the Royal Statistical Society, Series B 68, pp. 49–67.

Zhou, Y., R. Jin, and S. Hoi (2010), Exclusive Lasso for Multi-task Feature Selection, In International Conference on Artificial Intelligence and Statistics, pp. 988–995.

Qiu, L., Y. Qu, C. Shang, L. Yang, F. Chao, and Q. Shen (2021), Exclusive Lasso-Based k-Nearest Neighbors Classification, Neural Computing and Applications, pp. 1–15.

Visit SH Fintech Modeling to learn more about this topic.

Disclosure: Interactive Brokers

Information posted on IBKR Traders’ Insight that is provided by third-parties and not by Interactive Brokers does NOT constitute a recommendation by Interactive Brokers that you should contract for the services of that third party. Third-party participants who contribute to IBKR Traders’ Insight are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from SH Fintech Modeling and is being posted with permission from SH Fintech Modeling. The views expressed in this material are solely those of the author and/or SH Fintech Modeling and IBKR is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.

In accordance with EU regulation: The statements in this document shall not be considered as an objective or independent explanation of the matters. Please note that this document (a) has not been prepared in accordance with legal requirements designed to promote the independence of investment research, and (b) is not subject to any prohibition on dealing ahead of the dissemination or publication of investment research.

Any trading symbols displayed are for illustrative purposes only and are not intended to portray recommendations.