lasso-net / lassonet

Feature selection in neural networks


Comparison of LassoNet with Glmnet in terms of Linear Regression

n-erfan opened this issue · comments

As per the documentation, LassoNet is supposed to behave as a linear regressor when the hyperparameter M is set to 0. I'm comparing this configuration with another model that can act as a linear regressor, namely glmnet (https://www.rdocumentation.org/packages/glmnet/versions/1.6/topics/cv.glmnet) with a Lasso penalty, to check whether they yield the same or similar optimal lambda value, cross-validation error, and feature coefficients.

As per my understanding, the two only differ in the scaling of the objective function. glmnet uses the Gaussian objective when operating as a linear regressor, and it differs from LassoNet's by a constant factor of 0.5. Hence, the optimal lambda value in glmnet should be half of that in LassoNet. However, after repeated attempts I've found that not to be the case. The minimum cross-validation error and the coefficients also differ between the two models.
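To spell out where the factor of 0.5 comes from, here is a rough sketch under the assumption that glmnet's Gaussian objective is (1/(2n)) * RSS + lambda * ||beta||_1 and that LassoNet with M = 0 reduces to a plain Lasso of the form (1/n) * RSS + lambda * ||beta||_1 (the function names below are mine, and the exact LassoNet scaling is an assumption rather than something taken from its source):

import numpy as np

def glmnet_objective(beta, X, y, lam):
    # glmnet, gaussian family (assumed): (1 / (2n)) * RSS + lam * ||beta||_1
    n = len(y)
    rss = np.sum((y - X @ beta) ** 2)
    return rss / (2 * n) + lam * np.sum(np.abs(beta))

def lassonet_linear_objective(beta, X, y, lam):
    # assumed LassoNet objective with M = 0: (1 / n) * RSS + lam * ||beta||_1
    n = len(y)
    rss = np.sum((y - X @ beta) ** 2)
    return rss / n + lam * np.sum(np.abs(beta))

# Under these assumptions, for any beta:
#   glmnet_objective(beta, X, y, lam) == 0.5 * lassonet_linear_objective(beta, X, y, 2 * lam)
# so the minimizers coincide when the glmnet lambda is half the LassoNet lambda,
# which is exactly the relationship this comparison is testing.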

To keep the comparison fair, I used the same standardized dataset, the same list of lambda values (Lambda_.txt) that the LassoNet model generates automatically, and the same 5-fold cross-validation. I've given a code snippet below for better understanding:

lambdas = [...]  # list of lambda values from the attached Lambda_.txt

LassoNetRegressorCV(hidden_dims=(2,), M=0.0, random_state=42, torch_seed=0, cv=5)

cv.glmnet(X, y, nfolds = 5, alpha = 1, lambda = lambdas, intercept = FALSE)

It'd be really helpful if you could explain this difference, as I intend to use LassoNet in further research. Please let me know if you need any further clarification.

Thanks.

In these cases, the devil is typically in the details.
For example, cv.glmnet runs in R, which uses different random seeding mechanisms than Python.
As such, I'd ask for a snippet of code (including perhaps some synthetic data) that fully reproduces this issue before we can investigate further.
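Something along these lines would be enough for anyone to rerun the comparison end to end (a minimal sketch only; make_regression is just a stand-in for the real data-generating process, and its parameters are arbitrary):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler
from lassonet import LassoNetRegressorCV

# Synthetic stand-in for the real dataset: 400 samples, 9 features.
X, y = make_regression(n_samples=400, n_features=9, noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)
y = y - y.mean()  # centre the response

# Fit the LassoNet side of the comparison.
model = LassoNetRegressorCV(hidden_dims=(2,), M=0, cv=5, random_state=42, torch_seed=0)
model.fit(X, y)

# Export the exact same data for the cv.glmnet side in R.
np.savetxt("X_synth.csv", X, delimiter=",")
np.savetxt("y_synth.csv", y, delimiter=",")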

The synthetic dataset is attached:
X: X_st_p9_N400.csv
y: y_cent_p9_N400.csv

Data Description
There are 9 features that we use to predict the credit balance of 400 hypothetical individuals. The data were standardized beforehand in R.

The python code snippet for LassoNet is given below:

from lassonet import LassoNetRegressorCV, utils
from sklearn.metrics import mean_squared_error

modelCV = LassoNetRegressorCV(hidden_dims=(2,), cv=5, M=0, verbose=True, random_state=42, torch_seed=0)
pathCV = modelCV.path(X, y)

# MSE along the lambda path (plotted below as CVE)
score = utils.eval_on_path(modelCV, pathCV, X, y, score_function=mean_squared_error)
lambda_ = [save.lambda_ for save in pathCV]

print(lambda_[score.index(min(score))])     # 1.6701531119511235
print(min(score))                           # 9590.15629790727

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(12, 12))
plt.grid(True)
plt.plot(lambda_, score, ".-")
plt.xlabel("lambda")
plt.xscale("log")
plt.ylabel("CVE")
plt.show()

The CVE vs. lambda plot for LassoNet:
[figure: Credit_LassoNet_CVE_v_Lamb]

The R code snippet for glmnet:

library(glmnet)

cv_model <- cv.glmnet(X, y, nfolds = 5, alpha = 1, lambda = lambdas, intercept = FALSE)
best_lambda <- cv_model$lambda.min

print(best_lambda)     # 0.5090331
print(min(cv_model$cvm))     # 9953.031

best_model <- glmnet(X_st, y_cent, alpha = 1, lambda = best_lambda, intercept = FALSE)

plot(cv_model$lambda, cv_model$cvm, xlab = "lambda", ylab = "CVE", col = "blue", pch = 19, log = "x")

The CVE vs. lambda plot for the cv_model in glmnet:
[figure: Credit_glmnet_CVE_v_lamb]

Findings
The optimal lambda, the minimum cross-validation error, and the cross-validation error vs. lambda curves differ significantly between the two models.

Without any knowledge of where this data comes from or of the underlying data-generating process, the first thing that comes to mind is the random seeds. I would start by making sure the 5 folds are identical across both models.
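One way to do that, sketched below under the assumption that LassoNetRegressorCV accepts a pre-built scikit-learn splitter for its cv argument (it follows the scikit-learn convention, but I haven't verified this): build the folds once, reuse the splitter on the Python side, and export a 1-based fold-id vector for cv.glmnet's foldid argument on the R side.

import numpy as np
from sklearn.model_selection import KFold
from lassonet import LassoNetRegressorCV

kf = KFold(n_splits=5, shuffle=True, random_state=0)

# 1-based fold assignment, which is the convention cv.glmnet's foldid expects.
fold_id = np.empty(len(y), dtype=int)
for i, (_, test_idx) in enumerate(kf.split(X), start=1):
    fold_id[test_idx] = i
np.savetxt("foldid.csv", fold_id, fmt="%d")  # read this in R and pass it as foldid =

# Reuse the very same splitter on the Python side (assumes cv accepts a splitter).
modelCV = LassoNetRegressorCV(hidden_dims=(2,), M=0, cv=kf, random_state=42, torch_seed=0)
modelCV.fit(X, y)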

I suppose I could also use leave-one-out CV to ensure the folds are identical across both models.
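If it helps, a leave-one-out split can be set up the same way (a sketch only, with the same caveat about the cv argument as above); on the R side, cv.glmnet with nfolds = nrow(X) gives leave-one-out CV.

from sklearn.model_selection import LeaveOneOut
from lassonet import LassoNetRegressorCV

# Leave-one-out CV removes any fold-assignment randomness entirely.
modelCV = LassoNetRegressorCV(hidden_dims=(2,), M=0, cv=LeaveOneOut(), random_state=42, torch_seed=0)
modelCV.fit(X, y)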

Closing this for inactivity!