lasso-net / lassonet

Feature selection in neural networks


Comparison of LassoNet with Glmnet in terms of Linear Regression

n-erfan opened this issue · comments

As per the documentation, LassoNet is supposed to behave as a linear regressor when the hyperparameter M is set to 0. I'm comparing this configuration with another model that can act as a linear regressor, namely glmnet (https://www.rdocumentation.org/packages/glmnet/versions/1.6/topics/cv.glmnet) with a Lasso penalty, to check whether they yield the same or similar optimal lambda value, cross-validation error, and feature coefficients.

As per my understanding, the two only differ in the scaling of the objective function. glmnet uses the Gaussian objective when operating as a linear regressor, and it differs from LassoNet's by a constant factor of 0.5. Hence, the optimal lambda value in glmnet should be half of that in LassoNet. However, after repeated attempts I've found that not to be the case. The minimum cross-validation error and the coefficients also differ between the two models.
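To spell out where the factor of 0.5 comes from, here is a rough sketch under the assumption that glmnet's Gaussian objective is (1/(2n)) * RSS + lambda * ||beta||_1 and that LassoNet with M = 0 reduces to a plain Lasso of the form (1/n) * RSS + lambda * ||beta||_1 (the function names below are mine, and the exact LassoNet scaling is an assumption rather than something taken from its source):

import numpy as np

def glmnet_objective(beta, X, y, lam):
    # glmnet, gaussian family (assumed): (1 / (2n)) * RSS + lam * ||beta||_1
    n = len(y)
    rss = np.sum((y - X @ beta) ** 2)
    return rss / (2 * n) + lam * np.sum(np.abs(beta))

def lassonet_linear_objective(beta, X, y, lam):
    # assumed LassoNet objective with M = 0: (1 / n) * RSS + lam * ||beta||_1
    n = len(y)
    rss = np.sum((y - X @ beta) ** 2)
    return rss / n + lam * np.sum(np.abs(beta))

# Under these assumptions, for any beta:
#   glmnet_objective(beta, X, y, lam) == 0.5 * lassonet_linear_objective(beta, X, y, 2 * lam)
# so the minimizers coincide when the glmnet lambda is half the LassoNet lambda,
# which is exactly the relationship this comparison is testing.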

To keep the comparison fair, I used the same standardized dataset, the same list of lambda values (Lambda_.txt) that the LassoNet model generates automatically, and the same 5-fold cross-validation. I've given a code snippet below for better understanding:

lambdas = [...]  # list of lambda values from the attached Lambda_.txt

LassoNetRegressorCV(hidden_dims=(2,), M=0.0, random_state=42, torch_seed=0, cv=5)

cv.glmnet(X, y, nfolds = 5, alpha = 1, lambda = lambdas, intercept = FALSE)

It'd be really helpful if you could explain this difference, as I intend to use LassoNet in further research. Please let me know if you need any further clarification.

Thanks.

In these cases, the devil is typically in the details.
For example, cv.glmnet runs in R, which uses different random seeding mechanisms than Python.
As such, I'd ask for a snippet of code (including perhaps some synthetic data) that fully reproduces this issue before we can investigate further.
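Something along these lines would be enough for anyone to rerun the comparison end to end (a minimal sketch only; make_regression is just a stand-in for the real data-generating process, and its parameters are arbitrary):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler
from lassonet import LassoNetRegressorCV

# Synthetic stand-in for the real dataset: 400 samples, 9 features.
X, y = make_regression(n_samples=400, n_features=9, noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)
y = y - y.mean()  # centre the response

# Fit the LassoNet side of the comparison.
model = LassoNetRegressorCV(hidden_dims=(2,), M=0, cv=5, random_state=42, torch_seed=0)
model.fit(X, y)

# Export the exact same data for the cv.glmnet side in R.
np.savetxt("X_synth.csv", X, delimiter=",")
np.savetxt("y_synth.csv", y, delimiter=",")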

The synthetic dataset is attached:
X: X_st_p9_N400.csv
y: y_cent_p9_N400.csv

Data Description
There are 9 features that we use to predict the credit balance of 400 hypothetical individuals. The data were standardized beforehand in R.

The python code snippet for LassoNet is given below:

from lassonet import LassoNetRegressorCV, utils
from sklearn.metrics import mean_squared_error

modelCV = LassoNetRegressorCV(hidden_dims=(2,), cv=5, M=0, verbose=True, random_state=42, torch_seed=0)
pathCV = modelCV.path(X, y)

# MSE along the lambda path (plotted below as CVE)
score = utils.eval_on_path(modelCV, pathCV, X, y, score_function=mean_squared_error)
lambda_ = [save.lambda_ for save in pathCV]

print(lambda_[score.index(min(score))])     # 1.6701531119511235
print(min(score))                           # 9590.15629790727

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(12, 12))
plt.grid(True)
plt.plot(lambda_, score, ".-")
plt.xlabel("lambda")
plt.xscale("log")
plt.ylabel("CVE")
plt.show()

The CVE vs. lambda plot for LassoNet:
[figure: Credit_LassoNet_CVE_v_Lamb]

The R code snippet for glmnet:

library(glmnet)

cv_model <- cv.glmnet(X, y, nfolds = 5, alpha = 1, lambda = lambdas, intercept = FALSE)
best_lambda <- cv_model$lambda.min

print(best_lambda)     # 0.5090331
print(min(cv_model$cvm))     # 9953.031

best_model <- glmnet(X_st, y_cent, alpha = 1, lambda = best_lambda, intercept = FALSE)

plot(cv_model$lambda, cv_model$cvm, xlab = "lambda", ylab = "CVE", col = "blue", pch = 19, log = "x")

The CVE vs. lambda plot for the cv_model in glmnet:
[figure: Credit_glmnet_CVE_v_lamb]

Findings
The optimal lambda, the minimum cross-validation error, and the cross-validation error vs. lambda curves differ significantly between the two models.

Without any knowledge of where this data comes from or of the underlying data-generating process, the first thing that comes to mind is the random seeds. I would start by making sure the 5 folds are identical across both models.
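One way to do that, sketched below under the assumption that LassoNetRegressorCV accepts a pre-built scikit-learn splitter for its cv argument (it follows the scikit-learn convention, but I haven't verified this): build the folds once, reuse the splitter on the Python side, and export a 1-based fold-id vector for cv.glmnet's foldid argument on the R side.

import numpy as np
from sklearn.model_selection import KFold
from lassonet import LassoNetRegressorCV

kf = KFold(n_splits=5, shuffle=True, random_state=0)

# 1-based fold assignment, which is the convention cv.glmnet's foldid expects.
fold_id = np.empty(len(y), dtype=int)
for i, (_, test_idx) in enumerate(kf.split(X), start=1):
    fold_id[test_idx] = i
np.savetxt("foldid.csv", fold_id, fmt="%d")  # read this in R and pass it as foldid =

# Reuse the very same splitter on the Python side (assumes cv accepts a splitter).
modelCV = LassoNetRegressorCV(hidden_dims=(2,), M=0, cv=kf, random_state=42, torch_seed=0)
modelCV.fit(X, y)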

I suppose I could also use leave-one-out CV to ensure the folds are identical across both models.
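If it helps, a leave-one-out split can be set up the same way (a sketch only, with the same caveat about the cv argument as above); on the R side, cv.glmnet with nfolds = nrow(X) gives leave-one-out CV.

from sklearn.model_selection import LeaveOneOut
from lassonet import LassoNetRegressorCV

# Leave-one-out CV removes any fold-assignment randomness entirely.
modelCV = LassoNetRegressorCV(hidden_dims=(2,), M=0, cv=LeaveOneOut(), random_state=42, torch_seed=0)
modelCV.fit(X, y)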

Closing this for inactivity!