lasso-net / lassonet

Feature selection in neural networks

Lower accuracy when reproducing the experiments

HadiHammoud44 opened this issue

Hello, thank you for the work you have done. In my attempt to replicate the experiments reported in the LassoNet paper, I found that the results I am getting are very different: the accuracy at 50 selected features is significantly lower than the one reported in the paper (<60% vs. 88% on the ISOLET dataset). I have repeated the experiment over 20 to 30 runs and tried all the possible hidden_dim options. I was wondering if I am missing something, or if there is a default parameter that significantly affects performance and needs to be changed.

I will detail the steps I took:

  1. I downloaded the datasets from the Google Drive link provided in the GitHub repo.
  2. I created a fresh virtual environment using venv with Python 3.9.2 and only installed lassonet.
  3. For loading the data, I used the loaders in data_utils.py from the experiments folder of the repo.
  4. Instead of tuning the hidden_dim (as the paper indicates), I experimented with all the possible options [d//3, 2d//3, d, 4d//3].
  5. I used the following script for running the experiments (for one hidden_dim at a time):
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

from lassonet import LassoNetClassifier
from lassonet.utils import eval_on_path

from data_utils import load_mice, load_coil, load_activity, load_isolet


# Fixed train+validation pool; the test set stays held out.
(X_train, y_train), (X_test, y_test) = load_isolet()
X_train_valid_fixed = X_train
y_train_valid_fixed = y_train

seed = None  # fresh random split and initialization on every run
device = 'cuda'

data_dim = X_train.shape[1]
hidden_dim = (data_dim // 3,)  # one of the options d//3, 2d//3, d, 4d//3

# Per-run results along the regularization path.
score_list_of_lists = []
n_selected_list_of_lists = []
lambda_list_of_lists = []

for _ in range(30):
    # New random train/validation split for each run.
    X_train, X_val, y_train, y_val = train_test_split(
        X_train_valid_fixed, y_train_valid_fixed, test_size=0.125, random_state=seed
    )
    model = LassoNetClassifier(
        M=10, hidden_dims=hidden_dim, verbose=1,
        torch_seed=seed, random_state=seed, device=device,
    )
    path = model.path(X_train, y_train, X_val=X_val, y_val=y_val)

    # Test accuracy, number of selected features, and lambda at each point on the path.
    score = eval_on_path(model, path, X_test, y_test, score_function=None)
    n_selected = [save.selected.sum().item() for save in path]
    lambda_ = [save.lambda_ for save in path]

    score_list_of_lists.append(score)
    n_selected_list_of_lists.append(n_selected)
    lambda_list_of_lists.append(lambda_)

And the following code to plot the results:

plt.figure(figsize=(30, 10))
for scores, n_selected in zip(score_list_of_lists, n_selected_list_of_lists):
    plt.plot(n_selected, scores)
plt.xlabel('Features Selected')
plt.ylabel('Accuracy')
plt.title('Accuracy vs Features Selected for hidden_dim=(data_dim//3,) on ISOLET dataset --- GPU version for 30 runs')

plt.savefig('isolet_1.png')

Surprisingly, I got the following plots:
[Four plots: accuracy vs. number of selected features on ISOLET, one per hidden_dim option]

To rule out possible GPU issues, I ran the first experiment on the CPU for fewer runs (as it was taking longer):
[Plot: accuracy vs. number of selected features on ISOLET, CPU run]

Similarly, I repeated the first experiment on the COIL dataset:
[Plot: accuracy vs. number of selected features on COIL]

I was surprised by these plots given the steps I followed. However, I realized that similar plots were reported by a paper that studies LassoNet (especially for the case of 50 features).

I suspect the behavior should be consistent, and I'm still wondering what I might have missed. Could you kindly provide insight or assistance to help resolve these discrepancies? Thank you in advance!

Hi! Thank you a lot for your work! This is really useful, as we have changed the code and the default hyperparameters quite a bit since the publication of the paper.

@ilemhadri is really the one who can answer your questions.

I would recommend trying a lower value of path_multiplier, e.g. 1.01 or 1.005. You could also disable early stopping with patience=None.
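
A minimal sketch of what that looks like, reusing the variables from the script above (exact defaults may differ between versions):

model = LassoNetClassifier(
    M=10,
    hidden_dims=hidden_dim,
    path_multiplier=1.01,  # denser lambda path; 1.005 is finer still (and slower)
    patience=None,         # disable early stopping
    verbose=1,
    torch_seed=seed,
    random_state=seed,
    device=device,
)
path = model.path(X_train, y_train, X_val=X_val, y_val=y_val)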

Also, I think the caption of the last figure is wrong.

Many thanks for sharing these results.

Indeed, the current version of LassoNet has gone through substantial upgrades and bug fixes. In addition, default parameters have likely changed over time to accommodate more use cases and improve overall performance.

Perhaps another thing to keep in mind is that we did not use the network that was learned during training, but retrained the reconstruction network from scratch to address the bias due to L1 regularization (cf. Section 4.4 of the paper).

Many thanks for your immediate responses!
May I ask how it is possible (code-wise) to retrain the network from scratch, "restricting the input features to the subset of interest and zeroing out all others"?
Also, to be confident in my tests, are there other parameters that you think have changed and might have a large impact? I apologize for bombarding you with questions, but I am seeking a fair comparison of the models :D !

You are welcome!
I'm replying from my phone, but basically you just have to select the first feature mask on the path that has fewer than 50 positive entries.
Then you select those features with X[:, mask] and train a model with lambda = 0 (you can probably use the standard API, e.g. provide lambda_seq=[0]).
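
In code, that retraining step might look roughly like this; it is a sketch reusing the path, data splits, and eval_on_path call from the script above, not the exact code used for the paper:

# Pick the first feature mask on the path with fewer than k selected features.
k = 50
mask = next(
    save.selected.cpu().numpy()
    for save in path
    if save.selected.sum().item() < k
)

# Restrict every split to the selected features.
X_train_sub, X_val_sub, X_test_sub = X_train[:, mask], X_val[:, mask], X_test[:, mask]

# Retrain a dense network without the L1 penalty (a single step at lambda = 0).
dense_model = LassoNetClassifier(M=10, hidden_dims=hidden_dim, device=device)
dense_path = dense_model.path(
    X_train_sub, y_train,
    X_val=X_val_sub, y_val=y_val,
    lambda_seq=[0],  # as suggested above; in some versions lambda_seq is a constructor argument instead
)

# eval_on_path returns one score per point on the path.
scores = eval_on_path(dense_model, dense_path, X_test_sub, y_test, score_function=None)
print(f"{mask.sum()} features, test accuracy after retraining: {scores[-1]:.3f}")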

You can modify the number of epochs to improve the results.
I think you could also try modifying the learning rates as a last resort.
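
For the number of epochs, in recent versions this should be the n_iters argument (an assumption about the installed version; check help(LassoNetClassifier) to confirm the name and default):

model = LassoNetClassifier(
    M=10,
    hidden_dims=hidden_dim,
    n_iters=(3000, 300),  # assumed meaning: epochs for the initial dense fit, then per path step
    device=device,
)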

Thanks for the tips, it finally worked out and I now get similar results! I really appreciate your help.

Great!!! Do you mind sharing what was needed exactly (just changing the parameters or retraining without penalty)? Could you contribute by adding your code to the experiments/ folder? That would be great!

It was retraining without penalty, without changing the parameters.

Sure thing! While I managed to reproduce the results in Table 1, I couldn't do so for the plots without retraining the model from scratch for each set of selected features, which doesn't align with the functionality of LassoNet as described in Appendix A, so I refrained from doing it. However, I'll be happy to write the code for this testing if you can clarify the approach for achieving it.

@ilemhadri can you help here? I don't remember what we did there.

@HadiHammoud44 in case it's not clear from Appendix A, the note there about the cost remaining the same refers to the cost of selecting features and not to the cost of training the final predictive model.
Each of the results in Appendix A did follow the same procedure of retraining the network from scratch on the subset of features that LassoNet selected.
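
To make that concrete, a possible per-budget version of the earlier retraining sketch (the feature budgets below are arbitrary examples, and the same API caveats apply):

retrained_scores = {}
for k in (200, 100, 50, 25):  # illustrative feature budgets
    # First mask on the path with fewer than k selected features.
    mask = next(
        save.selected.cpu().numpy()
        for save in path
        if save.selected.sum().item() < k
    )
    dense = LassoNetClassifier(M=10, hidden_dims=hidden_dim, device=device)
    dense_path = dense.path(
        X_train[:, mask], y_train,
        X_val=X_val[:, mask], y_val=y_val,
        lambda_seq=[0],  # retrain without penalty
    )
    scores = eval_on_path(dense, dense_path, X_test[:, mask], y_test, score_function=None)
    retrained_scores[int(mask.sum())] = scores[-1]

print(retrained_scores)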

Oh okay, clear! Then the code for that is already done; I will add it soon. Thanks again for your prompt assistance.