Memory consumption in parallel execution using CPU
rafamarquesi opened this issue · comments
Hello! I don't have much experience, especially with deep learning algorithms, and I need help running TabNet. I'm using pytorch-tabnet==4.0.
The dataset:
x_train shape: (2378460, 30)
y_train shape: (2378460,)
x_test shape: (594616, 30)
y_test shape: (594616,)
I'm using scikit-learn to perform the grid search, since I'm also testing algorithms other than TabNet. For that, I followed the suggestion found in this issue.
The parameters to be adjusted in grid search are:
```python
{
    'classifier__estimator': [
        TabNetClassifierTuner(
            device_name='cpu',
            use_embeddings=True,
            threshold_categorical_features=150,
            use_cat_emb_dim=True
        )
    ],
    'classifier__estimator__seed': [settings.random_seed],
    'classifier__estimator__clip_value': [1],
    'classifier__estimator__verbose': [1],
    'classifier__estimator__optimizer_fn': [torch.optim.Adam],
    'classifier__estimator__optimizer_params': [
        {'lr': 0.02},
        {'lr': 0.01},
        {'lr': 0.001}
    ],
    'classifier__estimator__scheduler_fn': [torch.optim.lr_scheduler.StepLR],
    'classifier__estimator__scheduler_params': [{
        'step_size': 10,  # decay the learning rate every 10 epochs
        'gamma': 0.95
    }],
    'classifier__estimator__mask_type': ['sparsemax'],
    'classifier__estimator__n_a': [8, 64],
    'classifier__estimator__n_steps': [3, 10],
    'classifier__estimator__gamma': [1.3, 2.0],
    'classifier__estimator__cat_emb_dim': [10, 20],
    'classifier__estimator__n_independent': [2, 5],
    'classifier__estimator__n_shared': [2, 5],
    'classifier__estimator__momentum': [0.02, 0.4],
    'classifier__estimator__lambda_sparse': [0.001, 0.1]
}
```
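For scale, it is worth multiplying out how many candidates this grid produces. A quick sanity check, using only the parameters that take more than one value (the single-value entries add no combinations):

```python
from itertools import product

# Only the grid entries with more than one value contribute combinations.
search_space = {
    'optimizer_params': [{'lr': 0.02}, {'lr': 0.01}, {'lr': 0.001}],
    'n_a': [8, 64],
    'n_steps': [3, 10],
    'gamma': [1.3, 2.0],
    'cat_emb_dim': [10, 20],
    'n_independent': [2, 5],
    'n_shared': [2, 5],
    'momentum': [0.02, 0.4],
    'lambda_sparse': [0.001, 0.1],
}

n_candidates = len(list(product(*search_space.values())))
print(n_candidates)  # 3 * 2**8 = 768 candidate models, before CV folds
```

With k-fold cross-validation each candidate is fit k times, so the total number of TabNet trainings is k × 768. Sampling a small `n_iter` with `RandomizedSearchCV` instead of exhaustively enumerating the grid is a common way to shrink that number.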
The fit parameters:
```python
super().fit(
    X_train=X_train,
    y_train=y_train,
    eval_set=[(X_train, y_train), (X_valid, y_valid)],
    eval_name=['train', 'valid'],
    eval_metric=['auc', 'balanced_accuracy', 'accuracy'],
    weights=0,
    max_epochs=1000,
    patience=20,
    batch_size=16384,
    virtual_batch_size=2048,
    num_workers=0,
    drop_last=False,
    augmentations=None
)
```
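One memory-relevant detail in this fit call: `eval_set` includes the full training set, so every epoch's metric pass runs over all ~2.4M training rows on top of the validation set. A hedged sketch of monitoring train metrics on a fixed subsample instead (the 100k size is an arbitrary illustration, not a recommendation):

```python
import numpy as np

rng = np.random.default_rng(42)

# Number of training rows in the dataset described above.
n_train = 2_378_460

# Draw a fixed 100k-row subsample of the training set once, and use it
# for the per-epoch train metrics instead of the full matrix.
idx = rng.choice(n_train, size=100_000, replace=False)
print(idx.shape)  # (100000,)

# Then pass the subsample to fit, e.g.:
# eval_set=[(X_train[idx], y_train[idx]), (X_valid, y_valid)]
```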
These settings are for running the experiment for binary classification. I also intend to run an experiment for multiclass classification.
I created a smaller sample of the dataset, with almost 50k records, for binary classification, and it worked great: I got the grid-search result. On CPU it took more than a day, with 37 CPUs and 45 GB of memory. During this experiment, TabNet's grid search started out consuming just over 15 GB of memory; over time it climbed, held steady for a long while, and finished at just over 40 GB. The grid search was configured with n_jobs=37 and pre_dispatch=37, and fit used batch_size=1024 and virtual_batch_size=128.
Since the full dataset detailed above is larger, I'm using another machine with 120 CPUs and 115 GB of memory. However, after a few days the run dies for lack of memory. I've already tried running the grid search with n_jobs and pre_dispatch set to 20, 15, and 10; the last attempt (n_jobs and pre_dispatch equal to 10) ran out of memory after a week of execution. I'm now trying n_jobs and pre_dispatch equal to 5.
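For a rough sense of the baseline footprint: the raw training matrix alone is sizable, and every grid-search worker that materializes its own float64 copy (plus the duplicated `eval_set`) multiplies it. A sketch of the arithmetic and of two mitigations worth trying, casting to float32 and reopening the array as a read-only memory map so worker processes can share pages (joblib, which sklearn uses under `n_jobs`, can also memmap large inputs automatically):

```python
import os
import tempfile

import numpy as np

rows, cols = 2_378_460, 30
gb_f64 = rows * cols * 8 / 1e9   # float64 footprint of x_train alone
gb_f32 = rows * cols * 4 / 1e9   # float32 halves it
print(f"{gb_f64:.2f} GB as float64, {gb_f32:.2f} GB as float32")

# Persist the features once and reopen them as a read-only memory map,
# so parallel workers share the OS page cache instead of each holding
# a private in-memory copy. Toy-sized stand-in array for illustration.
path = os.path.join(tempfile.mkdtemp(), "x_train.npy")
np.save(path, np.zeros((1000, cols), dtype=np.float32))
X = np.load(path, mmap_mode="r")
print(X.dtype, X.shape)  # float32 (1000, 30)
```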
For now I don't have access to a GPU; I'm looking into that option.
The parallel grid search always starts out consuming little memory and, as reported above, grows over time. I don't know if I'm doing something wrong in the settings. Is there any way to control this? Any tips?
Thanks in advance for your attention.
I would advise you to get a GPU; it will make training much, much faster. Then you won't have to parallelize training: you can run your grid search one point at a time and it will still be much faster than training on CPU.
I thought I was doing some configuration wrong. I am trying to acquire access to GPUs. Thank you, @Optimox.