dreamquark-ai / tabnet

PyTorch implementation of TabNet paper : https://arxiv.org/pdf/1908.07442.pdf

Home Page:https://dreamquark-ai.github.io/tabnet/

Maybe `drop_last` should be set to `False` by default?

ChihchengHsieh opened this issue

Describe the bug
I set up a small dataset to run a quick test, but the training behaves really strangely: the loss is reported as 0.0 and the weights have been unchanged since epoch 1.

What is the current behavior?

epoch 85 | loss: 0.0 | 0:00:00s
epoch 86 | loss: 0.0 | 0:00:00s
epoch 87 | loss: 0.0 | 0:00:00s
epoch 88 | loss: 0.0 | 0:00:00s
epoch 89 | loss: 0.0 | 0:00:00s
epoch 90 | loss: 0.0 | 0:00:00s
epoch 91 | loss: 0.0 | 0:00:00s
epoch 92 | loss: 0.0 | 0:00:00s
epoch 93 | loss: 0.0 | 0:00:00s
epoch 94 | loss: 0.0 | 0:00:00s
epoch 95 | loss: 0.0 | 0:00:00s
epoch 96 | loss: 0.0 | 0:00:00s
epoch 97 | loss: 0.0 | 0:00:00s
epoch 98 | loss: 0.0 | 0:00:00s
epoch 99 | loss: 0.0 | 0:00:00s

Additional context

After diving into your source code, I found it's because my dataset is smaller than the default `batch_size` (1024). You also set `drop_last` to `True` by default, which dropped the only batch I had, and no error was raised when this happened. Setting it to `False` by default may be more intuitive? 👍
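For illustration, here is a minimal sketch of the effect using a plain PyTorch `DataLoader` rather than the TabNet trainer itself; the dataset size and feature shapes are made up:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical small dataset: 100 rows, far fewer than the default batch_size of 1024.
X = torch.randn(100, 8)
y = torch.randint(0, 2, (100,))
dataset = TensorDataset(X, y)

# With drop_last=True the single (incomplete) batch is discarded,
# so the loader yields nothing and no training step ever runs.
loader = DataLoader(dataset, batch_size=1024, drop_last=True)
print(len(loader))  # 0 -> zero optimizer steps, loss stays at 0.0

# With drop_last=False the partial batch is kept and training proceeds.
loader = DataLoader(dataset, batch_size=1024, drop_last=False)
print(len(loader))  # 1
```

An immediate workaround is simply to pass a `batch_size` no larger than the dataset when fitting.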

All the best,

This is indeed debatable; here is the rationale behind this choice:

  • training a model on a dataset smaller than your batch size is not really recommended anyway, so I don't see your case as a real concern
  • without drop_last=True, training on a dataset where N % batch_size == 1 would fail, because batch normalization raises an error when given a batch of a single sample (see the sketch below). This does seem like a legitimate concern, though: why would the code run with a dataset of size 10240 but not 10241?
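A minimal sketch of that second point, again in plain PyTorch with an assumed feature width of 8: a leftover batch of one sample is rejected by batch normalization in training mode, which is what `drop_last=True` guards against.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(8)
bn.train()  # training mode, as during model fitting

# A final batch containing a single sample breaks batch normalization:
# PyTorch raises "Expected more than 1 value per channel when training".
try:
    bn(torch.randn(1, 8))
except ValueError as e:
    print(e)

# Any batch with at least two samples is fine.
print(bn(torch.randn(2, 8)).shape)  # torch.Size([2, 8])
```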