Maybe `drop_last` should be set as False in default?
ChihchengHsieh opened this issue · comments
Describe the bug
I set up a small dataset for a test, but the training process behaves really oddly: the loss is 0.0 and the weights have been unchanged since epoch 1.
What is the current behavior?
epoch 85 | loss: 0.0 | 0:00:00s
epoch 86 | loss: 0.0 | 0:00:00s
epoch 87 | loss: 0.0 | 0:00:00s
epoch 88 | loss: 0.0 | 0:00:00s
epoch 89 | loss: 0.0 | 0:00:00s
epoch 90 | loss: 0.0 | 0:00:00s
epoch 91 | loss: 0.0 | 0:00:00s
epoch 92 | loss: 0.0 | 0:00:00s
epoch 93 | loss: 0.0 | 0:00:00s
epoch 94 | loss: 0.0 | 0:00:00s
epoch 95 | loss: 0.0 | 0:00:00s
epoch 96 | loss: 0.0 | 0:00:00s
epoch 97 | loss: 0.0 | 0:00:00s
epoch 98 | loss: 0.0 | 0:00:00s
epoch 99 | loss: 0.0 | 0:00:00s
Additional context
After diving into your source code, I found it's because my dataset is smaller than the default batch_size (1024). You also set drop_last
to True by default, which drops the only batch I had. No error was shown when I encountered this issue. Wouldn't setting it to False
by default be more intuitive? 👍
All the best,
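The silent drop described above can be sketched without PyTorch itself: the number of batches a DataLoader yields follows the arithmetic below (the function name `num_batches` is illustrative, not part of any library).

```python
import math

def num_batches(n: int, batch_size: int, drop_last: bool) -> int:
    """Batches a PyTorch-style DataLoader yields for a dataset of size n."""
    return n // batch_size if drop_last else math.ceil(n / batch_size)

# A dataset smaller than the default batch_size (1024) yields zero batches
# when drop_last=True, so each epoch's training loop silently does nothing.
print(num_batches(500, 1024, drop_last=True))   # → 0  (loss stays 0.0)
print(num_batches(500, 1024, drop_last=False))  # → 1
```

With zero batches per epoch, no optimizer step ever runs, which matches the unchanged weights and the constant 0.0 loss in the log.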
This is debatable indeed; here is the rationale behind this choice:
- Training a model on a dataset smaller than your batch size is not really recommended anyway, so I don't see your case as a real concern.
- Without drop_last=True, training on a dataset where N % batch_size == 1 would fail, because batch normalization raises an error when given a batch of size one. This behavior, however, seems like a legitimate concern: why would the code run with a dataset of size 10240 but not 10241?
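The N % batch_size == 1 edge case above comes down to the size of the final batch, which a small helper (hypothetical name `last_batch_size`) makes explicit:

```python
def last_batch_size(n: int, batch_size: int) -> int:
    """Size of the final batch when drop_last=False."""
    r = n % batch_size
    return r if r else batch_size

print(last_batch_size(10240, 1024))  # → 1024, trains fine
print(last_batch_size(10241, 1024))  # → 1; BatchNorm in training mode
                                     #   cannot compute statistics over
                                     #   a single sample and raises
```

This is why drop_last=True was chosen as the default: it trades the silent-zero-batch failure on tiny datasets for robustness against the size-one final batch.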