The way the loss is computed may be confusing
qhd1996 opened this issue
@ZeroRin
Simply averaging the loss within one batch may be confusing for the following reasons:
- The number of training samples in one batch is not always the same (the data_loader contains all the doc indices, not only the training doc indices), so simply averaging the batch loss assigns different per-sample loss weights to different batches.
- Across different epochs, the same training sample may be assigned a different loss weight.
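
To illustrate the point, here is a minimal sketch (not the repo's actual code; `model`, `data_loader`, `train_mask`, and `labels` are hypothetical names) showing the difference between per-batch averaging and normalizing by the epoch-wide training count so every sample gets the same weight:

```python
import torch
import torch.nn.functional as F

def train_epoch(model, data_loader, train_mask, labels, optimizer):
    # train_mask: boolean tensor over all docs; True for training docs only
    n_train = int(train_mask.sum())  # total number of training samples
    for idx in data_loader:
        # each batch mixes training and non-training doc indices
        batch_train = idx[train_mask[idx]]  # keep only the training docs
        if batch_train.numel() == 0:
            continue
        logits = model(batch_train)
        # Per-batch mean: each sample is weighted 1 / len(batch_train),
        # which varies from batch to batch (the issue described above):
        #   loss = F.cross_entropy(logits, labels[batch_train])
        # Constant weighting: sum the losses and divide by the epoch-wide
        # count, so every training sample contributes exactly 1 / n_train.
        loss = F.cross_entropy(logits, labels[batch_train],
                               reduction='sum') / n_train
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```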