The way the loss is computed may be confusing
qhd1996 opened this issue
@ZeroRin
Simply averaging the loss within one batch may be confusing for the following reasons:
- The number of training samples in one batch is not always the same (the data_loader contains all the doc indices, not only the training doc indices), so simply averaging the batch loss assigns different per-sample loss weights to different batches.
- Across different epochs, the same training sample may be assigned a different loss weight.
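
To illustrate the point, here is a minimal sketch (not the repo's actual code; `model`, `data_loader`, `train_mask`, and `labels` are hypothetical names) showing the difference between per-batch averaging and normalizing by the epoch-wide training count so every sample gets the same weight:

```python
import torch
import torch.nn.functional as F

def train_epoch(model, data_loader, train_mask, labels, optimizer):
    # train_mask: boolean tensor over all docs; True for training docs only
    n_train = int(train_mask.sum())  # total number of training samples
    for idx in data_loader:
        # each batch mixes training and non-training doc indices
        batch_train = idx[train_mask[idx]]  # keep only the training docs
        if batch_train.numel() == 0:
            continue
        logits = model(batch_train)
        # Per-batch mean: each sample is weighted 1 / len(batch_train),
        # which varies from batch to batch (the issue described above):
        #   loss = F.cross_entropy(logits, labels[batch_train])
        # Constant weighting: sum the losses and divide by the epoch-wide
        # count, so every training sample contributes exactly 1 / n_train.
        loss = F.cross_entropy(logits, labels[batch_train],
                               reduction='sum') / n_train
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```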