[Question]: A little question about the lm loss computation
nb003pl opened this issue
Description
In aquila_model.py:148
self.loss_func = nn.CrossEntropyLoss(ignore_index=self.config.ignore_index)
the default ignore_index is -100
In aquila_pretrain.py:117
def padding(indice, max_length, pad_idx=tokenizer.token_end_id):
the default padding value is tokenizer.token_end_id, which equals 100007.
When computing lm loss during pre-training, corresponding to aquila_pretrain.py:212
loss = self.loss_func(
shift_logits.view(-1, self.config.vocab_size), shift_labels.view(-1).long()).mean()
Since the padding token 100007 does not equal the ignore_index -100, padding positions will contribute to the final loss.
Please correct me if my understanding is wrong. Thanks.
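The behavior the question describes can be checked in isolation. The sketch below uses a toy vocabulary (vocab_size, pad_id, and the random logits are illustrative stand-ins, not the repo's actual values): positions labeled with ignore_index are excluded from the loss average, while positions labeled with any other id, including a padding id, are counted.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size = 8       # toy vocab; the real model uses self.config.vocab_size
ignore_index = -100  # PyTorch's default, also used in aquila_model.py
pad_id = 5           # stand-in for token_end_id (100007) in this toy vocab

loss_fn = nn.CrossEntropyLoss(ignore_index=ignore_index)

logits = torch.randn(4, vocab_size)
labels_padded = torch.tensor([1, 2, pad_id, pad_id])              # pad ids kept
labels_masked = torch.tensor([1, 2, ignore_index, ignore_index])  # pad ids masked

loss_padded = loss_fn(logits, labels_padded)  # averages over all 4 positions
loss_masked = loss_fn(logits, labels_masked)  # averages over the first 2 only
```

With ignore_index, the masked loss equals the cross-entropy computed over only the first two positions; with pad_id left in the labels, the padding positions are averaged in as well.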
Yes, you are right.
Actually, the pretrain datasets have already been preprocessed, including adding BOS & EOS tokens and concatenating samples, so there is no extra padding id (100007) within a sample.
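For cases where padding ids do appear in the labels (e.g. fine-tuning on variable-length batches), one common workaround is to replace them with ignore_index before the loss is computed. This is a hypothetical helper, not code from the repo; PAD_ID mirrors tokenizer.token_end_id as described above.

```python
import torch

IGNORE_INDEX = -100   # matches nn.CrossEntropyLoss's default ignore_index
PAD_ID = 100007       # tokenizer.token_end_id per the question above

def mask_padding(labels: torch.Tensor) -> torch.Tensor:
    """Return a copy of labels with padding ids replaced by IGNORE_INDEX,
    so CrossEntropyLoss(ignore_index=IGNORE_INDEX) skips those positions."""
    masked = labels.clone()
    masked[masked == PAD_ID] = IGNORE_INDEX
    return masked
```

The clone keeps the original label tensor intact, which matters if it is reused elsewhere (e.g. for metrics).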