[Question]: A little question about the lm loss computation
nb003pl opened this issue
Description
In aquila_model.py:148
self.loss_func = nn.CrossEntropyLoss(ignore_index=self.config.ignore_index)
the default ignore_index is -100
In aquila_pretrain.py:117
def padding(indice, max_length, pad_idx=tokenizer.token_end_id):
the default padding value is tokenizer.token_end_id, which equals 100007.
When computing lm loss during pre-training, corresponding to aquila_pretrain.py:212
loss = self.loss_func(
shift_logits.view(-1, self.config.vocab_size), shift_labels.view(-1).long()).mean()
Since the padding token 100007 does not equal the ignore_index -100, padding positions will contribute to the final loss.
Please correct me if my understanding is wrong. Thanks.
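The behavior the question describes can be checked in isolation. The sketch below uses a toy vocabulary (vocab_size, pad_id, and the random logits are illustrative stand-ins, not the repo's actual values): positions labeled with ignore_index are excluded from the loss average, while positions labeled with any other id, including a padding id, are counted.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size = 8       # toy vocab; the real model uses self.config.vocab_size
ignore_index = -100  # PyTorch's default, also used in aquila_model.py
pad_id = 5           # stand-in for token_end_id (100007) in this toy vocab

loss_fn = nn.CrossEntropyLoss(ignore_index=ignore_index)

logits = torch.randn(4, vocab_size)
labels_padded = torch.tensor([1, 2, pad_id, pad_id])              # pad ids kept
labels_masked = torch.tensor([1, 2, ignore_index, ignore_index])  # pad ids masked

loss_padded = loss_fn(logits, labels_padded)  # averages over all 4 positions
loss_masked = loss_fn(logits, labels_masked)  # averages over the first 2 only
```

With ignore_index, the masked loss equals the cross-entropy computed over only the first two positions; with pad_id left in the labels, the padding positions are averaged in as well.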
Yes, you are right.
Actually, the pretrain datasets have already been preprocessed, including adding BOS & EOS tokens and concatenating samples, so there is no extra padding id (100007) within a sample.
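For cases where padding ids do appear in the labels (e.g. fine-tuning on variable-length batches), one common workaround is to replace them with ignore_index before the loss is computed. This is a hypothetical helper, not code from the repo; PAD_ID mirrors tokenizer.token_end_id as described above.

```python
import torch

IGNORE_INDEX = -100   # matches nn.CrossEntropyLoss's default ignore_index
PAD_ID = 100007       # tokenizer.token_end_id per the question above

def mask_padding(labels: torch.Tensor) -> torch.Tensor:
    """Return a copy of labels with padding ids replaced by IGNORE_INDEX,
    so CrossEntropyLoss(ignore_index=IGNORE_INDEX) skips those positions."""
    masked = labels.clone()
    masked[masked == PAD_ID] = IGNORE_INDEX
    return masked
```

The clone keeps the original label tensor intact, which matters if it is reused elsewhere (e.g. for metrics).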