FlagAI-Open / FlagAI

FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.


[Question]: A little question about the LM loss computation

nb003pl opened this issue · comments

Description

In aquila_model.py:148:
self.loss_func = nn.CrossEntropyLoss(ignore_index=self.config.ignore_index)
The default ignore_index is -100.
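
For reference, a minimal sketch (with made-up shapes and label values, not code from the repo) showing how ignore_index behaves in nn.CrossEntropyLoss: target positions equal to -100 are excluded from both the loss sum and the averaging denominator.

```python
import torch
import torch.nn as nn

# Made-up shapes: 4 token positions, vocab size 10.
loss_func = nn.CrossEntropyLoss(ignore_index=-100)
logits = torch.randn(4, 10)
labels = torch.tensor([3, 7, -100, -100])

# Only the first two positions contribute; the mean is taken over those two.
loss = loss_func(logits, labels)
```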

In aquila_pretrain.py:117:
def padding(indice, max_length, pad_idx=tokenizer.token_end_id):
The default padding value is tokenizer.token_end_id, which equals 100007.
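
(The helper's body is not quoted above; presumably it truncates and right-pads with pad_idx, along the lines of this hypothetical sketch, so padded label positions end up holding 100007 rather than -100.)

```python
# Hypothetical sketch only -- the real body is not shown in this issue.
def padding(indice, max_length, pad_idx=100007):  # 100007 stands in for tokenizer.token_end_id
    indice = indice[:max_length]
    return indice + [pad_idx] * (max_length - len(indice))
```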

When computing the LM loss during pre-training, corresponding to aquila_pretrain.py:212:
loss = self.loss_func(
shift_logits.view(-1, self.config.vocab_size), shift_labels.view(-1).long()).mean()

Since the padding token 100007 does not equal the ignore_index of -100, it will contribute to the final loss.
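
To make the concern concrete, here is a minimal sketch with made-up values (vocab size and label ids are illustrative): labels padded with 100007 are treated as real targets because 100007 != -100, whereas remapping those positions to -100 excludes them.

```python
import torch
import torch.nn as nn

vocab_size, pad_id, ignore_index = 100008, 100007, -100
loss_func = nn.CrossEntropyLoss(ignore_index=ignore_index)

shift_logits = torch.randn(6, vocab_size)
shift_labels = torch.tensor([11, 22, 33, pad_id, pad_id, pad_id])

# As written, the three padded positions (label 100007) count as real targets.
loss_with_pad = loss_func(shift_logits, shift_labels)

# If the last three positions are known to be padding, setting those label
# positions to ignore_index excludes them from the loss.
shift_labels[3:] = ignore_index
loss_without_pad = loss_func(shift_logits, shift_labels)
```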

Please correct me if my understanding is wrong. Thanks.


commented

Yes, you are right.
Actually, the pretraining datasets have already been preprocessed, which includes adding BOS & EOS tokens and concatenating documents, so there are no extra padding ids (100007) within a single sample.
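
(A rough sketch of that packing scheme, with made-up names, assuming documents are concatenated with BOS/EOS and then sliced into fixed-length blocks so no padding id is ever needed inside a sample; this is not the repo's actual preprocessing code.)

```python
# Hypothetical sketch, not FlagAI's actual preprocessing.
def pack_documents(docs, bos_id, eos_id, block_size):
    stream = []
    for doc in docs:                          # docs: list of token-id lists
        stream += [bos_id] + doc + [eos_id]   # add BOS/EOS, then concatenate
    # Slice the stream into fixed-length blocks; the trailing remainder is
    # dropped, so every sample is full and no padding id (100007) is required.
    n_blocks = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]
```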