artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs

Home Page: https://arxiv.org/abs/2305.14314

Issue with Yi 34B training: EOS token not working

mrmuke opened this issue

The end-of-sequence token for Yi 34B does not appear to be added to the training examples: after finetuning, the model continues to generate past the EOS token <|endoftext|>.

Example: "model correct output... <|endoftext|>In this task, you are given a sentence in the English language and your task is to convert it into the Japanese language. In translation, keep numbers as it is and make it sentence case (capitalize only the first word of each sentence and noun).
The first hostage was release", I am adding special tokens via:
"tokenizer.add_special_tokens({
"eos_token": tokenizer.convert_ids_to_tokens(model.config.eos_token_id),
"bos_token": tokenizer.convert_ids_to_tokens(model.config.bos_token_id),
"unk_token": tokenizer.convert_ids_to_tokens(
model.config.pad_token_id if model.config.pad_token_id != -1 else tokenizer.pad_token_id
),
})"