Can't change BOS token or EOS token for GPT Neo
mallorbc opened this issue · comments
In order to better control the start and stop of generated text, I added BOS and EOS tokens for GPT2xl. This works well: the generated text stops at an appropriate length and starts the way a normal sentence would. However, when I try the same process on GPT Neo, it does not work. I have discovered that the arguments that normally set the BOS and EOS tokens have no effect when GPT Neo is run, even if I change the tokenizer from AutoTokenizer to GPT2Tokenizer. Below is some code that shows what I mean.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained(
    model_args.model_name_or_path,
    bos_token='<|beginingtext|>',
    eos_token='<|endingtext|>',
    pad_token='<|pad|>',
    **tokenizer_kwargs)
print(tokenizer.eos_token)
print(tokenizer.bos_token)
quit()
As I said, when I run this with GPT2xl, the tokens are changed as expected. When I run it with GPT Neo, both the BOS and EOS tokens remain <|endoftext|>.
After looking into this further, this may be a bug outside of this project, so I am going to open an issue on the Hugging Face repo. I could be wrong, though.
Not 100% sure about this, but according to https://github.com/finetuneanon/gpt-neo_finetune_2.7B#dataset-preparation there is no BOS token in GPT Neo.
Thanks. Maybe it's not a bug then. Even without a BOS token and an EOS token I can still accomplish my goals; it just takes a different, less elegant method.
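One such workaround (a sketch only; the thread does not spell out the method, so the helper names and truncation logic below are assumptions) is to skip the tokenizer's special-token machinery entirely: wrap each training example in literal marker strings, then cut generated text at the first occurrence of the end marker after generation.

```python
# Post-processing workaround sketch: instead of relying on the tokenizer's
# BOS/EOS special tokens, delimit examples with literal marker strings and
# truncate the model's output at the end marker. The marker strings mirror
# the ones used earlier in the thread; the helpers are illustrative.
BEGIN_MARKER = "<|beginingtext|>"
END_MARKER = "<|endingtext|>"

def wrap_example(text: str) -> str:
    """Format a training example with explicit begin/end markers."""
    return f"{BEGIN_MARKER}{text}{END_MARKER}"

def truncate_generation(generated: str) -> str:
    """Strip a leading begin marker and drop everything after the first
    end marker; return the text unchanged if no end marker is present."""
    if generated.startswith(BEGIN_MARKER):
        generated = generated[len(BEGIN_MARKER):]
    end = generated.find(END_MARKER)
    return generated if end == -1 else generated[:end]

sample = wrap_example("A normal sentence.")
print(truncate_generation(sample + " trailing junk"))  # -> A normal sentence.
```

The trade-off is that the markers are ordinary token sequences to the model, so generation will not stop at them automatically the way it does at a real EOS token; the truncation has to happen after decoding.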
Thanks!