salesforce / CodeGen

CodeGen is a family of open-source models for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.

Impact of new Eos token id on human eval dataset

amd-1221 opened this issue

If the EOS token id is changed from 2 to 50256, will accuracy on the HumanEval dataset also be affected? If so, what does that mean for the HumanEval accuracy reported in the paper?

For the HumanEval benchmark execution, the tokenizer is instantiated explicitly, so the model configuration file has no effect. See:

def create_custom_gpt2_tokenizer():
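To illustrate the point, here is a toy sketch of why an explicitly built tokenizer makes the config value irrelevant. It is not the repository's evaluation code: `ModelConfig`, `ExplicitTokenizer`, and `truncate_at_eos` are hypothetical names invented for this example, and it assumes generated sequences are cut at the tokenizer's own EOS id (50256, GPT-2's `<|endoftext|>`).

```python
from dataclasses import dataclass


# Hypothetical stand-ins for illustration; not the CodeGen or transformers API.
@dataclass
class ModelConfig:
    eos_token_id: int  # value that would come from the model's config file


@dataclass
class ExplicitTokenizer:
    eos_token_id: int = 50256  # GPT-2's <|endoftext|> id, fixed at construction


def truncate_at_eos(token_ids, tokenizer):
    """Cut a generated sequence at the tokenizer's own EOS id."""
    if tokenizer.eos_token_id in token_ids:
        return list(token_ids[: token_ids.index(tokenizer.eos_token_id)])
    return list(token_ids)


generated = [10, 11, 12, 50256, 13]
tokenizer = ExplicitTokenizer()

# The config is never consulted, so either value gives the same truncation.
for config in (ModelConfig(eos_token_id=2), ModelConfig(eos_token_id=50256)):
    print(config.eos_token_id, truncate_at_eos(generated, tokenizer))
```

Under these assumptions, both config values produce the same truncated sequence, because only the tokenizer's EOS id is read.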

Thank you for the consideration!