salesforce / CodeGen

If the EOS token id is changed from 2 to 50256, will accuracy on the evaluation dataset also be affected? If so, what does that mean for the accuracy the paper reports on the HumanEval dataset?

For the HumanEval benchmark runs, the tokenizer is instantiated explicitly in code, so the model configuration file has no effect. See:

CodeGen/jaxformer/hf/sample.py

Line 84 in c483074

def create_custom_gpt2_tokenizer():
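
To illustrate why the config value does not matter in that case, here is a minimal toy sketch. It is not the repository's code: `ModelConfig`, `ExplicitTokenizer`, `truncate_at_eos`, and `sample` are hypothetical stand-ins, and the only assumption carried over is that the explicitly built GPT-2 tokenizer uses 50256 as its end-of-text id.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelConfig:
    """Hypothetical stand-in for the model configuration file."""
    eos_token_id: int


@dataclass
class ExplicitTokenizer:
    """Hypothetical stand-in for a tokenizer built in code, not from the config."""
    eos_token_id: int = 50256  # GPT-2 end-of-text id


def truncate_at_eos(token_ids: List[int], eos_token_id: int) -> List[int]:
    """Keep tokens up to, but not including, the first EOS token."""
    kept = []
    for token_id in token_ids:
        if token_id == eos_token_id:
            break
        kept.append(token_id)
    return kept


def sample(config: ModelConfig,
           raw_ids: List[int],
           tokenizer: Optional[ExplicitTokenizer] = None) -> List[int]:
    """Use the explicit tokenizer's EOS id if given, else fall back to the config."""
    eos = tokenizer.eos_token_id if tokenizer is not None else config.eos_token_id
    return truncate_at_eos(raw_ids, eos)


raw = [10, 11, 50256, 12, 2, 13]
tok = ExplicitTokenizer()

# With an explicit tokenizer, changing the config value changes nothing.
with_old_config = sample(ModelConfig(eos_token_id=2), raw, tok)
with_new_config = sample(ModelConfig(eos_token_id=50256), raw, tok)

# Without one, the config value decides where the output is cut.
config_only = sample(ModelConfig(eos_token_id=2), raw)
```

In this sketch both explicit-tokenizer calls cut the output at the same position, whichever EOS id the config holds. Only the config-only path is sensitive to the change.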

Thank you for the consideration!

Impact of new EOS token id on HumanEval dataset