Impact of new EOS token id on HumanEval dataset
amd-1221 opened this issue
amd-1221 commented
If the EOS token id is changed from 2 to 50256, will accuracy on the eval dataset also be affected? If so, what does that mean for the HumanEval accuracy reported in the paper?
Erik Nijkamp commented
For the HumanEval benchmark execution, the tokenizer is instantiated explicitly, so the model configuration file has no effect. See:
CodeGen/jaxformer/hf/sample.py
Line 84 in c483074
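To illustrate why the config value does not matter in that case, here is a minimal toy sketch. It is not the code from `sample.py`; `ToyConfig`, `truncate_at_stop`, and `EXPLICIT_STOP_ID` are hypothetical names, and the assumption is only that a stop id supplied explicitly by the caller takes precedence over the one stored in the model configuration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyConfig:
    # Hypothetical stand-in for the value stored in a model configuration file.
    eos_token_id: int


def truncate_at_stop(
    token_ids: List[int], config: ToyConfig, stop_id: Optional[int] = None
) -> List[int]:
    # An explicitly passed stop id wins; the config value is only a fallback.
    effective = config.eos_token_id if stop_id is None else stop_id
    kept = []
    for token in token_ids:
        if token == effective:
            break
        kept.append(token)
    return kept


# Assumed: the id comes from an explicitly constructed tokenizer, not the config.
EXPLICIT_STOP_ID = 50256
raw_ids = [10, 11, 2, 12, 50256, 13]

# With an explicit stop id, editing the config (2 vs. 50256) changes nothing.
with_old_config = truncate_at_stop(raw_ids, ToyConfig(eos_token_id=2), EXPLICIT_STOP_ID)
with_new_config = truncate_at_stop(raw_ids, ToyConfig(eos_token_id=50256), EXPLICIT_STOP_ID)

# Without it, the result follows whatever the config says.
fallback_old = truncate_at_stop(raw_ids, ToyConfig(eos_token_id=2))
fallback_new = truncate_at_stop(raw_ids, ToyConfig(eos_token_id=50256))
```

In this sketch the two explicit calls return the same tokens, while the two fallback calls differ. Only the fallback path is sensitive to the config change.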
Thank you for the consideration!