salesforce / CodeGen

CodeGen is a family of open-source models for program synthesis, trained on TPU-v4 and competitive with OpenAI Codex.

codegen-350M-multi and codegen-350M-mono model files mistakenly shared the same hash

nforest opened this issue

# codegen-350M-nl,multi,mono
# wget -P checkpoints https://storage.googleapis.com/sfr-codegen-research/checkpoints/codegen-350M-multi.tar.gz && tar -xvf checkpoints/codegen-350M-multi.tar.gz -C checkpoints/
# wget -P checkpoints https://storage.googleapis.com/sfr-codegen-research/checkpoints/codegen-350M-mono.tar.gz && tar -xvf checkpoints/codegen-350M-mono.tar.gz -C checkpoints/

Hello!
I found that the model files downloaded from the two different links above are identical:

md5sum codegen-350M-mono/*
d81cbe1111f246ca7f48850d0ed627fb  codegen-350M-mono/pytorch_model.bin

md5sum codegen-350M-multi/*
d81cbe1111f246ca7f48850d0ed627fb  codegen-350M-multi/pytorch_model.bin
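For anyone who wants to repeat this check without `md5sum`, here is a minimal Python sketch that hashes files and reports any that share a digest. The `checkpoints/` extraction directory and the helper names (`md5_of_file`, `group_by_digest`, `find_duplicates`) are assumptions for illustration, not part of the CodeGen repository.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_digest(paths):
    """Map each MD5 digest to the sorted list of paths that share it."""
    groups = defaultdict(list)
    for path in paths:
        groups[md5_of_file(path)].append(str(path))
    return {digest: sorted(names) for digest, names in groups.items()}


def find_duplicates(paths):
    """Return only the groups in which two or more files have the same digest."""
    return {
        digest: names
        for digest, names in group_by_digest(paths).items()
        if len(names) > 1
    }


if __name__ == "__main__":
    # Assumed layout: both archives were extracted under ./checkpoints.
    candidates = [
        Path("checkpoints/codegen-350M-mono/pytorch_model.bin"),
        Path("checkpoints/codegen-350M-multi/pytorch_model.bin"),
    ]
    existing = [p for p in candidates if p.exists()]
    for digest, names in find_duplicates(existing).items():
        print(f"identical content ({digest}):", ", ".join(names))
```

Reading in chunks matters here because `pytorch_model.bin` can be large; loading it whole just to hash it would waste memory.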

Thank you for noticing. I have corrected the checkpoints. Please verify:

md5sum codegen-350M-mono/*
e5c11e8445017b915d61699bc0d0d204  codegen-350M-mono/config.json
37f839fb92f1fc346c8562ce97e3897b  codegen-350M-mono/pytorch_model.bin

md5sum codegen-350M-multi/*
e5c11e8445017b915d61699bc0d0d204  codegen-350M-multi/config.json
d81cbe1111f246ca7f48850d0ed627fb  codegen-350M-multi/pytorch_model.bin
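To verify a fresh download against the hashes posted above, something like the following sketch could be used. It parses `md5sum`-style lines and compares them with the files on disk. The `checkpoints` root directory and the function names are assumptions; the hash values are copied from the listing above. The two `config.json` files share a hash in that listing, so only the `pytorch_model.bin` digests are expected to differ between the variants.

```python
import hashlib
from pathlib import Path

# Hashes copied verbatim from the corrected listing in this thread.
EXPECTED = """\
e5c11e8445017b915d61699bc0d0d204  codegen-350M-mono/config.json
37f839fb92f1fc346c8562ce97e3897b  codegen-350M-mono/pytorch_model.bin
e5c11e8445017b915d61699bc0d0d204  codegen-350M-multi/config.json
d81cbe1111f246ca7f48850d0ed627fb  codegen-350M-multi/pytorch_model.bin
"""


def parse_manifest(text):
    """Parse md5sum-style lines ("<hash>  <path>") into a {path: hash} dict."""
    expected = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        expected[name.strip()] = digest.lower()
    return expected


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(root, expected):
    """Return {path: status}, where status is "ok", "mismatch", or "missing"."""
    results = {}
    for name, want in expected.items():
        target = Path(root) / name
        if not target.is_file():
            results[name] = "missing"
        elif md5_of_file(target) == want:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # "checkpoints" is an assumed extraction directory; adjust as needed.
    for name, status in verify("checkpoints", parse_manifest(EXPECTED)).items():
        print(f"{status:9} {name}")
```

A "mismatch" on `codegen-350M-mono/pytorch_model.bin` would suggest the archive was downloaded before the correction and should be fetched again.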

Thanks, I can confirm that the new checkpoints give more reasonable results on downstream tasks.

Thanks, glad to hear it.