codegen-350M-multi and codegen-350M-mono model files mistakenly shared the same hash
nforest opened this issue
nforest commented
# codegen-350M-nl,multi,mono
# wget -P checkpoints https://storage.googleapis.com/sfr-codegen-research/checkpoints/codegen-350M-multi.tar.gz && tar -xvf checkpoints/codegen-350M-multi.tar.gz -C checkpoints/
# wget -P checkpoints https://storage.googleapis.com/sfr-codegen-research/checkpoints/codegen-350M-mono.tar.gz && tar -xvf checkpoints/codegen-350M-mono.tar.gz -C checkpoints/
Hello!
I found that the model files downloaded from the two different links above are identical (same MD5 checksum):
md5sum codegen-350M-mono/*
d81cbe1111f246ca7f48850d0ed627fb codegen-350M-mono/pytorch_model.bin
md5sum codegen-350M-multi/*
d81cbe1111f246ca7f48850d0ed627fb codegen-350M-multi/pytorch_model.bin
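For anyone who wants to run the same check across several checkpoints at once, here is a minimal sketch that groups files by MD5 digest and reports any digest shared by more than one file. The helper names `file_md5` and `find_identical_files` are hypothetical, and the `checkpoints/` layout is assumed from the download commands above.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical_files(paths):
    """Group paths by MD5 digest and keep only digests shared by several files."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_md5(path)].append(str(path))
    return {md5: sorted(names) for md5, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Assumption: the archives were extracted under ./checkpoints.
    weights = sorted(Path("checkpoints").glob("codegen-350M-*/pytorch_model.bin"))
    for md5, names in find_identical_files(weights).items():
        print(md5, "shared by", ", ".join(names))
```

An empty result means every weight file has a distinct digest. A non-empty one points at the same kind of duplication reported here.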
Erik Nijkamp commented
Thank you for noticing. I have corrected the checkpoints. Please verify:
md5sum codegen-350M-mono/*
e5c11e8445017b915d61699bc0d0d204 codegen-350M-mono/config.json
37f839fb92f1fc346c8562ce97e3897b codegen-350M-mono/pytorch_model.bin
md5sum codegen-350M-multi/*
e5c11e8445017b915d61699bc0d0d204 codegen-350M-multi/config.json
d81cbe1111f246ca7f48850d0ed627fb codegen-350M-multi/pytorch_model.bin
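The verification can also be scripted. The sketch below is not part of the repository: `verify_checkpoints` is a hypothetical helper, the `checkpoints` root directory is an assumption, and the expected digests are copied from the listing above.

```python
import hashlib
from pathlib import Path

# Digests copied from the md5sum listing above.
EXPECTED_MD5 = {
    "codegen-350M-mono/config.json": "e5c11e8445017b915d61699bc0d0d204",
    "codegen-350M-mono/pytorch_model.bin": "37f839fb92f1fc346c8562ce97e3897b",
    "codegen-350M-multi/config.json": "e5c11e8445017b915d61699bc0d0d204",
    "codegen-350M-multi/pytorch_model.bin": "d81cbe1111f246ca7f48850d0ed627fb",
}


def verify_checkpoints(root, expected=EXPECTED_MD5):
    """Return {relative_path: status}; status is 'ok', 'mismatch' or 'missing'."""
    results = {}
    for relative_path, wanted in expected.items():
        path = Path(root) / relative_path
        if not path.is_file():
            results[relative_path] = "missing"
            continue
        digest = hashlib.md5()
        with open(path, "rb") as handle:
            # Read in 1 MiB chunks so large weight files are not loaded whole.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        results[relative_path] = "ok" if digest.hexdigest() == wanted else "mismatch"
    return results


if __name__ == "__main__":
    # Assumption: the archives were extracted under ./checkpoints.
    for name, status in verify_checkpoints("checkpoints").items():
        print(f"{status:9s} {name}")
```

A `mismatch` on `codegen-350M-mono/pytorch_model.bin` would suggest the older archive is still on disk and should be downloaded again.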
nforest commented
Thanks, I can confirm that the new checkpoints give more reasonable results on downstream tasks.
Erik Nijkamp commented
Thanks, glad to hear it.