horseee / LLM-Pruner

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support LLaMA, Llama-2, BLOOM, Vicuna, Baichuan, etc.

Home Page: https://arxiv.org/abs/2305.11627


Issue: Missing Generation of `pytorch_model.bin` File During Model Tuning

WilliamYi96 opened this issue

Thank you for sharing your interesting project!

Recently, when I ran `bash ./script/llama_prune.sh`, the pruning step worked perfectly fine. However, during the tuning step, although no errors were reported, the generated checkpoint directory only contained the following:

  • checkpoints-200
    • model.safetensors
    • optimizer.pt
    • rng_state.pth
    • scheduler.pt
    • trainer_state.json
    • training_args.bin

I noticed that the `pytorch_model.bin` file was not saved. I haven't modified the code, and I am using PyTorch version 2.1.2+cu121. Could you suggest what the possible reason for this might be?
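As a quick sanity check, the checkpoint directory can be inspected to see which weight format was actually written (a minimal stdlib sketch; the file names follow the listing above, and sharded checkpoints may use `pytorch_model-*.bin` instead of a single file):

```python
from pathlib import Path

def checkpoint_weight_format(checkpoint_dir):
    """Report which weight file format a Trainer checkpoint contains."""
    names = {p.name for p in Path(checkpoint_dir).iterdir()}
    # newer transformers default: safetensors
    has_safetensors = any(n.endswith(".safetensors") for n in names)
    # older default (pickle-based .bin), which this thread is trying to recover
    has_bin = any(n.startswith("pytorch_model") and n.endswith(".bin") for n in names)
    if has_bin:
        return "bin"
    if has_safetensors:
        return "safetensors"
    return "none"
```

Running this on the `checkpoints-200` directory above would report `safetensors`, which is exactly the symptom described.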

Issue resolved. The cause lies in newer versions of the transformers library: starting from transformers>=4.33.0, safetensors has become the default serialization format, replacing `pytorch_model.bin`. This can be addressed either by downgrading with `pip install transformers==4.33.0`, or by setting `safe_serialization=False` in `model.save_pretrained()`.

Tracking here: huggingface/transformers#28183
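Independent of the transformers version, a `pytorch_model.bin` can also be written manually from a model already in memory with `torch.save`, the same pickle-based format that `safe_serialization=False` produces (a minimal sketch; the `nn.Linear` is a stand-in for the pruned model, not the repo's actual model object):

```python
import torch
import torch.nn as nn

# stand-in for the pruned/tuned model; substitute the actual model object
model = nn.Linear(4, 4)

# write the classic pickle-based checkpoint regardless of library defaults
torch.save(model.state_dict(), "pytorch_model.bin")

# reload to verify the file round-trips
state = torch.load("pytorch_model.bin")
```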

Two updates:

  1. `pip install transformers==4.33.0` leads to the following error:
`AttributeError: 'LlamaTokenizer' object has no attribute 'added_tokens_decoder'. Did you mean: '_added_tokens_decoder'?`
  2. When using the latest transformers and setting `safe_serialization=False`, there is still no `pytorch_model.bin` saved.

This issue still exists.

Issue resolved. The problem is that `save_safetensors=False` must be set when constructing the trainer; otherwise, the `safe_serialization=False` above has no effect.

https://huggingface.co/docs/transformers/v4.36.1/en/main_classes/trainer#transformers.TrainingArguments.save_safetensors
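Putting both flags together, a trainer setup that forces `.bin` output on recent transformers could look like the following (a sketch, not the repo's actual tuning script; `output_dir` and the commented save path are placeholders):

```python
from transformers import TrainingArguments

# disable safetensors at the Trainer level so intermediate checkpoints
# are written as pytorch_model.bin instead of model.safetensors
training_args = TrainingArguments(
    output_dir="tune_log",
    save_safetensors=False,
)

# ...and, after training, also disable it for the final export:
# model.save_pretrained("tuned_model", safe_serialization=False)
```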