chaoyi-wu / Finetune_LLAMA

A simple, easy-to-understand guide to fine-tuning LLaMA.


convert_to_ds_params.py doesn't generate tokenizer

tammypi opened this issue

convert_to_ds_params.py only generates the llama-7b folder and the .pt files inside it; it does not generate a tokenizer. However, the tokenizer_path parameter of tokenize_dataset.py requires one.
So how can I get the tokenizer?

You can download the tokenizer from here. That link also provides the model files produced by running convert_to_ds_params.py.

I had a similar issue to @tammypi's when I tried to run finetune_pp_peft.py. The conversion only generates .pt files (e.g. layer_00-model_states.pt), so when I run
python finetune_pp_peft.py --model_path ../llama-7b/, it says no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack was found in directory ../llama-7b/.
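As a quick sanity check before launching training, you can test whether a directory actually contains one of the weight files that `from_pretrained` looks for. This is a hypothetical helper I wrote for debugging, not part of this repo; the file names come straight from the error message above:

```python
import os

# Weight-file names that transformers' from_pretrained searches for,
# as listed in the error message above.
HF_WEIGHT_FILES = (
    "pytorch_model.bin",
    "tf_model.h5",
    "model.ckpt.index",
    "flax_model.msgpack",
)

def looks_like_hf_checkpoint(model_dir):
    """Return True if model_dir contains at least one loadable weight file."""
    try:
        entries = set(os.listdir(model_dir))
    except FileNotFoundError:
        return False
    return any(name in entries for name in HF_WEIGHT_FILES)
```

Running this on the folder produced by convert_to_ds_params.py returns False, since that folder holds only DeepSpeed-style layer_*.pt shards, which explains the error.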

As a workaround, I used src/transformers/models/llama/convert_llama_weights_to_hf.py to convert the model into HF format, and then finetune_pp_peft.py ran without any problem. Do you think it is a good idea to use convert_llama_weights_to_hf.py from the transformers package instead of your script? What is the difference? Thanks!
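For reference, the conversion I ran looks roughly like this; the input path and model size are placeholders for my setup, so adjust them to yours:

```shell
# Convert the original LLaMA checkpoint into Hugging Face format.
# --input_dir is the folder holding the original weights and tokenizer.model;
# --output_dir is where the HF-format model will be written.
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/original_llama \
    --model_size 7B \
    --output_dir ../llama-7b
```

Afterwards the output directory contains the HF weight files plus the converted tokenizer, so both finetune_pp_peft.py and tokenize_dataset.py can point at the same folder.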

Sorry for the mistake. I meant to reference convert_llama_weights_to_hf.py in this project but added convert_to_ds_params.py by accident. Thanks for raising the issue; I have fixed this.