chaoyi-wu / Finetune_LLAMA

A simple, easy-to-understand guide to fine-tuning LLaMA.


convert_to_ds_params.py doesn't generate tokenizer

tammypi opened this issue

convert_to_ds_params.py only generates the llama-7b folder and the .pt files inside it; it does not generate a tokenizer. However, the tokenizer_path parameter of tokenize_dataset.py requires one.
So how can I get the tokenizer?

You can download the tokenizer from here. That link also provides the model files produced by running convert_to_ds_params.py.

I had a similar issue to @tammypi's when I tried to run finetune_pp_peft.py. The conversion only generates .pt files (e.g. layer_00-model_states.pt), so when I run
python finetune_pp_peft.py --model_path ../llama-7b/, it says no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack was found in directory ../llama-7b/.
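As a quick sanity check before launching training, you can test whether a directory actually contains one of the weight files that `from_pretrained` looks for. This is a hypothetical helper I wrote for debugging, not part of this repo; the file names come straight from the error message above:

```python
import os

# Weight-file names that transformers' from_pretrained searches for,
# as listed in the error message above.
HF_WEIGHT_FILES = (
    "pytorch_model.bin",
    "tf_model.h5",
    "model.ckpt.index",
    "flax_model.msgpack",
)

def looks_like_hf_checkpoint(model_dir):
    """Return True if model_dir contains at least one loadable weight file."""
    try:
        entries = set(os.listdir(model_dir))
    except FileNotFoundError:
        return False
    return any(name in entries for name in HF_WEIGHT_FILES)
```

Running this on the folder produced by convert_to_ds_params.py returns False, since that folder holds only DeepSpeed-style layer_*.pt shards, which explains the error.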

As a workaround, I used src/transformers/models/llama/convert_llama_weights_to_hf.py to convert the model into HF format, and then finetune_pp_peft.py ran without any problem. Do you think it is a good idea to use convert_llama_weights_to_hf.py from the transformers package instead of your script? What is the difference? Thanks!
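For reference, the conversion I ran looks roughly like this; the input path and model size are placeholders for my setup, so adjust them to yours:

```shell
# Convert the original LLaMA checkpoint into Hugging Face format.
# --input_dir is the folder holding the original weights and tokenizer.model;
# --output_dir is where the HF-format model will be written.
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/original_llama \
    --model_size 7B \
    --output_dir ../llama-7b
```

Afterwards the output directory contains the HF weight files plus the converted tokenizer, so both finetune_pp_peft.py and tokenize_dataset.py can point at the same folder.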

Sorry for the mistake. I meant to reference convert_llama_weights_to_hf.py in this project but added convert_to_ds_params.py by accident. Thanks for raising the issue; I have fixed this.