OSError: Not found: "checkpoints/lit-llama/tokenizer.model": No such file or directory Error #2

Question

anirudhitagi opened this issue 5 months ago

Where to get the tokenizer.model file?

I have been following the instructions given here: https://github.com/Lightning-AI/lit-llama/blob/main/howto/train_redpajama.md

When I run
python scripts/prepare_redpajama.py --source_path data/RedPajama-Data-1T-Sample --tokenizer_path checkpoints/lit-llama/tokenizer.model --destination_path data/lit-redpajama-sample --sample True

I get this error:

OSError: Not found: "checkpoints/lit-llama/tokenizer.model": No such file or directory Error #2

Sebastian Raschka · Answer 1 · Wed Feb 28 2024 02:35:28 GMT+0800 (China Standard Time)

Good question. It usually comes with the model you download via the python download.py ... script.

anirudhitagi · Answer 2 · Wed Feb 28 2024 03:03:18 GMT+0800 (China Standard Time)

Could you please point me to the python download.py ... script and a reference command?

Sebastian Raschka · Answer 3 · Wed Feb 28 2024 03:20:30 GMT+0800 (China Standard Time)

Sure, for example, you can run

python scripts/download.py --repo_id openlm-research/open_llama_7b --local_dir checkpoints/open-llama/7B

as described here, which will download the weights and create the checkpoint files, including tokenizer.model.

The download.py script is in the ./scripts/ subdirectory. Please let me know if you bump into issues or have questions.
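If the error persists after downloading, it may help to check where tokenizer.model actually ended up, since the download command above writes to checkpoints/open-llama/7B while the failing command looks in checkpoints/lit-llama. The following is a minimal sketch, not part of the lit-llama repository: the helper name find_tokenizers is hypothetical, and it assumes the script is run from the repository root so that the relative checkpoints/ path resolves.

```python
from pathlib import Path


def find_tokenizers(checkpoint_root="checkpoints", filename="tokenizer.model"):
    """Return all files named `filename` below `checkpoint_root`, sorted by path.

    Hypothetical helper for illustration; not part of lit-llama.
    """
    root = Path(checkpoint_root)
    if not root.is_dir():
        # Nothing downloaded yet, or the script is run from another directory.
        return []
    return sorted(p for p in root.rglob(filename) if p.is_file())


if __name__ == "__main__":
    # Path taken from the failing prepare_redpajama.py command in the question.
    expected = Path("checkpoints/lit-llama/tokenizer.model")
    if expected.is_file():
        print(f"OK: {expected} exists")
    else:
        candidates = find_tokenizers()
        if candidates:
            print(f"{expected} is missing; tokenizer.model found at:")
            for path in candidates:
                print(f"  {path}")
        else:
            print("No tokenizer.model under checkpoints/; run the download script first.")
```

If the file turns up in a different directory, passing that path to --tokenizer_path is one way to resolve the mismatch.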

anirudhitagi · Answer 4 · Wed Feb 28 2024 06:10:29 GMT+0800 (China Standard Time)

That worked! Thank you so much!