Lightning-AI / lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.



OSError: Not found: "checkpoints/lit-llama/tokenizer.model": No such file or directory Error #2

anirudhitagi opened this issue

Where to get the tokenizer.model file?

I have been following the instructions given here - https://github.com/Lightning-AI/lit-llama/blob/main/howto/train_redpajama.md

When I run
python scripts/prepare_redpajama.py --source_path data/RedPajama-Data-1T-Sample --tokenizer_path checkpoints/lit-llama/tokenizer.model --destination_path data/lit-redpajama-sample --sample True

I get the error:

OSError: Not found: "checkpoints/lit-llama/tokenizer.model": No such file or directory Error #2
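The message means the file at --tokenizer_path does not exist yet. A minimal sketch of a pre-flight check you could run before the prepare script, assuming you only want a clearer error; check_tokenizer_path is a hypothetical helper, not part of lit-llama:

```python
from pathlib import Path


def check_tokenizer_path(tokenizer_path: str) -> Path:
    """Hypothetical helper: fail early with a clearer message if the tokenizer file is missing."""
    path = Path(tokenizer_path)
    if not path.is_file():
        # Point the user at the likely fix instead of the raw loader error.
        raise FileNotFoundError(
            f"{path} not found; download a checkpoint first "
            "or point --tokenizer_path at an existing tokenizer.model"
        )
    return path
```

For example, check_tokenizer_path("checkpoints/lit-llama/tokenizer.model") returns the path if the file is there and raises FileNotFoundError otherwise.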

Good question. It usually comes with the model you downloaded via the python download.py ... script.

Could you please point me towards the python download.py ... script and a reference command?

Sure, for example, you can run

python scripts/download.py --repo_id openlm-research/open_llama_7b --local_dir checkpoints/open-llama/7B

as described here, which will download the weights and create the checkpoint files, including tokenizer.model.

The download.py script is in the ./scripts/ subdirectory. Please let me know if you bump into issues or have questions.
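Depending on the --local_dir you pass, tokenizer.model may end up somewhere other than checkpoints/lit-llama/, so it can help to look for where the file actually landed and pass that path to --tokenizer_path. A small sketch using only pathlib; find_tokenizers is a hypothetical helper, not part of lit-llama:

```python
from pathlib import Path


def find_tokenizers(root: str = "checkpoints") -> list[Path]:
    """Hypothetical helper: return every tokenizer.model under root, sorted for stable output."""
    return sorted(Path(root).rglob("tokenizer.model"))


if __name__ == "__main__":
    # Print candidate values for --tokenizer_path.
    for path in find_tokenizers():
        print(path)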

That worked! Thank you so much