PromtEngineer / localGPT

Chat with your documents on your local device using GPT models. No data leaves your device, and it is 100% private.

is it possible to use a previously downloaded HF .gguf file

cleesmith opened this issue · comments

First, this app works great on a MacBook Pro M3 Max 128GB, and with lots of transformer and LLM models. It is one of the few RAG apps I have been able to run without the internet (well, once all of the models are downloaded), and using the terminal command "sudo pumas run" I can see it using 100% GPU (mps) during queries.
So thank you so much, and for your videos on YouTube.

Since I seem to try new RAG or fine-tuning apps so often, I have a lot of GGUF files previously downloaded from Hugging Face. Is there a way to change this app to use any of those existing ".gguf" downloads? It is time-consuming to download the same files over and over again. I did notice that the "models" folder has file types other than just a ".gguf" file ... is there a way to convert a previously downloaded gguf into the layout used in your "models" folder?

Please let me know and thanks again for this repo.

Thank you, and glad you are finding this useful. I am not sure; in the snapshots folder under every model that is downloaded, there is the main gguf file. The code is using llama-cpp-python (the Python binding) to download the file. This might be doing the conversion under the hood. Will need to look into that.
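For anyone curious, here is a minimal sketch of what that download step looks like, assuming it ultimately goes through huggingface_hub's hf_hub_download (the function that produces the models--org--repo/snapshots/<commit> layout seen in the models folder); the repo and filename below are only examples:

```python
from huggingface_hub import hf_hub_download

# Downloads the file into cache_dir using the HF cache layout:
# models--TheBloke--Phind-CodeLlama-34B-v2-GGUF/snapshots/<commit>/<file>
model_path = hf_hub_download(
    repo_id="TheBloke/Phind-CodeLlama-34B-v2-GGUF",
    filename="phind-codellama-34b-v2.Q6_K.gguf",
    cache_dir="./models",
)
print(model_path)  # absolute path to the cached .gguf
```

As far as I can tell, the .gguf itself is stored unmodified; the extra files in the models folder (blobs, refs, lock files) are cache bookkeeping, not a converted model.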

@cleesmith you may look at https://huggingface.co/docs/huggingface_hub/guides/manage-cache and try setting the HF_HOME environment variable. I did this personally and all my HF models download there now. On Windows it will keep warning you about symlinks and some other things, but you can still try it. You can also install huggingface-cli and manage downloads and the cache from it.
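A quick sketch of that approach in Python (the cache path here is an example; point it at wherever your existing downloads live):

```python
import os

# Point the Hugging Face cache at your existing download location
# *before* importing any HF libraries, since they read HF_HOME at import time.
os.environ["HF_HOME"] = "/data/hf-cache"

from huggingface_hub import scan_cache_dir

# List what is already cached there (same info as `huggingface-cli scan-cache`).
for repo in scan_cache_dir().repos:
    print(repo.repo_id, repo.size_on_disk_str)
```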

@VerdonTrigance There is a PR that makes symlinks work without any errors or bugs being shown. You can look into it; the title of the PR has "symlink" in it. Thank you.

I'm not sure if there is a better way, but the only PR with "symlink" in the name I found was about ingesting documents, not about reusing previously downloaded models. Here's how to do it, for anyone still looking:

Example with TheBloke/Phind-CodeLlama-34B-v2-GGUF phind-codellama-34b-v2.Q6_K.gguf:

  • set MODEL_PATH in constants.py
  • get the latest commit hash from Hugging Face (here it is da37c48be3b0c6cd487fe05259521dc2824f5a5f)
  • mkdir --parents $MODEL_PATH/models--TheBloke--Phind-CodeLlama-34B-v2-GGUF/snapshots/da37c48be3b0c6cd487fe05259521dc2824f5a5f
  • put (or symlink) your gguf file there (a Python equivalent is sketched below)
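
Putting those steps together, here is a hedged Python equivalent (the repo, commit hash, and MODEL_PATH come from the example above; the source location of your existing gguf is hypothetical):

```python
from pathlib import Path

# Must match MODEL_PATH in constants.py.
MODEL_PATH = Path("./models")
commit = "da37c48be3b0c6cd487fe05259521dc2824f5a5f"

# Recreate the HF cache layout: models--<org>--<repo>/snapshots/<commit>/
snapshot = (
    MODEL_PATH
    / "models--TheBloke--Phind-CodeLlama-34B-v2-GGUF"
    / "snapshots"
    / commit
)
snapshot.mkdir(parents=True, exist_ok=True)

# Symlink the previously downloaded file into the snapshot folder instead of
# re-downloading it (copying the file works too).
existing = Path("~/Downloads/phind-codellama-34b-v2.Q6_K.gguf").expanduser()
(snapshot / existing.name).symlink_to(existing.resolve())
```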