Adding quantized model support
SaddamBInSyed opened this issue
Hi,
Thanks for the great work.
Is there a way to add a quantized LLM so that it can run on a GPU with under 10GB of VRAM, making it accessible to more users?
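Something along these lines is what I have in mind (a minimal sketch, assuming the repo loads models through Hugging Face transformers and that bitsandbytes 4-bit quantization would be acceptable; the model id and settings below are just examples):

```python
# Sketch only: assumes transformers + bitsandbytes are installed and
# that the repo's model-loading path goes through AutoModelForCausalLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # example id (gated on HF)

# 4-bit NF4 quantization keeps an 8B model well under 10 GB of VRAM
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU automatically
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```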
Note:
I am running llama3 via the ollama tool on my laptop, so once this option is available in this repo, I can test it on my laptop directly. A quick compatibility check against ollama is sketched below.
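For reference, ollama already serves its quantized llama3 build over a local HTTP API, so verifying output against it could look like this (a sketch, assuming ollama is running on its default port 11434):

```python
# Sketch: query a locally running ollama instance via its /api/generate endpoint.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Hello", "stream": False},
)
print(resp.json()["response"])  # the model's completion for the prompt
```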
Thank you