Adding quantized model support
SaddamBInSyed opened this issue
Hi,
Thanks for the great work.
Is there a way to add a quantized LLM so that it can run on a GPU with under 10GB of VRAM, making it accessible to more users?
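Something along these lines is what I have in mind (a minimal sketch, assuming the repo loads models through Hugging Face transformers and that bitsandbytes 4-bit quantization would be acceptable; the model id and settings below are just examples):

```python
# Sketch only: assumes transformers + bitsandbytes are installed and
# that the repo's model-loading path goes through AutoModelForCausalLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # example id (gated on HF)

# 4-bit NF4 quantization keeps an 8B model well under 10 GB of VRAM
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU automatically
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```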
Note:
I am running llama3 via the ollama tool on my laptop, so once this option is available in this repo, I can test it on my laptop directly. A quick compatibility check against ollama is sketched below.
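For reference, ollama already serves its quantized llama3 build over a local HTTP API, so verifying output against it could look like this (a sketch, assuming ollama is running on its default port 11434):

```python
# Sketch: query a locally running ollama instance via its /api/generate endpoint.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Hello", "stream": False},
)
print(resp.json()["response"])  # the model's completion for the prompt
```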
Thank you