💡 [REQUEST] Add support for llamacpp
kumare3 opened this issue
Ketan Umare commented
Start Date
No response
Implementation PR
It would be great to serve Llama models on CPU using UnionML. This is possible via the Python bindings for llama.cpp, whose 4-bit quantization allows the models to run on CPU reasonably well.
https://github.com/nomic-ai/pygpt4all
Reference Issues
No response
Summary
Ideally, users would be able to fine-tune a model and then serve it via the llama.cpp module, all within the same UnionML app.
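As a rough sketch of the serving half: this assumes the `llama-cpp-python` bindings (`pip install llama-cpp-python`) rather than the linked pygpt4all, and the model path, helper names, and UnionML wiring are all hypothetical, not an existing API.

```python
# Sketch: serving a 4-bit quantized llama model on CPU via llama-cpp-python.
# The model path and the predict() wrapper are illustrative placeholders;
# how this plugs into a UnionML app is an open design question.

def build_generation_kwargs(prompt: str, max_tokens: int = 128) -> dict:
    """Pure helper (hypothetical): assemble kwargs for the llama.cpp binding."""
    return {"prompt": prompt, "max_tokens": max_tokens, "temperature": 0.7}

def predict(prompt: str, model_path: str = "models/llama-7b-q4_0.gguf") -> str:
    # Imported lazily so the module can be loaded without the binding installed.
    from llama_cpp import Llama  # pip install llama-cpp-python
    llm = Llama(model_path=model_path, n_ctx=512)
    out = llm(**build_generation_kwargs(prompt))
    return out["choices"][0]["text"]
```

In a UnionML app, something like `predict` would presumably be registered as the predictor that runs after fine-tuning produces the quantized model artifact.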
Basic Example
NA
Drawbacks
NA
Unresolved questions
No response