GGUF support
Mihaiii opened this issue
Mihai Chirculescu commented
Feature request
Right now, transformers.js works with ONNX models. It would be useful to also support GGUF files (see llama.cpp).
Motivation
Wider support; also, ONNX doesn't quantize below 8-bit, but GGUF does.
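For context on what supporting the format would involve: a GGUF file begins with a small fixed header (magic bytes, format version, tensor count, metadata key/value count), followed by the metadata and tensor data. Below is a minimal sketch of parsing that header, assuming the little-endian layout used by GGUF version 2 and later as described in the llama.cpp repository; the synthetic header bytes are constructed for illustration only.

```python
import struct

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header: 4-byte magic, uint32 version,
    uint64 tensor count, uint64 metadata key/value count (little-endian)."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "kv_pairs": n_kv}

# Synthetic example header: version 3, 2 tensors, 5 metadata key/value pairs.
header = struct.pack("<4sIQQ", b"GGUF", 3, 2, 5)
print(read_gguf_header(header))  # {'version': 3, 'tensors': 2, 'kv_pairs': 5}
```

The metadata section that follows the header is where quantization type, architecture, and tokenizer details live, which is what a loader in transformers.js would need to read.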
Your contribution
I could help with manual testing. As for the dev work, I'm unsure.