varunshenoy / super-json-mode

Low latency JSON generation using LLMs ⚡️

Repository from GitHub: https://github.com/varunshenoy/super-json-mode

Ollama integration processes only one request at a time

Namangarg110 opened this issue · comments

Dear @varunshenoy,

I modified the code to integrate Ollama into Super JSON Mode. However, it is unable to do batch processing. If you think it would be an acceptable placeholder until Ollama adds batch processing, I can open a PR.

Best,
Naman

This is a known issue. I'm going to wait until Ollama or llama-cpp-python natively supports batching before accepting any PRs.
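Until Ollama supports server-side batching, one common workaround is to emulate a batch by firing independent single-prompt requests concurrently from the client. The sketch below is hypothetical (it is not Super JSON Mode's actual API); `generate` stands in for whatever single-prompt Ollama call the integration uses:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def pseudo_batch(
    prompts: List[str],
    generate: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Emulate batching by issuing independent single-prompt calls
    concurrently. Results come back in the same order as the inputs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(generate, prompts))


# Stand-in for a single-prompt Ollama call (hypothetical):
def fake_generate(prompt: str) -> str:
    return f"completion for: {prompt}"


results = pseudo_batch(["name", "age", "city"], fake_generate, max_workers=2)
```

This only hides per-request latency behind concurrency; it does not give the throughput of true batched inference, which is why waiting for native support in Ollama or llama-cpp-python is the cleaner long-term fix.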