kserve / modelmesh

Distributed Model Serving Framework

Are there any plans to support streaming of prediction responses?

Legion2 opened this issue

I'm currently trying to set up streaming responses for LLM generation from vLLM, but I receive a "Streaming not yet supported" error from ModelMesh. I think it comes from this code snippet:

String msg = "Streaming not yet supported";
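For reference, here is a minimal grpc-java sketch of how a server-streaming call can be rejected up front with that message. This is only an assumption about the mechanism; the handler name, request/response types, and the UNIMPLEMENTED status are placeholders, not the actual SidecarModelMesh code:

```java
// Illustrative sketch only, not ModelMesh's actual implementation:
// how a grpc-java handler typically rejects a server-streaming call up front.
import io.grpc.Status;
import io.grpc.stub.StreamObserver;

class StreamingRejectingHandler {
    // Hypothetical streaming entry point; request/response types are placeholders.
    void streamPredict(Object request, StreamObserver<Object> responseObserver) {
        String msg = "Streaming not yet supported";
        // The caller sees this as a failed RPC carrying the message above.
        responseObserver.onError(
                Status.UNIMPLEMENTED.withDescription(msg).asRuntimeException());
    }
}
```

If streaming were supported, the handler would instead forward each generated chunk via responseObserver.onNext(...) and finish with onCompleted(), rather than failing the call immediately.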

It looks like implementing streaming in this SidecarModelMesh class is a non-trivial task. Are there any plans to implement streaming support, or are there any blockers?