Are there any plans to support streaming of prediction responses?
Legion2 opened this issue
I'm currently trying to set up streaming responses for LLM generation from vLLM, but I receive a "Streaming not yet supported" error from ModelMesh. I think it comes from this code snippet:
It looks like implementing streaming in the SidecarModelMesh class is a non-trivial task. Are there any plans to implement streaming support, or are there any blockers for this?
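For context, here is a minimal sketch of the kind of unary-only guard that could produce an error like this at the gRPC layer. It is purely illustrative, written against the plain grpc-java API; the UnaryOnlyInterceptor name and structure are my own assumptions, not the actual ModelMesh code:

```java
import io.grpc.Metadata;
import io.grpc.MethodDescriptor.MethodType;
import io.grpc.ServerCall;
import io.grpc.ServerCallHandler;
import io.grpc.ServerInterceptor;
import io.grpc.Status;

// Hypothetical interceptor that refuses any non-unary gRPC call,
// mirroring the behaviour I'm seeing (not the actual ModelMesh source).
public class UnaryOnlyInterceptor implements ServerInterceptor {

    @Override
    public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
            ServerCall<ReqT, RespT> call,
            Metadata headers,
            ServerCallHandler<ReqT, RespT> next) {

        // Server-streaming, client-streaming, and bidi calls are all rejected up front
        if (call.getMethodDescriptor().getType() != MethodType.UNARY) {
            call.close(Status.UNIMPLEMENTED
                    .withDescription("Streaming not yet supported"), new Metadata());
            return new ServerCall.Listener<ReqT>() {};
        }
        // Unary calls are forwarded as usual
        return next.startCall(call, headers);
    }
}
```

If that is roughly what happens today, then supporting streaming would presumably mean proxying server-streaming calls end to end through the sidecar rather than closing them up front, which matches my impression that this is non-trivial.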