llm-edge / hal-9100

Edge full-stack LLM platform. Written in Rust.

implement stream

louis030195 opened this issue

no idea why OpenAI didn't implement the most important feature, though. let's do a temporary hack and then copy them when they implement it

commented

has this been implemented?

no, but would love a contribution! any ideas about the best UX for this? any predictions on how OpenAI will do this?

commented

for GPTs right now, I think they literally just wait for the Assistants API run to complete and then stream the tokens out lol

as to how they will do it, not entirely sure. maybe this is one of those things that's a pain to implement and will be superseded soon
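For reference, here's a minimal Rust sketch of that "wait, then stream" hack: poll the run until it finishes, then re-chunk the completed message so the client sees something that looks like streaming. `get_run` and `get_final_message` are hypothetical stand-ins for whatever client is in play, not this repo's actual API (assumes tokio):

```rust
use std::time::Duration;
use tokio::time::sleep;

enum RunStatus { Queued, InProgress, Completed, Failed }

// Hypothetical helper: fetch the current run status from the API.
async fn get_run(_run_id: &str) -> RunStatus {
    RunStatus::Completed
}

// Hypothetical helper: fetch the finished assistant message.
async fn get_final_message(_run_id: &str) -> String {
    "hello from the assistant".to_string()
}

async fn stream_after_completion(run_id: &str, mut emit: impl FnMut(&str)) {
    // 1. Poll until the run finishes (the plain non-streaming flow).
    loop {
        match get_run(run_id).await {
            RunStatus::Completed => break,
            RunStatus::Failed => return,
            _ => sleep(Duration::from_millis(500)).await,
        }
    }
    // 2. Re-chunk the already-complete message so the client "streams" it.
    let text = get_final_message(run_id).await;
    for word in text.split_whitespace() {
        emit(word);
        sleep(Duration::from_millis(30)).await; // pacing is purely cosmetic
    }
}

#[tokio::main]
async fn main() {
    stream_after_completion("run_123", |tok| print!("{tok} ")).await;
    println!();
}
```

Time-to-first-token is still as bad as the non-streaming case, which is why this only works as a stopgap.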

I'm assuming OpenAI will implement one of:

  • webhook
  • gRPC <- not very client/JS friendly
  • WebSocket <- not very scalable or long-wait friendly
  • something else

to stream responses and status/step updates during the assistant workflow

problem is that most of those are annoying for devs to implement (see the webhook sketch below for what consumers would have to stand up).
ironic, because the Assistants API is supposed to let you write 10x less code than before
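To make that concrete: with a webhook design, every consumer would need to host a public HTTP endpoint just to receive run updates. A sketch of such a receiver using axum; the route and payload fields are invented for illustration:

```rust
use axum::{routing::post, Json, Router};
use serde::Deserialize;

// Invented payload shape for an assistant run update callback.
#[derive(Deserialize)]
struct RunUpdate {
    run_id: String,
    status: String,        // e.g. "in_progress", "completed"
    delta: Option<String>, // new tokens, if any
}

async fn on_run_update(Json(update): Json<RunUpdate>) {
    // The app still has to correlate this callback with the original
    // request and forward the deltas to its own end user.
    println!("run {} -> {} {:?}", update.run_id, update.status, update.delta);
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/webhooks/assistant", post(on_run_update));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```

That's a deployed public server, callback correlation, and re-streaming to the user, all on the consumer's side.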

I echo Travis's comment in transitive-bullshit/OpenOpenAI#5 that any choice that's not SSE would be "sad panda" given the existing streaming setup in the chat completions API. Thoughts on what that might look like?
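One possible shape, sketched with axum's SSE support: the server emits named events for run status changes and message token deltas, mirroring the chat completions streaming style. The event names and payloads here are guesses, not OpenAI's actual schema:

```rust
use std::convert::Infallible;

use axum::{
    response::sse::{Event, KeepAlive, Sse},
    routing::get,
    Router,
};
use futures::{stream, Stream, StreamExt};

// Hypothetical endpoint: clients subscribe once and receive run status
// updates and token deltas as server-sent events.
async fn run_events() -> Sse<impl Stream<Item = Result<Event, Infallible>>> {
    // In a real handler these would be driven by the run executor; here we
    // fake one status change, two token deltas, and a completion event.
    let events = stream::iter([
        Event::default().event("run.status").data(r#"{"status":"in_progress"}"#),
        Event::default().event("message.delta").data(r#"{"delta":"hel"}"#),
        Event::default().event("message.delta").data(r#"{"delta":"lo"}"#),
        Event::default().event("run.status").data(r#"{"status":"completed"}"#),
    ])
    .map(Ok);

    Sse::new(events).keep_alive(KeepAlive::default())
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/runs/demo/events", get(run_events));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```

SSE also lets existing chat-completions clients reuse their parsing: it's plain HTTP, works with fetch/EventSource in JS, and sidesteps the scalability and hosting concerns raised above for WebSockets and webhooks.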