ShipBit / wingman-ai


Optimize response streaming for lower latency

Shackless opened this issue · comments

Quoted from Discord user meenie:

To help with the pretty large delay, it would be ideal to stream the reply from GPT-4 into TTS and then start streaming the TTS audio as it's being generated. I haven't jumped into the codebase yet to verify. If you aren't doing that, I think I might be able to help.
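The idea meenie describes — flushing the streamed reply to TTS sentence by sentence instead of waiting for the full completion — can be sketched roughly like this. This is a minimal sketch, not Wingman's actual code: the token stream and the TTS hand-off are stand-ins, and the sentence splitting is deliberately naive (it would mis-split abbreviations like "Mr.").

```python
import re

def sentence_chunks(token_stream):
    """Group streamed LLM tokens into sentence-sized chunks so each
    sentence can be sent to TTS before the full reply has arrived.

    `token_stream` stands in for the deltas of a streaming chat
    completion; in practice you'd iterate the API's stream object.
    """
    buffer = ""
    for token in token_stream:
        buffer += token
        # Flush every complete sentence currently in the buffer.
        # Naive heuristic: a sentence ends at '.', '!' or '?'.
        while True:
            match = re.match(r"(.*?[.!?])\s*", buffer, re.DOTALL)
            if not match:
                break
            yield match.group(1)
            buffer = buffer[match.end():]
    # Whatever is left when the stream ends is the final chunk.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded chunk would then be passed to the TTS request, so audio for the first sentence can play while later sentences are still being generated.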

We are already utilising streams, but we should verify that we really do so everywhere we can. The biggest delay seems to come from TTS: there is a noticeable pause before the voice output starts. Double-check that we're doing this right.

I put some work into this last night and found that there is a bug with streaming TTS responses: the feature doesn't actually exist, lol. I've created an issue here to keep track of when OpenAI fixes it.

My finding is that once OpenAI fixes this bug, we need to update the AudioPlayer to accept a file path, then use buffering and threads to chunk the audio file into smaller parts and play them back with PyDub. I got the code working, but it's no faster than what we have right now because of the OpenAI issue mentioned above.
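The buffering-and-threads part of that plan might look something like the producer/consumer sketch below. This is an assumption about the shape of the fix, not Wingman's actual AudioPlayer: the producer reads the audio file in small chunks while a consumer thread drains a bounded queue. Real playback via PyDub is stubbed out here (`sink` just collects the bytes) so the pattern itself is visible.

```python
import queue
import threading

CHUNK_SIZE = 4096  # bytes per chunk; small enough to start playback early

def stream_file(path, audio_queue):
    """Producer: read the audio file in small chunks so the consumer
    can start playing before the whole file has been read."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            audio_queue.put(chunk)
    audio_queue.put(None)  # sentinel: no more audio

def play_chunks(audio_queue, sink):
    """Consumer: in the real AudioPlayer this would decode each chunk
    with PyDub and play it; here the playback is stubbed and the bytes
    are appended to `sink` instead."""
    while True:
        chunk = audio_queue.get()
        if chunk is None:
            break
        sink.append(chunk)

def buffered_playback(path):
    """Run producer and consumer concurrently over a bounded queue,
    which smooths jitter between read speed and playback speed."""
    audio_queue = queue.Queue(maxsize=8)
    received = []
    producer = threading.Thread(target=stream_file, args=(path, audio_queue))
    consumer = threading.Thread(target=play_chunks, args=(audio_queue, received))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return b"".join(received)
```

The bounded `maxsize` keeps memory flat for long replies: if decoding/playback falls behind, the producer blocks instead of buffering the whole file.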

Closed in favor of https://wingman-ai.canny.io/feature-requests/p/text-completion-streaming. We'll use Canny for (votable) feature requests and GitHub issues for bugs.