GRVYDEV / S.A.T.U.R.D.A.Y

A toolbox for working with WebRTC, Audio and AI

Why chunk text and audio?

infinityp913 opened this issue

Hi @GRVYDEV, I was curious why you decided to send text from the TTT to the TTS in chunks, and consequently to send audio from the TTS to the browser client in chunks as well.
Why not pass the entire text from TTT --> TTS and the entire audio from TTS --> browser client? Is it to handle long texts that SATURDAY might need to synthesize, which could otherwise hit a bottleneck somewhere in the pipeline?

Or is it to minimize latency? With chunked text and audio, SATURDAY could start speaking as soon as the TTT has generated some text, instead of waiting for the entire response to be generated.

Thanks!

commented

@infinityp913 Chunking reduces processing time and makes everything smoother.

Thank you
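
As a rough illustration of the latency argument raised in the question, the Python sketch below compares time to first audio for the two approaches. It is not taken from the S.A.T.U.R.D.A.Y codebase: the function names and the per-sentence timing constants are hypothetical, chosen only to make the arithmetic concrete.

```python
from typing import Iterator, List

# Hypothetical costs for illustration only (not measured from S.A.T.U.R.D.A.Y).
GEN_SECONDS_PER_SENTENCE = 0.5    # assumed time for the TTT to produce one sentence
SYNTH_SECONDS_PER_SENTENCE = 0.3  # assumed time for the TTS to synthesize one sentence


def chunk_sentences(text: str) -> Iterator[str]:
    """Yield sentence-sized chunks, splitting after '.', '!' or '?'."""
    current: List[str] = []
    for ch in text:
        current.append(ch)
        if ch in ".!?":
            chunk = "".join(current).strip()
            if chunk:
                yield chunk
            current = []
    # Emit any trailing text that did not end with punctuation.
    tail = "".join(current).strip()
    if tail:
        yield tail


def time_to_first_audio_whole(sentences: List[str]) -> float:
    """Generate all text, then synthesize all of it, then start playback."""
    n = len(sentences)
    return n * GEN_SECONDS_PER_SENTENCE + n * SYNTH_SECONDS_PER_SENTENCE


def time_to_first_audio_chunked(sentences: List[str]) -> float:
    """Playback can begin once the first sentence is generated and synthesized."""
    if not sentences:
        return 0.0
    return GEN_SECONDS_PER_SENTENCE + SYNTH_SECONDS_PER_SENTENCE


if __name__ == "__main__":
    reply = "Sure, I can help. Here is the first step. Here is the second step."
    sentences = list(chunk_sentences(reply))
    print("whole  :", time_to_first_audio_whole(sentences))
    print("chunked:", time_to_first_audio_chunked(sentences))
```

Under these assumed costs, the unchunked pipeline's wait grows with the length of the reply, while the chunked pipeline's wait stays at the cost of a single sentence, which is consistent with the latency explanation proposed above.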