Voice polish

krschacht opened this issue 2 months ago · comments

Daniel · Answer 1 · Mon May 13 2024 02:47:02 GMT+0800 (China Standard Time)

Maybe you could add a model like Silero VAD?

Keith Schacht · Answer 2 · Mon May 13 2024 05:25:50 GMT+0800 (China Standard Time)

@lumpidu That one is new to me. Thanks for the tip. Btw, if you want to try it out, my branch is working pretty well. I’m just going to add automated tests and do a little more polish before merging it in.

Daniel · Answer 3 · Tue May 14 2024 20:23:51 GMT+0800 (China Standard Time)

@krschacht You could integrate the model either into the backend via https://github.com/ankane/onnxruntime-ruby, or even into the frontend via https://onnxruntime.ai/docs/api/js/index.html. There is a browser demo at https://github.com/ricky0123/vad. I will definitely try out your project!
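For context on what a voice activity detector (VAD) contributes to a voice mode, here is a minimal sketch of the idea: split audio into short frames and report the spans that contain speech. This is not Silero VAD and does not use ONNX Runtime. It swaps in a simple energy threshold as a stand-in for the neural model, and the function name `detect_speech` and all of its parameters (`frame_ms`, `threshold`, `hangover_frames`) are hypothetical, chosen only for illustration.

```python
import math
from typing import List, Tuple


def detect_speech(
    samples: List[float],
    sample_rate: int = 16000,
    frame_ms: int = 30,
    threshold: float = 0.02,
    hangover_frames: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start_sample, end_sample) spans whose frame energy exceeds a threshold.

    Energy-threshold stand-in for a real VAD model (assumption, not Silero VAD).
    A segment closes only after `hangover_frames` consecutive quiet frames,
    so short pauses inside an utterance do not split it.
    """
    frame_len = sample_rate * frame_ms // 1000
    n_frames = len(samples) // frame_len  # trailing partial frame is ignored
    segments: List[Tuple[int, int]] = []
    start = None      # sample index where the current segment began
    silent_run = 0    # consecutive quiet frames seen inside a segment

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        if rms >= threshold:
            if start is None:
                start = i * frame_len
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= hangover_frames:
                # End the segment at the first quiet frame of this run.
                end = (i - silent_run + 1) * frame_len
                segments.append((start, end))
                start = None
                silent_run = 0

    # Close a segment that is still open when the audio ends.
    if start is not None:
        segments.append((start, (n_frames - silent_run) * frame_len))
    return segments
```

In a voice mode, the end of a detected segment would be the natural trigger for sending the captured audio off for transcription. A trained model such as Silero VAD fills the same role but is much more robust to background noise than a fixed energy threshold.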

Keith Schacht · Answer 4 · Tue May 14 2024 23:10:18 GMT+0800 (China Standard Time)

@lumpidu This is really cool. I was not aware of client-side models like this for voice detection that could be run in this way. I wonder if it's using the new WebAssembly under the hood.

I don't think I'd prioritize this in the near-term. In case you haven't seen, yesterday I merged in a v1 of the voice mode: #348

I just updated my "voice polish" to-do list at the top of this task based on where I left off yesterday. One notable thing: OpenAI just announced an incredible new voice model that is going to be released "soon". I'm not sure if soon is a couple of weeks or a couple of months :) but I will probably, intentionally, defer some of these tasks until after I can evaluate it. However, I'm using this voice mode daily myself now, so I'm going to keep polishing it so that I can enjoy using it while I wait.

If you're interested in helping with any of this, let me know! I can suggest good tasks, and I can help you ramp up on the implementation. I welcome help! :)