V-Sekai / godot-whisper

A GDExtension addon for the Godot Engine that enables realtime audio transcription, supports OpenCL on most platforms and Metal on Apple devices, and runs on a separate thread.

On silence, the mic hallucinates

aiaimimi0920 opened this issue

The current version (bd9c18a) also produces a lot of output when the microphone is silent:
https://github.com/V-Sekai/godot-whisper/assets/153103332/13ce75ed-2c6f-4224-bdd7-6bc0b118caa2

I remember that the previous version (three months ago?) didn't seem to have so many microphone hallucinations:
https://github.com/V-Sekai/godot-whisper/assets/153103332/aad89a0c-f965-4349-9b4c-6d0233161b79

If possible, it would be best to solve this problem.

That's true. In the new version I decoupled the logic as much as possible, so it can be called from GDScript independently. It's true that hallucination is worse. I'll try to look into combining iree.gd to deal with the hallucination, now that that's done. @fire? Ideas?

People have mentioned combining silence detection with whisper as a first thought, but I am concerned about the total latency of the voice transcription.

I see. I'll look into the vad_detection logic; most likely I didn't migrate that part correctly. I'll compare the old version against this one to see what is different.

AI based VAD is also a thing, and that was my approach for iree and whisper-jax.

The silence part may work, but these project settings need to be configured:
-audio/input/transcribe/vad_treshold
-audio/input/transcribe/freq_treshold

For now I'm increasing vad_treshold to 2, as that seems to give good results in my case. Increasing it to 5 is even better in terms of silence detection.
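
For anyone tuning these two settings, here is a minimal Python sketch of one simple energy-ratio voice activity check, to show why a higher threshold lets less audio through. It is not taken from the addon's source: the function names, the `last_ms` window, and the exact comparison are assumptions made for illustration, and the addon's vad_treshold and freq_treshold may be applied differently.

```python
import math


def high_pass_filter(samples, cutoff_hz, sample_rate):
    """First-order high-pass filter; attenuates content below cutoff_hz."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(alpha * (out[i - 1] + samples[i] - samples[i - 1]))
    return out


def is_speech(samples, sample_rate, last_ms=1000,
              vad_threshold=2.0, freq_threshold=100.0):
    """Hypothetical check: is the most recent window much louder than the buffer?

    Returns True when the mean absolute amplitude of the last `last_ms`
    milliseconds exceeds `vad_threshold` times the mean over the whole buffer.
    """
    n_last = sample_rate * last_ms // 1000
    if n_last <= 0 or n_last >= len(samples):
        return False  # not enough audio to compare against
    if freq_threshold > 0:
        # Drop low-frequency rumble before measuring energy.
        samples = high_pass_filter(samples, freq_threshold, sample_rate)
    energy_all = sum(abs(s) for s in samples) / len(samples)
    energy_last = sum(abs(s) for s in samples[-n_last:]) / n_last
    return energy_last > vad_threshold * energy_all
```

In this sketch a buffer of steady background noise has a ratio close to 1, so any threshold above 1 rejects it, and raising the threshold further also rejects quieter speech. That is the trade-off to watch when moving from 2 to 5.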

@aiaimimi0920, let me know if you get a chance to try it.