V-Sekai / godot-whisper

A GDExtension addon for the Godot Engine that enables real-time audio transcription. It supports OpenCL on most platforms and Metal on Apple devices, and runs on a separate thread.

Integration Design of Whisper GGML Data Model

fire opened this issue

Current Status

As per the recent discussion, it seems that the symbols are now being exported correctly. However, there is still uncertainty about the functionality and implementation of certain components.

The missing part in the current implementation is the audio data stream input (whether it is mono or stereo is still an open question), which needs to be sent to whisper.cpp. After wrapping the core algorithm, the audio must be resampled to the desired format. The audio effect can then be attached to a microphone or speech recording to output text. In the proposed design, the audio effect should have an accessor to the Whisper GGML data model, stored as a GGUF resource.
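To make the mono/stereo and resampling step concrete, here is a rough sketch of what the conversion could look like. It assumes the input is interleaved stereo float PCM and that the target is 16 kHz mono float, which is what whisper.cpp works with. The names downmix_to_mono, resample_linear and kWhisperSampleRate are made up for illustration and are not taken from the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: the target format is 16 kHz mono 32-bit float PCM.
constexpr int kWhisperSampleRate = 16000;

// Average interleaved stereo frames (L, R, L, R, ...) into one mono channel.
// A trailing unpaired sample, if any, is ignored.
std::vector<float> downmix_to_mono(const std::vector<float> &interleaved_stereo) {
    std::vector<float> mono(interleaved_stereo.size() / 2);
    for (std::size_t i = 0; i < mono.size(); ++i) {
        mono[i] = 0.5f * (interleaved_stereo[2 * i] + interleaved_stereo[2 * i + 1]);
    }
    return mono;
}

// Resample by linear interpolation between neighbouring input samples.
// This is a simple illustration only: it applies no low-pass filter.
std::vector<float> resample_linear(const std::vector<float> &input, int src_rate, int dst_rate) {
    assert(src_rate > 0 && dst_rate > 0);
    if (input.empty()) {
        return {};
    }
    const std::size_t out_len = static_cast<std::size_t>(
            static_cast<double>(input.size()) * dst_rate / src_rate);
    const double step = static_cast<double>(src_rate) / dst_rate;
    std::vector<float> output(out_len);
    for (std::size_t i = 0; i < out_len; ++i) {
        const double pos = static_cast<double>(i) * step; // position in input samples
        const std::size_t i0 = static_cast<std::size_t>(pos);
        const std::size_t i1 = (i0 + 1 < input.size()) ? i0 + 1 : i0; // clamp at the end
        const float frac = static_cast<float>(pos - static_cast<double>(i0));
        output[i] = input[i0] * (1.0f - frac) + input[i1] * frac;
    }
    return output;
}
```

For a 48 kHz stereo capture this would be called as resample_linear(downmix_to_mono(frames), 48000, kWhisperSampleRate). Linear interpolation can alias when downsampling, so a filtered resampler is probably the better choice for real use.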

Please note that due to personal circumstances, I will be away this weekend.

Note that the stats were carried over from a VoIP system and aren't fully relevant: https://github.com/V-Sekai/v-sekai.whisper/blob/main/src/speech_processor.cpp#L335-L360

Previously, the pipeline ran from audio effect to speech processor to network to audio output.

The current design could be audio effect to speech processor, where instead of packaging the audio for transmission we pass it to the whisper.cpp model and get text back.
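A minimal sketch of that flow, under some assumptions: the speech processor buffers incoming mono samples and, once a full window is available, hands it to a transcriber callback rather than to the network code. SpeechToTextBuffer and its Transcriber callback are hypothetical names, not existing classes in the repository. In a real build the callback would presumably wrap a call such as whisper.cpp's whisper_full, which is left out here so the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the speech processor. It collects mono samples
// and passes each full window to a transcriber instead of packaging the
// audio for transmission.
class SpeechToTextBuffer {
public:
    // Assumption: the transcriber takes one window of samples and returns text.
    using Transcriber = std::function<std::string(const std::vector<float> &)>;

    SpeechToTextBuffer(std::size_t window_samples, Transcriber transcriber)
        : window_samples_(window_samples), transcriber_(std::move(transcriber)) {
        assert(window_samples_ > 0);
    }

    // Called by the audio effect with each processed chunk.
    // Returns one text result per window completed by this chunk.
    std::vector<std::string> push(const std::vector<float> &chunk) {
        pending_.insert(pending_.end(), chunk.begin(), chunk.end());
        std::vector<std::string> results;
        const auto window_len = static_cast<std::ptrdiff_t>(window_samples_);
        while (pending_.size() >= window_samples_) {
            std::vector<float> window(pending_.begin(), pending_.begin() + window_len);
            pending_.erase(pending_.begin(), pending_.begin() + window_len);
            results.push_back(transcriber_(window));
        }
        return results;
    }

    // Number of samples still waiting for a full window.
    std::size_t pending_samples() const { return pending_.size(); }

private:
    std::size_t window_samples_;
    Transcriber transcriber_;
    std::vector<float> pending_;
};
```

Since the addon is meant to run transcription on a separate thread, the callback would likely enqueue the window for a worker rather than block the audio thread. That part is not shown here.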

The part that prints the whisper.cpp output is here: https://github.com/V-Sekai/v-sekai.whisper/blob/main/src/speech.cpp#L90-L95

There is a lot of unused code that probably needs to be cleaned up.