Integration Design of Whisper GGML Data Model

Question


fire opened this issue 8 months ago

K. S. Ernest (iFire) Lee commented 8 months ago

Current Status

Following the recent discussion, the symbols now appear to be exported correctly. However, it is still unclear how certain components are supposed to work and how they should be implemented.

The missing piece in the current implementation is the audio data stream input (whether it is mono or stereo is still undecided). This stream needs to be sent to whisper.cpp. After the core algorithm is wrapped, the audio must be resampled to the format whisper.cpp expects, which is 16 kHz mono 32-bit float PCM. The audio effect can then be attached to a microphone or a speech recording to output text. In the proposed design, the audio effect should have an accessor to the Whisper GGML model data, exposed as a GGUF resource.
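A minimal sketch of the downmix-and-resample step, assuming the engine delivers left/right float frames at its mix rate. `StereoFrame` is a hypothetical stand-in for something like Godot's `AudioFrame`, and the function names are illustrative, not taken from the module. The channels are averaged to mono, then linear interpolation brings the rate to 16 kHz. Linear interpolation is a simple swapped-in technique, not necessarily what the module will use; a real implementation should low-pass filter before downsampling to avoid aliasing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an engine stereo frame (e.g. Godot's AudioFrame).
struct StereoFrame {
    float left;
    float right;
};

// whisper.cpp expects mono 32-bit float PCM at 16 kHz (WHISPER_SAMPLE_RATE).
constexpr int kWhisperSampleRate = 16000;

// Average the left and right channels into a single mono channel.
std::vector<float> downmix_to_mono(const std::vector<StereoFrame>& frames) {
    std::vector<float> mono;
    mono.reserve(frames.size());
    for (const StereoFrame& f : frames) {
        mono.push_back(0.5f * (f.left + f.right));
    }
    return mono;
}

// Linear-interpolation resampler. Not band-limited: downsampling without a
// low-pass filter first can alias, so treat this as a placeholder.
std::vector<float> resample_linear(const std::vector<float>& in, int src_rate, int dst_rate) {
    assert(src_rate > 0 && dst_rate > 0 && "sample rates must be positive");
    std::vector<float> out;
    if (in.empty()) {
        return out;
    }
    const std::size_t out_len = static_cast<std::size_t>(
        static_cast<long long>(in.size()) * dst_rate / src_rate);
    out.reserve(out_len);
    const double step = static_cast<double>(src_rate) / dst_rate;
    for (std::size_t i = 0; i < out_len; ++i) {
        const double pos = static_cast<double>(i) * step;
        // Clamp indices so the last output sample never reads past the input.
        const std::size_t i0 = std::min(static_cast<std::size_t>(pos), in.size() - 1);
        const std::size_t i1 = std::min(i0 + 1, in.size() - 1);
        const double frac = pos - static_cast<double>(i0);
        out.push_back(static_cast<float>(in[i0] * (1.0 - frac) + in[i1] * frac));
    }
    return out;
}

// Engine stereo frames at the mix rate -> mono 16 kHz samples for whisper.cpp.
std::vector<float> to_whisper_input(const std::vector<StereoFrame>& frames, int mix_rate) {
    return resample_linear(downmix_to_mono(frames), mix_rate, kWhisperSampleRate);
}
```

For example, 4800 frames captured at 48 kHz (0.1 s) become 1600 mono samples, which could then be handed to whisper.cpp.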

Please note that due to personal circumstances, I will be away this weekend.

K. S. Ernest (iFire) Lee · Answer 1 · Sat Nov 25 2023 22:51:59 GMT+0800 (China Standard Time)

Note that the stats in the linked code come from the earlier VoIP system and aren't fully relevant here: https://github.com/V-Sekai/v-sekai.whisper/blob/main/src/speech_processor.cpp#L335-L360

Previously, the pipeline was: audio effect → speech processor → network → audio output.

The new design could be: audio effect → speech processor → whisper.cpp model → text. Instead of packaging the audio for transmission, the speech processor would feed it into the whisper.cpp model and get text back.
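The two designs differ only in where the speech processor sends its output. A minimal sketch of that idea, in which every name is hypothetical (`SpeechProcessor`, `NetworkSink`, `TranscriptionSink` and `WhisperModelResource` are illustrative, not the classes in v-sekai.whisper): the sink is swappable, and the transcription sink holds a shared reference to the model resource, in line with the accessor idea from the original post. The actual whisper.cpp inference call is deliberately left out.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical handle to the Whisper GGML model data, loaded as an engine resource.
struct WhisperModelResource {
    std::string path;  // model file path (assumed, e.g. "ggml-base.en.bin")
};

// A sink consumes audio chunks (mono float samples) emitted by the speech processor.
using AudioSink = std::function<void(const std::vector<float>&)>;

// Hypothetical speech processor: capture and processing stay the same, only
// the sink it forwards to changes between the VoIP and transcription designs.
class SpeechProcessor {
public:
    explicit SpeechProcessor(AudioSink sink) : sink_(std::move(sink)) {
        assert(sink_ && "a sink is required");
    }
    void push_chunk(const std::vector<float>& chunk) { sink_(chunk); }

private:
    AudioSink sink_;
};

// Previous design (sketch): queue each chunk as a packet for the network.
struct NetworkSink {
    std::shared_ptr<std::vector<std::vector<float>>> packets;
    void operator()(const std::vector<float>& chunk) const { packets->push_back(chunk); }
};

// New design (sketch): accumulate samples for whisper.cpp. The real inference
// call (e.g. whisper_full) is intentionally omitted; this only buffers audio.
struct TranscriptionSink {
    std::shared_ptr<const WhisperModelResource> model;  // accessor to the model
    std::shared_ptr<std::vector<float>> pending;
    void operator()(const std::vector<float>& chunk) const {
        pending->insert(pending->end(), chunk.begin(), chunk.end());
    }
};
```

Keeping the sink behind a `std::function` means the capture side does not need to know whether audio ends up on the network or in the model.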

K. S. Ernest (iFire) Lee · Answer 2 · Sat Nov 25 2023 22:55:39 GMT+0800 (China Standard Time)

The code that prints the whisper.cpp output is here: https://github.com/V-Sekai/v-sekai.whisper/blob/main/src/speech.cpp#L90-L95

There is a lot of unused code that probably needs to be cleaned up.