Purfview / whisper-standalone-win

Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

[INFO] Does it include WhisperX strategies too?

gericho opened this issue · comments

Regarding the title, could you kindly let us know if it includes WhisperX strategies as well? Compiling WhisperX.exe on Windows for automatic transcription in Subtitle Edit can be difficult for an average user. Thank you!

Is it even compatible with Windows? What do you mean by "WhisperX strategies"?

Correct me if I'm wrong (and I'm almost sure I am), but WhisperX seems to be more accurate because of its internal pipeline structure (see the pipeline diagram on their GitHub page). They report the following:

- ⚡️ Batched inference for 70x realtime transcription using whisper large-v2
- 🪶 [faster-whisper](https://github.com/guillaumekln/faster-whisper) backend, requires <8GB gpu memory for large-v2 with beam_size=5
- 🎯 Accurate word-level timestamps using wav2vec2 alignment
- 👯‍♂️ Multispeaker ASR using speaker diarization from [pyannote-audio](https://github.com/pyannote/pyannote-audio) (speaker ID labels)
- 🗣️ VAD preprocessing, reduces hallucination & batching with no WER degradation

At the bottom, they show how to install it on Windows using Python/Conda. Unfortunately, Subtitle Edit asks for an EXE file. Currently I'm using the standalone faster-whisper.exe, and I wanted to give you my compliments: with the large-v3 model, the transcription is almost perfect!

I'm just asking if there is room for improvement in the transcription.

...seems WhisperX is more accurate...

Then why do I see posts saying it's less accurate?
I guess you are asking about wav2vec alignment; it's not currently implemented.
Related question -> #174
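For context on what wav2vec2 alignment improves: without an external aligner, per-word timestamps are roughly interpolated across a segment's duration. A toy sketch of that naive spacing (purely illustrative; this is not WhisperX's or faster-whisper's actual method):

```python
def naive_word_times(words, seg_start, seg_end):
    """Spread word timestamps across a segment proportionally to word
    length -- a crude stand-in for real forced alignment (illustrative)."""
    total_chars = sum(len(w) for w in words)
    span = seg_end - seg_start
    times, t = [], seg_start
    for w in words:
        dur = span * len(w) / total_chars
        times.append((w, round(t, 2), round(t + dur, 2)))
        t += dur
    return times

print(naive_word_times(["hello", "world"], 0.0, 2.0))
# → [('hello', 0.0, 1.0), ('world', 1.0, 2.0)]
```

An aligner like wav2vec2 instead matches each word against the acoustic signal, so pauses and stretched syllables get correct boundaries rather than proportional guesses.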

I'm just asking if there is room for improvement in the transcription.

There always is room for improvement.
You can use Standalone Faster-Whisper-XXL and --ff_mdx_kim2 option.
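Since Subtitle Edit simply shells out to the executable, the same invocation can be scripted. A minimal sketch of building such a command line — the executable name, audio path, and flags other than `--ff_mdx_kim2` are assumptions, so check `--help` for the real option names:

```python
import shutil
import subprocess

def build_command(exe="faster-whisper-xxl.exe", audio="input.wav",
                  model="large-v3", mdx_filter=True):
    """Assemble a CLI invocation; --ff_mdx_kim2 enables the vocal-isolation
    filter mentioned above (all other flags here are assumptions)."""
    cmd = [exe, audio, "--model", model, "--output_format", "srt"]
    if mdx_filter:
        cmd.append("--ff_mdx_kim2")
    return cmd

cmd = build_command()
print(" ".join(cmd))

# Only launch the transcription if the executable is actually on PATH.
if shutil.which(cmd[0]):
    subprocess.run(cmd, check=True)
```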

Then why do I see posts saying it's less accurate?

Good to know! Thank you! I'll try the XXL and follow the current test posts.