Purfview / whisper-standalone-win

Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

Support for WhisperX standalone

hlevring opened this issue

Would you consider creating a WhisperX standalone CLI? WhisperX also uses the faster-whisper backend afaik, but provides far better word-level timestamps.

> ...but provides far better word-level timestamps.

I don't think I noticed much better timestamps when I tested WhisperX v2. And I've seen posts saying that WhisperX's timestamps are worse.

Can you provide the srt and json files made by WhisperX for this test file: https://we.tl/t-SHDM4cGKvB ?

I looked at its Issues, and at the top there is an issue with alignment: m-bain/whisperX#670

I've tested the audio file from the post in that issue. While wav2vec alignment is a bit more precise on some lines, there are other lines where it's veeery inaccurate.
Here are srt files to compare:

McDonalds_LNG_061019.WhisperX.zip
McDonalds_LNG_061019.Standalone Faster-Whisper.zip
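
For anyone who wants to put numbers on such a comparison, here is a minimal Python sketch that parses two srt files and lists the start-time differences between them. The function names `parse_srt` and `start_offsets` are made up for this illustration and are not part of either tool. It assumes both files contain the same cues in the same order, which will often not hold because the two tools can segment lines differently, so treat the output as a rough indication only.

```python
import re

# SRT timestamp line, e.g. "00:00:01,000 --> 00:00:02,500"
TIME_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, cue_text) tuples."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIME_RE.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                # Everything after the timestamp line is the cue text.
                cues.append((start, end, " ".join(lines[i + 1:]).strip()))
                break
    return cues


def start_offsets(cues_a, cues_b):
    """Pair cues by index (naive assumption) and return start differences b - a in seconds."""
    return [round(b[0] - a[0], 3) for a, b in zip(cues_a, cues_b)]
```

To use it, read each unzipped srt file into a string, pass both through `parse_srt`, and hand the two lists to `start_offsets`. Large values on individual lines would point to the lines where the two timestamp methods disagree most.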

> Would you consider creating a WhisperX standalone CLI?

No. There's a lot of stuff crammed in, many compatibility issues... too much bother for me.

Maybe I could incorporate wav2vec alignment alone from it, but if it has bigger issues than Whisper's original timestamps then maybe it's not worth it...

It's been a while since I tested, so it would have made way more sense if I had opened the ticket with some proper documentation.

Something definitely goes wrong for whisperx with that file. I will have a closer look at the test file and run some more tests against whisperx next week when I have some time off.

> Maybe I could incorporate wav2vec alignment alone from it, but if it has bigger issues than Whisper's original timestamps then maybe it's not worth it...

Definitely agree. I'm going to run a small batch of tests with both English and a couple of other languages.

I think whisperx drastically reduces hallucinations. You can test audio files that hallucinate with whisper.

> I think whisperx drastically reduces hallucinations. You can test audio files that hallucinate with whisper.

I think Standalone Faster-Whisper reduces it even more than WhisperX. 😉

Moved to #203.