Support for WhisperX standalone
hlevring opened this issue
Would you consider creating a standalone WhisperX CLI? As far as I know, WhisperX also uses the faster-whisper backend, but it provides far better word-level timestamps.
> ...but provides far better word-level timestamps.
I don't think I noticed much better timestamps when I tested WhisperX v2, and I've seen posts saying that WhisperX's timestamps are worse.
Can you provide the SRT and JSON files WhisperX produces for this test file? https://we.tl/t-SHDM4cGKvB
I looked at its issues, and at the top there is one about alignment: m-bain/whisperX#670
I've tested the audio file from that issue: while wav2vec alignment is a bit more precise on some lines, on other lines it's veeery inaccurate.
Here are the SRT files to compare:
McDonalds_LNG_061019.WhisperX.zip
McDonalds_LNG_061019.Standalone Faster-Whisper.zip
> Would you consider creating a standalone WhisperX CLI?

No. There's a lot of stuff crammed in and many compatibility issues... too much for me to bother with.
Maybe I could incorporate just the wav2vec alignment from it, but if it has bigger issues than Whisper's original timestamps, then it may not be worth it...
It's been a while since I tested, so it would have made much more sense to open the ticket with some proper documentation.
Something definitely goes wrong with WhisperX on that file. I'll take a closer look at the test file and run some more tests against WhisperX next week when I have some time off.
> Maybe I could incorporate wav2vec alignment alone from it, but if it has bigger issues than Whisper's original timestamps then maybe it's not worth it...
Definitely agree. I'm going to run a small batch of tests in English and a couple of other languages.
I think WhisperX drastically reduces hallucinations. You can test it with audio files that make Whisper hallucinate.

> I think WhisperX drastically reduces hallucinations. You can test it with audio files that make Whisper hallucinate.
I think Standalone Faster-Whisper reduces it even more than WhisperX. 😉