huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.


What are the settings used for WER calculation in the paper?

hidoba opened this issue · comments

Did you compare Whisper-large-v2 and Distil-Whisper with the Transformers default settings (beam size = 1, temperature = 1, do_sample = False)?

What would the difference be if you had used the OpenAI settings (beam size = 5)?

Yes, we evaluated using greedy search with no sampling. For beam size = 5, we see the following (the absolute WER reduction vs. greedy is shown in parentheses):

Whisper-Large-v2 with num_beams=5

  • CHIME-4: 11.8 (-0.0 WER abs)
  • Earnings-22: 16.0 (-0.6 WER abs)
  • FLEURS: 3.9 (-0.3 WER abs)
  • SPGISpeech: 3.3 (-0.5 WER abs)

Distil-Whisper with num_beams=5

  • CHIME-4: 13.4 (-0.6 WER abs)
  • Earnings-22: 16.4 (-0.5 WER abs)
  • FLEURS: 6.1 (-0.2 WER abs)
  • SPGISpeech: 3.2 (-0.1 WER abs)
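
If you want to reproduce the comparison yourself, here is a minimal sketch of running both decoding strategies with Transformers. The model ID and the dummy audio clip are just placeholders for illustration, not the exact evaluation script used for the paper:

```python
# Minimal sketch: greedy search vs. beam search (num_beams=5) with Transformers.
# The model ID and the dummy audio sample below are placeholders, not the
# paper's evaluation setup.
import torch
from datasets import load_dataset
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model_id = "distil-whisper/distil-large-v2"  # or "openai/whisper-large-v2"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device)

# Load a short validation clip as example audio
sample = load_dataset(
    "hf-internal-testing/librispeech_asr_dummy", "clean", split="validation"
)[0]["audio"]
inputs = processor(
    sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
)
input_features = inputs.input_features.to(device)

# Greedy search, no sampling (the Transformers defaults used for the numbers above)
greedy_ids = model.generate(input_features, do_sample=False, num_beams=1)

# Beam search with 5 beams (the OpenAI-style setting discussed above)
beam_ids = model.generate(input_features, do_sample=False, num_beams=5)

print("greedy:", processor.batch_decode(greedy_ids, skip_special_tokens=True)[0])
print("beam-5:", processor.batch_decode(beam_ids, skip_special_tokens=True)[0])
```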

Relative speed-up of Distil-Whisper over Whisper-Large-v2 for increasing batch size (bsz):

  • bsz=1: 5.5x
  • bsz=4: 5.21x
  • bsz=16: 3.01x

=> the speed-ups are very similar to what we achieved without beam search.
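
On the WER question in the title: the thread does not spell out the exact text normalisation, but a common setup is to apply the Whisper English text normaliser to both references and predictions before scoring. A minimal sketch of that recipe (an assumption about a typical pipeline, not the paper's exact script):

```python
# Minimal sketch of WER computation with Whisper-style text normalisation.
# This illustrates one common setup, not necessarily the paper's exact script.
import evaluate
from transformers.models.whisper.english_normalizer import EnglishTextNormalizer

wer_metric = evaluate.load("wer")
normalizer = EnglishTextNormalizer(english_spelling_mapping={})

references = ["The quick brown fox jumps over the lazy dog."]
predictions = ["the quick brown fox jumped over the lazy dog"]

# Normalise both sides (casing, punctuation, number formatting, etc.)
norm_refs = [normalizer(r) for r in references]
norm_preds = [normalizer(p) for p in predictions]

wer = 100 * wer_metric.compute(references=norm_refs, predictions=norm_preds)
print(f"WER: {wer:.2f}%")
```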