Very high WER % in extensive benchmark on Fleurs
asusdisciple opened this issue
On which dataset exactly did you evaluate the model? I benchmarked this model on the original Fleurs dataset, along with all the other Whisper implementations. It performed far worse, with a WER of 1.5 compared to ~0.46 for the original Whisper. Did I make an implementation error?
Here is how I initialize the model with temp=0, beams=1, do_sample=True:
import logging

from transformers import pipeline

# `parameters` and `device` are defined elsewhere in my script.
model = "distil-whisper/distil-large-v2"
obj = pipeline(model=model,
               torch_dtype=parameters["torch_dtype"],
               device=device,  # or "mps" for Mac devices
               chunk_length_s=15,
               batch_size=parameters["batch_size"],
               return_timestamps=False,
               model_kwargs={"use_flash_attention_2": parameters["flash"]},
               generate_kwargs={"task": "transcribe",
                                "num_beams": parameters["beam_size"],
                                "temperature": parameters["temperature"],
                                "do_sample": parameters["do_sample"]})
if not parameters["flash"]:
    # Fall back to BetterTransformer (via optimum) when Flash Attention 2
    # is not available.
    logging.debug("Using Better Transformers without Flash Attention")
    obj.model = obj.model.to_bettertransformer()
else:
    logging.debug("Using Flash Attention 2")
This is how I call the transcription:
tmp = obj(audiopath,  # call the pipeline object, not the checkpoint id string
          generate_kwargs={"language": lang}
          )
res = [i["text"] for i in tmp]
Hey @asusdisciple - what language were you using? It would be really helpful to have a reproducible end-to-end script I can use to get the same results that you're reporting.
We use the script run_eval.py with the following launch command: https://github.com/huggingface/distil-whisper/blob/main/training/flax/evaluation_scripts/test/run_distilled.sh
If you execute this, you'll get the results quoted in the paper.
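For a quick sanity check without the full Flax setup, here is a minimal sketch of the same kind of evaluation using the datasets and evaluate libraries. This is not run_eval.py itself; the dataset id google/fleurs, the en_us config, the transcription field, and the cuda:0 device are assumptions about the Hub version of FLEURS and a typical GPU setup:

import torch
from datasets import load_dataset
from evaluate import load
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v2",
    torch_dtype=torch.float16,
    device="cuda:0",  # assumption: a single CUDA GPU is available
)

# FLEURS English test split from the Hub.
dataset = load_dataset("google/fleurs", "en_us", split="test")
wer_metric = load("wer")

predictions, references = [], []
for sample in dataset:
    out = pipe(sample["audio"])
    predictions.append(out["text"])
    references.append(sample["transcription"])

# WER is returned as a fraction; multiply by 100 for a percentage.
print(100 * wer_metric.compute(predictions=predictions, references=references))

Note that, as far as I can tell, the paper's numbers are computed on normalized text (Whisper's English normalizer), so an unnormalized loop like this will typically report a somewhat higher WER.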
I am sorry, I just overlooked that distil-whisper is English-only (even though it performs very well on a few other languages as well).
Hey @asusdisciple, no worries! If you're interested in training Whisper on a different language, you can leverage the training code under distil-whisper/training. I recommend first setting up a baseline using these instructions: https://huggingface.co/sanchit-gandhi/distil-whisper-large-v3-de-kd#training-procedure
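As a rough sketch, a checkpoint distilled this way can be used exactly like the English ones; the model id below is the German baseline from the link above, and the audio path is a hypothetical placeholder:

from transformers import pipeline

# German distilled baseline from the model card linked above.
pipe = pipeline(
    "automatic-speech-recognition",
    model="sanchit-gandhi/distil-whisper-large-v3-de-kd",
)

# "sample_de.wav" is a placeholder path for illustration.
print(pipe("sample_de.wav", generate_kwargs={"language": "german"})["text"])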