huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Errors running `run_long_form_eval.py`

guynich opened this issue

I am following the long-form section of training/README.md: https://github.com/huggingface/distil-whisper/blob/main/training/README.md#long-form

I hit two errors when running the example bash script for the TED-LIUM validation set.

  1. With the bash script line `--dataset_config_name "all" \`, I get this error:
ValueError: BuilderConfig 'all' not found. Available: ['default']

  2. After changing the bash script line to `--dataset_config_name "default" \` to work around 1), I then see this error:

File "/home/ubuntu/distil-whisper-large-v2-hi/run_long_form_eval.py", line 578, in eval_step
    eval_labels.append(sample["reference"][0])
KeyError: 'reference'

The model card for TED-LIUM does not mention a reference key.
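
For anyone hitting the same two errors, here is a minimal sketch of a defensive workaround. It is not the fix applied in #101 or #103. Both helper functions (`pick_config_name`, `find_text_column`) are hypothetical and not part of `run_long_form_eval.py`, and the candidate column names other than `reference` are guesses, so check the dataset's actual features first. The list of available configs can be obtained with `datasets.get_dataset_config_names`; it is hard-coded here from the error message so the sketch runs offline.

```python
from typing import Mapping, Sequence


def pick_config_name(requested: str, available: Sequence[str]) -> str:
    """Return `requested` if the dataset offers it; otherwise fall back to
    the single available config, or raise if the choice is ambiguous."""
    if requested in available:
        return requested
    if len(available) == 1:
        return available[0]
    raise ValueError(
        f"BuilderConfig {requested!r} not found. Available: {list(available)}"
    )


def find_text_column(
    sample: Mapping[str, object],
    # Assumption: these candidate names are illustrative guesses only.
    candidates: Sequence[str] = ("reference", "text", "transcription", "sentence"),
) -> str:
    """Return the first candidate key present in `sample`."""
    for name in candidates:
        if name in sample:
            return name
    raise KeyError(
        f"None of {list(candidates)} found in sample keys {sorted(sample)}"
    )


# Config list taken from the ValueError above.
config = pick_config_name("all", ["default"])  # falls back to "default"

# Hypothetical batched sample that has no "reference" key.
sample = {"audio": [None], "text": ["hello world"]}
column = find_text_column(sample)
label = sample[column][0]  # mirrors sample["reference"][0] in eval_step
```

Resolving the column name once, before the evaluation loop, gives a readable error up front instead of a `KeyError` deep inside `eval_step`.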

The config name is fixed in #103, and the reference error is fixed in #101!