krylm / whisper-event-tuning

Final training script from the Hugging Face Whisper Fine-Tuning Event, used to get the best results from the fine-tuned model.

Whisper Fine-Tuning Event 2022 - script modifications

This is the final setup used to train the best fine-tuned Whisper model during the Hugging Face Fine-Tuning Event 2022.

DeepSpeed

The first modification was to use DeepSpeed, which makes a larger batch_size possible without relying on gradient_accumulation_steps.

To make it run inside Docker, I used the guide from Zihao's blog post.
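As a rough illustration (not the exact configuration used in this repo), DeepSpeed can be enabled through the Hugging Face Trainer by pointing the training arguments at a DeepSpeed JSON config; the file name ds_config.json and the numbers below are placeholders:

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical values - the real batch size and config depend on the GPU setup.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-finetuned",
    per_device_train_batch_size=64,   # larger batch made possible by DeepSpeed ZeRO
    gradient_accumulation_steps=1,    # no gradient accumulation needed
    fp16=True,
    deepspeed="ds_config.json",       # path to the DeepSpeed config file
)
```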

Concatenation of the input dataset

The idea came from Bayar. Whisper works on 30-second input segments, but each Common Voice sample contains only around 3-5 seconds of audio. We can concatenate the audio and text of several samples into fewer, longer samples, so the model learns from denser data: training runs faster and the model gets more signal from each sample. A sketch of the idea follows below.
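A minimal sketch of the concatenation step, assuming 16 kHz Common Voice audio loaded with the datasets library; the 30-second budget matches Whisper's input window, and the function and column names here are illustrative, not the exact code in this repo:

```python
import numpy as np

MAX_SECONDS = 30
SAMPLE_RATE = 16_000
MAX_SAMPLES = MAX_SECONDS * SAMPLE_RATE

def concatenate_examples(batch):
    """Greedily merge short audio/text pairs until the 30-second window is full."""
    out_audio, out_text = [], []
    cur_audio, cur_text, cur_len = [], [], 0
    for audio, text in zip(batch["audio"], batch["sentence"]):
        arr = audio["array"]
        # Flush the current group if adding this clip would exceed 30 seconds.
        if cur_audio and cur_len + len(arr) > MAX_SAMPLES:
            out_audio.append(np.concatenate(cur_audio))
            out_text.append(" ".join(cur_text))
            cur_audio, cur_text, cur_len = [], [], 0
        cur_audio.append(arr)
        cur_text.append(text)
        cur_len += len(arr)
    if cur_audio:
        out_audio.append(np.concatenate(cur_audio))
        out_text.append(" ".join(cur_text))
    return {"audio_array": out_audio, "sentence": out_text}

# Applied with a batched map so several short clips can be merged per call:
# dataset = dataset.map(concatenate_examples, batched=True, batch_size=32,
#                       remove_columns=dataset.column_names)
```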

Other ideas

Based on details about how the Large-v2 model was trained in the Whisper paper, I have a few ideas to try as next steps.

Thanks for the Whisper Fine-Tuning Event 2022.
