alumae / pl-whisper-finetuner

Whisper finetuning with PyTorch Lightning

Whisper Finetuning with PyTorch Lightning

This code implements finetuning of OpenAI Whisper models using PyTorch Lightning. Most of the code is inspired by (and partly copied directly from) whisper-finetuning. However, since this code is based on PyTorch Lightning, it also supports multi-GPU training. It also supports training with SpecAugment (based on the implementation in ESPnet).
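For orientation, SpecAugment masks random time and frequency bands of the input spectrogram during training. The sketch below is not taken from this repository (which follows the ESPnet implementation); it is a minimal NumPy illustration of the basic masking idea, with hypothetical function and parameter names:

```python
import numpy as np

def spec_augment(spec, num_time_masks=2, num_freq_masks=2,
                 max_time_width=40, max_freq_width=30, rng=None):
    """Illustrative SpecAugment-style masking for a (n_mels, n_frames)
    log-mel spectrogram. Masked bands are zeroed out; the ESPnet
    implementation used by this repo has more options (e.g. time warping)."""
    rng = rng or np.random.default_rng()
    out = spec.copy()
    n_mels, n_frames = out.shape
    # Frequency masking: zero out a few random bands of mel channels.
    for _ in range(num_freq_masks):
        w = int(rng.integers(0, max_freq_width + 1))
        f0 = int(rng.integers(0, max(1, n_mels - w + 1)))
        out[f0:f0 + w, :] = 0.0
    # Time masking: zero out a few random spans of frames.
    for _ in range(num_time_masks):
        w = int(rng.integers(0, max_time_width + 1))
        t0 = int(rng.integers(0, max(1, n_frames - w + 1)))
        out[:, t0:t0 + w] = 0.0
    return out
```

Applied to each training example's spectrogram, this forces the model to rely less on any single frequency band or time span, which typically improves robustness when finetuning on small datasets.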

The finetuning method implemented here (and also the one in whisper-finetuning) is quite different from the finetuning code in HuggingFace Transformers and ESPnet, and is closer to the training method that was actually used for training Whisper (according to the paper): finetuning is done on 30-second chunks extracted from long audio recordings, with intra-utterance timestamps. Therefore, the resulting model works very well for transcribing long recordings, using e.g. faster-whisper.
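To illustrate what a timestamped training target looks like, the sketch below builds a Whisper-style target string for one chunk, with each utterance wrapped in timestamp markers relative to the chunk start, quantized to Whisper's 0.02-second resolution. This is an assumption-laden illustration of the general format described in the Whisper paper, not code from this repository; the function name and segment dictionary layout are hypothetical:

```python
def format_chunk_target(segments, chunk_start):
    """Build a training target for one 30-second chunk (illustrative).
    Each segment dict is assumed to have absolute "start"/"end" times
    in seconds and a "text" field. Timestamps are made relative to the
    chunk start and snapped to Whisper's 0.02 s timestamp grid."""
    parts = []
    for seg in segments:
        start = round((seg["start"] - chunk_start) / 0.02) * 0.02
        end = round((seg["end"] - chunk_start) / 0.02) * 0.02
        parts.append(f"<|{start:.2f}|>{seg['text']}<|{end:.2f}|>")
    return "".join(parts)
```

Because the model is trained to emit these in-chunk timestamps, a decoder such as faster-whisper can slide a 30-second window over a long recording and use the predicted timestamps to decide where each window should advance to.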

Usage

TODO

About

License: MIT


Languages

Language: Python 100.0%