This is unofficial code for finetuning the Whisper model on your own dataset.

[Original Repo] [Example Colab]

This setup uses a small part of the LibriSpeech dataset to finetune the English model; alternatively, the Vivos dataset can be used to finetune the Vietnamese model. To finetune on another dataset or another language, see "dataset.py". You can also change the hyperparameters by using a different setup file based on "config/vn_base_example.yaml". The path to the config file must be defined in .env.
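As a rough illustration of how a config path could be read from .env, here is a minimal sketch that uses only the standard library. The key name `CONFIG_PATH` and both helper functions are hypothetical; check ".env.copy" for the variable name this repository actually expects.

```python
from pathlib import Path


def read_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_config_path(env: dict, default: str = "config/vn_base_example.yaml") -> str:
    """Return the config path from the parsed .env, or the example config.

    CONFIG_PATH is an assumed key name used for illustration only.
    """
    return env.get("CONFIG_PATH", default)
```

Calling `resolve_config_path(read_env_file())` would then give the YAML file to load; falling back to the example config is a design choice of this sketch, not necessarily the repository's behaviour.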

In an experiment on Vietnamese with the Vivos dataset, the WER of the base Whisper model dropped from 45.56% to 24.27% after 5 epochs of finetuning.

Python version: 3.8

Setup:

pip install -r requirements.txt
cp .env.copy .env

To finetune the model on Vietnamese, run these commands to download and unpack the dataset:

python data/download_data_vivos.py
tar -xvf vivos.tar.gz vivos
mv vivos data

Run the demo page with the command below; downloading the model will take a while:

streamlit run interface.py

To finetune (speech-to-text task only):

python finetune.py

To finetune Whisper for both tasks, STT and translation (e.g. using the Google API to translate Vietnamese text to English), see the example dataset at link

To evaluate the model:

python evaluate_wer.py
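For readers unfamiliar with the metric, WER (word error rate) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below shows that calculation in plain Python. It is an illustration only and is not necessarily how "evaluate_wer.py" computes the score; the function name is hypothetical, and no text normalization is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.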

To run inference:

You can record your own audio file and convert it from speech to text using "record.py" and "inference.py".

Todo list

Add a Python argument parser and refactor the code
Add a Dockerfile for deployment
Add Vietnamese text normalization / postprocessing
Add a Streamlit interface for recording and inference

drnic / openai_whisper_finetuning
