drnic / openai_whisper_finetuning

This is unofficial code for finetuning the Whisper model on your own dataset.

[Original Repo] [Example Colab]

This setup finetunes the English model on a small part of the LibriSpeech dataset; alternatively, you can finetune the Vietnamese model on the Vivos dataset. To finetune on another dataset or another language, see "dataset.py". You can also change the hyperparameters by writing your own config file based on "config/vn_base_example.yaml". The path to the config file must be defined in .env.
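As a rough illustration of the config lookup described above, here is a minimal sketch of reading a config path from the contents of a .env file using only the standard library. The key name `CONFIG_PATH` and the helper `parse_env` are hypothetical; check ".env.copy" for the variable name the project actually expects.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# CONFIG_PATH is a hypothetical key name used only for this illustration.
example = "# path to the training config\nCONFIG_PATH=config/vn_base_example.yaml\n"
settings = parse_env(example)
print(settings["CONFIG_PATH"])
```

In practice you would pass the text of your own .env file to the helper, or use a library such as python-dotenv if it is already among the project's requirements.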

In an experiment on Vietnamese with the Vivos dataset, the WER of the base Whisper model dropped from 45.56% to 24.27% after 5 epochs of finetuning.
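For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below shows the standard definition. It is not necessarily how "evaluate_wer.py" computes it, and it applies no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed prefix of ref
    # and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (b -> x) and one deletion (d) over 4 reference words.
print(word_error_rate("a b c d", "a x c"))
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words.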

Python version: 3.8

Setup:

pip install -r requirements.txt
cp .env.copy .env

To finetune the model on Vietnamese, run these commands to download the dataset:

python data/download_data_vivos.py
tar -xvf vivos.tar.gz vivos
mv vivos data

Run the demo page with the following command (downloading the model will take a while):

streamlit run interface.py

To finetune (speech-to-text task only):

python finetune.py

To finetune Whisper for both the STT and translation tasks (e.g. using the Google API to translate Vietnamese text to English), you can see the example dataset at link
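Whisper distinguishes the two tasks through special tokens at the start of the decoder sequence: `<|transcribe|>` keeps the output in the spoken language, while `<|translate|>` targets English. The sketch below builds that prefix as a plain string to show how a two-task dataset can pair the same audio with different targets. The helper name `decoder_prefix` is hypothetical, and whether this repository assembles the prefix this way is an assumption; in practice the Whisper tokenizer produces these tokens as ids.

```python
def decoder_prefix(language: str, task: str, timestamps: bool = False) -> str:
    """Build the special-token prefix that tells Whisper which task to run."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task!r}")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Training on plain text targets without timestamp tokens.
        tokens.append("<|notimestamps|>")
    return "".join(tokens)


# The same Vietnamese clip can yield two training examples:
# one with a Vietnamese target, one with an English target.
stt_prefix = decoder_prefix("vi", "transcribe")
translate_prefix = decoder_prefix("vi", "translate")
print(stt_prefix)
print(translate_prefix)
```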

To evaluate the model:

python evaluate_wer.py

To run inference:

You can record your own audio file and convert it from speech to text using "record.py" and "inference.py".

Todo list

  • Add python argument parser and refactor code
  • Add Dockerfile for deployment
  • Add Vietnamese text normalization / postprocessing
  • Add streamlit interface to record and run inference
