
WavPrompt

WavPrompt is a speech understanding framework that leverages the few-shot learning ability of a large-scale pretrained language model to perform speech understanding tasks.

Download datasets

Prepare the manifest of the dataset

Use the scripts in the wav2vec repository to generate manifest files (.tsv and .ltr) for LibriSpeech. The manifests of other datasets follow the same format as that of LibriSpeech. The .tsv file contains one extra column for class labels:

<root directory>
<relative path> <number of frames of the audio> <class label>
<relative path> <number of frames of the audio> <class label>

The .ltr file contains columns for class labels, prompts, and transcriptions (a minimal generation sketch follows the format examples below):

<class label> <prompt> <transcription>
<class label> <prompt> <transcription>
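
As an illustration of this layout, here is a minimal Python sketch that writes a .tsv/.ltr manifest pair. The root directory, file paths, frame counts, labels, prompts, and the tab delimiters used for the .ltr columns are assumptions made for the example; they are not taken from the official data-preparation scripts.

# Illustrative sketch only: paths, labels, prompts, and delimiters are assumptions.
root = "/data/librispeech100"  # hypothetical dataset root
examples = [
    # (relative path, number of audio frames, class label, prompt, transcription)
    ("speaker1/utt1.flac", 52480, "yes", "answer:", "YES I WILL BE THERE"),
    ("speaker2/utt2.flac", 48960, "no",  "answer:", "NO THANK YOU"),
]

with open("train.tsv", "w") as tsv, open("train.ltr", "w") as ltr:
    tsv.write(f"{root}\n")  # first line of the .tsv is the root directory
    for rel_path, n_frames, label, prompt, transcription in examples:
        tsv.write(f"{rel_path}\t{n_frames}\t{label}\n")     # extra class-label column
        ltr.write(f"{label}\t{prompt}\t{transcription}\n")  # label, prompt, transcription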

Set up the conda environment, fairseq, and the WavPrompt code:

git clone https://github.com/Hertin/WavPrompt.git
cd WavPrompt
./setup.sh

Train WavPrompt models

Train the WavPrompt models:

cd wavprompt
rf=8 # downsampling rate (reduction factor)
n_token=0 # if n_token > 0, keep only the first ${n_token} audio features and discard the rest
freeze_finetune_updates=0 # the wav2vec model is updated only after this number of updates
./run.sh --stage 10 --stop-stage 10 \
  --manifest-path "$(pwd)/manifest/librispeech100" --config-name "asr_pretraining" \
  --n-token ${n_token} --reduction-factor ${rf} --freeze_finetune_updates ${freeze_finetune_updates} \
  --save-dir $(pwd)/outputs/wavpromptlsp100rf${rf}ntok${n_token}
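
The --reduction-factor and --n-token options control how many audio embeddings the frozen language model receives. The rough Python sketch below illustrates that interaction on a (time, dim) feature matrix; the use of average pooling and the concrete shapes are assumptions for illustration, not the actual model code.

import numpy as np

def shorten_audio_features(features, rf, n_token):
    """Illustrative only: downsample encoder features by the reduction factor rf
    (average pooling over time here; the real model may reduce differently) and,
    if n_token > 0, keep only the first n_token embeddings."""
    t, d = features.shape
    t_trim = (t // rf) * rf                        # drop the ragged tail
    pooled = features[:t_trim].reshape(-1, rf, d).mean(axis=1)
    if n_token > 0:
        pooled = pooled[:n_token]
    return pooled

feats = np.random.randn(200, 768)                  # e.g. 200 frames of 768-dim features
print(shorten_audio_features(feats, rf=8, n_token=0).shape)  # -> (25, 768)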

Or submit the training slurm job:

sbatch train_wavprompt.slurm

Evaluate the WavPrompt models

Evaluate the WavPrompt model:

bash eval_wavprompt.slurm

Or submit the evaluation slurm job:

sbatch eval_wavprompt.slurm

If you find this project useful, please consider citing this work.

@article{gao2022wavprompt,
  title={WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models},
  author={Gao, Heting and Ni, Junrui and Qian, Kaizhi and Zhang, Yang and Chang, Shiyu and Hasegawa-Johnson, Mark},
  journal={arXiv preprint arXiv:2203.15863},
  year={2022}
}
