wav2vec2mdd

End-to-End Mispronunciation Detection via wav2vec2.0

We provide useful scripts for fine-tuning wav2vec2.0 on L2-ARCTIC (data processing, fine-tuning, and evaluation). The evaluation part comes from https://github.com/cageyoko/CTC-Attention-Mispronunciation

checkpoint/log

Install Requirements
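
The repository does not list exact dependencies. A minimal setup, assuming a standard editable fairseq install with audio support (the soundfile dependency is an assumption, not taken from this repo):

$ git clone https://github.com/pytorch/fairseq
$ cd fairseq
$ pip install --editable ./
$ pip install soundfile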

Fine-tune a pre-trained model with CTC

We provide scripts for fine-tuning wav2vec2.0 on L2-ARCTIC.

Prepare training data manifest

$ python l2_label.py /path/to/waves --dest /manifest/path 
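
The exact output of l2_label.py is not documented here; the sketch below assumes the standard fairseq wav2vec manifest layout (the speaker and file names are hypothetical). Each .tsv file starts with the audio root followed by "relative/path<TAB>sample-count" lines; each .phn file holds one space-separated phone sequence per utterance, in the same order; dict.phn.txt lists every phone label with a dummy count:

/manifest/path/train.tsv
    /path/to/waves
    NJS/wav/arctic_a0001.wav	55440

/manifest/path/train.phn
    sil dh ih s w aa z sil

/manifest/path/dict.phn.txt
    sil 1
    dh 1
    ih 1

A matching valid.tsv/valid.phn pair is needed for the valid_subset used during fine-tuning.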

Fine-tune a pre-trained model

Edit run.sh:

#!/bin/bash

export CUDA_VISIBLE_DEVICES=1 # GPU device ID
DATASET=/manifest/path

FAIRSEQ_PATH=/path/to/fairseq
valid_subset=valid
model_path=/path/to/pretrain_model.pt  # do not use finetuned model
config_dir=/path/to/config 

config_name=base_finetune # made by referring to https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/config/finetuning/base_10m.yaml
labels=phn
python3 $FAIRSEQ_PATH/fairseq_cli/hydra_train.py \
    distributed_training.distributed_port=0 \
    task.labels=$labels \
    task.data=$DATASET \
    dataset.valid_subset=$valid_subset \
    distributed_training.distributed_world_size=1 \
    model.w2v_path=$model_path \
    --config-dir $config_dir \
    --config-name $config_name

Then run:

$ sh run.sh
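
base_finetune.yaml is not shown in this repo; per the comment in run.sh, it is adapted from fairseq's base_10m.yaml. A trimmed, illustrative sketch of such a config (the values are base_10m.yaml-style defaults, not the authors' settings; fields set to ??? are overridden on the command line above):

# @package _group_
common:
  fp16: true
  log_format: json
  log_interval: 200
checkpoint:
  no_epoch_checkpoints: true
  best_checkpoint_metric: wer
task:
  _name: audio_pretraining
  data: ???
  normalize: false
  labels: phn
dataset:
  num_workers: 6
  max_tokens: 3200000
  skip_invalid_size_inputs_valid_test: true
  valid_subset: valid
distributed_training:
  distributed_world_size: 1
criterion:
  _name: ctc
  zero_infinity: true
optimization:
  max_update: 13000
  lr: [0.00005]
  sentence_avg: true
  update_freq: [4]
model:
  _name: wav2vec_ctc
  w2v_path: ???
  apply_mask: true
  mask_prob: 0.65
  mask_channel_prob: 0.25
  mask_channel_length: 64
  layerdrop: 0.1
  freeze_finetune_updates: 10000
optimizer:
  _name: adam
  adam_betas: (0.9,0.98)
  adam_eps: 1e-08
lr_scheduler:
  _name: tri_stage
  phase_ratio: [0.1, 0.4, 0.5]
  final_lr_scale: 0.05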

Evaluating a CTC model

Clone the repository:

git clone https://github.com/cageyoko/CTC-Attention-Mispronunciation

Edit evaluate.sh:

#!/bin/bash

# Evaluating the CTC model
export CUDA_VISIBLE_DEVICES=0
DATASET=/manifest/path
FAIRSEQ_PATH=/path/to/fairseq

python3 $FAIRSEQ_PATH/examples/speech_recognition/infer.py $DATASET --task audio_pretraining \
--nbest 1 --path /path/to/checkpoints/checkpoint_best.pt --gen-subset test --results-path $DATASET --w2l-decoder viterbi \
--lm-weight 0 --word-score -1 --sil-weight 0 --criterion ctc --labels phn --max-tokens 640000

# Env 
export KALDI_ROOT=/path/to/kaldi
[ -f $KALDI_ROOT/tools/env.sh ] && . $KALDI_ROOT/tools/env.sh
export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$KALDI_ROOT/tools/irstlm/bin/:$PWD:$PATH
[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "The standard file $KALDI_ROOT/tools/config/common_path.sh is not present -> Exit!" && exit 1
. $KALDI_ROOT/tools/config/common_path.sh
export LC_ALL=C

# Calculate the MDD results
python3 result.py
align-text ark:ref.txt  ark:annotation.txt ark,t:- | wer_per_utt_details.pl > ref_human_detail
align-text ark:annotation.txt  ark:hypo.txt ark,t:- | wer_per_utt_details.pl > human_our_detail
align-text ark:ref.txt  ark:hypo.txt ark,t:- | wer_per_utt_details.pl > ref_our_detail
python3 ins_del_sub_cor_analysis.py
rm ref_human_detail human_our_detail ref_our_detail

Then run:

$ sh evaluate.sh >> result
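
For context on what ins_del_sub_cor_analysis.py reports: the three alignments above support the standard MDD hierarchy, in which each canonical phone (ref) is compared against the human annotation and the model output (hypo) to count true acceptances (TA), false rejections (FR), false acceptances (FA), and true rejections (TR). A minimal sketch of the usual summary metrics, as an illustration rather than the repository's actual script:

# Illustrative MDD summary metrics from hierarchy counts (not the repo's script).
# TA: correct phone accepted; FR: correct phone flagged as wrong;
# FA: mispronunciation missed; TR: mispronunciation detected
# (CD: detected mispronunciations that were also diagnosed correctly).
def mdd_metrics(ta, fr, fa, tr, cd=None):
    far = fa / (fa + tr) if fa + tr else 0.0        # false acceptance rate
    frr = fr / (fr + ta) if fr + ta else 0.0        # false rejection rate
    precision = tr / (tr + fr) if tr + fr else 0.0  # detection precision
    recall = tr / (tr + fa) if tr + fa else 0.0     # detection recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    out = {"FAR": far, "FRR": frr, "precision": precision, "recall": recall, "F1": f1}
    if cd is not None:
        out["diagnosis_accuracy"] = cd / tr if tr else 0.0  # within detected errors
    return out

print(mdd_metrics(ta=9000, fr=400, fa=300, tr=700, cd=500))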

What's more

We plan to extend the wav2vec2-based model to provide diagnostic information in the near future. Please stay tuned.
