raywu0123 / GAN_Harmonized_with_HMMs

Code: Completely Unsupervised Speech Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models

Home Page: https://arxiv.org/abs/1904.04100

GAN_Harmonized_with_HMMs

This is the implementation of our paper. In this paper, we propose an unsupervised speech (phoneme) recognition system that achieves a 33.1% phoneme error rate on TIMIT. The method uses a GAN-based model to perform unsupervised phoneme recognition, and further uses a set of HMMs that work in harmony with the GAN.
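
The 33.1% figure is a phoneme error rate (PER): the number of substituted, deleted, and inserted phonemes needed to turn the recognized sequence into the reference, divided by the number of reference phonemes. As a minimal, generic illustration (not the repository's own scoring code, and without the phoneme-set mapping a TIMIT evaluation would typically apply), PER can be computed from a Levenshtein distance:

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (unit cost per edit)."""
    # prev[j] = distance between the first i-1 reference phonemes and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r from the reference
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute (or match when r == h)
            )
        prev = cur
    return prev[-1]


def phoneme_error_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Corpus-level PER: total edits over total reference phonemes."""
    if len(refs) != len(hyps):
        raise ValueError("need one hypothesis per reference")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


# Toy labels for illustration only.
refs = [["sil", "hh", "ah", "l", "ow", "sil"]]
hyps = [["sil", "hh", "eh", "l", "sil"]]  # one substitution (ah->eh), one deletion (ow)
print(f"PER = {phoneme_error_rate(refs, hyps):.3f}")  # PER = 0.333
```

Errors are summed over the whole corpus before dividing, so long utterances weigh more than short ones, which is the usual way a single corpus-level PER is reported.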

How to use

Dependencies

  1. tensorflow 1.13

  2. kaldi

  3. srilm (can be built with kaldi/tools/install_srilm.sh)

  4. librosa

Data preprocessing

  • Usage:
  1. Modify path.sh with your Kaldi and SRILM paths.
  2. Modify config.sh with your code path and TIMIT path.
  3. Run $ bash preprocess.sh
  • This script extracts features and splits the dataset into train/test sets.

  • It also generates the data needed by the WFST decoder.

Train model

  • Usage:
  1. Modify the experimental setting in config.sh.
  2. Modify the GAN-based model's parameters in src/GAN-based-model/config.yaml.
  3. Run $ bash run.sh
  • This script contains the training flow for the GAN-based model and the HMMs.

  • The GAN-based model generates the transcriptions used to train the HMMs.

  • The HMMs refine the phoneme boundaries used to train the GAN-based model.
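
The alternating flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code in run.sh: train_gan and train_hmm_and_align are hypothetical placeholders for the repository's GAN and HMM stages, and the refined boundaries are assumed to be the frames where the HMM's frame-level phoneme label changes.

```python
from typing import Callable, List, Sequence

Transcripts = List[List[str]]   # one phoneme sequence per utterance
FrameLabels = List[List[str]]   # one phoneme label per frame, per utterance
Boundaries = List[List[int]]    # per utterance: start frame of each phoneme segment


def boundaries_from_alignment(frame_labels: Sequence[str]) -> List[int]:
    """Start frames of the segments in a frame-level label sequence.

    A segment starts at frame 0 and at every frame whose label differs from the previous one.
    """
    return [t for t, lab in enumerate(frame_labels)
            if t == 0 or lab != frame_labels[t - 1]]


def harmonized_training(
    boundaries: Boundaries,
    train_gan: Callable[[Boundaries], Transcripts],           # hypothetical stand-in
    train_hmm_and_align: Callable[[Transcripts], FrameLabels],  # hypothetical stand-in
    iterations: int,
) -> Boundaries:
    """Alternate GAN training and HMM training for a fixed number of iterations."""
    for _ in range(iterations):
        transcripts = train_gan(boundaries)            # GAN: boundaries -> transcriptions
        alignments = train_hmm_and_align(transcripts)  # HMMs: transcriptions -> frame labels
        boundaries = [boundaries_from_alignment(a) for a in alignments]  # refined boundaries
    return boundaries


# Toy stand-ins so the loop runs end to end; the real stages train actual models.
def fake_gan(bnds: Boundaries) -> Transcripts:
    return [["a", "b"] for _ in bnds]


def fake_hmm(transcripts: Transcripts) -> FrameLabels:
    return [["a", "a", "b", "b", "b"] for _ in transcripts]


print(harmonized_training([[0, 3]], fake_gan, fake_hmm, iterations=2))  # [[0, 2]]
```

Passing the two stages in as functions keeps the sketch focused on the data that flows between them: boundaries go into the GAN, transcriptions go into the HMMs, and refined boundaries come back out.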

Note

  • Training with boundaries generated by GAS (bnd_type=uns) is unstable and may need several training attempts to reach satisfactory performance.

Hyperparameters in config.sh

bnd_type : type of initial phoneme boundaries (orc/uns); uns uses the unsupervised boundaries generated by GAS.

setting : the matched or nonmatched case from our paper (match/nonmatch).

jobs : number of parallel jobs (depends on your machine).
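
Before launching run.sh, it can help to sanity-check these three values. The Python sketch below is a hypothetical helper, not part of the repository: it assumes config.sh sets them as plain shell assignments such as bnd_type=uns, and checks them against the allowed values listed above.

```python
import re
from typing import Dict, List

# Allowed values as listed in this README; this checker is a hypothetical helper.
ALLOWED = {"bnd_type": {"orc", "uns"}, "setting": {"match", "nonmatch"}}
ASSIGNMENT = re.compile(r"\s*(?:export\s+)?([A-Za-z_]\w*)=(\S*)")


def parse_config(text: str) -> Dict[str, str]:
    """Collect simple `key=value` shell assignments; comments and other lines are skipped."""
    values = {}
    for line in text.splitlines():
        match = ASSIGNMENT.match(line)
        if match:
            values[match.group(1)] = match.group(2).strip("'\"")
    return values


def validate(values: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the three settings look valid."""
    problems = []
    for key, allowed in ALLOWED.items():
        if values.get(key) not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}, got {values.get(key)!r}")
    jobs = values.get("jobs", "")
    if not jobs.isdigit() or int(jobs) < 1:
        problems.append(f"jobs must be a positive integer, got {jobs!r}")
    return problems


example = """
# hypothetical excerpt of config.sh
bnd_type=uns
setting=match
jobs=8
"""
print(validate(parse_config(example)))  # []
```

The parser only understands single-token values, which is enough for these three settings; anything more elaborate in a real config.sh would need a proper shell parser.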

Reference

Completely Unsupervised Speech Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models, Kuan-Yu Chen, Che-Ping Tsai et al.

Links

  1. The WFST decoder for the phoneme classifier.
  2. The training scripts for the unsupervised HMMs.

Acknowledgement

Special thanks to Che-Ping Tsai (jackyyy0228)!
