GAN_Harmonized_with_HMMs

This is the implementation of our paper. In it, we propose an unsupervised speech (phoneme) recognition system that achieves a 33.1% phoneme error rate on TIMIT. The method trains a GAN-based model to perform unsupervised phoneme recognition and uses a set of HMMs that work in harmony with the GAN.
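The 33.1% figure is a phoneme error rate (PER). As a rough illustration of how such a number is usually computed, the sketch below aligns each hypothesis with its reference using Levenshtein distance and divides the total number of edits by the total number of reference phonemes. It is not the repository's scoring script. The phone symbols in the example are made up, and the usual TIMIT step of mapping phones to a reduced set before scoring is omitted.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion of a reference phoneme
                curr[j - 1] + 1,          # insertion of a hypothesis phoneme
                prev[j - 1] + (r != h),   # substitution (or match if equal)
            )
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Corpus-level PER: total edits divided by total reference phonemes."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


if __name__ == "__main__":
    ref = "sil dh ax k ae t sil".split()
    hyp = "sil dh k ae ae t sil".split()
    # One deletion (ax) and one insertion (ae): 2 edits / 7 reference phonemes
    print(phoneme_error_rate([ref], [hyp]))
```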

How to use

Dependencies

tensorflow 1.13
kaldi
srilm (can be built with kaldi/tools/install_srilm.sh)
librosa
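Before running the scripts, it can help to confirm that everything is reachable. The sketch below is a hypothetical pre-flight check, not part of the repository: it looks up the Python packages with `importlib.util.find_spec` and checks `PATH` for one representative binary per toolkit (`compute-mfcc-feats` from Kaldi and `ngram-count` from SRILM, chosen here only as examples). Kaldi binaries are usually on `PATH` only after sourcing `path.sh`, so run it from a shell where that has been done. It does not check the TensorFlow version.

```python
import importlib.util
import shutil

# Hypothetical dependency lists; adjust to your setup.
REQUIRED_MODULES = ("tensorflow", "librosa")
REQUIRED_BINARIES = {
    "kaldi": "compute-mfcc-feats",  # ships with Kaldi
    "srilm": "ngram-count",         # ships with SRILM
}


def check_dependencies(modules=REQUIRED_MODULES, binaries=REQUIRED_BINARIES):
    """Return a list describing every missing Python module or binary."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    missing += [
        f"{tool} ({exe})"
        for tool, exe in binaries.items()
        if shutil.which(exe) is None
    ]
    return missing


if __name__ == "__main__":
    problems = check_dependencies()
    print("all dependencies found" if not problems else "missing: " + ", ".join(problems))
```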

Data preprocessing

Usage:

Set the Kaldi and SRILM paths in path.sh.
Set your code path and TIMIT path in config.sh.
Run $ bash preprocess.sh

This script extracts features and splits the dataset into train/test sets.
It also generates the data needed by the WFST decoder.

Train model

Usage:

Modify the experimental setting in config.sh.
Modify the GAN-based model's parameters in src/GAN-based-model/config.yaml.
Run $ bash run.sh

This script contains the training flow for the GAN-based model and the HMM model.
The GAN-based model generates the transcriptions used to train the HMM model.
The HMM model refines the phoneme boundaries used to train the GAN-based model.
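The alternation between the two models can be pictured with the Python sketch below. `train_gan` and `train_hmm` are hypothetical placeholders for the GAN and HMM training stages in run.sh, not functions from this repository, and the number of iterations is an arbitrary assumption. The only concrete piece is `alignment_to_boundaries`, which shows one simple way to turn a frame-level phone alignment into segment boundaries: a boundary is placed wherever the frame label changes.

```python
from itertools import groupby


def alignment_to_boundaries(frame_labels):
    """Return the exclusive end frame of each run of identical labels.

    e.g. ["sil", "sil", "aa"] -> [2, 3]
    """
    bounds, t = [], 0
    for _, run in groupby(frame_labels):
        t += sum(1 for _ in run)  # length of this run of frames
        bounds.append(t)
    return bounds


def harmonize(features, boundaries, train_gan, train_hmm, n_iters=3):
    """Alternate GAN and HMM training (train_gan/train_hmm are hypothetical)."""
    transcriptions = None
    for _ in range(n_iters):
        # GAN: segment-level phoneme transcriptions from the current boundaries
        transcriptions = train_gan(features, boundaries)
        # HMM: trained on those transcriptions, returns frame-level alignments
        alignments = train_hmm(features, transcriptions)
        # Refined boundaries are fed back into the next GAN round
        boundaries = [alignment_to_boundaries(a) for a in alignments]
    return transcriptions, boundaries


if __name__ == "__main__":
    def fake_gan(feats, bnds):
        return [["sil", "aa"] for _ in feats]

    def fake_hmm(feats, trans):
        return [["sil", "sil", "aa", "aa", "aa"] for _ in feats]

    print(harmonize([None], [[1, 5]], fake_gan, fake_hmm))
```

With the dummy callables, the script prints `([['sil', 'aa']], [[2, 5]])`: the fake HMM alignment of two `sil` frames and three `aa` frames becomes the boundary list `[2, 5]`.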

Note

Training with boundaries generated by GAS (bnd_type=uns) is unstable and may need several training attempts to reach satisfactory performance.

Hyperparameters in `config.sh`

bnd_type : type of initial phoneme boundaries (orc/uns): orc for oracle boundaries, uns for boundaries generated by GAS.

setting : the matched or nonmatched case from our paper (match/nonmatch).

jobs : number of parallel jobs (depends on your machine).
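A quick sanity check of these settings might look like the sketch below. It assumes config.sh uses plain shell assignments such as `bnd_type=orc` (an assumption about the file layout), and it enforces only the values listed above. `parse_config` and `validate` are hypothetical helpers, not part of the repository.

```python
import re

# Allowed values from the list above; the KEY=VALUE layout is an assumption.
ALLOWED = {
    "bnd_type": {"orc", "uns"},
    "setting": {"match", "nonmatch"},
}

# Matches "key=value" or "export key=value" at the start of a line.
ASSIGN = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(\S*)")


def parse_config(text):
    """Parse simple shell assignments, dropping '#' comments and quotes."""
    cfg = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # naive: assumes values contain no '#'
        m = ASSIGN.match(line)
        if m:
            cfg[m.group(1)] = m.group(2).strip("'\"")
    return cfg


def validate(cfg):
    """Return a list of human-readable problems (empty if the config looks fine)."""
    errors = []
    for key, allowed in ALLOWED.items():
        if cfg.get(key) not in allowed:
            errors.append(f"{key}={cfg.get(key)!r} not in {sorted(allowed)}")
    jobs = cfg.get("jobs", "")
    if not jobs.isdigit() or int(jobs) < 1:
        errors.append(f"jobs={jobs!r} must be a positive integer")
    return errors
```

For example, `validate(parse_config(open("config.sh").read()))` returns an empty list when all three settings are valid.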

Reference

Completely Unsupervised Speech Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models, Kuan-Yu Chen, Che-Ping Tsai et al.

Links

The WFST decoder for the phoneme classifier.
The training scripts for the unsupervised HMMs.

Acknowledgement

Special thanks to Che-Ping Tsai (jackyyy0228)!

About

Code: Completely Unsupervised Speech Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models

https://arxiv.org/abs/1904.04100
