Wenlin-Zhang / tfkaldi

Speech recognition software where the neural net is trained with TensorFlow and GMM training and decoding is done in Kaldi

Repository from GitHub: https://github.com/Wenlin-Zhang/tfkaldi

Kaldi with TensorFlow Neural Net

Please find the documentation page here

Installation

  • Download and install TensorFlow.
  • Download and install Kaldi
  • Modify the config/config_*.cfg for your setup, specifically the directories
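
As a rough illustration of the last step, the sketch below reads directory settings from a `.cfg`-style text with Python's standard `configparser`. The section name `directories` and the option names are assumptions made for this example; check the actual files in the config directory for the real keys.

```python
import configparser

# Hypothetical config content. The section and option names are assumptions,
# not the actual keys used by config/config_*.cfg.
EXAMPLE_CFG = """
[directories]
train_data = /path/to/train
test_data = /path/to/test
expdir = /path/to/experiment
"""


def load_directories(text):
    """Parse the config text and return the [directories] section as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["directories"])


dirs = load_directories(EXAMPLE_CFG)
for name, path in dirs.items():
    print(f"{name}: {path}")
```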

Code overview

main.py: Goes through the neural net training procedure in the steps listed below. To change the settings, edit the config files in the config directory.

  • Compute the features of training and testing set for GMM and DNN
  • Train the monophone GMM with kaldi and get alignments
  • Train the triphone GMM with kaldi and get alignments
  • Train the LDA+MLLT GMM with Kaldi and get alignments
  • Train the neural net with TensorFlow with the alignments as targets
  • Get the state pseudo-likelihoods of the testing set using the neural net
  • Decode the testing set with Kaldi using the state pseudo-likelihoods and report the results
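
The last two steps rely on turning the network's state posteriors into pseudo-likelihoods. In the usual hybrid DNN-HMM recipe this is done by dividing each posterior by the state prior, which is estimated from the alignment counts. The sketch below shows that conversion in plain Python. The function names and the flooring constant are assumptions for illustration, and the actual code in this repository may differ in the details.

```python
import math


def state_priors(alignments, num_states):
    """Estimate p(s) as the relative frequency of each state in the alignments."""
    counts = [0] * num_states
    for utterance in alignments:
        for state in utterance:
            counts[state] += 1
    total = sum(counts)
    return [c / total for c in counts]


def log_pseudo_likelihoods(posteriors, priors, floor=1e-10):
    """Convert per-frame posteriors p(s|x) into log pseudo-likelihoods.

    log p(x|s) = log p(s|x) - log p(s) + const, and the constant is dropped.
    Values are floored before taking the log to avoid log(0).
    """
    log_priors = [math.log(max(p, floor)) for p in priors]
    return [
        [math.log(max(post, floor)) - lp for post, lp in zip(frame, log_priors)]
        for frame in posteriors
    ]


# Two toy alignments over three states, and one frame of network output.
priors = state_priors([[0, 0, 1, 2], [1, 1, 2, 2]], num_states=3)
scores = log_pseudo_likelihoods([[0.5, 0.25, 0.25]], priors)
```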

features/feat.py: Does feature computation. Currently supports:

  • mfcc
  • fbank
  • ssc

features/prepare_data.py: data prep functionality

  • compute the features for all the utterances
  • compute mean and variance statistics for normalisation
  • shuffle the examples for mini-batch training
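
A minimal sketch of the statistics and shuffling steps, written in plain Python with hypothetical function names. It assumes global (not per-speaker) statistics and frame-level shuffling, which may not match what prepare_data.py actually does.

```python
import random


def mean_variance(utterances):
    """Accumulate per-dimension mean and variance over all frames."""
    dim = len(utterances[0][0])
    count = 0
    total = [0.0] * dim
    total_sq = [0.0] * dim
    for utterance in utterances:
        for frame in utterance:
            count += 1
            for d, value in enumerate(frame):
                total[d] += value
                total_sq[d] += value * value
    mean = [t / count for t in total]
    # Var[x] = E[x^2] - E[x]^2
    variance = [sq / count - m * m for sq, m in zip(total_sq, mean)]
    return mean, variance


def shuffled_examples(utterances, seed=0):
    """Flatten all utterances into frames and shuffle them reproducibly."""
    examples = [frame for utterance in utterances for frame in utterance]
    random.Random(seed).shuffle(examples)
    return examples


utts = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
mean, variance = mean_variance(utts)
examples = shuffled_examples(utts)
```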

io/ark.py: read and write ark format

  • defines a reader class for ark format
  • defines a writer class for ark format
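
For orientation, the sketch below writes and reads float matrices in Kaldi's binary ark layout using only the standard library: the utterance key and a space, the binary marker `\0B`, the token `FM `, then the row and column counts (each preceded by a size byte of 4), then the float32 data row by row. The function names are hypothetical and are not the reader and writer classes in ark.py. Little-endian byte order is assumed, and only uncompressed single-precision matrices are handled.

```python
import io
import struct


def write_ark_matrix(stream, key, rows):
    """Write one float32 matrix under `key` in Kaldi binary ark layout."""
    num_rows, num_cols = len(rows), len(rows[0])
    stream.write(key.encode("ascii") + b" ")             # utterance key
    stream.write(b"\x00B")                               # binary-mode marker
    stream.write(b"FM ")                                 # float matrix token
    stream.write(b"\x04" + struct.pack("<i", num_rows))  # size byte + int32
    stream.write(b"\x04" + struct.pack("<i", num_cols))
    for row in rows:
        stream.write(struct.pack(f"<{num_cols}f", *row))


def read_ark_matrices(stream):
    """Yield (key, rows) pairs until the stream is exhausted."""
    while True:
        key = bytearray()
        while True:
            ch = stream.read(1)
            if not ch:
                if key:
                    raise ValueError("stream ended inside a key")
                return
            if ch == b" ":
                break
            key += ch
        if stream.read(2) != b"\x00B" or stream.read(3) != b"FM ":
            raise ValueError("this sketch only handles binary float matrices")
        dims = []
        for _ in range(2):
            if stream.read(1) != b"\x04":
                raise ValueError("expected a 4-byte integer")
            dims.append(struct.unpack("<i", stream.read(4))[0])
        num_rows, num_cols = dims
        rows = [
            list(struct.unpack(f"<{num_cols}f", stream.read(4 * num_cols)))
            for _ in range(num_rows)
        ]
        yield key.decode("ascii"), rows


# Round trip through an in-memory buffer.
buffer = io.BytesIO()
write_ark_matrix(buffer, "utt1", [[1.0, 2.5, -3.0], [0.0, 4.0, 8.0]])
buffer.seek(0)
for key, rows in read_ark_matrices(buffer):
    print(key, rows)
```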

io/batdispenser.py: reading and formatting features

  • defines a class that can read features and do some processing like splicing and cmvn
  • defines a class that can create batches of data
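
The sketch below illustrates splicing and batching with hypothetical helper functions. Splicing concatenates each frame with its neighbours on both sides. Repeating the first and last frame at the utterance edges follows what Kaldi's splice-feats does, and whether batdispenser.py pads the same way is an assumption.

```python
def splice(frames, context):
    """Concatenate each frame with `context` frames to the left and right.

    Frames beyond the utterance boundaries are replaced by the edge frame.
    """
    num_frames = len(frames)
    spliced = []
    for t in range(num_frames):
        window = []
        for offset in range(-context, context + 1):
            index = min(max(t + offset, 0), num_frames - 1)
            window.extend(frames[index])
        spliced.append(window)
    return spliced


def batches(examples, targets, batch_size):
    """Yield (examples, targets) mini-batches; the last one may be smaller."""
    for start in range(0, len(examples), batch_size):
        end = start + batch_size
        yield examples[start:end], targets[start:end]


frames = [[1.0], [2.0], [3.0]]
inputs = splice(frames, context=1)
for batch_inputs, batch_targets in batches(inputs, [0, 1, 1], batch_size=2):
    print(batch_inputs, batch_targets)
```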

neural networks/nnet.py: neural network class for a kaldi-style neural network

  • train: train the neural net
  • decode: compute pseudo-likelihood

neural networks/nnetgraph.py: creating tensorflow graph structures

  • defines an abstract class for a neural network graph
  • defines a class for a deep neural network, inherits from the neural network graph class
  • defines a class for the training environment for a neural network
  • defines a class for the decoding environment of a neural network

neural networks/nnetlayer.py: layers for a neural network

  • defines a feed forward fully connected layer
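
For readers new to the term, a feed-forward fully connected layer computes an affine transform of its input followed by an optional nonlinearity. The sketch below shows that computation in plain Python rather than TensorFlow. The function names and the choice of a sigmoid are assumptions for illustration and do not reflect the layer class in nnetlayer.py.

```python
import math


def sigmoid(z):
    """Logistic nonlinearity."""
    return 1.0 / (1.0 + math.exp(-z))


def fully_connected(inputs, weights, biases, activation=None):
    """Compute activation(inputs @ weights + biases) for one input vector.

    `weights[i][j]` connects input unit i to output unit j.
    """
    outputs = []
    for j, bias in enumerate(biases):
        z = bias + sum(x * weights[i][j] for i, x in enumerate(inputs))
        outputs.append(activation(z) if activation else z)
    return outputs


hidden = fully_connected(
    [1.0, 2.0],
    weights=[[1.0, 0.0], [0.0, 1.0]],
    biases=[0.5, -2.0],
    activation=sigmoid,
)
```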

If you have a question or remark about the code, or if you would like to contribute, please mail me at vincent.renkens@esat.kuleuven.be.

License: MIT License


Languages

Python 96.3%, Shell 3.7%