This repo contains some of the resources I love to refer back to when working with Kaldi: important links and lectures that help in using Kaldi. The PDFs in this repository are my notes. The Kaldi toolkit has a lot of resources and information spread out across the internet, and despite the many similar repositories, many of their links are outdated as of 2022. This repository serves as a list of great links I found online that can be helpful for learning Kaldi and its internal workings, and it should help demystify how Kaldi works.
I won't accept pull requests that only fix spelling errors. I consider it the responsibility of other users to raise meaningful pull requests that help the cause of learning Kaldi.
These links point to Kaldi lectures given by Dan Povey.
Text preprocessing is an important part of ASR, both when preparing transcripts from raw data and when cleaning transcripts to build lexicon files. Doing this preprocessing with Linux command-line tools helps prevent errors further downstream in the pipeline.
- http://jrmeyer.github.io/misc/2019/03/02/Linux-textProc-Notes.html
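A minimal sketch of such cleanup, assuming a Kaldi-style `text` file where each line is `<utterance-id> <transcript>`. The function name `normalize_text` is hypothetical, and the character class keeps only lowercase ASCII letters, digits and apostrophes, so it would also delete letters of other scripts (for example Devanagari); adapt it before using it on Indic-language data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: normalize a Kaldi "text" file (<utt-id> <transcript>).
# The utterance ID (field 1) is kept exactly as-is; the transcript is
# lowercased, punctuation becomes a word break, and whitespace is squeezed.
normalize_text() {
  awk '{
    id = $1
    $1 = ""                                  # drop the ID from $0
    line = tolower($0)
    gsub(/[^a-z0-9'\'' ]/, " ", line)        # keep a-z, 0-9, apostrophes
    n = split(line, w, " ")                  # splits on runs of spaces
    out = id
    for (i = 1; i <= n; i++) out = out " " w[i]
    print out
  }' "$@"
}
```

For example, `normalize_text data/train/text > text.clean` leaves every utterance ID untouched and turns punctuation into word breaks, so "well-known" becomes two words; replace `" "` with `""` in the `gsub` call if you would rather get "wellknown".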
Get the first column: this can be helpful while splitting lexicon files.
Fix the space indentation (tabs become spaces and runs of spaces are squeezed into one), then get the sorted, unique first column, i.e. the word list:
tr "\t" " " < file_lexicon.txt | tr -s " " | cut -d" " -f1 | sort -u
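The one-liner above keeps only the word column. A sketch of splitting a lexicon into both halves, assuming the usual `WORD PHONE1 PHONE2 ...` layout of Kaldi's `lexicon.txt`; the function name `split_lexicon` and the output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a lexicon with lines "WORD PHONE1 PHONE2 ..."
# into OUTDIR/words.txt (column 1) and OUTDIR/prons.txt (columns 2 to end).
split_lexicon() {
  local lexicon="$1" outdir="$2"
  mkdir -p "$outdir"
  # Tabs -> spaces, squeeze runs of spaces, trim one leading/trailing space.
  tr '\t' ' ' < "$lexicon" | tr -s ' ' | sed 's/^ //; s/ $//' > "$outdir/lexicon.norm.txt"
  # Column 1: the word.
  cut -d' ' -f1 "$outdir/lexicon.norm.txt" > "$outdir/words.txt"
  # Columns 2 onwards: the phone sequence.
  cut -d' ' -f2- "$outdir/lexicon.norm.txt" > "$outdir/prons.txt"
}
```

`paste -d' ' words.txt prons.txt` rebuilds the normalized lexicon, as long as every line has a pronunciation (`cut` prints a line with no delimiter in full). For `lexiconp.txt`, column 2 is the pronunciation probability, so the phones start at `-f3-`.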
Unicode's weird quotation symbols: UTF-8 text can contain characters that look almost identical but are different code points, such as curly versus straight quotation marks, the single-character ellipsis … versus three dots ..., grave accents versus apostrophes, etc. Many of these (curly quotes, …) fall outside US-ASCII, and they may or may not be wanted in your text file. This link should help you understand how they differ despite looking similar to the untrained eye.
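A small sketch for finding and flattening such characters, assuming you want plain ASCII quotes in your transcripts. The function names are hypothetical; only standard `grep -n` and `sed -e` options are used, and the replacement list is a starting point rather than a complete mapping.

```shell
#!/usr/bin/env bash
# Print (with line numbers) every line containing a byte outside 7-bit ASCII.
# LC_ALL=C makes grep work byte by byte; the class allows tab and space..tilde.
find_non_ascii() {
  LC_ALL=C grep -n "$(printf '[^\t -~]')" "$@"
}

# Map common typographic look-alikes to plain ASCII.
# The backtick rule is optional: drop it if grave accents matter in your data.
ascii_quotes() {
  sed -e "s/‘/'/g" -e "s/’/'/g" \
      -e 's/“/"/g' -e 's/”/"/g' \
      -e 's/…/.../g' \
      -e "s/\`/'/g" \
      "$@"
}
```

Running `find_non_ascii text` before and after `ascii_quotes text > text.ascii` shows which lines still contain other non-ASCII characters that need a decision.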
Convert numbers in transcripts to words, including for Indic languages
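To show the idea, here is a deliberately small sketch that only handles English integers from 0 to 999 in a Kaldi `text` file; the name `num2words` is hypothetical, and it does not cover Indic languages, decimals, ordinals, or year readings such as "nineteen ninety nine". Real transcripts need one of the dedicated tools linked here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: spell out integers 0-999 in English.
# Field 1 (the utterance ID of a Kaldi text file) is never touched;
# numbers of 1000 or more are left as digits.
num2words() {
  awk '
    BEGIN {
      split("zero one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen", ones, " ")
      split("_ twenty thirty forty fifty sixty seventy eighty ninety", tens, " ")
    }
    # ones[n + 1] holds the word for n (arrays filled by split start at 1)
    function below100(n) {
      if (n < 20) return ones[n + 1]
      return tens[int(n / 10)] (n % 10 ? " " ones[n % 10 + 1] : "")
    }
    function spell(n,    s) {
      if (n < 100) return below100(n)
      s = ones[int(n / 100) + 1] " hundred"
      if (n % 100) s = s " " below100(n % 100)
      return s
    }
    {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^[0-9]+$/ && $i + 0 < 1000) $i = spell($i + 0)
      print
    }
  ' "$@"
}

echo 'utt1 i have 3 cats and 21 dogs' | num2words   # prints: utt1 i have three cats and twenty one dogs
```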
- Eleanor Chodroff Tutorials
- Joshua Meyer's Kaldi Notes
- Aditya's Notes
- Aditya's Kaldi Recipes
- Awesome Kaldi
- LibriSpeech Alignments
- Kaldi Utilities & misc
- The Application of Hidden Markov Models in Speech Recognition by Gales & Young
- Kaldi Deployment
Some links related to WFST theory
Maximum Likelihood Estimation
- http://jrmeyer.github.io/machinelearning/2017/08/18/mle.html
- StatQuest Intro
- StatQuest MLE for Normal Dist
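For a Gaussian, the maximum likelihood estimates have a closed form: mean = sum(x_i) / N and variance = sum((x_i - mean)^2) / N. The variance divides by N rather than N - 1, so it is slightly biased. A small awk sketch (the function name `gaussian_mle` is hypothetical) that reads one number per line:

```shell
#!/usr/bin/env bash
# Hypothetical helper: maximum likelihood estimates of a 1-D Gaussian,
# reading one number per line from stdin or the given files.
#   mean     = sum(x_i) / N
#   variance = sum((x_i - mean)^2) / N   (MLE divides by N, not N - 1)
gaussian_mle() {
  awk '
    { x[NR] = $1; sum += $1 }
    END {
      if (NR == 0) exit 1
      mean = sum / NR
      for (i = 1; i <= NR; i++) ss += (x[i] - mean) ^ 2
      printf "mean=%g variance=%g\n", mean, ss / NR
    }
  ' "$@"
}

printf '%s\n' 2 4 4 4 5 5 7 9 | gaussian_mle   # prints: mean=5 variance=4
```

These two statistics are what a single-Gaussian state is re-estimated with from the frames aligned to it; GMMs repeat the same computation with soft (posterior-weighted) counts.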
Signal Processing
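The decoding process is important to understand, as it is responsible for the final output. Kaldi builds its decoding graph, HCLG, by composing weighted finite-state transducers (WFSTs): H (HMM structure), C (phonetic context dependency), L (lexicon) and G (grammar, i.e. the language model). Lattices, by contrast, are what the decoder outputs after searching this graph. I think of composition as something like a dot product (contraction) of tensors: the output labels of one transducer are matched against the input labels of the next, and that shared label is eliminated.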
- http://vpanayotov.blogspot.com/2012/06/kaldi-decoding-graph-construction.html
- Oxinabox's Kaldi Notes: Kaldi Decoding and Evaluation
- http://cslt.riit.tsinghua.edu.cn/mediawiki/images/6/62/2012-10-25-Step_of_HCLG%28test_time%29.pdf
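To make the tensor intuition concrete, here is the standard definition of weighted transducer composition (from Mohri's WFST work, also used by OpenFst), where T(x, y) is the weight a transducer assigns to mapping input string x to output string y. The shared middle string y is eliminated just like the shared index j in a matrix product C_ik = Σ_j A_ij B_jk; Kaldi's graphs use the tropical semiring, where ⊕ is min and ⊗ is +. The last line is, as I understand Kaldi's graph-construction documentation, the overall recipe, with det = determinization, min = minimization, rds = removing disambiguation symbols, asl = adding self-loops, and H_a = H without its self-loops. Treat it as a summary and check the links above for the authoritative details.

```math
\begin{aligned}
(T_1 \circ T_2)(x, z) &= \bigoplus_{y} T_1(x, y) \otimes T_2(y, z) \\
&= \min_{y} \big( T_1(x, y) + T_2(y, z) \big) \quad \text{(tropical semiring)} \\
\mathrm{HCLG} &= \mathrm{asl}\Big(\min\big(\mathrm{rds}(\det(H_a \circ \min(\det(C \circ \min(\det(L \circ G))))))\big)\Big)
\end{aligned}
```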
A list of some great errors faced by Kaldi users that I bookmarked. Note: you might need to join the Google Group to view them.
- Why does Kaldi require mono-channel audio for training?
- Kaldi architectures
- Which DNN to use
- Difference between lang_test_tgsmall and lang_test_tglarge
- Compiling with CUDA: No Kernel Image ..
- Visualizing accuracy plots
- QSub not found?
- Recompile the HCLG decoding graph when you have more text data (L.fst and G.fst)
- ContextFst: CreateArc, invalid olabel supplied
- Likelihood of different pronunciations
- Forced Alignment Decoding Failure
- More Decoding Errors
- Commands draw-tree, fstprint not working
- Get Word level alignment from phone alignments
- Extract phones with timing from lattice file
- Cholesky Decomposition Failed: matrix not positive definite
- UBM Init Error while training with SGMM2