Agrover112 / IIITH-Speech-Internship

Work done as a part of IIITH Speech Internship.

IIITH-Speech-Internship

The following repository contains all the work I did during my internship at the Speech Lab, LTRC, IIIT-H. Since it is virtually impossible to showcase everything in one repository, I will also use this as a vault for future work. The initial work (done a year earlier) was on speech analysis (Weeks 1 to 5) and on running an ASR system in Kaldi (GMM-HMM using LDA features), which I deleted (I wasn't using git at the time).

Since most of the work uses open-source tools, it lives in forks. The main work and changes, however, are contained within my Kaldi fork and in grapheme-to-phoneme conversion using [g2p-sequitur] [1].

  • Lexicon FST generation: rendered PDFs are present here.
  • [1][2] G2P: grapheme-to-phoneme conversion using Sequitur G2P, here.
  • [3] Vocab and lexicon generation (and validation): scripts for automatic lexicon generation are present here.
  • Forced alignment and GoP: forced alignment and Goodness of Pronunciation (probability score) generation using DNN and GMM/HMM acoustic models, output as .ctm files and also in TextGrid format for Praat, here.
  • Pipeline for modified GoP score calculation: a pipeline for automatically calculating posteriors and alignments, and from them the modified GoP scores, here.
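The vocabulary and lexicon steps can be sketched roughly as follows. This is a minimal illustration, not the scripts in this repository: it assumes Kaldi-style inputs (a `text` file of `<utt-id> word1 word2 ...` lines and a `lexicon.txt` of `word phone1 phone2 ...` lines), and the helper names `build_vocab`, `read_lexicon`, `split_oov` and `validate_lexicon` are hypothetical. Words missing from the lexicon are collected so they could be handed to a G2P tool such as Sequitur, and validation only flags empty pronunciations or phones outside a given phone set.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Lexicon = Dict[str, List[List[str]]]


def build_vocab(text_lines: Iterable[str]) -> Counter:
    """Count words in Kaldi-style `text` lines: '<utt-id> w1 w2 ...'."""
    vocab: Counter = Counter()
    for line in text_lines:
        fields = line.split()
        vocab.update(fields[1:])  # skip the utterance id
    return vocab


def read_lexicon(lex_lines: Iterable[str]) -> Lexicon:
    """Parse 'word p1 p2 ...' lines; a word may have several pronunciations."""
    lexicon: Lexicon = {}
    for line in lex_lines:
        fields = line.split()
        if fields:
            lexicon.setdefault(fields[0], []).append(fields[1:])
    return lexicon


def split_oov(vocab: Iterable[str], lexicon: Lexicon) -> Tuple[Set[str], Set[str]]:
    """Split the vocabulary into in-lexicon words and OOV words (G2P candidates)."""
    words = set(vocab)
    known = {w for w in words if w in lexicon}
    return known, words - known


def validate_lexicon(lexicon: Lexicon, phone_set: Set[str]) -> List[str]:
    """Return human-readable problems: empty pronunciations or unknown phones."""
    errors = []
    for word, prons in lexicon.items():
        for pron in prons:
            if not pron:
                errors.append(f"{word}: empty pronunciation")
            unknown = [p for p in pron if p not in phone_set]
            if unknown:
                errors.append(f"{word}: unknown phones {unknown}")
    return errors
```

For example, with transcripts containing "hello world" and "hello speech" and a lexicon covering only "hello" and "world", `split_oov` returns `{"speech"}` as the OOV set, which would be the word list to run through G2P before merging the predicted pronunciations back into the lexicon.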
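A rough sketch of how a GoP-style score can be computed once frame-level phone posteriors and a phone alignment are available. It uses one common DNN-based formulation, the average log posterior of the aligned phone over its frames, and is not the modified GoP computed by the pipeline in this repository. The 5-column phone CTM layout (`utt channel start duration phone`), the 10 ms frame shift, the probability floor and the function names are all assumptions.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Segment = Tuple[int, int, int]  # (phone id, start frame, end frame exclusive)


def ctm_to_segments(ctm_lines: Iterable[str], phone_to_id: Dict[str, int],
                    frame_shift: float = 0.01) -> List[Segment]:
    """Convert 'utt channel start duration phone' CTM lines into frame segments."""
    segments = []
    for line in ctm_lines:
        fields = line.split()
        start, dur, phone = float(fields[2]), float(fields[3]), fields[4]
        first = int(round(start / frame_shift))
        last = int(round((start + dur) / frame_shift))
        segments.append((phone_to_id[phone], first, last))
    return segments


def gop_scores(posteriors: Sequence[Sequence[float]],
               segments: Iterable[Segment],
               floor: float = 1e-10) -> List[Tuple[int, float]]:
    """Mean log posterior of each aligned phone over its frames.

    posteriors[t][q] is P(q | o_t) from the acoustic model (rows sum to 1).
    Scores closer to 0 mean the model is more confident the phone was produced.
    """
    scores = []
    for phone, first, last in segments:
        if last <= first:
            raise ValueError(f"empty segment for phone {phone}")
        total = sum(math.log(max(posteriors[t][phone], floor))
                    for t in range(first, last))
        scores.append((phone, total / (last - first)))
    return scores
```

Normalising by the number of frames keeps long and short phones comparable; a threshold on the score (tuned on held-out data) would then decide which phones to flag as mispronounced.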

Blogs:

Resources

Notes

[1] It is recommended to use another grapheme-to-phoneme converter such as Phonetisaurus, which may be state of the art, but Sequitur is the best fit for a Kaldi-like workflow.

[2] Sequitur G2P will probably fail for anyone attempting to use it with Python 2.7 (fixed by PR #90).

[3] Documenting really helps, especially when a rogue git pull suddenly merges weirdly and removes a commit worth a month of work. Thanks, git reflog.

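[4] The decoding graph is very important; it is built by composing the H, C, L and G components (HCLG = H ∘ C ∘ L ∘ G): the HMM topology, context dependency, lexicon and grammar.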
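To illustrate what composing these components means, here is a toy weighted-FST composition in the tropical semiring (weights are costs that add along a path, and the best path is the one with the lowest total), applied to a two-word lexicon L and a unigram grammar G. It is a simplified sketch with hypothetical names (`Fst`, `compose`, `paths`), not OpenFst or Kaldi code. It has no epsilon filter, so it is exact only when epsilons appear on L's output side but not on G's input side (as here), path enumeration assumes acyclic toy machines, and it skips the determinization, minimization and disambiguation symbols a real graph build uses.

```python
from typing import Dict, List, Tuple

EPS = "<eps>"
Arc = Tuple[str, str, float, int]  # (input label, output label, cost, next state)


class Fst:
    def __init__(self, start: int, finals: Dict[int, float],
                 arcs: Dict[int, List[Arc]]):
        self.start = start
        self.finals = finals  # state -> final cost
        self.arcs = arcs      # state -> outgoing arcs


def compose(a: Fst, b: Fst) -> Fst:
    """Naive composition: match a's output labels against b's input labels."""
    index: Dict[Tuple[int, int], int] = {(a.start, b.start): 0}
    queue = [(a.start, b.start)]
    arcs: Dict[int, List[Arc]] = {}
    finals: Dict[int, float] = {}

    def state_id(pair: Tuple[int, int]) -> int:
        if pair not in index:
            index[pair] = len(index)
            queue.append(pair)
        return index[pair]

    while queue:
        sa, sb = pair = queue.pop()
        out = arcs.setdefault(index[pair], [])
        for ilab, olab, w, na in a.arcs.get(sa, []):
            if olab == EPS:  # a moves alone on an output epsilon
                out.append((ilab, EPS, w, state_id((na, sb))))
            else:
                for ilab2, olab2, w2, nb in b.arcs.get(sb, []):
                    if ilab2 == olab:  # costs add in the tropical semiring
                        out.append((ilab, olab2, w + w2, state_id((na, nb))))
        for ilab2, olab2, w2, nb in b.arcs.get(sb, []):
            if ilab2 == EPS:  # b moves alone on an input epsilon
                out.append((EPS, olab2, w2, state_id((sa, nb))))
        if sa in a.finals and sb in b.finals:
            finals[index[pair]] = a.finals[sa] + b.finals[sb]
    return Fst(0, finals, arcs)


def paths(fst: Fst) -> List[Tuple[str, str, float]]:
    """All accepting paths of an acyclic FST as (input, output, cost), cheapest first."""
    results = []

    def dfs(state: int, ins: List[str], outs: List[str], cost: float) -> None:
        if state in fst.finals:
            results.append((" ".join(ins), " ".join(outs), cost + fst.finals[state]))
        for ilab, olab, w, nxt in fst.arcs.get(state, []):
            dfs(nxt, ins + ([ilab] if ilab != EPS else []),
                outs + ([olab] if olab != EPS else []), cost + w)

    dfs(fst.start, [], [], 0.0)
    return sorted(results, key=lambda r: r[2])


# Toy L: phone strings -> words (word emitted on the first phone).
L = Fst(0, {3: 0.0}, {0: [("k", "cat", 0.0, 1), ("d", "dog", 0.0, 4)],
                      1: [("ae", EPS, 0.0, 2)], 2: [("t", EPS, 0.0, 3)],
                      4: [("aa", EPS, 0.0, 5)], 5: [("g", EPS, 0.0, 3)]})
# Toy G: unigram grammar with -log probability costs.
G = Fst(0, {1: 0.0}, {0: [("cat", "cat", 1.2, 1), ("dog", "dog", 0.5, 1)]})
LG = compose(L, G)
```

Here `paths(LG)` lists "d aa g" mapping to "dog" with cost 0.5 ahead of "k ae t" mapping to "cat" with cost 1.2: the composed machine reads phones and carries the grammar's costs, which is the role LG plays inside HCLG before C and H expand phones into context-dependent HMM states.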
