kaldi data format
anarucu opened this issue · comments
Hi, I am trying to build a DBN for language ID using PDNN. As there is a huge amount of data, I decided to use the Kaldi data format to structure it. I used the Kaldi binary copy-feats to convert my ASCII features to .ark, but I don't know what to do with the labels.
I already have ASCII files with the phonetic frame labels. How do I convert them into .ali files?
thx in advance
ana
Hi, PDNN supports the text format of Kaldi labels. You can convert your labels into a text file with one line per utterance, for example:
utt1 1 0 3 5 2 0 1
utt2 2 1 3 1 4 0 1 1
... ...
The first field is always the utterance ID, followed by a sequence of frame-level class labels (integer indices). In the example above, utt1 has 7 frames with the class labels "1 0 3 5 2 0 1".
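As a rough illustration of the format described above, here is a minimal Python sketch that writes and reads such a label file. It assumes your labels are already loaded into a dict mapping each utterance ID to its list of integer frame labels; how you parse your own ASCII files depends on their layout, which isn't shown here. The function names (format_label_line, write_label_file, read_label_file) are hypothetical helpers and not part of PDNN or Kaldi.

```python
import io


def format_label_line(utt_id, labels):
    """Build one line: the utterance ID followed by space-separated integer labels."""
    return " ".join([utt_id] + [str(int(label)) for label in labels])


def write_label_file(labels_by_utt, out):
    """Write one line per utterance to an open text stream."""
    for utt_id, labels in labels_by_utt.items():
        out.write(format_label_line(utt_id, labels) + "\n")


def read_label_file(inp):
    """Parse the text format back into a dict of utterance ID -> list of int labels."""
    labels_by_utt = {}
    for line in inp:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        labels_by_utt[fields[0]] = [int(x) for x in fields[1:]]
    return labels_by_utt


# Assumed input: labels already parsed from your own ASCII files.
example = {
    "utt1": [1, 0, 3, 5, 2, 0, 1],
    "utt2": [2, 1, 3, 1, 4, 0, 1, 1],
}

buf = io.StringIO()  # stands in for a real file opened with open(path, "w")
write_label_file(example, buf)
text = buf.getvalue()
parsed = read_label_file(io.StringIO(text))
```

Reading the file back is a cheap sanity check: the number of labels per utterance should match the number of feature frames for that utterance in your .ark file.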