Question about Chapter 6 asr-lstm-ctc

lydiaji opened this issue 6 years ago · comments

The Chapter 6 asr-lstm-ctc example trains on only one wav file and one txt file. I would now like to train on the Tedlium dataset. How should I rewrite the program? If I set the batch size to 64, what should the inputs look like (their shape and content)? Thanks so much!

thewintersun · Answer 1 · Fri Jun 22 2018 11:14:06 GMT+0800 (China Standard Time)

To train on the Tedlium dataset and read in a batch of 64 examples at a time, the code needs fairly extensive changes.
The rough process is as follows:

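When reading the data, process as many wav files as the batch size, convert each one to MFCC feature vectors, and combine the feature vectors into a matrix.
Modify the network structure so that its computations work on multi-dimensional (batched) matrices.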
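
As a rough illustration of the batching step, here is a minimal sketch using NumPy only. The helper names `pad_feature_batch` and `sparse_tuple_from` are hypothetical, and random arrays stand in for real MFCC features, so this is not the book's code or the code in the linked repository. It assumes each utterance has already been converted to a `[time, n_features]` matrix, pads the utterances to the longest one in the batch, and keeps the true lengths. The labels are packed as an (indices, values, dense_shape) triple, which is the sparse layout that TensorFlow 1.x's `tf.nn.ctc_loss` expects for its labels.

```python
import numpy as np

def pad_feature_batch(features):
    """Pad a list of [time, n_features] arrays into one
    [batch, max_time, n_features] array and return the true lengths."""
    seq_len = np.array([f.shape[0] for f in features], dtype=np.int32)
    n_features = features[0].shape[1]
    batch = np.zeros((len(features), seq_len.max(), n_features), dtype=np.float32)
    for i, f in enumerate(features):
        batch[i, :f.shape[0], :] = f  # frames past the true length stay zero
    return batch, seq_len

def sparse_tuple_from(label_seqs):
    """Build (indices, values, dense_shape) for variable-length label sequences."""
    indices, values = [], []
    for b, seq in enumerate(label_seqs):
        for t, label in enumerate(seq):
            indices.append((b, t))
            values.append(label)
    max_len = max(len(s) for s in label_seqs)
    return (np.array(indices, dtype=np.int64).reshape(-1, 2),
            np.array(values, dtype=np.int32),
            np.array([len(label_seqs), max_len], dtype=np.int64))

# Assumption: random stand-ins for the MFCC matrices of 3 utterances,
# each with 13 coefficients per frame and a different number of frames.
rng = np.random.default_rng(0)
fake_mfccs = [rng.standard_normal((t, 13)).astype(np.float32) for t in (50, 80, 65)]

inputs, seq_len = pad_feature_batch(fake_mfccs)
targets = sparse_tuple_from([[1, 2, 3], [4, 5], [6, 7, 8, 9]])

print(inputs.shape, seq_len.tolist(), targets[2].tolist())
```

With a batch size of 64, the same approach would give inputs of shape `[64, max_time, n_features]`, a length vector of 64 entries, and one sparse label triple covering all 64 transcripts. Whether the network wants batch-major or time-major input depends on how the model is defined.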

That said, the changes are substantial and hard to explain in a few sentences. You can refer to this repository:
https://github.com/thewintersun/asrtrain