Using a custom dataset with deepspeech codes

NightFury13 opened this issue 8 years ago

NOTE: This is a continuation thread for any future readers who run into similar issues. Before you start here, please read the conversation on this issue.

I am trying to use the DeepSpeech model to train on scene-text tasks on images. So far, I have been able to convert my data to the LMDB format the code expects and run the training scripts, but the training error is unstable and keeps jumping between inf, nan, positive and negative values. My first attempt was to cap the MaxNorm of the gradients to stop them from exploding, but that didn't help. The next attempt was to replace the original vanilla RNNs of DeepSpeech2 with LSTM layers in the hope of limiting the gradient explosion. To do so, one needs to change the RNNModule function in DeepSpeech.lua, as pointed out by @SeanNaren below.

Change:

local function RNNModule(inputDim, hiddenDim, opt)
    if opt.nGPU > 0 then
        require 'BatchBRNNReLU'
        return cudnn.BatchBRNNReLU(inputDim, hiddenDim)
    else
        require 'rnn'
        return nn.SeqBRNN(inputDim, hiddenDim)
    end
end

to something like:

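local function RNNModule(inputDim, hiddenDim, opt)
    require 'cudnn'
    local rnn = nn.Sequential()
    -- single-layer bidirectional LSTM; its output has 2 * hiddenDim features
    rnn:add(cudnn.BLSTM(inputDim, hiddenDim, 1))
    -- NOTE: outputDim is undefined in this snippet; see the question and reply below
    rnn:add(nn.View(-1, 2, outputDim):setNumInputDims(2)) -- have to sum activations
    rnn:add(nn.Sum(3))
    return rnn
end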

@SeanNaren: can you help me understand what outputDim signifies in the changed code? Are the output dims different from the hidden dims?

Sean Naren · Answer 1 · Tue Oct 18 2016 19:47:43 GMT+0800 (China Standard Time)

Hey @NightFury13, thanks for this. That's definitely a mistake on my side, it should say hiddenDim! It just reshapes the output of the bi-directional RNN so that the forward and backward activations are summed rather than kept separate :)
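
For readers who want to see what the reshape-and-sum step does to the data, here is a minimal sketch in plain Python rather than Torch. It assumes each output frame of the bidirectional layer holds the hiddenDim forward activations followed by the hiddenDim backward activations; the function name sum_directions and the nested-list layout are made up for illustration and are not part of the deepspeech code.

```python
def sum_directions(output, hidden_dim):
    """Sum forward and backward activations of a bidirectional RNN output.

    output: nested list shaped [seq_len][batch][2 * hidden_dim]. Each frame is
    assumed to store the forward activations first, then the backward ones.
    Returns a nested list shaped [seq_len][batch][hidden_dim].
    """
    summed = []
    for step in output:
        summed_step = []
        for frame in step:
            if len(frame) != 2 * hidden_dim:
                raise ValueError("expected 2 * hidden_dim features per frame")
            forward = frame[:hidden_dim]   # first half of the feature axis
            backward = frame[hidden_dim:]  # second half of the feature axis
            # Element-wise sum, the counterpart of View(-1, 2, hiddenDim) + Sum(3)
            summed_step.append([f + b for f, b in zip(forward, backward)])
        summed.append(summed_step)
    return summed


# One time step, one batch element, hidden_dim = 3.
example = [[[1, 2, 3, 10, 20, 30]]]
result = sum_directions(example, hidden_dim=3)
```

The point is that the feature size going into the next layer stays at hiddenDim instead of doubling, which is why the View needs hiddenDim and not a separate output size.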

Sean Naren · Answer 2 · Wed Oct 19 2016 02:18:29 GMT+0800 (China Standard Time)

Just opened a branch here. Using this branch:

th Train.lua -LSTM -hiddenSize 600 #Just be mindful of the number of parameters
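
To put a rough number on the parameter warning, here is a back-of-the-envelope sketch in plain Python. It uses the textbook count of one input weight matrix, one recurrent weight matrix and one bias vector per gate, with one gate for a vanilla RNN and four for an LSTM. The helper name rnn_param_count is hypothetical, and the input size of 600 is an assumption for illustration (the real input size of the first recurrent layer depends on the preceding layers). cuDNN stores two bias vectors per gate, so its actual count is slightly higher.

```python
def rnn_param_count(input_dim, hidden_dim, gates=1, bidirectional=True):
    """Approximate parameter count of a single recurrent layer.

    gates=1 models a vanilla RNN, gates=4 an LSTM. Each gate is counted as an
    input weight matrix, a recurrent weight matrix and one bias vector.
    """
    per_gate = hidden_dim * (input_dim + hidden_dim) + hidden_dim
    directions = 2 if bidirectional else 1
    return gates * per_gate * directions


# Assumed sizes for illustration only: input and hidden size both 600.
vanilla = rnn_param_count(600, 600, gates=1)
lstm = rnn_param_count(600, 600, gates=4)
print(vanilla, lstm, lstm // vanilla)
```

Under these assumptions an LSTM layer carries four times the parameters of the vanilla RNN layer it replaces at the same hidden size, so the hidden size may need to come down to keep the model size comparable.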

Mohit Jain · Answer 3 · Sun Oct 23 2016 12:20:37 GMT+0800 (China Standard Time)

@SeanNaren Thanks a lot for this! I am facing network issues on my end. Will update with my findings as soon as I am back online!

Sean Naren · Answer 4 · Mon Nov 21 2016 19:11:18 GMT+0800 (China Standard Time)

Going to close this since new information has been added about custom datasets here.