hardmaru / write-rnn-tensorflow

Generative Handwriting using LSTM Mixture Density Network with TensorFlow

Differences from Paper

johnlarkin1 opened this issue · comments

Hey! I hope you're having a good day. My friend and I implemented this as well for a final project, and we were really impressed by your results and your style (your code is very clean). I do have a few questions, though. They're a bit more high-level than typical GitHub issues, but I think they're still pertinent.

  1. In the Graves paper, he increases the dimensionality of the input representation through a series of stacked LSTMs with skip connections. So, for example, if we take m = 3 as the depth of the stack and our input dimension is n = 3 (for x, y, eos), then as we understood it the final representation lives in $$\mathbb{R}^{18}$$. This architecture is shown in Figure 1 of the Graves paper.

However, it seems like your code does not do this? It doesn't appear that you actually build this cascade... yet your results look fantastic. You simply have a line that says:

outputs, state_out = tf.contrib.legacy_seq2seq.rnn_decoder(inputs, self.state_in, cell, loop_function=None, scope='rnnlm')

Just to be clear: there is no dimensionality increase here? And $$\mathbb{R}^3$$ is still a large enough space for your inputs to be represented and for your model to train so robustly?

It also seems like you specify the RNN hidden size as 256 by default. How is that possible? Doesn't it need to be 3 so that it matches the input being in $$\mathbb{R}^3$$?
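To check my own mental model, here is a minimal TF 1.x sketch (not your code, and with made-up batch/sequence sizes) of why I suspect a single 256-unit LSTM can consume 3-dimensional inputs directly: the cell's own input-to-hidden weights project each $$\mathbb{R}^3$$ step into the 256-dimensional hidden space.

```python
import tensorflow as tf

# Illustrative sizes only; the repo's defaults may differ.
batch_size, seq_length, input_dim, rnn_size = 50, 300, 3, 256

# One (dx, dy, eos) triple per time step.
inputs = tf.placeholder(tf.float32, [batch_size, seq_length, input_dim])

cell = tf.contrib.rnn.BasicLSTMCell(rnn_size)
initial_state = cell.zero_state(batch_size, tf.float32)

# The LSTM kernel has shape [input_dim + rnn_size, 4 * rnn_size], so the
# 3-d input is mapped into the 256-d hidden space inside the cell itself.
# (dynamic_rnn is used here for brevity instead of legacy_seq2seq.rnn_decoder.)
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs,
                                         initial_state=initial_state)
# outputs: [batch_size, seq_length, rnn_size] -- the representation already
# lives in R^256 even though each input step lives in R^3.
```

If that is right, then the hidden size and the input size don't need to match at all, and the "dimensionality increase" happens inside the cell rather than through an explicit cascade. Is that how you think about it?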

  2. Do you penalize your model for starting a new stroke? For example, it does not appear that you reset the internal LSTM state at any point when a new stroke sequence begins (which Graves notes we want to do). Do you skip this and still get such high-quality results?
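To make sure I'm describing the same thing, here is roughly what our training loop does (our own sketch, not your code; `stroke_batches()` is a placeholder for our batch iterator, and the loss/optimizer steps are omitted to focus on the state handling):

```python
import tensorflow as tf

batch_size, seq_length, rnn_size = 1, 100, 256  # illustrative sizes only

cell = tf.contrib.rnn.BasicLSTMCell(rnn_size)
input_data = tf.placeholder(tf.float32, [batch_size, seq_length, 3])
initial_state = cell.zero_state(batch_size, tf.float32)
outputs, final_state = tf.nn.dynamic_rnn(cell, input_data,
                                         initial_state=initial_state)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    state = sess.run(initial_state)              # start from a zero state
    for batch, starts_new_sequence in stroke_batches():  # hypothetical iterator
        if starts_new_sequence:
            state = sess.run(initial_state)      # reset (c, h), per Graves
        # Otherwise carry the recurrent state forward across batches.
        feed = {input_data: batch,
                initial_state.c: state.c,
                initial_state.h: state.h}
        state = sess.run(final_state, feed_dict=feed)
```

In other words, we zero the recurrent state whenever a new stroke sequence starts and carry it forward otherwise. Do you simply feed a fresh zero state per training sequence instead, or skip the reset entirely?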

  3. You said you trained on your ENTIRE training dataset (all 11,035 strokes) for only HALF a day on a MacBook WITHOUT a GPU, and you were still able to generate such clear and realistic handwriting? I just want to be sure, because either the extra complexity of our model (from the stacked LSTMs) is making training slower, or we're not sampling correctly from our model.
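In case it helps you spot where we might be going wrong, this is roughly how we sample each point at generation time (our own sketch with illustrative names, following Graves' mixture-density recipe rather than your exact sampling code):

```python
import numpy as np

def sample_point(pi, mu1, mu2, sigma1, sigma2, rho, e, rng=np.random):
    """Draw one (dx, dy, eos) triple from the network's mixture-density output."""
    k = rng.choice(len(pi), p=pi)                 # pick a component by weight
    mean = [mu1[k], mu2[k]]
    cov = [[sigma1[k] ** 2,                 rho[k] * sigma1[k] * sigma2[k]],
           [rho[k] * sigma1[k] * sigma2[k], sigma2[k] ** 2]]
    dx, dy = rng.multivariate_normal(mean, cov)   # bivariate Gaussian offset
    eos = 1.0 if rng.uniform() < e else 0.0       # Bernoulli end-of-stroke bit
    return dx, dy, eos
```

Does that match how you sample, or are we missing a step?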

I'd love to hear your responses! Thank you so much for the write-up. It's lovely, and your knowledge of the SVG package is super impressive. I'd never even heard of it before your write-up.

Thanks for everything!