Issue executing GenerateTxtCharCompGraphModel.java

Question

gregbarton opened this issue 2 years ago · comments

Issue Description

The GenerateTxtCharCompGraphModel example appears to be broken. The first problem is missing input data: the download fails with "java.io.IOException: Server returned HTTP response code: 403 for URL: https://s3.amazonaws.com/dl4j-distribution/pg100.txt". Changing the URL to https://www.gutenberg.org/cache/epub/100/pg100.txt works around that. After that, the ComputationGraphConfiguration appears to be misconfigured. Running the example results in the following error:

java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:293)
    at java.lang.Thread.run (Thread.java:750)
Caused by: java.lang.IllegalStateException: Sequence lengths do not match for RnnOutputLayer input and labels:Arrays should be rank 3 with shape [minibatch, size, sequenceLength] - mismatch on dimension 2 (sequence length) - input=[32, 400, 50] vs. label=[32, 77, 50]
    at org.nd4j.common.base.Preconditions.throwStateEx (Preconditions.java:639)
    at org.nd4j.common.base.Preconditions.checkState (Preconditions.java:337)
    at org.deeplearning4j.nn.layers.recurrent.RnnOutputLayer.backpropGradient (RnnOutputLayer.java:59)
    at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doBackward (LayerVertex.java:148)
    at org.deeplearning4j.nn.graph.ComputationGraph.calcBackpropGradients (ComputationGraph.java:2776)
    at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore (ComputationGraph.java:1385)
    at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore (ComputationGraph.java:1345)
    at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore (BaseOptimizer.java:174)
    at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize (StochasticGradientDescent.java:61)
    at org.deeplearning4j.optimize.Solver.optimize (Solver.java:52)
    at org.deeplearning4j.nn.graph.ComputationGraph.doTruncatedBPTT (ComputationGraph.java:3739)
    at org.deeplearning4j.nn.graph.ComputationGraph.fitHelper (ComputationGraph.java:1160)
    at org.deeplearning4j.nn.graph.ComputationGraph.fit (ComputationGraph.java:1119)
    at org.deeplearning4j.nn.graph.ComputationGraph.fit (ComputationGraph.java:1106)
    at org.deeplearning4j.nn.graph.ComputationGraph.fit (ComputationGraph.java:988)
    at org.deeplearning4j.examples.advanced.modelling.charmodelling.generatetext.GenerateTxtCharCompGraphModel.main (GenerateTxtCharCompGraphModel.java:116)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:293)
    at java.lang.Thread.run (Thread.java:750)

The input size of "outputLayer" is set to 2*lstmLayerSize, and the two layers feeding "outputLayer" each have an output size of lstmLayerSize, so this appears to be the correct configuration. However, the network seems to expect the input size (i.e. "iter.inputColumns()") at that point.
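The shapes quoted in the exception can be compared by hand. The following is a minimal, dependency-free sketch that lists the dimensions on which the two shapes from the error message differ. The class name ShapeCheck and the method mismatchedDims are hypothetical and are not part of Deeplearning4j. The sketch makes no claim about how RnnOutputLayer performs its own check; it only restates the numbers in the message.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical helper, not a Deeplearning4j class.
class ShapeCheck {

    /** Returns the index of every dimension on which the two shapes differ. */
    static int[] mismatchedDims(long[] a, long[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "Rank mismatch: " + a.length + " vs. " + b.length);
        }
        return IntStream.range(0, a.length)
                        .filter(i -> a[i] != b[i])
                        .toArray();
    }

    public static void main(String[] args) {
        // Shapes copied from the exception text: [minibatch, size, sequenceLength]
        long[] input = {32, 400, 50};
        long[] label = {32, 77, 50};

        // prints [1]
        System.out.println(Arrays.toString(mismatchedDims(input, label)));
    }
}
```

For these shapes the only differing dimension is index 1 (400 vs. 77). Dimension 2, which the message labels as the sequence length, is 50 in both arrays. If 400 is 2*lstmLayerSize, that would imply lstmLayerSize = 200, although this value is an inference and is not stated in the report.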

How can this be fixed?

Version Information

Deeplearning4j version - 1.0.0-M2
Platform information - macOS x86

Greg Barton · Answer 1 · Tue Aug 23 2022 05:50:18 GMT+0800 (China Standard Time)

This issue is also present on the Deeplearning4j master branch.

Removing the precondition at org.deeplearning4j.nn.layers.recurrent.RnnOutputLayer.backpropGradient (RnnOutputLayer.java:59) allows the example to run, and it appears to run successfully.

Adam Gibson · Answer 2 · Tue Aug 23 2022 21:33:23 GMT+0800 (China Standard Time)

@gregbarton if you have a fix, feel free to submit a pull request so we can get you credit. I will be happy to review it! Thanks! Please also migrate this issue over to the main dl4j repo, as that gets more traffic than this one.

Adam Gibson · Answer 3 · Thu Sep 15 2022 07:12:07 GMT+0800 (China Standard Time)

Following up from Reddit: still the same issue. It is not yet clear why.

Greg Barton · Answer 4 · Thu Sep 15 2022 07:23:07 GMT+0800 (China Standard Time)

In your run https://gist.github.com/agibsonccc/e89d4bb08e3b94c65833b96e6c4945ea you ran org.deeplearning4j.examples.advanced.modelling.charmodelling.generatetext.GenerateTxtModel and not GenerateTxtCharCompGraphModel. GenerateTxtModel works like a champ for me.

Adam Gibson · Answer 5 · Thu Sep 15 2022 09:12:19 GMT+0800 (China Standard Time)

@gregbarton ah sorry for the confusion! Let me take a look again.

Adam Gibson · Answer 6 · Thu Sep 15 2022 21:00:33 GMT+0800 (China Standard Time)

@gregbarton I see the error now and will be able to fix this. Thanks for highlighting!