yajiemiao / pdnn

PDNN: A Python Toolkit for Deep Learning. http://www.cs.cmu.edu/~ymiao/pdnntk.html


Finetuning the model

hemmingstein opened this issue · comments

Hello again,
after fixing the learning rate problem, I'm struggling with the next one: I get to the "finetuning the model" step, and then this error appears:

"Traceback (most recent call last):
File "pdnn/cmds/run_CNN.py", line 93, in
train_error = train_sgd(train_fn, cfg)
File "pdnn/learning/sgd.py", line 72, in train_sgd
train_error.append(train_fn(index=batch_index, learning_rate = learning_rate, momentum = momentum))
File "/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function_module.py", line 606, in call
storage_map=self.fn.storage_map)
File "/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function_module.py", line 595, in call
outputs = self.fn()
ValueError: total size of new array must be unchanged
Apply node that caused the error: Reshape{4}(Subtensor{int64:int64:}.0, TensorConstant{[256 1 28 28]})
Inputs types: [TensorType(float64, matrix), TensorType(int64, vector)]
Inputs shapes: [(256, 40), (4,)]
Inputs strides: [(320, 8), (8,)]
Inputs values: ['not shown', array([256, 1, 28, 28])]"

I'm a bit puzzled by this; can you please help me?
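
The shapes in the traceback already hint at the problem: a (256, 40) batch contains 256 * 40 = 10,240 values, while the reshape target [256, 1, 28, 28] needs 256 * 784 = 200,704. A minimal NumPy sketch (an illustration of the same failure, not PDNN code):

import numpy as np

# A mini-batch shaped like the one in the traceback: 256 samples, 40 features each.
batch = np.zeros((256, 40))

# PDNN's CNN input layer reshapes batches to (batch, channels, height, width).
# The target needs 256 * 1 * 28 * 28 = 200704 elements, but the batch only
# holds 256 * 40 = 10240, so the reshape raises the same "total size" complaint.
batch.reshape((256, 1, 28, 28))  # ValueError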

Could you paste the full command line you ran?

Yeah, here it is (I added newlines for readability):

python pdnn/cmds/run_CNN.py \
    --train-data "train.pfile" \
    --valid-data "dev.pfile" \
    --conv-nnet-spec "1x28x28:20,5x5,p2x2:50,5x5,p2x2,f" \
    --nnet-spec "512:10" \
    --wdir ./ \
    --l2-reg 0.0001 \
    --lrate "C:0.125:20" \
    --model-save-step 20 \
    --param-output-file cnn.param \
    --cfg-output-file cnn.cfg
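
For reference, the input layer "1x28x28" in --conv-nnet-spec implies that every sample in train.pfile must carry 1 * 28 * 28 = 784 features. A hedged sanity check (the feats array and its loading are hypothetical, not PDNN's API):

import numpy as np

# Hypothetical feature matrix as it would come out of train.pfile:
# one row per sample; the shape matches the traceback above.
feats = np.zeros((256, 40))

channels, height, width = 1, 28, 28   # from --conv-nnet-spec "1x28x28"
expected = channels * height * width  # 784 features per sample

if feats.shape[1] != expected:
    raise ValueError("pfile provides %d features per sample, conv spec expects %d"
                     % (feats.shape[1], expected))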

I just tested the latest version on both GPUs and CPUs, and didn't see any similar problems.

For CNNs, PDNN requires that the batch size not change after the fine-tuning function is compiled. My reading of the error message is that the mini-batch size defaults to 256, but during execution the batch is interpreted as having a different size. Why that happens is beyond me, though. My guess is it's due to your compiler, for the same reason as in your last post.
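
To illustrate the fixed-after-compilation point (a minimal Theano sketch, not PDNN's actual code): once the reshape target is baked into the compiled graph as a constant, any batch whose total size differs triggers exactly this ValueError.

import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')
# The batch size and image dimensions are frozen as a constant when the
# function is compiled, just like the TensorConstant{[256 1 28 28]} above.
f = theano.function([x], x.reshape((256, 1, 28, 28)))

f(np.zeros((256, 784), dtype=theano.config.floatX))  # fine: 256*784 elements
f(np.zeros((256, 40), dtype=theano.config.floatX))   # ValueError, as in the traceback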

Thanks anyway!

I had an error like yours. I discovered that there were old nnet.tmp and training_state.tmp files sitting in the same directory, with different net dimensions; that was causing the error. Simply deleting those files did the trick!
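
To guard against this in future runs, a small hedged snippet (the filenames come from the comment above; deleting them forces PDNN to start fine-tuning from scratch rather than resuming):

import os

# Stale checkpoints from a previous run with different net dimensions.
# Removing them before retraining avoids the shape mismatch on resume.
for fname in ("nnet.tmp", "training_state.tmp"):
    if os.path.exists(fname):
        os.remove(fname)
        print("removed stale %s" % fname)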

Thanks, I'll try it.