CUDA and Torch Tutorial 2_supervised
markpeot opened this issue · comments
If I set opt.type = 'cuda', train() fails with:
/usr/local/share/torch/lua/nn/SpatialConvolution.lua:44: attempt to index field 'nn' (a nil value)
stack traceback:
/usr/local/share/torch/lua/nn/SpatialConvolution.lua:44: in function 'updateOutput'
/usr/local/share/torch/lua/nn/Sequential.lua:27: in function 'forward'
4_train.lua:160: in function 'opfunc'
/usr/local/share/torch/lua/optim/sgd.lua:36: in function 'optimMethod'
4_train.lua:190: in function 'train'
[string "train()"]:1: in main chunk
[C]: at 0x7fd4d92fea20
The relevant code (from SpatialConvolution.lua) is:
function SpatialConvolution:updateOutput(input)
   return input.nn.SpatialConvolution_updateOutput(self, input)
end
The code works fine (albeit slowly) with type = Float or Double.
The same thing happens if I use SpatialConvolutionCUDA.
So, why is the field 'nn' undefined when using cuda?
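Digging into this a bit: `input.nn` appears to be Torch's per-tensor-type dispatch table, populated by each backend package. A sketch of what I mean (illustrative only, based on my reading of the 2013-era Torch7 sources; not a confirmed diagnosis):

```lua
require 'torch'
require 'nn'

-- For a FloatTensor, the 'nn' package has registered its C kernels on the
-- tensor type's metatable, so input.nn is a table of functions:
local input = torch.FloatTensor(3, 32, 32)
print(input.nn.SpatialConvolution_updateOutput)  -- a function

-- For a CudaTensor, that table is only populated once the CUDA backend
-- ('cunn') has registered its kernels; if it hasn't, input.nn is nil and
-- indexing it produces exactly the error in the traceback above.
```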
Mark
Hey Mark, this is an issue of outdated tutorial scripts. Could you point me to the repository you're using for the tutorial scripts and open a bug report over there?
I'll send in a pull request to fix those.
I was using https://github.com/clementfarabet/torch-tutorials. Is this obsolete?
The tutorials are the latest available, but on the CUDA side they're definitely out of date.
When switching to CUDA, SpatialConvolutionMM should be replaced by SpatialConvolution, which has exactly the same behavior.
SpatialConvolutionCUDA is much more efficient with mini-batches, but doesn't have the same behavior.
These tutorials were never meant to demonstrate speed. For proper batch use, this code would be written entirely differently.
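A sketch of the swap described above (the model around it is hypothetical; module names are from the 2013-era nn/cunn API):

```lua
require 'nn'
require 'cunn'   -- loads the CUDA backends for the nn modules

local model = nn.Sequential()
-- was: model:add(nn.SpatialConvolutionMM(1, 16, 5, 5))
model:add(nn.SpatialConvolution(1, 16, 5, 5))  -- same behavior, CUDA-capable
model:add(nn.Tanh())

model:cuda()                                   -- move parameters to the GPU
local input = torch.randn(1, 32, 32):cuda()    -- inputs must be CudaTensors too
local output = model:forward(input)
```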
Once I realized that the tutorial wasn't designed to run with CUDA acceleration under the current design of Torch, I was able to re-engineer the tutorial to work:
- Replace SpatialConvolutionMM by SpatialConvolution everywhere.
- In 2_supervised/2_model.lua, replace
  -- Using a 1D kernel causes CUDA memory errors in SpatialConvolutionMap inside SpatialSubtractiveNormalization
  normkernel = image.gaussian1d(7)
  with
  -- A 2D kernel is less efficient than two 1D convolutions, but works correctly.
  normkernel = image.gaussian{size=7, sigma=1}
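For context, here is roughly how that kernel is consumed in 2_model.lua (the sizes are hypothetical; the real values are set earlier in the script):

```lua
require 'nn'
require 'image'

local nstates = 64  -- hypothetical number of feature maps

-- was: local normkernel = image.gaussian1d(7)   -- 1D kernel: CUDA memory errors
local normkernel = image.gaussian{size=7, sigma=1}  -- 2D kernel: works correctly

local model = nn.Sequential()
-- the normalization layer builds a SpatialConvolutionMap from the kernel
model:add(nn.SpatialSubtractiveNormalization(nstates, normkernel))
```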
You write: These tutorials were never meant to demonstrate speed. For proper batch use, this code would be written entirely differently.
Aside from using a larger batch, what else would you change to accelerate performance?
Mark
Cool, thanks for the instructions.
To get an awesome speedup you need to use the modules named ***CUDA (SpatialConvolutionCUDA, ...) and batch sizes that are multiples of 32; 32 itself is usually fine. The catch is that these modules assume the batch is the innermost dimension in the state tensors. So it's a bit involved, but it gives you a 5 to 7x speedup over the regular CUDA code.
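The layout change described above can be sketched like this (tensor sizes and the constructor signature are assumptions; SpatialConvolutionCUDA's exact API should be checked against the cunn sources):

```lua
require 'cunn'

local batch = 32   -- the ***CUDA modules want batch sizes in multiples of 32

-- Regular nn layout: batch x depth x height x width
local x = torch.randn(batch, 3, 32, 32):cuda()

-- The ***CUDA modules expect batch as the innermost dimension:
-- depth x height x width x batch
local xT = x:transpose(1, 2):transpose(2, 3):transpose(3, 4):contiguous()

-- assumed to mirror nn.SpatialConvolution's (nInput, nOutput, kW, kH) signature
local conv = nn.SpatialConvolutionCUDA(3, 32, 5, 5):cuda()
local y = conv:forward(xT)
```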
Hi all,
sorry for raising an ancient topic. I'm having the same problem when running on CUDA, but replacing SpatialConvolutionMM with SpatialConvolution in 2_model.lua doesn't do the trick; the same error occurs. Any reason why that might be?
I know it's not meant to run on CUDA, just wanted to see how/if it works before getting more serious and writing my own code.
Thanks!
irendulic, which tutorial are you referring to? Can you give the GitHub URL?
Hi, the mentioned tutorial is on Clement's github:
https://github.com/clementfarabet/torch-tutorials/tree/master/2_supervised