jrbtaylor / ActivityNet

2016 ActivityNet action recognition challenge. CNN + LSTM approach. Multi-threaded loading.

About trainThreads.lua

ajdroid opened this issue

Probably not the right place, but can you tell me what this line is doing?

Is it picking the next index to load?

That is correct.

To clarify, the inputs to addjob() are (see the sketch after this list):

  1. callback: a function to run on the thread (in this case, loading the data)
  2. endcallback: a function to run on the main thread that takes the output of callback as its input (in this case, training on the data)
  3. callback input: input to the first function (in this case, the index to load)
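
For illustration only, a call with those three pieces looks roughly like the following. This is a sketch, not the repo's code; loadSample, trainOnSample, nextIndex, and the pool size are placeholders.

    local threads = require 'threads'
    local pool = threads.Threads(4)     -- 4 worker threads, chosen arbitrarily

    pool:addjob(
       -- 1. callback: runs on a worker thread (loads the data)
       function(idx)
          return loadSample(idx)        -- placeholder loader
       end,
       -- 2. endcallback: runs on the main thread with the callback's return value
       function(sample)
          trainOnSample(sample)         -- placeholder training step
       end,
       -- 3. callback input: forwarded to the callback (the index to load)
       nextIndex
    )

    pool:synchronize()                  -- block until the queued jobs finish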

Thanks, that makes sense. Have you observed any overhead while using threads?

I didn't profile the code. I'm sure some overhead is unavoidable.

The package I used to load videos is incredibly slow: it loads the whole video, extracts the specified frames, saves them as image files on disk, re-loads those image files, and then deletes them. I saw a roughly 3x overall speed-up by parallelizing the loading, which suggests the loading takes longer than actually training the network. The speed-up will likely be smaller if you're loading from an SSD or have a better way of getting video into Torch, but in my case it was significant.

Okay. We've written frames (RGB+flow) to disk already, so maybe it won't be that much of a speedup.
Thanks for all the help!

Hi again, is there any reason you defined your feval closure inside the endcallback? It could be in f2 when the threads get initialized, or am I missing something?

I recommend reading the docs for the Threads package, specifically:
https://github.com/torch/threads#threadsthreadsnf1f2
https://github.com/torch/threads#threadsaddjobid-callback-endcallback-

The training must take place in the main thread to avoid writing to overlapping memory on the GPU. The threads package only manages CPU threads and their access to RAM; the threads have no idea where or how the others are using the GPU. You can test it yourself: you'll get a segmentation fault.
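
As a rough illustration of that split (placeholder names, not the actual script): the worker callback only builds CPU tensors, and the transfer to the GPU plus the training step happen in the endcallback on the main thread.

    pool:addjob(
       function(idx)
          -- worker thread: CPU-only work; never touch CUDA tensors here
          return loadFramesCPU(idx)     -- placeholder, returns a torch.FloatTensor
       end,
       function(batchCPU)
          -- main thread: the only place the GPU is used
          local batchGPU = batchCPU:cuda()
          trainStep(batchGPU)           -- placeholder GPU training step
       end,
       nextIndex
    )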

No, I know, but feval gets called inside optimMethod. Having the call to optimMethod (rather than the entire definition of feval) inside the main thread suffices for my script; I defined feval outside the pool:addjob().
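
Roughly like this (a simplified sketch, not my actual script; model, criterion, loadBatch, and the optimizer settings are placeholders):

    require 'optim'

    local params, gradParams = model:getParameters()
    local optimMethod, optimState = optim.sgd, {learningRate = 0.01}
    local batchInputs, batchTargets

    -- feval is defined once, in the main thread, outside pool:addjob()
    local feval = function(x)
       if x ~= params then params:copy(x) end
       gradParams:zero()
       local output = model:forward(batchInputs)
       local loss = criterion:forward(output, batchTargets)
       model:backward(batchInputs, criterion:backward(output, batchTargets))
       return loss, gradParams
    end

    pool:addjob(
       loadBatch,                                 -- worker thread: CPU loading only
       function(inputs, targets)
          -- main thread: move the batch to the GPU and take one optimization step
          batchInputs, batchTargets = inputs:cuda(), targets:cuda()
          optimMethod(feval, params, optimState)
       end,
       nextIndex
    )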

Yeah, that should be fine then. Was it any faster that way?