jrbtaylor / ActivityNet

2016 ActivityNet action recognition challenge. CNN + LSTM approach. Multi-threaded loading.

About trainThreads.lua

ajdroid opened this issue

Probably not the right place, but can you tell me what this line is doing?

Is it picking the next index to load?

That is correct.

To clarify, the inputs to addjob() are (see the sketch after this list):

  1. callback: a function to run on the thread (in this case, loading the data)
  2. endcallback: a function to run on the main thread that takes the output of callback as its input (in this case, training on the data)
  3. callback input: input to the first function (in this case, the index to load)
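
For illustration only, a call with those three pieces looks roughly like the following. This is a sketch, not the repo's code; loadSample, trainOnSample, nextIndex, and the pool size are placeholders.

    local threads = require 'threads'
    local pool = threads.Threads(4)     -- 4 worker threads, chosen arbitrarily

    pool:addjob(
       -- 1. callback: runs on a worker thread (loads the data)
       function(idx)
          return loadSample(idx)        -- placeholder loader
       end,
       -- 2. endcallback: runs on the main thread with the callback's return value
       function(sample)
          trainOnSample(sample)         -- placeholder training step
       end,
       -- 3. callback input: forwarded to the callback (the index to load)
       nextIndex
    )

    pool:synchronize()                  -- block until the queued jobs finish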

Thanks, that makes sense. Have you observed any overhead while using threads?

I didn't profile the code. I'm sure some overhead is unavoidable.

The package I used to load videos is incredibly slow: it loads the whole video, extracts the specified frames, saves them as image files on disk, re-loads those image files, and then deletes them. I saw a roughly 3x overall speed-up by parallelizing the loading, which suggests the loading takes longer than actually training the network. The speed-up will likely be smaller if you're loading from an SSD or have a better way of getting video into Torch, but in my case it was significant.

Okay. We've written frames (RGB+flow) to disk already, so maybe it won't be that much of a speedup.
Thanks for all the help!

Hi again, is there any reason you defined your feval closure inside the endcallback? It could be in f2 when the threads get initialized, or am I missing something?

I recommend reading the docs for the Threads package, specifically:
https://github.com/torch/threads#threadsthreadsnf1f2
https://github.com/torch/threads#threadsaddjobid-callback-endcallback-

The training must take place in the main thread to avoid writing to overlapping memory on the GPU. The threads package only manages CPU threads and their access to RAM; the threads have no idea where or how the others are using the GPU. You can test it yourself: you'll get a segmentation fault.
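
As a rough illustration of that split (placeholder names, not the actual script): the worker callback only builds CPU tensors, and the transfer to the GPU plus the training step happen in the endcallback on the main thread.

    pool:addjob(
       function(idx)
          -- worker thread: CPU-only work; never touch CUDA tensors here
          return loadFramesCPU(idx)     -- placeholder, returns a torch.FloatTensor
       end,
       function(batchCPU)
          -- main thread: the only place the GPU is used
          local batchGPU = batchCPU:cuda()
          trainStep(batchGPU)           -- placeholder GPU training step
       end,
       nextIndex
    )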

No, I know, but feval gets called inside optimMethod. Having the call to optimMethod (rather than the entire definition of feval) inside the main thread suffices for my script; I defined feval outside the pool:addjob().
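
Roughly like this (a simplified sketch, not my actual script; model, criterion, loadBatch, and the optimizer settings are placeholders):

    require 'optim'

    local params, gradParams = model:getParameters()
    local optimMethod, optimState = optim.sgd, {learningRate = 0.01}
    local batchInputs, batchTargets

    -- feval is defined once, in the main thread, outside pool:addjob()
    local feval = function(x)
       if x ~= params then params:copy(x) end
       gradParams:zero()
       local output = model:forward(batchInputs)
       local loss = criterion:forward(output, batchTargets)
       model:backward(batchInputs, criterion:backward(output, batchTargets))
       return loss, gradParams
    end

    pool:addjob(
       loadBatch,                                 -- worker thread: CPU loading only
       function(inputs, targets)
          -- main thread: move the batch to the GPU and take one optimization step
          batchInputs, batchTargets = inputs:cuda(), targets:cuda()
          optimMethod(feval, params, optimState)
       end,
       nextIndex
    )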

Yeah, that should be fine then. Was it any faster that way?