How do I enable multi-GPU training?
AmrutaMuthal opened this issue · comments
I am trying to use this on a multi-GPU cloud system. I am not sure which parameters to change to utilize all the GPUs.
I am running this on
Try setting num_clones=YOUR_GPU_NUMS.
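For context, in the legacy TensorFlow Object Detection API the num_clones flag is passed to train.py on the command line, together with ps_tasks=1 for single-machine multi-GPU training. The paths below and the GPU count of 2 are placeholders, not values from this thread:

```shell
# Sketch of a legacy TF Object Detection API training run on 2 GPUs.
# pipeline.config and train_dir paths are illustrative placeholders.
python object_detection/legacy/train.py \
  --pipeline_config_path=path/to/pipeline.config \
  --train_dir=path/to/train_dir \
  --num_clones=2 \
  --ps_tasks=1
```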
Hi @AmrutaMuthal, this repo relies on a dated version of the TensorFlow object detection API. We've moved to a more future-proof version here: https://github.com/cloud-annotations/training
I think you're relying on train.py for training on multiple GPUs; however, that script has been removed in the new version (there is no multi-GPU support in the latest version; it's coming with the switch to Keras). In the meantime you can set the revision flag to a version that still includes train.py.
I was able to train using all the GPUs with the num_clones option. I ended up with a low but volatile loss. I expected training speed to increase, but that didn't happen either. I realised that the data needs some cleanup: I have some very blurred and dark images in my training set. Removing those should help with the loss at least. I am not sure how to improve training speed; I am getting close to 1.2 sec/epoch with a training set of 10K images and batch size = 1.
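One common way to screen out dark or blurry images before training is to threshold the mean brightness and the variance of a Laplacian response. This is a minimal NumPy-only sketch, not part of this repo; the function names and both thresholds (40 for brightness, 100 for sharpness) are illustrative and should be tuned on a sample of the real data:

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of a 3x3 Laplacian response; low values suggest blur."""
    lap = (-4.0 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

def keep_image(gray: np.ndarray,
               min_brightness: float = 40.0,
               min_sharpness: float = 100.0) -> bool:
    """Return True if a grayscale image (values 0-255) looks bright
    and sharp enough to keep in the training set."""
    if gray.mean() < min_brightness:               # too dark
        return False
    if laplacian_variance(gray) < min_sharpness:   # too blurry
        return False
    return True
```

Running this over the training set and dropping the rejected files would be a cheap first pass; images near the thresholds are worth eyeballing manually.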