How do I enable multi-GPU training?
AmrutaMuthal opened this issue · comments
I am trying to use this on a multi-GPU cloud system. I am not sure which parameters to change to utilize all the GPUs.
I am running this on
Try setting num_clones=YOUR_GPU_NUMS.
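For context, in the legacy TensorFlow Object Detection API the num_clones flag is passed to train.py on the command line, together with ps_tasks=1 for single-machine multi-GPU training. The paths below and the GPU count of 2 are placeholders, not values from this thread:

```shell
# Sketch of a legacy TF Object Detection API training run on 2 GPUs.
# pipeline.config and train_dir paths are illustrative placeholders.
python object_detection/legacy/train.py \
  --pipeline_config_path=path/to/pipeline.config \
  --train_dir=path/to/train_dir \
  --num_clones=2 \
  --ps_tasks=1
```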
Hi @AmrutaMuthal, this repo relies on a dated version of the TensorFlow object detection API. We've moved to a more future-proof version here: https://github.com/cloud-annotations/training
I think you're relying on train.py for training on multiple GPUs; however, that script has been removed in the new version (there is no multi-GPU support in the latest version; it's coming with the switch to Keras). In the meantime you can set the revision flag to a version that still includes train.py.
I was able to train using all the GPUs with the num_clones option. I ended up with a low but volatile loss. I expected training speed to increase, but that didn't happen either. I realised that the data needs some cleanup: I have some very blurred and dark images in my training set. Removing those should help with the loss at least. I am not sure how to improve training speed; I am getting close to 1.2 sec/epoch with a training set of 10K images and batch size = 1.
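One common way to screen out dark or blurry images before training is to threshold the mean brightness and the variance of a Laplacian response. This is a minimal NumPy-only sketch, not part of this repo; the function names and both thresholds (40 for brightness, 100 for sharpness) are illustrative and should be tuned on a sample of the real data:

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of a 3x3 Laplacian response; low values suggest blur."""
    lap = (-4.0 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

def keep_image(gray: np.ndarray,
               min_brightness: float = 40.0,
               min_sharpness: float = 100.0) -> bool:
    """Return True if a grayscale image (values 0-255) looks bright
    and sharp enough to keep in the training set."""
    if gray.mean() < min_brightness:               # too dark
        return False
    if laplacian_variance(gray) < min_sharpness:   # too blurry
        return False
    return True
```

Running this over the training set and dropping the rejected files would be a cheap first pass; images near the thresholds are worth eyeballing manually.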