roytseng-tw / Detectron.pytorch

A PyTorch implementation of Detectron. Both training from scratch and inference directly from pretrained Detectron weights are supported.

Index out of range on multi-GPU (8 GPUs) after first epoch

akshitac8 opened this issue

Expected results

Successful Training

Actual results

Detailed steps to reproduce

After running the main script, on completion of the first epoch I get an index out of range error (with drop_last = False) on this line:

mini_kwargs = dict([(k, v[i]) for k, v in kwargs.items()])

I tried to trace the cause and found that, after the first epoch, the problem occurs for the last 3 device ids, i.e. 5, 6 and 7, which is very weird behaviour.
Example command:

CUDA_VISIBLE_DEVICES=4,5,6,7 python tools/train_net_step.py --dataset dota_patches --cfg configs/baselines/e2e_mask_rcnn_X-101-64x4d-FPN_2x.yaml --bs 8 --nw 8
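One possible explanation, offered only as a hypothesis and not a confirmed diagnosis, is that with drop_last = False the final batch of the epoch holds fewer samples than there are GPUs, so the per-device lists in kwargs are shorter than the number of devices. The sketch below reproduces that situation in plain Python. The helpers `split_into_chunks` and `failing_device_ids` and the `"data"` key are hypothetical names made up for illustration; only the indexing expression is taken from the report.

```python
def split_into_chunks(samples, num_devices):
    """Hypothetical stand-in for a scatter step: yields at most one chunk per device."""
    if not samples:
        return []
    chunk_size = -(-len(samples) // num_devices)  # ceiling division
    return [samples[s:s + chunk_size] for s in range(0, len(samples), chunk_size)]


def failing_device_ids(batch_size, num_devices):
    """Return the device indices for which the reported indexing line raises IndexError."""
    samples = list(range(batch_size))
    # "data" is an assumed key; the real kwargs may contain other entries.
    kwargs = {"data": split_into_chunks(samples, num_devices)}
    failed = []
    for i in range(num_devices):
        try:
            # Same expression as the line from the report.
            mini_kwargs = dict([(k, v[i]) for k, v in kwargs.items()])
        except IndexError:
            failed.append(i)
    return failed


# A full batch gives every device a chunk; a short final batch does not.
print(failing_device_ids(batch_size=8, num_devices=8))
print(failing_device_ids(batch_size=5, num_devices=8))
```

If this is indeed the cause, dropping or padding the short final batch, or choosing a batch size that divides the dataset size evenly, would avoid indexing past the end of the per-device lists.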

System information

  • Operating system: Ubuntu 16.04
  • CUDA version: 9.0
  • cuDNN version: 7.0
  • GPU models (for all devices if they are not all the same):?
  • python version: 3.6
  • pytorch version: 0.4.0