TRI-ML / dd3d

Official PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park*, Rares Ambrus*, Vitor Guizilini, Jie Li, and Adrien Gaidon.

Multi-GPU training inside docker

williamhyin opened this issue:

Hi,

Thanks for releasing the code.
I have a question about the multi-GPU training command.
Is it possible to train with multiple GPUs (8) inside Docker?

For example: python -m torch.distributed.launch --nproc_per_node 8 train.py xxx

Launching multi-GPU training from outside Docker with the following command is inconvenient for server training:

make docker-run-mpi COMMAND="".

I am looking forward to your reply.
Thanks again for your great work!

Thanks for the interest, @williamhyin. By default, we only support multi-GPU training via make docker-run-mpi ... . It should be possible to modify train.py to make it work with the PyTorch launcher. We will have a look at this if there are enough use cases.
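For anyone who wants to try that modification, here is a minimal sketch of the first step: working out the process rank and world size regardless of which launcher started the script. It assumes Open MPI's mpirun (which exports the OMPI_COMM_WORLD_* variables) and the PyTorch launcher (which exports RANK, WORLD_SIZE and, depending on the version and flags, LOCAL_RANK). The helper name detect_distributed_env is hypothetical and is not part of dd3d; how train.py actually initializes its process group is not shown in this thread.

```python
import os

# Environment variables exported by each launcher. The OMPI_* names come from
# Open MPI's mpirun; the unprefixed names come from the PyTorch launcher.
_LAUNCHER_VARS = (
    ("OMPI_COMM_WORLD_RANK", "OMPI_COMM_WORLD_SIZE", "OMPI_COMM_WORLD_LOCAL_RANK"),
    ("RANK", "WORLD_SIZE", "LOCAL_RANK"),
)


def detect_distributed_env(environ=None):
    """Return (rank, world_size, local_rank) for the current process.

    Hypothetical helper, not part of dd3d. Falls back to a single-process
    setup (0, 1, 0) when neither launcher's variables are present.
    """
    env = os.environ if environ is None else environ
    for rank_key, size_key, local_key in _LAUNCHER_VARS:
        if rank_key in env and size_key in env:
            rank = int(env[rank_key])
            world_size = int(env[size_key])
            # The local rank may be absent (for example, older versions of
            # torch.distributed.launch without --use_env). Falling back to the
            # global rank is only correct on a single node.
            local_rank = int(env.get(local_key, rank))
            return rank, world_size, local_rank
    return 0, 1, 0


if __name__ == "__main__":
    print(detect_distributed_env())
```

The returned values could then be handed to whatever process-group setup the training script uses; that wiring is left out here because it depends on code not quoted in this issue.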

Hi, I would also like to know how to train with multiple GPUs using python -m torch.distributed.launch --nproc_per_node 8 train.py xxx
Looking forward to your reply, and thanks again for your great work!

Hi, guys! It's easy to train with multiple GPUs without Docker. After installing all the requirements, just run CUDA_VISIBLE_DEVICES="x,x,x,x" mpirun -np ${num_gpus} ./script/train.py +experiments=dd3d_kitti_dla34.yaml to start multi-GPU training.

Hi @revisitq,
I ran into this error:

mpirun was unable to launch the specified application as it could not access
or execute an executable:
Executable: ./script/train.py
Node: shaxbw06
while attempting to start process rank 0.

The command line is

CUDA_VISIBLE_DEVICES=5,7 mpirun -np 2 ./script/train.py +experiments=dd3d_kitti_dla34.yaml 

  1. Make sure you have installed the dependencies as listed in the Dockerfile.
  2. Check your command. It should be CUDA_VISIBLE_DEVICES="5,7" mpirun -np 2 ./script/train.py +experiments=dd3d_kitti_dla34.yaml
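The mpirun message above says it could not access or execute the file, so it may also be worth checking the script path itself. The sketch below is a hypothetical helper (diagnose_executable is not part of dd3d or Open MPI) that reports the usual reasons a script cannot be launched directly: the path does not exist, the execute bit is missing, or the file has no shebang line.

```python
import os


def diagnose_executable(path):
    """Return a list of reasons why `path` cannot be launched directly.

    Hypothetical helper for illustration; an empty list means none of the
    checked problems were found.
    """
    problems = []
    if not os.path.isfile(path):
        problems.append(f"{path} does not exist or is not a regular file")
        return problems
    if not os.access(path, os.X_OK):
        problems.append(f"{path} is not executable (try: chmod +x {path})")
    with open(path, "rb") as f:
        # Without a shebang the OS does not know which interpreter to use.
        if f.read(2) != b"#!":
            problems.append(
                f"{path} has no shebang line; launch it via 'python {path}'"
            )
    return problems


if __name__ == "__main__":
    for reason in diagnose_executable("./script/train.py"):
        print(reason)
```

If the path check fails, compare the directory name in the command with the one in your checkout. Passing the script to the interpreter explicitly (mpirun ... python <path to train.py> ...) avoids the execute-bit and shebang requirements.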

@williamhyin you can build a conda env yourself and run it with mpirun -n 8 python scripts/train.py +experiments=dd3d_kitti_dla34