Multi-GPU training inside docker
williamhyin opened this issue
Hi,
Thanks for your code release.
I have a question about the multi-GPU training command.
Is it possible to train with multiple GPUs (8) inside Docker?
For example: python -m torch.distributed.launch --nproc_per_node 8 train.py xxx
Launching multi-GPU training from outside Docker with the following command is not very convenient for training on a server:
make docker-run-mpi COMMAND=""
I am looking forward to your reply.
And thanks again for your great job!
Thanks for the interest @williamhyin. By default, we only support multi-GPU training via the make docker-run-mpi ... command. It should be possible to modify train.py to make it work with the PyTorch launcher. We will have a look at this if there are a number of use cases.
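As a rough sketch of what such a modification might involve, a training script can resolve its rank and world size from whichever launcher started it. The helper below is hypothetical (`resolve_dist_env` and `DistEnv` are not part of this repository). It assumes Open MPI's `OMPI_COMM_WORLD_*` variables and the `RANK`, `WORLD_SIZE` and `LOCAL_RANK` variables exported by the PyTorch launcher; the actual changes needed in train.py may differ.

```python
import os
from typing import Mapping, NamedTuple, Optional


class DistEnv(NamedTuple):
    """Hypothetical container for the values a training script needs."""
    rank: int
    world_size: int
    local_rank: int
    launcher: str


def resolve_dist_env(env: Optional[Mapping[str, str]] = None) -> DistEnv:
    """Detect the launcher from environment variables (illustrative only)."""
    env = os.environ if env is None else env

    # Open MPI's mpirun exports OMPI_COMM_WORLD_* for each process it starts.
    if "OMPI_COMM_WORLD_RANK" in env:
        return DistEnv(
            rank=int(env["OMPI_COMM_WORLD_RANK"]),
            world_size=int(env["OMPI_COMM_WORLD_SIZE"]),
            local_rank=int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", "0")),
            launcher="mpirun",
        )

    # The PyTorch launcher exports RANK and WORLD_SIZE; LOCAL_RANK is assumed
    # to be present as well and defaults to 0 here if it is not.
    if "RANK" in env and "WORLD_SIZE" in env:
        return DistEnv(
            rank=int(env["RANK"]),
            world_size=int(env["WORLD_SIZE"]),
            local_rank=int(env.get("LOCAL_RANK", "0")),
            launcher="torch",
        )

    # No launcher detected: behave as a single-process run.
    return DistEnv(rank=0, world_size=1, local_rank=0, launcher="none")
```

The resolved values could then be passed to whatever process-group setup the script already performs, so the same entry point works under either launcher.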
Hi, I would also like to know how to train with multiple GPUs using python -m torch.distributed.launch --nproc_per_node 8 train.py xxx
Looking forward to your reply, and thanks again for your great work!
Hi, guys! It's easy to train with multiple GPUs without Docker. After installing all the requirements, just run the following command to start multi-GPU training:
CUDA_VISIBLE_DEVICES="x,x,x,x" mpirun -np ${num_gpus} ./script/train.py +experiments=dd3d_kitti_dla34.yaml
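The command above pairs a list of visible GPUs with a matching process count. A minimal sketch of keeping the two in sync, assuming one process per GPU as the command suggests, might look like this (`build_mpirun_command` is a hypothetical helper, not something the repository provides):

```python
import shlex
from typing import Dict, List, Sequence, Tuple


def build_mpirun_command(
    gpu_ids: Sequence[int], script: str, overrides: Sequence[str]
) -> Tuple[Dict[str, str], List[str]]:
    """Return (extra environment, argv) with -np equal to the number of GPUs."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    # Restrict the processes to the chosen devices.
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpu_ids)}
    # Assumption: one MPI process per visible GPU.
    argv = ["mpirun", "-np", str(len(gpu_ids)), script, *overrides]
    return env, argv


env, argv = build_mpirun_command(
    [5, 7], "./script/train.py", ["+experiments=dd3d_kitti_dla34.yaml"]
)
# Print a shell-ready command line for inspection before running it.
print(" ".join(f"{k}={shlex.quote(v)}" for k, v in env.items()), shlex.join(argv))
```

The script path and the experiment override are copied from the comment above; adjust them to match your checkout.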
Hi, @revisitq
I got the following error:
mpirun was unable to launch the specified application as it could not access
or execute an executable:
Executable: ./script/train.py
Node: shaxbw06
while attempting to start process rank 0.
The command line was:
CUDA_VISIBLE_DEVICES=5,7 mpirun -np 2 ./script/train.py +experiments=dd3d_kitti_dla34.yaml
- Make sure you installed the dependencies by following the Dockerfile.
- Check your command; it should be:
CUDA_VISIBLE_DEVICES="5,7" mpirun -np 2 ./script/train.py +experiments=dd3d_kitti_dla34.yaml
@williamhyin you can build a conda environment yourself and run: mpirun -n 8 python scripts/train.py +experiments=dd3d_kitti_dla34
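The "could not access or execute an executable" error reported earlier in this thread usually means mpirun could not find the file at the given relative path, or the file cannot be executed directly. A small check along these lines may help narrow it down before launching (`diagnose_launch_target` is a hypothetical helper written for this thread, not part of the repository):

```python
import os
from typing import List


def diagnose_launch_target(path: str) -> List[str]:
    """Return a list of likely reasons mpirun cannot execute `path` directly."""
    problems: List[str] = []
    if not os.path.isfile(path):
        # Relative paths are resolved against the current working directory.
        problems.append(f"{path}: file not found relative to {os.getcwd()}")
        return problems
    if not os.access(path, os.X_OK):
        problems.append(
            f"{path}: not executable; launch it as 'python {path}' "
            "or add the execute bit"
        )
    with open(path, "rb") as f:
        # A script run directly needs an interpreter line at the top.
        if f.read(2) != b"#!":
            problems.append(f"{path}: no '#!' interpreter line at the top")
    return problems


for issue in diagnose_launch_target("./script/train.py"):
    print(issue)
```

An empty result means the file exists, is executable and starts with an interpreter line. Otherwise, passing the script to python explicitly, as in the mpirun -n 8 python scripts/train.py form above, avoids the last two requirements.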