facebookresearch / mae

PyTorch implementation of MAE https://arxiv.org/abs/2111.06377

Single machine multi-GPU training

AlexNmSED opened this issue

When I use 4 GPUs on a single machine, I get this error:
RuntimeError: [/pytorch/third_party/gloo/gloo/transport/tcp/pair.cc:575] Connection closed by peer [172.16.173.129]:23211

Can someone help me?

Thank you.

Try this:
python -m torch.distributed.launch --nproc_per_node=4 main_pretrain.py

Thank you, but that's what I'm already doing.
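
One thing that might be worth checking, though nothing in this thread confirms it: the peer address in the error (172.16.173.129) is not a loopback address, so Gloo may be connecting over an external network interface even though all 4 processes are on one machine. Below is a minimal sketch that builds the same launch command with the rendezvous address pinned to 127.0.0.1 and Gloo pinned to the loopback interface. It assumes Linux, where the loopback interface is usually named `lo`. `build_launch` is a hypothetical helper written for this illustration and the port 29500 is an arbitrary choice; `--master_addr`, `--master_port`, and the `GLOO_SOCKET_IFNAME` environment variable are existing PyTorch options.

```python
import os
import sys


def build_launch(nproc=4, script="main_pretrain.py", master_port=29500, ifname="lo"):
    """Return (argv, env) for a single-node launch pinned to loopback.

    Hypothetical helper for illustration only; it is not part of the MAE repo.
    """
    env = dict(os.environ)
    # Assumption: "lo" is the loopback interface name (typical on Linux).
    env["GLOO_SOCKET_IFNAME"] = ifname
    argv = [
        sys.executable, "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc}",
        "--master_addr=127.0.0.1",        # keep the rendezvous on this machine
        f"--master_port={master_port}",   # any free port should work
        script,
    ]
    return argv, env


argv, env = build_launch()
# Print the command so it can be inspected before running it.
print("GLOO_SOCKET_IFNAME=" + env["GLOO_SOCKET_IFNAME"], " ".join(argv))
```

To actually start training, pass the result to `subprocess.run(argv, env=env, check=True)`. If the error persists, a firewall rule or a port that is already in use would be other things to rule out.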