There seems to be a problem with the distributed training code. When I entered the training command, the program did not respond.
pfeducode opened this issue
Hi, @pfeducode! Did you solve this problem?
I also ran into this problem, but I couldn't solve it.
If you have solved it, would you share any ideas?
I deleted the distributed code, and then it could run normally.
There seems to be a problem with the distributed training code. When I entered the training command, the program did not respond, and I had to delete the distributed code.
Please use this command line:
CUDA_VISIBLE_DEVICES=0,1,2,3 python run_dataparallel.py --config config/vox-adv-256.yaml --device_ids 0,1,2,3 --name DaGAN_voxceleb2_depth --rgbd --batchsize 48 --kp_num 15 --generator DepthAwareGenerator
After removing the distributed code for the generator and discriminator and making the corresponding device changes in the "model_dataparallel.py" file, I successfully got it working on a single GPU.
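The workaround above boils down to replacing the `torch.distributed` wrappers with plain single-device execution (or `nn.DataParallel` for multiple GPUs, which needs no process-group initialization). A minimal sketch of that pattern, with an illustrative toy model in place of the repo's `DepthAwareGenerator`:

```python
import torch
import torch.nn as nn

# Toy stand-in for the generator; the real model comes from the repo's config.
generator = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 3))

# Instead of wrapping with nn.parallel.DistributedDataParallel (which hangs
# if the process group never initializes), move the model to one device:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
generator = generator.to(device)

# Optional multi-GPU without torch.distributed: DataParallel splits each
# batch across all visible GPUs inside a single process.
if torch.cuda.device_count() > 1:
    generator = nn.DataParallel(generator)

x = torch.randn(4, 8, device=device)  # a dummy batch of 4 samples
out = generator(x)
print(tuple(out.shape))
```

Note that `nn.DataParallel` is simpler but generally slower than a correctly configured `DistributedDataParallel`; it trades throughput for not having to launch and synchronize multiple processes.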