halbielee / ACoL_pytorch

PyTorch reproduction of Adversarial Complementary Learning for Weakly Supervised Object Localization

hi,

LISIOPPO opened this issue

Use GPU: 0 for training
Traceback (most recent call last):
  File "train.py", line 578, in <module>
    main()
  File "train.py", line 145, in main
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
  File "/home/steven/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
    while not spawn_context.join():
  File "/home/steven/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/steven/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/steven/ACoL_reproducing/train.py", line 171, in main_worker
    world_size=args.world_size, rank=args.rank)
  File "/home/steven/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 406, in init_process_group
    store, rank, world_size = next(rendezvous(url))
  File "/home/steven/anaconda3/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 95, in _tcp_rendezvous_handler
    store = TCPStore(result.hostname, result.port, world_size, start_daemon)
RuntimeError: Permission denied

The multi-GPU training code seems to be the cause of your error: the traceback fails inside init_process_group, which train.py reaches through mp.spawn. Please edit the code so that it trains on a single GPU.
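As a rough sketch of what a single-GPU change could look like: the idea is to call the worker directly instead of going through mp.spawn, so that no process group (and no TCP rendezvous) is ever set up. The `launch` helper, the `args.distributed` flag, and `demo_worker` below are assumptions made for illustration and are not taken from train.py; only the `main_worker(gpu, ngpus_per_node, args)` call shape is inferred from the traceback. The real worker would still need to skip its init_process_group call when the flag is off.

```python
from argparse import Namespace


def launch(main_worker, args, ngpus_per_node):
    """Run the worker in-process for one GPU, or spawn one process per GPU.

    `launch` and `args.distributed` are hypothetical names for this sketch.
    """
    if ngpus_per_node <= 1:
        # Single GPU: mark the run as non-distributed and call the worker
        # directly, so nothing tries to open a TCP store.
        args.distributed = False
        main_worker(0, ngpus_per_node, args)
    else:
        # Imported lazily so the single-GPU path does not depend on it.
        import torch.multiprocessing as mp

        args.distributed = True
        mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))


def demo_worker(gpu, ngpus_per_node, args):
    """Stand-in for the real main_worker (hypothetical, for illustration)."""
    if args.distributed:
        # The real code would call torch.distributed.init_process_group here.
        raise RuntimeError("distributed setup not expected in this sketch")
    # Record which GPU index the worker was asked to use.
    args.used_gpu = gpu


if __name__ == "__main__":
    demo_args = Namespace()
    launch(demo_worker, demo_args, ngpus_per_node=1)
    print("ran on GPU index", demo_args.used_gpu)
```

In train.py itself, the equivalent edit would be to guard the init_process_group call in main_worker behind such a flag and to replace the mp.spawn call in main with a direct call to main_worker.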