ikostrikov / pytorch-a3c

PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".

When using no-shared = False, the process is blocked

keithyin opened this issue · comments

Hi, today I ran the code and found that when no-shared=False, the process blocks. Do you have any suggestions for fixing that?

THANKS!

Blocking doesn't happen to me. What configuration are you using?

Ubuntu16.04
pytorch 0.2
I just ran the downloaded source code without modifying anything, and it blocks. But if I use no-shared=True, the code runs.
It is weird.

Same here, using Ubuntu 16.04, PyTorch 0.2, and Python 3.5. Works fine on OSX, though.

Anyone found a solution?

Please report more information.

I tested it on Ubuntu 16.04 with PyTorch 0.2 and 0.3 and Python 3.6, and it works for me on both Ubuntu and OS X.

Ubuntu 16.04, PyTorch 0.2, Python 3.5.
When I exit with Ctrl-C, the traceback shows the main process stuck right before p.join().

```
^CTraceback (most recent call last):
  File "main.py", line 77, in <module>
    p.join()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 121, in join
    res = self._popen.wait(timeout)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 51, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 29, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 29, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
```
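For context, the parent process is blocked waiting on its workers. A minimal sketch of that join-loop pattern (plain multiprocessing stands in here for torch.multiprocessing, which exposes the same interface; the worker body is a placeholder, not the repo's actual train function):

```python
import multiprocessing as mp  # main.py uses torch.multiprocessing, same API


def train(rank):
    # placeholder for the real per-worker training loop
    return rank


if __name__ == '__main__':
    processes = []
    for rank in range(4):
        p = mp.Process(target=train, args=(rank,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()  # the traceback shows the parent stuck on this call
```

If a worker deadlocks (as happens here when the shared optimizer is enabled), join() never returns and Ctrl-C surfaces the KeyboardInterrupt inside os.waitpid, exactly as in the traceback above.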

It's the exact same problem as in:
pytorch/pytorch#2496
It's stuck on the ConvND call:
f = ConvNd(_pair(stride), _pair(padding), _pair(dilation), False, _pair(0), groups, torch.backends.cudnn.benchmark, torch.backends.cudnn.enabled) return f(input, weight, bias)

I got the same problem with PyTorch 0.3.
The code works for me on macOS but not on Ubuntu 16.04.

I found a fix!
Add

```python
mp.set_start_method("spawn")
```

and change `F.softmax(logit)` to `F.softmax(logit, dim=1)`.
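For reference, `dim=1` tells softmax to normalize across each row (the action dimension for a batch of logits) rather than relying on the old implicit-dimension behavior that newer PyTorch versions warn about. A pure-Python sketch of what `F.softmax(logit, dim=1)` computes on a 2-D input (the helper name is mine, for illustration only):

```python
import math


def softmax_dim1(rows):
    # Normalize each row (dim=1) so its entries sum to 1,
    # matching F.softmax(logit, dim=1) on a 2-D tensor.
    out = []
    for row in rows:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out
```

Each output row is a valid probability distribution over actions, which is what the actor head needs.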

@japan4415

Thanks for sharing your solution. According to this issue on pytorch, `mp.set_start_method("spawn")` should be added inside the `if __name__ == '__main__'` scope. After that, everything works fine.
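A minimal sketch of that placement (again using plain multiprocessing in place of torch.multiprocessing, which has the same interface; the worker is a placeholder):

```python
import multiprocessing as mp  # torch.multiprocessing wraps this same API


def train(rank):
    # placeholder for the per-process training loop
    print("worker", rank)


if __name__ == '__main__':
    # set_start_method must be called exactly once, inside the
    # main guard, and before any Process is created; "spawn"
    # starts fresh interpreters instead of forking, which avoids
    # the fork-related deadlock seen on Ubuntu.
    mp.set_start_method("spawn")
    p = mp.Process(target=train, args=(0,))
    p.start()
    p.join()
```

The guard matters because spawned children re-import the main module; without it, each child would try to spawn its own children.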