ikostrikov / pytorch-a3c

PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".

When using no-shared = False, the process is blocked

keithyin opened this issue · comments

Hi, today I ran the code and found that when no-shared=False, the process blocks. Do you have any suggestions for fixing that?

THANKS!

Blocking doesn't happen to me. What configuration are you using?

Ubuntu16.04
pytorch 0.2
I just ran the downloaded source code without modifying anything, and it blocks. But if I use no-shared=True, the code runs.
It is weird.

Same here, using Ubuntu 16.04, PyTorch 0.2, and Python 3.5. Works fine on OSX, though.

Anyone found a solution?

Please report more information.

I tested it on Ubuntu 16.04 with PyTorch 0.2 and 0.3 and Python 3.6, and it works for me on both Ubuntu and OS X.

Ubuntu 16.04, PyTorch 0.2, Python 3.5.
When I exit with Ctrl-C, the traceback shows the main process stuck right before p.join().

```
^CTraceback (most recent call last):
  File "main.py", line 77, in <module>
    p.join()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 121, in join
    res = self._popen.wait(timeout)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 51, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 29, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 29, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
```
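For context, the parent process is blocked waiting on its workers. A minimal sketch of that join-loop pattern (plain multiprocessing stands in here for torch.multiprocessing, which exposes the same interface; the worker body is a placeholder, not the repo's actual train function):

```python
import multiprocessing as mp  # main.py uses torch.multiprocessing, same API


def train(rank):
    # placeholder for the real per-worker training loop
    return rank


if __name__ == '__main__':
    processes = []
    for rank in range(4):
        p = mp.Process(target=train, args=(rank,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()  # the traceback shows the parent stuck on this call
```

If a worker deadlocks (as happens here when the shared optimizer is enabled), join() never returns and Ctrl-C surfaces the KeyboardInterrupt inside os.waitpid, exactly as in the traceback above.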

It's the exact same problem as in:
pytorch/pytorch#2496
It's stuck on the ConvND call:
f = ConvNd(_pair(stride), _pair(padding), _pair(dilation), False, _pair(0), groups, torch.backends.cudnn.benchmark, torch.backends.cudnn.enabled) return f(input, weight, bias)

I got the same problem with PyTorch 0.3.
The code works for me on macOS but not on Ubuntu 16.04.

I found a fix!
Add

```python
mp.set_start_method("spawn")
```

and change `F.softmax(logit)` to `F.softmax(logit, dim=1)`.
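For reference, `dim=1` tells softmax to normalize across each row (the action dimension for a batch of logits) rather than relying on the old implicit-dimension behavior that newer PyTorch versions warn about. A pure-Python sketch of what `F.softmax(logit, dim=1)` computes on a 2-D input (the helper name is mine, for illustration only):

```python
import math


def softmax_dim1(rows):
    # Normalize each row (dim=1) so its entries sum to 1,
    # matching F.softmax(logit, dim=1) on a 2-D tensor.
    out = []
    for row in rows:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out
```

Each output row is a valid probability distribution over actions, which is what the actor head needs.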

@japan4415

Thanks for sharing your solution. According to this issue on pytorch, `mp.set_start_method("spawn")` should be added inside the `if __name__ == '__main__'` scope. After that, everything works fine.
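A minimal sketch of that placement (again using plain multiprocessing in place of torch.multiprocessing, which has the same interface; the worker is a placeholder):

```python
import multiprocessing as mp  # torch.multiprocessing wraps this same API


def train(rank):
    # placeholder for the per-process training loop
    print("worker", rank)


if __name__ == '__main__':
    # set_start_method must be called exactly once, inside the
    # main guard, and before any Process is created; "spawn"
    # starts fresh interpreters instead of forking, which avoids
    # the fork-related deadlock seen on Ubuntu.
    mp.set_start_method("spawn")
    p = mp.Process(target=train, args=(0,))
    p.start()
    p.join()
```

The guard matters because spawned children re-import the main module; without it, each child would try to spawn its own children.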