pytorch / examples

A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.

Home Page: https://pytorch.org/examples

Running the integration example with the official whl fails with a multiprocessing exception; simple fix provided

ShiboXing opened this issue

Context

  • PyTorch version: 1.12.1 or lower
  • Operating System and version: Ubuntu 20.04.4 LTS

Your Environment

  • Installed using source? [yes/no]: no
  • Are you planning to deploy it using docker container? [yes/no]: yes
  • Is it a CPU or GPU environment?: GPU
  • Which example are you using: mnist_hogwild
  • Link to code or data to repro [if any]:

Expected Behavior

The mnist_hogwild example should finish without errors.

Current Behavior

The example fails at mp.set_start_method('spawn') with "RuntimeError: context has already been set".
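The same error message can be reproduced in isolation. The sketch below assumes the start method was already set earlier in the same process; whether that is what happens in this environment is not established here. It uses the standard-library multiprocessing module (which torch.multiprocessing wraps) so it runs without PyTorch, and the helper name second_call_error is hypothetical.

```python
import multiprocessing as mp


def second_call_error():
    """Return the error message raised by a repeated set_start_method call."""
    # Stand-in for an earlier call somewhere else in the process (assumption).
    mp.set_start_method("spawn", force=True)
    try:
        # A second call without force=True is rejected once a context exists.
        mp.set_start_method("spawn")
    except RuntimeError as exc:
        return str(exc)
    return None
```

Calling second_call_error() returns the same text that appears at the bottom of the traceback in the failure log.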

Possible Solution

Wrap the call in a try/except so that an already-set context is ignored:

    try:
        mp.set_start_method('spawn')
    except RuntimeError:
        pass

Tested and working.
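The guard proposed above can be written as a small helper. This is a minimal sketch, not the code in mnist_hogwild/main.py: the function name ensure_start_method is hypothetical, and it uses the standard-library multiprocessing module in place of torch.multiprocessing so it runs without PyTorch.

```python
import multiprocessing as mp


def ensure_start_method(method="spawn"):
    """Try to set the start method; keep the existing one if already set."""
    try:
        mp.set_start_method(method)
    except RuntimeError:
        # The context was fixed earlier in this process; leave it unchanged.
        pass
    # Report whichever start method is actually in effect.
    return mp.get_start_method()
```

Note that with this approach the earlier setting wins: if something else already selected a different start method, the helper silently keeps it instead of switching to the requested one.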

Steps to Reproduce

  1. pip install torch torchvision
  2. cd examples
  3. bash run_python_examples.sh mnist_hogwild
    ...

Failure Logs [if any]

...

Traceback (most recent call last):
  File "/home/ubuntu/Playground/examples/mnist_hogwild/main.py", line 75, in <module>
    mp.set_start_method('spawn')
  File "/opt/conda/lib/python3.9/multiprocessing/context.py", line 243, in set_start_method
    raise RuntimeError('context has already been set')
RuntimeError: context has already been set
mnist hogwild failed
Finished mnist_hogwild, status 0
Some examples failed:

mnist hogwild failed

Hi @ShiboXing, I'm not able to reproduce the issue on my side (torch==1.12.1 on Ubuntu 22.04 LTS), and CI is green as well. Here is a simpler alternative: mp.set_start_method('spawn', force=True). Please feel free to submit a PR for it.
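The force=True alternative can be sketched as follows. The helper name use_spawn is hypothetical, and the standard-library multiprocessing module stands in for torch.multiprocessing so the snippet runs without PyTorch.

```python
import multiprocessing as mp


def use_spawn():
    """Select the 'spawn' start method even if a context was set before."""
    # force=True replaces any previously chosen context instead of raising.
    mp.set_start_method("spawn", force=True)
    return mp.get_start_method()
```

Unlike a try/except guard, which keeps whatever start method was set first, force=True overrides the earlier choice, so repeated calls are safe and the process ends up using 'spawn'.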