pytorch / examples

A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.

Home Page: https://pytorch.org/examples

mnist_hogwild fails with torch-1.12.0

ShiboXing opened this issue

Context

  • PyTorch version: 1.12.0
  • Operating System and version: Ubuntu 20.04.4 LTS

Your Environment

  • Installed using source? [yes/no]: no
  • Are you planning to deploy it using docker container? [yes/no]: yes
  • Is it a CPU or GPU environment?: GPU
  • Which example are you using: mnist_hogwild
  • Link to code or data to repro [if any]:

Expected Behavior

mnist_hogwild should finish with no errors

Current Behavior

The mnist_hogwild example throws an exception at the following line of mnist_hogwild/main.py:

model.share_memory() # gradients are allocated lazily, so they are not shared here

Possible Solution

Steps to Reproduce

  1. cd examples
  2. bash run_python_examples.sh "install_deps,mnist_hogwild"
    ...

Failure Logs [if any]

Starting mnist_hogwild
Running example: mnist_hogwild
Traceback (most recent call last):
  File "/home/ubuntu/Playground/examples/mnist_hogwild/main.py", line 78, in <module>
    model.share_memory() # gradients are allocated lazily, so they are not shared here
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1912, in share_memory
    return self._apply(lambda t: t.share_memory_())
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 602, in _apply
    param_applied = fn(param)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1912, in <lambda>
    return self._apply(lambda t: t.share_memory_())
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/_tensor.py", line 515, in share_memory_
    self.storage().share_memory_()
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/storage.py", line 595, in share_memory_
    self._storage.share_memory_()
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/storage.py", line 194, in share_memory_
    self._share_fd_cpu_()
RuntimeError: _share_fd_: only available on CPU
mnist hogwild failed
Finished mnist_hogwild, status 0
Some examples failed:

mnist hogwild failed

For some reason, the storage object's is_cuda flag is not set to True, so share_memory_() takes the CPU-only path (_share_fd_cpu_) and raises the RuntimeError above. torch 1.12.1 no longer seems to have this issue.

@ShiboXing It's a known issue: pytorch/pytorch#80733. Since it is already fixed in 1.12.1 and the examples depend on the latest PyTorch version, there is no action item for this issue. I'm closing it.
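
For anyone pinned to a specific release, a small version guard can flag the affected build before the example is launched. The following is only a sketch: the helper names `parse_release` and `is_affected` are hypothetical, and it treats 1.12.0 as the sole affected release because that is the only failing version reported in this thread (other versions were not checked here).

```python
import re

# Assumption: 1.12.0 is the only release this thread reports as failing;
# 1.12.1 is reported as fixed. Other versions are not covered by the thread.
AFFECTED_RELEASE = (1, 12, 0)


def parse_release(version: str) -> tuple:
    """Return (major, minor, patch) from a string such as '1.12.0+cu113'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """True when the version's release triple equals the affected release.

    Any suffix (local build tag, pre-release marker) is ignored, so a
    pre-release of 1.12.0 is also flagged.
    """
    return parse_release(version) == AFFECTED_RELEASE
```

In practice the argument would be `torch.__version__`, for example `if is_affected(torch.__version__): print("upgrade to torch 1.12.1 or later")`.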