mnist_hogwild fails with torch-1.12.0
ShiboXing opened this issue
Context
- PyTorch version: 1.12.0
- Operating System and version: Ubuntu 20.04.4 LTS
Your Environment
- Installed using source? [yes/no]: no
- Are you planning to deploy it using docker container? [yes/no]: yes
- Is it a CPU or GPU environment?: GPU
- Which example are you using: mnist_hogwild
- Link to code or data to repro [if any]:
Expected Behavior
mnist_hogwild should finish with no errors
Current Behavior
The mnist_hogwild example raises a RuntimeError at line 78 of examples/mnist_hogwild/main.py (commit 200cc47), in the call to model.share_memory(). The full traceback is under Failure Logs.
Possible Solution
Steps to Reproduce
- cd examples
- bash run_python_examples.sh "install_deps,mnist_hogwild"
...
Failure Logs [if any]
Starting mnist_hogwild
Running example: mnist_hogwild
Traceback (most recent call last):
  File "/home/ubuntu/Playground/examples/mnist_hogwild/main.py", line 78, in <module>
    model.share_memory() # gradients are allocated lazily, so they are not shared here
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1912, in share_memory
    return self._apply(lambda t: t.share_memory_())
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 602, in _apply
    param_applied = fn(param)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1912, in <lambda>
    return self._apply(lambda t: t.share_memory_())
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/_tensor.py", line 515, in share_memory_
    self.storage().share_memory_()
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/storage.py", line 595, in share_memory_
    self._storage.share_memory_()
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/storage.py", line 194, in share_memory_
    self._share_fd_cpu_()
RuntimeError: _share_fd_: only available on CPU
mnist hogwild failed
Finished mnist_hogwild, status 0
Some examples failed:
mnist hogwild failed
For some reason, the storage object does not have its is_cuda flag set to True, so the CPU-only sharing path is taken. torch 1.12.1 no longer seems to have this issue.
@ShiboXing it's a known issue: pytorch/pytorch#80733. As it is already fixed in 1.12.1 and the examples depend on the latest PyTorch version, there is no action item for this issue. I'm closing it.
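For anyone who lands here with the same traceback, a quick way to tell whether the installed build is the one discussed in this thread is to compare its version string against 1.12.0. The snippet below is only a sketch: the helper names parse_release and is_affected are hypothetical (they are not part of torch or the examples repo), and it assumes, based solely on the comments above, that 1.12.0 is the affected release and 1.12.1 is fixed. Pre-release strings that start with 1.12.0 are also flagged, which may or may not be accurate.

```python
import re

# Assumption taken from this thread: only the 1.12.0 release is affected,
# and 1.12.1 already contains the fix.
AFFECTED_RELEASE = (1, 12, 0)


def parse_release(version: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '1.12.0+cu113'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """Return True if the version is the 1.12.0 release reported in this issue."""
    return parse_release(version) == AFFECTED_RELEASE


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed; nothing to check")
    else:
        if is_affected(torch.__version__):
            print("torch 1.12.0 detected; upgrading to 1.12.1 or later is suggested")
        else:
            print(f"torch {torch.__version__} is not the release reported here")
```

Local build suffixes such as +cu113 are ignored by the parser, so CPU and CUDA wheels of the same release are treated alike.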