torch.multiprocessing subprocess receives tensor with zeros rather than actual data
dfarhi opened this issue · comments
Context
torch.multiprocessing does not appear to send tensor data to spawned processes on my setup.
- Pytorch version:
torch==1.11.0+cu113
torchaudio==0.11.0+cu113
torchvision==0.12.0+cu113
- Operating System and version: Windows 10 version 21H1
- CUDA 11.7
Your Environment
- Installed using source? [yes/no]: no
- Are you planning to deploy it using docker container? [yes/no]: no
- Is it a CPU or GPU environment?: GPU
- Which example are you using: mnist_hogwild
- Link to code or data to repro [if any]:
Expected Behavior
Insert a print at the start of train.train to check that the parameters have been copied to the subprocess correctly:
print(f"Norm was: {model.fc1.weight.norm().item()}")
The above print should print some random number. When I run without cuda, it does so:
>python main.py
Norm was: 4.082266807556152
Norm was: 4.081115245819092
... [training begins]
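For context, the hogwild example hands the model to worker processes roughly like this (a simplified sketch, not the exact example code; the model class and the print check are stand-ins for the real ones):

```python
import torch.multiprocessing as mp
import torch.nn as nn

def train(model):
    # Each worker should see the parent's random initial weights, so the
    # norm printed here should be a nonzero random value.
    print(f"Norm was: {model.fc1.weight.norm().item()}")

if __name__ == "__main__":
    model = nn.Sequential()
    model.fc1 = nn.Linear(10, 10)
    model.share_memory()  # required for Hogwild-style shared updates

    processes = []
    for _ in range(2):
        p = mp.Process(target=train, args=(model,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```

Each spawned worker receives the shared parameters, which is why a zero norm in the child indicates the tensor data was lost in transit.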
Current Behavior
When I run with cuda the tensor is zero:
>python main.py --cuda
Norm was: 0.0
Norm was: 0.0
... [training begins]
Repro
I think this is not a problem with the example but a problem with the base torch.multiprocessing, or a problem with my installation. The issue seems to be that any tensors sent to a subprocess have their data replaced with zeros.
I've put above the steps to reproduce this issue in the mnist_hogwild example (the steps are just "run it on CUDA on my device").
As an even more minimal repro, this also fails for me:
import torch as th
import torch.multiprocessing as mp

if __name__ == "__main__":
    parameter = th.randn(1, device='cuda:0')
    print(parameter)  # here parameter is a one-element tensor holding a random value
    mp.set_start_method("spawn")
    p = mp.Process(target=print, args=(parameter,))  # in the subprocess, parameter prints as a one-element zero tensor
    p.start()
    p.join()
[Edited to simplify repro code]
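For what it's worth, the PyTorch multiprocessing notes state that sharing CUDA tensors across processes is not supported on Windows, which may explain the zeroed data here. A possible workaround is to pass a CPU copy and move it back to the GPU inside the child. A minimal sketch of that idea (it falls back to CPU when CUDA is unavailable; `report_norm` is a hypothetical helper):

```python
import torch as th
import torch.multiprocessing as mp

def report_norm(t):
    # The child receives a CPU tensor with intact data; move it back
    # to the GPU here if the child needs it on-device.
    print(f"Norm in child: {t.norm().item()}")

if __name__ == "__main__":
    device = "cuda:0" if th.cuda.is_available() else "cpu"
    parameter = th.randn(1, device=device)
    mp.set_start_method("spawn")
    # Workaround sketch: send a CPU copy instead of the CUDA tensor.
    p = mp.Process(target=report_norm, args=(parameter.cpu(),))
    p.start()
    p.join()
```

The `.cpu()` call makes a host-memory copy, which multiprocessing can pickle and send to the spawned process without relying on CUDA IPC.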
Hi @dfarhi, I'm not able to reproduce this with torch==1.12.1+cu102 on Ubuntu 22.04 LTS. Is it still reproducible on your side?