mhamilton723 / FeatUp

Official code for "FeatUp: A Model-Agnostic Framework for Features at Any Resolution", ICLR 2024

Error: `tensor does not have a device`

siddancha opened this issue

After the latest PR #51 that enables float16, I get the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[6], line 8
      6 # compute features
      7 with torch.inference_mode():
----> 8     hr_feats = upsampler(norm(input))  # (1, 512, H, W)
      9     lr_feats = upsampler.model(norm(input))  # (1, 512, H, W)
     11 assert hr_feats.shape[:2] == lr_feats.shape[:2]

File ~/repos/zero_shot_semantic_segmentation/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
   1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510 else:
-> 1511     return self._call_impl(*args, **kwargs)

File ~/repos/zero_shot_semantic_segmentation/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
   1515 # If we don't have any hooks, we want to skip the rest of the logic in
   1516 # this function, and just call forward.
   1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1518         or _global_backward_pre_hooks or _global_backward_hooks
   1519         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520     return forward_call(*args, **kwargs)
   1522 try:
   1523     result = None

File ~/.cache/torch/hub/mhamilton723_FeatUp_main/hubconf.py:23, in UpsampledBackbone.forward(self, image)
...
---> 20     result = cuda_impl.forward(input, filters)
     21 else:
     22     result = cpp_impl.forward(input, filters)

RuntimeError: tensor does not have a device

A few questions:

What GPU are you using? After pulling the change, did you re-install/rebuild the extension? Are you trying to use fp16?

A further question: if you run this simple script

import torch
from featup.adaptive_conv_cuda.adaptive_conv import AdaptiveConv

a = torch.randn((1, 3, 10, 10), dtype=torch.float16).cuda()   # input features: (B, C, H, W)
b = torch.randn((1, 8, 8, 3, 3), dtype=torch.float16).cuda()  # per-pixel kernels: (B, H_out, W_out, K, K)

result = AdaptiveConv.apply(a, b)
assert result.dtype == torch.float16

What do you get?

Also, to add the specific instructions that Axel is referencing:

cd featup 
git pull
pip uninstall featup
rm -r build
pip install -e .
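
After reinstalling, a quick smoke test of the extension may help (a sketch I'd suggest, assuming AdaptiveConv performs a per-pixel "valid" convolution, so a 10x10 input with 3x3 kernels yields an 8x8 output):

import torch
from featup.adaptive_conv_cuda.adaptive_conv import AdaptiveConv

# If the CUDA extension was rebuilt correctly, both dtypes should run
# without raising "tensor does not have a device".
for dtype in (torch.float32, torch.float16):
    a = torch.randn((1, 3, 10, 10), dtype=dtype).cuda()
    b = torch.randn((1, 8, 8, 3, 3), dtype=dtype).cuda()
    result = AdaptiveConv.apply(a, b)
    assert result.dtype == dtype
    assert result.shape == (1, 3, 8, 8)  # assumes per-pixel "valid" conv semantics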

There might be an issue in the way I'm running this; please ignore this message for now ...

What GPU are you using? After pulling the change, did you re-install/rebuild the extension? Are you trying to use fp16?

GPU: NVIDIA GeForce RTX 3070
Driver Version: 535.171.04
CUDA Version: 12.1
OS: Ubuntu 22.04

@axelfeldmann A further question: if you run this simple script, what do you get?

The script that you provided works just fine!


However, this test case fails:

import torch
input = torch.load('input.pth').cuda()  # size=(1, 3, 200, 200), dtype=float32
upsampler = torch.hub.load("mhamilton723/FeatUp", 'maskclip', use_norm=False).to('cuda')
output = upsampler(input)  # size=(1, 512, 192, 192), dtype=float32

where I've attached input.pth in this zip file.
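
If the zip is inconvenient, a random tensor of the same shape and dtype should exercise the same code path (a sketch; I'm assuming the device error doesn't depend on the tensor's actual values):

import torch

# Stand-in for the attached input.pth: same shape and dtype.
input = torch.randn((1, 3, 200, 200), dtype=torch.float32).cuda()
upsampler = torch.hub.load("mhamilton723/FeatUp", 'maskclip', use_norm=False).to('cuda')
output = upsampler(input)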

The error message I get is:

Traceback (most recent call last):
  File "/home/sancha/repo/repr/run.py", line 8, in <module>
    output = upsampler(input)
  File "/home/sancha/repo/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sancha/repo/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sancha/.cache/torch/hub/mhamilton723_FeatUp_main/hubconf.py", line 23, in forward
    return self.upsampler(self.model(image), image)
  File "/home/sancha/repo/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sancha/repo/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sancha/.cache/torch/hub/mhamilton723_FeatUp_main/featup/upsamplers.py", line 272, in forward
    source_2 = self.upsample(source, guidance, self.up1)
  File "/home/sancha/.cache/torch/hub/mhamilton723_FeatUp_main/featup/upsamplers.py", line 268, in upsample
    upsampled = up(source, small_guidance)
  File "/home/sancha/repo/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sancha/repo/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sancha/.cache/torch/hub/mhamilton723_FeatUp_main/featup/upsamplers.py", line 249, in forward
    result =  AdaptiveConv.apply(hr_source_padded, combined_kernel)
  File "/home/sancha/repo/.venv/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/sancha/.cache/torch/hub/mhamilton723_FeatUp_main/featup/adaptive_conv_cuda/adaptive_conv.py", line 20, in forward
    result = cuda_impl.forward(input, filters)
RuntimeError: tensor does not have a device

This works just fine on a5f9f56a, before Axel's PR. I've followed @mhamilton723's instructions to uninstall and reinstall FeatUp.

It seems to be working now that I am on PyTorch 2.3.0.

Previously, I was on PyTorch 2.2.2 with "xformers>=0.0.25.post1", and that seemed to have been causing the issue. Closing since it is resolved for me now.
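
For anyone hitting the same error: it can be worth checking that the torch used at runtime matches the torch the extension was built against, since version/ABI mismatches can surface as misleading CUDA errors like this one. A minimal environment check (a generic sketch, not from this thread):

import torch
import importlib.metadata as md

# Versions that matter for CUDA-extension compatibility.
print("torch:", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
try:
    print("xformers:", md.version("xformers"))
except md.PackageNotFoundError:
    print("xformers: not installed")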