yuval-alaluf / SAM

Official Implementation for "Only a Matter of Style: Age Transformation Using a Style-Based Regression Model" (SIGGRAPH 2021) https://arxiv.org/abs/2102.02754

Home Page: https://yuval-alaluf.github.io/SAM/

ImportError: No module named 'fused'

HasnainKhanNiazi opened this issue · comments

Hi, I am trying to set up this repo on my local machine but I am getting this error. I searched on the internet but couldn't find a solution to it. Any help will be appreciated. Thanks

ImportError: No module named 'fused'

Are you working on Linux? Have you tried running the code using the provided conda environment?

Yes, I am working on Linux and I am using the provided conda environment.
Here are my system specs:
GPU: Tesla T4
CUDA Version: 11.2
Ubuntu: 18.04

Weird. I have Ubuntu 18.04.5 and CUDA 11.1 so the environment seems good. Can you send over the command you tried running?

I am using the Jupyter notebook in the notebooks folder ("inference_playground") and I am getting that error on this import line:

from models.psp import pSp

I am not sure what was wrong, but I am no longer getting that error. Instead, I am now getting an error on the line below:

Code Line: os.path.join(module_path, 'fused_bias_act_kernel.cu')

Error: ninja: build stopped: subcommand failed.
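
For context, 'fused' is not a package you install; it is a CUDA extension that the StyleGAN2 code JIT-compiles with ninja the first time it is imported, which is why a failed build shows up as "No module named 'fused'". A rough sketch of that mechanism, based on the repo's models/stylegan2/op/fused_act.py as it also appears in the traceback further down:

```python
# Rough sketch of how the 'fused' op is built (based on the repo's
# models/stylegan2/op/fused_act.py). If ninja or nvcc fails here, the
# extension is never created, and importing it surfaces as
# "ImportError: No module named 'fused'".
import os
from torch.utils.cpp_extension import load

module_path = os.path.dirname(__file__)  # .../models/stylegan2/op
fused = load(
    'fused',
    sources=[
        os.path.join(module_path, 'fused_bias_act.cpp'),
        os.path.join(module_path, 'fused_bias_act_kernel.cu'),
    ],
)
```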

Hmmm. I just ran the notebook in Colab and it worked fine. Ninja can be a pain and there are no really good references on how to fix these issues.

Any chance you can send me the full stack trace? Maybe there is something that can help us there.

@yuval-alaluf here is the full stack trace.


CalledProcessError Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _build_extension_module(name, build_directory, verbose)
1029 cwd=build_directory,
-> 1030 check=True)
1031 else:

~/anaconda3/envs/newEnv/lib/python3.6/subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
417 raise CalledProcessError(retcode, process.args,
--> 418 output=stdout, stderr=stderr)
419 return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

RuntimeError Traceback (most recent call last)
in
13 from datasets.augmentations import AgeTransformer
14 from utils.common import tensor2im
---> 15 from models.psp import pSp

/SAM/notebooks/SAM/notebooks/SAM/models/psp.py in
10
11 from configs.paths_config import model_paths
---> 12 from models.encoders import psp_encoders
13 from models.stylegan2.model import Generator
14

/SAM/notebooks/SAM/notebooks/SAM/models/encoders/psp_encoders.py in
6
7 from models.encoders.helpers import get_blocks, bottleneck_IR, bottleneck_IR_SE
----> 8 from models.stylegan2.model import EqualLinear
9
10

/SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/model.py in
5 from torch.nn import functional as F
6
----> 7 from models.stylegan2.op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
8
9

/SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/__init__.py in
----> 1 from .fused_act import FusedLeakyReLU, fused_leaky_relu
2 from .upfirdn2d import upfirdn2d

/SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/fused_act.py in
11 sources=[
12 os.path.join(module_path, 'fused_bias_act.cpp'),
---> 13 os.path.join(module_path, 'fused_bias_act_kernel.cu'),
14 ],
15 )

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module)
659 verbose,
660 with_cuda,
--> 661 is_python_module)
662
663

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module)
828 build_directory=build_directory,
829 verbose=verbose,
--> 830 with_cuda=with_cuda)
831 finally:
832 baton.release()

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _write_ninja_file_and_build(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda)
881 if verbose:
882 print('Building extension module {}...'.format(name))
--> 883 _build_extension_module(name, build_directory, verbose)
884
885

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _build_extension_module(name, build_directory, verbose)
1041 if hasattr(error, 'output') and error.output:
1042 message += ": {}".format(error.output.decode())
-> 1043 raise RuntimeError(message)
1044
1045

RuntimeError: Error building extension 'fused': [1/3] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/TH -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/THC -isystem /root/anaconda3/envs/newEnv/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/usr/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/TH -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/THC -isystem /root/anaconda3/envs/newEnv/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_75'
[2/3] c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/TH -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/THC -isystem /root/anaconda3/envs/newEnv/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/fused_bias_act.cpp -o fused_bias_act.o
ninja: build stopped: subcommand failed.

Seems like we're getting somewhere. I noticed the following line:
nvcc fatal : Unsupported gpu architecture 'compute_75'
It seems like there is a mismatch between the GPU and the CUDA version on your system. Were you able to previously use the GPU with CUDA?
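
One quick way to see this kind of mismatch from Python (a general-purpose sketch, not code from this repo): check which CUDA toolkit torch was built against and what compute capability the GPU reports. A Tesla T4 is compute capability 7.5 (the `compute_75` in the error), and nvcc can only target sm_75 from CUDA 10 onwards, so an older toolkit on the machine cannot compile the extension.

```python
# Sketch: print the CUDA version torch was built with and the GPU's compute
# capability. compute_75 (Tesla T4) needs a CUDA toolkit >= 10 to compile.
import torch

print("torch:", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)
print("GPU:", torch.cuda.get_device_name(0))
print("compute capability:", torch.cuda.get_device_capability(0))  # (7, 5) for a T4
```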

This is a fresh system and this is the first GitHub repo I have run on this machine, so I can't say for sure.

I found some other issues that may be of help:
facebookresearch/detectron2#149 (comment)
torch/torch7#1190 (comment)

Thanks @yuval-alaluf , let me have a look at these links and I will update you.

Cool. In order to isolate the issues with ninja and your machine, I would try to make sure you're able to get torch to run with a GPU and then try running the code in this repo (since this repo requires ninja which can be tricky on its own).

I tested torch with CUDA and it is working fine.

Code: import torch
Code: torch.cuda.is_available()
Output: True
Code: torch.cuda.device(0)
Output: <torch.cuda.device object at 0x7fa552331588>
Code: torch.cuda.current_device()
Output: 0
Code: torch.cuda.device_count()
Output: 1
Code: torch.cuda.get_device_name(0)
Output: 'Tesla T4'
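
Note that torch.cuda working is not quite enough here: the JIT build of the 'fused' op is compiled by whatever CUDA toolkit torch.utils.cpp_extension finds on the system (CUDA_HOME or the nvcc on PATH), which can be a different, older toolkit than the one torch itself ships with. A small sketch to compare the two:

```python
# Sketch: the extension build uses the system toolkit (CUDA_HOME / nvcc on PATH),
# which may differ from the CUDA version bundled with torch.
import subprocess
import torch
from torch.utils.cpp_extension import CUDA_HOME

print("torch built with CUDA:", torch.version.cuda)
print("CUDA_HOME used for extension builds:", CUDA_HOME)
print(subprocess.check_output(["nvcc", "--version"]).decode())
```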

Can you please check what version of nvcc you have? You can do this by running nvcc --version.

Here is the output I get from running nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

Yea. I see your problem. It appears that you have multiple CUDA versions installed. If you notice, the result of running nvcc --version indicates that you are using CUDA 9.1, and CUDA 9.1 is not compatible with your T4 GPU (which requires CUDA >= 10.1). You need to switch your CUDA to version 11.2, which you mentioned above.

facebookresearch/detectron2#149 (comment)
torch/torch7#1190 (comment)

Take a look at the first link here, which will take you through the steps you need to correctly set your environment to use CUDA 11.2. Just note that the example there uses 10.1, so make sure to make the necessary adjustments for your machine.
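
If it helps, the toolkit can also be selected per notebook session by exporting the usual variables before torch (and the repo's models) are imported. This is only a sketch, assuming CUDA 11.2 is installed under /usr/local/cuda-11.2 (adjust the path to your install); the more permanent fix is the ~/.bashrc approach described in the linked issue.

```python
# Hypothetical per-session workaround (a sketch, assuming CUDA 11.2 lives at
# /usr/local/cuda-11.2): point the extension build at the right toolkit.
# These must be set *before* torch.utils.cpp_extension is first imported,
# because it resolves CUDA_HOME at import time.
import os

cuda_home = '/usr/local/cuda-11.2'  # assumption: adjust to your install path
os.environ['CUDA_HOME'] = cuda_home
os.environ['PATH'] = cuda_home + '/bin:' + os.environ.get('PATH', '')
os.environ['LD_LIBRARY_PATH'] = cuda_home + '/lib64:' + os.environ.get('LD_LIBRARY_PATH', '')
```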

Thanks @yuval-alaluf , I have tried these steps to set CUDA 11.2 in the source file, but after setting it up it still isn't working and gives me the same error.

@yuval-alaluf I have changed CUDA to 11.2 and luckily I am no longer getting that error, but now I am getting an error on this line:

Code: ckpt = torch.load(model_path, map_location='cpu')

Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in nti(s)
188 s = nts(s, "ascii", "strict")
--> 189 n = int(s.strip() or "0", 8)
190 except ValueError:

ValueError: invalid literal for int() with base 8: 'ightq\x04ct'

During handling of the above exception, another exception occurred:

InvalidHeaderError Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in next(self)
2296 try:
-> 2297 tarinfo = self.tarinfo.fromtarfile(self)
2298 except EOFHeaderError as e:

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in fromtarfile(cls, tarfile)
1092 buf = tarfile.fileobj.read(BLOCKSIZE)
-> 1093 obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
1094 obj.offset = tarfile.fileobj.tell() - BLOCKSIZE

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in frombuf(cls, buf, encoding, errors)
1034
-> 1035 chksum = nti(buf[148:156])
1036 if chksum not in calc_chksums(buf):

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in nti(s)
190 except ValueError:
--> 191 raise InvalidHeaderError("invalid header")
192 return n

InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

ReadError Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
594 try:
--> 595 return legacy_load(f)
596 except tarfile.TarError:

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/serialization.py in legacy_load(f)
505
--> 506 with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar,
507 mkdtemp() as tmpdir:

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in open(cls, name, mode, fileobj, bufsize, **kwargs)
1588 raise CompressionError("unknown compression type %r" % comptype)
-> 1589 return func(name, filemode, fileobj, **kwargs)
1590

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in taropen(cls, name, mode, fileobj, **kwargs)
1618 raise ValueError("mode must be 'r', 'a', 'w' or 'x'")
-> 1619 return cls(name, mode, fileobj, **kwargs)
1620

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in __init__(self, name, mode, fileobj, format, tarinfo, dereference, ignore_zeros, encoding, errors, pax_headers, debug, errorlevel, copybufsize)
1481 self.firstmember = None
-> 1482 self.firstmember = self.next()
1483

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in next(self)
2308 elif self.offset == 0:
-> 2309 raise ReadError(str(e))
2310 except EmptyHeaderError:

ReadError: invalid header

During handling of the above exception, another exception occurred:

RuntimeError Traceback (most recent call last)
in
1 model_path = EXPERIMENT_ARGS['model_path']
----> 2 ckpt = torch.load(model_path, map_location='cpu')

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
424 if sys.version_info >= (3, 0) and 'encoding' not in pickle_load_args.keys():
425 pickle_load_args['encoding'] = 'utf-8'
--> 426 return _load(f, map_location, pickle_module, **pickle_load_args)
427 finally:
428 if new_fd:

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
597 if _is_zipfile(f):
598 # .zip is used for torch.jit.save and will throw an un-pickling error here
--> 599 raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
600 # if not a tarfile, reset file offset and proceed
601 f.seek(0)

RuntimeError: ../pretrained_models/sam_ffhq_aging.pt is a zip archive (did you mean to use torch.jit.load()?)

I think this is because of the PyTorch version.

What torch version are you using?

I am using torch version 1.3.1+cu100.

Ah. You need to update your torch version to at least 1.6.0.
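
For reference, the "zip archive" message is how older torch versions fail on checkpoints saved with torch >= 1.6, which switched to a zip-based serialization format. Once torch is updated, the notebook's plain torch.load call should work; a minimal sketch:

```python
# Sketch: with torch >= 1.6, the zip-format checkpoint loads as the notebook expects.
import torch

print(torch.__version__)  # should be 1.6.0 or newer for zip-format checkpoints
ckpt = torch.load('../pretrained_models/sam_ffhq_aging.pt', map_location='cpu')
print(type(ckpt))
```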

Yes, I am doing that and I will update you as soon as it's done. Thanks for your time, much appreciated.

@yuval-alaluf Thanks for your time. First the problem was related to CUDA, and then the PyTorch version also played a part in the errors. Now, after setting CUDA to 11.3 and PyTorch to 1.9, it is working fine.

Cheers