yuval-alaluf / SAM

Official Implementation for "Only a Matter of Style: Age Transformation Using a Style-Based Regression Model" (SIGGRAPH 2021) https://arxiv.org/abs/2102.02754

Home Page: https://yuval-alaluf.github.io/SAM/

ImportError: No module named 'fused'

HasnainKhanNiazi opened this issue

Hi, I am trying to set up this repo on my local machine, but I am getting the error below. I searched the internet but couldn't find a solution. Any help will be appreciated. Thanks

ImportError: No module named 'fused'

Are you working on Linux? Have you tried running the code using the provided conda environment?

Yes, I am working in Linux and I am using the provided conda environment.
Here are system specs:
GPU: Tesla T4
CUDA Version: 11.2
Ubuntu: 18.04

Weird. I have Ubuntu 18.04.5 and CUDA 11.1 so the environment seems good. Can you send over the command you tried running?

I am using the Jupyter notebook in the notebooks folder ("inference_playground"), and I am getting that error on this import line:

from models.psp import pSp

I am not sure what was wrong, but I no longer get that error. Instead, I now get the error below on this line:

Code Line: os.path.join(module_path, 'fused_bias_act_kernel.cu')

Error: ninja: build stopped: subcommand failed.

Hmmm. I just ran the notebook in Colab and it worked fine. Ninja errors can be a pain, and there are no really good references on how to fix them.

Any chance you can send me the full stack trace? Maybe there is something that can help us there.

@yuval-alaluf here is the full stack trace.

CalledProcessError Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _build_extension_module(name, build_directory, verbose)
1029 cwd=build_directory,
-> 1030 check=True)
1031 else:

~/anaconda3/envs/newEnv/lib/python3.6/subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
417 raise CalledProcessError(retcode, process.args,
--> 418 output=stdout, stderr=stderr)
419 return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

RuntimeError Traceback (most recent call last)
13 from datasets.augmentations import AgeTransformer
14 from utils.common import tensor2im
---> 15 from models.psp import pSp

/SAM/notebooks/SAM/notebooks/SAM/models/psp.py in
11 from configs.paths_config import model_paths
---> 12 from models.encoders import psp_encoders
13 from models.stylegan2.model import Generator

/SAM/notebooks/SAM/notebooks/SAM/models/encoders/psp_encoders.py in
7 from models.encoders.helpers import get_blocks, bottleneck_IR, bottleneck_IR_SE
----> 8 from models.stylegan2.model import EqualLinear

/SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/model.py in
5 from torch.nn import functional as F
----> 7 from models.stylegan2.op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d

/SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/__init__.py in
----> 1 from .fused_act import FusedLeakyReLU, fused_leaky_relu
2 from .upfirdn2d import upfirdn2d

/SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/fused_act.py in
11 sources=[
12 os.path.join(module_path, 'fused_bias_act.cpp'),
---> 13 os.path.join(module_path, 'fused_bias_act_kernel.cu'),
14 ],
15 )

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module)
659 verbose,
660 with_cuda,
--> 661 is_python_module)

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module)
828 build_directory=build_directory,
829 verbose=verbose,
--> 830 with_cuda=with_cuda)
831 finally:
832 baton.release()

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _write_ninja_file_and_build(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda)
881 if verbose:
882 print('Building extension module {}...'.format(name))
--> 883 _build_extension_module(name, build_directory, verbose)

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _build_extension_module(name, build_directory, verbose)
1041 if hasattr(error, 'output') and error.output:
1042 message += ": {}".format(error.output.decode())
-> 1043 raise RuntimeError(message)

RuntimeError: Error building extension 'fused': [1/3] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/TH -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/THC -isystem /root/anaconda3/envs/newEnv/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/usr/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/TH -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/THC -isystem /root/anaconda3/envs/newEnv/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_75'
[2/3] c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/TH -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/THC -isystem /root/anaconda3/envs/newEnv/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/fused_bias_act.cpp -o fused_bias_act.o
ninja: build stopped: subcommand failed.

Seems like we're getting somewhere. I noticed the following line:
nvcc fatal : Unsupported gpu architecture 'compute_75'
It seems like there is a mismatch between the GPU and the CUDA version on your system. Were you able to previously use the GPU with CUDA?
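With ninja failures like this one, the useful line is usually buried inside the long compiler invocation. As a minimal sketch (the helper name `find_nvcc_errors` is hypothetical and not part of this repo), the fatal nvcc messages can be pulled out of a captured build log like this:

```python
import re


def find_nvcc_errors(build_log):
    """Return the lines of a build log that nvcc reported as fatal."""
    return [
        line.strip()
        for line in build_log.splitlines()
        if re.match(r"\s*nvcc fatal\s*:", line)
    ]


# Shortened excerpt of the log posted above.
log = """FAILED: fused_bias_act_kernel.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_75'
ninja: build stopped: subcommand failed."""

errors = find_nvcc_errors(log)
print(errors)
```

Here the only fatal line is the unsupported `compute_75` architecture, which points at the nvcc binary rather than at ninja itself.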

This is a fresh system and this is the first GitHub repo I have run on this machine, so I can't say for sure.

I found some other issues that may be of help:
facebookresearch/detectron2#149 (comment)
torch/torch7#1190 (comment)

Thanks @yuval-alaluf , let me have a look at these links and I will update you.

Cool. In order to isolate the issues with ninja and your machine, I would try to make sure you're able to get torch to run with a GPU and then try running the code in this repo (since this repo requires ninja which can be tricky on its own).

I tested torch with cuda and it is working fine.

Code: import torch
Code: torch.cuda.is_available()
Output: True
Code: torch.cuda.device(0)
Output: <torch.cuda.device object at 0x7fa552331588>
Code: torch.cuda.current_device()
Output: 0
Code: torch.cuda.device_count()
Output: 1
Code: torch.cuda.get_device_name(0)
Output: 'Tesla T4'
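These checks confirm that the driver and the torch runtime see the GPU, but they do not exercise nvcc, which is what the extension build uses. A further check, sketched here with a hypothetical helper `arch_name`, is to ask torch for the device's compute capability and compare it with the `compute_75` architecture named in the build log. `torch.cuda.get_device_capability` returns a `(major, minor)` tuple; for a Tesla T4 that is `(7, 5)`.

```python
def arch_name(major, minor):
    """Format a compute capability the way nvcc names it, e.g. compute_75."""
    return "compute_{}{}".format(major, minor)


# The torch part only runs when torch and a CUDA device are present.
try:
    import torch

    if torch.cuda.is_available():
        print(arch_name(*torch.cuda.get_device_capability(0)))
except ImportError:
    pass

# A Tesla T4 has compute capability 7.5.
t4_arch = arch_name(7, 5)
print(t4_arch)
```

If the architecture printed for the device is one that the installed nvcc does not know about, the build fails regardless of whether `torch.cuda.is_available()` returns True.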

Can you please check what version of nvcc you have? You can do this by running nvcc --version.

Here is output that I get by running nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

Yeah, I see your problem. It appears that you have multiple CUDA versions installed. The output of nvcc --version shows that you are using CUDA 9.1, and CUDA 9.1 is not compatible with your T4 GPU (which requires CUDA >= 10.1). That is why nvcc rejects compute_75. You need to switch your CUDA to version 11.2, which you mentioned above.
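The version comparison described here can be sketched in a few lines. This is only an illustration: `parse_nvcc_release` is a hypothetical helper, the sample text is the tail of the `nvcc --version` output posted above, and the `(10, 1)` threshold is the one stated in this thread.

```python
import re


def parse_nvcc_release(version_output):
    """Extract (major, minor) from the 'release X.Y' part of nvcc --version."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


sample = """nvcc: NVIDIA (R) Cuda compiler driver
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85"""

release = parse_nvcc_release(sample)
new_enough = release >= (10, 1)  # threshold taken from the comment above
print(release, new_enough)
```

On a real machine the input text could come from `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout` (Python 3.7+), assuming nvcc is on the PATH.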

facebookresearch/detectron2#149 (comment)
torch/torch7#1190 (comment)

Take a look at the first link here, which walks through the steps for correctly setting up your environment to use CUDA 11.2. Just note that the example there uses 10.1, so make the necessary adjustments for your machine.
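One common way to point a build at a specific CUDA toolkit is through the `CUDA_HOME`, `PATH`, and `LD_LIBRARY_PATH` environment variables. The sketch below builds such an environment. The install location `/usr/local/cuda-11.2` is an assumption and may differ on your machine, and `cuda_env` is a hypothetical helper, not something provided by this repo or by torch.

```python
import os


def cuda_env(cuda_home, base_env=None):
    """Return a copy of the environment that prefers the given CUDA toolkit."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_HOME"] = cuda_home
    # Put the toolkit's bin and lib64 directories ahead of any existing entries.
    for var, subdir in (("PATH", "bin"), ("LD_LIBRARY_PATH", "lib64")):
        parts = [os.path.join(cuda_home, subdir), env.get(var, "")]
        env[var] = os.pathsep.join(p for p in parts if p)
    return env


# Assumed install location; adjust to wherever CUDA 11.2 lives on your system.
env = cuda_env("/usr/local/cuda-11.2", {"PATH": "/usr/bin"})
print(env["CUDA_HOME"])
print(env["PATH"])
```

In a notebook, `os.environ.update(cuda_env(...))` would have to run before `torch.utils.cpp_extension` is first imported, since, as far as I know, that module resolves the CUDA location at import time. Restarting the kernel after changing these variables is the safer route.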

Thanks @yuval-alaluf, I followed these steps to set CUDA 11.2 in the source file, but it still isn't working and gives me the same error.

@yuval-alaluf I have changed CUDA to 11.2 and I am no longer getting that error, but now I am getting an error on this line:

Code: ckpt = torch.load(model_path, map_location='cpu')

ValueError Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in nti(s)
188 s = nts(s, "ascii", "strict")
--> 189 n = int(s.strip() or "0", 8)
190 except ValueError:

ValueError: invalid literal for int() with base 8: 'ightq\x04ct'

During handling of the above exception, another exception occurred:

InvalidHeaderError Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in next(self)
2296 try:
-> 2297 tarinfo = self.tarinfo.fromtarfile(self)
2298 except EOFHeaderError as e:

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in fromtarfile(cls, tarfile)
1092 buf = tarfile.fileobj.read(BLOCKSIZE)
-> 1093 obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
1094 obj.offset = tarfile.fileobj.tell() - BLOCKSIZE

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in frombuf(cls, buf, encoding, errors)
-> 1035 chksum = nti(buf[148:156])
1036 if chksum not in calc_chksums(buf):

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in nti(s)
190 except ValueError:
--> 191 raise InvalidHeaderError("invalid header")
192 return n

InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

ReadError Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
594 try:
--> 595 return legacy_load(f)
596 except tarfile.TarError:

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/serialization.py in legacy_load(f)
--> 506 with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar,
507 mkdtemp() as tmpdir:

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in open(cls, name, mode, fileobj, bufsize, **kwargs)
1588 raise CompressionError("unknown compression type %r" % comptype)
-> 1589 return func(name, filemode, fileobj, **kwargs)

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in taropen(cls, name, mode, fileobj, **kwargs)
1618 raise ValueError("mode must be 'r', 'a', 'w' or 'x'")
-> 1619 return cls(name, mode, fileobj, **kwargs)

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in __init__(self, name, mode, fileobj, format, tarinfo, dereference, ignore_zeros, encoding, errors, pax_headers, debug, errorlevel, copybufsize)
1481 self.firstmember = None
-> 1482 self.firstmember = self.next()

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in next(self)
2308 elif self.offset == 0:
-> 2309 raise ReadError(str(e))
2310 except EmptyHeaderError:

ReadError: invalid header

During handling of the above exception, another exception occurred:

RuntimeError Traceback (most recent call last)
1 model_path = EXPERIMENT_ARGS['model_path']
----> 2 ckpt = torch.load(model_path, map_location='cpu')

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
424 if sys.version_info >= (3, 0) and 'encoding' not in pickle_load_args.keys():
425 pickle_load_args['encoding'] = 'utf-8'
--> 426 return _load(f, map_location, pickle_module, **pickle_load_args)
427 finally:
428 if new_fd:

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
597 if _is_zipfile(f):
598 # .zip is used for torch.jit.save and will throw an un-pickling error here
--> 599 raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
600 # if not a tarfile, reset file offset and proceed
601 f.seek(0)

RuntimeError: ../pretrained_models/sam_ffhq_aging.pt is a zip archive (did you mean to use torch.jit.load()?)

I think this is because of the PyTorch version.

What torch version are you using?

I am using torch version 1.3.1+cu100.

Ah. You need to update your torch version to at least 1.6.0.
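The "is a zip archive" message fits this diagnosis: the checkpoint was saved in the zip-based format, which the installed torch 1.3.1 does not load with `torch.load`. A rough way to check both sides without loading the model is sketched below. The helper names are hypothetical, the demo files are placeholders rather than real checkpoints, and the `(1, 6)` threshold is the one given in the comment above.

```python
import os
import tempfile
import zipfile


def parse_torch_version(version):
    """Turn a version string such as '1.3.1+cu100' into (major, minor)."""
    core = version.split("+")[0]
    return tuple(int(part) for part in core.split(".")[:2])


def checkpoint_format(path):
    """Report whether a checkpoint file is a zip archive or something older."""
    return "zip" if zipfile.is_zipfile(path) else "legacy"


# Demo with placeholder files standing in for real checkpoints.
with tempfile.TemporaryDirectory() as tmp:
    zip_ckpt = os.path.join(tmp, "new_style.pt")
    with zipfile.ZipFile(zip_ckpt, "w") as zf:
        zf.writestr("archive/data.pkl", b"placeholder")
    old_ckpt = os.path.join(tmp, "old_style.pt")
    with open(old_ckpt, "wb") as fh:
        fh.write(b"not a zip archive")
    formats = (checkpoint_format(zip_ckpt), checkpoint_format(old_ckpt))

installed = parse_torch_version("1.3.1+cu100")
print(formats, installed, installed >= (1, 6))
```

In practice `torch.__version__` would be passed to `parse_torch_version`, and the path to `sam_ffhq_aging.pt` to `checkpoint_format`.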

Yes, I am doing that now. I will update you as soon as it is done. Thanks for your time, much appreciated.

@yuval-alaluf Thanks for your time. The first problem was related to CUDA, and then the PyTorch version caused the second error. After setting CUDA to 11.3 and PyTorch to 1.9, it is working fine.
