skhu101 / SHERF

Code for our ICCV'2023 paper "SHERF: Generalizable Human NeRF from a Single Image"

Error when running eval_THUman

TonNew5418 opened this issue

I used the following commands to set up the environment:
conda create --name sherf python=3.8
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu113_pyt1110/download.html
pip install -r requirements.txt
conda activate sherf
nvcc -V reports 11.3, torch.cuda.is_available() returns True, and from pytorch3d import _C succeeds.
But I got the following error:
Traceback (most recent call last):
File "train.py", line 446, in <module>
main() # pylint: disable=no-value-for-parameter
File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "train.py", line 441, in main
launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
File "train.py", line 101, in launch_training
subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
File "train.py", line 52, in subprocess_fn
training_loop.training_loop(rank=rank, **c)
File "/home/jianl0b/SHERF-main/sherf/training/training_loop.py", line 323, in training_loop
test(G, savedir=testsavedir, neural_rendering_resolution=loss_kwargs['neural_rendering_resolution_initial'], rank=0, use_sr_module=use_sr_module, white_back=False, sample_obs_view=training_set_kwargs.sample_obs_view, fix_obs_view=training_set_kwargs.fix_obs_view, dataset_name=cfg, data_root=training_set_kwargs.data_root, obs_view_lst=[4, 12, 20], nv_pose_start=0, np_pose_start=0, pose_interval=2, pose_num=5)
File "/home/jianl0b/SHERF-main/sherf/training/test_loop.py", line 189, in test
gen_img = model(test_data, torch.randn(1, 512).to(device), torch.zeros((1, 25)).to(device),
File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/jianl0b/SHERF-main/sherf/training/triplane.py", line 235, in forward
ws = self.mapping(z, c, input_img=input_img, truncation_psi=truncation_psi, truncation_cutoff=truncation_cutoff, update_emas=update_emas)
File "/home/jianl0b/SHERF-main/sherf/training/triplane.py", line 79, in mapping
return self.backbone.mapping(z, c * self.rendering_kwargs.get('c_scale', 0), truncation_psi=truncation_psi, truncation_cutoff=truncation_cutoff, update_emas=update_emas)
File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/jianl0b/SHERF-main/sherf/training/networks_stylegan2.py", line 248, in forward
x = layer(x)
File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/jianl0b/SHERF-main/sherf/training/networks_stylegan2.py", line 126, in forward
x = bias_act.bias_act(x, b, act=self.activation)
File "/home/jianl0b/SHERF-main/sherf/torch_utils/ops/bias_act.py", line 86, in bias_act
if impl == 'cuda' and x.device.type == 'cuda' and init():
File "/home/jianl0b/SHERF-main/sherf/torch_utils/ops/bias_act.py", line 43, in init
plugin = custom_ops.get_plugin(
File "/home/jianl0b/SHERF-main/sherf/torch_utils/custom_ops.py", line 138, in get_plugin
torch.utils.cpp_extension.load(name=module_name, build_directory=cached_build_dir,
File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1144, in load
return jit_compile(
File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1357, in jit_compile
write_ninja_file_and_build_library(
File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1469, in write_ninja_file_and_build_library
run_ninja_build(
File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1756, in run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'bias_act_plugin': [1/3] :/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/TH -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/THC -isystem :/usr/local/cuda/include -isystem /home/jianl0b/anaconda3/envs/sherf/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS
-D__CUDA_NO_BFLOAT16_CONVERSIONS
-D__CUDA_NO_HALF2_OPERATORS
--expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' --use_fast_math -std=c++14 -c /home/jianl0b/.cache/torch_extensions/py38_cu113/bias_act_plugin/b46266ff65f9fa53c32108953a1c6f16-nvidia-rtx-a4500/bias_act.cu -o bias_act.cuda.o
FAILED: bias_act.cuda.o
:/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/TH -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/THC -isystem :/usr/local/cuda/include -isystem /home/jianl0b/anaconda3/envs/sherf/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS
-D__CUDA_NO_BFLOAT16_CONVERSIONS
-D__CUDA_NO_HALF2_OPERATORS
--expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' --use_fast_math -std=c++14 -c /home/jianl0b/.cache/torch_extensions/py38_cu113/bias_act_plugin/b46266ff65f9fa53c32108953a1c6f16-nvidia-rtx-a4500/bias_act.cu -o bias_act.cuda.o
/bin/sh: :/usr/local/cuda/bin/nvcc: No such file or directory
[2/3] c++ -MMD -MF bias_act.o.d -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/TH -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/THC -isystem :/usr/local/cuda/include -isystem /home/jianl0b/anaconda3/envs/sherf/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/jianl0b/.cache/torch_extensions/py38_cu113/bias_act_plugin/b46266ff65f9fa53c32108953a1c6f16-nvidia-rtx-a4500/bias_act.cpp -o bias_act.o
FAILED: bias_act.o
c++ -MMD -MF bias_act.o.d -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/TH -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/THC -isystem :/usr/local/cuda/include -isystem /home/jianl0b/anaconda3/envs/sherf/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/jianl0b/.cache/torch_extensions/py38_cu113/bias_act_plugin/b46266ff65f9fa53c32108953a1c6f16-nvidia-rtx-a4500/bias_act.cpp -o bias_act.o
In file included from /home/jianl0b/.cache/torch_extensions/py38_cu113/bias_act_plugin/b46266ff65f9fa53c32108953a1c6f16-nvidia-rtx-a4500/bias_act.cpp:14:
/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/ATen/cuda/CUDAContext.h:5:10: fatal error: cuda_runtime_api.h: No such file or directory
5 | #include <cuda_runtime_api.h>
| ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.

Solved. For me, this error came from three separate problems:

  1. The nvcc path problem described in NVlabs/stylegan3#165. In the log above, the nvcc path starts with a stray colon (:/usr/local/cuda/bin/nvcc), so the shell cannot find the compiler.
    I changed the line config.append(f"nvcc = {nvcc}") to config.append(f"nvcc = {nvcc[1:]}")
    and the line command = ['ninja', '-v'] to command = ['ninja', '--verbose'].
  2. A lock on the cached build files. I deleted the folder /home/<user_name>/.cache/torch_extensions/py38_cu113/bias_act_plugin/
    My reference: https://blog.csdn.net/qq_38677322/article/details/109696077
  3. /usr/bin/ld: cannot find -lcudart collect2: error: ld returned 1 exit status
    This is caused by a missing libcudart.so. I solved it by adding a symbolic link.
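
For anyone hitting the same three problems, here is a minimal Python sketch of checks that may help before patching installed files. It rests on assumptions the thread does not confirm: that the leading ":" in the nvcc path comes from a CUDA_HOME (or CUDA_PATH) environment variable written like a path list, and that the build cache lives under ~/.cache/torch_extensions as in the log. The helper names clean_cuda_home, clear_extension_cache and find_libcudart are made up for this example and are not part of PyTorch or SHERF.

```python
import os
import shutil
from pathlib import Path
from typing import List, Optional


def clean_cuda_home(raw: str) -> str:
    """Return the first non-empty entry of a ':'-separated value (POSIX style).

    ':/usr/local/cuda' (leading colon, as in the log) becomes '/usr/local/cuda'.
    """
    for entry in raw.split(":"):
        if entry.strip():
            return entry.strip()
    return ""


def clear_extension_cache(name: str, root: Optional[Path] = None) -> List[Path]:
    """Delete cached build directories for one extension and return them.

    Assumes the layout seen in the log: <root>/<py_cuda_tag>/<extension_name>/.
    """
    root = root or Path.home() / ".cache" / "torch_extensions"
    removed: List[Path] = []
    if not root.is_dir():
        return removed
    # Materialize the matches first so deletion does not disturb iteration.
    for build_dir in sorted(root.glob(f"*/{name}")):
        if build_dir.is_dir():
            shutil.rmtree(build_dir)
            removed.append(build_dir)
    return removed


def find_libcudart(cuda_home: str) -> List[Path]:
    """List libcudart shared objects under <cuda_home>/lib64 and <cuda_home>/lib."""
    hits: List[Path] = []
    for sub in ("lib64", "lib"):
        lib_dir = Path(cuda_home) / sub
        if lib_dir.is_dir():
            hits.extend(sorted(lib_dir.glob("libcudart.so*")))
    return hits


if __name__ == "__main__":
    # Read-only report; nothing is deleted unless clear_extension_cache is called.
    raw = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH") or ""
    home = clean_cuda_home(raw)
    print(f"raw value:     {raw!r}")
    print(f"cleaned value: {home!r}")
    if home:
        print(f"nvcc present:  {(Path(home) / 'bin' / 'nvcc').is_file()}")
        print(f"libcudart:     {[str(p) for p in find_libcudart(home)]}")
```

If the cleaned value differs from the raw one, exporting the cleaned path as CUDA_HOME before launching may make the nvcc[1:] edit unnecessary. Calling clear_extension_cache("bias_act_plugin") mirrors step 2, and an empty result from find_libcudart points at the missing library from step 3.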