Unable to install the inplace_abn library due to CUDA error
sainatarajan opened this issue
Hi! Thanks for this repo. I am unable to install the inplace_abn library that is required to train your models. I have tried many ways to install and debug it, but it still fails with a CUDA error: `error: command '/usr/bin/nvcc' failed with exit status 1`. Is it possible to train the model without this library by making a few changes to the code?
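To make the question concrete, the kind of change I have in mind is swapping the fused layer for a plain one, along these lines (a rough sketch on my end; `ABNFallback` is a name I made up, and I'm assuming the models use InPlaceABNSync as a fused BatchNorm + activation):

```python
import torch.nn as nn

class ABNFallback(nn.Sequential):
    """Hypothetical stand-in for InPlaceABNSync: a plain BatchNorm2d followed
    by the activation, giving up the memory savings of the in-place op."""
    def __init__(self, num_features, activation="leaky_relu", slope=0.01):
        layers = [nn.BatchNorm2d(num_features)]
        if activation == "leaky_relu":
            layers.append(nn.LeakyReLU(negative_slope=slope, inplace=True))
        super().__init__(*layers)
```

The model's BatchNorm alias would then point at this class instead of the inplace_abn import; it would not be synchronized across GPUs, of course.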
Which branch are you using, master or pytorch-1.1? And what are your PyTorch and CUDA versions?
Thanks for the reply. I downloaded the pytorch-1.1 branch, and the versions on my local system are:
PyTorch 1.2.0
torchvision 0.4.0
cuDNN 7.6.0
CUDA Toolkit 10.0.130
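For reference, I read these from inside the conda environment with a quick check (nothing repo-specific):

```python
import torch

print(torch.__version__)               # expect 1.2.0
print(torch.version.cuda)              # expect 10.0
print(torch.backends.cudnn.version())  # expect 7600 for cuDNN 7.6.0
```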
The versions are ok for compiling.
Can you post the detailed error message?
Sure, I will recompile now and post it here.
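For completeness, the build command is just the plain pip install, with verbose output so the whole trace is captured (the `-v` flag is standard pip, nothing repo-specific):

```
pip install inplace-abn -v
```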
I installed inplace_abn using pip and got the following error. The complete stack trace is extremely long, so here are the last 55 lines; my GCC version is 7.4.0.
```
/usr/include/c++/6/tuple: In instantiation of ‘static constexpr bool std::_TC<<anonymous>, _Elements>::_NonNestedTuple() [with _SrcTuple = const std::tuple<at::Tensor&, at::Tensor&, at::Tensor&>&; bool <anonymous> = true; _Elements = {at::Tensor&, at::Tensor&, at::Tensor&}]’:
/usr/include/c++/6/tuple:662:419: required by substitution of ‘template<class ... _UElements, class _Dummy, typename std::enable_if<((std::_TC<(1ul == sizeof... (_UElements)), at::Tensor&, at::Tensor&, at::Tensor&>::_ConstructibleTuple<_UElements ...>() && std::_TC<(1ul == sizeof... (_UElements)), at::Tensor&, at::Tensor&, at::Tensor&>::_ImplicitlyConvertibleTuple<_UElements ...>()) && std::_TC<(std::is_same<_Dummy, void>::value && (1ul == 1)), at::Tensor&, at::Tensor&, at::Tensor&>::_NonNestedTuple<const tuple<_Elements ...>&>()), bool>::type <anonymous> > constexpr std::tuple< <template-parameter-1-1> >::tuple(const std::tuple<_Args1 ...>&) [with _UElements = {at::Tensor&, at::Tensor&, at::Tensor&}; _Dummy = void; typename std::enable_if<((std::_TC<(1ul == sizeof... (_UElements)), at::Tensor&, at::Tensor&, at::Tensor&>::_ConstructibleTuple<_UElements ...>() && std::_TC<(1ul == sizeof... (_UElements)), at::Tensor&, at::Tensor&, at::Tensor&>::_ImplicitlyConvertibleTuple<_UElements ...>()) && std::_TC<(std::is_same<_Dummy, void>::value && (1ul == 1)), at::Tensor&, at::Tensor&, at::Tensor&>::_NonNestedTuple<const tuple<_Elements ...>&>()), bool>::type <anonymous> = <missing>]’
/home/sainatarajan/anaconda3/envs/tf_gpu/lib/python3.7/site-packages/torch/include/ATen/Functions.h:4128:229: required from here
/usr/include/c++/6/tuple:495:244: error: wrong number of template arguments (4, should be 2)
return __and_<__not_<is_same<tuple<_Elements...>,
^
/usr/include/c++/6/type_traits:1558:8: note: provided for ‘template<class _From, class _To> struct std::is_convertible’
struct is_convertible
^~~~~~~~~~~~~~
/usr/include/c++/6/tuple:502:1: error: body of constexpr function ‘static constexpr bool std::_TC<<anonymous>, _Elements>::_NonNestedTuple() [with _SrcTuple = const std::tuple<at::Tensor&, at::Tensor&, at::Tensor&>&; bool <anonymous> = true; _Elements = {at::Tensor&, at::Tensor&, at::Tensor&}]’ not a return-statement
}
^
/usr/include/c++/6/tuple: In instantiation of ‘static constexpr bool std::_TC<<anonymous>, _Elements>::_NonNestedTuple() [with _SrcTuple = std::tuple<at::Tensor&, at::Tensor&, at::Tensor&>&&; bool <anonymous> = true; _Elements = {at::Tensor&, at::Tensor&, at::Tensor&}]’:
/usr/include/c++/6/tuple:686:422: required by substitution of ‘template<class ... _UElements, class _Dummy, typename std::enable_if<((std::_TC<(1ul == sizeof... (_UElements)), at::Tensor&, at::Tensor&, at::Tensor&>::_MoveConstructibleTuple<_UElements ...>() && std::_TC<(1ul == sizeof... (_UElements)), at::Tensor&, at::Tensor&, at::Tensor&>::_ImplicitlyMoveConvertibleTuple<_UElements ...>()) && std::_TC<(std::is_same<_Dummy, void>::value && (1ul == 1)), at::Tensor&, at::Tensor&, at::Tensor&>::_NonNestedTuple<tuple<_Elements ...>&&>()), bool>::type <anonymous> > constexpr std::tuple< <template-parameter-1-1> >::tuple(std::tuple<_Args1 ...>&&) [with _UElements = {at::Tensor&, at::Tensor&, at::Tensor&}; _Dummy = void; typename std::enable_if<((std::_TC<(1ul == sizeof... (_UElements)), at::Tensor&, at::Tensor&, at::Tensor&>::_MoveConstructibleTuple<_UElements ...>() && std::_TC<(1ul == sizeof... (_UElements)), at::Tensor&, at::Tensor&, at::Tensor&>::_ImplicitlyMoveConvertibleTuple<_UElements ...>()) && std::_TC<(std::is_same<_Dummy, void>::value && (1ul == 1)), at::Tensor&, at::Tensor&, at::Tensor&>::_NonNestedTuple<tuple<_Elements ...>&&>()), bool>::type <anonymous> = <missing>]’
/home/sainatarajan/anaconda3/envs/tf_gpu/lib/python3.7/site-packages/torch/include/ATen/Functions.h:4128:229: required from here
/usr/include/c++/6/tuple:495:244: error: wrong number of template arguments (4, should be 2)
return __and_<__not_<is_same<tuple<_Elements...>,
^
/usr/include/c++/6/type_traits:1558:8: note: provided for ‘template<class _From, class _To> struct std::is_convertible’
struct is_convertible
^~~~~~~~~~~~~~
/usr/include/c++/6/tuple:502:1: error: body of constexpr function ‘static constexpr bool std::_TC<<anonymous>, _Elements>::_NonNestedTuple() [with _SrcTuple = std::tuple<at::Tensor&, at::Tensor&, at::Tensor&>&&; bool <anonymous> = true; _Elements = {at::Tensor&, at::Tensor&, at::Tensor&}]’ not a return-statement
}
^
error: command '/usr/bin/nvcc' failed with exit status 1
----------------------------------------
ERROR: Failed building wheel for inplace-abn
Running setup.py clean for inplace-abn
Failed to build inplace-abn
```
I'm not sure how to solve this issue, but the same error, caused by a newer GCC version, is reported in open-mmlab/mmdetection#422. Hope it helps.
Thanks. I will try downgrading my GCC to 7.3 and installing it again. Downgrading to 5.x or 6.x isn't an option for me, since I am not sure what effect it would have on other installed libraries and dependencies.
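Concretely, what I might try first is selecting a compiler for just this build rather than changing the system default, something like the following (a sketch assuming Debian/Ubuntu versioned packages; `gcc-7` is a placeholder for whichever version works, and I'm not certain the nvcc side of the build also honours these variables):

```
sudo apt-get install gcc-7 g++-7
# distutils/setuptools read CC/CXX for the host C++ compilation,
# so only this build sees the alternate compiler:
CC=gcc-7 CXX=g++-7 pip install inplace-abn
```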
@speedinghzl I just saw the requirements and found that it needs 4×12 GB GPUs. I thought about trying CCNet, but it was the same case there too =). I currently have only one 12 GB RTX 2080 Ti, so I don't think it is possible for me to run this model. Thanks for your help; I will close this issue now.
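For anyone else who lands here with a single card: the only partial workaround I can think of is trading compute for memory with gradient accumulation, sketched below with placeholder model and data (not tested against this repo, and note it does not reproduce large-batch BatchNorm statistics, which is part of why the sync-BN library exists):

```python
import torch
import torch.nn as nn

# Placeholder model, loss, optimizer, and data; in practice these come
# from the repo's training script.
model = nn.Conv2d(3, 1, 1).cuda()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = [(torch.randn(2, 3, 8, 8), torch.randn(2, 1, 8, 8)) for _ in range(8)]

accum_steps = 4  # effective batch = accum_steps * per-step batch size
optimizer.zero_grad()
for i, (images, labels) in enumerate(loader):
    loss = criterion(model(images.cuda()), labels.cuda())
    (loss / accum_steps).backward()  # scale so accumulated grads average
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```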