Unable to install the inplace_abn library due to CUDA error
sainatarajan opened this issue
Hi! Thanks for this repo. I am unable to install the inplace_abn library that is required to train your models. I have tried many ways to install and debug it, but it still fails with a CUDA error: `error: command '/usr/bin/nvcc' failed with exit status 1`. Is it possible to train the model without this library by making a few changes to the code?
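To make the question concrete, the kind of change I have in mind is swapping the fused layer for a plain one, along these lines (a rough sketch on my end; `ABNFallback` is a name I made up, and I'm assuming the models use InPlaceABNSync as a fused BatchNorm + activation):

```python
import torch.nn as nn

class ABNFallback(nn.Sequential):
    """Hypothetical stand-in for InPlaceABNSync: a plain BatchNorm2d followed
    by the activation, giving up the memory savings of the in-place op."""
    def __init__(self, num_features, activation="leaky_relu", slope=0.01):
        layers = [nn.BatchNorm2d(num_features)]
        if activation == "leaky_relu":
            layers.append(nn.LeakyReLU(negative_slope=slope, inplace=True))
        super().__init__(*layers)
```

The model's BatchNorm alias would then point at this class instead of the inplace_abn import; it would not be synchronized across GPUs, of course.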
Which branch are you using, master or pytorch-1.1? And what are your PyTorch and CUDA versions?
Thanks for the reply. I downloaded the pytorch-1.1 branch, and the versions on my local system are:
PyTorch 1.2.0
torchvision 0.4.0
cuDNN 7.6.0
CUDA Toolkit 10.0.130
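For reference, I read these from inside the conda environment with a quick check (nothing repo-specific):

```python
import torch

print(torch.__version__)               # expect 1.2.0
print(torch.version.cuda)              # expect 10.0
print(torch.backends.cudnn.version())  # expect 7600 for cuDNN 7.6.0
```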
The versions are ok for compiling.
Can you post the detailed error message?
Sure, I will recompile now and post it here.
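For completeness, the build command is just the plain pip install, with verbose output so the whole trace is captured (the `-v` flag is standard pip, nothing repo-specific):

```
pip install inplace-abn -v
```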
I installed inplace_abn using pip and got the following error. The complete stack trace is extremely long, so here are the last 55 lines; my GCC version is 7.4.0.
```
/usr/include/c++/6/tuple: In instantiation of ‘static constexpr bool std::_TC<<anonymous>, _Elements>::_NonNestedTuple() [with _SrcTuple = const std::tuple<at::Tensor&, at::Tensor&, at::Tensor&>&; bool <anonymous> = true; _Elements = {at::Tensor&, at::Tensor&, at::Tensor&}]’:
/usr/include/c++/6/tuple:662:419: required by substitution of ‘template<class ... _UElements, class _Dummy, typename std::enable_if<((std::_TC<(1ul == sizeof... (_UElements)), at::Tensor&, at::Tensor&, at::Tensor&>::_ConstructibleTuple<_UElements ...>() && std::_TC<(1ul == sizeof... (_UElements)), at::Tensor&, at::Tensor&, at::Tensor&>::_ImplicitlyConvertibleTuple<_UElements ...>()) && std::_TC<(std::is_same<_Dummy, void>::value && (1ul == 1)), at::Tensor&, at::Tensor&, at::Tensor&>::_NonNestedTuple<const tuple<_Elements ...>&>()), bool>::type <anonymous> > constexpr std::tuple< <template-parameter-1-1> >::tuple(const std::tuple<_Args1 ...>&) [with _UElements = {at::Tensor&, at::Tensor&, at::Tensor&}; _Dummy = void; typename std::enable_if<((std::_TC<(1ul == sizeof... (_UElements)), at::Tensor&, at::Tensor&, at::Tensor&>::_ConstructibleTuple<_UElements ...>() && std::_TC<(1ul == sizeof... (_UElements)), at::Tensor&, at::Tensor&, at::Tensor&>::_ImplicitlyConvertibleTuple<_UElements ...>()) && std::_TC<(std::is_same<_Dummy, void>::value && (1ul == 1)), at::Tensor&, at::Tensor&, at::Tensor&>::_NonNestedTuple<const tuple<_Elements ...>&>()), bool>::type <anonymous> = <missing>]’
/home/sainatarajan/anaconda3/envs/tf_gpu/lib/python3.7/site-packages/torch/include/ATen/Functions.h:4128:229: required from here
/usr/include/c++/6/tuple:495:244: error: wrong number of template arguments (4, should be 2)
return __and_<__not_<is_same<tuple<_Elements...>,
^
/usr/include/c++/6/type_traits:1558:8: note: provided for ‘template<class _From, class _To> struct std::is_convertible’
struct is_convertible
^~~~~~~~~~~~~~
/usr/include/c++/6/tuple:502:1: error: body of constexpr function ‘static constexpr bool std::_TC<<anonymous>, _Elements>::_NonNestedTuple() [with _SrcTuple = const std::tuple<at::Tensor&, at::Tensor&, at::Tensor&>&; bool <anonymous> = true; _Elements = {at::Tensor&, at::Tensor&, at::Tensor&}]’ not a return-statement
}
^
/usr/include/c++/6/tuple: In instantiation of ‘static constexpr bool std::_TC<<anonymous>, _Elements>::_NonNestedTuple() [with _SrcTuple = std::tuple<at::Tensor&, at::Tensor&, at::Tensor&>&&; bool <anonymous> = true; _Elements = {at::Tensor&, at::Tensor&, at::Tensor&}]’:
/usr/include/c++/6/tuple:686:422: required by substitution of ‘template<class ... _UElements, class _Dummy, typename std::enable_if<((std::_TC<(1ul == sizeof... (_UElements)), at::Tensor&, at::Tensor&, at::Tensor&>::_MoveConstructibleTuple<_UElements ...>() && std::_TC<(1ul == sizeof... (_UElements)), at::Tensor&, at::Tensor&, at::Tensor&>::_ImplicitlyMoveConvertibleTuple<_UElements ...>()) && std::_TC<(std::is_same<_Dummy, void>::value && (1ul == 1)), at::Tensor&, at::Tensor&, at::Tensor&>::_NonNestedTuple<tuple<_Elements ...>&&>()), bool>::type <anonymous> > constexpr std::tuple< <template-parameter-1-1> >::tuple(std::tuple<_Args1 ...>&&) [with _UElements = {at::Tensor&, at::Tensor&, at::Tensor&}; _Dummy = void; typename std::enable_if<((std::_TC<(1ul == sizeof... (_UElements)), at::Tensor&, at::Tensor&, at::Tensor&>::_MoveConstructibleTuple<_UElements ...>() && std::_TC<(1ul == sizeof... (_UElements)), at::Tensor&, at::Tensor&, at::Tensor&>::_ImplicitlyMoveConvertibleTuple<_UElements ...>()) && std::_TC<(std::is_same<_Dummy, void>::value && (1ul == 1)), at::Tensor&, at::Tensor&, at::Tensor&>::_NonNestedTuple<tuple<_Elements ...>&&>()), bool>::type <anonymous> = <missing>]’
/home/sainatarajan/anaconda3/envs/tf_gpu/lib/python3.7/site-packages/torch/include/ATen/Functions.h:4128:229: required from here
/usr/include/c++/6/tuple:495:244: error: wrong number of template arguments (4, should be 2)
return __and_<__not_<is_same<tuple<_Elements...>,
^
/usr/include/c++/6/type_traits:1558:8: note: provided for ‘template<class _From, class _To> struct std::is_convertible’
struct is_convertible
^~~~~~~~~~~~~~
/usr/include/c++/6/tuple:502:1: error: body of constexpr function ‘static constexpr bool std::_TC<<anonymous>, _Elements>::_NonNestedTuple() [with _SrcTuple = std::tuple<at::Tensor&, at::Tensor&, at::Tensor&>&&; bool <anonymous> = true; _Elements = {at::Tensor&, at::Tensor&, at::Tensor&}]’ not a return-statement
}
^
error: command '/usr/bin/nvcc' failed with exit status 1
----------------------------------------
ERROR: Failed building wheel for inplace-abn
Running setup.py clean for inplace-abn
Failed to build inplace-abn
```
I'm not sure how to solve this issue, but the same error, caused by a newer GCC version, is reported in open-mmlab/mmdetection#422. Hope it helps.
Thanks. I will try downgrading my GCC to 7.3 and installing it again. Downgrading to 5.x or 6.x isn't an option for me, since I am not sure what effect it would have on other installed libraries and dependencies.
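Concretely, what I might try first is selecting a compiler for just this build rather than changing the system default, something like the following (a sketch assuming Debian/Ubuntu versioned packages; `gcc-7` is a placeholder for whichever version works, and I'm not certain the nvcc side of the build also honours these variables):

```
sudo apt-get install gcc-7 g++-7
# distutils/setuptools read CC/CXX for the host C++ compilation,
# so only this build sees the alternate compiler:
CC=gcc-7 CXX=g++-7 pip install inplace-abn
```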
@speedinghzl I just saw the requirements and found that it needs 4×12 GB GPUs. I thought about trying CCNet, but it was the same case there too =). I currently have only one 12 GB RTX 2080 Ti, so I don't think it is possible for me to run this model. Thanks for your help; I will close this issue now.
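For anyone else who lands here with a single card: the only partial workaround I can think of is trading compute for memory with gradient accumulation, sketched below with placeholder model and data (not tested against this repo, and note it does not reproduce large-batch BatchNorm statistics, which is part of why the sync-BN library exists):

```python
import torch
import torch.nn as nn

# Placeholder model, loss, optimizer, and data; in practice these come
# from the repo's training script.
model = nn.Conv2d(3, 1, 1).cuda()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = [(torch.randn(2, 3, 8, 8), torch.randn(2, 1, 8, 8)) for _ in range(8)]

accum_steps = 4  # effective batch = accum_steps * per-step batch size
optimizer.zero_grad()
for i, (images, labels) in enumerate(loader):
    loss = criterion(model(images.cuda()), labels.cuda())
    (loss / accum_steps).backward()  # scale so accumulated grads average
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```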