Build failure with TensorFlow Addons 0.20
npanpaliya opened this issue
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux x86_64
- TensorFlow version and how it was installed (source or binary): 2.12 via conda package of TF (Built using https://github.com/open-ce/tensorflow-feedstock)
- TensorFlow-Addons version and how it was installed (source or binary): 0.20 (Built from source)
- Python version: Python 3.10
- Is GPU used? (yes/no): yes
Describe the bug
While building TF Addons 0.20 with TF 2.12, CUDA 11.8, and cuDNN 8.8.1, I see the following build failure:
In file included from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/system/cuda/config.h:33,
from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/execution_policy.h:35,
from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/iterator/detail/device_system_tag.h:23,
from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/iterator/detail/iterator_facade_category.h:22,
from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/iterator/iterator_facade.h:37,
from external/cub_archive/cub/device/../iterator/arg_index_input_iterator.cuh:48,
from external/cub_archive/cub/device/device_reduce.cuh:41,
from tensorflow_addons/custom_ops/layers/cc/kernels/correlation_cost_op_gpu.cu.cc:20:
/usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/cub/util_namespace.cuh:46:2: error: #error CUB requires a definition of CUB_NS_QUALIFIER when CUB_NS_PREFIX/POSTFIX are defined.
46 | #error CUB requires a definition of CUB_NS_QUALIFIER when CUB_NS_PREFIX/POSTFIX are defined.
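The #error text states its own condition: the CUB headers shipped with the toolkit (here CUDA 11.8) reject a build in which CUB_NS_PREFIX or CUB_NS_POSTFIX is defined but CUB_NS_QUALIFIER is not. The include chain above shows the bundled external/cub_archive headers being pulled in before the toolkit's cub/util_namespace.cuh, so one plausible reading is that two CUB copies meet in the same translation unit. The Python sketch below only models that preprocessor logic. It assumes, without verifying it here, that the older bundled CUB defines CUB_NS_PREFIX and CUB_NS_POSTFIX as empty defaults; both function names are hypothetical.

```python
def cub_guard_fires(defined_macros: set[str]) -> bool:
    """Model of the guard described by the error message: it fires when
    CUB_NS_PREFIX or CUB_NS_POSTFIX is defined but CUB_NS_QUALIFIER is not."""
    prefix_or_postfix = bool({"CUB_NS_PREFIX", "CUB_NS_POSTFIX"} & defined_macros)
    return prefix_or_postfix and "CUB_NS_QUALIFIER" not in defined_macros


def include_old_cub_util_namespace(defined_macros: set[str]) -> set[str]:
    """Hypothetical model of the bundled (older) CUB header. Assumption: it
    defines CUB_NS_PREFIX and CUB_NS_POSTFIX as empty macros when unset."""
    return defined_macros | {"CUB_NS_PREFIX", "CUB_NS_POSTFIX"}


# Toolkit CUB on its own: nothing predefined, so the guard passes.
print(cub_guard_fires(set()))  # False

# Bundled CUB included first (as in the include chain), then the toolkit header.
print(cub_guard_fires(include_old_cub_util_namespace(set())))  # True
```

Under this model, either keeping a single CUB copy or defining CUB_NS_QUALIFIER alongside the prefix/postfix macros makes the guard pass; which of the two is appropriate for TF Addons is not settled by this thread.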
My .bazelrc looks like this:
build --action_env TF_HEADER_DIR="/opt/conda/envs/testaddons/lib/python3.10/site-packages/tensorflow/include"
build --action_env TF_SHARED_LIBRARY_DIR="/opt/conda/envs/testaddons/lib/python3.10/site-packages/tensorflow"
build --action_env TF_SHARED_LIBRARY_NAME="libtensorflow_framework.so.2"
build --action_env TF_CXX11_ABI_FLAG="1"
build --action_env TF_CPLUSPLUS_VER="c++17"
build --spawn_strategy=standalone
build --strategy=Genrule=standalone
build --experimental_repo_remote_exec
build -c opt
build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1"
build --copt=-mavx
build --cxxopt=-std=c++17
build --host_cxxopt=-std=c++17
build --action_env TF_NEED_CUDA="1"
build --action_env CUDA_TOOLKIT_PATH="/usr/local/cuda,/opt/conda/envs/testaddons,/usr/include"
build --action_env CUDNN_INSTALL_PATH="/opt/conda/envs/testaddons"
build --action_env TF_CUDA_VERSION="11"
build --action_env TF_CUDNN_VERSION="8.8"
test --config=cuda
build --config=cuda
build:cuda --define=using_cuda=true --define=using_cuda_nvcc=true
build:cuda --crosstool_top=@ubuntu20.04-gcc9_manylinux2014-cuda11.8-cudnn8.6-tensorrt8.4_config_cuda//crosstool:toolchain
build --action_env PYTHON_BIN_PATH="/opt/conda/envs/testaddons/bin/python"
build --action_env PYTHON_LIB_PATH="/opt/conda/envs/testaddons/lib/python3.10/site-packages"
build --python_path="/opt/conda/envs/testaddons/bin/python"
build --action_env GCC_HOST_COMPILER_PATH="/opt/conda/envs/testaddons/bin/x86_64-conda-linux-gnu-cc"
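When cross-checking the CUDA-related entries above (for example TF_CUDA_VERSION, TF_CUDNN_VERSION, and CUDA_TOOLKIT_PATH) against what is actually installed, it can help to pull the action_env values out programmatically. The sketch below is a hypothetical helper, not part of the TF Addons configure script. It only understands the `build --action_env KEY="VALUE"` form used in this file and ignores every other line, including config-qualified ones such as `build:cuda`.

```python
import re
import shlex

# Matches lines of the form: build --action_env KEY="VALUE"
_ACTION_ENV = re.compile(r"^build\s+--action_env\s+(\w+)=(.*)$")


def parse_action_env(bazelrc_text: str) -> dict[str, str]:
    """Collect `build --action_env KEY=VALUE` pairs from .bazelrc text.

    A later line for the same key replaces an earlier one (assumed to
    match Bazel's last-one-wins behavior for repeated flags)."""
    env: dict[str, str] = {}
    for line in bazelrc_text.splitlines():
        match = _ACTION_ENV.match(line.strip())
        if match:
            key, raw_value = match.groups()
            # shlex removes the surrounding double quotes like a shell would.
            env[key] = shlex.split(raw_value)[0] if raw_value else ""
    return env


sample = """
build --action_env TF_CUDA_VERSION="11"
build --action_env TF_CUDNN_VERSION="8.8"
build --copt=-mavx
"""
env = parse_action_env(sample)
print(env["TF_CUDA_VERSION"], env["TF_CUDNN_VERSION"])  # 11 8.8
```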
Code to reproduce the issue
Build command:
bazel build -s --enable_runfiles build_pip_pkg
Please help me resolve this build error.
@seanpmorgan, could you please provide some pointers?
Does anyone have any pointers on how to fix this issue?
It seems similar to dmlc/xgboost#7378, which was fixed by dmlc/xgboost#7379.
I'm running into the same issue when building TF Addons 0.19 with CUDA 11.8. What config should be used in this case?
In my case, removing cub from WORKSPACE (similar to #2821) works. @seanpmorgan, may I know the reason for removing cub in that PR?
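Removing cub from WORKSPACE would leave only the toolkit's copy of CUB on the include path, which fits the reading that the failure comes from mixing two CUB versions. To see which CUB copies are visible, a small script can read cub/version.cuh from each candidate include directory. This is a sketch under assumptions: that cub/version.cuh defines CUB_VERSION encoded as major * 100000 + minor * 100 + subminor, that an older bundled checkout may ship no version.cuh at all (reported as None), and that you pass the directories yourself. The function names and the script name are hypothetical.

```python
from __future__ import annotations

import re
import sys
from pathlib import Path

_CUB_VERSION = re.compile(r"^\s*#\s*define\s+CUB_VERSION\s+(\d+)", re.MULTILINE)


def read_cub_version(version_header: Path) -> tuple[int, int, int] | None:
    """Return (major, minor, subminor) from cub/version.cuh, or None."""
    if not version_header.is_file():
        return None  # e.g. an older CUB checkout without version.cuh
    match = _CUB_VERSION.search(version_header.read_text())
    if match is None:
        return None
    v = int(match.group(1))
    # Assumed encoding: major * 100000 + minor * 100 + subminor.
    return v // 100000, v // 100 % 1000, v % 100


def report(include_dirs: list[Path]) -> dict[str, tuple[int, int, int] | None]:
    """Map each include dir that contains a cub/ directory to its CUB version."""
    return {
        str(d): read_cub_version(d / "cub" / "version.cuh")
        for d in include_dirs
        if (d / "cub").is_dir()
    }


if __name__ == "__main__":
    for directory, version in report([Path(p) for p in sys.argv[1:]]).items():
        print(directory, version)
```

For example, `python cub_versions.py /usr/local/cuda-11.8/targets/x86_64-linux/include "$(bazel info output_base)/external/cub_archive"` would list both copies; two different entries (or one None) would be consistent with the mixed-CUB explanation, though not proof of it.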
I have this issue in another project. I tried CUDA 10.1 and 12.3 and got the same issue, but there is no error with CUDA 11.4.
Same issue with CUDA 10.8.