tensorflow / addons

Useful extra functionality for TensorFlow 2.x maintained by SIG-addons

Build failure with TensorFlow Addons 0.20

npanpaliya opened this issue

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux x86_64
  • TensorFlow version and how it was installed (source or binary): 2.12, installed from a conda package of TensorFlow (built using https://github.com/open-ce/tensorflow-feedstock)
  • TensorFlow-Addons version and how it was installed (source or binary): 0.20 (built from source)
  • Python version: Python 3.10
  • Is GPU used? (yes/no): yes

Describe the bug
While building TensorFlow Addons 0.20 with TensorFlow 2.12, CUDA 11.8, and cuDNN 8.8.1, I see the following build failure:

In file included from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/system/cuda/config.h:33,
                 from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/execution_policy.h:35,
                 from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/iterator/detail/device_system_tag.h:23,
                 from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/iterator/detail/iterator_facade_category.h:22,
                 from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/iterator/iterator_facade.h:37,
                 from external/cub_archive/cub/device/../iterator/arg_index_input_iterator.cuh:48,
                 from external/cub_archive/cub/device/device_reduce.cuh:41,
                 from tensorflow_addons/custom_ops/layers/cc/kernels/correlation_cost_op_gpu.cu.cc:20:
/usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/cub/util_namespace.cuh:46:2: error: #error CUB requires a definition of CUB_NS_QUALIFIER when CUB_NS_PREFIX/POSTFIX are defined.
   46 | #error CUB requires a definition of CUB_NS_QUALIFIER when CUB_NS_PREFIX/POSTFIX are defined.
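The include chain in this log reads CUB headers from two places: arg_index_input_iterator.cuh and device_reduce.cuh come from external/cub_archive/ (a Bazel external repository), while util_namespace.cuh, where the #error fires, comes from the CUDA 11.8 toolkit. That mix is a plausible cause, though the log alone does not prove it. The short Python sketch below groups the CUB header paths in such a log by origin. cub_sources is a hypothetical helper, and it assumes that any path containing /cub/ outside external/cub_archive/ belongs to the toolkit.

```python
import re
from collections import defaultdict

# Captures the header path from compiler lines such as
#   "In file included from /path/header.h:33,"
#   "                 from external/cub_archive/cub/device/device_reduce.cuh:41,"
#   "/usr/local/cuda-11.8/.../cub/util_namespace.cuh:46:2: error: ..."
_PATH_RE = re.compile(r"^\s*(?:In file included from|from)?\s*(\S+?):\d+")


def cub_sources(log: str) -> dict[str, list[str]]:
    """Group CUB header paths in a compiler log by where they come from.

    Assumption (heuristic): paths under 'external/cub_archive/' belong to the
    Bazel repository of that name; any other path containing '/cub/' is
    treated as the copy shipped with the CUDA toolkit.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for line in log.splitlines():
        match = _PATH_RE.match(line)
        if not match:
            continue  # e.g. the "   46 | #error ..." source-excerpt line
        path = match.group(1)
        if path.startswith("external/cub_archive/"):
            groups["bazel cub_archive"].append(path)
        elif "/cub/" in path:
            groups["CUDA toolkit"].append(path)
    return dict(groups)
```

If both groups come back non-empty, headers from two different CUB copies are reaching the same translation unit.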

My .bazelrc looks like this:

build --action_env TF_HEADER_DIR="/opt/conda/envs/testaddons/lib/python3.10/site-packages/tensorflow/include"
build --action_env TF_SHARED_LIBRARY_DIR="/opt/conda/envs/testaddons/lib/python3.10/site-packages/tensorflow"
build --action_env TF_SHARED_LIBRARY_NAME="libtensorflow_framework.so.2"
build --action_env TF_CXX11_ABI_FLAG="1"
build --action_env TF_CPLUSPLUS_VER="c++17"
build --spawn_strategy=standalone
build --strategy=Genrule=standalone
build --experimental_repo_remote_exec
build -c opt
build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1"
build --copt=-mavx
build --cxxopt=-std=c++17
build --host_cxxopt=-std=c++17
build --action_env TF_NEED_CUDA="1"
build --action_env CUDA_TOOLKIT_PATH="/usr/local/cuda,/opt/conda/envs/testaddons,/usr/include"
build --action_env CUDNN_INSTALL_PATH="/opt/conda/envs/testaddons"
build --action_env TF_CUDA_VERSION="11"
build --action_env TF_CUDNN_VERSION="8.8"
test --config=cuda
build --config=cuda
build:cuda --define=using_cuda=true --define=using_cuda_nvcc=true
build:cuda --crosstool_top=@ubuntu20.04-gcc9_manylinux2014-cuda11.8-cudnn8.6-tensorrt8.4_config_cuda//crosstool:toolchain
build --action_env PYTHON_BIN_PATH="/opt/conda/envs/testaddons/bin/python"
build --action_env PYTHON_LIB_PATH="/opt/conda/envs/testaddons/lib/python3.10/site-packages"
build --python_path="/opt/conda/envs/testaddons/bin/python"
build --action_env GCC_HOST_COMPILER_PATH="/opt/conda/envs/testaddons/bin/x86_64-conda-linux-gnu-cc"

Code to reproduce the issue
Build command:
bazel build -s --enable_runfiles build_pip_pkg

Any help resolving this build error would be appreciated.

@seanpmorgan, could you please provide some pointers?

Does anyone have any pointers for fixing this issue?


It seems similar to dmlc/xgboost#7378, which was fixed by dmlc/xgboost#7379.

Okay, thanks @bhack. I'll give this a try.

I'm running into the same issue when building TensorFlow Addons 0.19 with CUDA 11.8. What configuration should be used in this case?
In my case, removing CUB from the WORKSPACE, similar to #2821, works. @seanpmorgan, may I ask why CUB was removed in that PR?
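Removing the CUB repository from the WORKSPACE also means no BUILD target may still depend on it, otherwise Bazel reports a missing repository. As a rough check, the hypothetical helper below does a plain text search over WORKSPACE and BUILD file contents. It assumes the repository is declared with name = "cub_archive" (the name implied by the external/cub_archive/ paths in the log above) and referenced through labels that start with @cub_archive. It is a text heuristic, not a Bazel query.

```python
import re


def find_repo_usage(files: dict[str, str], repo: str = "cub_archive") -> dict[str, list[str]]:
    """Report which files declare a Bazel external repository and which reference it.

    Hypothetical helper: `files` maps a file path to its text. A declaration is
    any `name = "<repo>"` attribute; a reference is a quoted label such as
    "@<repo>//:target" or the shorthand "@<repo>".
    """
    declaration = re.compile(r'name\s*=\s*"' + re.escape(repo) + r'"')
    reference = re.compile(r'"@' + re.escape(repo) + r'(?://[^"]*)?"')
    report: dict[str, list[str]] = {"declared_in": [], "referenced_in": []}
    for path, text in files.items():
        if declaration.search(text):
            report["declared_in"].append(path)
        if reference.search(text):
            report["referenced_in"].append(path)
    return report
```

Every path listed under referenced_in would need its dependency on the repository dropped (or replaced) before the repository itself can be removed.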

I have this issue in another project. I tried CUDA 10.1 and 12.3 and got the same error, but there is no error with CUDA 11.4.

Same issue with CUDA 10.8