include/ctranslate2/ops/flash-attention/flash_fwd_launch_template.h(15): error: identifier "__grid_constant__" is undefined

Question

twmht opened this issue 4 months ago

I am using the Orin-NX with CUDA version 11.4. The following error occurs during compilation:

(jarvis) aaeon@BOXER-8651AI:~/CTranslate2/build$ cmake .. -DWITH_MKL=OFF -DWITH_CUDA=ON -DWITH_CUDNN=ON -DOPENMP_RUNTIME=NONE
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Build spdlog: 1.10.0
-- Build type: Release
-- Compiling for multiple CPU ISA and enabling runtime dispatch
-- Found CUDA: /usr/local/cuda-11.4 (found suitable version "11.4", minimum required is "11.0")
-- Autodetected CUDA architecture(s):  8.7
-- NVCC host compiler: /usr/bin/c++
-- NVCC compilation flags: -std=c++17;-gencode;arch=compute_87,code=sm_87;--expt-relaxed-constexpr;--expt-extended-lambda;--use_fast_math
-- Found cuDNN include directory: /usr/include
-- Found cuDNN libraries: /usr/lib/aarch64-linux-gnu/libcudnn.so
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
-- Configuring done
-- Generating done
-- Build files have been written to: /home/aaeon/CTranslate2/build
(jarvis) aaeon@BOXER-8651AI:~/CTranslate2/build$ make -j4
[  0%] Building NVCC (Device) object CMakeFiles/ctranslate2.dir/src/cuda/ctranslate2_generated_primitives.cu.o
[  0%] Building NVCC (Device) object CMakeFiles/ctranslate2.dir/src/ops/ctranslate2_generated_alibi_add_gpu.cu.o
[  2%] Building NVCC (Device) object CMakeFiles/ctranslate2.dir/src/cuda/ctranslate2_generated_random.cu.o
[  2%] Building NVCC (Device) object CMakeFiles/ctranslate2.dir/src/ops/flash-attention/ctranslate2_generated_flash_fwd_split_hdim256_fp16_sm80.cu.o
/home/aaeon/CTranslate2/include/ctranslate2/ops/flash-attention/flash_fwd_launch_template.h(15): warning: attribute "__global__" does not apply here

/home/aaeon/CTranslate2/include/ctranslate2/ops/flash-attention/flash_fwd_launch_template.h(15): error: incomplete type is not allowed

/home/aaeon/CTranslate2/include/ctranslate2/ops/flash-attention/flash_fwd_launch_template.h(15): error: identifier "__grid_constant__" is undefined

/home/aaeon/CTranslate2/include/ctranslate2/ops/flash-attention/flash_fwd_launch_template.h(15): error: expected a ")"

/home/aaeon/CTranslate2/include/ctranslate2/ops/flash-attention/flash_fwd_launch_template.h(15): error: expected a ";"

4 errors detected in the compilation of "/home/aaeon/CTranslate2/src/ops/flash-attention/flash_fwd_split_hdim256_fp16_sm80.cu".
CMake Error at ctranslate2_generated_flash_fwd_split_hdim256_fp16_sm80.cu.o.Release.cmake:280 (message):
  Error generating file
  /home/aaeon/CTranslate2/build/CMakeFiles/ctranslate2.dir/src/ops/flash-attention/./ctranslate2_generated_flash_fwd_split_hdim256_fp16_sm80.cu.o


make[2]: *** [CMakeFiles/ctranslate2.dir/build.make:429: CMakeFiles/ctranslate2.dir/src/ops/flash-attention/ctranslate2_generated_flash_fwd_split_hdim256_fp16_sm80.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
i^Cmake[2]: *** [CMakeFiles/ctranslate2.dir/build.make:65: CMakeFiles/ctranslate2.dir/src/cuda/ctranslate2_generated_primitives.cu.o] Interrupt
make[1]: *** [CMakeFiles/Makefile2:114: CMakeFiles/ctranslate2.dir/all] Interrupt
make: *** [Makefile:130: all] Interrupt

Any idea?

Ming-Hsuan-Tu · Answer 1 · Fri Apr 12 2024 13:54:31 GMT+0800 (China Standard Time)

Does __grid_constant__ only work with CUDA 11.7 and newer versions?

Minh-Thuc · Answer 2 · Fri Apr 12 2024 15:39:48 GMT+0800 (China Standard Time)

According to the CUDA documentation, __grid_constant__ is supported on sm_70 and newer architectures, but I'm not sure which CUDA version introduced it. As a workaround, you can try removing __grid_constant__. The code should still work without this annotation.
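Instead of deleting the annotation outright, it could be hidden behind a version check so that newer toolkits still use it. The following is only a rough sketch: the macro names MAYBE_GRID_CONSTANT and HAS_GRID_CONSTANT, the Params struct, and scale_kernel are hypothetical and not part of CTranslate2, and the 11.7 threshold is an assumption taken from the question above, not something confirmed in this thread.

```cpp
// Sketch only: all names below are hypothetical, not CTranslate2 code.
// Assumption: __grid_constant__ is available from nvcc 11.7 onward.
#if defined(__CUDACC__) && \
    (__CUDACC_VER_MAJOR__ > 11 || \
     (__CUDACC_VER_MAJOR__ == 11 && __CUDACC_VER_MINOR__ >= 7))
#define MAYBE_GRID_CONSTANT __grid_constant__
#define HAS_GRID_CONSTANT 1
#else
// Older nvcc (such as 11.4) or a plain host compiler: expand to nothing.
#define MAYBE_GRID_CONSTANT
#define HAS_GRID_CONSTANT 0
#endif

// Hypothetical parameter struct passed by value to a kernel.
struct Params {
  float scale;
  int n;
};

#ifdef __CUDACC__
// The kernel is only compiled by nvcc. The parameter stays const and is
// passed by value, which is what __grid_constant__ requires when present.
__global__ void scale_kernel(const MAYBE_GRID_CONSTANT Params params,
                             float* data) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < params.n) {
    data[i] *= params.scale;
  }
}
#endif

// Reports at compile time whether the annotation was enabled.
constexpr bool grid_constant_enabled() { return HAS_GRID_CONSTANT == 1; }
```

With CUDA 11.4 the macro would expand to nothing, so the kernel signature would become a plain const by-value parameter, which matches the effect of removing the annotation by hand.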

BBC-Esq · Answer 3 · Thu Apr 25 2024 15:59:53 GMT+0800 (China Standard Time)

Was this resolved? As I sit drinking my morning coffee reading about one of my favorite libraries, ctranslate2, I don't want to waste time reading about issues that have been resolved. @twmht how'd it go?