Failing build on Arch Linux

Question

lahwaacz opened this issue a year ago

The current develop branch fails to build on Arch Linux:

[856/1238] Linking CUDA executable cuda/test/base/array
FAILED: cuda/test/base/array
: && /opt/cuda/bin/g++ -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now     -Wl,-rpath -Wl,/usr/lib -Wl,--enable-new-dtags cuda/test/base/CMakeFiles/cuda_test_base_array.dir/array.cu.o -o cuda/test/base/array  -Wl,-rpath,/build/ginkgo-hpc-git/src/build/lib  lib/libginkgo.so.1.7.0  lib/libginkgo_omp.so.1.7.0  lib/libginkgo_cuda.so.1.7.0  -ldl  lib/libginkgo_reference.so.1.7.0  lib/libginkgo_hip.so.1.7.0  lib/libginkgo_dpcpp.so.1.7.0  lib/libginkgo_device.so.1.7.0  /usr/lib/libhwloc.so  /usr/lib/libhwloc.so  /usr/lib/libmpi_cxx.so  /usr/lib/libmpi.so  /usr/lib/libgtest_main.so.1.13.0  /usr/lib/libgtest.so.1.13.0  -lcudadevrt  -lcudart_static  -lrt  -lpthread  -ldl -L"/opt/cuda/targets/x86_64-linux/lib/stubs" -L"/opt/cuda/targets/x86_64-linux/lib" && :
/usr/bin/ld: lib/libginkgo.so.1.7.0: undefined reference to `std::ios_base_library_init()@GLIBCXX_3.4.32'
/usr/bin/ld: lib/libginkgo_cuda.so.1.7.0: undefined reference to `std::ios_base_library_init()'
collect2: error: ld returned 1 exit status

There are many more linking errors like this.

Marcel Koch · Answer 1 · Mon Aug 14 2023 20:43:42 GMT+0800 (China Standard Time)

Could you tell us which compiler and CUDA versions you are using? There may be an incompatibility between them. (I think detailed.log in the build directory should have the necessary information.)

Jakub Klinkovský · Answer 2 · Mon Aug 14 2023 20:58:10 GMT+0800 (China Standard Time)

The detailed.log contains:

CMAKE_CXX_COMPILER:                         GNU 13.2.1 on platform Linux x86_64
CMAKE_CUDA_COMPILER:                        /opt/cuda/bin/nvcc
CMAKE_CUDA_COMPILER_VERSION:                12.2.91
CMAKE_CUDA_HOST_COMPILER:                   <empty>

The CUDA host compiler is actually set via the /opt/cuda/bin/g++ -> /usr/bin/g++-12 symlink. On Arch Linux, g++-12 is GCC 12.3.0.
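
The mismatch described here (GCC 13 for the C++ sources, GCC 12 behind the CUDA symlink) can be checked with a short script. This is only a sketch: the helper names `gcc_major` and `check_host_compiler_match` are hypothetical, the default paths are assumptions taken from this thread, and differing major versions are a heuristic for trouble, not proof of an ABI problem.

```shell
# Major version of a GCC-compatible compiler driver.
# Passing both flags makes old and new GCC releases print the full version.
gcc_major() {
    "$1" -dumpfullversion -dumpversion | cut -d. -f1
}

# Compare the C++ compiler ($1, command name or path) with the CUDA host
# compiler ($2, a path that may be a symlink). The defaults are assumptions.
check_host_compiler_match() {
    cxx="${1:-g++}"
    cuda_host="$(readlink -f "${2:-/opt/cuda/bin/g++}")"
    if [ "$(gcc_major "$cxx")" = "$(gcc_major "$cuda_host")" ]; then
        echo "match"
    else
        echo "mismatch: $cxx vs $cuda_host"
    fi
}
```

With the versions reported above, `check_host_compiler_match g++ /opt/cuda/bin/g++` should report a mismatch, since the two compilers are GCC 13 and GCC 12.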

Marcel Koch · Answer 3 · Mon Aug 14 2023 21:12:23 GMT+0800 (China Standard Time)

Could you use the host compiler for compiling all of ginkgo?

Tobias Ribizel · Answer 4 · Mon Aug 14 2023 21:15:14 GMT+0800 (China Standard Time)

After some comments from CMake developers, we recently moved away from setting CMAKE_CUDA_HOST_COMPILER inside Ginkgo, since providing a compatible environment cannot be our responsibility. You can use the CUDAHOSTCXX environment variable to specify which host compiler to use.
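
As a rough sketch of how CUDAHOSTCXX might be used: the wrapper function `configure_with_cuda_host` is a hypothetical helper, and the compiler path in the usage line is an assumption based on the Arch setup in this thread. CMake reads CUDAHOSTCXX only on the first configure of a build tree, so a fresh build directory is advisable.

```shell
# Hypothetical helper: configure Ginkgo with an explicit CUDA host compiler.
#   $1: host compiler (assumed to be a GCC version the CUDA toolkit supports)
#   $2: source directory
#   $3: build directory (should be new, so no cached compiler is reused)
configure_with_cuda_host() {
    CUDAHOSTCXX="$1" cmake -S "$2" -B "$3" -DGINKGO_BUILD_CUDA=ON
}

# Example call (paths are assumptions):
#   configure_with_cuda_host /usr/bin/g++-12 . build
```

Note that this only selects the compiler nvcc uses for host code. The rest of the project is still built with the default C++ compiler unless that is changed as well.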

Tobias Ribizel · Answer 5 · Mon Aug 14 2023 21:56:12 GMT+0800 (China Standard Time)

Sounds like the default compiler Arch uses is incompatible with the CUDA version available: https://forums.developer.nvidia.com/t/identifier-float32-is-undefined-etc-cuda-12-2-0-gcc-13-1/258930

Building everything with the CUDA package-provided gcc 12 seems to work on Arch.

Jakub Klinkovský · Answer 6 · Mon Aug 14 2023 22:41:55 GMT+0800 (China Standard Time)

> Sounds like the default compiler Arch uses is incompatible with the CUDA version available: https://forums.developer.nvidia.com/t/identifier-float32-is-undefined-etc-cuda-12-2-0-gcc-13-1/258930

That's exactly why Arch Linux provides the /opt/cuda/bin/g++ -> /usr/bin/g++-12 symlink.

> Building everything with the CUDA package-provided gcc 12 seems to work on Arch.

For me, passing -DCMAKE_C_COMPILER=gcc-12 -DCMAKE_CXX_COMPILER=g++-12 to cmake results in an error because CMake cannot find MPI:

-- Could NOT find VTune (missing: VTune_EXECUTABLE VTune_LIBRARY VTune_INCLUDE_DIR)
-- Could NOT find METIS (missing: METIS_LIBRARY METIS_INCLUDE_DIR)
-- Could NOT find MPI_CXX (missing: MPI_CXX_WORKS) (Required is at least version "3.1")
CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_CXX_FOUND CXX) (Required is at least
  version "3.1")
Call Stack (most recent call first):
  /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake/Modules/FindMPI.cmake:1837 (find_package_handle_standard_args)
  CMakeLists.txt:239 (find_package)

The same thing happens when I set the compilers via the CC and CXX environment variables. The openmpi package is installed, of course, and the configure step passed before I started setting the compilers.

Tobias Ribizel · Answer 7 · Mon Aug 14 2023 22:50:36 GMT+0800 (China Standard Time)

That's the same issue (from CMakeConfigureLog.yaml):

    kind: "try_compile-v1"
    backtrace:
      - "/usr/share/cmake/Modules/FindMPI.cmake:1278 (try_compile)"
      - "/usr/share/cmake/Modules/FindMPI.cmake:1322 (_MPI_try_staged_settings)"
      - "/usr/share/cmake/Modules/FindMPI.cmake:1645 (_MPI_check_lang_works)"
      - "CMakeLists.txt:2 (find_package)"
    description: "The MPI test test_mpi for CXX in mode normal"
    directories:
      source: "/test/build/CMakeFiles/CMakeScratch/TryCompile-BvetGK"
      binary: "/test/build/CMakeFiles/CMakeScratch/TryCompile-BvetGK"
    cmakeVariables:
      CMAKE_CXX_FLAGS: ""
    buildResult:
      variable: "MPI_RESULT_CXX_test_mpi_normal"
      cached: true
      stdout: |
        Change Dir: '/test/build/CMakeFiles/CMakeScratch/TryCompile-BvetGK'
        
        Run Build Command(s): /usr/sbin/ninja -v cmTC_33b4e
        [1/2] /opt/cuda/bin/g++    -o CMakeFiles/cmTC_33b4e.dir/test_mpi.cpp.o -c /test/build/CMakeFiles/CMakeScratch/TryCompile-BvetGK/test_mpi.cpp
        [2/2] : && /opt/cuda/bin/g++  -rdynamic  -Wl,-rpath -Wl,/usr/lib -Wl,--enable-new-dtags CMakeFiles/cmTC_33b4e.dir/test_mpi.cpp.o -o cmTC_33b4e  /usr/lib/libmpi_cxx.so  /usr/lib/libmpi.so && :
        FAILED: cmTC_33b4e 
        : && /opt/cuda/bin/g++  -rdynamic  -Wl,-rpath -Wl,/usr/lib -Wl,--enable-new-dtags CMakeFiles/cmTC_33b4e.dir/test_mpi.cpp.o -o cmTC_33b4e  /usr/lib/libmpi_cxx.so  /usr/lib/libmpi.so && :
        /usr/sbin/ld: /usr/lib/libmpi_cxx.so: undefined reference to `std::ios_base_library_init()@GLIBCXX_3.4.32'
        collect2: error: ld returned 1 exit status
        ninja: build stopped: subcommand failed.

For this to work, libmpi_cxx.so needs to be compiled for the right standard library.
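
One way to see which standard library a shared object was built against is to list the GLIBCXX symbol versions it references. The sketch below assumes `objdump` from binutils is available. `required_glibcxx` is a hypothetical helper name, not part of any tool.

```shell
# List the distinct GLIBCXX symbol versions that appear in the dynamic
# symbol table of a shared library ($1).
required_glibcxx() {
    objdump -T "$1" | grep -o 'GLIBCXX_[0-9.]*' | sort -u
}

# Example call (path taken from the log above):
#   required_glibcxx /usr/lib/libmpi_cxx.so
```

If the output for libmpi_cxx.so includes GLIBCXX_3.4.32, the version named in the linker error, then the library needs a libstdc++ that provides that version at link time.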

Jakub Klinkovský · Answer 8 · Mon Aug 14 2023 23:21:45 GMT+0800 (China Standard Time)

Actually, the gcc12 package in Arch needs to be rebuilt after a recent glibc package update... I'll let the packagers know, sorry for the noise.

Tobias Ribizel · Answer 9 · Fri Apr 05 2024 02:40:27 GMT+0800 (China Standard Time)

@lahwaacz do you want to keep this open for CUDA/HIP support?

Jakub Klinkovský · Answer 10 · Fri Apr 05 2024 02:47:38 GMT+0800 (China Standard Time)

@upsj This was already fixed...?

Tobias Ribizel · Answer 11 · Fri Apr 05 2024 02:49:22 GMT+0800 (China Standard Time)

@lahwaacz You can't build CUDA and HIP with _GLIBCXX_DEBUG support: if you try, you run into compilation issues; if you don't, you run into ABI-related linker issues.

Jakub Klinkovský · Answer 12 · Fri Apr 05 2024 02:52:44 GMT+0800 (China Standard Time)

@upsj Well, the asserts are a different issue 🤷