ginkgo-project / ginkgo

Numerical linear algebra software package

Home Page: https://ginkgo-project.github.io/

Failing build on Arch Linux

lahwaacz opened this issue

The current develop branch fails to build on Arch Linux:

[856/1238] Linking CUDA executable cuda/test/base/array
FAILED: cuda/test/base/array
: && /opt/cuda/bin/g++ -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now     -Wl,-rpath -Wl,/usr/lib -Wl,--enable-new-dtags cuda/test/base/CMakeFiles/cuda_test_base_array.dir/array.cu.o -o cuda/test/base/array  -Wl,-rpath,/build/ginkgo-hpc-git/src/build/lib  lib/libginkgo.so.1.7.0  lib/libginkgo_omp.so.1.7.0  lib/libginkgo_cuda.so.1.7.0  -ldl  lib/libginkgo_reference.so.1.7.0  lib/libginkgo_hip.so.1.7.0  lib/libginkgo_dpcpp.so.1.7.0  lib/libginkgo_device.so.1.7.0  /usr/lib/libhwloc.so  /usr/lib/libhwloc.so  /usr/lib/libmpi_cxx.so  /usr/lib/libmpi.so  /usr/lib/libgtest_main.so.1.13.0  /usr/lib/libgtest.so.1.13.0  -lcudadevrt  -lcudart_static  -lrt  -lpthread  -ldl -L"/opt/cuda/targets/x86_64-linux/lib/stubs" -L"/opt/cuda/targets/x86_64-linux/lib" && :
/usr/bin/ld: lib/libginkgo.so.1.7.0: undefined reference to `std::ios_base_library_init()@GLIBCXX_3.4.32'
/usr/bin/ld: lib/libginkgo_cuda.so.1.7.0: undefined reference to `std::ios_base_library_init()'
collect2: error: ld returned 1 exit status

There are many more linking errors like this.
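When there are many such errors, it can help to reduce them to the distinct libstdc++ symbol versions the linker is asking for. The following is a minimal sketch, assuming a grep that supports `-o` and `-E` (GNU grep does); `required_glibcxx_versions` is a hypothetical helper name, not part of Ginkgo or CMake. The sample input is two error lines copied from the log above.

```shell
#!/bin/sh
# Print each distinct GLIBCXX symbol version named in "undefined reference"
# linker errors read from stdin.
# required_glibcxx_versions is a hypothetical helper name for this sketch.
required_glibcxx_versions() {
    grep 'undefined reference' \
        | grep -oE 'GLIBCXX_[0-9]+(\.[0-9]+)*' \
        | sort -u
}

# Example input: two error lines copied from the build log in this thread.
required_glibcxx_versions <<'EOF'
/usr/bin/ld: lib/libginkgo.so.1.7.0: undefined reference to `std::ios_base_library_init()@GLIBCXX_3.4.32'
/usr/bin/ld: lib/libginkgo_cuda.so.1.7.0: undefined reference to `std::ios_base_library_init()'
EOF
```

For a full build log, the same function can be fed from a file, for example `required_glibcxx_versions < build.log` (the file name is only an illustration).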

Could you tell us which compiler and CUDA versions you are using? They may be incompatible with each other. (I think detailed.log in the build directory should have the necessary information.)

The detailed.log contains:

CMAKE_CXX_COMPILER:                         GNU 13.2.1 on platform Linux x86_64
CMAKE_CUDA_COMPILER:                        /opt/cuda/bin/nvcc
CMAKE_CUDA_COMPILER_VERSION:                12.2.91
CMAKE_CUDA_HOST_COMPILER:                   <empty>

The CUDA host compiler is actually set via the /opt/cuda/bin/g++ -> /usr/bin/g++-12 symlink. On Arch Linux, g++-12 is (GCC) 12.3.0.

Could you use the host compiler for compiling all of ginkgo?

After some comments from CMake developers, we recently stopped setting CMAKE_CUDA_HOST_COMPILER inside Ginkgo, since providing a compatible environment cannot be our responsibility. You can use the CUDAHOSTCXX environment variable to specify which host compiler to use.
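As a rough sketch of how CUDAHOSTCXX could be set before configuring: the path `/opt/cuda/bin/g++` is the Arch Linux symlink mentioned in this thread and is an assumption on any other system, `resolve_compiler` is a hypothetical helper name, and `readlink -f` assumes GNU coreutils or a compatible implementation.

```shell
#!/bin/sh
# Resolve a compiler path through any symlinks and print the real binary.
# resolve_compiler is a hypothetical helper name used only in this sketch.
resolve_compiler() {
    readlink -f "$1"
}

# Assumption: the CUDA package ships a host-compiler symlink at this path,
# as described for Arch Linux in this thread. Adjust for other systems.
host_cxx=/opt/cuda/bin/g++

if [ -x "$host_cxx" ]; then
    # Export the resolved path so a later cmake run in this shell sees it.
    CUDAHOSTCXX=$(resolve_compiler "$host_cxx")
    export CUDAHOSTCXX
    echo "CUDAHOSTCXX=$CUDAHOSTCXX"
else
    echo "no host compiler found at $host_cxx; leaving CUDAHOSTCXX unset" >&2
fi
```

CMake uses CUDAHOSTCXX to initialize CMAKE_CUDA_HOST_COMPILER when a build directory is first configured, so it is safest to export it in the same shell and configure into a fresh build directory.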

Sounds like the default compiler Arch uses is incompatible with the CUDA version available: https://forums.developer.nvidia.com/t/identifier-float32-is-undefined-etc-cuda-12-2-0-gcc-13-1/258930

Building everything with the CUDA package-provided gcc 12 seems to work on Arch.

> Sounds like the default compiler Arch uses is incompatible with the CUDA version available: https://forums.developer.nvidia.com/t/identifier-float32-is-undefined-etc-cuda-12-2-0-gcc-13-1/258930

That's exactly why Arch Linux provides the /opt/cuda/bin/g++ -> /usr/bin/g++-12 symlink.

> Building everything with the CUDA package-provided gcc 12 seems to work on Arch.

For me, passing -DCMAKE_C_COMPILER=gcc-12 -DCMAKE_CXX_COMPILER=g++-12 to cmake results in an error because CMake cannot find MPI:

-- Could NOT find VTune (missing: VTune_EXECUTABLE VTune_LIBRARY VTune_INCLUDE_DIR)
-- Could NOT find METIS (missing: METIS_LIBRARY METIS_INCLUDE_DIR)
-- Could NOT find MPI_CXX (missing: MPI_CXX_WORKS) (Required is at least version "3.1")
CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_CXX_FOUND CXX) (Required is at least
  version "3.1")
Call Stack (most recent call first):
  /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake/Modules/FindMPI.cmake:1837 (find_package_handle_standard_args)
  CMakeLists.txt:239 (find_package)

Same thing happens when I set the compilers via the CC and CXX environment variables. Of course the openmpi package is installed and the configure step actually passed before I started setting the compilers.

That's the same issue (from the CMakeConfigureLog.yaml):

    kind: "try_compile-v1"
    backtrace:
      - "/usr/share/cmake/Modules/FindMPI.cmake:1278 (try_compile)"
      - "/usr/share/cmake/Modules/FindMPI.cmake:1322 (_MPI_try_staged_settings)"
      - "/usr/share/cmake/Modules/FindMPI.cmake:1645 (_MPI_check_lang_works)"
      - "CMakeLists.txt:2 (find_package)"
    description: "The MPI test test_mpi for CXX in mode normal"
    directories:
      source: "/test/build/CMakeFiles/CMakeScratch/TryCompile-BvetGK"
      binary: "/test/build/CMakeFiles/CMakeScratch/TryCompile-BvetGK"
    cmakeVariables:
      CMAKE_CXX_FLAGS: ""
    buildResult:
      variable: "MPI_RESULT_CXX_test_mpi_normal"
      cached: true
      stdout: |
        Change Dir: '/test/build/CMakeFiles/CMakeScratch/TryCompile-BvetGK'
        
        Run Build Command(s): /usr/sbin/ninja -v cmTC_33b4e
        [1/2] /opt/cuda/bin/g++    -o CMakeFiles/cmTC_33b4e.dir/test_mpi.cpp.o -c /test/build/CMakeFiles/CMakeScratch/TryCompile-BvetGK/test_mpi.cpp
        [2/2] : && /opt/cuda/bin/g++  -rdynamic  -Wl,-rpath -Wl,/usr/lib -Wl,--enable-new-dtags CMakeFiles/cmTC_33b4e.dir/test_mpi.cpp.o -o cmTC_33b4e  /usr/lib/libmpi_cxx.so  /usr/lib/libmpi.so && :
        FAILED: cmTC_33b4e 
        : && /opt/cuda/bin/g++  -rdynamic  -Wl,-rpath -Wl,/usr/lib -Wl,--enable-new-dtags CMakeFiles/cmTC_33b4e.dir/test_mpi.cpp.o -o cmTC_33b4e  /usr/lib/libmpi_cxx.so  /usr/lib/libmpi.so && :
        /usr/sbin/ld: /usr/lib/libmpi_cxx.so: undefined reference to `std::ios_base_library_init()@GLIBCXX_3.4.32'
        collect2: error: ld returned 1 exit status
        ninja: build stopped: subcommand failed.

For this to work, libmpi_cxx.so needs to be compiled for the right standard library.
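One crude way to see whether a library such as libmpi_cxx.so refers to a given symbol version is to search its bytes for the version string. This is only a sketch: `mentions_symbol_version` is a hypothetical helper name, the search cannot tell whether the library defines or merely requires the version, and the `-a` option assumes GNU grep. Tools such as `readelf -V` or `objdump -T` give a more precise answer.

```shell
#!/bin/sh
# Succeed if the file's contents mention the given symbol-version string.
# mentions_symbol_version is a hypothetical helper name. It is a plain byte
# search and does not distinguish "defines" from "requires".
mentions_symbol_version() {
    # -a: treat binary data as text, -F: fixed string,
    # -w: whole word only, -q: no output, exit status only
    grep -aqFw -e "$2" "$1"
}

# Library path and version string taken from the log in this thread.
lib=/usr/lib/libmpi_cxx.so
if [ -e "$lib" ]; then
    if mentions_symbol_version "$lib" GLIBCXX_3.4.32; then
        echo "$lib mentions GLIBCXX_3.4.32"
    else
        echo "$lib does not mention GLIBCXX_3.4.32"
    fi
fi
```

The same check can be pointed at the libstdc++ that a particular compiler links against; its location varies between distributions and compiler packages.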

Actually, the gcc12 package in Arch needs to be rebuilt after a recent glibc package update... I'll let the packagers know, sorry for the noise.

@lahwaacz do you want to keep this open for CUDA/HIP support?

@upsj This was already fixed...?

@lahwaacz You can't build CUDA and HIP with _GLIBCXX_DEBUG support: if you try, you run into compilation issues, and if you don't, you run into ABI-related linker issues.

@upsj Well the asserts are a different issue 🤷