mesonbuild / meson

The Meson Build System

Home Page: http://mesonbuild.com

1.4.1 breaks cuda 17 separate compilation linking test

heftig opened this issue

As of Meson 1.4.1, run_tests.py fails in a cuda test. Meson 1.4.0 is still fine.

system parameters

  • Arch Linux
  • Cuda 12.5.0-1
  • Python 3.12.3-1
  • Meson 1.4.1
  • Ninja 1.12.1-1
Test failure log
Mesonlogs of failing tests


=============================== cuda: 17 separate compilation linking ==============================
 

Failed during: test
Reason: Running unit tests failed.
 

(inprocess) $ setup --prefix /usr --libdir lib 'test cases/cuda/17 separate compilation linking' '/build/meson/src/meson-1.4.1/b baa2750840' --backend=ninja
The Meson build system
Version: 1.4.1
Source dir: /build/meson/src/meson-1.4.1/test cases/cuda/17 separate compilation linking
Build dir: /build/meson/src/meson-1.4.1/b baa2750840
Build type: native build
Project name: device linking
Project version: 1.0.0
C++ compiler for the host machine: c++ (gcc 14.1.1 "c++ (GCC) 14.1.1 20240522")
C++ linker for the host machine: c++ ld.bfd 2.42.0
Host machine cpu family: x86_64
Host machine cpu: x86_64
Cuda compiler for the host machine: nvcc (nvcc 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0)
Cuda linker for the host machine: nvcc nvlink 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
test cases/cuda/17 separate compilation linking/meson.build:12: WARNING: Module CUDA has no backwards or forwards compatibility and might not exist in future releases.
Message: NVCC version:   12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
Message: NVCC flags:     -gencode arch=compute_80,code=sm_80
Build targets in project: 2

device linking 1.0.0

  User defined options
    backend: ninja
    libdir : lib
    prefix : /usr

Found ninja-1.12.1 at /usr/bin/ninja
ninja explain: deps for 'app.p/main.cu.o' are missing
ninja explain: app.p/main.cu.o is dirty
ninja explain: deps for 'libdevicefuncs.a.p/b.cu.o' are missing
ninja explain: libdevicefuncs.a.p/b.cu.o is dirty
ninja explain: libdevicefuncs.a is dirty
ninja explain: app is dirty
ninja explain: meson-test-prereq is dirty
ninja explain: output meson-benchmark-prereq of phony edge with no inputs doesn't exist
ninja explain: meson-benchmark-prereq is dirty
ninja explain: libdevicefuncs.a is dirty
ninja explain: app is dirty
[1/4] Compiling Cuda object libdevicefuncs.a.p/b.cu.o
[2/4] Linking static target libdevicefuncs.a
[3/4] Compiling Cuda object app.p/main.cu.o
[4/4] Linking target app
ninja explain: output build.ninja older than most recent input ../test cases/cuda/17 separate compilation linking/meson.build (1717095069265791924 vs 1717095070669117141)
[0/1] Regenerating build files.
The Meson build system
Version: 1.4.1
Source dir: /build/meson/src/meson-1.4.1/test cases/cuda/17 separate compilation linking
Build dir: /build/meson/src/meson-1.4.1/b baa2750840
Build type: native build
Project name: device linking
Project version: 1.0.0
C++ compiler for the host machine: c++ (gcc 14.1.1 "c++ (GCC) 14.1.1 20240522")
C++ linker for the host machine: c++ ld.bfd 2.42.0
Host machine cpu family: x86_64
Host machine cpu: x86_64
Cuda compiler for the host machine: nvcc (nvcc 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0)
Cuda linker for the host machine: nvcc nvlink 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
../test cases/cuda/17 separate compilation linking/meson.build:12: WARNING: Module CUDA has no backwards or forwards compatibility and might not exist in future releases.
Message: NVCC version:   12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
Message: NVCC flags:     -gencode arch=compute_80,code=sm_80
Build targets in project: 2

device linking 1.0.0

  User defined options
    backend: ninja
    libdir : lib
    prefix : /usr

Found ninja-1.12.1 at /usr/bin/ninja

Generating targets:   0%|          | 0/2 eta ?
                                              

Writing build.ninja:   0%|          | 0/30 eta ?
                                                
Cleaning... 0 files.
ninja explain: output meson-benchmark-prereq of phony edge with no inputs doesn't exist
ninja explain: meson-benchmark-prereq is dirty
ninja: no work to do.
10/1 cudatest FAIL            0.01s   exit status 1
>>> MALLOC_PERTURB_=178 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 '/build/meson/src/meson-1.4.1/b baa2750840/app'


Ok:                 0   
Expected Fail:      0   
Fail:               1   
Unexpected Pass:    0   
Skipped:            0   
Timeout:            0   

Full log written to /build/meson/src/meson-1.4.1/b baa2750840/meson-logs/testlog.txt
No tests defined.
 

                                                  


Total passed tests:  677
Total failed tests:  1
Total skipped tests: 77

All failures:
  -> cuda: 17 separate compilation linking

Maybe @SoapGentoo since 1.4.1 contains your cuda changes.

With MESON_PRINT_TEST_OUTPUT=1 the tests will spew tons of information, including the testlog from cudatest, which might indicate why it fails.

I don't see any additional information.

Building the test case manually and running the app fails with the error "couldn't get the symbol addr".

What GPU is this on?

$ ninja -C build/ -v test 
ninja: Entering directory `build/'
[0/1] /usr/lib/python-exec/python3.12/python3 -u /home/dseifert/git/meson/meson.py test --no-rebuild --print-errorlogs
1/1 cudatest        OK              0.10s

I bet this is related to the fact that it's building for -gencode arch=compute_80,code=sm_80, and since I'm using an Ada GPU, it works fine for me.
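
To illustrate why an Ada GPU would accept this build, here is a minimal Python sketch of the CUDA binary-compatibility rule as I understand it: native code built for `sm_XY` runs on a device with the same major version and an equal or higher minor version. With `-gencode arch=compute_80,code=sm_80`, only native `sm_80` code is embedded and there is no PTX to fall back on. The function name `cubin_runs_on` is hypothetical, and the rule is simplified (it ignores architecture-specific variants).

```python
from typing import Tuple

# (major, minor) compute capability, e.g. (8, 0) for sm_80
Capability = Tuple[int, int]


def cubin_runs_on(cubin: Capability, device: Capability) -> bool:
    """Simplified rule for native GPU code with no PTX fallback (assumption)."""
    cubin_major, cubin_minor = cubin
    device_major, device_minor = device
    # Same major version, and the device minor is at least the cubin minor.
    return cubin_major == device_major and device_minor >= cubin_minor


if __name__ == "__main__":
    sm_80 = (8, 0)
    devices = [("Ada (8.9)", (8, 9)), ("Turing (7.5)", (7, 5)), ("Hopper (9.0)", (9, 0))]
    for name, capability in devices:
        print(f"{name}: {cubin_runs_on(sm_80, capability)}")
```

Under this rule an Ada card (compute capability 8.9) accepts `sm_80` code, while a machine with an older or newer major architecture, or with no GPU at all, cannot run it.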

Try the following patch:

--- a/test cases/cuda/17 separate compilation linking/meson.build       
+++ b/test cases/cuda/17 separate compilation linking/meson.build       
@@ -8,7 +8,7 @@ project('device linking', ['cpp', 'cuda'], version : '1.0.0')
 nvcc = meson.get_compiler('cuda')
 cuda = import('unstable-cuda')
 
-arch_flags = cuda.nvcc_arch_flags(nvcc.version(), 'Auto', detected : ['8.0'])
+arch_flags = cuda.nvcc_arch_flags(nvcc.version(), 'Common')
 
 message('NVCC version:   ' + nvcc.version())
 message('NVCC flags:     ' + ' '.join(arch_flags))

No GPU. It's a build server without any Nvidia hardware.

The patch does not work.

The question is, in such a case, do we want:

  • to detect CUDA and test that you can compile CUDA code using the cuda module, but not try running device code
  • to skip the test when no GPU is detected, since the important part (actually running it) cannot be tested
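
One way to combine the two options is sketched below in Python, purely as an illustration and not as what Meson does: still compile everything, but report the run step as skipped when no GPU is visible. It assumes `nvidia-smi -L` prints one line starting with `GPU ` per device, and relies on Meson's test harness treating exit status 77 as "skipped". The helper names `count_gpus` and `exit_code_for` are hypothetical.

```python
import shutil
import subprocess

SKIP_EXIT_CODE = 77  # Meson reports a test exiting with status 77 as skipped


def count_gpus(tool: str = "nvidia-smi") -> int:
    """Count GPUs listed by `<tool> -L`; return 0 if the tool is missing or fails."""
    path = shutil.which(tool)
    if path is None:
        return 0
    try:
        result = subprocess.run(
            [path, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return 0
    if result.returncode != 0:
        return 0
    # Assumption: each detected device is printed as "GPU <index>: <name> ...".
    return sum(1 for line in result.stdout.splitlines() if line.startswith("GPU "))


def exit_code_for(gpu_count: int, run_result: int) -> int:
    """Map 'no GPU' to a skip; otherwise pass through the real test result."""
    return SKIP_EXIT_CODE if gpu_count == 0 else run_result


if __name__ == "__main__":
    print("GPUs detected:", count_gpus())
```

A wrapper like this keeps the compile and link coverage on GPU-less build servers while making the missing run step visible as a skip rather than a failure.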

The funny story is that this actually affects our CI too, but we did not notice because the CUDA CI was silently broken. It loaded /etc/profile.d/cuda.sh but failed to set $PATH. I have a fix for that, and now this very test fails in GitHub Actions as well, since GitHub Actions has no GPU.