1.4.1 breaks cuda 17 separate compilation linking test
heftig opened this issue
As of Meson 1.4.1, run_tests.py fails in a CUDA test. Meson 1.4.0 is still fine.
System parameters
- Arch Linux
- Cuda 12.5.0-1
- Python 3.12.3-1
- Meson 1.4.1
- Ninja 1.12.1-1
Test failure log
Meson logs of failing tests:
=============================== cuda: 17 separate compilation linking ==============================
Failed during: test
Reason: Running unit tests failed.
(inprocess) $ setup --prefix /usr --libdir lib 'test cases/cuda/17 separate compilation linking' '/build/meson/src/meson-1.4.1/b baa2750840' --backend=ninja
The Meson build system
Version: 1.4.1
Source dir: /build/meson/src/meson-1.4.1/test cases/cuda/17 separate compilation linking
Build dir: /build/meson/src/meson-1.4.1/b baa2750840
Build type: native build
Project name: device linking
Project version: 1.0.0
C++ compiler for the host machine: c++ (gcc 14.1.1 "c++ (GCC) 14.1.1 20240522")
C++ linker for the host machine: c++ ld.bfd 2.42.0
Host machine cpu family: x86_64
Host machine cpu: x86_64
Cuda compiler for the host machine: nvcc (nvcc 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0)
Cuda linker for the host machine: nvcc nvlink 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
test cases/cuda/17 separate compilation linking/meson.build:12: WARNING: Module CUDA has no backwards or forwards compatibility and might not exist in future releases.
Message: NVCC version: 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
Message: NVCC flags: -gencode arch=compute_80,code=sm_80
Build targets in project: 2
device linking 1.0.0
User defined options
backend: ninja
libdir : lib
prefix : /usr
Found ninja-1.12.1 at /usr/bin/ninja
ninja explain: deps for 'app.p/main.cu.o' are missing
ninja explain: app.p/main.cu.o is dirty
ninja explain: deps for 'libdevicefuncs.a.p/b.cu.o' are missing
ninja explain: libdevicefuncs.a.p/b.cu.o is dirty
ninja explain: libdevicefuncs.a is dirty
ninja explain: app is dirty
ninja explain: meson-test-prereq is dirty
ninja explain: output meson-benchmark-prereq of phony edge with no inputs doesn't exist
ninja explain: meson-benchmark-prereq is dirty
ninja explain: libdevicefuncs.a is dirty
ninja explain: app is dirty
[1/4] Compiling Cuda object libdevicefuncs.a.p/b.cu.o
[2/4] Linking static target libdevicefuncs.a
[3/4] Compiling Cuda object app.p/main.cu.o
[4/4] Linking target app
ninja explain: output build.ninja older than most recent input ../test cases/cuda/17 separate compilation linking/meson.build (1717095069265791924 vs 1717095070669117141)
[0/1] Regenerating build files.
The Meson build system
Version: 1.4.1
Source dir: /build/meson/src/meson-1.4.1/test cases/cuda/17 separate compilation linking
Build dir: /build/meson/src/meson-1.4.1/b baa2750840
Build type: native build
Project name: device linking
Project version: 1.0.0
C++ compiler for the host machine: c++ (gcc 14.1.1 "c++ (GCC) 14.1.1 20240522")
C++ linker for the host machine: c++ ld.bfd 2.42.0
Host machine cpu family: x86_64
Host machine cpu: x86_64
Cuda compiler for the host machine: nvcc (nvcc 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0)
Cuda linker for the host machine: nvcc nvlink 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
../test cases/cuda/17 separate compilation linking/meson.build:12: WARNING: Module CUDA has no backwards or forwards compatibility and might not exist in future releases.
Message: NVCC version: 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
Message: NVCC flags: -gencode arch=compute_80,code=sm_80
Build targets in project: 2
device linking 1.0.0
User defined options
backend: ninja
libdir : lib
prefix : /usr
Found ninja-1.12.1 at /usr/bin/ninja
Generating targets: 0%| | 0/2 eta ?
Writing build.ninja: 0%| | 0/30 eta ?
Cleaning... 0 files.
ninja explain: output meson-benchmark-prereq of phony edge with no inputs doesn't exist
ninja explain: meson-benchmark-prereq is dirty
ninja: no work to do.
10/1 cudatest FAIL 0.01s exit status 1
>>> MALLOC_PERTURB_=178 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 '/build/meson/src/meson-1.4.1/b baa2750840/app'
Ok: 0
Expected Fail: 0
Fail: 1
Unexpected Pass: 0
Skipped: 0
Timeout: 0
Full log written to /build/meson/src/meson-1.4.1/b baa2750840/meson-logs/testlog.txt
No tests defined.
Total passed tests: 677
Total failed tests: 1
Total skipped tests: 77
All failures:
-> cuda: 17 separate compilation linking
Maybe @SoapGentoo can take a look, since 1.4.1 contains your CUDA changes.
With MESON_PRINT_TEST_OUTPUT=1 the tests will spew tons of information, including the testlog from cudatest, which might indicate why it fails.
I don't see any additional information. Building the test case manually and running the app results in "couldn't get the symbol addr".
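For context, that bare "couldn't get the symbol addr" message hides the underlying CUDA error code. A minimal sketch of surfacing it, assuming the test wraps a runtime call such as cudaGetSymbolAddress (the symbol name below is hypothetical, not the actual test source):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical device symbol standing in for whatever the test looks up.
__device__ int some_device_symbol;

int main() {
    void *addr = nullptr;
    cudaError_t err = cudaGetSymbolAddress(&addr, some_device_symbol);
    if (err != cudaSuccess) {
        // cudaGetErrorString() turns the code into a readable message,
        // e.g. "no kernel image is available for execution on the device"
        // when the binary was built for a mismatched -gencode architecture.
        std::fprintf(stderr, "couldn't get the symbol addr: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Printing the error string would immediately distinguish an architecture mismatch from a missing device.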
What GPU is this on?
$ ninja -C build/ -v test
ninja: Entering directory `build/'
[0/1] /usr/lib/python-exec/python3.12/python3 -u /home/dseifert/git/meson/meson.py test --no-rebuild --print-errorlogs
1/1 cudatest OK 0.10s
I bet this is related to the fact that it's building for -gencode arch=compute_80,code=sm_80, and since I'm using an Ada GPU, it works fine for me.
Try the following patch:
--- a/test cases/cuda/17 separate compilation linking/meson.build
+++ b/test cases/cuda/17 separate compilation linking/meson.build
@@ -8,7 +8,7 @@ project('device linking', ['cpp', 'cuda'], version : '1.0.0')
nvcc = meson.get_compiler('cuda')
cuda = import('unstable-cuda')
-arch_flags = cuda.nvcc_arch_flags(nvcc.version(), 'Auto', detected : ['8.0'])
+arch_flags = cuda.nvcc_arch_flags(nvcc.version(), 'Common')
message('NVCC version: ' + nvcc.version())
message('NVCC flags: ' + ' '.join(arch_flags))
No GPU. It's a build server without any Nvidia hardware.
The patch does not work.
The question is, in such a case, do we want:
- to detect cuda and test that you can compile cuda code using the cuda module, but not try running device code
- to skip the test when no GPU is detected, since you cannot test the important part: actually running the device code
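The second option could be implemented without any GPU detection in meson.build: with Meson's default exitcode test protocol, a test that exits with status 77 is reported as skipped. A sketch of what the test binary's main() could do (this is a suggestion, not the actual test source):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess || count == 0) {
        // Meson's exitcode test protocol treats exit status 77 as SKIP,
        // so the test still exercises compiling and device-linking the
        // CUDA code, but is not counted as a failure on GPU-less machines.
        std::fprintf(stderr, "no usable CUDA device; skipping\n");
        return 77;
    }
    // ... run the actual device-code checks here ...
    return 0;
}
```

That way build servers and GitHub Actions runners still verify separate compilation and device linking, and only the runtime check is skipped.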
The funny story is that this actually affects our CI too, but we did not notice because the CUDA CI was silently broken: it loaded /etc/profile.d/cuda.sh but failed to set $PATH. I have a fix for that, and now this very test fails in GitHub Actions as well, since GitHub Actions runners have no GPU.