1.4.1 breaks cuda 17 separate compilation linking test
heftig opened this issue
As of Meson 1.4.1, run_tests.py fails in a CUDA test. Meson 1.4.0 is still fine.
System parameters
- Arch Linux
- Cuda 12.5.0-1
- Python 3.12.3-1
- Meson 1.4.1
- Ninja 1.12.1-1
Test failure log
Meson logs of failing tests:
=============================== cuda: 17 separate compilation linking ==============================
Failed during: test
Reason: Running unit tests failed.
(inprocess) $ setup --prefix /usr --libdir lib 'test cases/cuda/17 separate compilation linking' '/build/meson/src/meson-1.4.1/b baa2750840' --backend=ninja
The Meson build system
Version: 1.4.1
Source dir: /build/meson/src/meson-1.4.1/test cases/cuda/17 separate compilation linking
Build dir: /build/meson/src/meson-1.4.1/b baa2750840
Build type: native build
Project name: device linking
Project version: 1.0.0
C++ compiler for the host machine: c++ (gcc 14.1.1 "c++ (GCC) 14.1.1 20240522")
C++ linker for the host machine: c++ ld.bfd 2.42.0
Host machine cpu family: x86_64
Host machine cpu: x86_64
Cuda compiler for the host machine: nvcc (nvcc 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0)
Cuda linker for the host machine: nvcc nvlink 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
test cases/cuda/17 separate compilation linking/meson.build:12: WARNING: Module CUDA has no backwards or forwards compatibility and might not exist in future releases.
Message: NVCC version: 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
Message: NVCC flags: -gencode arch=compute_80,code=sm_80
Build targets in project: 2
device linking 1.0.0
User defined options
backend: ninja
libdir : lib
prefix : /usr
Found ninja-1.12.1 at /usr/bin/ninja
ninja explain: deps for 'app.p/main.cu.o' are missing
ninja explain: app.p/main.cu.o is dirty
ninja explain: deps for 'libdevicefuncs.a.p/b.cu.o' are missing
ninja explain: libdevicefuncs.a.p/b.cu.o is dirty
ninja explain: libdevicefuncs.a is dirty
ninja explain: app is dirty
ninja explain: meson-test-prereq is dirty
ninja explain: output meson-benchmark-prereq of phony edge with no inputs doesn't exist
ninja explain: meson-benchmark-prereq is dirty
ninja explain: libdevicefuncs.a is dirty
ninja explain: app is dirty
[1/4] Compiling Cuda object libdevicefuncs.a.p/b.cu.o
[2/4] Linking static target libdevicefuncs.a
[3/4] Compiling Cuda object app.p/main.cu.o
[4/4] Linking target app
ninja explain: output build.ninja older than most recent input ../test cases/cuda/17 separate compilation linking/meson.build (1717095069265791924 vs 1717095070669117141)
[0/1] Regenerating build files.
The Meson build system
Version: 1.4.1
Source dir: /build/meson/src/meson-1.4.1/test cases/cuda/17 separate compilation linking
Build dir: /build/meson/src/meson-1.4.1/b baa2750840
Build type: native build
Project name: device linking
Project version: 1.0.0
C++ compiler for the host machine: c++ (gcc 14.1.1 "c++ (GCC) 14.1.1 20240522")
C++ linker for the host machine: c++ ld.bfd 2.42.0
Host machine cpu family: x86_64
Host machine cpu: x86_64
Cuda compiler for the host machine: nvcc (nvcc 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0)
Cuda linker for the host machine: nvcc nvlink 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
../test cases/cuda/17 separate compilation linking/meson.build:12: WARNING: Module CUDA has no backwards or forwards compatibility and might not exist in future releases.
Message: NVCC version: 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
Message: NVCC flags: -gencode arch=compute_80,code=sm_80
Build targets in project: 2
device linking 1.0.0
User defined options
backend: ninja
libdir : lib
prefix : /usr
Found ninja-1.12.1 at /usr/bin/ninja
Generating targets: 0%| | 0/2 eta ?
Writing build.ninja: 0%| | 0/30 eta ?
Cleaning... 0 files.
ninja explain: output meson-benchmark-prereq of phony edge with no inputs doesn't exist
ninja explain: meson-benchmark-prereq is dirty
ninja: no work to do.
10/1 cudatest FAIL 0.01s exit status 1
>>> MALLOC_PERTURB_=178 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 '/build/meson/src/meson-1.4.1/b baa2750840/app'
Ok: 0
Expected Fail: 0
Fail: 1
Unexpected Pass: 0
Skipped: 0
Timeout: 0
Full log written to /build/meson/src/meson-1.4.1/b baa2750840/meson-logs/testlog.txt
No tests defined.
Total passed tests: 677
Total failed tests: 1
Total skipped tests: 77
All failures:
-> cuda: 17 separate compilation linking
Maybe @SoapGentoo can take a look, since 1.4.1 contains your CUDA changes.
With MESON_PRINT_TEST_OUTPUT=1 the tests will spew tons of information, including the testlog from cudatest, which might indicate why it fails.
I don't see any additional information. Building the test case manually and running the app results in "couldn't get the symbol addr".
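For context, that bare "couldn't get the symbol addr" message hides the underlying CUDA error code. A minimal sketch of surfacing it, assuming the test wraps a runtime call such as cudaGetSymbolAddress (the symbol name below is hypothetical, not the actual test source):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical device symbol standing in for whatever the test looks up.
__device__ int some_device_symbol;

int main() {
    void *addr = nullptr;
    cudaError_t err = cudaGetSymbolAddress(&addr, some_device_symbol);
    if (err != cudaSuccess) {
        // cudaGetErrorString() turns the code into a readable message,
        // e.g. "no kernel image is available for execution on the device"
        // when the binary was built for a mismatched -gencode architecture.
        std::fprintf(stderr, "couldn't get the symbol addr: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Printing the error string would immediately distinguish an architecture mismatch from a missing device.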
What GPU is this on?
$ ninja -C build/ -v test
ninja: Entering directory `build/'
[0/1] /usr/lib/python-exec/python3.12/python3 -u /home/dseifert/git/meson/meson.py test --no-rebuild --print-errorlogs
1/1 cudatest OK 0.10s
I bet this is related to the fact that it's building for -gencode arch=compute_80,code=sm_80, and since I'm using an Ada GPU, it works fine for me.
Try the following patch:
--- a/test cases/cuda/17 separate compilation linking/meson.build
+++ b/test cases/cuda/17 separate compilation linking/meson.build
@@ -8,7 +8,7 @@ project('device linking', ['cpp', 'cuda'], version : '1.0.0')
nvcc = meson.get_compiler('cuda')
cuda = import('unstable-cuda')
-arch_flags = cuda.nvcc_arch_flags(nvcc.version(), 'Auto', detected : ['8.0'])
+arch_flags = cuda.nvcc_arch_flags(nvcc.version(), 'Common')
message('NVCC version: ' + nvcc.version())
message('NVCC flags: ' + ' '.join(arch_flags))
No GPU. It's a build server without any Nvidia hardware.
The patch does not work.
The question is, in such a case, do we want:
- to detect cuda and test that you can compile cuda code using the cuda module, but not try running device code
- to skip the test when no GPU is detected, since you cannot test the important part: actually running the device code
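The second option could be implemented without any GPU detection in meson.build: with Meson's default exitcode test protocol, a test that exits with status 77 is reported as skipped. A sketch of what the test binary's main() could do (this is a suggestion, not the actual test source):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess || count == 0) {
        // Meson's exitcode test protocol treats exit status 77 as SKIP,
        // so the test still exercises compiling and device-linking the
        // CUDA code, but is not counted as a failure on GPU-less machines.
        std::fprintf(stderr, "no usable CUDA device; skipping\n");
        return 77;
    }
    // ... run the actual device-code checks here ...
    return 0;
}
```

That way build servers and GitHub Actions runners still verify separate compilation and device linking, and only the runtime check is skipped.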
The funny story is that this actually affects our CI too, but we did not notice because the CUDA CI was silently broken: it loaded /etc/profile.d/cuda.sh but failed to set $PATH. I have a fix for that, and now this very test fails in GitHub Actions as well, since GitHub Actions runners have no GPU.