NVIDIA / cuCollections

[BUG] Target architecture macros are broken

sleeepyjack opened this issue · comments

Describe the bug
(Originally found by @PointKernel)
Related: #143 #192 #193

The CUCO_HAS_INDEPENDENT_THREADS macro does not work as expected.
Same goes for all other macros that rely on __CUDA_ARCH__.

Steps/Code to reproduce bug
Compiling with -DCMAKE_CUDA_ARCHITECTURES=XX, where XX >= 70, produces the following static assertion error:

cuCollections/include/cuco/static_map.cuh(157): error: static assertion failed with "A key type larger than 8B is supported for only sm_70 and up."

which originates from static_map/custom_type_example.cu.
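For context, the failure mode looks roughly like the following simplified sketch (paraphrased, not the exact cuco source). __CUDA_ARCH__ is only defined during device compilation passes, so a feature macro derived from it is never set when the compiler processes host code:

    // Simplified sketch of the problematic pattern (not the actual cuco source):
    // the feature macro is derived from __CUDA_ARCH__ ...
    #if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 700)
    #define CUCO_HAS_INDEPENDENT_THREADS
    #endif

    template <typename Key, typename Value>
    class static_map {
      // ... but this assertion is also evaluated in the host pass, where
      // __CUDA_ARCH__ is undefined. The feature macro is therefore never set
      // there, and instantiating the map with a large key fails even when
      // every requested target is sm_70 or newer.
    #if !defined(CUCO_HAS_INDEPENDENT_THREADS)
      static_assert(sizeof(Key) <= 8,
                    "A key type larger than 8B is supported for only sm_70 and up.");
    #endif
    };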

Additional context
It seems like this issue didn't come up earlier because the default set of targets we compile for includes pre-Volta architectures. Thus, the independent threads feature is permanently disabled and so is the large type support.

Expected behavior
We need a host-side mechanism for determining if a particular feature is valid for all target architectures. However, with the current default targets, this would still disable large types.

I would suggest tagging this as P0

Erratum: I got bamboozled by the rapids-cmake scripts. By default, we're compiling for native instead of all architectures (including Pascal).

Erratum of the erratum: it turns out we were in fact compiling for all architectures (including Pascal) without knowing it. The rapids-cmake script has some odd logic: if CMAKE_CUDA_ARCHITECTURES is undefined, it defaults to all architectures, but CMAKE_CUDA_ARCHITECTURES="" defaults to native.

FYI, reverting #192 can solve the issue; i.e., large types are supported when compiling with cmake ../ -DCMAKE_CUDA_ARCHITECTURES=75 on my local system.

I came up with a battle plan to get this problem out of the way once and for all.

  1. We need a host-defined macro that gives us the minimum target architecture specified in the compile command. My first idea was to use __CUDA_MINIMUM_ARCH__ for this, but apparently that only works with nvc++. As a fallback, I'd like to resurrect the idea of defining this macro externally in the top-level CMake script, similar to what @robertmaynard suggested here: #192 (comment). (See the sketch after this list.)

  2. As addressed in the initial post, we currently also compile for sm_60 by default. Combined with 1., this would disable large-type support entirely. I propose switching to a native build instead. This can be done by explicitly setting CMAKE_CUDA_ARCHITECTURES="native" before the rapids-cmake init script runs, in case the variable has not been explicitly set via the command line.
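
To make 1. concrete, here is a rough sketch of the consumer side, assuming the top-level CMake script computes the smallest entry of CMAKE_CUDA_ARCHITECTURES and injects it, e.g. via target_compile_definitions(cuco INTERFACE CUCO_MINIMUM_CUDA_ARCH=700). Both the macro name and the CMake wiring are hypothetical, not something the build defines today:

    // Hypothetical: CUCO_MINIMUM_CUDA_ARCH is defined by the build system, so it
    // has the same value in the host pass and in every device pass.
    #if defined(CUCO_MINIMUM_CUDA_ARCH) && (CUCO_MINIMUM_CUDA_ARCH >= 700)
    #define CUCO_HAS_INDEPENDENT_THREADS
    #endif

    template <typename Key, typename Value>
    class static_map {
      // The host-evaluated assertion is now consistent with the requested
      // targets: it only fires if some architecture below sm_70 was requested.
    #if !defined(CUCO_HAS_INDEPENDENT_THREADS)
      static_assert(sizeof(Key) <= 8,
                    "A key type larger than 8B is supported for only sm_70 and up.");
    #endif
    };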

We don't want to disable the independent thread support for all archs just because sm_60 is included in the arch list.

It should only be disabled when actually trying to run on sm_60.

Yeah, that would be the optimal solution. @PointKernel and I have racked our brains over this but haven't found a way to make this work yet.

The problem is that using __CUDA_ARCH__ in a conditional that affects host code is undefined behavior. However, this does not apply if the condition evaluates to the same value for all architectures, which would be the case if we used the minimum provided architecture.
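
To illustrate the distinction (reusing the hypothetical CUCO_MINIMUM_CUDA_ARCH macro from the sketch above):

    // One build compiling for sm_60 and sm_80:
    //   device pass for sm_60: __CUDA_ARCH__ == 600
    //   device pass for sm_80: __CUDA_ARCH__ == 800
    //   host pass:             __CUDA_ARCH__ is undefined
    //
    // A declaration that changes with __CUDA_ARCH__ thus differs between passes,
    // which is why branching on it in host-visible code is undefined behavior.
    // A value derived from the minimum requested architecture is identical in
    // every pass, so host code may branch on it safely:
    #if defined(CUCO_MINIMUM_CUDA_ARCH) && (CUCO_MINIMUM_CUDA_ARCH >= 700)
    inline constexpr bool min_arch_has_independent_threads = true;
    #else
    inline constexpr bool min_arch_has_independent_threads = false;
    #endif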

The second idea would be to move the check into device code and evaluate it at runtime.

Edit: not possible since libcu++ will throw an error at compile time. Meh, I'm going in circles here.

Temporarily unblocked by reverting #192