NVIDIA / cuCollections

[BUG] Target architecture macros are broken

sleeepyjack opened this issue · comments

Describe the bug
(Originally found by @PointKernel)
Related: #143 #192 #193

The CUCO_HAS_INDEPENDENT_THREADS macro does not work as expected.
Same goes for all other macros that rely on __CUDA_ARCH__.

Steps/Code to reproduce bug
Compiling with -DCMAKE_CUDA_ARCHITECTURES=XX, where XX >= 70, produces the following static assertion error:

cuCollections/include/cuco/static_map.cuh(157): error: static assertion failed with "A key type larger than 8B is supported for only sm_70 and up."

which originates from static_map/custom_type_example.cu.
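For context, the failure mode looks roughly like the following simplified sketch (paraphrased, not the exact cuco source). __CUDA_ARCH__ is only defined during device compilation passes, so a feature macro derived from it is never set when the compiler processes host code:

    // Simplified sketch of the problematic pattern (not the actual cuco source):
    // the feature macro is derived from __CUDA_ARCH__ ...
    #if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 700)
    #define CUCO_HAS_INDEPENDENT_THREADS
    #endif

    template <typename Key, typename Value>
    class static_map {
      // ... but this assertion is also evaluated in the host pass, where
      // __CUDA_ARCH__ is undefined. The feature macro is therefore never set
      // there, and instantiating the map with a large key fails even when
      // every requested target is sm_70 or newer.
    #if !defined(CUCO_HAS_INDEPENDENT_THREADS)
      static_assert(sizeof(Key) <= 8,
                    "A key type larger than 8B is supported for only sm_70 and up.");
    #endif
    };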

Additional context
It seems like this issue didn't come up earlier because the default set of targets we compile for includes pre-Volta architectures. Thus, the independent threads feature is permanently disabled and so is the large type support.

Expected behavior
We need a host-side mechanism for determining if a particular feature is valid for all target architectures. However, with the current default targets, this would still disable large types.

I would suggest tagging this as P0

Erratum: I got bamboozled by the rapids-cmake scripts. By default, we're compiling for native instead of all architectures (including Pascal).

Erratum of the erratum: it turns out we were in fact compiling for all architectures (including Pascal) without knowing it. The rapids-cmake script has some odd logic: if CMAKE_CUDA_ARCHITECTURES is undefined, it defaults to all architectures, but CMAKE_CUDA_ARCHITECTURES="" defaults to native.

FYI, reverting #192 can solve the issue; i.e., large types are supported when compiling with cmake ../ -DCMAKE_CUDA_ARCHITECTURES=75 on my local system.

I came up with a battle plan to get this problem out of the way once and for all.

  1. We need a host-defined macro that gives us the minimum target architecture specified in the compile command. My first idea was to use __CUDA_MINIMUM_ARCH__ for this, but apparently that only works with nvc++. As a fallback, I'd like to resurrect the idea of defining this macro externally in the top-level CMake script, similar to what @robertmaynard suggested here: #192 (comment). (See the sketch after this list.)

  2. As addressed in the initial post, we currently also compile for sm_60 by default. Combined with 1., this would disable large-type support entirely. I propose switching to a native build instead. This can be done by explicitly setting CMAKE_CUDA_ARCHITECTURES="native" before the rapids-cmake init script runs, in case the variable has not been explicitly set via the command line.
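
To make 1. concrete, here is a rough sketch of the consumer side, assuming the top-level CMake script computes the smallest entry of CMAKE_CUDA_ARCHITECTURES and injects it, e.g. via target_compile_definitions(cuco INTERFACE CUCO_MINIMUM_CUDA_ARCH=700). Both the macro name and the CMake wiring are hypothetical, not something the build defines today:

    // Hypothetical: CUCO_MINIMUM_CUDA_ARCH is defined by the build system, so it
    // has the same value in the host pass and in every device pass.
    #if defined(CUCO_MINIMUM_CUDA_ARCH) && (CUCO_MINIMUM_CUDA_ARCH >= 700)
    #define CUCO_HAS_INDEPENDENT_THREADS
    #endif

    template <typename Key, typename Value>
    class static_map {
      // The host-evaluated assertion is now consistent with the requested
      // targets: it only fires if some architecture below sm_70 was requested.
    #if !defined(CUCO_HAS_INDEPENDENT_THREADS)
      static_assert(sizeof(Key) <= 8,
                    "A key type larger than 8B is supported for only sm_70 and up.");
    #endif
    };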

We don't want to disable the independent thread support for all archs just because sm_60 is included in the arch list.

It should only be disabled when actually trying to run on sm_60.

Yeah, that would be the optimal solution. @PointKernel and I have racked our brains over this but haven't found a way to make this work yet.

The problem is that using __CUDA_ARCH__ in a conditional that affects host code is undefined behavior. However, this does not apply if the condition evaluates to the same value for all architectures, which would be the case if we used the minimum provided architecture.
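
To illustrate the distinction (reusing the hypothetical CUCO_MINIMUM_CUDA_ARCH macro from the sketch above):

    // One build compiling for sm_60 and sm_80:
    //   device pass for sm_60: __CUDA_ARCH__ == 600
    //   device pass for sm_80: __CUDA_ARCH__ == 800
    //   host pass:             __CUDA_ARCH__ is undefined
    //
    // A declaration that changes with __CUDA_ARCH__ thus differs between passes,
    // which is why branching on it in host-visible code is undefined behavior.
    // A value derived from the minimum requested architecture is identical in
    // every pass, so host code may branch on it safely:
    #if defined(CUCO_MINIMUM_CUDA_ARCH) && (CUCO_MINIMUM_CUDA_ARCH >= 700)
    inline constexpr bool min_arch_has_independent_threads = true;
    #else
    inline constexpr bool min_arch_has_independent_threads = false;
    #endif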

The second idea would be to move the check into device code and evaluate it at runtime.

Edit: not possible since libcu++ will throw an error at compile time. Meh, I'm going in circles here.

Temporarily unblocked by reverting #192