Unresolved extern function 'cudaLaunchDevice' error while using NVCC 11.x and cub 2.10 with -G
lilohuang opened this issue · comments
Hi @senior-zero @allisonvacanti ,
I hope you are doing well. Today, I encountered a CUDA compilation failure, specifically an "Unresolved extern function 'cudaLaunchDevice'," while using CUDA 11.6.2 SDK in conjunction with thrust/cub 2.1.0 or trunk versions. This issue arose after enabling the device code debugging information (-G). It is worth noting that there are no problems with the default thrust/cub version included in CUDA SDK 11.6.2. Furthermore, I have observed that this particular compiler error does not exist in the CUDA 12.x compiler.
AFAIK, thrust/cub 2.1.0 should be compatible with CUDA 11.0 and above (i.e., https://github.com/NVIDIA/cub#supported-compilers). I have also uploaded a minimal bug reproducer to https://cuda.godbolt.org/z/7j39EToxh. Could you please take a look at it and provide guidance on how to resolve the bug using CUDA SDK 11.6.2 and thrust/cub 2.1.0?
#include <thrust/device_vector.h>
#include <cub/cub.cuh>
static_assert(THRUST_VERSION >= 200100, "thrust version must be >= 2.10");
int main() {
constexpr const auto num_items = 10;
thrust::device_vector<int> d_in(num_items, 1);
thrust::device_vector<int> d_out(num_items);
size_t temp_storage_bytes = 0;
cub::DeviceScan::ExclusiveSum(
nullptr,
temp_storage_bytes,
thrust::raw_pointer_cast(d_in.data()),
thrust::raw_pointer_cast(d_out.data()),
num_items);
thrust::device_vector<char> d_temp_storage(temp_storage_bytes);
cub::DeviceScan::ExclusiveSum(
thrust::raw_pointer_cast(d_temp_storage.data()),
temp_storage_bytes,
thrust::raw_pointer_cast(d_in.data()),
thrust::raw_pointer_cast(d_out.data()),
num_items);
for (auto value : d_out) {
std::cout << value << std::endl;
}
return 0;
}
Hello @lilohuang and thank you for reporting the issue! A bit simpler reproducer with the same behavior:
#include <nv/target>
template <class T> __device__ int device(T);
template <class T>
__device__ int doit_device(T val) {
return device(val);
}
template <class T>
__host__ int doit_host(T val) {
return 0;
}
template <class T>
int doit(T val)
{
NV_IF_TARGET(NV_IS_HOST,
(return doit_host(val);),
(return doit_device(val);));
}
int main() {
return doit(42);
}
It seems that we have to guard the doit_device
in Thrust to be something along the following lines:
template<class K, class... Args>
cudaError_t __device__
doit_device(K k, Args const&... args) const
{
#ifdef THRUST_RDC_ENABLED
const size_t size = argument_pack_size(0,args...);
void *param_buffer = cudaGetParameterBuffer(64,size);
fill_arguments((char*)param_buffer, 0, args...);
return launch_device(k, param_buffer);
#endif
}
Awesome! Thank you @senior-zero. It worked! 💯 Please close this issue after merging NVIDIA/thrust#1939
Merged NVIDIA/thrust#1939