Is there a way to call CUB's DeviceScan functionality directly?

Question

Ruberik opened this issue 4 years ago

Bartholomew Furrow commented 4 years ago

Based on the performance figures at https://nvlabs.github.io/cub/structcub_1_1_device_scan.html vs. https://nvlabs.github.io/cub/structcub_1_1_device_reduce.html, Nvidia's DeviceScan appears to get about 40% of DeviceReduce's speed. On my graphics card, using ILGPU, I'm seeing Scan get about 1/6 of Reduce's speed. Is a library used for ILGPU's Scan, or is it your own code? Is CUB's DeviceScan something you'd consider making available if it isn't already?

MoFtZ · Answer 1 · Sat Dec 05 2020 13:19:44 GMT+0800 (China Standard Time)

@Ruberik CUB is a C++ library that is compiled with Nvidia's compiler, nvcc, for a specified target architecture, e.g. sm_61.

ILGPU is more portable: it detects the architecture at runtime and generates suitable PTX instructions. Please note that ILGPU's scan implementation is not as optimized as CUB's.

If you want to use CUB in your own project, it should already be possible to integrate the library with .NET. However, this is an advanced scenario.

You would need to create a C++ DLL that exports functions that .NET can import. You could then use marshalling to convert your data in .NET to an equivalent representation in C++.
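As a rough illustration of the export side, here is a minimal sketch of a function with a plain C ABI. The names `scan_inclusive_sum_i32` and `SCAN_API` are hypothetical, chosen only for this example. To keep the sketch compilable without CUDA, the body computes the inclusive sum on the CPU with `std::partial_sum` as a stand-in. In a real wrapper built with nvcc, the body would instead copy the input to the device and call `cub::DeviceScan::InclusiveSum`.

```cpp
// Sketch of a C-ABI export that .NET could import.
// ASSUMPTIONS: scan_inclusive_sum_i32 and SCAN_API are hypothetical names.
// The body is a CPU stand-in, not the CUB implementation.
#include <cstdint>
#include <numeric>

// Export the symbol with C linkage so its name is not mangled.
#if defined(_WIN32)
#  define SCAN_API extern "C" __declspec(dllexport)
#else
#  define SCAN_API extern "C" __attribute__((visibility("default")))
#endif

// Writes output[i] = input[0] + ... + input[i] for i in [0, count).
// Returns 0 on success and -1 on invalid arguments.
// Raw pointers and fixed-width integers keep the marshalling simple.
SCAN_API int32_t scan_inclusive_sum_i32(const int32_t* input,
                                        int32_t* output,
                                        int32_t count) {
    if (count < 0) {
        return -1;
    }
    if (count > 0 && (input == nullptr || output == nullptr)) {
        return -1;
    }
    // CPU stand-in for the GPU scan; a CUB-based build would replace this line.
    std::partial_sum(input, input + count, output);
    return 0;
}
```

On the .NET side, a function exported like this would typically be declared with the `DllImport` attribute and called with an `int[]` for each buffer. The library name and any error handling depend on how you build and ship the DLL.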

Please note that if you plan to target multiple operating systems, or different GPUs, you will need to deal with specialised DLLs for each OS and PTX architecture.

Bartholomew Furrow · Answer 2 · Sun Dec 06 2020 00:33:38 GMT+0800 (China Standard Time)

Hey, I just want to say thanks for the detailed and helpful response on this, and on my other posts. I really appreciate the help, and the guidance on how to proceed. I'll look into your suggestion on Monday.