m4rs-mt / ILGPU.Algorithms

The new standard algorithms library for ILGPU

Is there a way to call Cub's DeviceScan functionality directly?

Ruberik opened this issue

Based on https://nvlabs.github.io/cub/structcub_1_1_device_scan.html vs. https://nvlabs.github.io/cub/structcub_1_1_device_reduce.html, Nvidia's DeviceScan appears to get about 40% of DeviceReduce's speed. On my graphics card, using ILGPU, I'm seeing Scan get about 1/6 of Reduce's speed. Is a library used for ILGPU's Scan, or is it your own code? Is Cub's DeviceScan something you'd consider making available if it isn't already?

commented

@Ruberik CUB is a C++ library that is compiled with Nvidia's nvcc compiler for a target architecture chosen at build time, e.g. sm_75.

ILGPU is more portable: it detects the architecture at runtime and generates suitable PTX instructions for it. Please note that the ILGPU implementation of scan is not as optimized as the CUB implementation.

If you want to use CUB in your own project, it should already be possible to integrate the library with .NET. However, this is an advanced scenario.

You would need to create a C++ DLL that exported functions that could be imported by .NET. You could then perform marshalling to move your data in .NET to an equivalent representation in C++.
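As a rough sketch of what such an exported function could look like (not a tested CUB integration): the file name, the `SCAN_API` macro and the `scan_inclusive_sum_i32` entry point are all hypothetical names made up for illustration. To keep the example runnable without a GPU toolchain, the body uses a CPU prefix sum as a stand-in. In a real wrapper compiled with nvcc, that is where you would copy the data to device memory and call `cub::DeviceScan::InclusiveSum`.

```cpp
// scan_export.cpp: hypothetical native wrapper, built as a shared library.
#include <cstdint>
#include <numeric>

// Export macro so the same source builds as a DLL on Windows or a .so elsewhere.
#if defined(_WIN32)
#  define SCAN_API extern "C" __declspec(dllexport)
#else
#  define SCAN_API extern "C" __attribute__((visibility("default")))
#endif

// Hypothetical entry point. extern "C" turns off C++ name mangling, so .NET can
// look the function up by its plain name. Only plain pointers and fixed-size
// integers cross the boundary, which keeps marshalling simple.
// Returns 0 on success and -1 on invalid arguments.
SCAN_API int32_t scan_inclusive_sum_i32(const int32_t* input,
                                        int32_t* output,
                                        int32_t count)
{
    if (count < 0 || (count > 0 && (input == nullptr || output == nullptr)))
        return -1;

    // CPU stand-in for the GPU work (assumption: a real wrapper would instead
    // allocate device buffers, call cub::DeviceScan::InclusiveSum once with a
    // null temp-storage pointer to query the scratch size, call it again to
    // run the scan, and copy the result back to `output`).
    std::partial_sum(input, input + count, output);
    return 0;
}
```

On the .NET side, a matching declaration would be along the lines of `[DllImport("scan_export")] static extern int scan_inclusive_sum_i32(int[] input, int[] output, int count);`, assuming the library is built under that name and placed where the runtime can find it.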

Please note that if you plan to target multiple operating systems, or different GPUs, you will need to deal with specialised DLLs for each OS and PTX architecture.

Hey, I just want to say thanks for the detailed and helpful response on this, and on my other posts. I really appreciate the help, and the guidance on how to proceed. I'll look into your suggestion on Monday.