What is the correct way to launch multiple kernels in a queue?

Question

l3utterfly opened this issue a year ago

I have several kernels that need to be launched one after the other.

From the documentation of the launch function, it appears that the launch is asynchronous, which would mean that if I launch multiple kernels, they aren't guaranteed to be executed in order. I also need to wait for the previous kernel to finish before launching the next one, because the next kernel relies on the data mutated by the previous one.

Note that I cannot combine all of this into one kernel, because each launch has a different thread block configuration. There is also no need to reclaim any data or copy anything back to the CPU between my kernel calls.

Max Obreiter · Answer 1 · Sat Apr 22 2023 12:32:54 GMT+0800 (China Standard Time)

I am not 100% sure, but I am pretty sure that async just means the kernel isn't necessarily executed at the moment you launch it. Every launch is added to a stream (which works like a queue), so the execution order is still maintained and the kernels run in the order you launched them. If you really want to wait after each kernel call, you could synchronize between kernels, just to be sure there is no problem.
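The ordering described above can be sketched in plain CUDA C++ rather than through the Rust API this issue is about. This is only an illustration under stated assumptions: the kernels `add_one` and `times_two` and the helper `run_in_order` are hypothetical names, and error checking is omitted for brevity. Both kernels go onto the default stream with different launch configurations and no synchronization in between; the second kernel still sees the first kernel's writes.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical first kernel: adds 1 to every element.
__global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical second kernel: doubles every element written by add_one.
__global__ void times_two(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Launches both kernels back to back on the default stream.
// Error checking is omitted to keep the sketch short.
std::vector<float> run_in_order(int n) {
    std::vector<float> host(n, 1.0f);
    size_t bytes = n * sizeof(float);
    float* dev = nullptr;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    // Different block sizes per launch, no synchronization in between.
    // Both launches return to the host immediately, but the stream
    // runs times_two only after add_one has finished.
    add_one<<<(n + 255) / 256, 256>>>(dev, n);
    times_two<<<(n + 63) / 64, 64>>>(dev, n);

    // Wait once, at the point where the host needs the result.
    cudaDeviceSynchronize();
    cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

Calling `run_in_order(1000)` should return a vector in which every element is 4 (start at 1, add 1, then double), which would not hold if the second kernel could overtake the first.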

Corey Lowman · Answer 2 · Sun Apr 23 2023 02:23:11 GMT+0800 (China Standard Time)

^ Exactly right: launches are asynchronous with respect to the host, but the kernels are executed in the order they were launched.

The only case where this isn't true is when you are using CudaStream/launch_on_stream, in which case you have to manually synchronize between streams.
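For the multi-stream case, here is a rough CUDA C++ sketch of what manual synchronization between streams can look like at the runtime-API level. It does not use the Rust CudaStream/launch_on_stream API; the kernel names, stream names, and the helper `run_across_streams` are hypothetical, and error checking is omitted. An event recorded on the first stream makes the second stream wait without blocking the host.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical producer kernel: adds 1 to every element.
__global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical consumer kernel: doubles every element.
__global__ void times_two(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Runs the two kernels on two different streams, ordered by an event.
// Error checking is omitted to keep the sketch short.
std::vector<float> run_across_streams(int n) {
    std::vector<float> host(n, 1.0f);
    size_t bytes = n * sizeof(float);
    float* dev = nullptr;
    cudaMalloc(&dev, bytes);

    cudaStream_t producer, consumer;
    cudaStreamCreate(&producer);
    cudaStreamCreate(&consumer);
    cudaEvent_t produced;
    cudaEventCreate(&produced);

    // Upload and first kernel are ordered because they share a stream.
    cudaMemcpyAsync(dev, host.data(), bytes, cudaMemcpyHostToDevice, producer);
    add_one<<<(n + 255) / 256, 256, 0, producer>>>(dev, n);

    // Without these two calls, times_two could start before add_one ends.
    cudaEventRecord(produced, producer);
    cudaStreamWaitEvent(consumer, produced, 0);

    times_two<<<(n + 63) / 64, 64, 0, consumer>>>(dev, n);
    cudaMemcpyAsync(host.data(), dev, bytes, cudaMemcpyDeviceToHost, consumer);

    // Block the host only once, when the result is needed.
    cudaStreamSynchronize(consumer);

    cudaEventDestroy(produced);
    cudaStreamDestroy(producer);
    cudaStreamDestroy(consumer);
    cudaFree(dev);
    return host;
}
```

Using an event keeps the host free while the GPU enforces the dependency; a blunter alternative is to call `cudaStreamSynchronize(producer)` on the host before launching on the second stream.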

I see this is not really clarified in the current docstring for LaunchAsync, so we should add that.