Can CuFFT or CuFFTW be used inside a kernel?
xxl-cc opened this issue
I would like to replace the Fourier transform I wrote inside my kernel with CuFFT or CuFFTW to compare execution efficiency. Can they be called inside a kernel? I need to process a large amount of data inside the kernel and apply Fourier-transform filtering, but the provided sample only calls CuFFT directly, not from inside a kernel.
My example:
```csharp
public static void FilterProjection(
    Index1D index,
    SpecializedValue<int> filterFactorCount,
    Int2 projectionDim,
    ArrayView<Complex> fftWn,       // element type assumed
    ArrayView<Complex> ifftWn,      // element type assumed
    ArrayView<float> filterFactor,  // element type assumed
    ArrayView<float> oneproj,
    ArrayView<float> filteredproj)
{
    int i = index.X;
    // Copy column i of the projection and zero-pad it to filterFactorCount samples.
    var xdata = new Complex[filterFactorCount];
    for (int j = 0; j < filterFactorCount; j++)
    {
        if (j < projectionDim.Y)
        {
            xdata[j] = new Complex(oneproj[j * projectionDim.X + i], 0.0f);
        }
        else
        {
            xdata[j] = new Complex(0.0f, 0.0f);
        }
    }
    FTransfrom.FFT(xdata, fftWn, filterFactorCount);   // Can CuFFT be used here instead?
    // Apply the filter in the frequency domain.
    for (int j = 0; j < filterFactorCount; j++)
    {
        xdata[j] *= filterFactor[j];
    }
    FTransfrom.IFFT(xdata, ifftWn, filterFactorCount); // Can CuFFT be used here instead?
    // Write the real part back for the original (unpadded) rows only.
    for (int j = 0; j < projectionDim.Y; j++)
    {
        filteredproj[j * projectionDim.X + i] = (float)xdata[j].Real;
    }
}
```
Hi @xxl-cc, unfortunately no, it is not possible to call the CuFFT or CuFFTW functions inside a GPU kernel. They are host-side functions that themselves launch one or more GPU kernels to perform the calculations.
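Because CuFFT is called from the host, one way to restructure a kernel like `FilterProjection` is to split it into separate launches around the library call. The buffers stay in GPU memory between launches, so only the final result has to be copied back. The sketch below assumes the ILGPU 1.x API (`Context.CreateDefault`, `LoadAutoGroupedStreamKernel`, `Allocate1D`, `GetAsArray1D`). `FilterStages`, `PadColumns`, `UnpadColumns` and `Run` are hypothetical names. The CuFFT calls are intentionally left as a comment rather than guessed, because the plan setup and complex data layout depend on the binding.

```csharp
using System;
using ILGPU;
using ILGPU.Runtime;

// Hypothetical split of FilterProjection into device-side stages, so a
// host-side FFT library call can run between kernel launches.
public static class FilterStages
{
    // Stage 1: copy column i of the row-major (width x height) projection into
    // row i of a batch buffer with width rows of paddedLength samples each,
    // zero-padding beyond height.
    public static void PadColumns(Index1D index, ArrayView<float> oneproj,
        ArrayView<float> padded, int width, int height, int paddedLength)
    {
        int i = index.X;
        for (int j = 0; j < paddedLength; j++)
            padded[i * paddedLength + j] = j < height ? oneproj[j * width + i] : 0.0f;
    }

    // Stage 3: copy row i of the batch buffer back into column i,
    // dropping the padding.
    public static void UnpadColumns(Index1D index, ArrayView<float> padded,
        ArrayView<float> filteredproj, int width, int height, int paddedLength)
    {
        int i = index.X;
        for (int j = 0; j < height; j++)
            filteredproj[j * width + i] = padded[i * paddedLength + j];
    }

    public static float[] Run(Accelerator accelerator, float[] proj,
        int width, int height, int paddedLength)
    {
        var pad = accelerator.LoadAutoGroupedStreamKernel<
            Index1D, ArrayView<float>, ArrayView<float>, int, int, int>(PadColumns);
        var unpad = accelerator.LoadAutoGroupedStreamKernel<
            Index1D, ArrayView<float>, ArrayView<float>, int, int, int>(UnpadColumns);

        using var projBuffer = accelerator.Allocate1D(proj);
        using var batch = accelerator.Allocate1D<float>((long)width * paddedLength);
        using var result = accelerator.Allocate1D<float>((long)width * height);

        // One thread per projection column.
        pad(width, projBuffer.View, batch.View, width, height, paddedLength);

        // Stage 2 (not shown): batched forward FFT, multiply by the filter,
        // batched inverse FFT, all issued from the host on device buffers.

        unpad(width, batch.View, result.View, width, height, paddedLength);
        accelerator.Synchronize();
        return result.GetAsArray1D();
    }
}
```

With stage 2 omitted, the two stages cancel out, which makes a quick sanity check on the CPU accelerator before the FFT calls are added.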
Although I had not come across it before, there appears to be a CuFFTDx library that can be called inside a GPU kernel:
https://docs.nvidia.com/cuda/cufftdx/index.html
It is a header-only C++ library, so you would need to port it to C# to use it from ILGPU.
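Short of porting CuFFTDx, the in-kernel option is a hand-written transform like the `FTransfrom.FFT` in the question. Below is a minimal sketch of an iterative radix-2 Cooley-Tukey FFT in plain C# using `System.Numerics.Complex`, with precomputed twiddle factors passed in the same way as `fftWn`/`ifftWn`. `RadixFft`, `Twiddles`, `Transform` and `Inverse` are hypothetical names, and the length is assumed to be a power of two. The sketch uses only loops and indexing, with no recursion. Whether it compiles unchanged inside an ILGPU kernel, for example with `System.Numerics.Complex` and the twiddles supplied as an `ArrayView`, has not been verified.

```csharp
using System;
using System.Numerics;

// Hypothetical helper; not the FTransfrom class from the question.
public static class RadixFft
{
    // Twiddle table w[k] = exp(sign * 2*pi*i*k/n) for k < n/2.
    // sign = -1 for the forward transform, +1 for the inverse.
    public static Complex[] Twiddles(int n, int sign)
    {
        var w = new Complex[n / 2];
        for (int k = 0; k < n / 2; k++)
        {
            double angle = sign * 2.0 * Math.PI * k / n;
            w[k] = new Complex(Math.Cos(angle), Math.Sin(angle));
        }
        return w;
    }

    // In-place iterative radix-2 transform; n must be a power of two.
    public static void Transform(Complex[] data, Complex[] wn, int n)
    {
        // Reorder the input into bit-reversed index order.
        for (int i = 1, j = 0; i < n; i++)
        {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1)
                j ^= bit;
            j ^= bit;
            if (i < j)
            {
                Complex tmp = data[i];
                data[i] = data[j];
                data[j] = tmp;
            }
        }
        // Butterfly passes; len is the size of the sub-transforms being merged.
        for (int len = 2; len <= n; len <<= 1)
        {
            int half = len >> 1;
            int step = n / len; // stride into the length-n twiddle table
            for (int start = 0; start < n; start += len)
            {
                for (int k = 0; k < half; k++)
                {
                    Complex u = data[start + k];
                    Complex v = data[start + k + half] * wn[k * step];
                    data[start + k] = u + v;
                    data[start + k + half] = u - v;
                }
            }
        }
    }

    // Inverse transform: run with the +1 twiddles, then scale by 1/n.
    public static void Inverse(Complex[] data, Complex[] iwn, int n)
    {
        Transform(data, iwn, n);
        for (int i = 0; i < n; i++)
            data[i] /= n;
    }
}
```

For the filtering in the question, the sequence would be `Transform` with the `-1` twiddles, a per-element multiply by the filter factors, then `Inverse` with the `+1` twiddles. This assumes `filterFactorCount` is a power of two, which the question does not state.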
Thank you for your reply. Are all the CUDA libraries wrapped by ILGPU currently unusable inside a kernel? Frequent data exchange with the host can significantly reduce performance. We hope the ILGPU.Algorithms library becomes more and more comprehensive, reducing repetitive coding work.
From ILGPU.Algorithms, the Grid, Group, Vector and Warp extensions can be used within a kernel. None of the "bigger" functions, such as RadixSort, can be used inside a GPU kernel: like the CuFFT functions, they are invoked from the host and launch kernels of their own.
The CUDA library bindings (CuRand, CuBlas, CuFFT, etc.) cannot be used within a GPU kernel either. The same restriction applies to these libraries in C++.
There is also the LibDevice library, which can be used inside a GPU kernel; however, it currently does not support non-CUDA accelerators, including the CPU accelerator used for debugging.