eyalroz / gpu-kernel-runner

Runs a single CUDA/OpenCL kernel, taking its source from a file and arguments from the command-line


Implement CUDA <-> OpenCL compatibility support for half-precision types

eyalroz opened this issue

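OpenCL offers a half-precision type, half, along with vectorized versions: half2, half4 and so on (the OpenCL C specification also defines half3, half8 and half16, whose full use requires the cl_khr_fp16 extension). CUDA has half and half2, but no half4. On the other hand, CUDA also offers the bfloat16 types __nv_bfloat16 and __nv_bfloat162.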

Let's make the half2 and half4 syntax acceptable to IDEs, for both CUDA and OpenCL kernels, and let's implement 4-element vector versions of half and __nv_bfloat16 for CUDA.
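A minimal sketch of what the CUDA-side 4-vector types could look like, assuming a plain struct template is acceptable. The names vec4, make_vec4, half4, bfloat16_4 and VEC4_HD are hypothetical. They are not existing CUDA or gpu-kernel-runner API. The alignment mirrors OpenCL, where a 4-component vector type is aligned to its total size (8 bytes for half4). When the header is not compiled by nvcc, the CUDA parts are skipped, so the template can be exercised in plain C++ with float.

```cpp
// Hypothetical compatibility header: 4-element vectors for CUDA half/bfloat16.
#if defined(__CUDACC__)
#include <cuda_fp16.h>  // __half (typedef'd as half), __half2
#include <cuda_bf16.h>  // __nv_bfloat16, __nv_bfloat162
#define VEC4_HD __host__ __device__
#else
#define VEC4_HD  // plain C++ build: no CUDA function qualifiers
#endif

// Four consecutive elements, aligned to their combined size,
// as OpenCL does for its built-in 4-component vector types.
template <typename T>
struct alignas(4 * sizeof(T)) vec4 {
    T x, y, z, w;
};

// Analogous to CUDA's make_float4() and friends.
template <typename T>
VEC4_HD inline vec4<T> make_vec4(T x, T y, T z, T w) {
    return vec4<T>{x, y, z, w};
}

// Element-wise arithmetic. static_cast avoids narrowing errors for
// types whose '+' or '*' promotes to a wider type.
template <typename T>
VEC4_HD inline vec4<T> operator+(const vec4<T>& a, const vec4<T>& b) {
    return vec4<T>{static_cast<T>(a.x + b.x), static_cast<T>(a.y + b.y),
                   static_cast<T>(a.z + b.z), static_cast<T>(a.w + b.w)};
}

template <typename T>
VEC4_HD inline vec4<T> operator*(const vec4<T>& a, const vec4<T>& b) {
    return vec4<T>{static_cast<T>(a.x * b.x), static_cast<T>(a.y * b.y),
                   static_cast<T>(a.z * b.z), static_cast<T>(a.w * b.w)};
}

// No padding is expected between or after the four elements.
static_assert(sizeof(vec4<float>) == 4 * sizeof(float), "unexpected padding");

#if defined(__CUDACC__)
// Hypothetical names for the missing CUDA 4-vectors.
using half4 = vec4<__half>;               // 8 bytes, 8-byte aligned
using bfloat16_4 = vec4<__nv_bfloat16>;   // 8 bytes, 8-byte aligned
#endif
```

Arithmetic on the half4 and bfloat16_4 instantiations relies on the operators that cuda_fp16.h and cuda_bf16.h provide for the scalar types. Depending on the CUDA version and target architecture, those operators may be available only in device code, so this is a starting point rather than a drop-in implementation.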