Implement CUDA <-> OpenCL compatibility support for half-precision types

Question

eyalroz opened this issue a year ago

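OpenCL offers a half-precision type, half, with vectorized versions: half2, half4 and wider ones such as half8 (device-side arithmetic on these requires the cl_khr_fp16 extension). In CUDA, we have half and half2 (from cuda_fp16.h), but no half4. On the other hand, CUDA offers __nv_bfloat16 and __nv_bfloat162 (from cuda_bf16.h), which OpenCL lacks.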

Let's make the half2 and half4 syntax acceptable to IDEs, both with CUDA and with OpenCL. Let's also implement 4-element vectorized versions of half and __nv_bfloat16 for CUDA.
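
To make the proposal concrete, here is a minimal sketch of what such a compatibility header might look like. It is only an illustration under stated assumptions, not the project's actual implementation: the names COMPAT_HD, bfloat16_4, make_half4 and make_bfloat16_4 are hypothetical, and the stand-in types in the non-CUDA branch exist only so that a plain C++ parser (such as an IDE indexer) accepts the code. On the OpenCL side, half4 is already a built-in type, so a header like this would have to be skipped there.

```cpp
#include <cstdint>

#if defined(__CUDACC__)
// Real CUDA build: take half, half2, __nv_bfloat16, __nv_bfloat162 from the toolkit.
#include <cuda_fp16.h>
#include <cuda_bf16.h>
#define COMPAT_HD __host__ __device__   // hypothetical macro name
#else
// Assumption: no CUDA compiler is parsing this file (e.g. an IDE indexer),
// so provide storage-only stand-ins with the same size as the real types.
#define COMPAT_HD
struct half           { std::uint16_t bits; };
struct half2          { half x, y; };
struct __nv_bfloat16  { std::uint16_t bits; };
struct __nv_bfloat162 { __nv_bfloat16 x, y; };
#endif

// Hypothetical 4-element vectors; 8-byte alignment mirrors CUDA's other
// 8-byte vector types and allows a single 64-bit load or store.
struct alignas(8) half4      { half x, y, z, w; };
struct alignas(8) bfloat16_4 { __nv_bfloat16 x, y, z, w; };

static_assert(sizeof(half4) == 8, "half4 should occupy 8 bytes");
static_assert(sizeof(bfloat16_4) == 8, "bfloat16_4 should occupy 8 bytes");

// Factory functions in the style of CUDA's make_float4().
COMPAT_HD inline half4 make_half4(half x, half y, half z, half w)
{
    half4 v;
    v.x = x; v.y = y; v.z = z; v.w = w;
    return v;
}

COMPAT_HD inline bfloat16_4 make_bfloat16_4(
    __nv_bfloat16 x, __nv_bfloat16 y, __nv_bfloat16 z, __nv_bfloat16 w)
{
    bfloat16_4 v;
    v.x = x; v.y = y; v.z = z; v.w = w;
    return v;
}
```

This sketch covers storage and construction only. Element-wise arithmetic could be layered on top by operating on pairs of elements with the existing half2 and __nv_bfloat162 intrinsics, but that is left out here.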