Implement CUDA <-> OpenCL compatibility support for half-precision types
eyalroz opened this issue · comments
Eyal Rozenberg commented
OpenCL offers a half-precision type with vectorized versions: `half`, `half2`, `half4` and possibly `half8`. In CUDA, we have `half` and `half2`, but no `half4`. On the other hand, CUDA offers `__nv_bfloat16` and `__nv_bfloat162`.
Let's make the `half2` and `half4` syntax acceptable for IDEs, both with CUDA and with OpenCL, and let's implement 4-vectorized versions of `half` and `__nv_bfloat16` for CUDA.
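As a rough starting point, the CUDA-side 4-vectorized types could look something like the sketch below. This is only an illustration under assumptions, not a settled design: the names `half4`, `bfloat16_4` and `make_half4` are hypothetical (CUDA itself defines none of them), and the 8-byte alignment is chosen to match OpenCL, where vector types are aligned to their size.

```cuda
#include <cuda_fp16.h>  // __half, __float2half, __half2float
#include <cuda_bf16.h>  // __nv_bfloat16 (CUDA 11 or later)

// Hypothetical CUDA counterpart of OpenCL's half4: four halves packed into
// 8 bytes and aligned to 8 bytes, so that the layout matches OpenCL's.
struct __align__(8) half4 {
    __half x, y, z, w;
};

// Hypothetical bfloat16 analogue; the name bfloat16_4 is made up for this sketch.
struct __align__(8) bfloat16_4 {
    __nv_bfloat16 x, y, z, w;
};

static_assert(sizeof(half4) == 8 && alignof(half4) == 8, "unexpected half4 layout");
static_assert(sizeof(bfloat16_4) == 8 && alignof(bfloat16_4) == 8, "unexpected bfloat16_4 layout");

// Constructor-style helper in the spirit of CUDA's make_float4(); also hypothetical.
__host__ __device__ inline half4 make_half4(float x, float y, float z, float w)
{
    half4 v;
    v.x = __float2half(x);
    v.y = __float2half(y);
    v.z = __float2half(z);
    v.w = __float2half(w);
    return v;
}
```

With something like this in place, `half4 v = make_half4(1.0f, 2.0f, 3.0f, 4.0f);` would read the same way as its `float4` counterpart. An alternative layout would store two `half2` members instead of four scalars, which would make it easier to reuse the paired `half2` intrinsics, at the cost of `.x`/`.y`/`.z`/`.w` access no longer mirroring OpenCL directly.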