Pyclblast float16 scalar conversion
vathomass opened this issue · comments
Hi,
First, thanks for the nice software.
I am facing some issues using the 16-bit floating-point functions of pyclblast. In short, I am getting incorrect results. My driver is Intel's OpenCL HD Graphics on Windows and Linux (through WSL2) platforms.
My take on the problem is that the type cast in the Cython wrapper is incorrect, because it tries to cast floating-point arguments to unsigned 16-bit integers, aka cl_half (see this issue in the Khronos git).
To elaborate, I have written a script (attached) that uses axpy, with the alpha parameter passed to the function either as float or as np.uint16. In the latter case the results are numerically correct. I have done the same with the script from issue #334 (the issue seems related to me), which again returns the correct results on my PC with the alpha argument as np.uint16. (I also ran tests with negative alpha values and the results are as expected.)
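A minimal sketch of why the two ways of passing alpha differ, using only Python's standard struct module rather than pyclblast itself. It assumes the wrapper's cast behaves like a plain numeric cast to an unsigned 16-bit integer; the helper names half_bits and half_from_bits are hypothetical and only serve the illustration.

```python
import struct


def half_bits(value):
    """Return the IEEE 754 binary16 bit pattern of value as an unsigned 16-bit int."""
    return struct.unpack("<H", struct.pack("<e", float(value)))[0]


def half_from_bits(bits):
    """Interpret an unsigned 16-bit int as an IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


alpha = 2.0

# Numeric cast (assumed behaviour of the wrapper): the float's value is
# truncated to an integer and stored in 16 bits.
value_cast = int(alpha) & 0xFFFF

# Bit reinterpretation: the 16 bits hold the half-precision encoding of alpha.
reinterpreted = half_bits(alpha)

# What a device reading those 16 bits as a half-precision number would see.
print(hex(value_cast), half_from_bits(value_cast))
print(hex(reinterpreted), half_from_bits(reinterpreted))
```

Under these assumptions, the numerically cast alpha is read back as a tiny subnormal number instead of 2.0, while the reinterpreted bit pattern decodes to 2.0, which would be consistent with the np.uint16 variant giving correct results.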
My suggestion is to do the proper conversion in the Cython wrapper, so the user does not have to do the conversion manually to use the 16-bit floating-point features. But I am unsure whether all OpenCL implementations have this issue, namely treating cl_half as a 16-bit unsigned integer.
Best,
Thomas
Sorry for my late reply.
Have you looked at the FP16 example here? It uses the float16 NumPy data type, and also uses pyclblast.float32_to_float16 for conversion of the alpha value.
Hi,
Sorry, I had not seen pyclblast.float32_to_float16 because the link to this function in the discussion of #334 returned a 404. My bad, I should have searched the examples and/or the sources a little more. Indeed, running pyclblast.float32_to_float16(val) and numpy.frombuffer(numpy.float16(val).tobytes(), np.uint16)[0] returns the exact same integer value (except in the case of overflow, where the first errors out and the second returns NaN).
Still, this does not address the issue of the improper cast in the Cython wrapper (rather, it implies that the numpy.float16 conversion is the issue). From my point of view, the suitable thing to do would be to perform a reinterpret_cast, but this feature is only available in C++, not in C or Cython. Doing the conversion in the wrapper would also be user-friendly, since Python code that runs with different floating-point types would not need to implement checks and conversions.
Certainly, this is just a suggestion.
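To make the suggestion concrete, here is a rough sketch of the kind of per-dtype handling that user code currently needs and that the wrapper could absorb. It uses only the standard library; scalar_for_dtype is a hypothetical helper, not part of pyclblast, and the choice to raise on overflow is an assumption of this sketch.

```python
import struct


def scalar_for_dtype(value, dtype_name):
    """Hypothetical helper: prepare a scalar for a routine working on dtype_name.

    For "float16" the result is the binary16 bit pattern as an unsigned
    16-bit int; for "float32" and "float64" it is a plain Python float.
    """
    if dtype_name == "float16":
        try:
            packed = struct.pack("<e", float(value))
        except OverflowError as exc:
            # struct refuses values outside the half-precision range.
            raise ValueError(f"{value!r} does not fit in float16") from exc
        return struct.unpack("<H", packed)[0]
    if dtype_name in ("float32", "float64"):
        return float(value)
    raise TypeError(f"unsupported dtype: {dtype_name}")


# The same calling code then works for all three precisions.
for name in ("float16", "float32", "float64"):
    print(name, scalar_for_dtype(1.5, name))
```

With something along these lines inside the wrapper, the caller would pass an ordinary float for alpha regardless of the array precision.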
Thanks again for the nice software.
Best,
Thomas
Thank you for the explanation. I do understand better now. I don't have much time myself to work on such things, but I'm happy to review a pull request from someone else to improve this.
Hi,
I am working on a project and I would like to run the same code with fp16, fp32 and fp64. So, I will implement the conversion in the Cython code for my project. I will open a pull request with the changes in case you are interested.
Just FYI, I have made two other changes to the Cython wrapper. The first is to be able to run the generator script under Windows without messing up the line endings (mostly for convenience). The second is to use an integer-type output argument in the amax and amin routines. This last change is necessary if you want to run these routines with fp16.
For more details see the pull request.
Best,
Thomas