CNugteren / CLBlast

Tuned OpenCL BLAS

Pyclblast float16 scalar conversion

vathomass opened this issue · comments

Hi,
First, thanks for the nice software.
I am facing some issues with the 16-bit floating-point functions of pyclblast. In short, I am getting incorrect results. My driver is Intel's OpenCL HD Graphics, on both Windows and Linux (through WSL2).
My take on the problem is that the type cast in the Cython wrapper is incorrect, because it tries to cast floating-point arguments to unsigned 16-bit integers, aka cl_half (see this issue in the Khronos Git repository).

To elaborate, I have written a script (attached) that uses axpy, with the alpha parameter passed to the function either as a float or as an np.uint16. In the latter case the results are numerically correct. I did the same with the script from issue #334 (which seems related to me), and it again returns the correct results on my PC with the alpha argument as np.uint16. (I also ran tests with negative alpha values, and the results are as expected.)

My suggestion is to do the proper conversion in the Cython wrapper, so that the user does not have to convert manually to use the 16-bit floating-point features. However, I am unsure whether all OpenCL implementations have this issue, namely treating cl_half as a 16-bit unsigned integer.
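To illustrate the distinction drawn above, here is a minimal sketch using only the Python standard library. It assumes, as described in this post, that the wrapper converts the float by value into a 16-bit unsigned integer; the helper names `half_bits` and `bits_as_half` are hypothetical and not part of pyclblast.

```python
import struct

def half_bits(value: float) -> int:
    """Return the IEEE 754 half-precision bit pattern of `value` as an unsigned 16-bit integer."""
    # 'e' packs a Python float as a half-precision float; 'H' reads the same
    # two bytes back as an unsigned 16-bit integer (a bit reinterpretation).
    return struct.unpack("<H", struct.pack("<e", value))[0]

def bits_as_half(bits: int) -> float:
    """Interpret an unsigned 16-bit integer as a half-precision float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

alpha = 2.0
numeric_cast = int(alpha)       # value conversion: 2.0 becomes the integer 2
bit_pattern = half_bits(alpha)  # bit reinterpretation: 2.0 becomes 0x4000

print(numeric_cast, hex(bit_pattern))  # prints "2 0x4000"

# If the device reads those 16 bits as a half-precision float, the value
# conversion yields a tiny subnormal number instead of 2.0.
print(bits_as_half(numeric_cast))  # about 1.19e-07
print(bits_as_half(bit_pattern))   # 2.0
```

Only the bit pattern round-trips to the intended alpha, which would be consistent with the observation that passing np.uint16 gives correct results.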

Best,
Thomas

pyclblast16.zip

Sorry for my late reply.

Have you looked at the FP16 example here? It uses the NumPy float16 data type, and also uses pyclblast.float32_to_float16 to convert the alpha value.

Hi,

Sorry, I had not seen pyclblast.float32_to_float16 because the link to this function in the discussion of #334 returned a 404. My bad, I should have searched the examples and/or the sources a little more. Indeed, pyclblast.float32_to_float16(val) and numpy.frombuffer(numpy.float16(val).tobytes(), np.uint16)[0] return the exact same integer value (except in the case of overflow, where the first raises an error and the second returns NaN).
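For reference, the NumPy expression quoted above can be wrapped in a small function and compared with an equivalent `view`-based form. This is only a sketch: the function names are hypothetical, and pyclblast.float32_to_float16 is deliberately not called here so that the snippet runs with NumPy alone.

```python
import numpy as np

def float16_bits_frombuffer(value):
    """Bit pattern of `value` as float16, using the expression from this thread."""
    return int(np.frombuffer(np.float16(value).tobytes(), np.uint16)[0])

def float16_bits_view(value):
    """Same bit pattern, obtained by viewing a float16 array as uint16."""
    # view() reinterprets the existing bytes without converting the value.
    return int(np.array([value], dtype=np.float16).view(np.uint16)[0])

for val in (0.0, 0.5, 1.0, 1.5, -2.0):
    assert float16_bits_frombuffer(val) == float16_bits_view(val)

print(float16_bits_view(1.0))  # 15360, i.e. 0x3C00
```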

Still, this does not address the improper cast in the Cython wrapper (rather, it implies that the numpy.float16 conversion is the issue). From my point of view, the suitable thing to do would be a reinterpret_cast, but this feature is only available in C++, not in C or Cython. It would also be user-friendly, since Python code that needs to run with different floating-point types would not have to implement checks and conversions.

Certainly, this is just a suggestion.
Thanks again for the nice software.

Best,
Thomas

Thank you for the explanation. I do understand better now. I don't have much time myself to work on such things, but I'm happy to review a pull request from someone else to improve this.

Hi,

I am working on a project in which I would like to run the same code with fp16, fp32 and fp64, so I will implement the conversion in the Cython code for my project. I will open a pull request with the changes in case you are interested.
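Until such a change exists in the wrapper, the per-precision check could live in user code. The sketch below is one possible shape for it, not the contents of the pull request: `scalar_for_precision` and the precision labels are hypothetical, and it assumes that only the fp16 path needs the bit-pattern form of the scalar.

```python
import struct

def scalar_for_precision(value: float, precision: str):
    """Return `value` in the form a scalar argument would need for each precision.

    Assumption: fp16 scalars are passed as their 16-bit pattern, while
    fp32 and fp64 scalars are passed as ordinary Python floats.
    """
    if precision == "fp16":
        # Reinterpret the half-precision bytes as an unsigned 16-bit integer.
        return struct.unpack("<H", struct.pack("<e", value))[0]
    if precision in ("fp32", "fp64"):
        return float(value)
    raise ValueError(f"unsupported precision: {precision!r}")

for precision in ("fp16", "fp32", "fp64"):
    print(precision, scalar_for_precision(1.0, precision))
```

With a helper like this, the calling code stays the same for all three precisions and only the scalar preparation differs.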

Just FYI, I have made two other changes to the Cython wrapper. The first makes it possible to run the generator script under Windows without messing up the line endings (mostly for convenience). The second uses an integer-typed output argument in the amax and amin routines. This last point is necessary if you want to run these routines with fp16.
For more details see the pull request.

Best,
Thomas