Clang optimizing SSE2NEON_PRECISE_MINMAX incorrectly
markreidvfx opened this issue
This might be a bug in clang, but I figured I'd report it here first.
I have a technique I use to clamp NaN values to zero.
It's pretty simple: it exploits the fact that a comparison such as nan > 0.0f always evaluates to false.
#define MIN(a,b) ((a) > (b) ? (b) : (a))
#define MAX(a,b) ((a) > (b) ? (a) : (b))
MIN(amax, MAX(a, amin));
The MAX is done first on purpose.
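To make the trick concrete, here is a minimal scalar sketch using the same macros. The function names clamp_nan_to_min and clamp_self_check are hypothetical, and the example only uses a quiet NaN (the NAN macro from math.h). It also shows that the operand order inside MAX matters: with the NaN as the first operand the comparison fails and the second operand is returned, while swapping the operands returns the NaN.

```c
#include <assert.h>
#include <math.h>

#define MIN(a,b) ((a) > (b) ? (b) : (a))
#define MAX(a,b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: clamp a to [amin, amax], mapping NaN to amin.
 * MAX(a, amin) runs first: NaN > amin is false, so amin is selected. */
float clamp_nan_to_min(float a, float amin, float amax)
{
    return MIN(amax, MAX(a, amin));
}

/* Hypothetical self-check illustrating the behaviour described above. */
void clamp_self_check(void)
{
    assert(clamp_nan_to_min(NAN, 0.0f, 1.0f) == 0.0f);   /* NaN becomes amin */
    assert(clamp_nan_to_min(2.0f, 0.0f, 1.0f) == 1.0f);  /* clamped to amax  */
    assert(clamp_nan_to_min(-3.0f, 0.0f, 1.0f) == 0.0f); /* clamped to amin  */
    assert(clamp_nan_to_min(0.5f, 0.0f, 1.0f) == 0.5f);  /* in range         */

    /* Swapping the operands of MAX lets the NaN through. */
    assert(isnan(MAX(0.0f, NAN)));
}
```

This relies on the compiler honouring IEEE comparison semantics, so it assumes the code is not built with options such as -ffast-math.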
The SSE2 code is this
_mm_min_ps(amax, _mm_max_ps(a, amin));
I'm having issues with clang's optimizer breaking this behaviour, so NaNs still propagate.
The NEON min/max instructions propagate NaNs and the SSE2 ones don't (ish), so I've been defining SSE2NEON_PRECISE_MINMAX 1.
With that defined, the _mm_max_ps intrinsic becomes
vbslq_f32(vcgtq_f32(a, b), a, b);
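For readers without an ARM target at hand, the compare-and-select above can be modelled one lane at a time in portable C. This is only a sketch of the intended semantics, not sse2neon code; the helpers f32_bits, bits_f32, lane_cgt, lane_bsl, lane_max_precise and lane_self_check are all hypothetical names.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern and back. */
uint32_t f32_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
float bits_f32(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* One lane of vcgtq_f32: all ones when a > b, zero otherwise
 * (so zero whenever either operand is NaN). */
uint32_t lane_cgt(float a, float b) { return a > b ? 0xFFFFFFFFu : 0u; }

/* One lane of vbslq_f32(mask, a, b): take bits of a where the mask
 * is set and bits of b where it is clear. */
float lane_bsl(uint32_t mask, float a, float b)
{
    return bits_f32((mask & f32_bits(a)) | (~mask & f32_bits(b)));
}

/* One lane of vbslq_f32(vcgtq_f32(a, b), a, b). */
float lane_max_precise(float a, float b)
{
    return lane_bsl(lane_cgt(a, b), a, b);
}

/* Hypothetical self-check of the model. */
void lane_self_check(void)
{
    assert(lane_max_precise(2.0f, 1.0f) == 2.0f);
    assert(lane_max_precise(1.0f, 2.0f) == 2.0f);
    assert(lane_max_precise(NAN, 0.0f) == 0.0f);  /* NaN in a: b is returned     */
    assert(isnan(lane_max_precise(0.0f, NAN)));   /* NaN in b: b (NaN) returned  */
}
```

In this model the second operand is returned whenever the comparison fails, which is the behaviour the clamp trick depends on.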
This looks perfectly correct to me, but clang is optimizing this to the fmaxnm instruction. The fmaxnm instruction only deals with quiet NaNs; signalling NaNs still propagate. :(
NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result placed in the vector is the numerical value, otherwise the result is identical to FMAX (scalar).
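Since the distinction between quiet and signalling NaNs is central here, the following sketch shows how the two can be told apart from the raw bit pattern. It assumes the usual binary32 encoding in which the most significant mantissa bit (bit 22) is the quiet bit; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* NaN: exponent bits all ones and a non-zero mantissa. */
bool f32_bits_is_nan(uint32_t bits)
{
    return (bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0u;
}

/* Signalling NaN: a NaN whose quiet bit (bit 22) is clear.
 * Assumes the common encoding where a set bit 22 means quiet. */
bool f32_bits_is_signaling_nan(uint32_t bits)
{
    return f32_bits_is_nan(bits) && (bits & 0x00400000u) == 0u;
}

/* Hypothetical self-check on a few well-known bit patterns. */
void nan_bits_self_check(void)
{
    assert(f32_bits_is_nan(0x7FC00000u));            /* default quiet NaN   */
    assert(!f32_bits_is_signaling_nan(0x7FC00000u));
    assert(f32_bits_is_signaling_nan(0x7FA00000u));  /* quiet bit clear     */
    assert(!f32_bits_is_nan(0x7F800000u));           /* +infinity           */
    assert(!f32_bits_is_nan(0x3F800000u));           /* 1.0f                */
}
```

Working on the integer pattern avoids passing a signalling NaN through floating-point operations, which on some platforms may quiet it.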
Here is a small program illustrating this happening
https://godbolt.org/z/eE1G3Gcov
I'm currently working around this by using inline assembly.
Hi @markreidvfx ,
From my point of view, this may be an issue in Clang.
GCC with the -O3 flag uses the instructions fcmgt, and, and bsl.
Here is a small program (modified from your example) for illustration: https://godbolt.org/z/sfrKbx1e8
One more thing: kindly leave a link to the discussion on the Clang forum if possible.
Yes, that's my opinion too, especially since the code works if you compile in debug.
I'll report it to clang and see what they say.
The same thing can also happen with scalar code.
https://godbolt.org/z/d4j9418Kx
I can trick the compiler by subtly changing the clamp function, but who knows how long that will last...
https://godbolt.org/z/rq36Trb4d