Correct _mm_min_ps, _mm_max_ps implementation

Question

syoyo opened this issue 4 years ago · comments

_mm_min_ps and _mm_max_ps cannot be accurately emulated with a single vminq_f32/vmaxq_f32 instruction (nor with vminnmq_f32/vmaxnmq_f32).

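We need special handling when both inputs are zeros or when either input is NaN. In both cases the SSE instructions return the second operand, which the NEON min/max instructions do not do.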

Here is an implementation of vmin/vmax which emulates _mm_min_ps/_mm_max_ps exactly (as far as I've tested).

Jim Huang · Answer 1 · Mon Jul 06 2020 07:17:39 GMT+0800 (China Standard Time)

Thanks, @syoyo, for figuring out an accurate implementation. Can you send a pull request as well?

Syoyo Fujita · Answer 2 · Mon Jul 06 2020 13:31:18 GMT+0800 (China Standard Time)

@jserv It needs some more tests to verify the implementation (you can also write tests). Once it is verified, I plan to send a PR.