Comment regarding _mm_dp_pd

Question

jnettlet opened this issue 3 years ago · comments

This comment was pointed out to me regarding the sse2neon implementation: "There's a small issue with the _mm_dp_pd impl, though: you do the mul first and then mask the result according to imm8[4:5]. That is the opposite order to what the sse41 definition says; the sse41 original will avoid unwanted ops, eg. on NaNs, your impl won't."

Having reviewed https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_dp_pd&expand=2714, that does seem to be the case.

marktwtn · Answer 1 · Sun Sep 12 2021 10:23:36 GMT+0800 (China Standard Time)

The _mm_dp_pd implementation does differ slightly from the description of _mm_dp_pd in the Intel Intrinsics Guide.
So far, I have not found a NEON intrinsic that could replace the current implementation and avoid the unwanted ops.
Hence, to follow the definition of _mm_dp_pd, at least an if-else statement would have to be added, which would introduce an additional branch.

@jnettlet I guess you are concerned about the multiplication of a signaling NaN (SNaN) raising an exception or setting the invalid operation flag?

Jon Nettleton · Answer 2 · Mon Sep 13 2021 14:08:02 GMT+0800 (China Standard Time)

That is the concern, although nothing in the projects I am testing with sse2neon currently triggers it.