DLTcollab / sse2neon

A translator from Intel SSE intrinsics to Arm/Aarch64 NEON implementation

`_mm_shuffle_epi8` bit 7 of idx is not meaningless

MrUnbelievable92 opened this issue

The sse2neon implementation of `_mm_shuffle_epi8` masks away bit 7 (the most significant bit) of each 8-bit integer in the index vector.
On x86, if an index is negative (bit 7 set), 0 is returned for that vector element.
With the current implementation, an index of 1 << 7 (-128 as a signed 8-bit integer) becomes 0 after masking with 0x7F, so the 0th element of the lookup table is returned instead of the value 0.

Hi @MrUnbelievable92, I am not sure which part you are referring to.
The current implementation uses 0x8F as the mask, which is 10001111 in binary.
So the most significant bit (bit 7) is already preserved by the mask.
https://github.com/DLTcollab/sse2neon/blob/master/sse2neon.h#L6449

Could you point to the exact part you mean?
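
To make the mask question concrete, here is a minimal scalar sketch in plain C rather than the actual NEON code. It assumes the masked index is fed to a 16-entry table lookup that yields 0 for any out-of-range index, which is how the AArch64 TBL instruction behaves. All function names (`pshufb_byte`, `tbl_byte`, `masked_tbl_byte`, `count_mismatches`, `self_check`) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reference behaviour of x86 PSHUFB for a single byte:
 * if bit 7 of the index is set the result is 0,
 * otherwise the low 4 bits select a table entry. */
static uint8_t pshufb_byte(const uint8_t table[16], uint8_t idx)
{
    return (idx & 0x80) ? 0 : table[idx & 0x0F];
}

/* Scalar model of a 16-entry table lookup that returns 0 for
 * out-of-range indices (assumed to match AArch64 TBL). */
static uint8_t tbl_byte(const uint8_t table[16], uint8_t idx)
{
    return (idx < 16) ? table[idx] : 0;
}

/* Emulation under test: mask the index first, then do the lookup. */
static uint8_t masked_tbl_byte(const uint8_t table[16], uint8_t idx,
                               uint8_t mask)
{
    return tbl_byte(table, (uint8_t)(idx & mask));
}

/* Count how many of the 256 possible index bytes give a result that
 * differs from the x86 reference for the given mask. */
int count_mismatches(uint8_t mask)
{
    uint8_t table[16];
    for (int i = 0; i < 16; i++)
        table[i] = (uint8_t)(i + 1); /* nonzero, so table[0] != 0 */

    int mismatches = 0;
    for (int idx = 0; idx < 256; idx++) {
        if (masked_tbl_byte(table, (uint8_t)idx, mask) !=
            pshufb_byte(table, (uint8_t)idx))
            mismatches++;
    }
    return mismatches;
}

/* Small self-check: 0x8F keeps bit 7, 0x7F would drop it. */
void self_check(void)
{
    assert(count_mismatches(0x8F) == 0);
    assert(count_mismatches(0x7F) > 0);
}
```

Under these assumptions, a mask of 0x8F keeps bit 7, so any index with that bit set stays out of range and the lookup yields 0, matching x86. A mask of 0x7F would indeed turn an index of 0x80 into 0 and return the 0th table element, which is the behaviour the issue was worried about.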

No, as it turns out I was the one getting confused.
8 != 7 and off-by-1 errors are my specialty ;)
Feel free to close the issue, and sorry for the bother. Thanks for the quick response!

Closing as requested.