DLTcollab / sse2neon

A translator from Intel SSE intrinsics to Arm/Aarch64 NEON implementation

Bug in _mm_storel_epi64

andrewevstyukhin opened this issue

Hi,
the _mm_storel_epi64 intrinsic (the SSE2 instruction movq m64, xmm) stores the low 64 bits of a 128-bit register to memory.

The NEON version first reads the high 64 bits from memory and then writes them back. When the destination is only 8 bytes long, this out-of-bounds access is undefined behavior in C++ and breaks execution.

When porting by hand, I usually replace it with vst1. For example:

alignas(8) uint8_t alphas[8];
_mm_storel_epi64(reinterpret_cast<__m128i*>(alphas), mt);

BTW, the cast triggers PVS-Studio warning V641:
The size of the 'alphas' buffer is not a multiple of the element size of the type '__m128i'
=>

alignas(8) uint8_t alphas[8];
vst1_u8(alphas, mt);

So vst1_u64((uint64_t*)a, vget_low_u64(vreinterpretq_u64_m128i(b))); seems like a better solution.
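
A minimal sketch of how the 8-byte store could be checked, assuming a little test harness that is not part of sse2neon. The names store_low64 and low64_store_leaves_tail_untouched are hypothetical. The helper uses vst1_u64 with vget_low_u64 on NEON, _mm_storel_epi64 on SSE2, and plain memcpy elsewhere, so the same check builds on any of these targets.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#elif defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

// Hypothetical helper: writes lanes[0] (8 bytes) to dst and is meant to
// touch nothing at dst[8] and beyond.
inline void store_low64(uint8_t* dst, const uint64_t lanes[2]) {
    assert(dst != nullptr);
#if defined(HAVE_NEON)
    uint64x2_t v = vld1q_u64(lanes);                 // load both 64-bit lanes
    vst1_u64(reinterpret_cast<uint64_t*>(dst),       // store exactly 8 bytes
             vget_low_u64(v));
#elif defined(HAVE_SSE2)
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(lanes));
    _mm_storel_epi64(reinterpret_cast<__m128i*>(dst), v);
#else
    std::memcpy(dst, lanes, 8);                      // portable fallback
#endif
}

// Fills a 16-byte buffer with a sentinel, stores the low lane into its first
// half, and reports whether the second half still holds the sentinel.
inline bool low64_store_leaves_tail_untouched() {
    alignas(16) uint8_t buf[16];
    std::memset(buf, 0xAA, sizeof buf);
    const uint64_t lanes[2] = {0x0807060504030201ULL, 0x1122334455667788ULL};

    store_low64(buf, lanes);

    if (std::memcmp(buf, &lanes[0], 8) != 0) return false;  // low half written
    for (int i = 8; i < 16; ++i) {
        if (buf[i] != 0xAA) return false;                   // tail overwritten
    }
    return true;
}
```

This check only catches a store that changes the bytes after the first eight. A read-then-write-back of the high half, as described above, leaves those bytes unchanged, so detecting it on a true 8-byte buffer would need a tool such as AddressSanitizer.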

Thanks, @andrewevstyukhin, for pointing this out. Could you send a pull request accordingly?