Problem with _mm_alignr_epi8 and constants
bigianb opened this issue · comments
Ian Brown commented
I'm seeing the following error in my code:
GSVector4i.h:676:21: error: argument value -8 is outside the valid range [0, 15]
return GSVector4i(_mm_alignr_epi8(v.m, m, i));
^~~~~~~~~~~~~~~~~~~~~~~~~~
sse2neon.h:6552:44: note: expanded from macro '_mm_alignr_epi8'
vreinterpretq_m128i_u8(vextq_u8(tmp_low, tmp_high, idx)); \
^ ~~~
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/12.0.5/include/arm_neon.h:6812:24: note: expanded from macro 'vextq_u8'
__ret = (uint8x16_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 48); \
^ ~~~~
sse2neon.h:222:56: note: expanded from macro 'vreinterpretq_m128i_u8'
#define vreinterpretq_m128i_u8(x) vreinterpretq_s64_u8(x)
^
GSBlock.h:1280:12: note: in instantiation of function template specialization 'GSVector4i::srl<8>' requested here
v4 = v5.srl<8>(v6);
^
6 errors generated.
The root cause is compiling the following template with a specialisation of i = 8:
template <int i>
__forceinline GSVector4i srl(const GSVector4i& v)
{
return GSVector4i(_mm_alignr_epi8(v.m, m, i));
}
That should be fine but it barfs on the last line of the snippet below:
#define _mm_alignr_epi8(a, b, imm) \
__extension__({ \
__m128i ret; \
if (_sse2neon_unlikely((imm) >= 32)) { \
ret = _mm_setzero_si128(); \
} else { \
uint8x16_t tmp_low, tmp_high; \
if (imm >= 16) { \
const int idx = imm - 16; \
tmp_low = vreinterpretq_u8_m128i(a); \
tmp_high = vdupq_n_u8(0); \
ret = \
vreinterpretq_m128i_u8(vextq_u8(tmp_low, tmp_high, idx)); \
It looks like the macro expansion for vextq_u8 is tripping a compile-time bounds check on its index argument. With imm = 8, the branch guarded by the imm >= 16 check would never execute, but the code is still compiled, and constant propagation turns idx into 8 - 16 = -8, which is outside the valid range [0, 15] and triggers the error. Whether that last line gets expanded is, I guess, down to the whim of the optimiser.
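One way to keep the dead branch from being compiled at all is to select the branch at compile time. The following is only a minimal sketch, assuming C++17 `if constexpr` is available. `Bytes`, `iota_from`, `ext` and `alignr` are hypothetical stand-ins written for illustration, not sse2neon or NEON APIs, and they use plain arrays instead of vector registers so the example builds on any platform. The `static_assert` in `ext` mimics the range check that the real intrinsic applies to its index.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical 16-byte "register" so the sketch does not need NEON headers.
using Bytes = std::array<std::uint8_t, 16>;

// Hypothetical helper: fills a register with start, start+1, ..., start+15.
inline Bytes iota_from(int start) {
    assert(start >= 0 && start <= 240);  // keep every byte within 0..255
    Bytes r{};
    for (int k = 0; k < 16; ++k) r[k] = static_cast<std::uint8_t>(start + k);
    return r;
}

// Hypothetical stand-in for vextq_u8: takes bytes N..15 of lo followed by
// bytes 0..N-1 of hi. The index is a template parameter, so an out-of-range
// value is rejected at compile time, like the intrinsic's immediate.
template <int N>
Bytes ext(const Bytes& lo, const Bytes& hi) {
    static_assert(N >= 0 && N <= 15, "index outside the valid range [0, 15]");
    Bytes r{};
    for (int k = 0; k < 16; ++k) {
        const int src = k + N;
        r[k] = src < 16 ? lo[src] : hi[src - 16];
    }
    return r;
}

// Hypothetical alignr with the same branch structure as the macro above.
// Because the branches use `if constexpr`, a discarded branch is never
// instantiated, so ext<I - 16> is not formed when I < 16.
template <int I>
Bytes alignr(const Bytes& a, const Bytes& b) {
    if constexpr (I >= 32) {
        return Bytes{};                  // everything shifted out
    } else if constexpr (I >= 16) {
        return ext<I - 16>(a, Bytes{});  // only a remains, zero-filled
    } else {
        return ext<I>(b, a);             // low part from b, high part from a
    }
}
```

With this shape, `alignr<8>(a, b)` only instantiates `ext<8>`. If the branches were ordinary `if` statements, `ext<-8>` would be instantiated as well and the `static_assert` would fire, which is the same kind of failure as in the log above. This only works from C++, so it is a sketch of the idea rather than a drop-in change to the macro.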
This is compiling on an M1 Mac mini with the latest Xcode:
% gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 12.0.5 (clang-1205.0.22.11)
Target: x86_64-apple-darwin20.6.0
Thread model: posix