Validate intrinsics for clang/LLVM

Question

walbourn opened this issue 5 years ago

Need to verify the DirectXMath intrinsics usage works with clang/LLVM for ARM and SSE.

Chuck Walbourn · Answer 1 · Sat Jun 22 2019 05:25:58 GMT+0800 (China Standard Time)

The Intel code path generally works, but XMVerifyCPUSupport needs updating to account for slight differences in __cpuid and __cpuidex.

See this commit

Also needed this and this.
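
The reply above does not say what the __cpuid / __cpuidex differences are. One commonly encountered difference, assumed here rather than taken from the commit, is that MSVC's <intrin.h> declares __cpuid(int[4], int) and __cpuidex(int[4], int, int), while the <cpuid.h> shipped with GCC and clang provides __cpuid and __cpuid_count as macros that take four separate output registers. A minimal sketch of a wrapper that hides this might look as follows; SketchCpuid and SketchHasSSE2 are hypothetical names, not DirectXMath functions.

```cpp
#include <cassert>
#include <cstring>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>   // MSVC style: __cpuidex(int regs[4], int leaf, int subleaf)
#define SKETCH_CPUID_MSVC 1
#elif defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>    // GCC/clang style: __cpuid_count(leaf, subleaf, eax, ebx, ecx, edx)
#define SKETCH_CPUID_GNU 1
#endif

// Hypothetical helper (an assumption for illustration only).
// Fills regs with EAX, EBX, ECX, EDX for the given leaf and subleaf.
// Returns false on targets that have no CPUID instruction.
inline bool SketchCpuid(int leaf, int subleaf, int regs[4])
{
    assert(regs != nullptr);
#if defined(SKETCH_CPUID_MSVC)
    __cpuidex(regs, leaf, subleaf);
    return true;
#elif defined(SKETCH_CPUID_GNU)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    __cpuid_count(leaf, subleaf, eax, ebx, ecx, edx);
    regs[0] = static_cast<int>(eax);
    regs[1] = static_cast<int>(ebx);
    regs[2] = static_cast<int>(ecx);
    regs[3] = static_cast<int>(edx);
    return true;
#else
    std::memset(regs, 0, 4 * sizeof(int));
    return false;
#endif
}

// Usage example: CPUID leaf 1 reports SSE2 in EDX bit 26.
inline bool SketchHasSSE2()
{
    int regs[4] = { 0, 0, 0, 0 };
    if (!SketchCpuid(1, 0, regs))
        return false;
    return (regs[3] & (1 << 26)) != 0;
}
```

Keeping the compiler-specific call inside one helper means the feature checks themselves stay identical across compilers.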

Chuck Walbourn · Answer 2 · Sat Jun 22 2019 05:43:10 GMT+0800 (China Standard Time)

Updated the logic so that if you set -mf16c for clang/LLVM, I'll enable F16C intrinsics.

See this commit

Note _XM_F16C_INTRINSICS_ won't build with clang/LLVM unless __F16C__ is defined via -mf16c or -mavx2.
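
As a rough illustration of the note above, the sketch below uses the F16C conversion intrinsic only when the compiler defines __F16C__, and falls back to a scalar conversion otherwise. SKETCH_USE_F16C and SketchHalfToFloat are hypothetical names chosen for this example; this is not how DirectXMath itself is written.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__F16C__)
#include <immintrin.h>
#define SKETCH_USE_F16C 1   // hypothetical macro, set only when the compiler enables F16C
#endif

// Converts one IEEE 754 half-precision value (given as raw bits) to float.
inline float SketchHalfToFloat(uint16_t h)
{
#if defined(SKETCH_USE_F16C)
    // Hardware path: _mm_cvtph_ps converts the low four halves; only lane 0 is used.
    return _mm_cvtss_f32(_mm_cvtph_ps(_mm_cvtsi32_si128(h)));
#else
    // Scalar fallback: rebuild the float bit pattern by hand.
    const uint32_t sign = (static_cast<uint32_t>(h) & 0x8000u) << 16;
    const uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0 && man == 0) {
        bits = sign;                                    // signed zero
    } else if (exp == 0) {
        uint32_t e = 0;                                 // subnormal: normalize the mantissa
        man <<= 1;
        while ((man & 0x400u) == 0) { man <<= 1; ++e; }
        bits = sign | ((112u - e) << 23) | ((man & 0x3FFu) << 13);
    } else if (exp == 31) {
        bits = sign | 0x7F800000u | (man << 13);        // infinity or NaN
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13); // normal: rebias 15 -> 127
    }
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
#endif
}

// Usage example: 0x3C00 is 1.0 and 0xC000 is -2.0 in half precision.
inline void SketchHalfDemo()
{
    assert(SketchHalfToFloat(0x3C00) == 1.0f);
    assert(SketchHalfToFloat(0xC000) == -2.0f);
}
```

Built without -mf16c the scalar branch is compiled; built with -mf16c the intrinsic branch is compiled instead, and both should return the same values.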

Chuck Walbourn · Answer 3 · Wed Jun 26 2019 00:17:22 GMT+0800 (China Standard Time)

MSVC's ARM compiler doesn't validate the types of ARM-NEON intrinsics. I tested with clang and fixed the resulting issues in this commit.
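
The thread does not list which type mismatches were found. As a general illustration only, the sketch below keeps each NEON value in its declared type: the comparison result is held as uint32x4_t and passed to a select intrinsic that expects that type. SketchMax4 is a hypothetical helper, and the non-ARM branch exists only so the example builds on other targets. Whether a given mismatch is diagnosed depends on the compiler and its vector-conversion settings.

```cpp
#include <cassert>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: per-lane maximum of two arrays of four floats.
inline void SketchMax4(const float* a, const float* b, float* out)
{
    assert(a != nullptr && b != nullptr && out != nullptr);
#if defined(__ARM_NEON)
    const float32x4_t va = vld1q_f32(a);
    const float32x4_t vb = vld1q_f32(b);
    // vcgtq_f32 returns an unsigned lane mask, so it is stored as uint32x4_t.
    const uint32x4_t mask = vcgtq_f32(va, vb);
    // vbslq_f32 takes the mask as uint32x4_t and the operands as float32x4_t.
    // If the mask were needed as a float vector, vreinterpretq_f32_u32(mask)
    // would make that conversion explicit.
    vst1q_f32(out, vbslq_f32(mask, va, vb));
#else
    // Portable fallback with the same result.
    for (int i = 0; i < 4; ++i)
        out[i] = (a[i] > b[i]) ? a[i] : b[i];
#endif
}
```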

Chuck Walbourn · Answer 4 · Wed Jun 26 2019 00:18:27 GMT+0800 (China Standard Time)

Updated to use clang's native platform defines, and made a few minor fixes to intrinsics usage for ARM, in this commit.
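
For readers unfamiliar with the two naming schemes: MSVC identifies the target with macros such as _M_ARM64, _M_ARM, _M_X64 and _M_IX86, while clang and GCC define __aarch64__, __arm__, __x86_64__ and __i386__. A minimal sketch of checking both is shown below. SKETCH_ARCH and SketchArchName are hypothetical names, and the exact conditions used in the commit may differ.

```cpp
#include <cassert>
#include <cstdio>

// Map MSVC-style and clang/GCC-style target macros onto one hypothetical name.
#if defined(_M_ARM64) || defined(__aarch64__)
#define SKETCH_ARCH "arm64"
#elif defined(_M_ARM) || defined(__arm__)
#define SKETCH_ARCH "arm"
#elif defined(_M_X64) || defined(__x86_64__)
#define SKETCH_ARCH "x64"
#elif defined(_M_IX86) || defined(__i386__)
#define SKETCH_ARCH "x86"
#else
#define SKETCH_ARCH "unknown"
#endif

// Returns the architecture name selected at compile time.
inline const char* SketchArchName()
{
    return SKETCH_ARCH;
}

// Usage example: report which branch the preprocessor selected.
inline void SketchPrintArch()
{
    const char* name = SketchArchName();
    assert(name != nullptr && name[0] != '\0');
    std::printf("Target architecture: %s\n", name);
}
```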

Chuck Walbourn · Answer 5 · Thu Aug 01 2019 01:32:00 GMT+0800 (China Standard Time)

Note that to validate ARM, the "ex" versions of the ARM intrinsics need fix-ups:

#define vld1_u32_ex(x,a) vld1_u32(x)
#define vld1_f32_ex(x,a) vld1_f32(x)
#define vld1q_u32_ex(x,a) vld1q_u32(x)
#define vld1q_f32_ex(x,a) vld1q_f32(x)

#define vld4_f32_ex(x,a) vld4_f32(x)

#define vst1_u32_ex(x,y,a) vst1_u32(x,y)
#define vst1_f32_ex(x,y,a) vst1_f32(x,y)
#define vst1q_u32_ex(x,y,a) vst1q_u32(x,y)
#define vst1q_f32_ex(x,y,a) vst1q_f32(x,y)

Also needed an intrinsic fix-up:

#define vacle_f32(x,y) vcle_f32(vabs_f32(x),vabs_f32(y))
#define vacleq_f32(x,y) vcleq_f32(vabsq_f32(x),vabsq_f32(y))
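
To make the second fix-up concrete, the sketch below computes the same per-lane test, |x| <= |y|, by taking absolute values and then comparing, as the macro above does. SketchAbsLessEqual4 is a hypothetical helper; the non-ARM branch is a scalar stand-in so the example also builds on other targets.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: writes 0xFFFFFFFF to mask[i] where |x[i]| <= |y[i]|, else 0.
inline void SketchAbsLessEqual4(const float* x, const float* y, uint32_t* mask)
{
    assert(x != nullptr && y != nullptr && mask != nullptr);
#if defined(__ARM_NEON)
    const float32x4_t vx = vld1q_f32(x);
    const float32x4_t vy = vld1q_f32(y);
    // Same expansion as the vacleq_f32 fix-up: absolute values, then <=.
    vst1q_u32(mask, vcleq_f32(vabsq_f32(vx), vabsq_f32(vy)));
#else
    for (int i = 0; i < 4; ++i)
        mask[i] = (std::fabs(x[i]) <= std::fabs(y[i])) ? 0xFFFFFFFFu : 0u;
#endif
}
```

For example, with x = {1, -3, 2, -0.5} and y = {-2, 1, 2, 4}, lanes 0, 2 and 3 pass the test and lane 1 does not.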

Chuck Walbourn · Answer 6 · Mon Mar 09 2020 13:11:45 GMT+0800 (China Standard Time)

So the ex versions are an MSVC extension.

VACLE is a pseudo-instruction, so only MSVC has an intrinsic for it.

Fixed so these paths work on non-MSVC compilers in this commit.