Validate intrinsics for clang/LLVM
walbourn opened this issue
Need to verify the DirectXMath intrinsics usage works with clang/LLVM for ARM and SSE.
The Intel code path generally works, but XMVerifyCPUSupport needs updating for slight differences in __cpuid and __cpuidex.
See this commit
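As a rough illustration of the kind of difference involved: MSVC's <intrin.h> declares __cpuid and __cpuidex as functions that fill an int[4], while the <cpuid.h> header shipped with GCC and clang defines __cpuid and __cpuid_count as macros that take four separate output registers. The sketch below wraps both behind one call. QueryCpuid and CpuHasF16C are hypothetical names used only for illustration; this is not the code from the commit.

```cpp
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>   // __cpuidex(int regs[4], int leaf, int subleaf)
#define SKETCH_CPUID_MSVC 1
#elif defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>    // __cpuid_count(leaf, subleaf, eax, ebx, ecx, edx) macro
#define SKETCH_CPUID_GNU 1
#endif

// Hypothetical helper (not part of DirectXMath): fills regs with
// EAX, EBX, ECX, EDX for the given leaf/subleaf.
// Returns false when no CPUID path is available (e.g. on ARM).
inline bool QueryCpuid(int leaf, int subleaf, int regs[4])
{
#if defined(SKETCH_CPUID_MSVC)
    __cpuidex(regs, leaf, subleaf);
    return true;
#elif defined(SKETCH_CPUID_GNU)
    unsigned int a = 0, b = 0, c = 0, d = 0;
    const unsigned int uleaf = static_cast<unsigned int>(leaf);
    const unsigned int usub = static_cast<unsigned int>(subleaf);
    __cpuid_count(uleaf, usub, a, b, c, d);
    regs[0] = static_cast<int>(a);
    regs[1] = static_cast<int>(b);
    regs[2] = static_cast<int>(c);
    regs[3] = static_cast<int>(d);
    return true;
#else
    regs[0] = regs[1] = regs[2] = regs[3] = 0;
    return false;
#endif
}

// Hypothetical helper: F16C support is reported in CPUID leaf 1, ECX bit 29.
inline bool CpuHasF16C()
{
    int regs[4] = { 0, 0, 0, 0 };
    if (!QueryCpuid(1, 0, regs))
        return false;
    return (regs[2] & (1 << 29)) != 0;
}
```

A runtime check like CpuHasF16C() is separate from the compile-time switches: it says what the CPU supports, not which intrinsics the compiler was allowed to emit.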
Updated the logic so that if you set -mf16c for clang/LLVM, I'll enable F16C intrinsics.
See this commit
Note: _XM_F16C_INTRINSICS_ won't build with clang/LLVM unless __F16C__ is defined via -mf16c or -mavx2.
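A minimal sketch of that gating, assuming the F16C path should only be compiled when the compiler itself defines __F16C__. SKETCH_USE_F16C and HalfToFloat are hypothetical names for illustration and are not DirectXMath's own macros or functions. Without the flag, the sketch falls back to a scalar conversion, so it builds either way.

```cpp
#include <cstdint>
#include <cstring>

// Only enable the intrinsic path when the compiler says F16C code generation is on.
// With clang/LLVM or GCC this is the case when building with -mf16c.
#if defined(__F16C__)
#include <immintrin.h>
#define SKETCH_USE_F16C 1
#endif

// Hypothetical helper: converts one IEEE 754 half-precision value to float.
inline float HalfToFloat(uint16_t h)
{
#if defined(SKETCH_USE_F16C)
    // Place the half in the low lane and convert with the F16C instruction.
    const __m128i v = _mm_cvtsi32_si128(static_cast<int>(h));
    return _mm_cvtss_f32(_mm_cvtph_ps(v));
#else
    // Portable scalar fallback.
    const uint32_t sign = (static_cast<uint32_t>(h) & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0)
    {
        if (man == 0)
        {
            bits = sign;                       // signed zero
        }
        else
        {
            exp = 113;                         // 127 - 15 + 1, adjusted while normalizing
            while ((man & 0x400u) == 0)
            {
                man <<= 1;
                --exp;
            }
            man &= 0x3FFu;                     // drop the now-implicit leading bit
            bits = sign | (exp << 23) | (man << 13);
        }
    }
    else if (exp == 31)
    {
        bits = sign | 0x7F800000u | (man << 13); // infinity or NaN
    }
    else
    {
        bits = sign | ((exp + 112) << 23) | (man << 13); // rebias 15 -> 127
    }
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
#endif
}
```

Both branches produce the same results for these inputs, so the flag only changes which instructions are emitted.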
MSVC's ARM compiler doesn't validate the types of ARM-NEON intrinsics. I tested with clang and fixed the resulting type mismatches in this commit
Updated to use the clang native platform defines, along with a few minor fixes for ARM intrinsics use, in this commit.
Note: to validate ARM, the "ex" versions of the ARM intrinsics need fix-ups:
#define vld1_u32_ex(x,a) vld1_u32(x)
#define vld1_f32_ex(x,a) vld1_f32(x)
#define vld1q_u32_ex(x,a) vld1q_u32(x)
#define vld1q_f32_ex(x,a) vld1q_f32(x)
#define vld4_f32_ex(x,a) vld4_f32(x)
#define vst1_u32_ex(x,y,a) vst1_u32(x,y)
#define vst1_f32_ex(x,y,a) vst1_f32(x,y)
#define vst1q_u32_ex(x,y,a) vst1q_u32(x,y)
#define vst1q_f32_ex(x,y,a) vst1q_f32(x,y)
Also needed an intrinsic fix-up:
#define vacle_f32(x,y) vcle_f32(vabs_f32(x),vabs_f32(y))
#define vacleq_f32(x,y) vcleq_f32(vabsq_f32(x),vabsq_f32(y))
So the ex versions are an MSVC extension.
VACLE is a pseudo-instruction, so only MSVC has an intrinsic for it.
Fixed so these paths work on non-MSVC compilers in this commit
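One way the fix-ups above could be scoped to non-MSVC compilers is sketched below. It only covers clang/GCC with NEON enabled and uses a scalar loop everywhere else so that it still builds on other targets. AbsLessEqual4 and SKETCH_NEON are hypothetical names, and the 128 passed as the alignment argument is an assumption about how the MSVC "ex" forms are called; none of this is taken from the commit.

```cpp
#include <cmath>
#include <cstdint>

#if (defined(__ARM_NEON) || defined(__ARM_NEON__)) && !defined(_MSC_VER)
#include <arm_neon.h>
#define SKETCH_NEON 1
// The "ex" loads are an MSVC extension: drop the alignment hint elsewhere.
#define vld1q_f32_ex(x,a) vld1q_f32(x)
// VACLE is a pseudo-instruction: build |x| <= |y| from abs and compare.
#define vacleq_f32(x,y) vcleq_f32(vabsq_f32(x),vabsq_f32(y))
#endif

// Hypothetical helper: per-lane |x| <= |y| for four floats.
// Each mask lane is all ones when the comparison holds, zero otherwise.
inline void AbsLessEqual4(const float* x, const float* y, uint32_t* mask)
{
#if defined(SKETCH_NEON)
    const float32x4_t vx = vld1q_f32_ex(x, 128); // 128 = assumed alignment hint in bits
    const float32x4_t vy = vld1q_f32_ex(y, 128);
    vst1q_u32(mask, vacleq_f32(vx, vy));
#else
    // Scalar stand-in for targets without NEON.
    for (int i = 0; i < 4; ++i)
    {
        mask[i] = (std::fabs(x[i]) <= std::fabs(y[i])) ? 0xFFFFFFFFu : 0u;
    }
#endif
}
```

Because the macros expand to standard NEON intrinsics, clang's type checking applies to their arguments as well, which is the validation the MSVC ARM compiler does not perform.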