microsoft / DirectXMath

DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps

Home Page: https://walbourn.github.io/introducing-directxmath/

Validate intrinsics for clang/LLVM

walbourn opened this issue

Need to verify the DirectXMath intrinsics usage works with clang/LLVM for ARM and SSE.

The Intel code path generally works, but XMVerifyCPUSupport needs updating because clang/LLVM's __cpuid and __cpuidex differ slightly from MSVC's.
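
To illustrate the kind of difference involved (this is not the change from the commit): MSVC's <intrin.h> declares __cpuid(int[4], int) and __cpuidex(int[4], int, int), while the <cpuid.h> header shipped with clang and GCC provides __cpuid and __cpuid_count macros that take four separate register variables. A minimal sketch of a wrapper follows. QueryCpuid, HasSse2, HasF16c and the SKETCH_ macros are hypothetical names, and it assumes clang in GCC-compatible mode can include <cpuid.h>.

```cpp
#if defined(_MSC_VER) && !defined(__clang__) && (defined(_M_IX86) || defined(_M_X64))
#define SKETCH_CPUID_MSVC 1
#include <intrin.h>   // MSVC form: __cpuidex(int regs[4], int leaf, int subleaf)
#elif defined(__i386__) || defined(__x86_64__)
#define SKETCH_CPUID_GNU 1
#include <cpuid.h>    // clang/GCC form: __cpuid_count(leaf, subleaf, eax, ebx, ecx, edx)
#endif

// Hypothetical helper, not part of DirectXMath.
// Fills regs with EAX, EBX, ECX, EDX and returns false on non-x86 targets.
inline bool QueryCpuid(int leaf, int subleaf, int regs[4])
{
#if defined(SKETCH_CPUID_MSVC)
    __cpuidex(regs, leaf, subleaf);
    return true;
#elif defined(SKETCH_CPUID_GNU)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    __cpuid_count(leaf, subleaf, eax, ebx, ecx, edx);
    regs[0] = static_cast<int>(eax);
    regs[1] = static_cast<int>(ebx);
    regs[2] = static_cast<int>(ecx);
    regs[3] = static_cast<int>(edx);
    return true;
#else
    (void)leaf;
    (void)subleaf;
    regs[0] = regs[1] = regs[2] = regs[3] = 0;
    return false;
#endif
}

// CPUID leaf 1 reports SSE2 in EDX bit 26 and F16C in ECX bit 29.
inline bool HasSse2()
{
    int r[4];
    return QueryCpuid(1, 0, r) && ((r[3] >> 26) & 1) != 0;
}

inline bool HasF16c()
{
    int r[4];
    return QueryCpuid(1, 0, r) && ((r[2] >> 29) & 1) != 0;
}
```

Routing every query through one function keeps the compiler-specific spelling in a single place, so feature checks such as HasSse2 stay identical across toolchains.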

See this commit

Also needed this and this.

Updated the logic so that if you set -mf16c for clang/LLVM, I'll enable F16C intrinsics.

See this commit

Note that code defining _XM_F16C_INTRINSICS_ won't build with clang/LLVM unless __F16C__ is also defined, which happens via -mf16c or -mavx2.
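
A rough sketch of this kind of compile-time gating, assuming only that the compiler defines __F16C__ when the instructions are enabled. HalfToFloat and SKETCH_USE_F16C are hypothetical names, and the scalar fallback is an illustrative stand-in rather than DirectXMath's own conversion code.

```cpp
#include <cstdint>
#include <cstring>

#if defined(__F16C__)
#include <immintrin.h>
#define SKETCH_USE_F16C 1   // hypothetical switch, set only when the compiler allows F16C
#endif

// Hypothetical helper: converts one IEEE 754 half-precision value to float.
inline float HalfToFloat(std::uint16_t h)
{
#if defined(SKETCH_USE_F16C)
    // Hardware path: VCVTPH2PS converts the low lanes, we read lane 0.
    return _mm_cvtss_f32(_mm_cvtph_ps(_mm_cvtsi32_si128(static_cast<int>(h))));
#else
    // Portable path: rebuild the float bit pattern by hand.
    std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    std::uint32_t exponent = (h >> 10) & 0x1Fu;
    std::uint32_t mantissa = h & 0x3FFu;
    std::uint32_t bits;

    if (exponent == 0)
    {
        if (mantissa == 0)
        {
            bits = sign;                       // signed zero
        }
        else
        {
            // Subnormal half: shift until the implicit leading 1 appears.
            std::uint32_t shift = 0;
            while ((mantissa & 0x400u) == 0)
            {
                mantissa <<= 1;
                ++shift;
            }
            mantissa &= 0x3FFu;
            bits = sign | ((113u - shift) << 23) | (mantissa << 13);
        }
    }
    else if (exponent == 31)
    {
        bits = sign | 0x7F800000u | (mantissa << 13);   // infinity or NaN
    }
    else
    {
        bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);  // rebias 15 -> 127
    }

    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
#endif
}
```

Because the intrinsic path is only compiled when __F16C__ is present, the same source builds with or without -mf16c, and callers see the same function either way.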

MSVC's ARM compiler doesn't validate the types of ARM-NEON intrinsics. I tested the code with clang and fixed the resulting type mismatches in this commit.

Updated to use clang's native platform defines, and made a few minor fixes to ARM intrinsics usage, in this commit.

Note that to validate ARM, the "ex" versions of the ARM intrinsics need fix-ups:

#define vld1_u32_ex(x,a) vld1_u32(x)
#define vld1_f32_ex(x,a) vld1_f32(x)
#define vld1q_u32_ex(x,a) vld1q_u32(x)
#define vld1q_f32_ex(x,a) vld1q_f32(x)

#define vld4_f32_ex(x,a) vld4_f32(x)

#define vst1_u32_ex(x,y,a) vst1_u32(x,y)
#define vst1_f32_ex(x,y,a) vst1_f32(x,y)
#define vst1q_u32_ex(x,y,a) vst1q_u32(x,y)
#define vst1q_f32_ex(x,y,a) vst1q_f32(x,y)

Also needed an intrinsic fix-up:

#define vacle_f32(x,y) vcle_f32(vabs_f32(x),vabs_f32(y))
#define vacleq_f32(x,y) vcleq_f32(vabsq_f32(x),vabsq_f32(y))

So the ex versions are an MSVC extension.

VACLE is a pseudo-instruction, so only MSVC has an intrinsic for it.
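
To make the semantics of the vacle fix-up concrete, here is a scalar model of a single lane: compare the absolute values and produce an all-ones mask when |x| <= |y|, otherwise zero. AbsCompareLE is a hypothetical name used only for illustration. The real macros operate on NEON vectors.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical one-lane model of vcle_f32(vabs_f32(x), vabs_f32(y)).
// NEON comparisons yield an all-ones lane for true and an all-zeros lane for false.
inline std::uint32_t AbsCompareLE(float x, float y)
{
    // Any comparison involving NaN is false, so the mask is zero in that case.
    return (std::fabs(x) <= std::fabs(y)) ? 0xFFFFFFFFu : 0u;
}
```

For example, AbsCompareLE(-1.0f, 2.0f) gives 0xFFFFFFFF because |-1| <= |2|, while AbsCompareLE(-3.0f, 2.0f) gives 0.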

Fixed so these paths work on non-MSVC compilers in this commit