DLTcollab / sse2neon

A translator from Intel SSE intrinsics to Arm/Aarch64 NEON implementation

Help using SIMDVec

bigianb opened this issue

I'm trying to port the following code to an M1 Mac (so AArch64):

constexpr static __m128i cxpr_setr_epi32(int x, int y, int z, int w)
{
    __m128i m = {};
    m.m128i_i32[0] = x;
    m.m128i_i32[1] = y;
    m.m128i_i32[2] = z;
    m.m128i_i32[3] = w;
    return m;
}

I have tried the following in the body:

  SIMDVec m = {};
  m.m128_i32[0] = x;
  m.m128_i32[1] = y;
  m.m128_i32[2] = z;
  m.m128_i32[3] = w;
  return *(__m128i*)(&m);

and that gives me the error:
constexpr function never produces a constant expression [-Winvalid-constexpr]

I can't find any examples where SIMDVec is used to set the components of an __m128i directly, as above, so I'm a bit lost. Can anyone shed some light on how I can make this work? I'm not sure what makes it fail the constexpr check; I assume it has something to do with the horrible cast on the return.

The SIMDVec struct is intended for internal use only. Note also that Microsoft's own documentation says you should not access the fields of the __m128 struct directly. See https://docs.microsoft.com/en-us/cpp/cpp/m128

Can you share why you need to access the contents of an __m128 struct?

Hi, thanks for the quick response!
Yes. I'm looking at what it would take to port some emulation code (pcsx2) to an ARM platform, and this code is in its vector classes. It's basically trying to initialise a __m128i as a constant from 4 sub-components, and the intrinsic _mm_setr_epi32 is not constexpr. It's not my code, so I'm trying to make as small a change as possible. I'd be happy to understand the 'right way' to do it.

It's used to construct const instances of the vector class, which basically has an __m128i as its single member:

constexpr static GSVector4i cxpr(int x, int y, int z, int w)
{
	return GSVector4i(cxpr_setr_epi32(x, y, z, w));
}

and then this cxpr is used as follows:

CONSTINIT const GSVector4i GSVector4i::m_xff[17] =
{
	cxpr(0x00000000, 0x00000000, 0x00000000, 0x00000000),
	cxpr(0x000000ff, 0x00000000, 0x00000000, 0x00000000),
	cxpr(0x0000ffff, 0x00000000, 0x00000000, 0x00000000),
	cxpr(0x00ffffff, 0x00000000, 0x00000000, 0x00000000),

It seems like quite a complex way to do something you might think would be simple.

Source reference: https://github.com/PCSX2/pcsx2/blob/master/pcsx2/GS/GSVector4i.h#L23

It looks like this works:

return (__m128i) int32x4_t{x, y, z, w};
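
For anyone landing here later, below is a minimal sketch of why that one-liner can satisfy constexpr where the SIMDVec attempt could not: it builds the vector as a brace-initialised value instead of writing lanes one by one and reinterpreting a pointer. To keep the sketch buildable with GCC or Clang on any target, it uses a vector-extension typedef (vec4i32, a made-up name) as a stand-in for NEON's int32x4_t, and the helper name cxpr_setr_epi32_sketch is hypothetical too. The final cast to __m128i from the line above is left out here; on sse2neon that is the extra step, since __m128i is a typedef for int64x2_t there.

```cpp
#include <cassert>

// Stand-in for NEON's int32x4_t (assumption: on AArch64 you would include
// <arm_neon.h> and use int32x4_t itself). Requires GCC/Clang vector extensions.
typedef int vec4i32 __attribute__((vector_size(16)));

// Hypothetical helper mirroring cxpr_setr_epi32 from the thread.
// The vector is produced as a single brace-initialised value, so there is
// no pointer reinterpretation for the constant evaluator to reject.
constexpr vec4i32 cxpr_setr_epi32_sketch(int x, int y, int z, int w)
{
    return vec4i32{x, y, z, w};
}

// Initialised at compile time; this line fails to build if the helper
// cannot be evaluated as a constant expression.
constexpr vec4i32 k_mask = cxpr_setr_epi32_sketch(0x000000ff, 0, 0, 0);

// Runtime read of one lane, copying to a local before subscripting.
inline int lane_of(vec4i32 v, int i)
{
    assert(i >= 0 && i < 4);
    return v[i];
}
```

Lanes are read back at run time through lane_of because subscripting a vector inside a constant expression is not supported by every compiler version.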