DLTcollab / sse2neon

A translator from Intel SSE intrinsics to Arm/Aarch64 NEON implementation

Is the current sse2neon.h accepted by C compilers?

h-murai opened this issue · comments

Hi, I think line 6439 is not valid as a C statement.

static const __m128d mask = _mm_set_pd(1.0f, -1.0f);

gcc reports the error "initializer element is not a compile-time constant" for this line.

According to the recent GitHub Actions results, both gcc-10 and clang-11 build the test suite without compilation errors. Can you share more information about the package and its configuration?

I used GCC8 and Spack v0.17.0 to install the bwa package, which depends on sse2neon. Ok. I'll try it again with GCC10. Thanks.

commented

The same here. Using g++ to compile everything works like a charm, while gcc failed. Btw, I have gcc-7.5 for aarch64.

commented

The C language does not allow initializing a variable with static storage duration with a function call.
The tests are all C++, so they pass, but compiling a .c file with gcc should fail.

commented

As I said, this is a language "feature", not related to the compiler (please see this post for details).

The tests pass because all the test files end with .cpp, so the compiler treats them as C++.
A minimal example to reproduce the issue would be like this:

// a.c

#include "sse2neon.h"

int main() {
    static const __m128d mask = _mm_set_pd(1.0f, -1.0f);
    return 0;
}

The C compiler will complain about error: initializer element is not constant when compiling with: gcc a.c.
However, everything would work if a C++ compiler were called, e.g.:

  • use g++ instead of gcc: g++ a.c
  • change the suffix: cp a.c a.cpp && gcc -c a.cpp
  • force GCC to treat the file as C++: gcc -xc++ -c a.c

The workaround is to remove static const from the declaration. That is,

@@ -6436,7 +6436,7 @@ FORCE_INLINE __m128i _mm_xor_si128(__m128i a, __m128i b)
 // https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_addsub_pd
 FORCE_INLINE __m128d _mm_addsub_pd(__m128d a, __m128d b)
 {
-    static const __m128d mask = _mm_set_pd(1.0f, -1.0f);
+    __m128d mask = _mm_set_pd(1.0f, -1.0f);
 #if defined(__aarch64__)
     return vreinterpretq_m128d_f64(vfmaq_f64(vreinterpretq_f64_m128d(a),
                                              vreinterpretq_f64_m128d(b),
@@ -6452,7 +6452,7 @@ FORCE_INLINE __m128d _mm_addsub_pd(__m128d a, __m128d b)
 // https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=addsub_ps
 FORCE_INLINE __m128 _mm_addsub_ps(__m128 a, __m128 b)
 {
-    static const __m128 mask = _mm_setr_ps(-1.0f, 1.0f, -1.0f, 1.0f);
+    __m128 mask = _mm_setr_ps(-1.0f, 1.0f, -1.0f, 1.0f);
 #if defined(__aarch64__) || defined(__ARM_FEATURE_FMA) /* VFPv4+ */
     return vreinterpretq_m128_f32(vfmaq_f32(vreinterpretq_f32_m128(a),
                                             vreinterpretq_f32_m128(mask),

@marktwtn, we should figure out an elegant way to keep static const as a hint for compiler optimizations.

commented

You are right.
Indeed, removing the qualifier makes it work, but I would like to keep the const for now.

I had the same problem.

I used the following technique to keep a const without compiler errors:

FORCE_INLINE __m128d _mm_addsub_pd(__m128d a, __m128d b)
{
    static const double mask[2] __attribute__((aligned(16))) = {-1.0, 1.0};
#if defined(__aarch64__)
    return vreinterpretq_m128d_f64(vfmaq_f64(vreinterpretq_f64_m128d(a),
                                             vreinterpretq_f64_m128d(b),
                                             vreinterpretq_f64_m128d(*(__m128d *) mask)));
#else
    return _mm_add_pd(_mm_mul_pd(b, *(__m128d *) mask), a);
#endif
}
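
For readers without an Arm toolchain at hand, the arithmetic behind this variant can be sketched in plain C. This is only an illustrative scalar model, not sse2neon code: the type vec2d and the function addsub_pd_ref are hypothetical names, and the sketch assumes the usual _mm_addsub_pd behaviour (subtract in lane 0, add in lane 1), computed as a + b * mask.

```c
/* Hypothetical two-lane vector type standing in for __m128d. */
typedef struct {
    double lane[2];
} vec2d;

/* Scalar model of a + b * mask, the expression used in the snippet above. */
vec2d addsub_pd_ref(vec2d a, vec2d b)
{
    /* An aggregate initializer made of constant expressions is a valid
     * initializer for a static object in C, unlike a function call. */
    static const double mask[2] = {-1.0, 1.0};
    vec2d r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = a.lane[i] + b.lane[i] * mask[i];
    return r;
}
```

With a = {1.0, 2.0} and b = {3.0, 4.0}, lane 0 becomes 1.0 - 3.0 and lane 1 becomes 2.0 + 4.0.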

Quoting the post linked by @QwertyJack:

In C language, objects with static storage duration have to be initialized with constant expressions,
or with aggregate initializers containing constant expressions.
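
A minimal sketch of that rule, using plain double values instead of NEON types so that it compiles anywhere. The helper make_one is a hypothetical stand-in for an intrinsic such as _mm_set_pd, and the rejected declaration is left in a comment so the file still builds as C.

```c
/* Hypothetical helper standing in for an intrinsic such as _mm_set_pd. */
static double make_one(void)
{
    return 1.0;
}

double sum_of_masks(void)
{
    /* Accepted: a constant expression. */
    static const double scalar = -1.0;

    /* Accepted: an aggregate initializer containing constant expressions. */
    static const double pair[2] = {-1.0, 1.0};

    /* Rejected by a C compiler, because a function call is not a
     * constant expression:
     *
     *     static const double bad = make_one();
     */

    /* Accepted: automatic storage duration allows a run-time initializer. */
    const double automatic = make_one();

    return scalar + pair[0] + pair[1] + automatic;
}
```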

Removing the static keyword would fix the issue.

Besides, @JishinMaster uses an aggregate initializer to fix the issue.

I wonder what the benefit of the static keyword is here. Is it to avoid initializing the variable multiple times?

How about experimenting with godbolt for some simplified test cases?

I'm using

#ifdef __cplusplus
static
#endif
const __m128d mask = _mm_set_pd(1.0f, -1.0f);

for the time being -- makes it work with both C and C++, and (contrary to just removing "static") keeps the optimization at least in C++ code.
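
The same idea can be wrapped in a macro so the #ifdef is not repeated at every declaration. This is only a sketch: STATIC_IN_CXX and make_mask are hypothetical names that do not appear in sse2neon.h, and a plain double stands in for __m128d so the example builds without NEON.

```c
/* Hypothetical macro: expands to `static` only when compiled as C++. */
#ifdef __cplusplus
#define STATIC_IN_CXX static
#else
#define STATIC_IN_CXX
#endif

/* Hypothetical stand-in for an intrinsic such as _mm_set_pd. */
static double make_mask(void)
{
    return -1.0;
}

double negate(double x)
{
    /* C++: a static local, initialized once on first use.
     * C:   an ordinary automatic const, initialized on every call. */
    STATIC_IN_CXX const double mask = make_mask();
    return x * mask;
}
```

The file compiles unchanged with both gcc and g++; only the storage duration of mask differs between the two.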

Thanks to @berolinux and @QwertyJack for the insightful feedback.