Is the current sse2neon.h accepted by C compilers?

h-murai opened this issue 3 years ago · comments

h-murai commented 3 years ago

Hi, I think line 6439 is not valid C:

static const __m128d mask = _mm_set_pd(1.0f, -1.0f);

gcc reports the error "initializer element is not a compile-time constant" for this line.

Jim Huang · Answer 1 · Sat Dec 11 2021 17:08:01 GMT+0800 (China Standard Time)

According to recent GitHub Actions results, both gcc-10 and clang-11 build the test suite without compilation errors. Can you share more information about the package and its configuration?

h-murai · Answer 2 · Sat Dec 11 2021 20:30:14 GMT+0800 (China Standard Time)

I used GCC8 and Spack v0.17.0 to install the bwa package, which depends on sse2neon. Ok. I'll try it again with GCC10. Thanks.

jack · Answer 3 · Wed Dec 22 2021 00:11:48 GMT+0800 (China Standard Time)

Same here. Compiling everything with g++ works like a charm, while gcc fails. Btw, I have gcc-7.5 for aarch64.

jack · Answer 4 · Wed Dec 22 2021 01:15:14 GMT+0800 (China Standard Time)

The C language does not allow initializing a variable with static storage duration with a function call.
The tests are all C++, but compilation should fail with gcc and a .c file.

jack · Answer 5 · Wed Dec 22 2021 17:30:36 GMT+0800 (China Standard Time)

As I said, this is a language "feature", not related to the compiler (please see this post for details).

The tests pass because all test files end with .cpp, so the compiler treats them as C++.
A minimal example to reproduce the issue looks like this:

// a.c

#include "sse2neon.h"

int main() {
    static const __m128d mask = _mm_set_pd(1.0f, -1.0f);
    return 0;
}

The C compiler will complain about error: initializer element is not constant when compiling with: gcc a.c.
However, everything would work if a C++ compiler were called, e.g.:

use g++ instead of gcc: g++ a.c
change the suffix: cp a.c a.cpp && gcc -c a.cpp
force GCC to use the C++ front end: gcc -xc++ -c a.c
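
The rule can also be seen without sse2neon.h. The following is only a minimal sketch: vec2d and make_mask() are hypothetical stand-ins for __m128d and _mm_set_pd, not part of the library. The rejected declaration is left in a comment so that the file still compiles as C.

```c
/* Hypothetical stand-in for __m128d: two double lanes. */
typedef struct {
    double lo, hi;
} vec2d;

/* Hypothetical stand-in for _mm_set_pd(hi, lo). */
static vec2d make_mask(double hi, double lo)
{
    vec2d v = {lo, hi};
    return v;
}

double mask_low(void)
{
    /* Rejected by a C compiler, because a function call is not a
     * constant expression and so cannot initialize a static object:
     *
     *     static const vec2d mask = make_mask(1.0, -1.0);
     */

    /* Accepted in C: automatic storage, initialized on every call. */
    const vec2d mask = make_mask(1.0, -1.0);
    return mask.lo;
}
```

Uncommenting the static line and compiling the file as C should reproduce the error, while compiling the same file as C++ should accept it.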

Jim Huang · Answer 6 · Wed Dec 22 2021 19:31:19 GMT+0800 (China Standard Time)

The workaround is to remove static const from the declaration. That is,

@@ -6436,7 +6436,7 @@ FORCE_INLINE __m128i _mm_xor_si128(__m128i a, __m128i b)
 // https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_addsub_pd
 FORCE_INLINE __m128d _mm_addsub_pd(__m128d a, __m128d b)
 {
-    static const __m128d mask = _mm_set_pd(1.0f, -1.0f);
+    __m128d mask = _mm_set_pd(1.0f, -1.0f);
 #if defined(__aarch64__)
     return vreinterpretq_m128d_f64(vfmaq_f64(vreinterpretq_f64_m128d(a),
                                              vreinterpretq_f64_m128d(b),
@@ -6452,7 +6452,7 @@ FORCE_INLINE __m128d _mm_addsub_pd(__m128d a, __m128d b)
 // https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=addsub_ps
 FORCE_INLINE __m128 _mm_addsub_ps(__m128 a, __m128 b)
 {
-    static const __m128 mask = _mm_setr_ps(-1.0f, 1.0f, -1.0f, 1.0f);
+    __m128 mask = _mm_setr_ps(-1.0f, 1.0f, -1.0f, 1.0f);
 #if defined(__aarch64__) || defined(__ARM_FEATURE_FMA) /* VFPv4+ */
     return vreinterpretq_m128_f32(vfmaq_f32(vreinterpretq_f32_m128(a),
                                             vreinterpretq_f32_m128(mask),

@marktwtn, we should figure out an elegant way to keep static const as a hint for compiler optimizations.

jack · Answer 7 · Wed Dec 22 2021 20:20:47 GMT+0800 (China Standard Time)

You are right.
Indeed, removing the qualifier makes it work, but I would like to keep the const for now.

JishinMaster · Answer 8 · Fri Dec 24 2021 18:30:08 GMT+0800 (China Standard Time)

I had the same problem.

I used the following technique to keep the const without compiler errors:

FORCE_INLINE __m128d _mm_addsub_pd(__m128d a, __m128d b)
{
    static const double mask[2] __attribute__((aligned(16))) = {-1.0, 1.0};
#if defined(__aarch64__)
    return vreinterpretq_m128d_f64(vfmaq_f64(vreinterpretq_f64_m128d(a),
                                             vreinterpretq_f64_m128d(b),
                                             vreinterpretq_f64_m128d(*(__m128d *) mask)));
#else
    return _mm_add_pd(_mm_mul_pd(b, *(__m128d *) mask), a);
#endif
}
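
The array order {-1.0, 1.0} corresponds to _mm_set_pd(1.0f, -1.0f), because _mm_set_pd takes the high lane first while the array is laid out low lane first. A scalar sketch of the same computation may make this easier to check. The vec2d type and the function name addsub_scalar are hypothetical and used only for illustration.

```c
/* Hypothetical stand-in for __m128d: lane[0] is the low lane. */
typedef struct {
    double lane[2];
} vec2d;

/* Scalar model of _mm_addsub_pd: subtract in the low lane,
 * add in the high lane, expressed as a + b * mask. */
vec2d addsub_scalar(vec2d a, vec2d b)
{
    /* Valid in C: an aggregate initializer made of constant expressions. */
    static const double mask[2] = {-1.0, 1.0};
    vec2d r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = a.lane[i] + b.lane[i] * mask[i];
    return r;
}
```

With a = {1.0, 2.0} and b = {3.0, 4.0}, the low lane is 1.0 - 3.0 and the high lane is 2.0 + 4.0.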

marktwtn · Answer 9 · Sun Dec 26 2021 22:23:44 GMT+0800 (China Standard Time)

Based on the post shared by @QwertyJack:

In C language, objects with static storage duration have to be initialized with constant expressions,
or with aggregate initializers containing constant expressions.

Removing the static keyword would fix the issue.

Alternatively, @JishinMaster fixes it with an aggregate initializer.

What is the benefit of the static keyword here? Is it to avoid initializing the variable on every call?

Jim Huang · Answer 10 · Tue Dec 28 2021 00:53:14 GMT+0800 (China Standard Time)

Removing the static keyword would fix the issue.
Alternatively, @JishinMaster fixes it with an aggregate initializer.
What is the benefit of the static keyword here? Is it to avoid initializing the variable on every call?

How about experimenting with godbolt for some simplified test cases?
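
One possible simplified test case for such an experiment is sketched below. It uses plain double arrays instead of NEON types, and the function names addsub_auto and addsub_static are hypothetical. The idea is to compile both variants at the same optimization level and compare the generated assembly. Whether they differ depends on the compiler and flags, so no particular outcome is assumed here.

```c
/* Variant 1: automatic constant, nominally set up on every call. */
void addsub_auto(const double a[2], const double b[2], double out[2])
{
    const double mask[2] = {-1.0, 1.0};
    for (int i = 0; i < 2; i++)
        out[i] = a[i] + b[i] * mask[i];
}

/* Variant 2: static constant with a constant aggregate initializer. */
void addsub_static(const double a[2], const double b[2], double out[2])
{
    static const double mask[2] = {-1.0, 1.0};
    for (int i = 0; i < 2; i++)
        out[i] = a[i] + b[i] * mask[i];
}
```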

Bernhard Rosenkraenzer · Answer 11 · Thu Jan 27 2022 21:00:05 GMT+0800 (China Standard Time)

I'm using

#ifdef __cplusplus
static
#endif
const __m128d mask = _mm_set_pd(1.0f, -1.0f);

for the time being. It works with both C and C++ and, unlike just removing "static", keeps the optimization at least in C++ code.
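
The same idea can be wrapped in a macro so that the conditional is written only once. This is just a sketch: the macro name STATIC_IF_CXX and the helper make_scale() are hypothetical and do not exist in sse2neon.h.

```c
/* Expands to "static" only when compiled as C++. */
#ifdef __cplusplus
#define STATIC_IF_CXX static
#else
#define STATIC_IF_CXX
#endif

/* Hypothetical stand-in for an intrinsic such as _mm_set_pd. */
static double make_scale(void)
{
    return -1.0;
}

double negate(double x)
{
    /* C++: static const, initialized the first time control reaches it.
     * C:   plain const automatic variable, initialized on every call. */
    STATIC_IF_CXX const double scale = make_scale();
    return x * scale;
}
```

The file compiles as either language, because in C the declaration no longer has static storage duration and may therefore use a function call as its initializer.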

Jim Huang · Answer 12 · Sun Jan 30 2022 01:12:06 GMT+0800 (China Standard Time)

Thanks to @berolinux and @QwertyJack for the insightful feedback.