Is the current sse2neon.h accepted by C compilers?
h-murai opened this issue · comments
Hi, I think line 6439 isn't valid as a C statement:
static const __m128d mask = _mm_set_pd(1.0f, -1.0f);
gcc actually reports the error "initializer element is not a compile-time constant" for that line.
According to recent GitHub Actions results, both gcc-10 and clang-11 can build the test suite without compilation errors. Can you share more information about the package and configuration?
I used GCC8 and Spack v0.17.0 to install the bwa package, which depends on sse2neon. Ok. I'll try it again with GCC10. Thanks.
The same here. Compiling with g++ works like a charm, while gcc fails. Btw, I have gcc-7.5 for aarch64.
The C language does not allow a variable with static storage duration to be initialized with a function call.
The tests are all C++, but it should fail with gcc and a .c file.
As I said, this is a language "feature", not related to the compiler (please see this post for details).
The tests pass because all the files end with .cpp, so the compiler treats them as C++.
A minimal example to reproduce the issue would be like this:
// a.c
#include "sse2neon.h"
int main() {
static const __m128d mask = _mm_set_pd(1.0f, -1.0f);
return 0;
}
The C compiler will complain with "error: initializer element is not constant" when compiling with: gcc a.c
However, everything works if the file is compiled as C++, e.g.:
- use g++ instead of gcc: g++ a.c
- change the suffix: cp a.c a.cpp && gcc -c a.cpp
- force GCC to treat the file as C++: gcc -xc++ -c a.c
The workaround is to remove the static const qualifier. That is:
@@ -6436,7 +6436,7 @@ FORCE_INLINE __m128i _mm_xor_si128(__m128i a, __m128i b)
// https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_addsub_pd
FORCE_INLINE __m128d _mm_addsub_pd(__m128d a, __m128d b)
{
- static const __m128d mask = _mm_set_pd(1.0f, -1.0f);
+ __m128d mask = _mm_set_pd(1.0f, -1.0f);
#if defined(__aarch64__)
return vreinterpretq_m128d_f64(vfmaq_f64(vreinterpretq_f64_m128d(a),
vreinterpretq_f64_m128d(b),
@@ -6452,7 +6452,7 @@ FORCE_INLINE __m128d _mm_addsub_pd(__m128d a, __m128d b)
// https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=addsub_ps
FORCE_INLINE __m128 _mm_addsub_ps(__m128 a, __m128 b)
{
- static const __m128 mask = _mm_setr_ps(-1.0f, 1.0f, -1.0f, 1.0f);
+ __m128 mask = _mm_setr_ps(-1.0f, 1.0f, -1.0f, 1.0f);
#if defined(__aarch64__) || defined(__ARM_FEATURE_FMA) /* VFPv4+ */
return vreinterpretq_m128_f32(vfmaq_f32(vreinterpretq_f32_m128(a),
vreinterpretq_f32_m128(mask),
@marktwtn, we should figure out an elegant way to keep static const as a hint for compiler optimizations.
You are right.
Indeed, removing the qualifier makes it work, but I would like to keep the const for now.
I had the same problem.
I used the following technique to still get a const without compiler errors:
FORCE_INLINE __m128d _mm_addsub_pd(__m128d a, __m128d b)
{
static const double mask[2] __attribute__((aligned(16))) = {-1.0, 1.0};
#if defined(__aarch64__)
return vreinterpretq_m128d_f64(vfmaq_f64(vreinterpretq_f64_m128d(a),
vreinterpretq_f64_m128d(b),
vreinterpretq_f64_m128d(*(__m128d *) mask)));
#else
return _mm_add_pd(_mm_mul_pd(b, *(__m128d *) mask), a);
#endif
}
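For comparison, here is a rough sketch of the same idea that loads the constant with a NEON load intrinsic instead of a pointer cast. It is not taken from sse2neon: addsub_pd_sketch is a hypothetical name, it works on plain double arrays rather than __m128d, and the non-aarch64 branch is only a scalar fallback so the snippet builds anywhere. Since vld1q_f64 takes an ordinary pointer, the array needs no alignment attribute here.

```c
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical sketch, not the sse2neon implementation:
 * out[0] = a[0] - b[0], out[1] = a[1] + b[1]. */
void addsub_pd_sketch(const double a[2], const double b[2], double out[2])
{
    /* Aggregate initializer made of constant expressions: valid in C
     * for an object with static storage duration. */
    static const double mask[2] = {-1.0, 1.0};
#if defined(__aarch64__)
    /* vfmaq_f64(x, y, z) computes x + y * z per lane. */
    float64x2_t r = vfmaq_f64(vld1q_f64(a), vld1q_f64(b), vld1q_f64(mask));
    vst1q_f64(out, r);
#else
    /* Scalar fallback, assumed only for building on other targets. */
    out[0] = a[0] + b[0] * mask[0];
    out[1] = a[1] + b[1] * mask[1];
#endif
}
```

Whether this generates better or worse code than the cast is something I have not measured.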
Based on the post offered by @QwertyJack :
In C language, objects with static storage duration have to be initialized with constant expressions,
or with aggregate initializers containing constant expressions.
Removing the static keyword would fix the issue.
Besides, @JishinMaster uses an aggregate initializer to fix the issue.
I wonder what the benefit of the static keyword is here. Avoiding initializing the variable multiple times?
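The rule quoted above can be reproduced without sse2neon at all. The following is a minimal sketch in plain C; vec2d and make_vec2d are hypothetical stand-ins for __m128d and _mm_set_pd and are not part of the library.

```c
/* Hypothetical stand-in for a 128-bit vector of two doubles. */
typedef struct {
    double lane[2];
} vec2d;

/* Hypothetical stand-in for _mm_set_pd: the first argument is the high lane. */
static vec2d make_vec2d(double hi, double lo)
{
    vec2d v;
    v.lane[0] = lo;
    v.lane[1] = hi;
    return v;
}

double sum_mask_lanes(void)
{
    /* Valid C: static object, aggregate initializer of constant expressions. */
    static const vec2d from_constants = {{-1.0, 1.0}};

    /* Rejected by a C compiler (accepted as C++): static object
     * initialized with a function call.
     *
     * static const vec2d bad = make_vec2d(1.0, -1.0);
     */

    /* Valid C: automatic object, initialized at run time on every call. */
    const vec2d from_call = make_vec2d(1.0, -1.0);

    return from_constants.lane[0] + from_constants.lane[1] +
           from_call.lane[0] + from_call.lane[1];
}
```

Uncommenting the middle declaration should bring back the "initializer element is not constant" error when the file is compiled as C.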
How about experimenting with godbolt for some simplified test cases?
I'm using
#ifdef __cplusplus
static
#endif
const __m128d mask = _mm_set_pd(1.0f, -1.0f);
for the time being. It works with both C and C++ and, unlike just removing "static", keeps the optimization at least in C++ code.
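If the same pattern is needed in several functions, the conditional could be folded into a macro. This is only a sketch: STATIC_IF_CXX is a hypothetical name, not something sse2neon defines, and vec2d / make_vec2d are again stand-ins for __m128d / _mm_set_pd so the snippet builds as either C or C++ on any target.

```c
/* Hypothetical helper: expands to "static" only when compiled as C++. */
#ifdef __cplusplus
#define STATIC_IF_CXX static
#else
#define STATIC_IF_CXX
#endif

/* Hypothetical stand-in for __m128d. */
typedef struct {
    double lane[2];
} vec2d;

/* Hypothetical stand-in for _mm_set_pd: the first argument is the high lane. */
static vec2d make_vec2d(double hi, double lo)
{
    vec2d v;
    v.lane[0] = lo;
    v.lane[1] = hi;
    return v;
}

/* Subtract in the low lane, add in the high lane. */
vec2d addsub(vec2d a, vec2d b)
{
    /* C++: initialized once on first call. C: plain const local. */
    STATIC_IF_CXX const vec2d mask = make_vec2d(1.0, -1.0);
    vec2d r;
    r.lane[0] = a.lane[0] + b.lane[0] * mask.lane[0];
    r.lane[1] = a.lane[1] + b.lane[1] * mask.lane[1];
    return r;
}
```

In C the macro expands to nothing, so the mask is rebuilt on every call, which matches the behavior of the #ifdef version above.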
Thanks, @berolinux and @QwertyJack, for the insightful feedback.