DLTcollab / sse2neon

A translator from Intel SSE intrinsics to Arm/Aarch64 NEON implementation

Relaxed type casting

aqrit opened this issue

From RenderKit/embree#410

I haven't checked with a compiler but these lines (from 72daa0f) seem lax:

sse2neon.h:580:    return vget_lane_u32((uint32x2_t) o, 0); /// uint16_t
sse2neon.h:2486:    return vsetq_lane_f32(vgetq_lane_f32(_mm_rsqrt_ps(in), 0), in, 0); /// __m128
sse2neon.h:3151:    return vld1q_f32((float32_t *) c); /// __m128d
sse2neon.h:3173:    return vld1q_f32((float32_t *) c); /// __m128d
sse2neon.h:3303:    return (__m128i) vrhaddq_u16(vreinterpretq_u16_m128i(a), /// C-style cast
sse2neon.h:4540:    return vld1q_f32((float32_t *) c); /// __m128d
sse2neon.h:5025:    return vgetq_lane_u64(high_bits, 0) | (vgetq_lane_u64(high_bits, 1) << 1); /// int
sse2neon.h:5078:    return vld1q_f32((float32_t *) c); /// __m128d
sse2neon.h:5324:    return (__m128i) vld1q_s8(data); /// C-style cast
sse2neon.h:5481:    return (__m128i) vld1q_s8(data); /// C-style cast
sse2neon.h:5868:    return (__m128i) vshlq_s16((int16x8_t) a, vdupq_n_s16(-count)); /// C-style cast
sse2neon.h:6306:    return vld1q_f32((float32_t *) c); /// __m128d
sse2neon.h:7011:    return vreinterpretq_s64_s16( /// __m128i
sse2neon.h:7034:    return vreinterpret_s64_s16(vqadd_s16(vuzp1_s16(a, b), vuzp2_s16(a, b))); /// __m64
sse2neon.h:7037:    return vreinterpret_s64_s16(vqadd_s16(res.val[0], res.val[1])); /// __m64

Recent versions of embree enforce the -flax-vector-conversions option, which allows implicit conversions between vectors with differing numbers of elements and/or incompatible element types. According to the GCC manual, this option should not be used for new code.
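
To illustrate what a strict conversion looks like, here is a minimal sketch. It does not use NEON or sse2neon.h. Instead it assumes the GCC/Clang vector_size extension so it builds on any target. The type names f32x4 and i32x4 and both helper functions are hypothetical and exist only for this example. The idea is that a change of element type goes through one named reinterpretation helper (similar in spirit to the vreinterpretq_* intrinsics) rather than through an implicit conversion that only compiles when lax vector conversions are enabled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for NEON vector types, built on the GCC/Clang
 * vector_size extension (an assumption; not taken from sse2neon.h). */
typedef float   f32x4 __attribute__((vector_size(16)));
typedef int32_t i32x4 __attribute__((vector_size(16)));

/* Explicit bit-pattern reinterpretation from four floats to four int32s.
 * memcpy between objects of equal size is well defined, and compilers
 * typically lower it to a plain register move. */
static inline i32x4 reinterpret_i32x4_f32x4(f32x4 v)
{
    i32x4 out;
    memcpy(&out, &v, sizeof out);
    return out;
}

/* Returns the raw bits of lane 0.
 * Writing `i32x4 bits = v;` instead would rely on an implicit conversion
 * between incompatible element types, which a compiler only accepts when
 * lax vector conversions are in effect. */
static inline uint32_t first_lane_bits(f32x4 v)
{
    i32x4 bits = reinterpret_i32x4_f32x4(v);
    return (uint32_t) bits[0];
}

/* Small self-check: 1.0f has the IEEE 754 bit pattern 0x3F800000. */
static inline int first_lane_bits_selfcheck(void)
{
    f32x4 v = {1.0f, 2.0f, 3.0f, 4.0f};
    assert(first_lane_bits(v) == 0x3F800000u);
    return 1;
}
```

In the flagged lines above, the analogous fix would presumably be to route each return value through the matching vreinterpret helper for the declared return type, though that needs to be checked against a compiler.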

Duplicate of #614