DLTcollab / sse2neon

A translator from Intel SSE intrinsics to Arm/Aarch64 NEON implementation

Optimize _mm_round_pd within 2^52 only

howjmay opened this issue · comments

When rounding doubles in round-to-nearest mode, the magic number 2^52 can be used to round any value whose magnitude lies within that range. Should we replace the current plain-C implementation for ARMv7 with a magic-number version? It does, however, discard quite a lot of the valid range.

We can refer to the implementation here:

https://github.com/numpy/numpy/blob/main/numpy/core/src/common/simd/neon/math.h#L285-L299
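
For context, the magic-number idea can be sketched in scalar C roughly as below. This is only an illustration, not the sse2neon or NumPy code: the helper name `round_nearest_magic` is hypothetical, and the sketch assumes IEEE 754 doubles, the default round-to-nearest-even floating-point mode, and a compiler that does not reassociate the add/subtract pair (no `-ffast-math`).

```c
#include <math.h>

/* Hypothetical helper for illustration; not part of sse2neon. */
static double round_nearest_magic(double x)
{
    /* 2^52: from this magnitude on, adjacent doubles are 1.0 apart. */
    const double magic = 4503599627370496.0;

    /* Values with |x| >= 2^52 are already integral; NaN also
     * fails the comparison and is returned unchanged. */
    if (!(fabs(x) < magic))
        return x;

    /* Adding 2^52 pushes the fraction bits out of the mantissa, so the
     * FPU rounds to an integer using the current rounding mode.
     * volatile keeps the compiler from folding the two operations. */
    volatile double t = fabs(x) + magic;
    double r = t - magic;

    /* Restore the sign (this also preserves -0.0 for small negatives). */
    return copysign(r, x);
}
```

Under these assumptions, ties go to the even integer, so `round_nearest_magic(2.5)` gives `2.0` and `round_nearest_magic(3.5)` gives `4.0`. Inputs with magnitude of 2^52 or more bypass the trick, which is the range limitation discussed in this issue.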

Can you show some evaluation of the error rate? Any progress?

I may close this proposal, since it is a limited solution that works only under truncation, and losing 12 bits of information is quite a lot.