_mm_movemask_epi8 regression

Question

jacksonrnewhouse opened this issue 4 years ago

The aarch64 code path for _mm_movemask_epi8 introduced in #50 looks to be a regression when you actually compile it. The default behavior compiles to 7 instructions with no constants, while the "fast path" is 14 instructions plus a constant. Should it be reverted?

fast path: https://godbolt.org/z/41s54d
default: https://godbolt.org/z/xsYfz8
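For readers unfamiliar with the intrinsic, here is a minimal scalar sketch of what _mm_movemask_epi8 computes: bit i of the result is the most significant bit of byte i of the 128-bit input. The function name movemask_epi8_scalar is hypothetical and is not either of the code paths discussed above; it can serve as a reference when checking that two implementations agree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for the semantics of _mm_movemask_epi8.
 * Hypothetical helper for illustration; not taken from the library. */
static int movemask_epi8_scalar(const uint8_t bytes[16])
{
    assert(bytes != NULL);
    int mask = 0;
    for (size_t i = 0; i < 16; i++) {
        /* Take the top bit of byte i and place it at bit position i. */
        mask |= (bytes[i] >> 7) << i;
    }
    return mask;
}
```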

marktwtn · Answer 1 · Thu Dec 03 2020 15:41:18 GMT+0800 (China Standard Time)

I'll run a simple timing experiment on _mm_movemask_epi8.

marktwtn · Answer 2 · Mon Jan 18 2021 16:43:53 GMT+0800 (China Standard Time)

I ran the experiment on an ARMv8-A CPU (64-bit ARM) with the code compiled at optimization level 0.

It turns out that the aarch64 code path does perform worse.
We should revert it for performance reasons.
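A timing experiment of this kind could be sketched roughly as follows. This is not the harness that was actually used; the names time_movemask and movemask_standin are hypothetical, and the stand-in function is a plain scalar loop that you would swap for the implementation under test. The volatile sink is there so that the compiler cannot drop the calls at higher optimization levels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef int (*movemask_fn)(const uint8_t bytes[16]);

/* Hypothetical stand-in; replace with the code path being measured. */
static int movemask_standin(const uint8_t bytes[16])
{
    int mask = 0;
    for (size_t i = 0; i < 16; i++) {
        mask |= (bytes[i] >> 7) << i;
    }
    return mask;
}

/* Returns the CPU time in seconds spent calling fn `iterations` times. */
static double time_movemask(movemask_fn fn, const uint8_t bytes[16],
                            long iterations)
{
    assert(fn != NULL && iterations > 0);
    volatile int sink = 0; /* keeps the calls from being optimized away */
    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        sink ^= fn(bytes);
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Running the same harness against each code path, once per optimization level, gives numbers that can be compared directly.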

marktwtn · Answer 3 · Thu Jan 21 2021 15:17:42 GMT+0800 (China Standard Time)

I also measured the performance at optimization level 3.

At that level, the two code paths show little difference.
@jserv I think we need to decide which optimization level we should focus on for future performance improvements.