Question about riscv32B and riscv64B performance improvement.

lico614257 opened this issue a year ago

I have a question about the performance impact of adding the B extension to RV32. In my experiments, compiling CoreMark with rv32gcb instead of rv32gc gave only a 2% increase in CoreMark runs. However, compiling CoreMark with rv64gcb instead of rv64gc gave a 10% increase.
Is this a normal phenomenon? Can it be argued that adding the B extension does not improve performance much on 32-bit?

stnolting · Answer 1 · Mon May 29 2023 18:26:21 GMT+0800 (China Standard Time)

I think this depends heavily on the actual application / program being run (regardless of whether it is a 32-bit or 64-bit architecture). Software that makes extensive use of bit-field manipulation or string operations can surely benefit (in terms of performance) from the bit-manip extension.
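As a rough illustration of the kind of code that can benefit, here is a minimal C sketch of a population count written two ways. The function names are hypothetical and not taken from any benchmark. Whether a compiler actually lowers either form to the Zbb `cpop` instruction depends on the `-march` string and the compiler version, so treat that as an assumption to verify in the generated assembly.

```c
#include <stdint.h>

/* Portable population count: one loop iteration per set bit.
   (Hypothetical helper for illustration only.) */
static unsigned popcount_loop(uint32_t x) {
    unsigned n = 0;
    while (x) {
        x &= x - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* GCC/Clang builtin. With Zbb enabled, a compiler may lower this
   to a single cpop; without Zbb it falls back to a software sequence. */
static unsigned popcount_builtin(uint32_t x) {
    return (unsigned)__builtin_popcount(x);
}
```

A program dominated by operations like this is where the bit-manip extension would be expected to pay off, on RV32 as much as on RV64.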

Philipp Tomsich · Answer 2 · Mon May 29 2023 18:59:46 GMT+0800 (China Standard Time)

Depending on the application, the mix of Zb* instructions involved will change.

For Coremark, I can point to the following off the cuff:

- Coremark operates on 32-bit-indexed arrays, so the Zba extension will help to alleviate penalties from the 64-bit addressing on RV64.
- 32-bit and 16-bit extensions are quite prevalent in Coremark (as it operates on 32-bit and 16-bit datatypes), which should map to Zba and Zbb instructions. However, whether each of these translates into an improvement depends on your microarchitecture (as the underlying instruction patterns are very easy to fuse).
- The CRC calculation can benefit significantly from Zbc (as we have demonstrated a while back), although the necessary compiler enablement has not been completed in the community yet.
- As there are no string operations and no byte-swapping, neither orc.b nor rev8 comes into play.
- The bitfield-extraction cases in Coremark (e.g., cmp_idx and the entire 'dtype' logic in calc_func) are not covered by Zb*. Experiments with T-Head's XTheadBb have shown that a more general instruction (their th.ext and th.extu) can be easily applied to these in our compilers.
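The source patterns behind several of these points can be sketched in C as follows. This is not Coremark code; all function names are hypothetical, and the instruction named in each comment is only a candidate that a compiler may or may not select for a given `-march`.

```c
#include <stdint.h>

/* 32-bit unsigned index into an array of 32-bit elements.
   On RV64 the index has to be zero-extended and scaled by 4 before
   being added to the 64-bit base; Zba's sh2add.uw is a candidate for
   doing that in one instruction. On RV32 no zero-extension is needed. */
static int32_t load_indexed(const int32_t *base, uint32_t idx) {
    return base[idx];
}

/* Sign-extend the low 16 bits: a candidate for Zbb's sext.h.
   Without Zbb this is typically a shift-left / arithmetic-shift-right pair. */
static int32_t sext16(uint32_t x) {
    return (int32_t)((x & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Zero-extend the low 16 bits: a candidate for Zbb's zext.h. */
static uint32_t zext16(uint32_t x) {
    return x & 0xFFFFu;
}

/* Generic bit-field extract of 'width' bits starting at bit 'lsb'
   (requires 0 < width < 32 and lsb + width <= 32). Zb* has no single
   instruction for this general case, so it usually takes two. */
static uint32_t extract_bits(uint32_t x, unsigned lsb, unsigned width) {
    return (x >> lsb) & ((1u << width) - 1u);
}
```

Comparing the disassembly of functions like these with and without the B extension, on both RV32 and RV64, is one way to see which of the points above actually applies to a given toolchain.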

To answer your original question: if you have a workload that heavily depends on Zb* instructions in RV32 (e.g., makes heavy use of rev8, single-bit instructions, the XLEN-variants of Zba or carryless multiply), you will see a large win with Zb*; if your application mainly shows benefits from the 64-bit-only (.uw) variants of Zba, then you will see only a small gain.
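For reference, the operations named above look roughly like this in portable C. This is a sketch with hypothetical helper names; the instruction mentioned in each comment is the one that performs the same operation in hardware, and whether a compiler recognizes the C pattern and emits it is an assumption to check.

```c
#include <stdint.h>

/* Byte swap of a 32-bit word: the operation rev8 performs on RV32. */
static uint32_t bswap32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Single-bit set and test: the operations of Zbs's bset and bext.
   The bit index is masked to 0..31, as the instructions do for XLEN=32. */
static uint32_t set_bit(uint32_t x, unsigned n) {
    return x | (1u << (n & 31u));
}
static uint32_t test_bit(uint32_t x, unsigned n) {
    return (x >> (n & 31u)) & 1u;
}

/* Software carry-less multiply (low 32 bits of the product):
   the operation Zbc's clmul performs in one instruction on RV32. */
static uint32_t clmul32(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (unsigned i = 0; i < 32; i++) {
        if ((b >> i) & 1u) {
            r ^= a << i;  /* XOR instead of add: no carries propagate */
        }
    }
    return r;
}
```

None of these depend on the 64-bit-only `.uw` forms, which is why a workload built around them would be expected to gain on RV32 as well.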

I hope this provides some context.

lico_caesar · Answer 3 · Tue May 30 2023 09:55:31 GMT+0800 (China Standard Time)

Thanks a lot.