Cost of instructions and impact of `vsetvl`

Question

fpedd opened this issue 2 years ago · comments

Fabian Peddinghaus commented 2 years ago

Instruction-level simulators count executed instructions; they do not model how many cycles each instruction takes. As a result, executing `vsetvl` in every loop iteration can look very inefficient in their statistics. That does not reflect real-world implementation costs: in most uArch implementations, `vsetvl` would actually incur very little extra overhead. See this for more info.
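To make the instruction-count view concrete, here is a minimal Python sketch of a strip-mined loop that runs `vsetvli` once per iteration. The loop body mix (`INSTRS_PER_ITER`), the array length, and `vlmax=16` are hypothetical values chosen for illustration, and the `vl = min(remaining, vlmax)` rule is a simplification (the spec allows other choices when the remaining length is between VLMAX and 2*VLMAX).

```python
def strip_mine(n, vlmax):
    """Return the vl granted in each iteration of a strip-mined loop.

    Assumes the simple policy vl = min(remaining, vlmax).
    """
    vls = []
    remaining = n
    while remaining > 0:
        vl = min(remaining, vlmax)  # simplified model of what vsetvli returns
        vls.append(vl)
        remaining -= vl
    return vls


# Hypothetical loop body for an element-wise add:
# 1 vsetvli, 4 vector instructions (2 loads, 1 add, 1 store), 5 scalar ops.
INSTRS_PER_ITER = {"vsetvli": 1, "vector": 4, "scalar": 5}

vls = strip_mine(n=100, vlmax=16)
iterations = len(vls)
total = iterations * sum(INSTRS_PER_ITER.values())
vsetvli_share = iterations * INSTRS_PER_ITER["vsetvli"] / total

print(iterations, total, f"{vsetvli_share:.0%}")  # prints: 7 70 10%
```

In this made-up mix, `vsetvli` accounts for one in ten dynamic instructions, which is how an instruction-level simulator would report it regardless of how cheap the instruction is in hardware.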

Additionally, a `vsetvl` instruction can be fused internally with the vector instruction that follows it into a single vector microop. From the RVV 1.0 spec:

> The primary motivation for the vtype CSR is to allow the vector instruction set to fit into a 32-bit instruction encoding space. A
> separate vset{i}vl{i} instruction can be used to set vl and/or vtype fields before execution of a vector instruction, and
> implementations may choose to fuse these two instructions into a single internal vector microop. In many cases, the vl and vtype
> values can be reused across multiple instructions, reducing the static and dynamic instruction overhead from the vset{i}vl{i}
> instructions. It is anticipated that a future extended 64-bit instruction encoding would allow these fields to be specified statically in
> the instruction encoding.

Finally, when tuning the performance of muRISCV-NN kernels, it is important to weight vector instructions according to their relative cost in actual implementations, rather than counting every instruction as equal. For an actual implementation example with some ballpark numbers, look here.
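As a rough illustration of such weighting, the Python sketch below multiplies per-class instruction counts by relative cycle costs. Both the counts and the entries in `CYCLE_WEIGHTS` are hypothetical placeholders, not measurements of any real core; substitute numbers from the implementation you are targeting.

```python
# Hypothetical relative costs (cycles per instruction) for each class.
# These are placeholders for illustration, not data from a real core.
CYCLE_WEIGHTS = {"vsetvli": 1, "vector_mem": 8, "vector_alu": 4, "scalar": 1}


def weighted_cycles(counts, weights):
    """Sum instruction counts scaled by their per-class cost."""
    return sum(counts[name] * weights[name] for name in counts)


def share(counts, weights, name):
    """Fraction of the weighted total attributed to one instruction class."""
    return counts[name] * weights[name] / weighted_cycles(counts, weights)


# Hypothetical dynamic instruction counts taken from a simulator run.
counts = {"vsetvli": 7, "vector_mem": 21, "vector_alu": 7, "scalar": 35}

# Unit weights reproduce the plain instruction-count view.
unit_weights = {name: 1 for name in counts}

by_count = share(counts, unit_weights, "vsetvli")
by_cycles = share(counts, CYCLE_WEIGHTS, "vsetvli")

print(f"share by instruction count: {by_count:.1%}")
print(f"share by weighted cycles:   {by_cycles:.1%}")
```

With these assumed weights, the `vsetvli` share drops from 10% of instructions to about 3% of estimated cycles. An implementation that fuses `vsetvli` with the following vector instruction could be approximated by setting its weight to 0.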