riscv / riscv-bitmanip

Working draft of the proposed RISC-V Bitmanipulation extension

Home Page:https://jira.riscv.org/browse/RVG-122

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

review of the 0.94 60 doc

jim-wilson opened this issue · comments

On page 3, third paragraph has "trial function" which I suspect should be "trivial function".

On page 5, the table has a rev8_64 instruction that should just be rev8.

On page 8, section 1.2.5 sign- and zero-extension, the software hint says "and for the zero-extension off 16 bit and 32 bit quantities". However, there is no 32-bit zero-extend instruction in zbb. There is one in zba, which is an alias for add.uw, but this is not the zba doc section. I consider this a spec bug that the 32-bit zero-extend instruction is missing from zbb. But if we don't change this, then there should not be a 32-bit zero-extend mention here unless it explicitly mentions add.uw in zba. Also, the "off" should be "of".

On page 9, in the architecture explanation for 1.2.6 bitwise rotation, the text in the shaded box is garbled. Apparently some explanation of why we don't have roli but too garbled for me to figure out how to fix it.

On page 9, 1.2.8 byte-reverse, the table has &#; instead of a checkmark in the rv32 column.

On page 11, add.uw, should mention in a note that this is the canonical zext.w.

On page 13, the header for this bclr page is broken. It says "#insns-bclr,reftext="Single-Bit Clear (Register)"] === bclr". It should say "2.3. bclr". The next section bclri is numbered 2.3 when it should be 2.4. And all 2.x sections after that have the wrong number.

There are some consistency issues, e.g. has both bitmanip and bit-manip, and both subextension and sub-extension. There probably should be only one correct spelling for both. The first table has X in the boxes. The following tables use checkmarks.

It would be nice to know what language the code in the Operation boxes is written in. There are a lot of similar looking languages, and it isn't obvious to me which one is being used here.

As mentioned in another issue, the rori instruction has different encodings for rv32 and rv64 which looks like a bug.

The lack of a zext.w instruction was also noticed in the discussion for the code reduction extension where they will define a C.zext.w rd/rs1' instruction.

I know it is very late in the process, but if the "spec bug" of not having zext.w in zbb is to be repaired, doesn't it make more sense to have slli.uw rd rs1 imm in zbb with zext.w rd rs1 an pseudo instruction for slli.uw rd rs1 0 ?

An important (main?) point about a zext.w instruction is to support p + n and x[n] == *(&x +n) with n an unsigned int and p a pointer resp x an array. Hence a zext.w would be followed by a slli in many cases. In particular if sizeof(*p) != 1 then the combination
// struct foo{..}; // standin for any datatype with sizeof > 1.
//struct foo *p, *q;
//unsigned int n;
//
//q = p + n;
// q --> rd; n --> rs1; p --> rs2

would translate to

zext.w rd rs1 # can use C.zext if it is defined and rd == rs1 and rd = x8... x15, but should be uncommon.
C.slli rd, LOG2(sizeof(*p))
C.add rd, rs2

which is only a marginal improvement over

slli rd rs1, 32; // can use C.slli if rd == rs1, but should be uncommon.
C.srli rd, (32 - LOG2(sizeof(*p)); // have to assume rd = x8...x15 to use C
C.add rd, rs2

whereas if slli.uw is available it can be just

slli.uw rd, rs1, LOG2(sizeof(*p))
C.add rd, rs2

I understood that the point of splitting of Zba from Zbb is because there was concern that a "fused"

sh3add rd, rs1, rs2 # fuses slli rd, rs1, 3; C.add rd, rs2

or

sh3add.uw rd, rs1, rs2 # fuses slli.uw rd, rs1, 3; C.add rd, rs2

would slow down high performance implementations. However, it seems there is no such concern for slli.uw and it should be a cheap variation of slli so it seems unproblematic to add it to Zbb.

Also I understood that an immediate instruction is slightly preferred as a pseudo extension over one with an instruction with rs2 = zero so

zext.uw rd, rs1 --> slli.uw rd, rs1, 0

should be slightly preferred over

zext.uw rd, rs1 --> add.uw rd, rs1, zero.

zext.h is an alias for pack, but we did not add the entire pack instruction to zbb because the point isn't to add pack, bur rather the missing zero extend instruction.

Similarly, zext.w is an alias for add.uw, but we should not add the entire add.uw or slli.uw to zbb, because the point here is to add the missing zero extend instruction for consistency, not to add yet another general purpose instruction to zbb.

I don't care whether zext.w is an alias for add.uw or slli.uw.

If you want slli.uw for performance, then you should implement both zba and zbb, and not try to force slli.uw into zbb. I would highly encourage everyone to implement both subsets, as they are both very useful to the compiler.

I fully agree that Zba is very useful, and people should just implement both Zbb and Zba. That would make the whole issue basically moot.

However, I just wanted to point out that if you want to fix the Zbb spec bug of not having zext.w you might as well add slli.uw at essentially zero extra cost compared to zext.w while being more useful for the reason described above and not raising the concerns people have with Zba for high performance implementations (which I can't really judge). In the case of pack vs zext.h you can at least reasonably argue that there is potentially an extra implementation cost and no directly obvious compiler benefit.