riscv / riscv-bitmanip

Working draft of the proposed RISC-V Bitmanipulation extension

Home Page:https://jira.riscv.org/browse/RVG-122

CLMULR is inconsistent between v0.93 and v1.0

mohanson opened this issue · comments

I noticed an inconsistency in CLMULR. In v0.93, clmulr loops from 0 to xlen, while in v1.0.0 it loops from 0 to xlen-1. I am not sure whether this is an intentional change or a mistake, so I created this issue.

v0.93

uint_xlen_t clmulr(uint_xlen_t rs1, uint_xlen_t rs2)
{
  uint_xlen_t x = 0;
  for (int i = 0; i < XLEN; i++)
    if ((rs2 >> i) & 1)
      x ^= rs1 >> (XLEN-i-1);
  return x;
}

v1.0.0

let rs1_val = X(rs1);
let rs2_val = X(rs2);
let output : xlenbits = 0;
foreach (i from 0 to (xlen - 1) by 1) {
  output = if ((rs2_val >> i) & 1)
    then output ^ (rs1_val >> (xlen - i - 1));
    else output;
}
X[rd] = output

Looks correct, given that the C form was < and not <=.
This may be an opportunity to rewrite as downto in SAIL, though...
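
To make the bound comparison concrete, here is a minimal C sketch with XLEN fixed at 32. It assumes, as the reply above does, that the Sail `from 0 to (xlen - 1)` range includes its upper bound. The names `clmulr_lt` and `clmulr_le` are made up for this illustration and do not come from the spec.

```c
#include <stdint.h>

#define XLEN 32

/* v0.93 reference form: exclusive bound, so i runs over 0 .. XLEN-1. */
uint32_t clmulr_lt(uint32_t rs1, uint32_t rs2)
{
    uint32_t x = 0;
    for (int i = 0; i < XLEN; i++)
        if ((rs2 >> i) & 1)
            x ^= rs1 >> (XLEN - i - 1);
    return x;
}

/* Same loop with an inclusive bound, mirroring "from 0 to (xlen - 1)"
 * under the assumption that Sail's "to" is inclusive. */
uint32_t clmulr_le(uint32_t rs1, uint32_t rs2)
{
    uint32_t x = 0;
    for (int i = 0; i <= XLEN - 1; i++)
        if ((rs2 >> i) & 1)
            x ^= rs1 >> (XLEN - i - 1);
    return x;
}
```

Both functions visit the same 32 indices, so they agree on every input. The shift amount `XLEN - i - 1` stays within 0 .. 31 in both.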

But if you look at CLMUL (not CLMULR), you will find that foreach (i from 0 to xlen by 1) seems to be equivalent to for (int i = 0; i < XLEN; i++).

v0.93

uint_xlen_t clmul(uint_xlen_t rs1, uint_xlen_t rs2)
{
  uint_xlen_t x = 0;
  for (int i = 0; i < XLEN; i++)
    if ((rs2 >> i) & 1)
      x ^= rs1 << i;
  return x;
}

v1.0.0

let rs1_val = X(rs1);
let rs2_val = X(rs2);
let output : xlenbits = 0;
foreach (i from 0 to xlen by 1) {
  output = if ((rs2_val >> i) & 1)
    then output ^ (rs1_val << i);
    else output;
}
X[rd] = output
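
One possible reading of why the two clmul forms can still agree: if the Sail range is inclusive, the `0 to xlen` loop runs one extra iteration at i = xlen, and that iteration tests a bit of rs2 that does not exist. The sketch below models this in C with XLEN = 32, using 64-bit temporaries so that every shift is defined. The function names are hypothetical, and the claim that Sail yields zero when an xlen-bit value is shifted right by xlen is an assumption of this model, not something verified against the Sail tooling.

```c
#include <stdint.h>

#define XLEN 32

/* clmul with i = 0 .. XLEN-1, as in the v0.93 C reference. */
uint32_t clmul_ref(uint32_t rs1, uint32_t rs2)
{
    uint32_t x = 0;
    for (int i = 0; i < XLEN; i++)
        if ((rs2 >> i) & 1)
            x ^= rs1 << i;
    return x;
}

/* Same loop run through i = XLEN inclusive. The operands are widened
 * to 64 bits so the shift by XLEN is well defined in C; the result is
 * truncated back to XLEN bits. */
uint32_t clmul_inclusive_xlen(uint32_t rs1, uint32_t rs2)
{
    uint64_t a = rs1, b = rs2, x = 0;
    for (int i = 0; i <= XLEN; i++)
        if ((b >> i) & 1)   /* always 0 at i == XLEN */
            x ^= a << i;
    return (uint32_t)x;
}
```

Under these assumptions the extra iteration never fires, so both functions return the same value. The same trick does not carry over to clmulr, where i = xlen would make the shift amount `xlen - i - 1` negative.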

It also looks like CLMULR has an inconsistent description:
"clmulr produces bits 2✕XLEN−2:XLEN-1 of the 2✕XLEN carry-less product. That means clmulh is equivalent to clmulr followed by a 1-bit right shift"
A right shift of the clmulh result by one would give bits 2✕XLEN−2:XLEN+1, since clmulh produces bits 2✕XLEN−1:XLEN and bit 2✕XLEN−1 is always 0.

It is the clmulr result that is being shifted to produce clmulh. The clmulh result is not being shifted.
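
A small C sketch of that relationship, for XLEN = 32: it builds the full 64-bit carry-less product and slices out the bit ranges quoted above. The helper names (`clmul_full`, `clmulh32`, `clmulr32`, `clmulr_loop`) are invented for the illustration.

```c
#include <stdint.h>

#define XLEN 32

/* Full 2*XLEN-bit carry-less product, held in 64 bits. */
uint64_t clmul_full(uint32_t rs1, uint32_t rs2)
{
    uint64_t x = 0;
    for (int i = 0; i < XLEN; i++)
        if ((rs2 >> i) & 1)
            x ^= (uint64_t)rs1 << i;
    return x;
}

/* Bits 2*XLEN-1 : XLEN of the product. */
uint32_t clmulh32(uint32_t rs1, uint32_t rs2)
{
    return (uint32_t)(clmul_full(rs1, rs2) >> XLEN);
}

/* Bits 2*XLEN-2 : XLEN-1 of the product. */
uint32_t clmulr32(uint32_t rs1, uint32_t rs2)
{
    return (uint32_t)(clmul_full(rs1, rs2) >> (XLEN - 1));
}

/* The v0.93 loop form of clmulr, for comparison with the slice above. */
uint32_t clmulr_loop(uint32_t rs1, uint32_t rs2)
{
    uint32_t x = 0;
    for (int i = 0; i < XLEN; i++)
        if ((rs2 >> i) & 1)
            x ^= rs1 >> (XLEN - i - 1);
    return x;
}
```

Because bit 2✕XLEN−1 of the product is always 0, `clmulr32(a, b) >> 1` and `clmulh32(a, b)` come out equal, which is the sense in which clmulh is clmulr followed by a 1-bit right shift.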

I see. That wasn't clear. Thanks

It looks like the CLMULR description contains a mistake in Version 1.0.0-38-g865e7a7, 2021-06-28: Release candidate:

let rs1_val = X(rs1);
let rs2_val = X(rs2);
let output : xlenbits = 0;
foreach (i from 0 to **(xlen - 1)** by 1) {
  output = if ((rs2_val >> i) & 1)
    then output ^ (rs1_val >> (xlen - i - 1));
    else output;
}
X[rd] = output

This is equivalent to for (int i = 0; i < XLEN-1; i++), because for the preceding clmul and clmulh instructions the variable i is always bounded with <, not <=.
It should be foreach (i from 0 to xlen by 1).
Let's unroll this loop:
image

You can see that clmulr is then not equivalent to just reversing the inputs and outputs, as the specification presents it:
image

So let's try to unroll with correct foreach:
let rs1_val = X(rs1);
let rs2_val = X(rs2);
let output : xlenbits = 0;
foreach (i from 0 to xlen by 1) {
  output = if ((rs2_val >> i) & 1)
    then output ^ (rs1_val >> (xlen - i - 1));
    else output;
}
X[rd] = output

image

Now it seems to be correct:
clmulr is really just clmul with reversed inputs and output, and clmulh is clmulr shifted right by 1.
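
The reversal property can be checked in C independently of how the Sail bound is read. The sketch below, for XLEN = 32, takes the number of loop iterations as a parameter: 32 iterations visit i = 0 .. 31, while 31 iterations stop at i = 30. The helpers `rev32`, `clmul32` and `clmulr32_iters` are hypothetical names for this illustration. It only shows which index set satisfies the identity; it does not settle whether Sail's `to` is inclusive.

```c
#include <stdint.h>

#define XLEN 32

/* Reverse the bit order of a 32-bit value. */
uint32_t rev32(uint32_t v)
{
    uint32_t r = 0;
    for (int i = 0; i < XLEN; i++)
        if ((v >> i) & 1)
            r |= (uint32_t)1 << (XLEN - 1 - i);
    return r;
}

/* Low XLEN bits of the carry-less product (v0.93 clmul). */
uint32_t clmul32(uint32_t rs1, uint32_t rs2)
{
    uint32_t x = 0;
    for (int i = 0; i < XLEN; i++)
        if ((rs2 >> i) & 1)
            x ^= rs1 << i;
    return x;
}

/* clmulr loop with a configurable iteration count:
 * iters = XLEN visits i = 0 .. XLEN-1, iters = XLEN-1 stops at XLEN-2. */
uint32_t clmulr32_iters(uint32_t rs1, uint32_t rs2, int iters)
{
    uint32_t x = 0;
    for (int i = 0; i < iters; i++)
        if ((rs2 >> i) & 1)
            x ^= rs1 >> (XLEN - i - 1);
    return x;
}
```

With a = 0xdeadbeef and b = 0xcafebabe (top bit of b set), `clmulr32_iters(a, b, 32)` equals `rev32(clmul32(rev32(a), rev32(b)))`, while `clmulr32_iters(a, b, 31)` does not, because the dropped i = 31 term would have XORed in all of `a`. So the identity needs all XLEN indices 0 .. XLEN-1, however the loop bound is spelled.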

I'd like to bump this question since I observed the same inconsistency as @PaulPoperechny:
The Note section in 1.0.0-38-g865e7a7/current main makes me believe that the loop in the specification should go one element further.

With the current pseudocode, the statements "clmulr is clmul with input and output reversed" (note in the spec) and "clmulh is clmulr shifted to right by 1" (comment 5 here by jim-wilson) do not hold.
(clmul uses 0 to xlen, clmulr 0 to xlen-1)

And while I understand that both are not normative, both:

also use bound 0 to xlen