CLMULR is inconsistent between v0.93 and v1.0

Question


mohanson opened this issue 3 years ago

I noticed an inconsistency in CLMULR. In v0.93, clmulr loops from 0 to XLEN, while in v1.0.0 it loops from 0 to xlen-1. I am not sure whether this is an upgrade or a mistake, so I opened this issue.

v0.93

uint_xlen_t clmulr(uint_xlen_t rs1, uint_xlen_t rs2)
{
  uint_xlen_t x = 0;
  for (int i = 0; i < XLEN; i++)
    if ((rs2 >> i) & 1)
      x ^= rs1 >> (XLEN-i-1);
  return x;
}

v1.0.0

let rs1_val = X(rs1);
let rs2_val = X(rs2);
let output : xlenbits = 0;
foreach (i from 0 to (xlen - 1) by 1) {
  output = if ((rs2_val >> i) & 1)
    then output ^ (rs1_val >> (xlen - i - 1));
    else output;
}
X[rd] = output

Philipp Tomsich · Answer 1 · Tue Jun 22 2021 20:10:34 GMT+0800 (China Standard Time)

Looks correct, given that the C form was < and not <=.
This may be an opportunity to rewrite as downto in SAIL, though...

Mohanson · Answer 2 · Tue Jun 22 2021 22:38:43 GMT+0800 (China Standard Time)

But if you look at CLMUL (not CLMULR), you will find that foreach (i from 0 to xlen by 1) seems to be equivalent to for (int i = 0; i < XLEN; i++).

v0.93

uint_xlen_t clmul(uint_xlen_t rs1, uint_xlen_t rs2)
{
  uint_xlen_t x = 0;
  for (int i = 0; i < XLEN; i++)
    if ((rs2 >> i) & 1)
      x ^= rs1 << i;
  return x;
}

v1.0.0

let rs1_val = X(rs1);
let rs2_val = X(rs2);
let output : xlenbits = 0;
foreach (i from 0 to xlen by 1) {
  output = if ((rs2_val >> i) & 1)
  then output ^ (rs1_val << i);
  else output;
}
X[rd] = output

Anna Slobodova · Answer 3 · Wed Jul 07 2021 04:20:03 GMT+0800 (China Standard Time)

It also looks like CLMULR has an inconsistent description:
"clmulr produces bits 2✕XLEN−2:XLEN−1 of the 2✕XLEN carry-less product. That means clmulh is equivalent to clmulr followed by a 1-bit right shift."
Right-shifting the clmulh result by one gives bits 2✕XLEN−2:XLEN+1, since clmulh produces bits 2✕XLEN−1:XLEN and bit 2✕XLEN−1 is always 0.
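The following derivation writes N for XLEN, uses P for the 2N-bit carry-less product of rs1 and rs2, and indexes bits from 0. This notation is chosen for illustration and is not the spec's. The bit ranges in the quoted note determine which result gets shifted:

```latex
\begin{aligned}
P_k &= \bigoplus_{\substack{i+j=k \\ 0 \le i,j \le N-1}} \mathrm{rs1}_j \, \mathrm{rs2}_i,
  \qquad 0 \le k \le 2N-1, \qquad P_{2N-1} = 0 \\
\mathrm{clmul}_m &= P_m, \qquad \mathrm{clmulh}_m = P_{m+N}, \qquad
  \mathrm{clmulr}_m = P_{m+N-1}, \qquad 0 \le m \le N-1 \\
(\mathrm{clmulr} \gg 1)_m &=
  \begin{cases}
    \mathrm{clmulr}_{m+1} = P_{m+N}, & 0 \le m \le N-2 \\
    0 = P_{2N-1}, & m = N-1
  \end{cases} \\
  &= \mathrm{clmulh}_m
\end{aligned}
```

So the note holds when the clmulr result is shifted right. Shifting clmulh instead drops bit N of the product.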

Jim Wilson · Answer 4 · Wed Jul 07 2021 04:29:23 GMT+0800 (China Standard Time)

It is the clmulr result that is being shifted to produce clmulh. The clmulh result is not being shifted.

Anna Slobodova · Answer 5 · Wed Jul 07 2021 23:32:33 GMT+0800 (China Standard Time)

I see. That wasn't clear. Thanks

PaulPoperechny · Answer 6 · Mon Nov 14 2022 15:09:03 GMT+0800 (China Standard Time)

It looks like the CLMULR description contains a mistake in Version 1.0.0-38-g865e7a7, 2021-06-28: Release candidate:

let rs1_val = X(rs1);
let rs2_val = X(rs2);
let output : xlenbits = 0;
foreach (i from 0 to (xlen - 1) by 1) {
  output = if ((rs2_val >> i) & 1)
    then output ^ (rs1_val >> (xlen - i - 1));
    else output;
}
X[rd] = output

This is equivalent to for (int i = 0; i < XLEN-1; i++), because for the preceding clmul and clmulh instructions the loop variable i is always bounded with < rather than <=.
It should be foreach (i from 0 to xlen by 1).
Let's unroll this loop:

You can see that clmulr is not equivalent to simply reversing the inputs and output, as the specification presents it:

So let's try unrolling with the corrected foreach:
let rs1_val = X(rs1);
let rs2_val = X(rs2);
let output : xlenbits = 0;
foreach (i from 0 to xlen by 1) {
  output = if ((rs2_val >> i) & 1)
    then output ^ (rs1_val >> (xlen - i - 1));
    else output;
}
X[rd] = output

Now it seems to be correct:
clmulr really is just clmul with reversed inputs and output, and clmulh is clmulr shifted right by 1.

Andreas Wallner · Answer 7 · Mon Mar 04 2024 02:32:02 GMT+0800 (China Standard Time)

I'd like to bump this question, since I observed the same inconsistency as @PaulPoperechny:
The Note section in 1.0.0-38-g865e7a7 / current main leads me to believe that the loop in the specification should run one iteration further.

With the current pseudocode, the statements "clmulr is clmul with input and output reversed" (the note in the spec) and "clmulh is clmulr shifted right by 1" (comment 5 here, by jim-wilson) are not true
(clmul uses 0 to xlen, clmulr uses 0 to xlen-1).

And while I understand that neither of these is normative, both:

also use bound 0 to xlen