ARM64 asm bug in gfpReduce() method of mul_arm64.h

Question

ARM64 asm bug in gfpReduce() method of mul_arm64.h

emmansun opened this issue 2 years ago · comments

	\ // Our output is R21:R22:R23:R24. Reduce mod p if necessary.
	SUBS R5, R21, R10 \
	SBCS R6, R22, R11 \
	SBCS R7, R23, R12 \
	SBCS R8, R24, R13 \
	SBCS $0, R0, R0 \                    // missing this line
	\
	CSEL CS, R10, R21, R1 \
	CSEL CS, R11, R22, R2 \
	CSEL CS, R12, R23, R3 \
	CSEL CS, R13, R24, R4

Sun Yimin · Answer 1 · Wed Aug 03 2022 15:15:43 GMT+0800 (China Standard Time)

Hi @Bren2010 , @armfazh , Could you kindly help to review it? Thx!

Armando Faz · Answer 2 · Wed Aug 03 2022 15:24:28 GMT+0800 (China Standard Time)

Hi @emmansun , do you have inputs that trigger this error? do you mind sharing with us.
I will take a look on your PR.

Sun Yimin · Answer 3 · Wed Aug 03 2022 15:53:48 GMT+0800 (China Standard Time)

Thx @armfazh , I tested below code, and it's failed.

func bigFromHex(s string) *big.Int {
	b, ok := new(big.Int).SetString(s, 16)
	if !ok {
		panic("internal error: invalid encoding")
	}
	return b
}

func fromBigInt(x *big.Int) (out *gfP) {
	out = &gfP{}
	var a *big.Int
	if x.Sign() >= 0 {
		a = x
	} else {
		a = new(big.Int).Neg(x)
	}
	bytes := a.Bytes()
	if len(bytes) > 32 {
		panic("invalid byte length")
	} else if len(bytes) < 32 {
		fixedBytes := make([]byte, 32)
		copy(fixedBytes[32-len(bytes):], bytes)
		bytes = fixedBytes
	}
	for i := 0; i < 4; i++ {
		start := len(bytes) - 8
		out[i] = binary.BigEndian.Uint64(bytes[start:])
		bytes = bytes[:start]
	}
	if x.Sign() < 0 {
		gfpNeg(out, out)
	}
	if x.Sign() != 0 {
		montEncode(out, out)
	}
	return out
}

func TestInvert(t *testing.T) {
	one := newGFp(1)
	x := fromBigInt(bigFromHex("9093a2b979e6186f43a9b28d41ba644d533377f2ede8c66b19774bf4a9c7a596"))
	xInv := &gfP{}
	xInv.Invert(x)
	y := &gfP{}
	gfpMul(y, x, xInv)
	if *y != *one {
		t.Errorf("got %v, expected %v", y, one)
	}
}

btw, amd64 code is ok:

#define gfpCarry(a0,a1,a2,a3,a4, b0,b1,b2,b3,b4) \
	\ // b = a-p
	MOVQ a0, b0 \
	MOVQ a1, b1 \
	MOVQ a2, b2 \
	MOVQ a3, b3 \
	MOVQ a4, b4 \
	\
	SUBQ ·p2+0(SB), b0 \
	SBBQ ·p2+8(SB), b1 \
	SBBQ ·p2+16(SB), b2 \
	SBBQ ·p2+24(SB), b3 \
	SBBQ $0, b4 \   // ------------------ HERE---------------------
	\
	\ // if b is negative then return a
	\ // else return b
	CMOVQCC b0, a0 \
	CMOVQCC b1, a1 \
	CMOVQCC b2, a2 \
	CMOVQCC b3, a3

Armando Faz · Answer 4 · Thu Aug 04 2022 07:59:41 GMT+0800 (China Standard Time)

Hi, two things to note:
first, the number passed to a gfP{} must be always fully reduced modulo p.
The number 0x9093a2b979e6186f43a9b28d41ba644d533377f2ede8c66b19774bf4a9c7a596 is bigger than p. So it's expected the arithmetic fails if this condition is not met.

second, even with this large number, I tested the code above and does not throw an error.
it would be ideal to have an input that triggers the error.

Sun Yimin · Answer 5 · Thu Aug 04 2022 08:46:31 GMT+0800 (China Standard Time)

Hi @armfazh ,

The test case is for another set of curve parameters, let me write one and test with this project directly.
In general, the Gfp algorithm is common, and the carry handling should NOT ignore value in R0.

	\ // Add the 512-bit intermediate to m*N
	MOVD  ZR, R0 \
	ADDS  R9, R17 \
	ADCS R10, R25 \
	ADCS R11, R19 \
	ADCS R12, R20 \
	ADCS R13, R21 \
	ADCS R14, R22 \
	ADCS R15, R23 \
	ADCS R16, R24 \
	ADCS  ZR, R0 \              // possible carry value in R0
	\
	\ // Our output is R21:R22:R23:R24. Reduce mod p if necessary.
	SUBS R5, R21, R10 \
	SBCS R6, R22, R11 \
	SBCS R7, R23, R12 \
	SBCS R8, R24, R13 \
	SBCS $0, R0, R0 \                // missing this line
	\
	CSEL CS, R10, R21, R1 \
	CSEL CS, R11, R22, R2 \
	CSEL CS, R12, R23, R3 \
	CSEL CS, R13, R24, R4

Thx!

Sun Yimin · Answer 6 · Thu Aug 04 2022 10:35:14 GMT+0800 (China Standard Time)

For 1) I really can't find failed cases using known numbers. I share my parameters (used for SM9) here first:

// u is the BN parameter that determines the prime: 600000000058f98a.
var u = bigFromHex("600000000058f98a")

// p is a prime over which we form a basic field: 36u⁴+36u³+24u²+6u+1.
var p = bigFromHex("b640000002a3a6f1d603ab4ff58ec74521f2934b1a7aeedbe56f9b27e351457d")

// Order is the number of elements in both G₁ and G₂: 36u⁴+36u³+18u²+6u+1.
var Order = bigFromHex("b640000002a3a6f1d603ab4ff58ec74449f2934b18ea8beee56ee19cd69ecf25")

// p2 is p, represented as little-endian 64-bit words.
var p2 = [4]uint64{0xe56f9b27e351457d, 0x21f2934b1a7aeedb, 0xd603ab4ff58ec745, 0xb640000002a3a6f1}

// np is the negative inverse of p, mod 2^256.
var np = [4]uint64{0x892bc42c2f2ee42b, 0x181ae39613c8dbaf, 0x966a4b291522b137, 0xafd2bac5558a13b3}

// rN1 is R^-1 where R = 2^256 mod p.
var rN1 = &gfP{0x0a1c7970e5df544d, 0xe74504e9a96b56cc, 0xcda02d92d4d62924, 0x7d2bc576fdf597d1}

// r2 is R^2 where R = 2^256 mod p.
var r2 = &gfP{0x27dea312b417e2d2, 0x88f8105fae1a5d3f, 0xe479b522d6706e7b, 0x2ea795a656f62fbd}

// r3 is R^3 where R = 2^256 mod p.
var r3 = &gfP{0x130257769df5827e, 0x36920fc0837ec76e, 0xcbec24519c22a142, 0x219be84a7c687090}

// pMinus2 is p-2.
var pMinus2 = [4]uint64{0xe56f9b27e351457b, 0x21f2934b1a7aeedb, 0xd603ab4ff58ec745, 0xb640000002a3a6f1}

Armando Faz · Answer 7 · Fri Aug 05 2022 02:14:48 GMT+0800 (China Standard Time)

just to note that this library does not guarantee the correctness of field arithmetic using other sets of parameters.
In part, this is because it is optimized to work with one specific prime field, rather than a generic implementation for any prime field of 256-bits.

About the change suggested in #28 , it looks ok and is compatible with the amd64 version.