nim-lang / bigints

BigInts for Nim

Division causes assert failure

guidovranken opened this issue · comments

import bigints

var a = initBigint("6277101735386680763835789123314955362437298222279840143829")
var b = initBigint("1461501637330902918203684832716283019655932313743")
var r = a.div(b)

Output:

Error: unhandled exception: /home/jhg/.nimble/pkgs/bigints-1.0.0/bigints.nim(652, 14) `q1 <= uint32.high`

Using latest repository version of bigints, Nim 1.6.10 on Linux x64.

Wow that's bad, thanks for reporting! I can reproduce on playground: https://play.nim-lang.org/#ix=4kYW

I checked by going back to older commits and it seems to be a very old issue: it is already present in 8cb412f (Aug 2020, after merging a PR by Araq to make it work with Nim devel).

Going further back in the past is hard, since I could not find a Nim version that would compile bigints without additional changes; probably not worth going that far back.

I did a bit of debugging but need to stop now; I will try to look into this later (if someone wants to pick this up, most welcome).

For reference, this is how I annotated unsignedDivRem (where the bug occurs):

import sugar

# From Knuth and Python
func unsignedDivRem(q, r: var BigInt, n, d: BigInt) =
  var
    nn = n.limbs.len
    dn = d.limbs.len
  dump nn
  dump dn

  if n.isZero:
    q = zero
    r = zero
  elif nn < dn:
    # n < d
    q = zero
    r = n
  elif dn == 1:
    var x: uint32
    unsignedDivRem(q, x, n, d.limbs[0])
    r.limbs = @[x]
    r.isNegative = false
  else:
    assert nn >= dn and dn >= 2

    # normalize: shift left so that the divisor's top limb has its high bit set
    let ls = 32 - bits(d.limbs[d.limbs.high])
    r = d shl ls  # r temporarily holds the normalized divisor
    q = n shl ls  # q temporarily holds the normalized dividend
    if q.limbs.len > n.limbs.len or q.limbs[q.limbs.high] >= r.limbs[r.limbs.high]:
      q.limbs.add(0'u32)
      inc(nn)

    let k = nn - dn
    assert k >= 0
    var a: BigInt
    a.limbs.setLen(k)
    let wm1 = r.limbs[r.limbs.high]
    let wm2 = r.limbs[r.limbs.high-1]
    var ak = k

    var zhi = zero
    var z = zero
    var qib = zero
    var q1b = zero

    dump q
    dump r
    dump nn
    dump dn
    dump k

    for v in countdown(k-1, 0):
      dump v
      # estimate quotient digit, may rarely overestimate by 1
      let vtop = q.limbs[v + dn]
      assert vtop <= wm1
      let vv = (uint64(vtop) shl 32) or q.limbs[v+dn-1]
      var q1 = vv div wm1
      var r1 = vv mod wm1

      while (wm2 * q1) > ((r1 shl 32) or q.limbs[v+dn-2]):
        debugEcho "overestimating"
        dump (wm2 * q1)
        dump ((r1 shl 32) or q.limbs[v+dn-2])
        dec q1
        r1 += wm1
        if r1 > uint32.high:
          debugEcho "breaking on r1"
          break
        dump (wm2 * q1)
        dump ((r1 shl 32) or q.limbs[v+dn-2])

      dump q1
      dump uint32.high
      assert q1 <= uint32.high  # error occurs here

      ...

And this is the debug output for the above example:

nn = 6
dn = 5
q = (limbs: @[3572406741, 1867334985, 3529740808, 4294963510, 4294967295, 4294967295, 0], isNegative: false)
r = (limbs: @[4294738063, 4294967295, 4294967295, 4294967295, 4294967295], isNegative: false)
nn = 7
dn = 5
k = 2
v = 1
q1 = 1
uint32.high = 4294967295
v = 0
overestimating
(wm2 * q1) = 18446744073709551615
((r1 shl 32) or q.limbs[v + dn - 2]) = 4294963510
(wm2 * q1) = 18446744069414584320
((r1 shl 32) or q.limbs[v + dn - 2]) = 18446744073709547830
q1 = 4294967296
uint32.high = 4294967295
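
To make the failure concrete: at v = 0 the dumps show vtop = q.limbs[5] = 4294967295 and wm1 = r.limbs[4] = 4294967295, so the assert vtop <= wm1 passes with equality, and the initial 64-bit estimate already exceeds uint32.high. A minimal standalone sketch, with the values copied from the dumps above and variable names mirroring the annotated function:

let
  vtop = 4294967295'u64  # q.limbs[v + dn] at v = 0, from the dump
  vlow = 4294967295'u64  # q.limbs[v + dn - 1] at v = 0, from the dump
  wm1  = 4294967295'u64  # r.limbs[r.limbs.high], top limb of the normalized divisor
  vv   = (vtop shl 32) or vlow

echo vv div wm1  # 4294967297, i.e. 2^32 + 1
echo vv mod wm1  # 0

One pass of the correction loop then brings q1 down to 2^32 (as in the trace), the loop condition becomes false, and the assert q1 <= uint32.high fires.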

@guidovranken Thanks for the report.
@pietroppeter Thanks for the debugging and the insightful trace. No need to go further down the commit history; I guess the error has been here since the very first time this algorithm was committed.
« From Knuth and Python » is a very obscure reference. It will be hard to find where the algorithm and the notations come from.
I will check in TAOCP vol.2 tonight.

You're welcome. Is this library currently used in production anywhere?

Short answer: no.
It was planned to be used in Nim's compiler: nim-lang/Nim#14696

Ok. I found this with my math&crypto fuzzer: https://github.com/guidovranken/cryptofuzz
If this library ends up getting used in important software we can fuzz it on OSS-Fuzz: https://github.com/google/oss-fuzz

Ah, nice to know it was found with a fuzzer. After your example I actually thought we should fuzz this library... I was thinking about using a recently implemented library from planetis, https://github.com/status-im/nim-drchaos (there is also a video from NimConf 2022), but the list of bugs found by your fuzzer is definitely impressive!
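
In the meantime, even a plain random-testing loop that checks the division identity a == q*b + r could serve as a crude harness. A sketch (randomBigInt is a hypothetical helper, not part of the library; per the probability estimate discussed further below, such a loop would hit the bad case only very rarely, which is why a structure-aware fuzzer finds it so much faster):

import std/random
import bigints

randomize()

# hypothetical helper: random positive BigInt with the given number of decimal digits
proc randomBigInt(digits: int): BigInt =
  var s = $rand(1 .. 9)
  for _ in 1 ..< digits:
    s.add($rand(0 .. 9))
  initBigInt(s)

for _ in 1 .. 100_000:
  let a = randomBigInt(rand(1 .. 60))
  let b = randomBigInt(rand(1 .. 50))
  let q = a div b
  let r = a mod b
  doAssert a == q * b + r  # division identity; the library's internal assert fires first on bad inputs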

regarding the "knuth and python" notes, maybe @def- know a bit more but I imagine python implements a division algorithm by knuth and googling around my guess would be this is algo Algorithm 4.3.1D" from page 2 of Volume 1 of TAOCP, linking this HN conversation of an article about this since it seems interesting https://news.ycombinator.com/item?id=26562819
edit: actually the article references it as "Volume 2: Seminumerical Algorithms, Chapter 4.3: Multiple-Precision Arithmetic of The Art of Computer Programming, Donald Ervin Knuth presents Algorithm D (Division of nonnegative integers):"

regarding the "knuth and python" notes, maybe @def- know a bit more but I imagine python implements a division algorithm by knuth and googling around my guess would be this is algo Algorithm 4.3.1D" from page 2 of Volume 1 of TAOCP, linking this HN conversation of an article about this since it seems interesting https://news.ycombinator.com/item?id=26562819 edit: actually the article references it as "Volume 2: Seminumerical Algorithms, Chapter 4.3: Multiple-Precision Arithmetic of The Art of Computer Programming, Donald Ervin Knuth presents Algorithm D (Division of nonnegative integers):"

I had a doubt whether it was in volume 1 or volume 2. Glad it is in volume 2.

For the reference in Python, I think it might be here: https://github.com/python/cpython/blob/2161bbf243983c625e8f24cdf43b757f2a21463b/Modules/_decimal/libmpdec/basearith.c#L293

That seems correct; it also references the correct section of TAOCP.

From what I have read so far, @def- changed the normalization constant a little. I don't think this impacts the mathematical properties that much, but we must be extra careful about the effect it has on the algorithm's correctness.
Instead of:

    d = MPD_RADIX / (vconst[n-1] + 1);

as in the Python code and TAOCP, he chose to multiply by this power of two:

let ls = 32 - bits(d.limbs[d.limbs.high])

I guess Theorem B (in the same section 4.3.1 of TAOCP), which says that our guessed quotient is between q and q+2, does not hold anymore.
There is another normalization algorithm depicted in exercise 28, but it seems different.
Exercise 37 proposes avoiding the normalization and unnormalization steps completely when the radix is a power of two (which is our case). I don't know if that is what Dennis tried to do here; it still looks like a normalization step.
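
For what it's worth, the shift-based variant does seem to preserve the property that normalization is there for, namely that the divisor's top limb ends up >= b/2. A quick check, emulating the library's `bits` with std/bitops (bitLen is my name for it, an assumption, not the library's API):

import std/bitops

func bitLen(t: uint32): int = 32 - countLeadingZeroBits(t)

for t in [1'u32, 0xE00F'u32, 0x7FFFFFFF'u32]:
  let ls = 32 - bitLen(t)                 # same shift as `32 - bits(d.limbs[d.limbs.high])`
  doAssert (t shl ls) >= 0x80000000'u32   # high bit set, i.e. top limb >= b/2

So the precondition on v_1 seems satisfied; the problem may rather be in how the estimate is computed and corrected afterwards.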

I can't debug it for now.

Mmh, at first glance the quantity q1 seems to be computed the same as qHat in step D3 of the algorithm as stated in TAOCP (vv plays the role of u_j*b + u_{j+1}, and wm1 is v_1). Note that the algorithm does not use the same partial quotient as in Theorem B. Note also that in TAOCP low indices are used for most significant digits.

For reference here is a folder with relevant pages from (an old edition of) TAOCP : https://drive.google.com/drive/folders/1--9CWffNwg1fmA26RUXfcSkUPzF6gZLE

I would assume the bug happens (occasionally? always?) when we fall into the case where the partial quotient needs to be decreased by 1, which (as per exercise 21 of TAOCP) should happen with probability roughly 3/b, where b in our case is 2^32 (less than a 1-in-a-billion chance). The debugging of the example above is consistent with this. I have not had time, though, to look at this more closely.
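
For comparison, the classic formulation of the step D3 test (as in TAOCP and, e.g., Hacker's Delight's divmnu) also checks whether the estimate is at least b before the wm2 correction: qhat >= b or qhat*v[n-2] > b*rhat + u[j+n-2]. A hedged sketch of what that looks like with the names from the annotated code above (estimateQhat is an illustrative helper, not the library's API or its actual fix):

func estimateQhat(vtop, vlow, uNext, wm1, wm2: uint32): uint32 =
  ## Step D3 with the "qhat >= b" test folded into the correction loop.
  let vv = (uint64(vtop) shl 32) or vlow
  var q1 = vv div wm1
  var r1 = vv mod wm1
  while q1 > uint32.high or uint64(wm2) * q1 > ((r1 shl 32) or uNext):
    dec q1
    r1 += wm1
    if r1 > uint32.high:  # stop correcting once rhat >= b, as in TAOCP
      break
  uint32(q1)

On the trace above this returns 4294967295 (= b - 1) instead of leaving q1 at 2^32 and tripping the assert.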

Note also that in TAOCP low indices are used for most significant digits.

They inverted that in my revised edition of TAOCP: it is now exactly the other way around, with high indices used for the most significant digits. That's wonderful.

For the reference in Python, I think it might be here: https://github.com/python/cpython/blob/2161bbf243983c625e8f24cdf43b757f2a21463b/Modules/_decimal/libmpdec/basearith.c#L293

After looking more closely at Python, I think the long-division implementation actually used by Python is here:
https://github.com/python/cpython/blob/5ef90eebfd90147c7e774cd5335ad4fe69a7227c/Objects/longobject.c#L2768

This is what is linked to in this article about integers in Python: https://tenthousandmeters.com/blog/python-behind-the-scenes-8-how-python-integers-work/

The previous reference was to the same algorithm as implemented in the self-contained C library mpdecimal, which is wrapped and used in Python's decimal module; see also https://www.bytereef.org/mpdecimal/index.html and https://github.com/python/cpython/tree/2161bbf243983c625e8f24cdf43b757f2a21463b/Modules/_decimal

I think it is important to note in the README.md that the division implementation is flawed. It would keep people from using the library while we seek a fix.

While running my own fuzzing I also get this error pretty consistently.

Input

import bigints

let base = initBigInt(
  "7fffffffffffffffe0000000000001f8000000000001ffff00000007ffff",
  base = 16
)
let exp = initBigInt(
  "f0ff0f00e0ff07000000800300000000e0ffffffffffff01000000000000",
  base = 16
)
let M = initBigInt(
  "e00f0000000000000000ffff03004000feffffffffffff07000000000000c0",
  base = 16
)

let r = powmod(base, exp, M)

With a modified bigints for nicer reporting:

[Failure] unsignedDivRem at `q1 <= uint32.high` with inputs:
[Failure]  n: 0x3fffffffffffffffe0000000000001f8040000000001ff8100000008000260403ffffffe07e03c1000001f83fc0c0001001fffec0002003ffff00001
[Failure]  d: 0xe00f0000000000000000ffff03004000feffffffffffff07000000000000c0
[Failure] called from powmod with inputs:
[Failure]  base:     0x7fffffffffffffffe0000000000001f8000000000001ffff00000007ffff
[Failure]  exponent: 0xf0ff0f00e0ff07000000800300000000e0ffffffffffff01000000000000
[Failure]  modulus:  0xe00f0000000000000000ffff03004000feffffffffffff07000000000000c0
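
If the reported intermediate values are accurate, the powmod failure should reduce to a plain division on them, something like this sketch (the n and d hex strings are copied from the failure output above):

import bigints

let n = initBigInt("3fffffffffffffffe0000000000001f8040000000001ff8100000008000260403ffffffe07e03c1000001f83fc0c0001001fffec0002003ffff00001", base = 16)
let d = initBigInt("e00f0000000000000000ffff03004000feffffffffffff07000000000000c0", base = 16)
echo n div d  # expected to hit the same `q1 <= uint32.high` assert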