nim-lang / bigints

BigInts for Nim

Division causes assert failure

guidovranken opened this issue · comments

import bigints

var a = initBigint("6277101735386680763835789123314955362437298222279840143829")
var b = initBigint("1461501637330902918203684832716283019655932313743")
var r = a.div(b)

Output:

Error: unhandled exception: /home/jhg/.nimble/pkgs/bigints-1.0.0/bigints.nim(652, 14) `q1 <= uint32.high`

Using latest repository version of bigints, Nim 1.6.10 on Linux x64.

Wow that's bad, thanks for reporting! I can reproduce on playground: https://play.nim-lang.org/#ix=4kYW

I checked by going back to older commits and it seems to be a very old issue: it is already present in 8cb412f (Aug 2020, after merging a PR by Araq to make it work with Nim devel).

Going further back in the past is hard, since I could not find a Nim version that would compile bigints without additional changes; probably not worth going that far back.

I did a bit of debugging but need to stop now; I will try to look into this later (if someone wants to pick this up, most welcome).

For reference, this is how I annotated unsignedDivRem (where the bug occurs):

import sugar

# From Knuth and Python
func unsignedDivRem(q, r: var BigInt, n, d: BigInt) =
  var
    nn = n.limbs.len
    dn = d.limbs.len
  dump nn
  dump dn

  if n.isZero:
    q = zero
    r = zero
  elif nn < dn:
    # n < d
    q = zero
    r = n
  elif dn == 1:
    var x: uint32
    unsignedDivRem(q, x, n, d.limbs[0])
    r.limbs = @[x]
    r.isNegative = false
  else:
    assert nn >= dn and dn >= 2

    # normalize: shift left so that the divisor's top limb has its high bit set
    let ls = 32 - bits(d.limbs[d.limbs.high])
    r = d shl ls  # r temporarily holds the normalized divisor
    q = n shl ls  # q temporarily holds the normalized dividend
    if q.limbs.len > n.limbs.len or q.limbs[q.limbs.high] >= r.limbs[r.limbs.high]:
      q.limbs.add(0'u32)
      inc(nn)

    let k = nn - dn
    assert k >= 0
    var a: BigInt
    a.limbs.setLen(k)
    let wm1 = r.limbs[r.limbs.high]
    let wm2 = r.limbs[r.limbs.high-1]
    var ak = k

    var zhi = zero
    var z = zero
    var qib = zero
    var q1b = zero

    dump q
    dump r
    dump nn
    dump dn
    dump k

    for v in countdown(k-1, 0):
      dump v
      # estimate quotient digit, may rarely overestimate by 1
      let vtop = q.limbs[v + dn]
      assert vtop <= wm1
      let vv = (uint64(vtop) shl 32) or q.limbs[v+dn-1]
      var q1 = vv div wm1
      var r1 = vv mod wm1

      while (wm2 * q1) > ((r1 shl 32) or q.limbs[v+dn-2]):
        debugEcho "overestimating"
        dump (wm2 * q1)
        dump ((r1 shl 32) or q.limbs[v+dn-2])
        dec q1
        r1 += wm1
        if r1 > uint32.high:
          debugEcho "breaking on r1"
          break
        dump (wm2 * q1)
        dump ((r1 shl 32) or q.limbs[v+dn-2])

      dump q1
      dump uint32.high
      assert q1 <= uint32.high  # error occurs here

      ...

And this is the debug output for the above example:

nn = 6
dn = 5
q = (limbs: @[3572406741, 1867334985, 3529740808, 4294963510, 4294967295, 4294967295, 0], isNegative: false)
r = (limbs: @[4294738063, 4294967295, 4294967295, 4294967295, 4294967295], isNegative: false)
nn = 7
dn = 5
k = 2
v = 1
q1 = 1
uint32.high = 4294967295
v = 0
overestimating
(wm2 * q1) = 18446744073709551615
((r1 shl 32) or q.limbs[v + dn - 2]) = 4294963510
(wm2 * q1) = 18446744069414584320
((r1 shl 32) or q.limbs[v + dn - 2]) = 18446744073709547830
q1 = 4294967296
uint32.high = 4294967295
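
To make the failure concrete: at v = 0 the dumps show vtop = q.limbs[5] = 4294967295 and wm1 = r.limbs[4] = 4294967295, so the assert vtop <= wm1 passes with equality, and the initial 64-bit estimate already exceeds uint32.high. A minimal standalone sketch, with the values copied from the dumps above and variable names mirroring the annotated function:

let
  vtop = 4294967295'u64  # q.limbs[v + dn] at v = 0, from the dump
  vlow = 4294967295'u64  # q.limbs[v + dn - 1] at v = 0, from the dump
  wm1  = 4294967295'u64  # r.limbs[r.limbs.high], top limb of the normalized divisor
  vv   = (vtop shl 32) or vlow

echo vv div wm1  # 4294967297, i.e. 2^32 + 1
echo vv mod wm1  # 0

One pass of the correction loop then brings q1 down to 2^32 (as in the trace), the loop condition becomes false, and the assert q1 <= uint32.high fires.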

@guidovranken Thanks for the report.
@pietroppeter Thanks for the debugging and the insightful trace. No need to go further down the commit history; I guess the error has been here since the very first time this algorithm was committed.
« From Knuth and Python » is a very obscure reference. It will be hard to find where the algorithm and the notations come from.
I will check in TAOCP vol.2 tonight.

You're welcome. Is this library currently used in production anywhere?

Short answer: no.
It was planned to be used in Nim's compiler: nim-lang/Nim#14696

Ok. I found this with my math&crypto fuzzer: https://github.com/guidovranken/cryptofuzz
If this library ends up getting used in important software we can fuzz it on OSS-Fuzz: https://github.com/google/oss-fuzz

Ah, nice to know it was found with a fuzzer. After your example I actually thought we should fuzz this library... I was thinking about using a recently implemented library from planetis, https://github.com/status-im/nim-drchaos (there is also a video from NimConf 2022), but the list of bugs found by your fuzzer is definitely impressive!
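
In the meantime, even a plain random-testing loop that checks the division identity a == q*b + r could serve as a crude harness. A sketch (randomBigInt is a hypothetical helper, not part of the library; per the probability estimate discussed further below, such a loop would hit the bad case only very rarely, which is why a structure-aware fuzzer finds it so much faster):

import std/random
import bigints

randomize()

# hypothetical helper: random positive BigInt with the given number of decimal digits
proc randomBigInt(digits: int): BigInt =
  var s = $rand(1 .. 9)
  for _ in 1 ..< digits:
    s.add($rand(0 .. 9))
  initBigInt(s)

for _ in 1 .. 100_000:
  let a = randomBigInt(rand(1 .. 60))
  let b = randomBigInt(rand(1 .. 50))
  let q = a div b
  let r = a mod b
  doAssert a == q * b + r  # division identity; the library's internal assert fires first on bad inputs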

regarding the "knuth and python" notes, maybe @def- know a bit more but I imagine python implements a division algorithm by knuth and googling around my guess would be this is algo Algorithm 4.3.1D" from page 2 of Volume 1 of TAOCP, linking this HN conversation of an article about this since it seems interesting https://news.ycombinator.com/item?id=26562819
edit: actually the article references it as "Volume 2: Seminumerical Algorithms, Chapter 4.3: Multiple-Precision Arithmetic of The Art of Computer Programming, Donald Ervin Knuth presents Algorithm D (Division of nonnegative integers):"

regarding the "knuth and python" notes, maybe @def- know a bit more but I imagine python implements a division algorithm by knuth and googling around my guess would be this is algo Algorithm 4.3.1D" from page 2 of Volume 1 of TAOCP, linking this HN conversation of an article about this since it seems interesting https://news.ycombinator.com/item?id=26562819 edit: actually the article references it as "Volume 2: Seminumerical Algorithms, Chapter 4.3: Multiple-Precision Arithmetic of The Art of Computer Programming, Donald Ervin Knuth presents Algorithm D (Division of nonnegative integers):"

I had a doubt whether it was in volume 1 or volume 2. Glad it is in volume 2.

For the reference in Python, I think it might be here: https://github.com/python/cpython/blob/2161bbf243983c625e8f24cdf43b757f2a21463b/Modules/_decimal/libmpdec/basearith.c#L293

That seems correct; it also references the correct section of TAOCP.

From what I have read so far, @def- changed the normalization constant a little. I don't think this impacts the mathematical properties that much, but we must be extra careful about the effect it has on the algorithm's correctness.
Instead of:

    d = MPD_RADIX / (vconst[n-1] + 1);

as in the Python code and TAOCP, he chose to multiply by this power of two:

let ls = 32 - bits(d.limbs[d.limbs.high])

I guess Theorem B (in the same section 4.3.1 of TAOCP), which says that our guessed quotient is between q and q+2, does not hold anymore.
There is another normalization algorithm depicted in exercise 28, but it seems different.
Exercise 37 proposes avoiding the normalization and unnormalization steps completely when the radix is a power of two (which is our case). I don't know if that is what Dennis tried to do here; it still looks like a normalization step.
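
For what it's worth, the shift-based variant does seem to preserve the property that normalization is there for, namely that the divisor's top limb ends up >= b/2. A quick check, emulating the library's `bits` with std/bitops (bitLen is my name for it, an assumption, not the library's API):

import std/bitops

func bitLen(t: uint32): int = 32 - countLeadingZeroBits(t)

for t in [1'u32, 0xE00F'u32, 0x7FFFFFFF'u32]:
  let ls = 32 - bitLen(t)                 # same shift as `32 - bits(d.limbs[d.limbs.high])`
  doAssert (t shl ls) >= 0x80000000'u32   # high bit set, i.e. top limb >= b/2

So the precondition on v_1 seems satisfied; the problem may rather be in how the estimate is computed and corrected afterwards.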

I can't debug it for now.

Mmh, at first glance the quantity q1 seems to be computed the same as qHat in step D3 of the algorithm as stated in TAOCP (vv plays the role of u_j*b + u_{j+1}, and wm1 is v_1). Note that the algorithm does not use the same partial quotient as in Theorem B. Note also that in TAOCP low indices are used for most significant digits.

For reference here is a folder with relevant pages from (an old edition of) TAOCP : https://drive.google.com/drive/folders/1--9CWffNwg1fmA26RUXfcSkUPzF6gZLE

I would assume the bug happens (occasionally? always?) when we fall into the case where the partial quotient needs to be decreased by 1, which (as per exercise 21 of TAOCP) should happen with probability roughly 3/b, where b in our case is 2^32 (less than a 1-in-a-billion chance). The debugging of the example above is consistent with this. I have not had time, though, to look at this more closely.
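
For comparison, the classic formulation of the step D3 test (as in TAOCP and, e.g., Hacker's Delight's divmnu) also checks whether the estimate is at least b before the wm2 correction: qhat >= b or qhat*v[n-2] > b*rhat + u[j+n-2]. A hedged sketch of what that looks like with the names from the annotated code above (estimateQhat is an illustrative helper, not the library's API or its actual fix):

func estimateQhat(vtop, vlow, uNext, wm1, wm2: uint32): uint32 =
  ## Step D3 with the "qhat >= b" test folded into the correction loop.
  let vv = (uint64(vtop) shl 32) or vlow
  var q1 = vv div wm1
  var r1 = vv mod wm1
  while q1 > uint32.high or uint64(wm2) * q1 > ((r1 shl 32) or uNext):
    dec q1
    r1 += wm1
    if r1 > uint32.high:  # stop correcting once rhat >= b, as in TAOCP
      break
  uint32(q1)

On the trace above this returns 4294967295 (= b - 1) instead of leaving q1 at 2^32 and tripping the assert.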

Note also that in TAOCP low indices are used for most significant digits.

They inverted that in my revised edition of TAOCP: it is now exactly the other way around, with high indices used for the most significant digits. That's wonderful.

For the reference in Python, I think it might be here: https://github.com/python/cpython/blob/2161bbf243983c625e8f24cdf43b757f2a21463b/Modules/_decimal/libmpdec/basearith.c#L293

After looking more closely at Python, I think the long-division implementation actually used by Python is here:
https://github.com/python/cpython/blob/5ef90eebfd90147c7e774cd5335ad4fe69a7227c/Objects/longobject.c#L2768

This is what is linked to in this article about integers in Python: https://tenthousandmeters.com/blog/python-behind-the-scenes-8-how-python-integers-work/

The previous reference was to the same algorithm as implemented in the self-contained C library mpdecimal, which is wrapped and used in Python's decimal module; see also https://www.bytereef.org/mpdecimal/index.html and https://github.com/python/cpython/tree/2161bbf243983c625e8f24cdf43b757f2a21463b/Modules/_decimal

I think it is important to note in the README.md that the division implementation is flawed. It would keep people from using the library while we seek a fix.

While running my own fuzzing I also get this error pretty consistently.

Input

import bigints

let base = initBigInt(
  "7fffffffffffffffe0000000000001f8000000000001ffff00000007ffff",
  base = 16
)
let exp = initBigInt(
  "f0ff0f00e0ff07000000800300000000e0ffffffffffff01000000000000",
  base = 16
)
let M = initBigInt(
  "e00f0000000000000000ffff03004000feffffffffffff07000000000000c0",
  base = 16
)

let r = powmod(base, exp, M)

With a modified bigints for nicer reporting:

[Failure] unsignedDivRem at `q1 <= uint32.high` with inputs:
[Failure]  n: 0x3fffffffffffffffe0000000000001f8040000000001ff8100000008000260403ffffffe07e03c1000001f83fc0c0001001fffec0002003ffff00001
[Failure]  d: 0xe00f0000000000000000ffff03004000feffffffffffff07000000000000c0
[Failure] called from powmod with inputs:
[Failure]  base:     0x7fffffffffffffffe0000000000001f8000000000001ffff00000007ffff
[Failure]  exponent: 0xf0ff0f00e0ff07000000800300000000e0ffffffffffff01000000000000
[Failure]  modulus:  0xe00f0000000000000000ffff03004000feffffffffffff07000000000000c0
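
If the reported intermediate values are accurate, the powmod failure should reduce to a plain division on them, something like this sketch (the n and d hex strings are copied from the failure output above):

import bigints

let n = initBigInt("3fffffffffffffffe0000000000001f8040000000001ff8100000008000260403ffffffe07e03c1000001f83fc0c0001001fffec0002003ffff00001", base = 16)
let d = initBigInt("e00f0000000000000000ffff03004000feffffffffffff07000000000000c0", base = 16)
echo n div d  # expected to hit the same `q1 <= uint32.high` assert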