suminb / base62

Python module for base62 encoding; a URL-safe encoding for arbitrary data

Ignoring leading zero bytes

rubickcz opened this issue

Hello,

first of all, thank you for this library. I am using it to encode 16-byte blocks, and I have noticed that leading bytes equal to 0x00 are ignored during encoding. This is due to the conversion to an integer that the library does internally. I believe this behavior is incorrect, because without knowing the length of the input block, you cannot reconstruct (decode) the original input from the output. In encryption (and many other areas), all bytes matter, including leading zero bytes.
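The effect can be reproduced with the standard library alone. The following is only a sketch of the general mechanism, not the library's actual code: any big-endian bytes-to-integer round trip that restores the minimal byte length behaves this way.

```python
# Sketch of why an integer round trip drops leading zero bytes.
data = b'\x00\x00\x01'

# Big-endian conversion: leading 0x00 bytes contribute nothing to the value.
as_int = int.from_bytes(data, 'big')

# Converting back with the minimal byte length cannot know how many
# zero bytes originally preceded the first non-zero byte.
min_len = max(1, (as_int.bit_length() + 7) // 8)
restored = as_int.to_bytes(min_len, 'big')

print(as_int)    # 1
print(restored)  # b'\x01'
```

Note that b'\x01' and b'\x00\x00\x01' map to the same integer, so the length information is already gone before any base62 digits are produced.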

I'll give an example using base64, which does this correctly:

from base64 import b64encode, b64decode

encoded = b64encode(b'\x00\x00\x01').decode()
print(encoded)
decoded = b64decode(encoded)
print(decoded)

This code yields:

AAAB
b'\x00\x00\x01'

Now your library:

import base62

encoded = base62.encodebytes(b'\x00\x00\x01')
print(encoded)
decoded = base62.decodebytes(encoded)
print(decoded)

Yields:

1
b'\x01'

As you can see, the decoded output is not equal to the input (the two leading zero bytes are missing).
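When the block length is known in advance, as with the 16-byte blocks mentioned above, one possible workaround is to left-pad the decoded bytes back to that length. This is a minimal sketch: `restore_block` and `BLOCK_SIZE` are hypothetical names, and `decoded` is a stand-in for whatever a decoder that drops leading zeros would return.

```python
BLOCK_SIZE = 16  # assumption: every input block has this fixed length


def restore_block(decoded: bytes, size: int = BLOCK_SIZE) -> bytes:
    """Left-pad decoded bytes with 0x00 up to the known block size."""
    if len(decoded) > size:
        raise ValueError("decoded data is longer than the expected block size")
    return decoded.rjust(size, b'\x00')


# A 16-byte block that starts with two zero bytes.
original = b'\x00\x00' + bytes(range(1, 15))

# Stand-in for the output of a decoder that drops leading zero bytes.
decoded = original.lstrip(b'\x00')

print(restore_block(decoded) == original)  # True
```

This only works when both sides agree on the length out of band; it does not help with variable-length input.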

Thanks for your report. I'll take a look at this shortly (hopefully...)

Hello @suminb ,

I ran into the same issue.
My problem comes from computing the same hash in a web application and on a backend.
It would be nice if this issue could be fixed soon. Until then, I have to work around it on the frontend in a quick-and-dirty way by ignoring leading zeros ^^

@WaldemarEnns If you don't insist on base62, you can try this library implementing base58:
https://github.com/keis/base58

It handles leading zero bytes correctly.
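For reference, the convention commonly used by base58 encoders is to emit one copy of the first alphabet character for every leading zero byte. The same idea can be sketched for base62 as below. This is a self-contained illustration, not the API of either library: the `encode` and `decode` functions and the alphabet order (digits, uppercase, lowercase) are assumptions, and the output is not compatible with encoders that do not follow this convention.

```python
import string

# Assumption: digits, then uppercase, then lowercase. Other orders exist.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 62


def encode(data: bytes) -> str:
    """Encode bytes, writing one ALPHABET[0] per leading zero byte."""
    n_zeros = len(data) - len(data.lstrip(b'\x00'))
    num = int.from_bytes(data, 'big')
    digits = []
    while num > 0:
        num, rem = divmod(num, BASE)
        digits.append(ALPHABET[rem])
    return ALPHABET[0] * n_zeros + ''.join(reversed(digits))


def decode(text: str) -> bytes:
    """Decode text, turning each leading ALPHABET[0] back into a zero byte."""
    n_zeros = len(text) - len(text.lstrip(ALPHABET[0]))
    num = 0
    for ch in text:
        num = num * BASE + ALPHABET.index(ch)  # ValueError on invalid characters
    body = num.to_bytes((num.bit_length() + 7) // 8, 'big')
    return b'\x00' * n_zeros + body


encoded = encode(b'\x00\x00\x01')
print(encoded)          # 001
print(decode(encoded))  # b'\x00\x00\x01'
```

Because the most significant digit of a non-zero number is never ALPHABET[0], the count of leading ALPHABET[0] characters is unambiguous and the round trip is lossless.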