DataDog / zstd

Zstd wrapper for Go


[Unconfirmed] Data corruption after recent upgrade

mappu opened this issue · comments

commented

Hi,

Our application stores and loads compressed data on disk using this library. We recently upgraded our application from 0727e17 (tag v1.3.0) to aebefd9 (tag v1.3.4).

After this library upgrade, we received complaints from users: after generating some JSON data, compressing it, and storing it, then later loading and decompressing it, the data could no longer be parsed (e.g. json.Unmarshal: invalid character '\x00' in string literal).
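The report doesn't include code, but a minimal sketch of the round-trip along these lines shows the failure mode; the payload here is a placeholder (the original data shape isn't specified), the on-disk step is elided, and the calls use the library's one-shot Compress/Decompress API:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"

	"github.com/DataDog/zstd"
)

func main() {
	// Hypothetical JSON payload; the real application data is not shown in the report.
	payload, err := json.Marshal(map[string]string{"key": "value"})
	if err != nil {
		log.Fatal(err)
	}

	// Compress with the one-shot API; passing nil lets the library allocate the destination.
	compressed, err := zstd.Compress(nil, payload)
	if err != nil {
		log.Fatal(err)
	}

	// ... in the real application the compressed bytes are written to disk and read back here ...

	// Decompress and try to parse the JSON again.
	decompressed, err := zstd.Decompress(nil, compressed)
	if err != nil {
		log.Fatal(err)
	}

	var out map[string]string
	if err := json.Unmarshal(decompressed, &out); err != nil {
		// The reported failure mode: invalid character '\x00' in string literal.
		log.Fatal(err)
	}
	fmt.Println("round-trip ok:", out)
}
```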

The issue occurred on (at least) both Windows and macOS, with both Go 1.9.7 and 1.10.3.

I assume it was caused by memory corruption in CGO data buffers.

Reverting this library back to tag v1.3.0 seems to have completely resolved the issue going forward.

We're not yet able to reproduce the issue, and not all our users were affected (perhaps it's related to OS memory pressure?), but this is just a heads-up that the library upgrade was implicated. Once we have an internal reproducer we may be able to bisect it (or, hopefully, blame something else and disregard this entire issue).

In WAL-G we have observed some related data corruption and are currently hunting it too.
We have an unstable repro, but it requires a lot of setup: PostgreSQL, PITR, S3, etc.
I'll try testing v1.3.0, but the issue is very sporadic; reproducing it kind of depends on the phase of the moon and the weather on Mars.

Thanks for reporting!
I will try to add a fuzzer to see if we can uncover a bug.
In the meantime, if you have any hints on payload size, parallelism, or the type of data (or, even better, a reproducible payload), that would be great.
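As a starting point, a rough concurrent round-trip check along these lines might help narrow it down; the worker count, payload size, and use of random data below are assumptions for illustration, not details taken from the reports in this thread:

```go
package main

import (
	"bytes"
	"crypto/rand"
	"fmt"
	"log"
	"sync"

	"github.com/DataDog/zstd"
)

// roundTrip compresses and decompresses a buffer and checks the result
// matches the input byte-for-byte.
func roundTrip(src []byte) error {
	compressed, err := zstd.Compress(nil, src)
	if err != nil {
		return err
	}
	decompressed, err := zstd.Decompress(nil, compressed)
	if err != nil {
		return err
	}
	if !bytes.Equal(src, decompressed) {
		return fmt.Errorf("round-trip mismatch: %d bytes in, %d bytes out", len(src), len(decompressed))
	}
	return nil
}

func main() {
	const workers = 8      // assumed degree of parallelism
	const iterations = 100 // assumed number of payloads per worker

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iterations; i++ {
				// Random 1 MB payloads; a real reproduction may need larger
				// sizes or more realistic (compressible) data.
				buf := make([]byte, 1<<20)
				if _, err := rand.Read(buf); err != nil {
					log.Fatal(err)
				}
				if err := roundTrip(buf); err != nil {
					log.Fatal(err)
				}
			}
		}()
	}
	wg.Wait()
	log.Println("all round-trips ok")
}
```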

commented

facebook/zstd#1300 may be related (I don't know).

In my case I cannot reproduce the problem with fewer than 2 MAXGOROUTINES, and the compressed size does not exceed a few GB. I still suspect that there might be something broken in WAL-G, but lz4 and lzma do not fail on similar tests (this is not proof, just a hint...).
v1.3.0 does not work for us either.
Maybe @Tinsane can add some more details.