PureJavaSnappy causes BufferUnderflowException when reading w/ hadoop-common's SnappyCodec
chrisxaustin opened this issue
I'm using hadoop-common to uncompress the Hadoop Snappy format, since this library doesn't appear to support reading Hadoop-compatible data.
That library uses snappy-java in the SnappyCodec.
This works for me when the native snappy is available (I reuse the codec object, but I put it in the method to keep the sample shorter):
public byte[] decompressSnappy(byte[] in) throws IOException {
    SnappyCodec codec = new SnappyCodec();
    codec.setConf(new Configuration());
    try (InputStream stream = codec.createInputStream(new ByteArrayInputStream(in))) {
        return stream.readAllBytes();
    }
}
SnappyNative works perfectly, but when PureJavaSnappy is used I get a BufferUnderflowException.
After a painful debugging session I found that it seems to be caused by hadoop-common's SnappyDecompressor.decompressDirectBuf calling Snappy.uncompress, which calls PureJavaSnappy.rawUncompress, which leaves the uncompressed buffer in an inconsistent state.
With SnappyNative I saw "uncompressed position=14454, remaining=0, limit=14454"
With PureJavaSnappy I saw "uncompressed pos=14454, remaining=14454, limit=14454"
The next call to uncompressedDirectBuf.get then triggered the BufferUnderflowException: uncompressed.get(0, 8192) was called while the buffer had position=14454, remaining=0, limit=14454.
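For anyone else hitting this, here is a minimal standalone sketch of the java.nio behaviour behind the failure (plain ByteBuffer code, not taken from either library; the 14454/8192 sizes just mirror the numbers above): once the position has been advanced to the limit, remaining() is 0, so the next bulk get throws BufferUnderflowException until the position is moved back.

import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

public class UnderflowDemo {
    public static void main(String[] args) {
        ByteBuffer uncompressed = ByteBuffer.allocateDirect(14454);

        // State the pure-Java path appears to leave behind: position advanced to the limit.
        uncompressed.position(uncompressed.limit());
        System.out.println("remaining=" + uncompressed.remaining()); // prints 0

        byte[] chunk = new byte[8192];
        try {
            // Same shape of bulk read that SnappyDecompressor issues next.
            uncompressed.get(chunk, 0, chunk.length);
        } catch (BufferUnderflowException e) {
            System.out.println("BufferUnderflowException, as reported above");
        }

        // Rewinding restores the readable window [0, limit), which matches the
        // buffer state callers observe after the native path.
        uncompressed.rewind();
        uncompressed.get(chunk, 0, chunk.length); // now succeeds
    }
}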
This seems related to #295.
Or maybe, instead of changing the position, it could just set the limit?
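One way to read that suggestion, purely as a sketch with a hypothetical hook rather than the real rawUncompress signature: record the output buffer's starting position before decompressing, then restore it and move only the limit, so callers such as Hadoop's SnappyDecompressor see the same buffer state the native path leaves behind.

import java.nio.ByteBuffer;

public class PositionPreservingUncompress {

    // Hypothetical stand-in for the pure-Java decompression step, which may
    // advance the output buffer's position via relative puts.
    interface RawUncompressor {
        int rawUncompress(ByteBuffer input, ByteBuffer output); // returns bytes written
    }

    // Sketch of the proposed contract: leave the position untouched and move only
    // the limit to mark the decompressed region.
    static int uncompressPreservingPosition(ByteBuffer input, ByteBuffer output, RawUncompressor impl) {
        int outStart = output.position();
        int written = impl.rawUncompress(input, output); // may have advanced output's position
        output.position(outStart);                       // undo the relative writes' side effect
        output.limit(outStart + written);                 // expose exactly the decompressed bytes
        return written;
    }
}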
Dropped pure-Java support in https://github.com/xerial/snappy-java/releases/tag/v1.1.9.0