xerial / snappy-java

Snappy compressor/decompressor for Java

PureJavaSnappy causes BufferUnderflowException when reading w/ hadoop-common's SnappyCodec

chrisxaustin opened this issue

I'm using hadoop-common to uncompress the Hadoop Snappy format, since this library doesn't appear to support reading Hadoop-compatible data.
hadoop-common uses snappy-java inside its SnappyCodec.

This works for me when the native snappy is available (I reuse the codec object, but I put it in the method to keep the sample shorter):

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.SnappyCodec;

public byte[] decompressSnappy(byte[] in) throws IOException {
  SnappyCodec codec = new SnappyCodec();
  codec.setConf(new Configuration());
  try (InputStream stream = codec.createInputStream(new ByteArrayInputStream(in))) {
      return stream.readAllBytes();
  }
}

SnappyNative works perfectly, but when PureJavaSnappy is used I get a BufferUnderflowException.
After a painful debugging session I found that the cause seems to be hadoop-common's SnappyDecompressor.decompressDirectBuf calling Snappy.uncompress, which calls PureJavaSnappy.rawUncompress, which leaves the uncompressed buffer in an inconsistent state.

With SnappyNative I saw "uncompressed position=14454, remaining=0, limit=14454"
With PureJavaSnappy I saw "uncompressed pos=14454, remaining=14454, limit=14454"

The next call to uncompressedDirectBuf.get then triggered the BufferUnderflowException:
uncompressed.get(0, 8192) was called while the buffer had position=14454, remaining=0, limit=14454.
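
For context on the ByteBuffer contract: once position == limit, remaining() is 0 and any bulk get of nonzero length throws. A minimal standalone sketch of the state I saw (the sizes are copied from my debug output; none of this is snappy-java code):

import java.nio.ByteBuffer;

public class UnderflowDemo {
  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocateDirect(14454);
    buf.put(new byte[14454]); // after the write: position=14454, limit=14454, remaining()=0
    byte[] out = new byte[8192];
    buf.get(out, 0, 8192);    // throws BufferUnderflowException
    // Calling buf.flip() before the get would set position=0, limit=14454,
    // and the bulk get would succeed.
  }
}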

This may be related to #295.

@xerial, while related, I think this is actually a different issue.
I think there needs to be a call to flip() after calling position(), prior to returning here.

Or maybe, instead of changing the position, just set the limit?
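
Roughly, the two options would look like this at the end of the pure-Java decompress path (variable names are illustrative, not the actual snappy-java fields):

// Option 1: advance the position past the written bytes, then flip so the
// buffer is ready for reading (position -> 0, limit -> end of data):
output.position(outPos + decompressedSize);
output.flip();

// Option 2: leave the position where the caller set it and only bound the
// readable region:
output.limit(outPos + decompressedSize);

If I'm reading the flow correctly, either option leaves remaining() equal to the decompressed byte count (assuming the output buffer starts at position 0), so the subsequent bulk get in SnappyDecompressor would succeed.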