xerial / snappy-java

Snappy compressor/decompressor for Java

PureJavaSnappy causes BufferUnderflowException when reading w/ hadoop-common's SnappyCodec

chrisxaustin opened this issue

I'm using hadoop-common to uncompress the Hadoop Snappy format, since this library doesn't appear to support reading Hadoop-compatible data.
hadoop-common uses snappy-java inside its SnappyCodec.

This works for me when the native snappy is available (I reuse the codec object, but I put it in the method to keep the sample shorter):

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.SnappyCodec;

public byte[] decompressSnappy(byte[] in) throws IOException {
  SnappyCodec codec = new SnappyCodec();
  codec.setConf(new Configuration());
  try (InputStream stream = codec.createInputStream(new ByteArrayInputStream(in))) {
      return stream.readAllBytes();
  }
}

SnappyNative works perfectly, but when PureJavaSnappy is used I get a BufferUnderflowException.
After a painful debugging session I found that the cause seems to be hadoop-common's SnappyDecompressor.decompressDirectBuf calling Snappy.uncompress, which calls PureJavaSnappy.rawUncompress, which leaves the uncompressed buffer in an inconsistent state.

With SnappyNative I saw "uncompressed position=14454, remaining=0, limit=14454"
With PureJavaSnappy I saw "uncompressed pos=14454, remaining=14454, limit=14454"

The next call to uncompressedDirectBuf.get then triggered the BufferUnderflowException:
uncompressed.get(0, 8192) was called while the buffer had position=14454, remaining=0, limit=14454.
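
For context on the ByteBuffer contract: once position == limit, remaining() is 0 and any bulk get of nonzero length throws. A minimal standalone sketch of the state I saw (the sizes are copied from my debug output; none of this is snappy-java code):

import java.nio.ByteBuffer;

public class UnderflowDemo {
  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocateDirect(14454);
    buf.put(new byte[14454]); // after the write: position=14454, limit=14454, remaining()=0
    byte[] out = new byte[8192];
    buf.get(out, 0, 8192);    // throws BufferUnderflowException
    // Calling buf.flip() before the get would set position=0, limit=14454,
    // and the bulk get would succeed.
  }
}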

This may be related to #295.

@xerial, while related, I think this is actually a different issue.
I think there needs to be a call to flip() after calling position(), prior to returning here.

Or maybe, instead of changing the position, just set the limit?
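
Roughly, the two options would look like this at the end of the pure-Java decompress path (variable names are illustrative, not the actual snappy-java fields):

// Option 1: advance the position past the written bytes, then flip so the
// buffer is ready for reading (position -> 0, limit -> end of data):
output.position(outPos + decompressedSize);
output.flip();

// Option 2: leave the position where the caller set it and only bound the
// readable region:
output.limit(outPos + decompressedSize);

If I'm reading the flow correctly, either option leaves remaining() equal to the decompressed byte count (assuming the output buffer starts at position 0), so the subsequent bulk get in SnappyDecompressor would succeed.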