rust-lang / flate2-rs

DEFLATE, gzip, and zlib bindings for Rust

Home Page:https://docs.rs/flate2

Continue reading a stream after a ZlibDecoder stream finishes

marxin opened this issue

I'm implementing the parsing of the git pack file format as part of the coding challenge:
https://app.codecrafters.io/courses/git/stages/7

It seems the git pack format is a binary file format where each object consists of a header followed by a zlib-compressed stream. What's unpleasant is that one doesn't know the size of the compressed block. Is it possible to get back the underlying stream (with into_inner or get_mut), including the data already buffered by ZlibDecoder, so that I can carry on reading the next object header?
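For context, here is a rough sketch of reading one such object header. It assumes the layout described in git's pack-format documentation (3-bit type and a variable-length size in the first bytes), and `read_object_header` is a hypothetical helper name. The size stored in the header is the inflated size, which is why the length of the compressed block is not known up front.

```rust
use std::io::{self, BufRead, Read};

/// Read a pack object header and return `(object_type, inflated_size)`.
/// Assumed layout: the first byte holds a continuation bit (0x80), a 3-bit
/// type, and the low 4 bits of the size; each following byte adds 7 more
/// size bits while its continuation bit is set.
fn read_object_header<R: BufRead>(reader: &mut R) -> io::Result<(u8, u64)> {
    let mut byte = [0u8; 1];
    reader.read_exact(&mut byte)?;
    let object_type = (byte[0] >> 4) & 0x07;
    let mut size = u64::from(byte[0] & 0x0f);
    let mut shift = 4u32;
    while byte[0] & 0x80 != 0 {
        if shift >= 64 {
            // Guard against malformed input that never clears the continuation bit.
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "object size does not fit in 64 bits",
            ));
        }
        reader.read_exact(&mut byte)?;
        size |= u64::from(byte[0] & 0x7f) << shift;
        shift += 7;
    }
    Ok((object_type, size))
}
```

The function takes a `BufRead` rather than a plain `Read` so that the same reader can afterwards be handed to a `bufread` decoder.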

This should work with bufread::ZlibDecoder. See this test for bufread::GzDecoder. This code modified for zlib should work the same, allowing trailing data to be read from the BufRead after calling into_inner().

Note that the same test does not work for read::GzDecoder and similarly I do not expect it to work with read::ZlibDecoder.
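The difference comes down to who owns the read buffer. The std-only sketch below is an analogy, not flate2's actual code: it assumes the read-based decoder buffers input internally much like `std::io::BufReader` does, so bytes pulled into that internal buffer are gone once `into_inner()` is called. Both function names are hypothetical.

```rust
use std::io::{self, BufRead, BufReader, Cursor, Read};

/// Read `n` bytes through an internal `BufReader`, then return what is still
/// readable from the underlying reader after `into_inner()`.
fn leftover_after_internal_buffering(data: &[u8], n: usize) -> io::Result<Vec<u8>> {
    let mut buffered = BufReader::new(Cursor::new(data.to_vec()));
    let mut head = vec![0u8; n];
    buffered.read_exact(&mut head)?;
    // into_inner() drops whatever BufReader had already pulled into its buffer.
    let mut inner = buffered.into_inner();
    let mut rest = Vec::new();
    inner.read_to_end(&mut rest)?;
    Ok(rest)
}

/// Consume exactly `n` bytes from the caller's own `BufRead`, then return the rest.
fn leftover_after_consume<R: BufRead>(mut reader: R, n: usize) -> io::Result<Vec<u8>> {
    let available = reader.fill_buf()?.len();
    assert!(n <= available, "can only consume bytes that are currently buffered");
    reader.consume(n);
    let mut rest = Vec::new();
    reader.read_to_end(&mut rest)?;
    Ok(rest)
}
```

For a short input such as `b"HEADtrailing"` with `n = 4`, the first function returns nothing because the internal buffer swallowed the whole input, while the second returns `b"trailing"`.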

#402 adapts the gzip test to demonstrate that this also works for the deflate and zlib BufRead decoders.

Thank you very much for the fast response! It's great that the current bufread::ZlibDecoder works the way I need.
I can confirm it works for me in my particular test case.

I have two comments:

  • Would it be possible to document the behavior here: https://docs.rs/flate2/latest/flate2/bufread/struct.ZlibDecoder.html#method.into_inner ? I would also add a caveat to read::ZlibDecoder::into_inner saying that one can't do the same there.
  • Just out of curiosity: why doesn't e.g. bufread::ZlibDecoder implement std::io::BufRead? It would be handy, as one would not have to wrap the bufread::ZlibDecoder again in a BufReader::new().

As the original question was answered with tests, I think it's fair to close this issue, though you are welcome to continue the conversation here.

Regarding documentation, please feel free to open a PR with the docs improvements you would have wanted to see. Maybe you can also play around with ZlibDecoder and implement BufRead on it. Maybe even more improvements will arise from that :).

There is an existing discussion on why the bufread decoders do not implement BufRead.

The docs for bufread and write GzDecoder have text describing this behaviour. This can be copied to the docs for the other decoders.

The docs for bufread and write GzDecoder have text describing this behaviour.

Can you please send me a link to the behavior description? I can't find it :)

bufread:

flate2-rs/src/gz/bufread.rs

Lines 171 to 174 in 8a502a7

/// After reading a single member of the gzip data this reader will return
/// Ok(0) even if there are more bytes available in the underlying reader.
/// If you need the following bytes, call `into_inner()` after Ok(0) to
/// recover the underlying reader.

write:

flate2-rs/src/gz/write.rs

Lines 174 to 176 in 8a502a7

/// After decoding a single member of the gzip data this writer will return the number of bytes up to
/// to the end of the gzip member and subsequent writes will return Ok(0) allowing the caller to
/// handle any data following the gzip member.

And there is an equivalent paragraph for the read decoder to say that this does not work:

flate2-rs/src/gz/read.rs

Lines 97 to 101 in 8a502a7

/// After reading a single member of the gzip data this reader will return
/// Ok(0) even if there are more bytes available in the underlying reader.
/// `GzDecoder` may have read additional bytes past the end of the gzip data.
/// If you need the following bytes, wrap the `Reader` in a `std::io::BufReader`
/// and use `bufread::GzDecoder` instead.