sstadick / crabz

Like pigz, but rust


gzip should be handled by ParDecompress

derolf opened this issue

When compressing to gzip in parallel, crabz uses ZBuilder, which instantiates a ParCompress when num_threads > 1.

However, decompression always uses single-threaded MultiGzDecoder. Why is it not using ParDecompress when num_threads > 1?

commented

The gzip format itself isn't able to make use of multiple threads for decompression. For parallel decompression, use the Mgzip or Bgzf formats, which are block compression formats and can take advantage of multithreading for decompression.

But Crabz does create an mgz with num_threads > 1?

What I am trying to say is that crabz uses ZBuilder for the gzip compression, and ZBuilder creates an mgz if num_threads > 1 (using ParCompress).

https://github.com/sstadick/gzp/blob/4bba36567d19a74aa4b7f13b932c7c28f96fb812/src/lib.rs#L241
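For context, that ZBuilder pattern looks roughly like the following minimal sketch, modeled on the gzp README (exact builder methods and re-exports may differ between gzp versions):

use std::io::Write;

use gzp::{deflate::Gzip, ZBuilder, ZWriter};

fn main() {
    // With num_threads > 1 this instantiates a ParCompress internally;
    // with 0 or 1 threads it falls back to a single-threaded encoder.
    let mut writer = ZBuilder::<Gzip, _>::new()
        .num_threads(4)
        .from_writer(std::io::stdout());
    writer.write_all(b"hello gzip\n").unwrap();
    writer.finish().unwrap();
}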

commented

Correct. Normal gzip is asymmetric: multiple threads can be used to compress a file, but regular gzip files can only be decompressed single-threaded.
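For reference, the single-threaded side of that asymmetry is roughly the following sketch, using flate2's MultiGzDecoder (the decoder crabz falls back to for plain gzip):

use std::io::{self, Write};

use flate2::read::MultiGzDecoder;

fn main() -> io::Result<()> {
    // MultiGzDecoder understands a stream of concatenated gzip members,
    // but it can only decompress them sequentially on a single thread.
    let mut reader = MultiGzDecoder::new(io::stdin());
    let stdout = io::stdout();
    let mut writer = stdout.lock();
    io::copy(&mut reader, &mut writer)?;
    writer.flush()?;
    Ok(())
}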

So I should be able to use ParDecompress to decompress it? (I have some huge XML files that I compress with parallel crabz and want to decompress programmatically.)

commented

Ah, I'd recommend compressing them with crabz -f bgzf and then using gzp as follows:

// `input`, `output`, `num_threads`, and `pin_at` are assumed to be defined by
// the caller; `BgzfSyncReader` and `ParDecompressBuilder` come from gzp.
if num_threads == 0 {
    // Single-threaded fallback when no worker threads are requested.
    let mut reader = BgzfSyncReader::new(input);
    io::copy(&mut reader, &mut output)?;
    output.flush()?;
} else {
    // Parallel BGZF decompression across `num_threads` worker threads.
    let mut reader = ParDecompressBuilder::<Bgzf>::new()
        .num_threads(num_threads) // 4 threads per file is about where decompression maxes out. Anything more is not helping
        .unwrap()
        .pin_threads(pin_at) // pinning is very optional, could ignore it.
        .from_reader(input);
    io::copy(&mut reader, &mut output)?;
    output.flush()?;
    reader.finish()?;
};

I don't have as nice an abstraction over decompression at this time. It's been on the todo list though!

Good questions, this does expose some weaknesses in the documentation.

Hm, I also tried the bgzf compression, but it was way slower than gzip.

What’s wrong with using gzip and ParCompress/ParDecompress?

commented

It is extremely odd that BGZF compression would be slower than gzip compression. Can you share the CLI invocations of crabz for both formats?

Nothing is wrong with using gzip with ParCompress; there is just no ParDecompress available for Gzip.

Some timings (1.74 GiB XML):

$ pv -N IN -c na.osm.xml | crabz -f gzip -l 9 | pv -N OUT -c > /dev/null
[2022-02-16T09:02:57Z INFO  crabz] Compressing (gzip) with 8 threads at compression level 9.
       IN: 1.74GiB 0:00:17 [ 104MiB/s] [=============================================>] 100%            
      OUT:  216MiB 0:00:17 [12.7MiB/s] [                     <=>                                       ]
$ pv -N IN -c na.osm.xml | crabz -f mgzip -l 12 | pv -N OUT -c > /dev/null
[2022-02-16T09:03:48Z INFO  crabz] Compressing (mgzip) with 8 threads at compression level 12.
       IN: 1.74GiB 0:01:32 [19.2MiB/s] [=============================================>] 100%            
      OUT:  187MiB 0:01:32 [2.02MiB/s] [        <=>                                                    ]
$
$ pv -N IN -c na.osm.xml | crabz -f bgzf -l 12 | pv -N OUT -c > /dev/null
[2022-02-16T09:06:19Z INFO  crabz] Compressing (bgzf) with 8 threads at compression level 12.
       IN: 1.74GiB 0:01:30 [19.6MiB/s] [=============================================>] 100%            
      OUT:  191MiB 0:01:30 [2.11MiB/s] [          <=>                                                  ]
$

You can see that gzip is about 5x faster than the others.

I played around a bit with your bare-bones gzp examples and created my own little CLI just for mgzip.

Actually, the performance varies a lot with the compression level. I get good values with:

use gzp::{
    deflate::Mgzip,
    par::compress::{ParCompress, ParCompressBuilder},
    Compression, ZWriter,
};
use std::io::{Read, Write};

type FORMAT = Mgzip;
const LEVEL: u32 = 10;
const THREADS: usize = 16;
const BUFSIZE: usize = 1024 * 1024;

fn main() {
    let chunksize = BUFSIZE * 2;

    let stdout = std::io::stdout();
    let mut writer: ParCompress<FORMAT> = ParCompressBuilder::new()
        .buffer_size(BUFSIZE)
        .unwrap()
        .compression_level(Compression::new(LEVEL))
        .num_threads(THREADS)
        .unwrap()
        .from_writer(stdout);

    let stdin = std::io::stdin();
    let mut stdin = stdin.lock();

    // Read stdin in ~2 MiB chunks and feed each chunk to the parallel compressor.
    let mut buffer = Vec::with_capacity(chunksize);
    loop {
        let mut limit = (&mut stdin).take(chunksize as u64);
        limit.read_to_end(&mut buffer).unwrap();
        if buffer.is_empty() {
            break;
        }
        writer.write_all(&buffer).unwrap();
        buffer.clear();
    }
    // Flush any remaining blocks and wait for the compression threads to finish.
    writer.finish().unwrap();
}
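As a usage sketch, a filter like this drops into the same kind of pipeline as the crabz timings above (the binary name here is just a placeholder):

$ pv -N IN -c na.osm.xml | ./mgzip-compress | pv -N OUT -c > /dev/null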

Decompressor:

use gzp::{
    deflate::Mgzip,
    par::decompress::{ParDecompress, ParDecompressBuilder},
};
use std::io::{Read, Write};

type FORMAT = Mgzip;
const THREADS: usize = 16;

fn main() {
    let chunksize = 1 * 1024 * 1024;

    let stdin = std::io::stdin();

    let mut reader: ParDecompress<FORMAT> = ParDecompressBuilder::new()
        .num_threads(THREADS)
        .unwrap()
        .from_reader(stdin);

    let stdout = std::io::stdout();
    let mut stdout = stdout.lock();

    // Pull decompressed data from the parallel reader in ~1 MiB chunks
    // and stream it to stdout.
    let mut buffer = Vec::with_capacity(chunksize);
    loop {
        let mut limit = (&mut reader).take(chunksize as u64);
        limit.read_to_end(&mut buffer).unwrap();
        if buffer.is_empty() {
            break;
        }
        stdout.write_all(&buffer).unwrap();
        buffer.clear();
    }
}
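One note on the decompressor: as in the Bgzf snippet earlier in the thread, it is probably worth calling reader.finish() after the loop so that any errors from the decompression worker threads are surfaced.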

So, final conclusion:

  • With bzip2, I got roughly 5 MB/s (measured as throughput of the compressed size), so processing an xml.bz2 file of ~17 GB took about an hour.

  • With mgzip+gzp and ParDecompress, I am now getting roughly 250 MB/s. Hence, I can process the same file in ~80 seconds (just raw, without parsing the XML).
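(As a rough sanity check: ~17 GB of compressed input at ~5 MB/s is about 3,400 s, close to an hour, while at ~250 MB/s it is about 68 s, in line with the ~80 seconds observed.)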

Now, XML-parsing is the bottleneck :-)

commented

Nice! Those are solid results! It's worth noting that compression level 12 for BGZF / Mgzip is not equivalent to level 9 for gzip; it actually compresses more (or should, depending on the input). If you re-ran the same commands with level 8 or 9 for the block formats, the times should even out.
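For example, re-running the earlier bgzf timing at a comparable level would be:

$ pv -N IN -c na.osm.xml | crabz -f bgzf -l 9 | pv -N OUT -c > /dev/null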

The block compressors use libdeflate, which has the following docs on compression levels: https://github.com/ebiggers/libdeflate#compression-levels.

Also, anything more than ~4 threads for decompression doesn't seem to help in my benchmarking, and possibly slows things down a bit.

Thanks a lot and keep up the good work!