ebiggers / libdeflate

Heavily optimized library for DEFLATE/zlib/gzip compression and decompression

Optional Dictionary based compression?

JesseRMeyer opened this issue

Hi, thanks for the great library.

My use case involves compressing many small streams that share similar data characteristics. LZ4 and Zstandard let you preload the compressor and decompressor state with a prebuilt "dictionary" of common content, which can dramatically improve both compression ratio and speed on inputs like these. It is a simple and powerful tool.

Is optional dictionary support compatible with the goals of this project? If so, is it a planned feature?

Best,
Jesse

I added streaming and multi-threading support to libdeflate; the code is at stream_mt (see #335 for more).
That work adds two new APIs, libdeflate_deflate_compress_block() and libdeflate_deflate_decompress_block(), which can be used for this request:

  1. First, create a dictionary of up to 32 KiB (the DEFLATE window size) with other tools.
  2. Each time, append the short data you want to compress right after the dictionary in a single buffer, then call the compress function like this:
compressed_code_nbytes = libdeflate_deflate_compress_block(c, in_dict_and_short, dict_nbytes, short_nbytes,
                                                           1, out_code, out_code_nbytes_avail, NULL);
  3. When decompressing, you must use the same dictionary: place it at the start of the uncompressed output buffer. After the decompress function returns, your decompressed short data follows the dictionary in that buffer. Call the decompress function like this:
err_ret = libdeflate_deflate_decompress_block(d, in_code, code_nbytes, out_dict_and_short, dict_nbytes,
                                              out_short_nbytes, NULL, out_code_nbytes_avail,
                                              LIBDEFLATE_STOP_BY_FINAL_BLOCK, NULL);
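The buffer handling in steps 2 and 3 can be sketched with just the C standard library. This is a minimal sketch under assumptions: the helper names `make_dict_and_data`, `make_dict_and_room` and `trim_dict` are hypothetical (not part of libdeflate or the stream_mt code), and trimming an oversized dictionary to its last 32 KiB is this sketch's own choice, based only on DEFLATE matches reaching back at most 32 KiB. The libdeflate calls are left out because their exact signatures are only known from the calls shown above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* DEFLATE match distances reach back at most 32 KiB. */
#define DEFLATE_WINDOW_SIZE 32768

/* Hypothetical: keep only the last 32 KiB of an oversized dictionary,
   since those are the bytes a match starting in the data can still reach. */
static const unsigned char *trim_dict(const unsigned char *dict, size_t *nbytes)
{
    if (*nbytes > DEFLATE_WINDOW_SIZE) {
        dict += *nbytes - DEFLATE_WINDOW_SIZE;
        *nbytes = DEFLATE_WINDOW_SIZE;
    }
    return dict;
}

/* Hypothetical helper for step 2: returns a malloc'd buffer laid out as
   [dict][data]; *dict_used_ret receives the dictionary length actually used. */
unsigned char *make_dict_and_data(const unsigned char *dict, size_t dict_nbytes,
                                  const unsigned char *data, size_t data_nbytes,
                                  size_t *dict_used_ret)
{
    assert(dict_used_ret != NULL);
    dict = trim_dict(dict, &dict_nbytes);
    unsigned char *buf = malloc(dict_nbytes + data_nbytes + 1); /* +1 avoids malloc(0) */
    if (buf == NULL)
        return NULL;
    memcpy(buf, dict, dict_nbytes);
    memcpy(buf + dict_nbytes, data, data_nbytes);
    *dict_used_ret = dict_nbytes;
    return buf;
}

/* Hypothetical helper for step 3: returns a malloc'd buffer laid out as
   [dict][room for out_nbytes]; decompressed data is expected to land at
   buf + *dict_used_ret. Using the same trimming keeps both sides in sync. */
unsigned char *make_dict_and_room(const unsigned char *dict, size_t dict_nbytes,
                                  size_t out_nbytes, size_t *dict_used_ret)
{
    assert(dict_used_ret != NULL);
    dict = trim_dict(dict, &dict_nbytes);
    unsigned char *buf = malloc(dict_nbytes + out_nbytes + 1);
    if (buf == NULL)
        return NULL;
    memcpy(buf, dict, dict_nbytes);
    *dict_used_ret = dict_nbytes;
    return buf;
}
```

The compressor would then be given the `[dict][data]` buffer with `dict_nbytes` set to `*dict_used_ret`, and the decompressor the `[dict][room]` buffer with the same value. Keeping the tail rather than the head matches what zlib's deflateSetDictionary() does with a dictionary longer than the window, and the zlib manual likewise suggests putting the most commonly used strings toward the end of the dictionary.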

If you use zlib, you can do the same thing with inflateSetDictionary() followed by inflate(); the compressed data stays compatible.