pombreda / bundle

Bundle is an embeddable compression library that supports ZIP, LZMA, LZIP, ZPAQ, LZ4, ZSTD, BROTLI, BSC and SHOCO (C++03)(C++11)

bundle

  • Bundle is an embeddable compression library that supports ZIP, LZMA, LZIP, ZPAQ, LZ4, ZSTD, BROTLI, BSC and SHOCO (C++03)(C++11).
  • Bundle is optimized for highest compression ratios on each compressor, where possible.
  • Bundle is optimized for fastest decompression times on each decompressor, where possible.
  • Bundle is easy to integrate: it comes as an amalgamated distribution.
  • Bundle is tiny: just header and source files. Self-contained, with dependencies included.
  • Bundle is cross-platform.
  • Bundle is BOOST licensed.

bundle stream format

[b000000000111xxxx]  Header (12 bits). De/compression algorithm (4 bits)
                     { NONE, SHOCO, LZ4, DEFLATE, LZIP, LZMA20, ZPAQ, LZ4HC, BROTLI, ZSTD, LZMA25, BSC }.
[vle_unpacked_size]  Unpacked size of the stream (N bytes). The size is stored as a
                     variable-length encoded value: bytes are shifted and added into a
                     big accumulator until the MSB is found.
[vle_packed_size]    Packed size of the stream (N bytes). The size is stored as a
                     variable-length encoded value: bytes are shifted and added into a
                     big accumulator until the MSB is found.
[bitstream]          Compressed bitstream (N bytes). As returned by compressor.
                     If possible, header-less bitstreams are preferred.
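
The layout above can be sketched as follows. This is a minimal illustration, not bundle's actual code: the helper names (`parse_header`, `vle_encode`, `vle_decode`) are hypothetical, the header is assumed to be read as two bytes with the most significant byte first, and the size fields are assumed to use 7 payload bits per byte with the MSB set on every byte except the last. The real byte order and MSB convention may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Header sketch: b000000000111xxxx read as two bytes, most significant first
// (assumption). The low 4 bits of the second byte select the algorithm.
inline bool parse_header(std::uint8_t b0, std::uint8_t b1, unsigned &algorithm) {
    if (b0 != 0x00 || (b1 & 0xf0) != 0x70) return false;
    algorithm = b1 & 0x0f;
    return true;
}

// Size sketch (assumption): 7 payload bits per byte, least significant group
// first; the MSB is set on every byte except the last one.
inline std::vector<std::uint8_t> vle_encode(std::uint64_t value) {
    std::vector<std::uint8_t> out;
    do {
        std::uint8_t byte = static_cast<std::uint8_t>(value & 0x7f);
        value >>= 7;
        if (value != 0) byte |= 0x80;  // more bytes follow
        out.push_back(byte);
    } while (value != 0);
    return out;
}

// Decodes one value starting at data[0]. Returns the number of bytes
// consumed, or 0 if the input ends before a terminating byte is seen.
inline std::size_t vle_decode(const std::uint8_t *data, std::size_t size,
                              std::uint64_t &value) {
    assert(data != nullptr || size == 0);
    std::uint64_t acc = 0;
    unsigned shift = 0;
    for (std::size_t i = 0; i < size && shift < 64; ++i, shift += 7) {
        acc |= static_cast<std::uint64_t>(data[i] & 0x7f) << shift;  // shift and add
        if ((data[i] & 0x80) == 0) {  // last byte of this value
            value = acc;
            return i + 1;
        }
    }
    return 0;
}
```

Under these assumptions small sizes cost a single byte, and a reader can skip from the header straight to the bitstream by decoding the two size fields in order.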

bundle archive format

- Files/data are packed into streams by using any compression method (see above)
- Streams are archived into a standard ZIP file:
  - ZIP entry compression is (0) for packed streams and (1-9) for unpacked streams.
  - ZIP entry comment is a serialized JSON of (file) metadata (@todo).
- Note: you can mix streams of different algorithms into the very same ZIP archive.

sample

#include <cassert>
#include <iostream>
#include <string>
#include <vector>
#include "bundle.hpp"

int main() {
    // ~55 MB dataset: each pass doubles the string and appends a short run of one character
    std::string original( "There's a lady who's sure all that glitters is gold" );
    for (int i = 0; i < 20; ++i) original += original + std::string( i + 1, 32 + i );

    // pack, unpack & verify with every compressor
    using namespace bundle;
    std::vector<unsigned> libs { RAW, LZ4, LZ4HC, SHOCO, MINIZ, LZMA20, LZIP, LZMA25, ZPAQ, BROTLI, ZSTD, BSC };
    for( auto &use : libs ) {
        std::string packed = pack(use, original);
        std::string unpacked = unpack(packed);  // the algorithm is read back from the stream header
        std::cout << name_of(use) << ": " << original.size() << " to " << packed.size() << " bytes" << std::endl;
        assert( original == unpacked );         // round trip must be lossless
    }

    std::cout << "All ok." << std::endl;
}

possible output

[ OK ] NONE: ratio=0% enctime=29002us dectime=15001us (zlen=55574506 bytes)
[ OK ] LZ4: ratio=96.2244% enctime=29002us dectime=20002us (zlen=2098285 bytes)
[ OK ] LZ4HC: ratio=99.5944% enctime=235023us dectime=17001us (zlen=225409 bytes)
[ OK ] SHOCO: ratio=26.4155% enctime=374037us dectime=266026us (zlen=40894196 bytes)
[ OK ] MINIZ: ratio=99.4327% enctime=228022us dectime=20002us (zlen=315271 bytes)
[ OK ] LZMA20: ratio=99.9346% enctime=2917291us dectime=51005us (zlen=36355 bytes)
[ OK ] LZIP: ratio=99.9574% enctime=3091306us dectime=184018us (zlen=23651 bytes)
[ OK ] LZMA25: ratio=99.9667% enctime=3030303us dectime=50005us (zlen=18513 bytes)
[ OK ] ZPAQ: ratio=99.9969% enctime=100332432us dectime=101158165us (zlen=1743 bytes)
[ OK ] BROTLI: ratio=99.9982% enctime=3673829us dectime=114723us (zlen=1019 bytes)
[ OK ] ZSTD: ratio=99.8687% enctime=25002us dectime=18001us (zlen=72969 bytes)
[ OK ] BSC: ratio=99.9991% enctime=53005us dectime=63006us (zlen=524 bytes)
All ok.
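
The ratio column appears to be the space saving in percent, that is 100 * (1 - zlen / original length). The sketch below reproduces it, assuming the NONE row's zlen (55574506 bytes) is the original length; `ratio_percent` is a hypothetical helper, not part of bundle.

```cpp
#include <cassert>
#include <cstddef>

// Space saving in percent: 0 means no gain, values close to 100 mean the
// packed stream is tiny compared to the original.
inline double ratio_percent(std::size_t original_len, std::size_t packed_len) {
    assert(original_len > 0);
    return 100.0 * (1.0 - static_cast<double>(packed_len) /
                          static_cast<double>(original_len));
}
```

For the LZ4 row, `ratio_percent(55574506, 2098285)` gives roughly 96.2244, which matches the listing.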

on picking compressors (as a rule of thumb)

  • sorted by compression ratio (best first)
    • zpaq < lzma25 / bsc < lzip < lzma20 < brotli < zstd < miniz < lz4hc < lz4
  • sorted by compression time (fastest first)
    • lz4 < lz4hc < zstd < miniz < lzma20 < lzip < lzma25 / bsc << zpaq <<< brotli
  • sorted by decompression time (fastest first)
    • lz4hc < lz4 < zstd < miniz < brotli < lzma20 / lzma25 < lzip < bsc << zpaq
  • sorted by memory overhead (lowest first)
    • lz4 < lz4hc < zstd < miniz < brotli < lzma20 < lzip < lzma25 / bsc < zpaq
  • consider SHOCO for plain-text ASCII IDs (SHOCO is an entropy text compressor)

functional api

- bool is_packed( T )
- bool is_unpacked( T )
- T pack( unsigned q, T )
- bool pack( unsigned q, T out, U in )
- bool pack( unsigned q, const char *in, size_t len, char *out, size_t &zlen )
- T unpack( T )
- bool unpack( unsigned q, T out, U in )
- bool unpack( unsigned q, const char *in, size_t len, char *out, size_t &zlen )
- unsigned type_of( string )
- string name_of( string )
- string version_of( string )
- string ext_of( string )
- size_t length( string )
- size_t zlength( string )
- void *zptr( string )
- size_t bound( unsigned q, size_t len )
- const char *const name_of( unsigned q )
- const char *const version( unsigned q )
- const char *const ext_of( unsigned q )
- unsigned type_of( const void *mem, size_t size )

archival api

struct file : map<string,string> { // ~map of properties
  bool has(property);              // property check
  string &get(property);           // property access
};
struct archive : vector<file>    { // ~sequence of files
  void bin(string);                // serialization
  string bin() const;              // serialization
  string toc() const;              // debug
};
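
To make the shape of these two types concrete, here is a stand-in that mirrors the declarations above. It is only a sketch: the `sketch` namespace, the property names `name` and `data`, and the `toc()` output format are assumptions for illustration, and the ZIP serialization in `bin()` is left out.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

namespace sketch {  // hypothetical stand-in, not bundle's implementation

struct file : std::map<std::string, std::string> {  // ~map of properties
    bool has(const std::string &property) const {   // property check
        return find(property) != end();
    }
    std::string &get(const std::string &property) { // property access
        return (*this)[property];  // creates an empty property if missing
    }
};

struct archive : std::vector<file> {                // ~sequence of files
    // Debug listing: one line per file, each property shown as key(size).
    std::string toc() const {
        std::ostringstream out;
        for (std::size_t i = 0; i < size(); ++i) {
            out << "[" << i << "]";
            for (const auto &kv : (*this)[i])
                out << " " << kv.first << "(" << kv.second.size() << ")";
            out << "\n";
        }
        return out.str();
    }
};

}  // namespace sketch
```

Because `archive` is a vector of maps, entries can be added with `resize()` or `push_back()` and filled property by property before serialization.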

licenses

  • bundle, BOOST license.
  • brotli by Jyrki Alakuijala and Zoltan Szabadka, Apache 2.0 license.
  • easylzma by Igor Pavlov and Lloyd Hilaiel, public domain.
  • giant, BOOST license.
  • libzpaq by Matt Mahoney, public domain.
  • libbsc by Ilya Grebnov, Apache 2.0 license.
  • lz4 by Yann Collet, BSD license.
  • miniz by Rich Geldreich, public domain.
  • shoco by Christian Schramm, MIT license.
  • zstd by Yann Collet, BSD license.

evaluated alternatives

FastLZ, FLZP, LibLZF, LZFX, LZHAM, LZJB, LZLIB, LZO, LZP, SMAZ, Snappy, ZLIB, bzip2, Yappy
