tatsuya0619 / lzabt

alpha balanced grammar compression

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

LZD

This is an implementation of the LZ Double-factor factorization (LZD factorization). LZD is a simple extension of the well-known compression algorithm LZ78. While LZ78 factorize an input string to the sequence of pairs of a longest previous factor and the succeeding character, LZD factorize an input string to the sequence of pairs of a longest previous factor and the succeeding longest previous factor. LZD shows better compression ratio than LZ78 in practical.

LZ-ABT

LZ-ABT is a new LZ78-style grammar compression algorithm which is a simple online algorithm to create. This can avoid the lower-bound time of the naive algorithms for LZD, other LZ78-style compression algorithms, which was observed in [Badkobeh et. al., SPIRE 2017, 51-67]

Compile

We use SCons to build source codes. To compile, you just type following command in the top of the project directory. Binary files are put in out directory.

$ scons

Note that you may have to modify some settings such as compiler in SConstruct for your environment.

Compress

Usage

$ out/compress
Usage  : out/compress [options]
Options: 
  -f FileName           : input file
  -o FileName           : output file
  -c                    : check whether decompressed string equals the input
  -d NUM                : set the debug level
  -l maxSize            : set max code size
  -a lzd_ns             : LZD not using stream
  -a lzd                : LZD
  -a lzabt_ns           : LZABT_NS
  -a lzabt              : LZABT
  -a lzabt_comp         : LZABT in compressed space

Examples

The following command compresses SConstruct, and output to hoge.lz by the algorithm LZ-ABT.

$ out/compress -f SConstruct -o hoge.lz -a lzabt -p 0.2

Decompress

Usage

% out/lzdDecompress
Usage  : out/decompress [options]
Options                 : 
  -f FileName           : input file
  -o FileName           : output file
  -d NUM                : debug mode

To decode binary file encoded with 'out/comress' command, use 'out/decompress'

Examples

The following command decompresses hoge.lz and output to decoded.txt. All encoded files compressed by 'out/compress' have the same encode format

$ out/deecompress -f hoge.lz -o decoded.txt

References

The detail of the algorithm was described in the following papers.

Keisuke Goto, Hideo Bannai, Shunsuke Inenaga and Masayuki Takeda. LZD Factorization: Simple and Practical Online Grammar Compression with Variable-to-Fixed Encoding, In Proceedings of the 26th Annual Symposium on Combinatorial Pattern Matching

Tatsuya Ohno, Keisuke Goto, Yoshimasa Takabatake, Tomohiro I and Hiroshi Sakamoto. LZ-ABT: A Practical Algorithm for \alpha-Balanced Grammar Compression, In Proceedings of the 29th International Workshop on Combinatorial Algorithms

License

Copyright (c) 2018 Tatsuya Ohno. This software released under the MIT License.

About

alpha balanced grammar compression

License:Other


Languages

Language:C++ 57.2%Language:C 22.5%Language:Python 18.6%Language:Shell 0.9%Language:Makefile 0.8%Language:Dockerfile 0.1%