LZFXS

Overview
History
LZFXS buffer format
- Literal runs
- Back references
LZFXS file format
API
- Buffer-to-buffer compression
- Buffer-to-buffer decompression

Overview

LZFXS is a compact and low-complexity data compression algorithm.

History

LZFXS is based on Simplified LZFX by Jan Ding, which is based on LZFX by Andrew Collette, which is based on LZF by Marc Lehmann.

LZFXS buffer format

There are two kinds of data structures in LZFXS: literal runs and back references.

Literal runs

Literals are encoded as follows:

The length of a literal run is encoded as L - 1, as it must contain at least one byte.

000LLLLL <L+1 bytes>

Back references

Back references are encoded as follows:

The smallest possible encoded length value is 1, since otherwise, the control byte would be recognized as a literal run. Since at least three bytes must match for a back reference to be inserted, the length is encoded as L - 2 instead of L - 1.

The offset (distance to the desired data in the output buffer) is encoded as o - 1, as all offsets are at least 1.

The binary format is:

LLLooooo oooooooo           for backrefs of real length  < 9  (1 <= L < 7)
111ooooo LLLLLLLL oooooooo  for backrefs of real length >= 9  (L > 7)

LZFXS file format

An LZFXS file (as opposed to a compressed buffer) is made up of a sequence of blocks. Each block contains a header followed by a data payload.

Block header format

Each block begins with a 10-byte header in the following format:

'L'	'Z'	'F'	'X'	K1	K2	L1	L2	L3	L4

K1 and K2 are a two-byte code identifying the contents, or kind, of the block. The values are stored in big-endian format; K1 is the high byte and K2 is the low byte.

The currently defined kinds are:

`0`	Reserved
`1`	Compressed (LZFXS) data block
`2`	Uncompressed data block

L1 through L4 contain the length of the data payload. L1 is the most significant byte and L4 is the least. Decompression programs which cannot understand a given type code are required to skip the block.

Uncompressed blocks

The block contains raw data which should be copied to the output file.

Compressed blocks

The first four bytes contain the uncompressed size of the data, in big-endian format. The remainder of the data payload contains compressed data.

API

Buffer-to-buffer compression

int lzfxs_compress(const void* ibuf, unsigned int ilen,
                         void* obuf, unsigned int *olen);

Supply pre-allocated input and output buffers via ibuf and obuf, and their sizes, in bytes, via ilen and olen. Buffers may not overlap.

On success, the function returns a non-negative value and the argument olen contains the compressed size in bytes.

On failure, a negative value is returned and olen is not modified.

Buffer-to-buffer decompression

int lzfxs_decompress(const void* ibuf, unsigned int ilen,
                           void* obuf, unsigned int *olen);

Supply pre-allocated input and output buffers via ibuf and obuf, and their sizes, in bytes, via ilen and olen. Buffers may not overlap.

On success, the function returns a non-negative value and the argument olen contains the uncompressed size in bytes.

On failure, a negative value is returned.

If the failure code is LZFXS_ESIZE, olen contains the minimum buffer size required to hold the decompressed data. Otherwise, olen is not modified.

Supplying a zero *olen is a valid and supported strategy to determine the required buffer size. This does not require decompression of the entire stream and is consequently very fast. Argument obuf may be NULL in this case only.

johnsonjh / lzfxs