amallia / VTEnc

VTEnc C library

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

VTEnc

VTEnc is a compression algorithm for sorted lists of unsigned integers. This C library provides an interface to use VTEnc with different data types (8, 16, 32 and 64 bits) and with both lists and sets.

Results

This is a work-in-progress section. Last updated on: 2019-11-10

For reference, VTEnc has been tested on a laptop Ubuntu Desktop 19.04 with a Core i7-6700HQ CPU @ 2.60GHz x 8, using different data sets and comparing it with other integer compression algorithms. See Tests and Benchmarks below for further information.

  • Timestamps (ts.txt):
Algorithm Encoded Size Ratio % Encoding Speed Decoding Speed
VTEnc 21,686 0.0038 60.9155 G/s 734.54 M/s
Delta+FastPFor256 1,179,312 0.20 2.00714 G/s 4.75146 G/s
Delta+FastPFor128 2,306,544 0.40 1.9029 G/s 4.82925 G/s
Delta+BinaryPacking 4,552,280 0.79 8.5867 G/s 5.77439 G/s
Delta+VariableByte 144,285,504 25.0 4.86063 G/s 5.09461 G/s
Delta+VarIntGB 180,356,880 31.25 6.75428 G/s 9.2638 G/s
Copy 577,141,992 100.0 10.4087 G/s -
  • gov2.sorted:
Algorithm Encoded Size Ratio % Encoding Speed Decoding Speed
VTEnc 2,889,599,350 12.08 - -
memcpy 23,918,861,764 100.0 - -

API

The documented interface is found in the header file vtenc.h.

Examples

Encoding example:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "vtenc.h"

int main()
{
  const uint8_t in[] = {13, 14, 29, 39, 65, 80, 88, 106, 152, 154, 155, 177};
  const size_t in_len = sizeof(in) / sizeof(in[0]);
  const size_t out_cap = vtenc_list_max_encoded_size_u8(in_len); /* output capacity */
  uint8_t *out=NULL;
  size_t out_len=0;
  VtencErrorCode code;

  /* allocate `out_cap` bytes */
  out = (uint8_t *) malloc(out_cap * sizeof(uint8_t));
  if (out == NULL) {
    fprintf(stderr, "allocation error\n");
    return 1;
  }

  /* encode `in` list into `out` stream of bytes */
  code = vtenc_list_encode_u8(in, in_len, out, out_cap, &out_len);
  if (code != VtencErrorNoError) {
    fprintf(stderr, "failed encoding with error code: %u\n", code);
    free(out);
    return 1;
  }

  /* here `out` holds the encoded output, which is `out_len` bytes long */

  free(out);

  return 0;
}

Decoding example:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "vtenc.h"

int main()
{
  const uint8_t in[] = {0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x90, 0x26,
    0x24, 0x8d, 0x75, 0xfd, 0x95, 0x83, 0x9b, 0x03};
  const size_t in_len = sizeof(in) / sizeof(in[0]);
  size_t out_len = vtenc_list_decoded_size_u8(in, in_len); /* output length */
  uint8_t *out=NULL;
  VtencErrorCode code;

  /* allocate `out_len` items */
  out = (uint8_t *) malloc(out_len * sizeof(uint8_t));
  if (out == NULL) {
    fprintf(stderr, "allocation error\n");
    return 1;
  }

  /* decode `in` stream of bytes into `out` list */
  code = vtenc_list_decode_u8(in, in_len, out, out_len);
  if (code != VtencErrorNoError) {
    fprintf(stderr, "failed decoding with error code: %u\n", code);
    free(out);
    return 1;
  }

  /* here `out` holds the decoded list of integers, of size `out_len` */

  free(out);

  return 0;
}

Limitations

For now, this library has only been tested on Linux and MacOS systems. It shouldn't be too difficult to support Windows as well and I'm planing to do so in future releases.

Building

To compile the library, run make in root directory. That will generate the static and the shared libraries (.a and .so files), ready to be included in your own projects.

Tests

This library is well covered with unit tests and with a few little programs to test different data sets. To build both the programs and the unit tests, run make test in the root directory. The following executable files will be created:

  • tests/unit/unit_tests: invoke it to run all the unit tests.

  • tests/timestamps: program to test ts.txt, a text file with a big list of timestamps.

  • tests/gov2: program to test gov2.sorted file, which is part of the "Document identifier data set" created by D. Lemire.

Benchmarks

The tables shown in the Results section are the result of running the tests included in the integer-compression-benchmarks repository.

License

This code is licensed under MIT license.

About

VTEnc C library

License:MIT License


Languages

Language:C 93.2%Language:C++ 5.1%Language:Makefile 1.7%