lemire / streamvbyte

Fast integer compression in C using the StreamVByte codec

Reduce peak memory consumption during encoding

daniel-j-h opened this issue

We don't know the required output memory upfront, so we use a function returning the worst-case memory required:

// return the maximum number of compressed bytes given length input integers
static inline size_t streamvbyte_max_compressedbytes(const uint32_t length) {
  // number of control bytes:
  size_t cb = (length + 3) / 4;
  // maximum number of data bytes:
  size_t db = (size_t) length * sizeof(uint32_t);
  return cb + db;
}
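
For context, this bound is typically paired with the library's encoder along these lines (a minimal sketch; it assumes the streamvbyte.h header and streamvbyte_encode, which returns the number of bytes actually written):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "streamvbyte.h"

int main(void) {
  uint32_t data[] = {1, 2, 3, 290, 70000, 12};
  const uint32_t length = sizeof(data) / sizeof(data[0]);

  // allocate for the worst case: one control byte per four integers
  // plus four data bytes per integer
  uint8_t *out = malloc(streamvbyte_max_compressedbytes(length));
  if (out == NULL) return EXIT_FAILURE;

  // the encoder reports how many of those bytes it actually used
  size_t written = streamvbyte_encode(data, length, out);
  printf("compressed %u integers into %zu bytes\n", length, written);

  free(out);
  return EXIT_SUCCESS;
}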

But when we are encoding small integers (or small deltas), oftentimes most if not all values fit into a single byte each.

In these cases, we still need to allocate upfront

control bytes + n * 4

whereas

control bytes + n * 1

bytes would suffice.
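
For example, for n = 2^30 input integers that each fit into a single byte, the worst-case bound reserves 2^28 control bytes plus 4 * 2^30 data bytes, about 4.25 GB, while 2^28 + 2^30 bytes, about 1.25 GB, would actually be enough.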

There are use cases where I'd like to allocate e.g. only 1 GB upfront instead of allocating 4 GB and then throwing out 3 GB immediately after encoding.

Should this library provide a two-pass approach, where

  • the user first calls a function to determine the allocation required
  • the user then calls a function to encode the input data

This two-pass approach might be slower in terms of runtime, but it can reduce the allocation required for the data bytes by a factor of four in the best case.

Users can write their own version (summing up the bytes required per input item, e.g. as sketched below), but having a function in the library would be great for convenience and would allow efficient implementations in the future. Thoughts?
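
Such a version might look roughly like the following sketch (the name streamvbyte_compressedbytes is only a suggestion; the 1-4 data bytes per value follow from the codec storing each integer in the fewest bytes its magnitude allows):

#include <stddef.h>
#include <stdint.h>

// exact number of compressed bytes for the given input: one control
// byte per group of four integers, plus 1, 2, 3, or 4 data bytes per
// value depending on its magnitude
static inline size_t streamvbyte_compressedbytes(const uint32_t *in,
                                                 const uint32_t length) {
  size_t db = 0; // exact data bytes
  for (uint32_t i = 0; i < length; i++) {
    const uint32_t v = in[i];
    db += (v < (1u << 8)) ? 1 : (v < (1u << 16)) ? 2 : (v < (1u << 24)) ? 3 : 4;
  }
  // control bytes, identical to the worst-case bound
  const size_t cb = (length + 3) / 4;
  return cb + db;
}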

@daniel-j-h

Pull requests invited!

  1. I would discourage you from processing data in blocks of gigabytes. Consider breaking down the data so that you can work in cache.
  2. A large (4GB) virtual memory allocation is very fast. Try to benchmark "malloc(large value)"; you should find that the time elapsed is independent of the allocation size (see the sketch after this list). Most systems will only allocate real memory when you access a page.
  3. Importantly, you do not want each data structure to be independently allocated: I recommend an arena approach where you allocate a chunk for most of your needs (containing many data structures). Allocating physical memory is really slow and you don’t want to keep doing it.
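
To illustrate point 2, here is a rough sketch of such a benchmark (assuming a 64-bit system; it times only the malloc call itself, not the first-touch page faults that later back the memory):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// time a single malloc of the given size; on most systems the pages
// are only backed by physical memory on first access, so the elapsed
// time should be roughly independent of the requested size
static long long time_malloc_ns(size_t size) {
  struct timespec start, end;
  clock_gettime(CLOCK_MONOTONIC, &start);
  void *p = malloc(size);
  clock_gettime(CLOCK_MONOTONIC, &end);
  if (p == NULL) return -1;
  free(p);
  return (end.tv_sec - start.tv_sec) * 1000000000LL +
         (end.tv_nsec - start.tv_nsec);
}

int main(void) {
  printf("1 MB: %lld ns\n", time_malloc_ns((size_t)1 << 20));
  printf("4 GB: %lld ns\n", time_malloc_ns((size_t)1 << 32));
  return 0;
}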

Small pull request at #33

I appreciate your detailed response 🙌 I was looking into https://github.com/iiSeymour/pystreamvbyte to play around with, and in there we create an np.empty based on the estimated maximum compressed bytes. The numpy array always gets allocated (just not initialized) - that's why I thought computing the exact number of bytes required would be great to have in the C version here to begin with.