lemire / streamvbyte

Fast integer compression in C using the StreamVByte codec

Encoding should be vectorized

lemire opened this issue · comments

Currently, encoding is computed using relatively slow scalar functions.

@KWillets If you do push code, please create an "AUTHORS" file with your name in it.

This is ready to start integrating and testing with the rest of the code. The good news is that it takes about 10 instructions per vector of four input uint32_t values, so it is only 2-3x slower than the decoder. It makes extensive use of a bit hack that does a shift-or of bytes via a multiply, and of a pshufb lookup table.
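For readers unfamiliar with the multiply trick, here is a minimal scalar sketch of the general idea, not the code from this patch. It assumes each byte of a 32-bit word already holds a 2-bit length code (0..3) and gathers the four codes into one key byte with a single multiply. The function names and the constant are illustrative only.

```c
#include <stdint.h>

/* Pack four 2-bit codes, one per byte of x, into a single key byte.
 * Precondition: every byte of x is in the range 0..3.
 * Multiplying by 2^24 + 2^18 + 2^12 + 2^6 adds four shifted copies of x.
 * The copies land on disjoint 2-bit fields, so no carries occur, and the
 * top byte ends up as c0 | c1 << 2 | c2 << 4 | c3 << 6. */
static uint8_t pack_codes_mul(uint32_t x) {
    return (uint8_t)((x * UINT32_C(0x01041040)) >> 24);
}

/* Straightforward shift-or version, for comparison. */
static uint8_t pack_codes_ref(uint32_t x) {
    return (uint8_t)((x & 3)
                     | (((x >> 8) & 3) << 2)
                     | (((x >> 16) & 3) << 4)
                     | (((x >> 24) & 3) << 6));
}
```

The multiply replaces three shifts and three ors with one instruction, which is the appeal when the same operation is applied across vector lanes.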

A few coding notes:

I have not looked at the delta coding.

I created a u128 union type to keep from going insane initializing and accessing __m128i values. This may be good for shuffle tables as well.
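A union along these lines might look like the sketch below. The member names and the helper function are assumptions for illustration, not the definition used in the patch, and the example assumes an x86 target with SSE2.

```c
#include <emmintrin.h> /* SSE2: __m128i, _mm_add_epi32, _mm_set1_epi32 */
#include <stdint.h>

/* Hypothetical 128-bit union: the same 16 bytes viewed as a vector
 * register or as plain arrays that can be initialized and indexed. */
typedef union {
    __m128i  vec;
    uint8_t  u8[16];
    uint32_t u32[4];
    uint64_t u64[2];
} u128_sketch;

/* Illustrative helper: add c to each 32-bit lane through the vector view. */
static u128_sketch add_to_each_lane(u128_sketch x, int32_t c) {
    x.vec = _mm_add_epi32(x.vec, _mm_set1_epi32(c));
    return x;
}
```

With this, a constant can be written as `u128_sketch k = { .u32 = {1, 2, 3, 4} };` and read back lane by lane, which is also convenient for spelling out shuffle-table rows as byte arrays.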

The encoder table is in an include file generated from the script.
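As a rough idea of what such a generated table could contain, the sketch below builds one 16-byte shuffle mask per key byte at runtime. It assumes the usual StreamVByte key layout (two bits per value, storing length minus one) and emulates pshufb in portable C. The function names and the table layout are assumptions; the real include file may be organized differently.

```c
#include <stdint.h>

/* For each key byte, list the source byte indices that survive packing.
 * Lane i of the input occupies bytes 4*i .. 4*i+3 and keeps its low
 * `len` bytes. Unused slots get 0xFF, which pshufb turns into zero. */
static void build_encode_shuffle_table(uint8_t table[256][16]) {
    for (unsigned key = 0; key < 256; key++) {
        unsigned out = 0;
        for (unsigned lane = 0; lane < 4; lane++) {
            unsigned len = ((key >> (2 * lane)) & 3) + 1;
            for (unsigned b = 0; b < len; b++)
                table[key][out++] = (uint8_t)(4 * lane + b);
        }
        while (out < 16)
            table[key][out++] = 0xFF;
    }
}

/* Portable emulation of pshufb: a mask byte with the high bit set
 * produces 0, otherwise its low four bits select a source byte. */
static void shuffle_bytes(const uint8_t src[16], const uint8_t mask[16],
                          uint8_t dst[16]) {
    for (unsigned i = 0; i < 16; i++)
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
}
```

Generating the table from a script, as done here, keeps the 4 KB of constants out of hand-written code and makes it cheap to regenerate if the layout changes.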

The svb_encode_vector entry point is similar to svb_encode_scalar but it obviously calls the vector encoder on each 4-word frame. svb_encode_scalar should be used to finish the ragged end.
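The frame loop plus ragged tail can be sketched as follows. This is a portable illustration, not the actual entry point: `sketch_encode_quad` is a scalar stand-in for the vector kernel, and all names here are hypothetical. It assumes the StreamVByte layout with all key bytes first, followed by the data bytes in little-endian order.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes (1..4) needed to store v. */
static unsigned byte_length(uint32_t v) {
    if (v < (UINT32_C(1) << 8))  return 1;
    if (v < (UINT32_C(1) << 16)) return 2;
    if (v < (UINT32_C(1) << 24)) return 3;
    return 4;
}

/* Write the low `len` bytes of v, least significant first. */
static uint8_t *put_bytes(uint8_t *data, uint32_t v, unsigned len) {
    for (unsigned i = 0; i < len; i++)
        data[i] = (uint8_t)(v >> (8 * i));
    return data + len;
}

/* Scalar stand-in for the vector kernel: encodes exactly four values. */
static uint8_t *sketch_encode_quad(const uint32_t *in, uint8_t *key,
                                   uint8_t *data) {
    uint8_t k = 0;
    for (unsigned i = 0; i < 4; i++) {
        unsigned len = byte_length(in[i]);
        k |= (uint8_t)((len - 1) << (2 * i));
        data = put_bytes(data, in[i], len);
    }
    *key = k;
    return data;
}

/* Encode `count` values into `out`; returns the number of bytes written.
 * `out` must have room for (count + 3) / 4 + 4 * count bytes. */
static size_t sketch_encode(const uint32_t *in, size_t count, uint8_t *out) {
    uint8_t *keys = out;
    uint8_t *data = out + (count + 3) / 4;
    size_t i = 0;
    for (; i + 4 <= count; i += 4)          /* full 4-word frames */
        data = sketch_encode_quad(in + i, keys++, data);
    if (i < count) {                        /* ragged end, one value at a time */
        uint8_t k = 0;
        for (unsigned shift = 0; i < count; i++, shift += 2) {
            unsigned len = byte_length(in[i]);
            k |= (uint8_t)((len - 1) << shift);
            data = put_bytes(data, in[i], len);
        }
        *keys = k;
    }
    return (size_t)(data - out);
}
```

Keeping the tail scalar means the vector kernel never has to read past the end of the input, at the cost of at most three values going through the slow path.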

I am open to opinions as to whether we should keep the wall-of-code or break it out into modules, include files, etc.

Great. I think I should be able to add the vectorized deltas.

I’ll review within a week.

This is now integrated into streamvbyte, and it passes unit.c and example.c.

Sadly, I have done no performance tests yet.

Performance is about 7x faster for 500k random ints: 588 million ints per second on my laptop.

Ok. So the deltas are now encoded, mostly by re-using @KWillets's code.