Encoding should be vectorized
lemire opened this issue
Currently, encoding is computed using relatively slow scalar functions.
@KWillets If you do push code, please create an "AUTHORS" file with your name in it.
This is ready to start integrating and testing with the rest of the code. The good news is that it's about 10 instructions per vector of four input uint32_t's, so only 2-3x slower than the decoder. It makes extensive use of a bithack to do shift-or of bytes via multiply, and a pshufb lookup table.
A few coding notes:
I have not looked at the delta coding.
I created a u128 union type to keep from going insane initializing and accessing __m128i's. This may be good for shuffle tables as well.
The encoder table is in an include file generated from the script.
The svb_encode_vector entry point is similar to svb_encode_scalar but it obviously calls the vector encoder on each 4-word frame. svb_encode_scalar should be used to finish the ragged end.
I am open to opinions as to whether we should keep the wall-of-code or break it out into modules, include files, etc.
Great. I think I should be able to add the vectorized deltas.
I’ll review within a week.
This is now integrated into streamvbyte, and it passes unit.c and example.c.
Sadly, I have done no performance tests yet.
Performance is about 7x faster for 500k random int's. 588 million int's per second on my laptop.
Solved by @KWillets
Ok. So the deltas are now encoded, mostly by re-using @KWillets's code.