Encoding should be vectorized
lemire opened this issue
Currently, encoding is computed using relatively slow scalar functions.
@KWillets If you do push code, please create an "AUTHORS" file with your name in it.
This is ready to start integrating and testing with the rest of the code. The good news is that it's about 10 instructions per vector of four input uint32_t's, so only 2-3x slower than the decoder. It makes extensive use of a bithack to do shift-or of bytes via multiply, and a pshufb lookup table.
A few coding notes:
I have not looked at the delta coding.
I created a u128 union type to keep from going insane initializing and accessing __m128i's. This may be good for shuffle tables as well.
The encoder table is in an include file generated from the script.
The svb_encode_vector entry point is similar to svb_encode_scalar but it obviously calls the vector encoder on each 4-word frame. svb_encode_scalar should be used to finish the ragged end.
I am open to opinions as to whether we should keep the wall-of-code or break it out into modules, include files, etc.
Great. I think I should be able to add the vectorized deltas.
I’ll review within a week.
This is now integrated into streamvbyte, and it passes unit.c and example.c.
Sadly, I have done no performance tests yet.
Performance is about 7x faster for 500k random int's. 588 million int's per second on my laptop.
Solved by @KWillets
Ok. So the deltas are now encoded, mostly by re-using @KWillets's code.