Cyan4973 / FiniteStateEntropy

New generation entropy codecs : Finite State Entropy and Huff0

FSE performance optimizations

pingladd opened this issue

Hello, I am working on FSE performance optimizations and am trying to implement 8 states instead of the default 2, which might accelerate decompression by processing the 8 states in parallel. To handle the 8 states, I changed the bitContainer to the __m128i type and modified all the related functions. In my tests, some files compress and decompress correctly, but others come out corrupted: the decoded file differs slightly from the original.
I tried to find the bug but could not. Would someone be willing to check my code or discuss it with me? Thank you!

I would recommend starting with 4-states, using a 64-bit container.
This would be much more straightforward and easier to debug.

Once that works, stretching the implementation to 8-states using a 128-bit container would be a smaller step.
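
To make the container sizes concrete, here is a rough bit-budget sketch. It rests on two assumptions that are not stated above: each state reads at most 12 bits per symbol (the maximum table log; check the limit your build actually uses), and refills are byte-granular, so up to 7 bits of the container may be unusable right after a reload. `states_per_refill` and `REFILL_SLACK_BITS` are hypothetical names used only for this illustration.

```c
#include <assert.h>

/* Assumption: refills are byte-granular, so up to 7 bits of the
   container may be unavailable right after a reload. */
#define REFILL_SLACK_BITS 7u

/* Hypothetical helper: number of states that can each decode one symbol
   (reading at most maxTableLog bits) between two refills of a container
   that is containerBits wide. */
static unsigned states_per_refill(unsigned containerBits, unsigned maxTableLog)
{
    assert(maxTableLog > 0 && containerBits > REFILL_SLACK_BITS);
    return (containerBits - REFILL_SLACK_BITS) / maxTableLog;
}
```

Under these assumptions, `states_per_refill(64, 12)` gives 4 and `states_per_refill(128, 12)` gives 10. That fits the suggestion above: a 64-bit container can feed 4 states per refill in the worst case, while 8 states need the wider container.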

Note that __m128i is unlikely to behave like a native integer type.
This has some rather complex consequences: operations like + or << on such a type are no longer as simple as they look, which affects performance. It is therefore not obvious that such a move would improve performance.
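
As an illustration of why << stops being simple, here is a portable sketch that emulates a 128-bit left shift with two 64-bit halves. The `u128_emul` type and `shl128` function are hypothetical stand-ins, not part of FSE. The same issue appears with SSE2: `_mm_slli_epi64` shifts each 64-bit lane independently, and `_mm_slli_si128` shifts by whole bytes, so a bit-granular shift across the full 128 bits has to be composed from several operations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit bit container. */
typedef struct {
    uint64_t lo;  /* bits 0..63   */
    uint64_t hi;  /* bits 64..127 */
} u128_emul;

/* Shift a 128-bit value left by n bits (0 <= n < 128).
   Bits leaving the low half must be carried into the high half by hand,
   which is the extra work a single native << would hide. */
static u128_emul shl128(u128_emul v, unsigned n)
{
    u128_emul r;
    assert(n < 128);
    if (n == 0) return v;              /* avoid an undefined shift by 64 below */
    if (n >= 64) {
        r.hi = v.lo << (n - 64);       /* low half moves entirely into high half */
        r.lo = 0;
    } else {
        r.hi = (v.hi << n) | (v.lo >> (64 - n));  /* carry across the halves */
        r.lo = v.lo << n;
    }
    return r;
}
```

Each shift now costs a branch or several instructions instead of one, and a bitstream decoder shifts its container for every symbol it reads. A carry step like the one above is also easy to get subtly wrong, which would fit the symptom of files that decode almost correctly.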