kspalaiologos / bzip3

A better and stronger spiritual successor to BZip2.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

hardware crc32

rurban opened this issue · comments

for performance reasons you need to probe for hardware crc32 support (most chips do have it)
and use it, instead of the SW variant. there are various variants (PCLMUL in x86 or just the crc32 intrinsics)

CRC32 takes approximately 1.1% of the runtime. The timing on the Silesia corpus is 17.42s, so CRC32 took an absolutely astonishing amount of time - 170 milliseconds.

Hardware CRC32 uses obscure polynomials and as such it's also not portable (one architecture has hardware CRC32 with polynomial 1, other has hardware CRC32 with polynomial 2).

The optimisation is mostly irrelevant and the focus on optimisation could be moved to e.g. src/cm.c which could really use some help now, as it uses probably more than 90% of the runtime.

Oh nice. I thought it's more there. So closing. The crc32c polynomial is the usual one

Actually I did some profile on the call stack. Although the call is from python, the native stack did show something. See bz3_encode_block and the following stacks.
profile (2)

what was the testing corpus? most of the data must've been collapsed by RLE, making the graph appear as if arithmetic coding, SAIS construction, ... are taking little to no time - which can't be true.

ps: i just came home from surgery, i will be less responsive for a while wrt. OSS.

I tested it with some repeated data generated in python. The profiler I'm using is py-spy, which can profile native extensions. I was also surprised at the result.

your benchmark must be wrong. try bzip3 on more real-world data where RLE doesn't save this much space.