Use cramjam to provide compression utilities
nsmith- opened this issue
@lgray mentioned https://pypi.org/project/cramjam/ to me, and it looks like a nice way to provide many of the ROOT compression algorithms in a single dependency-free package. Additionally, it lets the caller declare the output length so that the buffer can be pre-allocated, which may provide some speedup for algorithms other than lz4, currently the only one that uses the uncompressed-size hint:
uproot5/src/uproot/compression.py, line 194 in fd0637b
This is an internal change and would not provide any user-facing enhancement other than a potential speed-up.
Just to finish the chain this was mentioned to me by @martindurant.
> it allows the caller to declare the output length so that the buffer can be pre-allocated
You can also allocate buffers yourself and use decompress_into; I don't know if there's a use case for that.
Actually, yes! One thing uproot often does is decompress many small chunks and then concatenate them into a larger contiguous buffer. We could save some additional allocation and copy time if we could decompress into a buffer at an arbitrary offset.
> if we could decompress into a buffer at an arbitrary offset
Yes, you certainly can, I think by just slicing the base numpy array.
Since we want Uproot to work in Pyodide, it's important to note that cramjam works in Pyodide.
Writing into a single, contiguous buffer with decompress_into would require some rearchitecting: possible, but a major project. Also, it could only work for non-ragged data (or only the outer indexes of ragged data). It could perhaps be an extension of uproot.AsDtypeInPlace.
I've split out the request to use cramjam's decompress-in-place into a new issue; this will be closed when #1090 is.