scikit-hep / uproot5

ROOT I/O in pure Python and NumPy.

Home Page: https://uproot.readthedocs.io


Use cramjam to provide compression utilities

nsmith- opened this issue · comments

@lgray mentioned https://pypi.org/project/cramjam/ to me, and it looks like a nice way to provide many of the ROOT compression algorithms in a single dependency-free package. It also lets the caller declare the output length so that the buffer can be pre-allocated, which may speed up algorithms other than lz4, currently the only one that uses the uncompressed-size hint:

return lz4_block.decompress(data, uncompressed_size=uncompressed_bytes)

This is an internal change and would not provide any user-facing enhancement other than a potential speed-up.
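To illustrate the size-hint idea for an algorithm other than lz4, here is a minimal sketch that uses only the standard library. It relies on the `bufsize` argument of `zlib.decompress`, which sets the initial size of the output buffer. The helper name `decompress_with_hint` is hypothetical and not part of Uproot. cramjam's `output_len` argument is understood to play a similar role, but it is not exercised here.

```python
import zlib


def decompress_with_hint(data, uncompressed_bytes):
    """Decompress zlib data, passing the known output size as a buffer hint.

    Hypothetical helper for illustration only; not Uproot's implementation.
    """
    # bufsize is the initial size of the output buffer, so an accurate hint
    # lets zlib avoid growing the buffer repeatedly while decompressing.
    out = zlib.decompress(data, bufsize=max(uncompressed_bytes, 1))
    if len(out) != uncompressed_bytes:
        raise ValueError(
            f"expected {uncompressed_bytes} bytes, got {len(out)}"
        )
    return out


payload = b"uproot" * 1000
compressed = zlib.compress(payload)
restored = decompress_with_hint(compressed, len(payload))
```

The hint only affects allocation, not the result: a wrong value still decompresses correctly, and the explicit length check is there to catch corrupted metadata.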

Just to finish the chain: this was mentioned to me by @martindurant.

it allows to declare the output length so it can pre-allocate the buffer

You can also allocate buffers yourself and use decompress_into. I don't know if there's a use case for that.

Actually, yes! One thing uproot is often doing is decompressing many small chunks and then concatenating them into a larger contiguous buffer. We could save some additional allocation and copy time if we can decompress into a buffer at an arbitrary offset.

if we can decompress into a buffer at an arbitrary offset.

Yes, you certainly can, I think just by slicing the base NumPy array.
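The layout discussed here, many small chunks landing in one contiguous buffer at running offsets, can be sketched as follows. This is an illustration under stated assumptions, not Uproot's code: `fill_contiguous` and `zlib_decompress_into` are hypothetical names, and the zlib stand-in still makes an intermediate copy. A true `decompress_into`, such as the one cramjam is said to offer, would write straight into the slice and avoid that copy.

```python
import zlib


def zlib_decompress_into(src, dst):
    """Stand-in with a decompress_into-like shape: returns bytes written.

    Unlike a real decompress_into, this decompresses to a temporary
    bytes object and then copies it into dst.
    """
    out = zlib.decompress(src, bufsize=max(len(dst), 1))
    dst[: len(out)] = out  # raises ValueError if the sizes do not match
    return len(out)


def fill_contiguous(chunks, uncompressed_sizes,
                    decompress_into=zlib_decompress_into):
    """Decompress each chunk into its own slice of one preallocated buffer."""
    buffer = bytearray(sum(uncompressed_sizes))
    view = memoryview(buffer)  # slicing a memoryview does not copy
    offset = 0
    for chunk, size in zip(chunks, uncompressed_sizes):
        written = decompress_into(chunk, view[offset:offset + size])
        if written != size:
            raise ValueError(f"expected {size} bytes, got {written}")
        offset += size
    return buffer


parts = [b"abc" * 10, b"de" * 7, b"f" * 5]
chunks = [zlib.compress(p) for p in parts]
sizes = [len(p) for p in parts]
result = fill_contiguous(chunks, sizes)
```

Because every slice is a view onto the same `bytearray`, no concatenation step is needed at the end. The same pattern should carry over to slices of a NumPy `uint8` array.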

Since we want Uproot to work in Pyodide, it's important to note that cramjam works in Pyodide.


Writing into a single, contiguous buffer with decompress_into would require some rearchitecting: possible, but a major project. Also, it could only work for non-ragged data (or only for the outer indexes of ragged data). It could perhaps be an extension of uproot.AsDtypeInPlace.

I've split out the request to use cramjam's decompress-in-place into a new issue; this will be closed when #1090 is.