Blosc / python-blosc

A Python wrapper for the extremely fast Blosc compression library

Home Page: https://www.blosc.org/python-blosc/python-blosc.html

compress_ptr address/pointer question - for bytearrays or memoryviews

small-mallet opened this issue

The blosc.compress_ptr() example is shown for numpy arrays. I am wondering if it might be possible to use this function for bytearrays and such (that are contiguous).

numpyarray.__array_interface__['data'][0] seems to give the memory address of the array.
What is the way to get the address of built-in objects like bytes, bytearrays, memoryviews, etc.?

I am only interested in this because I observed almost 2x the compression speed (matching the performance of an equivalent C program) by:

  1. Converting the bytearray to numpy array using np.frombuffer()
  2. Compressing using blosc.compress_ptr()
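The two steps above can be sketched roughly as follows. This is a minimal illustration, not code from the issue: the helper name compress_bytearray is hypothetical, and the buffer is assumed to hold raw bytes, so it is viewed as uint8 with a typesize of 1.

```python
def compress_bytearray(buf, clevel=9):
    """Compress a bytearray via a zero-copy NumPy view (hypothetical helper)."""
    # Imported inside the function so this sketch still loads where the
    # optional packages are missing.
    import blosc
    import numpy as np

    # Step 1: view the bytearray as a uint8 array; no data is copied.
    arr = np.frombuffer(buf, dtype=np.uint8)

    # Step 2: hand the array's memory address to blosc.
    address = arr.__array_interface__['data'][0]
    return blosc.compress_ptr(address, arr.size,
                              typesize=arr.dtype.itemsize, clevel=clevel)
```

The result can be passed to blosc.decompress() to get the original bytes back. The bytearray must stay alive and must not be resized while the address is in use.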

The conversion doesn't seem to be all that costly for my use case (a couple of microseconds), but I am just wondering if there is a simpler way to point directly at the bytearray. Or is frombuffer as fast as it's going to get?

Yes, np.frombuffer() does not copy the data; the resulting array shares memory with the original buffer, so this is a perfectly good way to use compress_ptr() in your case.
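For writable built-in buffers there may also be a route that skips NumPy: the standard-library ctypes module can report the address of a bytearray or writable memoryview. The sketch below is an assumption-laden illustration rather than anything the python-blosc docs recommend; the helper names buffer_address and compress_buffer are hypothetical, and read-only objects such as bytes are not covered.

```python
import ctypes


def buffer_address(buf):
    """Return the address of a writable, contiguous buffer (hypothetical helper)."""
    # nbytes is the total size in bytes, whatever the item size is.
    with memoryview(buf) as view:
        nbytes = view.nbytes
    # Wrap the existing memory in a ctypes array; nothing is copied.
    # from_buffer() raises TypeError for read-only objects such as bytes.
    c_array = (ctypes.c_char * nbytes).from_buffer(buf)
    # The address is only valid while buf is alive and not resized.
    return ctypes.addressof(c_array)


def compress_buffer(buf, clevel=9):
    """Compress a bytearray or writable memoryview as raw bytes (sketch)."""
    # Imported lazily so buffer_address() stays usable without blosc installed.
    import blosc

    with memoryview(buf) as view:
        nbytes = view.nbytes
    # typesize=1 because the buffer is treated as a plain byte stream.
    return blosc.compress_ptr(buffer_address(buf), nbytes,
                              typesize=1, clevel=clevel)
```

Whether this is any faster than np.frombuffer() would need to be measured; the NumPy view is already zero-copy, so any difference is likely to be small.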

That's a relief, thx for the quick reply! I am really impressed with this library and the python bindings are a huge help. Maybe this 'method?' can be mentioned in the docs for any other people looking for this?

Sure, do you want to contribute the documentation for this?

I am uhh, new to github but created a new pull request for this. This one here: #215