galtay / hilbertcurve

Maps between a 1-D space-filling Hilbert curve and N-D coordinates

Feature request: numpy array compatibility

diamond-lizard opened this issue · comments

It would be useful if this package could directly accept numpy arrays as input and produce numpy arrays as output.

I've thought a little bit about this. I think it would be useful, but there are two things to consider:

  • the benefits of arbitrary integer size provided by the Python integer class would be lost
  • I'm not sure what the best API would be for the two main methods:
      - coordinates_from_distance(self, h: int) -> List[int]
      - distance_from_coordinates(self, x_in: List[int]) -> int

What do you think?
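
To put a rough number on the first point: a Hilbert curve over n dimensions with p iterations visits 2^(n*p) points, so the largest distance is 2^(n*p) - 1. The sketch below checks when that still fits in a signed 64-bit integer, assuming int64 would be the fixed-width numpy dtype in question. The helper names are made up for illustration and are not part of the package.

```python
INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold


def max_distance(n: int, p: int) -> int:
    """Largest distance on a Hilbert curve with n dimensions and p iterations."""
    return 2 ** (n * p) - 1


def fits_in_int64(n: int, p: int) -> bool:
    """True if every distance on the curve fits in a signed 64-bit integer."""
    return max_distance(n, p) <= INT64_MAX


print(fits_in_int64(2, 3))   # True: max distance is 63
print(fits_in_int64(3, 21))  # True: max distance is 2**63 - 1
print(fits_in_int64(4, 16))  # False: max distance is 2**64 - 1
```

Python integers handle all three cases; a fixed-width dtype only handles curves with n*p up to 63.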

For this feature to be implemented, it's not necessary for the existing functions to accept/return numpy arrays.

It would be sufficient for other, equivalent functions to exist that are numpy-specific.

With separate functions for integers vs numpy arrays, one would not have to give up the advantages of the one to use the other.

Two other alternatives:

1 - have one set of functions that are sensitive to their argument types, so they can do the right thing if passed either integers or numpy arrays

2 - have one set of functions that do the right thing with either integers or numpy arrays based on an extra argument (or perhaps some other means) that tells them which type is intended

The API would, of course, depend on which of these were implemented.
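
A minimal sketch of how the two alternatives could look in one helper. The function name and the `as_array` argument are hypothetical, chosen only to illustrate the idea; they are not taken from the package.

```python
import numpy as np


def match_input_type(results, original_input, as_array=None):
    """Return `results` (a list) either as a list or as a numpy array.

    as_array=None        -> alternative 1: follow the type of `original_input`
    as_array=True/False  -> alternative 2: the caller says which type is wanted
    """
    if as_array is None:
        as_array = isinstance(original_input, np.ndarray)
    return np.asarray(results) if as_array else list(results)


points = [[0, 0], [0, 1]]
print(type(match_input_type(points, [0, 1])))                  # <class 'list'>
print(type(match_input_type(points, np.arange(2))))            # <class 'numpy.ndarray'>
print(type(match_input_type(points, [0, 1], as_array=True)))   # <class 'numpy.ndarray'>
```

Either way, existing callers that pass plain lists would keep getting plain lists back.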

For your use case, are you running into speed problems? I ask because it would be relatively easy to unpack numpy arrays into lists, run the current implementation on each list, and then pack the results back into numpy arrays. It would be much harder to write the algorithm so that it uses the string and bitwise functions directly on numpy arrays.
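
The unpack/run/pack approach could be sketched roughly as follows. The wrapper name is hypothetical, and `row_major_index` is only a toy stand-in for a list-based method so the example runs on its own; it is not a Hilbert curve mapping.

```python
import numpy as np


def distances_via_lists(points, point_to_distance):
    """Apply a list-based, per-point function to every row of an array."""
    # tolist() unpacks the array into nested lists of plain Python ints
    rows = np.asarray(points).tolist()
    # run the list-based function on each point
    distances = [point_to_distance(row) for row in rows]
    # pack the results back into an array
    return np.array(distances)


def row_major_index(xs):
    """Toy stand-in: index of a 2-D point on an 8x8 grid, row by row."""
    return xs[0] * 8 + xs[1]


pts = np.array([[0, 0], [0, 1], [1, 0]])
print(distances_via_lists(pts, row_major_index).tolist())  # [0, 1, 8]
```

With the list-based API quoted earlier in this thread, the callable passed in would presumably be the bound `distance_from_coordinates` method of a curve instance. Packing into a fixed-width array may not preserve very large distances.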

For me personally, at this point this is just a matter of convenience.

I'm working with numpy arrays, and as a user I'd just like to hand off those arrays to a library to perform the operation I need rather than write extra code to get the data out of and back into the format that the library functions require.

If the functions could handle numpy arrays directly, I would neither need to write the extra code nor even think about converting there and back. I could just pass in an array and get an array back. That would be maximally convenient.

At this point I am not having any performance issues, but performance might become an issue in the future. So, again, for me personally performance is secondary to convenience right now.

Got it. I'll leave this issue up as a reminder to look into this when I have some time. Thanks for the feedback!

#33 should handle this when merged. It also adds some multiprocessing functionality (the README will be updated).

With version 2:

In [1]: from hilbertcurve.hilbertcurve import HilbertCurve

In [2]: hc = HilbertCurve(n=2, p=3)

In [3]: import numpy as np

In [4]: hc.points_from_distances(np.arange(4))
Out[4]: [[0, 0], [0, 1], [1, 1], [1, 0]]

In [5]: hc.points_from_distances(np.arange(4), match_type=True)
Out[5]:
array([[0, 0],
       [0, 1],
       [1, 1],
       [1, 0]])