hpyproject / hpy

HPy: a better API for Python

Home Page: https://hpyproject.org

Exposing APIs from HPy extensions to be used by other HPy extensions

steve-s opened this issue

The motivating example is the NumPy API that is exposed to other Python extensions such that they can work with arrays natively/directly without a round-trip through Python code/abstractions.

How the NumPy API works at the moment:

  • NumPy provides a header file defining a struct that holds pointers to some objects (e.g., the array type) and to the API functions; this is similar to HPyContext
  • NumPy exposes a PyCapsule containing a pointer to this struct, filled in with pointers to the implementation
  • a 3rd-party extension includes the NumPy header file, fishes the PyCapsule out of NumPy, gets the raw C pointer from it, and uses it to call the NumPy API through the struct (a simplified sketch of this scheme is below)
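
For reference, a minimal CPython-level sketch of this capsule scheme. All names here (my_api_struct, "mypkg._C_API", my_array_from_data) are illustrative, not NumPy's; NumPy's real table is roughly a flat array of function pointers imported via import_array(), but the mechanism is the same:

#include <Python.h>

/* --- shared header shipped by the exporting extension --- */
typedef struct {
    PyObject *array_type;                              /* e.g. the array type object */
    PyObject *(*array_from_data)(void *data, int nd);  /* an API function */
} my_api_struct;

/* --- exporting extension (the "NumPy" side): fill the struct and publish it --- */
static PyObject *my_array_from_data(void *data, int nd) { /* ... */ return NULL; }

static my_api_struct exported_api;

static int publish_api(PyObject *module)
{
    exported_api.array_type = NULL;  /* would point to the real array type object */
    exported_api.array_from_data = my_array_from_data;
    PyObject *capsule = PyCapsule_New(&exported_api, "mypkg._C_API", NULL);
    if (capsule == NULL)
        return -1;
    return PyModule_AddObject(module, "_C_API", capsule);
}

/* --- consuming extension: fish out the capsule and call through the struct --- */
static PyObject *use_api(void *buf)
{
    my_api_struct *api = (my_api_struct *)PyCapsule_Import("mypkg._C_API", 0);
    if (api == NULL)
        return NULL;
    return api->array_from_data(buf, 2);
}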

The very same scheme can work with HPy, but it has one drawback: the 3rd-party extension receives some HPyContext and passes it on to NumPy, which means:

  • NumPy must be built for an ABI-compatible HPyContext version (it could be a lower minor version, because those are binary compatible)
  • So far, the Python VM could (for some optimization/implementation reason) pass a different HPyContext instance to different packages (it can store module state in it, for example). With an HPyContext flowing from one extension to another, this is no longer possible.
  • In general it may be useful to be able to intercept and control the communication between extensions

Are those restrictions problematic enough to seek a better solution?

One possibility is to provide some way to "wrap" function pointers with a trampoline that can "transform" the HPyContext to another if necessary. Example in code:

// NumPy:
HPy my_api_function(HPyContext *ctx, HPy h) { ... }
// ...
numpy_api_capsule->my_api_function_pointer = HPy_AsAPI(ctx, &my_api_function);

// 3rd party using the API to call the function:
numpy_api_capsule->my_api_function_pointer(my_hpy_context, my_handle);

// HPy universal implementation of the generated trampoline would be:

HPy_API_token numpy_token; // implementation specific: a pointer to anything
                           // the implementation needs, initialized in the
                           // HPy_AsAPI call

HPy my_api_function_trampoline(HPyContext *caller_ctx, HPy h) {
    HPyContext *numpyCtx = _HPy_TransformContext(caller_ctx, numpy_token); // part of ABI, not API
    return my_api_function(numpyCtx, h);
}

The question is how to generate the trampoline. We could use macros for that, something like HPy_APIDef(...). As a bonus, we could also generate CPython API trampolines so that the API would be usable from non-HPy packages (NumPy would have to expose another capsule with the CPython trampolines for non-HPy packages to use).
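
A rough sketch of what such a macro could expand to for one fixed signature, HPy f(HPyContext *, HPy). HPy_API_token and _HPy_TransformContext are only the proposed names from the example above, not existing HPy API, and a real solution would need one macro variant per signature (or generated code), much like the existing HPyDef_*/HPyFunc_* machinery:

// Hypothetical: expands to a per-function token plus a trampoline that
// switches from the caller's context to the exporter's context.
#define HPy_APIDef_O(NAME, IMPL)                                        \
    static HPy_API_token NAME##_token;                                  \
    static HPy NAME##_trampoline(HPyContext *caller_ctx, HPy h)         \
    {                                                                   \
        HPyContext *callee_ctx =                                        \
            _HPy_TransformContext(caller_ctx, NAME##_token);            \
        return IMPL(callee_ctx, h);                                     \
    }

// NumPy side:
HPy_APIDef_O(my_api_def, my_api_function)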

Are those restrictions problematic enough to seek a better solution?

IMO, we definitely need some interception. I can add the following point:

  • If module A uses module B (e.g. Pandas uses NumPy) and B was loaded in debug or trace mode but A wasn't, then passing on the HPyContext would also mean that you would use a different run mode.

It may be the case that it is fine to pass the HPyContext to the next module but I think we shouldn't assume that in general.

One possibility is to provide some way to "wrap" function pointers with a trampoline that can "transform" the HPyContext to another if necessary.

Sounds good to me. I'm just not so sure about this:

numpy_api_capsule->my_api_function_pointer = HPy_AsAPI(ctx, &my_api_function);

Would HPy_AsAPI return the function pointer of the trampoline (i.e. my_api_function_trampoline in the above example)? If so, a macro like the suggested HPy_APIDef would certainly generate some kind of definition (just like HPyDef_METH or similar) and we would pass the definition to HPy_AsAPI.

Would HPy_AsAPI return the function pointer of the trampoline (i.e. my_api_function_trampoline in the above example)? If so, a macro like the suggested HPy_APIDef would certainly generate some kind of definition (just like HPyDef_METH or similar) and we would pass the definition to HPy_AsAPI.

Good point. Yes, we should probably do exactly the same thing as with HPyDef_METH -- it would generate a struct and one would pass that to HPy_AsAPI, or maybe HPy_GetAPI.
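
To make that concrete, a hypothetical sketch of the HPyDef_METH-like flavour (none of these names -- HPyAPIDef, HPy_APIDef, HPy_GetAPI -- exist in HPy today; they only illustrate the shape of the proposal):

// Generated definition struct, analogous to HPyDef:
typedef struct {
    const char *name;
    void *impl;           // the real implementation
    void *trampoline;     // generated trampoline that transforms the caller's ctx
    HPy_API_token token;  // filled in by HPy_GetAPI
} HPyAPIDef;

// NumPy side: the macro generates the trampoline plus the HPyAPIDef struct...
HPy_APIDef(my_api_def, my_api_function)

// ...and at module init the trampoline pointer is obtained from the runtime
// and stored into the exported capsule struct:
numpy_api_capsule->my_api_function_pointer = HPy_GetAPI(ctx, &my_api_def);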

Packages from the top 4000 with the string "PyArrayObject" in their sources:

asammdf
astropy
Bottleneck
cvxpy
dedupe
ecos
fastcluster
GDAL
matplotlib
numba
numexpr
numpy
opencv
osqp
pandas
pyerfa
python
scipy
scs
shap
Theano

Do we know of any other package that exposes some C API? I looked at pandas; they don't have one. What is NumPy's take on its C API: should people ideally be using memory views and other generic means over NumPy's C API? If that were the case, we could also say that exposing one's own C API is something that should not be done and hence is not supported in HPy.

What is NumPy's take on its C API: should people ideally be using memory views and other generic means over NumPy's C API?

I would assume that since there is the array API and NumPy implements it (https://numpy.org/doc/stable/reference/c-api/array.html), NumPy's take is not necessarily to use memory views. But I don't know.

Isn't that API on the Python level?

It would be nice if people used the dlpack interface, which provides a standard way to interact with array-like objects. But thinking about this more deeply, it seems that if the HPy port of NumPy must export some kind of C API, it would still have to be able to export exactly the CPython PyArrayObject. Refactoring code like this from matplotlib to avoid the NumPy C API (with PyArrayObject) is not going to be easy; it would require replacing their numpy::array_view C++ class with something else, or at least rethinking all the incref/decref in that class.

So if we are confined to using PyArrayObject, can we export that from an HPy port of NumPy without using legacy mode?

Note that all the dlpack interface requires is capsule support, which HPy has.
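
To illustrate, a minimal sketch of exporting a dlpack capsule from an HPy extension. dlpack.h comes from https://github.com/dmlc/dlpack, and HPyCapsule_New is assumed to mirror PyCapsule_New; the destructor that would free the DLManagedTensor if the consumer never takes ownership is omitted here:

#include <dlpack/dlpack.h>
#include "hpy.h"

static HPy export_dltensor(HPyContext *ctx, DLManagedTensor *tensor)
{
    // By convention the capsule is named "dltensor"; a consumer renames it
    // to "used_dltensor" once it has taken ownership of the DLManagedTensor.
    return HPyCapsule_New(ctx, tensor, "dltensor", NULL);
}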

Cython also contains a system for exposing your types/functions as an API, via automatic capsule use. But it also has internal shared-code capabilities: if you import multiple Cython modules (transpiled with the same version), they'll share the implementation of the custom function type, and things like that.