data-apis / python-record-api

Inferring Python API signatures from tracing usage.

ufunc data seems to be missing

rgommers opened this issue

I was looking for def sin and other such functions in typing/numpy.py, and they're missing completely. It's unclear why.

The actual question I was trying to answer is how often the dtype keyword is used for unary ufuncs. I thought the data I needed would be here, but it looks like it's not.

commented

Does this have to do with the Python-C bridge? That is, I am not sure the tooling currently picks up C-level argument handling, which could apply to ufuncs.

Ah yes, that's it (unfortunately). Pretty much all functions that are not ufuncs have a thin Python shim and will be picked up; ufuncs have no such shim, so they aren't.
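To illustrate the general point about the Python-C boundary, here is a minimal sketch using only the standard library. It is not how python-record-api itself is implemented (that is an assumption-free claim only about `sys.settrace`): a Python-level trace function sees a "call" event for a Python shim, but not for a callable implemented in C. `math.sin` stands in for a ufunc here, and `python_shim` and `traced_calls` are hypothetical names.

```python
import math
import sys


def python_shim(x):
    # A thin Python wrapper around a C-implemented function.
    return math.sin(x)


def traced_calls(func, *args):
    """Return the names of Python frames entered while calling func(*args)."""
    seen = []

    def tracer(frame, event, arg):
        # "call" events are only emitted when a Python frame is entered.
        if event == "call":
            seen.append(frame.f_code.co_name)
        return None  # no per-line tracing needed

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return seen


print(traced_calls(python_shim, 0.5))  # ['python_shim']
print(traced_calls(math.sin, 0.5))     # []
```

The second call leaves no trace at the Python level, which is the same kind of gap that makes a `def sin` entry impossible to produce from frame-based tracing alone.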

The ufunc data is available; see the file you referenced:
 

# usage.dask: 58
# usage.hvplot: 1
# usage.koalas: 5
# usage.matplotlib: 127
# usage.networkx: 5
# usage.orange3: 6
# usage.pandas: 34
# usage.prophet: 2
# usage.scipy: 296
# usage.seaborn: 1
# usage.skimage: 51
# usage.sklearn: 32
# usage.statsmodels: 47
# usage.xarray: 30
sin: numpy.ufunc

Since ufuncs are custom objects, not just functions, we record them as such. If you then look at the same file, you will see a class ufunc which has the overloads for all the calls:

class ufunc:

    # usage.dask: 1
    __module__: ClassVar[object]

    @overload
    def __call__(self, _0: pandas.core.frame.DataFrame, _1: int, /):
        """
        usage.dask: 85
        usage.koalas: 24
        """
        ...

    @overload
    def __call__(self, _0: int, _1: int, /):
        """
        usage.dask: 1
        usage.koalas: 1
        usage.matplotlib: 2
        usage.scipy: 135
        usage.skimage: 1
        usage.sklearn: 3
        usage.statsmodels: 10
        usage.xarray: 4
        """
        ...

So it currently shows the number of times each ufunc is retrieved from the ufunc module (the first set of stats) and also how ufuncs are called in general (the second set of stats).

We could also show the product of these, showing per ufunc instance how it's called.
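A minimal sketch of what that "product" might look like, assuming each recorded call carries the ufunc's name alongside its arguments. The sample calls, `signature_key`, and the tuple layout are all made up for illustration and do not reflect the tool's actual data model.

```python
from collections import Counter


def signature_key(args, kwargs):
    """Summarise a call as positional type names plus sorted keyword names."""
    return (tuple(type(a).__name__ for a in args), tuple(sorted(kwargs)))


# Made-up observations: (ufunc name, positional args, keyword args).
observed_calls = [
    ("sin", (1.0,), {}),
    ("sin", (2,), {"dtype": "float32"}),
    ("add", (1, 2), {}),
    ("add", (3, 4), {}),
]

# Current behaviour as described above: one tally shared by all ufuncs.
merged = Counter(signature_key(a, k) for _, a, k in observed_calls)

# Proposed product: one tally per (ufunc, signature) pair.
per_ufunc = Counter((name, signature_key(a, k)) for name, a, k in observed_calls)

print(merged[(("int", "int"), ())])                   # 2
print(per_ufunc[("sin", (("int",), ("dtype",)))])     # 1
```

With the per-ufunc tally, a question such as "how often is dtype passed to sin" becomes a direct lookup instead of an aggregate over every ufunc.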

Currently, all ufuncs show up as defined in the numpy module, because it's hard to find where they were actually defined (#70).