The package chain_ufunc allows one to create chains of ufuncs, which are executed in order. The idea is to do this at the C level, executing the inner loops in order on suitably sized pieces of the input, so that one avoids allocating possibly large arrays for the intermediate steps. This saves memory and, for large arrays, speeds up execution. There is also a Python version for comparison.
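To make the idea concrete, here is a minimal pure-NumPy sketch of piecewise evaluation for a*b + c. It is only an illustration, not the package's C implementation: the function name chunked_muladd, the chunk size of 4096, and the restriction to 1-D inputs of equal length are all assumptions made for this example.

```python
import numpy as np


def chunked_muladd(a, b, c, chunk=4096):
    """Compute a * b + c piece by piece (hypothetical helper, 1-D inputs only)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    c = np.asarray(c, dtype=np.float64)
    out = np.empty_like(a)
    # One small scratch buffer replaces a full-size intermediate array.
    tmp = np.empty(min(chunk, a.size), dtype=np.float64)
    for start in range(0, a.size, chunk):
        stop = min(start + chunk, a.size)
        t = tmp[:stop - start]
        # First link: multiply into the scratch buffer.
        np.multiply(a[start:stop], b[start:stop], out=t)
        # Second link: add, writing directly into the output slice.
        np.add(t, c[start:stop], out=out[start:stop])
    return out
```

The intermediate product never exists as a full-size array; only the scratch buffer of at most chunk elements is allocated, which is the source of the memory saving described above.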
Note
This is far from complete! For instance, all inputs and outputs of the ufuncs that are combined have to be float64 at present, and there is no documentation beyond the docstrings (and this file).
Example:
    >>> import numpy as np
    >>> from chain_ufunc import create_chained_ufunc
    >>> muladd = create_chained_ufunc([(np.multiply, [0, 1, 3]),
    ...                                (np.add, [2, 3, 3])],
    ...                               3, 1, 0, "muladd")
    >>> muladd([0., 2., 1.], [4., 1., 6.], 0.1)
    array([0.1, 2.1, 6.1])
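The arguments in this example are not explained here, so the following is only one plausible reading, inferred from the example rather than taken from the docstrings: each link pairs a ufunc with indices into a single list of operand slots (inputs first, then outputs, then temporaries), and the trailing 3, 1, 0 gives the numbers of inputs, outputs, and temporaries. Under that assumption, a small pure-Python interpreter (the name run_links is hypothetical) reproduces the result:

```python
import numpy as np


def run_links(links, nin, nout, ntmp, *inputs):
    """Evaluate a list of (ufunc, slot indices) links; assumed semantics only."""
    if len(inputs) != nin:
        raise TypeError(f"expected {nin} inputs, got {len(inputs)}")
    arrays = np.broadcast_arrays(
        *[np.asarray(x, dtype=np.float64) for x in inputs])
    shape = arrays[0].shape
    # Slots: inputs first, then freshly allocated outputs and temporaries.
    slots = list(arrays) + [np.empty(shape) for _ in range(nout + ntmp)]
    for ufunc, idx in links:
        args = [slots[i] for i in idx[:ufunc.nin]]
        outs = tuple(slots[i] for i in idx[ufunc.nin:])
        ufunc(*args, out=outs)
    results = slots[nin:nin + nout]
    return results[0] if nout == 1 else tuple(results)


links = [(np.multiply, [0, 1, 3]), (np.add, [2, 3, 3])]
result = run_links(links, 3, 1, 0, [0., 2., 1.], [4., 1., 6.], 0.1)
```

Read this way, the first link multiplies inputs 0 and 1 into slot 3, and the second adds input 2 to slot 3 in place, so slot 3 ends up holding the output.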
There is preliminary support for generating the ufunc automatically, by passing a special Input instance for each argument of a regular Python function.
Example:
    >>> import numpy as np
    >>> from chain_ufunc import Input
    >>> def fun(a, b, c, d):
    ...     return a*b + c*d
    >>> ufunc = fun(Input(), Input(), Input(), Input())
    >>> ufunc.links
    [(<ufunc 'multiply'>, [0, 1, 4]), (<ufunc 'multiply'>, [2, 3, 5]), (<ufunc 'add'>, [4, 5, 4])]
    >>> ufunc.graph()  # doctest: +SKIP
    <graphviz.graphs.Digraph object at ...>
In a Jupyter notebook with graphviz installed, the graph() call displays an image of the chain.
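How an object passed in place of an array can record a chain may be easier to see in code. The sketch below uses operator overloading to log each operation as a link. The Recorder class and the trace function are hypothetical stand-ins written for this illustration, not the package's Input class, and the slot numbering is simplified: every operation gets a fresh slot, whereas the links shown above reuse slot 4 for the final output.

```python
import numpy as np


class Recorder:
    """Hypothetical tracing placeholder that logs ufunc calls as links."""

    def __init__(self, state, slot):
        self.state = state  # shared dict holding the links and next free slot
        self.slot = slot    # index of the operand this placeholder stands for

    def _binary(self, ufunc, other):
        out = self.state["next"]
        self.state["next"] += 1
        self.state["links"].append((ufunc, [self.slot, other.slot, out]))
        return Recorder(self.state, out)

    def __mul__(self, other):
        return self._binary(np.multiply, other)

    def __add__(self, other):
        return self._binary(np.add, other)


def trace(func, nin):
    """Call func with nin placeholders; return the links and the result slot."""
    state = {"links": [], "next": nin}
    result = func(*[Recorder(state, i) for i in range(nin)])
    return state["links"], result.slot


links, out_slot = trace(lambda a, b, c, d: a*b + c*d, 4)
```

A real implementation would still need to renumber the slots so that outputs come directly after the inputs and temporaries are reused where possible.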
Comparing speed between the regular numpy function and the ufunc:
    >>> a = np.random.normal(size=1000000)
    >>> b = np.random.normal(size=1000000)
    >>> np.all(fun(2., a, b, 10.) == ufunc(2., a, b, 10.))
    True
    >>> %timeit fun(2., a, b, 10.)  # doctest: +SKIP
    5.91 ms ± 33.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    >>> %timeit ufunc(2., a, b, 10.)  # doctest: +SKIP
    1.76 ms ± 3.65 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
This speed-up, roughly a factor of three in this example, is comparable to what one gains with numexpr.