metagraph-dev / metagraph

Multi-target API for graph analytics with Dask

Home Page:https://metagraph.readthedocs.io/en/latest/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Dask resolver should wrap arguments of a concrete type in a placeholder

seibert opened this issue · comments

If we define some algorithms like this:

    @abstract_algorithm('testing.scale')
    def testing_scale(a: Vector, scale: float)->Vector:  #pragma: no cover
        pass

    @concrete_algorithm('testing.scale', compiler="identity")
    def compiled_scale(a: NumpyVectorType, scale: float)->NumpyVectorType:
        return a * scale

And then construct a value like this:

   a = np.arange(100)
    scale_func = res.algos.testing.scale
    z = scale_func(scale_func(scale_func(a, 2.0), 3.0), 4.0)

The resulting dask graph is this:

{('call-8b12ab9c7500e5a04d2200176747e90e', 'testing.scale'): (<metagraph.core.plugin.ConcreteAlgorithm object at 0x7ff6080720d0>,
                                                              ('call-8f1882ff20c40a1140f666ab9450cf58',
                                                               'testing.scale'),
                                                              3.0),
 ('call-8f1882ff20c40a1140f666ab9450cf58', 'testing.scale'): (<metagraph.core.plugin.ConcreteAlgorithm object at 0x7ff6080720d0>,
                                                              array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]),
                                                              2.0),
 ('call-a0001325df28ad22d68449a7ebc15273', 'testing.scale'): (<metagraph.core.plugin.ConcreteAlgorithm object at 0x7ff6080720d0>,
                                                              ('call-8b12ab9c7500e5a04d2200176747e90e',
                                                               'testing.scale'),
                                                              4.0)}

In particular, notice this task:

 ('call-8f1882ff20c40a1140f666ab9450cf58', 'testing.scale'): (<metagraph.core.plugin.ConcreteAlgorithm object at 0x7ff6080720d0>,
                                                              array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]),
                                                              2.0),

has included the NumPy array (NumPyVectorType) in the arguments for the task.

For various reasons, I think we want to actually wrap that array in a placeholder object as a separate task in the graph and then pass that key to the concrete algorithm. In particular, we will eventually want the ability to rewrite the graph to swap out one concrete algo with another, which might necessitate a translation step. It is cleaner if the translatable data (i.e., those values represented by concrete types) have been separated out into their own entries in the task graph. Also, by wrapping the value in a placeholder, we can retain the type metadata about the value, which will also help in some graph rewrite transformations.

I think this should be possible. We just need to check the inputs and convert anything which isn't already a dask object into one.

Would you want to wrap simple types like int, float, or bool or just leave them alone?