dmlc / dlpack

common in-memory tensor structure

Home Page: https://dmlc.github.io/dlpack/latest


Specify Python embedding of DLPack tensors

hawkinsp opened this issue

DLPack only specifies a C++ API, but in practice there's a Python embedding that multiple frameworks support (via Python capsules) that does not seem to be formally specified or standardized.

The protocol seems to be (see the sketch after this list):

  • producers embed a DLManagedTensor in a Python capsule with the name "dltensor".
  • when a consumer consumes a DLManagedTensor, it renames the capsule to "used_dltensor" so the same capsule cannot be consumed twice.
  • different frameworks seem to act differently as to how a consumer should treat the capsule destructor. MXNet seems to remove the capsule destructor, but PyTorch seems to leave it alone (I may have misread the code in either of these two cases). It would be good to clarify what the correct behavior is. For JAX, I chose to remove the capsule destructor.
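
To make the exchange concrete, here is a minimal, non-normative sketch of both sides in C using the Python capsule API. The helpers make_dlpack_tensor and dlpack_capsule_deleter are assumptions for illustration (one candidate deleter implementation is shown later in this thread); error handling is omitted.

#include <Python.h>
#include <dlpack/dlpack.h>

/* Hypothetical helpers, assumed for illustration only. */
DLManagedTensor *make_dlpack_tensor(void);   /* fills dl_tensor, manager_ctx and deleter */
void dlpack_capsule_deleter(PyObject *self); /* e.g. the implementation shown later in this thread */

/* Producer: wrap a DLManagedTensor in a capsule named "dltensor". */
static PyObject *export_tensor(void) {
    DLManagedTensor *managed = make_dlpack_tensor();
    return PyCapsule_New(managed, "dltensor", dlpack_capsule_deleter);
}

/* Consumer: take ownership of the DLManagedTensor and rename the capsule so it
 * cannot be consumed a second time. */
static DLManagedTensor *import_tensor(PyObject *capsule) {
    DLManagedTensor *managed =
        (DLManagedTensor *)PyCapsule_GetPointer(capsule, "dltensor");
    if (managed == NULL) {
        return NULL;  /* not a "dltensor" capsule, or already consumed */
    }
    PyCapsule_SetName(capsule, "used_dltensor");
    /* The consumer now owns `managed` and must eventually call
     * managed->deleter(managed) (if non-NULL) once it is done with the data. */
    return managed;
}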

Thanks @hawkinsp! The protocol is correct. In terms of removal of the destructor: PyTorch's destructor checks the used_dltensor flag before destruction, so the effect is the same as destructor removal.

A PR is more than welcome.

There's still an issue I think. Consider the following case:

  • Hypothetical framework F provides a capsule with a destructor that does not check the capsule name.
  • PyTorch consumes the DLPack capsule, setting the capsule name to used_dltensor but leaving the capsule destructor alone.
  • You have a potential double free: the tensor may be destroyed once when F's capsule destructor is called, and once when PyTorch destroys the tensor via the deleter callback in DLManagedTensor.

So you need at least one of the following rules:
a) frameworks must remove the capsule destructor when consuming a DLPack, or
b) frameworks must check the capsule name matches "dltensor" in their capsule destructors, and otherwise do nothing.

It would probably be good to specify which.
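
For reference, consuming under rule (a) is the same as the consumer sketch above except that the producer's capsule destructor is explicitly detached; a minimal, non-normative sketch:

#include <Python.h>
#include <dlpack/dlpack.h>

/* Rule (a): after taking ownership of the DLManagedTensor, remove the capsule
 * destructor so it can never trigger a second free, then mark the capsule as
 * consumed. */
static DLManagedTensor *consume_removing_destructor(PyObject *capsule) {
    DLManagedTensor *managed =
        (DLManagedTensor *)PyCapsule_GetPointer(capsule, "dltensor");
    if (managed == NULL) {
        return NULL;
    }
    PyCapsule_SetName(capsule, "used_dltensor");
    PyCapsule_SetDestructor(capsule, NULL);  /* the consumer now owns `managed` */
    return managed;
}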

You are right. I think it is probably a good idea to specify the deletion of the destructor. To keep the behavior compatible with the PyTorch one, we can recommend that the destructor check the name as well.

Also, it would be good to have a somewhat "recommended" complete ctypes wrapper in the repo. Currently it exists only partly in the NumPy example.

I made my own in https://github.com/vadimkantorov/pydlpack/blob/master/dlpack.py

The linked page does not specify anything about what the capsule destructor should do. Would it make sense to specify an exact implementation?

#include <assert.h>
#include <Python.h>
#include <dlpack/dlpack.h>

static void dlpack_capsule_deleter(PyObject *self){
    if (PyCapsule_IsValid(self, "used_dltensor")) {
        return;
    }

    /* an exception may be in-flight, we must save it in case we create another one */
    PyObject *type, *value, *traceback;
    PyErr_Fetch(&type, &value, &traceback);

    DLManagedTensor *managed = (DLManagedTensor *)PyCapsule_GetPointer(self, "dltensor");
    if (managed == NULL) {
        PyErr_WriteUnraisable(self);
        goto done;
    }
    /* the spec says the deleter can be NULL if there is no way for the caller to provide a reasonable destructor. */
    if (managed->deleter) {
        managed->deleter(managed);
        /* TODO: is the deleter allowed to set a python exception? */
        assert(!PyErr_Occurred());
    }

done:
    PyErr_Restore(type, value, traceback);
}

From what I can tell, any other implementation would be one of incorrect / unsafe / equivalent.

The linked page does not specify anything about what the capsule destructor should do. Would it make sense to specify an exact implementation?

static void dlpack_capsule_deleter(PyObject *self){
    if (PyCapsule_IsValid(self, "used_dltensor")) {
        return;
    }
    DLManagedTensor *managed = (DLManagedTensor *)PyCapsule_GetPointer(self, "dltensor");
    if (managed == NULL) {
        return NULL;
    }
    managed->deleter(managed);
}

From what I can tell, any other implementation would be one of incorrect / unsafe / equivalent.

Small correction, the return NULL; line should be simply return;, since it's a void function.

Also, it would be good not to call the deleter if it's NULL. This can happen for 0-dim tensors. Related: pytorch/pytorch#43166

Is a null deleter allowed by the specification? Where even is the specification?

PyCapsules do not allow a NULL as a data pointer, so that cannot possibly be correct. I would assume 0-dim tensors to have all related data in the DLManagedTensor.

Anyway, depending on the data/memory ownership scheme, the deleter may be NULL sometimes. If it can't be NULL, there should be an assert and a nice crash. Otherwise, I hit an access violation.

PyCapsules do not allow a NULL as a data pointer, so that cannot possibly be correct. I would assume 0-dim tensors to have all related data in the DLManagedTensor.

@vadimkantorov Even if it's a 0-dim array, DLManagedTensor can't possibly be NULL. The proper place to insert a NULL is DLTensor.data, so I can't think of a case where you don't wanna deallocate the memory for DLManagedTensor.

I'm talking about the deleter field being null, not the data field. For me it happened in practice in pytorch/pytorch#43166 while implementing DLPack interop between libtorch and .NET.

I think there's some confusion here about managed->deleter vs managed. The latter obviously cannot be null, and I think @vadimkantorov is talking about the former, which seems unspecified.

OK thanks for clarifying, I misread it.

Though I still think what @hameerabbasi and I said did not contradict each other: you have to have filled in something in managed, so is it really OK to set managed->deleter to NULL?

Even if it's not okay, the reference implementation had better have an assert there, so there's a nice crash rather than an access violation at destruction time (which is super hard to debug).

It may be that the tensor owner will do all the deletion, and the tensor consumer doesn't need to do any cleanup.

The only way that can work is if the tensor owner outlives the consumer.

Even if there are tensors which do not need deleters, dlpack needs to choose between:

  • deleter may be null. The consumer is required to check before calling it
  • deleter may not be null. The producer is responsible for populating it with a no-op deleter if there is nothing to delete.

@vadimkantorov's concern is valid -- the header says the deleter can be NULL:

/*! \brief Destructor signature void (*)(void*) - this should be called
 * to destruct manager_ctx which holds the DLManagedTensor. It can be NULL
 * if there is no way for the caller to provide a reasonable destructor.
 * The destructor deletes the argument self as well.
 */

Though as a workaround, maybe there's a way to make the deleter a no-op for such a case?
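
One possible shape of that workaround, sketched purely for illustration (the name noop_deleter is not from any framework):

#include <dlpack/dlpack.h>

/* A no-op deleter for tensors whose memory is owned elsewhere and is
 * guaranteed to outlive every consumer. A producer could set
 * managed->deleter = noop_deleter; instead of leaving it NULL, so that
 * consumers may call the deleter unconditionally. */
static void noop_deleter(DLManagedTensor *self) {
    (void)self;  /* nothing to free */
}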

Yep, that's what I propose as well: if the deleter is NULL, just don't call it, or assert that it is not NULL with a good error message.

if there is no way for the caller to provide a reasonable destructor.

What does that mean? That the destructor is unnecessary because the object is immortal, or that a destructor cannot be provided (i.e. the exporter doesn't know about the ownership)?

If it is the latter, we have no memory ownership model! And then Python should probably refuse the DLManagedTensor by default, since such a tensor can only be used safely if the user is aware of the ownership model.

If NULL really means that a destructor cannot be provided, then a no-op deleter (which is reasonable) would indicate that the tensor is effectively immortal, rather than that it has unknown ownership/lifespan.

EDIT: OK, maybe it doesn't matter: an object without an ownership model could just be very explicit about that (should be rare enough); it's the user's (exporter's) problem, not the consumer's. I might have tended towards Eric's second option (ask the exporter to set a no-op deleter), but as NULL is currently allowed, the first option seems perfectly good.

For me either is fine, but if a deleter is expected, please standardize some assert or a good error message :) or at the very least an explicit comment about this in the dlpack.h header.

I've edited my code above with two corrections:

  • The test for deleter == NULL: the specification is clear; I just hadn't realized that the spec was the comments in the header file
  • Correctly handling in-flight exceptions

Is the deleter permitted to fail (and for example, call PyErr_SetString)? If so, the assert in my snippet should be replaced with proper handling.
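
If the answer turned out to be yes, the "proper handling" could look roughly like this sketch, which reports the error as unraisable instead of letting it escape the capsule destructor (the helper name call_deleter_checked is purely illustrative):

#include <Python.h>
#include <dlpack/dlpack.h>

/* Illustrative replacement for the assert above: if the deleter set a Python
 * error, report it as unraisable rather than propagating it out of the
 * capsule destructor. */
static void call_deleter_checked(DLManagedTensor *managed, PyObject *capsule) {
    if (managed->deleter) {
        managed->deleter(managed);
        if (PyErr_Occurred()) {
            PyErr_WriteUnraisable(capsule);  /* prints and clears the error */
        }
    }
}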

Let's continue the discussion on deleter (if still needed) to #74.

@tqchen I think very simple pure Python binding code, as found in https://github.com/vadimkantorov/pydlpack, is still useful for simple usage / usage without NumPy. I propose including some bindings like this in the dlpack repo (e.g. as sample code).

https://data-apis.org/array-api/latest/design_topics/data_interchange.html doesn't seem to address this


@tqchen I think very simple pure Python binding code, as found in https://github.com/vadimkantorov/pydlpack, is still useful for simple usage / usage without NumPy. I propose including some bindings like this in the dlpack repo (e.g. as sample code).

https://data-apis.org/array-api/latest/design_topics/data_interchange.html doesn't seem to address this

I agree 💯 with this comment. I want to pass a dlpack structure directly between native and Python code for orchestration purposes; the current approach is geared only towards conversions.

A very simple ctypes wrapper can easily be generated with ctypesgen, so there's no manual maintenance involved! See here for a simple example.