[RFC] Rename kDLGPU to kDLCUDA, and kDLCPUPinned to kDLCUDAHost

Question

[RFC] Rename kDLGPU to kDLCUDA, and kDLCPUPinned to kDLCUDAHost

YuchenJin opened this issue 3 years ago · comments

This RFC proposes to rename kDLGPU to kDLCUDA, and kDLCPUPinned to kDLCUDAHost. Two main reasons for this renaming:

There are now more kinds of GPUs, e.g., AMD Radeon GPUs and Qualcomm Adreno GPUs.
Mainstream frameworks like PyTorch clearly indicate CUDA device, e.g., PyTorch uses torch.cuda to support CUDA tensor types, so this renaming will make it more consistent with the other frameworks.

Look forward to hearing your thoughts!

Tianqi Chen · Answer 1 · Sat May 08 2021 04:05:28 GMT+0800 (China Standard Time)

cc @kkraus14 @leofang @prigoyal @zdevito @rgommers @oleksandr-pavlyk

Oleksandr Pavlyk · Answer 2 · Sat May 08 2021 04:15:03 GMT+0800 (China Standard Time)

I think it would be feasible to add kDLGPU as a synonym to kDLCUDA for some time should any compatibility concerns arise.
I am in favor of renaming otherwise.

Renaming will likely bump DLPACK_VERSION, and with adoption of DLPack in Python array API, how would a Python user (consumer library author) know what DLPack version the producer has been using?

Additionally, is there any guidelines about extended DLDeviceType enum? Are kDLExtDev allowed to be moved, i.e. if I wanted to add kDLSycl_LevelZero_GPU and bunch of other values to the enum, would I do that before the guard, or after?

Tianqi Chen · Answer 3 · Sat May 08 2021 04:19:14 GMT+0800 (China Standard Time)

All of the changes so far(including this one) are ABI compatible, which means a system uses an earlier version of DLPack can be imported in a later version(because the number has not change, we are just rename the field). For the same reason, we might want to avoid change the number (e.g. DLExtDev) but instead introduce new numbers.

ABI breaking changes shall be considered more seriously, and so far we have not yet encountered them

Keith Kraus · Answer 4 · Sat May 08 2021 04:29:45 GMT+0800 (China Standard Time)

While this isn't an ABI breaking change it will break the build of someone using kDLGPU and kDLCPUPinned.

Tianqi Chen · Answer 5 · Sat May 08 2021 04:33:18 GMT+0800 (China Standard Time)

We would assume the consumer of dlpack maintain a copy(or submodule hash) of a particular version. So the framework can choose to upgrade the DLPack while updating the usage of kDLGPU to kDLCUDA. Perhaps we can keeping kDLGPU, kDLCPUPinned for another release cycle as alias would resolve the source compatibility issue

Leo Fang · Answer 6 · Sat May 08 2021 05:07:34 GMT+0800 (China Standard Time)

I took a quick look at PyTorch and I think this line should be fixed, but it's likely irrelevant of this RFC (not sure):
https://github.com/pytorch/pytorch/blob/4cb534f92ef6f5b2ec99109b0329f93a859ae831/caffe2/python/pybind_state_dlpack.cc#L11
cc: @emcastillo @asi1024 @kmaehashi for vis

By the way, can we please add a new entry kDLCUDAManaged following this renaming (which I am supportive for clarity if no backward incompatibility happens)? It's very useful for a project I am about to start prototyping. Moreover, managed memory should not be viewed as a CPU-only or GPU-only entity, but something distinct. I am surprised it hasn't been proposed yet.

Leo Fang · Answer 7 · Sat May 08 2021 05:14:31 GMT+0800 (China Standard Time)

following this renaming (which I am supportive for clarity if no backward incompatibility happens)

On a second thought, kDLCPUPinned has its own value, as both CUDA and ROCm provide pinned memory. If we are renaming this entry, I'd prefer to rename kDLCPUPinned to kDLCUDAPinned and add kDLROCMPinned (following kDLROCM).

EDIT: fix uppercase names.

Ralf Gommers · Answer 8 · Sat May 08 2021 18:29:31 GMT+0800 (China Standard Time)

I think renaming is much better than using an alias. That just drags out the work rather than reduce it, unless the intent is to keep an alias forever (which seems not desirable). And while C enum's can have duplicate values, it is normally expected for them to unique.

It seems to me like there is no real source compatibility issue, people just need to rename if they upgrade their vendored copy of dlpack.h.

Recently DLContext was renamed to DLDevice, that didn't give any issues AFAIK.

Tianqi Chen · Answer 9 · Sat May 08 2021 21:09:08 GMT+0800 (China Standard Time)

To follow on @leofang 's comment, kDLCUDACPUPinned might be less ambiguous, mainly because the memory is still allocated on the host, rather on the device. If we want a shorter name, kDLCUDAHost might be fine since host usually refers to the host memory

Leo Fang · Answer 10 · Sun May 09 2021 12:16:06 GMT+0800 (China Standard Time)

If we want a shorter name, kDLCUDAHost might be fine since host usually refers to the host memory

I think this is fine since pinned memory is allocated via cudaMallocHost (and likewise in ROCm/HIP).

So, let me recap what has been discussed so far:

kDLGPU -> kDLCUDA (rename)
kDLCPUPinned -> kDLCUDAHost (rename)
kDLCUDAManaged (new)
kDLROCMHost (new)

For now we do not add kDLROCMManaged because there's no managed memory in ROCm/HIP, but it can be easily added once it's provided and needed.

EDIT: fix uppercase names.

Tianqi Chen · Answer 11 · Mon May 10 2021 01:10:15 GMT+0800 (China Standard Time)

The managed memory could go to a second RFC as it would be more involved than renaming

Yuchen Jin · Answer 12 · Tue May 11 2021 08:52:16 GMT+0800 (China Standard Time)

Thank you all for the discussion! I will create a PR to do the renaming only for now.

Tianqi Chen · Answer 13 · Tue May 11 2021 22:03:04 GMT+0800 (China Standard Time)

Implemented by #68