[CUDA] Optimize GPU memory allocation/deallocation in CUDA Operators such as neighbor sampling.
baoleai opened this issue
🚀 The feature, motivation and pitch
GLT's CUDA operators, such as GPU neighbor sampling, contain numerous cudaMalloc/cudaFree calls, which can hurt performance. One potential solution is to implement a GPU memory pool that manages allocation and deallocation, instead of calling cudaMalloc(Async)/cudaFree(Async) directly.
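For illustration, this is the kind of per-call pattern the issue is describing (the function name and parameters are hypothetical, not GLT's actual code):

```cpp
#include <cuda_runtime.h>

// Hypothetical sampling entry point: every call pays for a device
// allocation and a free, and cudaFree implicitly synchronizes the device.
void sample_once(const int64_t* seeds, int64_t num_seeds, int64_t fanout) {
  void* out = nullptr;
  size_t nbytes = num_seeds * fanout * sizeof(int64_t);
  cudaMalloc(&out, nbytes);  // driver call on every invocation
  // ... launch the neighbor-sampling kernel writing into `out` ...
  cudaFree(out);             // waits for outstanding work before freeing
}
```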
Alternatives
No response
Additional context
No response
We can use PyTorch's memory management interface directly; see https://github.com/pytorch/pytorch/blob/main/c10/cuda/CUDACachingAllocator.h
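A minimal sketch of what that could look like, assuming the operators can depend on c10: `raw_alloc`/`raw_delete` route the request through PyTorch's caching pool, so freed blocks are reused rather than returned to the driver. The function name and parameters below are hypothetical.

```cpp
#include <c10/cuda/CUDACachingAllocator.h>

// Hypothetical sampling entry point: scratch memory now comes from
// PyTorch's caching allocator instead of cudaMalloc/cudaFree.
void sample_neighbors(const int64_t* seeds, int64_t num_seeds, int64_t fanout) {
  size_t nbytes = num_seeds * fanout * sizeof(int64_t);

  // Served from the cache when a suitable block exists;
  // cudaMalloc is only hit on a cache miss.
  void* out = c10::cuda::CUDACachingAllocator::raw_alloc(nbytes);

  // ... launch the neighbor-sampling kernel writing into `out` ...

  // Returns the block to the cache for reuse; no cudaFree,
  // and therefore no implicit device synchronization.
  c10::cuda::CUDACachingAllocator::raw_delete(out);
}
```

Another option with the same effect is to allocate the scratch buffer as a CUDA tensor (e.g. via `torch::empty`), which goes through the same caching allocator and handles the lifetime automatically.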