alibaba / graphlearn-for-pytorch

A GPU-accelerated graph learning library for PyTorch, facilitating the scaling of GNN training and inference.

[CUDA] Optimize GPU memory allocation/deallocation in CUDA Operators such as neighbor sampling. 

baoleai opened this issue · comments

🚀 The feature, motivation and pitch

GLT's CUDA operators, such as GPU neighbor sampling, contain numerous cudaFree calls, which may negatively impact performance. One potential solution is to implement a GPU memory pool that manages allocation and deallocation instead of calling cudaMalloc(Async)/cudaFree(Async) directly.
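To illustrate the idea, here is a minimal host-side sketch of such a caching pool (not GLT's actual implementation): freed blocks are parked in per-size free lists and reused by later allocations of the same size, so the expensive underlying allocator is hit far less often. `std::malloc`/`std::free` stand in for `cudaMallocAsync`/`cudaFreeAsync`; the class name and counters are hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of a size-bucketed caching pool. A real GPU version
// would also track CUDA streams and handle fragmentation.
class CachingPool {
 public:
  void* allocate(std::size_t nbytes) {
    auto& bucket = free_lists_[nbytes];
    if (!bucket.empty()) {
      // Reuse a cached block: no call into the underlying allocator.
      void* p = bucket.back();
      bucket.pop_back();
      return p;
    }
    void* p = std::malloc(nbytes);  // cudaMallocAsync(...) in the GPU version
    sizes_[p] = nbytes;
    ++raw_allocs_;
    return p;
  }

  void deallocate(void* p) {
    // Return the block to its size bucket instead of freeing it.
    free_lists_[sizes_.at(p)].push_back(p);
  }

  std::size_t raw_allocs() const { return raw_allocs_; }

  ~CachingPool() {
    // Release everything back to the underlying allocator on teardown.
    for (auto& [size, bucket] : free_lists_)
      for (void* p : bucket) std::free(p);  // cudaFreeAsync(...) on the GPU
  }

 private:
  std::unordered_map<std::size_t, std::vector<void*>> free_lists_;
  std::unordered_map<void*, std::size_t> sizes_;
  std::size_t raw_allocs_ = 0;
};
```

With repeated same-size temporary buffers, which is the common pattern in per-batch sampling kernels, every allocation after the first round is served from the cache.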

Alternatives

No response

Additional context

No response

We can use PyTorch's memory management interface directly; see https://github.com/pytorch/pytorch/blob/main/c10/cuda/CUDACachingAllocator.h
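A hedged sketch of what that could look like in an operator: route temporary buffers through `c10::cuda::CUDACachingAllocator::raw_alloc`/`raw_delete` (which do exist in the header linked above) rather than raw cudaMalloc/cudaFree. The `GLT_USE_TORCH_ALLOCATOR` guard and the `alloc_temp`/`free_temp` wrappers are hypothetical names for illustration; the fallback path lets the file compile without libtorch.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical wrappers for an operator's temporary device buffers.
#ifdef GLT_USE_TORCH_ALLOCATOR
#include <c10/cuda/CUDACachingAllocator.h>
// PyTorch's caching allocator: cheap after warm-up, shares its pool with
// tensor allocations, and respects torch.cuda.empty_cache().
void* alloc_temp(std::size_t nbytes) {
  return c10::cuda::CUDACachingAllocator::raw_alloc(nbytes);
}
void free_temp(void* p) {
  c10::cuda::CUDACachingAllocator::raw_delete(p);
}
#else
// Host-side stand-ins so this sketch is self-contained without libtorch.
void* alloc_temp(std::size_t nbytes) { return std::malloc(nbytes); }
void free_temp(void* p) { std::free(p); }
#endif
```

One advantage of reusing PyTorch's allocator over a separate GLT-owned pool is that sampling buffers and tensor storage would draw from the same cache, avoiding two pools competing for device memory.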