NVIDIA / warp

We noticed that GPU memory usage increases when repeatedly creating (and destroying) warp.Mesh objects.

Minimal Example:

import warp as wp
import pynvml       # pip install pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
wp.init()

device = "cuda:0"
points = wp.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=wp.vec3, device=device)
indices = wp.array([0, 1, 2, 0, 1, 2, 0, 1, 2], dtype=wp.int32, device=device)

for i in range(10_000_000):
    if i % 100_000 == 0:
        gpu_ram_usage = pynvml.nvmlDeviceGetMemoryInfo(handle).used / 1024 ** 2
        print(f"iter = {i:8d}, VRAM usage = {gpu_ram_usage:.0f} MiB")
    mesh = wp.Mesh(points, indices)

Output:

   CUDA Toolkit 12.3, Driver 12.3
   Devices:
     "cpu"      : "x86_64"
     "cuda:0"   : "NVIDIA GeForce RTX 2080 SUPER" (8 GiB, sm_75, mempool enabled)
[...]
iter =     0k, VRAM usage = 521 MiB
iter =   100k, VRAM usage = 565 MiB
iter =   200k, VRAM usage = 629 MiB
[...]
iter =  1900k, VRAM usage = 1429 MiB

The GPU memory usage increases steadily, even though each created Mesh is destroyed immediately.
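
The growth can also be checked programmatically instead of being read off the printed log. The following is a minimal sketch, assuming a hypothetical helper `measure_growth` that is not part of Warp or pynvml. It accepts any zero-argument callable that returns the number of bytes currently in use, so it works with `pynvml.nvmlDeviceGetMemoryInfo(handle).used` or with a stand-in sampler.

```python
from typing import Callable


def measure_growth(
    create: Callable[[], object],
    read_used_bytes: Callable[[], int],
    iterations: int,
    warmup: int = 10,
) -> float:
    """Return the average number of bytes retained per create/destroy cycle.

    `create` builds one object; dropping the reference destroys it again.
    `read_used_bytes` is any sampler of used memory (hypothetical hook).
    """
    # Warm up first so one-time allocations (caches, pools) are not counted.
    for _ in range(warmup):
        obj = create()
        del obj

    before = read_used_bytes()
    for _ in range(iterations):
        obj = create()
        del obj  # drop the only reference so the object can be destroyed
    after = read_used_bytes()

    return (after - before) / iterations
```

With the reproduction script, this might be called as `measure_growth(lambda: wp.Mesh(points, indices), lambda: pynvml.nvmlDeviceGetMemoryInfo(handle).used, 100_000)`. NVML reports device-wide usage and the device here has a memory pool enabled, so small values should be treated as noise; a result that stays clearly above zero across runs suggests a leak.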

This has been tested on the latest main commit (ebcc90d). As far as we can tell, there is no host memory leak when using device = "cpu".

After an initial investigation, the problem seems to be the following:

When creating a mesh (in mesh_create_device), a BVH is created as follows:

warp/warp/native/mesh.cu

Lines 211 to 212 in ebcc90d

    
uint64_t bvh_id = bvh_create_device(mesh.context, mesh.lowers, mesh.uppers, num_tris);
wp::bvh_get_descriptor(bvh_id, mesh.bvh);

When destroying the mesh again (in mesh_destroy_device), the BVH is destroyed as follows:

warp/warp/native/mesh.cu

Line 241 in ebcc90d

wp::bvh_destroy_device(mesh.bvh);

During creation, the following memory block is allocated on the device (in bvh_create_device):

warp/warp/native/bvh.cu

Line 504 in ebcc90d

    
wp::BVH* bvh_device = (wp::BVH*)alloc_device(WP_CURRENT_CONTEXT, sizeof(wp::BVH));

This allocation does not have a corresponding free_device() call and is thus leaked.
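
The pairing problem can be illustrated with a toy model in Python. This is only a sketch of the bookkeeping, not Warp's implementation: `TrackingAllocator`, `create_bvh`, `destroy_bvh`, and `leaked_bytes` are hypothetical names, and all block sizes are made up. It mimics a create function that makes two allocations (a data buffer and a descriptor block) and a destroy function that releases only the first unless the missing free is added.

```python
class TrackingAllocator:
    """Stand-in for a device allocator that records live blocks."""

    def __init__(self):
        self._next_ptr = 1
        self.live = {}  # fake pointer -> size in bytes

    def alloc(self, size):
        ptr = self._next_ptr
        self._next_ptr += 1
        self.live[ptr] = size
        return ptr

    def free(self, ptr):
        del self.live[ptr]

    def used(self):
        return sum(self.live.values())


DESCRIPTOR_SIZE = 64  # made-up size standing in for sizeof(wp::BVH)


def create_bvh(allocator, num_items):
    nodes = allocator.alloc(32 * num_items)        # made-up node storage
    descriptor = allocator.alloc(DESCRIPTOR_SIZE)  # analogous to bvh_device
    return {"nodes": nodes, "descriptor": descriptor}


def destroy_bvh(allocator, bvh, free_descriptor):
    allocator.free(bvh["nodes"])
    if free_descriptor:
        # The counterpart to the descriptor allocation that is missing
        # in the leaking variant.
        allocator.free(bvh["descriptor"])


def leaked_bytes(cycles, free_descriptor):
    """Run create/destroy cycles and return the bytes still allocated."""
    allocator = TrackingAllocator()
    for _ in range(cycles):
        bvh = create_bvh(allocator, 3)
        destroy_bvh(allocator, bvh, free_descriptor)
    return allocator.used()
```

In this model, `leaked_bytes(1000, free_descriptor=False)` retains one descriptor block per cycle, while `leaked_bytes(1000, free_descriptor=True)` returns to zero. The same pattern would explain a slow, linear growth in device memory when many meshes are created and destroyed.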

I am not well-versed enough with this code base to propose a nice fix. However, here is a "hacky" patch that resolves the problem: https://gist.github.com/MaxWipfli/3197354809752d377dd90bbd108e1992

Thanks @MaxWipfli, nice catch! Your fix is on the right track. I'll take a closer look and we'll get this leak patched up asap.

Fix is now in main


GPU memory leaked when destructing warp.Mesh