lxxue / FRNN

Fixed Radius Nearest Neighbor Search on GPU

CUDA OOM (potential enhancement)

ShengyuH opened this issue

Hi,

Thanks for providing this awesome toolbox. I encountered an OOM error when using it. Here is a minimal example:
Data: https://drive.google.com/file/d/1X_8xmTvGwzv8FxXif2VaqtT_oO07_WwT/view?usp=sharing
Code snippet:

import frnn
import torch

if __name__ == '__main__':
    device = torch.device('cuda')
    # (1, P, 3) point cloud, queried against itself
    points = torch.load('dump/pts.pth')[None, :, :].to(device).float()
    n_points = torch.tensor([points.size(1)]).to(device).long()
    K = 10
    radius = 0.05
    print(points.size(), n_points)
    _, idxs, _, _ = frnn.frnn_grid_points(points, points, n_points, n_points, K, radius, grid=None, return_nn=False, return_sorted=False)

Error message:

$ python minimal_sample.py 
torch.Size([1, 156424, 3]) tensor([156424], device='cuda:0')
Traceback (most recent call last):
  File "minimal_sample.py", line 11, in <module>
    _, idxs, _,_ = frnn.frnn_grid_points(points, points, n_points, n_points, K, radius, grid=None, return_nn=False, return_sorted=False)
  File "/scratch2/shengyu/spv/lib/python3.8/site-packages/frnn-0.0.0-py3.8-linux-x86_64.egg/frnn/frnn.py", line 331, in frnn_grid_points
    idxs, dists, sorted_points2, pc2_grid_off, sorted_points2_idxs, grid_params_cuda = _frnn_grid_points.apply(
  File "/scratch2/shengyu/spv/lib/python3.8/site-packages/frnn-0.0.0-py3.8-linux-x86_64.egg/frnn/frnn.py", line 137, in forward
    pc1_grid_cnt = torch.zeros((N, G),
RuntimeError: CUDA out of memory. Tried to allocate 7.51 GiB (GPU 0; 23.70 GiB total capacity; 15.03 GiB already allocated; 7.00 GiB free; 15.04 GiB reserved in total by PyTorch)

Is this because you are using a dense grid? I think that with a sparse hash I could still handle point clouds of this size. I'd really appreciate it if you could provide some hints.

Best,
Shengyu

Hi Shengyu,

Thanks for the example to reproduce the error. The problem is that when the bounding box of the point cloud is much larger than the search radius, I fall back to a default maximum grid resolution (see the variable grid_max_res in Python and GRID_3D_MAX_RES / GRID_2D_MAX_RES in C) to avoid allocating a huge grid. The default value was 128, which is too large for most GPUs. I have now set it to 64, and it works on my GPU with 8 GB of memory.
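As a rough sanity check on those numbers, here is a sketch of the dense-grid memory model, assuming one 4-byte counter per cell as in the pc1_grid_cnt allocation from the traceback above; under that assumption, the failed 7.51 GiB allocation corresponds to roughly 1260 cells per axis:

# Dense grid: one counter per cell, res**3 cells in 3D
# (assumption: 4 bytes per cell, as in pc1_grid_cnt above).
def grid_counter_gib(res_per_axis, bytes_per_cell=4):
    return res_per_axis ** 3 * bytes_per_cell / 2 ** 30

print(grid_counter_gib(1260))  # ~7.45 GiB, about the failed allocation above
print(grid_counter_gib(64))    # ~0.001 GiB under the new cap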

As for the sparse grid suggestion, I feel like I would have to change the code completely, and I am not sure the overhead would be marginal. I will look into that repo later. Thanks for the reference!
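For reference, a minimal sketch of the sparse-hash idea (illustrative only, not frnn's API, and on CPU for brevity): memory scales with the number of occupied cells rather than with the grid resolution cubed.

from collections import defaultdict
import torch

def build_sparse_grid(points, cell_size):
    # points: (P, 3) float tensor; map each point to its integer cell coordinate
    cells = torch.floor(points / cell_size).long()
    grid = defaultdict(list)
    for i, cell in enumerate(map(tuple, cells.tolist())):
        grid[cell].append(i)
    # only occupied cells are stored, so empty space costs nothing
    return grid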

Best,
Lixin

I still have OOM issues. I am using K=50, D=3, radius=2. I intentionally set radius_cell_ratio to a very small number (0.001) so that the cell size is big. The scene boundary of the point cloud is (-80.000000, -3.000000, -80.000000) to (80.000000, 25.000000, 80.000000). The error happens at:

File "lib/python3.6/site-packages/frnn-0.0.0-py3.6-linux-x86_64.egg/frnn/frnn.py", line 368, in frnn_grid_points
    return_sorted, radius_cell_ratio)
File "lib/python3.6/site-packages/frnn-0.0.0-py3.6-linux-x86_64.egg/frnn/frnn.py", line 210, in forward
    K, r, r * r)

What's weird is that no matter what value I set radius_cell_ratio to, it always seems to try to allocate the same amount of device memory.

Could you give me a minimal example and data to reproduce the error?

This point cloud file (https://drive.google.com/file/d/1IJL0va_l2QTLB4qb_HCvgpbPhp14OKjX/view?usp=sharing) has 6,222,091 points.

The search code is:
frnn.frnn_grid_points(pc, pc, lengths1=None, lengths2=None, K=100, r=2, radius_cell_ratio=2)

The error is:
RuntimeError: CUDA out of memory. Tried to allocate 2.32 GiB (GPU 1; 7.80 GiB total capacity; 5.00 GiB already allocated; 1.87 GiB free; 5.01 GiB reserved in total by PyTorch)

I did the calculation. The return results require 6222091 * 100 * 4 bytes ≈ 2.32 GB. So it seems like other data structures have taken up so much device memory that we couldn't allocate the return buffer.
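A quick check of that arithmetic (assuming 4-byte int32 indices):

# Size of the (P, K) index buffer returned by the search
n_points, K, bytes_per_idx = 6222091, 100, 4
print(n_points * K * bytes_per_idx / 2 ** 30)  # ~2.32 GiB, matching the error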

Since we store sorted versions of both point clouds, we actually have three point clouds of 2.32 GB each (the input pc, sorted pc1, and sorted pc2), which takes up about 7 GB of memory, so an OOM error is to be expected for this point cloud. I will later add support for pc1 and pc2 being the same point cloud; in that case we would only need two point clouds of 2.32 GB.

Thanks for the example. It also helped me find another two small bugs.