WXinlong / SOLO

SOLO and SOLOv2 for instance segmentation, ECCV 2020 & NeurIPS 2020.

RuntimeError: CUDA out of memory occurred when testing

zhuaiyi opened this issue

Command

python tools/test_ins.py configs/solov2/solov2_light_448_r34_fpn_8gpu_3x.py work_dirs/solov2_light_release_r34_fpn_8gpu_3x/epoch_36.pth --show --out results_solo.pkl --eval segm

Bug
[>>>>>>>>>>>>> ] 20/76, 0.3 task/s, elapsed: 59s, ETA: 165s
Traceback (most recent call last):
...
RuntimeError: CUDA out of memory. Tried to allocate 3.30 GiB (GPU 0; 8.00 GiB total capacity; 973.14 MiB already allocated; 2.13 GiB free; 3.74 GiB reserved in total by PyTorch)
Then I shrank my test set to 14 images; the same error occurred at [>> ] 2/14.

Environment
python 3.7
CUDA 11.1
PyTorch 1.7.0+cu110

Supplement
The epoch_36.pth file was generated by training on my own dataset. It performs pretty well when tested on single images with inference_demo.py, but fails with this batch-test command.

@zhuaiyi You can reduce the number of objects kept in post-processing, e.g., set a smaller MODEL.SOLOV2.NMS_PRE. Or move the sort-and-select step ahead, right after the model prediction: for example, move lines 440-448 of solov2.py up to line 410, with minimal modifications; make sure you adjust the variable names and don't miss any variables.
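
For anyone hitting the same OOM, here is a minimal sketch of that second suggestion: keep only the top nms_pre candidates as soon as category scores are available, before the mask branch materialises full-resolution masks. The names cate_scores, cate_labels, kernel_preds and the default of 500 are assumptions for illustration, not the repo's exact code.

```python
# Minimal sketch: cap the candidate set to nms_pre before mask decoding.
# Variable names and the default value are assumptions, not the repo's code.
import torch

def keep_top_candidates(cate_scores: torch.Tensor,
                        cate_labels: torch.Tensor,
                        kernel_preds: torch.Tensor,
                        nms_pre: int = 500):
    """Keep only the nms_pre highest-scoring candidates.

    Doing this before the dynamic-convolution / mask step bounds how many
    full-resolution masks are ever materialised, which is where the
    test-time OOM comes from.
    """
    sort_inds = torch.argsort(cate_scores, descending=True)
    if sort_inds.numel() > nms_pre:
        sort_inds = sort_inds[:nms_pre]
    return (cate_scores[sort_inds],
            cate_labels[sort_inds],
            kernel_preds[sort_inds])
```

Called right after the scores are thresholded, this keeps peak memory roughly proportional to nms_pre instead of the raw number of grid candidates.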

Thanks very much! I'll get to work on it.

Hello, may I ask if you've solved this yet?

I went back and checked: the fix the author suggested is based on the AdelaiDet framework, while I'm using mmdet. I eventually got the test to run by reducing img_scale under test_pipeline in the config file.
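
For reference, a minimal sketch of that kind of change in an mmdet-style config (e.g. configs/solov2/solov2_light_448_r34_fpn_8gpu_3x.py). The pipeline layout mirrors the standard mmdet test_pipeline; the img_scale value below is only an illustration, so pick whatever fits your GPU memory.

```python
# Sketch of a reduced test-time image scale in an mmdet-style config.
# The exact img_scale below is illustrative, not the repo's default.
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(448, 448),  # smaller than the original test scale
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
```

In the full config, data['test']['pipeline'] should point at this test_pipeline. A smaller test scale shrinks the feature maps and predicted masks, so peak memory drops roughly quadratically with the scale.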

Hello, I'd like to ask: when I use inference_demo.py to batch-infer images, the GPU memory usage is very high. Do you have any workaround?