davisking / dlib

A toolkit for making real world machine learning and data analysis applications in C++

Home Page: http://dlib.net


cnn face detector example OOMs when processing images in series

sandeen opened this issue · comments

Expected Behavior

The cnn face detector example should process multiple images in a row without running out of memory.
I understand that some images may not fit into the GPU. However, if I am able to process images individually, I would like to be able to process them in a loop as well.
(tl;dr: can the example be modified to free GPU resources between images?)

Current Behavior

With the two provided test images, a.jpg and b.jpg (below), I can successfully process either one, individually.
I can also process b.jpg and a.jpg together, in that specific order.
If I try to process a.jpg and b.jpg in that order, I run out of GPU memory.
Is there a call to make in between cnn_face_detector() calls to free GPU memory, or something like that?

Steps to Reproduce

Modify the cnn face detector example to change upsampling from 1 to 0, and remove the window display and hit_enter_to_continue() calls.
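In outline, the modified example reduces to a loop like the sketch below (not the exact script; `count_faces` is a name invented here for illustration, while `dlib.cnn_face_detection_model_v1` and `dlib.load_rgb_image` in the trailing comment are the real dlib Python bindings):

```python
def count_faces(detector, images, upsample=0):
    """Run the detector over each (name, image) pair in turn,
    printing and returning the per-image detection counts.

    `detector` is any callable taking (image, upsample) and returning a
    sequence of detections, e.g. dlib.cnn_face_detection_model_v1(...).
    """
    counts = []
    for name, img in images:
        print("Processing file: {}".format(name))
        dets = detector(img, upsample)
        print("Number of faces detected: {}".format(len(dets)))
        counts.append(len(dets))
    return counts

# With a real CUDA-enabled dlib build and the model file from dlib.net:
#   import dlib, sys
#   cnn = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")
#   count_faces(cnn, ((f, dlib.load_rgb_image(f)) for f in sys.argv[1:]))
```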
Run the cnn example on file a, file b, files (b & a), then files (a & b):

[root@host test-dlib]# ./dlib-cnn.py mmod_human_face_detector.dat a.jpg
Processing file: a.jpg
Number of faces detected: 0
[root@host test-dlib]# ./dlib-cnn.py mmod_human_face_detector.dat b.jpg
Processing file: b.jpg
Number of faces detected: 0
[root@host test-dlib]# ./dlib-cnn.py mmod_human_face_detector.dat b.jpg a.jpg
Processing file: b.jpg
Number of faces detected: 0
Processing file: a.jpg
Number of faces detected: 0
[root@host test-dlib]# ./dlib-cnn.py mmod_human_face_detector.dat a.jpg b.jpg
Processing file: a.jpg
Number of faces detected: 0
Processing file: b.jpg
Traceback (most recent call last):
  File "/root/test-dlib/./dlib-cnn.py", line 55, in <module>
    dets = cnn_face_detector(img, 0)
RuntimeError: Error while calling cudaMalloc(&data, n) in file /root/rpmbuild/BUILD/dlib-19.23/dlib/cuda/cuda_data_ptr.cpp:58. code: 2, reason: out of memory
  • Version: 19.23
  • Where did you get dlib: tarball from dlib.net, rebuilt locally into an RPM with Fedora spec file
  • Platform: Fedora release 33 (Thirty Three), x86_64
  • Compiler: gcc version 10.3.1 20210422 (Red Hat 10.3.1-1) (GCC)
  • CUDA: 11.4.2

GPU:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.74       Driver Version: 470.74       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
| 35%   27C    P0    N/A /  30W |      0MiB /  2000MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

test images:

a.jpg (attached image)
b.jpg (attached image)

my test script:

dlib-cnn.py.txt

Thank you for the reply. The thing that I'm wondering about is that the two images work just fine one at a time.
So to be clear - are you saying that b.jpg alone essentially exhausts the GPU memory when I do this?

# ./dlib-cnn.py mmod_human_face_detector.dat a.jpg b.jpg

b.jpg file size is only 170k, dimensions 1242x1243, and the GPU has 2G. I'm just trying to get a sense of what I should expect when processing multiple (thousands) of images this way, i.e. how big is too big. I didn't expect this one to be too big.

Thanks!

One more datapoint: when I process a.jpg, nvidia-smi shows GPU use topping out at about 1.5 GiB:

| 35%   34C    P0    N/A /  30W |   1587MiB /  2000MiB |    100%      Default |

When I process b.jpg, max consumption is indeed slightly more:

| 35%   33C    P0    N/A /  30W |   1711MiB /  2000MiB |     97%      Default |

I just expected that if b.jpg fit into the GPU, then processing a.jpg first would not affect subsequent processing of b.
shrug

Anyway, I'll try catching the out-of-memory exception, downsampling the image, and retrying. I won't need details on the face rectangles; I just want a yes/no on "is there a face", so it should be straightforward.
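That retry idea can be sketched roughly as follows (`has_face` and `shrink` are hypothetical names written here for illustration; the `"out of memory"` string match is based on the RuntimeError text in the traceback above):

```python
def has_face(detector, img, shrink, max_tries=3):
    """Yes/no face check that downsamples and retries when dlib raises
    the out-of-memory RuntimeError from cudaMalloc."""
    for _ in range(max_tries):
        try:
            return len(detector(img, 0)) > 0
        except RuntimeError as err:
            # Re-raise anything that isn't a GPU OOM.
            if "out of memory" not in str(err):
                raise
            # Halve and retry, e.g. shrink=lambda im: im[::2, ::2]
            # for a numpy image array.
            img = shrink(img)
    raise RuntimeError(
        "image still too big after {} downsamples".format(max_tries))
```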

I appreciate the info you've provided! And thanks for all your work on dlib.

It's the reallocations that are failing. Freeing and reallocating memory in CUDA can be like that: freeing memory doesn't always release it back immediately. I'm not sure whether there is some process-level memory pooling in the CUDA runtime, or whether the memory paging system on the GPU just isn't that good and it gets fragmented, or what.
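One workaround consistent with the b-then-a ordering succeeding in this thread: process images in decreasing size order, so the largest CUDA buffers are allocated up front and later, smaller images fit into memory that is already allocated or less fragmented. A sketch, with a hypothetical `largest_first` helper (assumes images with a numpy-style `.shape`, as `dlib.load_rgb_image` returns):

```python
def largest_first(paths, load):
    """Load each image and return (path, image) pairs ordered by pixel
    area, largest first, so the biggest GPU allocations happen before
    the CUDA memory pool can fragment."""
    imgs = [(p, load(p)) for p in paths]
    imgs.sort(key=lambda pair: pair[1].shape[0] * pair[1].shape[1],
              reverse=True)
    return imgs
```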