maum-ai / voicefilter

Unofficial PyTorch implementation of Google AI's VoiceFilter system

Home Page: http://swpark.me/voicefilter


Out of memory when running inference on a single file.

BhaveshDevjani opened this issue · comments

I tried to run the trained model on a single input and it gave an OOM error on GCP with one Nvidia P100.
RuntimeError: CUDA out of memory. Tried to allocate 4.66 GiB (GPU 0; 15.90 GiB total capacity; 14.37 GiB already allocated; 889.81 MiB free; 19.21 MiB cached)
The mixed wav file (19 MB) was about 5 minutes long, and the reference file was 11 seconds.
I don't know why it shows 14.37 GiB already allocated when I'm not even training. I tried restarting the instance, but it did not help.
Can you please suggest a way to reduce the memory required during inference?
Thank you!

Hi, @bhaveshgg17
I think I forgot to use the torch.no_grad() scope when running inference. Could you please add that and try again?
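For anyone hitting this before the fix lands, here is a minimal sketch of the idea; the function name, variable names, and call signature below are illustrative, not the repo's exact inference code:

```python
import torch

def run_inference(model, dvec, mixed_mag):
    """Apply the separation model under no_grad so activations are not kept
    for backprop, which roughly halves peak GPU memory during inference.

    Names and the call signature are assumptions, not the repo's actual API.
    """
    model.eval()                          # put dropout/batchnorm in eval mode
    with torch.no_grad():                 # skip autograd bookkeeping
        mask = model(mixed_mag, dvec)     # soft mask predicted by the model
        return mixed_mag * mask           # masked magnitude spectrogram
```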

Oh, wait. It must be because of the length of the mixed wav. 5 minutes is too long for VoiceFilter to run at once.

First, I reduced memory usage by half using torch.no_grad() scope in #6.
However, in order to use the VoiceFilter system on long audio, I think we should use some kind of slicing strategy. Since we can't process the whole audio at once, we need to slice it into pieces and process them sequentially (or in batches).
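A rough sketch of what such a slicing strategy could look like; the chunk size, tensor shapes, and forward signature are assumptions rather than the repository's actual interface:

```python
import torch

def enhance_in_chunks(model, dvec, mixed_mag, chunk_frames=301, device="cuda"):
    """Run the separation model over a long magnitude spectrogram in chunks.

    mixed_mag:    (T, num_freq) spectrogram of the full mixed recording.
    dvec:         (emb_dim,) speaker embedding from the reference utterance.
    chunk_frames: frames per chunk (assumed value; pick it to fit GPU memory).
    """
    model.eval()
    dvec = dvec.unsqueeze(0).to(device)                    # (1, emb_dim)
    outputs = []
    with torch.no_grad():
        for start in range(0, mixed_mag.size(0), chunk_frames):
            chunk = mixed_mag[start:start + chunk_frames]  # last chunk may be shorter
            chunk = chunk.unsqueeze(0).to(device)          # (1, t, num_freq)
            mask = model(chunk, dvec)                      # assumed forward signature
            outputs.append((chunk * mask).squeeze(0).cpu())
    return torch.cat(outputs, dim=0)                       # (T, num_freq)
```

Non-overlapping chunks like this can leave audible seams at chunk boundaries; overlapping windows with cross-fading would be a natural refinement.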

Hello @seungwonpark,
Thank you for the fix! Yes, I think we will have to use slicing for this. I will give it a try.

I met the same problem.
Error message:
RuntimeError: CUDA out of memory. Tried to allocate 353.38 MiB (GPU 0; 7.79 GiB total capacity; 6.92 GiB already allocated; 77.56 MiB free; 35.16 MiB cached)

Modifying batch_size in the config.yaml file works for me; I set batch_size=6.