[BUG] Cuda out of memory on Linux during inference

shirounanashi opened this issue 2 months ago · comments

Before You Report a Bug
My setup is a GTX 1660 Super, Ryzen 5600G (16GB ram).

Bug Description
I use Applio on both Windows 11 and Linux (Arch). On Linux it fails with a CUDA out of memory error, both in the latest release and on the latest commit.

Steps to Reproduce
Outline the steps to replicate the issue:

Run inference with the default settings.

Expected Behavior
Inference completes without a CUDA out of memory error.

Assets

Desktop Details:

Operating System: Linux (Arch Linux, Gnome)
Browser: Microsoft Edge

Additional Context
I'm not using IAHispano's fairseq.

Aitor Emper · Answer 1 · Sat Apr 27 2024 01:03:32 GMT+0800 (China Standard Time)

It could be caused by many things, but I think it is one of these:

Your GPU driver is outdated: https://docs.nvidia.com/deeplearning/cudnn/latest/reference/support-matrix.html
Your GPU only has 6 GB of VRAM, and it is a GTX card, which some people have had issues with.

shirounanashi · Answer 2 · Sat Apr 27 2024 01:47:42 GMT+0800 (China Standard Time)

Thank you, it really was a driver problem. The driver wasn't outdated, though: CUDA and cuDNN were not installed at all. I installed them and that solved the problem.

shirounanashi · Answer 3 · Mon May 13 2024 21:59:45 GMT+0800 (China Standard Time)

After further testing, I found that this is a problem with Applio on Linux that does not happen in RVC WebUI. So it has nothing to do with the driver, as I thought when I closed the issue.

Aitor Emper · Answer 4 · Thu May 16 2024 02:30:43 GMT+0800 (China Standard Time)

Applio uses the same code as RVC WebUI for GPU detection and utilization. We only changed the Torch version, so I'm sharing this link for you to verify compatibility with our current setup: https://docs.nvidia.com/deeplearning/cudnn/latest/reference/support-matrix.html
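To compare a machine against that support matrix, it helps to see what the installed Torch build itself reports. The following is a minimal sketch, not code from Applio or RVC WebUI: the function name `cuda_report` is hypothetical, and it assumes the GPU of interest is device 0. It returns early if Torch is not installed.

```python
import importlib.util


def cuda_report():
    """Collect the Torch/CUDA facts needed to check a support matrix."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__
    # CUDA version the wheel was built against; None for CPU-only wheels.
    report["cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        props = torch.cuda.get_device_properties(0)  # assumption: GPU is device 0
        report["gpu_name"] = props.name
        report["total_vram_gib"] = round(props.total_memory / 1024**3, 2)
        report["cudnn_version"] = torch.backends.cudnn.version()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Running this in both the Applio and the RVC WebUI environments and comparing the two outputs would show whether the Torch, CUDA, or cuDNN versions actually differ between them.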

shirounanashi · Answer 5 · Sat May 18 2024 13:01:17 GMT+0800 (China Standard Time)

@aitronssesin In theory, my GPU should run this smoothly. But even with the latest drivers and cuDNN on a clean Arch installation, I still get this CUDA out of memory error, which doesn't happen with RVC WebUI. Honestly, I don't know why, since it doesn't happen on Windows on the same PC.

Aitor Emper · Answer 6 · Sat May 18 2024 18:38:25 GMT+0800 (China Standard Time)

It could be an issue with Arch, because it works without any issues on my Ubuntu server.

shirounanashi · Answer 7 · Sat May 18 2024 22:42:14 GMT+0800 (China Standard Time)

It may be, but then it doesn't make sense that RVC WebUI works without problems.

Aitor Emper · Answer 8 · Sat May 18 2024 22:52:58 GMT+0800 (China Standard Time)

Yes, maybe because of the Torch version.

shirounanashi · Answer 9 · Sat May 18 2024 23:32:30 GMT+0800 (China Standard Time)

I tried updating torch, torchaudio and torchvision, but the CUDA out of memory error still occurred.

Aitor Emper · Answer 10 · Sat May 18 2024 23:34:06 GMT+0800 (China Standard Time)

Sorry, I didn't explain myself well. I meant that the newer Torch version we are using is probably broken on Arch. Try this:

pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
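After a pinned install like the one above, it can be worth confirming that the active environment really picked up those versions. Here is a small sketch using only the standard library. The names `PINS`, `matches_pin` and `check_pins` are hypothetical, and the pinned versions are simply copied from the command above. Wheels from the PyTorch index usually carry a local suffix such as `+cu121`, so the comparison ignores it.

```python
from importlib import metadata

# Versions copied from the pip command above.
PINS = {"torch": "2.1.1", "torchvision": "0.16.1", "torchaudio": "2.1.1"}


def matches_pin(installed, pinned):
    """Compare versions, ignoring a local build suffix such as '+cu121'."""
    return installed.split("+", 1)[0] == pinned


def check_pins(pins=PINS):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    results = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        ok = installed is not None and matches_pin(installed, pinned)
        results[name] = (installed, ok)
    return results


if __name__ == "__main__":
    for name, (installed, ok) in check_pins().items():
        print(f"{name}: installed={installed} pinned={PINS[name]} ok={ok}")
```

This only checks version strings. It does not verify that the CUDA runtime bundled with the wheel works with the installed driver.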

shirounanashi · Answer 11 · Sat May 18 2024 23:50:40 GMT+0800 (China Standard Time)

I didn't explain myself well either: I had updated to the version I was using with RVC WebUI. I tested the version you sent and the problem still exists. I also noticed that Applio doesn't seem to release the VRAM until I close its window in the terminal.
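On the VRAM observation: PyTorch's caching allocator normally keeps freed GPU memory reserved for reuse by the same process, so tools like nvidia-smi can show it as occupied until the process exits. Whether that explains the behaviour here is not established in this thread. As a rough sketch (not Applio's code, and `release_vram` is a hypothetical helper), cached blocks can be handed back after inference like this:

```python
import gc
import importlib.util


def release_vram():
    """Collect garbage, then ask PyTorch to return cached blocks to the driver.

    Returns (allocated_bytes, reserved_bytes) measured after cleanup, or
    None when torch or a CUDA device is not available.
    """
    # Drop unreachable Python objects first so their tensors can be freed.
    gc.collect()
    if importlib.util.find_spec("torch") is None:
        return None

    import torch

    if not torch.cuda.is_available():
        return None
    # Frees cached memory that no live tensor is using.
    torch.cuda.empty_cache()
    return torch.cuda.memory_allocated(), torch.cuda.memory_reserved()


if __name__ == "__main__":
    print(release_vram())
```

Memory still held by live tensors or a loaded model is not released by this. Those references have to be dropped first (for example with `del model`) for the call to have any effect.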