NTDXYG / ComFormer

Code and data for the paper "ComFormer: Code Comment Generation via Transformer and Fusion Method-based Hybrid Code Representation", accepted at DSA 2021.


Cuda out of memory, 0 bytes free

lavellanedaaubay opened this issue · comments

Hello, I'm contacting you because we are trying to test your model on our dataset using CUDA, and we get an out-of-memory error.
(screenshot of the CUDA out-of-memory error attached)
We are testing it with an NVIDIA RTX 2080.
Do you have any idea where the problem could be coming from?
Thanks for your great work!

I suggest you directly fine-tune my pre-trained model, which will significantly reduce your training time. If an OOM is still reported, you can freeze some of the model's parameters by adding the following code at line 116 of bart_model.py.

```python
# Layers to freeze. (The original snippet named this list `unfreeze_layers`,
# but setting requires_grad = False actually freezes the matching layers.)
freeze_layers = ['layers.0', 'layers.1', 'layers.2', 'layers.3', 'layers.4',
                 'layers.5', 'layers.6', 'layers.7', 'layers.8']

for name, param in self.model.named_parameters():
    if any(layer in name for layer in freeze_layers):
        param.requires_grad = False  # exclude this parameter from gradient updates
```
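The snippet matches layers by substring of the parameter name. The logic can be sketched without PyTorch, using stand-in parameter objects and hypothetical names mimicking BART's `named_parameters()` output (a minimal sketch, not the actual model):

```python
class Param:
    """Stand-in for torch.nn.Parameter: only tracks requires_grad."""
    def __init__(self):
        self.requires_grad = True

# Hypothetical parameter names in the style of BART's named_parameters().
named_parameters = {
    "model.encoder.layers.0.self_attn.k_proj.weight": Param(),
    "model.encoder.layers.8.fc1.weight": Param(),
    "model.encoder.layers.9.fc1.weight": Param(),
    "lm_head.weight": Param(),
}

# Layers to freeze, matched by substring as in the snippet above.
# Note: substring matching "layers.1" would also match "layers.10", etc.,
# so this scheme only works cleanly for single-digit layer indices.
freeze_layers = [f"layers.{i}" for i in range(9)]  # layers.0 .. layers.8

for name, param in named_parameters.items():
    if any(tag in name for tag in freeze_layers):
        param.requires_grad = False  # frozen: excluded from gradient updates

# Only parameters outside layers.0-8 remain trainable.
trainable = [n for n, p in named_parameters.items() if p.requires_grad]
print(trainable)  # → ['model.encoder.layers.9.fc1.weight', 'lm_head.weight']
```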

Thanks for your answer, it worked: we no longer get the OOM error. However, we are having trouble downloading the pretrained model. First we had a proxy issue, and when we download it another way, pytorch_model.bin comes down as a zip file and won't work.
We are downloading it from here: https://huggingface.co/NTUYG/ComFormer/tree/main

Maybe you need to install Git LFS, then run the following:

```shell
git lfs install
git clone https://huggingface.co/NTUYG/ComFormer
```

Then you will get the model file.
If you are in China, I can upload it to Baidu Netdisk instead.
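When Git LFS is missing, the clone leaves a small text pointer stub in place of the real weights, which can look like a corrupted or mislabeled file. A quick sanity check (a minimal sketch; the path is an assumption based on the clone above):

```python
from pathlib import Path

def is_lfs_pointer(path):
    """Return True if the file is a Git LFS pointer stub rather than real weights.

    A broken LFS download leaves a tiny text file beginning with
    'version https://git-lfs.github.com/spec/v1' instead of the binary weights,
    which are hundreds of megabytes for a BART-sized model.
    """
    p = Path(path)
    if not p.exists() or p.stat().st_size > 1024:
        return False
    return p.read_bytes().startswith(b"version https://git-lfs")

# Hypothetical path from the clone above:
# if is_lfs_pointer("ComFormer/pytorch_model.bin"):
#     print("Pointer stub detected - run 'git lfs pull' inside the repo")
```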

Thanks a lot for your answers, everything works now. I will contact you if I run into other issues.