TaoRuijie / Loss-Gated-Learning

ICASSP 2022: 'Self-supervised Speaker Recognition with Loss-gated Learning'


Stage II, CUDA out of memory in cluster_network function

liyunlongaaa opened this issue · comments

```
T None None 0
Model /home/gbyang/code/Loss-Gated-Learning/utils/S1_7.36.model loaded!
  0%| | 13/5328 [01:05<7:26:57, 5.05s/it]
Traceback (most recent call last):
  File "main_train.py", line 61, in <module>
    dic_label, NMI = Trainer.cluster_network(loader = clusterLoader, n_cluster = args.n_cluster) # Do clustering
  File "/home/gbyang/code/Loss-Gated-Learning/Stage2/model.py", line 46, in cluster_network
    out = self.Network.forward(data[0].cuda()) # Get the embeddings
  File "/home/gbyang/code/Loss-Gated-Learning/Stage2/encoder.py", line 126, in forward
    global_x = torch.cat((x,torch.mean(x,dim=2,keepdim=True).repeat(1,1,t), torch.sqrt(torch.var(x,dim=2,keepdim=True).clamp(min=1e-4)).repeat(1,1,t)), dim=1)
RuntimeError: CUDA out of memory. Tried to allocate 2.75 GiB (GPU 0; 10.76 GiB total capacity; 3.79 GiB already allocated; 2.11 GiB free; 7.45 GiB reserved in total by PyTorch)
```

Sorry to bother you, XiaoHei bro, I'm a big fan of yours. But I hit the error above even with batch size = 1. I was working on a 2080Ti and don't know how to fix it.
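For reference, the failing line is the global-context concatenation in the encoder's pooling layer: the per-utterance mean and standard deviation are repeated across every frame before the `cat`. A minimal sketch of why that allocation grows linearly with utterance length (the shapes below are illustrative, not the repo's actual config):

```python
# Sketch of the memory behaviour of the line in the traceback.
# b/c/t values are illustrative, not taken from the repo's config.
import torch

b, c, t = 8, 1536, 2000          # batch, channels, frames (a long utterance)
x = torch.randn(b, c, t)

mean = torch.mean(x, dim=2, keepdim=True)                           # (b, c, 1)
std = torch.sqrt(torch.var(x, dim=2, keepdim=True).clamp(min=1e-4)) # (b, c, 1)

# .repeat(1, 1, t) materialises the mean/std at every frame, so the
# concatenated tensor is (b, 3c, t): memory grows linearly with t.
global_x = torch.cat((x, mean.repeat(1, 1, t), std.repeat(1, 1, t)), dim=1)
print(global_x.shape)            # torch.Size([8, 4608, 2000])
```

Even if the repeats were replaced with non-copying `expand()` views, the `cat` itself still has to allocate the full `(b, 3c, t)` output, so the practical fix is to cap the number of frames processed per forward pass, as the answer below suggests.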

Reduce the number here:

```python
minibatch_size = max(1, int(1600 // frame_length))
```

The 1600 here controls the effective batch size during clustering, not the batch size you set. I think setting it to 500-800 should work for a 2080Ti.
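To make the trade-off concrete, here is a minimal sketch of how a frame budget like this turns into a per-batch utterance count (a hypothetical standalone function, not the repo's actual loader code):

```python
# Hypothetical sketch: a frames-per-batch budget decides the minibatch size.
# 'frame_budget' plays the role of the 1600 above.
def minibatch_for(frame_length: int, frame_budget: int = 1600) -> int:
    # Longer utterances get smaller batches, so that
    # minibatch_size * frame_length stays roughly constant.
    return max(1, int(frame_budget // frame_length))

for frame_length in (100, 400, 1600, 3200):
    print(frame_length, minibatch_for(frame_length))
# 100 -> 16, 400 -> 4, 1600 -> 1, 3200 -> 1
```

Lowering the budget from 1600 to 500-800 roughly halves the frames pushed through each forward pass, and with them the activation memory of the concatenation above; the only cost is a slower clustering loop.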

For more details and explanation, see: https://github.com/TaoRuijie/TalkNet-ASD/blob/main/FAQ.md#12-how-to-figure-the-variable-length-of-data-during-training-

Thank you, I love you^^