FirasGit / medicaldiffusion

Medical Diffusion: This repository contains the code for our paper Medical Diffusion: Denoising Diffusion Probabilistic Models for 3D Medical Image Synthesis

Insufficient GPU memory

sunck1 opened this issue · comments

commented

Dear authors, I encountered an insufficient GPU memory problem when training the VQGAN model on the KiTS19 dataset. I found that you trained this medical diffusion model on a GPU with 24 GB of memory. With your default parameter settings, I failed to train the VQGAN model (n_codes = 16384) on a GPU with 50 GB of memory, even with batch size = 1. It seems this model is meant to process the whole CT scan without cropping. What can I do to cope with this problem without affecting the performance of your model? I would appreciate it if you could give me a hand!
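To see why whole-scan training runs out of memory, it helps to estimate how large even the raw input tensor is before the encoder multiplies it by dozens of feature channels. The sketch below is illustrative only (the shapes are assumptions, not KiTS19 specifics, and this is not the repository's code):

```python
import math

def tensor_gib(shape, bytes_per_elem=4):
    """Memory of one dense float32 tensor with the given shape, in GiB."""
    return math.prod(shape) * bytes_per_elem / 1024**3

# A whole CT scan, e.g. (batch, channel, D, H, W) = (1, 1, 512, 512, 512)
whole = (1, 1, 512, 512, 512)
# A cropped/resampled volume such as (1, 1, 128, 128, 128)
cropped = (1, 1, 128, 128, 128)

print(f"whole scan input: {tensor_gib(whole):.2f} GiB")   # 0.50 GiB
print(f"cropped input:    {tensor_gib(cropped):.4f} GiB") # 0.0078 GiB
# Early encoder feature maps multiply these figures by the channel count
# (and gradients roughly double them), so activations -- not parameters --
# dominate VQ-GAN memory use on 3D data.
```

A single 512³ volume already costs 0.5 GiB per feature channel per layer at float32, which is why cropping, resampling, or a stronger encoder downsampling factor matters far more than batch size here.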

I encountered the same problem. Even when training on an A800 with 80 GB of VRAM, I run into out-of-memory errors.

I have fixed this issue, and the modifications I made are as follows:

  1. I downgraded the PyTorch version from 2.0 to 1.12 to match the version used by the author.
  2. I modified the "downsample" parameter to be consistent with the supplementary materials of the paper.

I also encountered this problem; could you explain your fix in more detail? Thank you so much for your help!