KidsWithTokens / MedSegDiff

Medical Image Segmentation with Diffusion Model

Invalid training loss

hshiah opened this issue.

With the new version, the training loss on BraTS2020 is usually NaN.

[Screenshot: WeChat image 20230211123834]

When the loss is not NaN, grad_norm is extremely large (e.g. 7.44e+04), whereas with the previous version it was usually around 10.
What could be causing this? I am training the model on the raw BraTS2020 training data.
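A generic way to keep symptoms like these from corrupting a run is to skip steps whose loss or gradient norm is non-finite and to clip the global gradient norm. The sketch below is a hypothetical, framework-free illustration of that technique. The helper names (`global_grad_norm`, `clip_grads`, `guarded_step`) and the `max_norm` default are assumptions made for illustration. They are not part of MedSegDiff, and nothing in the thread says MedSegDiff clips gradients. The clipping rule mirrors the one documented for `torch.nn.utils.clip_grad_norm_`.

```python
import math
from typing import List, Optional, Tuple

Grads = List[List[float]]  # hypothetical stand-in for per-parameter gradient tensors


def global_grad_norm(grads: Grads) -> float:
    """L2 norm over every gradient entry, treated as one flat vector."""
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))


def clip_grads(grads: Grads, max_norm: float, eps: float = 1e-6) -> Tuple[Grads, float]:
    """Scale all gradients down if their global norm exceeds max_norm.

    Same rule as torch.nn.utils.clip_grad_norm_:
    coef = max_norm / (total_norm + eps), applied only when coef < 1.
    Returns the (possibly scaled) gradients and the pre-clipping norm.
    """
    total = global_grad_norm(grads)
    coef = max_norm / (total + eps)
    if coef < 1.0:
        grads = [[g * coef for g in tensor] for tensor in grads]
    return grads, total


def guarded_step(loss: float, grads: Grads, max_norm: float = 1.0) -> Optional[Grads]:
    """Return clipped gradients, or None to signal 'skip this update'.

    The step is skipped when the loss or the gradient norm is NaN/inf,
    so a single bad batch cannot poison the weights.
    """
    if not math.isfinite(loss) or not math.isfinite(global_grad_norm(grads)):
        return None
    clipped, _ = clip_grads(grads, max_norm)
    return clipped
```

For example, `clip_grads([[3.0, 4.0]], 1.0)` reports a pre-clipping norm of 5.0 and rescales the gradients to roughly `[0.6, 0.8]`. Logging that pre-clipping norm is one way to spot a jump from about 10 to about 7.44e+04 without letting it destabilize training.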

I fixed the bug; please update the project and try again.

Hi, I tried the newest version, and the model gets stuck at the training stage. I checked the GPU memory usage: it stays at a low value (around 2500 MiB) instead of rising to its usual level.
[Screenshot]

@hshiah I checked it again, and it works fine on my setup. Did you run it on a GPU? You may need to add --gpu 0.
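The thread does not say how the `--gpu` flag is consumed internally. One common pattern for training scripts is to parse a GPU index and expose only that device through the `CUDA_VISIBLE_DEVICES` environment variable before the framework initializes CUDA. The sketch below illustrates that pattern. It is an assumption, not MedSegDiff's actual code, and the helper name `parse_gpu_arg` is hypothetical.

```python
import argparse
import os
from typing import List, Optional


def parse_gpu_arg(argv: Optional[List[str]] = None) -> Optional[str]:
    """Hypothetical helper: read --gpu and restrict visible CUDA devices.

    Must run before any CUDA context is created, because
    CUDA_VISIBLE_DEVICES is only read at CUDA initialization.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpu", type=str, default=None,
                        help="GPU index (or comma-separated list) to expose")
    # parse_known_args lets other script options pass through untouched.
    args, _ = parser.parse_known_args(argv)
    if args.gpu is not None:
        os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu
    return args.gpu
```

If a script never selects a device this way (or by an equivalent call), it may fall back to CPU execution. That would be consistent with training that appears stuck while GPU memory stays low, but this is a hedged reading, not a confirmed diagnosis.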