Is it possible to train Cylinder3D with mixed precision?

Question

YJYJLee opened this issue 2 years ago

Hello,

I am trying to train Cylinder3D with mixed precision, so I added torch.cuda.amp code to the source code.
However, I get NaN values due to overflow as soon as training starts. The NaN values first appear in the forward pass and then propagate through the backward pass, so the loss also becomes NaN.

Is it possible to train Cylinder3D with fp16? Is there any solution for this?
Thanks!

xinge008 · Answer 1 · Fri May 13 2022 18:39:34 GMT+0800 (China Standard Time)

I have not tried AMP. If you want to save GPU memory, it is better to try torch.utils.checkpoint.
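
For readers unfamiliar with torch.utils.checkpoint, here is a minimal sketch of how it is typically applied. The `CheckpointedNet` module and its layer sizes are hypothetical stand-ins, not Cylinder3D code; the idea is only that activations inside each wrapped block are recomputed during the backward pass instead of being stored.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedNet(nn.Module):
    """Hypothetical toy network; stands in for a larger backbone."""

    def __init__(self, dim=32):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.head = nn.Linear(dim, 4)

    def forward(self, x):
        # Activations inside each checkpointed block are not kept;
        # they are recomputed when gradients are needed.
        x = checkpoint(self.block1, x, use_reentrant=False)
        x = checkpoint(self.block2, x, use_reentrant=False)
        return self.head(x)


model = CheckpointedNet()
x = torch.randn(8, 32)
loss = model(x).sum()
loss.backward()  # triggers recomputation of block1 and block2
```

This trades extra compute for lower memory use and keeps everything in fp32, so it avoids the overflow problem entirely. Whether it works unchanged with the sparse convolution layers in Cylinder3D is not confirmed here.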

LR · Answer 2 · Fri Sep 30 2022 22:34:36 GMT+0800 (China Standard Time)

Hello,

I had the same error and needed to adjust the eps parameter of Adam. See reference issue. I am using Spconv-v2.1.x. This is likely caused by spconv being somewhat independent of PyTorch.
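
As a rough illustration of why eps can matter in half precision: Adam's default eps of 1e-8 is below the smallest positive value fp16 can represent, so it rounds to zero if it is ever cast to fp16. The sketch below shows that and sets a larger eps. The value 1e-4 and the `nn.Linear` stand-in model are assumptions for illustration only; the answer above does not state which value was used.

```python
import torch
import torch.nn as nn

# Adam's default eps (1e-8) is not representable in fp16 and rounds to 0.
eps_default = torch.tensor(1e-8, dtype=torch.float16)
# A larger eps (illustrative value, not from the thread) survives the cast.
eps_larger = torch.tensor(1e-4, dtype=torch.float16)

default_underflows = bool(eps_default == 0)
larger_survives = bool(eps_larger > 0)
print(default_underflows, larger_survives)

# Hypothetical stand-in for the real network.
model = nn.Linear(16, 4)

# Pass the adjusted eps when constructing the optimizer.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, eps=1e-4)
```

A larger eps changes the effective step size, so the learning rate may need retuning afterwards.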

If it still does not work, try a higher spconv version (I have forked a modified implementation).